How to construct a puzzle

After publishing some musings on puzzle construction and outlining the process in private to a few people, I thought I might as well write it all down in one place. The one thing I will omit is what software to use, because I have already written about that.

Before reading past the first section (“Solve puzzles”) you should solve Hedge Maze. If you are having trouble with it I recommend trying Wings first.

Getting started: solve puzzles

Solve a lot of puzzles. And learn something from them.

Puzzles are about communicating and showcasing logical ideas. So the first step is to learn some logical ideas. And the only way to do that is by solving puzzles. Lots of them.

But you can’t be solving any old puzzles. The puzzles you solve should be teaching you things — the right things — so they need to be good. And you need to be able to tell which ones are good. In other words, you need to develop a taste in puzzles.

This means you need to get off of sudoku.com (or any other similar site). These puzzles are computer generated. You are solving puzzles to study ideas. Computers can’t have ideas.¹

So where do you find good puzzles? If you are newer to puzzle-solving, I recommend checking out gmpuzzles.com and some shorter videos on Cracking the Cryptic. If you have a little more experience, try longer Cracking the Cryptic videos or Logic Masters Germany.

I also have my own recommendations for beginners. Furthermore I recommend

for beginners, in that order.

The break-in: logical ideas interact

With some logical ideas under your belt, ask yourself: how do I force them to interact in an interesting way?

At the start there are a lot of things you can do. The vast majority of them lead to nothing useful. Your job is to restrict the search space so that it is feasible to find an interesting break-in. This means it is acceptable, and even encouraged, to set some almost arbitrary constraints on yourself.

By way of analogy, imagine you were designing a logo for a company. You would much rather have a set of reasonable requirements than nothing, even though the requirements shrink the space of good logos to choose from. While a set of requirements may rule out some logos that would work perfectly well for the company, it rules out far more logos that would not work well. The important fact is that the proportion of good logos in the search space increases. The same goes for puzzles. This is why even stupid inspirations can work: as idiotic as they are, they still considerably shrink the search space.

Now I will use my puzzle Hedge Maze as a case study. (If you haven’t solved it yet, now is the time!)

The break-in for the puzzle uses two ideas. As far as I know they do not have formal names, so I will call them “Push” and “Square restriction”. (By the way, Wings is a good example of “Push”. That is why I recommended you solve it if you were having trouble with Hedge Maze.)

Push

If we restrict the digit 1 to the green cells in boxes 1 and 3 as shown, then 1 must appear in one of the red cells in box 2.

This is because there already are two 1’s in rows 1 and 2. Thus rows 1 and 2 cannot have any more 1’s, forcing the 1 in box 2 to appear in row 3.
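The Push argument can be checked by brute force. Here is a minimal Python sketch, assuming (as the explanation above implies) that the green cells in boxes 1 and 3 all lie in rows 1 and 2. The function name `push_rows` and the 0-based row numbering are illustrative choices, not part of any puzzle software.

```python
from itertools import permutations

def push_rows(allowed_box1, allowed_box3):
    """Return the rows (0-based, within the top band) where the digit
    can still sit in box 2, given the rows allowed in boxes 1 and 3.

    Within a band of three boxes, a digit appears once per row and once
    per box, so the rows it uses in boxes 1, 2, 3 form a permutation of
    (0, 1, 2).
    """
    rows = set()
    for r1, r2, r3 in permutations(range(3)):
        if r1 in allowed_box1 and r3 in allowed_box3:
            rows.add(r2)
    return rows

# Digit restricted to rows 1 and 2 (indices 0 and 1) in boxes 1 and 3:
print(push_rows({0, 1}, {0, 1}))  # {2}: box 2 must use row 3
```

With no restriction at all (`push_rows({0, 1, 2}, {0, 1, 2})`), every row stays possible for box 2, which is why the restriction is what does the work.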

Square restriction

If we restrict the digit 9 from the red squares in boxes 1, 3, 7, 9, then the digit 9 must appear somewhere in the indicated green squares in boxes 1, 3, 7, 9. Furthermore, the digit 9 is restricted from the yellow squares in boxes 2, 4, 6, 8 as a consequence.

The interaction

So if we put cages with sufficiently small sums in the following arrangement:

- the low digits in the cages will be pushed
- the high digits will perform a square restriction

Applying more pressure with arrows

Square restriction was applying fairly strong pressure to the puzzle, but Push was not doing much at that point. I wanted to force large digits for two reasons:

- to push the Square restriction on 8 to its limits, and
- to make Push behave interestingly by forcing some cages to have only two digits in common (because their common digits are pushed to the row with the arrow bulb).

To that effect I used a common arrow trick: the sum of the digits on the bulbs must equal the sum of the digits on the arrows. The sum of the digits on the bulbs is at most 15. The sum of 5 digits in the same box is at least 15, since 1 + 2 + 3 + 4 + 5 = 15. Together, these bounds force both sums to be exactly 15.
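The lower bound of 15 is easy to confirm by enumeration. The following is a small Python sketch; the helper names `min_sum_distinct` and `sets_with_sum` are made up for illustration.

```python
from itertools import combinations

DIGITS = range(1, 10)  # sudoku digits 1..9

def min_sum_distinct(count):
    """Smallest possible sum of `count` distinct sudoku digits."""
    return min(sum(combo) for combo in combinations(DIGITS, count))

def sets_with_sum(count, total):
    """All sets of `count` distinct digits that add up to `total`."""
    return [combo for combo in combinations(DIGITS, count)
            if sum(combo) == total]

print(min_sum_distinct(5))    # 15
print(sets_with_sum(5, 15))   # [(1, 2, 3, 4, 5)]
```

Only the set 1, 2, 3, 4, 5 reaches 15. So, assuming the arrow cells are five digits in the same box, a bulb total capped at 15 leaves no slack at all.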

Motivation and takeaways

So how did I come up with this break-in to begin with? It was not by explicitly saying, “I will make Push and Square restriction interact in exactly this fashion”.² Rather, I decided to put down small-sum cages because my intuition told me they would apply pressure to each other in an interesting way. Of course, my intuition also did recognize the Push and Square restriction implication immediately after; that’s why I decided to keep pushing the idea rather than dismissing it as a boring arrangement of random clues.

This process with the cages took no more than 10 seconds in my head. After working out the implications with the cages, it took me at most a minute to decide on adding arrows. This doesn’t mean that I can bust out a break-in to a puzzle that quickly. But it does mean that I can recognize a good idea and discard bad ideas very fast. And in a creative process, being able to iterate fast is vital.

This is why it is so important to solve puzzles and learn from them. Otherwise I would not have honed my intuition to recognize this arrangement would apply just enough pressure to draw interesting conclusions without breaking the puzzle outright.

Finishing the puzzle: perseverance

Just do it.

After applying a sufficient amount of pressure on the puzzle through the break-in, you are no longer inventing a good puzzle. Instead you are discovering one of the few good sequences of clues to finish it. At the start there are a lot of interesting things you can do. Now, at each step, you need to discover one of the few things you can do that advances the puzzle yet is still interesting.

Though I say you are “finishing” the puzzle, the length of this stage can vary wildly. Sometimes you can be almost done after setting the break-in. At other times you might have 90% of the puzzle left to set. An example of the latter: the break-in of Fillomino Chaos Construction consisted of the clues in the bottom-right 5x5 corner.

After setting that 5x5 corner, the puzzle was only around 10% done. But after that, the puzzle was leading me. Each clue would leave an interesting little interaction somewhere else, and so the process continued until the puzzle was finished.

Sometimes the “interesting thing” to do is really complex, almost as complex as the break-in itself. These are the kinds of things that are hard to think of all at once, so you need to be willing to try things out, even if they are not obviously promising at first. Another example from Hedge Maze:

Look at the clues highlighted in green. They don’t really do much on their own, but put together they do a lot, and in very subtle ways. Because it is so subtle (unlike the break-in, which screams “I am sledgehammering this puzzle into submission”), it is not the kind of thing that is easy to come up with all at once.

So one tip is not to blindly follow the greedy algorithm. Though it is often the right thing to do, it is not always. Sometimes setting up clues that don’t do a lot on their own but feel like they have the potential to set up subtle interactions later is the right thing to do.

¹ Why can’t computers (or more specifically, ML models) generate even mediocre logic puzzles, despite being able to write and produce art at a decent level? Though I’m no ML expert, here are my guesses:
- Nobody has really tried. Logic puzzles are just not popular enough to justify the effort required to train a model in generating them. Plus, there isn’t even enough data to do anything besides generate really bad classic and killer sudokus, something which can already be done by brute force.
- More fundamentally, ML is based on gradient descent, which requires differentiability and thus continuity.
  
  Image generation, for example, is a quite continuous process. If I change one pixel’s RGB values by a little, the resulting image is virtually the same. And if I replace one word of a story with a near-synonym, the meaning is virtually the same as well.
  
  But in a puzzle, if I change a given 1 to a 2, chances are that I have completely broken the puzzle. Puzzles are a chaotic environment: small changes in input lead to big changes in the outcome. So I suspect training an ML model to even recognize good puzzles is well beyond our current capabilities, let alone construct them.
² To be clear, you can set puzzles by explicitly trying to make logical ideas interact. I just didn’t this time.