"Fooling Pigeon Set" is a horrible name

This post has been significantly updated February 19, 2025 following discussions with students.

I am a CS251 TA as of this writing, but this is not the official stance of the course staff. It merely represents my personal opinion.

Setting the background

Suppose I have an alphabet $\Sigma$ and a language $L \subset \Sigma^*$ that is not regular, i.e. cannot be solved via a DFA.

Here is how we prove $L$ is not regular:

Fix $k \in \mathbb N$ and consider any DFA $D$ with $k$ states. We exhibit a set $S$ of $k + 1$ strings such that every pair $x \neq y \in S$ can be separated, meaning there is some $w$ for which exactly one of $xw$ and $yw$ is in $L$. By the Pigeonhole Principle, some pair $x \neq y \in S$ is sent to the same state of $D$. But then $xw$ and $yw$ are sent to the same state as well, so $D$ accepts both or rejects both, and $D$ fails to solve $L$. Since $k$ was arbitrary, no DFA solves $L$.

Mathematical hygiene: avoiding unnecessary proofs by contradiction

Maybe you’ve heard that unnecessarily using a proof by contradiction is bad. There are quite a few philosophical reasons to avoid proof by contradiction unless necessary. But the big one is practical: avoiding it makes your proofs clearer.

Picture this. I want to prove a claim X. Suppose I write the following.

For the sake of contradiction, assume that X is false.

Now we perform some manipulations and conclude that X is in fact true.

Contradiction.

This is really silly. We have assumed that X is false, then proven that X is true, and then derived a contradiction. (Sometimes it is not so immediately obvious when a proof by contradiction is unnecessary. The contradiction hiding in these so-called “fooling pigeon set” arguments is harder to spot, but it is just as unnecessary.)

Why on earth did we assume X was false in the first place? We could have just written

We perform some manipulations and conclude X is true.

Let me be clear that the point is not to avoid proof by contradiction as a proof technique. Rather, it is to remove any unnecessary proofs by contradiction to streamline the presentation of your proof. Removing unnecessary components of a proof makes it clearer, and thus we should encourage that.

Obscuring the main point

Here I have presented the argument in a more hygienic manner than CS251 did, for the sake of clearer exposition. (It is still deliberately bad. At the end I will explain how to think about these concepts properly.) Here is what actually happens:

Assume for the sake of contradiction that $L$ is regular, so that some DFA $D$ with $k$ states solves it. Exhibit a “fooling pigeon set” $S$ of $k + 1$ strings. By the Pigeonhole Principle, some pair $x \neq y \in S$ is sent to the same state of $D$. But some $w$ separates $x$ and $y$, so $xw$ and $yw$ are sent to the same state even though exactly one of them is in $L$. Contradiction.

Here is the problem. It complicates the argument! Since we have no control over which pair $x$ and $y$ goes to the same state, we have to show that every pair $x \neq y$ is sent to different states anyway. In other words, the Pigeonhole argument really buys us nothing.

The entire point of a “fooling pigeon set” $S$ is that every pair of strings $x \neq y \in S$ goes to different states, because we may find a $w$ such that exactly one of $xw$ and $yw$ is in $L$. This has nothing to do with the Pigeonhole Principle! The point is that each of these strings carves out a different state in any DFA solving the problem, and thus $S$ carves out $|S|$ states.
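To see this in action, here is the classic example (my choice of illustration, not one drawn from the course): over $\Sigma = \{0, 1\}$, take

$$L = \{\, 0^n 1^n : n \ge 0 \,\}, \qquad S = \{\, 0^i : i \ge 0 \,\}.$$

For any $i \neq j$, the string $w = 1^i$ separates $0^i$ and $0^j$, since $0^i 1^i \in L$ but $0^j 1^i \notin L$. So any DFA solving $L$ must send each $0^i$ to a distinct state.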

I strongly believe this is the correct presentation, because it naturally leads into the “State Complexity” of a language (the minimum number of states of any DFA solving $L$, which equals the supremum of the cardinalities of its “fooling pigeon sets”). In particular, we no longer treat the finite and infinite cases as distinct. The proofs that a language is not solvable with fewer than $k$ states, for $k \in \mathbb N$ and for $k = \infty$, now follow the same structure.
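As a finite-case illustration (a standard example, though not one from the post): for a fixed $k$, let

$$L_k = \{\, w \in \{0,1\}^* : \text{the $k$th symbol from the end of $w$ is a } 1 \,\}, \qquad S = \{0,1\}^k.$$

If $u \neq v \in S$ first differ at position $i$, then $w = 0^{i-1}$ separates them: the $k$th-from-last symbol of $uw$ is $u_i$, while that of $vw$ is $v_i$. So any DFA solving $L_k$ needs at least $|S| = 2^k$ states, by exactly the same carving argument as the infinite case.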

To summarize: if you want to show that any DFA solving a language needs at least $k$ states, carve out $k$ distinct states with a “fooling pigeon set” of size $k$. And if you want to show a language is not regular, do this for every $k \in \mathbb N$. Appealing to the Pigeonhole Principle is entirely unnecessary and muddies the waters.
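If you want to see the bookkeeping spelled out, here is a small Python sketch of my own (the DFA, the language, and the fooling set are hypothetical examples, not course material) checking both halves of the argument on a concrete DFA: every pair in the fooling set has a separating suffix, and the DFA is therefore forced to send each string in the set to a distinct state.

```python
from itertools import combinations

def run(delta, start, string):
    """Return the DFA state reached after reading `string` from `start`."""
    state = start
    for ch in string:
        state = delta[(state, ch)]
    return state

def in_L(w):
    """Membership test for the toy language L = { a^n : n % 3 == 0 }."""
    return len(w) % 3 == 0

# A 3-state DFA solving L by counting length mod 3.
delta = {(s, "a"): (s + 1) % 3 for s in range(3)}
start = 0

S = ["", "a", "aa"]          # a fooling set of size 3
suffixes = ["", "a", "aa"]   # candidate separating suffixes w

for x, y in combinations(S, 2):
    # Find a w such that exactly one of xw, yw is in L...
    w = next(w for w in suffixes if in_L(x + w) != in_L(y + w))
    # ...which forces x and y into distinct states of any DFA solving L.
    assert run(delta, start, x) != run(delta, start, y)
    print(f"w = {w!r} separates {x!r} and {y!r}")
```

The fooling set certifies that three states are necessary, and this DFA shows three are sufficient, so the state complexity of this toy language is exactly 3.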

CS251 presents the infinite case very strangely

This won’t make sense unless you’ve taken CS251.

Here is the CS251 way to show a language is not regular: for every $k \in \mathbb N$, give a recipe producing a “fooling pigeon set” of size $k + 1$, so that no DFA with $k$ states can solve $L$. Since this holds for every $k$, no DFA solves $L$ at all.

Here is the thing. Why not just construct an infinite such set, and directly argue that it carves out infinitely many distinct states? It is more complicated to produce a recipe for generating a $(k+1)$-size “fooling pigeon set”, and our “fooling pigeon sets” form a chain as $k$ increases anyway! Just take the union of the whole thing, for crying out loud, and use our infinite set!
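Concretely (back to my running $0^n 1^n$ example): the recipe would hand us the chain

$$S_k = \{\, 0^i : 0 \le i \le k \,\}, \qquad S_0 \subseteq S_1 \subseteq S_2 \subseteq \cdots,$$

and the union $\bigcup_k S_k = \{\, 0^i : i \ge 0 \,\}$ is exactly the infinite fooling set that carves out infinitely many states in one shot.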

Doing things properly, for real

It turns out this proof technique is an application of (one direction of) the Myhill-Nerode Theorem.1

Some setup for the theorem: Consider a language $L$. Say that a string $w$ separates $x$ and $y$ if exactly one of $xw$ and $yw$ is in $L$. Then for $x, y \in \Sigma^*$, define $x \sim y$ if and only if no string $w \in \Sigma^*$ separates $x$ and $y$.

Trivial exercise. Prove that $\sim$ is in fact an equivalence relation.

Theorem (Myhill-Nerode). The state complexity of $L$ is equal to the number of equivalence classes of $\Sigma^*$ under $\sim$.

Furthermore, if $L$ is regular, any accepter for $L$ with minimally many states is equivalent to the following DFA: each equivalence class is a state, and state transitions are of the form $\delta([x], \sigma) = [x\sigma]$.
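For a concrete (and standard) instance of the construction, again my illustration rather than the post’s: let $L$ be the set of strings over $\{0,1\}$ with an even number of 1s. There are exactly two equivalence classes, $[\epsilon]$ (even number of 1s) and $[1]$ (odd number of 1s), and the transitions are forced:

$$\delta([\epsilon], 0) = [\epsilon], \quad \delta([\epsilon], 1) = [1], \quad \delta([1], 0) = [1], \quad \delta([1], 1) = [11] = [\epsilon].$$

This is the familiar two-state parity DFA, so the state complexity of $L$ is 2.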

Implicit in this theorem statement is that concatenation is representation invariant2 under $\sim$. But this is obvious: if $x \sim y$, then $xz \sim yz$, since no string $w$ can separate $xz$ and $yz$; such a $w$ would mean that $zw$ separates $x$ and $y$.
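The one-line computation behind that sentence, spelled out for completeness:

$$(xz)w = x(zw) \in L \iff y(zw) = (yz)w \in L \quad \text{for all } w, \text{ since } x \sim y,$$

so indeed no $w$ separates $xz$ and $yz$.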

Proof. For any DFA solving $L$, each equivalence class carves out a distinct state. In other words, no two distinct equivalence classes $[x]$ and $[y]$ can go to the same state, because some $w$ separates $x$ and $y$. So the state complexity is at least the number of equivalence classes.

Furthermore, to show that the state complexity is exactly the number of equivalence classes, all we must show is that the equivalence-class DFA described in the theorem is in fact an accepter. But note that if $x \sim y$, then $x$ and $y$ are either both in $L$ or both not in $L$, since by definition the empty string does not separate $x$ and $y$.3 So we may consistently declare a state $[x]$ accepting exactly when $x \in L$.

Finally, to show that any DFA accepting $L$ with minimal state complexity is equivalent to the above DFA, create a new equivalence relation $\approx$, where $x \approx y$ if $x$ and $y$ are sent to the same state. Obviously $x \approx y \implies x \sim y$, so the equivalence classes under $\approx$ are subsets of the equivalence classes under $\sim$. Because the number of equivalence classes is identical (as the number of states is identical), they must coincide, making the two DFAs equivalent.

How do we use this fact? By explicitly identifying distinct equivalence classes: we exhibit a collection of representatives that can be (pairwise) separated. This is precisely what we are doing with our “fooling pigeon sets”.
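In the running example: the representatives $0^0, 0^1, 0^2, \ldots$ are pairwise separated (take $w = 1^i$), so they lie in pairwise distinct equivalence classes under $\sim$. Hence $L = \{\, 0^n 1^n : n \ge 0 \,\}$ has infinitely many equivalence classes, and by Myhill-Nerode its state complexity is $\infty$: it is not regular.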


  1. One of my main gripes with CS251 is that we frequently see the shadows cast by the basic theorems of the objects we study, but we never actually study the real theorems. (Rice’s Theorem is a prime example of this, by the way.)

  2. See also my algebra primer.

  3. Note this argument does not even typecheck (there is no DFA) when the number of equivalence classes is infinite. But there is nothing to show in this case, because if the state complexity is at least $\infty$, then it is exactly $\infty$.