"Fooling Pigeon Set" is a horrible name

This post has been significantly updated February 19, 2025 following discussions with students.

I am a CS251 TA as of this writing, but this is not the official stance of the course staff. It merely represents my personal opinion.

Setting the background

Suppose I have an alphabet $\Sigma$ and a language $L \subset \Sigma^*$ that is not regular, i.e. cannot be solved via a DFA.

Here is how we prove $L$ is not regular:

Fix $k \in \mathbb N$ and consider any DFA $D$ with $k$ states. We exhibit a set $S$ of $k + 1$ strings such that every pair $x \neq y \in S$ can be separated, meaning there is some $w$ for which exactly one of $xw$ and $yw$ is in $L$. By the Pigeonhole Principle, some pair $x \neq y \in S$ is sent to the same state of $D$. But then $xw$ and $yw$ are sent to the same state as well, so $D$ accepts both or rejects both, and $D$ fails to solve $L$. Since $k$ was arbitrary, no DFA solves $L$.

Mathematical hygiene: avoiding unnecessary proofs by contradiction

Maybe you’ve heard that unnecessarily using a proof by contradiction is bad. There are quite a few philosophical reasons to avoid proof by contradiction unless necessary. But the big one is practical: avoiding it makes your proofs clearer.

Picture this. I want to prove a claim X. Suppose I write the following.

For the sake of contradiction, assume that X is false.

Now we perform some manipulations and conclude that X is in fact true.

Contradiction.

This is really silly. We have assumed that X is false, then proven that X is true, and then derived a contradiction. (Sometimes it is not so immediately obvious when a proof by contradiction is unnecessary. The contradiction hiding in these so-called “fooling pigeon set” arguments is harder to spot, but it is just as unnecessary.)

Why on earth did we assume X was false in the first place? We could have just written

We perform some manipulations and conclude X is true.

Let me be clear that the point is not to avoid proof by contradiction as a proof technique. Rather, it is to remove any unnecessary proofs by contradiction to streamline the presentation of your proof. Removing unnecessary components of a proof makes it clearer, and thus we should encourage that.

Obscuring the main point

Here I have presented the argument in a more hygienic manner than CS251 did, for the sake of clearer exposition. (It is still deliberately bad. At the end I will explain how to think about these concepts properly.) Here is what actually happens:

Assume for the sake of contradiction that $L$ is regular, so that some DFA $D$ with $k$ states solves it. Exhibit a “fooling pigeon set” $S$ of $k + 1$ strings. By the Pigeonhole Principle, some pair $x \neq y \in S$ is sent to the same state of $D$. But some $w$ separates $x$ and $y$, so $xw$ and $yw$ are sent to the same state even though exactly one of them is in $L$. Contradiction.

Here is the problem. It complicates the argument! Since we have no control over which pair $x$ and $y$ goes to the same state, we have to show that every pair $x \neq y$ is sent to different states anyway. In other words, the Pigeonhole argument really buys us nothing.

The entire point of a “fooling pigeon set” $S$ is that every pair of strings $x \neq y \in S$ goes to different states, because we may find a $w$ such that exactly one of $xw$ and $yw$ is in $L$. This has nothing to do with the Pigeonhole Principle! The point is that each of these strings carves out a different state in any DFA solving the problem, and thus $S$ carves out $|S|$ states.
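To see this in action, here is the classic example (my choice of illustration, not one drawn from the course): over $\Sigma = \{0, 1\}$, take

$$L = \{\, 0^n 1^n : n \ge 0 \,\}, \qquad S = \{\, 0^i : i \ge 0 \,\}.$$

For any $i \neq j$, the string $w = 1^i$ separates $0^i$ and $0^j$, since $0^i 1^i \in L$ but $0^j 1^i \notin L$. So any DFA solving $L$ must send each $0^i$ to a distinct state.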

I strongly believe this is the correct presentation, because it naturally leads into the “State Complexity” of a language (the minimum number of states of any DFA solving $L$, which equals the supremum of the cardinalities of its “fooling pigeon sets”). In particular, we no longer treat the finite and infinite cases as distinct. The proofs that a language is not solvable with fewer than $k$ states, for $k \in \mathbb N$ and for $k = \infty$, now follow the same structure.
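As a finite-case illustration (a standard example, though not one from the post): for a fixed $k$, let

$$L_k = \{\, w \in \{0,1\}^* : \text{the $k$th symbol from the end of $w$ is a } 1 \,\}, \qquad S = \{0,1\}^k.$$

If $u \neq v \in S$ first differ at position $i$, then $w = 0^{i-1}$ separates them: the $k$th-from-last symbol of $uw$ is $u_i$, while that of $vw$ is $v_i$. So any DFA solving $L_k$ needs at least $|S| = 2^k$ states, by exactly the same carving argument as the infinite case.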

To summarize: if you want to show that any DFA solving a language needs at least $k$ states, carve out $k$ distinct states with a “fooling pigeon set” of size $k$. And if you want to show a language is not regular, do this for every $k \in \mathbb N$. Appealing to the Pigeonhole Principle is entirely unnecessary and muddies the waters.
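If you want to see the bookkeeping spelled out, here is a small Python sketch of my own (the DFA, the language, and the fooling set are hypothetical examples, not course material) checking both halves of the argument on a concrete DFA: every pair in the fooling set has a separating suffix, and the DFA is therefore forced to send each string in the set to a distinct state.

```python
from itertools import combinations

def run(delta, start, string):
    """Return the DFA state reached after reading `string` from `start`."""
    state = start
    for ch in string:
        state = delta[(state, ch)]
    return state

def in_L(w):
    """Membership test for the toy language L = { a^n : n % 3 == 0 }."""
    return len(w) % 3 == 0

# A 3-state DFA solving L by counting length mod 3.
delta = {(s, "a"): (s + 1) % 3 for s in range(3)}
start = 0

S = ["", "a", "aa"]          # a fooling set of size 3
suffixes = ["", "a", "aa"]   # candidate separating suffixes w

for x, y in combinations(S, 2):
    # Find a w such that exactly one of xw, yw is in L...
    w = next(w for w in suffixes if in_L(x + w) != in_L(y + w))
    # ...which forces x and y into distinct states of any DFA solving L.
    assert run(delta, start, x) != run(delta, start, y)
    print(f"w = {w!r} separates {x!r} and {y!r}")
```

The fooling set certifies that three states are necessary, and this DFA shows three are sufficient, so the state complexity of this toy language is exactly 3.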

CS251 presents the infinite case very strangely

This won’t make sense unless you’ve taken CS251.

Here is the CS251 way to show a language is not regular: for every $k \in \mathbb N$, give a recipe producing a “fooling pigeon set” of size $k + 1$, so that no DFA with $k$ states can solve $L$. Since this holds for every $k$, no DFA solves $L$ at all.

Here is the thing. Why not just construct an infinite such set, and directly argue that it carves out infinitely many distinct states? It is more complicated to produce a recipe for generating a $(k+1)$-size “fooling pigeon set”, and our “fooling pigeon sets” form a chain as $k$ increases anyway! Just take the union of the whole thing, for crying out loud, and use our infinite set!
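Concretely (back to my running $0^n 1^n$ example): the recipe would hand us the chain

$$S_k = \{\, 0^i : 0 \le i \le k \,\}, \qquad S_0 \subseteq S_1 \subseteq S_2 \subseteq \cdots,$$

and the union $\bigcup_k S_k = \{\, 0^i : i \ge 0 \,\}$ is exactly the infinite fooling set that carves out infinitely many states in one shot.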

Doing things properly, for real

It turns out this proof technique is an application of (one direction of) the Myhill-Nerode Theorem.1

Some setup for the theorem: Consider a language $L$. Say that a string $w$ separates $x$ and $y$ if exactly one of $xw$ and $yw$ is in $L$. Then for $x, y \in \Sigma^*$, define $x \sim y$ if and only if no string $w \in \Sigma^*$ separates $x$ and $y$.

Trivial exercise. Prove that $\sim$ is in fact an equivalence relation.

Theorem (Myhill-Nerode). The state complexity of $L$ is equal to the number of equivalence classes of $\Sigma^*$ under $\sim$.

Furthermore, if $L$ is regular, any accepter for $L$ with minimally many states is equivalent to the following DFA: each equivalence class is a state, and state transitions are of the form $\delta([x], \sigma) = [x\sigma]$.
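For a concrete (and standard) instance of the construction, again my illustration rather than the post’s: let $L$ be the set of strings over $\{0,1\}$ with an even number of 1s. There are exactly two equivalence classes, $[\epsilon]$ (even number of 1s) and $[1]$ (odd number of 1s), and the transitions are forced:

$$\delta([\epsilon], 0) = [\epsilon], \quad \delta([\epsilon], 1) = [1], \quad \delta([1], 0) = [1], \quad \delta([1], 1) = [11] = [\epsilon].$$

This is the familiar two-state parity DFA, so the state complexity of $L$ is 2.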

Implicit in this theorem statement is that concatenation is representation invariant2 under $\sim$. But this is obvious: if $x \sim y$, then $xz \sim yz$, since no string $w$ can separate $xz$ and $yz$; such a $w$ would mean that $zw$ separates $x$ and $y$.
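The one-line computation behind that sentence, spelled out for completeness:

$$(xz)w = x(zw) \in L \iff y(zw) = (yz)w \in L \quad \text{for all } w, \text{ since } x \sim y,$$

so indeed no $w$ separates $xz$ and $yz$.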

Proof. For any DFA solving $L$, each equivalence class carves out a distinct state. In other words, no two distinct equivalence classes $[x]$ and $[y]$ can go to the same state, because some $w$ separates $x$ and $y$. So the state complexity is at least the number of equivalence classes.

Furthermore, to show that the state complexity is exactly the number of equivalence classes, all we must show is that the equivalence-class DFA described in the theorem is in fact an accepter. But note that if $x \sim y$, then $x$ and $y$ are either both in $L$ or both not in $L$, since by definition the empty string does not separate $x$ and $y$.3 So we may consistently declare a state $[x]$ accepting exactly when $x \in L$.

Finally, to show that any DFA accepting $L$ with minimal state complexity is equivalent to the above DFA, create a new equivalence relation $\approx$, where $x \approx y$ if $x$ and $y$ are sent to the same state. Obviously $x \approx y \implies x \sim y$, so the equivalence classes under $\approx$ are subsets of the equivalence classes under $\sim$. Because the number of equivalence classes is identical (as the number of states is identical), they must coincide, making the two DFAs equivalent.

How do we use this fact? By explicitly identifying distinct equivalence classes: we exhibit a collection of representatives that can be (pairwise) separated. This is precisely what we are doing with our “fooling pigeon sets”.
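In the running example: the representatives $0^0, 0^1, 0^2, \ldots$ are pairwise separated (take $w = 1^i$), so they lie in pairwise distinct equivalence classes under $\sim$. Hence $L = \{\, 0^n 1^n : n \ge 0 \,\}$ has infinitely many equivalence classes, and by Myhill-Nerode its state complexity is $\infty$: it is not regular.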


  1. One of my main gripes with CS251 is that we frequently see the shadows cast by the basic theorems of the objects we study, but we never actually study the real theorems. (Rice’s Theorem is a prime example of this, by the way.)

  2. See also my algebra primer.

  3. Note this argument does not even typecheck (there is no DFA) when the number of equivalence classes is infinite. But there is nothing to show in this case, because if the state complexity is at least $\infty$, then it is exactly $\infty$.