"Fooling Pigeon Set" is a horrible name
This post has been significantly updated February 19, 2025 following discussions with students.
I am a CS251 TA as of writing this, but this is not the official stance of the course staff. It merely represents my personal opinion.
Setting the background
Suppose I have an alphabet $\Sigma$ and a language $L \subseteq \Sigma^*$ that is not regular, i.e. cannot be solved via a DFA.
Here is how we prove $L$ is not regular:
- Suppose for the sake of contradiction that $L$ can be solved by a DFA with $k$ states.
- Now we will construct a set $S \subseteq \Sigma^*$ of size $k+1$ such that for any $x \neq y \in S$, we have that $x$ and $y$ are sent to different states by the DFA.
- To show this, for every pair $x \neq y \in S$, we will construct a word $z$ such that exactly one of $xz$ and $yz$ is inside $L$.
- This shows that $x$ and $y$ go to different states. Otherwise, $xz$ and $yz$ would go to the same state, and their accepting/rejecting behavior under the DFA would be the same, contradicting that the DFA accepts $L$.
- Because $S$ contains $k+1$ strings, any DFA solving $L$ must contain at least $k+1$ states. Contradiction.
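To make the recipe concrete, here is the classic example: take $L = \{0^n 1^n : n \ge 0\}$ and $S = \{0^0, 0^1, \dots, 0^k\}$, a set of size $k+1$. For any pair $0^i \neq 0^j$ with $i < j$, the word $z = 1^i$ works: $0^i 1^i \in L$ but $0^j 1^i \notin L$. So the $k+1$ strings of $S$ are sent to pairwise distinct states, and no DFA with $k$ states can solve $L$.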
Mathematical hygiene: avoiding unnecessary proofs by contradiction
Maybe you’ve heard that unnecessarily using a proof by contradiction is bad. There are quite a few philosophical reasons to avoid proof by contradiction unless necessary. But the big one is practical: avoiding it makes your proofs clearer.
Picture this. I want to prove a claim X. Suppose I write the following.
> For the sake of contradiction, assume that X is false.
> Now we perform some manipulations and conclude that X is in fact true.
> Contradiction.
This is really silly. We have assumed that X is false, then proven that X is true, and then derived a contradiction. (Sometimes it is not so immediately obvious when a proof by contradiction is unnecessary. Using contradiction on these so-called “fooling pigeon sets” is not as evidently bad, but it is still quite obviously bad.)
Why on earth did we assume X was false in the first place? We could have just written
> We perform some manipulations and conclude X is true.
Let me be clear that the point is not to avoid proof by contradiction as a proof technique. Rather, it is to remove any unnecessary proofs by contradiction to streamline the presentation of your proof. Removing unnecessary components of your proof makes it clearer, and thus we should be encouraging that.
Obscuring the main point
Here I have presented the argument in a more hygienic manner than CS251 did for the sake of clearer exposition. (It is still deliberately bad. At the end I will explain how to properly think about these concepts.) Here is what actually happens:
- Instead of showing for every pair $x \neq y \in S$ that $x$ and $y$ are sent to different states, we use the Pigeonhole Principle to find some $x \neq y \in S$ that go to the same state.
- Then we show that, in fact, $x$ and $y$ go to different states using the method described earlier.
- Contradiction.
Here is the problem. It complicates the argument! Since we have no control over which pair $x \neq y \in S$ goes to the same state, we have to show that every pair is sent to different states anyway. In other words, the Pigeonhole argument really buys us nothing.
The entire point of a "fooling pigeon set" $S$ is that every pair of strings $x \neq y \in S$ goes to a different state, because we may find a $z$ such that exactly one of $xz$ and $yz$ is in $L$. This has nothing to do with the Pigeonhole Principle! The point is that each of these strings carves out a different state in any DFA solving the problem, and thus $S$ carves out $|S|$ distinct states.
This perspective handles both of the things we might want to prove:
- Suppose that $L$ can be solved, but every DFA solving it has at least $k$ states. To prove this, all we need to do is produce a DFA solving $L$ in $k$ states and then create a "fooling pigeon set" with $k$ strings.
- Suppose that $L$ cannot be solved by a DFA. To prove this, all we need to do is produce a "fooling pigeon set" with infinitely many strings.
I strongly believe this is the correct presentation, because it naturally leads into the "State Complexity" of a language (the minimum number of states required for a DFA solving $L$, which is equal to the supremum of the cardinalities of "fooling pigeon sets" for $L$). In particular, we no longer treat the finite and infinite cases as distinct. The proofs that a language is not solvable with fewer than $k$ states, for $k \in \mathbb{N}$ and for $k = \infty$, now follow the same structure.
To summarize: if you want to show that you need at least $k$ states to solve a language with a DFA, carve out $k$ distinct states with a "fooling pigeon set" of size $k$. And if you want to show a language is not regular at all, do this for every $k$. Appealing to the Pigeonhole Principle is entirely unnecessary and muddies the waters.
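As a sanity check, here is a small Python sketch of the finite case (the language, the DFA, and all names here are my own choices for illustration). It exhibits a three-state DFA for $L = \{ a^n : n \equiv 0 \pmod{3} \}$ along with a "fooling pigeon set" of size $3$, and verifies both that every pair in the set is separated by some extension $z$ and that the pairs indeed land in distinct states.

```python
# Illustrative sketch: L = { a^n : n = 0 (mod 3) } has state complexity 3.
# The DFA below solves L with 3 states; the fooling set shows 3 are needed.

DELTA = {0: 1, 1: 2, 2: 0}  # on reading 'a', state i (= |w| mod 3) advances
ACCEPT = {0}                # accept when the length is divisible by 3

def run(w: str) -> int:
    """Return the state the DFA ends in after reading w."""
    state = 0
    for _ in w:
        state = DELTA[state]
    return state

def in_L(w: str) -> bool:
    return run(w) in ACCEPT

S = ["", "a", "aa"]  # the "fooling pigeon set": a^0, a^1, a^2

for i in range(3):
    for j in range(i + 1, 3):
        x, y = S[i], S[j]
        z = "a" * ((3 - i) % 3)  # pads x to a multiple of 3, but not y
        assert in_L(x + z) and not in_L(y + z)  # z separates the pair
        assert run(x) != run(y)  # so they occupy distinct states here
```

The point of the sketch is the order of quantifiers: the separations hold against any DFA solving $L$, which is exactly why the three strings pin down three distinct states.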
CS251 presents the infinite case very strangely
This won’t make sense unless you’ve taken CS251.
Here is the CS251 way to show a language is not regular:
- Suppose for the sake of contradiction it is regular, and a DFA with $k$ states solves it.
- Construct a "fooling pigeon set" with size $k+1$.
Here is the thing. Why not just construct an infinite such set, and directly argue that it carves out infinitely many distinct states? It is more complicated to produce a recipe for generating a size-$(k+1)$ "fooling pigeon set" for every $k$, and our "fooling pigeon sets" form a chain as $k$ increases anyway! Just take the union of the whole thing, for crying out loud, and use our infinite set!
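Concretely, for $L = \{0^n 1^n : n \ge 0\}$ from earlier: the size-$(k+1)$ sets $\{0^0, \dots, 0^k\}$ form exactly such a chain, and their union is the single infinite set $S = \{0^i : i \ge 0\}$. The same distinguisher $z = 1^i$ works for every pair, so $S$ directly carves out infinitely many states, with no parameter $k$ in sight.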
Doing things properly, for real
It turns out this proof technique is an application of (one direction of) the Myhill-Nerode Theorem.1
Some setup for the theorem: Consider a language $L \subseteq \Sigma^*$. Then
- We say that strings $x$ and $y$ are distinguished by a string $z$ if exactly one of $xz$ and $yz$ is in $L$.
- We define an equivalence relation $\sim$ on $\Sigma^*$ such that $x \sim y$ precisely when there is no $z$ distinguishing $x$ and $y$; that is, $x$ and $y$ cannot be distinguished by any string.
- The state complexity of a language is defined to be the minimal number of states in a DFA solving that language. If the language is not regular, we say that the state complexity is $\infty$.
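For example, with the earlier language $L = \{0^n 1^n : n \ge 0\}$: the string $z = 1$ distinguishes $0$ and $00$, since $01 \in L$ but $001 \notin L$, and hence $0 \not\sim 00$. On the other hand, $10 \sim 11$, since no extension of either string ever lands in $L$.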
Trivial exercise. Prove that $\sim$ is in fact an equivalence relation.
Theorem (Myhill-Nerode). The state complexity of $L$ is equal to the number of equivalence classes in $\Sigma^*$ under $\sim$.
Furthermore, if $L$ is regular, any accepter for $L$ with minimally many states is equivalent to the following DFA: each equivalence class $[x]$ is a state, and state transitions are of the form $[x] \xrightarrow{a} [xa]$ for each $a \in \Sigma$.
Implicit in this theorem statement is that concatenation is representation invariant2 under $\sim$. But this is obvious: if $x \sim y$, then $xa \sim ya$, since no string $z$ can separate $xa$ and $ya$: such a $z$ would mean that $az$ separates $x$ and $y$.
Proof. For any DFA solving $L$, each equivalence class carves out a distinct state. In other words, no two distinct classes $[x]$ and $[y]$ can go to the same state, because some $z$ separates $x$ and $y$. So the state complexity is at least the number of equivalence classes.
Furthermore, to show that the state complexity is exactly the number of equivalence classes, all we must show is that the equivalence-class DFA described in the theorem is in fact an accepter for $L$. But note that if $x \sim y$, then $x$ and $y$ are either both in $L$ or both not in $L$, since by definition the empty string does not separate $x$ and $y$.3 So we may mark a class as accepting precisely when its members lie in $L$, and the resulting DFA accepts exactly $L$.
Finally, to show any DFA accepting $L$ with minimal state complexity is equivalent to the above DFA, create a new equivalence relation $\approx$, where $x \approx y$ if the states that $x$ and $y$ reach are the same. Obviously $x \approx y$ implies $x \sim y$, so the equivalence classes under $\approx$ are subsets of the equivalence classes under $\sim$. Because the numbers of equivalence classes are identical (as the numbers of states are identical), the two relations must coincide.
How do we use this fact? By explicitly identifying distinct equivalence classes, i.e. by showing that some collection of representatives can be (pairwise) separated. This is precisely what we are doing with our "fooling pigeon sets".
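If you want to see the equivalence classes concretely, here is a rough Python sketch (entirely illustrative; the example language and the length bound are my own arbitrary choices) that buckets strings by their behavior under all extensions up to a bounded length. Truncating the extensions can only merge classes that some longer $z$ would separate, so the count it reports is a lower bound on the true number of classes.

```python
from itertools import product

SIGMA = "ab"
MAX_LEN = 6  # bound on both candidate strings and extensions z

def in_L(w: str) -> bool:
    return w.endswith("ab")  # example language: strings ending in "ab"

def words(max_len: int):
    """All strings over SIGMA of length at most max_len."""
    for n in range(max_len + 1):
        for tup in product(SIGMA, repeat=n):
            yield "".join(tup)

def signature(x: str) -> tuple:
    """The accept/reject behavior of x under every bounded extension z."""
    return tuple(in_L(x + z) for z in words(MAX_LEN))

reps = {}
for x in words(MAX_LEN):
    reps.setdefault(signature(x), x)  # keep one representative per bucket

print(sorted(reps.values(), key=len))  # prints ['', 'a', 'ab']: 3 classes
```

For this language the three representatives match the three states of the minimal DFA, exactly as the theorem predicts; the "fooling pigeon set" method is just doing this bucketing by hand.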
One of my main gripes with CS251 is that we frequently see the shadows cast by the basic theorems of the objects we study, but we never actually study the real theorems. (Rice’s Theorem is a prime example of this, by the way.)↩︎
See also my algebra primer.↩︎
Note this argument does not even typecheck (there is no DFA) when the number of equivalence classes is infinite. But there is nothing to show in this case, because if the state complexity is at least $\infty$, then it is exactly $\infty$.↩︎