Algebraic Data Types

Today, I’m not going to talk about unions or Cartesian products or even set theory. Instead, I want to explain intuitively what algebraic data types are, and relatedly, why you should consider AND as multiplication (*) and consider OR as addition (+).

Prerequisites: have some knowledge of at least one programming language. Not mandatory but helpful, know what a struct is. Also know that && and || are used for the AND and OR operators.

Data Types

Each data type in a struct is logically joined together by the AND operator. For instance, let us say we have items in a store represented by the Item type. Each item has a name and a price. If you have a struct (code not written in any specific language)

struct Item {
    int price;
    string name;
}

then every Item must have a price AND name. Whereas if you’re trying to identify an item in a store, you can either use the name or the barcode:

union Identifier {
    int code;
    string name;
}

Unlike a struct like Item, with both a price and name, a union like Identifier must contain only one of an int code OR a string name.

So we’ve established that there are two data types: an AND data type and an OR data type. In functional programming, the AND data type is called a “product type” and the OR data type is called a “sum type”. Why?

To rigorously and mathematically define our product and sum types, where each variable of a type could have infinite possibilities, we’d need to turn to set theory. Since we’re trying not to do that, we’ll instead build intuition by only considering the case where each variable has finite possibilities.

As an example, take our Item struct. Say that each item in the store is sold for one of 10 possible prices and takes one of 10 possible names. Then we have 10×10=10010\times 10 = 100 possible Items with distinct data. This is why the AND data type is the product type.

For the OR data type, take our Identifier union. Say that each item in the store has one of 10 possible barcodes and takes one of 10 possible names. Then, in the case we describe an item by its barcode we have 1010 possibilities, and in the case we describe an item by its name we have 1010 more possibilities, giving us a total of 10+10=2010+10=20. This is why the OR data type is the sum type.

Aside: why are unions called unions?

I promised that I wouldn’t talk about set theory, and I lied. (If you don’t understand set theory feel free to skip this part!)

In set theory, if we have a set AA of prices and a set BB of names, then the set of items described by a price and a name would be the Cartesian product C=A×BC = A\times B. Basically, each element of CC would be an ordered pair of a price and a name.

Now, if we want a set CC of all the elements that are either in set AA or set BB… well, that would be a union. If AA is the set of barcodes and BB is the set of names, then C=ABC = A\cup B is the set of possible identifiers for an item.

Aside 2: Unions are actually called tagged unions

Actually, in some lower level languages like C and C++ (and for some reason, in some higher level languages like Java too), there is no native support for checking that exactly one field of a union is filled out. Possibly multiple fields could have been assigned values, and there’s no way to check, especially since when initializing the union, all fields could be null. In this case, the programmer must indicate somehow which data type the union actually is. This is done through a tag, and this is why sum types are strictly called “tagged unions”.

Using our Identifier example, to actually create a tagged union, we’d need to write a struct like this:

enum Tag {
    CODE, NAME
};

union Identifier {
    int code;
    string name;
};

struct TaggedIdentifier {
    enum Tag tag;
    union Identifier identifier;  
};

(This code is written in C. An enum in C is a datatype that must take the value of one of several constants. These tags are said constants in the case of a tagged union.)

Booleans

Consider the boolean function x && y. For x && y to be true, we need both x and y to be true. Similarly, if we want x && y && z to be true, then we need x, y, and z to all be true.

This motivates the following observation: if we do the hack where we assign true as 1 and false as 0, then x && y = x * y, and x && y && z = x * y * z, etc. So the AND operator is very clearly equivalent to multiplication.

The OR operator is not as direct an analogue to addition in this case (the better analogue is exclusive or, aka XOR). In this case, x || y = x + y > 0.