Algebraic Data Types
Today, I’m not going to talk about unions or Cartesian products or
even set theory. Instead, I want to explain intuitively what algebraic
data types are, and relatedly, why you should consider AND as
multiplication (*
) and consider OR as addition
(+
).
Prerequisites: have some knowledge of at least one programming
language. Not mandatory but helpful, know what a struct is. Also know
that &&
and ||
are used for the AND
and OR operators.
Data Types
Each data type in a struct is logically joined together by the AND
operator. For instance, let us say we have items in a store represented
by the Item
type. Each item has a name and a price. If you
have a struct (code not written in any specific language)
struct Item {
int price;
string name;
}
then every Item
must have a price
AND
name
. Whereas if you’re trying to identify an item in a
store, you can either use the name or the barcode:
union Identifier {
int code;
string name;
}
Unlike a struct like Item
, with both a price
and name, a union like Identifier must contain only one of an
int code
OR a string name
.
So we’ve established that there are two data types: an AND data type and an OR data type. In functional programming, the AND data type is called a “product type” and the OR data type is called a “sum type”. Why?
To rigorously and mathematically define our product and sum types, where each variable of a type could have infinite possibilities, we’d need to turn to set theory. Since we’re trying not to do that, we’ll instead build intuition by only considering the case where each variable has finite possibilities.
As an example, take our Item
struct. Say that each item
in the store is sold for one of 10 possible prices and takes one of 10
possible names. Then we have
possible Items with distinct data. This is why the AND data type is the
product type.
For the OR data type, take our Identifier
union. Say
that each item in the store has one of 10 possible barcodes and takes
one of 10 possible names. Then, in the case we describe an item by its
barcode we have
possibilities, and in the case we describe an item by its name we have
more possibilities, giving us a total of
.
This is why the OR data type is the sum type.
Aside: why are unions called unions?
I promised that I wouldn’t talk about set theory, and I lied. (If you don’t understand set theory feel free to skip this part!)
In set theory, if we have a set of prices and a set of names, then the set of items described by a price and a name would be the Cartesian product . Basically, each element of would be an ordered pair of a price and a name.
Now, if we want a set of all the elements that are either in set or set … well, that would be a union. If is the set of barcodes and is the set of names, then is the set of possible identifiers for an item.
Aside 2: Unions are actually called tagged unions
Actually, in some lower level languages like C and C++ (and for some reason, in some higher level languages like Java too), there is no native support for checking that exactly one field of a union is filled out. Possibly multiple fields could have been assigned values, and there’s no way to check, especially since when initializing the union, all fields could be null. In this case, the programmer must indicate somehow which data type the union actually is. This is done through a tag, and this is why sum types are strictly called “tagged unions”.
Using our Identifier
example, to actually create a
tagged union, we’d need to write a struct like this:
enum Tag {
, NAME
CODE};
union Identifier {
int code;
;
string name};
struct TaggedIdentifier {
enum Tag tag;
union Identifier identifier;
};
(This code is written in C. An enum
in C is a datatype
that must take the value of one of several constants. These tags are
said constants in the case of a tagged union.)
Booleans
Consider the boolean function x && y
. For
x && y
to be true, we need both x
and
y
to be true. Similarly, if we want
x && y && z
to be true, then we need
x
, y
, and z
to all be true.
This motivates the following observation: if we do the hack where we
assign true as 1
and false as 0
, then
x && y = x * y
, and
x && y && z = x * y * z
, etc. So the AND
operator is very clearly equivalent to multiplication.
The OR operator is not as direct an analogue to addition in this case
(the better analogue is exclusive or, aka XOR). In this case,
x || y = x + y > 0
.