# Algebraic Data Types

Today, I’m not going to talk about unions or Cartesian products or
even set theory. Instead, I want to explain intuitively what algebraic
data types are, and relatedly, why you should consider AND as
multiplication (`*`

) and consider OR as addition
(`+`

).

Prerequisites: have some knowledge of at least one programming
language. Not mandatory but helpful, know what a struct is. Also know
that `&&`

and `||`

are used for the AND
and OR operators.

## Data Types

Each data type in a struct is logically joined together by the AND
operator. For instance, let us say we have items in a store represented
by the `Item`

type. Each item has a name and a price. If you
have a struct (code not written in any specific language)

```
struct Item {
int price;
string name;
}
```

then every `Item`

must have a `price`

AND
`name`

. Whereas if you’re trying to identify an item in a
store, you can either use the name or the barcode:

```
union Identifier {
int code;
string name;
}
```

Unlike a struct like `Item`

, with *both* a price
and name, a union like Identifier must contain only one of an
`int code`

OR a `string name`

.

So we’ve established that there are two data types: an AND data type and an OR data type. In functional programming, the AND data type is called a “product type” and the OR data type is called a “sum type”. Why?

To rigorously and mathematically define our product and sum types, where each variable of a type could have infinite possibilities, we’d need to turn to set theory. Since we’re trying not to do that, we’ll instead build intuition by only considering the case where each variable has finite possibilities.

As an example, take our `Item`

struct. Say that each item
in the store is sold for one of 10 possible prices and takes one of 10
possible names. Then we have
$10\times 10 = 100$
possible Items with distinct data. This is why the AND data type is the
product type.

For the OR data type, take our `Identifier`

union. Say
that each item in the store has one of 10 possible barcodes and takes
one of 10 possible names. Then, in the case we describe an item by its
barcode we have
$10$
possibilities, and in the case we describe an item by its name we have
$10$
more possibilities, giving us a total of
$10+10=20$.
This is why the OR data type is the sum type.

### Aside: why are unions called unions?

I promised that I wouldn’t talk about set theory, and I lied. (If you don’t understand set theory feel free to skip this part!)

In set theory, if we have a set $A$ of prices and a set $B$ of names, then the set of items described by a price and a name would be the Cartesian product $C = A\times B$. Basically, each element of $C$ would be an ordered pair of a price and a name.

Now, if we want a set $C$ of all the elements that are either in set $A$ or set $B$… well, that would be a union. If $A$ is the set of barcodes and $B$ is the set of names, then $C = A\cup B$ is the set of possible identifiers for an item.

### Aside 2: Unions are actually called tagged unions

Actually, in some lower level languages like C and C++ (and for some reason, in some higher level languages like Java too), there is no native support for checking that exactly one field of a union is filled out. Possibly multiple fields could have been assigned values, and there’s no way to check, especially since when initializing the union, all fields could be null. In this case, the programmer must indicate somehow which data type the union actually is. This is done through a tag, and this is why sum types are strictly called “tagged unions”.

Using our `Identifier`

example, to actually create a
tagged union, we’d need to write a struct like this:

```
enum Tag {
, NAME
CODE};
union Identifier {
int code;
;
string name};
struct TaggedIdentifier {
enum Tag tag;
union Identifier identifier;
};
```

(This code is written in C. An `enum`

in C is a datatype
that must take the value of one of several constants. These tags are
said constants in the case of a tagged union.)

## Booleans

Consider the boolean function `x && y`

. For
`x && y`

to be true, we need both `x`

and
`y`

to be true. Similarly, if we want
`x && y && z`

to be true, then we need
`x`

, `y`

, and `z`

to all be true.

This motivates the following observation: if we do the hack where we
assign true as `1`

and false as `0`

, then
`x && y = x * y`

, and
`x && y && z = x * y * z`

, etc. So the AND
operator is very clearly equivalent to multiplication.

The OR operator is not as direct an analogue to addition in this case
(the better analogue is exclusive or, aka XOR). In this case,
`x || y = x + y > 0`

.