
What are algebraic datatypes?

Bartosz Milewski, physicist gone functional programmer and mathematician

Answered 224w ago · Author has 243 answers and 1m answer views
Any programming language that supports types must provide means of creating new
user-defined types. Amazingly, there are only two basic ways of combining types.
1. You can take two existing types and create a new one that contains both. A pair of int
and bool is such a combination. It contains both, an int and a bool, at the same time. It's an
AND of both types. If you look at types as sets of values, a pair describes a Cartesian
product of two sets. That's why these are called product types. A product of more than two
types forms a tuple. If you give names to elements of a tuple you create a record, or a
struct. These are all product types.

2. You can take two types and create a new one that contains either one of them. It's an OR
of both types. These combinations are called variants or unions. A union of int and bool
contains either an int or a bool, but not both. Again, if you look at types as sets, you call this
thing a disjoint sum. That's why these types are called sum types.
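
The two combinations above can be sketched in Haskell. This is only an illustration: the names IntAndBool, Flagged, IntOrBool and describe are made up for this example, while tuples and Either come from the standard library.

```haskell
-- Product type: a value holds an Int AND a Bool at the same time.
type IntAndBool = (Int, Bool)

-- The same product with named fields (a record).
-- The type and field names are illustrative assumptions.
data Flagged = Flagged { count :: Int, enabled :: Bool }
  deriving (Show, Eq)

-- Sum type: a value holds an Int OR a Bool, never both.
-- Either is the standard library's generic two-way sum.
type IntOrBool = Either Int Bool

examplePair :: IntAndBool
examplePair = (3, True)

exampleRecord :: Flagged
exampleRecord = Flagged { count = 3, enabled = True }

exampleLeft, exampleRight :: IntOrBool
exampleLeft  = Left 3
exampleRight = Right True

-- Consuming a sum means handling each alternative separately.
describe :: IntOrBool -> String
describe (Left n)  = "int " ++ show n
describe (Right b) = "bool " ++ show b
```

A product gives access to every component at once (fst, snd, or the field names), whereas a sum has to be inspected by cases, as describe does.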

So we have products and sums -- multiplication and addition -- let's write them as * and +.
That sounds like the beginnings of an algebra. We also have zero and one. Zero corresponds to
a void type: a type that never has any values. One corresponds to a boring singleton type -- an
enumeration with only one member. But you immediately get more interesting types by
summing singletons. A sum of two singletons is a bool. A bool is a two in our algebra:

2 = 1 + 1

It can either be true or false: an enumeration of two values. All enumerations are sums of
singletons. An 8-bit char is an enumeration of 256 singletons (ASCII covers the first 128). Integer types are very
large enumerations.
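
As a rough Haskell sketch of zero, one and two: Void (from Data.Void) and the unit type () are standard, while the aliases Zero, One, Two and the Color enumeration are names assumed here for illustration.

```haskell
import Data.Void (Void)

-- Zero: Void has no constructors, so no ordinary value of this type exists.
type Zero = Void

-- One: the unit type () has exactly one value, also written ().
type One = ()

-- Two: a sum of two singletons, which carries the same information as Bool.
type Two = Either () ()

toBool :: Two -> Bool
toBool (Left ())  = False
toBool (Right ()) = True

fromBool :: Bool -> Two
fromBool False = Left ()
fromBool True  = Right ()

-- A larger enumeration is a sum of more singletons.
data Color = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)

-- Counting the values of a finite enumeration: 1 + 1 + 1.
cardinality :: Int
cardinality = length [minBound .. maxBound :: Color]
```

The pair toBool and fromBool converts back and forth without losing anything, which is the sense in which Either () () "is" a bool.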

Once you have the basic algebra of types, you can start writing equations and solving them.
Infinite data types, like lists or trees, are solutions of algebraic equations. For instance, a
list of 'a's is a solution to the equation:

x = 1 + a * x

You read this as: A list x is a data type that is a sum of a singleton (corresponding to an
empty list) and a product of 'a' with a list x. The latter corresponds to a pair of 'a' and a list:
a head of the list and the tail of a list. The head of the list is of type 'a'.
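
The recursive description (a singleton for the empty list, plus a product of an element with another list) can be written almost literally as a Haskell declaration. The names List, Nil, Cons, toBuiltin and headL are chosen for this sketch; Haskell's built-in list type has the same shape.

```haskell
-- A list of a's: either empty (the singleton, 1)
-- or an element paired with another list (a * x).
data List a = Nil | Cons a (List a)
  deriving (Show, Eq)

-- Convert to Haskell's built-in list.
toBuiltin :: List a -> [a]
toBuiltin Nil        = []
toBuiltin (Cons h t) = h : toBuiltin t

-- The head exists only for a non-empty list, so the result is a Maybe.
headL :: List a -> Maybe a
headL Nil        = Nothing
headL (Cons h _) = Just h

example :: List Int
example = Cons 1 (Cons 2 (Cons 3 Nil))
```

Here Nil plays the role of 1, and Cons plays the role of the product a * x, with the type being defined appearing on its own right-hand side.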

You can iterate this equation by substituting it into itself. The first iteration is:

x = 1 + a * (1 + a * x) = 1 + a + a * a * x

If you keep iterating, you get the infinite series:

x = 1 + a + a * a + a * a * a + ...
This can be interpreted as: A list is either empty (the singleton 1) or an element of type 'a', or a
pair of elements of type 'a', or a triple of elements, and so on... That pretty much
enumerates all possible lists of 'a'.
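
One way to check the series numerically is to enumerate lists of each length over a small type. The sketch below assumes 'a' is Bool (two values); listsOfLength, listsUpTo and countsForBool are names invented for the example, and replicateM comes from Control.Monad.

```haskell
import Control.Monad (replicateM)

-- All lists of exactly n elements drawn from the given values:
-- the term a * a * ... * a of the series, with n factors.
listsOfLength :: Int -> [a] -> [[a]]
listsOfLength n values = replicateM n values

-- All lists of length at most n: the partial sum of the series
-- up to the term with n factors.
listsUpTo :: Int -> [a] -> [[a]]
listsUpTo n values = concatMap (\k -> listsOfLength k values) [0 .. n]

-- With a = Bool (two values), the counts for lengths 0..3 are 1, 2, 4, 8.
countsForBool :: [Int]
countsForBool = map (\k -> length (listsOfLength k [False, True])) [0 .. 3]
```

Each term of the series counts the lists of one particular length: one empty list, two lists of length one, four of length two, and so on.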

BTW, this analogy with algebra goes on. Function types, for instance, correspond to


Dharmendra Singh, Developer
Answered 224w ago · Author has 227 answers and 1m answer views
"Algebraic" refers to the property that an Algebraic Data Type is created by "algebraic"
operations. The "algebra" here is "sums" and "products":
"sum" is alternation (A | B, meaning A or B but not both)
"product" is combination (A B, meaning A and B together)


data Pair = P Int Double is a pair of numbers, an Int and a Double together. The tag P is used
(in constructors and pattern matching) to combine the contained values into a single
structure that can be assigned to a variable.

data Pair = I Int | D Double is just one number, either an Int or else a Double. In this case,
the tags I and D are used (in constructors and pattern matching) to distinguish between the
two alternatives.
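
A small sketch of how the tags are used in pattern matching. Because two types in one module cannot both be called Pair, the sum is renamed Number here; that name, along with sumPair and toDouble, is an assumption made for this example.

```haskell
-- Product: the tag P bundles an Int and a Double together.
data Pair = P Int Double
  deriving (Show, Eq)

-- Sum: the tags I and D distinguish the two alternatives.
-- Called Number here (hypothetical name) so both types fit in one module.
data Number = I Int | D Double
  deriving (Show, Eq)

-- Pattern matching on P takes the pair apart into its two fields.
sumPair :: Pair -> Double
sumPair (P n x) = fromIntegral n + x

-- Pattern matching on I and D selects whichever alternative is present.
toDouble :: Number -> Double
toDouble (I n) = fromIntegral n
toDouble (D x) = x
```

A Pair is built with P 2 0.5, while a Number is built with either I 3 or D 1.5, never both at once.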

Sums and products can be repeatedly combined into arbitrarily large structures.
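
For instance, a type can be a sum whose alternatives are themselves products, and it can refer to itself. The Shape and Tree types below are illustrative assumptions, not taken from the answer.

```haskell
-- A sum of products: each alternative carries its own fields.
data Shape
  = Circle Double            -- radius
  | Rectangle Double Double  -- width and height
  deriving (Show, Eq)

-- A recursive sum of products: a binary tree with values at the nodes.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq)

area :: Shape -> Double
area (Circle r)      = pi * r * r
area (Rectangle w h) = w * h

-- Number of values stored in the tree.
size :: Tree a -> Int
size Leaf         = 0
size (Node l _ r) = size l + 1 + size r
```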

Algebraic Data Type is not to be confused with *Abstract* Data Type, which (ironically) is
its opposite, in some sense. The initialism "ADT" usually means *Abstract* Data Type, but
GADT usually means Generalized *Algebraic* Data Type.