presentation of data than you are familiar with, so let me be clear about what I mean by *data*. Data is the building block of meaning and execution. Its complete description subsumes meaning through *type* and execution by the methods of producing and consuming data. To be clear, *data*, in the context of this article, is not binary bits of any form. Those are a *serialization* and will not be considered further. It's my hope that if you understand data then you will have a powerful tool for reasoning about the meaning and behavior of programs. It's also my hope that you will feel at least some small part of the reason why I refer to the subject of this article as "the types of data" instead of "one kind of data" or "data in Haskell". Something large lurks below the surface here. --- The types of data are $$1$$, $$0$$, $$\_ + \_$$, $$\_ \times \_$$, and $$\_ \rightarrow \_$$ where underscores denote "holes" so that $$1$$ has no holes and $$\_ \times \_$$ has two. They are also called "unit", "void", "choice", "pair", and "function space" respectively. In the domain of logic they are called "triviality", "impossibility", "or", "and", and "implication" respectively. In the domain of high-school algebra they are called "1", "0", "addition", "multiplication", and "exponentiation". | Type | Haskell | Logic | High-school Algebra | |- | $$1$$ | `Unit` | trivial | 1 | | $$0$$ | `Void` | impossible | 0 | | $$\_ + \_$$ | `(+)` | or | addition | | $$\_ \times \_$$ | `(*)` | and | multiplication | | $$\_ \rightarrow \_$$ | `(->)` | implication | exponentiation | Data is constructed from *connectives* which may or may not be dependent on other pieces of data. If a connective does not depend upon any other data then it is a constant. Let's talk about the constants first then the connectives. ### Unit, the useless type The type $$1$$ is called "unit". In default Haskell it's denoted by `()`. Unit is special because we can *construct* it without any restriction whatsoever. We can demonstrate this using Haskell. For instance, first we *define* a type named `Unit` (don't worry if you don't understand this syntax) ~~~ data Unit = T deriving Show ~~~ {: .language-haskell} This syntax indicates that we have introduced a new type named `Unit` which can be constucted in exactly one way, by invoking `T`. This implies we can create `Unit` whenever we want by using `T`, which is just what we wanted. The `deriving Show` bit is unimportant, merely needed so that we can more conveniently interact with `Unit` using the interpreter. ~~~ >>> T T it :: Unit ~~~ For those unfamiliar with Haskell's interactive prompt, GHCi, this indicates that when I typed `T` to construct a value (line 1), I get the value which can be shown as `T` back (line 2, this is where the `deriving Show` is used), and then that the most recent value executed in GHCi, automatically named `it`, has type `Unit` (line 3, the syntax `x :: t` is read "`x` has type `t`"). Now that we've talked about the way we can *consider* the `Unit` type and the way that we can *create* `Unit`-typed values we must examine how we *use* `Unit`-typed values. In particular, assume that I have given you a value of type `Unit`: what can you now do? ~~~ val :: Unit ~~~ Since you can construct new values of `Unit` whenever we like without restriction, me providing you gives you *nothing new*. So, we can conclude that there is *no* way to use a value of type `Unit`. It is inert. To summarize: the type $$1$$, called `Unit`, is a type constant which can be constructed trivially and in exactly one way, called `T`. It is useless. As a side note, notice that `Unit` has exactly one inhabitant, `T` [^totality]. This is why it's called "1" and why it behaves like the number 1. [^totality]: In Haskell, honestly, this is untrue for technical reasons. This is a general theme, unfortunately, in that the types of data as I describe them ar e only "mostly" properly implemented in Haskell. Thus, representing them in Hask ell is a matter of convenience alone. Their meaning and structure as described i s the heart of the matter. ### Void, the unobtainable The type $$0$$ is called "void". It does not exist in default Haskell but is available in the package [`void`](http://hackage.haskell.org/package/void) and there called `Void`. Void is special because we *cannot construct it* for any reason whatsoever. Anyone who claims to have a value of type $$0$$ is a liar. Let's define `Void` in Haskell to demonstrate. ~~~ data Void ~~~ {: .language-haskell} Unlike the definition of `Unit`, there are no constructors for `Void`. Thus, we cannot construct it. It's not even possible to demonstrate this fact in the interpreter. The very notion is senseless. Up until this point `Void` might appear to be very useless, but, interestingly `Void` is actually very useful![^unit-is-useless] Again, assume that I have given you a value of type `Void`: what can you now do? [^unit-is-useless]: Remember that it's `Unit` that is useless. We can even think about this exactly: `Unit` has no uses. On the other hand, `Void` has, in some punning sense, *all* uses. So `Unit` is useless and `Void` is merely unobtainabl e. ~~~ val :: Void ~~~ Oddly, as I said just before, anyone who claims to have a value of type `Void` is a liar. So I was a liar, and now so are you. Thus, if I provide you a type `Void` then you can feel free to lie back to me and claim that you have now *anything at all*. What I'd like you to see is that `Void` is a *bomb*. Another way of saying this is that since `Void` can never be constructed then anything which happens "after" a value of `Void` has been produced is entirely hypothetical and thus no longer required to follow the laws of the universe. This is an odd concept, and we'll revisit it later. To foreshadow, however, if you are familiar with the notion of proof by contradiction then you are familiar with `Void`. So, again, to summarize: the type $$0$$, called `Void`, is a type constant which cannot be constructed. It is infinitely useful. As a side note, notice that `Void` has exactly zero inhabitants. This is why it's called "0" and why it behaves like the number 0. ### Aside: Whispers of duality After having now introduced $$1$$ and $$0$$ I can spend a moment talking about an important component of the nature of the types of data, namely: *duality*. Duality is a powerful concept throughout the laws of math and physics. It is an observation: oftentimes, concepts come in pairs which behave identically, but as *mirror images*. If you are familiar with the design and function of circuits then you are already familiar with [a number of dualities](http://en.wikipedia.org/wiki/Duality_(electrical_circuit s)). $$1$$ and $$0$$ are dual to one another, flipped over the axis of construction v. use. $$1$$ is trivially constructable but useless while $$0$$ is impossible to construct but infinitely useful. We will see more duality going forward. Duality is not universal---it may appear or it may not. That said, observations of duality tend to indicate the robustness and generalizability of a concept. ### Sums, choices of construction The type *connective* $$\_ + \_$$ is called "sum". It exists by default in Haskell [named `Either`](http://hackage.haskell.org/package/base-4.7.0.1/docs/Prelude.ht ml#t:Either) but we will reimplement it below. Sums are connectives, not constants, and thus have holes. In general, we must *fill* these holes with other types in order to have a complete type. It may be simpler to think of $$\_ + \_$$ as a *type schema* and $$A + B$$ as a type. Sometimes you could say *fully-saturated type* to indicate that all holes have been filled. In Haskell we indicate the use of holes via *type variables*. To demonstrate, here is a definition of $$\_ + \_$$ in Haskell ~~~ data a + b = Inl a | Inr b deriving Show ~~~ {: .language-haskell} This syntax indicates that the type operator (i.e. type connective) named `(+)` has two holes named `a` and `b`. It can be construct in one of two ways: either using the name `Inl` (for "inject left") or the name `Inr` (for "inject right"). In order to construct a value of `a + b` with `Inl` we must also have a value of type `a`. In order to construct a value of type `a + b` with `Inr` we must also have a value of type `b`. Again we use `deriving Show` for convenience. So long as it is constructable can demonstrate `a + b` interactively, but we must decide upon what `a` and `b` will be. For instance, we can construct values of type `Unit + Unit` in two ways ~~~ >>> Inl T :: Unit + Unit Inl T it :: Unit + Unit >>> Inr T :: Unit + Unit Inr T it :: Unit + Unit ~~~ We can even construct the left-side value of type `Unit + Void` using `Inl`[^sho wing-void] ~~~ >>> Inl T :: Unit + Void Inl T it :: Unit + Void ~~~ [^showing-void]: This code won't work by itself. The code which knows how to sho w `a + b` assumes we can show both `a` and `b`. So far we haven't explained how to show `Void` and we can't use `deriving Show`. Of course, "showing `Void`" doe sn't make sense because there's never anything to show, so we just have to tell the compiler this fact: `instance Show Void where show _ = error "impossible!"` but we cannot produce the right-side value using `Inr` since it would demand we produce a value of type `Void` which is impossible. Further, we cannot construct any values of type `Void + Void` as both the left and right constructors are blocked: `Void + Void` is just as unobtainable as `Void` is. Now that we've talked about the way we can consider the type `a + b` and the way we can construct it we must examine how we use values of type `a + b`. Assume I give you a value of type `a + b` and further assume that you know a way to *use* values of type `a`. Importantly, we have not yet assumed you know how to use values of type `b`. In this circumstance, you are stuck if it turns out that the value I gave you was of the form `Inr something`: since `something :: b` and you don't know how to use `b`, you don't know how to use `a + b`. So we can conclude that in order to use a value of type `a + b` we must know how to use *both* `a` and `b`. To summarize: the type $$a + b$$, called `a + b` or the sum of `a` and `b`, is a type built from the $$+$$ connective which can be constructed from either a value of type `a` or a value of type `b`. In order to use such a value you must have both a way to use `a` and a way to use `b` so that you can handle either case. As a side note, notice that the number of inhabitants of `a + b` is equal to the number of inhabitants of `a` plus the number of inhabitants of `b`. This is why it's called "sum" and how it behaves like algebraic addition. ### Products, collections of constructions The type connective $$\_ \times \_$$ is called "product" and has two holes. It exists by default in Haskell and is called [a pair or tuple](http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-Tupl e.html) but we will reimplement it below. ~~~ data a * b = Tuple a b deriving Show ~~~ {: .language-haskell} This syntax indicates that values of type `a * b` can be constructed in one way: by using the `Tuple` constructor on a value of type `a` and a value of type `b` together. Again, `deriving Show` is inessential but convenient. Some example interactive product constructions follow ~~~ >>> Tuple T T :: Unit * Unit Tuple T T it :: Unit * Unit >>> Tuple T (Inl T) :: Unit * (Unit + Void) Tuple T (Inl T) it :: Unit * (Unit + Void) ~~~ Notably, we cannot construct values of `a * Void`, `Void * b` or `Void * Void` for any types `a` or `b`. Now that we've talked about the way that we can consider the type `a * b` and the way we can construct it we must examine how we use these values. Assume I give you a value of type `a * b`. If you knew how to use values of type `a` then you could merely extract the `a` and use it. Likewise, if you knew how to use values of type `b` then you could extract it and use it. So we can conclude that in order to use a value of type `a * b` you merely have to have a use for either `a` or `b`.[^contraction] [^contraction]: A wary reader might note that you could also get by if you knew how to use *both* `a` and `b` together. This is certainly not eliminated as a po ssibility, but I'm wording it to be suggestive of duality. To summarize: the type $$a \times b$$, called `a * b` or the product of `a` and `b`, is a type built from the $$\times$$ connective and is constructed from a value of `a` and a value of `b` together. In order to use such a value you only need a way to either use `a` or use `b`. As a side note, notice that the number of inhabitants of `a * b` is equal to the number of inhabitants of `a` times the number of inhabitants of `b`. This is why it's called "product" and how it behaves like algebraic multiplication. ### Aside: Further duality It should be clear by this point that products and sums are dual along the construction v. use axis just like $$1$$ and $$0$$ were, but let's be explicit about it anyway. * Sums are constructed from values of *either* their left type or their right and used when there is a use for *both* the left and the right. * Products are constructed from values of *both* their left type and their right type together and used when there is a use for *either* values of their left type or their right type. This duality is interesting because it suggests that if we could somehow package up the notion of "use" into a type then we would have a property like "uses of products are sums of uses" and "uses of sums are products of uses" showing that the two types of data *fit* together very nicely. ### Aside: Multi-way products and sums It might feel restrictive that products and sums as introduced only allow two "arguments" each. Of course, we can extend these notions by noticing that, e.g. `a * (b * (c * d))` behaves like a four-way product and `a + (b + (c + d))` behaves like a four-way sum. There is some technical detail here about whether `a * (b * c)` is *the same as* `(a * b) * c` or merely "interchangeable with", but that's not worth examining. More interestingly, multi-way sums and products open space to ask the question of what a 0-way product or sum is. We can use the pattern to investigate these types. A 0-way product is a type which can be introduced using nothing at all and has no interesting uses (what is the 0-way notion of using "either `a` or `b`" for using `a * b`?). In other words it is `Unit`. A 0-way sum is a type which has no introduction form and can be used toward any end whatsoever. In other words it is `Void`. This generalization is not much more than a thought experiment---there's no reason to force it if it feels uncomfortable. Instead, it mere is suggestive of the importance of `Unit` and `Void`. It's also suggestive of the power of construction v. use duality. ### Exponentials, implications, hypotheticals, functions The type connective $$\_ \rightarrow \_$$ is called "function space". It exists by default in Haskell, denoted as `a -> b`, and we won't reimplement it here. We construct values of `a -> b` by showing that we can *use* `a` to *construct* `b`. Function spaces are thus *very important* as they link use to construction. We demonstrate this construction with *lambda forms* or anonymous functions with syntax like $$\lambda x \rightarrow E$$. In this case, $$x$$ names a value of type `a` which we are assuming we can construct. Then $$E$$ is a expression demonstrating our method of using $$x$$ to construct a value of type `b`. Thus, we can read $$\lambda x \rightarrow E$$ as "assuming a value of type `a` named $$x$$ we can construct a value of type `b` in the way described by $$E$$. An interactive example would be `Unit -> Unit`, i.e. constructing `Unit` from `Unit` ~~~ >>> :t (\x -> T) :: Unit -> Unit (\x -> T) :: Unit -> Unit :: Unit -> Unit ~~~ Notably, we have to use `:t` to ask for the type of `(\x -> T)` because it is not possible to instantiate `Show` for function spaces. The Haskell syntax above uses `\` to stand in for $$\lambda$$ since they sort of look similar. We also must use the parentheses in order to have `(::)` denote the whole lambda expression. Another interactive example of more interest might be `a * (b * c) -> (a * b) * c` ~~~ >>> :t (\(Tuple a (Tuple b c)) -> Tuple (Tuple a b) c) :: a * (b * c) -> (a * b) * c (\(Tuple a (Tuple b c)) -> Tuple (Tuple a b) c) :: a * (b * c) -> (a * b) * c :: (a * (b * c)) -> (a * b) * c ~~~ where we've replaced the standard `x` name with a *pattern match* indicating the usage of our product. Interpreting pattern matches as *use* of a value is an important concept in Haskell, but is merely a detail of our representation of types of data in Haskell. Any way of using tuples as described previously would suffice. Now that we've talked about the way that we can consider the type `a -> b` and the way that we can construct it we must examine how we use these values. Assume I gave you a value of type `a -> b`. If you *also* have a value of type `a` then you can fufill the hypothetical: since `a -> b` is a mechanism for using `a` to produce `b`, you can execute that specific use of `a` and retreive a `b`. Function spaces are crystalized *use*. So to summarize: the type $$a \rightarrow b$$, called the function space from `a` to `b`, is a type built from the $$\rightarrow$$ connective. It's constructed via lambda forms which represent hypothetical arguments for how to construct values of `b` using values of `a`. In order to use such a value, we also require an `a` and then we execute the argument the `a -> b` type value represents in order to produce a `b`. As a side note, notice that the number of inhabitants of `a -> b` is equal to the number of inhabitants of `b` *raised to the power of* the number of inhabitants of `a`. This is probably the must unexpected counting argument thus encountered, but we can produce examples easily. For instance, let's call the type $$1 + 1$$ by the name $$2$$ and $$1 + (1 + 1)$$ by the name $$3$$. Now, $$3 \rightarrow 2$$ is the type of ways of converting values of $$3$$ to values of $$2$$. In order to use $$3$$ we must decide, independently, what to do with its three values: `Inl T`, `Inr (Inl T)` and `Inr (Inr T)`. In order to construct a value of $$2$$ we must decide whether to pick `Inl T` or `Inr T`. This means all together we have to make a binary choice for three options in each value of $$3 \rightarrow 2$$ which leads to $$2^3 = 8$$ options. This is why $$a \rightarrow b$$ is sometimes called an "exponential" and is sometimes written $$b^a$$. ### Aside: Formalizing *use* and *construction* Now that we have function spaces we can bring to fruition an idea from the previous aside. If we decide that our goal is to produce a value of some type `r` (for "result") then a function `a -> r` is a "use" of `a`. So, now, we can state that "a use of `a * b` is `(a -> r) + (b -> r)` and a use of `a + b` is `(a -> r) * (b -> r)`.[^adjunction] [^adjunction]: Here we can be more clear and note that I only claimed that `(a - > r) + (b -> r)` is *a* use of `a * b`. We can generate *all* uses of `a * b` by noting the equivalence of `a * b -> r`, `a -> (b -> r)`, and `b -> (a -> r)`. T his is called the curry [adjunction](http://en.wikipedia.org/wiki/Adjoint_functo rs) between products and exponentials. We could also do the opposite and fix a type `s` (for "source") which we think of as the heart of all values. Now we can say that a function `s -> a` is a "construction" of `a`. Now we can say that ~~~ (s -> (a * b)) * ((a -> r) + (b -> r)) -> (s -> r) ~~~ and ~~~ (s -> (a + b)) * ((a -> r) * (b -> r)) -> (s -> r) ~~~ to completely formalize how products and sums eliminate one another. This is a bit less pretty to write down, but emphasizes the duality between construction and use. ### Aside: Explosions, negations, and *ex falso quodlibet* Now that we have exponentials we can talk more concretely about what the meaning of `Void` is. As stated previously, if we have a value of type void then we can produce anything at all. If we represent this mode of use in terms of a function space then it means that for any type `a` we might desire, we can automatically construct[^constructing quodlibet] the function `Void -> a`. This family of functions, one for every type `a`, is sometimes known as the *principle of explosion* or by the Latin phrase "ex falso quodlibet", "from falsehood follows anything". [^constructing quodlibet]: I say "construct" because I mean that the nature of t he types of data implies that *ex falso quodlibet* must be constructable. That s aid, actually constructing it in a language which implements these laws is a bit more tricky. Very formal languages like Agda and Idris provide special syntax t o assert impossibility. Coq merely represents the law itself in its proof tactic s system. In something like Haskell we have to resort to trickery, however, eith er using a "justifiable `error`" or an infinite loop ([check the source here](ht tp://hackage.haskell.org/package/void-0.6.1/docs/Data-Void.html#v:absurd)). Since it's impossible to produce a value of type `Void`, we know that for any type `a` at least one of the following is impossible to produce as well, values of the type `a` or values of the type `a -> Void`. If it were not the case, if we could construct both some `a` and the function `a -> Void` then we could combine them to produce a value of `Void`. So this must not be possible. This allows us to extend the "impossibility of `Void`" to any type we please. Construction of a function `a -> Void` implies that `a` is not constructable. This is effectively logical negation. Finally, given the notion of negation we might try to formalize proof by contradiction. This would read something like "to prove `a` we assume not `a` and derive impossibility" or, formally ((a -> Void) -> Void) -> a If you try to construct a lambda term to represent this, you will fail. This drives a division between the notion of "proof" and the notion of "construction". Our data types talk about how to construct things, but sometimes we admit proofs without construction. One branch of logic, intuitionistic logic, tries to cope without the ability to prove without constrution. Thus, in intuitionistic logic the structure of data is better preserved... but one must also do without proof by contradiction which can be painful. ## Conclusion We've now covered the types of data, $$1$$, $$0$$, $$\_ + \_$$, $$\_ \times \_$$, and $$\_ \rightarrow \_$$. We've seen that we can describe each one by talking about the shape of its name (i.e. by describing how to saturate it), how to construct a value of one, and how to use a value once constructed. Through this analysis we've seen that we can uncover a very powerful notion of construction/use duality which is carried by the natural fit between sums and products and embodied in the left and right sides of our function spaces. While the types of data start very humbly, we can already construct now very interesting values. For instance, the type $$2 = 1 + 1$$ introduced above is actually the type of booleans: let `Inl T` be truth and `Inr T` be falsity. Unfortunately, while the basis laid so far is very expressive, it is not yet expressive enough to represent most types that are reasonably useful in a program. It does not yet even describe all of the types of Haskell. For this we must introduce one or two further connectives named $$\mu$$ and $$\nu$$. A further article may explain their meaning, construction, and use. But for now I believe this article has served its purpose. These are the types of data, more or less. They are simple yet brimming with meaning and a very powerful notion of *fit*. May they serve you well. ## Discussion There has been some great discussion of this post which lead to a number of corr ecting on Reddit. In particular, please take a look at [`/r/haskell`](http://www .reddit.com/r/haskell/comments/2bj7it/let_me_tell_you_about_the_types_of_data/), [`/r/programming`](http://www.reddit.com/r/programming/comments/2bk1cd/let_me_t ell_you_about_the_types_of_data/), and [`/r/compsci`](http://www.reddit.com/r/co mpsci/comments/2bkgic/let_me_tell_you_about_the_types_of_data_xpost/). ## Bibliography * Per Martin-Lf. [*On the meannings of the logical constants and the justifications of the logi cal laws*](http://www.ae-info.org/attach/User/Martin-L%C3%B6f_Per/OtherInformati on/article.pdf). (1996) * Chris Taylor's [*The Algebra of Algebraic Data Types*](http://chris-taylor.git hub.io/blog/2013/02/10/the-algebra-of-algebraic-data-types/) ([Part 1](http://ch ris-taylor.github.io/blog/2013/02/10/the-algebra-of-algebraic-data-types/), [Par t 2](http://chris-taylor.github.io/blog/2013/02/11/the-algebra-of-algebraic-data -types-part-ii/), and [Part 3](http://chris-taylor.github.io/blog/2013/02/13/the -algebra-of-algebraic-data-types-part-iii/)) ## Footnotes