
Math for non-math people

Intro to Category Theory


Chapter 2: Categories (Part 1 of 3)
SOURCES
Barr M, Wells C. Category theory for computing science.
Montréal: Publications CRM; 1999.
DISCLAIMERS
What we’re gonna be doing in this chapter is defining a lot of category
theory-specific language and categories. I’ll do my best to provide
illustrations of the latter.

Also, a note on composition: composition is essentially using the output
of one function as the input of another function, which makes it a very
important concept when we are talking about translating data types from one
structure to another.

We are now using the notation f ∘ g for compositions and using an actual
arrow “→” for functions. In the earlier videos the “∘” came out raised; I
thought that was an issue with the Consolas font.
Basic definitions
Before we define categories, we need a shitty definition we can
build upon.

- Let k > 0. In a graph G, a path from a node x to a node y of length
k is a sequence (f_1, f_2, ..., f_k) of (not necessarily distinct) arrows
for which

- 1) source(f_k) = x
- 2) target(f_i) = source(f_{i-1}) for i = 2, ..., k
- 3) target(f_1) = y
Paths
By convention, for each node x there is a unique path of length 0
from x to x that is denoted (). It is called the empty path at x.

Observe that if you draw a path as follows:

x --f_k--> . --f_{k-1}--> ... --f_2--> . --f_1--> y

with the arrows going from left to right, f_k will be on the left
and the subscripts will go down from left to right. We do it this
way for consistency with composition.

For any arrow f, (f) is a path of length one (because it’s just
one arrow, duh).
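The three path conditions can be checked mechanically. Here is a minimal Python sketch, assuming a graph is stored as a dict from arrow names to (source, target) pairs. The name `is_path` and the sample graph are made up for illustration and are not from the source.

```python
# Hypothetical encoding of a graph: arrow name -> (source, target).
ARROWS = {
    "f": ("a", "b"),
    "g": ("b", "c"),
    "h": ("c", "c"),
}

def is_path(arrows, path, x, y):
    """Return True if path = (f_1, ..., f_k) is a path from x to y.

    Uses the right-to-left convention of the text:
    source(f_k) = x, target(f_i) = source(f_{i-1}), target(f_1) = y.
    The empty tuple () is the empty path, which only runs from x to x.
    """
    if len(path) == 0:
        return x == y
    if arrows[path[-1]][0] != x:      # source(f_k) must be x
        return False
    if arrows[path[0]][1] != y:       # target(f_1) must be y
        return False
    # target(f_i) = source(f_{i-1}) for i = 2..k (0-based: i = 1..k-1)
    return all(arrows[path[i]][1] == arrows[path[i - 1]][0]
               for i in range(1, len(path)))
```

Note that `("g", "f")` is a path from a to c while `("f", "g")` is not: the arrow that is applied first is written last, just like in composition.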
Note on paths
We refer to a path directly by its parenthesized list of arrows.
In the diagram above, there is exactly one path of each natural
number length from n to n: the empty path (), then one path for each
number of loops ( (succ),(succ,succ),(succ,succ,succ) and so on ).
The set of paths
The set of paths of length k in a graph G is denoted G_k.

- In particular, G_2, which will be used in the
definition of a category, is the set of pairs of arrows
(g,f) for which the target of f is the source of g. These
are called composable pairs of arrows.
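To make G_2 concrete, here is a small Python sketch that lists the composable pairs of a graph. It assumes the same dict encoding (arrow name to (source, target)), and the sample graph and the name `composable_pairs` are illustrative only.

```python
# Hypothetical encoding of a graph: arrow name -> (source, target).
ARROWS = {"f": ("a", "b"), "g": ("b", "c"), "h": ("c", "c")}

def composable_pairs(arrows):
    """Return G_2: all pairs (g, f) with target(f) == source(g)."""
    return {(g, f)
            for g, (g_src, _) in arrows.items()
            for f, (_, f_tgt) in arrows.items()
            if f_tgt == g_src}
```

For this graph the result is the three pairs (g,f), (h,g) and (h,h). The pair (f,g) is missing because the target of g is c, which is not the source of f.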
aside
We have now assigned two meanings to each of G_0 and G_1. The things
they name are not the same, but their meanings are parallel and they
are in one-to-one correspondence. The collection of paths of length one
corresponds exactly to the collection of arrows of G, so using G_1 for
either is harmless.

Similarly, there is exactly one empty path for each node, so G_0 can
refer to both the nodes and the empty paths. In the same way, a pair of
arrows that can be composed is just an element of the set G_2
(illustrated next panel)
illustrated

Compare the use of ‘2’ to denote either the integer or the real number
(they are in parallel spaces and often mean the same thing). One might
want to keep the two meanings of G_1 separate for purposes of
implementing a graph as a data structure.
Category
A category is a graph C together with two functions c: C_2 → C_1 and u: C_0 → C_1.
The elements of C_0 are called objects and those of C_1 are called arrows.

- The function c is called composition (and if you look at it, it makes
sense: we are taking a pair of composable arrows in C_2 and reducing it to
one arrow in C_1). If (g,f) is a composable pair, c(g,f) is written g ∘ f
and is called the composite of g and f.

- If A is an object of C (and as such an element of C_0), u(A) is denoted
id_A, which is called the identity of the object A (rule C-3 below
essentially draws a loop on the object, assigning an arrow to it)
Categories have 4 properties
C-1: The source of g ∘ f is the source of f and the target of
g ∘ f is the target of g (defines composition)
C-2: (h ∘ g) ∘ f = h ∘ (g ∘ f) whenever either side is defined
(for the composite g ∘ f to be defined, we insist that the codomain
of f be the domain of g, and likewise for h ∘ g. Some branches of
mathematics only require the image of f to lie in the domain of g)
(defines associativity of composition)
C-3: The source and target of id_A are both A (defines objects)
C-4: If f:A→B, then f ∘ id_A = id_B ∘ f = f (defines arrows)
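For a finite category the four rules can be checked by brute force. The sketch below is one possible encoding, not a standard one: arrows are a dict of (source, target) pairs, u is the dict `IDENTITY`, and c is the dict `COMPOSE` keyed by composable pairs (g, f). The sample data is the category with two objects and a single non-identity arrow f: A → B, and all names are assumptions made for the example.

```python
from itertools import product

# Sample category: objects A and B, their identities, and one arrow f: A -> B.
OBJECTS = {"A", "B"}
ARROWS = {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")}
IDENTITY = {"A": "id_A", "B": "id_B"}        # the function u
COMPOSE = {                                   # the function c, keyed by (g, f)
    ("id_A", "id_A"): "id_A",
    ("id_B", "id_B"): "id_B",
    ("f", "id_A"): "f",
    ("id_B", "f"): "f",
}

def is_category(objects, arrows, identity, compose):
    src = lambda a: arrows[a][0]
    tgt = lambda a: arrows[a][1]
    # c must be defined exactly on the composable pairs (g, f).
    pairs = {(g, f) for g, f in product(arrows, repeat=2) if tgt(f) == src(g)}
    if set(compose) != pairs:
        return False
    # C-1: source and target of g . f
    if not all(src(compose[g, f]) == src(f) and tgt(compose[g, f]) == tgt(g)
               for g, f in pairs):
        return False
    # C-2: (h . g) . f == h . (g . f) for every composable triple
    if not all(compose[compose[h, g], f] == compose[h, compose[g, f]]
               for h, g, f in product(arrows, repeat=3)
               if (h, g) in pairs and (g, f) in pairs):
        return False
    # C-3: id_A has source and target A
    if not all(arrows[identity[x]] == (x, x) for x in objects):
        return False
    # C-4: f . id_A == id_B . f == f
    return all(compose[a, identity[src(a)]] == a and
               compose[identity[tgt(a)], a] == a for a in arrows)
```

Calling `is_category(OBJECTS, ARROWS, IDENTITY, COMPOSE)` returns True for the sample data. Deleting any entry of `COMPOSE` makes it return False, because composition would no longer be defined on every composable pair.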
In short
For most general purposes, a category is a graph (meaning
arrows and nodes) with a composition function,
associativity of composition, and an identity
function. We will often use these as criteria for
representing a mathematical structure as a category.

A category is a collection of arrows, the objects (each
object can be recovered from its identity arrow, the arrow
going from the object to itself, which acts as the “do-nothing
operation”), and the composites of these arrows.
on composition
The composite c being defined on C_2 implies that g ∘ f is
defined if and only if the source of g is the target of f.
This means that composition is a function whose domain is
an equationally defined subset of C_1 × C_1: the equation
requires that the source of g is the target of f. It
follows from this and C-1 that in C-2, one side of the
equation is defined if and only if the other side is
defined.
note
- In the category theory literature, id_A is often just
written as A (think about this; this is the point of rule
C-3: defining objects in terms of arrows so you can build
the whole category out of arrows)

- Inside the category the arrow is our atomic unit. The
arrow is to the category what the element is to the set.
Terminology
Morphism: arrow

Domain: source

Codomain: target

We will normally denote objects of categories by capital
letters but nodes of graphs (except when we think of a
category as a graph) by lower case letters. Arrows are
always lower case.
continued
-In computing science literature, the composite g ∘ f is
sometimes written f;g, a notation suggested by the perception
of a typed functional programming language as a category.

- We presented the concept of category as a two-sorted data
structure (the sorts are the objects and the arrows).
Categories are sometimes presented as one-sorted (arrows
only). The objects can be derived from C-3 and C-4
characterizing id_A, so that there is a one-to-one
correspondence between the objects and the identity arrows id_A.
Small and large categories
-A category is small if its objects and arrows form sets.
Otherwise, (if either the arrows or objects don’t form
sets) it is large (see: Russell’s paradox in the last video
on how a collection can be not a set).

- The category of sets and functions is an example of a
large category. In principle one must be wary of dealing
with large classes, but in practice it is often not a
problem.
Hom sets
- If A and B are two objects of a category C, then the set of all arrows of
C that have source A and target B is denoted Hom_C(A,B), or just Hom(A,B) if
the category is clear from context. This generalizes our definition of a
Hom set from last video.

- Thus, for each triple A,B,C of objects, composition induces a
function

Hom(B,C) × Hom(A,B) → Hom(A,C)

Again, a set of the form Hom(A,B) is called a Hom set. Other common
notations for Hom(A,B) are C(A,B) and C(AB).
continued
-the reference to the set of all arrows from A to B assumes
that the arrows do in fact form a set. A category with the
property that Hom(A,B) is a set for all objects A and B is
called locally small.
Path definitions
- For any path (f_1, f_2, ..., f_n) in a category C, f_1 ∘ f_2 ∘ … ∘
f_n is defined recursively by

f_1 ∘ f_2 ∘ … ∘ f_n = (f_1 ∘ f_2 ∘ … ∘ f_{n-1}) ∘ f_n, n > 2

- The general associative law: for any path (f_1, f_2, ..., f_n) in
a category C and any integer k with 1 < k < n,

(f_1 ∘ … ∘ f_k) ∘ (f_{k+1} ∘ … ∘ f_n) = f_1 ∘ f_2 ∘ … ∘ f_n
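The recursive definition is a left fold, so it can be sketched with `functools.reduce`. The example below takes ordinary Python functions on integers as the arrows (an assumption for illustration) and compares the two sides of the general associative law on a handful of sample inputs. That is a spot check on this one path, not a proof.

```python
from functools import reduce

def compose(g, f):
    """g . f: apply f first, then g."""
    return lambda x: g(f(x))

def compose_path(path):
    """f_1 . f_2 . ... . f_n, bracketed to the left as in the recursive definition."""
    return reduce(compose, path)

# Illustrative path of arrows (all int -> int), written (f_1, f_2, f_3, f_4).
path = (lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x)

def split_agrees(path, k, samples=range(-5, 6)):
    """Compare (f_1 . ... . f_k) . (f_{k+1} . ... . f_n) with f_1 . ... . f_n."""
    whole = compose_path(path)
    split = compose(compose_path(path[:k]), compose_path(path[k:]))
    return all(whole(x) == split(x) for x in samples)
```

Here `compose_path(path)(2)` applies f_4 first (2 squared is 4), then f_3 (1), then f_2 (2), then f_1, giving 3, and `split_agrees(path, k)` holds for every split point k.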


The smallest categories
- The smallest category has no objects and, by implication, no
arrows. The next smallest category has one object and one
arrow, which must be the identity arrow. This category may
be denoted 1 (we will leave to another video whether this
defines the natural number 1). Other categories that will be
occasionally referred to are the categories 1 + 1 and 2,
illustrated next panel (the loops are identities as we
previously defined). In both cases the choice of the
composites is forced:
1+1 and 2 illustrated
Wait, what we saw looked very important
A loop on the same node then an arrow to another node
counts as a composition.

Category theory has its own formulation of the natural
numbers outside of ZF set theory. What you saw was just it.
Notes before delving into various categories...
-meeting criteria C-1 to C-4 is what makes something a
category.

- The existence of an identity function or a “do-nothing
operation” is important for something to be classified as a
category because it shows that the domain has objects
The category of sets (denoted Set)
The category of sets is the category whose objects are sets and whose arrows are
functions, with composition of functions for c (C-1) and the identity function from S
to S for id_S (C-3).

The statement that this is a category amounts to the statements that composition of
functions is associative (C-2) and that the identity function id_S:S→S satisfies
f ∘ id_S = f and id_S ∘ g = g for all f with source S and all g with target S (C-4).

That composition of functions is associative follows by using the definition of
composition repeatedly:

((h ∘ g) ∘ f)(x)=(h ∘ g)(f(x))=h(g(f(x)))=h((g ∘ f)(x))=(h ∘ (g ∘ f))(x)


Note on Set
Remember our definition of a category earlier? Replace the
word arrow with function and you have the category of sets.

Set is pretty much the prime/canonical example of a
category. It is also a large category, as it cannot be
formalized in a set theory without some notion of a universe.
The category of finite sets (denoted Fin)
Fin is the category whose objects are finite sets and
arrows are all the functions between finite sets.
The category of sets and partial functions (denoted Pfn)

A partial function from a set S to a set T is a function with
domain S_0 and codomain T, where S_0 is some subset of S. The
category of sets and partial functions has all sets as objects and
all partial functions as arrows.

If f:S→T and g:T→V are partial functions with f defined on S_0 ⊆ S
and g defined on T_0 ⊆ T, the composite g∘f:S→V is the partial
function from S to V defined on the subset {x ∈ S_0 | f(x) ∈ T_0} of S
by the requirement (g∘f)(x) = g(f(x))

(composition so defined is also associative)
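A Python dict is a handy stand-in for a partial function on a finite set: its keys are the subset on which the function is defined. The sketch below is illustrative (the names and sample data are made up), and shows that the composite is defined exactly on {x ∈ S_0 | f(x) ∈ T_0}.

```python
def compose_partial(g, f):
    """Composite g . f of partial functions stored as dicts.

    The keys of a dict are the elements where the partial function is
    defined, so g . f is defined on {x in dom(f) | f(x) in dom(g)}.
    """
    return {x: g[fx] for x, fx in f.items() if fx in g}

# Illustrative partial functions on small sets.
f = {1: "a", 2: "b", 3: "c"}   # S = {1, 2, 3, 4}, defined on S_0 = {1, 2, 3}
g = {"a": 10, "c": 30}         # T = {"a", "b", "c"}, defined on T_0 = {"a", "c"}
h = {10: "ten"}                # defined on {10} only
```

Here `compose_partial(g, f)` is `{1: 10, 3: 30}`: the input 2 drops out because f(2) = "b" is not in T_0, and 4 drops out because f is not defined there. Both bracketings of h, g, f give `{1: "ten"}`, which matches the associativity remark.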


The category of sets and relations (denoted Rel)

If α is a relation from a set S to a set T and β is a
relation from set T to set U, then the composite β∘α is the
relation from S to U defined as follows: if x ∈ S and z ∈ U,
(x,z) ∈ β∘α if and only if there is an element y ∈ T for
which (x,y) ∈ α and (y,z) ∈ β [next panel]. With this
definition of composition, the category Rel of sets and
relations has sets as objects and relations as arrows. The
identity for a set S is the diagonal relation Δ_S = {(x,x) |
x ∈ S}.
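Composition of relations can be written almost word for word as a set comprehension. This is a small Python sketch with made-up sample sets, in which relations are sets of pairs and `diagonal` builds the identity arrow.

```python
def compose_rel(beta, alpha):
    """beta . alpha = {(x, z) | some y has (x, y) in alpha and (y, z) in beta}."""
    return {(x, z) for (x, y1) in alpha for (y2, z) in beta if y1 == y2}

def diagonal(s):
    """The identity arrow on the set s in Rel."""
    return {(x, x) for x in s}

# Illustrative sets and relations.
S, T, U = {1, 2}, {"a", "b"}, {"p", "q"}
alpha = {(1, "a"), (1, "b"), (2, "b")}   # relation from S to T
beta = {("a", "p"), ("b", "q")}          # relation from T to U
```

With this data `compose_rel(beta, alpha)` is `{(1, "p"), (1, "q"), (2, "q")}`, and composing alpha with `diagonal(S)` on the right or `diagonal(T)` on the left gives alpha back, as C-4 requires.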
Categories and computer science
Categories arise in computing science in an intrinsic way.
These include functional programming languages (to be
tackled shortly), deductive systems (in a later video), and
automata with typed states (also in a later video).
Functional programming languages
Functional programming is a programming paradigm where
programs are constructed by applying and composing functions.

Constructions in functional programming languages make a
functional programming language look very much like a
category. Deductive systems are also essentially categories.

What follows is how categories and functional programming
languages are similar, and the technical issues in making
the similarity precise.
FPL descriptions
A functional programming language can be described as one that
gives the user some primitive “types” (strings, integers, floats,
Booleans, etc.) and “operations” (functions that move between these
types) and some constructors from which one can produce more
complicated types and operations.

What a pure functional programming language does not have, however,
is variables or assignment statements. You write a program by
applying constructors to the types, constants, and functions, and
running a program consists of applying an operation to constants of
the input type to produce values.
FPL examples
Some say that having no assignment statements is all that is
needed for a language to be functional; in this sense, Miranda
and Haskell are functional.

More loosely, Python, Lisp, and Haskell are all used for
functional programming. You can program in a functional style in
any language that has first-class functions and closures, like
Smalltalk, Ruby, Python, C#, C++, PHP, and Perl, but it’s
not ideal.
FPL rules
Functional programming languages have:

FPL-1: Primitive data types, given in the language
FPL-2: Constants of each type
FPL-3: Operations, which are functions between the types
FPL-4: Constructors, which can be applied to data types
and operations to produce derived data types and operations
of the language

Do these definitions seem familiar?


continued
All of the operations and types you can derive from these
building blocks (which we call ‘primitive’, i.e. primitive
data types and primitive operations) make the language (and
hopefully everything you wanna express can be made from
those).

The word ‘primitive’ simply means given in the definition
of the language rather than constructed from further
blocks/constructors (some authors use the word
‘constructor’ for the primitive operations).
Now, here is where we make the link: if we play around with these rules and reorganize them,
with two assumptions and one minuscule change, we can make them match the rules that
define a category:

A-1: We assume there is a “do nothing” operation id_A for each type A (primitive and
constructed). When applied, it does nothing to the data[1]
A-2: We add to the language an additional type called 1, which has the property that
from every type A there is a unique operation to 1. We interpret each constant c of type A
as an arrow c:1→A (note that this 1 is a type, not the number 1 that occurs as a value in
integers or floats). The operation defines the constant, and as such the constant is no
longer separate data
A-3: We assume the language has a composition constructor: take an operation f that
takes something of type A as input and produces something of type B (float to
integer, for example), and another operation g that has input of type B and output of type C
(such as integer to string or integer to Boolean). Doing one after the other is a
derived operation (or program) typically denoted f;g, which has input of type A and output
of type C[2]
[1] See what we are doing here? We are treating the data type as the object and the data as elements of the data type, which makes sense: a string is an element of the type
“string”. And this “do nothing” operation is required for categories and is how categories derive objects/types from arrows
[2] So this serves as the equivalent of the composition function requirement for a category
relating these rules to composition
- As we just stated, functional programming languages generally
have do-nothing operations and composition constructors. A-1
and A-3 play the roles that C-3 and C-1 respectively play in
defining a category. The language resulting
from A-2 is operationally equivalent to the original language.

- Composition for categories must be associative in the sense
that if either of (f;g);h or f;(g;h) is defined, so is the
other and they are the same operation (C-2), and so this must
be true for a functional programming language.
more additions
We must also require, for f:A→B, that f;id_B and id_A;f are
defined and are the same operation as f. That is, we impose
the equations f;id_B = f and id_A;f = f on the language. Both
these requirements are reasonable in that in any
implementation, the two operations required to be the same
would surely do the same thing.
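The composition constructor and the do-nothing operation are easy to imitate in Python. In this sketch the names `then` and `identity` are made up, and the float → int → str pipeline is just an example of types A, B, C. Python cannot test two functions for equality, so the equations are only checked on sample inputs.

```python
def identity(x):
    """The do-nothing operation id_A (one Python function serves every type)."""
    return x

def then(f, g):
    """The composition constructor f;g: run f first, then g."""
    return lambda x: g(f(x))

# Illustrative operations: float -> int -> str.
to_int = int    # float -> int (truncates toward zero)
to_str = str    # int -> str

program = then(to_int, to_str)   # a derived operation float -> str
```

For example, `program(3.7)` is `"3"`, and both `then(to_int, identity)` and `then(identity, to_int)` send 3.7 to 3, just like `to_int` itself. As source programs they are still three different expressions.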

With these conditions, a functional programming language L
has a category structure C(L) for which:
Category structure of FPL
FPC-1: The types of L are the objects of C(L)
FPC-2: The operations (primitive and derived) of L are
the arrows of C(L)
FPC-3: The source and target of an arrow are the input
and output types of the corresponding operation
FPC-4: Composition is given by the composition
constructor, written in reverse order
FPC-5: The identity arrows are do nothing operations
note
Observe that C(L) is a model of the language but not the
language itself. For example, in the category f;id_B = f, but
in the language f and f;id_B are different source programs.
This is in contrast to the treatment of languages using
context-free grammars: a context-free grammar generates the
actual language.
FPL Example
Types: NAT, BOOLEAN, CHAR (natural numbers, true or false, and characters
respectively)

i) NAT should have a constant 0:1→NAT and an operation succ:NAT→NAT

ii) There should be two constants true,false:1→BOOLEAN and an operation ¬
subject to the equations ¬∘true=false and ¬∘false=true

iii) CHAR should have one constant c:1→CHAR for each desired character c

iv) There should be two type conversion operations ord:CHAR→NAT and
chr:NAT→CHAR. These are subject to the equation chr∘ord=id_CHAR (think of chr
as operating modulo the number of characters, so that it is defined on all
natural numbers)
continued
We can make programs with this. An example program is the arrow
‘next’, which we define as the composite chr∘succ∘ord:CHAR→CHAR.
This calculates the next character in order. The arrow ‘next’ is
an arrow in the category representing the language (it would
appear in the graph of the category as one of the composites of
the primitive arrows), and so is any other composite of a
sequence of operations.

The objects of C(L) are the types NAT, BOOLEAN, CHAR, and 1; the
constants are arrows out of 1, not objects (observe that typing is
a natural part of the syntax in this approach)
further continued
The arrows of C(L) consist of all programs, that is, all possible composites
of the operations, with two programs being identified if they must be the
same because of the equations. For example, the arrow

chr∘succ∘ord:CHAR→CHAR

just mentioned and the arrow

chr∘succ∘ord∘chr∘ord:CHAR→CHAR

must be the same, because chr∘ord=id_CHAR by (iv).

- This also means NAT has constants succ∘succ∘...∘succ∘0, where succ
occurs zero or more times.
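The example language can be imitated in a few lines of Python. This is only a sketch under stated assumptions: CHAR is taken to be the 26 lowercase letters, `ord_` and `chr_` are hand-written stand-ins (not Python's built-in `ord` and `chr`), and `compose` is a helper invented here for right-to-left composition.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"   # assumed character set, for illustration

def ord_(c):
    """ord: CHAR -> NAT, the position of c in the alphabet."""
    return ALPHABET.index(c)

def chr_(n):
    """chr: NAT -> CHAR, taken modulo the number of characters."""
    return ALPHABET[n % len(ALPHABET)]

def succ(n):
    """succ: NAT -> NAT."""
    return n + 1

def compose(*ops):
    """Right-to-left composite: compose(h, g, f)(x) == h(g(f(x)))."""
    def run(x):
        for op in reversed(ops):
            x = op(x)
        return x
    return run

next_char = compose(chr_, succ, ord_)            # chr . succ . ord
longer = compose(chr_, succ, ord_, chr_, ord_)   # chr . succ . ord . chr . ord
```

In this model `next_char("a")` is `"b"` and `next_char("z")` wraps around to `"a"`. The two programs `next_char` and `longer` are different source texts but give the same result on every character, which is what it means for them to be the same arrow in C(L).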
further further continued
Composition in the category is composition of programs. Composition is
only well defined if, whenever two composites of primitive operations are
equal, their composites with any other program are also equal. For example,
we must have

ord∘(chr∘succ∘ord)=ord∘(chr∘succ∘ord∘chr∘ord)

as arrows from CHAR to NAT.

This discussion is incomplete: at this point we have no way to introduce
n-ary operations for n>1, nor do we have a way of specifying the flow of
control. We will remedy this in future chapters.
Mathematical structures as categories
Certain common mathematical structures (from set theory)
can be perceived as special types of categories.
Preordered and ordered sets
If S is a set, a subset α ⊆ S×S is called a binary relation on S. It is often
convenient to write xαy as a shortcut for (x,y) ∈ α.

We say that α is reflexive if xαx for all x ∈ S and transitive if xαy and yαz imply
xαz for all x,y,z ∈ S.
A set S with a reflexive, transitive relation α on it is a structure (S,α)
called a preordered set. This structure determines a category C(S,α) defined as
follows.

CO-1: The objects of C(S,α) are the elements of S

CO-2: If x,y ∈ S and xαy, then C(S,α) has exactly one arrow from x to y, denoted
(y,x) (this choice of notation is odd but fits the right-to-left composition we use.
The domain of (y,x) is x and the codomain is y)

CO-3: If x is not related by α to y there is no arrow from x to y


continued
The identity arrows of C(S,α) are those of the form (x,x); they belong to
α because it is reflexive. The transitive property of α is needed to
ensure that the composite required of a category exists, so that
(z,y)∘(y,x)=(z,x)

- The category C(S,α) for S={C,D} and

α={<C,C>,<C,D>,<D,D>}

is the category 2.
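The construction of C(S,α) can be sketched in Python using the two-element example S={C,D}. The function names here are invented for illustration. Relations are sets of pairs (x, y) meaning xαy, and arrows are written (y, x) as in CO-2.

```python
def is_preorder(s, alpha):
    """alpha is a set of pairs (x, y) meaning x alpha y."""
    reflexive = all((x, x) in alpha for x in s)
    transitive = all((x, z) in alpha
                     for (x, y1) in alpha for (y2, z) in alpha if y1 == y2)
    return reflexive and transitive

def arrows(alpha):
    """One arrow (y, x) from x to y for each related pair x alpha y."""
    return {(y, x) for (x, y) in alpha}

def compose(g, f):
    """(z, y) . (y, x) = (z, x); defined only when target(f) == source(g)."""
    (z, y1), (y2, x) = g, f
    if y1 != y2:
        raise ValueError("not a composable pair")
    return (z, x)

S = {"C", "D"}
alpha = {("C", "C"), ("C", "D"), ("D", "D")}
```

For this α there are exactly three arrows: the identities (C,C) and (D,D) and the single arrow (D,C) from C to D. Composing (D,D) with (D,C) gives (D,C) again, and transitivity is what guarantees that every such composite is already an arrow.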
Ordered sets
A preordered set (S,α) for which α is antisymmetric (that
is, xαy and yαx imply x=y) is called an ordered set or
poset (for ‘partially ordered set’).

- The real numbers with the usual ordering is the poset
(R,≤)
- For any set S, the set of subsets of S with inclusion
as ordering is the poset (P(S),⊆)
Semigroup
A semigroup is a set S together with an associative binary
operation m:S×S→S (meaning that when the operation is applied to
two elements of S, the result is still in S). The set S is called
the underlying set of the semigroup.

Normally for s and t in S, m(s,t) is written ‘st’ and called
“multiplication” (it need not be arithmetic multiplication, though
it often is). It doesn’t have to be commutative; we may have st≠ts
(think of multiplying matrices). A commutative semigroup is a
semigroup whose multiplication is commutative.
notes
It is standard practice to name the semigroup by naming its underlying
set, e.g. “the semigroup S”. We will do this for other mathematical
structures like posets as well. This counts as “abuse of notation”,
and sometimes we will need to be more precise.

We can call a semigroup closed under its binary operation (like
multiplication or addition). There are sets that have this property
but also another one; for example, vector spaces are closed under
scalar multiplication and vector addition.

Our binary operation has to be associative. If it’s not, we call the
set a magma.
Powers
We set s^1 = s and, for any integer k > 1, s^k = s s^{k-1}. Such
powers of an element obey the laws s^k s^n = s^{k+n} and (s^k)^n = s^{kn}
(for positive k and n). On the other hand, the law (st)^k = s^k t^k
requires commutativity.
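A quick way to see why (st)^k = s^k t^k needs commutativity is to try it in a semigroup that is not commutative. The Python sketch below uses strings under concatenation for that, and positive integers under addition as the commutative contrast. Both examples and the helper `power` are illustrative assumptions.

```python
def power(s, k, mul):
    """s^k for a positive integer k: s^1 = s and s^k = s * s^(k-1)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    result = s
    for _ in range(k - 1):
        result = mul(s, result)
    return result

def concat(s, t):
    """Concatenation of strings: associative but not commutative."""
    return s + t

def add(s, t):
    """Addition of positive integers: associative and commutative."""
    return s + t
```

With s = "ab" and t = "c", (st)^2 is "abcabc" while s^2 t^2 is "ababcc", so the law fails. The laws s^k s^n = s^{k+n} and (s^k)^n = s^{kn} still hold for strings, and under addition (where "powers" are multiples) all three laws hold.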
The empty semigroup
We specifically allow the empty semigroup, which consists
of the empty set and the empty function from the empty set
to itself (the cartesian product of the empty set with
itself is the empty set). It will become evident later why
we need to do this, as this is not done in most non-
category theory literature.
Identity element
An identity element e for a semigroup S is an element of S
that satisfies the equation se=es=s for all s∈S. There can
be at most one identity element in a semigroup.
Monoids
A monoid is a semigroup with an identity element. It is commutative if its
binary operation is commutative.

- It follows from the definition that a monoid is not allowed to be
empty: it contains at least an identity element.

- We can extend the notation of powers to 0 by defining x^0 as the
identity element of the monoid. The laws s^k s^n = s^{k+n} and (s^k)^n = s^{kn}
then hold for all nonnegative k and n.

- One example of a semigroup is the set of positive integers with
addition as the operation; this is a semigroup but not a monoid. If you
include 0 you get a monoid.
Kleene closures
The Kleene closure A* of a set A is the set of strings (or
lists) of finite length of elements of A. We write the
lists in parentheses; for example (a,b,d,a) is an element
of {a,b,c,d}*.

- Some parts of computer science literature call these
strings instead of lists and write them this way: ‘abda’.

- A* includes the empty list () and, for each element a ∈
A, the list (a) of length one.
continued
- The operation of concatenation makes the Kleene closure a monoid F(A),
called the free monoid determined by A. The empty list is
the identity element. We write concatenation as juxtaposition:

(a,b,d,a)(c,a,b)=(a,b,d,a,c,a,b)

- The underlying set of the free monoid is A*, not A. A is usually
assumed finite, but the Kleene closure is defined for any set A. The
elements of A* are lists of finite length in any case. When A is nonempty,
A* is an infinite set (you can just keep making longer lists).
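Python tuples behave like the lists in A*, with `+` as concatenation and `()` as the empty list. The sketch below is illustrative: `lists_up_to` is a made-up helper that enumerates only the elements of A* up to a given length, since A* itself is infinite.

```python
from itertools import product

EMPTY = ()   # the empty list, identity element of F(A)

def concat(u, v):
    """Concatenation of lists, written as juxtaposition in the text."""
    return u + v

def lists_up_to(alphabet, max_len):
    """All elements of A* of length <= max_len."""
    return [w for n in range(max_len + 1)
            for w in product(sorted(alphabet), repeat=n)]
```

The example from the text works as expected: `concat(("a","b","d","a"), ("c","a","b"))` is `("a","b","d","a","c","a","b")`. For A = {a, b} there are 1 + 2 + 4 = 7 lists of length at most 2, and the count keeps growing with the length bound.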

The concept of freeness is a general concept applied to many kinds of
structures. It will be elaborated on in a future video.
Submonoid
A submonoid of a monoid M is a subset S of M with two properties:

SM-1 The identity element of M is in S

SM-2 If m,n ∈ S then mn ∈ S (which is to say, S is closed under
multiplication)

- The natural numbers with addition form a submonoid of the integers with
addition. For another example, take the integers with multiplication as the
operation, so that 1 is the identity element. Again the natural numbers form a
submonoid, and so does the set of positive natural numbers, since the product
of two positive numbers is another one.

- The singleton set {0} is a subset of the integers that is closed under
multiplication and is itself a monoid, but it is not a submonoid of the integers
under multiplication, because it does not contain the identity element 1.
Monoids as categories
Any monoid M determines a category C(M):

CM-1: C(M) has one object designated “*”, and this object can be chosen
arbitrarily. One uniform choice is to take *=M
CM-2: The arrows of C(M) are the elements of M, with * as source and target
CM-3: Composition is the binary operation on M

- Thus, a category can be seen as a generalized monoid, or a “monoid with
many objects”.

- This point of view is not as popular as the one calling the
category a generalized poset. The idea is worth considering in computer
science, however.
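Rules CM-1 to CM-3 translate directly into a small Python class. This is a sketch with invented names (`MonoidCategory`, `strings`), using strings under concatenation as the sample monoid.

```python
class MonoidCategory:
    """The one-object category C(M) of a monoid given by (op, unit)."""

    def __init__(self, op, unit):
        self.obj = "*"           # CM-1: the single object; its name is arbitrary
        self.op = op
        self.unit = unit

    def source(self, arrow):
        return self.obj          # CM-2: every arrow runs from * to *

    def target(self, arrow):
        return self.obj

    def identity(self, obj):
        return self.unit         # id_* is the identity element of M

    def compose(self, g, f):
        return self.op(g, f)     # CM-3: every pair of arrows is composable

# Illustrative monoid: strings under concatenation, with "" as identity.
strings = MonoidCategory(lambda s, t: s + t, "")
```

Since there is only one object, the composability condition is always met, so c is defined on all pairs of arrows. The monoid's identity and associativity laws are exactly C-4 and C-2 for this category.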
comment
Many categorists define a monoid to be a category with one
object, and a preordered set as a category in which every
Hom set is either empty or a singleton. This can be
justified by the fact that the category of monoids and the
category of one-object categories are ‘equivalent’, which we
will discuss in a later video.