
In this issue...

4 In conversation with Ulrike Tillmann
Sophie Maclean and David Sheard sit down with the incoming president of the LMS
12 The maths of Mafia
Sophie Maclean has a run-in with the mathia
22 On conditional probability
P(this article is good | it’s written by Madeleine Hall) = 0.92
30 How to be the least popular American president
Francisco Berkemeier is still waiting for results from Georgia
39 Who is the best England manager?
Paddy Moore levels the score
48 Surfing on wavelets
Johannes Huber debates whether it’s pronounced JPEG or JPEG
60 Diagrammatic algebra
Aryan Ghobadi gives a maths lecture at a zoo

3 Page 3 model
11 What’s hot and what’s not
21 Which number system are you?
27 The big argument: Is the Einstein summation convention worth it?
28 Dear Dirichlet
35 On the cover: Cellular automata, by Matthew Scroggs
46 Puzzles
55 Letters
56 Significant figures: John Conway, by Jamie Handitye and Jakob Stein
58 Crossnumber
69 Zoom conference zingo
70 Reviews
72 Top ten: Calculator buttons

1 spring 2021
chalkdust

Welcome to Chalkdust issue 13: this magazine is officially a teenager. Normally, with age comes maturity and wisdom.
This issue, our significant figure (pp 56–57) is John Conway, and his influence can be felt throughout the magazine: from the cover—generated by cellular automata (pp 35–38)—to the team’s favourite and least favourite mathematical games. You can also read about the mathematics of games like Mafia or Among Us (pp 12–20), conditional probability in card games (pp 22–26), and find out about a game—which is very popular apparently—called ‘football’ (pp 39–45). And on pp 30–34 we delve into the greatest game of all time (according to Russian hackers), the US presidential election.
Speaking of presidents, since our last issue a new one has been elected. You can read our interview with Ulrike Tillmann, incoming president of the London Mathematical Society (pp 4–10), where we discuss her work tackling inequality and her research bridging pure and applied mathematics. On top of that, we have two articles discussing the very purest and very applied-est of mathematics: braided categories (pp 60–68), and image compression (pp 48–54).
We know everyone is sick of sitting in endless video calls over the last few months; to relieve the monotony you can play our very own game of Zoom conference bingo (page 69). After all, it’s the twenties and there is time for Zingo! We at Chalkdust are excited at the prospect of running an in-person event again in the future, seeing old and new faces (edges and vertices), and getting rid of the boxes of magazines that are filling up our spare rooms.
The Chalkdust team

The team
Charlotte Connolly
Jamie Handitye
Ellen Jolley
Sophie Maclean
Matthew Scroggs
Belgin Seymenoğlu
David Sheard
Jakob Stein
Adam Townsend

Web: chalkdustmagazine.com
Email: contact@chalkdustmagazine.com
Social media: @chalkdustmag
Mastodon: @chalkdustmag@mathstodon.xyz
Post: Chalkdust Magazine, Department of Mathematics, UCL, Gower Street, London WC1E 6BT, UK.

Acknowledgements
We would like to thank: all our authors for writing wonderful content; our sponsors for allowing us to
continue making the magazine; Helen Wilson, Helen Higgins, Luciano Rila and everyone else at UCL’s
Department of Mathematics; everyone at Achieve Fulfilment for their help with distribution.
ISSN 2059-3805 (Print). ISSN 2059-3813 (Online). Published by Chalkdust Magazine, Dept of Mathematics, UCL, Gower Street, London WC1E 6BT, UK. © Copyright for articles
is retained by the original authors, and all other content is copyright Chalkdust Magazine 2021. All rights reserved. If you wish to reproduce any content, please contact us at
Chalkdust Magazine, Dept of Mathematics, UCL, Gower Street, London WC1E 6BT, UK or email contact@chalkdustmagazine.com

chalkdustmagazine.com 2
Have you ‘herd’? The world’s largest cow is over six feet tall and weighs more than 1.3
tonnes. Is a bigger cow possi-bull? Will the future contain infinitely large cows? The
steaks have never been higher!
To answer this question, let’s take a look at the cow’s legs. If the main (meaty) bit of the cow has a volume $V$ and density $\rho$ then its weight is $\rho V g$. So each leg supports a load of about
$$N = \frac{\rho V g}{4}.$$
In pursuit of glory, let’s now make the length, height and width of the cow bigger by a factor $a$. The cow’s new volume is $a^3 V$ and so the load on each leg is $a^3 N$: it grows cubically as $a$ increases.
Can the legs cope? If we model the legs as cylinders (since they already ‘lactose’...), we can use a 1757 result from the famous cow enthusiast Euler: if a cylinder has height $L$ and radius $r$, the maximum load it can support standing upright is
$$N_{\max} = \frac{E\pi^3 r^4}{4L^2}.$$
$E$ here is just a property of the material: its stiffness, or Young’s moo-dulus.
With our scaling, $L$ and $r$ are now $a$ times bigger. Our new maximum load is
$$\frac{E\pi^3 a^4 r^4}{4a^2 L^2} = a^2 N_{\max}.$$

Uh oh... this only scales as $a^2$: quadratically.


So even though $N_{\max}$ starts above $N$ (it has to, given that these cows exist!), there will come a maximum possible $a$, after which there will beef-ar too much cow and its legs will give way... an udder disaster.
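To make the scaling argument concrete, here is a minimal Python sketch. The function names (`leg_load`, `euler_max_load`, `legs_hold`, `critical_scale_factor`) are hypothetical, and any numbers you feed in are up to you: the code only encodes the two formulas above and the comparison $a^3 N \le a^2 N_{\max}$, which rearranges to $a \le N_{\max}/N$.

```python
import math


def leg_load(rho: float, volume: float, g: float = 9.81) -> float:
    """Load on each of the four legs: N = rho * V * g / 4."""
    return rho * volume * g / 4


def euler_max_load(E: float, r: float, L: float) -> float:
    """Maximum upright load of a cylinder, as quoted above: E * pi^3 * r^4 / (4 L^2)."""
    return E * math.pi**3 * r**4 / (4 * L**2)


def legs_hold(a: float, N: float, N_max: float) -> bool:
    """After scaling by a, the load is a^3 N and the capacity is a^2 N_max."""
    return a**3 * N <= a**2 * N_max


def critical_scale_factor(N: float, N_max: float) -> float:
    """Largest a with a^3 N <= a^2 N_max, i.e. a = N_max / N."""
    return N_max / N
```

For example, if a cow’s legs could take five times their current load (`N_max / N == 5`), the sketch says the cow could be scaled up by a factor of at most 5 before its legs give way.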

This analysis tells us something really important about biology—that there is a natural maximum size for land mammals. But have we reached it for cows? Brody & Lardy’s 1000-page tome Bioenergetics and Growth from 1946 has all the de-tail you need. We’ll leave you to ruminate on the cow-culations.


In conversation with...
Ulrike Tillmann
Ander McIntyre, used with permission from Ulrike Tillmann

Sophie Maclean and David Sheard

They say variety is the spice of life and to us at Chalkdust, maths is life, so it makes sense that
maths is made better by variety. A variety of topics, a variety of people, a variety of poorly
constructed maths puns. Ulrike Tillmann embodies this ethos with her work bridging the
gap between pure and applied maths. Despite spending most of her academic career in the UK,
Ulrike has lived in several other countries. She was born in Germany and then went on to study in
the US. She is now a professor of pure mathematics at the University of Oxford and a fellow of the
Royal Society, balancing her time between research, teaching, and outreach. She sat down with us
to chat about her career and what the future holds, both for her and maths in general.

Taking the reins


If you’ve been following maths news in the past few months, the name ‘Ulrike Tillmann’ may
be particularly familiar to you. It was announced recently that she will be the next president of
the London Mathematical Society, one of the UK’s five ‘learned societies’ for mathematics. She
will also take up the mantle as director of the Isaac Newton Institute, a research institute at the
University of Cambridge, in autumn of this year. Research institutes are perhaps the least well-
known entities in the academic world (as viewed from the outside), often only visited by some of
the most senior academics in a field. We asked Ulrike to explain what they are all about. “The
Isaac Newton Institute runs mathematical programmes in quite a broad range of areas. These
programmes typically run between four and six months and researchers come from all over the world to concentrate on their research.” The programmes are beneficial not only to individual
mathematicians, but to the community as a whole. “Being together with your colleagues who are
also experts in your area, and who are often completely spread all over the world, is a fantastic
thing. It brings the field forward and it can make a big difference to that research area.” On paper,
the role of director will involve overseeing the organisation of these programmes, but she sees
it going beyond this, including “making sure that things like equality and diversity are not just
observed, but also incorporated.”
Diversity matters greatly to Ulrike and she has spent a lot of time thinking about what can be done,
so she turns our conversation towards representation in mathematics more generally. “The most
important part seems to me to be ensuring that women, and also other minorities, are welcome;
and fostering a very open society.”
Ulrike is involved with many events for women in mathematics,
both as a speaker and organiser. Indeed, encouraging more women
into mathematics was part of her motivation for taking on her new
positions. “I think as women we have to occasionally come forward
and do these roles, even though sometimes we shy away from them.
Being a presence is important.” She hopes that by increasing the
visibility of women in mathematics, women will be encouraged to
study maths and stay in academia. “We’re always drawn to groups
where we see people who are similar to us, where we can identify,
where we are obviously welcome. So I think we need to make that
part of our culture: to just be open.” Unfortunately we know all too
well that any change like this will take time, and she acknowledges
the difficulties. “If I had some solution, I would have implemented it by now. But go back 30 years
and there has been a big change. I think that’s encouraging and we just need to make sure that it
is pushed in the right direction.”
Ulrike, of course, knows personally the importance of diversity as a woman in mathematics, but she is keen to stress that diversity goes beyond gender and the other underrepresented groups we often talk about. Really she sees the need for diversity of experience, thinking, and background. “In terms of excellence you need a mixture of people—not just the stars—you need a whole mixture of people all striving for excellence in their own ways. This cannot be measured on a simple one-dimensional scale. I think geographic diversity is also another aspect of this which is really important. And we will all be better off if we spread things around a little bit in a sensible way.”

Seeing the shape of data


Some of Ulrike’s most recent research has been in the fledgling field of topological data analysis
(TDA). “It’s really trying to capture the shape of data. You can imagine data as a point cloud in
some Euclidean space, and when you have such point clouds, what does it mean to be a shape?”


The idea of studying the basic shapes in data is nothing new. “There is clustering, for example: people already understand clustering relatively well—there’s a bunch of points here, a bunch of points there—they seem to be separated and maybe that separation is meaningful for the data. Or in linear regression, you are trying to fit a line to your data, and then that gives you some understanding of the data.”
Topological data analysis seeks to use advanced topological techniques to detect more complicated structures hidden in complex data. “You are looking for holes in dimension one or two and then you can use different techniques to approach the same data from different directions and try to understand a little bit more about the shape. The idea is that, especially for complex data, the shape should be meaningful.”
It might be difficult to imagine how complex topological features can be interpreted meaningfully in the real world, but the approach has many success stories. “There has been a famous study by Gunnar Carlsson and collaborators which looks at different types of breast cancers. The data was effectively Y-shaped, rather than just a line. Understanding that there was a third branch, a new branch, meant they could see that not all cancers were the same. There was actually a ‘good’ version that you didn’t have to treat.” Data scientists rely on TDA because without it “sometimes you just can’t predict what shapes the data has.” This last point is essential—the topological techniques can help you find patterns in your data that you would not even think to look for.
One key topological tool used in TDA is persistent homology. Homology is a technique which uses algebra to count the topological features of a space. For example, the homology of a torus can be summarised by three numbers (its Betti numbers) counting the features in dimensions 0, 1, and 2:
$\beta_0 = 1$: it has one connected component, so every 0-dimensional point is connected to every other
$\beta_1 = 2$: it contains two ‘independent’ 1-dimensional circles (red)
$\beta_2 = 1$: its 2-dimensional surface encompasses one interior cavity
Persistent homology studies those topological features which can be found persistently in the homology of the data as you vary the scale on which you look at the data points’ interactions (see diagrams on the next page). In this context, clustering means that your data has several connected components, and so just corresponds to the 0-dimensional homology. TDA focuses on finding higher dimensional structures by looking at the higher-dimensional homology.
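Persistent homology is easiest to see in dimension 0, where it reduces to tracking connected components, ie clusters. The sketch below is a minimal plain-Python illustration, not the Centre’s software or any TDA library, and its function names are hypothetical. It joins points whose distance is at most a scale `eps` and counts components ($\beta_0$), and it records the scales at which components merge as the scale grows: these are the ‘death’ times of 0-dimensional features (the same merge heights as single-linkage clustering).

```python
import itertools
import math


def _find(parent, i):
    """Root of i's component in a union-find forest (with path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def betti0(points, eps):
    """beta_0 at scale eps: components of the graph joining points within distance eps."""
    parent = list(range(len(points)))
    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[_find(parent, i)] = _find(parent, j)
    return len({_find(parent, i) for i in range(len(points))})


def h0_deaths(points):
    """Scales at which two components merge as eps grows from 0.

    Every point is born at scale 0 and one component never dies, so n points
    give n - 1 finite death times (Kruskal's algorithm on pairwise distances).
    """
    parent = list(range(len(points)))
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths
```

For two tight pairs of points far apart, such as `[(0, 0), (0, 1), (10, 0), (10, 1)]`, the death times are `[1.0, 1.0, 10.0]`: the long stretch of scales between 1 and 10 during which there are exactly two components is the ‘persistent’ two-cluster feature.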
“Especially looking at biological data, it is generally not so important where exactly the points
are—they are just samples anyway. So topology in particular is quite useful because it tries to
study the shape in a ‘fuzzy’ way. Topology is just a poor relative of geometry where you forget
about angles and distances, so that it can focus on the most important features.”


An evolving field
Ulrike’s work in this area is formalised through her role as co-director of the Centre for Topological Data Analysis at the University of Oxford. Despite a research background in very pure mathematics, she doesn’t limit herself to the theoretical side of TDA. “Our pitch to the EPSRC [Engineering and Physical Sciences Research Council] was that we would go all the way to the applications. The application should tell us what we want to understand theoretically, and then we work backwards and forwards between pure and applied. I have been involved in a study where immune cells are analysed. How quickly can they infiltrate a cancer? That is a real life study where maybe these topological methods can be used. So you see the whole pipeline going through.”
Since TDA is a new and exciting field, it is tempting to try and speculate about what developments we can expect in the next few years. Ulrike is cautiously optimistic: “I think evolution rather than revolution is probably what we are going to see. A certain amount of new thinking has to be cultivated because topology is not one of your typical areas of applied mathematics: you tend to see more analysis, numerics, and linear algebra.” It takes time for new mathematical ideas from pure topics to infiltrate applied research groups.
“I think it needs to be popularised a little bit first because it’s always a two-way dialogue between the mathematician and those people who want to apply it and we just need to fill in that space more. But it is this interaction between topological data analysis and other techniques that will really be important.” These other techniques are by no means limited to data science—applied mathematics is about pulling together any and every tool which might be helpful. “We are also trying to mix TDA with machine learning methods to make more meaningful and also more interpretable machine learning algorithms.”

Igniting a love of maths


You’d be forgiven for thinking Ulrike never had doubts she’d be a mathematician, but this was not the case. “It was somewhat gradual. I went to Stanford for my graduate degree and during my first year I was playing with the idea of doing something in computer science.” Although Ulrike did eventually settle on maths, she does worry that a more rigid degree course would have prevented this. “I did about a third of my undergraduate courses in mathematics. If I had come to Britain at that point I would have completely missed the train.” In fact, Ulrike believes this is a significant flaw in the British system. “I think we force our students into decisions too early. If you like mathematics you shouldn’t have to rely on your decision as a 16-year-old to pick further mathematics A-level.”
Persistent homology studies the shape data makes as points interact at different length scales.
Ulrike grew up in a small town in Germany and partly credits this with sparking her joy in mathematics. “There was no kindergarten or anything like that, so I was a bit bored. I asked my mother for problems and she would set me some sums and I liked doing those.” Throughout school all the way to her undergraduate degree, maths was just something that came easily to her, rather than a strong interest. Eventually it was the puzzles that drew her in. “I really wanted to work on these challenging problems that mathematics provides. I think it’s a really deep satisfaction that comes out of solving a problem. That you see connections between things that you haven’t been able to see before, and that maybe nobody else has been able to see before. That is very exciting—to really try to understand something—and sometimes you bring new concepts together.”

Studying surfaces and spaces


The problems which really appeal to Ulrike come from geometry and topology, in particular the so-called moduli spaces of surfaces. These turn out to connect with several areas of maths and physics: “You know what a surface is, and one way to think of moduli spaces is to understand them in families.” A simpler one-dimensional example might be to think about all possible circles in the plane. A circle is determined by its centre $(x, y)$ and its radius $r > 0$. Therefore choosing a circle in $\mathbb{R}^2$ is the same as choosing a point $(x, y, r)$ in $\mathbb{R}^2 \times \mathbb{R}_{>0} =: \mathcal{M}$. This set is the moduli space for circles in the plane (see diagram below). The key idea is that this moduli space is itself a geometric space (not merely an abstract set), and so you can study it using geometric and topological tools—and hence study all possible circles at once. For example, following a path in $\mathcal{M}$ corresponds to continuously deforming one circle into another.
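To make ‘a path in the moduli space is a continuous deformation of circles’ concrete, here is a minimal Python sketch; the `Circle` class and `straight_path` function are hypothetical illustrations. A circle is stored as its point $(x, y, r)$, and the straight-line path between two such points never leaves the moduli space, because a weighted average of two positive radii is still positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """A point (x, y, r) of the moduli space R^2 x R_{>0}."""
    x: float
    y: float
    r: float

    def __post_init__(self):
        if self.r <= 0:
            raise ValueError("a circle needs a positive radius")


def straight_path(c0: Circle, c1: Circle, t: float) -> Circle:
    """Point at time t in [0, 1] on the straight line from c0 to c1.

    Varying t continuously slides the centre and grows or shrinks the
    radius, so it continuously deforms the circle c0 into c1.
    """
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1]")
    return Circle(
        (1 - t) * c0.x + t * c1.x,
        (1 - t) * c0.y + t * c1.y,
        (1 - t) * c0.r + t * c1.r,
    )
```

Halfway along the path from the unit circle at the origin to the circle of radius 3 centred at (2, 4), `straight_path` gives the circle of radius 2 centred at (1, 2). Any continuous path would do; the straight line is just the simplest choice that is guaranteed to stay in the space.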
The moduli space $\mathcal{M}$ of circles in the plane (left; axes $\mathbb{R}^2$ and $\mathbb{R}_{>0}$), and a continuous deformation of circles corresponding to a path in $\mathcal{M}$ (right).


Since surfaces are two-dimensional, and geometrically more complex, their moduli spaces are a lot more complicated, but that also makes them more interesting. “Surfaces are one of the foundational objects in mathematics. They appear in geometry, of course, but also dynamics and number theory; they’re all somehow connected to surfaces in one way or another, and the moduli space is of interest to a lot of these subjects.” In fact, it was through the applications of moduli spaces to physics that Ulrike first became interested in the subject, working with fellow Oxford topologist Graeme Segal. “He was interested in conformal field theories and topological quantum field theories and it is the physics story behind it that made it very interesting for me. I’m still very excited about this physics part of it, because some of the theorems that we were able to prove can be interpreted as classifications of so-called invertible topological quantum field theory—so the story behind it is quite important.”
Ulrike representing the British Royal Society of Sciences, 2018. Photo: Pavlína Jáchimová, CC BY 3.0 CZ.
Once again, Ulrike’s go-to technique for studying moduli spaces is homology.

A mathematician’s apology
People engaged in basic research—research which has no immediate application—are often called
upon to justify their work, whether to family and friends, funding bodies, or even policymakers
and the general public. Sometimes this may amount to no more than a minor inconvenience.
However, in recent months the topic has risen to the fore since the University of Leicester began
a consultation on proposals which, as part of broad restructuring across its faculties, include the
disbanding of the pure maths research group in favour of a focus on applications of mathematics to
artificial intelligence, computational modelling, and data science. Of course, a consultation is not
the same as enacting a proposal, and there are likely to be many factors behind the plans which are not in the public domain; nevertheless, as a pure mathematician who works on the interface with applications in TDA, Ulrike is well placed to comment in general on the idea of doing away with pure mathematics in a research-intensive institution.
“Of course I don’t know the precise situation, there are often financial considerations and so on, but I find it a little bit puzzling frankly. I’m a pure mathematician who also moved into data science and it seems to me that a new subject like data science will certainly benefit from pure mathematics, where many of the new ideas are coming from. Turing, for example, was a mathematician first and then the inventor of the Turing machine. It feels to me that research culture, in a research university—especially one that hopes to do something as technical as data science—ought to support foundational mathematics.” The Centre for Topological Data Analysis stands as a prime example of how pure mathematics works in tandem with its applications, although of course immediate applicability is by no means the sole justification of pure maths.


Undergraduate degrees tend to have curricula designed to ensure a strong basis in pure mathematics. This allows students to develop a sufficient grounding and enables them to specialise in topics ranging from applied to pure. How these pure modules will be taught without pure research mathematicians is clearly a question which must be tackled. “In Leicester’s case in particular, what troubles me is that they still hope to have an undergraduate maths degree, but having teaching-only staff to do the pure modules, and I don’t think that’s a great solution. For me, teaching gets more interesting and is invigorated by research, so the best teaching is often inspired and kept relevant by research.”
Much of the criticism of Leicester’s proposals centres on legitimate concerns for the individual workers who are now facing the prospect of redundancy during a pandemic, but any argument in defence of basic research in mathematics or any other subject needs to stand independent of the present situation. It is no great secret that academia is a very insular field, and one might reasonably argue that mathematics needs to modernise and consider new ways of working: what is the harm if one university opts to focus solely on technology and applications? “I think actually Britain is generally quite advanced in thinking in terms of impact. Of course it takes some effort to make these connections and they are also not necessarily done by pure mathematicians themselves. But pure mathematics brings the practice and culture of rigorous thinking. That’s really important, and it’s often our students who make the applications. Britain has a science and technology based industry and economy, and we need more people educated in Stem subjects, of which mathematics is a foundational part. I don’t think we can get away from that.”

Sophie Maclean
Sophie Maclean is a recent maths graduate from the University of Cambridge and very much
misses her degree. She has no free time—she is a Chalkdust editor.
@sophiethemathmo
David Sheard
David is a final year PhD student at UCL studying geometric group theory. When not doing
maths he can usually be found singing or playing the flute.
davidsheard.co.uk · david.sheard.17@ucl.ac.uk · @SheardDavid
My favourite game
Maths is all about playing with ideas to see what happens, and some of the coolest maths comes straight out of playing games. We’ve spread some of our favourite—and least favourite—mathematical games throughout this issue.
Mathsteroids
Matthew Scroggs

Mathsteroids (mscroggs.co.uk/mathsteroids) is a version of the classic arcade game Asteroids that you can play on a selection of interesting surfaces. For the levels on a sphere, you can choose which 2D representation of the sphere you want to see while you play. The Gall–Peters level is very hard. 10/shameless self-promotion

WHAT’S HOT & WHAT’S NOT
Maths is a fickle world. Stay à la mode with our guide to the latest trends.

NOT Muting yourself when not talking: There’s a 0% chance you’ll remember to unmute later
HOT Typing noise ASMR: So soothing
HOT Financial derivatives: All the cool kids love buying GameStop.
NOT Partial derivatives: None of the cool kids love these. Stop.
HOT 6.2 cm: Seems a reasonable height for a person
NOT 6′2″: Seems an unreasonable height for a person
HOT FFTs: Don’t waste your time on slow Fourier transforms
NOT NFTs: Don’t waste your money buying a blockchain picture of a scorpion
HOT Writing about an election six months too late: See pages 30–34.
NOT Being current: See no pages.

Agree? Disagree? @chalkdustmag
More free fashion advice online at chalkdustmagazine.com
Pictures. Background: Flickr user Harmon, CC BY-SA 2.0. GameStop: Mike Mozart, CC BY 2.0.


The maths of Mafia


Sophie Maclean

It was 8pm on a wintry Saturday and I was pleading for my life. “I would never betray you. I
promise.” I searched desperately for someone, anyone, to back me up. Of course, they were
right. I had been responsible for the murder of many of their friends but I wasn’t about to
admit to that. My co-conspirators had gone quiet, well aware that to support me was to put
themselves into the firing line. At 8.55pm a vote was held. By 9pm I had been executed.

OK, so that first paragraph may have been a bit misleading. Thankfully I was not actually put to death, and I haven’t killed anyone in real life. This all happened whilst I was playing Mafia. For those unfamiliar, Mafia is a strategy game in which players are (secretly) assigned to be either citizens or mafia. The game is split up into day and night phases (when playing in person, night is simulated by everybody closing their eyes). During the night phase, the mafia are able to communicate with each other and can vote to kill one person. During the day phase, all the residents (both citizens and mafia) discover who died, and then vote to execute one resident. The aim for each group is to eliminate the other.

Some of you may be reading this thinking “Hmmmm this is sus. It sounds very much like Among Us” and you’d be right. The popular online game was inspired by Mafia and is one of many adaptations of the game. The particular version I was playing—when I was outed as a mafia member—was Harry Potter-themed (if there’s one thing you’ll learn about me during this article, it’s that I’m incredibly cool). People found themselves either on Team Hogwarts or Team Death Eater, and there were some special Potter-themed rules, which led to an interesting situation mathematically.
At the beginning of the game, Voldemort was allowed to select a horcrux. This essentially meant
that Voldemort could not die until both he and this other character were killed. The crucial part of
this power was that the horcrux didn’t know they were the horcrux. Towards the end of the game,
there were four characters left alive: Voldemort, Fred Weasley (who was the horcrux), Fawkes the
phoenix, and Ginny Weasley. By this point, everyone knew for certain which character had been
assigned to each player. Fred, Fawkes and Ginny had also worked out that one of them was the
horcrux but didn’t know who. They had guessed it was Fawkes. That night Voldemort attempted
to kill Ginny (though didn’t succeed because magic). The next day, Team Hogwarts were informed
of the failed assassination. What should they do next?

The Great (Monty) Hall


On the face of it, there seems to be no reason to alter who they suspect the horcrux is. The probability of Fawkes being the horcrux hasn’t changed, right? Well actually, wrong. When they settled on Fawkes, there was a 1/3 chance that he was the horcrux. In attempting to kill Ginny, Voldemort had shown that she wasn’t the horcrux (Voldemort could equally have attempted to kill Fawkes here). So the probability that Fred is the horcrux is 2/3. Therefore it’s best to kill Fred (sorry Fred!). Though at first glance, the situation doesn’t appear to have changed, the addition of new information means it actually has, and in Mafia information is crucial.
Those of you familiar with a certain piece of mathematical lore will now be jumping out of your seats. This is equivalent to the infamous Monty Hall problem. In that problem, a contestant must pick one of three doors as part of a game show. The contestant is told that behind two of the doors is a goat, but behind the third is a car. After they have made their choice, Monty Hall (the presenter) opens one of the other doors to reveal a goat. The contestant is then given the opportunity to swap doors. In our Mafia analogy, the horcrux is the car, with the other two characters being goats. Voldemort is our Monty Hall, and by attempting to kill Ginny he opens the door to a goat. And just like in our Mafia version, the contestant is better off swapping doors.
Harry Potter in the Great Hall during a feast
If this is taking a while to get your head around, consider the following table (now framed in terms
of the Monty Hall problem):
Swap?        Initial guess   Final guess
Don’t swap   car             car!
Don’t swap   goat 1          goat 1
Don’t swap   goat 2          goat 2
Swap         car             a goat
Swap         goat 1          car!
Swap         goat 2          car!


Each of the initial scenarios (ie each line) is equally likely to occur. You can now clearly see that when swapping, the contestant wins 2/3 of the time, compared to only 1/3 of the time when staying put. Another way of putting this is that to win when sticking with your first guess, you have to guess correctly first time (which has probability of 1/3) but to win if you swap, you have to be wrong on your first guess (which has probability of 2/3).
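The table can also be checked exhaustively. This minimal Python sketch (the function name `win_probability` is hypothetical) runs through every equally likely pair of car position and initial guess, lets Monty open a goat door the contestant didn’t pick (splitting the probability evenly when he has a choice), and adds up the chance of ending on the car using exact fractions.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probability(swap: bool) -> Fraction:
    """Exact chance of ending on the car, for the stick or swap strategy."""
    total = Fraction(0)
    # Car position and initial guess are independent and uniform: 9 equally likely pairs.
    for car, guess in product(DOORS, DOORS):
        # Monty opens a door that is neither the contestant's pick nor the car.
        goat_doors = [d for d in DOORS if d not in (guess, car)]
        for opened in goat_doors:
            p = Fraction(1, 9) / len(goat_doors)  # Monty picks evenly if he has a choice
            if swap:
                final = next(d for d in DOORS if d not in (guess, opened))
            else:
                final = guess
            if final == car:
                total += p
    return total
```

`win_probability(swap=True)` comes out as exactly 2/3 and `win_probability(swap=False)` as 1/3, matching the table: sticking wins precisely when the first guess was the car.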
This whole scenario got me wondering how else maths could help when playing Mafia. I first learnt
to play Mafia at maths camp (I told you I was cool) so I knew it was popular with mathematicians.
Could it be that there’s a secret mathematical strategy to guarantee that you win the game?

Rules and Regulus-ions Black


In short, no. One of the great things about Mafia is there’s a huge psychological element. When
playing with people you know well, you can observe changes in behaviour that indicate they’re
lying—you can spot contradictions in alibis, you can notice voting patterns and compare that with
known friendships. Interrogations can be carried out, pressure can be applied; I’ve even known
someone to threaten to end a relationship if it transpired her partner was lying to her. But this
doesn’t mean that there’s nothing we can say. If we take out the psychological aspect, and assume
that murder choices are truly random, we can give a probability that the mafia win.
The first step in exploring the maths of any game is to clearly set the rules, and formalise our
mathematical model. We will consider a simple version of the game, where everyone is either a
mafia member or a citizen and there are no Potter-esque powers. Let’s define the game as follows:
There are initially 𝑁 players, all of whom are called residents. There is also one more person,
not playing, who coordinates the game (this allows anonymous and simultaneous votes).
Before the gameplay begins, 𝑀 of these 𝑁 players are assigned to be mafia. The remaining
𝑁 − 𝑀 players are citizens.
Every player is told their own identity. The mafia are also told each other’s identities.
The citizens only know their own identity.
A turn is defined as a day phase, followed by a night phase:
• A day phase consists of a debate, where all players can freely discuss strategy. After
this, there is a vote where all players simultaneously choose a resident to execute. The
resident with the most votes is killed. In the event of a tie, one of the residents with the most
votes is chosen at random to die. It is then revealed whether this resident was a
mafia member or not.
• A night phase consists of only the mafia communicating and then voting on which
citizen to kill, followed by their death. In this model, there is no way for this death to
be prevented, and the mafia cannot kill one of their own.
We assume no psychological aspect comes into play, and so we assume citizens never have
any information on who is and isn’t mafia (obviously this is a vast oversimplification because
if, for example, a group of people exactly the same size as the mafia always vote the same
way, and never vote for each other, even the least observant player may get suspicious).
The game continues until only one team (citizens or mafia) remains and this team is declared
the winner.
We will assume every player plays rationally and always makes the decision that maximises
their chance of winning. This is perhaps the biggest assumption of all.
Let us write the current state with 𝑛 players and 𝑚
mafia as (𝑛, 𝑚). Let the probability of the mafia win-
ning when there are 𝑛 players and 𝑚 mafia be 𝑤(𝑛, 𝑚).
There are a few things that we can immediately say,
without much further calculation.
During a single turn, the possible transitions are (𝑛, 𝑚) → (𝑛 − 𝟤, 𝑚 − 𝟣) (when the residents
execute a mafia member) or (𝑛, 𝑚) → (𝑛 − 𝟤, 𝑚) (when the residents execute a citizen). Fans of
Chalkdust may recognise this as a Markov chain (which you can read about in issue 12). One key
thing to note here is that the number of residents decreases by 𝟤 each turn, so the game must end
in a finite number of turns. By putting a time limit on the length of each phase, you can also
guarantee the game ends in finite time. It would be very unsporting for the last remaining mafia
member to refuse to stop talking and allow a vote to occur, thereby ensuring that although the
mafia couldn’t win, they also couldn’t lose, but it wouldn’t be unheard of (looking at you, MPs).
Regulus Black (Sirius’s late brother) during the day phase (left) and the night phase (right)
Now we want to consider some probabilities. The probability of the mafia winning from each state
is independent of what happened before. We can therefore say that

$$
w(n, m) = P(\text{mafia executed} \mid n \text{ players}, m \text{ mafia})\, w(n-2, m-1)
+ P(\text{citizen executed} \mid n \text{ players}, m \text{ mafia})\, w(n-2, m). \tag{f}
$$

Here, 𝑃(𝐴 | 𝐵) is the probability of 𝐴 happening if 𝐵 has already happened (see pages 22–26).

Sorting into houses cases


This is still a fairly general formula, and doesn’t give much insight. In order to say more, we’ll need
to look at three separate cases:
• 𝑚 > 𝑛 − 𝑚 (the mafia have a majority)
• 𝑚 = 𝑛 − 𝑚 (there are equal numbers of mafia members and citizens)
• 𝑚 < 𝑛 − 𝑚 (the citizens have a majority)
Firstly, let’s look at when the mafia outnumber the citizens. In this case, the mafia are guaranteed
to win. This is because during the day phase, they can all vote to kill the same citizen, win the
majority vote, and ensure it is a citizen that is killed. This can be organised during the night phase.
Therefore 𝑤(𝑛, 𝑚) = 𝟣 when 𝑚 > 𝑛 − 𝑚.
What about if the citizens have a majority (ie 𝑚 < 𝑛 − 𝑚)? In our model, the citizens have no
information on who is and isn’t mafia. Therefore their strategy can only be to randomly select a
resident to eliminate in each day phase. The debate phase can be used to agree on which random
resident to execute (for example using a random number generator). The citizens then all vote for
this person and, because they have the majority, the unlucky resident is executed. There’s nothing
the mafia can do to change that. Each resident has an equal chance of being selected, so the
probability that a mafia member is executed is 𝑚/𝑛, and the probability a citizen is executed is
(𝑛 − 𝑚)/𝑛.
The case when 𝑚 = 𝑛 − 𝑚 (ie 𝑛 = 𝟤𝑚) is a special case. The citizens and mafia can propose
a player each, and then one of them will randomly be executed. The citizens’ randomly chosen
player is a mafia member with probability 𝑚/𝑛 = 𝟣/𝟤, and the mafia always propose a citizen, so
the probability that a mafia member is executed is 𝟣/𝟤 × 𝟣/𝟤 = 𝟣/𝟦 (and so the probability a citizen
is executed is 𝟥/𝟦). If a citizen is executed, the mafia outnumber the citizens, and so they win.
If a mafia member is executed, there remains an equal number of mafia and citizens (as a citizen
is killed during the night phase). This puts us in exactly the same situation as before, with the
same probabilities. For the citizens to win, a mafia member must be executed on all remaining
turns (of which there must be 𝑚, as two residents are killed per round). Hence
$P(\text{citizens win}) = (1/4)^m = (1/2)^{2m} = (1/2)^n$, and so $w(2m, m) = 1 - (1/2)^n$.
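The tie-break argument can be checked exactly. This is a minimal sketch assuming, as above, that the citizens pick uniformly from all 𝑛 = 𝟤𝑚 residents, the mafia always propose a citizen, and a fair coin breaks the tie; the function names are our own.

```python
from fractions import Fraction

def p_mafia_executed_equal_case(m: int) -> Fraction:
    """Day-phase probability that a mafia member dies when n = 2m."""
    n = 2 * m
    # The citizens' random pick is a mafia member with probability m/n = 1/2,
    # while the mafia always propose a citizen.
    p_citizens_pick_mafia = Fraction(m, n)
    # A fair coin breaks the tie between the two proposals.
    return p_citizens_pick_mafia * Fraction(1, 2)

def w_equal(m: int) -> Fraction:
    """Mafia win probability starting from (2m, m)."""
    p_citizens_win = Fraction(1)
    # After each mafia execution and night kill we are back at (2j - 2, j - 1).
    for j in range(m, 0, -1):
        p_citizens_win *= p_mafia_executed_equal_case(j)
    return 1 - p_citizens_win

print(p_mafia_executed_equal_case(5))  # 1/4
print(w_equal(3))                      # 63/64
```

Since the per-turn probability stays at 𝟣/𝟦 however many pairs remain, the loop simply multiplies 𝑚 copies of it, which is where (𝟣/𝟤)ⁿ comes from.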
If we combine all this, and use equation (f), we get the following:

$$
w(n, m) =
\begin{cases}
1 & \text{if } m > n - m,\\
1 - \left(\frac{1}{2}\right)^{n} & \text{if } m = n - m,\\
\frac{m}{n}\, w(n-2, m-1) + \frac{n-m}{n}\, w(n-2, m) & \text{otherwise.}
\end{cases}
$$

This is pretty neat and is now just an iterative equation. It would be perfectly possible to calculate
𝑤(𝑛, 𝑚) now, by just plugging through the steps (or even writing some code to do it for you if you’re
that way inclined). Finding a more general formula becomes pretty complicated pretty fast. But
there is still more that we can say, without our brains hurting too much.
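For those inclined to write the code, here is one way to plug through the steps: a sketch that follows the piecewise recursion above using exact fractions. One assumption is ours: the recursion does not say what happens once no mafia remain, so we add the base case 𝑤(𝑛, 𝟢) = 𝟢 (the citizens have already won).

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def w(n: int, m: int) -> Fraction:
    """Probability that the mafia win from the state (n players, m mafia)."""
    if m == 0:
        return Fraction(0)               # assumed base case: no mafia left
    if m > n - m:
        return Fraction(1)               # mafia majority: mafia always win
    if m == n - m:
        return 1 - Fraction(1, 2) ** n   # equal numbers: 1 - (1/2)^n
    # Citizen majority: a uniformly random resident is executed by day,
    # then the mafia kill a citizen by night.
    return (Fraction(m, n) * w(n - 2, m - 1)
            + Fraction(n - m, n) * w(n - 2, m))

print(w(3, 1), w(5, 2), w(7, 3))  # 2/3 13/15 33/35
```

Because the tie branch handles 𝑛 = 𝟤𝑚 separately, this sketch returns 𝑤(𝟤, 𝟣) = 𝟥/𝟦; for odd 𝑛 with a single mafia member it reproduces the product formula in the next section.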

(The philosopher’s st)one mafia member


Let us consider the game with only one mafia member. In order for this mafia member to win,
in every day phase a citizen must be executed. Remember that day phases precede night phases.
Therefore
$$
w(n, 1) = \frac{n-1}{n} \times \frac{n-3}{n-2} \times \cdots \times \frac{1 + (n \bmod 2)}{2 + (n \bmod 2)} = \frac{(n-1)!!}{n!!}. \tag{f f}
$$
Here !! is the double factorial function (like the factorial, but taking every other factor:
𝑛!! = 𝑛(𝑛 − 𝟤)(𝑛 − 𝟦)⋯, stopping at 𝟣 or 𝟤, with 𝟢!! = 𝟣). It’s worth noting that because
𝑛 (mod 𝑘) is the remainder when 𝑛 is divided by 𝑘, we have


$$
n \bmod 2 =
\begin{cases}
0 & \text{if } n \text{ is even},\\
1 & \text{if } n \text{ is odd}.
\end{cases} \tag{f f}
$$

This highlights a rather interesting property: the dependence on the parity of the number of residents.
This doesn’t seem unreasonable, because two players are killed each turn, and whether there
are an even or odd number of players does affect the proportion of players needed to have a clear
majority. A quick calculation using equation (f f) shows that 𝑤(𝟤, 𝟣) = 𝟣/𝟤 and 𝑤(𝟥, 𝟣) = 𝟤/𝟥. In
the second case, despite there being a greater proportion of citizens, the probability that the mafia
win is actually higher.
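The closed form is easy to evaluate exactly. A minimal sketch using Python’s `fractions` module; `double_factorial` and `w_single_mafia` are our own helper names, with the usual convention 𝟢!! = 𝟣.

```python
from fractions import Fraction

def double_factorial(n: int) -> int:
    """n!! = n (n-2) (n-4) ..., stopping at 1 or 2; 0!! = 1 by convention."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def w_single_mafia(n: int) -> Fraction:
    """Closed form w(n, 1) = (n - 1)!! / n!! for one mafia member."""
    return Fraction(double_factorial(n - 1), double_factorial(n))

print(w_single_mafia(2), w_single_mafia(3))  # 1/2 2/3
```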
In fact, with a single mafia member, it is always true that adding an extra citizen to make the total
number of players odd increases the mafia’s chance of winning. We can prove this by induction, as
shown in the box below.

Proof by induction
We use 𝑤(𝟤, 𝟣) and 𝑤(𝟥, 𝟣) as our base case. Our inductive hypothesis is that
𝑤(𝟤𝑘 + 𝟣, 𝟣) > 𝑤(𝟤𝑘, 𝟣). Let’s now consider
$$
w(2k+3, 1) = \frac{2k+2}{2k+3}\, w(2k+1, 1).
$$
It is true in general that (𝑎 + 𝟣)/(𝑎 + 𝟤) > 𝑎/(𝑎 + 𝟣) for all non-negative 𝑎 (if you don’t believe
me, you can try proving it yourself). Therefore
$$
w(2k+3, 1) > \frac{2k+1}{2k+2}\, w(2k+1, 1).
$$
Now we apply our inductive hypothesis to get
$$
w(2k+3, 1) > \frac{2k+1}{2k+2}\, w(2k, 1) = w(2k+2, 1),
$$
which is exactly what we want and completes our inductive step. Therefore
𝑤(𝟤𝑘 + 𝟣, 𝟣) > 𝑤(𝟤𝑘, 𝟣) for all integers 𝑘 ≥ 𝟣.

So now we’ve shown that the mafia’s chance of winning is higher with an additional citizen making
the total number of players odd, which I found pretty surprising. So you can only imagine how
shocked I was when I learned that the parity of the number of players has such an effect that
𝑤(𝟫, 𝟣) > 𝑤(𝟦, 𝟣). In fact, we can plot a graph of 𝑤(𝑛, 𝟣) for odd and even 𝑛 using our expression
in equation (f f) to highlight this:
A graph of 𝑤(𝑛, 𝟣) against 𝑛 for 𝑛 odd (green), and 𝑛 even (blue)

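A numerical sanity check of the induction (not a substitute for it), using the closed form (𝑛 − 𝟣)!!/𝑛!!. The helper names are our own, and the range of 𝑘 checked is arbitrary.

```python
from fractions import Fraction

def double_factorial(n: int) -> int:
    """n!! = n (n-2) (n-4) ..., stopping at 1 or 2; 0!! = 1 by convention."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def w_single_mafia(n: int) -> Fraction:
    """Closed form w(n, 1) = (n - 1)!! / n!!."""
    return Fraction(double_factorial(n - 1), double_factorial(n))

# The inequality used in the inductive step: (a+1)/(a+2) > a/(a+1) for a >= 0.
assert all(Fraction(a + 1, a + 2) > Fraction(a, a + 1) for a in range(100))

# The statement itself, checked for 2k = 2, 4, ..., 100.
assert all(w_single_mafia(2 * k + 1) > w_single_mafia(2 * k) for k in range(1, 51))

# The surprising comparison: nine players beat four (about 0.406 against 0.375).
print(float(w_single_mafia(9)), float(w_single_mafia(4)))
```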
Time to get Sirius


And now we reach the limit of what I can explain without writing a thesis. Luckily for me, Piotr
Migdał wrote a paper on this for his bachelor’s degree. There is one main result from it, extending
the ideas above, which I’d like to share with you. It considers the case of there being multiple
mafia members.
In a similar way to how we derived 𝑤(𝑛, 𝟣), Migdał shows that
$$
w(n, m) = 1 - \sum_{i=0}^{m} (-1)^i \binom{m}{i} \frac{(n-i)!!}{n!!\,\big((n \bmod 2) - i\big)!!}.
$$
Now observe that, for large 𝑛, (𝑛 − 𝑖)!!/(𝑛 − 𝟣)!! → 𝟢 as 𝑖 increases. Therefore only the first two
terms of the sum contribute significantly. Hence we can write
$$
w(n, m) \approx m\, \frac{(n-1)!!}{n!!}. \tag{f f f}
$$
Writing this in a nicer form involves a few neat ideas. One fact that will prove very useful is that
for any 𝑘, (𝟤𝑘 + 𝟣)!! = (𝟤𝑘 + 𝟣)(𝟤𝑘 − 𝟣)!! (using the definition of double factorial above).
We first consider the product of 𝑤(𝟤𝑘 + 𝟣, 𝑚) and 𝑤(𝟤𝑘, 𝑚). By equation (f f f), this gives:
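$$
\begin{aligned}
w(2k+1, m)\, w(2k, m) &\approx \frac{m\,(2k)!!}{(2k+1)!!} \times \frac{m\,(2k-1)!!}{(2k)!!}\\
&= \frac{m\,(2k)!!}{(2k+1)(2k-1)!!} \times \frac{m\,(2k-1)!!}{(2k)!!}\\
&= m^2\, \frac{1}{2k+1}.
\end{aligned} \tag{f f f}
$$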
Now we’re going to look at 𝑤(𝟤𝑘 + 𝟣, 𝑚)/𝑤(𝟤𝑘, 𝑚), again using equation (f f f). This may seem a
bit odd right now, but trust me on this one.
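$$
\begin{aligned}
\frac{w(2k+1, m)}{w(2k, m)} &\approx \frac{m\,(2k)!!}{(2k+1)!!} \div \frac{m\,(2k-1)!!}{(2k)!!}\\
&= \frac{\big((2k)!!\big)^2}{(2k+1)!!\,(2k-1)!!}\\
&= \frac{\big((2k)!!\big)^2}{(2k+1)\big((2k-1)!!\big)^2}\\
&= \left(\frac{(2k)!!}{(2k-1)!!}\right)^2 \frac{1}{2k+1}.
\end{aligned} \tag{f f f f}
$$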
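Both simplifications are exact identities between double factorials; only the starting approximation 𝑤(𝑛, 𝑚) ≈ 𝑚(𝑛 − 𝟣)!!/𝑛!! is approximate. A quick check with exact fractions, where `approx_w` is our own name for that approximation and the ranges of 𝑘 and 𝑚 are arbitrary:

```python
from fractions import Fraction

def double_factorial(n: int) -> int:
    """n!! = n (n-2) (n-4) ..., stopping at 1 or 2; 0!! = 1 by convention."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def approx_w(n: int, m: int) -> Fraction:
    """First-two-terms approximation w(n, m) ≈ m (n - 1)!! / n!!."""
    return m * Fraction(double_factorial(n - 1), double_factorial(n))

for k in range(1, 20):
    for m in range(1, 4):
        odd, even = approx_w(2 * k + 1, m), approx_w(2 * k, m)
        # Product step: the double factorials cancel, leaving m^2 / (2k + 1).
        assert odd * even == Fraction(m * m, 2 * k + 1)
        # Ratio step: ((2k)!! / (2k - 1)!!)^2 / (2k + 1); note that m cancels.
        wallis_term = Fraction(double_factorial(2 * k), double_factorial(2 * k - 1)) ** 2
        assert odd / even == wallis_term / (2 * k + 1)
print("product and ratio identities hold for k = 1..19")
```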
At this point, you’d be forgiven for doubting that I’m making anything simpler here.
Fear not! It all becomes clear with the introduction of the Wallis formula:
$$
\frac{\pi}{2} = \lim_{k \to \infty} \left(\frac{(2k)!!}{(2k-1)!!}\right)^2 \frac{1}{2k+1}.
$$
We can now write equation (f f f f) in the limit 𝑘 → ∞ as
$$
\frac{w(2k+1, m)}{w(2k, m)} \approx \frac{\pi}{2}. \tag{f f f f}
$$

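The Wallis limit converges rather slowly. A short floating-point sketch (our own illustration, not from Migdał’s paper) builds the expression inside the limit as a running product, since each step from 𝑘 − 𝟣 to 𝑘 multiplies it by (𝟤𝑘)²/((𝟤𝑘 − 𝟣)(𝟤𝑘 + 𝟣)):

```python
import math

def wallis_partial(k: int) -> float:
    """((2k)!! / (2k - 1)!!)^2 / (2k + 1), built up as a running product."""
    value = 1.0
    for j in range(1, k + 1):
        # Each step multiplies by (2j)^2 / ((2j - 1)(2j + 1)), which exceeds 1.
        value *= (2 * j) ** 2 / ((2 * j - 1) * (2 * j + 1))
    return value

for k in (1, 10, 100, 1000):
    print(k, wallis_partial(k), math.pi / 2)
```

Every factor exceeds 𝟣, so the partial products creep up towards π/𝟤 from below; even at 𝑘 = 𝟣𝟢𝟢𝟢 they agree only to about three decimal places.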

Okay, we’re nearly there. I promise. The final clever idea requires considering even and odd 𝑛
separately. Write 𝑛 = 𝟤𝑘 for 𝑛 even (where 𝑘 is a positive integer). For 𝑛 odd, write 𝑛 = 𝟤𝑘 + 𝟣 (again
for a positive integer 𝑘 ). Now here’s the magic. Write:

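$$
w(n, m) =
\begin{cases}
\left(\dfrac{w(2k+1, m)}{w(2k, m)}\right)^{-1/2} \sqrt{w(2k+1, m)\, w(2k, m)} & \text{for } n = 2k \text{ (ie } n \text{ is even)},\\[2ex]
\left(\dfrac{w(2k+1, m)}{w(2k, m)}\right)^{1/2} \sqrt{w(2k+1, m)\, w(2k, m)} & \text{for } n = 2k+1 \text{ (ie } n \text{ is odd)}.
\end{cases}
$$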

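Substituting in our approximations for the product and the ratio above gives
$$
w(n, m) \approx
\begin{cases}
\left(\dfrac{\pi}{2}\right)^{-1/2} \sqrt{\dfrac{m^2}{2k+1}} & \text{for } n = 2k,\\[2ex]
\left(\dfrac{\pi}{2}\right)^{1/2} \sqrt{\dfrac{m^2}{2k+1}} & \text{for } n = 2k+1.
\end{cases}
$$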
This can be put into a single line by recalling the definition of 𝑛 (mod 𝟤), and using the fact that
for large 𝑘, 𝟤𝑘 + 𝟣 ≈ 𝟤𝑘:
$$
w(n, m) \approx \left(\frac{\pi}{2}\right)^{(n \bmod 2) - 1/2} \frac{m}{\sqrt{n}}. \tag{f f f f f}
$$

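For a single mafia member we can compare the approximation with the exact closed form (𝑛 − 𝟣)!!/𝑛!!. A sketch; `approx_w` and `exact_w_single` are our own names, and the values of 𝑛 are arbitrary.

```python
import math
from fractions import Fraction

def double_factorial(n: int) -> int:
    """n!! = n (n-2) (n-4) ..., stopping at 1 or 2; 0!! = 1 by convention."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def exact_w_single(n: int) -> float:
    """Closed form w(n, 1) = (n - 1)!! / n!!, converted to a float."""
    return float(Fraction(double_factorial(n - 1), double_factorial(n)))

def approx_w(n: int, m: int) -> float:
    """Large-n approximation w(n, m) ≈ (π/2)^((n mod 2) - 1/2) · m / √n."""
    return (math.pi / 2) ** (n % 2 - 0.5) * m / math.sqrt(n)

for n in (11, 101, 1000, 1001):
    print(n, round(exact_w_single(n), 4), round(approx_w(n, 1), 4))
```

At 𝑛 = 𝟣𝟣 the two differ by a couple of percent; by 𝑛 ≈ 𝟣𝟢𝟢𝟢 they agree to well within one percent, for both parities.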
So we finally have an approximate expression for 𝑤(𝑛, 𝑚). Phew. From this, it’s only a small step
to calculate how many of the 𝑛 players need to be made mafia initially in order to give the two
teams an equal chance of winning.
To do this, we set 𝑤(𝑛, 𝑚) = 𝟣/𝟤 in equation (f f f f f). Hence we find the optimal value of 𝑚 is
approximately
$$
\frac{\sqrt{n}}{2} \left(\frac{\pi}{2}\right)^{1/2 - (n \bmod 2)}.
$$
This is particularly interesting as it means that when creating a game of mafia, you can choose the
initial number of mafia to ensure that the mafia (or the citizens) are unlikely to win by luck alone,
and some skill has to be involved. Unfortunately, though, this gives no indication of what that skill
should be.
A fair game with 13 people (during a night phase): 1.44 mafia members (top) v 11.56 citizens
(bottom). In a real game, however, it will probably be easier to round the number of people on each
team to the nearest integer.

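Setting the approximation equal to 𝟣/𝟤 and solving for 𝑚 takes one line of code. As a rough cross-check under the same model, this sketch also evaluates the piecewise recursion for 𝑤(𝑛, 𝑚) given earlier at 𝑛 = 𝟣𝟥, with one assumption of ours: the base case 𝑤(𝑛, 𝟢) = 𝟢 when no mafia remain.

```python
import math
from fractions import Fraction
from functools import lru_cache

def optimal_m(n: int) -> float:
    """Solve (π/2)^((n mod 2) - 1/2) · m / √n = 1/2 for m."""
    return math.sqrt(n) / 2 * (math.pi / 2) ** (0.5 - n % 2)

@lru_cache(maxsize=None)
def w(n: int, m: int) -> Fraction:
    """Piecewise recursion for the mafia's win probability (assumed w(n, 0) = 0)."""
    if m == 0:
        return Fraction(0)
    if m > n - m:
        return Fraction(1)
    if m == n - m:
        return 1 - Fraction(1, 2) ** n
    return Fraction(m, n) * w(n - 2, m - 1) + Fraction(n - m, n) * w(n - 2, m)

print(round(optimal_m(13), 2))           # 1.44
print(float(w(13, 1)), float(w(13, 2)))  # one mafia member is too few, two too many
```

With 𝟣𝟥 players the exact recursion gives the mafia roughly a 𝟥𝟦% chance with one member and roughly 𝟨𝟢% with two, so the balanced (non-integer) choice does sit between one and two.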
One final thing that I’d like to point out is the effect of the parity of 𝑛 on 𝑤(𝑛, 𝑚) and the optimal
value of 𝑚. We saw in the case of one mafia member how much of a difference it makes, and we see
it again here. It becomes even clearer when we plot the optimal value of 𝑚 against 𝑛:

A graph of the optimal number of mafia, 𝑚, against 𝑛 for 𝑛 odd (green), and 𝑛 even (blue)
So there we have it: I don’t have any surefire winning strategy to reveal to you. But in a way, that’s
what I love so much about Mafia. Yes, maths can be used to play the game better, or to give an
idea of how to structure a perfect game, but maths doesn’t give a way to guarantee you’ll win. It
can inform wiser choices, but ultimately it comes down to how much you trust your friends. And
if there’s one thing I’ve learnt playing Mafia, it’s that you can never truly trust your friends.

Sophie Maclean
Sophie has never been to UCL, is not a student, and is not a proper adult either. Sophie is
the impostor.
@SophieTheMathmo

“Prof. Fafner hoards his supply of Hagoromo!”

Smitha Maretvadakethope

Which number system are you?
[Flowchart. Starting from START, the questions include “01000010 01101001?”, “How do you like your potatoes?”, “What number is on your taxi?”, “What is this? …75303.1”, “How many mathematicians walk into the hotel bar?”, “Which hand do you write with?” and “What is 0.99999…?”. The possible results are Roman numerals, binary, hexadecimal, Cistercian, tallies, cuneiform, p-adics, the π chart, fractions, decimals, continued fractions and surreals.]

On conditional probability:
Cards, Covid, and Crazy Rich Asians
Madeleine Hall

I was watching the film Crazy Rich Asians the other day, as there’s not a lot to do at the moment
besides watching Netflix and watching more Netflix. I thoroughly enjoyed the film and would
highly recommend it. However, there was something that happened in the first few minutes
which really got me thinking and inspired the subject of this Chalkdust article.
The main character in the film is an economics professor (the American kind where you can achieve
the title of professor while still being in your 20s and without having to claw your way over a pile
of your peers to the top of your research field). Within the opening scenes of the film we see her
delivering a lecture, in which she is playing poker with a student, while also making remarks about
how to win using ‘mathematical logic’. The bell rings, seemingly halfway through the lecture, the
way it always does in American films and TV shows, and our professor calls out to her students
“…and don’t forget your essays on conditional probability are due next week!” Now, I am not going
to delve into the question of what type of economics course she is teaching that involves playing
poker and mathematical logic, but it got me thinking—what exactly would an ‘essay on conditional
probability’ entail?

What is conditional probability?


Conditional probability is defined as a measure of the probability of an event occurring, given that
another event (by assumption, presumption, assertion, or evidence) has already occurred. For all
intents and purposes here, for two events 𝐴 and 𝐵, we’ll write the conditional probability of 𝐴 given
𝐵 as 𝑃(𝐴 | 𝐵), and define it as
$$
P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}.
$$
The little bar | can just be thought of as meaning ‘given’.

Now that we’ve got some technicalities out of the way, let’s look at some examples of conditional
probability. Imagine you are dealt exactly two playing cards from a well-shuffled standard 52-card
deck. The standard deck contains exactly four kings. What is the probability that both of your cards
are kings? We might, naively, say it must be simply $(4/52)^2 \approx 0.59\%$, but we would be
gravely mistaken. There are four chances that the first card dealt to you (out of a deck of 52) is a
king. Conditional on the first card being a king, there remain three chances (out of a deck of 51)
that the second card is also a king. Conditional probability then dictates that:

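$$
\begin{aligned}
P(\text{both are kings}) &= P(\text{second is a king} \mid \text{first is a king}) \times P(\text{first is a king})\\
&= \frac{3}{51} \times \frac{4}{52} \approx 0.45\%.
\end{aligned}
$$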

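Because a deck is small, the answer can also be checked by brute force: enumerate every ordered two-card deal and count, applying the definition of conditional probability directly. A sketch; the deck encoding is our own.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

deals = list(permutations(DECK, 2))  # 52 * 51 equally likely ordered deals
first_king = [d for d in deals if d[0][0] == "K"]
both_kings = [d for d in first_king if d[1][0] == "K"]

p_first = Fraction(len(first_king), len(deals))                    # 4/52
p_second_given_first = Fraction(len(both_kings), len(first_king))  # 3/51
p_both = Fraction(len(both_kings), len(deals))

# P(both) = P(second | first) × P(first), exactly as in the text.
assert p_both == p_second_given_first * p_first
print(p_both, float(p_both))  # 1/221, about 0.45%
```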
The events here are dependent upon each other, as opposed to independent. In the realm of prob-
ability, dependency of events is very important. For example, coin tosses are always independent
events. When tossing a fair coin, the probability of it landing on heads, given that it previously
landed on heads 10 times in a row, is still 𝟣/𝟤. Even if it lands on heads 1000 times in a row, the chance of
it landing on heads on the 1001st toss is still 50%.

Bayes’ theorem
Any essay on conditional probability would be simply incomplete without a mention of Bayes’ the-
orem. Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions
that might be related to the event. It is stated mathematically as:

Bayes’ theorem

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.
$$


We can derive Bayes’ theorem from the definition of conditional probability above by considering
𝑃(𝐴 | 𝐵) and 𝑃(𝐵 | 𝐴), and using that 𝑃(𝐴 and 𝐵) equals 𝑃(𝐵 and 𝐴).
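Written out, the derivation takes two lines: apply the definition of conditional probability twice and equate the two expressions for the joint probability (assuming 𝑃(𝐴) > 𝟢 and 𝑃(𝐵) > 𝟢, so that both conditional probabilities are defined):

```latex
\begin{align*}
P(A \text{ and } B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A),\\
\text{so}\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}.
\end{align*}
```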
A fun (and topical!) example of Bayes’ theorem arises in a medical test/screening scenario. Suppose
a test for whether or not someone has a particular infection (say scorpionitis) is 90% sensitive, or
equivalently, the true positive rate is 90%. This means that the probability of the test being positive,
given that someone has the infection is 0.9, or 𝑃(positive | infected) = 𝟢.𝟫. Now suppose that this
is a somewhat prevalent infection, and 6% of the population at any given time are infected, ie
𝑃(infected) = 𝟢.𝟢𝟨. Finally, suppose that the test has a false positive rate of 5% (or equivalently,
has 95% specificity), meaning that 5% of the time, if a person is not infected, the test will return a
positive result, ie 𝑃(positive | not infected) = 𝟢.𝟢𝟧.
Now imagine you take this test and it comes up positive. We can ask, what is the probability that
you actually have this infection, given that your test result was positive? Well,
$$
P(\text{infected} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{infected})\, P(\text{infected})}{P(\text{positive})}.
$$
We can directly input the probabilities in the numerator based on the information provided in the
previous paragraph. For the 𝑃(positive) term in the denominator, this probability has two parts to
it: the probability that the test is positive and you are infected (true positive), and the probability
that the test is positive and you are not infected (false positive). We need to scale these two parts
according to the group of people that they apply to—either the proportion of the population that are
infected, or the proportion that are not infected. Another way of thinking about this is considering
the fact that 𝑃(positive) = 𝑃(positive and infected) + 𝑃(positive and not infected). Thus, we have

𝑃(positive) = 𝑃(positive | infected)𝑃(infected) + 𝑃(positive | not infected)𝑃(not infected).

And we can infer all the probabilities in this expression from the information that’s been given.
Thus, we can work out that
$$
P(\text{infected} \mid \text{positive}) = \frac{0.9 \times 0.06}{0.9 \times 0.06 + 0.05 \times 0.94} \approx 0.5347.
$$

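The same calculation as two small functions. This is a sketch; the parameter names are our own, and the denominator is computed with the law of total probability exactly as in the paragraph above.

```python
def p_positive(sensitivity: float, false_positive_rate: float, prevalence: float) -> float:
    """P(positive) = P(pos | inf) P(inf) + P(pos | not inf) P(not inf)."""
    return sensitivity * prevalence + false_positive_rate * (1 - prevalence)

def posterior_infected(sensitivity: float, false_positive_rate: float, prevalence: float) -> float:
    """P(infected | positive) via Bayes' theorem."""
    return sensitivity * prevalence / p_positive(sensitivity, false_positive_rate, prevalence)

print(round(p_positive(0.9, 0.05, 0.06), 3))          # 0.101
print(round(posterior_infected(0.9, 0.05, 0.06), 4))  # 0.5347
```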
Unpacking this result, this means that if you test positive for an infection, and if 1 in 17 people in
the population (approximately 6%) are infected at any given time, there is an almost 50% chance
that you are not actually infected, despite the test having a true positive rate of 90% and a false
positive rate of 5% (compare to the proportion of the shaded area in the diagram filled by infected
people). That seems pretty high. The takeaway from this example is that the probability that you
have an infection, given that you test positive for said infection, depends not only on the accuracy
of the test, but also on the prevalence of the disease within the population.
In a population with 6% infected, this test will come back positive in 10.1% of cases.


Unprecedented applicability
Of course, in a real-world scenario, it’s a lot more complicated than this. For something like (and,
apologies in advance for bringing it up) Covid-19, the prevalence of infection (our 𝑃(infected) value)
changes with time. For example, according to government statistics, the average number of daily
new cases in July 2020 was approximately 667, whereas in January 2021 it was 38,600. Furthermore,
𝑃(infected) depends on a vast number of factors including age, geographical location, and physical
symptoms to name only a few. Still, it would be nice to get a sense of how Bayes’ theorem can be
applied to these uNpReCeDeNtEd times.
An article from the UK Covid-19 lateral flow oversight team (catchy
name, I know) released on 26 January 2021 reported that lateral
flow tests (which provide results in a very short amount of time
but are less accurate than the ‘gold standard’ PCR tests) achieved
78.8% sensitivity and 99.68% specificity in detecting Covid-19 in-
fections. In the context of probabilities, this means that

𝑃(positive | infected) = 𝟢.𝟩𝟪𝟪, and


𝑃(positive | not infected) = 𝟢.𝟢𝟢𝟥𝟤.

On 26 January 2021, there were 1,927,100 active cases of Covid-19 in the UK. Out of a population
of 66 million, this gives us a prevalence of approximately 3%, or 𝑃(infected) = 𝟢.𝟢𝟥. Taking all
these probabilities into account, we have
$$
P(\text{infected} \mid \text{positive}) = \frac{0.788 \times 0.03}{0.788 \times 0.03 + 0.0032 \times 0.97} \approx 0.8839,
$$
which means that the chance of you actually having Covid-19, given that you get a positive result
from a lateral flow test, is about 88%. This seems pretty good, but can we make this any better?
If 3% are infected with Covid-19, a lateral flow test will come back positive 2.7% of the time.
If the prevalence increases to 80%, you can be much more certain of a positive result, but there are
also more false negatives.
Instead of just taking the number of active cases as a percentage of the total population of the UK
to give us our prevalence, we can alternatively consider 𝑃(infected) for a particular individual.
For someone who has a cough, a fever, or who recently interacted with someone who was then
diagnosed with Covid-19, we could say that their 𝑃(infected) is substantially higher than the
overall prevalence in the country. The article ‘Interpreting a Covid-19 test result’ in the BMJ
suggests a reasonable value for such an individual would be 𝑃(infected) = 𝟢.𝟪. It’s worth
mentioning that this article has a fun interactive tool where you can play around with sensitivity
and specificity values to see how this affects true and false positivity and negativity rates. Taking
this new value of prevalence, 𝑃(infected), into account, we get
$$
P(\text{infected} \mid \text{positive}) = \frac{0.788 \times 0.8}{0.788 \times 0.8 + 0.0032 \times 0.2} \approx 0.9990,
$$
giving us a 99.9% chance of infection given a positive test result, which is way closer to certainty
than the previous value of 88%.
Can we do any better than this? Well, compared with the lateral flow Covid-19 tests, it has been
found that PCR tests (which use a different kind of technology to detect infection) have substan-
tially higher sensitivity and specificity. Another recent article in the BMJ published in September
2020 reported that the PCR Covid-19 test has 94% sensitivity and very close to 100% specificity. In
a survey conducted by the Office for National Statistics in the same month, they measured how
many people across England and Wales tested positive for Covid-19 infection at a given point in
time, regardless of whether they reported experiencing symptoms. In the survey, even if all posi-
tive results were false, specificity would still be 99.92%. For the sensitivity and specificity reported
in the BMJ article, this is equivalent to having a false negative rate of 6% and a false positive rate
of 0%. If we plug these numbers in, regardless of what the prevalence is taken to be, we have:

$$
P(\text{infected} \mid \text{positive}) = \frac{0.94 \times P(\text{infected})}{0.94 \times P(\text{infected}) + 0 \times P(\text{not infected})} = 1.
$$
So when a test has a false positive rate of almost 0%, if you achieve a positive test result, there is
essentially a 100% chance that you do in fact have Covid-19.
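The three Covid-19 scenarios above differ only in their inputs. A sketch reproducing them with the figures quoted in this section; the Bayes function is redefined so the block stands alone, and the 3% prevalence used for the PCR line is our own choice, since with a 0% false positive rate any nonzero prevalence gives the same answer.

```python
def posterior_infected(sensitivity: float, false_positive_rate: float, prevalence: float) -> float:
    """P(infected | positive) via Bayes' theorem and the law of total probability."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

SCENARIOS = {
    "lateral flow, 3% prevalence": (0.788, 0.0032, 0.03),
    "lateral flow, 80% prevalence": (0.788, 0.0032, 0.8),
    "PCR (0% false positives), 3% prevalence": (0.94, 0.0, 0.03),
}

results = {name: posterior_infected(*args) for name, args in SCENARIOS.items()}
for name, p in results.items():
    print(f"{name}: {p:.4f}")
```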
So what can we take away from this? Well, we have seen that if a test has higher rates of sensitiv-
ity and specificity, then the probability of the result being a true positive is also higher. However,
prevalence and the value of the probability of infection also play a big role in this scenario. This
could be used as an argument for targeted testing only, for example if only people with symptoms
were tested then this would increase the probability of the result being a true positive. Unfortu-
nately, it is the case that a large number of Covid-19 infections are actually asymptomatic—in one
study it was found that over 40% of cases in one community in Italy had no symptoms. So, if only
people with symptoms were tested, a lot of infections would be missed.
In conclusion, I’m no epidemiologist, just your average mathematician, and I don’t really have any
answers. Only that conditional probability is actually pretty interesting, and it turns out you can
write a whole essay on it. The ending of Crazy Rich Asians was much better than the ending to this
article. Go watch it, if you haven’t already.
Bae’s theorem
$$
P(\text{Netflix} \mid \text{chill}) = \frac{P(\text{chill} \mid \text{Netflix})\, P(\text{Netflix})}{P(\text{chill})}
$$

Madeleine Hall
Madeleine is a pHd StUdEnT in mathematics and behavioural genomics at Imperial College
London. She likes open water swimming, toast, the Oxford comma, and tHiS mEmE. She
has found none of her optimised strokes of any use in the Serpentine.
@maddygracehall

THE BIG ARGUMENT
THIS ISSUE… IS THE EINSTEIN SUMMATION CONVENTION WORTH IT?

The Einstein summation convention is a way to write and manipulate vector equations in many
dimensions. Simply put, when you see repeated indices, you sum over them, so $\sum_{i=1}^{N} a_i b_i$
is written $a_i b_i$, for example.

YES, ARGUES ELLEN JOLLEY
This debate boils down to just one question: how much of your life do you spend doing tensor
algebra? Those of us who undertake a positive amount of tensor algebra or vector calculus know
that the goal is to be done with it as fast as possible! Try tensor algebra for even five minutes
without using the summation convention—I promise you will tire of constantly explaining “yes, the
sum still starts from 𝟣, and yes, it still goes to 𝑁.”
You’ll scream, “All of them! I am summing over all indices! Obviously! Why’d I ever skip some??”
If you’re confused about how many you’ve got, use this simple guide: physicists use four; fluid
dynamicists use three; and Italian plumbers use two. Wouldn’t it be nice to avoid saying this in
every equation?
You may cry that it’s easier to make mistakes with the convention; but for applied mathematicians,
the joy comes in speeding ahead to the answer by any means—time spent on accuracy and proof is
time wasted. And as the great mathematician Bob Ross said: there are no mistakes, just happy little
accidents!

NO, ARGUES SOPHIE MACLEAN
Before writing this argument, I had to Google ‘summation convention’, which is all the evidence I
need for why it’s just not worth it. I’ve learnt how to use the convention—multiple times! In fact,
I’d say it’s something I’m able to use, yet I’m still not sure I know exactly what it is.
Some of our readers won’t have ever heard of it (which is one strike against it). Some have heard of
it but won’t know much about it (another strike). But I guarantee none would be confident saying
they can use it without making any errors (if you think you would be, you’re in denial).
We don’t even have need for the convention! We already have a suitable way to notate summation:
∑.
It’s taught to schoolkids. There is no ambiguity. And it’s so much less pretentious. Yes, the
summation convention is fractionally faster to write out, but mathematicians are famed for being
lazy and aloof—maybe dispensing with it is all we need to break that stereotype!

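To see what the convention abbreviates, here is a small sketch that writes the implied sums out explicitly in plain Python. The function names and matrices are our own illustrations; NumPy users may know `numpy.einsum`, which takes index strings in the same spirit.

```python
def dot(a, b):
    # a_i b_i: the repeated index i is summed.
    return sum(a[i] * b[i] for i in range(len(a)))

def matmul(A, B):
    # C_ik = A_ij B_jk: j is repeated, so it is summed; i and k stay free.
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(inner)) for k in range(cols)]
            for i in range(rows)]

def trace(A):
    # A_ii: both indices are the same, so we sum down the diagonal.
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(matmul(A, B))               # [[19, 22], [43, 50]]
print(trace(A))                   # 5
```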

Moonlighting agony uncle Professor Dirichlet answers your personal problems. Want the prof’s help? Contact deardirichlet@chalkdustmagazine.com

Dear Dirichlet,
As a successful author on spies who are also fish, I’m looking to branch out a little. What with the
number of streaming platforms, I’m hoping I can get a TV company to make my series of novels into
a ten-episode drama. But it feels like a buyers’ market—how can I hook a producer? Let minnow!
— Micholas Herron, Oxford

■ dirichlet says: May I recommend the school market. Each year there
is a new set of 7-year-olds looking to be entertained. For example,
I am about to pitch the BBC my Downton Abbey / second world war /
great railway infrastructure crossover series for children, with all
the characters played by simple 3D shapes. I have already written
to Cube-onneville, Dame Sphera Lynne and Prismbard Kingdom Brunel.
(Still waiting for a reply from the latter two; Cube’s on board.)

Dear Dirichlet,
For my birthday this year my partner bought me some new jeans. But when I get them out of the
wardrobe, I always find matchsticks in the pockets! During the day, my partner will come over, put
his hand in one of my pockets and pull out some of the matches! Am I supposed to do the same? Does
he not remember the moral from the 1990s public information film Frances the Firefly?... Never play
with matches!
— The Wrong Trousers, Wigan

■ dirichlet says: Félicitations! You’ve been given the latest in
French fashion: couture deNim! But also com-misère-ations: nobody’s
going to remove the last matchstick for you. If you’re happy to play
along, sew up all but two pockets and keep the sticks in each pocket
equal. Failing that, I suggest an eXORcism to heal these obviously
cursed jeans. A word to the wise: run away if your partner offers
you chocolate where you are only allowed to eat a square if you also
eat those that are below it and to its right.


Dear Dirichlet,
I’m putting on a hilarious satirical political play at the Zoom theatre next month but I am having
difficulty finding the right actors for the job. One brilliant scene involves multiple copies of the
chancellor of the duchy of Lancaster chattering over each other. Genius! Not sure why I can’t find
any faces so far.
— Kimberly Donglesworth, Newcastle

■ dirichlet says: In general, when drawing up your CAST, one should
go anticlockwise from the fourth quadrant. But anyway, pop along to
your local colliery and see if you can convince a square number of
employees there to stand on an oversized chessboard. Ask everyone
on a white square to stand on their heads. Bob’s your uncle! Your
matrix of miners has become a matrix of ‘Gove actors’.

Dear Dirichlet,
It’s all gone wrong! Our village is running an event to drum up some business from visitors to the
local safari park, but we’ve attracted the wrong sort of guests—the animals have escaped! Huge beasts
are marauding over the village’s central grass area, which we repurposed for the festival tents.
How can we get rid of the invaders? We don’t have the skills!
— Ray & Dave, Devon

■ dirichlet says: Sounds like a mammoth task, but ivorything’s going
to be OK. Given that you’ve already set up the village Green’s function,
just keep the noise down: you want the volume nice and discrete.
Then naturally the animals should head to the village outskirts. I
call this... the boundary elephant method. (Pass my regards to the
catering team: as Hank ‘Hankie’ Williams used to say, “Hey Galerkin,
what you got cookin’?”)

Dear Dirichlet,
Over lockdown I have become a bit of a Twitter celebrity. How do I capitalise on my success?
— Brabara Barrington, Wellington

■ dirichlet says: ON MY SUCCESS.

More Dear Dirichlet, including seasonal specials, online at chalkdustmagazine.com


How to be the least popular American president
Brian Copeland, CC BY 2.0

Francisco Berkemeier

Some people say the US presidential election system is unfair, since one candidate can win
the popular vote—meaning there are more people voting for that candidate than for other
candidates—but still fail to win the election. In that case, the margin by which that candidate wins the
popular vote is irrelevant to the election outcome, in the sense that if you didn’t count the extra votes,
the result would be the same. This is the result of how the electoral system is designed: the presidency
is not determined by the popular vote, but by a system called the electoral college, which distributes
538 electoral college votes among the 50 states and DC.

A state’s electoral votes are equal to the number of representatives and senators the state has in
Congress. House seats are apportioned based on population and so are representative of a state’s
population, but the extra two Senate seats per state give smaller states more power in an election.
The electoral college is supposed to guarantee that populous states can’t dominate an election, but it
also sets up a disparity in representation that, to some degree, misrepresents every state. As a result,
five times since the founding of the republic, a president has won an election without winning the
popular vote. Let me invite you to a thought experiment on the implications of such a system in an
extreme scenario.

Martin Falbisoner, CC BY-SA 3.0
Electoral college votes correspond to seats in Congress, plus three additional votes for DC.


How to win the presidency with only 22% of the vote


We could ask how much candidate 𝐿 (loser) can win the popular vote by and still lose the election. A
possible strategy is to first let candidate 𝑊 (winner) marginally win enough states to guarantee at
least 270 electoral votes (a majority of the 538). Then, in the remaining states, award candidate 𝐿 all
of the votes in those states. Schematically,
• If 𝑊 wins a state, they win it with one or two more votes than 𝐿 (depending on the parity of
the total number of voters);
• If 𝐿 wins a state, they get 100% of the votes from that state.
In fact, this is the optimal strategy, since in the states where candidate 𝑊 wins, the popular vote
difference is negligible, and the remaining states only increase the popular vote for candidate 𝐿,
which is what we want. Any other vote-per-state distribution would decrease the popular vote
difference. With our maximising strategy chosen, the question then becomes: how should we
distribute the states between the two candidates?
A natural way to solve this problem is to use linear programming (strictly, integer linear programming,
since each state goes wholly to one candidate). This method is used to optimise a certain outcome
(for example, maximising profit or minimising costs) given certain restrictions that are represented
by linear relationships. In our case, we want to minimise the popular vote for 𝑊 given that the total
number of electoral votes they win is greater than or equal to 270. Notice that with the strategy
mentioned above, this is exactly the same question as maximising the popular vote difference. In
fact, in the optimal solution 𝑊 wins with precisely 270 electoral votes.
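Readers who want to experiment can try the minimal sketch of that optimisation below, which rests on our own simplifying assumptions. Because each state goes wholly to one candidate, the choice is a 0/1 knapsack problem: 𝑊 "buys" a state's electoral votes at the price of a bare majority of its voters. Instead of calling a linear-programming solver, the sketch swaps in dynamic programming over electoral-vote totals. The function names and the four-state toy data are hypothetical. They illustrate the method and are not the author's code or real voter counts.

```python
from typing import Dict, List, Tuple


def votes_needed_to_win(voters: int) -> int:
    """Fewest votes W needs to carry a two-candidate state: one vote ahead of L
    if the number of voters is odd, two votes ahead if it is even."""
    return voters // 2 + 1


def min_popular_vote(states: Dict[str, Tuple[int, int]], target: int) -> Tuple[float, List[str]]:
    """Pick states for W so that W reaches at least `target` electoral votes with
    as few popular votes as possible (a 0/1 knapsack solved by dynamic programming).

    `states` maps a (hypothetical) state name to (electoral_votes, voters).
    Every state not picked is assumed to give 100% of its votes to L."""
    total_ev = sum(ev for ev, _ in states.values())
    # best[e] = (fewest W votes, states used) to collect exactly e electoral votes
    best: List[Tuple[float, List[str]]] = [(float("inf"), [])] * (total_ev + 1)
    best[0] = (0, [])
    for name, (ev, voters) in states.items():
        cost = votes_needed_to_win(voters)
        # loop downwards so that each state is used at most once
        for e in range(total_ev, ev - 1, -1):
            prev_cost, prev_states = best[e - ev]
            if prev_cost + cost < best[e][0]:
                best[e] = (prev_cost + cost, prev_states + [name])
    # W may end up with more than `target` electoral votes
    return min(best[target:], key=lambda entry: entry[0])


toy_states = {"A": (3, 100), "B": (5, 300), "C": (4, 150), "D": (2, 50)}
w_votes, w_states = min_popular_vote(toy_states, target=8)
print(w_votes, w_states)   # 153 ['A', 'C', 'D']
```

With real inputs (51 entries including DC, 538 electoral votes in total and `target=270`), the same function would return 𝑊's minimum popular vote. Rerunning it with adjusted voter counts is one way to explore variations of the scenario.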
Assuming maximal turnout in this extreme scenario, so that every registered voter votes, there are
about 214 million people voting. The calculations then tell us that 𝐿 wins the popular vote with
roughly 122m more votes than 𝑊. This is more than three times the population of Canada! If those
122m votes (57% of all the votes) weren’t cast, the result would remain the same. Furthermore,
candidate 𝐿 gets 168m votes, which is approximately 78% of the total votes, and still loses! The
electoral map in such a situation is below. See the table in the online version of this article for a
detailed breakdown of the voting.

      EV    PV
𝑊     270   46m (22%)
𝐿     268   168m (78%)

Map data ©OpenStreetMap contributors

US map in an extreme election scenario with two candidates and their respective electoral votes
(EV) and popular votes (PV) in millions. The winner 𝑊 is in yellow. Registered voters data from
worldpopulationreview.com
Usually, electoral votes more or less align with the popular vote. However, a number of times in
US history, the person who took the White House did not receive the most popular votes. Our
scenario is obviously extreme, but it is mathematically possible and raises the questions: should
someone who only gets 22% of the popular vote really be the president? Should the US have a
system that allows the possibility of over 100 million voters being irrelevant? Is that really fair?


More than two candidates


Another curious case to consider is the mathematical consequence of having more than two
candidates running for the presidency, as seen for example in the electoral college systems employed
by Germany or India. In the US, even though there are typically other candidates running with
other parties or independently, the race usually comes down to two sides. Assuming a tight race
between 𝑛 candidates, we can explore various questions within the same extreme scenario. For
instance, if 𝑛 candidates run, what is the maximum popular vote difference between candidate 𝑊
(who wins the election with the most electoral college votes) and the total popular vote of
candidates 𝐿₁, …, 𝐿ₙ₋₁? Furthermore, is it possible to make every single candidate win more
popular votes than 𝑊 and still lose?

In reality, the US election system has a separate process, called a contingent election, to decide
the presidency if no candidate gains more than half of the electoral college votes. However, in this
scenario any of the top three candidates according to the electoral vote could win, and so this
process isn’t amenable to simple mathematical modelling.

Let’s consider the case of three candidates. Running the model shows that this is indeed possible,
resulting in:

      EV    PV
𝑊     181   19m (9%)
𝐿₁    178   97m (45%)
𝐿₂    179   98m (46%)

Map data ©OpenStreetMap contributors

US map in an extreme election scenario with 3 candidates.

In this case, 𝑊 wins with only 9% of the popular vote, while candidates 𝐿₁ and 𝐿₂ get 45% and 46%
of the popular vote, respectively. Notice that, in the states that 𝑊 wins, each candidate gets roughly
1/3 of the votes in that state (for 𝑛 candidates, they would get roughly 1/𝑛), with 𝑊 marginally
winning, and in the remaining states, the winning candidate still gets 100% of the votes. This could
naturally be distributed differently, since there is now more than one losing candidate, but we have
kept the same idea as before for simplicity.
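The "roughly 1/𝑛" split can be made exact. Suppose a state has 𝑣 voters and 𝑛 candidates, and each loser must stay at least one vote behind the winner. The winner then needs the smallest 𝑤 with 𝑣 − 𝑤 ≤ (𝑛 − 1)(𝑤 − 1), which is 𝑤 = ⌈(𝑣 + 𝑛 − 1)/𝑛⌉. For 𝑛 = 2 this reduces to the bare majority used earlier. The sketch below is our own illustration: both function names are hypothetical, and the even split of the losers' votes is only one of several valid choices.

```python
from typing import List


def min_plurality_votes(voters: int, candidates: int) -> int:
    """Fewest votes that still wins a state outright when `voters` ballots are
    shared among `candidates` candidates (every loser at least one vote behind)."""
    if candidates < 2:
        raise ValueError("need at least two candidates")
    # smallest w with voters - w <= (candidates - 1) * (w - 1),
    # ie w = ceil((voters + candidates - 1) / candidates), in integer arithmetic
    return -(-(voters + candidates - 1) // candidates)


def loser_split(voters: int, candidates: int) -> List[int]:
    """Winner's votes first, then one way to share the rest as evenly as possible."""
    w = min_plurality_votes(voters, candidates)
    losers = candidates - 1
    base, extra = divmod(voters - w, losers)
    return [w] + [base + 1] * extra + [base] * (losers - extra)


print(loser_split(12, 3))     # [5, 4, 3]
print(loser_split(1000, 6))   # [168, 167, 167, 166, 166, 166]
```

For large electorates, 𝑤 is essentially 𝑣/𝑛, which is why each candidate gets about 1/𝑛 of the votes in the states 𝑊 wins.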

      EV   PV
𝑊     92   4m (2%)
𝐿₁    89   43m (20%)
𝐿₂    89   43m (20%)
𝐿₃    90   42m (19%)
𝐿₄    89   41m (19%)
𝐿₅    89   42m (20%)

Map data ©OpenStreetMap contributors

Allowing more candidates results in bizarre, yet possible, popular vote differences for different
numbers of candidates.


Solving the minimisation problem for larger values of 𝑛 is still possible and yields more interesting
results. The map above shows the outcome of an election with six candidates, where the winner
gains the smallest popular vote, and the graph below shows the maximal popular vote difference up
to 𝑛 = 8. For each 𝑛, every losing candidate gains a higher popular vote, but fewer (or equal)
electoral votes than the winner. The fate of some states seems not to change with 𝑛. Interestingly,
when 𝑛 = 6, 𝑊 wins with only 2% of the popular vote! You might also notice that as the number of
candidates increases, more or less every vote becomes irrelevant.

Graph: popular vote difference (millions, axis from 100 to 220) against number of candidates 𝑛
(from 2 to 8), with the total eligible voting population marked for reference.
Maximal popular vote difference for 2 to 8 candidates—effectively the number of irrelevant votes.

Disenfranchisement laws
We can also study the impact of felony disenfranchisement laws that prevent millions of Americans
from voting due to their felony convictions. Rates of disenfranchisement vary dramatically by state
due to broad variations in voting prohibitions. For example, in 27 states felons lose their voting
rights only while incarcerated, and receive automatic restoration upon release or after a period
of time. In another 11 states, voting rights are lost indefinitely for some crimes, while in three
states (namely DC (OK not technically a state), Maine, and Vermont), felons never lose their right
to vote, even while they are incarcerated. As of 2020, some of the key numbers can be summarised
as follows:
• An estimated 5.2 million people are disenfranchised due to a felony conviction.
• One out of 44 adults—2.3% of the total eligible US voting population—is disenfranchised due
to a current or previous felony conviction.
• The disenfranchisement distribution across correctional populations goes as follows: post-
sentence (43%), prison (24%), probation (22%), parole (10%) and other (1%).

Map data ©OpenStreetMap contributors

(a) Felony disenfranchisement rates (%), 2020.
(b) The results: 𝑊 is yellow, 𝐿 is green.

US map in an extreme election scenario without disenfranchisement laws.


In this case, it is natural to study the disenfranchisement rates per state, shown in heat map (a)
above, which represents the disenfranchised population as a percentage of the adult voting-eligible
population in each state. Assuming again an extreme scenario where every felon can vote, we can
redo the optimisation problem with these new numbers. Map (b) shows the results under this
assumption.
In this case, 𝑊 wins with 21% of the popular vote, while candidate 𝐿 gets 79%, which is not that
different from the previous case. We could then conclude that taking felon votes into account
doesn’t dramatically change this extreme scenario. That’s not to say that disenfranchisement is
completely irrelevant: four states changed fate, namely Indiana, Missouri, Maryland and Georgia.
Of course, a more sophisticated model which accounts for the political landscape in the US may
find that one party is more affected by felony disenfranchisement than another.

Final thoughts
Naturally, there are many other parameters and assumptions that could be included in our testing,
but I suspect that there will still be the possibility of candidates winning with much less than a
majority of the popular vote. For instance, what would happen if DC or Puerto Rico became a
state? Depending on how such a change affected the number of electoral votes allocated to each
state, perhaps a different distribution of states would emerge, but the overall disparity in an extreme
scenario should persist. We could even extend these ideas to other election systems and challenge
them by considering extreme scenarios.
The main goal of this article was to highlight and discuss, in an overly dramatic manner, some
of the issues with the US electoral system from a purely mathematical perspective. The model
only looks at the implications of the electoral college in an extreme scenario, but I hope it is a
starting point to think about why the system works in the way that it does, and perhaps how it
could be adjusted to avoid the possibility of such unrepresentative outcomes. Again, I stress that
the oversimplification in the model does not do justice to the complex world that is politics, but, if
nothing else, it reveals some striking consequences of the US electoral system.

Francisco Berkemeier
Francisco is a mathematician born in China and raised in Portugal. After spending two years
in the deserts of Saudi Arabia doing a master’s at KAUST, he’s now a PhD student in math-
ematics and biology at University College London. He can be found playing the classical
guitar and ‘singing’ whenever cells and maths allow him to.
@fpberkemeier · franciscoberkemeier

Did you know...

…that a graph is non-planar if and only if it contains a subgraph that is homeomorphic to either $K_5$ or $K_{3,3}$.


On the cover
Cellular automata
Matthew Scroggs

The game of life—invented by John Conway (see pages 56–57) in 1970—is perhaps the most
famous cellular automaton. Cellular automata consist of a regular grid of cells (usually
squares) that are (usually, see page 38) either ‘on’ or ‘off’. From a given arrangement of
cells, the state of each cell in the next generation can be decided by following a set of simple rules.
Surprisingly complex patterns can often arise from these simple rules.
While the game of life uses a two-dimensional grid of squares for each generation, the cellular
automaton on the cover of this issue of Chalkdust is an elementary cellular automaton: it uses
a one-dimensional row of squares for each generation. As each generation is a row, subsequent
generations can be shown below previous ones.

Elementary cellular automata


In an elementary cellular automaton, the state of each cell is decided by its state and the state of
its two neighbours in the previous generation. An example of such a rule is shown below: in this
rule, a cell will be on in the next generation if its left neighbour, the cell itself and its right
neighbour are on, off and on in the current generation. A cellular automaton is defined by eight of
these rules, as there are eight possible states of three cells.

1 0 1 → 1
An example rule


In 1983, Stephen Wolfram proposed a system for naming elementary cellular automata. If on cells
are 1 and off cells are 0, all the possible states of three cells can be written out (starting with 111
and ending with 000). The states given to each middle cell in the next generation give a sequence of
eight ones and zeros, or an eight-digit binary number. Converting this binary number into decimal
gives the name of the rule. For example, rule 102 is shown below.

111  110  101  100  011  010  001  000
 0    1    1    0    0    1    1    0

Rule 102: so called because (0)1100110 is 102 in binary
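Wolfram's naming scheme is easy to check in code. In the small sketch below, the helper names are our own choice: one function turns the eight outputs into a rule number, and the other turns a number back into its look-up table.

```python
from typing import Dict

# The eight neighbourhoods in Wolfram's order, from 111 down to 000
NEIGHBOURHOODS = ["111", "110", "101", "100", "011", "010", "001", "000"]


def rule_number(outputs: str) -> int:
    """Name of a rule, given the new middle-cell states for 111, 110, ..., 000."""
    return int(outputs, 2)


def rule_table(number: int) -> Dict[str, int]:
    """Look-up table mapping each neighbourhood to the next state of its middle cell."""
    bits = format(number, "08b")   # eight binary digits, keeping leading zeros
    return {n: int(b) for n, b in zip(NEIGHBOURHOODS, bits)}


print(rule_number("01100110"))   # 102
print(rule_table(102)["001"])    # 1
```

Calling `rule_table(204)` maps every neighbourhood to its own middle digit, which is another way of seeing why rule 204 simply copies each row.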

Rule 102 is, in fact, the rule that created the pattern shown on the cover of this issue of Chalkdust.
To create a pattern like this, first start with a row of squares randomly assigned to be on or off:

1 1 0 1 0 0 0 1 0 1

You can then work along the row, working out whether the cells in the next generation will be on
or off. To fill in the end cells, we imagine that the row is surrounded by an infinite sea of zeros.

0 0 1 → 1
0 1 1 → 0
1 1 0 → 1

... and so on until you get the full second generation:

1 0 1 1 1 0 0 1 1 1 1
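The hand calculation above can be automated. This is a minimal sketch and not the code from the repository mentioned later. It assumes the row is surrounded by zeros and grows the row by one cell on each side every generation. Each neighbourhood is read as a three-bit number, and that bit of the rule number gives the new state, which is exactly Wolfram's naming scheme.

```python
from typing import List


def step(row: List[int], rule: int) -> List[int]:
    """One generation of an elementary cellular automaton.

    The row is treated as sitting in an infinite sea of zeros, and the result
    includes one extra cell on each side, since those cells may switch on."""
    padded = [0, 0] + row + [0, 0]
    new_row = []
    for i in range(1, len(padded) - 1):
        left, centre, right = padded[i - 1], padded[i], padded[i + 1]
        index = 4 * left + 2 * centre + right   # e.g. 001 -> 1, 110 -> 6
        new_row.append((rule >> index) & 1)     # that bit of the rule number
    return new_row


first = [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
print(step(first, 102))   # [1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]

# A few generations of rule 102; each row starts one cell further left,
# so earlier rows are indented by one extra space.
generations = 6
row = first
for g in range(generations):
    print(" " * (generations - g) + "".join("#" if cell else "." for cell in row))
    row = step(row, 102)
```

The trailing 0 in the first output is the extra cell on the right. It stays off under rule 102 because both 100 and 000 map to 0, so in practice the pattern only grows leftwards.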

If you continue adding rows, and colour in some of the regions you create, you will eventually get
something that looks like the picture to the right. It’s quite surprising that such simple rules can
lead to such an intricate pattern. In some parts, you can see that the same pattern repeats over and
over, but in other parts the pattern seems more chaotic.

The pattern gets a square wider each row. This is due to the state 001 being followed by 1: the
leftmost 1 in each row always produces a new 1 one square further left in the next row.

But just when you think you’re getting used to the pattern of some small and some slightly larger
triangles...

Surprise! There’s this huge triangle that appears out of nowhere.

Other rules
Rule 102 is of course not the only rule that defines a cellular automaton: there are 256 different
rules in total.
Some of these are particularly boring. For example, in rule 204 each generation is simply a copy
of the previous generation. Rule 0 is a particularly dull one too, as after the first generation every
cell will be in the off state.
111  110  101  100  011  010  001  000
 1    1    0    0    1    1    0    0

Rule 204 is one of the most boring rules as each new cell is a copy of the cell directly above it.

Some other rules are more interesting. For example, rules 30 and 150 make intricate patterns.

100 rows of rules 30 (left) and 150 (right) starting with a row of 100 cells in a random state
If you want to have a go at creating your own cellular automaton picture, you can find a template
to fill in on the inside back cover of this issue of Chalkdust. If you’d rather get a computer to do the
colouring for you, you can download the Python code I wrote to create the pictures in this article
from github.com/mscroggs/cellular-automata and try some rules out.
There are also many ways that you can extend the ideas to create loads of different automata. For
example, you could allow each cell to be in one of three states (‘on’, ‘off’, or ‘f’) instead of the
two we’ve been allowing. You could then choose a rule assigning one of the three states to each of
the 27 possible configurations that three neighbouring three-state cells could be in. But there are
7,625,597,484,987 different automata you could make in this way, so don’t try to draw them all...
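Both counts quoted in this article follow from one formula. With 𝑘 states and three-cell neighbourhoods there are 𝑘³ configurations, and a rule picks one of 𝑘 states for each of them, giving 𝑘^(𝑘³) rules. A quick check (the function name is ours):

```python
def count_rules(states: int, neighbourhood: int = 3) -> int:
    """Number of possible rules: each of the states**neighbourhood configurations
    can be sent to any of the `states` possible next states."""
    configurations = states ** neighbourhood
    return states ** configurations


print(count_rules(2))   # 256
print(count_rules(3))   # 7625597484987
```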

Matthew Scroggs
Matthew is a postdoctoral researcher at the University of Cambridge. He hasn’t had time to
play Klax since the noughties, but he’s pretty sure that Coke is it!
mscroggs.co.uk · @mscroggs · mscroggs
My favourite game
Velocity Raptor by TestTubeGames
Jakob Stein

This online JavaScript (formerly Flash) game about special relativity is a fun way to wrap
your head around geometry that changes as you move. 𝐸/𝑚𝑐²


Who is the best England manager?
Flickr user Eric Kilby, CC BY-SA 2.0

Paddy Moore

My quest to answer this simple question began for the noblest of reasons—to win an argument
with my wife. We are both football fans and have been following England all our lives (the phrase
‘long-suffering’ has never been more apt). Our house is usually an oasis of calm and tranquillity,
but one thing is guaranteed to get things kicking off: was Sven-Göran Eriksson a good England
manager?
This year, we have had more time than usual at home together and the discussion has become
heated. I believe that Sven took a golden generation of England players and led them to
disappointing performances in three major tournaments; she points to the team reaching the last 8
at consecutive World Cups under his stewardship. I’ve been a maths teacher for over 25 years, so
surely I can prove I am correct using maths, right?†

† At this point, it is definitely not worth mentioning that in the 15 years we’ve been having this row,
I never thought of applying any maths to the problem until my wife suggested it.

Anders Henrikson, CC BY 2.0
Sven-Göran Eriksson wondering where his hat has gone.

How do I prove that I am right? Well, the first, and most obvious, thing to do is to look at the playing
record of Sven-Göran Eriksson. He was manager of England from January 2001 until July 2006. In
that time the team played 67 games and won 40 of them—a win percentage of 59.7%.

On its own that is a bit meaningless, so we need something to compare it to. It’s time for a
spreadsheet.

I am going to tidy up the data a little though. Firstly, up to 1946, there was no England coach, and
even under Walter Winterbottom the players were selected by committee, so I am going to exclude
them. Secondly, caretaker managers like Stuart Pearce or Joe Mercer (or Sam Allardyce, who was
sacked after one game for ‘reasons’) didn’t have enough games, so I’ll drop them from
consideration. That gives us the trimmed list below, ranked by winning percentage.

Tonywalt, CC BY-SA 3.0
Walter Winterbottom

This shows that Sven had a pretty average record, only just reaching the top half of the table. I was
feeling suitably pleased with myself for having come up with such a convincing statistic, only to be
shot down with, “Yeah, but a lot of those games were meaningless friendlies.” I mean, you could
argue that playing for England in any game is the pinnacle of a footballer’s career, and that
international friendlies are always important games, but I decided to look at this as it seemed
interesting (and I was confident that it would support my point even more).

Manager                Years       P     W    Win %
Fabio Capello          2008–2011   42    28   66.7%
Alf Ramsey             1963–1974   113   69   61.1%
Glenn Hoddle           1996–1999   28    17   60.7%
Ron Greenwood          1977–1982   55    33   60.0%
Sven-Göran Eriksson    2001–2006   67    40   59.7%
Gareth Southgate       2016–2020   49    29   59.2%
Roy Hodgson            2012–2016   56    33   58.9%
Steve McClaren         2006–2007   18    9    50.0%
Bobby Robson           1982–1990   95    47   49.5%
Don Revie              1974–1977   29    14   48.3%
Terry Venables         1994–1996   23    11   47.8%
Graham Taylor          1990–1993   38    18   47.4%
Kevin Keegan           1999–2000   18    7    38.9%

Selected England managers’ full international results (correct to Nov 2020)

Since we were looking at competitive internationals, I decided to look at the overall results record
rather than using just the win percentage. After all, there are three possible outcomes in football
and a draw has value (although this value varies with the opponent—a draw against Brazil is
generally seen as a fairly decent result, whereas a draw against Greece is not).

To calculate this, I used 3 points for a win and 1 for a draw. This has been the standard across
football since the 1980s as it rewards positive play. This may disadvantage the managers from
before it was introduced, because playing for a draw would have been more profitable in group
games and qualifiers, but I feel it is the best of the options available (and I’m trying to prove that
Sven was a negative manager and I think this will help me)...
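The win and points percentages in the next table are easy to reproduce. Below is a minimal sketch that assumes 3 points for a win and 1 for a draw, as described. The function name is only illustrative.

```python
from typing import Tuple


def record_summary(played: int, won: int, drawn: int) -> Tuple[float, float]:
    """Win percentage and points percentage, with 3 points for a win and 1 for a draw."""
    win_pct = 100 * won / played
    points = 3 * won + drawn
    points_pct = 100 * points / (3 * played)   # share of the points available
    return round(win_pct, 1), round(points_pct, 1)


# Sven-Göran Eriksson's competitive record from the table that follows: P 38, W 26, D 9
print(record_summary(38, 26, 9))   # (68.4, 76.3)
```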


Manager               P    W    D    L    F    A    Win %   Pts   Pts available   Pts %
Sven-Göran Eriksson   38   26   9    3    69   26   68.4%   87    114             76.3%
Fabio Capello         22   15   5    2    54   16   68.2%   50    66              75.8%
Ron Greenwood         26   17   5    4    48   17   65.4%   56    78              71.8%
Roy Hodgson           31   19   9    3    73   18   61.3%   66    93              71.0%
Glenn Hoddle          15   9    3    3    26   8    60.0%   30    45              66.7%
Don Revie             10   6    2    2    22   7    60.0%   20    30              66.7%
Alf Ramsey            33   20   6    7    56   29   60.6%   66    99              66.7%
Gareth Southgate      36   22   6    8    80   29   61.1%   72    108             66.7%
Bobby Robson          43   22   14   7    90   22   51.2%   80    129             62.0%
Terry Venables†       5    2    3    0    8    3    40.0%   9     15              60.0%
Graham Taylor         19   8    8    3    34   14   42.1%   32    57              56.1%
Kevin Keegan          11   4    3    4    17   10   36.4%   15    33              45.5%

Selected England managers’ full competitive international results (correct to Nov 2020)

This did not go well and there was a significant amount of smugness, which I felt was inappropriate
and irritating.
To be honest, this is a compelling result and I needed to come back strong if I was to maintain any
credibility in this argument. I felt a little disappointed that I had done all that work to prove this
important point and it wouldn’t be any use. I was starting to get concerned that manipulating
statistics to get the result I wanted was not working, when a thought occurred to me—I might be
able to use the Fifa ranking data to demonstrate that Sven-Göran Eriksson’s England team was
only able to beat lesser teams and often struggled against higher ranking sides. In short, I chose
to take a leaf from the Trump playbook—when you’re in trouble, smear the opposition.
OK, so it’s not classy but, in this case, I think it is a valid point to explore. Were most of Sven’s
competitive games against weaker opposition? This is a possibility because qualifiers and group
games are seeded, and so England would be facing so-called lesser teams. For example, let’s
consider the 2006 World Cup qualifying group below.

Team               Pld   Pts   Ranking
England            10    25    9
Poland             10    24    23
Austria            10    15    72 (=)
Northern Ireland   10    9     101
Wales              10    8     72 (=)
Azerbaijan         10    3     113

The final positions and 2005 Fifa world rankings of Uefa group 6 in the 2006 World Cup qualifying.

Apart from England, only Poland finished ranked in the world’s top 50 international teams, which
supports my contention that England were flat-track bullies under Sven. But this raised two
interesting‡ questions:
1. How are the Fifa rankings calculated?
2. How can I use them to win this argument?

† Terry Venables’ stats are decimated here because in the run up to Euro 96, England were only playing
friendlies, since they had already qualified as hosts.
‡ in my opinion


The Fifa ranking system


The Fifa ranking system was introduced in December 1992, and initially awarded teams points for
every win or draw, like a traditional league table. However, Fifa quickly† realised that there were
many other factors affecting the outcome of a football match and, over time,‡ moved to a system
based on the work of Hungarian–American mathematician Árpád Élő (more on him in a moment).

The Fifa rankings are not helpful to me because they don’t cover all the managers I’m considering,
and because their accuracy, reliability and the many methods used to generate them have always
been questioned. Luckily, football fans have had these arguments before and there is an Elo ranking
for all men’s international teams, which has been calculated back to the first international between
England and Scotland in 1872 (a disappointing goalless draw).

The Elo rating system compares the relative performance of the competitors in two-player games.
Although it was initially developed for rating chess players, variations of the system are used to rate
players in sports such as table tennis, esports and even Scrabble. Strictly speaking, we should be
saying an Elo system, rather than the Elo system, as each sport has modified the formula to suit its
own needs.

Wikimedia commons user BaldL, CC BY-SA 4.0
Competitors in an esports event.

So how does an Elo system calculate a ranking? Well, at the most basic level, each team has a
certain number of points and at the end of each game, one team gives some points to the other.
The number of points depends on the result and the rankings of the two teams. When the favourite
wins, only a few rating points will be traded, or even zero if there is a big enough difference in the
rankings (eg in September 2015, England beat San Marino 6–0, but no Elo points were exchanged).
However, if the underdog manages a surprise win, lots of rating points will be transferred (for
example, when Iceland beat England at Euro 2016, they took 40 points from England). If the ratings
difference is large enough, a team could even gain or lose points if they draw. So teams whose
ratings are too low or too high should gain or lose rating points until the ratings reflect their true
position relative to the rest of the group.

But how do you know how many points to add or take away after each game? Elo produced a
formula for this, but there is a bit of maths—brace yourself.

Firstly, Elo assumed that a team would play at around the same standard, on average, from one
game to the next. However, sometimes they would play better or worse, with those performances
clustered around the average. This is known as a normal distribution§ or bell curve, where
outstanding results are possible but rare. In the graph, the 𝑥-axis would represent the level of
performance, and the 𝑦-axis shows the probability of that happening. So, we can see that the
chance of an exceptional performance is smaller than that of an unremarkable one and the bulk of
games will have a middling level of skill shown.

† five years later
‡ Over twenty years. I mean, why use an established and respected system when you can faff about making your
own useless one? To be fair, the women’s rankings have used a version of the Elo system since their inception, which
may make Fifa’s unwillingness to use it for the men even stranger.
§ Elo uses a logistic distribution rather than the normal, but the differences are small (I mean, what’s a couple of
percent between friends?).

This means that if both teams perform to their standard, we can predict an expected score, which
Elo defined as their probability of winning plus half their probability of drawing. Because we do
not know the relative strengths of both teams, this expected score is calculated using their current
ratings and the formulas

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} \quad\text{and}\quad E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}.$$

In these formulas, $E_A$ and $E_B$ are the expected results for the teams, and $R_A$ and $R_B$ are their
ratings. If you plot a graph of the $E$ values for different values of $R_A - R_B$ you get the graph
shown below.

Graph: the expected score $E_A$ (from 0 to 1) against the difference in rating $R_A - R_B$ (from −1,000 to 1,000).
The expected score for a range of differences in team ratings.

It’s interesting† to note the shape of this graph, which is a sigmoid, a shape that anyone who has
drawn a cumulative frequency graph for their GCSE maths will recognise. It is an expression of the
area under the distribution (ie the cumulative distribution function). The graph shows that if the
difference between ratings is zero, the expected result is 0.5. The system uses values of 1 for a win,
0.5 for a draw and 0 for a loss, so this suggests the two teams are evenly matched. And if the
difference is 380 in your favour, the expected score is 0.9, which suggests you are likely to win‡.
The system then compares the actual result to the expected outcome and uses a relatively simple
formula§ to calculate the number of points exchanged:

$$R_A' = R_A + K\,(S_A - E_A).$$

In this equation, $R_A'$ is the new rating for team A, $S_A$ is the actual result of the game, and $K$ is a
scaling factor. We’ll come back to $K$ in a moment. Recently, England (rating 1969) played Belgium
(rating 2087) at the King Power stadium in Leuven, Belgium. It is generally thought that the home
team is at an advantage and to reflect this, the home team gets a bonus 100 points to their rating,
which means there is a 218-point difference between the teams. England are clear underdogs, and
we can calculate the expected result as follows:

$$E_A = \frac{1}{1 + 10^{(2187 - 1969)/400}} \approx 0.22$$

† Again, interesting to me.
‡ An $E_A$ of 0.9 doesn’t necessarily mean you’ll win 90% of the games and lose the rest, as other combinations also give
an expected score of 0.9. For example, winning 80% and drawing the rest, or winning 85%, drawing 10% and losing 5%,
gives the same value.
§ Honestly, it’s easier than it looks.


This shows that this will be a tricky game for England, and a draw would be a good result.
Unfortunately, England lost the game 2–0, an $S_A$ of 0 (still using 1 for a win, etc). Therefore we can
calculate the rating change using the formula:

$$R_A' = 1969 + K\,(0 - 0.22).$$

Now we need to understand the $K$ value. In simple terms, the bigger the $K$ value we use, the more
the rating will change with each result. We need to choose a suitable value so that it isn’t too
sensitive, which would lead to wild swings, but also allows for teams to change position when they
start to improve.

The world football Elo rankings adjust the $K$ value depending on the score and the competition.
In our example, which was a Nations League game (a new competition between European teams
with similar Fifa rankings), the base value for $K$ is 40. This is multiplied by 1.5 for a win by 2 clear
goals, giving a $K$ value of 60, so

$$R_A' = 1969 + 60\,(0 - 0.22) \approx 1956.$$

This is a change of −13 points, and so Belgium would change by +13 points to a new rating of 2100.
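The two Elo formulas and the England–Belgium calculation can be reproduced in a few lines. This sketch assumes only what is stated above: the 400-point scale, a 100-point home bonus and $K = 60$ for this particular match. Real Elo implementations vary these details, and the function names are our own.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score for A: probability of winning plus half the probability of drawing."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def updated_rating(rating_a: float, expected_a: float, actual_a: float, k: float) -> float:
    """R'_A = R_A + K (S_A - E_A), with S_A = 1 for a win, 0.5 for a draw and 0 for a loss."""
    return rating_a + k * (actual_a - expected_a)


HOME_BONUS = 100                     # added to the home side's rating
england = 1969
belgium_at_home = 2087 + HOME_BONUS
e_england = expected_score(england, belgium_at_home)
k = 40 * 1.5                         # Nations League base K, times 1.5 for a two-goal margin
new_england = updated_rating(england, e_england, 0, k)   # England lost, so S_A = 0
print(round(e_england, 2), round(new_england))           # 0.22 1956
```

Using the unrounded expected score (about 0.2218) instead of 0.22 changes the new rating only in the first decimal place, so it still rounds to 1956. Note also that `expected_score(a, b) + expected_score(b, a)` is always 1, which is why the points one team loses are exactly the points the other gains.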
Although I have focused on the world football Elo rankings, the Fifa rankings now use a system
which is basically similar, with slight variations in the weightings and allowances. This brings me
to the second, and more important, part of the question: can I use this to prove that I’m right?
Unfortunately, this explanation shows that you can only use this type of ranking, whether it’s the
Elo or the Fifa system, to compare teams that were playing at the same time. This means that
trying to use it to compare teams across different eras is pointless. You can’t compare the
performance of Alf Ramsey’s England with that of Steve McClaren’s using the Elo rankings, because
they are not designed to do that.

What can I do?


I can, however, use a similar idea (looking at England's performance against differently rated teams) to judge Sven. To achieve this, I've collated all of England's results in competitive games under Sven and used some spreadsheet magic to create the tables below.*

Performance of England under Sven-Göran Eriksson against teams in the top 20 and teams outside the top 20 (P: played; W, D, L: won, drawn, lost; F, A: goals for and against; points % counts 3 points for a win and 1 for a draw). The full data is available at chalkdustmagazine.com.

Against teams in the top 20:
P    W    D    L    F    A    Win %    Points %
11   4    3    4    18   10   36.4%    45.5%

Against teams outside the top 20:
P    W    D    L    F    A    Win %    Points %
27   22   4    1    51   15   81.5%    86.4%

This is conclusive.† Under Sven-Göran Eriksson, England were brilliant, as long as the team they were playing was outside the top twenty. Against good teams, England were awful. For comparison, in the 2020–21 season, Manchester United have a win percentage of 63.2% and a points percentage of 70.2%. On the other hand, Chelsea had a win percentage of 42.1% and a points percentage of 50.9% (based on results up to 27 January 2021), and they sacked their manager.

* Do not ask how long this took.
† It is. Just trust me on this.
I can finally conclude that I was right. Sven was a rubbish manager who was worse than Frank
Lampard.
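The percentages in the tables can be reproduced from the W, D and L columns. This minimal sketch assumes, as the figures imply, 3 points for a win and 1 for a draw. The name `record_percentages` is chosen here for illustration.

```python
def record_percentages(won, drawn, lost):
    """Win % and points % (3 points for a win, 1 for a draw), to one decimal place."""
    played = won + drawn + lost
    win_pct = 100 * won / played
    points_pct = 100 * (3 * won + drawn) / (3 * played)
    return round(win_pct, 1), round(points_pct, 1)

print(record_percentages(4, 3, 4))    # (36.4, 45.5): against the top 20
print(record_percentages(22, 4, 1))   # (81.5, 86.4): against everyone else
```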

Paddy Moore
Paddy has been a maths teacher for over 25 years. He is a proud nerd and a perpetually
disappointed football fan.
@PaddyMaths

My least favourite game


Poker
Sophie Maclean

I know it's possible to calculate the probabilities of my opponents having each hand, and the expected amount of money I'd win from each pot, and so determine the optimal strategy in any given game. What's more, I know it's within my mathematical capabilities to do this. But I can never be bothered, so I always lose, and it's not fun. Or maybe this is just an incredible bluff…
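The "expected amount of money I'd win" from a single decision can be written down in one line. This toy sketch assumes you already know your probability of winning, which is the genuinely hard part, and that no further betting happens. The function name and the numbers are hypothetical.

```python
def call_ev(win_probability, pot, call_cost):
    """Expected profit of calling: win the pot with probability p, lose the call otherwise."""
    return win_probability * pot - (1 - win_probability) * call_cost

print(call_ev(0.25, pot=100, call_cost=20))   # 10.0: calling is profitable on average
print(20 / (100 + 20))                        # break-even winning probability: 1/6
```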

My favourite game
Nim
David Sheard

You and a friend have $N$ piles of things (coins, marbles, sticks… whatever), possibly each of a different size, and you take it in turns to take any positive number of things from exactly one pile. The winner (unless you are feeling particularly miserly) takes the final thing.
At first it's a kinda fun, surprisingly complicated strategy game. Then you Google the winning strategy and find it involves a pretty disappointing and unintuitive operation on binary numbers, called the XOR-sum or Nim-sum, which comes out of nowhere.
Deflated but not defeated, you scratch your head for a long time and discover that the losing states of the game form a beautiful $N$-dimensional Sierpiński triangle, which the XOR-sum describes elegantly and which is strategically intuitive. And then, after more Googling, you learn that the XOR-sum actually works for a huge number of games in a really fascinating way… or is that just me?
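The "disappointing" XOR rule is short enough to write out. This is a minimal sketch for normal play, where whoever takes the last thing wins; the function names are our own. If the Nim-sum of the pile sizes is 0, every move loses against perfect play. Otherwise, some pile can be reduced so that the Nim-sum becomes 0.

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """XOR of all pile sizes (the 'Nim-sum')."""
    return reduce(xor, piles, 0)

def winning_move(piles):
    """Return (pile index, new pile size) leaving a Nim-sum of 0,
    or None if the position is already lost (Nim-sum 0)."""
    total = nim_sum(piles)
    if total == 0:
        return None
    for i, size in enumerate(piles):
        target = size ^ total
        if target < size:        # only shrinking a pile is allowed
            return i, target
    return None  # never reached when total != 0

print(nim_sum([1, 2, 3]))        # 0, so the player to move is losing
print(winning_move([3, 4, 5]))   # (0, 1): reduce the pile of 3 to 1
```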

Puzzles
Looking for a fun puzzle but not got time to tackle the crossnumber? You're on the right page.

One or two
Put the answers to the clues in the grid by placing either one or two letters in each box. For example, if the answers to the clues were cone, speed, cusp and ended, the completed puzzle would look like:

c  | on | e
u  |    | nd
sp | e  | ed

[Puzzle grid with clue cells numbered 1 to 5]

Across
1 Hypotenuse ÷ opposite.
4 The volume of these shapes is (area of base) × height.
5 𝑥 in 𝟤𝑥.

Down
1 Not real and not imaginary.
2 Adjacent ÷ hypotenuse.
3 It is impossible to ___ an angle using a ruler and compass.

Arrange the digits
Put the numbers 1 to 9 (using each number exactly once) in the boxes so that the sums are correct. The sums should be read left to right and top to bottom, ignoring the usual order of operations. For example, 4 + 3 × 2 is 14, not 10.

□ + □ + □ = 17
+   +   ÷
□ + □ − □ = 5
+   −   ×
□ ÷ □ × □ = 20
=   =   =
12  8   18
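The "left to right, ignoring the usual order of operations" rule is easy to get wrong by habit. This small evaluator (our own helper, not part of the puzzle) applies each operation in turn. It checks the worked example without giving away any answers.

```python
import operator

OPERATIONS = {"+": operator.add, "-": operator.sub, "−": operator.sub,
              "×": operator.mul, "÷": operator.truediv}

def left_to_right(tokens):
    """Evaluate [number, op, number, op, number, ...] strictly left to right."""
    result = tokens[0]
    for op, value in zip(tokens[1::2], tokens[2::2]):
        result = OPERATIONS[op](result, value)
    return result

print(left_to_right([4, "+", 3, "×", 2]))   # 14 (the usual order of operations gives 10)
```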

Arrange the digits II
Put the numbers 1 to 9 (using each number exactly once) in the boxes so that the sums are correct.

[Puzzle grid of boxes joined by + signs, with totals of 24, 17 and 14]


Square filler
Place the digits 0 to 9 in the gaps in the grid (using each digit exactly once) so that every number in the completed crossnumber is a square. As usual, no number begins with 0.

[Puzzle grid, partly filled in with digits]

You can cross off the digits below as you use them.
0 1 2 3 4 5 6 7 8 9

Extra letters
The words below are anagrams of words with a common theme, each with an extra letter added. If you write the themed words in the boxes and the extra letters in the 'Extra letters' column, two more words with the same theme will appear in the orange boxes.

[Puzzle grid with an 'Extra letters' column]

THRIFTY
EITHER
ENFIN
TOZER
YOWT
STEVEN
OWEN
NOTE

Add brackets
Add one set of brackets to each equation below to make it correct. The usual order of operations (× and ÷ before + and −) should be followed.

$10 \times 3 + 6 \times 8 - 3 = 480$
$2 + 2 \times 2 + 2 \div 2 + 2 = 11$
$6 + 4^2 \div 2^2 = 70$


Surfing on wavelets
Flickr user Warm Winds Surf Shop, CC BY 2.0

Johannes Huber

High-speed internet and digital storage are getting cheaper, but sharing large files remains a challenge for anybody who spends their time working on computers. Digital images in particular can be a pain if they lead to long loading times and high server costs. If you have ever seen an image on the internet, then you have certainly encountered the JPEG format, because it has been the web standard for almost 30 years. I suspect, however, that you have never heard of its potential successor, JPEG2000, even though it recently celebrated its 20th anniversary. If so, that is unfortunate, because it produces much better results than its predecessor.

Same principle but different outcome


The best way to understand why different formats give very different outcomes is to look at a specific example. I have compressed a picture of myself using JPEG and JPEG2000. In both cases I sacrifice image quality in favour of space savings, which leads to errors in the resulting image. With JPEG, an image usually starts to become blurry as soon as it is compressed; you've probably noticed this with images on the web. There can also be a loss of colour, so that the image looks duller overall. In my picture, the most obvious casualty is the smooth colour gradients. They have been replaced by flat patches of a single colour, which make it look like a picture from a colour-by-numbers book. For example, there are distinct patches of grey on the left wall. With JPEG2000, on the other hand, image quality usually only starts to decrease noticeably after extensive compression. As you can see, the image on the right still looks relatively unchanged.


Image of the author compressed using JPEG (left) and JPEG2000 (right), with the original in the centre. Note that compression reduces the file size, not the dimensions of the image; the images have been scaled down here to make room for the details.

The differences between the two compressions are the result of the underlying mathematical procedures. JPEG uses the discrete cosine transform, whereas JPEG2000 uses the discrete wavelet transform. 'Discrete' refers to the fact that these transforms work on a finite list of sampled values, such as the pixels of an image, rather than on a continuous signal. 'Cosine' and 'wavelet' name the functions used to represent the image. The wavelet transform is so called because the function it uses looks like a small wave when graphed:

Left to right: Daubechies wavelets of order 1, 2, 4 and 8.

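The one on the left is also called a Haar wavelet. We will encounter it in matrix form later, when I explain the idea behind JPEG2000. The standard itself uses more elaborate wavelets, but the Haar wavelet is the simplest way to see the principle. I want to show you how wavelets can save more storage space while still delivering better-looking images.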

From analogue to digital


When we transform an analogue image into a digital image, we view it as a grid of blocks of a predetermined size. We call these blocks pixels, short for 'picture elements', and assign each of them a number corresponding to its colour. And just like that, we have transformed our image of coloured blocks into a matrix of numbers. In a greyscale image, the entries of the matrix are brightness values, typically from 0 for black to 255 for white. Colour images require a few additional values per pixel, but the important thing is that the file size of a digital image depends on the number of pixels and the size of the colour scale.
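The dependence of file size on the number of pixels and the size of the colour scale can be made concrete. This minimal sketch assumes an uncompressed image that stores every value in the smallest whole number of bits, so 256 shades take 8 bits. For colour it assumes three channels (red, green and blue). `raw_size_bits` is a hypothetical helper, and the dimensions are those of the author's photo used later in the article.

```python
import math

def raw_size_bits(rows, cols, levels=256, channels=1):
    """Uncompressed size: one value per pixel per channel, each stored in
    enough bits to distinguish `levels` shades (256 shades -> 8 bits)."""
    bits_per_value = math.ceil(math.log2(levels))
    return rows * cols * channels * bits_per_value

print(raw_size_bits(2976, 3968))               # 8-bit greyscale: 94470144
print(raw_size_bits(2976, 3968, channels=3))   # 8 bits each for red, green, blue: 283410432
```

Doubling either dimension doubles the size. Going from 2 shades to 256 multiplies it by 8.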


Sometimes a low resolution is quite sufficient, since our eyes cannot detect any difference from the original once we view an image from a certain distance. That is why JPEG is ideal for small images such as thumbnails, where resolution does not matter as long as you can recognise what is depicted. For larger images, however, the side effects of JPEG compression become visible, as you have already seen in my sample picture. We can test for these so-called artefacts using suitable image galleries. Think of these as collections of images with problematic patterns that are susceptible to various defects. The image 'Barbara' below, for example, is well suited to detecting block artefacts because it contains rapid alternations between light and dark areas.

Fabien Petitcolas, public domain
The test image 'Barbara' (left), with a detail compressed using JPEG (centre) and JPEG2000 (right).

The principle of compression is that unnecessary information must be located and thrown away. We will use some elegant mathematics to change the values in our image matrix so that this redundant information becomes visible. We then try to find as many entries as possible that meet specific criteria and set them to zero. The result is an approximation that is then efficiently coded. I will not go into detail here, but the general idea of this last step is as follows. Each digital colour value is saved as a list of ones and zeros, so we look for values that appear frequently and assign them shorter labels to save space. In the case of the value zero, we could write it as just 0 instead of its complete 8-bit binary notation 00000000. If we have lots of black areas in our image, this shorter notation saves space. This, however, is not exactly the algorithm used in image compression; look up 'Huffman coding' if you are interested.
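As a taste of Huffman coding, here is a minimal textbook sketch in Python. It builds a prefix-free code in which frequent values get shorter bit strings. Unlike the naive "write 0 as 0" idea, no code word is the start of another, so the bit stream can be decoded unambiguously. The pixel list and function name are made up for illustration, and real formats apply their entropy coding to transformed coefficients rather than directly to pixel values as here.

```python
import heapq
from collections import Counter

def huffman_code(values):
    """Build a prefix-free code: frequent symbols get shorter bit strings."""
    freq = Counter(values)
    if len(freq) == 1:                      # a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: code built so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # the two rarest groups...
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))  # ...are merged into one
        tie += 1
    return heap[0][2]

pixels = [0] * 12 + [128] * 3 + [255]      # a mostly black strip of 16 pixels
codes = huffman_code(pixels)
encoded_bits = sum(len(codes[p]) for p in pixels)
print(codes, encoded_bits, 8 * len(pixels))   # 20 bits instead of 128
```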

Let’s look at the numbers


The memory requirement of a digital image is measured in bits. One bit is the smallest possible unit of digital storage, and it takes either the value 0 or 1. We can use this to work out the compression rate by dividing the number of bits for the compressed image by the number for the original. A fraction close to zero means that we saved a lot of space; a fraction close to one means that we saved very little.
My original image is 2976 × 3968 ≈ 12,000,000 pixels large and requires about 12,500,000 bytes of storage space. To convert bytes into bits, we multiply by eight (1 byte = 8 bits), which gives us 100,000,000 bits. The JPEG file requires about 2,220,000 bits, and the JPEG2000 file only about 800,000 bits. With this, we get compression rates of approximately 0.02 for JPEG and 0.01 for JPEG2000. Both of my compressions are quite good, since each requires only a tiny percentage of the original memory space. We can see, however, that the JPEG file suffers from a noticeable drop in image quality. The JPEG2000 version of my picture, on the other hand, looks relatively unchanged while still being more efficient, at only about 36% of the size of the JPEG file. Let's find out how this is possible.

Flickr user Christiaan Colen, CC BY-SA 2.0
Computers store data in bits containing the value 0 or 1.

As a rule of thumb, JPEG files with compression rates below 0.25 tend to experience severe quality losses. The discrete cosine transform used by JPEG represents the values of pixel blocks (usually 8 × 8 pixels in size) as combinations of cosine oscillations. It then uses orthogonal projections to replace them with simplified versions. Imagine it like this: the transform has a set of versatile building blocks, like Lego bricks, that we can combine to build any image. If you restrict yourself to using only a few different types of Lego brick, you can still approximate the image. However, the more you limit your options, the cruder the approximation will look, as you can see in the 'Barbara' image. The corners and edges of block artefacts become more apparent under stronger compression because the remaining options are not versatile enough to create a convincing approximation.
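To see the "Lego bricks" at work, here is a minimal one-dimensional sketch in Python. It writes a row of eight pixel values as a combination of eight cosine waves (an orthonormal DCT-II), keeps only the three smoothest waves, and rebuilds the row. Real JPEG applies a two-dimensional version to 8 × 8 blocks and quantises the coefficients with a table rather than simply dropping them. The pixel values and function names are illustrative.

```python
import math

def dct(x):
    """Orthonormal DCT-II: how much of each cosine 'brick' the list contains."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        coeffs.append(scale * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return coeffs

def idct(c):
    """Inverse of dct (orthonormal DCT-III): rebuild the list from its bricks."""
    n = len(c)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * c[k]
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

row = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of an 8-pixel block (illustrative)
coeffs = dct(row)
kept = coeffs[:3] + [0.0] * 5             # keep only the three lowest-frequency cosines
approx = idct(kept)
print([round(v) for v in approx])         # a smoothed version of the row
```

Keeping all eight coefficients reproduces the row exactly. Keeping fewer gives a smoother, cruder row, which is the one-dimensional analogue of the block artefacts.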

The discrete wavelet transform


To figure out how JPEG2000 works, we start with a simple task. We count the number of passengers at a train station hour by hour throughout an eight-hour shift, which gives us a list of numbers: (206, 306, 59, 69, 16, 16, 5, 3). Let's suppose we want to send this information to someone, but we need to compress it first to save space. (This is just an example; in practical applications, compression becomes necessary only when you send a lot more data.) How should we choose the numbers we send?
Assuming we are satisfied with a rough count, one option would be to round the numbers to their nearest multiple of ten: (210, 310, 60, 70, 20, 20, 10, 0). But there is a better way. Another possibility is to form the averages of the four consecutive pairs: (256, 64, 16, 4). This requires only four values, but nobody can recover the original numbers without additional information. Fortunately, we still have four empty slots left in our list. For each of the four pairs, we can choose another number that allows us to recover the input from the average and tells us something else about the numbers.
The linear transform $\tilde{W} : (a, b) \mapsto \left(\frac{a + b}{2}, \frac{b - a}{2}\right)$ generates both the mean of a number pair $(a, b)$ and the signed distance from the mean to $b$ (half the difference $b - a$). The four additional numbers are now (50, 5, 0, −1). This process is easily reversible: we get our starting numbers back by subtracting each difference from the corresponding mean (to get $a$) and adding it (to get $b$). More revealing, however, is that these differences measure how spread out each pair is. A value that is large in absolute terms means that the original number pair was far apart, and a value close to zero means the two numbers were close together.


The discrete wavelet transform used by JPEG2000 takes advantage of the fact that these differences capture the redundant information we can remove when we want to compress an image. As long as the differences are smaller in absolute value than a certain threshold, which depends on the intended level of compression, we can set them to zero. In our example, the threshold could be 10, so the compressed differences are (50, 0, 0, 0), and reversing the process gives us the list (206, 306, 64, 64, 16, 16, 4, 4). The result is very similar to the original list, even though we have simplified most of the differences. For images, this means that we match neighbouring pixels of a similar colour by setting their difference to zero, which we can then code efficiently to save space.
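The pair-by-pair transform and the thresholding step can be reproduced in a few lines of Python. This sketch stores the signed half-difference $(b - a)/2$, matching the second row $(-1/2, 1/2)$ of the matrix $\tilde{W}_2$. Each pair can then be rebuilt as mean minus difference and mean plus difference. The helper names are our own.

```python
def haar_step(values):
    """One Haar step: pair averages first, then half-differences (b - a) / 2."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    pairs = list(zip(values[0::2], values[1::2]))
    means = [(a + b) / 2 for a, b in pairs]
    diffs = [(b - a) / 2 for a, b in pairs]
    return means + diffs

def inverse_haar_step(w):
    """Undo haar_step: a = mean - diff and b = mean + diff."""
    half = len(w) // 2
    out = []
    for mean, diff in zip(w[:half], w[half:]):
        out.extend([mean - diff, mean + diff])
    return out

def compress(values, threshold):
    """Zero every difference smaller than the threshold in absolute value, then undo the step."""
    w = haar_step(values)
    half = len(w) // 2
    kept = [d if abs(d) >= threshold else 0.0 for d in w[half:]]
    return inverse_haar_step(w[:half] + kept)

passengers = [206, 306, 59, 69, 16, 16, 5, 3]
print(haar_step(passengers))       # [256.0, 64.0, 16.0, 4.0, 50.0, 5.0, 0.0, -1.0]
print(compress(passengers, 10))    # [206.0, 306.0, 64.0, 64.0, 16.0, 16.0, 4.0, 4.0]
```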
When doing this with a computer, we can use matrix algebra to simplify the calculations. If we have a list of numbers, that is, a vector $\mathbf{v}$ of even length, we can write the transformed vector as $\mathbf{w} = \tilde{W}\mathbf{v}$. If we have two or four data points, for example, the transformation is a 2 × 2 or 4 × 4 matrix:

$$\tilde{W}_2 = \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}, \qquad \tilde{W}_4 = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 \\ -1/2 & 1/2 & 0 & 0 \\ 0 & 0 & -1/2 & 1/2 \end{pmatrix}.$$

We can also reverse the process by multiplying by the inverse matrix: $\tilde{W}^{-1}\mathbf{w} = \mathbf{v}$. The discrete Haar wavelet transform (HWT) for an input vector of length $N = 2^k$ (with $k \in \mathbb{N}$) is defined as $W_N = \sqrt{2}\,\tilde{W}_N$. The factor $\sqrt{2}$ makes the matrix orthogonal, so its inverse is simply its transpose, which simplifies calculating the reverse transformation. However, you only need to pay attention to $\tilde{W}_N$ to understand what is going on. The Haar wavelet transform is great for illustrating how wavelet transforms work, since it is relatively simple. The first $N/2$ entries of the result of the Haar wavelet transform contain the averages of the number pairs, and the remaining $N/2$ entries contain their respective differences.
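The matrices can be built for any even length and checked against the pair-by-pair rule. This sketch is in plain Python, with no libraries, and the helper names are our own. `haar_matrix(n)` builds $\tilde{W}_N$. With `normalised=True`, it builds $W_N = \sqrt{2}\,\tilde{W}_N$, whose transpose undoes it because it is orthogonal.

```python
import math

def haar_matrix(n, normalised=False):
    """N x N Haar matrix: the first N/2 rows average neighbouring pairs and the
    last N/2 rows take half their difference; normalised=True scales by sqrt(2)."""
    if n % 2:
        raise ValueError("n must be even")
    s = math.sqrt(2) if normalised else 1.0
    rows = []
    for i in range(n // 2):                  # averaging rows
        row = [0.0] * n
        row[2 * i], row[2 * i + 1] = s / 2, s / 2
        rows.append(row)
    for i in range(n // 2):                  # difference rows
        row = [0.0] * n
        row[2 * i], row[2 * i + 1] = -s / 2, s / 2
        rows.append(row)
    return rows

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

v = [206, 306, 59, 69]
print(mat_vec(haar_matrix(4), v))            # [256.0, 64.0, 50.0, 5.0]

W = haar_matrix(4, normalised=True)
restored = mat_vec(transpose(W), mat_vec(W, v))   # W^T undoes W (up to rounding)
```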

Transforming an image
Now we can apply the transform to any $N \times M$ pixel image matrix $B$. Since an image is a two-dimensional array of numbers, we want to apply the transform both vertically and horizontally. More precisely, instead of a single list with $N$ elements, we now transform $M$ lists (the columns) with $N$ elements each. The result is also an $N \times M$ matrix, divided into two halves: the upper average block and the lower difference block. You can follow the process in the picture overleaf, starting from the left. This first step corresponds to the one-dimensional Haar wavelet transform (1D-HWT) because we have only transformed the matrix vertically. To do the same with the rows, and thus transform the image completely, we multiply by the transform matrices on both sides and get $\tilde{B} = W_N B W_M^{-1}$ (since $W_M$ is orthogonal, $W_M^{-1} = W_M^{\mathsf{T}}$). The result of the complete, or two-dimensional, Haar wavelet transform (2D-HWT) is again an $N \times M$ matrix, this time divided into four blocks. On the right, you can see that most of the intensities are contained in the upper left block. The remaining blocks are mostly black because they mainly contain the sought-after values close to zero, which we can round to zero to save space.
Let's take a closer look at their composition. The upper left block is composed of averages of the rows and columns and looks like a smaller version of the image. The lower left block reveals information about the horizontal details, while the upper right block contains the vertical ones. Lastly, the lower right block is made up of differences of the rows and columns and corresponds to the diagonal details. Notice that the three detail blocks only have high values at locations with a drastic change in brightness, which is why we can see the outlines of the image there.

Left: 1D-HWT; right: 2D-HWT.
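A one-level 2D transform of a tiny 4 × 4 "image" shows the block structure. This sketch applies the unnormalised averages and half-differences to every column and then to every row. The normalised version, with $\sqrt{2}$ on each side, only multiplies every entry by 2. The function names are our own. The image has a sharp vertical edge, which shows up in the upper right block:

```python
def haar_step(values):
    """One Haar step: pair averages first, then half-differences (b - a) / 2."""
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) / 2 for a, b in pairs] + [(b - a) / 2 for a, b in pairs]

def haar_2d(image):
    """One level of the (unnormalised) 2D Haar transform: every column, then every row."""
    columns = [haar_step(list(col)) for col in zip(*image)]   # vertical pass
    after_columns = [list(row) for row in zip(*columns)]      # back to rows
    return [haar_step(row) for row in after_columns]          # horizontal pass

image = [[10, 10, 10, 200] for _ in range(4)]   # dark on the left, a bright right edge
for row in haar_2d(image):
    print(row)
# [10.0, 105.0, 0.0, 95.0]   upper left: small image; upper right: vertical detail
# [10.0, 105.0, 0.0, 95.0]
# [0.0, 0.0, 0.0, 0.0]       lower blocks: no horizontal or diagonal detail
# [0.0, 0.0, 0.0, 0.0]
```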

Repetition and reversion


In practice, we repeat this process as often as possible, reapplying it to the previous approximation in the top left corner to increase the saving potential even further. Each time the process is repeated, the block containing the averages becomes more compact. For an $N \times M$ pixel image matrix, $p$ iterations of the Haar wavelet transform can be performed if both $N$ and $M$ are divisible by $2^p$. To restore the original image, we need to apply the inverse transform to the approximation as many times as the forward transform was applied. You can think of it as sharpening the image, because the approximation gets more and more detailed with each iteration we undo. Of course, the original can only be restored exactly if we compressed the file without discarding any data. When we reverse the process after rounding values to zero, we end up with an approximation of the original image, which is our compressed version.

2D-HWT applied twice.
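Repeating the step on the block of averages can be tried on the train station counts from earlier. This is a sketch in Python, and the helper names are our own. With 8 = 2³ values, three levels are possible, and the first entry ends up as the mean of all eight counts.

```python
def haar_step(values):
    """One Haar step: pair averages first, then half-differences (b - a) / 2."""
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) / 2 for a, b in pairs] + [(b - a) / 2 for a, b in pairs]

def inverse_haar_step(w):
    """Undo haar_step: a = mean - diff and b = mean + diff."""
    half = len(w) // 2
    out = []
    for mean, diff in zip(w[:half], w[half:]):
        out.extend([mean - diff, mean + diff])
    return out

def haar_levels(values, levels):
    """Apply haar_step repeatedly to the shrinking block of averages at the front."""
    w = list(values)
    size = len(w)
    for _ in range(levels):
        if size % 2:
            raise ValueError("length must be divisible by 2**levels")
        w[:size] = haar_step(w[:size])
        size //= 2
    return w

def inverse_haar_levels(w, levels):
    """Undo haar_levels, starting from the smallest block of averages."""
    w = list(w)
    size = len(w) // 2 ** levels
    for _ in range(levels):
        size *= 2
        w[:size] = inverse_haar_step(w[:size])
    return w

passengers = [206, 306, 59, 69, 16, 16, 5, 3]
w = haar_levels(passengers, 3)
print(w)                           # [85.0, -75.0, -96.0, -6.0, 50.0, 5.0, 0.0, -1.0]
print(inverse_haar_levels(w, 3))   # back to the original counts
```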


Wrapping things up
Now we know why the results with JPEG2000 are not only more efficient but also better looking: it transforms the image as a whole instead of in small 8 × 8 chunks. You might ask: “Why is it not being used by the general public, despite faring so much better than JPEG?” The answer could be that it takes a lot of time and effort to change the standards for such a large-scale application as image processing. Another hurdle is that the old format still seems to be good enough. JPEG2000 has found its home in fields like medical imaging, but we probably will not see it taking over any time soon. Today there are also some new contenders, like WebP and AVIF, that may supersede JPEG2000 anyway. Increasing demand for less data traffic due to environmental concerns might lead to a rise in popularity for alternative formats. I urge everybody interested in image processing to check out JPEG2000 and compress a few image files for themselves.

Johannes Huber
Johannes is a teacher trainee who studies maths and geography education at the University of Vienna. He is part of the project Mathematik macht Freu(n)de, which could be translated as 'maths brings joy (and friends)', where he creates explanatory videos and coaches high school students. In his free time he leads cub scouts and does parkour.
underdetermined.blogspot.com

Cryptic crossnumber by Em—Dasher

[Crossnumber grid with clue cells numbered 1 to 9]

Each clue points to a word or phrase whose length is in square brackets [ ], which in turn points to a number whose length is in round brackets ( ). These numbers should be entered in the grid. All clues have a unique answer and, as usual, no numbers begin with 0. Use of the internet is recommended for some clues. If you get stuck, you can find some hints at chalkdustmagazine.com.

Across
1 Breadmaker’s act mindful! [6,5] (2)
3 Muddled cue in French from exclamation! [5] (1)
4 Heartless noble learner’s offspring is a Lord? [6] (3)
6 Magazine formation. [4,4,3] (3)
9 Roman god’s first saint replaced by church tie. [5] (2)

Down
1 Batting team adds energy to sindarin. [6] (2)
2 Enumerate, on reflection, a sleuth’s base for a tart. [3] (3)
5 Gain height without width requires, apparently, German Bee repellent? [5] (3)
7 Introduction of sphere’s central goal. [5] (2)
8 Caesar’s parade lacks acidity and meets very angry power-share. [11] (1)

We love it when our readers write to us. Here are some of the best emails, tweets and letters we’ve been sent. Send your comments by email to contact@chalkdustmagazine.com, on Twitter @chalkdustmag, or by post to Chalkdust Magazine, Department of Mathematics, University College London, Gower Street, London WC1E 6BT, UK.

Dear Chalkdust,
Some might say I was too over-excited
when I received this card from one of our
passionate & dedicated leaders in maths
this week—I say it was totally justified & I
can’t wait to get started!

Rhiannon Rainbow @Noni_Rainbow

Dear Chalkdust,
I look forward to your card every year. It is a gift!! Thank you for doing it.
Halcyon Foster @halcyonfoster

Great from Chalkdust: flo-map fractions love investigating repeating decimals with learners from elementary through univ math majors.
John Golden @mathhombre

My copy of Issue 12 has arrived containing an article by me! Very excited right
now! They even sent Issue 11 so I have plenty of fun maths reading to enjoy!
Brad Ashley @pogonomaths

Nice review of our recent publication Geometry Juniors by Ed Southall from Chalkdust. This title has been shortlisted by Chalkdust as their Book of the Year.
The Mathematical Association @MathematicalA

“It’s the kind of book that I wouldn’t be surprised to find a future mathematician citing as the book that made them fall in love with maths.” humbled! Thank you Chalkdust
Ed Southall @edsouthall

Absolutely delighted to see Vicky Neale’s Why Study Mathematics? on the Chalkdust Book of the Year 2020 shortlist. Thanks for including it. Much deserved, Vicky: it is such a helpful book!
LPP @LPPbooks


Significant figures
John Conway
Thane Plambeck, CC BY 2.0

Jamie Handitye and Jakob Stein

My most memorable encounter with the work of the late mathematician John Horton Conway came from a friend of mine I met as a first-year graduate student. As we sat across from each other in the department common room, each having made little progress with our research, he slid me a piece of paper with five dots drawn on it. This game, he explained, consisted of us taking turns to draw a line between any two dots, and then placing a new dot on the line we had drawn (at its midpoint, say). Although the lines could bend in any direction, they were not allowed to intersect each other, and each dot could join at most three line segments. The game ended when one player could not make a move, and the other player was declared the winner. At first, I was quickly defeated, and I spent quite some time trying to come up with the best strategies against my skilled opponent.
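A standard counting argument, which is not from the article, shows why a game of Sprouts cannot go on forever. Give each dot three "lives", one for each line end it may still receive. Every move uses up two lives, one at each end of the new line. It also creates a new dot that already has two line ends attached, so that dot has one life left. With $n$ starting dots and $L_t$ lives remaining after $t$ moves:

$$L_0 = 3n, \qquad L_{t+1} = L_t - 2 + 1 = L_t - 1 \quad\Longrightarrow\quad L_t = 3n - t.$$

The dot created by the latest move always keeps its life, so $L_t \ge 1$ after every move. This gives $t \le 3n - 1$: with the five dots from the story, a game lasts at most 14 moves.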
The game that we spent our lunchtime playing was Sprouts, invented by Conway and his friend Michael Paterson during their time at the University of Cambridge, and later popularised by Martin Gardner in his Scientific American column Mathematical Games. Conway is perhaps best known for his interest in games. He invented many, and his two books on the subject, On numbers and games and Winning ways for mathematical plays, include detailed analyses of many two-player games. He was a regular contributor to Gardner's column, and was a major figure in the world of recreational mathematics in his own right.

As a graduate student, Conway proved that every positive whole number can be written as the sum of at most 37 fifth powers.
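The bound of 37 cannot be lowered, and a short dynamic program shows why. Below 243 = 3⁵, the only fifth powers available are 1 and 32, and 223 = 6 × 32 + 31 × 1 needs 37 of them. This minimal sketch computes the fewest fifth powers needed for every number up to a limit; the function name is our own.

```python
def min_fifth_powers(limit):
    """best[m] = fewest positive fifth powers summing to m, for every m up to limit."""
    powers = []
    k = 1
    while k ** 5 <= limit:
        powers.append(k ** 5)               # 1, 32, 243, 1024, ...
        k += 1
    best = [0] + [limit + 1] * limit        # limit + 1 acts as 'infinity' (1s always suffice)
    for m in range(1, limit + 1):
        for p in powers:
            if p > m:
                break
            best[m] = min(best[m], best[m - p] + 1)
    return best

best = min_fifth_powers(1000)
print(best[223])        # 37
print(max(best[1:]))    # 37: nothing up to 1000 needs more
```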


Born in 1937, Conway grew up in Liverpool and attended Cambridge as an undergraduate, staying on for his postdoctoral research in number theory and his eventual appointment as fellow and lecturer. He moved to Princeton in 1986, where he remained for the rest of his career. According to those who knew him, he was always ready to play. He would carry puzzles, pennies, coat-hangers and dice around with him, ready to stoke the imagination of some unwitting colleague with a lively demonstration or challenge. Often described as charismatic, he certainly fulfilled certain stereotypes of the eccentric mathematician. He was also an inspiration for many of those he taught and spoke to, and remains so even after his death in April 2020 from complications due to Covid-19.

One of Conway's most famous inventions was the game of life, a very simple type of 'game' that takes place on a grid of pixels. Each pixel starts either switched on or off. Each second, any off pixel with exactly three on neighbours switches on, while an on pixel stays on only if it has two or three on neighbours. Despite these simple rules, the game of life is Turing complete, meaning that, in theory, any computer program could be run using these pixels. The game of life is an example of a cellular automaton; for more, see pages 35–38.
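The two rules translate almost directly into code. This minimal sketch works on an unbounded grid and stores only the set of "on" pixels as (row, column) pairs; the names are our own. A "blinker" of three pixels in a row flips between horizontal and vertical.

```python
from collections import Counter

def life_step(live):
    """One generation of the game of life; `live` is a set of (row, col) 'on' pixels."""
    # Count, for every pixel, how many of its eight neighbours are on.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Switch on with exactly 3 on neighbours; stay on with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))   # [(0, 1), (1, 1), (2, 1)]: now vertical
```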
Conway was a prominent mathematician whose work went well beyond popular games: his willingness to approach any topic with the same enthusiasm led him to contribute to research fields across mathematics. His interests included number theory, topology, analysis, group theory, classical geometry and even theoretical physics. Analysts, for example, may be familiar with his base 13 function, which takes every real value on every interval, however small, and so is discontinuous everywhere. Among academics, he is better known for his work in group theory, in particular on sporadic simple groups and the monstrous moonshine conjecture. Sporadic groups are mysterious algebraic structures coming from group theory, and the conjecture connects the largest of them, the Monster, with functions called modular forms, coming from analysis. His name continues to be relevant, not only through his own considerable research, but also through those who took inspiration from him. In 2018, in a branch of topology called knot theory, then-graduate student Lisa Piccirillo solved a long-standing problem about the knot that bears Conway's name, showing that the Conway knot is not 'slice'.
But for all his contributions, it is Conway's willingness to collaborate and to share his love of ideas that sets an example for all those interested in mathematics. So, to live by that example, I encourage you to pick up some pencils and a sheet of paper, find a friend, and go and play a game of Sprouts.

Jamie Handitye
Jamie is a second-year mathematics student at Christ’s College, Cambridge. His main interests are in group theory, number theory and a touch of algebraic geometry.

Jakob Stein
Jakob is a PhD student and mathematician from London and works mainly in differential
geometry. In his spare time, he likes to draw, and think about mathematics in art.
@jakob_media

#13
Set by Humbug

[Crossnumber grid with clue cells numbered 1 to 57]

Each clue in this crossnumber contains two statements joined by a logical connective. If the connective is
and, then both the statements are true. If the connective is nand, then at most one of the statements is true.
If the connective is or, then at least one of the statements is true. If the connective is nor, then neither of
the statements is true. If the connective is xor, then exactly one of the statements is true. If the connective
is xnor, then either the statements are both true or they are both false. Although many of the clues have
multiple answers, there is only one solution to the completed crossnumber. As usual, no numbers begin
with 0. Use of Python, OEIS, Wikipedia, etc is advised for some of the clues.
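Each connective is a rule for combining two true/false statements, so the whole key above fits in a small table. The dictionary and helper names are our own, and the example uses an arbitrary number rather than any clue from the puzzle.

```python
import math

CONNECTIVES = {
    "and":  lambda p, q: p and q,          # both true
    "nand": lambda p, q: not (p and q),    # at most one true
    "or":   lambda p, q: p or q,           # at least one true
    "nor":  lambda p, q: not (p or q),     # neither true
    "xor":  lambda p, q: p != q,           # exactly one true
    "xnor": lambda p, q: p == q,           # both true or both false
}

def is_square(n):
    return n >= 0 and math.isqrt(n) ** 2 == n

n = 12
print(CONNECTIVES["xor"](n % 2 == 0, is_square(n)))   # True: exactly one statement holds
```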

To enter, send us the sum of the across clues via the form on our website (chalkdustmagazine.com) by 18 September 2021. Only one entry per person will be accepted. Winners will be notified by email and announced on our blog by 1 November 2021. One randomly-selected correct answer will win a £100 Maths Gear goody bag, including non-transitive dice, a Festival of the Spoken Nerd DVD, a dodecaplex puzzle and much, much more. Three randomly-selected runners up will win a Chalkdust T-shirt. Maths Gear is a website that sells nerdy things worldwide. Find out more at mathsgear.co.uk

1A is a multiple of 13 and 1A is a square number.
1D is equal to 31A or 1D is equal to 41A.
1D is prime xor 7A is odd.
2D is equal to 3D or 2D is equal to 4D.
The sum of the digits of 3A is 27 or the sum of the digits of 3D is 27.
4D is a cube number xor 9D is a cube number.
4D is a square number xor 11D is a square number.
5A is equal to 1A xor 5A is equal to 3A.
5D is equal to 2D xor 6D is equal to 1D.
7A is equal to 54A xor 7A is equal to 12A.
8A is equal to 3A xor 8A is equal to 10A.
10A is a factor of 3A or 4D is a factor of 3A.
17D and 15D share a factor greater than 1 xnor 17D is the product of 1D and 48A.
18A is a palindrome and 18D is a palindrome.
The sum of the digits of 18A is 2 and the sum of the digits of 18D is 2.
19D is prime xor 27A is prime.
20A is prime nor 20A is a multiple of a 2-digit prime.
20D is a palindrome and the sum of the digits of 20D is 16.
22A is a multiple of 5 xnor 22A is the product of 1D and 48A.
23A is 6 times 27A and 28D is equal to 19D.
24A is a multiple of 8 and 45D is a multiple of 24A.
The sum of the digits of 24D is 5 xnor the sum of the digits of 49D is 5.
40A is a multiple of 100 and 45D is a multiple of 100.
40A is a multiple of 45D nor 45D is a multiple of 40A.
40A is a square number xnor 45D is a square number.
42A is a multiple of 9 xnor 39A is a multiple of 9.
42A is an anagram of 33A and 34D is an anagram of 33D.
43D is a multiple of 10A and 26D is a multiple of 10A.
44A is equal to 5A xor 36D is equal to 5A.
46A is a factor of 40A xor 46A is a factor of 45D.
47D is a factor of 40A xor 47D is a factor of 45D.
48A is a cube number and 50A is a cube number.
48A is a square number and 50A is a square number.
49D is a multiple of 5 xor 50D is a multiple of 5.
12A is equal to 54A xor 12A is a palindrome.
The sum of the digits of 13D is 17 and the sum of the digits of 7D is 17.
14A is prime nor 14A is a multiple of a 2-digit prime.
The sum of 15D and 16A is 742 and the difference between 15D and 16A is 38.
16A is equal to 25A xor 17D is equal to 25A.
16D is a multiple of 100 xor 21A is a multiple of 100.
The sum of the digits of 16D is 5 xor the sum of the digits of 21A is 5.
29A is equal to 37A xor 30D is equal to 48A.
29A is prime xor 37A is prime.
32D is a multiple of 41A and 41D is a multiple of 41A.
35D is equal to 39A and 35D is equal to 35A.
37D is a prime xor the sum of the digits of 37D is 5.
37D’s first digit is 1 nand 37D’s last digit is 1.
The sum of the digits of 37D is 7 xor the sum of the digits of 30D is 7.
38A is a multiple of 10A and 39A is a multiple of 10A.
52A is two more than 55A and the sum of the digits of 52A is 3.
52D is a Fibonacci number and 53D is a Fibonacci number.
54A is equal to 7A xor 54A is equal to 12A.
54D is a Fibonacci number xor 54D is the sum of 52D and 53D.
54D is equal to 52D nor 54D is equal to 53D.
56A is a square number xor 56A is a cube number.
57A is a square number xor 57A is greater than 200.


Highways Agency, CC BY 2.0

Aryan Ghobadi

As trends go, diagrammatic algebra has taken mathematics by storm. Appearing in papers on computer science, pure mathematics and theoretical physics, the concept has expanded well beyond its birthplace, the theory of Hopf algebras. Some use these diagrams to depict difficult processes in quantum mechanics; others use them to model grammar in the English language! In algebra, such diagrams provide a platform for proving difficult ring-theoretic statements using simple pictures.

As an algebraist, I’d like to present you with a down-to-earth introduction to the world of diagrammatic
algebra, by diagrammatising a rather simple structure: namely, the set of natural numbers!
At the end, I will allude to the connections between these diagrams and the exciting world of higher
and monoidal categories.

Now—imagine yourself in a lecture room, with many others as excited about diagrams as you
(yes?!), plus a cranky audience member, who isn’t a fan of category theory, in the front row:


What we would like to draw today is the process of multiplication for the natural numbers. In its
essence, multiplication, ×, takes two natural numbers, say 2 and 3, and produces another natural
number...

— six! —

Because it takes two elements and produces just one, multiplication is formally called a binary
operation: we can say it is a function 𝑚 ∶ ℕ × ℕ → ℕ, where, for example, 𝑚(𝟤, 𝟥) = 𝟨.
We will keep this 𝑚 notation for natural number multiplication to avoid confusion with the so-
called product of two sets 𝐴 and 𝐵, which is the set of all possible pairs from 𝐴 and 𝐵 and is
denoted by
𝐴 × 𝐵 = {(𝑎, 𝑏) ∶ 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵}

Now we draw (reading diagrams from top to bottom):


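[Diagram: 𝑚 drawn as two lanes labelled ℕ merging into one; the inputs 𝟤 and 𝟥 enter at the top and 𝟨 leaves at the bottom.]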
Multiplication, 𝑚, can really be thought of as a ‘meta-road’: it’s a one-way road with two entry
lanes, each departing from a city whose cars correspond to natural numbers, and one exit lane
leading to natural-number-land again. We call our roads ‘meta’ because two cars, 2 and 3, enter
the lanes at the same time, possibly colliding in the middle, passing through time and space, and
a brand new car, 6, exits into the city.

— But how does your picture show any of the properties of multiplication on the natural numbers?

Do not be alarmed by this interruption! I am ready to respond.

Diagrams for a monoid


‘Monoid structure’ is a fancy term for some of the nice properties that the multiplication of natural
numbers satisfies:
(i) associativity: 𝑚(𝑥, 𝑚(𝑦, 𝑧)) = 𝑚(𝑚(𝑥, 𝑦), 𝑧), eg 𝟤 × (𝟥 × 𝟧) = 𝟥𝟢 = (𝟤 × 𝟥) × 𝟧;
(ii) a unit element exists: 𝑚(𝟣, 𝑥) = 𝑥 = 𝑚(𝑥, 𝟣), ie 𝟣 × 𝑥 = 𝑥 = 𝑥 × 𝟣 for all 𝑥 ∈ ℕ.
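Both properties can be spot-checked by computer. The sketch below is an illustration, not a proof: it assumes we test only a finite sample of ℕ (here 0 to 19) and writes 𝑚 as a two-argument Python function.

```python
from itertools import product


def m(a: int, b: int) -> int:
    """Multiplication on the natural numbers, as a binary operation m : ℕ × ℕ → ℕ."""
    return a * b


UNIT = 1               # the unit element 1 ∈ ℕ
SAMPLE = range(0, 20)  # a finite sample of ℕ; the laws hold whether or not ℕ includes 0


def is_associative(op, sample) -> bool:
    """Check op(x, op(y, z)) == op(op(x, y), z) for every triple drawn from the sample."""
    return all(op(x, op(y, z)) == op(op(x, y), z)
               for x, y, z in product(sample, repeat=3))


def has_unit(op, unit, sample) -> bool:
    """Check op(unit, x) == x == op(x, unit) for every x in the sample."""
    return all(op(unit, x) == x == op(x, unit) for x in sample)


print(m(2, 3))                    # 6
print(is_associative(m, SAMPLE))  # True
print(has_unit(m, UNIT, SAMPLE))  # True
```

Swapping in subtraction, `lambda a, b: a - b`, makes `is_associative` return False, since for instance 0 − (0 − 1) = 1 but (0 − 0) − 1 = −1.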
Now we simply visualise these properties using our pictorial notation. Associativity translates to
these compound meta-roads being the same:


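[Diagram: the two ways of multiplying three copies of ℕ into one (multiplying the left pair first, or the right pair first), drawn as equal.]
OK... —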
But why are the diagrams the same? The key ingredient is that we need to put on our topological
glasses! We don’t care about length or curvature in our roads. It’s as if the asphalt moves freely
above the sand! With our new glasses, all the following diagrams are the same and the middle lane
can move freely from one side to the other:

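[Diagram: five equal drawings of the associativity diagram, with the middle lane sliding from one side to the other.]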

The second property we need to visualise is the unit element 𝟣 ∈ ℕ. In previous diagrams, any car
from ℕ can use the roads, whereas to discuss multiplication by 1, we need a unique car to use the
road. So we draw a special diagram for the road where only the car corresponding to 1 can use
the lane. The unit conditions require one more ingredient. Each city can have a boring ‘identity
road’ id, where nothing happens to cars taking this road. They simply leave and enter the city
looking the same. With this in mind, the diagrams representing the unit condition turn into the
following picture:
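[Diagram: 𝑚 with the special unit lane joining from the left, the identity road idℕ, and 𝑚 with the unit lane joining from the right, all drawn as equal.]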

This should not be a surprise, since it is natural to think of multiplication by 1, 𝑚(𝟣, 𝑥) for any 𝑥,
as a function from ℕ to ℕ, which ultimately sends every number to itself. Putting our topological
glasses back on, it looks as if the identity road grew an extra hair, so we can push it back in!

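[Diagram: four equal drawings in which the extra unit ‘hair’ is pushed back into the road until only the identity road remains.]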

In our car metaphor, the left side represents a main road with an additional lane entering it, but
this lane is reserved for a ‘harmless’ car that does not interact with any of the other cars. So, it’s
the same as if the main road were the identity road, where nothing happens to the cars driving on
it.

So we did it!
— Not so fast! ℕ is a commutative monoid... where’s your diagram for that? —

Here the cranky listener is using the old trick of deploying fancy words to heckle me. The word
commutative just means that the order in which we multiply the numbers doesn’t matter. Formally,


𝑚 being commutative means

𝑚(𝑎, 𝑏) = 𝑚(𝑏, 𝑎) for any 𝑎, 𝑏 ∈ ℕ.

For example, 𝟤 × 𝟥 = 𝟨 = 𝟥 × 𝟤.
To represent this, we need our roads to pass over each other. We need to build bridges! If we can
build bridges and allow lanes to pass over each other, ie draw diagrams in which one lane crosses
over another, then commutativity translates to these diagrams being equal:
[Diagram: 𝑚 on its own equals 𝑚 applied after its two input lanes cross over each other.]
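The equality of these two diagrams says that applying 𝑚 after the lanes cross gives the same result as applying 𝑚 directly. A small sketch, with a hypothetical `swap` function standing in for the bridge and a finite sample of ℕ standing in for all of it:

```python
from itertools import product


def m(a: int, b: int) -> int:
    """Multiplication m : ℕ × ℕ → ℕ."""
    return a * b


def swap(pair):
    """The bridge: the lanes cross, sending (a, b) to (b, a)."""
    a, b = pair
    return (b, a)


def commutes(op, sample) -> bool:
    """Check that op applied after the crossing equals op applied directly."""
    return all(op(*swap((a, b))) == op(a, b)
               for a, b in product(sample, repeat=2))


print(m(*swap((2, 3))), m(2, 3))                   # 6 6
print(commutes(m, range(20)))                      # True
print(commutes(lambda a, b: a ** b, range(1, 5)))  # False
```

Exponentiation is included as a non-example: it fails the check, since for instance 2³ ≠ 3².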
To truly see this property, we need to upgrade our glasses to 3D glasses to capture three-dimensional
topology. If we view the string diagrams through our 3D glasses, we can unwind the right-hand
diagram by rotating it as follows:

[Diagram: five equal drawings in which the crossed diagram is rotated and unwound into 𝑚 on its own.]

— But even still… why would anyone care how you draw multiplication as a diagram?

To placate this restless member of the audience, I will present the punchline a bit early and use the
keyword ‘category’ before explaining what it is.
The reason we can draw a commutative monoid such as ℕ as a three-dimensional diagram is
that commutative monoids live in what we call braided categories, such as the category of sets.
Today’s algebraists will tell you that a braided category is an example of a weirder structure called
a 3-category, which has some 3D topology hidden in it. But this takes us into the daunting world
of higher categories, and by this point my heckler is hopefully intrigued but has too much pride to
ask me to elaborate.

So what’s a category? —
Aha! Back to our story...

Categories
In the same way that looking at the connections between cities in a country is more enlightening
than looking at the cities independently, in mathematics it’s more useful to understand the relation
between mathematical objects. For example, instead of looking at sets ℕ, ℝ, {𝟣, 𝟤, 𝟥}, ∅, I really need
to discuss functions between sets to understand how sets relate to each other. This now fits in a


bigger framework, a category. A category has some cities, for example sets 𝐴, 𝐵 and 𝐶, and some
roads 𝑓 ∶ 𝐴 → 𝐵 between the cities, with two extra rules!

1. If roads 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are part of my category, then so is a composition road
𝑔𝑓 ∶ 𝐴 → 𝐶, which is made up from joining roads 𝑓 and 𝑔 (first taking the road 𝑓 to the city 𝐵,
followed by the road 𝑔).
[Diagram: the single road 𝑔𝑓 from 𝐴 to 𝐶 equals the road 𝑓 from 𝐴 to 𝐵 followed by the road 𝑔 from 𝐵 to 𝐶.]
2. Every city should have a special ‘safe’ road, called the identity road, like the identity function
idℕ for ℕ.
[Diagram: id𝐴 followed by 𝑓 equals 𝑓, which equals 𝑓 followed by id𝐵.]

Categories provide a platform to draw one-dimensional diagrams and a ‘1D calculus’, ie a way to
manipulate these diagrams, as in the pictures above.

The category of sets has sets as cities and functions as roads. The identity road for each city 𝐴 is
just the identity function id𝐴 ∶ 𝐴 → 𝐴, where id𝐴(𝑎) = 𝑎 for all 𝑎 ∈ 𝐴.
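In the category of sets, the two rules can be written out directly. A minimal sketch, assuming roads are plain Python functions; the set `A_SET` and the roads `f` and `g` are hypothetical examples chosen only to make the rules visible:

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """The composition road gf : A → C, which takes the road f first and then g."""
    def gf(a: A) -> C:
        return g(f(a))
    return gf


def identity(a: A) -> A:
    """The identity road id_A : A → A, sending every a to itself."""
    return a


# Hypothetical example roads between small sets.
def f(a: int) -> int:
    """f : {1, 2, 3} → {11, 12, 13}"""
    return a + 10


def g(b: int) -> str:
    """g : {11, 12, 13} → {'11', '12', '13'}"""
    return str(b)


A_SET = {1, 2, 3}

print([compose(g, f)(a) for a in sorted(A_SET)])  # ['11', '12', '13']

# Identity rule: f after id_A equals f, which equals id_B after f.
print(all(compose(f, identity)(a) == f(a) == compose(identity, f)(a) for a in A_SET))  # True
```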

Monoidal categories
The missing piece for a 2D calculus is a way to write in the horizontal direction. When we visualised
𝑚 ∶ ℕ × ℕ → ℕ as a diagram, we said that writing two cities 𝐴 and 𝐵 next to each other meant the
product of the two sets 𝐴 × 𝐵. In other words, writing cities in rows should have a good meaning,
where ‘good’ means that roads between these cities can run parallel in the vertical direction. That
is, in the case of sets, for every pair of functions 𝑓 ∶ 𝐴₁ → 𝐴₂ and 𝑔 ∶ 𝐵₁ → 𝐵₂, we have a new
function 𝑓 × 𝑔 ∶ 𝐴₁ × 𝐵₁ → 𝐴₂ × 𝐵₂. In our diagrams, we represent the road 𝑓 × 𝑔 by the roads 𝑓
and 𝑔 running parallel:

[Diagram: the road 𝑓 × 𝑔 from 𝐴₁ × 𝐵₁ to 𝐴₂ × 𝐵₂ equals the road 𝑓 from 𝐴₁ to 𝐴₂ running parallel to the road 𝑔 from 𝐵₁ to 𝐵₂.]

(𝑓 × 𝑔)(𝑎₁, 𝑏₁) = (𝑓(𝑎₁), 𝑔(𝑏₁)) for (𝑎₁, 𝑏₁) ∈ 𝐴₁ × 𝐵₁.
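The formula for 𝑓 × 𝑔 acts on each component separately, which is easy to mirror in code. A sketch with hypothetical roads `f` (doubling) and `g` (string length), using `itertools.product` to build 𝐴₁ × 𝐵₁ = {(𝑎, 𝑏) ∶ 𝑎 ∈ 𝐴₁, 𝑏 ∈ 𝐵₁}:

```python
from itertools import product


def product_map(f, g):
    """Return f × g : A₁ × B₁ → A₂ × B₂, given by (a₁, b₁) ↦ (f(a₁), g(b₁))."""
    def f_times_g(pair):
        a1, b1 = pair
        return (f(a1), g(b1))
    return f_times_g


# Hypothetical example roads running in parallel.
def f(a: int) -> int:  # f : A₁ → A₂
    return 2 * a


def g(b: str) -> int:  # g : B₁ → B₂
    return len(b)


A1 = {1, 2}
B1 = {"x", "yz"}

fg = product_map(f, g)
print(fg((3, "abc")))                          # (6, 3)
print(sorted(fg(p) for p in product(A1, B1)))  # [(2, 1), (2, 2), (4, 1), (4, 2)]
```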

Similarly to how the identity roads have no effect in the vertical direction, we require an ‘empty
city’ 𝐸 which has no effect in the horizontal direction:

𝐴 𝐸 = 𝐴 = 𝐸 𝐴.

A bit more formally, for each pair of objects 𝐴 and 𝐵, the object ‘𝐴 next to 𝐵’ is written as 𝐴 ⊗ 𝐵.
Parallel roads are written as 𝑓 ⊗ 𝑔 and 𝐸 is called the unit. A category with an ⊗ operation on
pairs of cities and roads and a unit 𝐸 is called monoidal. It should be clear that monoidal categories
provide a setting for 2-dimensional diagrams:


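[Diagrams: 𝑓 ∶ 𝐴 → 𝐵 ⊗ 𝐶 (one lane in, lanes 𝐵 and 𝐶 out); 𝑔 ∶ 𝐴 ⊗ 𝐵 → 𝐸 (lanes 𝐴 and 𝐵 in, nothing out, since ‘𝐸’ is drawn as nothing); ℎ ∶ 𝐴₁ ⊗ 𝐴₂ ⊗ 𝐴₃ → 𝐵₁ ⊗ 𝐵₂ (three lanes in, two out).]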

The monoidal structure on the category of sets is given by 𝐴 ⊗ 𝐵 = 𝐴 × 𝐵 and 𝑓 ⊗ 𝑔 = 𝑓 × 𝑔, and 𝐸 = {∗}
is the set with one element, so that {∗} × 𝐴 = {(∗, 𝑎) ∶ 𝑎 ∈ 𝐴}, which is just a copy of 𝐴 with each 𝑎
tagged by ∗.
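Why does 𝐸 = {∗} deserve to be called a unit? Because {∗} × 𝐴 is a copy of 𝐴: forgetting the ∗ and re-attaching it are mutually inverse functions. A minimal sketch, where the helper names `drop_star` and `add_star`, and the string `"*"` standing in for ∗, are our own choices:

```python
STAR = "*"  # stands in for the single element ∗ of E = {∗}
E = {STAR}


def drop_star(pair):
    """{∗} × A → A: forget the ∗."""
    _, a = pair
    return a


def add_star(a):
    """A → {∗} × A: tag a with the ∗."""
    return (STAR, a)


A = {"a", "b", "c"}
E_TIMES_A = {(s, a) for s in E for a in A}  # {(∗, a) : a ∈ A}

# The two maps undo each other, so {∗} × A is a copy of A.
print(all(drop_star(add_star(a)) == a for a in A))          # True
print(all(add_star(drop_star(p)) == p for p in E_TIMES_A))  # True
print(len(E_TIMES_A) == len(A))                             # True
```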

By now the room is probably silent and the fear that the audience has long drifted off into sweet
dreams of differential equations dawns on me. But...

— How do these monoidal categories relate to monoids like ℕ you were talking about at the start?

An intelligent question!

In the same way you call a set a monoid when you can multiply its elements, a category is called
monoidal when you can ‘multiply’ its cities and roads, and instead of a unit element you have a
unit city. A trendier way to say this is “monoidal categories categorify monoids”. This is reflected
in the fact that a monoid structure on an object of a category only makes sense when the category
itself has a monoidal structure.

Braided monoidal categories


In a braided category, the order of cities in a row can be swapped! To swap any two cities 𝐴 and
𝐵, we need a method of travel—a road—from 𝐴 ⊗ 𝐵 to 𝐵 ⊗ 𝐴. These roads should have two entry
lanes from the cities 𝐴 and 𝐵, and two exit lanes into 𝐵 and 𝐴, in that order. We’d also like these
roads, which we denote by 𝑏𝐴,𝐵, to resemble the 3D picture of one lane passing over another, which we saw when describing
the commutative property of ℕ. The next rules which need to be satisfied are directly influenced
by topology.

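Firstly, each passover road 𝑏𝐴,𝐵 should also be invertible, by a road $b_{A,B}^{-1}\colon B \otimes A \to A \otimes B$
resembling the mirror-image crossing, in which the lanes cross back the other way. The composition of two
such roads should be the same as the identity roads of 𝐴 and 𝐵 running parallel:
$$b_{A,B}^{-1} \circ b_{A,B} = \mathrm{id}_A \otimes \mathrm{id}_B$$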

The other conditions which need to hold just mean that if you take a number of cities (𝐴, 𝐵, 𝐶) and
reorder them (maybe to 𝐶, 𝐵, 𝐴) via such passover roads, the outcome should be the same journey:


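$$(b_{B,C} \otimes \mathrm{id}_A) \circ (\mathrm{id}_B \otimes b_{A,C}) \circ (b_{A,B} \otimes \mathrm{id}_C) = (\mathrm{id}_C \otimes b_{A,B}) \circ (b_{A,C} \otimes \mathrm{id}_B) \circ (\mathrm{id}_A \otimes b_{B,C})$$
[Diagram: both sides drawn as three lanes running from 𝐴 𝐵 𝐶 at the top to 𝐶 𝐵 𝐴 at the bottom.]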

Geometrically, this translates to ‘the order in which the roads lie above each other matters, not
the order in which one passes over the other’. As in this picture, the road connected to 𝐴 lies above
the road connected to 𝐵, which itself lies above the road connected to 𝐶. However, the order in
which they pass over each other does not matter.
A monoidal category with passover roads for any pair of cities, as described above, is called braided.
In the category of sets, the passover roads for sets 𝐴 and 𝐵 are provided by

𝑏𝐴,𝐵 ∶ 𝐴 × 𝐵 → 𝐵 × 𝐴, 𝑏𝐴,𝐵 (𝑎, 𝑏) = (𝑏, 𝑎), 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵.
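In sets, these rules can be checked by brute force on small examples. The sketch below (all helper names are hypothetical) implements 𝑏𝐴,𝐵 as a swap, checks that crossing back undoes it, and checks that the two ways of reordering (𝐴, 𝐵, 𝐶) into (𝐶, 𝐵, 𝐴) via passover roads agree. Note that in sets the inverse of 𝑏𝐴,𝐵 is simply 𝑏𝐵,𝐴; in a general braided category it need not be.

```python
from itertools import product


def b(pair):
    """Passover road b_{A,B} : A × B → B × A, (a, b) ↦ (b, a)."""
    x, y = pair
    return (y, x)


def crosses_back(A, B) -> bool:
    """Check that swapping back (b_{B,A}) undoes b_{A,B} on every pair in A × B."""
    return all(b(b(p)) == p for p in product(A, B))


# Passover roads acting on a row of three cities.
def swap_first_two(t):  # b ⊗ id : (x, y, z) ↦ (y, x, z)
    x, y, z = t
    return (y, x, z)


def swap_last_two(t):  # id ⊗ b : (x, y, z) ↦ (x, z, y)
    x, y, z = t
    return (x, z, y)


def left_route(t):
    """(b_{B,C} ⊗ id_A) ∘ (id_B ⊗ b_{A,C}) ∘ (b_{A,B} ⊗ id_C), applied right to left."""
    return swap_first_two(swap_last_two(swap_first_two(t)))


def right_route(t):
    """(id_C ⊗ b_{A,B}) ∘ (b_{A,C} ⊗ id_B) ∘ (id_A ⊗ b_{B,C}), applied right to left."""
    return swap_last_two(swap_first_two(swap_last_two(t)))


A, B, C = {1, 2}, {"x", "y"}, {True, False}
print(crosses_back(A, B))                                              # True
print(all(left_route(t) == right_route(t) for t in product(A, B, C)))  # True
print(left_route((1, "x", True)))                                      # (True, 'x', 1)
```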

For those with some university algebra knowledge, another important example of braided monoidal
categories is the category of vector spaces with the tensor product of vector spaces. This is in fact
where the notation ⊗ comes from.

I’m sure he can’t top this... —


Well...

The big finale... higher algebra!


Let’s say we want to describe a larger system than cities and roads between them. We really
want to know how two roads 𝑓 , 𝑔 between two cities 𝐴, 𝐵 are related to each other. Under this
geographical metaphor, this would entail looking at which streets connect the two roads within
the two cities:
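[Diagram: roads 𝑓 and 𝑔 from city 𝐴 to city 𝐵, with the ‘2-road data’ of streets connecting them; schematically, a double arrow from 𝑓 to 𝑔 between 𝐴 and 𝐵.]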

We call such a pair of streets connecting roads 𝑓 and 𝑔 a 2-road between 𝑓 and 𝑔 . A 2-category
carries the information of cities, roads and 2-roads (for those not entertained by my metaphors:
objects, morphisms and 2-morphisms) where we draw roads and 2-roads by → and ⇒, respectively.
Similarly to how we can compose ordinary roads, we compose 2-roads 𝜃 ∶ 𝑓 ⇒ 𝑔 and 𝜂 ∶ 𝑔 ⇒ ℎ
‘vertically’ to produce a new 2-road 𝜂 ∘𝑣 𝜃 ∶ 𝑓 ⇒ ℎ (drawn on the left, overleaf). We can only do
this when 𝑓 , 𝑔 and ℎ are all roads between the same two cities 𝐴, 𝐵. But in addition to this vertical
composition, 2-roads also have a horizontal composition (drawn on the right):


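[Diagram, left, vertical composition: 2-roads 𝜃 ∶ 𝑓 ⇒ 𝑔 and 𝜂 ∶ 𝑔 ⇒ ℎ between 𝐴 and 𝐵 stack to give 𝜂 ∘𝑣 𝜃 ∶ 𝑓 ⇒ ℎ. Right, horizontal composition: 𝜃 ∶ 𝑓 ⇒ 𝑔 from 𝐴 to 𝐵 and 𝜋 ∶ ℎ ⇒ 𝑘 from 𝐵 to 𝐶 sit side by side to give 𝜋 ∘ℎ 𝜃 ∶ ℎ𝑓 ⇒ 𝑘𝑔 from 𝐴 to 𝐶.]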

Such compositions need to act well together, ie the order of composing horizontally or vertically
should not matter:
[Diagram: a 2 × 2 grid of 2-roads; composing horizontally first and then vertically gives the same result as composing vertically first and then horizontally.]
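In symbols, this rule is usually called the interchange law. Writing 𝜃 ∶ 𝑓 ⇒ 𝑔 and 𝜂 ∶ 𝑔 ⇒ ℎ for 2-roads between roads from 𝐴 to 𝐵, and, as names introduced here only for illustration, 𝜋 ∶ 𝑓′ ⇒ 𝑔′ and 𝜌 ∶ 𝑔′ ⇒ ℎ′ for 2-roads between roads from 𝐵 to 𝐶, it reads as follows (with the convention above that 𝜋 ∘ℎ 𝜃 goes from 𝑓′𝑓 to 𝑔′𝑔):

```latex
(\rho \circ_v \pi) \circ_h (\eta \circ_v \theta)
  \;=\;
(\rho \circ_h \eta) \circ_v (\pi \circ_h \theta)
\qquad \text{both sides being 2-roads } f'f \Rightarrow h'h .
```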

Diagrams like the above provide a platform for a 2-dimensional calculus as well, and this is no
coincidence. The information for a monoidal category is equivalent to the information needed
for a 2-category with a single city. To better understand this, compare the pictures we have been
drawing:
Monoidal category ↔ 2-category with one city ∗
cities, eg 𝐴 ↔ roads from ∗ to ∗, eg 𝐴
roads, eg 𝑚 ↔ 2-roads, eg 𝑚
composition of roads ↔ vertical composition
monoidal operation ⊗ for cities ↔ composing roads
roads running parallel (⊗ for roads) ↔ horizontal composition
empty city ↔ identity road from ∗ to ∗

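[Diagram: the same string diagram, with roads 𝑚, 𝑓, 𝑔 and id𝐶 on cities 𝐴 to 𝐹, drawn once in the monoidal category and once in the 2-category with one city ∗.]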

The diagram above shows how information transfers between the two settings. This brings us


back to why we can draw a commutative monoid, such as the natural numbers, via 3D diagrams.
First remember that to talk about a monoid being commutative, we needed to be able to swap
elements. So we really need a braided monoidal category. In a similar fashion to how monoidal
categories are 2-categories in disguise, a braided category is a 3-category with one city and one
road, and provides a 3D calculus, where our commutative monoid ℕ can live!

— 𝑛 ∈ ℕ cheers! Monoidal categories rule! —

So maybe now while these cheers fill the air, my heckler walks out of the lecture room and slams
the door. I smile with pride, knowing that ‘category theory won today’.
No mathematicians were harmed during the making of this article. All audience members were fictitious
and no real mathematicians were forced to attend my lecture.

Aryan Ghobadi
Aryan is a PhD student in mathematics at Queen Mary University of London, working with
categories in quantum algebra. He is often the cranky audience member in the front row.
d sites.google.com/view/aghobadimath
My favourite game
Prisoner’s dilemma
Belgin Seymenoğlu

What I find interesting about the prisoner’s dilemma is that it shows that even if the most
beneficial outcome for two parties appears to be for both to cooperate, one or both of them may
be tempted to defect anyway. Moreover, we see many variants showing up on TV and in the real
world, eg will countries cooperate to cut carbon emissions? Will two players on the game show
Golden Balls choose to split their money or steal from the other?

Prisoners | Cooperate | Defect
Cooperate | (1 yr, 1 yr) | (10 yrs, free)
Defect | (free, 10 yrs) | (5 yrs, 5 yrs)

8/10
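The temptation to defect can be made precise using the sentences in the table. A minimal sketch, assuming each prisoner simply wants to minimise their own years in prison, and recording ‘free’ as 0 years:

```python
# YEARS[(row choice, column choice)] = (row prisoner's years, column prisoner's years),
# copied from the table; 'free' is recorded as 0 years.
YEARS = {
    ("cooperate", "cooperate"): (1, 1),
    ("cooperate", "defect"): (10, 0),
    ("defect", "cooperate"): (0, 10),
    ("defect", "defect"): (5, 5),
}
CHOICES = ("cooperate", "defect")


def best_response(other_choice: str) -> str:
    """The row prisoner's choice that minimises their own sentence, given the other's choice."""
    return min(CHOICES, key=lambda mine: YEARS[(mine, other_choice)][0])


for other in CHOICES:
    print(f"if the other prisoner chooses to {other}, the best response is to {best_response(other)}")
```

Defecting is the best response whatever the other prisoner does, yet mutual defection (5 years each) is worse for both than mutual cooperation (1 year each).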

Did you know...

…that while there are five Platonic solids we all know and love, there are actually six
‘Platonic’ polytopes in four dimensions, but only three in each dimension greater than four.
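The count of five in three dimensions can be recovered with a little arithmetic. A sketch, assuming the classical condition (not stated above) that a regular convex polyhedron with 𝑞 regular 𝑝-gons meeting at each vertex needs 𝑝, 𝑞 ≥ 3 and (𝑝 − 2)(𝑞 − 2) < 4, which says the face angles at a vertex add up to less than 360°:

```python
def platonic_symbols(max_side: int = 10):
    """Return the pairs {p, q} with p, q >= 3 and (p - 2)(q - 2) < 4.

    Any p >= 6 already fails for q = 3, so max_side = 10 is more than enough.
    """
    return [(p, q)
            for p in range(3, max_side + 1)
            for q in range(3, max_side + 1)
            if (p - 2) * (q - 2) < 4]


print(platonic_symbols())
# [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```

These five pairs are the tetrahedron {3, 3}, octahedron {3, 4}, icosahedron {3, 5}, cube {4, 3} and dodecahedron {5, 3}.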

My favourite game
Gran Turismo
The Cardigans
8/11

ZOOM
CONFERENCE
It's the twenties and there is time for… Zingo! Tired of those long meetings that seem
never to end? Stuck in a boring conference talk? Just want virtual learning to be
over? While away the time playing our most up-to-date version of the classic game
of BINGO! Play with friends: just ensure you’re on mute before shouting ‘full zouse’.

Call with 144 participants who aren’t automatically muted
Speaker who doesn’t know how to share their slides
“There’s a weird echo (echo) (echo)”

Losing connection and presenting to the void for 5 minutes
Arriving late because of time zones
“You’re on mute”

Windows XP hill virtual background
Having a complete meltdown and garnering national infamy
Pet cameo

“Can you repost the link?”
“You’re not on mute”
Leaning back and half disappearing into your background

Speaker overruns by 10 minutes yet the chair doesn’t Jackie Weaver them
Arriving early because of time zones
Three people have a personal conversation in a call of 30

Someone off camera hands speaker a cup of tea…
… “Ooh can I have one, milk no sugar” says some joker in the chat
Turning a boring editorial meeting into a magazine feature

On this page, you can find out what we think
of recent books, films, games, and anything
else vaguely mathematical. Full reviews of
many of the items featured here can be found
at d chalkdustmagazine.com

Only Connect
This was a thrilling final, with fiendish questions and some excellent quizzing. It was also lovely
to be able to support long-time friend of Chalkdust, Katie Steckles, and her team. What comes next
in this sequence:

★★☆☆☆ ★★★☆☆ ★★★★☆ ?

★★★★★

The Tired Sounds of Stars of the Lid
Great lockdown album.
★★★★★

Formalized Music
Iannis Xenakis
Formalised Music is a manual, of sorts, to create a machine that writes music. It can be quite
frustrating to decipher Xenakis’s writing, but his ideas on the mathematics of composition remain
influential and are worth exploring.
★★⯪☆☆

Stars of the Lid and their Refinement of the Decline
Perhaps the only thing better than The Tired Sounds of...
★★★★★

Klax
It’s no longer the nineties, but there’s still time for Klax.
★★★☆☆

Poems and Paradoxes
Kyle Evans & Hana Ayoob
A really fun collection of mathematical titbits and poems with delightful artwork.
★★★★⯪

Apollo 13
Brilliant. The 13th numbered thing is always the best.
★★★★★

Virtual meetings
There’s no √−𝟣 in MS Teams. I just wish i wasn’t on it all the time.
★☆☆☆☆
Chalkdust Book of the Year 2020
Molly and the Mathematical Mystery by Eugenia
Cheng and Aleksandra Artymowska
This is a beautiful book. Its pages are large, and full of wonderful illus-
trations. On each page, the reader is encouraged to help Molly con-
tinue on her adventure by finding information under flaps, opening
flaps to change available routes, or even using the flaps to construct a
path for Molly that takes her out of the page.
This book was selected by the editors of Chalkdust to be the Chalkdust
Book of the Year 2020, based on our four judging criteria: style, control,
damage and aggression.

Chalkdust Readers’ Choice 2020


Mathematical Adventures! by Ioanna Georgiou
and Asuka Young
This book doesn’t shy away from difficult ideas, such as the existence
of different sizes of infinity, and offers an excellent opportunity for a
child to meet interesting bits of maths that would often be deemed
‘too difficult’ for a few more years. This book would be a great way
to rediscover and share these interesting mathematical ideas with a
younger relative.
This book was voted by our readers to be the Chalkdust Readers’
Choice 2020.

Shortlisted
The winners were selected from our shortlist of seven books released in 2020. The five non-winning
books are all also very good. They were:

The Wonder Book of Geometry by David Acheson; How to Make the World Add Up by Tim Harford;
Hello Numbers! What Can You Do? by Edmund Harriss, Houston Hughes and Brian Rea; Why
Study Mathematics? by Vicky Neale; and Geometry Juniors by Ed Southall.

TOP TEN
This issue features the top ten calculator buttons. To vote on the top ten waves, go to
d chalkdustmagazine.com

At 10, it’s Mambo No. 5 (A Little Bit Of...) by Lou Bega.
At 9, it’s All Apologies by Nirvana.
At 8, it’s Mambo No.5 (A Little Bit Of... ) by Lou Bega.
At 7, it’s Up Allnight by Beck.
At 6, it’s Mambo No.5 (A Little Bit Of.. . ) by Lou Bega.
At 5, it’s M+ambo No.5 (A Little Bit Of...) by Lou Bega.
At 4, it’s Mambo No.5 (A Little Bit 0f...) by Lou Bega.
At 3, it’s My Name = by Eminem.
At 2 this issue, it’s Thunderstruck by AC/DC.
At 1, it’s Mambo No.5 (A Little Bit Off...) by Lou Bega.

Colour your own cellular automaton
(see pages 35–38)

1. Pick a number between 0 and 255.
[Boxes: the eight neighbourhoods 111, 110, 101, 100, 011, 010, 001 and 000, each with an empty box beneath it.]
2. Convert your number to binary and write its digits in the boxes above.
3. Flip a coin 8 times and write the results (0 for heads, 1 for tails) in the first row.
4. Use the rules you defined above to fill the rest of the grid with 1s (black) and 0s (any other
colour you like).
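The four steps above amount to running an elementary cellular automaton by hand. A minimal sketch, assuming the grid wraps around at its left and right edges (the card does not say how edge cells are handled); rule 90 and the first row in the example are made up purely for illustration:

```python
def rule_table(number: int) -> dict:
    """Map each neighbourhood (left, centre, right) to the new value of the centre cell.

    The 8-bit binary form of `number` is read against the neighbourhoods
    111, 110, 101, 100, 011, 010, 001, 000, in that order.
    """
    if not 0 <= number <= 255:
        raise ValueError("pick a number between 0 and 255")
    bits = format(number, "08b")
    neighbourhoods = [(l, c, r) for l in (1, 0) for c in (1, 0) for r in (1, 0)]
    return {nb: int(bit) for nb, bit in zip(neighbourhoods, bits)}


def next_row(row, table):
    """Apply the rule to every cell; cells at the edges wrap around (an assumption)."""
    n = len(row)
    return [table[(row[(i - 1) % n], row[i], row[(i + 1) % n])] for i in range(n)]


def run(number, first_row, steps):
    """Return the first row followed by `steps` further rows."""
    table = rule_table(number)
    rows = [list(first_row)]
    for _ in range(steps):
        rows.append(next_row(rows[-1], table))
    return rows


# Hypothetical coin flips for the first row (0 = heads, 1 = tails).
for row in run(90, [0, 0, 0, 1, 0, 0, 0, 0], 4):
    print("".join("█" if cell else "·" for cell in row))
```

Rule 90 is a convenient test case because its binary form, 01011010, makes each new cell the xor of its two neighbours.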

You might also like