
Computers: Broad & Shallow

Benjamin Crysup
©2018 Benjamin Crysup
All Rights Reserved

Preface

The goal of this book is to, as quickly as possible,
describe how a computer is built and programmed.
Every major topic worthy of study has some set of
core principles: if you truly understand those core
principles, you can work out the rest for yourself.
Accordingly, this book is meant to provide a Broad &
Shallow introduction to computer science by getting
the basics across. This book will turn “can’t”
into “won’t”; if you want to go deeper (“won’t” into
“did, and got paid well”), I have suggested a few
additional resources at the end of the book.
This book is in two parts: the first eleven
chapters are written to explain what a computer is
and how it is programmed, while the last thirteen
chapters are a random collection of topics. The
first eleven chapters must be read (and understood)
in order; the last thirteen can be read in (almost)
any order.

Contents

I How to computer
1 How to build transistors from silicon
2 How to build logic gates from transistors
3 How to build computation circuits from gates
4 How to build memory from logic gates
5 How to build a computer
6 How to program arithmetic
7 How to program with jumps
8 How to program with loops
9 How to program and preserve your sanity
10 How to program in an organized manner
11 How to program a compiler

II How to computer science
12 How to dope
13 How to data
14 How to algorithm
15 How to spin rust
16 How to make friends and influence people
17 How to stimulate
18 How to play sweet music
19 How to paint pretty pictures
20 How to push a few buttons
21 How to multitask
22 How to grammar
23 How to lengua
24 How to big
Part I

How to computer

Chapter 1

How to build
transistors from
silicon

The goal of this chapter is to explain the concept of
a bit, and how it relates to the fundamental unit of
computer hardware, the transistor.

Bits
Presumably, you’ve heard that computers make use
of bits and bytes. While technically true, without
knowing exactly what those are, that is an absolutely
useless statement. Additionally, there are two views
of what bits are that, while similar, carry enough
differences to confuse matters.
From an information theory standpoint, a bit
is a fundamental quantity of information: the
ability to distinguish between two states. These two
states could be zero or one, yes/no, hot/cold, A/B,
situation normal/everything’s on fire. Exactly what
the states are is not particularly relevant, just
that there are two possible states the system could
be in. If there are more possibilities, then each
bit of information cuts the number of possibilities
in half.
For instance, suppose you’re in an elevator with
four other people (two on your left, two on your
right), and you smell a fart. If you’d like to
know who did it, you might first ask whether it
was somebody on your left or right (one bit of
information), then on that side, which of the two
people was responsible (a second bit).

Of course, this assumes it wasn’t you.

In other words, each bit doubles the number of
possibilities you can distinguish. With one bit,
you can distinguish two possibilities; with two
bits, four possibilities; with three bits, eight
possibilities; and with eight bits (a byte), 256
possibilities.
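The halving-and-doubling rule can be made concrete with a few lines of Python (the book itself contains no code, and the function name here is my own invention):

```python
import math

def bits_needed(possibilities):
    """Number of bits required to distinguish this many states."""
    return math.ceil(math.log2(possibilities))

# One bit distinguishes two states, a byte distinguishes 256,
# and the four elevator suspects need two bits.
print(bits_needed(2))    # 1
print(bits_needed(256))  # 8
print(bits_needed(4))    # 2
```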
The other definition of a bit is the practical
one: how we go about storing and transmitting
that information. Transmission between computers is
a tricky question. However, inside a single computer,
all the techniques for storing a bit boil down to
taking some item that can be in one of two states and
detecting which state it is in. One of the oldest
examples of this is to take a piece of paper and
punch holes in strategic locations. The detector
then looks at each possible location to see whether
the paper is intact or punched. This method has been
in use since the 1830s, and is mechanically simple
(the detector can be as simple as a well of mercury,
a needle, and wire thread).

Punch cards and mercury: two things that can drive
you crazy.

However, paper is not the most durable of storage
media, and becomes less durable as a large number of
holes gets punched into it. Additionally, it’s kind
of hard to un-punch a piece of paper: if you make a
mistake, or things change, you need to go get a clean
sheet.
A more durable and re-writable solution is to use a
ferromagnetic material. Ferromagnetic materials are
those that become magnetized when held in a magnetic
field, and stay magnetized after the field is taken
away. Since a magnet has two poles, north and south,
all one needs to do to read a bit is to determine
whether the north end or south end of the material is
facing the reader.

The first byte of a funny cat photo.

Ferromagnetic materials tend to be fairly tough
(ferro = iron), and can hold their field for a long
time. For this reason, they tend to be used for
long term storage of information (magnetic tapes
and floppy disks were popular for a while, and
high capacity hard drives are still spinning chunks
of rust). However, because the material carrying
information has to be physically moved to the reader,
they also tend to be slow. Now, slow is a relative
term: modern hard drives have seek times of around
2 milliseconds. However, considering that modern
computers do calculations on nanosecond timescales,
it’s obvious some other transmission medium is used
for the actual logic of a computer.

Electricity
When two charged particles are close together, they
exert a force on each other. If they have the same
charge, they will push themselves apart, and if they
have opposite charges, they will pull themselves
closer together.

Electrons -, Protons +

If an electron (negatively charged particle) is
put near a large collection of negatively charged
particles, it will try to move away from that
collection.

Too much negative energy.

In the above system, the electron initially isn’t
moving, but after pushing away from the collection,
it will have gained speed (kinetic energy). If
one electron put near the group gains energy when
it moves away, it makes sense that two electrons
placed at the same location would gain twice as much
(total) energy when they move away. The same goes
for larger numbers of electrons, so when describing
this system, one only needs to describe the energy
gain per amount of charge. A very common unit to
describe the amount of energy charges can gain is
the Volt. Since electrons like to move away from
negative charges and towards positive ones, electrons
will move from negative voltages towards positive
voltages.
This suggests another way to represent a bit.
If one has a wire, they can examine the potential:
if the wire is at a negative potential (electrons
want to leave), it is interpreted as zero/no/cold,
while if the wire is at a positive potential, it is
interpreted as one/yes/hot.
As a side note, you may have noticed that batteries
are usually labeled with a positive terminal and a
negative terminal. This matches the above definition
of voltage: electrons tend to migrate from the
negative terminal (which is at a negative voltage)
towards the positive terminal (which is at a positive
voltage). So, in many cases, examining the potential
of the wire is equivalent to asking which terminal of
a battery the wire is connected to.

The symbol for a battery: originated with the
seminal experiment of making a dead frog twitch.

Semiconductors
If voltage is being used to represent a bit, a fairly
easy way to transport the value is by using a wire.
Metals happen to be fairly good conductors, as the
atoms each have a large number of electrons, many of
which are very far from the nucleus and very loosely
bound. With the dispersed nature of the electron
orbitals, the electrons in a metal can move very
easily, meaning that if the metal is attached to a
constant voltage, the entirety of the metal equalizes
to that voltage.

Positive nuclei adrift in a sea of electrons.

So, we have a method with which to move bits
around. However, we’d like a method to respond
to the bits, and in order to do that, we need to
understand silicon. Pure silicon has four valence
electrons, and as such prefers to form four bonds.
Electrons prefer to put as much distance between
themselves as possible, so the bonds are oriented
to the corners of a tetrahedron, the same way the
carbons in a diamond are oriented.

Some crystal structures are easy to draw. Diamond
not so much.

As a convenience, this can be represented as a
two-dimensional lattice.

Much easier to draw.

It’s important to note that the electrons in
silicon are tightly bound: the electrons in each
bond are stuck in that bond and cannot move. This is
because silicon has fewer electrons than most metals,
so the electrons it has are closer and more exposed
to the positive nucleus. Because the electrons
are tightly bound, they cannot move in response
to a voltage difference, so pure silicon is a poor
conductor of electricity.
The word “pure” turns out to be very important. If
there are impurities in the silicon, they introduce
faults in the crystal structure and destabilize the
neighboring bonds, which in turn introduce mobile
charges. If a group five element (phosphorus,
arsenic), which has five valence electrons, is
introduced, the crystal structure of the silicon
imposes four bonds on the atom (in a tetrahedral
arrangement), producing an extra electron that is
free to wander around the crystal, in turn producing
silicon that can conduct electricity.

Arsenic: 1001 uses around the house.

Alternatively, if a group three element (boron,
aluminum), which has three valence electrons, is
introduced into the lattice, the crystal structure
imposes four bonds on the atom, leaving behind an
electron deficient silicon. Other electrons may
migrate to fill this electron deficiency, producing
what looks like positively charged holes that move
through the silicon. Again, this produces a silicon
that can conduct electricity.

It’s worth repeating that the protons are going
nowhere: the electrons from the neighbors move to
fill the hole.

Transistors
Silicon “doped” with a group five element,
n-type silicon, has mobile negative charges that
can conduct electricity. Silicon doped with a
group three element, p-type silicon, has mobile
electron deficient regions (that can also conduct
electricity). If a region of n-type silicon borders
a region of p-type silicon, the mobile electrons
will scatter randomly and eventually encounter the
electron deficient regions, creating a region of
silicon without carrier charges (an insulator).

A line in the sand (a.k.a. a diode).

A seemingly useless construct has a region of
p-type silicon surrounded by two regions of n-type
silicon.

Also known as a bipolar junction.

In this case, we’ve put together a very complicated
insulator, as there are two insulating regions
between the wires. However, if we put a piece of
metal above the middle piece of silicon and attach
the plate to a positive voltage, the deficient
regions in the p-type will move away from the plate,
and the other electrons in the p-type silicon will be
drawn toward the plate.

The charges shown are only the excess charges; there
are a large number of paired protons and electrons
just hanging around.

This creates a channel of silicon that looks
like it’s n-type doped, allowing current to flow.
However, if the metallic plate is negatively charged,
the reverse happens.

Current’s not flowing today.

The region near the plate becomes even more
positively charged, making it an even stronger
insulator. This device is called a transistor. If
the gate is connected to a positive voltage, there
is a closed circuit between the source and drain
wires (meaning the source and drain will have the
same voltage), while if the gate is connected to a
negative voltage, there is no connection between the
source and drain (which means their voltages have no
relation).

The symbol used in this book for an n-channel
Metal-Oxide Semiconductor Field Effect Transistor.

A similar device can be made by reversing the
dopants. If the dopants are reversed (a block of
n-type silicon surrounded by p-type contacts), then
the behavior is reversed. If the gate is connected
to a positive voltage, then there is no connection
between the source and drain, while a negative
voltage closes the circuit.

The symbol used in this book for a p-channel MOSFET.
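Ignoring all the physics, each device boils down to a voltage-controlled switch. That behavior can be sketched in Python (my own code, with True standing in for a positive gate voltage and the function names being my inventions):

```python
def n_channel(gate):
    """n-channel MOSFET: source and drain connect on a positive gate."""
    return gate

def p_channel(gate):
    """p-channel MOSFET: source and drain connect on a negative gate."""
    return not gate

# A positive gate voltage closes the n-channel and opens the p-channel.
print(n_channel(True), p_channel(True))   # True False
print(n_channel(False), p_channel(False)) # False True
```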

Challenges
1) The methods to store and transmit bits presented
in this chapter are by no means the only options.
Come up with three other possible ways to store or
transmit data.

2) Carbon, Germanium, Tin and Lead have the same
number of outer shell electrons as Silicon. Do some
research on how suitable each of these materials is
as a semiconductor.

3) There are other ways to build a switch, including
relays, bipolar junction transistors, and vacuum
tubes. Do some research and compare and contrast
them with the metal-oxide semiconductor field effect
transistors (MOSFETs) described in this chapter.

Chapter 2

How to build logic
gates from
transistors

In the previous chapter, we went over how to create
transistors. The goal of this chapter is to use
those transistors to do basic logic.

Boolean Arithmetic
By now, you’re probably familiar with the basics of
arithmetic. For example, if you have 5 moneys and 3
moneys, you can ADD them to get 8 moneys.

3 + 5 = 8
Lots of moneys.

For some strange reason, philosophers don’t like
working with numbers: they prefer to work with
questions of True and False. The variables they
work with are termed Boolean variables, and are
surprisingly useful to the computer engineer: a
Boolean can take one of two values, just as a bit can
take on one of two values.
Just as there are basic operations that can be
performed with numbers (add, subtract, multiply,
divide), there are basic operations that can be
performed with Booleans. There are three operators
that are of immediate interest to us.
The first of these is the NOT operator. The NOT
operator takes one value, and changes it from false
to true, and vice versa. Many programming languages
use an exclamation point “!” for NOT, so that’s what
we’ll use.

! True = False
! False = True

The second operator of interest is the OR operator.
The OR operator takes two values and reports whether
either of the two values is True. We’ll use the
pipe character, “|”, for this, as it’s also common
in programming.

True | True = True
True | False = True
False | True = True
False | False = False

The final operator we’re interested in is the AND
operator. The AND operator takes two values and
reports whether both of the inputs are true: note
that, if only one input is True, OR will report True
while AND reports False. We’ll use the ampersand
character, “&”, for this.

True & True = True
True & False = False
False & True = False
False & False = False
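If you have a Python interpreter handy, you can try these operators yourself: Python spells them not, or and and, while the !, | and & symbols used in this book come from C-family languages. A quick sketch:

```python
# Python's Boolean operators match the tables above.
print(not True)          # False
print(True | False)      # True  (| works on Python bools too)
print(True & False)      # False

# Reproduce the whole AND table by trying every input pair.
for a in (True, False):
    for b in (True, False):
        print(a, "&", b, "=", a and b)
```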

Boolean Logic
This Boolean arithmetic allows us to construct some
basic logical statements. For instance, suppose
you receive regular reports about the status of a
chemical tank. If the reports say that everything is
O.K., then you don’t have to do anything. However,
if the reports say things are NOT alright, that would
be a good time to panic.

Panic = ! AllOK
Hold your breath.

Similarly, if your place of work has a fire
alarm, then if that alarm is going off you should
find someplace else to be. Additionally, seeing
everything on fire is also a good reason to find
someplace else to be. So, you should move if the
alarm is going off OR you see fire.

Move = Alarm | Fire

Some move when they see the light.
Some move when they feel the heat.

And of course, you know that the shambling horror
coming towards you is a zombie if it is rotting AND
hungers for your brains.

Zombie = Rotting & Braaains

Poor guy can’t catch a break.

Truth Tables
These operations can be chained together to produce
some relatively complicated logic. For example,
when fighting off the undead horde, you may need
to decide whether to stand your ground or run for the
hills. This decision will likely depend on how
much ammunition you have handy, how close the horde
is, and whether you have any open room to run.

Ignore cover and number of zombies at your peril.

My own personal plan for this situation would be to
run when I have a place to run to AND I’m low on ammo
OR the enemy is getting dangerously close. In other
words:

Run = can_run & (low_ammo | zombies_close)

As an example, if I had lots of ammo (low_ammo
= False), the zombies were close (zombies_close =
True), and I was unable to run (can_run = False),
then the run question becomes

Run = False & (False | True)

The OR operator returns whether either argument is
True, so this becomes

Run = False & True

The AND operator returns whether both arguments are
True, so I choose not to run.

Run = False

For more complicated expressions, it can be helpful
to run through all possible situations and determine
what the results would be in each case. This is
called a truth table. For running from zombies,
there are three inputs, meaning there are eight
possible situations in the truth table.

Zombie Truth Table

can_run low_ammo zombies_close Run?
False False False False
False False True False
False True False False
False True True False
True False False False
True False True True
True True False True
True True True True

Truth tables can be used when all other analysis
fails, as the number of situations for a Boolean
expression is finite. However, if there are a
large number of inputs, the number of situations can
still be large, making the truth table difficult to
calculate.

You have chosen... poorly.
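Enumerating every situation is exactly the sort of drudgery computers are good at. This Python sketch (my code, not the book's) reproduces the zombie truth table:

```python
from itertools import product

def run(can_run, low_ammo, zombies_close):
    # Run = can_run & (low_ammo | zombies_close)
    return can_run and (low_ammo or zombies_close)

# Try all eight situations, just like the truth table above.
for situation in product((False, True), repeat=3):
    print(situation, "->", run(*situation))
```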

The NOT Gate
Now that we know what these Boolean operators do, we
might like to build them. In the previous chapter,
we went over how to build electrical switches. There
was the n-channel MOSFET, which connects when the
gate wire is at a positive voltage, and disconnects
when negative. Using this and the p-channel MOSFET
(close on negative, open on positive), it is possible
to build circuitry that performs AND, OR and NOT.
The first thing we need to do is to define True and
False. Since we have two options, and our battery
has two terminals, one option is to say that positive
voltage means True and negative means False. So,
our NOT gate begins with a battery with two wires
connected to it.

Philosophy finally has a use.

The NOT operator takes ONE value and produces ONE
result. So, a NOT gate will take one wire as input
and will produce one wire as output: where these
wires come from and go to are not our problem right
now.

Where did they come from? Where did they go? Where
did they come from, Cotton-eyed Joe?

When the Input is positive, we want the Output to
be negative. We have a transistor that produces a
connection on positive, so we can use an n-channel
MOSFET to connect the negative wire and the Output.

Half a NOT.

When Input is at a positive voltage (True), then
the n-channel MOSFET w connects the Output to the
negative wire. Every spot on a connected wire will
equalize to the same voltage, so the Output will be
held at negative voltage (False).
However, when Input is at a negative voltage, w
breaks the connection between Output and the negative
wire. This leaves the voltage of Output “floating”,
and can provide real headaches when designing later
circuitry. To fix this, we can add a transistor that
closes on negative (p-channel).

A Complementary MOS NOT gate.

The n-channel transistor (w) still functions as
before: when Input has a positive voltage, Output is
connected to the negative wire. However, when Input
is at a negative voltage, the p-channel transistor
(x) provides a connection between the Output and the
positive wire, making the Output positive.

Output = !Input

When Input is positive (True), Output is negative
(False); when Input is negative (False), Output is
positive (True). This configuration of transistors
performs a Boolean NOT, and is called a NOT gate,
which has the following symbol.

These things show up everywhere.
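The behavior of the two transistors can be mimicked in code. In this sketch (mine, with True standing for a positive voltage), exactly one of the two "transistors" conducts for any input, so the Output never floats:

```python
def not_gate(inp):
    """A CMOS NOT gate modeled as two complementary switches."""
    pull_down = inp        # n-channel w: closes on a positive Input
    pull_up = not inp      # p-channel x: closes on a negative Input
    if pull_down:
        return False       # Output connected to the negative wire
    assert pull_up         # exactly one transistor conducts
    return True            # Output connected to the positive wire

print(not_gate(True), not_gate(False))  # False True
```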

AND/OR Gates
The circuits for AND gates and OR gates are fairly
similar. Both have two inputs (A and B) and one
output. For an OR gate, if either input is True, the
Output should be connected to the positive voltage.
This can be done with two n-channel transistors, one
for each input.

Half an oar.

If A is positive, there is a connection between
Output and the positive wire. If B is positive,
there is a connection between Output and the positive
wire. If both A and B are positive, there are two
connections between Output and the positive wire:
multiple connections just mean a better connection,
so the Output is still positive. However, Output
floats if both A and B are negative. In order to fix
this, we need two p-channel transistors.

A CMOS OR gate.

When A and B are both negative, the two p-channel
transistors (y and z) provide a connection between
Output and the negative wire.
When either A or B is positive (True), Output
is positive (True); when both A and B are negative
(False), Output is negative (False). This
configuration of transistors performs a Boolean OR,
and is called an OR gate, which has the following
symbol.

Either a spade or an OR gate.

For an AND gate, if either input is False, the
Output must be connected to negative voltage, while
if both inputs are True, Output must be positive.
This can be constructed in a similar manner as the OR
gate.

A CMOS AND gate.

This collection of transistors is called an
AND gate, and has the following symbol. Note the
straight edge on the left and the curved right side
(compared to the curved left and pointy right of the
OR gate).

This could never be confused for an OR gate.

Running Gates
Going back to our decision of whether or not to run
from the zombie horde: suppose we have three wires,
one carrying whether running is even an option,
another carrying whether we are low on ammunition and
the last carrying whether the zombies are close. Our
expression for running was

Run = can_run & (low_ammo | zombies_close)

If we want to create a circuit that decides whether
we should run, we’ll need an AND gate and an OR gate.

Fragile, bulky electronics are extremely useful in a
zombie apocalypse.

Challenges
1) Just like algebra, there are many ways to go about
manipulating and simplifying Boolean expressions.
Look up a few and see if you can simplify
(A & (B | !B)) | (C & !(A | C))
2) With the circuits for NOT, AND and OR, care was
taken to avoid having a direct connection between the
positive terminal and the negative. That event is
called an electrical short. Do some research into
what an electrical short can do.

3) There are some other operators that show up in
Boolean logic from time to time. Some notable ones
include XOR (true if the two inputs don’t have the
same value), NAND (true when AND would be false, and
vice versa) and NOR (true when OR would be false).
Design circuits for these gates using transistors.

4) Most chip designs don’t use AND, OR and NOT gates:
they typically use NAND gates. Figure out how to
create a circuit that behaves like a NOT gate using
only NAND gates. Create similar circuits for AND and
OR.

Chapter 3

How to build
computation circuits
from gates

In the previous chapter, we went over how to
implement basic Boolean arithmetic with transistors.
In this chapter, we will cover how to use Boolean
gates to build more complicated circuits.

Truth and Tables
A truth table is a list of all possible inputs to a
Boolean expression, paired with the results for each
possible input. So, if our expression is:

R1 = !A | !(B | C)

the truth table would be

A B C R1
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0

The truth shall set you free: True or False?

In this case, we know the Boolean expression that
produced this truth table. However, we don’t always
have that: if we do not know the expression, but
have the truth table, there is a series of steps that
can be used to generate an expression (which can be
easily turned into a series of gates).

1) Scan the truth table and find all rows for which
the result is True.

A B C R1
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
- - - -
- - - -
- - - -

I want the truth.

2) For each of those rows, AND the inputs, prepending
a NOT to those that are False.

A B C R1
0 0 0 1 !A & !B & !C
0 0 1 1 !A & !B & C
0 1 0 1 !A & B & !C
0 1 1 1 !A & B & C
1 0 0 1 A & !B & !C
- - - -
- - - -
- - - -

You can’t handle the truth.

3) OR the expressions for the rows.

R2 =
(!A&!B&!C) | (!A&!B&C) | (!A&B&!C) | (!A&B&C) |
(A&!B&!C)
This equation produces the exact same results as
the original (you might check this by constructing a
truth table). In step 2, equations were built that
were True for only one condition, and step 3 combined
them into an expression that was True for any valid
condition.
In this particular case, using R2 is a bad idea,
since we have the original statement (R1) which
is MUCH simpler. This method tends to produce
complicated expressions, so it’s really only useful
if you have a truth table without the expression that
created it.
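The three steps are mechanical enough to automate. Here is a Python sketch (my own code; the function name and table representation are my inventions) that grinds a truth table into a sum-of-products string:

```python
from itertools import product

def sum_of_products(table, names):
    """Build a sum-of-products expression from a truth table.

    `table` maps input tuples to a result; `names` labels the inputs.
    """
    terms = []
    for inputs, result in table.items():
        if result:  # step 1: keep only the rows where the result is True
            # step 2: AND the inputs, NOT-ing the ones that are False
            lits = [n if v else "!" + n for n, v in zip(names, inputs)]
            terms.append("(" + " & ".join(lits) + ")")
    return " | ".join(terms)  # step 3: OR the row expressions together

# Tabulate R1 = !A | !(B | C) over all eight input rows.
table = {(a, b, c): (not a) or not (b or c)
         for a, b, c in product((0, 1), repeat=3)}
print(sum_of_products(table, "ABC"))
```

Like R2 above, the output has five ANDed terms ORed together.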

The Multiplexer
In order to build a computer, there are three
circuits you need to understand: if you understand
how to build a multiplexer, an adder and a flip-flop,
you have enough knowledge to struggle through a full
computer. It won’t be easy, but at that point it’s
a matter of effort rather than knowledge (can’t vs.
won’t). We’re covering the multiplexer and the adder
in this chapter, and we’ll start with the simplest,
the multiplexer.

Suppose you’re a general: you have reports coming
in from two of your officers (Allen and Bert), and
you know you have spies in your organization.

Who could it be?

If your spymaster (Charlie) comes to you and says
Bert is a spy, then you’re going to throw away Bert’s
report and use Allen’s information. If Charlie
says Allen’s a spy, you’re going to use Bert’s
information.

Et tu, Brute?

This is the idea of a multiplexer: a multiplexer
is given two pieces of information (two bits, A and
B) and told which piece of information needs to be
used (given in bit C). The multiplexer then passes on
the good bit to the next thing in the circuit.

Symbol for a multiplexer.

When C (the control) is zero, the Output is
the same as B. When C is one, the Output is the
same as A. This is all well and good, but it’s not
immediately obvious what the Boolean expression
for Output is. However, building a truth table is
relatively straightforward.

A B C Out
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1

Treating a multiplexer as arithmetic may seem odd,
but if it works, it works.

We have a method for going from a truth table to an
expression: first, find the rows for which Output is
True.

A B C Out
- - - -
- - - -
0 1 0 1
- - - -
- - - -
1 0 1 1
1 1 0 1
1 1 1 1

Step 1

Then, construct expressions that are true only for
each row of the table.

A B C Out
- - - -
- - - -
0 1 0 1 !A & B & !C
- - - -
- - - -
1 0 1 1 A & !B & C
1 1 0 1 A & B & !C
1 1 1 1 A & B & C

Step 2

Collecting all the expressions together yields our
expression for a multiplexer.

Output = (!A&B&!C) | (A&!B&C) | (A&B&C) | (A&B&!C)

This gives us a circuit to select between two
possible options.

Output = A if C otherwise B
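You can check the derived expression against the intended behavior by brute force; with only eight input combinations, trying them all is cheap. A Python sketch (my own code):

```python
def mux(a, b, c):
    """The multiplexer, straight from the sum-of-products expression."""
    return ((not a and b and not c) or (a and not b and c)
            or (a and b and c) or (a and b and not c))

# The expression should match "A if C otherwise B" in every case.
for a in (False, True):
    for b in (False, True):
        for c in (False, True):
            assert mux(a, b, c) == (a if c else b)
print("multiplexer checks out")
```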

Binary Numbers
The next item is the adder: a circuit that can add
two numbers. However, before we can build that, we
need to know how to store numbers using bits. We
are going to limit ourselves to integers, or whole
quantities. The best example happens to be money:
it’s possible to have one penny, ten pennies, zero
pennies (sucks to be there, but it happens), or
negative one thousand pennies (in which case you owe
somebody $10). However, you can’t have half a penny.

Unlike the Spanish real, nobody will take a penny
that you cut into eight bits.

The most common system for writing numbers is
called the Arabic system (so named because it
originated in India). In this system, large numbers
of items are collected and represented by multiples
of some large denomination. Using money, you start
with zero pennies. Then, for the first nine pennies,
you simply count the number of pennies.

“Counting pennies” usually isn’t literal.

However, once you hit ten pennies, you trade in as
many pennies as you can for dimes. So, for instance,
if you have 23¢, you would have two dimes and three
pennies.

For some reason, the larger denomination is
physically smaller.

This continues until you collect ten dimes, at
which point you trade them in for dollars (one
hundred pennies each). So, 547¢ leaves you with five
dollars, four dimes and seven pennies.

Of course, we’d prefer the bill with Franklin on it.

This basic process is repeated for each group of
ten: once you have ten of something, you group it
and count it as the next largest group (I’m ignoring
nickels, quarters and half-dollars to make a point).
This is the basic idea of Arabic numerals: the
rightmost digit stores the number of “ones” (pennies)
you have. The digit to the left of that stores the
number of “tens” (dimes), and the digit to the left
of that stores the number of “hundreds” (dollars).
As an example, the number 123 is composed of one
“hundred”, two “tens” and three “ones”.

One hundred, two tens, and three ones.

Each place increases in value by a factor of
ten. However, there is nothing special about ten.
We could choose to increase by a factor of five,
seven, sixteen or any positive integer. Since
we are using bits, which can only represent two
values (let’s call them zero and one), a natural
option is to make groups of two. This means that
the rightmost digit still represents the number of
“ones” (pennies). However, the digit to the left now
holds the number of “twos”, followed by the number of
“fours”, “eights”, “sixteens”, “thirty-twos”, etc...

One eight, zero fours, one two and one one. Also
known as eleven.

This is called binary, and since each binary digit
(bit) can only take one of two values (zero or one),
we can store a number in a collection of logical
bits. We know how to move a logical bit through
a wire, so we have a way to represent numbers in
circuitry.

A byte is eight bits, or two nibbles.
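The trade-in procedure translates directly into code. This sketch (my own) peels a number into binary digits the same way you’d trade pennies up into dimes and dollars:

```python
def to_binary(n, width=8):
    """Peel off place values of two, least significant digit first."""
    bits = []
    for _ in range(width):
        bits.append(n % 2)   # how many "ones" are left over
        n //= 2              # trade pairs up to the next place
    return "".join(str(b) for b in reversed(bits))

print(to_binary(11))   # 00001011: one eight, one two, one one
print(to_binary(255))  # 11111111: the biggest byte
```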

Addition
Let’s say you have 263¢, and you rob somebody who
has 675¢. After you make good on your getaway, you’d
like to know how much money you have.

Big money.

The first thing you might do is collect the two
sets of pennies together; in this case, you’d collect
your original three pennies with the five stolen
pennies to get eight pennies total.

All in.

Then, you’d move on to the dimes. In this case,
your six dimes would be combined with the seven
stolen dimes to get thirteen dimes. Since we’re
collecting groups of ten, we trade ten dimes for a
dollar, leaving three dimes.

Amazing how they’re all still face up.

Finally, you’d collect your two dollars, the six
stolen dollars and the dollar you traded in the dimes
for, leaving you with a total of nine dollars, three
dimes and eight pennies.

And up to fifteen years.

This is the basic idea of addition: each
unit/denomination/place is collected, and any results
larger than nine add one to the next group (called a
carry).

12345 + 56789 = 69134

The same basic procedure works if your numbers are
stored in binary: you collect each digit, and any
results greater than one add to the next digit.

10101 + 00111 = 11100

There’s something important to note about addition:
when we’re collecting the “two” bits (the second
digit), we only need to know three things to tell
what the sum is. We need to know the “two” bits
from both source numbers, and we need to know whether
the “ones” bit produced a carry. For all positions
you do the same thing with those three pieces of
information, so if we can build a circuit that
produces a result for a single bit, we can produce
many of those “full adders” and chain them together
to add however many bits we want.

The Full Adder
Our full adder will be composed of two parts: one
circuit to determine what the sum should be, and one
to determine whether there was a carry. Our circuits
will have three inputs: the bits of the original
numbers (A and B) and whether the previous adder
produced a carry (C_in). With this knowledge, it’s
fairly straightforward to build a table going from
different input combinations to the sum (S) and the
carry (C_out).

A B C_in S C_out
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

You might check my work: programmers are notoriously
bad at arithmetic.

Since we have a truth table, we can build an
expression for the sum.

S = (!A&!B&C_in)|(!A&B&!C_in)|(A&!B&!C_in)|(A&B&C_in)

And we can also get one for the carry.

C_out =
(!A&B&C_in)|(A&!B&C_in)|(A&B&!C_in)|(A&B&C_in)

We have the gates with which to implement these two
expressions.

S = (!A&!B&C_in)|(!A&B&!C_in)|(A&!B&!C_in)|(A&B&C_in)

C_out =
(!A&B&C_in)|(A&!B&C_in)|(A&B&!C_in)|(A&B&C_in)

And these can be repeated to get the sum of n-binary
digit numbers.
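As a sanity check, here is a Python sketch of those two expressions, chained into a ripple-carry adder (the software equivalent of wiring full adders in a row; both bit lists are assumed to be the same length, least-significant bit first):

```python
# The sum and carry expressions straight from the truth table, then a
# chain of full adders.
def full_adder(a, b, c_in):
    s = ((not a and not b and c_in) or (not a and b and not c_in)
         or (a and not b and not c_in) or (a and b and c_in))
    c_out = ((not a and b and c_in) or (a and not b and c_in)
             or (a and b and not c_in) or (a and b and c_in))
    return bool(s), bool(c_out)

def ripple_add(a_bits, b_bits):
    carry, out = False, []
    for a, b in zip(a_bits, b_bits):      # one full adder per position
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)                     # carry out of the last adder
    return out
```

Feeding in 10101 and 00111 (least-significant bit first) produces 11100, matching the worked example above.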

Three bits. That’s all of eight values.

Challenges
1) The multiplexer presented in this chapter selects
between two options. Design a multiplexer that
selects between four options.

2) Figure out how to represent the number
3,735,928,559 (base 10) in binary (base 2) and
hexadecimal (base 16).

3) This chapter went over how to represent positive
integers in binary, but did not cover negative
integers. There are multiple approaches to
representing them: research and compare a few of
them.

4) Testing whether two numbers are equal can also be
done bit-by-bit. See if you can design a circuit to
do so.

Chapter 4

How to build memory
from logic gates

In the previous chapter, we went over how to create
direct computation circuits (e.g. adders and
multiplexers) from logic gates. The goal of this
chapter is to explain how to perform computation in
stages.

Motivation
Up to this point, all of our circuits have calculated
their results immediately: once the inputs have been
fed to the circuit, the output changes to match. For
simple problems this may be satisfactory, but as your
problems grow more complex, the circuits required to
solve them become increasingly complicated. Worse
still, it is incredibly difficult to use this type of
circuit to build a general purpose machine: if you
build an adder, all it will ever do is addition.
A solution to these problems is to break up the
computation into multiple steps. For instance, if
you have four numbers that need to be added together,
you might add them together two at a time. If the
computation is broken up into steps, then you can
reuse parts of your circuitry for each step: adding
four numbers takes one adder if done in steps, but
three if done all at once.
Of course, this doesn’t happen for free.
In particular, we need some way to store the
intermediate results between steps. This is what
a flip-flop circuit does. However, unlike the
adder and multiplexer, the flip-flop is relatively
complicated: building it requires having a clock
signal to drive the computation onward and being able
to build an SR latch.

Wrong kind of flip-flop.

SR Latch
The first thing we need is a Set-Reset latch.
This device is a circuit with two inputs named,
surprisingly, Set and Reset.

I’ve yet to see a good symbol for these.

If the Set input is connected to positive voltage,
the output will be “Set” to one. Similarly, when
Reset is connected to positive voltage, the output
will be “Reset” to zero.

About 50% of the time, these kinds of things get a
cute name. Welcome to the other 50%.

However, the critical point about an SR-latch is
that if both inputs have a negative voltage, the
output retains its value. So, if Set is originally 1
and then drops to zero, the output stays at one. If
Reset is originally one and drops to zero, the output
stays at zero.

Memory. Terrible memory, but memory.

Figuring out how to build a circuit that does this
is not exactly straightforward: our truth table
method doesn’t handle time very well. However, with
a little bit of intuition (i.e. black magic), it may
be possible to come up with the following circuit.

Magic, or more magic? You decide.

When Set is one (and Reset is zero), the system
forms a stable loop where the NOT gate y continually
produces a zero and the NOT gate x continually
produces a one. If Set drops to zero in this state,
the loop in between x and y maintains the output.

It can remember a one.

However, when Reset is one (and Set is zero),
the system forms the opposite stable loop, where
x produces zero and y one. If Reset drops down to
zero, the loop again remains stable.

It can remember a zero. It’s now smarter than 35% of
the population.

An astute reader may have noticed that I haven’t
mentioned what happens when both Set and Reset are
one. If this is done, the loop in the middle will be
held at zero. If S and R then drop to zero, the end
result is essentially a crapshoot, as the two parts
of the circuit fight over the loop. Randomness is
not a good trait in most electronic systems, so the
typical approach in this case is to never let Set and
Reset both be one at the same time.
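The looping behavior can be imitated in software. The sketch below simulates the common cross-coupled NOR construction of an SR latch (drawn differently from the figure here, but behaviorally the same loop of gates), updating both gate outputs until the loop settles:

```python
# One common SR-latch construction: two cross-coupled NOR gates.
# Each pass recomputes both gate outputs from the previous pass.
def sr_latch(set_in, reset_in, q=False, q_bar=True):
    for _ in range(4):                     # enough passes to settle
        q, q_bar = (not (reset_in or q_bar)), (not (set_in or q))
    return q

q = sr_latch(True, False)                  # Set: the output becomes one
q = sr_latch(False, False, q, not q)       # both low: the one is remembered
print(q)                                   # True
```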

Explosives may be a bit excessive.

The Clock
The SR-latch provides some memory, but you may have
noticed that the output changes immediately when the
inputs change. This means that performing sequential
logic using them is a very delicate balancing act, as
you have to worry about how long it takes each signal
to travel along the wires. Since we were trying to
avoid complexity, this is counterproductive. What
we need is some way to stop information from going
forward until we are ready: what we need is a clock.

Not quite.

A clock is a wire in our circuit that alternates
between being connected to positive voltage and
negative voltage.

It’s hip to be square (wave).

How exactly this is achieved varies from circuit
to circuit. A particularly cheap way is to hook up a
switch and mechanically move it back and forth.

The second-most boring job in the world.

The particularly brave among you may also realize
that the hot leads on the electrical sockets in your
house also provide an alternating voltage.

The wise among you will know not to try this.

However, most systems in use today make use of a
peculiar property of quartz crystals: if subjected
to mechanical stress, they will produce a voltage,
and if subjected to a voltage, they will deform.
With this, it’s possible to design a crystal that
resonates at a given frequency, producing a clock
signal.

If you’re curious, this is called the piezoelectric
effect.

Regardless of how we produce the clock signal, we
can use it to prevent the latches from pushing their
data along until everything is synchronized.

D Flip-Flop
With a clock signal and two SR-latches, it’s possible
to build a circuit that stores its input for one clock
cycle, and parrots it the next (a D flip-flop). To
do this, we need to gate one of the latches so it
only stores when the clock is positive (to store the
input). This can be done by adding a few gates on
the inputs so that, when the clock is negative, Set
and Reset are held at negative.

It only listens half the time. Those of you who
regularly deal with people know you’re still coming
out ahead.

When the clock is negative, the two AND gates (x
and y) hold Set and Reset at zero. However, when
the clock is positive, x and y will simply parrot
whatever Set and Reset are.
This circuit stores up the input when the clock is
positive. However, to prevent it from going forward
prematurely, we also need another latch: one gated
such that it blocks on a positive clock, and parrots
on negative.

With exactly one extra gate.

Connecting them together (and adding some gates to
turn data into Set and Reset) it’s possible to create
a circuit that stores an input on a positive clock,
and parrots it on a negative clock.

If you get this, the circuitry is all downhill from
here.

This device is called a D Flip Flop, and can be
used to store intermediate values between stages of
computation. The following symbol is used.

The “D” stands for “data”. Or possibly “delay”. Or
maybe “dammit”.

This D flip-flop stores data for one clock cycle.
This can easily be used to build a circuit that will
store a value until you say otherwise: all you need
is a multiplexer.
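Behaviorally, the combination works like this sketch (a software stand-in, with made-up names, for the flip-flop-plus-multiplexer circuit):

```python
# A behavioral model of a D flip-flop: over one full clock cycle it
# captures its input (clock high) and then presents it (clock low).
class DFlipFlop:
    def __init__(self):
        self.q = False                      # the remembered bit

    def tick(self, d):
        self.q = d                          # capture, then present
        return self.q

# A multiplexer on the input turns the flip-flop into a register that
# holds its value until a "write" signal selects new data instead.
def register_cycle(ff, write, new_value):
    d = new_value if write else ff.q        # the multiplexer
    return ff.tick(d)

ff = DFlipFlop()
register_cycle(ff, True, True)              # store a one
print(register_cycle(ff, False, False))     # hold: still True
```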

Repeat yourself. Repeat yourself. Repeat yourself.

Multiplication
The entire reason for building a flip-flop was to
break up computation into stages: it might be
helpful to see how this is done. A good candidate
for this is multiplication. Building a multiplier
using the truth table method is possible, but a huge
pain (the adder could be designed in isolation, but
each bit in a multiplier depends on all the bits in
the numbers).
However, there are multiple ways to break up
multiplication into steps. Multiplication is simply
repeated addition: if a defendant in a trial wants
to bribe each of the twelve jurors $10, then the
total amount of money needed is $10 + $10 + $10 +
$10 + $10 + $10 + $10 + $10 + $10 + $10 + $10 + $10.
That’s equivalent to 12 x $10, or 120 dollars.

In the world of money, Wilson speaks louder than
Hamilton.

In general, trying to multiply two numbers (let’s
call them A and B) is the same thing as trying to
add A to itself B times. There are faster ways of
performing this multiplication, but for right now,
we’re not overly concerned with speed.
The first thing we need to design is a circuit that
repeatedly adds a number to itself. To do this,
we can use multiple full adders (designed in the
previous chapter) and multiple D flip-flops.

Repeat ad-nauseum.

That circuit is a mess: in order to clean up the
drawing, the repeated elements will be drawn as one
element with ribbons carrying the data (rather than
single lines).

Much simpler.

Every clock cycle, the value stored in the
flip-flop will have A added to it, and the result
will be stored back in the flip-flop (replacing the
previous value). However, whenever the value of A
changes (for example, when we want to multiply two
different numbers), we need some way to reset the
value to zero. This can be achieved by adding a
multiplexer.

From hero to zero.

Whenever a new pair of numbers to multiply is
selected, the “Reset” wire can be set to positive
voltage to store a zero in the flip-flop.
We can add A to itself, however we need some way
to know when we’ve added B times: we need a circuit
to count the number of times A has been added with
itself. A counter can be created by repeatedly
adding one, and the test for when the multiplication
is done requires a circuit to compare whether two
numbers (specifically, the count and B) are equal
(this was a challenge for the previous chapter).

Are we there yet?

Both of these circuits together constitute a
multiplier. As an example of how it works, suppose
at the first clock cycle you want to multiply 4 and
3. The running total and the count are clamped to
zero due to Reset.

Ready, set, GO!

Every clock cycle, the 4 is added to the running
total, and 1 is added to the count. After three
clock cycles, the count is three (equal to the number
B), so the Ready wire signals that the running total
(12) is the product of 4 and 3.
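The clock-by-clock behavior can be written out as a loop (a behavioral sketch of the circuit, not the gates themselves):

```python
# Each clock cycle adds A to the running total and 1 to the count;
# the comparator raises Ready when the count equals B.
def multiply_by_clock(a, b):
    total, count = 0, 0          # both clamped to zero by Reset
    cycles = 0
    while count != b:            # Ready is low: keep going
        total += a               # the adder feeding the total
        count += 1               # the counter
        cycles += 1
    return total, cycles

print(multiply_by_clock(4, 3))   # (12, 3): the answer after three cycles
```

Unlike this loop, the real circuit has no way to stop itself once Ready goes high.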

Done

This circuit has no mechanism to stop adding, so
the next clock cycle it will keep on adding (it only
produces the answer for one clock cycle).

Whoa, stop, finish, halt.

However, later circuitry can be designed to manage
this quirk.

Challenges
1) A D flip-flop parrots the data (D) after one
clock cycle. A T flip-flop flips its output (true
to false, false to true) whenever its input is one.
See if you can design one.

2) See if you can design a multiplier (for four
bit numbers) that produces its result immediately.
Compare your design to the staged design in this
chapter.

3) The multiplier design given above produces the
result of AxB for one clock cycle: see if you can
add some circuitry to stop adding once the counter
hits B.

4) Multiplication is repeated addition. Division
is repeated subtraction. See if you can design a
circuit that takes two inputs (A and B) and produces
a quotient (the number of times you can subtract) and
remainder (whatever doesn’t divide cleanly).

Chapter 5

How to build a
computer

In the previous chapters, we went over how to create
logic gates, computational circuits and basic
memory. The goal of this chapter is to understand
the organization of a computer.

The Goal
All the examples up to this point have been single
purpose machines. The idea of a single purpose
computer is old: there are mechanical astrolabes
dating back to ancient Greece, and battleships in
the 1940’s used analog computers to calculate how to
angle the guns.

Shoot the moon.

A single purpose machine is wonderful if you only
ever have one kind of problem that never changes.
Unfortunately, we live in the real world, where
problems are numerous and ever changing. With
single purpose machines, you need a large number of
dedicated devices.
What would be useful is a general purpose computer:
a machine that reads a set of instructions, and
follows those instructions one by one until an answer
is reached. With this machine, if your problem
changes all you have to do is change which set of
instructions the machine will follow.

This time, DON’T destroy all humans.

At their core, this is what a modern computer
is: something that takes a set of instructions and
performs them.

Basic Organization
Unless you go with a very exotic architecture, most
computers consist of a place to store the current
state/data of the program, a place to store the
instructions (that describe how that state should
be modified), and a bunch of circuitry that takes
the current state and the current instruction and
COMPUTES the next state.

Not pictured: an ungodly number of gates and wires.

Most computers follow this setup, but the devil
is in the details: the biggest differences between
architectures tend to be how program state is stored
and the specific instructions the computer can do.
When it comes to program state, the most enduring
design seems to be the register-based von Neumann
flavor: at the time of this writing,
nearly every computer and computer-like device (smart
phones, laptops, game consoles, dumb phones) in use
by the general public uses this type of design. In
this case, the state of the machine is broken up into
bulk data storage (large but slow) and immediate use
storage (very small, very fast): these are called
memory and registers, respectively.

A LOT of computer science is simply managing
complexity.

There are typically a very small number of
registers: 8 to 32 is common. They are typically
given names and are referenced directly in programs
(you’ll notice that the drawing above has them being
managed individually). In contrast, memory is HUGE:
registers typically hold 64 to 256 bytes in total,
while RAM (Random Access Memory) has held over 65
thousand bytes since the 1980’s. Memory is typically separate from
the main computer, and instructions dealing with
memory typically work by address.

Seven bytes. A staggering amount of memory.

The CPU can ask the memory controller (“Complicated
Circuitry”) to give it the byte at a location (byte
3, for example).

Give me your digits.

It can also ask the memory controller to set a byte
to a value (for example, set byte 3 to 77).

Can you remember seven things?

Instructions
If you look at the von Neumann architecture,
you’ll notice that there is no place marked off for
instructions. That’s because instructions have to be
encoded as bits (our computer only deals with bits),
which memory is perfectly capable of storing. A von
Neumann architecture uses the same memory for storing
instructions and data.

Why reinvent the wheel or waste space?

So, now that we have our storage sorted out,
there are still two looming questions. First, how
does the computational circuit pick out the current
instruction? Most architectures have a special
register, called the instruction pointer (IP), that
stores the address of the next instruction. Each
clock cycle (kinda... it can get complicated),
the computation circuit takes the address in the
instruction pointer, asks memory for the byte(s)
at that address, figures out what those bytes are
telling the circuit to do, performs that command,
and updates the instruction pointer to the next
instruction.
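In software, that fetch-decode-execute cycle looks something like this sketch (the two opcodes here are made up for illustration, not taken from any real machine):

```python
# A toy fetch-decode-execute loop. Opcode 1 is "ADD dest, src1, src2"
# (a four-byte instruction); opcode 0 halts the machine. Memory holds
# both the program and (in regs) the data.
def run(memory, regs):
    ip = 0                                    # the instruction pointer
    while True:
        opcode = memory[ip]                   # fetch
        if opcode == 0:                       # HALT
            return regs
        elif opcode == 1:                     # ADD
            dest, s1, s2 = memory[ip + 1], memory[ip + 2], memory[ip + 3]
            regs[dest] = regs[s1] + regs[s2]  # execute
            ip = ip + 4                       # step past this instruction
        else:
            raise ValueError("unknown opcode")

print(run([1, 0, 1, 2, 0], [0, 5, 7]))        # [12, 5, 7]
```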

Simplicity itself.

The second question is: how are the instructions
encoded? By far the most common approach is to use
an opcode and arguments. The first few bits of every
instruction are a special code that specify the type
of the instruction. For example, 0101 at the start
of an instruction could mean the instruction is
an ADD instruction. Once you know you have an ADD
instruction, you know how to parse the rest of the
instruction. In our example, ADD would be a two byte
instruction, with the opcode in the first nibble, the
destination register in the second nibble, and the
two registers to add in the third and fourth nibbles.

0101 0000 0001 0010
ADD r0 r1 r2

Set register 0 to register 1 + register 2

When this instruction is run, the values in
registers 1 (0001) and 2 (0010) are added, the result
is stored in register 0 (0000), and the instruction
pointer is increased by two (ADD is a two byte
instruction).

Register0 = Register1 + Register2
IP = IP + 2
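Pulling the fields back out of those sixteen bits is just shifting and masking, as this Python sketch shows:

```python
# Decode the example ADD instruction: opcode in the first nibble,
# destination in the second, source registers in the last two.
def decode_add(word):
    opcode = (word >> 12) & 0xF
    dest = (word >> 8) & 0xF
    src1 = (word >> 4) & 0xF
    src2 = word & 0xF
    return opcode, dest, src1, src2

word = 0b0101000000010010              # 0101 0000 0001 0010
print(decode_add(word))                # (5, 0, 1, 2): ADD r0, r1, r2
```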

Instruction Flavors
Now that we have our instructions encoded, it’s
worth asking what instructions our computer will
have. If you look at any mainstream architecture,
you will find a fairly large number of instructions.
However, they tend to fall into one (or more) of four
categories.
The first category is that of simple computation.
These are instructions that take values in registers,
perform computation (add, subtract, multiply,
divide), and put the result into a register. These
are the workhorse instructions: any flavor of
arithmetic that the computer needs to do is done
with these instructions. The most common flavors
of arithmetic are numerical (add, subtract, etc...),
Boolean (and, or, not) and, perhaps strangely,
comparisons (7 > 3 is True). Different architectures
add their own spins in the name of speed (x86
includes some vector operations, and ARM includes a
free bit shift with every instruction), but in the
end it’s all simple computation.
The second category is memory management. Unless
your problem is small enough to fit in the registers
(only 64 bytes), there must be some way to move data
between registers and bulk storage. In particular,
you need to be able to get a byte from an address
(stored in a register) and put that byte into a
register.

Register0 = Memory (at address in Register1)
IP = IP + 2

You also need to be able to take an address (in a
register) and a byte (in a register) and set a byte
in memory.

Memory (at address in Register1) = Register0
IP = IP + 2

An important question is how many bytes we have
access to. For example, if our registers hold eight
bit numbers, and we use a register to specify a
location in memory, our memory can only hold 256
bytes. However, if our registers hold sixteen
bit numbers, our memory can hold 65536 bytes, and
thirty-two bits give us 4294967296 bytes. If you’ve
ever wondered what people meant when they said a
machine was 16, 32 or 64 bit, this is (mostly) what
they’re referring to, and is called the “word size”.
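Those counts come straight from powers of two:

```python
# The number of distinct addresses a register of a given width can name.
def addressable(word_size):
    return 2 ** word_size            # one address per bit pattern

for bits in (8, 16, 32):
    print(bits, "bits can address", addressable(bits), "bytes")
```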
Since registers tend to be larger than a byte,
there are typically instructions to move an entire
register worth of data at a time. So, if the machine
used 16 bit registers, there is probably a command to
get two bytes from RAM, and another to set two bytes
in RAM.
The third category is program flow control.
Normally, the instruction pointer goes straight
through instructions: instruction 1, then 2, then
3, and so on. However, you might want a program that
responds to changing events. For instance, if your
program is managing a heart monitor and a patient’s
pulse drops below 20 bpm, your program should start
sounding an alarm.

Not necessarily for the patient’s benefit.

However, the doctors are going to get annoyed if
your program always sounds an alarm. A conditional
jump can be used to skip past the alarm code if
everything is alright, while going through the alarm
code if something’s wrong.

Beeeeeeeeeep.

Flow control instructions change the instruction
pointer. Some of them work based on a condition:
for example, if register 1 (0001) is non-zero, change
the instruction pointer to the address in register 2
(0010).

if Register1 is non-zero
IP = Register2

if Register1 is zero
IP = IP + 2

There are also unconditional flow control
instructions; these are a useful organizational tool,
especially if the instruction also saves the current
value of the instruction pointer (allowing you to
return to the same point and continue).
The last category is the miscellaneous category:
there are a few operations that don’t fit cleanly
into the above categories (how do you communicate
with other hardware, how do you coordinate multiple
processors) that, while important, are also detail
dependent (i.e. architectures solve these problems
in different ways). The good news is that, if you
understand the other three categories, this isn’t
that hard to figure out.

Programs
Once you have your instructions, you just need to
figure out how to combine them to do what you want,
and you need to run them. For running the program,
there is usually some piece of logic (either baked
into the computer, or loaded in by something baked
into the computer) that can read the program’s bytes
from an external source (usually a hard drive) and
copy them into memory. Once they’re in memory, the
computer just needs to set the instruction pointer to
the first byte and start running.
Combining the instructions into a coherent program
is a fairly broad topic, and represents the dividing
line between hardware (transistors, gates and
circuits) and software (programs). It also happens
to be the topic of the second quarter of this book.

Challenges
1) The organization presented in this chapter is the
most common, but it’s not the only one. Research a
few other types of architectures, then compare and
contrast them with the von Neumann architecture.

2) Think of all the computer-type things you make
use of. Look up the type of chip used in two of
them, and find out how each chip encodes its add
instruction.

3) The categories of instructions listed in this
chapter stem from the RISC philosophy. There is
an alternative, CISC, in which instructions perform
more than one fundamental task (memory access and
add, for instance). Think of some advantages and
disadvantages of each approach.

4) If you were to design your own processor, what is
one instruction that you would include? Why would
you include it? How would you encode it, and which
category/categories would it fit in?

Chapter 6

How to program
arithmetic

In the first quarter of this book, we went over
how to create a computer. The goal of the second
quarter of this book is to understand how to write
and compile a program. The goal of this chapter is
to program basic arithmetic in assembly.

Assembly
A running program is a large collection of bytes.
If you really wanted to, you could arrange the
bytes yourself to produce the commands you want the
computer to run. However, the human eye was never
meant to look at long lists of numbers: the ones and
zeros will start bleeding into each other.

How exactly do you treat stab wounds on a number?

Luckily, your human eyes have been trained to
look at words and extract meaning (this is called
READING). Processor designers have noticed this, and
have written programs that will read in a specially
formatted text file and produce machine code. These
programs are called assemblers, and the text files
they read are called assembly files.
When figuring out how programs work, you will
have to work with machine instructions; this does
not mean you have to work with machine code. Most
lines in an assembly file turn into one instruction
for the machine: some assemblers provide a few
extra features, but by and large there is a one
to one equivalence between assembly and machine
code. In other words, programming in assembly IS
programming in machine code, just without the hassle
of remembering bit patterns. This means that an
assembly programmer knows exactly what the machine
will do in response to their program, and they can
squeeze out the full measure of performance from
their machines.

Not always this literally.

The fact that writing in assembly is equivalent to
writing in machine code means there are two very good
reasons to avoid programming in assembly if possible.
The first is that programs written in assembly will
only work on one type of processor. The machine
code and opcodes for an ARM processor (the processor
in your phone) look nothing like those for an x86
processor (the processor in your laptop), so assembly
written for an ARM processor will not run on x86, and
vice versa.

Count your blessings: this has historically been a


MUCH bigger issue.

Second, and more importantly, writing assembly
is absolutely tedious. Assembly gives you absolute
control over every facet of the machine, but that
means that you always have to specify everything
about the machine. If you want to perform a
calculation, you have to keep track of memory
locations (where are these numbers coming from), the
available commands (machine codes are only available
for simple commands) and the registers you are using
(for temporary storage of intermediate results).
However, if you want to learn how programs work,
you need to know assembly. This means we need to
choose a processor to write assembly for.

x86 Assembly (the RISC way)
When you’re learning, RISC processors are much
easier to comprehend, as each instruction has fewer
things going on (so it’s easier to keep track). The
processor running your laptop, Intel’s x86 design, is
decidedly NOT a RISC processor: it is a complicated
mess of a machine. However, if you program, you will
run into it, so this book will use x86 as an example.
That doesn’t mean that the full complexity of x86
will be presented: this book will use a subset of
x86 that looks like a RISC code. If you take this
up for a living, know that you should NOT program an
x86 machine the way this book does (there is a whole
lot of stuff I’m skipping). This book will also use
32 bit x86: while 64 bit is present and dominant,
x86 is backwards compatible, and the standard calling
convention is easier for 32 bit. Additionally, I
will use the syntax of the GNU compiler suite (it’s
what I’m most familiar with); if you use a different
compiler, the format may be different.
That said, let’s start this trainwreck. For a user
level program, your view of the x86 state is eight
32-bit registers (eax, ebx, ecx, edx, esi, edi, ebp,
esp), two Boolean registers (s, z), one instruction
pointer, and up to 4 billion bytes of memory.

Not pictured: utter madness.

You don’t have direct access to the instruction
pointer (there aren’t many architectures that give
you direct access) or the Boolean registers (they are
set by arithmetic commands: they stand for Zero and
Signed (i.e. negative)).

x86 Arithmetic Instructions
x86 provides a whole host of arithmetic instructions.
Here are seven useful ones.
addl source, destination
Calculates the sum of source and destination, puts
the result in destination, and sets s and z based on
whether the sum was negative, zero or positive. As
an example, “addl %eax, %ebx” will sum up eax and
ebx, and will put the result in ebx.

ebx = ebx + eax

subl source, destination
Calculates destination minus source, puts the result
in destination, and sets s and z based on whether
the result was negative, zero or positive. As an
example, “subl %eax, %ebx” will put ebx - eax into
ebx.
negl destination
Calculates the negation of destination, puts the
result in destination, and sets s and z based on the
result. As an example, “negl %eax” will calculate
the negative of eax (if eax holds 5, this will
produce -5) and store that in eax.
andl source, destination
Calculates a bitwise AND on source and destination,
puts the result in destination, and sets s
and z based on the result. As an example,
“andl %eax, %ebx” will perform 32 AND computations
(the first bit of eax & the first bit of ebx, the
second of eax & the second of ebx, etc...), and will
put all 32 results into ebx.
orl source, destination
Calculates a bitwise OR on source and destination,
puts result in destination, and sets s and z.
“orl %eax, %ebx” ORs eax and ebx, and puts the result
in ebx.
notl destination
Calculates a bitwise NOT on destination and puts the
result in destination (does not change s and z).
“notl %eax” will turn every one in eax into a zero,
and vice versa.
movl source, destination
Copies the value of source to destination (does not
change s and z). “movl %eax, %ebx” will set ebx to
the value in eax.
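To see how the s and z flags follow the arithmetic, here is a rough Python model of addl and subl (a simplification: it ignores 32-bit wraparound and the rest of x86’s flags):

```python
# Registers as a dictionary, flags set the way addl/subl set s and z.
def set_flags(result, flags):
    flags["z"] = (result == 0)       # z: the result was zero
    flags["s"] = (result < 0)        # s: the result was negative
    return result

def addl(src, dest, regs, flags):    # addl %src, %dest
    regs[dest] = set_flags(regs[dest] + regs[src], flags)

def subl(src, dest, regs, flags):    # subl %src, %dest -> dest - src
    regs[dest] = set_flags(regs[dest] - regs[src], flags)

regs = {"eax": 5, "ebx": -5}
flags = {"s": False, "z": False}
addl("eax", "ebx", regs, flags)      # ebx = -5 + 5 = 0
print(regs["ebx"], flags["z"], flags["s"])   # 0 True False
```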

x86 Memory Instructions
There are three commands of note that take data to
and from memory. None of these will change s and
z. The three presented here deal with 32 bit values:
the single byte commands have weird restrictions.
movl addrreg, destination
This will load 32 bits from memory, in the address
specified in addrreg, and put them into destination.
“movl (%eax), %ebx” will take the value from memory
(at the address in eax) and put that value into ebx.
movl source, addrreg
This will take the value in source and copy it
to memory, at the address specified by addrreg.
“movl %eax, (%ebx)” will store the value in eax in
memory (at the address in ebx).

movl number, destination
This will put a numerical value into the destination.
“movl $123, %eax” will put 123 into eax. Note that
addresses are also numerical values (including the
addresses of instructions). The reason this command
is in the memory section is that, technically, this
does move a value in from memory (code is stored in
memory).

x86 Control Flow Instructions
There are three commands of note that change control
flow.
jz *source
This will change the instruction pointer to the
address in source if the last arithmetic instruction
produced zero (z is set). For example, “jz *%eax”
will go to the address in eax if z is true.
js *source
This will change the instruction pointer to the
address in source if the last arithmetic instruction
produced a negative value (s is set). For example,
“js *%eax” will go to the address in eax if s is
true.
jmp *source
This will change the instruction pointer to the
address in source (period). For example, “jmp *%eax”
will jump to the address in eax.

Arithmetic
With these thirteen commands, it’s possible to
construct the basics of programming. We just need
to figure out how to chain them together to do what
we want.
The first thing worth going over is how to
construct complicated arithmetic expressions in
assembly. You’ll notice that all of the arithmetic
commands deal with only one or two numbers: if
we want to calculate something more complicated,
we’ll need a sequence of commands. Our first
order of business is to use assembly as a glorified
calculator.

Suppose you are an accountant for the local...
legitimate business, and you need to determine
whether one of the customers is paid up.

Officially, they’re a cleaning company.

The current customer has three entries in his account: a -$250 charge for a bachelor party, a -$50 charge
for blackmail from that party, and a payment of
+$275. We can start our program by getting all three
numbers into registers.
movl $-250, %eax
movl $-50, %ebx
movl $275, %ecx
Once they’re all in registers, we can start adding.
We’ll keep the result in eax.
addl %ebx, %eax
addl %ecx, %eax
The first addl puts (-250) + (-50) into register
eax (-300), and the second puts (-300) + 275 in
eax. These five instructions, taken together, put
the balance into eax: actually reporting the result
requires using a subroutine, which is covered later.

Sleeping with the fishes.

Variables
Now suppose you have another customer, who has 10
transactions on his account. If you remember, 32
bit x86 only has eight registers. We can’t store
all the numbers in registers, but we have memory to
work with. Most assemblers will let you add data
as well as code, so you might start by putting your
transactions in a data section.
.data
#int transaction1 = -250;
transaction1:
.long -250
#int transaction2 = -50;
transaction2:
.long -50
#int transaction3 = 275;
transaction3:
.long 275
# //and 7 more
The “.data” is an assembler directive: it does not
show up in the machine code, but lets the assembler
know to produce data instead of code. The items
beginning with “#” are comments: they are ignored
by the assembler (if you are familiar with C, you will know why the comments are written as they are;
if not, re-read this chapter after reading chapter
nine).
Most assemblers also allow you to specify labels:
a lot of code revolves around addresses, and figuring
out the address of something in code is tedious (and
error prone) arithmetic. This arithmetic is not
particularly hard, so the assembler can do it for
you. A label is a name followed by a colon.
And, at last, we get to the thing that actually
stores the data. The “.long” tells the assembler
that the numbers should be stored in 32 bits. The
numbers should explain themselves.
The code section (“.text”) becomes a little more
complicated. We need to get the address of each
transaction, load it into a register, and add it to
our accumulator. The first thing we need to do is
zero the accumulator.

movl $0, %eax

An address is a number: to get the assembler to fill it in for you, you use the label name instead of a
number.

movl $transaction1, %ebx

Then, you need to get the 32 bit value at that address.

movl (%ebx), %ecx

And finally, add.

addl %ecx, %eax

Putting it all together, your code section should look like

.text
# int* transactionLoc;
# int curTransaction;
# int sum = 0;
movl $0, %eax

# transactionLoc = &transaction1;
movl $transaction1, %ebx
# curTransaction = *transactionLoc;
movl (%ebx), %ecx
# sum = sum + curTransaction;
addl %ecx, %eax
# transactionLoc = &transaction2;
movl $transaction2, %ebx
# curTransaction = *transactionLoc;
movl (%ebx), %ecx
# sum = sum + curTransaction;
addl %ecx, %eax
# transactionLoc = &transaction3;
movl $transaction3, %ebx
# curTransaction = *transactionLoc;
movl (%ebx), %ecx
# sum = sum + curTransaction;
addl %ecx, %eax
# //and 7 more

Challenges
1) When I explained the thirteen commands, I was
paving over a lot of complexity. As an example,
it is possible to add an item in a register with an
item in memory. Look up how, and rewrite the second
example (the 10 transaction customer) making use of
this.

2) x86 has other arithmetic instructions. Look up how to multiply and divide integers.

3) What is the bitwise AND of 15 and 255? Write assembly code to calculate this, with the result in
register eax.

4) The second example shows how to use global variables: a storage space at a fixed location.
While useful, it’s possible to add ten numbers
without specifying an address in memory. See if you
can figure out how.

Chapter 7

How to program with jumps

In the previous chapter, we went over how to perform basic arithmetic in assembly. The goal of this
chapter is to organize and add conditions to code.

Conditional Problem Setup


Pilots in planes meant for combat typically have the
ability to eject. The ejection procedure is actually
relatively complicated: if the cockpit roof doesn’t
come off first, you’re going to have a pilot pancake.
So, like so much else, ejection might very well be
controlled by a computer.

Do a barrel roll.

If the plane is working perfectly, then the seat and pilot should stay planted in the plane: ejection
should be done when either the pilot says so, or
the temperature is higher than some threshold (200
degrees).

Hope you’re not afraid of flying, ‘cause this ain’t gonna help.

Let’s say you’re working on a team, and somebody else has determined the temperature and the pilot’s desires (the code to do that depends on the equipment in question: it could be as simple as reading
a memory location, or it could require handling
interrupts). Those pieces of information are stored
in variables, so the assembly file you’ve been given
to work on looks like:
.data
#int planeTemperature;
planeTemperature:
.long 0
#bool pilotWantsEject;
pilotWantsEject:
.long 0
.text
#void testEject(){
testEject:
#TODO
#}
When the program has run, some other piece of
code will have changed planeTemperature to the
temperature, and pilotWantsEject to zero (for no) or
something else (for yes). All we need to do is fill
in code at TODO to figure out if we need to eject the
pilot, and if so, jump to some other code that will
actually do that (which somebody else will provide:
again, it’s specific to the equipment).

Conditional Execution
If we determine that we need to eject, the provided
place to jump to will be called “ejectPilot”. The
first thing we should test is whether the temperature
is too high, which means we need to load the current
temperature.
movl $planeTemperature, %eax
movl (%eax), %ebx
We’ll also need to load in the threshold (200).

movl $200, %eax

Now for the actual test. We want to know if the temperature is over 200, but there’s no direct way to test this (that I’ve told you about). However,
we have a command (subl) that will change condition
codes. If we subtract the plane temperature from
200, we will get a negative value if the temperature
is larger than 200. If the temperature is less
than 200, we’ll get a positive value, and if it’s
equal we’ll get zero. So, we can test whether the
temperature is too high via subtraction.

subl %ebx, %eax

This will set the condition codes, so if the code for a negative result is set, we should jump to the code
to eject.
movl $ejectPilot, %ecx
js *%ecx
If we’ve got to this point, then the temperature
was not too high. So, now we need to test whether
the pilot wants to eject. Again, we need to get the
value in question.
movl $pilotWantsEject, %eax
movl (%eax), %ebx
And since movl does not change the condition codes,
we’ll need to compare it to something: since we want
to know if it’s equal to zero, that’s a good value to
compare to.
movl $0, %eax
subl %eax, %ebx
If the pilot does not want to eject (pilotWantsEject
equals zero), we should not jump to ejectPilot:
one way to accomplish this is to jump on zero to
skip the code we don’t want to do (namely, jump to
ejectPilot).
movl $skipJumpEject, %eax
jz *%eax
movl $ejectPilot, %eax
jmp *%eax
skipJumpEject:
This code will skip the jump if pilotWantsEject is zero, but will jump to ejectPilot for any other
value. So, the whole of our code looks like:

.data
#int planeTemperature;
planeTemperature:
.long 0
#bool pilotWantsEject;
pilotWantsEject:
.long 0
.text
#void testEject(){
testEject:
# if (planeTemperature > 200)
movl $planeTemperature, %eax
movl (%eax), %ebx
movl $200, %eax
subl %ebx, %eax
# {
# ejectPilot();
movl $ejectPilot, %ecx
js *%ecx
# }
# if (pilotWantsEject)
movl $pilotWantsEject, %eax
movl (%eax), %ebx
movl $0, %eax
subl %eax, %ebx
movl $skipJumpEject, %eax
jz *%eax
# {
movl $ejectPilot, %eax
jmp *%eax
#}
skipJumpEject:
# Whatever comes after your code
#}

...dangit.

Subroutines
The technical name for a block of code we can use and
reuse is a subroutine. There were a few subroutines
in disguise in the previous section (“ejectPilot”
and “testEject”). Additionally, there were a few
things we didn’t go over. The first of these is
that, if it’s meant to be used in multiple places,
the subroutine will need to be told where to return
to. For instance, suppose we’re writing a subroutine
to change the angle of the ailerons.

...wait, how did you escape the zombies?

In addition to a return address, we also need to decide how we’re going to give the subroutine
information. In the eject example, “ejectPilot” did
not need any information, but “testEject” read its
data from variables in memory. As a first attempt
at a calling convention, let’s say that the return
address is passed in %edx, and information is passed
via variables. So, if the name of our subroutine
is “angleAileron”, then if we need to call it, our
assembly might start by setting the angle we need to
rotate by:
.text
# our main code
# angleAileronArgumentAngle = 15;
movl $angleAileronArgumentAngle, %eax
movl $15, %ebx
movl %ebx, (%eax)
Then it might store the return address and jump to
the subroutine.
# angleAileron(15);
movl $ourReturnAddress, %edx
movl $angleAileron, %eax
jmp *%eax
We need to define where to return to. When that address is returned to, we should then go on to do
other things (like check the plane temperature),
confident that the ailerons have been moved.
ourReturnAddress:
movl $planeTemperature, %ebx
movl (%ebx), %ebx
#...

Nevermind.

The angleAileron subroutine can be defined elsewhere in the assembly file (with some extra
work, it can even be defined in other files). If
using variables, the variable will need to be defined
somewhere.
.data
# int angleAileronArgumentAngle;
angleAileronArgumentAngle:
.long 0
And the code for the subroutine will also need to be
defined.

.text
# void angleAileron(int Angle){
angleAileron:
movl $angleAileronArgumentAngle, %eax
movl (%eax), %ebx
# ... a whole bunch of code
And at the end, it will need to return.

jmp *%edx

If we do this, we can call this subroutine from our main code, and once the subroutine is done, it will
go back to where we left off in our main code.
This could have been done by adding a label
to our main code, and jumping to it at the end
of angleAileron. However, if we did that, then
angleAileron could only be called from one location:
if we needed to angle the ailerons in multiple
locations (i.e. part of our code angles in response
to the pilot, part of it angles as part of the
autopilot, another part angles during startup), we
would need to write the subroutine multiple times.

If angleAileron sees a lot of use, you will have a lot of copies.

Passing a return location means we can just write angleAileron once, and use it in multiple locations.

If angleAileron sees a lot of use, you will only have
one copy.

Stacks
Older languages (FORTRAN) actually stored their
arguments in variables. This works well enough,
until you have multiprocessor machines, which can
have two processors working on the same memory at the
same time: if they ever run the same subroutine,
they will trash each other’s data. Recursive
subroutines also break when using variables for
arguments: recursion is where a subroutine calls
itself, and is extremely useful for handling trees,
but if a subroutine calls itself (and arguments are
passed in variables), it will trash its own variables
in the process.
So, modern programming languages use a calling
convention that relies on a stack. When the program
first starts, it carves out a substantial section of
memory for its own use, and stores the first location
in a register (on x86 it is esp, the “extended stack
pointer”). Instead of saying the argument lives at
a specific address ($angleAileronArgumentAngle is a
specific number, such as 828), they say it lives some
number of bytes away.

Using a stack allows you to have multiple processors
(each with their own registers, including esp) using
the same memory.

So, angleAileron would start by, instead of reading the angle from a variable, reading the angle from the
stack (in this case, 4 bytes after the location in
esp).
.text
# void angleAileron(int Angle){
angleAileron:
movl $4, %eax
addl %esp, %eax
movl (%eax), %ebx
# ... a whole bunch of code
jmp *%edx
Whenever angleAileron is called, the main code needs
to change esp and put the angle in its proper place
(relative to esp).

.text
# our main code
#//move esp back 8 bytes
movl $-8, %eax
addl %eax, %esp
# angle = 15;
movl $4, %eax
addl %esp, %eax
movl $15, %ebx
movl %ebx, (%eax)
The rest of the call remains the same (save the
return address, jump to the subroutine). The only
other thing that’s different is that, when the main
code is returned to, it needs to restore its stack
pointer (esp).
movl $8, %eax
addl %eax, %esp

Challenges
1) If you have two 32-bit boolean variables (varA
and varB), see if you can write some assembly that
calculates (varA | varB).

2) Suppose you have a program with four 32-bit variables (varA, varB, varC and varRes). See if you
can write some assembly that sets varRes to -1 if
varA is greater than varB, or varC is less than zero.
If neither is the case, set varRes to 0.

3) See if you can write a subroutine that takes two arguments and returns their sum in ebx. For extra
points, pass the arguments on a stack.

4) Write some assembly that uses that subroutine to add four numbers (1, 2, 3 and 4). You should call that
subroutine three times.

Chapter 8

How to program with loops

In the previous chapter, we went over conditional execution and subroutines. The goal of this chapter
is to do computation with variable amounts of data.

Text
Our computer is built to handle bits. It happens
that there is a relatively straightforward way
to approach bits as integers. However, text
is a different beast. There’s not a direct,
systematic way to go between bits and letters: a
bit distinguishes between two possibilities, but the
choice of which option corresponds to 1 and which to
0 is fairly arbitrary.
That said, this book is written in English, so I
assume you can read English. In this day and age, if
you’re programming for English, you’re using ASCII:
ASCII is a code that equates numbers to characters.
So, if you see the number 70, you can look up what
that is in ASCII (capital F), while the number 117
corresponds to a different character (lowercase u).

What starts with “F” and ends with “uck”?

For the international audience, there are other encodings: the current winner seems to be Unicode
(partly because UTF-8 happens to be identical to
ASCII for English characters). By the way, most
historical English encodings (of which ASCII is the
winner) have fewer than 256 characters. Eight bits
allows you to distinguish between 256 possibilities,
so now you know why there are eight bits in a byte.
Many operating systems, upon starting a program,
provide three different streams. One of them is
an input stream: if the user types in text on the
keyboard, the operating system stores it and allows
the program to get the ASCII values. There are also
two output streams, where the program can provide
text (in ASCII) to show to the person using the
program (one is for general communication, one is for
errors).

It’s easy to understand why the program caught fire:
take a close look at the gears.

For convenience, let’s assume someone else wrote a subroutine called “putchar” with a very simple calling convention: return
address in edx, and character to write in eax.
This subroutine will write a single character to
the general communication stream. If we wanted to
write the text “Hello, World!” (example mandated
by the programming language tutorial mafia), then we
could start by loading “H” (72) into eax, the return
address into edx, and then jumping to the subroutine.
movl $72, %eax
movl $retaddr1, %edx
movl $putchar, %ebx
jmp *%ebx
retaddr1:
To write the whole text, this can be repeated for
each character.

# putchar(‘H’);
movl $72, %eax
movl $retaddr1, %edx
movl $putchar, %ebx
jmp *%ebx
retaddr1:

# putchar(‘e’);
movl $101, %eax
movl $retaddr2, %edx
movl $putchar, %ebx
jmp *%ebx
retaddr2:
# putchar(‘l’);
movl $108, %eax
movl $retaddr3, %edx
movl $putchar, %ebx
jmp *%ebx
retaddr3:
#// and 9 more

Loops
Writing the same code thirteen times is a little
ridiculous: it gets even worse if you’re using
longer strings of text. What would be nice is if
we could write one small piece of code that can write
the whole string. One possible way to do this is,
first, to store the text as data instead of putting
it in the code.
.data
# char* textHW = “Hello, World!”;
textHW:
.long 72, 101, 108, 108, 111, 44, 32
.long 87, 111, 114, 108, 100, 33, 0
(Two .long statements do the same thing as a single
.long statement with all the values). Our code might
first get the location of our data.

movl $textHW, %ecx

Now, you might have noticed that I added a zero at the end of the character list. Zero does not correspond to any printable ASCII character, so it is a useful end-of-text marker. We have the address of a character (in ecx), so we can get that character, compare it with zero (via subtraction), and if it is zero we can quit.

loopTest:
movl (%ecx), %eax
movl $0, %ebx
subl %ebx, %eax
movl $loopEnd, %ebx
jz *%ebx
If the current character is zero, we skip to the
end. Otherwise, we go on to the next code (printing
a character; remember that eax still holds the
character).
movl $retaddr1, %edx
movl $putchar, %ebx
jmp *%ebx
retaddr1:
And after we print, we need to move to the next
character (4 bytes ahead) and repeat.
movl $4, %ebx
addl %ebx, %ecx
movl $loopTest, %ebx
jmp *%ebx
Putting it all together, we get:

.data
# char* textHW = “Hello, World!”;
textHW:
.long 72, 101, 108, 108, 111, 44, 32
.long 87, 111, 114, 108, 100, 33, 0
.text
# char* curCharAddr = textHW;
movl $textHW, %ecx
loopTest:
# while(*curCharAddr)
movl (%ecx), %eax
movl $0, %ebx
subl %ebx, %eax
movl $loopEnd, %ebx
jz *%ebx
#{
# putchar(*curCharAddr);
movl $retaddr1, %edx
movl $putchar, %ebx

jmp *%ebx
retaddr1:
# curCharAddr = curCharAddr + 1;
movl $4, %ebx
addl %ebx, %ecx
#}
movl $loopTest, %ebx
jmp *%ebx
loopEnd:
#//The rest of your code.

This code starts by testing whether the current character is valid (i.e. not zero), stopping if it isn’t. Otherwise, it prints the character, moves to the next one, and repeats.

There actually is a standard way to do flowcharts, though if you need to know it to read one, somebody has failed.

This basic structure (test for end, do, increment, repeat) shows up absolutely everywhere in programs. Many higher level programming languages bend over backwards to dress up and streamline this “LOOP” structure. The most basic and versatile flavor of loop (the while loop) follows the same paradigm (test, do, repeat).

Free Malloc
There is one last thing we need to cover before we
can stop banging our heads on assembly. So far,
we’ve been baking our text into the program itself:
while this approach can be made to work with variable
input, it does require that we know the size ahead of
time. This is not always the case: as an example,
let’s say you’re worried about spies reading the
messages you send out, and want to obscure what
you’re writing.

Jcpfu wr!

One very old way to do this is with a Caesar shift cypher (guess who developed it). In a Caesar cypher,
you add some number (the rotation) to each letter
in the message: if the rotation is two, “A” becomes
“C”, “B” becomes “D”, and so on.

I came, I saw, I ran.

If you’re going to write a subroutine to do this, it will need to take the address of the first
character, the length of the string (zero might be
in a valid encrypted string, so we can’t use it as
a marker), and the rotation. For simplicity, let’s
store them in variables.
.data
# char* plainText;
csaPlainText:
.long 0
# int numChars;
csaNumChars:
.long 0
# int rotation;
csaRotation:
.long 0
If we want to store our encrypted string without
overwriting the original (we’re only worried about
spies reading messages in transit, and we might still
have things to do with the plaintext), we should
probably carve out new space to store the result:
this means we need some way to carve out new space.

csaPlainText (at location 44) holds the location of
the plaintext (128). Our cyphertext will live at
location 249, which we got from... somewhere.

The details are actually fairly complicated, as the subroutine for getting memory has to be fast
(it will be used often). However, the basic idea is
that there is a large reservoir of memory (the heap)
organized in such a way that the program can loop
through to find an unused section of an appropriate
length (one slow option is a linked list: check
out the relevant section in the chapter on data
structures).

Exactly how this subroutine got the memory in the
first place varies from system to system.

Luckily, most programming languages will provide an implementation for you. Let’s say the subroutine we
have is called “malloc”. This subroutine takes the
number of bytes we need to carve out and will give
us back an address where there are bytes reserved
for our use. So, if we’re writing our Caesar shift
cypher, we might first get the number of characters
we need to store.
.text
caesarShift:
movl $csaNumChars, %eax
movl (%eax), %eax
The next thing to do is calculate the number of
bytes we need to store the shifted characters.
Since we’re using four bytes per character, we need
to multiply by four (we should be using one byte
characters, but the x86 commands for handling bytes
take some extra explaining). x86 has a multiply
command, but adding the register to itself (twice)
can quadruple the value (possibly faster than the
multiply instruction).
addl %eax, %eax
addl %eax, %eax

With the number of bytes we need, we can call the
malloc subroutine.
movl $malloc, %ebx
movl $retaddr1, %edx
jmp *%ebx
retaddr1:
This will give us an address we can store the shifted
characters at (in eax). The next thing to do is
initialize: get the number of characters (into ebx),
the rotation (ecx) and the original text (edx). It
would also be a good idea to store a copy of the
address in eax (we’ll use edi).
movl %eax, %edi
movl $csaNumChars, %ebx
movl (%ebx), %ebx
movl $csaRotation, %ecx
movl (%ecx), %ecx
movl $csaPlainText, %edx
movl (%edx), %edx
Then comes the first part of the loop: testing. We
need to see if the number of characters (remaining)
is zero.
loopTest:
movl $0, %esi
subl %ebx, %esi
movl $loopEnd, %esi
jz *%esi
Then comes the part of the loop that actually does
something, the body. We need to get the current
character, shift it, and store it.
movl (%edx), %esi
addl %ecx, %esi
movl %esi, (%eax)
Then comes the last part of the loop: the update.
We need to move the two character addresses and
decrease the number of characters to move.

movl $4, %esi
addl %esi, %eax
addl %esi, %edx
movl $-1, %esi
addl %esi, %ebx
After the update, we need to go back to the test.
movl $loopTest, %esi
jmp *%esi
loopEnd:
And finally, after the loop, we should restore the
address of the original text (into eax) and the
address of the shifted text (into ebx).
movl $csaPlainText, %eax
movl (%eax), %eax
movl %edi, %ebx
When we return, the calling code will have access to
both the original text and the cypher text.
Since we carved out memory for the cypher text, at
some point the calling code will have to surrender
it using another subroutine (called free). If it
didn’t, over time the program would use more and more
memory each time it called malloc (malloc won’t use
a reserved byte, but calling free un-reserves the
bytes). Eventually, the computer would run out of
unreserved memory, and the program would crash (this
is just one of the causes of random crashing).

Challenges
1) Look up an ASCII code table. Figure out what
numbers are used to represent “I am 1337.”.

2) Look up a few alternate encodings. Figure out what numbers they would use to encode “I am 1337.”. For encodings that use more than one byte, note the
number of bytes used for each character.

3) Write an assembly file that contains a list of numbers. Write assembly code that loops through
those numbers, doubling them.

4) The Collatz sequence is defined as: for any given number, if it is even, divide by two, but if
it is odd, multiply by three and add one. So, if the
current number is 6 (even), the next number would be
3 (divide by two). But, if the current number is 3
(odd), the next would be 10 (3*3+1). Write a while
loop that applies this rule while the number is not
one. You may need to avail yourself of some of the
other instructions of x86.

Chapter 9

How to program and preserve your sanity

In the previous chapter, we went over pointers and loops. The goal of this chapter is to stop using
assembly.

The C Programming Language


I hold that assembly is actually fairly simple:
at least, the concepts are (state, memory, and
instructions to change that state). Problem is,
getting assembly code to do anything substantial is
tedious. In many ways, its simplicity works against
it: you, the programmer, have to specify everything.
Additionally, assembly code is tied to one specific type of processor: an x86 assembly file won’t help you if you own an ARM processor.
A solution to this problem is to have a compiled
high level programming language. The idea is to
write your program in one format, and then run a
program called a compiler (that somebody else wrote)
that converts your program to assembly. With a well
designed language, doing anything important should
be fluid and straightforward (at least, relative
to assembly). As well, if you need to move to a
different processor, all you need to do is get a
different compiler.

We hope and pray somebody else wrote the compiler.

This idea is old (1950’s). However, it took a while to be used for operating system development:
most computers have a program running in the
background to manage the hardware of the computer
(what program is currently running, which program
is using which part of memory, starting and stopping
programs). This program (called an operating system)
must be fast and it must be able to do strange things
with addresses.

Not the strangest house I’ve delivered to.

If you wanted to do strange things with addresses,
you either worked directly in assembly, or you
waited until 1972, when Dennis Ritchie created the
C programming language. C’s fundamental type, the
pointer, is simply an address. One of the main draws
of C is that it works at the computer’s level: it’s
fairly easy to see how the constructs of C might
translate to assembly. In fact, I’ve been marking
the assembly examples with the C code that might have
produced them: you might go back and review after
reading this chapter.
This chapter will cover C, but it will be very
fast. If this is your first crack at C, you might
want to go find some additional resources. If this
is your first crack at programming at all, even more
so (also, well done making it this far).

Hello World
If you are following along, this is the first part
of the book where I expect you to actually run your
code. This is because, for a new programmer, getting
your tools set up is often the most frustrating part
of the experience: once you get “Hello World” or
an equivalent running, you can add small steps to
build to a working design. The tools for working
with assembly are (usually) very arcane, byzantine
and draconian: the tools for the C programming
language aren’t easy, but they’re easy enough for
you to figure out on your own. However, if you have
a friend who has used C before, solicit their help.
If you are in a class, and you’ve gotten C working
before, offer to assist others.
Your first challenge is to download a compiler for
C++ (a C++ compiler will handle C, and we’ll use C++
later: I suggest the GNU G++ compiler) and use it
to run the “Hello World” program. In the previous
chapter, we went over how to program it in assembly.
Luckily, it’s substantially shorter in C. Open up
a text editor, write the following, and save it as
“hello.cpp” (and remember which folder you saved it
in).

#include <stdio.h>
int main(){
printf("%s", "Hello World!\n");
return 0;
}

You will then need to figure out how to run the compiler: it’s different for all of them. For
g++, you will first need to start a terminal/command
prompt (look up how). Then, navigate to the folder
you saved “hello.cpp” in (using the “cd” and “dir”
or “ls” commands). Then (once “hello.cpp” shows
up when you type “dir” (Windows) or “ls” (Linux and
Mac)), type “g++ hello.cpp” (if it doesn’t work, make
sure you installed the compiler and that the folder
containing “g++” is on your path). Finally, “./a”
will run this program (and you should see the text
“Hello World!”).
Once you’ve gotten this working, you should also
figure out how to get your chosen compiler to dump
assembly: most compilers will read in the program
you’ve written (“hello.cpp”) and produce machine
code (“a.exe” or simply “a”). However, many also give you the option to produce assembly instead of
machine code: for “g++”, this is accomplished by
“g++ -S hello.cpp”.
We’ll go over exactly what this code means at the
end: for now, just know that every line except for
“printf("%s", "Hello World!\n");” is just there to
get things set up.

Arithmetic and Variables


C allows you to carve out memory to store values.
In our assembly files, we could be sloppy about our
typing: bytes are bytes, and can hold whatever we
tell them to hold. However, many issues can come
from that: using a number as text will result in
garbage spewing forth.

It seems you owe us ninety bucks.

So, C requires that you specify the type of thing that memory address will hold. To carve out space,
you write the name of the type, followed by the name
you want to give to the space. The names of the
types baked into C (that we’re going to use) are int
(an integer; 32 bits on a 32 bit x86 system), char
(a byte, or a character of text), bool (a Boolean
value, true or false) and double (a number with a
fractional component, look up IEEE 754 if you’re
curious how this is done). For the name you want to
give, stick to letters and numbers (and don’t start
with a number).
As an example,

#include <stdio.h>
int aGlobalIntegerVariable;
int main(){
return 0;
}

will create one variable: an int variable called “aGlobalIntegerVariable”. On x86, this variable is
probably created by attaching a label to a “.long”
value: the C standard actually gives compiler
writers a lot of leeway, but regardless of how it’s
done, the same thing is accomplished.

“main” tells the compiler where to start running
code. If we want to do arithmetic (for instance,
adding up a debtor’s balance), we can use some
operators baked into the language (+-*/ for
numerics, &|! for Booleans, and > < >= <= == != for
comparisons). We can then calculate and store the
balance with the following.

#include <stdio.h>
int balance;
int main(){
balance = -250 - 50 + 275;
return 0;
}

Variables can also be used in computations (and they can be initialized to a specific value). So,
for the balance of a debtor with transactions stored
in variables, we might write:

#include <stdio.h>
int balance;
int transaction1 = -250;
int transaction2 = -50;
int transaction3 = 275;
int main(){
balance = transaction1 + transaction2;
balance = balance + transaction3;
return 0;
}

The compiler will take care of moving data to (“movl %ebx, (%eax)”) and from
(“movl (%eax), %ebx”) memory, as well as the
arithmetic (“addl %ebx, %ecx”).

Conditions and Subroutines


C calls its subroutines “functions”, and they
are defined by specifying the name (for example,
“angleAileron”), the data the function needs (the
arguments, which will be put on the stack), and the type of data the function returns (“angleAileron”
doesn’t report anything, so a special marker, void,
would be used). So, if you wanted to define an angle
aileron function, you would start by writing:

void angleAileron(int numberOfDegrees){

This marks the start (which is probably given a
label) of a subroutine called “angleAileron”, which
needs one piece of information (“numberOfDegrees”,
which is an int), and does things but does not report
a value (void). The actual code of “angleAileron”
would go between the opening and closing braces (“{”
and “}”, respectively). Since we never went over
that code, that code probably requires working in
assembly (you can embed assembly in C) and it depends
on the system in question, we’ll use another problem
for the working example: deciding whether to eject.
C allows you to wrap code in a condition: it will
only run that code if a given flag value is not zero.
This is called an if statement: our eject test could
be done as:

//You can just say a subroutine exists
//and provide the code elsewhere.
//This is useful for organization.
void eject();

void testEject(
int planeTemperature,
bool pilotWantsEject
){
if((planeTemperature > 200) || pilotWantsEject){
eject();
}
}

This creates a function called “testEject”
that takes two arguments: an integer (the
“planeTemperature”) and a bool (whether the
“pilotWantsEject”). The way an if statement works is
that, if the expression in the parenthesis evaluates
to zero, skip the code. In this example, if the
plane temperature is less than (or equal to) 200, and
the pilot does not want to eject, the call to “eject”
is skipped (“movl $ifend, %eax”, “jz *%eax”).

Pointers and Loops


Everything in C has a type: this includes the
pointers. A pointer’s type is made by taking the
type of the thing it’s pointing to (the type of
thing that lives at that address), and then adding
an asterisk. So, the address of an integer would be
int*, while a pointer to a number with a fractional
part would be double*. There is also a type for an
address with no type called void*: this is useful
for writing a few utilities that do the same thing
regardless of what they’re given (like “malloc” and
“free”).
So, if we wanted to do “Hello World!” one letter
at a time, we might start with:

#include <stdio.h>
int main(){
const char* toPrint = "Hello World!";
//TODO
}

The stuff on the right (the stuff in quotes) carves
out space in memory to hold the text (with a zero
at the end), and gives us the address of the first
letter. The stuff on the left carves out space on
the stack for a variable (called “toPrint”) that
holds the address of a character (the const tells the
compiler to not let us change anything).
C allows us to repeat code while a value is not
zero using a while loop. So, if we wanted to call
“putchar” until we hit the zero, we might write:

#include <stdio.h>
int main(){
const char* toPrint = "Hello World!";
int curNumber = 0;
char curLetter = *(toPrint + curNumber);
while(curLetter){
putchar(curLetter);
curNumber = curNumber + 1;
curLetter = *(toPrint + curNumber);
}
return 0;
}

This performs some setup before the test (getting
the string to print, and getting the first letter).
Then, the test is simply whether the current letter
is not zero: if it is not, it calls “putchar”, then
moves to the next letter.
Since a pointer is an address, and an address is
just a number, it’s possible to add to it before
getting the value. So, “*(toPrint + curNumber)”
takes the address in toPrint and adds to it; the
asterisk then gets the value at that address. So,
this might compile to (if toPrint is in eax and
curNumber in ebx):
movl %eax, %ecx
addl %ebx, %ecx
movl (%ecx), %ecx
We can also do pointer arithmetic to assign values
to memory. So, our Caesar shift cypher might be
written as:

#include <stdio.h>
#include <stdlib.h>
char* caesarShift(
const char* plainText,
int numChars,
int rotation
){
char* cypherText = (char*)malloc(numChars+1);
int curNumber = 0;
while( *(plainText + curNumber) ){
char curChar = *(plainText + curNumber);
char rotated = curChar + rotation;
*(cypherText + curNumber) = rotated;
curNumber = curNumber + 1;
}
*(cypherText + numChars) = 0;
return cypherText;
}
int main(){
const char* message = "Hello World!";
char* garbled = caesarShift(message, 12, 2);
printf("%s", message);
printf("%s", "\nbecame\n");
printf("%s", garbled);
free(garbled);
return 0;
}

This creates a subroutine (“caesarShift”) that
creates storage (“malloc(numChars+1)”), runs through
the letters of “plainText”, and rotates and stores
them. It also adds an ending (setting the last
character to zero) and produces the value (“return”
stores the result so that whatever called the
subroutine can use it).
The main program uses this subroutine to print a
message and its garbled counterpart.

Hello World
#include <stdio.h>
int main(){
printf("%s", "Hello World!\n");
return 0;
}

Back to “Hello World”. This creates a function
called “main”: the compiler (...kinda) adds code
to call this function when the program starts. This
function produces an int: by convention, the main
function reports whether there was an error (the
convention was established before the bool type
was added). By returning zero, we are signaling
that there was no error. The line with “printf”
is calling a subroutine (called “printf”), which
apparently takes two char* and does something with
them (the first text string is a format string, and
the second specifies what should be printed to the
standard output).
So, there’s only one line left to cover. The line
beginning with the hash tag means...

Challenges
1) Get a compiler installed on your system, and run
“Hello World”.

2) In C, a char is one byte: other types are larger.
C provides a mechanism to get the size of other
types, namely, sizeof(...). So, the size of a bool
is “sizeof(bool)”. Write code that gets space for 7
integers, and then sets those integers to the square
of their index (the first number should be zero, the
second one, the third four, the fourth nine, etc...).

3) “printf” is a fairly weird function: it can take
a variable number of arguments. Look up the API for
“printf”, and explain how you might pack a variable
number of arguments onto a stack.

4) C has other flavors of loop. Look up the for
loop, and explain how you could accomplish the same
thing with a while loop.

Chapter 10

How to program in an
organized manner

In the previous chapter, we went over the basics of
C. The goal of this chapter is to organize our code.

Linking
The C compiler reads one file and turns the
statements inside it into assembly (and/or machine
code). If there are subroutines, it will find a
space to put them, and the same goes for variables.
However, if you look carefully at the “Hello World”
example, you might notice something: we use a
subroutine called “printf”, but we never actually
provide code for “printf”. “printf” is a fairly
complicated subroutine, so the code has to be
provided somewhere, yet our code doesn’t have it.
However, the C compiler you installed comes with
a whole collection of library subroutines: in most
cases (including the GNU compiler), this library is
pre-compiled. In order to compile and collect a C
file, the GNU compiler works in three passes. The
second (yes, second) pass produces an object file:
it takes all the information in the starting file
and produces assembly (with a few missing pieces of
information): this is the actual compile step. The
way C is designed, the only information needed about
things in other files is the address they wind up at:
so, the third pass collects multiple object files and
fills in all of the missing addresses (linking).

Fill in the blanks.

This leaves one last issue. In order for the
compiler to generate code to call a subroutine, it
must know a few things about that subroutine: its
name, the number and types of arguments it takes,
and (to catch errors early) the type of value it
returns. The C compiler must get this information
from somewhere, and while it could load in every
file and compile all at once, there are good reasons
not to (namely, efficiency and organization).
However, C allows you to specify that a subroutine
exists without saying how it works (add a semicolon
instead of braces and code). So, to provide that
information, we could write (at the top of the file)
declarations for functions we’re going to use but are
defined elsewhere.

int printf(const char* format, ...);

int main(){
printf("%s", "Hello World!\n");
return 0;
}

This could work, but some functions get used
everywhere (“printf” for example). It’s very easy
to mess something up copying this text that many
times, and if something ever changes, God help you.
This is where the first pass comes in. C uses a text
preprocessor as its first step. This preprocessor
scans a C file for directives (things beginning with
a hash tag) and follows their instructions to produce
new text. In this case, “#include <stdio.h>” will
go find the file “stdio.h” (in the standard library
location) and copy its contents.

Fill in the other blanks.

The file produced by the preprocessor is then fed
to the next pass.

Fill in all the blanks.

You can also make use of this to split huge
programs up into manageable pieces: put your
function declarations into one or more header (“.h”)
files, and you can split the code into as many source
(“.cpp”) files as you see fit. Just remember, your
own header files should be referenced inside quotes
(#include “yourlib.h”) and standard library headers
should be in angle brackets (#include <stdlib.h>).

Structs
Up to now, we’ve been dealing with base types (char,
int, double) and pointers to those base types (char*,
int*, double*). This is fine if our program only
deals with simple concepts. However, suppose we
wanted to write a program for a self-driving car.

The future is now.

Our code needs to keep track of position,
orientation, speed, the shape of the road, the
locations of other drivers, the amount of gas we
have, the current gear, and stuff ad nauseam.

Until you are sick of it.

Some of our subroutines need to look at the
full state of the vehicle. If we try to write the
declaration, we’re going to wind up with something
stupid.

void planRoute(
double* location,
double* rotation,
int gasLevel,
double** otherDrivers,
double** roadPoints,
int** roadTris,
char* pleaseNoMore
);

This is ridiculous. These variables all belong
to the same concept (the state of the car), so
it’d be nice if we could treat them as one entity.
Fortunately, C provides a mechanism to do this,
called structures.
If you want to declare a variable that aggregates a
large number of items, you can start by declaring
a structure type (probably in a header file).

typedef struct{
double x;
double y;
double z;
double orientation;
int gasLevel;
bool killAllHumans;
} CarState;

This creates a new type, which can be used anywhere
you’d use a base type. You can use it as a global
variable, function arguments and returns, and inside
other structure types. Finally, you can have a
pointer to this type (“CarState*”).
A variable in C has some address it lives at, and
some number of bytes it occupies. For an aggregate
type (like a structure), the address is the address
of the first thing, and the number of bytes is just
the total of the sizes of its contents (plus some
padding if the processor requires alignment).

On 32 bit x86, a double is 8 bytes, an int is 4, and
a bool... let’s say 4.

If you want to access the contents of a structure,
you use a dot followed by the name of the thing you
want to access. So, if we don’t want our car to
randomly kill people, we might write:

#include "cardecs.h"
#include <stdio.h>
int main(){
CarState mycar;
mycar.killAllHumans = false;
if(mycar.killAllHumans){
printf("%s", "Make your time.");
}
return 0;
}

Passing a structure as an argument to a subroutine
is an expensive operation (everything must be
copied): a better option is to pass a pointer. If
you have a pointer, instead of a dot, you use “->”.

#include "cardecs.h"
#include <stdio.h>
void moveCar(CarState* toMove){
toMove->x = toMove->x + 0.5;
}
int main(){
CarState mycar;
mycar.x = 0.0;
CarState* mycarptr = &mycar;
moveCar(mycarptr);
return 0;
}

Function Pointer Polymorphism


Suppose you are writing a video game.

You’re all DOOMED!

This video game has to keep track of multiple
entities. It needs to track the player, any enemies,
power-ups and the level itself. All of these
entities need to periodically update their state
(where they are, who they’re trying to kill) every
so often.

Should I run from the guy with the chainsaw?

In addition, they need to draw to the screen every
60th of a second.

Happy trees.

However, the behavior of the player (controlled by
mouse and keyboard) is very different from that of
the enemies (controlled by AI), which is different
from the power-ups (spin around and look pretty).
The way they draw to the screen is also different.
We’re probably storing the data for each of these
entities in separate structures.

typedef struct{
char stuff;
} Enemy;
typedef struct{
int moreStuff;
} Player;
typedef struct{
double yetMoreStuff;
} PowerUp;

As it stands, our main program will need to run
through each type of entity separately.

Enemy** allEnemies;
Player** allPlayers;
PowerUp** allPowers;
void updateState(){
Enemy** curEnemy = allEnemies;
while(*curEnemy){
Enemy* toUpdate = *curEnemy;
enemyUpdateFunc(toUpdate);
curEnemy = curEnemy + 1;
}
Player** curPlayer = allPlayers;
while(*curPlayer){
Player* toUpdate = *curPlayer;
playerUpdateFunc(toUpdate);
curPlayer = curPlayer + 1;
}
PowerUp** curPower = allPowers;
while(*curPower){
PowerUp* toUpdate = *curPower;
powerUpdateFunc(toUpdate);
curPower = curPower + 1;
}
}

Each of those loops looks very similar: there
should be some way to simplify this so that we can
write one loop that handles everything (especially
if we want to have a large number of things in the
game without writing a large number of loops). It’s
worth remembering that the code itself is also in
memory, and has an address. C will let you get and
use that address (this is called a function pointer),
so we can simplify the above code. Each subroutine
we’re calling has the same signature, so to each
structure we might add an extra variable that stores
the subroutine.

typedef struct{
void* myupdatefunc;
char stuff;
} Enemy;
typedef struct{
void* myupdatefunc;
int moreStuff;
} Player;
typedef struct{
void* myupdatefunc;
double yetMoreStuff;
} PowerUp;

Since each structure contains its own update
function, we only need to write one loop to update
(which uses the function pointer provided by the
structure). If you want to do this the right
way, the syntax in C is horrible (a strange mix of
function pointers, structures and unions). However,
there was an extension to C that fixes this very
problem.

Have Some Class


C++ was created in 1983 by Bjarne Stroustrup, and is
essentially C with classes. It is an extension to C,
so any valid C program is a valid C++ program (that’s
why I had you use the C++ compiler instead of a pure
C one). In addition, it provides access to object
oriented constructs (classes).
If we wanted to write code for this game, we might
start by defining an abstract base class (in a header
file).

class GameObject{
public:
virtual void myUpdateFunc() = 0;
};

This defines a type (called GameObject) that has
one method (a subroutine whose address it keeps).
However, this type does not provide this method:
we’re only saying what any GameObject can do, not

148
what a specific GameObject can do. The keywords
“public” and “virtual” can, for now, be ignored (just
include them in your code).
The actual things in the game (Player, Enemy,
PowerUp) can be declared (in a header) as:

class Enemy : public GameObject{
public:
virtual void myUpdateFunc();
char stuff;
};
class Player : public GameObject{
public:
virtual void myUpdateFunc();
int moreStuff;
};
class PowerUp : public GameObject{
public:
virtual void myUpdateFunc();
double yetMoreStuff;
};

And the methods could be defined in a source file
(“.cpp”).

void Enemy::myUpdateFunc(){
stuff = 'K';
}
void Player::myUpdateFunc(){
moreStuff = getJoystick();
}
void PowerUp::myUpdateFunc(){
yetMoreStuff = yetMoreStuff + 0.1;
}

This creates, essentially, a number of structure
types. Each structure starts with a single pointer:
that pointer holds the address of a list of function
pointers (the methods that structure carries around
with it, the vtable).

Indirection and abstraction are the cause of (and
solution to) all the problems in computer science.

These lists all have some shared structure (in this
case, the first thing of each points to a subroutine
that takes no arguments called myUpdateFunc), which
is due to the fact that they all inherit that shared
structure from GameObject.
So our single loop can be written as:

GameObject** allItems;
void updateState(){
GameObject** curItem = allItems;
while(*curItem){
GameObject* toUpdate = *curItem;
toUpdate->myUpdateFunc();
curItem = curItem + 1;
}
}

The call to the method (“toUpdate->myUpdateFunc();”)
gets the GameObject’s vtable (the first thing in the
GameObject), then gets the first (and only) method
in the vtable, and calls it (using toUpdate as an
implicit argument). Each item has a different entry
in its vtable (“Enemy” stores the subroutine that
sets “stuff” to the letter K, while “Player” stores
the one that reads the joystick), so each item will
run the appropriate subroutine.

Challenges
1) It’s also possible to declare a variable without
defining it (i.e. carving out memory for it).
Look up the “extern” keyword, and try to figure
out the difference between “int curCount;” and
“extern int curCount;”.

2) There are other directives for the preprocessor.
Look up a few and describe when you might use them.

3) I skipped going over how to build the single loop
in pure C because the syntax is terrible. Look up
the syntax for function pointers and unions, and see
if you can work it out for yourself.

4) When you create a structure in C, the compiler
does not fill in any data (whatever garbage was in
memory is what you get). Look up how you can set
up the variables of a structure when you carve out
memory for it (i.e. when you construct it). Look up
how to do the same thing for a class in C++.

Chapter 11

How to program a
compiler

In the previous chapter, we went over code
organization. The goal of this chapter is to go over
the basics of interpreters and compilers.

Compiler Overview
So far, we’ve been assuming somebody else has written
a compiler for us. For most systems, for most
people, this is a good assumption. However, far too
many programmers take this as an excuse to not learn
about it, claiming compiler writing is black magic.
In actuality, if you know how to program, writing a
functional compiler is straightforward (if tedious):
writing an optimizing compiler IS black magic, but I
won’t be covering that here.
A compiler typically runs in several passes:
lexing, parsing, code generation, assembly and
linking.

Black magic is for offense, and I find this
offensive.

In order to figure out how these work, an example
may be in order. Writing a compiler for the C
programming language is a poor choice for a simple
example: there are simply too many moving parts.
So, we’re going to use a simple language. This
example language works with an explicit stack. If we
see a number, we will push its value onto the stack.

A very pushy program.

If we see a plus symbol, we will take the top two
items off the stack and push their sum.

pop pop push.

And we will ignore whitespace (space, tab and
newline characters). As an example, “5 6 +” would
produce 11, and “56 2 2 + +” would produce 60.

Lexing
The first pass in a compiler is to lex the input. A
program file is stored as a long stream of bytes. We
read in the file one byte at a time. However, while
ASCII characters are single bytes, our numbers might
have more than one digit. We need some way to break
the file into usable pieces and figure out what the
pieces are (in this case, the pieces can be numbers
and the plus sign).
So, regardless of how we got our program into
memory:

const char* ourProg = "56 2 2 + +";

the first thing to do might be to write a subroutine
that finds the next relevant token. For a full
language, a parser generator might be a good idea.
However, since we have a simple language, we can
write our own. One subroutine will skip over
whitespace.

bool isWhitespace(char toTest){
bool isTab = (toTest == 9);
bool isNewline = (toTest == 10);
bool isSpace = (toTest == 32);
return isTab || isNewline || isSpace;
}
const char* lexSkipWhitespace(const char* theProg){
const char* curStart = theProg;
while(isWhitespace(*curStart)){
curStart = curStart + 1;
}
return curStart;
}

The next subroutine will, given the starting point
of a number, find its end (specifically, the byte
after the end).

bool isDigit(char toTest){
return (toTest >= 48) && (toTest <= 57);
}
const char* lexFindNumberEnd(const char* theProg){
const char* curEnd = theProg;
while(isDigit(*curEnd)){
curEnd = curEnd + 1;
}
return curEnd;
}

Interpreting
The next thing to do is add a parsing phase, where
the relation of each token to everything else is
figured out. Parsing can be a very complicated
affair (recursion is almost always required);
however, this is a simple enough language that we can
parse and produce code at the same time.
Before we produce code, however, we might start
by writing a simpler program: an interpreter. An
interpreter reads in a program and “simulates” the
effects of that program: it can be helpful to write
an interpreter to get a feel for what the code the
compiler produces must do.

It’s easier to develop with an interpreter, but
“a.exe” is faster and easier to copy to a friend’s
computer.

If we’re going to simulate the code, the first
thing to do is get space for a stack (let’s limit it
to 50 items).

int* theStack = (int*)malloc(50*sizeof(int));

Then we set up a loop: while we haven’t hit the
end of the program, we need to keep parsing.

while(*ourProg){
...
}

If we are currently looking at whitespace, we need
to skip to the end.

while(*ourProg){
if(isWhitespace(*ourProg)){
ourProg = lexSkipWhitespace(ourProg);
}
...
}

If we are looking at a plus, we need to do an add
(get the top two values, add, and store) and move to
the next character.

while(*ourProg){
if(isWhitespace(*ourProg)){
ourProg = lexSkipWhitespace(ourProg);
}
else if(*ourProg == 43){
int valA = *theStack;
theStack = theStack - 1;
int valB = *theStack;
*theStack = valA + valB;
ourProg = ourProg + 1;
}
...
}

If we are looking at a digit, we need to find
the end of the number and push. However, we have
our number in ASCII characters but we need it in
binary: our parser needs a parser. Luckily, there
is a standard function to do this (“atoi”), which we
will use.

while(*ourProg){
if(isWhitespace(*ourProg)){
ourProg = lexSkipWhitespace(ourProg);
}
else if(*ourProg == 43){
int valA = *theStack;
theStack = theStack - 1;
int valB = *theStack;
*theStack = valA + valB;
ourProg = ourProg + 1;
}
else if(isDigit(*ourProg)){
int toPush = atoi(ourProg);
ourProg = lexFindNumberEnd(ourProg);
theStack = theStack + 1;
*theStack = toPush;
}
}

And, just like that, we have the core of an
interpreter.

Code Generation
The problem with an interpreter is, if we want to
run our program, we need our interpreter (it also
tends to be slower, since an interpreter has to do a
lot of parsing stuff while running the program). If
we want an actual compiler, we’ll have to generate
assembly code that accomplishes what we want. To
keep it simple, we’ll just print our assembly to the
standard output.
This pass usually has some boilerplate code that
will always be generated: for instance, code to set
up our stack.

printf("%s\n", "movl $malloc, %eax");
printf("%s\n", "movl $200, %ebx");
...

Once we have the address of the top of the stack
in %edx, and we have our program loaded, we can start
over with our loop.

while(*ourProg){
...
}

Again, we just skip whitespace.

while(*ourProg){
if(isWhitespace(*ourProg)){
ourProg = lexSkipWhitespace(ourProg);
}
...
}

However, seeing a plus means that we need to
produce code to get a value from the stack, move the
stack, get another value, add, and put a value on the
stack.

while(*ourProg){
if(isWhitespace(*ourProg)){
ourProg = lexSkipWhitespace(ourProg);
}
else if(*ourProg == 43){
//get value
printf("%s\n", "movl (%edx), %eax");
//move stack
printf("%s\n", "movl $-4, %ebx");
printf("%s\n", "addl %ebx, %edx");
//get value
printf("%s\n", "movl (%edx), %ebx");
//add
printf("%s\n", "addl %ebx, %eax");
//store
printf("%s\n", "movl %eax, (%edx)");
//go to next character
ourProg = ourProg + 1;
}
...
}

And, finally, seeing a number means we have to
write code to get that value and put it on the stack.

while(*ourProg){
if(isWhitespace(*ourProg)){
ourProg = lexSkipWhitespace(ourProg);
}
else if(*ourProg == 43){
printf("%s\n", "movl (%edx), %eax");
printf("%s\n", "movl $-4, %ebx");
printf("%s\n", "addl %ebx, %edx");
printf("%s\n", "movl (%edx), %ebx");
printf("%s\n", "addl %ebx, %eax");
printf("%s\n", "movl %eax, (%edx)");
ourProg = ourProg + 1;
}
else if(isDigit(*ourProg)){
//lex and parse the number
int toPush = atoi(ourProg);
ourProg = lexFindNumberEnd(ourProg);
//move the stack
printf("%s\n", "movl $4, %ebx");
printf("%s\n", "addl %ebx, %edx");
//put the number on the stack
printf("movl $%d, %%eax\n", toPush);
printf("%s\n", "movl %eax, (%edx)");
}
}

When we run our compiler, assembly code (that
performs the commands in our source program) will be
printed to the standard output. If we save the stuff
that was printed to the standard output, we can run
an assembler on it. The resulting machine code can
then be run directly on our processor (no simulation
required).

Challenges
1) The example bakes the text to compile into the
program. See if you can write some code to read in
the text to compile from standard input (you might
look up the “gets” subroutine, or its safer cousin
“fgets”).

2) Code up the interpreter. Once you have it
working, see if you can add the ability to print:
if a ? character (63) is encountered, remove an item
from the stack and print it to the standard output.
Run the following program: “5 6 + ?”.

3) Figure out how to make your compiler produce
assembly instead of an executable. Get the assembly
for “hello.cpp”, and see how subroutines get called
on your system (different systems and compilers have
different calling conventions).

4) Code up the compiler, adding the ability to print
(the ? command). Compile and run “5 6 + ?”. (As
an aside, most compilers can handle linking against
an assembly file if you follow some simple naming
conventions).

Part II

How to computer
science

Chapter 12

How to dope

The goal of this chapter is to go over how computer
chips are actually built.

Disks
The basic idea of a transistor is relatively simple:
mix some group 5 elements into one region of a
silicon crystal, and bracket it by two regions
of silicon doped with group 3 elements. Slap an
insulator over the middle, slap some metal on that,
and attach some wires.

Been a while since we’ve looked at this, hasn’t it?

If you have a bulk source of transistors, you can
then wire them up. Conceptually, this is a great
way to view a computer, and historically computers
actually were built with this “transistor-transistor
logic”. However, it’s a terrible way to actually
build a computer: the resulting circuit will be a
bulky rat’s nest of wires.
Modern processors fit into a one inch square,
and contain an insane number of transistors; to do
this, you need some fairly complicated industrial
processes. The whole process starts by getting some
silicon; since silicon is one of the more common
elements (on earth at least), this should be easy.
However, most of that silicon is bound to oxygen in
silicon dioxide (sand and glass).

Sand, glass, sandy glass.

So, we need a process to turn sand into
silicon: there are several methods, but all of
them involve reducing the silicon (often through
a high-temperature carbothermic process). The
main consideration is purity: pure silicon is an
insulator, but any impurities make it conduct. Our
silicon needs to be as pure as a mountain stream.

Purer, if possible.

Getting it that pure is an exercise in lab
chemistry: as an example, one common technique for
purifying a solid is recrystallization. If you have
a crystal with impurities in it

Unclean, unclean.

you will need to break the crystal to get the
impurities out. Melting a crystal breaks it down to
atoms, and allows the impurities to move around.

De-crystallization

If you slowly recrystallize (and possibly provide a
nearby solvent that the impurities like more than
your crystal), you get a new crystal with fewer
impurities.

Statistically, you never get 100%, but you can get
close enough.

With silicon, one way to do this is by taking
a slab of silicon, heating up a region, and then
walking your heating element down the slab.

The heating element is called phlogiston.

This gives you a liquid-like region that moves
down the crystal: if you do it right, the impurities
should travel with the melted region (which you can
remove from the end).
Anyway, once you get your pure silicon, you can cut
it into disks and pass them to the next steps of the
process.

Photolithography
Building all these transistors in one square inch
requires producing very small transistors: at
the time of writing this, common processors have
transistors 14 nanometers wide (about 60 atoms across).
You aren’t doing that by hand: forget your twitching
muscles, your heart pumping blood makes your hand
twitch (and I doubt you want to stop your heart for
the amount of time you’d need to do this). We need
some way to automate the process; since many of the
individual steps in automating the process will alter
everything they touch, we need some way to limit what
can be touched.
This is achieved by coating the surface of the
disk with a special chemical. This chemical needs
to be sensitive to light: being exposed to light
should make it harden up and stick to the surface.
A lot of polymerization reactions are initiated by
light (specifically, light produces radicals, which
then produce a chain reaction), so those are a fairly
common choice.

Polymethyl methacrylate: aka Plexiglas™.

Now, we just need to expose our disk to light;
however, we need to mask the light (adding a solid
slab of Plexiglas™ to the disk means we can’t
do anything). This is essentially shadow puppet
theatre: you have a light source and your disk, and
you put a mask between the two.

Woof.

Wherever the light hits is protected from whatever
comes next, while everywhere else the photoresist
just washes off.

Guard dog.

Ion Implantation
Now we need to dope (i.e. add impurities to) the
silicon. However, we can’t just melt the silicon and
mix the impurities in: we’ve only got about 60 atoms
to work with, and melting is relatively indiscriminate.
Our solution: indiscriminate gunfire.

Only a slight exaggeration.

If you remember, positively charged things are
drawn towards negative charge, and vice versa. If
we add or remove an electron from an atom (probably
by shocking the hell out of it), we produce ions
that we can accelerate by using a capacitor. If the
far plate is smaller, the ions will overshoot and go
zooming off towards whatever we aim them at.

Generate ions, run them through a voltage difference,
and you have an ion gun.

Since we covered the disk with Plexiglas™, the
ions only embed themselves where we want them to
(technically, they also embed in the Plexiglas™,
but who cares). After we’re through, we’ll need to
remove the Plexiglas™ (either through chemical means,
or grinding).

Chemical Vapor Deposition


Now that we’ve got our doped silicon, we need to slap
some stuff onto it. However, some of that stuff is
a conductive material, namely, metal. Metal doesn’t
slap so good.
Since we don’t want to shatter the disk, we need
some way to gently deposit metal. While you might
try wet-chemistry methods, perhaps the most direct
method is to boil metal in a vacuum. You start by
placing the disk and a slab of metal inside a vacuum
chamber, and remove all the air.

173
This sucks.

Then, you heat up the metal until it boils. The


metal atoms that leave the metal go flying in a
straight line until they hit something.

What’s wrong with this picture?

This will slowly coat whatever’s in the chamber


with a layer of metal, including your disk.

174
Transistors and Wires
With the processes above, it’s possible to build an
integrated circuit on a chip. To do so, you just
need to do a whole bunch of steps. For example, you
might coat, implant, grind, coat, implant, grind,
coat, deposit insulator, grind, coat, deposit metal,
grind, coat, deposit insulator, grind, coat, deposit
metal and grind.
Anyway, once you’ve finished with a disk, you just
need to prepare it for its final application. You
probably need to cut it (it’s more economical to
make many processors at once), add some wires (so
you can hook up your chip to something else) and add
a coating to protect the chip. Finally, you need
to come up with some way to market your processor
(advertising is outside the scope of this book).

175
Challenges
1) There are other methods you can use to purify a
material. Look up a few, and discuss how applicable
they would be to chip manufacture.

2) You can’t make an item with 14nm details by hand.


We’re using a mask to make the chips: how do you
make the mask?

3) A common way of heating metal for chemical vapor


deposition is to use a beam of electrons. Look
up what happens when you hit metal with electrons.
Could that cause any problems?

176
Chapter 13

How to data

The goal of this chapter is to go over how to


organize your data.

Arrays
Up to now, all of our programs have been pretty
direct: we haven’t done much to organize our data,
mostly because it came to us in a usable form.
However, if you write larger programs, or smaller
programs with a tricky setup, it can be helpful
to know a few basic ways to organize your data.
Additionally, it’s possible to write a code library
for these common structures: you can write it once,
do it right, and use that same code over and over
(in fact, many languages give you implementations of
these in their standard library).
The first data structure we’ll go over is the
array: an array is a collection of elements. You
can get an element by an index, or you can set an
element. We’ve seen this before, though I didn’t
give it the name “array”. RAM is an array of bytes,
and if you use C’s malloc to get memory, you’ve
created an array.

177
A Ray that everyone can love.

Most languages provide special syntax for dealing


with arrays: C is no exception (“a[5]” is shorthand
for “*(a+5)”). Most languages also check that you
haven’t asked for something outside the array: if
the array has five items, then asking for the sixth
element (index 5) is a problem.

A lot of hacking exploits start this way.

C doesn’t do this (adding those checks makes the


code run more slowly), but you can roll your own
(especially if you use C++). Running through an

178
array (or things that look like arrays) and doing
stuff to each item is a stupidly common operation.
The following is the basic code for doing so: commit
it to memory.

char* theArray;
int length;
...
int i = 0;
while(i < length){
    char curVal = theArray[i];
    //other stuff
    i = i + 1;
}

This code starts with an offset (i) that starts at


zero (the first item), and then increments until it
runs off the end of the array. This basic idea is
so common, many languages provide some special syntax
for dealing with it: the standard for loop in C is
shorthand for the above, and many languages (MATLAB,
Python, Java) have a for loop that iterates through
a list without an explicit index. However, the above
code works in most languages (unless you’re working
with a weird language: good luck).

It’s all Greek to me.
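For reference, C’s for loop does exactly the same work as the while loop above; a small sketch (the function names and array contents here are just for illustration):

```cpp
#include <cassert>

// Sum an array with the while-loop pattern from the text.
int sumWhile(const int* theArray, int length){
    int total = 0;
    int i = 0;
    while(i < length){
        total = total + theArray[i];
        i = i + 1;
    }
    return total;
}

// The same loop as C's for-loop shorthand: the setup, the
// test and the increment all live on one line.
int sumFor(const int* theArray, int length){
    int total = 0;
    for(int i = 0; i < length; i++){
        total = total + theArray[i];
    }
    return total;
}
```

Both functions walk the same offsets and touch the same items; the for loop just gathers the bookkeeping in one place.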

179
Lists
In addition to arrays, there is a related data
structure called a list. A list can get an element
at an index, or set an element at an index (exactly
like an array). However, you can also add and remove
items from the list (an array has its size fixed when
it’s made); this is useful if you don’t know ahead
of time how much data you will be working with (i.e.
reading in data from a file).
A list can do a lot of things, so most language
designers don’t bake it into their language: they
provide a class to do the job in their standard
library. There is a non-obvious question if we do
this: how do we store the data in the list? The
most natural option is as an array: you have a class
that contains an array, get and set just use the
array’s get and set, and add/remove make a new array
and copy over the data.

Easy get, easy set.

This works well enough (and should be your


default), but what happens if we write code with a
lot of adds? Every time we add, we need to copy over
the contents of the array: that’s a lot of extra
work for what should be a simple operation. A way to
get around this is to use a linked list. In a linked
list, each entry is stored in its own structure,

180
along with a pointer to the next structure.

Easy add, easy remove.

If we want to add an element, we only need to


malloc a new structure and change a pointer, which
is much faster than copying all that data.

We only change 4 things, instead of 13.

However, getting and setting an element is now


more difficult: we need to run through all the
pointers. This is a tradeoff, and also highlights

181
one of the uses of classes. You can have multiple
implementations of a list (i.e. ArrayList and
LinkedList), and if both are subclasses of List,
then you can swap out one for the other with minimal
issue.

List* listA = new ArrayList();


List* listB = new LinkedList();
listA->add(17);
listB->add(17);
int a = listA->get(0);
int b = listB->get(0);
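The List, ArrayList and LinkedList classes above are hypothetical. As a bare-bones sketch of the linked list idea itself, each entry just holds its data and a pointer to the next entry:

```cpp
#include <cassert>
#include <cstddef>

// One entry in a singly linked list: the data, plus a
// pointer to the next entry (NULL at the end of the list).
struct ListNode{
    int data;
    ListNode* next;
};

// Adding at the front only touches one pointer: no copying.
ListNode* addFront(ListNode* head, int value){
    ListNode* entry = new ListNode;
    entry->data = value;
    entry->next = head;
    return entry;
}

// Getting by index means walking the pointers: an O(N) get,
// the price we pay for the cheap add.
int get(ListNode* head, int index){
    while(index > 0){
        head = head->next;
        index = index - 1;
    }
    return head->data;
}
```

Adding is one pointer change; getting walks the chain. That is the tradeoff in miniature.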

Maps
The next item of interest is the associative array
(also known as the Map). The way these work is
you give the map an item, and it will tell you what
is associated with that item. If you’ve ever used
a phone book or building directory, you’ve used a
“map”.
Perhaps the most straightforward way to store a
map is as a pair of lists. As an example, suppose
you’re making a menu: you have the names of the
dishes (the “keys” of this map) in one list, and the
corresponding prices (the “values”) in another.

dishes (key list) prices (value list)


stuffed small intestine $3.50
boiled bovine butt $5.95
tubers $2.50
congealed drippings $1.00
rotten juice $7.00

They all sound tasty.

When a customer tells you what they want, you will


loop through the list of dishes until you find that
name, then get the corresponding price in the list of
prices.
Using parallel lists happens to be one of the
slowest ways to code this up: a faster option will

182
be shown in the next chapter (using trees).
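As a sketch of that lookup loop (using plain arrays in place of a List class, with the menu from above priced in cents):

```cpp
#include <cassert>
#include <cstring>

// The two parallel lists: dishes[i] costs prices[i] cents.
const char* dishes[] = {"stuffed small intestine", "boiled bovine butt",
                        "tubers", "congealed drippings", "rotten juice"};
int prices[] = {350, 595, 250, 100, 700};

// Loop through the keys until we find a match, then return
// the value at the same index: an O(N) lookup.
int priceOf(const char* dish){
    int i = 0;
    while(i < 5){
        if(strcmp(dishes[i], dish) == 0){
            return prices[i];
        }
        i = i + 1;
    }
    return -1; //not on the menu
}
```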

Trees
A tree is a hierarchy: at the top of a business, you
have the CEO, then upper management, then management,
then the peons.

Ironically, peons object to being peed on.

Any time you have a hierarchy, you can describe it


as a tree. The top of the hierarchy (the CEO of a
business, the general of an army, etc...) is called
the root. The things directly under any node are
the children of that node (upper management only got
the job because of nepotism). Finally, nodes without
children are called leaf nodes.

183
Trees aren’t common, but they aren’t exactly rare
either.

The canonical representation of a tree is similar


to that of the linked list: you have a class for
the nodes, and this class stores data and a list of
children.

class Node{
public:
    void* data;
    List childNodes;
};

There are many ways to go through everything in a


tree. However, the right way uses recursion. As an
example, suppose we were hired by the IRS to examine
the employees of a business. We would start with the
CEO and go into his background. Then, we would look
at each of his subordinates in turn, and do the same
thing. This would continue until we look at people
with no underlings.

184
The order we look at these people.

At every level, for every person, we do the same


thing: go through their books, then look at their
subordinates. If we write a function to audit an
employee, it first needs to check that employee.
Then, it needs to go through all the subordinates and
audit each subordinate employee in turn.

void auditEmployee(Node* toAudit){
    checkBackground(toAudit->data);
    List* subordinates = &(toAudit->childNodes);
    int i = 0;
    while(i < subordinates->length()){
        auditEmployee(subordinates->get(i));
        i = i + 1;
    }
}

In assembly, a function is just an address, and


you can jump to that address from anywhere: assembly
doesn’t know that you’re jumping to a function from
that same function. Additionally, we store variables
on a stack, so we won’t overwrite anything when we do
this.
Having a function call itself is called recursion.
There is a lot that can be done with recursion, but
just because you can do something doesn’t mean you
should. If your problem is inherently structured as
a hierarchy (like the above example), recursion is

185
the natural approach. In most other cases, recursion
is probably not the right option.

Graphs
Rounding out the set of data structures is the graph.
Programmers have some weird naming conventions:
their word for dictionary is “Map”, and their
word for map is “Graph” (not to be confused with a
depiction of data).
A graph is a collection of nodes, and connections
between them. The simplest example is a road map:
you have cities (nodes) and roads between the cities
(connections).

How to travel between important Texas cities, and


Lubbock.

There are a couple of ways you could go about


representing this: one of the most common is a
Node/Link structure, where each node in the graph is
its own object, and contains its links.

class Node{
public:
    void* data;
    List linkedNodes;
};

186
The most common way to loop through all nodes is
something called breadth first search: you start
at a node, and note where its links point (ignoring
nodes you’ve already been through). This can be done
with two lists: one for the nodes that have been
spotted but not examined, and one for the nodes that
have been examined.

List openNodes; //start by adding the starting node here
List closedNodes;
while(openNodes.length() > 0){
    Node* curNode = openNodes.get(0);
    openNodes.remove(0);
    closedNodes.add(curNode, closedNodes.length());
    //do something with the current node
    List* links = &(curNode->linkedNodes);
    int i = 0;
    while(i < links->length()){
        Node* linkNode = links->get(i);
        bool linkHandled = contains(&closedNodes, linkNode);
        bool linkSpotted = contains(&openNodes, linkNode);
        if( !(linkHandled || linkSpotted) ){
            openNodes.add(linkNode, openNodes.length());
        }
        i = i + 1;
    }
}

While there are still nodes you have seen but not
examined (“while(openNodes.length() > 0)”), get one of
those nodes, do something with it, and run through
its links (you REALLY need to memorize how to loop
through a list). If those linked nodes have not
already been seen (“!(linkHandled || linkSpotted)”),
add them to the list of nodes to handle.
We need all this extra code to make sure we only
hit each node once: in the worst case, it’s possible
for us to get stuck forever in a loop.

187
There are three loops in this graph that we could get
stuck in.

Exactly why you might want to loop through the


nodes in a graph is covered in the next chapter.

188
Challenges
1) Your preferred language probably provides a
standard implementation of a list and a map. Look
them up, and try to determine how they work (wrapper
around pointer/array, linked list, or something
else). If your language has a tree and graph type,
examine those as well.

2) See if you can write a list class in C++. It must


be able to get, set, add at a location and remove
from a location.

3) Your list class probably allows the user to add


any type: this can cause a headache if other code
expects a list to contain one type, but has another.
Look up templates in C++, and see if you can change
your list class to limit the possible types.

4) Technically, you don’t need to use recursion to


run through a tree: you can do it with a loop and
a list. Write two pieces of code to run through a
tree, one that uses recursion, and one with a loop.
Which do you prefer?

189
190
Chapter 14

How to algorithm

In the previous chapter (which you should read before


this one), we went over common data structures. The
goal of this chapter is to cover program analysis and
algorithm selection.

Sorting
We now have a few canned answers to the question of
how we store data. It turns out there are a few
canned answers to how we can manipulate the data;
these canned answers show up everywhere (in some form
or another), so it’s worth at least knowing about
them (especially since many programming languages
provide implementations for you).
One of the most useful is rearranging a list of
items into a preferred order (i.e. sorting a list).
For instance, let’s say we’re going into the used car
business.

Stops great. Doesn’t run well, but stops great.

When a customer comes in, we want to start with the

191
most expensive car on the lot; that requires knowing
which car is the most expensive. If our program has
a list of cars on the lot, we need to sort that list:
we can’t just find the largest, since the customer
probably won’t buy the first thing we show them.
There are many ways to go about this: one of my
favorites is called “bubble sort”. In bubble sort,
you run through the list seeing if everything is in
order: if not, you swap the two items that are out
of order and run through the list again. You keep
running through the list until it’s sorted.

List toSort;
bool loopAgain = true;
while(loopAgain){
    loopAgain = false;
    int i = 1;
    while(i < toSort.size()){
        Car* carA = toSort.get(i-1);
        Car* carB = toSort.get(i);
        if(carA->cost < carB->cost){
            toSort.set(i-1, carB);
            toSort.set(i, carA);
            loopAgain = true;
        }
        i = i + 1;
    }
}

There’s a fair amount going on here: the first


thing to note is the flag variable (loopAgain). This
flag controls whether we keep looping. It starts off
true (we have to loop at least once). Every loop,
we start by setting it to false: if we don’t make
a change, we can stop looping, but if we do make a
change, we have to loop again.
We test if we need to make a change by comparing
two cars: if they’re out of order, we swap them.

192
Bubblesort, parking lot edition.

Also, since each test looks at two cars, we start


our loop through the list from index one instead of
zero (there is no car negative one).
Anyway, once this loop stops running, we have a
list of cars that starts with the most expensive, and
gets progressively less expensive.

Violence discount?

193
Asymptotic Dominance
We have a canned subroutine (an algorithm) called
bubble sort. An important question to ask is how
long it takes to run: we’d like to get an answer
while the customer is in our office.

Any time now.

However, an exact answer to this question depends


on too many factors (the computer we’re using, the
operating system we’re using, how many programs we
have open, the time of day, whether the computer
gremlins are leaving you alone, etc...) to be
useful. A more useful approach is to gauge the
behavior of this program as we increase the size of
the inputs: for instance, if it takes 20 seconds for
our program to run on 100 cars, how long will it take
to run on 200 cars?
This is not as simple as it sounds: your natural
instinct might be to say twice the amount of input
takes twice as long to process (“around 40 seconds”),
and your natural instinct might be wrong (it’s closer
to four times as long, 80 seconds). Figuring this
out is sometimes called “Big O” analysis.
The basic idea is to figure out, if your list has
N entries, roughly how many things your program has
to do. In the case of bubble sort, if we have N cars
to examine, we have to examine N cars per loop (a
loop through a list adds a factor of N). In order
to get the big O, we have to know how many times we
loop through the list (how many loops until the while
loop stops). In the worst case, we need to run the

194
while loop N times: if the most expensive car is at
the end of the list, we have to run N times for it to
migrate to the start of the list.

This could take a while.

So, in the worst case, we have to run a loop


through N items N times; we have to do O(N²) things.
The exact count of things doesn’t matter (since the
exact time it takes to do each thing is unknown), we
only care that the behavior is N∗N. In particular,
if we double N, we quadruple the running time
((2N)∗(2N) = 4(N∗N)), so instead of 200 cars taking 40
seconds, it takes 80.
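If you don’t trust the math, you can count. This sketch (the function name is mine) runs the bubble sort from above on a worst-case, reverse-ordered list and counts comparisons: doubling the input roughly quadruples the count (39800 is about four times 9900).

```cpp
#include <cassert>

// Bubble sort a reverse-ordered array of n items, and
// return how many comparisons the sort performed.
long bubbleComparisons(int n){
    int* vals = new int[n];
    for(int j = 0; j < n; j++){
        vals[j] = n - j; //worst case: everything backwards
    }
    long comparisons = 0;
    bool loopAgain = true;
    while(loopAgain){
        loopAgain = false;
        int i = 1;
        while(i < n){
            comparisons = comparisons + 1;
            if(vals[i-1] > vals[i]){
                int tmp = vals[i-1];
                vals[i-1] = vals[i];
                vals[i] = tmp;
                loopAgain = true;
            }
            i = i + 1;
        }
    }
    delete[] vals;
    return comparisons;
}
```

On a fully reversed list, this does n passes of n−1 comparisons each: n(n−1) in total, which is the N∗N behavior.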

Better Sorting
Normally, for an algorithm that has to touch
everything, the best you can possibly do is O(N)
(doubling input doubles time). For sorting N items
by comparison, we can’t quite get there, but we can
do better than O(N²). One possible option is called
“mergesort”.
Mergesort hinges on two facts: first, merging two
sorted lists is an O(N) operation. If you have two
lists of cars, each with their most expensive one
first, you only have to look at two cars to find the
most expensive.

195
Which is bigger: 8000 or 7000?

Which means you only have to look at 2N cars to merge


the two lists into a new sorted list.
Second, a list containing one item is sorted: if
you only have one car in the list, the most expensive
car is the first (and only) thing in the list.
Mergesort starts by splitting the list into N lists
(each with one car).

Six lists of one car

Then, it takes pairs of lists (each with one car) and


merges them (each with two cars).

196
Become three lists of two.

It then repeats this until it has a single list.


We know each merge pass does O(N) things (each
pass looks at all the cars, even though they’re in
different lists). If we want to get the big O, we
need to know how many merge passes we need to do,
which means we need to know how many times a list
of N items can be split in half. In this case,
the answer is log₂(N); in general, a (balanced) tree
of items has log(N) levels. So, running an O(N)
operation log₂(N) times means the big O is O(N log₂(N)),
which is close to O(N) (and much better than O(N²)).
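A sketch of the merge step itself, using plain arrays (the car prices are invented): each step looks at just two candidates, so merging all the items is one O(N) pass.

```cpp
#include <cassert>

// Merge two arrays, each sorted biggest-first, into out.
// Each step compares just the front of each array.
void merge(const int* a, int na, const int* b, int nb, int* out){
    int ia = 0;
    int ib = 0;
    int io = 0;
    while(ia < na && ib < nb){
        if(a[ia] >= b[ib]){
            out[io] = a[ia];
            ia = ia + 1;
        }else{
            out[io] = b[ib];
            ib = ib + 1;
        }
        io = io + 1;
    }
    //one of the arrays ran dry: copy over whatever is left
    while(ia < na){ out[io] = a[ia]; ia = ia + 1; io = io + 1; }
    while(ib < nb){ out[io] = b[ib]; ib = ib + 1; io = io + 1; }
}
```

Mergesort is just this merge, applied to ever-larger pairs of lists until only one remains.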

Better Maps Through Trees


Currently, our associative arrays are two parallel
lists: if we want to find something, we need to
search through the list of keys, which is an O(N)
operation. Unlike sorting, finding a key does not
have to look at everything in the list, so we might
be able to do better than O(N). We can do this by
using a tree (in O(log₂(N)) time).
For instance, suppose we want a map that goes from
names to contact information (something like a phone
book, a Rolodex™ or a DNS server).

197
Anybody still use these things?

When you build the map, you can sort the list of
names (making the map will be at least O(N), we’re
trying to make searching faster). Then, you can
build a tree: take the middle-most name (probably
beginning with M or N), and put every name that comes
before it on one side, and every name that comes
after on the other.

An unlikely collection of contacts.

198
If you then repeat this process at each level, you
have a tree of names (and contact info).

Numbers withheld for security reasons.

Now we get to searching. Instead of looking


through all the names (O(N)), we can start at the
root of the tree, and ask if the name we’re searching
for comes before or after the name of the root. If
it comes before, we’ve cut our options in half (if
it’s in the set of keys, it must be under the left
leg). This can be repeated for each node until the
key is found (or not found, if the key is not in the
tree).

199
Quark is strange.

To do this, we only have to search through the


levels in the tree (O(log₂(N))), rather than all the
keys (O(N)). Searching through a sorted tree is
called “binary search”.
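The same halving trick works directly on a sorted array; a sketch of binary search (the names are placeholders):

```cpp
#include <cassert>
#include <cstring>

// Binary search a sorted array of names: each comparison
// cuts the remaining options in half, so this is O(log2(N)).
// Returns the index of the name, or -1 if it's absent.
int findName(const char** names, int count, const char* target){
    int lo = 0;
    int hi = count; //the search range is [lo, hi)
    while(lo < hi){
        int mid = (lo + hi) / 2;
        int cmp = strcmp(target, names[mid]);
        if(cmp == 0){
            return mid;
        }else if(cmp < 0){
            hi = mid; //target comes before: look left
        }else{
            lo = mid + 1; //target comes after: look right
        }
    }
    return -1;
}
```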

Dijkstra
Let’s say you have a road map (a graph), and you want
to know the shortest route from Cheyenne Mountain to
Roswell.

200
For no particular reason.

This can be done using Dijkstra’s algorithm: a


variant of breadth first search. You start with the
starting point (Cheyenne Mountain) in the open list.

We start at Cheyenne Mountain, so we go zero miles to


get there.

You then note which places you can go to from


Cheyenne Mountain, keeping track of the travel
distance and where you came from.

201
From Cheyenne Mountain, our direct options are Pueblo
and Alamosa.

Then, you pick the town with the lowest travel


distance: you look at where you can go from that
town. For any given destination, if it’s been
handled already, we’ve already seen a shorter route
to get there, so do nothing. If it’s been seen but
not handled, we need to see if going there from the
current town is faster than the route we already
know: if so, we change the route at that node.

202
Going to Pueblo, then Alamosa, is shorter than just
going to Alamosa (the actual roads curve).

We keep repeating this until our target (Roswell)


has been handled: at that point, we have the
shortest route.

There’s no point in looking at Alamogordo, because we


already know the fastest way to Roswell.

Once we have the route, all we need to do is follow


it.

203
Nothing to see here. Move along.

204
Challenges
1) There are many sorting algorithms. Look up a
few, and pick one to code up (my absolute favorite
is bogosort).

2) There is another way to speed up searching through


a map, which produces an O(1) lookup. Look up the
concept of hashing (and a map based on a hash table),
and see if you can write a hash function for a string
of text.

3) In video games and simulations, spatial


partitioning trees are used to speed up calculations
by ignoring things that are too far apart. Do
some research, and compare and contrast quad-trees,
kd-trees and bsp trees.

4) Code up Dijkstra’s algorithm. What’s the big O


of your implementation?

205
206
Chapter 15

How to spin rust

The goal of this chapter is to handle a file system.

Non-volatile Storage
Inside the processor itself, we use flip-flops to
implement registers. While bulk memory could be
built this way, most modern processors use a cheaper
option (DRAM). However, both options are ephemeral:
there is no such thing as a perfect insulator,
so any electrical system you build is going to
leak. Consequently, any system that stores data in
electrical voltage is going to lose that data if the
power goes out. How quickly it leaks varies, but it
will eventually fade away.
This is a real issue if any of your data is
important.

207
“OperatorManual.txt” not found.

Especially when you remember that, to a computer,


programs also count as data.

“SCRAM.exe” not found.

So, we need some way to store data while the


power is out. Most options use magnets (with the
notable exception of flash memory). However, the
processor runs with a clock speed somewhere near 1
billion cycles per second: your options for
storage that does not fade are all comparatively
slow. Consequently, as a programmer, you typically

208
want to load data into memory before actually doing
anything. This is also why we don’t just use the
long term storage as main memory.

Sectors and Tracks


Flash memory is accessed in much the same way as RAM,
and tapes are accessed sequentially.

Duct tape is the preferred solution, but magnetic


tape works in a pinch.

However, the most enduring design is a magnetic


material attached to a solid disk, spinning at some
impressive speed. All manner of floppy disks and,
until recently, hard drives used this. As well,
optical media (compact disks, DVDs and the like)
are structured in a similar manner. On each disk,
the first division is the track: if you’re spinning
the disk, it’s natural to leave the read head in one
location while the disk spins under it.

209
Pick your track by moving the read head.

A full track of the disk is a pretty substantial


amount of data, so it’s typically divided into
sectors.

Wait until the wheel of fortune brings you your


sector.

So, if you want to get some data from a hard disk,


you specify which track and sector you want. The
disk would then move the read head to the track, and
then wait until the requested sector was under it.

210
Naming Conventions and Inodes
So, we have a bunch of tracks, which contain a bunch
of sectors, which contain a bunch of data. We’ve
filled all that storage, shut down the computer, and
then turned it back on. Your computer sees a large
amount of data, but unless you left some breadcrumbs
to follow when you return, it will just look like
noise.

No secret messages here.

You need some way to figure out what you have


on the hard drive when you boot back up. The
most common approach is to store some structure
information as a tree of “directories” (i.e.
folders). The root folder of the hard drive is
stored at a known location (possibly the first sector
of the first track) and contains some information
about the contents of the disk. First, it might
have information on other folders: for each folder,
you need to know the track and sector where it is
located. It might also have information on files:
at minimum, you need the track and sector where the
file starts.

211
Each sector holds a large number (4096) of bytes.

If we want a user of this machine to understand


what’s going on (and specify what they want done),
we also need to attach names to each of these items.
Typically, the root folder has a special name (“C:\”
on Windows, the empty string on *nix). Then, each
step down the tree is given a name: for example,
“/usr/abe/blackmail/special.xlsx” starts at the root
(“”), then goes to the “usr” folder, then the “abe”
folder, then the “blackmail” folder, and finally the
file named “special.xlsx”.

212
Weird blackmail is always the most interesting.

If the user gives the operating system this name,


they are telling the operating system directions to
the file they want.

If you don’t cache data, you have to do this for


every file.

Streams
There’s a whole lot to keep track of when playing
with file systems. For instance, many file systems

213
allow you to split the data of a file into multiple
locations (put the first 4k here, and the last 12k
there). This allows you to make full use of the
hard drive: if you don’t allow this, a few small
files placed in key locations can prevent people from
writing an intact large file.

You can’t save a 16k file, because you can’t find


four consecutive blocks.

But if you allow files to be split into pieces,


that doesn’t matter (unless speed is an issue).

214
If you can split a file, the only thing that matters
is how many blocks are unused.

However, it’s a pain to keep this straight. If


it were just this one thing, it might be manageable
by all programmers, but there is a whole litany of
things to keep track of (compression, encryption,
weird filename restrictions, maximum file sizes);
trusting random programmers with this is asking for
trouble.
As a consequence, a lot of operating systems hide
the details and provide simple interfaces to these
files. You (the programmer) ask the operating system
to open a file named “/usr/ann/pron/unlikely.tex”,
and the operating system gives you something
(probably an integer or pointer) that you can store:
when you want to read a byte from the file, you give
the operating system that identifier. The operating
system manages all the details: it doesn’t matter
that the data are scattered over hell and half of
Georgia, the user program sees the file as a stream
of bytes.
Many programming languages take this idea one step
further, and treat all collections of bytes (that get
read one at a time) as an input stream. In C++, the
standard classes for this are “istream” for reading
byte by byte, and “ostream” for writing byte by byte.
C++ provides sub-classes for reading from files
(“ifstream”) and writing to files (“ofstream”).
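A minimal sketch of those stream classes in action (the filename is made up): the program just sees bytes, and the operating system worries about tracks and sectors.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write some bytes to a file, then read them back one at a
// time. Where those bytes actually live on the disk is the
// operating system's problem, not ours.
std::string roundTrip(const char* path, const std::string& text){
    std::ofstream out(path); //ask the OS to open for writing
    out << text;             //hand the bytes over
    out.close();

    std::ifstream in(path);  //ask the OS to open for reading
    std::string readBack;
    char c;
    while(in.get(c)){        //one byte at a time
        readBack += c;
    }
    return readBack;
}
```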

215
File Structure
When you write a program that writes to the disk,
you get to specify exactly what gets written: the
computer just treats it as a stream of bytes. If you
want this information to be useful when you read it
back in, you’re probably going to add some structure
to it. This raises the question: how do you organize
it?
For simple data, having a global structure tends to
be sufficient. For example, if you have a table of
numbers, you might write the width, then the height,
then all the data, row-by-row, column by column.

You might look up the csv file format for stuff like
this.
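A sketch of that width-then-height-then-data idea, written as text (the format itself is invented for illustration):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Write a table as: width, height, then the data row by row.
std::string writeTable(const int* data, int width, int height){
    std::ostringstream out;
    out << width << " " << height;
    for(int i = 0; i < width * height; i++){
        out << " " << data[i];
    }
    return out.str();
}

// Read one cell back: because width and height come first,
// the reader knows exactly how far to skip.
int readCell(const std::string& file, int row, int col){
    std::istringstream in(file);
    int width, height;
    in >> width >> height;
    int value = 0;
    for(int i = 0; i <= row * width + col; i++){
        in >> value;
    }
    return value;
}
```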

More complicated data is typically structured in a


header-payload fashion: the file is broken up into
chunks, and each chunk begins with some information
describing what that chunk is.

216
There’s more than one way to organize data: it’s up
to you to pick the best for your application.

When you design your format, there are a few


tradeoffs to bear in mind. First, structure
information is not free: large headers will increase
file sizes and slow down processing of the file. A
second tradeoff is size versus ease of editing: text
information is easy to edit, but also tends to be
larger than just storing the bytes. On the other
hand, text files are harder to mis-interpret than
binary: an integer on one system might be 32 bits,
while an integer on another might be 64, but text is
text, and ASCII is king.
One final problem to bear in mind: I’ve said that
processors have a command to move a word between
memory and registers. However, I never said how
that word is stored in memory. You might imagine
that the processor designers were sensible, and put
the largest valued bits first. However, this might
not be the case, and the processor might store the
smaller bits first (to be fair to the x86 designers,
this does speed up addition from memory).

217
Either way can be made to work, but forgetting to
swap for different processors creates weird bugs.

Because of this, it is a bad idea to save using


a memory dump: you should explicitly order the
bytes you write, or use a library that specifies the
endianness (order) of the data it writes.
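A sketch of explicitly ordering the bytes you write (the choice of little-endian here is arbitrary): use shifts and masks, not a memory dump, and any processor can read the file back.

```cpp
#include <cassert>

// Write value into buf, least-significant byte first
// (little-endian), regardless of the host processor.
void writeU32LE(unsigned char* buf, unsigned long value){
    buf[0] = (unsigned char)(value & 0xFF);
    buf[1] = (unsigned char)((value >> 8) & 0xFF);
    buf[2] = (unsigned char)((value >> 16) & 0xFF);
    buf[3] = (unsigned char)((value >> 24) & 0xFF);
}

// Read it back the same way: shifts, not a dump.
unsigned long readU32LE(const unsigned char* buf){
    return (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
}
```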

Compression
So, you’ve come up with a structure for your files,
you’ve run a simple example, and you realize you’re
writing huge files. Assuming you’ve organized your
file structure the best you can, how do you fix this?
Well, files can be large for two reasons. The first
is simply being disorganized: adding boilerplate
information and needlessly repeating things.

All work and no play makes Jack a dull boy


All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
All work and no play makes Jack a dull boy
...

This can’t portend anything good.

Lack of organization is something we can write

218
a program to fix: these are called lossless
compression algorithms. Most of these algorithms
are based on finding and noting repeated sequences.
For example, instead of using 4300 bytes to store
100 repetitions, you can use 47: 43 for the text to
repeat, and 4 for the number of repetitions.

All work and no play makes Jack a dull boy


100x

Heeeeere’s Johnny.
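A sketch of the simplest repeat-spotting scheme, run-length encoding: each run of a character becomes a count plus the character. (Real compressors are much cleverer, but the idea is the same.)

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Replace each run of repeated characters with its length
// followed by the character: "aaab" becomes "3a1b".
std::string runLengthEncode(const std::string& text){
    std::ostringstream out;
    size_t i = 0;
    while(i < text.size()){
        size_t runEnd = i;
        while(runEnd < text.size() && text[runEnd] == text[i]){
            runEnd = runEnd + 1;
        }
        out << (runEnd - i) << text[i];
        i = runEnd;
    }
    return out.str();
}
```

Decompressing just replays the counts, giving back exactly the original: that is what makes it lossless.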

These algorithms are lossless because, when you


decompress the file, you get the original. These
get used for text and records data: anything where
losing information is a problem.
This leads us to the second reason files can be
large: having too much data. For text, all of the
letters are important. However, the human eye is a
poor instrument: if you’re storing an image, there’s
no point in storing information your audience can’t
see.

Pictured: a penguin in 1000 different shades of


white.

So, the second type of compression algorithm, lossy


compression, is based on throwing away data that
doesn’t matter. Unlike lossless algorithms, lossy

219
algorithms typically have to be purpose built for the
type of data you’re compressing: a general purpose
algorithm doesn’t know what’s important and what’s
not.

Challenges
1) There are several different types of file system,
including FAT, NTFS and HFS. Look up a few, and
compare and contrast them.

2) On your system, look up how to open a file. See


if you can write a program to read in text from a
file, Caesar shift forward by two characters, and
write the result to another file.

3) On your system, look up how to get information


on the contents of a folder. See if you can write a
program that prints (to standard output) the name of
the most recently modified file in a folder.

4) There are several different compression


algorithms. Look up a few of them, and compare and
contrast. See if you can find at least two lossless
algorithms (and one lossy).

Chapter 16

How to make friends


and influence people

The goal of this chapter is networking.

Walkie-Talkie
Our goal in this chapter is to get computers across
the globe to share data with each other. That starts
with getting two computers in the same room to share
data.
We store and convey bits as voltage through a wire,
so the first option is to run a wire between the two
computers. If you do this, you need some way to
mark the start and end of a bit (the two computers
probably don’t have the same clock speed, and if they
do, they probably aren’t synchronized): this quickly
becomes an exercise in analog circuitry. However,
at the end of the day, connecting two computers is
basically running a wire between them.

No surprises so far.

However, the problems start once we add a third


computer to the mix. If we want computers A and B,
A and C, and B and C to talk, we need to run three
wires.

Well, maybe the floppy drives count as a surprise.

If we have four computers, we need to run six


wires, and for five computers we need ten.

Our computers also have to handle more individual
connections.

The number of wires we need gets bigger faster than


the number of computers: the exact number can be
calculated using combinatorics (it’s N choose 2), but
it gets stupidly huge stupidly fast. I don’t know if
you’ve checked lately, but copper is expensive: we
need a way to cut the number of wires.
A better way of organizing our computers is to add
a router: a special purpose computer that routes
messages to their final destination. This router is
connected to all of our computers.

As the Brits call it, a rooter.

When computer A wants to send some bits to computer


B, it sends a message to the router: that message
contains the message for B, as well as a header that
tells the router to send those bits to computer B.

The Australians also call it that, but for different


reasons.

With this, we can have computers in the same room


talking to each other.

Postcards
We have our router copying messages from one computer
to another: however, what happens if we want to send
a message to a computer that is not in the same room?
Well, this computer will not be attached to our
router; however, if you’ve paid your internet bill,
your router will be connected to another router.

The reason it’s called the INTERnet.

If computer A sends a message to an offsite


computer (call it S), then your router realizes
that S is not connected to it. Anything the router
doesn’t know how to route, it sends to your
internet service provider.

The first hop.

The internet service provider has a whole host of


routers that are responsible for groups of computers.
At any given router, if the target is known, the
message gets sent the right way, otherwise it keeps
getting passed along until someone knows what to do.

The worst link limits your communication.

There are a few things to keep in mind about this


setup. First, since multiple computers, routers
and internet service providers are involved, they
all need to play by the same rules. These rules are
called the Internet Protocol: the Internet Protocol
specifies that every computer is assigned a number
(called an IP address). This number is like a phone
number or a street address: it uniquely identifies
your computer, and provides some information on how
to transport messages to your computer.
Whenever you want to send a message, you construct
something called a packet: this packet is a
collection of data that specifies the IP address of
the destination, the IP address of the source (so
the target computer knows who to talk back to), a few
other settings, and the message itself.

The contents of the message, like file formats,
depend on the requirements of the communicating
programs.

You then send this packet, bit by bit, to your


router, which then figures out how to get it to the
destination and sends it further along, bit by bit.

Phonecalls
The Internet Protocol is a very basic protocol:
there are many people involved, some of which
are malicious, so IP is very fault tolerant.
Unfortunately, this means you, the programmer, get
very few guarantees. If you send a message, any
of the routers between you and the destination can
decide to not send the message.
In addition, your internet service provider could
be doing maintenance: when you send one packet, it
goes one way, while the next packet gets sent another
direction. This means that a packet you sent later
could get to the destination earlier.

You might think this is unlikely, until you remember
exactly how many people are involved.

To make the programmer’s job easier, a lot of


operating systems provide something called the
Transmission Control Protocol. A program on one
of the computers (called the server) signals to
its operating system that it wants to listen for
connections. Another program on another computer
(called the client) signals to its operating system
that it wants to make a connection to the server.
The client sends a packet to the server saying it
wants to start a connection, and the server sends a
response.
Every time one of the programs sends a message, the
other computer sends an acknowledgement on receipt.
The operating systems of both computers keep track of
which messages have been received, and will resend if
no acknowledgement comes in.

Are we there yet? Are we there yet? Are we there
yet?

This allows the programmer to treat an internet


connection like a stream, without worrying about
dropped packets.
There is another issue that comes into play at
this level: if your computer is running multiple
programs, it’s possible that multiple programs will
want to communicate over the internet. Each program
only cares about its own connection(s); so, the
operating system needs some way to split the traffic.
TCP accomplishes this with a tag (called a port
number): each connection has a port number assigned
that is used by the operating system to break up
packets based on program.

One computer can provide multiple programs/services
(in this case, file transfers and http pages).

Finally, there is the question of how we got the


IP address. An IP address is a number, much like
a phone number, so one might expect the user to
remember the IP addresses of their favorite websites.
However, an IP address is a number either 4 or 16
bytes long, and people tend to be bad with numbers
anyway. Add on the issue of changing IP addresses,
and this might not be a good idea. One way to fix
this is with the Domain Name Service (DNS), the phone
book of the internet.

An entirely accurate phone book for the internet.

There is a server at a known IP address (either set


in your operating system or provided by the internet
service provider) which will receive messages: these
messages ask what IP address corresponds to a name.
So, if your computer asks for the IP address of
“google.com”, the DNS server(s) might reply with the
following four bytes: 74, 125, 224, 72. You can
then go on to connect to that address.

Fodszqujpo
So, now we’ve connected our computer to every other
computer on the internet. We are very trusting
people: we may not be very smart.
So, there are (at least) two ways connecting to
random computers can cause us trouble. The first is
if we blindly follow instructions from the internet:
it’s usually a very bad idea to let people you don’t
trust give you code to run. If you’re writing a
program that takes user input over the internet, do
not let them run code on your machine (at least, not
without very careful design).
This sounds easier than it is: as an example,
when your internet browser opens a website, it
makes a connection with a server and reads in an
HTML page. That HTML page describes the contents
of the page (safe) and allows JavaScript code to
describe how the page changes in response to the
user (potentially very dangerous). If the people
programming your browser were not careful, running
that JavaScript could lead to nasty and, worse still,
subtle problems.

Who knew cat videos would lead to the end of the


world?

However, that’s a problem between two computers:


it’s an issue of trust and verification that’s not
unique to the internet. The second way connecting
computers together can bite us stems from how the
internet is set up: there are quite a few routers
between you and a server. Each one of those routers
is a computer that can copy the information that
passes through it, or change the information before
copying it. Furthermore, while IP packets have
a source IP address, you don’t have to be honest:
lying is easy.

We got a lead.

These are the big three issues in computer


security: confidentiality (only the destination
should be able to read the message), integrity (the
message at the destination should be what was sent)
and authentication (I’m actually talking to who I
think I’m talking to). Doing these topics justice
is outside the scope of this book; however, the
short answers are, in order, encryption, hashing, and
asymmetric encryption.
Encryption is garbling a message before sending it:
we’ve done this with a Caesar shift cypher. If both
server and client share a secret (the number used
to rotate: does A go to C, or Q), then the server
can send a message that the client can un-garble.
Caesar shift is a fairly poor algorithm: the shared
secret (the number of spaces to rotate) has very few
possibilities (26 for English), so it’s easy to just
try them all. However, the same basic idea applies
to all symmetric encryption algorithms (just with a
larger key space).
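To make that concrete, here is a sketch of the Caesar shift as a symmetric algorithm (toy code: it only touches lowercase letters, and the key space is still tiny):

```c
/* Garble text by rotating each lowercase letter key places
   forward. Both sides share the key; rotating 26 - key
   places forward un-garbles the message. */
void caesar_shift(char *text, int key){
    for(int i = 0; text[i]; i++){
        if(text[i] >= 'a' && text[i] <= 'z'){
            text[i] = 'a' + ((text[i] - 'a') + key) % 26;
        }
    }
}
```

Swap the rotation for a serious algorithm (and a serious key) and the shape of the conversation stays the same.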
A symmetric encryption algorithm is where both
server and client share a secret: this means there
must be some way for them to share that secret (the
problem we were trying to solve in the first place).
However, it’s possible to come up with an algorithm
where only one of the parties needs to know the
secret: asymmetric encryption. In this, the server
comes up with two pieces of information, one used to
garble the message, and the other used to un-garble.
The server can give everyone and their dog the garble
key, and everyone and their dog can use that key to
send messages to the server. As long as the server
keeps the un-garble key secret, there should be no
problem.

The downside is you need four (BIG) keys to do the


job of one small one.

This can also be used to verify identity: if you


send me a message garbled with “my” garble key, then
only I should be able to read it (if I am who I say I
am).
So, the computers between client and server can’t
read the messages. However, they can still alter
them (if your goal is simply to cause chaos, this is
very simple: just change random bytes). The answer
to this problem is to add a summary of the message
(a hash) to the message (before encrypting). Upon
decryption, if the summary doesn’t match, someone is
messing with your messages.
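As a sketch of the summary idea, here is the (real, but non-cryptographic) FNV-1a mixing function; actual protocols use much stronger hashes, like the SHA family:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy hash: mix every byte of the message into a running
   value (this is FNV-1a). Flip any byte and the summary
   changes, so tampering shows - but unlike a cryptographic
   hash, forging a message with a matching summary is easy. */
uint32_t toy_hash(const uint8_t *data, size_t len){
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for(size_t i = 0; i < len; i++){
        h = (h ^ data[i]) * 16777619u;  /* FNV prime */
    }
    return h;
}
```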

Challenges
1) Your computer’s operating system provides routines
for TCP. Look up how to use them, and see if you
can write a simple server that prints whatever it
receives, and a client that sends “Hello, World!”.

2) Find a packet capture program that works on your


system. Play around with it: what do you notice
while searching around the web?

3) For connecting computers in the same room, you


actually don’t need communications to go through the
router. Look up how the original Ethernet standard
accomplished this.

4) Electromagnetic radiation (i.e. radio waves)


tends to be unfocused (it spreads out in all
directions). How might this be a problem with
wireless networks?

Chapter 17

How to stimulate

The goal of this chapter is to cover how simulations


are written.

A Bit of Calculus
Most simulations are trying to calculate how
something changes, either in space or in time.
The mathematical language of change is calculus.
Normally, this might constitute a problem (calculus
scares people for some reason). However, most of
the difficulties in calculus stem from trying to find
an exact answer. We’re working on the computer: we
don’t care about the exact answer, we just want good
enough.
Accordingly, we just need to understand the basics:
what is a derivative, and what is an integral. Both
of these operations work on something called a
function: a function takes one or more inputs, and
produces an output.

f(x, y) = x² + sin(y)
You’ve seen this before: the programming term for
this is subroutine (and the C standard actually calls
its subroutines functions).

double f(double x, double y){
    return x*x + sin(y);
}

A useful question to ask about a function is: if
I change one of its inputs by some small amount, how
much does the function result change?

∆x → ∆f(x, y)
This is typically phrased as the “slope of the
tangent line”:

∆f(x, y) / ∆x

and this can be approximated by

(f(x + ∆x, y) − f(x, y)) / ∆x

derivative = (f(1.5+0.001,3)-f(1.5, 3)) / 0.001;

If we were mathematicians, the next thing we would


do is take a limit (as ∆x approaches zero) and call
it a “derivative” (since we have multiple inputs,
technically this is a partial derivative).

∂f(x, y) / ∂x
But, we’re not mathematicians: we don’t care about
being exact.

Moving Mountains
Derivatives are half of the calculus fun: the other
half is integrals. Suppose you want to put in a
football field, and you need to level a plot of land.

Proper American football.

You go and survey the land:

I once got heat exhaustion doing this.

and you get the height of the dirt on a regular grid.

20 foot drops don’t end well.

What you want to know is, if you level to a given


height (0 meters/yards), how much dirt will you
need to remove (or possibly add as fill)? A simple
approach to this problem is to draw rectangles up to
each height from each measurement:

Trapezoids would be better, but this works.

and then calculate the volume of each rectangle (area


times width: 3d is hard to draw).

A standard football field is 160ft by 360ft (split
into twelve pieces is 30 feet per piece).

If you sum up those volumes, you get a value for how


much dirt you need to move.
dirt = Σᵢ₌₀ᴺ 160 ∗ hᵢ ∗ ∆x

If we were mathematicians, we would take the limit


as ∆x goes to zero (and the number of grid points
goes to infinity) and we would call it an “integral”.
dirt = 160 ∗ ∫₀³⁶⁰ h(x) dx
But, again, we’re programmers, not mathematicians.
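That sum is just a loop. As a sketch (assuming the surveyed heights arrive in an array, one entry per grid point):

```c
/* Rectangle-rule integral: each height h[i] covers a strip
   dx wide and 160 feet deep. Summing the strips approximates
   the volume of dirt to move. */
double dirt_volume(const double *h, int n, double dx){
    double total = 0.0;
    for(int i = 0; i < n; i++){
        total += 160.0 * h[i] * dx;
    }
    return total;
}
```

Swapping rectangles for trapezoids only changes the inside of the loop.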

Then again, neither are known for being able to
football.

Internal Ballistics
For the pure programmers, you typically have a
differential equation dumped in your lap, which you
are then asked to solve. How that expression for the
derivative is obtained depends on who is giving it
to you. A problem I happen to like is solving for
the velocity of a pellet from an air rifle. This
system starts with a pellet in the chamber, with high
pressure air on one side, and atmospheric pressure
air on the other.

You’ll shoot your eye out, kid.

There are three things we need to keep track of:
the time (t), the position (x) and the velocity (v).
Time just runs on like a river: the state of our
system is a function of time (we can ask what the
position is at time 2, but asking what the time is
at velocity 7 feels backwards). By definition, the
derivative of position is velocity (this is worth
remembering).
∆x/∆t = v
Newton’s second law gives us an expression for the
derivative of velocity (acceleration) in relation to
the applied force (F) and the mass of the projectile
(m): this is also worth remembering.
∆v/∆t = F/m
At this point, the engineer (me) dumps an
expression for F into your lap (if you’re curious,
look up the definition of pressure, Boyle’s law, and
the area of a circle):

∆v/∆t = πR²(P₀(x₀/x) − Pₐₜₘ) / m
So, we have expressions for how v and x change
with time, let’s get their values at some times
of interest. Since we’re programmers solving this
approximately, we need to pick a timestep (∆t)
for our simulation: let’s use 0.001 seconds (a
millisecond). We have the position at time zero (x₀)
and the velocity at time zero (v₀). If we want to get
the position and velocity at 1ms (x₁ and v₁), we can
first calculate the derivatives at time zero.
∆x/∆t (0) = v₀

∆v/∆t (0) = πR²(P₀(x₀/x₀) − Pₐₜₘ) / m
We can get the change in position and velocity by
multiplying by the timestep.

∆x(0) = ∆t ∗ v₀

∆v(0) = ∆t ∗ πR²(P₀(x₀/x₀) − Pₐₜₘ) / m
We have the changes from time 0 to time 0.001: if
we want the values, we just need to add the starting
value.

x₁ = x₀ + ∆t ∗ v₀

v₁ = v₀ + ∆t ∗ πR²(P₀(x₀/x₀) − Pₐₜₘ) / m
To find out the position and velocity after 0.1s
(100ms), we just need to do this 100 times.

double curX = x0;
double curV = 0;
double curT = 0;
while(curT < 0.1){
    double dxdt = curV;
    double delP = P0*(x0 / curX) - Patm;
    double dvdt = 3.14 * R * R * delP / m;
    curX = curX + 0.001*dxdt;
    curV = curV + 0.001*dvdt;
    curT = curT + 0.001;
}

This problem setup is called an initial value


problem (we know the initial value, what is the value
at a later time). This particular method for solving
these problems is called Euler’s method (there are
other, more accurate but more complex options).

Hot Rod
We have a problem where the value varies in time:
however, we can also have a problem where the value
varies in space. As a simple example, let’s look at
a hot rod.

Who puts a blast furnace next to the freezer?

One end of the rod is held at a high temperature


(Tₕ), the other at a low temperature (Tₗ), and the
middle has heaters and chillers along it (how much
heat they add is a function of location, G(x)).
This rod has been held this way a long time (the
temperature has stopped changing): what I’d like to
know is the temperature (T(x)) throughout the rod.
Again, the engineer (me) dumps an equation in your
lap: if you’re curious, look up Fourier’s law and
shell balances.

∆/∆x (∆T/∆x) = −G(x) / k
k is a constant of the material (steel conducts
heat faster than Styrofoam™). You’ll also notice
that we’ve got a derivative of a derivative: a
derivative measures change, and change can change
(e.g. acceleration).
Again, since we’re working approximately, we choose
some distance scale: let’s split our rod into five
pieces (∆x = 1/4th the length of the rod). Each
piece has its own temperature (Tᵢ).

If that was a hot rod, this is a chop shop.

The two endpoints (T₀ and T₄) have known


temperatures (we’re holding them at that value). For
the middle three, we can turn the derivative into
something useful. In particular, a derivative is a
difference, so a derivative of a derivative can be
written as a difference of a derivative... just look
at the following equation.

∆/∆x (∆T/∆x) = [∆T/∆x (x) − ∆T/∆x (x − ∆x)] / ∆x

Differencing again, this becomes:

∆/∆x (∆T/∆x) = [(T(x+∆x) − T(x))/∆x − (T(x) − T(x−∆x))/∆x] / ∆x
This simplifies to

∆/∆x (∆T/∆x) = (Tᵢ₊₁ − 2Tᵢ + Tᵢ₋₁) / ∆x²
There are three places we don’t know the
temperature: we can use our discretized derivative
to build three relations.

(T₂ − 2T₁ + Tₕ) / ∆x² = −G(x₁) / k

(T₃ − 2T₂ + T₁) / ∆x² = −G(x₂) / k

(Tₗ − 2T₃ + T₂) / ∆x² = −G(x₃) / k
Three equations, with three unknowns, means we can
solve this system for the temperature in the rod.
To do this, use Gaussian elimination (I assume you
remember your algebra).
This problem setup is called a boundary value
problem (we know the value at the boundaries, what
is its value in the middle). This particular method
for solving these problems is called the finite
difference method (there are others).
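If your algebra needs a refresher, here is a sketch of Gaussian elimination for a three-unknown system A·T = b (no pivoting, which is fine for this rod system, where the diagonal dominates):

```c
/* Solve a 3x3 linear system a*x = b by Gaussian elimination.
   Forward elimination zeroes everything below the diagonal;
   back substitution then peels off the unknowns from the
   bottom up. a and b are overwritten; the answer lands in x. */
void solve3(double a[3][3], double b[3], double x[3]){
    for(int col = 0; col < 3; col++){
        for(int row = col + 1; row < 3; row++){
            double factor = a[row][col] / a[col][col];
            for(int k = col; k < 3; k++){
                a[row][k] -= factor * a[col][k];
            }
            b[row] -= factor * b[col];
        }
    }
    for(int row = 2; row >= 0; row--){
        double sum = b[row];
        for(int k = row + 1; k < 3; k++){
            sum -= a[row][k] * x[k];
        }
        x[row] = sum / a[row][row];
    }
}
```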

Challenges
1) There’s a major logical problem with the air rifle
program. Can you spot it? (Here’s a hint: how fast
is the pellet moving once it leaves the muzzle?) How
would you fix it/cope with it?

2) Pick one of the following: continuously


compounded interest, radioactive decay, unrestricted
bacterial growth, spontaneous chemical degradation or
any other first order growth/decay problem. Figure
out/look up the equation for it, and write a program
to solve it.

3) The football field used heights measured on a


line, because 3d is hard to draw. How would you
calculate the 3d case?

4) See if you can program a Gaussian elimination


solver for the bar problem. If your linear algebra
is weak, you may need to do some review.

Chapter 18

How to play sweet


music

The goal of this chapter is to pipe things through


the speakers.

What’s That Noise?


If we’re going to play sound, we should have some
idea what sound is. At the end of the day, sound is
a vibration that travels through the air and tickles
your ear.

This is better than what happened to the other ear.

Vibrations are created by some disturbance in the
environment (somebody pounding on a drum, a rock
hitting the water, gunfire), and moves through the
environment at some speed (in freezing air, 331
meters per second, or 1087 feet per second). These
vibrations create temporary spikes in pressure, which
specialized hair cells in your ear respond to.
Those hairs responding triggers neurons, which in
turn signal that response to the brain, which somehow
figures out that sound is playing.

Listening
Our goal is to re-create a set of vibrations (or,
equivalently, pressures): our first challenge is
to figure out how to get the pressures we want to
play. We can generate them from code, although
sometimes you want to measure and store sounds from
the environment for later use (especially if the
sound is complex, like speech).
The device used to measure the sound is called
a microphone. There are several different types,
but all of them have a circuit where a vibrating
element produces different voltages. One design
relies on the piezoelectric effect: some crystals
(like quartz) will produce a voltage if subjected to
a mechanical stress (like vibration). If you hook
up such a crystal to a diaphragm (a big sheet that
reacts to vibrations), it’s possible to measure the
voltage produced by the crystal.

Plus some foam to cut back on popping (from wind and
spit).

Now, we have a microphone that produces pressure


readings. However, it produces pressure readings
all the time, while our computer is set up to handle
discrete bits of data. The most common solution to
this problem is sampling: store the pressure roughly
every 23 microseconds.

You may be noticing a pattern: programmers don’t


really care about “exact”.

If you store the pressure at a small enough
interval, the difference between the discrete samples
and the original sound becomes imperceptible (i.e.
not important). The question is, how small is small
enough: to answer that, there are two pieces of
information we need.
First, sound tends to form a wave: waves are
described by amplitude (loudness) and frequency
(number of spikes per second).

Do the wave.

The human ear is a variable instrument, but for


most people it cannot detect frequencies above 20000
spikes per second.
Second, if you want to recreate a sound of a given
frequency by sampling, you need to sample at least
twice as often as the frequency (Nyquist’s theorem).

I didn’t say it’d be a good re-creation.

Both of these facts imply that we need to save


pressures at least 40000 times a second: sampling
roughly every 23 microseconds produces 44100 samples
per second (a standard sampling frequency).

Playing Samples
So, now we know the sound we want to play, we just
have to play it. Many devices that produce sampled
audio have special circuitry (a sound card) that
will store some number of samples, and turn those
samples into voltages. The most complex item in
these cards is a digital to analog converter: these
are circuits that will take a digital signal (an
integer) and produce a corresponding voltage. There
are many approaches to this: a simple one starts by
generating a range of voltages, and using a bank of
gates and transistors to select one of them.

Thought we were done with this stuff, didn’t you?

As programmers, we have one job: get data to the


sound card. The sound card has a buffer that stores
some amount of data: once we fill that buffer, we
have to get more data before the buffer runs out
(otherwise, we get stuttering and popping). This
means you either need to program carefully (so you
loop back to filling the buffer before you run out)
or use a separate thread to handle the data.

Want to hear the most annoying sound in the world?


Let your data run out.

This problem is known as real-time programming:
your program must do something within a time limit.
In particular, this is a soft real-time problem,
since nothing will blow up if you blow it.

What do you mean “cut the RED wire”? This book is in


black and white!

As for the speaker attached to the other end of


the sound card, there are multiple ways to produce
sound. One of the most common options is to use
electromagnetic induction: current through a coil
of wire produces a magnetic field, which can push
against a fixed magnet. If you attach one of those
parts to a diaphragm, it will produce pressure waves
as it moves.

“Magnets” is an anagram of “magic”.

Playing Notes
If you run the numbers, you will realize that storing
any significant amount of sound data will take a lot
of storage. Most modern systems have the memory for
it, but some specialized systems don’t, and it might
be worth freeing up memory anyway. There are other
ways to treat sound besides a collection of pressure
samples.
Most musical instruments produce sound via
resonance. The key sound generating structures on
these instruments allow certain frequencies to grow
and add to each other: this filters a random input
(like friction on a wire or breath over a pipe) so
that only the preferred frequencies come through.

The fundamental frequency is the wave that travels to


the end, bounces back, and returns right when the
next wave starts.

Most musical theory and composition starts with


a set of fundamental frequencies: these are called
notes, and most instruments are created to play
those notes (at least, once you “tune” them to the
proper frequency).

Pictured: the extent of my knowledge of music
theory.

In this framework, sound is a collection of


notes that get played in some order (melody) and
combination (chords). So, some systems provide a
framework to pass the notes that must be played,
and the sound card will handle turning notes into
pressure waves.
The most common method for this is called MIDI: the
program gives a MIDI controller events (start playing
note, stop playing note), and the MIDI synthesizer
turns those events into sound. The synthesizer holds
a large amount of sound data describing how different
instruments sound: how the notes start playing
(attack) and how they sound as they keep playing
(decay).

Fourier Series
There is a little bit of advanced math that comes
in damn handy when dealing with sounds. A Fourier
transform allows you to take sound data and determine
which frequencies are playing: a lot of filters and
effects rely on the Fourier transform.

Because bass is the only part that matters.

To produce a Fourier series, you need to know how


to do a discrete integral: refer to the previous
chapter if this is new to you.
In mathematics, an inner product determines how
much of one value is reflected in another. If those
values are functions, the inner product will tell you
how much of your “basis” function is present in the
signal. You can use multiple basis functions. If
you select basis functions intelligently (namely,
the inner product of two different basis functions
is zero), you can split a signal into those basis
functions.

Because the base-is the only part that matters.

For any kind of repeating function, the natural


set of basis functions is sines and cosines. If your
signal repeats every P seconds, then your basis set
is all sines and cosines that cleanly repeat in P
seconds (i.e. repeat once, twice, thrice, etc...).

This is an intelligent selection: proving so is


not our problem.

So, if your signal is a sum of sines and cosines:

S(t) = a₀
     + a₁ cos(2π/P t) + a₂ cos(2∗2π/P t) + ...
     + b₁ sin(2π/P t) + b₂ sin(2∗2π/P t) + ...
then you can split up your signal (i.e. determine
the coefficients) via

a₀ = (1/P) ∫₀ᴾ S(t) dt

aₙ (n > 0) = (2/P) ∫₀ᴾ S(t) cos(n∗2π/P t) dt

bₙ = (2/P) ∫₀ᴾ S(t) sin(n∗2π/P t) dt
These are continuous integrals: however, we have
discrete data. Since we know how to do an integral
in a discrete manner (refer to the football example
in the previous chapter), we know how to do these
integrals (or we can at least figure it out). If
you know your trigonometry, you can then extract an
amplitude from pairs of sine and cosine (with the
same frequency).

Challenges
1) Go find some software that will let you record
and edit (sampled) sound data (there is probably free
software available). Install it, and record your
voice saying something silly.

2) Pick a system (Windows, Linux, Mac, Haiku), pick


a programming language, and figure out how to load in
a sound and play it.

3) Look up the frequencies of musical notes. Write


code that will generate five seconds of note “A4”.

4) Figure out what frequencies are playing in the


sound file you recorded (if you can hack the math,
try to code it yourself). Which octave does your
voice naturally produce?

Chapter 19

How to paint pretty


pictures

The goal of this chapter is to put things on the


screen.

Pretty Lights
Before we put things on the screen, we need to know
what the screen is. There are a couple of different
types of displays in use: the most traditional is
the beta radiation king, the cathode ray tube. A
CRT has a source of electrons next to a large charge
collection: whenever electrons are loosed, they zoom
off towards the screen, where they hit phosphors that
in turn produce light.

A heated plate produces electrons, a voltage makes
them move, and magnetic fields (from the coils) bend
their path.

These phosphors are placed in a regular grid.


The electron beam scans across the grid, line by
line, top to bottom, and sends a different amount
of electrons to each phosphor. A different strength
produces a different brightness, from none to “Oh God
My Eyes!”.

Bird is the word.

A display using a grid of “pixels” is called a
raster display, and is one of the most technically
straightforward displays to control from the
computer. If you’re using a typical hardware setup,
with memory separate from the CPU, you can add a chip
that scans through memory, and sends pixel commands
to a display.

One of the simpler ways to do this.

So, if you want to change the screen, all you need


to do is change the video memory; the next time the
screen draws, it will use the new data and produce a
new image on the screen.
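As a sketch, with one byte of brightness per pixel and rows laid out one after another in video memory (both assumptions of mine; real hardware varies):

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Set one pixel's brightness. The display chip reads row y
   starting SCREEN_W * y bytes into video memory, so pixel
   (x, y) lives at index y * SCREEN_W + x. */
void set_pixel(uint8_t *videoMem, int x, int y,
               uint8_t bright){
    videoMem[y * SCREEN_W + x] = bright;
}
```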

Color
Our screens have a few limitations: the biggest is
that we only have a black and white (or black and
green) display, but we’d like to have color. It
turns out that your eyes are most sensitive to three
wavelengths of light, one of which registers as red,
another green, and a third blue.
Combinations of wavelengths stimulate multiple
flavors of cone at once, and your brain interprets
that as a secondary/tertiary color: so, red and
blue produce a purple/magenta hue. So, if we want
our display to work in color, one option is to add
multiple flavors of phosphor: each pixel can have
different amounts of each primary color (red, green
and blue).

One of the drawbacks of printing this book in black
and white.

Those of you with an artistic background might
protest that the primary colors are red, yellow
and blue: the difference is that, with screens,
we’re adding color (by changing the amount of light
each phosphor produces), while with pigments, we’re
subtracting color (Cyan eats red light, Magenta eats
green, Yellow eats blue, and blacK eats everything).

Sprite Sheets
We also need some way to store data for the screen.
Since the screen is a raster display, it makes sense
to store images in a raster format. The human eye
can distinguish about 200 different shades of each
primary color, so a natural choice is to store how
bright each phosphor should be in a byte. So, our
first pixel would require three bytes: one for red,
one for green, and the last for blue. Then, we can
store multiple pixels for a single line, and multiple
lines for a single image.

Simple file formats (bitmap comes to mind) also store
pixels this way.

Traditionally, small images meant to be drawn to
the screen are called sprites (older computers with a
display had special hardware for dealing with them).
This leads us to the last issue: so far, we have a
static image, but we might like to do animation.
The human eye has a certain persistence time:
if an image changes faster than that time, the eye
perceives it as continuous motion rather than a
stream of images. If we want to do an animation,
we need to change the image faster than that limit.
Exactly how fast that is can depend on what you’re
doing: 24 frames per second gets the job done, but
for applications with split-second reaction times
(action games), higher frame rates will feel more
responsive. Your typical limit in the United States
is 60 frames per second (the cycle rate of the wall
socket power is 60Hz).
Anyway, to draw an animated sprite, we can have
a large number of frames in a single file (a sprite
sheet).

Play them quickly and it looks like it’s walking.

And, every frame, we can copy over a new part of
the sheet.

Vector Graphics
The hardware we work with is a raster display: the
most straightforward image format to program mimics
this structure. However, art deals with shape,
value, color and texture, not pixels. As a result,
while producing an image, an artist might like to
work with shape et al. rather than pixels.

In a raster format, a circle is a bunch of square
pixels.

Some art programs store the shapes that have been
drawn, and will “render” those shapes into a raster
format for drawing. As an (important) example,
suppose we have a triangle.

Advanced geometry: tri means three.

If we want to produce a raster from this, we might
overlay a grid of pixels. Then, for each pixel, we
can determine whether it is inside the triangle or
not. Those that are, we draw to the final raster.

There are many ways to do this, some smarter than
others.

Formats that store artistically relevant data,
rather than pixels, are called vector formats (the
earliest formats directly moved the electron beam,
and described the motion with vectors).

3D
Suppose you’re an architect, and you want to
visualize what your building will look like before
sinking $2 million into it.

Looking like $#!* isn’t supposed to be literal.

Since nobody important will see the insides of
the walls, we only need to worry about the visible
surfaces.

The rats can stay, but the skeleton has to go.

We need some way to represent those surfaces: the
most common is to approximate it with a very large
number of triangles.

Twelve in this case.

With this, we have a three dimensional object,
but we need to draw it on a two dimensional screen.
One option is ray-tracing: pick an eye point and a
view direction, and follow lines out until they hit
something.

Actually pretty simple to code, though horribly slow.

With triangles, a smarter option is to start with
the eye point and view direction, and figure out
where the three corners of the triangle are on the
screen. Once you know where the triangle is, you can
rasterize it. Of course, figuring this out requires
a fair bit of math: how’s your trigonometry?

Challenges
1) Find a raster image editor and produce an
image: bonus points for something offensive and/or
entertaining.

2) Find a vector image editor and reproduce the same
image.

3) Find a 3d modeling tool and produce a simple
model.

4) Suppose you had the following scene:

The eye is at <0,0,10> looking to the origin <0,0,0>:
the cube is centered on the origin, and is 2 units
wide.

Look up the difference between an orthographic
and perspective projection. See if you can figure
out where the points of the triangles would be under
each.

Chapter 20

How to push a few buttons

The goal of this chapter is to present your user with
a coherent interface.

Windowing
Older computers only had one program running at any
given time. In this situation, letting that program
hit video memory directly is perfectly fine: what’s
mine is mine, and what’s yours is nothing. However,
the moment you have two programs sharing a screen,
that becomes a very bad idea.

Think advertisers are annoying now? Imagine what
they could do with direct memory access.

You need some way for programs to share the screen.
Most modern operating systems are based around
graphical user interfaces, so they provide a natural
way to segment the screen: windows.

Virtual windows precede virtual bricks.

A window is an image that you can put data in: the
operating system will arrange for it to be put on the
screen, somewhere. We can draw anything we want in
this window, which begs the question, what should we
put in there?

Common Elements
Before we answer that question, we need to answer a
simpler one: what might we put in there? The first
and least interesting thing is static text. Static
text can tell the user what they’re looking at, but
it doesn’t really do much. Additionally, static
images fill the same role as text.
For accepting user input, the type of control we
use depends on what type of data we need. For text,
the standard option is the text box: a region of
space that accepts a single line of text.

Social engineering at its finest.

The check box gets used for Boolean inputs:
clicking it changes it from true to false and vice
versa.

Don’t sign something you don’t understand.

For numbers, spinners get used. A spinner is a
text box (for direct entry of numbers) with controls
to add or subtract from the number.

You sunk my battleship.

For selecting one option out of several
possibilities, there are a few common options. The
radio button is used for a small number of options,
and functions similarly to the check box. The list
is used for a large number of options, especially if
the number of options can change over time (or more
than one option can be selected). The combo box is
similar to the list, except that it only shows all
the options when the user is interacting with it.

Radio buttons to select the transmission method, a
list to select the station, and combo boxes to play
with the frequencies.

There are a couple of options for starting actions.
Buttons are used for major and/or common actions:
pressing a button typically starts a process.
Finally, menus are where most programmers cram their
miscellaneous actions.

Using a menu for a menu. Meta-menus.

These basic controls are offered by most windowing
systems, and work fairly well for a combination of
mouse and keyboard. However, the further away you
get from a mouse and keyboard setup (touchscreens and
game controllers), the less well they work (and the
harder you’ll have to think about your interface).

Event Driven Programming
There are two methods of programming with user
interaction. One is to write a loop that constantly
checks the state of the items in question. While
this can work, it eats up a lot of processor time:
unless your program is going to be a resource
hog anyway (video games or other single purpose
software), the only thing this will accomplish is to
annoy whoever is using your program.

I just want to check my e-mail.

For programs which react to the user, it would be
better if some part of the system could notify us
when something changes. In most operating systems,
this is done via an event queue: the operating
system periodically reads the state of the input
devices and stores the changes. Exactly how this
is better than the program itself looping depends
on the operating system: some processors provide
interrupts so an operating system can lie dormant
until something happens. Regardless of how the
operating system collects events, every so often, a
program with an active window will ask the operating
system for all the events.

Mouse Moved (110,200) -> (112,203)
Key Pressed Q
Mouse Button Down Left
Key Released Q
Mouse Moved (112,203) -> (111,204)

Details depend on the operating system.

Since something else (the operating system) is
carefully watching the events, our program can worry
about events at a lower frequency.
The events are typically pretty basic: key
pressed, key released, mouse moved, mouse button
pressed, mouse button released. Operating systems
might add some higher level ones (window needs close,
for when the user clicks the “x” at the top of the
window), but more complex logic typically falls to
the program itself.

Not Being Stupid
We know what we can put in a window (or have an idea
of the types of things we can add), and we know
how to read user interactions; the next thing is to
decide what we want to use for an interface.
Before we do this, I want you to think on all the
terrible software you’ve used over the years. Think
on all the bone-headed designs you’ve encountered,
and the bass-ackwards user interfaces you’ve seen.
If you’re writing a program, your goal should be to
not have your interface on that list. The very first
step to accomplishing this is planning: before you
start programming, you should know what your program
is supposed to do, and exactly how it’s going to do
it. Knowing this should let you know what the most
important activities are, which you should make as
straightforward as possible.
For instance, if you’re trying to make a program to
manage a nuclear power plant, the first thing to do
is know what the operators might want to do. Since
(I presume) you are not an operator of a nuclear
power plant yourself, you don’t know what an operator
might want to do, so your first action should be to
go ASK a few of them.

It’s pronounced nucular.

Ask them for a list of what they want to do,
and more importantly, ask them which actions are
most important and/or most frequent (these aren’t
the same thing). Ask LOTS of questions: talking
is a free action, but once you’ve got something
coded, changing things becomes hard to impossible
(especially with UI code). However, you should think
critically about their answers: your clients know
what they need to do, but they have no idea how to
code. In particular, you should pay more attention
to actions (“change the position of the fuel rods”)
and less to aesthetics (“I liked the dials on the
old switchboards”). A good interface can be made
aesthetically pleasing, but a bad interface is hard
to salvage.
Once you have the full list of what your program
must do, and the relative importance of each task,
you need to start laying things out. Two things
to keep in mind are simplification and consistency.
Consistency is important: if two tasks have the
same basic point, they should control in the same
basic way (if you use one type of control for water
valves, you should use the same type of control for
oil valves).

If they have the same real world control, consider
giving them the same virtual control.

You should also strive for simple controls: a lot
of this can be managed by matching the control to the
data type. However, the data type alone might not
narrow it down to one option: starter generators can
be viewed as either on/off (a Boolean value, managed
by a check box) or a pair of activate/deactivate
actions (two buttons). What do you do when faced
with such a choice? Go ask an expert (the OPERATOR),
either directly (“Which do you prefer?”) or
indirectly (“Hey, use this program for a bit while
I watch.”).
It’s also important to provide good feedback.
For example, when the operator presses the activate
button (and they will tell you to use the button
instead of the check box), there should be a visual
change to mark that the starter generator is on: it
should be minor and unobtrusive, but there should be
something the operator can look at to confirm.

A horse is a horse, of course of course (a horse
provides 1hp).

Finally, run your design by the users: they’re the
ones who have to use it, so they should have some say
in how it works. Test early, test often.

Challenges
1) For your choice of system and language, find out
how to get a window on the screen (you might consider
using a cross platform library: the operating system
controls for the interface are uniformly terrible).

2) See if you can get a bitmap image drawn to that
window: pick an image that’s as garish as possible.

3) Finally, try to program a calculator on this
system.

Chapter 21

How to multitask

The goal of this chapter is to have multiple programs
running at the same time.

Multiple Programs
There are two main reasons to have multiple programs
running at the same time. The first is the obvious
one: if you have two processors working on a
problem, they can get that problem done in half the
amount of time.

One man can dig a post-hole in 60 seconds. 60 men
should be able to do it in one second.

The second, less obvious reason is responsiveness.
Older, single-task computers had, at any given time,
one program running: if that program checked user
input and responded to it, great, otherwise you got
to sit and look at a blank screen until the program
finished running.
So, even on a machine with a single processor
(where the obvious benefit doesn’t come into play)
it can still be useful to support having multiple
programs/processes running. On a single processor
machine, the processor is designed to, every so
often, stop running the current program and start
running the operating system. The operating system
then decides which program runs next; that program
runs for a bit until it gets stopped in favor of
another program.

One processor can only run one thing at a time, but
it can change what it’s working on.

Since the processor clock period is on the order
of nanoseconds, and the thread quantum is on the
order of milliseconds, the switches between processes
happen faster than you can notice.
There is one drawback to multiprocess operating
systems: sharing resources. If you’re the only
person sitting at a table, you can spread your stuff
all over the table if you want. However, the moment
somebody else sits down, you have to share: two
people can’t use the same part of the table at the
same time.

There can be only one.

A computer has a finite amount of memory that all
running programs have to share. It is the job of the
operating system to manage memory for the programs
that need to run: in particular, it needs to protect
the memory in one process from other processes.
However, some processes actually do need to access
each others’ memory: two programs working on the
same problem might like to keep the data for that
problem in one spot. Two processes that share the
same memory are called threads (technically, it’s one
process with two threads). Fair warning: threaded
programs are difficult to debug, which should become
evident through the rest of the chapter.

Data Races
Let’s say you’ve gotten a plate of steak and a fork,
and you sit down at the table. You look around and
realize you forgot to get a knife, so you get up to
go get one. Then your friend comes over with his
plate. He sees the fork already on the table, and
starts eating some mashed potatoes.

I like mashed potatoes too, but if you’re going to
steal a fork, why not take the steak as well?

He decides he wants some ketchup, so he puts the
potato-stained fork down and gets up. Finally, you
return and get mad at the lazy bastard.

We gotta get more silverware.

This is the essence of a data race: two processes
should synchronize their use of a resource (I use the
fork, clean it, then you use the fork), but don’t.
As a concrete example, suppose you have two threads
running the following code.
.data
x:
    .long 0
.text
    ...
    #x = x + 1;
    movl $x, %eax
    movl (%eax), %ebx
    movl $1, %ecx
    addl %ecx, %ebx
    movl %ebx, (%eax)
A thread can be stopped at any time. So, it might
happen that process 1 loads x from memory, but is
stopped before adding; process 2 then runs, loading
x. If that happens, it doesn’t matter what happens
next, x is getting set to 1 (both processes read
zero, add one, and store). However, process 1 might
run through all the commands before process 2 loads
x: in that case, x gets set to 2 (process 1 adds
one, then process 2 adds one). We ran the same code
twice, and got a different answer: this is not good.
It’s the operating system that decides which thread
gets to run when: consequently, the operating system
has to provide some way to manage access to data.
The most common solution is the mutually exclusive
lock: switching over to C, our code might look like

#include <qthread.h>
int x = 0;
lock x_lock;
void threadCode(){
    lockMutex(&x_lock);
    x = x + 1;
    unlockMutex(&x_lock);
}

“lockMutex” and “unlockMutex” are functions
that have to be provided by the operating
system in some form or another (in Windows
it’s “EnterCriticalSection”, in *nix it’s
“pthread_mutex_lock”). When “lockMutex” is called,
the operating system checks if the lock is owned.

If not, the operating system marks it as owned and
lets the program continue. However, if it is owned,
the operating system will halt the current thread
(and let another thread run) until the process that
owns the lock calls “unlockMutex”. When writing a
multithreaded program, any time you edit a shared
variable (or read a shared variable that may also be
edited), you need to use a lock.

Deadlock
It should be obvious that whenever you take a lock,
at some point you need to return it and let other
people use the thing. What might not be obvious
is that we can cause threads to hang even if our
code will return the lock after we’re through. For
example, suppose you grab the key to the bathroom,
head in, and notice it’s out of toilet paper.

It’s better to find this out sooner rather than
later.

While you were doing that, someone else went to the
janitor’s closet, grabbed some toilet paper, and is
now going to get the key to the bathroom. You can’t
retrieve toilet paper (the other guy has the key to
the janitor’s closet) and the other guy can’t go
to the bathroom (you have the key). At this point,
neither of you can use the restroom.

The story of how I got fired.

This is called deadlock, and has a fairly simple
fix. If you order your locks (bathroom key, then
janitor’s closet key) and always grab all the locks
you need in order (get bathroom key before the
janitor’s closet key), then you won’t encounter
deadlock. Of course, in a big project with a large
number of locks that can be easier said than done.

Produce Consume
With locks, you can have two threads working at the
same time without corrupting each other’s state.
However, our threads have no way to let the operating
system know they have nothing to do. If a thread has
nothing to do, we don’t want the operating system to
run it (it will just waste time).
There are two ways we can go about doing this. One
is to signal the operating system that our thread
should not run for some amount of time: this is
typically used in video games and other programs with
a heavy emphasis on user interaction (if you need to
display new information every 14 milliseconds, but
finish your calculations in 7, you can sleep for 7
milliseconds while other programs/threads work).

An alternate approach is to suspend a thread
until another thread signals otherwise. There are
several approaches to this, but they are all used
for producer/consumer type workloads. If one thread
produces tasks that other threads need to run, those
other threads can’t do anything until the producer
thread has had a chance to make something.

A straightforward approach to threading is creating a
to-do list, and having a bunch of threads grab items
as they can.

Challenges
1) Look up the multithreading library for your
system, and identify how to create a thread, create
a lock, lock a lock, and unlock a lock.

2) See if you can write a program that starts two
threads, each of which adds one to a variable one
million times (1+1+1+1+...). Do you get 2 million as
the final answer?

3) Look up Amdahl’s law. What kinds of programs
would be well suited to multithreading? Which kinds
are poorly suited?

4) A mutex lock is a “synchronization primitive”:
there are other basic ideas for synchronizing
threads. Look up a few, and compare them.

Chapter 22

How to grammar

The goal of this chapter is to go over how to parse
text.

Why?
Suppose you’re writing a program that takes text
input from the user. Furthermore, suppose you want
that text to have a certain format (i.e. you need an
e-mail to send spam to).
Thing is, smart programmers have low opinions of
their users. If you ask a user for an e-mail, you
might get a phone number, social security number,
their mother’s maiden name, or anything but an
e-mail. Additionally, even good users can mistype
something. So, if you are taking input from a
user, you should probably check that it is formatted
correctly before doing anything with it. This is one
of the uses of text parsing libraries.
Another use of text parsing facilities is breaking
up a large text file. Suppose you have a large text
file (such as a program), and you want to break it
up into recognizable pieces (that’s a number, that’s
a name, that’s a keyword). You could code this up
by hand (we did in chapter 11); however, it can be
faster and easier to use an automated tool (called a
parser generator) to do this.

Finite Automata
The first thing we need to answer is how do we
actually recognize a language. The mechanics
actually depend on how complex the language is, but
for the simplest, most common problem (the regular
languages), the preferred mechanic is the finite
automata. A finite automata is a collection of
states, and a description of when to change state
(based on the text).

We’ll go over this, piece by piece.

We start in one state, and run through the text
letter-by-letter. For each letter, we change state
based on which transition matches that letter. At
the end of the text, if we’re in an accept state, the
text was valid.
This is one of those things that sounds harder than
it is; an example will clear it up. Let’s look at
the e-mail example: an email address is a collection
of letters and numbers, followed by an apetail (@),
followed by a collection of letters and numbers,
followed by at least one dot and a collection of
letters and numbers.

LETTERNUMS @ LETTERNUMS . LETTERNUMS
LETTERNUMS @ LETTERNUMS . LETTERNUMS . LETTERNUMS

Our finite automaton needs a start state: the state
that represents no input seen yet.

So far: “”

We also need a fail-state (if we see something
other than a letter, number, apetail or period).

Fail is an X: once a failure, always a failure.

We need to see at least one letter/number: if we
see an apetail before that, we fail.

So far: “LETTERNUMS”

If we see an apetail after (at least) one
letter/number, we can then start on the domain.

So far: “LETTERNUMS @”

Again, we need to see at least one letter/number.

So far: “LETTERNUMS @ LETTERNUMS”

We then need to see a period and at least one
letter/number.

So far: “LETTERNUMS @ LETTERNUMS . LETTERNUMS”

If we’ve seen the period and letter, we have a
valid e-mail. So, this last state is an accept
state: if the input ends while we’re in an accept
state, the text was valid (otherwise it is invalid).

Accept is a circle: acceptance can be revoked.

Finally, we need to allow multiple periods: we can
reuse some of the states for this.

If we see a period in the accept state, we need to
see some more letters.

If you don’t get how this works, trace out
the states for “imapseudonym@abc.qed.com” and
“yabbadabba”. The first should wind up in an accept
state, while the second should not.
This automaton can easily be used to test whether we
have an e-mail (Do we wind up in the accept state?).
In order to recognize and separate an e-mail from a
longer string of text, we can run the automata for
all the text, and pick the last time the automata was
in the accept state.

Run multiple automata to differentiate e-mails and
dollar amounts.

Regular Expressions
I said above that finite automata were used for
regular languages. Regular languages are languages
that can be recognized with a finite amount of
memory (the state you are in can be represented
with a single number). While they are tested using
finite automata, they are specified using regular
expressions. A regular expression is one of five
things:
1) Nothing (for those cases where silence is valid).
2) A single letter.
3) A sequence of regular expressions.
4) A choice between regular expressions.
5) A repeated regular expression.
As an example, a vowel is one of aeiou; each
individual letter is a regular expression (“a”,
“e”, “i”, “o”, and “u”), and a vowel is a regular
expression of type 4 (“a | e | i | o | u”). Single
letters are written as is, sequences are just written
one after the other (“ab” is a followed by b), choice
is done using the pipe (as in the vowel example), and
repetition is represented with an asterisk (“a*” is
zero or more as).
An alternate way to represent choices of single
characters is to use brackets, so a vowel could also
be written as “[aeiou]”; any character in a range is
written similarly (all lower case letters would be
written “[a-z]”).
Finite automata tend to be fairly difficult to
write, while regular expressions tend to be compact.
As an example, a regular expression for a letter or
number is

LN = [a-z] | [A-Z] | [0-9]

and a regular expression for an e-mail is

LN LN* @ LN LN* . LN LN* (. LN LN*)*

(in an actual implementation, you’d have to
escape the period). There are ways to turn a
regular expression into a finite automaton: I’m
not covering them here (they require covering
nondeterministic automata). However, many
programming languages provide libraries for testing
whether a string matches a regular expression: you
have a choice of using these libraries or writing
subroutines for every type of thing you want to
test (“testemail(string) and testaddress(string) and
testnumber(string)” vs “test(string, regex)”).

Grammar
I mentioned that regular expressions and finite
automata are for “regular” languages. A natural
question to ask is what other kinds of languages you
might have. Answering that question requires being
able to talk about the structure of that language;
the structure of a language is called its grammar.
A formal grammar deals with terminals and
nonterminals: the terminals are the “words”
(the actual text that gets written), while the
nonterminals are groupings (clauses and the like).
A grammar is a collection of rules stating what
nonterminals can contain. As an example, in English,
a sentence is a subject, a verb phrase, and an
optional object.

Sentence -> Subject VerbPhrase
Sentence -> Subject VerbPhrase Object

If you’ve ever diagrammed a sentence, you wound
up with a tree: the clauses and phrases are the
nonterminals, while the words themselves are the
terminals.

You have any idea how long it’s been since I’ve made
a parse tree for English?

The grammar for a language tells you what that tree
might look like. The structure of that grammar also
dictates how complex that language is, and how hard
it is to parse. For example, in a regular language,
every rule turns a nonterminal into a terminal
(required) followed by a nonterminal (optional).

A -> b C

If all your rules look like this, then when parsing
you only have one nonterminal to deal with (finite
memory). For instance, the e-mail example could be
written

Email -> letnum SawLN
SawLN -> letnum SawLN
SawLN -> apetail SawApe
SawApe -> letnum SawALN
SawALN -> letnum SawALN
SawALN -> period SawAP
SawAP -> letnum SawAPLN
SawAPLN -> letnum SawAPLN
SawAPLN -> period SawAP
SawAPLN ->

Each of the nonterminals corresponds roughly
to a state (and the accept state corresponds to a
nonterminal which can expand to nothing). If we
take an email and diagram it, we wind up with the
following parse tree (in actuality, a list).

A terminal may be either a letter or a word/token
(pick whichever definition is convenient).

A step up in complexity is a grammar where
each nonterminal expands into some combination of
terminals and nonterminals: these are the context
free languages. Most programming languages fall
in this category (at least, they do with only
minor modifications). As an example, an arithmetic
statement (Expr) might have the following grammar.

Expr -> Term
Expr -> Term + Expr
Expr -> Term - Expr
Term -> Factor
Term -> Factor * Term
Term -> Factor / Term
Factor -> integer
Factor -> ( Expr )

This specifies what a valid arithmetic expression
is: it also describes how to group subexpressions.
For instance, if we parse “5 + 7*(42-9)”, we get the
following parse tree.

Coding this up yourself can take time: automated
tools are helpful.

This parse tree correctly orders the operations
(multiply before add), which falls out of the
structure of our grammar. There exist tools (called
parser generators) which take a description of a
context free grammar (or a context free grammar with
additional restrictions) and produce code to organize
the terminals into a parse tree. These tools are
used as the first step of compilers and interpreters
(go review chapter 11).

The next step up in complexity is the context
sensitive grammar, where the allowed groupings depend
on context. In other words, each rule has context
surrounding the nonterminal to expand (and that
context is preserved). An example could be how, in
English, a number specification (2) forces you to use
the plural form of the noun (“two men” rather than
“two man”).

two Noun -> two pluralnoun

If you were clever, you might be able to write
this in a context free way: however, there are
constructions where no amount of cleverness will make
it context free (the canonical example is requiring
equal numbers of three terminals, “aⁿbⁿcⁿ”).
The final step up is where the rules don’t preserve
the context: this is an unrestricted grammar (and no
longer produces a parse tree).

Challenges
1) Any serious programming language has a regular
expression library. Find one for your preferred
language, and figure out how to make it test whether
a string matches a regular expression.

2) See if you can write a regular expression to match
integers. Test it on “42”, “-130”, and “49.5”.

3) C has a parser generator in the form of lex
and yacc. Look up the manual for them (or another
compiler-compiler if you prefer) and use them to
parse a text file of your choosing.

4) See if you can design a finite automaton that is
equivalent to “[Ff]u*(el | ck)”. Describe the words
this will match.

Chapter 23

How to lengua

The goal of this chapter is to cover common
programming language philosophies.

High vs Low Level
When choosing (or designing) a language, there are
a few things to keep in mind. First and foremost,
every language has some set of primitives that it
is well suited to handle. As a few examples, C’s
major primitive is the pointer, Java’s is the class,
Fortran’s is the matrix and LISP’s is the linked
list. Each language provides utilities custom
tailored for those primitives: working with those
primitives is made easier, while other primitives are
comparatively clunky.
At the end of the day, every program runs through
assembly: a natural question to ask is whether
the programming language we’re using mirrors the
assembly. By now, it should be obvious that C is
within spitting distance of assembly (there’s a
reason I used it to mark up the assembly examples).
On the other hand, LISP is fairly far removed from
assembly: LISP treats every operation as a function
call, and its basic primitive (the linked list cell)
is a complex structure.
This is not to say LISP is a bad language (it’s
actually halfway decent), just that its primitive
operations don’t match those of the hardware.
This leads to one of the main splits in language
design: is the language low level (i.e. matches the
assembly code) or high level (uses its own, complex
primitives).
One of the helpful things with high level languages
is garbage collection. If you remember, in C,
whenever we allocated memory (malloc), we had to
make sure we returned it when we were done with
it (free). If your main primitive is the pointer,
that’s about the best you can do (the programmer
can do absolutely strange things and still have a
valid program). However, if your main primitive
is a structure or class, it becomes possible for
the compiler/interpreter to manage the frees
automatically.
One way to do this is to count the number of
pointers pointing at any given structure. If nobody
knows where the structure is, nobody can use that
structure, so it’s useless memory (that can be
returned).

There are no pointers to the floats, we should be
able to reuse that memory.
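One way to implement this counting in C looks roughly like the following (a sketch only; all the names here are made up):

```c
#include <stdlib.h>

/* A reference-counted box around a double (illustrative layout). */
typedef struct{
    int refCount;
    double payload;
} RCFloat;

/* Allocate a box; the creator holds the first reference. */
RCFloat* rcNew(double val){
    RCFloat* ret = malloc(sizeof(RCFloat));
    ret->refCount = 1;
    ret->payload = val;
    return ret;
}

/* Another pointer now refers to the box: bump the count. */
void rcRetain(RCFloat* box){
    box->refCount++;
}

/* A pointer has let go: if nobody is left, return the memory. */
void rcRelease(RCFloat* box){
    box->refCount--;
    if(box->refCount == 0){
        free(box);
    }
}
```

A garbage-collected language generates the retain/release calls for you; in C you would have to sprinkle them in by hand (and forgetting one is exactly the kind of bug garbage collection exists to prevent).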

This is one of the advantages of a high level
language: the compiler/interpreter can do more of
the tedious things for you automatically, while a low
level language forces you to do them yourself (the
lowest level, assembly, forces you to do everything
yourself). On the other hand, if you don’t like how

314
the high level language does things, you’re stuck
with it, while a low level language would let you do
what you want.

Static vs Dynamic
The next big split is between static and dynamic
languages. A static language is one where everything
is known at compile time: while this cannot be done
(if we knew everything, we would know the answer and
wouldn’t need to run the program), some languages
get closer than others. In particular, types are an
important aspect.
One of the best examples of a statically typed
language is C. If you remember, in C we have to
specify the types of the variables in our program.
We can’t have a variable named “x”, we need to
have an int variable named “x”. When we created
a subroutine, we had to say what type of thing is
returned.
On the other hand, a dynamically typed language
is one where type information is left unknown until
the program is run. As an example, LISP will let you
declare a subroutine as

(defun square (x) (* x x))

This example squares a value (multiplies it with
itself). If you look closely, you will notice a lack
of type specifiers. Until you run this program and
call the square subroutine, the LISP interpreter
does not know what x is. At best, it knows x has
to be something you can multiply (and that requires a
relatively complex analysis).
One advantage of a dynamic language is that there
is less for the programmer to write: the same
function in C would be

double square(double x){
    return x*x;
}

Now, this is a simple example, but add in multiple
functions with intermediate variables, and it adds
up. Another advantage of dynamic languages is that
they’re more flexible: in LISP we can call square
with an integer, real or even a rational and the
subroutine will run and give us the appropriate
value with an appropriate type. In C, the subroutine
expects a double, and is giving us a double.
But, there are a few drawbacks to this. The first
is that everything needs to carry type information
with it. So, in C, an integer is a single word
(i.e. 4 bytes on a 32 bit system). However, in
LISP an integer is, at a minimum, two words (a type
specifier and the payload). In languages that add a
lot of information to types (Python), this can bloat
a large-but-manageable set of data into an unusable
mess.

Python needs a lot more boilerplate than C.
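To make that overhead concrete, here is one way a dynamic language might lay out a value if we wrote it in C (this layout is illustrative only; real interpreters differ in the details):

```c
/* A dynamically typed value: a tag saying what it is,
   plus the payload itself. Even when the payload is a
   one-word int, the tag (and any padding) tags along. */
typedef struct{
    int tag; /* 0 = integer, 1 = double */
    union{
        int asInt;
        double asDouble;
    } payload;
} TaggedValue;
```

Every value in the program carries the tag, which is where the extra words (and the bloat) come from.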

The second drawback is that, when the subroutine
runs, it needs to check the type. So, instead of
just multiplying, the LISP code might become (if we
were going to C)

void* square(void* x){
    int* typePtr = (int*)x;
    //skip past the type tag to the payload
    void* valPtr = (void*)(typePtr + 1);
    int xType = *typePtr;
    if(xType == 0){
        int val = *((int*)valPtr);
        int res = val * val;
        //and package result
    }
    else if(xType == 1){
        double val = *((double*)valPtr);
        double res = val * val;
        //and package result
    }
    //...
}

I don’t care how you slice it, this is going to be
slower (if you are VERY clever, you might get away
with simply increasing the file size).
A final drawback is that it’s easy to forget
what a function expects. If you’re writing a 100
line program, this isn’t a big issue, but if you’re
writing a 100000 line program, you will forget what
particular flavor of drug you were taking when you
originally wrote your code. Baking type information
into the program acts as a safeguard against the
ravages of time and bad memory.

Being buzzed has been shown to increase coding
productivity.

Compiled vs Interpreted
A third major split is between compiling or
interpreting. As a reminder, a compiler reads in a
program in one language and produces an executable:
you send that executable to someone else, and their
processor runs the machine code. On the other hand,
an interpreter takes the program (in its original
language) and “simulates” the effects of the code.
In point of fact, every language can be either
compiled or interpreted. However, some languages
lend themselves to one option over the other.
More dynamic languages tend to be better suited
to interpreting (it’s easier to write flexible
interpreting code than to bake that flexibility
into the compiler, and there’s less of an advantage
to compiling in the first place), while static
languages tend to be better suited to compilation.
Additionally, low level languages tend to be easier
to write compilers for (since the differences between
the source and the assembly are smaller).
An obvious advantage of compilation is speed:
having the CPU run its own flavor of assembly is
always going to be faster than running a simulation
of a program. Additionally, compiling your code
allows you to send the executable to your customers,
rather than your source. If you’re trying to keep
creative control over a project, this is useful.
Interpreting requires that the person who wants
to run the code has the simulator program on their
machine. This requirement ranges from trivial (when
you have control over the computers in question)
to impossible (when J.Q. Public is trying to run
your program). However, interpreting has a few
advantages. One is a short turnaround time: being
able to quickly see the effects of changes can be
helpful when debugging a piece of code. It’s also
easier to get error information from an interpreter.
The biggest advantage, however, is scripting. As
an example, let’s say you’re writing a program that
manages a large collection of files on a hard drive:
part of a System to Operate (there’s got to be a name
for this sort of program). There are some basic
operations that everyone is going to need (moving,
copying, deleting). However, you can’t anticipate
everyone’s needs: some strange person is going to
need to delete every file beginning with “BOB”.

Breaking up is hard to do.

This is where scripting comes into play: your main


program is compiled, but includes an interpreter.
The user can provide their own code for your
interpreter to run, which allows the user to add code
for their own requirements. Windows accomplishes
this with batch files, while *nix (Linux and Mac)
allow you to write shell scripts.
There are other programs that use this same basic
paradigm: one of the biggest is a web browser (HTML
is used to define a webpage, JavaScript defines how
it reacts to the user, and the browser interprets
both to draw a pretty picture).
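The core of any such embedded interpreter is a loop that walks the user’s script and simulates each command. A toy sketch in C (the “language” here is invented purely for illustration):

```c
#include <ctype.h>

/* A tiny interpreter for a made-up script language:
   each digit adds its value to a running total,
   'z' resets the total, everything else is ignored. */
int runScript(const char* script){
    int total = 0;
    for(; *script; script++){
        if(isdigit((unsigned char)*script)){
            total += *script - '0';
        }
        else if(*script == 'z'){
            total = 0;
        }
        /* unknown characters: skipped */
    }
    return total;
}
```

The host program is compiled, but the script it feeds to runScript can change every time it runs; that flexibility is the whole point of scripting.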

Assorted Weirdness
The three splits above are some of the biggest
considerations: if you’re selecting a language for
programming, then those are the items you really
have to watch. However, there are some other things
to watch: many of these revolve around the main
primitive. For example, many programmers will swear
by functional programming languages (where the main
primitive is the function): your mileage may vary.
In addition to programming languages, there are
some domain specific languages tailored to solve a
particular problem. Some major examples are hardware
description languages (used to describe connections
between gates, such as VHDL and Verilog), data
storage languages (used to organize data, such as
XML or JSON) and database languages (used to extract
information from a database, such as SQL).

Challenges
1) Go look up three programming languages. Write the
same simple program in each (if you need inspiration,
write bubble sort), and compare and contrast them.

2) Look up the scripting language baked into your
Operating System (batch files or shell scripts) and
figure out how to move every file beginning with
“BOB” to a different folder.

3) If you get deep into shell scripting, you should
keep good backups (you should backup your files
anyway, but it’s a real concern for programmers). If
you have a Windows system, look up what the command
“rmdir /S /Q C:\” does; if you have a *nix system,
do the same for “rm -rf /”. Can you explain why you
shouldn’t run code you don’t understand/trust?

4) An interpreter is a simulator of code. There’s
no reason you can’t simulate assembly. Look up Java,
and explain how it works. Why might the designers of
Java have chosen this design?

Chapter 24

How to big

The goal of this chapter is to build a Big project.

Planning
When you start any big project, the first thing you
must do is plan. There are many different approaches
to this: some people plan things out to the n-th
degree, leaving no part of the project up to fate.
Other people’s plans consist only of vague goals,
relying on grit to make everything work together.
This reflects different skillsets: some people
are good at planning, while some people are good at
winging it. I can’t tell you your talents.
What little I can do is tell you how I do things.
I’ve put my 10000+ hours into writing code: as a
result, I am more tolerant of “winging it” than
someone new to programming. However, I also
like building code to last: that requires good
interfaces, which requires good planning.
When starting a new project, the first thing I
do is sit down and figure out the main “actors”
in the program: the big pieces of data and logic.
This allows me to make a (simple) class diagram: a
description of the data structures in your program
(and how they relate). For instance, a video game
might have a controller loop, and types for each
of the main entities in the game (player, enemy,
pickup).

There is a standard for this (UML), but if you NEED
the details, you’re doing it wrong.

Each of these constitutes a class: the next
step is to create shells for every class. Instead
of writing each class fully formed, I specify the
signatures of the methods, and note which methods
I’ve finished and which need work. Starting this
way means that any major issues with your plan jump
out at you (i.e. you originally wrote draw with zero
arguments, but it’s hard to render without a screen
to render to).
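In C, a shell for the hypothetical player type might look like this (all names invented; the bodies are stubs to fill in later):

```c
/* Shell for a hypothetical video game Player. */
struct Screen; /* the drawing target, defined elsewhere */

typedef struct{
    int x;
    int y;
    int health;
} Player;

/* TODO: read input and move the player. */
void playerUpdate(Player* play){
    (void)play; /* stub: does nothing yet */
}

/* TODO: render the player. Note the screen argument we
   almost forgot: this is where planning problems jump out. */
void playerDraw(Player* play, struct Screen* where){
    (void)play;  /* stub: does nothing yet */
    (void)where;
}
```

The signatures are settled up front, so the rest of the program can be written against them while the bodies are still empty.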
It helps to start with the minimal set of classes
you need: for a video game, the minimum set would be
the controller, player and environment. If you get
those working, you can add the others later, but if
those don’t work, you won’t be able to test anything.
Once you have your shells, do the absolute minimum
you need to get a working program: I’ve heard this
phrased as “debugging Hello World”. Once you have
that minimum, fill in your methods top to bottom
to add functionality. In the video game example, I
would probably start with the controller, then add
the environment (with a dummy player), then fill in
the player code.

The minigun suggests Quake, the headcrab zombie
suggests Half-Life.

Doing things this way means that, if you ever have
anything go wrong, there’s less code to go over. If
you fill in every class, then run and get an error,
your error could be in any class. However, if you
fill in two classes, run, and get an error, there are
only two classes your error could be in.
Over time, you will build up experience; you will
figure out what’s important, what’s not important,
and most importantly, what’s important to you (this
is important). You will figure out your own way of
doing things; however, I have a few pieces of advice
that might smooth out the process.

Documentation
Good documentation will cover a lot of sin. Every
programming language provides some way to add
documentation: comments that the compiler ignores,
but allow you to add information for the hapless
programmer. Writing good comments will allow you
to write a piece of code, let it sit for years
(or even decades), and come back to it and have a
chance of figuring out what you were thinking. Good
documentation will also help you play well with
others: as an example, look at the following code.

void sp(int* x, int* y){
    *x = *x ^ *y;
    *y = *x ^ *y;
    *x = *x ^ *y;
}

If you’ve never seen this trick before, you will
have NO idea what this code does. But, if the person
who wrote it properly documented it:

//This swaps two values.
//@param v1 The location of the first value.
//  On return, will hold *v2.
//@param v2 The location of the second value.
//  On return, will hold *v1.
void swap(int* v1, int* v2){
    *v1 = *v1 ^ *v2;
    *v2 = *v1 ^ *v2;
    *v1 = *v1 ^ *v2;
}

you will know (or have some idea) what the subroutine
does, even if you don’t know how it works. Proper
documentation comes in two parts. The first is by
adding comments (in C, this is any line beginning
with “//”). Comments allow you to add text that
the compiler ignores. Exactly how you add comments
depends on your own style.
I write short subroutines (less than one page per
subroutine), so I tend to not put comments inside
subroutines. For every subroutine, I add comments
explaining what the subroutine does, what each of
the arguments is (“@param”), and what it returns
(“@return”). I like the javadoc format (the first
production language I learned was Java), so that’s
the format I use. I also explain any weird error
cases that might show up (“@throws”, for reasons that
will soon become apparent).
The first part of proper documentation is good
comments. The second (and more important) part is
good naming. Look again at the two examples: in
one, the subroutine is called “sp”, in the other,
it’s called “swap”. If you saw a random call to
sp (“sp(&a, &b)”), you would have to look up the
function. But, if you saw a random call to swap
(“swap(&val1, &val2)”), you could guess what it did.
When you write code, use intelligent names: typing
a longer name now will save you more time in the long
run.

Flavors of Error
If your program has a problem (and it does), there
are three times at which you can spot it. The
first is at compile time: you’ve done something
so strange the compiler can spot it for you. This
shows up if you’ve broken the grammar of the language
(i.e. you’ve misspelt a keyword, or haven’t matched
parentheses). This also shows up if you try to use
a variable that doesn’t exist, or if you mess up the
types (trying to set an integer to text).

int calcTax(int cost){
    //text is not an integer
    //the compiler will complain
    return "qed";
}

The second time is when you run the thing. The
classic example is dividing by zero: the compiler
cannot spot that this will be a problem (unless
you’ve been very dumb and the compiler is very
smart), but most processors will complain. Exactly
how this error gets spotted depends on the language:
Java will add a large number of checks in the code
itself (which means Java can provide useful error
messages) while C relies on the processor’s error
checking (which means the Operating System gives you
some flavor of fault and kills the program).

int calcTax(int cost){
    //valid code
    //but the processor will choke
    return cost / 0;
}

The third time is after a run. In this case,
you compiled the program, ran the program, and
the program told you that 10% of 137 was 42. Your
code had valid operations, it didn’t break any
restrictions of the computer, yet your code is still
wrong (in this case, you multiplied by 0.31 instead
of 0.1).

int calcTax(int cost){
    //31% tax is nuts
    return (int)(cost * 0.31);
}

These errors are called, respectively,
compile-time, run-time and logic errors. For any
given error, the earlier it gets detected, the easier
it is to solve. This is a very good reason to try to
turn logic errors into run-time errors, and run-time
errors into compile-time errors. This is also why I
prefer statically typed languages: if the compiler
knows the types, it can figure out when you’ve messed
them up (turning a run-time error into a compile-time
error).
This is also why C can have some nasty bugs:
things that should probably be run-time errors
(messed up pointers) are treated as logic errors
(provided the processor and Operating System don’t
catch it). Of course, the reason C doesn’t add those
checks is illuminating: performing error checking
takes time. There exist tools that you can use to
add those memory checks to a program: those tools
run VERY slow.

Exceptions
As much as possible, you want to turn logic errors
into run-time errors: with only the tools I’ve laid
out for you so far, that’s not an easy task. For
instance, suppose you’re writing a program to grade
essays. The easiest way to grade an essay is to see
how long it is.

It’s not the length, it’s how you use it.

To do this, you will need to open the file
containing the essay and read it in byte by byte
(counting the bytes until you hit the end). Now,
let’s say the user of your program wants you to
grade a file named “goodessay.txt”, but the only file
on the system is “badessay.txt”. The file you’re
looking for does not exist: it is an error to try
to open it (you can’t squeeze blood from a stone, for
many reasons).
How do you (the programmer) check for errors? One
option is to have the function in question return an
invalid value if there is an error: as an example,
C’s fopen returns null if there is a problem opening
the file. However, your function might not have a
reasonable invalid value (if you’re reading from a
file, all 256 possibilities for the bytes are valid).
In this case, you can add an extra return (using a
pointer).

FILE* myopen(const char* name, int* errRet){
    *errRet = 0;
    ...
    if(problem){
        *errRet = -1;
    }
    ...
}

And, after we call the function, we can check
whether there was an error.

int fopenErr;
FILE* curRes = myopen("filenam", &fopenErr);
if(fopenErr){
    ...
}

This works, and is a typical option in C. However,
it’s very easy to screw up: every time you call a
function that can fail, you need to add code to check
whether there was an error (and react to it). This
doesn’t sound like an easy task, especially when
you remember the number of functions that can fail
(malloc being a big one).
The big problem with error returns is that you can
ignore them. What we really need is something we
can’t ignore.

Dodge, duck, dip, dive and dodge.

This is where exceptions come in: an exception
is like an extra return, but the compiler handles
it automatically. When you throw an exception, the
current block of code stops running. If you did not
add any code to handle an exception, the exception
keeps killing code blocks until it either hits
handling code or stops the entire program. In C++,
you throw an exception with the keyword “throw” and a
value (that contains extra information to, hopefully,
tell you what exactly went wrong).

const char* value = "We have a problem.";
throw value;

If you don’t add code to handle an exception, this
will kill the program. If you want to try to handle
an exception (for example, you tried to open a file
that doesn’t exist, so you want to ask the user to
give you a real file and retry), you use something
called try/catch.

try{
    FILE* fp = myopen(filenam);
}
catch(const char* errMess){
    printf("Give me a file that exists, stupid.");
}

If anything in the try block (myopen) throws an
exception that we can handle (char*), then instead of
killing the program, the code in the catch block is
run.

Need for Speed
One last thing to bear in mind before starting on
your own large projects: speed. First things first:
if you are running your code on small problems, who
cares. Spending $100000 on a race car is pointless
if you’re only going ten feet.

For fun on a long trip, if your passengers fall
asleep, park in front of a brick wall, scream and
wail on the horn.

If you’re working on large problems, the first
thing you should do is pick decent algorithms. If
you have an algorithm that is O(N^2) (or worse),
see if you can bring that down to something more
reasonable (O(N log(N)) or O(N)). It’s helpful if
you do this at the planning stage: it’s also helpful
to write your code so that it’s easy to change the
algorithm you use (use classes wisely).
If you’ve picked good algorithms and are still
having trouble, the next thing to do is figure out
what, exactly, is running so slow. “You cannot
manage what you cannot measure”: there are programs
(called “profilers”) that will measure how much time
is spent in each subroutine. Use them to figure out
where your problems are.
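If you don’t have a profiler on hand, you can fake a crude one with the standard clock() subroutine (busyWork here is a made-up stand-in for your slow code):

```c
#include <time.h>

/* Time a single subroutine: a poor man's profiler.
   A real profiler does this (and more) for every subroutine. */
double secondsToRun(void (*subroutine)(void)){
    clock_t start = clock();
    subroutine();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Something slow to measure (volatile keeps the compiler
   from optimizing the loop away). */
void busyWork(void){
    volatile long total = 0;
    long i;
    for(i = 0; i < 1000000; i++){
        total += i;
    }
}
```

Timing a subroutine before and after a change tells you whether your “optimization” actually helped.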
Once you know where your problem areas are, you
need to figure out how to make them faster. This
depends on your choice of system, language and
compiler: for instance, some compilers are smart
enough to take constant calculations out of a loop.

while(i<7){
    double tv = sqrt(a)-cos(b);
    printf("%f", i*tv);
    i++;
}

Our code is written to calculate “tv” every
iteration, but it could be taken out of the loop and
calculated once. Some compilers are smart enough to
do this, others make us do it ourselves.

double tv = sqrt(a)-cos(b);
while(i<7){
    printf("%f", i*tv);
    i++;
}

As far as I’m concerned, these kinds of
optimization are black magic. Luckily, they don’t
come up often: remember, the processor runs at
several billion cycles per second. A little bit of
waste is easily tolerated (provided you haven’t done
anything excessively stupid).

Challenges
1) There are multiple types of UML diagram. Look
up how a class UML diagram works, pick a problem you
want to work on, and break it up into classes.

2) There are tools that will scan for specially
formatted comments in your code and automatically
generate documentation pages (for instance, Java
has javadoc built in). Look up such a tool for your
preferred programming language, and figure out how to
make it work.

3) In C++, you can throw anything as an exception.
However, there are a few standard exceptions you can
throw. Look up what they are, and how to go about
making your own.

4) The swap subroutine in the documentation section
actually does swap two values. See if you can
explain how. Why might you use this technique,
rather than using temporary storage space?

void swap(int* v1, int* v2){
    int tmp = *v1;
    *v1 = *v2;
    *v2 = tmp;
}

Afterword

So, you’ve read the book. If you enjoyed the
exercise, I have a few book suggestions for further
study. However, the best way to get good at this
stuff is to work on a project: go grab a sandwich,
pick a project, and make it happen. CHARGE!

Dr. Benjamin Crysup fixing a computer

If you did not enjoy the exercise, take heart.
While this stuff isn’t for everyone, you now know at
least enough to not be swindled by charlatans.

For the specifics about how transistors are made,
check out Silicon VLSI Technology: Fundamentals,
Practice and Modeling (by James Plummer, published by
Pearson Education, 2009).
If you want to go more in depth on this subject
(i.e. actually make a computer, rather than just
think about it), check out The Elements of Computing
Systems: Building a Modern Computer from First
Principles (by Noam Nisan and Shimon Schocken,
published by MIT Press, 2008).
If you want to learn more about C, you’d be hard
pressed to beat the original: check out The C
Programming Language (by Brian Kernighan and Dennis
Ritchie, published by Prentice Hall, 1978).
There is a lot of stuff in C++ I skipped (partially
because I have my... issues with the philosophy
behind those features). However, if you’re going
to properly learn the language, going to the source
is not a bad idea: you might check out Programming:
Principles and Practice Using C++ (by Bjarne
Stroustrup, published by Addison-Wesley Professional,
2014).
For most other languages, I’ve done most of my
learning online. However, for Java, I can suggest
Learning Java (by Patrick Niemeyer and Daniel Leuck,
published by O’Reilly Media, 2013).
For some more details on common algorithms and
data structures, Donald Knuth’s “The Art of Computer
Programming” is worth cracking open (provided
he ever finishes the damn thing). Published by
Addison-Wesley Professional, 2011.
If your algebra, geometry or trigonometry are weak,
you might take a look at Precalculus Mathematics in
a Nutshell (by George F. Simmons, published by Wipf &
Stock Publishers, 2003).
A surprisingly handy math skill to have in
computer science is discrete mathematics: the book
I was taught from was Discrete and Combinatorial
Mathematics (by Ralph Grimaldi; published by Pearson
Addison Wesley, 2004).
For more on computational mathematics, Numerical
Recipes: The Art of Scientific Computing is a decent
choice (by William Press, Saul Teukolsky, William
Vetterling and Brian Flannery; published by Cambridge
University Press, 2007).
More information on how networking works can be
found in Computer Networking: A Top-Down Approach
(by Jim Kurose and Keith Ross; published by Pearson,
2016).
The Essential Guide to User Interface Design: An
Introduction to GUI Design Principles and Techniques
(Wilbert Galitz, Published by Wiley, 2007) is an
excellent compendium of information.
A large part of my approach to large projects
comes from the “Xtreme Programming” crowd. My
introduction to this philosophy was Planning
Extreme Programming (by Kent Beck and Martin Fowler,
published Addison-Wesley Professional, 2000).
For learning, video games tend to be a good project
choice because they require the use of almost all the
various subsystems of a computer. For information
on the programming of video games, you might check
out Game Coding Complete (by Mike McShaffry and David
Graham; published by Cengage Learning PTR, 2012).
For a good laugh, check out XKCD (xkcd.com, by
Randall Munroe). His Thing Explainer book is also
an intriguing time (published by Houghton Mifflin
Harcourt, 2015).

