
Before we explain what a distributed system actually is, let's define a fundamental, age-old problem that pervades not only computer science and blockchain technology, but all of humanity.
How does a group make a decision?
Whether this be by majority opinion, general agreement, or force, how do we reach consensus?
Consensus is trivial when we only have one actor.
I can always agree with myself on where to have lunch, but when I go out with my
friends,
we have to all agree on where to go first, and the process by which we reach
agreement
might be difficult.
In this example, my group of friends had to come to consensus on a common choice of
action
– what to get for lunch – before we were able to move forward – to actually get
lunch.
This is no different from how a distributed system works, and is where we start
building
our intuition and context.
Consensus has been studied for ages in fields such as biophysics, ethics, and
philosophy,
but the formal study of consensus in computer science didn’t start until the 70s
and 80s,
when people decided that it would be a good idea to put computers on airplanes.
The airline industry wanted computers to be able to assist in flying and monitoring
aircraft
systems: this included monitoring altitude, speed, and fuel, as well as processes
such
as fly-by-wire and autopilot later on.
This was a huge challenge because being at such a high altitude poses many threats
to
normal execution of computer programs.
For one, it’s a very adversarial environment.
Being so high up means that the atmosphere is thinner, increasing the chance of a
bit
flip due to solar radiation.
And all it takes is a single bit flip to completely disrupt the normal execution of, for example, the software reading the sensor that measures the amount of fuel an aircraft has left.
Compounding this was the fact that aircraft could cost hundreds of millions of US dollars, and that the commercial airliners where these computers were going to be deployed could carry hundreds of passengers.
So, super dependable computer systems were first pioneered by aircraft
manufacturers.
They realized that the problem they faced could be solved by introducing redundancy into their systems.
Instead of using a single computer onboard their aircraft, thus having a single
point
of failure, they used multiple computers onboard to distribute the points of
failure.
How these computers coordinated amongst each other though was another challenge.
Early literature had focused on enabling coordination of processes – where these
processes could
be processes on a CPU, or computers in a network, separated spatially.
One of the most impactful pieces of literature during this time was “Time, Clocks,
and
the Ordering of Events in a Distributed System”, written by computer scientist and
mathematician
Leslie Lamport in the late 70s.
In “Time, Clocks, and the Ordering of Events in a Distributed System”, Lamport
shows
that two events occurring at separate physical times can be concurrent, so long as
they don’t
affect one another.
Much of the paper is spent defining causality – what it means for an event to
happen before
another – in both the logical and physical sense.
This is important because determining the order of when events take place, such as
the
measurement of a sensor or detection of error and subsequent error correction – as
well
as determining which events actually took place in the first place – is crucial to
the correct functioning of a distributed system.
On the right, you can see one of Lamport’s diagrams, depicting three processes –
the
vertical lines – each with their own set of events – the points on these lines.
Time flows in the upwards direction, and each squiggly line between events
represents a
message being sent, and received at a later time.
If there exists a path from one event to another, by only traveling upwards in the
diagram,
then that means that one event happened before the other.
If an event doesn’t happen either before or after another event, then it’s said
that
those events are concurrent.
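To make the happened-before relation a bit more concrete, here is a minimal Python sketch of the logical clock rule Lamport introduces in the paper: each process bumps a counter on every local event, stamps outgoing messages with that counter, and on receipt advances its own counter past the timestamp it received. The process names and the sequence of events below are hypothetical, chosen only to mirror the message-passing picture in the diagram.

```python
# Minimal sketch of Lamport logical clocks (illustrative only).
# Timestamps respect the happened-before relation: if event a happened
# before event b, then timestamp(a) < timestamp(b). The converse does
# not hold, since distinct events can be concurrent.

class Process:
    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self, label):
        self.clock += 1
        print(f"{self.name}: {label} @ logical time {self.clock}")

    def send(self, label):
        self.clock += 1
        print(f"{self.name}: send {label} @ logical time {self.clock}")
        return self.clock  # the timestamp travels with the message

    def receive(self, label, msg_timestamp):
        # Advance past both our own clock and the sender's timestamp.
        self.clock = max(self.clock, msg_timestamp) + 1
        print(f"{self.name}: recv {label} @ logical time {self.clock}")


# Hypothetical run with two processes, P and Q.
p, q = Process("P"), Process("Q")
p.local_event("read sensor")         # P's first event
t = p.send("sensor value to Q")      # happened before the receive below
q.local_event("independent work")    # concurrent with both of P's events
q.receive("sensor value from P", t)  # happened after P's send
```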
For those of you with experience in quantum physics, you may notice the resemblance
between
Lamport’s diagrams, and Feynman diagrams, which show the interaction of subatomic
particles.
Lamport realized that the notion of causality in distributed systems was analogous
to that
in special relativity.
In both, there is no notion of a total ordering of events – events may appear to happen at different times to different observers, in the case of relativity, or to different processes, in the case of distributed systems.
While this is at a depth that is out of scope for this course, it’s important to
recognize
that through the efforts of Lamport and other scientists, the formal study of
distributed
systems began to take shape.
And as it turns out, the same problem that was originally studied to coordinate computers on commercial airliners is still studied today, for example on more high-tech jet planes, and more recently on various spacecraft, such as SpaceX's famous Falcon 9 or Dragon spacecraft.
For example, spacecraft have to be tolerant of the violent vibrations when accelerating through Earth's atmosphere, and when they do leave the atmosphere, they have to deal with intense heat and cold, depending on which side of Earth they're on, as well as solar radiation.
SpaceX Dragon specifically uses three flight computers, which perform calculations independently and reboot automatically if errors are found after cross-checking.
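That cross-checking idea can be sketched in a few lines of Python. This is a toy majority vote over three independently computed results, not SpaceX's actual flight software; the function name, the example readings, and the notion of flagging a disagreeing computer for a reboot are assumptions made purely for illustration.

```python
from collections import Counter

def vote(results):
    """Toy majority vote over three redundant computations.

    Returns the value at least two computers agree on, plus the indices
    of any computers whose output disagrees (candidates for a reboot).
    """
    counts = Counter(results)
    value, agreeing = counts.most_common(1)[0]
    if agreeing < 2:
        raise RuntimeError("no majority: all three computers disagree")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters

# Hypothetical fuel readings from three flight computers; one has a flipped bit.
accepted, to_reboot = vote([4212, 4212, 4213])
print(accepted)    # 4212
print(to_reboot)   # [2]
```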
Distributed systems and consensus are also studied in the context of big enterprise operations.
Distributed lock servers for example, ensure that no two processes can read or
write to
the same piece of data at the same time – a problem called mutual exclusion –
thereby
preventing potential corruption to important data.
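To build intuition for mutual exclusion, the sketch below uses a single in-process lock to guard a read-modify-write on a shared value; a real distributed lock server provides the same guarantee across machines and over a network, but the code here is a deliberately simplified stand-in, not the interface of any particular lock service.

```python
import threading

# Toy illustration of mutual exclusion within one process: the lock ensures
# that only one thread at a time can read-modify-write the shared balance,
# preventing lost updates. A distributed lock server gives the same guarantee
# to processes running on different machines.
lock = threading.Lock()
balance = 0

def deposit(amount):
    global balance
    with lock:                       # only one thread may hold the lock at a time
        current = balance            # read
        balance = current + amount   # modify and write back, safe from interleaving

threads = [threading.Thread(target=deposit, args=(1,)) for _ in range(1000)]
for t in threads: t.start()
for t in threads: t.join()
print(balance)  # always 1000 with the lock; without it, updates can be lost
```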
And finally of course – the main focus of this course – the blockchain and
distributed
ledger revolution.
Fundamentally, each of the problems we just went over, and more, reduces to the problem of consensus.
In vehicles like rockets, jet planes, and commercial airliners, a number of redundant onboard computers must come to consensus on sensor data – for example, the position and altitude of the aircraft, as well as its fuel levels.
In enterprise distributed lock servers, processes must come to consensus on who can write what data at what time, as this coordination prevents data loss and corruption.
And finally, in blockchain, full nodes agree on some state of the system, depending
on
implementation.
In Bitcoin, users agree on who owns what bitcoin.
In Ethereum, users agree on the correct execution of transactions and the general state of the Ethereum network.
Consensus attempts to create a reliable system from potentially unreliable parts – parts like the computers in aircraft that are vulnerable to bit flips due to radiation, or to power outages in a data center…
Or the nodes in public blockchains, where the ever-changing network topology and the existence of malicious entities trying to subvert the network for economic gain don't align with the goals of the system as a whole.
Instead of trusting the execution of individual processes or the reliability of any individual, we trust the general protocol and the math behind it.
It's trust – without trust.
