
There Is No Now

Article in Communications of the ACM · May 2015


Justin Sheehy



practice
DOI:10.1145/2733108
There Is No Now

Problems with simultaneity in distributed systems.

BY JUSTIN SHEEHY

Article development led by queue.acm.org

Illustration by Traci Daberko

COMMUNICATIONS OF THE ACM | MAY 2015 | VOL. 58 | NO. 5 | PP. 36–41

“Now.”

The time elapsed between when I wrote that word and when you read it was at least a couple of weeks. That kind of delay is one we take for granted and do not even think about in written media.

“Now.”

If we were in the same room and instead I spoke aloud, you might have a greater sense of immediacy. You might intuitively feel as if you were hearing the word at exactly the same time that I spoke it. That intuition would be wrong. If, instead of trusting your intuition, you thought about the physics of sound, you would know that time must have elapsed between my speaking and your hearing. The motion of the air, carrying my word, would take time to get from my mouth to your ear.

“Now.”

Even if I held up a sign with that word written and we both looked at it, our perception of that image would not happen at the same moment, as the light carrying the information about the sign would take a different amount of time to reach each of us.

While some things about computers are “virtual,” they still must operate in the physical world and cannot ignore the challenges of that world. Rear Admiral Grace Hopper (one of the most important pioneers in our field, whose achievements include creating the first compiler) used to illustrate this point by giving each of her students a piece of wire 11.8 inches long, the maximum distance that electricity can travel in one nanosecond. This physical representation of the relationship between information, time, and distance served as a tool for explaining why signals (like my metaphorical sign here) must always and unavoidably take time to arrive at their destinations. Given these delays, it can be difficult to reason about exactly what “now” means in computer systems.

There is nothing theoretically preventing us from agreeing about a common idea of “now,” though, if we carefully plan ahead. (Relativity is not a problem here, though it is a tempting distraction. All of humanity’s current computing systems share a close enough frame of reference to make relativistic differences in the perception of time immaterial.) The Network Time Protocol (NTP) [14], used for synchronizing the clocks between systems on the Internet, works in part by calculating the time that messages take to travel between hosts. Then, once that travel time is known, a host can account for it when adjusting its clock to match a more authoritative source. By providing some very precise sources (clocks based on continuous measurement of atomic radiation) in that network, we are able to use NTP to synchronize computers’ clocks to within a small margin of error. Each of the satellites making up the worldwide GPS contains multiple of these atomic clocks (so a single clock failing does not make a satellite worthless), and the GPS protocols allow anyone with a receiver (as long as they can see enough satellites to solve for all of the variables) to determine not only the receiver’s own position, but also the time with excellent precision.

We have understood these protocols for a few decades, so it would be easy to believe we have overcome this class of problems and we ought to be able to build systems that assume our clocks are synchronized. Atomic clocks, NTP, and GPS satellites provide both the knowledge and the equipment to account for the time it takes for information to travel. Therefore, systems anywhere should be able to agree on a “now” and to share a common, single view of the progress of time. Then whole categories of hard problems in networks and computing would become much easier. If all of the systems you care about have the exact same perception of time, then many of these problems become tractable to solve even when some of the hosts involved are failing. Yet, these problems do still exist, and dealing with them is not only a continuously active area of research, but also a major concern when building practical distributed systems.

You might look at the mature mechanisms available for understanding time and believe that researchers and system builders are doing a huge amount of unnecessary work. Why try to solve problems assuming clocks may differ when we know how to synchronize? Why not instead just use the right combination of time sources and protocols to make the clocks agree, and move on past those problems? One thing makes this implausible and makes these problems not only important, but also necessary to face head on: everything breaks.

The real problem is not the theoretical notion of information requiring time to be transferred from one place to another. The real problem is that in the physical world in which computing systems reside, components often fail. One of the most common mistakes in building systems (especially, but not only, distributed-computing systems on commodity machines and networks) is assuming an escape from basic physical realities.

The speed of light is one such reality, but so is one that is more pernicious and just as universal: we cannot make perfect machines that never break. It is the combination of these realities (of asynchrony and partial failure) that together make building distributed systems a difficult pursuit. If we do not plan and account for failures in individual components, we all but guarantee the failure of combined systems.

One of the most important results in the theory of distributed systems is an impossibility result, showing one of the limits of the ability to build systems that work in a world where things can fail. This is generally referred to as the FLP result, named for its authors, Fischer, Lynch, and Paterson [8]. Their work, which won the 2001 Dijkstra Prize for the most influential paper in distributed computing, showed conclusively that some computational problems that are achievable in a “synchronous” model in which hosts have identical or shared clocks are impossible under a weaker, asynchronous system model. Impossibility results such as this are very important, as they can guide you away from going down a dead-end path when designing your own system. They can also provide a snake-oil detector, so you can feel justified in your suspicion about anyone who claims a product does something you know to be impossible.

A related result, better known than FLP by developers today, is the CAP theorem (for consistency, availability, and partition tolerance). This was first proposed informally by Eric Brewer [5] and later a formal version of it was proven by Seth Gilbert and Nancy Lynch [9]. From a distributed systems theory point of view, the CAP theorem is less interesting than FLP: a counterexample “beating” the formal version of CAP assumes an even weaker and more adversarial model of the world than FLP, and demands that even more be achieved within that model. While one is not exactly a subproblem of the other, FLP is a stronger, more interesting, and perhaps more surprising result. A researcher already familiar with FLP might find the CAP idea a bit boring.

It would be reasonable, though, to think perhaps the value of CAP’s existence is to be more approachable and more easily understood by those not steeped in the literature of distributed systems. That would be laudable and worthwhile, but the past several years have shown (through dozens of articles and blog posts, many of which badly misunderstand the basic idea) the idea of CAP has, sadly, not been an easy way for today’s developers to come to terms with the reality of programming in a distributed and imperfect world. That reality, whether from the point of view of CAP or FLP or any other, is a world in which you must assume imperfection from the components you use as building blocks. (Any theoretical “impossibility result” such as CAP or FLP is innately tied to its system model. This is the theoretical model of the world that the result depends on. Any such result does not really say that some goal, such as consensus, is impossible in general, but rather it is impossible within that specific model. This can be extremely useful to practitioners in developing intuitions about which paths might be dead ends, but it can also be misleading if you only learn the result without learning the context to which the result applies.)

The real problem is things break. The literature referred to here, such as FLP, is all about dealing with systems in which components are expected to fail. If this is the problem, then why don’t we just use things that do not break, and then build better systems with components we can assume are robust?

Quite often in the past couple of years, the Spanner system from Google has been referenced as a justification for making this sort of assumption [6]. This system uses exactly the techniques mentioned earlier (NTP, GPS, and atomic clocks) to assist in coordinating the clocks of the hosts that make up Spanner and in minimizing and measuring (but not eliminating) the uncertainty about the differences between those clocks. The Spanner paper, along with the system it documents, is often used to back up claims that it is possible to have a distributed system with a single view of time.

Despite the appeal of pointing at Google and using such an argument from authority, everyone making that claim is wrong. In fact, anyone citing Spanner as evidence of synchronization being “solved” is either lying or has not actually read the paper. The simplest and clearest evidence against that claim is the Spanner paper itself. The TrueTime component of Spanner does not provide a simple and unified shared timeline. Instead, it very clearly provides an API that directly exposes uncertainty about differences in perceived time between system clocks. If you ask it for the current time, it does not give you a single value, but rather a pair of values describing a range of possibility around “now.” That is, TrueTime does the opposite of fixing this fundamental problem. It takes the brave and fascinating choice of confronting it directly and being explicit about uncertainty, instead of pretending a single absolute value for “now” is meaningful across a working distributed system.

Within the production environment of Spanner, clock drift at any moment is typically from one to seven milliseconds. That is the best Google can do after including the corrective effects of GPS, atomic clocks, eviction of the worst-drifted clocks, and more in order to minimize skew. Typical x86 clocks vary their speeds depending on all kinds of unpredictable environmental factors, such as load, heat, and power. Even the difference between the top and bottom of the same rack can lead to a variance in skew. If the best that can be done in a wildly expensive environment like Google’s is to live with an uncertainty of several milliseconds, most of us should assume our own clocks are off by much more than that.

Another claim often made in justifying the “just pretend it’s fine” approach in distributed systems design is that sufficiently high-quality equipment does not fail, or at least fails so rarely that you do not need to worry about it. This claim would be understandable, though incorrect, coming from the makers of such equipment, but it is usually the users of such gear, such as designers of distributed database software, who are heard making such claims. (The designers of GPS, which Spanner and others rely on, certainly did not subscribe to this kind of claim. There are almost 30 GPS satellites, and you need to see only four of them for the system to work. Each of those satellites has multiple redundant atomic clocks so it can continue functioning when one of them breaks.)

One of the most common variants of such claims is “the network is reliable,” in the context of a local, in-data-center network. Many systems have undefined and most likely disastrous behavior when networks behave poorly. The people who want to sell you these systems justify their choice to ignore reality by explaining that such failures are extremely uncommon. By doing this, they do their customers a double disservice. First, even if such events were rare, wouldn’t you want to understand how your system would behave when they do occur, in order to prepare for the impact those events would have on your business? Second, and worse, the claim is simply a lie: so bald-faced a lie, in fact, that it is the very first of the classic eight fallacies of distributed computing [7, 15]. The realities of such failures are well documented in a previous ACM Queue article [3], and the evidence is so very clear and present that anyone justifying software design choices by claiming “the network is reliable” without irony should probably not be trusted to build any computing systems at all. While it is true that some systems and networks might not have failed in a way a given observer could notice, it would be foolish to base a system’s safety on the assumption it will never fail.

The opposite approach to this waving-off of physical problems is to assume that almost nothing can be counted on, and to design using only formal models of a very adversarial world. The “asynchronous” model that FLP was proved on is not the most adversarial model on which to build a working system, but it is certainly a world much more hostile than most developers believe their systems are running on. The thinking goes that if the world you model in is worse than the world you run in, then things you can succeed at in the model should be possible in the real world of implementation. (Note, distributed systems theorists have some models that are more difficult to succeed in, such as those including the possibility of “Byzantine” failure, where parts of the system can fail in much worse ways than just crashing or delaying. For a truly adversarial network/system model, you could see, for example, either the symbolic or the computational models used by cryptographic protocol theorists. In that world, system builders really do assume your own network is out to get you.)

It is in this context, assuming a world that is a bit worse than we think we are really operating in, that the best-known protocols for consensus and coordination, such as Paxos [12], are designed. For such essential building blocks for distributed systems, it is useful to know they can provide their most important guarantees even in a world that is working against the designer by arbitrarily losing messages, crashing hosts, and so on. (For example, with Paxos and related protocols, the most emphasized property is “safety”: the guarantee that different participating hosts will not ever make conflicting decisions.) Another such area of work is logical time, manifest as vector clocks, version vectors, and other ways of abstracting over the ordering of events. This idea generally acknowledges the inability to assume synchronized clocks and builds notions of ordering for a world in which clocks are entirely unreliable.

Of course, you should always strive to make everything, including networks, as reliable as possible. Confusing that constant goal with an achievable perfect end state is silly. Equally silly would be a purist view that only the most well-founded and perfectly understood theoretical models are sensible starting places for building systems. Many of the most effective distributed systems are built on worthwhile elements that do not map perfectly to the most common models of distributed computing. An exemplary such building block would be TCP. This nearly ubiquitous protocol provides some very useful properties, the exact set of which do not map exactly to any of the typical network models used in developing theoretical results such as FLP. This creates a quandary for the systems builder: it would be foolish not to assume the reality of something like TCP when designing, but in some cases that puts them in a tenuous position if they try to understand how exactly distributed systems theory applies to their work.

The Zab protocol, which forms the most essential part of the popular Apache ZooKeeper coordination software, is a fascinating example of an attempt at walking this middle road [10]. The authors of ZooKeeper knew about existing consensus protocols such as Paxos but decided they wanted their own protocol with a few additional features such as the ability to be processing many requests at once instead of waiting for each proposal to complete before starting the next one. They realized that if they built on TCP, they had the advantage of an underlying system that provided some valuable properties they could assume in their protocol. For example, within a single connected TCP socket, a sender producing message A followed by message B can safely assume that if the receiver reads B, then the receiver has previously read A. That “prefix” property is very useful, and one not present in the asynchronous model. This is a concrete example of the advantages available by looking at both the research in the field and the specific technology that is actually available to build on, instead of ignoring either.

When trying to be pragmatic, though, one must be careful not to let that pragmatism become its own strange kind of purity and an excuse for sloppy work. The Zab protocol as implemented inside ZooKeeper, the de facto reference implementation, has never been an accurate implementation of the Zab protocol as designed [13]. You might call yourself a “pragmatist” and note that most other software also does not match a formal specification; thus you might say there is nothing unusual to worry about here. You would be wrong for two reasons. First, the role ZooKeeper and similar software is used for is not like other software; it is there precisely to provide essential safety properties forming a bedrock on which the rest of a system can make powerful assumptions. Second, if there are problems with the safety properties of a protocol like this, the appearance of those problems (while possibly very dangerous) can be subtle and easy to miss. Without a strong belief the implementation reflects the protocol as analyzed, it is not reasonable for a user to trust a system. The claim that a system’s “good-enough correctness” is proven by its popularity is nonsense if the casual user cannot evaluate that correctness.

All of this is not to pick on ZooKeeper. In fact, it is one of the highest-quality pieces of open source software in its genre, with many excellent engineers continually improving it. Under recent analysis, it fares far better under duress than anything it competes with [1]. I have pointed at ZooKeeper only as an example of both the necessity and the pitfalls of taking a middle road with regard to theory and practicality. Mapping theory to practice can be extremely challenging.

Another example of this middle road is hybrid logical clocks [11] that integrate the general techniques of logical time with timestamps from physical clocks. This allows users to apply some interesting techniques based on having a view (imperfect, but still useful) of the physical clocks across a whole system. Much like Spanner, this does not pretend to create a single unified timeline but does allow a system designer to build on the best available knowledge of time as perceived and communicated across a cluster.

All of these different coordination systems (including Paxos, Viewstamped Replication, Zab/ZooKeeper, and Raft) provide ways of defining an ordering of events across a distributed system even though physical time cannot safely be used for that purpose. These protocols, though, absolutely can be used for that purpose: to provide a unified timeline across a system. You can think of coordination as providing a logical surrogate for “now.” When used in that way, however, these protocols have a cost, resulting from something they all fundamentally have in common: constant communication. For example, if you coordinate an ordering for all of the things that happen in your distributed system, then at best you are able to provide a response latency no less than the round-trip time (two sequential message deliveries) inside that system.

Depending on the details of your coordination system, there may be similar bounds on throughput as well. The designer of a system with aggressive performance demands may wish to do things correctly but find the cost of constant coordination to be prohibitive. This is especially the case when hosts or networks are expected to fail often, as many coordination systems provide only “eventual liveness” and can make progress only during times of minimal trouble. Even in those rare times when everything is working perfectly, however, the communication cost of constant coordination might simply be too much.

Giving up strict coordination in exchange for performance is a well-known trade-off in many areas of computing, including CPU architecture, multithreaded programming, and DBMS design. Quite often this has turned into a game of finding out just how little coordination is really needed to provide unsurprising behavior to users. While designers of some concurrent-but-local systems have developed quite a collection of tricks for managing just enough coordination (for example, the RDBMS field has a long history of interesting performance hacks, often resulting in being far less ACID (that is, atomicity, consistency, isolation, durability) than you might guess [2]), the exploration of such techniques for general distributed systems is just beginning.

This is an exciting time, as the subject of how to make these trade-offs in distributed systems design is just now starting to be studied seriously. One place where this topic is getting the attention it deserves is in the Berkeley Orders of Magnitude (BOOM) team at the University of California, Berkeley, where multiple researchers are taking different but related approaches to understanding how to make disciplined trade-offs [4]. They are breaking new ground in knowing when and how you can safely decide not to care about “now” and thus not pay the costs of coordination. Work like this may soon lead to a greater understanding of exactly how little we really need synchronized time in order to do our work. If distributed systems can be built with less coordination while still providing all of the safety properties needed, they may be faster, more resilient, and more able to scale.

Another important area of research on avoiding the need for worrying about time or coordination involves conflict-free replicated data types (CRDTs). These are data structures whose updates never need to be ordered and so can be used safely without trying to tackle the problem of time. They provide a kind of safety that is sometimes called strong eventual consistency: all hosts in a system that have received the same set of updates, regardless of order, will have the same state. It has long been known that this can be achieved if all of the work you do has certain properties, such as being commutative, associative, and idempotent. What makes CRDTs exciting is the researchers in that field [16] are expanding our understanding of how expressive we can be within that limitation and how inexpensively we can do such work while providing a rich set of data types that work off the shelf for developers.

One way to tell the development of these topics is just beginning is the existence of popular distributed systems that prefer ad hoc hacks instead of the best-known choices for dealing with their problems of consistency, coordination, or consensus. One example of this is any distributed database with a “last write wins” policy for resolving conflicting writes. Since we already know that “last” by itself is a meaningless term for the same reason that “now” is not a simple value across the whole system, this is really a “many writes, chosen unpredictably, will be lost” policy, but that would not sell as many databases, would it? Even if the state of the art is still rapidly improving, anyone should be able to do better than this.

Another example, just as terrible as the “most writes lose” database strategy, is the choice to solve cluster management via ad hoc coordination code instead of using a formally founded and well-analyzed consensus protocol. If you really do need something other than one of the well-known consensus protocols to solve the same problem they solve (hint: you don’t), then at a very minimum you ought to do what the ZooKeeper/Zab implementers did and document your goals and assumptions clearly. That will not save you from disaster, but it will at least assist the archaeologists who come later to examine your remains.

This is a very exciting time to be a builder of distributed systems. Many changes are still to come. Some truths, however, are very likely to remain. The idea of “now” as a simple value that has meaning across a distance will always be problematic. We will continue to need more research and more practical experience to improve our tools for living with that reality. We can do better. We can do it now.

Acknowledgments

Thanks to those who provided feedback, including Peter Bailis, Christopher Meiklejohn, Steve Vinoski, Kyle Kingsbury, and Terry Coatta.

Related articles on queue.acm.org

If You Have Too Much Data, then “Good Enough” Is Good Enough
Pat Helland
http://queue.acm.org/detail.cfm?id=1988603

There’s Just No Getting around It: You’re Building a Distributed System
Mark Cavage
http://queue.acm.org/detail.cfm?id=2482856

The Network is Reliable
Peter Bailis and Kyle Kingsbury
http://queue.acm.org/detail.cfm?id=2655736

References

1. Aphyr. Call me maybe: ZooKeeper, 2013; http://aphyr.com/posts/291-call-me-maybe-zookeeper.
2. Bailis, P. When is “ACID” ACID? Rarely, 2013; http://www.bailis.org/blog/when-is-acid-acid-rarely/.
3. Bailis, P. and Kingsbury, K. The network is reliable. ACM Queue 12, 7 (2014); http://queue.acm.org/detail.cfm?id=2655736.
4. BOOM; http://boom.cs.berkeley.edu/papers.html.
5. Brewer, E.A. Towards robust distributed systems, 2000; http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
6. Corbett, J.C. et al. Spanner: Google’s globally distributed database. In Proceedings of the 10th Symposium on Operating System Design and Implementation, 2012; http://research.google.com/archive/spanner.html; http://static.googleusercontent.com/media/research.google.com/en/us/archive/spanner-osdi2012.pdf.
7. Deutsch, P. Eight fallacies of distributed computing; https://blogs.oracle.com/jag/resource/Fallacies.html.
8. Fischer, M.J., Lynch, N.A. and Paterson, M.S. Impossibility of distributed consensus with one faulty process. JACM 32, 2 (1985), 374–382; http://macs.citadel.edu/rudolphg/csci604/ImpossibilityofConsensus.pdf.
9. Gilbert, S. and Lynch, N. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News 33, 2 (2002), 51–59; http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf.
10. Junqueira, F.P., Reed, B.C. and Serafini, M. Zab: High-performance broadcast for primary backup systems, 2011; http://web.stanford.edu/class/cs347/reading/zab.pdf.
11. Kulkarni, S., Demirbas, M., Madeppa, D., Bharadwaj, A. and Leone, M. Logical physical clocks and consistent snapshots in globally distributed databases; http://www.cse.buffalo.edu/tech-reports/2014-04.pdf.
12. Lamport, L. The part-time parliament. ACM Trans. Computer Systems 16, 2 (1998), 133–169; http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#lamport-paxos.
13. Medeiros, A. ZooKeeper’s atomic broadcast protocol: Theory and practice; http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf.
14. NTP; http://www.ntp.org.
15. Rotem-Gal-Oz, A. Fallacies of distributed computing explained; http://www.rgoarchitects.com/Files/fallacies.pdf.
16. SyncFree; https://syncfree.lip6.fr/index.php/publications.

Justin Sheehy (@justinsheehy) is the site leader for VMware’s Cambridge MA R&D center, and also plays a strategic role as architect for the company’s Storage & Availability business. His three most recent positions before joining VMware were as CTO for Basho, Principal Scientist at MITRE, and Sr. Architect at Akamai.

Copyright held by owner. Publication rights licensed to ACM. $15.00
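
The article notes that NTP works in part by calculating how long messages take to travel between hosts and then accounting for that delay. As a rough illustration only, the sketch below applies the standard four-timestamp calculation to one request/response exchange. The function name and the sample numbers are assumptions made for this example, and the result is only as good as the assumption that both directions of the trip take equally long.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client clock when the request left the client
    t1: server clock when the request arrived
    t2: server clock when the response left the server
    t3: client clock when the response arrived
    """
    # Time on the wire: elapsed time seen by the client minus the
    # time the server spent between receiving and replying.
    delay = (t3 - t0) - (t2 - t1)
    # Server clock minus client clock, assuming the outbound and
    # return trips took the same amount of time.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return offset, delay


# Hypothetical exchange, in milliseconds: the server clock is 5000 ms
# ahead, each direction takes 100 ms, the server spends 100 ms replying.
offset, delay = ntp_offset_and_delay(t0=100_000, t1=105_100,
                                     t2=105_200, t3=100_300)
print(offset, delay)  # 5000.0 200
```

If the two directions are not symmetric, the estimate is wrong by up to half the measured delay, which is one concrete reason synchronization narrows uncertainty about “now” without removing it.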

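
The Spanner discussion describes a clock API that returns a pair of values bounding “now” instead of a single timestamp. The sketch below is not Google's TrueTime. It is a minimal stand-in, with hypothetical names (`Interval`, `UncertainClock`, `relation`) and an assumed fixed error bound, that shows what code looks like when the uncertainty is part of the answer.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float

    def definitely_before(self, other: "Interval") -> bool:
        # Only true when the two ranges cannot overlap.
        return self.latest < other.earliest


class UncertainClock:
    """Wraps a time source and reports a range instead of a point."""

    def __init__(self, max_error_s, source=time.time):
        self.max_error_s = max_error_s  # assumed bound on clock error
        self.source = source

    def now(self) -> Interval:
        t = self.source()
        return Interval(t - self.max_error_s, t + self.max_error_s)


def relation(a: Interval, b: Interval) -> str:
    if a.definitely_before(b):
        return "a before b"
    if b.definitely_before(a):
        return "b before a"
    return "unknown"  # overlapping ranges: the clocks cannot tell us


# Two events 10 ms apart with a 7 ms error bound cannot be ordered;
# two events 20 ms apart can.
print(relation(Interval(0.000, 0.014), Interval(0.010, 0.024)))  # unknown
print(relation(Interval(0.000, 0.014), Interval(0.020, 0.034)))  # a before b
```

The 7 ms bound is taken from the upper end of the drift range quoted in the article. A system on ordinary hardware would likely need a much larger value, and would see "unknown" correspondingly more often.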

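
Logical time, which the article mentions in the form of vector clocks, orders events without consulting a physical clock at all. The sketch below is one minimal way to write a vector clock, assuming clocks are plain dictionaries from a node name to a counter. The function names and node names are illustrative and not taken from any particular system.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Classify a relative to b: equal, before, after, or concurrent."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


x = increment({}, "n1")           # an event on node n1
y = increment({}, "n2")           # an independent event on node n2
z = increment(merge(x, y), "n1")  # n1 acts after hearing about both

print(compare(x, y))  # concurrent
print(compare(x, z))  # before
```

Note that "concurrent" is a legitimate answer here. Neither event is "last", which is the same point the article makes about last-write-wins policies.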

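
The CRDT passage says that replicas converge when updates are commutative, associative, and idempotent. A grow-only counter is one of the simplest data types with those properties, and the sketch below shows it under the assumption that each replica's state is a dictionary from replica name to its own count. The names are hypothetical and the code is an illustration, not a reference to any library's API.

```python
def gc_increment(state, node, amount=1):
    """Return a copy of state with node's own count raised by amount."""
    updated = dict(state)
    updated[node] = updated.get(node, 0) + amount
    return updated


def gc_merge(a, b):
    """Per-node maximum: commutative, associative, and idempotent."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def gc_value(state):
    """The counter's value is the sum of every replica's count."""
    return sum(state.values())


# Three replicas count independently, with no coordination.
a = gc_increment({}, "a", 2)
b = gc_increment({}, "b", 3)
c = gc_increment({}, "c", 1)

# Merging in a different order, and even merging a duplicate,
# yields the same state.
one = gc_merge(gc_merge(a, b), c)
two = gc_merge(c, gc_merge(b, gc_merge(a, a)))

print(one == two)     # True
print(gc_value(one))  # 6
```

Because the merge never needs to know which update came first, no replica has to agree with any other about “now” in order to reach the same total.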