00 upvotes00 downvotes

11 views11 pagesJul 17, 2007

The wakeup problem

© Attribution Non-Commercial (BY-NC)

PDF, TXT or read online from Scribd

Attribution Non-Commercial (BY-NC)

11 views

00 upvotes00 downvotes

The wakeup problem

Attribution Non-Commercial (BY-NC)

You are on page 1of 11

(EXTENDED ABSTRACT)

time during its execution, and when it fails, it simply

We study a new problem, the wakeup problem, that stops participating in the protocol.

seems to be very fundamental in distributed comput- In the wakeup problem, it is known a priori by all pro-

ing. We present efficient solutions to the problem and ceases that at least n - t processes will eventually wake

show how these solutions can be used to solve the con- up. The goal is simply to have a point in time at which

sensus problem, the leader election problem, and other the fact that at least r processes have already waked

related problems. The main question we try to answer up is known to p processes. It is not required that this

is, how much memory is needed to solve the wakeup time be the earliest possible, and faulty processes are

problem? We assume a model that captures important included in the counts of processes that have waked up

properties of real systems that have been largely ignored and that know about that fact. Note that in a solution

by previous work on cooperative problems. to the wakeup problem, at least p - t correct processes

eventually learn that at least r - t correct processes are

awake and participating in the protocol.

1 Introduction The significance of this problem is two-fold. First, it

seems generally useful to have a protocol such that after

1.1 The Wakeup Problem

a crash of the network or after a malicious attack, the

The wakeup problem is a deceptively simple new prob- remaining correct processes can figure out if sufficiently

lem that seems to be very fundamental to distributed many other processes remain active to carry out a given

computing. The goal is to design a t-resilient proto- task. Second, a solution to this problem is a useful

col for n asynchronous processes in a shared memory building block for solving other important problems (cf.

environment such that at least p processes eventually section 6).

learn that at least 7- processes have waked up and be-

gun participating in the protocol. Put another way, the 1.2 A New Model

wakeup problem with parameters n, t, ~- and p is to find

a protocol such that in any fair run of n processes with Much work to date on fault-tolerant parallel and dis-

at most t failures, at least p processes eventually know tributed systems has been generous of the class of faults

that at least r processes have taken at least one step in considered but rather strict in the requirements on the

the past. The only kind of failures we consider are crash system itself. Problems are usually studied in an un-

derlying model that is fully synchronous, provides each

*Computer Science D e p a r t m e n t , Yale University, New Haven, process with a unique name that is known to all other

CT 06520.

l C o m p u t e r Science D e p a r t m e n t , Technion, Haifa 32000, Israel. processes, and is initialized to a known state at time

t C o m p u t e r Science D e p a r t m e n t Carnegie Mellon University, zero. We argue that none of these assumptions is real-

P i t t s b u r g h , PA 15213. istic in today's computer networks, and achieving them

This work was s u p p o r t e d in p a r t by ONR contract N00014- even within a single parallel computer is becoming in-

89-J-1980, by the National Science Foundation u n d e r grant CCR-

8405478, by the Hebrew Technical Institute scholarship, by the creasingly difficult and costly. Large systems do not run

Technion V.P.R. Funds - Wellner Research Fund, a n d by the off of a single clock and hence are not synchronous. Pro-

Foundation for Research in Electronics, Computers a n d Commu- viding processes with unique id's is costly and difficult

nications, a d m i n i s t r a t e d by the Israel Academy of Sciences a n d

and greatly complicates reconfiguring the system. Fi-

Humanities.

nally, simultaneously resetting all of the computers and

communication channels in a large network to a known

initial state is virtually impossible and would rarely be

Permission to copy without fee all or part of this matertial is granted pro- done even if it were possible because of the large de-

vided that the copies are not made or distributed for direct commercial structive effects it would have on ongoing activities.

advantage, the ACM copyright notice and the title of the publication and

its date appear, and notice is given that copying is by permission of the Our new model of computation makes none of these

Association for Computing Machinery. To copy otherwise, or to republish,

requires a fee and/or specific permission.

assumptions. It consists of a fully asynchronous collec- 1.3 Space Complexity Results

tion of n identical anonymous deterministic processes

The main question we try to answer is, how m a n y values

that communicate via a single finite sized shared reg-

v for the shared register are necessary and sufficient to

ister which is initially in an arbitrary unknown state.

solve the wakeup problem? The answer both gives a

Access to the shared register is via atomic "test-and-

measure of the communication-space complexity of the

set" instructions which, in a single indivisible step, read

problem and also provides a way of assessing the cost

the value in the register and then write a new value that

of achieving reliability. We give a brief overview of our

can depend on the value just read.

results below.

Assuming an arbitrary unknown initial state relates

to the notion of self-stablizing systems defined by Di-

jkstra [8]. However, Dijkstra considers only non- 1.3.1 Fault-Free Solutions

terminating control problems such as the mutual ex- First we examine what can be done in the absence of

clusion problem, whereas we show how to solve decision faults (i.e., t = 0). We present a solution to the wakeup

problems such as the wakeup, consensus and leader elec- problem in which one process learns t h a t all other pro-

tion problems, in which a process makes an irrevocable cesses are awake (i.e., p = 1 and r = n), and it uses a

decision after a finite number of steps. single 4-valued register (i.e., v -- 4). The protocol for

Before proceeding, we should address two possible achieving this is quite subtle and surprising. It can also

criticisms of shared m e m o r y models in general and our be modified to solve the leader election problem. Based

model in particular. First, most computers implement on this protocol, we construct a fault-free protocol that

only reads and writes to memory, so why do we consider reaches consensus on one out of k possible values using

atomic test-and-set instructions? One answer is that a 5-valued register. Finally, we show that there is no

large parallel systems access shared m e m o r y through fault-free solution to the wakeup problem with only two

a communication network which m a y well possess in- values (i.e., one bit) when 7" _> 3.

dependent processing power that enables it to imple-

ment more powerful primitives than just simple reads 1.3.2 Fault-Tolerant Solutions: Upper Bounds

and writes. Indeed, such machines have been seriously

proposed [23, 44]. Another answer is that part of our We start by showing that the fault-free solution which

interest is in exploring the boundary between what can uses a single 4-valued register, mentioned in the pre-

and cannot be done, and a proof of impossibility for vious section, can actually tolerate t failures for any

a machine with test-and-set access to m e m o r y shows a r _< ((2n - 2)/(2t + 1) + 1)/2. Using m a n y copies of this

fortiori the corresponding impossibility for the weaker protocol, we construct a protocol with v = 8 t+l that

read/write model. tolerates t faults when I" _< n - t . Thus, if t is a con-

A second possible criticism is that real distributed stant, then a constant sized shared m e m o r y is sufficient,

systems are built around the message-passing paradigm independent of n. However, the constant grows expo-

and that shared m e m o r y models are unrealistic for large nentially with t. An easy protocol exists with v = n that

systems. Again we have several possible answers. First, works for any t and r < n - t . This means that the above

the premise m a y not be correct. Experience is showing exponential result is only of interest for t << log n. Fi-

that message-passing systems are difficult to program, nally, we show that for any t < n / 2 , there is a t-resilient

so increasing attention is being paid to implementing solution to the wakeup problem for any 7" < [n/2J + 1,

shared m e m o r y models, either in hardware (e.g. the Flu- using a single O(t)-valued register.

ent machine [45]) or in software (e.g. the Linda system

[5]). Second, message-passing systems are themselves 1.3.3 Fault-Tolerant Solutions: A Lower Bound

an abstraction that m a y not accurately reflect the reali-

ties of the underlying hardware. For example, message- We prove that for any protocol P that solves the wakeup

passing systems typically assume infinite buffers for in- problem for parameters n, t and r, the number of shared

coming messages, yet nothing is infinite in a real system, m e m o r y values used by P is at least W a, where W =

and indeed overflow of the message buffer is one kind of ( t v ~ - t ) / ( n - t) and ~ = 1 / ( l o g 2 (n7- -~t + 3)). The proof

fault to which real systems are subject. It is difficult to is quite intricate and involves showing for any protocol

see how to study a kind of fault which is assumed away with too few m e m o r y values that there is a run in which

by the model. Finally, at the lowest level, communi- n - t processes wake up and do not fail, yet no process

cation hardware looks very much like shared memory. can distinguish that run from another in which fewer

For example, a wire from one process to another can than r wake up; hence, no process knows that r are

be thought of as a binary shared register which the first awake.

process can write (by injecting a voltage) and the second

process can read (by sensing the voltage).

107

1.4 Relation to Other Problems no process wrote into the register. A register r is said

to be local if there exists an i such t h a t r E R/ and

We establish connections between the wakeup prob-

for any j ~ i, r ~ Rj. A register is s h a r e d if it is not

lem and two fundamental problems in distributed com-

local. In this paper we restrict attention to protocols

puting: the consensus problem and the leader elec-

which have exactly one register which is shared by all

tion problem. These two problems lie at the core of

the processes (i.e., IR1 n . . . n R , I = 1) and all other

m a n y problems for fault-tolerant distributed applica-

registers are local. If S' is a prefix of S then the run

tions [1, 7, 10, 13, 16, 20, 21, 32, 34, 43, 42, 48].

(f, S') is a prefiz of (f, S), and (f, S) is an extension of

We show that: (1) any protocol that uses v values

(f, SI). For any sequence S, let Si be the subsequence

and solves the wakeup problem for t < n/2, 7" > n / 2

of S containing all events in S which involve qi.

and p = 1 can be transformed into i-resilient consensus

and leader election protocols which use 8v values; and D e f i n i t i o n : C o m p u t a t i o n s (f, S) and ( f ' , S ' ) are equiv-

(2) any l-resilient consensus and leader election protocol alent with respect to qi, denoted by (f, S) / ( f ' , S'), iff

that uses v values can be transformed into a i-resilient si =

protocol which uses 4v values and solves the wakeup

We are now ready to define the notion of knowledge in

problem for any 7" < [n/2J + 1 and p = 1.

a shared m e m o r y environment. In the following, we use

Using the first result above, we can construct effi-

predicate to mean a set of runs.

cient solutions to both the consensus and leader election

problems from solutions for the wakeup problem. The D e f i n i t i o n : For a process qi, predicate b and finite run

i

second result implies that the lower bound proved for p, process qi knows b at p iff for all p' such that p .~ p',

the wakeup problem holds for these other two problems. it is the case that p' E b.

As a consequence, the consensus and the leader election

For simplicity, we assume t h a t a process always takes

problems are space-equivalent in our model. This is

a step whenever it is scheduled. A process that takes

particularly surprising since the two problems seem so

infinitely m a n y steps in a run is said to be correct in

different. The difficulty in leader election is breaking

that run; otherwise it is faulty. We say that an infinite

symmetry, whereas consensus is inherently symmetric.

run is l-fair iff at least I processes are correct in it.

Election Protocols

2.1 Protocols and Knowledge

In this subsection we formally define the notions of t-

An n-process protocol P = ( C , N , R ) consists of a resilient wakeup, consensus and leader election protocols

nonempty set C of runs, an n-tuple N = (ql . . . . ,qn) C0 < t < n). We say t h a t a p r o c e s s qi is awake in a

of process id's (or process, for short), and an n-tuple run if the run contains an event that involves qi. The

R = ( R 1 , . . . , R n ) of sets of registers. Informally, Ri predicate "at least 7- processes are awake in run p" is the

includes all the register that process qi can access. We set of all runs for which there exist v different processes

assume throughout this paper that n > 2. which are awake in the run. Note that a process that

A run is a pair of the form ( f , S ) where f is a fails after taking a step is nevertheless considered to be

function which assigns initial values to the registers in awake in the run.

R1 t.J . . . t.J Rn and S is a finite or infinite sequence of

events. (When S is finite, we also say that the run is • A wakeup protocol with p a r a m e t e r s n, t, 7" and p is

finite.) An event e = (ql, v, r, v ~) means that process qi, a protocol for n processes such that, for any ( n - t ) -

in one atomic step, first reads a value v from register fair run p, there exists a finite prefix of p in which

r and then writes a value v ~ into register r. We say at least p processes know that at least 7" processes

that the event e involves process qi and that process ql are awake in p.

performs a test-and-set operation on register r. It is easy to see that a wakeup protocol exists only

The set of runs is assumed to satisfy several proper- if max(p, 7") < n - t, and hence, from now on, we

ties; for example, it should be prefix closed. Because of assume that this is always tlle case. We also assume

lack of space, we do not give a complete list here but that min(p, 7") > 1.

point out that these properties capture the fact that

In the following, whenever we speak about a solu-

we are assuming that the processes are anonymous and

tion to the wakeup problem without mentioning p,

identically p r o g r a m m e d , are not synchronized, and that

we are assuming that p = 1.

nothing can be assumed about the initial state of the

shared memory. • A t-resilient k-consensus protocol is a protocol for

The value of a register at a finite run is the last value n processes, where each process has a local read-

that was written into that register, or its initial value if only input register and a local write-once output

108

register. For any (n - t)-fair run there exists a fi- preted as two bits which we call the "token bit" and the

nite prefix in which all the correct processes decide "See-Saw" bit. The two states of the token bit are called

on some value from a set of size k (i.e., each cor- "token present" and "no token present". We think of

rect process writes a decision value into its local a public token slot which either contains a token or is

output register), the decision values written by all empty, according to the value of the token bit. The two

processes are the same, and the decision value is states of the See-Saw bit are called "left side down" and

equal to the input value of some process. "right side down". The "See-Saw" bit describes a vir-

In the following, whenever we say "consensus" tual See-Saw which has a left and a right side. The bit

(without mentioning specific k) we mean "binary indicates which side is down (implying that the opposite

consensus", where the possible decision values are side is up).

0 and 1. Each process remembers in private m e m o r y the num-

ber of tokens it currently possess and which of four

• A t-resilient leader election protocol is a proto- states it is currently in with respect to the See-Saw:

col for n processes, where each process has a local "never been on .... on left side", "on right side", and "got

write-once output register. For any (n - t)-fair run off". A process is said to be on the up-side of the See-

there exists a finite prefix in which all the correct Saw if it is currently "on left side" and the See-Saw bit

processes decide on some value in {0, 1}, and ex- is in state "right side down", or it is currently "on right

actly one (correct or faulty) process decides on 1. side" and the See-Saw bit is in state "left side down".

T h a t process is called the leader. A process initially possesses two tokens and is in state

"never been on".

We define the protocol by a list of rules. When a pro-

3 Fault-free solutions cess is scheduled, it looks at the shared register and at

its own internal state and carries out the first applica-

In this section, we develop tile See-Saw protocol, which

ble rule, if any. If no rule is applicable, it takes a null

solves the fault-free wakeup problem using a single 4-

step which leaves its internal state and the value in the

valued shared register. Then we show how the See-Saw

shared register unchanged.

protocol can be used to solve the k-valued consensus

problem. Finally, we claim that it is impossible to solve R u l e 1: (Start of protocol) Applicable if tile scheduled

the wakeup problem using only one shared bit. process is in state "never been on". The process

To understand the See-Saw protocol, the reader gets on the up-side of the See-Saw and then flips the

should imagine a playground with a See-Saw in it. The See-Saw bit. By "get on", we mean that the process

processes will play the protocol on the See-Saw, adher- changes its state to "on left side" or "on right side"

ing to strict rules. When each process enters the play- according to whichever side is up. Since flipping

ground (wakes up), it sits on the up-side of the See-Saw the See-Saw bit causes that side to go down, the

causing it to swing to the ground. Only a process on the process ends up on the down-side of the See-Saw.

ground (or down-side) can get off and when it does the

R u l e 2: (Emitter) Applicable if the scheduled process

See-Saw must swing to the opposite orientation. These

is on the down-side of the See-Saw, has one or

rules enforce a balance invariant which says that the

more tokens, and the token slot is empty. The

number of processes on each side of the See-Saw differs

process flips the token bit (to indicate that a to-

by at most one (the heavier side always being down).

ken is present) and decrements by one the count

Each process enters the playground with two tokens.

of tokens it possesses. If its token count thereby

The protocol will force the processes on the b o t t o m of

becomes zero, the process flips the See-Saw bit and

the See-Saw to give away tokens to the processes on

gets off the See-Saw by setting its state to "got off".

the top of the See-Saw. Thus, token flow will change

direction depending on the orientation of the See-Saw. R u l e 3: (Absorber)Applicable if the scheduled process

Tokens can be neither created nor destroyed. The idea is on the up-side of the See-Saw and a token is

of the protocol is to cause tokens to concentrate in the present in the token slot. The process flips the

hands of a single process. A process seeing 2k tokens token bit (to indicate that a token is no longer

knows that at least k processes are awake. Hence, if it present) and increments by one the count of tokens

is guaranteed that eventually some process will see at it possesses.

least 27- tokens, the protocol is by definition a wakeup

protocol with p a r a m e t e r 7-, even if the process does not Note that if a scheduled process is on the down-side,

know the value of 7- and hence does not know when the has 2 k - 1 tokens, and a token is present in the token

goal has been achieved. slot, then, although no rule is applicable, the process

Following is the complete description of the See-Saw nevertheless sees a total of 2k tokens and hence knows

protocol. The 4-valued shared register is easily inter- that k processes have waked up.

109

The two main ideas behind the protocol can be stated 4 Fault-tolerant solutions

as invariants.

In this section, we explore solutions to the wakeup prob-

T O K E N INVARIANT: The number of tokens in the lem which can tolerate t > 0 process failures.

system is either 2n or 2n + 1 and does not change at The See-Saw protocol, presented in the previous sec-

any time during the protocol. (The number of tokens in

tion, cannot tolerate even a single crash failure for any

the system is the total number of tokens possessed by r > n / 3 . The reason is that the faulty process may

all of the processes, plus 1 if a token is present in the

fail after accumulating 2 n / 3 tokens, trapping two other

token bit slot.)

processes on one side of the See-Saw, each with 2 n / 3

B A L A N C E INVARIANT: The number of processes on tokens. When 7" < n / 3 , the See-Saw protocol can toler-

the left and right sides of the See-Saw is either perfectly ate at least one failure. As the parameter 7" decreases,

balanced or favors the down-side of the See-Saw by one the number of failures that the protocol can tolerate

process. increases. This fact is captured in our first theorem.

T h e o r e m 3.1: Let t = O. The See-Saw protocol uses T h e o r e m 4.1: The See-Saw protocol is a wakeup pro-

a 4-valued shared register and is a wakeup protocol for tocol for n, t, r, where r <_ ((2n - 1)/(2t + 1) + 1)/2.

n, t, 7" (and p = 1), where n and r are arbitrary and t =

We note that the See-Saw protocol can tolerate up to

O. (Note that the rules f o r the protocol do not mention n / 2 - 1 initial failures [21, 49]. In the rest of this sec-

n o r 7..)

tion, we present three t-resilient solutions to the wakeup

In applications of wakeup protocols, it is often desir- problem. Notice that when the number of failures t is

able for the processes to know the value of v so that a constant, it is possible using a constant number of

a process learning that r processes are awake can stop values for one process to learn that n - t processes are

participating in the wakeup protocol and take some ac- awake.

tion based on that knowledge. The See-Saw protocol T h e o r e m 4.2: For any t < n / 6 , there is a wakeup pro-

can be easily modified to have this property by adding tocol which uses a single 8t+l-valued register and works

a termination rule immediately after Rule 1:

f o r any 7- < n - t.

R u l e l a : ( E n d of protocol) Applicable if the scheduled T h e o r e m 4.3: For any t < n, there is a wakeup proto-

process is on the See-Saw and sees at least 2r to- col which uses a single n-valued register and works f o r

kens, where the number of tokens the process sees any r < n - t.

is the number it possesses, plus one if a token is

present in the token slot. The process thus knows T h e o r e m 4.4: For any t < n / 2 , there is a wakeup pro-

that 7. processes have waked up. It gets off the See- tocol which uses a single O(t)-valued register and works

Saw (i.e., terminates) by setting its state to "got for any 7. _< [n/2J + 1.

Off ~ ,

The See-Saw protocol can also be used to solve the 5 A Lower Bound

leader election problem by electing the first process that

sees 2n tokens. By adding a 5th value, everyone can be In this section, we establish a lower bound on the num-

informed that the leader was elected, and the leader ber of shared memory values needed to solve the wakeup

can know that everyone knows. Now, the leader can problem, where only one process is required to learn

transmit an arbitrary message, for example a consensus that r processes are awake, assuming t processes may

value, to all the other processes without using any more crash fail (thus p = 1). To simplify the presentation,

new values through a kind of serial protocol. This leads we assume that 9 < t < 2 n / 3 and 7- > n / 3 . Also, recall

to our next theorem. that we already assumed that r < n - t. For the rest of

this section, let

T h e o r e m 3.2: In the absence of faults, it is possible

to reach consensus on one of k values using a single tx/t - t t2 - 1

5-valued shared register. W - - - U -- - - (1)

n-t ' 4(n-t)'

Finally, we claim that the See-Saw protocol cannot be

improved to use only a single binary register. A slightly (2)

weaker result than Theorem 3.3 was also proved by Joe l°g2 ( tn--~t -k- 3.5)"

tlalpern [27]. The question whether 3 values suffice is Note that W < U since t > 9.

still open.

T h e o r e m 5.1: Let P be a wakeup protocol with param-

T h e o r e m 3.3: There does not exist a solution to the eters n, t and r. Let V be the set of shared m e m o r y

wakeup problem which uses only a single binary register values used by P . Then IVI > w ~.

when r > 3.

110

When we take t to be a constant fraction of n we get D e f i n i t i o n : Let V be the set of shared m e m o r y values

the following immediate corollary. of protocol P. The teachability graph G of protocol P is

the labeled directed multigraph with node set V which

C o r o l l a r y 5.1: Let P be a wakeup protocol with pa-

has an edge from node a to node b labeled with r iff

rameters n, t and v, where t --- n/c. Let V be the

a ~ b holds. (Note that there m a y be several edges

set of shared memory values used by P and let 7 =

1/(21og2(c+ 2.5)). Then, }V] = ft(n'Y). with different labels between the same two nodes. Note

also that G is finite since a : : ~ b implies that r _< [VI. )

Theorem 5.1 is immediate if V is infinite, so we as-

D e f i n i t i o n : A graph C is closed at node a w.r.t. G if a

sume throughout this section that V is finite. The proof

is in C and for every node b in G, if (a, b) is an edge of

consists of several parts. First we define a sequence of

directed graphs whose nodes are shared m e m o r y values G then b is in C.

in V. Each component C of each graph in the sequence D e f i n i t i o n : A multigraph T is terminal w.r.t. G if T is

has a cardinality k¢ and a weight w~. We establish by a subgraph of G, all of T ' s components are rooted, and

induction that k¢ _> rain(we, W) a. Finally, we argue T has a component C with root a among its minimal

that in the last graph in the sequence, every component weight components that is closed at node a w . r . t . G .

C has weight w~ > W. Hence, IV[ > k~ > W a. In the rest of the section we show that the reacha-

bility graph G of any wakeup protocol with parameters

5.1 Reachability Graphs and Terminal n , t , r has size _> W% We do that by constructing a

multigraph T which is terminal w.r.t. G and has size

Graphs

> W a. Theorem 5.1 follows from these facts.

Let V be the alphabet of the shared register. We say

that a value a E V appears m times in a given run if 5.2 Reachability Graphs

there are (at least) m different prefixes of that run where

the value of the shared register is a. The teachability graphs are defined for all protocols.

Now we concentrate on such graphs constructed from

it

a , b denotes that there exists a run in which at wakeup protocols only. We show that when the weight

most u processes participate, the initial value of of a rooted component, say C, is sufficiently small, an

the shared register is a, and the value b appears at edge exists with a label q from a root of C to a node

least once. not in C, and we can bound the size of q.

For later reference we call the following three inequal-

a : : ~ b denotes that there exists a run in which exactly

ities,

u processes participate, each process that partici-

pates takes infinitely m a n y steps, the initial value (i) pq + ( p - 1)w < n,

of the shared register is a, and the value b appears (iX) pq > n - t,

infinitely m a n y times. (iii) max(q,w) < v

Clearly, a :=~ b implies a r , b but not vice versa. Also, the zigzag inequalities. These inequalities play an im-

for every a, there exists b such that a :=~ b. portant role in our exposition.

We use the following graph-theoretic notions. A di- L e m m a 5.1: Given teachability graph G of a wakeup

rected multigraph 1 G is weakly connected if the under-

protocol P with parameters n, t, 7" and a rooted subgraph

lying undirected multigraph of G is connected. A multi- C of G with root a and weight w, if there exist positive

graph G'(V', E') is a subgraph of G(V, E) if WI _C E and integers p and q that satisfy the zigzag inequalities, then

V I _ V. A multigraph G t is a component of a multi-

for any node b of G, ira ~ b is an edge of G then b is

graph G if it is a weakly connected subgraph of G and

not in C.

for any edge (a, b) in G, either both a and b are nodes

of G' or both a and b are not in G'. A node is a rool P r o o f : We assume to the contrary that there exists p

,3f a multigraph if there is a directed path from every and q that satisfy the zigzag inequalities, and there is

other node in the multigraph to that node. A rooted an edge a ~ b s u c h that b belongs to C. Let p b e a

.Traph (rooted component) is a graph (component) with q-fair run starting from a in which exactly q processes

at least one root. A labeled multigraph is a multigraph participate and b is written infinitely often. Since b is

together with a label function that assigns a weight in N in C, there is a path from b to a such that the sum of

to each edge of G. The weight of a labeled multigraph all the labels of edges in that path is at most w and

is the sum of the weights of its edges. hence b to, a. This allows us to construct a run with

We now define the notion of a reachability, graph of a pq non-faulty processes starting with a as follows:

given protocol P.

Run q processes according to p until b is writ-

l A multigraph c a n h a v e s e v e r a l e d g e s f r o m a t o b. ten. Run w processes until a is written. (This

111

must be possible since b ~ a.) Let these w which implies (i).

processes fail. Run a second group of q pro- Finally, we show t h a t inequality (iii) is satisfied. Re-

cesses according to p until b is written. Run call that we assume t h a t t <_ 2n/3 and r > n/3. It

a second group of w processes until a is writ- follows from these assumptions t h a t r > t/2. Since

ten, and let them fail. Repeat the above until q < t/2, obviously q < r. Also, since w _< U and

the pth group of q processes have just been run t <_ 2n/3, substituting in (1) gives w < n/3, and hence

and b has again been written. At this point, w<7-. []

pq processes belong to still-active groups, and L e m m a 5.3: lf w < W , then there exists positive inte-

( p - 1)w processes have died. If any processes

gers p and q that satisfy the zigzag inequalities and

remain, let t h e m die now without taking any

steps. Now, an infinite run p~ on the active pro- ,o(n - 0

q < - - + 3. (5)

cesses can be constructed by continuing to run t+2

the first group according to p until b is writ-

ten again, then doing the same for the second

through pth groups, and repeating this cycle Proof." Recall t h a t W < U, so in particular, w < U.

forever. Let q~ be the smallest positive integer solution to (3).

It follows from L e m m a 5.2 that q' exists. Let q be the

The result is a pq-fair run. Moreover, no reliable process

smallest positive integer for which there exists a positive

can distinguish this run from p, and hence no reliable

integer p such that p and q satisfy the zigzag inequalities.

process ever knows (in p~) that more than q processes

It follows from L e m m a 5.2 that q exists and q < q'.

are awake. Also, obviously, no faulty process knows that

If q = 1 than clearly 1 < w(n - t)/(t + 2) + 2 and

more than w processes are awake. Since max(q, w) < 7"

the l e m m a holds. Assume q > 1. Since q < q' it follows

but at least pq >_ n - t >_ r processes are awake in p~,

that

this leads to a contradiction to the assumption that P ( q - 1) 2 - t ( q - 1) + w ( n - t ) > O.

is a wakeup protocol. []

Thus,

L e m m a 5.2: Assume w <_ U. Then the inequality q2 -4- 1 -4- t -4- w ( n -- t )

q < (6)

x - + - t ) _< 0 (3) t+2

has a positive integer solution. The smallest positive By L e m m a 5.2,

integer solution for (3) is

[t- 'x/t2-4w(n-t)'] < t q _ It - I /t2 - 4w(n - t)l] . (7)

q = 2 _ (4)

Since w < W, we can substitute W for w in inequality

There exists a positive integer p such that p and q satisfy (7) and get that q < IvY] < v ~ + 1. Thus, q2 <

the zigzag inequalities.

t + 2v/t + 1, so it follows from (6) and the assumption

P r o o f : We first show that (3) has a positive solution. that t > 9 that inequality (5) holds. []

Using the quadratic formula, we get that the roots of

(3) are 5.3 Terminal Graphs

t - I~t 2 - 4w(n - t)l and t + I ~ t ~ - 4w(n - t)l In this subsection, we show that the teachability graph

2 2 G of any wakeup protocol with parameters n , t , r , has

Since w < U the discriminant t 2 - 4w(n - t) > 1. Since at least one subgraph which is terminal w.r.t. G and

the value of the discriminant is less than t 2 it follows has size > W a. We first prove that the weight of any

that the roots are positive. Moreover, the difference rooted component of any terminal graph w.r.t. G has

of the two roots is at least 1; hence there is a positive weight > U. Then we show that this implies that there

integer x satisfying (3), and q is the least such integer. exists a terminal graph w.r.t. G, all of whose rooted

Moreover, since t is an integer, t/2 is either an integer or components have size > W ~.

lies exactly half way between two integers, so inequality L e m m a 5.4: Let G be the reachability graph of a

(4) holds. wakeup protocol with parameters n,t, 7" and let T be ter-

Next we show t h a t there exists a positive integer p minal w.r.t. G. Any rooted component o f T has weight

such that p and q satisfy the inequalities (i) and (it). >U.

Let p = [(n - t)/q]. The choice of p clearly satisfies

P r o o f : Assume to the contrary t h a t T has a minimal-

(it). Also from (3) it follows t h a t

weight component C of weight w < U. Then, by L e m m a

5.2, there exist positive integers q and p that satisfy the

p= < ~ + 1 < ~

-- q -- q+w zigzag inequalities. From L e m m a 5.1, there is a node b

112

not in C and an edge a ~ b in G. Therefore, T is not rooted component of T/ are both at least 2. Now, sup-

a terminal w.r.t. G, a contradiction. 1:3 pose the procedure adds an edge of label q from compo-

nent C1 ofsize kl and weight wx to component C2 of size

L e m m a 5.5: Let G be the teachability graph of a

k2 and weight w2. By step 1, the new edge emanates

wakeup protocol with parameters n , t , 7". There exists a

from a minimal weight component, so wl _< w2. The

graph T which is terminal w.r.t. G, all of whose rooted

weight w of the newly formed component is Wl + w2 + q,

components have size >_ W ~.

and the number of nodes k is kl + ks. We show now

P r o o f : The following procedure constructs T by adding that k >_ min(w, W) ~.

edges one at a time to an initial subgraph To of G un- Clearly, if w~ > W then min(w2, W) ~ = min(w, W) ~

til step 2 fails. The initial subgraph To consists of all and k2 < k, so by the induction hypothesis we are done.

the nodes of G. For each node a there is exactly one Hence, we assume that w~ < W, so also Wl < W. Since

outgoing edge a ~ b in To. We note two facts about Wl < W it follows from L e m m a 5.3 that there exist

To: (1) for every edge a ~ b, a # b, and (2) every positive integers p~ and q~ that satisfy the zigzag in-

component of To has at least one root. Fact (1) follows equalities and q' < (wl(n - - t ) ) / ( t + 2) + 3; hence by

from Lemma 5.1, choosing q = 1 and p = n - t (w = 0); Lemma 5.1 there is an edge of label q~ from any root of

while (2) follows from the fact that the outdegree of any C1 to some node not in C1. Thus, by the minimality

node is exactly one. Also, it follows from (1) that the of q (the weight of the edge in step 2), it follows that

weight and size of any component of To is at least 2. q < q' which implies that q < (wl(n - t ) ) / ( t + 2) + 3;

At any stage of the construction, every component hence,

of the graph built so far will have at least one root.

Added edges always start at a root and end at a node w = wl+w2+q (8)

of a different component. After adding an edge (a,b), _< (~-~-~+1 wx+w2+3 (9)

the components of a and b are joined together into a

single component whose root is the root of b's compo- n-t )

nent, and the weight of the new component is the sum _< (~--~-~+2.5 w l + w 2 . (10)

of the weights of the two original components plus the

label of the edge from a to b. I I I t

Let k x = W l = , a n d k 2 = w 2 =. Then k l < k 2 ' w 1 = k l

I~

,

,O

P r o c e d u r e f o r a d d i n g a n e w e d g e t o T: a n d w 2 = k 2 . We claim that

S t e p 1: Select an arbitrary component C of minimal n t

weight and an arbitrary root a of C. + 2.5) wx + w~

S t e p 2: Find the smallest integer q for which there is n-t +2.5 k1 +k~ (11)

an edge a ~ b in G such that b is not in C. This

step fails if no such edge exists. _< (k; + k;?.

S t e p 3: Place the edge a ~ b into T. It is not difficult to see that since (n - t ) / ( t + 2 ) + 3 . 5 =

,~ , I I I , .

Let Ti be a graph that is constructed after i applications 2 , equahty holds for k x = k 2. As k 2 I s increased to be

I

of the above procedure, where To is an initial graph as larger than kl, the right side increases more rapidly than

defined above. Clearly, any such sequence {T0,T1,...} the left side since /3 > 1; hence, the inequality holds.I

is finite and the last element is terminal w . r . t . G . Finally, by the induction hypothesis, kl > wx ~ = kl

I

tions of the procedure, that for any graph T/, all of the

components of T/ are rooted, and for any rooted com- (k'1 + k'2)z < (kl + k2) ~ = k #. (13)

ponent C it is the case that k _> rain(w, W) ~, k _> 2 Putting equations ( 8 ) - ( 1 3 ) together gives w _< k ~, so

and w >_ 2, where k is the size of C and w is its weight.

k > w ~ > min(w, W) ~. CI

This together with Lemma 5.4 and the fact that W < U

completes the proof. Theorem 5.1 follows immediately from Lemma 5.5.

Let/3 = 1/a. As discussed before, each component C

ofT0 has a root and has size k at least 2. The component

C consists of exactly k edges with label 1, so its weight 6 Relation to Other Problems

is also k. Hence, the base case holds since/3 > 1. In this section we show that there are efficient reduc-

Since To is a subgraph of ~ which also includes all

tions between the wakeup problem for r = [n/2J + 1

nodes of T/, it follows that the size and weight of any and the consensus and leader election problems. Hence,

the wakeup problem can be viewed as a fundamental

113

problem that captures the inherent difficulty of these ble to simultaneously reset all parts of the system to a

two problems. The following Lemma shows that in or- known initial state.

der to decide on some value in a t-resilient consensus Our results are interesting for several reasons:

protocol, it is always necessary (and in some cases also

sufficient) to learn first that at least t + 1 processes have • They give a quantitative measure of the cost of

waked up, and similarly in order to be elected in a t- fault-tolerance in shared memory parallel machines

resilient leader election protocol, it is always necessary in terms of communication bandwidth.

to learn that at least t + 1 processes have waked up. An • They apply to a model which more accurately re-

immediate consequence of the lemma is that there is no flects reality.

consensus or leader election protocol that can tolerate

[n/2~ failures. • They relate recent results from three different ac-

tive research areas in parallel and distributed com-

L e m m a 6.1: (I) Any t-resilient consensus (leader elec-

puting, namely:

tion) protocol is a t-resilient wakeup protocol for any

7" < t + l and p = n - t (p=l); (2) For any t < n/3, - Results in shared memory systems [2, 11, 19,

there exists t-resilient consensus and leader election pro- 31, 36, 38, 39, 46, 50, 51].

tocols which are not t-resilient wakeup protocols for any

- The theory of knowledge in distributed sys-

r>t+2.

tems [6, 14, 15, 17, 18, 22, 28, 25, 29, 30, 26,

T h e o r e m 6.1: Any protocol that solves the wakeup 33, 37, 40, 41].

problem for any t < n/2, r > n / 2 and p = 1, using a

- Self stabilizing protocols [3, 4, 8, 9, 12, 24, 35,

single v-valued shared register, can be transformed into

47].

a t-resilient consensus (leader election) protocol which

uses a single 8v-valued (4v-valued) shared register. • They give a new point of view and enable a deeper

From Theorems 6.1 and 4.4, it follows that for any understanding of some classical problems and re-

t < n/2, there is a t-resilient consensus (leader elec- sults in cooperative computing.

tion) protocol that uses an O(t)-valued shared regis-

• They are proved using techniques that will likely

ter. Next we show that the converse of Theorem 6.1 have application to other problems in distributed

also holds. T h a t is, the existence of a t-resilient con- computing.

sensus or leader election protocol which uses a single

v-valued shared register implies the existence of a t-

resilient wakeup protocol for any r < In/2] + 1 which Acknowledgement

uses a single O(v)-valued shared register.

We thank Joe Halpern for helpful discussions.

T h e o r e m 6.2: Any t-resilient protocol that solves the

consensus or leader election problem using a single

v-valued shared register can be transformed into a t- References

resilient protocol that solves the wakeup problem for any

7- _< [n/2J + 1 which uses a single 4v.valued shared reg- [1] K. Abrahamson. On achieving consensus using

ister. shared memory. In Proc. 7th A CM Syrup. on Prin-

It follows from Theorem 6.2 that the lower bound ciples of Distributed Computing, pages 291-302,

we proved in Section 5 for the wakeup problem when 1988.

r = [n/2J + 1 also applies to the consensus and leader [2] B. Bloom. Constructing two-writer atomic regis-

election problems. Finally, an immediate corollary of ters. In Proc. 6th A C M Syrup. on Principles of

Theorem 6.1 and Theorem 6.2 is that the consensus Distributed Computing, pages 249-259, 1987.

and leader election problems are space-equivalent. T h a t

is, there is a t-resilient consensus protocol that uses an [3] G. M. Brown, M. G. Gouda, and C.-L. Wu. Token

O(t)-valued shared register if["there is a t-resilient leader systems that self-stabilize. 1EEE Trans. on Com-

election protocol that uses an O(t)-valued shared regis- puters, 38(6):845-852, June 1989.

ter.

[4] J. E. Burns and J. Pachl. Uniform self-stablilizing

rings. A C M Trans. on Programming Languages and

7 Conclusions Systems, 11(2):330-344, 1989.

We study the fundamental new wakeup problem in a [5] N. Carriero and D. Gelernter. Linda in context.

new model where all processes are programmed alike, Communications of the A CM, 32(4):444-458, April

there is no global synchronization, and it is not possi- 1989.

114

[6] M. Chandy and J. Misra. How processes learn. [19] M. J. Fischer, N. A. Lynch, J. E. Burns, and

Journal of Distributed Computing, 1:40-52, 1986. A. Borodin. Distributed FIFO allocation of identi-

cal resources using small shared space. A CM Trans.

[7] E. Chang and R. Roberts. An improved algorithm on Programming Languages and Systems, 11(1):90-

for decentralized extrema-finding in circular config- 114, 1989.

uration of processes. Communications of the ACM,

22(5):281-283, 1979. [20] M. J. Fischer, N. A. Lynch, and M. Merritt. Easy

impossibility proofs for distributed consensus prob-

[8] E. W. Dijkstra. Self-stablizing systems in spite of lems. Journal of Distributed Computing, 1:26-39,

distributed control. Communications of the ACM, 1986.

17:643-644, 1974.

[21] M. J. Fischer, N. A. Lynch, and M. S. Paterson. Im-

[9] E. W. Dijkstra. A belated proof of self- possibility of distributed consensus with one faulty

stabilization. Journal of Distributed Computing, process. Journal of the ACM, 32(2):374-382, April

1:5-6, 1986. 1985.

[10] D. Dolev, C. Dwork, and L. Stockmeyer. On the [22] M.J. Fischer and L. D. Zuck. Reasoning about un-

minimal synchronism needed for distributed con- certainty in fault-tolerant distributed systems. In

sensus. Journal of the ACM, 34(1):77-97, 1987. M. Joseph, editor, Formal Techniques in Real-Time

and Fault-Tolerant Systems, pages 142-158. Lec-

[11] D. Dolev, E. Gafifi, and N. Shavit. Toward a non- ture Notes in Computer Science, vol. 331, Springer-

atomic era: /-exclusion as a test case. In Proc. 20th Verlag, 1988.

A CM Syrup. on Theory of Computing, pages 78-92,

1988. [23] A. Gottlieb, R. Grishman, C.P. Kruskal, K.P.

McAuliffe, L. Rudolph, and M. Snir. The NYU

[12] S. Dolev, A. Israeli, and S. Moran. Self stabiliza- ultracomputer--designing an MIMD parallel com-

tion of dynamic systems assuming only read write puter. IEEE Trans. on Computers, pages 175-189,

atomicity, submitted for publication, 1989. February 1984.

[13] C. Dwork, N. Lynch, and L. Stockmeyer. Consen- [24] M. G. Gouda. The stabilizing philosopher: Asym-

sus in the presence of partial synchrony. Journal of metry by memory and by action. Science of Com-

the ACM, 35(2):288-323, 1988. puter Programming, 1989. To appear.

[14] C. Dwork and Y. Moses. Knowledge and common [25] V. Hadzilacos. A knowledge theoretic analysis of

knowledge in a Byzantine environment i: Crash atomic commitment protocols. In Proc. 6th A CM

failures. In Theoretical Aspects of Reasoning about Symp. on Principles of Database Systems, pages

Knowledge: Proceedings of the 1986 Conference, 129-134, 1987.

pages 149-169. Morgan Kaufmann, 1986.

[26] J. Halpern and L. Zuck. A little knowledge goes

115] R. Fagin, Y. J. Halpern, and M. Vardi. A model a long way: Simple knowledge-based derivations

theoretic analysis of knowledge. In Proc. 25th IEEE and correctness proofs for a family of protocols. In

Syrup. on Foundations of Computer Science, pages Proc. 6lh A C M Syrup. on Principles of Distributed

268-278, 1984. Computing, pages 269-280, August 1987.

I16] M. J. Fischer. The consensus problem in unreliable [27] Y. J. Halpern. personal COmlnunicatiou.

distributed systems (a brief survey). In M. Karpin- [28] Y. J. Halpern. Reasoning about knowledge: An

sky, editor, Foundations of Computation Theory, overview. In Theoretical Aspects of Reasoning about

pages 127-140. Lecture Notes in Computer Science, Knowledge: Proceedings of the 1986 Conference,

vol. 158, Springer-Verlag, 1983. pages 1-17. Morgan Kaufrnann, 1986.

[17] M. J. Fischer and N. Immerman. Foundations of [29] Y. J. Halpern and R. Fagin. A formal model

knowledge for distributed systems. In Theoretical of knowledge, action, and communication in dis-

Aspects of Reasoning about Knowledge: Proceed- tributed systems: Preliminary report. In Proc. 4th

ings of the 1986 Conference, pages 171-185. Mor- A C M Syrup. on Principles of Distributed Comput-

gan Kaufmann, March 1986. ing, pages 224-236, 1985.

[28] M. J. Fischer and N. Immerman. Interpreting log- [30] Y. J. Halpern and Y. Moses. Knowledge and com-

ics of knowledge in propositional dynamic logic mon knowledge in a distributed environment. In

with converse. Information Processing Letters, Proc. 3rd ACM Syrup. on Principles of Distributed

25(3):175-181, May 1987. Computing, pages 50-61, 1984.

115

[31] P. M. Herlihy. Impossibility and universality re- [44] G. H. Pfister and et. al. The IBM research par-

sults for wait-free synchronization. In Proc. 7th allel processor prototype (RP3): Introduction and

A CM Symp. on Principles of Distributed Comput- architecture. In Proceedings International Confer-

ing, pages 276-290, 1988. ence on Parallel Processing, 1985.

[32] D. S. Hirschberg and J.B. Sinclair. Decentralized [45] A. G. Ranade, S. N. Bhatt, and S. L. Johnsson.

extrema-finding in circular configuration of pro- The fluent abstract machine. TechnicM Report

cesses. Communications of the ACM, 23:627-628, YALEU/DCS/TR-573, Department of Computer

1980. Science, Yale University, January 1988.

[33] S. Katz and G. Taubenfeld. What processes know: [46] G. Taubenfeld. Leader election in the presence of

Definitions and proof methods. In Proc. 5th ACM n - 1 initial failures. Information Processing Let-

Syrup. on Principles of Distributed Computing, ters, 33:25-28, 1989.

pages 249-262, August 1986.

[47] G. Taubenfeld. Self-stabilizing Petri nets. Techni-

[34] E. Korach, S. Moran, and S. Zaks. Tight lower cal Report YALEU/DCS/TR-707, Department of

and upper bounds for some distributed algorithms Computer Science, Yale University, May 1989.

for a complete network of processors. In Proc. 3rd

ACM Symp. on Principles of Distributed Comput- [483 G. Taubenfeld, S. Katz, and S. Moran. Impossibil-

ing, pages 199-207, 1984. ity results in the presence of multiple faulty pro-

cesses. In Proc. of the 9th FCT-TCS conference,

[35] H. S. M. Kruijer. Self-stabilization (in spite of dis- Bangalore, India, December 1989. Previous version

tributed control) in tree-structured systems. Infor- appeared as Technion TR-#492, January 1988.

mation Processing Letters, 2:91-95, 1979.

[49] G. Taubenfeld, S. Katz, and S. Moran. Initial

[36] L. Lamport. The mutual exclusion problem: State- failures in distributed computations. International

ment and solutions. Journal of the ACM, 33:327- Journal of Parallel Programming, 1990. To appear.

348, 1986. Previous version appeared as Technion TR-#517,

August 1988.

[37] D. Lehmann. Knowledge, common knowledge and

related puzzles. In Proc. 3rd A CM Syrup. on Prin- [50] G. Taubenfeld and S. Moran. Possibility and im-

ciples of Distributed Computing, pages 62-67, 1984. possibility results in a shared memory environment.

In J.C. Bermond and M. Raynal, editors, Proc. 3rd

[38] C. M. Lout and H. Abu-Amara. Memory re- International Workshop on Distributed Algorithms,

quirements for agreement among unreliable asyn- pages 254-267. Lecture Notes in Computer Science,

chronous processes. Advances in Computing Re- vol. 392, Springer-Verlag, 1989.

search, 4:163-183, 1988.

[5i] P. M. B. Vitanyi and B. Awerbuch. Atomic shared

[39] N. A. Lynch and M. J. Fischer. A technique for register access by asynchronous hardware. In Proc.

decomposing algorithms which use a single shared 27th IEEE Symp. on Foundations of Computer Sci-

variable. Journal of Computer and System Sci- ence, pages 223-243, 1986.

ences, 27(3):350-377, December 1983.

[40] Y. Moses and M. R. Tuttle. Programming simulta-

neous actions using common knowledge. Algorilh-

mica, 3:121-169, 1988.

[41] R. Parikh and R. Ramanujam. Distributed pro-

cesses and the logic of knowledge. In R. Parikh,

editor, Proceedings of the Workshop on Logic of

Programs, pages 256-268, 1985.

[42] M. Pease, R. Shostak, and L. Lamport. Reaching

agreement in the presence of faults. Journal of the

ACM, 27(2):228-234, 1980.

[43] G. L. Peterson. An O(nlogn) unidirectional al-

gorithm for the circular extrema problem. ACM

Trans. on Programming Languages and Sys$ems,

4(4):758-762, 1982.

116

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.