
DISTRIBUTED SYSTEMS [COMP9243]
Lecture 4: Synchronisation and Coordination (Part 1)

Slide 1
  Distributed Algorithms
  Time and Clocks
  Global State
  Concurrency Control

DISTRIBUTED ALGORITHMS

Slide 2
Algorithms that are intended to work in a distributed environment
Used to accomplish tasks such as:
  Communication
  Accessing resources
  Allocating resources
  Consensus
  etc.
Synchronisation and coordination inextricably linked to distributed algorithms:
  Achieved using distributed algorithms
  Required by distributed algorithms

SYNCHRONOUS VS ASYNCHRONOUS DISTRIBUTED SYSTEMS

Slide 3
Timing model of a distributed system
Affected by:
  Execution speed/time of processes
  Communication delay
  Clocks & clock drift
  (Partial) failure

Slide 4
Synchronous Distributed System: time variance is bounded
  Execution: bounded execution speed and time
  Communication: bounded transmission delay
  Clocks: bounded clock drift (and differences in clocks)
Effect:
  ✔ Can rely on timeouts to detect failure
  ✔ Easier to design distributed algorithms
  ✘ Very restrictive requirements:
    Limit concurrent processes per processor
    Limit concurrent use of network
    Require precise clocks and synchronisation

Slide 5
Asynchronous Distributed System: time variance is not bounded
  Execution: different steps can have varying duration
  Communication: transmission delays vary widely
  Clocks: arbitrary clock drift
Effect:
  Allows no assumptions about time intervals
  ✘ Cannot rely on timeouts to detect failure
  ✘ Most asynchronous DS problems are hard to solve
  ✔ A solution for an asynchronous DS is also a solution for a synchronous DS
Most real distributed systems are hybrid synchronous and asynchronous

EVALUATING DISTRIBUTED ALGORITHMS

Slide 6
General Properties:
  Performance
    number of messages exchanged
    response/wait time
    delay
    throughput: 1/(delay + execution time)
    complexity: O(...)
  Efficiency
    resource usage: memory, CPU, etc.
  Scalability
  Reliability
    number of points of failure (low is good)

SYNCHRONISATION AND COORDINATION

Slide 7
Important:
  Doing the right thing at the right time.
Two fundamental issues:
  Coordination (the right thing)
  Synchronisation (the right time)

SYNCHRONISATION

Slide 8
Ordering of all actions:
  Total ordering of events
  Total ordering of instructions
  Total ordering of communication
  Ordering of access to resources
Requires some concept of time

COORDINATION

Slide 9
Coordinate actions and agree on values.
Coordinate Actions:
  What actions will occur
  Who will perform actions
Agree on Values:
  Agree on global value
  Agree on environment
  Agree on state

MAIN ISSUES

Slide 10
  Time and Clocks: synchronising clocks and using time in distributed algorithms
  Global State: how to acquire knowledge of the system's global state
  Concurrency Control: coordinating concurrent access to resources
  Coordination: when do processes need to coordinate and how do they do it

TIME AND CLOCKS

Slide 11

TIME

Slide 12
Global Time:
  Absolute time
    Einstein says there is no absolute time
    Absolute enough for our purposes
  Astronomical time
    Based on the earth's rotation
    Not stable
  International Atomic Time (IAT)
    Based on oscillations of Cesium-133
  Coordinated Universal Time (UTC)
    Leap seconds
    Signals broadcast over the world

PHYSICAL CLOCKS

Slide 13
Synchronisation Using Physical Clocks:
  Local Time:
    Not synchronised to a global source
    Relative, not absolute
  Global Time:
    Based on actual time
  Timestamps:
    Used to denote at which time an event occurred
    Logical vs physical

CLOCKS

Slide 14
Computer Clocks:
  Crystal oscillates at known frequency
  Oscillations cause timer interrupts
  Timer interrupts update clock
  Cp(t): current time (at UTC time t) on machine p
  Ideally Cp(t) = t

Slide 15
Examples:
  Performing events at an exact time (turn lights on/off, lock/unlock gates)
  Logging of events (for security, for profiling, for debugging)
  Tracking (tracking a moving object with separate cameras)
  Make (edit on one computer, build on another)

Slide 16
Clock Skew:
  Crystals in different computers run at slightly different rates
  Clocks get out of sync
  Maximum drift rate
  ✘ Clock skew causes clocks to drift
  Must regularly synchronise with UTC

SYNCHRONISING PHYSICAL CLOCKS


Slide 17
Internal Synchronisation:
  Clocks synchronise locally
  Only synchronised with each other
External Synchronisation:
  Clocks synchronise to an external time source
  Synchronise with UTC every δ seconds

Slide 18
Time Server:
  Server that has the correct time
  Server that calculates the correct time

CRISTIAN'S ALGORITHM

Slide 19
Time Server:
  Has UTC receiver
  Passive
Algorithm:
  Clients periodically request the time
  Don't set the time backward
  Take propagation and interrupt handling delay into account: (T1 - T0)/2
  Or take a series of measurements and average the delay
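To make the (T1 - T0)/2 correction concrete, here is a minimal client-side sketch; the request_server_time() RPC is a placeholder for whatever transport is used, not something defined on the slides.

    # Minimal sketch of Cristian's algorithm, client side (illustrative only;
    # request_server_time() is a hypothetical RPC returning the server's UTC time).
    import time

    def cristian_sync(request_server_time):
        t0 = time.monotonic()                 # T0: when the request is sent
        server_time = request_server_time()   # server reads its UTC clock
        t1 = time.monotonic()                 # T1: when the reply arrives
        # Assume the reply spent about half the round trip in transit: (T1 - T0)/2
        return server_time + (t1 - t0) / 2    # use this to adjust; never step backward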

BERKELEY ALGORITHM

Slide 20
[Figure: the time daemon at 3:00 and the other machines at 3:00, 2:50 and 3:25;
 (a) the daemon asks all the other machines for their clock values, (b) they answer
 with their differences (0, -10, +25), (c) the daemon averages the answers (+5) and
 tells everyone how to adjust their clock (+5, +15, -20), so all machines end up at 3:05.]
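A sketch of one round of this averaging, run by the time daemon; the figure's numbers are used as the example (illustrative only, clocks are plain numbers here).

    # Sketch of one round of the Berkeley algorithm, as run by the time daemon.
    def berkeley_round(daemon_time, other_times):
        diffs = [t - daemon_time for t in other_times]   # (b) answers: 0, -10, +25
        avg = sum(diffs) / len(diffs)                     # e.g. (0 - 10 + 25) / 3 = +5
        adjustments = [avg - d for d in diffs]            # (c) sent to each machine
        return avg, adjustments                           # the daemon adjusts itself by avg

    # Figure's values in minutes: 3:00 -> 180, 2:50 -> 170, 3:25 -> 205
    print(berkeley_round(180, [180, 170, 205]))           # -> (5.0, [5.0, 15.0, -20.0])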

NETWORK TIME PROTOCOL (NTP)

Hierarchy of Servers:
  Primary server: has UTC clock
  Secondary server: connected to a primary
  etc.
Synchronisation Modes:
  Multicast: for LANs, low accuracy
  Procedure call: clients poll, reasonable accuracy
  Symmetric: between peer servers, highest accuracy

Slide 21
Synchronisation:
  Estimate clock offsets and transmission delays between two nodes
  Keep estimates for past communication
  Choose the offset estimate with the lowest transmission delay
  Also determine unreliable servers
  Accuracy: 1 - 50 msec
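The per-exchange estimates mentioned above are usually computed as follows; these are the standard NTP formulas, which the slide does not spell out.

    # Standard NTP-style estimate for one request/response exchange:
    # t0 = client send, t1 = server receive, t2 = server send, t3 = client receive.
    def ntp_estimate(t0, t1, t2, t3):
        offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the server's clock is ahead of ours
        delay = (t3 - t0) - (t2 - t1)          # round-trip network delay
        return offset, delay

    # NTP keeps several (offset, delay) pairs per peer and trusts the offset that
    # arrived with the smallest delay, as described above.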

LOGICAL CLOCKS

Slide 22
Event ordering is more important than physical time:
  Events (e.g., state changes) in a single process are ordered
  Processes need to agree on the ordering of causally related events
  (e.g., message send and receive)
Local ordering:
  System consists of N processes pi, i ∈ {1, ..., N}
  Local event ordering →i: if pi observes e before e', we have e →i e'
Global ordering:
  Leslie Lamport's "happened before" relation →
  Smallest relation such that:
    1. e →i e' implies e → e'
    2. For every message m, send(m) → receive(m)
    3. Transitivity: e → e' and e' → e'' implies e → e''

Slide 23
The relation → is a partial order:
  If a → b, then a causally affects b
  We consider unordered events to be concurrent:
    Example: a ↛ b and b ↛ a implies a ∥ b
[Figure: two processes along real time; P1 with events E11, E12, E13, E14 and
 P2 with events E21, E22, E23, E24, exchanging messages.]
  Causally related: E11 → E12, E13, E14, E23, E24, ...  and  E21 → E22, E23, E24, E13, E14, ...
  Concurrent: E11 ∥ E21, E12 ∥ E22, E13 ∥ E23, E11 ∥ E22, E13 ∥ E24, E14 ∥ E23, ...

Slide 24
Lamport's logical clocks:
  Software counter to locally compute the happened-before relation
  Each process pi maintains a logical clock Li
  Lamport timestamps:
    Li(e): timestamp of event e at pi
    L(e): timestamp of event e at the process it occurred at
Implementation:
  Before timestamping a local event, pi executes Li := Li + 1
  Whenever a message m is sent from pi to pj:
    pi executes Li := Li + 1 and sends Li with m
    pj receives Li with m and executes Lj := max(Lj, Li) + 1
    (receive(m) is annotated with the new Lj)
Properties:
  a → b implies L(a) < L(b)
  L(a) < L(b) does not necessarily imply a → b
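The implementation rules above fit in a few lines; a minimal, illustrative sketch:

    # Minimal sketch of a Lamport clock for one process (illustrative only).
    class LamportClock:
        def __init__(self):
            self.L = 0

        def local_event(self):
            self.L += 1                      # Li := Li + 1 before timestamping
            return self.L

        def send(self):
            self.L += 1                      # tick, then piggyback Li on the message
            return self.L

        def receive(self, L_msg):
            self.L = max(self.L, L_msg) + 1  # Lj := max(Lj, Li) + 1
            return self.L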


Slide 25
Example:
[Figure: Lamport clocks at two processes; P1's events E11-E17 carry timestamps 1-7,
 P2's events E21-E25 carry timestamps 1, 2, 3, 4, 7 (the jump to 7 at E25 reflects
 a message received from P1).]
Total event ordering:
  Complete the partial order to a total order by including process identifiers
  Given local timestamps Li(e) and Lj(e'), we define global timestamps (Li(e), i) and (Lj(e'), j)
  Lexicographical ordering: (Li(e), i) < (Lj(e'), j) iff Li(e) < Lj(e') or (Li(e) = Lj(e') and i < j)
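In code, this total order is just a lexicographic comparison of (timestamp, process id) pairs; a one-line sketch:

    # Total order from Lamport timestamps: compare (L, process_id) lexicographically.
    def totally_before(a, b):      # a, b are (timestamp, process_id) pairs
        return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])

    assert totally_before((3, 1), (3, 2))   # equal clocks broken by process identifier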

VECTOR CLOCKS

Slide 26
Main shortcoming of Lamport's clocks:
  L(a) < L(b) does not imply a → b
  We cannot deduce causal dependencies from timestamps:
[Figure: three processes P1, P2, P3 with Lamport timestamps; e.g. E11 at P1 has
 timestamp 1 and E33 at P3 has timestamp 3.]
  We have L1(E11) < L3(E33), but E11 ↛ E33
  Why?
    Clocks advance independently or via messages
    There is no history of where advances come from

Slide 27
Vector clocks:
  At each process, maintain a clock for every other process
  I.e., each clock Vi is a vector of size N
  Vi[j] contains i's knowledge about j's clock
Implementation:
  Initially, Vi[j] := 0 for i, j ∈ {1, ..., N}
  Before pi timestamps an event, Vi[i] := Vi[i] + 1
  Whenever a message m is sent from pi to pj:
    pi executes Vi[i] := Vi[i] + 1 and sends Vi with m
    pj receives Vi with m and merges the vector clocks Vi and Vj:
      Vj[k] := max(Vj[k], Vi[k]) + 1   if j = k
      Vj[k] := max(Vj[k], Vi[k])       otherwise

Slide 28
Properties:
  For all i, j: Vi[i] ≥ Vj[i]
  a → b iff V(a) < V(b), where
    V = V'  iff V[i] = V'[i] for i ∈ {1, ..., N}
    V ≤ V'  iff V[i] ≤ V'[i] for i ∈ {1, ..., N}
    V < V'  iff V ≤ V' and V ≠ V'
    V ∥ V'  iff neither V > V' nor V' > V
Example:
[Figure: three processes annotated with both Lamport and vector timestamps;
 P1: E11 1 (1,0,0), E12 2 (2,0,0), E13 6 (3,4,1);
 P2: E21 1 (0,1,0), E22 3 (2,2,0), E23 4 (2,3,1), E24 5 (2,4,1);
 P3: E31 1 (0,0,1), E32 2 (0,0,2).]
  For L1(E12) and L3(E32): 2 = 2, versus (2,0,0) ∥ (0,0,2)
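A minimal sketch of the vector-clock rules above, using 0-based process indices (illustrative only):

    # Minimal sketch of a vector clock for process index i (0-based; the slides use 1..N).
    class VectorClock:
        def __init__(self, i, n):
            self.i = i
            self.V = [0] * n

        def tick(self):                      # before timestamping a local event
            self.V[self.i] += 1
            return list(self.V)

        def send(self):                      # tick, then piggyback the vector on the message
            return self.tick()

        def receive(self, V_msg):            # merge element-wise, then tick own entry
            self.V = [max(a, b) for a, b in zip(self.V, V_msg)]
            self.V[self.i] += 1
            return list(self.V)

    def happened_before(Va, Vb):             # V(a) < V(b): <= everywhere, and not equal
        return all(x <= y for x, y in zip(Va, Vb)) and Va != Vb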


GLOBAL STATE

Slide 29
Determining global properties:
  We need to combine information from multiple nodes
  Without global time, how do we know whether collected local information is consistent?
  Local state sampled at arbitrary points in time surely is not consistent
  We need a criterion for what constitutes a globally consistent collection of local information

Slide 30
Determining global properties:
  Distributed garbage collection: do any references exist to a given object?
  Distributed deadlock detection: do processes wait in a cycle for each other?
  Distributed termination detection: did a set of processes cease all activity?
  (Consider messages in transit!)

CONSISTENT CUTS

Slide 31
Local history:
  N processes pi, i ∈ {1, ..., N}
  For each pi, the event series e_i^0, e_i^1, e_i^2, ... is called pi's history, denoted by h_i
  May be finite or infinite
  We denote by h_i^k a k-prefix of h_i
  Each event e_i^j is either a local event or a communication event

Slide 32
Process state:
  State of process pi immediately before event e_i^k is denoted s_i^k
  State s_i^k records all events included in the history h_i^(k-1)
  Hence, s_i^0 refers to pi's initial state


Slide 33
Global history and state:
  Using a total event ordering, we can merge all local histories into a global history:
    H = h_1 ∪ h_2 ∪ ... ∪ h_N
  Similarly, we can combine a set of local states s_1, ..., s_N into a global state:
    S = (s_1, ..., s_N)
  Which combination of local states is consistent?

Slide 34
Cuts:
  Similar to the global history, we can define cuts based on k-prefixes:
    C = h_1^(c_1) ∪ h_2^(c_2) ∪ ... ∪ h_N^(c_N)
  where h_i^(c_i) is the history of pi up to and including event e_i^(c_i)
  The cut C corresponds to the state S = (s_1^(c_1+1), ..., s_N^(c_N+1))
  The final events in a cut are its frontier: {e_i^(c_i) | i ∈ {1, ..., N}}

Slide 35
[Figure: three processes P1, P2, P3 exchanging messages (send events s and receive
 events r, e.g. s_3^0, r_3^0, s_3^1, r_3^1 on P3), with two cuts, cut 1 and cut 2,
 drawn across the three processes.]
Consistent cut:
  We call a cut C consistent iff, for all events e ∈ C, e' → e implies e' ∈ C
  A global state is consistent if it corresponds to a consistent cut

Slide 36
Note: we can characterise the execution of a system as a sequence of consistent
global states: S0 → S1 → S2 → ...
Linearisation:
  A global history that is consistent with the happened-before relation is also
  called a linearisation or consistent run
  A linearisation only passes through consistent global states
  A state S' is reachable from a state S if there is a linearisation that passes
  through S and then S'
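Consistency of a cut can be checked from the vector timestamps of its frontier events; a minimal sketch (this vector-clock characterisation is standard, but it is not stated on the slides):

    # A cut given by frontier vector timestamps (one per process) is consistent iff
    # no frontier event has seen more of another process's history than that process
    # itself contributes to the cut. Illustrative only.
    def consistent_cut(frontier):            # frontier[i] = vector clock of pi's frontier event
        n = len(frontier)
        return all(frontier[i][j] <= frontier[j][j]
                   for i in range(n) for j in range(n))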

CHANDY & LAMPORT'S SNAPSHOTS


Slide 37
  Determines a consistent global state
  Takes care of messages that are in transit
  Useful for evaluating stable global properties
Properties:
  Reliable communication and failure-free processes
  Point-to-point message delivery is ordered
  Process/channel graph must be strongly connected
  On termination, processes hold only their local state components and a set of
  messages that were in transit during the snapshot

Slide 38
Outline of the algorithm:
  One process initiates the algorithm by recording its local state and sending a
  marker message over each outgoing channel
  On receipt of a marker message over incoming channel c:
    if local state not yet saved, save local state and send marker messages, or
    if local state already saved, the channel snapshot for c is complete
  Local contribution complete after markers received on all incoming channels

Slide 39
[Figure: three processes P1, P2 and P3 connected by channels, with messages m1, m2
 and m3 in transit between them while the snapshot is taken.]

Slide 40
Result for each process:
  One local state snapshot
  For each incoming channel, a set of messages received after performing the local
  snapshot and before the marker came down that channel
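A per-process sketch of the marker rules from the outline above; channel names, the send callback and message delivery are assumptions made for illustration, not part of the slides.

    # Sketch of one process's role in the Chandy & Lamport snapshot (illustrative only).
    class SnapshotProcess:
        def __init__(self, incoming, outgoing, send, local_state):
            self.incoming = incoming            # names of incoming channels
            self.outgoing = outgoing            # names of outgoing channels
            self.send = send                    # send(channel, message) callback
            self.local_state = local_state      # callable returning the current local state
            self.recorded_state = None
            self.channel_msgs = {}              # channel -> messages in transit during snapshot
            self.recording = set()              # incoming channels not yet closed by a marker

        def initiate(self):
            self._record_and_send_markers()

        def on_message(self, channel, msg):
            if msg == "MARKER":
                if self.recorded_state is None:
                    self._record_and_send_markers()      # first marker: save local state
                self.recording.discard(channel)          # marker closes this channel's snapshot
                # local contribution complete when self.recording becomes empty
            elif self.recorded_state is not None and channel in self.recording:
                self.channel_msgs[channel].append(msg)   # message was in transit during snapshot

        def _record_and_send_markers(self):
            self.recorded_state = self.local_state()
            self.recording = set(self.incoming)
            self.channel_msgs = {c: [] for c in self.incoming}
            for c in self.outgoing:
                self.send(c, "MARKER")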


CONCURRENCY

Slide 41
Concurrency in a Non-Distributed System:
Typical OS and multithreaded programming problems:
  Prevent race conditions
  Critical sections
  Mutual exclusion
  Locks
  Semaphores
  Monitors
  Must apply mechanisms correctly
    Deadlock
    Starvation

Slide 42
Concurrency in a Distributed System:
Distributed systems introduce more challenges:
  No directly shared resources (e.g., memory)
  No global state
  No global clock
  No centralised algorithms
  More concurrency

DISTRIBUTED MUTUAL EXCLUSION

Slide 43
  Concurrent access to distributed resources
  Must prevent race conditions during critical regions
Requirements:
  Safety: at most one process may execute the critical section at a time
  Liveness: requests to enter and exit the critical section eventually succeed
  Ordering: requests are processed in happened-before ordering

METHOD 1: CENTRAL SERVER

Slide 44
Simplest approach:
  Requests to enter and exit a critical section are sent to a lock server
  Permission to enter is granted by receiving a token
  When the critical section is left, the token is returned to the server
[Figure: three processes and a coordinator; (a) one process sends a Request and is
 granted OK because the coordinator's queue is empty; (b) a second process sends a
 Request while the first is still inside, gets no reply and is queued; (c) the first
 process sends Release and the coordinator then sends OK to the queued process.]
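A minimal sketch of the coordinator's side of this scheme; the grant callback stands in for whatever messaging is used (illustrative only).

    # Sketch of the central lock server (coordinator); illustrative only.
    from collections import deque

    class LockServer:
        def __init__(self, grant):
            self.grant = grant            # grant(pid): deliver the OK / token to a process
            self.holder = None
            self.queue = deque()

        def request(self, pid):
            if self.holder is None:
                self.holder = pid
                self.grant(pid)           # queue is empty: grant immediately
            else:
                self.queue.append(pid)    # otherwise no reply: queue the request

        def release(self, pid):
            assert pid == self.holder
            self.holder = self.queue.popleft() if self.queue else None
            if self.holder is not None:
                self.grant(self.holder)   # hand the token to the next waiter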


Slide 45
Properties:
  Easy to implement
  Does not scale well
  Central server may fail

METHOD 2: TOKEN RING

Slide 46
Implementation:
  All processes are organised in a logical ring structure
  A token message is forwarded along the ring
  Before entering the critical section, a process has to wait until the token comes by
  Must retain the token until the critical section is left
[Figure: (a) processes on a network, (b) the logical ring along which the token circulates.]

Slide 47
Properties:
  Ring imposes an average delay of N/2 hops (limits scalability)
  Token messages consume bandwidth
  Failing nodes or channels can break the ring (the token might be lost)
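A sketch of one ring participant; the forward callback stands in for sending to the ring successor (illustrative only).

    # Sketch of a token-ring participant; illustrative only.
    class RingNode:
        def __init__(self, forward):
            self.forward = forward        # forward(msg): send to the next node in the ring
            self.wants_cs = False
            self.in_cs = False

        def on_token(self):
            if self.wants_cs:
                self.in_cs = True         # retain the token while in the critical section
            else:
                self.forward("TOKEN")     # not interested: pass the token on immediately

        def exit_cs(self):
            self.in_cs = False
            self.wants_cs = False
            self.forward("TOKEN")         # leaving the critical section releases the token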

METHOD 3: USING MULTICASTS AND LOGICAL CLOCKS

Slide 48
Algorithm by Ricart & Agrawala:
  Processes pi maintain a Lamport clock and can communicate pairwise
  Processes are in one of three states:
    1. Released: outside of critical section
    2. Wanted: waiting to enter critical section
    3. Held: inside critical section

Slide 49
Process behaviour:
  If a process wants to enter, it multicasts a message (Li, pi) and waits until it
  has received a reply from every process
  If a process is in Released, it immediately replies to any request to enter the
  critical section
  If a process is in Held, it delays replying until it is finished with the critical section
  If a process is in Wanted, it replies to a request immediately only if the requesting
  timestamp is smaller than the one in its own request
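A sketch of these rules for one participant; the send callback and peer list are assumptions, and requests are compared as (timestamp, pid) pairs using the total order defined earlier (illustrative only).

    # Sketch of a Ricart & Agrawala participant; illustrative only.
    class RicartAgrawala:
        def __init__(self, pid, peers, send):
            self.pid, self.peers, self.send = pid, peers, send  # send(dest, msg)
            self.clock = 0
            self.state = "RELEASED"
            self.my_request = None            # (timestamp, pid) of our pending request
            self.deferred = []                # requesters we have not yet replied to
            self.replies_pending = set()

        def request_cs(self):
            self.state = "WANTED"
            self.clock += 1
            self.my_request = (self.clock, self.pid)
            self.replies_pending = set(self.peers)
            for p in self.peers:
                self.send(p, ("REQUEST", self.my_request))
            # the critical section is entered once replies_pending becomes empty

        def on_request(self, sender, req):
            self.clock = max(self.clock, req[0]) + 1
            defer = (self.state == "HELD" or
                     (self.state == "WANTED" and self.my_request < req))
            if defer:
                self.deferred.append(sender)   # reply later, when we leave the CS
            else:
                self.send(sender, ("REPLY",))

        def on_reply(self, sender):
            self.replies_pending.discard(sender)
            if self.state == "WANTED" and not self.replies_pending:
                self.state = "HELD"            # every peer has agreed: enter the CS

        def release_cs(self):
            self.state = "RELEASED"
            for p in self.deferred:
                self.send(p, ("REPLY",))
            self.deferred = []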

Slide 50
[Figure: processes 0, 1 and 2; 0 and 2 both multicast a request at about the same
 time, with timestamps 8 and 12; (a) both requests reach everyone, (b) process 1,
 which is not interested, replies OK to both, and 2 replies OK to 0 since 0's
 timestamp is lower (8 < 12), so 0 enters the critical region, (c) when 0 is done
 it sends its deferred OK to 2, which then enters the critical region.]
Properties:
  Multicast leads to increasing overhead
  (a more sophisticated algorithm using only subsets of peer processes exists)
  Susceptible to faults

EVALUATING DISTRIBUTED ALGORITHMS

Slide 51
General Properties:
  Performance
    number of messages exchanged
    response/wait time
    delay
    throughput: 1/(delay + execution time)
    complexity: O(...)
  Efficiency
    resource usage: memory, CPU, etc.
  Scalability
  Reliability
    number of points of failure (low is good)

MUTUAL EXCLUSION: A COMPARISON

Slide 52
Messages Exchanged (per entry/exit of critical section):
  Centralised: 3
  Ring: 1 to ∞
  Multicast: 2(n - 1)
Delay (before entering critical section):
  Centralised: 2
  Ring: 0 to n - 1
  Multicast: 2(n - 1)
Reliability (problems that may occur):
  Centralised: coordinator crashes
  Ring: lost token, process crashes
  Multicast: any process crashes
