Distributed Algorithms: Time and Clocks, Global State, Concurrency Control (Part 1)

DISTRIBUTED ALGORITHMS

Algorithms that are intended to work in a distributed environment. Used to accomplish tasks such as:
- Communication
- Accessing resources
Synchronous Distributed System: time variance is bounded
- Execution: bounded execution speed and time
- Communication: bounded transmission delay
- Clocks: bounded clock drift (and differences in clocks)
Effect:
✓ Can rely on timeouts to detect failure
✓ Easier to design distributed algorithms
✗ Very restrictive requirements:
  - Limit concurrent processes per processor
  - Limit concurrent use of the network
  - Require precise clocks and synchronisation

Asynchronous Distributed System: time variance is not bounded
- Execution: different steps can have varying duration
- Communication: transmission delays vary widely
- Clocks: arbitrary clock drift
Effect: allows no assumption about time intervals
✗ Cannot rely on timeouts to detect failure
✗ Most asynchronous DS problems are hard to solve
✓ A solution for an asynchronous DS is also a solution for a synchronous DS
Most real distributed systems are a hybrid of synchronous and asynchronous.
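The timeout reasoning above can be made concrete. A minimal sketch of timeout-based failure detection; the `monitor` helper and the queue-as-channel are illustrative names, not from the slides:

```python
import queue

def monitor(heartbeats: "queue.Queue[str]", timeout: float) -> str:
    """Wait for the next heartbeat; suspect the peer has failed if none
    arrives within `timeout` seconds. This is only sound in a synchronous
    system, where transmission delay is bounded; in an asynchronous system
    a late heartbeat is indistinguishable from a crashed peer."""
    try:
        heartbeats.get(timeout=timeout)
        return "alive"
    except queue.Empty:
        return "suspected failed"

hb: "queue.Queue[str]" = queue.Queue()
hb.put("beat")                      # one heartbeat has been delivered
print(monitor(hb, timeout=0.05))    # -> alive
print(monitor(hb, timeout=0.05))    # -> suspected failed (no more beats)
```

The second call illustrates the asynchronous-system caveat: the monitor cannot tell a slow peer from a dead one.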
SYNCHRONISATION AND COORDINATION

Important: doing the right thing at the right time. Two fundamental issues:
- Coordination (the right thing)
- Synchronisation (the right time)

Evaluating distributed algorithms:
- Performance: throughput: 1/(delay + execution time); complexity: O()
- Efficiency: resource usage (memory, CPU, etc.)
- Scalability
- Reliability: number of points of failure (low is good)

SYNCHRONISATION

Ordering of all actions:
- Total ordering of events
- Total ordering of instructions
- Total ordering of communication
- Ordering of access to resources
Requires some concept of time.
COORDINATION

Coordinate actions and agree on values.

Coordinate Actions:
- What actions will occur

Agree on Values:
- Agree on a global value
- Agree on environment
- Agree on state

TIME AND CLOCKS
TIME

Time standards:
- Time based on the earth's rotation: not stable
- International Atomic Time (IAT): based on oscillations of Cesium-133
- Coordinated Universal Time (UTC): IAT adjusted with leap seconds; signals broadcast over the world

Examples where synchronised time matters:
- Performing events at an exact time (turn lights on/off, lock/unlock gates)
- Logging of events (for security, for profiling, for debugging)
- Tracking (tracking a moving object with separate cameras)
- Make (edit on one computer, build on another)

CLOCKS

Computer Clocks:
- Crystal oscillates at a known frequency
- Oscillations cause timer interrupts
- Timer interrupts update the clock

Local Time:
- Not synchronised to a global source
- Relative, not absolute

Clock Skew:
- Crystals in different computers run at slightly different rates
- Clocks get out of sync
- Maximum drift rate
- Ideally Cp(t) = t, but clock skew causes clocks to drift
- Must regularly synchronise with UTC

Timestamps:
- Used to denote at which time an event occurred
- Logical vs physical
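The maximum drift rate bounds how often clocks must be resynchronised: a clock with drift rate rho gains or loses at most rho seconds per second, so two such clocks can diverge at 2·rho, and keeping their mutual skew below delta requires resynchronising at least every delta/(2·rho) seconds. A sketch of this standard bound (the function name is mine):

```python
def resync_interval(delta: float, rho: float) -> float:
    """Longest time between synchronisations that keeps two clocks with
    maximum drift rate rho within delta of each other: in the worst case
    they drift in opposite directions, diverging at 2 * rho."""
    return delta / (2 * rho)

# Drift rate 10^-5 (roughly 0.9 s/day), target skew 1 ms:
print(resync_interval(1e-3, 1e-5))  # -> 50.0 (resync at least every 50 s)
```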
PHYSICAL CLOCKS

Synchronisation Using Physical Clocks:

External Synchronisation:
- Clocks synchronise to an external time source
- Synchronise with UTC every δ seconds

Time Server:
- A server that has the correct time
- Or a server that calculates the correct time

CRISTIAN'S ALGORITHM

Time Server:
- Has a UTC receiver
- Passive

Algorithm:
- Clients periodically request the time
- Don't set time backward
- Take propagation and interrupt handling delay into account: (T1 − T0)/2
- Or take a series of measurements and average the delay
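The (T1 − T0)/2 correction can be sketched as a pure calculation; the exchange below is simulated with hypothetical values, since the real algorithm queries a UTC time server:

```python
def cristian_estimate(t0: float, server_time: float, t1: float) -> float:
    """The client records t0 when sending the request and t1 when the reply
    arrives (both on its own clock); `server_time` is what the server
    reported. Assuming symmetric delays, the reply travelled (t1 - t0) / 2,
    so the server's clock now reads approximately this value."""
    return server_time + (t1 - t0) / 2

# Simulated exchange: server clock 10.0 s ahead, one-way delay 0.2 s.
t0, server_time, t1 = 100.0, 110.2, 100.4
print(cristian_estimate(t0, server_time, t1))  # -> 110.4
```

Per the "don't set time backward" rule, a client whose clock is ahead would slow its clock gradually rather than step it back.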
BERKELEY ALGORITHM

Algorithm:
- A time daemon actively polls all machines for their clock values
- The daemon averages the answers (its own included)
- It tells each machine how to adjust its clock to reach the average

[Figure: (a) the time daemon (at 3:00) polls machines at 3:25 and 2:50; (b) the machines answer; (c) the average is 3:05, so the daemon tells the machines to adjust by +5, −20, and +15 respectively.]
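The averaging step can be sketched directly; the function name is mine, and the values reproduce the figure's example:

```python
def berkeley_adjustments(daemon_time: float, clocks: list) -> list:
    """The time daemon averages its own reading with the polled clocks and
    returns the per-machine adjustments (first entry: the daemon itself)."""
    readings = [daemon_time] + clocks
    avg = sum(readings) / len(readings)
    return [avg - r for r in readings]

# The figure's example, in minutes past 2:00 (3:00 = 60, 3:25 = 85, 2:50 = 50):
print(berkeley_adjustments(60, [85, 50]))  # -> [5.0, -20.0, 15.0]
```

The average is 65 minutes (3:05), so the machines adjust by +5, −20, and +15, matching the figure.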
NETWORK TIME PROTOCOL (NTP)

Synchronisation modes:
- Procedure call: clients poll; reasonable accuracy
- Symmetric: between peer servers; highest accuracy

Synchronisation:
- Estimate clock offsets and transmission delays between two nodes
- Keep estimates for past communication
- Choose the offset estimate with the lowest transmission delay
- Also determine unreliable servers
- Accuracy: 1–50 msec
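The offset/delay estimation and the lowest-delay filter can be sketched as follows, using the standard NTP formulas over the four message timestamps (the sample values are hypothetical):

```python
def offset_and_delay(t1, t2, t3, t4):
    """t1: request sent (client clock), t2: request received and t3: reply
    sent (server clock), t4: reply received (client clock). Standard NTP
    estimates of the clock offset and the round-trip delay."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

def best_offset(samples):
    """Keep (offset, delay) estimates from past exchanges and trust the
    offset measured with the lowest delay: it is the least distorted."""
    return min(samples, key=lambda s: s[1])[0]

# Server 5.0 s ahead. The second exchange suffers a slow reply path.
s1 = offset_and_delay(0.0, 5.1, 5.2, 0.3)  # offset ~5.0, delay ~0.2
s2 = offset_and_delay(1.0, 6.1, 6.2, 2.0)  # larger delay: less trustworthy
print(round(best_offset([s1, s2]), 6))     # -> 5.0
```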
LOGICAL CLOCKS

Event ordering is more important than physical time:
- Events (e.g., state changes) in a single process are ordered
- Processes need to agree on the ordering of causally related events (e.g., message send and receive)

Local ordering:
- The system consists of N processes pi, i ∈ {1, ..., N}
- e →i e′ if event e occurs before event e′ in process pi

Global ordering:
- Leslie Lamport's happened-before relation →
- The smallest relation such that:
  1. e →i e′ implies e → e′
  2. For every message m, send(m) → receive(m)
  3. Transitivity: e → e′ and e′ → e″ implies e → e″

Example (processes P1 with events E11, E12, ... and P2 with events E21, E22, ...):
- Causally related: E11 → E12, E13, E14, E23, E24, ...; E21 → E22, E23, E24, E13, E14, ...
- Concurrent: E11 ∥ E21, E12 ∥ E22, E13 ∥ E23, E11 ∥ E22, E13 ∥ E24, E14 ∥ E23, ...

Implementation (Lamport clocks):
- Before timestamping a local event, pi executes Li := Li + 1
- Whenever a message m is sent from pi to pj:
  - pi executes Li := Li + 1 and sends Li with m
  - pj receives Li with m and executes Lj := max(Lj, Li) + 1 (receive(m) is annotated with the new Lj)

Properties:
- a → b implies L(a) < L(b)
- L(a) < L(b) does not necessarily imply a → b
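The three Lamport-clock rules above can be sketched in a few lines; the class and method names are mine:

```python
class LamportClock:
    """Minimal Lamport clock following the update rules above."""
    def __init__(self):
        self.time = 0

    def tick(self):                 # before timestamping a local event
        self.time += 1
        return self.time

    def send(self):                 # increment, then attach Li to the message
        self.time += 1
        return self.time

    def receive(self, msg_time):    # Lj := max(Lj, Li) + 1
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
p1.tick()                # local event on p1 -> 1
ts = p1.send()           # send event -> 2, message carries 2
p2.tick()                # local event on p2 -> 1
print(p2.receive(ts))    # -> 3, i.e. max(1, 2) + 1: receive ordered after send
```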
Example:

[Figure: P1's events E11–E17 carry Lamport times 1–7; P2's events E21–E24 carry times 1–4, and E25 jumps to 7 after a receive.]

Total ordering:
- Complete the partial order to a total order by including process identifiers
- Given local timestamps Li(e) and Lj(e′), we define global timestamps ⟨Li(e), i⟩ and ⟨Lj(e′), j⟩
- Lexicographical ordering: ⟨Li(e), i⟩ < ⟨Lj(e′), j⟩ iff Li(e) < Lj(e′), or Li(e) = Lj(e′) and i < j

VECTOR CLOCKS

Main shortcoming of Lamport's clocks:
- L(a) < L(b) does not imply a → b
- We cannot deduce causal dependencies from timestamps

[Figure: three processes P1–P3 with local Lamport times; e.g., L1(E11) = 1 and L3(E33) = 3.]

- We have L1(E11) < L3(E33), but E11 → E33 does not hold
- Why? Clocks advance independently or via messages; there is no history as to where the advances come from

Vector clocks:
- At each process, maintain a clock for every other process
- I.e., each clock Vi is a vector of size N
- Vi[j] contains i's knowledge about j's clock

Implementation:
- Initially, Vi[j] := 0 for all i, j ∈ {1, ..., N}
- Before pi timestamps an event, Vi[i] := Vi[i] + 1
- Whenever a message m is sent from pi to pj:
  - pi executes Vi[i] := Vi[i] + 1 and sends Vi with m
  - pj receives Vi with m and merges the vector clocks Vi and Vj:
    Vj[k] := max(Vj[k], Vi[k]) + 1, if j = k
    Vj[k] := max(Vj[k], Vi[k]), otherwise

Properties:
- For all i, j: Vi[i] ≥ Vj[i]
- a → b iff V(a) < V(b), where:
  - V = V′ iff V[i] = V′[i] for all i ∈ {1, ..., N}
  - V ≤ V′ iff V[i] ≤ V′[i] for all i ∈ {1, ..., N}
  - V < V′ iff V ≤ V′ and V ≠ V′
  - V ∥ V′ iff neither V < V′ nor V′ < V

Example:

[Figure: an execution annotated with vector timestamps, e.g., (1,0,0) and (2,0,0) on P1, and (0,1,0) through (2,4,1) on P2.]
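The merge rule and the comparison relations above can be sketched directly; the function names are mine:

```python
def merge(vj, vi, j):
    """Receiver pj merges the incoming vector vi into its own vj:
    Vj[k] := max(Vj[k], Vi[k]) + 1 if k == j, else max(Vj[k], Vi[k])."""
    return [max(a, b) + 1 if k == j else max(a, b)
            for k, (a, b) in enumerate(zip(vj, vi))]

def happened_before(v, w):
    """a -> b iff V(a) < V(b): componentwise <=, and not equal."""
    return all(x <= y for x, y in zip(v, w)) and v != w

def concurrent(v, w):
    return not happened_before(v, w) and not happened_before(w, v)

# p1 (index 0) sends after two local events; p2 (index 1) has had one event.
v_send = [2, 0, 0]                        # V1 attached to the message
v_recv = merge([0, 1, 0], v_send, j=1)
print(v_recv)                             # -> [2, 2, 0]
print(happened_before(v_send, v_recv))    # -> True: send precedes receive
print(concurrent([1, 0, 0], [0, 1, 0]))   # -> True: no causal path either way
```

Unlike Lamport clocks, the comparison recovers causality exactly: incomparable vectors mean concurrent events.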
GLOBAL STATE

Determining global properties:
- We need to combine information from multiple nodes
- Without global time, how do we know whether the collected local information is consistent?
- Local state sampled at arbitrary points in time surely is not consistent
- We need a criterion for what constitutes a globally consistent collection of local information

Example global properties:
- Distributed garbage collection: do any references exist to a given object?
- Distributed deadlock detection: do processes wait in a cycle for each other?
- Distributed termination detection: did a set of processes cease all activity? (Consider messages in transit!)

Local history:
- N processes pi, i ∈ {1, ..., N}
- For each pi, the event series e_i^0, e_i^1, e_i^2, ... is called pi's history, denoted hi
- A history may be finite or infinite

Process state:
- The state of process pi immediately before event e_i^k is denoted s_i^k
- State s_i^k records all events included in the history h_i^(k−1)
- Hence, s_i^0 refers to pi's initial state
CONSISTENT CUTS

Global history:
- The local histories combine into a global history: H = h_1 ∪ ... ∪ h_N
- Similarly, we can combine a set of local states s_1, ..., s_N into a global state: S = (s_1, ..., s_N)
- Which combination of local states is consistent?

Cuts:
- A cut is a collection of history prefixes: C = h_1^(c_1) ∪ ... ∪ h_N^(c_N), where h_i^(c_i) = e_i^0, ..., e_i^(c_i)
- The cut C corresponds to the state S = (s_1^(c_1+1), ..., s_N^(c_N+1))
- The final events in a cut are its frontier: {e_i^(c_i) | i ∈ {1, ..., N}}

[Figure: two cuts (cut 1 and cut 2) across processes P1–P3, whose events are message sends s_i^k and receives r_i^k.]

Consistency:
- We call a cut consistent iff, for all events e ∈ C, e′ → e implies e′ ∈ C
- A global state is consistent if it corresponds to a consistent cut
- Note: we can characterise the execution of a system as a sequence of consistent global states S0 → S1 → S2 → ...

Linearisation:
- A global history that is consistent with the happened-before relation is also called a linearisation or consistent run
- A linearisation only passes through consistent global states
- A state S′ is reachable from a state S if there is a linearisation that passes through S and then S′
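The consistency criterion can be checked mechanically. A sketch under the following assumptions (all names are mine): a cut is given as the number of events included per process, and cross-process causality comes only from send/receive pairs, so it suffices to check that no included receive has its matching send outside the cut:

```python
def consistent(cut, messages):
    """cut: {process: number of events of that process's history included}.
    messages: list of ((pi, ki), (pj, kj)) pairs meaning event ki of pi is
    send(m) and event kj of pj is receive(m), events numbered from 1.
    Since send(m) -> receive(m), a consistent cut that contains a receive
    must contain the matching send."""
    for (pi, ki), (pj, kj) in messages:
        if kj <= cut[pj] and ki > cut[pi]:   # receive inside, send outside
            return False
    return True

# Two processes; p1's event 2 sends m, p2's event 1 receives it.
msgs = [(("p1", 2), ("p2", 1))]
print(consistent({"p1": 2, "p2": 1}, msgs))  # -> True
print(consistent({"p1": 1, "p2": 1}, msgs))  # -> False: an orphan receive
```

Because each cut is a prefix of each local history, local ordering is respected automatically; only messages can make a cut inconsistent.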
SNAPSHOTS

[Figure: a snapshot (*) is taken while messages m1, m2, m3 are in transit between processes P1–P3.]

Properties:
- Reliable communication and failure-free processes
- Point-to-point message delivery is ordered
- The process/channel graph must be strongly connected
- On termination, processes hold only their local state components and a set of messages that were in transit during the snapshot
CONCURRENCY

Concurrency in a Non-Distributed System:
Typical OS and multithreaded programming problems:
- Prevent race conditions: critical sections
- Mutual exclusion: locks, semaphores, monitors
- Must apply the mechanisms correctly, or risk deadlock and starvation

Requirements (for distributed mutual exclusion):
- Safety: at most one process may execute the critical section at a time
- Liveness: requests to enter and exit the critical section eventually succeed
- Ordering: requests are processed in happened-before ordering

CENTRALISED ALGORITHM

[Figure: (a) a process requests entry and the coordinator replies OK, since its queue is empty; (b) a second process requests entry, gets no reply, and its request is queued; (c) when the first process sends Release, the coordinator replies OK to the queued process.]

Properties:
- Easy to implement
- Does not scale well
- The central server may fail
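The coordinator's request/queue/release behaviour can be sketched as follows; the class and method names are mine:

```python
from collections import deque

class Coordinator:
    """Central coordinator: grants the critical section to one process at a
    time and silently queues later requests, as in the figure above."""
    def __init__(self):
        self.holder = None
        self.queue = deque()

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "OK"
        self.queue.append(pid)
        return None                    # no reply: the requester blocks

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder             # next process to receive OK, if any

c = Coordinator()
print(c.request(1))  # -> OK    (queue is empty)
print(c.request(2))  # -> None  (no reply; request is queued)
print(c.release(1))  # -> 2     (OK is now sent to process 2)
```

The single `Coordinator` object is also the algorithm's weakness: it is both the scalability bottleneck and the single point of failure.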
TOKEN RING ALGORITHM

A token circulates around a logical ring of processes; only the holder of the token may enter the critical section.

Properties:
- The ring imposes an average delay of N/2 hops (limits scalability)
- Token messages consume bandwidth
- Failing nodes or channels can break the ring (the token might be lost)
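The token-passing rule can be sketched as a small simulation; the function and its parameters are mine, and for simplicity process 0 holds the token initially:

```python
def run_ring(n, wanting, rounds=1):
    """Pass the token around a ring of n processes; a process that wants the
    critical section enters while holding the token, then passes it on.
    Returns the order in which processes entered."""
    entered = []
    pending = set(wanting)
    token = 0                          # process currently holding the token
    for _ in range(rounds * n):
        if token in pending:
            entered.append(token)      # enter and leave the critical section
            pending.discard(token)
        token = (token + 1) % n        # forward the token to the next process
    return entered

print(run_ring(8, wanting={5, 2}))  # -> [2, 5]: ring order, not request order
```

Note that entry follows ring position, not request time, so happened-before ordering of requests is not guaranteed.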
MUTUAL EXCLUSION USING MULTICAST AND LOGICAL CLOCKS

Processes pi maintain a Lamport clock and can communicate pairwise. Processes are in one of three states:
1. Released: outside of the critical section
2. Wanted: waiting to enter the critical section
3. Held: inside the critical section
General Properties:
- Performance: number of messages exchanged; response/wait time; delay; throughput: 1/(delay + execution time); complexity: O()
- Efficiency: resource usage (memory, CPU, etc.)
- Scalability
- Reliability: number of points of failure (low is good)

Messages Exchanged:
- Messages per entry/exit of the critical section
[Figure: (a) processes 0 and 2 want to enter the critical section at the same time, with request timestamps 8 and 12; (b) process 0 wins because its timestamp is lower; (c) when process 0 is done, it sends OK to process 2, which then enters.]

Delay (before entering the critical section):
- Centralised: 2
- Ring: 0 ... n − 1
- Multicast: 2(n − 1)
Properties:
- Multicast leads to increasing overhead (a more sophisticated algorithm using only subsets of peer processes exists)
- Susceptible to faults
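The decision a process makes when a timestamped request arrives can be sketched as follows; this is a minimal reply rule in the style of the multicast algorithm above, with ties on the Lamport timestamp broken by process id (the function name is mine):

```python
def on_request(state, my_ts, my_pid, req_ts, req_pid):
    """Reply rule when a timestamped request arrives, given our state
    (Released / Wanted / Held), our own request timestamp, and the ids.
    Returns 'OK' to reply immediately, or 'defer' to queue the reply."""
    if state == "Held":
        return "defer"                 # we are in the critical section
    if state == "Wanted" and (my_ts, my_pid) < (req_ts, req_pid):
        return "defer"                 # our request is earlier: they wait
    return "OK"

# The figure's scenario: processes 0 and 2 request with timestamps 8 and 12.
print(on_request("Wanted", 8, 0, 12, 2))   # -> defer: process 0 wins the tie
print(on_request("Wanted", 12, 2, 8, 0))   # -> OK: process 2 lets 0 go first
```

A process enters once every other process has replied OK, and sends its deferred OKs when it leaves the critical section.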
Reliability:
Problems that may occur:
- Centralised: the coordinator crashes
- Ring: lost token, process crashes
- Multicast: any process crashes