Determining Global States of Distributed Systems: Presented by Sanjeev R. Kulkarni
References
1. Distributed Snapshots: Determining Global States of Distributed Systems, K. Mani Chandy and Leslie Lamport, ACM Transactions on Computer Systems, vol. 3, no. 1, Feb. 1985.
2. PUBLISHING: A Reliable Broadcast Communication Mechanism, Michael L. Powell and David L. Presotto, Proceedings of the Ninth ACM Symposium on Operating Systems Principles, Oct. 1983.
3. Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms, Ozalp Babaoglu and Keith Marzullo, in Distributed Systems, ed. Sape J. Mullender, Addison-Wesley, 1993.
Global State Detection
Model of Computation
Finite set of processes
Processes send messages on a finite set of unidirectional channels
Channels are error-free, FIFO and have infinite buffers
Messages experience arbitrary but finite delays
Strongly connected network
Transitive
if e -> e' and e' -> e'' then e -> e''
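As a small illustration of transitivity, the sketch below derives the full happened-before relation from a set of direct orderings. The event names and the helper `happened_before_closure` are hypothetical and used only for this example.

```python
from itertools import product


def happened_before_closure(direct):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        # Whenever a -> b and b -> d are both known, add a -> d.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Direct orderings: e -> e1 and e1 -> e2 (hypothetical event names).
direct = {("e", "e1"), ("e1", "e2")}
closure = happened_before_closure(direct)
```

The pair `("e", "e2")` is not in `direct` but appears in `closure`, which is exactly the transitive step.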
More on States
process state
memory state + register state + signal masks + open files + kernel buffers, or application-specific information such as transactions completed, functions executed, etc.
channel state
Messages in transit, i.e. messages that have been sent but not yet received
e.g.
Distributed deadlock detection: finding a cycle in the wait-for graph
Termination detection
Checkpointing
and many more
Difficulties
Instantaneous recording not possible
No global clock: distributed recording of local states cannot be synchronized based on time
Random network delays: no centralized process can initiate the detection
Non-Deterministic Computation
At any point in computation there can be more than one event that can happen next.
Producer code:
while (1) {
    produce m;
    send m;
    wait for ack;
}
Consumer code:
while (1) {
    recv m;
    consume m;
    send ack;
}
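The producer and consumer loops above can be tried out with two threads and two FIFO queues standing in for the channels. This is only a sketch under assumptions: the function `run`, the queue names, and the bounded loop (instead of `while (1)`, so the program terminates) are choices made for the example.

```python
import queue
import threading


def run(n_items):
    data = queue.Queue()   # channel producer -> consumer
    acks = queue.Queue()   # channel consumer -> producer
    consumed = []

    def producer():
        for i in range(n_items):
            m = f"m{i}"         # produce m
            data.put(m)         # send m
            acks.get()          # wait for ack

    def consumer():
        for _ in range(n_items):
            m = data.get()      # recv m
            consumed.append(m)  # consume m
            acks.put("ack")     # send ack

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


result = run(3)
```

Because the producer waits for each ack before producing again, at most one message is in transit at any time.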
Non-deterministic computation
3 processes
[Figure: processes p, q and r exchanging messages m1, m2 and m3]
A Non-Deterministic Computation
Non-Determinism
Deterministic computation
A local event would reveal everything about the global state: each process could infer the state of every other process
Example
Producer Consumer problem
p records its state
[Figure: p sends message m to q]
q records its state
Error!!
The sender has no record of sending the message
The receiver has a record of receiving it
Result: the global state records the receive event but not the send event, violating the happened-before relation!
If e -> e' then it is never the case that e' is observed by the external observer and e is not
All feasible states are consistent
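The consistency condition can be checked mechanically for message events: a recorded state is inconsistent if it contains the receipt of a message without the corresponding send. The sketch below assumes a hypothetical representation in which each process has a list of `(kind, message)` events and a cut gives the number of events recorded per process.

```python
def is_consistent(histories, cut):
    """Return True if every message received inside the cut was also sent inside it."""
    sent, received = set(), set()
    for proc, events in histories.items():
        # Only the first cut[proc] events of each process are part of the cut.
        for kind, msg in events[:cut[proc]]:
            if kind == "send":
                sent.add(msg)
            elif kind == "recv":
                received.add(msg)
    return received <= sent


# Producer-consumer example: p sends m, q receives m.
histories = {
    "p": [("send", "m")],
    "q": [("recv", "m")],
}

bad_cut = {"p": 0, "q": 1}      # q recorded after the receive, p before the send
in_transit = {"p": 1, "q": 0}   # m is sent but not yet received
```

Here `is_consistent(histories, bad_cut)` is false, matching the error case in which the receive is recorded but the send is not, while `in_transit` is consistent because the message simply belongs to the channel state.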
An Example
[Figure: timelines of processes p and q. p passes through states Sp0, Sp1, Sp2, Sp3 and q through Sq0, Sq1, Sq2, Sq3, exchanging messages m1, m2 and m3]
A Consistent State?
p: Sp1, q: Sq1
Yes
A Consistent State?
p: Sp2, q: Sq3
Yes
An Inconsistent State
p: Sp1, q: Sq3
Algorithm in Action
q records state as Sq1, sends marker to p
p records state as Sp2, channel state as empty
q records channel state as m3
Recorded Global State = ((Sp2, Sq1), (∅, m3))
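The steps above can be replayed with a small simulation of marker-based recording over two FIFO channels. This is a sketch under assumptions, not the presenter's code: the message directions and the interleaving of events are chosen to be compatible with the states named on the slides, local states are simply numbered by the count of events a process has executed, and all names (`Process`, `send`, `deliver`, `start_snapshot`) are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name):
        self.name = name
        self.events = 0             # local state is S<name><events>
        self.recorded_state = None  # set when the process records its state
        self.channel_state = None   # recorded state of the incoming channel
        self.recording = False      # True while the incoming channel is being recorded

    def state(self):
        return f"S{self.name}{self.events}"


def send(proc, outgoing, msg):
    proc.events += 1
    outgoing.append(msg)


def start_snapshot(proc, outgoing):
    # Record the local state, start recording the incoming channel, send a marker.
    proc.recorded_state = proc.state()
    proc.channel_state = []
    proc.recording = True
    outgoing.append(MARKER)


def deliver(incoming, proc, outgoing):
    msg = incoming.popleft()
    if msg == MARKER:
        if proc.recorded_state is None:
            start_snapshot(proc, outgoing)  # first marker: incoming channel is empty
        proc.recording = False              # marker closes the channel recording
    else:
        proc.events += 1
        if proc.recording:
            proc.channel_state.append(msg)  # message was in transit at snapshot time


p, q = Process("p"), Process("q")
p_to_q, q_to_p = deque(), deque()

send(p, p_to_q, "m1")        # p: Sp0 -> Sp1
deliver(p_to_q, q, q_to_p)   # q receives m1: Sq0 -> Sq1
start_snapshot(q, q_to_p)    # q records Sq1, sends marker to p
send(q, q_to_p, "m2")        # q: Sq1 -> Sq2
send(p, p_to_q, "m3")        # p: Sp1 -> Sp2
deliver(q_to_p, p, p_to_q)   # p gets marker: records Sp2, channel empty
deliver(p_to_q, q, q_to_p)   # q receives m3 and records it as channel state
deliver(p_to_q, q, q_to_p)   # q gets marker: recording ends
deliver(q_to_p, p, p_to_q)   # p receives m2: Sp2 -> Sp3

snapshot = ((p.recorded_state, q.recorded_state),
            (p.channel_state, q.channel_state))
```

In this assumed run, q has already moved on to Sq2 by the time p reaches Sp2, so the pair (Sp2, Sq1) is recorded even though the two processes were never in those states at the same moment.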
Moral: Computation may not even have passed through the state recorded!
S* is reachable from Si
Sj is reachable from S*
(Si is the global state in which the snapshot algorithm starts, Sj the global state in which it ends, and S* the recorded state)
Stable Properties
A stable property, once true, remains true
If it holds in S*, it also holds in Sj
If it does not hold in S*, it did not hold in Si
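Termination is a typical stable property, and it can be evaluated directly on a recorded state. The sketch below assumes a hypothetical snapshot representation: a dictionary of process states ("active" or "passive") and a dictionary of channel contents.

```python
def terminated(process_states, channel_states):
    """True if every process is passive and no message is in transit."""
    all_passive = all(s == "passive" for s in process_states.values())
    no_messages = all(len(msgs) == 0 for msgs in channel_states.values())
    return all_passive and no_messages


# Two hypothetical recorded states.
done = ({"p": "passive", "q": "passive"},
        {("p", "q"): [], ("q", "p"): []})
not_done = ({"p": "passive", "q": "passive"},
            {("p", "q"): ["m3"], ("q", "p"): []})
```

A message still in a channel, as in `not_done`, can reactivate its receiver, which is why the channel states have to be part of the check.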
Checkpointing
S* serves as a checkpoint
On a failure, restart the computation from S*
Solution: Publishing
A broadcast medium
A central recorder process records all the messages received by each process
Processes record their states at their own time and send them to the recorder
Architecture of Publishing
[Figure: the recorder keeps a table with a row for each process (p, q) and columns STATE, SENT MSGS ID and RECD]
Plus the messages received since the last checkpoint
Problems
Publishing keeps track of all messages received by each process: expensive!
Solution: when the recorder takes a checkpoint of process p at time t, it deletes all messages received by p before t.
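The recorder's bookkeeping can be sketched as a checkpoint table plus a per-process log of received messages that is truncated at every checkpoint. The class and method names below are hypothetical, and the sketch leaves out the broadcast medium and the handling of sent-message IDs.

```python
class Recorder:
    def __init__(self):
        self.checkpoints = {}  # process -> last checkpointed state
        self.received = {}     # process -> messages received since that checkpoint

    def log_receive(self, proc, msg):
        self.received.setdefault(proc, []).append(msg)

    def checkpoint(self, proc, state):
        self.checkpoints[proc] = state
        self.received[proc] = []  # messages before the checkpoint are no longer needed

    def recover(self, proc):
        # Restart state plus the messages that must be replayed to the process.
        return self.checkpoints[proc], list(self.received.get(proc, []))


rec = Recorder()
rec.checkpoint("p", "Sp1")
rec.log_receive("p", "m1")
before = rec.recover("p")   # p would restart from Sp1 and have m1 replayed

rec.checkpoint("p", "Sp2")
after = rec.recover("p")    # log was truncated by the new checkpoint
```

Truncating the log at each checkpoint bounds the storage the recorder needs per process, at the cost of taking checkpoints more often.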
p checkpoints
Say p crashes
Replays back m1
q crashes
Ignore m1
Comparison
                SNAPSHOT             PUBLISHING
Network         Strongly connected   Need not be
Mode            Distributed          Centralized
Scalability     Yes                  No
Restorability   No                   Yes
Summary
Global state detection is difficult in distributed systems
The snapshot algorithm may not give an actual state, but it is very helpful in detecting stable properties
Publishing gives an asynchronous way of determining global states, but it is not scalable