Distributed Systems
Presented by
Sanjeev R. Kulkarni
References
1. K. Mani Chandy and Leslie Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems”, ACM Transactions on Computer Systems, vol. 3, no. 1, Feb. 1985.
2. Michael L. Powell and David L. Presotto, “PUBLISHING: A Reliable Broadcast Communication Mechanism”, Proceedings of the Ninth ACM Symposium on Operating Systems Principles, Oct. 1983.
3. Ozalp Babaoglu and Keith Marzullo, “Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms”, in Distributed Systems, Sape J. Mullender (ed.), Addison-Wesley, 1993.
Global State Detection 2
Outline of the talk
• Complexities of state detection in Distributed
Systems
• The notion of Consistent States
• The Distributed Snapshots algorithm
• Application to detect Stable Properties and
Checkpointing
• Another approach for state recording: Publishing
[Figure: timeline of process q passing through local states Sq0, Sq1, Sq2, Sq3]
Happened Before Relation
• Events e and e' of the same process.
– if e happens before e' then e → e'
• e and e' in two different processes
– if e = send(m) and e' = recv(m) then e → e'
• Transitive
– if e → e' and e' → e'' then e → e''
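The three rules above can be sketched as a small program that builds the happened-before relation for a finite set of events. This is only an illustration under stated assumptions: the function name `happened_before`, the event labels, and the input layout (a per-process event list plus a list of send/receive pairs) are hypothetical and do not come from the slides.

```python
from itertools import product

def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a happened before b.

    process_events: dict mapping a process name to its events in local order.
    messages: list of (send_event, recv_event) pairs.
    """
    hb = set()
    # Rule 1: events of the same process are ordered by their local order.
    for events in process_events.values():
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                hb.add((a, b))
    # Rule 2: send(m) happens before recv(m).
    hb.update(messages)
    # Rule 3: transitivity (Floyd-Warshall style closure, k is the outer loop).
    all_events = [e for events in process_events.values() for e in events]
    for k, i, j in product(all_events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb

# Hypothetical example: p1 sends a message received at q2,
# and q2 sends a message received at r1.
hb = happened_before(
    {"p": ["p1", "p2"], "q": ["q1", "q2"], "r": ["r1"]},
    [("p1", "q2"), ("q2", "r1")],
)
```

In this example p1 happened before r1 only through transitivity, while p2 and q1 are unrelated in either direction, which is what makes them concurrent.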
• Distributed State:
We have to collect information that is spread
across several machines!!
• Deterministic Computation
– At any point in computation there is at most one event
that can happen next.
• Non-Deterministic Computation
– At any point in computation there can be more than one
event that can happen next.
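One way to see the difference is to enumerate every run (total order of events) that a set of ordering constraints allows. The sketch below is a hypothetical helper, not part of the talk: `runs`, the event names, and the `before` pair list are all assumptions made for illustration. A deterministic computation yields exactly one run; a non-deterministic one yields several.

```python
def runs(events, before):
    """Yield every total order of `events` that respects the pairs in `before`.

    before: list of (a, b) pairs meaning a must occur before b.
    """
    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        for e in sorted(remaining):
            # e can happen next once everything that must precede it has occurred.
            if all(a in prefix for (a, b) in before if b == e):
                yield from extend(prefix + [e], remaining - {e})

    yield from extend([], frozenset(events))

# Deterministic: a, then b, then c. Only one event is ever enabled.
deterministic = list(runs(["a", "b", "c"], [("a", "b"), ("b", "c")]))

# Non-deterministic: a and b are both enabled at the start.
non_deterministic = list(runs(["a", "b", "c"], [("a", "c"), ("b", "c")]))
```

The first call produces a single run; the second produces two, because either a or b can happen first.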
[Figure: space-time diagram of processes p, q, and r exchanging messages m1, m2, and m3]
Three possible runs
[Figure: three space-time diagrams of processes p, q, and r, each showing a different ordering of messages m1, m2, and m3]
A Non-Deterministic Computation
+ So simple!!
- Correct??
[Figure: space-time diagrams of processes p and q with messages m and m3, showing the local states Sp1, Sp2, Sq1, and Sq3 and the recorded combination Sp1, Sq3]
• Features:
– Does not promise to give us exactly the state
that is there
– But gives us a consistent state!!
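A minimal sketch of what "consistent" means here, assuming the standard definition: a global state is consistent if every message recorded as received is also recorded as sent. The function name `is_consistent` and the encoding of a cut as a count of local events per process are hypothetical choices for illustration.

```python
def is_consistent(cut, messages):
    """Check a cut against the rule: no message is received without being sent.

    cut: dict mapping a process name to how many of its local events are included.
    messages: list of (sender, send_index, receiver, recv_index), 1-based indices
              into each process's local event sequence.
    """
    for sender, send_index, receiver, recv_index in messages:
        received = recv_index <= cut[receiver]
        sent = send_index <= cut[sender]
        if received and not sent:
            return False
    return True

# Hypothetical example: p's 2nd event sends m, q's 1st event receives m.
messages = [("p", 2, "q", 1)]

orphan = is_consistent({"p": 1, "q": 1}, messages)      # receive without send
in_transit = is_consistent({"p": 2, "q": 0}, messages)  # sent, not yet received
delivered = is_consistent({"p": 2, "q": 1}, messages)   # sent and received
```

A message that is sent but not yet received does not break consistency; it is simply part of the channel state.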
[Figure sequence: timeline of process q receiving messages m1, m2, and m3 and passing through local states Sq0, Sq1, Sq2, Sq3; one frame shows M and m travelling between p and q]
Moral: Computation may not even have
passed through the state recorded!
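The moral can be reproduced with a small simulation of the marker rules from Chandy and Lamport's paper (reference 1). This is a sketch under stated assumptions, not the talk's own code: the bank-balance example, the class names `Process` and `Network`, and the order in which messages are delivered are all hypothetical. Channels are assumed to be FIFO, as the algorithm requires.

```python
from collections import deque

MARKER = object()  # special marker message, distinct from any amount

class Process:
    def __init__(self, balance):
        self.balance = balance    # live local state
        self.recorded = None      # local state captured for the snapshot
        self.recording = set()    # incoming channels still being recorded
        self.channel_state = {}   # incoming channel -> recorded in-transit messages

class Network:
    def __init__(self, balances):
        self.procs = {name: Process(b) for name, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.chan = {(a, b): deque() for a in balances for b in balances if a != b}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, name):
        """Record the local state, then send a marker on every outgoing channel."""
        p = self.procs[name]
        p.recorded = p.balance
        p.recording = {c for c in self.chan if c[1] == name}
        p.channel_state = {c: [] for c in p.recording}
        for c in self.chan:
            if c[0] == name:
                self.chan[c].append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel (src, dst)."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg is MARKER:
            if p.recorded is None:
                self.record(dst)             # first marker: record state now
            p.recording.discard((src, dst))  # this channel's state is complete
        else:
            p.balance += msg
            if (src, dst) in p.recording:
                p.channel_state[(src, dst)].append(msg)

net = Network({"p": 100, "q": 50})
net.record("p")          # p starts the snapshot with balance 100
net.send("p", "q", 10)   # p is now 90; the 10 travels behind the marker
net.send("q", "p", 20)   # q is now 30
net.deliver("q", "p")    # p receives 20 while recording channel (q, p)
net.deliver("p", "q")    # q receives the marker and records balance 30
net.deliver("q", "p")    # p receives q's marker; recording of (q, p) stops
```

The recorded state is p = 100, q = 30, with 20 in transit from q to p. The total of 150 is preserved, so the state is consistent, yet the actual run went from (100, 50) to (90, 50) to (90, 30) and never had p = 100 together with q = 30.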
What have we recorded
– S* is reachable from Si
– Sj is reachable from S*
S* is reachable from Si
[Figure sequence: the computation's path of global states from Si to Sj, with the recorded state S* placed between Si and Sj]
• A Broadcast medium
• A central recorder process records all the
messages received by each process
• Processes record their states at their own
time and send them to the recorder
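The recorder's bookkeeping can be sketched as follows. This is a simplified illustration, not the mechanism as specified in reference 2: the class name `Recorder`, its method names, and the `step` function are hypothetical, and recovery by replaying logged messages assumes each process is deterministic in the messages it receives.

```python
class Recorder:
    """Central recorder: keeps each process's last state and the messages
    it has received since that state was sent."""

    def __init__(self):
        self.state_sent = {}  # process id -> last state sent to the recorder
        self.msgs_recd = {}   # process id -> messages received since that state

    def observe(self, dst, msg):
        # The recorder sees every message on the broadcast medium.
        self.msgs_recd.setdefault(dst, []).append(msg)

    def save_state(self, pid, state):
        # A fresh state makes the older logged messages unnecessary.
        self.state_sent[pid] = state
        self.msgs_recd[pid] = []

    def recover(self, pid, step):
        # Restore the last state and replay the logged messages in order.
        state = self.state_sent[pid]
        for msg in self.msgs_recd.get(pid, []):
            state = step(state, msg)
        return state

# Hypothetical process q whose state is the running sum of what it receives.
def add(state, msg):
    return state + msg

rec = Recorder()
rec.save_state("q", 0)
rec.observe("q", 5)
rec.observe("q", 7)
restored = rec.recover("q", add)   # state 0 replayed with messages 5 and 7

rec.save_state("q", restored)      # q sends a newer state; the log is cleared
rec.observe("q", 1)
restored_again = rec.recover("q", add)
```

Because each process sends its state at its own time, no coordination between processes is needed; the cost is that every message must pass where the single recorder can see it.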
[Figure sequence: processes p and q exchange message m1 while the recorder maintains a table with columns ID, STATE SENT, and MSGS RECD. The table holds the entries Sp1 for p and Sq1 for q, and m1 is added once it has been observed. Later frames show the states Sq2 and Sp2 being sent to the recorder.]
               SNAPSHOT             PUBLISHING
Network        Strongly connected   Need not be
Mode           Distributed          Centralized
Scalability    Yes                  No
Restorability  No                   Yes