CSE 513: Distributed Systems (Global State and Mutual Exclusion) Inherent Limitation Inherent Limitation

CSE 513: Distributed Systems
(Global State and Mutual Exclusion)

Guohong Cao
Department of Computer Science& Engineering Department of Computer Science & Engineering
The Pennsylvania State University
http://www.cse.psu.edu/~gcao p p g
1
Inherent Limitation Inherent Limitation
No global clock g
Due to unpredictable message delay, it is
difficult to synchronize physical clocks
Distributed schedule is difficult to implement
Nosharedmemory No shared memory
It is difficult to obtain a coherent global view,
which includes local state and messages in g
transmission.
2
Different Clocks Different Clocks
Physical clocks
Make file, input.c has time 2152, input.o has time 2150,
then recompile input.c
Wh t h h i t itt 2144b What happens when input.c was written as 2144 by a
computer?
Logical clocks Logical clocks
Do not care whether the time is 10:00 or 10:02, but the
logical relation is important.
3
Physical Clocks Physical Clocks
Clock skew
The difference between two computers clocks
Clock drift
C t t ti t diff t t Th d l i Computers count time at different rates. The underlying
oscillators are subject to physical variations such as
temperature
For ordinary clocks based on quartz crystal, the drift is
10
-7
seconds/second.
CoordinatedUniversal Time(UTC) Coordinated Universal Time (UTC)
Atomic oscillator: drift rate is about one part in 10
13
Broadcasted by radio or satellite and accessed through
l b l i i i (GPS)
4
global positioning systems (GPS)
Accuracy in the order of 0.1 ms
Cristians Algorithm Cristian s Algorithm
Also called Network Time Protocol (NTP)
A timeserver canaccessUTC A time server can access UTC
Clients check with the server to synchronize
Sending machine
Time server
Sending machine
T
0
I, Interrupt
handing time g
5
T
1
Major Problem Major Problem
Clocks never run back
What about T
1
< T
0
Solution
Changes should be introduced gradually
Suppose the timer is set to generate 100 interrupts per
second theneachinterrupt adds10ms tothetime second, then each interrupt adds 10ms to the time
When slowing down, each interrupt adds only 9ms
Whenfaster eachinterrupt adds11ms When faster, each interrupt adds 11ms.
6
Minor Problems Minor Problems
How about the communication delay
The delay may be large and vary with network load
One solution is to use (T
1
- T
0
)/2 as the
communication delay
(T
1
- T
0
- I)/2
T k l ddi d h l Take several measurements, and discard those values
which exceeds some threshold value
Other problems Other problems
Single point of failure
7
The Berkeley Algorithm The Berkeley Algorithm
A time server (master) is active, polling every
machine (slaves) periodically to ask what time it is
there
h b bl C The master may not be able to access UTC
Based on the answers, it computes an average time
andtellsslavestoadjust their time and tells slaves to adjust their time
Instead of sending the updated current time, the server
only sends the amount to be adjusted. y j
When the master fails, elect a new one
Delay is not bounded.
8
y
9
Clock Synchronization in Wireless Networks Clock Synchronization in Wireless Networks
Reference broadcast synchronization (RBS)
Use a sender to have the receivers synchronize with
each other
After anodebroadcastsreferencemessagem each After a node broadcasts reference message m, each
node p records time Tp,m that it receved m. Two nodes
p and q can exchange each others delivery times to
estimate their mutual.
T T
M
) (
M
T T
q p offset
k
k q k p
=

=
1
, ,
) (
] , [
10
Where M is the total number of reference messages sent
The critical path in RBS The critical path in RBS
11
Happened Before Happened Before
What does happened before mean without a
global clock?
Each processor usually has a local physical clock.
Denote the value of the local clock at process P
i
when event a occurs as C
i
(a).
E h i b i di h Each event in a process can be timestamped in that
process
C h l ti t l ldti C
i
may have no relation to real world time; e.g.,
just a monotonically increasing counter.
12
Happened before Happened before
Definition: A happened before relation is defined
as follows:
a b, if a and b are events in the same process
and a occurred before b
a b, if a is the event of sending a message m in
db i h f i f h a process and b is the event of receipt of the same
message by another process
If b db t iti If a b andb c, a c, transitive
13
Concurrent Concurrent
Note that a a does not hold
If a happened before b, it is possible that a
causally affects b.
If neither a b nor b a, then we say a and b
are concurrent and write as a || b
Happened before is a partial order
Note that concurrency is not transitive
a || b andb || c does not imply a || c
14
Space-Time Diagram Space Time Diagram
e
11
e
12
e
13
e
14
P
1
S
p
a
c
e
e
21
e
22
e
23
e
24
P
2
Global Time
15
Lamports Logical Clocks Lamport s Logical Clocks
Goal: implement the happens before relation in a
di ib d i h l b l l k distributed system without global clock
Assume each process has a clock (counter)
A t f l k i t if b i li A system of clocks is correct if a b implies
C(a) < C(b)
Happensbeforecanberealizedif thefollowing Happens before can be realized if the following
clock conditions are met:
[C1] For any two events a and b in a processP
i
, if a b
h C( ) C(b) then C
i
(a) < C
i
(b).
[C2] If a is the event of sending a message in process P
i
,
and b is the event of receiving the same message at P
j
,
16
g g
j
then, C
i
(a) < C
j
(b).
Implement the Logical Clock Implement the Logical Clock
[IR1] Clock C
i
is incremented between any two
successiveeventsinprocessP successive events in process P
i
,
C
i
:= C
i
(a) + d (d>0)
Itseasytoseeif a andb aretwosuccessiveeventsin It s easy to see if a and b are two successive events in
P
i
anda b, then C
i
(b) = C
i
(a) + d.
[IR2] If a is the event of sending a message m, with
i C( ) b O i i timestamp t
m
:= C
i
(a), by process P
i
. On receiving
m by process P
j
, C
j
is set to a value greater than or
equal toitspresent valueandgreater thant
m
. equal to its present value and greater than t
m
.
C
j
:= max(C
j
, t
m
+d) (d>0)
Note that the message receipt event at P
j
increments C
j
17
j j
as per rule IR1.
Lamports Logical Clock Lamport s Logical Clock
e
11
e
12
e
14
e
17
P
1
e
13
e
15
e
16
(1)
S
p
a
c
e
(1)
e
21
e
22
e
23
e
24
P
2
e
25
(1)
Global Time
18
19
Total Order Total Order
Order events by their local time, but Lamports
l i l l k l d fi i l d logical clock only defines a partial order.
Break ties by a total ordering on processes
T t l d i f t ( b) Total ordering of events (a b):
If a is any event at process P
i
and b is any event at
processP
j
thena b if either process P
j
, then a b if either
C
i
(a) <C
j
(b) or
C
i
(a) =C
j
(b) andP
i
<<P
j
<<is any arbitrary relation that totally orders the
processes to break ties. A simple way is to implement
<<by using unique identification numbers.
20
y g q
Limitations of Lamports Clocks Limitations of Lamport s Clocks
If a b thenC(a) < C(b). However the reverse is not
necessarytrue but weknowif C(a) < C(b) then what? necessary true, but we know if C(a) < C(b) then what?
See figure, C(e
11
)< C(e
22
) and C(e
11
)< C(e
32
), however,
e
11
is causally related to event e
22
,but not to event e
32
e
11
e
12
P
1
11
y
22
,
32
P
1
S
p
a
c
e
P
e e e
P
3
S
e
21
e
22
P
2
21
e
31
e
32
e
33
Global Time
Vector Clock Vector Clock
Vector clock is proposed to decide whether two
events are causally related or not by simply
looking at their timestamps
Vector clock is an array of integers instead of just
one integer. In a vector, each entry is used to
represent theknowledgeabout theclockof other represent the knowledge about the clock of other
processes
C[j] denotesP sknowledgeabout thelogical C
i
[j] denotes P
i
s knowledge about the logical
time at P
j
22
Implementation Rules Implementation Rules
[IR1] Clock C
i
is incremented between any two
i i P successive events in process P
i
,
C
i
:= C
i
(a) + d (d>0)
[IR2] If i th t f di ith [IR2] If a is the event of sending a message m, with
timestamp t
m
:= C
i
(a), by process P
i
. On receiving
m by process P
j
, C
j
is updated as follows: yp
j
,
j
p
k, C
j
[k] := max(C
j
[k], t
m
[k])
Assertion: At any time, i,j: C
i
[i] > C
j
[i]
23
Vector Clock Vector Clock
e
11
e
12
e
13
P
1
S
p
a
c
e
e e e e
P
2
P
3
e
21
e
22
e
23
e
24
e
31
e
32
P
3
Global Time
24
Global Time
Equal: t
a
= t
b
iff i, t
a
[i] = t
b
[i]
Not Equal: t
a
= t
b
iff -i, t
a
[i] = t
b
[i]
Less Than or Equal: t
a
s t
b
iff i, t
a
[i] s t
b
[i] q ff
Less Than: t
a
< t
b
iff (t
a
s t
b
. t
a
= t
b
)
Not Less Than: t
a
< t
b
iff (t
a
s t
b
. t
a
= t
b
) ff ( )
Concurrent: t
a
|| t
b
iff (t
a
< t
b
. t
b
<t
a
)
Assertion: a b iff t
a
< t
b
25
Causal Ordering of Messages Causal Ordering of Messages
Causal ordering: If send(m1) happens before send (m2),
h i i f b h 1 d 2 then every recipient of both messages m1 and m2 must
receive m1 before m2.
Applications: replicateddatabasesystems, theorder may Applications: replicated database systems, the order may
be important. E.g., account adjust
The Birman-Schiper-Stephenson protocol was
implemented in ISIS
Basic Idea: deliver a message to a process only if the
messageimmediatelyprecedingit hasbeendeliveredto message immediately preceding it has been delivered to
the process. Otherwise, buffer it until the message
immediately preceding it is delivered.
26
Causal Ordering Causal Ordering
P
1
Send (m1)
S
p
a
c
e
Send (m2)
P
2
P
2
(1)
P
2
Time
(2)
27
Time
Birman Schiper Stephenson protocol Birman-Schiper-Stephenson protocol
Before broadcasting a msg m, a process P
i
incrementsthevector timeC
i
[i] andtimestampsm increments the vector time C
i
[i] and timestamps m.
Note that (C
i
[i] 1) indicates how many messages from
P
i
precededm.
A process P
j
= P
i
, upon receiving message m
timestamped with C
m
fromP
i
, delays its delivery
until boththefollowingconditionsaresatisfied until both the following conditions are satisfied.
C
j
[i] = C
m
[i] 1
C
j
[k] > C
m
[k] k e{1,2,,n} {i}
j
Where n is the total number of processes. Delayed msgs
are queued
Whenamessageisdeliveredat aprocessP C is
28
When a message is delivered at a processP
j
, C
j
is
updated according to the vector clock ruleIR2
Global State Global State
The recorded global state may be inconsistent if n
< n or n > n,
where n is the number of messages sent by A along the
h l b f A t t d d channel before As state was recorded.
n is the number of messages sent by A along the
channel before the channels state was recorded.
Examples:
Record state of A at state 1, and state of channel and B
at state 2, n =0, n =1
Channel at state 1, state of A and B at state 2, n=0,
n=1
29
n=1.
Local state Local state Local state Local state
$500 $500
Of A Of A
C1: Empty C1: Empty
Of B Of B
$200 $200 State 1 State 1
C2 E C2 E
S1: A S1: A
S2:B S2:B
C2: Empty C2: Empty
$450 $450 $200 $200 State 2 State 2
C1: $50 C1: $50
C2: Empty C2: Empty
S1: A S1: A
S2:B S2:B
C2: Empty C2: Empty
C1 E t C1 E t
$450 $450
S1 A S1 A
$250 $250 State 3 State 3
C1: Empty C1: Empty
C2: Empty C2: Empty
30
S1: A S1: A
S2:B S2:B
Local States Local States
Local states
send(m
ij
) e LS
i
iff time(send(m
ij
)) < time(LS
i
)
rec(m
ij
) e LS
j
iff time(rec(m
ij
)) < time(LS
j
)
Transit: transit(LS
i
, LS
j
) =
{m
ij
| send(m
ij
) e LS
i
. rec(m
ij
) e LS
j
}
Inconsistent: inconsistent (LS
i
LS
j
) = Inconsistent: inconsistent (LS
i
, LS
j
)
{m
ij
| send(m
ij
) e LS
i
. rec(m
ij
) e LS
j
}
31
Global States Global States
Consistent Global states:
a global stateGS ={LS
1
, LS
2
, LS
n
} is consistent iff
i, j: 1s i, j s n :: inconsistent (LS
i
, LS
j
,) = u
Transitless global states:
a global stateGS ={LS
1
, LS
2
, LS
n
} is transitless iff
1 (LS LS ) u i, j: 1s i, j s n :: transit (LS
i
, LS
j
,) = u
Strongly consistent global states:
It i i t t dt itl It is consistent and transitless
32
Global State Global State
P
1
S
p
a
c
e
P
2
P
2
P
2
Time
33
Time
Chandy-Lamport Algorithm Chandy-Lamport Algorithm
Assumptions: FIFO channel
Marker sendingrulefor aprocessP Marker sending rule for a process P
P records its state
For each outgoing channel C from P on which a marker
hasnot beenalreadysent P sendsamarker alongC has not been already sent, P sends a marker along C
before P sends further messages along C
Marker receiving Rule for a process Q. On receipt
f k l h l C of a marker along a channel C:
If Q has not recorded its sate {
Record the state of C as empty, py,
Follow the marker sending rule. }
Else {record the sate of C as the sequence of messages
receivedalongC after Qsstatewasrecordedand
34
received along C after Q s state was recorded and
before Q received the marker along C }
$500 $500
Local state Local state
Of A Of A C1: Empty C1: Empty
Local state Local state
Of B Of B
$200 $200
State 1 State 1
S1: A S1: A
S2:B S2:B
$450 $450 $200 $200
C2: Empty C2: Empty
C1: +$50 M C1: +$50 M
$450 $450 $200 $200
State 2 State 2
C2: Empty C2: Empty
C1: +$50 M C1: +$50 M
$450 $450 $160 $160
State 3 State 3
C1: +$50 M C1: +$50 M
C2: + $40 C2: + $40
$490 $490 $160 $160
State 4 State 4
C1: + $50 C1: + $50
C2: +$40 M C2: +$40 M $$
$490 $490 $210 $210
State 5 State 5
C1: empty C1: empty
C2 C2
35
Recorded state: A ($500), B($160), c1 (empty), c2 (+$40) Recorded state: A ($500), B($160), c1 (empty), c2 (+$40)
C2: empty C2: empty
$500 $500
C1: Empty C1: Empty
$200 $200
State 1 State 1
C2: Empty C2: Empty
S1: A S1: A
S2:B S2:B
$500 $500 $160 $160
State 2 State 2
C2: Empty C2: Empty
C1: Empty M C1: Empty M
State 2 State 2
C2: + $40 C2: + $40
C1: +$50 C1: +$50
$450 $450 $160 $160
State 3 State 3
C2: + $40 M C2: + $40 M
C1: + $50 C1: + $50
$490 $490 $160 $160
State 4 State 4
C2: empty C2: empty
C1: empty C1: empty
$490 $490 $210 $210
State 5 State 5
C1: empty C1: empty
C2: empty C2: empty
36
Reorder the deduction of A and B Reorder the deduction of A and B
Now, B first deduct, then A Now, B first deduct, then A
Recorded real state: A ($500), B($160), c1 (empty), c2 (+$40) Recorded real state: A ($500), B($160), c1 (empty), c2 (+$40)
A Note on the Collected Global State A Note on the Collected Global State
Its possible that the collected global state is not identical
h l l b l to the real global state
The state may change after the marker is sent out and before it is
received
There exists a sequence of actions which can start from the
initial state and reach the collected state by permutation.
U f l t d t t t bl ti d dl kd t ti d Useful to detect stable properties, deadlock detection, and
termination detection.
Seq Seq
start start
end end
37 Seq Seq
collected collected
CSE 513: Distributed Systems CSE 513: Distributed Systems
(Mutual Exclusion)
Guohong Cao
Department of Computer Science& Engineering Department of Computer Science & Engineering
The Pennsylvania State University
http://www.cse.psu.edu/~gcao p p g
38
Performance Metrics Performance Metrics
Message complexity
Synchronization delay
Response time
System throughput =1/(sd + E)
Where E is the average CS execution time, sd is the
synchronization delay
39
Mutual exclusion in Distributed Systems Mutual exclusion in Distributed Systems
No shared memory, then no semaphores
No common physical clock
Unpredictable network delays y
Requirements
Free of deadlock
Free of starvation
Fairness
Fault tolerance
40
41
A simple solution A simple solution
Use a control site
Send request to the control site
Control site queues requests, and grants permission one
bby one.
Drawbacks
Singlepoint of failure Single point of failure,
election algorithm to select another coordinator
Bottleneck and congestion g
42
(a) Process1asksthecoordinator for permissiontoaccessaharedresource (a) Process 1 asks the coordinator for permission to access a hared resource.
Permission is granted. (b) Process 2 then asks permission to access the
same resource. The coordinator does not reply. (c) When process 1
releasestheresource it tellsthecoordinator whichthenrepliesto2
43
releases the resource, it tells the coordinator, which then replies to 2.
Election Algorithms Election Algorithms
The Bully Algorithm. The process with the highest
process id wins. A process detects the coordinator
problem starts this election.
1 P sendsanELECTION messagetoall processeswith 1. P sends an ELECTION message to all processes with
higher numbers.
2. If no one responds, P wins the election and becomes
coordinator.
3. If one of the higher-ups answers, it takes over. Ps job is
done done.
44
The Bully Algorithm (1) The Bully Algorithm (1)
The bully election algorithm. (a) Process 4 holds an election. (b)
Processes5and6respond, telling4tostop. (c) Now5and6
45
Processes 5 and 6 respond, telling 4 to stop. (c) Now 5 and 6
each hold an election.
The Bully Algorithm (2) The Bully Algorithm (2)
The bully election algorithm. (d) Process 6 tells 5 to
( ) P 6 i d ll stop. (e) Process 6 wins and tells everyone.
46
A ring algorithm, but no token.
Eachprocessknowsitssuccessor or more Each process knows its successor or more.
When a process detects coordinator failure, it builds an ELECTION
message containing its own process id and sends the message to its
successor.
If the successor is down, send to another.
Finally themessagegetsbacktotheinitiator whichcanrecognizeits Finally, the message gets back to the initiator, which can recognize its
own id. Change the message to COORDINATOR and circulate once
again.
47
Election in Wireless Environments Election in Wireless Environments
Previous election algorithms assume that the message
i i li bl d h k l d passing is reliable, and the network topology does not
change. Also, process with the highest number wins. Not
true in wireless environments. The coordinator should have
more resources such as battery power.
The initiator sends an election to its immediate neighbors.
When a node receives election for the first time, it
designates the sender as its parent, and then sends out an
election to all neighbors, except the parent. g , p p
When a node receives an election from a node other than
its parent, sends ack. A node sends ack to its parent after it
i ll k f hi hit d l ti
48
receives all acks from which it sends elections.
Elections in Wireless Environments (1) Elections in Wireless Environments (1)
Figure 6-22. Election algorithm in a wireless network, with node a
asthesource. (a) Initial network. (b)(e) Thebuild-treephase(last
49
as the source. (a) Initial network. (b) (e) The build tree phase (last
broadcast step by nodes f and I not shown). (f) reporting of best
node to source 50
Lamports M E Lamport s M.E
Non-token based algorithm
Requirements:
A process in the CS must release it before it can be granted to
another.
Requests to enter CS should be granted in the order in which they
were made.
If eachuseof CSisfinite eachrequest will eventuallybemet If each use of CS is finite, each request will eventually be met.
Network assumptions
FIFO channel and the channel is reliable
Complete connection
N sites S
1
, S
2
, , S
n
, each keeps a queue (request_queue
i
)
of requestsfor enteringtheCS
51
of requests for entering the CS.
Requesting the CS
WhenasiteSi wantstoenter theCS it sendsaREQUEST(ts i) When a site Si wants to enter the CS, it sends a REQUEST(ts
i
, i)
message to all sites and places the request on its own request_queue
i
When a site S
j
receives the REQUEST(ts
i
, i) from S
i
, it returns a
i dREPLY S d l S timestampedREPLY to S
i
, and places S
i
srequest on request_queue
j
Executing the CS: site S
i
can enter the CS when the following
conditions hold
S
i
has received a message with timestamp larger than (ts
i
, i) from all
other sites.
S t i th t f t S
i
srequest is on the top of request_queue
i
Releasing the CS
SiteS
i
, uponexitingtheCS, removesitsrequest fromthetopof its Site S
i
, upon exiting the CS, removes its request from the top of its
request queue and sends a timestampedRELEASE to all the sites in
its request set.
WhenS receivesaRELEASE fromS it removesS srequest from
52
When S
j
receives a RELEASE from S
i
, it removes S
i
srequest from
its request queue.
Making Requests Making Requests
53
Enters the CS Enters the CS
54
Exits the CS Exits the CS
55
Enters the CS Enters the CS
56
Correctness Proof Correctness Proof
Assume that S
i
and S
j
are both in the CS, and
j
timestamp (S
i
) >timestamp (S
j
).
Condition 1 implies that both S
i
and S
j
had their
own requests at the head of their request_queues.
Condition 2 along with FIFO order, guarantees
h S h h d b ll h d that S
i
has heard about all requests that precedes
its current request.
B t if S h h d b t S t S t But if S
i
has heard about S
j
s requests, S
j
s request
would be ahead of S
i
s request in the queue. A
contradiction
57
contradiction.
Performance Performance
For each entry to CS
One request sent to each of N-1 processes
One reply from each of the N-1 processes
One release to each of the N-1 processes
Message complexity: 3(N-1)
Synchronization delay: T
58
Ricart Agrawala Algorithm Ricart Agrawala Algorithm
Requesting the CS
When a site S
i
wants to enter the CS, it sends a REQUEST
message to all sites.
When a site S
j
receives the REQUEST from S
i
, it returns a W e s eS
j
eceves e QU S o S
i
, eu s
REPLY to S
i
if it is neither requesting nor executing the CS or if
site S
j
is requesting and S
i
s request timestamp is smaller than
S
j
sownrequest. Otherwise, deferstherequest. S
j
s own request. Otherwise, defers the request.
Executing the CS
Site S
i
enters the CS when it has received REPLY messages
from all sites.
Releasing the CS
WhenS exitstheCS it sendsREPLY messagestoall the
59
When S
i
exits the CS, it sends REPLY messages to all the
deferred requests.
60
61
Correctness Proof Correctness Proof
Assume that S
i
and S
j
are both in the CS, and
j
timestamp (S
i
) <timestamp (S
j
).
Since S
i
is in the CS, S
i
has received S
j
s reply
after it had made its own request.
S
j
can concurrently execute the CS withS
i
only if
S REPLY S b f S i h CS S
i
returns a REPLY to S
j
before S
i
exits the CS.
Its impossible since S
j
s request has lower
priority priority.
62
Performance Performance
It is an optimization of the Lamports algorithm, no
FIFO i FIFO assumption.
Message complexity: 2(N-1)
Onerequest sent toeachof N 1 processes One request sent to each of N-1 processes
One reply from each of the N-1 processes
Synchronizationdelay: T Synchronization delay: T
Optimization:
After S
i
has received a REPLY message from S
j
, S
i
can
j
enter the CS any number of times without requesting
permission from S
j
until S
i
sends a REPLY message to S
j
.
Message complexity: between 0 to 2(N-1).
63
g p y ( )
Quorum Quorum
Let U denote set of N sites. A coterie C is a set of
sets, where each set g in C is called a quorum,
which satisfies condition:
C={{a, b}, {b, c}} is a coterie under U={a, b, c},
d { b} {b } i b { } i
empty h g C h g = e : ,
and g={a, b} or {b, c} is a quorum, but g={a} is
not a quorum.
Oth l t Other examples: tree quorum
Guarantee mutual exclusion
64
Maekawa (Quorum-Based) Algorithm Maekawa (Quorum-Based) Algorithm
Construct quorums so that each quorum only has
sites.
Requesting the CS:
N
Send request to each site in its quorum.
When a site receives a request, it sends a reply if it
hasnt beenlocked; otherwise queueit A siteislocked hasn t been locked; otherwise, queue it. A site is locked
if it has sent a reply.
65
Maekawa (Quorum-Based) Algorithm Maekawa (Quorum-Based) Algorithm
Executing the CS:
A site access the CS only when it has received all reply
messages.
R l i th CS Releasing the CS:
Send release to each site in its quorum
Whenasitereceivesarelease it isunlocked It sendsa When a site receives a release, it is unlocked. It sends a
reply to the top site in its request queue if the queue is
not empty.
66
Performance of the Maekawa Algorithm Performance of the Maekawa Algorithm
Messages: request, reply, release N N N
messages. Total 3 messages.
Delay is 2T, since a site exiting the CS must first
N
send a release to unlock the arbiter site which in
turn sends a reply message to the next site to enter
theCS

the CS.
Correctness:
SupposeS andS arebothintheCS andS S = {S } Suppose S
i
and S
j
are both in the CS, and S
i
S
j
= {S
k
},
then site S
k
must have sent reply messages to both S
i
and S
j
concurrently. A contradiction.
67
Possible deadlocks Possible deadlocks
S
i
S
j
= {S
ij
}, S
j
S
k
= {S
jk
}, S
k
S
i
= {S
ki
},

j j j j
It is possible that S
ij
has been locked by S
i
(forcing
S
j
to wait at S
ij
), S
jk
j
(forcing
S
k
to wait at S
jk
), and S
ki
k
(forcing S
i
to wait at S
ki
) resulting in a deadlock
involvingthesitesS S andS involving the sites S
i
, S
j
, and S
k
.
Handling deadlock by new messages: failed,
inquire yield inquire, yield.
68
Solving the Deadlock Problem Solving the Deadlock Problem
A failed message from S
i
to S
j
indicates that S
i j
cannot grant S
j
s request because it has currently
granted permission to a site with a higher priority
t request
An inquire from S
i
to S
j
indicates that S
i
would like
tofindout fromS if it hassucceededinlocking to find out from S
j
if it has succeeded in locking
all the sites in its request set.
A yield messagefromS toS indicatesthat S is A yield message from S
i
to S
j
indicates that S
i
is
returning the permission to S
j
.
69
Details of handling deadlock Details of handling deadlock
When a request(ts,i) from S
i
blocks at S
j
because S
j j j
has currently grated permission to S
k
, then S
j
sends
a failed(j) to S
i
if S
i
s request has lower priority.
Oth i S d i i (j) t S Otherwise, S
j
sends an inquire (j) to S
k
.
In response to an inquire(j) from S
j
, S
k
sends a
yield (k) toS providedS hasreceivedafailed or yield (k) to S
j
, provided S
k
has received a failed or
if it sent a yield.
Inresponsetoayield(k) fromS S assumesit has In response to a yield(k) from S
k
, S
j
assumes it has
been released by S
k
, put this request into the
request queue, and send reply(j) to the top request.
70
q q , p y(j) p q
Token-based schemes Token-based schemes
Enter the CS when holding the token.
Issues: how to find and get the token. This
distinguishes various algorithms.
Generally do not assume FIFO message delivery.
Proof of correctness is trivial.
Use sequence numbers instead of timestamps
Differentiate old and current requests.
71
A Token Ring Algorithm A Token Ring Algorithm
(a) An unordered group of processes on a network.
72
( ) g p p
(b) A logical ring constructed in software.
Raymonds Algorithm Raymond s Algorithm
Sites are logically arranged as a directed tree such that the
edgeof thetreeisassigneddirectionstowardsthesite edge of the tree is assigned directions towards the site
(root) that has the token.
73
Requesting the CS Requesting the CS
1. To enter the CS, a site sends a request to the node along
thepathtotheroot if it doesnot holdthetokenandits the path to the root if it does not hold the token and its
request_q is empty. It adds its request to its request_q.
2. When a site on the path receives the message, it places the
i i d d l h h request in its request_q and sends a request along the path
to the root if it has not sent out a request.
3. When the root receives a request, it sends the token to the q
site from which it received the request and sets its holder
to point at that site.
4 Whenasitereceivesthetoken it deletesthetopentry 4. When a site receives the token, it deletes the top entry
from its request_q, sends the token to the site indicated in
this entry, and sets its holder to point at that site. If
request q isnonempty it sendsarequest tothesitewhich
74
request_q is nonempty, it sends a request to the site which
is pointed at by the holder.
Executing and Exiting the CS Executing and Exiting the CS
Executing the CS
A site enters the CS when it receives the token and its
own request is at the top of its request_q. in this case,
removethetopentry remove the top entry.
Releasing the CS
If itsrequest q isnonempty it deletesthetopentry If its request_q is nonempty, it deletes the top entry
from its request_q, sends the token to that site, and sets
its holder to that site.
If the request_q is nonempty at this point, it sends a
request to the site which is pointed at by the holder.
75
Requesting a Token Requesting a Token
76
Receiving the Token Receiving the Token
77
Performance under Light Load Performance under Light Load
Worst case is: 2(N-1)
Best case: O(log N), synchronization delay: T log N
78
Heavy Load Heavy Load
If every process is requesting the resource, the
token traverses each edge twice to give every
process a turn.
Each request goes only as far as the neighbor, who
has already asked for the token, so each use of the
resourcerequiresfour messages resource requires four messages.
One request for the token
Onedeliveryof thetoken One delivery of the token
One request for return of the token
One return of the token
79

CSE 513: Distributed Systems (Global State and Mutual Exclusion) Inherent Limitation Inherent Limitation

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CSE 513: Distributed Systems (Global State and Mutual Exclusion) Inherent Limitation Inherent Limitation

Uploaded by

Copyright:

Available Formats

CSE 513: Distributed Systems

(Global State and Mutual Exclusion)

You might also like