Chapter 5
Synchronization
Synchronization in distributed systems is harder than in centralized systems because of the need for distributed algorithms. Distributed algorithms have the following properties:
No machine has complete information about the system state
Machines make decisions based only on local information
Failure of one machine does not ruin the algorithm
There is no implicit assumption that a global clock exists
Clock Synchronization
Time is unambiguous in centralized systems.
The system clock keeps time; all entities use it for time
Coordinated Universal Time (UTC): the international standard, based on atomic time; essentially the same as Greenwich Mean Time
Leap seconds are added to keep it consistent with astronomical time
UTC is broadcast by radio (satellite and terrestrial); receivers are accurate to 0.1-10 ms
The goal is to synchronize machines with a master (UTC receiver machine) or with one another.
Physical Clocks
TAI (Temps Atomique International) seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.
The relation between clock time and UTC when clocks tick at different rates.
Clock Synchronization
Each clock has a maximum drift rate r:
1 - r <= dC/dt <= 1 + r
Two clocks may drift apart by 2rΔt in a time interval Δt
To limit the skew to d, resynchronize every d/2r seconds (2rΔt <= d gives Δt <= d/2r)
Cristian's Algorithm
Synchronize machines to a time server that has a UTC receiver
Machine P requests the time from the server every d/2r seconds
P receives time t (C_UTC) from the server and sets its clock to t + t_reply, where t_reply is the time needed to send the reply to P
Use half the measured round-trip time, (t_req + t_reply)/2, as an estimate of t_reply
Improve accuracy by taking a series of measurements
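The exchange above can be sketched as follows (a minimal sketch; `get_server_time` is an illustrative stand-in for the actual request to the UTC time server):

```python
import time

def get_server_time():
    # Stand-in for asking the time server; pretend this is its UTC clock
    return time.time()

def cristian_sync():
    t0 = time.monotonic()        # local time when the request is sent
    server_time = get_server_time()
    t1 = time.monotonic()        # local time when the reply arrives
    round_trip = t1 - t0
    # Estimate the one-way reply delay as half the round-trip time
    return server_time + round_trip / 2

estimate = cristian_sync()
```

In a real deployment the request would cross the network, so a series of measurements (keeping the smallest round trip) gives a better estimate.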
Berkeley Algorithm
Used in systems without UTC receiver
Keep clocks synchronized with one another
One computer is the master, the others are slaves
The master periodically polls the slaves for their times
It averages the times and returns the adjustment each machine should make
Communication delays are compensated for as in Cristian's algorithm
a) The time daemon asks all the other machines for their clock values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
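The averaging step can be sketched as follows (an illustrative function; clock values are given in minutes so the offsets are easy to check):

```python
# Berkeley averaging: the master includes its own clock, averages all
# values, and sends each machine the offset it should apply.
def berkeley_adjustments(master_time, slave_times):
    all_times = [master_time] + slave_times
    avg = sum(all_times) / len(all_times)
    # Offset each machine must add to its clock to reach the average
    return [avg - t for t in all_times]

# Master at 60, slaves at 50 and 85 -> average 65
# -> adjustments +5 (master), +15, -20
offsets = berkeley_adjustments(60, [50, 85])
```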
Distributed Approaches
Both approaches studied thus far are centralized
Decentralized algorithms use resynchronization intervals:
Broadcast the local time at the start of each interval
Collect all other broadcasts that arrive during a period S
Use the average value of all reported times
A few of the highest and lowest values can be thrown away
Logical Clocks
For many problems, only internal consistency of clocks matters.
Absolute (real) time is less important Use logical clocks
Key idea:
Clock synchronization need not be absolute. If two machines do not interact, there is no need to synchronize them. More importantly, processes need to agree on the order in which events occur rather than on the exact time at which they occurred.
Event Ordering
Problem: define a total ordering of all events that occur in a system. Events in a single processor machine are totally ordered. In a distributed system:
No global clock, and local clocks may be unsynchronized. Cannot order events on different machines using local times.
Happens-Before Relation
The expression A → B is read "A happens before B"
If A and B are events in the same process and A executed before B, then A → B
If A represents the sending of a message and B is the receipt of that message, then A → B
The relation is transitive: A → B and B → C implies A → C
Solution:
Each processor i maintains a logical clock LCi
Whenever an event occurs locally at i, LCi = LCi + 1
When i sends a message to j, it piggybacks LCi
When j receives a message from i:
If LCj < LCi then LCj = LCi + 1, else do nothing
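These rules can be sketched as a small class (illustrative; it follows the update rule given above):

```python
# Minimal sketch of Lamport's logical clock rules.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):        # rule: LCi = LCi + 1
        self.time += 1
        return self.time

    def send(self):               # piggyback LCi on the message
        return self.local_event()

    def receive(self, msg_time):  # rule: if LCj < LCi then LCj = LCi + 1
        if self.time < msg_time:
            self.time = msg_time + 1
        return self.time

p = LamportClock()
q = LamportClock()
ts = p.send()      # p's clock becomes 1
q.receive(ts)      # q's clock jumps to 2
```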
Lamport Timestamps
a) Three processes, each with its own clock; the clocks run at different rates
b) Lamport's algorithm corrects the clocks
Updating a replicated database without a totally-ordered logical clock can leave it in an inconsistent state.
Causality
Lamport's logical clocks:
If A → B then C(A) < C(B)
The reverse is not true!
If C(A) < C(B), nothing can be said about the causal relationship between A and B by comparing timestamps alone
Vector Clocks
Causality can be captured by means of vector timestamps. Each process i maintains a vector Vi:
Vi[i]: the number of events that have occurred at i
Vi[j]: the number of events i knows to have occurred at process j
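A vector clock can be sketched as follows (illustrative implementation for n processes; the comparison function shows how causality is recovered from the vectors):

```python
# Sketch of vector clocks: each process keeps a vector of event counts.
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n           # v[j]: events known to have occurred at j

    def local_event(self):
        self.v[self.pid] += 1

    def send(self):
        self.local_event()
        return list(self.v)        # piggyback a copy of the vector

    def receive(self, other):
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.pid] += 1      # receiving is itself an event

def happened_before(a, b):
    # a causally precedes b iff a <= b componentwise and a != b
    return all(x <= y for x, y in zip(a, b)) and a != b
```

Unlike Lamport timestamps, comparing two vectors tells us whether one event causally precedes the other or whether they are concurrent.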
Global State
The global state of a distributed system consists of
Local state of each process Messages sent but not received (state of the queues)
Problem: how can you figure out the state of a distributed system?
Each process is independent No global clock or synchronization
a) A consistent cut: every message receipt in the cut corresponds to a send event in the cut
b) An inconsistent cut: a message is received whose send event is not in the cut
On receiving a marker:
If this is the first marker: checkpoint the local state, send a marker on all outgoing channels, and start saving the messages arriving on all other incoming channels
On a subsequent marker on a channel: stop saving messages for that channel
Distributed Snapshot
A process finishes when:
It has received a marker on each incoming channel and processed them all
Its recorded state: the local state plus the state of all incoming channels
It then sends its recorded state to the initiator
Multiple snapshots may be in progress; each is separate, and each is distinguished by tagging the marker with the initiator's ID (and a sequence number)
a) Process Q receives a marker for the first time and records its local state
b) Q records all incoming messages
c) Q receives a marker on its incoming channel and finishes recording the state of that channel
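The marker rules can be sketched from a single process's point of view (a simplified, illustrative class; channels and messages are plain Python objects, and sending markers on outgoing channels is left as a comment):

```python
# Simplified single-process view of the distributed-snapshot marker rules.
class SnapshotProcess:
    def __init__(self, state, incoming):
        self.state = state
        self.incoming = set(incoming)   # names of incoming channels
        self.recorded_state = None
        self.recording = {}             # channel -> messages saved so far
        self.done_channels = set()

    def on_marker(self, channel):
        if self.recorded_state is None:     # first marker: checkpoint
            self.recorded_state = self.state
            # (would also send a marker on every outgoing channel here)
            self.recording = {c: [] for c in self.incoming if c != channel}
        self.done_channels.add(channel)     # stop recording this channel

    def on_message(self, channel, msg):
        # Messages arriving before that channel's marker belong to the
        # channel's state in the snapshot
        if self.recorded_state is not None and channel not in self.done_channels:
            self.recording[channel].append(msg)

    def finished(self):
        return self.done_channels == self.incoming
```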
Termination Detection
Detecting the end of a distributed computation
Notation: let the sender be the predecessor, the receiver the successor
Two types of markers: Done and Continue
After finishing its part of the snapshot, process Q sends a Done or a Continue to its predecessor
Q sends a Done only when:
All of Q's successors have sent a Done
Q has not received any message since it checkpointed its local state and received a marker on all incoming channels
Otherwise it sends a Continue
Computation has terminated if the initiator receives Done messages from everyone
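Q's decision can be sketched as follows (an illustrative function; in a real system the inputs would arrive as messages):

```python
# The Done/Continue decision for one process Q in termination detection.
def reply_to_predecessor(successor_replies, received_since_checkpoint):
    # Done only if every successor said Done AND no message has arrived
    # since Q checkpointed its local state
    if all(r == "Done" for r in successor_replies) and not received_since_checkpoint:
        return "Done"
    return "Continue"

reply_to_predecessor(["Done", "Done"], False)      # -> "Done"
reply_to_predecessor(["Done", "Continue"], False)  # -> "Continue"
```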
Election Algorithms
Many distributed algorithms need one process to act as coordinator
Doesn't matter which process does the job; we just need to pick one
Election algorithms: technique to pick a unique coordinator (aka leader election) Examples: take over the role of a failed process, pick a master in Berkeley clock synchronization algorithm Types of election algorithms: Bully and Ring algorithms
Bully Algorithm
Each process has a unique numerical ID
Processes know the IDs and addresses of every other process
Communication is assumed reliable
Key idea: select the process with the highest ID
A process initiates an election if it has just recovered from failure or if the coordinator has failed
Three message types: Election, OK, I won
Several processes can initiate an election simultaneously
Need consistent result
The bully election algorithm:
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
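The outcome of these election rounds can be sketched as follows (an illustrative function; `alive(p)` stands in for "p answered our Election message with OK"):

```python
# Sketch of the bully algorithm's decision logic.
def hold_election(my_id, all_ids, alive):
    higher = [p for p in all_ids if p > my_id and alive(p)]
    if not higher:
        return my_id    # nobody higher responded: I am the coordinator
    # A higher process takes over; the election repeats there, so the
    # eventual winner is the highest-numbered live process.
    return hold_election(max(higher), all_ids, alive)

alive = lambda p: p != 7    # pretend process 7 (the old coordinator) crashed
hold_election(4, [1, 2, 3, 4, 5, 6, 7], alive)   # -> 6
```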
Ring-based Election
Processes have unique IDs and are arranged in a logical ring. Each process knows its neighbors.
Select process with highest ID
Begin election if just recovered or coordinator has failed Send Election to closest downstream node that is alive
Sequentially poll each successor until a live node is found
Each process tags its ID on the message Initiator picks node with highest ID and sends a coordinator message Multiple elections can be in progress
Wastes network bandwidth but does no harm
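One pass of the election message can be sketched as follows (an illustrative function; crashed nodes are simply skipped, standing in for polling successors until a live one is found):

```python
# Sketch of a ring election: the Election message circulates once,
# each live process tags its ID, and the initiator picks the highest.
def ring_election(ring, initiator, alive):
    ids = []
    n = len(ring)
    i = ring.index(initiator)
    for step in range(n):
        p = ring[(i + step) % n]
        if alive(p):             # dead successors are skipped
            ids.append(p)        # each live process tags its ID
    return max(ids)              # initiator announces the coordinator

alive = lambda p: p != 7
ring_election([5, 1, 7, 3, 6, 2], initiator=3, alive=alive)   # -> 6
```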
A Ring Algorithm
Comparison
Assume n processes and one election in progress
Bully algorithm
Worst case: the initiator is the node with the lowest ID
It triggers n − 2 elections at higher-ranked nodes: O(n²) messages
Ring
Always 2(n − 1) messages
Distributed Synchronization
Distributed system with multiple processes may need to share data or access shared data structures
Use critical sections with mutual exclusion
a) Process 1 asks the coordinator for permission to enter a critical region; permission is granted
b) Process 2 then asks permission to enter the same critical region; the coordinator does not reply
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2
Properties
Simulates centralized lock using blocking calls Fair: requests are granted the lock in the order they were received Simple: three messages per use of a critical section (request, grant, release) Shortcomings:
Single point of failure: how do you detect a dead coordinator?
A process cannot distinguish a lock that is in use from a dead coordinator
There is no response from the coordinator in either case
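The coordinator's behavior can be sketched as follows (an illustrative class; a `None` return models "no reply", which makes the requester block):

```python
from collections import deque

# Sketch of the centralized mutual-exclusion coordinator.
class Coordinator:
    def __init__(self):
        self.holder = None
        self.queue = deque()        # pending requests, in arrival order

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "grant"          # reply immediately
        self.queue.append(pid)
        return None                 # no reply: requester blocks

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder          # next process to receive the grant
```

The FIFO queue is what makes the scheme fair: grants go out in request order.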
Distributed Algorithm
[Ricart and Agrawala]: needs 2(n − 1) messages
Based on event ordering and timestamps
Process k enters a critical section as follows:
Generate a new timestamp TSk = TSk + 1
Send request(k, TSk) to all other n − 1 processes
Wait until reply(j) is received from all other processes
Enter the critical section
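The receiving side's decision can be sketched as follows (an illustrative function; ties on equal timestamps are broken by process ID, and "defer" means the reply is queued until the receiver leaves the critical section):

```python
# Sketch of the Ricart-Agrawala reply rule. A process receiving
# request(j, ts_j) replies immediately unless it holds the critical
# section or wants it with an earlier (timestamp, id) pair.
def on_request(my_state, my_ts, my_id, ts_j, j):
    if my_state == "RELEASED":
        return "reply"
    if my_state == "HELD":
        return "defer"
    # my_state == "WANTED": the lower (timestamp, id) pair wins
    return "reply" if (ts_j, j) < (my_ts, my_id) else "defer"

on_request("WANTED", my_ts=8, my_id=0, ts_j=12, j=2)   # -> "defer"
on_request("WANTED", my_ts=12, my_id=2, ts_j=8, j=0)   # -> "reply"
```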
A Distributed Algorithm
a) Two processes want to enter the same critical region at the same moment
b) Process 0 has the lowest timestamp, so it wins
c) When process 0 is done, it sends an OK to 2 as well, so 2 can now enter the critical region
Properties
Fully decentralized, but n points of failure! All processes are involved in all decisions
Any overloaded process can become a bottleneck
Comparison
Algorithm      Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized    3                         2                                       Coordinator crash
Distributed    2(n − 1)                  2(n − 1)                                Crash of any process
Token ring     1 to ∞                    0 to n − 1                              Lost token, process crash
Transactions
Transactions provide higher level mechanism for atomicity of processing in distributed systems
Have their origins in databases
[Figure: two clients access a bank database concurrently. Both read B: $200; Client 1 writes B: $203 and C: $297, while Client 2 writes B: $204, so one update to B is lost.]
The result can be inconsistent unless certain properties are imposed on the accesses
ACID Properties
Atomic: all or nothing (indivisible)
Consistent: a transaction takes the system from one consistent state to another (certain invariants hold)
Isolated: intermediate effects are not visible to other transactions (serializable)
Durable: changes are permanent once the transaction completes (commits)
a) Transaction to reserve three flights commits (White Plains → New York → Nairobi → Malindi)
b) Transaction aborts when the third flight is unavailable
Classification of Transactions.
A flat transaction is a series of operations that satisfy the ACID properties.
It does not allow partial results to be committed or aborted. Example: flight reservation, Web link update.
A nested transaction is constructed from a number of subtransactions. A distributed transaction is logically a flat, indivisible transaction that operates on distributed data.
Distributed Transactions
a) b)
A nested transaction (transaction is decomposed into subtransactions) A distributed transaction (subtransaction on different data)
Implementation of transactions
Two methods can be used to implement transactions:
Private workspace: until the transaction either commits or aborts, all of its reads and writes go to a private workspace
Write-ahead log: use a log to record each change; only after the log record has been written successfully is the change made to the file
Private workspace
Each transaction gets copies of all files and objects it uses
Reads can be optimized by not making copies
Writes can be optimized by copying only what is required (an appended block and a copy of each modified block are created; these new blocks are called shadow blocks)
Commit requires making the local workspace global
Private Workspace
a) The file index and disk blocks for a three-block file
b) The situation after a transaction has modified block 0 and appended block 3
c) After committing
Force logs on commit If abort, read log records and undo changes [rollback] Log can be used to rerun transaction after failure Both workspaces and logs work for distributed transactions Commit needs to be atomic [will return to this issue in Ch. 7]
Writeahead Log
(a) A transaction:
x = 0; y = 0;
BEGIN_TRANSACTION;
x = x + 1;
y = y + 2;
x = y * y;
END_TRANSACTION;
(b) Log after the first statement: [x = 0/1]
(c) Log after the second statement: [x = 0/1] [y = 0/2]
(d) Log after the third statement: [x = 0/1] [y = 0/2] [x = 1/4]
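The log-then-apply discipline and rollback can be sketched as follows (an illustrative class; each record stores the old and new value of one variable):

```python
# Sketch of a write-ahead log with rollback. The log record is written
# BEFORE the variable is updated, so an abort can always undo changes.
class WriteAheadLog:
    def __init__(self, store):
        self.store = store
        self.log = []

    def write(self, name, new_value):
        self.log.append((name, self.store[name], new_value))   # log first
        self.store[name] = new_value                           # then apply

    def rollback(self):
        for name, old, _new in reversed(self.log):
            self.store[name] = old      # undo in reverse order
        self.log.clear()

db = {"x": 0, "y": 0}
wal = WriteAheadLog(db)
wal.write("x", db["x"] + 1)     # log record [x = 0/1]
wal.write("y", db["y"] + 2)     # log record [y = 0/2]
wal.write("x", db["y"] ** 2)    # log record [x = 1/4]
wal.rollback()                  # abort: db back to {"x": 0, "y": 0}
```

The same log can also be replayed forward to rerun the transaction after a failure.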
Concurrency Control
Goal: Allow several transactions to be executing simultaneously such that
The collection of manipulated data items is left in a consistent state
Concurrency Control
Concurrency control can be implemented in a layered fashion:
Bottom layer - a data manager performs the actual read and write operations on data
Middle layer - a scheduler carries the main responsibility for properly controlling concurrency; scheduling can be based on the use of locks or on timestamps
Highest layer - the transaction manager is responsible for guaranteeing the atomicity of transactions
General organization of managers for handling distributed transactions.
Serializability
Key idea: properly schedule conflicting operations Conflict is possible if at least one operation is write
Read-write conflict Write-write conflict
a) Three transactions T1, T2, and T3:
T1: BEGIN_TRANSACTION; x = 0; x = x + 1; END_TRANSACTION
T2: BEGIN_TRANSACTION; x = 0; x = x + 2; END_TRANSACTION
T3: BEGIN_TRANSACTION; x = 0; x = x + 3; END_TRANSACTION
b) Schedule 1: x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3 (legal)
c) Schedule 2: x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3 (legal)
d) Schedule 3: x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3 (illegal)
(Schedule 2 is legal because it still results in a valid x value.)
Serializability
Two approaches are used in concurrency control:
Pessimistic approaches: operations are synchronized before they are carried out
Optimistic approaches: operations are carried out and synchronization takes place at the end of the transaction; if a conflict is detected then, one or more transactions are aborted
Two-Phase Locking
Two-phase locking.
Pessimistic Timestamp Ordering
When a transaction aborts, it must restart with a new (larger) timestamp
Two values are kept for each data item x:
max-rts(x): the maximum timestamp of any transaction that read x
max-wts(x): the maximum timestamp of any transaction that wrote x
Write_i(x):
If ts(Ti) < max-rts(x) or ts(Ti) < max-wts(x), then abort Ti
Else perform W_i(x) and set max-wts(x) = ts(Ti)
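The write rule can be sketched as follows (an illustrative function; each data item is a dict holding its max-rts and max-wts values):

```python
# Sketch of the pessimistic timestamp-ordering write rule.
def try_write(ts, item):
    # Abort if a transaction with a larger timestamp already read or
    # wrote this item
    if ts < item["max_rts"] or ts < item["max_wts"]:
        return "abort"
    item["max_wts"] = ts          # record this write's timestamp
    return "ok"

x = {"max_rts": 5, "max_wts": 3}
try_write(7, x)    # -> "ok"    (7 >= 5 and 7 >= 3)
try_write(4, x)    # -> "abort" (4 < max-wts, which is now 7)
```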
Disadvantages:
The transaction must be rerun if it aborts
The probability of conflict rises substantially at high loads