--- RECOVERY IN DISTRIBUTED SYSTEMS ---

Synchronous Checkpointing and Recovery


Proposed by Koo and Toueg.
A coordinated checkpointing and recovery technique that takes a consistent set of
checkpoints and avoids the domino effect and livelock problems during recovery.
It has 2 parts: the checkpointing algorithm and the recovery algorithm.

THE CHECKPOINTING ALGORITHM


Assumptions
a) Processes communicate by exchanging messages through communication channels
b) Channels are FIFO
c) End-to-end protocols (such as sliding window) are assumed to cope with message loss
due to rollback recovery and communication failures
d) Communication failures do not partition the network
Two kinds of checkpoints are kept on stable storage: permanent and tentative.
1. Permanent checkpoint- A local checkpoint at a process that is part of a consistent
global checkpoint.
2. Tentative checkpoint- A temporary checkpoint that becomes a permanent checkpoint when
the algorithm terminates successfully.
Processes roll back only to their permanent checkpoints.
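The relationship between the two kinds of checkpoint can be sketched in Python. This is only an illustration: the class name CheckpointStore and its methods are hypothetical, and an in-memory object stands in for stable storage.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CheckpointStore:
    """Hypothetical per-process store; a real one would live on stable storage."""
    permanent: Optional[Any] = None  # part of a consistent global checkpoint
    tentative: Optional[Any] = None  # candidate, not yet committed

    def take_tentative(self, state: Any) -> None:
        # Record the current state as a tentative checkpoint.
        self.tentative = state

    def commit(self) -> None:
        # Algorithm terminated successfully: tentative becomes permanent.
        if self.tentative is not None:
            self.permanent = self.tentative
            self.tentative = None

    def discard(self) -> None:
        # Algorithm did not succeed: drop the tentative checkpoint.
        self.tentative = None

    def rollback_state(self) -> Optional[Any]:
        # Rollback only ever uses the permanent checkpoint.
        return self.permanent
```

Note that rollback_state never looks at the tentative checkpoint, which mirrors the rule that processes roll back only to permanent checkpoints.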
Furthermore, the checkpointing algorithm assumes that a single process invokes the algorithm
(rather than several processes invoking it concurrently) and that no site in the distributed
system fails during the execution of the algorithm.
ALGORITHM:
Phase 1-
1) Initiating process Pi takes a tentative checkpoint and requests that all the processes take
tentative checkpoints.
2) Each process informs Pi whether it succeeded in taking a tentative checkpoint.
3) If Pi learns that all processes have taken tentative checkpoints, Pi decides that all
tentative checkpoints should be made permanent.
4) Otherwise, Pi decides that all tentative checkpoints should be discarded.
Phase 2-
1) Pi propagates its decision to all processes.
2) On receiving the message from Pi, all processes act accordingly.
3) No process sends messages related to the underlying computation after taking a tentative
checkpoint until phase 2 is completed.
Characteristics:
• all or none of the processes take permanent checkpoints
• there is no record of a message being received but not sent
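The two phases above can be sketched as a small single-machine simulation. This is a minimal sketch, not Koo and Toueg's actual implementation: the Process class, its fields (can_checkpoint, sending_blocked) and the function run_checkpoint are hypothetical names, and direct method calls stand in for messages over FIFO channels.

```python
from typing import List, Optional


class Process:
    """Hypothetical in-memory stand-in for one process."""

    def __init__(self, name: str, can_checkpoint: bool = True) -> None:
        self.name = name
        self.can_checkpoint = can_checkpoint  # simulates phase 1 success/failure
        self.state = 0
        self.permanent = 0                    # last permanent checkpoint
        self.tentative: Optional[int] = None  # tentative checkpoint, if any
        self.sending_blocked = False

    def take_tentative(self) -> bool:
        if not self.can_checkpoint:
            return False
        self.tentative = self.state
        # No application messages until the decision arrives.
        self.sending_blocked = True
        return True

    def on_decision(self, commit: bool) -> None:
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.sending_blocked = False


def run_checkpoint(initiator: Process, others: List[Process]) -> bool:
    # Phase 1: the initiator takes a tentative checkpoint and asks the rest.
    replies = [initiator.take_tentative()]
    replies += [p.take_tentative() for p in others]
    commit = all(replies)
    # Phase 2: the decision is propagated and every process acts on it.
    for p in [initiator, *others]:
        p.on_decision(commit)
    return commit
```

If any one process reports failure in phase 1, every tentative checkpoint is discarded and the old permanent checkpoints stay in place, which gives the all-or-none behaviour.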
THE ROLL BACK RECOVERY ALGORITHM
Assumptions-
a) A single process invokes the algorithm
b) Checkpoint and rollback recovery are not concurrently invoked
Algorithm-
Phase 1:
1) Process Pi checks whether all processes are willing to restart from their previous
checkpoints.
2) A process may reply “no” if it is already participating in a checkpointing or recovery
process initiated by some other process.
3) If all processes are willing to restart from their previous checkpoints, Pi decides that
they should restart.
4) Otherwise, Pi decides that all the processes should continue with their normal activities.
(Pi may attempt recovery at a later time.)
Phase 2:
1) Pi propagates its decision to all processes.
2) On receiving Pi’s decision, the processes act accordingly.
Properties
• all or none of the processes restart from checkpoints
• after rollback, all processes resume in a consistent state
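The recovery phases above can be sketched in the same style. Again this is only an illustration under stated assumptions: Proc, its busy flag and run_rollback are hypothetical names, and method calls replace real message exchange.

```python
from typing import List


class Proc:
    """Hypothetical stand-in for one process taking part in recovery."""

    def __init__(self, state: int, permanent: int, busy: bool = False) -> None:
        self.state = state
        self.permanent = permanent  # last permanent checkpoint
        self.busy = busy            # already in another checkpoint/recovery run

    def willing_to_restart(self) -> bool:
        # A process replies "no" while it takes part in another run.
        return not self.busy

    def on_decision(self, restart: bool) -> None:
        if restart:
            self.state = self.permanent  # roll back to the permanent checkpoint
        # Otherwise the process continues with its normal activity.


def run_rollback(initiator: Proc, others: List[Proc]) -> bool:
    everyone = [initiator, *others]
    # Phase 1: ask every process whether it is willing to restart.
    restart = all([p.willing_to_restart() for p in everyone])
    # Phase 2: propagate the decision.
    for p in everyone:
        p.on_decision(restart)
    return restart
```

A single "no" reply leaves every process untouched, so the initiator can simply call run_rollback again later.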

Disadvantages of Synchronous Approach


• the checkpointing algorithm generates additional message traffic each time a checkpoint is taken
• synchronization delays are introduced during normal operation
