
Distributed Deadlock Detection

Introduction:

Deadlocks are a fundamental problem in distributed systems.

A process may request resources in any order, which may not be known a priori,
and a process can request resources while holding others.

If the sequence of resource allocations to the processes is not controlled,
deadlocks can occur.

A deadlock is a state where a set of processes request resources that are held by
other processes in the set.

A distributed program is composed of a set of n asynchronous processes p1, p2, ...,
pi, ..., pn that communicate by message passing over the communication
network.

Without loss of generality, we assume that each process is running on a different
processor.

The processors do not share a common global memory and communicate solely
by passing messages over the communication network.

There is no physical global clock in the system to which processes have
instantaneous access.

The communication medium may deliver messages out of order; messages may
be lost, garbled, or duplicated due to timeout and retransmission; processors may
fail; and communication links may go down.

We make the following assumptions:

The system has only reusable resources.

Processes are allowed only exclusive access to resources.

There is only one copy of each resource.

A process can be in two states: running or blocked.

In the running state (also called active state), a process has all the needed
resources and is either executing or is ready for execution.

In the blocked state, a process is waiting to acquire some resource.

Wait-For-Graph:

The state of the system can be modeled by a directed graph, called a wait-for
graph (WFG).

In a WFG, nodes are processes and there is a directed edge from node P1 to
node P2 if P1 is blocked and is waiting for P2 to release some resource.

A system is deadlocked if and only if there exists a directed cycle or knot in the
WFG.
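The cycle condition above can be checked with a depth-first search. The following is a minimal sketch, assuming the WFG is stored as a dictionary that maps each process name to the list of processes it waits for; the function name `find_cycle` and this representation are illustrative choices, not part of the notes. It looks only for cycles, which is the relevant condition under the single-copy, exclusive-access assumptions listed earlier; knot detection is not covered.

```python
from typing import Dict, List, Optional


def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one directed cycle of the wait-for graph as a node list, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in wfg}
    for targets in wfg.values():
        for target in targets:
            color.setdefault(target, WHITE)  # processes that only appear as targets
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in wfg.get(node, []):
            if color[nxt] == GREY:
                # nxt is already on the current path, so the path closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]})` reports the cycle through P1, P2, and P3, while a simple chain with no back edge yields `None`.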

Deadlock Handling Strategies

There are three strategies for handling deadlocks, viz.,

deadlock prevention,

deadlock avoidance, and

deadlock detection.

Handling of deadlocks becomes highly complicated in distributed systems because
no site has accurate knowledge of the current state of the system and because
every inter-site communication involves a finite and unpredictable delay.

Deadlock prevention: It is commonly achieved either by having a process acquire
all the needed resources simultaneously before it begins executing or by
preempting a process that holds the needed resource.

• A method that might work is to order the resources and require processes to
acquire them in strictly increasing order. This approach means that a process can
never hold a high resource and ask for a low one, thus making cycles impossible.

• With global timing and transactions in distributed systems, two other methods
are possible, both based on the idea of assigning each transaction a global
timestamp at the moment it starts.

• When one process is about to block waiting for a resource that another process
is using, a check is made to see which has the larger timestamp.

• We can then allow the wait only if the waiting process has a lower timestamp
(that is, it is older) than the process it waits for.

• The timestamp then always increases along any chain of waiting processes,
so cycles are impossible. We could equally require decreasing order if we like.

• It is wiser to give priority to old processes because

– they have run longer, so the system has a larger investment in them;
– they are likely to hold more resources.

– A young process that is killed off will eventually age until it is the oldest one in
the system (provided it keeps its original timestamp when restarted), and that
eliminates starvation.
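The timestamp rule in the bullets above can be sketched as follows. This is a minimal illustration, assuming integer timestamps where a smaller value means an older process; the function names `on_conflict` and `allowed_wait_edges` are hypothetical. The variant shown, in which an older requester waits and a younger requester is aborted, is commonly known as wait-die; the notes do not name it.

```python
from typing import Dict, List, Tuple


def on_conflict(requester_ts: int, holder_ts: int) -> str:
    """Decide what a requester does when the resource is held by another process.

    An older requester (lower timestamp) may wait; a younger one is aborted
    and is assumed to restart later with its original timestamp.
    """
    return "wait" if requester_ts < holder_ts else "abort"


def allowed_wait_edges(
    requests: List[Tuple[str, str]], ts: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Keep only the (requester, holder) pairs for which waiting is permitted."""
    return [(r, h) for r, h in requests if on_conflict(ts[r], ts[h]) == "wait"]


# Three processes try to form the cycle A -> B -> C -> A.
timestamps = {"A": 1, "B": 2, "C": 3}
attempted = [("A", "B"), ("B", "C"), ("C", "A")]
surviving = allowed_wait_edges(attempted, timestamps)
```

In this example the edge from C to A is refused because C is younger than A, so every surviving wait edge points from a lower to a higher timestamp and the cycle cannot close.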

This approach is highly inefficient and impractical in distributed systems.

Deadlock avoidance: In this approach, a resource is granted
to a process if the resulting global system state is safe (note that a global state
includes all the processes and resources of the distributed system).

However, due to several problems, deadlock avoidance is impractical in
distributed systems.

Deadlock detection: This approach requires examining the status of process-resource
interactions for the presence of a cyclic wait.

Deadlock detection seems to be the best approach to
handling deadlocks in distributed systems.

Issues in Deadlock Detection:

Deadlock handling using the approach of deadlock detection entails addressing
two basic issues: first, detection of existing deadlocks, and second, resolution of
detected deadlocks.

Detection of deadlocks involves addressing two issues:
maintenance of the WFG and searching of the WFG for the presence of cycles (or
knots).

Correctness Criteria:

A deadlock detection algorithm must satisfy the following two conditions:

I) Progress (no undetected deadlocks):
The algorithm must detect all existing deadlocks in finite time.
In other words, after all wait-for dependencies for a deadlock
have formed, the algorithm should not wait for any more events to
occur to detect the deadlock.
II) Safety (no false deadlocks):
The algorithm should not report deadlocks that do not exist (called
phantom or false deadlocks).

Since a global state in a distributed system is put together by
communicating messages, a global WFG may include out-of-date arcs
that appear to denote a cycle in the system.
It is hard to design algorithms that are not confused by this.
Resolution of a Detected Deadlock:
Deadlock resolution involves breaking existing wait-for dependencies
between the processes to resolve the deadlock.
It involves rolling back one or more deadlocked processes and assigning
their resources to blocked processes so that they can resume execution.
The options are:
• Killing a process in the cycle(s);
• Preempting the resources from a process in the cycle(s);
• Rolling back a process in the cycle(s).

The resources of this process (or processes) are then released and may be
acquired by other processes.

Control Organization for Distributed Deadlock Detection Algorithms

Distributed deadlock detection can be organized in three
different ways:
• Centralized
• Distributed
• Hierarchical
Assume that the network supports reliable communication.
Centralized:
One central site sets up a global WFG and searches it for cycles.
All decisions are made by the central control node.
• It must maintain the global WFG constantly, or
• periodically reconstruct it.
The main advantage is that this permits the use of relatively simple
algorithms.
The disadvantages include the following:
• There is a single point of failure.
• There can be a communication bottleneck around the site due to all the
WFG information messages.
• Furthermore, this traffic is independent of the formation of any deadlock.
Distributed:
In a distributed control organization,
• All sites have an equal amount of information.
• All sites make decisions based on local information.
• All sites bear equal responsibility for the final decision in detecting
deadlock.
• All sites expend equal effort toward the final decision.
• The global WFG is spread across the sites.
• Deadlock detection is initiated whenever a process suspects there might be a
problem.
• Several sites can initiate the detection at the same time.
The advantages include the following:
o There is no central point of failure.
o A single node failure cannot cause a crash.
o There is no one site with heavy traffic due to the detection algorithm.
o The algorithm is only initiated when a process suspects there might be a
problem.
o The algorithm is not run periodically, only when needed.
The main disadvantage is that resolution may be difficult, as not all sites
may be aware of the processes involved in the deadlock.
The proof of correctness for this type of algorithm may also be difficult.

Centralized Deadlock Detection:

We use a centralized deadlock detection algorithm and try to imitate the
non-distributed algorithm.

– Each machine maintains the resource graph for its own processes and
resources.
– A centralized coordinator maintains the resource graph for the entire
system.
– When the coordinator detects a cycle, it kills off one process to break the
deadlock.
– In updating the coordinator's graph, messages have to be passed.
• Method 1) Whenever an arc is added to or deleted from the resource
graph, a message has to be sent to the coordinator.
• Method 2) Periodically, every process can send a list of arcs added and
deleted since the previous update.
• Method 3) The coordinator asks for information when it needs it.
False Deadlocks:

One possible way to prevent false deadlocks is to use Lamport's
algorithm to provide global timing for the distributed system.
• When the coordinator gets a message that leads it to suspect a deadlock:
– It sends everybody a message saying "I just received a message with
timestamp T which leads to deadlock. If anyone has a message for me
with an earlier timestamp, please send it immediately."
– When every machine has replied, positively or negatively, the
coordinator will see whether the deadlock has really occurred.
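The effect of the timestamp check can be illustrated with a small, hypothetical scenario: A waits for B and C waits for A; then B releases its resource (timestamp 1) and afterwards requests one held by C (timestamp 2). If the request reaches the coordinator before the release, the graph appears to contain a cycle. The sketch below assumes messages of the form `(timestamp, operation, edge)`; all names are illustrative.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]            # (waiting process, process it waits for)
Message = Tuple[int, str, Edge]   # (Lamport timestamp, "add" or "delete", edge)


def has_cycle(edges: Set[Edge]) -> bool:
    """Repeatedly drop edges into processes that wait for nobody; leftovers form cycles."""
    remaining = set(edges)
    while True:
        waiting = {a for a, _ in remaining}
        pruned = {(a, b) for a, b in remaining if b in waiting}
        if pruned == remaining:
            return bool(remaining)
        remaining = pruned


def apply_updates(edges: Set[Edge], messages: Iterable[Message]) -> Set[Edge]:
    """Apply updates in the order given and return the resulting edge set."""
    result = set(edges)
    for _, op, edge in messages:
        if op == "add":
            result.add(edge)
        else:
            result.discard(edge)
    return result


initial = {("A", "B"), ("C", "A")}
release = (1, "delete", ("A", "B"))   # delayed in the network
request = (2, "add", ("B", "C"))      # arrives first

# The coordinator has only seen the later message: a phantom cycle appears.
suspected = has_cycle(apply_updates(initial, [request]))

# After asking for earlier-timestamped messages, it applies all updates in timestamp order.
confirmed = has_cycle(apply_updates(initial, sorted([request, release])))
```

Here `suspected` is true but `confirmed` is false, so the coordinator does not kill any process.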

Centralized Deadlock Detection Algorithms

• The Ho-Ramamoorthy Algorithms
– The Two-Phase Algorithm
– The One-Phase Algorithm

Ho-Ramamoorthy 2-phase Algorithm
– Each site maintains a status table of all processes initiated at that site; it
includes all resources locked and all resources being waited on.
– The controller periodically requests the status table from each site.
– The controller then constructs a WFG from these tables and searches it for cycles.
– If there are no cycles, there are no deadlocks.
– Otherwise (a cycle exists), the controller requests the status tables again and
constructs a WFG based only on the transactions common to the two sets of tables.
– If the same cycle is detected again, the system is declared deadlocked.
– It was later shown that cycles in two consecutive reports need not indicate a
deadlock. Hence, this algorithm may report false deadlocks.
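The two phases can be sketched as follows. This is a simplified illustration, not the authors' exact procedure: the status tables of all sites are assumed to be merged into one dictionary per report, "common transactions" is interpreted as wait-for edges present in both reports, and the check asks whether any cycle survives rather than comparing specific cycles. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]                      # (waiting process, process it waits for)
Status = Dict[str, Dict[str, Set[str]]]     # process -> {"holds": {...}, "waits": {...}}


def build_wfg(status: Status) -> Set[Edge]:
    """P waits for Q if P waits on a resource that Q holds (one holder per resource)."""
    holder_of = {r: p for p, entry in status.items() for r in entry["holds"]}
    return {
        (p, holder_of[r])
        for p, entry in status.items()
        for r in entry["waits"]
        if r in holder_of
    }


def has_cycle(edges: Set[Edge]) -> bool:
    """Repeatedly drop edges into processes that wait for nobody; leftovers form cycles."""
    remaining = set(edges)
    while True:
        waiting = {a for a, _ in remaining}
        pruned = {(a, b) for a, b in remaining if b in waiting}
        if pruned == remaining:
            return bool(remaining)
        remaining = pruned


def two_phase_detect(first: Status, second: Status) -> bool:
    """Report a deadlock only if a cycle survives in both consecutive reports."""
    first_edges = build_wfg(first)
    if not has_cycle(first_edges):
        return False                         # phase 1: no cycle, no deadlock
    return has_cycle(first_edges & build_wfg(second))   # phase 2: common edges only


report1: Status = {
    "T1": {"holds": {"R1"}, "waits": {"R2"}},
    "T2": {"holds": {"R2"}, "waits": {"R1"}},
}
report2: Status = {
    "T1": {"holds": {"R1"}, "waits": {"R2"}},
    "T2": {"holds": {"R2"}, "waits": set()},
}
```

With these reports, `two_phase_detect(report1, report1)` reports a deadlock, while `two_phase_detect(report1, report2)` does not because T2 is no longer waiting in the second report. As the notes point out, agreement between two reports still does not rule out a false deadlock.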

Ho‐Ramamoorthy 1‐phase Algorithm


– Each site maintains two status tables: a resource status table and a
process status table.
– Resource table: the transactions that have locked or are waiting for
each resource.
– Process table: the resources locked by or waited on by each transaction.
– The controller periodically collects these tables from each site.
– It constructs a WFG from the transactions common to both tables.
– No cycle means no deadlock.
– A cycle means a deadlock.
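A minimal sketch of the one-phase check follows, assuming both tables have already been collected and flattened into sets of `(transaction, resource, relation)` entries, where the relation is `"holds"` or `"waits"`. This flattened format and the function names are assumptions made for illustration. Only entries on which both tables agree contribute to the WFG, which is what filters out information that was in transit when the tables were collected.

```python
from typing import Set, Tuple

Entry = Tuple[str, str, str]   # (transaction, resource, "holds" or "waits")
Edge = Tuple[str, str]         # (waiting transaction, transaction it waits for)


def has_cycle(edges: Set[Edge]) -> bool:
    """Repeatedly drop edges into transactions that wait for nobody; leftovers form cycles."""
    remaining = set(edges)
    while True:
        waiting = {a for a, _ in remaining}
        pruned = {(a, b) for a, b in remaining if b in waiting}
        if pruned == remaining:
            return bool(remaining)
        remaining = pruned


def one_phase_detect(resource_table: Set[Entry], process_table: Set[Entry]) -> bool:
    """Build the WFG from entries present in both tables and search it for a cycle."""
    agreed = resource_table & process_table
    holder = {r: t for t, r, rel in agreed if rel == "holds"}
    edges = {(t, holder[r]) for t, r, rel in agreed if rel == "waits" and r in holder}
    return has_cycle(edges)


resource_table = {
    ("T1", "R1", "holds"), ("T2", "R2", "holds"),
    ("T1", "R2", "waits"), ("T2", "R1", "waits"),
}
consistent_process_table = set(resource_table)
# Here T2's request for R1 is recorded at the resource but not yet in the process table.
lagging_process_table = resource_table - {("T2", "R1", "waits")}
```

With the consistent tables the check reports a deadlock; with the lagging process table the unconfirmed wait is ignored and no cycle is found.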

Distributed Deadlock-Detection Algorithms

• A Path-Pushing Algorithm
– The site waits for deadlock-related information from
other sites.
– The site combines the received information with its local
transaction wait-for (TWF) graph to build an updated TWF graph.
– For all cycles 'Ex -> T1 -> T2 -> Ex' that contain the
external node 'Ex', the site transmits them in string form 'Ex, T1,
T2, Ex' to all other sites where a sub-transaction of T2 is
waiting to receive a message from the sub-transaction of
T2 at this site.

Edge-Chasing Algorithm
• Chandy-Misra-Haas Algorithm:
– A probe(i, j, k) is used for deadlock detection initiated on behalf of process Pi.
The probe is sent by the home site of Pj to the home site of Pk.
– The probe message is circulated along the edges of the graph. A probe
returning to Pi implies that a deadlock has been detected.
– Terms used:
• Pj is dependent on Pk if there is a sequence Pj, Pi1, ..., Pim, Pk in which
each process except Pk is blocked and waits for the next one.
• Pj is locally dependent on Pk if the above condition holds and Pj and Pk are on
the same site.
• Each process Pi maintains a boolean array dependent_i: dependent_i(j) is
true if Pi knows that Pj is dependent on it (initially false
for all i and j).

Chandy-Misra-Haas Algorithm
Sending the probe:
if Pi is locally dependent on itself then declare a deadlock
else for all Pj and Pk such that
  (a) Pi is locally dependent upon Pj, and
  (b) Pj is waiting on Pk, and
  (c) Pj and Pk are on different sites,
  send probe(i, j, k) to the home site of Pk.
Receiving the probe(i, j, k):
if (d) Pk is blocked, and
   (e) dependent_k(i) is false, and
   (f) Pk has not replied to all requests of Pj,
then begin
  dependent_k(i) := true;
  if k = i then declare that Pi is deadlocked
  else for all Pm and Pn such that
    (a') Pk is locally dependent upon Pm, and
    (b') Pm is waiting on Pn, and
    (c') Pm and Pn are on different sites,
    send probe(i, m, n) to the home site of Pn.
end.
Performance:
For a deadlock that spans m processes over n sites, at most m(n-1)/2 messages
are needed.
Each message has a fixed size of 3 words.
The delay in deadlock detection is O(n).
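The probe rules can be simulated in a few lines. The sketch below makes several simplifying assumptions that are not part of the algorithm itself: each process runs on its own site (as assumed in the introduction), so local dependency reduces to the process itself; the wait-for relation does not change during detection; condition (f) is taken to hold; and probes travel through one in-memory queue instead of a network. The name `cmh_detect` is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def cmh_detect(initiator: str, waits_for: Dict[str, List[str]]) -> Tuple[bool, int]:
    """Run edge chasing for one initiator; return (deadlocked, probes sent)."""
    # dependent[k] holds the initiators that Pk already knows depend on it.
    dependent: Dict[str, Set[str]] = {}
    probes: Deque[Tuple[str, str, str]] = deque(
        (initiator, initiator, k) for k in waits_for.get(initiator, [])
    )
    sent = len(probes)
    while probes:
        i, j, k = probes.popleft()          # probe(i, j, k) arrives at Pk
        blocked = bool(waits_for.get(k))    # condition (d)
        if not blocked or i in dependent.setdefault(k, set()):   # conditions (d), (e)
            continue                        # the probe is discarded
        dependent[k].add(i)
        if k == i:
            return True, sent               # the probe came back to its initiator
        for n in waits_for[k]:              # forward along every outgoing edge of Pk
            probes.append((i, k, n))
            sent += 1
    return False, sent


ring = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}
chain = {"P1": ["P2"], "P2": ["P3"]}
```

On the ring, a probe initiated by P1 returns to P1 after three hops and a deadlock is declared; on the chain, the probe reaches P3, which is not blocked, and is discarded.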

Chandy‐Misra‐Haas Algorithm

• There are several ways to break the deadlock:

– The process that initiated the probe commits suicide. This is overkill because
several processes might initiate probes and all of them would commit suicide, when
in fact only one of them needs to be killed.
– Each process appends its id to the probe. When the probe comes back,
the originator can kill the process with the highest number by
sending it a message. (Even with several probes, they will all choose the
same victim.)
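The second option can be sketched as follows, assuming integer process ids and, for simplicity, that every blocked process waits for exactly one other process. The function names `probe_path` and `choose_victim` are hypothetical.

```python
from typing import Dict, List, Optional


def probe_path(initiator: int, waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Ids appended to a probe that starts and ends at the initiator, or None."""
    path = [initiator]
    current = waits_for.get(initiator)
    while current is not None and current not in path:
        path.append(current)
        current = waits_for.get(current)
    # The probe came back only if the walk closed on the initiator itself.
    return path if current == initiator else None


def choose_victim(path: List[int]) -> int:
    """Pick the process with the highest id among those recorded on the probe."""
    return max(path)


cycle = {3: 7, 7: 5, 5: 3}
victim_from_3 = choose_victim(probe_path(3, cycle))
victim_from_5 = choose_victim(probe_path(5, cycle))
```

Both initiators select process 7, because each probe collects the same set of ids regardless of where on the cycle it started.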

Hierarchical:
The sites (nodes) are logically connected in a hierarchical structure (such as
a tree).
A site can detect deadlocks among its descendants.
This type of algorithm has the best of both the centralized and the
distributed deadlock detection algorithms.
For efficiency, it is best to keep clusters of interacting processes
together in the hierarchy.
• One variant follows Ho-Ramamoorthy's 1-phase algorithm, with more than one control
site organized in a hierarchical manner.
• Each control site applies the 1-phase algorithm to detect intracluster
deadlocks.
• The central site collects information from the control sites and applies the
1-phase algorithm to detect intercluster deadlocks.
Menasce and Muntz hierarchical deadlock detection:

• Sites (called controllers) are organized in a tree.
• Leaf controllers manage resources.
– Each maintains a local WFG concerned only with its own resources.
• Interior controllers are responsible for deadlock detection.
– Each maintains a global WFG that is the union of the WFGs of its children.
– Each detects deadlocks among its children.
• Changes are propagated upward either continuously or periodically.
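The controller tree can be sketched as follows. This is an illustration under stated assumptions rather than the published algorithm: the class name `Controller` and its methods are hypothetical, each interior controller recomputes the union of its children's WFGs on demand instead of receiving propagated changes, and edges are plain process-to-process pairs.

```python
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]   # (waiting process, process it waits for)


def has_cycle(edges: Set[Edge]) -> bool:
    """Repeatedly drop edges into processes that wait for nobody; leftovers form cycles."""
    remaining = set(edges)
    while True:
        waiting = {a for a, _ in remaining}
        pruned = {(a, b) for a, b in remaining if b in waiting}
        if pruned == remaining:
            return bool(remaining)
        remaining = pruned


class Controller:
    def __init__(
        self,
        name: str,
        local_edges: Iterable[Edge] = (),
        children: Iterable["Controller"] = (),
    ) -> None:
        self.name = name
        self.local_edges: Set[Edge] = set(local_edges)   # used by leaf controllers
        self.children: List["Controller"] = list(children)

    def wfg(self) -> Set[Edge]:
        """A leaf's own WFG, or the union of the children's WFGs for an interior node."""
        edges = set(self.local_edges)
        for child in self.children:
            edges |= child.wfg()
        return edges

    def lowest_detector(self) -> Optional[str]:
        """Name of the lowest interior controller whose WFG contains a cycle."""
        for child in self.children:
            found = child.lowest_detector()
            if found is not None:
                return found
        if self.children and has_cycle(self.wfg()):
            return self.name
        return None


leaf1 = Controller("leaf1", [("A", "B")])
leaf2 = Controller("leaf2", [("B", "C")])
leaf3 = Controller("leaf3", [("C", "A")])
middle = Controller("middle", children=[leaf1, leaf2])
root = Controller("root", children=[middle, leaf3])
```

In this example the cycle A, B, C spans all three leaves, so only the root sees it; a cycle confined to `leaf1` and `leaf2` would already be visible at `middle`.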

Ho and Ramamoorthy's hierarchical deadlock detection:

• Sites are grouped into disjoint clusters.
• Periodically, a site is chosen as the central control site.
• The central control site chooses a control site for each cluster.
• Each control site collects status tables from its cluster and uses the
Ho and Ramamoorthy one-phase centralized deadlock detection algorithm
to detect deadlocks in that cluster.
• All control sites then forward their status information and WFGs to the
central control site, which combines that information into a global WFG and
searches it for cycles.
• Control sites detect deadlocks within clusters.
• The central control site detects deadlocks between clusters.
Resource Management, Distributed Environment, Peer-to-Peer
I. INTRODUCTION

Resource management in a distributed environment is the management of
resources, such as files and other data, across the distributed system. Its main aim is to
make sure that a user or client can access remote resources as easily as
local resources. The basis of resource management is resource sharing:
a computer can request a service or file from another computer by sending an
appropriate request to it over the communication network, so hardware and software
resources can be shared among autonomous computers. This communication can also
be referred to as a peer-to-peer communication mechanism, which is the basis of a
distributed system, as opposed to the centralized client-server mechanism. The
peer-to-peer communication mechanism is much more efficient, flexible, convenient,
and faster than the centralized client-server mechanism. In this architecture all the
processes involved in a task such as resource management play similar roles, interacting
cooperatively as peers without any distinction between client and server processes or
the computers they run on. The aim of the peer-to-peer architecture is to exploit the
resources of a large number of participating computers for the fulfilment of a given
task. Organizing the interaction between the computers is of prime importance. In order
to be able to use the widest possible range and types of computers, the protocol or
communication channel should not contain or use information that may be
misunderstood by certain machines. Special care must also be taken that messages are
indeed delivered correctly and that invalid messages, which would otherwise bring
down the system and perhaps the rest of the network, are rejected. Another important
factor is the ability to send software to another computer in a portable way so that it
may execute and interact with the existing network. This may not always be possible or
practical when using different hardware and resources, in which case other methods
must be used, such as cross-compiling or manually porting the software.
