
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 9, 77-82 (1990)

Two Algorithms for Mutual Exclusion in Real-Time Distributed Computer Systems

ANDRZEJ GOSCINSKI

Department of Computer Science, University College, The University of New South Wales,
Australian Defence Force Academy, Canberra, Australian Capital Territory, Australia

Two algorithms, developed utilizing priority-based event-ordering, which manage mutual exclusion in distributed systems (computer networks) are proposed. In these systems, processes communicate only by messages and do not share memory. The computer network functions either (a) in an environment requiring priorities or (b) in a real-time environment. The algorithms are based on broadcast requests and a token passing service approach, but the token need not be passed if no process wishes to enter the critical section. These algorithms are fully distributed and are insensitive to the relative speeds of node computers and communication links. They use only N messages per critical section, where N is the number of nodes (processes). The algorithms are optimal in the sense that a symmetrical, distributed algorithm cannot use fewer messages if requests are processed by each node computer concurrently. Both algorithms ensure freedom from starvation. There are mechanisms to handle node insertion and removal, node failure, the loss of the token, the existence of more than one token, and delivery of messages out of order. © 1990 Academic Press, Inc.

1. INTRODUCTION

During the Second European SIGOPS Workshop “Making Distributed Systems Work” the problem that there seem to be very few real distributed applications was again raised [9]. H. Kopetz remarked that real-time systems, where distribution is dictated by the application, and which exhibit natural concurrency, are ignored. Indeed, real-time systems are real concurrent systems, which generate new problems for the design of distributed operating systems. One of these is the synchronization of processes.

Some interesting distributed synchronization algorithms [1, 6, 11, 12] have been proposed during the last couple of years. The most important features of these algorithms are that all events are ordered on the basis of the times when these events happened, and that critical sections are granted in a first-come-first-served manner. Because of many problems associated with time stamps event-ordering algorithms (generated by the lack of a common clock), it is suggested that events be ordered on the basis of priorities of the processes rather than on times when they happen. For real-time systems and systems using priorities for scheduling, a priority-based approach is more natural and improves the performance of the total system.

Two algorithms, developed utilizing priority-based event-ordering, which handle mutual exclusion in computer networks are proposed. In these systems, processes communicate only by messages and do not share memory. The computer network functions either (a) in an environment requiring priorities or (b) in a real-time (also called hard real-time) environment (where processes must meet their deadlines). The following assumptions about the communication network were made: (1) it is a homogeneous or heterogeneous network, and (2) it is an error-free network; i.e., messages are not lost and are delivered in the order sent. However, some extensions to the proposed algorithms to handle (i) the loss of the token, (ii) the existence of more than one token, and (iii) delivery of messages out of order are also considered. In the most representative papers on algorithms for mutual exclusion in computer networks, presented by Ricart and Agrawala [11] and Suzuki and Kasami [12], there is an assumption that transmission times may vary and the communication delay is unpredictable. In this paper we show that it is possible to develop an algorithm for the case with an upper boundary on the transmission delay.

The algorithms are based on (i) requests broadcast to all the other processes in the system by a process that wishes to enter the critical section, and (ii) the token passing service approach, but the token need not be passed if no process wishes to enter the critical section. Possession of the token entitles its holder to enter the critical section. These algorithms are fully distributed and are insensitive to the relative speeds of node computers and communication links. They use only N message exchanges for one mutual exclusion invocation, where N is the number of nodes
(processes). The algorithms are optimal in the sense that a symmetrical, distributed algorithm cannot use fewer messages if requests are processed by each node computer concurrently and the network is error-free. They ensure that each process requesting entry to its critical section will receive the token within a finite time of requesting entry.

2. THE ALGORITHMS

Two algorithms utilizing priority-based event-ordering and sharing the same concept were developed. One is suitable for an environment requiring priorities, and the other can be used in a real-time environment. In the system, each node of the computer network executes an identical algorithm, appropriate to the environment. The differences between these two algorithms are implied by the features of the environments as well as the methods used to ensure freedom from starvation. The general description of these two algorithms is common; however, their important features and specific differences will be pointed out. In the presentation of these algorithms we use a notation based on an abstract definition of a queue [5]. The following operations are used: Head returns the head of a queue; Delete deletes the head of a queue; IsNew tests whether a queue is empty; Append concatenates two queues; Merge merges two priority queues; Add inserts an item into a queue. With this notation, the presentation is short and the most important features of both algorithms are easy to identify.

2.1. Basic Concepts

There are N processes located in N nodes of the network, indexed from 1 to N. Each process i is characterized, in the case of a system with priorities (called here the P-system), by its priority p(i), or, in the case of a real-time system (called here the RT-system), by the remaining time T(i) to run the process.

In the P-system, associated with the token is a priority queue $P = (i_1, i_2, \ldots, i_m)$, where $i_j$ indicates a process which wants to enter its critical section, $i_1, i_2, \ldots, i_m$ is a permutation of the sequence of numbers $1, 2, \ldots, m$, such that $m \le N$, and defined by relation $\succ$ in the following way: $i_k \succ i_t$ if and only if for each $k < t$, $p(i_k) > p(i_t)$.

In the RT-system, associated with the token is a priority queue $P = ((i_1, T(i_1)), (i_2, T(i_2)), \ldots, (i_m, T(i_m)))$, where $i_j$ indicates a process which wants to enter its critical section, $i_1, i_2, \ldots, i_m$ is a permutation of a sequence of numbers $1, 2, \ldots, m$, such that $m \le N$, and defined by relation $\succ\succ$ in the following way: $i_k \succ\succ i_t$ if and only if for each $k < t$ their remaining times satisfy the condition $T(i_k) < T(i_t)$.

For both cases, if two processes have the same priority or remaining times, the process with the smaller value of index i is placed first.

When a process i wants to enter its critical section it sends the message request(i, p(i)), in the case of the P-system, or request(i, T(i)), in the case of the RT-system, to all other processes in the system. It is kept waiting until it receives the token. When the process j holding the token is in its critical section, other processes wanting to enter their critical sections can send request messages. An incoming priority request queue Q is formed in the node computer running process j. All processes in this queue are ordered according to relation $\succ$ or relation $\succ\succ$ (defined above). If two processes have the same value of priority, the process with a smaller value of index i is placed first.

A process j holding the token, after exiting the critical section, examines the incoming request queue Q and the queue P received with the token, if any. In the case of the RT-system, the clock routine must perform decrementing operations on the appropriate fields of the elements of the queue P and the requesting queue Q, i.e., T(i), to maintain real-time consistency.

- If the queue P is empty and the queue Q is empty, the process continues with its normal execution until it receives a request message from some other process.
- If the queue P is not empty and the queue Q is empty, the process j deletes (using function Delete) the process with the highest priority (the greatest item), Delete(P).
- If the queue P is empty and the queue Q is not empty, the process j deletes (using function Delete) from the priority queue Q the process with the highest priority (the greatest item), Delete(Q), and creates the new queue P by appending (function Append, P-system) or merging (function Merge, RT-system) the queue P and the queue Q.
- If the queue P is not empty and the queue Q is not empty, the construction of the queue P is different for the two cases.

For a P-system, the process j creates a new queue P by appending (function Append) the old queue P and the queue Q. Processes from the second part (from Q) and each subsequent part (appended, if necessary, in the same way as the first part) cannot be served while processes from the preceding part are still in the queue. This requirement must be fulfilled to avoid starvation of processes with smaller values of priorities.

For an RT-system, the process j constructs (function Merge) the new queue P by merging the old queue P and the requesting queue Q.

The process finally sends the token with the new queue P to the first process recorded in the queue P, Head(P).

2.2. The Basic Algorithm

Figure 1 shows the algorithm, referred to as Algorithm, for an error-free network for the case of a system with priorities, P-system. The structure of the algorithm for a real-time system, RT-system, is identical to the structure of this algorithm. Moreover, the algorithm for the RT-system can be easily obtained from Algorithm by replacing all instances of function Append with instances of function Merge. This is presented in comments to several lines of Algorithm. Procedure wants_to_enter is called when a node attempts to enter the critical section and procedure request_arrives is executed when a request message arrives. In both cases the same suitable algorithm is executed by each node.

const I : Integer; {the identifier of a given node}

var i, j : Integer;
    P, Q : queue of integers;
    IsNew(P), IsNew(Q) : Boolean; {to test whether a queue is empty}
    Requesting, HaveToken : Boolean;

procedure wants_to_enter;
begin
  Requesting := true;
  if not HaveToken
    then begin
      for all j in {1, 2, ..., N}\{I} do
        Send request(I, p) to node j;
      Wait until token is received;
      HaveToken := true
    end;

  Critical Section;

  if IsNew(P) and IsNew(Q)
    then Wait until request is received
    else begin
      if not IsNew(P) and IsNew(Q)
        then begin
          Delete(P);
          HaveToken := false;
          Send token, Token(Tail(P)), to Head(P)
        end
        else if IsNew(P) and not IsNew(Q)
          then begin
            Delete(Q);
            Append(P, Q) → P; {Merge(P, Q) → P}
            HaveToken := false;
            Send token, Token(P), to Head(P)
          end
          else begin
            Delete(P);
            HaveToken := false;
            Append(P, Q) → P; {Merge(P, Q) → P}
            Send token, Token(P), to Head(P)
          end
      Requesting := false
    end;
end;

procedure request_arrives;
begin
  if not HaveToken
    then for j in {1, 2, ..., N}\{I} do
      discard request(j, p)
    else begin
      Delete(P);
      for j in {1, 2, ..., N}\{I} do
        Add(j, Q) → Q; {insert an additional request}
      Append(P, Q) → P; {Merge(P, Q) → P}
      Send token, Token(P), to Head(P)
    end
end;

FIG. 1. The synchronization algorithm for an error-free network.

3. ASSERTIONS

3.1. Mutual Exclusion

Mutual exclusion is achieved when no pair of processes located in different nodes are ever simultaneously in their critical sections [11]. If process i is executing in its critical section then no other process can be executing in its critical section.

ASSERTION. Mutual exclusion is achieved.

Proof: Only one process can be in its critical section at a time, since there is only a single token.

3.2. Deadlock

Distributed processes are deadlocked if no process is in its critical section, and no requesting process can ever proceed to its own critical section [11].

ASSERTION. Deadlock is impossible.

Proof: Suppose that two processes are deadlocked. This means that two distributed processes are in their critical sections and wait for each other. This means that two nodes hold tokens. This contradicts the token passing approach, which asserts that there is one and only one token in the system.

3.3. Starvation

Starvation occurs when one process must wait indefinitely to enter its critical section while other processes are entering and exiting their own critical sections [11].

ASSERTION. Freedom from starvation is ensured.

Proof: This assertion must be proved for both systems separately, because of the different approaches to the construction of the queue P in the case when the original queue obtained with the token contains elements and the requesting queue Q is not empty.

P-system: Because (i) there is a finite number of distributed processes (N) ordered according to the priority values, (ii) processes forming the original queue P (received with the token) are served before processes from the requesting queue (ordered and attached to the original sequence), and (iii) processes are served in the priority order, the time of waiting for entering a critical section is finite. So, starvation is impossible.

RT-system: Because (i) all processes are ordered according to the decreasing values of the remaining times to run, (ii) the remaining times are finite, and (iii) remaining-time decrementing operations on the appropriate fields of the elements of the queue P and the requesting queue Q, i.e., T(i), are performed, the time of waiting for entering a critical section is finite. Thus again, starvation is impossible.

4. MESSAGE TRAFFIC

Both algorithms proposed require one broadcast request message sent by a process wanting to enter its critical section and one token message, which makes it possible to enter the critical section. The process in the node holding the token does not need to send the request message to itself. Since the network has N processes distributed among N nodes, N messages (N - 1 request messages and one token message) have to be exchanged for one mutual exclusion invocation.

The worst case delay involved in granting the critical section resource is the period of time beginning with the requesting node asking for the critical section and ending when that node enters its critical section, for the process with the longest remaining time. This delay can be assessed in the following way. Let $T_{tp}$ be the time of passing the token from one node to another (the time of transferring the request message has approximately the same value) and $T_{cs}$ be the length (in time) of the critical section. The process with the longest remaining time occupies the (N - 1)th position in the ordered sequence. Assuming that the time, denoted by t, spent by procedures of an operating system to perform operations such as inserting processes in the priority request queue Q, merging two priority queues, and remaining-time decrementing operations is much smaller than the time of token passing, i.e., $t \ll T_{tp}$, then the worst case delay $\delta$ is expressed by

[equation not recoverable from the source]

i.e., the upper boundary can be assessed. This value depends greatly on the length (in time) of the critical section. It is likely that when the critical section includes exchange of messages between producers and consumers, this time could be very long. But this is a problem of all synchronization algorithms.

5. PROBLEMS OF REAL NETWORKS

Three types of failures can happen in real networks: (i) network node-oriented failures such as insertion of new nodes, removal of nodes, or node failure; (ii) token-oriented failures such as loss of the token, or the existence of more than one token; and (iii) message delivery failures; i.e., messages are delivered out of order. Because these algorithms do not provide any detection algorithm before something extraordinary happens, these cases are treated uniformly. So, to prevent this situation from stopping the proposed mutual exclusion algorithm, a time-out recovery mechanism may be used.
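
A minimal Python sketch of one such time-out recovery step may help fix the idea. It is an illustration under stated assumptions, not the paper's procedure: the function name `send_token_with_timeout` and the `acknowledges` callback are hypothetical, and the callback stands in for "an acknowledgment arrived before the time-out expired" (a real node would use a timer and network messages). The token holder offers the token to the head of the queue P; if no acknowledgment arrives, that process is presumed failed, removed from P, and the next process is tried.

```python
from typing import Callable, List, Optional


def send_token_with_timeout(
    queue: List[int],
    acknowledges: Callable[[int], bool],
) -> Optional[int]:
    """Offer the token to queued processes in order until one acknowledges.

    queue        -- the queue P travelling with the token (head is queue[0]);
                    it is modified in place.
    acknowledges -- hypothetical stand-in for "ack received before time-out".
    Returns the index of the process that accepted the token, or None if
    every queued process timed out.
    """
    while queue:
        candidate = queue[0]
        if acknowledges(candidate):
            # The acknowledgment arrived in time: the token has been handed over.
            return candidate
        # Time-out expired: presume the candidate failed and drop it from P.
        queue.pop(0)
    return None


if __name__ == "__main__":
    pending = [3, 1, 4]          # processes waiting for the token, head first
    responsive = {1, 4}          # process 3 has failed in this example
    winner = send_token_with_timeout(pending, lambda node: node in responsive)
    print(winner, pending)
```

In this sketch the failed process is simply dropped; once it recovers it would have to send a fresh request.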
5.1. Node-Oriented Events

Insertion of New Nodes. New processes located in new nodes may be added to the group participating in the mutual exclusion algorithm; this operation can be accommodated because the algorithms do not use or maintain any logical ring. The only requirement is a unique name for the new process to be added. This problem is solved by a naming facility. Of course, the delay time will increase with each insertion, and this can cause a problem in a boundary case (real-time requirements).

If the node was previously operational in the group (e.g., it failed and is now restoring), it should treat itself as a brand new process, sending its request message if it wants to enter its critical section. If before failure this process held the token, it should destroy this token and its associated queue P, to avoid the existence of two tokens in the system.

Removal of Nodes. A node wishing to leave the group may do so, if it presently does not hold the token, by notifying all other nodes of its action. This is necessary to remove it from the queue P. The node holding the token must send an acknowledgment message (after removing the node).

Node Failure. It can happen that the node or process in the node fails and will not behave as expected. The node failure can occur in two different situations: first, when the node holds the token, and second, when it sends a request message wanting to enter its critical section and is recorded in the queue P. Because, as we said, these algorithms do not provide any detection algorithm before the occurrence of special events (e.g., long delays of holding the token), we use information associated with such events. We propose a time-out recovery mechanism to be used.

To implement such a mechanism, each process j sending the token to a process, say $i_1$, starts its time-out clock. If process j does not receive from process $i_1$ an acknowledgment message ack(j) before the time-out, then it assumes that process $i_1$ failed and sends the token with the modified queue P to the next node, say $i_2$. If all processes which sent their request messages do not receive the token during the token-rotation time, they can assume that the token is lost. Consequently, the appropriate action should be undertaken (see Loss of the Token in Section 5.2). The appropriate procedure, called send_token, is illustrated in Fig. 2.

var Time_out : integer;
    HaveAck : Boolean;

procedure send_token;
begin
  HaveToken := false;
  Send token, Token(P), to Head(Tail(P));
  Set Time_out;
  repeat
    if Time_out ≤ 0
      then begin
        Delete(Tail(P));
        Send token, Token(P), to Head(Tail(P));
        Set Time_out
      end
  until HaveAck;
  Clear buffer with queue P
end;

FIG. 2. Procedure send_token.

Thus, this recovery mechanism requires (i) sending the token and an acknowledgment message (the acknowledgment could be the token, in which case the sending process has to use group addressing), and (ii) performing operations on the queue P and the requesting queue by two “neighbor” (predecessor, successor) processes/nodes.

5.2. Token-Oriented Failures

Protocols providing for failure detection and recovery have to be provided to deal with these failures.

Loss of the Token. If all processes wanting to enter their critical sections sent their request messages and did not receive the token during the time δ + 2~, they can assume that the token is lost. An appropriate action should be performed. If the token is lost, an election is called and the elected process generates a new token [3, 7, 8]. All processes which want to enter their critical sections have to send request messages again to reconstruct queue P. There are other algorithms to solve these problems.

Existence of More than One Token. If more than one token is detected, the processes (located in two nodes) have to make an agreement to destroy one of the tokens they hold. At the same time, if any two processes are deadlocked (both of them are in their critical sections), recovery from this deadlock will be done.

5.3. Messages Arrive Out of Order

A real network can cause the loss of messages or their out of order arrival. In the latter case, the situation is critical when requests from distinct sources arrive at the token destination later than a token which was sent off earlier than these requests; i.e., they arrive out of order. In the case of the proposed algorithms such an out of order arrival can be treated as the loss of the request [4]. The loss of a request requires its retransmission. A process is allowed to retransmit after a time-out. The time-out value should be set to at least the value of the worst case delay. The relevant part of procedure wants_to_enter is shown in Fig. 3.

procedure wants_to_enter;
begin
  Requesting := true;
  if not HaveToken {modification starts}
    then begin
      repeat
        for all j in {1, 2, ..., N}\{I} do
          Send request(I, p, r-id, ret-id) to node j;
        Set up req.-time-out;
        Wait until token is received;
      until HaveToken;
      HaveToken := true;
      Send acknowledgment, Ack(Head(P));
      Delete(P)
    end
end;

FIG. 3. Retransmission of a request.

6. CONCLUSION

Two algorithms that implement mutual exclusion in computer networks working in either an environment which requires priorities of processes or a real-time environment are presented. Because of many problems associated
with time stamps event-ordering algorithms, priority-based event-ordering of processes has been proposed. For real-time systems and systems using priorities for scheduling, a priority-based approach is more natural and improves the performance of the total system. Both algorithms are concurrent and distributed, and use N messages per critical section. In both cases freedom from starvation is ensured. There are mechanisms to handle node insertion and removal, node failure, the loss of the token, existence of more than one token, and out of order arrival of messages.

ACKNOWLEDGMENTS

I thank my colleagues A. Quaine and C. Vance for reading drafts of this paper and for their valuable comments. I am also grateful to the referees for valuable comments on the paper.

REFERENCES

1. Chandy, K. M. A mutual exclusion algorithm for distributed systems. Tech. Rep., University of Texas, 1982.
2. Franta, W. R., and Chlamtac, I. Local Networks. Lexington Books, Lexington, MA, 1982.
3. Garcia-Molina, H. Elections in a distributed computing system. IEEE Trans. Comput. C-31, 1 (Jan. 1982), 48-59.
4. Goscinski, A. Distributed Operating Systems. The Logical Design. Addison-Wesley, in press.
5. Hille, R. F. Data Abstraction and Program Development. Prentice-Hall, New York, 1987.
6. Lamport, L. Time, clocks and the ordering of events in a distributed system. Comm. ACM 21, 7 (July 1978), 558-565.
7. Le Lann, G. Distributed systems: Towards a formal approach. Proc. IFIP Congress, Toronto. North-Holland, Amsterdam, Aug. 1977.
8. Le Lann, G. Algorithms for distributed data sharing systems which use tickets. Proc. 3rd Berkeley Workshop, Aug. 1978.
9. Mullender, S. J. Report on the Second European SIGOPS Workshop “Making Distributed Systems Work”, Amsterdam, Netherlands, Sept. 1986, Oper. Syst. Rev. 21, 1 (Jan. 1987), 49-84.
10. Peterson, J. L., and Silberschatz, A. Operating Systems Concepts. Addison-Wesley, Reading, MA, 1985.
11. Ricart, G., and Agrawala, A. K. An optimal algorithm for mutual exclusion in computer networks. Comm. ACM 24, 1 (Jan. 1981), 9-17.
12. Suzuki, I., and Kasami, T. A distributed mutual exclusion algorithm. ACM Trans. Comput. Syst. 3, 4 (Nov. 1985), 344-349.

ANDRZEJ GOSCINSKI received the M.Sc. degree in automatic control in 1968, the Ph.D. degree in control engineering and computer science in 1973, and the D.Sc. degree in computer science in 1976 from the St. Staszic University of Mining and Metallurgy, Krakow, Poland. From 1968 to 1984 he was with the Institute of Computer Science and Control Engineering, at that University. In 1977 Dr. Goscinski was with the Department of Control Engineering, Computer Engineering and Information Sciences, Case Western Reserve University, Cleveland, Ohio. In January 1985 he joined the Department of Computing Science at the University of Wollongong, and since January 1987 has been with the Department of Computer Science, University College, University of New South Wales, Australian Defence Force Academy. He is currently a Senior Lecturer. His current research activities are in distributed operating systems, applications of distributed systems, networks, and communication protocols. Dr. Goscinski is a member of the IEEE Computer Society.

Received September 20, 1987; revised December 5, 1988
