
DETECTING TERMINATION OF DISTRIBUTED COMPUTATIONS BY EXTERNAL AGENTS

Shing-Tsaan Huang
Institute of Computer Science
National Tsing-Hua University
HsinChu, Taiwan (30043), R.O.C.
This work is supported in part by ATC, ERSO, Industrial Technology Institute of the Republic of China under the Contract SF-C-010-1, and the National Science Council of the Republic of China under the Contract NSC77-0408-E007-07.

ABSTRACT

This paper presents an algorithm for detecting termination of distributed computations by an auxiliary controlling agent. The algorithm assigns a weight to each active process and to each message in transit. The controlling agent has a weight, too. The algorithm maintains an invariant that the sum of all the weights related to a computation is equal to one. The controlling agent concludes the termination when its weight equals one.

A space-efficient scheme is proposed to encode the weights such that an active process can send a very large number of messages without running out of the weight.

1. Introduction.

The termination detection problem for distributed computations has attracted considerable interest [1-6,8-9,11] over the past years since the works of Dijkstra and Scholten [2], and Francez [4]. In our discussion, a distributed computation consists of a set of processes, which communicate with one another via message passing. Each process may be either active or idle. An active process may become idle at any time. Only active processes may send messages to others, and an idle process can only be reactivated by receiving a message. The computation is said to be terminated iff all the processes are idle and there is no message in transit.

This paper presents an algorithm for the termination detection problem under the following assumptions. A controlling agent monitors the computation. A logical communication channel exists between each process pair and between the controlling agent and each of the processes. Messages from a sender to a receiver are correctly received by the receiver, in an order not necessarily the same as the sending order. Message delay is arbitrary but finite. The presented algorithm can be easily extended to work on distributed systems with dynamic nature [8,10], in the sense that the processes may be created, destroyed, or migrated from one processing element to another.

The algorithm is modified from the algorithm by Rokusawa, Ichiyoshi, Chikayama, and Nakashima [10] by applying our ideas of the graph search technique reported in [7]. Adopting the weighted throw counting scheme used in their algorithm, our algorithm assigns a weight to each active process and to each message in transit. The controlling agent maintains a weight, too. For each weight W, we let 0 ≤ W ≤ 1. The algorithm maintains an invariant that the sum of all the weights related to the computation is equal to one. The controlling agent concludes termination if its weight equals one.

A space-efficient scheme is proposed to encode the weights such that an active process can send a large number of messages without running out of the weight, and hence avoids requiring additional weight from the controlling agent.
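The weight invariant outlined in the introduction can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the paper's implementation: weights are exact `Fraction` values (standing in for "infinitely divisible" weights), every sender keeps half of its weight and sends the other half (any split into two positive parts would serve), an idling process returns its whole weight to the controlling agent, and messages are delivered in random order to mimic non-FIFO channels. The names `WeightModel` and `run` are hypothetical.

```python
from fractions import Fraction
import random


class WeightModel:
    """Toy model: one controlling agent, n processes, messages in transit."""

    def __init__(self, n):
        self.wc = Fraction(1)            # weight of the controlling agent
        self.w = [Fraction(0)] * n       # process weights (0 means idle)
        self.basic = []                  # basic messages in transit: (dest, weight)
        self.control = []                # control messages in transit: weights

    def total(self):
        """Sum of all weights related to the computation (should stay 1)."""
        in_transit = sum(dw for _, dw in self.basic) + sum(self.control)
        return self.wc + sum(self.w) + in_transit

    def agent_start(self, dest):
        half = self.wc / 2               # assumed split: keep half, send half
        self.wc -= half
        self.basic.append((dest, half))

    def send(self, src, dest):
        half = self.w[src] / 2           # assumed split: keep half, send half
        self.w[src] -= half
        self.basic.append((dest, half))

    def receive(self, k):
        dest, dw = self.basic.pop(k)     # the receiver becomes (or stays) active
        self.w[dest] += dw

    def go_idle(self, p):
        self.control.append(self.w[p])   # return the whole weight to the agent
        self.w[p] = Fraction(0)

    def agent_receive(self, k):
        self.wc += self.control.pop(k)
        return self.wc == 1              # True: termination is concluded


def run(seed=0, n=4, max_sends=30):
    """Drive a random computation until the agent concludes termination."""
    rng = random.Random(seed)
    m = WeightModel(n)
    m.agent_start(0)
    detected = False
    while not detected:
        active = [p for p in range(n) if m.w[p] > 0]
        moves = (["receive"] if m.basic else []) + (["agent"] if m.control else [])
        if active:
            moves.append("idle")
            if max_sends > 0:
                moves.append("send")
        move = rng.choice(moves)
        if move == "receive":
            m.receive(rng.randrange(len(m.basic)))
        elif move == "agent":
            detected = m.agent_receive(rng.randrange(len(m.control)))
        elif move == "idle":
            m.go_idle(rng.choice(active))
        else:
            m.send(rng.choice(active), rng.randrange(n))
            max_sends -= 1
        assert m.total() == 1            # the invariant holds after every step
    return m
```

When `run` returns, the agent's weight is one, so by the invariant no process holds weight and no message is in transit. The `max_sends` bound exists only to make the toy computation finite.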

© 1989 IEEE
CH2706-0/89/0000/0079$01.00

Authorized licensed use limited to: UNIVERSITY OF AKRON. Downloaded on September 14, 2009 at 11:30 from IEEE Xplore. Restrictions apply.
We have organized the rest of the paper as follows. The algorithm and its correctness are discussed in Section 2. Then, the encoding scheme is presented in Section 3. Historic remarks on the works in [7,10] are provided in Section 4. Finally, a summary is given in Section 5.

2. The algorithm and its correctness.

Besides the messages for the computation, which we call basic messages, there are messages from the processes to the controlling agent for the purpose of termination detection, which are called control messages.

The controlling agent and each of the processes maintain a variable W for its weight. Basic messages are denoted as B(DW) and control messages are denoted as C(DW), where DW is a parameter of the weight.

At the beginning, all the processes are idle with their weights having value zero, and the weight of the controlling agent equals one. The computation starts when the controlling agent sends a basic message to one of the processes.

The algorithm consists of the following rules. In the rules, we assume that a weight is infinitely divisible. The implementation of such weights is discussed in the next section.

R1: The controlling agent or an active process may send a basic message to one of the processes, say p, at any time by doing:
    Derive W1 and W2 such that (W1 + W2 = W), (W1 > 0), and (W2 > 0);
    Let W := W1;
    Send a B(DW := W2) to p.

R2: Upon receiving a B(DW), a process p does:
    Let W := W + DW;
    (If p is idle, p is reactivated.)

R3: An active process may become idle at any time by doing:
    Send a C(DW := W) to the controlling agent;
    Let W := 0;
    (The process becomes idle.)

R4: Upon receiving a C(DW), the controlling agent does:
    Let W := W + DW;
    If W = 1, the computation is terminated.

The correctness reasoning of the algorithm is as follows. Let us define the following sets first.

A: the set of all the Ws on active processes.
B: the set of all the Ws on basic messages in transit.
C: the set of all the Ws on control messages in transit.

Also, let us denote the W on the controlling agent as Wc (written $W_c$ in the formulas).

The following P1 and P2 are invariants.

\[ \text{P1:}\quad W_c + \sum_{W \in (A \cup B \cup C)} W = 1 \]
\[ \text{P2:}\quad \forall\, W \in (A \cup B \cup C),\; W > 0 \]

Hence,

\[
\begin{aligned}
W_c = 1 \;&\Rightarrow\; \sum_{W \in (A \cup B \cup C)} W = 0 && \text{(by P1)} \\
&\Rightarrow\; (A \cup B \cup C) = \emptyset && \text{(by P2)} \\
&\Rightarrow\; (A \cup B) = \emptyset \\
&\Rightarrow\; \text{the computation is terminated.}
\end{aligned}
\]

That is, the algorithm never detects a false termination.

Further,

\[
\begin{aligned}
(A \cup B) = \emptyset \;&\Rightarrow\; W_c + \sum_{W \in C} W = 1 && \text{(by P1)} \\
&\Rightarrow\; \text{eventually } W_c = 1 && \text{(message delay is finite)}
\end{aligned}
\]

That is, the algorithm detects every true termination in finite time.

We have presented an algorithm for termination detection on distributed computations. With the assumption that a weight is infinitely divisible, the algorithm can be easily extended to work on distributed systems with dynamic nature. In a dynamic system, an active process may spawn another active process or migrate from one processing element to another. We can let an active process create another active process by splitting its weight into two parts, keeping one part as its new weight, and assigning the other part as the weight of the spawned process. Migration of the processes won't cause any problem provided that only active processes are allowed to migrate. However, the maximum number of co-existing active processes is bounded in an actual implementation. We shall readdress this further after the implementation of the weights is discussed.

3. The implementation of the weights.

In this section, we present a space-efficient scheme to encode the weights. Except for Wc, the weight on the controlling agent, every other weight W is encoded in a small number of bits, yet it behaves as if it were infinitely divisible.

We may represent the W as a binary number in an unbounded array of bits with the binary point before the first bit. That is, the array (100...) has a binary value 0.100... or a decimal value 0.5. By doing this, W1 and W2 in rule R1 are always available. For example, let W = (00100...); we can have W1 = (00010...) and W2 = (00010...). The computation may start when the controlling agent lets its Wc := (100...) and sends a basic message B(DW := (100...)) to one of the processes. The termination is detected when Wc has a carry from the addition in rule R4.

In this way, at the beginning one bit is sufficient to encode the weight. As the computation proceeds, we may need more bits to encode the weights, and the number of bits needed might not be affordable for the processes to maintain or the messages to carry. In the following, we present a space-efficient scheme to handle this problem.

In the scheme, the controlling agent preserves an array of a sufficiently large number of bits to encode Wc. The array is evenly divided into window slots. Each other weight W on an active process or on any message in transit is represented by (...)_i, where (...) is a window of bits and the subscript i is its corresponding slot number. For each W = (...)_i, it is assumed that all the bits in windows other than slot i have value 0. This is similar to a floating-point encoding scheme, in which the number of leading zeros is stored rather than the zeros themselves.

When an active process has weight W = (0...01)_i and wants to send a basic message, it can slide the window to the right one slot position, or let i := i + 1, and then split its weight into two non-zero parts, keeping one part as its new weight and sending the other as the weight of the message. For example, a process can split (0...01)_0 into (1...11)_1 and (0...01)_1, then keep (1...11)_1 as its new weight and assign (0...01)_1 as the weight of the message.

An active process may have two weights according to rule R2, one from itself and the other from a received basic message. If the sum of the two weights cannot be fitted into a single window slot, we let the process keep the window with the smaller slot number and send the other to the controlling agent. For example, when an active process having weight = (...)_3 receives another weight = (...)_1, we let it keep the weight (...)_1 and send (...)_3 to the controlling agent.
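The window encoding described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the paper's implementation: a weight is a `(window, slot)` pair of integers, the window size `H_BITS` and the slot count `SLOTS` are small hypothetical values chosen for readability, a weight that is not of the form (0...01)_i is split in half inside its current window (any two non-zero parts would do), and when two weights in the same slot overflow the window the process keeps its own weight and returns the received one (the text only specifies the case of different slot numbers). The function names are hypothetical.

```python
from fractions import Fraction

H_BITS = 8                      # hypothetical window size h, in bits
SLOTS = 4                       # hypothetical number of slots in the agent's array
FULL = (1 << H_BITS) - 1        # the window (1...11)


def value(weight):
    """Exact value of (window)_slot, binary point before slot 0."""
    window, slot = weight
    return Fraction(window, 1 << (H_BITS * (slot + 1)))


def split(weight):
    """Rule R1 on an encoded weight: return (kept, sent)."""
    window, slot = weight
    if window == 1:
        # (0...01)_i: slide one slot to the right, then split the 2^h units
        # now available into (1...11)_{i+1} and (0...01)_{i+1}.
        return (FULL, slot + 1), (1, slot + 1)
    sent = window // 2          # assumed split inside the current window
    return (window - sent, slot), (sent, slot)


def merge(own, received):
    """Rule R2 on encoded weights: return (kept, returned_to_agent or None)."""
    (w1, s1), (w2, s2) = own, received
    if w1 == 0:                 # idle process: simply adopt the received weight
        return received, None
    if s1 == s2 and w1 + w2 <= FULL:
        return (w1 + w2, s1), None
    # The sum does not fit one window: keep the smaller slot number and
    # send the other weight to the controlling agent.
    if s2 < s1:
        return received, own
    return own, received        # same-slot overflow: an assumption, see lead-in


def agent_add(wc_units, weight):
    """Rule R4 on the agent's array: return (new Wc, carry flag)."""
    window, slot = weight
    wc_units += window << (H_BITS * (SLOTS - 1 - slot))
    return wc_units, wc_units == 1 << (H_BITS * SLOTS)
```

Here the agent's Wc is held as one integer counting units of the last bit of its array, so a carry out of the array is the test `wc_units == 1 << (H_BITS * SLOTS)`. With the paper's example sizes one would set `H_BITS = 16` and `SLOTS = 1024`.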

The above scheme makes it possible for each process to maintain only a few bits, and for each message to carry the same number of bits. For example, let H be the array size of Wc and h be the window size. If h = 16 and H = 1024h, we need 26 bits to represent a weight W: 10 bits for the slot number, because 2^10 = 1024, and 16 bits for the window itself. Only the weight Wc of the controlling agent then needs 1024 16-bit computer words.

When an active process has its slot number reaching the upper limit, it can split its weight into two parts, keep one part as its new weight, and send the other as a request for supply of weight to the controlling agent. The weight on a request message can be of the form (0...01)_i. Upon receiving such a request with weight (0...01)_i, the controlling agent then sends a supply of weight (0...01)_j, with j < i, to the requester, or postpones the sending of the supply until it is able to do so. When an active process receives a supply, the situation is similar to the one described for rule R2, in which a process holds two weights, and similar actions can be followed by the process. When an idle process receives a supply, it simply rebounds the weight to the controlling agent. By enlarging H, the requests for supply of weight can be eliminated.

In a dynamic system, the fact that Wc is bounded by H also has its effects on the maximum number of active processes that can co-exist at any time. With H = 2, for example, the maximum number of co-existing active processes is 4, because in such a case the minimum possible weight for each active process is binary value 0.01 and the sum of their weights is equal to one.

We have presented a space-efficient scheme to encode the weights. This scheme maintains a very large resource pool in the controlling agent, which is shared by all the processes with each having only a small amount of memory space. This idea should be interesting to the community of researchers studying resource sharing problems.

4. Historic Remarks.

The presented algorithm applies the ideas of the graph searching technique reported in [7] on the weighted throw counting scheme discussed in [10]. In searching a graph with logical edges, the searching can start from a root with a weight of one. The root sends a token T(DW := 1/|NS|) to each of its neighbors. We use NS to denote the set of the neighbors of a node. Upon receiving a token T(DW) for the first time, a node sends a token T(DW := DW/(|NS|-1)) to each of its neighbors except the one from which the token is received. For a revisited token T(DW), a node sends a T(DW) to the root. A sink node, a node with only one neighbor, sends a T(DW) to the root upon receiving a token T(DW). The root detects the termination of the searching when the sum of the weights on all the returned tokens equals one. This search technique assumes that a weight is infinitely divisible.

Although different formulation and terminology are used, the termination detection dealt with in the algorithm reported in [10] is similar to the one discussed in this paper. However, in [10], the weights are integer numbers; since no special encoding scheme is used, each activated process almost always needs to request a supply of weight. As discussed in [10], let us have the following weight assignment strategy: assign a fixed weight 2^10 to a basic message if the weight of the sender is more than twice that; otherwise assign half of its weight. Then, an active process which is reactivated by receiving a basic message with a weight of 2^10 can only send 10 basic messages. In our proposed scheme, however, by sliding the window to the right, the effect is that of receiving a supply of a very large weight from the controlling agent.

For instance, let H = 1024h and h = 16. Then, by sliding the window one slot position, a process can send an additional 2^16 basic messages. The computation can go into a depth of 1024 levels without any request messages, and hence the processes are free from experiencing any delays waiting for supply messages. The memory cost for each process and message is 26 bits only.

The major disadvantage of the scheme is that in rule R2 the sum of the two weights may not fit into a single window of bits, hence an extra control message is needed. However, the advantage of the freedom of the processes from waiting for supply messages should be regarded as more important. Further, in some applications, sending basic messages to an active process can be avoided. For example, in the discussed graph searching computation, a process (node) goes active upon receiving the token, and goes idle immediately after sending out the tokens to its neighbors; therefore, no token is sent to an active process.

5. Summary.

We have presented a termination detection algorithm for distributed computations and proposed a space-efficient encoding scheme to implement the idea that logically a weight can be infinitely divisible. The algorithm is modified from the one proposed by Rokusawa, et al. [10], and borrows the ideas from our other article [7]. Using our encoding scheme, each process and message needs only a small number of bits to encode the weight, and the processes can almost be free from the delays of waiting for supply of weights from the controlling agent.

Since a termination detection algorithm should do its best to avoid bothering the computation, having an active process waiting for control messages is not desirable. Also, the detecting delay, the period from when the computation terminates until the controlling agent detects the termination, should be kept as small as possible. Further, the number of control messages sent should be as small as possible. The presented algorithm is almost optimal in these aspects. With a large enough H, the processes won't receive any control messages. It takes only one message delay, from the latest idled process to the controlling agent, to detect the termination. For every basic message, at most one control message is required, for the case that the sum of the weights in rule R2 cannot be fitted into a single window of bits. Finally, only one control message is needed for each idleness of the processes.

ACKNOWLEDGMENT

The author would like to thank the anonymous referees for their helpful comments and suggestions.

References

[1] Dijkstra, E. W., Feijen, W. H. J., and van Gasteren, A. J. M. Derivation of a Termination Detection Algorithm for Distributed Computations. Inform. Processing Lett., Vol. 16, pp. 217-219, June 1983.
[2] Dijkstra, E. W. and Scholten, C. S. Termination Detection for Distributed Computations. Inform. Processing Lett., Vol. 11, pp. 1-4, Aug. 1980.
[3] Eriksen, O. A Termination Detection Protocol and Its Formal Verification. J. Parallel and Distributed Computing 5, pp. 82-91, 1988.
[4] Francez, N. Distributed Termination. ACM Trans. Program. Lang. Syst., Vol. 2, pp. 42-55, Jan. 1980.
[5] Francez, N. and Rodeh, M. Achieving Distributed Termination without Freezing. IEEE Trans. Software Engrg., Vol. 8, pp. 287-292, Mar. 1982.
[6] Huang, S-T. A Fully Distributed Termination Detection Scheme. Inform. Processing Lett., Vol. 29, pp. 13-18, Sept. 1988.
[7] Huang, S-T. A distributed deadlock detection algorithm for CSP style programs. TR-H7601, Inst. of Computer Science, Tsing-Hua University, Feb. 1987, submitted for publication, under revision.
[8] Lai, T-H. Termination Detection for Dynamically Distributed Systems with Non-first-in-first-out Communication. J. Parallel and Distributed Computing 3, pp. 577-599, 1986.
[9] Rana, S. P. A Distributed Solution to the Distributed Termination Problem. Inform. Processing Lett., Vol. 17, pp. 43-46, July 1983.
[10] Rokusawa, K., Ichiyoshi, N., Chikayama, T. and Nakashima, H. An efficient termination detection and abortion algorithm for distributed processing systems. Proc. 1988 Int'l Conf. Parallel Processing, pp. 18-22, 1988.
[11] Topor, R. W. Termination Detection for Distributed Computations. Inform. Processing Lett., Vol. 18, pp. 33-36, Jan. 1984.
