IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 39, NO. 6, JUNE 1991

Another Adaptive Distributed Shortest Path Algorithm

Pierre A. Humblet
Abstract—We give a distributed algorithm to compute shortest paths in a network with changing topology. It does not suffer from the routing table looping behavior associated with the Ford-Bellman distributed shortest path algorithm although it uses truly distributed processing. Its time and message complexities are evaluated.

Paper approved by the Editor for Routing and Switching of the IEEE Communications Society. Manuscript received June 14, 1988; revised May 23, 1990. This work was supported in part by Codex Corporation and in part by the Army Research Office under Grant DAAL03-86-K-0171.
The author is with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139.
IEEE Log Number 9144889.


I. INTRODUCTION

ONE of the oldest and best known problems in the field of distributed algorithms is to compute shortest paths between nodes in a network. This problem arises in the following context. We have a network of links and nodes (processors). Each link (I, J) is characterized by a direction dependent length LEN(I, J) that can change with time and can only be observed at node I. The nodes execute a distributed algorithm to keep track of the shortest distances between themselves and the other nodes, in the course of which they communicate with adjacent nodes by transmitting messages over the links.

A popular solution to this problem is based on the Ford-Bellman method. It was originally introduced in the Arpanet [19] and has been used in a large number of networks [1], [3], [4], [6], [16], [24], [26]-[28]. It works as follows. Two kinds of information are maintained:

-the routing table RTd(I, J), whose (I, J)th entry is maintained at node I to contain the estimate of the minimum distance between I and J;

-the neighbor table NTd(I, J, M), where all the indexes are node identities. NTd(I, J, M) is used to save at I the latest value of RTd(M, J) transmitted to I by an adjacent node M.

The algorithm consists of the following steps.

-Initially RTd(., .) is arbitrary, except RTd(I, I) is set to 0 for all I, and all links are down.

-Whenever a link adjacent to I goes up, node I sends records of the form (J, RTd(I, J)) over it, for all nodes J.

-When a node I receives a pair (J, D) from a neighbor M, with I ≠ J, it sets NTd(I, J, M) to D and it computes RTd(I, J) = min_p [NTd(I, J, p) + LEN(I, p)]. If this results in a new value for RTd(I, J), the record (J, RTd(I, J)) is sent to all the neighbors of I.

-The same computation is also performed at I for all nodes J not equal to I whenever the length of any adjacent link changes. In particular, the length of a down link is considered to be infinite.

This basic algorithm and a number of variations have been shown to converge to the correct distances if the link lengths stabilize and all cycles have strictly positive length [1], [12], [27]. However, the convergence can be very slow when link lengths increase. In a typical example (Fig. 1) node A becomes disconnected. Nodes B and C keep executing the algorithm, increasing their RTd(., A). This behavior is known as "counting." In this example the new length is infinite so that counting goes on without end. In practice there are known upperbounds N on the number of nodes and MAXLEN on LEN(), and entries of RTd that exceed (N - 1)*MAXLEN are set to ∞. If not all up links have the same length, a better alternative [6] is to keep track of the number of links in a shortest path and to only accept paths up to a maximum number of links.

Fig. 1. Looping in Ford-Bellman.

When the "counting" phenomenon occurs in the previous example, data messages destined to A also cycle back and forth between nodes B and C, a phenomenon called "looping." To decouple the concept of "counting," which involves only the shortest path algorithm, and the concept of "looping," which involves data messages, we define "routing table looping" as occurring if arbitrarily long looping of the data messages can be caused by delaying shortest path algorithm messages. The Ford-Bellman algorithm is clearly subject to "routing table looping."
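To make the update rule and the "counting" behavior concrete, here is a minimal single-node, single-destination Python sketch; the dictionary-based state, the function name, and the tiny two-node driver mimicking Fig. 1 are our own illustration and not part of the original algorithm descriptions.

```python
def bellman_ford_update(J, D, sender, NTd, RTd, LEN):
    """One step of the distributed Ford-Bellman rule described above: a neighbor
    `sender` reports its distance D to destination J.  Returns the records
    (J, new estimate) to flood to every neighbor if the estimate changed."""
    NTd[(J, sender)] = D
    new = min(NTd[(J, p)] + LEN[p] for p in LEN if (J, p) in NTd)
    if new != RTd.get(J):
        RTd[J] = new
        return [(J, new)]
    return []

# "Counting" as in Fig. 1: after node A is disconnected, B and C keep raising
# their estimates of the distance to A, one exchange at a time, without end.
LEN_B, LEN_C = {"C": 1}, {"B": 1}               # only link B-C is still up
RTd_B, RTd_C = {"A": 1}, {"A": 2}               # estimates from before the failure
NTd_B, NTd_C = {("A", "C"): 2}, {("A", "B"): 1}
for _ in range(3):
    bellman_ford_update("A", RTd_C["A"], "C", NTd_B, RTd_B, LEN_B)
    bellman_ford_update("A", RTd_B["A"], "B", NTd_C, RTd_C, LEN_C)
    print(RTd_B["A"], RTd_C["A"])               # 3 4, then 5 6, then 7 8, ...
```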
The looping behavior problem is a major drawback of distributed Ford-Bellman algorithms (for analyses, see [2], [15] and Section V below) and modern networks avoid much of the problem by broadcasting the whole topology to all nodes [22]. It is still interesting to try to modify the basic
Ford-Bellman algorithm to prevent looping. Reference [13] extends techniques developed in [18] and "freezes" part of the network while the news of an increase in length propagates. This approach requires new types of messages, increases the complexity of the data structures and sometimes delays a node from obtaining a correct distance. It is also discussed in [10], [11], and [24]. Another approach [9] reduces the likelihood of looping but does not always prevent it. Less efficient solutions, such as broadcasting the sequence of nodes in the shortest paths, have also been proposed [25].

In this paper, we offer another algorithm and we analyze its behavior. It is introduced in the next section. This will be followed by sections on the proof of correctness (Section III), efficient implementations (Section IV), complexity measures (Section V), and finally comparisons with other methods.

II. DESCRIPTION OF THE ALGORITHM

It has often been noted that in the previous algorithm the RTd(I, J)'s for different J's behave independently of each other and that one can focus on a single destination. To the contrary, we remark here that much can be gained by considering the interactions between different destinations.

Assume we know the neighbor K next to the destination on a shortest path from a node I to a destination J. The following statements must hold if we have valid paths to K and J and 0 ≤ LEN() ≤ MAXLEN:

-RTd(I, J) ≥ RTd(I, K);
-RTd(I, J) ≤ RTd(I, K) + MAXLEN;
-a neighbor of I that lies on a shortest path from I to K must also lie on a shortest path from I to J.

This suggests that keeping track of the node next to a destination on a shortest path is useful. Note that this is different from keeping track of the node adjacent to the origin, which only prevents two-link loops [4], [23], [26]. The previous relations could be used to quickly weed out unreachable nodes in Bellman-Ford type algorithms and prevent routing table looping (for example, in Fig. 1, node B should not accept the path through C as its length violates the first inequality). However, we will not use these inequalities directly in the rest of the paper. Rather, we note that keeping information at a node I about the nodes next to destinations is equivalent to keeping track of an entire shortest path tree rooted at I. This is the view that we will adopt and exploit to develop our shortest path algorithm. This idea has been used before in [12], but that work did not realize the full potential of the method. Riddle [21] has proposed an algorithm that is virtually identical to the one proposed here, but he did not provide an analysis. We comment later on these earlier works. In Section II-A and -B, we introduce the data structures and describe the algorithm.

A. Data Structures

Our goal here is to keep track at each node of an estimated shortest path tree, and of the "replicas" of such trees at the adjacent nodes. Although this could be done quite abstractly, we prefer extending the simple and explicit notation already used in Section I.¹

¹Our notation assumes that there is no more than one link between two nodes, but it can be easily modified to handle the general case. Also when the shortest paths are not unique they form a directed acyclic graph (DAG) instead of just a tree. Our notation only keeps track of a tree in the DAG, but it can be extended to keep track of the entire DAG. Efforts in that direction are sometimes necessary when one desires to spread traffic over all shortest paths.

-To keep track of a shortest path tree at a node I we use three kinds of routing table entries, RTd, RTn, and RTa. RTd(I, J) is as before. For I ≠ J, RTn(I, J) denotes the node next to J on the shortest path from I to J, while RTa(I, J) indicates the node adjacent to I on a shortest path to J. RTn(I, I) and RTa(I, I) are set to the special value NIL.

-The neighbor table now contains two kinds of entries, NTd and NTn. NTd(I, J, M) is as before while NTn(I, J, M) is meant to be the node next to node J on a shortest path from M to J.

-Routing update messages sent by a node I are made of records, each record being a triple of the form (J, RTd(I, J), RTn(I, J)).
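As a concrete picture of these tables, the following Python sketch keeps RTd, RTn, RTa, NTd, and NTn as plain dictionaries at one node, with NIL represented by None; the names follow the notation above, while the NodeTables class and the dictionary layout are only an assumed illustration.

```python
from dataclasses import dataclass, field

@dataclass
class NodeTables:
    """Per-node tables of the algorithm, NIL represented by None."""
    me: str                                   # this node's identity, I
    RTd: dict = field(default_factory=dict)   # RTd[J]: estimated distance from I to J
    RTn: dict = field(default_factory=dict)   # RTn[J]: node next to J on the path I -> J
    RTa: dict = field(default_factory=dict)   # RTa[J]: neighbor of I on the path I -> J
    NTd: dict = field(default_factory=dict)   # NTd[(J, M)]: distance to J last reported by M
    NTn: dict = field(default_factory=dict)   # NTn[(J, M)]: node next to J last reported by M

    def __post_init__(self):
        self.RTd[self.me] = 0
        self.RTn[self.me] = None              # NIL
        self.RTa[self.me] = None              # NIL

    def update_records(self):
        """Records (J, RTd(I, J), RTn(I, J)) describing the current routing tree."""
        return [(J, self.RTd[J], self.RTn[J]) for J in self.RTd]

print(NodeTables("B").update_records())       # [('B', 0, None)]
```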
B. Algorithm

The details of the implementation appear in Fig. 2 while Fig. 3 provides a graphic example. The algorithm is composed of two major parts: in the first part, a node observes local topology changes or receives update messages from neighbors; these updates are saved in NT [with respect to Fig. 3, the NT's for node B are the trees rooted at A, C, and D in Fig. 3(b)]. NT(I, ., M) is entirely rebuilt when link (I, M) comes up.

Fig. 2. Pseudocode for the algorithm.
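A minimal sketch of this first part is given below, assuming the same dictionary-style tables as in the previous snippet together with a send callback and a COMPUTE() scheduler stub that stand in for the surrounding system; it illustrates the mechanism described above rather than reproducing the exact pseudocode of Fig. 2.

```python
def on_link_up(neighbor, RTd, RTn, send):
    """Link (I, M) came up: send M the whole current routing tree, one record
    (J, RTd(I, J), RTn(I, J)) per destination J."""
    send(neighbor, [(J, RTd[J], RTn[J]) for J in RTd])

def on_update_message(records, neighbor, NTd, NTn, schedule_compute):
    """Save every received record (J, D, K): D is the sender's distance to J and
    K the node next to J in the sender's tree.  COMPUTE() must then run within
    a finite time; scheduling it is left to a stub here."""
    for J, D, K in records:
        NTd[(J, neighbor)] = D
        NTn[(J, neighbor)] = K
    schedule_compute()

# Example at node B: the link to A comes up, B advertises its (trivial) tree,
# then stores the records A sends back.
RTd, RTn, NTd, NTn = {"B": 0}, {"B": None}, {}, {}
on_link_up("A", RTd, RTn, lambda nbr, recs: print("to", nbr, ":", recs))
on_update_message([("A", 0, None), ("D", 1, "A")], "A", NTd, NTn, lambda: None)
print(NTd)      # {('A', 'A'): 0, ('D', 'A'): 1}
```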
Fig. 3. (a) Network topology. (b) Individual node routing trees. (c) Building the routing tree at node B. (d) Reconfiguration following a topology change.

In the second major part (COMPUTE) each node I builds from all its NT(I, ., .) a large tree with weighted edges (see Fig. 3(c) for node I = B) where a node identity may appear many times: node I puts itself as the root and "grows" on each adjacent link the shortest path trees communicated by its neighbors. This large tree is then scanned in a "breadth first" fashion (with respect to the cumulative edge weights from the root) to obtain a subtree where each node appears at most once. That subtree is adopted as the new "local" shortest path tree RT and changes (if any) with respect to the previous version are communicated to the adjacent nodes. Fig. 3(d) illustrates how node B behaves if an adjacent link fails.

More precisely, COMPUTE() at node I builds RT starting with I, considering the NT(I, J, P) entries in order of nondecreasing distances from I, and including a node J in RT only if it has not been included yet (UNSEEN(J) is TRUE) and if it is next to I in the large tree (NTn(I, J, P) is NIL) or if its neighbor K toward I in the large tree (K = NTn(I, J, P)) already is in RT (RTa(I, K) = P). Thus, the RT structure forms a directed tree (this would hold even if the NT's did not form trees) that is composed of a root node out of which subtrees from the NT's grow. We will call that tree the routing tree.

The description of Fig. 2 does not specify when COMPUTE() is executed, requiring only that it is executed within a finite time after a triggering event. Concrete possibilities will be suggested in Section V.

Because it uses a breadth first search with respect to path length our algorithm can be seen as an adaptive distributed version of Dijkstra's algorithm [7]. Another distributed but static implementation of Dijkstra's method has been given by [8]. These approaches should not be confused with those relying on an explicit topology broadcast followed by local computation [22]. Dijkstra's algorithm and this distributed version can be extended in a straightforward way to handle negative link lengths [20], although they then have exponential complexity [14].
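The exact per-neighbor-list pseudocode is given in Fig. 2; the sketch below implements the same breadth first scan in Python with a single priority queue, which (with strictly positive lengths) examines a node's parent in a neighbor's tree before the node itself and therefore applies the same attachment test. Setting RTn(I, P) = I when the included destination is the neighbor P itself is our reading of a detail carried by Fig. 2 rather than by the text.

```python
import heapq

NIL = None

def compute(me, LEN, NTd, NTn):
    """Rebuild the routing tree at node `me`.

    LEN[P]: length of the up link (me, P), for every adjacent node P
    NTd[(J, P)], NTn[(J, P)]: distance to J and node next to J, as last reported by P
    Returns the new RTd, RTn, RTa dictionaries."""
    RTd, RTn, RTa = {me: 0}, {me: NIL}, {me: NIL}
    heap = []                                    # (distance from me via P, J, P)
    for (J, P), d in NTd.items():
        if P in LEN:
            heapq.heappush(heap, (d + LEN[P], J, P))
    while heap:
        dist, J, P = heapq.heappop(heap)         # nondecreasing distance from me
        if J in RTd:                             # UNSEEN(J) is already FALSE
            continue
        K = NTn[(J, P)]                          # node next to J in P's tree
        if K is NIL:                             # J is the neighbor P itself
            RTd[J], RTn[J], RTa[J] = dist, me, P
        elif RTa.get(K) == P:                    # J's parent was adopted via the same P
            RTd[J], RTn[J], RTa[J] = dist, K, P
        # otherwise J cannot be attached through P's tree; the entry is dropped
    return RTd, RTn, RTa

# Node B with up links to A and C (unit length); A and C have each reported
# their own routing trees, already stored in the neighbor tables.
NTd = {("A", "A"): 0, ("D", "A"): 1, ("C", "C"): 0, ("D", "C"): 1}
NTn = {("A", "A"): None, ("D", "A"): "A", ("C", "C"): None, ("D", "C"): "C"}
print(compute("B", {"A": 1, "C": 1}, NTd, NTn))
```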

III. PROOF OF CORRECTNESS

For the algorithm to work some assumptions on the behavior of the links and nodes must hold. They are similar but not identical to those used by many other authors.

1) There is a link protocol that maintains up and down states for the links and handles message transmissions. It has the following properties.
a) A time interval during which a link is up at a node is called a link up period (LUP) at that node. A link up period at one end of a link corresponds to at most one LUP at the other end. Both ends of the link can only remain in noncorresponding LUP's for finite time intervals.
b) Messages can only be sent during a LUP at the source and received during a LUP at the destination.
c) If the sequence of messages received during a LUP is not a prefix of the sequence of messages sent during the corresponding LUP, the LUP must be finite.²
2) Nodes can similarly be up or down. There are no LUP's at a node while it is down.
3) All nodes are initially down.
4) All update and link state change messages received on (or about) a link are processed one at a time in the order they are received.
5) Link lengths are strictly positive numbers.³ A down link is assigned length ∞.
6) For the purpose of showing that the algorithm eventually converges, we assume that there is a time T such that between times 0 and T links and nodes go up and down and link lengths change, but at time T links have the same status at both ends and there are no changes after T. Below we will use the word final to refer to the link lengths, shortest paths, etc., in the topology after time T. D(I, J) will denote the distance between nodes I and J in that topology.

²The usual requirement is that the sequence of messages received during a LUP be a prefix of the sequence of messages sent during the corresponding LUP, i.e., the link protocol never delivers messages that arrive corrupted, or in duplicate, or out of sequence. We do not need the more stringent assumption because the reception of a garbled or out of sequence message will not affect the algorithm forever if the LUP terminates. This property makes it easy to adapt the algorithm to run without a link protocol.

³0 link lengths can be allowed (even 0 length cycles) by the following artifice: replace these lengths with a positive length equal to the smallest strictly positive length divided by the number of nodes. This will not affect shortest paths. The algorithm can also be modified to directly handle 0 length links, e.g., by considering that if paths A and B have the same length but A contains fewer links than B, then A is "shorter" than B. Any method insuring that the set of links on "shortest" paths to a node form a directed acyclic graph can be used.

It is easy to see (we omit the formal proof) that assumptions 1)-4) together with part 1 of the algorithm of Fig. 2 guarantee that values of RTd(M, .) and RTn(M, .) at and after time T will eventually be reflected in NTd(I, ., M) and NTn(I, ., M) if link (I, M) is finally up. We will show that the algorithm is correct, in the sense that within a finite time after T the RT structures at all nodes will form final shortest path trees. The proof consists of two parts. The first part shows that information about the old (before time T) topology is eventually flushed from the network. The second part shows that shortest path trees are eventually computed.

Let T(0) = T and for a given execution of the algorithm let T(k+1) be the time by which all messages that are in transit at time T(k) (k ≥ 0) have arrived at their destinations and have been processed (including running COMPUTE() and transmitting any messages resulting from COMPUTE()).

Theorem 1: By time T(K) in each routing tree all paths from the root that have no more than K links have their final length.

Proof: We use induction on K. The theorem is true at time T(0) for K = 0. Assume that the theorem is true for all M, 0 ≤ M < K. By time T(K), the routing trees at time T(K-1) of all nodes have been communicated to their neighbors. A path with no more than K links in a routing tree is the concatenation of an adjacent link (which has final length from time T(1) on) and a path with less than K links in the routing tree of an adjacent node, and thus it has a final length from time T(K) on.

Fig. 4. (a) Network topology before and after time T. (b) Routing tree at node A at times T(0) = T, T(1), T(2), and T(3).

Fig. 4(b) displays the evolution of the routing tree at node A following the topology change at time T shown in Fig. 4(a). The situations at time T(0) and T(1) simply illustrate Theorem 1, but that at time T(2) also shows that paths in the tree with no more than K links at time T(K) need not be shortest paths. Specifying when the routing tree is a shortest path tree requires new notation.

Definitions: For any node I and real number d consider the set of loop free directed paths starting at node I and having length not exceeding d in the final topology. Let H(I, d) be the maximum number of links in those paths.

For any two nodes I and J let S(I, J) be the set of nodes M such that link (I, M) is on a final shortest path to J and define L(I, J) as follows:

L(J, J) = 0
L(I, J) = max( H(I, D(I, J)), 1 + max_{K ∈ S(I, J)} L(K, J) )   for I ≠ J.

That the definition of L() makes sense follows from the fact that the set of links on shortest paths to node J forms a directed acyclic graph so that the L(I, J)'s can be defined iteratively.
We now state and prove a theorem about the convergence time of the algorithm.

Theorem 2:
1) If J is not connected to I in the final topology, RTd(I, J) = ∞ for all times after T(H(I, ∞) + 1).
2) If J is connected to node I, the routing tree at node I includes a final shortest path to node J from time T(L(I, J)) on.
3) If paths included in the routing tree are selected with tie breaking rules that depend only on shorter paths stored in the NT's (i.e., the rules are not time varying, or random, or depending on irrelevant longer paths) then no more messages about J will be exchanged after T(L(I, J)) (i.e., the algorithm terminates).

Proof: The first part of the proof follows directly from Theorem 1 and the definition of H(I, d). The second part is proven by induction on the distance from I to J. It is true at time T(0) at node J. Assuming it is true at time T(L(K, J)) for all nodes K that have D(K, J) < D(I, J), we will show it will hold at node I at time T(L(I, J)).

Theorem 1 insures that by time T(H(I, D(I, J))), all paths of length not exceeding D(I, J) include only links with final lengths, thus the routing tree cannot contain a path to J of length less than D(I, J). By time

max_{K ∈ S(I, J)} T(L(K, J) + 1)

a final shortest path from all neighbors of I on a final shortest path from I to J will be reflected in the NT at node I and COMPUTE() ensures that the routing tree of node I will include a path to J. (The "max" is needed as COMPUTE() does not specify how ties are broken in selecting P*.) Similar induction shows that under the hypothesis in the third part the path to J in the routing tree at I will not change after time T(L(I, J)) and thus no more messages will be exchanged.

IV. EFFICIENT IMPLEMENTATIONS AND VARIATIONS

The facts that COMPUTE() involves sorting nodes and that messages include identities of nodes next to destinations may seem prohibitive. We indicate here how simple data structures can alleviate much of the difficulty. Below, N denotes the number of nodes in the network and L(I) the number of links adjacent to node I.

To avoid the sorting operation, nodes in a neighbor table can be organized as a doubly linked list, in order of increasing NTd. Notice that COMPUTE includes records (J, D, K) in an update message in order of nondecreasing D so that following the reception of such a message the linked list for NT can be updated with an amount of processing not worse than linear in N. Running COMPUTE() at a node requires an amount of processing not worse than linear in N * L(I) as there are no more than N * L(I) entries in the NT linked lists and P* must be found at most N times.

If the number of nodes is not small compared to the diameter of the network another efficient alternative is to use "buckets" rather than linked lists to implement a neighbor table. A bucket just contains all nodes at a given distance via a neighbor. An update message can then be processed in a time proportional to the number of records it contains.

Reference [5] has recently proposed another very simple method to build the routing tree. Select an arbitrary node in the large tree and check if it and all the intermediate nodes toward the root are at minimum distance (among all the instances of a node) from the root. If they are then the entire path becomes part of the routing tree. Repeat the process, selecting a node that has not been considered before, until no new node can be added to the routing tree.

Message sizes can be reduced by noting that if there is a record (J, D, K) in a message, node K must appear before J in the updated NT list. Thus the identity of K can be encoded as a number, e.g., specifying the position of K in the list. This can make significant savings in networks where node identities are long.

A more efficient (and complex) implementation is to keep a direct representation of trees for RT and NT. When a new RT is computed, only the difference between the new and old tree needs to be communicated, e.g., as a set of subtrees. Recall that a subtree of N nodes can be transmitted as N node identities plus 2N bits. This can be done by walking along the subtree in depth first fashion, transmitting a node identity the first time it is met, transmitting a 0 bit each time a link is traversed away from the root, and a 1 bit when a link is traversed toward the root. If this is done, updating NT only takes an amount of time proportional to the number of nodes in an update message. In COMPUTE() one needs to consider a node in NT for inclusion in RT only after its parent has been included, but it is not clear to us if this observation can be used to effectively reduce the amount of processing required by COMPUTE().
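A small Python sketch of that depth first subtree encoding is given below; the children-dictionary representation of the tree and the decoder are our own choices, and the snippet only illustrates the scheme of one node identity per first visit plus one bit per link traversal described in the preceding paragraph.

```python
def encode_subtree(children, root):
    """Walk the subtree depth first: emit each node identity the first time it is
    met, a 0 bit when a link is traversed away from the root, a 1 bit when the
    same link is traversed back toward the root."""
    ids, bits = [root], []
    def walk(v):
        for c in children.get(v, []):
            bits.append(0)          # down the link, away from the root
            ids.append(c)           # c is met for the first time
            walk(c)
            bits.append(1)          # back up the same link, toward the root
    walk(root)
    return ids, bits

def decode_subtree(ids, bits):
    """Rebuild the children dictionary from the two streams produced above."""
    it = iter(ids)
    root = next(it)
    children, stack = {root: []}, [root]
    for b in bits:
        if b == 0:
            v = next(it)
            children.setdefault(v, [])
            children[stack[-1]].append(v)
            stack.append(v)
        else:
            stack.pop()
    return root, children

# Example: the subtree rooted at B with children A and C, and D below C.
ids, bits = encode_subtree({"B": ["A", "C"], "C": ["D"]}, "B")
print(ids, bits)                    # ['B', 'A', 'C', 'D'] [0, 1, 0, 0, 1, 1]
print(decode_subtree(ids, bits))
```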

Other savings can be realized by using network specific information. For example if the link lengths change by relatively small amounts it is likely that the structure of the routing tree will often remain unchanged although some lengths may change. It is easy to design coding schemes taking advantage of this feature.

Various optimizations are also possible. For example, a node I need not send updates about a node J to an adjacent node K while J is in the subtree below K in the routing tree at node I, although K must be notified when J moves into that subtree from another one. Also COMPUTE() needs to be run at node I following the reception of a record about node J from node M only if NTd(I, J, M) + LEN(I, M) < RTd(I, J) or if RTa(I, J) = M; a similar rule holds in the case of changes in LEN().

We now discuss the earlier works of Hagouel and Riddle. In the first of these [12] each node also builds a routing tree but it does not use NT structures to keep track of the routing trees of the adjacent nodes. Rather it queries its neighbors whenever they might have a shorter path to a destination. Reference [12] does not point out that its approach prevents "counting," and it argues correctness by relying on an equivalence (in some sense) between the Ford-Bellman method and the new algorithm.

The method proposed in [21] is close to ours, with the exception that the trees transmitted in messages are not routing trees but so called "exclusionary trees," which will be defined shortly. A node I forms a big tree from the received exclusionary trees, and obtains its local routing tree by scanning the big tree in breadth first fashion. Node I grows the exclusionary tree to be sent to its neighbor J by again scanning its big tree in breadth first fashion but omitting subtrees hanging from J. Thus, the exclusionary tree transmitted from I to J does not contain any path going through J and each node effectively knows the loop free shortest path (if any) to each other node via each adjacent link. If an adjacent link fails, the new shortest path is immediately available. However, this does not guarantee that the new shortest path is immediately available at nodes that are not adjacent to the failed link. This approach trades off faster recovery from single link failures for more processing and memory, as a node computes and stores one exclusionary tree for each neighbor. The time and message complexities for Riddle's algorithm do not appear to be significantly different from ours; they are developed in the next section.

V. TIME AND MESSAGE COMPLEXITIES

We define the time complexity of the algorithm as the largest time that can elapse between the moment T when the last topology change occurs and the moment all nodes have final shortest paths and distances to all other nodes. The unit of time is an upperbound on the length of time between the moment a message is generated and the moment it is processed [including running COMPUTE()]. The communication complexity is defined as the maximum number of node identities exchanged after time T.

We start this section by looking at the time complexity in the benchmark case where the length of a single link (P1, P0) changes at a time T after the algorithm had converged. For simplicity, we will limit the discussion to the case where the shortest paths are unique. We will focus on the shortest path from a node I to a node J. There are four possible situations involving that shortest path.

a) (P1, P0) is not on the shortest path and its length does not change enough to affect the shortest path. This uninteresting case is not considered below.
b) (P1, P0) is not on the shortest path and its length decreases enough that it becomes part of the shortest path.
c) (P1, P0) is on the shortest path and its length does not change enough to modify the shortest path (although the length of the shortest path changes).
d) (P1, P0) is on the shortest path between I and J and its length increases enough that the shortest path changes.

In cases b) and c), node I will be aware of the change after a delay not exceeding the number of links on the shortest path [the new shortest path in case b)] between P0 and node I; this can be seen by induction on the number of links between I and P0. Case d) is slightly more complicated and it is best to refer to Fig. 5. Solid lines in Fig. 5(a) represent the shortest path tree to J before the increase in the length of (P1, P0) while in Fig. 5(b) they represent the tree after the increase. Note that the new shortest path from I to J can be decomposed in two parts: the part closer to I is made up of nodes (I = Kn, ..., K2, K1) whose shortest path to J has also changed. The part away from I (between K0 and J) is unchanged. Each of the nodes I = Kn, ..., K2, K1 will learn of the increase via the old shortest path. They will adopt the new shortest path after they have learned of the increase and after their predecessor on the new path also has adopted the new path. Denote by X(i), i = 1, 2, ..., n, the number of links on the old shortest path between nodes Ki and P0, and by Y(i) the delay until node Ki adopts the new shortest path. We have the inductive relations

Y(1) = X(1)
Y(i) = max(X(i), Y(i-1) + 1),  1 < i ≤ n.

Fig. 5. (a) Initial routing tree to node J. (b) Routing tree to node J after a length increase for link (P1, P0).

Y(i) is less than twice the number M of nodes that had link (P1, P0) on their shortest path to node J, as both X(j) and n do not exceed M (the worst case occurs when the network topology is a loop with J = P0 = K0, with I = Kn = P1 adjacent to J, and with the length of link (K0, J) so large that initially all shortest paths to J are via I).

One also sees that "routing table looping" cannot occur following a single link failure because a tree structure is maintained at all times.

We now turn to the general case where an arbitrary number of changes can take place. To simplify the formulas we assume that the last change occurs at time T = 0. Our time complexity results are summarized in the following theorem. Below MaxHop(I, J) and MinHop(I, J) denote, respectively, the maximum and minimum number of links in shortest paths between nodes I and J, while R denotes the ratio of the largest to the smallest values of link lengths assigned to an Up link.

Theorem 3: If T = 0 and messages are processed within one time unit after they are generated, then
-If I is not connected to J, RTd(I, J) = ∞ by time H(I, ∞) + 1,
-If I is connected to J, RT at I includes a shortest path to J by time:
1.1 L(I, J),
1.2 min( H(I, ∞) + MaxHop(I, J), R * MaxHop(I, J) ),
1.3 min( H(I, ∞) + MinHop(I, J), R * MinHop(I, J) ) if COMPUTE() is modified to break ties in favor of the path with fewer links in selecting P*,
2.1 MaxHop(I, J) if just before time T all path lengths stored in NT's and contained in messages in transit are not less than the final lengths (e.g., if all nodes were isolated just before time T),
2.2 MinHop(I, J) under the assumptions in 1.3 and 2.1.

Proof: The first statement and that in 1.1) follow directly from Theorem 2. Note that H(I, ∞) is the maximum number of links in loop free paths starting at node I and that H(I, ∞) + 1 never exceeds N.
1.2 follows from 1.1) and from using the facts "H(I, d) ≤ H(I, ∞) for all d" and "H(I, D(I, J)) ≤ R * MaxHop(I, J)" in the definition of L(I, J).
1.3 follows by examining why the Max operation is used in Theorem 2.
2.1 follows by noticing that under the hypothesis in 2.1, H(I, D(I, J)) plays no role in Theorem 2.
2.2 follows from 2.1) as 1.3) follows from 1.2).

It is interesting to have similar results for the Ford-Bellman algorithm. Here, we assume synchronous operation, i.e., messages are received on a "clock tick," computation and broadcast take place, and the resulting update messages are received at their destinations on the following "clock tick." We denote the value of RTd(I, J) at time T = 0 by D0(I, J). We denote by C(I, K, n) the length of a shortest chain⁴ from node I to node K that contains n links in the final topology. One finds by induction on n that the value of RTd(I, J) at time T = n is given by

min( min_K [C(I, K, n) + D0(K, J)], min_{k<n} C(I, J, k) ).

⁴A chain between nodes I and J is a sequence of adjacent directed links starting at node I and ending at node J. It may contain cycles. Its length is the sum of the lengths of its links.

This exact formula demonstrates that looping is a problem in the Ford-Bellman algorithm, i.e., the presence of loops can keep C(I, K, n) small, even for relatively large n. However, if all the D0(K, J) are not less than the final value of RTd(K, J) then RTd(I, J) will converge in time MinHop(I, J), as does our algorithm. To the contrary if the D0() are small then it becomes impossible to bound the time to convergence by a quantity that is independent of R. Then the tightest bound is simply R * MinHop(I, J) if I is connected to J.

Turning our attention to communication complexity, we must make explicit when COMPUTE() is executed after a triggering event in part 1 of Fig. 2. There are two traditional possibilities, and we also suggest another.

A) Event Driven: Run COMPUTE() whenever a topology change occurs or an update message is received. One expects that this would be the fastest. However, if the output links have finite capacity this might result in update messages queueing for transmission.

B) Periodic: Run COMPUTE() at each node on a periodic basis; the periods need not be the same at all nodes. This has the effect of delaying propagation of changes, but may reduce the computational load and the number of transmitted messages.

C) The third possibility combines the advantages of A) and B): Use A) but avoid the possible queueing of messages by combining all messages queued on a link into a single one. That message indicates all the changes that must be made to NT at the receiving end in order to obtain there the image of the current RT at the source. Note that although queueing is eliminated this may still generate more traffic than B).

If the algorithm is operated in the event driven manner A), little can be said about the number of messages that need to be exchanged. The nature of the difficulty is outlined in the following example [2] that also illustrates how the new algorithm sometimes vastly outperforms the Ford-Bellman method. Consider Fig. 6 where the length of link (B, A) changes from 10 to 1 after the algorithm had converged. In Ford-Bellman it is entirely possible that news of this decrease, broadcast by node B, will first reach node C by way of node B', causing node C to change its distance estimate to node A from 11 to 3. Later node C would learn directly from node B about the true shortest path of length 2. In this process node B has sent one update, but node C has sent two. By a similar reasoning one sees that node D may send up to four updates, node E up to eight, a geometric increase in the number of nodes. In contrast in our algorithm nodes will only accept news of the decrease if they arrive on the shortest path to A (which does not change) and they will participate in only one update. However, one can design similar examples where both the Ford-Bellman and our algorithm suffer even if only a single link length changes. No occurrences of this type of behavior have been reported and they are in fact unlikely as the length assigned to a link is often directly related to delay, making it difficult for many messages traveling on long paths to all arrive before the first message traveling on a short path.

Fig. 6. Example of a geometric increase of the number of update messages.

More can be said if we operate following B) or C) and assume also that it takes no more than one time unit for a message to traverse a link and that at most K messages can traverse a link within a time unit (K would typically be 1). The time bounds of Theorem 3 can then be transformed into bounds for the communication complexity: it does not exceed a function linear in K * N * L * min(N, R * Diam) where L denotes the number of links, and Diam is defined as the maximum over I and J of MinHop(I, J).

Routing table looping can also occur if many links fail. Consider a network with a "wheel" topology, i.e., a central node radially connected to N nodes forming a bidirectional loop. All clockwise loop links and the inward radial links have unit length, while all other links have length N. If all links to the center node fail, all nodes will reroute traffic destined to the center node to their neighbor in the clockwise direction. Nodes will also send routing messages in the counterclockwise direction. Data traffic may cycle clockwise until counterclockwise routing messages have propagated all around the circle. This is the price paid by our algorithm for not sending routing messages in both directions up and down the routing tree as in
[13] and [18]. Reference [29] suggests delaying data messages to remedy this problem.

VI. COMPARISON TO OTHER METHODS

Both the algorithm described here and distributed Ford-Bellman perform equally fast and are at their best under the assumption in part 2.1) of Theorem 3, i.e., when all estimated distances are initially too large. Under general conditions ours performs better, as it does not "count." In addition the complete sequence of nodes on a shortest path can easily be derived from the RT structure. Including this sequence in data packets (also called source routing) is an easy way to guarantee that absolutely no looping will occur. This is very desirable for systems using virtual circuits and is the reason why [12] developed his algorithm.

No analysis similar to the one in Theorem 3 is available for the modification to Ford-Bellman proposed in [13]. However, in the case of a single link length change one can see that it cannot complete faster than ours.

Shortest paths can also be computed by broadcasting the topology and performing local computation [22]. This approach typically is faster and requires fewer messages. However, it requires more storage and processing is not distributed: each node computes its routing tree independently, while in our approach a node benefits from the computation done by the neighbors. The difference is striking in the case of nodes that have only one adjacent link.

Although we prefer the topology broadcast method if enough memory is available, the algorithm presented in this paper may be an attractive update for networks that currently use Ford-Bellman as both methods use similar data structures and messages.

Simple modifications to our algorithm may also be quite attractive in cases where it is enough to find short paths, and not shortest paths. For example, the algorithm can be modified so that nodes only broadcast update messages if the magnitude of the relative difference between the old and new lengths is above some threshold. That way minor changes in lengths will only propagate in a small part of the network thus reducing the amount of communication.

Another variation along the same lines can be used with a hierarchical addressing scheme [17] where nodes in the same subtree of the address space are also close together. The algorithm can then be modified to only update distances to "representatives" of distant subtrees, and not to all nodes in the subtree. This approach also reduces the size of the data structures. These two variations can also be adopted together.

ACKNOWLEDGMENT

The author would like to thank S. DiCecco for reawakening our interest in the shortest path problem and for providing constructive comments, and A. Segall and T. C. Lee for bringing to our attention the works of Hagouel and Riddle.

REFERENCES

[1] D. Bertsekas and R. G. Gallager, Data Networks. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[2] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[3] Burroughs Network Architecture (BNA), Architectural Description, Ref. Man., vol. 1, Burroughs Corp., Detroit, MI, Apr. 1981.
[4] T. Cegrell, "A routing procedure for the TIDAS message-switching network," IEEE Trans. Commun., vol. COM-23, pp. 575-585, June 1975.
[5] C. Cheng, S. P. R. Kumar, and J. J. Garcia-Luna-Aceves, "A distributed loop-free routing algorithm suitable for arbitrary link weights," Northwestern Univ., Evanston, IL, Nov. 1989.
[6] DECnet, Digital Network Architecture, Routing Layer Functional Specification, Version 2.0.0, Digital Equipment Corp., Maynard, MA, 1983.
[7] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269-271, 1959.
[8] D. U. Friedman, "Communication complexity of distributed shortest path algorithms," Rep. LIDS-TH-886, Lab. Inform. Decision Syst., Massachusetts Inst. Technol., Cambridge, MA.
[9] J. J. Garcia-Luna-Aceves, "A new minimum-hop routing algorithm," in Proc. IEEE INFOCOM '87, pp. 170-180.
[10] J. J. Garcia-Luna-Aceves, "A distributed, loop-free, shortest path routing algorithm," in Proc. IEEE INFOCOM '88, pp. 170-180.
[11] S. Gruchevsky and D. Piscitello, "The Burroughs integrated adaptive routing system (BIAS)," Comput. Commun. Rev., vol. 17, no. 1&2, Jan./Apr. 1987.
[12] J. Hagouel, "Issues in routing for large and dynamic networks," Ph.D. dissertation, Graduate School of Arts and Sci., Columbia Univ., 1983. Also available as IBM Res. Rep. RC 9942.
[13] J. M. Jaffe and F. M. Moss, "A responsive routing algorithm for computer networks," IEEE Trans. Commun., vol. COM-30, pp. 1758-1762, July 1982.
[14] D. B. Johnson, "A note on Dijkstra's shortest path algorithm," J. ACM, vol. 20, no. 3, pp. 385-388, July 1973.
[15] M. J. Johnson, "Updating routing tables after resource failure in a distributed computer network," Networks, vol. 14, no. 3, pp. 379-392, 1984.
[16] J. Jubin, "Current packet radio network protocols," in Proc. IEEE INFOCOM '85, Washington, DC, Mar. 1985, pp. 86-92.
[17] L. Kleinrock and F. Kamoun, "Hierarchical routing for large networks: Performance evaluation and optimization," Comput. Networks, vol. 1, pp. 155-174, 1977.
[18] P. M. Merlin and A. Segall, "A failsafe distributed routing protocol," IEEE Trans. Commun., vol. COM-27, pp. 1280-1288, Sept. 1979.
[19] J. McQuillan, "Adaptive routing algorithms for distributed computer networks," BBN Rep. 2831, Bolt, Beranek and Newman, Inc., Cambridge, MA, May 1974.
[20] G. L. Nemhauser, "A generalized permanent label setting algorithm for the shortest path between specified nodes," J. Math. Anal. Appl., vol. 38, pp. 328-334, 1972.
[21] G. G. Riddle, "Message routing in a computer network," U.S. Patent 4 466 060, Aug. 1984.
[22] E. C. Rosen, "The updating protocol of ARPANET's new routing algorithm," Comput. Networks, vol. 4, pp. 11-19, Feb. 1980.
[23] M. Schwartz, "Routing and flow control in data networks," NATO Advanced Study Inst.: New Concepts in Multi-user Communications, Norwich, U.K., Aug. 4-16, 1980; Sijthoff and Noordhoff, The Netherlands.
[24] M. Schwartz, Telecommunication Networks: Protocols, Modeling and Analysis. New York: Addison-Wesley, 1986.
[25] K. G. Shin and M. S. Chen, "Performance analysis of distributed routing strategies free of ping-pong-type looping," IEEE Trans. Comput., vol. C-36, pp. 129-137, Feb. 1987.
[26] D. E. Sproule and F. Mellor, "Routing, flow, and congestion control in the Datapac network," IEEE Trans. Commun., vol. COM-29, pp. 386-391, Apr. 1981.
[27] W. D. Tajibnapis, "A correctness proof of a topology information maintenance protocol for a distributed computer network," Commun. ACM, vol. 20, pp. 477-485, 1977.
[28] J. Westcott and J. Jubin, "A distributed routing design for a broadcast environment," in Conf. Rec. IEEE Military Commun. Conf., Boston, MA, Oct. 1982, vol. 3, pp. 10.4.4-10.4.5.
[29] J. J. Garcia-Luna-Aceves, "A minimum-hop routing algorithm based on distributed information," Comput. Networks ISDN Syst., vol. 16, pp. 367-382, May 1989.

Pierre A. Humblet received the B.S.E.E. degree from the University of Louvain (Belgium), and the M.S.E.E. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, all in electrical engineering.
After graduating in 1978 he has remained at M.I.T., where he is now an associate professor of electrical engineering. His teaching and research interests are in the area of communication systems, particularly wide-band optical networks and distributed algorithms. He is a consultant with a number of companies, most recently IBM and Codex Corporation.