Pierre A. Humblet
I. INTRODUCTION
Authorized licensed use limited to: IEEE Xplore. Downloaded on October 7, 2008 at 9:48 from IEEE Xplore. Restrictions apply.
996 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 39, NO. 6, JUNE 1991
[11], and [24]. Another approach [9] reduces the likelihood of looping but does not always prevent it. Less efficient solutions, such as broadcasting the sequence of nodes in the shortest paths, have also been proposed [25].

In this paper, we offer another algorithm and we analyze its behavior. It is introduced in the next section. This will be followed by sections on the proof of correctness (Section III), efficient implementations (Section IV), complexity measures (Section V), and finally comparisons with other methods.

II. DESCRIPTION OF THE ALGORITHM

It has often been noted that in the previous algorithm the RTd(I, J)'s for different J's behave independently of each other and that one can focus on a single destination. To the contrary, we remark here that much can be gained by considering the interactions between different destinations.

    ...
        (LEN(I,M) = ∞ if (I,M) is Down)
        within a finite time COMPUTE();
    ...
        The message is composed of records (J,D,K), where J is a node,
        D is a distance and K is a node.
        for each record (J,D,K) in the message {
            NTd(I,J,M) = D;
            NTn(I,J,M) = K;
        }
        within a finite time COMPUTE();

    PART 2:
        for all nodes J UNSEEN(J) = TRUE;
        UNSEEN(I) = FALSE;
        MESSAGE = NIL;
        For each adjacent node P list the nodes J in order of nondecreasing NTd(I,J,P).
        Let TOP(P) denote the element currently at the top of the list for node P.
        Forever do: {
            For each adjacent node P, remove nodes from the top of the list until
                (UNSEEN(TOP(P)) = TRUE) and
                ((NTn(I,TOP(P),P) = NIL) or (RTa(I,NTn(I,TOP(P),P)) = P));
            If all lists are empty {
                if (MESSAGE ≠ NIL) then send MESSAGE on all Up adjacent links;
                return;
            }
            P* = argmin_P (NTd(I,TOP(P),P) + LEN(I,P));
            ...
Fig. 2. Pseudocode for the algorithm.

Assume we know the neighbor K next to the destination on a shortest path from a node I to a destination J. The following statements must hold if we have valid paths to K and J and 0 ≤ LEN() ≤ MAXLEN:
- RTd(I, J) ≥ RTd(I, K);
- RTd(I, J) ≤ RTd(I, K) + MAXLEN;
- a neighbor of I that lies on a shortest path from I to K must also lie on a shortest path from I to J.

This suggests that keeping track of the node next to a destination on a shortest path is useful. Note that this is different from keeping track of the node adjacent to the origin, which only prevents two-link loops [4], [23], [26]. The previous relations could be used to quickly weed out unreachable nodes in Bellman-Ford type algorithms and prevent routing table looping (for example, in Fig. 1, node B should not accept the path through C as its length violates the first inequality). However, we will not use these inequalities directly in the rest of the paper. Rather, we note that keeping information at a node I about the nodes next to destinations is equivalent to keeping track of an entire shortest path tree rooted at I. This is the view that we will adopt and exploit to develop our shortest path algorithm. This idea has been used before in [12], but that work did not realize the full potential of the method. Riddle [21] has proposed an algorithm that is virtually identical to the one proposed here, but he did not provide an analysis. We comment later on these earlier works. In Sections II-A and II-B, we introduce the data structures and describe the algorithm.

A. Data Structures

Our goal here is to keep track at each node of an estimated shortest path tree, and of the "replica" of such trees at the adjacent nodes. Although this could be done quite abstractly, we prefer extending the simple and explicit notation already used in Section I.¹
- To keep track of a shortest path tree at a node I we use three kinds of routing table entries, RTd, RTn, and RTa. RTd(I, J) is as before. For I ≠ J, RTn(I, J) denotes the node next to J on the shortest path from I to J, while RTa(I, J) indicates the node adjacent to I on a shortest path to J. RTn(I, I) and RTa(I, I) are set to the special value NIL.
- The neighbor table now contains two kinds of entries, NTd and NTn. NTd(I, J, M) is as before while NTn(I, J, M) is meant to be the node next to node J on a shortest path from M to J.
- Routing update messages sent by a node I are made of records, each record being a triple of the form (J, RTd(I, J), RTn(I, J)).

¹Our notation assumes that there is no more than one link between two nodes, but it can be easily modified to handle the general case. Also when the shortest paths are not unique they form a directed acyclic graph (DAG) instead of just a tree. Our notation only keeps track of a tree in the DAG, but it can be extended to keep track of the entire DAG. Efforts in that direction are sometimes necessary when one desires to spread traffic over all shortest paths.

B. Algorithm

The details of the implementation appear in Fig. 2 while Fig. 3 provides a graphic example. The algorithm is composed of two major parts: in the first part, a node observes local topology changes or receives update messages from neighbors; these updates are saved in NT [with respect to Fig. 3, the NT's for node B are the trees rooted at A, C, and D in Fig. 3(b)]. NT(I, ·, M) is entirely rebuilt when link (I, M) comes up.
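As an illustration of the tables of Section II-A and of the message handling in PART 1 of Fig. 2, the following Python sketch stores RT and NT at a single node I as dictionaries. The class name `NodeTables`, its methods, the use of `None` for NIL and `math.inf` for the length of a Down link, and the choice to empty NT(I, ·, M) when link (I, M) comes up (so that it is rebuilt from the records M then sends) are assumptions of this sketch rather than details taken from the paper. COMPUTE() itself is not included here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

NIL = None        # the special value NIL of Section II-A (assumed encoding)
INF = math.inf    # LEN(I, M) while link (I, M) is Down

Record = Tuple[str, float, Optional[str]]   # (J, D, K) as carried in update messages


@dataclass
class NodeTables:
    """Hypothetical container for the RT and NT entries kept at node I."""
    me: str
    # rt[J] = (RTd(I, J), RTn(I, J), RTa(I, J)); for J = I both RTn and RTa are NIL.
    rt: Dict[str, Tuple[float, Optional[str], Optional[str]]] = field(default_factory=dict)
    # nt[M][J] = (NTd(I, J, M), NTn(I, J, M)): I's replica of neighbor M's tree.
    nt: Dict[str, Dict[str, Tuple[float, Optional[str]]]] = field(default_factory=dict)
    # length[M] = LEN(I, M) for each adjacent node M.
    length: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.rt[self.me] = (0.0, NIL, NIL)

    def link_up(self, m: str, new_len: float) -> None:
        """Link (I, M) comes up: NT(I, ., M) will be rebuilt from M's records."""
        self.length[m] = new_len
        self.nt[m] = {}

    def link_down(self, m: str) -> None:
        """Fig. 2 treats a Down link as having infinite length."""
        self.length[m] = INF

    def receive(self, m: str, records: List[Record]) -> None:
        """PART 1: NTd(I, J, M) = D and NTn(I, J, M) = K for each record (J, D, K)."""
        table = self.nt.setdefault(m, {})
        for j, d, k in records:
            table[j] = (d, k)
        # A real node would now schedule COMPUTE() "within a finite time".

    def message(self) -> List[Record]:
        """Whole table as records (J, RTd(I, J), RTn(I, J)), nondecreasing in distance."""
        return sorted(((j, d, n) for j, (d, n, _a) in self.rt.items()),
                      key=lambda rec: rec[1])
```

Here `message()` returns the whole routing table, as a node might send when a link comes up; sending only the records that changed is left out of the sketch.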
HUMBLET: ADAPTIVE DISTRIBUTED SHORTEST PATH ALGORITHM 997
Fig. 3. (a) Network topology. (b) Individual node routing trees. (c) Building the routing tree at node B. (d) Reconfiguration following a topology change.
In the second major part (COMPUTE) each node I builds from all its NT(I, ·, ·) a large tree with weighted edges (see Fig. 3(c) for node I = B) where a node identity may appear many times: node I puts itself as the root and "grows" on each adjacent link the shortest path trees communicated by its neighbors. This large tree is then scanned in a "breadth first" fashion (with respect to the cumulative edge weights from the root) to obtain a subtree where each node appears at most once. That subtree is adopted as the new "local" shortest path tree RT and changes (if any) with respect to the previous version are communicated to the adjacent nodes. Fig. 3(d) illustrates how node B behaves if an adjacent link fails.

Because it uses a breadth first search with respect to path length, our algorithm can be seen as an adaptive distributed version of Dijkstra's algorithm [7]. Another distributed but static implementation of Dijkstra's method has been given in [8]. These approaches should not be confused with those relying on an explicit topology broadcast followed by local computation [22]. Dijkstra's algorithm and this distributed version can be extended in a straightforward way to handle negative link lengths [20], although they then have exponential complexity [14].
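The breadth first scan of the large tree (PART 2 of Fig. 2) can be sketched in Python as below. This is our reading of the surviving pseudocode, not a transcription of it: the function name `compute`, the dictionary layout of `length` and `nt`, the use of `None` for NIL, and the convention that a node absent from the returned table has RTd(I, J) = ∞ are assumptions. Setting RTn(I, J) = I when NTn(I, J, P*) is NIL follows from the definition of RTn in Section II-A. The sketch does not build MESSAGE, and ties in selecting P* go to the first neighbor scanned.

```python
import math
from typing import Dict, Optional, Tuple

NIL = None  # the special value NIL (assumed encoding)


def compute(me: str,
            length: Dict[str, float],
            nt: Dict[str, Dict[str, Tuple[float, Optional[str]]]]
            ) -> Dict[str, Tuple[float, Optional[str], Optional[str]]]:
    """Sketch of COMPUTE() at node `me`.

    length[P] = LEN(I, P); nt[P][J] = (NTd(I, J, P), NTn(I, J, P)).
    Returns rt[J] = (RTd(I, J), RTn(I, J), RTa(I, J)).
    """
    rt = {me: (0.0, NIL, NIL)}             # root of the routing tree; UNSEEN(I) = FALSE
    # One list per Up neighbor P, in order of nondecreasing NTd(I, J, P).
    lists = {p: sorted(nt.get(p, {}).items(), key=lambda e: e[1][0])
             for p, ln in length.items() if ln < math.inf}
    top = {p: 0 for p in lists}            # index of TOP(P) in each list
    while True:
        best = None                         # (distance, P, J, K) of the best candidate
        for p, entries in lists.items():
            i = top[p]
            # Remove entries until TOP(P) is unseen and is either adjacent to I in the
            # large tree (NTn = NIL) or hangs below a node K already reached via P.
            while i < len(entries):
                j, (d, k) = entries[i]
                if j not in rt and (k is NIL or (k in rt and rt[k][2] == p)):
                    break
                i += 1
            top[p] = i
            if i < len(entries):
                j, (d, k) = entries[i]
                cand = d + length[p]
                if best is None or cand < best[0]:   # ties: first neighbor scanned wins
                    best = (cand, p, j, k)
        if best is None:                    # all lists are empty
            return rt
        dist, p_star, j, k = best
        # When NTn(I, J, P*) is NIL, J is P* itself and the node next to J is I.
        rt[j] = (dist, me if k is NIL else k, p_star)
```

For example, with B adjacent to A and C (both of length 1), calling `compute("B", {"A": 1, "C": 1}, nt)` on the replicas of A's and C's trees yields a tree rooted at B. If link (B, A) is then Down, C's replica lists A with predecessor B, that is, C reaches A through B itself; the RTa check rejects it, and D (which hangs below A) as well, instead of accepting a stale finite distance.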
More precisely, COMPUTE() at node I builds RT starting with I, considering the NT(I, J, P) entries in order of nondecreasing distances from I, and including a node J in RT only if it has not been included yet (UNSEEN(J) is TRUE) and if it is next to I in the large tree (NTn(I, J, P) is NIL) or if its neighbor K toward I in the large tree (K = NTn(I, J, P)) is already in RT (RTa(I, K) = P). Thus, the RT structure forms a directed tree (this would hold even if the NT's did not form trees) that is composed of a root node out of which subtrees from the NT's grow. We will call that tree the routing tree.

The description of Fig. 2 does not specify when COMPUTE() is executed, requiring only that it is executed within a finite time after a triggering event. Concrete possibilities will be suggested in Section V.

III. PROOF OF CORRECTNESS

For the algorithm to work, some assumptions on the behavior of the links and nodes must hold. They are similar but not identical to those used by many other authors.

1) There is a link protocol that maintains up and down states for the links and handles message transmissions. It has the following properties.
- A time interval during which a link is up at a node is called a link up period (LUP) at that node. A link up period at one end of a link corresponds to at most one LUP at the other end. Both ends of the link can only remain in noncorresponding LUP's for finite time intervals.
- Messages can only be sent during a LUP at the source and received during a LUP at the destination.
messages about J will be exchanged after T(L(I, J)) (i.e., the algorithm terminates).

Proof: The first part of the proof follows directly from Theorem 1 and the definition of H(I, d). The second part is proven by induction on the distance from I to J. It is true at time T(0) at node J. Assuming it is true at time T(L(K, J)) for all nodes K that have D(K, J) < D(I, J), we will show it will hold at node I at time T(L(I, J)).

Theorem 1 ensures that by time T(W(I, D(I, J))), all paths of length not exceeding D(I, J) include only links with final lengths, thus the routing tree cannot contain a path to J of length less than D(I, J). By time

$$\max_{K \in S(I,J)} T\bigl(L(K,J) + 1\bigr)$$

a final shortest path from all neighbors of I on a final shortest path from I to J will be reflected in the NT at node I and COMPUTE() ensures that the routing tree of node I will include a path to J. (The "max" is needed as COMPUTE() does not specify how ties are broken in selecting P*.) Similar induction shows that under the hypothesis in the third part the path to J in the routing tree at I will not change after time T(L(I, J)) and thus no more messages will be exchanged.

IV. EFFICIENT IMPLEMENTATIONS AND VARIATIONS

The facts that COMPUTE() involves sorting nodes and that messages include identities of nodes next to destinations may seem prohibitive. We indicate here how simple data structures can alleviate much of the difficulty. Below, N denotes the number of nodes in the network and L(I) the number of links adjacent to node I.

To avoid the sorting operation, nodes in a neighbor table can be organized as a doubly linked list, in order of increasing NTd. Notice that COMPUTE() includes records (J, D, K) in an Update message in order of nondecreasing D so that following the reception of such a message the linked list for NT can be updated with an amount of processing not worse than linear in N. Running COMPUTE() at a node requires an amount of processing not worse than linear in N·L(I), as there are no more than N·L(I) entries in the NT linked lists and P* must be found at most N times.

If the number of nodes is not small compared to the diameter of the network, another efficient alternative is to use "buckets" rather than linked lists to implement a neighbor table. A bucket just contains all nodes at a given distance via a neighbor. An Update message can then be processed in a time proportional to the number of records it contains.

Reference [5] has recently proposed another very simple method to build the routing tree. Select an arbitrary node in the large tree and check if it and all the intermediate nodes toward the root are at minimum distance (among all the instances of a node) from the root. If they are, then the entire path becomes part of the routing tree. Repeat the process, selecting a node that has not been considered before, until no new node can be added to the routing tree.

Message sizes can be reduced by noting that if there is a record (J, D, K) in a message, node K must appear before J in the updated NT list. Thus the identity of K can be encoded as a number, e.g., specifying the position of K in the list. This can make significant savings in networks where node identities are long.

A more efficient (and complex) implementation is to keep a direct representation of trees for RT and NT. When a new RT is computed, only the difference between the new and old tree needs to be communicated, e.g., as a set of subtrees. Recall that a subtree of N nodes can be transmitted as N node identities plus 2N bits. This can be done by walking along the subtree in depth first fashion, transmitting a node identity the first time it is met, transmitting a 0 bit each time a link is traversed away from the root, and a 1 bit when a link is traversed toward the root. If this is done, updating NT only takes an amount of time proportional to the number of nodes in an update message. In COMPUTE() one needs to consider a node in NT for inclusion in RT only after its parent has been included, but it is not clear to us if this observation can be used to effectively reduce the amount of processing required by COMPUTE().

Other savings can be realized by using network specific information. For example, if the link lengths change by relatively small amounts it is likely that the structure of the routing tree will often remain unchanged although some lengths may change. It is easy to design coding schemes taking advantage of this feature.

Various optimizations are also possible. For example, a node I need not send updates about a node J to an adjacent node K while J is in the subtree below K in the routing tree at node I, although K must be notified when J moves into that subtree from another one. Also COMPUTE() needs to be run at node I following the reception of a record about node J from node M only if NTd(I, J, M) + LEN(I, M) < RTd(I, J) or if RTa(I, J) = M; a similar rule holds in the case of changes in LEN().

We now discuss the earlier works of Hagouel and Riddle. In the first of these [12] each node also builds a routing tree but it does not use NT structures to keep track of the routing trees of the adjacent nodes. Rather, it queries its neighbors whenever they might have a shorter path to a destination. Reference [12] does not point out that its approach prevents "counting," and it argues correctness by relying on an equivalence (in some sense) between the Ford-Bellman method and the new algorithm.

The method proposed in [21] is close to ours, with the exception that the trees transmitted in messages are not routing trees but so-called "exclusionary trees" which will be defined shortly. A node I forms a big tree from the received exclusionary trees, and obtains its local routing tree by scanning the big tree in breadth first fashion. Node I grows the exclusionary tree to be sent to its neighbor J by again scanning its big tree in breadth first fashion but omitting subtrees hanging from J. Thus, the exclusionary tree transmitted from I to J does not contain any path going through J and each node effectively knows the loop free shortest path (if any) to each other node via each adjacent link. If an adjacent link fails, the new shortest path is immediately available. However, this does not guarantee that the new shortest path is immediately available at nodes that are not adjacent to the failed link. This
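The bucket alternative for a neighbor table, described earlier in Section IV, can be sketched as follows for the replica NT(I, ·, M) of one neighbor M. We assume, as a simplification of our own, that distances are integers between 0 and a known bound `max_dist`; the class `BucketNeighborTable` and its methods are illustrative names. Each record (J, D, K) moves J between buckets in constant time, and scanning the buckets in increasing distance yields the order COMPUTE() needs without sorting.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class BucketNeighborTable:
    """NT(I, ., M) for one neighbor M, kept in buckets indexed by NTd(I, J, M).

    Assumes integer distances in [0, max_dist] (an assumption of this sketch).
    """

    def __init__(self, max_dist: int) -> None:
        # buckets[d] holds the nodes J with NTd(I, J, M) == d; a dict serves as an
        # insertion-ordered set, so removal is O(1) and iteration order is stable.
        self.buckets: List[Dict[str, None]] = [{} for _ in range(max_dist + 1)]
        self.entry: Dict[str, Tuple[int, Optional[str]]] = {}

    def update(self, j: str, d: int, k: Optional[str]) -> None:
        """Apply one record (J, D, K) from an Update message in constant time."""
        old = self.entry.get(j)
        if old is not None:
            del self.buckets[old[0]][j]
        self.buckets[d][j] = None
        self.entry[j] = (d, k)

    def in_order(self) -> Iterator[Tuple[str, int, Optional[str]]]:
        """Yield (J, NTd, NTn) in nondecreasing NTd without any sorting."""
        for d, bucket in enumerate(self.buckets):
            for j in bucket:
                yield j, d, self.entry[j][1]
```

Processing a message therefore costs one `update` call per record, which matches the claim that an Update message is handled in time proportional to the number of records it contains.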
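The depth first subtree encoding of Section IV (one node identity per node, a 0 bit for each link traversed away from the root and a 1 bit for each link traversed back toward it) can be illustrated with a small encoder and decoder. The tree representation (a dictionary from each node to its list of children) and the names `encode_subtree` and `decode_subtree` are assumptions of this sketch. A subtree of N nodes yields exactly N identities and 2(N − 1) bits, within the 2N bound stated in the text.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]   # node -> children, away from the root (hypothetical layout)


def encode_subtree(tree: Tree, root: str) -> Tuple[List[str], List[int]]:
    """Depth first walk: a node identity the first time a node is met,
    bit 0 for a link traversed away from the root, bit 1 for one traversed back."""
    ids: List[str] = [root]
    bits: List[int] = []
    stack = [iter(tree.get(root, []))]     # explicit stack instead of recursion
    while stack:
        child = next(stack[-1], None)
        if child is None:                  # no more children here: climb back
            stack.pop()
            if stack:
                bits.append(1)
        else:                              # descend to a newly met node
            bits.append(0)
            ids.append(child)
            stack.append(iter(tree.get(child, [])))
    return ids, bits


def decode_subtree(ids: List[str], bits: List[int]) -> Tuple[str, Tree]:
    """Rebuild the subtree from the identities and the 0/1 walk."""
    root, rest = ids[0], iter(ids[1:])
    tree: Tree = {}
    path = [root]                          # the current node is path[-1]
    for b in bits:
        if b == 0:
            child = next(rest)
            tree.setdefault(path[-1], []).append(child)
            path.append(child)
        else:
            path.pop()
    return root, tree
```

For the subtree rooted at B with children A and C, where A has child D, the encoder produces the identities B, A, D, C and the bits 0, 0, 1, 1, 0, 1; the decoder recovers the same parent/child structure.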
V. TIME AND MESSAGE COMPLEXITIES

We define the time complexity of the algorithm as the largest time that can elapse between the moment T when the last topology change occurs and the moment all nodes have final shortest paths and distances to all other nodes. The unit of time is an upper bound on the length of time between the moment a message is generated and the moment it is processed [including running COMPUTE()]. The communication complexity is defined as the maximum number of node identities exchanged after time T.

We start this section by looking at the time complexity in the benchmark case where the length of a single link (P1, P0) changes at a time T after the algorithm had converged. For simplicity, we will limit the discussion to the case where the shortest paths are unique. We will focus on the shortest path from a node I to a node J. There are four possible situations involving that shortest path.
a) (P1, P0) is not on the shortest path and its length does not change enough to affect the shortest path. This uninteresting case is not considered below.
b) (P1, P0) is not on the shortest path and its length decreases enough that it becomes part of the shortest path.
c) (P1, P0) is on the shortest path and its length does not change enough to modify the shortest path (although the length of the shortest path changes).
d) (P1, P0) is on the shortest path between I and J and its length increases enough that the shortest path changes.

In cases b) and c), node I will be aware of the change after a delay not exceeding the number of links on the shortest path [the new shortest path in case b)] between P0 and node I; this can be seen by induction on the number of links between I and P0. Case d) is slightly more complicated and it is best to refer to Fig. 5. Solid lines in Fig. 5(a) represent the shortest path tree to J before the increase in the length of (P1, P0) while in Fig. 5(b) they represent the tree after the increase. Note that the new shortest path from I to J can be decomposed in two parts: the part closer to I is made up of nodes (I = Kn, ..., K2, K1) whose shortest path to J has also changed. The part away from I (between K0 and J) is unchanged. Each of the nodes I = Kn, ..., K2, K1 will learn of the increase via the old shortest path. They will adopt the new shortest path after they have learned of the increase and after their predecessor on the new path also has adopted the new path. Denote by X(i), i = 1, 2, ..., n, the number of links on the old shortest path between nodes Ki and P0, and by Y(i) the delay until node Ki adopts the new shortest path. We have the inductive relations

$$Y(1) = X(1), \qquad Y(i) = \max\bigl(X(i),\, Y(i-1) + 1\bigr), \quad 1 < i \le n.$$

Fig. 5. (a) Initial routing tree to node J. (b) Routing tree to node J after a length increase for link (P1, P0).

Y(i) is less than twice the number M of nodes that had link (P1, P0) on their shortest path to node J, as both X(j) and n do not exceed M (the worst case occurs when the network topology is a loop with J = P0 = K0, with I = Kn = P1 adjacent to J, and with the length of link (K0, J) so large that initially all shortest paths to J are via I).

One also sees that "routing table looping" cannot occur following a single link failure because a tree structure is maintained at all times.

We now turn to the general case where an arbitrary number of changes can take place. To simplify the formulas we assume that the last change occurs at time T = 0. Our time complexity results are summarized in the following theorem. Below, MaxHop(I, J) and MinHop(I, J) denote, respectively, the maximum and minimum number of links in shortest paths between nodes I and J, while R denotes the ratio of the largest to the smallest values of link lengths assigned to an Up link.

Theorem 3: If T = 0 and messages are processed within one time unit after they are generated, then
- If I is not connected to J, RTd(I, J) = ∞ by time H(I, ∞) + 1;
- If I is connected to J, RT at I includes a shortest path to J by time:
  1.1 L(I, J),
  1.2 min(H(I, ∞) + MaxHop(I, J), R · MaxHop(I, J)),
  1.3 min(H(I, ∞) + MinHop(I, J), R · MinHop(I, J)) if COMPUTE() is modified to break ties in favor of the path with fewer links in selecting P*,⁴
  2.1 MaxHop(I, J) if just before time T all path lengths stored in NT's and contained in messages in transit are not less than the final lengths (e.g., if all nodes were isolated just before time T),
  2.2 MinHop(I, J) under the assumptions in 1.3 and 2.1.

⁴A chain between nodes I and J is a sequence of adjacent directed links starting at node I and ending at node J. It may contain cycles. Its length is the sum of the lengths of its links.

Proof: The first statement and that in 1.1) follow directly from Theorem 2. Note that H(I, ∞) is the maximum number of links in loop free paths starting at node I and that H(I, ∞) + 1 never exceeds N.
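The inductive relations for case d) of the single-change benchmark are easy to evaluate numerically. In the sketch below, the function name `adoption_delays` is hypothetical, and the input is our reading of the loop worst case described in this section: n = 5 nodes K1, ..., K5 whose old shortest paths to P0 have X(i) = n − i + 1 links. The resulting delay Y(n) = 2n − 1 stays below 2M with M = n.

```python
from typing import List


def adoption_delays(x: List[int]) -> List[int]:
    """Return [Y(1), ..., Y(n)] from [X(1), ..., X(n)] using
    Y(1) = X(1) and Y(i) = max(X(i), Y(i-1) + 1) for 1 < i <= n."""
    y: List[int] = []
    for xi in x:
        # K(i) adopts the new path once it has heard of the increase (X(i))
        # and one time unit after its predecessor K(i-1) has adopted it.
        y.append(xi if not y else max(xi, y[-1] + 1))
    return y


n = 5
x = [n - i + 1 for i in range(1, n + 1)]   # loop example: X = [5, 4, 3, 2, 1]
y = adoption_delays(x)
print(y)  # [5, 6, 7, 8, 9]
```

Each node in this example must wait for its predecessor, so the delays grow by one per hop even though the nodes closer to P0 learned of the increase earlier.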
[13] and [18]. Reference [29] suggests delaying data messages to remedy this problem.

ACKNOWLEDGMENT

The author would like to thank S. DiCecco for reawakening our interest in the shortest path problem and for providing constructive comments, and A. Segall and T. C. Lee for bringing to our attention the works of Hagouel and Riddle.

REFERENCES

... "...trol in the Datapac network," IEEE Trans. Commun., vol. COM-29, pp. 386-391, Apr. 1981.
[27] W. D. Tajibnapis, "A correctness proof of a topology information maintenance protocol for a distributed computer network," Commun. ACM, vol. 20, pp. 477-485, 1977.
[28] J. Westcott and J. Jubin, "A distributed routing design for a broadcast environment," Conf. Rec. IEEE Military Commun. Conf., Boston, MA, Oct. 1982, vol. 3, pp. 10.4.4-10.4.5.
J. J. Garcia-Luna-Aceves, "A minimum-hop routing algorithm based on distributed information," Comput. Networks ISDN Syst., vol. 16, pp. 367-382, May 1989.

Pierre A. Humblet received the B.S.E.E. degree from the University of Louvain (Belgium), and the M.S.E.E. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, all in electrical engineering.
After graduating in 1978 he remained at M.I.T., where he is now an associate professor of electrical engineering. His teaching and research interests are in the area of communication systems, particularly wide-band optical networks and distributed algorithms. He is a consultant with a number of companies, most recently IBM and Codex Corporation.