Fault-Tolerant Broadcasting and Gossiping in Communication Networks

Andrzej Pelc*
Département d'Informatique, Université du Québec à Hull, Hull, Québec J8X 3X7, Canada

Broadcasting and gossiping are fundamental tasks in network communication. In broadcasting, or one-to-all communication, information originally held in one node of the network (called the source) must be transmitted to all other nodes. In gossiping, or all-to-all communication, every node holds a message which has to be transmitted to all other nodes. As communication networks grow in size, they become increasingly vulnerable to component failures. Thus, capabilities for fault-tolerant broadcasting and gossiping gain importance. The present paper is a survey of the fast-growing area of research investigating these capabilities. We focus on two most important efficiency measures of broadcasting and gossiping algorithms: running time and number of elementary transmissions required by the communication process. We emphasize the unifying thread in most results from the research in fault-tolerant communication: the trade-offs between efficiency of communication schemes and their fault-tolerance. © 1996 John Wiley & Sons, Inc.

1. INTRODUCTION
Broadcasting and gossiping are fundamental tasks in network communication. They both aim at disseminating information among nodes. In broadcasting, also called one-to-all communication, information originally held in one node of the network (called the source) has to be transmitted to all other nodes. In gossiping, or all-to-all communication, every node holds a message (value) which must be transmitted to all other nodes. These types of network communication often occur in distributed computing, e.g., in global processor synchronization and updating distributed databases. Moreover, such communication tasks are implicit in many parallel computation problems, where data and results are distributed among processors. This happens, e.g., in matrix multiplication, parallel solving of linear systems, parallel computing of the discrete Fourier transform, or parallel sorting, cf. [8, 31, 53]. Two most important measures of performance of broadcasting and gossiping algorithms are the number of elementary transmissions (calls) and the number of
* E-mail: pelc@uqah.uquebec.ca
NETWORKS, Vol. 28 (1996) 143-156 © 1996 John Wiley & Sons, Inc.

rounds (time) required. Another concern in the design of communication schemes is the demand that they impose on the underlying network. Since dense networks are difficult and costly to implement, it is important to consider efficient broadcasting and gossiping algorithms that work for networks as sparse as possible. Excellent accounts of the literature on broadcasting and gossiping focusing on the above-mentioned problems can be found in surveys [33, 50, 51]. As communication networks grow in size, they become increasingly vulnerable to component failures. Some links and/or nodes of the network may fail. It becomes important to design communication algorithms in such a way that the desired communication task be accomplished efficiently in spite of these faults, usually without knowing their location ahead of time. As such, much attention has been devoted recently to fault-tolerant broadcasting and gossiping. The present paper, which is an extended version of [62], surveys this rapidly growing area of research. It has been necessary to make choices in the large body of literature concerning fault-tolerant communication, leaving out vast and important subdomains related to the main focus of this paper. We do not cover the issue of network reliability, as it is not immediately


concerned with the efficiency of fault-tolerant communication but rather with necessary conditions for its feasibility. This material has been covered in the survey [5], which also discussed the important problem of multiprocessor system diagnosis. Another issue closely related to fault-tolerant broadcasting and gossiping is the Byzantine Agreement. We do not consider problems of the Byzantine Agreement in this paper, although we discuss some aspects of broadcasting and gossiping in the presence of Byzantine faults, pointing out differences of our approach. Likewise, we focus attention only on the two above-mentioned communication tasks, leaving out, e.g., the important and largely studied issue of fault-tolerant point-to-point routing, where problems and techniques are different from those encountered for broadcasting and gossiping.

We focus on two most important and widely studied efficiency measures of broadcasting and gossiping algorithms: running time and number of elementary transmissions used in the communication process. We also discuss the issue of sparsity of networks supporting efficient fault-tolerant communication schemes. Schemes that work for sparse networks are more widely applicable than are those requiring, e.g., a completely connected network. In this survey, we emphasize the unifying thread found in most results from research in fault-tolerant communication: the trade-offs between efficiency of communication schemes and their fault-tolerance.

The paper is organized as follows: In Section 2, we discuss a variety of communication and fault models considered in the literature. One dividing line is between bounded and probabilistic fault models, as both the goals and algorithm design techniques differ in each case. Accordingly, in Section 3, we survey results in the bounded fault model, while in Section 4, random fault distribution is considered. Finally, Section 5 is devoted to the discussion of possible directions of future research.

2. MODELS AND TERMINOLOGY
The domain of fault-tolerant broadcasting and gossiping is a rather broad area of research. Different authors have adopted varying approaches, both to network communication and to fault modeling, yielding a large number of communication models and fault-tolerance solutions. Further, many combinations of assumptions concerning the communication process and the possible faults can be found. The design of algorithms and results concerning their efficiency and robustness heavily depend on the underlying model; hence, it is very important to specify the assumptions in a detailed and rigorous way. The only assumption common to all papers reviewed in the present survey is the consideration of point-to-point communication networks modeled as undirected graphs, whose vertices are sites of the network (e.g., processors, computers) and whose edges are communication links used to transmit messages from site to site. A number of other features must be specified in order to describe the model completely. Below, we point out the choices to be made and fix the appropriate terminology.

2.1. Communication Mode
The communication primitive is a call taking place between two adjacent sites (or nodes) of the network and usually lasting a unit of time. The communication mode specifies which calls can be executed simultaneously during one unit of time and what messages can be transmitted in one call. In the shouting mode, also called n-port or link-bound, a node can communicate with all its neighbors during a single unit of time, while in the whispering mode, also called 1-port or processor-bound, a node can communicate with at most one neighbor. The whispering mode is suitable to model wire-based communication, such as occurs in traditional telephone networks. The shouting mode models, e.g., radio communication, in which a signal can be simultaneously transmitted to all receivers within the range of the broadcasting station. Further, the communication mode can be full-duplex, when during one call messages between communicating nodes can travel in both directions through the (bidirectional) link joining them, or half-duplex, when every node can only send or only receive information in a given call. Thus, the full-duplex mode is appropriate for telephone conversations, while the half-duplex mode is used in sending telegrams or letters. Several combinations and variations of these modes have been considered, e.g., simultaneous sending to all neighbors but sequential receiving, or concurrent sending to one neighbor and receiving from one (possibly different) neighbor.

2.2. Size of Packets
Another important characteristic of the communication process, significantly influencing the performance of algorithms, is the size of message packets. A packet is the amount of information that can be sent by a node to its neighbor in one call. The size of packets can vary from unit, when each packet can contain the value of only one node, to unbounded, when packets are large enough to contain all values of nodes in the network. This parameter may dramatically change the process of gossiping, as large packets allow single transmission of already accumulated information. In case of broadcasting, packet size often does not change the algorithm design, as only one message is to be disseminated throughout the network. However, even in this case, it may play a significant role, especially if large control messages concerning already detected faults need to be circulated during the algorithm execution. Large bandwidth availability, such as in optical communication networks, makes the assumption of large, essentially unbounded packets increasingly plausible.
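To make the difference between the whispering and shouting modes concrete, the following minimal sketch counts the rounds needed to broadcast in a fault-free complete network. It is an illustration only, not an algorithm from the surveyed papers; the function names are hypothetical, and the sketch assumes that in the whispering mode every informed node calls one still-uninformed node per round.

```python
def whispering_rounds(n: int) -> int:
    """Rounds to broadcast in a fault-free complete network of n nodes when
    every informed node calls at most one uninformed node per round."""
    informed, rounds = 1, 0
    while informed < n:
        # Each informed node can inform one new node, so the count at most doubles.
        informed = min(n, 2 * informed)
        rounds += 1
    return rounds


def shouting_rounds(n: int) -> int:
    """Same task when the source may call all of its neighbors in one round."""
    return 0 if n <= 1 else 1


if __name__ == "__main__":
    for n in (2, 8, 9, 1000):
        print(n, whispering_rounds(n), shouting_rounds(n))
```

Because the set of informed nodes can at most double per round in the whispering mode, the count produced by the sketch equals the base-2 logarithm of n rounded up, which is the fault-free baseline against which fault-tolerant broadcasting times are usually compared.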

2.3. Fault Classification
Several assumptions are made concerning aspects of faults that can occur in the communication process, which will significantly affect the efficiency and robustness of fault-tolerant communication. The first concern is to specify which components are fault-prone: only links, only nodes, or both links and nodes. Furthermore, the nature of faults must be described. The two most commonly studied types are crash and Byzantine faults. If the fault is a crash, the faulty node does not send or receive messages or the faulty link does not transmit messages. Such faults are relatively benign. Faulty components do not alter transmitted messages; they fail to transmit messages at all. Although some information may be lost, at least the information that is received can be trusted. Byzantine faults, on the other hand, are a worst-case fault scenario: Faulty components can behave arbitrarily (even maliciously) as transmitters, by stopping, rerouting, or altering transmitted messages in a way most detrimental to the communication process. They may be caused by a hostile agent whose aim is to destroy the communication process. Byzantine failures that exhibit all these kinds of damaging behavior rarely occur in practice. However, the concept of Byzantine faults plays an important role in our study, representing a worst-case assumption. Communication algorithms that work correctly in the presence of Byzantine faults can be used safely under any fault scenario. Knowledge as to which types of faults are likely to occur is important in communication algorithm design.

Another important characteristic of faults, which must be specified, is their duration. Faults can be either permanent (i.e., the status faulty/fault-free of a component does not change during the algorithm execution) or transient (i.e., the status of a component may change in each unit of time). Permanent faults usually correspond to hardware component failures, while transient link faults correspond to transmission failures due, e.g., to temporary magnetic interference. Repeating attempts to transmit the same message along the same link is useless in case of permanent faults but may be essential if faults are of a transient nature.

2.4. Fault Distribution
One of the crucial assumptions made about faults concerns their distribution. Clearly, some limitations on the number of possibly faulty components must be imposed; otherwise, no communication is possible. Two commonly used fault models are the bounded model and the probabilistic model. In the bounded fault model, an upper bound k is imposed on the number of faulty components, and their worst-case location is assumed. In the probabilistic fault model, faults are assumed to occur randomly and independently of each other, with specified probability.

The choice between these two assumptions regarding fault occurrence influences the definition of the goal of broadcasting and gossiping in the presence of faults. In the bounded model, we usually seek what is termed k-tolerant broadcasting and gossiping. In case of broadcasting, the source is assumed fault-free (otherwise, no broadcasting is possible); its message must reach all fault-free nodes provided that no more than k components (links or nodes or both, depending on the particular scenario) are faulty. In the case of gossiping, all fault-free nodes must receive messages from all fault-free nodes, provided that no more than k components are faulty. It should be stressed that no agreement concerning messages from faulty nodes is required among the fault-free nodes; this is where fault-tolerant gossiping differs from Byzantine Agreement in the case of Byzantine node failures.

In the probabilistic model, the communication goal cannot be achieved with certainty, since, with some small probability, all components may fail and preclude any message transmission. Hence, almost safe broadcasting and gossiping is sought. In broadcasting, all fault-free nodes must receive the source message with probability at least 1 - 1/n (the source being fault-free); in gossiping, all fault-free nodes must receive messages from all fault-free nodes with a specified probability. As a result, highly reliable and efficient algorithms are sought. Designing efficient communication algorithms whose reliability increases for networks of larger size is difficult for the following reason: In small networks, it is relatively easy to achieve fault-tolerant communication using massive redundancy, since, due to the small scale of the network, resources used by such brute-force communication procedures (either time or number of messages) will not be excessive. In larger networks, on the other hand, the trade-off between reliability and efficiency becomes an important issue.

2.5. Flexibility of Algorithms
In a fault-free environment, broadcasting and gossiping algorithms have a simple form. All calls to be carried out in each time unit can be specified in advance, before the algorithm execution. In the presence of faults, however, a distinction must be made regarding this point. Broadcasting and gossiping algorithms can be either nonadaptive, also called oblivious, where all calls must be scheduled in advance, or adaptive, where every node can schedule its next call based on information currently available to it. In adaptive algorithms, a node becomes aware of whether a call it attempted was successful (i.e., if the called node and the connecting link are fault-free); hence, different calls can be executed depending on the success or failure of previous ones. Even in this second case, however, nodes can only take advantage of locally available information; we do not assume the existence of a central monitor supervising the overall communication process. Adaptive algorithms require more local control and memory at each node but are usually more efficient for the same fault-tolerance level.
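The notion of k-tolerant broadcasting in the bounded fault model can be illustrated by a brute-force check of a nonadaptive schedule. The sketch below is a hypothetical helper, not a method from the surveyed literature; it assumes permanent crash faults on links only, fault-free nodes, and full-duplex calls, and it represents a schedule as a list of rounds, each round being a list of calls (u, v).

```python
from itertools import combinations


def informed_nodes(schedule, source, faulty_links):
    """Nodes that know the source message after a nonadaptive schedule is run.

    schedule: list of rounds, each a list of full-duplex calls (u, v).
    faulty_links: set of frozenset({u, v}) links that crash and carry nothing.
    """
    informed = {source}
    for calls in schedule:
        newly = set()
        for u, v in calls:
            if frozenset((u, v)) in faulty_links:
                continue  # crash fault: this call transmits nothing
            if u in informed or v in informed:
                newly.update((u, v))
        informed |= newly  # knowledge gained in a round is usable from the next round
    return informed


def is_k_tolerant(schedule, nodes, source, k):
    """True if every node is informed for every set of at most k faulty links."""
    # Only links actually used by the schedule can affect the outcome.
    used_links = sorted({tuple(sorted(c)) for calls in schedule for c in calls})
    for size in range(k + 1):
        for faulty in combinations(used_links, size):
            faulty_set = {frozenset(link) for link in faulty}
            if informed_nodes(schedule, source, faulty_set) != set(nodes):
                return False
    return True


if __name__ == "__main__":
    triangle = [[(0, 1)], [(0, 2)], [(1, 2)]]
    print(is_k_tolerant(triangle, range(3), 0, 1))
    print(is_k_tolerant(triangle, range(3), 0, 2))
```

The check enumerates all fault sets of size at most k, so it is only practical for very small networks; its purpose is to spell out the worst-case quantification over fault locations that the bounded model requires.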

Permanent Link Faults We first assume that nodes are fault free and there are at most k permanently faulty links in the network. at first. Moreover.I)]. In case of broadcasting.the lower bound T 2 [log nl + k .consr. graph theoretic problems. the exact value of C was obtained in both cases. THE BOUNDED FAULT MODEL In this section. we assume that there are at most k faulty components in the network and their distribution is worst- It was conjectured that the exact value of C is close to the upper bound. For the half-duplex mode. for the full-duplex mode. The interest of the research community in different models varies. 3. communication schemes have a simple combinatorial formulation under the above assumptions. as local memory and computation capabilities of processors have increased.146 PELC only take advantage of locally available information. and c= r? n1. For the full-duplex mode. only bounds on C have been established: + 5 C I L(k + i ) ( n . Models in the Literature All the above characteristics of the communication and fault models must be specified if results concerning faulttolerant broadcasting and gossiping are to be meaningful. more precisely.4). there is a good balance of research conducted in each of the above models.1 ). C = r s2 n . ( ifksn-2. where both the full-duplex and halfduplex variations were considered. for n . otherwise. interest in nonadaptive vs. However. the exact value of T was established for k = 1.1 > k 2 0.1. we can see technological advances as well as intrinsic interest in combinatorics influencing the choice of particular communication scenarios. and the complete description of the model can only be inferred from the algorithm description or from arguments concerning efficiency and robustness of the communication schemes. Adaptive algorithms require more local control and memory at each node but are usually more efficient for the same faulttolerance level. In case of gossiping. the network is a complete graph.l ) 1 . 
we do not assume the existence of a central monitor supervising the overall communication process. We denote by T the minimum time of k-tolerant broadcasting or gossiping and by C the minimum number of calls used in such a communication process. As such. We assume the communication mode is whispering. the implementation of adaptive algorithms has become more realistic. broadcasting and gossiping were viewed mostly as combinatorial. if k 5 n . as a result.2. In addition. faults are of crash type. several researchers pointed out that the random fault assumption corresponds much better to failure Occurrence in real networks. C = ( k + 1 )(n . The following results concern the minimum time T of nonadaptive. 2: . The first results on the number of calls in this model were proved in [ 7 1.2. An m-hypercube is the graph on 2"' nodes labeled by binary sequences of length rn in which adjacent nodes have labels differing in exactly one place. This is probably due to the fact that. As the domain of research has evolved. the exact value of C was obtained for the half-duplex mode: C = ( k 2 ) n . Subsection 3. the probabilistic fault model gained prominence. An example is the choice to be made between the bounded and probabilistic models. At present. The first papers in the area of fault-tolerant broadcasting and gossiping favored the bounded fault model and nonadaptive algorithms. was proved. 3. and 2. We denote by log the logarithm with base 2 and by In the natural logarithm.2. adaptive communication algorithms has also varied with time. the popularity of some of the proposed models have changed over time. k-tolerant broadcasting in the full-duplex mode: In [ 5 6 ] .7. This conjecture was later disproved in [ 481 under the transient fault assumption (cf. ifk 2 n .I)]. Many authors make some of the assumptions tacitly. Notation We use the following notation through the rest of the paper: We let n denote the number of nodes in the network. case. that C = [ ( k + 3 ) / 2 ] n .6. 
s L ( k + i)n . Unless otherwise specified. 2.

c ..logn 5 1. the upper bound does not exceed log n k very much and thus it is fairly tight. It was also proved that B . it was shown that Moreover. it can be advantageous to decrease the coefficient at k at the expense of increasing the coefficient at [log nl .2. when m is even. T = [log nl i f k = 1. n z 2 ' . As for the function &(n). for n = 0 or 1 (mod 4 ) . it was proved in [42] to be O(n(k log n)).+ 6. First. For all k 5 Llog PI]. ifk=2. with a number of links differing from BA(n) by at most a constant factor. under the additional assumption that n = 2'". the exact value Bk(n) ( m + k)n/2 was = established.riogjl. the authors construct networks supporting k-tolerant broadcasting in minimum time. close to log n . thereby proving that B A ( n )= (m k)n/2. (2"' . It was proved that + T s m + k + l + The above results concerning the minimum values for the time T of k-tolerant broadcasting and for the k-tolerant broadcast function Bk(n) were extended in [42]. This is the minimum number of links in a network supporting k-tolerant broadcasting in minimum time T. if k = 2. In the same paper. 2 For general values of k . n > 6.1. Moreover.. Further bounds on the function B.m . when n is close to Llog nl. for even n. k-tolerant broadcasting schemes were constructed that perform in the optimal possible time: T=logn+k. depending on k . n > 2 . e. T=rlognl+2. for any constant c and a constant d depending on c but not on n or k. In [ 561. the second upper bound from 1641 is useful: + Similar upper bounds on B 2 ( n )were obtained for some values of n.1 J.the minimum time T of k-tolerant broadcasting was proved to be [log nl + k or rlog(n . n>4. This upper bound was also obtained independently in [57]. The above upper bounds on B .1.(n)s -[log nl . T = ifksn-logni f n . for k < Llog nl if n is even and for k 5 L2r10g''1 n + I J if n is odd. it was proved that T s 2k + h o g nl 4. for n being a power of 2. 
Sparser networks supporting 1-tolerant and 2-tolerant broadcasting in minimum time were constructed.1)(2"' . the following bounds on B . linear in n. whenever k s Llog nJ. k s [ ( n . giving the following estimates: T s (1 + $ ) k + 2 d l o g n l + d . and k s n . Let m = Llog nJ and n = 2"' + j . the networks supporting these schemes have the minimum possible number of links ( m + k)n/2. the k-tolerant broadcast function BA( n) was defined and studied for the first time.FAULT-TOLERANT BROADCASTING AND GOSSIPING 147 T=rlognl+ I.g. In [54]. In these situations. + .l m . logn + k + 1.2.(n)and its values for specific k and n can be found in [ 11. and T=sm+k+2+ k . ( n ) were established: (m . The tightness of these bounds depends on k and j. lar.hogj if k l5 1 2"' . for n = 2" and k s n . where 0 < j 5 2"'-2. n = 2' .I )1 + k + 1. For large k . tighter upper bounds on T for other values of n and k were also established.1)/(8c + l ) ] . In particu- L L m . The exact values of T for arbitrarily large k were first obtained in [37]. ( n )and B 2 ( n ) were later improved in [ 151.m - <k 5 2"' . two upper bounds on time T had been obtained earlier in [64]. For small values of j .2. and n 2 n 2 I .6) = In [ 561. B. 1 + if 2"' . + 3. and for n = 2 or 3 (mod 4 ) .6 ) .2.1. and small values of k. If n = 2"' and k 5 n .

x. the bound becomes close to log n + 2k.for a constant a < 1. for a constant a < 1. . randomized broadcasting in the whispering mode was considered.I . for large j. Faults are of the crash type. under the same assumption as above: k s an. The first results in this model were obtained in [29].. this information must be acquired by nodes during the communication process. n . encodes it into n smaller files F.nl + 2. it was shown that a nonadaptive broadcasting algorithm for the m-hypercube existed working in optimal time m and using the optimal number of 2”’ .+. . assuming that the location of faults is a priori known to the source. for a constant c < 4. The half-duplex whispering mode was adopted.) Adaptive broadcasting in complete networks has been recently investigated in [45].2. It remains open if adaptive broadcasting can be done in time O(1og n).. the authors constructed a k-tolerant broadcasting algorithm working in time O(log2 n).~]. In [ 591. 3. .The authors derived bounds on the minimum time and the minimum number of calls for k-tolerant broadcasting and gossiping in G depending on these values for G . where p is the startup time and LT is the propagation time. close to 2”’-’. . the algorithm was k-tolerant.1 ) . Uninformed nodes can also place calls. Two variations of the above model were considered: In the first. and if no faults are present. they work in optimal time hog. t E S. We first consider the minimum time T of k-tolerant adaptive broadcasting. .k ) l was obtained. in which each node could send information to two neighbors in a unit of time. In [ 431. The second variation does not impose any restriction on nodes placing calls. (This should not be confused with deterministic broadcasting in the presence of randomly distributed faults. In [ 521. are awake) can attempt placing calls. the exact value of minimum broadcasting time T = k + rlog(n . IDA . In [ 21. 
but they have diagnostic capabilities: A fault-free node can correctly diagnose tested nodes. (Chordal rings are rings [ x . the lineur communication model was used.148 PELC On the other hand. In this general model. Given a file F. A different approach to broadcasting in the presence of faults was adopted in [ 121. called the wake-up model.. random broadcasting in the presence of at most k faulty links was proved to be completed in time O(log n). ) . it was assumed that the number of faulty links does not exceed rm/21 . the execution time of this algorithm was shown to be smaller than the time of broadcasting using the straightforward ( k + I)-replication approach.nl + 3. both transient and permanent link failures were investigated. It was assumed that nodes fail in a Byzantine manner. In this model..73 times greater than optimal. these schemes work in time rlog. and Gz. and the size of packets was assumed unbounded. Permanent Node Faults We now assume that links are fault-free and there are at most k permanently faulty nodes. For each of the considered communication modes.k files F..1 calls. . nonadaptive broadcasting in the half-duplex whispering mode was considered. Broadcasting proposed in [ 401 was performed so that one failure could only affect one file F. the goal considered was that of minimizing the number of calls in fault-tolerant gossiping. whenever k I an. Under these assumptions. Informed nodes decide randomly to which neighbor they transmit the message at each time unit. In [ 301. which is then similar to and [57] the bound T 5 2k + [log nl + 4 from [64] and is far from the lower bound when k is large. fault-tolerant broadcasting in the mhypercube was considered. The authors presented an adaptive gossiping algorithm using 3n log n + O ( n )calls and working correctly whenever the faultfree part of the network is connected. The exact values of broadcasting time for arbitrary n and k remain unknown. . consequently.-~] with additional links [ x i . F. 
in such a way that F can be recovered from any subset of n . the time to transmit a message of length L is p LT.e. . to be considered in Section 4. For k = 1 and k = 2. . The authors proposed nonadaptive k-tolerant broadcasting schemes working for the complete n-node network in time rlog. a communication mode in between whispering and shouting was considered: Every node can transmit to at most k other nodes in a unit of time.. only nodes that have already obtained the source information (i. . and. Also. . Four communication modes were considered: whispering and shouting in both the halfduplex and full-duplex variations. even an adaptive algorithm does not assume any a priori knowledge of fault location. In this model. x~~+. . while faulty testers are unpredictable. In [40]. with probability converging to 1. Packets were assumed of size O ( m ) and a variation of the whispering mode was considered.~. The authors established upper and lower bounds on T and constructed a k-tolerant broadcasting algorithm requiring time at most 1. k-tolerant broadcasting and gossiping was considered in the shouting mode for product graphs G = G I X Gz. Under these assumptions. + f + .nl. In [ 121. this enables the algorithm to perform preprocessing and avoid time-consuming calls from the source to faulty nodes. . which is usually costly in terms of time and calls. In the usual scenario of kfault-tolerance.+.. It was proved that if there are at most k nonadjacent faulty nodes in some chordal rin s then adaptive broadcasting can be performed in time log nl k. the authors used Rabin’s Information Dispersal Algorithm (IDA) to construct fast k-tolerant broadcasting algorithms for the hypercube. for a fixed S where C ( 1. the communication mode is full-duplex whispering.) If k < cn. Finally..

In the traditional fault-tolerant setting. In [ 271.. they proposed a scheme for k-tolerant broadcasting in the m-hypercube requiring a time of m + k + 1 ) / t 1 .4. .1. This should be compared to the previously mentioned result of [ 121. for some integer r. . was considered in [3]. that the minimum number of tokens sufficient to perform 2-tolerant linear broadcasting is @(log log n ) and that it is 2 for 1-tolerant linear broadcasting. the location of faults was assumed to be known to all nodes. in G. . using the optimal time m and the optimal number of calls 2"' . The authors assumed that each node can communicate with at most t neighbors in a unit of time. These a e n-node chordal rings in which nodes u and u are adjacent iff u . i. For posi( tive integers k and n . A similar result for the star graph was obtained in [41]. respectively. ning trees rooted at the source s = (sI . . i.. In both papers. .both in the whispering and in the shouting mode.u = 2'(mod n). a faulttolerant version of linear broadcasting was considered. . It was proved. the shouting mode.2 arbitrary node or link failures.2.$) (the paths from the source to any node in distinct trees are mutually internally node-disjoint ) . called linear broadcasting. In any unit of time. . . they proposed k-tolerant broadcasting using the optimal number of ( k + 1)( n ! . In [ 321. was first considered in [ 10. they proposed a broadcasting algorithm for S. n and the node u = ( u l . the upper bound T s [log nl + k + 2 on k-tolerant broadcasting time was established under the assumption k s h o g n l . in the presence of k unknown faults. in particular. for i = 2. In the full-duplex whispering mode. where 0.1)/21 Byzantine faults or up to n . t = 1 + C = a. Some bounds on the broadcasting time in the presence of a larger number of faults that do not disconnect the hypercube were also shown. Assuming at most r( n ) link or node faults known to all nodes. S. . 
a node could communicate with only two neighbors in a unit of time. Under these assumptions.1 crash faults. s.FAULT-TOLERANT BROADCASTING AND GOSSIPING 149 3. The performance measure of a k-tolerant scheme. 3. and a. obtained under different assumptions.n. A different approach to broadcasting in the presence of faults. is a graph whose nodes are labeled by all permutations of integers 1. An algo- r( --- rithm was described that performs broadcasting in the presence of up to m .e. The difference from broadcasting is that tokens may not be "multiplied" for free at any node but have to travel to each node from the source. A variation of the broadcasting problem. The source has an unrestricted number of identical tokens. k-tolerant broadcasting in jumping nenvorks r was considered.e. . but the location of faults was known only to the source and the communication mode was more restrictive.. A broadcasting algorithm was proposed.e. ) = L3(n . indicating in which order nodes should be visited. t = 2 ZyLl' p. the number of faults was smaller and only links were fault-prone. and for :. was adopted in [ 631 and [41]. broadcasting in the m-hypercube was considered in [ 631. For the whispering mode. similar to that in [ 121. whenever the total number of faults does not exceed k.I. In [ 491. Transient Faults We now turn attention to transient faults. It was proved that their running time is asymptotically optimal. u. in whispering and shouting mode. Nonadaptive k-tolerant broadcasting in product netx G. still has diameter D(S. The adopted mode of communication was shouting.) is adjacent to all nodes u[i]. .. Permanent Link and Node Faults We now assume that both links and nodes can fail and that the total number of such permanent crash faults is at most k. the previously described linear communication model was used and k-tolerant broadcasting and gossiping algorithms were proposed for the hypercube in both shouting and whispering modes. 
works G = G_1 × The authors proposed broadcasting algorithms for such networks using the construction of n independent span. Each token has a predetermined route. where u[i] results from u by a transposition of u_1 and u. a communication mode in between whispering and shouting was considered. was the number of tokens used by the scheme. The n-star graph S. . The goal is for all nodes to be visited by at least one token. it converges to the lower bound as the length L of the messages increases. . The execution time t of these algorithms was derived in terms of broadcasting time in the factor networks G. 221.).1) calls. Assuming that k < m. adopted in [27]. requiring time at most m + 1 if a node can simultaneously transmit to all neighbors and time at most 2m if transmitting a message is possible only to one neighbor at a time. . nonadaptive broadcasting in the m-hypercube was considered under the assumption that nodes can simultaneously send and receive messages and that the number k of faults is less than m.) and using the optimal number of calls n! . The authors determined the maximum number r(n) of link or node faults such that the fault-free part of S. denote optimal fault-free broadcasting time from s. In [65]. working in optimal time D(S. . + on+ I . The authors established lower bounds on P_k(n) and showed k-tolerant schemes using few tokens. Let us first note that if a broadcasting or gossiping algorithm works . has n! nodes and the diameter D(S . each node that holds tokens can send at most one token to at most one other node. .3. A linear broadcasting scheme consisting of token routes is k-tolerant if every fault-free node is visited by at least one token. In [13]. See also Problem 13 by Greenberg (the Report Dispersal Problem) in [34]. In [12], . A faulty node or link destroys all tokens passing through it. 
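The star graph definition used above (nodes are the permutations of 1, ..., n; a node u is adjacent to every u[i], i = 2, ..., n, obtained by transposing the first and the i-th entry) can be checked on small instances. The following is only an illustrative sketch: the function names are hypothetical, and it relies on the standard fact that the star graph is vertex transitive, so the eccentricity of one node equals the diameter. It confirms the node count n! and the diameter ⌊3(n - 1)/2⌋ for small n.

```python
from collections import deque
from math import factorial


def star_graph_neighbors(u):
    """Yield the neighbors of permutation u: swap the first entry with the i-th one."""
    for i in range(1, len(u)):
        v = list(u)
        v[0], v[i] = v[i], v[0]
        yield tuple(v)


def star_graph_diameter(n):
    """Diameter of the n-star graph, computed by BFS from the identity permutation.

    Assumes vertex transitivity, so one BFS is enough.
    """
    source = tuple(range(1, n + 1))
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in star_graph_neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Every permutation must be reachable: the graph has n! nodes.
    assert len(dist) == factorial(n)
    return max(dist.values())
```

The brute-force search is only practical for small n (the graph has n! nodes), which is enough to sanity-check the formula.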
let P_k(n) denote the minimum number of tokens for which there exists a k-tolerant linear broadcasting scheme in an (n + 1)-node complete network. i. . These algorithms can tolerate up to ⌊(n .
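The k-tolerance condition for linear broadcasting (every fault-free node is visited by at least one token whenever the total number of faults does not exceed k) can be tested exhaustively on small complete networks. The sketch below is an illustration under stated assumptions, not a scheme from the cited papers: the source is node 0 and is assumed fault-free, a route is the list of nodes a token visits after leaving the source, the timing constraint of one token per node per time unit is ignored, and the names `visited_nodes` and `is_k_tolerant` are hypothetical.

```python
from itertools import combinations


def visited_nodes(route, bad_nodes, bad_links, source=0):
    """Nodes reached by a token on `route` before a faulty node or link destroys it."""
    seen = set()
    prev = source
    for node in route:
        if frozenset((prev, node)) in bad_links or node in bad_nodes:
            break  # the token is destroyed here
        seen.add(node)
        prev = node
    return seen


def is_k_tolerant(routes, n, k, source=0):
    """Check k-tolerance of token routes in the complete network on nodes 0..n."""
    nodes = [v for v in range(n + 1) if v != source]
    # Only links actually used by some route can affect the outcome.
    links = {frozenset(pair)
             for r in routes
             for pair in zip([source] + list(r), r)}
    components = [("node", v) for v in nodes] + [("link", l) for l in links]
    for size in range(k + 1):
        for faults in combinations(components, size):
            bad_nodes = {x for kind, x in faults if kind == "node"}
            bad_links = {x for kind, x in faults if kind == "link"}
            covered = set()
            for r in routes:
                covered |= visited_nodes(r, bad_nodes, bad_links, source)
            if any(v not in bad_nodes and v not in covered for v in nodes):
                return False
    return True
```

For instance, with n = 3 the two routes [1, 2, 3] and [3, 2, 1] pass the check for k = 1 but not for k = 2, which is consistent with the statement that two tokens suffice for 1-tolerant linear broadcasting.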

at most αt calls may fail in the first t time units of the algorithm execution. For arbitrary n = 2^m > 2k. no message passes during a faulty call. We now return to the full-duplex mode.e. T = ⌈log n⌉ + 2 and C ≥ 2n . for given costs of links. it was proved that the minimum time of k-tolerant broadcasting is T = m + 2k. which is tight. when all costs c(i. i.const under the assumption of at most k permanent link faults. The following variation on the assumption of at most k faulty calls was recently considered in [MI. Fix α < 1. unlike in the permanent fault case. for any t > 0.1 ) + 2k + 16. Faults are of crash type. It was proved that this cost is optimal among all k-tolerant broadcasting algorithms. Moreover. All results concern only link faults. A more general measure of communication cost has been considered in [39]. For broadcasting in the half-duplex variation of this model. In the case of broadcasting. [For example.e. 1 -α .1 larger than optimal. there are at most k faulty calls during the communication process. j) are equal to 1. Assume that. The scheme also improves on the number of calls..2. 2 If the network is an m-hypercube then. requiring a minimum time T = m + k . We assume that nodes are fault free and links are subject to transient failures. A k-tolerant algorithm was proposed with running and 2 TE a( (A)") . the proportion coefficient must be smaller than 1.. it seems reasonable to assume that the possible number of failures is proportional to the time elapsed since the beginning of the algorithm execution.] It was proved in [39] that. for arbitrary n and k . Moreover. for k > 2"-'.l)]+m. Since the number of call failures is likely to increase with execution time. the gossiping algorithm constructed in [37]. and algorithms are nonadaptive.g. if α < $. The exact value of the minimum number of calls in k-tolerant gossiping in this model is still unknown. if n = 2^m . 
a class of improved upper bounds for almost all k was also obtained. j) of the complete network on n nodes is assigned a positive cost c(i . If the network is an n-node chain. the same fault cannot prevent transmission in two distinct time units. This order of magnitude is clearly optimal. when n = 2^m. can be achieved in the symmetric directed m-hypercube. all upper bounds on time and the number of calls reported above for permanent faults still hold in the present scenario. the total cost is equal to the number of calls in the half-duplex mode.: C f(k + 8 ) ( n . the communication mode is whispering. Packets are unbounded. we discuss results that use the assumption that faults are transient. The first results concerning the minimum number C of calls were proved in [48]. j) . the problem of determining the minimum cost of performing k-tolerant gossiping among n nodes is NP-hard.1. j) . As mentioned in Subsection 3. In [48]. their gossiping scheme requires time at most (k + 2)n . a k-tolerant algorithm was constructed with cost k + 1 times larger than the cost of a minimum spanning tree. The exact value of T for this model. The total cost of a broadcasting or gossiping algorithm is obtained by adding c(i . instead of imposing a fixed upper bound on this number. i. For k = 1 and arbitrary n . the following lower bounds were proved in [38]. To make broadcasting possible in the whispering mode. a 1-tolerant broadcasting scheme was shown for which both the time and the number of calls achieve the above optimal values. Moreover. + O ( k 6 + n log n ) . if α 1 -. for any α < 1. otherwise. i. remains unknown. This conjecture was disproved in [48] under the stronger assumption of at most k transient link faults. The authors proposed a k-tolerant gossiping algorithm with cost at most twice the optimal. Every link (i. T = L L ( r n . when compared to [48] for many values of k and j. 
The first result on the time of gossiping in the full-duplex mode with transient link faults was proved in [48]. In this subsection. On the other hand. correctly assuming at most k permanent link and/or node faults then it also works correctly assuming at most k transient link and/or node faults during the entire communication process. Hence. The following upper bound was shown: C ≤ nk 2 time in O(log n + k) . This network has the minimum number of links among all networks supporting k-tolerant broadcasting in the optimum time m + 2k. if n = 2^m. no message leaves the source in the worst case.' + j < 2 " . e. it was shown that k-tolerant broadcasting in time m + 2k. and T ≤ m + 3k + 1. j) whenever a message travels through the link (i. This result was subsequently strengthened in [37] where the following upper bounds on T were proved: T ≤ m + k. has an additional feature: It uses the minimum number (m + k)n/2 of calls. The following results regarding broadcasting time in this linearly bounded transient fault model were obtained in [44]. e.
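The cost measure from [39] discussed above charges c(i, j) each time a message travels through the link (i, j), and the reported optimal cost of k-tolerant broadcasting is k + 1 times the cost of a minimum spanning tree. The sketch below only evaluates that quantity with Prim's algorithm; the function names are hypothetical. One natural way to reach this cost, stated here as an assumption rather than as the construction of [39], is to repeat every transmission along the tree k + 1 times, so that at most k faulty calls cannot block all copies on any tree link.

```python
def mst_cost(cost):
    """Cost of a minimum spanning tree of a complete network (Prim's algorithm).

    `cost` is a symmetric matrix; cost[i][j] is the positive cost of link (i, j).
    """
    n = len(cost)
    in_tree = [False] * n
    best = [float("inf")] * n
    best[0] = 0
    total = 0
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and cost[u][v] < best[v]:
                best[v] = cost[u][v]
    return total


def k_tolerant_broadcast_cost(cost, k):
    """(k + 1) times the minimum spanning tree cost, the value reported above."""
    return (k + 1) * mst_cost(cost)
```

With all costs equal to 1 on n nodes the tree costs n - 1, so the value reduces to (k + 1)(n - 1), i.e., a count of calls.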

indeed. In [58]. for all vertex transitive graphs. An adaptive. On the other hand. the author characterized networks that support faithful broadcasting using the optimal number of calls and no local memory for computations. For fixed k . nodes fail with probability 0 ≤ q < 1. Also. results concerning broadcasting and gossiping are usually equivalent when packets are assumed of unbounded size. under more general assumptions. the issue of broadcasting time in the m-hypercube was studied for this model. Nonadaptive broadcasting and gossiping algorithms were proposed. i. First. working in time O(log^2 n) and using O(n log n) calls. as all transmissions in such algorithms are scheduled in advance and do not depend on the random occurrence of faults. it was possible to allow a larger number of failures per time unit. Fix a network N and a broadcasting source s. links are subject to transient crash faults. As an open problem. They assumed that at most k calls may fail in each time unit for a constant k smaller than the edge-connectivity of the network. arbitrarily large networks were constructed for which all faithful broadcasting algorithms using no local memory for computations at nodes use a number of calls exponential in the size of the network. algorithms are deterministic and probabilistic considerations relate only to their correctness and/or efficiency. only their orders of magnitude (up to a multiplicative constant) are minimized. If the network is the complete n-node graph.. For multidimensional tori. a reverse of a broadcasting algorithm can be used to gather all values in one node. using the above optimal number of calls. In [25]. It does not occur in the case of nonadaptive algorithms.. almost safe broadcasting algorithm was proposed. the author investigated the trade-off between the minimum number of calls used by a faithful broadcasting algorithm and the maximum amount of local memory needed in a node. 
It will be seen below that this distinction is sometimes significant. for a given parameter k . e.. and all faults are independent. in the following papers. The latter is called the space complexity of the algorithm. T ∈ O(d) . let k. Let d be the diameter of the network and let k (smaller than the edge-connectivity) be the maximum number of faulty calls per time unit.. for any network. as they depend on the location of faults which are random. the requirement of fault tolerance was allowed to vary from node to node. we assume that links fail with probability 0 ≤ p < 1.' ) . the authors asked whether T is linear in the diameter. be the minimum number of links whose deletion disconnects u from s. In [9]. for any α < 1. calls. broadcasting and gossiping were studied under the assumption of unbounded packets. A broadcasting algorithm is reliable for the node u. In this way. also [9]). Unless explicitly stated. THE PROBABILISTIC FAULT MODEL In this section. then. the construction from [25] was extended in [17] to decrease nonadaptive broadcasting time to O(log n). A different way of defining fault-tolerant broadcasting was proposed in [58] . A similar assumption that the number of faulty transmissions may increase in time was addressed in [35]. p < 1 was assumed to be constant. Moreover. consider the fault-free nodes scenario (q = 0). T ∈ O(d'+ I ) . we may ask about the worst-case or the expected time and number of calls used by the algorithm. there exists a faithful broadcasting algorithm of linear space complexity. and there exist networks for which T ∈ Θ( k ( d ' 2 . 4. the network could be disconnected. Later. k. if u gets the source message whenever less than k..' ) . Instead of demanding that a broadcasting scheme tolerate at most k link faults in the network. The order of the number of calls was proved to be optimal for nonadaptive algorithms (cf. A broadcasting algorithm is faithful if it is reliable for every node. as well. 
The values p = 0 (q = 0) correspond to the assumption that links (nodes) are fault-free. then.1. It was proved that every faithful broadcasting algorithm for an arbitrary network uses at least Z. where it was proved that T ∈ m + o(m) . both the time and number of calls are at most doubled. almost safe broadcasting algorithm was . there are two natural variations for defining the performance of adaptive algorithms in terms of time and number of calls. This model was further investigated in [21] for general networks. in [17]. Finally. TE- + O(log log n). (For larger values of k . Using this approach. and there exist networks for which T E 0(d'* I ) . T E O ( k d ' * . 1 log n 1 -a 4. The following results were proved in [21] : For fixed d . It used nonconstructive expanders and worked in time O(log n). the total information can be broadcast from this node to all other nodes. Thus. results concerning the execution time and the number of calls used by broadcasting and gossiping algorithms are of asymptotic nature. In the probabilistic model. The authors considered the shouting communication mode. Given a node u ≠ s.) In [35]. In most of the papers using the probabilistic fault model. the result from [25] concerning adaptive broadcasting was strengthened. Permanent Link and Node Faults We first assume that all failures are permanent and of crash type and that the communication mode is full-duplex whispering. therefore. a nonadaptive. Both these values are random variables. A simple treelike construction was applied to guarantee almost safe adaptive broadcasting and gossiping using an expected time O(log n) and an expected number of calls O(n) . for arbitrary networks. for arbitrary networks.
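The notion of faithful broadcasting from [58] rests on the quantity defined for each node u ≠ s as the minimum number of links whose deletion disconnects u from s. By Menger's theorem this equals the maximum number of link-disjoint paths between s and u, so it can be computed with a unit-capacity maximum flow. The sketch below is an illustration only (it is not taken from [58]); the function names are hypothetical and the network is assumed to be given as a symmetric adjacency dictionary.

```python
from collections import deque


def local_edge_connectivity(adj, s, u):
    """Minimum number of links whose deletion disconnects u from s.

    Computed as a unit-capacity max flow with BFS augmenting paths.
    `adj` maps each node to a list of neighbors and must be symmetric.
    """
    cap = {(a, b): 1 for a in adj for b in adj[a]}
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and u not in parent:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent and cap[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if u not in parent:
            return flow  # no augmenting path is left
        # Push one unit of flow back along the path found.
        y = u
        while parent[y] is not None:
            x = parent[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1


def reliability_thresholds(adj, s):
    """Per-node thresholds k_u for all u != s."""
    return {u: local_edge_connectivity(adj, s, u) for u in adj if u != s}
```

In a 4-cycle 0-1-2-3 with an extra pendant node 4 attached to node 2 and source 0, the cycle nodes get threshold 2 and the pendant node gets threshold 1, so the fault-tolerance requirement indeed varies from node to node.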

As observed previously. it is necessary to attach node labels to values during transmissions. they showed an almost safe broadcasting algorithm working in optimal time ⌈log n⌉ . must. In [66]. The authors proposed an almost safe broadcasting algorithm working in time log n + o(log n). For unit-size packets. in this case. for packets of unbounded size. The difference from classical broadcasting is that only nodes that are awake (i. It remains open whether both the running time O(log n) and the number of calls O(n) can be guaranteed in the worst case. q < 1. already have received the message) can place calls. the latter result was strengthened: Almost safe broadcasting in time ⌈log n⌉ was shown for p* = (c ln n)/n . the above results can be immediately extended to gossiping. This should be compared to the situation in [18] and [11].) Moreover. assuming that link failure probability p is larger than constant. Since labels must use at least log n bits. a priori. the results from [46] were extended in another way. thus improving required time for this probability value.I faults in the m-hypercube was proposed.4. whenever p* = (c ln n)/n . nodes do not know. adaptive broadcasting was considered. In [20]. where c > 18. the authors constructed an almost safe wake up algorithm working in expected time O(log n). nodes were assumed fault-free). In [17]. nonadaptive broadcasting was considered for arbitrary constant fault probability values p . almost safe broadcasting is impossible for large fault probability values. with probability converging to 1. for d < 1. In [14]. q < 1. given whose worst-case time was O(log n) and worst-case number of calls was O(n). let ω(n) be any function diverging to infinity. in the case when links and nodes are fault prone. 
The authors proposed an almost safe broadcasting algorithm working in time O(log n). while for unit-size packets. where c > 16. almost safe broadcasting in time ⌈log n⌉ + 1 was shown for p* = (c ln nslog log log n)/n. ω(n) is any function diverging to infinity. the author showed almost safe broadcasting in time log n + d log log log n. easily seen to be optimal. In [36]. For small probabilities of faults.g. both links and nodes were assumed fault-prone. for some positive constant c. this yields gossiping time O(log n). which. the authors constructed a nonadaptive almost safe gossiping algorithm working in time O([n/b(n)] log n). both orders of magnitude being optimal. The algorithm in [28] used explicitly constructible expanders. as in broadcasting. whenever p* = (c ln n)/n . almost safe. randomized broadcasting was proved to be almost safe and work in time O(log^2 n). In [28]. It should be mentioned that in the above gossiping algorithm. this yields linear gossiping time. an adaptive. For example. even for one-bit values. for d < 1 . The authors constructed a nonadaptive broadcasting algorithm requiring time O(m) . where classical broadcasting was performed with an expected linear number of calls.. where c > 16.09. waking-up algorithm. A variation of broadcasting. It is well known that. must contain values of b(n) nodes. In [47]. whose value is currently transmitted. gossiping requires linear time even without faults. as the hypercube can be then disconnected with constant positive probability. by definition. where K > 12/(ln 2). almost safe broadcasting in time ⌈log n⌉ was shown for p* = (c ln n-log log n)/n . where labels did not have to be attached and packets could have only a constant number of bits. In the following papers. (This can be contrasted with the above-mentioned result from [17]. an almost safe broadcasting algorithm working in time O(m) for the m-hypercube was constructed in [20]. 
a nonadaptive broadcasting algorithm tolerating at most m . both orders of magnitude being optimal. As previously mentioned. a dormant (uninformed) node cannot call to seek the source message. The situation changes significantly if we consider smaller packets.4. Under the slightly stronger assumption p* = (K ln^2 n)/n . This restriction has a significant impact on the minimum number of calls of an almost safe. whenever p ≤ 0. Randomized broadcasting in the above fault model was considered in [30]. On the one hand. nonadaptive broadcasting in the hypercube was considered. using a treelike construction.e. In the beginning. Let p* = 1 .q) ≥ 0. In [16]. For packets of size b(n) . For p* = [ω(n) log n]/n . simultaneously weakening the assumption regarding node faults (in [9]. This eliminated the need of nonconstructive expanders used in [9]. For the unbounded packet size. These results were subsequently extended in [46]. satisfying the condition (1 . For p* = [log n ω(n)]/n where . a packet of size b(n) . which also follows from [17]. unit-size packets must contain O(log n) bits. where c > 18. only the source is awake and has to wake up all fault-free nodes by sending the wake-up message. in fact. the relations between the size of packets and the time of fault-tolerant broadcasting were investigated. An additional feature of the above scheme was that it worked in anonymous (complete) networks in which nodes do not know their labels and execute identical algorithms. its running time is O(log n) . It was proved in [19] that every such algorithm (even adaptive) must use an expected number of Ω(n log n) calls. thus..99. was considered in [19]. contain b(n) log n bits. e. called waking up. and using an expected number of O(n log n) calls. for arbitrary fault probability values p . Moreover. The author showed almost safe broadcasting in time log n + d log log n . almost safe broadcasting was considered for the m-hypercube. 
almost safe broadcasting algorithm was also given having worst-case running time O(log n) and expected number of calls O(n). p denote the probability that a link is fault free. On the other hand.

L2. The lower bound T ∈ Ω(n log n) on gossiping time. algorithms were nonadaptive and the order of magnitude was proved to be optimal. On the other hand. the algorithm from [28] works for link and node crash faults. This lower bound is valid for strongly nonadaptive algorithms described above and for one-bit packets.. labels have to be attached to nodes. in case of the model N1. then an almost safe. gossiping algorithms working in sparser networks were also constructed and their performance was proved to be of the smallest possible order of magnitude among gossiping schemes working in these networks. almost safe gossiping in time 2m2 was shown. with a small constant c. calls. Gossiping with unit-size packets in the presence of Byzantine faults was studied in [11]. In mode 1. The communication mode was full-duplex whispering. 6). In [4]. Does there exist an almost safe gossiping algorithm using packets containing a constant number of bits and working in linear time? A positive answer to this question for transient link faults and fault-free nodes was given in [18]. In addition. thus. but receiving was sequential. links) failing with probability p ≤ c/m . Transient Faults We now assume that individual calls fail with constant probability 0 ≤ p < 1 and all failures. proved in [11]. 4. The following question remains open: Suppose that links and nodes of the complete network are subject to permanent crash faults with probabilities p < 1 and q < 1. remains true for crash faults as well. Suppose that node values are only 1 or 0. 03. but it is also predetermined which node's value is to be sent in a given transmission. Byzantine link failures under the assumption that nodes are fault free were considered in [6]. as previously remarked. almost safe gossiping algorithms were constructed whose running time T and the number of calls C were shown to be of the smallest possible order of magnitude.e. a node could send a packet simultaneously to all neighbors. 
an almost safe gossiping algorithm requiring time 2m was presented. Other fault-tolerant aspects of the token dispersal problem were considered in [24]. q > 0). i.e. time O( then almost safe token dispersal was proposed with running time O ( G ) . Thus. For example. For fault-free nodes and links failing with probability p ≤ c. The authors assumed that either links or nodes fail in a Byzantine way. All four models yielded by combinations of these assumptions were considered. C ∈ O(n^2 log n) and the algorithm achieving this performance worked for the sparsest possible networks. The authors considered strongly nonadaptive algorithms in which not only transmission scheduling is done in advance. if node values have a constant number of bits. The performance measure adopted in [23] was the running time of the scheme. Its running time is (2^m − 1)m and the number of calls is m2^m − 2^m + 1. It is not strongly nonadaptive in the above sense and works in time O(n) for unit-size packets. in [28]. working in time O(n ω(n)) and using O(n^2 ω(n)) for any function ω(n) -.. i. nodes) and nodes (respectively. Mode 2 was classical whispering. It was proved. However. an almost safe gossiping algorithm was constructed. with a small constant c . It remains open whether almost safe gossiping with unbounded packets. T ∈ Θ(n log n). as well. on the other hand. This algorithm used only O(n ω(n)) links for communication. the linear broadcasting problem. N1. If only nodes or only links can fail (p = 0 or q = 0).2" + 1. can be performed in the m-hypercube in time O(m) when links fail in a Byzantine manner with a small constant probability.2. the author performed simulations showing that for fault probability p ≤ f all nodes become informed after an average time of less than 5m. In [23]. For these models. is complete. 
no reliable communication can be achieved in a Byzantine environment. packets contain a constant number of bits. T ∈ Θ(n) and C ∈ Θ(n^2) . These two communication modes were combined with two assumptions regarding faults: (N) fault-free links and nodes failing with constant probability 0 < q < 1/2 and (L) fault-free nodes and links failing with constant probability 0 < p < 1/2. respectively. Link failure probability was a constant p < 1/2; otherwise. sending was performed by shouting. contain a logarithmic number of bits. token dispersal algorithm was shown with running If both links and nodes can fail (p. The results from [11] should be compared to those from [28]. Using nonconstructive expanders. Consider the model L2 from [11]: whispering with fault-free nodes and faulty links. and that all values of nodes have a constant number of bits. and N2. Two communication modes were considered. The authors assumed that links and/or nodes of the (complete) network fail independently with constant probabilities and that an attempt to send a token to a faulty node or via a faulty link does not succeed (i. discussed in Section 3. the authors proposed a nonadaptive almost safe broadcasting algorithm working in time O(log n). that almost safe gossiping in time O(n) or using O(n^2) calls is impossible in any network with o(n^2) links. In case of models L1 and L2. or even almost safe broadcasting. labeled as L1. This enables the algorithm to skip the labels of nodes during transmissions.. Every fault-free node has to be visited. including those . For each of these models. similar to the approach in [18] and unlike that in [28]. e. if the underlying network 4. was considered under the name of token dispersal. unit-size packets must. In both cases. the token remains at the sending node). in fact. nonadaptive gossiping with unbounded packets was considered for the m-hypercube. a node could receive only one packet at a time.

otherwise. e. In [ 601. The second direction for future research concerns investigating new communication and fault models arising as a consequence of emerging technologies. On the other hand. adaptive broadcasting for the complete network was considered in this model.of calls placed along the same link. it was not necessary to append node labels to their values during transmissions and. It is worth noting that the gossiping scheme in [ 181 was constructed in such a way that each node knows in advance the order in which it is going to get values of other nodes. Thus. these gaps are fairly small. are independent. FUTURE RESEARCH This survey demonstrates that although an important body of research exists concerning fault-tolerant broadcasting and gossiping. and thus yields worst-case time O ( log n ) . gossiping must take time at least linear in n . even the exact order of magnitude of minimum time or minimum number of calls in fault-tolerant communication has not yet been established. In other situations. consequently. our understanding of the relations between efficiency and fault tolerance of communication algorithms for these tasks is still fragmentary and incomplete. an almost safe nonadaptive gossiping algorithm working in time O ( n ) was constructed for the large class of networks having spanning trees of bounded maximum degree (including. communication in n-node bounded degree networks of diameter D ( n ) was investigated in [ 261. It was shown that almost safe. 5. unit-size packets really meant packets containing a constant number of bits. if faults are of crash type. Under this assumption. Node failures occur with probability 0 5 9 < I . Finally. Under this scenario. The communication mode is fullduplex whispering. they are permanent and independent. Diks and Pelc (see Problem 29 in [34]) asked if there exists an almost safe broadcasting algorithm working in time O ( n )for Byzantine faults. 
enable them to support efficient and robust communication algorithms. At least three groups of open problems can be specified on the basis of already obtained research results. Adaptive broadcasting and gossiping algorithms also were designed.. q < 1. they are increasingly difficult and costly to build as the number of nodes grows. it can be seen that most of the research done to date in the surveyed area concerns complete networks and hypercubes. In some cases. a large number of hypotheses have already been considered. assuming arbitrary p . The relationships between almost safe broadcasting time and the number of links in the network were studied in [61] for the shouting communication mode and crash transmission faults. This should be compared to the scenario in [28]. sparser ones. i.e. Here. Byzantine transmission faults were considered in [55]. . no reliable communication can be guaranteed. the types of networks actually used in practice provide important and challenging research targets. for Byzantine faults. in particular. important challenges are the study of models that faithfully describe existing patterns of communication and the exploration of features likely to characterize networks built in the future. An improvement of this result follows from [17]. nonadaptive broadcasting and gossiping can be performed in such networks in time O(D(n)) using O(n log n) calls. The result from [17] holds for transient faults. there is a need to explore fault-tolerant capabilities of other networks. especially in the case of complete networks. and the remaining open problems are mostly of combinatorial interest. Although. Hence. described in Subsection 4. A variation of this model for unit-size packets was considered in [18]. a simple algorithm working in time O(n log n) can be constructed. however. where almost safe. again. An almost safe broadcasting algorithm requiring expected time O(log n) and worst-case time O(n log n) was constructed. 
Future developments may have a significant impact on the efficiency of actually implemented communication schemes. However. all Hamiltonian graphs). Faults are of crash type and packets are of unbounded size. such as symmetry and high connectivity. In the following papers. in the case when values were of constant size. In [18]. as we have seen. nonadaptive broadcasting working in time O(log n) was given. as well. The first group of problems concerns tightening the gaps between upper and lower bounds on the minimum time and/or the minimum number of calls in fault-tolerant broadcasting and gossiping. q = 0. In this case. The main result of [55] is the construction and analysis of such an algorithm. This research was supported in part by NSERC Grant OGP 0008136. The problem considered in [55] is that of the minimum time required for almost safe nonadaptive broadcasting in the n-node chain. as every node must read the value of every other node. All these orders of magnitude are optimal. It follows from [26] that this time is O(n) . Good topological properties of these networks. their possible combinations yielding a plethora of potential models. the assumption p < 1/2 is necessary. nodes were assumed fault-free. working in worst-case time O(D(n)) and using an expected linear number of calls. Research directed toward their solution is likely to deepen the understanding of the domain and increase the potential applicability of theoretical results in practice.1. It was shown that the minimum time of almost safe broadcasting in networks with e(n) links is T ∈ Θ(n log n / e(n)) ..

Berman and M. K. Dahbura.Igarashi. IEEE Trans. Control and Computing. Comp. Eds. A. S. 4 (1994) 417-427. Malinowski. Networks 27 ( 1996) 293-307. Methods and problems of communication in usual networks. 43 (1994)698-710. Pelc. NJ (1989). Math. Pelc. M. 25 (1993)171-220. 1994) 187-193. LNCS 972. J. Math.FAULT-TOLERANT BROADCASTING AND GOSSIPING 1! % REFERENCES R. Bao and Y. P. Diks and A. Proc.2 5-229. Rescigno. Reliable broadcasting in hypercubes with random link and node failures. 1 (1990)447-460. Liestman. The consensus problem in fault-tolerant computing.Igarashi. Discr. Pelc. 4 D. Comb. Lyzenga. Lingas. Fraigniaud. Gargano. Networks 22 (1992)469-486. J. Gopal. S. Alg. Fraigniaud. 1 (1993)288-315. K. Randomized broadcast in networks. Proceedings of the 28th Annual Allerton Conference on Communication. Diks. Diks. Liestman. Numer. Blough and A. K. Diks. Asymptotically optimal broadcasting and gossiping in faulty hypercube multicomputers. K. Random Struct. Fast gossiping with short unreliable messages. S. and A. Reliable token dispersal with random faults. P. G. Pelc. Malinowski. Disc. Almost safe gossiping in bounded degree networks. D.Fox. Fault-tolerant linear broadcasting. A. 53 ( 1994) 3 . L. and 0. Token transfer in a faulty network. B. and D. Chau and A. FTCS’21 (1991)266-273. 39 (1991) 115-1 19. Gargano. G. S. A. Gargano. Chlebus. Math. Walker. Chou and I. Canada. S. K. F. F. Diks and A. Pelc. Appl. Hawrylycz. Bienstock. Sparse networks supporting efficient reliable broadcasting. Parallel and Distributed Computation: Numerical Methods. Diks. K. A. J. Diks and A. Gargano and A. Englewood Cliffs. Sun. S. Afg. Reliable minimum-time broadcast networks. P. Fund. J. I (1988). J. Peterson. ICALP’93. Chlebus. Malek. SIAM J. Alg. SIAM J. Solving Problems on Concurrent Processors. Y. Diks. Digest of Papers. Pelc. 53 (1994) 15-24. E75 (1992)255-260. Znf: Appl. Tsitsiklis. L. Bao. 
Proceedings of the 27th Annual Hawaii International Conference on System Sciences 2 (Jan. and A. Broadcasting in random graphs. Congress. Pelc.. Fault-tolerant minimum broadcast networks. Fast and robust broadcasting in faulty hypercubes. Appl.. Kanai. 301 U. 1990) 978-987. Proc. Graph Theory and Computing. Upfal. Berman. Disc. B. 10 (1986) 1-18. Telephone problems with failures. Open problems. Lett. B. and A. Molloy. Miura. Ahlswede. Pelc. Chlebus. K. Chau. L. Proc. Methods 7 (1986) 1317. B. and L. Appl. IEEE Trans. Farley. D. Reliable broadcasting in product networks with Byzantine faults. Pelc. Disc. Hakimi. Reliable broadcasting. and D. NJ. Sotteau. Manuscript. Bitan and S. 41 (1992) 1410-1419. K. C. Proceedings WDAG’95. L. Disc. Otto.148. Appl. Efficient gossiping by packets in networks with random faults. Pelc. K. Math. 29 (1995) 383-400. Fraigniaud and E. Bertsekas and J. Fraigniaud and C. S. Frieze and M. Raghavan. L. P. and A. Proceedings of the First Canada. L. P. and E. Reliable gossip schemes with random link failures. and A. Appl. A. S. Richards. and A. A. K. N. Peters. and A. Diks. Math. Chlebus. Diks and A. Alg. C. Lazard. Par. 9 (1996) 7-18. Disc. Proceedings of the 18th SE Conference on Combinatorics. Disc. 3 (1993) 507-524.13. LNCS 700. Disc. Bruck. S. Pelc. ACM Comput. Diks. Theory and Practice. A. L. K. and K. A. Johnsson. 54 (1994) 77-80. Comput. Sci. Math. and A. On optimal broadcasting in faulty hypercubes. 5 (1992) 338-344. LNCS 805 (May 1994) 207-217. Linear broadcast routing. Pelc. S. Katano. Englewood Cliffs. Y.. Appl. Alg. Optimal linear broadcast. Liestman. Prentice-Hall. SIAM J. Barborak. Chlebus. C. P. S. Information disseminating schemes for fault-tolerance in hypercubes. and A. B. Pelc. Proc. IEICE Trans. Lett. L 391 L. Math. to appear. Tighter bounds on fault-tolerant broadcasting and gossiping. Combin. A. B. Feige. K.388-397. Broadcasting with random faults. Carlsson. Information dissemination in distributed systems with faulty units.
Igarashi. 0 K. Peleg. Waking up an anonymous faulty network from a single source. 53 (1994) 79-133. K. frob. H. Theor. 59 (1987) 37-48. Montreal. Networks 23 (1993) 691-701. Khachatrian. Diks. and A. T. Inf. Broadcasting in hypercubes with randomly distributed Byzantine faults. S. Diks. Broadcasting in a hypercube when some calls fail. Optimal communication in networks with randomly distributed Byzantine faults. Broadcasting in synchronous networks with dynamic faults. Pelc. Lett. (Oct. to appear. D. 20 (1988) 1-7. 1 M. Haroutunian. M. Optimal broadcasting in faulty hypercubes. Zaks. A. Prentice-Hall. K. J. K. J. Proceedings of the 26th Annual International Symposium on Fault-Tolerant Computing (1996) 262-271. P. Communication com- . Constructing fault-tolerant minimal broadcast networks. and D. Reliable broadcasting in logarithmic time with Byzantine link failures. Disc. France Conference on Parallel and Distributed Computing. Comp. Chlebus. H. 1 (1989) 490-517. S. Math. Peyrat. Bagchi and S. Networks 27 (1996) 309-318. Par. Inf. Sys. 53 (1994) 135. Salmon.

Università di Salerno. Osawa. Y. Gargano. Rescigno. Kanai. Ohring and D. [45] L. and S. Kučera. Time bounds on fault-tolerant broadcasting. Minimum time networks tolerating a logarithmic number of faults. A note on optimal time broadcast in faulty hypercubes. Disc. Par. Peine. Gąsieniec and A. 53 (1994) 149-170. Kanai. Comp. 1996 . Haddad. Gerbessiotis. Math. A. Broadcasting with linearly bounded transmission faults. A. Peleg and A. S. SIAM J. Matrix multiplication on Boolean cubes using generic communication primitives. A. A. Ho. Math. A survey of gossiping and broadcasting in communication networks. Comp. Wouk. Peleg. SIAM (1989) 108-156. Message complexity versus space complexity in fault tolerant broadcast protocols. E75 (1992) 22-29. Pelc. U. Igarashi. K. Parallel Processing and Medium-Scale Multiprocessors (A. H. Par. Hsu and D. A. IEICE Trans. Harutounian. Disc. Close-to-optimal and near-optimal broadcasting in random graphs. Gerbessiotis. [48] R. Rescigno. S. L. [44] L. Networks 15 (1985) 159-171. G. P. IEEE Trans. Monien. K. Alg. S. Disc. 1995 Accepted March 27. Du. Shin. D. 40 (1991) 169-174. L. Vaccaro. Proc. Ramanathan and K. Reliable broadcast in hypercube multicomputers. and A. Networks 18 (1988) 319-349. Miura. and (1995) 129-150. plexity of fault-tolerant information diffusion. Université du Québec à Hull (1995). Science Press & AMS. Proceedings of the Fourth International Colloquium on Coding Theory. Schaffer. J. D. (F. Maddaluno. Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing (1994) 188-195. S.. Technical Report RR 95/01-1. T. Vaccaro. Adaptive broadcasting with faulty nodes. Math. Han. Khachatrian and H. B. Par. Gargano and U. Liestman. T. Appl. A. Fault-tolerant broadcast graphs. and A. Broadcasting in faulty binary jumping networks. Minimum time broadcast in faulty star networks. Moran. On optimal broadcast graphs. Dist. Appl. Gąsieniec and A. Y. Disc. Pelc. 26 (1995) 132-135. Pelc.). Thesis. W.
63 [41] L. Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Computing (1993). Greece (June 1995) 159-172. A. Université du Québec à Hull (1995). Hedetniemi. Optimal and near-optimal broadcast in random graphs. K. Comput. J. Broadcasting time in sparse networks with faulty transmissions. A. Proceedings of the 2nd Colloquium on Structural Information and Communication Complexity. 23 (1994) 462-467. V. Hohndel. J. Liestman. Y. [46] A. E. Fault-tolerant hypercube broadcasting via information dispersal. Lett. R. Klasing. Schaffer. Hromkovič. Hedetniemi. 25 (1989) 289-297. V. Networks 19 (1989) 803-822. Dist. S. Ed. Dissemination of information in interconnection networks (broadcasting and gossiping). Gąsieniec and A. Combinatorial Network Theory. 1987. Vaccaro. Algorithms for the construction of fault-tolerant networks (in Italian). H. R. Technical Report RR 951 04-7. Proc. Optimal schemes for disseminating information and their fault-tolerance. C. Manuscript.-Z. Wierman. S. Broadcasting in random graphs. Comput. Gargano. A. Networks 23 (1993) 271-282. 5 (1992) 178-198. Math. Networks 19 (1989) 505-519. Inf. On gossiping [49] [50] [51] [52] [53] with faulty telephone lines. [43] L. Par. Armenia (1990) 69-77. G. Johnsson and C. Broadcasting through a noisy one-dimensional network. Broadcasting in complete networks with faulty nodes using unreliable calls. Eds. SIAM J. Lett. Technical Report MPI-1-93. Fast fault-tolerant broadcasting and gossiping. Discr.106. S. to appear. L. Scheinerman and J. and K. Pelc. Roy. M.). [47] A. and R. Igarashi. Methods 8 (1987) 439-445. SIROCCO'95. Pelc. Miura. and U. Pelc. 2 (1992) 355-361. Inf. Syst. to appear. L. Optimal fault tolerant communication algorithms on product networks using spanning trees. Broadcasting with a bounded fraction of faulty nodes. [40] L. L. A. Received November 22. Appl. 37 (1988) 1654-1657. Max-Planck-Institut für Informatik (1993). [42] L.
