IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 34, NO. 4, APRIL 2023

On the Message Complexity of Fault-Tolerant Computation: Leader Election and Agreement

Manish Kumar and Anisur Rahaman Molla

Abstract—This article investigates the message complexity of two fundamental problems, leader election and agreement, in the crash-fault synchronous and fully-connected distributed network. We present randomized (Monte Carlo) algorithms for both problems and also show non-trivial lower bounds on the message complexity. Our algorithms achieve sublinear message complexity in the so-called implicit version of the two problems when tolerating more than a constant fraction of faulty nodes. In comparison to the state-of-the-art, our results improve and extend the work of [Gilbert-Kowalski, SODA'10] (which studied only the agreement problem) in several directions. Specifically, our algorithms tolerate any number of faulty nodes up to (n − polylog n). The message complexity (and also the time complexity) of our algorithms is optimal (up to a polylog n factor). Further, our algorithms work in anonymous networks, where nodes do not know each other. To the best of our knowledge, these are the first sub-linear results for both the leader election and the agreement problem in crash-fault distributed networks.

Index Terms—Distributed algorithm, randomized algorithm, fault-tolerant algorithm, crash-fault, leader election, agreement, message complexity.

Manuscript received 27 May 2022; revised 16 December 2022; accepted 12 January 2023. Date of publication 26 January 2023; date of current version 10 February 2023. This work was supported by ISI DCSW/TAC under Grant F5484. Recommended for acceptance by R. Vaidyanathan. (Corresponding author: Manish Kumar.) The authors are with the Indian Statistical Institute, Kolkata, West Bengal 700108, India (e-mail: manishsky27@gmail.com; molla@isical.ac.in). This article has supplementary downloadable material available at https://doi.org/10.1109/TPDS.2023.3239993, provided by the authors. Digital Object Identifier 10.1109/TPDS.2023.3239993

I. INTRODUCTION

LEADER election and agreement are two fundamental problems in distributed computing which, along with their variants, have been extensively studied for the last four decades, since their introduction [1], [2], [3]. Over time, the problems gained significant interest due to their direct applications in real-world distributed systems, which are prone to be faulty. For example, the content delivery network Akamai [4] uses distributed leader election as a subroutine to achieve fault-tolerance, and the Paxos [5] agreement protocol uses leader election for consensus. The importance and widespread application of the two problems are prevalent in many domains, e.g., sensor networks [6], IoT networks [7], grid computing [8], peer-to-peer networks [9], [10], [11], [12] and cloud computing [13]. Often, in large-scale distributed networks, it is desirable to achieve low-cost and scalable leader election and agreement protocols, even with probabilistic guarantees. Furthermore, due to the rise of permissionless distributed systems [14] (where participants can join the system anonymously with virtual IDs and without any control over their number), it is also desirable to design protocols that can tolerate an arbitrary number of faulty nodes. However, despite intensive research, there is still no practical solution to fault-tolerant agreement or leader election for large-scale networks. One key reason for this is the large message complexity of currently known protocols, a concern which has been raised in many systems papers [13], [15], [16], [17], [18]; at the same time, it is desirable to have simple and lightweight protocols that can be implemented easily. In this paper, motivated by the need for a practical fault-tolerant protocol, we focus on the following fundamental question in solving fault-tolerant agreement (also known as consensus) and leader election in synchronous distributed networks: What is the minimum number of messages required to solve fault-tolerant leader election and agreement?

The leader election problem requires a group of nodes in a distributed network to elect a unique leader among themselves, such that exactly one node outputs its decision as the leader and all the other nodes output their decision as non-leader. The non-leader nodes need not be aware of the identity of the leader. This implicit variant of leader election is quite well-known [19], [20], [21], [22] and is sufficient in many applications. In the explicit version, the non-leader nodes must know the ID of the leader node. Similarly, in the implicit version of the classical (binary) agreement problem, each node starts with an input value (0 or 1), and the agreement problem requires guaranteeing the following two conditions: (1) a non-empty subset of nodes should agree (or decide) on the same value (consensus condition), and (2) the common value should be the input value of some node (validity condition). In the explicit version, all the nodes must agree on the same input value. The implicit agreement, which is a generalization of the (explicit) agreement problem, was first introduced by [23]. It is trivial that Ω(n) is a message lower bound for the explicit version of both problems.
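To make the two implicit specifications concrete, the following small checker captures them. This is our own illustrative sketch; the function names and the output encoding are not from the paper.

```python
# Hypothetical encodings: leader election outputs are "ELECTED"/"NON-ELECTED";
# agreement outputs are 0, 1, or None (undecided).

def implicit_leader_election_ok(outputs):
    """Exactly one node outputs ELECTED; every other node outputs NON-ELECTED.
    Non-leaders need not know who the leader is."""
    return (sum(1 for s in outputs if s == "ELECTED") == 1
            and all(s in ("ELECTED", "NON-ELECTED") for s in outputs))

def implicit_agreement_ok(inputs, outputs):
    """Consensus: at least one node decided and all decided nodes hold one
    common value. Validity: that value is some node's input."""
    decided = [d for d in outputs if d is not None]
    return len(decided) >= 1 and len(set(decided)) == 1 and decided[0] in inputs

assert implicit_agreement_ok([0, 1, 1], [None, 1, None])  # one decided node suffices
```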
We study the implicit version of leader election and agreement under crash-fault. Note that any solution of the implicit version of the two problems can be easily extended to solve the explicit versions in O(1) rounds. The flexibility of the implicit version of the two problems lies in the fact that the nodes need not all agree on the leader, nor need they all know the agreed value. Therefore, in principle, the agreement (either on a leader or on a value) may be fulfilled with less communication. In particular, one can expect sublinear message bounds, unlike the case of the
explicit version of the problems when all nodes need to agree, which requires at least n messages.

Our work is closely related to the state-of-the-art Gilbert-Kowalski [24], which studies the message complexity of the agreement problem in the same setting. It presents an optimal O(n) message complexity algorithm for explicit agreement that tolerates at most n/2 − 1 faulty nodes. Note that the algorithm in [24] assumes the nodes know their neighbors; if they are not known (as in our setting), the message complexity would be O(n log n). In comparison, our implicit agreement algorithm can be extended to an explicit agreement algorithm with message complexity O(n log n) when tolerating any linear fraction of faulty nodes. In fact, our algorithm is more general in the sense that its message complexity (and also its time complexity) depends on the number of faulty nodes; it thus tolerates more faulty nodes, up to n − log² n. Furthermore, our agreement algorithm works in anonymous networks, where nodes do not know their neighbors, and it is also scalable, i.e., it sends only a constant number of bits over an edge per round. We also show a matching lower bound for the agreement problem, which is optimal up to a polylog n factor.

The implicit leader election was studied in the fault-free setting (i.e., a network with no faulty nodes) by Kutten et al. [21]. They presented an optimal sublinear randomized algorithm and a tight lower bound on the message complexity (up to a polylog n factor). To the best of our knowledge, this is the first paper that studies the message complexity of (implicit) leader election in crash-fault synchronous distributed networks. We further show a non-trivial lower bound on the message complexity. The message complexity is optimal for more than a constant fraction of faulty nodes in the network (up to a polylog n factor). The O(log n) round complexity of both problems (for any constant fraction of faulty nodes) is also almost optimal due to the lower bound Ω(log n/ log log n) of [25]. Our algorithms work in anonymous networks, where each node (initially) does not know the identity of its neighbors. Our algorithms are simple, lightweight (based on sampling only), and scalable (they send only O(log n) bits through an edge per round).

A. Our Results and Implications

We present sublinear algorithms and lower bounds on the message complexity of implicit agreement and leader election in a fully-connected synchronous network. Recall that the network has at least αn non-faulty nodes among the n nodes¹ and uses the CONGEST communication model. The algorithms tolerate up to (n − log² n) faulty nodes. The algorithms and the lower bounds depend on the value of α, which essentially shows that the message complexity increases with the number of faulty nodes in the network. The main results are:

1) Fault-tolerant leader election: We present a randomized implicit leader election algorithm which elects a leader with high probability in O(log n/α) rounds and incurs O((n^{1/2} log^{5/2} n)/α^{5/2}) messages, such that the elected leader is non-faulty with probability at least α. The message complexity is sublinear for α > log n/n^{1/5}, in which case the algorithm tolerates at most (n − n^{4/5} log n) faulty nodes (a numeric illustration follows this list). For any constant fraction of faulty nodes in the network (i.e., when α is any constant), the algorithm is both message and time optimal (up to a polylog n factor). The algorithm can be extended to achieve explicit leader election in O(log n/α) rounds and O(n log n/α) messages.

2) Message lower bound for fault-tolerant leader election: We show an unconditional lower bound of Ω(n^{1/2}/α^{3/2}) messages for any leader election algorithm that succeeds with high probability and tolerates at most a (1 − α)-fraction of faulty nodes. The lower bound also holds in the LOCAL model of distributed computing.

3) Fault-tolerant agreement: We present a randomized algorithm that solves (implicit) agreement with high probability in O(log n/α) rounds and incurs O((n^{1/2} log^{3/2} n)/α^{3/2}) messages with high probability. The algorithm is message optimal (up to a polylog n factor), as we show a tight lower bound. The message complexity is sublinear for α > log n/n^{1/3}, which means the algorithm tolerates at most (n − n^{2/3} log n) faulty nodes while achieving a sublinear message bound. The algorithm can be extended to achieve explicit agreement in O(log n/α) rounds and O(n log n/α) messages.

4) Message lower bound for fault-tolerant agreement: We show an unconditional lower bound of Ω(n^{1/2}/α^{3/2}) on the message complexity of any agreement algorithm that succeeds with high probability and tolerates at most a (1 − α)-fraction of faulty nodes. The lower bound is essentially tight (up to a polylog n factor) and also holds in the LOCAL model.

¹We often use the notation αn to mean the integer ⌈αn⌉ and (1 − α)n to mean the integer ⌊(1 − α)n⌋.
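As a quick numeric illustration of the tolerance thresholds in results 1) and 3): the sketch below is our own, and since the paper leaves constants and logarithm bases unspecified, natural logarithms are used purely for illustration.

```python
import math

def sublinear_thresholds(n):
    """Alpha thresholds below which the message bounds stop being sublinear,
    and the corresponding maximum numbers of tolerable faults."""
    log_n = math.log(n)                    # base chosen for illustration only
    alpha_le = log_n / n ** (1 / 5)        # leader election: alpha > log n / n^(1/5)
    alpha_agr = log_n / n ** (1 / 3)       # agreement: alpha > log n / n^(1/3)
    faults_le = n - n ** (4 / 5) * log_n   # up to n - n^(4/5) log n faults
    faults_agr = n - n ** (2 / 3) * log_n  # up to n - n^(2/3) log n faults
    return alpha_le, alpha_agr, faults_le, faults_agr

print(sublinear_thresholds(10 ** 6))
# For n = 10^6: alpha_le ~ 0.87 but alpha_agr ~ 0.14, i.e., the agreement
# bound stays sublinear for a much larger fraction of faulty nodes.
```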
One surprising fact about our results is that for any constant fraction of faulty nodes, the Õ(n^{1/2}) message complexity (Õ hides a polylog n factor) of leader election and agreement is asymptotically the same as in the fault-free network [21], [23]. The lower bounds match as well.

Paper Organization: Section II presents the model and definitions. Section III surveys the prior work in this direction. Section IV contains the fault-tolerant leader election algorithm (Section IV-A) and the lower bound on the message complexity (Section IV-B). Section V contains the fault-tolerant agreement algorithm (Section V-A) and the lower bound on the message complexity of agreement (Section V-B). Finally, we conclude in Section VI.

II. MODEL AND DEFINITIONS

The distributed network is a fully-connected (i.e., complete) graph on n nodes. The network is anonymous, i.e., a node does not know the identity of its neighbors initially. This model is known as KT0 (knowledge till 0 hops); in the KT1 (knowledge till 1 hop) model, on the other hand, nodes know their neighbors [26]. A network is called f-resilient if there are at most f nodes which may fail by crashing. We assume that up to f ≤ (n − log² n) nodes may fail by crashing. More precisely,
at most (1 − α)n nodes can be faulty and the remaining at least αn nodes are non-faulty in the network, where α is a real constant that lies in the range [log² n/n, 1]. The algorithm runs in synchronous rounds, and nodes communicate by exchanging messages in rounds. A faulty node may crash in any round during the execution of the algorithm. If a node crashes in a round, then an arbitrary subset (possibly all) of its messages for that round may be lost (as determined by an adversary) and may not reach their destinations. The crashed node halts (remains inactive) in the further rounds of the algorithm. If a node does not crash in a round, then all the messages that it sends in that round are delivered to their destinations [19]. We assume that a static adversary controls the faulty nodes, i.e., it selects the faulty nodes before the execution starts.² However, the adversary can adaptively choose when and how a node crashes. Each node has access to an arbitrary number of private random bits.

²In contrast, an adaptive adversary can determine the nodes to be faulty during the execution of the algorithm.

We consider the CONGEST communication model [26], where a node is allowed to send a message of size O(log n) bits through an edge per round. The message complexity of an algorithm is the total number of messages sent by all the nodes throughout the execution of the algorithm. Our lower bounds hold in the LOCAL communication model, where nodes are allowed to send a message of arbitrary size through an edge per round. We assume that nodes know the values of α and n; note that a node can infer the value of n simply from its number of ports, since the network is complete.
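The crash semantics above (a crashing node's round-r messages reach an adversarially chosen subset of destinations, after which the node is silent) can be captured by a small synchronous-round simulator. This harness is our own illustration, not the paper's formalism.

```python
import random

def run_round(alive, outboxes, crash_now, rng=random):
    """One synchronous round under crash faults.

    alive: set of node ids alive at the start of the round.
    outboxes: dict node -> list of (dest, msg) pairs it sends this round.
    crash_now: nodes crashing during this round; an arbitrary subset of a
    crashing node's messages is lost (modeled randomly here, though the
    paper lets the adversary choose the subset).
    Returns (inboxes: dict dest -> list of (sender, msg), surviving nodes).
    """
    inboxes = {v: [] for v in alive}
    for u in alive:
        for dest, msg in outboxes.get(u, []):
            # a crashing node's message may or may not be delivered
            if u in crash_now and rng.random() < 0.5:
                continue
            if dest in inboxes:
                inboxes[dest].append((u, msg))
    return inboxes, alive - crash_now
```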
Let us now formally define the fault-tolerant leader election and agreement problems.

Definition 1 (Fault-Tolerant Leader Election): Consider a complete network of n nodes, in which at most f nodes could be faulty and the remaining (n − f) nodes are non-faulty. The fault-tolerant leader election problem requires electing a leader node from the set of n nodes such that the leader is non-faulty with probability (1 − f/n).³ Every node maintains a state variable that can be set to a value in {⊥, NON-ELECTED, ELECTED}, where ⊥ denotes the 'undecided' state; initially, the state is set to ⊥. Solving the fault-tolerant leader election problem requires the state of exactly one node to be ELECTED and that of all other nodes to be NON-ELECTED. This is the requirement for implicit leader election.

³Note that all the faulty nodes may survive and execute the algorithm correctly until the leader is elected, and then they may crash. Thus, a leader could be faulty with probability f/n if the adversary selects the faulty nodes uniformly at random in the beginning.

Definition 2 (Fault-Tolerant Agreement): Consider a complete network of n nodes, in which at most f nodes could be faulty and the remaining (n − f) nodes are non-faulty. Suppose that initially each node has an input value in {0, 1}. An implicit agreement holds when the final states lie in either {0, ⊥} or {1, ⊥} and at least one node has a state other than ⊥ (which should be the input value of some node), where ⊥ denotes the 'undecided' state. In other words, all the decided nodes must agree on the same value, which is an initial input value of some node, and there must be at least one decided node in the network.

III. RELATED WORK

Leader election has been extensively studied in various models and settings, starting from its introduction by Le Lann [2] in ring networks and then the seminal work of Gallager-Humblet-Spira [27] in general graphs. The problem is well studied in the complete network itself [28], [29], [30], [31], [32], [33], and references therein. Similarly, agreement is also extensively studied in the class of complete networks [3], [22], [24], [25], [34], [35], [36], [37].

In the case of agreement, Bar-Joseph and Ben-Or [38] showed upper and lower bounds of Θ(f/√(n log n)) on the expected number of rounds under an adaptive crash-fault adversary. In the case of a non-adaptive (i.e., static) adversary, Kowalski and Mirek [39] showed upper and lower bounds of Θ(n/((n − f) log(n/(n − f)))) on the expected number of rounds. While these works are devoted to the round complexity, the main focus of this paper is on message complexity. Recently, Hajiaghayi et al. [40] presented a crash-fault consensus algorithm having message complexity Õ(n^{3/2}) in a network where nodes know their neighbors' IDs and the ports connecting to them (a.k.a. the KT1 model) under an adaptive adversary.

Let us discuss some relevant works that are closely related to ours. In a fault-free setting, Kutten et al. [21] studied the (implicit) leader election problem in complete networks and showed sub-linear message bounds using randomization, which break the linear lower bound for deterministic algorithms (if we restrict to O(1)-time deterministic algorithms, Ω(n²) messages are required in a complete network). In particular, they presented an algorithm that executes in O(1) time and uses only O(n^{1/2} log^{3/2} n) messages to elect a leader in a complete network where all the nodes are non-faulty. They also showed an almost matching lower bound for randomized leader election. Later, Augustine et al. [23] introduced and studied the implicit agreement problem in fault-free settings and showed similar sub-linear bounds for agreement. Our work is inspired by the works of [21], [23], [24], but in the faulty setting.

In the crash-fault setting, our work is closely related to the work of Gilbert-Kowalski [24], which studied the message complexity of agreement. They presented an optimal O(n) message complexity (explicit) agreement protocol which tolerates at most n/2 − 1 faulty nodes in the KT1 model; the time complexity is O(log n). While their algorithm solves implicit agreement in a smaller set of nodes (of size O(log n)) in the course of the explicit agreement, they didn't mention the implicit version of the agreement. Their implicit agreement solution uses Õ(n^{1/2}) messages (Õ hides a polylog n factor) and O(log n) rounds. In comparison, our agreement algorithm matches all these bounds while tolerating more faulty nodes.

The work [36] presented an (explicit) agreement algorithm tolerating a linear fraction of faulty nodes. The message complexity and the time complexity of their algorithm are O(n log n) and O(log n), respectively, but only in expectation. Recall that our result achieves the same bounds with high probability and with higher resilience. The work [41] presented an agreement algorithm tolerating an arbitrary number of faults in the asynchronous and adaptive crash-fault model, but their algorithm uses at least a quadratic number of messages (in expectation) when the number of faulty nodes is linear or more.
TABLE I
COMPARISON WITH THE BEST KNOWN AGREEMENT PROTOCOLS IN THE SAME MODEL. HERE, c IS ANY CONSTANT AND log² n/n ≤ α ≤ 1. *THE BOUND HOLDS IN EXPECTATION. $A RECENT BRIEF-ANNOUNCEMENT PAPER [42] IMPROVED THE MESSAGE COMPLEXITY TO O(n + f log f), BUT WITHOUT FORMAL PROOFS
Our agreement algorithm also gives better bounds compared to the deterministic algorithms presented in [35], [37], [42], in which the round complexity is O(f) and the message complexity is Ω̃(n). Table I presents the state-of-the-art in the same setting.

The leader election is a less studied problem in the faulty setting. In the fault-free setting, Gilbert et al. [43] provided an algorithm that solves implicit leader election in a general network with O(√(n log⁷ n · t_mix)) messages and in O(t_mix log² n) rounds, where t_mix is the mixing time of the graph. Kowalski and Mosteiro [44] elect a unique leader using Õ(√(n · t_mix/Φ)) messages with high probability in the CONGEST model, where Φ is the graph conductance.

We study the message complexity of the fault-tolerant leader election problem and show a non-trivial bound. The message complexity is optimal for any constant fraction of faulty nodes in the network (up to a polylog n factor). The round complexity is almost optimal due to the lower bound Ω(log n/ log log n) of [25].

In the context of Byzantine failure (where faulty nodes can behave maliciously), many excellent works have studied the message complexity of leader election and agreement; a few of them are [45], [46], [47], [48], [49], [50], [51], [52].

IV. FAULT-TOLERANT LEADER ELECTION

In this section, we first present a randomized algorithm that elects a leader with high probability in a complete n-node network with at least αn non-faulty nodes, for a real value of α in [log² n/n, 1]. Our goal is to minimize the message complexity of the algorithm while keeping the run time as small as possible. Then we show a non-trivial lower bound on the message complexity of any leader election algorithm.

A. Algorithm

The challenging part of designing an efficient algorithm is handling the faulty nodes. A faulty node may crash in some round and its message may not reach all the destination nodes in that round. Thus, there might be two sets of nodes with different states or views in the network, which may lead to an incorrect leader election. Let us describe the algorithm below.

Recall that we consider anonymous networks, i.e., nodes do not know each other (or, nodes do not have IDs), but know the values n and α. Initially, each node randomly picks an integer in the range [1, n⁴] (called its rank, which is also used as the ID of the node).⁴ The idea is to select a small committee of nodes (called candidate nodes) which are responsible for electing a leader among themselves. Among the candidate nodes, the node which proposes itself as a leader without crashing becomes the leader. A random set C of candidate nodes of size Θ(log n/α) is selected; for this, each node selects itself with probability O(log n/(αn)) to become a candidate node. The reason behind selecting so many candidate nodes is to make sure that C contains at least one non-faulty node. This selection is done locally at each node, and hence the candidate nodes do not know each other initially. Since knowing each other is message expensive, the candidate nodes communicate among themselves via some other nodes. For this, each candidate node randomly samples Θ(((n log n)/α)^{1/2}) nodes among all the n nodes; we call them referee nodes, denoted by R. The reason behind sampling so many referee nodes is to ensure at least one common "non-faulty" referee node between any pair of candidate nodes' referee sets. The candidate nodes communicate with each other via the referee nodes. Notice that a node may be sampled as a referee node by multiple candidate nodes.

⁴The range is taken in such a way that all the n ranks are distinct w.h.p., which can be shown easily using a standard Chernoff bound [53].

The idea is to make the minimum-ID node among the candidate nodes the leader. That is why all the candidate nodes send their rank (which is also their ID) to the other candidate nodes. However, the minimum-ID node may crash during the broadcast and its ID may not reach all the candidate nodes. Then the algorithm tries to select the second minimum node as the leader, but it too may crash, and its ID may not be available to all the candidate nodes; then the third minimum node, and so on. This continues until a node ID is guaranteed to be available to all the candidate nodes so that they can agree on that node as the leader. Note that our algorithm promises that a crashed node is never elected as a leader (but it may crash after the election).
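The two local sampling steps (becoming a candidate, then choosing referees) need no coordination. A minimal sketch follows, with the constants 6 and 2 taken from Lemmas 1 and 3 below; everything else is our own naming.

```python
import math
import random

def local_setup(n, alpha, rng=random):
    """Each node runs this locally; no communication is needed.

    Returns (is_candidate, rank, referees); referees are sampled only
    if the node became a candidate.
    """
    rank = rng.randrange(1, n ** 4 + 1)          # rank doubles as the node's ID
    p = min(1.0, 6 * math.log(n) / (alpha * n))  # candidate prob. Theta(log n/(alpha n))
    is_candidate = rng.random() < p
    referees = []
    if is_candidate:
        k = math.ceil(2 * math.sqrt(n * math.log(n) / alpha))
        referees = rng.sample(range(n), min(k, n))  # Theta(sqrt(n log n/alpha)) referees
    return is_candidate, rank, referees
```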
Pre-Processing: First, each candidate node u sends its own rank, i.e., ID_u, to its referee nodes, say R_u. A referee node may receive ranks from multiple candidate nodes. Each referee node w forwards the list of received ranks to its respective candidate nodes, say C_w. Thus, each candidate node has a list of some other candidate nodes' ranks; call this list rankList.⁵ While a candidate node knows the rank of another candidate node, this knowledge doesn't help to know the port (or the edge) connecting to that node. Now, the candidate nodes communicate via the referee nodes to agree on a leader from the rankList. This is done by performing the following steps iteratively.

⁵A candidate may not receive all other candidate nodes' ranks due to faulty candidate nodes in C.

Step 1: Each candidate node u proposes the minimum rank from its rankList as a potential leader. The proposed rank of u, say p_u, is sent to all the candidate nodes via the referee nodes: each candidate node u sends ⟨ID_u, p_u⟩ to its referee nodes R_u. For a node u, if p_u = ID_u, i.e., u's rank is the minimum in its rankList, then u marks itself as the leader. During the iteration, if a candidate node receives a higher rank, it updates its rankList by removing all the ranks smaller than the received rank. A node proposes a rank from its rankList only once and does not propose the same rank in later rounds.

Step 2: Then each referee node w sends the maximum proposed rank among the ranks that it received from its candidate nodes C_w. Let p_w^max denote the maximum rank received by a referee node w. The referee node also sends the ID of a node u if u proposed its own rank (as the maximum rank p_w^max); otherwise, it sends a null value. More precisely, each referee node sends ⟨ID_u, p_w^max⟩ if p_w^max is proposed by ID_u and p_w^max = ID_u, and otherwise sends ⟨⊥, p_w^max⟩, where ⊥ denotes a null value.

Step 3: Each candidate node u considers the maximum received rank, say p̃_u^max, from its referee nodes, i.e., p̃_u^max = max{p_w^max : w ∈ R_u}. If ID_u = p̃_u^max and u was not marked as the leader, then u sends ⟨ID_u, p̃_u^max⟩ to its referee nodes R_u in the next round and marks itself as the leader. If ID_u = p̃_u^max and u was already marked as the leader, then u doesn't respond in the next round. Otherwise, if ID_u ≠ p̃_u^max, then u decides the following. If u receives ⟨ID_v, p̃_u^max⟩ from its referee nodes, i.e., p̃_u^max is proposed by some other candidate node v as its own rank, then u sends ⟨ID_u, p̃_u^max⟩ and considers v as the leader until any further updates. The node v might crash in Step 1; in that case, any non-faulty nodes which did not receive v's rank may propose some higher rank in the next iteration. Thus, u might receive some higher rank later and accordingly update its rankList (i.e., remove the smaller ranks from rankList_u). If u receives ⟨⊥, p̃_u^max⟩ and p̃_u^max ∈ rankList_u, then u sends ⟨ID_u, p̃_u^max⟩ in the next round; otherwise (if p̃_u^max ∉ rankList_u), u proposes a rank higher than p̃_u^max from its rankList_u in the next round. Whenever u sends or proposes a higher rank, it updates its rankList_u by removing the smaller ranks from it. If a node proposes itself as the potential leader successfully (i.e., without crashing), then the other nodes also agree on that node as the leader.

Step 4: If a candidate node u neither marks itself as the leader nor proposes another node as the leader in Steps 1 and 3, and does not receive any update in the next 4 rounds, then u sends the next minimum rank from its rankList_u to R_u. This case may arise when u proposed a minimum rank but did not receive, within the next 4 rounds, a proposal from that particular node or from any node with a higher rank; it might be the case that the node whose rank u proposed has crashed, which u can confirm after 4 rounds. Then u proposes the next minimum rank available in its rankList.

The above four steps are performed for O(log n/α) iterations and then the algorithm terminates. In the end, all the non-faulty candidate nodes have the same minimum rank in their rankList, which they agree to be the leader. Intuitively, there are O(log n/α) candidate nodes and a single node may crash in each iteration (say, the minimum-ID node crashes among the remaining candidate nodes). Since the set of candidate nodes contains at least one non-faulty node, the algorithm computes a unique leader after O(log n/α) rounds (each iteration takes at most 4 rounds). Notice that if a candidate node u has not crashed in Step 1 and proposes its ID as the minimum rank, then u is going to be the leader, since it successfully sends its rank to all the candidate nodes. It might crash in the next round, i.e., in Step 2 or later, but then all the non-faulty nodes are unaware of u's crash and u remains the unique leader in the network. A complete pseudocode is included in the supplemental material, which can be found on the Computer Society Digital Library.⁶

⁶[Online]. Available: http://doi.ieeecomputersociety.org/10.1109/TPDS.2023.3239993
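Steps 1-3 are easiest to see in a centrally simulated form. The sketch below collapses one iteration of the propose/relay/update exchange; it is our own simplification, ignoring crashes, leader marking, Step 4's retries, and the 4-round pacing.

```python
def one_iteration(candidates, referees, rank_list):
    """candidates: dict candidate id -> set of its referee ids.
    referees: dict referee id -> set of candidate ids it serves.
    rank_list: dict candidate id -> sorted list of known candidate ranks.
    Returns the proposals made this iteration."""
    # Step 1: every candidate proposes the minimum rank it knows.
    proposal = {u: rank_list[u][0] for u in candidates}
    # Step 2: each referee relays the maximum proposal it received.
    relay = {}
    for w, served in referees.items():
        received = [proposal[u] for u in served if u in proposal]
        if received:
            relay[w] = max(received)
    # Step 3: each candidate adopts the maximum relayed rank and
    # discards all smaller ranks from its rank list.
    for u, refs in candidates.items():
        seen = [relay[w] for w in refs if w in relay]
        if seen:
            p = max(seen)
            rank_list[u] = [r for r in rank_list[u] if r >= p]
    return proposal
```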
Let us now show a few important lemmas that support the correctness of the algorithm. In particular, Lemma 2 shows that the set of candidate nodes contains at least one non-faulty node, and Lemma 3 shows that between any pair of candidate nodes there is a common non-faulty referee node. This ensures that a non-faulty candidate node can successfully send messages to the other candidate nodes via the common non-faulty referee node.

Lemma 1: Consider an n-node network with at least αn non-faulty nodes, where α is a real number in the range [log² n/n, 1]. If each node selects itself with probability 6 log n/(αn) to become a candidate node, then with high probability, i.e., with probability ≥ (1 − 1/n), the number of selected candidate nodes is |C| = Θ(log n/α).

Proof: Let the random variable X denote the number of selected candidate nodes. Since each node selects itself independently with probability 6 log n/(αn), the expected value of X is E[X] = 6 log n/α. Thus, using the Chernoff bound [53], Pr[X ≥ (1 + δ)E[X]] ≤ e^{−δ²E[X]/3} with δ = 1, we get

Pr[X ≥ 12 log n/α] ≤ e^{−6 log n/(3α)} = e^{−2 log n/α} = 1/n^{2/α}.

Again using the Chernoff bound, Pr[X ≤ (1 − δ)E[X]] ≤ e^{−δ²E[X]/2} with δ = 2/3, we get

Pr[X ≤ 2 log n/α] ≤ e^{−24 log n/(18α)} < e^{−log n/α} = 1/n^{1/α}.

Thus, the size of the candidate set satisfies 2 log n/α ≤ |C| ≤ 12 log n/α with high probability for any value of α ≤ 1. □
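As a quick sanity check of Lemma 1's concentration, the Monte Carlo snippet below estimates how often |C| lands in the stated window; the parameters are ours and purely illustrative.

```python
import math, random

n, alpha, trials = 100_000, 0.5, 200       # illustrative parameters
p = 6 * math.log(n) / (alpha * n)          # candidate probability from Lemma 1
lo, hi = 2 * math.log(n) / alpha, 12 * math.log(n) / alpha

inside = sum(
    lo <= sum(random.random() < p for _ in range(n)) <= hi
    for _ in range(trials)
)
print(inside / trials)  # empirically ~1.0, matching the w.h.p. guarantee
```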
Lemma 2: The set of candidate nodes C contains at least one non-faulty node with high probability.

Proof: Since the nodes in C are selected independently and uniformly from the n nodes and there are at least αn non-faulty nodes, the probability that all the nodes in C are faulty is at most (1 − (αn)/n)^{|C|} ≤ e^{−|C|α} ≤ e^{−2 log n} = 1/n². That is, the probability that C has at least one non-faulty node is at least 1 − 1/n². Thus, with high probability C contains at least one non-faulty node. □

Lemma 3: Any pair of candidate nodes have at least one common non-faulty referee node with high probability.

Proof: First, it follows from Lemma 2 that any R_u contains at least one non-faulty node, since |R_u| > |C|. Let us consider two candidate nodes, namely u and v. We show that there is at least one common non-faulty referee node for u and v, i.e., R_u ∩ R_v contains at least one non-faulty node with high probability. Recall that each candidate node samples 2(n log n/α)^{1/2} referee nodes independently and uniformly among the n nodes. As there are at least αn non-faulty nodes, the probability of sampling any non-faulty node as a referee node is (αn/n) · (2(n log n/α)^{1/2}/n) = 2(α log n/n)^{1/2}. Thus, the probability that the candidate node u does not select a particular non-faulty node as a referee node is (1 − 2(α log n/n)^{1/2}); the same holds for v. Now the candidate node v samples 2(n log n/α)^{1/2} referee nodes, so the probability of not selecting a non-faulty referee node from the sampled nodes of u is

(1 − 2(n log n/α)^{1/2}/(n/α))^{2(n log n/α)^{1/2}} ≤ e^{−4 log n} = 1/n⁴.

Therefore, the probability of selecting at least one non-faulty referee node from the sampled nodes of u is at least 1 − 1/n⁴. Taking the union bound over all the pairs, the claim holds for any pair of candidate nodes. □

Finally, we show the round and message complexity of our algorithm in the main result of this section.

Theorem 4.1: Consider a complete network of n nodes with at least αn non-faulty nodes and the CONGEST communication model. Then there is a fault-tolerant leader election algorithm which elects a leader in O(log n/α) rounds and incurs O((n^{1/2} log^{5/2} n)/α^{5/2}) messages with high probability, such that the elected leader is non-faulty with probability at least α, where α is a real number in [log² n/n, 1].

Proof: The algorithm runs for O(log n/α) iterations, and each iteration takes at most 4 rounds. There are some pre-processing steps before the iterations, which take O(log n/α) rounds (since the referee nodes may need to send O(log n/α) ranks to their candidate nodes in parallel). So the time complexity of our algorithm is O(log n/α) rounds.

The size of the candidate set is O(log n/α). Each candidate node samples O(((n log n)/α)^{1/2}) referee nodes and sends its rank to them. Each referee node sends O(log n/α) ranks to its respective candidate nodes. So the message complexity is O((n^{1/2} log^{5/2} n)/α^{5/2}) before the iterations start. During the iterations, the candidate nodes and the referee nodes communicate with each other for O(log n/α) rounds, incurring O((n^{1/2} log^{5/2} n)/α^{5/2}) messages. Therefore, the total message complexity of the algorithm is O((n^{1/2} log^{5/2} n)/α^{5/2}).

Finally, we show the correctness of the algorithm: first that all the non-faulty candidate nodes agree on a leader, and then that the leader is unique. Suppose a non-faulty candidate node, say u, doesn't have any knowledge about the leader. Then u proposes the minimum rank from its rankList_u as a potential leader (see Step 1). If u receives a higher rank from any other candidate node, say w, then u updates its leader accordingly (see Step 3). Therefore, all the candidate nodes must agree on a leader when the algorithm terminates. Now we show that the leader is unique. If not, then assume that there exist two candidate nodes u and v such that u agrees on a leader ℓ₁ and v agrees on a leader ℓ₂ at the end of the algorithm. Without loss of generality, assume that ℓ₁ > ℓ₂. This means that the rank ℓ₁ proposed by u did not reach v, which implies that u crashed during the iteration, contradicting the fact that u agrees on a leader at the end of the algorithm. Therefore, a crashed node is never elected as a leader (but it may crash later). Since at least αn nodes are non-faulty in the network, the probability that the elected leader is non-faulty is at least α. □
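The message bound in Theorem 4.1 is just the product of the three quantities established above; spelled out (our own restatement of the proof's accounting):

```latex
\underbrace{O\!\left(\frac{\log n}{\alpha}\right)}_{\text{candidates } |C|}
\cdot
\underbrace{O\!\left(\Big(\frac{n\log n}{\alpha}\Big)^{1/2}\right)}_{\text{referees per candidate}}
\cdot
\underbrace{O\!\left(\frac{\log n}{\alpha}\right)}_{\text{rounds}}
\;=\;
O\!\left(\frac{n^{1/2}\log^{5/2} n}{\alpha^{5/2}}\right).
```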
For any constant value of α, our algorithm solves the fault-tolerant leader election problem in O(log n) rounds and O(n^{1/2} log^{5/2} n) messages. Thus, we get the following corollary immediately.

Corollary 1: Consider a complete network of n nodes and the CONGEST communication model. There is a leader election algorithm which elects a leader in O(log n) rounds and incurs O(n^{1/2} log^{5/2} n) messages with high probability when tolerating at most n − n/c faulty nodes for any constant c ≥ 1. The elected leader is non-faulty with probability at least 1/c.

Remark 1: The above bounds are in terms of the number of messages. Since the size of a message can be at most O(log n) bits (recall that this is the CONGEST model), the message complexity may increase by an O(log n) factor when counted in bits.

B. Lower Bound on the Message Complexity

We show a lower bound on the number of messages required to solve fault-tolerant leader election. In particular, we show that any algorithm that solves leader election with probability at least 2/e + ε, for any constant ε > 0, and tolerates (1 − α)n faulty nodes requires sending Ω(n^{1/2}/α^{3/2}) messages.

Our model assumes that all nodes execute the same algorithm and have access to an unbiased private coin (through which they can generate random bits locally). We assume for now that nodes are anonymous, i.e., nodes do not have IDs. Later we show by a simple reduction that the lower bound still holds even if the nodes start with unique IDs that are not known to the other nodes. The lower bound is unconditional on the running time and holds even in the LOCAL model of distributed computing⁷ (which means that it also holds for the CONGEST model).

⁷In the LOCAL model, a node can send a message of arbitrary size through an edge per round.
A Brief Overview of the Proof: The basic proof idea is adapted from the paper [21], where a message lower bound of Ω(n^{1/2}) was shown for any leader election algorithm in a "non-faulty" network. While this Ω(n^{1/2}) bound also holds in a faulty network, we strengthen the lower bound by an α^{3/2} factor in the denominator when there are at most (1 − α)n faulty nodes in the network; thus, our lower bound proof contains more technical challenges. For the sake of completeness, we include some results and arguments from [21], [23]. The high-level idea is the following: if an algorithm does not send enough messages, i.e., communication among the nodes is limited, then the algorithm might err by electing multiple leaders, since there might be two or more groups of nodes without any communication among them, and each group has an equal chance to elect a leader (as we assume that the nodes are anonymous). More precisely, suppose there is an algorithm A which solves leader election in this faulty network with o(n^{1/2}/α^{3/2}) messages. Then we show a contradiction. For this, we consider a communication graph, which is essentially formed by the communication pattern of the nodes during a run of A (with an edge between two nodes if they exchange a message). Then we show that there are at least two disjoint components in the communication graph and that each component has an equal probability to elect a leader in it, which leads to a contradiction. Thus, any algorithm that solves the leader election problem in this faulty network is required to send Ω(n^{1/2}/α^{3/2}) messages. In fact, we show the following result.

Theorem 4.2: Consider any algorithm A that solves leader election and sends at most f(n) messages (of arbitrary size) with high probability on a complete anonymous network of n nodes with at least αn non-faulty nodes. If A solves leader election with probability at least 2/e + ε, for any constant ε > 0, then f(n) ∈ Ω(n^{1/2}/α^{3/2}).

Notice that the message lower bound is inversely proportional to the fraction of non-faulty nodes. This is intuitive: when the number of non-faulty nodes decreases (i.e., the number of faulty nodes increases), more resources (here, messages) must be spent to mitigate the influence of the faulty nodes, and hence the message complexity increases. The remainder of the section concentrates on the proof of Theorem 4.2.

Assume, by contradiction, that there exists an algorithm A which solves the fault-tolerant leader election problem with probability at least 2/e + ε and sends only f(n) ∈ o(n^{1/2}/α^{3/2}) messages. We show a contradiction. Consider a complete network where, for every node, the edges are randomly connected to the ports. In other words, an adversary chooses the connections of a node's (n − 1) ports as a random permutation over {1, 2, ..., n − 1}.

We adapt the following definitions from [21]. Consider one (arbitrary but fixed) execution of the algorithm A. Define the communication graph C^r to be a directed graph on the given set of n nodes, where there is an edge from u to v if and only if u sends a message to v in some round r′ ≤ r. For any node u, denote the state of u in round r by σ_r(u), and let Σ be the set of all node states possible in the algorithm A. We say that node u influences node w by round r if there is a directed path from u to w in C^r. A node u is called an initiator if it is not influenced before sending its first message; that is, if u sends its first message in round r, then u has an outgoing edge in C^r and is an isolated vertex in C^1, ..., C^{r−1}. For every initiator u, we define the influence cloud IC_u^r as the pair IC_u^r = (C_u^r, S_u^r), where C_u^r = ⟨u, w₁, ..., w_k⟩ is the ordered set of all nodes that are influenced by u, namely, that are reachable along a directed path in C^r from u, ordered by the time at which they joined the cloud (breaking ties arbitrarily), and S_u^r = ⟨σ_r(u), σ_r(w₁), ..., σ_r(w_k)⟩ is their configuration after round r, namely, their current tuple of states. We may sometimes abuse notation by referring to the ordered node set C_u^r as the influence cloud of u. Note that a passive (non-initiator) node v does not send any messages before receiving its first message from some other node.
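Influence clouds are simply reachability sets in the evolving communication graph. The small sketch below is our own encoding of C^r as a set of directed edges, with hypothetical per-node round maps for the initiator test.

```python
def influence_cloud(edges, u):
    """Nodes reachable from initiator u in the communication graph C^r.
    edges: set of directed (sender, receiver) pairs from rounds <= r."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    cloud, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in cloud:
                cloud.add(y)
                stack.append(y)
    return cloud

def is_initiator(u, first_send_round, first_influenced_round):
    """u is an initiator if it is not influenced strictly before its first
    send (hypothetical maps node -> round; None means 'never')."""
    s = first_send_round.get(u)
    t = first_influenced_round.get(u)
    return s is not None and (t is None or s <= t)
```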
There must be enough initiator nodes; otherwise, if all the initiator nodes crash, the algorithm may not succeed with high probability. In fact, we show that in a network with at least αn non-faulty nodes, any leader election algorithm which succeeds with at least a constant probability requires at least 1/2α initiator nodes.

Lemma 4: Any leader election algorithm in this crash-fault model requires at least 1/2α initiator nodes (for α < 1/24) to guarantee at least a constant success probability.

Proof: It must be shown that there is at least one non-faulty initiator node; otherwise, if all the initiator nodes are faulty, then they may crash at the beginning of the algorithm (in the worst case). Without loss of generality, assume that the initiator nodes are chosen uniformly at random. Let E be the event that there is at least one non-faulty initiator node among the 1/2α nodes chosen uniformly at random. Then,

Pr[E] = 1 − (1 − αn/n)^{1/2α} = 1 − (1 − α)^{1/2α} ≥ 1 − e^{−1/2}.

Thus, the algorithm needs at least 1/2α initiator nodes so that at least one of them is non-faulty with probability at least 1 − e^{−1/2}, a constant. Hence the lemma. □

Since we assume that A sends o(n^{1/2}/α^{3/2}) messages, a finite number, there is some round ρ by which no more messages are sent. Let C_{u₁}^ρ, C_{u₂}^ρ, ..., C_{uᵢ}^ρ (i < 1/2α) be the at most 1/2α influence clouds corresponding to the initiator nodes at the end of the algorithm. Recall that some influence clouds may merge during the execution. Therefore, the size of the smallest influence cloud, say C_{u*}^ρ, is at most o((n^{1/2}/α^{3/2})/(1/2α)), which is o(n^{1/2}/α^{1/2}).

In general, it is possible that in a given execution two influence clouds C_{u₁}^r and C_{u₂}^r intersect each other over some common node v, if v happens to be influenced by both u₁ and u₂. The following lemma shows that the low message complexity of the algorithm A yields a good probability for the smallest influence cloud C_{u*}^ρ to be disjoint from all other influence clouds.

Let N be the event that there is no intersection between (the nodes of) all the influence clouds and the smallest influence
cloud, i.e., (∪_{i:i≠*} C_{uᵢ}^ρ) ∩ C_{u*}^ρ = ∅. Let M be the event that the algorithm A sends no more than f(n) messages.

Lemma 5: Assume that Pr[M] ≥ (1 − 1/n). If f(n) ∈ o(n^{1/2}/α^{3/2}), then Pr[N ∧ M] ≥ 1 − 1/n − 12α³f²(n)/n + 12α³f²(n)/n² ∈ 1 − o(1).

Proof: Consider a round r, the smallest influence cloud C^r at round r, and a node v ∈ C^r. Without loss of generality, assume that the smallest influence cloud is unique; otherwise, fix one arbitrarily. Assuming the event M, there are at most f(n) nodes that have sent or received a message from any influence cloud and may thus be a part of some cloud other than C^r. Recall that the port numbering of every node was chosen uniformly at random, and we conditioned on the occurrence of the event M. The initiator node of the smallest influence cloud may know the destinations of at most 2αf(n) of its ports in any round; out of these, 2α²f(n) ports are connected to non-faulty nodes in expectation. Then it is easy to show using a Chernoff bound (see Theorem 4.4 in [53]) that at most 12α²f(n) ports are connected to non-faulty nodes with high probability. Similarly, there are at least αf(n)/2 non-faulty nodes among the f(n) nodes with high probability (using the Chernoff bound in Theorem 4.5 of [53]). Therefore, to send a message from the smallest influence cloud to another cloud, v must hit upon one of the ports connected to a non-faulty node leading to another cloud. Let H be the event that a message sent by a node v in round r reaches a node u that is already part of some other (non-singleton) cloud. (Recall that if u is in a singleton cloud due to not having received or sent any messages yet, it simply becomes a member of v's cloud; moreover, the smallest influence cloud does not send a message to the singleton cloud of an initiator node, as otherwise it would not be the smallest influence cloud.) Therefore, the number of non-faulty nodes in the influence clouds other than the nodes in the smallest influence cloud is αf(n)/2 − 12α²f(n), which is α(1 − 24α)f(n)/2. In consequence, the probability that the smallest influence cloud collides with any of the (1 − 2α)/2α (i.e., 1/2α − 1) other clouds is

Pr[H|M] ≤ 2 Σ_{i=1}^{α(1−24α)f(n)/2} 12α²f(n) · (1/n) ≤ 12α³f²(n)(1 − 24α)/n < 12α³f²(n)/n,

which is o(1) as f(n) ∈ o(n^{1/2}/α^{3/2}).

In the above inequality, 12α²f(n) is the number of ports connected to non-faulty nodes (w.h.p.) in the smallest influence cloud and 1/n is the probability of selecting a particular node by another good node. Further, there are α(1 − 24α)f(n)/2 non-faulty nodes w.h.p. which can collide with the smallest influence cloud throughout the execution of the algorithm A. Moreover, a node in the smallest influence cloud might receive messages from other influence clouds for the collision, and vice versa; consequently, we have a factor of 2 for both directions, whether a message is received or sent by the cloud.

Observe that Pr[N|M] = 1 − Pr[H|M] and Pr[N ∧ M] = Pr[N|M] · Pr[M]. It follows that Pr[N ∧ M] ≥ (1 − 12α³f²(n)/n)(1 − 1/n) ≥ 1 − 1/n − 12α³f²(n)/n + 12α³f²(n)/n² ∈ 1 − o(1), as required. □

We next consider potential cloud configurations, namely Z = ⟨σ₀, σ₁, ..., σ_{1/2α}⟩, where σᵢ ∈ Σ for every i, and, more generally, potential cloud configuration sequences Z̄^r = ⟨Z¹, ..., Z^r⟩, where each Zⁱ is a potential cloud configuration which may potentially occur as the configuration tuple of some influence cloud in a round i of some execution of the algorithm A. We study the occurrence probability of potential cloud configuration sequences.

We say that the potential cloud configuration Z = ⟨σ₀, σ₁, ..., σ_{1/2α}⟩ is realized by the initiator u in round r if the influence cloud IC_u^r = (C_u^r, S_u^r) has the same node states in S_u^r as those of Z; more formally, S_u^r = ⟨σ_r(u), σ_r(w₁), ..., σ_r(w_{1/2α})⟩ such that σ_r(u) = σ₀ and σ_r(wᵢ) = σᵢ for every i ∈ [1, 1/2α]. In this case, the influence cloud IC_u^r is referred to as a realization of the potential cloud configuration Z. (Note that a potential cloud configuration may have many different realizations.)

More generally, we say that the potential cloud configuration sequence Z̄^r = (Z¹, ..., Z^r) is realized by the initiator u if for every round i = 1, ..., r the influence cloud IC_u^i is a realization of the potential cloud configuration Zⁱ. In this case, the sequence of influence clouds of u up to round r, ĪC_u^r = ⟨IC_u^1, ..., IC_u^r⟩, is referred to as a realization of Z̄^r. (Recall that a potential cloud configuration sequence may have many different realizations.)

For a potential cloud configuration Z, let E_u^r(Z) be the event that Z is realized by the initiator u in (round r of) the run of algorithm A. For a potential cloud configuration sequence Z̄^r, let E_u(Z̄^r) denote the event that Z̄^r is realized by the initiator u in (the first r rounds of) the run of algorithm A.

Lemma 6: Every initiator node has the same probability of being part of the smallest influence cloud.

Proof: Let a₁, a₂, ..., a_{1/2α} be the initiator nodes. Let Eᵢ and Eⱼ be the events that the initiator nodes aᵢ and aⱼ, respectively, are part of the smallest influence cloud, where i ≠ j. Recall that each node has the same probability of being an initiator node; also, each initiator node has the same probability of being a non-faulty node. Suppose, without loss of generality, that Pr[Eᵢ] < Pr[Eⱼ]. This means that node aⱼ has a higher probability of being in the smallest influence cloud than node aᵢ. Thus, in this way, it would be possible to estimate the size of the influence clouds in a biased way, which contradicts the uniformity of the initiator nodes. Therefore, Pr[Eᵢ] = Pr[Eⱼ]. □

Lemma 7: Restrict attention to executions of algorithm A that satisfy event N, namely, in which any influence cloud is disjoint from the other influence clouds. Then Pr[E_u(Z̄^r)] = Pr[E_v(Z̄^r)] for every r ∈ [1, ρ], every potential cloud configuration sequence Z̄^r, and every two initiators u and v.

Proof: The proof is by induction on r. Initially, in round 1, all possible influence clouds of the algorithm A are singletons, i.e., their node sets contain just the initiator. Neither u nor v has received any messages from other nodes. This means that Pr[σ₁(u) = s] = Pr[σ₁(v) = s] for all s ∈ Σ; thus any potential cloud configuration Z¹ = ⟨s⟩ has the same probability of occurring for any initiator, implying the claim.
Assuming that the result holds for round r − 1 ≥ 1, we show that it still holds for round r. Consider a potential cloud configuration sequence Z̄^r = (Z¹, ..., Z^r) and two initiators u and v. We need to show that Z̄^r is equally likely to be realized by u and v, conditioned on the event N. By the inductive hypothesis, the prefix Z̄^{r−1} = (Z¹, ..., Z^{r−1}) satisfies the claim. Hence, it suffices to prove the following. Let p_u be the probability of the event E_u^r(Z^r) conditioned on the event N ∧ E_u(Z̄^{r−1}), and define the probability p_v similarly for v. Then it remains to prove that p_u = p_v.

To do that, we need to show, for any state σ′_j ∈ Z^r, that the probability that w_{u,j}, the jth node in IC_u^r, is in state σ′_j, conditioned on the event N ∧ E_u(Z̄^{r−1}), is the same as the probability that w_{v,j}, the jth node in IC_v^r, is in state σ′_j, conditioned on the event N ∧ E_v(Z̄^{r−1}).

There are two cases to be considered. The first is that the potential influence cloud Z^{r−1} has j or more states. Then, by our assumption that the events E_u(Z̄^{r−1}) and E_v(Z̄^{r−1}) hold, the nodes w_{u,j} and w_{v,j} were already in u's and v's influence clouds, respectively, at the end of round r − 1. The node w_{u,j} changes its state from its previous state σ_j to σ′_j in round r as the result of receiving some messages M₁, ..., M_ℓ from neighbors x₁^u, ..., x_ℓ^u in u's influence cloud IC_u^{r−1}, respectively. In turn, the node x_j^u sends a message M_j to w_{u,j} in round r as the result of being in a certain state σ_r(x_j^u) at the beginning of round r (or equivalently, at the end of round r − 1) and making a certain random choice (with a certain probability q_j of sending M_j to w_{u,j}). But if one assumes that the event E_v(Z̄^{r−1}) holds, namely, that Z̄^{r−1} is realized by the initiator v, then the respective nodes x₁^v, ..., x_ℓ^v in v's influence cloud IC_v^{r−1} will be in the same respective states (σ_r(x_j^v) = σ_r(x_j^u) for every j) at the end of round r − 1, and therefore will send the messages M₁, ..., M_ℓ to the node w_{v,j} with the same probabilities q_j. Also, at the end of round r − 1, the node w_{v,j} is in the same state σ_j as w_{u,j} (assuming event E_v(Z̄^{r−1})). It follows that the node w_{v,j} changes its state to σ′_j in round r with the same probability as the node w_{u,j}.

The second case to be considered is when the potential influence cloud Z^{r−1} has fewer than j states. This means (conditioned on the events E_u(Z̄^{r−1}) and E_v(Z̄^{r−1}), respectively) that the nodes w_{u,j} and w_{v,j} were not in the respective influence clouds at the end of round r − 1; rather, they were both passive nodes. By an argument similar to that made for round 1, any pair of (so far) passive nodes have equal probability of being in any state. Hence, Pr[σ_{r−1}(w_{u,j}) = s] = Pr[σ_{r−1}(w_{v,j}) = s] for all s ∈ Σ. As in the former case, the node w_{u,j} changes its state from its previous state σ_j to σ′_j in round r as the result of receiving some messages M₁, ..., M_ℓ from neighbors x₁^u, ..., x_ℓ^u that are already in u's influence cloud IC_u^{r−1}, respectively. By a similar analysis, it follows that the node w_{v,j} changes its state to σ′_j in round r with the same probability as the node w_{u,j}. □

We now conclude that for any potential cloud configuration Z and any two initiators u and v, the events E_u^ρ(Z) and E_v^ρ(Z) are equally likely. More specifically, we say that the potential cloud configuration Z is equiv-probable for initiators u and v if Pr[E_u^ρ(Z)|N] = Pr[E_v^ρ(Z)|N]. Although a potential cloud configuration Z may be the end-cloud of many different potential cloud configuration sequences, and each such potential cloud configuration sequence may have many different realizations, the above lemma implies the following (integrating over all possible choices).

Corollary 2: Restrict attention to executions of algorithm A that satisfy event N, namely, in which the smallest influence cloud is disjoint from the other influence clouds. Consider two initiators u and v and a potential cloud configuration Z. Then Z is equiv-probable for u and v.

By assumption, the algorithm A succeeds with probability at least 2/e + ε, for some fixed constant ε > 0. Let S be the event that A elects exactly one leader. We have

2/e + ε ≤ Pr[S] = Pr[S | M ∧ N] Pr[M ∧ N] + Pr[S | not(M ∧ N)] Pr[not(M ∧ N)] ≤ Pr[S | M ∧ N] Pr[M ∧ N] + Pr[not(M ∧ N)].

By Lemma 5, we know that Pr[M ∧ N] ∈ 1 − o(1) and Pr[not(M ∧ N)] ∈ o(1), and thus it follows that

Pr[S | M ∧ N] ≥ (2/e + ε − o(1))/(1 − o(1)) > 2/e, for sufficiently large n.   (1)

By Corollary 2, each of the initiators has the same probability p of realizing a potential cloud configuration in which some node is a leader. Assuming that the events M and N occur, it is immediate that 0 < p < 1. Recall that the algorithm A succeeds whenever the event S occurs. Its success probability, given that the smallest influence cloud is disjoint from the other influence clouds, is

Pr[S | M ∧ N] = 2p(1 − p).   (2)

The probability in (2) is maximized at p = 1/2, which yields Pr[S | M ∧ N] ≤ 1/2. This, however, contradicts (1). Thus, our assumption that there is an algorithm A that solves leader election with probability at least 2/e + ε using only o(n^{1/2}/α^{3/2}) messages is wrong. That is, any such algorithm incurs Ω(n^{1/2}/α^{3/2}) messages. This completes the proof of Theorem 4.2. □
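For completeness, the elementary inequality used in the last step is the following:

```latex
\max_{0 < p < 1} \, 2p(1-p) \;=\; \tfrac{1}{2} \quad \text{at } p = \tfrac{1}{2},
\qquad \text{while} \qquad \tfrac{2}{e} \approx 0.7358 > \tfrac{1}{2}.
```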
a similar analysis, it follows that the node wv,j changes its state IDs. Notice, however, that the execution of B that exploits the
to σj on round r with the same probability as the node wu,j . adversarially generated random IDs is essentially equivalent to
We now conclude that for any potential cloud configuration the execution of B in the anonymous setting, in which each node
Z and any two initiators u and v, the events Euρ (Z) and Evρ (Z) first generates a random number between [1, n4 ] and then uses
are equally likely. More specifically, we say that the potential that random number as their IDs. The only difference is that
cloud configuration Z is equiv-probable for initiators u and v multiple nodes may generate the same random number, thereby
if Pr[Euρ (Z)|N ] = Pr[Evρ (Z)|N ]. Although a potential cloud inheriting the same ID, but this is an event whose probability is
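As a side note, a direct union bound over pairs already yields the claimed collision probability; the calculation below introduces nothing beyond the stated ID range:

\[
\Pr[\exists\, u \neq v :\ \mathrm{ID}_u = \mathrm{ID}_v] \;\le\; \binom{n}{2}\frac{1}{n^4} \;\le\; \frac{n^2}{2n^4} \;=\; \frac{1}{2n^2} \;\le\; \frac{1}{n}.
\]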


V. FAULT-TOLERANT AGREEMENT

In this section, we first present a randomized algorithm that solves agreement with high probability in a complete n-node network with at most (1 − α)n faulty nodes, where α is a real number in [log^2 n/n, 1]. The algorithm takes O(log n/α) rounds and sends no more than O(n^{1/2} log^{3/2} n/α^{3/2}) messages with high probability. Our algorithm is message-optimal (up to a polylog n factor), as we also show a matching lower bound of Ω(n^{1/2}/α^{3/2}) on the message complexity.

Note that a leader election algorithm immediately gives a solution to the agreement problem: simply agree on the leader's input value. Hence, our leader election algorithm also solves agreement, but then the message complexity would be O(n^{1/2} log^{5/2} n/α^{5/2}). We instead design a more efficient algorithm which solves agreement using O(n^{1/2} log^{3/2} n/α^{3/2}) messages in this faulty network. On the other hand, a lower bound for the leader election problem does not carry over to agreement, as agreement is an easier problem than leader election. We therefore separately show a lower bound of Ω(n^{1/2}/α^{3/2}) on the message complexity of the fault-tolerant agreement problem.
A. Algorithm

Let us consider the binary agreement problem. Recall that we consider an anonymous network, i.e., nodes do not know each other, but know the values of n and α. Initially, each node receives a value in {0, 1} given by an adversary. Our goal is to design a message-efficient algorithm that solves agreement implicitly: at the end of the algorithm, a non-empty subset of the nodes must agree on a single value (either 0 or 1) while preserving the validity condition.

Let b_u ∈ {0, 1} denote the input bit of a node u. The basic framework of the algorithm is similar to the leader election algorithm in Section IV-A. First, select a small committee of nodes, called the candidate nodes, which agree on a value among themselves. For this, a random set C of candidate nodes of size Θ(log n/α) is selected. Since the candidate nodes do not know each other, they communicate among themselves via referee nodes: each candidate node samples O(n^{1/2} log^{1/2} n/α^{1/2}) referee nodes from its neighbors. The set of referee nodes of a candidate node u is denoted by R_u. Recall that a node may be sampled as a referee node by multiple candidate nodes.
The idea of the algorithm is that the candidate nodes are biased toward agreeing on 0 if any of them receives 0 as an input bit. For this, any candidate node whose input value is 0 forwards 0 to all the candidate nodes. If there is at least one candidate node with value 0, then 0 is eventually propagated to all the candidate nodes, and they agree on 0. If none of the candidate nodes receives 0 as an input (i.e., all of them get 1), they don't send any messages and eventually agree on 1.

Step 0: Each candidate node u does the following in parallel: if b_u = 0, then u sends 0 to its referee nodes R_u and agrees on 0. If b_u = 1, then u sends 1 to R_u, but doesn't agree on 1. This step is required to inform the nodes in R_u that they are selected as referee nodes by u.

Then the following two steps are performed iteratively.

Step 1: If a candidate node u receives 0 from any of its referee nodes, and it has not agreed on 0 before, then u sends 0 to its referee nodes and agrees on 0. If u has agreed on 0 before, then u doesn't send anything.

Step 2: If a referee node w receives 0 from any of its candidate nodes, and it has not sent 0 earlier, then w sends 0 to its candidate nodes, say, C_w.

The above two steps (Steps 1 and 2) are performed for O(log n/α) iterations, and then the algorithm terminates. In the end, all the (non-faulty) candidate nodes hold the same minimum value, either 0 or 1; if they do not hold 0, they agree on 1. Intuitively, there are O(log n/α) candidate nodes and a single node may crash in each iteration (say, the single node holding 0 crashes among the remaining candidate nodes), so 0 may propagate slowly. But eventually 0 reaches all the candidate nodes, since the set of candidate nodes contains at least one non-faulty node (see Lemma 2) and there is a common non-faulty referee node between any pair of candidate nodes (see Lemma 3). On the other hand, if all the candidate nodes hold input value 1, then the algorithm doesn't send any messages during the iterations and terminates after O(log n/α) rounds with all the candidate nodes agreeing on 1. Thus, the algorithm agrees on a unique value (which must be an input value) after O(log n/α) rounds, as each iteration takes at most 2 rounds. A pseudocode is included in the supplemental material, available online; a toy simulation is sketched below.
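To make the message flow concrete, the following is a minimal, self-contained Python simulation of Steps 0-2 (our own sketch, not the authors' pseudocode from the supplemental material). Crash faults are modeled crudely by a fixed set of silent nodes, and the committee size k and referee-sample size m are illustrative stand-ins for Θ(log n/α) and O((n log n/α)^{1/2}):

    import math, random

    def agreement_sim(inputs, alpha, crashed=frozenset()):
        """inputs: list of 0/1 bits, one per node; returns the committee's decision."""
        n = len(inputs)
        k = min(n, max(1, round(math.log2(max(n, 2)) / alpha)))          # ~ candidate-set size
        m = min(n, max(1, round(math.sqrt(n * math.log2(max(n, 2)) / alpha))))  # ~ referee sample
        candidates = random.sample(range(n), k)
        referees = {u: set(random.sample(range(n), m)) for u in candidates}
        # Step 0: candidates with input 0 inform their referees and agree on 0
        # (candidates with input 1 only register with their referees; omitted here).
        agreed0 = {u for u in candidates if inputs[u] == 0 and u not in crashed}
        ref_has0 = {w for u in agreed0 for w in referees[u] if w not in crashed}
        for _ in range(2 * k):  # O(log n / alpha) iterations of Steps 1 and 2
            # Step 1: a candidate that hears 0 from one of its referees adopts 0 ...
            new0 = {u for u in candidates
                    if u not in agreed0 and u not in crashed and referees[u] & ref_has0}
            agreed0 |= new0
            # Step 2: ... and its non-crashed referees forward the 0 exactly once.
            ref_has0 |= {w for u in new0 for w in referees[u] if w not in crashed}
        live = [u for u in candidates if u not in crashed]
        return 0 if any(u in agreed0 for u in live) else 1

For instance, agreement_sim([1]*1000, 0.5) returns 1, while an input vector with a constant fraction of 0s makes the sampled committee contain a 0-holder with high probability, in which case the decision is 0.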
It is easy to see that the algorithm terminates in O(log n/α) rounds. In Step 0, at most O(log n/α) candidate nodes each send a message to O(((n log n)/α)^{1/2}) referee nodes. Later, during the iterations, a candidate node sends a message to its referee nodes at most once: when it receives a 0 from another candidate node (via the referee nodes). The same is true for a referee node; a referee node sends the received value 0 to its candidate nodes only once. Also, note that all the messages contain only a single bit. Therefore, the total message complexity of the algorithm is O(n^{1/2} log^{3/2} n/α^{3/2}) bits.
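In summary, the accounting in the previous paragraph is a product of three factors (a worked restatement, with no new assumptions):

\[
\underbrace{O\!\left(\frac{\log n}{\alpha}\right)}_{\text{candidate nodes}} \times \underbrace{O\!\left(\left(\frac{n \log n}{\alpha}\right)^{1/2}\right)}_{\text{referees per candidate}} \times \underbrace{O(1)}_{\text{bits per candidate-referee pair}} \;=\; O\!\left(\frac{n^{1/2} \log^{3/2} n}{\alpha^{3/2}}\right) \text{ bits.}
\]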
Thus, we get the following main result of this section.

Theorem 5.1: Consider a complete n-node network with at least αn non-faulty nodes and the CONGEST communication model. Then there is a randomized algorithm which solves agreement with high probability in O(log n/α) rounds and incurs O(n^{1/2} log^{3/2} n/α^{3/2}) message bits with high probability.


For any constant value of α, the algorithm solves fault-tolerant agreement in O(log n) rounds and O(n^{1/2} log^{3/2} n) messages. Thus, we immediately get the following corollary.

Corollary 3: Consider a complete n-node network and the CONGEST communication model. There is a randomized algorithm which solves agreement with high probability in O(log n) rounds and incurs O(n^{1/2} log^{3/2} n) message bits with high probability when tolerating at most n − n/c faulty nodes for any constant c ≥ 1.
Implicit to Explicit Agreement: The above result gives a sub-linear message bound (when α = ω(log n/n^{1/3})) for implicit agreement. For explicit agreement, the message lower bound is Ω(n), since the agreed value needs to reach all the nodes in the network. The above implicit agreement algorithm can easily be extended to an explicit agreement using O(n log n/α) additional messages and one extra round: at the end of the implicit agreement algorithm, the candidate nodes which reached agreement broadcast the agreed value to all the nodes in the network in parallel in a single round, and all the nodes agree on the received value. Since the number of such candidate nodes is at most O(log n/α), the message complexity of this step is O(n log n/α). Hence the claim.
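The threshold on α quoted above can be recovered by rearranging the message bound of Theorem 5.1; the display below adds no assumptions:

\[
\frac{n^{1/2} \log^{3/2} n}{\alpha^{3/2}} = o(n) \iff \alpha^{3/2} = \omega\!\left(\frac{\log^{3/2} n}{n^{1/2}}\right) \iff \alpha = \omega\!\left(\frac{\log n}{n^{1/3}}\right).
\]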
B. Lower Bound on the Message Complexity

We show that any algorithm for implicit agreement with constant success probability requires Ω(n^{1/2}/α^{3/2}) messages; that is, Ω(n^{1/2}/α^{3/2}) messages must be sent in order to achieve implicit agreement with probability at least 1 − ε for a sufficiently small ε > 0. Section IV-B shows that leader election requires Ω(n^{1/2}/α^{3/2}) messages, but our proof here is different because, under leader election, all nodes essentially start in identical configurations. In the implicit agreement problem, however, the nodes start with initial values, and our lower bound argument takes that into account: it shows that no algorithm can exploit those initial values in order to reach agreement with fewer messages. Our approach is highly inspired by the work done in [23].
Consider a randomized algorithm A that achieves implicit agreement with probability 1 − ε. For the sake of contradiction, let us assume that A sends at most o(n^{1/2}/α^{3/2}) messages with probability at least 1 − o(1). We will show that A must then have an error probability that is at least a constant; so, for the sake of contradiction, assume that the probability with which A does not reach implicit agreement is at most o(1). For now, we will assume that the nodes are anonymous, so A does not have access to unique node identifiers or any other identifying feature pertaining to individual nodes.

Recall also that the network is anonymous, so a node u, at least at the start of the protocol, will not know which incident edge leads to which neighbor v. In fact, for the purpose of showing this lower bound, we assume that for every node, the neighbor sequence (as we enumerate from port 1 to port n − 1) is a uniformly random permutation, independent of all other nodes' permutations.

Given any p ∈ [0, 1], let C_p denote the (random) starting configuration wherein each node is independently assigned an initial value of 1 with probability p, or 0 with probability 1 − p. Let G_p be the (random) directed graph on the n nodes with an edge from u to v if and only if u sent a message to v and the message was sent before v sent any message to u, as A executed from the starting configuration C_p.
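As an aid to intuition, the following small Python sketch (ours, purely illustrative) shows how G_p could be extracted from a trace of one execution; a trace entry (r, x, y) records that x sent y a message in round r:

    def build_Gp(trace):
        """trace: iterable of (round, sender, receiver) triples."""
        first_to = {}  # (x, y) -> earliest round in which x sent a message to y
        for r, x, y in trace:
            if (x, y) not in first_to or r < first_to[(x, y)]:
                first_to[(x, y)] = r
        # Keep u -> v only if u's first message to v strictly precedes every
        # message that v sent to u, matching the definition of G_p above.
        return {(u, v) for (u, v), r in first_to.items()
                if first_to.get((v, u), float("inf")) > r}

For example, build_Gp([(1, 0, 5), (2, 5, 0)]) yields {(0, 5)}: node 0 contacted node 5 first, so only the edge 0 → 5 survives.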
We now show that, with probability at least 1 − ε, G_p is a forest comprising trees that are directed away from their respective roots.

Lemma 8: With probability at least 1 − ε for some arbitrarily small but constant ε > 0, the graph G_p, for all p ∈ [0, 1], is a forest in which each tree contains exactly one node (called its root) with zero in-edges and whose tree edges are oriented away from the root; furthermore, at least two such trees exist.

Proof: Recall that the number of messages passed by A is at most o(n^{1/2}/α^{3/2}) with probability at least 1 − o(1). Therefore, with probability at least 1 − o(1), we can condition on the event that the number of nodes that participate in any form of communication (either sending or receiving) is at most o(n^{1/2}/α^{3/2}). Recall also that each message is sent to a random node. From Lemma 4, in a similar way, there exist at least 1/(2α) initiator nodes. Therefore, the smallest tree possesses at most o(n^{1/2}/α^{3/2}) / (1/(2α)), i.e., o(n^{1/2}/α^{1/2}), nodes. Also, a node is non-faulty with probability at least α. Hence, there are o(n^{1/2}/α^{1/2}) · α, i.e., o(n^{1/2} α^{1/2}), good nodes in the smallest tree in expectation, and by a Chernoff bound (see Theorem 4.4 in [53]) there are o(n^{1/2} α^{1/2}) good nodes with high probability. Recall that in the network there are o(n^{1/2}/α^{1/2}) good nodes in expectation, with high probability. Therefore, the probability that none of those o(n^{1/2} α^{1/2}) messages is targeted towards either a good node that has sent messages or a good node that has already received some message is at least

    (1 − o(n^{1/2} α^{1/2})/n)^{o(n^{1/2}/α^{1/2})} ≥ 1 − o(1).

Removing the conditioning, we get that the probability that at least two components in G_p are rooted and oriented trees is again at least 1 − ε for some fixed ε > 0.

Thus, for the rest of the argument, we condition on the event that, for all p ∈ [0, 1], G_p is a forest as described in Lemma 8. We say that a tree is a deciding tree if there is at least one deciding node within that tree.

Lemma 9: For every value of p ∈ [0, 1], the probability that there are at least two deciding trees is at least a constant.

Proof: Since A reaches agreement with probability at least 1 − ε, the probability that no tree decides is at most ε. Moreover, for all values of p, the probability that there is at most one deciding tree is at most a suitable constant c bounded away from 1. Otherwise, there would exist a value of p for which, with probability at least c, exactly one tree decides; then, quite easily, the root of that deciding tree could be made a leader. This yields a leader election protocol that succeeds with probability at least c, which contradicts Theorem 4.2 when c is a sufficiently large constant strictly less than 1 − ε. Adding up all these possibilities, we are still left with a constant probability q = 1 − ε − c with which there must be two or more deciding trees.

Now that we know there are at least two deciding trees with constant probability, we show that these trees reach opposing decisions with probability bounded from below by a constant, thereby contradicting the assumption that A reaches agreement with high probability.

Lemma 10: There exists a value p* ∈ [0, 1] such that the probability that there are at least two deciding trees in G_{p*} with opposing decisions is at least a constant.

Proof: We define the probabilistic valency of p with respect to A, denoted V_p, as the probability that A terminates with decision value 1 under the initial configuration C_p. Thus, the probabilistic valencies of p = 0 and p = 1 are 0 and 1, respectively. It is easy to see that V_p is a continuous function of p ∈ [0, 1], as infinitesimally small changes to p can only result in infinitesimally small changes to V_p. This is true because infinitesimally small changes to p will only result in infinitesimally small changes to C_p, and since the algorithm's behavior is based on the initial configuration (which does not change much), V_p will also only change infinitesimally. This continuity of V_p implies that for every x ∈ [0, 1], there is a p such that V_p = x.
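This last step is precisely the intermediate value theorem; written out for the specific target used below, it reads:

\[
V_0 = 0, \quad V_1 = 1, \quad V \text{ continuous on } [0,1] \;\Longrightarrow\; \forall\, x \in [0,1]\ \exists\, p : V_p = x \quad \left(\text{in particular, } x = 1 - \tfrac{q}{2}\right).
\]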
We prove this lemma under the condition that there are at least two deciding trees and that these two trees do not interact with each other. We have seen that this conditioning event occurs with probability at least q, a constant bounded away from 0. An important implication here is that, despite this conditioning, the probabilities of the two decision values must be bounded from below by a constant. For the sake of contradiction, let us suppose (without loss of generality) that the algorithm decides 1 with probability at most o(1) under this conditioning. Let us fix p such that V_p = 1 − q/2. Then the probability that the algorithm decides 1 without the conditioning would be at most q · o(1) + (1 − q) < 1 − q/2, a contradiction.
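The closing inequality is a one-line computation, using only that q > 0 is a constant and n is sufficiently large:

\[
q \cdot o(1) + (1 - q) \;=\; 1 - q\,(1 - o(1)) \;<\; 1 - \frac{q}{2} \;=\; V_p,
\]

since 1 − o(1) > 1/2 for sufficiently large n, contradicting the choice of p.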
Now consider two deciding trees T_1 and T_2. For simplicity, let us assume that the decision outcome of T_1 is (say) 0; this occurs with some constant probability. We now reason that the probability that the decision outcome of T_2 is 1 is also at least a constant. This is true because the random input values for T_2 were all chosen independently of the input values for T_1. Thus, the two trees can decide contradictory values with probability at least a constant.

Thus, we have shown that when the nodes are anonymous and the number of messages sent is at most o(n^{1/2}/α^{3/2}), implicit agreement is not reached with probability at least a constant. To generalize to the case where nodes do have IDs, we simply assume that the adversary provides IDs chosen uniformly at random from, say, [1, n^4]. Let A* be an algorithm that exploits these unique IDs. Notice, however, that the execution of A* that exploits the adversarially generated random IDs is essentially akin to the execution of A* in the anonymous setting in which each node first generates a random number in [1, n^4] and then uses that random number as its ID. The only difference is that multiple nodes may generate the same random number, thereby inheriting the same ID, but this is an event whose probability is at most 1/n; so, when we condition on these random numbers being distinct, the two executions are probabilistically identical, and any lower bound on the execution without adversarially generated IDs will extend to the case where nodes have unique IDs provided by the adversary. Thus,

Theorem 5.2: Suppose A is an algorithm that solves implicit agreement with probability at least 1 − ε for some sufficiently small constant ε > 0 in a network of n nodes with unique IDs. Then, with probability at least a constant, the message complexity of A is at least Ω(n^{1/2}/α^{3/2}).

VI. CONCLUSION AND FUTURE WORKS

In this paper, we studied the role played by randomization in fault-tolerant distributed networks. In particular, we examined how randomization can help to solve agreement and leader election efficiently in a crash-fault setting. We showed that if a constant fraction of the nodes is faulty, then the message complexity of leader election and agreement is asymptotically the same as in the fault-free complete network. We also showed non-trivial lower bounds on the message complexity of both problems. Some open questions raised by our work are: (1) There is a gap between the upper and lower bounds on the message complexity of leader election; is it possible to narrow this gap? (2) Extend the study of the message complexity of these problems to general graphs. (3) Finally, determine whether a sub-linear message bound agreement protocol is possible in the presence of Byzantine node failures.

REFERENCES

[1] L. Lamport, R. E. Shostak, and M. C. Pease, "The Byzantine generals problem," ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401, 1982.
[2] G. L. Lann, "Distributed systems - towards a formal approach," in Proc. 7th IFIP Congr. Inf. Process., 1977, pp. 155–160.
[3] M. C. Pease, R. E. Shostak, and L. Lamport, "Reaching agreement in the presence of faults," J. ACM, vol. 27, no. 2, pp. 228–234, 1980.
[4] E. Nygren, R. K. Sitaraman, and J. Sun, "The Akamai network: A platform for high-performance internet applications," ACM SIGOPS Oper. Syst. Rev., vol. 44, no. 3, pp. 2–19, 2010.
[5] T. D. Chandra, R. Griesemer, and J. Redstone, "Paxos made live: An engineering perspective," in Proc. ACM Symp. Princ. Distrib. Comput., 2007, pp. 398–407.
[6] E. Shi and A. Perrig, "Designing secure sensor networks," IEEE Wireless Commun., vol. 11, no. 6, pp. 38–43, Dec. 2004.
[7] M. U. Rahman, "Leader election in the Internet of Things: Challenges and opportunities," 2019, arXiv:1911.00759.
[8] D. P. Anderson and J. Kubiatowicz, "The worldwide computer," Scientific American, vol. 286, no. 3, pp. 28–35, 2002.
[9] S. Kutten and D. Zinenko, "Low communication self-stabilization through randomization," in Proc. Int. Symp. Distrib. Comput., 2010, pp. 465–479.
[10] S. Ratnasamy, P. Francis, M. Handley, R. M. Karp, and S. Shenker, "A scalable content-addressable network," in Proc. Conf. Appl. Technol. Archit. Protoc. Comput. Commun., 2001, pp. 161–172.
[11] S. C. Rhea, P. R. Eaton, D. Geels, H. Weatherspoon, B. Y. Zhao, and J. Kubiatowicz, "Pond: The OceanStore prototype," in Proc. Conf. File Storage Technol., San Francisco, CA, USA, 2003. [Online]. Available: http://www.usenix.org/events/fast03/tech/rhea.html
[12] A. I. T. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems," in Proc. IFIP/ACM Int. Conf. Distrib. Syst. Platforms, 2001, pp. 329–350.
[13] A. Wright, "Contemporary approaches to fault tolerance," Commun. ACM, vol. 52, no. 7, pp. 13–15, 2009.
[14] D. Gupta, J. Saia, and M. Young, "Resource burning for permissionless systems," in Proc. IFIP/ACM Int. Conf. Distrib. Syst. Platforms Open Distrib. Process., 2020, pp. 19–44.
[15] A. Agbaria and R. Friedman, "Overcoming Byzantine failures using checkpointing," Coordinated Science Lab., Univ. Illinois Urbana-Champaign, Tech. Rep. UILU-ENG-03-2228 (CRHC-03-14), 2003.
[16] Y. Amir et al., "Scaling Byzantine fault-tolerant replication to wide area networks," in Proc. Int. Conf. Dependable Syst. Netw., 2006, pp. 105–114.
[17] M. Castro and B. Liskov, "Practical Byzantine fault tolerance and proactive recovery," ACM Trans. Comput. Syst., vol. 20, no. 4, pp. 398–461, 2002.
[18] D. Malkhi and M. K. Reiter, "Unreliable intrusion detection in distributed computations," in Proc. 10th Comput. Secur. Found. Workshop, 1997, pp. 116–125.
[19] H. Attiya and J. L. Welch, Distributed Computing: Fundamentals, Simulations, and Advanced Topics, 2nd ed. Hoboken, NJ, USA: Wiley, 2004.
[20] S. Kutten, G. Pandurangan, D. Peleg, P. Robinson, and A. Trehan, "On the complexity of universal leader election," J. ACM, vol. 62, no. 1, pp. 7:1–7:27, 2015.
[21] S. Kutten, G. Pandurangan, D. Peleg, P. Robinson, and A. Trehan, "Sublinear bounds for randomized leader election," Theor. Comput. Sci., vol. 561, pp. 134–143, 2015.
[22] N. A. Lynch, Distributed Algorithms. San Mateo, CA, USA: Morgan Kaufmann, 1996.
[23] J. Augustine, A. R. Molla, and G. Pandurangan, "Sublinear message bounds for randomized agreement," in Proc. ACM Symp. Princ. Distrib. Comput., 2018, pp. 315–324.
[24] S. Gilbert and D. R. Kowalski, "Distributed agreement with optimal communication complexity," in Proc. Annu. ACM-SIAM Symp. Discrete Algorithms, 2010, pp. 965–977.
[25] B. Chor, M. Merritt, and D. B. Shmoys, "Simple constant-time consensus protocols in realistic failure models," J. ACM, vol. 36, pp. 591–614, 1989.
[26] D. Peleg, Distributed Computing: A Locality-Sensitive Approach. Philadelphia, PA, USA: Soc. Ind. Appl. Math., 2000.
[27] R. G. Gallager, P. A. Humblet, and P. M. Spira, "A distributed algorithm for minimum-weight spanning trees," ACM Trans. Program. Lang. Syst., vol. 5, no. 1, pp. 66–77, 1983.
[28] Y. Afek and E. Gafni, "Time and message bounds for election in synchronous and asynchronous complete networks," SIAM J. Comput., vol. 20, no. 2, pp. 376–394, 1991.
[29] P. Humblet, "Selecting a leader in a clique in O(N log N) messages," in Proc. IEEE 23rd Conf. Decis. Control, 1984, pp. 1139–1140.
[30] E. Korach, S. Kutten, and S. Moran, "A modular technique for the design of efficient distributed leader finding algorithms," ACM Trans. Program. Lang. Syst., vol. 12, no. 1, pp. 84–101, 1990.
[31] E. Korach, S. Moran, and S. Zaks, "The optimality of distributive constructions of minimum weight and degree restricted spanning trees in a complete network of processors," SIAM J. Comput., vol. 16, no. 2, pp. 231–236, 1987.
[32] E. Korach, S. Moran, and S. Zaks, "Optimal lower bounds for some distributed algorithms for a complete network of processors," Theor. Comput. Sci., vol. 64, no. 1, pp. 125–132, 1989.
[33] G. Singh, "Efficient distributed algorithms for leader election in complete networks," in Proc. Int. Conf. Distrib. Comput. Syst., 1991, pp. 472–479.
[34] B. S. Chlebus and D. R. Kowalski, "Gossiping to reach consensus," in Proc. 14th Annu. ACM Symp. Parallel Algorithms Archit., 2002, pp. 220–229.
[35] B. S. Chlebus and D. R. Kowalski, "Robust gossiping with an application to consensus," J. Comput. Syst. Sci., vol. 72, pp. 1262–1281, 2006.
[36] B. S. Chlebus and D. R. Kowalski, "Locally scalable randomized consensus for synchronous crash failures," in Proc. 21st Annu. Symp. Parallelism Algorithms Archit., 2009, pp. 290–299.
[37] B. S. Chlebus, D. R. Kowalski, and M. Strojnowski, "Fast scalable deterministic consensus for crash failures," in Proc. ACM Symp. Princ. Distrib. Comput., 2009, pp. 111–120.
[38] Z. Bar-Joseph and M. Ben-Or, "A tight lower bound for randomized synchronous consensus," in Proc. 17th Annu. ACM Symp. Princ. Distrib. Comput., 1998, pp. 193–199.
[39] D. R. Kowalski and J. Mirek, "On the complexity of fault-tolerant consensus," in Proc. 7th Int. Conf. Netw. Syst., 2019, pp. 19–31.
[40] M. T. Hajiaghayi, D. R. Kowalski, and J. Olkowski, "Improved communication complexity of fault-tolerant consensus," in Proc. 54th Annu. ACM SIGACT Symp. Theory Comput., 2022, pp. 488–501.
[41] D. Alistarh, J. Aspnes, V. King, and J. Saia, "Communication-efficient randomized consensus," Distrib. Comput., vol. 31, no. 6, pp. 489–501, 2018.
[42] B. S. Chlebus, D. R. Kowalski, and J. Olkowski, "Brief announcement: Deterministic consensus and checkpointing with crashes: Time and communication efficiency," in Proc. ACM Symp. Princ. Distrib. Comput., 2022, pp. 106–108.
[43] S. Gilbert, P. Robinson, and S. Sourav, "Leader election in well-connected graphs," in Proc. ACM Symp. Princ. Distrib. Comput., 2018, pp. 227–236.
[44] D. R. Kowalski and M. A. Mosteiro, "Time and communication complexity of leader election in anonymous networks," in Proc. Int. Conf. Distrib. Comput. Syst., 2021, pp. 449–460.
[45] I. Abraham et al., "Communication complexity of Byzantine agreement, revisited," in Proc. ACM Symp. Princ. Distrib. Comput., 2019, pp. 317–326.
[46] N. Braud-Santoni, R. Guerraoui, and F. Huc, "Fast Byzantine agreement," in Proc. ACM Symp. Princ. Distrib. Comput., 2013, pp. 57–64.
[47] B. M. Kapron, D. Kempe, V. King, J. Saia, and V. Sanwalani, "Fast asynchronous Byzantine agreement and leader election with full information," in Proc. Annu. ACM-SIAM Symp. Discrete Algorithms, 2008, pp. 1038–1047.
[48] V. King and J. Saia, "From almost everywhere to everywhere: Byzantine agreement with Õ(n^{3/2}) bits," in Proc. Int. Symp. Distrib. Comput., 2009, pp. 464–478.
[49] V. King and J. Saia, "Scalable Byzantine computation," SIGACT News, vol. 41, no. 3, pp. 89–104, 2010.
[50] V. King and J. Saia, "Breaking the O(n^2) bit barrier: Scalable Byzantine agreement with an adaptive adversary," J. ACM, vol. 58, no. 4, pp. 18:1–18:24, 2011.
[51] V. King and J. Saia, "Byzantine agreement in expected polynomial time," J. ACM, vol. 63, no. 2, pp. 13:1–13:21, 2016.
[52] V. King, J. Saia, V. Sanwalani, and E. Vee, "Scalable leader election," in Proc. Annu. ACM-SIAM Symp. Discrete Algorithms, 2006, pp. 990–999.
[53] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2005, doi: 10.1017/CBO9780511813603.

Manish Kumar received the BTech degree in information technology from the National Institute of Technology (NIT), Kurukshetra, India, in 2015, and the MTech degree in computer science from the Indian Statistical Institute, Kolkata, India, in 2018. He was a summer intern with the Cryptology and Security Research Unit, Indian Statistical Institute (ISI), Kolkata, India, in 2017. During his MTech thesis work, from 2017 to 2018, he worked on the security of XCB and HCTR. He is currently a senior research fellow with ISI, Kolkata, India. His research has been concerned with secure distributed computation and distributed network algorithms.

Anisur Rahaman Molla received the PhD degree from Nanyang Technological University, Singapore. He is an Assistant Professor with the Cryptology and Security Research Unit, Indian Statistical Institute (ISI), Kolkata, India. He has held a faculty position with the National Institute of Science Education and Research (NISER), Bhubaneswar. He did postdoctoral research at the University of Freiburg, Germany, and the Singapore University of Technology and Design (SUTD). He is a member of the ACM. He serves as a program committee member for several international conferences. He also serves as an external reviewer for numerous international journals and conferences, and has given several invited talks and tutorials. His research interests include distributed network algorithms, security in distributed computing, and distributed mobile robots. His work is supported by the Department of Science & Technology (DST), Govt. of India. More information can be found at: https://www.isical.ac.in/~molla
