
Bolt-Dumbo Transformer: Asynchronous Consensus As Fast As the Pipelined BFT

Yuan Lu, Institute of Software, Chinese Academy of Sciences, luyuan@iscas.ac.cn
Zhenliang Lu, School of Computer Science, The University of Sydney, zhlu9620@uni.sydney.edu.au
Qiang Tang, School of Computer Science, The University of Sydney, qiang.tang@sydney.edu.au

arXiv:2103.09425v3 [cs.CR] 23 Nov 2021

Abstract—An urgent demand for deploying BFT consensus (e.g., atomic broadcast) over the Internet has been raised for implementing (permissioned) blockchains. Deterministic (partially) synchronous protocols can be simple and fast in good network conditions, but are subject to denial-of-service when the synchrony assumption fails. Alternative asynchronous protocols, on the contrary, are robust against the adversarial network, but are substantially more complicated and slower due to their inherent use of randomization. Facing these issues, optimistic asynchronous atomic broadcast (Kursawe-Shoup, 2002; Ramasamy-Cachin, 2005) was proposed to improve the normal-case performance of the slow robust asynchronous protocols. Such protocols run a deterministic fastlane if the network condition remains good, and can fall back to a fully asynchronous protocol via a pace-synchronization mechanism (analogous to view-change, but with asynchronous security) if the fastlane fails. Unfortunately, existing pace-synchronization directly uses a heavy tool: asynchronous multi-valued validated Byzantine agreement (MVBA). When it is frequently invoked in the fluctuating WAN setting, the benefits of adding the fastlane can be eliminated.

We present Bolt-Dumbo Transformer (BDT), a generic framework for practical optimistic asynchronous atomic broadcast. At the core of our design, a new fastlane abstraction that can be easily implemented is set forth to better prepare the fastlane, thus enabling a highly efficient pace-synchronization to handle its failures. The resulting design reduces a cumbersome MVBA to only a variant of the conceptually simplest binary agreement. Besides detailed security analyses, we also give concrete instantiations of our framework and implement them. Extensive experiments show that our new fallback mechanism adds minimal overhead, and demonstrate that BDT can enjoy both the low latency of deterministic protocols and the robustness of randomized asynchronous protocols in practice.

I. INTRODUCTION

The explosive popularity of decentralization [19, 51] creates an unprecedented demand for deploying robust Byzantine fault tolerant (BFT) consensus. These protocols replicate an ever-growing linearized log of transactions among a set of n distributed parties [24], and were conventionally abstracted as BFT atomic broadcast (ABC) [21]. Despite an adversary that can influence the underlying communication network (e.g., delay messages) and corrupt some participating parties (e.g., n/3), ABC can attain the following indispensable security properties:

• Safety. The honest parties output the same linearly-ordered log of transactions;
• Liveness. Any transaction input by some honest party eventually appears in each honest party's output log; sometimes, one might consider a stronger efficient liveness requiring that transactions be output reasonably quickly [3, 21, 53], e.g., within a polynomial number of rounds.¹

A desideratum for robust BFT in the absence of network synchrony. The dynamic nature of the Internet poses new fundamental challenges for implementing secure yet still highly efficient ABC protocols. Traditionally, most practical ABC protocols [27] were studied for in-house scenarios where participating parties are geographically close and well connected. Unsurprisingly, their security relies on some form of assumption about the network conditions. For example, the classic synchrony assumption requires all messages to be delivered within a known delay, and its weaker variant, partial synchrony [32] (a.k.a. eventual synchrony), assumes that after an unknown global stabilization time (GST), all messages can be delivered synchronously. Unfortunately, these synchrony assumptions may not always hold in the wide-area network (WAN), because of fluctuating bandwidth, unreliable links, substantial delays, and even network attacks. Especially in an asynchronous network [8], the adversary can arbitrarily delay messages and schedule their deliveries, and many protocols [6, 7, 9, 13, 28, 39, 40, 57, 62] might sustain the inherent loss of liveness and grind to a halt [34, 47].

That said, when the network is the adversary, relying on synchrony might lead to the fatal vulnerability of denial-of-service! Thus, to thrive in the unstable or even adversarial Internet for mission-critical applications (e.g., consortium blockchains as financial infrastructure), conquering the denial-of-service threat becomes a sine qua non instead of an extra bonus. Noticeably, fully asynchronous ABC [21, 31, 41, 47] can ensure safety and liveness simultaneously with no assumptions about network synchrony, thus becoming the most natural and robust candidate for mission-critical applications.²

¹ It can even be necessary to require output within a polynomial number of rounds (in the security/system parameters), if using computationally-secure cryptographic primitives such as digital signatures and hash functions.
² Besides fully asynchronous protocols, one might suggest choosing a conservative upper bound of network delay to make (partially) synchronous BFT survive in the adversarial network. But this might bring serious performance degradation in latency, e.g., the exaggeratedly slow Bitcoin. Accordingly, a large number of "robust" (partially) synchronous protocols are subject to this robustness-latency trade-off, such as Prime [6], SPIN [62] and RBFT [9]. Worse still, none of them can have liveness in an asynchronous network.
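The Safety property defined above is commonly checked as log consistency: any two honest parties' output logs must agree on their common prefix. A minimal sketch of such a check (the helper name `is_prefix` is our own illustration, not from the paper):

```python
# Safety, read as prefix consistency: for any two honest parties'
# output logs, the shorter one must be a prefix of the longer one.
def is_prefix(log_a, log_b):
    shorter, longer = sorted((log_a, log_b), key=len)
    return longer[:len(shorter)] == shorter

# Two honest logs that only differ in how far they have progressed:
assert is_prefix(["tx1", "tx2"], ["tx1", "tx2", "tx3"])
# A safety violation: the logs disagree at the second position.
assert not is_prefix(["tx1", "txX"], ["tx1", "tx2", "tx3"])
```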
Fully asynchronous BFT? Robustness with a high price! Nevertheless, the higher security assurance of asynchronous ABC does not come for free: the seminal FLP "impossibility" [34] states that no deterministic protocol can ensure both safety and liveness in an asynchronous network. So asynchronous ABC must run randomized subroutines to circumvent the "impossibility", which already hints at its complexity. Indeed, few asynchronous protocols have been deployed in practice during the past decades due to their large complexities, until the recent progress of HoneyBadger [47] and the Dumbo protocols [41, 46] provided novel paths to practical asynchronous atomic broadcast with optimal amortized communication cost.

Despite this recent progress, the actual performance of state-of-the-art randomized asynchronous protocols (e.g., Dumbo) is still far worse than that of deterministic (partially) synchronous ones (e.g., HotStuff [63]³), especially regarding critical metrics such as latency. In the same WAN deployment environments consisting of n = 16, 64, 100 Amazon EC2 instances across the globe, HotStuff is dozens of times faster than the state-of-the-art Dumbo. Even worse, the inferior latency of asynchronous protocols (inherently) stems from the fact that all parties need to iteratively generate costly randomness (e.g., a "common coin" [22, 25]), and multiple repetitions are necessary to ensure that the parties coincidentally output with overwhelming probability. As a result, dozens of rounds are needed on average, even in the most efficient existing asynchronous protocols such as Dumbo [41]. Their (partially) synchronous counterparts, by contrast, require only a very small number of rounds in the optimistic case when the underlying communication network happens to be synchronous, e.g., 5 in the two-chain HotStuff.

A. Optimistic asynchronous protocols and their bottleneck

The above issues correspond to a fundamental "dilemma" lying in the design space of atomic broadcast protocols suitable for the open Internet: the cutting-edge (partially) synchronous protocols can optimistically work very fast, but lack a liveness guarantee in adversarial networks; on the contrary, the fully asynchronous protocols are robust even in malicious networks, but suffer from poor latency in the normal case.

Overcome the performance barrier of asynchronous protocols via an optimistic fastlane? One natural question arises: can we resolve the "dilemma" and design a new atomic broadcast protocol achieving the best of both the synchronous and asynchronous paradigms, such that it is super fast over the normal Internet and remains robust all the time (even in an asynchronous network)? Regarding this question, the pioneering work of Kursawe-Shoup [44] (KS02) and a later improvement by Ramasamy-Cachin [59] (RC05) proposed optimistic asynchronous atomic broadcast.

As Fig. 1 illustrates, optimistic asynchronous ABC initially executes a deterministic protocol as a "fastlane" to progress fast in good network conditions (the optimistic case). But if the fastlane does not progress in time (e.g., because of a potential asynchronous adversary), a fallback mechanism is triggered to first clean up any possible disagreement caused by the failed fastlane (through a pace-synchronization mechanism analogous to view-change in the asynchronous network) and then ensure liveness (via a robust pessimistic path consisting of a full-fledged asynchronous protocol).

The rationale behind the idea is straightforward: the realistic Internet might become unstable or even adversarial sometimes, but usually there are sufficient optimistic periods, during which the network stays in good conditions to deliver messages synchronously and a deterministic fastlane can thus be utilized to improve the normal-case performance. In this way, the "amortized" latency of such an asynchronous protocol can be closer to that of the deterministic fastlane instead of always suffering from the slow pessimistic path.

Bottleneck of optimistic asynchronous atomic broadcast. Although the high-level idea of optimistic asynchronous atomic broadcast appears enticing in theory, the actual design is complicated, since the whole fallback mechanism must be guaranteed to work in a fully asynchronous network. In particular, if we aim to harvest the best of both paths of optimistic asynchronous ABC in practice, a serious efficiency hurdle remains.

To explain the remaining challenges, let us briefly overview the earlier theoretical results — KS02 [44] and RC05 [59]. Their execution flow starts from an optimistic fastlane (deterministic and fast, but with no liveness assurance) running a sequence of broadcasts 1, 2, ..., s. Upon a timeout, a party multicasts a Complain message; once enough valid Complaints are collected, pace synchronization uses several cumbersome asynchronous MVBAs (together with coin tosses) to decide from where to fall back, after which either the fastlane is restarted or the pessimistic path (some asynchronous consensus, e.g., directly invoking MVBA) ensures that honest nodes' inputs are eventually output.

Fig. 1: The execution flow of KS02 [44] and RC05 [59]. Both protocols rely on cumbersome asynchronous multi-valued BA (MVBA) to clean up the mess of failed fastlanes and pick the point from which to fall back.

³ Remark that [63] gave a three-chain HotStuff protocol along with a two-chain variant. Throughout the paper, we let HotStuff refer to the two-chain version for its lower latency and higher throughput in practice.
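In the flow of Fig. 1, the only synchrony-dependent step is the timeout that triggers a Complain. A toy rendering of that trigger (the message shape and the `fastlane_watchdog` helper are our own illustration, not the papers' pseudocode):

```python
import queue

def fastlane_watchdog(blocks, timeout):
    """Consume fastlane block indices; if none arrives within `timeout`
    seconds, emit a Complain carrying the latest delivered index."""
    latest = 0
    while True:
        try:
            latest = blocks.get(timeout=timeout)
        except queue.Empty:
            return ("Complain", latest)  # handed to pace-synchronization

q = queue.Queue()
q.put(1)
q.put(2)               # the fastlane delivered blocks 1 and 2, then stalls
print(fastlane_watchdog(q, timeout=0.05))  # -> ('Complain', 2)
```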
Challenges, prior art, and remaining bottleneck. As Fig. 1 illustrates, the fastlane of KS02 and RC05 directly employs a sequence of broadcast primitives (the output of each is also called a block for brevity). If a party does not receive a block within a period (defined by a timeout parameter), it simply requests to fall back by informing the other parties of the index of the block it is now at. When the honest parties receive a sufficient number of fallback requests (e.g., 2f+1 in the presence of up to f faulty parties), they execute a mechanism (called pace-synchronization throughout the paper) in order to decide from where to continue on the pessimistic path.

Since different honest parties may have made different progress in the fastlane when they decide to fall back (e.g., some are now at block 5, some at block 10), pace-synchronization needs to ensure: (i) all honest parties can eventually enter the pessimistic path from the same block; and (ii) all the "mess-ups" (e.g., missing blocks) left by the fastlane can be properly handled. Both requirements must be satisfied in an asynchronous network! These requirements hint that all the parties may need to agree on a block index that was proposed by some honest party; otherwise, they might decide to sync up to blocks that were never delivered. Unfortunately, directly implementing such a functionality requires one-shot asynchronous (multi-valued) Byzantine agreement with strong validity (meaning the output must be some honest party's input), which is infeasible as it inherently incurs exponential communication complexity [35].

As depicted in Fig. 1, both KS02 and RC05 smartly implement pace-synchronization through asynchronous multi-valued validated Byzantine agreement (MVBA) to get around the infeasible strong validity. An MVBA is a weaker and implementable form of multi-valued Byzantine agreement, whose output is allowed to come from a malicious party but has to satisfy a predefined predicate. Still, MVBA is a cumbersome building block, and was originally proposed by Cachin et al. to directly construct full-fledged asynchronous atomic broadcast [21]. What's worse, this heavy primitive is invoked for pace-synchronization as well as for the pessimistic path, which in turn blows up the communication to at least O(n³) bits and dozens of rounds. Although we may reduce the O(n³) communication to O(n²) using some very recent results (e.g., Dumbo-MVBA from [46]), they remain costly in practice due to a large number of execution rounds and additional computing costs (e.g., a large number of threshold signature verifications).

Consequences of slow pace-synchronization. A severe impact of the inefficient pace-synchronization is to harm the effectiveness of adding the fastlane, in particular when the network condition fluctuates and pace-synchronization could thus be triggered frequently in the real-world Internet. To see the issue, consider the heavy pace-synchronization of existing work [44, 59], which is as slow as the asynchronous pessimistic path and dozens of times slower than the fastlane.⁴

As Fig. 2 (a) exemplifies, even though the network stays in good conditions for the majority of the time, the overall average latency of the protocol is still far larger than that of its fastlane. One slow fallback could "waste" the gain of dozens of optimistic blocks, which essentially renders the optimistic fastlane ineffective. In the extreme case shown in Fig. 2 (b), where the fallback is always triggered because the fastlane leaders face an adaptive denial-of-service attack, it even doubles the cost of simply running the pessimistic asynchronous protocol alone.

Fig. 2: Consequence of slow fallback in KS02/RC05 in a typical network setting; the length of each block denotes the latency of each phase. Panel (a), a usual unstable network, alternates sync and async periods: fastlane, pace-sync, pessimistic path, pace-sync, pessimistic path, and so on. Panel (b), the worst-case asynchronous network, repeats only pace-sync and pessimistic path.

⁴ The actual situation might be even worse in KS02 [44], because several more MVBA invocations with much larger inputs are needed in the pace-sync.

It follows that in the wide-area Internet, the inefficient pace synchronization in previous theoretical protocols would likely eliminate the potential benefits of introducing the optimistic fastlane, and their applicability is thus limited. A fundamental practical challenge remains open:

Can we have an asynchronous atomic broadcast (with an optimistic deterministic fastlane), such that the cost of pace synchronizations is minimal, to harvest the best of both paths in the real-world Internet environment?

B. Our contributions

Summary of results. We give the first practical and generic framework for optimistic asynchronous atomic broadcast, called Bolt-Dumbo Transformer (BDT for short), which answers the aforementioned question affirmatively:

• BDT always has liveness and safety in the asynchronous network against up to n/3 Byzantine corruptions (which is optimal);
• In normal cases, i.e., when the network is benign for sufficient periods as in the real-world Internet, BDT can be as fast as the state-of-the-art pipelined BFT protocols, e.g., the two-chain version of HotStuff;
• Even in the worst case of an adversarial asynchronous network, it still performs almost the same as the currently most efficient asynchronous protocols, e.g., Dumbo.

At the core of the BDT framework is a new abstraction of the fastlane (nicknamed Bolt), which makes all honest parties better prepared, enabling the simplest possible (asynchronous) pace synchronization (nicknamed Transformer). Together with the recent progress of asynchronous consensus (e.g., Dumbo) as the pessimistic path, BDT is depicted in Fig. 3.
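As a baseline for what Transformer replaces, the MVBA-based pace-synchronization of KS02/RC05 reviewed above can be condensed into a few lines: the MVBA's external predicate accepts any set of 2f+1 fallback requests from distinct senders with valid certificates, and the agreed fallback point is the largest certified index in the decided set. A hedged sketch (the request dictionary shape and the `valid_cert` placeholder are our own assumptions):

```python
F = 1                      # tolerated faults; n = 3f + 1 parties

def valid_cert(request):
    # Placeholder for verifying the quorum certificate attached to the
    # request (a collection of signatures in the real protocol).
    return request["cert"] == "ok"

def mvba_predicate(request_set):
    # External validity an MVBA output must satisfy: 2f+1 requests
    # from distinct senders, each carrying a valid certificate.
    senders = {r["sender"] for r in request_set}
    return (len(request_set) == 2 * F + 1
            and len(senders) == 2 * F + 1
            and all(valid_cert(r) for r in request_set))

def fallback_index(request_set):
    # The largest certified index becomes the common fallback point.
    assert mvba_predicate(request_set)
    return max(r["index"] for r in request_set)

requests = [{"sender": i, "index": idx, "cert": "ok"}
            for i, idx in enumerate([5, 10, 7])]
print(fallback_index(requests))  # -> 10
```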
Fig. 3: Overview of our new practical and generic optimistic atomic broadcast framework, nicknamed Bolt-Dumbo Transformer. From Init, Bolt (the fastlane) runs: it is very fast but vulnerable to liveness attacks, and 1) its new abstraction is general to implement and simplifies pace-sync. A level-1 fallback is triggered if a timeout occurs (i.e., the fastlane halts) or transactions might be censored, entering Transformer (pace-sync), which 2) uses only binary BA to decide how many transactions were output via the fastlane, replacing heavy multi-valued BA; if the fastlane made progress, the fastlane is simply restarted. A level-2 fallback is triggered only if the fastlane output nothing, i.e., completely failed, entering Dumbo (the pessimistic path), which is secure against arbitrary network delay but slow; 3) the two fallback levels reduce the runs of asynchronous protocols in benign networks.

• A highly efficient pace-synchronization. With the help of our new fastlane abstraction (to be explained shortly), Transformer removes the need for a cumbersome MVBA by reducing the pace-synchronization problem to a much simpler special-purpose asynchronous binary Byzantine agreement (ABBA), which we call two-consecutive-valued Byzantine agreement (tcv-BA for short). Besides bringing an O(n)-factor improvement in communication complexity relative to KS02 and RC05, in practice tcv-BA attains a minimal overhead similar to the latency of the fastlane. Transformer is essentially optimal, since pace synchronization can itself be viewed as a form of asynchronous consensus, and ABBA is arguably the simplest consensus we can expect.
• Further benefits of a light pace synchronization. To further exploit the benefits brought by the simple Transformer, we add a simple check after pace-sync to create two-level fallbacks: if pace-sync reveals that the fastlane still made some output, it immediately restarts another fastlane without running the actual pessimistic path. This is in contrast with previous works [44, 59], where the slow pessimistic path is always run after each pace-sync, which is often unnecessarily costly. Remark that the earlier studies cannot effectively adopt our two-level fallback tactic, because their heavy pace-synchronizations might even bring extra cost in case of frequent fallbacks.
• Flexible instantiations. Since BDT is generic, it enables flexible choices of the underlying building blocks. For example, we present two exemplary fastlane instantiations, resulting in two BDT implementations that favor latency and throughput, respectively, so one can instantiate BDT according to the actual application scenario. Also, our pace-sync mechanism Transformer can be constructed around any asynchronous binary agreement, and thus has the potential to use a more efficient ABBA to further reduce the fallback overhead. Similarly, though we currently use the state-of-the-art asynchronous consensus protocol Dumbo as the pessimistic path, it can easily be replaced by future, more efficient constructions.

Technical overview. Let us first briefly recall why MVBA is used for pace synchronization and poses the main practical hurdle in existing optimistic asynchronous atomic broadcasts. If the leader is corrupted or the network behaves asynchronously, the fastlane could fail to progress. In this case, each honest party can wait for a set of 2f+1 fallback requests sent from distinct parties, where each fallback request (called a complaint message in KS02 and RC05) contains an index s of the latest fastlane block along with the corresponding quorum "certificate" (usually a collection of signatures from distinct parties). The goal of pace synchronization is to decide on a common index contained in some fallback sent by an honest party, such that all honest parties can agree to enter the pessimistic path from this position. To this end, prior studies utilized Cachin et al.'s MVBA [21] to pick a set of 2f+1 valid fallback requests from distinct parties, and the largest index s among these fallbacks naturally becomes the agreed fastlane position from which to start the fallback. In addition, due to the nice property of valid quorum "certificates", it is guaranteed that enough parties did receive the blocks with indices up to s, so the honest parties can expect to fetch any missing blocks.

Now we walk through how we reduce the above complex pace synchronization to only a variant of binary agreement.

First ingredient: a new abstraction of the fastlane. We put forth a new simple fastlane abstraction called notarizable weak atomic broadcast (nw-ABC, nicknamed Bolt). In the optimistic case, it performs as a full-fledged atomic broadcast protocol and can output a block per τ clock ticks. But if the synchrony assumption fails to hold, it has neither liveness nor exact agreement; only a notarizability property can be ensured: whenever any party outputs a block at position j with a proof (e.g., a quorum certificate), at least f+1 honest parties have already output the same block at position j−1 with the corresponding proof (cf. Fig. 4). nw-ABC is also easy to implement. We present two exemplary constructions. The first assigns a stable leader to sequentially multicast its input proposals while the other parties simply vote by digital signatures; it is appropriate for scenarios that favor latency. The second lets a stable leader sequentially execute reliable broadcasts (using erasure codes to reduce the leader's bandwidth usage [23, 47]); it is more friendly to large input proposals, thus achieving a higher throughput than the first instantiation over the global Internet.

How does "notarizability" better prepare honest parties? More importantly, the seemingly straightforward notarizability turns out to be very useful for simplifying pace-synchronization. Let us examine the pattern of honest parties' fastlane outputs when entering the pace synchronization phase (for simplicity, we only consider the indices of the fastlane blocks).
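To make the notarizability guarantee concrete, here is a toy, fully sequential rendering of the first (sequential-multicast) construction under our own simplifying assumptions: a party votes for block j only if it already holds the proof for block j−1, and any 2f+1 votes form the proof for block j. Since 2f+1 votes contain at least f+1 honest ones, a proof at position j certifies that f+1 honest parties held the proof for j−1. This is a sketch of the idea, not the paper's pseudocode:

```python
F = 1
QUORUM = 2 * F + 1                   # votes needed to notarize a block

class Party:
    def __init__(self):
        self.proofs = {0: "genesis"}  # position -> proof this party holds

    def vote(self, j):
        # Vote for block j only after holding the proof for block j-1.
        return (j - 1) in self.proofs

def leader_extend(j, parties):
    voters = [p for p in parties if p.vote(j)]
    if len(voters) < QUORUM:
        return None                   # fastlane stalls; fallback follows
    proof_j = f"proof-{j}"            # stands in for 2f+1 signatures
    for p in voters:
        p.proofs[j] = proof_j
    return proof_j

parties = [Party() for _ in range(3 * F + 1)]
for j in (1, 2, 3):
    assert leader_extend(j, parties) is not None

# A proof for block 3 exists, so >= f+1 honest parties hold proof 2.
holders_of_2 = sum(2 in p.proofs for p in parties)
assert holders_of_2 >= F + 1
```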
Fig. 4: Exemplifying the notarizability of our nw-ABC abstraction. The figure snapshots all parties' local views of nw-ABC: Party 1 holds Blocks 1-7 with Proofs 1-6 (possibly no proof yet for Block 7); Party 2 holds only Blocks 1-2 with Proofs 1-2; Party 3 holds Blocks 1-7 with Proofs 1-6 (possibly no proof for Block 7); Party 4 holds Blocks 1-7 with Proofs 1-7. Notarizability means: Block 7 with a valid proof implies that f+1 honest parties received 1) all blocks from Block 1 through Block 6 with valid proofs, and 2) Block 7 itself, though it might not yet be notarized by a proof.

Suppose all honest parties have quit the fastlane, exchanged their fallback requests (containing their latest block index and the proof), received 2f+1 such fallback requests, and thus entered the pace synchronization. At that time, let s be the largest index among all notarized blocks with valid proofs.

We can make two easy claims: (i) no honest party can see a valid fallback with an index equal to or larger than s+1; and (ii) all honest parties must see some fallback with an index equal to or larger than s−1. If (i) did not hold, then by notarizability at least one party could produce a proof for block s+1, which contradicts the definition of s. As for (ii), since block s has a valid proof, at least f+1 honest parties received block s−1 with a valid proof. So any party that waits for 2f+1 fallback requests must see at least one fallback sent by one of these f+1 honest parties, and thus sees an index of at least s−1; otherwise, there would be 3f+2 parties in total (contradicting our baseline assumption that the number of corrupted parties f is the optimal ⌊(n−1)/3⌋).

The above two simple claims successfully narrow the range of the honest parties' inputs to {s−1, s}: two unknown but consecutive integers.

Second ingredient: asynchronous agreement for consecutive values. Thanks to the nw-ABC abstraction, pace-sync is reduced to picking one of two unknown consecutive inputs {s−1, s}. To handle this problem, we further define two-consecutive-valued Byzantine agreement (tcv-BA), and it is easy to implement tcv-BA from any asynchronous binary Byzantine agreement (cf. Section IV for the concrete construction).

Fine-tuning Bolt for safety. When tcv-BA outputs a value u, Transformer could simply ask all honest nodes to sync up to block u, because no matter whether u is s or s−1, the u-th fastlane block has a valid proof, so it can be retrieved due to notarizability (cf. Fig. 4). Nevertheless, a subtle issue remains: tcv-BA inherently cannot guarantee u = s, and thus u may well be s−1. This is because an asynchronous adversary can always delay the messages of the few honest parties that input s to make them appear crashed. That means, if a party outputs a fastlane block whenever it sees the block's proof, it faces the threat that the pace synchronization returns an index smaller than its latest output block, which can revoke its committed output by one block. This can even temporarily violate the safety requirement, if other parties commit a different block after pace-sync. To solve the issue, we introduce a "safe buffer" that keeps the newest fastlane block pending, i.e., it is not output until one more fastlane block with a valid proof is received.

Extensive evaluations in real-world environments. To demonstrate the practical performance of BDT, we implement it using the state-of-the-art Dumbo [41] as the pessimistic path⁵ and then instantiate the whole protocol. We compare our two typical BDT implementations to Dumbo and HotStuff, and conduct extensive experiments to show their real-world applicability. For a fair comparison, we implement all protocols in the same language with the same cryptographic libraries, and deploy them in the same WAN environment consisting of up to 100 Amazon EC2 c5.large instances distributed across the globe.

TABLE I: Performance comparison of BDT, HotStuff and Dumbo over 100 EC2 c5.large servers across 16 regions in 5 continents

                      Good-Case vs. HotStuff            Worst-Case vs. Dumbo
                      (fastlanes always complete)       (fastlanes always fail)
                      BDT-sCAST**  BDT-sRBC†  HotStuff§ BDT-Timeout‡  Dumbo
Basic latency (sec)   0.44         0.67       0.42      21.95         16.36
Throughput (tps)*     9253         18234      10805     18806         21242

* Each transaction has 250 bytes to approximate a basic Bitcoin TX.
** BDT-sCAST is BDT using Bolt-sCAST, i.e., Bolt built from sequential multicasts (sCAST).
† BDT-sRBC is BDT using Bolt-sRBC, i.e., Bolt built from sequential reliable broadcasts (sRBC).
‡ BDT-Timeout waits for 2.5 sec, and then runs pace-sync and Dumbo.
§ Reasons why our evaluations for HotStuff differ from [63]: (i) instead of using one AWS region, we test among 16 AWS regions across 5 continents; (ii) we let all implementations, including HotStuff's, reach agreement on 250-byte transactions instead of 32-byte transaction hashes; (iii) for fair comparison, we use a non-blocking network layer written in Python to test all BFT protocols, which is a single process using the gevent library for concurrent tasks and differs from the multi-processing communication layer in [63].

Some highlighted experimental results can be found in Table I: (i) the BDT implementation based on sequential multicasts attains a basic latency of only about 0.44 second (i.e., nearly the same as two-chain HotStuff's and less than 3% of Dumbo's latency), even if we intentionally trigger frequent pace synchronizations after every 50 optimistic blocks; (ii) in the worst case, where we intentionally make the fastlane always fail, the throughput of BDT remains 90% of Dumbo BFT's and is close to 20,000 transactions per second, even conditioned on letting BDT wait as long as 2.5 seconds for the timeout before invoking pace-sync. These results demonstrate that BDT can be "as fast as" the deterministic protocols in normal cases and maintains robust performance in the worst case. More in-depth evaluations and analyses of the in-between scenarios are presented in Sections VI and VII.

⁵ Remark that [41] presents two asynchronous BFT protocols, i.e., Dumbo-1 and Dumbo-2. Since Dumbo-2 outperforms the former, we let Dumbo BFT (or Dumbo) refer to Dumbo-2 throughout the paper.
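The counting behind claim (ii) in Section I-B above is a standard quorum-intersection argument: with n = 3f+1 parties, any set of 2f+1 fallback requests must include at least one of the f+1 honest parties holding block s−1 with a valid proof, since (2f+1) + (f+1) = 3f+2 > n. This can be checked exhaustively for a small f:

```python
from itertools import combinations

F = 2
N = 3 * F + 1
holders = set(range(F + 1))   # the f+1 honest parties holding block s-1

# Every possible 2f+1 quorum of fallback requests intersects `holders`,
# so each honest party is guaranteed to see an index >= s-1.
for quorum in combinations(range(N), 2 * F + 1):
    assert set(quorum) & holders
print("every 2f+1 quorum intersects the f+1 holders of block s-1")
```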
II. OTHER RELATED WORK

In the past decades, asynchronous BFT protocols were mostly theoretical results [2, 10–12, 17, 25, 29, 54, 55, 58], and no clear and convincing evidence of real-world feasibility was shown until several recent studies such as HoneyBadger BFT [47], BEAT [31], Dumbo [41, 46] and VABA [3]. Nevertheless, these still have very large latency (much larger than partially synchronous protocols in the normal network) because they use dozens of rounds to terminate.

The aforementioned KS02 [44] and RC05 [59] demonstrated the theoretical feasibility of developing optimistic asynchronous atomic broadcast to reduce the latency of asynchronous protocols in benign network environments. Very recently, Spiegelman [61] studied the problem using a recent MVBA construction [3] to instantiate the fallback. However, as discussed earlier, these frameworks are still inefficient. Especially in unstable networks with frequent fallbacks, they may not deliver the expected benefits of introducing deterministic fastlanes.

It is well known that partially synchronous protocols can be responsive after GST in the absence of failures. Nonetheless, if some parties are slow or even act maliciously, many of these protocols [27, 63] might suffer a worst-case latency related to the upper bound of the network delay. Some recent studies [4, 5, 40, 48, 57, 60] also consider synchronous protocols with optimistic responsiveness, such that when some special conditions are satisfied, they can confirm transactions very quickly (with the optimal n/2 tolerance in the synchronous setting). Our protocol is responsive all the time, because it never has to wait for a timeout set as large as the upper bound of the network delay.

Besides, some literature [14, 15, 33, 45, 49] is interested in combining synchronous and asynchronous protocols for stronger security guarantees such as higher fault tolerance. This line of work focuses on designing protocols that tolerate up to n/2 corruptions in the synchronous case and n/3 corruptions in the asynchronous case. We instead aim to harvest efficiency from the deterministic protocols and stick with n/3 tolerance (optimal in asynchronous networks).

Concurrent works. A concurrent work [37] considers adding an asynchronous view change to variants of HotStuff, and a later extended version of it [36] was very recently presented with implementations. Though with a similar motivation, they focus on a specific construction of asynchronous fallback

…novel pace-sync mechanism is nicknamed Transformer, at the core of which is a variant of binary agreement called two-consecutive-value Byzantine agreement (tcv-BA). The pessimistic path can be reduced to any asynchronous consensus, and we nickname it Dumbo as we instantiate it via the state-of-the-art asynchronous protocol — Dumbo.

TABLE II: Core component, nickname and instantiations for each BDT phase.

Phase                 Core Component           Nickname      Instantiations
Optimistic Fastlane   nw-ABC                   Bolt          Bolt-sRBC or Bolt-sCAST (cf. Section III.A)
Pace-Sync             tcv-BA                   Transformer   Reducing to any ABBA, or adapting the ABBA in [50] (cf. Section III.B)
Pessimistic Path      asynchronous consensus   Dumbo         ACS from Dumbo [41], or any multi-value consensus (e.g., MVBA [3, 21, 46])

B. Data structure: block and log

Transaction. Without loss of generality, we let a transaction, denoted by tx, represent a string of |m| bits.

Block structure. A block is a tuple of the form block := ⟨epoch, slot, TXs, Proof⟩, where epoch and slot are natural numbers, and TXs is a sequence of transactions also known as the payload. Throughout the paper, we assume |TXs| = B, where B is the batch-size parameter. The batch size can be chosen to saturate the network's available bandwidth in practice. Proof is a string (a.k.a. quorum certificate in the literature) attesting that at least f+1 honest parties indeed voted for the block by signing it.

Fig. 5: The structure of block and consecutive log in our terminology. A consecutive log places the block with slot = i (each block carrying its epoch, payload TXs, and Proof) at position log[i], for i = 1, ..., L.

Blocks as output log. Throughout the paper, a log (or, interchangeably, blocks) refers to an indexed sequence of blocks. For a log of length L := |log|, we use the following conventional notations: (1) log[i] denotes the i-th block in log, with i ∈ [1, L]. For example, log[1] is the first block of log, log[−1] is an alias of the last block in log,
by opening up a recent MVBA protocol called VABA [3], and and log[−2] represents the second-to-last block in log, and so
thus cannot have the linear communication per transaction (as forth. (2). log.append(·) can append some block to log. For
we do) in the pessimistic path. Moreover, it is essentially still example, when log.append(·) takes a block 6= ∅ as input, |log|
an MVBA protocol handling pace synchronization, while we increases by one and log[−1] becomes this newly appended
reduce the task to conceptual minimum—binary agreement. block; when log.append(·) takes a sequence of non-empty
blocks [blockx+1 , . . . , blockx+k ] as input, |log| would increase
III. P ROBLEM F ORMULATION AND N OTATIONS
by k, and log[−1] = blockx+k , log[−2] = blockx+k−1 and so
A. Nicknames on after the operation; when log.append(·) takes an empty
As briefly mentioned, our novel fastlane is centered around block ∅ as input, the append operation does nothing.
a new broadcast primitive called notarized weak atomic broad- Consecutive output log. An output log consisting of L blocks
cast (nw-ABC), and we let Bolt to be its nickname; Our is said to be consecutive if it satisfies: for any two successive

6
blocks log[i] and log[i + 1] included by log, either of the following two cases shall be satisfied: (i) log[i].epoch = log[i + 1].epoch and log[i].slot + 1 = log[i + 1].slot; or (ii) log[i].epoch + 1 = log[i + 1].epoch and log[i + 1].slot = 1. Without loss of generality, we let all logs be consecutive throughout the paper for presentation simplicity.

C. Modeling the system and threats

We consider the standard asynchronous message-passing system with trusted setup, which can be detailed as follows.

Known identities and trusted setup. There are n designated parties, each of which has a unique identity (i.e., P1 through Pn) known by everyone else. All involved threshold cryptosystems such as TSIG and TPKE are properly set up, so all parties can get and only get their own secret keys in addition to all relevant public keys. The setup can be done through distributed key generation [38, 42, 43, 56] or a trusted dealer.

Static Byzantine corruptions. The adversary can choose up to f out of n parties to corrupt and fully control before the execution. No asynchronous BFT can tolerate more than f = ⌊(n − 1)/3⌋ such static corruptions. Throughout the paper, we stick with this optimal resilience.

Fully-meshed asynchronous p2p channels. We consider that there exists an established reliable authenticated p2p channel between any two parties. The adversary can arbitrarily delay and reorder messages, but cannot drop or modify messages sent among honest parties.

Computationally-bounded adversary. We consider a computationally bounded adversary that can perform some probabilistic computing steps bounded by polynomials in the number of message bits generated by honest parties, which is standard cryptographic practice in the asynchronous network.

Adversary-controlled local "time". It is impossible to implement synchronized global time in the asynchronous model. Nevertheless, we do not require any global wall-clock for security. Same as [20, 44], it is still feasible to let each party keep an adversary-controlled local "clock" that elapses at the speed of the actual network delay δ: after initializing, each party sends a "tick" message to itself via the adversary-controlled network; then, whenever receiving a "tick", it increases its local "time" by one and resends a new "tick" to itself via the adversary. Using the adversary-controlled "clock", each party can maintain a timeout mechanism; for example, let timer(τ).start() denote that a local timer is initialized and will "expire" after τ clock ticks, and let timer(τ).restart() denote resetting the timer.

D. Security goal: asynchronous atomic broadcast ABC

Asynchronous atomic broadcast. Our goal is to develop an atomic broadcast protocol that attains high robustness in unstable or even hostile network environments and retains high performance if the network is luckily benign. That means the proposed protocol shall at least satisfy the following definition of atomic broadcast in the asynchronous model for security.

Definition 1: In the atomic broadcast (ABC) protocol (i.e., an abstraction of state-machine replication for distributed service), each party maintains an implicit queue of input transactions (i.e., the input backlog, whose size can be bounded by an unfixed polynomial) and outputs a log of blocks. Besides the above syntax, the ABC protocol satisfies the following properties with all but negligible probability:
• Total-order. If an honest party outputs a log consisting of a sequence of blocks, and another honest party outputs a log′, then log[i] = log′[i] for every i such that 1 ≤ i ≤ min{|log|, |log′|}.
• Agreement. If an honest party adds a block to its log, all honest parties would add the block to their logs.
• Liveness (adapted from [21]). If all honest parties input a transaction tx (i.e., tx is in their implicit input backlog), tx would be included in some honest party's output log in at most a polynomial amount of time.

Remarks on the definition of ABC. Throughout the paper, we also let safety refer to the union of total-order and agreement. Besides, we insist on the strong liveness notion from [21] to ensure that each input transaction can be output reasonably quickly instead of merely eventually. Such strong liveness is needed for the end security goal of guaranteeing fast confirmation of input transactions even in the worst adversary-influenced case, i.e., it resolves denial-of-service attacks. This reasonable aim separates us from some studies that have exponentially large confirmation latency [10], and was also known as censorship-resilience in HoneyBadger BFT [47].

E. Other terminologies

Performance metrics. We are particularly interested in constructing practical asynchronous atomic broadcasts, and therefore it is meaningful to consider the following (expected) costs per output block:
• Communication complexity. We primarily focus on the (average) bits of all messages associated with outputting each block, because the communicated bits per block essentially reflect the (amortized) communication per delivered transaction, in particular when each block includes O(B)-sized transactions on average due to a specified batch-size parameter B for block size.
• Each party's bandwidth usage. This is similar to per-block communication complexity, but it does not count the overall message bits that are exchanged for generating each block. Instead, this metric counts a certain party's bandwidth usage (including both incoming and outgoing messages) amortized over each output block.
• Message complexity. We are also interested in the average number of messages associated with a block. This characterizes the total number of messages exchanged between honest parties to produce a block.
• Running time (or asynchronous rounds). The eventual delivery in an asynchronous network might make the protocol execution somehow independent of "real time". Nevertheless, it is still needed to characterize the
running time, and a standard way to do so is counting the asynchronous "rounds" as in [21, 25].

Cryptographic abstractions. H denotes a collision-resistant hash function. TSIG and TPKE denote threshold signature and threshold encryption, respectively. An established TSIG is a tuple of algorithms (SignShare_t, VrfyShare_t, Combine_t, Vrfy_t); throughout the paper, we call the signature share output by SignShare_t the partial signature, and call the output of Combine_t the full signature as it combines t partial signatures. An established TPKE consists of three algorithms (Enc_t, DecShare_t, Dec_t). In all notations, the subscript t represents the threshold; cf. some classic literature such as [47] for the properties of the well-known threshold cryptosystems. The cryptographic security parameter is denoted by λ, capturing the bit-length of signatures and hashes.

Building blocks. We might use the following asynchronous broadcast/consensus primitives in a black-box manner.

Definition 2: Reliable broadcast (RBC) has a designated sender who aims to send its input to all parties, and satisfies the next properties except with negligible probability: (i) Validity. If the sender is honest and inputs v, then all honest parties output v; (ii) Agreement. The outputs of any two honest parties are the same; (iii) Totality. If an honest party outputs v, then all honest parties output v.

Definition 3: Asynchronous binary Byzantine agreement (ABBA) [22, 25, 50] has a syntax in which each party inputs and outputs a single bit, and shall guarantee the following properties except with negligible probability: (i) Validity. If any honest party outputs b, then at least one honest party takes b as input; (ii) Agreement. The outputs of any two honest parties are the same; (iii) Termination. If all honest parties activate the protocol with a bit as input, then all honest parties would output a bit in the protocol.

Definition 4: Asynchronous common subset (ACS) [12] among n parties (with f corruptions) lets each party input a value and output a set of values. It satisfies the next properties except with negligible probability: (i) Validity. The output set S of an honest party contains the inputs of at least n − 2f honest parties; (ii) Agreement. The outputs of any two honest parties are the same; (iii) Termination. If all honest parties activate the protocol, then all honest parties would output.

IV. ALL SPARK FOR Bolt-Dumbo Transformer: FASTLANE AND TWO-CONSECUTIVE-VALUE BA

BDT immediately benefits from our highly efficient pace-synchronization —Transformer— that centers around the conceptually simplest binary agreement in lieu of any complicated multi-value agreement. This simple and efficient pace-synchronization becomes possible due to two critical ingredients, i.e., a novel fastlane abstraction nw-ABC and a new variant of binary Byzantine agreement tcv-BA. Specifically,
• Thanks to the nice notarizability of nw-ABC, all parties' nw-ABC outputs are somewhat weakly consistent, namely, at least f + 1 honest parties must already output the 2nd-highest block among all parties' outputs (cf. Fig. 4). Therefore, after all honest parties exchange timeout requests notifying each other about their latest block's proof, everyone shall at least receive the 2nd-highest block's proof, and at most receive the highest block's proof.
• Considering the above fact guaranteed by nw-ABC's notarizability, we can conclude that after exchanging timeout requests, all honest parties know either s (the index of the highest block) or s − 1. We therefore lift the conventional binary agreement to a special variant, tcv-BA, which can always decide a common value out of {s, s − 1} despite that the adversary might input values arbitrarily outside {s, s − 1}, say s − 2 or s + 1.

A. Abstracting and constructing the optimistic fastlane

Our first ingredient is a new fastlane abstraction and its construction with nickname Bolt. This is the key component to ensure that all parties' outputs in the fastlane are weakly consistent, thus enabling the simplification of pace-synchronization. To abstract the key features of this critical building block, we put forth the notion of notarizable weak atomic broadcast (nw-ABC) as follows.

Definition 5: Notarizable weak atomic broadcast (nw-ABC, nicknamed Bolt). In the Bolt protocol with an identification id, each party takes a transaction buffer as input and outputs a log of blocks, where each block log[j] is of the form ⟨id, j, TXs_j, Proof_j⟩. There also exist two external functions Bolt.verify and Bolt.extract taking id, slot j and Proof_j as input (whose outputs and functionalities will soon be explained below). We require that Bolt satisfies the following properties except with negligible probability:
• Total-order. Same as atomic broadcast.
• Notarizability. If any (probably malicious) party outputs at the j-th position of its log a block log[j] := ⟨id, j, TXs_j, Proof_j⟩ s.t. Bolt.verify(id, j, Proof_j) = 1, then there exist at least f + 1 honest nodes, each of which either already outputs log[j], or already outputs log[j − 1] and can use the valid Proof_j to extract log[j] by Bolt.extract from the received protocol messages.
• Abandonability. An honest party will not output any block in Bolt[id] after invoking abandon(id). In addition, if f + 1 honest parties invoke abandon(id) before outputting log[j], then no party can output a valid log[j + 1].
• Optimistic liveness. There exists a non-empty collection of optimistic conditions specifying the honesty of certain parties, such that for any honest party, once it outputs log[j], it will output log[j + 1] within κ asynchronous rounds, where κ is a constant number.

Compared to fully-fledged asynchronous atomic broadcast, nw-ABC does not have the exact agreement and liveness properties: (i) to compensate for the lack of agreement, notarizability is introduced to guarantee that whenever a party outputs a block log[j] at position j, at least f + 1 honest ones already output a block at position j − 1, and in addition, f + 1 honest parties essentially already receive the
protocol messages carrying the payload of log[j], so they can immediately extract the block if seeing any valid Proof_j for log[j]; (ii) liveness exists in an optimistic form; this weakening allows nw-ABC to be easily implementable in the asynchronous setting by simple deterministic protocols. Note that the optimistic liveness excludes a trivial implementation of the fastlane in which "each party remains idle to do nothing, until the abandon interface is invoked to abort".

// Validate Proof_s to check whether at least f + 1 honest parties output the s-th block or can extract it (according to the next Bolt.extract function)
external function Bolt.verify(id, s, Proof_s):
    parse Proof_s as ⟨h_s, σ_s⟩
    return TSIG.Vrfy_{2f+1}(⟨id, s, h_s⟩, σ_s)
..............................................................................
// Leverage the valid Proof_s to extract the s-th block from some received protocol messages (though the block was not output yet).
external function Bolt.extract(id, s, Proof_s):
    if Bolt.verify(id, s, Proof_s) = 1, then parse Proof_s as ⟨h_s, σ_s⟩
        if TXs_s was received during executing Bolt s.t. h_s = H(TXs_s), then:
            return block := (id, s, TXs_s, Proof_s), where Proof_s := ⟨h_s, σ_s⟩
    return block := (id, s, ⊥, ⊥)

Fig. 8: Invocable external functions for Bolt instantiations

Careful readers might have noticed that the above abstraction, in particular the notarizability property, shares similarities with the popular lock-commit paradigm widely used in many (partially) synchronous Byzantine/crash fault tolerant protocols [7, 27, 32, 40, 63]. For example, when any honest party outputs some value (i.e., "commit"), then at least f + 1 honest parties shall receive and already vote for this output (i.e., "lock"). This essentially is captured by notarizability, so an output block with a valid proof ensures that at least f + 1 honest parties can extract the same block from their protocol messages (because they must already receive the block and vote for it). In such a sense, the fastlane can be easily instantiated in many ways through the lens of (partially) synchronous protocols. Unsurprisingly, one candidate is the fastlane used in RC05 [59]. Here we present two more exemplary Bolt constructions.

Bolt-sCAST from sequential multicasts. As shown in the algorithm in Figure 6, Bolt can be easily constructed from pipelined multicasts using threshold signatures, and we call it Bolt-sCAST. The idea of Bolt-sCAST is as simple as: the leader proposes a batch of transactions through multicast; then all parties send back their signatures on the proposed batch as their votes; once the leader collects enough votes from distinct parties (i.e., 2f + 1), it uses the votes to form a proof (quorum certificate) for its precedent proposal, and then repeats to multicast a new proposal of transactions (along with the proof). Upon receiving the new proposal and the precedent proof, the parties would output the precedent proposal and the proof (as a block), and then repeat to respond with their votes on the new proposal. Such a simple execution is sequentially repeated until the abandon interface is invoked.

Bolt-sRBC from sequential reliable broadcasts. In Bolt-sCAST, the leader's bandwidth usage is n times more than the other parties'. This essentially restricts each block's batch size and becomes the throughput bottleneck, unless the leader has significant physical resources. We therefore present an alternative Bolt implementation to remove the bottleneck. Note that an RBC implementation [47] can use the technique of verifiable information dispersal [23] to adopt erasure code and Merkle commitment tree for communication efficiency as well as for balancing network workload. We therefore use sequential RBC instances with consecutively increasing identifiers to implement Bolt (cf. Figure 7). In the implementation, a

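To make the propose–vote–certify loop of Bolt-sCAST concrete, the following toy sketch simulates one slot in a single process. The `partial_sign`/`combine` helpers are hypothetical stand-ins for TSIG partial signatures and combination (here just tagged tuples and sets), not the actual threshold scheme the paper assumes.

```python
import hashlib

def h(payload):
    # Hash of a transaction batch, standing in for H(TXs).
    return hashlib.sha256(repr(payload).encode()).hexdigest()

def partial_sign(party, msg):
    # Toy "partial signature": a (party, msg) tag. A real deployment
    # would call TSIG.SignShare instead.
    return (party, msg)

def combine(partials, quorum):
    # Toy "full signature": the set of quorum partials on one message,
    # standing in for TSIG.Combine over 2f+1 shares.
    assert len(partials) >= quorum and len({m for _, m in partials}) == 1
    return frozenset(partials[:quorum])

n, f = 4, 1
quorum = 2 * f + 1  # votes needed for a quorum certificate

# Leader proposes a batch for slot s; parties vote on its hash.
s, batch = 1, ["tx1", "tx2"]
msg = ("id0", s, h(batch))
votes = [partial_sign(i, msg) for i in range(n - f)]  # f parties may stay silent

# With 2f+1 = 3 votes the leader forms Proof_s and can move to slot s+1.
proof = combine(votes, quorum)
print(len(proof) >= quorum)  # True: the quorum certificate is complete
```

Only the counting logic is illustrated here; the unforgeability that makes Proof_s meaningful comes entirely from the underlying threshold signature.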
let id be the session identification of Bolt[id], buf be a FIFO queue of input transactions, B be the batch parameter, and P_ℓ be the leader (where ℓ = (id mod n) + 1)
Pi initializes s = 1, σ_0 = ⊥ and runs the protocol in consecutive slot number s as follows:
• Broadcast. if Pi is the leader P_ℓ:
    – if s > 1 then:
        ∗ wait for 2f + 1 VOTE(id, s − 1, σ_{s−1,i}) from distinct parties Pi, where σ_{s−1,i} is the valid partial signature signed by Pi for ⟨id, s − 1, H(TXs_{s−1})⟩
          // To be precise, VOTE carries σ_{s−1,i} s.t. TSIG.VrfyShare_{2f+1}(⟨id, s − 1, H(TXs_{s−1})⟩, (i, σ_{s−1,i})) = 1
        ∗ compute σ_{s−1}, the full signature for ⟨id, s − 1, H(TXs_{s−1})⟩, by aggregating the 2f + 1 received valid partial signatures
          // To be precise, that means σ_{s−1} ← TSIG.Combine_{2f+1}(⟨id, s − 1, H(TXs_{s−1})⟩, {(i, σ_{s−1,i})}_{2f+1})
    – multicast PROPOSAL(id, s, TXs_s, σ_{s−1}), where TXs_s ← buf[: B] // Here it selects the first B transactions at the front of buf
• Commit and Vote. upon receiving PROPOSAL(id, s, TXs_s, σ_{s−1}) from leader P_ℓ:
    – if s > 1 then:
        ∗ proceed only if σ_{s−1} is a valid full signature that aggregates 2f + 1 partial signatures for ⟨id, s − 1, H(TXs_{s−1})⟩, otherwise abort
          // To be precise, that means a valid PROPOSAL message shall carry σ_{s−1} s.t. TSIG.Vrfy_{2f+1}(⟨id, s − 1, H(TXs_{s−1})⟩, σ_{s−1}) = 1
        ∗ output block := (id, s − 1, TXs_{s−1}, Proof_{s−1}), where Proof_{s−1} := ⟨H(TXs_{s−1}), σ_{s−1}⟩
    – send VOTE(id, s, σ_{s,i}) to the leader P_ℓ, where σ_{s,i} is the partial signature for ⟨id, s, H(TXs_s)⟩, then let s ← s + 1
      // To be precise, that means the VOTE message to send carries σ_{s,i} ← TSIG.SignShare_{2f+1}(sk_i, ⟨id, s, H(TXs_s)⟩)
• Abandon. upon abandon(id) is invoked: abort the above execution

Fig. 6: Bolt from sequential multicasts (Bolt-sCAST). Its invocable external functions are presented in Fig. 8.
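For intuition, the external functions of Fig. 8 can be mirrored in a few lines of Python. `tsig_vrfy` below is a hypothetical placeholder for TSIG.Vrfy_{2f+1} (a real check would verify a combined threshold signature), and `received` models the transaction batches a party saw in the protocol messages of this Bolt run:

```python
import hashlib

def H(txs):
    return hashlib.sha256(repr(txs).encode()).hexdigest()

def tsig_vrfy(msg, sigma):
    # Placeholder for TSIG.Vrfy_{2f+1}: a "signature" is valid iff it
    # literally equals ("sig", msg).
    return sigma == ("sig", msg)

def bolt_verify(ident, s, proof):
    # Mirrors Bolt.verify: check the full signature on ⟨id, s, h_s⟩.
    h_s, sigma_s = proof
    return tsig_vrfy((ident, s, h_s), sigma_s)

def bolt_extract(ident, s, proof, received):
    # Mirrors Bolt.extract: recover the s-th block from already-received
    # payloads if the proof is valid and the hash matches.
    if bolt_verify(ident, s, proof):
        h_s, _ = proof
        txs = received.get(s)
        if txs is not None and H(txs) == h_s:
            return (ident, s, txs, proof)   # block recovered from messages
    return (ident, s, None, None)           # payload cannot be recovered

received = {3: ["tx7", "tx8"]}
proof3 = (H(["tx7", "tx8"]), ("sig", ("id0", 3, H(["tx7", "tx8"]))))
print(bolt_extract("id0", 3, proof3, received)[2])  # ['tx7', 'tx8']
```

This mirrors the notarizability argument: a party that voted for slot s already holds TXs_s, so any valid Proof_s lets it materialize the block locally.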

let id be the session identification of Bolt[id], buf be a FIFO queue of input transactions, B be the batch parameter, and P_ℓ be the leader (where ℓ = (id mod n) + 1)
Pi initializes s = 1, σ_0 = ⊥ and runs the protocol in consecutive slot number s as follows:
• Broadcast. if Pi is the leader P_ℓ, activate RBC[⟨id, s⟩] with input TXs_s ← buf[: B]; else activate RBC[⟨id, s⟩] as a non-leader party
• Vote. upon RBC[⟨id, s⟩] returns TXs_s:
    – send VOTE(id, s, σ_{s,i}) to all, where σ_{s,i} is the partial signature for ⟨id, s, H(TXs_s)⟩
• Commit. upon receiving 2f + 1 VOTE(id, s, σ_{s,i}) from distinct parties Pi, where σ_{s,i} is the valid partial signature signed by Pi for ⟨id, s, H(TXs_s)⟩:
    – output block := (id, s, TXs_s, Proof_s), where Proof_s := ⟨H(TXs_s), σ_s⟩ and σ_s is the valid full signature that aggregates the 2f + 1 partial signatures for ⟨id, s, H(TXs_s)⟩, then let s ← s + 1
• Abandon. upon abandon(id) is invoked: abort the above execution

Fig. 7: Bolt from sequential RBCs (Bolt-sRBC). Its invocable external functions are presented in Fig. 8.
designated fastlane leader can reliably broadcast its proposed transaction batches one by one. When a party receives a transaction batch from a reliable broadcast, it signs the batch and multicasts the signature as a vote, then waits for 2f + 1 votes on the same batch to form a proof (quorum certificate), such that it outputs the batch and the proof (as a block) and proceeds into the next reliable broadcast.

Security and efficiency of Bolt constructions. The security analyses for Bolt-sCAST and Bolt-sRBC are simple by nature and we defer them to Appendix A. Their complexities can be tediously counted as well, and we later group them and all other complexity analyses related to BDT in Section VII.

B. Two-consecutive-value Byzantine agreement

Another critical ingredient of our paper is an efficient pace synchronization subprotocol (a.k.a. Transformer) that enables efficient fallback from the optimistic path to the pessimistic path. It helps the honest parties choose one common integer out of two unknown but consecutive numbers, thus bootstrapping Transformer to enable the participating parties to reach an agreement on how many blocks were indeed output in the optimistic fastlane. Essentially, tcv-BA can be viewed as a generalized notion extending conventional binary agreement, and is formalized by the following definition.

Definition 6: Two-consecutive-value Byzantine agreement (tcv-BA) satisfies the properties of termination, agreement and validity (i.e., the same as those of asynchronous binary agreement) except with negligible probability, if all honest parties input some value in {v, v + 1} where v ∈ ℕ.

To squeeze extreme performance out of pace synchronization, we give a non-black-box construction of tcv-BA that only has to revise three lines of code of the practical ABBA construction adapted from [50]. This non-black-box construction basically reuses the pseudocode except for a few slightly different if-else checks (see Figure 9) and hence has the same performance as this widely adopted practical binary agreement instantiation.

For each party Pi, make the following modifications to the ABBA code in Alg. 7 of [41] (originally from [50] but with some adaptions to use Ethan MacBrough's suggestion [1] to fix the potential liveness issues of [50]):
Replace lines 13-23 of Algorithm 7 in [41] with the next instructions:
• s ← Coin_r.GetCoin()
    – if S_r = {v} then:
        ∗ if v%2 = s%2:
            · if decided = false then: output v; decided = true
            · else (i.e., decided = true): halt
        ∗ est_{r+1} ← v
    – if S_r = {v1, v2} then:
        ∗ if v1%2 = s%2, then est_{r+1} ← v1
        ∗ else (i.e., v2%2 = s%2), then est_{r+1} ← v2

Fig. 9: tcv-BA protocol. Lines different from Alg. 7 in [41] are in orange text.

In addition, tcv-BA can be constructed by centering around any asynchronous binary agreement (ABBA) instance with only one more "multicast" round, cf. Figure 10. This black-box construction provides us a convenient way to inherit any potential improvements of underlying ABBA primitives [30].

Let ABBA be any asynchronous binary agreement, then party Pi executes:
Upon receiving input R:
• multicast VALUE(id, R)
• wait for f + 1 VALUE(id, v) messages from distinct parties containing the same v, and activate ABBA[id] with v%2 as input
• wait for ABBA[id] to return s:
    – if v%2 = s, then: return v
    – else: wait for receiving f + 1 VALUE(id, v′) messages from distinct parties containing the same v′ such that v′%2 = s, then return v′

Fig. 10: tcv-BA protocol centering around any ABBA "black-box"

V. Bolt-Dumbo Transformer FRAMEWORK: PRACTICAL OPTIMISTIC ASYNCHRONOUS BFT

Now we are ready to present our practical optimistic asynchronous atomic broadcast protocol Bolt-Dumbo Transformer (BDT) in detail, having the critical nw-ABC and tcv-BA ingredients at hand. BDT is a generic framework with as-simple-as-possible pace synchronization (nicknamed Transformer) centering around tcv-BA, which allows running any nw-ABC instantiation such as Bolt as the optimistic fastlane, and naturally can use many asynchronous consensus protocols (e.g., Dumbo [41], VABA [3], Dumbo-MVBA [46]) as its pessimistic path.

Fig. 11: The execution flow of Bolt-Dumbo Transformer. (The figure depicts the optimistic fastlane Bolt producing pending and finalized blocks; upon timeout, parties exchange (s, Proof) and run the Transformer pace-sync via tcv-BA on max({s}); if Bolt still works (s > 0) a new fastlane starts, and if Bolt completely fails (s = 0) the pessimistic path Dumbo (ACS, MVBA, etc.) is invoked.)

Protocol overview. As outlined in Fig. 11, the fastlane of BDT essentially is a Bolt instance wrapped by a timer. If honest parties receive a new Bolt block in time, they restart the timer. Otherwise, the timer expires, such that the parties would multicast a fallback request containing the latest Bolt block's proof according to their local views.

After timeout, the protocol enters pace-sync. Each party just waits for n − f fallback requests with valid Bolt proofs, and checks the maximum slot number out of the received fallbacks. After that, every honest party has either the globally latest Bolt block's slot number or one less than that, so they can invoke tcv-BA to pick one out of these two numbers, from which they either tentatively continue another fastlane or run the underlying pessimistic asynchronous protocol.

The remaining non-triviality is that tcv-BA cannot ensure its output to always be the larger number out of the two possible candidates, which means the globally latest Bolt block can be revoked after pace-sync. Hence, in the fastlane, the latest Bolt block is marked as "pending", i.e., once Bolt returns a fastlane
block, it is temporarily not finalized as BDT's output, until Bolt returns one more block. This pending fastlane block becomes critical to realizing safety in BDT.

Protocol details. BDT is formally illustrated in Fig. 12. It employs a reduction to nw-ABC, tcv-BA, and some asynchronous consensus (exemplified by ACS). BDT proceeds by epochs. Each epoch might invoke three phases, i.e., Bolt, Transformer and Pessimistic, which can be briefly described as:

1) Bolt phase. When an honest party enters an epoch e, it activates a Bolt[e] instance, and locally starts an adversary-controlled "timer" that expires after τ clock "ticks" and resets once hearing the "heartbeat" of Bolt[e] (e.g., Bolt[e] returns a new block). If a party receives a new Bolt[e] block in time without "timeout", it temporarily records the block as pending, finalizes the previous (non-empty) pending block as BDT's output, and sets its "pace" pe to this new block's slot number. Otherwise, the "timeout" mechanism interrupts (it could be that the current leader is corrupted or the network failed to stay synchronous), and the party would abandon Bolt[e]. Besides the above "timeout" mechanism to ensure Bolt progresses in time, we also consider the case that some transactions are probably censored: if the oldest transaction (i.e., the top of the input buffer) is not output for a duration T, an interruption is also raised to abandon Bolt[e]. Finally, it is optional that the parties might raise an interruption to quit the fastlane if Bolt[e] has finalized enough (Esize) optimistic blocks.
Once a party is interrupted to abandon Bolt[e] for any reason listed above (i.e., 1. the "timeout" mechanism is triggered, or 2. some transactions are probably censored, or 3. it has finalized enough (Esize) optimistic blocks), it immediately multicasts its latest "pace" pe along with the corresponding Bolt[e] block's proof via a PACESYNC message.
2) Transformer phase. If an honest party receives (n − f) valid PACESYNC messages from distinct parties w.r.t. Bolt[e], it enters the Transformer phase. There, the party chooses the maximum "pace" maxPace out of the n − f "paces" announced by distinct parties, and uses this maxPace as input to invoke the tcv-BA[e] instance. When tcv-BA[e] returns some value syncPace, all parties decide that the fastlane in this epoch e has finalized syncPace optimistic blocks. In the worse case that a party did not yet receive all blocks up to syncPace, it would fetch the missing blocks from other honest parties by invoking the CallHelp function (cf. Fig. 13).
3) Pessimistic phase. This phase is not executed unless the optimistic fastlane of the current epoch e makes no progress at all, i.e., syncPace = 0. In the worst case, Pessimistic is invoked to guarantee that some blocks (e.g., one) can be generated no matter how bad the underlying network is, which becomes the last line of defense to ensure the critical liveness.

Remarks on Help and CallHelp. When a party realizes that it misses some fastlane blocks, it invokes the CallHelp function to broadcast a CALLHELP message, in which it specifies some blocks that shall be output but that it did not receive yet. The Help service is a daemon process activated once every party is initialized, and it takes care of CALLHELP messages. If the CallHelp function is invoked by some honest party, it can always successfully return the needed blocks, since at least f + 1 honest parties have indeed received all blocks up to syncPace and they are also running a Help daemon service to respond to the request. The techniques of erasure code and Merkle commitment tree [23, 47] are used in the Help daemon, such that each Help daemon only has to return a coded fragment of the fetched blocks, thus saving the communication complexity by an O(n) factor.

Remarks on alternative fastlane strategies. The exemplary fastlane instantiation in Figure 12 is parameterized by Esize to specify that each Bolt instance is terminated to run pace-synchronization after Esize fastlane blocks are output. When the fastlane is built from some Bolt instantiation such as Bolt-sCAST, it is possible that up to f honest parties might be much slower than the other honest parties (which can even be intentionally caused by a malicious fastlane leader). The strategy of "periodically" stopping fastlanes to invoke Transformer enables all parties to quickly sync up any missing blocks. Nevertheless, this Esize parameter is essentially optional. For example, it is possible to tweak the Help service and the CallHelp function to dismiss the Esize parameter: every party simply multicasts the epoch and slot of its latest fastlane block "periodically" via some checkpoint messages, so that an honest party can learn how many blocks it is currently missing after receiving enough checkpoint messages, and thus can ask others to fetch them. Nevertheless, we advocate the current design of using the Esize parameter and setting Esize not too large, since this gives each party a fair chance to be the leader of the fastlane, and more importantly, the periodic Transformer execution is essentially cheap due to its high efficiency.

Remarks on alternative pessimistic strategies. In the exemplary protocol description, the Pessimistic path invokes Dumbo to output one block only if the Transformer phase reveals that the fastlane output nothing. Nonetheless, this is not the only design choice, and there could be some global heuristics that estimate how many pessimistic blocks to output in the pessimistic path, according to publicly known information (e.g., how many times in a row the fastlane completely failed to output anything). Designing such a heuristic to better fit real-world Internet environments could be an interesting engineering question to explore in the future, but it does not impact any security analysis in this paper.

Security intuitions. We elaborate the intuitions of the safety and liveness of Bolt-Dumbo Transformer in the following, while deferring detailed proofs to Appendix B due to space limits.

Safety. The core ideas of proving agreement and total-order can be summarized as follows:
• Transformer returns a common index. Any two parties must obtain the same block index from Transformer, so they always agree on the same position from which to continue either a pessimistic path or a new fastlane. This is guaranteed by the agreement property of tcv-BA.
• Transformer returns an index not "too large". For the index returned from Transformer, at least f + 1 honest parties did receive all blocks (with valid proofs) up to this index. As such, if any party misses some blocks, it can easily fetch the correct blocks from these f + 1 parties. This is because the notarizability of Bolt prevents the adversary from forging a proof for a fastlane block with
// Optimistic Path (also the BDT protocol's main entry)

Every party Pi runs the protocol in consecutive epochs numbered e (initialized as 1) as follows:
• initialize: pe ← 0, Proofe ← ⊥, Pacese ← {}, pendinge ← ∅
• activate the Bolt[e] instance, and start a timer that expires if not being restarted after τ clock "ticks" (i.e., invoke timer(τ).start())
• upon Bolt[e] delivers a block:
  – parse block := ⟨e, p, TXsp, Proofp⟩, where p is the "slot" number
  – log.append(pendinge), buf ← buf \ {TXs in pendinge}, pendinge ← block // finalize the elder pending block, and let the newly delivered fastlane block be pending
  – pe ← p, Proofe ← Proofp, timer(τ).restart // Bolt[e] makes progress in time, so restart the "heartbeat" timer
• upon timer(τ) expires or pe = Esize or the front tx in the backlog buf was buffered T clock "ticks" ago:
  – invoke Bolt[e].abandon() and multicast PACESYNC(e, pe, Proofe) // the fastlane is probably stuck or censoring certain transactions
• upon receiving message PACESYNC(e, pje, Proofje) from Pj for the first time:
  – if Bolt.verify(e, pje, Proofje) = 1: Pacese ← Pacese ∪ pje
  – if |Pacese| = n − f: // enough nodes have already quit the fastlane
    ∗ invoke Transformer(e) and wait for its return to continue // enter the pace-synchronization phase
    ∗ proceed to the next epoch e ← e + 1 // restart the fastlane of the next epoch
......................................................................................................................................................................
// Pace Synchronization
internal function Transformer(e): // internal function shares all internal states of the BDT protocol
• let maxPacee ← max(Pacese) and then syncPacee ← tcv-BA[e](maxPacee) // see Fig. 9 or 10 for concrete implementations of tcv-BA
• if syncPacee > 0:
  – send PACESYNC(e, syncPacee, Proof) to all if syncPacee ∈ Pacese, where Bolt.verify(e, syncPacee, Proof) = 1
  – if syncPacee = pe: log.append(pendinge) and buf ← buf \ {TXs in pendinge}
  – if syncPacee = pe + 1:
    ∗ wait for a valid PACESYNC(e, syncPacee, Proof), then block′ ← Bolt.extract(e, syncPacee, Proof) // try to extract the missing block
    ∗ if block′ is in form of (e, syncPacee, ⊥, ⊥), then: block′ ← CallHelp(e, pe, 1) // failed to extract; rely on other nodes to fetch it, cf. Fig. 13
    ∗ log.append(pendinge).append(block′) and buf ← buf \ {TXs in pendinge and block′}
  – if syncPacee > pe + 1: blocks ← CallHelp(e, pe, syncPacee − pe) // contact other nodes to fetch missing fastlane blocks, cf. Fig. 13
    ∗ log.append(pendinge).append(blocks) and buf ← buf \ {TXs in pendinge and blocks}
  – continue Optimistic Path with e ← e + 1
• if syncPacee = 0: invoke Pessimistic(e) and wait for its return, then continue Optimistic Path with e ← e + 1
......................................................................................................................................................................
// Pessimistic Path
internal function Pessimistic(e): // internal function shares all internal states of the BDT protocol
• txsi ← randomly select ⌊B/n⌋-sized transactions from the first B-sized transactions at the top of buf
• xi ← TPKE.Enc(epk, txsi), namely, encrypt txsi to obtain xi
• {xj}j∈S ← ACS[e](xi), where S ⊂ [n] and |S| ≥ n − f
• for each j ∈ S, jointly decrypt the ciphertext xj to obtain txsj, so the payload TXs = ∪j∈S txsj ← {TPKE.Dec(epk, xj)}j∈S
• let block := ⟨e, 1, TXs, ⊥⟩, then log.append(block) and buf ← buf \ {TXs in block}

// This pessimistic path might be replaced by many asynchronous consensus protocols, e.g., ACS and MVBA. Throughout the paper, the pessimistic path is instantiated by Dumbo, i.e., using a reduction to ACS (HoneyBadger [47]) and an ACS implementation from Dumbo [41]. Essentially, the underlying ACS [41] can be replaced by any improved implementation.

Fig. 12: The Bolt-Dumbo Transformer (BDT) protocol
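The pendinge rule of Fig. 12, where a newly delivered fastlane block only becomes pending and the previously pending block is what gets finalized, can be mimicked with a one-slot buffer. The sketch below is illustrative only; epochs, proofs and the transaction buffer buf are omitted:

```python
# One-slot "pending" buffer from Fig. 12, simplified: a newly delivered
# fastlane block becomes pending, and the previously pending block is
# finalized (appended to the log). Proofs and epochs are omitted.

class PendingLog:
    def __init__(self):
        self.log = []        # finalized fastlane blocks
        self.pending = None  # delivered but not yet finalized

    def deliver(self, block):
        if self.pending is not None:
            self.log.append(self.pending)  # finalize the elder pending block
        self.pending = block               # the new block waits for one more

pl = PendingLog()
for slot in (1, 2, 3):
    pl.deliver({"slot": slot})
assert [b["slot"] for b in pl.log] == [1, 2]  # slot 3 is still pending
assert pl.pending["slot"] == 3
```

This one-block "safe buffer" is also why a transaction is not counted as finalized when its fastlane block is merely delivered: one more Bolt block must arrive first (see the basic-latency measurement in Section VI).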

// The Help daemon process

Help: It is a daemon process that can read the finalized output log of Bolt-Dumbo Transformer, and it listens to the following event:
• upon receiving message CALLHELP(e, tip, gap) from party Pj for the first time:
  – assert 1 ≤ gap ≤ Esize
  – wait for log containing the block := ⟨e, tip + gap, ∗, ∗⟩
  – let M ← retrieve all blocks in log from block := ⟨e, tip + 1, ∗, ∗⟩ to block := ⟨e, tip + gap, ∗, ∗⟩
    ∗ let {mk}k∈[n] be the fragments of an (n − 2f, n)-erasure code applied to M and h be a Merkle tree root computed over {mk}k∈[n]
    ∗ send HELP(e, tip, gap, h, mi, bi) to Pj, where mi is the i-th erasure-code fragment of M and bi is the i-th Merkle tree branch

// The CallHelp function

external function CallHelp(e, tip, gap):
• let F ← [ ] be a dictionary structure such that F[h] can store all leaves committed to Merkle tree root h
• multicast message CALLHELP(e, tip, gap)
• upon receiving the message HELP(e, tip, gap, h, mj, bj) from party Pj for the first time:
  – if bj is a valid Merkle branch for root h and leaf mj, then: F[h] ← F[h] ∪ (j, mj); otherwise discard the message
  – if |F[h]| = n − 2f, then:
    ∗ interpolate the n − 2f leaves stored in F[h] to reconstruct M, then parse M as a sequence of blocks and return blocks

Fig. 13: Help and CallHelp, where Help is a daemon process that can read the output log of BDT and CallHelp is a function used to call the Help service
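Each HELP message carries a Merkle branch that CallHelp verifies before accepting the fragment. A minimal binary Merkle tree over stand-in fragments is sketched below (illustrative only; the deployed code builds its commitments over zfec erasure-code fragments):

```python
import hashlib

# Minimal binary Merkle tree over erasure-code fragments (stand-ins here),
# mirroring the branch check that CallHelp applies to HELP messages.

def _h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    level = [_h(l) for l in leaves]
    while len(level) > 1:
        if len(level) % 2:                    # duplicate last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_branch(leaves, i):
    """Sibling hashes from leaf i up to the root."""
    level = [_h(l) for l in leaves]
    branch = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[i ^ 1])           # sibling of the current node
        level = [_h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return branch

def verify_branch(root, leaf, i, branch):
    node = _h(leaf)
    for sib in branch:
        node = _h(node + sib) if i % 2 == 0 else _h(sib + node)
        i //= 2
    return node == root

frags = [b"frag%d" % k for k in range(4)]     # stand-ins for (n-2f, n) fragments
root = merkle_root(frags)
assert all(verify_branch(root, frags[k], k, merkle_branch(frags, k)) for k in range(4))
```

A forged fragment fails the branch check and is discarded, exactly as the `otherwise discard the message` step in CallHelp prescribes.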

an index higher than the actually delivered block. So no honest party would input the index of an irretrievable block to tcv-BA, and then the validity of tcv-BA simply guarantees the claim.
• Transformer returns an index not "too small". No honest party would revoke any fastlane block that was already committed as a finalized output. Since each honest party waits for 2f + 1 PACESYNC messages from distinct parties, then due to the notarizability of Bolt, at least one PACESYNC message contains s − 1, where s is the latest fastlane block (among all parties). So every honest party inputs at least s − 1 to tcv-BA. The validity of tcv-BA then ensures the output to be at least s − 1 as well. Recall that there is a "safe buffer" to hold the latest fastlane block as a pending one, so the claim is correct.
• Pessimistic path and fastlane are safe. The safety of the pessimistic path is trivial to see from its immediate properties. The fastlane trivially has total-order by definition, and its weaker version of agreement (i.e., notarizability) is complemented by Transformer as argued above.

Liveness. The liveness of BDT stems from the liveness of all three phases. The liveness of the fastlane is fully guaranteed by the "timeout" parameter τ in the worst case. That means all honest parties can leave the fastlanes, and the protocol would not get "stuck" there. After that, all parties invoke tcv-BA and then obtain syncPace as the tcv-BA[e] output due to the termination of tcv-BA; moreover, if any honest party realizes that it misses some fastlane blocks after obtaining syncPace, it can easily sync up to the slot number syncPace within only two asynchronous rounds, because at least f + 1 honest parties would help it to fetch any missing blocks. That said, no honest party would get "stuck" during the Transformer phase. Finally, the honest parties enter the Pessimistic phase if the fastlanes completely fail to output anything, after which the protocol must successfully output the expected O(B)-sized transactions, thus ensuring liveness even in the worst cases.

Efficiency analysis. The complexities can be analyzed by considering each underlying module. Overall, BDT costs O(n) communicated bits per output transaction in either optimistic cases (where the synchrony assumption holds) or pessimistic cases (where the network becomes asynchronous). We defer the tedious complexity analysis on counting the exchanged bits to Section VII.

VI. PERFORMANCE EVALUATION OVER THE GLOBE

We perform extensive experiments to evaluate the prototype implementation of Bolt-Dumbo Transformer among Amazon EC2 instances distributed across five continents.

Implementation details. We implement BDT, Dumbo and HotStuff in the same language (i.e., Python 3), using the same libraries and security parameters for all underlying cryptographic implementations. All BFT protocols are implemented as single-core Python processes using gevent to handle concurrent tasks. Besides, a common network layer is programmed using unauthenticated TCP sockets. The network layer is implemented as a separate Python process to provide non-blocking communication interfaces.

The common coin is realized by hashing Boldyreva's pairing-based threshold signature, and we borrow the same implementation used in HoneyBadger BFT over the MNT224 curve [47]. For threshold signatures (except the one used for the common coin), we use ECDSA over the secp256k1 curve via the coincurve library (i.e., Python bindings for the libsecp256k1 library), which helps us achieve a Dumbo implementation that performs better than the original result [41] where Boldyreva's threshold signature [16] is used. For threshold public key encryption, the hybrid encryption approach implemented in HoneyBadger BFT is used [47]. For erasure coding, the Reed-Solomon implementation in the zfec library is used, the same as in [47] and [41]. We use the real clock in each EC2 instance to implement every BFT process's local time in lieu of the adversary-controlled "clock" in our formal security model.

The two Bolt instantiations given in Section IV, namely Bolt-sCAST and Bolt-sRBC, are implemented. BDT using Bolt-sCAST is denoted by BDT-sCAST, while the other instantiation using Bolt-sRBC is denoted by BDT-sRBC. In addition, BDT-Timeout denotes a variant that lets the fastlane be a simple idle procedure that does nothing and just waits for timeout to fall back, which can be used as a benchmark to "mimic" the worst case where the fastlanes always output nothing due to constant denial-of-service attacks.

Experiment setup on Amazon EC2. To demonstrate the applicability of our scheme in the real-world environment, we run BDT (as well as Dumbo and HotStuff for benchmarking) among Amazon EC2 c5.large instances that are equipped with two vCPUs and 4 GB main memory. We perform comprehensive evaluations for varying protocol parameters on 64 c5.large instances and 100 c5.large instances, respectively. All EC2 instances in our tests are evenly distributed in 16 regions across five continents, i.e., Virginia, Ohio, California, Oregon, Central Canada, São Paulo, Frankfurt, Ireland, London, Paris, Stockholm, Mumbai, Seoul, Singapore, Tokyo and Sydney.

In the tests, we might fix some parameters of BDT to intentionally amplify the overhead of fallback. For example, we let each fastlane be interrupted after outputting only 50 blocks, so Transformer is frequently invoked. We also set the fastlane's timeout parameter τ as large as 2.5 sec (nearly twenty times the one-way network latency in our test environment), so all fallbacks triggered by timeout would add an additional 2.5 sec of overhead besides the cost of Transformer.

Basic latency. In our experiments, we measure the latency of a transaction as the time elapsed between the moment when it appears at the front of a party's input backlog and the moment when it is output to the finalized log. It is worth noticing that a transaction is not finalized when it is included in a fastlane block that is just delivered, since this block is only a pending block, and we have to wait for one more Bolt block for its finalization to count the transaction's latency.

We first measure the basic latency in Figure 14 to reflect how fast the protocol is (in the good cases without faults or

timeouts), if all blocks have zero payload. This case provides us the baseline understanding of how fast BDT, HotStuff and Dumbo can be in the scenarios favoring low latency. Figure 14 reveals that, in terms of basic latency: when n = 100, BDT-sCAST is 36x faster than Dumbo, and BDT-sRBC is 23x faster than Dumbo; when n = 64, BDT-sCAST is 18x faster than Dumbo, and BDT-sRBC is 10x faster than Dumbo; moreover, the execution speeds of both BDT-sCAST and BDT-sRBC are at the same magnitude as HotStuff's. In particular, the basic latency of BDT-sCAST is almost the same as that of the two-chain version of HotStuff, which can be understood because the optimistic latency of BDT-sCAST is five rounds6, i.e., the same as that of two-chain HotStuff.

6 The five-round latency of BDT-sCAST in the best cases can be counted as follows: one round for the leader to multicast the proposed batch of transactions, one round for the parties to vote (by signing), one round for the leader to multicast the quorum proof consisting of enough signatures (thus all parties can get a pending block), and finally two more rounds for every party to receive one more block and therefore output the earlier pending block. The concrete rounds of BDT-sRBC in the best cases can be counted similarly.

Fig. 14: Minimum latency in experiments over wide-area network for two-chain HotStuff, BDT-sCAST, BDT-sRBC and Dumbo (if no fault).

Fig. 15: Highest throughput in experiments over wide-area network for two-chain HotStuff, BDT-sCAST, BDT-sRBC and Dumbo (if no fault).

Peak throughput. We measure throughput in units of transactions per second (where each transaction is a 250-byte string to approximate the size of a typical Bitcoin transaction). Therefore, the throughput can be counted as the number of output transactions over the running time of the test.

We show the peak throughput in Figure 15 to reflect the maximum capability of each protocol to process transaction bursts. This gives us insight into how well BDT, HotStuff and Dumbo can handle the application scenarios favoring throughput. The results show that: BDT-sCAST realizes a peak throughput that remains about 85% of HotStuff's when n is either 100 or 64; BDT-sRBC achieves a maximum throughput that is as high as around 90% of Dumbo's for the n = 64 case and about 85% of Dumbo's for the n = 100 case. All these throughput peaks are achieved despite frequent Transformer executions, as we intentionally let each fastlane fall back after outputting a mere 50 blocks.

Overhead of our pace-sync Transformer. It is critical for us to understand the extra cost incurred by Transformer, since a cheaper Transformer frees us from worrying about the overhead of optimistic path failures, and therefore more aggressive design spaces, e.g., a smaller timeout parameter τ, can be attempted to squeeze performance out of any periods when the network is on good days. We estimate such overhead from two different perspectives, as shown in Figures 16 and 17.

First, as shown in Figure 16, we measure the execution time of Transformer in various settings by taking combinations of the following setups: (i) BDT-sCAST or BDT-sRBC; (ii) 1/3 crashes on or off; (iii) 64 EC2 instances or 100 EC2 instances. Moreover, using the MVBA in [21] for fallback is also presented as the basic benchmark. The results indicate that: in case of executing BDT-sRBC, Transformer always takes less than 1 second, regardless of n and of whether crashes are on or off, which is really cheap relative to the high cost of running an MVBA fallback; for example, the MVBA fallback is 10 times slower than our Transformer in the BDT-sRBC instantiation when n = 100. In all cases, BDT-sCAST has a Transformer phase longer than that in BDT-sRBC, which is because in BDT-sRBC all parties are more likely to take the same input to tcv-BA, making this randomized module easier to terminate.

Fig. 16: Latency of running Transformer for pace synchronization in BDT-sCAST and BDT-sRBC (when there is no fault and 1/3 crash, respectively). MVBA fallback in RC05 is also tested and reported as a reference result.

Second, as illustrated in Figure 17, we measure the latency-throughput trade-offs for BDT-Timeout, namely, to see how much worse BDT is than Dumbo when BDT's fastlane is under denial-of-service. This is arguably the worst-case test vector for BDT, since relative to Dumbo, it always costs an extra 2.5 seconds to time out and then executes the extra Transformer
14
subprotocol. Nevertheless, the result shows that even with so much extra cost, the performance degradation is still acceptable relative to Dumbo, since for the same throughput, BDT only spends a few more seconds (which is largely caused by our conservative 2.5 sec timeout parameter).

Fig. 17: Latency v.s. throughput for experiments of BDT that has an idling fastlane (i.e., just wait 2.5 seconds to run Transformer and Dumbo).

Varying batch sizes. To understand to what extent the batch size matters, we report how throughput and latency depend on varying batch sizes in Figures 18 and 19, respectively.

Figure 18 illustrates how latency increases with larger batch sizes in BDT-sCAST, BDT-sRBC, HotStuff and Dumbo when n = 64 and n = 100, respectively. It clearly shows that: Dumbo always has a latency much larger than BDT-sCAST, BDT-sRBC and HotStuff; for BDT-sRBC, its latency increases much more slowly than BDT-sCAST's and HotStuff's; in particular, when B = 10000, the latency of BDT-sRBC is around one second only, while those of BDT-sCAST and HotStuff are more than 2 seconds. The slow increase of BDT-sRBC's latency is mainly because of its better-balanced network load pattern.

Fig. 18: Latency v.s. batch size for experiments over wide-area network when n = 64 and n = 100, respectively (in case of no fault).

Figure 19 illustrates how throughput increases with larger batch sizes in BDT-sCAST, BDT-sRBC, HotStuff and Dumbo when n = 64 and n = 100, respectively. Dumbo really needs a very large batch size to realize an acceptable throughput; BDT-sCAST and HotStuff have a similar trend in that the throughput stops increasing soon after the batch sizes become larger (e.g., 10000 transactions per block); in contrast, the throughput of BDT-sRBC increases faster than those of BDT-sCAST and HotStuff, because larger batch sizes in BDT-sRBC would not place a much worse bandwidth load on the leader, and thus can bring a more significant increase in throughput.

Fig. 19: Throughput v.s. batch size for experiments over wide-area network when n = 64 and n = 100, respectively (in case of no fault).

Latency-throughput trade-off (no fault). It is shown in Figure 20 that either BDT-sCAST or BDT-sRBC is much faster than Dumbo by several orders of magnitude in all cases, while the two BDT instantiations favor distinct scenarios. BDT-sCAST has a latency-throughput trade-off similar to that of HotStuff, and favors latency more than Dumbo does. For example, when fixing the throughput around 5000 tps, BDT-sCAST has a latency of only about 0.5 sec when n = 64, and a latency of about 0.9 sec when n = 100; in contrast, Dumbo needs about 10 sec or 20 sec to realize the same throughput for n = 64 and n = 100, respectively. BDT-sRBC has a latency-throughput trend quite different from HotStuff's and BDT-sCAST's. Namely, when fixing a larger throughput, BDT-sRBC has a latency less than BDT-sCAST's; when fixing a small throughput, BDT-sRBC could be slower. This separates them clearly in terms of application scenarios, since BDT-sRBC is a better choice for throughput-favoring cases and BDT-sCAST is more suitable for latency-sensitive scenarios.

Fig. 20: Throughput v.s. latency for experiments over wide-area network when n = 64 and n = 100, respectively (in case of no fault).
Latency-throughput trade-off (1/3 crashes). We also report the latency-throughput trade-off in the presence of 1/3 crashes. The crashes not only lag the execution of all protocols, but also mimic that a portion of the Bolt instances are under denial-of-service. We fix the batch size of the pessimistic path in BDT-sCAST and BDT-sRBC as 10^6 transactions in these tests, because this batch size parameter brings a reasonable throughput-latency trade-off in Dumbo. As shown in Figure 21, both BDT-sCAST and BDT-sRBC have some design spaces that show a latency better than Dumbo's and present a throughput always better than HotStuff's, despite that on average 1/3 of the Bolt instances are unluckily stuck waiting 2.5 sec to time out without returning any optimistic output. That means our practical BDT framework does create a new design space to harvest the best of both paths, so that we can achieve reasonable throughput and latency simultaneously in fluctuating deployment environments.

Fig. 21: Throughput v.s. latency for experiments over wide-area network when n = 64 and n = 100, respectively (in case of 1/3 crash fault). We fix the fallback batch size of BDT instances to 10^6 transactions in all tests.

VII. COMPLEXITY AND NUMERICAL ANALYSES

This section discusses the critical complexity metrics of the BDT framework and those of its major modules. We then assign each module a running time cost according to our real-world experimental data, thus enabling more precise numerical analysis to estimate the expected latency of BDT in various "simulated" unstable deployment environments.

Complexity analysis. Here we analyze BDT regarding its complexities. Recall that we assume the batch size B is sufficiently large, e.g., Ω(λn^2 log n), throughout the paper.

Complexities of the fastlane (also of the optimistic cases). For the optimistic fastlane, we have two instantiations, namely Bolt-sCAST and Bolt-sRBC. As shown in Table III, Bolt-sCAST has linear O(n) per-block message complexity, and the leader's per-block bandwidth usage O(nB) is also linear in n; Bolt-sRBC has quadratic per-block message complexity of O(n^2), while the per-block bandwidth usage of every party is not larger than the batch size O(B); for both "fastlane" instantiations, though the generation of each Bolt block costs exactly O(1) rounds, Bolt-sCAST uses slightly fewer concrete asynchronous rounds on average.

Note that in the optimistic case, when (i) the fastlane leaders are always honest and (ii) the network condition is benign such that the fastlanes never time out, the Pessimistic phase is not executed, so the fastlane costs shown in Table III also reflect the amortized complexities of the overall BDT protocol (in case the epoch size Esize is large enough, e.g., n).

TABLE III: Per-block performance of different Bolt instantiations (which is also the per-block cost of BDT in the optimistic cases)

              Msg.      Comm.    Round    Bandwidth (Leader)    Bandwidth (Others)
Bolt-sCAST    O(n)      O(nB)    O(1)     O(nB)                 O(B)
Bolt-sRBC     O(n^2)    O(nB)    O(1)     O(B)                  O(B)

Complexities of the worst cases (disregarding the adversary). In the optimistic fastlane, there is a worst-case overhead of using O(τ) asynchronous rounds to leave the tentatively optimistic execution without outputting any valid blocks. After the stop of the fastlane, all parties enter the Transformer phase and participate in tcv-BA, in which the expected message complexity is O(n^2), the expected communication complexity is O(λn^2), and the expected bandwidth cost of each party is O(λn). Besides, if the output value of tcv-BA is larger than zero, then the CallHelp subroutine could be invoked; this process incurs O(n^2) overall message complexity and O(nB) per-block communication complexity, and causes each party to spend O(B) bandwidth to fetch each block on average. In the worst case, Dumbo is executed after Transformer, which on average costs overall O(n^3) messages, overall O(nB) communicated bits, and O(B) bandwidth usage for each party. The latency of generating a block in the pessimistic path is O(1) rounds on average. Summarizing these, we have the worst-case per-block performance illustrated in Table IV. Note that the O(τ) worst-case running time of generating a block reflects two possible worst cases: (i) the fastlane finalizes few blocks (e.g., one) and then times out, so O(τ) has to be added to the few fastlane blocks; or (ii) the fastlane finalizes nothing and then times out, and thus O(τ) must be added to the pessimistic Dumbo blocks. Nevertheless, the timeout parameter τ in our system is not necessarily close to the network delay upper bound ∆, and it can represent some adversary-controlled "clock ticks" to approximate the number of asynchronous rounds spent to generate each fastlane block, i.e., O(τ) = O(1).

TABLE IV: Per-block performance of BDT in the worst cases

             Msg.      Comm.    Round     Bandwidth (Leader)    Bandwidth (Others)
BDT-sCAST    O(n^3)    O(nB)    O(τ)*     O(nB)                 O(B)
BDT-sRBC     O(n^3)    O(nB)    O(τ)*     O(B)                  O(B)
* Note O(τ) can be close to O(1), since we essentially let τ "clock ticks" approximate the delay of generating a new fastlane block in Bolt.

Complexities in comparison to other BFT consensuses. Here we also summarize the communication complexities of BDT and some known BFT protocols in the optimistic case and the worst case, respectively.
protocols in an unstable network environment, Table V also lists each protocol's (normalized) average latency. This metric considers that the fastlane has a probability α ∈ [0, 1] to output blocks in time, and a chance β = 1 − α to fall back and then execute the pessimistic asynchronous protocol. Let C be the latency of using the earlier asynchronous protocols [21, 47] directly as the pessimistic path, and let c be that of the state-of-the-art Dumbo BFT [41] plus that of using MVBA for synchronization during fallback. Both C and c are represented in units of fastlane latency. According to the experimental data [41, 47], C is normally in the hundreds and c is typically dozens ([47] runs n instances of ABBA, so its rounds depend on the number of parties, while [41] reduces this to a constant). Our fallback is almost as fast as the fastlane, so its magnitude is around one. As such, we can do a simple calculation, as shown in Table V, to roughly estimate the latency of all those protocols deployed in a realistic fluctuating network.

TABLE V: Complexities of BFT protocols in various settings (where B is sufficiently large s.t. all λ terms are omitted, and α + β = 1)

Protocol         Per-block Com. Compl.       Normalized average latency
                 Optim.        Worst         (considering fastlane latency as 1)
PBFT [27]        O(nB)         ∞             1/α
HotStuff [63]    O(nB)         ∞             1/α
HBBFT [47]       O(nB)         O(nB)         C
Dumbo [41]       O(nB)         O(nB)         c
KS02 [44]        O(n^2 B)      O(n^3 B)      (α + β/(C + kc))^-1 *
RC05 [59]        O(nB)         O(n^3 B)      (α + β/(C + c))^-1
BDT (ours)       O(nB)         O(nB)         (α + β/(c + 1))^-1

* There is an integer parameter k in [44] to specify the degree of parallelism for the fastlane, thus probably incurring extra cost of fallback.

Fig. 22: Numerical analysis to reflect the average latency of BDT and RC05 [59] in a fluctuating deployment environment. The analysis methodology is similar to the formulas in Table V, except that more protocol parameters, such as batch size, epoch size, and timeout, are considered here. (Panel (a): when some fastlanes fail in the beginning; panel (b): when some fastlanes fail in the middle. Both panels plot average latency in seconds against the percentage of failing fastlanes, showing the curves "BDT with failed fastlanes" and "RC05 with failed fastlanes" together with the reference lines "Latency of Dumbo" and "Latency of Fastlane".)
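To make the average-latency formulas in Table V concrete, they can be evaluated directly. The sketch below is illustrative only: C = 200 and c = 20 merely stand in for "hundreds" and "dozens" and are not measured constants.

```python
# Normalized average latency from Table V, in units of fastlane latency.
# alpha: probability that the fastlane outputs blocks in time; beta = 1 - alpha.
# C, c are assumed illustrative values, not measured constants.

def avg_latency_rc05(alpha, C, c):
    # RC05's fallback runs the pessimistic path plus MVBA: (alpha + beta/(C+c))^-1
    return 1.0 / (alpha + (1 - alpha) / (C + c))

def avg_latency_bdt(alpha, c):
    # BDT's fallback costs about one fastlane latency: (alpha + beta/(c+1))^-1
    return 1.0 / (alpha + (1 - alpha) / (c + 1))

C, c = 200.0, 20.0
for alpha in (1.0, 0.5, 0.0):
    print(f"alpha={alpha}: RC05 {avg_latency_rc05(alpha, C, c):.1f}, "
          f"BDT {avg_latency_bdt(alpha, c):.1f}")
```

At α = 1 both formulas degenerate to the fastlane latency 1, and at α = 0 they degenerate to the pure pessimistic costs C + c (RC05) and c + 1 (BDT), matching the table.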
Numerical analysis on latency in unstable network. To understand the applicability of the BDT framework, we further conduct more precise numerical estimations to visualize the average latency of BDT and prior art (e.g., RC05) in an unstable Internet deployment environment, in particular for some typical scenarios between the best and the worst cases. The real-world experiment data shown in Section VI are used to specify the cost of each protocol module in the estimations. In particular, we use our experimental results over the globe when n = 100 to specify the parameters used in the numerical estimations regarding both RC05 and BDT: we set the latency of the fastlane to 1 second (to reflect the actual latency of Bolt) and the latency of the pessimistic path to 20 seconds (according to the measured latency of Dumbo); the fastlane block and the pessimistic block are set to contain 10^4 and 10^6 transactions, respectively; the MVBA fallback in RC05 is set to use 10 seconds and our Transformer is set to cost 1 second (cf. Figure 17); for a fair comparison, we let RC05 use the state-of-the-art Dumbo protocol as its pessimistic path; other protocol parameters (e.g., epoch size and timeout) are also taken into consideration and are set the same as those in the experiments. Note that a "second" in the simulations is a measurement of virtual time (normalized by the fastlane latency) rather than a second in the real world.

We consider two simulated scenarios. One is illustrated in Figure 22 (a), in which some portion of the fastlane instances completely fail and output nothing but just timeout and fall back after idling for 2.5 seconds, while the other fastlane instances successfully output all optimistic blocks. In this case, BDT can save up to almost 10 seconds of average latency relative to RC05. This is a result of the much more efficient fallback mechanism; more importantly, the efficient fallback brings much more robust performance against an unstable network environment: for example, RC05 starts to perform worse than Dumbo once more than 45% of the fastlane instances completely fail at the beginning of their executions, while BDT remains faster than Dumbo until more than 75% of the fastlane instances completely fail. The other case is shown in Figure 22 (b), where some fastlane instances stop progressing in the middle of their executions (e.g., after 25 optimistic blocks are finalized) and then wait for 2.5 seconds to timeout and fall back. In this case, BDT performs almost as fast as its underlying fastlane (i.e., the average delay is really close to 1 second) despite the overheads of timeout and Transformer in this fluctuating network condition; in contrast, RC05 can be an order of magnitude slower than BDT, and it would be even slower than Dumbo if more than 55% of the fastlane instances fail to progress in the middle of their optimistic executions.

VIII. CONCLUSION AND REFLECTION

BDT showcases low latency (same as the pipelined consensus HotStuff) in normal network conditions, and presents
robust throughput (same as the underlying asynchronous protocol) in even adversarial environments. To our knowledge, this is the first optimistic asynchronous atomic broadcast realizing such a level of applicability, thanks to a highly efficient pace synchronization that is reduced to only a binary agreement.

Besides, BDT realizes an important by-product of a responsive fastlane. Due to the highly efficient fallback in BDT, engineers now can aggressively set the timeout parameter of the fastlane according to their optimistic estimation about benign network conditions for good days only. Doing so can quickly replace a slow (or even malicious) fastlane leader, letting the fastlane run at full horsepower and even become responsive (i.e., the protocol's latency only relies on the actual network delay instead of the delay upper bound). In contrast, practitioners have to cautiously tweak their timeout policy in (partial) synchronous protocols, e.g., using some not-so-small timeout parameter τ as the estimated upper bound ∆ for network delay, or adaptively increasing τ by some heuristics [18, 26, 52], which is needed to ensure the synchrony assumption holds for liveness. Unfortunately, the large timeout parameter in (partial) synchronous protocols rules out the responsiveness property, since their latency would rely on the conservatively estimated timeout parameter τ > ∆.

ACKNOWLEDGMENT

We would like to thank Vincent Gramoli for his helpful feedback on the results presented in the paper.

REFERENCES

[1] "Bug in aba protocol's use of common coin," https://github.com/amiller/HoneyBadgerBFT/issues/59, accessed: 2021-4-10.
[2] I. Abraham, D. Dolev, and J. Y. Halpern, "An almost-surely terminating polynomial protocol for asynchronous byzantine agreement with optimal resilience," in Proc. PODC 2008, pp. 405–414.
[3] I. Abraham, D. Malkhi, and A. Spiegelman, "Asymptotically optimal validated asynchronous byzantine agreement," in Proc. PODC 2019, pp. 337–346.
[4] I. Abraham, K. Nayak, L. Ren, and Z. Xiang, "Good-case latency of byzantine broadcast: a complete categorization," in Proc. PODC 2021, pp. 331–341.
[5] I. Abraham, D. Malkhi, K. Nayak, L. Ren, and M. Yin, "Sync hotstuff: Simple and practical synchronous state machine replication," in Proc. IEEE S&P 2019.
[6] Y. Amir, B. Coan, J. Kirsch, and J. Lane, "Prime: Byzantine replication under attack," IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 4, pp. 564–577, 2010.
[7] Y. Amoussou-Guenou, A. Del Pozzo, M. Potop-Butucaru, and S. Tucci-Piergiovanni, "Correctness of tendermint-core blockchains," in Proc. OPODIS 2018.
[8] H. Attiya and J. Welch, Distributed Computing: Fundamentals, Simulations, and Advanced Topics. John Wiley & Sons, 2004, vol. 19.
[9] P.-L. Aublin, S. B. Mokhtar, and V. Quéma, "Rbft: Redundant byzantine fault tolerance," in Proc. ICDCS 2013, pp. 297–306.
[10] M. Ben-Or, "Another advantage of free choice (extended abstract): completely asynchronous agreement protocols," in Proc. PODC 1983, pp. 27–30.
[11] M. Ben-Or and R. El-Yaniv, "Resilient-optimal interactive consistency in constant time," Distributed Computing, vol. 16, no. 4, pp. 249–262, 2003.
[12] M. Ben-Or, B. Kelmer, and T. Rabin, "Asynchronous secure computations with optimal resilience," in Proc. PODC 1994, pp. 183–192.
[13] A. Bessani, J. Sousa, and E. E. Alchieri, "State machine replication for the masses with bft-smart," in Proc. DSN 2014, pp. 355–362.
[14] E. Blum, J. Katz, and J. Loss, "Synchronous consensus with optimal asynchronous fallback guarantees," in Proc. TCC 2019, pp. 131–150.
[15] E. Blum, C.-D. Liu-Zhang, and J. Loss, "Always have a backup plan: fully secure synchronous mpc with asynchronous fallback," in Proc. CRYPTO 2020, pp. 707–731.
[16] A. Boldyreva, "Threshold signatures, multisignatures and blind signatures based on the gap-diffie-hellman-group signature scheme," in Proc. PKC 2003, pp. 31–46.
[17] G. Bracha, "Asynchronous byzantine agreement protocols," Information and Computation, vol. 75, no. 2, pp. 130–143, 1987.
[18] M. Bravo, G. Chockler, and A. Gotsman, "Making byzantine consensus live," in Proc. DISC 2020.
[19] V. Buterin, "A next-generation smart contract and decentralized application platform."
[20] C. Cachin, K. Kursawe, A. Lysyanskaya, and R. Strobl, "Asynchronous verifiable secret sharing and proactive cryptosystems," in Proc. CCS 2002, pp. 88–97.
[21] C. Cachin, K. Kursawe, F. Petzold, and V. Shoup, "Secure and efficient asynchronous broadcast protocols," in Proc. CRYPTO 2001, pp. 524–541.
[22] C. Cachin, K. Kursawe, and V. Shoup, "Random oracles in constantinople: practical asynchronous byzantine agreement using cryptography," in Proc. PODC 2000, pp. 123–132.
[23] C. Cachin and S. Tessaro, "Asynchronous verifiable information dispersal," in Proc. SRDS 2005, pp. 191–201.
[24] C. Cachin and M. Vukolic, "Blockchain consensus protocols in the wild (keynote talk)," in Proc. DISC 2017.
[25] R. Canetti and T. Rabin, "Fast asynchronous byzantine agreement with optimal resilience," in Proc. STOC 1993, pp. 42–51.
[26] M. Castro and B. Liskov, "Practical byzantine fault tolerance and proactive recovery," ACM Transactions on Computer Systems (TOCS), vol. 20, no. 4, pp. 398–461, 2002.
[27] M. Castro, B. Liskov et al., "Practical byzantine fault tolerance," in Proc. OSDI 1999, pp. 173–186.
[28] B. Y. Chan and E. Shi, "Streamlet: Textbook streamlined blockchains," in Proc. AFT 2020, pp. 1–11.
[29] M. Correia, N. F. Neves, and P. Veríssimo, "From consensus to atomic broadcast: Time-free byzantine-resistant protocols without signatures," The Computer Journal, vol. 49, no. 1, pp. 82–96, 2006.
[30] T. Crain, "Experimental evaluation of asynchronous binary byzantine consensus algorithms with t < n/3 and O(n^2) messages and O(1) round expected termination," arXiv preprint arXiv:2004.09547, 2020.
[31] S. Duan, M. K. Reiter, and H. Zhang, "Beat: Asynchronous bft made practical," in Proc. CCS 2018, pp. 2028–2041.
[32] C. Dwork, N. Lynch, and L. Stockmeyer, "Consensus in the presence of partial synchrony," JACM, vol. 35, no. 2, pp. 288–323, 1988.
[33] E. Blum, J. Katz, and J. Loss, "Tardigrade: An atomic broadcast protocol for arbitrary network conditions," in Proc. ASIACRYPT 2021.
[34] M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of distributed consensus with one faulty process," Journal of the Association for Computing Machinery, vol. 32, no. 2, pp. 374–382, 1985.
[35] M. Fitzi and J. A. Garay, "Efficient player-optimal protocols for strong and differential consensus," in Proc. PODC 2003, pp. 211–220.
[36] R. Gelashvili, L. Kokoris-Kogias, A. Sonnino, A. Spiegelman, and Z. Xiang, "Jolteon and ditto: Network-adaptive efficient consensus with asynchronous fallback," arXiv preprint arXiv:2106.10362, 2021.
[37] R. Gelashvili, L. Kokoris-Kogias, A. Spiegelman, and Z. Xiang, "Be prepared when network goes bad: An asynchronous view-change protocol," arXiv preprint arXiv:2103.03181, 2021.
[38] R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin, "Secure distributed key generation for discrete-log based cryptosystems," in Proc. EUROCRYPT 1999, pp. 295–310.
[39] R. Guerraoui, N. Knežević, V. Quéma, and M. Vukolić, "The next 700 bft protocols," in Proc. EuroSys 2010, pp. 363–376.
[40] G. G. Gueta, I. Abraham, S. Grossman, D. Malkhi, B. Pinkas, M. Reiter, D.-A. Seredinschi, O. Tamir, and A. Tomescu, "Sbft: a scalable and decentralized trust infrastructure," in Proc. DSN 2019, pp. 568–580.
[41] B. Guo, Z. Lu, Q. Tang, J. Xu, and Z. Zhang, "Dumbo: Faster asynchronous bft protocols," in Proc. CCS 2020, pp. 803–818.
[42] A. Kate and I. Goldberg, "Distributed key generation for the internet," in Proc. ICDCS 2009, pp. 119–128.
[43] E. Kokoris Kogias, D. Malkhi, and A. Spiegelman, "Asynchronous
distributed key generation for computationally-secure randomness, consensus, and threshold signatures," in Proc. CCS 2020, pp. 1751–1767.
[44] K. Kursawe and V. Shoup, "Optimistic asynchronous atomic broadcast," in Proc. ICALP 2005, first announced in 2002, pp. 204–215.
[45] J. Loss and T. Moran, "Combining asynchronous and synchronous byzantine agreement: The best of both worlds," IACR Cryptol. ePrint Arch., vol. 2018, p. 235, 2018.
[46] Y. Lu, Z. Lu, Q. Tang, and G. Wang, "Dumbo-mvba: Optimal multi-valued validated asynchronous byzantine agreement, revisited," in Proc. PODC 2020, pp. 129–138.
[47] A. Miller, Y. Xia, K. Croman, E. Shi, and D. Song, "The honey badger of bft protocols," in Proc. CCS 2016, pp. 31–42.
[48] A. Momose, J. P. Cruz, and Y. Kaji, "Hybrid-bft: Optimistically responsive synchronous consensus with optimal latency or resilience," IACR Cryptol. ePrint Arch., vol. 2020, p. 406, 2020.
[49] A. Momose and L. Ren, "Multi-threshold byzantine fault tolerance," in Proc. CCS 2021.
[50] A. Mostefaoui, H. Moumen, and M. Raynal, "Signature-free asynchronous byzantine consensus with t < n/3 and O(n^2) messages," in Proc. PODC 2014, pp. 2–9.
[51] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," 2008.
[52] O. Naor, M. Baudet, D. Malkhi, and A. Spiegelman, "Cogsworth: Byzantine view synchronization," arXiv preprint arXiv:1909.05204, 2019.
[53] R. Pass and E. Shi, "The sleepy model of consensus," in Proc. ASIACRYPT 2017, pp. 380–409.
[54] A. Patra, "Error-free multi-valued broadcast and byzantine agreement with optimal communication complexity," in Proc. OPODIS 2011, pp. 34–49.
[55] A. Patra, A. Choudhary, and C. Pandu Rangan, "Simple and efficient asynchronous byzantine agreement with optimal resilience," in Proc. PODC 2009, pp. 92–101.
[56] T. P. Pedersen, "A threshold cryptosystem without a trusted party," in Proc. EUROCRYPT 1991, pp. 522–526.
[57] R. Pass and E. Shi, "Thunderella: Blockchains with optimistic instant confirmation," in Proc. EUROCRYPT 2018, pp. 3–33.
[58] M. O. Rabin, "Randomized byzantine generals," in Proc. FOCS 1983, pp. 403–409.
[59] H. V. Ramasamy and C. Cachin, "Parsimonious asynchronous byzantine-fault-tolerant atomic broadcast," in Proc. OPODIS 2005, pp. 88–102.
[60] N. Shrestha, I. Abraham, L. Ren, and K. Nayak, "On the optimality of optimistic responsiveness," in Proc. CCS 2020, pp. 839–857.
[61] A. Spiegelman, "In search for an optimal authenticated byzantine agreement," in Proc. DISC 2021.
[62] G. S. Veronese, M. Correia, A. N. Bessani, and L. C. Lung, "Spin one's wheels? byzantine fault tolerance with a spinning primary," in Proc. SRDS 2009, pp. 135–144.
[63] M. Yin, D. Malkhi, M. K. Reiter, G. G. Gueta, and I. Abraham, "Hotstuff: Bft consensus with linearity and responsiveness," in Proc. PODC 2019, pp. 347–356.

APPENDIX A
DEFERRED PROOFS FOR Bolt CONSTRUCTIONS

Lemma 1: The algorithm in Figure 6 satisfies the total-order, notarizability, abandonability and optimistic liveness properties of Bolt except with negligible probability.

Proof: We prove the properties one by one.
For total-order: First, we prove that if any two honest parties P_i and P_j return block_i and block_j at the same position, respectively, then block_i = block_j. It is clear that if the honest party P_i outputs block_i, then at least f + 1 honest parties voted for block_i, because TSIG.Vrfy_{2f+1} passes verification. Similarly, at least f + 1 honest parties voted for block_j. That means at least one honest party voted for both blocks, so block_i = block_j.
For notarizability: Suppose a party P_i outputs blocks[j] := ⟨id, j, TXs_j, Proof_j⟩; this means at least f + 1 honest parties voted for block[j]. According to the pseudocode, at least those same f + 1 honest parties already output blocks[j − 1] and received TXs_j; hence, those honest parties can further use the valid Proof_j to extract blocks[j] from the received protocol messages.
For abandonability: it is immediate from the pseudocode of the abandon interface.
For optimistic liveness: suppose the optimistic condition is that the leader is honest; then any honest party would output block[1] in three asynchronous rounds after entering the protocol and would output log[j + 1] within two asynchronous rounds after outputting log[j] (for all j ≥ 1). □

Lemma 2: The algorithm in Figure 7 satisfies the total-order, notarizability, abandonability and optimistic liveness properties of Bolt except with negligible probability.

Proof: It is clear that total-order, notarizability and abandonability follow immediately from the properties of RBC and the pseudocode of Bolt-sRBC, since the agreement of RBC guarantees that the TXs output by any parties are the same, and a valid proof, together with the sequentially executing nature of all RBC instances, ensures total-order and notarizability. For optimistic liveness, the optimistic condition remains that the leader is honest, and κ is 4 due to the RBC construction in [47] and an extra vote step. □

APPENDIX B
DEFERRED PROOFS FOR Bolt-Dumbo Transformer

Safety proof. We first prove the total-order and agreement properties, assuming the underlying Bolt, tcv-BA and ACS are secure.

Claim 1: If an honest party activates tcv-BA[e], then at least n − 2f honest parties have already invoked abandon(e); and from now on, supposing these same parties invoked abandon(id) before they output block := ⟨e, R, ·, ·⟩, no party (including the faulty ones) can receive (or forge) a valid block := ⟨e, R + 1, ·, ·⟩, and all honest parties would activate tcv-BA[e].

Proof: When an honest party P_i activates tcv-BA[e], it must have received n − f valid PACESYNC messages from distinct parties, so at least n − 2f honest parties must have multicast PACESYNC messages. By the pseudocode, this also means that at least n − 2f honest parties have invoked abandon(e). Note that n − 2f ≥ f + 1 and these same parties invoked abandon(id) before they output block := ⟨e, R, ·, ·⟩, so no party would deliver any valid block := ⟨e, R + 1, ·, ·⟩ in this epoch's Bolt phase, due to the abandonability property of Bolt. This also implies that no party can receive a valid block := ⟨e, R + 1, ·, ·⟩ from Bolt, so all honest parties will eventually be interrupted by the "timeout" mechanism after τ asynchronous rounds and then multicast PACESYNC messages. This ensures that all honest parties finally receive n − f valid PACESYNC messages from distinct parties, causing all honest parties to activate tcv-BA[e]. □

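The quorum counting used in Claim 1 (and repeatedly below) can be checked mechanically: waiting for n − f messages guarantees at least n − 2f ≥ f + 1 honest senders, and any two (n − f)-quorums share at least one honest party. The sketch below is illustrative and assumes the standard n = 3f + 1 setting; the function name is our own.

```python
# Quorum arithmetic behind Claim 1 (illustrative, assuming n = 3f + 1).

def quorum_bounds(f):
    n = 3 * f + 1
    quorum = n - f                   # messages awaited from distinct parties
    honest_in_quorum = quorum - f    # at worst, f of the senders are Byzantine
    # two quorums overlap in >= 2*quorum - n parties, of which at most f are Byzantine
    honest_overlap = 2 * quorum - n - f
    return n, quorum, honest_in_quorum, honest_overlap

for f in range(1, 34):
    n, q, hq, ho = quorum_bounds(f)
    assert hq >= f + 1   # n - 2f >= f + 1 honest PACESYNC multicasts
    assert ho >= 1       # any two (n-f)-quorums share an honest party
```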
Claim 2: Suppose that some party receives a valid block := ⟨e, R, ·, ·⟩ from Bolt[e] when an honest party invokes tcv-BA[e], s.t. this block is the one with the largest slot number among all parties' valid pending blocks (i.e., the union of the honest parties' actual pending blocks and the malicious parties' arbitrary valid Bolt[e] blocks). Then all honest parties' maxPace_e must be the same and be either R or R − 1.

Proof: Following Claim 1, once an honest party invokes tcv-BA[e], the Bolt[e] block with the largest slot number R_max would not change anymore. Let us call this already-fixed Bolt[e] block with the highest slot number block_max. Since some party receives block_max := ⟨e, R, ·, ·⟩, at least f + 1 honest parties (denoted by Q) have already received the block ⟨e, R − 1, ·, ·⟩, because of the notarizability property of Bolt. So these honest parties would broadcast a valid PACESYNC(e, R − 1, ·) message or a valid PACESYNC(e, R, ·) message. According to the pseudocode in Figure 12, maxPace_e is the maximum number in the set Paces_e, where Paces_e contains the slot numbers encapsulated by n − f valid PACESYNC messages. Therefore, Paces_e must contain a PACESYNC message's slot number from at least n − 2f ≥ f + 1 honest parties (denoted by Q̄). All honest parties' local Paces_e sets must contain R − 1 and/or R, because Q̄ and Q contain at least one common honest party. Moreover, there is no valid PACESYNC message containing any slot larger than R, since the proof for that is unforgeable, which means R is the largest possible value in all honest parties' Paces_e. So any honest party's maxPace_e must be R or R − 1. □

Claim 3: No honest party would get a syncPace_e smaller than the slot number of its latest finalized block log[−1] (i.e., no block finalized in some honest party's log can be revoked).

Proof: Suppose an honest party invokes tcv-BA[e] and a valid Bolt[e] block ⟨e, R, ·, ·⟩ is the one with the largest slot number among the union of the honest parties' actual pending blocks and the malicious parties' arbitrary valid Bolt[e] blocks. Because of Claim 2, all honest parties will activate tcv-BA[e] taking either R or R − 1 as input. According to the strong validity of tcv-BA, the output syncPace_e of tcv-BA[e] must be either R or R − 1. Then we consider the next two cases:
1) Only malicious parties have this ⟨e, R, ·, ·⟩ block;
2) Some honest party P_i also has the ⟨e, R, ·, ·⟩ block.
For Case 1): Due to the notarizability property of Bolt and this case's premise, there exist f + 1 honest parties (denoted by a set Q) that have the block ⟨e, R − 1, ·, ·⟩ as their local pending block. Note that the remaining honest parties (denoted by a set Q̄) would have a local pending block not higher than R − 1. According to the algorithm in Figure 12, we can state that: (i) if the output is R, then all honest parties will sync their log up to the block ⟨e, R, ·, ·⟩ (which includes all honest parties' local pending blocks); (ii) similarly, if the output is R − 1, all honest parties will sync up to ⟨e, R − 1, ·, ·⟩ (which also includes all honest parties' local pending blocks). So in this case, all honest parties (i.e., Q̄ ∪ Q) will not discard their pending blocks, let alone discard any Bolt blocks that are already finalized into log.
For Case 2): Let Q denote the set of honest parties that have the block ⟨e, R, ·, ·⟩ as their local pending block. Note that the remaining honest parties Q̄ would have a pending block not higher than R. In this case, following the algorithm in Figure 12, we can see that: (i) if the output is R, then all honest parties will sync their log up to the block ⟨e, R, ·, ·⟩ (which includes all honest parties' local pending blocks); (ii) similarly, if the output is R − 1, all honest parties will sync up to ⟨e, R − 1, ·, ·⟩ (which includes the Q̄ parties' local pending blocks and the Q parties' finalized output log). So in this case, all honest parties (i.e., Q̄ ∪ Q) will not discard any block in their finalized log. □

Claim 4: If tcv-BA[e] returns syncPace_e, then at least f + 1 honest parties can append the blocks with slot numbers from 1 to syncPace_e that they all received from Bolt[e] into the log without invoking the CallHelp function.

Proof: Suppose tcv-BA[e] returns syncPace_e. According to the strong validity of tcv-BA, at least one honest party input the number syncPace_e. The same honest party must have received a valid message PACESYNC(e, syncPace_e, Proof), which means there must exist a valid Bolt block ⟨e, syncPace_e, ·, Proof⟩. Following the pseudocode, the honest party will multicast PACESYNC(e, syncPace_e, Proof) if tcv-BA[e] returns syncPace_e, so all honest parties can get the Proof. According to the notarizability and total-order properties of Bolt, at least f + 1 honest parties can append the blocks from ⟨e, 1, ·, ·⟩ to ⟨e, syncPace_e, ·, ·⟩ into the log without invoking the CallHelp function. □

Claim 5: If an honest party invokes the CallHelp function to retrieve a block log[i], it eventually gets it; and if another honest party retrieves a block log[i]′ at the same log position i from the CallHelp function, then log[i] = log[i]′.

Proof: Due to Claim 4 and the total-order property of Bolt, any block log[i] that an honest party is retrieving through the CallHelp function shall have been in the output log of at least f + 1 honest parties, so it eventually gets f + 1 correct HELP messages with the same Merkle tree root h from distinct parties; then it can interpolate the f + 1 leaves to reconstruct log[i], which is the same as the other honest parties' local log[i]. We can argue the agreement by contradiction: in case the interpolation of an honest party fails, or it recovers a block log′[i] different from an honest party's local log[i], the Merkle tree with root h commits some leaves that are not coded fragments of log[i]; nevertheless, there is at least one honest party that encodes log[i] and commits the block's erasure code to a Merkle tree with root h; so the adversary indeed breaks the collision resistance of the Merkle tree, implying a break of the collision resistance of the hash function, which is computationally infeasible. So all honest parties that attempt to retrieve a missing block log[i] must fetch a block consistent with the other honest parties'. □

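The retrieval argument in Claims 4 and 5 can be made concrete with a toy (f+1)-out-of-n erasure code. The sketch below is a simplified illustration of the coding step only (polynomial interpolation over a small prime field; the function names are our own), not the paper's actual CallHelp implementation, and it omits the Merkle-tree proofs that bind each fragment to the root h in the HELP messages.

```python
# Toy (f+1)-of-n erasure code behind CallHelp-style retrieval (illustrative):
# a block is f+1 field elements, fragments are evaluations of the degree-f
# interpolating polynomial, and any f+1 correct fragments rebuild the block.

P = 2**31 - 1  # prime modulus for the toy field

def lagrange_eval(pts, x0):
    """Evaluate at x0 the unique degree-(len(pts)-1) polynomial through pts."""
    total = 0
    for j, (xj, yj) in enumerate(pts):
        num = den = 1
        for m, (xm, _) in enumerate(pts):
            if m != j:
                num = num * (x0 - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return total

def encode(block, n):
    """block: f+1 symbols read as evaluations at x = 0..f; emit n fragments."""
    f = len(block) - 1
    base = list(enumerate(block))
    return [(x, lagrange_eval(base, x)) for x in range(f + 1, f + 1 + n)]

def decode(fragments, f):
    """Reconstruct the block from any f+1 fragments."""
    return [lagrange_eval(fragments[: f + 1], i) for i in range(f + 1)]

block = [7, 11, 13]              # f = 2, so any 3 of n fragments suffice
frags = encode(block, n=7)       # n = 3f + 1 parties hold one fragment each
assert decode(frags[4:], f=2) == block   # the last 3 fragments also work
```

In BDT each fragment would additionally carry a Merkle path against the root h, so a retrieving party can discard fragments that are inconsistent with the committed encoding, which is exactly what the collision-resistance argument in Claim 5 relies on.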
Lemma 3: If all honest parties enter the Bolt phase with the same log, then they will always finish the Transformer phase still having the same log′.

Proof: If all honest parties enter the epoch with the same log, it is easy to see that they all will eventually be interrupted to abandon the Bolt phase. Due to Claim 1, all honest parties would activate tcv-BA[e]. Following the agreement and termination of tcv-BA[e], all parties would finish tcv-BA[e] with a common syncPace_e, and then, by the pseudocode, all honest parties will sync up to the same log, whose last block has slot number syncPace_e (due to the total-order property of Bolt, Claim 4 and Claim 5); hence, all parties will finish the Transformer phase with the same output log. □

Lemma 4: For any two honest parties before finishing the Transformer phase, there exists one party such that its log is a prefix of (or equal to) the other's.

Proof: The blocks output before the completion of the Transformer phase were originally generated from the Bolt phase, so the Lemma holds immediately by following the total-order property of Bolt and Claim 3. □

Lemma 5: If any honest party enters the Pessimistic phase, then all honest parties will enter the phase and always leave the phase with the same log.

Proof: If any honest party enters the Pessimistic phase, all honest parties would enter this phase, due to Claim 1, the agreement and termination properties of tcv-BA, and syncPace_e = 0. Assuming that all honest parties enter the Pessimistic phase with the same log, the statement is trivial to see from the agreement and termination properties of ACS and the correctness and robustness properties of threshold public-key encryption. Then, considering Lemma 3 and the simple fact that all honest parties start with the common empty log, we can inductively reason that all honest parties must enter any Pessimistic phase with the same log. So the Lemma holds. □

Lemma 6: For any two honest parties in the same epoch, there exists one party such that its log is a prefix of (or equal to) the other's.

Proof: If two honest parties do not enter the Pessimistic phase during the epoch, both of them only participate in Bolt or Transformer, so this Lemma holds immediately by following Lemma 4. For two honest parties where one enters the Pessimistic phase and one does not, this Lemma holds because the latter one's log is either a prefix of or equal to the former one's, due to Lemmas 3 and 4. For the remaining case where both honest parties enter the Pessimistic phase, they must initially have exactly the same log (due to Lemma 3). Moreover, in this phase, all honest parties execute the ACS instances in a sequential manner (e.g., there is only one ACS instance in our exemplary pseudocode), so every honest party outputs in one ACS instance only if it has already output in all earlier ACS instances. Besides, any two honest parties output the same transaction batch in every ACS instance, by the agreement property of ACS. So this Lemma also holds for any two honest parties that are staying in the same epoch. □

Theorem 1: The Bolt-Dumbo Transformer protocol satisfies the agreement and total order properties.

Proof: The total order is induced by Lemma 6 along with the fact that the protocol is executed epoch by epoch. The agreement follows immediately from Lemmas 3 and 5 along with the fact that all honest parties initialize with the same empty log to enter the first epoch's Bolt phase. □

Liveness proof. We then prove the liveness property of Bolt-Dumbo Transformer.

Lemma 7: Once all honest parties enter the Bolt phase, they must leave the phase in at most constant asynchronous rounds and then all enter the Transformer phase.

Proof: In the worst case, all honest parties' timeouts will interrupt, causing them to abandon the Bolt phase in at most τ + κ·E_size asynchronous rounds, where κ, according to the optimistic liveness property of Bolt, is a tiny constant that represents the running time of generating a new optimistic Bolt block and depends on the underlying Bolt instantiation (e.g., it is two in the Bolt-sCAST instantiation). Upon timeout, the broadcast of the PACESYNC message takes one more asynchronous round. After that, all honest parties would receive enough PACESYNC messages to enter the Transformer phase, which costs at most τ + κ·E_size + 1 asynchronous rounds. □

Lemma 8: If all honest parties enter the Transformer phase, they all leave the phase in expected constant asynchronous rounds and then either enter the Pessimistic phase or enter the next epoch's Bolt phase.

Proof: If all honest parties enter the Transformer phase, the Lemma is trivial to see since the underlying tcv-BA terminates in on-average constant asynchronous rounds. If the output of tcv-BA equals 0, they enter the Pessimistic phase; otherwise, they enter the next epoch's Bolt phase. □

Lemma 9: If all honest parties enter the Pessimistic phase, all honest parties will leave this Pessimistic phase in on-average constant asynchronous rounds, outputting some blocks containing on-average O(B)-sized transactions.

Proof: Similar to the analysis in [47], the Pessimistic phase outputs at least O(B)-sized transactions for each execution (without worrying that the adversary can learn any bit about the transactions to be output before they are actually finalized as output). Here we remark that the original analysis in [47], which only requires IND-CPA security of threshold public-key encryption, might not be enough, since we need to simulate that the adversary can query a decryption oracle by inserting her ciphertext into the ACS output. Moreover, the underlying Dumbo ACS construction [41] ensures all parties leave the phase in on-average constant asynchronous rounds. □

Theorem 2: The Bolt-Dumbo Transformer protocol satisfies the liveness property.

Proof: Due to Lemmas 7 and 8, the adversary would not be able to stall the honest parties during the Bolt and Transformer phases. Even in the worst case, where the two phases do not deliver any useful output and the adversary intends to prevent the parties from running the Pessimistic phase (thus not eliminating any transactions from the honest parties' input buffers), we still have a timeout mechanism against censorship, which can ensure the Pessimistic phase is executed every O(T) asynchronous rounds if there is a suspiciously censored tx in all honest parties' buffers, for some timeout parameter T. Recall Lemma 9: for each tx in all honest parties' buffers, it would take O(XT/B) asynchronous rounds at worst (i.e., when we always rely on the timeout T to invoke the Pessimistic phase) to make tx one of the top B transactions in all parties' buffers, where X is the bound on the buffer size (e.g., an unfixed polynomial in λ). After that, any luckily finalized optimistic-phase block would output tx (in a few more δ), or we still rely on the timeout to invoke the Pessimistic phase, causing the worst-case latency O((X/B + λ)δT), which is a function of the actual network delay δ factored by some (unfixed) polynomial in the security parameters. □
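As a quick illustration of the round accounting in Lemma 7, the worst-case number of asynchronous rounds spent in the Bolt phase is just τ + κ·E_size + 1. The values below are hypothetical placeholders, not the paper's deployed parameters.

```python
# Worst-case asynchronous rounds to leave the Bolt phase (Lemma 7):
# timeout tau, plus kappa rounds per optimistic block over E_size slots,
# plus one round of PACESYNC multicast. All values below are hypothetical.

def bolt_phase_rounds(tau, kappa, epoch_size):
    return tau + kappa * epoch_size + 1

# kappa = 2 for the Bolt-sCAST instantiation; kappa = 4 for Bolt-sRBC (Lemma 2)
assert bolt_phase_rounds(tau=10, kappa=2, epoch_size=50) == 111
```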
