Professional Documents
Culture Documents
Byzantine Fault
Tolerance
Castro and Liskov
SOSP 99
Why this paper?
• Kind of incredible that it’s even possible
• Let alone a practical NFS implementation with it
put(X, 1)
Backup
Review: Consensus
Clients
shl
Consensus State Consensus State Consensus State
Module Machine Module Machine Module Machine
Servers
Log Log Log
add jmp mov shl add jmp mov shl add jmp mov shl
get(X)
Malicious Primary?
• Might lie!
X = 10
get(X)
X = -1
X = 10
Malicious Primary?
• Might lie!
• Solution: direct response
from participants X = 10
X = 10
get(X)
X = -1
X = 10 X = 10
Malicious Primary?
• Might lie!
• Solution: direct response
from participants X = 10
• Problem again: X = 10
primary just
lies more get(X)
X = -1
X = -1 X = -1
X = 10 X = 10
The Need for Crypto
• Need to be able to authenticate messages
• Public-key crypto for signatures
• Each client and server has a private and public key
• All hosts know all public keys
• Signed messages are signed with private key
• Public key can verify that message came from host
with the private key
• While we’re on it: we’ll need hashes/digests also
Authenticated Messages
• Client rejects duplicates or unknown signatures
X = 10
X = 10, signed S2 S2
get(X)
S1
X = -1, signed S1
X = -1, signed S1
X = -1, signed S?? S3
X = 10, signed S3 X = 10
How is this possible? Why 3f + 1?
• First, remember the rules
• Must be able to make progress with n minus f
responses
• n = 3f + 1
• Progress with 3f + 1 - f = 2f + 1
• Often 4 total, progress with 3
• Why? In case those f will never respond
Try 2f + 1, f = 1
X = 10
S1 X = 10
X = 10
get(X) S2 X = 10
C1
S3 X = ??
X = -1
get(X) S2 X = -1
C1
S3 X = 10
get(X) X = -1 S2 X = -1
C1
S3 X = ??
X = 10
get(X) S2 X = 10
C1
X = -1
S3 X = 10
S4 X = ??
Phase 1
the requested operation, replicas send a reply to the client.
Replicas discard requests whose timestamp is lower than
the timestamp in the last reply they sent to the client to
guarantee exactly-once semantics.
We do not rely on ordered message delivery, and
• Client to Primary therefore it is possible for a replica to commit requests
{REQUEST, op, timestamp, clientId}
out of order.
sc This does not matter since it keeps the pre-
prepare, prepare, and commit messages logged until the
• Primary to Replicas corresponding request can be executed.
{PRE-PREPARE, view, seqn, h(req)} sp, req
Figure 1 shows the operation of the algorithm in the
normal case of no primary faults. Replica 0 is the primary,
• Replicas to Replicas replica 3 is faulty, and is the client.
{PREPARE, view, seqn, h(req), replicaId}sri
request pre-prepare prepare commit reply
C
3 X
request pre-prepare prepare commit reply
C
Phase 1
0
3 X
2f + 1
prepared(req, v, n, i) → not prepared(req’, v, n, j)
for honest ri and rj
Honest intersection of maximally disjoint 2f+1 sets is
non-empty
Phase 2
• Problem: Just because some other req' won't
execute at (v, n) doesn't mean req will
Problem: Prepared != Committed
C
Prepares
S1
Pre-prepare
S2
S3