
Raft - consensus protocol

April 15, 2017

What is consensus?
In short, consensus is agreement on shared state (a single system image). It is typically used to
build a replicated state machine.

Replicated log === replicated state machine.

The idea is that if the log looks the same on all servers, the state machines look the same:
all the state machines produce the same result for the same sequence of operations.
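The log/state-machine equivalence can be illustrated with a toy deterministic key-value state machine (a sketch with invented names, not Raft itself):

```python
# Toy deterministic state machine: replaying the same log of commands
# in the same order always yields the same state.
def apply_log(log):
    state = {}
    for op, key, value in log:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

log = [("set", "x", 1), ("set", "y", 2), ("delete", "x", None)]
# Three "servers" replaying identical logs reach identical states.
replicas = [apply_log(log) for _ in range(3)]
```

As long as commands are deterministic, identical logs imply identical states, which is exactly why Raft only has to agree on the log.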

What happens in case of failures
Raft recovers from server failures autonomously:

• minority fail: no problem.

• majority fail: lose availability, retain consistency.
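The quorum arithmetic behind these two bullets is easy to state in code (a minimal sketch; the function names are mine):

```python
def majority(cluster_size):
    # Smallest number of servers that forms a quorum.
    return cluster_size // 2 + 1

def available(cluster_size, alive):
    # The cluster can elect a leader and commit new entries
    # only while a majority of servers is alive.
    return alive >= majority(cluster_size)
```

So a 5-server cluster tolerates 2 failures: it stays available with 3 servers alive, but not with 2.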

Failure Model
Raft considers only the fail-stop (not Byzantine) failure model, with delayed or lost messages.

How is consensus used?
Consensus can be used to replicate whole databases, as in Spanner from Google. It can also be used
to maintain system-level configuration, as in etcd, Nomad, etc.

Raft Overview
The basic concept is to partition your data and give each partition to a state machine. Raft relies
heavily on a leader, which is an afterthought in Paxos.

Approaches to Consensus
1. Symmetric, leader-less:
   o All servers have equal roles.
   o Clients can connect to any server.
2. Asymmetric, leader-based:
   o At any given time, one server is in charge; the others accept its decisions.
   o Clients communicate with the leader.

Raft uses a leader; all the complexity comes in when the leader fails.

Leader Election
Select one of the servers as leader; detect crashes and choose a new leader.

1. Safety: allow at most one winner per term.
   o Each server has only one vote per term.
   o Two different servers cannot both accumulate a majority.
2. Liveness: some candidate must eventually win.
   o Choose election timeouts randomly.

Safety also requires that only a server with an up-to-date log can become leader.

Log
• Each server has its own copy of the log.
• Each log is composed of entries, and each entry has an index number.

Log Replication
The leader takes commands from clients, appends them to its log, and replicates its log to the
other servers.
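The election safety rule (one vote per term) and randomized timeouts can be sketched as follows; `VoterState` and the function names are invented for illustration, not part of Raft's RPC definitions:

```python
import random

class VoterState:
    """Per-server election state (a sketch, not a full implementation)."""
    def __init__(self):
        self.current_term = 0
        self.voted_for = None  # at most one vote per term

def handle_request_vote(voter, candidate_id, candidate_term):
    # Stale term: reject outright.
    if candidate_term < voter.current_term:
        return False
    # Newer term: adopt it and forget any earlier vote.
    if candidate_term > voter.current_term:
        voter.current_term = candidate_term
        voter.voted_for = None
    # Grant at most one vote per term, so two candidates can never
    # both assemble a majority in the same term.
    if voter.voted_for in (None, candidate_id):
        voter.voted_for = candidate_id
        return True
    return False

def election_timeout(base_ms=150, spread_ms=150):
    # Randomized timeouts make repeated split votes unlikely.
    return base_ms + random.uniform(0, spread_ms)

voter = VoterState()
granted_s1 = handle_request_vote(voter, "s1", 1)
granted_s2 = handle_request_vote(voter, "s2", 1)  # same term: denied
```

Because each server votes at most once per term, two candidates cannot both reach a majority, which gives the "at most one winner per term" safety property.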

Normal Operation
• Client sends a command to the leader.
• Leader appends the command to its log.
• Leader sends AppendEntries RPCs to the followers.
• Once the new entry is committed:
  o Leader passes the command to its state machine and returns the result to the client.
  o Leader notifies followers of committed entries in subsequent AppendEntries RPCs.
  o Followers pass committed commands to their state machines.
• Each entry contains a command and a term number (terms increase monotonically).
• Only a majority needs to respond that the command is stored.
• Slow machines don't slow down the clients.

Committed: if a particular entry is stored on a majority of servers, we say that entry is
committed. If a given entry is committed, all preceding entries are also committed.

Log Consistency
Properties that are always true:
1. The combination of index and term uniquely identifies a log entry: if entries on two servers
   have the same index and term, they hold the same command.
2. If entries on two servers have the same index and term, the logs are identical in all
   preceding entries.

These properties are enforced by a consistency check made during AppendEntries. Each
AppendEntries RPC includes two values: the index and term of the entry just before the new one.
If the follower's log doesn't have that entry, the RPC is rejected. It's kind of like an
induction step: if an entry in a server's log matches a leader entry, then all preceding entries
also match.
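The AppendEntries consistency check can be sketched like this; the log is a list of `(term, command)` pairs with 1-based indices, and the function shape is mine, not the exact RPC:

```python
def append_entries(follower_log, prev_index, prev_term, entries):
    """Consistency check: the follower must already hold an entry with
    (prev_index, prev_term) just before the new ones, else reject."""
    if prev_index > 0:
        if len(follower_log) < prev_index:
            return False  # follower is missing the preceding entry
        if follower_log[prev_index - 1][0] != prev_term:
            return False  # term mismatch: the logs diverge here
    # Check passed: overwrite any conflicting suffix with the
    # leader's entries, keeping everything before prev_index.
    follower_log[prev_index:] = entries
    return True

follower = [(1, "a"), (1, "b")]
accepted = append_entries(follower, 2, 1, [(2, "c")])   # matches (2, term 1)
rejected = append_entries(follower, 3, 2, [(3, "d")])   # (3, term 2) present? yes
missing = append_entries([(1, "a")], 2, 1, [(2, "b")])  # entry 2 absent: reject
```

By induction, if the check passes at `prev_index`, properties 1 and 2 above guarantee the two logs agree on everything before the new entries.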

Safety
Safety Requirement: once a log entry has been applied to a state machine, no other state machine
may apply a different value for that log entry.

Raft Guarantee: once a leader has decided that a particular entry is committed, that entry will
be present in the logs of all future leaders.

COMMITTED —> present in future leaders' logs. This requires restrictions on commitment and
restrictions on leader election:
• An entry must be committed before being applied to a state machine.
• Only entries in the leader's log can be committed.

Picking the Best Leader
We can't know for sure whether an entry is committed, so we elect the server whose log most
likely contains all the committed entries. When a candidate requests a vote, it includes the
index and term of its last log entry (which uniquely identify its entire log). The voter denies
the vote if its own log is more complete:
1. If the voter's last term is higher than the candidate's, deny the vote.
2. If the terms match and the voter's log is longer, deny the vote.
Whoever wins the election is guaranteed to have the most complete log among the candidate
servers.

Leader Changes
• The old leader may have left entries partially replicated.
• No special step is taken when a new leader comes up; it just continues normal operation.
• Raft approach: assume that the leader's log is always correct, so cleanup has to happen during
  normal operation. The leader will eventually make the followers' logs identical to its own,
  and it will never overwrite entries in its own log.

Explanation of slides from the Raft user study:
Scenario (slide 15): S4 and S5 were leaders for terms 2, 3 and 4.
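The "more complete log" comparison used when granting votes can be sketched as a single predicate (the function name is mine; it returns whether the voter should grant the vote):

```python
def candidate_log_ok(cand_last_term, cand_last_index,
                     voter_last_term, voter_last_index):
    # Grant the vote only if the candidate's log is at least as
    # complete as the voter's: higher last term wins; on equal
    # terms, the longer log wins.
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    return cand_last_index >= voter_last_index
```

Note that term dominates length: a short log ending in a high term beats a long log ending in a lower term, because the high-term entries were written by a more recent leader.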

Slide 18, interesting cases to consider:
• The entry being committed is in the current term.
• The most recent entry is declared committed.

Slide 19:
• If S1 is down, S2/S3 will become leader.
• S4 or S5 cannot become leader if S1 is down.
• We cannot consider entry 2 to be committed.

Rules for commitment:
• The entry must be stored on a majority of servers.
• At least one new entry from the leader's term must also be stored on a majority of servers.

The combination of the election rules and the commitment rules makes Raft safe.

Repairing Logs (slide 21)
• Followers may be missing entries (a, b, e).
• Followers may have extra entries (d, f).
• Get rid of all extra entries and fill the gaps with entries from the leader's log.
• The leader keeps a nextIndex for each follower; initially nextIndex is 1 + the leader's last
  index.
• If the AppendEntries consistency check fails, the leader decrements nextIndex and tries again.
• When a follower overwrites an inconsistent entry, it deletes all subsequent entries.

The old leader may not be dead:
• It may only have been temporarily disconnected from the network.

Client Protocol
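The nextIndex back-off and repair loop can be sketched in one function (logs are lists of `(term, command)` pairs with 1-based indices; this condenses the per-RPC exchange into a local loop, so it is a sketch, not the real protocol):

```python
def repair_follower(leader_log, follower_log):
    """Back nextIndex up until the consistency check passes, then
    overwrite the follower's suffix with the leader's entries."""
    next_index = len(leader_log) + 1  # 1 + leader's last index
    while next_index > 1:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0]
        if (len(follower_log) >= prev_index
                and follower_log[prev_index - 1][0] == prev_term):
            break  # consistency check passes at prev_index
        next_index -= 1  # check failed: decrement and try again
    # Delete the inconsistent suffix and copy the leader's entries.
    follower_log[next_index - 1:] = leader_log[next_index - 1:]
    return follower_log
```

Both failure modes from slide 21 are handled the same way: missing entries make the length check fail, extra or conflicting entries make the term check fail, and in either case the follower's log ends up identical to the leader's.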

Send commands to the leader:
• If the leader is unknown, contact any server.
• If the contacted server is not the leader, it will redirect to the leader.

The leader doesn't respond until the command has been logged, committed, and executed by the
leader's state machine.
• If the request times out, the client reissues the command to some other server.
• The command is eventually executed, but it might get executed twice.

Linearizability gives exactly-once semantics: the client embeds a unique id in each command, and
before accepting a command the leader checks its log for an entry with that id.

Configuration Changes
System configuration:
• ID and address of each server.
• Determines what constitutes a majority.

The consensus mechanism must support changes in the configuration:
• Replace a failed machine.
• Change the degree of replication.

We cannot switch directly from one configuration to another: conflicting majorities could arise.
Raft therefore uses a two-phase approach:
• 1st phase: joint consensus, an intermediate phase that needs a majority of both the old and the
  new configurations (their union) for elections.
• 2nd phase: a separate transition to the new configuration alone.

The client makes a request to the leader; the leader adds an entry to its log describing the new
configuration and propagates it to the other servers. Configuration changes take effect
immediately.
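The exactly-once mechanism can be sketched with a hypothetical in-memory leader session; `LeaderSession` and its fields are invented for illustration (a real implementation records the ids in the replicated log so they survive leader changes):

```python
class LeaderSession:
    """Exactly-once sketch: remember each command's result by its
    client-supplied unique id, so retries are not applied twice."""
    def __init__(self):
        self.log = []        # (command_id, delta) pairs
        self.results = {}    # command_id -> result already returned
        self.state = 0       # toy state machine: a running sum

    def execute(self, command_id, delta):
        # A retried command with a known id returns the cached result
        # instead of being applied to the state machine again.
        if command_id in self.results:
            return self.results[command_id]
        self.log.append((command_id, delta))
        self.state += delta
        self.results[command_id] = self.state
        return self.state
```

This is what turns "at least once" (the client retries on timeout) into "exactly once" (the leader deduplicates by id).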