CS 5450 State Machine Replication

Key Ideas
- To tolerate faults, replicate functionality!
- Can represent a deterministic distributed system as a replicated state machine (SMR)
- Each replica reaches the same conclusion about the system independently
- Key examples of distributed algorithms that generically implement SMR
- Formalizes notions of fault tolerance in SMR

Motivation
[Figure: clients send get(x) to a single server; one receives the reply 10, the other gets no response.]

Motivation
- Need replication for fault tolerance
- What happens in these scenarios without replication?
  - Storage: disk failure
  - Web service: network failure
- Be able to reason about failure tolerance
- How badly can things go wrong and have our system continue to function?

State Machines
- State variables
- Deterministic commands

Requests and Causality
Process requests in an order consistent with potential causality:
- Client A sends r, then r'
  - r is processed before r'
- r causes Client B to send r'
  - r is processed before r'

Coding State Machines
- State machines are procedures
- Client calls procedure
- Avoid loops
- More flexible structure
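
To make "state machines are procedures" concrete, here is a minimal sketch of a key-value state machine, chosen because the later slides use put and get. The class name `KVStateMachine` and the `apply` dispatcher are illustrative assumptions, not something the slides define.

```python
class KVStateMachine:
    """State variables plus deterministic commands, coded as procedures."""

    def __init__(self):
        self.state = {}  # the state variables

    def put(self, key, value):
        # Deterministic: the effect depends only on the state and the arguments.
        self.state[key] = value
        return "ok"

    def get(self, key):
        # Returns None when the key has never been written.
        return self.state.get(key)

    def apply(self, command):
        # A client "calls a procedure" by naming it and passing arguments.
        name, *args = command
        procedures = {"put": self.put, "get": self.get}
        return procedures[name](*args)


sm = KVStateMachine()
sm.apply(("put", "x", 10))
print(sm.apply(("get", "x")))  # 10
```

Each command is a straight-line procedure with no loop waiting on outside input, so replaying the same commands from the same initial state always yields the same result.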

State Machine Replication
[Figure: a command c is sent to four state machine replicas, each with state X = Y. Each replica applies f(c) and ends with state X = Z.]

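Write
[Figure: a client sends put(x,10) to the replicas.]

After the Write
[Figure: the write has been applied.] Great!

Need Agreement
[Figure: put(x,10) is sent again, followed by two get(x) requests; the figure shows the values 10 and 3.]
Problem! Replicas need to agree which requests have been handled.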

Two Writes
[Figure: two clients send put(x,10) and put(x,30).]

Either Outcome is Fine
[Figure: two alternative final replica states, joined by OR.]

Order Matters
[Figure: put(x,10) and put(x,30) reach the replicas.]
Replicas need to handle requests in the same order.

Requirements
All non-faulty servers need:
- Agreement: every replica needs to accept the same set of requests
- Order: all replicas process requests in the same relative order
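
The order requirement can be illustrated with a small sketch: two replicas accept the same set of writes (agreement holds) but apply them in different orders and end up in different states. The helper `apply_all` and the variable names are hypothetical.

```python
def apply_all(requests):
    """Apply a sequence of put(key, value) requests to an empty replica."""
    state = {}
    for key, value in requests:
        state[key] = value
    return state


w1 = ("x", 10)  # put(x,10)
w2 = ("x", 30)  # put(x,30)

# Same set of requests, different relative order.
replica_a = apply_all([w1, w2])
replica_b = apply_all([w2, w1])
print(replica_a, replica_b)  # {'x': 30} {'x': 10}

# Same set and same order: the replicas stay identical.
print(apply_all([w1, w2]) == apply_all([w1, w2]))  # True
```

Either final value of x would be acceptable on its own; what breaks replication is the two replicas disagreeing.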

Idea for Agreement
- Someone proposes a request
- If the proposer is non-faulty, all servers will accept that request

Agreement
[Figure: a non-faulty transmitter sends put(x,10) to the replicas.]

Idea for Order
Assign unique IDs to requests and process them in ascending order.
- How do we assign unique IDs in a distributed system?
- How do we know when every replica has processed a given request?

Order
[Figure: put(x,30) and put(x,10) are assigned a total ordering with request IDs 1 and 2. Once a replica cannot receive a request with a smaller ID, the request is stable; the figure marks each request as "now stable" in turn.]

Generating IDs
- Order via clocks (client timestamp = ID)
  - Logical clocks
  - Synchronized clocks
- Two-phase ID generation
  - Every replica proposes a candidate
  - One candidate is chosen and agreed upon by all replicas
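
As a rough illustration of the logical-clock option, the sketch below stamps each request with a (time, node) pair in the style of Lamport clocks. The class `LamportClock` and its method names are assumptions made for this example; pairing the counter with the node name is one common way to keep IDs unique when counters tie.

```python
class LamportClock:
    """A logical clock whose timestamps double as unique request IDs."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.time = 0

    def tick(self):
        # Local event (e.g. issuing a request): advance and stamp.
        self.time += 1
        return (self.time, self.node_id)

    def receive(self, remote_time):
        # On receiving a message, jump past the sender's timestamp.
        self.time = max(self.time, remote_time) + 1


a = LamportClock("A")
b = LamportClock("B")
c = LamportClock("C")

id1 = a.tick()       # A issues a request
b.receive(id1[0])    # B learns of it ...
id2 = b.tick()       # ... then issues its own request
id3 = c.tick()       # C issues a request concurrently with A

print(sorted([id2, id3, id1]))  # [(1, 'A'), (1, 'C'), (3, 'B')]
```

The causally later request (from B) always sorts after the one that caused it, while the concurrent requests from A and C are ordered only by the node-name tie-break.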

Replica ID Generation
[Figure: put(x,30) and put(x,10) are sent to four replicas. 1) Propose candidates: the replicas propose 1.1, 1.2, 2.3, 2.4 for one request and 1.3, 1.4, 2.1, 2.2 for the other. 2) Accept: the first request is accepted with ID 2.4. 3) Accept: the second request is accepted with ID 2.2. A request is now stable. 4) Apply. 5) Apply.]

Rules for Replica-Generated IDs
- Any new candidate ID must be > the ID of any accepted request
- The ID selected from the candidate list must be >= each candidate
- When is a candidate stable?
  - It has been accepted
  - No other pending request has a smaller candidate ID
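
The rules above can be sketched as a small simulation with four replicas and two requests. This is one possible reading, not the slides' own code: IDs such as 2.4 are modeled as tuples `(2, 4)` (counter, replica index), the final ID is taken to be the maximum candidate, and the arrival orders are assumptions picked so that the candidates match the 2.4 and 2.2 values in the slide example. The class `Replica` and its methods are hypothetical names.

```python
class Replica:
    def __init__(self, index):
        self.index = index   # replica number, used as the tie-breaking part of an ID
        self.seen = 0        # largest counter proposed or accepted here so far
        self.pending = {}    # request -> candidate ID proposed by this replica
        self.accepted = {}   # request -> agreed final ID, not yet applied
        self.state = {}
        self.applied = []

    def propose(self, request):
        # Rule 1: a new candidate exceeds every ID accepted (or proposed) here.
        self.seen += 1
        self.pending[request] = (self.seen, self.index)
        return self.pending[request]

    def accept(self, request, final_id):
        self.pending.pop(request)
        self.accepted[request] = final_id
        self.seen = max(self.seen, final_id[0])

    def apply_stable(self):
        # Rule 3: apply accepted requests in ID order, stopping as soon as
        # some pending request still has a smaller candidate ID.
        for request, rid in sorted(self.accepted.items(), key=lambda kv: kv[1]):
            if any(cand < rid for cand in self.pending.values()):
                break
            key, value = request
            self.state[key] = value
            self.applied.append(request)
            del self.accepted[request]


replicas = [Replica(i) for i in range(1, 5)]
a, b = ("x", 30), ("x", 10)                       # put(x,30), put(x,10)
arrival = {1: [a, b], 2: [a, b], 3: [b, a], 4: [b, a]}  # assumed arrival orders

# Phase 1: every replica proposes a candidate for each request.
candidates = {a: [], b: []}
for r in replicas:
    for req in arrival[r.index]:
        candidates[req].append(r.propose(req))

# Phase 2: pick the maximum candidate (rule 2) and have everyone accept it.
for req in (a, b):
    final_id = max(candidates[req])
    for r in replicas:
        r.accept(req, final_id)
        r.apply_stable()

print([r.applied for r in replicas])
```

In this run the request accepted with (2, 4) cannot be applied while the other is still pending with a smaller candidate, so every replica applies the (2, 2) request first and all four end in the same state.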

Faults
- Fail-stop: a faulty server can be detected as faulty
- Byzantine: faulty servers can do arbitrary, perhaps malicious things
  - This includes crash failures (a server can stop responding without notification)

Fail-Stop Tolerance
[Figure: put(x,30) is sent to the replicas as servers fail. Steps shown: 1) Propose candidates (1.1), 2) Accept (1.1), then Apply. Final panel: GAME OVER!!!]

Fail-Stop Fault Tolerance
- To tolerate t failures, need t + 1 servers
- As long as 1 server remains, we're OK
- Only need to participate in protocols with other live servers
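
A minimal sketch of why t + 1 servers suffice under fail-stop: failures are detectable, so a client can skip stopped replicas and trust the first live one. The names `FailStopReplica` and `read`, and the use of `ConnectionError` to model a detected failure, are assumptions for illustration.

```python
class FailStopReplica:
    def __init__(self, state):
        self.state = dict(state)
        self.alive = True

    def get(self, key):
        # A stopped replica is detectably faulty: it never returns a wrong value.
        if not self.alive:
            raise ConnectionError("replica has stopped")
        return self.state[key]


def read(replicas, key):
    """Return the answer of the first live replica."""
    for replica in replicas:
        try:
            return replica.get(key)
        except ConnectionError:
            continue
    raise RuntimeError("all replicas have failed")


t = 2
replicas = [FailStopReplica({"x": 30}) for _ in range(t + 1)]
replicas[0].alive = False
replicas[1].alive = False      # t failures
print(read(replicas, "x"))     # 30
```

With a (t + 1)-th failure no live replica remains and `read` raises, which corresponds to the "GAME OVER" case.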

Byzantine Fault Tolerance
- To tolerate t failures, need 2t + 1 servers
- Protocols now involve votes
  - Can only trust a server response if the majority of servers say the same thing
- t + 1 servers need to participate in replication protocols
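
The voting idea can be sketched as follows: with 2t + 1 replicas, a value reported by at least t + 1 of them must come from at least one non-faulty replica. The function `vote` is a hypothetical helper, and the sketch assumes the client has already collected one response per replica.

```python
from collections import Counter


def vote(responses, t):
    """Return the value reported by at least t + 1 of the 2t + 1 replicas, else None."""
    if len(responses) != 2 * t + 1:
        raise ValueError("expected exactly 2t + 1 responses")
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= t + 1 else None


t = 1
print(vote([30, 30, 999], t))  # 30: the faulty reply 999 is outvoted
print(vote([30, 10, 999], t))  # None: no value reaches t + 1 votes
```

The second call has more than t divergent replies, which exceeds the fault assumption, so the client trusts nothing.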

Lamport (1978)

Fault-Tolerant State Machines
- Implement the state machine on multiple processors
- State machine replication
  - Each starts in the same initial state
  - Executes the same requests
  - Requires consensus to execute in the same order
  - Deterministic, each will do the exact same thing
  - Produce the same output

Consensus
- Termination
- Validity
- Integrity
- Agreement
Ensures procedures are called in the same order across all machines.