CS 5450 State Machine Replication


Justin Hodge
5 years ago
1 CS 5450 State Machine Replication

2 Key Ideas
• To tolerate faults, replicate functionality!
• A deterministic distributed system can be represented as a replicated state machine (SMR)
• Each replica reaches the same conclusion about the system independently
• Key examples of distributed algorithms that generically implement SMR
• Formalizes notions of fault tolerance in SMR

3 Motivation [Figure: a client's get(x) to the server returns 10; another client's get(x) gets no response.]

4 Motivation [Figure: server and client.]

5 Motivation
• Need replication for fault tolerance
• What happens in these scenarios without replication?
  - Storage: disk failure
  - Web service: network failure
• Be able to reason about failure tolerance
• How badly can things go wrong and have our system continue to function?

6 State Machines
• State variables
• Deterministic commands
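A minimal sketch of the two ingredients named on this slide, using a toy key-value store. The class name `KVStateMachine` and the tuple encoding of commands are illustrative assumptions, not something taken from the slides.

```python
class KVStateMachine:
    """Toy state machine: state variables plus deterministic commands."""

    def __init__(self):
        self.state = {}  # the state variables

    def apply(self, command):
        """Apply one command; the result depends only on the state and the command."""
        op, *args = command
        if op == "put":
            key, value = args
            self.state[key] = value
            return "ok"
        if op == "get":
            (key,) = args
            return self.state.get(key)
        raise ValueError(f"unknown command: {op!r}")


sm = KVStateMachine()
sm.apply(("put", "x", 10))
result = sm.apply(("get", "x"))
```

Because `apply` consults nothing outside `self.state` and the command (no clock, no randomness), two copies fed the same commands in the same order end up in the same state.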

7 Requests and Causality
Process order is consistent with potential causality:
• Client A sends r, then r'
• r is processed before r'
• r causes Client B to send r'
• r is processed before r'

8 Coding State Machines
• State machines are procedures
• Client calls procedure
• Avoid loops
• More flexible structure

9 State Machine Replication [Figure: command c is sent to four state machine replicas, each in state X = Y.]

10 State Machine Replication [Figure: each replica applies f(c) and moves to state X = Z.]

11 Write put(x,10)

12 After the Write Great!

13 Write put(x,10)

14 Need Agreement [Figure: two get(x) requests; values shown: 10 and 3.]
Replicas need to agree which requests have been handled. Problem!

15 Two Writes put(x,10) put(x,30)

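16 Either Outcome is Fine [Figure: two alternative final states separated by OR; within each alternative the replicas hold the same value.]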

17 Order Matters put(x,10) put(x,30)

18 Order Matters put(x,10) put(x,30)

19 Order Matters put(x,10) put(x,30)

20 Order Matters put(x,10) put(x,30)

21 Order Matters
Replicas need to handle requests in the same order.

22 Requirements
All non-faulty servers need:
• Agreement: every replica needs to accept the same set of requests
• Order: all replicas process requests in the same relative order
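The order requirement can be seen with the two writes from the earlier slides. This is only an illustrative sketch; the helper `run` and the tuple encoding of requests are assumptions made for the example.

```python
def run(commands):
    """Apply put commands in the given order and return the final state."""
    state = {}
    for op, key, value in commands:
        if op == "put":
            state[key] = value
    return state


a = ("put", "x", 10)
b = ("put", "x", 30)

replica_1 = run([a, b])  # handles put(x,10), then put(x,30)
replica_2 = run([b, a])  # handles the same set of requests in the other order
diverged = replica_1 != replica_2
```

Both replicas accepted the same set of requests (agreement holds), yet they end with different values for x. Agreement alone is not enough; the replicas also need a common order.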

23 Idea for Agreement
• Someone proposes a request
• If the proposer is non-faulty, all servers will accept that request

24 Agreement put(x,10)

25 Agreement put(x,10) Non-faulty Transmitter

26 Idea for Order
Assign unique IDs to requests and process them in ascending order.
• How do we assign unique IDs in a distributed system?
• How do we know when every replica has processed a given request?

27 Order put(x,30) put(x,10)

28 Order put(x,30) put(x,10) Assign Total Ordering Request ID 1 2

29 Order Assign Total Ordering Request ID 1 2

30 Order Assign Total Ordering Request ID 1 2

31 Order Assign Total Ordering Request ID 1 2
Once a replica cannot receive a request with a smaller ID, the request is now stable!

32 Order Assign Total Ordering Request ID 1 2 [Figure: both requests are marked "is now stable!"]
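One way a replica might decide that no request with a smaller ID can still arrive is sketched below. It assumes, beyond what the slides state, that each client sends IDs in increasing order over a FIFO channel, so a request is stable once every client has been heard from with an ID at least as large. The class `OrderingReplica`, the client names, and the use of `None` as a no-op message are all hypothetical.

```python
import heapq


class OrderingReplica:
    """Buffers requests and applies them in ascending ID order once stable."""

    def __init__(self, clients):
        self.latest = {c: 0 for c in clients}  # highest ID heard from each client
        self.pending = []                      # min-heap of (request_id, command)
        self.state = {}
        self.applied = []                      # IDs in the order they were applied

    def receive(self, client, request_id, command):
        self.latest[client] = max(self.latest[client], request_id)
        if command is not None:  # None models a no-op that only advances the ID
            heapq.heappush(self.pending, (request_id, command))
        self._apply_stable()

    def _apply_stable(self):
        # Assumption: no client can still send an ID at or below this horizon.
        horizon = min(self.latest.values())
        while self.pending and self.pending[0][0] <= horizon:
            request_id, (key, value) = heapq.heappop(self.pending)
            self.state[key] = value
            self.applied.append(request_id)


r = OrderingReplica(clients=["A", "B"])
r.receive("B", 2, ("x", 10))  # arrives first, but A could still send ID 1
held_back = list(r.applied)   # nothing applied yet
r.receive("A", 1, ("x", 30))  # ID 1 is now stable and is applied
r.receive("A", 3, None)       # A has moved past 2, so ID 2 becomes stable
```

The request with ID 2 arrives first but waits until the replica knows nothing smaller can show up, so every replica running this rule applies ID 1 before ID 2 regardless of arrival order.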

33 Generating IDs
• Order via clocks (client timestamp = ID)
  - Logical clocks
  - Synchronized clocks
• Two-phase ID generation
  - Every replica proposes a candidate
  - One candidate is chosen and agreed upon by all replicas
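For the logical-clock option, a Lamport-style clock is one common way to produce IDs that respect potential causality. The sketch below is an assumption about how that could look, not the slides' own code; pairing the counter with a process name to make IDs unique is an added convention.

```python
class LamportClock:
    """Logical clock whose (counter, process_id) pairs serve as request IDs."""

    def __init__(self, process_id):
        self.process_id = process_id
        self.counter = 0

    def tick(self):
        """Advance for a local event or a send; return a unique, ordered ID."""
        self.counter += 1
        return (self.counter, self.process_id)

    def observe(self, remote_id):
        """On receiving a message, move past the sender's counter."""
        remote_counter, _ = remote_id
        self.counter = max(self.counter, remote_counter) + 1


a = LamportClock("A")
b = LamportClock("B")

r = a.tick()        # Client A sends request r
b.observe(r)        # Client B learns of r ...
r_prime = b.tick()  # ... which causes B to send r'
```

Since tuples compare element by element, `r < r_prime` holds here, matching the rule that r is processed before r' when r causes r'. The process name only breaks ties between concurrent requests.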

34 Replica ID Generation put(x,30) put(x,10)

35 Replica ID Generation Propose candidates

36 Replica ID Generation Accept

37 Replica ID Generation Accept

38 Replica ID Generation [Figure: a request is marked "is now stable".]

39 Replica ID Generation Apply

40 Replica ID Generation Apply

41 Rules for Replica-Generated IDs
• Any new candidate ID must be > the ID of any accepted request
• The ID selected from the candidate list must be >= each candidate
• When is a candidate stable?
  - It has been accepted
  - No other pending request has a smaller candidate ID
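The three rules above can be sketched as follows. This is a single-process simulation under stated assumptions: IDs are `(number, replica_index)` pairs so they stay unique, the final ID is taken as the maximum candidate, and the names `Replica`, `propose`, `accept`, and `stable` are hypothetical.

```python
class Replica:
    def __init__(self, index):
        self.index = index    # tie-breaker that keeps candidate IDs unique
        self.seen = 0         # largest candidate or accepted number so far
        self.candidates = {}  # pending request -> candidate ID
        self.accepted = {}    # request -> final ID

    def propose(self, request):
        """Rule 1: a new candidate exceeds every ID accepted so far."""
        self.seen += 1
        candidate = (self.seen, self.index)
        self.candidates[request] = candidate
        return candidate

    def accept(self, request, final_id):
        """Record the agreed ID; the request is no longer pending."""
        self.candidates.pop(request, None)
        self.accepted[request] = final_id
        self.seen = max(self.seen, final_id[0])

    def stable(self):
        """Rule 3: accepted, and no pending request has a smaller candidate."""
        floor = min(self.candidates.values(), default=None)
        return sorted((fid, req) for req, fid in self.accepted.items()
                      if floor is None or fid < floor)


r0, r1 = Replica(0), Replica(1)
cand_a = [r0.propose("a"), r1.propose("a")]
cand_b = [r0.propose("b"), r1.propose("b")]

id_b = max(cand_b)         # Rule 2: the selected ID is >= each candidate
r0.accept("b", id_b)
before = r0.stable()       # "a" is still pending with a smaller candidate

id_a = max(cand_a)
r0.accept("a", id_a)
after = r0.stable()
```

At `before`, request "b" is accepted but not stable, because "a" is still pending at that replica with a smaller candidate and could end up ordered first. Once "a" is accepted, both become stable and are returned in ascending ID order.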

42 Faults
• Fail-stop
  - A faulty server can be detected as faulty
• Byzantine
  - Faulty servers can do arbitrary, perhaps malicious things
  - This includes crash failures (server can stop responding without notification)

43 Fail-Stop Tolerance put(x,30)

44 Fail-Stop Tolerance 1) Propose candidates [Figure label: 1.1]

45 Fail-Stop Tolerance Accept

46 Fail-Stop Tolerance Apply

47 Fail-Stop Tolerance 2) Apply GAME OVER!!!

48 Fail-Stop Fault Tolerance
• To tolerate t failures, need t+1 servers
• As long as 1 server remains, we're OK
• Only need to participate in protocols with other live servers
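A small sketch of the t+1 argument, assuming failures are perfectly detectable (modelled here by a hypothetical `alive` flag) and that every live server has applied every request. The `Server` class and `request` helper are illustrative only.

```python
class Server:
    def __init__(self):
        self.state = {}
        self.alive = True  # fail-stop assumption: a failure is always detectable

    def apply(self, op, key, value=None):
        if op == "put":
            self.state[key] = value
            return "ok"
        return self.state.get(key)


def request(servers, op, key, value=None):
    """Send a command to every live server and return the first reply."""
    replies = [s.apply(op, key, value) for s in servers if s.alive]
    if not replies:
        raise RuntimeError("all servers have failed")
    return replies[0]


t = 2
servers = [Server() for _ in range(t + 1)]
request(servers, "put", "x", 30)

servers[0].alive = False  # t = 2 failures ...
servers[1].alive = False
survivor_reply = request(servers, "get", "x")  # ... and one server remains
```

With t of the t+1 servers down, the single survivor still answers correctly. A fail-stop server never returns a wrong answer, so one reply is enough and no voting is needed.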

49 Byzantine Fault Tolerance
• To tolerate t failures, need 2t + 1 servers
• Protocols now involve votes
  - Can only trust a server response if the majority of servers say the same thing
• t + 1 servers need to participate in replication protocols
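The voting rule can be sketched as a client-side check: with 2t + 1 replies and at most t of them from faulty servers, any value reported by t + 1 servers must include a correct one. The function name `vote` and its return convention are assumptions for illustration.

```python
from collections import Counter


def vote(responses, t):
    """Return the value reported by at least t + 1 servers, or None if there is none."""
    if len(responses) < 2 * t + 1:
        raise ValueError("need responses from at least 2t + 1 servers")
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= t + 1 else None


trusted = vote([10, 10, 99], t=1)    # one faulty server reports 99
untrusted = vote([10, 30, 99], t=1)  # no value reaches t + 1 votes
```

In the first call the two matching replies outvote the single faulty one. In the second call no value has a majority, so the sketch returns `None` rather than trusting any single reply.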

50 Lamport (1978)

51 Fault-Tolerant State Machines
• Implement the state machine on multiple processors
• State machine replication
  - Each starts in the same initial state
  - Executes the same requests
  - Requires consensus to execute in the same order
  - Deterministic: each will do the exact same thing
  - Produce the same output

52 Consensus
• Termination
• Validity
• Integrity
• Agreement
Ensures procedures are called in the same order across all machines.
