Distributed Algorithms Practical Byzantine Fault Tolerance

1 Distributed Algorithms Practical Byzantine Fault Tolerance Alberto Montresor Università di Trento 2018/12/06 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

2 Table of contents
1 Introduction
2 Byzantine generals
3 Practical Byzantine Fault Tolerance
4 Beyond PBFT: Overview
5 Zyzzyva: Introduction; Three cases; The case of the missing phase; View changes
6 Aardvark
7 UpRight

3 Introduction Motivation
Processes may exhibit arbitrary (Byzantine) behavior:
- Malicious attacks: they lie, they collude
- Software errors: arbitrary states, arbitrary messages
Examples:
- Amazon outage (2008): the root cause was a single bit flip in internal state messages
- Shuttle Mission STS-124 (2008): 3-1 disagreement on sensors during fuel loading (on Earth!)
Alberto Montresor (UniTN) DS - BFT 2018/12/06 1 / 80

4 Introduction History
State of the art at the end of the 90s:
- Theoretically feasible algorithms to tolerate Byzantine failures, but inefficient in practice
- They assume synchrony: known bounds for message delays and processing speed
- Most importantly, the synchrony assumption is needed for correctness: what about DoS?
Bibliography: L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS), 4(3).

6 Byzantine generals
[Figure: generals shouting conflicting orders ("Wait", "Attack!", "No, wait!", "Surrender!"). From cs4410 fall 08 lecture.]

7 Byzantine generals Specification
A commanding general must send an order to his n - 1 lieutenant generals such that:
- IC1: All loyal lieutenants obey the same order
- IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends
Assumptions (oral messages):
- Every message that is sent is received correctly
- The receiver of a message knows who sent it
- The absence of a message can be detected

8 Byzantine generals Impossibility results
Under the oral messages assumption, no solution with three generals can handle even a single traitor.
[Figure: two scenarios that Lieutenant 1 cannot tell apart. In the first, the commanding general sends "Attack!" to both lieutenants and Lieutenant 2 reports "He said Retreat!". In the second, the commanding general sends "Attack!" to Lieutenant 1 and "Retreat!" to Lieutenant 2, who reports "He said Retreat!".]

9 Byzantine generals Oral Message algorithm OM(m)
Algorithm OM(0)
1. The commander sends its value to every lieutenant.
2. Each lieutenant uses the value he received from the commander, or uses retreat if he received no value.
Algorithm OM(m), m > 0
1. The commander sends its value to every lieutenant.
2. For each i, let v_i be the value lieutenant i receives from the commander, or retreat if it has received no value. Lieutenant i acts as the commander of algorithm OM(m - 1) to send the value v_i to each of the n - 2 other lieutenants.
3. For each j ≠ i, let v_j be the value received by i from j in Step 2 (through algorithm OM(m - 1)), or retreat if no value was received. Lieutenant i uses the value majority(v_1, ..., v_{n-1}), where majority is a deterministic function.

10 Byzantine generals Oral Message Algorithm Example OM(1)
Values collected by each lieutenant under commander C (A and R are the two possible orders):
L1: A A A
L2: A A R
L3: A A A

11 Byzantine generals Oral messages
Theorem: For any m, Algorithm OM(m) satisfies conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.
Problems:
- message paths of length up to m + 1 (expensive)
- the absence of messages must be detected via time-out (vulnerable to DoS)
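The recursion in OM(m) is easier to follow when simulated. The following is a minimal sketch, not the formulation from the Lamport paper: the `send` function encodes one hypothetical traitor strategy (flip the order for odd-numbered receivers), and the tie-breaking rule in `majority` is an assumption chosen so that the function is deterministic.

```python
from collections import Counter

RETREAT = "retreat"
ATTACK = "attack"

def majority(values):
    """Deterministic majority; a tie or empty input yields RETREAT (assumed rule)."""
    if not values:
        return RETREAT
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return RETREAT
    return counts[0][0]

def send(sender, receiver, value, traitors):
    """Hypothetical traitor model: a traitor flips the order for odd receivers."""
    if sender in traitors and receiver % 2 == 1:
        return ATTACK if value == RETREAT else RETREAT
    return value

def om(m, commander, lieutenants, value, traitors):
    """Run OM(m) and return {lieutenant: value it decides on}."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {i: send(commander, i, value, traitors) for i in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays its value to the others through OM(m - 1).
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    # Step 3: each lieutenant takes the majority of its own and the relayed values.
    decisions = {}
    for i in lieutenants:
        values = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(values)
    return decisions
```

With four generals and one traitor (n > 3m), the loyal lieutenants agree whether the traitor is a lieutenant or the commander. With only three generals, `om(1, 0, [1, 2], ATTACK, {2})` leaves loyal lieutenant 1 with one "attack" and one "retreat", so it cannot follow the loyal commander's order.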

13 Practical Byzantine Fault Tolerance A Byzantine renaissance
Bibliography: M. Castro and B. Liskov. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst., 20, Nov.
Contributions:
- First state machine replication protocol that survives Byzantine faults in asynchronous networks
- Live under weak Byzantine assumptions: Byzantine Paxos/Raft!
- Implementation of a Byzantine fault-tolerant distributed file system
- Experiments measuring the cost of the replication technique

14 Practical Byzantine Fault Tolerance Assumptions
System model:
- Asynchronous distributed system with N processes
- Unreliable channels
Unbreakable cryptography:
- Message m is signed by its sender i, written <m>_σ(i), through public/private key pairs or message authentication codes (MACs)
- A digest d(m) of message m is produced through collision-resistant hash functions

15 Practical Byzantine Fault Tolerance Assumptions
Failure model:
- Up to f Byzantine servers
- N > 3f total servers
- (Potentially Byzantine clients)
Independent failures:
- Different implementations of the service
- Different operating systems
- Different root passwords, different administrators

16 Practical Byzantine Fault Tolerance Specification
State machine replication:
- Replicated service with a state and deterministic operations operating on it
- Clients issue a request and block waiting for the reply
Safety:
- The system satisfies linearizability, provided that N >= 3f + 1
- This holds regardless of faulty clients: all operations performed by faulty clients are observed in a consistent way by non-faulty clients
- The algorithm does not rely on synchrony to provide safety
Liveness:
- The algorithm relies on synchrony to provide liveness
- It assumes that delay(t) does not grow faster than t indefinitely
- This is a weak assumption if network faults are eventually repaired
- It circumvents the FLP impossibility result

20 Practical Byzantine Fault Tolerance Optimality
Theorem: To tolerate up to f malicious nodes, N must be at least 3f + 1.
Proof:
- It must be possible to proceed after communicating with N - f replicas, because the f faulty replicas may not respond.
- But the f replicas not responding may be just slow, so f of those that responded might be faulty.
- The correct replicas that responded (at least N - 2f) must outnumber the faulty replicas, so N - 2f > f, that is, N > 3f.

21 Practical Byzantine Fault Tolerance Optimality
So N > 3f is needed to ensure that the correct replicas in the reply set outnumber the faulty ones. For a given f, N = 3f + 1 is enough; more replicas are useless: they bring more and larger messages without improving resiliency.
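The counting argument can be checked with a few lines of arithmetic. This is a small sketch; the function names are illustrative and not part of the slides.

```python
def min_replicas(f):
    """Smallest N that tolerates f Byzantine replicas."""
    return 3 * f + 1

def max_faults(n):
    """Largest f tolerated by n replicas (largest f with n > 3f)."""
    return (n - 1) // 3

def can_proceed_safely(n, f):
    """Check the proof's condition: wait for n - f replies, assume f of them are faulty."""
    replies = n - f                 # the f silent replicas may be slow, not faulty
    correct_in_replies = replies - f
    return correct_in_replies > f   # correct replies must outnumber faulty ones
```

For example, `can_proceed_safely(4, 1)` holds while `can_proceed_safely(3, 1)` does not, which matches N > 3f.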

22 Practical Byzantine Fault Tolerance Processes and views
- Replica IDs: 0 ... N - 1
- Replicas move through a sequence of configurations called views
- During view v, the primary is replica i = v mod N; the others are backups
- View changes are carried out when the primary appears to have failed

23 Practical Byzantine Fault Tolerance The algorithm
1. To invoke an operation, the client sends a request to the primary.
2. The primary multicasts the request to the backups.
3. Quorums are employed to guarantee ordering on operations.
4. When an order has been agreed upon, replicas execute the request and send a reply to the client.
5. When the client receives at least f + 1 identical replies, it is satisfied.
[Figure: message diagram with Client, Primary, Backup 1, Backup 2, Backup 3.]

24 Practical Byzantine Fault Tolerance Problems
The primary could be faulty!
- It could ignore commands, assign the same sequence number to different requests, skip sequence numbers, etc.
- Backups monitor the primary's behavior and trigger view changes to replace a faulty primary.
Backups could be faulty!
- They could incorrectly store commands forwarded by a correct primary.
- Use dissemination Byzantine quorum systems.
Faulty replicas could incorrectly respond to the client!
- The client waits for f + 1 matching replies before accepting a response.

25 Practical Byzantine Fault Tolerance The general idea
Algorithm steps are justified by certificates: sets (quorums) of signed messages from distinct replicas proving that a property of interest holds.
With quorums of size at least 2f + 1:
- Any two quorums intersect in at least one correct replica
- There is always one quorum that contains only non-faulty replicas
[Figure: a client writes A to four servers; servers 1 to 3 store A, while the write to server 4 fails (X).]

26 Practical Byzantine Fault Tolerance The general idea (continued)
[Figure: a second client writes B to the four servers; one write fails (X), and the servers end up holding a mix of A and B.]
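Both quorum properties follow from counting, and for small f they can be verified by brute force. The sketch below assumes the minimal configuration N = 3f + 1; the helper names are illustrative.

```python
from itertools import combinations

def pbft_quorum(f):
    """Return (N, quorum size) for the minimal configuration N = 3f + 1."""
    return 3 * f + 1, 2 * f + 1

def min_overlap(n, q):
    """Smallest intersection between any two distinct quorums of size q out of n."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a, b in combinations(quorums, 2))

for f in (1, 2):
    n, q = pbft_quorum(f)
    # Two quorums share at least 2q - n = f + 1 replicas, so at least one is correct.
    print(f, n, q, min_overlap(n, q))
```

Since n - f = 2f + 1 replicas are correct, the correct replicas alone also form a quorum, which is the second property.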

27 Practical Byzantine Fault Tolerance Protocol schema
- Normal operation: how the protocol works in the absence of failures (hopefully, the common case)
- View changes: how to depose a faulty primary and elect a new one
- Garbage collection: how to reclaim the storage used to keep certificates
- Recovery: how to make a faulty replica behave correctly again (not covered here)

28 Practical Byzantine Fault Tolerance State
The internal state of each replica includes:
- the state of the actual service
- a message log containing all the messages the replica has accepted
- an integer denoting the replica's current view

29 Practical Byzantine Fault Tolerance Client request
Message: <request, o, t, c>_σ(c)
- o: state machine operation
- t: timestamp (used to ensure exactly-once semantics)
- c: client id
- σ(c): client signature
[Figure: message diagram, Request phase.]

30 Practical Byzantine Fault Tolerance Pre-prepare phase
Message: <<pre-prepare, v, n, d(m)>_σ(p), m>
- v: current view
- n: sequence number
- d(m): digest of the client message
- σ(p): primary signature
- m: client message
[Figure: message diagram, Request and Pre-prepare phases.]

31 Practical Byzantine Fault Tolerance Pre-prepare phase
Message: <<pre-prepare, v, n, d(m)>_σ(p), m>
A correct replica i accepts the pre-prepare if:
- the pre-prepare message is well-formed
- the current view of i is v
- i has not accepted another pre-prepare for v, n with a different digest
- n is between two water-marks L and H (to avoid sequence number exhaustion caused by faulty primaries)
Each accepted pre-prepare message is stored in the accepting replica's message log (including the primary's). Non-accepted pre-prepare messages are just discarded.
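The acceptance conditions above translate into a short predicate. This is a sketch under stated assumptions: signature verification is omitted, "well-formed" is reduced to the digest matching the request, SHA-256 stands in for the digest function, and the window is taken as L < n <= H. The class and field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

def digest(message: bytes) -> str:
    """d(m): SHA-256 is an assumption; the slides only require collision resistance."""
    return hashlib.sha256(message).hexdigest()

@dataclass(frozen=True)
class PrePrepare:
    view: int
    seq: int
    digest: str
    request: bytes

@dataclass
class Replica:
    view: int
    low: int    # water-mark L
    high: int   # water-mark H
    accepted: dict = field(default_factory=dict)  # (view, seq) -> digest

    def accept_pre_prepare(self, msg: PrePrepare) -> bool:
        well_formed = digest(msg.request) == msg.digest  # signature check omitted
        in_view = msg.view == self.view
        in_window = self.low < msg.seq <= self.high      # assumed bounds
        previous = self.accepted.get((msg.view, msg.seq))
        no_conflict = previous is None or previous == msg.digest
        if well_formed and in_view and in_window and no_conflict:
            self.accepted[(msg.view, msg.seq)] = msg.digest  # store in the log
            return True
        return False  # non-accepted messages are discarded
```

A second pre-prepare for the same (v, n) with a different digest is rejected, which is what stops a faulty primary from assigning one sequence number to two requests at a correct replica.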

32 Practical Byzantine Fault Tolerance Prepare phase
Message: <prepare, v, n, d(m)>_σ(i)
Accepted by a correct replica j if:
- the prepare message is well-formed
- the current view of j is v
- n is between two water-marks L and H
[Figure: message diagram, Request, Pre-prepare and Prepare phases.]

33 Practical Byzantine Fault Tolerance Prepare phase
Message: <prepare, v, n, d(m)>_σ(i)
- Replicas that send prepare accept the sequence number n for m in view v
- Each accepted prepare message is stored in the accepting replica's message log

34 Practical Byzantine Fault Tolerance Prepare certificate (P-certificate)
Replica i produces a prepare certificate prepared(m, v, n, i) iff its log holds:
- the request m
- a pre-prepare for m in view v with sequence number n
- 2f prepare messages from different backups that match the pre-prepare
prepared(m, v, n, i) means that a quorum of 2f + 1 replicas agrees with assigning sequence number n to m in view v.
Theorem: There are no two non-faulty replicas i, j such that prepared(m, v, n, i) and prepared(m', v, n, j), with m ≠ m'.
Proof?
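The prepared predicate can be written as a check over the message log. In this sketch the log is modelled as a set of tuples, a representation chosen here for illustration; signatures are assumed to have been verified before a message enters the log.

```python
def prepared(log, request_digest, view, seq, f):
    """True iff the log holds a P-certificate for (request, view, seq).

    Assumed log entries:
      ("request", digest)
      ("pre-prepare", view, seq, digest)
      ("prepare", view, seq, digest, sender)
    """
    has_request = ("request", request_digest) in log
    has_pre_prepare = ("pre-prepare", view, seq, request_digest) in log
    # Count distinct senders only: a replica repeating itself adds nothing.
    senders = {
        entry[4]
        for entry in log
        if entry[0] == "prepare" and entry[1:4] == (view, seq, request_digest)
    }
    return has_request and has_pre_prepare and len(senders) >= 2 * f
```

As for the theorem, a sketch of the argument: two P-certificates for (v, n) are backed by two quorums of 2f + 1 replicas, which share at least one correct replica. That replica would have had to accept both digests for the same view and sequence number, which a correct replica never does.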

35 Practical Byzantine Fault Tolerance Commit phase
Message: <commit, v, n, d(m), i>_σ(i)
After having collected a P-certificate prepared(m, v, n, i), replica i sends a commit message.
A commit message is accepted by a replica if:
- the commit message is well-formed
- the current view of the replica is v
- n is between two water-marks L and H
[Figure: message diagram, Request, Pre-prepare, Prepare and Commit phases.]

36 Practical Byzantine Fault Tolerance Commit certificate (C-certificate)
Commit certificates ensure total order across views: they guarantee that prepare certificates cannot be missed during a view change.
A replica has a certificate committed(m, v, n, i) if:
- it had a P-certificate prepared(m, v, n, i)
- its log contains 2f + 1 matching commit messages from different replicas (possibly including its own)
A replica executes a request after it gets a commit certificate for it and has cleared all requests with smaller sequence numbers.
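The two conditions, holding a C-certificate and executing in sequence-number order, can be sketched as follows. The function names and the representation of commits as a list of sender ids are assumptions made for illustration.

```python
def committed(commit_senders, has_p_certificate, f):
    """C-certificate: a P-certificate plus 2f + 1 commits from distinct replicas."""
    return has_p_certificate and len(set(commit_senders)) >= 2 * f + 1

def executable(committed_seqs, last_executed):
    """Sequence numbers that can be executed now, in order, without leaving gaps."""
    ready = []
    nxt = last_executed + 1
    while nxt in committed_seqs:
        ready.append(nxt)
        nxt += 1
    return ready
```

For example, with requests 1, 2 and 4 committed and nothing executed yet, `executable({1, 2, 4}, 0)` returns `[1, 2]`: request 4 waits until 3 has been cleared.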

37 Practical Byzantine Fault Tolerance Reply phase
Message: <reply, v, t, c, i, r>_σ(i), where r is the reply
- The client waits for f + 1 replies with the same t and r
- If the client does not receive replies soon enough, it broadcasts the request to all replicas
[Figure: message diagram, Request, Pre-prepare, Prepare, Commit and Reply phases.]
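The client-side rule is a vote count over distinct replicas. A minimal sketch, assuming replies have already been authenticated and are given as (replica id, timestamp, result) tuples:

```python
def accepted_result(replies, f):
    """Return the result backed by f + 1 distinct replicas, or None if there is none yet."""
    votes = {}
    for replica_id, timestamp, result in replies:
        votes.setdefault((timestamp, result), set()).add(replica_id)
    for (timestamp, result), senders in votes.items():
        if len(senders) >= f + 1:
            # At most f replicas are faulty, so at least one sender is correct.
            return result
    return None
```

Counting distinct replica ids matters: a single faulty replica that repeats its reply must not be able to reach the threshold on its own.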

38 Practical Byzantine Fault Tolerance View change
An unsatisfied backup i mutinies:
- it stops accepting messages (except view-change and new-view)
- it multicasts <view-change, v + 1, P, i>_σ(i)
- P contains a P-certificate P_m for each request m (up to a given number, see garbage collection)
The mutiny succeeds if the new primary collects a new-view certificate V: a set of 2f + 1 view-change messages, indicating that 2f + 1 distinct replicas (including itself) support the change of leadership.

39 Practical Byzantine Fault Tolerance View change
The primary elect p' (replica (v + 1) mod N):
- extracts from the new-view certificate V the highest sequence number h of any message for which V contains a P-certificate
- creates a new pre-prepare message for each sequence number n <= h and adds it to the set O:
  if there is a P-certificate for n, m in V: O <- O ∪ {<pre-prepare, v + 1, n, d(m)>_σ(p')}
  otherwise: O <- O ∪ {<pre-prepare, v + 1, n, d(null)>_σ(p')}
- multicasts <new-view, v + 1, V, O>_σ(p')

40 Practical Byzantine Fault Tolerance View change
A backup accepts a <new-view, v + 1, V, O>_σ(p') message for v + 1 if:
- it is signed properly by p'
- V contains valid view-change messages for v + 1
- the correctness of O can be locally verified (by repeating the primary's computation)
Actions:
- It adds all entries in O to its log (so did p'!)
- It multicasts a prepare for each message in O
- It adds all prepares to the log and enters the new view
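Because O is a deterministic function of V, the primary elect and every backup can run the same computation. The sketch below is heavily simplified: sequence numbers are assumed to start at 1 with no checkpoint, all P-certificates are assumed to come from the same previous view (so they never conflict), signatures are omitted, and a P-certificate is reduced to a sequence-number-to-digest mapping. All names are hypothetical.

```python
def primary_of(view, n):
    """The primary of a view is replica view mod N."""
    return view % n

def build_new_view(view_changes, new_view, f):
    """Compute the set O of re-issued pre-prepares, or None if the mutiny has not succeeded.

    view_changes: {replica_id: {seq: digest}} with the P-certificates each replica reports.
    """
    if len(view_changes) < 2 * f + 1:
        return None  # fewer than 2f + 1 view-change messages: no new-view certificate
    certified = {}
    for certificates in view_changes.values():
        certified.update(certificates)
    h = max(certified, default=0)  # highest sequence number with a P-certificate
    # Fill gaps with a null request so that sequence numbers stay contiguous.
    return [("pre-prepare", new_view, n, certified.get(n, "null"))
            for n in range(1, h + 1)]
```

A backup would validate a new-view message by calling the same function on V and comparing the result with the O it received.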

41 Practical Byzantine Fault Tolerance Garbage collection
A correct replica keeps in its log the messages about request o until:
- o has been executed by a majority of correct replicas, and
- this fact can be proven during a view change
The log is truncated with stable checkpoints:
- Each replica i periodically (after processing k requests) checkpoints its state and multicasts <checkpoint, n, d, i>, where n is the last executed request and d is the state digest
- A set S containing 2f + 1 equivalent checkpoint messages from distinct processes is a proof of the checkpoint's correctness (stable checkpoint certificate)
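Detecting a stable checkpoint is again a count of distinct senders per (n, d) pair. A minimal sketch, assuming checkpoint messages are given as (sequence number, state digest, replica id) tuples and the log is a dictionary keyed by sequence number:

```python
def stable_checkpoint(messages, f):
    """Return the highest (seq, digest) vouched for by 2f + 1 distinct replicas, or None."""
    support = {}
    for seq, state_digest, replica_id in messages:
        support.setdefault((seq, state_digest), set()).add(replica_id)
    stable = [key for key, senders in support.items() if len(senders) >= 2 * f + 1]
    return max(stable, default=None)

def truncate_log(log, stable_seq):
    """Drop every log entry covered by the stable checkpoint."""
    return {seq: entry for seq, entry in log.items() if seq > stable_seq}
```

Once a checkpoint at sequence number n is stable, entries up to n are no longer needed to justify the state, so they can be discarded.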

42 Practical Byzantine Fault Tolerance View change, revisited
Message <view-change, v + 1, n, S, C, P, i>_σ(i):
- n: the sequence number of the last stable checkpoint
- S: the last stable checkpoint
- C: the checkpoint certificate (2f + 1 checkpoint messages)
Message <new-view, v + 1, n, V, O>_σ(p'):
- n: the sequence number of the last stable checkpoint
- V, O: contain only requests with sequence number larger than n

43 Practical Byzantine Fault Tolerance Optimizations
Reducing replies:
- One replica is designated to send the reply to the client
- The other replicas send a digest of the reply
Lower latency for writes (4 messages):
- Replicas respond at the Prepare phase (tentative execution)
- The client waits for 2f + 1 matching responses
Fast reads (one round trip):
- The client sends to all; they respond immediately
- The client waits for 2f + 1 matching responses

44 Practical Byzantine Fault Tolerance Optimizations: cryptography
Reducing overhead:
- Public-key cryptography only for view changes
- MACs (message authentication codes) for all other messages
To give an idea (Pentium 200 MHz):
- Generating a 1024-bit RSA signature of an MD5 digest: 43 ms
- Generating a MAC of the same message: 10 µs
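A MAC authenticates a message to a single receiver that shares a key with the sender, so a multicast needs one tag per receiver. The sketch below uses Python's standard `hmac` module with SHA-256, which is a substitution made here and not the construction measured on the slide; the key layout is hypothetical.

```python
import hashlib
import hmac

def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag for the message under a pairwise shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison avoids leaking how many bytes of the tag matched."""
    return hmac.compare_digest(make_mac(key, message), tag)

def authenticator(keys: dict, message: bytes) -> dict:
    """One tag per receiver; keys maps receiver id -> key shared with that receiver."""
    return {receiver: make_mac(key, message) for receiver, key in keys.items()}
```

Unlike a signature, a MAC cannot convince a third party of who sent a message, because the receiver knows the key as well. That is one reason public-key signatures are kept where messages must be forwarded as proof.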

45 Practical Byzantine Fault Tolerance Application: Byzantine NFS server Alberto Montresor (UniTN) DS - BFT 2018/12/06 36 / 80

46 Practical Byzantine Fault Tolerance Application: Byzantine NFS server
[Table 3: Andrew benchmark, BFS vs NFS-std; times are in seconds. Columns: phase, BFS strict, BFS r/o lookup, NFS-std. Most cell values did not survive extraction; the total row reports 3% for strict and -2% for r/o lookup.]
Excerpts from the paper: "... the first four phases because the time spent waiting for lookup operations to complete in BFS-strict is at least 20% of the elapsed time for these phases, whereas it is less than 5% of the elapsed time for the last phase." "Table 3 shows the results for BFS vs NFS-std. These results show that BFS can be used in practice: BFS-strict takes only 3% more time to run the complete benchmark."

47 Practical Byzantine Fault Tolerance Reality check
Examples of systems that have adopted Byzantine Fault Tolerance:
- Boeing 777 Aircraft Information Management System
- Boeing 777/787 flight control system
- SpaceX Dragon flight control system
- Bitcoin

48 Distributed Algorithms Practical Byzantine Fault Tolerance Alberto Montresor Università di Trento 2018/12/06 Acknowledgments: Lorenzo Alvisi This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

51 Beyond PBFT Overview
After PBFT, several other papers started to appear:
- HQ: J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In Proc. of the Symposium on Operating Systems Design and Implementation, OSDI 06, Oct.
- Q/U: M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie. Fault-scalable Byzantine fault-tolerant services. In Proc. of the ACM Symposium on Operating Systems Principles, SOSP 05, Oct.
The end result has been to complicate the adoption of Byzantine solutions.

52 Beyond PBFT Overview
"In the regions we studied (up to f = 5), if contention is low and low latency is the main issue, then if it is acceptable to use 5f + 1 replicas, Q/U is the best choice, else HQ is the best since it outperforms PBFT with a batch size of 1. Otherwise, PBFT is the best choice in this region: It can handle high contention workloads, and it can beat the throughput of both HQ and Q/U through its use of batching. Outside of this region, we expect HQ will scale best: HQ's throughput decreases more slowly than Q/U's (because of the latter's larger message and processing costs) and PBFT's (where eventually batching cannot compensate for the quadratic number of messages)."

54 Zyzzyva Introduction
Zyzzyva [3] (SOSP 07)
R. Kotla, A. Clement, E. Wong, L. Alvisi, and M. Dahlin. Zyzzyva: Speculative Byzantine fault tolerance. In Proc. of the ACM Symposium on Operating Systems Principles (SOSP 07), Stevenson, WA, Oct. ACM.
One protocol to rule them all! Zyzzyva is the last word on BFT! (Is it?)
[3] Zyzzyva is the last word of the English dictionary (apart from Zyzzyzus).

55 Zyzzyva Introduction Replica coordination
All correct replicas execute the same sequence of commands. For each received command c, correct replicas:
- agree on c's position in the sequence
- execute c in the agreed-upon order
- reply to the client

56 Zyzzyva Introduction How it is done now
[Figure: PBFT message diagram with Primary and Backups 1 to 3: Request, Pre-prepare, Prepare, Commit, Reply.]

57 Zyzzyva Introduction The engineer's rule of thumb
"Handle normal and worst case separately as a rule, because the requirements for the two are quite different: the normal case must be fast; the worst case must make some progress." (Butler Lampson, Hints for Computer System Design)

58 Zyzzyva Introduction How Zyzzyva does it
[Figure: Zyzzyva message diagram with Primary and Replicas 1 to 3: Request, followed directly by the replies.]

59 Zyzzyva Introduction Specification for State Machine Replication (SMR)
- Stability: a command is stable at a replica once its position in the sequence cannot change
- Safety: correct clients only process replies to stable commands
- Liveness: all commands issued by correct clients eventually become stable and elicit a reply

60 Zyzzyva Introduction Enforcing safety
Safety requires: correct clients only process replies to stable commands.
...but SMR implementations enforce instead: correct replicas only execute and reply to commands that are stable.
The service performs an output commit with each reply.

61 Zyzzyva Introduction Speculative BFT (Trust, but verify)
Replicas execute and reply to a command without knowing whether it is stable:
- they trust the order provided by the primary
- no explicit replica agreement!
A correct client, before processing a reply, verifies that it corresponds to a stable command:
- if not, the client takes action to ensure liveness

62 Zyzzyva Introduction Verifying stability
Necessary condition for stability in Zyzzyva: a command c can become stable only if a majority of correct replicas agree on its position in the sequence.
The client can process a response for c iff:
- a majority of correct replicas agrees on c's position
- the set of replies is incompatible, for all possible future executions, with a majority of correct replicas agreeing on a different command holding c's current position

63 Zyzzyva Introduction History
- History H_{i,k} is the sequence of the first k commands executed by replica i
- On receipt of a command c from the primary, a replica appends c to its command history
- The replica's reply for c includes the application-level response and the corresponding command history
- Additional detail: the history can be hashed through incremental hashing
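Incremental hashing lets a replica summarise its whole history in a fixed-size value that is updated in constant time per command. The sketch below shows one common hash-chain construction, assumed here for illustration; the exact hash function and encoding used by Zyzzyva are not given on the slide.

```python
import hashlib

EMPTY_HISTORY = b"\x00" * 32  # assumed value for the empty history

def extend_history(previous_hash: bytes, command: bytes) -> bytes:
    """h_k = H(h_{k-1} || H(c_k)): fold one more command into the history hash."""
    command_digest = hashlib.sha256(command).digest()
    return hashlib.sha256(previous_hash + command_digest).digest()

def history_hash(commands) -> bytes:
    """Hash of a whole history, computed one command at a time."""
    h = EMPTY_HISTORY
    for command in commands:
        h = extend_history(h, command)
    return h
```

Two replicas report the same hash only if they executed the same commands in the same order (up to hash collisions), so a client can compare histories by comparing these short values.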

64 Zyzzyva Three cases Case 1: Unanimity
[Figure: the client sends c to the primary; the primary forwards <c,k> to Replicas 1 to 3; the primary and the three replicas reply with <r_1,H_{1,k}>, ..., <r_4,H_{4,k}>.]
The client processes the response if all replies match: r_1 = ... = r_4 and H_{1,k} = ... = H_{4,k}.

68 Zyzzyva Three cases Case 1: Unanimity
Some comments:
- Although the client has a proof that the position of the request in the command history is irrevocably set, no server has such a proof
- The comparison of histories may be based on incremental hashes
- Three message hops complete the request in the good case
Is it safe to accept the reply in this case?
- All processes have agreed on the ordering
- Correct processes cannot change their mind later
- A new primary can ask n - f replicas for their histories

69 Zyzzyva Three cases Case 2: A majority of correct replicas agree
[Figure: the primary forwards <c,k> to Replicas 1 to 3; the client receives only three replies, <r_1,H_{1,k}>, <r_2,H_{2,k}> and <r_3,H_{3,k}>.]
Is it safe to accept such a message?

70 Zyzzyva Three cases Case 2: A majority of correct replicas agree
[Figure: as in the previous slide, but one of the <c,k> messages from the primary is missing.]
Consider this case...

71 Zyzzyva Three cases Case 2: A majority of correct replicas agree
[Figure: after collecting the replies <r_i,H_{i,k}>, the client sends CC = <H_{1,k}, H_{2,k}, H_{3,k}> to all replicas.]
The client sends to all a commit certificate containing 2f + 1 matching histories.

72 Zyzzyva Three cases Case 2: A majority of correct replicas agree
[Figure: the replicas answer the commit certificate CC = <H_{1,k}, H_{2,k}, H_{3,k}> with an ack.]
The client processes the response if it receives at least 2f + 1 acks.

73 Zyzzyva Three cases Case 2: A majority of correct replicas agree
Safe?
- The certificate proves that a majority of correct processes agree on the command's position in the sequence
- This is incompatible with a majority backing a different command for that position
Stability:
- Stability depends on matching command histories
- Stability is prefix-closed: if a command with sequence number k is stable, then so is every command with sequence number k' < k

74 Zyzzyva Three cases Case 3: None of the above
[Figure: the client receives replies <r_1,H_{1,k}> and <r_2,H_{2,k}> only.]
- Fewer than 2f + 1 replies match
- The client retransmits c to all replicas, hinting that the primary may be faulty
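The three cases amount to a threshold test on the largest set of matching replies. A minimal sketch, assuming N = 3f + 1, one authenticated reply per replica, and replies given as (result, history hash) pairs; the action names are illustrative.

```python
from collections import Counter

def client_action(replies, f):
    """Decide what a Zyzzyva client does with the replies collected so far."""
    if not replies:
        return "retransmit"
    _, matching = Counter(replies).most_common(1)[0]
    if matching == 3 * f + 1:
        return "complete"                 # Case 1: unanimity
    if matching >= 2 * f + 1:
        return "send-commit-certificate"  # Case 2: then wait for 2f + 1 acks
    return "retransmit"                   # Case 3: the primary may be faulty

def can_complete_after_acks(ack_senders, f):
    """Case 2 finishes once 2f + 1 distinct replicas acknowledge the certificate."""
    return len(set(ack_senders)) >= 2 * f + 1
```

In practice the client would wait for a timeout before concluding that Case 1 is out of reach; that timing logic is left out of the sketch.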

75 Zyzzyva The case of the missing phase
[Figure: the PBFT diagram (Request, Pre-prepare, Prepare, Commit, Reply) next to the Zyzzyva diagram (replies <r_i,H_{i,k}>, commit certificate CC = <H_{1,k}, H_{2,k}, H_{3,k}>, ack).]
Where did the third phase go? Why was it there to begin with?

76 Zyzzyva The case of the missing phase The missing phase commit
Consider this scenario:
- There are f malicious replicas, including the primary
- The primary stops communicating with f correct replicas
- They go on strike: they stop accepting messages in this view and ask for a view change
- f + f replicas stop accepting messages; f + 1 replicas keep working
- The remaining f + 1 replicas are not enough to conclude the pre-prepare and prepare phases
- The f correct processes that are asking for a view change are not enough to conclude one, so there is no opportunity to regain liveness by electing a new primary

77 Zyzzyva The case of the missing phase The missing phase commit
The third phase of PBFT breaks this stalemate: the remaining f + 1 replicas either gather the evidence necessary to complete the request, or determine that a view change is necessary.
The commit phase is needed for liveness.

78 Zyzzyva View changes Where did the third phase go?
In PBFT: what compromises liveness in the previous scenario is that the PBFT view change protocol lets correct replicas commit to a view change and become silent in a view without any guarantee that their action will lead to the view change.
In Zyzzyva: a correct replica does not abandon view v unless it is guaranteed that every other correct replica will do the same, forcing a new view and a new primary.

79 Zyzzyva View changes View change
Two phases:
- Processes unsatisfied with the current primary send a message <i-hate-the-primary, v> to all
- If a process collects f + 1 i-hate-the-primary messages, it sends a message to all containing such messages and starts a new view change (similar to the traditional one)
The extra phase of the agreement protocol is moved to the view change protocol.
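The f + 1 threshold is what makes abandoning the view safe: among f + 1 distinct accusers at least one is correct, and the collected accusations can be forwarded as proof so that every correct replica follows. A minimal sketch of the bookkeeping, with hypothetical names and signature checks omitted:

```python
def record_accusation(accusers, sender, f):
    """Record one <i-hate-the-primary, v> message for the current view.

    accusers: set of replica ids that have accused the primary so far.
    Returns True once f + 1 distinct replicas have accused it, which is the
    point where the replica forwards the proof and starts the view change.
    """
    accusers.add(sender)
    return len(accusers) >= f + 1
```

With f = 1, a single accusation (possibly from the faulty replica itself) is not enough; the second distinct accuser triggers the change.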

80 Zyzzyva View changes Optimizations
- Checkpoint protocol to garbage collect histories
- Replacing digital signatures with MACs
- Replicating application state at only 2f + 1 replicas
- Batching

81 Zyzzyva View changes Performance
[Figure, from R. Kotla et al., Fig. 4: Realized throughput for the 0/0 benchmark as the number of clients varies, for systems configured to tolerate f = 1 faults. Axes: throughput (Kops/sec) vs number of clients. Series: Unreplicated, Zyzzyva, Zyzzyva (B=10), Zyzzyva5, Zyzzyva5 (B=10), PBFT, PBFT (B=10), HQ, Q/U max throughput.]

82 Zyzzyva: View changes
Discussion. What have you learned? Do you agree on the principles?


84 Aardvark
Aardvark (NSDI 09). A new beginning!
A. Clement, E. Wong, L. Alvisi, M. Dahlin, and M. Marchetti. Making Byzantine fault tolerant systems tolerate Byzantine faults. In Proc. of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI 09. USENIX Association.
Footnote: "Aardvark" is the first word of the English dictionary ("oritteropo" in Italian).

85 Aardvark: From the article
Surviving vs tolerating: "Although current BFT systems can survive Byzantine faults without compromising safety, we contend that a system that can be made completely unavailable by a simple Byzantine failure can hardly be said to tolerate Byzantine faults."

86 Aardvark: Conventional wisdom
Handle the normal and worst case separately: remain safe in the worst case, make progress in the normal case.
Maximize performance when the network is synchronous and all clients and servers behave correctly.
This wisdom is:
Misguided: it encourages systems that fail to deliver BFT.
Dangerous: it encourages fragile optimizations.
Futile: it yields diminishing returns on the common case.

90 Aardvark: Blueprint
Build the system around an execution path that:
provides acceptable performance across the broadest set of executions;
is easy to implement;
is robust against Byzantine attempts to push the system away from it.

91 Aardvark: Revisiting conventional wisdom
"Signatures are expensive, use MACs."
Faulty clients can use MACs to generate ambiguity: one node validating a MAC authenticator does not guarantee that any other node will validate that same authenticator.
Aardvark requires clients to sign requests.
"View changes are to be avoided."
Aardvark uses regular view changes to maintain high throughput despite faulty primaries.
"Hardware multicast is a boon."
Aardvark uses separate work queues for clients and individual replicas.
Aardvark uses a fully connected topology among replicas (separate NICs).
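The ambiguity created by MAC authenticators can be illustrated with Python's standard hmac module. This is a toy sketch, not Aardvark's message format: the key table, replica names and helper functions are all hypothetical. A faulty client builds a vector of per-replica MACs in which only some entries are valid, so replicas disagree on whether the same request is authentic.

```python
import hashlib
import hmac


def mac(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 tag of msg under a key shared by the client and one replica."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def make_authenticator(keys: dict, msg: bytes, corrupt=()) -> dict:
    """Build one MAC per replica; a faulty client deliberately corrupts some entries."""
    auth = {}
    for replica, key in keys.items():
        tag = mac(key, msg)
        if replica in corrupt:
            tag = bytes(len(tag))  # all-zero tag: will not verify
        auth[replica] = tag
    return auth


def verify(replica: str, key: bytes, msg: bytes, auth: dict) -> bool:
    """Each replica can check only its own entry of the authenticator."""
    return hmac.compare_digest(auth[replica], mac(key, msg))


if __name__ == "__main__":
    keys = {"primary": b"k0", "replica1": b"k1", "replica2": b"k2", "replica3": b"k3"}
    request = b"op=write x=1"
    auth = make_authenticator(keys, request, corrupt={"replica1", "replica2"})
    for name, key in keys.items():
        print(name, verify(name, key, request, auth))
```

With a digital signature, by contrast, every replica checks the same value against the same public key, so either all correct replicas accept the request or none do.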

92 Aardvark: MAC attack
[Figure: MAC attack. Labels: client c, Primary, Replica 1, Replica 2, Replica 3; messages <c,k>.]

94 Aardvark: Throughput

System     Best case   Faulty client   Client flood   Faulty primary   Faulty replica
PBFT       62K         0               crash          1K               250
QU         24K         0               crash          NA               19K
HQ         15K         NA              4.5K           NA               crash
Zyzzyva    80K         0               crash          crash            0
Aardvark   39K         39K             7.8K           37K              11K


96 UpRight
Bibliography: A. Clement, M. Kapritsos, S. Lee, Y. Wang, L. Alvisi, M. Dahlin, and T. Riche. UpRight cluster services. In Proc. of the ACM Symposium on Operating Systems Principles, SOSP 09, Oct.
A new (B)FT replication library:
minimal intrusiveness for existing apps;
adequate performance.
Goal: ease BFT deployment:
make explicit the incremental cost of BFT;
switching to BFT should be a simple change in a config file.

97 UpRight
u = max number of failures to ensure liveness.
r = max number of commission failures to preserve safety.
r = u = f: BFT. r = 0: CFT.
[Diagram of failure classes. Labels: Crash, Omission, Commission, Byzantine.]

98 UpRight
Exposes the incremental cost of BFT:
Byzantine agreement: if r << u, BFT is close to CFT in replication cost.
Allows richer design options:
Byzantine faults are rare: u > r.
Safety is more critical than liveness: r > u.
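The incremental cost can be made concrete with a small calculation. The slide does not give a formula; the sketch below assumes a replica count of 2u + r + 1 for the agreement stage. That assumption reduces to the familiar 3f + 1 when r = u = f and to 2u + 1 when r = 0, but it should be checked against the UpRight paper before being relied on. The function name is hypothetical.

```python
def agreement_replicas(u: int, r: int) -> int:
    """Replicas needed for agreement, ASSUMING the bound 2u + r + 1 (not stated on the slide)."""
    if u < 0 or r < 0:
        raise ValueError("u and r must be non-negative")
    return 2 * u + r + 1


if __name__ == "__main__":
    u = 3
    cft = agreement_replicas(u, 0)    # r = 0: crash fault tolerance, 2u + 1
    mixed = agreement_replicas(u, 1)  # few commission failures expected, r << u
    bft = agreement_replicas(u, u)    # r = u = f: full BFT, 3f + 1
    print(cft, mixed, bft)
```

Under that assumption, tolerating u = 3 failures of which at most one is a commission failure costs a single replica more than plain CFT, which is the sense in which BFT approaches CFT cost when r << u.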

99 UpRight: Reality check
UpRight (Java; latest update Oct. 2009).
ArchiStar-BFT (Java; latest update May 2015).
Bft-SMaRt (Java; latest update Apr. 2016).

100 UpRight: For (far in the) future lectures
S. Gaertner, M. Bourennane, C. Kurtsiefer, A. Cabello, and H. Weinfurter. Experimental demonstration of a quantum protocol for Byzantine agreement and liar detection. Physical Review Letters, 100(7), Feb.

101 UpRight: Reading material
M. Castro and B. Liskov. Practical Byzantine fault tolerance. In Proc. of the 3rd Symposium on Operating Systems Design and Implementation, OSDI 99, New Orleans, Louisiana, USA. USENIX Association.
R. Kotla, A. Clement, E. Wong, L. Alvisi, and M. Dahlin. Zyzzyva: Speculative Byzantine fault tolerance. In Proc. of the ACM Symposium on Operating Systems Principles, SOSP 07, Stevenson, WA, Oct. ACM.
A. Clement, E. Wong, L. Alvisi, M. Dahlin, and M. Marchetti. Making Byzantine fault tolerant systems tolerate Byzantine faults. In Proc. of the 6th USENIX Symposium on Networked Systems Design and Implementation, NSDI 09. USENIX Association.

Distributed Algorithms Practical Byzantine Fault Tolerance

Distributed Algorithms Practical Byzantine Fault Tolerance Distributed Algorithms Practical Byzantine Fault Tolerance Alberto Montresor University of Trento, Italy 2017/01/06 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International

More information

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem) Practical Byzantine Fault Tolerance (The Byzantine Generals Problem) Introduction Malicious attacks and software errors that can cause arbitrary behaviors of faulty nodes are increasingly common Previous

More information

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00)

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00) PBFT: A Byzantine Renaissance Practical Byzantine Fault-Tolerance (CL99, CL00) first to be safe in asynchronous systems live under weak synchrony assumptions -Byzantine Paxos! The Setup Crypto System Model

More information

Zyzzyva. Speculative Byzantine Fault Tolerance. Ramakrishna Kotla. L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin

Zyzzyva. Speculative Byzantine Fault Tolerance. Ramakrishna Kotla. L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin Zyzzyva Speculative Byzantine Fault Tolerance Ramakrishna Kotla L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin The Goal Transform high-performance service into high-performance

More information

Authenticated Agreement

Authenticated Agreement Chapter 18 Authenticated Agreement Byzantine nodes are able to lie about their inputs as well as received messages. Can we detect certain lies and limit the power of byzantine nodes? Possibly, the authenticity

More information

Practical Byzantine Fault

Practical Byzantine Fault Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Castro and Liskov, OSDI 1999 Nathan Baker, presenting on 23 September 2005 What is a Byzantine fault? Rationale for Byzantine Fault

More information

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov Outline 1. Introduction to Byzantine Fault Tolerance Problem 2. PBFT Algorithm a. Models and overview b. Three-phase protocol c. View-change

More information

Byzantine fault tolerance. Jinyang Li With PBFT slides from Liskov

Byzantine fault tolerance. Jinyang Li With PBFT slides from Liskov Byzantine fault tolerance Jinyang Li With PBFT slides from Liskov What we ve learnt so far: tolerate fail-stop failures Traditional RSM tolerates benign failures Node crashes Network partitions A RSM w/

More information

Byzantine Techniques

Byzantine Techniques November 29, 2005 Reliability and Failure There can be no unity without agreement, and there can be no agreement without conciliation René Maowad Reliability and Failure There can be no unity without agreement,

More information

Robust BFT Protocols

Robust BFT Protocols Robust BFT Protocols Sonia Ben Mokhtar, LIRIS, CNRS, Lyon Joint work with Pierre Louis Aublin, Grenoble university Vivien Quéma, Grenoble INP 18/10/2013 Who am I? CNRS reseacher, LIRIS lab, DRIM research

More information

AS distributed systems develop and grow in size,

AS distributed systems develop and grow in size, 1 hbft: Speculative Byzantine Fault Tolerance With Minimum Cost Sisi Duan, Sean Peisert, Senior Member, IEEE, and Karl N. Levitt Abstract We present hbft, a hybrid, Byzantine fault-tolerant, ted state

More information

Reducing the Costs of Large-Scale BFT Replication

Reducing the Costs of Large-Scale BFT Replication Reducing the Costs of Large-Scale BFT Replication Marco Serafini & Neeraj Suri TU Darmstadt, Germany Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de

More information

Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Robert Grimm New York University (Partially based on notes by Eric Brewer and David Mazières) The Three Questions What is the problem? What is new or different? What

More information

Zyzzyva: Speculative Byzantine Fault Tolerance

Zyzzyva: Speculative Byzantine Fault Tolerance : Speculative Byzantine Fault Tolerance Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong Dept. of Computer Sciences University of Texas at Austin {kotla,lorenzo,dahlin,aclement,elwong}@cs.utexas.edu

More information

Zyzzyva: Speculative Byzantine Fault Tolerance

Zyzzyva: Speculative Byzantine Fault Tolerance : Speculative Byzantine Fault Tolerance Ramakrishna Kotla Microsoft Research Silicon Valley, USA kotla@microsoft.com Allen Clement, Edmund Wong, Lorenzo Alvisi, and Mike Dahlin Dept. of Computer Sciences

More information

Practical Byzantine Fault Tolerance and Proactive Recovery

Practical Byzantine Fault Tolerance and Proactive Recovery Practical Byzantine Fault Tolerance and Proactive Recovery MIGUEL CASTRO Microsoft Research and BARBARA LISKOV MIT Laboratory for Computer Science Our growing reliance on online services accessible on

More information

Practical Byzantine Fault Tolerance Using Fewer than 3f+1 Active Replicas

Practical Byzantine Fault Tolerance Using Fewer than 3f+1 Active Replicas Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems San Francisco, California, pp 241-247, September 24 Practical Byzantine Fault Tolerance Using Fewer than 3f+1

More information

Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography

Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography Appears as Technical Memo MIT/LCS/TM-589, MIT Laboratory for Computer Science, June 999 Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography Miguel Castro and Barbara Liskov Laboratory

More information

or? Paxos: Fun Facts Quorum Quorum: Primary Copy vs. Majority Quorum: Primary Copy vs. Majority

or? Paxos: Fun Facts Quorum Quorum: Primary Copy vs. Majority Quorum: Primary Copy vs. Majority Paxos: Fun Facts Quorum Why is the algorithm called Paxos? Leslie Lamport described the algorithm as the solution to a problem of the parliament on a fictitious Greek island called Paxos Many readers were

More information

Tradeoffs in Byzantine-Fault-Tolerant State-Machine-Replication Protocol Design

Tradeoffs in Byzantine-Fault-Tolerant State-Machine-Replication Protocol Design Tradeoffs in Byzantine-Fault-Tolerant State-Machine-Replication Protocol Design Michael G. Merideth March 2008 CMU-ISR-08-110 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213

More information

BYZANTINE GENERALS BYZANTINE GENERALS (1) A fable: Michał Szychowiak, 2002 Dependability of Distributed Systems (Byzantine agreement)

BYZANTINE GENERALS BYZANTINE GENERALS (1) A fable: Michał Szychowiak, 2002 Dependability of Distributed Systems (Byzantine agreement) BYZANTINE GENERALS (1) BYZANTINE GENERALS A fable: BYZANTINE GENERALS (2) Byzantine Generals Problem: Condition 1: All loyal generals decide upon the same plan of action. Condition 2: A small number of

More information

Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance Appears in the Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, USA, February 1999 Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov Laboratory

More information

Key-value store with eventual consistency without trusting individual nodes

Key-value store with eventual consistency without trusting individual nodes basementdb Key-value store with eventual consistency without trusting individual nodes https://github.com/spferical/basementdb 1. Abstract basementdb is an eventually-consistent key-value store, composed

More information

Byzantine Fault Tolerance

Byzantine Fault Tolerance Byzantine Fault Tolerance CS 240: Computing Systems and Concurrency Lecture 11 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. So far: Fail-stop failures

More information

Why then another BFT protocol? Zyzzyva. Simplify, simplify. Simplify, simplify. Complex decision tree hampers BFT adoption. H.D. Thoreau. H.D.

Why then another BFT protocol? Zyzzyva. Simplify, simplify. Simplify, simplify. Complex decision tree hampers BFT adoption. H.D. Thoreau. H.D. Why then another BFT protool? Yes No Zyzzyva Yes No Yes No Comple deision tree hampers BFT adoption Simplify, simplify H.D. Thoreau Simplify, simplify H.D. Thoreau Yes No Yes No Yes Yes No One protool

More information

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory Byzantine Fault Tolerance and Consensus Adi Seredinschi Distributed Programming Laboratory 1 (Original) Problem Correct process General goal: Run a distributed algorithm 2 (Original) Problem Correct process

More information

EECS 591 DISTRIBUTED SYSTEMS. Manos Kapritsos Fall 2018

EECS 591 DISTRIBUTED SYSTEMS. Manos Kapritsos Fall 2018 EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Fall 2018 THE GENERAL IDEA Replicas A Primary A 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PBFT: NORMAL OPERATION Three phases: Pre-prepare Prepare Commit assigns sequence

More information

Byzantine Fault Tolerance

Byzantine Fault Tolerance Byzantine Fault Tolerance CS6450: Distributed Systems Lecture 10 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University.

More information

CSE 5306 Distributed Systems. Fault Tolerance

CSE 5306 Distributed Systems. Fault Tolerance CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure

More information

Tolerating Latency in Replicated State Machines through Client Speculation

Tolerating Latency in Replicated State Machines through Client Speculation Tolerating Latency in Replicated State Machines through Client Speculation April 22, 2009 1, James Cowling 2, Edmund B. Nightingale 3, Peter M. Chen 1, Jason Flinn 1, Barbara Liskov 2 University of Michigan

More information

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ) Data Consistency and Blockchain Bei Chun Zhou (BlockChainZ) beichunz@cn.ibm.com 1 Data Consistency Point-in-time consistency Transaction consistency Application consistency 2 Strong Consistency ACID Atomicity.

More information

Two New Protocols for Fault Tolerant Agreement

Two New Protocols for Fault Tolerant Agreement Two New Protocols for Fault Tolerant Agreement Poonam Saini 1 and Awadhesh Kumar Singh 2, 1,2 Department of Computer Engineering, National Institute of Technology, Kurukshetra, India nit.sainipoonam@gmail.com,

More information

Revisiting Fast Practical Byzantine Fault Tolerance

Revisiting Fast Practical Byzantine Fault Tolerance Revisiting Fast Practical Byzantine Fault Tolerance Ittai Abraham, Guy Gueta, Dahlia Malkhi VMware Research with: Lorenzo Alvisi (Cornell), Rama Kotla (Amazon), Jean-Philippe Martin (Verily) December 4,

More information

A definition. Byzantine Generals Problem. Synchronous, Byzantine world

A definition. Byzantine Generals Problem. Synchronous, Byzantine world The Byzantine Generals Problem Leslie Lamport, Robert Shostak, and Marshall Pease ACM TOPLAS 1982 Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov OSDI 1999 A definition Byzantine (www.m-w.com):

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

Consensus and related problems

Consensus and related problems Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 9, September 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Backup Two

More information

The Long March of BFT. Weird Things Happen in Distributed Systems. A specter is haunting the system s community... A hierarchy of failure models

The Long March of BFT. Weird Things Happen in Distributed Systems. A specter is haunting the system s community... A hierarchy of failure models A specter is haunting the system s community... The Long March of BFT Lorenzo Alvisi UT Austin BFT Fail-stop A hierarchy of failure models Crash Weird Things Happen in Distributed Systems Send Omission

More information

CS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. CS 138: Practical Byzantine Consensus CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Scenario Asynchronous system Signed messages s are state machines It has to be practical CS 138

More information

Proactive Recovery in a Byzantine-Fault-Tolerant System

Proactive Recovery in a Byzantine-Fault-Tolerant System Proactive Recovery in a Byzantine-Fault-Tolerant System Miguel Castro and Barbara Liskov Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139

More information

Adapting Byzantine Fault Tolerant Systems

Adapting Byzantine Fault Tolerant Systems Adapting Byzantine Fault Tolerant Systems Miguel Neves Pasadinhas miguel.pasadinhas@tecnico.ulisboa.pt Instituto Superior Técnico (Advisor: Professor Luís Rodrigues) Abstract. Malicious attacks, software

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Byzantine Fault-Tolerance with Commutative Commands

Byzantine Fault-Tolerance with Commutative Commands Byzantine Fault-Tolerance with Commutative Commands Pavel Raykov 1, Nicolas Schiper 2, and Fernando Pedone 2 1 Swiss Federal Institute of Technology (ETH) Zurich, Switzerland 2 University of Lugano (USI)

More information

Practical Byzantine Fault Tolerance. Castro and Liskov SOSP 99

Practical Byzantine Fault Tolerance. Castro and Liskov SOSP 99 Practical Byzantine Fault Tolerance Castro and Liskov SOSP 99 Why this paper? Kind of incredible that it s even possible Let alone a practical NFS implementation with it So far we ve only considered fail-stop

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

To do. Consensus and related problems. q Failure. q Raft

To do. Consensus and related problems. q Failure. q Raft Consensus and related problems To do q Failure q Consensus and related problems q Raft Consensus We have seen protocols tailored for individual types of consensus/agreements Which process can enter the

More information

Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement

Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement Journal of Computer Science 7 (1): 101-107, 2011 ISSN 1549-3636 2011 Science Publications Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement Poonam Saini and Awadhesh Kumar Singh

More information

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm Appears as Technical Memo MIT/LCS/TM-590, MIT Laboratory for Computer Science, June 1999 A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm Miguel Castro and Barbara Liskov

More information

Proactive Recovery in a Byzantine-Fault-Tolerant System

Proactive Recovery in a Byzantine-Fault-Tolerant System Proactive Recovery in a Byzantine-Fault-Tolerant System Miguel Castro and Barbara Liskov Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139

More information

Distributed Systems 11. Consensus. Paul Krzyzanowski

Distributed Systems 11. Consensus. Paul Krzyzanowski Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one

More information

CSCI 5454, CU Boulder Samriti Kanwar Lecture April 2013

CSCI 5454, CU Boulder Samriti Kanwar Lecture April 2013 1. Byzantine Agreement Problem In the Byzantine agreement problem, n processors communicate with each other by sending messages over bidirectional links in order to reach an agreement on a binary value.

More information

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18 Failure models Byzantine Fault Tolerance Fail-stop: nodes either execute the protocol correctly or just stop Byzantine failures: nodes can behave in any arbitrary way Send illegal messages, try to trick

More information

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00)

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00) PBFT: A Byzantine Renaissane Pratial Byzantine Fault-Tolerane (CL99, CL00) first to be safe in asynhronous systems live under weak synhrony assumptions -Byzantine Paos! The Setup Crypto System Model Asynhronous

More information

Byzantine Fault Tolerant Raft

Byzantine Fault Tolerant Raft Abstract Byzantine Fault Tolerant Raft Dennis Wang, Nina Tai, Yicheng An {dwang22, ninatai, yicheng} @stanford.edu https://github.com/g60726/zatt For this project, we modified the original Raft design

More information

Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Miguel Castro January 31, 2001 c Massachusetts Institute of Technology 2001 This research was supported in part by DARPA under contract DABT63-95-C-005, monitored by

More information

arxiv: v2 [cs.dc] 12 Sep 2017

arxiv: v2 [cs.dc] 12 Sep 2017 Efficient Synchronous Byzantine Consensus Ittai Abraham 1, Srinivas Devadas 2, Danny Dolev 3, Kartik Nayak 4, and Ling Ren 2 arxiv:1704.02397v2 [cs.dc] 12 Sep 2017 1 VMware Research iabraham@vmware.com

More information

ZZ: Cheap Practical BFT using Virtualization

ZZ: Cheap Practical BFT using Virtualization University of Massachusetts, Technical Report TR14-08 1 ZZ: Cheap Practical BFT using Virtualization Timothy Wood, Rahul Singh, Arun Venkataramani, and Prashant Shenoy Department of Computer Science, University

More information

Resource-efficient Byzantine Fault Tolerance. Tobias Distler, Christian Cachin, and Rüdiger Kapitza

Resource-efficient Byzantine Fault Tolerance. Tobias Distler, Christian Cachin, and Rüdiger Kapitza 1 Resource-efficient Byzantine Fault Tolerance Tobias Distler, Christian Cachin, and Rüdiger Kapitza Abstract One of the main reasons why Byzantine fault-tolerant (BFT) systems are currently not widely

More information

ByzID: Byzantine Fault Tolerance from Intrusion Detection

ByzID: Byzantine Fault Tolerance from Intrusion Detection : Byzantine Fault Tolerance from Intrusion Detection Sisi Duan UC Davis sduan@ucdavis.edu Karl Levitt UC Davis levitt@ucdavis.edu Hein Meling University of Stavanger, Norway hein.meling@uis.no Sean Peisert

More information

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed

More information

Replication in Distributed Systems

Replication in Distributed Systems Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over

More information

Evaluating BFT Protocols for Spire

Evaluating BFT Protocols for Spire Evaluating BFT Protocols for Spire Henry Schuh & Sam Beckley 600.667 Advanced Distributed Systems & Networks SCADA & Spire Overview High-Performance, Scalable Spire Trusted Platform Module Known Network

More information

Distributed Systems (ICE 601) Fault Tolerance

Distributed Systems (ICE 601) Fault Tolerance Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Introduction Failure Model Fault Tolerance Models state machine primary-backup Class Overview Introduction Dependability availability reliability

More information

Failures, Elections, and Raft

Failures, Elections, and Raft Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright

More information

Distributed Systems. Fault Tolerance. Paul Krzyzanowski

Distributed Systems. Fault Tolerance. Paul Krzyzanowski Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

ByTAM: a Byzantine Fault Tolerant Adaptation Manager

ByTAM: a Byzantine Fault Tolerant Adaptation Manager ByTAM: a Byzantine Fault Tolerant Adaptation Manager Frederico Miguel Reis Sabino Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering Supervisor: Prof. Doutor

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

ABSTRACT. Web Service Atomic Transaction (WS-AT) is a standard used to implement distributed

ABSTRACT. Web Service Atomic Transaction (WS-AT) is a standard used to implement distributed ABSTRACT Web Service Atomic Transaction (WS-AT) is a standard used to implement distributed processing over the internet. Trustworthy coordination of transactions is essential to ensure proper running

More information
