The Byzantine Generals Problem and Practical Byzantine Fault Tolerance


The Byzantine Generals Problem
Leslie Lamport, Robert Shostak, and Marshall Pease. ACM TOPLAS 1982.

Practical Byzantine Fault Tolerance
Miguel Castro and Barbara Liskov. OSDI 1999.

A definition
Byzantine (www.m-w.com):
1: of, relating to, or characteristic of the ancient city of Byzantium
4b: intricately involved : labyrinthine <rules of Byzantine complexity>

Lamport's reason: "I have long felt that, because it was posed as a cute problem about philosophers seated around a table, Dijkstra's dining philosopher's problem received much more attention than it deserves." (http://research.microsoft.com/users/lamport/pubs/pubs.html#byz)

Byzantine Generals Problem
Synchronous, Byzantine world. Concerned with (binary) atomic broadcast:
- All correct nodes receive the same value.
- If the broadcaster is correct, correct nodes receive the broadcast value.
Broadcast can be used to build consensus (aka agreement) protocols.
Consensus: think Byzantine fault-tolerant (BFT) Paxos.
Failure-model axes: synchronous vs. asynchronous timing; fail-stop vs. Byzantine faults.

Cool note
Example Byzantine fault-tolerant system: the Seawolf submarine's control system.
[Sims, J. T. 1997. Redundancy Management Software Services for Seawolf Ship Control System. In Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97), June 25-27, 1997. IEEE Computer Society, Washington, DC, 390.]

Practical Byzantine Fault Tolerance: asynchronous, Byzantine
On the failure-model axes (synchronous vs. asynchronous, fail-stop vs. Byzantine), PBFT targets the asynchronous, Byzantine quadrant. But it remains to be seen if commodity distributed systems are willing to pay to have so many replicas in a system.

Why async BFT?
- Why BFT: malicious attacks, software errors. (Need N-version programming?)
- A faulty client can write garbage data, but can't make the system inconsistent (violate operational semantics).
- Why async: a faulty network can violate timing assumptions, but it can also prevent liveness.
[For different liveness properties, see, e.g., Cachin, C., Kursawe, K., and Shoup, V. 2000. Random oracles in Constantinople: practical asynchronous Byzantine agreement using cryptography (extended abstract). In Proceedings of PODC '00. ACM, New York, NY, 123-132.]

Async BFT consensus: need 3f+1 nodes
Sketch of proof: divide 3f nodes into three groups of f — left, middle, right — where the middle f are faulty. When left+middle talk, they must reach consensus (right may be crashed). The same holds for right+middle. The faulty middle can then steer the two partitions to different values!
[See Bracha, G. and Toueg, S. 1985. Asynchronous consensus and broadcast protocols. J. ACM 32, 4 (Oct. 1985), 824-840.]

FLP impossibility: async consensus may not terminate
Sketch of proof: the system starts in a bivalent state (it may decide 0 or 1). At some point the system is one message away from deciding 0 or 1; if that message is delayed, another message may move the system away from deciding. This holds even when servers can only crash (not Byzantine)! Hence no protocol can always be live (but there exist randomized BFT variants that are live with probability 1).
[See Fischer, M. J., Lynch, N. A., and Paterson, M. S. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr. 1985), 374-382.]
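The replica-count arithmetic above can be sketched in a few lines (an illustrative sketch, not from the slides; the function names are my own):

```python
def replicas_needed(f: int, byzantine: bool) -> int:
    """Minimum replicas to tolerate f simultaneous faults."""
    return 3 * f + 1 if byzantine else 2 * f + 1

def quorum_size(f: int, byzantine: bool) -> int:
    """Votes needed so any two quorums share at least one correct node."""
    return 2 * f + 1 if byzantine else f + 1

# Crash-tolerant RSM (e.g., Paxos): 2f+1 replicas, majority quorums.
assert replicas_needed(1, byzantine=False) == 3
# Async BFT consensus (e.g., PBFT): 3f+1 replicas, quorums of 2f+1.
assert replicas_needed(1, byzantine=True) == 4
assert quorum_size(1, byzantine=True) == 3
```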

Byzantine fault tolerance
Jinyang Li, with PBFT slides from Liskov

What we've learnt so far: tolerating fail-stop failures
A traditional RSM tolerates benign failures:
- Node crashes
- Network partitions
An RSM with 2f+1 replicas can tolerate f simultaneous crashes.

Byzantine faults
Nodes fail arbitrarily: a failed node may perform incorrect computation, and failed nodes may collude.
Causes: attacks, software/hardware errors.
Examples:
- A client asks the bank to deposit $100; a Byzantine bank server subtracts $100 instead.
- A client asks the file system to store f1="aaa"; a Byzantine server returns f1="bbb" to clients.

Strawman defense
Clients sign inputs, and clients verify computation based on the signed inputs.
Example: C stores a signed file f1="aaa" with the server, then verifies that the returned f1 is signed correctly.
Problems:
- A Byzantine node can return stale (yet correctly signed) computation. E.g., if the client stores signed f1="aaa" and later stores signed f1="bbb", a Byzantine node can always return f1="aaa".
- Inefficient: clients have to perform the computations!
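The stale-reply problem can be seen in a toy sketch (hypothetical key and encoding; real systems use public-key signatures rather than a shared HMAC key, but the failure mode is the same):

```python
import hashlib
import hmac

KEY = b"client-secret"  # hypothetical client signing key

def sign(value: bytes) -> bytes:
    return hmac.new(KEY, value, hashlib.sha256).digest()

def verify(value: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(value), sig)

# Client stores f1="aaa", then later overwrites it with f1="bbb".
old = (b"f1=aaa", sign(b"f1=aaa"))
new = (b"f1=bbb", sign(b"f1=bbb"))

# A Byzantine server replays the stale pair; the signature still
# verifies, so signing alone cannot detect a stale reply.
value, sig = old
assert verify(value, sig)
```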

PBFT ideas
[PBFT: Practical Byzantine Fault Tolerance, M. Castro and B. Liskov, SOSP 1999]
- Replicate the service across many nodes; assume only a small fraction of nodes are Byzantine.
- Rely on a super-majority of votes to decide on the correct computation.
- PBFT property: tolerates <= f failures using an RSM with 3f+1 replicas.

Why doesn't traditional RSM work with Byzantine nodes?
- Cannot rely on the primary to assign seqno: a malicious primary can assign the same seqno to different requests!
- Cannot use Paxos for view change: Paxos uses a majority accept-quorum to tolerate f benign faults out of 2f+1 nodes. Does the intersection of two quorums always contain one honest node? No — a bad node can tell different things to different quorums, e.g., tell N1 accept=val1 and tell N2 accept=val2.

Paxos under Byzantine faults (message diagram: nodes N0, N1, N2; N2 is Byzantine)
- N0 sends prepare (vid=1, myn=N0:1) to the majority {N0, N2}; both reply OK with val=null.
- N0 sends accept (vid=1, myn=N0:1, val=xyz); the quorum replies OK, so N0 decides Vid1=xyz.
- Later, N1 sends prepare (vid=1, myn=N1:1) to the majority {N1, N2}; Byzantine N2 replies OK with val=null, hiding the accepted xyz, and N1 records nh=N1:1.
- N1 sends accept (vid=1, myn=N1:1, val=abc); the quorum replies OK, so N1 decides Vid1=abc.
- Agreement conflict! N0 decided Vid1=xyz while N1 decided Vid1=abc.

PBFT main ideas
- Static configuration (the same 3f+1 nodes).
- To deal with a malicious primary: use a 3-phase protocol to agree on the sequence number.
- To deal with loss of agreement: use a bigger quorum (2f+1 out of 3f+1 nodes).
- Need to authenticate communications.

BFT requires a 2f+1 quorum out of 3f+1 nodes (clients/servers diagram: N=4 servers, f=1)
A client's write A completes once a 2f+1 = 3 quorum (say servers 1-3) records A; server 4 may be slow or faulty. For liveness, the quorum size must be at most N - f.
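The Paxos-under-Byzantine-faults scenario above can be played out as a toy simulation (node names from the slide; no real networking, and the Byzantine node simply hides prior accepts from everyone):

```python
# Toy model: 3 nodes tolerate f=1 *benign* fault with majority quorums
# of size 2. N2 is Byzantine: it reports val=null to every proposer.
def run_paxos_round(value, quorum, byzantine, accepted):
    """One prepare/accept round. Honest quorum members report any value
    they previously accepted; Byzantine members always report None."""
    prior = [accepted.get(n) for n in quorum if n not in byzantine]
    if any(v is not None for v in prior):
        value = next(v for v in prior if v is not None)
    for n in quorum:
        if n not in byzantine:
            accepted[n] = value
    return value

accepted = {}
# N0 uses quorum {N0, N2} and decides xyz.
d0 = run_paxos_round("xyz", {"N0", "N2"}, {"N2"}, accepted)
# N1 uses quorum {N1, N2}; the only overlap with N0's quorum is the
# Byzantine N2, which hides xyz, so N1 decides abc.
d1 = run_paxos_round("abc", {"N1", "N2"}, {"N2"}, accepted)
assert d0 == "xyz" and d1 == "abc"  # agreement conflict
```

The failure is exactly the quorum-intersection problem: the two majorities intersect only in the Byzantine node, so nothing honest carries the decided value forward.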

BFT quorums (clients/servers diagram, continued)
A later write B completes with a different 2f+1 quorum (say servers 2-4); the two quorums intersect in at least f+1 servers, so some honest server sees both A and B even though servers 1 and 4 each missed a write.

PBFT strategy
- The primary runs the protocol in the normal case.
- Replicas watch the primary and do a view change if it fails.
- For correctness, any two quorums must intersect in at least one honest node: (N-f) + (N-f) - N >= f+1, i.e., N >= 3f+1.

Replica state
- A replica id i (between 0 and N-1): replica 0, replica 1, ...
- A view number v#, initially 0. The primary is the replica with id i = v# mod N.
- A log of <op, seq#, status> entries, where status = pre-prepared, prepared, or committed.

Normal case
The client sends its request to the primary (or to all replicas).
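The correctness inequality can be checked mechanically (a small sketch; the range of f is arbitrary):

```python
# With N = 3f+1 replicas and quorums of size N-f = 2f+1, any two quorums
# overlap in at least (N-f) + (N-f) - N = f+1 nodes — more than the f
# possibly-Byzantine nodes, so the overlap always contains an honest node.
for f in range(1, 100):
    N = 3 * f + 1
    quorum = N - f              # 2f+1, also the liveness bound N - f
    overlap = 2 * quorum - N    # worst-case intersection of two quorums
    assert overlap == f + 1 and overlap > f
```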

Normal case
The primary sends a pre-prepare message to all. Pre-prepare contains <v#, seq#, op>, and each recipient records the operation in its log as pre-prepared. Keep in mind that the primary might be malicious:
- It could send different seq# for the same op to different replicas.
- It could use a duplicate seq# for different ops.

Normal case
Replicas check the pre-prepare, and if it is ok:
- Record the operation in the log as pre-prepared.
- Send prepare messages to all. Prepare contains <i, v#, seq#, op>. (All-to-all communication.)

Normal case
Replicas wait for 2f+1 matching prepares, then:
- Record the operation in the log as prepared.
- Send a commit message to all. Commit contains <i, v#, seq#, op>.
What does this stage achieve? All honest nodes that are prepared prepare the same value.

Normal case
Replicas wait for 2f+1 matching commits, then:
- Record the operation in the log as committed.
- Execute the operation and send the result to the client.
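The normal-case bookkeeping of a single replica can be sketched as follows (a simplified model: no networking, no signature checks, and the class/field names are my own):

```python
from collections import defaultdict

F = 1
QUORUM = 2 * F + 1  # 2f+1 matching votes out of N = 3f+1 replicas

class Replica:
    """One replica's normal-case state; an entry is keyed (v#, seq#, op)."""
    def __init__(self):
        self.status = {}               # (v, n, op) -> phase string
        self.votes = defaultdict(set)  # (phase, v, n, op) -> sender ids

    def on_pre_prepare(self, v, n, op):
        self.status[(v, n, op)] = "pre-prepared"
        # ...a real replica would now multicast PREPARE(i, v, n, op)

    def on_vote(self, phase, sender, v, n, op):
        """phase is 'prepare' or 'commit'; advance on 2f+1 matching votes."""
        self.votes[(phase, v, n, op)].add(sender)
        if len(self.votes[(phase, v, n, op)]) >= QUORUM:
            self.status[(v, n, op)] = (
                "prepared" if phase == "prepare" else "committed")

r = Replica()
r.on_pre_prepare(0, 1, "deposit $100")
for sender in (0, 1, 2):               # 2f+1 matching prepares
    r.on_vote("prepare", sender, 0, 1, "deposit $100")
assert r.status[(0, 1, "deposit $100")] == "prepared"
for sender in (0, 1, 2):               # 2f+1 matching commits
    r.on_vote("commit", sender, 0, 1, "deposit $100")
assert r.status[(0, 1, "deposit $100")] == "committed"
```

Because votes are keyed by the full (v#, seq#, op) tuple, a malicious primary that equivocates on seq# cannot assemble 2f+1 matching votes for two conflicting assignments.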

Normal case
The client waits for f+1 matching replies.
(Message-flow diagram: Client -> Primary: Request; Primary -> Replicas 2-4: Pre-Prepare; all-to-all Prepare; all-to-all Commit; Replicas -> Client: Reply.)

View change
- Replicas watch the primary and request a view change.
- Commit point: when 2f+1 replicas have prepared.

View change
- Replicas watch the primary and request a view change:
  - send a do-viewchange request to all;
  - the new primary requires 2f+1 such requests and sends a new-view message with this certificate.
- The rest is similar to the normal case.
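Why f+1 matching replies suffice: at most f replicas are faulty, so f+1 matching replies must include at least one honest replica. A sketch of the client side (function name is my own):

```python
from collections import Counter

def client_result(replies, f):
    """Accept a result once f+1 replicas report the same value;
    at most f are faulty, so an honest replica vouches for it."""
    for value, count in Counter(replies).items():
        if count >= f + 1:
            return value
    return None  # not enough matching replies yet; keep waiting

# f=1: one Byzantine replica lies, but two honest matching replies suffice.
assert client_result(["$100 deposited", "$100 deposited", "-$100"], 1) \
    == "$100 deposited"
assert client_result(["-$100"], 1) is None
```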

Additional issues
- State transfer
- Checkpoints (garbage collection of the log)
- Selection of the primary
- Timing of view changes

Possible improvements
- Lower latency for writes (4 messages): replicas respond at prepare, and the client waits for 2f+1 matching responses.
- Fast reads (one round trip): the client sends to all replicas, which respond immediately; the client waits for 2f+1 matching responses.

Practical limitations of BFT
- Expensive.
- Protection is achieved only when <= f nodes fail. (Is 1 node more or less secure than 4 nodes?)
- Does not prevent many types of attacks, e.g., turning a machine into a botnet node, or stealing SSNs from servers.