Byzantine fault tolerance. Jinyang Li With PBFT slides from Liskov

Similar documents
A definition. Byzantine Generals Problem. Synchronous, Byzantine world

Byzantine Fault Tolerance

Byzantine Fault Tolerance

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18

Zyzzyva. Speculative Byzantine Fault Tolerance. Ramakrishna Kotla. L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin

Robust BFT Protocols

Tolerating Latency in Replicated State Machines through Client Speculation

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Reducing the Costs of Large-Scale BFT Replication

Practical Byzantine Fault

Security (and finale) Dan Ports, CSEP 552

Authenticated Agreement

Practical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory

Practical Byzantine Fault Tolerance. Castro and Liskov SOSP 99

Practical Byzantine Fault Tolerance

Viewstamped Replication to Practical Byzantine Fault Tolerance. Pradipta De

Distributed Algorithms Practical Byzantine Fault Tolerance

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

Distributed Algorithms Practical Byzantine Fault Tolerance

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

AS distributed systems develop and grow in size,

G Distributed Systems: Fall Quiz II

CS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)

Recovering from a Crash. Three-Phase Commit

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00)

Byzantine Fault-Tolerance with Commutative Commands

Two New Protocols for Fault Tolerant Agreement

Practical Byzantine Fault Tolerance and Proactive Recovery

Janus. Consolidating Concurrency Control and Consensus for Commits under Conflicts. Shuai Mu, Lamont Nelson, Wyatt Lloyd, Jinyang Li

or? Paxos: Fun Facts Quorum Quorum: Primary Copy vs. Majority Quorum: Primary Copy vs. Majority

Tradeoffs in Byzantine-Fault-Tolerant State-Machine-Replication Protocol Design

Zzyzx: Scalable Fault Tolerance through Byzantine Locking

HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance

Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography

Byzantine Fault Tolerant Raft

Proactive Recovery in a Byzantine-Fault-Tolerant System

Revisiting Fast Practical Byzantine Fault Tolerance

Paxos and Replication. Dan Ports, CSEP 552

SpecPaxos. James Connolly && Harrison Davis

International Journal of Advanced Research in Computer Science and Software Engineering

Zzyzx: Scalable Fault Tolerance through Byzantine Locking

Machine Fault Tolerance for Reliable Datacenter Systems

Large-Scale Byzantine Fault Tolerance: Safe but Not Always Live

Proactive Recovery in a Byzantine-Fault-Tolerant System

Tolerating latency in replicated state machines through client speculation

Today: Fault Tolerance

ZZ: Cheap Practical BFT using Virtualization

Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement

Today: Fault Tolerance. Fault Tolerance

Failures, Elections, and Raft

TAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Zyzzyva: Speculative Byzantine Fault Tolerance

ZZ and the Art of Practical BFT Execution

Byzantine Fault Tolerance for Distributed Systems

arxiv:cs/ v3 [cs.dc] 1 Aug 2007

Distributed Systems 11. Consensus. Paul Krzyzanowski

Practical Byzantine Fault Tolerance

Zyzzyva: Speculative Byzantine Fault Tolerance

Consensus and related problems

ABSTRACT. Web Service Atomic Transaction (WS-AT) is a standard used to implement distributed

BAR Gossip. Lorenzo Alvisi UT Austin

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

ByzID: Byzantine Fault Tolerance from Intrusion Detection

CS 425 / ECE 428 Distributed Systems Fall 2017

ZZ and the Art of Practical BFT

Byzantine Techniques

Fault Tolerance. Distributed Systems. September 2002

Paxos Replicated State Machines as the Basis of a High- Performance Data Store

Paxos provides a highly available, redundant log of events

Scaling Byzantine Fault-tolerant Replication to Wide Area Networks

arxiv: v2 [cs.dc] 12 Sep 2017

Alternative Consensus

Towards Recoverable Hybrid Byzantine Consensus

Trustworthy Coordination of Web Services Atomic Transactions

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm

ZZ and the Art of Practical BFT Execution

Application-Aware Byzantine Fault Tolerance

Replication in Distributed Systems

When You Don t Trust Clients: Byzantine Proposer Fast Paxos

Distributed Systems COMP 212. Revision 2 Othon Michail

Dep. Systems Requirements

Steward: Scaling Byzantine Fault-Tolerant Systems to Wide Area Networks

Introduction to Distributed Systems Seif Haridi

Distributed Storage Systems: Data Replication using Quorums

From Viewstamped Replication to Byzantine Fault Tolerance

Prophecy: Using History for High Throughput Fault Tolerance

Zeno: Eventually Consistent Byzantine-Fault Tolerance

SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines

ZZ and the Art of Practical BFT Execution

Byzantine Fault Tolerant Coordination for Web Services Atomic Transactions

Distributed Systems. replication Johan Montelius ID2201. Distributed Systems ID2201

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin

EECS 591 DISTRIBUTED SYSTEMS. Manos Kapritsos Fall 2018

Enhancing Throughput of

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

Transcription:

Byzantine fault tolerance Jinyang Li With PBFT slides from Liskov

What we ve learnt so far: tolerate fail-stop failures Traditional RSM tolerates benign failures Node crashes Network partitions A RSM w/ 2f+1 replicas can tolerate f simultaneous crashes

Byzantine faults Nodes fail arbitrarily Failed node performs incorrect computation Failed nodes collude Causes: attacks, software/hardware errors Examples: Client asks bank to deposit $100, a Byzantine bank server substracts $100 instead. Client asks file system to store f1= aaa. A Byzantine server returns f1= bbb to clients.

Strawman defense Clients sign inputs. Clients verify computation based on signed inputs. Example: C stores signed file f1= aaa with server. C verifies that returned f1 is signed correctly. Problems: Byzantine node can return stale/correct computation E.g. Client stores signed f1= aaa and later stores signed f1= bbb, a Byzantine node can always return f1= aaa. Inefficient: clients have to perform computations!

PBFT ideas PBFT, Practical Byzantine Fault Tolerance, M. Castro and B. Liskov, SOSP 1999 Replicate service across many nodes Assumption: only a small fraction of nodes are Byzantine Rely on a super-majority of votes to decide on correct computation. PBFT property: tolerates <=f failures using a RSM with 3f+1 replicas

Why doesn t traditional RSM work with Byzantine nodes? Cannot rely on the primary to assign seqno Malicious primary can assign the same seqno to different requests! Cannot use Paxos for view change Paxos uses a majority accept-quorum to tolerate f benign faults out of 2f+1 nodes Does the intersection of two quorums always contain one honest node? Bad node tells different things to different quorums! E.g. tell N1 accept=val1 and tell N2 accept=val2

Paxos under Byzantine faults Prepare vid=1, myn=n0:1 OK val=null N2 N0 nh=n0:1 Prepare vid=1, myn=n0:1 OK val=null N1 nh=n0:1

Paxos under Byzantine faults accept vid=1, myn=n0:1, val=xyz OK N2 N0 decides on Vid1=xyz N0 nh=n0:1 X N1 nh=n0:1

Paxos under Byzantine faults prepare vid=1, myn=n1:1, val=abc OK val=null N2 X N0 decides on Vid1=xyz N0 nh=n0:1 N1 nh=n0:1

Paxos under Byzantine faults accept vid=1, myn=n1:1, val=abc OK N2 N0 decides on Vid1=xyz N0 nh=n0:1 X Agreement conflict! N1 nh=n1:1 N1 decides on Vid1=abc

PBFT main ideas Static configuration (same 3f+1 nodes) To deal with malicious primary Use a 3-phase protocol to agree on sequence number To deal with loss of agreement Use a bigger quorum (2f+1 out of 3f+1 nodes) Need to authenticate communications

BFT requires a 2f+1 quorum out of 3f+1 nodes 1. State: A 2. State: A 3. State: A 4. State: Servers X write A write A write A write A Clients For liveness, the quorum size must be at most N - f

BFT Quorums A A B B 1. State: 2. State: 3. State: 4. State: B Servers X write B write B write B write B Clients For correctness, any two quorums must intersect at least one honest node: (N-f) + (N-f) - N >= f+1 N >= 3f+1

PBFT Strategy Primary runs the protocol in the normal case Replicas watch the primary and do a view change if it fails

Replica state A replica id i (between 0 and N-1) Replica 0, replica 1, A view number v#, initially 0 Primary is the replica with id i = v# mod N A log of <op, seq#, status> entries Status = pre-prepared or prepared or committed

Normal Case Client sends request to primary or to all

Normal Case Primary sends pre-prepare message to all Pre-prepare contains <v#,seq#,op> Records operation in log as pre-prepared Keep in mind that primary might be malicious Send different seq# for the same op to different replicas Use a duplicate seq# for op

Normal Case Replicas check the pre-prepare and if it is ok: Record operation in log as pre-prepared Send prepare messages to all Prepare contains <i,v#,seq#,op> All to all communication

Normal Case: Replicas wait for 2f+1 matching prepares Record operation in log as prepared Send commit message to all Commit contains <i,v#,seq#,op> What does this stage achieve: All honest nodes that are prepared prepare the same value

Normal Case: Replicas wait for 2f+1 matching commits Record operation in log as committed Execute the operation Send result to the client

Normal Case Client waits for f+1 matching replies

BFT Client Request Pre-Prepare Prepare Commit Reply Primary Replica 2 Replica 3 Replica 4

View Change Replicas watch the primary Request a view change Commit point: when 2f+1 replicas have prepared

View Change Replicas watch the primary Request a view change send a do-viewchange request to all new primary requires 2f+1 requests sends new-view with this certificate Rest is similar

State transfer Additional Issues Checkpoints (garbage collection of the log) Selection of the primary Timing of view changes

Possible improvements Lower latency for writes (4 messages) Replicas respond at prepare Client waits for 2f+1 matching responses Fast reads (one round trip) Client sends to all; they respond immediately Client waits for 2f+1 matching responses

BFT Performance Phase BFS-PK BFS NFS-sdt 1 25.4 0.7 0.6 2 1528.6 39.8 26.9 3 80.1 34.1 30.7 4 87.5 41.3 36.7 5 2935.1 265.4 237.1 total 4656.7 381.3 332.0 Table 2: Andrew 100: elapsed time in seconds M. Castro and B. Liskov, Proactive Recovery in a Byzantine-Fault- Tolerant System, OSDI 2000

PBFT inspires much follow-on work BASE: Using abstraction to improve fault tolerance, R. Rodrigo et al, SOSP 2001 R.Kotla and M. Dahlin, High Throughput Byzantine Fault tolerance. DSN 2004 J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07 Abd-El-Malek et al, Fault-scalable Byzantine fault-tolerant services, SOSP 05 J. Cowling et al, HQ replication: a hybrid quorum protocol for Byzantine Fault tolerance, OSDI 06 Zyzzyva: Speculative Byzantine fault tolerance SOSP 07 Tolerating Byzantine faults in database systems using commit barrier scheduling SOSP 07 Low-overhead Byzantine fault-tolerant storage SOSP 07 Attested append-only memory: making adversaries stick to their word SOSP 07

Practical limitations of BFTs Expensive Protection is achieved only when <= f nodes fail Is 1 node more or less secure than 4 nodes? Does not prevent many classes attacks: Turn a machine into a botnet node Steal SSNs from servers