Byzantine Fault Tolerance


Byzantine Fault Tolerance CS6450: Distributed Systems Lecture 10 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed for use under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Some material taken/derived from MIT 6.824 by Robert Morris, Franz Kaashoek, and Nickolai Zeldovich.

3f + 1? At f + 1 we can tolerate f failures and hold on to data. At 2f + 1 we can tolerate f failures and remain available. What do we get for 3f + 1? SMR that can tolerate f malicious or arbitrarily nasty failures
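A minimal sketch of this arithmetic (plain Python, not from the slides; the goal names are made up for illustration):

# Replica counts needed to tolerate f failures under different goals,
# matching the f+1 / 2f+1 / 3f+1 figures above.

def replicas_needed(f: int, goal: str) -> int:
    """Return the number of replicas needed to tolerate f failures."""
    if goal == "retain-data":        # f+1: at least one copy survives f crashes
        return f + 1
    if goal == "stay-available":     # 2f+1: a majority survives f crashes (Paxos/Raft)
        return 2 * f + 1
    if goal == "byzantine-smr":      # 3f+1: PBFT-style tolerance of f arbitrary faults
        return 3 * f + 1
    raise ValueError(goal)

if __name__ == "__main__":
    for goal in ("retain-data", "stay-available", "byzantine-smr"):
        print(goal, replicas_needed(1, goal))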

So far: Fail-stop failures. Traditional state machine replication tolerates fail-stop failures: node crashes, network breaks or partitions. State machine replication with N = 2f+1 replicas can tolerate f simultaneous fail-stop failures. Two algorithms: Paxos, Raft.

Byzantine faults. Byzantine fault: a node or component fails arbitrarily. It might perform incorrect computation, give conflicting information to different parts of the system, or collude with other failed nodes. Why would nodes fail arbitrarily? A software bug present in the code, a hardware failure, or an attack on the system.

Today: Byzantine fault tolerance. Can we provide state machine replication for a service in the presence of Byzantine faults? Such a service is called a Byzantine Fault Tolerant (BFT) service. Why might we care about this level of reliability?

Mini-case-study: Boeing 777 fly-by-wire primary flight control system. Triple-redundant, dissimilar processor hardware: 1. Intel 80486, 2. Motorola, 3. AMD. Key techniques: hardware and software diversity, voting between components; each processor runs code from a different compiler. Simplified design: pilot inputs → three processors; processors vote → control surface.
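The voting idea can be sketched in a few lines (illustrative only; the vote function and the numeric outputs are hypothetical, and the real 777 logic is far more involved):

# Three dissimilar processors each compute a command for a control surface;
# the output used is the majority value, so a single faulty unit is outvoted.

from collections import Counter
from typing import Optional

def vote(outputs: list) -> Optional[object]:
    """Return the value produced by a majority of processors, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

if __name__ == "__main__":
    print(vote([12.5, 12.5, 99.0]))   # one faulty processor -> 12.5 wins
    print(vote([1.0, 2.0, 3.0]))      # no majority -> None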

Today 1. Traditional state-machine replication for BFT? 2. Practical BFT replication algorithm 3. Performance and Discussion

Review: Tolerating one fail-stop failure. Traditional state machine replication (Paxos) requires, e.g., 2f + 1 = three replicas if f = 1. Operations are totally ordered → correctness. Each operation uses a quorum of f + 1 = 2 of them: overlapping quorums.
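A quick way to convince yourself of the overlap property is to enumerate the quorums (a brute-force sketch; the check_majority_overlap helper is not part of Paxos itself):

# With N = 2f+1 replicas, any two quorums of size f+1 intersect in at least
# one replica, which is why fail-stop ordering decisions can't diverge.

from itertools import combinations

def check_majority_overlap(f: int) -> bool:
    n = 2 * f + 1
    quorum_size = f + 1
    quorums = list(combinations(range(n), quorum_size))
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))

if __name__ == "__main__":
    print(check_majority_overlap(1))  # True: every pair of 2-of-3 quorums overlaps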

Use Paxos for BFT? 1. Can't rely on the primary to assign seqnos: it could assign the same seqno to different requests. 2. Can't use Paxos for view change: under Byzantine faults, the intersection of two majority (f + 1 node) quorums may be a bad node. The bad node tells different quorums different things! E.g., it tells N0 "accept val1" but tells N1 "accept val2".

[Diagram sequence: Paxos under Byzantine faults (f = 1), three nodes N0, N1, N2 with N2 Byzantine. N0 runs Prepare(N0:1) and Accept(N0:1, val=xyz); with N2's OK it reaches an f+1 quorum and decides xyz. N2 then runs a higher ballot N2:1 with N1 alone for a different value, and N1 decides abc. Conflicting decisions!]
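A toy rendering of the scenario above (hypothetical vote dictionaries, not real Paxos messages) shows how equivocation by N2 lets two honest nodes each assemble an f+1 "majority" for different values:

from typing import Optional

f = 1
majority = f + 1  # 2 of 3 nodes

# What the Byzantine node N2 reports to each honest node:
votes_seen_by_N0 = {"N0": "xyz", "N2": "xyz"}   # N2 tells N0 it accepted xyz
votes_seen_by_N1 = {"N1": "abc", "N2": "abc"}   # ...but tells N1 it accepted abc

def decide(votes: dict) -> Optional[str]:
    """Decide a value once a majority of acceptances for it has been seen."""
    values = list(votes.values())
    for v in set(values):
        if values.count(v) >= majority:
            return v
    return None

print("N0 decides:", decide(votes_seen_by_N0))  # xyz
print("N1 decides:", decide(votes_seen_by_N1))  # abc: conflicting decisions!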

Today 1. Traditional state-machine replication for BFT? 2. Practical BFT replication algorithm [Liskov & Castro, 2001] 3. Performance and Discussion

Practical BFT: Overview. Uses 3f+1 replicas to survive f failures; shown to be minimal (Lamport). Provides state machine replication for an arbitrary service accessed by operations, e.g., file system ops that read and write files and directories. Tolerates Byzantine-faulty clients. Goal in paper: an NFS service that provides near-unreplicated performance.

Correctness argument. Assume operations are deterministic and replicas start in the same state. Then, if replicas execute the same requests in the same order, correct replicas will produce identical results.
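A tiny sketch of this argument, using a made-up deterministic key-value state machine (the apply helper and the log contents are purely illustrative):

def apply(state: dict, op: tuple) -> str:
    """A deterministic key-value operation: ('put', k, v) or ('get', k)."""
    if op[0] == "put":
        state[op[1]] = op[2]
        return "ok"
    return state.get(op[1], "<missing>")

log = [("put", "x", "1"), ("put", "y", "2"), ("get", "x")]

# Two replicas start in the same (empty) state and apply the same log in order.
replica_a, replica_b = {}, {}
results_a = [apply(replica_a, op) for op in log]
results_b = [apply(replica_b, op) for op in log]
assert replica_a == replica_b and results_a == results_b
print("identical states and replies:", replica_a, results_a)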

First, a Few Issues 1. Caveat: Independence 2. Spoofing/Authentication

The Caveat: Independence. BFT assumes independent node failures! Is this a big assumption? We actually had this assumption with consensus too: if nodes fail in a correlated way, it amplifies the loss of a single node, and if the correlation pushes the count above f the system still wedges. Put another way: for Paxos to remain available when software bugs can produce temporally related crashes, what do we need? 2f + 1 independent implementations.

The Struggle for Independence. Same here: for true independence, we need 3f + 1 implementations. It matters even more here: 1. Nodes may be actively malicious, and that should be OK; but they are looking for our weak spot and will exploit it to amplify their effect. 2. With more than f failures, anything can happen to the data: an attacker might change it, delete it, etc., and we'll never know. Requires different implementations, operating systems, root passwords, administrators. Ugh! Still, we don't always know whether we've exceeded f.

Non-problem: Client failures. Clients can't cause internal inconsistencies in the data on the servers; that is the state machine replication property. Clients can still write bogus data to the system, so the system should authenticate clients and separate their data just like any other datastore. This is a separate problem.

What clients do: 1. Send requests to the primary replica. 2. Wait for f+1 identical replies. Note: the replies may be deceptive, i.e., a replica returns the correct answer but locally does otherwise! But at least one reply is actually from a non-faulty replica. [Diagram: client talking to 3f+1 replicas.]
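A sketch of the client-side rule (the reply values and the accept_reply helper are hypothetical): keep collecting replies until f+1 of them are identical; that value must have come from at least one non-faulty replica.

from collections import Counter
from typing import Optional

def accept_reply(replies: list, f: int) -> Optional[object]:
    """Return a reply vouched for by at least f+1 replicas, else None."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None

if __name__ == "__main__":
    f = 1
    print(accept_reply(["v2", "bogus"], f))        # None: keep waiting
    print(accept_reply(["v2", "bogus", "v2"], f))  # "v2": f+1 matching replies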

What replicas do: carry out a protocol that ensures that replies from honest replicas are correct and that enough replicas process each request, so that the non-faulty replicas process the same requests in the same order. Non-faulty replicas obey the protocol.

Primary-Backup protocol: the group runs in a view, and the view number designates the primary replica. [Diagram: client, primary, backups within a view.] The primary is the replica whose id equals the view number modulo the number of replicas.
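A one-line sketch of primary selection under that rule (the primary_for_view helper is made up; the rule itself is PBFT's), so every replica can compute the current primary locally:

def primary_for_view(view: int, n_replicas: int) -> int:
    """PBFT-style rule: the primary for view v is replica v mod N."""
    return view % n_replicas

if __name__ == "__main__":
    N = 4  # 3f+1 with f = 1
    for v in range(6):
        print("view", v, "-> primary is replica", primary_for_view(v, N))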

Ordering requests: the primary picks the ordering of requests, but the primary might be a liar! The backups ensure the primary behaves correctly: they check and certify correct ordering, and trigger view changes to replace a faulty primary.

Byzantine quorums (f = 1). A Byzantine quorum contains 2f+1 replicas, and one op's quorum overlaps with the next op's quorum. There are 3f+1 replicas in total, so the overlap is at least f+1 replicas, and f+1 replicas must contain at least 1 non-faulty replica.
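The overlap arithmetic as a sketch (the min_overlap helper is illustrative): two quorums of size 2f+1 drawn from 3f+1 replicas share at least (2f+1) + (2f+1) - (3f+1) = f+1 members, and f+1 members must include an honest one.

def min_overlap(f: int) -> int:
    """Worst-case intersection of two Byzantine quorums of size 2f+1."""
    n = 3 * f + 1
    quorum = 2 * f + 1
    return 2 * quorum - n

if __name__ == "__main__":
    for f in (1, 2, 3):
        overlap = min_overlap(f)
        print(f"f={f}: quorums overlap in >= {overlap} replicas "
              f"(more than f, so at least one is honest)")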

Quorum certificates. A Byzantine quorum contains 2f+1 replicas. Quorum certificate: a collection of 2f + 1 signed, identical messages from a Byzantine quorum, all agreeing on the same statement. It guarantees no other operation can commit; servers provide it as proof to each other.

Keys. Each client and replica has a private-public keypair, and messages between clients and servers are signed. MACs/symmetric-key crypto are used to make things fast.
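A sketch of the symmetric-key authenticator idea using Python's standard hmac module (the shared key and message format shown are placeholders, not PBFT's actual wire format): each pair of nodes shares a secret and tags messages with a MAC instead of a full public-key signature.

import hmac, hashlib

shared_key = b"example-shared-secret"   # hypothetical pairwise key

def tag(message: bytes) -> bytes:
    """Compute a MAC over the message with the shared key."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def verify(message: bytes, mac: bytes) -> bool:
    """Constant-time check that the MAC matches the message."""
    return hmac.compare_digest(tag(message), mac)

if __name__ == "__main__":
    m = b"PRE-PREPARE view=0 seq=1 request=..."
    mac = tag(m)
    print(verify(m, mac))                 # True
    print(verify(b"tampered" + m, mac))   # False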

Ordering requests. [Diagram: the client sends request m, signed by the client, to the primary; the primary broadcasts "let seq(m)=n", signed by the primary, to backups 1-3.] The primary chooses the request's sequence number (n); the sequence number determines the order of execution. But the primary could be lying, sending a different message to each backup!

Checking the primary's message. [Diagram: after the primary's "let seq(m)=n", backup 1 and backup 2 each broadcast "I accept seq(m)=n", signed by themselves.] Backups locally verify they've seen only one client request for sequence number n; if the local check passes, the replica broadcasts an accept message. Each replica makes this decision independently.

Collecting a prepared certificate (f = 1). [Diagram: the primary and backups 1-2 each reach the prepared (P) state for seq(m)=n.] Backups wait to collect a prepared quorum certificate. A message is prepared (P) at a replica when it has: a message from the primary proposing the seqno, plus 2f messages (from itself and others) accepting the seqno. Each correct node has a prepared certificate locally, but does not know whether the other correct nodes do too! So we can't commit yet.
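A sketch of the prepared check at one replica (the message tuples and the prepared helper are simplified stand-ins for PBFT's real pre-prepare/prepare messages):

def prepared(proposal, accepts, f: int) -> bool:
    """proposal: (view, seqno, digest) from the primary.
    accepts: set of (replica_id, view, seqno, digest) accept messages."""
    if proposal is None:
        return False
    view, seqno, digest = proposal
    # Count distinct replicas whose accept matches the primary's proposal.
    matching = {r for (r, v, n, d) in accepts
                if (v, n, d) == (view, seqno, digest)}
    return len(matching) >= 2 * f

if __name__ == "__main__":
    prop = (0, 1, "digest-of-m")
    accepts = {(1, 0, 1, "digest-of-m"), (2, 0, 1, "digest-of-m")}
    print(prepared(prop, accepts, f=1))   # True: proposal + 2f matching accepts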

Why we can't commit yet: in B2's shoes. [Diagram: B2 is prepared (P) for seq(m)=n but cannot tell whether the primary, B1, or B3 are.] B2 doesn't know if B1 is or will be prepared; it only knows B1 accepted the message. Same with the primary. If B2's messages are lost and it goes down, B3 could come up and accept a different message, and the commit may not happen.

Collecting a committed certificate (f = 1). [Diagram: the primary and backups 1-2 move from prepared (P) to committed (C) by exchanging "have cert for seq(m)=n" messages, each signed by the sender.] Prepared replicas announce that they know a quorum accepts the request. Replicas then wait for a committed quorum certificate C: 2f+1 different statements that a replica is prepared. Once the request is committed, replicas execute the operation and send a reply directly back to the client.
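A sketch of the commit rule (again with simplified stand-in messages; the committed helper is illustrative):

def committed(prepared_announcements, f: int) -> bool:
    """prepared_announcements: set of replica ids (possibly including this
    replica) claiming 'I am prepared' for the same (view, seqno, digest)."""
    return len(prepared_announcements) >= 2 * f + 1

if __name__ == "__main__":
    print(committed({0, 1, 2}, f=1))   # True: 2f+1 = 3 prepared replicas
    print(committed({0, 1}, f=1))      # False: keep waiting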

Byzantine primary (f = 1). [Diagram: the primary tells backup 1 "let seq(m)=n" but tells backups 2 and 3 "let seq(m')=n"; the backups' accepts are split between m and m'.] Recall: to prepare, a replica needs the primary's message and 2f accepts. Backup 1 has the primary's message for m but accepts for m'; backups 2 and 3 have the primary's message for m' plus one matching accept. No one has accumulated enough messages to prepare → time for a view change.

Byzantine primary. In general, backups won't prepare if the primary lies. Suppose they did: two distinct requests m and m' prepared for the same sequence number n. Then the prepared quorum certificates (each of size 2f+1) would intersect at an honest replica, so that honest replica would have sent an accept message for both m and m'. But an honest replica accepts only one request per sequence number, so m = m', a contradiction.

View change. If a replica suspects the primary is faulty, it requests a view change: it sends a view-change request to all replicas, and everyone acks the view-change request. The new primary collects a quorum (2f+1) of responses and sends a new-view message with this certificate.
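A sketch of the new-primary bookkeeping (the try_new_view helper and message shapes are hypothetical; real PBFT view-change messages also carry prepared certificates so committed operations survive into the new view):

def try_new_view(view_change_msgs, new_view: int, f: int):
    """view_change_msgs: set of replica ids requesting view new_view."""
    if len(view_change_msgs) >= 2 * f + 1:
        # Quorum gathered: broadcast NEW-VIEW carrying this certificate.
        return ("NEW-VIEW", new_view, frozenset(view_change_msgs))
    return None  # keep waiting for more view-change requests

if __name__ == "__main__":
    print(try_new_view({0, 2}, new_view=1, f=1))       # None: not yet a quorum
    print(try_new_view({0, 2, 3}, new_view=1, f=1))    # NEW-VIEW certificate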

Considerations for view change. Committed operations need to survive into the next view, since a client may already have gotten an answer. We also need to preserve liveness: if replicas are too quick to trigger a view change when the primary is actually fine, we get a performance problem; or a malicious replica may try to subvert the system by proposing a bogus view change.

Garbage collection. Replicas store all messages and certificates in a log, but we can't let the log grow without bound, so there is a protocol to shrink the log when it gets too big. Discard messages and certificates on commit? No! They are needed for view change. Replicas have to agree to shrink the log.

Proactive recovery. What we've done so far gives good service provided there are no more than f failures over the system's lifetime, but we cannot recognize faulty replicas! Therefore, proactive recovery: recover each replica to a known good state, whether faulty or not. This gives correct service provided there are no more than f failures in a small time window, e.g., 10 minutes.

Recovery protocol sketch. Ingredients: a watchdog timer, a secure co-processor that stores the node's private key (of its private-public keypair), and read-only memory. Restart each node periodically: save its state (a timed operation), reboot and reload code from read-only memory, discard all secret keys (to prevent impersonation), then establish new secret keys and state.

Today 1. Traditional state-machine replication for BFT? 2. Practical BFT replication algorithm [Liskov & Castro, 2001] 3. Performance and Discussion

File system benchmarks. The BFS filesystem runs atop BFT, with four replicas tolerating one Byzantine failure, using a modified Andrew filesystem benchmark. What's its performance relative to NFS? Comparing BFS against Linux NFSv2 (which is unsafe!), BFS is about 15% slower: the claim is that it can be used in practice.

Practical limitations of BFT. Protection is achieved only when at most f nodes fail; is one node more or less secure than four? It needs independent implementations of the service, and more messages and rounds than conventional state machine replication. It does not prevent many classes of attacks: turning a machine into a botnet node, stealing SSNs from servers.

Large impact. PBFT inspired much follow-on work to address its limitations, and the ideas surrounding Byzantine fault tolerance have found numerous applications: the Boeing 777 and 787 flight control computer systems, and digital currency systems.