CS 138: Practical Byzantine Consensus
Copyright 2017 Thomas W. Doeppner. All rights reserved.

Scenario
- Asynchronous system
- Signed messages
- Servers are state machines
- It has to be practical

Ground Rules
- All messages, including client requests, are signed
  - messages cannot be modified
- Primary determines order: assigns sequence numbers
  - a traitorous primary server could give different total orders to different backup servers
- A server could be unresponsive
  - delay does not affect safety
  - but it can affect liveness - exponential bound on how long delays may be
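As a concrete illustration of "signed" here, below is a minimal Python sketch of attaching an authenticator so a tampered message is detectable. This is only a sketch: the key and function names are made up for the example, and the published protocol uses public-key signatures (later, per-pair MACs) rather than one shared HMAC key.

    import hashlib
    import hmac

    SECRET = b"shared-key-for-illustration"  # hypothetical; real PBFT uses per-replica keys

    def sign(message: bytes) -> bytes:
        # Attach an authenticator so any modification of the message is detectable.
        return hmac.new(SECRET, message, hashlib.sha256).digest()

    def verify(message: bytes, tag: bytes) -> bool:
        return hmac.compare_digest(sign(message), tag)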

Faulty Servers
- n > 3f; assume n = 3f+1
- Backup servers must have identical responses from n-f servers (including the primary) before they will trust the primary
- A new primary is chosen if the current one is faulty (or non-responsive)
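A quick sanity check on these counts, sketched in Python (the function name is illustrative): with n = 3f+1, a quorum of n-f = 2f+1 servers is large enough that any two quorums overlap in at least f+1 servers, i.e., in at least one non-faulty server.

    def pbft_sizes(f: int) -> dict:
        """Counts implied by tolerating f Byzantine faults."""
        n = 3 * f + 1              # minimum number of replicas
        quorum = n - f             # 2f+1 matching responses needed
        overlap = 2 * quorum - n   # two quorums share at least f+1 replicas
        return {"replicas": n, "quorum": quorum, "quorum_overlap": overlap}

    print(pbft_sizes(1))  # {'replicas': 4, 'quorum': 3, 'quorum_overlap': 2}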

The Request
[diagram: the client's request arrives at the primary (server 0), which multicasts pre-prepare messages to servers 1, 2, and 3]

Non-Primaries Respond (1)-(3)
[diagrams: servers 1, 2, and 3 each multicast a prepare message to the other three servers]

Servers Commit to Request (1)-(4)
[diagrams: each of the four servers in turn multicasts a commit message to the other three]

All Respond to Client
[diagram: all four servers send their reply directly to the client]

Contents of Messages
- pre-prepare: seq #; digest(msg); msg
- prepare: seq #; digest(msg); i (the ID of the sending server)
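The two message bodies might look like the following Python sketch (names are illustrative; the messages in the published protocol also carry a view number and a signature):

    import hashlib
    from dataclasses import dataclass

    def digest(msg: bytes) -> bytes:
        return hashlib.sha256(msg).digest()

    @dataclass(frozen=True)
    class PrePrepare:           # sent by the primary
        seq: int
        msg_digest: bytes
        msg: bytes              # the client request itself rides along

    @dataclass(frozen=True)
    class Prepare:              # sent by backup server i
        seq: int
        msg_digest: bytes
        sender: int             # i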

Be Prepared
- A non-primary server is prepared when
  - it has received the pre-prepare message
  - it has received matching prepare messages from 2f other servers
    - 2f+1 including itself
- It's prepared to believe the primary
  - both the content of the request and the request's sequence number
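That condition can be written as a small predicate. This is a sketch, assuming a hypothetical log object with a pre_prepares set keyed by (seq, digest) and a list of Prepare records as above:

    def is_prepared(log, seq, d, f, my_id) -> bool:
        """A backup is 'prepared' for (seq, d) once it has the pre-prepare
        plus matching prepares from 2f other servers (2f+1 with itself)."""
        if (seq, d) not in log.pre_prepares:
            return False
        matching = {p.sender for p in log.prepares
                    if p.seq == seq and p.msg_digest == d}
        matching.add(my_id)   # count our own prepare
        return len(matching) >= 2 * f + 1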

However
- There are multiple clients, each sending a sequence of requests
- Communication isn't perfect
  - messages may arrive out of order
- One server may be prepared while another is not
  - but it will be eventually
- A server may be prepared for request q but not for q-1

Commitment
- Server i multicasts a commit message to all others when it is prepared
  - commit: seq #; digest(msg); i
- A message is committed if it is prepared at f+1 non-faulty servers
  - how does an individual server know this?
    - it is prepared and has received 2f commits from others
- A server executes a message when the message is committed and all previous messages have been executed
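Continuing the sketch above (and again assuming hypothetical log and state helpers), the locally checkable commit rule and in-order execution look roughly like this:

    def committed_local(log, seq, d, f, my_id) -> bool:
        """Locally observable proxy for 'committed': prepared here,
        plus 2f matching commits from other servers."""
        if not is_prepared(log, seq, d, f, my_id):
            return False
        commits = {c.sender for c in log.commits
                   if c.seq == seq and c.msg_digest == d and c.sender != my_id}
        return len(commits) >= 2 * f

    def try_execute(log, state, f, my_id):
        # Execute strictly in sequence-number order; stall on any gap.
        while True:
            nxt = state.last_executed + 1
            d = log.digest_for(nxt)     # assumed lookup; None if unknown
            if d is None or not committed_local(log, nxt, d, f, my_id):
                return
            state.apply(log.request_for(nxt, d))
            state.last_executed = nxt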

Logging
- Each server maintains a log of
  - pre-prepares
  - prepares
  - commits

Checkpoints
- Checkpoint = state of a replica after all messages through a particular sequence number have been executed
- Log can be trimmed when all agree on the replicas' states
  - servers periodically exchange signed checkpoint messages
    - these contain a digest of the checkpoint
- Checkpoint messages from 2f+1 different servers constitute a proof of the checkpoint
- Log up to the checkpoint can be replaced with the checkpoint and its proof
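A sketch of the proof check and the resulting log trimming, using the same assumed log fields as earlier (the checkpoint message shape is hypothetical):

    def stable_checkpoint(checkpoint_msgs, seq, d, f) -> bool:
        """2f+1 matching signed checkpoint messages prove checkpoint (seq, d)."""
        signers = {m.sender for m in checkpoint_msgs
                   if m.seq == seq and m.state_digest == d}
        return len(signers) >= 2 * f + 1

    def trim_log(log, seq, proof):
        # Discard pre-prepare/prepare/commit entries at or below seq;
        # keep the checkpoint proof in their place.
        log.pre_prepares = {k for k in log.pre_prepares if k[0] > seq}
        log.prepares = [p for p in log.prepares if p.seq > seq]
        log.commits = [c for c in log.commits if c.seq > seq]
        log.stable_proof = proof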

Traitorous Primary
- Client sends a request
- No response from the primary
- Client re-sends the request to all servers
- Servers forward the request to the primary
- If still no response, then a new primary is needed

Views
- A particular primary server is in charge of a view v
- If the primary changes, the view changes to v+1
  - the primary for view v is server v mod S
    - S is the number of servers
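The rotation rule is literally one line (illustrative Python):

    def primary_for_view(v: int, S: int) -> int:
        # Round-robin primary assignment: view v is led by server v mod S.
        return v % S

    assert primary_for_view(0, 4) == 0   # view 0: server 0 leads
    assert primary_for_view(5, 4) == 1   # view 5: server 1 leads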

View Changes (1)
- Non-primaries that time out waiting for the primary send signed view-change messages
- These provide
  - the most recent checkpoint plus its proof
  - a list of prepared messages since the checkpoint, with proof: the pre-prepare plus prepare messages

View Changes (2)
- The new primary, after receiving 2f valid view-change messages, responds with a new-view message
- This provides
  - the set of view-change messages, i.e., proof of the view change
  - a list of pre-prepare messages for all prepared messages since the checkpoint
    - missing messages are nullified
- Non-primaries move to the new view and reprocess the prepared messages in this view
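One way the new primary's work could be sketched, building on the PrePrepare and digest definitions above. This is heavily simplified: the view-change message fields here are hypothetical, and the published protocol also compares views when several proofs exist for one sequence number.

    def build_new_view(v_new, view_changes, f):
        """Assemble a new-view message from 2f view-change messages;
        each is assumed to carry checkpoint_seq and a 'prepared' list
        of records with seq, msg_digest, and msg fields."""
        assert len(view_changes) >= 2 * f
        low = max(vc.checkpoint_seq for vc in view_changes)   # latest stable checkpoint
        high = max((p.seq for vc in view_changes for p in vc.prepared), default=low)
        pre_prepares = []
        for seq in range(low + 1, high + 1):
            proofs = [p for vc in view_changes for p in vc.prepared if p.seq == seq]
            if proofs:
                pre_prepares.append(PrePrepare(seq, proofs[0].msg_digest, proofs[0].msg))
            else:
                # Nullify a missing sequence number with a no-op request.
                pre_prepares.append(PrePrepare(seq, digest(b"null"), b"null"))
        return {"view": v_new, "proof": view_changes, "pre_prepares": pre_prepares}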

Liveness
- Timeout period for view changes grows exponentially
  - a server multicasts its view-change message after a T-second timeout
  - it then waits 2T seconds for the new-view message before timing out again
  - then 4T seconds before timing out again
  - then 8T seconds, and so on
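The doubling schedule as a one-line helper (illustrative):

    def view_change_timeout(T: float, attempt: int) -> float:
        # attempt 0 -> T, attempt 1 -> 2T, attempt 2 -> 4T, ...
        return T * (2 ** attempt)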

Performance
- BFS: Byzantine fault-tolerant NFS
  - replicated NFS servers
  - simplified implementation of NFS (NFSv2)
- Implementations tested
  - BFS: 4 servers
  - BFS-nr: one server, no replication
  - NFS-std: Digital Unix NFSv2

Andrew Benchmark
- phase 1: creates subdirectories recursively
- phase 2: copies a source-code tree
- phase 3: examines the status of all files without reading their data
- phase 4: reads all data bytes
- phase 5: compiles and links all files

BFS vs. BFS-nr
[figure: benchmark comparison of BFS and the unreplicated BFS-nr]

BFS vs. NFS
[figure: benchmark comparison of BFS and NFS-std]