Zyzzyva. Speculative Byzantine Fault Tolerance. Ramakrishna Kotla. L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin

Similar documents
Reducing the Costs of Large-Scale BFT Replication

Zyzzyva: Speculative Byzantine Fault Tolerance

Tolerating Latency in Replicated State Machines through Client Speculation

Byzantine fault tolerance. Jinyang Li With PBFT slides from Liskov

Robust BFT Protocols

Zyzzyva: Speculative Byzantine Fault Tolerance

Distributed Algorithms Practical Byzantine Fault Tolerance

Distributed Algorithms Practical Byzantine Fault Tolerance

AS distributed systems develop and grow in size,

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

Authenticated Agreement

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory

EECS 591 DISTRIBUTED SYSTEMS. Manos Kapritsos Fall 2018

Two New Protocols for Fault Tolerant Agreement

Practical Byzantine Fault

ZZ: Cheap Practical BFT using Virtualization

Revisiting Fast Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)

Zzyzx: Scalable Fault Tolerance through Byzantine Locking

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00)

International Journal of Advanced Research in Computer Science and Software Engineering

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Scrooge: Reducing the Costs of Fast Byzantine Replication in Presence of Unresponsive Replicas

Key-value store with eventual consistency without trusting individual nodes

Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement

Practical Byzantine Fault Tolerance

Tolerating latency in replicated state machines through client speculation

ByzID: Byzantine Fault Tolerance from Intrusion Detection

Tradeoffs in Byzantine-Fault-Tolerant State-Machine-Replication Protocol Design

Zzyzx: Scalable Fault Tolerance through Byzantine Locking

Prophecy: Using History for High Throughput Fault Tolerance

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin

SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines

Byzantine Fault-Tolerance with Commutative Commands

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18

ZZ and the Art of Practical BFT

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

or? Paxos: Fun Facts Quorum Quorum: Primary Copy vs. Majority Quorum: Primary Copy vs. Majority

A definition. Byzantine Generals Problem. Synchronous, Byzantine world

Byzantine Fault Tolerance for Distributed Systems

UpRight Cluster Services

Copyright by Ramakrishna Rao Kotla 2008

Byzantine Fault Tolerant Raft

Toward Intrusion Tolerant Clouds

PBFT: A Byzantine Renaissance. The Setup. What could possibly go wrong? The General Idea. Practical Byzantine Fault-Tolerance (CL99, CL00)

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

All about Eve: Execute-Verify Replication for Multi-Core Servers

All about Eve: Execute-Verify Replication for Multi-Core Servers

Why then another BFT protocol? Zyzzyva. Simplify, simplify. Simplify, simplify. Complex decision tree hampers BFT adoption. H.D. Thoreau. H.D.

CS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

Evaluating BFT Protocols for Spire

Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography

Paxos Replicated State Machines as the Basis of a High- Performance Data Store

Viewstamped Replication to Practical Byzantine Fault Tolerance. Pradipta De

Replication in Distributed Systems

Application-Aware Byzantine Fault Tolerance

ZZ and the Art of Practical BFT Execution

HP: Hybrid Paxos for WANs

Proactive Recovery in a Byzantine-Fault-Tolerant System

The Next 700 BFT Protocols

ZZ and the Art of Practical BFT Execution

Designing Distributed Systems using Approximate Synchrony in Data Center Networks

Resource-efficient Byzantine Fault Tolerance. Tobias Distler, Christian Cachin, and Rüdiger Kapitza

APPLICATION AWARE FOR BYZANTINE FAULT TOLERANCE

There Is More Consensus in Egalitarian Parliaments

TAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton

Machine Fault Tolerance for Reliable Datacenter Systems

Practical Byzantine Fault Tolerance Using Fewer than 3f+1 Active Replicas

ZZ and the Art of Practical BFT Execution

Enhancing Throughput of

HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance

Distributed Systems 11. Consensus. Paul Krzyzanowski

Fault Tolerance via the State Machine Replication Approach. Favian Contreras

BFTCloud: A Byzantine Fault Tolerance Framework for Voluntary-Resource Cloud Computing

Proactive Recovery in a Byzantine-Fault-Tolerant System

XFT: Practical Fault Tolerance beyond Crashes

Specula(ng Seriously. Rachid Guerraoui, EPFL

Practical Byzantine Fault Tolerance and Proactive Recovery

ABSTRACT. Web Service Atomic Transaction (WS-AT) is a standard used to implement distributed

When You Don t Trust Clients: Byzantine Proposer Fast Paxos

BFT Selection. Ali Shoker and Jean-Paul Bahsoun. University of Toulouse III, IRIT Lab. Toulouse, France

Replicated State Machine in Wide-area Networks

Group Replication: A Journey to the Group Communication Core. Alfranio Correia Principal Software Engineer

Exam Distributed Systems

XFT: Practical Fault Tolerance Beyond Crashes

Byzantine Fault Tolerance Can Be Fast

Trustworthy Coordination of Web Services Atomic Transactions

arxiv: v2 [cs.dc] 12 Sep 2017

Practical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li

Distributed Systems. replication Johan Montelius ID2201. Distributed Systems ID2201

Byzantine Fault Tolerance

Gnothi: Separating Data and Metadata for Efficient and Available Storage Replication

Towards Recoverable Hybrid Byzantine Consensus

Byzantine Fault Tolerance

Fast Byzantine Consensus

BAR Gossip. Lorenzo Alvisi UT Austin

Janus. Consolidating Concurrency Control and Consensus for Commits under Conflicts. Shuai Mu, Lamont Nelson, Wyatt Lloyd, Jinyang Li

A Byzantine Fault-Tolerant Key-Value Store for Safety-Critical Distributed Real-Time Systems

ByTAM: a Byzantine Fault Tolerant Adaptation Manager

Practical Byzantine Fault Tolerance. Castro and Liskov SOSP 99

Transcription:

Zyzzyva Speculative Byzantine Fault Tolerance Ramakrishna Kotla L. Alvisi, M. Dahlin, A. Clement, E. Wong University of Texas at Austin

The Goal Transform high-performance service into high-performance and reliable service Request Request Client Reply Server Client Replies Servers

BFT state machine replication BFT state-of-the-art Practical Byzantine Fault Tolerance [OSDI 99, OSDI 00] Generalized abstraction [SOSP 01] Reduced replication cost [SOSP 03] High Throughput [DSN 04] Applications: Farsite[OSDI 02], Oceanstore[FAST 03] Quorum based approaches: Q/U[SOSP 05], HQ[OSDI 06] Promising approach to build reliable systems

Why another BFT protocol? Yes High request contention? No PBFT[OSDI 99] Yes Low latency? No Yes Replication cost No PBFT[OSDI 99] < 5f+1? HQ [OSDI 06] QU[SOSP 05] HQ[OSDI 06] BFT state-of-the-art is too complex

Zyzzyva: Rethinks BFT state machine replication BFT? Yes Zyzzyva Outperform existing BFT approaches High performance: Comparable to unreplicated services Low overhead: Approaches lower bounds

Zyzzyva: Outline Rethink state machine replication Speculation: Avoiding explicit replica agreement Speculative BFT: Double edged sword Implementation and Optimizations Evaluation

State Machine Replication Service is replicated to tolerate failures Requirement: Applications observe centralized service How: Replicas execute requests in the same order Agreement: Replicas agree on the request order Execution: Replicas execute requests in agreed order

Traditional BFT state machine replication Request Reply Agreement Execution Replicas agree on the request order before executing Cost: Agreement protocol overhead

Zyzzyva: Speculative BFT Replication Reply Request Speculative execution Replicas execute requests without agreement Cost: No explicit replica agreement

Avoid explicit replica agreement Idea: Leverage clients to avoid explicit agreement Intuition: Output commit at the client Sufficient: Client knows that system is consistent Not required: Replicas know that they are consistent How: Client commits output only if system is consistent Applications observe centralized service

Zyzzyva: Outline Rethink state machine replication Speculation: Avoiding explicit replica agreement Speculative BFT: Double edged sword Implementation and Optimizations Evaluation

Speculative BFT: Leveraging client Idea: Leverage clients to avoid explicit agreement Intuition: Output commit at clients and not replicas Replicas need not know if system is consistent How: Client can verify if reply is stable Before committing a reply to the application Stable reply: Replicas are in consistent state

Speculative BFT: Request history Request history allows client to verify stable reply Replicas include request history in the replies Request history: Ordered set of requests executed Replies include application response and request history <R ik, H ik >: Reply from a replica i after executing request k

Stable: Unanimous reply R 1k =R 2k=..?, H 1k =H 2k=..? Done Request: R c Replies: <R 1k, H 1K > <R 4k, H 4k > <R c,k> Speculative execution Client commits the output when all replies match All correct replicas are in consistent state

Replies: Only majority match 2f+1? R c <R 1k,H 1K> <R c,k> X Speculative execution Majority of correct replicas share the same history Client receives at least 2f+1 matching replies

Stable reply with failures Client can make progress with additional work Sufficient: Majority of correct replicas can prove That they share request history to other replicas Intuition: Eventually all correct replicas agree Commit phase: Client deposits commit certificate Commit certificate consists of 2f+1 matching histories Client commits after receiving 2f+1 matching acks

Stable reply: Majority 2f+1 2f+1 Done R c <R 1k,H 1K> C:<H 1k,...H 3k > <R c,k> X Speculative execution Commit Client deposits commit certificate Client commits when it receives 2f+1 matching acks

Failures: Primary or Network < 2f+1? R c <R 1k,H 1K > <R 2k,H 2K > <R c,k> X Speculative execution Client receives fewer than 2f+1 matching replies View change: Client retransmissions act as hint

Zyzzyva: Speculative BFT Same consistency guarantees as traditional BFT Application observes centralized service Leverage clients to avoid explicit replica agreement Significantly lower overhead

Zyzzyva: Outline Rethink state machine replication Speculation: Avoiding explicit replica agreement Speculative BFT: Double edged sword? Implementation and Optimizations Evaluation

Can a faulty client block? By not depositing the commit certificate Faulty clients cannot block other correct clients Liveness: Correct clients ensure system progress Protocol uses cumulative request histories Correct clients commit all previous requests as well Faulty client can only affect its own progress

Can a faulty client compromise safety? By committing inconsistent history? Faulty clients cannot compromise safety Faulty clients cannot deposit inconsistent histories Safety: Faulty clients cannot forge request histories No two valid commit certificates can have varying prefixes

Zyzzyva: Outline Rethink state machine replication Speculation: Avoiding explicit replica agreement Speculative BFT: Double edged sword Implementation and Optimizations Evaluation

Implementation details Checkpoint protocol: Garbage collect histories View change protocol: Elect new primary Optimizations Replace digital signatures with MACs Application state is replicated at only 2f+1 replicas Request batching

Optimization: Making faulty case faster Done 2f+1 4f+1 Commit R c X Zyzzyva5: Speeds up using 5f+1 replicas Completes in a single phase with f faulty replicas

Zyzzyva: Outline Rethink state machine replication Speculation: Avoiding explicit replica agreement Speculative BFT: Double edged sword Implementation and Optimizations Evaluation

Evaluation setup Zyzzyva replication library Compare with other protocols PBFT[OSDI 99], QU[SOSP 05], HQ[OSDI 06], Unreplicated Client-server workload Different request/reply payloads Configuration: Tolerate 1 faulty node in the system 20 Machines: 3.0 GHz running Linux 2.6 Kernel LAN: 1 Gbps ethernet links

Throughput 140 Unreplicated Throughput (Kops/sec) 120 100 80 60 40 20 PBFT Zyzzyva Zyzzyva5 Q/U max throughput HQ 0 20 40 60 80 100 Number of clients Speculation improves throughput significantly

Throughput Throughput (Kops/sec) 140 120 100 80 60 40 20 PBFT Zyzzyva Unreplicated Zyzzyva (B=10) Zyzzyva5 (B=10) PBFT (B=10) Zyzzyva5 Q/U max throughput HQ 0 20 40 60 80 100 Number of clients Speculation improves throughput significantly Zyzzyva within 35% of unreplicated service

Throughput: With a faulty backup node 100 Throughput (Kops/sec) 80 60 40 20 PBFT Zyzzyva5 (B=10) PBFT (B=10) Zyzzyva (B=10) Zyzzyva5 Zyzzyva HQ 0 20 40 60 80 100 Number of clients Zyzzyva provides excellent performance

Latency Zyzzyva Q/U Replication cost App replicas Latency (Updates) Message delays 2f+1 5f+1 3 2 Q/U: Quorum based optimistic approach Latency: 4 or more with request contention

Latency: Best case for Q/U Latency (micro sec) 800 600 400 200 Unreplicated Q/U Zyzzyva Zyzzyva5 PBFT HQ 0 Not significant: Q/U is 15% better than Zyzzyva5 No request/reply payloads, no contention, update Zyzzyva outperforms Q/U: contention, reads, load

Zyzzyva approaches optimal Optimal Zyzzyva Replication cost Total replicas Replication cost App. replicas Throughput Overhead: Crypto. ops Latency Message delays 3f+1 3f+1 2f+1 2f+1 2 2+3f/b 3 3 Throughput: Zyzzyva exploits batching Overhead reduces with increasing batch size

Conclusion Transform high-performance service to high-performance and reliable service Zyzzyva: Speculative BFT Performance comparable to unreplicated service

Thank you! Acknowledgements: Hewlett-Packard - Travel grant NSF research grants

BACKUP SLIDES

According to dictionary.com, a zyzzyva is any of various tropical American weevils of the genus Zyzzyva, often destructive to plants.

Throughput: With a faulty backup node Throughput (Kops/sec) 100 Zyzzyva (B= 10) with commit optimization 80 Zyzzyva5 (B=10) PBFT (B=10) 60 Zyzzyva (B=10) Zyzzyva5 40 Zyzzyva 20 PBFT HQ 0 20 40 60 80 100 Number of clients Failures: Zyzzyva outperforms other protocols Zyzzyva5: 2+(5f+1)/b Zyzzyva(with opt): 2+(5f+1)/b