A Comparison. of the Byzantin~ Agreement Problem and the Transaction Commit Problem

Similar documents
Distributed Systems Fault Tolerance

BYZANTINE GENERALS BYZANTINE GENERALS (1) A fable: Michał Szychowiak, 2002 Dependability of Distributed Systems (Byzantine agreement)

Consensus on Transaction Commit

BYZANTINE AGREEMENT CH / $ IEEE. by H. R. Strong and D. Dolev. IBM Research Laboratory, K55/281 San Jose, CA 95193

CSE 5306 Distributed Systems. Fault Tolerance

Fault Tolerance. Distributed Software Systems. Definitions

A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment

CSE 5306 Distributed Systems

Cheap Paxos. Leslie Lamport and Mike Massa. Appeared in The International Conference on Dependable Systems and Networks (DSN 2004)

Tradeoffs in Byzantine-Fault-Tolerant State-Machine-Replication Protocol Design

Today: Fault Tolerance

Causal Consistency and Two-Phase Commit

Consensus in Distributed Systems. Jeff Chase Duke University

Adapting Commit Protocols for Large-Scale and Dynamic Distributed Applications

Chapter 8 Fault Tolerance

Today: Fault Tolerance. Fault Tolerance

Distributed Deadlock

Today: Fault Tolerance. Failure Masking by Redundancy

Chapter 8 Fault Tolerance

PRIMARY-BACKUP REPLICATION

Initial Assumptions. Modern Distributed Computing. Network Topology. Initial Input

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems IT332

2-PHASE COMMIT PROTOCOL

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance

Dep. Systems Requirements

Fault Tolerance. Basic Concepts

Fault Tolerance Part I. CS403/534 Distributed Systems Erkay Savas Sabanci University

TWO-PHASE COMMIT ATTRIBUTION 5/11/2018. George Porter May 9 and 11, 2018

21. Distributed Algorithms

Consensus and agreement algorithms

Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit.

Practical Byzantine Fault Tolerance

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure

Name: 1. CS372H: Spring 2009 Final Exam

Today: Fault Tolerance. Replica Management

An optimal novel Byzantine agreement protocol (ONBAP) for heterogeneous distributed database processing systems

International Journal of Advanced Research in Computer Science and Software Engineering

CHAPTER AGREEMENT ~ROTOCOLS 8.1 INTRODUCTION

Failure Tolerance. Distributed Systems Santa Clara University

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Fault-Tolerant Distributed Consensus

Consensus and related problems

Distributed Systems Consensus

Dfinity Consensus, Explored

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Fault-Tolerance & Paxos

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

Byzantine Generals in Action" Implementing Fail-Stop Processors

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus

Implementing Shared Registers in Asynchronous Message-Passing Systems, 1995; Attiya, Bar-Noy, Dolev

The objective. Atomic Commit. The setup. Model. Preserve data consistency for distributed transactions in the presence of failures

Practical Byzantine Fault

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory

CS505: Distributed Systems

Capacity of Byzantine Agreement: Complete Characterization of Four-Node Networks

Fast Paxos. Leslie Lamport

Issues in Programming Language Design for Embedded RT Systems

Consensus Problem. Pradipta De

FAULT TOLERANCE. Fault Tolerant Systems. Faults Faults (cont d)

Practical Byzantine Fault Tolerance Using Fewer than 3f+1 Active Replicas

The Long March of BFT. Weird Things Happen in Distributed Systems. A specter is haunting the system s community... A hierarchy of failure models

Distributed Systems 11. Consensus. Paul Krzyzanowski

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Fork Sequential Consistency is Blocking

Consensus in the Presence of Partial Synchrony

The algorithms we describe here are more robust require that prior to applying the algorithm the and comprehensive than the other distributed particip

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

An Anonymous Self-Stabilizing Algorithm For 1-Maximal Matching in Trees

Fork Sequential Consistency is Blocking

Recovering from a Crash. Three-Phase Commit

Specifying and Proving Broadcast Properties with TLA

Failures, Elections, and Raft

Robust BFT Protocols

Artificial Neural Network Based Byzantine Agreement Protocol

arxiv:cs/ v3 [cs.dc] 1 Aug 2007

Fault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit

Distributed Systems Principles and Paradigms

Practical Byzantine Fault Tolerance and Proactive Recovery

Introduction to Distributed Systems Seif Haridi

Ruminations on Domain-Based Reliable Broadcast

Yale University Department of Computer Science

CMSC 858F: Algorithmic Game Theory Fall 2010 Achieving Byzantine Agreement and Broadcast against Rational Adversaries

Detectable Byzantine Agreement Secure Against Faulty Majorities

Research Statement. Yehuda Lindell. Dept. of Computer Science Bar-Ilan University, Israel.

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm

Distributed Commit in Asynchronous Systems

Distributed Encryption and Decryption Algorithms

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

ROUND COMPLEXITY LOWER BOUND OF ISC PROTOCOL IN THE PARALLELIZABLE MODEL. Huijing Gong CMSC 858F

Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography

Erez Petrank. Department of Computer Science. Haifa, Israel. Abstract

Distributed Systems (ICE 601) Fault Tolerance

Distributed Algorithms Benoît Garbinato

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

Transcription:

1'TANDEMCOMPUTERS A Comparison. of the Byzantin~ Agreement Problem and the Transaction Commit Problem Jim Gray Technical Report 88.6 May 1988. Part Number 15274

...

A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem Jim Gray Technical Report 88.6 Tandem Part No. 15274 May 1988

A COMPARISON OF THE BYZANTINE AGREEMENT PROBLEM AND THE TRANSACTION COMMIT PROBLEM Jim Gray. May 1988 Tandem Technical Report 88.6 Abstract: Transaction commit and Byzantine agreement solve the problem of multiple processes reaching agreement in the presence of process and message failures. This paper summarizes the computation and fault models of the two kinds of agreement and shows the difference between them. In particular, it explains that Byzantine agreement is rarely used in practice because it involves significantly more hardware and messages, yet does not give predictable behavior if there are more than a few errors. Table of Contents Introduction 1 The Transaction Commit Problem 1 The Byzantine Generals Problem 5 Comparing the, Problems 7 Acknowledgments 9 References 9

INTRODUCTION The Workshop on Fault Tolerant Distributed Computing met at Asilomar, California on March 16 19, 1986. It brought together practitioners and theorists. The theorists seemed primarily concerned with variations of the Byzantine Generals problem. The practitioners seemed primarily concerned with techniques for building fault tolerant and distributed systems. Prior to the conference, it was widely believed that the transaction commit problem faced by distributed systems is a degenerate form ofthe Byzantine Generals Problem studied by academe. Perhaps the most useful consequence of the conference was to show that these two problems have little in common. THE TRANSACTION COMMIT PROBLEM The transaction commit problem was first solved in 1971. Niko Garzado first noticed and solved it while working for ffim on a distributed system for the Italian Social Security Department. The problem was folklore for several years. Five years later, descriptions and solutions began to appear in the open literature [Grayl], [Lampson], [Rosenkrantz]. Solutions to the commit problem make a collection of actions atomic -- either all happen or none happens. Atomicity is easy ifall goes well, but the commit problem requires atomicity even if there are failures. Today, commit algorithms are a key element ofmost transaction processing systems. Maintenance ofreplicated objects (all copies must be the same) and maintenance ofconsistency within a system (a mail message must arrive at one node ifit leaves a second one) require atomic agreement among computers. To state the commit problem more precisely, one needs a model ofcomputation and offailures. Lampson and Sturgis formulated a simple and elegant model ofcomputation and failures [Lampson]. This model is now widely embraced by practitioners. It is impossible to do the model justice here; their paper is well worth reading. In outline: 1

The Lampson-Sturgis computation model consists of: storage which may be read and written. processes which execute three kinds ofactions: change state send or receive a message read or write storage Processes run at arbitrary speed, but eventually make progress. The fault model postulates that: Storage writes may fail or corrupt another piece of storage. Such faults are rare, but when they happen, a subsequent read can detect the fault. Storage may spontaneously decay, but such faults are rare. In particular, a pair of storage units will not both decay within the repair time for one storage unit. When decayed storage is read, the reader can detect the corruption. These are the fault assumptions ofduplexed discs. Processes may lose state and be reset to a null state, but such faults are rare and detectable. Messages may be delayed, corrupted or lost, but such faults are rare. Corrupted messages are detectably corrupted. Based on these computation and fault models, Lampson and Sturgis showed how to build singlefault tolerant stable storage which does not decay (duplexed discs), and stable processes which do not reset (process pairs). This single-fault tolerance is based on the assumptions ofrare errors and eventual progress (Mean Time To Repair is orders ofmagnitude less than Mean Time To Fai1url~). The cost model implicit in this computation model is: The computation cost is storage accesses plus messages. The delay is serialized storage accesses plus serialized messages. Good algorithms have low cost and low delay in the average case. In addition they tolerate arbitrarily many lost messages, process resets, and storage decays. 2

Given this computation and fault model the commit problem can be stated as: THE COMMIT PROBLEM: Given N stable processes each with the state diagram: then the commit problem is to find an algorithm which forces ALL processes to the COMMITTED state or ALL to the ABORTED state depending on the input to the algorithm The commit problem is easy to solve; one just sends the decision to each process, and keeps resending it until the process changes state and acknowledges. Because messages may be lost and processes may run slowly, there is no limit on how long the algorithm will take. But, eventually all processes will agree and the algorithm will terminate. The expected cost is 2N messages plus N stable (i.e. single-fault tolerant) writes. The expected delay is two message delays plus one stable write delay. There is a more general version of the commit problem called the two-phase commit problem. It allows an ACTIVE process to unilaterally abort its part ofthe transaction and consequently abort the whole transaction. After entering the PREPARED state an active process abdicates its right to unilateral abort. By contrast, the one-phase problem does not allow any of the processes to unilaterally abort. This unilateral abort is very convenient; the CANCEL keys ofmost user interfaces are examples of unilateral abort. 3

TWO-PHASE COMMIT PROBLEM: Given N stable processes each with the state diagram above, find an algorithmwhich forces ALL the processes to the COMMITIED stateorall to the ABORTED state depending on the input to the algorithm and whether any process has already spontaneously made the ACTIVE to ABORTED transition. Note that once the process is PREPARED, it cannot make a unilateral decision. Algorithms for two-phase commit are fancier and costlier than the one-phase algorithms described first. Needless to say, three-phase algorithms have been invented and implemented, but it has stopped there. All ofthe commit problems have the following properties: All processes are assumed to correctly follow the state machine. There may be arbitrarily many storage failures, process failures, and message failures. Eventually, all processes agree. 4

THE BYZANTINE GENERALS PROBLEM The Byzantine Generals Problem grew out ofattempts to build a fault-tolerant computer to control an unstable aircraft. The goal was to find a design that could tolerate arbitrary faults in the computer. Since, the designers assumed that they could verify the computer software, the only real problem was faulty hardware. Faulty computer hardware can execute even correct programs in crazy ways. So, the designers postulated that most computers functioned correctly, but some functioned in the most malicious way conceivable. Thus, the Byzantine Generals Problem was formulated [Pease], [Lamport]. THE BYZANTINE GENERALS PROBLEM: Assume that there are N generals. Some of them are good and some are faulty. They cancommunicate only via messages. The bad ones can forge messages, delay messages sent via them, send conflicting or contradictory messages, and masquerade as other generals. Ifa message from a "good" general is lost or damaged, then the good general is treated as a bad one. An algorithm solves the Byzantine Generals Problem ifit gets all the good generals to agree within a bounded time. The computation model ofthe Byzantine Generals problem is similar to the Lampson-Sturgis model for processes and messages. The Byzantine fault model is quite different from the Lampson-Sturgis fault model. The Byzantine Generals Problem assumes that processors and messages may fail undetectably, even maliciously. The Lampson-Sturgis model assumes processes execute correctly or detectably fail and that messages are delivered, detectably corrupted, or lost. Forgeries or undetected corruption are defmed as "impossible" (Le. very rare squared) by Lampson and Sturgis; actually, they just define such an event as a catastrophe. The gist ofthe theory on solutions to the Byzantine Generals Problem is: Ifat least 1/3 ofthe generals are bad, then the good generals cannot reliably agree [Pease]. Iffewer than 1/3 ofthe generals are bad, then there are many algorithms. 5

COMPARING THE PROBLEMS The Commit and the Byzantine Generals problems are identical in the fault-free case. In that case all the participants must agree. In the typical case (no faults) Commit algorithms send many fewer messages than the Byzantine Generals algorithms. Fundamental differences between the problems and their solutions emerge when there are faults. The basic differences are: Commit protocols tolerate many faults. Byzantine protocols tolerate at most NI3 faults. ALL processors agree in the commit case. SOME processors agree in the Byzantine case. Commit algorithms are fail-fast. They give either a common answer or no answer. Byzantine algorithms give random answers without warning ifthe fault threshold is exceeded. Commit agreement may require unbounded time. Byzantine agreement terminates in bounded time. Commit algorithms require no extra processors and few extra messages. Byzantine algorithms require many messages and processors. The following examples show that neither of these problems is especially realistic 6

Byzantine ATMs: Consider an Automated Teller Machine (ATM) doing a debit of a bank account. Three computers storing the account plus the ATM give the requisite four processors needed for Byzantine agreement. Ifthe phone line to the ATM fails, then the three computers quickly agree to debit the account but the ATM refuses to dispense the money. This is Byzantine agreement. Commit ATMs: Ifthe same ATM is controlled by a single computer and they obey a commit protocol, then they will eventually agree (debit plus dispense or no debit and no dispense). But the customer may have to wait at the ATM for many days before the faulty phone line is fixed and the money is dispensed. Most "real" systems focus on single-fault tolerance because single faults are rare, double faults are very rare (rare squared). Commit algorithms are geared to this single-fault view. They give single-fault tolerance by duplexing fail-fast modules [Gray2]. Byzantine Algorithms require at least four-plexed modules to tolerate a single fault. I believe that this high processor cost and a related high message cost is the reason that no practitioner has yet: implemented Byzantine algorithms. Look at the two pictures above and judge for yourself. The Lampson-Sturgis model distinguishes between message failure and process failure because long-haul messages typically are corrupted once an hour while processors typically fail once a 7

month. This is an important practical distinction -- especially since Byzantine messages outnumber processors by a polynomial. Amazingly, the Byzantine fault model typically equates message failures with process failures. As the number of nodes grows, the number ofmessage failures grows polynomially and produces a system much less reliable than a single processor. Indeed all Byzantine agreement systems have mean time to failure much below a single processor system [Tay], [Babaoglu]. Commit algorithms are fail-fast while Byzantine algorithms give an answer in any case. Each nonfaulty processor executing a Byzantine Generals algorithm gives an answer within a fixed time _.. -N3 message delays for a typical algorithm. Ifthere are few faults, then all the non-faulty processors will give the same answer. Ifat least N/3 processors are faulty orifat least N/3 messages are damaged, then two "correct" processors may give different answers. By contrast, commit algorithms eventually get all the processes to give the same answer. This may take a very long time and many messages. No one wants to wait forever for the right answer. Unfortunately, there is no solution to this dilemma Ifall processors must agree, they must be prepared to wait an unbounded time. The salient properties ofthe two problems are listed in the chart below. It shows that there is little overlap between the two problems. DEGREE OF AGREEMENT FAULT SOME AIL TOLERANCE & AGREE AGREE SPEED LIMITED TIME/ERRORS BYZANTINE? UNLIMITED TIME/ERRORS? CO~ Based on this comparison between the two problems, practitioners embraced the Commit problem over the Byzantine Generals problem because it has an efficient solution to the single-fault case, gives correct answers in the multi-fault case, and has good no-fault performance. 8

ACKN.OWLEDGMENTS Andrea Borr, Phill Garrett, Fritz Graf, Pat Helland, Pete Homan, Fritz Joem and Barbara Simons made valuable comments on this paper. REFERENCES [Babaoglu] Babaoglu, 0., "on The Reliability of Consensus Based Fault Tolerant Distributed Computer Systems", ACM TOCS, V. 5.4,1987. [Gray1] Gray, J., "Notes on Database Operating Systems", Operating Systems, An Advanced Course, Lecture Notes in Computer Science, V. 60, Springer Verlag, 1978. [Gray2] Gray, J., "Why Do Computers Stop, and What Can We Do About It?", Tandem TR: 85.7, Tandem Computers, 1985. [Lamport] Lamport, L., Shostak, R., Pease, M., "The Byzantine Generals Problem", ACM TPLS, V. 4.3, 1982. [Lampson] Lampson, B.W., Sturgis, H., "Atomic Transactions", Distributed Systems Architecture and Implementation; An Advanced Course, Lecture Notes in Computer Science, V. 105, Springer Verlag, 1981. [Pease] Pease, M., Shostak, R., Lamport, L., "Reaching Agreement in the Presence of Faults", ACM Journal, V. 27.2, 1980. [Rosenkrantz] Rosenkrantz, D.J., R.D. Stearns, P.M. Lewis, "System Level Concurrency Control for Database Systems", ACM TODS, V. 3.2, 1977. [Tay] Tay, Y.C., "The Reliability of (k,n)- Resilient Distributed Systems", Proc. 4th Symposium in Distributed Software and Database Systems, IEEE, 1984. 9

Distributed by..,tandemcomputers Corporate Information Center 19333 Vallco Parkway MS3-07 Cupertino, CA 95014-2599