Distributed Algorithms Introduction
Alberto Montresor, University of Trento, Italy. 2016/04/26.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Contents
1. Getting Started: Two generals; Common Knowledge; Byzantine generals; Summary
2. Themes of the course: Impossible vs practical; Classical vs extreme distributed systems; Syllabus
3. Distributed Systems: Informal definition; Challenges
4. Conclusions
3 Getting Started Two generals Two generals A thought experiment: Alberto Montresor (UniTN) DS - Introduction 2016/04/26 1 / 35
Two generals: a potential solution
General A: attack at dawn!
General B: ack, attack at dawn!
General A: ack ack, attack at dawn!
General B: ack ack ack, attack at dawn!
...
Theorem. Under this scenario, there is no solution to the Two Generals Problem.
Proof. By contradiction on the number of messages exchanged.
Proof details. Assume, by contradiction, that at least one protocol solves the problem under this scenario. Among all such protocols, pick one that uses the minimum number of messages. Consider the last message of this protocol: it may be received or it may be lost, and the protocol must work in both cases. So the last message could be omitted altogether. The resulting protocol still solves the problem with fewer messages than the minimum, which is a contradiction.
Two generals: reality check (atomic commit)
An atomic commit is an operation in which a set of distinct changes is applied as a single operation.
Example: ATM withdrawal. You withdraw 100 euro from an ATM in Trento; your balance should be decreased by 100 euro.
Two generals: a pragmatic solution
Probabilistic protocol, assuming messengers are caught independently of each other with probability p.
General A: sends n messengers and attacks no matter what.
General B: attacks if at least one messenger arrives.
p^n is the probability that the attack will be uncoordinated.
Trade-off: we can decrease the probability of failure by increasing n, but at the additional cost of sending more messengers, and without ever being certain that the attack will be coordinated.
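The trade-off in the probabilistic protocol can be made concrete with a small calculation. The following is a minimal Python sketch, assuming independent captures with probability p as stated on the slide; the function names are illustrative and do not come from the lecture.

```python
def uncoordinated_probability(p, n):
    """Probability that all n messengers are caught, so that B never attacks."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return p ** n


def messengers_needed(p, target):
    """Smallest n such that p**n <= target (hypothetical helper)."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < p < 1 and 0 < target < 1")
    n = 1
    # Each extra messenger multiplies the failure probability by p.
    while p ** n > target:
        n += 1
    return n
```

For example, if half of the messengers are caught, `messengers_needed(0.5, 0.001)` asks how many messengers General A must send to push the risk of an uncoordinated attack to at most one in a thousand. The risk shrinks geometrically but never reaches zero.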
Common knowledge: muddy children
n children go playing. The children are truthful, perceptive, and intelligent.
Mom says: "Don't get muddy!"
A bunch of them (say, k) get mud on their foreheads.
Daddy comes, looks around, and says: "Some of you have a muddy forehead."
Daddy repeatedly asks: "Do you know whether you have a muddy forehead?"
What happens?
(Slides from Lorenzo Alvisi)
Muddy children: theorem
Theorem. The first k - 1 times Daddy asks, the children all say "No". The k-th time, the k muddy children say "Yes".
Proof. By induction on k.
Proof details.
Let k = 1. The first time Daddy asks, the child with mud on his forehead says yes: all the others have no mud, and someone has mud on his forehead, so it must be him.
Let k > 1. Every child with mud sees k - 1 children with mud on their foreheads. If there were only k - 1 children with mud, they would have said yes the (k - 1)-th time Daddy asked, but they didn't. So there are actually k children with mud, and they all say yes the k-th time Daddy asks.
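The induction can be replayed as a short simulation. This is a sketch under one modeling assumption that is not spelled out in the slides: after Daddy's announcement it is common knowledge that at least one child is muddy, and every round in which everybody answers "No" raises that commonly known lower bound by one. The function name and return shape are illustrative.

```python
def muddy_children(muddy):
    """Return (round in which the first 'Yes' is heard, list of answers).

    `muddy` is a list of booleans, one per child. answers[i] is True when
    child i can tell in that round that its own forehead is muddy.
    """
    k = sum(muddy)
    if k == 0:
        raise ValueError("Daddy's announcement requires at least one muddy child")
    known_lower_bound = 1  # established by Daddy's public announcement
    round_no = 0
    while True:
        round_no += 1
        answers = []
        for is_muddy in muddy:
            # A child sees every forehead except its own.
            seen = k - 1 if is_muddy else k
            # Seeing fewer muddy foreheads than the known bound means
            # the child itself must account for the missing one.
            answers.append(seen < known_lower_bound)
        if any(answers):
            return round_no, answers
        known_lower_bound += 1  # a round of all 'No' is itself informative
```

With three muddy children out of four, `muddy_children([True, False, True, True])` reports the first "Yes" in round 3, given by exactly the muddy children, matching the theorem.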
Muddy children: variation 1
Suppose k > 1. Everyone knows that someone has a dirty forehead before Dad announces it. Does Dad still need to speak up?
Let p = "someone's forehead is dirty". Everyone knows p.
But, unless the father speaks, if k = 2 not everyone knows that everyone knows p. Suppose A and B are dirty: before the father speaks, A does not know whether B knows p.
If k = 3, not everyone knows that everyone knows that everyone knows p, and so on.
Muddy children: variations 2 and 3
Variation 2: ... the father took every child aside and told them individually (without the others noticing) that someone's forehead is muddy?
Variation 3: ... every child had (unknown to the other children) put a miniature microphone on every other child, so they can hear what the father says to them in private?
Common knowledge: two generals, reloaded
There is an entire logic that formalizes what knowledge participants acquire while running a protocol: J. Halpern and Y. Moses, Knowledge and Common Knowledge in a Distributed Environment (E.W. Dijkstra Prize).
Solving the Two Generals Problem requires common knowledge: everyone knows that everyone knows that everyone knows...
But common knowledge cannot be achieved by communicating through unreliable channels.
Common knowledge: a puzzle
Albert and Bernard have just become friends with Cheryl, and they want to know when her birthday is. Cheryl gives them a list of 10 dates:
May 15, 16, 19
June 17, 18
July 14, 16
August 14, 15, 17
Cheryl then tells Albert the month and Bernard the day of her birthday, separately.
Albert: I don't know when Cheryl's birthday is, but I know that Bernard does not know either.
Bernard: At first I didn't know when Cheryl's birthday is, but I know now.
Albert: Then I also know when Cheryl's birthday is.
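Each statement in the puzzle is a public announcement that lets everyone discard candidate dates. The following Python sketch encodes the three statements as successive filters over the list of dates; it is one possible encoding of the reasoning, and the helper names are illustrative.

```python
from collections import Counter

DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]


def unique_by(dates, index):
    """Keep the dates whose field `index` (0 = month, 1 = day) occurs once."""
    counts = Counter(date[index] for date in dates)
    return [date for date in dates if counts[date[index]] == 1]


def solve(dates):
    # Statement 1: Albert (who knows the month) does not know the date, and
    # is sure Bernard (who knows the day) does not know it either. So the
    # month has several dates and contains no day that is unique overall.
    month_counts = Counter(month for month, _ in dates)
    revealing_months = {month for month, _ in unique_by(dates, 1)}
    after_albert = [
        (month, day) for month, day in dates
        if month_counts[month] > 1 and month not in revealing_months
    ]
    # Statement 2: Bernard now knows, so the day is unique among what is left.
    after_bernard = unique_by(after_albert, 1)
    # Statement 3: Albert now knows too, so the month is unique as well.
    return unique_by(after_bernard, 0)
```

Calling `solve(DATES)` leaves a single candidate, July 16.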
Byzantine generals
Scenario: n Byzantine generals are encircling a city. They must decide whether to attack or retreat. Messengers are reliable and synchronous. Generals may be traitors, and nobody knows which generals are traitors.
Problem specification: the generals require an algorithm to reach an agreement such that (i) all loyal generals decide on the same plan of action and (ii) a small number of traitorous generals cannot cause the loyal generals to adopt different plans.
Byzantine generals: a potential solution
Wait for a majority of generals to agree. WRONG!
Possible scenario with 3 generals: one loyal general votes attack, one loyal general votes retreat, and one traitorous general sends an attack vote to the attacking general and a retreat vote to the retreating general.
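The counterexample can be traced in a few lines of Python. This sketch hard-codes the three-general scenario from the slide; the names A and B for the loyal generals and the dictionary layout are assumptions made for illustration.

```python
from collections import Counter


def majority(votes):
    """Return the most frequent vote (three votes over two values cannot tie)."""
    return Counter(votes).most_common(1)[0][0]


def three_generals():
    # Honest votes of the two loyal generals.
    own_vote = {"A": "attack", "B": "retreat"}
    # The traitor tells each loyal general what that general wants to hear.
    traitor_vote_to = {"A": "attack", "B": "retreat"}
    decisions = {}
    for general, other in (("A", "B"), ("B", "A")):
        received = [own_vote[general], own_vote[other], traitor_vote_to[general]]
        decisions[general] = majority(received)
    return decisions
```

`three_generals()` shows each loyal general observing a two-to-one majority for a different plan, so the loyal generals end up disagreeing.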
Byzantine generals: the problem is solvable
Byzantine Fault Tolerance (1982): L. Lamport, R. Shostak, M. Pease, The Byzantine Generals Problem, ACM Trans. on Programming Languages and Systems, 4(3), 1982. A protocol that, given n processes, can tolerate up to t traitorous generals with n ≥ 3t + 1. Example: 4 generals can tolerate up to 1 Byzantine general.
Practical Byzantine Fault Tolerance (2002): M. Castro and B. Liskov, Practical Byzantine Fault Tolerance and Proactive Recovery, ACM Trans. on Computer Systems, 20(4), 2002. PBFT triggered a renaissance in BFT replication research, which is still going on.
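The bound n ≥ 3t + 1 can be read in both directions: how many traitors a given group tolerates, and how many generals are needed for a given number of traitors. A small Python sketch of that arithmetic follows; the function names are illustrative.

```python
def max_traitors(n):
    """Largest t satisfying n >= 3t + 1."""
    if n < 1:
        raise ValueError("need at least one general")
    return (n - 1) // 3


def min_generals(t):
    """Smallest n satisfying n >= 3t + 1."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 3 * t + 1
```

`max_traitors(4)` gives 1, the example from the slide, while `max_traitors(3)` gives 0, which is consistent with the three-general scenario having no solution.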
Byzantine generals: reality check
BFT sponsors in 1982: NASA, the Ballistic Missile Defense System Command, the Army Research Office.
Nancy Lynch's book on Distributed Systems: "The agreement problem is a simplified version of a problem that originally arose in the development of on-board aircraft control systems."
Bitcoin, a peer-to-peer digital currency system, is based on BFT.
The 8-hour downtime of Amazon S3 in July 2008 is a well-known example of what happens when you don't use BFT.
Summary: take-home lessons
We need to properly model our distributed systems: reliable / unreliable communication, benign / malicious processes.
Solutions depend on the underlying model: approximate or probabilistic solutions, bounded solutions, or no solution at all!
Coordinating multiple processes is difficult: unexpected events (failures, malicious behavior) and lack of common knowledge.
Themes of the course: theory vs practice
Yogi Berra says: "In theory, theory and practice are the same. In practice, they are not."
More Yogi Berra:
"Always go to other people's funerals, otherwise they won't go to yours."
"I really didn't say everything I said."
"Nobody goes there anymore; it's too crowded."
First theme of the course: impossible vs practical
Several papers about impossibility results:
M. Fischer, N. Lynch, M. Paterson. Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 32(2).
S. Gilbert, N. Lynch. Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. ACM SIGACT News, 33(2):51-59.
Yet, many of these problems have practical solutions:
T. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2).
L. Lamport. Paxos made simple. ACM SIGACT News, 32(4):18-25.
A general tension in Computer Science:
M. Vardi. Solving the Unsolvable. Comm. of the ACM, 54(7):5.
Beyond technology: the Spanish Flu pandemic
Killed 20M people in a relatively short time, more than World War I.
Virus goal: spread itself as quickly as possible.
Unreliable environment: viruses may be killed, transmission may fail.
The transmission network is a complex graph.
Beyond technology: flocks of birds
Flying in a flock is good: the probability of being killed by a predator is reduced.
Flying in a flock is bad: the probability of finding (enough) food is reduced.
Birds self-organize into a flock, with no central authority.
Second theme of the course: classical vs extreme distributed systems
Classical distributed system problems include agreement, total order broadcast, atomic commit, replication, etc.
Extreme distributed system problems include self-* properties, scalability, full decentralization, etc.
From the special issue of Springer Computing (Sept. 2012), edited by Alberto Montresor, Gusz Eiben, and Maarten van Steen: "Modern distributed systems may nowadays consist of hundreds of thousands of computers, ranging from high-end powerful machines to low-end resource-constrained wireless devices. We label them as extreme distributed systems, as they push scalability and complexity well beyond traditional scenarios."
Syllabus: topics
Introduction; Models; Time, clocks, events; Reliable broadcast; Epidemic protocols; Impossibility of consensus; Consensus and failure detectors; Complex networks; Replication; P2P; Epidemics: beyond dissemination; Atomic Commitment; Rollback and Recovery; Group Communication; Paxos; Practical Byzantine Fault Tolerance; Blockchains; Byzantine, altruistic, rational model; Distributed frameworks.
What is a distributed system?
Definition (pragmatic): a collection of independent, autonomous hosts connected through a communication network. Hosts communicate via message passing to achieve some form of cooperation.
Definition (by negation): a parallel system where there is no shared global clock, no shared memory, and no accurate failure detection.
What is a distributed system? (continued)
Definition (optimistic): "A collection of independent computers that appears to its users as a single coherent system" [Andrew Tanenbaum]
Definition (pessimistic): "A distributed system is one in which the failure of a computer you did not even know existed can render your own computer unusable" [Leslie Lamport]
Motivation: inherent distribution
Applications that require sharing of resources or dissemination of information among geographically distant entities are natural distributed systems.
Examples: a bank with several branches; distributed file systems, shared devices, etc.
Motivation: distribution as an artifact
Distribution may be an artifact of an engineering solution to satisfy specific requirements such as fault tolerance, an increased performance / cost ratio, or a minimum level of Quality of Service (QoS).
Examples: replicated servers.
Independent failures
From distributed systems to failures: dependencies between hosts amplify the effect of failures, and distributed systems make the life of developers more difficult. Availability example: with n dependent hosts, each failing with probability p, availability is (1 - p)^n.
From failures to distributed systems: providing a replicated service via multiple independent processes enables fault tolerance, and distributed systems should simplify the life of users. With n independent hosts, each failing with probability p, availability is 1 - p^n.
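The two availability formulas pull in opposite directions as n grows. The sketch below evaluates both in Python, assuming (as the slide does) that each host fails independently with the same probability p; the function names are illustrative.

```python
def availability_dependent(p, n):
    """Service needs all n hosts to be up: (1 - p)^n."""
    return (1.0 - p) ** n


def availability_replicated(p, n):
    """Service needs at least one of n replicas to be up: 1 - p^n."""
    return 1.0 - p ** n


def compare(p, sizes):
    """Return (n, dependent, replicated) rows for each n in `sizes`."""
    return [
        (n, availability_dependent(p, n), availability_replicated(p, n))
        for n in sizes
    ]
```

For instance, with p = 0.1 and n = 3 a chain of dependent hosts is available about 72.9% of the time, while three replicas are available about 99.9% of the time.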
Parallel systems
Definition: multiple processors that
share memory: communication is based on shared memory and synchronization mechanisms;
share time: they have access to a common clock;
share fate: there are no independent failures.
The true spirit of distributed systems
Parallel systems exploit determinism to achieve efficiency.
Distributed systems address the uncertainty created by the multiplicity of control flows, the absence of shared memory and global time, and the presence of failures.
Challenges
Heterogeneity; Openness; Security; Failure handling; Transparency.
Challenges: heterogeneity
Levels: network, computing hardware, operating systems, programming languages, multiple implementations.
Middleware: a software layer that abstracts from the above, providing a uniform computational model.
Challenges: openness
The degree to which a distributed system can be extended and re-implemented. It relies on public interfaces and standardization.
Examples: CORBA, Java2EE, .NET, Web Services.
Challenges: security aspects
Confidentiality: avoiding the disclosure of the content of a message to a party distinct from the intended receiver.
Integrity: avoiding the corruption of the transmitted contents by a third party.
Availability: the capability of providing a service in spite of malicious behavior.
Challenges: failure handling
Failure detection (e.g., message checksum).
Failure masking (e.g., retransmission).
Failure tolerance (e.g., replicated servers).
Failure recovery (e.g., log files).
Distributed systems use a mix of these techniques.
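Two of the techniques above, detection by checksum and masking by retransmission, combine naturally. The Python sketch below simulates a channel that sometimes corrupts a message; the channel model, the function names, and the retry limit are all assumptions made for illustration, with CRC-32 from the standard `zlib` module standing in for "message checksum".

```python
import random
import zlib


def make_packet(payload):
    """Attach a CRC-32 checksum to the payload."""
    return payload, zlib.crc32(payload)


def is_intact(payload, checksum):
    """Failure detection: recompute the checksum and compare."""
    return zlib.crc32(payload) == checksum


def noisy_channel(packet, rng, corrupt_prob):
    """Hypothetical channel: flips the bits of the first byte with some probability."""
    payload, checksum = packet
    if payload and rng.random() < corrupt_prob:
        payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
    return payload, checksum


def send_with_retries(payload, rng, corrupt_prob=0.5, max_attempts=10):
    """Failure masking: retransmit until an intact copy gets through."""
    packet = make_packet(payload)
    for attempt in range(1, max_attempts + 1):
        received, checksum = noisy_channel(packet, rng, corrupt_prob)
        if is_intact(received, checksum):
            return received, attempt
    raise RuntimeError("all transmission attempts were corrupted")
```

A caller would use something like `send_with_retries(b"attack at dawn", random.Random(42))`. As with the two generals, retransmission lowers the chance of failure but a bounded number of attempts cannot remove it.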
Challenges: transparency
Access transparency: hides differences in data representation and invocation mechanisms.
Location transparency: enables resources to be accessed without knowledge of their location.
Concurrency transparency: enables several processes to operate concurrently using shared resources without interference.
Replication transparency: enables multiple resource instances to be used to increase reliability and performance without knowledge of the replicas by users.
Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Migration transparency: allows the movement of resources and clients within a system without affecting operations.
Conclusions: what's next in the course?
Distributed system modeling: which kinds of failures exist? Which kinds of failures do we tolerate?
Problem specification: formal description of the problem.
Algorithms, algorithms, algorithms: pseudo-code descriptions of algorithms.
Proofs: just writing the code is not enough! Sometimes, impossibility proofs.
Reality checks: learn about real systems where these protocols are applied.
Reading material
A. Panconesi. The coordinated attack and the jealous amazons.
A. Panconesi. Coordination and the fall of Eastern Roman Empire.
Reality check: interesting links
The S3 incident; Solving the unsolvable; The rise and fall of Corba; Notes on Distributed Systems for Young Bloods.
More informationDep. Systems Requirements
Dependable Systems Dep. Systems Requirements Availability the system is ready to be used immediately. A(t) = probability system is available for use at time t MTTF/(MTTF+MTTR) If MTTR can be kept small
More informationFault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components
Fault Tolerance To avoid disruption due to failure and to improve availability, systems are designed to be fault-tolerant Two broad categories of fault-tolerant systems are: systems that mask failure it
More informationDistributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance
Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Fault Tolerance Version: December 2, 2010 2 / 65 Contents Chapter
More informationPRIMARY-BACKUP REPLICATION
PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons
More informationIntroduction to Distributed Systems
Introduction to Distributed Systems Minsoo Ryu Department of Computer Science and Engineering 2 Definition A distributed system is a collection of independent computers that appears to its users as a single
More informationPractical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li
Practical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li Abstract Along with cryptocurrencies become a great success known to the world, how to
More informationConcepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus
CIS 505: Software Systems Lecture Note on Consensus Insup Lee Department of Computer and Information Science University of Pennsylvania CIS 505, Spring 2007 Concepts Dependability o Availability ready
More informationCprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University
Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable
More informationFailures, Elections, and Raft
Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright
More informationDistributed Systems COMP 212. Revision 2 Othon Michail
Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise
More informationConsensus. Chapter Two Friends. 8.3 Impossibility of Consensus. 8.2 Consensus 8.3. IMPOSSIBILITY OF CONSENSUS 55
8.3. IMPOSSIBILITY OF CONSENSUS 55 Agreement All correct nodes decide for the same value. Termination All correct nodes terminate in finite time. Validity The decision value must be the input value of
More informationDistributed Systems (5DV147)
Distributed Systems (5DV147) Fundamentals Fall 2013 1 basics 2 basics Single process int i; i=i+1; 1 CPU - Steps are strictly sequential - Program behavior & variables state determined by sequence of operations
More informationConsensus, impossibility results and Paxos. Ken Birman
Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}
More informationFault-Tolerance & Paxos
Chapter 15 Fault-Tolerance & Paxos How do you create a fault-tolerant distributed system? In this chapter we start out with simple questions, and, step by step, improve our solutions until we arrive at
More informationPractical Byzantine Fault
Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Castro and Liskov, OSDI 1999 Nathan Baker, presenting on 23 September 2005 What is a Byzantine fault? Rationale for Byzantine Fault
More informationDistributed Systems Conclusions & Exam. Brian Nielsen
Distributed Systems Conclusions & Exam Brian Nielsen bnielsen@cs.aau.dk Study Regulations Purpose: That the student obtains knowledge about concepts in distributed systems, knowledge about their construction,
More informationConsensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.
Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}
More informationDistributed Commit in Asynchronous Systems
Distributed Commit in Asynchronous Systems Minsoo Ryu Department of Computer Science and Engineering 2 Distributed Commit Problem - Either everybody commits a transaction, or nobody - This means consensus!
More informationFault Tolerance 1/64
Fault Tolerance 1/64 Fault Tolerance Fault tolerance is the ability of a distributed system to provide its services even in the presence of faults. A distributed system should be able to recover automatically
More informationCMSC 858F: Algorithmic Game Theory Fall 2010 Achieving Byzantine Agreement and Broadcast against Rational Adversaries
CMSC 858F: Algorithmic Game Theory Fall 2010 Achieving Byzantine Agreement and Broadcast against Rational Adversaries Instructor: Mohammad T. Hajiaghayi Scribe: Adam Groce, Aishwarya Thiruvengadam, Ateeq
More informationLecture 3. Introduction to Cryptocurrencies
Lecture 3 Introduction to Cryptocurrencies Public Keys as Identities public key := an identity if you see sig such that verify(pk, msg, sig)=true, think of it as: pk says, [msg] to speak for pk, you must
More informationIntroduction to Distributed Systems
Introduction to Distributed Systems Distributed Systems L-A Sistemi Distribuiti L-A Andrea Omicini andrea.omicini@unibo.it Ingegneria Due Alma Mater Studiorum Università di Bologna a Cesena Academic Year
More informationByzantine fault tolerance. Jinyang Li With PBFT slides from Liskov
Byzantine fault tolerance Jinyang Li With PBFT slides from Liskov What we ve learnt so far: tolerate fail-stop failures Traditional RSM tolerates benign failures Node crashes Network partitions A RSM w/
More informationByzantine Consensus in Directed Graphs
Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory
More informationDistributed Systems Conclusions & Exam. Brian Nielsen
Distributed Systems Conclusions & Exam Brian Nielsen bnielsen@cs.aau.dk Definition A distributed system is the one in which hardware and software components at networked computers communicate and coordinate
More informationThe Ripple Protocol Consensus Algorithm
Ripple Labs Inc, 2014 The Ripple Protocol Consensus Algorithm David Schwartz david@ripple.com Noah Youngs nyoungs@nyu.edu Arthur Britto arthur@ripple.com Abstract While several consensus algorithms exist
More informationDistributed Systems COMP 212. Lecture 1 Othon Michail
Distributed Systems COMP 212 Lecture 1 Othon Michail Course Information Lecturer: Othon Michail Office 2.14 Holt Building http://csc.liv.ac.uk/~michailo/teaching/comp2 12 Structure 30 Lectures + 10 lab
More informationIntuitive distributed algorithms. with F#
Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype
More informationCS6450: Distributed Systems Lecture 11. Ryan Stutsman
Strong Consistency CS6450: Distributed Systems Lecture 11 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed
More informationBasic vs. Reliable Multicast
Basic vs. Reliable Multicast Basic multicast does not consider process crashes. Reliable multicast does. So far, we considered the basic versions of ordered multicasts. What about the reliable versions?
More informationCS6450: Distributed Systems Lecture 15. Ryan Stutsman
Strong Consistency CS6450: Distributed Systems Lecture 15 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed
More informationDistributed Deadlock
Distributed Deadlock 9.55 DS Deadlock Topics Prevention Too expensive in time and network traffic in a distributed system Avoidance Determining safe and unsafe states would require a huge number of messages
More informationDistributed Algorithms Practical Byzantine Fault Tolerance
Distributed Algorithms Practical Byzantine Fault Tolerance Alberto Montresor University of Trento, Italy 2017/01/06 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International
More informationChapter 8 Fault Tolerance
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Fault Tolerance Basic Concepts Being fault tolerant is strongly related to what
More informationChapter 5: Distributed Systems: Fault Tolerance. Fall 2013 Jussi Kangasharju
Chapter 5: Distributed Systems: Fault Tolerance Fall 2013 Jussi Kangasharju Chapter Outline n Fault tolerance n Process resilience n Reliable group communication n Distributed commit n Recovery 2 Basic
More informationFault Tolerance. Distributed Software Systems. Definitions
Fault Tolerance Distributed Software Systems Definitions Availability: probability the system operates correctly at any given moment Reliability: ability to run correctly for a long interval of time Safety:
More informationDistributed algorithms
Distributed algorithms Prof R. Guerraoui lpdwww.epfl.ch Exam: Written Reference: Book - Springer Verlag http://lpd.epfl.ch/site/education/da - Introduction to Reliable (and Secure) Distributed Programming
More informationExam 2 Review. Fall 2011
Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows
More informationChapter 1: Distributed Systems: What is a distributed system? Fall 2013
Chapter 1: Distributed Systems: What is a distributed system? Fall 2013 Course Goals and Content n Distributed systems and their: n Basic concepts n Main issues, problems, and solutions n Structured and
More informationTwo-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure
Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Yong-Hwan Cho, Sung-Hoon Park and Seon-Hyong Lee School of Electrical and Computer Engineering, Chungbuk National
More informationDistributed Algorithms Benoît Garbinato
Distributed Algorithms Benoît Garbinato 1 Distributed systems networks distributed As long as there were no machines, programming was no problem networks distributed at all; when we had a few weak computers,
More informationPractical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance Robert Grimm New York University (Partially based on notes by Eric Brewer and David Mazières) The Three Questions What is the problem? What is new or different? What
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 2, Issue 9, September 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Backup Two
More information