Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015


1 Agreement in Distributed Systems. CS 188 Distributed Systems, February 19, 2015

2 Introduction: We frequently want to get a set of nodes in a distributed system to agree. Commitment protocols and mutual exclusion are particular cases. The approaches we discussed for those work in limited situations. In general, when can we reach agreement in a distributed system?

3 Basics of Agreement Protocols: What is agreement? What are the necessary conditions for agreement?

4 What Do We Mean By Agreement? In the simplest case, can n processors agree that a variable takes on value 0 or 1? Only non-faulty processors need agree. More complex agreements can be built from this simple agreement.

5 Conditions for Agreement Protocols: Consistency: all participants agree on the same value and decisions are final. Validity: participants agree on a value at least one of them wanted. Termination/Progress: all participants choose a value in a finite number of steps.

6 Challenges to Agreement: Delays, both in message delivery and in nodes responding to messages. Failures, and recovery from failures. Lies by participants, or innocent errors that have similar effects.

7 Failures and Agreement: Failures make agreement difficult. Failed nodes don't participate. Failed nodes sometimes recover at inconvenient times. At worst, failed nodes participate in harmful ways. Real failures are worse than fail-stop.

8 Types of Failures: Fail-stop: a nice, clean failure in which the processor stops executing anything. Realistic failures: partitionings and arbitrary delays. Adversarial failures: arbitrary bad things happen.

9 Election Algorithms: If you get everyone to agree that a particular node is in charge, future consensus is easy, since he makes the decisions. How do you determine who's in charge? Statically or dynamically.

10 Static Leader Selection Methods: Predefine one process/node as the leader. Simple: everyone always knows who's the leader. Not very resilient: if the leader fails, then what?

11 Dynamic Leader Selection Methods: Choose a new leader dynamically whenever necessary. More complicated, but failure of a leader is easy to handle: just elect a new one. Election doesn't imply voting; it is not necessarily majority-based.

12 Election Algorithms vs. Mutual Exclusion Algorithms: Most mutual exclusion algorithms don't care much about failures. Election algorithms are designed to handle failures. Also, mutual exclusion algorithms only need a winner. Election algorithms need everyone to know who won.

13 A Typical Use of Election Algorithms: A group of processes wants to periodically take a distributed snapshot. They don't want multiple simultaneous snapshots. So they want one leader to order them to take the snapshot.

14 Problems in Election Algorithms: Some of the nodes may have failed before the algorithm starts. Some of the nodes may fail during the algorithm. Some nodes may recover from failure, possibly at inconvenient times. What about partitions?

15 Election Algorithms and the Real Work: The election algorithm is usually overhead. There's a real computation you want to perform. The election algorithm chooses someone to lead it. Having two leaders while real computation is going on is bad.

16 The Bully Algorithm: The biggest kid on the block gets to be the leader. But what if the biggest kid on the block is taking his piano lesson? The next biggest kid gets to be leader, until the piano lesson is over...

17 Electing a Bully: The kids come out to play. [Animated figure: overlapping speech bubbles that cannot be fully recovered from the transcription. Legible fragments include "Spike's Mom hasn't let him out yet", "The piano lesson ends", "I'm here", and "I'm the leader"; the kids named are Butch, Peewee, Spike, and Cuthbert.]

18 Assumptions of the Bully Algorithm: A static set of possible participants, with an agreed-upon order. All messages are delivered within T_m seconds. All responses are sent within T_p seconds of delivery. These last two imply synchronous behavior.
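
The two bounds above suggest how long a caller should wait before declaring a node down. As a worked sketch (the symbol T for the timeout is notation introduced here, not taken from the slides): a request takes at most T_m to arrive, the response is sent within T_p of delivery, and the response takes at most T_m to travel back, so

```latex
T = T_m + T_p + T_m = 2T_m + T_p
```

Under these assumptions, a node that has not answered within T seconds can be treated as failed.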

19 The Basic Idea Behind the Bully Algorithm: Possible leaders try to take over. If they detect a better leader, they agree to its leadership. Keep track of state information about whether you are electing a leader. Only do real work when you agree on a leader.

20 The Bully Algorithm and Timeouts: Call out the biggest kid's name. If he doesn't answer soon enough, call out the next biggest kid's name, until you hear an answer or the caller is the biggest kid. Then take over, by telling everyone else you're the leader.
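
The calling-out procedure above can be sketched in Python. This is a minimal sketch, not the lecture's implementation: the function name `elect_leader`, the use of integer IDs as the agreed-upon order, and the `is_alive` probe (standing in for "call out the name and wait for an answer before the timeout") are all assumptions made for illustration.

```python
from typing import Callable, Iterable


def elect_leader(me: int, nodes: Iterable[int], is_alive: Callable[[int], bool]) -> int:
    """Return the ID of the node that should lead, from the point of view of `me`.

    Hypothetical helper: `is_alive(node)` stands in for "call out the name
    and wait for an answer before the timeout expires".
    """
    # Call out names from the biggest kid downward.
    for candidate in sorted(nodes, reverse=True):
        if candidate == me:
            # Nobody bigger answered, so the caller takes over.
            return me
        if is_alive(candidate):
            # A bigger node answered; agree to its leadership.
            return candidate
    # `me` was not in the participant list; fall back to the caller.
    return me


up = {1, 2, 4}  # nodes 3 and 5 are down (e.g. at a piano lesson)
print(elect_leader(2, [1, 2, 3, 4, 5], lambda n: n in up))  # prints 4
print(elect_leader(4, [1, 2, 3, 4, 5], lambda n: n in up))  # prints 4
```

The sketch only computes who should lead; the final step on the slide, telling everyone else who the leader is, is left out.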

21 The Bully Algorithm At Work: One node is currently the coordinator. It expects a certain set of nodes to be up and participating. The coordinator asks all other nodes. If an expected node doesn't answer, start an election. Also if it answers in the negative. If an unexpected node answers, start an election.

22 The Practicality of the Bully Algorithm: The bully algorithm works reasonably well if the timeouts are effective: a timeout occurring really means the site in question is down, and there are no partitions at all. If there are, what happens?

23 The Invitation Algorithm: More practical than the bully algorithm. Doesn't depend on timeouts. But its results are not as definitive. An asynchronous algorithm.

24 The Basic Idea Behind the Invitation Algorithm: A current coordinator tries to get all other nodes to agree to his leadership. If more than one coordinator is around, get together and merge groups. Use timeouts only to allow progress, not to make definitive decisions. No set priorities for who will be coordinator.

25 The Invitation Algorithm and Group Numbers: The invitation algorithm recruits a group of nodes to work together. More than one group can exist simultaneously. Group numbers identify the group. Why not identify with coordinator ID? Because one node can serially coordinate many groups.

26 The Basic Operation of the Invitation Algorithm: Coordinators in a normal state periodically check all other nodes. If any other node is a coordinator, try to merge the groups. If timeouts occur, don't worry about it. Also don't worry if a response to a check comes from this or an earlier request.

27 Merging in the Invitation Algorithm: Merging always requires forming a new group. It may have the same coordinator, but a different group number. The coordinator who initiates the merge asks all other known coordinators to merge. They ask their group members. Original group members are also asked.

28 A Simplified Example: UP = {1,2,3,4}. [Figure: nodes exchanging AreYouCoordinator? (answered Yes or No), Invite, "Invite on behalf of node", Accept, and Ready messages.] Node 1 checks for other coordinators. So node 1 finds another coordinator. Node 1 forms a new group. Node 1 asks the other coordinator and his old node to join his group. If all members of UP{} respond, we're fine.

29 The Reorganization State: Nodes enter the reorganization state after getting their answer. What's the point of this state? Why not just start up the group? After all, we all know who's going to be a member. Or do we?

30 Why We Need Another Round of Messages: [Figure: node 1 sends an Invitation to each of the other nodes.] Who does 1 think will join the group, at this point? Presumably 2 and 3. Assuming no timeouts, 4 will also join. And what if someone crashes? [The rest of the overlaid animation text is not fully recoverable; legible fragments are "needs to know that" and "not accepting the invitation?".]

31 Timeouts in the Merge: Don't worry too much about them. Some nodes respond before the timeout. Some don't. If you don't catch them this time, you might the next.

32 Straggler Messages: This algorithm is asynchronous, so messages may come in late. What do we do when messages arrive late? Mostly, reject them. How do we tell? Messages contain the group number.
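
The group-number check can be illustrated with a short Python sketch. The `Message` and `Node` types and their field names are hypothetical, introduced only for this example, and the sketch rejects every message from another group, whereas the slide says only that stragglers are "mostly" rejected.

```python
from dataclasses import dataclass


@dataclass
class Message:
    group: int    # group number the sender believed was current
    payload: str


@dataclass
class Node:
    current_group: int

    def accept(self, msg: Message) -> bool:
        """Return True if the message belongs to this node's current group."""
        # A message stamped with any other group number is treated as a
        # straggler from an earlier (or competing) group and is rejected.
        return msg.group == self.current_group


node = Node(current_group=7)
print(node.accept(Message(group=7, payload="Ready")))   # prints True
print(node.accept(Message(group=6, payload="Accept")))  # prints False
```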

33 Multiple Simultaneous Groups: The invitation algorithm allows multiple simultaneous groups to exist, each with a proper coordinator. Is this a good thing? No, but what are the alternatives? No node ever belongs to more than one group, at least.

34 Paxos: A family of algorithms that allow a distributed system to reach agreement in the face of delays and failures. Can't perfectly guarantee progress, but makes progress in realistic conditions. Does guarantee consistency. Usually defined to reach consensus on some value v.

35 Paxos Assumptions: Processors are of variable speed and may fail. They might recover after failure. But they don't lie. Any processor can send a message to any other processor. Messages can be lost, arbitrarily delayed, reordered, or duplicated, but never corrupted.

36 Paxos Processor Roles: Client: issues a request, waits for a response. Acceptor/voter: remembers things for the protocol. Proposer: assists the client in getting a response (simpler if there's only one). Learner: actually executes a request. Leader: one of the proposers that leads the process. One processor can play several roles. Usually, all processes are acceptors, proposers, and learners.

37 Paxos Quorums: Collections of acceptors that make decisions. Several different quorums in the system. Messages are sent to quorums, not single acceptors. Messages are only effective if all quorum members receive them. Similarly, all acceptors in a quorum must send a message for it to be effective. If any member of the quorum survives, its decisions survive.

38 Quorum Membership: All quorums must contain a majority of all acceptors in the system. Any two quorums must share at least one acceptor. E.g., if there are four acceptors {1,2,3,4}, quorums might be: {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4}.
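
The intersection property can be checked mechanically. The following Python sketch enumerates the smallest majority quorums for the four-acceptor example and confirms that every pair overlaps; the function names `majority_quorums` and `all_pairs_intersect` are illustrative choices, not names from the slides.

```python
from itertools import combinations


def majority_quorums(acceptors):
    """Enumerate every smallest-possible majority subset of `acceptors`."""
    size = len(acceptors) // 2 + 1  # smallest strict majority
    return [set(q) for q in combinations(sorted(acceptors), size)]


def all_pairs_intersect(quorums):
    """Check that any two quorums share at least one acceptor."""
    return all(a & b for a, b in combinations(quorums, 2))


quorums = majority_quorums({1, 2, 3, 4})
print([sorted(q) for q in quorums])  # [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
print(all_pairs_intersect(quorums))  # prints True
```

Two majorities of the same set must overlap because their sizes add up to more than the total number of acceptors.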

39 Paxos Rounds: Paxos proceeds in rounds, in response to a client request. If the round reaches agreement, the client gets a response. If not, you start another round. Continue till a round reaches agreement.

40 A Simple Paxos Round: [Figure: client C, proposer P, acceptors A1, A2, A3, and learners L1, L2 exchanging the numbered messages below.] 1. request. 2. prepare(n). 3. promise(n, null), one from each acceptor. 4. accept(n, v_res). 5. accepted(n, v_max). 6. response. n is a bigger number than P has ever used or seen before. If an acceptor ever promised on this item before, it returns the generation and value from that run of Paxos, not null. v_res is a result chosen by P, if no promise had a value.
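
The acceptor side of steps 2 through 5 can be sketched in Python. This is a simplified single-value sketch under stated assumptions, not the lecture's code: the `Acceptor` class, its field names, and the `choose_value` helper are hypothetical, messages are modeled as direct method calls, and learners, failures, and message loss are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised_n: int = -1              # highest proposal number promised so far
    accepted_n: Optional[int] = None  # number of the last accepted proposal
    accepted_v: Optional[str] = None  # value of the last accepted proposal

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[int], Optional[str]]]:
        """Answer prepare(n) with a promise, or None if n is too old."""
        if n <= self.promised_n:
            return None
        self.promised_n = n
        # The promise carries any previously accepted (number, value) pair.
        return (n, self.accepted_n, self.accepted_v)

    def accept(self, n: int, v: str) -> bool:
        """Accept (n, v) unless a higher-numbered prepare was promised."""
        if n < self.promised_n:
            return False
        self.promised_n = n
        self.accepted_n, self.accepted_v = n, v
        return True


def choose_value(promises, own_value):
    """Proposer rule: reuse the value of the highest-numbered prior accept."""
    prior = [(an, av) for (_, an, av) in promises if an is not None]
    return max(prior, key=lambda p: p[0])[1] if prior else own_value


acceptors = [Acceptor() for _ in range(3)]
promises = [a.prepare(1) for a in acceptors]
v = choose_value(promises, "x")
print(v, all(a.accept(1, v) for a in acceptors))  # prints: x True

# A later round with a different proposal must adopt the accepted value.
promises = [a.prepare(2) for a in acceptors]
print(choose_value(promises, "y"))  # prints: x
```

The second round shows why a promise returns the earlier value rather than null: once a value has been accepted, later proposers carry it forward instead of replacing it.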

41 The Point of Different Paxos Roles: [Figure: client C, proposer P, acceptors A1, A2, A3, and learners L1, L2.] The client wants to get something done. The learners ensure redundant memory of the result of a decision. The proposer coordinates protocol activities. The acceptors ensure proper concurrent behavior and handle proposer failures. Remember! One machine can play multiple roles.

42 Paxos Error Handling: Some cases simple, some complex. A simple case: one of the acceptors fails. If there's still a quorum, no problem; go ahead without him. Another simple case: one of the learners failed. If any learners are left, they'll provide the right response to the client.

43 More Complex Error Cases: Things like failure of the proposer in the middle of a round. Paxos chooses a new leader and uses him from this point. What if the old leader comes back? Even more complex, but it works out.

44 Paxos and Overheads: Generally quite expensive, in messages and thus delays. Many optimizations are possible. Some don't alter the protocol characteristics. Some trade off handling some error conditions for better performance.

45 Byzantine Agreement: Life can be a lot worse than merely being unable to rely on timeouts. What if one of the nodes we're working with is lying? How can we reach agreement if we can't trust all the participants?

46 The Purpose of Byzantine Agreement: Well, why would one of our distributed system components lie? It probably wouldn't. But it might contain a bug. If it contains the worst possible bug, what can it do? Essentially, inadvertently lie.

47 The Realism of Byzantine Agreement: It isn't realistic. It doesn't really happen. No one really uses it. But it demonstrates a limit on how badly things can go while still allowing agreement.

48 Why Is It Called Byzantine? After the fall of Rome itself, the empire lived on in the east, called Byzantium. Byzantium survived for around 1000 years. The Byzantines were famous for their treachery and double-dealing.

49 The Byzantine General Problem: Several Byzantine generals each command their own army. They are far apart and communicate with messengers. The emperor wants to attack the Turks. If all generals attack, they'll win. Even if a majority attack, they'll win. Retreating is OK, if everyone does it. But the Turks may have bribed some generals.

50 The Complete Problem Statement: Messages are point-to-point. Messages are reliably delivered, with a predictable timeout. Failure to receive a message in time means the sender is a traitor. Traitors can send any messages they please, but cannot forge their identities.

51 How Many Traitors Is Too Many? Can all the loyal generals reach agreement on whether to attack or retreat? Or can the traitors prevent them from reaching any agreement? How many generals must the Turks bribe before no agreement is possible?

52 The Answer: If the Turks bribe 1/3 of the generals, the remaining 2/3 cannot reach agreement. How can that be? Why not just a majority? Easiest to consider in the case of a commander.
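
Stated as a formula, this is the classic bound for agreement with unsigned messages. The symbols n (total number of generals) and f (number of traitors) are notation introduced here rather than in the slides:

```latex
n \ge 3f + 1 \quad\Longleftrightarrow\quad f < \frac{n}{3}
```

For example, with n = 3 and f = 1 the bound fails (3 < 4), while with n = 4 and f = 1 it holds (4 ≥ 4).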

53 The 3-General Byzantine Problem: [Figure: a commander sending Attack and Retreat orders to two other generals.] What if they're all loyal? Everyone attacks, and the Turk is vanquished. But what if the commander is a traitor? One general attacks, one retreats, the traitor pockets the bribe, and the Turks win.

54 Can't the Loyal Generals Check Their Orders? [Figure: commander 1 sends Attack to one general and Retreat to the other; generals 2 and 3 tell each other the orders they received.] Generals 2 and 3 check their orders. They figure out 1 is a traitor and come to their own agreement.

55 But What if the Commander Wasn't the Traitor? [Figure: commander 1 sends Attack to both generals 2 and 3; general 3 tells general 2 that his order was Retreat.] Generals 2 and 3 check their orders. They figure out 1 is a traitor and come to their own agreement. But 1 isn't the traitor; 3 is the traitor, this time. He convinces 2 to retreat, 1 is slaughtered attacking, and 3 pockets the bribe.

56 Can General 2 Tell Which Scenario Is Occurring? When 1 was the traitor, 2 saw: Attack from 1, Retreat from 3. When 3 was the traitor, 2 saw: Attack from 1, Retreat from 3. General 2 can't tell the difference, so he can't decide whether to attack or retreat.

57 What If There Were 4 Generals? [Figure: commander 1 with orders labeled Attack, Attack, Retreat to the other three generals.] What if the commander (1) is the traitor? If he doesn't send some messages, he'll be seen as the traitor. But what can he send?

58 Can the Three Loyal Generals Reach Agreement? [Figure: commander 1 with orders labeled Attack, Attack, Retreat to the other three generals.] They can exchange all the messages and let the majority rule. Since there are only two possible messages (attack or retreat) and three recipients, the commander must have sent the same message to two nodes. If the commander is loyal and someone else is lying, the majority represents the loyal commander's will.
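
The exchange-and-majority step for four generals can be sketched in Python. This is an illustrative simplification, not the lecture's algorithm: the names `decide` and `run` are hypothetical, and a traitorous lieutenant is modeled as relaying the same lie to every peer, whereas a real traitor could tell each peer something different.

```python
from collections import Counter


def decide(received, relayed):
    """Majority of the order received directly and the orders relayed by peers."""
    votes = Counter([received] + list(relayed))
    return votes.most_common(1)[0][0]


def run(orders, traitor=None, lie="retreat"):
    """`orders` maps lieutenant id -> order the commander sent to it.

    If `traitor` is a lieutenant id, that lieutenant relays `lie` to everyone
    instead of the order it actually received. Returns loyal decisions only.
    """
    decisions = {}
    for me in orders:
        if me == traitor:
            continue
        relayed = [lie if peer == traitor else orders[peer]
                   for peer in orders if peer != me]
        decisions[me] = decide(orders[me], relayed)
    return decisions


# Traitorous commander (1) sends mixed orders to loyal lieutenants 2, 3, 4.
print(run({2: "attack", 3: "attack", 4: "retreat"}))
# prints {2: 'attack', 3: 'attack', 4: 'attack'}

# Loyal commander orders attack; lieutenant 4 is a traitor and relays "retreat".
print(run({2: "attack", 3: "attack", 4: "attack"}, traitor=4))
# prints {2: 'attack', 3: 'attack'}
```

Each loyal lieutenant sees three votes over two possible values, so there is always a strict majority and the loyal lieutenants end up with the same decision in both scenarios.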

59 But What if There Were Five Generals? [Figure: commander 1 with orders labeled Attack, Attack, Retreat, Retreat to the other four generals.] Pre-arrange a tie-breaker, e.g., always retreat on ties. All the loyal generals then retreat. And the traitor must explain his failure to the Turks.

60 What If You Don't Want a Commander? What if you want everyone to vote, and accept the majority, with the guarantee that all loyal nodes abide by the majority? Serially treat each node as the commander. Reach agreement on his vote. Then move on to the next node.

61 The Trick Behind Byzantine Agreement: Everyone must know what everyone else thinks about everything else. Not just what I think the commander said, but what everyone else claims the commander said. Resulting algorithms are tricky and expensive. But it could be (and will be) worse.

62 Authenticated Byzantine Agreement: What if the messages are signed in an unforgeable way? Then dishonest generals can't lie about what an honest general told them. In this case, honest generals reach agreement regardless of how many are dishonest.


More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Last Time. 19: Distributed Coordination. Distributed Coordination. Recall. Event Ordering. Happens-before

Last Time. 19: Distributed Coordination. Distributed Coordination. Recall. Event Ordering. Happens-before Last Time 19: Distributed Coordination Last Modified: 7/3/2004 1:50:34 PM We talked about the potential benefits of distributed systems We also talked about some of the reasons they can be so difficult

More information

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms Holger Karl Computer Networks Group Universität Paderborn Goal of this chapter Apart from issues in distributed time and resulting

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Last Class: Clock Synchronization. Today: More Canonical Problems

Last Class: Clock Synchronization. Today: More Canonical Problems Last Class: Clock Synchronization Logical clocks Vector clocks Global state Lecture 11, page 1 Today: More Canonical Problems Distributed snapshot and termination detection Election algorithms Bully algorithm

More information

You can also launch the instances on different machines by supplying IPv4 addresses and port numbers in the format :3410

You can also launch the instances on different machines by supplying IPv4 addresses and port numbers in the format :3410 CS 3410: Paxos Introduction In this assignment, you will implement a simple in-memory database that is replicated across multiple hosts using the Paxos distributed consensis protocol. You can download

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions Transactions Main issues: Concurrency control Recovery from failures 2 Distributed Transactions

More information

SpecPaxos. James Connolly && Harrison Davis

SpecPaxos. James Connolly && Harrison Davis SpecPaxos James Connolly && Harrison Davis Overview Background Fast Paxos Traditional Paxos Implementations Data Centers Mostly-Ordered-Multicast Network layer Speculative Paxos Protocol Application layer

More information

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Page 1 Example Replicated File Systems NFS Coda Ficus Page 2 NFS Originally NFS did not have any replication capability

More information

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed

More information

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases We talked about transactions and how to implement them in a single-node database. We ll now start looking into how to

More information

Failures, Elections, and Raft

Failures, Elections, and Raft Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright

More information

Recovering from a Crash. Three-Phase Commit

Recovering from a Crash. Three-Phase Commit Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator

More information

CS 3640: Introduction to Networks and Their Applications

CS 3640: Introduction to Networks and Their Applications CS 3640: Introduction to Networks and Their Applications Fall 2018, Lecture 7: The Link Layer II Medium Access Control Protocols Instructor: Rishab Nithyanand Teaching Assistant: Md. Kowsar Hossain 1 You

More information

Chapter 5: Distributed Systems: Fault Tolerance. Fall 2013 Jussi Kangasharju

Chapter 5: Distributed Systems: Fault Tolerance. Fall 2013 Jussi Kangasharju Chapter 5: Distributed Systems: Fault Tolerance Fall 2013 Jussi Kangasharju Chapter Outline n Fault tolerance n Process resilience n Reliable group communication n Distributed commit n Recovery 2 Basic

More information

Exam 2 Review. October 29, Paul Krzyzanowski 1

Exam 2 Review. October 29, Paul Krzyzanowski 1 Exam 2 Review October 29, 2015 2013 Paul Krzyzanowski 1 Question 1 Why did Dropbox add notification servers to their architecture? To avoid the overhead of clients polling the servers periodically to check

More information

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem) Practical Byzantine Fault Tolerance (The Byzantine Generals Problem) Introduction Malicious attacks and software errors that can cause arbitrary behaviors of faulty nodes are increasingly common Previous

More information

PROCESS SYNCHRONIZATION

PROCESS SYNCHRONIZATION DISTRIBUTED COMPUTER SYSTEMS PROCESS SYNCHRONIZATION Dr. Jack Lange Computer Science Department University of Pittsburgh Fall 2015 Process Synchronization Mutual Exclusion Algorithms Permission Based Centralized

More information

Failure Tolerance. Distributed Systems Santa Clara University

Failure Tolerance. Distributed Systems Santa Clara University Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot

More information

Introduction to Distributed Systems Seif Haridi

Introduction to Distributed Systems Seif Haridi Introduction to Distributed Systems Seif Haridi haridi@kth.se What is a distributed system? A set of nodes, connected by a network, which appear to its users as a single coherent system p1 p2. pn send

More information

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus CIS 505: Software Systems Lecture Note on Consensus Insup Lee Department of Computer and Information Science University of Pennsylvania CIS 505, Spring 2007 Concepts Dependability o Availability ready

More information

Network Protocols. Sarah Diesburg Operating Systems CS 3430

Network Protocols. Sarah Diesburg Operating Systems CS 3430 Network Protocols Sarah Diesburg Operating Systems CS 3430 Protocol An agreement between two parties as to how information is to be transmitted A network protocol abstracts packets into messages Physical

More information

Fault-Tolerance & Paxos

Fault-Tolerance & Paxos Chapter 15 Fault-Tolerance & Paxos How do you create a fault-tolerant distributed system? In this chapter we start out with simple questions, and, step by step, improve our solutions until we arrive at

More information

Last Class: Clock Synchronization. Today: More Canonical Problems

Last Class: Clock Synchronization. Today: More Canonical Problems Last Class: Clock Synchronization Logical clocks Vector clocks Global state Lecture 12, page 1 Today: More Canonical Problems Distributed snapshot and termination detection Election algorithms Bully algorithm

More information

Distributed Consensus Protocols

Distributed Consensus Protocols Distributed Consensus Protocols ABSTRACT In this paper, I compare Paxos, the most popular and influential of distributed consensus protocols, and Raft, a fairly new protocol that is considered to be a

More information

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan

Distributed Synchronization. EECS 591 Farnam Jahanian University of Michigan Distributed Synchronization EECS 591 Farnam Jahanian University of Michigan Reading List Tanenbaum Chapter 5.1, 5.4 and 5.5 Clock Synchronization Distributed Election Mutual Exclusion Clock Synchronization

More information

CSCI 5454, CU Boulder Samriti Kanwar Lecture April 2013

CSCI 5454, CU Boulder Samriti Kanwar Lecture April 2013 1. Byzantine Agreement Problem In the Byzantine agreement problem, n processors communicate with each other by sending messages over bidirectional links in order to reach an agreement on a binary value.

More information

CS October 2017

CS October 2017 Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.

More information

CMPSCI 677 Operating Systems Spring Lecture 14: March 9

CMPSCI 677 Operating Systems Spring Lecture 14: March 9 CMPSCI 677 Operating Systems Spring 2014 Lecture 14: March 9 Lecturer: Prashant Shenoy Scribe: Nikita Mehra 14.1 Distributed Snapshot Algorithm A distributed snapshot algorithm captures a consistent global

More information

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ) Data Consistency and Blockchain Bei Chun Zhou (BlockChainZ) beichunz@cn.ibm.com 1 Data Consistency Point-in-time consistency Transaction consistency Application consistency 2 Strong Consistency ACID Atomicity.

More information

Process groups and message ordering

Process groups and message ordering Process groups and message ordering If processes belong to groups, certain algorithms can be used that depend on group properties membership create ( name ), kill ( name ) join ( name, process ), leave

More information

BlockFin A Fork-Tolerant, Leaderless Consensus Protocol April

BlockFin A Fork-Tolerant, Leaderless Consensus Protocol April BlockFin A Fork-Tolerant, Leaderless Consensus Protocol April 2018 @storecoin What are the most desirable features in a blockchain? Scalability (throughput) and decentralization (censorship resistance),

More information

Today: Fault Tolerance. Failure Masking by Redundancy

Today: Fault Tolerance. Failure Masking by Redundancy Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing

More information

Distributed Systems 8L for Part IB

Distributed Systems 8L for Part IB Distributed Systems 8L for Part IB Handout 3 Dr. Steven Hand 1 Distributed Mutual Exclusion In first part of course, saw need to coordinate concurrent processes / threads In particular considered how to

More information

Silberschatz and Galvin Chapter 18

Silberschatz and Galvin Chapter 18 Silberschatz and Galvin Chapter 18 Distributed Coordination CPSC 410--Richard Furuta 4/21/99 1 Distributed Coordination Synchronization in a distributed environment Ð Event ordering Ð Mutual exclusion

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Distributed System Engineering: Spring Exam I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Distributed System Engineering: Spring Exam I Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.824 Distributed System Engineering: Spring 2017 Exam I Write your name on this cover sheet. If you tear

More information

Byzantine Techniques

Byzantine Techniques November 29, 2005 Reliability and Failure There can be no unity without agreement, and there can be no agreement without conciliation René Maowad Reliability and Failure There can be no unity without agreement,

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

Consensus in Distributed Systems. Jeff Chase Duke University

Consensus in Distributed Systems. Jeff Chase Duke University Consensus in Distributed Systems Jeff Chase Duke University Consensus P 1 P 1 v 1 d 1 Unreliable multicast P 2 P 3 Consensus algorithm P 2 P 3 v 2 Step 1 Propose. v 3 d 2 Step 2 Decide. d 3 Generalizes

More information

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers Clock Synchronization Synchronization Tanenbaum Chapter 6 plus additional papers Fig 6-1. In a distributed system, each machine has its own clock. When this is the case, an event that occurred after another

More information

Practice: Large Systems Part 2, Chapter 2

Practice: Large Systems Part 2, Chapter 2 Practice: Large Systems Part 2, Chapter 2 Overvie Introduction Strong Consistency Crash Failures: Primary Copy, Commit Protocols Crash-Recovery Failures: Paxos, Chubby Byzantine Failures: PBFT, Zyzzyva

More information

Raft and Paxos Exam Rubric

Raft and Paxos Exam Rubric 1 of 10 03/28/2013 04:27 PM Raft and Paxos Exam Rubric Grading Where points are taken away for incorrect information, every section still has a minimum of 0 points. Raft Exam 1. (4 points, easy) Each figure

More information

Byzantine Fault Tolerant Raft

Byzantine Fault Tolerant Raft Abstract Byzantine Fault Tolerant Raft Dennis Wang, Nina Tai, Yicheng An {dwang22, ninatai, yicheng} @stanford.edu https://github.com/g60726/zatt For this project, we modified the original Raft design

More information

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25 DATABASE TRANSACTIONS CS121: Relational Databases Fall 2017 Lecture 25 Database Transactions 2 Many situations where a sequence of database operations must be treated as a single unit A combination of

More information

Arvind Krishnamurthy Fall Collection of individual computing devices/processes that can communicate with each other

Arvind Krishnamurthy Fall Collection of individual computing devices/processes that can communicate with each other Distributed Systems Arvind Krishnamurthy Fall 2003 Concurrent Systems Collection of individual computing devices/processes that can communicate with each other General definition encompasses a wide range

More information

Byzantine fault tolerance. Jinyang Li With PBFT slides from Liskov

Byzantine fault tolerance. Jinyang Li With PBFT slides from Liskov Byzantine fault tolerance Jinyang Li With PBFT slides from Liskov What we ve learnt so far: tolerate fail-stop failures Traditional RSM tolerates benign failures Node crashes Network partitions A RSM w/

More information

P2 Recitation. Raft: A Consensus Algorithm for Replicated Logs

P2 Recitation. Raft: A Consensus Algorithm for Replicated Logs P2 Recitation Raft: A Consensus Algorithm for Replicated Logs Presented by Zeleena Kearney and Tushar Agarwal Diego Ongaro and John Ousterhout Stanford University Presentation adapted from the original

More information