Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

1 Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015 Page 1

2 Introduction We frequently want to get a set of nodes in a distributed system to agree Commitment protocols and mutual exclusion are particular cases The approaches we discussed for those work in limited situations In general, when can we reach agreement in a distributed system? Page 2

3 Basics of Agreement Protocols What is agreement? What are the necessary conditions for agreement? Page 3

4 What Do We Mean By Agreement? In simplest case, can n processors agree that a variable takes on value 0 or 1? Only non-faulty processors need agree More complex agreements can be built from this simple agreement Page 4

5 Conditions for Agreement Protocols Consistency: All participants agree on the same value and decisions are final Validity: Participants agree on a value at least one of them wanted Termination/Progress: All participants choose a value in a finite number of steps Page 5
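The three conditions above can be phrased as checks on the outcome of a protocol run. The following is a minimal sketch, not part of the original slides; the function name `check_agreement` and the dictionary layout are assumptions made for illustration, and only non-faulty nodes are expected to appear in `decided`.

```python
def check_agreement(proposed, decided):
    """Check one finished run against the three conditions.

    proposed: dict mapping node -> value that node initially wanted
    decided:  dict mapping non-faulty node -> decided value (None = never decided)
    Both layouts are hypothetical, chosen only for this sketch.
    """
    outcomes = set(decided.values())
    chosen = outcomes - {None}
    return {
        # Consistency: every node that decided picked the same value
        "consistency": len(chosen) <= 1,
        # Validity: any chosen value was wanted by at least one participant
        "validity": all(v in set(proposed.values()) for v in chosen),
        # Termination: no non-faulty node is left undecided
        "termination": None not in outcomes,
    }


good = check_agreement({"a": 0, "b": 1, "c": 1}, {"a": 1, "b": 1, "c": 1})
split = check_agreement({"a": 0, "b": 1, "c": 1}, {"a": 1, "b": 0, "c": 1})
```

Here `good` satisfies all three conditions, while `split` violates consistency because two non-faulty nodes decided differently.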

6 Challenges to Agreement Delays In message delivery In nodes responding to messages Failures And recovery from failures Lies by participants Or innocent errors that have similar effects Page 6

7 Failures and Agreement Failures make agreement difficult Failed nodes don't participate Failed nodes sometimes recover at inconvenient times At worst, failed nodes participate in harmful ways Real failures are worse than fail-stop Page 7

8 Types of Failures Fail-stop A nice, clean failure Processor stops executing anything Realistic failures Partitionings Arbitrary delays Adversarial failures Arbitrary bad things happen Page 8

9 Election Algorithms If you get everyone to agree a particular node is in charge, future consensus is easy, since he makes the decisions How do you determine who's in charge? Statically Dynamically Page 9

10 Static Leader Selection Methods Predefine one process/node as the leader Simple Everyone always knows who's the leader Not very resilient If the leader fails, then what? Page 10

11 Dynamic Leader Selection Methods Choose a new leader dynamically whenever necessary More complicated But failure of a leader is easy to handle Just elect a new one Election doesn't imply voting Not necessarily majority-based Page 11

12 Election Algorithms vs. Mutual Exclusion Algorithms Most mutual exclusion algorithms don't care much about failures Election algorithms are designed to handle failures Also, mutual exclusion algorithms only need a winner Election algorithms need everyone to know who won Page 12

13 A Typical Use of Election Algorithms A group of processes wants to periodically take a distributed snapshot They don't want multiple simultaneous snapshots So they want one leader to order them to take the snapshot Page 13

14 Problems in Election Algorithms Some of the nodes may have failed before the algorithm starts Some of the nodes may fail during the algorithm Some nodes may recover from failure Possibly at inconvenient times What about partitions? Page 14

15 Election Algorithms and the Real Work The election algorithm is usually overhead There's a real computation you want to perform The election algorithm chooses someone to lead it Having two leaders while real computation is going on is bad Page 15

16 The Bully Algorithm The biggest kid on the block gets to be the leader But what if the biggest kid on the block is taking his piano lesson? The next biggest kid gets to be leader Until the piano lesson is over... Page 16

17 Electing a Bully [Animated figure with overlaid speech bubbles among the kids Butch, Spike, Peewee, and Cuthbert. Captions: The kids come out to play. Spike's Mom hasn't let him out yet. The piano lesson ends. Speech bubbles include "I'm here", "Hey, where are you sissies?", "I'm the leader, let's play tag!", and "I'm the leader, and we're playing baseball!"] Page 17

18 Assumptions of the Bully Algorithm A static set of possible participants With an agreed-upon order All messages are delivered within T_m seconds All responses are sent within T_p seconds of delivery These last two imply synchronous behavior Page 18

19 The Basic Idea Behind the Bully Algorithm Possible leaders try to take over If they detect a better leader, they agree to its leadership Keep track of state information about whether you are electing a leader Only do real work when you agree on a leader Page 19

20 The Bully Algorithm and Timeouts Call out the biggest kid's name If he doesn't answer soon enough, call out the next biggest kid's name Until you hear an answer Or the caller is the biggest kid Then take over, by telling everyone else you're the leader Page 20
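The calling-out procedure above can be sketched as a loop over the agreed-upon order. This is a simplified illustration rather than the full bully algorithm: the function name `bully_elect` is hypothetical, and the `answers` callback stands in for "a reply arrived before the timeout", which is only a reliable signal under the synchronous assumptions stated earlier.

```python
def bully_elect(caller, priority_order, answers):
    """Return the node that ends up as leader from the caller's point of view.

    priority_order: node names, biggest first (the agreed-upon order)
    answers(node):  True if the node replied before the timeout (assumed helper)
    """
    for candidate in priority_order:
        if candidate == caller:
            # Nobody bigger answered, so the caller takes over
            # (in a real system it would now tell everyone else).
            return caller
        if answers(candidate):
            # A bigger kid answered in time; defer to it.
            return candidate
    raise ValueError("caller is not in the priority order")


order = ["Butch", "Spike", "Peewee", "Cuthbert"]
up = {"Peewee", "Cuthbert"}          # Butch and Spike are not answering
leader_seen_by_cuthbert = bully_elect("Cuthbert", order, up.__contains__)
leader_seen_by_peewee = bully_elect("Peewee", order, up.__contains__)
```

Both callers settle on Peewee, the biggest kid who is currently answering. If timeouts misfire, for example during a partition, two callers can reach different conclusions.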

21 The Bully Algorithm At Work One node is currently the coordinator It expects a certain set of nodes to be up and participating The coordinator asks all other nodes If an expected node doesn't answer, start an election Also if it answers in the negative If an unexpected node answers, start an election Page 21

22 The Practicality of the Bully Algorithm The bully algorithm works reasonably well if the timeouts are effective A timeout occurring really means the site in question is down And there are no partitions at all If there are, what happens? Page 22

23 The Invitation Algorithm More practical than bully algorithm Doesn't depend on timeouts But its results are not as definitive An asynchronous algorithm Page 23

24 The Basic Idea Behind the Invitation Algorithm A current coordinator tries to get all other nodes to agree to his leadership If more than one coordinator is around, get together and merge groups Use timeouts only to allow progress, not to make definitive decisions No set priorities for who will be coordinator Page 24

25 The Invitation Algorithm and Group Numbers The invitation algorithm recruits a group of nodes to work together More than one group can exist simultaneously Group numbers identify the group Why not identify with coordinator ID? Because one node can serially coordinate many groups Page 25

26 The Basic Operation of the Invitation Algorithm Coordinators in a normal state periodically check all other nodes If any other node is a coordinator, try to merge the groups If timeouts occur, don't worry about it Also don't worry if a response to a check comes from this or an earlier request Page 26

27 Merging in the Invitation Algorithm Merging always requires forming new group May have same coordinator, but different group number Coordinator who initiates merge asks all other known coordinators to merge They ask their group members Original group members also asked Page 27

28 A Simplified Example UP = {1,2,3,4} [Figure: message exchange among the nodes, with labels AreYouCoordinator? (answered Yes or No), Invite, Invite on behalf of node, Accept, and Ready] Node 1 checks for other coordinator So node 1 finds another coordinator Node 1 asks the other coordinator and his old node to join his group Node 1 forms a new group If all members of UP{} respond, we're fine Page 28

29 The Reorganization State Nodes enter the reorganization state after getting their answer What's the point of this state? Why not just start up the group? After all, we all know who's going to be a member Or do we? Page 29

30 Why We Need Another Round of Messages [Figure: node 1 sends three Invitation messages] Who does 1 think will join the group, at this point? Assuming no timeouts, 4 will also join And 2 and 3 need to know that And what if someone crashes? Presumably not accepting the invitation? Page 30

31 Timeouts in the Merge Don't worry too much about them Some nodes respond before the timeout Some don't If you don't catch them this time, you might the next Page 31

32 Straggler Messages This algorithm is asynchronous So messages may come in late What do we do when messages arrive late? Mostly, reject them How do we tell? Messages contain group number Page 32
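One way to picture the group-number check is a receiver that compares each incoming message's group number with its own current one. The sketch below is an illustration under assumptions: the `Message` and `Member` classes and their fields are hypothetical, and it rejects every mismatched message, whereas the slide only says stragglers are "mostly" rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    group: int   # group number the sender believed was current
    kind: str    # e.g. "Ready" or "Accept" (illustrative labels)


@dataclass
class Member:
    group: int                               # group this node currently belongs to
    accepted: list = field(default_factory=list)

    def receive(self, msg: Message) -> bool:
        """Keep the message only if it carries the current group number."""
        if msg.group != self.group:
            return False                     # straggler from an older or other group
        self.accepted.append(msg)
        return True


node = Member(group=7)
fresh_ok = node.receive(Message(group=7, kind="Ready"))
late_ok = node.receive(Message(group=6, kind="Ready"))   # arrives after a merge
```

The late message is dropped because its group number no longer matches, so it cannot affect the state of the newer group.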

33 Multiple Simultaneous Groups The invitation algorithm allows multiple simultaneous groups to exist Each with a proper coordinator Is this a good thing? No, but what are the alternatives? No node ever belongs to more than one group, at least Page 33

34 Paxos A family of algorithms that allow a distributed system to reach agreement In the face of delays and failures Can't perfectly guarantee progress But makes progress in realistic conditions Does guarantee consistency Usually defined to reach consensus on some value v Page 34

35 Paxos Assumptions Processors are of variable speed and may fail Might recover after failure But they don't lie Any processor can send a message to any other processor Messages can be lost, arbitrarily delayed, reordered, or duplicated But never corrupted Page 35

36 Paxos Processor Roles Client: Issues a request, waits for a response Acceptor/voter: Remembers things for the protocol Proposer (simpler if there's only one): Assists client in getting a response Learner: Actually executes a request Leader: One of the proposers that leads the process One processor can play several roles Usually, all processes are acceptors, proposers, and learners Page 36

37 Paxos Quorums Collections of acceptors that make decisions Several different quorums in system Messages are sent to quorums, not single acceptors A message is only effective if all quorum members receive it Similarly, all acceptors in a quorum must send a message for it to be effective If any member of the quorum survives, its decisions survive Page 37

38 Quorum Membership All quorums must contain a majority of all acceptors in the system Any two quorums must share at least one acceptor E.g., if there are four acceptors {1,2,3,4}, quorums might be: {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4} Page 38
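The overlap property follows from the majority requirement: two sets that each hold more than half of the acceptors cannot be disjoint. A small sketch can enumerate the minimal majority quorums and confirm this for the four-acceptor example; the function names `majority_quorums` and `all_pairs_intersect` are chosen here for illustration.

```python
from itertools import combinations


def majority_quorums(acceptors):
    """All smallest subsets that contain a strict majority of the acceptors."""
    need = len(acceptors) // 2 + 1
    return [set(q) for q in combinations(sorted(acceptors), need)]


def all_pairs_intersect(quorums):
    """True if every two quorums share at least one acceptor."""
    return all(a & b for a, b in combinations(quorums, 2))


quorums = majority_quorums({1, 2, 3, 4})
overlap = all_pairs_intersect(quorums)

# Sets of only two acceptors out of four are not majorities and can miss each other.
too_small = all_pairs_intersect([{1, 2}, {3, 4}])
```

For four acceptors this produces the same four three-member quorums listed on the slide, and every pair shares at least two acceptors.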

39 Paxos Rounds Paxos proceeds in rounds In response to a client request If the round reaches agreement, the client gets a response If not, you start another round Continue till a round reaches agreement Page 39

40 A Simple Paxos Round [Figure: client C, proposer P, acceptors A1, A2, A3, learners L1, L2] 1. request (C to P) 2. prepare(n) (P to the acceptors) 3. promise(n, null) (each acceptor to P) 4. accept(n, v_res) 5. accepted(n, v_max) 6. response n is a bigger number than P has ever used or seen before If an acceptor ever promised on this item before, it returns the generation and value from that run of Paxos, not null v_res is a result chosen by P, if no promise had a value Page 40
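The prepare/promise and accept/accepted exchange can be sketched with in-memory objects. This is a heavily simplified illustration of a single round, not a full Paxos implementation: the class `Acceptor` and function `run_round` are hypothetical names, there is no network, no failures, and no separate learner, and the rule for reusing a previously accepted value follows the standard single-decree formulation rather than anything spelled out on the slide.

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1      # highest generation promised so far
        self.accepted = None      # (n, value) last accepted, or None

    def prepare(self, n):
        """Step 2/3: promise not to honor generations older than n."""
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted)
        return None               # already promised something newer

    def accept(self, n, value):
        """Step 4/5: accept unless a newer promise has been made."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return ("accepted", n, value)
        return None


def run_round(n, client_value, acceptors):
    """Proposer side of one round; returns the agreed value or None."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None               # no quorum of promises; try a later round
    prior = [p[2] for p in promises if p[2] is not None]
    # Reuse the value from the highest earlier generation, if any promise had one.
    value = max(prior, key=lambda nv: nv[0])[1] if prior else client_value
    acks = [r for r in (a.accept(n, value) for a in acceptors) if r is not None]
    return value if len(acks) >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = run_round(1, "x", acceptors)    # fresh acceptors: promises carry null
second = run_round(2, "y", acceptors)   # promises now carry the earlier value
stale = run_round(1, "z", acceptors)    # generation too old to get promises
```

The second round ends up with the value from the first one, which is how the protocol keeps a decision stable once it has been accepted by a quorum.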

41 The Point of Different Paxos Roles [Figure: client C, proposer P, acceptors A1, A2, A3, learners L1, L2] Remember! One machine can play multiple roles The client wants to get something done The learners ensure redundant memory of the result of a decision The proposer coordinates protocol activities The acceptors ensure proper concurrent behavior and handle proposer failures Page 41

42 Paxos Error Handling Some cases simple, some complex A simple case: One of the acceptors fails If there's still a quorum, no problem Go ahead without him Another simple case: One of the learners failed If any learners are left, they'll provide the right response to the client Page 42

43 More Complex Error Cases Things like failure of proposer in middle of a round Paxos chooses a new leader and uses him from this point What if old leader comes back? Even more complex, but it works out Page 43

44 Paxos and Overheads Generally quite expensive In messages and thus delays Many optimizations possible Some don't alter the protocol characteristics Some trade off handling some error conditions for better performance Page 44

45 Byzantine Agreement Life can be a lot worse than merely being unable to rely on timeouts What if one of the nodes we're working with is lying? How can we reach agreement if we can't trust all the participants? Page 45

46 The Purpose of Byzantine Agreement Well, why would one of our distributed system components lie? It probably wouldn't But it might contain a bug If it contains the worst possible bug, what can it do? Essentially, inadvertently lie Page 46

47 The Realism of Byzantine Agreement It isn't realistic It doesn't really happen No one really uses it But it demonstrates a limit on how badly things can go while still allowing agreement Page 47

48 Why Is It Called Byzantine? After the fall of Rome itself, the empire lived on in the east Called Byzantium Byzantium survived for around 1000 years The Byzantines were famous for their treachery and double-dealing Page 48

49 The Byzantine Generals Problem Several Byzantine generals each command their own army They are far apart and communicate with messengers The emperor wants to attack the Turks If all generals attack, they'll win Even if a majority attack, they'll win Retreating is OK, if everyone does it But the Turks may have bribed some generals Page 49

50 The Complete Problem Statement Messages are point-to-point Messages are reliably delivered, with a predictable timeout Failure to receive message in time means sender is a traitor Traitors can send any messages they please But cannot forge their identities Page 50

51 How Many Traitors Is Too Many? Can all the loyal generals reach agreement on whether to attack or retreat? Or can the traitors prevent them from reaching any agreement? How many generals must the Turks bribe before no agreement is possible? Page 51

52 The Answer If the Turks bribe 1/3 of the generals, the remaining 2/3 cannot reach agreement How can that be? Why not just a majority? Easiest to consider in the case of a commander Page 52

53 The 3-General Byzantine Problem [Figure: the commander sends orders to the other two generals; labels Attack, Attack, Retreat] What if they're all loyal? Everyone attacks, and the Turk is vanquished But what if the commander is a traitor? One general attacks, one retreats, the traitor pockets the bribe, and the Turks win Page 53

54 Can't the Loyal Generals Check Their Orders? [Figure: commander 1 sends Attack to 2 and Retreat to 3; 2 and 3 tell each other what they were ordered, Attack and Retreat] Generals 2 and 3 check their orders They figure out 1 is a traitor and come to their own agreement Page 54

55 But What if the Commander Wasn't the Traitor? [Figure: commander 1 sends Attack to both 2 and 3; 3 tells 2 Retreat; 2 tells 3 Attack] Generals 2 and 3 check their orders They figure out 1 is a traitor and come to their own agreement But 1 isn't the traitor, 3 is the traitor this time He convinces 2 to retreat, 1 is slaughtered attacking, and 3 pockets the bribe Page 55

56 Can General 2 Tell Which Scenario Is Occurring? When 1 was the traitor, 2 saw: Attack from 1, Retreat from 3 When 3 was the traitor, 2 saw: Attack from 1, Retreat from 3 2 can't tell the difference, so he can't decide whether to attack or retreat Page 56
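The indistinguishability argument can be checked mechanically by computing what general 2 observes in each scenario. This is a small sketch for illustration; the function name `general_2_view` and the way a traitorous general 3 always relays "retreat" are assumptions chosen to mirror the two scenarios on the slides.

```python
def general_2_view(orders_from_commander, general_3_is_traitor):
    """Return (order 2 got from the commander, order 3 claims it got)."""
    told_to_2 = orders_from_commander[2]
    told_to_3 = orders_from_commander[3]
    # A loyal general 3 relays its real order; a traitor claims "retreat".
    relayed_by_3 = "retreat" if general_3_is_traitor else told_to_3
    return (told_to_2, relayed_by_3)


# Scenario 1: the commander is the traitor and sends conflicting orders.
commander_traitor = general_2_view({2: "attack", 3: "retreat"},
                                   general_3_is_traitor=False)

# Scenario 2: the commander is loyal, general 3 lies about its order.
general_3_traitor = general_2_view({2: "attack", 3: "attack"},
                                   general_3_is_traitor=True)

indistinguishable = commander_traitor == general_3_traitor
```

Because the two views are identical, no rule that general 2 applies to its own observations can be right in both scenarios.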

57 What If There Were 4 Generals? [Figure: commander 1 sends Attack, Attack, and Retreat to the other three generals] What if the commander (1) is the traitor? If he doesn't send some messages, he'll be seen as the traitor But what can he send? Page 57

58 Can the Three Loyal Generals Reach Agreement? [Figure: commander 1 sends Attack, Attack, and Retreat to the other three generals] They can exchange all the messages and let the majority rule Since there are only two possible messages, the commander must have sent the same message to at least two nodes If the commander is loyal and someone else is lying, the majority represents the loyal commander's will Page 58
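The exchange-and-vote step for four generals can be sketched as follows. This is an illustration under simplifying assumptions: the function name `exchange_and_vote` is hypothetical, a lying lieutenant tells everyone the same lie (a real traitor could tell different generals different things), and message timing is ignored.

```python
from collections import Counter


def exchange_and_vote(orders, liars=frozenset(), lie="retreat"):
    """Each loyal lieutenant takes the majority of all orders it has heard.

    orders: dict lieutenant -> order received from the commander
    liars:  lieutenants who relay `lie` instead of their real order
    Returns dict loyal lieutenant -> decision.
    """
    decisions = {}
    for me in orders:
        if me in liars:
            continue                      # we only care what loyal generals decide
        heard = [orders[me]]              # my own order from the commander
        for other in orders:
            if other != me:
                heard.append(lie if other in liars else orders[other])
        decisions[me] = Counter(heard).most_common(1)[0][0]
    return decisions


# Traitorous commander: sends Attack, Attack, Retreat; all lieutenants are loyal.
traitor_commander = exchange_and_vote({2: "attack", 3: "attack", 4: "retreat"})

# Loyal commander orders Attack; lieutenant 4 lies when relaying.
traitor_lieutenant = exchange_and_vote({2: "attack", 3: "attack", 4: "attack"},
                                       liars={4})
```

In both cases the loyal lieutenants end up with the same decision, because each of them sees at least two copies of the majority order among the three values it holds.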

59 But What if There Were Five Generals? [Figure: commander 1 sends Attack, Attack, Retreat, and Retreat to the other four generals] Pre-arrange a tie-breaker E.g., always retreat on ties All the loyal generals then retreat And the traitor must explain his failure to the Turks Page 59
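A pre-arranged tie-breaker is just a default branch in the majority rule. A minimal sketch, with the function name `decide` chosen for illustration:

```python
from collections import Counter


def decide(orders, tie_breaker="retreat"):
    """Majority of the exchanged orders, falling back to the agreed tie-breaker."""
    counts = Counter(orders)
    if counts["attack"] > counts["retreat"]:
        return "attack"
    if counts["retreat"] > counts["attack"]:
        return "retreat"
    return tie_breaker


# A traitorous commander splits four loyal generals two against two.
split_decision = decide(["attack", "attack", "retreat", "retreat"])
```

Every loyal general applies the same rule to the same four values, so they all retreat together.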

60 What If You Don't Want a Commander? What if you want everyone to vote? And accept the majority? With the guarantee that all loyal nodes abide by the majority? Serially treat each node as the commander Reach agreement on his vote Then move on to the next node Page 60

61 The Trick Behind Byzantine Agreement Everyone must know what everyone else thinks about everything else Not just what I think the commander said, but what everyone else claims the commander said Resulting algorithms are tricky and expensive But it could be (and will be) worse Page 61

62 Authenticated Byzantine Agreement What if the messages are signed in an unforgeable way? Then dishonest generals can't lie about what honest general told them In this case, honest generals reach agreement regardless of how many are dishonest Page 62
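The effect of signing can be illustrated with a message authentication code from Python's standard library. This is only a stand-in: HMAC uses a shared secret, so it is not a true unforgeable signature (anyone who can verify could also sign), and the key, names, and message layout below are all hypothetical. The point of the sketch is just that a relaying general who changes the order cannot produce a matching tag.

```python
import hashlib
import hmac


def sign(key: bytes, order: str) -> str:
    """Tag an order with an HMAC-SHA256 under the sender's key."""
    return hmac.new(key, order.encode(), hashlib.sha256).hexdigest()


def verify(key: bytes, order: str, tag: str) -> bool:
    """Check that the tag matches the order (constant-time comparison)."""
    return hmac.compare_digest(sign(key, order), tag)


COMMANDER_KEY = b"hypothetical-commander-key"   # assumed; unknown to the relay

# The commander signs the order it sends out.
signed_order = ("attack", sign(COMMANDER_KEY, "attack"))

# An honest general relays the order and tag unchanged.
honest_relay = signed_order
# A dishonest general changes the order but can only reuse the old tag.
forged_relay = ("retreat", signed_order[1])

honest_ok = verify(COMMANDER_KEY, *honest_relay)
forged_ok = verify(COMMANDER_KEY, *forged_relay)
```

The altered relay fails verification, so the receiver can discard it instead of having to weigh it against other reports.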
