Distributed Systems Consensus

Size: px
Start display at page:

Download "Distributed Systems Consensus"

Transcription

1 Distributed Systems Consensus Amir H. Payberah Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 1 / 56

2 What is the Problem? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 2 / 56

3 Two Generals Problem Two generals need to be agree on time to attack to win. They communicate through messengers, who may be killed on their way. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 3 / 56

4 Two Generals Problem Two generals need to be agree on time to attack to win. They communicate through messengers, who may be killed on their way. Agreement is the problem. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 3 / 56

5 Replicated State Machine Problem (1/2) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 4 / 56

6 Replicated State Machine Problem (1/2) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 4 / 56

7 Replicated State Machine Problem (1/2) The solution: replicate the server. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 4 / 56

8 Replicated State Machine Problem (2/2) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 5 / 56

9 Replicated State Machine Problem (2/2) Make the server deterministic (state machine). Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 5 / 56

10 Replicated State Machine Problem (2/2) Make the server deterministic (state machine). Replicate the server. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 5 / 56

11 Replicated State Machine Problem (2/2) Make the server deterministic (state machine). Replicate the server. Ensure correct replicas step through the same sequence of state transitions (How?) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 5 / 56

12 Replicated State Machine Problem (2/2) Make the server deterministic (state machine). Replicate the server. Ensure correct replicas step through the same sequence of state transitions (How?) Agreement is the problem. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 5 / 56

13 The Agreement Problem Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 6 / 56

14 The Agreement Problem Some nodes propose values (or actions) by sending them to the others. All nodes must decide whether to accept or reject those values. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 7 / 56

15 The Agreement Problem Some nodes propose values (or actions) by sending them to the others. All nodes must decide whether to accept or reject those values. But,... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 7 / 56

16 The Agreement Problem Some nodes propose values (or actions) by sending them to the others. All nodes must decide whether to accept or reject those values. But,... Concurrent processes and uncertainty of timing, order of events and inputs. Failure and recovery of machines/processors, of communication channels. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 7 / 56

17 Agreement Requirements Safety Liveness Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 8 / 56

18 Agreement Requirements Safety Validity: only a value that has been proposed may be chosen. Liveness Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 8 / 56

19 Agreement Requirements Safety Validity: only a value that has been proposed may be chosen. Agreement: no two correct nodes choose different values. Liveness Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 8 / 56

20 Agreement Requirements Safety Validity: only a value that has been proposed may be chosen. Agreement: no two correct nodes choose different values. Integrity: a node chooses at most once. Liveness Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 8 / 56

21 Agreement Requirements Safety Validity: only a value that has been proposed may be chosen. Agreement: no two correct nodes choose different values. Integrity: a node chooses at most once. Liveness Termination: every correct node eventually choose a value. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 8 / 56

22 Agreement in Distributed Systems: Possible Solutions Two-Phase Commit (2PC) Paxos Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 9 / 56

23 Two-Phase Commit (2PC) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 10 / 56

24 The Two-Phase Commit (2PC) Problem The problem first was encountered in database systems. Suppose a database system is updating some complicated data structures that include parts residing on more than one machine. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 11 / 56

25 The Two-Phase Commit (2PC) Problem The problem first was encountered in database systems. Suppose a database system is updating some complicated data structures that include parts residing on more than one machine. System model: Concurrent processes and uncertainty of timing, order of events and inputs (asynchronous systems). Failure and recovery of machines/processors, of communication channels. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 11 / 56

26 Intuitive Example (1/3) You want to organize outing with 3 friends at 6pm Tuesday. Go out only if all friends can make it. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 12 / 56

27 Intuitive Example (2/3) What do you do? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 13 / 56

28 Intuitive Example (2/3) What do you do? Call each of them and ask if can do 6pm on Tuesday (voting phase) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 13 / 56

29 Intuitive Example (2/3) What do you do? Call each of them and ask if can do 6pm on Tuesday (voting phase) If all can do Tuesday, call each friend back to ACK (commit) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 13 / 56

30 Intuitive Example (2/3) What do you do? Call each of them and ask if can do 6pm on Tuesday (voting phase) If all can do Tuesday, call each friend back to ACK (commit) If one cannot do Tuesday, call other three to cancel (abort) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 13 / 56

31 Intuitive Example (3/3) Critical details Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 14 / 56

32 Intuitive Example (3/3) Critical details While you were calling everyone to ask, people who have promised they can do 6pm Tuesday must reserve that slot. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 14 / 56

33 Intuitive Example (3/3) Critical details While you were calling everyone to ask, people who have promised they can do 6pm Tuesday must reserve that slot. You need to remember the decision and tell anyone whom you have not been able to reach during commit/abort phase. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 14 / 56

34 Intuitive Example (3/3) Critical details While you were calling everyone to ask, people who have promised they can do 6pm Tuesday must reserve that slot. You need to remember the decision and tell anyone whom you have not been able to reach during commit/abort phase. That is exactly how 2PC works. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 14 / 56

35 The 2PC Players Coordinator (Transaction Manager) Begins transaction. Responsible for commit/abort. Participants (Resource Managers) The servers with the data used in the distributed transaction. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 15 / 56

36 The 2PC Algorithm Phase 1 - prepare phase Phase 2 - commit phase Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 16 / 56

37 The 2PC Algorithm - Prepare Phase Coordinator asks each participant cancommit. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 17 / 56

38 The 2PC Algorithm - Prepare Phase Coordinator asks each participant cancommit. Participants must prepare to commit using permanent storage before answering yes. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 17 / 56

39 The 2PC Algorithm - Prepare Phase Coordinator asks each participant cancommit. Participants must prepare to commit using permanent storage before answering yes. Lock the objects. Participants are not allowed to cause an abort after it replies yes to cancommit. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 17 / 56

40 The 2PC Algorithm - Prepare Phase Coordinator asks each participant cancommit. Participants must prepare to commit using permanent storage before answering yes. Lock the objects. Participants are not allowed to cause an abort after it replies yes to cancommit. Outcome of the transaction is uncertain until docommit or doabort. Other participants might still cause an abort. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 17 / 56

41 The 2PC Algorithm - Commit Phase The coordinator collects all votes. If unanimous yes, causes commit. If any participant voted no, causes abort. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 18 / 56

42 The 2PC Algorithm - Commit Phase The coordinator collects all votes. If unanimous yes, causes commit. If any participant voted no, causes abort. The fate of the transaction is decided atomically at the coordinator, once all participants vote. Coordinator records fate using permanent storage. Then broadcasts docommit or doabort to participants. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 18 / 56

43 2PC Sequence of Events Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 19 / 56

44 Recovery in 2PC Recovery after timeouts. Recovery after crashes and reboot. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 20 / 56

45 Recovery in 2PC Recovery after timeouts. Recovery after crashes and reboot. Note: you cannot differentiate between the above in a realistic asynchronous network. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 20 / 56

46 Handling Timeout To avoid processes blocking for ever. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 21 / 56

47 Handling Timeout To avoid processes blocking for ever. Two scenarios: Coordinator waits for votes from participants. Participant is waiting for the final decision. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 21 / 56

48 Handling Timeout at Coordinator If B voted no, can coordinator unilaterally abort? If B voted yes, can coordinator unilaterally abort/commit? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 22 / 56

49 Handling Timeout at Coordinator If B voted no, can coordinator unilaterally abort? If B voted yes, can coordinator unilaterally abort/commit? Coordinator waits for votes from participants. Participant is waiting for the final decision. Coordinator timeout abort and send doabort to participants. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 22 / 56

50 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

51 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Simple protocol: participant is remained blocked until it can establish communication with coordinator. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

52 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Simple protocol: participant is remained blocked until it can establish communication with coordinator. Cooperative protocol: participant sends a decision-request message to other participants. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

53 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Simple protocol: participant is remained blocked until it can establish communication with coordinator. Cooperative protocol: participant sends a decision-request message to other participants. B sends status message to A Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

54 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Simple protocol: participant is remained blocked until it can establish communication with coordinator. Cooperative protocol: participant sends a decision-request message to other participants. B sends status message to A If A has received commit/abort from TM,... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

55 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Simple protocol: participant is remained blocked until it can establish communication with coordinator. Cooperative protocol: participant sends a decision-request message to other participants. B sends status message to A If A has received commit/abort from TM,... If A has not responded to TM,... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

56 Handling Timeout at Participants If B times out on TM and has voted yes, then execute termination protocol. Simple protocol: participant is remained blocked until it can establish communication with coordinator. Cooperative protocol: participant sends a decision-request message to other participants. B sends status message to A If A has received commit/abort from TM,... If A has not responded to TM,... If A has responded with no/yes... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 23 / 56

57 Handling Crash and Recovery (1/2) All nodes must log protocol progress. Participants: prepared, uncertain, committed/aborted Coordinator: prepared, committed/aborted, done Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 24 / 56

58 Handling Crash and Recovery (1/2) All nodes must log protocol progress. Participants: prepared, uncertain, committed/aborted Coordinator: prepared, committed/aborted, done Nodes cannot back out if commit is decided. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 24 / 56

59 Handling Crash and Recovery (2/2) Coordinator crashes: If it finds no commit on disk, it aborts. If it finds commit, it commits. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 25 / 56

60 Handling Crash and Recovery (2/2) Coordinator crashes: If it finds no commit on disk, it aborts. If it finds commit, it commits. Participant crashes: If it finds no yes on disk, it aborts. If it finds yes, runs termination protocol to decide. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 25 / 56

61 Fault-Tolerance Limitations of 2PC Even with recovery enabled, 2PC is not really fault-tolerant (or live), because it can be blocked even when one (or a few) machines fail. Blocking means that it does not make progress during the failures. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 26 / 56

62 Fault-Tolerance Limitations of 2PC Even with recovery enabled, 2PC is not really fault-tolerant (or live), because it can be blocked even when one (or a few) machines fail. Blocking means that it does not make progress during the failures. Any scenarios? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 26 / 56

63 2PC Blocking Scenario TM sends docommit to A, A gets it and commits, and then both TM and A die. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 27 / 56

64 2PC Blocking Scenario TM sends docommit to A, A gets it and commits, and then both TM and A die. B, C, D have already also replied yes, have locked their mutexes, and now need to wait for TM or A to reappear. They cannot recover the decision with certainty until TM or A are online. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 27 / 56

65 2PC Blocking Scenario TM sends docommit to A, A gets it and commits, and then both TM and A die. B, C, D have already also replied yes, have locked their mutexes, and now need to wait for TM or A to reappear. They cannot recover the decision with certainty until TM or A are online. This is why 2PC is called a blocking protocol: 2PC is safe, but not live. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 27 / 56

66 Impossibility of Distributed Consensus with One Faulty Process Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 28 / 56

67 FLP Fischer-Lynch-Paterson (FLP) M.J. Fischer, N.A. Lynch, and M.S. Paterson, Impossibility of distributed consensus with one faulty process, Journal of the ACM, Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 29 / 56

68 FLP Impossibility Result It is impossible for a set of processors in an asynchronous system to agree on a binary value, even if only a single process is subject to an unannounced failure. The core of the problem is asynchrony. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 30 / 56

69 FLP Impossibility Result What FLP says: you cannot guarantee both safety and progress when there is even a single fault at an inopportune moment. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 31 / 56

70 FLP Impossibility Result What FLP says: you cannot guarantee both safety and progress when there is even a single fault at an inopportune moment. What FLP does not say: in practice, how close can you get to the ideal (always safe and live)? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 31 / 56

71 FLP Impossibility Result What FLP says: you cannot guarantee both safety and progress when there is even a single fault at an inopportune moment. What FLP does not say: in practice, how close can you get to the ideal (always safe and live)? So, Paxos... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 31 / 56

72 Paxos Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 32 / 56

73 Paxos The only known completely-safe and largely-live agreement protocol. L. Lamport, The part-time parliament, ACM Transactions on Computer Systems, Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 33 / 56

74 The Paxos Players Proposers Suggests values for consideration by acceptors. Acceptors Considers the values proposed by proposers. Renders an accept/reject decision. Learners Learns the chosen value. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 34 / 56

75 The Paxos Players Proposers Suggests values for consideration by acceptors. Acceptors Considers the values proposed by proposers. Renders an accept/reject decision. Learners Learns the chosen value. A node can act as more than one roles (usually 3). Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 34 / 56

76 Single Proposal, Single Acceptor Use just one acceptor Collects proposers proposals. Decides the value and tells everyone else. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 35 / 56

77 Single Proposal, Single Acceptor Use just one acceptor Collects proposers proposals. Decides the value and tells everyone else. Sounds familiar? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 35 / 56

78 Single Proposal, Single Acceptor Use just one acceptor Collects proposers proposals. Decides the value and tells everyone else. Sounds familiar? two-phase commit (2PC) acceptor fails = protocol blocks Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 35 / 56

79 Single Proposal, Multiple Acceptors One acceptor is not fault-tolerant enough. Let s have multiple acceptors. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 36 / 56

80 Single Proposal, Multiple Acceptors One acceptor is not fault-tolerant enough. Let s have multiple acceptors. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 36 / 56

81 Single Proposal, Multiple Acceptors One acceptor is not fault-tolerant enough. Let s have multiple acceptors. From there, must reach a decision. How? Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 36 / 56

82 Single Proposal, Multiple Acceptors One acceptor is not fault-tolerant enough. Let s have multiple acceptors. From there, must reach a decision. How? Decision = value accepted by the majority. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 36 / 56

83 Single Proposal, Multiple Acceptors One acceptor is not fault-tolerant enough. Let s have multiple acceptors. From there, must reach a decision. How? Decision = value accepted by the majority. P1: an acceptor must accept first proposal it receives. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 36 / 56

84 Multiple Proposals, Multiple Acceptors If there are multiple proposals, no proposal may get the majority. 3 proposals may each get 1/3 of the acceptors. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 37 / 56

85 Multiple Proposals, Multiple Acceptors If there are multiple proposals, no proposal may get the majority. 3 proposals may each get 1/3 of the acceptors. Solution: acceptors can accept multiple proposals, distinguished by a unique proposal number. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 37 / 56

86 Handling Multiple Proposals All chosen proposals must have the same value. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 38 / 56

87 Handling Multiple Proposals All chosen proposals must have the same value. P2: If a proposal with value v is chosen, then every higher-numbered proposal that is chosen also has value v. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 38 / 56

88 Handling Multiple Proposals All chosen proposals must have the same value. P2: If a proposal with value v is chosen, then every higher-numbered proposal that is chosen also has value v. P2a:... accepted... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 38 / 56

89 Handling Multiple Proposals All chosen proposals must have the same value. P2: If a proposal with value v is chosen, then every higher-numbered proposal that is chosen also has value v. P2a:... accepted... P2b:... proposed... Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 38 / 56

90 The Paxos Algorithm Phase 1a - prepare phase Phase 1b - promise phase Phase 2a - accept phase Phase 2b - accepted phase Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 39 / 56

91 Paxos Algorithm Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 40 / 56

92 Paxos Algorithm - Prepare Phase A proposer selects a proposal number n and sends a prepare request with number n to majority of acceptors. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 41 / 56

93 Paxos Algorithm - Promise Phase If an acceptor receives a prepare request with number n greater than that of any prepare request it saw Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 42 / 56

94 Paxos Algorithm - Promise Phase If an acceptor receives a prepare request with number n greater than that of any prepare request it saw It responses yes to that request with a promise not to accept any more proposals numbered less than n. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 42 / 56

95 Paxos Algorithm - Promise Phase If an acceptor receives a prepare request with number n greater than that of any prepare request it saw It responses yes to that request with a promise not to accept any more proposals numbered less than n. It includes the highest-numbered proposal (if any) that it has accepted. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 42 / 56

96 Paxos Algorithm - Accept Phase If the proposer receives a response yes to its prepare requests from a majority of acceptors Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 43 / 56

97 Paxos Algorithm - Accept Phase If the proposer receives a response yes to its prepare requests from a majority of acceptors It sends an accept request to each of those acceptors for a proposal numbered n with a values v, which is the value of the highest-numbered proposal among the responses. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 43 / 56

98 Paxos Algorithm - Accepted Phase If an acceptor receives an accept request for a proposal numbered n Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 44 / 56

99 Paxos Algorithm - Accepted Phase If an acceptor receives an accept request for a proposal numbered n It accepts the proposal unless it has already responded to a prepare request having a number greater than n. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 44 / 56

100 Definition of Chosen A value is chosen at proposal number n, iff majority of acceptors accept that value in phase 2 of the proposal number. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 45 / 56

101 Paxos Example Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 46 / 56

102 Paxos Example Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 47 / 56

103 Paxos - Safety (1/3) If a value v is chosen at proposal number n, any value that is sent out in phase 2 of any later proposal numbers must be also v. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 48 / 56

104 Paxos - Safety (2/3) Decision = Majority (any two majorities share at least one element) Therefore after the first round in which there is a decision, any subsequent round involves at least one acceptor that has accepted v. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 49 / 56

105 Paxos - Safety (3/3) Now suppose our claim is not true, and let m is the first proposal number that is later than n and in 2nd phase, the value sent out is w v. This is not possible, because if the proposer P was able to start 2nd phase for w, it means it got a majority to accept round for m (for m > n). So, either: v would not have been the value decided, or v would have been proposed by P Therefore, once a majority accepts v, that never changes. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 50 / 56

106 Liveness If two or more proposers race to propose new values, they might step on each other toes all the time. P1 : prepare(n 1 ) P2 : prepare(n 2 ) P 1 : accept(n 1, v 1 ) P 2 : accept(n 2, v 2 ) P1 : prepare(n 3 ) P2 : prepare(n 4 )... n 1 < n 2 < n 3 < n 4 < With randomness, this occurs exceedingly rarely. Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 51 / 56

107 Paxos is Everywhere Google: Chubby (Paxos-based distributed lock service) Most Google services use Chubby directly or indirectly Yahoo: Zookeeper (Paxos-based distributed lock service) Zookeeper is open-source and integrates with Hadoop UW: Scatter (Paxos-based consistent DHT) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 52 / 56

108 Summary Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 53 / 56

109 Summary Replicated State Machine 2PC: blocking FLP impossibility Paxos Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 54 / 56

110 References: M.J. Fischer, N.A. Lynch, and M.S. Paterson, Impossibility of distributed consensus with one faulty process, Journal of the ACM, L. Lamport, The part-time parliament, ACM Transactions on Computer Systems, L. Lamport, Paxos made simple, ACM Sigact News, J. Gray, and L. Lamport, Consensus on transaction commit, ACM Transactions on Database Systems, Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 55 / 56

111 Questions? Acknowledgements Some slides were derived from Jinyang Li slides (New York University). Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 56 / 56

Beyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen

Beyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen Beyond FLP Acknowledgement for presentation material Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen Paper trail blog: http://the-paper-trail.org/blog/consensus-protocols-paxos/

More information

Distributed Commit in Asynchronous Systems

Distributed Commit in Asynchronous Systems Distributed Commit in Asynchronous Systems Minsoo Ryu Department of Computer Science and Engineering 2 Distributed Commit Problem - Either everybody commits a transaction, or nobody - This means consensus!

More information

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems Transactions CS 475, Spring 2018 Concurrent & Distributed Systems Review: Transactions boolean transfermoney(person from, Person to, float amount){ if(from.balance >= amount) { from.balance = from.balance

More information

Distributed Systems 11. Consensus. Paul Krzyzanowski

Distributed Systems 11. Consensus. Paul Krzyzanowski Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one

More information

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 10. Consensus: Paxos Paul Krzyzanowski Rutgers University Fall 2017 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value

More information

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks. Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}

More information

Consensus, impossibility results and Paxos. Ken Birman

Consensus, impossibility results and Paxos. Ken Birman Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}

More information

Consensus in Distributed Systems. Jeff Chase Duke University

Consensus in Distributed Systems. Jeff Chase Duke University Consensus in Distributed Systems Jeff Chase Duke University Consensus P 1 P 1 v 1 d 1 Unreliable multicast P 2 P 3 Consensus algorithm P 2 P 3 v 2 Step 1 Propose. v 3 d 2 Step 2 Decide. d 3 Generalizes

More information

Agreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering

Agreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering Agreement and Consensus SWE 622, Spring 2017 Distributed Software Engineering Today General agreement problems Fault tolerance limitations of 2PC 3PC Paxos + ZooKeeper 2 Midterm Recap 200 GMU SWE 622 Midterm

More information

Recovering from a Crash. Three-Phase Commit

Recovering from a Crash. Three-Phase Commit Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator

More information

Paxos and Replication. Dan Ports, CSEP 552

Paxos and Replication. Dan Ports, CSEP 552 Paxos and Replication Dan Ports, CSEP 552 Today: achieving consensus with Paxos and how to use this to build a replicated system Last week Scaling a web service using front-end caching but what about the

More information

Paxos and Raft (Lecture 21, cs262a) Ion Stoica, UC Berkeley November 7, 2016

Paxos and Raft (Lecture 21, cs262a) Ion Stoica, UC Berkeley November 7, 2016 Paxos and Raft (Lecture 21, cs262a) Ion Stoica, UC Berkeley November 7, 2016 Bezos mandate for service-oriented-architecture (~2002) 1. All teams will henceforth expose their data and functionality through

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Paxos. Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A Leonardo Querzoni. giovedì 19 aprile 12

Paxos. Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A Leonardo Querzoni. giovedì 19 aprile 12 Sistemi Distribuiti Laurea magistrale in ingegneria informatica A.A. 2011-2012 Leonardo Querzoni The Paxos family of algorithms was introduced in 1999 to provide a viable solution to consensus in asynchronous

More information

Distributed System. Gang Wu. Spring,2018

Distributed System. Gang Wu. Spring,2018 Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the

More information

CS October 2017

CS October 2017 Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.

More information

Assignment 12: Commit Protocols and Replication Solution

Assignment 12: Commit Protocols and Replication Solution Data Modelling and Databases Exercise dates: May 24 / May 25, 2018 Ce Zhang, Gustavo Alonso Last update: June 04, 2018 Spring Semester 2018 Head TA: Ingo Müller Assignment 12: Commit Protocols and Replication

More information

Recap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1

Recap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1 Recap Distributed Systems Steve Ko Computer Sciences and Engineering University at Buffalo Facebook photo storage CDN (hot), Haystack (warm), & f4 (very warm) Haystack RAID-6, per stripe: 10 data disks,

More information

Consensus and related problems

Consensus and related problems Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Fault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit

Fault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit Fault Tolerance o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication o Distributed Commit -1 Distributed Commit o A more general problem of atomic

More information

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita Proseminar Distributed Systems Summer Semester 2016 Paxos algorithm stefan.resmerita@cs.uni-salzburg.at The Paxos algorithm Family of protocols for reaching consensus among distributed agents Agents may

More information

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need

More information

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases We talked about transactions and how to implement them in a single-node database. We ll now start looking into how to

More information

Practice: Large Systems Part 2, Chapter 2

Practice: Large Systems Part 2, Chapter 2 Practice: Large Systems Part 2, Chapter 2 Overvie Introduction Strong Consistency Crash Failures: Primary Copy, Commit Protocols Crash-Recovery Failures: Paxos, Chubby Byzantine Failures: PBFT, Zyzzyva

More information

Consensus on Transaction Commit

Consensus on Transaction Commit Consensus on Transaction Commit Jim Gray and Leslie Lamport Microsoft Research 1 January 2004 revised 19 April 2004, 8 September 2005, 5 July 2017 MSR-TR-2003-96 This paper appeared in ACM Transactions

More information

CS505: Distributed Systems

CS505: Distributed Systems Department of Computer Science CS505: Distributed Systems Lecture 13: Distributed Transactions Outline Distributed Transactions Two Phase Commit and Three Phase Commit Non-blocking Atomic Commit with P

More information

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitioning (Different data at different sites More space Better

More information

To do. Consensus and related problems. q Failure. q Raft

To do. Consensus and related problems. q Failure. q Raft Consensus and related problems To do q Failure q Consensus and related problems q Raft Consensus We have seen protocols tailored for individual types of consensus/agreements Which process can enter the

More information

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5. Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message

More information

Exercise 12: Commit Protocols and Replication

Exercise 12: Commit Protocols and Replication Data Modelling and Databases (DMDB) ETH Zurich Spring Semester 2017 Systems Group Lecturer(s): Gustavo Alonso, Ce Zhang Date: May 22, 2017 Assistant(s): Claude Barthels, Eleftherios Sidirourgos, Eliza

More information

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ) Data Consistency and Blockchain Bei Chun Zhou (BlockChainZ) beichunz@cn.ibm.com 1 Data Consistency Point-in-time consistency Transaction consistency Application consistency 2 Strong Consistency ACID Atomicity.

More information

Recall our 2PC commit problem. Recall our 2PC commit problem. Doing failover correctly isn t easy. Consensus I. FLP Impossibility, Paxos

Recall our 2PC commit problem. Recall our 2PC commit problem. Doing failover correctly isn t easy. Consensus I. FLP Impossibility, Paxos Consensus I Recall our 2PC commit problem FLP Impossibility, Paxos Client C 1 C à TC: go! COS 418: Distributed Systems Lecture 7 Michael Freedman Bank A B 2 TC à A, B: prepare! 3 A, B à P: yes or no 4

More information

Intuitive distributed algorithms. with F#

Intuitive distributed algorithms. with F# Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype

More information

Lecture 17 : Distributed Transactions 11/8/2017

Lecture 17 : Distributed Transactions 11/8/2017 Lecture 17 : Distributed Transactions 11/8/2017 Today: Two-phase commit. Last time: Parallel query processing Recap: Main ways to get parallelism: Across queries: - run multiple queries simultaneously

More information

Consensus Problem. Pradipta De

Consensus Problem. Pradipta De Consensus Problem Slides are based on the book chapter from Distributed Computing: Principles, Paradigms and Algorithms (Chapter 14) by Kshemkalyani and Singhal Pradipta De pradipta.de@sunykorea.ac.kr

More information

The objective. Atomic Commit. The setup. Model. Preserve data consistency for distributed transactions in the presence of failures

The objective. Atomic Commit. The setup. Model. Preserve data consistency for distributed transactions in the presence of failures The objective Atomic Commit Preserve data consistency for distributed transactions in the presence of failures Model The setup For each distributed transaction T: one coordinator a set of participants

More information

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus AGREEMENT PROTOCOLS Paxos -a family of protocols for solving consensus OUTLINE History of the Paxos algorithm Paxos Algorithm Family Implementation in existing systems References HISTORY OF THE PAXOS ALGORITHM

More information

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to

More information

Replications and Consensus

Replications and Consensus CPSC 426/526 Replications and Consensus Ennan Zhai Computer Science Department Yale University Recall: Lec-8 and 9 In the lec-8 and 9, we learned: - Cloud storage and data processing - File system: Google

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

CS 425 / ECE 428 Distributed Systems Fall 2017

CS 425 / ECE 428 Distributed Systems Fall 2017 CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy) Nov 7, 2017 Lecture 21: Replication Control All slides IG Server-side Focus Concurrency Control = how to coordinate multiple concurrent

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

The challenges of non-stable predicates. The challenges of non-stable predicates. The challenges of non-stable predicates

The challenges of non-stable predicates. The challenges of non-stable predicates. The challenges of non-stable predicates The challenges of non-stable predicates Consider a non-stable predicate Φ encoding, say, a safety property. We want to determine whether Φ holds for our program. The challenges of non-stable predicates

More information

Assignment 12: Commit Protocols and Replication

Assignment 12: Commit Protocols and Replication Data Modelling and Databases Exercise dates: May 24 / May 25, 2018 Ce Zhang, Gustavo Alonso Last update: June 04, 2018 Spring Semester 2018 Head TA: Ingo Müller Assignment 12: Commit Protocols and Replication

More information

Distributed Data with ACID Transactions

Distributed Data with ACID Transactions Distributed Data with ACID Transactions 3-tier application with data distributed across multiple DBMSs Not replicating the data (yet) DBMS1... DBMS2 Application Server Clients Why Do This? Legacy systems

More information

Distributed Consensus Protocols

Distributed Consensus Protocols Distributed Consensus Protocols ABSTRACT In this paper, I compare Paxos, the most popular and influential of distributed consensus protocols, and Raft, a fairly new protocol that is considered to be a

More information

WA 2. Paxos and distributed programming Martin Klíma

WA 2. Paxos and distributed programming Martin Klíma Paxos and distributed programming Martin Klíma Spefics of distributed programming Communication by message passing Considerable time to pass the network Non-reliable network Time is not synchronized on

More information

Introduction to Distributed Systems Seif Haridi

Introduction to Distributed Systems Seif Haridi Introduction to Distributed Systems Seif Haridi haridi@kth.se What is a distributed system? A set of nodes, connected by a network, which appear to its users as a single coherent system p1 p2. pn send

More information

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015 Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015 Page 1 Introduction We frequently want to get a set of nodes in a distributed system to agree Commitment protocols and mutual

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

Distributed Algorithms Benoît Garbinato

Distributed Algorithms Benoît Garbinato Distributed Algorithms Benoît Garbinato 1 Distributed systems networks distributed As long as there were no machines, programming was no problem networks distributed at all; when we had a few weak computers,

More information

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components Fault Tolerance To avoid disruption due to failure and to improve availability, systems are designed to be fault-tolerant Two broad categories of fault-tolerant systems are: systems that mask failure it

More information

Asynchronous Reconfiguration for Paxos State Machines

Asynchronous Reconfiguration for Paxos State Machines Asynchronous Reconfiguration for Paxos State Machines Leander Jehl and Hein Meling Department of Electrical Engineering and Computer Science University of Stavanger, Norway Abstract. This paper addresses

More information

Coordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing

Coordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing Coordinating distributed systems part II Marko Vukolić Distributed Systems and Cloud Computing Last Time Coordinating distributed systems part I Zookeeper At the heart of Zookeeper is the ZAB atomic broadcast

More information

7 Fault Tolerant Distributed Transactions Commit protocols

7 Fault Tolerant Distributed Transactions Commit protocols 7 Fault Tolerant Distributed Transactions Commit protocols 7.1 Subtransactions and distribution 7.2 Fault tolerance and commit processing 7.3 Requirements 7.4 One phase commit 7.5 Two phase commit x based

More information

Paxos and Distributed Transactions

Paxos and Distributed Transactions Paxos and Distributed Transactions INF 5040 autumn 2016 lecturer: Roman Vitenberg Paxos what is it? The most commonly used consensus algorithm A fundamental building block for data centers Distributed

More information

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed

More information

Designing for Understandability: the Raft Consensus Algorithm. Diego Ongaro John Ousterhout Stanford University

Designing for Understandability: the Raft Consensus Algorithm. Diego Ongaro John Ousterhout Stanford University Designing for Understandability: the Raft Consensus Algorithm Diego Ongaro John Ousterhout Stanford University Algorithms Should Be Designed For... Correctness? Efficiency? Conciseness? Understandability!

More information

EECS 498 Introduction to Distributed Systems

EECS 498 Introduction to Distributed Systems EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Replicated State Machines Logical clocks Primary/ Backup Paxos? 0 1 (N-1)/2 No. of tolerable failures October 11, 2017 EECS 498

More information

Distributed Consensus: Making Impossible Possible

Distributed Consensus: Making Impossible Possible Distributed Consensus: Making Impossible Possible QCon London Tuesday 29/3/2016 Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360 What is Consensus? The process

More information

Failures, Elections, and Raft

Failures, Elections, and Raft Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright

More information

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Recap: First Requirement. Recap: Second Requirement. Recap: Strengthening P2

Recap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Recap: First Requirement. Recap: Second Requirement. Recap: Strengthening P2 Recap CSE 486/586 Distributed Systems Google Chubby Lock Service Steve Ko Computer Sciences and Engineering University at Buffalo Paxos is a consensus algorithm. Proposers? Acceptors? Learners? A proposer

More information

EECS 591 DISTRIBUTED SYSTEMS

EECS 591 DISTRIBUTED SYSTEMS EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Fall 2018 Slides by: Lorenzo Alvisi 3-PHASE COMMIT Coordinator I. sends VOTE-REQ to all participants 3. if (all votes are Yes) then send Precommit to all else

More information

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems Consistency CS 475, Spring 2018 Concurrent & Distributed Systems Review: 2PC, Timeouts when Coordinator crashes What if the bank doesn t hear back from coordinator? If bank voted no, it s OK to abort If

More information

Exam 2 Review. October 29, Paul Krzyzanowski 1

Exam 2 Review. October 29, Paul Krzyzanowski 1 Exam 2 Review October 29, 2015 2013 Paul Krzyzanowski 1 Question 1 Why did Dropbox add notification servers to their architecture? To avoid the overhead of clients polling the servers periodically to check

More information

Basic vs. Reliable Multicast

Basic vs. Reliable Multicast Basic vs. Reliable Multicast Basic multicast does not consider process crashes. Reliable multicast does. So far, we considered the basic versions of ordered multicasts. What about the reliable versions?

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

CSE 486/586: Distributed Systems

CSE 486/586: Distributed Systems CSE 486/586: Distributed Systems Concurrency Control (part 3) Ethan Blanton Department of Computer Science and Engineering University at Buffalo Lost Update Some transaction T1 runs interleaved with some

More information

6.033 Computer System Engineering

6.033 Computer System Engineering MIT OpenCourseWare http://ocw.mit.edu 6.033 Computer System Engineering Spring 2009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. Lec 19 : Nested atomic

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

Fault-Tolerance & Paxos

Fault-Tolerance & Paxos Chapter 15 Fault-Tolerance & Paxos How do you create a fault-tolerant distributed system? In this chapter we start out with simple questions, and, step by step, improve our solutions until we arrive at

More information

Paxos Made Simple. Leslie Lamport, 2001

Paxos Made Simple. Leslie Lamport, 2001 Paxos Made Simple Leslie Lamport, 2001 The Problem Reaching consensus on a proposed value, among a collection of processes Safety requirements: Only a value that has been proposed may be chosen Only a

More information

Distributed Transactions

Distributed Transactions Distributed Transactions Preliminaries Last topic: transactions in a single machine This topic: transactions across machines Distribution typically addresses two needs: Split the work across multiple nodes

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions Transactions Main issues: Concurrency control Recovery from failures 2 Distributed Transactions

More information

A definition. Byzantine Generals Problem. Synchronous, Byzantine world

A definition. Byzantine Generals Problem. Synchronous, Byzantine world The Byzantine Generals Problem Leslie Lamport, Robert Shostak, and Marshall Pease ACM TOPLAS 1982 Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov OSDI 1999 A definition Byzantine (www.m-w.com):

More information

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication CSE 444: Database Internals Section 9: 2-Phase Commit and Replication 1 Today 2-Phase Commit Replication 2 Two-Phase Commit Protocol (2PC) One coordinator and many subordinates Phase 1: Prepare Phase 2:

More information

Distributed Consensus: Making Impossible Possible

Distributed Consensus: Making Impossible Possible Distributed Consensus: Making Impossible Possible Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360 hh360.user.srcf.net Sometimes inconsistency is not an option

More information

Byzantine Fault Tolerance

Byzantine Fault Tolerance Byzantine Fault Tolerance CS 240: Computing Systems and Concurrency Lecture 11 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. So far: Fail-stop failures

More information

Module 8 - Fault Tolerance

Module 8 - Fault Tolerance Module 8 - Fault Tolerance Dependability Reliability A measure of success with which a system conforms to some authoritative specification of its behavior. Probability that the system has not experienced

More information

Topics in Reliable Distributed Systems

Topics in Reliable Distributed Systems Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

Control. CS432: Distributed Systems Spring 2017

Control. CS432: Distributed Systems Spring 2017 Transactions and Concurrency Control Reading Chapter 16, 17 (17.2,17.4,17.5 ) [Coulouris 11] Chapter 12 [Ozsu 10] 2 Objectives Learn about the following: Transactions in distributed systems Techniques

More information

Distributed Systems. Fault Tolerance. Paul Krzyzanowski

Distributed Systems. Fault Tolerance. Paul Krzyzanowski Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected

More information

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

CSE 5306 Distributed Systems. Fault Tolerance

CSE 5306 Distributed Systems. Fault Tolerance CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure

More information

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Yong-Hwan Cho, Sung-Hoon Park and Seon-Hyong Lee School of Electrical and Computer Engineering, Chungbuk National

More information

Exam 2 Review. Fall 2011

Exam 2 Review. Fall 2011 Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows

More information

PRIMARY-BACKUP REPLICATION

PRIMARY-BACKUP REPLICATION PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons

More information

FAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMS

FAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMS FAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMS Marius Rafailescu The Faculty of Automatic Control and Computers, POLITEHNICA University, Bucharest ABSTRACT There are many distributed systems which

More information

A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment

A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment Michel RAYNAL IRISA, Campus de Beaulieu 35042 Rennes Cedex (France) raynal @irisa.fr Abstract This paper considers

More information

Causal Consistency and Two-Phase Commit

Causal Consistency and Two-Phase Commit Causal Consistency and Two-Phase Commit CS 240: Computing Systems and Concurrency Lecture 16 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency

More information

Module 8 Fault Tolerance CS655! 8-1!

Module 8 Fault Tolerance CS655! 8-1! Module 8 Fault Tolerance CS655! 8-1! Module 8 - Fault Tolerance CS655! 8-2! Dependability Reliability! A measure of success with which a system conforms to some authoritative specification of its behavior.!

More information

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions Distributed Systems Day 13: Distributed Transaction To Be or Not to Be Distributed.. Transactions Summary Background on Transactions ACID Semantics Distribute Transactions Terminology: Transaction manager,,

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

COMMENTS. AC-1: AC-1 does not require all processes to reach a decision It does not even require all correct processes to reach a decision

COMMENTS. AC-1: AC-1 does not require all processes to reach a decision It does not even require all correct processes to reach a decision ATOMIC COMMIT Preserve data consistency for distributed transactions in the presence of failures Setup one coordinator a set of participants Each process has access to a Distributed Transaction Log (DT

More information

Distributed Systems 8L for Part IB

Distributed Systems 8L for Part IB Distributed Systems 8L for Part IB Handout 3 Dr. Steven Hand 1 Distributed Mutual Exclusion In first part of course, saw need to coordinate concurrent processes / threads In particular considered how to

More information