Assignment 12: Commit Protocols and Replication (Solution)


Data Modelling and Databases
Spring Semester 2018
Exercise dates: May 24 / May 25, 2018
Ce Zhang, Gustavo Alonso
Head TA: Ingo Müller
Last update: June 04, 2018

This assignment will be discussed during the exercise slots indicated above. If you want feedback on your copy, hand it in during the lecture on the Wednesday before (preferably stapled and with your e-mail address). You can also annotate your copy with questions you think should be discussed during the exercise session. If you have questions that are not answered by the solution we provide, send them to David (david.sidler@inf.ethz.ch).

1 2PC

1. Assuming a completely asynchronous system, is it always possible to achieve consensus? Explain your answer.

Solution: No. Even if consensus has been reached (as it appears to an omniscient observer), the participants cannot know about it until they all receive word from every other participant (directly or otherwise). Since communication is asynchronous, the following failures may prevent achieving consensus:

1. Communication fails, so that at least one participant never learns of all decisions, preventing it from moving forward.

2. After transmitting its decision, one of the participants loses its ability to commit to its own decision, rendering its vote erroneous.

Therefore, a consensus protocol must either contain a synchronous component or be tolerant of faults to a certain degree. The impossibility of asynchronous consensus was shown in the following seminal work: Fischer, M. J.; Lynch, N. A.; Paterson, M. S. (1985). "Impossibility of distributed consensus with one faulty process".
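As a baseline for the failure scenarios in the questions below, the coordinator's decision rule in 2PC can be sketched in a few lines of Python. This is an illustrative toy, not part of the assignment; the function name and vote encoding are assumptions.

```python
# Toy sketch of the 2PC coordinator's decision rule (illustrative only).
# A participant's vote is "YES", "NO", or None, where None models a
# participant whose vote did not arrive before the coordinator's timeout.

def two_phase_commit(votes):
    """Decide the transaction outcome from the collected votes."""
    if votes and all(v == "YES" for v in votes):
        return "COMMIT"        # unanimous yes -> commit
    return "ABORT"             # any no, or a missing vote -> abort

print(two_phase_commit(["YES", "YES", "YES"]))  # COMMIT
print(two_phase_commit(["YES", "NO", "YES"]))   # ABORT
print(two_phase_commit(["YES", None, "YES"]))   # ABORT (vote timed out)
```

The abort-on-missing-vote branch corresponds to the coordinator timeout case discussed in question 2 below.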

2. List all timeout possibilities of the 2PC protocol. Differentiate between the coordinator and the participants. Describe the consequences of each scenario.

Solution:

Role          timeout phase                        consequences
Coordinator   after sending the vote requests      aborts the transaction
Participant   before receiving a vote request      aborts the transaction
Participant   after sending a vote to commit       asks around (it is uncertain)

3. Assume the following scenario: we have one coordinator C and three participants P1, P2, P3 running the 2PC protocol. We define the event (A, B, MSG) as "A sends the message MSG to B". A and B can be any of the participants or the coordinator, i.e., A, B ∈ {C, P1, P2, P3}. Allowed messages are request to vote, voting yes, voting no, request to abort, and request to commit, i.e., MSG ∈ {REQ, YES, NO, ABORT, COM}, respectively. We also define the event (A, FAIL) to be the failure of node A at that point. Consider now the following order of events:

(a) (C, P1, REQ)
(b) (C, P2, REQ)
(c) (C, P3, REQ)
(d) (P1, C, YES)
(e) (P2, C, YES)
(f) (P3, C, YES)
(g) (C, P1, COM)
(h) (C, P2, COM)
(i) (C, P3, COM)

For each of the fail-scenarios described in the table below, replace one of the events (a) to (i) with a different event in order for the fail-scenario to take place. If there are multiple possibilities, replace the earliest one. Assume that all actions following the given modification also change according to the 2PC protocol.

Solution:

Scenario                                                        event         time step (a-i)
2PC aborts, but no node has failed                              (P1, C, NO)   d
A participant experiences a timeout waiting for a message       (C, FAIL)     a
The coordinator experiences a timeout waiting for a message     (P1, FAIL)    d
2PC blocks                                                      (C, FAIL)     g
A Cooperative Termination Protocol is run and the protocol
finishes                                                        (C, FAIL)     h

4. Give an example of a scenario where the 2PC protocol does not terminate.

Solution: If all participants vote commit and the coordinator fails after having received all votes, but before having sent any decision, the whole system blocks: the other participants do not know whether the coordinator has decided commit or abort. Since no participant has voted abort, the participants cannot conclude that they can safely abort the transaction. Since none of the participants has received a commit message, they cannot conclude that the transaction has committed either. Instead, the participants have to wait for the coordinator to recover.

5. Given your answer to question 1.4, define an alteration of the 2PC protocol that would terminate in the same scenario. Disregard all other constraints.

Solution: The discussion above assumes that the coordinator is also a participant, whose vote is unknown to the surviving participants. However, we can use a coordinator that is not a participant at the same time. In this case, assuming the coordinator failed and none of the participants did, the participants hold all information that determines the outcome of the transaction, so they can elect a new coordinator and start over. Note that if, in addition to the coordinator, a participant also fails, then the protocol may block as well: the failed participant may have voted either way, and whatever the remaining ones conclude to do, they may contradict what the failed one did. Hence, for 2PC to block, the coordinator and a participant need to fail, the two of which may or may not be the same machine.

2 3PC

1. A coordinator C and two participants P1, P2 run the three-phase-commit (3PC) protocol. The coordinator also acts as participant. We model the execution of the protocol as a series of events.
An event can be one of the following:

- A message event of the form (A, B, MSG) means that node A sends the message MSG to node B, where A, B ∈ {C, P1, P2} and MSG ∈ {REQ, YES, NO, PRE COM, ACK, ABORT, COM}, meaning request to vote, voting yes, voting no, pre-commit, acknowledge last message, request to abort, and request to commit, respectively.

- A group communication event of the form (A, ask around ABORT) or (A, ask around COM) means that node A initiates a round of group communication in which all reachable nodes exchange all relevant information and then decide to abort or to commit accordingly. Group communication can be initiated after a timeout. We assume that no failures occur during group communication.

- A failure event of the form (A, FAIL) means that node A fails.

The following sequence of events shows an execution of the 3PC protocol where no failures occur:

time step   event
1           (C, P1, REQ)
2           (C, P2, REQ)
3           (P1, C, YES)
4           (P2, C, YES)
5           (C, P1, PRE COM)
6           (C, P2, PRE COM)
7           (P1, C, ACK)
8           (P2, C, ACK)
9           (C, P1, COM)
10          (C, P2, COM)

We now modify this sequence of events starting from some time step. Complete each new sequence with one possible next event such that it models a valid execution of the 3PC protocol. In each sequence below, the final event is the expected completion.

Sequence (i):
4   (P2, C, NO)
5   (C, P1, ABORT)

Sequence (ii):
2   (C, FAIL)
3   (P1, C, YES)
4   (P1, ask around ABORT)

Sequence (iii):
5   (C, FAIL)
6   (P1, ask around ABORT)

Sequence (iv):
6   (C, FAIL)
7   (P2, ask around COM)

Sequence (v):
4   (P2, FAIL)
5   (C, P1, ABORT)

Sequence (vi):
6   (P2, FAIL)
7   (C, P2, PRE COM)
8   (P1, C, ACK)
9   (C, P1, COM)
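The ask-around decisions in these sequences all follow one rule: if no reachable node has seen a pre-commit, all abort; if at least one has, all commit. A minimal sketch in Python (the state names are my own shorthand, not the assignment's notation):

```python
# Sketch of the 3PC ask-around decision after a timeout (illustrative).
# Each reachable node reports its local state; "PRE_COM" means it has
# received a pre-commit, "COMMITTED" that it already saw the commit.

def ask_around(reachable_states):
    """Decide commit/abort from the states gathered by group communication."""
    if {"PRE_COM", "COMMITTED"} & set(reachable_states):
        return "COMMIT"   # someone is past the pre-commit point (cf. TR4)
    return "ABORT"        # nobody saw a pre-commit -> safe to abort (cf. TR3)

print(ask_around(["YES", "YES"]))      # ABORT  (as in sequence (iii))
print(ask_around(["PRE_COM", "YES"]))  # COMMIT (as in sequence (iv))
```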

Explanation: Answers containing failure events are not accepted and give 0 points.

Sequence (i): P2 decided to abort, so C propagates the decision to abort.

Sequence (ii): Either P1 times out waiting for the pre-commit message, in which case it asks around, does not find any pre-commit, and finally concludes to abort; or P2 times out waiting for the vote request, in which case it locally aborts the transaction (but the question does not have a notation for that). Hence, both the answer in the table and an answer mentioning P2, timeout, and abort are accepted solutions.

Sequence (iii): Both P1 and P2 time out waiting for the pre-commit, in which case at least one of the two starts asking around, does not find any pre-commit, and finally concludes to abort. Either of the two can act first, so either participant may be the sender in the next event. In a different implementation of the protocol, they may realize that all participants were willing to commit (P1 and P2 because they are still alive and can say so; C because otherwise it would not have initiated the vote in the first place) and thus conclude to commit. Any answer of the form (P, ask around ...) is thus correct.

Sequence (iv): P2 times out waiting for the PRE COM and thus asks around, finds the PRE COM at P1, and hence eventually concludes to commit. Concurrently, P1 acknowledges the PRE COM. Either of the two events can happen first, so the solution in the table and (P1, C, ACK) are the two accepted solutions.

Sequence (v): C times out waiting on the vote of P2, so it needs to abort the transaction and informs P1 about that decision.

Sequence (vi): C times out waiting on the ACK of P2, but since the decision to commit is certain, it can confirm the decision with the remaining participant (P1).

2. In the commit protocols discussed in the lecture, participants vote whether to commit, then they decide by consensus.
Give an example scenario in 3PC, which cannot happen in 2PC, in which a participant is forced to decide against its own vote.

Solution: The following is one of multiple possible solutions: The participant receives the vote request in time and votes to commit, as do all other participants, but the coordinator fails before being able to send any pre-commits. All participants time out while waiting for the pre-commit and are in an uncertainty period. When asking around, the situation becomes apparent and all decide to abort (TR3). Note that the coordinator is also a participant.

3. Define a scenario in which 3PC violates at least one of the AC rules.

Solution: The following is one of multiple possible solutions: A network split may occur, in which a subset of sites loses communication with all other sites, but not with each other. If this happens in a situation where the participants had voted to commit and no pre-commits had been received on one side of the split, but at least one pre-commit had been received on the other, the part of the network without the pre-commits will decide to abort according to TR3, while the other part may eventually decide to commit according to TR4. This violates AC1 and results in an inconsistent database state.

4. Compare the number of messages sent by the 2PC and 3PC protocols when all the participants have committed the update.

Solution: A successful 3PC run has the same number of rounds and messages as a 2PC run on the same configuration, plus one message round (send a message and wait for the reply) for each participant (send ACK, wait for commit) and one message round for the coordinator (wait for ACKs, broadcast commit). Therefore, 3PC comes with an increase in the number of messages sent that is linear in the number of nodes:

Number of messages in 2PC: 3(n - 1)
Number of messages in 3PC: 5(n - 1)

where n is the number of nodes in the system.

5. Compare the runtimes of the 2PC and 3PC protocols when a timeout occurs during the second timeout window of at least one participant (no decision in 2PC, no pre-commit in 3PC).

Solution: The second timeout window of a participant expires if the coordinator fails after receiving the votes but before sending the decision; up to this point, 2(n - 1) messages have been sent. If at least one participant is certain, both protocols run the same cooperative termination protocol. If not, 2PC blocks while 3PC runs the termination protocol.

3 Liveness, safety, fault tolerance

A protocol is live if each non-faulty process eventually terminates. A protocol is safe if all processes that terminate arrive at the same decision (whether to commit or abort). The network is reliable if all messages arrive on time (the only possible failures are process failures). The network is unreliable if some messages may be lost. Fill the following table with true or false.

Solution:

                                                  reliable network   unreliable network
2PC is live                                       false              false
2PC is safe                                       true               true
3PC is live                                       true               true
3PC is safe                                       true               false
there exists a protocol that is live and safe     true               false

2PC may block if the coordinator, who is also a participant, fails after receiving yes from all participants and before sending any commit message. In this case all participants are uncertain and cannot continue because they do not know the vote of the coordinator. This may happen whether the network is reliable or not. However, in 2PC, all participants that terminate always reach the same decision, which makes it a safe protocol.

With a reliable network, 3PC is live and safe in the case of a node failure: if at least one node has received either a pre-commit or a commit, all can commit; otherwise they all abort. 3PC in a reliable network will always make progress. The basic version of 3PC (without quorums) does not tolerate communication failures; in the case of a network failure it remains live but is not safe. For instance, if a node is disconnected from the network after receiving a pre-commit, and then the coordinator fails, the disconnected node will commit, while all other nodes will time out and abort.

4 Replication

1. List three reasons to use replication.

Solution:

1. Increased throughput for read-only/read-mostly workloads.

2. Fault tolerance.

3. Coexistence of read-optimized and write-optimized data layouts and access paths, e.g., a column-store copy for read-heavy analytical queries and a row-store copy for write-heavy transactional queries. In this case replication may need to be asynchronous to achieve reasonable performance.

2. List two disadvantages shared by all replication types.

Solution:

1. Writes generate overhead in the system (replicating a write always takes longer than a single write).

2. Higher space consumption.

3. List the four main types of replication strategies.

Solution:

1. Synchronous primary copy
2. Asynchronous primary copy
3. Synchronous update everywhere
4. Asynchronous update everywhere

4. Describe a scenario in which you would use a synchronous primary-copy strategy instead of an asynchronous update-everywhere strategy and state why.

Solution: The main advantages of the former strategy over the latter are the absence of inconsistencies and the absence of any need to coordinate updates. A correct answer should emphasize at least one of these without going against the other. One example that builds on both of these advantages: a system with a high frequency of reads, a low frequency of writes from a single source, and critical read consistency, such as a collection of sensor readings that is updated periodically, with the requirement that two reads at the same time must return the same result.

5. Describe a scenario in which a synchronous strategy causes the database to lose consistency.

Solution: The following is one of multiple possible solutions: Assume a fault-tolerant network of sites employing the synchronous update-everywhere strategy. If a network split occurs, both halves see the sites on the other side of the split as simply being offline. This allows sites on both sides of the split to alter the database state independently of each other, ultimately causing the database, as observed across both parts, to be in an inconsistent state.
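The synchronous primary-copy strategy from question 4 above can be sketched as a toy Python class. The class and method names are my own, and the sketch deliberately ignores failures, concurrency, and the actual update-propagation protocol:

```python
# Toy sketch of synchronous primary-copy replication (illustrative only).

class PrimaryCopy:
    def __init__(self, n_replicas):
        # copies[0] is the primary; all writes go through it
        self.copies = [dict() for _ in range(n_replicas + 1)]

    def write(self, key, value):
        # Synchronous: every copy is updated before the write is
        # acknowledged, so there is no window in which replicas disagree.
        for copy in self.copies:
            copy[key] = value
        return "ack"

    def read(self, key, replica=1):
        # Reads can be served by any replica and still see consistent data.
        return self.copies[replica].get(key)

db = PrimaryCopy(n_replicas=2)
db.write("x", 42)
print(db.read("x", replica=2))  # 42
```

The write path also illustrates the shared disadvantage from question 2: every write touches all copies before it is acknowledged.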