Assignment 12: Commit Protocols and Replication (Solution)


Data Modelling and Databases
Spring Semester 2018
Exercise dates: May 24 / May 25, 2018
Ce Zhang, Gustavo Alonso
Head TA: Ingo Müller
Last update: June 04, 2018

This assignment will be discussed during the exercise slots indicated above. If you want feedback on your copy, hand it in during the lecture on the Wednesday before (preferably stapled and with your e-mail address). You can also annotate your copy with questions you think should be discussed during the exercise session. If you have questions that are not answered by the solution we provide, send them to David (david.sidler@inf.ethz.ch).

1 2PC

1. Assuming a completely asynchronous system, is it always possible to achieve consensus? Explain your answer.

Solution: No. Even if consensus has been reached (as it appears to an omniscient observer), the participants cannot know about it until they all receive word from every other participant (directly or otherwise). Since communication is asynchronous, the following failures may prevent achieving consensus:

1. Communication fails, so that at least one participant never learns of all decisions, preventing it from moving forward.

2. After transmitting its decision, one of the participants loses its ability to commit to its own decision, rendering its vote erroneous.

Therefore, a consensus protocol must either contain a synchronous component or be tolerant of faults to a certain degree. The impossibility of asynchronous consensus was shown in the following seminal work: Fischer, M. J.; Lynch, N. A.; Paterson, M. S. (1985). "Impossibility of distributed consensus with one faulty process".
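As a baseline for the failure scenarios in the questions below, the coordinator's decision rule in 2PC can be sketched in a few lines of Python. This is an illustrative toy, not part of the assignment; the function name and vote encoding are assumptions.

```python
# Toy sketch of the 2PC coordinator's decision rule (illustrative only).
# A participant's vote is "YES", "NO", or None, where None models a
# participant whose vote did not arrive before the coordinator's timeout.

def two_phase_commit(votes):
    """Decide the transaction outcome from the collected votes."""
    if votes and all(v == "YES" for v in votes):
        return "COMMIT"        # unanimous yes -> commit
    return "ABORT"             # any no, or a missing vote -> abort

print(two_phase_commit(["YES", "YES", "YES"]))  # COMMIT
print(two_phase_commit(["YES", "NO", "YES"]))   # ABORT
print(two_phase_commit(["YES", None, "YES"]))   # ABORT (vote timed out)
```

The abort-on-missing-vote branch corresponds to the coordinator timeout case discussed in question 2 below.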

2. List all timeout possibilities of the 2PC protocol. Differentiate between the coordinator and the participants. Describe the consequences of each scenario.

Solution:

Role          timeout phase                        consequences
Coordinator   after sending the vote requests      aborts the transaction
Participant   before receiving a vote request      aborts the transaction
Participant   after sending a vote to commit       asks around (it is uncertain)

3. Assume the following scenario: we have one coordinator C and three participants P1, P2, P3 running the 2PC protocol. We define the event (A, B, MSG) as "A sends the message MSG to B". A and B can be any of the participants or the coordinator, i.e., A, B ∈ {C, P1, P2, P3}. Allowed messages are request to vote, voting yes, voting no, request to abort, and request to commit, i.e., MSG ∈ {REQ, YES, NO, ABORT, COM}, respectively. We also define the event (A, FAIL) to be the failure of node A at that point. Consider now the following order of events:

(a) (C, P1, REQ)
(b) (C, P2, REQ)
(c) (C, P3, REQ)
(d) (P1, C, YES)
(e) (P2, C, YES)
(f) (P3, C, YES)
(g) (C, P1, COM)
(h) (C, P2, COM)
(i) (C, P3, COM)

For each of the fail-scenarios described in the table below, replace one of the events (a) to (i) with a different event in order for the fail-scenario to take place. If there are multiple possibilities, replace the earliest one. Assume that all actions following the given modification also change according to the 2PC protocol.

Solution:

Scenario                                                        event         time step (a-i)
2PC aborts, but no node has failed                              (P1, C, NO)   d
A participant experiences a timeout waiting for a message       (C, FAIL)     a
The coordinator experiences a timeout waiting for a message     (P1, FAIL)    d
2PC blocks                                                      (C, FAIL)     g
A Cooperative Termination Protocol is run and the protocol
finishes                                                        (C, FAIL)     h

4. Give an example of a scenario where the 2PC protocol does not terminate.

Solution: If all participants vote commit and the coordinator fails after having received all votes, but before having sent any decision, the whole system blocks: the other participants do not know whether the coordinator has decided commit or abort. Since no participant has voted abort, the participants cannot conclude that they can safely abort the transaction. Since none of the participants has received a commit message, they cannot conclude that the transaction has committed either. Instead, the participants have to wait for the coordinator to recover.

5. Given your answer to question 1.4, define an alteration of the 2PC protocol that would terminate in the same scenario. Disregard all other constraints.

Solution: The discussion above assumes that the coordinator is also a participant, whose vote is unknown to the surviving participants. However, we can use a coordinator that is not a participant at the same time. In this case, assuming the coordinator failed and none of the participants did, the participants hold all information that determines the outcome of the transaction, so they can elect a new coordinator and start over. Note that if, in addition to the coordinator, a participant also fails, then the protocol may block as well: the failed participant may have voted either way, and whatever the remaining ones conclude to do, they may contradict what the failed one did. Hence, for 2PC to block, the coordinator and a participant need to fail, the two of which may or may not be the same machine.

2 3PC

1. A coordinator C and two participants P1, P2 run the three-phase-commit (3PC) protocol. The coordinator also acts as participant. We model the execution of the protocol as a series of events.
An event can be one of the following:

- A message event of the form (A, B, MSG) means that node A sends the message MSG to node B, where A, B ∈ {C, P1, P2} and MSG ∈ {REQ, YES, NO, PRE COM, ACK, ABORT, COM}, meaning request to vote, voting yes, voting no, pre-commit, acknowledge last message, request to abort, and request to commit, respectively.

- A group communication event of the form (A, ask around ABORT) or (A, ask around COM) means that node A initiates a round of group communication in which all reachable nodes exchange all relevant information and then decide to abort or to commit accordingly. Group communication can be initiated after a timeout. We assume that no failures occur during group communication.

- A failure event of the form (A, FAIL) means that node A fails.

The following sequence of events shows an execution of the 3PC protocol where no failures occur:

time step   event
1           (C, P1, REQ)
2           (C, P2, REQ)
3           (P1, C, YES)
4           (P2, C, YES)
5           (C, P1, PRE COM)
6           (C, P2, PRE COM)
7           (P1, C, ACK)
8           (P2, C, ACK)
9           (C, P1, COM)
10          (C, P2, COM)

We now modify this sequence of events starting from some time step. Complete each new sequence with one possible next event such that it models a valid execution of the 3PC protocol. In each sequence below, the final event is the expected completion.

Sequence (i):
4   (P2, C, NO)
5   (C, P1, ABORT)

Sequence (ii):
2   (C, FAIL)
3   (P1, C, YES)
4   (P1, ask around ABORT)

Sequence (iii):
5   (C, FAIL)
6   (P1, ask around ABORT)

Sequence (iv):
6   (C, FAIL)
7   (P2, ask around COM)

Sequence (v):
4   (P2, FAIL)
5   (C, P1, ABORT)

Sequence (vi):
6   (P2, FAIL)
7   (C, P2, PRE COM)
8   (P1, C, ACK)
9   (C, P1, COM)
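The ask-around decisions in these sequences all follow one rule: if no reachable node has seen a pre-commit, all abort; if at least one has, all commit. A minimal sketch in Python (the state names are my own shorthand, not the assignment's notation):

```python
# Sketch of the 3PC ask-around decision after a timeout (illustrative).
# Each reachable node reports its local state; "PRE_COM" means it has
# received a pre-commit, "COMMITTED" that it already saw the commit.

def ask_around(reachable_states):
    """Decide commit/abort from the states gathered by group communication."""
    if {"PRE_COM", "COMMITTED"} & set(reachable_states):
        return "COMMIT"   # someone is past the pre-commit point (cf. TR4)
    return "ABORT"        # nobody saw a pre-commit -> safe to abort (cf. TR3)

print(ask_around(["YES", "YES"]))      # ABORT  (as in sequence (iii))
print(ask_around(["PRE_COM", "YES"]))  # COMMIT (as in sequence (iv))
```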

Explanation: Answers containing failure events are not accepted and give 0 points.

Sequence (i): P2 decided to abort, so C propagates the decision to abort.

Sequence (ii): Either P1 times out waiting for the pre-commit message, in which case it asks around, does not find any pre-commit, and finally concludes to abort; or P2 times out waiting for the vote request, in which case it locally aborts the transaction (but the question does not have a notation for that). Hence, both the answer in the table and an answer mentioning P2, timeout, and abort are accepted solutions.

Sequence (iii): Both P1 and P2 time out waiting for the pre-commit, in which case at least one of the two starts asking around, does not find any pre-commit, and finally concludes to abort. Either of the two can act first, so either participant may be the sender in the next event. In a different implementation of the protocol, they may realize that all participants were willing to commit (P1 and P2 because they are still alive and can say so; C because otherwise it would not have initiated the vote in the first place) and thus conclude to commit. Any answer of the form (P, ask around ...) is thus correct.

Sequence (iv): P2 times out waiting for the PRE COM and thus asks around, finds the PRE COM at P1, and hence eventually concludes to commit. Concurrently, P1 acknowledges the PRE COM. Either of the two events can happen first, so the solution in the table and (P1, C, ACK) are the two accepted solutions.

Sequence (v): C times out waiting on the vote of P2, so it needs to abort the transaction and informs P1 about that decision.

Sequence (vi): C times out waiting on the ACK of P2, but since the decision to commit is certain, it can confirm the decision with the remaining participant (P1).

2. In the commit protocols discussed in the lecture, participants vote whether to commit, then they decide by consensus.
Give an example scenario in 3PC, which cannot happen in 2PC, in which a participant is forced to decide against its own vote.

Solution: The following is one of multiple possible solutions: The participant receives the vote request in time and votes to commit, as do all other participants, but the coordinator fails before being able to send any pre-commits. All participants time out while waiting for the pre-commit and are in an uncertainty period. When asking around, the situation becomes apparent and all decide to abort (TR3). Note that the coordinator is also a participant.

3. Define a scenario in which 3PC violates at least one of the AC rules.

Solution: The following is one of multiple possible solutions: A network split may occur, in which a subset of sites loses communication with all other sites, but not with each other. If this happens in a situation where the participants had voted to commit and no pre-commits had been received on one side of the split, but at least one pre-commit had been received on the other, the part of the network without the pre-commits will decide to abort according to TR3, while the other part may eventually decide to commit according to TR4. This violates AC1 and results in an inconsistent database state.

4. Compare the number of messages sent by the 2PC and 3PC protocols when all the participants have committed the update.

Solution: A successful 3PC run has the same number of rounds and messages as a 2PC run on the same configuration, plus one message round (send a message and wait for the reply) for each participant (send ACK, wait for commit) and one message round for the coordinator (wait for ACKs, broadcast commit). Therefore, 3PC comes with an increase in the number of messages sent that is linear in the number of nodes:

Number of messages in 2PC: 3(n - 1)
Number of messages in 3PC: 5(n - 1)

where n is the number of nodes in the system.

5. Compare the runtimes of the 2PC and 3PC protocols when a timeout occurs during the second timeout window of at least one participant (no decision in 2PC, no pre-commit in 3PC).

Solution: The second timeout window of a participant expires if the coordinator fails after receiving the votes but before sending the decision; up to this point, 2(n - 1) messages have been sent. If at least one participant is certain, both protocols run the same cooperative termination protocol. If not, 2PC blocks while 3PC runs the termination protocol.

3 Liveness, safety, fault tolerance

A protocol is live if each non-faulty process eventually terminates. A protocol is safe if all processes that terminate arrive at the same decision (whether to commit or abort). The network is reliable if all messages arrive on time (the only possible failures are process failures). The network is unreliable if some messages may be lost. Fill the following table with true or false.

Solution:

                                                  reliable network   unreliable network
2PC is live                                       false              false
2PC is safe                                       true               true
3PC is live                                       true               true
3PC is safe                                       true               false
there exists a protocol that is live and safe     true               false

2PC may block if the coordinator, who is also a participant, fails after receiving yes from all participants and before sending any commit message. In this case all participants are uncertain and cannot continue because they do not know the vote of the coordinator. This may happen whether the network is reliable or not. However, in 2PC, all participants that terminate always reach the same decision, which makes it a safe protocol.

With a reliable network, 3PC is live and safe in the case of a node failure: if at least one node has received either a pre-commit or a commit, all can commit; otherwise they all abort. 3PC in a reliable network will always make progress. The basic version of 3PC (without quorums) does not tolerate communication failures; in the case of a network failure it remains live but is not safe. For instance, if a node is disconnected from the network after receiving a pre-commit, and then the coordinator fails, the disconnected node will commit, while all other nodes will time out and abort.

4 Replication

1. List three reasons to use replication.

Solution:

1. Increased throughput for read-only/read-mostly workloads.

2. Fault tolerance.

3. Coexistence of read-optimized and write-optimized data layouts and access paths, e.g., a column-store copy for read-heavy analytical queries and a row-store copy for write-heavy transactional queries. In this case replication may need to be asynchronous to achieve reasonable performance.

2. List two disadvantages shared by all replication types.

Solution:

1. Writes generate overhead in the system (replicating a write always takes longer than a single write).

2. Higher space consumption.

3. List the four main types of replication strategies.

Solution:

1. Synchronous primary copy
2. Asynchronous primary copy
3. Synchronous update everywhere
4. Asynchronous update everywhere

4. Describe a scenario in which you would use a synchronous primary-copy strategy instead of an asynchronous update-everywhere strategy and state why.

Solution: The main advantages of the former strategy over the latter are the absence of inconsistencies and the absence of any need to coordinate updates. A correct answer should emphasize at least one of these without going against the other. One example that builds on both of these advantages: a system with a high frequency of reads, a low frequency of writes from a single source, and critical read consistency, such as a collection of sensor readings that is updated periodically, with the requirement that two reads at the same time must return the same result.

5. Describe a scenario in which a synchronous strategy causes the database to lose consistency.

Solution: The following is one of multiple possible solutions: Assume a fault-tolerant network of sites employing the synchronous update-everywhere strategy. If a network split occurs, both halves see the sites on the other side of the split as simply being offline. This allows sites on both sides of the split to alter the database state independently of each other, ultimately causing the database, as observed across both parts, to be in an inconsistent state.
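The synchronous primary-copy strategy from question 4 above can be sketched as a toy Python class. The class and method names are my own, and the sketch deliberately ignores failures, concurrency, and the actual update-propagation protocol:

```python
# Toy sketch of synchronous primary-copy replication (illustrative only).

class PrimaryCopy:
    def __init__(self, n_replicas):
        # copies[0] is the primary; all writes go through it
        self.copies = [dict() for _ in range(n_replicas + 1)]

    def write(self, key, value):
        # Synchronous: every copy is updated before the write is
        # acknowledged, so there is no window in which replicas disagree.
        for copy in self.copies:
            copy[key] = value
        return "ack"

    def read(self, key, replica=1):
        # Reads can be served by any replica and still see consistent data.
        return self.copies[replica].get(key)

db = PrimaryCopy(n_replicas=2)
db.write("x", 42)
print(db.read("x", replica=2))  # 42
```

The write path also illustrates the shared disadvantage from question 2: every write touches all copies before it is acknowledged.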