Assignment 12: Commit Protocols and Replication


Data Modelling and Databases
Exercise dates: May 24 / May 25, 2018
Ce Zhang, Gustavo Alonso
Last update: June 04, 2018
Spring Semester 2018
Head TA: Ingo Müller

Assignment 12: Commit Protocols and Replication

This assignment will be discussed during the exercise slots indicated above. If you want feedback for your copy, hand it in during the lecture on the Wednesday before (preferably stapled and with your e-mail address). You can also annotate your copy with questions you think should be discussed during the exercise session. If you have questions that are not answered by the solution we provide, send them to David (david.sidler@inf.ethz.ch).

1 2PC

1. Assuming a completely asynchronous system, is it always possible to achieve consensus? Explain your answer.

2. List all timeout possibilities of the 2PC protocol. Differentiate between the coordinator and the participants. Describe the consequences of each scenario.

   Node           Timeout phase     Consequences
   Coordinator
   Participant
   Participant

3. Assume the following scenario: We have one coordinator C and three participants P1, P2, P3 running the 2PC protocol. We define the event (A, B, MSG) as "A sends the message MSG to B". A and B can be any of the participants or the coordinator, i.e., A, B ∈ {C, P1, P2, P3}. Allowed messages are request to vote, voting yes, voting no, request to abort, and request to commit, i.e., MSG ∈ {REQ, YES, NO, ABORT, COM} respectively. We also define the event (A, FAIL) to be the failure of node A at that point.

   Consider now the following order of events:

   (a) (C, P1, REQ)
   (b) (C, P2, REQ)
   (c) (C, P3, REQ)
   (d) (P1, C, YES)
   (e) (P2, C, YES)
   (f) (P3, C, YES)
   (g) (C, P1, COM)
   (h) (C, P2, COM)
   (i) (C, P3, COM)

   For each of the fail-scenarios described in the table below, replace one of the events from (a) to (i) with a different event in order for the fail-scenario to take place. If there are multiple possibilities, replace the earliest one. Assume that all the actions following the given modification will also change according to the 2PC protocol.

   Scenario                                                             Event     Timestep (a-i)
   2PC aborts, but no node has failed
   A participant experiences a timeout waiting for a message
   The coordinator experiences a timeout waiting for a message
   2PC blocks
   A Cooperative Termination Protocol is run and the protocol finishes

4. Give an example of a scenario where the 2PC protocol does not terminate.

5. Given your answer to question 1.4, define an alteration of the 2PC protocol that would terminate in the same scenario. Disregard all other constraints.
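The event notation used in question 1.3 can be made concrete with a small simulation. The following is only a sketch under simplifying assumptions: the function name `run_2pc` and the `votes` dictionary are hypothetical, no node fails, no message is lost, and the coordinator sends its decision to every participant, including those that voted NO.

```python
from typing import Dict, List, Tuple

# An event (A, B, MSG) means "A sends the message MSG to B".
Event = Tuple[str, str, str]


def run_2pc(votes: Dict[str, str], coordinator: str = "C") -> List[Event]:
    """Return the event trace of one failure-free 2PC round.

    `votes` maps each participant name to its vote, "YES" or "NO".
    Assumption: no failures and no timeouts occur.
    """
    participants = list(votes)
    events: List[Event] = []

    # Phase 1: the coordinator requests a vote from every participant.
    for p in participants:
        events.append((coordinator, p, "REQ"))

    # Every participant answers with its vote.
    for p in participants:
        events.append((p, coordinator, votes[p]))

    # Phase 2: commit only if all votes are YES, otherwise abort.
    decision = "COM" if all(v == "YES" for v in votes.values()) else "ABORT"
    for p in participants:
        events.append((coordinator, p, decision))

    return events


# Reproduces the failure-free order of events (a) to (i) from question 1.3.
trace = run_2pc({"P1": "YES", "P2": "YES", "P3": "YES"})
for label, event in zip("abcdefghi", trace):
    print(f"({label}) {event}")
```

The sketch deliberately leaves out (A, FAIL) events and timeouts, which are exactly what the fail-scenarios in the table ask you to reason about.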

2 3PC

1. A coordinator C and two participants P1, P2 run the three-phase-commit (3PC) protocol. The coordinator also acts as participant. We model the execution of the protocol as a series of events. An event can be one of the following:

   A message event of the form (A, B, MSG) means that node A sends the message MSG to node B, where A, B ∈ {C, P1, P2} and the message MSG ∈ {REQ, YES, NO, PRE COM, ACK, ABORT, COM}, meaning request to vote, voting yes, voting no, pre-commit, acknowledge last message, request to abort, and request to commit respectively.

   A group communication event of the form (A, ask around ABORT) or (A, ask around COM) means that node A initiates a round of group communication where all reachable nodes exchange all relevant information and then decide to abort or to commit accordingly. Group communication can be initiated after a time-out. We assume that no failures occur during group communication.

   A failure event of the form (A, FAIL) means that node A fails.

   The following sequence of events shows an execution of the 3PC protocol where no failures occur:

   time step   event
   1           (C, P1, REQ)
   2           (C, P2, REQ)
   3           (P1, C, YES)
   4           (P2, C, YES)
   5           (C, P1, PRE COM)
   6           (C, P2, PRE COM)
   7           (P1, C, ACK)
   8           (P2, C, ACK)
   9           (C, P1, COM)
   10          (C, P2, COM)

   We now modify this sequence of events starting from some time step. Complete each new sequence with one possible next event such that it models a valid execution of the 3PC protocol.

   Sequence (i):
   4           (P2, C, NO)
   5           ______

   Sequence (ii):
   2           (C, FAIL)
   3           (P1, C, YES)
   4           ______

   Sequence (iii):
   5           (C, FAIL)
   6           ______

   Sequence (iv):
   6           (C, FAIL)
   7           ______

   Sequence (v):
   4           (P2, FAIL)
   5           ______

   Sequence (vi):
   6           (P2, FAIL)
   7           (C, P2, PRE COM)
   8           (P1, C, ACK)
   9           ______

2. In the commit protocols discussed in the lecture, participants vote whether to commit, then they decide by consensus. Give an example scenario in 3PC, which can't happen in 2PC, and in which a participant is forced to decide against its own vote.

3. Define a scenario in which 3PC violates at least one of the AC rules.

4. Compare the number of messages sent for the 2PC and 3PC protocols when all the participants have committed the update.

5. Compare the runtimes of the 2PC and 3PC protocols with a timeout occurring during the second timeout window of at least one participant (no decision in 2PC, no pre-commit in 3PC).

3 Liveness, safety, fault tolerance

A protocol is live if each non-faulty process will eventually terminate. A protocol is safe if all processes that terminate arrive at the same decision (whether to commit or abort). The network is reliable if all messages arrive on time (the only possible failures are process failures). The network is unreliable if some messages may be lost.

Fill in the following table with true or false.

                                                   reliable network    unreliable network
2PC is live
2PC is safe
3PC is live
3PC is safe
there exists a protocol that is live and safe

4 Replication

1. List three reasons to use replication.

2. List two disadvantages shared by all replication types.

3. List the four main types of replication strategies.

4. Describe a scenario in which you would use a synchronous primary-copy strategy instead of an asynchronous update-everywhere strategy and state why.

5. Describe a scenario in which a synchronous strategy causes the database to lose consistency.
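As a side note to the definitions in Section 3, safety and liveness can be phrased as checks over the recorded outcome of a single run. This is a sketch only: the names `is_safe`, `is_live`, and `Outcome` are hypothetical, and because liveness is a statement about eventual termination, a finite record can only show whether every non-faulty process had decided by the end of the observed run.

```python
from typing import Dict, Optional, Set

# Outcome of one run: process name -> "COM", "ABORT", or None if undecided.
Outcome = Dict[str, Optional[str]]


def is_safe(outcome: Outcome) -> bool:
    """Safe: all processes that terminated reached the same decision."""
    decisions = {d for d in outcome.values() if d is not None}
    return len(decisions) <= 1


def is_live(outcome: Outcome, faulty: Set[str]) -> bool:
    """Live (within the observed run): every non-faulty process decided."""
    return all(d is not None for p, d in outcome.items() if p not in faulty)


# Hypothetical run: the coordinator failed and both participants are
# still waiting for a decision.
blocked = {"C": None, "P1": None, "P2": None}
print(is_safe(blocked), is_live(blocked, faulty={"C"}))
```

A run in which no process has decided is trivially safe but not live, which is why the table in Section 3 asks about the two properties separately.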