Exercise 12: Commit Protocols and Replication

Data Modelling and Databases (DMDB)
ETH Zurich, Systems Group, Spring Semester 2017
Lecturer(s): Gustavo Alonso, Ce Zhang
Assistant(s): Claude Barthels, Eleftherios Sidirourgos, Eliza Wszola, Ingo Müller, Kaan Kara, Renato Marroquín, Zsolt István
Date: May 22, 2017
Last update: August 16, 2017

Solution

The exercises marked with * will be discussed in the exercise session. You can solve the other exercises as practice, ask questions about them in the session, and hand them in for feedback. All exercises may be relevant for the exam.

Ask Lefteris (lefteris.sidirourgos@inf.ethz.ch) for feedback on this week's exercise sheet or give it to the TA of your session (preferably stapled and with your address).

1 2PC

1. Assuming a completely asynchronous system, is it always possible to achieve consensus? Why?

Solution: No. Even if consensus has been reached (as it would appear to an omniscient observer), the participants cannot know about it until each of them has received word from every other participant (directly or otherwise). Since communication is asynchronous, the following failures may prevent the system from achieving consensus:

1. Communication fails, so that at least one participant never learns of all decisions, which prevents it from moving forward.
2. After transmitting its decision, one of the participants loses its ability to commit to its own decision, rendering its vote erroneous.

Therefore, a consensus protocol must either contain a synchronous component or be tolerant of faults to a certain degree.

The impossibility of asynchronous consensus has been shown by the following seminal work: Fischer, M. J.; Lynch, N. A.; Paterson, M. S. (1985). "Impossibility of distributed consensus with one faulty process".

2. List all timeout possibilities of the 2PC protocol. Differentiate between the coordinator and the participants. Describe the consequences of each scenario.

Node         Timeout phase                     Consequences
Coordinator  after sending the vote requests   aborts the transaction
Participant  before receiving a vote request   aborts the transaction
Participant  after sending a vote to commit    asks around

3. Assume the following scenario: We have one coordinator C and three participants P1, P2, P3 running the 2PC protocol. We define the event (A, B, MSG) as "A sends the message MSG to B". A and B can be any of the participants or the coordinator, i.e., A, B ∈ {C, P1, P2, P3}. Allowed messages are request to vote, voting yes, voting no, request to abort, and request to commit, i.e., MSG ∈ {REQ, YES, NO, ABORT, COM} respectively. We also define the event (A, FAIL) to be the failure of node A at that point.

Consider now the following order of events:

(a) (C, P1, REQ)
(b) (C, P2, REQ)
(c) (C, P3, REQ)
(d) (P1, C, YES)
(e) (P2, C, YES)
(f) (P3, C, YES)
(g) (C, P1, COM)
(h) (C, P2, COM)
(i) (C, P3, COM)

For each of the fail-scenarios described in the table below, replace one of the events from (a) to (i) with a different event in order for the fail-scenario to take place. If there are multiple possibilities, replace the earliest one. Assume that all the actions following the given modification will also change according to the 2PC protocol.
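The failure-free baseline (a) to (i) can be reproduced with a small sketch. This is only an illustration of the event notation, not part of the exercise: the function name `run_2pc` and the choice to send the decision to every participant (including any that voted NO) are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# An event (A, B, MSG) means "A sends the message MSG to B".
Event = Tuple[str, str, str]


def run_2pc(votes: Dict[str, str], coordinator: str = "C") -> List[Event]:
    """Return the event sequence of a 2PC run without failures.

    `votes` maps each participant to "YES" or "NO" (hypothetical helper).
    """
    participants = list(votes)
    events: List[Event] = []
    # Phase 1: the coordinator sends a vote request to every participant.
    for p in participants:
        events.append((coordinator, p, "REQ"))
    # Every participant answers with its vote.
    for p in participants:
        events.append((p, coordinator, votes[p]))
    # Phase 2: commit only if all votes are YES, otherwise abort.
    decision = "COM" if all(v == "YES" for v in votes.values()) else "ABORT"
    # Simplification: the decision is sent to all participants.
    for p in participants:
        events.append((coordinator, p, decision))
    return events


baseline = run_2pc({"P1": "YES", "P2": "YES", "P3": "YES"})
for label, event in zip("abcdefghi", baseline):
    print(f"({label}) {event}")
```

Calling `run_2pc` with a single NO vote yields the same first six message slots followed by three ABORT messages, which shows how one changed event alters everything after it.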

Scenario                                                              Event         Timestep (a-i)
2PC aborts, but no node has failed                                    (P1, C, NO)   d
a participant experiences a timeout waiting for a message             (C, FAIL)     a
the coordinator experiences a timeout waiting for a message           (P1, FAIL)    d
2PC blocks                                                            (C, FAIL)     g
a Cooperative Termination Protocol is run and the protocol finishes   (C, FAIL)     h

4. Give an example of a scenario where the 2PC protocol does not terminate.

Solution: If all participants vote commit and the coordinator fails after having received all votes, but before having sent any decisions, the whole system blocks: It is not known to the other participants whether the coordinator has voted commit or abort. Therefore, since no participant has voted abort, the participants cannot conclude that they can safely abort the transaction. Since none of the participants has received a commit message, they cannot conclude that the transaction has committed either. Instead, the participants have to wait for the coordinator to recover.

5. Given your answer to question 1.4, define a variation of the 2PC protocol that would terminate in the same scenario. Disregard all other constraints.

Solution: The above discussion assumes that the coordinator is also a participant, whose vote is unknown to the surviving participants. However, we can use a coordinator that is not a participant at the same time. In this case, assuming that the coordinator has failed but none of the participants has, the participants hold all information that determines the outcome of the transaction, so they can elect a new coordinator and start over. Note that if, in addition to the coordinator, a participant fails as well, then the protocol may still block: the failed participant may have voted either way, and whatever the remaining ones decide to do may contradict what the failed one did. Hence, for 2PC to block, the coordinator and a participant need to fail, the two of which may or may not be the same machine.

2 3PC

1. A coordinator C and two participants P1, P2 run the three-phase-commit (3PC) protocol. The coordinator also acts as participant. We model the execution of the protocol as a series of events. An event can be one of the following:

A message event of the form (A, B, MSG) means that node A sends the message MSG to node B, where A, B ∈ {C, P1, P2} and the message MSG ∈ {request, yes, no, pre-commit, ack, abort, commit}, meaning request to vote, voting yes, voting no, pre-commit, acknowledge last message, request to abort, and request to commit respectively.

A group communication event of the form (A, ask around abort) or (A, ask around commit) means that node A initiates a round of group communication where all reachable nodes exchange all relevant information and then decide to abort or to commit accordingly. We assume that no failures occur during group communication.

A failure event of the form (A, fail) means that node A fails.

The following sequence of events shows an execution of the 3PC protocol where no failures occur:

time step  event
1          (C, P1, request)
2          (C, P2, request)
3          (P1, C, yes)
4          (P2, C, yes)
5          (C, P1, pre-commit)
6          (C, P2, pre-commit)
7          (P1, C, ack)
8          (P2, C, ack)
9          (C, P1, commit)
10         (C, P2, commit)

We now modify this sequence of events starting from some time step. Complete each new sequence with one possible next event such that it models a valid execution of the 3PC protocol. In each sequence below, the last event is the solution.

Sequence (i):
time step  event
4          (P2, C, no)
5          (C, P1, abort)

Sequence (ii):
time step  event
2          (C, fail)
3          (P1, C, yes)
4          (P1, ask around abort)

Sequence (iii):
time step  event
5          (C, fail)
6          (P1, ask around abort)

Sequence (iv):
time step  event
6          (C, fail)
7          (P2, ask around commit)

Sequence (v):
time step  event
4          (P2, fail)
5          (C, P1, abort)

Sequence (vi):
time step  event
6          (P2, fail)
7          (C, P2, pre-commit)
8          (P1, C, ack)
9          (C, P1, commit)

Explanation: Answers with failure events are not accepted and give 0 points.

Sequence (i): P2 decided to abort, so C propagates the decision to abort.

Sequence (ii): Either P1 times out waiting for the pre-commit message, in which case it asks around, does not find any pre-commit, and finally concludes to abort; or P2 times out waiting for the vote request, in which case it locally aborts the transaction (but the question does not have a notation for that). Hence, the answer in the table and an answer mentioning P2, timeout, and abort are accepted solutions.

Sequence (iii): Both P1 and P2 time out waiting for the pre-commit, at which point at least one of the two starts asking around, does not find any pre-commit, and may finally conclude to abort. Either of the two can happen first, so either of the participants may be the sender in the next event. In a different implementation of the protocol, they may realize that all participants were willing to commit (P1 and P2 because they are still alive and can tell so; C because otherwise it would not have initiated the vote in the first place) and thus conclude to commit. Any answer of the form (P, ask around X), with P one of the participants and X either abort or commit, is thus correct.

Sequence (iv): P2 times out waiting for the pre-commit and thus asks around, finds the pre-commit at P1, and hence eventually concludes to commit. Concurrently, P1 acknowledges the pre-commit. Either of the two events can happen first, so the solution in the table and (P1, C, ack) are the two accepted solutions.

Sequence (v): C times out waiting for the vote of P2, so it needs to abort the transaction and informs P1 about that decision.

Sequence (vi): C times out waiting for the ack of P2, but since the decision to commit is certain, it can confirm the decision with the remaining participants (P1).

2. In the commit protocols discussed in the lecture, participants vote whether to commit, then they decide by consensus. Give an example scenario in 3PC, which can't happen in 2PC, and in which a participant is forced to decide against its own vote.

Solution: (Example solution) The participant receives the vote request in time and votes to commit, as do all other participants, but the coordinator fails before being able to send any pre-commits. All participants time out while waiting for the pre-commit and are in an uncertainty period. When asking around, the situation becomes apparent and all decide to abort (TR3).

3. Define a scenario in which 3PC violates at least one of the AC rules.

Solution: (Example solution) A network split may occur, in which a subset of sites loses communication with all other sites, but not with each other. If this happens in a situation where the participants had voted to commit and no pre-commits had been received on one side of the split, but at least one pre-commit had been received on the other, the part of the network without the pre-commits will decide to abort according to TR3, while the other part may eventually decide to commit according to TR4. This violates AC1 and results in an inconsistent database state.

4. Compare the number of messages sent for 2PC and 3PC protocols when all the participants have committed the update.

Solution: A successful run of the 3PC protocol needs the same messages as a 2PC run on the same configuration, plus one message round (send a message and wait for the reply) for each participant: send ACK, wait for commit; and one message round for the coordinator: wait for ACKs, broadcast commit. Therefore, 3PC comes with an increase in the number of messages sent that depends linearly on the number of nodes.

Number of messages in 2PC: 3(n - 1)
Number of messages in 3PC: 5(n - 1)

where n is the number of nodes in the system.

5. Compare the runtimes of 2PC and 3PC protocols with a timeout occurring during the second timeout window of at least one participant (no decision in 2PC, no pre-commit in 3PC).
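The message counts given in the solution to question 4 can be checked with a short sketch. The function names are hypothetical, and the per-step breakdown in the comments assumes one coordinator and n - 1 participants, each exchanging exactly one message per protocol step.

```python
def messages_2pc(n: int) -> int:
    """Messages in a successful 2PC run with n nodes (1 coordinator)."""
    # Steps: vote request, vote, commit decision.
    return 3 * (n - 1)


def messages_3pc(n: int) -> int:
    """Messages in a successful 3PC run with n nodes (1 coordinator)."""
    # Steps: vote request, vote, pre-commit, ack, commit.
    return 5 * (n - 1)


for n in (2, 4, 10):
    extra = messages_3pc(n) - messages_2pc(n)
    print(f"n={n}: 2PC={messages_2pc(n)}, 3PC={messages_3pc(n)}, extra={extra}")
```

The difference is 2(n - 1) messages, which is the linear increase mentioned in the solution.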
Solution: The second timeout window of a participant is reached if the coordinator fails after receiving the votes but before sending the decision. By that point, 2(n - 1) messages have been sent. If there is at least one participant that is certain, both protocols will run the same cooperative termination protocol. If not, 2PC will block and 3PC will run the termination protocol.

3 Liveness, safety, fault tolerance

A protocol is live if each non-faulty process will eventually terminate. A protocol is safe if all processes that terminate arrive at the same decision (whether to commit or abort). The network is reliable if all messages arrive on time (the only possible failures are process failures). The network is unreliable if some messages may be lost. Fill the following table with true or false.

                                                reliable network   unreliable network
2PC is live                                     false              false
2PC is safe                                     true               true
3PC is live                                     true               true
3PC is safe                                     true               false
there exists a protocol that is live and safe   true               false

Solution: 2PC may block if the coordinator, who is also a participant, fails after receiving yes from all participants and before sending any commit message. In this case all participants are uncertain and cannot continue because they do not know the vote of the coordinator. This may happen whether the network is reliable or not. However, in 2PC, all participants always reach the same decision, which makes it a safe protocol.

With a reliable network, 3PC will be live and safe in case of a node failure, because if at least one node has received either a pre-commit or a commit, all can commit; otherwise they all abort. 3PC in a reliable network will always make progress.

In case of network failure, 3PC is live, but not safe. If a node is disconnected from the network after receiving a pre-commit, and then the coordinator fails, the disconnected node will commit, while all other nodes will time out and abort.

4 Replication

1. List three reasons why to use replication.

Solution:
1. Increased throughput of read-only/read-mostly workloads.
2. Fault tolerance.
3. Coexistence of read-optimized and write-optimized data layouts and access paths, e.g., a column-store copy for read-heavy analytical queries and a row-store copy for write-heavy transactional queries. In this case replication may need to be asynchronous to get reasonable performance.

2. List three disadvantages shared by all replication types.

Solution:
1. Writes generate overhead in a system (replicating writes always takes longer than one single write).
2. Higher space consumption.
3. As per the CAP theorem, writes will either lock or put the copies in an inconsistent state. In both cases, the time complexity of reading recently updated data gets dominated by the time complexity of the update (or the corresponding consistency protocol).

3. List the four main types of replication strategies.

Solution:
1. Synchronous primary copy
2. Asynchronous primary copy
3. Synchronous update everywhere
4. Asynchronous update everywhere

4. Describe a scenario in which you would use a synchronous primary copy strategy instead of an asynchronous update-everywhere strategy and state why.

Solution: The main advantages of the first strategy over the latter are the lack of inconsistencies and the lack of need to coordinate updates. A correct answer would emphasize at least one of these without going against the other. One example that builds on both these advantages is:

Example: A system with a high frequency of reads, a low frequency of writes from a single source, and critical read consistency, such as a collection of sensor readings that is updated at arbitrary intervals, with the requirement that two reads at the same time must return the same result.

5. Describe a scenario in which a synchronous strategy causes the database to lose consistency.

Solution: (Example solution) Assume a fault-tolerant network of sites employing the synchronous update-everywhere strategy. If a network split occurs, both halves see the sites on the other side of the split as simply being offline. This allows sites on both sides of the split to alter the database state independently of each other, ultimately causing the database as observed over both parts to be in an inconsistent state.
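The network split described in the last solution can be illustrated with a minimal sketch. The names `Site` and `write_everywhere` are hypothetical, and the model assumes that a write is applied to every site the writer can currently reach while unreachable sites are simply treated as offline.

```python
from typing import Dict, List


class Site:
    """One replica holding a full copy of the data (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, int] = {}


def write_everywhere(reachable: List[Site], key: str, value: int) -> None:
    """Apply a write to every reachable site.

    Assumption: unreachable sites are considered offline and skipped.
    """
    for site in reachable:
        site.data[key] = value


sites = [Site(f"S{i}") for i in range(1, 5)]
write_everywhere(sites, "x", 0)      # before the split, all copies agree

left, right = sites[:2], sites[2:]   # the network splits into two halves
write_everywhere(left, "x", 1)       # an update issued on the left side
write_everywhere(right, "x", 2)      # an update issued on the right side

values = {site.name: site.data["x"] for site in sites}
consistent = len(set(values.values())) == 1
print(values)
print("consistent:", consistent)
```

Each half stays internally consistent, but the copies of x differ across the split, which is the inconsistency the solution refers to.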


Homework #2 Nathan Balon CIS 578 October 31, 2004 Homework #2 Nathan Balon CIS 578 October 31, 2004 1 Answer the following questions about the snapshot algorithm: A) What is it used for? It used for capturing the global state of a distributed system.

More information

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions Distributed Systems Day 13: Distributed Transaction To Be or Not to Be Distributed.. Transactions Summary Background on Transactions ACID Semantics Distribute Transactions Terminology: Transaction manager,,

More information

Distributed Systems. Fault Tolerance. Paul Krzyzanowski

Distributed Systems. Fault Tolerance. Paul Krzyzanowski Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected

More information

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ) Data Consistency and Blockchain Bei Chun Zhou (BlockChainZ) beichunz@cn.ibm.com 1 Data Consistency Point-in-time consistency Transaction consistency Application consistency 2 Strong Consistency ACID Atomicity.

More information

CSE 5306 Distributed Systems. Fault Tolerance

CSE 5306 Distributed Systems. Fault Tolerance CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure

More information

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems ZooKeeper & Curator CS 475, Spring 2018 Concurrent & Distributed Systems Review: Agreement In distributed systems, we have multiple nodes that need to all agree that some object has some state Examples:

More information

Chapter 4: Distributed Transactions (First Part) IPD, Forschungsbereich Systeme der Informationsverwaltung

Chapter 4: Distributed Transactions (First Part) IPD, Forschungsbereich Systeme der Informationsverwaltung Chapter 4: Distributed Transactions (First Part) IPD, Forschungsbereich e der Informationsverwaltung 1 Distributed Transactions (1) US Customers Transfer USD 500,-- from Klemens account to Jim s account.

More information

Topics in Reliable Distributed Systems

Topics in Reliable Distributed Systems Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,

More information

Asynchronous Reconfiguration for Paxos State Machines

Asynchronous Reconfiguration for Paxos State Machines Asynchronous Reconfiguration for Paxos State Machines Leander Jehl and Hein Meling Department of Electrical Engineering and Computer Science University of Stavanger, Norway Abstract. This paper addresses

More information

Consensus Problem. Pradipta De

Consensus Problem. Pradipta De Consensus Problem Slides are based on the book chapter from Distributed Computing: Principles, Paradigms and Algorithms (Chapter 14) by Kshemkalyani and Singhal Pradipta De pradipta.de@sunykorea.ac.kr

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Integrity in Distributed Databases

Integrity in Distributed Databases Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................

More information

Name: 1. CS372H: Spring 2009 Final Exam

Name: 1. CS372H: Spring 2009 Final Exam Name: 1 Instructions CS372H: Spring 2009 Final Exam This exam is closed book and notes with one exception: you may bring and refer to a 1-sided 8.5x11- inch piece of paper printed with a 10-point or larger

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

Coordination and Agreement

Coordination and Agreement Coordination and Agreement Nicola Dragoni Embedded Systems Engineering DTU Informatics 1. Introduction 2. Distributed Mutual Exclusion 3. Elections 4. Multicast Communication 5. Consensus and related problems

More information

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components Fault Tolerance To avoid disruption due to failure and to improve availability, systems are designed to be fault-tolerant Two broad categories of fault-tolerant systems are: systems that mask failure it

More information

CS October 2017

CS October 2017 Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit)

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit) CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 21: Network Protocols (and 2 Phase Commit) 21.0 Main Point Protocol: agreement between two parties as to

More information

To do. Consensus and related problems. q Failure. q Raft

To do. Consensus and related problems. q Failure. q Raft Consensus and related problems To do q Failure q Consensus and related problems q Raft Consensus We have seen protocols tailored for individual types of consensus/agreements Which process can enter the

More information

Distributed Systems 11. Consensus. Paul Krzyzanowski

Distributed Systems 11. Consensus. Paul Krzyzanowski Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one

More information

Distributed Systems 24. Fault Tolerance

Distributed Systems 24. Fault Tolerance Distributed Systems 24. Fault Tolerance Paul Krzyzanowski pxk@cs.rutgers.edu 1 Faults Deviation from expected behavior Due to a variety of factors: Hardware failure Software bugs Operator errors Network

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

Distributed Systems 8L for Part IB

Distributed Systems 8L for Part IB Distributed Systems 8L for Part IB Handout 3 Dr. Steven Hand 1 Distributed Mutual Exclusion In first part of course, saw need to coordinate concurrent processes / threads In particular considered how to

More information

Semi-Passive Replication in the Presence of Byzantine Faults

Semi-Passive Replication in the Presence of Byzantine Faults Semi-Passive Replication in the Presence of Byzantine Faults HariGovind V. Ramasamy Adnan Agbaria William H. Sanders University of Illinois at Urbana-Champaign 1308 W. Main Street, Urbana IL 61801, USA

More information

Consensus on Transaction Commit

Consensus on Transaction Commit Consensus on Transaction Commit Jim Gray and Leslie Lamport Microsoft Research 1 January 2004 revised 19 April 2004, 8 September 2005, 5 July 2017 MSR-TR-2003-96 This paper appeared in ACM Transactions

More information

Advanced Systems Lab (Intro and Administration) G. Alonso Systems Group

Advanced Systems Lab (Intro and Administration) G. Alonso Systems Group Advanced Systems Lab (Intro and Administration) G. Alonso Systems Group http://www.systems.ethz.ch Overview of the Course Focus on project Individual project during semester (3 milestones) This is a project

More information

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems Consistency CS 475, Spring 2018 Concurrent & Distributed Systems Review: 2PC, Timeouts when Coordinator crashes What if the bank doesn t hear back from coordinator? If bank voted no, it s OK to abort If

More information

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus CIS 505: Software Systems Lecture Note on Consensus Insup Lee Department of Computer and Information Science University of Pennsylvania CIS 505, Spring 2007 Concepts Dependability o Availability ready

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment

A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment A Case Study of Agreement Problems in Distributed Systems : Non-Blocking Atomic Commitment Michel RAYNAL IRISA, Campus de Beaulieu 35042 Rennes Cedex (France) raynal @irisa.fr Abstract This paper considers

More information

Beyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen

Beyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen Beyond FLP Acknowledgement for presentation material Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen Paper trail blog: http://the-paper-trail.org/blog/consensus-protocols-paxos/

More information

Exam 2 Review. October 29, Paul Krzyzanowski 1

Exam 2 Review. October 29, Paul Krzyzanowski 1 Exam 2 Review October 29, 2015 2013 Paul Krzyzanowski 1 Question 1 Why did Dropbox add notification servers to their architecture? To avoid the overhead of clients polling the servers periodically to check

More information

Control. CS432: Distributed Systems Spring 2017

Control. CS432: Distributed Systems Spring 2017 Transactions and Concurrency Control Reading Chapter 16, 17 (17.2,17.4,17.5 ) [Coulouris 11] Chapter 12 [Ozsu 10] 2 Objectives Learn about the following: Transactions in distributed systems Techniques

More information

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Read Operations and Timestamps. Write Operations and Timestamps

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Read Operations and Timestamps. Write Operations and Timestamps (Pessimistic) stamp Ordering Another approach to concurrency control: Assign a timestamp ts(t) to transaction T at the moment it starts Using Lamport's timestamps: total order is given. In distributed

More information

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov Outline 1. Introduction to Byzantine Fault Tolerance Problem 2. PBFT Algorithm a. Models and overview b. Three-phase protocol c. View-change

More information

Recovering from a Crash. Three-Phase Commit

Recovering from a Crash. Three-Phase Commit Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator

More information

Management of Protocol State

Management of Protocol State Management of Protocol State Ibrahim Matta December 2012 1 Introduction These notes highlight the main issues related to synchronizing the data at both sender and receiver of a protocol. For example, in

More information

C 1. Recap. CSE 486/586 Distributed Systems Failure Detectors. Today s Question. Two Different System Models. Why, What, and How.

C 1. Recap. CSE 486/586 Distributed Systems Failure Detectors. Today s Question. Two Different System Models. Why, What, and How. Recap Best Practices Distributed Systems Failure Detectors Steve Ko Computer Sciences and Engineering University at Buffalo 2 Today s Question Two Different System Models How do we handle failures? Cannot

More information

Exam 2 Review. Fall 2011

Exam 2 Review. Fall 2011 Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows

More information

(Pessimistic) Timestamp Ordering

(Pessimistic) Timestamp Ordering (Pessimistic) Timestamp Ordering Another approach to concurrency control: Assign a timestamp ts(t) to transaction T at the moment it starts Using Lamport's timestamps: total order is given. In distributed

More information

CS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management

CS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management CS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management Hector Garcia-Molina CS 347 Notes07 1 Reliable distributed database management Reliability Failure

More information

416 practice questions (PQs)

416 practice questions (PQs) 416 practice questions (PQs) 1. Goal: give you some material to study for the final exam and to help you to more actively engage with the material we cover in class. 2. Format: questions that are in scope

More information

Distributed Systems. Day 11: Replication [Part 3 Raft] To survive failures you need a raft

Distributed Systems. Day 11: Replication [Part 3 Raft] To survive failures you need a raft Distributed Systems Day : Replication [Part Raft] To survive failures you need a raft Consensus Consensus: A majority of nodes agree on a value Variations on the problem, depending on assumptions Synchronous

More information

Practical Byzantine Fault

Practical Byzantine Fault Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Castro and Liskov, OSDI 1999 Nathan Baker, presenting on 23 September 2005 What is a Byzantine fault? Rationale for Byzantine Fault

More information

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitioning (Different data at different sites More space Better

More information

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Yong-Hwan Cho, Sung-Hoon Park and Seon-Hyong Lee School of Electrical and Computer Engineering, Chungbuk National

More information