EECS 591 DISTRIBUTED SYSTEMS
1 EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Fall 2018 Slides by: Lorenzo Alvisi
2 3-PHASE COMMIT
Coordinator c:
1. sends VOTE-REQ to all participants
3. if (all votes are Yes) then send Precommit to all
   else decision_c := Abort; send Abort to all who voted Yes; halt
5. collect Ack from all participants. When all Acks have been received: decision_c := Commit; send Commit to all
Participant p_i:
2. sends vote_i to Coordinator; if vote_i = No then decision_i := Abort; halt
4. if received Precommit then send Ack
6. When p_i receives Commit, sets decision_i := Commit and halts
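The six steps above can be traced in a small simulation. This is only a sketch of the failure-free message flow, not an implementation from the slides: the function name `run_3pc`, the `Decision` enum, and the tuple-based message log are all assumptions made for illustration.

```python
from enum import Enum


class Decision(Enum):
    COMMIT = "Commit"
    ABORT = "Abort"


def run_3pc(votes):
    """Simulate one failure-free 3PC run.

    `votes` maps a participant name to "Yes" or "No" (hypothetical encoding).
    Returns (coordinator decision, participant decisions, message log).
    """
    log = []          # (sender, receiver, message) triples, in send order
    decisions = {}
    # Step 1: the coordinator sends VOTE-REQ to all participants.
    for p in votes:
        log.append(("coordinator", p, "VOTE-REQ"))
    # Step 2: each participant sends its vote; a No voter aborts and halts.
    for p, vote in votes.items():
        log.append((p, "coordinator", vote))
        if vote == "No":
            decisions[p] = Decision.ABORT
    if all(vote == "Yes" for vote in votes.values()):
        # Step 3: all votes are Yes, so the coordinator sends Precommit.
        for p in votes:
            log.append(("coordinator", p, "Precommit"))
        # Step 4: every participant acknowledges the Precommit.
        for p in votes:
            log.append((p, "coordinator", "Ack"))
        # Step 5: all Acks are in; the coordinator decides Commit.
        coordinator_decision = Decision.COMMIT
        for p in votes:
            log.append(("coordinator", p, "Commit"))
            decisions[p] = Decision.COMMIT  # Step 6
    else:
        # Step 3 (else branch): Abort goes only to the Yes voters.
        coordinator_decision = Decision.ABORT
        for p, vote in votes.items():
            if vote == "Yes":
                log.append(("coordinator", p, "Abort"))
                decisions[p] = Decision.ABORT
    return coordinator_decision, decisions, log
```

Calling `run_3pc({"p1": "Yes", "p2": "No"})` yields an Abort decision everywhere, and the log shows that no Abort message is sent to `p2`, which already aborted when it voted No.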
3 TIMEOUT ACTIONS
Coordinator:
Step 3: Coordinator is waiting for votes from participants. Same as in 2PC
Step 5: Coordinator is waiting for Acks. Coordinator sends Commit
Participant p_i:
Step 2: p_i is waiting for VOTE-REQ from the coordinator. Same as in 2PC
Step 4: p_i is waiting for Precommit. Run termination protocol
Step 6: p_i is waiting for Commit. Run termination protocol
4 TERMINATION PROTOCOL: PROCESS STATES
At any time while running 3PC, each participant can be in exactly one of these four states:
Aborted: not voted, voted No, or received Abort
Uncertain: voted Yes but not received Precommit
Committable: received Precommit, not Commit
Committed: received Commit
5 NOT ALL STATES ARE COMPATIBLE
(Table: pairwise compatibility of the four states Aborted, Uncertain, Committable, Committed.)
6 TERMINATION PROTOCOL
When a participant p times out, it starts an election protocol to elect a new coordinator
The new coordinator sends STATE-REQ to all processes that participated in the election
The new coordinator collects the states and follows a set of termination rules
7 TERMINATION PROTOCOL
The new coordinator collects the states and follows a set of termination rules:
TR1: if some process decided Abort, then decide Abort; send Abort to all; halt
TR2: if some process decided Commit, then decide Commit; send Commit to all; halt
TR3: if all processes that reported state are uncertain, then decide Abort; send Abort to all; halt
TR4: if some process is committable, but none committed, then send Precommit to uncertain processes; wait for Acks; send Commit to all; halt
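The termination rules TR1 to TR4 can be read as a single decision function over the reported states. The sketch below is an illustration under assumptions: the function name `apply_termination_rules`, the string encoding of states, and the returned list of action descriptions are not from the slides.

```python
def apply_termination_rules(states):
    """Apply TR1-TR4 to the states collected by the new coordinator.

    `states` maps each process that answered STATE-REQ to one of
    "Aborted", "Uncertain", "Committable", "Committed".
    Returns (decision, actions) where actions is a list of descriptions.
    """
    if not states:
        raise ValueError("no process reported its state")
    reported = set(states.values())
    # TR1: some process decided Abort.
    if "Aborted" in reported:
        return "Abort", ["send Abort to all"]
    # TR2: some process decided Commit.
    if "Committed" in reported:
        return "Commit", ["send Commit to all"]
    # TR3: every process that reported is uncertain.
    if reported == {"Uncertain"}:
        return "Abort", ["send Abort to all"]
    # TR4: some process is committable, none committed.
    uncertain = sorted(p for p, s in states.items() if s == "Uncertain")
    return "Commit", [
        "send Precommit to " + ", ".join(uncertain) if uncertain
        else "no uncertain processes to Precommit",
        "wait for Acks",
        "send Commit to all",
    ]
```

For example, `apply_termination_rules({"p1": "Uncertain", "p2": "Committable"})` falls through to TR4 and first sends Precommit to `p1` before committing.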
8 TERMINATION PROTOCOL AND FAILURES
Processes can fail while executing the termination protocol
if the coordinator c times out on a process p, it can just ignore p
if c fails, a new coordinator is elected and the protocol is restarted (election protocol to follow)
total failures will need special care
9 RECOVERING
If p fails before sending Yes, decide Abort
If p fails after having decided, follow decision
If p fails after voting Yes, but before receiving decision value:
  p asks other processes for help
  3PC is non-blocking: p will receive a response with the decision
If p has received Precommit:
  p still needs to ask other processes (cannot just Commit)
  No need to log Precommit! (or is there?)
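The recovery cases can be sketched as a lookup on what the recovering process finds in its log. The function name `recovery_action` and the two log fields are hypothetical; the sketch also assumes, as the slide suggests, that Precommit is not logged, so a process that had received Precommit recovers exactly like one that had only voted Yes.

```python
def recovery_action(logged_vote=None, logged_decision=None):
    """Decide what a recovering participant does, based on its log.

    `logged_vote` is "Yes", "No", or None (nothing logged);
    `logged_decision` is "Commit", "Abort", or None.
    """
    # Failed after having decided: follow the logged decision.
    if logged_decision is not None:
        return ("follow", logged_decision)
    # Failed before sending Yes: it is safe to decide Abort unilaterally.
    if logged_vote != "Yes":
        return ("decide", "Abort")
    # Voted Yes but has no decision: must ask other processes for help.
    return ("ask-others", None)
```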
10 THE ELECTION PROTOCOL
Processes agree on linear ordering (e.g. by pid)
Each process p maintains a set UP_p of all processes that it believes to be operational
When p detects failure of the coordinator c, it removes c from UP_p and chooses the smallest q in UP_p to be the new coordinator
If q = p, then p is the new coordinator
Otherwise, p sends UR-ELECTED to q
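The election step amounts to a few set operations. A minimal sketch, assuming pids are comparable values and that each process always keeps itself in its own operational set; the function name `on_coordinator_failure` and its return shape are made up for illustration.

```python
def on_coordinator_failure(me, up, failed_coordinator):
    """React to a detected coordinator failure.

    `me` is this process's pid, `up` is the set of pids it believes
    operational (assumed to contain `me`).
    Returns (new up set, new coordinator, message to send or None).
    """
    # Remove the failed coordinator from the operational set.
    new_up = set(up) - {failed_coordinator}
    # The smallest remaining pid in the agreed linear order is elected.
    new_coordinator = min(new_up)
    if new_coordinator == me:
        return new_up, new_coordinator, None
    # Otherwise tell the chosen process that it has been elected.
    return new_up, new_coordinator, ("UR-ELECTED", new_coordinator)
```

With `up = {1, 2, 3}` and coordinator 1 failing, process 2 elects itself, while process 3 sends `UR-ELECTED` to process 2.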
11 WHAT IF?
What if p', which has not detected the failure of c, receives a STATE-REQ from q?
  it concludes that c must be faulty
  it removes from UP_p' every q' < q
What if p' receives a STATE-REQ from c after it has changed the coordinator to q?
  p' ignores the request
12 TOTAL FAILURE
Suppose that p is the first process to recover and that p is uncertain. Can p decide Abort?
Some process could have decided Commit after p crashed!
p is blocked until some process q recovers such that either
  q can recover independently, or
  q is the last process to fail: then q can simply invoke the termination protocol
13 DETERMINING THE LAST PROCESS TO FAIL
Suppose a set R of processes has recovered
Does R contain the last process to fail?
  the last process to fail is in the UP set of every process
  so the last process to fail must be in $\bigcap_{p \in R} UP_p$
R contains the last process to fail if: $\bigcap_{p \in R} UP_p \subseteq R$
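The test for whether the recovered set contains the last process to fail is usually stated as: the intersection of the recovered processes' UP sets is a subset of the recovered set. The sketch below assumes that condition and that each process saved its UP set before crashing; the names `contains_last_to_fail` and `up_sets` are hypothetical.

```python
def contains_last_to_fail(recovered, up_sets):
    """Check whether the recovered set contains the last process to fail.

    `recovered` is a non-empty set of pids; `up_sets[p]` is the UP set
    that process p had recorded when it failed.
    """
    if not recovered:
        raise ValueError("no process has recovered")
    # The last process to fail appears in every process's UP set,
    # so it must lie in the intersection over the recovered processes.
    candidates = set.intersection(*(set(up_sets[p]) for p in recovered))
    # If every candidate has recovered, the last process to fail is among them.
    return candidates <= set(recovered)
```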
14 ADMINISTRIVIA
Homework #1 due Wednesday before class
Research project:
  Declare your team by Oct 1st (by email to me)
  Declare your topic by Oct 8 (by email to me)
  Not sure what to do? Come talk to me.
15 CONSENSUS AND RELIABLE BROADCAST
16 BROADCAST
If a process sends a message m, then every process eventually delivers m
How can we adapt the spec for an environment where processes may fail?
17 RELIABLE BROADCAST
Validity: If the sender is correct and broadcasts a message m, then all correct processes eventually deliver m
Agreement: If a correct process delivers a message m, then all correct processes eventually deliver m
Integrity: Every correct process delivers at most one message, and if it delivers m, then some process must have broadcast m
18 TERMINATING RELIABLE BROADCAST
Validity: If the sender is correct and broadcasts a message m, then all correct processes eventually deliver m
Agreement: If a correct process delivers a message m, then all correct processes eventually deliver m
Integrity: Every correct process delivers at most one message, and if it delivers m, then some process must have broadcast m
Termination: Every correct process eventually delivers some message
19 CONSENSUS Every process has a value to propose. After running a consensus algorithm, all processes should deliver the same value.
20 CONSENSUS
Validity: If all processes that propose a value propose v, then all correct processes eventually decide v
Agreement: If a correct process decides v, then all correct processes eventually decide v
Integrity: Every correct process decides at most one value, and if it decides v, then some process must have proposed v
Termination: Every correct process eventually decides some value
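The four properties can be checked against the outcome of a single finished run. This is only a sketch for a finite snapshot: "eventually" is approximated by "by the end of the run", and the function name `check_consensus` and its dictionary inputs are assumptions.

```python
def check_consensus(proposals, decisions, correct):
    """Check the consensus properties on the outcome of one finished run.

    `proposals` maps pid -> proposed value, `decisions` maps pid -> decided
    value (a dict, so each process has at most one decision by construction),
    and `correct` is the set of pids that never failed.
    """
    proposed = set(proposals.values())
    decided = {p: decisions[p] for p in correct if p in decisions}
    values = set(decided.values())
    return {
        # Validity: if everyone proposed the same v, correct processes decide v.
        "validity": len(proposed) != 1 or values <= proposed,
        # Agreement: correct processes do not decide different values.
        "agreement": len(values) <= 1,
        # Integrity: every decided value was proposed by some process.
        "integrity": values <= proposed,
        # Termination: every correct process has decided by the end of the run.
        "termination": all(p in decisions for p in correct),
    }
```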
21 PROPERTIES OF send(m) AND receive(m)
Benign failures:
Validity: If p sends m to q, and p, q, and the link between them are correct, then q eventually receives m
Uniform* integrity: For every message m, q receives m at most once from p, and only if p sent m to q
* A property is called uniform if it applies to both correct and faulty processes
22 MODEL
Synchronous message passing
Execution is a sequence of rounds
In each round every process takes a step:
  sends messages to neighbors
  receives messages sent in that round
  changes its state
Network is fully connected
No communication failures
23 A SIMPLE CONSENSUS ALGORITHM
Process p_i:
Initially V = {v_i}
To execute propose(v_i):
1. Send {v_i} to all
decide(x) occurs as follows:
2. for all j ≠ i, do
3.   receive S_j from p_j
4.   V := V ∪ S_j
5. decide min(V)
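The one-round algorithm can be simulated to see where it breaks. In this sketch, the function name `one_round_consensus` and the `partial_sends` parameter are assumptions: `partial_sends[s]` lists the receivers that a crashing sender s reaches before it fails, and such senders do not decide.

```python
def one_round_consensus(proposals, partial_sends=None):
    """Run the one-round algorithm: send your value to all, decide the minimum.

    `proposals` maps pid -> proposed value. `partial_sends` maps a faulty
    pid to the set of pids its message reaches before it crashes.
    Returns the decisions of the processes that did not crash.
    """
    partial_sends = partial_sends or {}
    # Each surviving process starts with a set holding only its own value.
    views = {p: {v} for p, v in proposals.items() if p not in partial_sends}
    for sender, value in proposals.items():
        # A correct sender reaches everyone; a crashing one only some.
        receivers = partial_sends.get(sender, proposals.keys())
        for r in receivers:
            if r in views and r != sender:
                views[r].add(value)
    # Every surviving process decides the minimum value it has seen.
    return {p: min(seen) for p, seen in views.items()}
```

Without failures, `one_round_consensus({1: 5, 2: 3, 3: 7})` makes every process decide 3. With `partial_sends={2: {1}}`, process 1 decides 3 but process 3 decides 5, so a single crash in the middle of a broadcast is enough to violate agreement.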
24 AN EXECUTION
(Figure: messages exchanged in one round, drawn along a time axis from the start of the round to the end of the round.)
What should a process decide at the end of the round?
29 ECHOING VALUES
A process that receives a proposal in round 1 relays it to others during round 2
Suppose a process p hasn't heard from another process q at the end of round 2. Can p decide?
(Figure: message timeline spanning round 1 and round 2.)
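Relaying received values for a fixed number of rounds can be simulated to explore the question on this slide. The sketch below is an assumption-laden illustration: `flood` and its `crashes` parameter are hypothetical, and `crashes[p] = (r, receivers)` means p crashes in round r after reaching only `receivers`.

```python
def flood(proposals, rounds, crashes=None):
    """Relay every known value for `rounds` rounds, then decide the minimum.

    `proposals` maps pid -> proposed value. `crashes` maps a faulty pid to
    (round in which it crashes, set of pids it still reaches in that round).
    Returns the decisions of the processes still alive at the end.
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in proposals.items()}
    alive = set(proposals)
    for r in range(1, rounds + 1):
        outbox = []
        for s in sorted(alive):
            receivers = set(proposals) - {s}
            if s in crashes and crashes[s][0] == r:
                # A process crashing this round reaches only some receivers.
                receivers &= set(crashes[s][1])
            outbox.append((receivers, frozenset(known[s])))
        # Processes that crash in this round take no further steps.
        alive -= {s for s in alive if s in crashes and crashes[s][0] == r}
        for receivers, values in outbox:
            for q in receivers:
                if q in alive:
                    known[q] |= values
    return {p: min(known[p]) for p in sorted(alive)}
```

With proposals `{1: 5, 2: 6, 3: 1}` and process 3 crashing in round 1 after reaching only process 2, one round leaves processes 1 and 2 disagreeing, while two rounds let process 2 relay the value 1 and restore agreement. With a fourth process and a second crash in round 2, two rounds are again not enough in this simulation, which suggests why hearing nothing from a process by the end of round 2 does not by itself settle what to decide.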