CS 347 Parallel and Distributed Data Processing
|
|
- Samantha Moore
- 6 years ago
- Views:
Transcription
1 CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 6: Reliability
2 Reliable Distributed DB Management Reliability Failure models Scenarios CS 347 Notes 6 2
3 Reliability Correctness Serializability Atomicity Persistence Availability CS 347 Notes 6 3
4 Types of Failures Processor failures Halt, delay, restart, erratic execution Storage failures Volatile vs. non-volatile storage failures Atomic write violations, transient errors, localized vs. global failures Network failures Lost message, out-of-order messages, partitions, bounded delay CS 347 Notes 6 4
5 Types of Failures Unintended vs. malevolent failures Single vs. multiple failures Detectable vs. undetectable failures CS 347 Notes 6 5
6 Failure Models Cannot protect against everything Unlikely failures E.g., flooding in the Sahara Ten of the Strangest Data Center Outages [ goo.gl/dcqysr ] Failures expensive to protect against E.g., earthquakes Failures we know how to easily cope with E.g., using message sequence numbers CS 347 Notes 6 6
7 Failure Models Desired Events Undesired Expected Unexpected CS 347 Notes 6 7
8 Node Models 1. Fail-stop nodes time perfect halted recovery perfect volatile memory lost stable storage ok CS 347 Notes 6 8
9 Node Models 2. Byzantine nodes A perfect arbitrary failure recovery perfect B C At any given time, at most some fraction (e.g., 1/2 or 1/3) of nodes are failing CS 347 Notes 6 9
10 Network Models 1. Reliable network In order messages No spontaneous messages Timeout T D No response within T D means destination is down (not paused) No lost messages except due to node failures CS 347 Notes 6 10
11 Network Models Variation of reliable network Persistent messages If destination down, network will eventually deliver message Simplifies node recovery, but inefficient Not considered here CS 347 Notes 6 11
12 Network Models 2. Partitionable network In order messages No spontaneous messages No timeout Nodes can have different views of the failures CS 347 Notes 6 12
13 Amazon Easter Outage Misconfiguration overloaded router partition [ goo.gl/z2npq ] CS 347 Notes 6 13
14 Scenarios Reliable network Fail-stop nodes No data replication (1) Data replication (2) Partitionable network Fail-stop nodes (3) CS 347 Notes 6 14
15 No Data Replication Reliable network, fail-stop nodes net P α X Basic idea: node P α controls X P α does concurrency control for X P α does recovery for X Single control point simplifies both CS 347 Notes 6 15
16 Process Models Transaction T wants to access X req P T Local DBMS Locks Log X P T is process that represents T at this node CS 347 Notes 6 16
17 Process Models Cohorts Application code responsible for remote access Application interacts with local DBMS Transaction Manager System handles distribution and remote access CS 347 Notes 6 17
18 Process Models Cohorts USER Spawn Process Communication Data Access T T T Local DBMS Local DBMS Local DBMS CS 347 Notes 6 18
19 Process Models Transaction Manager Data Access USER Trans Mgr Trans Mgr Trans Mgr Local DBMS Local DBMS Local DBMS CS 347 Notes 6 19
20 Distributed Commit Problem Transaction T Actions a 1, a 2 Action a 3 Actions a 4, a 5 CS 347 Notes 6 20
21 Centralized Two-Phase Commit Coordinator Participant incoming msg outgoing msg * = everyone go exec * I exec ok I exec nok ok * commit * nok abort * A commit abort A C C CS 347 Notes 6 21
22 Centralized Two-Phase Commit No lost messages (for now) Reliable network ill discuss node failures next hen participant enters state It must have acquired all resources It can only abort or commit if so instructed by the coordinator Coordinator only enters C state if all participants are in It is certain that all will eventually commit CS 347 Notes 6 22
23 Handling Node Failures At failing node Coordinator and participant logs are used to reconstruct state before failure CS 347 Notes 6 23
24 Handling Node Failures Example Participant log contains on recovery T 1 T 1 T 1 X undo/redo info... Y undo/redo... info state CS 347 Notes 6 24
25 Handling Node Failures Example Participant log contains on recovery T 1 T 1 T 1 X undo/redo info... Y undo/redo... info state Recovery steps 1. Notice that T 1 is in state 2. Obtain X, Y write locks (no read locks why?) 3. ait for message from coordinator (or ask about outcome) CS 347 Notes 6 25
26 Handling Node Failures Other examples No record on log abort T 1 Have C record on log finish T 1 CS 347 Notes 6 26
27 Handling Node Failures At the protocol level Add timeouts to cope with messages lost during failures Add finish (F) state for coordinator ~ all done, can forget outcome CS 347 Notes 6 27
28 Coordinator I cok = committed ok go exec * nok abort * A ok * commit * nok * C F cok * CS 347 Notes 6 28
29 Coordinator go exec * I nok abort * ping abort t = timeout cping = coordinator ping ping A t cping ok * commit * t abort * nok * C F ping commit t cping cok * CS 347 Notes 6 29
30 Participant exec ok I abort nok exec nok A commit cok C CS 347 Notes 6 30
31 Participant cping t ping exec ok commit cok I abort nok exec nok A cping nok C cping cok CS 347 Notes 6 31
32 Participant done = either cok or nok for coordinator cping t ping exec ok commit cok I abort nok exec nok A cping done equivalent to finish state C cping done CS 347 Notes 6 32
33 Presumed Abort Protocol F and A states combined in coordinator Saves persistent space (allows coordinator to forget sooner) Presumed commit is analogous CS 347 Notes 6 33
34 Presumed Abort Protocol Coordinator go exec * I nok, t abort * ping abort ping A/F ok * commit * cok * C ping commit t cping CS 347 Notes 6 34
35 Simplified Logging All state transitions must be logged here to log participant state? Both coordinator and participant Participant only CS 347 Notes 6 35
36 Simplified Logging Example Coordinator tracking participant OKs T 1 start participants { a, b } T 1... ok received... from a After failure, we know we are still waiting for OK from node b Alternative Do not log receipt of OKs Abort T 1 on recovery CS 347 Notes 6 36
37 Simplified Logging Example Logging receipt of cok messages If logged then coordinator can recover state If not logged resend commit * participants reply done if duplicate CS 347 Notes 6 37
38 Variants of 2PC Linear ok ok ok coordinator commit commit commit Hierarchical CS 347 Notes 6 38
39 Variants of 2PC Distributed Nodes broadcast all messages Every node knows when to commit CS 347 Notes 6 39
40 2PC is Blocking Sample scenario Coordinator P 2 P 1 P 3 P 4 CS 347 Notes 6 40
41 2PC is Blocking Case I P 1 coordinator sent commits P 1 C Case II P 1 A Surviving participants P 2, P 3, P 4 cannot safely abort or commit CS 347 Notes 6 41
42 Three-Phase Commit Non-blocking commit protocol Assumes that a failed node stays down forever Key idea Before committing the coordinator tells participants that everyone is OK CS 347 Notes 6 42
43 Three-Phase Commit Coordinator Participant ** = all non-failed nodes go exec * I exec ok I exec nok ok * pre * nok abort * A pre ack abort A P P ack ** commit * commit C C CS 347 Notes 6 43
44 3PC Recovery Rules Termination protocol survivors Survivors try to complete transaction, based on their current states Goal If dead nodes committed or aborted, then survivors should not contradict; else survivors can do as they please CS 347 Notes 6 44
45 3PC Recovery Rules Let { S 1, S 2,, S n } be survivor nodes If one or more S i = COMMIT COMMIT T If one or more S i = ABORT ABORT T If one or more S i = PREPARE COMMIT T T could not have aborted If no S i = PREPARE (or COMMIT) ABORT T T could not have committed CS 347 Notes 6 45
46 3PC Recovery Rules Example 1? P? CS 347 Notes 6 46
47 3PC Recovery Rules Example 2? I? CS 347 Notes 6 47
48 3PC Recovery Rules Example 3? P? P C CS 347 Notes 6 48
49 3PC Recovery Rules Example 4? P? A CS 347 Notes 6 49
50 3PC Recovery Rules Once survivors make decision, they must select new coordinator to continue 3PC P P P C P C C P C C Decide to commit Time 1 Time 2 Time 3 Time 4 CS 347 Notes 6 50
51 3PC Recovery Rules hen survivors continue 3PC, failed nodes do not count ** = all non-failed nodes P ack ** commit * C CS 347 Notes 6 51
52 3PC Recovery Rules 3PC is unsafe with partitions P P abort commit CS 347 Notes 6 52
53 3PC Node Recovery After node recovers from failure Do not participate in termination protocol? P CS 347 Notes 6 53
54 3PC Node Recovery After node recovers from failure Do not participate in termination protocol? P A CS 347 Notes 6 54
55 3PC Node Recovery After node recovers from failure Do not participate in termination protocol? P A CS 347 Notes 6 55
56 3PC Node Recovery After node recovers from failure aits until receives commit or abort decision from another node commit/abort ping? ping ping CS 347 Notes 6 56
57 3PC Node Recovery aiting for commit or abort decision from others is ok Unless all nodes fail????? CS 347 Notes 6 57
58 3PC Node Recovery aiting for commit or abort decision from others is ok Unless all nodes fail????? Two options A. ait for all nodes to recover B. Perform majority commit CS 347 Notes 6 58
59 3PC Node Recovery A. ait for all nodes to recover Recovering node waits for either 1. Commit or abort decision from another node or 2. If all other nodes are up and recovering then 3PC can continue No danger that there is a failed node that had committed or aborted CS 347 Notes 6 59
60 3PC Node Recovery B. Perform majority commit ant a gang of failed but recovered nodes to be able to terminate the transaction, even when the rest are still failing Nodes are assigned votes, total is V Majority is M round((v + 1) / 2) E.g., V = 6 M 4 To make state transitions, coordinator requires messages from notes with a majority of votes CS 347 Notes 6 60
61 3PC with Majority Votes Example 1 Coordinator? P 2? P 3 Nodes P 2, P 3, P 4 enter state and fail hen they recover, coordinator and P 1 are down Each node has one vote, V = 5, M 3 P 1? P 4 CS 347 Notes 6 61
62 3PC with Majority Votes Example 1 Coordinator? P 2? P 3 Nodes P 2, P 3, P 4 enter state and fail hen they recover, coordinator and P 1 are down Each node has one vote, V = 5, M 3 P 1? P 4 Since P 2, P 3, P 4 have majority, they know coordinator could not have gone to P without at least one of their votes T can be aborted CS 347 Notes 6 62
63 3PC with Majority Votes Example 2 Coordinator P 1 P 2?? P 3 P P 4 Nodes P 3 and P 4 enter P and state, then fail hen they recover, coordinator, P 1 and P 2 are down Each node has one vote, V = 5, M 3 CS 347 Notes 6 63
64 3PC with Majority Votes Example 2 Coordinator P 1 P 2?? P 3 P P 4 Nodes P 3 and P 4 enter P and state, then fail hen they recover, coordinator, P 1 and P 2 are down Each node has one vote, V = 5, M 3 Nodes P 3 and P 4 have insufficient votes they do nothing CS 347 Notes 6 64
65 3PC with Majority Votes Majority ensures that any decision (e.g., preparing, committing) will be known to any future group making subsequent decisions Decision #1 Decision #2 CS 347 Notes 6 65
66 3PC with Majority Votes Example? P A CS 347 Notes 6 66
67 3PC with Majority Votes Example? P C P A CS 347 Notes 6 67
68 3PC with Majority Votes Need prepare to abort state Coordinator Participant ** = all participants with majority votes go exec * I exec ok I exec nok prea acka ok * prec * nok prea * prec ackc prea acka A abort PC PA PC PA ackc ** commit * acka ** abort * commit C A C CS 347 Notes 6 68
69 3PC with Majority Votes Example revisited? PC CS 347 Notes 6 69
70 3PC with Majority Votes Example revisited Scenario 1? PC OK for all remaining nodes to enter PC and eventually commit The transaction could not have aborted CS 347 Notes 6 70
71 3PC with Majority Votes Example revisited Scenario 2? PC PA Same outcome as scenario 1 Even though most recently failed node was already in PA The PA node will have to commit when it eventually recovers CS 347 Notes 6 71
72 3PC with Majority Votes Example revisited Scenario 3? PC PA Remaining nodes initiated abort, some entered PA state CS 347 Notes 6 72
73 3PC with Majority Votes Example revisited Scenario 3? PC PA Cannot make a decision The transaction could have committed or aborted Exercise: work out sequences of steps (will revisit later) CS 347 Notes 6 73
74 3PC with Majority Votes Example revisited Scenario 3? PC PA ill have to wait for either 1. Some node to recover in A or C 2. All nodes to recover and confirm that none were in A or C CS 347 Notes 6 74
75 3PC with Majority Votes If survivors have majority and all states try to abort If survivors have majority and states in {, PC, C } try to commit If survivors have majority and states in {, PA, A } try to abort Otherwise block CS 347 Notes 6 75
76 3PC Comparison Basic 3PC Only nodes that have not failed participate in decision Any remaining subgroup can terminate (even one node) If all nodes fail, must wait for all to recover CS 347 Notes 6 76
77 3PC Comparison 3PC with majority votes A group of failed but recovering nodes can terminate transaction Need majority to commit Blocking protocol CS 347 Notes 6 77
78 3PC Logging hen a node recovers, it uses its log as usual to determine the status of each transaction If commit logged redo if necessary If abort logged (or wait is missing) rollback if necessary CS 347 Notes 6 78
79 3PC Logging hen a node recovers, it uses its log as usual to determine the status of each transaction If commit logged redo if necessary If abort logged (or wait is missing) rollback if necessary If wait logged (or pre state ) Reclaim locks held by T before crash Try to terminate T (with other nodes) Can start normal processing once locks secured for recovering transactions CS 347 Notes 6 79
80 Summary Failure models Nodes Networks Reliable network, fail-stop nodes, no replication Two-phase commit (blocking) Three-phase commit Basic (non-blocking) ith majority votes Next: replication CS 347 Notes 6 80
CS 347 Parallel and Distributed Data Processing
S 347 arallel and Distributed Data rocessing Spring 2016 Reliable Distributed DB Management Reliability Failure models Scenarios Notes 6: Reliability S 347 Notes 6 2 Reliability orrectness Serializability
More informationCS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management
CS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management Hector Garcia-Molina CS 347 Notes07 1 Reliable distributed database management Reliability Failure
More informationDistributed Databases
Topics for the day Distributed Databases CS347 Lecture 15 June 4, 2001 Concurrency Control Schedules and Serializability Locking Timestamp control Reliability Failure models Twophase protocol 1 2 Example
More informationDistributed Databases. CS347 Lecture 16 June 6, 2001
Distributed Databases CS347 Lecture 16 June 6, 2001 1 Reliability Topics for the day Three-phase commit (3PC) Majority 3PC Network partitions Committing with partitions Concurrency control with partitions
More informationFault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior
Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure
More informationFault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all
Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures or predictable: exhibit a well defined failure behavior
More informationModule 8 Fault Tolerance CS655! 8-1!
Module 8 Fault Tolerance CS655! 8-1! Module 8 - Fault Tolerance CS655! 8-2! Dependability Reliability! A measure of success with which a system conforms to some authoritative specification of its behavior.!
More informationTopics in Reliable Distributed Systems
Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,
More informationModule 8 - Fault Tolerance
Module 8 - Fault Tolerance Dependability Reliability A measure of success with which a system conforms to some authoritative specification of its behavior. Probability that the system has not experienced
More informationFault Tolerance. Distributed Systems IT332
Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to
More information7 Fault Tolerant Distributed Transactions Commit protocols
7 Fault Tolerant Distributed Transactions Commit protocols 7.1 Subtransactions and distribution 7.2 Fault tolerance and commit processing 7.3 Requirements 7.4 One phase commit 7.5 Two phase commit x based
More informationTransactions. CS 475, Spring 2018 Concurrent & Distributed Systems
Transactions CS 475, Spring 2018 Concurrent & Distributed Systems Review: Transactions boolean transfermoney(person from, Person to, float amount){ if(from.balance >= amount) { from.balance = from.balance
More informationPRIMARY-BACKUP REPLICATION
PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons
More informationFault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit
Fault Tolerance o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication o Distributed Commit -1 Distributed Commit o A more general problem of atomic
More informationCS505: Distributed Systems
Department of Computer Science CS505: Distributed Systems Lecture 13: Distributed Transactions Outline Distributed Transactions Two Phase Commit and Three Phase Commit Non-blocking Atomic Commit with P
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 13 - Distribution: transactions
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 13 - Distribution: transactions References Transaction Management in the R* Distributed Database Management System.
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the
More informationToday: Fault Tolerance. Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More informationToday: Fault Tolerance. Reliable One-One Communication
Today: Fault Tolerance Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing Message logging Lecture 17, page 1 Reliable One-One Communication Issues
More informationConsensus and related problems
Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?
More informationDistributed Systems
15-440 Distributed Systems 11 - Fault Tolerance, Logging and Recovery Tuesday, Oct 2 nd, 2018 Logistics Updates P1 Part A checkpoint Part A due: Saturday 10/6 (6-week drop deadline 10/8) *Please WORK hard
More informationGlobal atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol
Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed
More informationDistributed Commit in Asynchronous Systems
Distributed Commit in Asynchronous Systems Minsoo Ryu Department of Computer Science and Engineering 2 Distributed Commit Problem - Either everybody commits a transaction, or nobody - This means consensus!
More informationFault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components
Fault Tolerance To avoid disruption due to failure and to improve availability, systems are designed to be fault-tolerant Two broad categories of fault-tolerant systems are: systems that mask failure it
More informationCOMMENTS. AC-1: AC-1 does not require all processes to reach a decision It does not even require all correct processes to reach a decision
ATOMIC COMMIT Preserve data consistency for distributed transactions in the presence of failures Setup one coordinator a set of participants Each process has access to a Distributed Transaction Log (DT
More informationLecture 17 : Distributed Transactions 11/8/2017
Lecture 17 : Distributed Transactions 11/8/2017 Today: Two-phase commit. Last time: Parallel query processing Recap: Main ways to get parallelism: Across queries: - run multiple queries simultaneously
More informationToday: Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More informationAdapting Commit Protocols for Large-Scale and Dynamic Distributed Applications
Adapting Commit Protocols for Large-Scale and Dynamic Distributed Applications Pawel Jurczyk and Li Xiong Emory University, Atlanta GA 30322, USA {pjurczy,lxiong}@emory.edu Abstract. The continued advances
More informationDistributed Transaction Management
Distributed Transaction Management Material from: Principles of Distributed Database Systems Özsu, M. Tamer, Valduriez, Patrick, 3rd ed. 2011 + Presented by C. Roncancio Distributed DBMS M. T. Özsu & P.
More informationCSE 444: Database Internals. Section 9: 2-Phase Commit and Replication
CSE 444: Database Internals Section 9: 2-Phase Commit and Replication 1 Today 2-Phase Commit Replication 2 Two-Phase Commit Protocol (2PC) One coordinator and many subordinates Phase 1: Prepare Phase 2:
More informationDistributed Transactions
Distributed Transactions Preliminaries Last topic: transactions in a single machine This topic: transactions across machines Distribution typically addresses two needs: Split the work across multiple nodes
More information6.033 Computer System Engineering
MIT OpenCourseWare http://ocw.mit.edu 6.033 Computer System Engineering Spring 2009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. Lec 19 : Nested atomic
More informationEECS 591 DISTRIBUTED SYSTEMS. Manos Kapritsos Winter 2018
EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Winter 2018 ATOMIC COMMIT Preserve data consistency for distributed transactions in the presence of failures Setup one coordinator a set of participants Each
More informationRecovering from a Crash. Three-Phase Commit
Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator
More informationFault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University
Fault Tolerance Part II CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Reliable Group Communication Reliable multicasting: A message that is sent to a process group should be delivered
More informationFault Tolerance. Distributed Systems. September 2002
Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend
More informationActions are never left partially executed. Actions leave the DB in a consistent state. Actions are not affected by other concurrent actions
Transaction Management Recovery (1) Review: ACID Properties Atomicity Actions are never left partially executed Consistency Actions leave the DB in a consistent state Isolation Actions are not affected
More informationATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases
ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases We talked about transactions and how to implement them in a single-node database. We ll now start looking into how to
More informationToday: Fault Tolerance. Failure Masking by Redundancy
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing
More informationCS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.
Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message
More informationControl. CS432: Distributed Systems Spring 2017
Transactions and Concurrency Control Reading Chapter 16, 17 (17.2,17.4,17.5 ) [Coulouris 11] Chapter 12 [Ozsu 10] 2 Objectives Learn about the following: Transactions in distributed systems Techniques
More informationDistributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf
Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need
More informationGoal A Distributed Transaction
Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties Things we need to implement transactions * Locks * Achieving atomicity through
More informationDistributed Systems COMP 212. Lecture 19 Othon Michail
Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails
More informationDistributed File System
Distributed File System Last Class NFS Design choices Opportunistic locking Local name spaces CS 138 XVIII 1 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All DFS Basic Info Global name space across
More informationDistributed Systems Fault Tolerance
Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All
More informationCS5412: TRANSACTIONS (I)
1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions Transactions Main issues: Concurrency control Recovery from failures 2 Distributed Transactions
More informationAgreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering
Agreement and Consensus SWE 622, Spring 2017 Distributed Software Engineering Today General agreement problems Fault tolerance limitations of 2PC 3PC Paxos + ZooKeeper 2 Midterm Recap 200 GMU SWE 622 Midterm
More informationFault Tolerance Part I. CS403/534 Distributed Systems Erkay Savas Sabanci University
Fault Tolerance Part I CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Overview Basic concepts Process resilience Reliable client-server communication Reliable group communication Distributed
More informationCausal Consistency and Two-Phase Commit
Causal Consistency and Two-Phase Commit CS 240: Computing Systems and Concurrency Lecture 16 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency
More informationDep. Systems Requirements
Dependable Systems Dep. Systems Requirements Availability the system is ready to be used immediately. A(t) = probability system is available for use at time t MTTF/(MTTF+MTTR) If MTTR can be kept small
More informationToday: Fault Tolerance. Replica Management
Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties
More informationTransactions. Transaction. Execution of a user program in a DBMS.
Transactions Transactions Transaction Execution of a user program in a DBMS. Transactions Transaction Execution of a user program in a DBMS. Transaction properties Atomicity: all-or-nothing execution Consistency:
More informationFault Tolerance. Basic Concepts
COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time
More informationSynchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17
Synchronization Part 2 REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17 1 Outline Part 2! Clock Synchronization! Clock Synchronization Algorithms!
More informationDistributed Database Management System UNIT-2. Concurrency Control. Transaction ACID rules. MCA 325, Distributed DBMS And Object Oriented Databases
Distributed Database Management System UNIT-2 Bharati Vidyapeeth s Institute of Computer Applications and Management, New Delhi-63,By Shivendra Goel. U2.1 Concurrency Control Concurrency control is a method
More informationBasic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit.
Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication One-one communication One-many communication Distributed commit Two phase commit Failure recovery
More informationCS October 2017
Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.
More informationChapter 4: Distributed Transactions (First Part) IPD, Forschungsbereich Systeme der Informationsverwaltung
Chapter 4: Distributed Transactions (First Part) IPD, Forschungsbereich e der Informationsverwaltung 1 Distributed Transactions (1) US Customers Transfer USD 500,-- from Klemens account to Jim s account.
More informationAssignment 12: Commit Protocols and Replication Solution
Data Modelling and Databases Exercise dates: May 24 / May 25, 2018 Ce Zhang, Gustavo Alonso Last update: June 04, 2018 Spring Semester 2018 Head TA: Ingo Müller Assignment 12: Commit Protocols and Replication
More informationTransaction Management. Pearson Education Limited 1995, 2005
Chapter 20 Transaction Management 1 Chapter 20 - Objectives Function and importance of transactions. Properties of transactions. Concurrency Control Deadlock and how it can be resolved. Granularity of
More informationCSE 5306 Distributed Systems
CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves
More informationDistributed Systems COMP 212. Revision 2 Othon Michail
Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise
More informationCSE 5306 Distributed Systems. Fault Tolerance
CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure
More informationTransaction Management & Concurrency Control. CS 377: Database Systems
Transaction Management & Concurrency Control CS 377: Database Systems Review: Database Properties Scalability Concurrency Data storage, indexing & query optimization Today & next class Persistency Security
More informationFailure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems
Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements
More informationChapter 5 Distributed Transaction Management
Chapter 5 Distributed Transaction Management Table of Contents Definition of a Transaction Properties of Transactions Supporting Atomicity of Distributed Transactions Chapter5-1 1 1. Definition of a Transaction
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties
More informationThe objective. Atomic Commit. The setup. Model. Preserve data consistency for distributed transactions in the presence of failures
The objective Atomic Commit Preserve data consistency for distributed transactions in the presence of failures Model The setup For each distributed transaction T: one coordinator a set of participants
More informationDistributed Systems (ICE 601) Fault Tolerance
Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Introduction Failure Model Fault Tolerance Models state machine primary-backup Class Overview Introduction Dependability availability reliability
More informationRecoverability. Kathleen Durant PhD CS3200
Recoverability Kathleen Durant PhD CS3200 1 Recovery Manager Recovery manager ensures the ACID principles of atomicity and durability Atomicity: either all actions in a transaction are done or none are
More informationChapter 8 Fault Tolerance
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to
More informationCS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit)
CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 21: Network Protocols (and 2 Phase Commit) 21.0 Main Point Protocol: agreement between two parties as to
More informationDistributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions
Distributed Systems Day 13: Distributed Transaction To Be or Not to Be Distributed.. Transactions Summary Background on Transactions ACID Semantics Distribute Transactions Terminology: Transaction manager,,
More informationOutline. Failure Types
Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Unit 10 WS 2013/2014 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet. Nikolaus
More informationPrimary-Backup Replication
Primary-Backup Replication CS 240: Computing Systems and Concurrency Lecture 7 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Simplified Fault Tolerance
More informationPART II. CS 245: Database System Principles. Notes 08: Failure Recovery. Integrity or consistency constraints. Integrity or correctness of data
CS 245: Database System Principles Notes 08: Failure Recovery PART II Crash recovery (2 lectures) Concurrency control (3 lectures) Transaction processing (2 lects) Information integration (1 lect) Ch.17[17]
More informationXI. Transactions CS Computer App in Business: Databases. Lecture Topics
XI. Lecture Topics Properties of Failures and Concurrency in SQL Implementation of Degrees of Isolation CS338 1 Problems Caused by Failures Accounts(, CId, BranchId, Balance) update Accounts set Balance
More informationConcurrency Control & Recovery
Transaction Management Overview CS 186, Fall 2002, Lecture 23 R & G Chapter 18 There are three side effects of acid. Enhanced long term memory, decreased short term memory, and I forget the third. - Timothy
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationThe Implicit{Yes Vote Commit Protocol. with Delegation of Commitment. Pittsburgh, PA Pittsburgh, PA are relatively less reliable.
The Implicit{Yes Vote Commit Protocol with Delegation of Commitment Yousef J. Al-Houmaily Panos K. Chrysanthis Dept. of Electrical Engineering Dept. of Computer Science University of Pittsburgh University
More informationMYE017 Distributed Systems. Kostas Magoutis
MYE017 Distributed Systems Kostas Magoutis magoutis@cse.uoi.gr http://www.cse.uoi.gr/~magoutis Message reception vs. delivery The logical organization of a distributed system to distinguish between message
More informationExam 2 Review. Fall 2011
Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows
More informationRecovery from failures
Lecture 05.02 Recovery from failures By Marina Barsky Winter 2017, University of Toronto Definition: Consistent state: all constraints are satisfied Consistent DB: DB in consistent state Observation: DB
More informationCprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University
Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable
More informationChapter 25: Advanced Transaction Processing
Chapter 25: Advanced Transaction Processing Transaction-Processing Monitors Transactional Workflows High-Performance Transaction Systems Main memory databases Real-Time Transaction Systems Long-Duration
More informationThe challenges of non-stable predicates. The challenges of non-stable predicates. The challenges of non-stable predicates
The challenges of non-stable predicates Consider a non-stable predicate Φ encoding, say, a safety property. We want to determine whether Φ holds for our program. The challenges of non-stable predicates
More informationExercise 12: Commit Protocols and Replication
Data Modelling and Databases (DMDB) ETH Zurich Spring Semester 2017 Systems Group Lecturer(s): Gustavo Alonso, Ce Zhang Date: May 22, 2017 Assistant(s): Claude Barthels, Eleftherios Sidirourgos, Eliza
More informationDistributed Databases
Distributed Databases Chapter 22, Part B Database Management Systems, 2 nd Edition. R. Ramakrishnan and Johannes Gehrke 1 Introduction Data is stored at several sites, each managed by a DBMS that can run
More informationDistributed Systems. Fault Tolerance. Paul Krzyzanowski
Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected
More informationDatabase Management Systems
Database Management Systems Distributed Databases Doug Shook What does it mean to be distributed? Multiple nodes connected by a network Data on the nodes is logically related The nodes do not need to be
More informationDistributed Data with ACID Transactions
Distributed Data with ACID Transactions 3-tier application with data distributed across multiple DBMSs Not replicating the data (yet) DBMS1... DBMS2 Application Server Clients Why Do This? Legacy systems
More informationCrash Recovery. Hector Garcia-Molina Stijn Vansummeren. CS 245 Notes 08 1
Crash Recovery Hector Garcia-Molina Stijn Vansummeren CS 245 Notes 08 1 Integrity or correctness of data Would like data to be accurate or correct at all times EMP Name White Green Gray Age 52 3421 1 CS
More informationDistributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance
Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Fault Tolerance Version: December 2, 2010 2 / 65 Contents Chapter
More informationCHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI
CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI PART 1 2 RECOVERY Topics 3 Introduction Transactions Transaction Log System Recovery Media Recovery Introduction
More information