Distributed File System
|
|
- Melvyn Greer
- 5 years ago
- Views:
Transcription
1 Distributed File System Last Class NFS Design choices Opportunistic locking Local name spaces CS 138 XVIII 1 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
2 DFS Basic Info Global name space across all servers in DFS (or Cell in DCE speak) Hierarchy Is partition into filesets Stored at a different servers Mapping stored in fileset location database (FLDB) fs system users projects sol7 osf1 twd motif dce FLDB bin bin usr bin CS 138 XVIII 2 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
3 DFS Mount Points fs FileSets are given names independent of hierarchy root.cell system users projects bin.sol7 bin.osf1 users.twd proj.motif proj.dce Mapping of FileSet name to server stored in FLDB sol7 osf1 twd motif dce bin bin.sol7 Client Client Client bin usr bin bin.osf1 users.twd proj.motif proj.dce Client Client Client FLDB FLDB mounting = symbolic link to the name To use FS must use name to look up server in FLDB Provides global name FLDB space: all clients use same name. CS 138 XVIII 4 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
4 DCE DFS V. NFS DCE: Single global name space Namespace is build into the central db (directory hierarchy) Mounts/lookups must use central databases NSF Independent disjoin names spaces based on mount CS 138 XVIII 5 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
5 Strict Consistency in DFS fs system users projects sol7 osf1 twd motif dce bin bin usr bin To read/write files. Clients request tokens. Client Client FLDB provides token and allows caching Client Client Client Client FLDB As long FLDB as tokens are held, client can make appropriate changes. CS 138 XVIII 6 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
6 fs DFS Tokens (1) system users projects sol7 osf1 twd motif dce bin bin usr bin Client File A: Read: File B: Write:0-512 Client File A: Read: File B: Write: FLDB FLDB FLDB CS 138 XVIII 7 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
7 fs DFS Tokens (2) system users projects sol7 osf1 twd motif dce bin bin usr bin Client File A: Read: Write(A, ) Client File A: Read: FLDB FLDB FLDB CS 138 XVIII 8 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
8 DFS Tokens (3) fs system users projects sol7 osf1 twd motif dce bin bin usr bin Revoke(A, Read, ) Client File A: Read: Client File A: Read: FLDB FLDB FLDB CS 138 XVIII 9 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
9 fs DFS Tokens (4) system users projects sol7 osf1 twd motif dce bin bin usr bin Client Grant(A, write, ) Client File A: Read: File A: Write: FLDB FLDB FLDB CS 138 XVIII 10 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
10 fs DFS Tokens (5) system users projects sol7 osf1 twd motif dce bin bin usr bin Client Read(A, ) Client File A: Read: File A: Write: FLDB FLDB FLDB CS 138 XVIII 11 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
11 fs DFS Tokens (6) system users projects sol7 osf1 twd motif dce bin bin usr bin Client Client File A: Read: File A: Write: FLDB FLDB FLDB CS 138 XVIII 12 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
12 fs DFS Tokens (7) system users projects sol7 osf1 twd motif dce bin bin usr bin Grant(A, read, ) Client File A: Read: Client File A: Read: FLDB FLDB FLDB CS 138 XVIII 13 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
13 Consistency Models NFS: weak or strict Strick if locks are used CIFS: strict consistency Uses opportunistic locks Revoke + flush DCE: Strict consistency Uses tokens Similar revoke+flush policy as CIFS CS 138 XVIII 14 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
14 Crash Recovery CS 138 XVIII 15 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
15 Client DFS Crash Recovery Client crash: server reclaims token should be able to revoke when client down crash: on reboot rebuilt token DB Client should keep using cache even when server failed Client Network failure: both are up but can t communicate Revoking tokens or using cache create issues Compromise: Client uses token until expire revokes unilateral When failure fixed, client changes may be rejected!!! CS 138 XVIII 16 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
16 DFS Recovery Problems Client application must participate! must recognize that operation returns timedout error must retry Due to semantics of tokens, it isn t feasible to provide NFS-style hard mount. Hard mount possible because NFS is stateless In DCE DFS state complicates hard mounting CS 138 XVIII 18 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
17 State It s required! exact Unix semantics - E.g. Need to know how many clients using file to delete mandatory locks CS 138 XVIII 21 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
18 State Recovery Recovery from crashed server clients reclaim state on server - grace period after crash during which no new state may be established recovery from crashed client server detects crash and nullifies client state information on server CS 138 XVIII 22 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
19 Coping with Non- Responsiveness Leases locks are granted for a fixed period of time - server-specified lease if lease not renewed before expiration, server may (unilaterally) revoke locks and share reservations - most client RPCs renew leases clients must contact server periodically - if clientid is rejected as stale, then server has restarted - server s grace period is equal to lease period CS 138 XVIII 23 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
20 DFS LFS Distributed file system!= Local file system s might give up on non-crashed clients clients may lose locks clients may lose files NFSv4 attempts to make such things unlikely CS 138 XVIII 27 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
21 Distributed File System Summary Opportunistic Locking Enables client to safely cache file and modify it Improves performance Global V. Local name spaces Global: DCE DFS Local: NFS Crash recovery Requires client participation Leases help deal with non responsiveness DFS!= LFS CS 138 XVIII 29 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
22 CS 138: Distributed Transactions CS 138 XVIII 30 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
23 Today Transactions Atomic of Transactions Five desired properties Two Phase Analyzed state machines for different roles Participant and Coordinator response to failures Blocking issue Three Phase Attempt to eliminate Blocking issue More complex protocol including more messages Analyzed state machines for different roles Much of this lecture is adapted from the textbook by Tanenbaum and Van Steen and from Chapter 7 of Concurrency Control and Recovery in Database Systems, by P. Bernstein, V. Hadzilacos, and N. Goodman, Addison-Wesley (1987). CS 138 XVIII 31 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
24 Transactions ACID property: atomic - all or nothing consistent - take system from one consistent state to another isolated - have no effect on other transactions until committed durable - persists CS 138 XVIII 32 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
25 Recall: Configuration Changes in Raft! Cannot switch directly from one configuration to another: conflicting majorities could arise See the paper for details C old time C new Majority of C old Majority of C new CS 138 XVIII 33 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
26 Roles in Atomic Transactions C = Coordinator Initiates a change P = Participants All nodes involved in making the atomic change CS 138 XVIII 34 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
27 Distributed Transactions withdraw(100); Begin Transaction; a.withdraw(100); b.deposit(50); c.deposit(50); End Transaction; Client (Participant) A (Participant) deposit(50); B (Participant) deposit(50); C (Participant) CS 138 XVIII 35 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
28 Coordination Coordinator withdraw(100); Begin Transaction; a.withdraw(100); b.deposit(50); c.deposit(50); End Transaction; Client (Participant) A (Participant) deposit(50); B (Participant) deposit(50); C (Participant) CS 138 XVIII 36 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
29 Atomic Properties AC1: All participants that reach a (commit/abort) decision reach the same one AC2: A participant cannot reverse its decision AC3: The commit decision can be reached only if all participants agree AC4: If there are no failures and all participants vote yes, then decision will be commit AC5: For any execution, if all failures are repaired and no new failures occur for a sufficiently long interval, then all participants will reach a (commit/abort) decision CS 138 XVIII 37 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
30 Two-Phase Phase 1 coordinator prepares to commit: - asks participants to vote either commit or abort participants respond appropriately Phase 2 coordinator decides outcome: - if all participants vote commit, outcome is commit, otherwise outcome is abort - outcome sent to all participants participants do what they re told CS 138 XVIII 38 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
31 Two-Phase Coordinator pariticpants pariticpants pariticpants pariticpants If all Then If any Abort Then Abort Make a change or Abort Response Decide to abort Or commit based On local state Once a node responds. The node can t change responses or Abort CS 138 XVIII 39 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
32 Two-Phase Coordinator pariticpants pariticpants pariticpants pariticpants If all Then If any Abort Then Abort (AC3, AC4) Make a change or Abort Response Decide to abort Or commit based On local state Once a node responds. The node can t change responses (AC2) or Abort (AC1) CS 138 XVIII 40 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
33 Coordinator State Diagram for Two-Phase Init app commit/vote req Coordinator Make a change pariticpants pariticpants pariticpants pariticpants any abort/abort Wait all commit/commit or Abort Abort Response Coordinator CS 138 XVIII 41 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
34 Participant State Diagram for Two-Phase Coordinator Make a change pariticpants pariticpants pariticpants pariticpants vote req/abort Init vote req/commit or Abort abort/ack Uncertain commit/ack Response Abort Participant CS 138 XVIII 42 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
35 State Diagrams for Two Phase Init app commit/vote req vote req/abort Init vote req/commit any abort/abort Wait all commit/commit abort/ack Uncertain commit/ack Abort Abort Coordinator Participant CS 138 XVIII 43 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
36 Failures Coordinator or participants could crash assume fail-stop - crash detected by time-out - no byzantine failures crashed machines restart - recover their state Focus on two failures Node failures Network failures (i.e., Lost messages) CS 138 XVIII 44 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
37 Crash Points Init app commit/vote req vote req/abort Init vote req/commit any abort/abort Wait all commit/commit abort/ack Uncertain commit/ack Abort Abort Coordinator Participant CS 138 XVIII 45 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
38 Crash Points Init app commit/vote req vote req/abort Init vote req/commit any abort/abort Wait all commit/commit abort/ack Uncertain commit/ack Abort Abort Coordinator Participant CS 138 XVIII 46 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
39 Coordinator Recovery from Participant Failure Init app commit/vote req Wait any abort/abort all commit/commit Abort Coordinator Coordinator in wait state It detects failure of participant Using timeout waiting for participant to say commit or abort Coordinator in Wait state can t assume either outcome Waiting for them to respond Takes no response as an abort Abort transaction!! CS 138 XVIII 47 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
40 Participant Recovery from Coordinator Failure vote req/abort abort/ack Abort Init Uncertain Participant vote req/commit commit/ack Participant in Uncertain state Participant in Uncertain state It detects failure of coordinator Using timeout waiting for coordinator to say commit or abort can t assume either outcome waits for coordinator to restart - On restart contact coordinator for final outcome CS 138 XVIII 48 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
41 Optimizing Participant Recovery Waiting for Coordinator to restart can take a LONG time! Especially if coordinator takes a while to reboot Participants contact other participants p contacts q (p is in Uncertain state) q is in: - commit (or abort) state p goes to commit (or abort) state - init state (hasn t yet voted) both q and p abort - uncertain state vote req/abort abort/ack Init vote req/commit Uncertain commit/ack both p and q remain uncertain Abort CS 138 XVIII 49 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
42 Optimizing Participant Recovery q is in: - commit (or abort) state p goes to commit (or abort) state - init state (hasn t yet voted) both q and p abort - uncertain state both p and q remain uncertain vote req/abort Init vote req/commit vote req/abort Init vote req/commit abort/ack Uncertain commit/ack abort/ack Uncertain commit/ack Node P Abort CS 138 XVIII 50 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
43 Optimizing Participant Recovery q is in: - commit (or abort) state p goes to commit (or abort) state - init state (hasn t yet voted) both q and p abort - uncertain state both p and q remain uncertain vote req/abort Init vote req/commit vote req/abort Init vote req/commit abort/ack Uncertain commit/ack abort/ack Uncertain commit/ack Node P Abort CS 138 XVIII 51 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
44 Optimizing Participant Recovery q is in: - commit (or abort) state p goes to commit (or abort) state - init state (hasn t yet voted) both q and p abort - uncertain state both p and q remain uncertain When all active nodes are uncertain, then this optimization does not provide a benefit. vote req/abort Init vote req/commit vote req/abort Init vote req/commit abort/ack Uncertain commit/ack abort/ack Uncertain commit/ack Node P Abort CS 138 XVIII 52 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
45 Two-Phase Works fine in practice! But all participants could conceivably be in uncertain state and coordinator is down (for a long time) Can we make it so such blocking can t happen? CS 138 XVIII 53 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
46 What Causes Blocking? Coordinator is down If all operational (not-failed) participants are in uncertain state, they are blocked If all participants are operational, they can elect new coordinator If any participant has crashed, the others don t know if it crashed before or after voting (to commit or abort) CS 138 XVIII 54 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
47 Guaranteeing Non-Blocking Non-blocking property (NB): if any operational process is in the Uncertain state, then no process (operational or failed) can have decided to commit If NB holds, then operational processes may elect new coordinator and decide to commit or abort CS 138 XVIII 55 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
48 Failures Coordinator or participants could crash no communication failures assume fail-stop - crash detected by time-out - no byzantine failures crashed machines restart - recover their state CS 138 XVIII 56 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
49 Three-Phase Phase 1 coordinator prepares to commit: - asks participants to vote either commit or abort participants respond appropriately Phase 2 coordinator counts votes: - if all participants vote commit, outcome is precommit, otherwise outcome is abort - outcome sent to all participants participants ack and either abort or wait for commit Phase 3 coordinator waits for all acks - if committing, sends final commit to all participants participants commit CS 138 XVIII 57 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
50 Two-Phase Coordinator pariticpants pariticpants pariticpants pariticpants If all Then If any Abort Then Abort Make a change or Abort Pre-commit/Abort ACK Decide to abort Or commit based On local state Pre- or Abort CS 138 XVIII 58 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
51 Revised State Diagrams Init app commit/vote req vote req/abort Init vote req/commit any abort/abort Wait all commit/precommit abort/ack Uncertain precommit/ack Abort Pre Abort Pre all ack/commit commit/commit CS 138 XVIII 59 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
52 Revised State Diagrams Init app commit/vote req vote req/abort Init vote req/commit any abort/abort Wait all commit/precommit abort/ack Uncertain precommit/ack Abort Pre Abort Pre all ack/commit commit/commit CS 138 XVIII 60 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
53 Coordinator Recovery from Participant Failure any abort/abort Init Wait app commit/vote req all commit/precommit Coordinator in Wait state It detects failure of a participant Using timeout waiting for participant to say commit or abort Abort Pre all ack/commit Coordinator in Wait state can t assume either outcome Waiting for them to respond Interprets absence of participant response as an abort Abort transaction!! Same as 2PC CS 138 XVIII 63 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
54 Coordinator Recovery from Participant Failure any abort/abort Abort Init Wait app commit/vote req all commit/precommit Pre all ack/commit Coordinator in Pre state It detects failure of participant Using timeout waiting for participant to say ACK Coordinator in Pre- state CAN assume they will commit So commit independently Failed nodes will learn on reboot - Either from coordinator or friend CS 138 XVIII 64 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
55 Participant Recovery from Coordinator Failure vote req/abort abort/ack Init Uncertain vote req/commit precommit/ack Participant in init state It detects failure of coordinator Using timeout waiting for coordinator to propose a vote Abort commit/commit Pre Safely Abort without waiting CS 138 XVIII 65 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
56 Participant Recovery from Coordinator Failure vote req/abort abort/ack Abort Init Uncertain?? vote req/commit commit/commit precommit/ack Pre Participant in uncertain state It detects failure of coordinator Using timeout waiting for coordinator to say commit or abort Participant in Uncertain state can t assume either outcome Contact another node to determine status CS 138 XVIII 67 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
57 Participant Recovery from Coordinator Failure vote req/abort abort/ack Init Uncertain vote req/commit precommit/ack Participant in Precommit state It detects failure of coordinator Using timeout waiting for coordinator to say commit Abort commit/commit Pre Participant, p, in Pre state Why not just COMMIT? There might be a node, q, out there in uncertain state If p commits and fails, there s a chance q can still ABORT and violate the earlier assumptions CS 138 XVIII 69 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
58 Details (1) If original coordinator remains operational participant crashes handled as in two-phase commit If participant times out in Uncertain state if any other participant has aborted, it aborts otherwise, it starts an election for a new coordinator - All in Uncertain or some in uncertain, some in Precommit If participant times out in Pre state it starts an election for a new coordinator All in precommit or some in uncertain, some in Precommit No node can be in the Abort state if any node is in Pre. If any in then they move into CS 138 XVIII 71 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
59 Participant Times out In Uncertain State vote req/abort Init vote req/commit Participant in uncertain state waiting for coordinator to say commit or abort abort/ack Abort Uncertain?? commit/commit Other participants can be in: precommit/ack Pre Uncertain these participants responded to vote with Yes Abort these participants responded to vote with Abrot Init these participants have not received the request to vote NO participant can be in Pre CS 138 XVIII 72 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
60 Participant Times out In Pre State vote req/abort Init vote req/commit Participant in Precommit state waiting for coordinator to say commit abort/ack Abort Uncertain commit/commit Other participants can be in: precommit/ack Pre Precommit received pre-commit msg from coordinator but waiting for commit received commit Uncertain these participants have not received precommit from coordinator. - Coordinator could have failed before sending precommit to them NO participant can be in Init/Abort - All must have voted yet for CS 138 XVIII 73 coordinator to send Pre Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
61 Coordinator Recovery: New Coordinator When newly elected coordinator starts up sends state-request message to all operational participants Coordinator collects states and proceeds according to four termination rules (termination protocol): TR1: if any participant is in Abort state, all are sent abort messages TR2: if some participant is in state, all are sent commit messages TR3: if all participants are in Uncertain state, all are sent abort messages TR4: if some participant is in Pre state, but none in State, those in Uncertain state are sent Pre messages; once these are acked, all participants are sent commit messages CS 138 XVIII 74 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
62 Total Failure What if coordinator and all participants fail? When they come back up, how do they decide? if resurrected participant either didn t vote or voted abort, it may unilaterally abort - otherwise, must run termination protocol (from last slide) but works only if last participant to fail has come back up - If last participant doesn t come back up, you don t know its current state CS 138 XVIII 78 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
63 Communication Failures Network could partition into multiple pieces Not sufficient to get agreement in a piece containing a quorum consensus is required for commit! Scenario all participants vote coordinator collects results network partitions before or after all results collected if network reconnects: easy network never fully reconnects, but each participant eventually can communicate (perhaps briefly) with all others CS 138 XVIII 79 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
64 Summary of Distributed Transaction Transactions Atomic of Transactions Five desired properties Two Phase Analyzed state machines for different roles Participant and Coordinator response to failures Blocking issue Three Phase Attempt to eliminate Blocking issue More complex protocol including more messages Analyzed state machines for different roles CS 138 XVIII 80 Copyright 2018 Theophilus Benson, Thomas W. Doeppner. All
Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions
Distributed Systems Day 13: Distributed Transaction To Be or Not to Be Distributed.. Transactions Summary Background on Transactions ACID Semantics Distribute Transactions Terminology: Transaction manager,,
More informationTransactions. CS 475, Spring 2018 Concurrent & Distributed Systems
Transactions CS 475, Spring 2018 Concurrent & Distributed Systems Review: Transactions boolean transfermoney(person from, Person to, float amount){ if(from.balance >= amount) { from.balance = from.balance
More informationThe objective. Atomic Commit. The setup. Model. Preserve data consistency for distributed transactions in the presence of failures
The objective Atomic Commit Preserve data consistency for distributed transactions in the presence of failures Model The setup For each distributed transaction T: one coordinator a set of participants
More informationModule 8 Fault Tolerance CS655! 8-1!
Module 8 Fault Tolerance CS655! 8-1! Module 8 - Fault Tolerance CS655! 8-2! Dependability Reliability! A measure of success with which a system conforms to some authoritative specification of its behavior.!
More informationDistributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf
Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need
More informationCOMMENTS. AC-1: AC-1 does not require all processes to reach a decision It does not even require all correct processes to reach a decision
ATOMIC COMMIT Preserve data consistency for distributed transactions in the presence of failures Setup one coordinator a set of participants Each process has access to a Distributed Transaction Log (DT
More informationDistributed Transactions
Distributed Transactions Preliminaries Last topic: transactions in a single machine This topic: transactions across machines Distribution typically addresses two needs: Split the work across multiple nodes
More informationModule 8 - Fault Tolerance
Module 8 - Fault Tolerance Dependability Reliability A measure of success with which a system conforms to some authoritative specification of its behavior. Probability that the system has not experienced
More informationCS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.
Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message
More informationCS 541 Database Systems. Three Phase Commit
CS 541 Database Systems Three Phase Commit 1 Introduction No ACP can eliminate blocking if total failures or total site failures are possible. 2PC may cause blocking even if there is a nontotal site failure
More informationThe challenges of non-stable predicates. The challenges of non-stable predicates. The challenges of non-stable predicates
The challenges of non-stable predicates Consider a non-stable predicate Φ encoding, say, a safety property. We want to determine whether Φ holds for our program. The challenges of non-stable predicates
More informationEECS 591 DISTRIBUTED SYSTEMS. Manos Kapritsos Winter 2018
EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Winter 2018 ATOMIC COMMIT Preserve data consistency for distributed transactions in the presence of failures Setup one coordinator a set of participants Each
More informationTopics in Reliable Distributed Systems
Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,
More informationPRIMARY-BACKUP REPLICATION
PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons
More informationConsistency in Distributed Systems
Consistency in Distributed Systems Recall the fundamental DS properties DS may be large in scale and widely distributed 1. concurrent execution of components 2. independent failure modes 3. transmission
More informationNFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency
Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency Local file systems Disks are terrible abstractions: low-level blocks, etc. Directories, files, links much
More informationCS 167 Final Exam Solutions
CS 167 Final Exam Solutions Spring 2018 Do all questions. 1. [20%] This question concerns a system employing a single (single-core) processor running a Unix-like operating system, in which interrupts are
More informationLast time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson
Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions Transactions Main issues: Concurrency control Recovery from failures 2 Distributed Transactions
More informationFailures, Elections, and Raft
Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright
More informationATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases
ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases We talked about transactions and how to implement them in a single-node database. We ll now start looking into how to
More informationCausal Consistency and Two-Phase Commit
Causal Consistency and Two-Phase Commit CS 240: Computing Systems and Concurrency Lecture 16 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency
More informationEECS 591 DISTRIBUTED SYSTEMS
EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Fall 2018 Slides by: Lorenzo Alvisi 3-PHASE COMMIT Coordinator I. sends VOTE-REQ to all participants 3. if (all votes are Yes) then send Precommit to all else
More informationControl. CS432: Distributed Systems Spring 2017
Transactions and Concurrency Control Reading Chapter 16, 17 (17.2,17.4,17.5 ) [Coulouris 11] Chapter 12 [Ozsu 10] 2 Objectives Learn about the following: Transactions in distributed systems Techniques
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 6: Reliability Reliable Distributed DB Management Reliability Failure models Scenarios CS 347 Notes 6 2 Reliability Correctness Serializability
More informationCS122 Lecture 15 Winter Term,
CS122 Lecture 15 Winter Term, 2017-2018 2 Transaction Processing Last time, introduced transaction processing ACID properties: Atomicity, consistency, isolation, durability Began talking about implementing
More informationRecall: Primary-Backup. State machine replication. Extend PB for high availability. Consensus 2. Mechanism: Replicate and separate servers
Replicated s, RAFT COS 8: Distributed Systems Lecture 8 Recall: Primary-Backup Mechanism: Replicate and separate servers Goal #: Provide a highly reliable service Goal #: Servers should behave just like
More informationAgreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering
Agreement and Consensus SWE 622, Spring 2017 Distributed Software Engineering Today General agreement problems Fault tolerance limitations of 2PC 3PC Paxos + ZooKeeper 2 Midterm Recap 200 GMU SWE 622 Midterm
More informationCS5412: TRANSACTIONS (I)
1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of
More informationCONCURRENCY CONTROL, TRANSACTIONS, LOCKING, AND RECOVERY
CONCURRENCY CONTROL, TRANSACTIONS, LOCKING, AND RECOVERY George Porter May 18, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative
More informationConsensus and related problems
Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?
More informationDistributed Commit in Asynchronous Systems
Distributed Commit in Asynchronous Systems Minsoo Ryu Department of Computer Science and Engineering 2 Distributed Commit Problem - Either everybody commits a transaction, or nobody - This means consensus!
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All
More informationNetwork File Systems
Network File Systems CS 240: Computing Systems and Concurrency Lecture 4 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Abstraction, abstraction, abstraction!
More informationTWO-PHASE COMMIT ATTRIBUTION 5/11/2018. George Porter May 9 and 11, 2018
TWO-PHASE COMMIT George Porter May 9 and 11, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons license These slides
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the
More informationDATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25
DATABASE TRANSACTIONS CS121: Relational Databases Fall 2017 Lecture 25 Database Transactions 2 Many situations where a sequence of database operations must be treated as a single unit A combination of
More informationDistributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi
1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.
More informationDistributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs
1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds
More information7 Fault Tolerant Distributed Transactions Commit protocols
7 Fault Tolerant Distributed Transactions Commit protocols 7.1 Subtransactions and distribution 7.2 Fault tolerance and commit processing 7.3 Requirements 7.4 One phase commit 7.5 Two phase commit x based
More informationTransactions. Transactions. Distributed Software Systems. A client s banking transaction. Bank Operations. Operations in Coordinator interface
ransactions ransactions Distributed Software Systems A transaction is a sequence of server operations that is guaranteed by the server to be atomic in the presence of multiple clients and server crashes.
More informationDistributed Databases. CS347 Lecture 16 June 6, 2001
Distributed Databases CS347 Lecture 16 June 6, 2001 1 Reliability Topics for the day Three-phase commit (3PC) Majority 3PC Network partitions Committing with partitions Concurrency control with partitions
More informationConsensus in Distributed Systems. Jeff Chase Duke University
Consensus in Distributed Systems Jeff Chase Duke University Consensus P 1 P 1 v 1 d 1 Unreliable multicast P 2 P 3 Consensus algorithm P 2 P 3 v 2 Step 1 Propose. v 3 d 2 Step 2 Decide. d 3 Generalizes
More informationToday: Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More informationGoal A Distributed Transaction
Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties Things we need to implement transactions * Locks * Achieving atomicity through
More informationP2 Recitation. Raft: A Consensus Algorithm for Replicated Logs
P2 Recitation Raft: A Consensus Algorithm for Replicated Logs Presented by Zeleena Kearney and Tushar Agarwal Diego Ongaro and John Ousterhout Stanford University Presentation adapted from the original
More informationCSE 444: Database Internals. Section 9: 2-Phase Commit and Replication
CSE 444: Database Internals Section 9: 2-Phase Commit and Replication 1 Today 2-Phase Commit Replication 2 Two-Phase Commit Protocol (2PC) One coordinator and many subordinates Phase 1: Prepare Phase 2:
More informationDistributed Transaction Management
Distributed Transaction Management Material from: Principles of Distributed Database Systems Özsu, M. Tamer, Valduriez, Patrick, 3rd ed. 2011 + Presented by C. Roncancio Distributed DBMS M. T. Özsu & P.
More informationDatabase Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.
Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 18 Transaction Processing and Database Manager In the previous
More informationTo do. Consensus and related problems. q Failure. q Raft
Consensus and related problems To do q Failure q Consensus and related problems q Raft Consensus We have seen protocols tailored for individual types of consensus/agreements Which process can enter the
More informationLecture 17 : Distributed Transactions 11/8/2017
Lecture 17 : Distributed Transactions 11/8/2017 Today: Two-phase commit. Last time: Parallel query processing Recap: Main ways to get parallelism: Across queries: - run multiple queries simultaneously
More informationZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems
ZooKeeper & Curator CS 475, Spring 2018 Concurrent & Distributed Systems Review: Agreement In distributed systems, we have multiple nodes that need to all agree that some object has some state Examples:
More informationPage 1. Goals of Today s Lecture. The ACID properties of Transactions. Transactions
Goals of Today s Lecture CS162 Operating Systems and Systems Programming Lecture 19 Transactions, Two Phase Locking (2PL), Two Phase Commit (2PC) Finish Transaction scheduling Two phase locking (2PL) and
More informationCS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management
CS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management Hector Garcia-Molina CS 347 Notes07 1 Reliable distributed database management Reliability Failure
More informationCPS 512 midterm exam #1, 10/7/2016
CPS 512 midterm exam #1, 10/7/2016 Your name please: NetID: Answer all questions. Please attempt to confine your answers to the boxes provided. If you don t know the answer to a question, then just say
More informationExtend PB for high availability. PB high availability via 2PC. Recall: Primary-Backup. Putting it all together for SMR:
Putting it all together for SMR: Two-Phase Commit, Leader Election RAFT COS 8: Distributed Systems Lecture Recall: Primary-Backup Mechanism: Replicate and separate servers Goal #: Provide a highly reliable
More informationSynchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17
Synchronization Part 2 REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17 1 Outline Part 2! Clock Synchronization! Clock Synchronization Algorithms!
More informationCS October 2017
Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties
More informationRecovery and Logging
Recovery and Logging Computer Science E-66 Harvard University David G. Sullivan, Ph.D. Review: ACID Properties A transaction has the following ACID properties: Atomicity: either all of its changes take
More informationDistributed Systems 8L for Part IB
Distributed Systems 8L for Part IB Handout 3 Dr. Steven Hand 1 Distributed Mutual Exclusion In first part of course, saw need to coordinate concurrent processes / threads In particular considered how to
More informationFault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit
Fault Tolerance o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication o Distributed Commit -1 Distributed Commit o A more general problem of atomic
More informationComputer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428. Fall 2012
Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2012 Indranil Gupta (Indy) October 18, 2012 Lecture 16 Concurrency Control Reading: Chapter 16.1,2,4 and 17.1,2,3,5 (relevant parts)
More informationDistributed Systems. 12. Concurrency Control. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems 12. Concurrency Control Paul Krzyzanowski Rutgers University Fall 2017 2014-2017 Paul Krzyzanowski 1 Why do we lock access to data? Locking (leasing) provides mutual exclusion Only
More information416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016
416 Distributed Systems Distributed File Systems 2 Jan 20, 2016 1 Outline Why Distributed File Systems? Basic mechanisms for building DFSs Using NFS and AFS as examples NFS: network file system AFS: andrew
More informationDistributed Systems Fault Tolerance
Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable
More informationBeyond FLP. Acknowledgement for presentation material. Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen
Beyond FLP Acknowledgement for presentation material Chapter 8: Distributed Systems Principles and Paradigms: Tanenbaum and Van Steen Paper trail blog: http://the-paper-trail.org/blog/consensus-protocols-paxos/
More informationConsistency. CS 475, Spring 2018 Concurrent & Distributed Systems
Consistency CS 475, Spring 2018 Concurrent & Distributed Systems Review: 2PC, Timeouts when Coordinator crashes What if the bank doesn t hear back from coordinator? If bank voted no, it s OK to abort If
More informationToday: Fault Tolerance. Fault Tolerance
Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing
More information) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)
) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All
More informationFault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior
Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure
More informationExam 2 Review. Fall 2011
Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows
More informationActions are never left partially executed. Actions leave the DB in a consistent state. Actions are not affected by other concurrent actions
Transaction Management Recovery (1) Review: ACID Properties Atomicity Actions are never left partially executed Consistency Actions leave the DB in a consistent state Isolation Actions are not affected
More informationParallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin
Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitioning (Different data at different sites More space Better
More informationThe transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery
Failures in complex systems propagate Concurrency Control, Locking, and Recovery COS 418: Distributed Systems Lecture 17 Say one bit in a DRAM fails: flips a bit in a kernel memory write causes a kernel
More informationCS352 Lecture - The Transaction Concept
CS352 Lecture - The Transaction Concept Last Revised 11/7/06 Objectives: 1. To introduce the notion of a transaction and the ACID properties of a transaction 2. To introduce the notion of the state of
More informationCSE 486/586: Distributed Systems
CSE 486/586: Distributed Systems Concurrency Control (part 3) Ethan Blanton Department of Computer Science and Engineering University at Buffalo Lost Update Some transaction T1 runs interleaved with some
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationDistributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09
Distributed File Systems CS 537 Lecture 15 Distributed File Systems Michael Swift Goal: view a distributed system as a file system Storage is distributed Web tries to make world a collection of hyperlinked
More informationToday: Fault Tolerance. Replica Management
Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery
More informationAssignment 12: Commit Protocols and Replication Solution
Data Modelling and Databases Exercise dates: May 24 / May 25, 2018 Ce Zhang, Gustavo Alonso Last update: June 04, 2018 Spring Semester 2018 Head TA: Ingo Müller Assignment 12: Commit Protocols and Replication
More informationRaft and Paxos Exam Rubric
1 of 10 03/28/2013 04:27 PM Raft and Paxos Exam Rubric Grading Where points are taken away for incorrect information, every section still has a minimum of 0 points. Raft Exam 1. (4 points, easy) Each figure
More informationGlobal atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol
Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed
More informationAdapting Commit Protocols for Large-Scale and Dynamic Distributed Applications
Adapting Commit Protocols for Large-Scale and Dynamic Distributed Applications Pawel Jurczyk and Li Xiong Emory University, Atlanta GA 30322, USA {pjurczy,lxiong}@emory.edu Abstract. The continued advances
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 13 - Distribution: transactions
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 13 - Distribution: transactions References Transaction Management in the R* Distributed Database Management System.
More informationCS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationNetwork File System (NFS)
Network File System (NFS) Nima Honarmand User A Typical Storage Stack (Linux) Kernel VFS (Virtual File System) ext4 btrfs fat32 nfs Page Cache Block Device Layer Network IO Scheduler Disk Driver Disk NFS
More informationPage 1. Goals of Today s Lecture" Two Key Questions" Goals of Transaction Scheduling"
Goals of Today s Lecture" CS162 Operating Systems and Systems Programming Lecture 19 Transactions, Two Phase Locking (2PL), Two Phase Commit (2PC)" Transaction scheduling Two phase locking (2PL) and strict
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 22: Transactions I CSE 344 - Fall 2014 1 Announcements HW6 due tomorrow night Next webquiz and hw out by end of the week HW7: Some Java programming required
More informationCS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Practical Byzantine Consensus CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Scenario Asynchronous system Signed messages s are state machines It has to be practical CS 138
More informationTransactions. Chapter 15. New Chapter. CS 2550 / Spring 2006 Principles of Database Systems. Roadmap. Concept of Transaction.
New Chapter CS 2550 / Spring 2006 Principles of Database Systems 09 Transactions Chapter 15 Transactions Alexandros Labrinidis University of Pittsburgh Alexandros Labrinidis, Univ. of Pittsburgh 2 CS 2550
More informationToday: Fault Tolerance. Reliable One-One Communication
Today: Fault Tolerance Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing Message logging Lecture 17, page 1 Reliable One-One Communication Issues
More informationExample: Transfer Euro 50 from A to B
TRANSACTIONS Example: Transfer Euro 50 from A to B 1. Read balance of A from DB into Variable a: read(a,a); 2. Subtract 50.- Euro from the balance: a:= a 50; 3. Write new balance back into DB: write(a,a);
More informationData Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of
More informationCS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationCS505: Distributed Systems
Department of Computer Science CS505: Distributed Systems Lecture 13: Distributed Transactions Outline Distributed Transactions Two Phase Commit and Three Phase Commit Non-blocking Atomic Commit with P
More informationCPS352 Lecture - The Transaction Concept
Objectives: CPS352 Lecture - The Transaction Concept Last Revised March 3, 2017 1. To introduce the notion of a transaction and the ACID properties of a transaction 2. To introduce the notion of the state
More informationò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?
Big picture (from Sandberg et al.) NFS Don Porter CSE 506 Intuition Challenges Instead of translating VFS requests into hard drive accesses, translate them into remote procedure calls to a server Simple,
More informationFault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all
Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures or predictable: exhibit a well defined failure behavior
More information