Distributed System
Gang Wu
Spring, 2018
Lecture 4: Failure & Fault Tolerance
Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the expectation of failure.
"Failure happens all the time. It is your number one concern."
Dependable system
  Availability
  Reliability
  Safety
  Maintainability
Crash at the Wrong Time
Examples
  Failure in the middle of an online purchase
  Failure during mv /home/fudan /home/sjtu
Problems
  Is the bill paid or not? Paid twice?
  Where are the files, and how many copies exist? One / zero / duplicated?
What guarantees do applications need?
Atomicity (All-or-nothing)
  A set of operations either all finish or none do
  No intermediate state exists upon recovery
Example: transfer $1000 from A ($3000) to B ($2000)
  A = A - 1000, B = B + 1000, both written to persistent storage
All-or-nothing is one of the guarantees offered by database transactions
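The transfer example can be sketched with a database transaction. This is a minimal illustration, assuming Python's standard sqlite3 module and an in-memory database; the table name, the function names, and the crash_midway flag are hypothetical, and a raised exception only simulates a crash between the two updates.

```python
import sqlite3


def make_bank():
    """Create an in-memory bank with the slide's balances: A=3000, B=2000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 3000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, crash_midway=False):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if crash_midway:
                # Hypothetical failure injected between the two updates.
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a dict, e.g. {'A': 3000, 'B': 2000}."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

When the simulated crash fires, the debit from A is rolled back, so no intermediate state (A debited, B not credited) is visible afterwards.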
Replication
Benefits
  High availability
  High performance, low latency
Techniques
  Organizing the replicas (primary and backups)
  Consistency
  Failure processing: what happens when a primary crashes?
  Atomicity...
Replication Management
Where to place the replicas
  Server replicas
  Content replicas
    Permanent copies
      Cluster (servers together): active-standby, or all servers working simultaneously (high concurrency)
      Mirroring
      Static configuration
    Server-initiated copies: elastic computing
    Client-initiated copies: client cache
Replication Management
How to organize the replicas
  Primary and backups
    Backups are maintained for availability only
    Updates are sent to the primary by the user
    Eventual consistency
  Master and slaves (coordinator)
    The master manages the work of the slaves
    Computing and data access are done by the slaves
    The master is a single point of failure
  Peer to peer
Failure management
  Primary and backups
    A backup crashes
    The primary crashes: elect a new primary
      Consensus: allows a group of nodes to agree on a result
      Paxos: a fault-tolerant distributed consensus algorithm (the best known; others, such as Raft, exist)
  Master and slaves (coordinator)
    A slave crashes
    The master crashes
  Peer to peer
  Heartbeat testing
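Heartbeat testing and primary replacement can be sketched as follows. This is a simplified illustration, not Paxos: a single monitor decides on its own, and promoting the lowest-id live node is only a stand-in for a real consensus round. The class, function, and parameter names are hypothetical, and times are passed in explicitly so the behaviour is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time per node.

    A node is suspected to have crashed after `timeout` seconds of silence.
    """

    def __init__(self, nodes, timeout, now=0.0):
        self.timeout = timeout
        self.last_seen = {node: now for node in nodes}

    def heartbeat(self, node, now):
        """Record that `node` sent a heartbeat at time `now`."""
        self.last_seen[node] = now

    def alive(self, now):
        """Return the sorted list of nodes heard from within the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )


def choose_primary(monitor, current_primary, now):
    """Keep the primary while it is alive; otherwise promote a live node.

    Picking the lowest id is an assumption made for this sketch. A real
    system needs consensus so that all nodes agree on the same new primary.
    """
    live = monitor.alive(now)
    if current_primary in live:
        return current_primary
    return live[0] if live else None
```

A missed heartbeat only means the node is suspected: a slow network looks the same as a crash, which is why agreeing on the new primary needs a consensus protocol.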
Failure management
Crash recovery
  Backward recovery
    Go back to a correct state
    Checkpoint: snapshot, distributed snapshot (message transfer)
    Costs a lot
  Forward recovery
    Go forward to a correct state with the help of redundant information
    Need to know what failure happened
  Checkpoint & logging
Failure management
Logging
  Keep a log of all update actions
  Each action has 3 required operations
    DO:   old state -> new state + log record
    UNDO: new state + log record -> old state
    REDO: old state + log record -> new state
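The three operations can be sketched on a small key-value store. This is an illustrative sketch under stated assumptions: each log record is a (key, old value, new value) tuple, an old value of None means the key did not exist, and the log is an in-memory list standing in for durable storage. All names are hypothetical.

```python
class LoggedStore:
    """A key-value store where every update appends (key, old, new) to a log."""

    def __init__(self):
        self.state = {}
        self.log = []

    def do(self, key, new_value):
        """DO: old state -> new state, plus a log record."""
        record = (key, self.state.get(key), new_value)
        self.log.append(record)  # write the log record before applying the change
        self.state[key] = new_value
        return record


def undo(state, record):
    """UNDO: new state + log record -> old state."""
    key, old, _new = record
    if old is None:
        state.pop(key, None)  # the key did not exist before this action
    else:
        state[key] = old


def redo(state, record):
    """REDO: old state + log record -> new state."""
    key, _old, new = record
    state[key] = new
```

Backward recovery applies undo to records in reverse order; replaying redo over the whole log from an empty state rebuilds the latest state.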
Distributed transactions
How about atomicity and concurrency control in distributed systems?
What the client wants
  Atomicity: the transfer either happens or does not happen at all
  Concurrency control: maintain serializability
Distributed transactions
Transaction Coordinator (TC)
  Begins the transaction
  Is responsible for commit/abort...
Distributed transactions
One-phase Commit
What can go wrong?
  1. A does not have enough money
  2. A node has crashed
  3. The coordinator has crashed
  4. Some other client is reading/writing A...
Distributed transactions
Correctness
  If one COMMITs, no one ABORTs
  If one ABORTs, no one COMMITs
Two-phase Commit (2PC)
  The commit step itself has two phases
  Phase 1: Voting
    Each participant prepares to commit, and votes on whether or not it can commit
  Phase 2: Committing
    Each participant actually commits or aborts
Two-phase Commit (2PC)
The Voting Phase
  TC asks each participant: cancommit(t)?
  Participants must prepare to commit using permanent storage before answering
    Objects are still locked
    Once a participant votes YES, it is not allowed to cause an ABORT
  Outcome of T is uncertain until docommit(t) or doabort(t)
    Other participants might still cause an ABORT
Two-phase Commit (2PC)
The Committing Phase
  TC collects all votes
    If unanimous YES, cause COMMIT
    If any participant voted NO, cause ABORT
  The fate of T is decided atomically at the TC, once all participants vote
    TC records the fate using permanent storage
    Then broadcasts docommit(t) or doabort(t) to participants
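The two phases can be sketched in a single process. This is a minimal, failure-free illustration: the method names cancommit, docommit, and doabort follow the slides, while the Participant and Coordinator classes, the can_commit flag, and the durable lists (standing in for permanent storage) are assumptions made for the sketch. There is no networking, locking, or timeout handling here.

```python
class Participant:
    """One participant in a transaction; `can_commit` fixes its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self._can_commit = can_commit
        self.durable = []  # stand-in for permanent storage
        self.state = "INIT"

    def cancommit(self, t):
        """Phase 1: prepare durably, then vote YES (True) or NO (False)."""
        if self._can_commit:
            self.durable.append(("prepared", t))  # prepare before answering YES
            self.state = "READY"
            return True
        self.state = "ABORT"
        return False

    def docommit(self, t):
        self.durable.append(("committed", t))
        self.state = "COMMIT"

    def doabort(self, t):
        self.durable.append(("aborted", t))
        self.state = "ABORT"


class Coordinator:
    """The transaction coordinator (TC)."""

    def __init__(self, participants):
        self.participants = participants
        self.durable = []  # stand-in for permanent storage

    def run(self, t):
        # Phase 1: collect a vote from every participant.
        votes = [p.cancommit(t) for p in self.participants]
        decision = "COMMIT" if all(votes) else "ABORT"
        # The fate of t is decided here; record it before telling anyone.
        self.durable.append((decision, t))
        # Phase 2: broadcast the outcome.
        for p in self.participants:
            if decision == "COMMIT":
                p.docommit(t)
            else:
                p.doabort(t)
        return decision
```

For example, Coordinator([Participant("A"), Participant("B", can_commit=False)]).run("t1") returns "ABORT", and both participants end in the ABORT state even though A voted YES.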
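Two-phase Commit (2PC)
TC's finite-state machine (message received / message sent)
  INIT -> WAIT:   Commit / Vote-request
  WAIT -> ABORT:  Vote-abort / Global-abort
  WAIT -> COMMIT: Vote-commit / Global-commit
Participant's finite-state machine (message received / message sent)
  INIT -> READY:   Vote-request / Vote-commit
  INIT -> ABORT:   Vote-request / Vote-abort
  READY -> ABORT:  Global-abort / ACK
  READY -> COMMIT: Global-commit / ACK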
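The two state machines can be written down as transition tables. This is a sketch, assuming the usual reading of the diagram labels as "message received / message sent"; the table layout, the extra local-vote field in the participant keys, and the step helper are hypothetical.

```python
# TC: (state, message received) -> (next state, message sent)
TC_FSM = {
    ("INIT", "Commit"): ("WAIT", "Vote-request"),
    ("WAIT", "Vote-abort"): ("ABORT", "Global-abort"),
    # Taken only once Vote-commit has arrived from every participant.
    ("WAIT", "Vote-commit"): ("COMMIT", "Global-commit"),
}

# Participant: (state, message received, local vote) -> (next state, message sent)
# INIT has two outcomes for Vote-request, so the key also carries the
# participant's own vote; None means the vote plays no role.
PARTICIPANT_FSM = {
    ("INIT", "Vote-request", "yes"): ("READY", "Vote-commit"),
    ("INIT", "Vote-request", "no"): ("ABORT", "Vote-abort"),
    ("READY", "Global-abort", None): ("ABORT", "ACK"),
    ("READY", "Global-commit", None): ("COMMIT", "ACK"),
}


def step(table, *key):
    """Return (next_state, message_to_send) for `key`.

    Raises ValueError for a transition the table does not define, for
    example any message received in a final state.
    """
    if key not in table:
        raise ValueError(f"no transition for {key}")
    return table[key]
```

ABORT and COMMIT have no outgoing entries in either table: once a final state is reached, the decision never changes.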
Two-phase Commit (2PC)
Timeout
  TC times out waiting for a participant's response
  Participant times out waiting for the TC's outcome message
  A participant sends Vote-abort on a timeout at INIT
  TC sends Global-abort to all on a timeout at WAIT
  A participant that times out at READY checks the other participants' status, or just blocks
  If every participant times out at READY, they can only block there
  3PC (three-phase commit)
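The timeout rules can be sketched as two small decision functions. This is an illustration with hypothetical function names. One rule goes beyond the slide: treating a peer still in INIT as grounds to abort comes from the standard cooperative termination protocol, where a peer that has not voted proves the TC cannot have decided COMMIT.

```python
def on_participant_timeout(my_state, peer_states):
    """Decide what a participant does when it times out.

    Returns "COMMIT", "ABORT", or "BLOCK" (keep waiting for the TC).
    `peer_states` holds the states reported by the other participants.
    """
    if my_state == "INIT":
        return "ABORT"  # no vote cast yet, so it is safe to send Vote-abort
    if my_state == "READY":
        if "COMMIT" in peer_states:
            return "COMMIT"  # the TC must have decided Global-commit
        if "ABORT" in peer_states or "INIT" in peer_states:
            # Assumption from the cooperative termination protocol: a peer
            # in INIT has not voted, so the TC cannot have decided COMMIT.
            return "ABORT"
        return "BLOCK"  # every peer is READY too: nobody knows the outcome
    return my_state  # already in a final state, nothing to decide


def on_tc_timeout(tc_state):
    """A TC that times out in WAIT aborts and sends Global-abort to all."""
    return "ABORT" if tc_state == "WAIT" else tc_state
```

The "BLOCK" result is the weakness of 2PC that 3PC is meant to address: with the TC down and every participant in READY, no one can safely decide.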