
Distributed System, Gang Wu, Spring 2018

Lecture 4: Failure & Fault Tolerance
  Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the expectation of failure.
  "Failure happens all the time. It is your number one concern."

Dependable system
  Availability
  Reliability
  Safety
  Maintainability

Crash at the Wrong Time
  Examples
    Failure in the middle of an online purchase
    Failure during mv /home/fudan /home/sjtu
  Problems
    Is the bill paid or not? Paid twice?
    Where are the files, and how many copies exist? One, zero, or duplicated?
  What guarantees do applications need?

Atomicity (All-or-nothing)
  A set of operations either all finish or none do
  No intermediate state exists upon recovery
  Example: transfer $1000 from A ($3000) to B ($2000)
    A = A - 1000, B = B + 1000, written to persistent storage
  All-or-nothing is one of the guarantees offered by database transactions
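The transfer example can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for a transactional store; the table layout, the helper names (make_bank, transfer, balances) and the crash_midway flag are hypothetical and exist only for illustration. An in-memory database is used, so durability on persistent storage is not demonstrated.

```python
import sqlite3


def make_bank():
    """Create an in-memory database holding A = 3000 and B = 2000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 3000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, crash_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if crash_midway:
                # Hypothetical failure injected between the two updates.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the simulated failure fires, the debit from A is rolled back, so no intermediate state (A debited, B not credited) is visible afterwards.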

Replication
  Benefits
    High availability
    High performance, low latency
  Techniques
    Organizing the replicas (primary and backups)
    Consistency
    Failure processing: what happens when a primary crashes?
    Atomicity...

Replication Management
  Where to place the replicas
    Server replicas
    Content replicas
      Permanent copies
        Cluster (servers together)
          Active-standby
          Work simultaneously (high concurrency)
        Mirroring
        Static configuration
      Server-initiated copies: elastic computing
      Client-initiated copies: client cache

Replication Management
  How to organize the replicas
    Primary and backups
      Backups are maintained for availability only
      Updates are sent to the primary by the user
      Eventual consistency
    Master and slaves (coordinator)
      The master manages the work of the slaves
      Computing and data access are done by the slaves
      The master is a single point of failure
    Peer to peer

Failure management
  Primary and backups
    A backup crashes
    The primary crashes: elect a new primary
      Consensus: allows a group of nodes to agree on a result
      Paxos: a fault-tolerant distributed consensus algorithm (the best known of several; Raft is another)
  Master and slaves (coordinator)
    A slave crashes
    The master crashes
  Peer to peer
  Heartbeat testing
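Heartbeat testing can be sketched as follows. This is a minimal illustration, not a production failure detector: the class name HeartbeatMonitor, its methods, and the timeout value are assumptions, and timestamps are passed in explicitly so the behaviour is deterministic. It only reports suspects; choosing a new primary would still need a consensus protocol.

```python
class HeartbeatMonitor:
    """Suspect a node if no heartbeat arrived within `timeout` time units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        """Record that `node` was alive at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now):
        """Return the sorted names of nodes whose last heartbeat is too old."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A silent node is only suspected, not known to have crashed: a slow network produces the same missing heartbeats as a dead machine.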

Failure management
  Crash recovery
    Backward recovery
      Go back to a correct state
      Checkpoint: snapshot, distributed snapshot (message transfer)
      Costs a lot
    Forward recovery
      Go forward to a correct state with the help of redundant information
      Requires knowing which failure happened
  Checkpoint & Logging
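Backward recovery with a checkpoint can be illustrated with a single-process sketch. The class CheckpointedState and its methods are hypothetical names; a distributed snapshot, which must also account for messages in transit, is not covered here.

```python
import copy


class CheckpointedState:
    """Keep one checkpoint of an in-memory state and roll back to it on demand."""

    def __init__(self):
        self.state = {}
        self._checkpoint = {}

    def checkpoint(self):
        """Save a full copy of the current state."""
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        """Backward recovery: return to the last saved state."""
        self.state = copy.deepcopy(self._checkpoint)
```

Copying the full state on every checkpoint is what makes this approach costly, and any update made after the checkpoint is lost on rollback unless it was also logged.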

Failure management
  Logging
    Keep a log of all update actions
    Each action has 3 required operations
      DO:   old state -> new state, and produces a log record
      UNDO: new state + log record -> old state
      REDO: old state + log record -> new state
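The three operations can be sketched on top of a dictionary. The class LoggedStore and the record layout (key, old value, new value) are assumptions made for illustration; appending the record before applying the update is also an assumption (the usual write-ahead order), and None is used as a stand-in for "key did not exist".

```python
class LoggedStore:
    """A key-value store that logs every update so it can be undone or redone."""

    def __init__(self):
        self.data = {}
        self.log = []  # records of the form (key, old_value, new_value)

    def do(self, key, new_value):
        """DO: log the old and new values, then apply the update."""
        old_value = self.data.get(key)  # None means the key was absent
        self.log.append((key, old_value, new_value))
        self.data[key] = new_value

    def undo_all(self):
        """UNDO: walk the log backwards, restoring each old value."""
        for key, old_value, _ in reversed(self.log):
            if old_value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = old_value

    @staticmethod
    def redo(log, start=None):
        """REDO: rebuild a state by replaying new values on top of `start`."""
        data = dict(start or {})
        for key, _, new_value in log:
            data[key] = new_value
        return data
```

Replaying the same log twice with redo yields the same state, which is why a recovery procedure built this way can safely be restarted after a second crash.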

Distributed transactions
  How about atomicity and concurrency control in distributed systems?
  What the client wants
    Atomicity: the transfer either happens or does not happen at all
    Concurrency control: maintain serializability

Distributed transactions
  Transaction Coordinator (TC)
    Begins the transaction
    Responsible for commit/abort...

Distributed transactions
  One-phase Commit
    Possible problems
      1. A does not have enough money
      2. A node has crashed
      3. The coordinator has crashed
      4. Some other client is reading/writing A...

Distributed transactions
  Correctness
    If one participant COMMITs, no one ABORTs
    If one participant ABORTs, no one COMMITs
  Two-phase Commit (2PC)
    The commit step itself has two phases
    Phase 1: Voting
      Each participant prepares to commit, and votes on whether or not it can commit
    Phase 2: Committing
      Each participant actually commits or aborts

Two-phase Commit (2PC)

Two-phase Commit (2PC)
  The Voting Phase
    TC asks each participant: cancommit(t)?
    Participants must prepare to commit using permanent storage before answering
      Objects are still locked
      Once a participant votes YES, it is not allowed to cause an ABORT
    The outcome of T is uncertain until docommit(t) or doabort(t)
      Other participants might still cause an ABORT

Two-phase Commit (2PC)
  The Committing Phase
    TC collects all votes
      If unanimous YES, cause COMMIT
      If any participant voted NO, cause ABORT
    The fate of T is decided atomically at the TC, once all participants have voted
      TC records the fate using permanent storage
      Then broadcasts docommit(t) or doabort(t) to participants
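The two phases can be put together in a failure-free, single-process sketch. The method names cancommit, docommit and doabort follow the slides; the classes Participant and Coordinator, the can_commit flag, and the in-memory decision_log (a stand-in for permanent storage) are assumptions. Crashes, timeouts and real messaging are deliberately left out.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self._can_commit = can_commit  # hypothetical switch to force a NO vote
        self.state = "INIT"

    def cancommit(self, t):
        """Phase 1: prepare and vote. A real participant would first force
        its prepared state to permanent storage."""
        self.state = "READY" if self._can_commit else "ABORT"
        return self._can_commit

    def docommit(self, t):
        self.state = "COMMIT"

    def doabort(self, t):
        self.state = "ABORT"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.decision_log = []  # stand-in for the TC's permanent storage

    def run(self, t):
        # Phase 1 (voting): ask every participant.
        votes = [p.cancommit(t) for p in self.participants]
        # The fate of t is decided here, at the TC, once all votes are in.
        decision = "COMMIT" if all(votes) else "ABORT"
        self.decision_log.append((t, decision))  # record before broadcasting
        # Phase 2 (committing): broadcast the outcome.
        for p in self.participants:
            if decision == "COMMIT":
                p.docommit(t)
            else:
                p.doabort(t)
        return decision
```

With a single NO vote every participant ends in ABORT, and with unanimous YES votes every participant ends in COMMIT, which is the correctness condition stated for distributed transactions.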

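Two-phase Commit (2PC)
  TC's finite-state machine (message received / message sent)
    INIT -> WAIT:    Commit / Vote-request
    WAIT -> ABORT:   Vote-abort / Global-abort
    WAIT -> COMMIT:  Vote-commit / Global-commit
  Participant's finite-state machine (message received / message sent)
    INIT -> READY:   Vote-request / Vote-commit
    INIT -> ABORT:   Vote-request / Vote-abort
    READY -> ABORT:  Global-abort / ACK
    READY -> COMMIT: Global-commit / ACK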
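The two state machines can be written down as lookup tables. This sketch reads the slide's labels as "message received / message sent" pairs on each transition, which is an assumption because the figure itself is not reproduced here; the names TC_FSM, PARTICIPANT_FSM and step are hypothetical. The TC's Vote-commit transition stands for the case where all participants voted to commit.

```python
# (current state, message received) -> {message sent: next state}
TC_FSM = {
    ("INIT", "Commit"): {"Vote-request": "WAIT"},
    ("WAIT", "Vote-abort"): {"Global-abort": "ABORT"},
    ("WAIT", "Vote-commit"): {"Global-commit": "COMMIT"},
}

PARTICIPANT_FSM = {
    ("INIT", "Vote-request"): {"Vote-commit": "READY", "Vote-abort": "ABORT"},
    ("READY", "Global-commit"): {"ACK": "COMMIT"},
    ("READY", "Global-abort"): {"ACK": "ABORT"},
}


def step(fsm, state, received, sent):
    """Return the next state, or raise ValueError for a transition
    that the table does not allow."""
    try:
        return fsm[(state, received)][sent]
    except KeyError:
        raise ValueError(
            f"illegal transition from {state}: {received} / {sent}"
        ) from None
```

Keeping the transitions in a table makes it easy to see that READY has no exit a participant can take on its own: it leaves only when a Global-commit or Global-abort arrives.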

Two-phase Commit (2PC)
  Timeout
    TC times out waiting for a participant's response
    A participant times out waiting for TC's outcome message
    A participant that times out in INIT sends Vote-abort
    A TC that times out in WAIT sends Global-abort to all participants
    A participant that times out in READY checks the other participants' states, or simply blocks
    If every participant times out in READY, they can only block there
  3PC (three-phase commit)
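The READY timeout case can be sketched as a small decision function. The slide only says that a participant checks the others' states or blocks; the concrete rule below (commit if a peer has committed, abort if a peer has aborted or is still in INIT, otherwise block) is the usual textbook cooperative-termination rule and is an assumption here, as is the function name resolve_ready_timeout.

```python
def resolve_ready_timeout(peer_states):
    """Decide what a participant stuck in READY may do after a timeout,
    given the states reported by the peers it could reach."""
    if "COMMIT" in peer_states:
        # Some peer already received Global-commit, so the decision was COMMIT.
        return "COMMIT"
    if "ABORT" in peer_states or "INIT" in peer_states:
        # A peer aborted, or has not voted yet, so COMMIT cannot have been decided.
        return "ABORT"
    # Every reachable peer is READY as well: nobody knows the outcome.
    return "BLOCK"
```

The BLOCK result is the weakness of 2PC that three-phase commit is meant to address.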
