Fault Tolerance. Distributed Systems. September 2002


Basics
A component provides services to clients. To provide those services, the component may in turn require services from other components: a component may depend on some other component. Specifically, a component C depends on C* if the correctness of C's behavior depends on the correctness of C*'s behavior.

Dependable Systems
Dependable systems are characterized by:
o Availability: the percentage of time the system may be used immediately
o Reliability: mean time between failures
o Safety: how serious the impact of a failure is
o Maintainability: how long it takes to repair the system
o Security

Definitions
A system fails when it does not perform according to its specification. An error is a part of the system state that may lead to a failure. A fault is the cause of an error.
Types of faults:
o Transient: e.g., weather on a microwave link
o Intermittent: e.g., a bad solder joint
o Permanent: e.g., a coding error
Note: in distributed systems, components can be either processes or channels. A fault-tolerant system does not fail in the presence of faults.
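The availability and maintainability metrics listed above are commonly related through the standard steady-state formula A = MTBF / (MTBF + MTTR). A minimal numeric sketch of that relationship (illustrative only, not part of the original slides):

    # Steady-state availability from reliability (MTBF) and maintainability (MTTR).
    # Illustrative sketch; the numbers below are made up.
    def availability(mtbf_hours: float, mttr_hours: float) -> float:
        """Fraction of time the system is usable: MTBF / (MTBF + MTTR)."""
        return mtbf_hours / (mtbf_hours + mttr_hours)

    # Example: a failure every 1000 hours, 2 hours to repair -> about 99.8% available.
    print(f"{availability(1000, 2):.4f}")   # 0.9980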

Server Failure Models
o Crash failure: a server halts, but works correctly until it halts
o Omission failure: a server fails to respond to incoming requests
  - Receive omission: a server fails to receive incoming messages
  - Send omission: a server fails to send messages
o Timing failure: a server's response lies outside the specified time interval
o Response failure: the server's response is incorrect
  - Value failure: the value of the response is wrong
  - State-transition failure: the server deviates from the correct flow of control
o Arbitrary failure: a server may produce arbitrary responses at arbitrary times
Observation: crash failures are the least severe; arbitrary failures are the worst.

Crash Failure (1)
Problem: clients cannot distinguish between a crashed component and one that is just a bit slow.
Example: consider a server from which a client is expecting output:
o Is the server perhaps exhibiting timing or omission failures?
o Is the channel between client and server faulty (crashed, or exhibiting timing or omission failures)?

Crash Failure (2)
o Fail-silent: the component exhibits omission or crash failures; clients cannot tell what went wrong
o Fail-stop: the component exhibits crash failures, but its failure can be detected (either through announcement or timeouts)
o Fail-safe: the component exhibits arbitrary but benign failures (they can't do any harm)

Masking Errors with Redundancy
o Information redundancy: e.g., Hamming codes
o Time redundancy: timeout and retry
o Physical redundancy: triple modular redundancy
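Physical redundancy is easiest to see in triple modular redundancy: run three copies of a component and take a majority vote over their outputs, so a single faulty copy is masked. A minimal sketch of the voter (illustrative only; the replica outputs below are made up):

    # Triple modular redundancy: majority vote over three replica outputs.
    from collections import Counter

    def tmr_vote(outputs):
        """Return the majority value among three replica outputs, masking one faulty replica."""
        value, count = Counter(outputs).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority: more than one replica failed")
        return value

    # One replica returns a wrong value; the voter still yields the correct result.
    print(tmr_vote([42, 42, 17]))   # 42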

Processes
o Process groups
o Replicated processes
o Reaching agreement
Basic issue: protect yourself against faulty processes by replicating and distributing computations in a group.

Flat and Hierarchical Groups
Flat group:
o All processes are equal
o Decision making is complex
o Requires multicast
Hierarchical group:
o Processes communicate with a coordinator
o The coordinator makes the decisions
o The coordinator is a single point of failure

Replicated Processes
Fault tolerance can be achieved by:
o Primary-backup replication
o Replicated write
K-fault tolerant:
o The system can tolerate k faulty components
o Tolerating k fail-stop failures requires k+1 replicas
o Tolerating k Byzantine failures requires 2k+1 replicas, so that the correct replicas can outvote the faulty ones (see the sketch below)

Byzantine Generals (3)
(Figure: the generals announce their troop strengths in units of 1 kilosoldier; the vectors that each general assembles; and the vectors that each general receives in step 3.)
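As a rough illustration of the 2k+1 bound: with n = 2k+1 replicated writes and at most k arbitrary (Byzantine) replies, the k+1 correct replies always form a strict majority. A minimal sketch (simulated replies, not a real agreement protocol):

    # Why 2k+1 replicas mask k Byzantine failures: the correct majority wins the vote.
    from collections import Counter

    def masked_result(correct_value, byzantine_replies, k):
        """Majority vote over 2k+1 replies, of which at most k are arbitrary."""
        assert len(byzantine_replies) <= k
        n = 2 * k + 1
        replies = [correct_value] * (n - len(byzantine_replies)) + list(byzantine_replies)
        return Counter(replies).most_common(1)[0][0]

    print(masked_result(42, [13], k=1))       # 42: 2 correct replies outvote 1 faulty one
    print(masked_result(42, [13, 99], k=2))   # 42: 3 correct replies outvote 2 faulty ones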

Agreement in Faulty Systems (3)
(Figure: the same as the previous slide, except now with two loyal generals and one traitor.)

Reliable Communication
So far we have concentrated on process resilience (by means of process groups). What about reliable communication channels?
Error detection:
o Framing of packets to allow for bit-error detection
o Use of frame numbering to detect packet loss
Error correction:
o Add enough redundancy that corrupted packets can be corrected automatically
o Request retransmission of lost packets, or of the last N packets
Observation: most of this work assumes point-to-point communication.
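To make the frame-numbering bullet concrete: a receiver can detect lost packets simply by looking for gaps in the sequence numbers it has received and requesting retransmission of the missing ones. A minimal sketch (hypothetical message numbering, not tied to any particular protocol):

    # Detect packet loss via frame (sequence) numbering.
    def find_gaps(received_seqs):
        """Return the sequence numbers that are missing and should be retransmitted."""
        expected = range(min(received_seqs), max(received_seqs) + 1)
        return sorted(set(expected) - set(received_seqs))

    print(find_gaps([0, 1, 2, 4, 5, 7]))   # [3, 6]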

Reliable RPC/RMI?
Five classes of failures can occur in RPC/RMI:
o Client unable to locate the server: relatively simple, just report back to the client
o Invocation from client to server is lost: just resend the message
o Server crashes after receiving the request
o Reply message from the server is lost
o Client crashes after sending an invocation

Server Crashes
Server crashes are harder, because you don't know what the server had already done: normal operation, crash after execution, or crash before execution. Possible invocation semantics:
o At least once: retry until an acknowledgement is received
o At most once: give up and report failure
o Exactly once: the ideal target
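At-least-once semantics is the easiest of the three to sketch: the client simply resends the request until it receives a reply, which is only safe if the operation is idempotent (see the idempotence slide below). A minimal sketch; send_request is a hypothetical transport call assumed to raise TimeoutError when no reply arrives:

    # At-least-once RPC: retry until a reply is received.
    import time

    def call_at_least_once(send_request, request, timeout_s=1.0, max_retries=5):
        """Resend the request until a reply arrives; safe only for idempotent operations."""
        for attempt in range(max_retries):
            try:
                return send_request(request, timeout=timeout_s)
            except TimeoutError:
                time.sleep(timeout_s)    # back off, then retry
        raise RuntimeError("server unreachable after repeated retries")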

Reliable Communication
(Recap of the five RPC/RMI failure classes: client unable to locate the server, lost invocation, server crash after receiving the request, lost reply, client crash after sending an invocation.)

Reliable RPC/RMI
Detecting lost replies can be hard, because it may also be that the server has crashed: you don't know whether the server has carried out the operation. Solution: none in general, except that you can try to make your operations idempotent:
o repeatable without any harm done if the operation happens to have been carried out before.
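Idempotence is easiest to see by contrast: "set the balance to X" can be replayed harmlessly, while "add 100 to the balance" cannot. A small illustration (hypothetical account state, not from the slides):

    # Idempotent vs. non-idempotent operations.
    account = {"balance": 500}

    def set_balance(value):        # idempotent: replaying it changes nothing
        account["balance"] = value

    def add_to_balance(amount):    # not idempotent: each replay adds again
        account["balance"] += amount

    set_balance(600); set_balance(600)
    print(account["balance"])      # 600: the duplicated request is harmless

    add_to_balance(100); add_to_balance(100)
    print(account["balance"])      # 800: the duplicated request corrupted the state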

Reliable RPC/RMI
Problem: after a client crash, the server keeps doing work and holding resources for nothing (an orphan computation). Possible remedies:
o The orphan is killed (or rolled back) by the client when it reboots
o The client broadcasts a new epoch number when recovering; servers then kill orphan computations from old epochs
o Require computations to complete within T time units; older ones are simply removed

Reliable Multicast
Multicast can be useful for groups of replicas. What exactly is reliable multicast?
o Message order
o Interaction with group membership
o Failing and recovering members

Basic Reliable Multicast
Simplifying assumptions:
o Known, fixed group
o No failures
o An underlying unreliable multicast

Scalable Reliable Multicasting: Feedback Suppression
Basic idea: let a process P suppress its own feedback when it notices that another process Q is already asking for a retransmission (see the sketch below).
Assumptions:
o All receivers listen to a common feedback channel to which feedback messages are submitted
o Process P schedules its own feedback message after a random delay, and suppresses it when it observes another feedback message for the same data
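A much-simplified sketch of the suppression rule; the random delay is real, but the feedback "channel" is just a shared in-memory list standing in for the multicast NACK channel:

    # NACK feedback suppression: schedule a NACK after a random delay and cancel it
    # if another receiver's NACK for the same message is observed first.
    import random

    class Receiver:
        def __init__(self, name, feedback_channel):
            self.name = name
            self.channel = feedback_channel   # list of (sender, seq) NACKs everyone sees
            self.pending = {}                 # seq -> time at which to send our own NACK

        def detect_loss(self, seq, now):
            self.pending[seq] = now + random.uniform(0.0, 1.0)

        def tick(self, now):
            for seq, due in list(self.pending.items()):
                if any(s == seq for _, s in self.channel):   # someone already asked
                    del self.pending[seq]                    # suppress our NACK
                elif now >= due:
                    self.channel.append((self.name, seq))    # "multicast" our NACK
                    del self.pending[seq]

    channel = []
    receivers = [Receiver(f"R{i}", channel) for i in range(5)]
    for r in receivers:
        r.detect_loss(seq=7, now=0.0)          # all five receivers missed message 7
    for t in (0.25, 0.5, 0.75, 1.0):
        for r in receivers:
            r.tick(t)
    print(channel)   # one NACK instead of five; in a real network a few duplicates may slip through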

Multicast Feedback
o Only negative acknowledgements (NACKs) are used
o NACKs are themselves multicast, so receivers can suppress their own NACK for the same message
o Retransmissions are also multicast

Hierarchical Feedback Control
Each subgroup has a local coordinator that:
o forwards the message to its children
o handles retransmission requests

Atomic Multicast
o A message is delivered to all members of a group or to none
o Messages are delivered in the same order to all members
o What does it mean for a member to join or leave the group?
o Atomic multicast simplifies higher-level applications

Virtual Synchrony
Distinguish between message receipt (by the communication layer) and message delivery (to the application).

Virtual Synchrony
(Figure: the principle of virtually synchronous multicast. P1 joins the group, giving G={P1,P2,P3,P4}; P3 crashes, giving G={P1,P2,P4}; P3 later rejoins, restoring G={P1,P2,P3,P4}.)

Ordered Multicast
o FIFO ordering: if a correct process issues multicast(g, m1) followed by multicast(g, m2), then every correct process that delivers m2 will deliver m1 before m2 (see the sketch below).
o Causal ordering: if multicast(g, m1) happened-before multicast(g, m2), then any correct process that delivers m2 will deliver m1 before m2.
o Total ordering: if a correct process delivers m1 before it delivers m2, then any other correct process that delivers m2 will deliver m1 before m2.
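FIFO ordering is commonly implemented with per-sender sequence numbers and a hold-back queue: a receiver delivers message number n from sender S only after it has delivered messages 1..n-1 from S. A minimal sketch of the receiver side (not from the slides; message numbering is assumed to start at 1):

    # FIFO-ordered delivery: per-sender sequence numbers plus a hold-back queue.
    class FifoReceiver:
        def __init__(self):
            self.next_expected = {}   # sender -> next sequence number to deliver
            self.holdback = {}        # sender -> {seq: message} waiting to be delivered
            self.delivered = []       # messages handed to the application, in order

        def receive(self, sender, seq, msg):
            self.holdback.setdefault(sender, {})[seq] = msg
            expected = self.next_expected.setdefault(sender, 1)
            # Deliver as long as the next expected message from this sender is available.
            while expected in self.holdback[sender]:
                self.delivered.append(self.holdback[sender].pop(expected))
                expected += 1
            self.next_expected[sender] = expected

    r = FifoReceiver()
    r.receive("P1", 2, "m2")   # arrives early and is held back
    r.receive("P1", 1, "m1")   # now both can be delivered in FIFO order
    print(r.delivered)         # ['m1', 'm2']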

FIFO Ordering
(Figure: a timeline over processes P1-P5 illustrating FIFO-ordered delivery.)

Causal Ordering
(Figure: a timeline over processes P1-P5 illustrating causally ordered delivery.)

Total Causal Ordering
(Figure: a timeline over processes P1-P5 illustrating totally ordered causal delivery.)

Six Reliable Multicasts
o Reliable multicast: no basic message ordering, no total-ordered delivery
o FIFO multicast: FIFO-ordered delivery, no total-ordered delivery
o Causal multicast: causal-ordered delivery, no total-ordered delivery
o Atomic multicast: no basic message ordering, total-ordered delivery
o FIFO atomic multicast: FIFO-ordered delivery, total-ordered delivery
o Causal atomic multicast: causal-ordered delivery, total-ordered delivery

Totally Ordered Requests (ISIS)
(Figure: processes P1-P4 agree on a total order; legend: (1) message, (2) proposed sequence number, (3) agreed sequence number.)

Implementing Virtual Synchrony
o Process 4 notices that process 7 has crashed and sends a view change
o Process 6 sends out all its unstable messages, followed by a flush message
o Process 6 installs the new view when it has received a flush message from everyone else
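Returning to the ISIS figure above: each group member proposes a sequence number no smaller than any it has proposed or seen agreed, the sender picks the largest proposal as the agreed number, and messages are delivered in order of their agreed numbers. A much-simplified, single-process sketch of that negotiation (the real protocol also breaks ties with process identifiers):

    # ISIS-style total ordering: propose, then agree on a sequence number.
    class Member:
        def __init__(self):
            self.max_seq = 0                  # largest seq proposed or agreed so far

        def propose(self):
            self.max_seq += 1
            return self.max_seq               # step 2: proposed sequence number

        def agree(self, seq):
            self.max_seq = max(self.max_seq, seq)
            return seq                        # deliver messages in agreed-seq order

    members = [Member() for _ in range(4)]

    def total_order_multicast(msg):
        proposals = [m.propose() for m in members]   # collect proposals from the group
        agreed = max(proposals)                      # sender picks the maximum
        for m in members:
            m.agree(agreed)                          # step 3: announce the agreed number
        return agreed

    print(total_order_multicast("m1"))   # 1
    print(total_order_multicast("m2"))   # 2: every member delivers m1 before m2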

Two-Phase Commit (1)
a) The finite state machine for the coordinator in 2PC.
b) The finite state machine for a participant.

Two-Phase Commit (2)
Actions taken by a participant P residing in state READY, after having contacted another participant Q:
o Q is in COMMIT: P makes the transition to COMMIT
o Q is in ABORT: P makes the transition to ABORT
o Q is in INIT: P makes the transition to ABORT
o Q is in READY: P contacts another participant

Two-Phase Commit (3)
Outline of the steps taken by the coordinator in a two-phase commit protocol:

actions by coordinator:

    write START_2PC to local log;
    multicast VOTE_REQUEST to all participants;
    while not all votes have been collected {
        wait for any incoming vote;
        if timeout {
            write GLOBAL_ABORT to local log;
            multicast GLOBAL_ABORT to all participants;
            exit;
        }
        record vote;
    }
    if all participants sent VOTE_COMMIT and coordinator votes COMMIT {
        write GLOBAL_COMMIT to local log;
        multicast GLOBAL_COMMIT to all participants;
    } else {
        write GLOBAL_ABORT to local log;
        multicast GLOBAL_ABORT to all participants;
    }

Two-Phase Commit (4)
Steps taken by a participant process in 2PC:

actions by participant:

    write INIT to local log;
    wait for VOTE_REQUEST from coordinator;
    if timeout {
        write VOTE_ABORT to local log;
        exit;
    }
    if participant votes COMMIT {
        write VOTE_COMMIT to local log;
        send VOTE_COMMIT to coordinator;
        wait for DECISION from coordinator;
        if timeout {
            multicast DECISION_REQUEST to other participants;
            wait until DECISION is received;   /* remain blocked */
            write DECISION to local log;
        }
        if DECISION == GLOBAL_COMMIT
            write GLOBAL_COMMIT to local log;
        else if DECISION == GLOBAL_ABORT
            write GLOBAL_ABORT to local log;
    } else {
        write VOTE_ABORT to local log;
        send VOTE_ABORT to coordinator;
    }

Two-Phase Commit (5)
Steps taken for handling incoming decision requests:

actions for handling decision requests:   /* executed by a separate thread */

    while true {
        wait until any incoming DECISION_REQUEST is received;   /* remain blocked */
        read most recently recorded STATE from the local log;
        if STATE == GLOBAL_COMMIT
            send GLOBAL_COMMIT to requesting participant;
        else if STATE == INIT or STATE == GLOBAL_ABORT
            send GLOBAL_ABORT to requesting participant;
        else
            skip;   /* participant remains blocked */
    }

Three-Phase Commit
a) The finite state machine for the coordinator in 3PC.
b) The finite state machine for a participant.

Recovery: Stable Storage
(Figure: a) stable storage implemented with a pair of drives; b) a crash after drive 1 has been updated; c) a bad spot.)

Checkpointing
(Figure: a recovery line, the most recent consistent collection of checkpoints.)

Independent Checkpointing
(Figure: the domino effect.)

Message Logging
(Figure: incorrect replay of messages after recovery, leading to an orphan process.)