CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University


Outline for Today's Talk
- Basic Concepts
- Process Resilience
- Reliable Client-Server Communication
- Reliable Group Communication
- Distributed Commit
- Recovery

Readings for Today's Lecture: Chapter 8 of Distributed Systems: Principles and Paradigms.

Introduction to Fault Tolerance CprE 450-550

Basic Concepts Being fault tolerant is strongly related to what are called dependable systems. Dependability implies the following:
- Availability
- Reliability
- Safety
- Maintainability
- Security

Basic Concepts (cont.) A system is said to fail when it cannot meet its promises. An error is a part of a system's state that may lead to a failure. The cause of an error is called a fault. Building dependable systems is closely related to controlling faults. Fault tolerance means that a system can provide its services even in the presence of faults.

Basic Concepts (cont.) Three kinds of faults:
- Transient fault: occurs once and then disappears.
- Intermittent fault: occurs, vanishes of its own accord, then reappears (e.g., a loose contact).
- Permanent fault: persists until the faulty component is repaired or replaced.

Failure Models Different types of failures:
- Crash failure: a server halts, but was working correctly until it halted.
- Omission failure: a server fails to respond to incoming requests.
  - Receive omission: a server fails to receive incoming messages.
  - Send omission: a server fails to send messages.
- Timing failure: a server's response lies outside the specified time interval.
- Response failure: the server's response is incorrect.
  - Value failure: the value of the response is wrong.
  - State-transition failure: the server deviates from the correct flow of control.
- Arbitrary failure: a server may produce arbitrary responses at arbitrary times.

Failure Masking by Redundancy Triple modular redundancy (TMR): each component is replicated three times, and a voter after each stage passes on the majority value.
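The masking idea behind TMR can be sketched as a majority voter; this is a minimal illustration (the `vote` helper is made up for this sketch, not part of the slides):

```python
def vote(a, b, c):
    """Majority voter used at each TMR stage: the output is the value that
    at least two of the three replicated components agree on, so a single
    faulty replica is masked."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: more than one replica failed")

# A single faulty replica (here reporting 3 instead of 7) is outvoted:
result = vote(7, 7, 3)
```

With one fault per stage the voter always recovers the correct value; two simultaneous faults in the same stage exceed what TMR can mask.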

Process Resilience Design issues:
- Group membership
- Failure masking and replication: primary-based replication; replicated-write protocols (quorum-based protocols, etc.)
- k-fault tolerance
- Agreement in faulty systems

Flat Groups versus Hierarchical Groups a) Communication in a flat group. b) Communication in a simple hierarchical group.

Agreement in Faulty Systems The general goal: have all the non-faulty processes reach consensus on some issue, and establish that consensus within a finite number of steps. Example: the two-army problem (two blue armies must agree on a joint attack, communicating only through territory held by the red army).

Agreement in Faulty Systems (2) Circumstances under which distributed agreement can be reached.

Agreement in Faulty Systems (3) The Byzantine generals problem for 3 loyal generals and 1 traitor. a) The generals announce their troop strengths (in units of 1 kilosoldier). b) The vectors that each general assembles based on (a). c) The vectors that each general receives in step 3.

Agreement in Faulty Systems (4) The same as on the previous slide, except now with 2 loyal generals and one traitor.

Agreement in Faulty Systems (5) Lamport et al. (1982): given m faulty processes, agreement can be achieved only if 2m+1 correctly functioning processes are present, for a total of 3m+1 processes.
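The 3m+1 bound can be exercised with a small simulation of the m = 1 case from the preceding slides: four generals announce their troop strengths, relay what they heard, and majority-vote per source. This is an illustrative sketch of the two-round exchange only, not the general recursive algorithm:

```python
import random
from collections import Counter

def byzantine_round(values, traitor, generals=(1, 2, 3, 4)):
    """One round of announce/relay/majority-vote with a single traitor."""
    # Round 1: each general i sends its value to every other general.
    # The traitor sends an arbitrary (random) value to each receiver.
    heard = {j: {} for j in generals}          # heard[j][i] = value j got from i
    for i in generals:
        for j in generals:
            if i != j:
                heard[j][i] = random.randint(0, 9) if i == traitor else values[i]

    # Round 2: each general relays what it heard; each loyal general then
    # takes a majority vote over all reports about each source.
    vectors = {}
    for j in generals:
        if j == traitor:
            continue
        vec = {}
        for k in generals:
            if k == j:
                continue
            reports = [heard[j][k]]            # what j heard directly from k
            for r in generals:                 # plus what the others say k said
                if r not in (j, k):
                    reports.append(random.randint(0, 9) if r == traitor
                                   else heard[r][k])
            vec[k] = Counter(reports).most_common(1)[0][0]   # majority wins
        vectors[j] = vec
    return vectors

values = {1: 1, 2: 2, 3: 3, 4: 4}   # troop strengths in kilosoldiers
vecs = byzantine_round(values, traitor=3)
# With 4 = 3m+1 generals and m = 1 traitor, every loyal general's vector
# carries the correct value for every other loyal general.
```

With only 3 generals the majority over a direct report plus one relay can be defeated by the traitor, which is exactly the 3m+1 lower bound.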

Reliable Client-Server Communication

Reliable Client-Server Communication A communication channel may exhibit crash, omission, timing, and arbitrary failures. Reliable transport protocol: TCP. RPC semantics in the presence of failures: RPC hides communication by making remote procedure calls look like local ones. If both the client and the server work perfectly, RPC does its job well. Problem: it is not easy to mask the difference between remote and local calls in the presence of failures.

Reliable Client-Server Communication Five different classes of failures that can occur in RPC systems: 1. The client is unable to locate the server. 2. The request message from the client to the server is lost. 3. The server crashes after receiving a request. 4. The reply message from the server to the client is lost. 5. The client crashes after sending a request.

RPC: Server Crashes A server in client-server communication: a) normal case, b) crash after execution, c) crash before execution. Possible semantics:
- Guarantee nothing
- At-least-once semantics
- At-most-once semantics
- Exactly-once semantics

RPC: Server Crashes (2) Three events that can happen at the server: Send the completion message (M), Print the text (P), Crash (C).

RPC: Server Crashes (3) These events can occur in six different orderings (parentheses mark events that never take place because of the crash):
1. M, P, C: a crash occurs after sending the completion message and printing the text.
2. M, C (P): a crash happens after sending the completion message, but before the text could be printed.
3. P, M, C: a crash occurs after printing the text and sending the completion message.
4. P, C (M): the text is printed, after which a crash occurs before the completion message could be sent.
5. C (P, M): a crash happens before the server could do anything.
6. C (M, P): a crash happens before the server could do anything.

RPC: Server Crashes (4) Different combinations of client and server strategies in the presence of server crashes.

RPC: Lost Reply Message Idempotent: an operation that can safely be repeated as often as necessary with no damage being done (e.g., reading a block from a file). For idempotent requests, a lost reply is harmless: the client simply retransmits the request.
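When an operation is not idempotent (e.g., a bank deposit), a common remedy is to tag each request with an identifier and have the server deduplicate retries, making at-least-once retransmission safe. A minimal sketch (the class and method names are made up for illustration):

```python
class BankServer:
    """Server that caches replies per request id, so a client retry after a
    lost reply does not apply a non-idempotent operation twice."""
    def __init__(self):
        self.balance = 100
        self.seen = {}                       # request id -> cached reply

    def deposit(self, request_id, amount):
        if request_id in self.seen:          # duplicate request: replay reply
            return self.seen[request_id]
        self.balance += amount               # execute the operation once
        self.seen[request_id] = self.balance
        return self.balance

server = BankServer()
first = server.deposit("req-1", 50)
retry = server.deposit("req-1", 50)          # client retried after a lost reply
# first == retry, and the deposit was applied only once
```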

RPC: Client Crashes What happens if a client sends a request to a server to do some work and crashes before the server replies? Orphan: a computation that is still active even though no parent is waiting for its result. Orphans waste CPU cycles and can lock files or otherwise tie up valuable resources.

RPC: Client Crashes (2) Four things can be done:
- Extermination: the client stub logs what it is about to do before sending each RPC message. After a reboot, the log is checked and any orphan is explicitly killed off.
- Reincarnation: divide time into sequentially numbered epochs. After a reboot, the client broadcasts a message to all machines declaring the start of a new epoch. When such a broadcast is received, all remote computations on behalf of that client are killed.
- Gentle reincarnation: when an epoch broadcast comes in, each machine checks whether it has any remote computations and, if so, tries to locate their owner. A computation is killed only if its owner cannot be found.
- Expiration: each RPC is given a standard amount of time T to do its job; if it cannot finish in time, it must explicitly ask for another quantum, so after a client crash an orphan dies when its time expires.
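The reincarnation scheme can be sketched as an epoch table kept by each machine; the class and field names here are assumptions for illustration:

```python
class EpochTable:
    """Reincarnation sketch: a rebooted client broadcasts a new epoch number,
    and each machine kills every computation started on that client's behalf
    in an earlier epoch."""
    def __init__(self):
        self.computations = []               # (owner, epoch) pairs

    def start(self, owner, epoch):
        self.computations.append((owner, epoch))

    def on_new_epoch(self, owner, epoch):
        # Keep only computations that belong to other clients, or that were
        # started in the new (or a later) epoch.
        self.computations = [(o, e) for (o, e) in self.computations
                             if o != owner or e >= epoch]
```

Gentle reincarnation would differ only in trying to locate the owner before killing, killing the computation only when the owner cannot be found.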

Reliable Group Communication

Reliable Group Communication TCP offers reliable point-to-point channels. How do we implement reliable group communication? One way: let each process set up a point-to-point connection to every other process it wants to communicate with. This is not efficient. What do we mean by reliable group communication? A message that is sent to a process group should be delivered to each member of that group.

Reliable Group Communication (2) The definition needs refinement: What about a new member joining the group during the communication? What about an existing member leaving the group? What about crashes? In the presence of faulty processes, multicasting is considered reliable when it can be guaranteed that all non-faulty group members receive the message. Agreement must be reached on group membership.

Basic Reliable-Multicasting Schemes A simple solution to reliable multicasting when all receivers are known and are assumed not to fail a) Message transmission b) Reporting feedback
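The sender side of this simple scheme can be sketched as sequence numbers plus a history buffer: a message is dropped from the buffer only once every receiver has acknowledged it. Class and method names are assumptions for illustration, and the actual network send is omitted:

```python
class ReliableMulticastSender:
    """Number each multicast message, keep it in a history buffer, and track
    which receivers have not yet acknowledged it, so it can be retransmitted."""
    def __init__(self, receivers):
        self.receivers = receivers
        self.next_seq = 0
        self.history = {}                    # seq -> message (for retransmission)
        self.pending = {}                    # seq -> receivers that have not ACKed

    def multicast(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = msg
        self.pending[seq] = set(self.receivers)
        return seq                           # transport send omitted in this sketch

    def on_ack(self, seq, receiver):
        self.pending[seq].discard(receiver)
        if not self.pending[seq]:            # everyone has the message:
            del self.history[seq]            # safe to drop it from the buffer
            del self.pending[seq]

    def to_retransmit(self, seq):
        return [(r, self.history[seq]) for r in self.pending.get(seq, ())]
```

The per-message receiver set is exactly why this scheme does not scale: with N receivers the sender processes N ACKs per message, which motivates the feedback-implosion discussion on the next slide.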

Scalability in Reliable Multicasting Problem: feedback implosion (the sender is swamped by acknowledgements). Will negative acknowledgements solve the problem? Is there any problem with returning only negative acknowledgements? One remedy: feedback suppression.

Nonhierarchical Feedback Control Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
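The suppression mechanism on this slide can be sketched as a random backoff: every receiver missing a message schedules its retransmission request after a random delay, and the first multicast request silences the rest. The function name and the 500 ms window are assumptions for illustration:

```python
import random

def feedback_suppression(receivers, rng):
    """Sketch of NACK suppression: each receiver that missed the message
    draws a random backoff; the receiver with the earliest timer multicasts
    its retransmission request, and all others see it and suppress theirs."""
    delays = {r: rng.uniform(0.0, 0.5) for r in receivers}   # assumed 500 ms window
    first = min(delays, key=delays.get)
    # Every other receiver sees first's multicast NACK before its own timer
    # fires and cancels its request, so the sender gets a single request.
    suppressed = [r for r in receivers if r != first]
    return first, suppressed

rng = random.Random(42)
requester, quiet = feedback_suppression(["r1", "r2", "r3", "r4"], rng)
# only `requester` actually sends a NACK; the other three stay quiet
```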

Hierarchical Feedback Control The essence of hierarchical reliable multicasting. a) Each local coordinator forwards the message to its children. b) A local coordinator handles retransmission requests.

Atomic Multicast Atomic multicast problem: a message is delivered either to all processes or to none at all. In addition, all messages are delivered in the same order to all processes. Why is this important? Example: a replicated database.

Virtual Synchrony Group view: the delivery list, i.e., the set of processes contained in the group. View change. Virtually synchronous: a message multicast to group view G is delivered to each non-faulty process in G. If the sender of the message crashes during the multicast, the message may either be delivered to all remaining processes or ignored by each of them. The logical organization of a distributed system distinguishes between message receipt and message delivery.

Virtual Synchrony (2) The principle of virtual synchronous multicast.

Message Ordering Four different orderings:
- Unordered multicasts
- FIFO-ordered multicasts
- Causally-ordered multicasts
- Totally-ordered multicasts

Three communicating processes in the same group (the ordering of events per process is shown along the vertical axis):
Process P1: sends m1; sends m2
Process P2: receives m1; receives m2
Process P3: receives m2; receives m1

Message Ordering (2) Four processes in the same group with two different senders, and a possible delivery order of messages under FIFO-ordered multicasting:
Process P1: sends m1; sends m2
Process P2: receives m1; receives m3; receives m2; receives m4
Process P3: receives m3; receives m1; receives m2; receives m4
Process P4: sends m3; sends m4
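FIFO-ordered delivery can be sketched with per-sender sequence numbers: a receiver delivers messages from each sender in the order sent, buffering any that arrive early. The class name is an assumption for illustration:

```python
class FifoDelivery:
    """Deliver messages from each sender in sending order: hold back any
    message whose per-sender sequence number is ahead of what we expect."""
    def __init__(self):
        self.expected = {}                   # sender -> next expected seq number
        self.held = {}                       # sender -> {seq: buffered message}
        self.delivered = []                  # application-level delivery order

    def receive(self, sender, seq, msg):
        self.held.setdefault(sender, {})[seq] = msg
        nxt = self.expected.get(sender, 0)
        while nxt in self.held[sender]:      # deliver as long as the next
            self.delivered.append(self.held[sender].pop(nxt))   # one is here
            nxt += 1
        self.expected[sender] = nxt

f = FifoDelivery()
f.receive("p1", 1, "m2")                     # arrives out of order: buffered
f.receive("p1", 0, "m1")                     # now both can be delivered in order
```

Note that this enforces ordering only per sender, which is exactly why, on the slide above, m3 may be delivered before m1: they come from different senders.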

Implementing Virtual Synchrony Six different versions of virtually synchronous reliable multicasting.

Implementing Virtual Synchrony (2) a) Process 4 notices that process 7 has crashed, sends a view change b) Process 6 sends out all its unstable messages, followed by a flush message c) Process 6 installs the new view when it has received a flush message from everyone else

Distributed Commit

Distributed Commit The distributed commit problem involves having an operation performed by each member of a process group, or by none at all. One-phase commit protocol. Problem: if one of the participants cannot perform the operation, it has no way to tell the coordinator. Two-phase commit protocol. Three-phase commit protocol.

Two-Phase Commit Protocol
1. The coordinator sends a VOTE_REQUEST message to all participants.
2. When a participant receives VOTE_REQUEST, it returns either VOTE_COMMIT or VOTE_ABORT to the coordinator.
3. The coordinator collects all votes from the participants. If all participants have voted to commit the transaction, then so will the coordinator; in that case, it sends a GLOBAL_COMMIT to all participants. Otherwise, it multicasts a GLOBAL_ABORT.
4. Each participant that voted to commit waits for the final reaction of the coordinator: it commits locally if a GLOBAL_COMMIT is received, and aborts if a GLOBAL_ABORT is received.
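The coordinator's decision rule in step 3 boils down to a single conjunction over the collected votes; a minimal sketch, where a vote missing after a timeout is modeled as None:

```python
def coordinator_decision(votes):
    """2PC phase two: GLOBAL_COMMIT only if every participant voted
    VOTE_COMMIT; any VOTE_ABORT or timed-out (None) vote forces an abort."""
    if all(v == "VOTE_COMMIT" for v in votes.values()):
        return "GLOBAL_COMMIT"
    return "GLOBAL_ABORT"

# One abstaining or aborting participant is enough to abort globally:
decision = coordinator_decision({"p1": "VOTE_COMMIT", "p2": None})
```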

Two-Phase Commit a) The finite state machine for the coordinator in 2PC. b) The finite state machine for a participant.

Two-Phase Commit (2) Actions taken by a participant P when residing in state READY and having contacted another participant Q:
- Q is in COMMIT: P makes the transition to COMMIT.
- Q is in ABORT: P makes the transition to ABORT.
- Q is in INIT: P makes the transition to ABORT.
- Q is in READY: P contacts another participant.
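This cooperative termination protocol maps directly to a lookup; a sketch (the returned action strings are illustrative, not from the slides):

```python
def ready_participant_action(q_state):
    """What participant P, blocked in READY, does after learning the state
    of another participant Q."""
    actions = {
        "COMMIT": "make transition to COMMIT",
        "ABORT":  "make transition to ABORT",
        # Q still in INIT means the coordinator cannot yet have decided to
        # commit, so P can safely abort.
        "INIT":   "make transition to ABORT",
        "READY":  "contact another participant",
    }
    return actions[q_state]
```

If every participant P contacts is also in READY, none of them can decide, which is the blocking weakness of 2PC that three-phase commit addresses.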

Two-Phase Commit (3) Outline of the steps taken by the coordinator in a two-phase commit protocol.

Actions by coordinator:

write START_2PC to local log;
multicast VOTE_REQUEST to all participants;
while not all votes have been collected {
    wait for any incoming vote;
    if timeout {
        write GLOBAL_ABORT to local log;
        multicast GLOBAL_ABORT to all participants;
        exit;
    }
    record vote;
}
if all participants sent VOTE_COMMIT and coordinator votes COMMIT {
    write GLOBAL_COMMIT to local log;
    multicast GLOBAL_COMMIT to all participants;
} else {
    write GLOBAL_ABORT to local log;
    multicast GLOBAL_ABORT to all participants;
}

Two-Phase Commit (4) Steps taken by a participant process in 2PC.

Actions by participant:

write INIT to local log;
wait for VOTE_REQUEST from coordinator;
if timeout {
    write VOTE_ABORT to local log;
    exit;
}
if participant votes COMMIT {
    write VOTE_COMMIT to local log;
    send VOTE_COMMIT to coordinator;
    wait for DECISION from coordinator;
    if timeout {
        multicast DECISION_REQUEST to other participants;
        wait until DECISION is received;  /* remain blocked */
        write DECISION to local log;
    }
    if DECISION == GLOBAL_COMMIT
        write GLOBAL_COMMIT to local log;
    else if DECISION == GLOBAL_ABORT
        write GLOBAL_ABORT to local log;
} else {
    write VOTE_ABORT to local log;
    send VOTE_ABORT to coordinator;
}

Two-Phase Commit (5) Steps taken for handling incoming decision requests.

Actions for handling decision requests (executed by a separate thread):

while true {
    wait until any incoming DECISION_REQUEST is received;  /* remain blocked */
    read most recently recorded STATE from the local log;
    if STATE == GLOBAL_COMMIT
        send GLOBAL_COMMIT to requesting participant;
    else if STATE == INIT or STATE == GLOBAL_ABORT
        send GLOBAL_ABORT to requesting participant;
    else
        skip;  /* participant remains blocked */
}

Three-Phase Commit The states of the coordinator and each participant satisfy the following two conditions: 1. There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state. 2. There is no state in which it is not possible to make a final decision, and from which a transition to a COMMIT state can be made.

Three-Phase Commit (2) a) Finite state machine for the coordinator in 3PC b) Finite state machine for a participant

Recovery

Recovery Two forms of error recovery:
- Backward recovery: return the system from its erroneous state to a previously correct state, recorded in a checkpoint.
- Forward recovery: bring the system from its erroneous state to a new correct state from which execution can continue.
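Backward recovery via a checkpoint can be sketched in a few lines, using write-then-rename to approximate stable storage (the class, file name, and state layout are assumptions for illustration):

```python
import os
import pickle
import tempfile

class CheckpointedCounter:
    """Backward-recovery sketch: state is periodically saved to a checkpoint
    file; after a crash, recover() rolls the process back to that state."""
    def __init__(self, path):
        self.path = path
        self.state = {"count": 0}

    def checkpoint(self):
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.state, f)   # write the new checkpoint first...
        os.replace(tmp, self.path)       # ...then atomically replace the old one

    def recover(self):
        with open(self.path, "rb") as f:
            self.state = pickle.load(f)  # roll back to the last checkpoint

path = os.path.join(tempfile.mkdtemp(), "ckpt")
c = CheckpointedCounter(path)
c.state["count"] = 5
c.checkpoint()
c.state["count"] = 9                     # work done after the checkpoint...
c.recover()                              # ...is lost on rollback
```

The write-then-rename step mirrors the stable-storage idea on the next slide: a crash in the middle of writing a checkpoint must not corrupt the previous one.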

Recovery Stable Storage a) Stable Storage b) Crash after drive 1 is updated c) Bad spot

Checkpointing A recovery line.

Independent Checkpointing The domino effect.

Message Logging Incorrect replay of messages after recovery, leading to an orphan process.

Message Logging Schemes A message is called stable if it can no longer be lost. Two classes of schemes: pessimistic logging protocols and optimistic logging protocols.

Questions? Thanks, and see you next time.