Implementation Issues. Remote-Write Protocols


Implementation Issues

Two techniques to implement consistency models:
- Primary-based protocols: assume a primary replica for each data item; the primary is responsible for coordinating all writes.
- Replicated-write protocols: no primary is assumed for a data item; writes can take place at any replica.

Lecture 16, page 1

Remote-Write Protocols

Traditionally used in client-server systems.

Remote-Write Protocols (2)

Primary-backup protocol:
- Allow local reads; send writes to the primary.
- Block on a write until all replicas have been notified.
- Implements sequential consistency.

Local-Write Protocols (1)

Primary-based local-write protocol in which a single copy is migrated between processes.
- Limitation: need to track the primary for each data item.
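The primary-backup write path described above can be sketched in a few lines. This is a minimal in-process illustration, not the slides' implementation; the class and method names are assumptions.

```python
# Sketch of the primary-backup (remote-write) protocol: the primary applies
# a write locally, then blocks until every backup acknowledges it.

class Replica:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value
        return True  # acknowledgement

class Primary(Replica):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, value):
        self.apply(key, value)
        # Blocking until all backups acknowledge is what makes the
        # protocol sequentially consistent: no replica lags behind.
        assert all(b.apply(key, value) for b in self.backups)
        return True

backups = [Replica(), Replica()]
primary = Primary(backups)
primary.write("x", 42)
print(backups[0].store["x"])  # local reads at any replica now see the write
```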

Local-Write Protocols (2)

Primary-backup protocol in which the primary migrates to the process wanting to perform an update.

Replicated-Write Protocols

Relax the assumption of one primary:
- No primary; any replica is allowed to update.
- Consistency is more complex to achieve.

Quorum-based protocols use voting to request/acquire permissions from replicas. Consider a file replicated on N servers, with read quorum N_R and write quorum N_W such that:
- N_R + N_W > N
- N_W > N/2
Update: contact at least N/2 + 1 servers and get them to agree to do the update (associate a version number with the file).
Read: contact a majority of servers and obtain the version number. If a majority of servers agree on a version number, read.
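The two quorum conditions above are easy to check mechanically. A small sketch (the function name and the example quorum sizes are illustrative, chosen for N = 12):

```python
# Quorum constraints for a file replicated on n servers.

def valid_quorum(n: int, n_r: int, n_w: int) -> bool:
    # N_R + N_W > N: every read quorum overlaps every write quorum,
    # so a read always sees the latest committed version.
    # N_W > N/2: any two write quorums overlap, preventing
    # write-write conflicts.
    return n_r + n_w > n and 2 * n_w > n

print(valid_quorum(12, 3, 10))   # True: a correct choice of read/write sets
print(valid_quorum(12, 7, 6))    # False: N_W = 6 is not > 12/2, so two
                                 # disjoint write quorums could both commit
print(valid_quorum(12, 1, 12))   # True: ROWA (read one, write all)
```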

Gifford's Quorum-Based Protocol

Three examples of the voting algorithm:
a) A correct choice of read and write set
b) A choice that may lead to write-write conflicts
c) A correct choice, known as ROWA (read one, write all)

Replica Management

Replica server placement:
- Web: geographically skewed request patterns. Where to place a proxy? K-clusters algorithm.
Permanent replicas versus temporary replicas:
- Mirroring: all replicas mirror the same content.
- Proxy server: on-demand replication.
- Server-initiated versus client-initiated.

Content Distribution

Will come back to this in Chapter 12.
- CDN: a network of proxy servers.
- Caching: update versus invalidate.
- Push-based versus pull-based approaches.
- Stateful versus stateless.
- Web caching: what semantics to provide?

Final Thoughts

Replication and caching improve performance in distributed systems, and consistency of replicated data is crucial. Many consistency semantics (models) are possible; pick the appropriate model for the application. Example: for web caching, weak consistency is OK, since humans are tolerant of stale information (they can reload the page in the browser). Implementation overheads and complexity grow as stronger guarantees are desired.

Fault Tolerance

Single-machine systems:
- Failures are all or nothing: OS crash, disk failures.
Distributed systems consist of multiple independent nodes:
- Partial failures are also possible (some nodes fail).
Question: can we automatically recover from partial failures?
- This is an important issue, since the probability of failure grows with the number of independent components (nodes) in the system:
  Prob(failure) = Prob(any one component fails) = 1 - Prob(no component fails)

A Perspective

Computing systems are not very reliable:
- The OS crashes frequently (Windows), buggy software, unreliable hardware, software/hardware incompatibilities.
Until recently, computer users were tech savvy:
- Could depend on users to reboot and troubleshoot problems.
The growing popularity of the Internet/World Wide Web brings novice users, so we need to build more reliable/dependable systems.
- Example: what if your TV (or car) broke down every day? Users don't want to restart the TV or fix it (by opening it up).
Need to make computing systems more reliable:
- Important for online banking, e-commerce, online trading, webmail.
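The probability argument above can be made concrete: if each of n independent nodes fails with probability p, then Prob(at least one fails) = 1 - (1 - p)^n. A quick sketch with illustrative numbers:

```python
# How the probability of a partial failure grows with the number of nodes,
# assuming each node fails independently with probability p.

def prob_any_failure(p: float, n: int) -> float:
    # P(at least one fails) = 1 - P(no node fails) = 1 - (1 - p)^n
    return 1 - (1 - p) ** n

# With p = 1% per node, failures become almost certain at scale:
for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}: {prob_any_failure(0.01, n):.4f}")
```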

Basic Concepts

Need to build dependable systems. Requirements for dependable systems:
- Availability: the system should be available for use at any given time. 99.999% availability ("five 9s") means very little downtime.
- Reliability: the system should run continuously without failure.
- Safety: temporary failures should not result in a catastrophe. Example: computing systems controlling an airplane or a nuclear reactor.
- Maintainability: a failed system should be easy to repair.

Basic Concepts (cont'd)

Fault tolerance: the system should provide services despite faults.
- Transient faults
- Intermittent faults
- Permanent faults
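To see why "five 9s" implies very small downtime, translate an availability level into allowed downtime per year. A back-of-the-envelope calculation (the function name is illustrative):

```python
# Allowed downtime per year for a given availability level.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * MINUTES_PER_YEAR

print(downtime_minutes_per_year(0.999))    # "three 9s": ~526 minutes/year
print(downtime_minutes_per_year(0.99999))  # "five 9s":  ~5.3 minutes/year
```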

Failure Models

Different types of failures:

Type of failure           Description
Crash failure             A server halts, but is working correctly until it halts
Omission failure          A server fails to respond to incoming requests
  Receive omission        A server fails to receive incoming messages
  Send omission           A server fails to send messages
Timing failure            A server's response lies outside the specified time interval
Response failure          The server's response is incorrect
  Value failure           The value of the response is wrong
  State-transition failure  The server deviates from the correct flow of control
Arbitrary failure         A server may produce arbitrary responses at arbitrary times

Failure Masking by Redundancy

Triple modular redundancy.
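Triple modular redundancy can be illustrated in miniature: run the same computation on three independent units and take a majority vote, which masks any single faulty unit. The slide only names the technique; this hypothetical sketch fills in one possible voter:

```python
# Triple modular redundancy: majority voting over three redundant units.

from collections import Counter

def tmr(unit_a, unit_b, unit_c, x):
    outputs = [unit_a(x), unit_b(x), unit_c(x)]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes < 2:
        # Two or more units disagree with everyone: TMR only masks one fault.
        raise RuntimeError("no majority: more than one unit failed")
    return value

ok = lambda x: x * x
faulty = lambda x: x * x + 1  # one unit producing a wrong value

print(tmr(ok, ok, faulty, 5))  # 25: the single fault is masked by the vote
```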