Replication in Distributed Systems

Replication Basics

- Multiple copies of data are kept on different nodes
- A set of replicas holds copies of a data item
- Nodes can be physically very close or distributed all over the world
- A set of clients makes read/write requests on the data at a replica

Why replicate?

- Fault tolerance: service can be provided from a different replica if one replica fails
- Load balancing: load can be shared by multiple replicas (e.g., web servers)
- Reduced latency: replicas placed closer to the request source give faster access

Ideally, the user should think that there is a single copy of the data (replication transparency).

- This requires a write at one replica to propagate instantaneously to the other replicas, as if done on a single copy
- That is impossible if the real-time ordering of read/write operations is to be maintained exactly as on a single copy

Then how should we keep replicas updated in the presence of writes?

- Should all copies of the data always have the same value, or are intermediate differences allowed?
- That depends on the consistency model to be satisfied, which in turn depends on application needs

What is a consistency model?

Consistency Models

- Defines what guarantees are provided on reads of shared data in the presence of possibly interleaved or overlapping accesses
- Replicas of a data item accessed at multiple sites can be viewed as a single shared data item
- Tradeoff: a model should be strong enough to be useful, yet weak enough to be efficiently implementable

- Data-centric models: linearizability, sequential consistency, causal consistency
- Client-centric models: eventual consistency
- Many other models exist

Why so many models?

- Application requirements differ
- Stronger models require more overhead to implement, so many weaker models have evolved for applications that do not need strong guarantees

Linearizability

Satisfied if there exists some sequential ordering of the reads and writes in which:

1. Operations of individual processes are ordered in the same way as in the actual execution
2. For two operations by two different processes, if their times in the actual execution are t1 and t2 and their positions in the sequential order are t1' and t2', then t1 < t2 implies t1' < t2'
3. Each read returns the value of the latest write before it in the sequential ordering

- Time can be based on any global timestamping scheme
- Used mostly for formal verification, etc.
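To make the three conditions concrete, here is a minimal brute-force checker; it is a sketch rather than an efficient algorithm, and the history format and all names are illustrative, not from the slides. It enumerates every sequential ordering of a tiny history and tests the conditions; passing respect_time=False drops condition 2, which turns it into the sequential-consistency check discussed next.

```python
from itertools import permutations

def valid_order(history, respect_time):
    """Try every sequential order of operations (process, kind, value,
    start_time); return an order satisfying the conditions, or None."""
    for order in permutations(history):
        # Condition 1: per-process (program) order is preserved.
        ok = all(order.index(a) < order.index(b)
                 for a in history for b in history
                 if a[0] == b[0] and a[3] < b[3])
        # Condition 2 (linearizability only): real-time order is preserved.
        if ok and respect_time:
            ok = all(order.index(a) < order.index(b)
                     for a in history for b in history if a[3] < b[3])
        # Condition 3: each read returns the latest preceding write.
        if ok:
            latest = None
            for proc, kind, value, t in order:
                if kind == "w":
                    latest = value
                elif value != latest:
                    ok = False
                    break
        if ok:
            return order
    return None

# P1 writes x=1; P2 later reads 1: linearizable.
h1 = [("P1", "w", 1, 1), ("P2", "r", 1, 2)]
print(valid_order(h1, respect_time=True) is not None)   # True
# P2 reads the old value (None) after the write: not linearizable,
# but sequentially consistent (place the read before the write).
h2 = [("P1", "w", 1, 1), ("P2", "r", None, 2)]
print(valid_order(h2, respect_time=True) is not None)   # False
print(valid_order(h2, respect_time=False) is not None)  # True
```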

[Figure: two example executions; (a) is linearizable, (b) is not]

Sequential Consistency

- Requires only the first and third conditions of linearizability; no ordering of events at different processes is required
- Linearizability implies sequential consistency, but not vice versa
- No notion of time, and hence no notion of a "most recent" write
- The ordering of events at different processes may not be important, since they could have happened in some other order in practice anyway (server speeds, message delays, ...)
- It is enough to find at least one possible order of the events that respects the event order of each individual process
- Widely used in practice, but still costly to implement

[Figure: (a) is sequentially consistent (is (a) linearizable?); (b) is not sequentially consistent]

Sequential consistency means all processes see all writes in the same order ("seeing" means the results returned by reads).

Causal Consistency

- Causally related writes:
  - Writes in the same process
  - Writes in different processes linked by reads in between
- Writes that are not causally related are concurrent
- All writes that are causally related must be seen (through the results of reads) by every process in the same order
- Writes that are not causally related can be seen in any order by different processes
- The values returned by reads must be consistent with this causal order
- Note that sequential consistency implies causal consistency, but not vice versa
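One common way to track the "writes linked by reads in between" relation is with vector clocks. The sketch below is illustrative (none of these names come from the slides) and assumes a fixed set of n replicas: each write is tagged with a vector timestamp, and exactly the causally related writes have componentwise-ordered timestamps, so only those must be applied in the same order everywhere.

```python
class Replica:
    def __init__(self, rid, n):
        self.rid = rid            # this replica's index
        self.clock = [0] * n      # one counter per replica

    def local_write(self):
        """Record a local write and return its vector timestamp."""
        self.clock[self.rid] += 1
        return list(self.clock)

    def observe(self, ts):
        """A read that returns a remote write makes later local writes
        causally depend on it: merge the write's timestamp."""
        self.clock = [max(a, b) for a, b in zip(self.clock, ts)]

def happens_before(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

r0, r1 = Replica(0, 2), Replica(1, 2)
w1 = r0.local_write()          # write at replica 0
r1.observe(w1)                 # replica 1 reads the value written by w1
w2 = r1.local_write()          # this write is causally after w1
print(happens_before(w1, w2))  # True: must be seen in this order everywhere
```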

[Figure: example execution. Is this sequentially consistent?]

[Figure: further example executions. What about these?]

Eventual Consistency

- A client-centric consistency model
- The read/write ordering needs to be satisfied only for each client separately
- No consistency requirement across the reads/writes of different clients
- All reads should eventually return the same value
- Good when most operations are reads, writes are very infrequent, and some temporary inconsistencies can be tolerated

Implementing Consistency Models

Replication Architectures

Consistency models are fine, but how do systems implement them? That depends on the replication architecture and the specific model.

- Passive replication: all requests are made to a single replica (the primary)
- Active replication: requests are made to all replicas

Passive Replication

- Each client sends its requests to a single replica (the primary)
- The primary assigns a unique identifier to each request
- The other replicas are backups; the relation between primary and backups is master/slave-like
- Reads are answered from the primary
- On a write, the primary:
  1. Executes the write and sends the updated state to all backups
  2. Receives replies from all backups
  3. Replies success to the client
- The primary also sends periodic heartbeat messages to all backups to indicate it is alive
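A minimal sketch of the primary's write path under this scheme, assuming a reliable request/reply transport that is not shown; the send/recv interface and all other names are illustrative:

```python
import itertools

class Primary:
    def __init__(self, backups, transport):
        self.backups = backups
        self.t = transport                # assumed reliable transport
        self.state = {}
        self.req_ids = itertools.count()  # unique identifier per request

    def read(self, key):
        # Reads are answered directly from the primary's copy.
        return self.state.get(key)

    def write(self, key, value):
        rid = next(self.req_ids)
        # 1. Execute the write locally.
        self.state[key] = value
        # 2. Push the updated state to every backup and wait for all acks.
        for b in self.backups:
            self.t.send(b, ("update", rid, key, value))
        for b in self.backups:
            assert self.t.recv(b) == ("ack", rid)
        # 3. Only now report success to the client.
        return "ok"
```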

If the primary dies (no heartbeat message detected at the backups):

- The backups elect a new leader, which starts to act as the primary
- A client may fail to access the service between the primary's crash and the election of the new primary (the failover time)

Need to ensure:

- Exactly one primary at all times (except during failover)
- All backups agree on who the primary is
- No backup responds to client requests

Problem: what happens if there is a failure during the update of the replicas?

Consistency models enforced in passive replication:

- Linearizability, since the primary acts as a sequencer, serializing all access to the data
- Enforcing linearizability implies sequential consistency as well

Active Replication

- No master/slave relation among the replicas
- A client makes requests to all replicas
  - In practice, the client can send its request to one replica, which acts as a front end and forwards the request to all replicas
  - The client must know which replica to go to if the front end fails
- All replicas reply to the client, and the client can take:
  - The first response, under the crash failure model (requires f+1 replicas to tolerate f faults)
  - A majority, under Byzantine faults (requires 2f+1 replicas to tolerate f faults)
- Need to ensure that all replicas agree on the order of client requests
  - If all requests are applied in the same order at all replicas, their final states are consistent
  - This is the consensus problem
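A minimal sketch of how a client might filter the replies under the two failure models; the names are illustrative and the messaging layer is assumed:

```python
from collections import Counter

def crash_tolerant_result(replies):
    """Crash model: any reply received is correct, so take the first."""
    return next(iter(replies))

def byzantine_tolerant_result(replies, f):
    """Byzantine model: accept a value once f+1 identical replies arrive,
    so at least one of them came from a correct replica."""
    counts = Counter()
    for r in replies:
        counts[r] += 1
        if counts[r] >= f + 1:
            return r
    raise RuntimeError("no value received f+1 times")

print(crash_tolerant_result(["v1", "v1"]))                     # 'v1'
print(byzantine_tolerant_result(["bad", "v1", "v1"], f=1))     # 'v1'
```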

State Machine Replication

- A general strategy for building fault-tolerant systems by replication
- The basis for active replication in practice
- Each replica is represented by a state machine
- All replicas start with the same initial state
- Client requests are made to the state machines
- Need to ensure:
  - All non-faulty state machines receive all requests (Agreement)
  - All non-faulty state machines process the requests in the same order (Order)
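The key property is determinism: given Agreement and Order, identical request sequences produce identical states. A minimal sketch (illustrative names; how the ordered log is agreed on, via atomic multicast or Paxos/Raft, is the topic of the next slide, so here it is simply given):

```python
class StateMachineReplica:
    def __init__(self):
        self.state = {}        # all replicas start from the same state

    def apply(self, request):
        """Deterministic transition: same request sequence => same state."""
        op, key, *rest = request
        if op == "put":
            self.state[key] = rest[0]
        elif op == "get":
            return self.state.get(key)

ordered_log = [("put", "x", 1), ("put", "x", 2), ("get", "x")]

# Any two replicas that apply the same log end in the same state.
r1, r2 = StateMachineReplica(), StateMachineReplica()
for req in ordered_log:
    r1.apply(req)
    r2.apply(req)
assert r1.state == r2.state == {"x": 2}
```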

Ensuring Agreement and Order

- Use atomic multicast: read/write requests are sent to all replicas using atomic multicast
- How do we implement atomic multicast? Atomic multicast is equivalent to consensus
- Alternatively, use a specialized consensus protocol (Paxos/Raft); we will study Raft

Implementing Linearizability

- The client makes a read/write request
- The read/write request is sent by the local replica to all others using atomic multicast
- On receiving it, the replica servers (a) update their copy and send back an ack on a write, or (b) only send back an ack on a read
- On completion of the atomic multicast, the local copy is given to the client on a read, or success is returned on a write

This ensures:

- All reads/writes are ordered the same way by all replicas
- Time order is preserved for all non-overlapping reads/writes
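A minimal sketch of this scheme, assuming an atomic-multicast primitive amcast(msg) that delivers msg to every replica's deliver() in a single total order and blocks until delivery completes; the primitive and all names are illustrative:

```python
class LinearizableReplica:
    def __init__(self, amcast):
        self.amcast = amcast   # assumed atomic multicast primitive
        self.copy = {}

    def deliver(self, msg):
        """Called at every replica, in the same total order."""
        op, key, value = msg
        if op == "write":
            self.copy[key] = value
        # On a read, delivery itself serves as the ack; nothing to update.

    def read(self, key):
        # Even reads go through the multicast, so each read takes a
        # definite place in the total order: this preserves real-time
        # order for non-overlapping operations.
        self.amcast(("read", key, None))
        return self.copy.get(key)

    def write(self, key, value):
        self.amcast(("write", key, value))
        return "ok"
```

For sequential consistency (next slide), the read path skips the multicast and simply returns the local copy.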

Implementing Sequential Consistency

Using atomic multicast:

- The client makes a read/write request
- On a read, just return the local copy (no atomic multicast)
- On a write, the request is sent by the local replica to all others using atomic multicast
- On receiving it, the replica servers update their copy and send back an ack
- On completion of the atomic multicast, success is returned to the client

Alternatively, use quorum-based (voting) protocols.

Voting Protocols

- Each replica is given a number of votes; let V = the sum of all replicas' votes
- Choose a read quorum Qr and a write quorum Qw such that Qr + Qw > V and 2 * Qw > V
- To read, a client must collect Qr votes; to write, it must collect Qw votes
- Guarantees:
  - A read and a write do not occur together (Qr + Qw > V forces a read quorum and a write quorum to overlap)
  - Two writes do not occur together (2 * Qw > V forces two write quorums to overlap)
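A quick check of the quorum constraints (an illustrative helper, not from any library): any valid (Qr, Qw) pair forces read/write and write/write quorums to overlap in at least one replica.

```python
def valid_quorums(votes, qr, qw):
    """True iff (qr, qw) satisfies Qr + Qw > V and 2*Qw > V."""
    v = sum(votes)
    return qr + qw > v and 2 * qw > v

votes = [1, 1, 1, 1, 1]                  # five replicas, one vote each (V = 5)
print(valid_quorums(votes, qr=3, qw=3))  # True: majority quorums
print(valid_quorums(votes, qr=1, qw=5))  # True: ROWA
print(valid_quorums(votes, qr=2, qw=3))  # False: qr + qw = 5, not > V
```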

Data items are tagged with a version number, incremented on each write.

To read a data item, the client:

- Gets a read quorum from the replica sites
- Chooses the copy with the highest version number from those replicas; this is the most up-to-date copy (why?)

To write a data item, the client:

- Gets a read quorum from the replica sites
- Chooses the highest version number among them (say t)
- Gets a write quorum from the replica sites
- Writes the data with a version number greater than t to all replicas of the write quorum
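A minimal sketch of version-based quorum reads and writes, assuming every replica has one vote and exposes get()/put(); all names are illustrative:

```python
import random

class VersionedReplica:
    def __init__(self):
        self.version, self.value = 0, None
    def get(self):
        return self.version, self.value
    def put(self, version, value):
        self.version, self.value = version, value

def quorum_read(replicas, qr):
    quorum = random.sample(replicas, qr)
    # Read and write quorums intersect, so the highest version seen in
    # any read quorum belongs to the most recent write.
    return max(r.get() for r in quorum)

def quorum_write(replicas, qr, qw, value):
    t, _ = quorum_read(replicas, qr)      # current highest version
    for r in random.sample(replicas, qw):
        r.put(t + 1, value)               # write a strictly larger version

replicas = [VersionedReplica() for _ in range(5)]
quorum_write(replicas, qr=3, qw=3, value="a")
quorum_write(replicas, qr=3, qw=3, value="b")
print(quorum_read(replicas, qr=3))        # (2, 'b')
```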

Special Cases

Let N = the number of replicas, each with one vote.

- ROWA (Read One, Write All): Qr = 1, Qw = N
  - Fast reads, slow writes
  - Cannot tolerate even one replica failure for writes
- Majority: Qr = Qw = floor(N/2) + 1
  - Equal read/write overhead

For example, with N = 5: ROWA gives Qr = 1, Qw = 5, while majority gives Qr = Qw = 3.

Questions

- How should votes be assigned?
  - More reliable servers can be assigned more votes
  - More powerful servers may be assigned more votes
- What if one or more replicas fail?
  - No combination of votes may satisfy the quorum constraints after a failure
- What if there is a network partition?
  - No partition may hold a majority, even if no replica has crashed

Problems with Sequential Consistency

- Not scalable:
  - Atomic multicast requires all replicas to contact all other replicas
  - Voting-based protocols still require a replica to contact a large number of other replicas (a majority in the worst case)
- Good for small systems that require such strong consistency
- Many systems do not need such strong consistency guarantees

Implementing Eventual Consistency

- Easy if a client always connects to a single replica; hard otherwise (e.g., mobile systems)
- The general goal is to ensure that all replicas eventually reach the same state
- Epidemic (gossip) protocols for replication:
  - Replication topology: decide who replicates from whom
  - Scheduled replication
  - Push vs. pull models
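A minimal sketch of pull-based anti-entropy gossip, assuming last-writer-wins conflict resolution by timestamp; every name here is illustrative. Each round, a replica pulls from a random peer and keeps the newer value per key, so all replicas eventually converge.

```python
import random

class GossipReplica:
    def __init__(self):
        self.store = {}                        # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.store[key] = (ts, value)

    def pull_from(self, peer):
        for key, (ts, value) in peer.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, value)  # peer's copy is newer

replicas = [GossipReplica() for _ in range(4)]
replicas[0].write("x", "hello", ts=1)          # write lands on one replica

for _ in range(10):                            # a few gossip rounds
    for r in replicas:
        r.pull_from(random.choice(replicas))

# With high probability every replica now holds the write.
print(all(r.store.get("x") == (1, "hello") for r in replicas))
```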

Design Issues in Replication

- Which consistency model is to be enforced?
- Where can updates happen: at one designated replica, or at any replica?
- When to propagate updates: eagerly (immediately, before responding to the client) or lazily (sometime after the response is sent to the client)?
  - Depends on the consistency model to be supported
- How many replicas to install?
- Where to place the replicas?

Replica Placement

- Mirroring: static replicas created a priori; may mirror all data or only part of it, depending on need
- Server-generated: dynamic replicas created by a server when load increases
- Client caching: replicas created by caching frequently used data at or near the client

Pros and cons of each? When should you use which?