
Distributed Systems ID2201: Replication
Johan Montelius

The problem
The problem we have: servers might be unavailable. The solution: keep duplicates at different servers.

Building a fault-tolerant service
When building a fault-tolerant service with replicated data and functionality, the service:
- should produce the same results as a non-replicated service
- should respond despite node crashes
- ... if the cost is not too high

What is a correct behavior?
A replicated service is said to be linearizable if for any execution there is some interleaving that:
- meets the specification of a non-replicated service
- matches the real-time order of operations in the real execution

A less restrictive criterion
A replicated service is said to be sequentially consistent if for any execution there is some interleaving that:
- meets the specification of a non-replicated service
- matches the program order of operations in the real execution
The example below contrasts the two criteria.
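To make the difference concrete, here is a small, hypothetical two-client history, written out in Python; the clients, values, and times are all invented for illustration.

# Client A's write completes, in real time, before client B's read
# starts, yet B still reads the old value 0.
history = [
    {"client": "A", "op": "write(x,1)", "start": 1, "end": 2},
    {"client": "B", "op": "read(x)=0",  "start": 3, "end": 4},
]
# Sequentially consistent: the interleaving read(x)=0; write(x,1) keeps
# each client's program order and meets the sequential specification.
# Not linearizable: real-time order forces the read after the write,
# and then read(x) would have to return 1.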

Even less restricted
- Eventual consistency: sooner or later, but until then?
- Causal consistency: if a caused b and you read b, then you should be able to read a.
- Read your writes: at least during a session?
- Monotonic reads: never read anything older than what you have already read.

System model
An asynchronous system in which nodes fail only by crashing.
[Figure: clients send requests through front ends to a set of replica managers that implement the service.]

Group membership service
A dynamic group:
- nodes can be added or removed
- this needs to be done in a controlled way
- the system might halt until the group is updated
A static group:
- if a node crashes it will be unavailable until it is restarted
- the group should continue to operate even if some nodes are down

View synchrony
A group is monitored, and if a node is suspected to have crashed, a new view is delivered. Communication is restricted to within a view. Inside a view we can implement leader election, atomic multicast, etc.

View-synchronous communication
[Figure: processes p, q, and r; a message multicast in view v(g) = {p,q,r} is delivered by the surviving members before the new view v(g) = {q,r} is delivered.]

Disallowed
[Figure: processes p, q, and r with a view change from v(g) = {p,q,r} to v(g) = {q,r}; a delivery pattern that view synchrony forbids.]

Disallowed
[Figure: processes p, q, and r with a view change from v(g) = {p,r} to v(g) = {p,q,r}; another delivery pattern that view synchrony forbids.]

Group membership service
[Figure: a group service tracks a dynamic group of replica managers (RMs) as members leave, fail, and enter; front ends communicate with the current members.]

Passive replication
[Figure: front ends send requests to a primary RM; the primary forwards state updates to the backup RMs.]

Passive replication
- Request: the front end sends the request with a unique identifier.
- Coordination: the primary checks whether it is a new request.
- Execution: the primary executes the request and stores the response.
- Agreement: the primary sends the updated state and the reply to all backup nodes.
- Response: the primary sends the reply to the front end.
A sketch of this scheme follows below.
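A minimal sketch of the passive scheme, assuming an in-memory key-value service; PrimaryRM and BackupRM are invented names, and the messages between primary and backups are simulated with direct method calls.

class BackupRM:
    def __init__(self):
        self.state = {}
        self.responses = {}          # request id -> stored reply
    def apply(self, state, req_id, reply):
        self.state = dict(state)     # install state sent by the primary
        self.responses[req_id] = reply

class PrimaryRM:
    def __init__(self, backups):
        self.state = {}
        self.responses = {}
        self.backups = backups
    def handle(self, req_id, key, value):
        # Coordination: a resent request is answered from the stored reply.
        if req_id in self.responses:
            return self.responses[req_id]
        # Execution: perform the operation and remember the response.
        self.state[key] = value
        reply = ("ok", key, value)
        self.responses[req_id] = reply
        # Agreement: push the new state and reply to every backup
        # before responding, so a backup can take over consistently.
        for b in self.backups:
            b.apply(self.state, req_id, reply)
        # Response: return the reply to the front end.
        return reply

# Usage: a duplicate request id yields the stored reply, not a re-execution.
primary = PrimaryRM([BackupRM(), BackupRM()])
print(primary.handle("req-1", "x", 1))
print(primary.handle("req-1", "x", 1))  # same reply, executed once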

Is it linearizable?
The primary replica manager serializes all operations. If the primary fails, linearizability is retained provided a backup takes over exactly where the primary left off.

Primary crash
When the primary crashes:
- the backups receive a new view with the primary missing
- a new primary is elected
The request is resent:
- if agreement was reached before the crash, the reply is known and is simply resent
- if not, the execution is redone

Pros and cons
Pros:
- All operations pass through a primary that linearizes them.
- Works even if execution is nondeterministic.
Cons:
- Delivering the state change can be costly.

Active replication
[Figure: a front end multicasts each request to a group of RMs that run identical state machines.]

Active replication
- Request: the request is multicast to the group with a unique identifier.
- Coordination: requests are delivered in total order.
- Execution: all replicas are identical and deterministic, so each executes the request.
- Agreement: not needed.
- Response: each replica sends its reply to the front end; the first reply is forwarded to the client.
A sketch of this scheme follows below.
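A minimal sketch of the active scheme under the assumption that a single sequencer provides the total order; Sequencer and Replica are invented names, and a real system would use a total-order multicast protocol rather than direct calls.

class Replica:
    def __init__(self):
        self.state = {}
        self.seen = set()            # request ids already executed
    def deliver(self, req_id, key, value):
        if req_id in self.seen:      # duplicate in the ordered stream
            return None
        self.seen.add(req_id)
        # Deterministic execution: same order + same code => same state.
        self.state[key] = value
        return ("ok", key, value)

class Sequencer:
    """Totally orders requests and delivers them to every replica."""
    def __init__(self, replicas):
        self.replicas = replicas
    def multicast(self, req_id, key, value):
        replies = [r.deliver(req_id, key, value) for r in self.replicas]
        return replies[0]            # front end forwards the first reply

replicas = [Replica(), Replica(), Replica()]
seq = Sequencer(replicas)
print(seq.multicast("req-1", "x", 1))
# All replicas hold identical state afterwards.
assert all(r.state == {"x": 1} for r in replicas)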

Active replication
Sequential consistency:
- all replicas execute the same sequence of operations
- all replicas produce the same answer
Linearizability:
- total-order multicast does not (automatically) preserve real-time order

Pros and cons
Pros:
- no need to change existing servers
- no need to send state changes
- could survive Byzantine failures
Cons:
- requires total-order multicast
- requires deterministic execution

High availability
Both replication schemes require that servers are available. If a server crashes, it takes some time to detect and remove the faulty node; how long depends on the network. Is this acceptable? Can we build a system that responds even if not all nodes are available?

Gossip architecture
[Figure: clients send requests through front ends to replica managers; the replica managers exchange updates among themselves with gossip messages.]

Relaxed consistency
Increase availability at the expense of consistency:
- causal update ordering
- forced (total and causal) update ordering
- immediate update ordering (total order with respect to all other updates)

Example: bulletin board
- Adding messages: causal ordering.
- Adding a user: forced ordering.
- Removing a user: immediate ordering; all replicas must agree on which messages came before the removal of the user.

Implementation
Front ends keep a vector clock that reflects the last value seen. The vector holds an entry for each replica in the system and is updated as the front end sees replies from the replicas. The front end is responsible for fault-tolerant replication. A sketch of the vector clock operations is shown below.
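A minimal sketch of the vector clock operations involved, with one entry per replica manager; the helper names merge and dominates are assumptions, not from the lecture.

def merge(a, b):
    """Pointwise maximum: the combined knowledge of two timestamps."""
    return [max(x, y) for x, y in zip(a, b)]

def dominates(a, b):
    """True if timestamp a has seen everything that b has seen."""
    return all(x >= y for x, y in zip(a, b))

# A front end that has seen <2,4,6> and receives a reply stamped
# <2,5,6> advances its clock to the merge of the two.
front_end = [2, 4, 6]
reply_ts = [2, 5, 6]
front_end = merge(front_end, reply_ts)
assert front_end == [2, 5, 6]
assert dominates(front_end, reply_ts)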

Query
[Figure: a front end that has seen values written at <2,4,6> asks to read X, requiring no older data. A replica manager whose state is stamped <2,5,6> replies with the value foo and the timestamp <2,5,6>; the other replica managers are at <2,5,5> and <2,5,6>.]

Query
A front end sends a query request to any replica manager. The query contains a vector timestamp. The replica manager must hold the query until it has seen all information that happened before the query; it then returns the response and a new timestamp. A sketch of this hold-back rule is shown below.
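A minimal sketch of the hold-back rule, reusing the dominates test from above; ReplicaManager and its fields are invented names.

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b))

class ReplicaManager:
    def __init__(self, value_ts, data):
        self.value_ts = value_ts     # timestamp of the replica's state
        self.data = data
        self.pending = []            # queries held back
    def query(self, q_ts, key):
        if dominates(self.value_ts, q_ts):
            # The replica has seen everything the front end has seen,
            # so the reply cannot be older than the front end's view.
            return self.data[key], self.value_ts
        # Otherwise hold the query until gossip catches the replica up.
        self.pending.append((q_ts, key))
        return None

rm = ReplicaManager([2, 5, 6], {"x": "foo"})
print(rm.query([2, 4, 6], "x"))      # answered: ('foo', [2, 5, 6])
print(rm.query([3, 5, 6], "x"))      # held back: None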

Updates
[Figure: a front end that has seen values written at <3,5,7> sends a write of X and asks that it be applied no earlier than that; it gets back the acknowledgement <4,5,7>. The replica managers are at <3,5,6>, <2,5,7>, and <3,5,6>.]

Updates
The front end sends an update to one (or more) replica managers. The update is scheduled by the replica manager to be executed in causal order, and it is propagated to the remaining replica managers using the gossip protocol. A sketch of one gossip exchange is shown below.
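A minimal sketch of one gossip exchange between two replica managers; GossipRM and its fields are invented names, and for brevity the sketch applies entries as they arrive, whereas a full implementation would also hold back updates that are not yet causally ready.

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b))

class GossipRM:
    def __init__(self, n, idx):
        self.idx = idx
        self.value_ts = [0] * n      # timestamp of applied updates
        self.log = []                # (timestamp, key, value) entries
        self.data = {}
    def local_update(self, key, value):
        self.value_ts[self.idx] += 1
        self.log.append((list(self.value_ts), key, value))
        self.data[key] = value
    def gossip_to(self, other):
        # Send the whole log; the receiver applies what it has not seen.
        for ts, key, value in self.log:
            if not dominates(other.value_ts, ts):
                other.data[key] = value
                other.value_ts = [max(x, y)
                                  for x, y in zip(other.value_ts, ts)]
                other.log.append((ts, key, value))

a, b = GossipRM(2, 0), GossipRM(2, 1)
a.local_update("x", 1)
a.gossip_to(b)
assert b.data == {"x": 1}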

Gossip architectures
Performance, at the price of only causal consistency; forced and immediate consistency are more expensive. Can the application live with causal consistency? Highly available: only one replica needs to be available.

Quorum based
[Figure: a front end contacts a subset of the RM state machines; one RM has failed.]
N: the number of nodes. R: the read quorum. W: the write quorum. Each datum has a write time (logical or real) used to determine the most recent value.

Sequentially consistent
Assume that the read or write quorum must be obtained before the operation can take place.
- R + W > N: a read operation overlaps with the most recent write operation; the timestamp will, or at least might, tell which copy is most recent.
- W > N/2: two write operations cannot take place concurrently.
A small worked example follows below.
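A small worked example with an invented configuration of N = 5 replicas:

# N = 5 nodes with read and write quorums of 3 each.
N, R, W = 5, 3, 3
assert R + W > N        # every read quorum overlaps every write quorum
assert W > N / 2        # two write quorums always share a node
# Any 3 replicas contacted by a read share at least one replica with
# the last write's 3 replicas, so the read sees the newest timestamp.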

Summary
Replication of objects is used to achieve fault-tolerant services. Services should (?) provide a single-image view, as defined by sequential consistency.
- Passive replication
- Active replication
- Gossip based replication
- Quorum based replication