Linearizability CMPT 401. Sequential Consistency. Passive Replication

Linearizability
CMPT 401, Thursday, March 31, 2005

The execution of a replicated service (potentially with multiple requests interleaved over multiple servers) is said to be linearizable if:
1. The interleaved sequence of operations has the same results as if it were run sequentially on a single object.
2. The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.

Sequential Consistency

The execution of a replicated service (potentially with multiple requests interleaved over multiple servers) is said to be sequentially consistent if:
1. The interleaved sequence of operations has the same results as if it were run sequentially on a single object.
2. The order of operations in the interleaving is consistent with the program order in which each individual client requested them.

Passive Replication

One primary physical object handles all client requests. One or more backup physical objects stay in sync with the primary. When the primary fails, one backup is promoted to be the new primary. Ideally, this provides fault tolerance.
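As a rough illustration of the passive-replication flow just described, here is a minimal sketch (not from the slides; the class and function names such as PrimaryReplica and promote are hypothetical) showing a primary applying each request, pushing the update to its backups, and a backup being promoted after a failure.

```python
# Minimal passive-replication sketch (hypothetical names; illustration only).
class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}                      # replicated key/value state

    def apply(self, op):
        key, value = op                      # an update is a (key, value) pair
        self.state[key] = value

class PrimaryReplica(Replica):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups               # backups kept in sync by the primary

    def handle_request(self, op):
        self.apply(op)                       # primary executes the request first
        for b in self.backups:               # then pushes the update to every backup
            b.apply(op)
        return "ok"

def promote(backups):
    """On primary failure, promote one backup to be the new primary."""
    leader, rest = backups[0], backups[1:]
    new_primary = PrimaryReplica(leader.name, rest)
    new_primary.state = dict(leader.state)   # carry over the replicated state
    return new_primary

# Usage: all client requests go through the primary.
backups = [Replica("b1"), Replica("b2")]
primary = PrimaryReplica("p", backups)
primary.handle_request(("x", 1))
primary = promote(backups)                   # fail over if the primary crashes
```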

Is Passive Replication Linearizable?

Since one server handles all the requests, they are handled as if they were processed at one correct object (first requirement). Since the server handles requests in the order it receives them, the ordering of operations matches real time (second requirement). What if reads can be directed to the backups? What about during failures?

Active Replication

The client sends a request to the front end and blocks waiting for a response. The front end uses reliable, totally ordered multicast to send the request to all replica servers. The servers execute the request (identically, since requests arrive in the same order at all servers). Responses are returned to the front end, which determines the single response and returns it to the client. Again, the goal here is to make a fault-tolerant system.

Problems with Active Replication in Reality

Fischer et al. showed that we cannot build a system that is guaranteed to reach consensus in an asynchronous system with crash failures. We have shown that if we have a totally ordered, reliable multicast system, we can solve consensus. Thus, we cannot build such a multicast system in an asynchronous system with crash failures. Active replication relies on this kind of multicast system to provide the guarantees in the algorithm.

Making Active Replication Workable

We have previously seen that we can build failure detectors that allow a consensus system with a very low probability of failure. Similarly, we have mentioned randomized algorithms for consensus with a very low probability of failure. Using either of these approaches, we can construct an active replication scheme with a very low probability of failure.
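A minimal sketch of the active-replication flow above, assuming a totally ordered multicast is available. The ordered_multicast helper below simply delivers the request to each replica in a fixed order and stands in for a real atomic-multicast layer; all names are hypothetical.

```python
# Active-replication sketch: the front end multicasts each request to all
# replicas, which execute it deterministically; one response is returned.
class Replica:
    def __init__(self):
        self.state = {}

    def execute(self, op):
        kind, key, *rest = op
        if kind == "write":
            self.state[key] = rest[0]
            return "ok"
        return self.state.get(key)            # a "read" request

def ordered_multicast(replicas, op):
    """Stand-in for reliable, totally ordered multicast: every replica sees
    the same requests in the same order, so their states never diverge."""
    return [r.execute(op) for r in replicas]

class FrontEnd:
    def __init__(self, replicas):
        self.replicas = replicas

    def request(self, op):
        responses = ordered_multicast(self.replicas, op)
        return responses[0]                    # all replies agree; return one

fe = FrontEnd([Replica(), Replica(), Replica()])
fe.request(("write", "x", 42))
assert fe.request(("read", "x")) == 42
```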

The Gossip System

A system designed to automatically move data to the edges and increase availability. Requests are divided into two types: queries are read-only requests with no writing; updates are write-only requests with no reading. The system is designed to meet two constraints:

Consistent Service - Each client sends requests to any replica it chooses. Clients are allowed to send some requests to one replica and some to another. Regardless, the responses are always consistent with the updates the client has seen so far.

Relaxed Replica Consistency - All replicas eventually receive all updates and apply them in an order sufficient to meet the consistency needs of the application.

Gossip Query Requests

The client sends the request to a front end, which selects a nearby, available replica. The front end sends the request along with a vector timestamp (one entry for each replica) indicating the most recent updates the client has seen. If the replica's local timestamp is greater than or equal to the client's, it responds with its local data and its local vector timestamp. Otherwise, it must request and wait for more updates. The front end then merges its local timestamp with the received replica timestamp.

Gossip Update Requests

The client sends the request to a front end, which selects a nearby, available replica. In cases where fault tolerance is important, a front end may send the request to N/2 + 1 replicas to ensure resilience in the case of crashes. The front end sends the request along with a vector timestamp (one entry for each replica) indicating the most recent updates the client has seen, and a unique ID to ensure that updates are not applied more than once. How the update is applied depends on whether it is handled in causal, forced, or immediate mode.

Gossip Causal Updates

The replica increments its own entry in its vector timestamp and immediately responds to the client with the new vector timestamp (the update timestamp); the client merges this timestamp with its own. The replica waits until its timestamp is greater than or equal to the client's timestamp, then finally applies the update and merges its current timestamp with the update timestamp.
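The query and causal-update rules above hinge on comparing and merging vector timestamps. Here is a minimal sketch, assuming timestamps are plain lists with one counter per replica; the function names are illustrative, not taken from the gossip papers.

```python
# Vector-timestamp helpers for the gossip query/causal-update rules above.
def dominates(a, b):
    """True if timestamp a has seen at least everything b has (a >= b)."""
    return all(x >= y for x, y in zip(a, b))

def merge(a, b):
    """Element-wise maximum: the smallest timestamp that dominates both."""
    return [max(x, y) for x, y in zip(a, b)]

def can_answer_query(replica_ts, client_ts):
    """A replica may answer a query only once it is at least as up to date
    as the client that issued it; otherwise it must wait for more updates."""
    return dominates(replica_ts, client_ts)

def causal_update(replica_ts, client_ts, i):
    """Replica i accepts a causal update: bump its own entry and stamp the
    update with the client's timestamp, overridden at position i."""
    replica_ts = list(replica_ts)
    replica_ts[i] += 1
    update_ts = list(client_ts)
    update_ts[i] = replica_ts[i]
    return replica_ts, update_ts

client = [1, 0, 2]
replica = [1, 3, 2]
print(can_answer_query(replica, client))      # True: replica is up to date
print(merge(client, replica))                 # [1, 3, 2]
print(causal_update(replica, client, 1))      # ([1, 4, 2], [1, 4, 2])
```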

Gossip Forced Updates

These updates must be totally as well as causally ordered. At any given time, one replica is known to all others as the primary replica. The order in which updates reach the primary replica is appended to them as a sequence number. Before a replica will apply a forced update, it must both have a timestamp that satisfies the update's causal ordering and have already applied the forced update whose sequence number is exactly one less.

Gossip Immediate Updates

Immediate updates also go through the primary replica. They are flagged with information about which causal and forced updates have come before them. Other replicas must apply an immediate update exactly after the forced and causal updates determined by the primary.

Gossip Messages

Updates are shared between replicas using gossip messages. A gossip message consists of the sender's update log and the sender's timestamp. The receiver of a message must: merge in any updates it has not seen before, discard any pending updates that have now arrived in the log, and merge the received vector timestamp with its own.

The Bayou System

Preventing conflicts is too restrictive in a system with disconnections and partitions. Instead, when replicas share updates with each other, they try to resolve any conflicts that occur. The resolution of conflicts using domain-specific rules is called operational transformation. Each replica keeps a list of committed updates and a list of tentative updates. The order of operations, and thus the final decision to commit, is imposed by a primary replica.
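A minimal sketch of the committed/tentative split described above for Bayou. The names are hypothetical, and real Bayou writes also carry dependency checks and merge procedures for conflict resolution; this only shows how tentative writes replay after the primary-decided committed order.

```python
# Bayou-style replica: tentative writes are applied optimistically and may be
# rolled back and reapplied once the primary fixes the committed order.
class BayouReplica:
    def __init__(self):
        self.committed = []      # writes in their final, primary-decided order
        self.tentative = []      # writes applied optimistically; may reorder

    def local_write(self, write):
        self.tentative.append(write)

    def commit(self, write):
        """The primary has assigned this write its final position."""
        if write in self.tentative:
            self.tentative.remove(write)
        self.committed.append(write)

    def current_value(self):
        """Replay committed writes first, then tentative ones, into a view."""
        view = {}
        for key, value in self.committed + self.tentative:
            view[key] = value
        return view

r = BayouReplica()
r.local_write(("x", 1))          # applied tentatively at this replica
r.commit(("x", 2))               # a write committed in the primary's order
print(r.current_value())         # {'x': 1}: the tentative write replays last
```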

The Coda System

A descendant of AFS with the goal of allowing high availability despite disconnections and partitions. The replicas are called the Volume Storage Group (VSG). At any given time, a client can access some subset of these replicas, called the Available Volume Storage Group (AVSG). Connected execution proceeds as in AFS, with updates being communicated by clients to the AVSG.

Disconnected Coda

When the AVSG is empty, the client is said to be disconnected. In this situation, the client still has access to any files that were cached locally before the disconnect. When reconnection occurs, all the updates are sent back to the AVSG and any conflicts are manually resolved by the user.

Coda Replication

Each file has a Coda Version Vector (CVV). This is a vector timestamp with one entry for each replica in the VSG; each element of the CVV counts the updates received at a given replica for this file. Replicas can compare CVVs and, if v1 >= v2 or v1 <= v2, the more recent version of the file can be transmitted to update the old version. Normally, this is a two-step process: the individual servers agree to the update and acknowledge it to the client, and then the client computes the new CVV for the file and notifies all the servers that performed the update. If neither of those conditions holds, the file is considered to be in conflict, and user intervention is required to merge the files.

Communication with Replicas

In AFS, we know which server is going to give us a callback message on changes. In Coda, clients select one member of the AVSG when opening a file, and this one replica is responsible for providing a callback. When a file is updated, the update is sent to the whole AVSG, so those replicas can provide callbacks to their own clients. Once every few minutes, the client must probe the VSG for each cached file to check which replicas are in the AVSG. Replicas respond with a vector timestamp roughly representing the state of the replica. If a volume is found to be inconsistent between members of the AVSG, the client drops all its callback promises and requests new versions of its files.
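A minimal sketch of the CVV comparison described above (the helper name is hypothetical; real Coda also tracks per-replica store identifiers and update counts beyond what is shown here).

```python
# Coda Version Vector (CVV) comparison: one entry per replica in the VSG,
# counting the updates that replica has seen for the file.
def compare_cvv(v1, v2):
    """Return 'v1-newer', 'v2-newer', 'equal', or 'conflict'."""
    geq = all(a >= b for a, b in zip(v1, v2))
    leq = all(a <= b for a, b in zip(v1, v2))
    if geq and leq:
        return "equal"
    if geq:
        return "v1-newer"      # v1 dominates: its copy can overwrite the other
    if leq:
        return "v2-newer"      # v2 dominates: propagate v2's copy instead
    return "conflict"          # neither dominates: user must merge manually

print(compare_cvv([2, 2, 1], [1, 1, 1]))   # v1-newer
print(compare_cvv([2, 1, 1], [1, 2, 1]))   # conflict
```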