SCALABLE CONSISTENCY AND TRANSACTION MODELS THANKS TO M. GROSSNIKLAUS

Sharding and Replication
Data Management in the Cloud: Scalable Consistency and Transaction Models (thanks to M. Grossniklaus)

- Sharding: breaking a database into several collections (shards); each data item (e.g., a document) goes in one shard
- Replication: keeping multiple copies of a database; each data item lives in several places
- Sharding and replication can be combined (a minimal placement sketch follows this slide group)
- Why do systems shard and replicate?

Issues with Sharding and Replication
- Sharding: if an operation needs multiple data items, it might need to access several shards
- Replication: keeping the replicas in sync with each other
- Can you think of others?

Brewer's Conjecture
- Three properties are desirable and expected from real-world shared-data systems:
  - C: data consistency
  - A: availability
  - P: tolerance of network partitions
- At PODC 2000 (Portland, OR), Eric Brewer made the conjecture that only two of these properties can be satisfied by a system at any given time
- The conjecture was formalized and confirmed by MIT researchers Seth Gilbert and Nancy Lynch in 2002
- Now known as the CAP theorem
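The sharding and replication bullets above can be made concrete with a small placement sketch. This is an assumed, minimal example, not taken from the slides: the shard count, replication factor, node names, and hashing scheme are all illustrative. Each key is hashed to exactly one shard, and each shard is stored on a fixed number of nodes.

```python
# Minimal sketch (illustrative, not from the slides): hash-based sharding
# combined with replication. Shard count, replication factor, and node names
# are assumptions made for the example.
import hashlib

NUM_SHARDS = 4          # the database is split into 4 shards
REPLICATION_FACTOR = 3  # each shard is stored on 3 nodes
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]

def shard_of(key: str) -> int:
    """Each data item maps to exactly one shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def replicas_of(shard: int) -> list:
    """Each shard lives on REPLICATION_FACTOR consecutive nodes."""
    return [NODES[(shard + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

key = "auction:1234"
shard = shard_of(key)
print(key, "-> shard", shard, "-> replicas", replicas_of(shard))
```

A read or write of one key touches only that key's shard, while an operation spanning several keys may need to contact several shards, which is exactly the issue raised on the next slide.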

Data Consistency
- Database systems typically implement ACID transactions:
  - Atomicity: all or nothing
  - Consistency: transactions never observe or result in inconsistent data
  - Isolation: transactions are not aware of concurrent transactions
  - Durability: once committed, the state of a transaction is permanent
- Useful in automated business applications (a small worked example follows this slide group):
  - banking: at the end of a transaction, the sum of money in both accounts is the same as before the transaction
  - online auctions: the last bidder wins the auction
- There are applications that can deal with looser consistency guarantees and periods of inconsistency

Availability
- Services are expected to be highly available:
  - every request should receive a response
  - if you can read a data item, you should be able to update it
  - real-world problems arise when a service goes down
- Realistic goal: a service should be as available as the network it runs on; if any service on the network is available, the service should be available too

Partition-Tolerance
- A service should continue to perform as expected if some nodes crash or some communication links fail
- One desirable fault-tolerance property is resilience to the network partitioning into multiple components
- In cloud computing, node and communication failures are not the exception but everyday events

Problems with CAP (from D. Abadi)
- Asymmetry of the CAP properties:
  - C is a property of the system in general
  - A is a property of the system only when there is a partition
- There are not three different choices: in practice, CA and CP are indistinguishable, since A is sacrificed only when there is a partition
- CAP is often used as an excuse not to bother with consistency: "Availability is really important to me, so CAP says I have to get rid of consistency"
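To make the banking example on the Data Consistency slide concrete, here is a minimal sketch that uses SQLite purely as a stand-in for "a database with ACID transactions"; the table, account names, and amounts are assumptions made for illustration. The transfer either commits both updates or rolls both back, so the total balance is preserved.

```python
# Minimal sketch of the banking invariant: a transfer commits fully or not at
# all, so the sum of the balances never changes. SQLite is only a stand-in for
# an ACID database; names and amounts are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(amount, src, dst):
    with conn:  # one atomic transaction: both updates happen, or neither does
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

transfer(30, "alice", "bob")
print(conn.execute("SELECT SUM(balance) FROM accounts").fetchone())  # (150,)
```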

Another Problem to Fix
- Apart from availability in the face of partitions, there are other costs to consistency:
  - the overhead of synchronization schemes
  - latency: if replicas are far apart, updating one of them can take a long time, yet for reliability you might want replicas in different data centers

A Cut at Fixing Both Problems: PAC/ELC
- In the case of a partition (P), does the system choose availability (A) or consistency (C)?
- Else (E), does the system choose latency (L) or consistency (C)?
- PA/EL: Dynamo, SimpleDB, Cassandra, Riptano, CouchDB, Cloudant
- PC/EC: ACID-compliant database systems
- PA/EC: GenieDB
- PC/EL: existence is debatable

A Case for P*/EC
- There is an increased push for horizontally scalable transactional database systems: cloud computing, distributed applications, and the desire to deploy applications on cheap, commodity hardware
- The vast majority of currently available horizontally scalable systems are P*/EL:
  - developed by engineers at Google, Facebook, Yahoo, Amazon, etc.
  - these engineers can handle reduced consistency, but it is really hard, and there needs to be an option for the rest of us
- Also, distributed concurrency control and commit protocols are expensive; once consistency is gone, atomicity usually goes next (NoSQL)

Key Problems to Overcome
- High availability is critical, so replication must be a first-class citizen
- Today's systems generally act, then replicate:
  - this complicates the semantics of sending read queries to replicas
  - confirmation from a replica is needed before commit (increased latency) if you want durability and high availability
  - in-progress transactions must be aborted upon a master failure
- Want a system that replicates, then acts
- Distributed concurrency control and commit are expensive; want to get rid of them both
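The contrast between "act, then replicate" and "replicate, then act" can be sketched in a few lines. This is an assumed illustration, not any particular system's protocol; the Replica class and the transaction format are invented for the example.

```python
# Minimal sketch (assumed, not from the slides) of the two orderings.
class Replica:
    def __init__(self):
        self.log, self.state = [], {}

    def append_to_log(self, txn):
        self.log.append(txn)

    def execute(self, txn):
        key, value = txn
        self.state[key] = value

def act_then_replicate(primary, backups, txn):
    primary.execute(txn)          # the primary changes its state first...
    for b in backups:
        b.append_to_log(txn)      # ...and ships the update afterwards; a primary
                                  # crash between the two steps loses the transaction

def replicate_then_act(replicas, txn):
    for r in replicas:
        r.append_to_log(txn)      # every replica holds the transaction input first
    for r in replicas:
        r.execute(txn)            # then all of them execute it (deterministically)

primary, backup = Replica(), Replica()
act_then_replicate(primary, [backup], ("x", 1))
replicate_then_act([primary, backup], ("y", 2))
print(primary.state, backup.state)  # {'x': 1, 'y': 2} {'y': 2}: the backup only logged ("x", 1)
```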

Key Idea
- Instead of weakening ACID, strengthen it
- Challenges:
  - guaranteeing equivalence to some serial order makes active replication difficult
  - running the same set of transactions on two different replicas might cause the replicas to diverge
- Disallow any nondeterministic behavior
- Disallow aborts caused by the DBMS:
  - disallow deadlock
  - distributed commit is much easier if there are no aborts

Consequences of Determinism
- Replicas produce the same output given the same input, which facilitates active replication
- Only the initial input needs to be logged; the state at the time of a failure can be reconstructed from this input log (or from a replica)
- Active distributed transactions are not aborted upon node failure, which greatly reduces (or eliminates) the cost of distributed commit:
  - no need to worry about nodes failing during the commit protocol
  - no need to worry about the effects of a transaction reaching disk before the promise to commit
  - only one message is needed from any node that could potentially (deterministically) abort the transaction, and it can be sent in the middle of the transaction, as soon as that node knows it will commit
- (A small deterministic-replay sketch follows this slide group.)

Strong vs. Weak Consistency
- Strong consistency: after an update is committed, each subsequent access will return the updated value
- Weak consistency:
  - the system does not guarantee that subsequent accesses will return the updated value
  - a number of conditions might need to be met before the updated value is returned
  - inconsistency window: the period between the update and the point in time when every access is guaranteed to return the updated value

Eventual Consistency
- A specific form of weak consistency: if no new updates are made, eventually all accesses will return the last updated value
- In the absence of failures, the maximum size of the inconsistency window can be determined from communication delays, system load, and the number of replicas
- Not a new, esoteric idea:
  - the Domain Name System (DNS) uses eventual consistency for updates
  - RDBMSs use eventual consistency for asynchronous replication or backup (e.g., log shipping)
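The determinism argument can be illustrated with a tiny replay sketch: if transactions are deterministic functions of the database state and their arguments, replaying the same input log in the same order on any replica yields the same final state. This is an assumed, minimal example; the transaction functions and log format are invented for illustration.

```python
# Minimal sketch (assumed, not from the slides): deterministic replay.
# Replaying the same logged input in the same order on two replicas produces
# identical states, so no distributed commit is needed to keep them in sync.
def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount

def transfer(state, src, dst, amount):
    # Deterministic: the outcome depends only on the state and the arguments.
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount

input_log = [                      # the only thing that has to be logged
    (deposit, ("alice", 100)),
    (deposit, ("bob", 50)),
    (transfer, ("alice", "bob", 30)),
]

def replay(log):
    state = {}
    for txn, args in log:
        txn(state, *args)
    return state

replica1, replica2 = replay(input_log), replay(input_log)
assert replica1 == replica2        # replicas converge on the same state
print(replica1)                    # {'alice': 70, 'bob': 80}
```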

Eventual-Consistency Activity
[Figure: an example auction workload with users (U1, U2, U3, ..., U11, U12), bid accepters (S1, S2, ..., S8), items for auction (Boots, Tricycle, Zune), and bid trackers (A, B, C, D).]

Models of Eventual Consistency
- Causal consistency:
  - if process A has communicated to process B that it has updated a value, a subsequent access by B will return the updated value, and a write is guaranteed to supersede the earlier write
  - an access by a process C that has no causal relationship to A is subject to the normal eventual-consistency rules
- Read-your-writes consistency:
  - a special case of the causal consistency model
  - after updating a value, a process will always read the updated value and never see an older value
- Session consistency:
  - a practical variant of read-your-writes consistency
  - data is accessed within a session for which read-your-writes is guaranteed; the guarantee does not span sessions

Models of Eventual Consistency (continued)
- Monotonic read consistency: if a process has seen a particular value, any subsequent access will never return any previous value
- Monotonic write consistency: the system guarantees to serialize the writes of one process; systems that do not guarantee this level of consistency are hard to program
- The properties can be combined:
  - e.g., monotonic reads plus session-level consistency
  - e.g., monotonic reads plus read-your-own-writes
  - quite a few different scenarios are possible; whether an application can deal with the consequences depends on the application
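A common way to obtain read-your-writes and monotonic reads on top of an eventually consistent store is for the client session to remember the highest version it has written or read. The sketch below is an assumed illustration of that idea; the Replica and Session classes, the version counter, and the retry behavior are invented for the example, not taken from any particular system.

```python
# Minimal sketch (assumed): a client session that enforces read-your-writes and
# monotonic reads by remembering the highest version it has written or read.
class Replica:
    def __init__(self):
        self.version, self.value = 0, None

    def read(self):
        return self.version, self.value

    def write(self, version, value):
        if version > self.version:        # ignore stale asynchronous updates
            self.version, self.value = version, value

class Session:
    def __init__(self):
        self.last_seen = 0                # highest version written or read so far

    def write(self, replica, value):
        self.last_seen += 1
        replica.write(self.last_seen, value)

    def read(self, replica):
        version, value = replica.read()
        if version < self.last_seen:      # replica lags behind this session
            raise RuntimeError("replica too stale for this session; retry elsewhere")
        self.last_seen = version
        return value

r1, r2 = Replica(), Replica()             # r2 has not yet received the update
s = Session()
s.write(r1, "bid=100")
print(s.read(r1))                         # "bid=100": read-your-writes holds
# s.read(r2) would raise, because r2 is still behind the session's last write
```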

Configurations
- Definitions:
  - N: number of nodes that store a replica
  - W: number of replicas that need to acknowledge a write operation
  - R: number of replicas that are accessed for a read operation
- W + R > N, e.g., synchronous replication (N=2, W=2, R=1):
  - the write set and the read set always overlap
  - strong consistency can be guaranteed through quorum protocols
  - risk of reduced availability: in basic quorum protocols, operations fail if fewer than the required number of nodes respond, e.g., due to node failures
- W + R = N, e.g., asynchronous replication (N=2, W=1, R=1): strong consistency cannot be guaranteed

Configurations (continued)
- R=1, W=N: optimized for read access; a single read will return a value, but a write operation involves all nodes and risks not succeeding
- R=N, W=1: optimized for write access; a write involves only one node and relies on asynchronous updates to the other replicas, a read involves all nodes and returns the latest value, and durability is not guaranteed in the presence of failures
- W < (N+1)/2: risk of conflicting writes
- W + R <= N: weak/eventual consistency
- (A quorum read/write sketch follows this slide group.)

BASE
- Basically Available, Soft state, Eventually consistent
- Since consistency is achieved only eventually, conflicts have to be resolved at some point:
  - read repair
  - write repair
  - asynchronous repair
- Conflict resolution is typically based on a global (partial) ordering of operations that (deterministically) guarantees that all replicas resolve conflicts in the same way:
  - client-specified timestamps
  - vector clocks

Vector Clocks
- Generate a partial ordering of events in a distributed system and detect causality violations
- A vector clock for a system of n processes is a vector of n logical clocks (one clock per process):
  - messages contain the state of the sending process's logical clock
  - a local copy of the global vector clock, holding the smallest possible values, is kept in each process
- The vector clock algorithm was developed independently by Colin Fidge and Friedemann Mattern in 1988
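To see why W + R > N gives strong consistency, consider the following assumed sketch: a write returns once W replicas have acknowledged it, and a read contacts R replicas and keeps the value with the highest version. Because every read quorum then overlaps every write quorum, the read always observes the most recent acknowledged write. The configuration values, replica representation, and random sampling are illustrative choices, not part of the slides.

```python
# Minimal sketch (assumed): quorum-based reads and writes with W + R > N.
import random

N, W, R = 3, 2, 2                    # illustrative configuration; W + R > N
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    acks = 0
    for rep in random.sample(replicas, N):   # contact replicas in random order
        rep["version"], rep["value"] = version, value
        acks += 1
        if acks >= W:                        # return as soon as W replicas ack;
            return True                      # the remaining replicas stay stale
    return False

def read():
    contacted = random.sample(replicas, R)   # any R replicas
    freshest = max(contacted, key=lambda rep: rep["version"])
    return freshest["value"]

write("bid=100", version=1)
print(read())   # always "bid=100": every read quorum overlaps the write quorum
```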

Update Rules for Vector Clocks
- All clocks are initialized to zero
- A process increments its own logical clock in the vector by one:
  - each time it experiences an internal event
  - each time it prepares to send a message
  - each time it receives a message
- Each time a process sends a message, it transmits its entire vector clock along with the message
- Each time a process receives a message, it updates each element of its vector by taking the pair-wise maximum of the value in its own vector clock and the value in the vector of the received message
- (A small vector clock sketch follows the references below.)

Vector Clock Example
[Figure omitted; source: Wikipedia (http://www.wikipedia.org)]

References
- S. Gilbert and N. Lynch: Brewer's Conjecture and the Feasibility of Consistent, Available and Partition-Tolerant Web Services. SIGACT News 33(2), pp. 51-59, 2002.
- W. Vogels: Eventually Consistent. ACM Queue 6(6), pp. 14-19, 2008.
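The update rules above translate directly into a small data structure. The sketch below is an assumed illustration, not the pseudocode of Fidge or Mattern; the process names and the event sequence are invented for the example.

```python
# Minimal sketch (assumed) of the vector clock update rules listed above.
class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n                 # all clocks start at zero

    def internal_event(self):
        self.clock[self.pid] += 1            # tick on every internal event

    def send(self):
        self.clock[self.pid] += 1            # tick when preparing to send
        return list(self.clock)              # the whole vector travels with the message

    def receive(self, message_clock):
        self.clock[self.pid] += 1            # tick on receive
        self.clock = [max(a, b) for a, b in zip(self.clock, message_clock)]

p0, p1 = Process(0, 2), Process(1, 2)
p0.internal_event()                          # p0: [1, 0]
msg = p0.send()                              # p0: [2, 0], message carries [2, 0]
p1.receive(msg)                              # p1: pair-wise max of [0, 1] and [2, 0] -> [2, 1]
print(p0.clock, p1.clock)                    # [2, 0] [2, 1]
```

If neither of two clocks is component-wise less than or equal to the other, the corresponding events are concurrent; detecting such pairs is how conflicting replica versions are found and handed to conflict resolution.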