
The CAP theorem: The bad, the good and the ugly
Michael Pfeiffer, Advanced Networking Technologies, FG Telematik/Rechnernetze, TU Ilmenau
2017-05-15

1. The bad: The CAP theorem's proof
2. The good: A different perspective
3. The ugly: CAP and SDN

Section 1. The bad: The CAP theorem's proof

The CAP theorem
Central proposition: In a distributed system, it is impossible to provide Consistency, Availability, and Partition tolerance all at once; at least one of them has to be sacrificed.
Suggested by Brewer in 1999/2000, proven by Gilbert and Lynch in 2002 [1].
In many networks, the absence of partitions cannot be guaranteed (firmware bugs, administrative errors, ...), so in practice the choice is between CP and AP.

Formal model
Network partition: All messages between nodes in different components are lost.
Availability (available data objects): Every request received by a non-failing node must result in a response. There is no time bound, but since a partition can last forever, this is still a strong availability requirement.
Consistency (atomic data objects): There is a total order on all operations such that each operation looks as if it were completed at a single instant. Equivalently, requests must act as if they were processed on a single node, one at a time.

Proof
Proof by contradiction. Assume there is a CAP system and split its nodes into two groups, G1 and G2, with a partition between them:
1. Client C1 writes x := 42 to a node in G1.
2. G1 must answer (availability) and, since no messages cross the partition, acknowledges the write with "success" without informing G2.
3. Client C2 then reads x from a node in G2.
4. G2 must also answer (availability), but it can only return the old value of x. The read does not reflect the completed write, so the execution is not atomic. Contradiction: the system cannot be consistent, available and partition tolerant at the same time.
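
To make the scenario concrete, here is a minimal, purely illustrative Python sketch of the argument (the Replica class and its methods are inventions of this example, not part of the slides or of any real system): a replica group that keeps answering during a partition necessarily hands out a stale value.

    # Toy illustration of the CAP proof: two replica groups G1 and G2 that
    # stay available during a partition cannot also stay consistent.
    # All names here (Replica, partitioned, ...) are made up for this sketch.

    class Replica:
        def __init__(self, name):
            self.name = name
            self.store = {"x": 0}

        def write(self, key, value, peers, partitioned):
            self.store[key] = value
            # Replication messages to the other side are lost during the partition.
            for peer in peers:
                if not partitioned:
                    peer.store[key] = value
            return "success"          # availability: we must answer anyway

        def read(self, key):
            return self.store[key]    # availability: answer with what we have

    g1, g2 = Replica("G1"), Replica("G2")

    # Client C1 writes to G1 while the network is partitioned.
    print(g1.write("x", 42, peers=[g2], partitioned=True))  # -> success
    # Client C2 reads from G2 and sees the stale value 0, not 42:
    print(g2.read("x"))  # -> 0, violating atomic consistency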

Classical strategies for CP and AP
CP systems: delay the acknowledgement of a write operation until the new value has been propagated to all nodes. Examples: relational databases with synchronous replication, the two-phase commit protocol (2PC).
AP systems: answer with the (possibly stale) last known value. Examples: slave DNS servers, many NoSQL databases.
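
The two strategies can be contrasted in a small hedged sketch (illustrative Python; the store classes and the reachable flag are assumptions of this example, not a real database API):

    # Sketch of the two classical strategies; replica dictionaries and the
    # `reachable` flag are assumptions of this example, not a real API.

    class SyncReplicatedStore:          # CP flavour
        def __init__(self, replicas):
            self.replicas = replicas

        def write(self, key, value):
            if not all(r["reachable"] for r in self.replicas):
                # A replica is unreachable: refuse the write (give up availability).
                raise RuntimeError("write rejected: not all replicas reachable")
            for r in self.replicas:
                r["data"][key] = value
            return "ack"

    class StaleReadStore:               # AP flavour
        def __init__(self, local):
            self.local = local

        def read(self, key):
            # Answer from the local copy, even if stale (give up consistency).
            return self.local["data"].get(key)

    replicas = [{"reachable": True, "data": {}}, {"reachable": False, "data": {}}]
    cp = SyncReplicatedStore(replicas)
    ap = StaleReadStore(replicas[0])
    print(ap.read("x"))                  # possibly stale answer, here None
    try:
        cp.write("x", 42)
    except RuntimeError as e:
        print(e)                         # the CP system refuses during the partition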

Section 2. The good: A different perspective

A different perspective (by Brewer [2])
The partition decision: If a partition occurs during the processing of an operation, each node can decide to either cancel the operation (favour C over A) or proceed, but risk inconsistencies (favour A over C). But: it is possible to decide differently every time, based on the circumstances.
This means: no partition, no problem. During a partition, however, every system must decide eventually; permanently retrying is in fact a choice of C over A.
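
The per-operation nature of the partition decision can be pictured as a small policy function; the following Python sketch and its operation categories are assumptions of this example, not something prescribed by Brewer:

    # Hedged sketch of a per-operation partition decision: during a partition,
    # reads proceed (favour A), writes to invariant-critical data are cancelled
    # (favour C), and everything else proceeds in a special "partition mode".

    def decide(operation, partitioned):
        if not partitioned:
            return "execute"                       # no partition, no problem
        if operation["type"] == "read":
            return "execute"                       # stale reads acceptable here
        if operation.get("critical"):
            return "cancel"                        # favour consistency for this op
        return "execute-and-log"                   # favour availability, repair later

    print(decide({"type": "read"}, partitioned=True))                     # execute
    print(decide({"type": "write", "critical": True}, partitioned=True))  # cancel
    print(decide({"type": "write"}, partitioned=True))                    # execute-and-log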

Mitigation strategies
Generally: to keep consistency, some operations must be forbidden during a partition; others are fine (e.g. read-only queries).
Often: guarantee consistency only to a certain degree. Example: read-your-own-writes consistency (sketched below). At Facebook, a user's timeline is stored at a master copy and cached at slaves; usually users see (potentially stale) copies at the slaves, but when they post something, their reads are redirected to the respective master for a certain time.
Different strategies are possible on different levels, e.g. inside a single site and between sites (latency!).
Often: progress is possible in one component; multiple consensus algorithms are available for this (e.g. dynamic voting).
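
A minimal sketch of the read-your-own-writes routing mentioned above, assuming an in-memory master and replica and an arbitrary 10-second window (illustrative Python, not Facebook's actual mechanism):

    import time

    # Sketch of read-your-own-writes routing: after a user writes, their reads
    # go to the master copy for a fixed window; otherwise a (possibly stale)
    # replica answers. The 10-second window is an arbitrary assumption.

    class ReadYourWritesRouter:
        def __init__(self, master, replica, window=10.0):
            self.master, self.replica, self.window = master, replica, window
            self.last_write = {}                  # user -> timestamp of last write

        def write(self, user, key, value):
            self.master[key] = value
            self.last_write[user] = time.time()   # remember when this user wrote

        def read(self, user, key):
            recently_wrote = time.time() - self.last_write.get(user, 0) < self.window
            source = self.master if recently_wrote else self.replica
            return source.get(key)

    router = ReadYourWritesRouter(master={}, replica={})
    router.write("alice", "timeline", "new post")
    print(router.read("alice", "timeline"))   # "new post" (served by the master)
    print(router.read("bob", "timeline"))     # None: the replica has not caught up yet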

Partition recovery
What if we still want to continue service during a partition?
1. Detect the partition.
2. Enter a special partition mode.
3. Continue service.
4. After the partition: recovery.
The small problem, partition detection: nodes can disagree on whether a partition exists, and consensus about the partition state is not possible, so nodes may enter partition mode at different times. A distributed commit protocol is required (two-phase commit, Paxos, ...).

The big problem: Partition recovery
A (very) simple example: users register on a web site, and every user is assigned a unique ID (SQL: serial, auto_increment). During a partition, the same ID might be assigned twice; recovery must recreate the uniqueness of the IDs (a sketch follows below).
Partition recovery is about invariants: in a consistent system, invariants are guaranteed even when the system's designer does not know them. In an available system, invariants must be explicitly restored after a partition, so the system's designer must know the invariants and how to restore them.
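
One hedged sketch of restoring the uniqueness invariant during recovery (illustrative Python; the table layout and the remapping policy are assumptions of this example, not from the slides):

    # Sketch of recreating the uniqueness invariant after a partition: both sides
    # handed out IDs independently, so duplicates must be detected and remapped.
    # The data layout (dicts mapping id -> username) is an assumption of this sketch.

    def merge_user_tables(side_a, side_b):
        merged = dict(side_a)                       # id -> username from one side
        next_free = max(list(side_a) + list(side_b), default=0) + 1
        remapped = {}                               # old id on side B -> new id
        for user_id, name in side_b.items():
            if user_id in merged:                   # invariant violated during partition
                remapped[user_id] = next_free
                merged[next_free] = name
                next_free += 1
            else:
                merged[user_id] = name
        return merged, remapped

    a = {1: "alice", 2: "bob"}
    b = {2: "carol", 3: "dave"}                     # id 2 was handed out twice
    merged, remapped = merge_user_tables(a, b)
    print(dict(sorted(merged.items())))  # {1: 'alice', 2: 'bob', 3: 'dave', 4: 'carol'}
    print(remapped)                      # {2: 4} -- references to carol's old id must be rewritten too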

CRDTs
Commutative/Conflict-free Replicated Data Types (CRDTs) are data types that provably converge.
Example: Google Docs serialises edits into a series of insert and delete operations. Starting from "On Monday, the ANT lecture is at 13:00.", one user changes "Monday" to "Thursday" while another concurrently changes "13:00" to "17:00"; once the edits are exchanged, both replicas converge to "On Thursday, the ANT lecture is at 17:00."
However, application-specific invariants are not ensured automatically.
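
A CRDT need not be as elaborate as collaborative text editing; the classic minimal example is a grow-only counter. The following Python sketch is illustrative only (class and method names are made up here): each node increments its own slot, and merging takes the element-wise maximum, which is commutative, associative and idempotent, so replicas provably converge.

    # Minimal grow-only counter (G-Counter) CRDT sketch: one slot per node,
    # increments touch only the local slot, merge is the element-wise maximum.

    class GCounter:
        def __init__(self, node_id, nodes):
            self.node_id = node_id
            self.slots = {n: 0 for n in nodes}

        def increment(self, amount=1):
            self.slots[self.node_id] += amount

        def merge(self, other):
            for node, count in other.slots.items():
                self.slots[node] = max(self.slots[node], count)

        def value(self):
            return sum(self.slots.values())

    nodes = ["n1", "n2"]
    a, b = GCounter("n1", nodes), GCounter("n2", nodes)
    a.increment(3)                # updates happen independently during the partition
    b.increment(2)
    a.merge(b)                    # after the partition, replicas exchange state
    b.merge(a)
    print(a.value(), b.value())   # 5 5 -- both converge to the same value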

More on partition recovery
Recovery is tedious and error-prone; Brewer compares it to going from single-threaded to multi-threaded programming. Sometimes the only possibility is to ask the user (e.g. git merge).
Balancing availability and consistency, ATM example: when partitioned, limit withdrawals to an amount X (invariant: no more is withdrawn than allowed), with manual correction afterwards.
Usual tools: version vectors (vector clocks, sketched below), logging, replay and rollback.
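
Version vectors, one of the usual tools listed above, can be sketched as follows (illustrative Python; the dictionary representation is an assumption of this example): each replica counts its own updates, and comparing two vectors shows whether one version supersedes the other or whether they are concurrent and need conflict resolution.

    # Sketch of version vectors: each replica increments its own entry on update;
    # comparing two vectors detects whether updates are ordered or concurrent.

    def dominates(v1, v2):
        """True if version v1 has seen at least everything in v2."""
        keys = set(v1) | set(v2)
        return all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)

    def compare(v1, v2):
        if dominates(v1, v2) and dominates(v2, v1):
            return "equal"
        if dominates(v1, v2):
            return "v1 newer"
        if dominates(v2, v1):
            return "v2 newer"
        return "concurrent"          # conflicting updates: resolution needed

    print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # v1 newer
    print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 1}))  # concurrent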

Section 3. The ugly: CAP and SDN

SDN and CAP
So far, we have talked about distributed systems on the application layer (databases, web services, ...). SDN is much more basic (layer 2/3). Network functionality is essential, so pure CP is not really an option, and AP means partition recovery is required.

SDN and partition recovery
Is recovery possible without the network up and running? Beware of dependency loops...
Is falling back to non-SDN networking possible, even if SDN has been used to replace features like VLANs?
Relying on user input is rather unrealistic...
Is it possible to figure out all the invariants? Most SDN publications ignore the issue, and BGP does not stabilise in all cases [3]...

Wrapping up
1. The CAP theorem is proven and holds.
2. Do not think about CP or AP systems, but about the partition decision.
3. There are many possibilities to fine-tune the balance between consistency and availability, and to recover from partitions.
4. But such systems tend to become very complex.
5. Can we stomach this amount of complexity for building services as basic as network connectivity?

References
[1] Seth Gilbert and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services". ACM SIGACT News 33(2), June 2002, pp. 51-59. DOI: 10.1145/564585.564601.
[2] Eric Brewer. "CAP twelve years later: How the rules have changed". Computer 45(2), Feb. 2012, pp. 23-29. DOI: 10.1109/MC.2012.37.
[3] Timothy G. Griffin and Gordon Wilfong. "An analysis of BGP convergence properties". ACM SIGCOMM Computer Communication Review 29(4), Oct. 1999, pp. 277-288. DOI: 10.1145/316194.316231.