# The CAP theorem. The bad, the good and the ugly. Michael Pfeiffer Advanced Networking Technologies FG Telematik/Rechnernetze TU Ilmenau

## Transcription

1. The bad: The CAP theorem's proof
2. The good: A different perspective
3. The ugly: CAP and SDN

### Section 1: The bad: The CAP theorem's proof

#### The CAP theorem

Central proposition: In a distributed system, it is impossible to provide Consistency, Availability, and Partition tolerance all at once, i.e. at least one of them has to be sacrificed.

- Suggested by Brewer in 1999/2000, proof by Gilbert and Lynch in 2002 [1]
- In many networks, the absence of partitions cannot be guaranteed (firmware bugs, administrative errors, ...), so the choice is between CP and AP

#### Formal model

- Network partition: All messages between nodes in different components are lost.
- Availability (available data objects): Every request received by a non-failing node must result in a response. There is no time bound, but a network partition can last forever, so this is a strong availability requirement.
- Consistency (atomic data objects): There is a total order on all operations such that each operation looks as if it were completed at a single instant. Equivalently, requests must act as if they were processed on a single node, one at a time.

#### Proof

Proof by contradiction. Assume there is a CAP system consisting of two nodes G1 and G2 that are separated by a partition, with client C1 talking to G1 and client C2 talking to G2:

1. C1 writes x ← 42 to G1.
2. G1 must answer (availability) and reports success.
3. C2 asks G2 for the value of x.
4. G2 must answer as well (availability), but the partition has kept the new value from reaching it, so it cannot return 42 (consistency). This is the contradiction.
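The four steps of the proof can be replayed in a few lines. This is only a minimal sketch under stated assumptions: the `Node` class, its `write`/`read` methods, the `partitioned` flag and the initial value 0 are all hypothetical and not taken from the slides; the nodes are deliberately built to be always available, so the run shows consistency being lost.

```python
class Node:
    """A replica holding one register x (hypothetical minimal model)."""

    def __init__(self, name, initial=0):
        self.name = name
        self.x = initial
        self.peers = []

    def write(self, value, partitioned):
        self.x = value
        if not partitioned:
            for peer in self.peers:
                peer.x = value  # replication message is delivered
        # Availability: respond even when the peers are unreachable.
        return "success"

    def read(self):
        # Availability: respond with whatever is stored locally.
        return self.x


g1, g2 = Node("G1"), Node("G2")
g1.peers, g2.peers = [g2], [g1]

ack = g1.write(42, partitioned=True)  # steps 1 and 2: C1 writes, G1 acknowledges
answer = g2.read()                    # steps 3 and 4: C2 reads from G2
consistent = (answer == 42)           # False: G2 never saw the write
print(ack, answer, consistent)
```

Making `write` block or fail while the partition lasts would restore consistency, but only by giving up availability, which is exactly the trade-off the proof establishes.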

#### Classical strategies for CP and AP

CP systems:

- Delay the acknowledgement of a write operation until the new value has been propagated to all nodes
- Examples: relational database with synchronous replication, two-phase commit protocol (2PCP)

AP systems:

- Answer with the (possibly stale) last known value
- Examples: slave DNS servers, NoSQL databases
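The two strategies can be contrasted in a small sketch. Everything here is an illustrative assumption rather than a real database API: the `Replica` class, the `reachable` flag that simulates a partition, and the three functions. Raising `Unavailable` stands in for delaying the acknowledgement indefinitely.

```python
class Unavailable(Exception):
    """Raised by the CP write when it cannot reach every replica."""


class Replica:
    def __init__(self):
        self.value = None
        self.reachable = True  # False simulates being cut off by a partition


def cp_write(replicas, value):
    """CP: acknowledge only after every replica has the new value."""
    if not all(r.reachable for r in replicas):
        raise Unavailable("cannot propagate to all nodes; write refused")
    for r in replicas:
        r.value = value
    return "ack"


def ap_write(replicas, value):
    """AP: update whoever is reachable and acknowledge immediately."""
    for r in replicas:
        if r.reachable:
            r.value = value
    return "ack"


def ap_read(replica):
    """AP: answer with the last known value, which may be stale."""
    return replica.value


a, b = Replica(), Replica()
cp_write([a, b], "v1")      # no partition: both replicas get v1
b.reachable = False         # partition cuts off b
ap_write([a, b], "v2")      # a moves on, b keeps the stale v1
print(ap_read(a), ap_read(b))
```

After the partition, a `cp_write` on the same replicas raises `Unavailable`, while the AP path keeps answering with diverging values.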

### Section 2: The good: A different perspective

#### A different perspective (by Brewer [2])

The partition decision: If a partition occurs during the processing of an operation, each node can decide to

- cancel the operation (favour C over A), or
- proceed, but risk inconsistencies (favour A over C).

But: It is possible to decide differently every time, based on the circumstances.

This means:

- No partition, no problem
- But during a partition, all systems must decide eventually
- Permanently retrying is in fact a choice for C over A
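A per-operation partition decision might look like the following sketch. The `Decision` enum, the function name and the `reconcilable` flag (whether the effects of the operation can be repaired after the partition heals) are hypothetical; the criterion a real system uses depends on its application.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"  # favour A over C
    CANCEL = "cancel"    # favour C over A


def partition_decision(partition_suspected, reconcilable):
    """Decide for a single operation; the outcome may differ per call."""
    if not partition_suspected:
        return Decision.PROCEED  # no partition, no problem
    # During a partition, proceed only if the result can be repaired later.
    return Decision.PROCEED if reconcilable else Decision.CANCEL
```

Because the function is evaluated per operation, the same node can favour availability for one request and consistency for the next.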

#### Mitigation strategies

- Generally: To keep consistency, some operations must be forbidden during a partition; others are okay (e.g. read queries)
- Often: Guarantee consistency to a certain degree
  - Example: Read-your-own-writes consistency
  - Facebook: A user's timeline is stored at a master copy and cached at slaves
  - Usually users see (potentially stale) copies at slaves
  - But when they post something, their reads are redirected to the respective master for a certain time
- Different strategies on different levels are possible, e.g. inside a single site and between sites (latency!)
- Often: Progress is possible in one component; multiple consensus algorithms are available (e.g. dynamic voting)
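The read-your-own-writes redirection can be sketched as follows. This is an assumption-laden toy, not a description of Facebook's actual implementation: the `TimelineStore` class, the `sticky_seconds` window and the hand-triggered `replicate` step are all made up for illustration, and time is passed in explicitly to keep the example deterministic.

```python
class TimelineStore:
    def __init__(self, sticky_seconds=5.0):
        self.master = {}      # authoritative copy: user -> list of posts
        self.replica = {}     # possibly stale cache of the master
        self.last_write = {}  # user -> time of their most recent post
        self.sticky_seconds = sticky_seconds

    def post(self, user, text, now):
        self.master.setdefault(user, []).append(text)
        self.last_write[user] = now

    def replicate(self):
        """Asynchronous replication, triggered by hand in this sketch."""
        self.replica = {u: list(posts) for u, posts in self.master.items()}

    def read(self, reader, owner, now):
        wrote_at = self.last_write.get(reader)
        recently_wrote = wrote_at is not None and now - wrote_at < self.sticky_seconds
        # Recent writers are pinned to the master; everyone else reads the cache.
        source = self.master if recently_wrote else self.replica
        return list(source.get(owner, []))
```

A user who has just posted sees the post immediately, while other readers may still get the stale cached copy until replication catches up.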

#### Partition recovery

What if we still want to continue service during a partition?

1. Detect partition
2. Enter a special partition mode
3. Continue service
4. After partition: Recovery

The small problem: Partition detection

- Nodes can disagree whether a partition exists
- Consensus about partition state is not possible
- Nodes may enter the partition mode at different times
- A distributed commit protocol is required (2PCP, Paxos, ...)

#### The big problem: Partition recovery

A (very) simple example:

- Users register on a web site
- Every user is assigned a unique ID (SQL: serial, auto_increment)
- During partition: The same ID might be assigned twice
- Recovery: Recreate uniqueness of IDs

Partition recovery: It's about invariants

- In a consistent system, invariants are guaranteed, even when the system's designer does not know them
- In an available system, invariants must be explicitly restored after a partition, so the system's designer must know the invariants and how to restore them
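Restoring the uniqueness invariant for the user-ID example could be sketched like this. The function name, the `{id: name}` table layout and the rule that one side keeps its IDs are assumptions made for illustration; comparing rows by name is a simplification that a real system would replace with a proper row identity.

```python
def merge_users(side_a, side_b, next_free_id):
    """Merge two {id: name} tables created on different sides of a partition.

    Rows from side_a keep their IDs. A row from side_b whose ID clashes with
    a different user gets a fresh ID. Returns the merged table and a map of
    old -> new IDs so that references elsewhere can be rewritten.
    """
    merged = dict(side_a)
    remapped = {}
    for user_id, name in sorted(side_b.items()):
        if user_id in merged and merged[user_id] != name:
            # Skip IDs already taken on either side.
            while next_free_id in merged or next_free_id in side_b:
                next_free_id += 1
            remapped[user_id] = next_free_id
            merged[next_free_id] = name
            next_free_id += 1
        else:
            merged[user_id] = name
    return merged, remapped


before = {1: "ann", 2: "bob"}
side_a = {**before, 3: "carol"}  # registered on one side of the partition
side_b = {**before, 3: "dave"}   # registered on the other side, same ID
merged, remapped = merge_users(side_a, side_b, next_free_id=4)
print(merged, remapped)
```

The returned `remapped` table matters as much as the merged rows: every foreign key, session or URL that used the old ID has to be rewritten too, which is why the designer must know the invariant in advance.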

#### CRDTs

Commutative/Conflict-free Replicated Data Types (CRDTs) are data types that provably converge.

Example: Google Docs serialises edits into a series of insert and delete operations.

- Original text: "On Monday, the ANT lecture is at 13:00."
- One edit: "On Thursday, the ANT lecture is at 13:00."
- Another, concurrent edit: "On Monday, the ANT lecture is at 17:00."
- Merged result: "On Thursday, the ANT lecture is at 17:00."

Application-specific invariants are not ensured automatically.
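To make "provably converge" concrete, here is a minimal sketch of a grow-only counter (G-Counter), a standard textbook CRDT that is not taken from the slides. The class and method names are illustrative. Each replica increments only its own slot, and merging takes the element-wise maximum, which is commutative, associative and idempotent.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)  # updates made on one side of a partition
b.increment(3)  # updates made on the other side
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

Both replicas end up with the same state regardless of merge order. As with the lecture-time example, convergence only guarantees that the replicas agree, not that the agreed value satisfies an application-level rule such as an upper bound.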

#### More on partition recovery

- Recovery is tedious and error prone
  - Brewer: Similar to going from single-threaded to multi-threaded programming
- Sometimes the only possibility: Ask the user (e.g. git merge)
- Balance between availability and consistency:
  - ATMs: When partitioned, limit withdrawal to amount X
  - Invariant: Not more withdrawals than allowed
  - Manual correction afterwards
- Usual tools:
  - Version vectors (vector clocks)
  - Logging, replay and rollback
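Version vectors are used during recovery to tell whether one update supersedes another or whether the two were made concurrently and need merging. A minimal sketch, assuming vectors are plain dictionaries from node name to counter (missing entries count as 0); the function name and return labels are illustrative:

```python
def compare(vv_a, vv_b):
    """Return 'equal', 'a_before_b', 'b_before_a' or 'concurrent'."""
    nodes = set(vv_a) | set(vv_b)
    a_le_b = all(vv_a.get(n, 0) <= vv_b.get(n, 0) for n in nodes)
    b_le_a = all(vv_b.get(n, 0) <= vv_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a_before_b"  # b has seen everything a has seen
    if b_le_a:
        return "b_before_a"
    return "concurrent"      # neither dominates: a real conflict
```

Only the `concurrent` case requires reconciliation logic or user input; in the other cases the dominated version can simply be discarded.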

### Section 3: The ugly: CAP and SDN

#### SDN and CAP

- So far, we have talked about distributed systems on the application layer (databases, web services, ...)
- SDN is much more basic (layer 2/3)
- Network functionality is essential, so pure CP is not really an option
- AP means partition recovery is required

#### SDN and partition recovery

- Possible without the network up and running? Beware of dependency loops...
- Is falling back to non-SDN networking possible? Even if SDN has been used to replace features like VLANs?
- Relying on user input is rather unrealistic...
- Possible to figure out all the invariants?
- Most SDN publications ignore the issue...
- BGP does not stabilise in all cases [3]

#### Wrapping up

1. The CAP theorem is proven and holds.
2. Do not think about CP or AP systems, but about the partition decision.
3. There are many possibilities to fine-tune the balance between consistency and availability, and to recover from partitions.
4. But systems tend to become very complex.
5. Can we stomach this amount of complexity for building services as basic as network connectivity?

- [1] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In: ACM SIGACT News 33 (2), June 2002.
- [2] Eric Brewer. CAP twelve years later: How the rules have changed. In: Computer 45 (2), Feb. 2012.
- [3] Timothy G. Griffin and Gordon Wilfong. An analysis of BGP convergence properties. In: ACM SIGCOMM Computer Communication Review 29 (4), Oct. 1999.
