
Strong Consistency. CS6450: Distributed Systems Lecture 15. Ryan Stutsman. Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed for use under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Some material taken/derived from MIT 6.824 by Robert Morris, Frans Kaashoek, and Nickolai Zeldovich. 1

Consistency models: strong consistency via 2PC / consensus (Paxos / Raft) versus eventual consistency (Dynamo). 2

Consistency in Paxos/Raft. [Figure: a client submits shl to a cluster of three servers; each server has a Consensus Module, a State Machine, and a Log holding add, jmp, mov, shl.] Fault-tolerance / durability: don't lose operations. Consistency: ordering between (visible) operations.
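A minimal sketch of the replicated-state-machine structure in that figure (illustrative Python, not code from the lecture; consensus itself is stubbed out): each replica holds a log of committed commands and applies them in log order, so identical logs yield identical state machines.

    class Replica:
        def __init__(self):
            self.log = []      # ordered, agreed-upon commands
            self.state = {}    # the state machine: here, a simple key-value map

        def append_committed(self, cmd):
            # Called once the (stubbed) consensus module commits cmd at the next slot.
            self.log.append(cmd)

        def apply_all(self):
            # Apply log entries in order; identical logs -> identical states.
            for op, key, value in self.log:
                if op == "write":
                    self.state[key] = value

    # If consensus delivers the same log to every replica, their states match.
    replicas = [Replica() for _ in range(3)]
    for r in replicas:
        for cmd in [("write", "a", 1), ("write", "b", 2)]:
            r.append_committed(cmd)
        r.apply_all()
    assert all(r.state == {"a": 1, "b": 2} for r in replicas)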

Correct consistency model? Suppose clients A and B each send an op. Should all readers see A then B? All readers see B then A? Some see A then B and others see B then A?

Paxos/Raft has strong consistency. It provides the behavior of a single copy of the object: a read should return the most recent write, and subsequent reads should return the same value until the next write. Telephone intuition: 1. Alice updates her Facebook post. 2. Alice calls Bob on the phone: "Check my Facebook post!" 3. Bob reads Alice's wall and sees her post. 5

Strong Consistency? [Timeline: a client's write(a,1) returns success; another client's later read(a) returns 1.] The phone call ensures a happens-before relationship, even through out-of-band communication. 6

Strong Consistency? [Timeline: write(a,1) returns success; a later read(a) returns 1.] One cool trick: delay responding to writes/ops until they are properly committed. 7

Strong Consistency? This is buggy! [Timeline: write(a,1) returns success once committed; a read(a) at a third node returns 1.] It isn't sufficient to return the value held by the third node: that node doesn't know precisely when the op is globally committed. Instead: we need to actually order the read operation. 8

Strong Consistency! [Timeline: write(a,1) returns success; read(a) returns 1.] Order all operations via (1) the leader, (2) consensus. 9
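A sketch of that fix (illustrative Python with invented names, not Raft's actual interface): the read is placed into the same totally ordered operation stream as the writes, so it executes at a definite point after every write that committed before it and can never return a stale value.

    class OrderedKV:
        def __init__(self):
            self.log = []     # stands in for the consensus-ordered log
            self.state = {}

        def _commit(self, entry):
            # In a real system this is a round of consensus; here it is just an append.
            self.log.append(entry)
            op, key, value = entry
            if op == "write":
                self.state[key] = value
            elif op == "read":
                return self.state.get(key)

        def write(self, key, value):
            self._commit(("write", key, value))        # reply only after commit

        def read(self, key):
            return self._commit(("read", key, None))   # the read is ordered, too

    kv = OrderedKV()
    kv.write("a", 1)
    assert kv.read("a") == 1   # the read is serialized after the committed write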

Strong consistency = linearizability. Linearizability (Herlihy and Wing, 1990): 1. All servers appear as if all ops are executed atomically in some (single, total) sequential order. 2. The global ordering preserves each client's own local ordering. 3. The global ordering is consistent with invocations and responses, preserving a real-time guarantee: if X completes before Y starts, then X precedes Y in the sequence. Once a write completes, all later reads (by wall-clock start time) should return the value of that write or of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write.
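The real-time clause is what distinguishes linearizability. A minimal sketch (my own illustration, assuming ops are tagged with wall-clock start/end times) of checking whether a candidate total order over single-register ops satisfies both register semantics and the real-time rule:

    def is_valid_linearization(order):
        # Each op is (start_time, end_time, kind, value).
        # (1) Register semantics: every read returns the latest preceding write.
        current = None
        for (_, _, kind, value) in order:
            if kind == "write":
                current = value
            elif value != current:
                return False
        # (2) Real-time order: if Y finished before X started, Y must not come after X.
        for i, x in enumerate(order):
            for y in order[i + 1:]:
                if y[1] < x[0]:
                    return False
        return True

    w  = (0.0, 1.0, "write", 1)   # write(a,1) completes at t=1
    r1 = (2.0, 3.0, "read", 1)    # a later read sees 1: fine
    r0 = (2.0, 3.0, "read", 0)    # a later read sees stale 0: not linearizable
    assert is_valid_linearization([w, r1])
    assert not any(is_valid_linearization(o) for o in ([w, r0], [r0, w]))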

Intuition: real-time ordering. [Timeline: write(a,1) commits and returns success; a later read(a) returns 1.] Once a write completes, all later reads (by wall-clock start time) should return the value of that write or of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write. 11

[Timeline exercises (slides 12-14): a write(a,1) completes with success, and a read(a) starts before, during, or after it; in each case, which values may the read return?]

[Figure: two histories over a register, with processes A and B issuing W(0), W(1), R(0), R(1): (a) a linearizable history and (b) a non-linearizable history. 15]

Weaker: sequential consistency. Sequential consistency = linearizability minus the real-time ordering requirement. 1. All servers appear to execute all ops in some identical sequential order. 2. The global ordering preserves each client's own local ordering. With concurrent ops, reordering of ops (w.r.t. real-time order) is acceptable, but all servers must see the same order; e.g., linearizability cares about real time, while sequential consistency cares only about program order.

Sequential Consistency. [Timeline: one client's write(a,1) returns success; another client's later read(a) returns 0.] In this example, the system orders read(a) before write(a,1). 17
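A sketch (illustrative Python, assumptions mine) of why that history is acceptable: sequential consistency only requires some total order that respects each client's program order and register semantics, so the system may place read(a) before write(a,1) even though that contradicts real time.

    from itertools import permutations

    def register_ok(order):
        current = 0                    # assume a is initially 0
        for kind, value in order:
            if kind == "write":
                current = value
            elif value != current:     # a read must see the latest write
                return False
        return True

    def respects_program_order(order, programs):
        for prog in programs:          # ops of one client, in issue order
            positions = [order.index(op) for op in prog]
            if positions != sorted(positions):
                return False
        return True

    client1 = [("write", 1)]           # write(a,1)
    client2 = [("read", 0)]            # read(a) returned 0
    ops = client1 + client2
    assert any(register_ok(p) and respects_program_order(list(p), [client1, client2])
               for p in permutations(ops))   # ordering the read first works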

Valid Sequential Consistency? [Figure: one history is valid, the other is not.] Why? Because P3 and P4 don't agree on the order of ops. It doesn't matter when events took place on different machines, as long as the processes AGREE on the order. What if P1 did both W(x)a and W(x)b? Then neither history is valid, as (a) doesn't preserve local ordering.

Tradeoffs are fundamental? Strong consistency via 2PC / consensus (Paxos / Raft) versus eventual consistency (Dynamo). 19

CAP Conjecture for Distributed Systems. From a keynote lecture by Eric Brewer (2000). History: Brewer started Inktomi, an early Internet search site built around commodity clusters of computers, and used CAP to justify the BASE model: Basically Available, Soft-state services with Eventual consistency. Popular interpretation: pick 2 out of 3 of Consistency (linearizability), Availability (can reply to all requests without blocking), and Partition tolerance (arbitrary crash/network failures). 20

CAP Theorem: Proof. AP, but not consistent. Gilbert, Seth, and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services." ACM SIGACT News 33.2 (2002): 51-59. 21

CAP Theorem: Proof. CP, but not available. Gilbert, Seth, and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services." ACM SIGACT News 33.2 (2002): 51-59. 22

CAP Theorem: Proof. Not partition tolerant. Gilbert, Seth, and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services." ACM SIGACT News 33.2 (2002): 51-59. 23

CAP Theorem: AP or CP. Criticism: it's not really 2-out-of-3, since you can't choose to avoid partitions. So the real choice is AP or CP. Is CA just "not partition tolerant"? 24

Raft? 25

PACELC. If there is a partition (P): how does the system trade off A and C? Else (E, no partition): how does the system trade off latency (L) and C? Is there a useful system that switches? Dynamo: PA/EL. ACID DBs: PC/EC. http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html 26

More tradeoffs: L vs. C. Low latency: can we speak to fewer than a quorum of nodes? 2PC: write N, read 1. Raft: write N/2 + 1, read N/2 + 1. In general: W + R > N. L and C are fundamentally at odds. C = linearizability, sequential consistency, serializability (more later). 27
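A small sketch of the quorum condition above (illustrative Python; the replica layout and helper names are mine): with N replicas, if every write reaches W of them and every read consults R of them where W + R > N, then the read and write quorums must intersect, so a read always sees the newest committed version, at the cost of contacting more nodes.

    import itertools

    N, W, R = 5, 3, 3                  # W + R = 6 > N = 5
    replicas = [(0, None)] * N         # each replica stores (version, value)

    def write(replicas, targets, version, value):
        # Install the new version at the W replicas in `targets`.
        return [(version, value) if i in targets else rv
                for i, rv in enumerate(replicas)]

    def read(replicas, targets):
        # Query the R replicas in `targets`; the highest version wins.
        return max(replicas[i] for i in targets)

    after = write(replicas, targets={0, 1, 2}, version=1, value="x")
    # Every read quorum of size R overlaps {0, 1, 2}, so every read sees version 1.
    for quorum in itertools.combinations(range(N), R):
        assert read(after, set(quorum)) == (1, "x")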


Chain replication. Writes go to the head, which orders all writes. When a write reaches the tail, it is implicitly committed at the rest of the chain. Reads go to the tail, which orders reads w.r.t. committed writes.
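A sketch of that flow (illustrative Python, not the CR paper's code): the head orders writes, each node forwards to its successor, and the tail both acknowledges writes (the commit point) and serves reads, so reads only ever see committed data.

    class ChainNode:
        def __init__(self):
            self.store = {}
            self.next = None        # successor in the chain (None for the tail)

        def handle_write(self, key, value):
            self.store[key] = value
            if self.next is not None:
                return self.next.handle_write(key, value)   # forward down the chain
            return "committed"      # reached the tail: the write is committed

    head, mid, tail = ChainNode(), ChainNode(), ChainNode()
    head.next, mid.next = mid, tail

    assert head.handle_write("a", 1) == "committed"   # client writes -> head
    assert tail.store["a"] == 1                       # client reads  -> tail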

Chain replication for read-heavy workloads (CRAQ). Goal: if all replicas have the same version, read from any one of them. Challenge: a replica needs to know that it has the correct version.

Chain replication for read-heavy workloads (CRAQ). Replicas maintain multiple versions of objects while they are dirty, i.e., while they contain uncommitted writes. Commit notifications are sent back up the chain after a write reaches the tail.

Chain replication for read-heavy workloads (CRAQ). A read of a dirty object must check with the tail for the proper version. This orders the read with respect to the global order, regardless of which replica handles it.
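A sketch of that rule (illustrative Python; class and field names are mine, not CRAQ's): each replica keeps the versions it has seen per object plus the highest version the tail has committed; a clean object is answered locally, while a dirty object triggers a version query to the tail.

    class CraqNode:
        def __init__(self, tail=None):
            self.versions = {}      # key -> list of (version_no, value)
            self.clean = {}         # key -> highest committed version_no
            self.tail = tail        # reference to the tail (None if this node is the tail)

        def apply_write(self, key, version, value):
            self.versions.setdefault(key, []).append((version, value))

        def mark_clean(self, key, version):
            self.clean[key] = version              # sent back up the chain after tail commit

        def read(self, key):
            latest_version, latest_value = self.versions[key][-1]
            if self.clean.get(key) == latest_version:
                return latest_value                      # clean: answer locally
            committed = (self.tail or self).clean[key]   # dirty: ask the tail
            return dict(self.versions[key])[committed]

    tail = CraqNode()
    node = CraqNode(tail=tail)
    for n in (node, tail):
        n.apply_write("a", 1, "old")
        n.mark_clean("a", 1)
    node.apply_write("a", 2, "new")                 # write still propagating: dirty here
    tail.apply_write("a", 2, "new"); tail.mark_clean("a", 2)
    assert node.read("a") == "new"                  # dirty object, so node checks with the tail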

Performance: CR vs. CRAQ. [Plot: read throughput (reads/s, 0-15,000) versus write rate (writes/s, 0-100) for CRAQ with 7 nodes, CRAQ with 3 nodes, and CR with 3 nodes; annotations mark roughly 7x, 3x, and 1x relative read throughput, respectively.] R. van Renesse and F. B. Schneider. Chain replication for supporting high throughput and availability. OSDI 2004. J. Terrace and M. Freedman. Object storage on CRAQ: High-throughput chain replication for read-mostly workloads. USENIX ATC 2009. 34