Conflict-free Replicated Data Types in Practice


Conflict-free Replicated Data Types in Practice
Georges Younes, Vitor Enes
Wednesday 11th January, 2017
HASLab/INESC TEC & University of Minho, InfoBlender

Motivation

Background

Background: CAP Theorem
Brewer's Conjecture:
2000: Eric Brewer, PODC Conference Keynote
2002: Seth Gilbert and Nancy Lynch, ACM SIGACT News 33(2)
Of the three properties of a shared-data system (data Consistency, system Availability, and tolerance to network Partitions) only two can be achieved at any given moment in time.

Background: CAP Theorem
[figure; source: Lourenço et al., Journal of Big Data (2015) 2:18, DOI 10.1186/s40537-015-0025-0]

The best thing about being me... There are so many me's. - Agent Smith

Background: Data Replication
Data replication: maintaining multiple copies of data on separate machines.
Improves availability: allows access when some replicas are unavailable.
Improves performance: reduced latency (users access nearby replicas) and increased throughput (multiple machines serve the data).
Pessimistic replication:
- Provides single-copy consistency
- Blocks access to a replica unless it is provably up to date
- Performs well in LANs (small latencies, uncommon failures), but not in wide-area networks
Optimistic replication:
- Provides eventual consistency
- Allows access to a replica without a priori synchronization
- Updates are propagated in the background; occasional conflicts are resolved later
- Offers many advantages over its pessimistic counterpart

Background: Eventual Consistency (EC)
[figures: a tweet's favorite count diverging across replicas]
At the same moment, two users can see the same tweet with a different number of favorites.

Conflict-free Replicated Data Types

Conflict-free Replicated Data Types (CRDTs)
Conflict-free: resolves conflicts automatically, converging to the same state
Replicated: multiple independent copies
Data Types: registers, counters, maps, sequences, sets, graphs, etc.
CRDT Flavors
There are two flavors of CRDTs:
Operation-based: broadcasts update operations to other replicas
State-based: propagates the local state to other replicas

Op-based CRDTs
[Figure 1: operation-based replication]
For each update operation:
Prepare: calculate the downstream (the effect observed locally)
Apply the effect locally
Disseminate (reliable causal broadcast) the calculated effect, to be applied on all other replicas
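
As a minimal sketch (not the talk's implementation), the prepare/apply/disseminate loop for a counter might look like this in Python; the Replica class and the direct peer.apply calls are illustrative stand-ins for a real reliable-causal-broadcast layer:

    class Replica:
        """Op-based counter replica; a sketch, not a production design."""

        def __init__(self, name):
            self.name = name
            self.state = 0
            self.peers = []

        def prepare(self, op):
            # Prepare: compute the downstream effect of the operation
            # (for a counter, simply +1 or -1).
            return 1 if op == "inc" else -1

        def update(self, op):
            effect = self.prepare(op)
            self.apply(effect)        # apply the effect locally
            for peer in self.peers:   # disseminate; stands in for causal bcast
                peer.apply(effect)

        def apply(self, effect):
            self.state += effect

    a, b = Replica("A"), Replica("B")
    a.peers, b.peers = [b], [a]
    a.update("inc")
    b.update("dec")
    assert a.state == b.state == 0    # both replicas converge to 0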

Op-based (Counter :: Z)
replica A:
S_A = 0
S_A = update(inc, S_A) = 0 + 1 = 1
m1 = inc; send_B(m1)
on receive(m2): S_A = update(dec, S_A) = 1 - 1 = 0
replica B:
S_B = 0
S_B = update(dec, S_B) = 0 - 1 = -1
m2 = dec; send_A(m2)
on receive(m1): S_B = update(inc, S_B) = -1 + 1 = 0
What if m1 is lost? What if m2 is duplicated?

How do we solve these problems?
TRCB: Tagged Reliable Causal Broadcast
- Tags each operation with a unique timestamp (a vector clock)
- Delivers operations respecting causal order
- Reliable broadcast: at-least-once delivery to prevent message loss
- Reliable broadcast: at-most-once delivery to prevent message duplication
Are all data types commutative?
Counters are commutative: inc; dec == dec; inc
Not all data types are (in fact, most are not).
A set S1 with add(element, Set) and rmv(element, Set) operations is not:
add(a, S1); rmv(a, S1); value(S1) = {}
rmv(a, S1); add(a, S1); value(S1) = {a}
How do we solve concurrent operations?
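
The set example above can be replayed with plain Python sets (not CRDTs) to see the non-commutativity directly:

    s1 = set()
    s1.add("a"); s1.discard("a")   # add then remove
    s2 = set()
    s2.discard("a"); s2.add("a")   # remove then add
    print(s1, s2)                  # set() vs {'a'}: same ops, different order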

POLog: Partially Ordered Log
[figures: building up a partially ordered log of timestamped operations, one step per slide]

POLog: Partially Ordered Log
Querying the Add-Wins Set (AWSet)
An element v is in the set if there is an add-v operation in the POLog that is not succeeded by a rmv-v operation:
v ∈ value ⇔ ∃(t, add v) ∈ POLog . ∄(t', rmv v) ∈ POLog . t < t'
So does the POLog keep growing?
In the op-based CRDT model, the POLog keeps growing.
The pure op-based CRDT model was introduced to:
- Apply GC on the POLog
- Reduce the message size
- Provide a more generic API
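
A sketch of that AWSet query in Python, assuming POLog entries of the form (timestamp, op, element) with vector clocks as dicts; the names lt and value are illustrative:

    def lt(t1, t2):
        # Vector-clock happens-before: t1 < t2 (pointwise <=, and not equal).
        return all(c <= t2.get(k, 0) for k, c in t1.items()) and t1 != t2

    def value(polog):
        # v is in the set iff some add of v is not succeeded by a rmv of v.
        return {v for (t, op, v) in polog
                if op == "add"
                and not any(op2 == "rmv" and v2 == v and lt(t, t2)
                            for (t2, op2, v2) in polog)}

    polog = [
        ({"A": 1}, "add", "x"),           # add x at A
        ({"A": 1, "B": 1}, "rmv", "x"),   # rmv x, causally after the add
        ({"B": 2}, "add", "y"),           # add y, concurrent with the rmv
    ]
    print(value(polog))                   # {'y'}: x's add is followed by a rmv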

Pure op-based in Redis
[Figure 2: pure op-based architecture in Redis]

Operation-based (GCounter :: N)
A: 1 -inc-> 2 -inc-> 3, sending m1 = inc and m2 = inc
B: 1 -> 2 -> 3 (applies the received operations)
State-based (GCounter :: I → N)
A: {B:1} -inc-> {A:1, B:1} -inc-> {A:2, B:1}, sending m1 = {A:1, B:1} and m2 = {A:2, B:1}
B: {B:1} -> {A:1, B:1} -> {A:2, B:1} (joins the received states)
What if m1 is lost? (monotonicity)
What if m2 is duplicated? (idempotence)
What if m1 and m2 are reordered? (commutativity)
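
A sketch of the state-based GCounter in Python, with the state as a map from replica ids to naturals and the join as a pointwise maximum (inc, join, and value are illustrative names); the answers to the three questions fall out of the join's properties:

    def inc(state, replica):
        out = dict(state)
        out[replica] = out.get(replica, 0) + 1
        return out

    def join(s, t):
        # Pointwise maximum: idempotent, commutative, associative.
        return {k: max(s.get(k, 0), t.get(k, 0)) for k in s.keys() | t.keys()}

    def value(state):
        return sum(state.values())

    a = inc(inc({"B": 1}, "A"), "A")  # A's state: {A: 2, B: 1}
    b = {"B": 1}
    b = join(b, a)                    # safe even if earlier messages were lost
    assert join(b, a) == b            # duplicated message: idempotence
    assert join(a, b) == join(b, a)   # reordered messages: commutativity
    print(value(b))                   # 3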

Why are state-based CRDTs so cool?
State-based CRDT
A state-based CRDT is a join-semilattice (S, ⊑, ⊔), where S is a poset, ⊑ its partial order, and ⊔ a binary join operator that derives the least upper bound of every two elements of S.
∀ s, t, u ∈ S:
s ⊔ s = s (idempotence)
s ⊔ t = t ⊔ s (commutativity)
s ⊔ (t ⊔ u) = (s ⊔ t) ⊔ u (associativity)
The downside: full state transmission is expensive ($$).

Avoiding full state transmission
How can we avoid replica A sending its full state to replica B?
Two fundamental problems:
1. A knows something about B: B has S_old; A has S_new and knows that B has S_old. Goal: compute a delta d of minimum size such that S_new = S_old ⊔ d.
2. A knows nothing about B: B has S_old; A has S_new. Goal: a protocol that minimizes communication cost.

Solving the first problem

Solving the first problem: Delta-CRDTs
State-based (GSet :: P(E)):
A: {} -add x-> {x} -add y-> {x, y}, shipping the full states {x} and then {x, y} to B
B: {} -> {x} -> {x, y} -add z-> {x, y, z}, shipping the full state {x, y, z} back to A
Delta-state-based:
A: {} -add x-> {x} -add y-> {x, y}, shipping only the deltas {x} and then {y} to B
B: {} -> {x} -> {x, y} -add z-> {x, y, z}, shipping only the delta {z} back to A
Both replicas end at {x, y, z}.
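
A sketch of the delta idea for a GSet in Python: add returns a small delta state rather than mutating in place, the full state is the join (set union) of all deltas, and only deltas cross the network (names are illustrative):

    def add(element):
        return {element}          # the delta: a singleton state

    def join(s, delta):
        return s | delta          # set union is the join for GSet

    a, b = set(), set()
    d1 = add("x"); a = join(a, d1)
    d2 = add("y"); a = join(a, d2)
    b = join(join(b, d1), d2)     # A ships {x} and {y}, never {x, y}
    d3 = add("z"); b = join(b, d3)
    a = join(a, d3)               # B ships only {z} back
    assert a == b == {"x", "y", "z"}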

Solving the second problem

Solving the second problem: Join Decompositions
State-driven:
A: {a} -add x,y-> {a, x, y}
B: {a} -add z-> {a, z}, and ships its full state {a, z} to A
A joins it and replies with only the missing delta {x, y}; both end at {a, x, y, z}.
Digest-driven:
B ships a digest d_B of its state instead of the full state
A uses d_B to infer the delta and replies with ({x, y}, plus a digest of its own); B then ships back only {z}
Both end at {a, x, y, z}.
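
A sketch of the state-driven direction in Python, under the assumption of GSet states: decompose the local state into its join-irreducible parts (the singletons) and ship only the parts not already included in the remote state; decompose and min_delta are illustrative names:

    def decompose(state):
        return [{e} for e in state]      # join-irreducibles of a GSet

    def min_delta(local, remote):
        # Join of the irreducibles not already below (subset of) `remote`.
        delta = set()
        for part in decompose(local):
            if not part <= remote:
                delta |= part
        return delta

    a = {"a", "x", "y"}                  # A after add x,y
    b = {"a", "z"}                       # B after add z
    d = min_delta(a, b)                  # B sent its state; A computes {x, y}
    b |= d
    a |= min_delta(b, a)                 # the symmetric exchange ships only {z}
    assert a == b == {"a", "x", "y", "z"}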

Who uses that?


Lasp
Lasp is a language for distributed, eventually consistent computations.
CRDTs: from basho/riak_dt to lasp/types.
lasp/types:
- State-based CRDTs
- Delta-based CRDTs
- Pure op-based CRDTs

GCounter state transmission
[plot: Advertisement Impression Counter; y axis: GB transmitted (0-45); x axis: number of clients (128, 256, 512, 1024); bars compare state vs. delta transmission]

LDB & LSim
LDB: a benchmarking platform for CRDTs (lasp/types + replication), supporting:
- State-based CRDTs
- Delta-based CRDTs
- Delta-based CRDTs + join decompositions
- Pure op-based CRDTs
LSim: LDB + peer services + workloads; experiments run in DC/OS (Apache Mesos + Marathon)

GSet state transmission
[plot: y axis: MB transmitted (0-0.35); x axis: time in seconds (0-200); lines: State vs. Decompositions over Ring, HyParView, and Erdős-Rényi topologies]

Challenges
1. Log aggregation
2. Time: time-based graphs (e.g., x axis is time); total-effort graphs demand equal total time across all runs
3. Bugs: LDFI, IronFleet, Jepsen, ...
4. $$ (on-demand vs. spot instances)

Questions? bit.ly/crdts-infoblender