(It's the invariants, stupid)

Consistency in three dimensions (It's the invariants, stupid)
Marc Shapiro, Inria & Sorbonne-Universités UPMC-LIP6

Cloud to the edge
- Social, web, e-commerce: shared mutable data
- Scalability → replication → consistency issues

Why consistency models
- SER, SI, PSI, US, NMSI, EC
- More parallelism, less synchronisation
- Implementation, execution freedom
- Availability, performance
- Anomalies, hard to understand
- Fundamental trade-off: availability, responsiveness vs. programmability
- What is right for my application?

Why consistency models
[Figure: termination latency of transactions (ms), disaster-tolerant deployment, for workloads with 90% and 70% read-only transactions. Protocols compared: Serrano-SI, P-Store-SER, GMU-US, Jessy2pc-NMSI, RC, Walter-PSI, SDUR-SER. Credit: Masoud Saeida Ardekani]

Two parts
1. Understanding consistency
   - Organising the consistency design space into independent dimensions
   - Consistency and invariants
2. Application-specific consistency
   - Minimise synchronisation
   - While maintaining application correctness

Atul Adya's hierarchy
- One end of the scale: more invariants, more predictable, low performance
- Other end of the scale: hard to program, more behaviours, high performance
- Levels shown: Strict Serializability (PL-SS), Full Serializability (PL-3), Snapshot Isolation (PL-SI), Update Serializability (PL-3U), Forward Consistent View (PL-FCV), Repeatable Read (PL-2.99), Monotonic Snapshot Reads (PL-MSR), Consistent View (PL-2+), Cursor Stability (PL-CS), PL-2, PL-1, Monotonic View (PL-2L)
- One dimension is not enough

Operation
- Participants: client, origin replica, other replica
- u: state → (retval, (state → state))
- Prepare (@origin); deliver
- Read one, write all (ROWA)
- Deferred-update replication (DUR)

Application invariants
- Convergence? Safety?
- South ∪ Boat ∪ North = { sheep, dog, wolf }
- Operations: carrynorth(s), carrysouth(s)
- ∀S ∈ {South, Boat, North} : sheep ∈ S ∧ wolf ∈ S ⟹ dog ∈ S
- Multi-master
- Strong: total order, identical state
- Weak: concurrent, interleaving, no global state
- Hard to tease invariants out
- Silent invariants

Three classes of invariant / of protocol
- Gen1: constrain the value of an object. Protocol: total order of operations.
- EQ: state equivalence between objects. Protocol: units.
- PO: ordering between operations. Protocol: causality.
- Mostly orthogonal (but not all combinations make sense)

[Figure: three axes, Gen1 / Total order, EQ / Units, PO / Causal. Points shown: Strict Serialisability, Snapshot Isolation, Eventual Consistency, HAT.]
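The river-crossing invariants on this page can be checked mechanically. The following is a minimal Python sketch, not code from the talk: the state representation (a dict from place to set of animals) and the helper `carry` are assumptions made for illustration, standing in for the slide's carrynorth/carrysouth operations.

```python
ANIMALS = {"sheep", "dog", "wolf"}

def invariant(state):
    """state maps each place (South, Boat, North) to the set of animals there."""
    # EQ-style part: the three places together hold each animal exactly once.
    union = set().union(*state.values())
    total = sum(len(s) for s in state.values())
    if union != ANIMALS or total != len(ANIMALS):
        return False
    # PO-style part: wherever sheep and wolf are together, the dog is there too.
    return all(not {"sheep", "wolf"} <= s or "dog" in s for s in state.values())

def carry(state, animal, src, dst):
    """Hypothetical move operation: returns the new state, or None if the
    move is impossible or would break the invariant."""
    if animal not in state[src]:
        return None
    new = {place: set(animals) for place, animals in state.items()}
    new[src].remove(animal)
    new[dst].add(animal)
    return new if invariant(new) else None

start = {"South": {"sheep", "dog", "wolf"}, "Boat": set(), "North": set()}
```

Starting from `start`, carrying the dog first is rejected because it leaves the sheep alone with the wolf, whereas carrying the sheep first is accepted.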

Gen1 invariants
- Gen1: ordered updates
- Inv = 0 ≤ x
- u: x := x − 1
- { Inv ∧ 1 ≤ x } u { Inv }
- Predict that Inv will be true after u:
  - Sequential: weakest precondition
  - Generalises to bounded concurrency
  - Unbounded concurrency: no sufficient precondition
- Invariant is not stable
  - Limit concurrency: escrow
  - No concurrency: ordered updates

Generic invariants: total order of updates
- No Lost Updates
- Will all replicas observe updates in the same total order?
  - No: full parallelism. Local: lost work.
  - Yes: consensus ("pick a number"). Global synchronisation, unavailable.
- If yes, will reads appear in the same order?
  - Ex.: Serialisability, Linearisability
  - Not: EC, Causal Consistency

Type EQ invariants
- EQ: indivisible unit
- A = B
- x.friendof(y) ⟺ y.friendof(x)
- x + y = constant
- South ∪ Boat ∪ North = { sheep, dog, wolf }
- Joint update to two objects
- Atomicity (all-or-nothing) property of transactions
- Protocol: single update message
- Asynchronous

+ Unit: group operations ("transaction")
- Indivisible, atomic, all-or-nothing
- Deliver effectors together
- Snapshot: all reads from the same set of effectors
  - Plain: cached reads (repeatable)
  - Consistent: according to Causal / Total order constraints
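The "limit concurrency: escrow" idea on this page can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the protocol of any particular system: the class `EscrowCounter`, its methods, and the replica names are hypothetical, and a real implementation would replicate the rights table instead of keeping it in one process. The counter's value is split into per-replica rights; a replica may decrement locally only within its own share, so the invariant 0 ≤ x holds without global synchronisation.

```python
class EscrowCounter:
    """Single-process simulation of an escrow-protected counter (invariant: value >= 0)."""

    def __init__(self, initial, replicas):
        if initial < 0 or not replicas:
            raise ValueError("need a non-negative value and at least one replica")
        # Split the initial value into per-replica decrement rights.
        base, extra = divmod(initial, len(replicas))
        self.rights = {r: base + (1 if i < extra else 0) for i, r in enumerate(replicas)}

    def decrement(self, replica, n=1):
        # Local check only: the replica spends its own rights.
        if n < 0 or self.rights[replica] < n:
            return False
        self.rights[replica] -= n
        return True

    def transfer(self, src, dst, n):
        # Moving rights between replicas is the only step that needs communication.
        if n < 0 or self.rights[src] < n:
            return False
        self.rights[src] -= n
        self.rights[dst] += n
        return True

    def value(self):
        return sum(self.rights.values())
```

A replica that has exhausted its share must obtain rights from another replica (`transfer`) before it can decrement again, which is where the bounded amount of coordination comes in.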

EQ + Gen1
- + Unit: group operations ("transaction")

Type PO invariants
- employee.salary ≤ employee.manager.salary
- sheep ∈ S ∧ wolf ∈ S ⟹ dog ∈ S
- Referential integrity: inode references disk block
- Security: access(u, p) ⟹ ACL(u, p)
- Demarcation protocol: decrease LHS / increase RHS, or
  1. increase RHS by c
  2. increase LHS by c′ ≤ c
- Causal delivery

Causal-order delivery
- Example: u = "don't show photos to Bob", v = "post photo"
- If v is delivered before u at Bob's replica, Bob sees the photo, violating access(Bob, photo) ⟹ ACL(Bob, photo)
- v observed effects of u at origin ⟹ v to be delivered after u
- v ∈ visible(i) ∧ u → v ⟹ u ∈ visible(i)
- If u → v, then observing both u and v implies the order u; ...; v
- Asynchronous: doesn't slow down sender
- Cost: metadata + transitive closure
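Causal-order delivery as described on this page (an update v that observed u at its origin is delivered after u everywhere) is commonly implemented with vector clocks. The sketch below uses that standard technique as an assumption; the talk does not say which metadata scheme it has in mind, and the `Replica` class and the replica names are hypothetical.

```python
class Replica:
    """Buffers incoming updates until their causal dependencies are delivered."""

    def __init__(self, name, peers):
        self.name = name
        self.clock = {p: 0 for p in peers}  # number of updates delivered per origin
        self.pending = []                   # received but not yet deliverable
        self.delivered = []                 # payloads in local delivery order

    def send(self, payload):
        # The sender is never delayed: it stamps the update and continues.
        self.clock[self.name] += 1
        self.delivered.append(payload)
        return (self.name, dict(self.clock), payload)

    def _ready(self, sender, stamp):
        # Next update from its sender, and nothing it depends on is missing.
        return all(
            stamp[p] == self.clock[p] + 1 if p == sender else stamp[p] <= self.clock[p]
            for p in self.clock
        )

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.pending):
                sender, stamp, payload = m
                if self._ready(sender, stamp):
                    self.clock[sender] = stamp[sender]
                    self.delivered.append(payload)
                    self.pending.remove(m)
                    progress = True

peers = ["alice", "carol", "bob"]
alice, carol, bob = (Replica(n, peers) for n in peers)
u = alice.send("hide photos from Bob")
carol.receive(u)
v = carol.send("post photo")   # v causally depends on u
bob.receive(v)                 # arrives first: buffered
bob.receive(u)                 # u is delivered, then v
```

In this run Bob's replica holds v back until u has arrived, so the access-control update always takes effect before the photo becomes visible. The per-update vector is the metadata cost mentioned on the slide.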

PO: Causal-order visibility
- Causality
- Which writes are visible to a read
- Delivery of writes
- Transitive closure property
- Metadata
- Sender not delayed ⟹ available
- Ex.: SI, Causal Consistency
- Not: Serialisability

[Figures: the three-dimensional design space, built up step by step. Recoverable labels: total order of updates; Monotonic Reads + Read My Writes; total causal order. Models placed so far: EC, Lin.]

[Figures: the same design space, with Lin, SER, SSER, SI and PSI added in turn.]

[Figure: the same design space, with HAT added.]

Summary table
Column labels as they appear on the slide: Total (writes, reads), XXX, unit, orthogonal, causal
- Eventual C.: N N N
- Causal+: N Y N
- Cmtd: Y N N
- CausalTxn: Y Y N
- Total Centrlisd writes: N N Y N
- SC, Lineariz.: N Y Y Y Y
- Serlzbty: Y N Y Y N
- Serializability: Y N Y Y Y
- PSI
- Snapshot Isolatn: Y Y Y Y N
- Strict Serlzbty: Y Y N
- Note on the slide: this table does not take reads into account (read = writes)

Explicit consistency
- Hybrid, opportunistic consistency model
- Programme(r) declares application invariants
- System ensures that every state transition preserves the invariant
- Synchronise only when strictly necessary for the invariant
- Improve performance, availability

[Figure: two replicas executing accrue 5% and debit(100) concurrently.]
- Assume: causal ordering
- Convergent: do replicas that delivered the same updates have the same state?
- Safe: are invariants preserved?
- Sequential: each operation in isolation maintains the invariant. Does the execution as a whole maintain the invariant?

Specification
- Invariant: balance ≥ 0
- Precondition: {balance ≥ amount}
- Effector: debit!(amount) = {balance -= amount}
- Tokens: Token = {τ, ...}, with a conflict relation ⋈ ⊆ Token × Token
- u: state → (retval, (state → state))

Verification problem
- Given the specification of operations (preconditions, effects, tokens) and the conflict relation on tokens,
- does any interleaving of the operations always preserve a given integrity invariant?

CISE Proof Tool: analysis and co-design
[Figure: Tokens, Semantics and Invariant are fed to CISE, which answers either OK or Counter-example.]
- CISE proof obligations: O(n²), no combinatorial explosion
- Manual or automated
- Tool based on the Z3 SMT solver
- Discharges verification conditions
- OK = Safe
- Counter-example? Design decision: either
  - Strengthen precondition
  - Weaken effect
  - Weaken invariant
  - or strengthen synchronisation (tokens)

CISE Proof Obligations
1. Sequential correctness: any single operation maintains the invariant
2. Convergence: if concurrent, effectors commute
3. Precondition stability: every precondition is stable under every operation, if concurrent
If satisfied: invariant is guaranteed

Simple example: bank account
- Operations: credit(amt), debit(amt), accrue()
- Invariant: balance ≥ 0
- Start with weak specification
- Rule 1 ⟹ strengthen precondition for debit
- Rule 2 ⟹ accrue adds
- Rule 3 ⟹ debit ∥ debit unsafe, fixed with concurrency control

Rule 1: sequential correctness
- invariant "balance >= 0"
- { amt ≥ 0 } deposit! = {balance += amt}
- { amt ≥ 0 } debit! = {balance -= amt}
- No counter-example for deposit, accrue
- Counter-example for debit: if balance is initially zero, debit will make it negative
- Strengthened: { amt ≤ balance } debit! = {balance -= amt}

Rule 3: precondition stability, debit ∥ debit
- Initially balance = 1 at both replicas
- Each replica checks debitpre {1 ≤ 1} and executes debit(1)
- Once one debit(1) has been applied, the other's precondition reads debitpre {1 ≤ 0}?, which is false: the precondition is not stable under a concurrent debit
- Fix: concurrency control
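The three CISE proof obligations can be tried out on the bank account by exhaustive enumeration over a small domain. This is only an illustrative sketch: the CISE tool discharges the obligations with Z3 over unbounded values, whereas the code below swaps in brute-force checking over balances and amounts 0..3. The table `OPS` and the function names are assumptions, and accrue is left out for brevity.

```python
from itertools import product

def invariant(balance):
    return balance >= 0

# Operation name -> (precondition(balance, amt), effector(balance, amt)).
# debit already carries the strengthened precondition amt <= balance.
OPS = {
    "deposit": (lambda b, a: a >= 0, lambda b, a: b + a),
    "debit": (lambda b, a: 0 <= a <= b, lambda b, a: b - a),
}

BALANCES = range(0, 4)  # small finite domain, for illustration only
AMOUNTS = range(0, 4)

def sequentially_correct(name):
    """Obligation 1: from a valid state, an enabled operation keeps the invariant."""
    pre, eff = OPS[name]
    return all(
        invariant(eff(b, a))
        for b, a in product(BALANCES, AMOUNTS)
        if invariant(b) and pre(b, a)
    )

def commute(n1, n2):
    """Obligation 2: the two effectors give the same state in either order."""
    e1, e2 = OPS[n1][1], OPS[n2][1]
    return all(
        e2(e1(b, a1), a2) == e1(e2(b, a2), a1)
        for b, a1, a2 in product(BALANCES, AMOUNTS, AMOUNTS)
    )

def unstable_witness(n1, n2):
    """Obligation 3: return (balance, amt1, amt2) where both operations are
    enabled but applying n2 falsifies n1's precondition, or None if stable."""
    pre1 = OPS[n1][0]
    pre2, eff2 = OPS[n2]
    for b, a1, a2 in product(BALANCES, AMOUNTS, AMOUNTS):
        if pre1(b, a1) and pre2(b, a2) and not pre1(eff2(b, a2), a1):
            return (b, a1, a2)
    return None
```

With these definitions, `unstable_witness("debit", "debit")` finds the scenario from the slide (balance 1, two concurrent debits of 1), while debit is stable under a concurrent deposit. Only pairs of operations are examined, which is the O(n²) argument made in the talk.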

Advanced example: file system
- Operations: mkdir, rmdir, mv, update, etc.
- Invariant: Tree
- Rule 1 ⟹ precondition on mv: may not move a node under itself
- Rule 2 ⟹ use CRDTs for update ∥ update
- Rule 3 ⟹ mv ∥ mv precondition unstable

[Figure: directories A and B under root. One replica executes mv /A, /B while another executes mv /B, /A. Each checks mvpre { B/.../A } locally, but after the concurrent move the precondition no longer holds.]
- Fix: concurrency control

Applying the logic
- Only O(n²): no need to consider all possible interleavings
- We use a tool
- You can apply the same logic manually
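The mv ∥ mv problem can be reproduced with a parent-pointer model of the directory tree. The sketch below is an assumption-laden illustration, not the file system design from the talk: the dictionary representation and the functions `is_tree`, `mv_pre` and `mv_effect` are hypothetical names.

```python
def is_tree(parent, root="root"):
    """parent maps each non-root node to its parent.
    The structure is a tree iff every node reaches the root without a cycle."""
    for node in parent:
        seen = set()
        while node != root:
            if node in seen or node not in parent:
                return False
            seen.add(node)
            node = parent[node]
    return True

def is_ancestor(parent, anc, node, root="root"):
    """True if anc lies on the path from node up to the root (tree states only)."""
    while node != root:
        node = parent[node]
        if node == anc:
            return True
    return False

def mv_pre(parent, node, new_parent):
    # Rule 1: a node may not be moved under itself or one of its descendants.
    return node != new_parent and not is_ancestor(parent, node, new_parent)

def mv_effect(parent, node, new_parent):
    new = dict(parent)
    new[node] = new_parent
    return new

initial = {"A": "root", "B": "root"}

# Each replica checks its precondition against the same initial state.
ok_at_replica_1 = mv_pre(initial, "A", "B")   # mv /A under /B
ok_at_replica_2 = mv_pre(initial, "B", "A")   # mv /B under /A

# Both effectors are then applied everywhere.
merged = mv_effect(mv_effect(initial, "A", "B"), "B", "A")
```

Each move is fine in isolation, yet `merged` contains the cycle A → B → A and is no longer a tree. Re-evaluating `mv_pre` for the second move after the first has been applied returns False, which is the unstable precondition that motivates adding concurrency control for mv.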

Deployment
- Data centres deployed in AWS: 3 regions (EU, US-EAST, US-WEST)
- N app-servers connect to local DBs
- Clients submit operations to the app-server in a closed loop
- Compare performance:
  - Causal Consistency
  - Strong Consistency (writes to single server)
  - Red-Blue Consistency (Causal + writes to single server)
  - Explicit Consistency (Causal + reservations)

Ad-service
[Figure: ad-service benchmark; ads; 100% I-offenders.]

Tournament
[Figure: tournament benchmark, latency; 82% reads; 4% safe writes; 14% I-offenders. Detailed operations latency.]

Summary
- Not all concurrency leads to invariant violation
- Explicit consistency + CISE ⟹ identify just enough coordination
- Indigo: enforces explicit consistency

Summary
- Deconstructing consistency
- CRDTs provide you consistency
- Mechanisms for consistency
- Explicit consistency for "just right" consistency

Acknowledgments
- SyncFree European FP7 project: Carlos Baquero, Valter Balegas, Annette Bieniusa, Russell Brown, Sérgio Duarte, Nuno Preguiça, etc.
- Masoud Saeida-Ardekani, Marek Zawirski
- Alexey Gotsman, Hongseok Yang, Carla Ferreira, Mahsa Najafzadeh


Coordination-Free Computations. Christopher Meiklejohn

Coordination-Free Computations. Christopher Meiklejohn Coordination-Free Computations Christopher Meiklejohn LASP DISTRIBUTED, EVENTUALLY CONSISTENT COMPUTATIONS CHRISTOPHER MEIKLEJOHN (BASHO TECHNOLOGIES, INC.) PETER VAN ROY (UNIVERSITÉ CATHOLIQUE DE LOUVAIN)

More information

Transaction Management Overview. Transactions. Concurrency in a DBMS. Chapter 16

Transaction Management Overview. Transactions. Concurrency in a DBMS. Chapter 16 Transaction Management Overview Chapter 16 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Transactions Concurrent execution of user programs is essential for good DBMS performance. Because

More information

Module 7 File Systems & Replication CS755! 7-1!

Module 7 File Systems & Replication CS755! 7-1! Module 7 File Systems & Replication CS755! 7-1! Distributed File Systems CS755! 7-2! File Systems File system! Operating System interface to disk storage! File system attributes (Metadata)! File length!

More information

Mutual consistency, what for? Replication. Data replication, consistency models & protocols. Difficulties. Execution Model.

Mutual consistency, what for? Replication. Data replication, consistency models & protocols. Difficulties. Execution Model. Replication Data replication, consistency models & protocols C. L. Roncancio - S. Drapeau Grenoble INP Ensimag / LIG - Obeo 1 Data and process Focus on data: several physical of one logical object What

More information

Introduction to Data Management. Lecture #24 (Transactions)

Introduction to Data Management. Lecture #24 (Transactions) Introduction to Data Management Lecture #24 (Transactions) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW and exam info:

More information

Replication and Consistency. Fall 2010 Jussi Kangasharju

Replication and Consistency. Fall 2010 Jussi Kangasharju Replication and Consistency Fall 2010 Jussi Kangasharju Chapter Outline Replication Consistency models Distribution protocols Consistency protocols 2 Data Replication user B user C user A object object

More information

Introduction to Data Management. Lecture #18 (Transactions)

Introduction to Data Management. Lecture #18 (Transactions) Introduction to Data Management Lecture #18 (Transactions) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Project info: Part

More information

Transaction Management: Introduction (Chap. 16)

Transaction Management: Introduction (Chap. 16) Transaction Management: Introduction (Chap. 16) CS634 Class 14 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke What are Transactions? So far, we looked at individual queries;

More information

Changing Requirements for Distributed File Systems in Cloud Storage

Changing Requirements for Distributed File Systems in Cloud Storage Changing Requirements for Distributed File Systems in Cloud Storage Wesley Leggette Cleversafe Presentation Agenda r About Cleversafe r Scalability, our core driver r Object storage as basis for filesystem

More information

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17 Synchronization Part 2 REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17 1 Outline Part 2! Clock Synchronization! Clock Synchronization Algorithms!

More information

Transaction Chopping for Parallel Snapshot Isolation

Transaction Chopping for Parallel Snapshot Isolation Transaction Chopping for Parallel Snapshot Isolation Andrea Cerone 1, Alexey Gotsman 1, and Hongseok Yang 2 1 IMDEA Software Institute 2 University of Oxford Abstract. Modern Internet services often achieve

More information

Synchronization. Chapter 5

Synchronization. Chapter 5 Synchronization Chapter 5 Clock Synchronization In a centralized system time is unambiguous. (each computer has its own clock) In a distributed system achieving agreement on time is not trivial. (it is

More information

Chapter 11 - Data Replication Middleware

Chapter 11 - Data Replication Middleware Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 11 - Data Replication Middleware Motivation Replication: controlled

More information

Causal Consistency. CS 240: Computing Systems and Concurrency Lecture 16. Marco Canini

Causal Consistency. CS 240: Computing Systems and Concurrency Lecture 16. Marco Canini Causal Consistency CS 240: Computing Systems and Concurrency Lecture 16 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency models Linearizability

More information

Transactional storage for geo-replicated systems

Transactional storage for geo-replicated systems Transactional storage for geo-replicated systems Yair Sovran Russell Power Marcos K. Aguilera Jinyang Li New York University Microsoft Research Silicon Valley ABSTRACT We describe the design and implementation

More information

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo Document Sub Title Yotpo Technical Overview 07/18/2016 2015 Yotpo Contents Introduction... 3 Yotpo Architecture... 4 Yotpo Back Office (or B2B)... 4 Yotpo On-Site Presence... 4 Technologies... 5 Real-Time

More information

SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine

SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine Marek Zawirski, Annette Bieniusa, Valter Balegas, Sérgio Duarte, Carlos Baquero, Marc Shapiro, Nuno Preguiça To cite

More information

Consistency and Replication. Why replicate?

Consistency and Replication. Why replicate? Consistency and Replication Today: Consistency models Data-centric consistency models Client-centric consistency models Lecture 15, page 1 Why replicate? Data replication versus compute replication Data

More information

Concurrency Control - Formal Foundations

Concurrency Control - Formal Foundations Concurrency Control - Formal Foundations 1 Last time Intro to ACID transactions Focus on Isolation Every transaction has the illusion of having the DB to itself Isolation anomalies bad things that can

More information

GlobalFS: A Strongly Consistent Multi-Site Filesystem

GlobalFS: A Strongly Consistent Multi-Site Filesystem GlobalFS: A Strongly Consistent Multi-Site Filesystem Leandro Pacheco Raluca Halalai Valerio Schiavoni Fernando Pedone Etienne Rivière Pascal Felber RainbowFS Workshop May 3rd, 2017 Distributed applications

More information

Consumer-view of consistency properties: definition, measurement, and exploitation

Consumer-view of consistency properties: definition, measurement, and exploitation Consumer-view of consistency properties: definition, measurement, and exploitation Alan Fekete (University of Sydney) Alan.Fekete@sydney.edu.au PaPoC, London, April 2016 Limitations of this talk Discuss

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 20: Introduction to Transactions CSE 414 - Spring 2017 1 Announcements HW6 due on Wednesday WQ6 available for one more day WQ7 (last one!) due on Sunday CSE 414 - Spring

More information

PRIMARY-BACKUP REPLICATION

PRIMARY-BACKUP REPLICATION PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons

More information

CAP Theorem. March 26, Thanks to Arvind K., Dong W., and Mihir N. for slides.

CAP Theorem. March 26, Thanks to Arvind K., Dong W., and Mihir N. for slides. C A CAP Theorem P March 26, 2018 Thanks to Arvind K., Dong W., and Mihir N. for slides. CAP Theorem It is impossible for a web service to provide these three guarantees at the same time (pick 2 of 3):

More information

INF-5360 Presentation

INF-5360 Presentation INF-5360 Presentation Optimistic Replication Ali Ahmad April 29, 2013 Structure of presentation Pessimistic and optimistic replication Elements of Optimistic replication Eventual consistency Scheduling

More information

Transaction Management: Concurrency Control, part 2

Transaction Management: Concurrency Control, part 2 Transaction Management: Concurrency Control, part 2 CS634 Class 16 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Locking for B+ Trees Naïve solution Ignore tree structure,

More information

Locking for B+ Trees. Transaction Management: Concurrency Control, part 2. Locking for B+ Trees (contd.) Locking vs. Latching

Locking for B+ Trees. Transaction Management: Concurrency Control, part 2. Locking for B+ Trees (contd.) Locking vs. Latching Locking for B+ Trees Transaction Management: Concurrency Control, part 2 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke CS634 Class 16 Naïve solution Ignore tree structure,

More information

Overview. Introduction to Transaction Management ACID. Transactions

Overview. Introduction to Transaction Management ACID. Transactions Introduction to Transaction Management UVic C SC 370 Dr. Daniel M. German Department of Computer Science Overview What is a transaction? What properties transactions have? Why do we want to interleave

More information

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to

More information

Concurrency Control & Recovery

Concurrency Control & Recovery Transaction Management Overview CS 186, Fall 2002, Lecture 23 R & G Chapter 18 There are three side effects of acid. Enhanced long term memory, decreased short term memory, and I forget the third. - Timothy

More information

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems

Consistency. CS 475, Spring 2018 Concurrent & Distributed Systems Consistency CS 475, Spring 2018 Concurrent & Distributed Systems Review: 2PC, Timeouts when Coordinator crashes What if the bank doesn t hear back from coordinator? If bank voted no, it s OK to abort If

More information

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Synchronization Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University Synchronization Part II CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Election Algorithms Issue: Many distributed algorithms require that one process act as a coordinator (initiator, etc).

More information

Strong Consistency & CAP Theorem

Strong Consistency & CAP Theorem Strong Consistency & CAP Theorem CS 240: Computing Systems and Concurrency Lecture 15 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency models

More information

Transaction Management Overview

Transaction Management Overview Transaction Management Overview Chapter 16 CSE 4411: Database Management Systems 1 Transactions Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses are frequent,

More information

Making Consistency More Consistent: A Unified Model for Coherence, Consistency and Isolation

Making Consistency More Consistent: A Unified Model for Coherence, Consistency and Isolation Making Consistency More Consistent: A Unified Model for Coherence, Consistency and Isolation ABSTRACT Adriana Szekeres University of Washington aaasz@cs.washington.edu Ordering guarantees are often defined

More information

CSE 344 MARCH 5 TH TRANSACTIONS

CSE 344 MARCH 5 TH TRANSACTIONS CSE 344 MARCH 5 TH TRANSACTIONS ADMINISTRIVIA OQ6 Out 6 questions Due next Wednesday, 11:00pm HW7 Shortened Parts 1 and 2 -- other material candidates for short answer, go over in section Course evaluations

More information

Write Fast, Read in the Past: Causal Consistency for Client-side Applications

Write Fast, Read in the Past: Causal Consistency for Client-side Applications Write Fast, Read in the Past: Causal Consistency for Client-side Applications Marek Zawirski, Nuno Preguiça, Sérgio Duarte, Annette Bieniusa, Valter Balegas, Marc Shapiro To cite this version: Marek Zawirski,

More information