Conflict-free Replicated Data Types (CRDTs)


Conflict-free Replicated Data Types (CRDTs) for collaborative environments
Marc Shapiro, INRIA & LIP6
Nuno Preguiça, U. Nova de Lisboa
Carlos Baquero, U. Minho
Marek Zawirski, INRIA & UPMC

Conflict-free objects for large-scale distribution
Shared mutable data: read, replicate; what about updates?
A novel, principled approach: conflict-free objects
Large, dynamic graph; incremental, parallel, asynchronous updates and processing
Can we design useful object types without any synchronisation whatsoever?
Can we build practical systems from such objects?

Replication for beginners

Replicated data
Share data: replicate at many locations
Performance: local reads
Availability: immune from network failure
Fault tolerance: replicate computation
Scalability: load balancing
Updates: push to all replicas; concurrent updates conflict
Consistency? Fault tolerance and parallelism too?

Strong consistency
Preclude conflicts: all replicas execute updates in the same total order
Works for any deterministic object
Simultaneous N-way agreement: consensus
Serialisation bottleneck; tolerates < n/2 faults
Very general and correct, but doesn't scale

Eventual consistency
Update locally and propagate: no foreground synchronisation, but tentative state is exposed
Eventual, reliable delivery
On conflict: arbitrate and roll back
Availability ++, parallelism ++, latency --, complexity ++
Consensus is moved to the background

Strong eventual consistency
Available and responsive; more parallelism; no conflicts, no rollback
Update locally and propagate: no synchronisation; intermediate state is exposed
Eventual, reliable delivery
No conflict: deterministic outcome of concurrent updates
No consensus needed: tolerates up to n-1 faults
Not universal; solves the CAP problem

The challenge: what interesting objects can we design with no synchronisation whatsoever?

Portfolio of CRDTs
- Register: Last-Writer-Wins, Multi-Value
- Set: Grow-Only, 2P, Observed-Remove
- Map: Set of Registers
- Counter: Unlimited, Non-negative
- Graphs: Directed, Monotonic DAG, Edit graph
- Sequence: Edit sequence
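
To make the portfolio concrete, here is a minimal sketch of its simplest entry, a state-based grow-only counter. The class name and API are illustrative, not from the talk; each replica increments only its own slot, and merge takes the element-wise maximum.

class GCounter:
    """A minimal sketch of a state-based grow-only counter.
    Each replica increments only its own slot; merge is the
    element-wise max, and the value is the sum of all slots."""
    def __init__(self, replica_id, n_replicas):
        self.i = replica_id
        self.slots = [0] * n_replicas

    def increment(self, k=1):
        self.slots[self.i] += k          # only my own slot grows

    def value(self):
        return sum(self.slots)

    def merge(self, other):
        # Element-wise max is a least upper bound: merging in any
        # order, any number of times, converges to the same state.
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]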

Set design alternatives
Sequential specification:
{true} add(e) {e ∈ S}
{true} remove(e) {e ∉ S}
Linearisable: a sequential order equivalent to the real-time order; requires consensus
Concurrently: {true} add(e) ∥ remove(e) {????}
Linearisable? add wins? remove wins? last writer wins? error state?

Observed-Remove Set
Example with replicas s1, s2, s3, all starting from {}:
- one replica executes add(a), tagging a with a unique token α: {aα}
- another replica concurrently executes add(a) with a fresh token β: {aβ}
- merging propagates both taggings: {aα, aβ}
- a replica that has observed only {aβ} executes rmv(a); after all merges, aα survives, so a remains in the set
We can never remove more tokens than exist; operation order guarantees that removed tokens have previously been added.
Payload: added set A and removed set R of (element, unique-token) pairs
add(e): A ← A ∪ {(e, α)}, α unique
remove(e): removes all unique tokens observed at the source: R ← R ∪ {(e, ·) ∈ A}
lookup(e) = ∃(e, ·) ∈ A \ R
merge(S, S') = (A ∪ A', R ∪ R')
{true} add(e) ∥ remove(e) {e ∈ S}: add wins
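
A minimal Python sketch of the state-based OR-Set specified above (the class and method names are illustrative; the talk gives only the mathematical spec). Both component sets only grow, so component-wise union is a least-upper-bound merge:

import uuid

class ORSet:
    """State-based Observed-Remove Set: a minimal sketch.
    Payload is two grow-only sets of (element, unique-token) pairs:
    A (added) and R (removed)."""
    def __init__(self):
        self.added = set()     # A: (element, token) pairs
        self.removed = set()   # R: tombstoned (element, token) pairs

    def add(self, e):
        # Tag the element with a globally unique token.
        self.added.add((e, uuid.uuid4().hex))

    def remove(self, e):
        # Remove only the (element, token) pairs observed locally;
        # a concurrent add uses a fresh token and survives: add wins.
        self.removed |= {(x, t) for (x, t) in self.added if x == e}

    def lookup(self, e):
        return any(x == e for (x, t) in self.added - self.removed)

    def merge(self, other):
        # Component-wise union = least upper bound in the lattice.
        self.added |= other.added
        self.removed |= other.removed

Replaying the slide's trace:

s1, s2, s3 = ORSet(), ORSet(), ORSet()
s1.add("a"); s3.add("a")
s2.merge(s3); s2.remove("a")   # s2 removes only the token it observed
s2.merge(s1)
assert s2.lookup("a")          # the concurrent add at s1 wins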

OR-Set
Solves the Dynamo shopping-cart anomaly
Optimisations: just mark tombstones; garbage-collect tombstones; operation-based approach

Graph design alternatives
Graph = (V, E) where E ⊆ V × V
Sequential specification:
{v, v' ∈ V} addEdge(v, v') {(v, v') ∈ E}
{∄(v, v') ∈ E} removeVertex(v) {v ∉ V}
Concurrently: removeVertex(v') ∥ addEdge(v, v'): linearisable? addEdge wins? removeVertex wins?
For our web search engine application, removeVertex wins: do not check the precondition at add/remove

Graph CRDT
Payload = OR-Set V, OR-Set E
Updates add/remove to V and E: addVertex(v), removeVertex(v), addEdge(v, v'), removeEdge(v, v')
Do not enforce the invariant a priori; correct for it at query time:
lookupEdge(v, v') = (v, v') ∈ E ∧ v ∈ V ∧ v' ∈ V
removeVertex(v') ∥ addEdge(v, v'): remove wins
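
A minimal sketch of this graph on top of the ORSet sketch above (class and method names are assumptions). The invariant is corrected at query time rather than enforced on update, which is exactly what makes removeVertex win:

class GraphCRDT:
    """Graph CRDT: payload is two OR-Sets, one for vertices, one
    for edges. 'Edges connect existing vertices' is not enforced
    on update; lookup_edge filters out dangling edges."""
    def __init__(self):
        self.V = ORSet()
        self.E = ORSet()

    def add_vertex(self, v): self.V.add(v)
    def remove_vertex(self, v): self.V.remove(v)
    def add_edge(self, v, w): self.E.add((v, w))
    def remove_edge(self, v, w): self.E.remove((v, w))

    def lookup_edge(self, v, w):
        # An edge is visible only if both endpoints are still present,
        # so a concurrent removeVertex wins over addEdge.
        return self.E.lookup((v, w)) and self.V.lookup(v) and self.V.lookup(w)

    def merge(self, other):
        self.V.merge(other.V)
        self.E.merge(other.E)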

Co-operative editing
Deep (internal) view: a DAG representing insert order, to ensure consistent ordering at all replicas
{x, z ∈ graph_i ∧ x < z} add-between_i(x, y, z) {y ∈ graph_i ∧ x < y < z}
add inserts between already-ordered elements; strong order takes precedence over weak; begin and end sentinels bound the document
The local constraint implies the graph is globally acyclic
This spec is implemented directly by WOOT
Clean surface view: summarises the total order
(Figure: the characters I, N, R, I, A inserted with unique labels α, β, γ, δ, ε between the begin/end sentinels.)

Continuum
Assign each element a unique real-number position
Real numbers are not appropriate in practice: approximate them by a tree

Treedoc
Binary naming tree: compact, low arity (binary, quaternary), self-adjusting
Each element's ID is its path in the tree; the infix traversal spells the document, here L I N R I
Logarithmic properties, assuming the tree stays balanced
add appends a leaf: non-destructive, existing IDs don't change
remove leaves a tombstone: IDs don't change
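
A minimal sketch of the Treedoc idea: an element's ID is its path of 0/1 edges from the root, and the document is the infix traversal of the tree. The encoding below (bit 0 → 0, bit 1 → 2, terminator 1 for the node itself) is one way to make lexicographic order coincide with infix order; it is an illustration, not the paper's wire format.

def treedoc_key(path):
    # Map a path (tuple of 0/1 bits) to a key whose lexicographic
    # order is the infix order of the tree: left subtree < node < right.
    return tuple(2 * b for b in path) + (1,)

# A tree whose infix traversal reads L I N R I, as on the slide:
doc = {(): "N", (0,): "L", (0, 1): "I", (1,): "R", (1, 1): "I"}
print("".join(doc[p] for p in sorted(doc, key=treedoc_key)))  # LINRI

# add appends a leaf, so existing IDs never change: a right child
# of (0, 1) lands between "I" at (0, 1) and "N" at the root.
doc[(0, 1, 1)] = "!"
print("".join(doc[p] for p in sorted(doc, key=treedoc_key)))  # LI!NRI

# remove would replace an atom with a tombstone rather than delete
# the node, again leaving all IDs intact.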

Layered Treedoc
Two layers: a sparse n-ary tree whose nodes are disambiguated by site identifiers (Site 34, Site 79, Site 22, Site 66, ...), over binary trees
Edit: binary tree; concurrency: sparse tree

The theory
Two simple conditions for correctness without synchronisation

Query
Local at the source replica, of the client's choice, e.g. s1.q(a), s2.q(b)
Example: the Amazon shopping cart is replicated; an unspecified client (e.g. a Web front-end); one or more load balancers; failures may direct the client to different replicas

State-based replication
Local at the source: s1.u(a), s2.u(b), ...: compute and update the local payload
Convergence: episodically, a replica si sends its payload; on delivery, the receiver merges payloads with m, e.g. s2.m(s1), s3.m(s2), s1.m(s2)
Merging two valid states produces a valid state; no historical information is available
Inefficient if the payload is large

State-based: monotonic semi-lattice ⇒ CRDT (CvRDT)
If the payload type forms a semi-lattice, updates are increasing, and merge computes the least upper bound (LUB), then replicas converge to the LUB of the last values, with no reference to history
Example: payload = int, merge = max
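
The slide's own example, payload = int with merge = max, fits in a few lines; a minimal sketch (names assumed):

class MaxInt:
    """Ints under max form a semi-lattice: updates only increase
    the value and merge is the least upper bound, so all replicas
    converge to the largest value ever written."""
    def __init__(self, value=0):
        self.value = value

    def update(self, v):
        self.value = max(self.value, v)            # updates are increasing

    def merge(self, other):
        self.value = max(self.value, other.value)  # LUB

a, b = MaxInt(), MaxInt()
a.update(5); b.update(3)
a.merge(b); b.merge(a)
assert a.value == b.value == 5   # converged to the LUB of the last values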

Operation-based replication
At the source: prepare the operation, then broadcast it to all replicas
Eventually, at all replicas: update the local replica
Push small updates to all replicas, eventually: more efficient than state-based

Op-based: commutativity ⇒ CRDT (CmRDT)
Delivery order ensures the downstream precondition (happened-before, or weaker)
If (liveness) all replicas execute all operations in delivery order, and (safety) all concurrent operations commute, then replicas converge
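
A minimal sketch of the two conditions in action: an op-based counter whose concurrent increments commute. Broadcast is simulated here by applying each prepared op at every replica; the names are illustrative.

class OpCounter:
    """Op-based (CmRDT) counter: increment() prepares an op at the
    source; apply() is the downstream step executed at every replica.
    Concurrent increments commute, so any reliable delivery yields
    the same final value everywhere."""
    def __init__(self):
        self.value = 0

    def increment(self, k=1):
        return ("inc", k)      # prepare at the source; broadcast this op

    def apply(self, op):       # downstream, at every replica
        kind, k = op
        assert kind == "inc"
        self.value += k

r1, r2 = OpCounter(), OpCounter()
op_a, op_b = r1.increment(2), r2.increment(3)
for op in (op_a, op_b): r1.apply(op)
for op in (op_b, op_a): r2.apply(op)   # a different delivery order
assert r1.value == r2.value == 5       # concurrent ops commute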

Monotonic semi-lattice ⟷ commutative
A systematic transformation exists in both directions, though the generic emulation is inefficient compared with a hand-crafted op-based implementation:
1. A state-based object can emulate an operation-based object, and vice versa
2. State-based emulation of a CmRDT is a CvRDT
3. Operation-based emulation of a CvRDT is a CmRDT

Operation-based OR-Set
Payload: S = {(e, α), (e, β), (e', γ), ...} where α, β, ... are unique tokens
lookup(e) = ∃α: (e, α) ∈ S
add(e) = S ← S ∪ {(e, α)} where α is fresh
remove(e): at the source, R = {(e, α) ∈ S}, the set of tokens associated with e in the old state, as observed by the source; downstream, S ← S \ R
No tombstones
{true} add(e) ∥ remove(e) {e ∈ S}
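
A minimal sketch of this op-based OR-Set, with remove split into its source part (collect the observed tokens R) and its downstream part (drop exactly those tokens). It assumes the delivery-order condition of the previous slides, i.e. a remove is delivered after the adds it observed; the names are illustrative.

import uuid

class OpORSet:
    """Op-based OR-Set: a concurrent add uses a fresh token that is
    not in any remove's observed set R, so it survives. Add wins,
    and there are no tombstones. Assumes causal delivery."""
    def __init__(self):
        self.S = set()   # (element, unique-token) pairs

    def prepare_add(self, e):
        return ("add", e, uuid.uuid4().hex)    # fresh token

    def prepare_remove(self, e):
        # Source: the tokens associated with e in the observed state.
        R = {(x, t) for (x, t) in self.S if x == e}
        return ("remove", R)

    def apply(self, op):                       # downstream, at every replica
        if op[0] == "add":
            _, e, token = op
            self.S.add((e, token))
        else:
            _, R = op
            self.S -= R                        # drop only the observed tokens

    def lookup(self, e):
        return any(x == e for (x, _) in self.S)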

Ongoing work

CRDTs for P2P & cloud computing: the ConcoRDanT ANR project
Systematic study of the conflict-free design space, in theory and practice
Characterise invariants; build a library of data types
Not universal: conflict-free vs. conflict semantics
Move consensus off the critical path, to non-critical operations

CRDT + dataflow
Crawl web sites (filtered by a whitelist Set) and extract links into a Graph; a spam detector reads the Graph; a content DB is a Map from URL to its last 2 versions; extract words into a Map from word to Set of URLs
Incremental, asynchronous processing; replicate and shard CRDTs near the edge
Propagate updates by dataflow; throttle according to QoS metrics (freshness, availability, cost, etc.)
Scale by sharding; synchronous processing (snapshots) at the centre

OR-Set + snapshot
Read a consistent snapshot despite concurrent, incremental updates
Unique token = time: α = a Lamport pair (process i, counter t)
UIDs identify the snapshot version; a snapshot is a vector-clock value
Retain tombstones until no longer needed
lookup(e, t) = ∃(e, i, t') ∈ A : t' ≤ t ∧ ∄(e, i, t'') ∈ R : t'' ≤ t
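
A minimal sketch of the snapshot read, under the (assumed) representation that A and R hold (element, replica, timestamp) triples and a snapshot is a vector clock mapping replica to timestamp:

def snapshot_lookup(A, R, e, snapshot):
    """An element is visible in the snapshot iff some add of it falls
    within the snapshot and that add has not been removed within the
    snapshot. Names and representation are illustrative."""
    def within(i, t):
        return t <= snapshot.get(i, 0)
    added = {(x, i, t) for (x, i, t) in A if x == e and within(i, t)}
    removed = {(x, i, t) for (x, i, t) in R if x == e and within(i, t)}
    return bool(added - removed)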

Sharded OR-Set
Very large objects are split into independent shards
Sharding is static, by hash (dynamic sharding would require consensus to rebalance)
Statically-sharded CRDT: each shard is a CRDT; an update touches a single shard; no cross-object invariants; the combination remains a CRDT
Statically-sharded OR-Set: a combination of smaller OR-Sets; snapshot: a clock across the shards
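
A minimal sketch of static sharding over the ORSet sketch above, using a stable hash so that every replica places an element in the same shard (names are illustrative):

import hashlib

class ShardedORSet:
    """Statically-sharded OR-Set: a fixed number of independent
    OR-Set shards, chosen by hashing the element. Each update touches
    exactly one shard; every shard is a CRDT and there are no
    cross-shard invariants, so the combination is a CRDT."""
    def __init__(self, n_shards=16):
        self.shards = [ORSet() for _ in range(n_shards)]

    def _shard(self, e):
        # Stable hash so all replicas agree on the element's shard
        # (Python's built-in hash() is randomised per process).
        h = int(hashlib.sha1(repr(e).encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def add(self, e): self._shard(e).add(e)
    def remove(self, e): self._shard(e).remove(e)
    def lookup(self, e): return self._shard(e).lookup(e)

    def merge(self, other):
        for mine, theirs in zip(self.shards, other.shards):
            mine.merge(theirs)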

Take-aways
A principled approach: strong eventual consistency
Two sufficient conditions: state: monotonic semi-lattice; operation: commutativity
Useful CRDTs: Register, Counter, Set, Map (KVS), Graph, Monotonic DAG, Sequence
Future work: snapshots, sharding, dataflow, and a wee bit of synchronisation

