How Eventual is Eventual Consistency?
1 Probabilistically Bounded Staleness How Eventual is Eventual Consistency? Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica (UC Berkeley) BashoChats 002, 28 February 2012
2 what: PBS consistency prediction; why: weak consistency is fast; how: measure latencies, use WARS model
3 1. Fast 2. Scalable 3. Available
4 solution: replicate for 1. capacity 2. fault-tolerance
6-9 keep replicas in sync: slow; alternative: sync later: inconsistent
10 higher consistency, higher latency: contact more replicas, read more recent data; lower consistency, lower latency: contact fewer replicas, read less recent data
11 Dynamo: Amazon's Highly Available Key-value Store, SOSP 2007; Apache, DataStax; Project Voldemort
12-16 [Diagram: N = 3 replicas R1, R2, R3, each holding ("key", 1). Read with R=3: the client sends read("key") to the coordinator, which forwards it to the replicas; all three reply ("key", 1), and the coordinator returns ("key", 1) to the client.]
17-23 [Diagram: the same R=3 read, with the replies arriving one at a time; the coordinator returns ("key", 1) to the client only after the third reply.]
24-30 [Diagram: the same read with R=1. The coordinator still sends the read to all replicas, but returns ("key", 1) to the client after the first reply; the other two replies arrive afterwards.]
31 N replicas/key read: wait for R replies write: wait for W acks
32-49 [Diagram: R1, R2, R3 each hold ("key", 1). Write with W=1: the coordinator sends write("key", 2) to the replicas; R1 applies it and acks, and the coordinator acks ("key", 2) to the client after that first ack. A second coordinator then serves read("key") with R=1: a replica that has not yet received the write replies ("key", 1), so the client gets ("key", 1). Afterwards the write reaches R2 and R3, which ack, and the later read replies carry ("key", 2).]
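The stale read in this walkthrough can be replayed with a toy in-memory model. This is only a sketch: the `Replica` class and the `write`/`read` helpers are hypothetical names, message delays are replaced by explicit lists saying which replicas have received the write and which one answers the read first, and nothing here reflects how a real store is implemented.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 1  # every replica starts out holding ("key", 1)


def write(replicas, new_version, delivered_to):
    """Apply the write only on the replicas it has reached so far."""
    for i in delivered_to:
        replicas[i].version = max(replicas[i].version, new_version)


def read(replicas, responders):
    """Return the newest version among the replicas that answered in time."""
    return max(replicas[i].version for i in responders)


replicas = [Replica(), Replica(), Replica()]  # R1, R2, R3

write(replicas, 2, delivered_to=[0])      # W=1: only R1 has the write, client is acked
stale = read(replicas, responders=[2])    # R=1: R3 answers first, still at version 1

write(replicas, 2, delivered_to=[1, 2])   # the write reaches R2 and R3 later
fresh = read(replicas, responders=[2])    # the same read now sees version 2

print(stale, fresh)  # 1 2
```

With R=3 the `responders` list would always include R1, so the first read could not miss version 2.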
50 if: W > N/2 and R+W > N, then: read (at least) last committed version; else: eventual consistency
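The rule on this slide is easy to check mechanically. The sketch below is an illustration only: `consistency_mode` is a hypothetical helper, and it assumes both conditions on the slide (W > N/2 and R+W > N) must hold. The configurations tried are the ones quoted elsewhere in this deck.

```python
def consistency_mode(n, r, w):
    """Classify an (N, R, W) configuration using the rule on the slide."""
    if w > n / 2 and r + w > n:
        return "read (at least) last committed version"
    return "eventual consistency"


# Configurations mentioned in the deck.
configs = {
    "Riak defaults": (3, 2, 2),
    "Cassandra defaults": (3, 1, 1),
    "Low value data": (2, 1, 1),
    "Mission critical data": (5, 1, 5),
}

for name, (n, r, w) in configs.items():
    print(f"{name}: N={n} R={r} W={w} -> {consistency_mode(n, r, w)}")
```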
51 eventual consistency: "If no new updates are made to the object, eventually all accesses will return the last updated value" (W. Vogels, CACM 2008)
52 How eventual? How long do I have to wait?
53 How consistent? What happens if I don't wait?
54-58 Riak Defaults: N=3, R=2, W=2; 2+2 > 3. Phew, I'm safe! ...but what's my latency cost? Should I change?
59 R+W
60 R+W strong consistency
61 R+W strong consistency low latency
62 Cassandra: R=W=1, N=3 by default (1+1 ≯ 3)
63 "In the general case, we typically use [Cassandra's] consistency level of [R=W=1], which provides maximum performance. Nice!" --D. Williams, HBase vs Cassandra: why we moved February
66-67 Low Value Data: n = 2, r = 1, w = 1 (Consistency or Bust: Breaking a Riak Cluster, Sunday, July 31, 11)
68-69 Mission Critical Data: n = 5, r = 1, w = 5, dw = 5 (Consistency or Bust: Breaking a Riak Cluster, Sunday, July 31, 11)
70 LinkedIn: "very low latency and high availability": R=W=1, N=3; N=3 not required, "some consistency": R=W=1, personal communication
71-73 Anecdotally, EC worthwhile for many kinds of data. How eventual? How consistent? "eventual and consistent enough"
74-75 Can we do better? Probabilistically Bounded Staleness: can't make promises, can give expectations
76-77 PBS is: a way to quantify latency-consistency trade-offs (what's the latency cost of consistency? what's the consistency cost of latency?); an SLA for consistency
78 How eventual? t-visibility: consistent reads with probability p after t seconds (e.g., 99.9% of reads will be consistent after 10ms)
79-90 [Diagram: Coordinator and Replica message flow, once per replica: the coordinator sends the write; the replica acks; the coordinator waits for W responses; t seconds elapse; the coordinator sends the read; the replica responds; the coordinator waits for R responses. The response is stale if the read arrives before the write.]
91-99 [Timeline diagram, N=2: write, write; ack, ack (W=1); read, read; response, response (R=1); outcome: good]
100-110 [Timeline diagram, N=2: write, write; ack (W=1); read, read; response (R=1), response; ack; outcome: bad]
111 [Diagram recap: Coordinator and Replica, once per replica: write, ack (wait for W responses); t seconds elapse; read, response (wait for R responses). The response is stale if the read arrives before the write.]
112-114 [Diagram recap: R1 holds ("key", 2) while R2 and R3 still hold ("key", 1); the W=1 coordinator has acked ("key", 2), yet the R=1 coordinator returns ("key", 1). R3 replied before last write arrived!]
115-119 [Diagram: the same Coordinator and Replica flow, once per replica, with each message delay labeled: write (W), ack (A), read (R), response (S). The coordinator waits for W responses, t seconds elapse, then it waits for R responses. The response is stale if the read arrives before the write.]
120 Solving WARS: hard; Monte Carlo methods: easy
121 To use WARS: gather latency data, run simulation. Cassandra implementation validated simulations; available on GitHub
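The Monte Carlo approach can be sketched in a few lines. This is a minimal illustration, not the Cassandra implementation mentioned on the slide: it assumes every one-way delay (write W, ack A, read R, response S) is drawn independently from an exponential distribution with the same mean, whereas real use would plug in measured latencies. The function name and its parameters are hypothetical.

```python
import random


def simulate_t_visibility(n, r, w, t, trials=10_000, mean_ms=1.0, seed=0):
    """Estimate P(consistent read) for a read issued t ms after the write commits."""
    rng = random.Random(seed)
    rate = 1.0 / mean_ms
    consistent = 0
    for _ in range(trials):
        # Per-replica one-way delays; the write is sent at time 0.
        w_delay = [rng.expovariate(rate) for _ in range(n)]  # W: write
        a_delay = [rng.expovariate(rate) for _ in range(n)]  # A: ack
        r_delay = [rng.expovariate(rate) for _ in range(n)]  # R: read
        s_delay = [rng.expovariate(rate) for _ in range(n)]  # S: response
        # The write commits when the w-th ack reaches the coordinator.
        commit = sorted(w_delay[i] + a_delay[i] for i in range(n))[w - 1]
        read_sent = commit + t
        # The coordinator uses the first r responses that come back.
        first_r = sorted(range(n), key=lambda i: r_delay[i] + s_delay[i])[:r]
        # A replica returns the new version if the read reached it
        # no earlier than the write did.
        if any(read_sent + r_delay[i] >= w_delay[i] for i in first_r):
            consistent += 1
    return consistent / trials


for t in (0.0, 1.0, 5.0):
    p = simulate_t_visibility(n=3, r=1, w=1, t=t)
    print(f"t={t} ms: P(consistent) ~ {p:.3f}")
```

Sweeping `t` for a fixed (N, R, W) gives the t-visibility curve; the smallest `t` whose estimate reaches the target probability is the number a consistency SLA would quote.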
122 How eventual? t-visibility: consistent reads with probability p after t seconds; key: WARS model; need: latencies
123 How consistent? What happens if I don't wait?
124 Probability of reading data older than k versions is exponentially reduced by k: Pr(reading latest write) = 99%; Pr(reading one of last two writes) = 99.9%; Pr(reading one of last three writes) = 99.99%
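One simple way to see the exponential drop is a counting model. The sketch below assumes that each write reaches exactly W replicas, that each read picks R of the N replicas uniformly at random and independently of where earlier writes landed, and that there is no anti-entropy. Under those assumptions a read misses one given version with probability C(N-W, R) / C(N, R), and misses all of the last k versions with that probability raised to the power k. The function name is hypothetical, and the percentages on the slide are illustrative rather than outputs of this formula.

```python
from math import comb


def prob_within_k_versions(n, r, w, k):
    """P(a read returns one of the last k versions) under the stated assumptions."""
    # Probability that the R replicas read are all outside the W replicas
    # that hold one particular version.
    p_miss_one = comb(n - w, r) / comb(n, r)
    # Missing each of the last k versions is treated as independent.
    return 1.0 - p_miss_one ** k


for k in (1, 2, 3, 5, 10):
    print(k, round(prob_within_k_versions(n=3, r=1, w=1, k=k), 4))
```

When R+W > N, `comb(n - w, r)` is zero, so the probability is 1 for every k.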
125 Riak Defaults: N=3, R=2, W=2; 2+2 > 3. Phew, I'm safe! ...but what's my latency cost? Should I change?
126 LinkedIn: 150M+ users, built and uses Voldemort; Yammer: 100K+ companies, uses Riak. Thanks for the production latencies
127 N=3
128 N=3 10 ms
129-131 LNKD-DISK, N=3. 99.9% consistent reads: R=2, W=1, t = 13.6 ms, Latency: ms. 100% consistent reads: R=3, W=1, Latency: ms. 16.5% faster: worthwhile? (Latency is combined read and write latency at 99.9th percentile.)
132 N=3
135-137 LNKD-SSD, N=3. 99.9% consistent reads: R=1, W=1, t = 1.85 ms, Latency: 1.32 ms. 100% consistent reads: R=3, W=1, Latency: 4.20 ms. 59.5% faster: better payoff! (Latency is combined read and write latency at 99.9th percentile.)
138 [WARS diagram: write (W), ack (A), read (R), response (S)] critical factor in staleness
139-140 [Figure: probability density functions, low variance vs. high variance; axes: Probability vs. Latency]
141 [WARS diagram: write (W), ack (A), read (R), response (S)] SSDs reduce variance compared to disks!
142-144 YMMR, N=3. 99.9% consistent reads: R=1, W=1, t = ms, Latency: 43.3 ms. 100% consistent reads: R=3, W=1, Latency: ms. 81.1% faster: even better payoff! (Latency is combined read and write latency at 99.9th percentile.)
145 Riak Defaults: N=3, R=2, W=2; 2+2 > 3. Phew, I'm safe! ...but what's my latency cost? Should I change?
146-148 Is low latency worth it? PBS can tell you. (and PBS is easy)
150 Workflow 1. Metrics 2. Simulation 3. Set N, R, W 4. Profit
151 what: PBS consistency prediction; why: weak consistency is fast; how: measure latencies, use WARS model
152 R+W strong consistency low latency
154 PBS latency vs. consistency trade-offs fast and simple modeling large benefits be more
155 VLDB 2012 early print: tinyurl.com/pbspaper; Cassandra patch: github.com/pbailis/cassandra-pbs
156 Extra Slides
157 PBS and apps
158 staleness requires either: staleness-tolerant data structures (timelines, logs; cf. commutative data structures, logical monotonicity) or asynchronous compensation code (detect violations after data is returned, see paper; write code to fix any errors; cf. "Building on Quicksand": memories, guesses, apologies)
159 asynchronous compensation: minimize (compensation cost) × (# of expected anomalies)
160 Read only newer data? (monotonic reads session guarantee) # versions tolerable staleness = (global write rate) / (client's read rate), for a given key
161 Failure?
162 Treat failures as latency spikes
163 How long do partitions last?
164-165 what time interval? 99.9% uptime/yr ⇒ 8.76 hours downtime/yr; 8.76 consecutive hours down ⇒ bad 8-hour rolling average. Hide in tail of distribution OR continuously evaluate SLA, adjust
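The 8.76-hour figure follows directly from the uptime percentage. A small worked check, assuming a 365-day year (the helper name is hypothetical):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_fraction):
    """Hours of allowed downtime per year for a given uptime fraction."""
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR


print(round(downtime_hours_per_year(0.999), 2))  # 8.76
```

The yearly number says nothing about how that downtime is distributed, which is the slide's point: 8.76 consecutive hours satisfies the yearly figure but ruins any 8-hour window that overlaps the outage.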
166 Give me (and academia) failure data!
167 In paper: closed-form analysis, monotonic reads, staleness detection, varying N, WAN model, production latency data. tinyurl.com/pbspaper
168-169 t-visibility depends on: 1) message delays 2) background version exchange (anti-entropy). Anti-entropy: only decreases staleness; comes in many flavors; hard to guarantee rate. Focus on message delays first
170 [Figure: latency CDFs for LNKD-SSD, LNKD-DISK, YMMR, WAN; R=3, N=3; axes: CDF vs. Write Latency (ms)] (LNKD-SSD and LNKD-DISK identical for reads)
171 [Figure: latency CDFs for LNKD-SSD, LNKD-DISK, YMMR, WAN; W=3, N=3; axes: CDF vs. Write Latency (ms)]
172 N=3
175-177 [Figures: synthetic, exponential distributions; panels: W 1/4x ARS and W 10x ARS]
178-180 N = 3 replicas (R1, R2, R3). Write to W, read from R replicas. Quorum system: guaranteed intersection: {R1, R2, R3} for R=W=3 replicas; {R1, R2}, {R2, R3}, {R1, R3} for R=W=2 replicas. Partial quorum system: may not intersect: {R1}, {R2}, {R3} for R=W=1 replicas
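The intersection property can be verified by brute force for small N. The sketch below enumerates every possible read quorum and write quorum and checks whether each pair shares a replica; `always_intersect` is a hypothetical helper written for this illustration.

```python
from itertools import combinations


def always_intersect(n, r, w):
    """True if every size-r read quorum overlaps every size-w write quorum."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


print(always_intersect(3, 3, 3))  # True: the only quorum is {R1, R2, R3}
print(always_intersect(3, 2, 2))  # True: any two pairs out of three overlap
print(always_intersect(3, 1, 1))  # False: {R1} and {R2} are disjoint
```

The enumeration agrees with the R+W > N rule: overlap is guaranteed exactly when the two quorums together are larger than the replica set.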
More informationCPS 512 midterm exam #1, 10/7/2016
CPS 512 midterm exam #1, 10/7/2016 Your name please: NetID: Answer all questions. Please attempt to confine your answers to the boxes provided. If you don t know the answer to a question, then just say
More informationCS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Dynamo CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Dynamo Highly available and scalable distributed data store Manages state of services that have high reliability and
More informationBuilding Consistent Transactions with Inconsistent Replication
DB Reading Group Fall 2015 slides by Dana Van Aken Building Consistent Transactions with Inconsistent Replication Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports
More informationConsistency and Replication. Some slides are from Prof. Jalal Y. Kawash at Univ. of Calgary
Consistency and Replication Some slides are from Prof. Jalal Y. Kawash at Univ. of Calgary Reasons for Replication Reliability/Availability : Mask failures Mask corrupted data Performance: Scalability
More informationDistributed Key Value Store Utilizing CRDT to Guarantee Eventual Consistency
Distributed Key Value Store Utilizing CRDT to Guarantee Eventual Consistency CPSC 416 Project Proposal n6n8: Trevor Jackson, u2c9: Hayden Nhan, v0r5: Yongnan (Devin) Li, x5m8: Li Jye Tong Introduction
More informationDistributed Systems COMP 212. Revision 2 Othon Michail
Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise
More informationDYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun
DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:
More informationclass 20 updates 2.0 prof. Stratos Idreos
class 20 updates 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value INSERT INTO table_name VALUES
More informationCassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent
Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these
More informationDesign Patterns for Large- Scale Data Management. Robert Hodges OSCON 2013
Design Patterns for Large- Scale Data Management Robert Hodges OSCON 2013 The Start-Up Dilemma 1. You are releasing Online Storefront V 1.0 2. It could be a complete bust 3. But it could be *really* big
More informationEECS 498 Introduction to Distributed Systems
EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Replicated State Machines Logical clocks Primary/ Backup Paxos? 0 1 (N-1)/2 No. of tolerable failures October 11, 2017 EECS 498
More informationConsistency Rationing in the Cloud: Pay only when it matters
Consistency Rationing in the Cloud: Pay only when it matters By Sandeepkrishnan Some of the slides in this presenta4on have been taken from h7p://www.cse.iitb.ac.in./dbms/cs632/ra4oning.ppt 1 Introduc4on:
More informationConsistency & Replication
Objectives Consistency & Replication Instructor: Dr. Tongping Liu To understand replication and related issues in distributed systems" To learn about how to keep multiple replicas consistent with each
More informationThe material in this lecture is taken from Dynamo: Amazon s Highly Available Key-value Store, by G. DeCandia, D. Hastorun, M. Jampani, G.
The material in this lecture is taken from Dynamo: Amazon s Highly Available Key-value Store, by G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall,
More informationApache Cassandra - A Decentralized Structured Storage System
Apache Cassandra - A Decentralized Structured Storage System Avinash Lakshman Prashant Malik from Facebook Presented by: Oded Naor Acknowledgments Some slides are based on material from: Idit Keidar, Topics
More informationErasure Coding in Object Stores: Challenges and Opportunities
Erasure Coding in Object Stores: Challenges and Opportunities Lewis Tseng Boston College July 2018, PODC Acknowledgements Nancy Lynch Muriel Medard Kishori Konwar Prakash Narayana Moorthy Viveck R. Cadambe
More informationLecture XII: Replication
Lecture XII: Replication CMPT 401 Summer 2007 Dr. Alexandra Fedorova Replication 2 Why Replicate? (I) Fault-tolerance / High availability As long as one replica is up, the service is available Assume each
More informationPresented By: Devarsh Patel
: Amazon s Highly Available Key-value Store Presented By: Devarsh Patel CS5204 Operating Systems 1 Introduction Amazon s e-commerce platform Requires performance, reliability and efficiency To support
More informationDistributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems 10. Consensus: Paxos Paul Krzyzanowski Rutgers University Fall 2017 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value
More informationWhy distributed databases suck, and what to do about it. Do you want a database that goes down or one that serves wrong data?"
Why distributed databases suck, and what to do about it - Regaining consistency Do you want a database that goes down or one that serves wrong data?" 1 About the speaker NoSQL team lead at Trifork, Aarhus,
More informationUnderstanding NoSQL Database Implementations
Understanding NoSQL Database Implementations Sadalage and Fowler, Chapters 7 11 Class 07: Understanding NoSQL Database Implementations 1 Foreword NoSQL is a broad and diverse collection of technologies.
More informationBloom: Big Systems, Small Programs. Neil Conway UC Berkeley
Bloom: Big Systems, Small Programs Neil Conway UC Berkeley Distributed Computing Programming Languages Data prefetching Register allocation Loop unrolling Function inlining Optimization Global coordination,
More informationCS 655 Advanced Topics in Distributed Systems
Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3
More informationConsistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016
Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network
More informationPaxos and Raft (Lecture 21, cs262a) Ion Stoica, UC Berkeley November 7, 2016
Paxos and Raft (Lecture 21, cs262a) Ion Stoica, UC Berkeley November 7, 2016 Bezos mandate for service-oriented-architecture (~2002) 1. All teams will henceforth expose their data and functionality through
More informationDistributed Systems. Lec 12: Consistency Models Sequential, Causal, and Eventual Consistency. Slide acks: Jinyang Li
Distributed Systems Lec 12: Consistency Models Sequential, Causal, and Eventual Consistency Slide acks: Jinyang Li (http://www.news.cs.nyu.edu/~jinyang/fa10/notes/ds-eventual.ppt) 1 Consistency (Reminder)
More informationZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table
ZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table 1 What is KVS? Why to use? Why not to use? Who s using it? Design issues A storage system A distributed hash table Spread simple structured
More informationXXXX Characterizing and Adapting the Consistency-Latency Tradeoff in Distributed Key-value Stores
XXXX Characterizing and Adapting the Consistency-Latency Tradeoff in Distributed Key-value Stores MUNTASIR RAIHAN RAHMAN, LEWIS TSENG, SON NGUYEN, INDRANIL GUPTA, NITIN VAIDYA, University of Illinois at
More informationNoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014
NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationPage 1. Goals of Today s Lecture" Two Key Questions" Goals of Transaction Scheduling"
Goals of Today s Lecture" CS162 Operating Systems and Systems Programming Lecture 19 Transactions, Two Phase Locking (2PL), Two Phase Commit (2PC)" Transaction scheduling Two phase locking (2PL) and strict
More informationMY WEAK CONSISTENCY IS STRONG WHEN BAD THINGS DO NOT COME IN THREES ZECHAO SHANG JEFFREY XU YU
MY WEAK CONSISTENCY IS STRONG WHEN BAD THINGS DO NOT COME IN THREES ZECHAO SHANG JEFFREY XU YU DISCLAIMER: NOT AN OLTP TALK HOW TO GET ALMOST EVERYTHING FOR NOTHING SHARED-MEMORY SYSTEM IS BACK shared
More informationUsing the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver
Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data
More informationDistributed Systems. Catch-up Lecture: Consistency Model Implementations
Distributed Systems Catch-up Lecture: Consistency Model Implementations Slides redundant with Lec 11,12 Slide acks: Jinyang Li, Robert Morris, Dave Andersen 1 Outline Last times: Consistency models Strict
More informationLast time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson
Distributed systems Lecture 6: Elections, distributed transactions, and replication DrRobert N. M. Watson 1 Last time Saw how we can build ordered multicast Messages between processes in a group Need to
More informationZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems
ZooKeeper & Curator CS 475, Spring 2018 Concurrent & Distributed Systems Review: Agreement In distributed systems, we have multiple nodes that need to all agree that some object has some state Examples:
More informationNoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre
NoSQL systems: sharding, replication and consistency Riccardo Torlone Università Roma Tre Data distribution NoSQL systems: data distributed over large clusters Aggregate is a natural unit to use for data
More informationReplication in Distributed Systems
Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over
More informationStrong Consistency & CAP Theorem
Strong Consistency & CAP Theorem CS 240: Computing Systems and Concurrency Lecture 15 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Consistency models
More information