PBS @ Twitter. Peter Bailis, Shivaram Venkataraman, Mike Franklin, Joe Hellerstein, Ion Stoica. UC Berkeley


1 PBS @ Twitter. Peter Bailis, Shivaram Venkataraman, Mike Franklin, Joe Hellerstein, Ion Stoica. UC Berkeley

2 PBS Probabilistically Bounded Staleness

3 1. Fast 2. Scalable 3. Available

4 solution: replicate for 1. request capacity 2. reliability

5 solution: replicate for 1. request capacity 2. reliability

6 solution: replicate for 1. request capacity 2. reliability

7

8 keep replicas in sync

9 keep replicas in sync

10 keep replicas in sync

11 keep replicas in sync

12 keep replicas in sync

13 keep replicas in sync

14 keep replicas in sync

15 keep replicas in sync slow

16 keep replicas in sync slow alternative: sync later

17 keep replicas in sync slow alternative: sync later

18 keep replicas in sync slow alternative: sync later

19 keep replicas in sync slow alternative: sync later inconsistent

20 keep replicas in sync slow alternative: sync later inconsistent

21 keep replicas in sync slow alternative: sync later inconsistent

22 keep replicas in sync slow alternative: sync later inconsistent

23 higher consistency, higher latency: contact more replicas, read more recent data; lower consistency, lower latency: contact fewer replicas, read less recent data

24 higher consistency, higher latency: contact more replicas, read more recent data; lower consistency, lower latency: contact fewer replicas, read less recent data

25 eventual consistency: "if no new updates are made to the object, eventually all accesses will return the last updated value" (W. Vogels, CACM 2008)

26 How eventual? How long do I have to wait?

27 How consistent? What happens if I don't wait?

28 PBS. problem: no guarantees with eventual consistency. solution: consistency prediction. technique: measure latencies; use the WARS model

29 Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007); Apache Cassandra (DataStax); Project Voldemort

30 Cassandra, Voldemort, and Riak users: Adobe, Rackspace, IBM, Spotify, Palantir, Twitter, Cisco, Netflix, Reddit, Morningstar, Digg, Mozilla, Rhapsody, Shazam, Gowalla, Soundcloud, LinkedIn, Gilt Groupe, Aol, GitHub, Ask.com, Yammer, Best Buy, Comcast, Boeing, JoyentCloud

31 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator client

32 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator read( key ) client

33 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read( key ) read R=3 Coordinator client

34 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator client

35 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator ( key, 1) client

36 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator client

37 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator read( key ) client

38 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read( key ) read R=3 Coordinator client

39 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator client

40 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator client

41 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator client

42 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=3 Coordinator ( key, 1) client

43 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read R=1 Coordinator client

44 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read R=1 Coordinator read( key ) client

45 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) read( key ) read R=1 Coordinator client send read to all

46 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=1 Coordinator client send read to all

47 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=1 Coordinator ( key, 1) client send read to all

48 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=1 Coordinator ( key, 1) client send read to all

49 N = 3 replicas R1 R2 R3 ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) ( key, 1) read R=1 Coordinator ( key, 1) client send read to all

50 N replicas/key read: wait for R replies write: wait for W acks
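The read and write paths illustrated above boil down to a small amount of coordinator logic. Below is a minimal, single-process sketch in Python; the Replica and Coordinator classes and their method names are invented for illustration, not any particular system's API, and because the sketch is synchronous, "messages still in flight" are modeled by simply not delivering the write to the remaining replicas.

```python
import random

class Replica:
    """A single in-memory replica holding one versioned value per key."""
    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, version, value):
        # Keep the incoming write only if it is newer than what we already have.
        if version > self.store.get(key, (0, None))[0]:
            self.store[key] = (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))


class Coordinator:
    """Dynamo-style partial quorums: send to all N, wait for W acks / R replies."""
    def __init__(self, replicas, r, w):
        self.replicas, self.r, self.w = replicas, r, w

    def write(self, key, version, value):
        acks = 0
        for replica in self.replicas:        # send the write to the replicas
            replica.write(key, version, value)
            acks += 1
            if acks == self.w:               # return to the client after W acks;
                break                        # the rest stand in for messages still in flight
        return acks >= self.w

    def read(self, key):
        # Model "wait for the first R replies" by asking R arbitrary replicas,
        # then returning the newest version among them.
        replies = [rep.read(key) for rep in random.sample(self.replicas, self.r)]
        return max(replies, key=lambda vv: vv[0])


replicas = [Replica() for _ in range(3)]     # N = 3
c = Coordinator(replicas, r=1, w=1)
c.write("key", 1, "v1")
print(c.read("key"))                         # may hit a replica the write never reached
```

With R=W=1 the read can consult a replica that has not yet seen the write, which is exactly the staleness the rest of the deck quantifies.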

51 R1( key, 1) R2( key, 1) R3( key, 1) Coordinator W=1

52 R1( key, 1) R2( key, 1) R3( key, 1) Coordinator W=1 write( key, 2)

53 R1( key, 1) R2( key, 1) R3( key, 1) write( key, 2) Coordinator W=1

54 R1( key, 2) R2( key, 1) R3( key, 1) ack( key, 2) Coordinator W=1

55 R1( key, 2) R2( key, 1) R3( key, 1) ack( key, 2) Coordinator W=1 ack( key, 2)

56 R1( key, 2) R2( key, 1) R3( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2) read( key )

57 R1( key, 2) R2( key, 1) R3( key, 1) read( key ) Coordinator W=1 Coordinator R=1 ack( key, 2)

58 R1( key, 2) R2( key, 1) R3( key, 1) ( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2)

59 R1( key, 2) R2( key, 1) R3( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

60 R1( key, 2) R2( key, 1) R3( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

61 R1( key, 2) R2( key, 1) R3( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

62 R1( key, 2) R2( key, 1) R3( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

63 R1 ( key, 2) R2 ( key, 2) R3( key, 1) ack( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

64 R1 ( key, 2) R2 ( key, 2) R3 ( key, 2) ack( key, 2) ack( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

65 R1 ( key, 2) R2 ( key, 2) R3 ( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

66 R1 ( key, 2) R2 ( key, 2) R3 ( key, 2) ( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

67 R1 ( key, 2) R2 ( key, 2) R3 ( key, 2) ( key, 2) ( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

68 R1 ( key, 2) R2 ( key, 2) R3 ( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

69 if R+W > N: strong consistency; else: eventual consistency
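As a quick sanity check (the helper name and example configurations are illustrative only):

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees every read quorum intersects every write quorum."""
    return r + w > n

print(quorums_overlap(3, 2, 2))  # True  -> strong consistency (quorums intersect)
print(quorums_overlap(3, 1, 1))  # False -> eventual consistency (1 + 1 < 3)
print(quorums_overlap(3, 3, 1))  # True  -> reads see every acknowledged write
```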

70 R+W

71 higher R+W: strong consistency

72 higher R+W: strong consistency; lower R+W: lower latency

73 higher R+W: strong consistency; lower R+W: lower latency

74 higher R+W: strong consistency; lower R+W: lower latency

75 Cassandra: R=W=1, N=3 by default (1+1 < 3)

76 "In the general case, we typically use [Cassandra s] consistency level of [R=W=1], which provides maximum performance. Nice!" --D. Williams, HBase vs Cassandra: why we moved February

77

78

79 NoSQL Primer (from "Consistency or Bust: Breaking a Riak Cluster", Sunday, July 31, '11)

80 Low Value Data: n = 2, r = 1, w = 1 (from "Consistency or Bust: Breaking a Riak Cluster", Sunday, July 31, '11)

81 Low Value Data: n = 2, r = 1, w = 1 (from "Consistency or Bust: Breaking a Riak Cluster", Sunday, July 31, '11)

82 Mission Critical Data: n = 5, r = 1, w = 5, dw = 5 (from "Consistency or Bust: Breaking a Riak Cluster", Sunday, July 31, '11)

83 Mission Critical Data: n = 5, r = 1, w = 5, dw = 5 (from "Consistency or Bust: Breaking a Riak Cluster", Sunday, July 31, '11)

84 LinkedIn: "very low latency and high availability": R=W=1, N=3; "N=3 not required, some consistency": R=W=1, N=2 (Alex Feinberg, personal communication)

85 Anecdotally, EC worthwhile for many kinds of data

86 Anecdotally, EC worthwhile for many kinds of data How eventual? How consistent?

87 Anecdotally, EC worthwhile for many kinds of data How eventual? How consistent? eventual and consistent enough

88 Can we do better?

89 Can we do better? Probabilistically Bounded Staleness: can't make promises, can give expectations

90 PBS is: a way to quantify latency-consistency trade-offs. What's the latency cost of consistency? What's the consistency cost of latency?

91 PBS is: a way to quantify latency-consistency trade-offs. What's the latency cost of consistency? What's the consistency cost of latency? An SLA for consistency.

92 How eventual? t-visibility: probability p of consistent reads after t seconds (e.g., 99.9% of reads will be consistent after 10 ms)

93 t-visibility depends on: 1) message delays 2) background version exchange (anti-entropy)

94 t-visibility depends on: 1) message delays, 2) background version exchange (anti-entropy). Anti-entropy only decreases staleness, comes in many flavors, and its rate is hard to guarantee. Focus on message delays.

95 focus on steady state; with failures: unavailable or sloppy

96-107 [Timing diagram, built up over several slides; time flows downward.] Coordinator and replica, once per replica: the coordinator sends the write; the replica acks; the coordinator waits for W ack responses; t seconds elapse; the coordinator sends the read; the replica responds; the coordinator waits for R responses. The response is stale if the read arrives at the replica before the write does.

108-116 [Example timeline, N=2, W=1, R=1; time flows downward.] Writes are sent to both replicas; the acks return; the write completes after the first ack (W=1); t elapses; reads are sent; both replicas respond; with R=1 the read returns the new value: good.

117-127 [Example timeline, N=2, W=1, R=1; time flows downward.] Writes are sent to both replicas; one ack arrives quickly and the write completes (W=1) while the second write is still in flight; reads are sent; the lagging replica answers the read before the write reaches it; with R=1 the read returns the old value: bad.

128 [Timing diagram recap.] Coordinator and replica, once per replica: write; ack; wait for W responses; t seconds elapse; read; wait for R responses; response. The response is stale if the read arrives at the replica before the write.

129 R1( key, 2) R2( key, 1) R3( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2)

130 R1( key, 2) R2( key, 1) R3( key, 1) ( key, 1) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

131 R1( key, 2) R2( key, 1) R3( key, 1) R3 replied before last write arrived! ( key, 1) write( key, 2) Coordinator W=1 Coordinator R=1 ack( key, 2) ( key,1)

132-136 [Timing diagram, annotated with the WARS latencies.] Coordinator and replica, once per replica: write (W); ack (A); the coordinator waits for W responses; t seconds elapse; read (R); response (S); the coordinator waits for R responses. The response is stale if the read arrives at the replica before the write.
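Spelled out, the condition the diagram encodes looks roughly like the following (notation assumed for this writeup: w_i, a_i, r_i, s_i are the sampled W, A, R, S latencies for replica i, with the write issued at time 0):

```latex
% The write returns to the client once the W-th fastest (write + ack) round trip finishes:
t_{\mathrm{commit}} = \text{$W$-th smallest value of } (w_i + a_i), \quad i = 1, \dots, N
% Replica i still holds the old value iff the read request reaches it before the write does:
t_{\mathrm{commit}} + t + r_i < w_i
% The read issued t seconds after commit is consistent iff at least one of the first R
% responders (ordered by r_i + s_i) does NOT satisfy the inequality above.
```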

137 Solving WARS: hard. Monte Carlo methods: easier.

138 To use WARS: gather latency data (W, A, R, S); run a simulation (Monte Carlo sampling).
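A minimal sketch of that simulation in Python, assuming exponential latency distributions as stand-ins for measured W, A, R, S data; the function and parameter names are made up here, and this is not the authors' released tooling:

```python
import random

def simulate_t_visibility(n, r, w, t, trials=100_000,
                          lat_w=lambda: random.expovariate(1 / 5.0),   # write latency (ms)
                          lat_a=lambda: random.expovariate(1 / 5.0),   # ack latency (ms)
                          lat_r=lambda: random.expovariate(1 / 5.0),   # read latency (ms)
                          lat_s=lambda: random.expovariate(1 / 5.0)):  # response latency (ms)
    """Estimate Pr[a read issued t ms after the write commits returns the new value]."""
    consistent = 0
    for _ in range(trials):
        ws = [lat_w() for _ in range(n)]                  # when the write reaches each replica
        acks = sorted(wi + lat_a() for wi in ws)
        commit = acks[w - 1]                              # write returns after the W-th ack
        reads = [(lat_r(), lat_s()) for _ in range(n)]
        # Order replicas by total read round trip and keep the first R responders.
        responders = sorted(range(n), key=lambda i: reads[i][0] + reads[i][1])[:r]
        # Consistent if any responder had already received the write
        # when the read request arrived at it.
        if any(commit + t + reads[i][0] >= ws[i] for i in responders):
            consistent += 1
    return consistent / trials

if __name__ == "__main__":
    for n, r, w in [(3, 1, 1), (3, 2, 1), (3, 3, 1)]:
        print((n, r, w), round(simulate_t_visibility(n, r, w, t=10.0), 4))
```

Swapping the exponential samplers for empirical samples of measured latencies gives the t-visibility estimates the talk reports.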

139 How eventual? t-visibility: consistent reads with probability p after t seconds. Key: the WARS model. Need: latencies.

140 How consistent? What happens if I don't wait?

141 The probability of reading data older than k versions decreases exponentially in k. Pr(reading latest write) = 99%; Pr(reading one of last two writes) = 99.9%; Pr(reading one of last three writes) = 99.99%.
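A hedged way to read those numbers: if a single read returns the latest version with probability p = 0.99 and staleness across successive versions is treated as independent (the assumption behind the slide), then:

```latex
\Pr[\text{read one of the last } k \text{ versions}] = 1 - (1 - p)^{k}
% k = 1: 1 - 0.01 = 0.99 \qquad k = 2: 1 - 0.01^{2} = 0.999 \qquad k = 3: 1 - 0.01^{3} = 0.9999
```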

142

143 Cassandra cluster, injected latencies: WARS simulation accuracy. t-staleness RMSE: 0.28%; latency N-RMSE: 0.48%.

144 LinkedIn: 150M+ users, built and uses Voldemort. Yammer: 100K+ companies, uses Riak. Production latencies fit Gaussian mixtures.

145 N=3

146 N=3 10 ms

147 LNKD-DISK (N=3). 99.9% consistent reads: R=2, W=1, t = 13.6 ms, latency: ms. 100% consistent reads: R=3, W=1, latency: ms. Latency is combined read and write latency at the 99.9th percentile.

148 LNKD-DISK (N=3): 16.5% faster. 99.9% consistent reads: R=2, W=1, t = 13.6 ms, latency: ms. 100% consistent reads: R=3, W=1, latency: ms. Latency is combined read and write latency at the 99.9th percentile.

149 LNKD-DISK (N=3): 16.5% faster. Worthwhile? 99.9% consistent reads: R=2, W=1, t = 13.6 ms, latency: ms. 100% consistent reads: R=3, W=1, latency: ms. Latency is combined read and write latency at the 99.9th percentile.

150 N=3

151 N=3

152 N=3

153 LNKD-SSD (N=3). 99.9% consistent reads: R=1, W=1, t = 1.85 ms, latency: 1.32 ms. 100% consistent reads: R=3, W=1, latency: 4.20 ms. Latency is combined read and write latency at the 99.9th percentile.

154 LNKD-SSD (N=3): 59.5% faster. 99.9% consistent reads: R=1, W=1, t = 1.85 ms, latency: 1.32 ms. 100% consistent reads: R=3, W=1, latency: 4.20 ms. Latency is combined read and write latency at the 99.9th percentile.

155 [Timing diagram.] Coordinator and replica, once per replica: write (W) and ack (A) (the critical factor in staleness); wait for W responses; t seconds elapse; read (R); response (S); wait for R responses. The response is stale if the read arrives at the replica before the write.

156-158 [Plot: CDF of write latency (ms), W=3, N=3, for LNKD-SSD and LNKD-DISK.]

159 [Timing diagram.] SSDs reduce variance compared to disks! Coordinator and replica, once per replica: write (W); ack (A); wait for W responses; t seconds elapse; read (R); response (S); wait for R responses. The response is stale if the read arrives at the replica before the write.

160 N=3

161 N=3

162 N=3

163 YMMR (N=3). 99.9% consistent reads: R=1, W=1, t = ms, latency: 43.3 ms. 100% consistent reads: R=3, W=1, latency: ms. Latency is combined read and write latency at the 99.9th percentile.

164 YMMR (N=3): 81.1% faster. 99.9% consistent reads: R=1, W=1, t = ms, latency: 43.3 ms. 100% consistent reads: R=3, W=1, latency: ms. Latency is combined read and write latency at the 99.9th percentile.

165

166 Workflow 1. Tracing 2. Simulation 3. Tune N, R, W 4. Profit
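Steps 2 and 3 amount to a parameter sweep over (R, W). A rough sketch, assuming the simulate_t_visibility function from the earlier block is in scope and using R+W as a crude stand-in for the latency cost (the real workflow would rank configurations by measured latency):

```python
def pick_config(n=3, t=10.0, target=0.999):
    """Return the cheapest (R, W) configuration meeting the consistency target at time t."""
    candidates = []
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            p = simulate_t_visibility(n, r, w, t)
            if p >= target:
                # r + w serves as a crude latency proxy for this sketch only.
                candidates.append((r + w, r, w, p))
    return min(candidates) if candidates else None

print(pick_config())  # e.g. (2, 1, 1, 0.9993) if R=W=1 already meets the SLA at t = 10 ms
```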

167

168

169 PBS. problem: no guarantees with eventual consistency. solution: consistency prediction. technique: measure latencies; use the WARS model

170 consistency is a metric: to measure, to predict

171 higher R+W: strong consistency; lower R+W: lower latency

172 R+W

173 PBS: latency vs. consistency trade-offs; simple modeling with WARS; model staleness in time and versions

174 PBS: latency vs. consistency trade-offs; simple modeling with WARS; model staleness in time and versions. Eventual consistency: often fast, often consistent. PBS helps explain when and why.

175 PBS: latency vs. consistency trade-offs; simple modeling with WARS; model staleness in time and versions. Eventual consistency: often fast, often consistent. PBS helps explain when and why.

176 VLDB 2012 early print: tinyurl.com/pbsvldb; Cassandra patch: tinyurl.com/pbspatch

177 Extra Slides

178 Related Work

179 Quorum system theory, e.g., probabilistic quorums, k-quorums. Deterministic staleness, e.g., TACT/conits, FRACS.

180 Consistency verification, e.g., Golab et al. (PODC '11), Bermbach and Tai (M4WSOC '11)

181 PBS and apps

182 staleness requires either: staleness-tolerant data structures (timelines, logs; cf. commutative data structures, logical monotonicity), or asynchronous compensation code: detect violations after data is returned (see paper) and write code to fix any errors (cf. Building on Quicksand: memories, guesses, apologies). A sketch of the second option follows.
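A minimal sketch of the asynchronous-compensation option (function names and the callback signature are invented for illustration): return a possibly stale value immediately, then re-read authoritatively in the background and invoke application-specific compensation if a newer version turns up.

```python
import threading

def read_with_async_compensation(coordinator, key, full_read, compensate):
    """Return a possibly-stale value now; fix things up later if a violation is detected.

    coordinator.read(key) -> (version, value) from a partial quorum (fast, maybe stale)
    full_read(key)        -> (version, value) from all replicas (slow, authoritative)
    compensate(key, stale, fresh) -> application-specific apology / repair code
    """
    version, value = coordinator.read(key)

    def check():
        fresh_version, fresh_value = full_read(key)
        if fresh_version > version:
            # Violation detected only after the data was already returned.
            compensate(key, (version, value), (fresh_version, fresh_value))

    threading.Thread(target=check, daemon=True).start()
    return value
```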

183 asynchronous compensation: minimize (compensation cost) × (# of expected anomalies)

184 Read only newer data? (monotonic reads session guarantee) # versions of tolerable staleness = client's read rate / global write rate (for a given key)

185 Failure?

186 Treat failures as latency spikes

187 How l o n g do partitions last?

188 what time interval? 99.9% uptime/yr = 8.76 hours downtime/yr; 8.76 consecutive hours down = bad 8-hour rolling average

189 what time interval? 99.9% uptime/yr = 8.76 hours downtime/yr; 8.76 consecutive hours down = bad 8-hour rolling average. Hide in tail of distribution OR continuously evaluate SLA and adjust.

190 [Plot: CDF of write latency (ms), W=3, N=3, for LNKD-SSD, LNKD-DISK, YMMR, WAN.]

191 [Plot: CDF of read latency (ms), R=3, N=3, for LNKD-SSD, LNKD-DISK, YMMR, WAN. LNKD-SSD and LNKD-DISK are identical for reads.]

192 Probabilistic quorum systems: $p_{\text{inconsistent}} = \binom{N-W}{R} \big/ \binom{N}{R}$
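A small check of that expression for N = 3, using Python's math.comb (these are per-read probabilities from the quorum-overlap formula alone; they ignore the time dimension that the WARS model adds):

```python
from math import comb

def p_inconsistent(n, r, w):
    """Probability a read quorum of size R misses all W replicas that saw the write."""
    return comb(n - w, r) / comb(n, r) if r <= n - w else 0.0

for r, w in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(f"N=3 R={r} W={w}: {p_inconsistent(3, r, w):.3f}")
# N=3 R=1 W=1: 0.667   N=3 R=1 W=2: 0.333   N=3 R=2 W=1: 0.333   N=3 R=2 W=2: 0.000
```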

193 How consistent? k-staleness: probability p of reading one of the last k versions

194 How consistent? $1 - \left(\binom{N-W}{R} \big/ \binom{N}{R}\right)^{k}$

195 How consistent? $1 - \left(\binom{N-W}{R} \big/ \binom{N}{R}\right)^{k}$
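Rearranging the same expression gives the k required to hit a target probability, which follows directly from the formula above (shown here for convenience):

```latex
% Valid when the quorums can fail to intersect, i.e. R + W <= N:
1 - \left(\frac{\binom{N-W}{R}}{\binom{N}{R}}\right)^{k} \ge p_{\mathrm{target}}
\quad\Longleftrightarrow\quad
k \ge \frac{\ln(1 - p_{\mathrm{target}})}{\ln\!\left(\binom{N-W}{R} \big/ \binom{N}{R}\right)}
```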

196 <k,t>-staleness: versions and time

197 <k,t>-staleness: versions and time approximation: exponentiate t-staleness by k

198 strong consistency: reads return the last written value or newer (defined w.r.t. real time, when the read started)

199 N = 3 replicas R1 R2 R3 Write to W, read from R replicas

200 N = 3 replicas: R1 R2 R3. Write to W, read from R replicas. Quorum system: guaranteed intersection. {R1 R2 R3} for R=W=3 replicas; {R1 R2} {R2 R3} {R1 R3} for R=W=2 replicas.

201 N = 3 replicas: R1 R2 R3. Write to W, read from R replicas. Quorum system: guaranteed intersection. {R1 R2 R3} for R=W=3 replicas; {R1 R2} {R2 R3} {R1 R3} for R=W=2 replicas. Partial quorum system: may not intersect. {R1} {R2} {R3} for R=W=1 replicas.

202 Synthetic, exponential distributions. N=3, W=1, R=1.

203 Synthetic, exponential distributions. N=3, W=1, R=1. W latency = 1/4× ARS.

204 Synthetic, exponential distributions. N=3, W=1, R=1. W latency = 1/4× ARS vs. W latency = 10× ARS.

205 concurrent writes: deterministically choose between ("key", 1) and ("key", 2) (Coordinator, R=2)
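A minimal sketch of "deterministically choose": last-writer-wins by timestamp with replica id as a tiebreaker is one common rule; the slide does not specify which rule is used, so treat this as illustrative.

```python
def resolve(versions):
    """Pick one winner among concurrent versions given as (timestamp, replica_id, value)."""
    return max(versions, key=lambda v: (v[0], v[1]))

print(resolve([(17, "R1", 1), (17, "R2", 2)]))  # -> (17, 'R2', 2): same timestamp, higher id wins
```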

206

207

208

209
