Ghislain Fourny. Big Data 5. Wide column stores

Size: px
Start display at page:

Download "Ghislain Fourny. Big Data 5. Wide column stores"

Transcription

1 Ghislain Fourny Big Data 5. Wide column stores

2 Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2

3 Where we are User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Last weeks Storage 3

4 Where we are User interfaces Querying Data stores Indexing Processing Validation Today Data models Syntax Encoding Last weeks Storage 4

5 Relational model 5

6 Relational model Schema 6

7 Issues with relational databases (RDBMS) Small scale 7

8 Issues with relational databases (RDBMS) Small scale Single machine 8

9 Can we fix a RDBMS? 9

10 Can we fix a RDBMS? Scale up (remember?) 10

11 Can we fix a RDBMS? Scale out 11

12 Can we fix a RDBMS? Cluster Scale out 12

13 Can we fix a RDBMS? Cluster Replicate Scale out 13

14 Can we fix a RDBMS? Hard to set up Scale out 14

15 Can we fix a RDBMS? Hard to set up Very high maintenance costs Scale out 15

16 HBase By design running on a scalable cluster of commodity hardware 16

17 HBase By design running on a scalable cluster of commodity hardware HDFS 17

18 Wide column stores: data model 18

19 Founding paper 's BigTable 19

20 The tabular model 20

21 The tabular model: expensive joins 21

22 Design paradigm of BigTable store together what is accessed together 22

23 The tabular model: expensive joins

24 3rd Normal Form: Example Legi Name City State Alan Turing Bletchley Park UK City State PLZ Bletchley Park UK MK3 6EB Alan Turing Bletchley Park UK Bletchley Park UK MK3 6EB Georg Cantor Pfäffikon SZ Pfäffikon SZ Georg Cantor Pfäffikon SZ Pfäffikon SZ Felix Bloch Pfäffikon ZH Pfäffikon ZH

25 3rd Normal Form: Counter-Example Legi Name City State PLZ Alan Turing Bletchley Park UK MK3 6EB Alan Turing Bletchley Park UK MK3 6EB Georg Cantor Pfäffikon SZ Georg Cantor Pfäffikon SZ Felix Bloch Pfäffikon ZH

26 The tabular model: expensive joins

27 The columnar model: denormalized

28 Rows Row ID A1 1E0 22A 4A2 28

29 Rows Yes, for now this actually looks pretty much like key-value storage. Row ID A1 1E0 22A 4A2 29

30 Columns Row ID A1 1E0 22A 4A2 30

31 Columns Column family Row ID A1 1E0 22A 4A2 31

32 Column families must be known in advance... Row ID 32

33 Column families must be known in advance... Row ID 000 A B 1 2 I 002 0A1 1E0 22A 4A2 33

34 ... but columns can be added on the fly Row ID 000 A B C 1 2 I II III IV 002 0A1 1E0 22A 4A2 34

35 Primary queries 35

36 Primary queries Get 36

37 Get Row ID 000 A B C 1 2 I II III IV 002 0A1 1E0 22A 4A2 37

38 Primary queries Get 38

39 Primary queries Get Put 39

40 Put Row ID 000 A B C 1 2 I II III IV 002 0A1 1E A 4A2 40

41 Primary queries Get Put 41

42 Primary queries Get Put Scan (This is new) 42

43 Scan Row ID 000 A B C 1 2 I II III IV 002 0A1 1E A 4A2 43

44 Primary queries Get Put Scan (This is new) 44

45 Primary queries Get Put Scan Delete (This is new) 45

46 Delete Row ID 000 A B C 1 2 I II III IV 002 0A1 1E A 4A2 46

47 Some terminology: Key-value model Key Value 47

48 Some terminology: Column-oriented stores Column1 Column2 48

49 Some terminology: Column-oriented key-value stores Also: wide column stores, column-family-oriented Row ID A B C 1 2 I II III IV 49

50 Examples of Column-oriented key-value stores 's BigTable 50

51 Warning on terminology NoSQL is very recent! 51

52 Warning on terminology Key-value storage Relational table Words have a "life" File Block NoSQL Object storage 52

53 HBase: physical level 53

54 Physical layer: regions Row ID A B C 1 2 I II III IV 54

55 Physical layer: regions Row ID A B C 1 2 I II III IV 55

56 Physical layer: regions Row ID A B C 1 2 I II III IV Min-incl. Max-excl. 56

57 Physical layer: column families Row ID A B C 1 2 I II III IV Min-incl. Max-excl. Stored together 57

58 Architecture "The same procedure as every year, James." 58

59 HDFS... Namenode /dir/file1 /dir/file2 /file3 Datanode Datanode Datanode Datanode Datanode Datanode 59

60 HBase HMaster Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 60

61 HMaster HMaster Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 61

62 HMaster DDL operations 62

63 HMaster DDL operations Create table 63

64 HMaster DDL operations Create table Delete table 64

65 HMaster assigns regions to RegionServers Row ID 65

66 HMaster assigns regions to RegionServers Row ID 66

67 HMaster assigns regions to RegionServers Row ID 67

68 HMaster assigns regions to RegionServers Row ID 68

69 HMaster splits regions Row ID 69

70 HMaster handles Regionserver failovers 70

71 Architecture HMaster Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 71

72 Regionserver HMaster Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 72

73 Physical storage Row ID Min-incl. A B C 1 2 Stored together I II III IV 73

74 Physical storage Row ID A B C 1 2 I II III IV Store Store Store Store Store Store 74

75 Store = column family Row ID

76 Store = column family Row ID 1 2 Cell 76

77 Store = column family Row ID 1 2 HFile HFile HFile HFile (On HDFS) 77

78 HFile HFile 78

79 HFile HFile That's actually an SSTable (flat sorted list of key-value pairs) 79

80 HFile HFile KeyValue That's actually an SSTable (flat sorted list of key-value pairs) (Stores a cell) 80

81 HFile

82 Versioning Different versions of same cell Latest 82

83 Versioning: timeline V 1 V 2 V 3 V 4 83

84 Versioning: timeline V 1 V 2 V 3 V 4 Total order: not like DynamoDB 84

85 Versioning: timeline V 1 V 2 V 3 V 4 Total order: not like DynamoDB A B C HBase guarantees ACID on the row level (concurrent writes and reads are synchronizing with per-row locks) 85

86 HFile: KeyValue key value 86

87 HFile: KeyValue (prefix code) keylength valuelength key value 87

88 Prefix code example: Gamma code

89 Prefix code example: Gamma code

90 Prefix code example: Gamma code

91 Prefix code example: Gamma code

92 Prefix code example: Gamma code

93 Prefix code example: Gamma code

94 Prefix code example: Gamma code

95 Prefix code example: Gamma code

96 Prefix code example: Gamma code

97 Prefix code example: Gamma code

98 Prefix code example: Gamma code

99 Prefix code example: Gamma code

100 HFile: Key row length row (key) column family length column family column qualifier timestamp key type 100

101 HFile: Key row length row (key) column family length column family column qualifier timestamp key type This one is for the versioning 101

102 HFile: Key row length row (key) column family length column family column qualifier timestamp key type This one is for marking as deleted 102

103 Blocks HFile 103

104 Blocks HFile "Quantity" of KeyValues that get read at a time 104

105 Blocks Default HFile 64kb 105

106 Blocks: long keys or values size(keyvalue) > block size No split (longer block) 106

107 Inside an HFile key1 key5 key11 key17 /index /data 107

108 Looking up a key key1 key5 key11 key17 108

109 Looking up a key key1 key5 key11 key17 109

110 Looking up a key key1 key5 key14 key11 key17 110

111 Looking up a key key1 key5 key14 key11 key17 111

112 Looking up a key key1 key5 key14 key11 key17 112

113 Looking up a key key1 key5 key14 key11 key17 key14 113

114 Writing to an HFile key1 114

115 Writing to an HFile key1 key1 115

116 Writing to an HFile key1 key1 key2 key3 key4 116

117 Writing to an HFile key1 key1 key2 key3 key4 key5 key5 117

118 Writing to an HFile key1 key5 key1 key2 key3 key4 key5 key6 key7 key8 key9 key10 118

119 Writing to an HFile key1 key5 key11 key1 key2 key3 key4 key5 key6 key7 key8 key9 key10 key11 119

120 Writing to an HFile key1 key5 key11 key1 key2 key3 key4 key5 key6 key7 key8 key9 key10 key11 key12 key13 key14 key15 key16 120

121 Writing to an HFile key1 key5 key11 key17 key1 key2 key3 key4 key5 key6 key7 key8 key9 key10 key11 key12 key13 key14 key15 key16 key17 key18 121

122 Levels of physical storage Table 122

123 Levels of physical storage Table Region 123

124 Levels of physical storage Table Region Store 124

125 Levels of physical storage Table Region Store StoreFile 125

126 Levels of physical storage Table Region Store StoreFile Block 126

127 Levels of physical storage Table Region Store StoreFile Block KeyValue 127

128 Problem key1 key5 key11 key17 key1 key2 key3 key4 key5 key6 key7 key8 key9 key10 key11 key12 key13 key14 key15 key16 key17 key18 128

129 Problem key1 We can only write key-values in sorted order Sorted key2 key3 key4 key5 key6 key7 key8 key9 key10 key11 key12 key13 key14 key15 key16 key17 key18 129

130 HBase: Writing new cells 130

131 On Disk Table Region Store StoreFile Block KeyValue 131

132 Store StoreFile Block Block StoreFile Block Block 132

133 Store MemStore radub85 / 123RF Stock Photo StoreFile Block Block StoreFile Block Block 133

134 In Memory Table Region Store MemStore Cell 134

135 Writing new cells MemStore StoreFile Block Block 135

136 Writing new cells MemStore StoreFile Block Block 136

137 Writing new cells MemStore StoreFile Block Block 137

138 Writing new cells MemStore StoreFile Block Block 138

139 Writing new cells MemStore StoreFile Block Block 139

140 Flush MemStore StoreFile StoreFile Block Block Block Block Sort! 140

141 Flush When: 141

142 Flush When: Reaching max Memstore size in a store 142

143 Flush When: Reaching max Memstore size in a store Reaching overall max Memstore size 143

144 Flush When: Reaching max Memstore size in a store Reaching overall max Memstore size Reaching full Write-Ahead Log 144

145 Write-Ahead Log MemStore 145

146 Write-Ahead Log MemStore HLog 146

147 Write-Ahead Log MemStore HLog On HDFS One per RegionServer 147

148 Write-Ahead Log MemStore HLog 148

149 Reading from a Store MemStore StoreFile Block Block StoreFile Block Block 149

150 Reading from a Store MemStore StoreFile Block Block StoreFile Block Block 150

151 Compaction StoreFile StoreFile StoreFile Block Block Block Block Block Block 151

152 Compaction StoreFile StoreFile StoreFile Block Block Block Block Block Block 152

153 Compaction StoreFile (Sort again) Block Block Block Block Block Block 153

154 Seek vs. Transfer B+-trees LSM-Trees 154

155 Seek vs. Transfer Classical RDBMS Wide column stores 155

156 Log-Structured Merge-Trees 156

157 Log-Structured Merge-Trees C 0 C 1 C

158 Log-Structured Merge-Trees merge merge C 0 C 1 C

159 Seek vs. Transfer Seek-time-bound Transfer-time-bound 159

160 The META table: a table like any other 160

161 The META table: stores region locations table + region start key + region id + replica id info: regioninfo info: server info: serverstartcode T10:15:00 161

162 RegionInfo RegionInfo Table name Start key Region ID Replica ID encodedname End key Split Offline 162

163 HBase Bootstrap Root 163

164 HBase Bootstrap Root Meta 164

165 HBase Bootstrap Root Meta Regular tables 165

166 Architecture HMaster Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 166

167 Architecture HMaster Create/delete/update table Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 167

168 Architecture HMaster Region? Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver (hosting meta) 168

169 Architecture HMaster Region? Regionserver location(s) Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 169

170 Architecture HMaster Query Regionserver Regionserver Regionserver Regionserver Regionserver Regionserver 170

171 HBase: Underlying APIs grazvydas / 123RF Stock Photo 171

172 HBase implementation (Packaged code) 172

173 HBase APIs REST 173

174 HBase: caching 174

175 HBase Caches: reading faster LRU block cache Level 1 175

176 HBase Caches: reading faster LRU block cache bucket cache Level 1 Level 2 176

177 HBase Caches: reading faster LRU block cache bucket cache HDFS Level 1 Level 2 177

178 LRU Block Cache On the Least Recently Heap Used 178

179 LRU Block Cache: levels of priority Single access priority Multi access priority In-memory access priority 179

180 When NOT to use the cache 180

181 When NOT to use the cache Batch processing 181

182 When NOT to use the cache Random access 182

183 Summary of what we have in memory 183

184 Summary of what we have in memory MemStore 184

185 Summary of what we have in memory MemStore LRU BlockCache 185

186 Summary of what we have in memory MemStore LRU BlockCache Indices of HFiles 186

187 Summary of what we have in memory MemStore LRU BlockCache Indices of HFiles Bloom Filters (avoids disk reads if we can guarantee that a key is not in an HFile) 187

188 Hash function Source: Jorge Stolfi (Wikipedia) 188

189 Bloom filter Very quickly whether an element belongs to a set (potentially false positives) 189

190 Bloom filter

191 Bloom filter John Smith hash function 1 hash function 2 hash function k

192 Bloom filter Mary Smith hash function 1 hash function 2 hash function k

193 Bloom filter: not in set hash function 1 hash function 2 hash function k Albert Einstein? 193

194 Bloom filter: in set (and correct) hash function 1 hash function 2 hash function k Mary Smith? 194

195 Bloom filter: in set (false positive) hash function 1 hash function 2 hash function k Louis de Broglie? 195

196 Data Locality 196

197 HBase vs. HDFS 197

198 With HDFS load balancer

199 HFile compaction brings back locality 199

200 Best practices 200

201 Number of rows Millions RDBMS Billions HBase 201

202 Number of nodes > 5 202

203 Row IDs and column names keep them >short< why? 203

204 10 Design Principles of Big Data 204

205 1. Learn from the past 205

206 2. Keep the design simple 206

207 3. Modularize the architecture 207

208 4. Homogeneity in the large 208

209 5. Heterogeneity in the small 209

210 6. Separate metadata from data 210

211 7. Abstract logical model from its physical implementation 211

212 8. Shard the data 212

213 9. Replicate the data 213

214 10. Buy lots of cheap hardware 214

215 Spanner 215

216 Spanner new: externally-consistent distributed transactions 216

217 Spanner Tabular Data Model Language ACID properties Distribution Scalability Sharding Replicas SQL NoSQL 217

218 Spanner Tabular Data Model Language ACID properties Distribution Scalability Sharding Replicas SQL NoSQL 218

219 Spanner: Data Model Multi-column primary key 219

220 Spanner: Data Model Multi-column primary key Timestamp 220

221 Spanner: Data Model Multi-column primary key Timestamp Directory 221

222 Spanner: Data Model Multi-column primary key Timestamp Tablet 222

223 Spanner: Data Scale 1,000,000,000,000s of rows 223

224 Spanner: Architecture of a Zone zonemaster Spanserver Spanserver Spanserver Spanserver Spanserver Spanserver 224

225 Spanner: Architecture universemaster 225

226 Spanner: Architecture Data Center Data Center Data Center 226

227 Spanner: Architecture Replica Replica Replica Paxos Paxos Paxos Tablet Tablet Tablet Colossus Colossus Colossus 227

228 Spanner: Architecture 100s of data centers Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center Data Center 228

229 Spanner: Architecture 1,000,000s of machines 229

230 Spanner: Architecture Higher availability Lower latency 230

Ghislain Fourny. Big Data 5. Column stores

Ghislain Fourny. Big Data 5. Column stores Ghislain Fourny Big Data 5. Column stores 1 Introduction 2 Relational model 3 Relational model Schema 4 Issues with relational databases (RDBMS) Small scale Single machine 5 Can we fix a RDBMS? Scale up

More information

HBASE INTERVIEW QUESTIONS

HBASE INTERVIEW QUESTIONS HBASE INTERVIEW QUESTIONS http://www.tutorialspoint.com/hbase/hbase_interview_questions.htm Copyright tutorialspoint.com Dear readers, these HBase Interview Questions have been designed specially to get

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables COSC 6339 Big Data Analytics NoSQL (II) HBase Edgar Gabriel Fall 2018 HBase Column-Oriented data store Distributed designed to serve large tables Billions of rows and millions of columns Runs on a cluster

More information

BigTable: A Distributed Storage System for Structured Data

BigTable: A Distributed Storage System for Structured Data BigTable: A Distributed Storage System for Structured Data Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) BigTable 1393/7/26

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

ADVANCED HBASE. Architecture and Schema Design GeeCON, May Lars George Director EMEA Services

ADVANCED HBASE. Architecture and Schema Design GeeCON, May Lars George Director EMEA Services ADVANCED HBASE Architecture and Schema Design GeeCON, May 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer

More information

HBase. Леонид Налчаджи

HBase. Леонид Налчаджи HBase Леонид Налчаджи leonid.nalchadzhi@gmail.com HBase Overview Table layout Architecture Client API Key design 2 Overview 3 Overview NoSQL Column oriented Versioned 4 Overview All rows ordered by row

More information

Distributed Systems. 19. Spanner. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 19. Spanner. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 19. Spanner Paul Krzyzanowski Rutgers University Fall 2017 November 20, 2017 2014-2017 Paul Krzyzanowski 1 Spanner (Google s successor to Bigtable sort of) 2 Spanner Take Bigtable and

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI 2006 Presented by Xiang Gao 2014-11-05 Outline Motivation Data Model APIs Building Blocks Implementation Refinement

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Goal of the presentation is to give an introduction of NoSQL databases, why they are there. 1 Goal of the presentation is to give an introduction of NoSQL databases, why they are there. We want to present "Why?" first to explain the need of something like "NoSQL" and then in "What?" we go in

More information

Typical size of data you deal with on a daily basis

Typical size of data you deal with on a daily basis Typical size of data you deal with on a daily basis Processes More than 161 Petabytes of raw data a day https://aci.info/2014/07/12/the-dataexplosion-in-2014-minute-by-minuteinfographic/ On average, 1MB-2MB

More information

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering Big Data Processing Technologies Chentao Wu Associate Professor Dept. of Computer Science and Engineering wuct@cs.sjtu.edu.cn Schedule (1) Storage system part (first eight weeks) lec1: Introduction on

More information

CS November 2017

CS November 2017 Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service

BigTable. Chubby. BigTable. Chubby. Why Chubby? How to do consensus as a service BigTable BigTable Doug Woos and Tom Anderson In the early 2000s, Google had way more than anybody else did Traditional bases couldn t scale Want something better than a filesystem () BigTable optimized

More information

Ghislain Fourny. Big Data 2. Lessons learnt from the past

Ghislain Fourny. Big Data 2. Lessons learnt from the past Ghislain Fourny Big Data 2. Lessons learnt from the past Mr. Databases: Edgar Codd Wikipedia Data Independence (Edgar Codd) Logical data model Lorem Ipsum Dolor sit amet Physical storage Consectetur Adipiscing

More information

Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب

Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب 1391 زمستان Outlines Introduction DataModel Architecture HBase vs. RDBMS HBase users 2 Why Hadoop? Datasets are growing to Petabytes Traditional datasets

More information

BigTable. CSE-291 (Cloud Computing) Fall 2016

BigTable. CSE-291 (Cloud Computing) Fall 2016 BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes

More information

Extreme Computing. NoSQL.

Extreme Computing. NoSQL. Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable

More information

Data Informatics. Seon Ho Kim, Ph.D.

Data Informatics. Seon Ho Kim, Ph.D. Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate

More information

Spanner: Google's Globally-Distributed Database. Presented by Maciej Swiech

Spanner: Google's Globally-Distributed Database. Presented by Maciej Swiech Spanner: Google's Globally-Distributed Database Presented by Maciej Swiech What is Spanner? "...Google's scalable, multi-version, globallydistributed, and synchronously replicated database." What is Spanner?

More information

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads WHITE PAPER Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads December 2014 Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents

More information

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014 NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System

More information

CS November 2018

CS November 2018 Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute

More information

7680: Distributed Systems

7680: Distributed Systems Cristina Nita-Rotaru 7680: Distributed Systems BigTable. Hbase.Spanner. 1: BigTable Acknowledgement } Slides based on material from course at UMichigan, U Washington, and the authors of BigTable and Spanner.

More information

HBase... And Lewis Carroll! Twi:er,

HBase... And Lewis Carroll! Twi:er, HBase... And Lewis Carroll! jw4ean@cloudera.com Twi:er, LinkedIn: @jw4ean 1 Introduc@on 2010: Cloudera Solu@ons Architect 2011: Cloudera TAM/DSE 2012-2013: Cloudera Training focusing on Partners and Newbies

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:

More information

Introduction to BigData, Hadoop:-

Introduction to BigData, Hadoop:- Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,

More information

Bigtable. Presenter: Yijun Hou, Yixiao Peng

Bigtable. Presenter: Yijun Hou, Yixiao Peng Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 06 Presenter: Yijun Hou, Yixiao Peng

More information

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Google Bigtable 2 A distributed storage system for managing structured data that is designed to scale to a very

More information

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer

More information

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores CSE 444: Database Internals Lectures 26 NoSQL: Extensible Record Stores CSE 444 - Spring 2014 1 References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)

More information

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. M. Burak ÖZTÜRK 1 Introduction Data Model API Building

More information

Replica Parallelism to Utilize the Granularity of Data

Replica Parallelism to Utilize the Granularity of Data Replica Parallelism to Utilize the Granularity of Data 1st Author 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl. country code 1st author's E-mail address 2nd Author

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores Swapnil Patil M. Polte, W. Tantisiriroj, K. Ren, L.Xiao, J. Lopez, G.Gibson, A. Fuchs *, B. Rinaldi * Carnegie

More information

CSE-E5430 Scalable Cloud Computing Lecture 9

CSE-E5430 Scalable Cloud Computing Lecture 9 CSE-E5430 Scalable Cloud Computing Lecture 9 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 15.11-2015 1/24 BigTable Described in the paper: Fay

More information

W b b 2.0. = = Data Ex E pl p o l s o io i n

W b b 2.0. = = Data Ex E pl p o l s o io i n Hypertable Doug Judd Zvents, Inc. Background Web 2.0 = Data Explosion Web 2.0 Mt. Web 2.0 Traditional Tools Don t Scale Well Designed for a single machine Typical scaling solutions ad-hoc manual/static

More information

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database. Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database. Presented by Kewei Li The Problem db nosql complex legacy tuning expensive

More information

How we build TiDB. Max Liu PingCAP Amsterdam, Netherlands October 5, 2016

How we build TiDB. Max Liu PingCAP Amsterdam, Netherlands October 5, 2016 How we build TiDB Max Liu PingCAP Amsterdam, Netherlands October 5, 2016 About me Infrastructure engineer / CEO of PingCAP Working on open source projects: TiDB: https://github.com/pingcap/tidb TiKV: https://github.com/pingcap/tikv

More information

Outline. Spanner Mo/va/on. Tom Anderson

Outline. Spanner Mo/va/on. Tom Anderson Spanner Mo/va/on Tom Anderson Outline Last week: Chubby: coordina/on service BigTable: scalable storage of structured data GFS: large- scale storage for bulk data Today/Friday: Lessons from GFS/BigTable

More information

Facebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011

Facebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011 HBase @ Facebook The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook March 11, 2011 Talk Outline the new Facebook Messages, and how we got started with HBase quick

More information

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing ΕΠΛ 602:Foundations of Internet Technologies Cloud Computing 1 Outline Bigtable(data component of cloud) Web search basedonch13of thewebdatabook 2 What is Cloud Computing? ACloudis an infrastructure, transparent

More information

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 16: NoSQL and JSon Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5 Today s lecture: JSon The book covers

More information

Big Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla

Big Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla Big Table Google s Storage Choice for Structured Data Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla Bigtable: Introduction Resembles a database. Does not support

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 16: NoSQL and JSon CSE 414 - Spring 2016 1 Announcements Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5] Today s lecture:

More information

Distributed PostgreSQL with YugaByte DB

Distributed PostgreSQL with YugaByte DB Distributed PostgreSQL with YugaByte DB Karthik Ranganathan PostgresConf Silicon Valley Oct 16, 2018 1 CHECKOUT THIS REPO: github.com/yugabyte/yb-sql-workshop 2 About Us Founders Kannan Muthukkaruppan,

More information

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures Lecture 20 -- 11/20/2017 BigTable big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures what does paper say Google

More information

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed

More information

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13 Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University

More information

HBase Solutions at Facebook

HBase Solutions at Facebook HBase Solutions at Facebook Nicolas Spiegelberg Software Engineer, Facebook QCon Hangzhou, October 28 th, 2012 Outline HBase Overview Single Tenant: Messages Selection Criteria Multi-tenant Solutions

More information

Cloudera Kudu Introduction

Cloudera Kudu Introduction Cloudera Kudu Introduction Zbigniew Baranowski Based on: http://slideshare.net/cloudera/kudu-new-hadoop-storage-for-fast-analytics-onfast-data What is KUDU? New storage engine for structured data (tables)

More information

NewSQL Databases. The reference Big Data stack

NewSQL Databases. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NewSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference

More information

CA485 Ray Walshe NoSQL

CA485 Ray Walshe NoSQL NoSQL BASE vs ACID Summary Traditional relational database management systems (RDBMS) do not scale because they adhere to ACID. A strong movement within cloud computing is to utilize non-traditional data

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

Shen PingCAP 2017

Shen PingCAP 2017 Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL

More information

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON

More information

Spanner: Google's Globally-Distributed Database* Huu-Phuc Vo August 03, 2013

Spanner: Google's Globally-Distributed Database* Huu-Phuc Vo August 03, 2013 Spanner: Google's Globally-Distributed Database* Huu-Phuc Vo August 03, 2013 *OSDI '12, James C. Corbett et al. (26 authors), Jay Lepreau Best Paper Award Outline What is Spanner? Features & Example Structure

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis Motivation Lots of (semi-)structured data at Google URLs: Contents, crawl metadata, links, anchors, pagerank,

More information

Google Spanner - A Globally Distributed,

Google Spanner - A Globally Distributed, Google Spanner - A Globally Distributed, Synchronously-Replicated Database System James C. Corbett, et. al. Feb 14, 2013. Presented By Alexander Chow For CS 742 Motivation Eventually-consistent sometimes

More information

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores Swapnil Patil Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie

More information

CS 655 Advanced Topics in Distributed Systems

CS 655 Advanced Topics in Distributed Systems Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3

More information

Google Cloud Bigtable. And what it's awesome at

Google Cloud Bigtable. And what it's awesome at Google Cloud Bigtable And what it's awesome at Jen Tong Developer Advocate Google Cloud Platform @MimmingCodes Agenda 1 Research 2 A story about bigness 3 How it works 4 When it's awesome Google Research

More information

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons

More information

Pyro: A Spatial-Temporal Big-Data Storage System. Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher

Pyro: A Spatial-Temporal Big-Data Storage System. Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher Pyro: A Spatial-Temporal Big-Data Storage System Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher 1 Applications A huge amount of geo- tagged events are generated and stored in real- 5me.

More information

Distributed Systems. GFS / HDFS / Spanner

Distributed Systems. GFS / HDFS / Spanner 15-440 Distributed Systems GFS / HDFS / Spanner Agenda Google File System (GFS) Hadoop Distributed File System (HDFS) Distributed File Systems Replication Spanner Distributed Database System Paxos Replication

More information

What is database? Types and Examples

What is database? Types and Examples What is database? Types and Examples Visit our site for more information: www.examplanning.com Facebook Page: https://www.facebook.com/examplanning10/ Twitter: https://twitter.com/examplanning10 TABLE

More information

GridGain and Apache Ignite In-Memory Performance with Durability of Disk

GridGain and Apache Ignite In-Memory Performance with Durability of Disk GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite

More information

Integrity in Distributed Databases

Integrity in Distributed Databases Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................

More information

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano Database Evolution DB NoSQL Linked Open Data Requirements and features Large volumes of data..increasing No regular data structure to manage Relatively homogeneous elements among them (no correlation between

More information

Structured Big Data 1: Google Bigtable & HBase Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC

Structured Big Data 1: Google Bigtable & HBase Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC Structured Big Data 1: Google Bigtable & HBase Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC Lecture material is mostly home-grown, partly taken with permission and courtesy from Professor Shih-Wei Liao

More information

Apache HBase Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel

Apache HBase Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel Apache HBase 0.98 Andrew Purtell Committer, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel Who am I? Committer on the Apache HBase project Member of the Big Data Research

More information

Time Series Storage with Apache Kudu (incubating)

Time Series Storage with Apache Kudu (incubating) Time Series Storage with Apache Kudu (incubating) Dan Burkert (Committer) dan@cloudera.com @danburkert Tweet about this talk: @getkudu or #kudu 1 Time Series machine metrics event logs sensor telemetry

More information

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 1 OBJECTIVES ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 2 WHAT

More information

Corbett et al., Spanner: Google s Globally-Distributed Database

Corbett et al., Spanner: Google s Globally-Distributed Database Corbett et al., : Google s Globally-Distributed Database MIMUW 2017-01-11 ACID transactions ACID transactions SQL queries ACID transactions SQL queries Semi-relational data model ACID transactions SQL

More information

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. 1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map

More information

Big Data Hadoop Course Content

Big Data Hadoop Course Content Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux

More information

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours

More information

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Scaling Up HBase Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials

More information

Advanced HBase Schema Design. Berlin Buzzwords, June 2012 Lars George

Advanced HBase Schema Design. Berlin Buzzwords, June 2012 Lars George Advanced HBase Schema Design Berlin Buzzwords, June 2012 Lars George lars@cloudera.com About Me SoluDons Architect @ Cloudera Apache HBase & Whirr CommiIer Author of HBase The Defini.ve Guide Working with

More information

Distributed Data Store

Distributed Data Store Distributed Data Store Large-Scale Distributed le system Q: What if we have too much data to store in a single machine? Q: How can we create one big filesystem over a cluster of machines, whose data is

More information

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E. 18-hdfs-gfs.txt Thu Oct 27 10:05:07 2011 1 Notes on Parallel File Systems: HDFS & GFS 15-440, Fall 2011 Carnegie Mellon University Randal E. Bryant References: Ghemawat, Gobioff, Leung, "The Google File

More information

HBase: Overview. HBase is a distributed column-oriented data store built on top of HDFS

HBase: Overview. HBase is a distributed column-oriented data store built on top of HDFS HBase 1 HBase: Overview HBase is a distributed column-oriented data store built on top of HDFS HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing

More information

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma

Apache Hadoop Goes Realtime at Facebook. Himanshu Sharma Apache Hadoop Goes Realtime at Facebook Guide - Dr. Sunny S. Chung Presented By- Anand K Singh Himanshu Sharma Index Problem with Current Stack Apache Hadoop and Hbase Zookeeper Applications of HBase at

More information

Big Data for Engineers Spring Resource Management

Big Data for Engineers Spring Resource Management Ghislain Fourny Big Data for Engineers Spring 2018 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models

More information

MySQL Cluster Web Scalability, % Availability. Andrew

MySQL Cluster Web Scalability, % Availability. Andrew MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended

More information

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E.

18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E. 18-hdfs-gfs.txt Thu Nov 01 09:53:32 2012 1 Notes on Parallel File Systems: HDFS & GFS 15-440, Fall 2012 Carnegie Mellon University Randal E. Bryant References: Ghemawat, Gobioff, Leung, "The Google File

More information

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb The following is intended to outline our general product direction. It is intended for information purposes only,

More information

MapReduce & BigTable

MapReduce & BigTable CPSC 426/526 MapReduce & BigTable Ennan Zhai Computer Science Department Yale University Lecture Roadmap Cloud Computing Overview Challenges in the Clouds Distributed File Systems: GFS Data Process & Analysis:

More information

Huge market -- essentially all high performance databases work this way

Huge market -- essentially all high performance databases work this way 11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch

More information

Using space-filling curves for multidimensional

Using space-filling curves for multidimensional Using space-filling curves for multidimensional indexing Dr. Bisztray Dénes Senior Research Engineer 1 Nokia Solutions and Networks 2014 In medias res Performance problems with RDBMS Switch to NoSQL store

More information

Big Data 7. Resource Management

Big Data 7. Resource Management Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage

More information

/ Cloud Computing. Recitation 10 March 22nd, 2016

/ Cloud Computing. Recitation 10 March 22nd, 2016 15-319 / 15-619 Cloud Computing Recitation 10 March 22nd, 2016 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.3, OLI Unit 4, Module 15, Quiz 8 This week

More information