Ghislain Fourny. Big Data 5. Column stores

Introduction

Relational model

Relational model: schema

Issues with relational databases (RDBMS): small scale, single machine

Can we fix an RDBMS? Scale up (remember?)

Can we fix an RDBMS? Scale out: cluster, replicate

Can we fix an RDBMS? Scale out: hard to set up, very high maintenance costs

HBase: by design, running on a scalable cluster of commodity hardware (HDFS)

Wide column stores: data model

Founding paper: Google's BigTable

The tabular model

The tabular model: expensive joins

Design paradigm of BigTable: store together what is accessed together

The columnar model: denormalized

Rows: identified by a row ID (example row IDs: 000, 002, 0A1, 1E0, 22A, 4A2)

Columns

Columns: grouped into column families

Column families must be known in advance... (example columns: A, B; 1, 2; I)

... but columns can be added on the fly (example columns: A, B, C; 1, 2; I, II, III, IV)

Primary queries: Get, Put, Scan, Delete

Get

Put (in the example, row 204 is added between 1E0 and 22A)

Scan

Delete
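
The four primary queries can be sketched on a toy in-memory table. This is only an illustration, not HBase's client API: the class `WideColumnTable`, its method signatures and the family names `f1` and `f2` are hypothetical. It shows the two properties from the data model: column families are fixed at creation while columns are added on the fly, and rows are kept sorted by row ID so that a scan is a range read.

```python
import bisect


class WideColumnTable:
    """Toy table (hypothetical, not the HBase API): rows sorted by row ID."""

    def __init__(self, families):
        self.families = set(families)   # column families are known in advance
        self.row_ids = []               # row IDs, kept sorted
        self.rows = {}                  # row ID -> {(family, column): value}

    def put(self, row_id, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_id not in self.rows:
            bisect.insort(self.row_ids, row_id)   # keep rows in sorted order
            self.rows[row_id] = {}
        # Columns inside a known family can be created on the fly.
        self.rows[row_id][(family, column)] = value

    def get(self, row_id):
        return dict(self.rows.get(row_id, {}))

    def scan(self, start, stop):
        """Return rows with start <= row ID < stop, in row ID order."""
        lo = bisect.bisect_left(self.row_ids, start)
        hi = bisect.bisect_left(self.row_ids, stop)
        return [(r, dict(self.rows[r])) for r in self.row_ids[lo:hi]]

    def delete(self, row_id):
        if row_id in self.rows:
            del self.rows[row_id]
            self.row_ids.remove(row_id)


table = WideColumnTable(families=["f1", "f2"])
for rid in ["000", "002", "0A1", "1E0", "22A", "4A2"]:
    table.put(rid, "f1", "A", f"value-{rid}")
table.put("204", "f2", "1", "new")   # new row lands between 1E0 and 22A
print(table.row_ids)
print(table.scan("0A1", "22A"))
```

Keeping the row IDs sorted is what makes Scan cheap: it is a contiguous slice rather than a filter over the whole table.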

Some terminology: key-value model (key, value)

Some terminology: column-oriented stores (Column1, Column2)

Some terminology: column-oriented key-value stores. Also: wide column stores, column family-oriented stores

Examples of column-oriented key-value stores: Google's BigTable

Warning on terminology: NoSQL is very recent!

Warning on terminology: words have a "life" (key-value storage, relational table, file, block, NoSQL, object storage)

HBase: physical level

Physical layer: regions (each region covers a range of row IDs: min inclusive, max exclusive)

Physical layer: column families (within a region, the columns of a column family are stored together)
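
Because each region covers a range of row IDs with an inclusive minimum and an exclusive maximum, finding the region of a row is a binary search over the sorted region start keys. The sketch below assumes three hypothetical regions with made-up split points and names; it is not how HBase stores this information, only an illustration of the range rule.

```python
import bisect

# Hypothetical split points: region i covers [region_starts[i], region_starts[i + 1]).
# The empty string stands for "from the very first row ID".
region_starts = ["", "0A1", "22A"]
region_names = ["region-0", "region-1", "region-2"]


def find_region(row_id):
    """Return the region whose [min inclusive, max exclusive) range holds row_id."""
    # bisect_right counts the start keys <= row_id; the last of them is the owner.
    index = bisect.bisect_right(region_starts, row_id) - 1
    return region_names[index]


for rid in ["000", "002", "0A1", "1E0", "22A", "4A2"]:
    print(rid, "->", find_region(rid))
```

A row ID equal to a split point belongs to the region that starts there, which is exactly the min-inclusive, max-exclusive convention.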

Architecture: "The same procedure as every year, James."

HDFS: one NameNode (/dir/file1, /dir/file2, /file3) and many DataNodes

HBase: one HMaster and many RegionServers

HBase: HMaster (replicas!) and RegionServers

HMaster

HMaster: DDL operations (create table, delete table)

HMaster assigns regions to RegionServers

HMaster splits regions

HMaster handles RegionServer failovers

Architecture: HMaster and RegionServers

RegionServer

Physical storage: within a region, the columns of each column family are stored together

Physical storage: one store per region and column family

Store = column family (within one region)

Store = column family: made of cells

Store = column family: persisted as HFiles (on HDFS)

HFile

HFile: that's actually an SSTable (flat sorted list of key-value pairs)

HFile: each KeyValue stores a cell

Versioning: different versions of the same cell, one of which is the latest

HFile: KeyValue (key, value)

HFile: KeyValue (prefix code): key length, value length, key, value

HFile: Key: row length, row (key), column family length, column family, column qualifier, timestamp, key type. The timestamp is the field used for versioning.
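
The KeyValue layout listed above can be sketched as a length-prefixed binary encoding. The field order follows the slides; the field widths (4-byte lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type), the big-endian byte order and the key type code `PUT = 4` are assumptions made for this illustration. The function names are hypothetical.

```python
import struct

PUT = 4  # assumed code for a "put" key type


def encode_key(row, family, qualifier, timestamp, key_type):
    """Key = row length, row, family length, family, qualifier, timestamp, key type."""
    return (
        struct.pack(">H", len(row)) + row
        + struct.pack(">B", len(family)) + family
        + qualifier                                   # no own length prefix
        + struct.pack(">QB", timestamp, key_type)
    )


def encode_keyvalue(key, value):
    """KeyValue = key length, value length, key, value."""
    return struct.pack(">II", len(key), len(value)) + key + value


def decode_keyvalue(buf):
    key_len, value_len = struct.unpack_from(">II", buf, 0)
    key = buf[8:8 + key_len]
    value = buf[8 + key_len:8 + key_len + value_len]
    (row_len,) = struct.unpack_from(">H", key, 0)
    row = key[2:2 + row_len]
    family_len = key[2 + row_len]
    family_start = 3 + row_len
    family = key[family_start:family_start + family_len]
    # The qualifier is whatever remains before the 9 trailing bytes.
    qualifier = key[family_start + family_len:-9]
    timestamp, key_type = struct.unpack(">QB", key[-9:])
    return row, family, qualifier, timestamp, key_type, value


key = encode_key(b"0A1", b"f1", b"A", 1476180900000, PUT)
record = encode_keyvalue(key, b"hello")
print(len(record), decode_keyvalue(record))
```

The length prefixes are what make this a prefix code: a reader can walk a flat sequence of KeyValues without any separator, and two versions of the same cell differ only in the timestamp field of the key.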

Blocks: an HFile is organized in blocks

Blocks: the "quantity" of KeyValues that get read at a time

Blocks: default size 64 KB

Blocks: long keys or values. If size(KeyValue) > block size: no split (longer block)

Levels of physical storage, from coarsest to finest: Table, Region, Store, StoreFile, Block, KeyValue

HBase: Writing new cells

On disk: Table, Region, Store, StoreFile, Block, KeyValue

Store: several StoreFiles, each made of blocks

Store: a MemStore plus the StoreFiles

In memory: Table, Region, Store, MemStore, Cell

Writing new cells: new cells accumulate in the MemStore

Flush: the MemStore is written out as a new StoreFile (sort!)

Flush: when? Reaching the max MemStore size in a store; reaching the overall max MemStore size; reaching a full Write-Ahead Log

Reading from a Store: MemStore and StoreFiles

Compaction: several StoreFiles are merged into one StoreFile (sort again)
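
The write path, flush, read and compaction can be sketched together on a toy store. This is a deliberately simplified model: the class `Store`, the threshold `max_memstore_cells` and the use of plain string keys are assumptions for illustration, and versioning, blocks and the Write-Ahead Log are left out.

```python
class Store:
    """Toy store: one in-memory MemStore plus sorted, immutable StoreFiles."""

    def __init__(self, max_memstore_cells=3):
        self.memstore = {}        # key -> value, held in memory
        self.store_files = []     # sorted lists of (key, value), oldest first
        self.max_memstore_cells = max_memstore_cells

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.max_memstore_cells:
            self.flush()          # the MemStore reached its maximum size

    def flush(self):
        """Sort the MemStore and write it out as a new StoreFile."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Look in the MemStore first, then in StoreFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:   # linear scan, kept simple on purpose
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all StoreFiles into one sorted StoreFile; newer values win."""
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


store = Store(max_memstore_cells=2)
store.put("b", "1")
store.put("a", "2")   # triggers a flush
store.put("a", "3")
store.put("c", "4")   # triggers a second flush
store.put("d", "5")   # stays in the MemStore
print(store.get("a"), len(store.store_files))
store.compact()
print(store.store_files)
```

Reads become slower as StoreFiles pile up, since each one may have to be consulted; compaction restores a single sorted file per store.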

The META table: a table like any other

The META table: stores region locations. Row key: table + region start key + region id + replica id. Columns: info:regioninfo, info:server (www.example.com:0), info:serverstartcode (2016-10-11T10:15:00)

RegionInfo: table name, start key, region ID, replica ID, encoded name, end key, split, offline

Architecture: HMaster and RegionServers

Architecture: create/delete/update table goes to the HMaster

Architecture: "Region?" is asked to the RegionServer hosting meta

Architecture: the answer is the RegionServer location(s)

Architecture: the query then goes to a RegionServer

HBase: Underlying APIs

HBase implementation (packaged code)

HBase APIs: REST

HBase: caching

HBase caches: reading faster. Level 1: LRU block cache. Level 2: bucket cache. Then: HDFS

LRU block cache: Least Recently Used, on the heap

LRU block cache: levels of priority (single access priority, multi access priority, in-memory access priority)
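
The least-recently-used eviction rule can be sketched in a few lines. The class `LRUBlockCache`, the `load_block` callback and the block identifiers are hypothetical, and the three priority levels are not modelled here: this is a single-level cache that only shows which block gets evicted.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy single-level LRU cache keyed by block identifier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # least recently used block comes first

    def get(self, block_id, load_block):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # cache hit: mark as recently used
            return self.blocks[block_id]
        block = load_block(block_id)            # cache miss: read from storage
        self.blocks[block_id] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used
        return block


loads = []


def load_block(block_id):
    """Stand-in for a slow read from a lower level (hypothetical)."""
    loads.append(block_id)
    return f"data-{block_id}"


cache = LRUBlockCache(capacity=2)
for block_id in ["b1", "b2", "b1", "b3", "b2"]:
    cache.get(block_id, load_block)
print(loads)
print(list(cache.blocks))
```

In this run the second read of `b1` is served from the cache, while `b2` has to be loaded twice because it was the least recently used block when `b3` arrived.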

When NOT to use the cache: batch processing

When NOT to use the cache: random access

Hash function (figure source: Jorge Stolfi, Wikipedia)

Bloom filter: tells very quickly whether an element belongs to a set (potentially with false positives)

Bloom filter: initially all bits are 0: 0 0 0 0 0 0 0 0 0 0 0 0

Bloom filter: adding "John Smith" with hash functions 1, 2, ..., k sets bits: 0 1 1 0 0 0 0 1 0 0 0 0

Bloom filter: adding "Mary Smith" sets more bits: 0 1 1 0 0 1 1 1 0 0 0 0

Bloom filter: not in set. "Albert Einstein?" hashes to at least one bit that is 0

Bloom filter: in set (and correct). "Mary Smith?" hashes only to bits that are 1

Bloom filter: in set (false positive). "Louis de Broglie?" hashes only to bits that are 1, although it was never added
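
The Bloom filter walk-through can be sketched as follows. The bit array has 12 positions as in the example; deriving the k hash functions from SHA-256 with a different prefix per function is an assumption of this sketch (any family of independent hash functions would do), so the bit positions it sets will not match the ones drawn in the slides.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits=12, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Assumed construction: hash function i is SHA-256 of "i:item".
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # A single 0 bit proves absence; all 1 bits only suggest presence.
        return all(self.bits[position] for position in self._positions(item))


bloom = BloomFilter()
for name in ["John Smith", "Mary Smith"]:
    bloom.add(name)
print(bloom.bits)
for query in ["Mary Smith", "Albert Einstein", "Louis de Broglie"]:
    print(query, "->", bloom.might_contain(query))
```

An answer of `False` is always correct, so a store file whose filter says "not in set" does not need to be read at all. An answer of `True` still requires the actual lookup, because it may be a false positive.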

Data Locality

HBase vs. HDFS

With HDFS load balancer...

HFile compaction brings back locality

Best practices

Number of rows: millions, RDBMS; billions, HBase

Number of nodes: > 5

10 Design Principles of Big Data

1. Learn from the past

2. Keep the design simple

3. Modularize the architecture

4. Homogeneity in the large

5. Heterogeneity in the small

6. Separate metadata from data

7. Abstract logical model from its physical implementation

8. Shard the data

9. Replicate the data

10. Buy lots of cheap hardware