Ghislain Fourny: Big Data, 5. Column stores
Introduction
Relational model
Relational model: schema
Issues with relational databases (RDBMS): small scale, single machine
Can we fix an RDBMS? Scale up (remember?)
Can we fix an RDBMS? Scale out: spread the data over a cluster and replicate it
Scaling out an RDBMS: hard to set up, very high maintenance costs
HBase: by design, runs on a scalable cluster of commodity hardware, on top of HDFS
Wide column stores: data model
Founding paper: Google's BigTable
The tabular model
The tabular model: expensive joins
Design paradigm of BigTable: store together what is accessed together
The columnar model: denormalized
Rows: each row is identified by a row ID (in the figure: 000, 002, 0A1, 1E0, 22A, 4A2)
Columns: grouped into column families
Column families must be known in advance... (in the figure: columns A, B / 1, 2 / I)
... but columns can be added on the fly (in the figure: columns A, B, C / 1, 2 / I, II, III, IV)
Primary queries: Get, Put, Scan, Delete
Get: retrieve one row by its row ID
Put: insert or update a row (in the figure, a new row 204 lands between 1E0 and 22A)
Scan: read a range of consecutive rows
Delete: remove a row
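The four primary queries can be illustrated with a small in-memory sketch. This is not HBase code: `ToyTable`, its method names and the family names `f1` and `f2` are hypothetical, and the only properties it tries to mirror are the ones on the slides (row IDs kept sorted, column families fixed up front, columns added on the fly).

```python
from bisect import bisect_left, insort


class ToyTable:
    """Toy wide column table: column families are fixed, columns are not."""

    def __init__(self, families):
        self.families = set(families)  # must be known in advance
        self.row_ids = []              # kept sorted, so a scan is a range read
        self.rows = {}                 # row ID -> {(family, column): value}

    def put(self, row_id, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_id not in self.rows:
            insort(self.row_ids, row_id)
            self.rows[row_id] = {}
        # A column that was never seen before is simply created here.
        self.rows[row_id][(family, column)] = value

    def get(self, row_id):
        return dict(self.rows.get(row_id, {}))

    def scan(self, start, stop):
        """Yield (row ID, row) for start <= row ID < stop, in sorted order."""
        i = bisect_left(self.row_ids, start)
        while i < len(self.row_ids) and self.row_ids[i] < stop:
            row_id = self.row_ids[i]
            yield row_id, dict(self.rows[row_id])
            i += 1

    def delete(self, row_id):
        if row_id in self.rows:
            del self.rows[row_id]
            self.row_ids.remove(row_id)


table = ToyTable(families=["f1", "f2"])
for rid in ["000", "002", "0A1", "1E0", "22A", "4A2"]:
    table.put(rid, "f1", "A", f"a-{rid}")
table.put("204", "f2", "II", "new")  # Put: row 204 lands between 1E0 and 22A
print(table.row_ids)
print([rid for rid, _ in table.scan("0A1", "22A")])
table.delete("002")
```

Keeping the row IDs sorted is the design choice that makes Scan cheap: a range of consecutive rows is one contiguous slice.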
Some terminology: key-value model (key, value)
Some terminology: column-oriented stores (Column1, Column2)
Some terminology: column-oriented key-value stores. Also: wide column stores, column family-oriented stores
Examples of column-oriented key-value stores: Google's BigTable
Warning on terminology: NoSQL is very recent!
Warning on terminology: words have a "life" (key-value storage, relational table, file, block, NoSQL, object storage)
HBase: physical level
Physical layer: regions. The table is split into ranges of row IDs (min inclusive, max exclusive)
Physical layer: column families. Within a region, the columns of one family are stored together
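As a rough illustration of the "min inclusive, max exclusive" ranges, the sketch here finds the region of a row ID with a binary search over the region start keys. The split points in `REGION_STARTS` and the function names are made up for the example; real region boundaries depend on the table.

```python
from bisect import bisect_right

# Hypothetical split points: region i covers [starts[i], starts[i + 1]).
# "" stands for "no lower bound"; the last region has no upper bound.
REGION_STARTS = ["", "0A1", "22A"]


def region_for(row_id, starts=REGION_STARTS):
    """Return the index of the region whose range contains row_id."""
    return bisect_right(starts, row_id) - 1


def store_for(row_id, family):
    """One unit of storage per (region, column family): stored together."""
    return (region_for(row_id), family)


for rid in ["000", "002", "0A1", "1E0", "22A", "4A2"]:
    print(rid, "-> region", region_for(rid))
```

A row ID equal to a start key belongs to the region that starts there, which is exactly what "min inclusive, max exclusive" means.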
Architecture: "The same procedure as every year, James."
HDFS: a NameNode (/dir/file1, /dir/file2, /file3) and many DataNodes
HBase: an HMaster (replicas!) and many RegionServers
HMaster: DDL operations (create table, delete table)
HMaster assigns regions to RegionServers
HMaster splits regions
HMaster handles RegionServer failovers
RegionServer
Physical storage: within a region, each column family is stored together, as one store
Store = column family (within one region); it holds cells
Store: persisted as several HFiles (on HDFS)
HFile: that's actually an SSTable (a flat sorted list of key-value pairs); each KeyValue stores a cell
Versioning: different versions of the same cell can coexist; the figure marks the latest one
HFile: KeyValue = key, value
HFile: KeyValue (prefix code) = key length, value length, key, value
HFile: Key = row length, row (key), column family length, column family, column qualifier, timestamp, key type. The timestamp is the field used for versioning
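The layout above can be made concrete with a small encoder and decoder. This is a sketch, not HBase's serialization code: the field order follows the slides, while the field widths (4-byte lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) and the value `PUT = 4` are assumptions modeled on the usual HBase KeyValue format.

```python
import struct

PUT = 4  # assumed key type code, for illustration only


def encode_key(row, family, qualifier, timestamp, key_type=PUT):
    """row length, row, family length, family, qualifier, timestamp, key type."""
    return (struct.pack(">H", len(row)) + row
            + struct.pack(">B", len(family)) + family
            + qualifier
            + struct.pack(">qB", timestamp, key_type))


def encode_keyvalue(key, value):
    """Prefix code: both lengths come first, so a reader knows where to cut."""
    return struct.pack(">II", len(key), len(value)) + key + value


def decode_keyvalue(buf):
    key_len, value_len = struct.unpack_from(">II", buf, 0)
    key = buf[8:8 + key_len]
    value = buf[8 + key_len:8 + key_len + value_len]
    (row_len,) = struct.unpack_from(">H", key, 0)
    row = key[2:2 + row_len]
    pos = 2 + row_len
    family_len = key[pos]
    family = key[pos + 1:pos + 1 + family_len]
    pos += 1 + family_len
    # The qualifier needs no length field: it is whatever remains
    # before the fixed-size timestamp (8 bytes) and key type (1 byte).
    qualifier = key[pos:len(key) - 9]
    timestamp, key_type = struct.unpack_from(">qB", key, len(key) - 9)
    return row, family, qualifier, timestamp, key_type, value


key = encode_key(b"1E0", b"f1", b"II", timestamp=1476180900000)
blob = encode_keyvalue(key, b"hello")
print(len(blob), decode_keyvalue(blob))
```

Two versions of the same cell differ only in the timestamp field of the key, which is how the versioning fits into a flat sorted list of key-value pairs.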
Blocks: an HFile is divided into blocks
Blocks: the "quantity" of KeyValues that get read at a time
Blocks: default size 64 KB
Blocks: long keys or values. If size(KeyValue) > block size, there is no split (the block is simply longer)
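The "no split" rule can be sketched as a simple packing loop. The function name and the exact cut-off policy (close the current block when the next KeyValue would not fit) are assumptions chosen for illustration; the point is only that a KeyValue is never cut in two, so an oversized one yields a longer block.

```python
BLOCK_SIZE = 64 * 1024  # default block size from the slides


def pack_into_blocks(keyvalues, block_size=BLOCK_SIZE):
    """Group encoded KeyValues (bytes) into blocks without splitting any."""
    blocks, current, current_size = [], [], 0
    for kv in keyvalues:
        # Close the current block if this KeyValue would overflow it.
        if current and current_size + len(kv) > block_size:
            blocks.append(current)
            current, current_size = [], 0
        # An oversized KeyValue still goes in whole: the block is just longer.
        current.append(kv)
        current_size += len(kv)
    if current:
        blocks.append(current)
    return blocks


kvs = [b"a" * 4, b"b" * 4, b"c" * 4, b"d" * 25, b"e" * 3]
print([[len(kv) for kv in block] for block in pack_into_blocks(kvs, block_size=10)])
```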
Levels of physical storage: Table > Region > Store > StoreFile > Block > KeyValue
HBase: Writing new cells
On disk: Table > Region > Store > StoreFile > Block > KeyValue
Store: one MemStore plus several StoreFiles, each made of blocks
In memory: Table > Region > Store > MemStore > Cell
Writing new cells: new cells accumulate in the MemStore; the existing StoreFiles are not modified
Flush: the MemStore is sorted and written out as a new StoreFile
Flush happens when: the maximum MemStore size in a store is reached, the overall maximum MemStore size is reached, or the write-ahead log is full
Reading from a store: the MemStore and all StoreFiles have to be consulted
Compaction: several StoreFiles are merged into a single StoreFile (sorted again)
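The write path, the flush, the read path and compaction fit together as in the toy store sketched here. Everything in it is hypothetical (the class, the cell-count threshold, plain Python lists standing in for StoreFiles), and it ignores the write-ahead log, blocks and versioning; it only shows why reads must look in several places and why compaction helps.

```python
class ToyStore:
    """One MemStore plus a list of immutable, sorted StoreFiles."""

    def __init__(self, max_memstore_cells=3):
        self.max_memstore_cells = max_memstore_cells  # hypothetical threshold
        self.memstore = {}
        self.storefiles = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memstore[key] = value  # writes only ever touch memory
        if len(self.memstore) >= self.max_memstore_cells:
            self.flush()

    def flush(self):
        if self.memstore:
            self.storefiles.append(sorted(self.memstore.items()))  # Sort!
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for storefile in reversed(self.storefiles):  # newest file wins
            for k, v in storefile:  # linear scan keeps the sketch short
                if k == key:
                    return v
        return None

    def compact(self):
        merged = {}
        for storefile in self.storefiles:  # oldest first, newer values overwrite
            merged.update(storefile)
        self.storefiles = [sorted(merged.items())] if merged else []


store = ToyStore(max_memstore_cells=2)
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("d", 5)]:
    store.put(k, v)
print(len(store.storefiles), store.get("a"))
store.compact()
print(store.storefiles)
```

After compaction a read has one file to look at instead of several, at the cost of rewriting the data once.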
The META table: a table like any other
The META table stores region locations. Row key: table + region start key + region id + replica id. Columns: info:regioninfo, info:server (e.g. www.example.com:0), info:serverstartcode (e.g. 2016-10-11T10:15:00)
RegionInfo: table name, start key, region ID, replica ID, encoded name, end key, split, offline
Architecture: creating, deleting or updating a table goes through the HMaster
Architecture: to find a region, the client asks the RegionServer hosting META and gets back RegionServer location(s)
Architecture: the query itself is then sent directly to the right RegionServer
HBase: Underlying APIs
HBase implementation (packaged code)
HBase APIs: REST
HBase: caching
HBase caches (reading faster): LRU block cache (level 1), bucket cache (level 2), then HDFS
LRU block cache: Least Recently Used, on the heap
LRU block cache: levels of priority (single access priority, multi access priority, in-memory access priority)
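The eviction policy behind "Least Recently Used" can be sketched in a few lines. This is a generic LRU cache, not HBase's block cache: the class name and the `load_block` callback are hypothetical, and the three priority levels are deliberately left out.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Keeps at most `capacity` blocks; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used first

    def get(self, block_id, load_block):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # hit: mark as most recent
            return self.blocks[block_id]
        block = load_block(block_id)           # miss: read from slower storage
        self.blocks[block_id] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # drop the least recently used
        return block


loads = []


def load_from_disk(block_id):
    loads.append(block_id)
    return f"data-{block_id}"


cache = LRUBlockCache(capacity=2)
for block_id in [1, 2, 1, 3, 2]:
    cache.get(block_id, load_from_disk)
print(loads, list(cache.blocks))
```

A long batch scan touches every block once, so it would only push useful blocks out of such a cache without ever hitting it again.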
When NOT to use the cache: batch processing, random access
Hash function (figure source: Jorge Stolfi, Wikipedia)
Bloom filter: tells very quickly whether an element belongs to a set (false positives are possible)
Bloom filter: starts as an array of bits, all set to 0 (0 0 0 0 0 0 0 0 0 0 0 0)
Bloom filter: adding "John Smith" sets the bits picked by hash functions 1, 2, ..., k (0 1 1 0 0 0 0 1 0 0 0 0)
Bloom filter: adding "Mary Smith" sets further bits (0 1 1 0 0 1 1 1 0 0 0 0)
Bloom filter, not in set: "Albert Einstein?" hits at least one bit that is still 0
Bloom filter, in set (and correct): "Mary Smith?" hits only bits set to 1
Bloom filter, in set (false positive): "Louis de Broglie?" hits only bits set to 1, although it was never added
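The three cases above follow directly from how a Bloom filter is built, as this minimal sketch tries to show. The choice of SHA-256 with a per-function prefix to simulate k hash functions is an assumption made for brevity; the slides do not say which hash functions are used.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=12, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # all zeros: the empty set

    def _positions(self, item):
        # Simulate k independent hash functions by salting one hash with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # A single 0 bit proves absence; all 1 bits only suggests presence.
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
bloom.add("John Smith")
bloom.add("Mary Smith")
print(bloom.bits)
for name in ["Mary Smith", "Albert Einstein", "Louis de Broglie"]:
    print(name, bloom.might_contain(name))
```

A "no" is always correct, so a store can skip a file without reading it; a "yes" only means the file has to be checked.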
Data locality
HBase vs. HDFS
With the HDFS load balancer...
HFile compaction brings back locality
Best practices
Number of rows: RDBMS for millions, HBase for billions
Number of nodes: > 5
10 Design Principles of Big Data
1. Learn from the past
2. Keep the design simple
3. Modularize the architecture
4. Homogeneity in the large
5. Heterogeneity in the small
6. Separate metadata from data
7. Abstract logical model from its physical implementation
8. Shard the data
9. Replicate the data
10. Buy lots of cheap hardware