Faster HBase queries. Introducing hindex Secondary indexes for HBase. ApacheCon North America Rajeshbabu Chintaguntla
1 Faster HBase queries: Introducing hindex, secondary indexes for HBase. ApacheCon North America. Rajeshbabu Chintaguntla, HUAWEI TECHNOLOGIES CO., LTD.
2 $ whoami: Apache HBase Committer. Senior Software Engineer, Huawei India R&D. Developed hindex, secondary indexes for HBase. HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
3 Agenda: HBase, a brief introduction; Introduction to hindex; Usage; Test Results
5 Apache HBase: an open-source, distributed, versioned, non-relational database modeled after Google's BigTable. It leverages the distributed data storage provided by HDFS and allows random read/write access to data in HDFS. GOAL: hosting very large tables (billions of rows X millions of columns). [Figure: example PERSON table with columns name, gender, dept, title, mobile, salary, dob]
6 HBase Introduction. Sorted: rows are sorted lexicographically by row key.
7 HBase Introduction. Sorted. Sparse: a row can have any number of columns.
8 HBase Introduction. Sorted. Sparse. Multi-dimensional: data is versioned. Key = (row, column family, column, timestamp) -> value. [Figure: cells of one row in column families cf1 and cf2, each tagged with a timestamp: name T1 = Raj, gender T1 = M, dept T1 = OIH, title T1 = SE, title T2 = SSE]
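The (row, column family, column, timestamp) -> value model on the slide above can be sketched with a plain dictionary. This is only an illustrative in-memory model, not the HBase client API: the names `put` and `get_latest`, the row key `r1`, the placement of the columns in `cf1`, and the integer timestamps standing in for T1 and T2 are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Illustrative in-memory model only; not the HBase client API.
# A cell is addressed by (row, column family, column, timestamp).
Cell = Tuple[str, str, str, int]

store: Dict[Cell, str] = {}


def put(row: str, family: str, column: str, ts: int, value: str) -> None:
    """Store one version of a cell."""
    store[(row, family, column, ts)] = value


def get_latest(row: str, family: str, column: str) -> Optional[str]:
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = [
        (ts, value)
        for (r, f, c, ts), value in store.items()
        if (r, f, c) == (row, family, column)
    ]
    return max(versions)[1] if versions else None


# Hypothetical row key "r1"; timestamps 1 and 2 stand in for T1 and T2.
put("r1", "cf1", "name", 1, "Raj")
put("r1", "cf1", "title", 1, "SE")
put("r1", "cf1", "title", 2, "SSE")
```

Reading `title` returns the newer version, and a column that was never written is simply absent, which is what "sparse" means here.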
9 HBase Introduction. Sorted. Sparse. Multi-dimensional. Distributed: leverages HDFS; data is replicated across nodes, which protects against a failing node.
10 HBase Introduction. Sorted. Sparse. Multi-dimensional. Distributed. Auto-sharded: tables are split into regions and re-distributed as data grows.
11 HBase Introduction. Sorted. Sparse. Multi-dimensional. Distributed. Auto-sharded: metadata records the row-key range of each region.
12 HBase Introduction. [Figure: HMaster; Region Server 1 and Region Server 2 hosting Region 1 and Region 2; each region has one Store per column family (cf1, cf2) with a MemStore and HFiles (HFile1 to HFile4), each made of blocks HBlock1 to HBlockN] Master, region servers and ZooKeeper. Table horizontally divided into regions. Columns grouped into column families (vertical partition of tables). MemStore, HFiles in DFS. HFiles are logically split into smaller blocks; data reads and writes happen as blocks. MapReduce integration.
13 Coprocessors: allow client-supplied code to run on the server side, extending the functionality of HBase without changing the kernel. [Figure: Client -> Pre-Action -> Action -> Post-Action] Observers: like triggers; they run extended functionality before or after an action through hooks provided by the coprocessor framework. Endpoints: like stored procedures; they can be invoked at any time from the client. The endpoint implementation is executed remotely at the target region(s), and the results of those executions are returned to the client.
14 Filters. [Figure] Source: Lars George, HBase: The Definitive Guide, O'Reilly Media
15 HBase Query. SELECT NAME FROM PERSON WHERE KEY=141. NOTE: HBase has no native SQL-like query interface; SQL is used here only for ease of understanding. A similar query can be done using scanners and filters.
16 HBase Query by Row. SELECT NAME FROM PERSON WHERE KEY=141. [Figure: PERSON table]
17-21 HBase Query by Column (w/o Index). SELECT NAME FROM PERSON WHERE MOBILE= [Figure: PERSON table] The query needs a full table scan. With billions of records, a full table scan is evil to performance. Side effects: client timeouts, lease expiry.
23 Introducing hindex: a coprocessor-based, server-side implementation of secondary indexing. One separate index table per user table, used by all indexes of that table. Region-wise indexing (aka local indexing). A custom load balancer co-locates the index table regions with the actual table regions. Index table row key = region start key + index name + indexed column value(s) + user table row key.
24 hindex: Architecture. [Figure: Client App with HBase Client and Client Extn; HMaster with Coprocessor Host and Balancer; RegionServer with Coprocessor Host and Indexing Coprocessor] The coprocessor handles the index data. A custom LoadBalancer does the co-location. The client extension allows specifying index details while creating a table; it is not needed for read/write. Example: primary user table (RowKey -> cf1:col1): 001 -> A, 002 -> B, 003 -> Z, 004 -> C, 005 -> A, 006 -> A. The secondary index table has one row per user row, keyed by column value and then user row key, so its rows sort as A (001, 005, 006), B (002), C (004), Z (003).
25-28 HBase Query by Column (w/ Index). SELECT NAME FROM PERSON WHERE MOBILE= Index column: MOBILE. [Figure: index idx1 on MOBILE next to the PERSON table; each index entry is the column value followed by the user row key] The user row key is part of the index row to ensure user table row order when a column value is repeated in multiple rows.
29-31 HBase Query by Column (w/ Index). [Figure: PERSON table split into regions, each with its own idx1 entries] The index is maintained per region, not globally, so network calls are avoided. To be handled: region movement, region splits.
32 Regions Co-location. [Figure: the client creates a table with regions; the balancer in the HMaster places each actual table region (R1 from A to B, R2 from B to C) and the index table region with the same key range on the same region server (RS1, RS2)]
33 Put operation. Table: t1, column family: cf. Indexes: idx1(cf:q1) and idx2(cf:q2). Index table: t1_idx. The client sends Put t1, AAB, cf:q1, 5, cf:q2, z1 to user region R1 (start key A, end key B). The coprocessor on the HRegionServer writes the index rows to the co-located index region: Put t1_idx, Aidx15AAB and Put t1_idx, Aidx2z1AAB.
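The index rows on the Put slide follow the layout region start + index name + indexed column value + user table row. A minimal sketch of that construction follows, using plain string concatenation; the function names `index_row_key` and `index_puts` are hypothetical, and any value padding or byte-level encoding the real coprocessor may apply is ignored.

```python
from typing import Dict, List


def index_row_key(region_start: str, index_name: str, value: str, user_row: str) -> str:
    """Concatenate the four parts of an index row key (simplified: no padding)."""
    return region_start + index_name + value + user_row


def index_puts(
    region_start: str,
    user_row: str,
    columns: Dict[str, str],
    indexes: Dict[str, str],
) -> List[str]:
    """Return one index row key per index whose column is present in the Put."""
    keys = []
    for index_name, column in indexes.items():
        if column in columns:
            keys.append(index_row_key(region_start, index_name, columns[column], user_row))
    return keys


# Values taken from the slide: region start A, row AAB, cf:q1=5, cf:q2=z1.
keys = index_puts(
    region_start="A",
    user_row="AAB",
    columns={"cf:q1": "5", "cf:q2": "z1"},
    indexes={"idx1": "cf:q1", "idx2": "cf:q2"},
)
```

With the slide's values this yields `Aidx15AAB` and `Aidx2z1AAB`, the two rows written to `t1_idx`.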
34 Scan Operation. 1) Create a scanner for the index region at the server side. The client creates a scanner (condition cf:q1=5) on user region R1 (start key A, end key B). The coprocessor on the HRegionServer creates a scanner on the co-located index region with start row Aidx15 and stop row Aidx16.
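The start and stop rows on this slide are a prefix range: the start row is the index prefix itself and the exclusive stop row is the same prefix with its last byte incremented. A small sketch of that computation, assuming a hypothetical helper name `prefix_scan_range` and ignoring any value padding the real implementation may use:

```python
from typing import Tuple


def prefix_scan_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key that begins with prefix.

    The stop row is exclusive. Trailing 0xff bytes cannot be incremented, so
    they are dropped first; an all-0xff (or empty) prefix gets an empty stop
    row, meaning "scan to the end".
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, b""
    stop = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, stop


# Region start "A" + index name "idx1" + value "5", as on the slide.
start_row, stop_row = prefix_scan_range(b"Aidx15")
```

The same rule gives the ranges used later in the deck, for example `idx1_oih` to `idx1_oii`.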
35 Scan Operation. 2) Scan the index table at the server side and seek to the exact rows in the user table. [Figure: numbered steps 1 to 5 between Client, Coprocessor, index region and user region R1, including "seek to exact row" and next()]
36 Scan Operation. Coprocessors read the index and seek to the exact row in the user table. Seeks on HFiles are based on the row obtained from the index data. HFiles are read block by block (default block size is 64 KB), so block reads from HDFS are skipped where the data is not present at all; sometimes a full HFile is skipped. Index details need not be read back to the client, which avoids extra network usage.
37 hindex: Usage. SELECT NAME FROM PERSON WHERE (DEPT=OIH OR TITLE=TL) AND (400 > MOBILE AND MOBILE > 500). Conditions: Filters (with equal or range conditions). AND: Filters list with MUST_PASS_ALL. OR: Filters list with MUST_PASS_ONE.
38 HBase Query with AND. SELECT NAME FROM PERSON WHERE DEPT=OIH AND TITLE=TL. [Figure: index idx1 on DEPT and index idx2 on TITLE next to the PERSON table; each index entry is the column value followed by the user row key, e.g. OIH_141, OIH_152, SOA_145, TL_141, SA_145]
39 HBase Query with AND. Both indexes are stored in one index table, with the index name in the row (idx1_oih_141, idx1_oih_152, idx1_soa_145, idx2_tl_141, idx2_tl_152). 1) A single index table per user table is easier to co-locate. 2) Index table rows carry the index name so that each index's data is stored together.
40 HBase Query with AND. Create two scanners: 1) start: idx1_oih, end: idx1_oii; 2) start: idx2_tl, end: idx2_tm.
41-47 HBase Query with AND. The two scanners advance together, comparing the user row keys they point at. <: scanner-1 is behind and can jump ahead within idx1_oih. =: the row meets our condition; fetch the required data (Pia) and move both scanners to next. > 145: scanner-1 is ahead, so scanner-2 can jump to idx2_tl_152. = 152: the row meets our condition; fetch the required data (Som) and move both scanners to next. Scanner-1 reaches its end: close both scanners. Result: Pia, Som.
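The two-scanner walk for AND is an intersection of two sorted lists of user row keys, where the scanner that is behind jumps forward to the other scanner's row. A minimal sketch under stated assumptions: each scanner is modelled as an already sorted Python list, the jump is done with `bisect_left`, and the sample row keys are illustrative (only some of them appear on the slides).

```python
from bisect import bisect_left
from typing import List


def intersect(rows1: List[str], rows2: List[str]) -> List[str]:
    """Return the user row keys present in both sorted lists (AND)."""
    i = j = 0
    matches = []
    while i < len(rows1) and j < len(rows2):
        a, b = rows1[i], rows2[j]
        if a == b:
            # Row meets both conditions: fetch it and move both scanners.
            matches.append(a)
            i += 1
            j += 1
        elif a < b:
            # Scanner-1 is behind: jump to the first row >= b.
            i = bisect_left(rows1, b, i)
        else:
            # Scanner-2 is behind: jump to the first row >= a.
            j = bisect_left(rows2, a, j)
    # One scanner reached its end, so both can be closed.
    return matches


# Illustrative user row keys found under idx1_oih and idx2_tl.
dept_oih = ["140", "141", "143", "152"]
title_tl = ["143", "145", "150", "152", "160"]
both = intersect(dept_oih, title_tl)
```

The jumps are what make this cheaper than reading every index entry: rows that can no longer match are skipped without being visited.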
48-52 HBase Query with AND (w/ 2-column index). SELECT NAME FROM PERSON WHERE DEPT=OIH AND TITLE=TL. Here idx1 covers both DEPT and TITLE, so index rows look like idx1_bdi_tl_160, idx1_oih_sa_141, idx1_oih_tl_152. One scanner is enough: start: idx1_oih_tl, end: idx1_oih_tm. Each row it returns meets the condition and the required data can be fetched (Pia, then Som). When the scanner reaches its end, close the scanner.
53-63 HBase Query with OR. SELECT NAME FROM PERSON WHERE DEPT=OIH OR TITLE=TL. Two scanners: 1) start: idx1_oih, end: idx1_oii; 2) start: idx2_tl, end: idx2_tm. The scanners are merged in user row order. <=: scanner-1 is behind; get the required results and move scanner-1 to next until it exceeds scanner-2's row. ==: both scanners point at the same row; get the required results and move both scanners in this case. >= 145 (scanner is behind): get the required results and move scanner-2 to next until it exceeds 152. Results accumulate as Raj, Anu, Pia, Jay, Suma, Som.
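The OR walk is a merge of the two sorted scanner outputs in which a row found by both scanners is emitted only once. A minimal sketch, again modelling each scanner as a sorted Python list; the function name `union` and the sample row keys are illustrative assumptions.

```python
from typing import List


def union(rows1: List[str], rows2: List[str]) -> List[str]:
    """Merge two sorted lists of user row keys, emitting each row once (OR)."""
    i = j = 0
    merged = []
    while i < len(rows1) and j < len(rows2):
        a, b = rows1[i], rows2[j]
        if a < b:
            # Scanner-1 is behind: emit its row and advance it.
            merged.append(a)
            i += 1
        elif a > b:
            # Scanner-2 is behind: emit its row and advance it.
            merged.append(b)
            j += 1
        else:
            # Same row under both indexes: emit once, advance both.
            merged.append(a)
            i += 1
            j += 1
    # Drain whichever scanner still has rows.
    merged.extend(rows1[i:])
    merged.extend(rows2[j:])
    return merged


# Illustrative user row keys found under idx1_oih and idx2_tl.
dept_oih = ["140", "141", "143", "152"]
title_tl = ["143", "145", "150", "152", "160"]
either = union(dept_oih, title_tl)
```

Unlike the AND case, no entries can be skipped here, since every row from either index is part of the result.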
64 Region Split. An explicit split on an index region is avoided (using a custom split policy for index tables). When a user table region splits, the corresponding index region also splits, at the same split point as the user region. Example: the user region with start key 01 holds rows 01 -> A, 02 -> A, 03 -> C, 04 -> B, 05 -> X, 06 -> A, 07 -> A and splits at row 05; its index region holds 01_A_01, 01_A_02, 01_A_06, 01_A_07, 01_B_04, 01_C_03, 01_X_05. After the split, the first index daughter holds 01_A_01, 01_A_02, 01_B_04, 01_C_03, and the second daughter (start key 05) holds the entries for rows 05 and above under the new start key: 05_A_06, 05_A_07, 05_X_05.
65 Region Split. A custom HalfStoreFileReader, IndexHalfStoreFileReader, is used for reading index daughter regions. [Figure: user table region read through HalfStoreFileReader for daughter A and daughter B; index table region read through IndexHalfStoreFileReader for daughter A and daughter B] Both index half store file readers start at the same point, i.e. the beginning of the file. Each reader checks the actual table row in a KV and decides whether the KV corresponds to its daughter or not. The IndexHalfStoreFileReader for daughter B also changes the row key as per the daughter's start key.
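The effect of the split on index entries can be sketched as follows: entries are routed by the user row key embedded in them, and entries going to the second daughter get the daughter's start key as their new prefix. This is a simplified model that materializes both daughters as lists; the function name `split_index_entries` and the `_` separator parsing are assumptions, and the real reader applies the same decision lazily while reading the parent's store file.

```python
from typing import List, Tuple


def split_index_entries(entries: List[str], split_row: str) -> Tuple[List[str], List[str]]:
    """Route index entries of the form start_value_userrow to two daughters."""
    daughter_a = []
    daughter_b = []
    for key in entries:
        _start, value, user_row = key.split("_")
        if user_row < split_row:
            # User row stays in the first daughter: entry is kept unchanged.
            daughter_a.append(key)
        else:
            # User row moves to the second daughter: rewrite the start-key prefix.
            daughter_b.append("_".join([split_row, value, user_row]))
    return daughter_a, sorted(daughter_b)


# Index region contents from the slide, split at user row 05.
parent = ["01_A_01", "01_A_02", "01_A_06", "01_A_07", "01_B_04", "01_C_03", "01_X_05"]
first, second = split_index_entries(parent, "05")
```

Note that the split cannot simply cut the index file in the middle: entries for rows below and above the split point are interleaved in index order, which is why each entry has to be checked individually.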
67 Usage: Getting Started. For HBase 0.94.x. For HBase 0.98 or trunk.
68 Usage: Configurations (Name: Value)
hbase.coprocessor.master.classes: org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver
hbase.coprocessor.region.classes: org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver
hbase.coprocessor.wal.classes: org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver
hbase.master.loadbalancer.class: org.apache.hadoop.hbase.index.SecIndexLoadBalancer
hbase.use.secondary.index: true
69 Usage: Creating Index
70 Usage: Tools
TableIndexer tool to create index(es) for existing data:
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=table -Dtable.columns.index='IDX1=>cf1:[q1->datatype&length]'
Bulk load tool to load user data into the user table and index it at the same time:
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir>
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFiles <hdfs://storefileoutput> <tablename>
Tool to check region co-location and repair any co-location mismatches:
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.util.SecondaryIndexColocator
72 Test Results: Put Performance. [Figure: put performance chart] Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 100 GB, 500 bytes per record.
73 Test Results: Scan Performance. Search for a column value; indexes idx1 on cf:q1 and idx2 on cf:q2. [Figure: scan performance chart] Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.
74 Test Results: Query with AND. Indexes idx1 on cf:q1 and idx2 on cf:q2. [Figure: query performance chart] Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.
75 Test Results: Scan w/ Range Query. Index idx3 on cf:q3. [Figure: scan performance chart] Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.
76 Test Results: Scan w/ Multi Column Index. Index idx4 -> cf:q1,cf:q3. [Figure: scan performance chart] Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.
77 Summary. Design: supports multiple indexes and multi-column indexes on a table; supports indexing on part of a column value; supports equal and range condition scans using the index; supports dynamic add/drop of an index; supports hints to skip the index scan or to name specific indexes to use in the scan; intelligent filter evaluation. Application usage: no changes are required to perform read and write operations; use IndexAdmin (client extension) to perform admin operations like create, enable, disable and drop on an indexed table; admin operations need not be performed separately on the index table. Upgrade/Integration: minimal code changes in the HBase kernel; HBase version upgrade is very easy.
78 Roadmap. Contribute to the HBase community (in progress, refer to HBASE-9203). Integration with Apache Phoenix. HBCK tool support for secondary index tables. Pluggable scan evaluation.
79 Q & A
80 Thank you. Copyright 2011 Huawei Technologies Co., Ltd. All Rights Reserved.
More information10 Million Smart Meter Data with Apache HBase
10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on
More informationComparing SQL and NOSQL databases
COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations
More informationA Glimpse of the Hadoop Echosystem
A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other
More informationGhislain Fourny. Big Data 5. Wide column stores
Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationFattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب
Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب 1391 زمستان Outlines Introduction DataModel Architecture HBase vs. RDBMS HBase users 2 Why Hadoop? Datasets are growing to Petabytes Traditional datasets
More informationCOSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables
COSC 6339 Big Data Analytics NoSQL (II) HBase Edgar Gabriel Fall 2018 HBase Column-Oriented data store Distributed designed to serve large tables Billions of rows and millions of columns Runs on a cluster
More informationIntegration of Apache Hive
Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org @enissoz Page 1 Agenda Overview of Hive and HBase Hive + HBase Features and Improvements Future of Hive and HBase Q&A Page
More informationIntroduction to BigData, Hadoop:-
Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,
More informationCS November 2018
Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationScaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics
http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Scaling Up HBase Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials
More informationTypical size of data you deal with on a daily basis
Typical size of data you deal with on a daily basis Processes More than 161 Petabytes of raw data a day https://aci.info/2014/07/12/the-dataexplosion-in-2014-minute-by-minuteinfographic/ On average, 1MB-2MB
More informationCS November 2017
Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationThe State of Apache HBase. Michael Stack
The State of Apache HBase Michael Stack Michael Stack Chair of the Apache HBase PMC* Caretaker/Janitor Member of the Hadoop PMC Engineer at Cloudera in SF * Project Management
More informationBig Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline
More informationResearch on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang
International Conference on Engineering Management (Iconf-EM 2016) Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang School of
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These
More informationCSE-E5430 Scalable Cloud Computing Lecture 9
CSE-E5430 Scalable Cloud Computing Lecture 9 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 15.11-2015 1/24 BigTable Described in the paper: Fay
More informationFacebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011
HBase @ Facebook The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook March 11, 2011 Talk Outline the new Facebook Messages, and how we got started with HBase quick
More informationHBase Security. Works in Progress Andrew Purtell, an Intel HBase guy
HBase Security Works in Progress Andrew Purtell, an Intel HBase guy Outline Project Rhino Extending the HBase ACL model to per-cell granularity Introducing transparent encryption I don t always need encryption,
More informationDistributed Systems 16. Distributed File Systems II
Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS
More informationAdvanced HBase Schema Design. Berlin Buzzwords, June 2012 Lars George
Advanced HBase Schema Design Berlin Buzzwords, June 2012 Lars George lars@cloudera.com About Me SoluDons Architect @ Cloudera Apache HBase & Whirr CommiIer Author of HBase The Defini.ve Guide Working with
More informationRails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011
Rails on HBase Zachary Pinter and Tony Hillerson RailsConf 2011 What we will cover What is it? What are the tradeoffs that HBase makes? Why HBase is probably the wrong choice for your app Why HBase might
More informationIntroduction to Column Oriented Databases in PHP
Introduction to Column Oriented Databases in PHP By Slavey Karadzhov About Me Name: Slavey Karadzhov (Славей Караджов) Programmer since my early days PHP programmer since 1999
More informationCA485 Ray Walshe NoSQL
NoSQL BASE vs ACID Summary Traditional relational database management systems (RDBMS) do not scale because they adhere to ACID. A strong movement within cloud computing is to utilize non-traditional data
More informationSEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013
SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 1 WHO AM I? Ryan Tabora Think Big Analytics - Senior Data Engineer Lover of dachshunds,
More informationGridGain and Apache Ignite In-Memory Performance with Durability of Disk
GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite
More informationBigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13
Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University
More informationPyro: A Spatial-Temporal Big-Data Storage System. Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher
Pyro: A Spatial-Temporal Big-Data Storage System Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher 1 Applications A huge amount of geo- tagged events are generated and stored in real- 5me.
More informationA Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP
A Brief Introduction of TiDB Dongxu (Edward) Huang CTO, PingCAP About me Dongxu (Edward) Huang, Cofounder & CTO of PingCAP PingCAP, based in Beijing, China. Infrastructure software engineer, open source
More informationColumn-Family Databases Cassandra and HBase
Column-Family Databases Cassandra and HBase Kevin Swingler Google Big Table Google invented BigTableto store the massive amounts of semi-structured data it was generating Basic model stores items indexed
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationStrategies for Incremental Updates on Hive
Strategies for Incremental Updates on Hive Copyright Informatica LLC 2017. Informatica, the Informatica logo, and Big Data Management are trademarks or registered trademarks of Informatica LLC in the United
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II
Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2008 Quiz II There are 14 questions and 11 pages in this quiz booklet. To receive
More informationA Security Audit Module for HBase
2016 Joint International Conference on Artificial Intelligence and Computer Engineering (AICE 2016) and International Conference on Network and Communication Security (NCS 2016) ISBN: 978-1-60595-362-5
More informationIntroduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data
Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction
More informationBig Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering
Big Data Processing Technologies Chentao Wu Associate Professor Dept. of Computer Science and Engineering wuct@cs.sjtu.edu.cn Schedule (1) Storage system part (first eight weeks) lec1: Introduction on
More informationUsing space-filling curves for multidimensional
Using space-filling curves for multidimensional indexing Dr. Bisztray Dénes Senior Research Engineer 1 Nokia Solutions and Networks 2014 In medias res Performance problems with RDBMS Switch to NoSQL store
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationEnable Spark SQL on NoSQL Hbase tables with HSpark IBM Code Tech Talk. February 13, 2018
Enable Spark SQL on NoSQL Hbase tables with HSpark IBM Code Tech Talk February 13, 2018 https://developer.ibm.com/code/techtalks/enable-spark-sql-onnosql-hbase-tables-with-hspark-2/ >> MARC-ARTHUR PIERRE
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle
More informationCSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores
CSE 444: Database Internals Lectures 26 NoSQL: Extensible Record Stores CSE 444 - Spring 2014 1 References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)
More informationDistributed Data Analytics Partitioning
G-3.1.09, Campus III Hasso Plattner Institut Different mechanisms but usually used together Distributing Data Replication vs. Replication Store copies of the same data on several nodes Introduces redundancy
More informationHBase: column-oriented database
HBase: column-oriented database Software Languages Team University of Koblenz-Landau Ralf Lämmel and Andrei Varanovich Motivation Billions of rows X millions of columns X thousands of versions = terabytes
More informationCassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent
Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these
More informationbig picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures
Lecture 20 -- 11/20/2017 BigTable big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures what does paper say Google
More informationAzure-persistence MARTIN MUDRA
Azure-persistence MARTIN MUDRA Storage service access Blobs Queues Tables Storage service Horizontally scalable Zone Redundancy Accounts Based on Uri Pricing Calculator Azure table storage Storage Account
More informationReferences. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals
References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed
More information/ Cloud Computing. Recitation 10 March 22nd, 2016
15-319 / 15-619 Cloud Computing Recitation 10 March 22nd, 2016 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.3, OLI Unit 4, Module 15, Quiz 8 This week
More informationMapReduce & BigTable
CPSC 426/526 MapReduce & BigTable Ennan Zhai Computer Science Department Yale University Lecture Roadmap Cloud Computing Overview Challenges in the Clouds Distributed File Systems: GFS Data Process & Analysis:
More informationCISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL
CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours
More informationRapid Automated Indication of Link-Loss
Rapid Automated Indication of Link-Loss Fast recovery of failures is becoming a hot topic of discussion for many of today s Big Data applications such as Hadoop, HBase, Cassandra, MongoDB, MySQL, MemcacheD
More informationTrafodion Enterprise-Class Transactional SQL-on-HBase
Trafodion Enterprise-Class Transactional SQL-on-HBase Trafodion Introduction (Welsh for transactions) Joint HP Labs & HP-IT project for transactional SQL database capabilities on Hadoop Leveraging 20+
More informationMySQL Cluster Web Scalability, % Availability. Andrew
MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended
More informationGoogle File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information
Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute
More informationBigTable. CSE-291 (Cloud Computing) Fall 2016
BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes
More informationInsight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold
Insight Case Studies Tuning the Beloved DB-Engines Presented By Nithya Koka and Michael Arnold Who is Nithya Koka? Senior Hadoop Administrator Project Lead Client Engagement On-Call Engineer Cluster Ninja
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationPerformance Analysis of Hbase
Performance Analysis of Hbase Neseeba P.B, Dr. Zahid Ansari Department of Computer Science & Engineering, P. A. College of Engineering, Mangalore, 574153, India Abstract Hbase is a distributed column-oriented
More informationDistributed Systems [Fall 2012]
Distributed Systems [Fall 2012] Lec 20: Bigtable (cont ed) Slide acks: Mohsen Taheriyan (http://www-scf.usc.edu/~csci572/2011spring/presentations/taheriyan.pptx) 1 Chubby (Reminder) Lock service with a
More informationLarge-Scale GPU programming
Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University
More informationTRANSACTIONS OVER HBASE
TRANSACTIONS OVER HBASE Alex Baranau @abaranau Gary Helmling @gario Continuuity WHO WE ARE We ve built Continuuity Reactor: the world s first scale-out application server for Hadoop Fast, easy development,
More informationTRANSACTIONS AND ABSTRACTIONS
TRANSACTIONS AND ABSTRACTIONS OVER HBASE Andreas Neumann @anew68! Continuuity AGENDA Transactions over HBase: Why? What? Implementation: How? The approach Transaction Manager Abstractions Future WHO WE
More informationPart 1: Indexes for Big Data
JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,
More informationColumn Stores and HBase. Rui LIU, Maksim Hrytsenia
Column Stores and HBase Rui LIU, Maksim Hrytsenia December 2017 Contents 1 Hadoop 2 1.1 Creation................................ 2 2 HBase 3 2.1 Column Store Database....................... 3 2.2 HBase
More informationApache BookKeeper. A High Performance and Low Latency Storage Service
Apache BookKeeper A High Performance and Low Latency Storage Service Hello! I am Sijie Guo - PMC Chair of Apache BookKeeper Co-creator of Apache DistributedLog Twitter Messaging/Pub-Sub Team Yahoo! R&D
More informationHadoop An Overview. - Socrates CCDH
Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected
More informationYCSB++ benchmarking tool Performance debugging advanced features of scalable table stores
YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores Swapnil Patil M. Polte, W. Tantisiriroj, K. Ren, L.Xiao, J. Lopez, G.Gibson, A. Fuchs *, B. Rinaldi * Carnegie
More informationBigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612
Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Google Bigtable 2 A distributed storage system for managing structured data that is designed to scale to a very
More information4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS
W13.A.0.0 CS435 Introduction to Big Data W13.A.1 FAQs Programming Assignment 3 has been posted PART 2. LARGE SCALE DATA STORAGE SYSTEMS DISTRIBUTED FILE SYSTEMS Recitations Apache Spark tutorial 1 and
More informationDatabase Evolution. DB NoSQL Linked Open Data. L. Vigliano
Database Evolution DB NoSQL Linked Open Data Requirements and features Large volumes of data..increasing No regular data structure to manage Relatively homogeneous elements among them (no correlation between
More informationIndexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel
Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes
More informationHypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest Services. Introduction. Current Architecture
Hypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest Email Services Doug Judd CEO, Hypertable Inc. Introduction Rediff.com India (Nasdaq: REDF) is one of India's top Internet
More informationTuning Enterprise Information Catalog Performance
Tuning Enterprise Information Catalog Performance Copyright Informatica LLC 2015, 2018. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States
More informationPoS(ISGC 2011 & OGF 31)075
1 GIS Center, Feng Chia University 100 Wen0Hwa Rd., Taichung, Taiwan E-mail: cool@gis.tw Tien-Yin Chou GIS Center, Feng Chia University 100 Wen0Hwa Rd., Taichung, Taiwan E-mail: jimmy@gis.tw Lan-Kun Chung
More informationAbout Speaker. Shrwan Krishna Shrestha. /
PARTITION About Speaker Shrwan Krishna Shrestha shrwan@sqlpassnepal.org / shrwan@gmail.com 98510-50947 Topics to be covered Partition in earlier versions Table Partitioning Overview Benefits of Partitioned
More informationBigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng
Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:
More informationEsgynDB Enterprise 2.0 Platform Reference Architecture
EsgynDB Enterprise 2.0 Platform Reference Architecture This document outlines a Platform Reference Architecture for EsgynDB Enterprise, built on Apache Trafodion (Incubating) implementation with licensed
More informationDistributed Filesystem
Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the
More informationHadoop ecosystem. Nikos Parlavantzas
1 Hadoop ecosystem Nikos Parlavantzas Lecture overview 2 Objective Provide an overview of a selection of technologies in the Hadoop ecosystem Hadoop ecosystem 3 Hadoop ecosystem 4 Outline 5 HBase Hive
More informationGoogle File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo
Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance
More informationGoogle Cloud Bigtable. And what it's awesome at
Google Cloud Bigtable And what it's awesome at Jen Tong Developer Advocate Google Cloud Platform @MimmingCodes Agenda 1 Research 2 A story about bigness 3 How it works 4 When it's awesome Google Research
More informationRAID in Practice, Overview of Indexing
RAID in Practice, Overview of Indexing CS634 Lecture 4, Feb 04 2014 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke 1 Disks and Files: RAID in practice For a big enterprise
More informationHDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1
HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,
More informationFusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic
WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
More informationCluster-Level Google How we use Colossus to improve storage efficiency
Cluster-Level Storage @ Google How we use Colossus to improve storage efficiency Denis Serenyi Senior Staff Software Engineer dserenyi@google.com November 13, 2017 Keynote at the 2nd Joint International
More information