Faster HBase queries. Introducing hindex Secondary indexes for HBase. ApacheCon North America Rajeshbabu Chintaguntla


1 Faster HBase queries: Introducing hindex, secondary indexes for HBase. ApacheCon North America. Rajeshbabu Chintaguntla. HUAWEI TECHNOLOGIES CO., LTD.

2 $ whoami: Apache HBase committer. Senior Software Engineer, Huawei India R&D. Developed hindex, secondary indexes for HBase. Huawei Confidential

3 Agenda: HBase, a brief introduction; Introduction to hindex; Usage; Test Results.


5 Apache HBase: an open-source, distributed, versioned, non-relational database modeled after Google's BigTable. It leverages the distributed data storage provided by HDFS and allows random read/write access to data in HDFS. GOAL: hosting very large tables, billions of rows X millions of columns. [Figure: example table of person records (Ram, Anu, Pia, Jay, Suma, Som), including salary and dob columns.]

6 HBase Introduction. Sorted: rows are kept in lexicographic order of row key. [Figure: example person table.]

7 HBase Introduction. Sorted. Sparse: a row can have any number of columns. [Figure: example person table with some cells left empty.]

8 HBase Introduction. Sorted. Sparse. Multi-dimensional: data is versioned. Each cell is addressed as (row key, column family, column, timestamp) -> value. [Figure: one row expanded into cells across column families cf1 and cf2, for example name T1 Raj, gender T1 M, dept T1 OIH, title T1 SE, title T2 SSE, mobile T1 534.]
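The addressing scheme on this slide can be pictured as a nested map. The following is a minimal sketch in Python, not the HBase API: the class `TinyTable` and its methods are hypothetical names, and the sample rows are illustrative values loosely based on the slide's table.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of a sorted, sparse, versioned table (not the HBase API)."""

    def __init__(self):
        # (row key, column family, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, family, column, timestamp, value):
        self._cells[(row, family, column)][timestamp] = value

    def get(self, row, family, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if none is given."""
        versions = self._cells.get((row, family, column))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def row_keys(self):
        """Row keys in lexicographic (string) order, as a scan would visit them."""
        return sorted({row for row, _, _ in self._cells})


table = TinyTable()
table.put("141", "cf1", "title", 1, "SE")
table.put("141", "cf1", "title", 2, "SSE")  # newer version of the same cell
table.put("20", "cf1", "name", 1, "Anu")
table.put("9", "cf2", "mobile", 1, "534")   # sparse: this row has a single column
```

Note that `row_keys()` orders "141" before "20" before "9": keys compare as byte strings, not as numbers, which is why row key design matters for scans.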

9 HBase Introduction. Sorted. Sparse. Multi-dimensional. Distributed: leverages HDFS, so data is replicated across nodes, which protects against a failing node.

10 HBase Introduction. Sorted. Sparse. Multi-dimensional. Distributed. Auto-sharded: tables are split into regions and re-distributed as data grows.

11 HBase Introduction. Sorted. Sparse. Multi-dimensional. Distributed. Auto-sharded: metadata records the row-key range of each region. [Figure: regions of the example table next to a Region/Range metadata table.]

12 HBase Introduction. [Figure: an HMaster and two region servers; each region holds one store per column family (cf1, cf2), and each store has a memstore plus HFiles made of blocks HBlock1..HBlockN.] Master, region servers and ZooKeeper. A table is divided horizontally into regions. Columns are grouped into column families, a vertical partition of the table. Memstore in memory, HFiles in the DFS. HFiles are logically split into smaller blocks, and data is read and written as blocks. MapReduce integration.

13 Coprocessors: allow client-supplied code to run on the server side, extending the functionality of HBase without changing the kernel. [Figure: Client -> Pre-Action -> Action -> Post-Action.] Observers are like triggers: they run extended functionality before or after an action through hooks provided by the coprocessor framework. Endpoints are like stored procedures: they can be invoked at any time from the client; the endpoint implementation is executed remotely at the target region(s), and the results of those executions are returned to the client.
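The trigger-like behaviour of observers can be sketched with a plain hook pattern. This is an illustration only, assuming hypothetical class names (`Observer`, `AuditObserver`, `Region`); it does not reproduce the HBase coprocessor interfaces.

```python
class Observer:
    """Base class: hooks do nothing unless a subclass overrides them."""

    def pre_put(self, row, value):
        pass

    def post_put(self, row, value):
        pass


class AuditObserver(Observer):
    """Example observer that records when each hook fires."""

    def __init__(self):
        self.events = []

    def pre_put(self, row, value):
        self.events.append(("pre", row))

    def post_put(self, row, value):
        self.events.append(("post", row))


class Region:
    """Toy region that runs the registered hooks around the actual put."""

    def __init__(self, observers):
        self.observers = list(observers)
        self.rows = {}

    def put(self, row, value):
        for observer in self.observers:
            observer.pre_put(row, value)   # Pre-Action
        self.rows[row] = value             # Action
        for observer in self.observers:
            observer.post_put(row, value)  # Post-Action


audit = AuditObserver()
region = Region([audit])
region.put("141", "Ram")
```

The caller of `put` never references the observer, which mirrors the point on the slide: functionality is extended on the server side without the client changing.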

14 Filters. [Figure source: Lars George, HBase: The Definitive Guide, O'Reilly Media.]

15 HBase Query. SELECT NAME FROM PERSON WHERE KEY=141. NOTE: HBase has no native support for an SQL-like query interface; SQL is used here only for ease of understanding. A similar query can be done using scanners and filters.

16 HBase Query by Row. SELECT NAME FROM PERSON WHERE KEY=141. [Figure: example person table.]

17-21 HBase Query by Column (w/o Index). SELECT NAME FROM PERSON WHERE MOBILE= [value]. Without an index on MOBILE this needs a full table scan. With billions of records, a full table scan is evil to performance. Side effects: client timeouts, lease expiring.

22 Agenda: HBase, a brief introduction; Introduction to hindex; Usage; Test Results.

23 Introducing hindex. A coprocessor-based, server-side implementation of a secondary indexing solution. One separate index table per user table, used by all indexes of that table. Region-wise indexing (aka local indexing). A custom load balancer co-locates the index table regions with the actual table regions. Index table row key construction: region start key + index name + indexed column value(s) + user table row key.
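The row key layout can be illustrated with plain byte concatenation. This is a sketch under assumptions: the function name `index_row_key` is hypothetical, and the real implementation may pad or encode each part (the deck's tool options mention a datatype and length per indexed column). The sample values come from the Put operation slide.

```python
def index_row_key(region_start: bytes, index_name: bytes,
                  column_value: bytes, user_row: bytes) -> bytes:
    """Concatenate the four parts in the order given on the slide.

    Assumption: no padding or separators, matching the deck's own
    example keys such as Aidx15AAB.
    """
    return region_start + index_name + column_value + user_row


# Values from the Put slide: region start A, user row AAB, cf:q1=5, cf:q2=z1.
key_idx1 = index_row_key(b"A", b"idx1", b"5", b"AAB")
key_idx2 = index_row_key(b"A", b"idx2", b"z1", b"AAB")
```

Leading with the region start key keeps every index entry inside the key range of the index region that is co-located with the user region, and leading the remainder with the index name groups each index's entries together.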

24 hindex: Architecture. [Figure: Client App with HBase Client and Client Extn; HMaster with Coprocessor Host and Balancer; RegionServer with Coprocessor Host and Indexing Coprocessor.] The coprocessor handles the index data. A custom LoadBalancer does the co-location. The client extension allows specifying index details while creating a table; it is not needed for read/write. Primary user table (RowKey, cf1:col1): 001 A, 002 B, 003 Z, 004 C, 005 A, 006 A. Secondary index table (RowKey, cf): one entry per user row, keyed by the column value followed by the user row key, so entries sort by value (A, B, C, Z) and the last entry ends in _Z_003.

25-31 HBase Query by Column (w/ Index). SELECT NAME FROM PERSON WHERE MOBILE= [value]. Index column: MOBILE. Index idx1 stores one entry per user row, formed from the MOBILE value, then "_", then the user table row key. The row key suffix ensures user table row order when a column value is repeated in multiple rows. The index is maintained per region, not globally, so network calls are avoided. Region movement and region splits have to be handled.

32 Regions Co-location. [Figure: the client creates a table with regions; the balancer on the HMaster places each region of the actual table and the matching region of the index table (R1, R2) on the same region server (RS1, RS2).]

33 Put operation. Table -> t1, column family -> cf. Indexes -> idx1(cf:q1) and idx2(cf:q2). Index table -> t1_idx. The client sends Put t1, AAB, cf:q1, 5, cf:q2, z1 to the HRegionServer hosting user region R1 [A, B). The coprocessor then writes to the co-located index region R1 [A, B): Put t1_idx, Aidx15AAB and Put t1_idx, Aidx2z1AAB.

34 Scan Operation. 1) Create a scanner for the index region at the server side. The client creates a scanner with the condition cf:q1=5 on user region R1 [A, B) of the HRegionServer. The coprocessor creates a scanner on the index region with start row Aidx15 and stop row Aidx16.
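The start and stop rows on this slide follow from a prefix scan: the stop row is the smallest key that no longer starts with the prefix. A minimal sketch, assuming plain byte concatenation for the prefix; `next_prefix` and `equality_scan_range` are hypothetical helper names, not hindex functions.

```python
def next_prefix(prefix: bytes) -> bytes:
    """Smallest byte string greater than every key that starts with `prefix`."""
    trimmed = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not trimmed:
        raise ValueError("prefix has no upper bound")
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def equality_scan_range(region_start: bytes, index_name: bytes, value: bytes):
    """Start row (inclusive) and stop row (exclusive) for `column = value`."""
    start = region_start + index_name + value
    return start, next_prefix(start)


start_row, stop_row = equality_scan_range(b"A", b"idx1", b"5")
```

The same rule reproduces the ranges shown later in the deck, for example `idx1_oih` to `idx1_oii` and `idx2_tl` to `idx2_tm`.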

35 Scan Operation. 2) Scan the index table at the server side and seek to the exact rows in the user table. [Figure: numbered steps 1 to 5 between client, coprocessor, index region and user region, ending with "seek to exact row" and next().]

36 Scan Operation. Coprocessors read the index and seek to the exact row in the user table. Seeks on HFiles are based on the rows obtained from the index data. HFiles are read block by block; the default block size is 64 KB. Block reads from HDFS are skipped where the data is not present at all, and sometimes a full HFile is skipped. Index details need not be sent back to the client, which avoids extra network usage.
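Why seeking by row skips block reads can be shown with a toy block index. This is a simplified sketch, not the HFile format: it assumes each file exposes the first key of every block plus its last key, and `block_to_read` is a hypothetical helper.

```python
from bisect import bisect_right


def block_to_read(first_keys, last_key, row):
    """Index of the only block that can hold `row`, or None if the file can be skipped.

    first_keys: ascending first row key of each block (assumed layout).
    last_key:   last row key stored in the file.
    """
    if not first_keys or row < first_keys[0] or row > last_key:
        return None  # row is outside this file: skip the whole file
    return bisect_right(first_keys, row) - 1


first_keys = ["100", "200", "300", "400"]  # four blocks in one file
last_key = "499"
rows_from_index = ["120", "350", "900"]    # user rows obtained from the index scan

blocks = sorted({
    block
    for block in (block_to_read(first_keys, last_key, row) for row in rows_from_index)
    if block is not None
})
```

Here only two of the four blocks are read, and the lookup for row "900" touches no block of this file at all, whereas a scan without the index would read every block.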

37 hindex: Usage. Filters (with equal or range conditions) and filter lists can use the index. Example: SELECT NAME FROM PERSON WHERE (DEPT='OIH' OR TITLE='TL') AND (MOBILE > 400 AND MOBILE < 500). AND corresponds to a filter list with MUST_PASS_ALL; OR corresponds to a filter list with MUST_PASS_ONE.

38 HBase Query with AND. SELECT NAME FROM PERSON WHERE DEPT=OIH AND TITLE=TL. [Figure: index idx1 on DEPT (entries such as OIH_141, OIH_152, SOA_145) and index idx2 on TITLE (entries such as TL_141, SA_145) next to the example person table.]

39 HBase Query with AND. SELECT NAME FROM PERSON WHERE DEPT=OIH AND TITLE=TL. [Figure: the same entries stored in one index table, each prefixed with its index name, for example idx1_oih_141, idx1_oih_152, idx1_soa_145, idx2_tl_141, idx2_tl_152.] 1) A single index table per user table is easier to collocate. 2) Index table rows carry the index name, so each index's data is stored together.

40-47 HBase Query with AND. SELECT NAME FROM PERSON WHERE DEPT=OIH AND TITLE=TL. Create two scanners on the index table: 1) start: idx1_oih, end: idx1_oii; 2) start: idx2_tl, end: idx2_tm. At each step, compare the user table rows the two scanners point at. If scanner-1 is behind (<), scanner-1 can jump forward. If both point at the same row (=), the row meets the condition: fetch the required data and move both scanners to next. If scanner-1 is ahead (>), scanner-2 can jump forward, for example to idx2_tl_152. In the example the matches are Pia and Som (row 152). When scanner-1 reaches its end, close both scanners.
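The two-scanner AND evaluation amounts to intersecting two sorted lists of user row keys, letting the lagging side jump forward. A minimal sketch, assuming each scanner yields user row keys in ascending order; `intersect_sorted` is a hypothetical name and the row keys are illustrative, not the exact values of the slides.

```python
from bisect import bisect_left


def intersect_sorted(rows_a, rows_b):
    """Rows present in both ascending lists; the side that is behind seeks ahead."""
    i = j = 0
    matches = []
    while i < len(rows_a) and j < len(rows_b):
        if rows_a[i] == rows_b[j]:
            matches.append(rows_a[i])  # meets the condition: fetch the row
            i += 1
            j += 1
        elif rows_a[i] < rows_b[j]:
            # scanner-1 is behind: jump to the first entry >= scanner-2's row
            i = bisect_left(rows_a, rows_b[j], i + 1)
        else:
            # scanner-2 is behind: jump to the first entry >= scanner-1's row
            j = bisect_left(rows_b, rows_a[i], j + 1)
    return matches


dept_oih_rows = ["141", "150", "152"]          # from the scan idx1_oih .. idx1_oii
title_tl_rows = ["145", "150", "152", "160"]   # from the scan idx2_tl .. idx2_tm
and_result = intersect_sorted(dept_oih_rows, title_tl_rows)
```

The loop ends as soon as either list is exhausted, which corresponds to closing both scanners when scanner-1 reaches its end.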

48-52 HBase Query with AND (w/ 2-column index). SELECT NAME FROM PERSON WHERE DEPT=OIH AND TITLE=TL. With idx1 defined on both columns, index rows look like idx1_bdi_tl_160, idx1_oih_sa_141, idx1_oih_tl_152 and idx1_soa_tl_145, and a single scanner suffices: start: idx1_oih_tl, end: idx1_oih_tm. Every row the scanner returns meets the condition, so the required data can be fetched directly (Pia, then Som). When the scanner reaches its end, close the scanner.

53-63 HBase Query with OR. SELECT NAME FROM PERSON WHERE DEPT=OIH OR TITLE=TL. Two scanners: 1) start: idx1_oih, end: idx1_oii; 2) start: idx2_tl, end: idx2_tm. At each step, compare the user table rows the two scanners point at. If scanner-1 is behind (<=), get the required results and move scanner-1 to next until it exceeds the row of scanner-2. If both point at the same row (==), get the required results and move both scanners. If scanner-2 is behind (>=), get the required results and move scanner-2 to next until it exceeds the row of scanner-1. Results accumulate as the scanners advance: Raj, Anu, Pia, Jay, Suma, Som.
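The OR evaluation is a merge of two sorted lists in which a row found by both scanners is emitted once. A minimal sketch under the same assumption as for AND (each scanner yields user row keys in ascending order); `union_sorted` is a hypothetical name and the row keys are illustrative.

```python
def union_sorted(rows_a, rows_b):
    """Merge two ascending lists of row keys, emitting shared rows only once."""
    i = j = 0
    merged = []
    while i < len(rows_a) or j < len(rows_b):
        if j >= len(rows_b) or (i < len(rows_a) and rows_a[i] < rows_b[j]):
            merged.append(rows_a[i])  # scanner-1 is behind: emit and advance it
            i += 1
        elif i >= len(rows_a) or rows_b[j] < rows_a[i]:
            merged.append(rows_b[j])  # scanner-2 is behind: emit and advance it
            j += 1
        else:
            merged.append(rows_a[i])  # same row on both sides: emit once, advance both
            i += 1
            j += 1
    return merged


dept_oih_rows = ["141", "150", "152"]
title_tl_rows = ["145", "150", "152", "160"]
or_result = union_sorted(dept_oih_rows, title_tl_rows)
```

Unlike the AND case, neither side can jump: every entry of both scanners contributes a result, so the loop runs until both lists are exhausted.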

64 Region Split. An explicit split of an index region is avoided (using a custom split policy for index tables). When a user table region splits, the corresponding index region also splits, at the same split point as the user region. Example: a user region starting at 01 holds rows such as 01 A, 02 A, 03 C, 04 B, 05 X, 06 A, and its index region holds 01_A_01, 01_A_02, 01_A_06, 01_A_07, 01_B_04, 01_C_03, 01_X_05. After a split at row 05, the first daughter's index region holds 01_A_01, 01_A_02, 01_B_04, 01_C_03, and the second daughter's index rows take the new region start 05, for example 05_A_06 and 05_X_05.

65 Region Split. A custom HalfStoreFileReader, IndexHalfStoreFileReader, is used for reading index daughter regions. For the index table, both half store file readers start at the same point, i.e. the beginning of the file. Each reader checks the actual table row in an index entry and decides whether the KV corresponds to its daughter or not. The IndexHalfStoreFileReader for daughter B also changes the row key as per the daughter's start key.
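The split rule can be sketched as a filter over the parent's index entries: each entry goes to the daughter that owns its user row, and entries for the second daughter get the new region start as prefix. This is an illustration only, assuming the "start_value_row" key layout of the slide's example; `split_index_rows` is a hypothetical function, not the IndexHalfStoreFileReader itself.

```python
def split_index_rows(index_rows, split_row):
    """Distribute parent index rows between two daughters.

    index_rows: keys of the form "<region start>_<value>_<user row>".
    split_row:  user table row key at which the user region splits.
    """
    daughter_a, daughter_b = [], []
    for key in index_rows:
        region_start, value, user_row = key.split("_")
        if user_row < split_row:
            daughter_a.append(key)  # keeps the parent's region start
        else:
            # rewrite the prefix to the second daughter's start key
            daughter_b.append("_".join([split_row, value, user_row]))
    return sorted(daughter_a), sorted(daughter_b)


parent_index = ["01_A_01", "01_A_02", "01_A_06", "01_A_07",
                "01_B_04", "01_C_03", "01_X_05"]
index_a, index_b = split_index_rows(parent_index, "05")
```

The index region cannot simply be cut at its own midpoint because its keys are ordered by column value, not by user row; entries for both daughters are interleaved throughout the parent file, which is why both readers start at the beginning of the file.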

66 Agenda: HBase, a brief introduction; Introduction to hindex; Usage; Test Results.

67 Usage: Getting Started. For HBase 0.94.x. For HBase 0.98 or trunk.

68 Usage: Configurations (name = value).
hbase.coprocessor.master.classes = org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver
hbase.coprocessor.region.classes = org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver
hbase.coprocessor.wal.classes = org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver
hbase.master.loadbalancer.class = org.apache.hadoop.hbase.index.SecIndexLoadBalancer
hbase.use.secondary.index = true

69 Usage: Creating Index

70 Usage: Tools.
TableIndexer: tool to create index(es) for existing data.
  $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=table -Dtable.columns.index='IDX1=>cf1:[q1->datatype& length]'
Bulk load tools: load user data into the user table and index it at the same time.
  $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir>
  $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFiles <hdfs://storefileoutput> <tablename>
Tool to check region co-location and repair any co-location mismatches.
  $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.index.util.SecondaryIndexColocator

71 Agenda: HBase, a brief introduction; Introduction to hindex; Usage; Test Results.

72 Test Results: Put Performance. Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 100 GB, 500 bytes per record.

73 Test Results: Scan Performance. Search for a column value; indexes idx1 (cf:q1) and idx2 (cf:q2). Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.

74 Test Results: Query with AND. Indexes idx1 (cf:q1) and idx2 (cf:q2). Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.

75 Test Results: Scan w/ Range Query. Index idx3 (cf:q3). Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.

76 Test Results: Scan w/ Multi Column Index. Index idx4 -> cf:q1, cf:q3. Hardware: architecture x86_64, 24 CPUs (2 threads per core), RS heap size 8 GB. Topology: 5 region servers, 100 regions (user table). Data: 50 GB, 500 bytes per record.

77 Summary.
Design: supports multiple indexes and multi-column indexes on a table; supports indexing on part of a column value; supports equal and range condition scans using the index; supports dynamic add/drop of indexes; supports hints to skip the index scan or to name specific indexes to use in the scan; intelligent filter evaluation.
Application usage: no changes are required to perform read and write operations. Use IndexAdmin (client extension) to perform admin operations like create, enable, disable and drop on an indexed table; admin operations need not be performed separately on the index table.
Upgrade/Integration: minimal code changes in the HBase kernel, so an HBase version upgrade is very easy.

78 Roadmap. Contribute to the HBase community (in progress, refer to HBASE-9203). Integration with Apache Phoenix. HBCK tool support for secondary index tables. Pluggable scan evaluation.

79 Q & A

80 Thank you. Copyright 2011 Huawei Technologies Co., Ltd. All Rights Reserved.


More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014 NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

A Glimpse of the Hadoop Echosystem

A Glimpse of the Hadoop Echosystem A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other

More information

Ghislain Fourny. Big Data 5. Wide column stores

Ghislain Fourny. Big Data 5. Wide column stores Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب

Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب Fattane Zarrinkalam کارگاه ساالنه آزمایشگاه فناوری وب 1391 زمستان Outlines Introduction DataModel Architecture HBase vs. RDBMS HBase users 2 Why Hadoop? Datasets are growing to Petabytes Traditional datasets

More information

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables

COSC 6339 Big Data Analytics. NoSQL (II) HBase. Edgar Gabriel Fall HBase. Column-Oriented data store Distributed designed to serve large tables COSC 6339 Big Data Analytics NoSQL (II) HBase Edgar Gabriel Fall 2018 HBase Column-Oriented data store Distributed designed to serve large tables Billions of rows and millions of columns Runs on a cluster

More information

Integration of Apache Hive

Integration of Apache Hive Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org @enissoz Page 1 Agenda Overview of Hive and HBase Hive + HBase Features and Improvements Future of Hive and HBase Q&A Page

More information

Introduction to BigData, Hadoop:-

Introduction to BigData, Hadoop:- Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,

More information

CS November 2018

CS November 2018 Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Scaling Up HBase Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials

More information

Typical size of data you deal with on a daily basis

Typical size of data you deal with on a daily basis Typical size of data you deal with on a daily basis Processes More than 161 Petabytes of raw data a day https://aci.info/2014/07/12/the-dataexplosion-in-2014-minute-by-minuteinfographic/ On average, 1MB-2MB

More information

CS November 2017

CS November 2017 Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

The State of Apache HBase. Michael Stack

The State of Apache HBase. Michael Stack The State of Apache HBase Michael Stack Michael Stack Chair of the Apache HBase PMC* Caretaker/Janitor Member of the Hadoop PMC Engineer at Cloudera in SF * Project Management

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang

Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang International Conference on Engineering Management (Iconf-EM 2016) Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang School of

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

CSE-E5430 Scalable Cloud Computing Lecture 9

CSE-E5430 Scalable Cloud Computing Lecture 9 CSE-E5430 Scalable Cloud Computing Lecture 9 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 15.11-2015 1/24 BigTable Described in the paper: Fay

More information

Facebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011

Facebook. The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook. March 11, 2011 HBase @ Facebook The Technology Behind Messages (and more ) Kannan Muthukkaruppan Software Engineer, Facebook March 11, 2011 Talk Outline the new Facebook Messages, and how we got started with HBase quick

More information

HBase Security. Works in Progress Andrew Purtell, an Intel HBase guy

HBase Security. Works in Progress Andrew Purtell, an Intel HBase guy HBase Security Works in Progress Andrew Purtell, an Intel HBase guy Outline Project Rhino Extending the HBase ACL model to per-cell granularity Introducing transparent encryption I don t always need encryption,

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

Advanced HBase Schema Design. Berlin Buzzwords, June 2012 Lars George

Advanced HBase Schema Design. Berlin Buzzwords, June 2012 Lars George Advanced HBase Schema Design Berlin Buzzwords, June 2012 Lars George lars@cloudera.com About Me SoluDons Architect @ Cloudera Apache HBase & Whirr CommiIer Author of HBase The Defini.ve Guide Working with

More information

Rails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011

Rails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011 Rails on HBase Zachary Pinter and Tony Hillerson RailsConf 2011 What we will cover What is it? What are the tradeoffs that HBase makes? Why HBase is probably the wrong choice for your app Why HBase might

More information

Introduction to Column Oriented Databases in PHP

Introduction to Column Oriented Databases in PHP Introduction to Column Oriented Databases in PHP By Slavey Karadzhov About Me Name: Slavey Karadzhov (Славей Караджов) Programmer since my early days PHP programmer since 1999

More information

CA485 Ray Walshe NoSQL

CA485 Ray Walshe NoSQL NoSQL BASE vs ACID Summary Traditional relational database management systems (RDBMS) do not scale because they adhere to ACID. A strong movement within cloud computing is to utilize non-traditional data

More information

SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013

SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 1 WHO AM I? Ryan Tabora Think Big Analytics - Senior Data Engineer Lover of dachshunds,

More information

GridGain and Apache Ignite In-Memory Performance with Durability of Disk

GridGain and Apache Ignite In-Memory Performance with Durability of Disk GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite

More information

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13 Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University

More information

Pyro: A Spatial-Temporal Big-Data Storage System. Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher

Pyro: A Spatial-Temporal Big-Data Storage System. Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher Pyro: A Spatial-Temporal Big-Data Storage System Shen Li Shaohan Hu Raghu Ganti Mudhakar Srivatsa Tarek Abdelzaher 1 Applications A huge amount of geo- tagged events are generated and stored in real- 5me.

More information

A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP

A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP A Brief Introduction of TiDB Dongxu (Edward) Huang CTO, PingCAP About me Dongxu (Edward) Huang, Cofounder & CTO of PingCAP PingCAP, based in Beijing, China. Infrastructure software engineer, open source

More information

Column-Family Databases Cassandra and HBase

Column-Family Databases Cassandra and HBase Column-Family Databases Cassandra and HBase Kevin Swingler Google Big Table Google invented BigTableto store the massive amounts of semi-structured data it was generating Basic model stores items indexed

More information

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data

More information

Strategies for Incremental Updates on Hive

Strategies for Incremental Updates on Hive Strategies for Incremental Updates on Hive Copyright Informatica LLC 2017. Informatica, the Informatica logo, and Big Data Management are trademarks or registered trademarks of Informatica LLC in the United

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2008 Quiz II There are 14 questions and 11 pages in this quiz booklet. To receive

More information

A Security Audit Module for HBase

A Security Audit Module for HBase 2016 Joint International Conference on Artificial Intelligence and Computer Engineering (AICE 2016) and International Conference on Network and Communication Security (NCS 2016) ISBN: 978-1-60595-362-5

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering Big Data Processing Technologies Chentao Wu Associate Professor Dept. of Computer Science and Engineering wuct@cs.sjtu.edu.cn Schedule (1) Storage system part (first eight weeks) lec1: Introduction on

More information

Using space-filling curves for multidimensional

Using space-filling curves for multidimensional Using space-filling curves for multidimensional indexing Dr. Bisztray Dénes Senior Research Engineer 1 Nokia Solutions and Networks 2014 In medias res Performance problems with RDBMS Switch to NoSQL store

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

Enable Spark SQL on NoSQL Hbase tables with HSpark IBM Code Tech Talk. February 13, 2018

Enable Spark SQL on NoSQL Hbase tables with HSpark IBM Code Tech Talk. February 13, 2018 Enable Spark SQL on NoSQL Hbase tables with HSpark IBM Code Tech Talk February 13, 2018 https://developer.ibm.com/code/techtalks/enable-spark-sql-onnosql-hbase-tables-with-hspark-2/ >> MARC-ARTHUR PIERRE

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle

More information

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores CSE 444: Database Internals Lectures 26 NoSQL: Extensible Record Stores CSE 444 - Spring 2014 1 References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)

More information

Distributed Data Analytics Partitioning

Distributed Data Analytics Partitioning G-3.1.09, Campus III Hasso Plattner Institut Different mechanisms but usually used together Distributing Data Replication vs. Replication Store copies of the same data on several nodes Introduces redundancy

More information

HBase: column-oriented database

HBase: column-oriented database HBase: column-oriented database Software Languages Team University of Koblenz-Landau Ralf Lämmel and Andrei Varanovich Motivation Billions of rows X millions of columns X thousands of versions = terabytes

More information

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these

More information

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures Lecture 20 -- 11/20/2017 BigTable big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures what does paper say Google

More information

Azure-persistence MARTIN MUDRA

Azure-persistence MARTIN MUDRA Azure-persistence MARTIN MUDRA Storage service access Blobs Queues Tables Storage service Horizontally scalable Zone Redundancy Accounts Based on Uri Pricing Calculator Azure table storage Storage Account

More information

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed

More information

/ Cloud Computing. Recitation 10 March 22nd, 2016

/ Cloud Computing. Recitation 10 March 22nd, 2016 15-319 / 15-619 Cloud Computing Recitation 10 March 22nd, 2016 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.3, OLI Unit 4, Module 15, Quiz 8 This week

More information

MapReduce & BigTable

MapReduce & BigTable CPSC 426/526 MapReduce & BigTable Ennan Zhai Computer Science Department Yale University Lecture Roadmap Cloud Computing Overview Challenges in the Clouds Distributed File Systems: GFS Data Process & Analysis:

More information

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours

More information

Rapid Automated Indication of Link-Loss

Rapid Automated Indication of Link-Loss Rapid Automated Indication of Link-Loss Fast recovery of failures is becoming a hot topic of discussion for many of today s Big Data applications such as Hadoop, HBase, Cassandra, MongoDB, MySQL, MemcacheD

More information

Trafodion Enterprise-Class Transactional SQL-on-HBase

Trafodion Enterprise-Class Transactional SQL-on-HBase Trafodion Enterprise-Class Transactional SQL-on-HBase Trafodion Introduction (Welsh for transactions) Joint HP Labs & HP-IT project for transactional SQL database capabilities on Hadoop Leveraging 20+

More information

MySQL Cluster Web Scalability, % Availability. Andrew

MySQL Cluster Web Scalability, % Availability. Andrew MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended

More information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute

More information

BigTable. CSE-291 (Cloud Computing) Fall 2016

BigTable. CSE-291 (Cloud Computing) Fall 2016 BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes

More information

Insight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold

Insight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold Insight Case Studies Tuning the Beloved DB-Engines Presented By Nithya Koka and Michael Arnold Who is Nithya Koka? Senior Hadoop Administrator Project Lead Client Engagement On-Call Engineer Cluster Ninja

More information

Achieving Horizontal Scalability. Alain Houf Sales Engineer

Achieving Horizontal Scalability. Alain Houf Sales Engineer Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches

More information

Performance Analysis of Hbase

Performance Analysis of Hbase Performance Analysis of Hbase Neseeba P.B, Dr. Zahid Ansari Department of Computer Science & Engineering, P. A. College of Engineering, Mangalore, 574153, India Abstract Hbase is a distributed column-oriented

More information

Distributed Systems [Fall 2012]

Distributed Systems [Fall 2012] Distributed Systems [Fall 2012] Lec 20: Bigtable (cont ed) Slide acks: Mohsen Taheriyan (http://www-scf.usc.edu/~csci572/2011spring/presentations/taheriyan.pptx) 1 Chubby (Reminder) Lock service with a

More information

Large-Scale GPU programming

Large-Scale GPU programming Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University

More information

TRANSACTIONS OVER HBASE

TRANSACTIONS OVER HBASE TRANSACTIONS OVER HBASE Alex Baranau @abaranau Gary Helmling @gario Continuuity WHO WE ARE We ve built Continuuity Reactor: the world s first scale-out application server for Hadoop Fast, easy development,

More information

TRANSACTIONS AND ABSTRACTIONS

TRANSACTIONS AND ABSTRACTIONS TRANSACTIONS AND ABSTRACTIONS OVER HBASE Andreas Neumann @anew68! Continuuity AGENDA Transactions over HBase: Why? What? Implementation: How? The approach Transaction Manager Abstractions Future WHO WE

More information

Part 1: Indexes for Big Data

Part 1: Indexes for Big Data JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,

More information

Column Stores and HBase. Rui LIU, Maksim Hrytsenia

Column Stores and HBase. Rui LIU, Maksim Hrytsenia Column Stores and HBase Rui LIU, Maksim Hrytsenia December 2017 Contents 1 Hadoop 2 1.1 Creation................................ 2 2 HBase 3 2.1 Column Store Database....................... 3 2.2 HBase

More information

Apache BookKeeper. A High Performance and Low Latency Storage Service

Apache BookKeeper. A High Performance and Low Latency Storage Service Apache BookKeeper A High Performance and Low Latency Storage Service Hello! I am Sijie Guo - PMC Chair of Apache BookKeeper Co-creator of Apache DistributedLog Twitter Messaging/Pub-Sub Team Yahoo! R&D

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores Swapnil Patil M. Polte, W. Tantisiriroj, K. Ren, L.Xiao, J. Lopez, G.Gibson, A. Fuchs *, B. Rinaldi * Carnegie

More information

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Google Bigtable 2 A distributed storage system for managing structured data that is designed to scale to a very

More information

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS W13.A.0.0 CS435 Introduction to Big Data W13.A.1 FAQs Programming Assignment 3 has been posted PART 2. LARGE SCALE DATA STORAGE SYSTEMS DISTRIBUTED FILE SYSTEMS Recitations Apache Spark tutorial 1 and

More information

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano Database Evolution DB NoSQL Linked Open Data Requirements and features Large volumes of data..increasing No regular data structure to manage Relatively homogeneous elements among them (no correlation between

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

Hypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest Services. Introduction. Current Architecture

Hypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest  Services. Introduction. Current Architecture Hypertable: The Storage Infrastructure behind Rediffmail - one of the World s Largest Email Services Doug Judd CEO, Hypertable Inc. Introduction Rediff.com India (Nasdaq: REDF) is one of India's top Internet

More information

Tuning Enterprise Information Catalog Performance

Tuning Enterprise Information Catalog Performance Tuning Enterprise Information Catalog Performance Copyright Informatica LLC 2015, 2018. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States

More information

PoS(ISGC 2011 & OGF 31)075

PoS(ISGC 2011 & OGF 31)075 1 GIS Center, Feng Chia University 100 Wen0Hwa Rd., Taichung, Taiwan E-mail: cool@gis.tw Tien-Yin Chou GIS Center, Feng Chia University 100 Wen0Hwa Rd., Taichung, Taiwan E-mail: jimmy@gis.tw Lan-Kun Chung

More information

About Speaker. Shrwan Krishna Shrestha. /

About Speaker. Shrwan Krishna Shrestha. / PARTITION About Speaker Shrwan Krishna Shrestha shrwan@sqlpassnepal.org / shrwan@gmail.com 98510-50947 Topics to be covered Partition in earlier versions Table Partitioning Overview Benefits of Partitioned

More information

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:

More information

EsgynDB Enterprise 2.0 Platform Reference Architecture

EsgynDB Enterprise 2.0 Platform Reference Architecture EsgynDB Enterprise 2.0 Platform Reference Architecture This document outlines a Platform Reference Architecture for EsgynDB Enterprise, built on Apache Trafodion (Incubating) implementation with licensed

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

Hadoop ecosystem. Nikos Parlavantzas

Hadoop ecosystem. Nikos Parlavantzas 1 Hadoop ecosystem Nikos Parlavantzas Lecture overview 2 Objective Provide an overview of a selection of technologies in the Hadoop ecosystem Hadoop ecosystem 3 Hadoop ecosystem 4 Outline 5 HBase Hive

More information

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance

More information

Google Cloud Bigtable. And what it's awesome at

Google Cloud Bigtable. And what it's awesome at Google Cloud Bigtable And what it's awesome at Jen Tong Developer Advocate Google Cloud Platform @MimmingCodes Agenda 1 Research 2 A story about bigness 3 How it works 4 When it's awesome Google Research

More information

RAID in Practice, Overview of Indexing

RAID in Practice, Overview of Indexing RAID in Practice, Overview of Indexing CS634 Lecture 4, Feb 04 2014 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke 1 Disks and Files: RAID in practice For a big enterprise

More information

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1 HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,

More information

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Cluster-Level Google How we use Colossus to improve storage efficiency

Cluster-Level Google How we use Colossus to improve storage efficiency Cluster-Level Storage @ Google How we use Colossus to improve storage efficiency Denis Serenyi Senior Staff Software Engineer dserenyi@google.com November 13, 2017 Keynote at the 2nd Joint International

More information