A Graph-based Database Partitioning Method for Parallel OLAP Query Processing

Size: px

Start display at page:

Download "A Graph-based Database Partitioning Method for Parallel OLAP Query Processing"

Lenard Thompson
5 years ago
Views:

1 ICDE 18 A Graph-based Database Partitioning Method for Parallel OLAP Query Processing Yoon-Min Nam, Min-Soo Kim*, Donghyoung Han Department of Information and Communication Engineering DGIST, Republic of Korea * a corresponding author

2 Outline Introduction Motivation GPT: Graph-based database partitioning scheme HMC: Hash-based multi-column partitioning Experimental results Conclusions 2

3 Join processing in parallel system Need shuffle relations according to their join key values - due to their data distribution, i.e., blocks in HDFS - shuffle becomes more expensive as the input size increases R " R " ' S " ' S " R % R % ' S % ' S % R & R & ' S & ' S & Shuffle R Shuffle S 3

4 Database partitioning Co-locate a set of rows according to their join key values - to avoid expensive shuffle, i.e., co-partitioned join Pre-shuffle of R R P " (R) P " (R) P " (S) P " (S) Pre-shuffle of S S P % (R) P % (R) P % (S) P % (S) P & (R) P & (R) P & (S) P & (S) Horizontal partitioning R Horizontal partitioning S 4

5 Three important issues in database partitioning Which tables are partitioned? Which column(s) is/are used for table partitioning? How to partition a table? (e.g., hash / range / round-robin / ) ss_sold_date ss_item ss_cdemo ss_addr ss_promo ss_quantity ss_quantity ss_sales_price ss_ext_sales_price ss_ext_price_list ss_coupon_amt ss_net_paid_inc_tax ss_sold_time ss_customer ss_hdemo ss_store ss_ticket_number ss_wholesales_cost ss_list_price ss_ext_discount_amt ss_ext_wholesales_cost ss_ext_tax ss_net_paid ss_net_profit (a) BioWarehouse schema [BMC 15] (b) A list of 24 columns of Store_Sales table in TPC-DS 5

Existing database partitioning schemes REF [SIGMOD 08] -

Lineitem (child) (a) Referential constraint in TPC-H schema

approach by additionally exploiting join predicate - more

6 Existing database partitioning schemes REF [SIGMOD 08] - exploit referential constraint references REF Orders (parent) Lineitem (child) (a) Referential constraint in TPC-H schema (b) REF partitioned tables PREF [SIGMOD 15] - generalize REF approach by additionally exploiting join predicate - more chance to co-partition tables - use tree-structured partitioning scheme 6

Drawback of PREF(1): large data redundancy Table-level duplicates (e.g., PREF/WD, workload-driven) Tuple-level duplicates (e.g., PREF/SD, schema-driven) - data dependency btw.

7 Drawback of PREF(1): large data redundancy Table-level duplicates (e.g., PREF/WD, workload-driven) Tuple-level duplicates (e.g., PREF/SD, schema-driven) - data dependency btw. parent-child tables (reference partitioning) - cumulative redundancy (caused by data dependency) Tree #4 SR SS C D CD HD CA I CS SR SS Tree #5 CA HD I T C CS WS SS CD D cd_demo = ss_cdemo CD ss_sold_date = d_date SS CR SR WS D WP WR Inv Tree #2 D CS CP (a) Table-level duplicates (PREF/WD) 7 (b) Data dependency (PREF/SD)

8 Tuple-level duplicates in PREF scheme SS cdemo item date P 1 SS D WR SS D WR D date 1 2 WR date item (a) Sample database P 2 P (b) PREF/SD 8 (c) GPT/SD

9 Drawback of PREF(2): low query performance Large scan & join overhead due to large DR - DR: data redundancy NOT support local query processing in many cases Example: TPC-DS Q17 - nine join operations among five tables - PREF requires shuffle tables for two joins (e.g., CS-D and SR-D) CS SR D I SS Tree #2 CS SR SS HD I T D WS WR CS SR D I CR SS Inv (a) Query join graph of TPC-DS Q17 (b) PREF/WD (c) GPT/WD 9

10 PREF vs. GPT (TPC-DS benchmark) Tree #1 Tree #2 Tree #3 Tree #4 Tree #5 Tree #7 Tree #6 (a) PREF/WD (b) Partitioning graph of GPT/WD 10

11 Observation in our graph-based approach Star, snowflake and snowstorm schemas in OLAP (VLDB 14) - Join query in OLAP comes from foreign key relationships in the schema k cct cc ct cn ct cn mk miidx mi it lt ml t pi kt n ci mc at k ch rt an miidx it mi t kt mc (a) IMDB schema (VLDB 15) (b) example query join graph Partitioning scheme as a Graph (PG) - e.g., partitioning a large table and replicating a small table - hub vertex: shared (dimension) table or frequently joined table - select expensive and frequently used join op. as an edge of PG 11

Vertices and edges in PG (Partitioning Graph) Three kinds of table -

from Rep type to Part type Two special edges - triedge: a set of edges

a triangle R R[i] e 1 X[m] X Part -table Hub -table S e S[j] T[k] T e 2

12 Vertices and edges in PG (Partitioning Graph) Three kinds of table - Part (partitioned), Rep (replicated) and Hub - Hub table is converted from Rep type to Part type Two special edges - triedge: a set of edges that form a triangle - indirect join edge: does not exist in G but form a triangle R R[i] e 1 X[m] X Part -table Hub -table S e S[j] T[k] T e 2 Y[n] Y Co-partitioning edge triedges (e) indirect edges of e 1 and e 2 12

13 Problem definition Given a database D, and a join graph G = V, E, l, w, the problem is finding the optimal partitioned graph PG s.t. PG = argmax CDE PG{benefit(PG P. E)} Benefit model for three edge types - intra (E a ), inter (E r ) and indirect join edge (E t ) R R[i] e 1 T T[k] X[m] e & X S S[j] e 2 z[l] Y[n] Y Z 13

14 HMC: Hash-based multi-column partitioning Co-partitioning method for edges of PG Partition a table by hashing its partitioning column values - N partitions for a table T - P T = {P 1 T, P N T } Store a tuple t T up to κ different partitions among P(T) - using κ different partitioning columns, C T = {c 1,, c κ } Indicate duplicates by using bitmap vector - each t T has bitvector bitv = [b 1,, b κ ] Dup - e.g., h t. c k = i à t T is copied to P i (T) à set b k = 1 for t 14

15 Example of HMC method N = 3 and 1 h(k) 3 - e.g., h 1 = 1, h 2 = 2, h 3 = 3 and h 4 = 1 R R[1] R[2] R[3] R[1] R[2] R[3] P " R P(R) dup (1,0) (1,1) dup(p " R ) P(S) S[1] S[2] 1 5 P " S S S[1] S[2] P % R P & R (1,1) (0,1) dup(p % R ) (0,1) (1,0) dup(p & R ) 2 3 P % S 3 2 P & S 15

16 Subpartition for efficient duplicate elimination The number of partitioning columns limits the number of unique bit vectors (bitv) - i.e., κ = 2 à 2 2 bitv: {00, 01, 10, 11} à 2 2 Subpartitions Co-locate tuples having the same bit vector together Scan only necessary Subpartitions depends on the query R R[1] R[2] R[3] Subpartition 1 bitv = (0, 0) P " R P % R P & R Subpartition 2 bitv = (0, 1) R[1] R[2] R[3] R[1] R[2] R[3] Subpartition 3 bitv = (1, 0) R[1] R[2] R[3] R[1] R[2] R[3] Subpartition 4 bitv = (1, 1) R[1] R[2] R[3] R[1] R[2] R[3]

17 Experimental evaluation Three performance measures - Data redundancy - Data bulk loading performance - Query performance Experimental environments - 11 machines as default, i.e., 1 master + 10 nodes (up to 21 machines) - H/W: each machine has six-core i7 CPU, 32 RAM, 1.2TB PCIE- SSD, 1Gbps Ethernet network - S/W: Hadoop TPC-DS benchmark for dataset and queries 17

18 Data redundancy PREF suffers from large data redundancy - PREF/SD (tuple-level dup.) and PREF/WD (table-level dup.) GPT s DR only slightly increases due to Rep-Tables - no data-dependency due to hash-based co-partitioning (HMC) - no table-level duplicates Data redundancy PREF/SD GPT/SD PREF/WD GPT/WD SF=25 SF=50 SF=100 Scale of dataset (N=50) Data redundancy N=25 N=50 N=100 # partitions (SF=100) 233 (a) Changing scale of dataset (b) Changing # partitions 18

19 Data bulk loading performance Dataset: TPC-DS Bulk loading performance DR 4000 PREF/SD PREF/WD Elapsed time (sec.) GPT/SD GPT/WD Elapsed time (sec.) SF=25 SF=50 SF=100 Scale of dataset (N=10) (a) Changing scale of dataset 0 N=25 N=50 N=100 # partitions (SF=100) (b) Changing # partitions 19

20 Query performance (1): TPC-DS 1TB Elapsed time (sec.) PREF/SD GPT/SD Q3 Q7 Q17 Q25 Q27 Q29 Q42 Q43 Q48 Q50 Q52 Q55 Q58 Q71 Q73 Q76 Q79 Q82 Q85 Q93 Queries (a) Comparison results for schema-driven approach(y-axis is log-scale) Elapsed time (sec.) PREF/WD GPT/WD Q3 Q7 Q17 Q25 Q27 Q29 Q42 Q43 Q48 Q50 Q52 Q55 Q58 Q71 Q73 Q76 Q79 Q82 Q85 Q93 Queries (b) Comparison results for workload-driven approach 20

21 Query performance (2): IMDB and BioWarehouse mov_id MI_idx mov_id MI MK mov_id id MC mov_id comp_id T id N CN PI p_id mov_id p_id AN CI (a) IMDB (κ = 2) id id BStoBSS BS_id BStoG id G_id GtoP G_id BStoP S CT ST P_idBS_id DS_id O_id O_id O_id BioSource Entry id DS_id DS_id O_id Gene id DS id DS_id id C idds_id id DS_id DS_id id Protein Feature Term (b) BioWarehouse (κ = 2) CR O_id O_id C_id CtoO T_id RT Data redundancy IMDB 12 PREF/WD GPT/WD 104 Dataset (c) Data redundancy 32 BioWarehouse Total elapsed time (sec.) IMDB BioWarehouse Dataset (d) Query performance 21

22 Query performance (3) TPC-DS 1TB Linear scalability Performance breakdown - varying optimization techniques Elapsed time (sec.) GPT partitioning X O O X O O X O O Subpartitioning X X O X X O X X O TPC-DS Q17 TPC-DS Q29 TPC-DS Q85 (b) Performance breakdown 22 Total elapsed time (sec.) # machines (a) Scalability

23 Conclusions We propose a graph-based partitioning scheme for fast and efficient OLAP query processing We propose two techniques for horizontal database partitioning in OLAP setting - GPT: low data redundancy while maximizing query performance - HMC: hash-based multi-column partitioning strategy with efficient duplicate elimination method We show GPT significantly outperforms PREF - e.g., 48% faster performance with 2.35 X less space in TPC-DS 23

24 Thank you! Any questions?

25 References [BMC 15] "BioWarehouse: a bioinformatics database warehouse toolkit., BMC bioinformatics 7. 1 (2006): 170. [TPC-DS] "The making of TPC-DS., PVLDB, [SIGMOD 15] "Locality-aware partitioning in parallel database systems., SIGMOD, [SIGMOD 08] "Supporting table partitioning by reference in oracle., SIGMOD, [VLDB 14] Of snowstorms and bushy trees., PVLDB, [VLDB 15] How Good Are Query Optimizers, Really?, PVLDB, 2015.

26 Back-up slides

27 GPT method 24

28 Table classification Cost-based approach S " (100 ) S & (300 ) T (50 MB) (a) Rep-table scenario S % (200 ) S (400 ) S " (100 ) S & (300 ) T (500 ) (b) Part-table scenario S % (200 ) S (400 ) 25

29 Characteristics of GPT Query performance while varying κ and H/W configurations - TPC-DS 1TB and 20 Queries (a) Data Redundancy (b) Query Performance 26

30 GPT on Apache SPARK How to use? - in table creation DDL, turn GPT partitioning on - support all file formats, e.g., CSV and Parquet GPT-aware query plan generation - automatically generates a query plan without shuffles if possible - scan op. exploits bitv to scan Subpartitions depend on the query 27

31 Elapsed time (per query / log scale) 5,000 Elapsed time (sec.) TPC-DS Query (1TB database) Hive Spark Spark+Bucketing GPT (MapReduce) GPT (Spark) 28

A Graph-based Database Partitioning Method for Parallel OLAP Query Processing

A Graph-based Database Partitioning Method for Parallel OLAP Query Processing Yoon-Min Nam, Min-Soo Kim*, Donghyoung Han DGIST, Republic of Korea {ronymin, mskim, icedrak}@dgist.ac.kr Abstract As the amount