Advanced Query Optimization

Size: px
Start display at page:

Download "Advanced Query Optimization"

Transcription

1 Advanced Query Optimization Andreas Meister Otto-von-Guericke University Magdeburg Summer Term 2018

2 Why do we need query optimization? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

3 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC Andreas Meister Advanced Query Optimization Last Change: April 23, /90

4 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC How would you execute this query? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

5 Challenges SQL Declarative (What, not how!) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

6 Challenges SQL Declarative (What, not how!) Plenty options to chose from: Operator variants (Joins: NL, BNL, Hash, Sort-Merge,...) Table access: Index vs Scans Execution: Operator order... Andreas Meister Advanced Query Optimization Last Change: April 23, /90

7 Challenges SQL Declarative (What, not how!) Plenty options to chose from: Operator variants (Joins: NL, BNL, Hash, Sort-Merge,...) Table access: Index vs Scans Execution: Operator order... Options are based on properties of database: Data type Data distribution... Andreas Meister Advanced Query Optimization Last Change: April 23, /90

8 Challenges SQL Declarative (What, not how!) Plenty options to chose from: Operator variants (Joins: NL, BNL, Hash, Sort-Merge,...) Table access: Index vs Scans Execution: Operator order... Options are based on properties of database: Data type Data distribution... Why do we not chose an option randomly? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

9 Challenge Because the efficiency depends on the choice. runtime [ms] random query plans [sorted by runtime] Adapted from [Neumann, 2014] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

10 How is efficiency ensured? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

11 Query Processing - Phases SQL-Query Result LogicalL optimization PhysicalL optimization Cost-based Selection TranslationL& ViewLresolving Algebra StandardizationL& Simplification Optimization Algebra Access plan Translation time Execution Runtime Code Code-Generation PlanL parameterization Access plan Adapted from [Saake et al., 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

12 Query Processing - Translation & View Resolving Simplification of arithmetic expressions Resolve sub-queries Insertion of view definitions ɣ SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT));N_NAME σ C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < ' X X REGION X NATION SUPPLIER LINEITEM X X ORDERS CUSTOMER Andreas Meister Advanced Query Optimization Last Change: April 23, /90

13 Query Processing - Standardization & Simplification Expressions: Conjunctive normal form: (p 11 p 1n ) (p m1 p mn ) Disjunktive normal form: (p 11 p 1n ) (p m1 p mn ) Query: Unnesting σ A=B R 1 <Condition> R 1 R 2 A in π B R 2 Andreas Meister Advanced Query Optimization Last Change: April 23, /90

14 Query Processing - Optimization Goal: Efficient query execution Andreas Meister Advanced Query Optimization Last Change: April 23, /90

15 Query Processing - Optimization Goal: Efficient query execution Three phases: Logical optimization Physical optimization Cost-based selection Andreas Meister Advanced Query Optimization Last Change: April 23, /90

16 Query Processing - Optimization Goal: Efficient query execution Three phases: Logical optimization Physical optimization Cost-based selection Method: Use available information about Data (distribution, type, ) Database system (algorithms, processors, ) Query (operators, restrictions, ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

17 Query Processing - Logical Optimization Apply optimization rules ɣ SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT));N_NAME N_REGIONKEY = R_REGIONKEY σ R_NAME = ASIA S_NATIONKEY = N_NATIONKEY REGION NATION L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY SUPPLIER L_ORDERKEY = O_ORDERKEY LINEITEM C_CUSTKEY = O_CUSTKEY σ O_ORDERDATE >= ' AND O_ORDERDATE < ' CUSTOMER ORDERS Andreas Meister Advanced Query Optimization Last Change: April 23, /90

18 Query Processing - Algorithm Simple optimization algorithm Resolve complex selection predicate, if applicable resolving of and Remove redundant operators Pushing down selections as near as possible to the leaf Resolve cross joins Pushing projections in leaf direction Single steps will be executed in the stated order until no replacement can be performed Andreas Meister Advanced Query Optimization Last Change: April 23, /90

19 Query Processing - Physical Optimization Consider: Storage information (Indexes, page size, ) Algorithms Processors ɣ SORT SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT));N_NAME SORT N_REGIONKEY = R_REGIONKEY σ REL R_NAME = ASIA HASH S_NATIONKEY = N_NATIONKEY REGION NATION SORT L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY SUPPLIER NLJ L_ORDERKEY = O_ORDERKEY LINEITEM HASH C_CUSTKEY = O_CUSTKEY σ IND O_ORDERDATE >= ' AND O_ORDERDATE < ' CUSTOMER ORDERS Andreas Meister Advanced Query Optimization Last Change: April 23, /90

20 Query Processing - Cost-Based Selection query span search space transformation rules equivalent plans search strategy cost estimations "best" plan Adapted from [Saake et al., 2012] Often combined with physical optimization Andreas Meister Advanced Query Optimization Last Change: April 23, /90

21 Query Processing - Plan Parametrization Prepared Statements: Optimize once Execute multiple times Configuration via variables Replace variables with values Andreas Meister Advanced Query Optimization Last Change: April 23, /90

22 Query Processing - Code Generation Compile query into executable code Time (seconds) 100 "tuple at a time" DBMS "X" MySQL 4.1 interpretation dominates execution interpretation overhead decreases "column at a time" MonetDB/MIL main memory materialization overhead query without selection 1 vectors start to exceed 0.60 CPU cache, causing MonetDB/X100 extra memory traffic "vector at a time" 0.22 low interpretation overhead Hand Coded in cache materialization C Program K 4K 16K 64K 256K 1M 4M 6M TPC-H Query 1 with different vector size [Zukowski et al., 2005] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

23 Query Processing - Execution Planed all details Execute the code Andreas Meister Advanced Query Optimization Last Change: April 23, /90

24 Query Processing - Execution Planed all details Execute the code What could possibly go wrong? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

25 Query Processing - Execution Planed all details Execute the code What could possibly go wrong? In practice: Wrong assumptions Bad estimates Andreas Meister Advanced Query Optimization Last Change: April 23, /90

26 Query Processing - Execution Planed all details Execute the code What could possibly go wrong? In practice: Wrong assumptions Bad estimates Consider runtime information to adapt query-execution Andreas Meister Advanced Query Optimization Last Change: April 23, /90

27 How does the optimization work? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

28 Andreas Meister Advanced Query Optimization Last Change: April 23, /90 In specific: Join-Order Cost-based Optimization query span search space transformation rules equivalent plans search strategy cost estimations "best" plan Adapted from [Saake et al., 2012]

29 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC Andreas Meister Advanced Query Optimization Last Change: April 23, /90

30 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC In which order should the join be performed? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

31 Join-Order Optimization Why consider join-order optimization? runtime [ms] random query plans [sorted by runtime] Adapted from [Neumann, 2014] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

32 Join-Order Optimization - Challenge Goal: Find an efficient join order Andreas Meister Advanced Query Optimization Last Change: April 23, /90

33 Join-Order Optimization - Challenge Goal: Find an efficient join order Challenge: NP-complete problem Limited time Andreas Meister Advanced Query Optimization Last Change: April 23, /90

34 Join-Order Optimization - Example TPC-H Q5 CUSTOMER (C) ORDERS (O) LINEITEM (L) SUPPLIER (S) NATION (N) REGION (R) L C L O L S L N L R ((L S) C) ((L S) O) ((L S) N) ((L S) R) (((L S) O) C) (((L S) O) N) (((L S) O) R) ((((L S) O) N) C) ((((L S) O) N) R) (((((L S) O) N) C) R) (((((L S) O) N) R) C) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

35 Join-Order Optimization - Example TPC-H Q5 CUSTOMER (C) ORDERS (O) LINEITEM (L) SUPPLIER (S) NATION (N) REGION (R) L C L O L S L N L R ((L S) C) ((L S) O) ((L S) N) ((L S) R) (((L S) O) C) (((L S) O) N) (((L S) O) R) ((((L S) O) N) C) ((((L S) O) N) R) (((((L S) O) N) C) R) (((((L S) O) N) R) C) This is not even all! Andreas Meister Advanced Query Optimization Last Change: April 23, /90

36 Join-Order Optimization - Tree Form Deep Trees L C N O S R R S O N L C Left-Deep Trees Right-Deep Trees Andreas Meister Advanced Query Optimization Last Change: April 23, /90

37 Join-Order Optimization - Tree Form More general tree forms: L S R N O C Zick-Zack Trees L S O N C R Bushy Trees Andreas Meister Advanced Query Optimization Last Change: April 23, /90

38 Join-Order Optimization - Challenge Number of equivalent options: Left (or Right) Deep-Trees: n! For 10 tables: Andreas Meister Advanced Query Optimization Last Change: April 23, /90

39 Join-Order Optimization - Challenge Number of equivalent options: Left (or Right) Deep-Trees: n! For 10 tables: Bushy Trees: S(n) n! variants S(1) = 1 n 1 S(n) = S(i)S(n i) i=1 For 10 tables: Andreas Meister Advanced Query Optimization Last Change: April 23, /90

40 Join-Order Optimization - Topology Number of valid trees depends on the query topology: r 1 r 2 r 3... r n r 1 r 2 r 3... r n linear cyclic r 1 r 1 r 2 r 2 r 3... r n r 3 r n... star clique Considering cross-joins clique Andreas Meister Advanced Query Optimization Last Change: April 23, /90

41 Join-Order Optimization - Approaches Greedy Deterministic Exhaustive search Hybrid Randomized Transformation Sampling Genetic algorithms Andreas Meister Advanced Query Optimization Last Change: April 23, /90

42 Join-Order Optimization - Deterministic Approaches Same input Same output Andreas Meister Advanced Query Optimization Last Change: April 23, /90

43 Join-Order Optimization - Deterministic Approaches Same input Same output Greedy: Short runtime Simple heuristics Neither efficiency nor optimality guaranteed Andreas Meister Advanced Query Optimization Last Change: April 23, /90

44 Join-Order Optimization - Deterministic Approaches Same input Same output Greedy: Short runtime Simple heuristics Neither efficiency nor optimality guaranteed Exhaustive Search: Guarantee optimal results Long runtime Only applicable to simple queries Andreas Meister Advanced Query Optimization Last Change: April 23, /90

45 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Andreas Meister Advanced Query Optimization Last Change: April 23, /90

46 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Transformation: Selection of initial result Transformation of current result Andreas Meister Advanced Query Optimization Last Change: April 23, /90

47 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Transformation: Selection of initial result Transformation of current result Sampling: Randomly choosing candidates Best solution is stored Andreas Meister Advanced Query Optimization Last Change: April 23, /90

48 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Transformation: Selection of initial result Transformation of current result Sampling: Randomly choosing candidates Best solution is stored Genetic algorithms: Selection of initial solution pool Creating new solutions based on solution pool Andreas Meister Advanced Query Optimization Last Change: April 23, /90

49 Join-Order Optimization - Hybrid Approaches Advantages deterministic approaches: Predictable Ensure optimality Advantages randomized approaches: Suitable for complex optimization problems Limited runtime Hybrid approaches: Combine both deterministic and randomized approaches Andreas Meister Advanced Query Optimization Last Change: April 23, /90

50 Join-Order Optimization - Approaches Greedy Deterministic Exhaustive search Hybrid Randomized Transformation Sampling Genetic algorithms In specific: Dynamic programming Andreas Meister Advanced Query Optimization Last Change: April 23, /90

51 Join-Order Optimization - Dynamic Programming Algorithm 1: DP SIZE [Selinger et al., 1979] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = T i ; 3 for s = 2 to n do 4 for s l = 1 to s 1 do 5 s r = s s l ; 6 foreach S l T : S l = s l do 7 foreach S r T : S r = s r do 8 if S l S r then continue; 9 if S l not connected to S r then continue; 10 optimal-left-plan = optimalplan(s l ); 11 optimal-right-plan = optimalplan(s r ); 12 current-plan = createjointree(optimal-left-plan, optimal-right-plan); 13 if cost(optimalplan(s l S r )) > cost(current-plan) then 14 optimalplan(s l S r ) = current-plan ; 15 return optimalplan(t ) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90

52 Dynamic Programming - DP SIZE T 4 T 3 T 2 T 1 1 # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

53 Dynamic Programming - DP SIZE T 4 T 3,4 T 3 T 2... T 1 T 1,2 1 2 # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

54 Dynamic Programming - DP SIZE T 4 T 3,4 T 3 T T 1 T 1,2 T 1,2, # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

55 Dynamic Programming - DP SIZE T 4 T 3,4 T 3 T 1,2,3,4 T T 1 T 1,2 T 1,2,3 T 1,2,3, # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

56 Dynamic Programming - DP SIZE Evaluation of DP SIZE only based on number of tables Not only valid...: (T1,T 2,T 3 ) (T 4 ) (T1,T 2,T 4 ) (T 3 ) (T 1,T 3,T 4 ) (T 2 ) (T 2,T 3,T 4 ) (T 1 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

57 Dynamic Programming - DP SIZE Evaluation of DP SIZE only based on number of tables Not only valid...: (T1,T 2,T 3 ) (T 4 ) (T1,T 2,T 4 ) (T 3 ) (T 1,T 3,T 4 ) (T 2 ) (T 2,T 3,T 4 ) (T 1 ).., but also invalid entries are evaluated: (T1,T 2,T 3 ) (T 1 ) (T1,T 2,T 3 ) (T 2 ) (T1,T 2,T 3 ) (T 3 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

58 Dynamic Programming - DP SIZE Evaluation of DP SIZE only based on number of tables Not only valid...: (T1,T 2,T 3 ) (T 4 ) (T1,T 2,T 4 ) (T 3 ) (T 1,T 3,T 4 ) (T 2 ) (T 2,T 3,T 4 ) (T 1 ).., but also invalid entries are evaluated: (T1,T 2,T 3 ) (T 1 ) (T1,T 2,T 3 ) (T 2 ) (T1,T 2,T 3 ) (T 3 ) Solution: Enumerate calculations (DP SUB ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

59 Dynamic Programming - DP SUB Idea: Use integer representation of solutions Table T i available i-th bit set 1 (T1 ) 2 (T 2 ) 3 (T 1,T 2 ) 4 (T3 ) 5 (T1,T 3 ) 15 (T1,T 2,T 3,T 4 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

60 Algorithm 2: DP SUB [Vance and Maier, 1996] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = T i ; 3 for k = 2 to n do 4 for S = 2 k to 2 k 1 do 5 foreach S l S do 6 optimal-left-plan = optimalplan(s l ); 7 if optimal-left-plan = then continue; 8 S r = S S l ; 9 optimal-right-plan = optimalplan(s r ); 10 if optimal-right-plan = then continue; 11 if optimal-left-plan not connected to optimal-right-plan then continue; 12 current-plan = createjointree(optimal-left-plan, optimal-right-plan); 13 if cost(optimalplan(s)) > cost(current-plan) then 14 optimalplan(s) = current-plan ; 15 return optimalplan(2 n 1) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90

61 Dynamic Programming - DP SUB n=2: 3 (T 1,T 2 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

62 Dynamic Programming - DP SUB n=2: n=3: 3 (T 1,T 2 ) 5 (T1,T 3 ) 6 (T 2,T 3 ) 7 (T 1,T 2,T 3 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

63 Dynamic Programming - DP SUB n=2: n=3: n=4: 3 (T 1,T 2 ) 5 (T1,T 3 ) 6 (T 2,T 3 ) 7 (T 1,T 2,T 3 ) 9 (T1,T 4 ) 10 (T 2,T 4 ) 11 (T 1,T 2,T 4 ) 12 (T3,T 4 ) 13 (T1,T 3,T 4 ) 14 (T2,T 3,T 4 ) 15 (T1,T 2,T 3,T 4 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

64 Dynamic Programming - DP SUB Splitting integers determines calculations e.g. solution s 13 (T 1,T 3,T 4 ): Andreas Meister Advanced Query Optimization Last Change: April 23, /90

65 Dynamic Programming - DP SUB Splitting integers determines calculations e.g. solution s 13 (T 1,T 3,T 4 ): Implementation: 1 Determine the first left join partner l (least significant bit of s using Compiler or DeBruijn) 2 Determine first right join partner (s l) 3 Determine next left join partner l = s&(l s) 4 Repeat step 2-3 until l == s Andreas Meister Advanced Query Optimization Last Change: April 23, /90

66 Dynamic Programming - DP SUB Remember: Valid combinations based on query topology: r 1 r 2 r 3... r n r 1 r 2 r 3... r n linear cyclic r 1 r 1 r 2 r 2 r 3... r n r 3 r n... star clique DP SUB always enumerate all combinations For non-cliques also unneeded combinations are evaluated Andreas Meister Advanced Query Optimization Last Change: April 23, /90

67 Dynamic Programming - DP CCP Idea: Enumerate calculations based on query Andreas Meister Advanced Query Optimization Last Change: April 23, /90

68 Dynamic Programming - DP CCP Idea: Enumerate calculations based on query Approach: Enumerate tables of query in a breath-first-manner (Avoid duplicate calculations) Determine connected sub-graphs within query For sub-graphs: Determine complements (joinable connected-subgraphs) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

69 Algorithm 3: DP CCP [Moerkotte and Neumann, 2006] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = i ; 3 csgs = enumeratecsg(q); 4 foreach S l csgs do 5 cmps = enumeratecmp(q, S l ); 6 foreach S r cmps do 7 optimal-left-plan = optimalplan(s l ); 8 optimal-right-plan = optimalplan(s r ); 9 current-plan = createjointree(optimal-left-plan, optimal-right-plan); 10 if cost(optimalplan(s l S r )) > cost(current-plan) then 11 optimalplan(s l S r ) = current-plan ; 12 current-plan = createjointree(optimal-right-plan, optimal-left-plan); 13 if cost(optimalplan(s l S r )) > cost(current-plan) then 14 optimalplan(s l S r ) = current-plan ; 15 return optimalplan(r) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90

70 Dynamic Programming - Overview DP SIZE Good approach for simple optimization problems Inefficient (invalid join partners), for more complex optimization problems DP SUB Good for optimizing cliques Based on enumeration of all possible combinations, inefficient for other topologies DP CCP Only needed join pairs are evaluated based on enumeration Enumeration poses overhead for simple optimization problems Andreas Meister Advanced Query Optimization Last Change: April 23, /90

71 Problems with traditional DP Approaches Serial execution of calculations Independent calculations available R 1 R 2 R 3 R 4 Adapted from [Han et al., 2008] S = 2: R1-R2; R1-R3; R1-R4 Current multi-core CPUs offer parallel execution Parallel DP-approach needed for current hardware Andreas Meister Advanced Query Optimization Last Change: April 23, /90

72 Parallelized Dynamic Programming - PDP SVA Idea: Partition independent calculations [Han et al., 2008] Adapted from [Han et al., 2008] QS: Qualifier set PlanList: Optimal query execution plan q 1,, q 4 : Qualifier for relations P 1,, P 4 : QS with 1,, 4 relations Andreas Meister Advanced Query Optimization Last Change: April 23, /90

73 Algorithm 4: PDP SVA [Han et al., 2008] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = T i ; 3 for s = 2 to n do 4 SSDVs = AllocateSearchSpace(S,m); 5 for i = 1 to MAX_THREAD_ID do 6 threadpool.submitjob(mutipleplanjoin(ssdvs[i],s)); 7 threadpool.sync(); 8 MergeAndPrunePlanPartitions(S); 9 for i = 1 to MAX_THREAD_ID do 10 threadpool.submitjob(buildskipvectorarray(i)); 11 threadpool.sync(); 12 return optimalplan(r) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90

74 Parallelized Dynamic Programming - PDP SVA Idea: Distribute over different threads [Han et al., 2008] Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

75 Allocation Schemata Search space: s 2 smallsz =1 ( P smallsz P S smallsz ) Total Sum Allocation: Divide the search space in m (number of threads) smaller parts and distribute them equally over the m threads Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

76 Allocation Schemata /2 Equi-Depth Allocation: Equally distribute each ( P smallsz P S smallsz ) over all threads (q 1,q 1 q 2 q 3 ) (q 2,q 1 q 2 q 3 ) (q 3,q 1 q 2 q 3 ) (q 4,q 1 q 2 q 3 ) (q 1 q 2,q 1 q 2 ) (q 1 q 3,q 1 q 2 ) (q 1 q 4,q 1 q 2 ) (q 1,q 1 q 2 q 4 ) (q 2,q 1 q 2 q 4 ) (q 3,q 1 q 2 q 4 ) (q 4,q 1 q 2 q 4 ) (q 1 q 2,q 1 q 3 ) (q 1 q 3,q 1 q 3 ) (q 1 q 4,q 1 q 3 ) (q 1,q 1 q 3 q 4 ) (q 2,q 1 q 3 q 4 ) P 1 P 3 thread 1 P 2 P 2 (q 3,q 1 q 3 q 4 ) (q 4,q 1 q 3 q 4 ) (q 1 q 2,q 1 q 4 ) (q 1 q 3,q 1 q 4 ) (q 1 q 4,q 1 q 4 ) thread 2 Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

77 Allocation Schemata /3 Round-Robin Outer Allocation: Randomly distribute join pairs (t i, t j ) to thread (i mod m) (q 1,q 1 q 2 q 3 ) (q 2,q 1 q 2 q 3 ) (q 3,q 1 q 2 q 3 ) (q 4,q 1 q 2 q 3 ) (q 1 q 2,q 1 q 2 ) (q 1 q 3,q 1 q 2 ) (q 1 q 4,q 1 q 2 ) (q 1,q 1 q 2 q 4 ) (q 2,q 1 q 2 q 4 ) (q 3,q 1 q 2 q 4 ) (q 4,q 1 q 2 q 4 ) (q 1 q 2,q 1 q 3 ) (q 1 q 3,q 1 q 3 ) (q 1 q 4,q 1 q 3 ) (q 1,q 1 q 3 q 4 ) (q 2,q 1 q 3 q 4 ) P 1 P 3 thread 1 P 2 P 2 (q 3,q 1 q 3 q 4 ) (q 4,q 1 q 3 q 4 ) (q 1 q 2,q 1 q 4 ) (q 1 q 3,q 1 q 4 ) (q 1 q 4,q 1 q 4 ) thread 2 Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

78 Allocation Schemata /4 Round-Robin Inner Allocation: Randomly distribute join pairs (t i, t j ) to thread (j mod m) (q 1,q 1 q 2 q 3 ) (q 2,q 1 q 2 q 3 ) (q 3,q 1 q 2 q 3 ) (q 4,q 1 q 2 q 3 ) (q 1 q 2,q 1 q 2 ) (q 1 q 3,q 1 q 2 ) (q 1 q 4,q 1 q 2 ) (q 1,q 1 q 2 q 4 ) (q 2,q 1 q 2 q 4 ) (q 3,q 1 q 2 q 4 ) (q 4,q 1 q 2 q 4 ) (q 1 q 2,q 1 q 3 ) (q 1 q 3,q 1 q 3 ) (q 1 q 4,q 1 q 3 ) (q 1,q 1 q 3 q 4 ) (q 2,q 1 q 3 q 4 ) P 1 P 3 thread 1 P 2 P 2 (q 3,q 1 q 3 q 4 ) (q 4,q 1 q 3 q 4 ) (q 1 q 2,q 1 q 4 ) (q 1 q 3,q 1 q 4 ) (q 1 q 4,q 1 q 4 ) thread 2 Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

79 Storage of allocation information Store distribution information in the search space description vector (SSDV) SSDV-Entry: smallsz, [stoutidx, stblkidx, stblkoff ], [endoutidx, endblkidx, endblkoff ], outinc, ininc Andreas Meister Advanced Query Optimization Last Change: April 23, /90

80 Storage of allocation information Store distribution information in the search space description vector (SSDV) SSDV-Entry: smallsz, [stoutidx, stblkidx, stblkoff ], [endoutidx, endblkidx, endblkoff ], outinc, ininc smallsz: Identifier for join of ( P smallsz P S smallsz ) stoutidx: Start index of outer tuple stblkidx: Start block index stblkoff: Offset of inner tuple within block endoutidx: End index of outer tuple endblkidx: End block index endblkoff: Offset of end inner tuple within block outinc: Step size for outer loop ininc: Step size for inner loop Andreas Meister Advanced Query Optimization Last Change: April 23, /90

81 Storage of allocation information - example Adapted from [Han et al., 2008] 1 Block: Thread 1 - SSDV-Entry: { 1, [1, 1, 1], [4, 1, 1], 1, 1, 2, [ 1, 1, 1], [ 1, 1, 1], 1, 1 } Thread 2 - SSDV-Entry: { 1, [4, 1, 2], [4, 1, 3], 1, 1, 2, [1, 1, 1], [3, 1, 3], 1, 1 } Andreas Meister Advanced Query Optimization Last Change: April 23, /90

82 Parallelized Dynamic Programming - PDP SVA Algorithm 5: MultiplePlanJoin [Han et al., 2008] Input: SSDV, S 1 for i=1 to s 2 do 2 PlanJoin(SSDV[i],S) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

83 Parallelized Dynamic Programming - PDP SVA Algorithm 6: PlanJoin [Han et al., 2008] Input: ssdvelmt, S 1 smallsz = ssdvelmt.smallsz; largesz = S-smallSZ; 2 for blkidx = ssdvelmt.stblkidx to ssdvelmt.endblkidx do 3 blk = blkidx-th block in P largesz ; 4 stoutidx, endoutidx = GetOuterRange(ssdvElmt, blkidx); 5 for t o=p smallsz [stoutidx] to P smallsz [endoutidx] step by ssdvelmt.outinc do 6 stblkoff, endblkoff = GetOffsetRangeInBlk(ssdvElmt,blkIdx,offset of t o); 7 for t i = blk [stblkoff ] to blk [endblkoff ] step by ssdvelmt.ininc do 8 if t o.qs t i.qs then 9 continue; 10 if not (t o.qs connected to t i.qs) then 11 continue; 12 Resulting plans = CreateJoinPlans(t o, t i ); 13 PrunePlans(P S, ResultingPlans); Andreas Meister Advanced Query Optimization Last Change: April 23, /90

84 Storage of allocation information - example Adapted from [Han et al., 2008] 1 Block: Thread 1 - SSDV-Entry: { 1, [1, 1, 1], [4, 1, 1], 1, 1, 2, [ 1, 1, 1], [ 1, 1, 1], 1, 1 } Thread 2 - SSDV-Entry: { 1, [4, 1, 2], [4, 1, 3], 1, 1, 2, [1, 1, 1], [3, 1, 3], 1, 1 } Andreas Meister Advanced Query Optimization Last Change: April 23, /90

85 Parallelizing problems Only a small amount of combinations of different join sets are valid Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

86 Skip Vector Arrays Problem: High number of invalid combination of qualifier sets Andreas Meister Advanced Query Optimization Last Change: April 23, /90

87 Skip Vector Arrays Problem: High number of invalid combination of qualifier sets Idea: Increase performance by skipping unnecessary combinations of join sets Store additional skipping information to efficiently determine the next join sets Andreas Meister Advanced Query Optimization Last Change: April 23, /90

88 Skip Vector Arrays Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

89 Skip Vector Arrays Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

90 Parallelized Dynamic Programming - PDP SVA Based on DP SIZE Same drawback invalid calculations Parallelization strategies for DP SUB and DP CCP needed Andreas Meister Advanced Query Optimization Last Change: April 23, /90

91 Parallelized Dynamic Programming - PDP SVA Based on DP SIZE Same drawback invalid calculations Parallelization strategies for DP SUB and DP CCP needed Idea: Use produce-consumer-model Consumer Consumer Producer Queue Consumer Consumer Consumer Andreas Meister Advanced Query Optimization Last Change: April 23, /90

92 Algorithm 7: DPE GENERIC [Han and Lee, 2009] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 EnumerationBuffer B c, B p; 2 Hash-Table Memo; 3 partial_order = buildpartialorder(r); 4 e = parsecalculations(partial_order, Memo, B p, MAX_ENUM_CNT ); 5 while e NO_MORE_PAIR do 6 switchbuffers() // B c = B p and B p = B c; 7 for i = 0 to MAX_THREAD_ID 1 do 8 threadpool.submitjob(generateqeps(b c,memo)); 9 e = parsecalculations(partial_order, Memo, B p, MAX_ENUM_CNT ); 10 GenerateQEPs(B c,memo)); 11 threadpool.sync(); 12 return Memo(R) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90

93 Parallelized Dynamic Programming - DPE GENERIC Characteristics: Parallel Producer-Consumer model Double queue (Producer + Consumer) Parameters: Maximal queue size Partial order Enumeration (DPSIZE,DP SUB,DP CCP ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

94 Parallelized Dynamic Programming - DPE GENERIC Partial Order: Grouping based on resulting quantifier set q 1q 2q 3q 4 (q 1, q 2q 3q 4),(q 2, q 1q 3q 4),(q 3, q 1q 2q 4),(q 4, q 1q 2q 3), (q 1q 2, q 3q 4),(q 1q 3, q 2q 4),(q 1q 4, q 2q 3) q 1q 2q 3 q 1q 2q 4 q 1q 3q 4 q 2q 3q 4 (q 1, q 2q 3), (q 2, q 1q 3), (q 3, q 1q 2) (q 1, q 2q 4), (q 2, q 1q 4), (q 4, q 1q 2) (q 1, q 3q 4), (q 3, q 1q 4), (q 4, q 1q 3) (q 2, q 3q 4), (q 3, q 2q 4), (q 4, q 2q 3) q 1q 2 q 1q 3 q 1q 4 q 2q 3 q 2q 4 q 3q 4 (q 1, q 2) (q 1, q 3) (q 1, q 4) (q 2, q 3) (q 2, q 4) (q 3, q 4) q 1 (q 1) q 2 (q 2) q 3 (q 3) q 4 (q 4) Adapted from [Han and Lee, 2009] Problem: Only few calculations are grouped together Andreas Meister Advanced Query Optimization Last Change: April 23, /90

95 Parallelized Dynamic Programming - DPE GENERIC Partial Order: Grouping based on size of resulting quantifier set SRQS 4 (q 1, q 2q 3q 4),(q 2, q 1q 3q 4),(q 3, q 1q 2q 4),(q 4, q 1q 2q 3), (q 1q 2, q 3q 4),(q 1q 3, q 2q 4),(q 1q 4, q 2q 3) SRQS 3 (q 1, q 2q 3),(q 1, q 2q 4),(q 1, q 3q 4),(q 2, q 3q 4), (q 2, q 1q 3),(q 2, q 1q 4),(q 3, q 1q 4),(q 3, q 2q 4), (q 3, q 1q 2),(q 4, q 1q 2),(q 4, q 1q 3),(q 4, q 2q 3) SRQS 2 (q 1, q 2),(q 1, q 3),(q 1, q 4),(q 2, q 3),(q 2, q 4),(q 3, q 4) SRQS 1 (q 1),(q 2),(q 3),(q 4) Adapted from [Han and Lee, 2009] Problem: Only few independent calculations are available Andreas Meister Advanced Query Optimization Last Change: April 23, /90

96 Parallelized Dynamic Programming - DPE GENERIC Partial Order: Grouping based on size of larger quantifier set SLQS 3 (q 1, q 2q 3q 4),(q 2, q 1q 3q 4),(q 3, q 2q 1q 4),(q 4, q 1q 2q 3) SLQS [1,3] SLQS 2 SLQS [1,2] SLQS [2,2] (q 1, q 2q 3), (q 1, q 2q 4), (q 1, q 3q 4), (q 1q 2, q 3q 4), (q 2, q 1q 3), (q 2, q 1q 4), (q 2, q 3q 4), (q 1q 3, q 2q 4), (q 3, q 1q 2), (q 3, q 1q 4), (q 3, q 2q 4), (q 1q 4, q 2q 3) (q 4, q 1q 2), (q 4, q 1q 3), (q 4, q 2q 3) SLQS 1 (q 1, q 2),(q 1, q 3),(q 1, q 4), (q 2, q 3),(q 2, q 4),(q 3, q 4) SLQS [1,1] SLQS 0 (q 1), (q 2), (q 3), (q 4) SLQS [1,0] Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

97 Parallelized Dynamic Programming - DPE GENERIC Each element in partial: own queue (q 1 ) (q 2 ) (q 3 ) (q 4 ) B [1, 0] (q 1, q 2 ) (q 1, q 3 ) (q 1, q 4 ) (q 2, q 3 ) (q 2, q 4 ) (q 3, q 4 ) B [1, 1] B [2, 2] (q 1, q 2 q 3 ) (q 1, q 3 q 4 ) (q 3, q 2 q 4 ) (q 4, q 2 q 3 ) (q 2, q 3 q 4 ) B [1, 2] B [1, 3] Adapted from [Han and Lee, 2009] Insert calculations into corresponding queue Consumer iterate over all available queues and pull available calculations Access need to be synchronized Andreas Meister Advanced Query Optimization Last Change: April 23, /90

98 Parallelized Dynamic Programming - DPE GENERIC While consumer evaluate available calculations, the producer creates new calculations Number of calculations is predefined front (q 1 q 4, q 2 q 3 ) (q 1 q 3, q 2 q 4 ) (q 1 q 2, q 3 q 4 ) B [2, 2] (q 2, q 1 q 4 ) (q 3, q 1 q 4 ) (q 2, q 1 q 3 ) (q 4, q 1 q 3 ) (q 3, q 1 q 2 ) (q 4, q 1 q 2 ) (q 1, q 2 q 4 ) B [1, 0] B [1, 1] B [1, 2] (q 2, q 1 q 3 q 4 ) (q 3, q 1 q 2 q 4 ) (q 4, q 1 q 2 q 3 ) (q 1, q 2 q 3 q 4 ) B [1, 3] Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

99 Parallelized Dynamic Programming - DPE GENERIC Sort calculations based on dependencies Better parallelization : dependency-free entries Threading Across Dependencies T a front T b B [, i ] B [, i + 1] B [, i + 2] B [, i ] B [, i + 1] B [, i + 2] Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

100 Parallelized Dynamic Programming - DPE GENERIC Sort calculations based on dependencies Better parallelization : dependency-free entries Threading Across Dependencies T a front T b B [, i ] B [, i + 1] B [, i + 2] B [, i ] B [, i + 1] B [, i + 2] Adapted from [Han and Lee, 2009] What happens after a thread pulled a calculation? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

101 Parallelized Dynamic Programming - DPE GENERIC Processing similar to sequential execution: Fetch information of join-partners Evaluate costs Hash table HSJN MGJN NLJN : a MEMO entry : a QEP node q 1 q 2 q 3 q 4 q 1 q 2 : a hash chain : a plan chain q 3 q 4 HSJN Adapted from [Han and Lee, 2009] Problem: Synchronization needed Andreas Meister Advanced Query Optimization Last Change: April 23, /90

102 Parallelized Dynamic Programming - DPE GENERIC Problem: Synchronization needed Solution: Group calculations into equivalence classes Link entries directly to entries of the hash table Hash table NLJN (m[q 1q 2q 3q 4], m[q 1q 4], m[q 2q 3]) (m[q 1q 2q 3q 4], m[q 1q 3], m[q 2q 4]) B [2,2] (m[q 1q 2q 3q 4], m[q 1q 2], m[q 3q 4]) q 1q 2q 3q 4 earmark-and-pinned q 3q 4 HSJN A synchronization-free global MEMO q 1q 2 (m[q 1q 3q 4], m[q 4], m[q 1q 3]) (m[q 1q 3q 4], m[q 3], m[q 1q 2]) (m[q 1q 2q 4], m[q 4], m[q 1q 2]) (m[q 1q 2q 4], m[q 2], m[q 1q 4]) (m[q 1q 2q 4], m[q 1], m[q 2q 4]) (m[q 1q 2q 3], m[q 3], m[q 1q 2]) B [1,2] (m[q 1q 2q 3], m[q 2], m[q 1q 3]) Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

103 So what is the best approach? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

104 Dynamic Programming - Evaluation Linear queries - simple cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

105 Dynamic Programming - Evaluation Star queries - simple cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

106 Dynamic Programming - Evaluation Clique queries - simple cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

107 Dynamic Programming - Evaluation Linear queries - complex cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE Complex #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

108 Dynamic Programming - Evaluation Star queries - complex cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

109 Dynamic Programming - Evaluation Clique queries - complex cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

110 Dynamic Programming - Evaluation There is no best approach: Sequential approaches good for simple problems Parallel approaches good for complex problems Andreas Meister Advanced Query Optimization Last Change: April 23, /90

111 Dynamic Programming - Evaluation There is no best approach: Sequential approaches good for simple problems Parallel approaches good for complex problems Simple problems: Simple cost function Small number of tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

112 Dynamic Programming - Evaluation There is no best approach: Sequential approaches good for simple problems Parallel approaches good for complex problems Simple problems: Simple cost function Small number of tables Complex problems: Complex cost function Large number of tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90

113 How are cost calculated? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

114 Cost estimation Cost-based optimization needs cost estimations Cost-estimation directly influence optimization Andreas Meister Advanced Query Optimization Last Change: April 23, /90

115 Cost estimation Cost-based optimization needs cost estimations Cost-estimation directly influence optimization Challenge: Accurate estimations Time limit Andreas Meister Advanced Query Optimization Last Change: April 23, /90

116 Cost estimation Cost-estimation directly influence optimization: Good estimation good optimization results Bad estimations unpredictable optimization results Andreas Meister Advanced Query Optimization Last Change: April 23, /90

117 Cost estimation Cost-estimation directly influence optimization: Good estimation good optimization results Bad estimations unpredictable optimization results Challenge: Accurate estimations Time limit Andreas Meister Advanced Query Optimization Last Change: April 23, /90

118 Cost estimation Cost-estimation directly influence optimization: Good estimation good optimization results Bad estimations unpredictable optimization results Challenge: Accurate estimations Time limit Components: Cardinality estimation: Statistics + Formulas Histograms Parametric approaches Sample-based approaches Cost function Andreas Meister Advanced Query Optimization Last Change: April 23, /90

119 Cardinality estimation - Statistics + Formulas Idea: Estimate selectivity of operators σ F (R(R)) = sel(f, R) r Andreas Meister Advanced Query Optimization Last Change: April 23, /90

120 Cardinality estimation - Statistics + Formulas Idea: Estimate selectivity of operators σ F (R(R)) = sel(f, R) r Estimation (for interpolateable, arithmetic values) : sel(a = v, R) = 1 val A,r sel(a < v, R) = sel(a > v, R) = sel(a between v 1 and v 2, R) = v A min A max A min A max v A max A min v 2 v 1 A max A min Further cost formulas presented in [Saake et al., 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

121 Cardinality estimation - Histograms frequency real distribution approximate distribution attribute values Adapted from [Saake et al., 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

122 Cardinality estimation - Parametric approaches Gaussian Distribution Cluster Adapted from [Böhm et al., 2005] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

123 Cardinality estimation - Sample-based approaches Kernel density estimator Adapted from [Heimel and Markl, 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

124 Cardinality estimation - Cost function Input: Cardinality estimation Query information Database information Output: Cost estimation (artificial, time, ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

125 Cardinality estimation - Cost function Input: Cardinality estimation Query information Database information Output: Cost estimation (artificial, time, ) Considered aspects: Disk accesses CPU consumption Memory consumption Network traffic Execution time Cache misses Andreas Meister Advanced Query Optimization Last Change: April 23, /90

126 Cardinality estimation - Cost function Input: Cardinality estimation Query information Database information Output: Cost estimation (artificial, time, ) Considered aspects: Disk accesses CPU consumption Memory consumption Network traffic Execution time Cache misses Complex is not always better![leis et al., 2015] Andreas Meister Advanced Query Optimization Last Change: April 23, /90

127 Is this all? Andreas Meister Advanced Query Optimization Last Change: April 23, /90

128 Outlook Modern hardware Cost estimation New optimization approaches: Ant-Colony optimization Genetic algorithms Mixed integer linear programming Further optimization problems: Physical Database Design Partitioning Index Materialized views Self-Tuning Andreas Meister Advanced Query Optimization Last Change: April 23, /90

129 Wrap-up Query Optimization Join-Order Optimization Dynamic Programming Sequential Parallel Cost estimation Andreas Meister Advanced Query Optimization Last Change: April 23, /90

130 Wrap-up Query Optimization Join-Order Optimization Dynamic Programming Sequential Parallel Cost estimation There is no best approach Andreas Meister Advanced Query Optimization Last Change: April 23, /90

131 Join us! Possible collaboration: Thesis Scientific Team Project: Modern Database Technologies Requirements: Motivation (Programming skills) Andreas Meister Advanced Query Optimization Last Change: April 23, /90

132 References I Böhm, C., Kriegel, H.-P., Kröger, P., and Linhart, P. (2005). Selectivity Estimation of High Dimensional Window Queries via Clustering. In Advances in Spatial and Temporal Databases, volume 3633 of Lecture Notes in Computer Science, pages Springer. Han, W.-S., Kwak, W., Lee, J., Lohman, G. M., and Markl, V. (2008). Parallelizing Query Optimization. PVLDB, 1(1): Han, W.-S. and Lee, J. (2009). Dependency-aware Reordering for Parallelizing Query Optimization in Multi-core CPUs. SIGMOD, pages ACM. Heimel, M. and Markl, V. (2012). A First Step Towards GPU-assisted Query Optimization. In ADMS. Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., and Neumann, T. (2015). How Good Are Query Optimizers, Really? PVLDB, 9(3): Moerkotte, G. and Neumann, T. (2006). Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees Without Cross Products. VLDB, pages VLDB Endowment. Andreas Meister Advanced Query Optimization Last Change: April 23, /90

133 References II Neumann, T. (2014). Engineering High-Performance Database Engines. PVLDB, 7(13): Saake, G., Heuer, A., and Sattler, K.-U. (2012). Datenbanken: Implementierungstechniken. mitp-verlag, Bonn, 3 edition. Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., and Price, T. G. (1979). Access Path Selection in a Relational Database Management System. SIGMOD, pages ACM. Vance, B. and Maier, D. (1996). Rapid Bushy Join-order Optimization with Cartesian Products. SIGMOD, pages ACM. Zukowski, M., Boncz, P. A., Nes, N., and Héman, S. (2005). Monetdb/x100 - A DBMS in the CPU cache. IEEE Data Eng. Bull., 28(2): Andreas Meister Advanced Query Optimization Last Change: April 23, /90

Avoiding Sorting and Grouping In Processing Queries

Avoiding Sorting and Grouping In Processing Queries Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation

More information

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa Beyond EXPLAIN Query Optimization From Theory To Code Yuto Hayamizu Ryoji Kawamichi 2016/5/20 PGCon 2016 @ Ottawa Historically Before Relational Querying was physical Need to understand physical organization

More information

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Database Engines Main Components Query Processing Transaction Processing Access Methods JAN 2014 Slide

More information

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

Midterm Review. March 27, 2017

Midterm Review. March 27, 2017 Midterm Review March 27, 2017 1 Overview Relational Algebra & Query Evaluation Relational Algebra Rewrites Index Design / Selection Physical Layouts 2 Relational Algebra & Query Evaluation 3 Relational

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse

Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse Jiratta Phuboon-ob, and Raweewan Auepanwiriyakul Abstract A data warehouse (DW) is a system which has value and role for decision-making

More information

TPC-H Benchmark Set. TPC-H Benchmark. DDL for TPC-H datasets

TPC-H Benchmark Set. TPC-H Benchmark. DDL for TPC-H datasets TPC-H Benchmark Set TPC-H Benchmark TPC-H is an ad-hoc and decision support benchmark. Some of queries are available in the current Tajo. You can download the TPC-H data generator here. DDL for TPC-H datasets

More information

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17 Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

More information

Implementation of Relational Operations

Implementation of Relational Operations Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows

More information

6.830 Problem Set 2 (2017)

6.830 Problem Set 2 (2017) 6.830 Problem Set 2 1 Assigned: Monday, Sep 25, 2017 6.830 Problem Set 2 (2017) Due: Monday, Oct 16, 2017, 11:59 PM Submit to Gradescope: https://gradescope.com/courses/10498 The purpose of this problem

More information

Module 9: Query Optimization

Module 9: Query Optimization Module 9: Query Optimization Module Outline Web Forms Applications SQL Interface 9.1 Outline of Query Optimization 9.2 Motivating Example 9.3 Equivalences in the relational algebra 9.4 Heuristic optimization

More information

Towards Comprehensive Testing Tools

Towards Comprehensive Testing Tools Towards Comprehensive Testing Tools Redefining testing mechanisms! Kuntal Ghosh (Software Engineer) PGCon 2017 26.05.2017 1 Contents Motivation Picasso Visualizer Picasso Art Gallery for PostgreSQL 10

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

MOIRA A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems

MOIRA A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems MOIRA A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems Daniele Foroni, C. Axenie, S. Bortoli, M. Al Hajj Hassan, R. Acker,

More information

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Data Management. Lecture #9 (Query Processing Overview) Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm

More information

Evaluation of Relational Operations. Relational Operations

Evaluation of Relational Operations. Relational Operations Evaluation of Relational Operations Chapter 14, Part A (Joins) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Relational Operations v We will consider how to implement: Selection ( )

More information

CS330. Query Processing

CS330. Query Processing CS330 Query Processing 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically implemented using a `pull interface: when an operator is `pulled for

More information

Query Execution [15]

Query Execution [15] CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse

More information

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Module 4. Implementation of XQuery. Part 0: Background on relational query processing Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 7 - Query optimization Announcements HW1 due tonight at 11:45pm HW2 will be due in two weeks You get to implement your own

More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Evaluation. Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation v Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso).

Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso). Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso). Business Intelligence Extensions for SPARQL Orri Erling and Ivan Mikhailov OpenLink Software, 10 Burlington

More information

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

More information

On-Disk Bitmap Index Performance in Bizgres 0.9

On-Disk Bitmap Index Performance in Bizgres 0.9 On-Disk Bitmap Index Performance in Bizgres 0.9 A Greenplum Whitepaper April 2, 2006 Author: Ayush Parashar Performance Engineering Lab Table of Contents 1.0 Summary...1 2.0 Introduction...1 3.0 Performance

More information

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list

More information

Histogram Support in MySQL 8.0

Histogram Support in MySQL 8.0 Histogram Support in MySQL 8.0 Øystein Grøvlen Senior Principal Software Engineer MySQL Optimizer Team, Oracle February 2018 Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

Anorexic Plan Diagrams

Anorexic Plan Diagrams Anorexic Plan Diagrams E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Plan Diagram Reduction 1 Query Plan Selection Core technique Query (Q) Query Optimizer

More information

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:

More information

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

Optimizing Queries Using Materialized Views

Optimizing Queries Using Materialized Views Optimizing Queries Using Materialized Views Paul Larson & Jonathan Goldstein Microsoft Research 3/22/2001 Paul Larson, View matching 1 Materialized views Precomputed, stored result defined by a view expression

More information

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1 CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost

More information

Query Optimization Time: The New Bottleneck in Realtime

Query Optimization Time: The New Bottleneck in Realtime Query Optimization Time: The New Bottleneck in Realtime Analytics Rajkumar Sen Jack Chen Nika Jimsheleishvilli MemSQL Inc. MemSQL Inc. MemSQL Inc. 534 4 th Street, 534 4 th Street, 534 4 th Street, San

More information

Overview of Query Evaluation

Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization References Access path selection in a relational database management system. Selinger. et.

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 12, Part A Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset

More information

Reminders. Query Optimizer Overview. The Three Parts of an Optimizer. Dynamic Programming. Search Algorithm. CSE 444: Database Internals

Reminders. Query Optimizer Overview. The Three Parts of an Optimizer. Dynamic Programming. Search Algorithm. CSE 444: Database Internals Reminders CSE 444: Database Internals Lab 2 is due on Wednesday HW 5 is due on Friday Lecture 11 Query Optimization (part 2) CSE 444 - Winter 2017 1 CSE 444 - Winter 2017 2 Query Optimizer Overview Input:

More information

Hash-Based Indexing 165

Hash-Based Indexing 165 Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19

More information

Jayant Haritsa. Database Systems Lab Indian Institute of Science Bangalore, India

Jayant Haritsa. Database Systems Lab Indian Institute of Science Bangalore, India Jayant Haritsa Database Systems Lab Indian Institute of Science Bangalore, India Query Execution Plans SQL, the standard database query interface, is a declarative language Specifies only what is wanted,

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

CSE 444: Database Internals. Lecture 11 Query Optimization (part 2)

CSE 444: Database Internals. Lecture 11 Query Optimization (part 2) CSE 444: Database Internals Lecture 11 Query Optimization (part 2) CSE 444 - Spring 2014 1 Reminders Homework 2 due tonight by 11pm in drop box Alternatively: turn it in in my office by 4pm, or in Priya

More information

Optimizer Standof. MySQL 5.6 vs MariaDB 5.5. Peter Zaitsev, Ovais Tariq Percona Inc April 18, 2012

Optimizer Standof. MySQL 5.6 vs MariaDB 5.5. Peter Zaitsev, Ovais Tariq Percona Inc April 18, 2012 Optimizer Standof MySQL 5.6 vs MariaDB 5.5 Peter Zaitsev, Ovais Tariq Percona Inc April 18, 2012 Thank you Ovais Tariq Ovais Did a lot of heavy lifing for this presentation He could not come to talk together

More information

Evaluation of relational operations

Evaluation of relational operations Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book

More information

15-415/615 Faloutsos 1

15-415/615 Faloutsos 1 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415/615 Faloutsos 1 Outline introduction selection

More information

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy CompSci 516 Data Intensive Computing Systems Lecture 11 Query Optimization Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW2 has been posted on sakai Due on Oct

More information

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Implementing Relational Operators: Selection, Projection, Join Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Readings [RG] Sec. 14.1-14.4 Database Management Systems, R. Ramakrishnan and

More information

High Volume In-Memory Data Unification

High Volume In-Memory Data Unification 25 March 2017 High Volume In-Memory Data Unification for UniConnect Platform powered by Intel Xeon Processor E7 Family Contents Executive Summary... 1 Background... 1 Test Environment...2 Dataset Sizes...

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Tuning Relational Systems I

Tuning Relational Systems I Tuning Relational Systems I Schema design Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning, etc Query rewriting Using indexes appropriately,

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe

More information

arxiv: v1 [cs.db] 6 Jan 2019

arxiv: v1 [cs.db] 6 Jan 2019 Exact Selectivity Computation for Modern In-Memory Database Optimization Jun Hyung Shin University of California Merced jshin33@ucmerced.edu Florin Rusu University of California Merced frusu@ucmerced.edu

More information

Relational Query Optimization

Relational Query Optimization Relational Query Optimization Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

ECS 165B: Database System Implementa6on Lecture 7

ECS 165B: Database System Implementa6on Lecture 7 ECS 165B: Database System Implementa6on Lecture 7 UC Davis April 12, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: Dynamic aspects of

More information

Parallel Query Optimisation

Parallel Query Optimisation Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

A Compression Framework for Query Results

A Compression Framework for Query Results A Compression Framework for Query Results Zhiyuan Chen and Praveen Seshadri Cornell University zhychen, praveen@cs.cornell.edu, contact: (607)255-1045, fax:(607)255-4428 Decision-support applications in

More information

Lazy Maintenance of Materialized Views

Lazy Maintenance of Materialized Views Lazy Maintenance of Materialized Views Jingren Zhou, Microsoft Research, USA Paul Larson, Microsoft Research, USA Hicham G. Elmongui, Purdue University, USA Introduction 2 Materialized views Speed up query

More information

Schema Tuning. Tuning Schemas : Overview

Schema Tuning. Tuning Schemas : Overview Administração e Optimização de Bases de Dados 2012/2013 Schema Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID Tuning Schemas : Overview Trade-offs among normalization / denormalization Overview When

More information

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Administriva Lab 2 Final version due next Wednesday CS 133: Databases Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Problem sets PSet 5 due today No PSet out this week optional practice

More information

Lecture #14 Optimizer Implementation (Part I)

Lecture #14 Optimizer Implementation (Part I) 15-721 ADVANCED DATABASE SYSTEMS Lecture #14 Optimizer Implementation (Part I) Andy Pavlo / Carnegie Mellon University / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 2 TODAY S AGENDA

More information

Principles of Data Management. Lecture #12 (Query Optimization I)

Principles of Data Management. Lecture #12 (Query Optimization I) Principles of Data Management Lecture #12 (Query Optimization I) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v B+ tree

More information

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS Marcin Zukowski Sandor Heman, Niels Nes, Peter Boncz CWI, Amsterdam VLDB 2007 Outline Scans in a DBMS Cooperative Scans Benchmarks DSM version VLDB,

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Architecture and Implementation of Database Systems Query Processing

Architecture and Implementation of Database Systems Query Processing Architecture and Implementation of Database Systems Query Processing Ralf Möller Hamburg University of Technology Acknowledgements The course is partially based on http://www.systems.ethz.ch/ education/past-courses/hs08/archdbms

More information

Query Evaluation Overview, cont.

Query Evaluation Overview, cont. Query Evaluation Overview, cont. Lecture 9 Feb. 29, 2016 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record

More information

Column-Stores vs. Row-Stores: How Different Are They Really?

Column-Stores vs. Row-Stores: How Different Are They Really? Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian

More information

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week. Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from

More information

When and How to Take Advantage of New Optimizer Features in MySQL 5.6. Øystein Grøvlen Senior Principal Software Engineer, MySQL Oracle

When and How to Take Advantage of New Optimizer Features in MySQL 5.6. Øystein Grøvlen Senior Principal Software Engineer, MySQL Oracle When and How to Take Advantage of New Optimizer Features in MySQL 5.6 Øystein Grøvlen Senior Principal Software Engineer, MySQL Oracle Program Agenda Improvements for disk-bound queries Subquery improvements

More information

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes Schema for Examples Query Optimization (sid: integer, : string, rating: integer, age: real) (sid: integer, bid: integer, day: dates, rname: string) Similar to old schema; rname added for variations. :

More information

Advanced Databases: Parallel Databases A.Poulovassilis

Advanced Databases: Parallel Databases A.Poulovassilis 1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger

More information

Efficient in-memory query execution using JIT compiling. Han-Gyu Park

Efficient in-memory query execution using JIT compiling. Han-Gyu Park Efficient in-memory query execution using JIT compiling Han-Gyu Park 2012-11-16 CONTENTS Introduction How DCX works Experiment(purpose(at the beginning of this slide), environment, result, analysis & conclusion)

More information

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007 Relational Query Optimization Yanlei Diao UMass Amherst October 23 & 25, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:

More information

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,

More information

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors: Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

More information

Magda Balazinska - CSE 444, Spring

Magda Balazinska - CSE 444, Spring Query Optimization Algorithm CE 444: Database Internals Lectures 11-12 Query Optimization (part 2) Enumerate alternative plans (logical & physical) Compute estimated cost of each plan Compute number of

More information

Module 9: Selectivity Estimation

Module 9: Selectivity Estimation Module 9: Selectivity Estimation Module Outline 9.1 Query Cost and Selectivity Estimation 9.2 Database profiles 9.3 Sampling 9.4 Statistics maintained by commercial DBMS Web Forms Transaction Manager Lock

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15 Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating

More information

Simba: Towards Building Interactive Big Data Analytics Systems. Feifei Li

Simba: Towards Building Interactive Big Data Analytics Systems. Feifei Li Simba: Towards Building Interactive Big Data Analytics Systems Feifei Li Complex Operators over Rich Data Types Integrated into System Kernel For Example: SELECT k-means from Population WHERE k=5 and feature=age

More information

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Cost-based Query Sub-System Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer C. Faloutsos A. Pavlo

More information

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages Overview of Query Processing Query Parser Query Processor Evaluation of Relational Operations Query Rewriter Query Optimizer Query Executor Yanlei Diao UMass Amherst Lock Manager Access Methods (Buffer

More information

Query processing of pre-partitioned data using Sandwich Operators

Query processing of pre-partitioned data using Sandwich Operators Query processing of pre-partitioned data using Sandwich Operators Stephan Baumann 1, Peter Boncz 2, and Kai-Uwe Sattler 1 1 Ilmenau University of Technology, Ilmenau, Germany, first.last@tu-ilmenau.de,

More information

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection Outline Query Processing Overview Algorithms for basic operations Sorting Selection Join Projection Query optimization Heuristics Cost-based optimization 19 Estimate I/O Cost for Implementations Count

More information

DBMS Query evaluation

DBMS Query evaluation Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/

More information

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine QUERY OPTIMIZATION 1 QUERY OPTIMIZATION QUERY SUB-SYSTEM 2 ROADMAP 3. 12 QUERY BLOCKS: UNITS OF OPTIMIZATION An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested

More information

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

More information

Architecture and Implementation of Database Systems (Winter 2015/16)

Architecture and Implementation of Database Systems (Winter 2015/16) Jens Teubner Architecture & Implementation of DBMS Winter 2015/16 1 Architecture and Implementation of Database Systems (Winter 2015/16) Jens Teubner, DBIS Group jens.teubner@cs.tu-dortmund.de Winter 2015/16

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information