Advanced Query Optimization
|
|
- Rosa June Dorsey
- 5 years ago
- Views:
Transcription
1 Advanced Query Optimization Andreas Meister Otto-von-Guericke University Magdeburg Summer Term 2018
2 Why do we need query optimization? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
3 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC Andreas Meister Advanced Query Optimization Last Change: April 23, /90
4 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC How would you execute this query? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
5 Challenges SQL Declarative (What, not how!) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
6 Challenges SQL Declarative (What, not how!) Plenty options to chose from: Operator variants (Joins: NL, BNL, Hash, Sort-Merge,...) Table access: Index vs Scans Execution: Operator order... Andreas Meister Advanced Query Optimization Last Change: April 23, /90
7 Challenges SQL Declarative (What, not how!) Plenty options to chose from: Operator variants (Joins: NL, BNL, Hash, Sort-Merge,...) Table access: Index vs Scans Execution: Operator order... Options are based on properties of database: Data type Data distribution... Andreas Meister Advanced Query Optimization Last Change: April 23, /90
8 Challenges SQL Declarative (What, not how!) Plenty options to chose from: Operator variants (Joins: NL, BNL, Hash, Sort-Merge,...) Table access: Index vs Scans Execution: Operator order... Options are based on properties of database: Data type Data distribution... Why do we not chose an option randomly? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
9 Challenge Because the efficiency depends on the choice. runtime [ms] random query plans [sorted by runtime] Adapted from [Neumann, 2014] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
10 How is efficiency ensured? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
11 Query Processing - Phases SQL-Query Result LogicalL optimization PhysicalL optimization Cost-based Selection TranslationL& ViewLresolving Algebra StandardizationL& Simplification Optimization Algebra Access plan Translation time Execution Runtime Code Code-Generation PlanL parameterization Access plan Adapted from [Saake et al., 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
12 Query Processing - Translation & View Resolving Simplification of arithmetic expressions Resolve sub-queries Insertion of view definitions ɣ SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT));N_NAME σ C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < ' X X REGION X NATION SUPPLIER LINEITEM X X ORDERS CUSTOMER Andreas Meister Advanced Query Optimization Last Change: April 23, /90
13 Query Processing - Standardization & Simplification Expressions: Conjunctive normal form: (p 11 p 1n ) (p m1 p mn ) Disjunktive normal form: (p 11 p 1n ) (p m1 p mn ) Query: Unnesting σ A=B R 1 <Condition> R 1 R 2 A in π B R 2 Andreas Meister Advanced Query Optimization Last Change: April 23, /90
14 Query Processing - Optimization Goal: Efficient query execution Andreas Meister Advanced Query Optimization Last Change: April 23, /90
15 Query Processing - Optimization Goal: Efficient query execution Three phases: Logical optimization Physical optimization Cost-based selection Andreas Meister Advanced Query Optimization Last Change: April 23, /90
16 Query Processing - Optimization Goal: Efficient query execution Three phases: Logical optimization Physical optimization Cost-based selection Method: Use available information about Data (distribution, type, ) Database system (algorithms, processors, ) Query (operators, restrictions, ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
17 Query Processing - Logical Optimization Apply optimization rules ɣ SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT));N_NAME N_REGIONKEY = R_REGIONKEY σ R_NAME = ASIA S_NATIONKEY = N_NATIONKEY REGION NATION L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY SUPPLIER L_ORDERKEY = O_ORDERKEY LINEITEM C_CUSTKEY = O_CUSTKEY σ O_ORDERDATE >= ' AND O_ORDERDATE < ' CUSTOMER ORDERS Andreas Meister Advanced Query Optimization Last Change: April 23, /90
18 Query Processing - Algorithm Simple optimization algorithm Resolve complex selection predicate, if applicable resolving of and Remove redundant operators Pushing down selections as near as possible to the leaf Resolve cross joins Pushing projections in leaf direction Single steps will be executed in the stated order until no replacement can be performed Andreas Meister Advanced Query Optimization Last Change: April 23, /90
19 Query Processing - Physical Optimization Consider: Storage information (Indexes, page size, ) Algorithms Processors ɣ SORT SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT));N_NAME SORT N_REGIONKEY = R_REGIONKEY σ REL R_NAME = ASIA HASH S_NATIONKEY = N_NATIONKEY REGION NATION SORT L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY SUPPLIER NLJ L_ORDERKEY = O_ORDERKEY LINEITEM HASH C_CUSTKEY = O_CUSTKEY σ IND O_ORDERDATE >= ' AND O_ORDERDATE < ' CUSTOMER ORDERS Andreas Meister Advanced Query Optimization Last Change: April 23, /90
20 Query Processing - Cost-Based Selection query span search space transformation rules equivalent plans search strategy cost estimations "best" plan Adapted from [Saake et al., 2012] Often combined with physical optimization Andreas Meister Advanced Query Optimization Last Change: April 23, /90
21 Query Processing - Plan Parametrization Prepared Statements: Optimize once Execute multiple times Configuration via variables Replace variables with values Andreas Meister Advanced Query Optimization Last Change: April 23, /90
22 Query Processing - Code Generation Compile query into executable code Time (seconds) 100 "tuple at a time" DBMS "X" MySQL 4.1 interpretation dominates execution interpretation overhead decreases "column at a time" MonetDB/MIL main memory materialization overhead query without selection 1 vectors start to exceed 0.60 CPU cache, causing MonetDB/X100 extra memory traffic "vector at a time" 0.22 low interpretation overhead Hand Coded in cache materialization C Program K 4K 16K 64K 256K 1M 4M 6M TPC-H Query 1 with different vector size [Zukowski et al., 2005] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
23 Query Processing - Execution Planed all details Execute the code Andreas Meister Advanced Query Optimization Last Change: April 23, /90
24 Query Processing - Execution Planed all details Execute the code What could possibly go wrong? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
25 Query Processing - Execution Planed all details Execute the code What could possibly go wrong? In practice: Wrong assumptions Bad estimates Andreas Meister Advanced Query Optimization Last Change: April 23, /90
26 Query Processing - Execution Planed all details Execute the code What could possibly go wrong? In practice: Wrong assumptions Bad estimates Consider runtime information to adapt query-execution Andreas Meister Advanced Query Optimization Last Change: April 23, /90
27 How does the optimization work? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
28 Andreas Meister Advanced Query Optimization Last Change: April 23, /90 In specific: Join-Order Cost-based Optimization query span search space transformation rules equivalent plans search strategy cost estimations "best" plan Adapted from [Saake et al., 2012]
29 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC Andreas Meister Advanced Query Optimization Last Change: April 23, /90
30 SQL - TPC-H - Query 5 SELECT N_NAME, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) FROM CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = ASIA AND O_ORDERDATE >= AND O_ORDERDATE < GROUP BY N_NAME ORDER BY REVENUE DESC In which order should the join be performed? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
31 Join-Order Optimization Why consider join-order optimization? runtime [ms] random query plans [sorted by runtime] Adapted from [Neumann, 2014] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
32 Join-Order Optimization - Challenge Goal: Find an efficient join order Andreas Meister Advanced Query Optimization Last Change: April 23, /90
33 Join-Order Optimization - Challenge Goal: Find an efficient join order Challenge: NP-complete problem Limited time Andreas Meister Advanced Query Optimization Last Change: April 23, /90
34 Join-Order Optimization - Example TPC-H Q5 CUSTOMER (C) ORDERS (O) LINEITEM (L) SUPPLIER (S) NATION (N) REGION (R) L C L O L S L N L R ((L S) C) ((L S) O) ((L S) N) ((L S) R) (((L S) O) C) (((L S) O) N) (((L S) O) R) ((((L S) O) N) C) ((((L S) O) N) R) (((((L S) O) N) C) R) (((((L S) O) N) R) C) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
35 Join-Order Optimization - Example TPC-H Q5 CUSTOMER (C) ORDERS (O) LINEITEM (L) SUPPLIER (S) NATION (N) REGION (R) L C L O L S L N L R ((L S) C) ((L S) O) ((L S) N) ((L S) R) (((L S) O) C) (((L S) O) N) (((L S) O) R) ((((L S) O) N) C) ((((L S) O) N) R) (((((L S) O) N) C) R) (((((L S) O) N) R) C) This is not even all! Andreas Meister Advanced Query Optimization Last Change: April 23, /90
36 Join-Order Optimization - Tree Form Deep Trees L C N O S R R S O N L C Left-Deep Trees Right-Deep Trees Andreas Meister Advanced Query Optimization Last Change: April 23, /90
37 Join-Order Optimization - Tree Form More general tree forms: L S R N O C Zick-Zack Trees L S O N C R Bushy Trees Andreas Meister Advanced Query Optimization Last Change: April 23, /90
38 Join-Order Optimization - Challenge Number of equivalent options: Left (or Right) Deep-Trees: n! For 10 tables: Andreas Meister Advanced Query Optimization Last Change: April 23, /90
39 Join-Order Optimization - Challenge Number of equivalent options: Left (or Right) Deep-Trees: n! For 10 tables: Bushy Trees: S(n) n! variants S(1) = 1 n 1 S(n) = S(i)S(n i) i=1 For 10 tables: Andreas Meister Advanced Query Optimization Last Change: April 23, /90
40 Join-Order Optimization - Topology Number of valid trees depends on the query topology: r 1 r 2 r 3... r n r 1 r 2 r 3... r n linear cyclic r 1 r 1 r 2 r 2 r 3... r n r 3 r n... star clique Considering cross-joins clique Andreas Meister Advanced Query Optimization Last Change: April 23, /90
41 Join-Order Optimization - Approaches Greedy Deterministic Exhaustive search Hybrid Randomized Transformation Sampling Genetic algorithms Andreas Meister Advanced Query Optimization Last Change: April 23, /90
42 Join-Order Optimization - Deterministic Approaches Same input Same output Andreas Meister Advanced Query Optimization Last Change: April 23, /90
43 Join-Order Optimization - Deterministic Approaches Same input Same output Greedy: Short runtime Simple heuristics Neither efficiency nor optimality guaranteed Andreas Meister Advanced Query Optimization Last Change: April 23, /90
44 Join-Order Optimization - Deterministic Approaches Same input Same output Greedy: Short runtime Simple heuristics Neither efficiency nor optimality guaranteed Exhaustive Search: Guarantee optimal results Long runtime Only applicable to simple queries Andreas Meister Advanced Query Optimization Last Change: April 23, /90
45 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Andreas Meister Advanced Query Optimization Last Change: April 23, /90
46 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Transformation: Selection of initial result Transformation of current result Andreas Meister Advanced Query Optimization Last Change: April 23, /90
47 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Transformation: Selection of initial result Transformation of current result Sampling: Randomly choosing candidates Best solution is stored Andreas Meister Advanced Query Optimization Last Change: April 23, /90
48 Join-Order Optimization - Randomized Approaches Same input Different outputs No optimality guaranteed, but often efficient results Transformation: Selection of initial result Transformation of current result Sampling: Randomly choosing candidates Best solution is stored Genetic algorithms: Selection of initial solution pool Creating new solutions based on solution pool Andreas Meister Advanced Query Optimization Last Change: April 23, /90
49 Join-Order Optimization - Hybrid Approaches Advantages deterministic approaches: Predictable Ensure optimality Advantages randomized approaches: Suitable for complex optimization problems Limited runtime Hybrid approaches: Combine both deterministic and randomized approaches Andreas Meister Advanced Query Optimization Last Change: April 23, /90
50 Join-Order Optimization - Approaches Greedy Deterministic Exhaustive search Hybrid Randomized Transformation Sampling Genetic algorithms In specific: Dynamic programming Andreas Meister Advanced Query Optimization Last Change: April 23, /90
51 Join-Order Optimization - Dynamic Programming Algorithm 1: DP SIZE [Selinger et al., 1979] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = T i ; 3 for s = 2 to n do 4 for s l = 1 to s 1 do 5 s r = s s l ; 6 foreach S l T : S l = s l do 7 foreach S r T : S r = s r do 8 if S l S r then continue; 9 if S l not connected to S r then continue; 10 optimal-left-plan = optimalplan(s l ); 11 optimal-right-plan = optimalplan(s r ); 12 current-plan = createjointree(optimal-left-plan, optimal-right-plan); 13 if cost(optimalplan(s l S r )) > cost(current-plan) then 14 optimalplan(s l S r ) = current-plan ; 15 return optimalplan(t ) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90
52 Dynamic Programming - DP SIZE T 4 T 3 T 2 T 1 1 # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
53 Dynamic Programming - DP SIZE T 4 T 3,4 T 3 T 2... T 1 T 1,2 1 2 # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
54 Dynamic Programming - DP SIZE T 4 T 3,4 T 3 T T 1 T 1,2 T 1,2, # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
55 Dynamic Programming - DP SIZE T 4 T 3,4 T 3 T 1,2,3,4 T T 1 T 1,2 T 1,2,3 T 1,2,3, # Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
56 Dynamic Programming - DP SIZE Evaluation of DP SIZE only based on number of tables Not only valid...: (T1,T 2,T 3 ) (T 4 ) (T1,T 2,T 4 ) (T 3 ) (T 1,T 3,T 4 ) (T 2 ) (T 2,T 3,T 4 ) (T 1 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
57 Dynamic Programming - DP SIZE Evaluation of DP SIZE only based on number of tables Not only valid...: (T1,T 2,T 3 ) (T 4 ) (T1,T 2,T 4 ) (T 3 ) (T 1,T 3,T 4 ) (T 2 ) (T 2,T 3,T 4 ) (T 1 ).., but also invalid entries are evaluated: (T1,T 2,T 3 ) (T 1 ) (T1,T 2,T 3 ) (T 2 ) (T1,T 2,T 3 ) (T 3 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
58 Dynamic Programming - DP SIZE Evaluation of DP SIZE only based on number of tables Not only valid...: (T1,T 2,T 3 ) (T 4 ) (T1,T 2,T 4 ) (T 3 ) (T 1,T 3,T 4 ) (T 2 ) (T 2,T 3,T 4 ) (T 1 ).., but also invalid entries are evaluated: (T1,T 2,T 3 ) (T 1 ) (T1,T 2,T 3 ) (T 2 ) (T1,T 2,T 3 ) (T 3 ) Solution: Enumerate calculations (DP SUB ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
59 Dynamic Programming - DP SUB Idea: Use integer representation of solutions Table T i available i-th bit set 1 (T1 ) 2 (T 2 ) 3 (T 1,T 2 ) 4 (T3 ) 5 (T1,T 3 ) 15 (T1,T 2,T 3,T 4 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
60 Algorithm 2: DP SUB [Vance and Maier, 1996] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = T i ; 3 for k = 2 to n do 4 for S = 2 k to 2 k 1 do 5 foreach S l S do 6 optimal-left-plan = optimalplan(s l ); 7 if optimal-left-plan = then continue; 8 S r = S S l ; 9 optimal-right-plan = optimalplan(s r ); 10 if optimal-right-plan = then continue; 11 if optimal-left-plan not connected to optimal-right-plan then continue; 12 current-plan = createjointree(optimal-left-plan, optimal-right-plan); 13 if cost(optimalplan(s)) > cost(current-plan) then 14 optimalplan(s) = current-plan ; 15 return optimalplan(2 n 1) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90
61 Dynamic Programming - DP SUB n=2: 3 (T 1,T 2 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
62 Dynamic Programming - DP SUB n=2: n=3: 3 (T 1,T 2 ) 5 (T1,T 3 ) 6 (T 2,T 3 ) 7 (T 1,T 2,T 3 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
63 Dynamic Programming - DP SUB n=2: n=3: n=4: 3 (T 1,T 2 ) 5 (T1,T 3 ) 6 (T 2,T 3 ) 7 (T 1,T 2,T 3 ) 9 (T1,T 4 ) 10 (T 2,T 4 ) 11 (T 1,T 2,T 4 ) 12 (T3,T 4 ) 13 (T1,T 3,T 4 ) 14 (T2,T 3,T 4 ) 15 (T1,T 2,T 3,T 4 ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
64 Dynamic Programming - DP SUB Splitting integers determines calculations e.g. solution s 13 (T 1,T 3,T 4 ): Andreas Meister Advanced Query Optimization Last Change: April 23, /90
65 Dynamic Programming - DP SUB Splitting integers determines calculations e.g. solution s 13 (T 1,T 3,T 4 ): Implementation: 1 Determine the first left join partner l (least significant bit of s using Compiler or DeBruijn) 2 Determine first right join partner (s l) 3 Determine next left join partner l = s&(l s) 4 Repeat step 2-3 until l == s Andreas Meister Advanced Query Optimization Last Change: April 23, /90
66 Dynamic Programming - DP SUB Remember: Valid combinations based on query topology: r 1 r 2 r 3... r n r 1 r 2 r 3... r n linear cyclic r 1 r 1 r 2 r 2 r 3... r n r 3 r n... star clique DP SUB always enumerate all combinations For non-cliques also unneeded combinations are evaluated Andreas Meister Advanced Query Optimization Last Change: April 23, /90
67 Dynamic Programming - DP CCP Idea: Enumerate calculations based on query Andreas Meister Advanced Query Optimization Last Change: April 23, /90
68 Dynamic Programming - DP CCP Idea: Enumerate calculations based on query Approach: Enumerate tables of query in a breath-first-manner (Avoid duplicate calculations) Determine connected sub-graphs within query For sub-graphs: Determine complements (joinable connected-subgraphs) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
69 Algorithm 3: DP CCP [Moerkotte and Neumann, 2006] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = i ; 3 csgs = enumeratecsg(q); 4 foreach S l csgs do 5 cmps = enumeratecmp(q, S l ); 6 foreach S r cmps do 7 optimal-left-plan = optimalplan(s l ); 8 optimal-right-plan = optimalplan(s r ); 9 current-plan = createjointree(optimal-left-plan, optimal-right-plan); 10 if cost(optimalplan(s l S r )) > cost(current-plan) then 11 optimalplan(s l S r ) = current-plan ; 12 current-plan = createjointree(optimal-right-plan, optimal-left-plan); 13 if cost(optimalplan(s l S r )) > cost(current-plan) then 14 optimalplan(s l S r ) = current-plan ; 15 return optimalplan(r) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90
70 Dynamic Programming - Overview DP SIZE Good approach for simple optimization problems Inefficient (invalid join partners), for more complex optimization problems DP SUB Good for optimizing cliques Based on enumeration of all possible combinations, inefficient for other topologies DP CCP Only needed join pairs are evaluated based on enumeration Enumeration poses overhead for simple optimization problems Andreas Meister Advanced Query Optimization Last Change: April 23, /90
71 Problems with traditional DP Approaches Serial execution of calculations Independent calculations available R 1 R 2 R 3 R 4 Adapted from [Han et al., 2008] S = 2: R1-R2; R1-R3; R1-R4 Current multi-core CPUs offer parallel execution Parallel DP-approach needed for current hardware Andreas Meister Advanced Query Optimization Last Change: April 23, /90
72 Parallelized Dynamic Programming - PDP SVA Idea: Partition independent calculations [Han et al., 2008] Adapted from [Han et al., 2008] QS: Qualifier set PlanList: Optimal query execution plan q 1,, q 4 : Qualifier for relations P 1,, P 4 : QS with 1,, 4 relations Andreas Meister Advanced Query Optimization Last Change: April 23, /90
73 Algorithm 4: PDP SVA [Han et al., 2008] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 foreach T i T do 2 optimalplan(t i ) = T i ; 3 for s = 2 to n do 4 SSDVs = AllocateSearchSpace(S,m); 5 for i = 1 to MAX_THREAD_ID do 6 threadpool.submitjob(mutipleplanjoin(ssdvs[i],s)); 7 threadpool.sync(); 8 MergeAndPrunePlanPartitions(S); 9 for i = 1 to MAX_THREAD_ID do 10 threadpool.submitjob(buildskipvectorarray(i)); 11 threadpool.sync(); 12 return optimalplan(r) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90
74 Parallelized Dynamic Programming - PDP SVA Idea: Distribute over different threads [Han et al., 2008] Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
75 Allocation Schemata Search space: s 2 smallsz =1 ( P smallsz P S smallsz ) Total Sum Allocation: Divide the search space in m (number of threads) smaller parts and distribute them equally over the m threads Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
76 Allocation Schemata /2 Equi-Depth Allocation: Equally distribute each ( P smallsz P S smallsz ) over all threads (q 1,q 1 q 2 q 3 ) (q 2,q 1 q 2 q 3 ) (q 3,q 1 q 2 q 3 ) (q 4,q 1 q 2 q 3 ) (q 1 q 2,q 1 q 2 ) (q 1 q 3,q 1 q 2 ) (q 1 q 4,q 1 q 2 ) (q 1,q 1 q 2 q 4 ) (q 2,q 1 q 2 q 4 ) (q 3,q 1 q 2 q 4 ) (q 4,q 1 q 2 q 4 ) (q 1 q 2,q 1 q 3 ) (q 1 q 3,q 1 q 3 ) (q 1 q 4,q 1 q 3 ) (q 1,q 1 q 3 q 4 ) (q 2,q 1 q 3 q 4 ) P 1 P 3 thread 1 P 2 P 2 (q 3,q 1 q 3 q 4 ) (q 4,q 1 q 3 q 4 ) (q 1 q 2,q 1 q 4 ) (q 1 q 3,q 1 q 4 ) (q 1 q 4,q 1 q 4 ) thread 2 Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
77 Allocation Schemata /3 Round-Robin Outer Allocation: Randomly distribute join pairs (t i, t j ) to thread (i mod m) (q 1,q 1 q 2 q 3 ) (q 2,q 1 q 2 q 3 ) (q 3,q 1 q 2 q 3 ) (q 4,q 1 q 2 q 3 ) (q 1 q 2,q 1 q 2 ) (q 1 q 3,q 1 q 2 ) (q 1 q 4,q 1 q 2 ) (q 1,q 1 q 2 q 4 ) (q 2,q 1 q 2 q 4 ) (q 3,q 1 q 2 q 4 ) (q 4,q 1 q 2 q 4 ) (q 1 q 2,q 1 q 3 ) (q 1 q 3,q 1 q 3 ) (q 1 q 4,q 1 q 3 ) (q 1,q 1 q 3 q 4 ) (q 2,q 1 q 3 q 4 ) P 1 P 3 thread 1 P 2 P 2 (q 3,q 1 q 3 q 4 ) (q 4,q 1 q 3 q 4 ) (q 1 q 2,q 1 q 4 ) (q 1 q 3,q 1 q 4 ) (q 1 q 4,q 1 q 4 ) thread 2 Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
78 Allocation Schemata /4 Round-Robin Inner Allocation: Randomly distribute join pairs (t i, t j ) to thread (j mod m) (q 1,q 1 q 2 q 3 ) (q 2,q 1 q 2 q 3 ) (q 3,q 1 q 2 q 3 ) (q 4,q 1 q 2 q 3 ) (q 1 q 2,q 1 q 2 ) (q 1 q 3,q 1 q 2 ) (q 1 q 4,q 1 q 2 ) (q 1,q 1 q 2 q 4 ) (q 2,q 1 q 2 q 4 ) (q 3,q 1 q 2 q 4 ) (q 4,q 1 q 2 q 4 ) (q 1 q 2,q 1 q 3 ) (q 1 q 3,q 1 q 3 ) (q 1 q 4,q 1 q 3 ) (q 1,q 1 q 3 q 4 ) (q 2,q 1 q 3 q 4 ) P 1 P 3 thread 1 P 2 P 2 (q 3,q 1 q 3 q 4 ) (q 4,q 1 q 3 q 4 ) (q 1 q 2,q 1 q 4 ) (q 1 q 3,q 1 q 4 ) (q 1 q 4,q 1 q 4 ) thread 2 Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
79 Storage of allocation information Store distribution information in the search space description vector (SSDV) SSDV-Entry: smallsz, [stoutidx, stblkidx, stblkoff ], [endoutidx, endblkidx, endblkoff ], outinc, ininc Andreas Meister Advanced Query Optimization Last Change: April 23, /90
80 Storage of allocation information Store distribution information in the search space description vector (SSDV) SSDV-Entry: smallsz, [stoutidx, stblkidx, stblkoff ], [endoutidx, endblkidx, endblkoff ], outinc, ininc smallsz: Identifier for join of ( P smallsz P S smallsz ) stoutidx: Start index of outer tuple stblkidx: Start block index stblkoff: Offset of inner tuple within block endoutidx: End index of outer tuple endblkidx: End block index endblkoff: Offset of end inner tuple within block outinc: Step size for outer loop ininc: Step size for inner loop Andreas Meister Advanced Query Optimization Last Change: April 23, /90
81 Storage of allocation information - example Adapted from [Han et al., 2008] 1 Block: Thread 1 - SSDV-Entry: { 1, [1, 1, 1], [4, 1, 1], 1, 1, 2, [ 1, 1, 1], [ 1, 1, 1], 1, 1 } Thread 2 - SSDV-Entry: { 1, [4, 1, 2], [4, 1, 3], 1, 1, 2, [1, 1, 1], [3, 1, 3], 1, 1 } Andreas Meister Advanced Query Optimization Last Change: April 23, /90
82 Parallelized Dynamic Programming - PDP SVA Algorithm 5: MultiplePlanJoin [Han et al., 2008] Input: SSDV, S 1 for i=1 to s 2 do 2 PlanJoin(SSDV[i],S) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
83 Parallelized Dynamic Programming - PDP SVA Algorithm 6: PlanJoin [Han et al., 2008] Input: ssdvelmt, S 1 smallsz = ssdvelmt.smallsz; largesz = S-smallSZ; 2 for blkidx = ssdvelmt.stblkidx to ssdvelmt.endblkidx do 3 blk = blkidx-th block in P largesz ; 4 stoutidx, endoutidx = GetOuterRange(ssdvElmt, blkidx); 5 for t o=p smallsz [stoutidx] to P smallsz [endoutidx] step by ssdvelmt.outinc do 6 stblkoff, endblkoff = GetOffsetRangeInBlk(ssdvElmt,blkIdx,offset of t o); 7 for t i = blk [stblkoff ] to blk [endblkoff ] step by ssdvelmt.ininc do 8 if t o.qs t i.qs then 9 continue; 10 if not (t o.qs connected to t i.qs) then 11 continue; 12 Resulting plans = CreateJoinPlans(t o, t i ); 13 PrunePlans(P S, ResultingPlans); Andreas Meister Advanced Query Optimization Last Change: April 23, /90
84 Storage of allocation information - example Adapted from [Han et al., 2008] 1 Block: Thread 1 - SSDV-Entry: { 1, [1, 1, 1], [4, 1, 1], 1, 1, 2, [ 1, 1, 1], [ 1, 1, 1], 1, 1 } Thread 2 - SSDV-Entry: { 1, [4, 1, 2], [4, 1, 3], 1, 1, 2, [1, 1, 1], [3, 1, 3], 1, 1 } Andreas Meister Advanced Query Optimization Last Change: April 23, /90
85 Parallelizing problems Only a small amount of combinations of different join sets are valid Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
86 Skip Vector Arrays Problem: High number of invalid combination of qualifier sets Andreas Meister Advanced Query Optimization Last Change: April 23, /90
87 Skip Vector Arrays Problem: High number of invalid combination of qualifier sets Idea: Increase performance by skipping unnecessary combinations of join sets Store additional skipping information to efficiently determine the next join sets Andreas Meister Advanced Query Optimization Last Change: April 23, /90
88 Skip Vector Arrays Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
89 Skip Vector Arrays Adapted from [Han et al., 2008] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
90 Parallelized Dynamic Programming - PDP SVA Based on DP SIZE Same drawback invalid calculations Parallelization strategies for DP SUB and DP CCP needed Andreas Meister Advanced Query Optimization Last Change: April 23, /90
91 Parallelized Dynamic Programming - PDP SVA Based on DP SIZE Same drawback invalid calculations Parallelization strategies for DP SUB and DP CCP needed Idea: Use produce-consumer-model Consumer Consumer Producer Queue Consumer Consumer Consumer Andreas Meister Advanced Query Optimization Last Change: April 23, /90
92 Algorithm 7: DPE GENERIC [Han and Lee, 2009] Input : Join query Q with n tables T = {T 1,..., T n} Output: an optimal bushy join tree 1 EnumerationBuffer B c, B p; 2 Hash-Table Memo; 3 partial_order = buildpartialorder(r); 4 e = parsecalculations(partial_order, Memo, B p, MAX_ENUM_CNT ); 5 while e NO_MORE_PAIR do 6 switchbuffers() // B c = B p and B p = B c; 7 for i = 0 to MAX_THREAD_ID 1 do 8 threadpool.submitjob(generateqeps(b c,memo)); 9 e = parsecalculations(partial_order, Memo, B p, MAX_ENUM_CNT ); 10 GenerateQEPs(B c,memo)); 11 threadpool.sync(); 12 return Memo(R) ; Andreas Meister Advanced Query Optimization Last Change: April 23, /90
93 Parallelized Dynamic Programming - DPE GENERIC Characteristics: Parallel Producer-Consumer model Double queue (Producer + Consumer) Parameters: Maximal queue size Partial order Enumeration (DPSIZE,DP SUB,DP CCP ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
94 Parallelized Dynamic Programming - DPE GENERIC Partial Order: Grouping based on resulting quantifier set q 1q 2q 3q 4 (q 1, q 2q 3q 4),(q 2, q 1q 3q 4),(q 3, q 1q 2q 4),(q 4, q 1q 2q 3), (q 1q 2, q 3q 4),(q 1q 3, q 2q 4),(q 1q 4, q 2q 3) q 1q 2q 3 q 1q 2q 4 q 1q 3q 4 q 2q 3q 4 (q 1, q 2q 3), (q 2, q 1q 3), (q 3, q 1q 2) (q 1, q 2q 4), (q 2, q 1q 4), (q 4, q 1q 2) (q 1, q 3q 4), (q 3, q 1q 4), (q 4, q 1q 3) (q 2, q 3q 4), (q 3, q 2q 4), (q 4, q 2q 3) q 1q 2 q 1q 3 q 1q 4 q 2q 3 q 2q 4 q 3q 4 (q 1, q 2) (q 1, q 3) (q 1, q 4) (q 2, q 3) (q 2, q 4) (q 3, q 4) q 1 (q 1) q 2 (q 2) q 3 (q 3) q 4 (q 4) Adapted from [Han and Lee, 2009] Problem: Only few calculations are grouped together Andreas Meister Advanced Query Optimization Last Change: April 23, /90
95 Parallelized Dynamic Programming - DPE GENERIC Partial Order: Grouping based on size of resulting quantifier set SRQS 4 (q 1, q 2q 3q 4),(q 2, q 1q 3q 4),(q 3, q 1q 2q 4),(q 4, q 1q 2q 3), (q 1q 2, q 3q 4),(q 1q 3, q 2q 4),(q 1q 4, q 2q 3) SRQS 3 (q 1, q 2q 3),(q 1, q 2q 4),(q 1, q 3q 4),(q 2, q 3q 4), (q 2, q 1q 3),(q 2, q 1q 4),(q 3, q 1q 4),(q 3, q 2q 4), (q 3, q 1q 2),(q 4, q 1q 2),(q 4, q 1q 3),(q 4, q 2q 3) SRQS 2 (q 1, q 2),(q 1, q 3),(q 1, q 4),(q 2, q 3),(q 2, q 4),(q 3, q 4) SRQS 1 (q 1),(q 2),(q 3),(q 4) Adapted from [Han and Lee, 2009] Problem: Only few independent calculations are available Andreas Meister Advanced Query Optimization Last Change: April 23, /90
96 Parallelized Dynamic Programming - DPE GENERIC Partial Order: Grouping based on size of larger quantifier set SLQS 3 (q 1, q 2q 3q 4),(q 2, q 1q 3q 4),(q 3, q 2q 1q 4),(q 4, q 1q 2q 3) SLQS [1,3] SLQS 2 SLQS [1,2] SLQS [2,2] (q 1, q 2q 3), (q 1, q 2q 4), (q 1, q 3q 4), (q 1q 2, q 3q 4), (q 2, q 1q 3), (q 2, q 1q 4), (q 2, q 3q 4), (q 1q 3, q 2q 4), (q 3, q 1q 2), (q 3, q 1q 4), (q 3, q 2q 4), (q 1q 4, q 2q 3) (q 4, q 1q 2), (q 4, q 1q 3), (q 4, q 2q 3) SLQS 1 (q 1, q 2),(q 1, q 3),(q 1, q 4), (q 2, q 3),(q 2, q 4),(q 3, q 4) SLQS [1,1] SLQS 0 (q 1), (q 2), (q 3), (q 4) SLQS [1,0] Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
97 Parallelized Dynamic Programming - DPE GENERIC Each element in partial: own queue (q 1 ) (q 2 ) (q 3 ) (q 4 ) B [1, 0] (q 1, q 2 ) (q 1, q 3 ) (q 1, q 4 ) (q 2, q 3 ) (q 2, q 4 ) (q 3, q 4 ) B [1, 1] B [2, 2] (q 1, q 2 q 3 ) (q 1, q 3 q 4 ) (q 3, q 2 q 4 ) (q 4, q 2 q 3 ) (q 2, q 3 q 4 ) B [1, 2] B [1, 3] Adapted from [Han and Lee, 2009] Insert calculations into corresponding queue Consumer iterate over all available queues and pull available calculations Access need to be synchronized Andreas Meister Advanced Query Optimization Last Change: April 23, /90
98 Parallelized Dynamic Programming - DPE GENERIC While consumer evaluate available calculations, the producer creates new calculations Number of calculations is predefined front (q 1 q 4, q 2 q 3 ) (q 1 q 3, q 2 q 4 ) (q 1 q 2, q 3 q 4 ) B [2, 2] (q 2, q 1 q 4 ) (q 3, q 1 q 4 ) (q 2, q 1 q 3 ) (q 4, q 1 q 3 ) (q 3, q 1 q 2 ) (q 4, q 1 q 2 ) (q 1, q 2 q 4 ) B [1, 0] B [1, 1] B [1, 2] (q 2, q 1 q 3 q 4 ) (q 3, q 1 q 2 q 4 ) (q 4, q 1 q 2 q 3 ) (q 1, q 2 q 3 q 4 ) B [1, 3] Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
99 Parallelized Dynamic Programming - DPE GENERIC Sort calculations based on dependencies Better parallelization : dependency-free entries Threading Across Dependencies T a front T b B [, i ] B [, i + 1] B [, i + 2] B [, i ] B [, i + 1] B [, i + 2] Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
100 Parallelized Dynamic Programming - DPE GENERIC Sort calculations based on dependencies Better parallelization : dependency-free entries Threading Across Dependencies T a front T b B [, i ] B [, i + 1] B [, i + 2] B [, i ] B [, i + 1] B [, i + 2] Adapted from [Han and Lee, 2009] What happens after a thread pulled a calculation? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
101 Parallelized Dynamic Programming - DPE GENERIC Processing similar to sequential execution: Fetch information of join-partners Evaluate costs Hash table HSJN MGJN NLJN : a MEMO entry : a QEP node q 1 q 2 q 3 q 4 q 1 q 2 : a hash chain : a plan chain q 3 q 4 HSJN Adapted from [Han and Lee, 2009] Problem: Synchronization needed Andreas Meister Advanced Query Optimization Last Change: April 23, /90
102 Parallelized Dynamic Programming - DPE GENERIC Problem: Synchronization needed Solution: Group calculations into equivalence classes Link entries directly to entries of the hash table Hash table NLJN (m[q 1q 2q 3q 4], m[q 1q 4], m[q 2q 3]) (m[q 1q 2q 3q 4], m[q 1q 3], m[q 2q 4]) B [2,2] (m[q 1q 2q 3q 4], m[q 1q 2], m[q 3q 4]) q 1q 2q 3q 4 earmark-and-pinned q 3q 4 HSJN A synchronization-free global MEMO q 1q 2 (m[q 1q 3q 4], m[q 4], m[q 1q 3]) (m[q 1q 3q 4], m[q 3], m[q 1q 2]) (m[q 1q 2q 4], m[q 4], m[q 1q 2]) (m[q 1q 2q 4], m[q 2], m[q 1q 4]) (m[q 1q 2q 4], m[q 1], m[q 2q 4]) (m[q 1q 2q 3], m[q 3], m[q 1q 2]) B [1,2] (m[q 1q 2q 3], m[q 2], m[q 1q 3]) Adapted from [Han and Lee, 2009] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
103 So what is the best approach? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
104 Dynamic Programming - Evaluation Linear queries - simple cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
105 Dynamic Programming - Evaluation Star queries - simple cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
106 Dynamic Programming - Evaluation Clique queries - simple cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
107 Dynamic Programming - Evaluation Linear queries - complex cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE Complex #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
108 Dynamic Programming - Evaluation Star queries - complex cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
109 Dynamic Programming - Evaluation Clique queries - complex cost function: Runtime (ns) DP SIZE DP SUB DP CCP PDP SVA DPE GENE #Tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
110 Dynamic Programming - Evaluation There is no best approach: Sequential approaches good for simple problems Parallel approaches good for complex problems Andreas Meister Advanced Query Optimization Last Change: April 23, /90
111 Dynamic Programming - Evaluation There is no best approach: Sequential approaches good for simple problems Parallel approaches good for complex problems Simple problems: Simple cost function Small number of tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
112 Dynamic Programming - Evaluation There is no best approach: Sequential approaches good for simple problems Parallel approaches good for complex problems Simple problems: Simple cost function Small number of tables Complex problems: Complex cost function Large number of tables Andreas Meister Advanced Query Optimization Last Change: April 23, /90
113 How are cost calculated? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
114 Cost estimation Cost-based optimization needs cost estimations Cost-estimation directly influence optimization Andreas Meister Advanced Query Optimization Last Change: April 23, /90
115 Cost estimation Cost-based optimization needs cost estimations Cost-estimation directly influence optimization Challenge: Accurate estimations Time limit Andreas Meister Advanced Query Optimization Last Change: April 23, /90
116 Cost estimation Cost-estimation directly influence optimization: Good estimation good optimization results Bad estimations unpredictable optimization results Andreas Meister Advanced Query Optimization Last Change: April 23, /90
117 Cost estimation Cost-estimation directly influence optimization: Good estimation good optimization results Bad estimations unpredictable optimization results Challenge: Accurate estimations Time limit Andreas Meister Advanced Query Optimization Last Change: April 23, /90
118 Cost estimation Cost-estimation directly influence optimization: Good estimation good optimization results Bad estimations unpredictable optimization results Challenge: Accurate estimations Time limit Components: Cardinality estimation: Statistics + Formulas Histograms Parametric approaches Sample-based approaches Cost function Andreas Meister Advanced Query Optimization Last Change: April 23, /90
119 Cardinality estimation - Statistics + Formulas Idea: Estimate selectivity of operators σ F (R(R)) = sel(f, R) r Andreas Meister Advanced Query Optimization Last Change: April 23, /90
120 Cardinality estimation - Statistics + Formulas Idea: Estimate selectivity of operators σ F (R(R)) = sel(f, R) r Estimation (for interpolateable, arithmetic values) : sel(a = v, R) = 1 val A,r sel(a < v, R) = sel(a > v, R) = sel(a between v 1 and v 2, R) = v A min A max A min A max v A max A min v 2 v 1 A max A min Further cost formulas presented in [Saake et al., 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
121 Cardinality estimation - Histograms frequency real distribution approximate distribution attribute values Adapted from [Saake et al., 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
122 Cardinality estimation - Parametric approaches Gaussian Distribution Cluster Adapted from [Böhm et al., 2005] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
123 Cardinality estimation - Sample-based approaches Kernel density estimator Adapted from [Heimel and Markl, 2012] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
124 Cardinality estimation - Cost function Input: Cardinality estimation Query information Database information Output: Cost estimation (artificial, time, ) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
125 Cardinality estimation - Cost function Input: Cardinality estimation Query information Database information Output: Cost estimation (artificial, time, ) Considered aspects: Disk accesses CPU consumption Memory consumption Network traffic Execution time Cache misses Andreas Meister Advanced Query Optimization Last Change: April 23, /90
126 Cardinality estimation - Cost function Input: Cardinality estimation Query information Database information Output: Cost estimation (artificial, time, ) Considered aspects: Disk accesses CPU consumption Memory consumption Network traffic Execution time Cache misses Complex is not always better![leis et al., 2015] Andreas Meister Advanced Query Optimization Last Change: April 23, /90
127 Is this all? Andreas Meister Advanced Query Optimization Last Change: April 23, /90
128 Outlook Modern hardware Cost estimation New optimization approaches: Ant-Colony optimization Genetic algorithms Mixed integer linear programming Further optimization problems: Physical Database Design Partitioning Index Materialized views Self-Tuning Andreas Meister Advanced Query Optimization Last Change: April 23, /90
129 Wrap-up Query Optimization Join-Order Optimization Dynamic Programming Sequential Parallel Cost estimation Andreas Meister Advanced Query Optimization Last Change: April 23, /90
130 Wrap-up Query Optimization Join-Order Optimization Dynamic Programming Sequential Parallel Cost estimation There is no best approach Andreas Meister Advanced Query Optimization Last Change: April 23, /90
131 Join us! Possible collaboration: Thesis Scientific Team Project: Modern Database Technologies Requirements: Motivation (Programming skills) Andreas Meister Advanced Query Optimization Last Change: April 23, /90
132 References I Böhm, C., Kriegel, H.-P., Kröger, P., and Linhart, P. (2005). Selectivity Estimation of High Dimensional Window Queries via Clustering. In Advances in Spatial and Temporal Databases, volume 3633 of Lecture Notes in Computer Science, pages Springer. Han, W.-S., Kwak, W., Lee, J., Lohman, G. M., and Markl, V. (2008). Parallelizing Query Optimization. PVLDB, 1(1): Han, W.-S. and Lee, J. (2009). Dependency-aware Reordering for Parallelizing Query Optimization in Multi-core CPUs. SIGMOD, pages ACM. Heimel, M. and Markl, V. (2012). A First Step Towards GPU-assisted Query Optimization. In ADMS. Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., and Neumann, T. (2015). How Good Are Query Optimizers, Really? PVLDB, 9(3): Moerkotte, G. and Neumann, T. (2006). Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees Without Cross Products. VLDB, pages VLDB Endowment. Andreas Meister Advanced Query Optimization Last Change: April 23, /90
133 References II Neumann, T. (2014). Engineering High-Performance Database Engines. PVLDB, 7(13): Saake, G., Heuer, A., and Sattler, K.-U. (2012). Datenbanken: Implementierungstechniken. mitp-verlag, Bonn, 3 edition. Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., and Price, T. G. (1979). Access Path Selection in a Relational Database Management System. SIGMOD, pages ACM. Vance, B. and Maier, D. (1996). Rapid Bushy Join-order Optimization with Cartesian Products. SIGMOD, pages ACM. Zukowski, M., Boncz, P. A., Nes, N., and Héman, S. (2005). Monetdb/x100 - A DBMS in the CPU cache. IEEE Data Eng. Bull., 28(2): Andreas Meister Advanced Query Optimization Last Change: April 23, /90
Avoiding Sorting and Grouping In Processing Queries
Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation
More informationBeyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa
Beyond EXPLAIN Query Optimization From Theory To Code Yuto Hayamizu Ryoji Kawamichi 2016/5/20 PGCon 2016 @ Ottawa Historically Before Relational Querying was physical Need to understand physical organization
More informationQUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION
E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Database Engines Main Components Query Processing Transaction Processing Access Methods JAN 2014 Slide
More informationR & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:
Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationMidterm Review. March 27, 2017
Midterm Review March 27, 2017 1 Overview Relational Algebra & Query Evaluation Relational Algebra Rewrites Index Design / Selection Physical Layouts 2 Relational Algebra & Query Evaluation 3 Relational
More informationOverview of Implementing Relational Operators and Query Evaluation
Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders
More informationTwo-Phase Optimization for Selecting Materialized Views in a Data Warehouse
Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse Jiratta Phuboon-ob, and Raweewan Auepanwiriyakul Abstract A data warehouse (DW) is a system which has value and role for decision-making
More informationTPC-H Benchmark Set. TPC-H Benchmark. DDL for TPC-H datasets
TPC-H Benchmark Set TPC-H Benchmark TPC-H is an ad-hoc and decision support benchmark. Some of queries are available in the current Tajo. You can download the TPC-H data generator here. DDL for TPC-H datasets
More informationAnnouncement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17
Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa
More informationImplementation of Relational Operations
Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows
More information6.830 Problem Set 2 (2017)
6.830 Problem Set 2 1 Assigned: Monday, Sep 25, 2017 6.830 Problem Set 2 (2017) Due: Monday, Oct 16, 2017, 11:59 PM Submit to Gradescope: https://gradescope.com/courses/10498 The purpose of this problem
More informationModule 9: Query Optimization
Module 9: Query Optimization Module Outline Web Forms Applications SQL Interface 9.1 Outline of Query Optimization 9.2 Motivating Example 9.3 Equivalences in the relational algebra 9.4 Heuristic optimization
More informationTowards Comprehensive Testing Tools
Towards Comprehensive Testing Tools Redefining testing mechanisms! Kuntal Ghosh (Software Engineer) PGCon 2017 26.05.2017 1 Contents Motivation Picasso Visualizer Picasso Art Gallery for PostgreSQL 10
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationMOIRA A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems
MOIRA A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems Daniele Foroni, C. Axenie, S. Bortoli, M. Al Hajj Hassan, R. Acker,
More informationPrinciples of Data Management. Lecture #9 (Query Processing Overview)
Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm
More informationEvaluation of Relational Operations. Relational Operations
Evaluation of Relational Operations Chapter 14, Part A (Joins) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Relational Operations v We will consider how to implement: Selection ( )
More informationCS330. Query Processing
CS330 Query Processing 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically implemented using a `pull interface: when an operator is `pulled for
More informationQuery Execution [15]
CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse
More informationReview. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System
Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 7 - Query optimization Announcements HW1 due tonight at 11:45pm HW2 will be due in two weeks You get to implement your own
More informationFaloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection
More informationOverview of Query Evaluation. Overview of Query Evaluation
Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation v Plan: Tree of R.A. ops, with choice of alg for each op. Each operator
More informationOrri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso).
Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso). Business Intelligence Extensions for SPARQL Orri Erling and Ivan Mikhailov OpenLink Software, 10 Burlington
More informationCS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing
CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2
More informationOn-Disk Bitmap Index Performance in Bizgres 0.9
On-Disk Bitmap Index Performance in Bizgres 0.9 A Greenplum Whitepaper April 2, 2006 Author: Ayush Parashar Performance Engineering Lab Table of Contents 1.0 Summary...1 2.0 Introduction...1 3.0 Performance
More informationWhat happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques
376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list
More informationHistogram Support in MySQL 8.0
Histogram Support in MySQL 8.0 Øystein Grøvlen Senior Principal Software Engineer MySQL Optimizer Team, Oracle February 2018 Program Agenda 1 2 3 4 5 Motivating example Quick start guide How are histograms
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationAnorexic Plan Diagrams
Anorexic Plan Diagrams E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Plan Diagram Reduction 1 Query Plan Selection Core technique Query (Q) Query Optimizer
More informationEvaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi
Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:
More informationImplementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12
Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into
More informationCompSci 516 Data Intensive Computing Systems
CompSci 516 Data Intensive Computing Systems Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationAdministrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments
Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based
More informationOptimizing Queries Using Materialized Views
Optimizing Queries Using Materialized Views Paul Larson & Jonathan Goldstein Microsoft Research 3/22/2001 Paul Larson, View matching 1 Materialized views Precomputed, stored result defined by a view expression
More informationCAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1
CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost
More informationQuery Optimization Time: The New Bottleneck in Realtime
Query Optimization Time: The New Bottleneck in Realtime Analytics Rajkumar Sen Jack Chen Nika Jimsheleishvilli MemSQL Inc. MemSQL Inc. MemSQL Inc. 534 4 th Street, 534 4 th Street, 534 4 th Street, San
More informationOverview of Query Evaluation
Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization References Access path selection in a relational database management system. Selinger. et.
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 12, Part A Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset
More informationReminders. Query Optimizer Overview. The Three Parts of an Optimizer. Dynamic Programming. Search Algorithm. CSE 444: Database Internals
Reminders CSE 444: Database Internals Lab 2 is due on Wednesday HW 5 is due on Friday Lecture 11 Query Optimization (part 2) CSE 444 - Winter 2017 1 CSE 444 - Winter 2017 2 Query Optimizer Overview Input:
More informationHash-Based Indexing 165
Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19
More informationJayant Haritsa. Database Systems Lab Indian Institute of Science Bangalore, India
Jayant Haritsa Database Systems Lab Indian Institute of Science Bangalore, India Query Execution Plans SQL, the standard database query interface, is a declarative language Specifies only what is wanted,
More informationIntroduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms
More informationCSE 444: Database Internals. Lecture 11 Query Optimization (part 2)
CSE 444: Database Internals Lecture 11 Query Optimization (part 2) CSE 444 - Spring 2014 1 Reminders Homework 2 due tonight by 11pm in drop box Alternatively: turn it in in my office by 4pm, or in Priya
More informationOptimizer Standof. MySQL 5.6 vs MariaDB 5.5. Peter Zaitsev, Ovais Tariq Percona Inc April 18, 2012
Optimizer Standof MySQL 5.6 vs MariaDB 5.5 Peter Zaitsev, Ovais Tariq Percona Inc April 18, 2012 Thank you Ovais Tariq Ovais Did a lot of heavy lifing for this presentation He could not come to talk together
More informationEvaluation of relational operations
Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book
More information15-415/615 Faloutsos 1
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415/615 Faloutsos 1 Outline introduction selection
More informationCompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy
CompSci 516 Data Intensive Computing Systems Lecture 11 Query Optimization Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW2 has been posted on sakai Due on Oct
More informationImplementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1
Implementing Relational Operators: Selection, Projection, Join Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Readings [RG] Sec. 14.1-14.4 Database Management Systems, R. Ramakrishnan and
More informationHigh Volume In-Memory Data Unification
25 March 2017 High Volume In-Memory Data Unification for UniConnect Platform powered by Intel Xeon Processor E7 Family Contents Executive Summary... 1 Background... 1 Test Environment...2 Dataset Sizes...
More informationData Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationTuning Relational Systems I
Tuning Relational Systems I Schema design Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning, etc Query rewriting Using indexes appropriately,
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe
More informationarxiv: v1 [cs.db] 6 Jan 2019
Exact Selectivity Computation for Modern In-Memory Database Optimization Jun Hyung Shin University of California Merced jshin33@ucmerced.edu Florin Rusu University of California Merced frusu@ucmerced.edu
More informationRelational Query Optimization
Relational Query Optimization Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator
More informationData Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationECS 165B: Database System Implementa6on Lecture 7
ECS 165B: Database System Implementa6on Lecture 7 UC Davis April 12, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: Dynamic aspects of
More informationParallel Query Optimisation
Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationA Compression Framework for Query Results
A Compression Framework for Query Results Zhiyuan Chen and Praveen Seshadri Cornell University zhychen, praveen@cs.cornell.edu, contact: (607)255-1045, fax:(607)255-4428 Decision-support applications in
More informationLazy Maintenance of Materialized Views
Lazy Maintenance of Materialized Views Jingren Zhou, Microsoft Research, USA Paul Larson, Microsoft Research, USA Hicham G. Elmongui, Purdue University, USA Introduction 2 Materialized views Speed up query
More informationSchema Tuning. Tuning Schemas : Overview
Administração e Optimização de Bases de Dados 2012/2013 Schema Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID Tuning Schemas : Overview Trade-offs among normalization / denormalization Overview When
More informationAdministriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky
Administriva Lab 2 Final version due next Wednesday CS 133: Databases Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Problem sets PSet 5 due today No PSet out this week optional practice
More informationLecture #14 Optimizer Implementation (Part I)
15-721 ADVANCED DATABASE SYSTEMS Lecture #14 Optimizer Implementation (Part I) Andy Pavlo / Carnegie Mellon University / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 2 TODAY S AGENDA
More informationPrinciples of Data Management. Lecture #12 (Query Optimization I)
Principles of Data Management Lecture #12 (Query Optimization I) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v B+ tree
More informationSandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.
Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS Marcin Zukowski Sandor Heman, Niels Nes, Peter Boncz CWI, Amsterdam VLDB 2007 Outline Scans in a DBMS Cooperative Scans Benchmarks DSM version VLDB,
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables
More informationArchitecture and Implementation of Database Systems Query Processing
Architecture and Implementation of Database Systems Query Processing Ralf Möller Hamburg University of Technology Acknowledgements The course is partially based on http://www.systems.ethz.ch/ education/past-courses/hs08/archdbms
More informationQuery Evaluation Overview, cont.
Query Evaluation Overview, cont. Lecture 9 Feb. 29, 2016 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record
More informationColumn-Stores vs. Row-Stores: How Different Are They Really?
Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian
More informationDatabase Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.
Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationWhen and How to Take Advantage of New Optimizer Features in MySQL 5.6. Øystein Grøvlen Senior Principal Software Engineer, MySQL Oracle
When and How to Take Advantage of New Optimizer Features in MySQL 5.6 Øystein Grøvlen Senior Principal Software Engineer, MySQL Oracle Program Agenda Improvements for disk-bound queries Subquery improvements
More informationSchema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes
Schema for Examples Query Optimization (sid: integer, : string, rating: integer, age: real) (sid: integer, bid: integer, day: dates, rname: string) Similar to old schema; rname added for variations. :
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationEfficient in-memory query execution using JIT compiling. Han-Gyu Park
Efficient in-memory query execution using JIT compiling Han-Gyu Park 2012-11-16 CONTENTS Introduction How DCX works Experiment(purpose(at the beginning of this slide), environment, result, analysis & conclusion)
More informationRelational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007
Relational Query Optimization Yanlei Diao UMass Amherst October 23 & 25, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:
More informationBDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz
BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,
More informationQuery Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:
Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,
More informationMagda Balazinska - CSE 444, Spring
Query Optimization Algorithm CE 444: Database Internals Lectures 11-12 Query Optimization (part 2) Enumerate alternative plans (logical & physical) Compute estimated cost of each plan Compute number of
More informationModule 9: Selectivity Estimation
Module 9: Selectivity Estimation Module Outline 9.1 Query Cost and Selectivity Estimation 9.2 Database profiles 9.3 Sampling 9.4 Statistics maintained by commercial DBMS Web Forms Transaction Manager Lock
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationExamples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15
Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating
More informationSimba: Towards Building Interactive Big Data Analytics Systems. Feifei Li
Simba: Towards Building Interactive Big Data Analytics Systems Feifei Li Complex Operators over Rich Data Types Integrated into System Kernel For Example: SELECT k-means from Population WHERE k=5 and feature=age
More informationCost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.
Cost-based Query Sub-System Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer C. Faloutsos A. Pavlo
More informationOverview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages
Overview of Query Processing Query Parser Query Processor Evaluation of Relational Operations Query Rewriter Query Optimizer Query Executor Yanlei Diao UMass Amherst Lock Manager Access Methods (Buffer
More informationQuery processing of pre-partitioned data using Sandwich Operators
Query processing of pre-partitioned data using Sandwich Operators Stephan Baumann 1, Peter Boncz 2, and Kai-Uwe Sattler 1 1 Ilmenau University of Technology, Ilmenau, Germany, first.last@tu-ilmenau.de,
More informationOutline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection
Outline Query Processing Overview Algorithms for basic operations Sorting Selection Join Projection Query optimization Heuristics Cost-based optimization 19 Estimate I/O Cost for Implementations Count
More informationDBMS Query evaluation
Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/
More informationAn SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine
QUERY OPTIMIZATION 1 QUERY OPTIMIZATION QUERY SUB-SYSTEM 2 ROADMAP 3. 12 QUERY BLOCKS: UNITS OF OPTIMIZATION An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested
More informationAlgorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)
Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two
More informationArchitecture and Implementation of Database Systems (Winter 2015/16)
Jens Teubner Architecture & Implementation of DBMS Winter 2015/16 1 Architecture and Implementation of Database Systems (Winter 2015/16) Jens Teubner, DBIS Group jens.teubner@cs.tu-dortmund.de Winter 2015/16
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More information