Database Group Research Overview. Immanuel Trummer
|
|
- Esther Carter
- 5 years ago
- Views:
Transcription
1 Database Group Research Overview Immanuel Trummer
2 Talk Overview User Query Data Analysis Result Processing
3 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
4 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
5 Data Vocalization
6 How to Present Data?
7 How to Present Data? Data Visual Interfaces User
8 How to Present Data? Data Visualization Visual Interfaces Data User
9 How to Present Data? Data Visualization Visual Interfaces Data Audio Interfaces User
10 How to Present Data? Data Visualization Visual Interfaces Data Data Vocalization Audio Interfaces User
11 Reading versus Listening Feature Written Text Voice Output Slowing Rereading Skimming
12 Vocalization: Overview Data Vocalization User
13 Vocalization: Overview Cognitive Load Memory Load Data Vocalization User
14 Vocalization: Overview Speaking Time Cognitive Load Memory Load Data Vocalization User
15 Vocalization: Overview Precision Speaking Time Cognitive Load Memory Load Data Vocalization User
16 Vocalization: Overview Precision Speaking Time Cognitive Load Memory Load Data Vocalization User Optimization Problem: Given input relation, find time-optimal vocalization under constraints on precision, output structure, and memory load.
17 Example
18 Vocalization: Naive Baseline Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View Traditional American 4.9 La Masseria Italian 3.2 Input
19 Vocalization: Naive Baseline Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View Traditional American 4.9 Vocalization Upstate with Traditional American cuisine and four point seven five stars user average rating. Thai Castle with Thai cuisine and three point three stars user average rating. John s with Traditional American cuisine and four point seven stars user average rating. Paris with French cuisine and three point three stars user average rating. The View with Traditional American cuisine and four point nine stars user average rating. La Masseria with Italian cuisine and three point two stars average rating. La Masseria Italian 3.2 Input Output
20 Vocalization: Naive Baseline Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View Traditional American 4.9 Vocalization Upstate with Traditional American cuisine and four point seven five stars user average rating. Thai Castle with Thai cuisine and three point three stars user average rating. John s with Traditional American cuisine and four point seven stars user average rating. Paris with French cuisine and three point three stars user average rating. The View with Traditional American cuisine and four point nine stars user average rating. La Masseria with Italian cuisine and three point two stars average rating. La Masseria Italian 3.2 Input 502 Characters Output
21 Vocalization: Naive Baseline Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Input Vocalization Upstate with Traditional American cuisine and four point seven five stars user average rating. Thai Castle with Thai cuisine and three point three stars user average rating. John s with Traditional American cuisine and four point seven stars user average rating. Paris with French cuisine and three point three stars user average rating. The View with Traditional American cuisine and four point nine stars user average rating. La Masseria with Italian cuisine and three point two stars average rating. " 502 Characters Output
22 Vocalization: Naive Baseline Restaurant Cuisine Rating Upstate Equal Values Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Input Vocalization Upstate with Traditional American cuisine and four point seven five stars user average rating. Thai Castle with Thai cuisine and three point three stars user average rating. John s with Traditional American cuisine and four point seven stars user average rating. Paris with French cuisine and three point three stars user average rating. The View with Traditional American cuisine and four point nine stars user average rating. La Masseria with Italian cuisine and three point two stars average rating. " 502 Characters Output
23 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. Input Output
24 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American Input 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. Contexts Output
25 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American Input 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. Scopes Output
26 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. Input Output
27 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American Input 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine Characters Output
28 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American Input 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine Characters Output "
29 More Concise Version La Masseria with Italian cuisine and three point two stars average rating. Restaurant Cuisine Rating Upstate Traditional American Input 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Similar Values Restaurants with Traditional American cuisine: Upstate with four point seven five stars user average rating. John s with four point seven stars user average rating. The View with four point nine stars user average rating. Restaurants with three point three stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. Output
30 Even More Concise Version Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View Traditional American 4.9 Vocalization Restaurants with Traditional American cuisine and four to five stars user average rating: Upstate. John s. The View. Restaurants with three to four stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. La Masseria with Italian cuisine. La Masseria Italian 3.2 Input Output
31 Even More Concise Version Restaurant Cuisine Rating Upstate Traditional American 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View Traditional American 4.9 Vocalization Restaurants with Traditional American cuisine and four to five stars user average rating: Upstate. John s. The View. Restaurants with three to four stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. La Masseria with Italian cuisine. La Masseria Italian 3.2 Input Characters Output
32 Even More Concise Version Restaurant Cuisine Rating Upstate Traditional American Input 4.75 Thai Castle Thai 3.3 John s Traditional American 4.7 Paris French 3.3 The View La Masseria Traditional American 4.9 Italian 3.2 Vocalization Restaurants with Traditional American cuisine and four to five stars user average rating: Upstate. John s. The View. Restaurants with three to four stars user average rating: Thai Castle with Thai cuisine. Paris with French cuisine. La Masseria with Italian cuisine Characters Output #
33 Vocalization Problem Input: a relation to vocalize Search Space: sequence of scopes Constraints: Context size S Categorical value domains: Domain size C Numerical value domains: Upper Bound Lower Bound * W Memory Precision Precision Objective: minimize speaking time!
34 Vocalization Problem Input: a relation to vocalize NP-Hard! Search Space: sequence of scopes Constraints: Context size S Categorical value domains: Domain size C Numerical value domains: Upper Bound Lower Bound * W Memory Precision Precision Objective: minimize speaking time!
35 Algorithms
36 Overview of Algorithms
37 Overview of Algorithms Polynomial Exponential Complexity
38 Overview of Algorithms Guarantees Optimal Near-Optimal None Polynomial Exponential Complexity
39 Overview of Algorithms Optimal Guarantees Integer Programming Near-Optimal None Polynomial Exponential Complexity
40 Overview of Algorithms Optimal Guarantees Integer Programming Near-Optimal None Polynomial Two-Phase Algorithm Exponential Complexity
41 Overview of Algorithms Optimal Near-Optimal None Guarantees Integer Programming Greedy Approach Two-Phase Algorithm Polynomial Exponential Complexity
42 Greedy Algorithm Find Optimal Voice Output Optimal Exponential
43 Greedy Algorithm Find Optimal Context Set Find Optimal Voice Output Optimal Optimal Exponential Polynomial
44 Greedy Algorithm Greedily Construct Context Set Find Optimal Context Find Optimal Voice Output? Optimal Optimal Polynomial Exponential Polynomial
45 Greedy Algorithm Greedily Construct Context Set Find Optimal Context Find Optimal Voice Output? Optimal Optimal Time Savings(Context Set) Property Holds? Submodular Monotone Non-Negative Polynomial Exponential Polynomial
46 Greedy Algorithm Greedily Construct Context Set Find Optimal Context Find Optimal Voice Output Near-Optimal Optimal Optimal Time Savings(Context Set) Property Holds? Submodular Monotone Non-Negative Polynomial Exponential Polynomial
47 Greedy Algorithm Greedily Construct Context Set Greedily Construct Context Find Optimal Voice Output Near-Optimal? Optimal Time Savings(Context Set) Property Holds? Submodular Monotone Non-Negative Polynomial Polynomial Polynomial
48 Greedy Algorithm Greedily Construct Context Set Greedily Construct Context Find Optimal Voice Output Near-Optimal? Optimal Time Savings(Context Set) Property Holds? Submodular Monotone Non-Negative Polynomial Polynomial Polynomial Time Savings(Assignments) Property Holds? Submodular Monotone Non-Negative
49 Greedy Algorithm Greedily Construct Context Set Greedily Construct Context Find Optimal Voice Output Near-Optimal Near-Optimal Optimal Time Savings(Context Set) Property Holds? Submodular Monotone Non-Negative Polynomial Polynomial Polynomial Time Savings(Assignments) Property Holds? Submodular Monotone Non-Negative
50 Experiments
51 Scope of Experiments
52 Scope of Experiments Restaurants Mobile Phones Football Statistics Laptop Models Scenario
53 Scope of Experiments Restaurants Mobile Phones Football Statistics Laptop Models Naive Baseline Scenario Algorithm Integer Programming Two-Phase Algorithm Greedy Approach (Different Configurations)
54 Scope of Experiments Metric User Preference Optimization Time Speech Length Restaurants Mobile Phones Football Statistics Laptop Models Naive Baseline Scenario Algorithm Integer Programming Two-Phase Algorithm Greedy Approach (Different Configurations)
55 Experimental Platform Hardware MacBook Pro 2.3 GHz CPU 8 GB memory Software CPLEX 12.7 IBM Watson MacOS 10.12
56 Speech Length Precision Complexity #Rows Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium #Rows #Rows #Rows #Rows #Rows Laptops Restaurants Football Phones
57 Speech Length Reduced Length Precision Complexity #Rows Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium #Rows #Rows #Rows #Rows #Rows Laptops Restaurants Football Phones
58 Speech Length Reduced Length Precision Complexity #Rows Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium #Rows #Rows #Rows #Rows #Rows Laptops Restaurants Football Phones
59 Speech Length Reduced Length Precision Complexity #Rows Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium #Rows #Rows #Rows #Rows #Rows Laptops Restaurants Football Phones
60 Speech Length Reduced Length Precision Complexity #Rows Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium #Rows #Rows #Rows #Rows #Rows Best Laptops Restaurants Football Phones
61 Speech Length Reduced Length Precision Complexity #Rows Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium #Rows #Rows #Rows #Rows #Rows Best Entropy: 0.59 Entropy: 0.7 Laptops Restaurants Entropy: 0.65 Entropy: 0.59 Football Phones
62 User Preference vs. Time 10 Naive Concise 60 Naive Concise Preference (#Votes) Speech Length (Seconds) #Rows #Rows
63 Optimization Time Precision Complexity Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium Laptops Restaurants Football Phones #Rows #Rows #Rows #Rows #Rows #Rows
64 Best Precision Complexity Optimization Time Greedy Approach Two-Phase Algorithm Integer Programming High Medium Low High Medium Low Low Low Low Medium Medium Medium Laptops Restaurants Football Phones #Rows #Rows #Rows #Rows #Rows #Rows
65 Data to Text Written Text Sonification Non-Speech Speech Output Text for Voice Output Vocalization Relational Input Input is Text Text Reader Audio Output Visual Output Visualization
66 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
67 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
68 Query Optimization
69 Problem Model Tuning Knobs Operation order Operators Query Query Optimization Cost Metric Execution time Optimal Plan(s)
70 Problem Model Tuning Knobs Operation order Operators NP-Hard to Approximate Query Query Optimization Optimal Plan(s) NP-Hard to Solve Cost Metric Execution time
71 Pre-Computation Before Run Time VLDB 15 VLDBJ 16 Dissertation Overview Query Template Optimizer Potentially Optimal Plans CACM Research Jim Gray Highlight Award Best of VLDB Query Optimizer Best of SIGMOD Optimal Plans SLOW Exhaustive SLOW SIGMOD 17 SIGMOD 14 Query Optimizer Solver Near-Optimal Plans Solver FASTER Query Optimizer FASTER Near-Optimal Plans VLDB 16 Optimizer SIGMOD 15 Query * Plans Parallel FASTER Query Optimizer Acceptable Plans Near-Optimal Plans Optimal Plans Approximate Incremental Random Optimality Guarantees VLDB 16 FASTER Optimizer SIGMOD 16 Query Best Effort Plans Quantum Optimization Platform Query Optimizer Best Effort Plans FAST FAST
72 Solving Hard Optimization Problems Optimization Problem
73 Solving Hard Optimization Problems Optimization Problem Specialized Solver
74 Solving Hard Optimization Problems Standard Problem Optimization Problem Specialized Solver
75 Solving Hard Optimization Problems Standard Problem Optimization Problem Specialized Solver Standard Solver
76 Solving Hard Optimization Problems Join Ordering Optimization Problem Standard Problem Specialized Solver Standard Solver
77 Solving Hard Optimization Problems Join Ordering Optimization Problem Standard Problem Specialized Solver Query Optimizer Standard Solver
78 Solving Hard Optimization Problems Join Ordering Optimization Problem MILP Standard Problem Specialized Solver Query Optimizer Standard Solver
79 Solving Hard Optimization Problems Join Ordering Optimization Problem MILP Standard Problem Specialized Solver Query Optimizer Standard Solver MILP Solver
80 Evolution of MILP Solvers CPLEX Version CPLEX MILP Speedup Year Source: A brief history of linear and mixed-integer linear programming by R. Bixby
81 Evolution of MILP Solvers CPLEX Version Hardware Independently CPLEX MILP Speedup Year Source: A brief history of linear and mixed-integer linear programming by R. Bixby
82 Potential Benefits 1E+16 Search Space Size (#Relations) 1E+08 1E Number of Joined Tables
83 Potential Benefits 1E+16 Exhaustive Non-Exhaustive Search Space Size (#Relations) 1E+08 1E Number of Joined Tables
84 Potential Benefits 1E+16 Exhaustive MILP Non-Exhaustive Search Space Size (#Relations) 1E+08 1E Number of Joined Tables
85 Potential Benefits Search Space Size (#Relations) 1E+16 1E+08 1E+00 Exhaustive MILP Non-Exhaustive + Parallel Optimization Number of Joined Tables
86 Potential Benefits Search Space Size (#Relations) 1E+16 1E+08 1E+00 Exhaustive MILP Non-Exhaustive + Parallel Optimization + Incremental Optimization Number of Joined Tables
87 Potential Benefits Search Space Size (#Relations) 1E+16 1E+08 1E+00 Exhaustive MILP Non-Exhaustive + Parallel Optimization + Incremental Optimization + Configuration Options Number of Joined Tables
88 Transformation
89 Overview Join Ordering Given tables with cardinality, predicates with selectivity Find optimal join order, best operators, etc. Transformation Mixed Integer Linear Programming Integer and continuous Variables Set of Linear Constraints Linear Objective Function
90 Transformation Scope Queries: Projections, N-Ary Predicates, Sub-Queries, Operators: Hash Join, BNL Join, Sort-Merge Join, Cost: Interesting Orders, Correlated Predicates, Byte Sizes
91 Transformation Scope Queries: Projections, N-Ary Predicates, Sub-Queries, See Paper Operators: Hash Join, BNL Join, Sort-Merge Join, Cost: Interesting Orders, Correlated Predicates, Byte Sizes
92 Transformation Scope Queries: Projections, N-Ary Predicates, Sub-Queries, See Paper Operators: Hash Join, BNL Join, Sort-Merge Join, Cost: Interesting Orders, Correlated Predicates, Byte Sizes This Talk: Highly Simplified Model
93 SELECT * FROM R,S,T WHERE P(R,S)
94 SELECT * FROM R,S,T WHERE P(R,S)????
95 SELECT * FROM R,S,T WHERE P(R,S) BR BS BR BS Operand Tables? Operand Tables BT BR BS BT? Operand Tables BT BR BS?? Operand Tables BT BR BS Operand Tables BT Variables
96 SELECT * FROM R,S,T WHERE P(R,S) BR BS BR BS Operand Tables? Operand Tables BT BR BS BT? Operand Tables BT BR BS?? Operand Tables BT BR BS Operand Tables BT Constraints Variables
97 SELECT * FROM R,S,T WHERE P(R,S) BR BS BR BS Operand Tables? Operand Tables BT BR BS BT? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
98 SELECT * FROM R,S,T WHERE P(R,S) BR BS BR BS Operand Tables? Operand Tables BT BR BS BT =1? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
99 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality BR RC BS BR? Operand Tables Cardinality BS Operand Tables BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
100 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality BP? Operand Tables Applicable Predicates BR RC BS BR Cardinality BS Operand Tables BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
101 Calculating Cardinality T 1 T n WHERE P 1 P m
102 Calculating Cardinality Cardinality(T 1 T n WHERE P 1 P m )
103 Calculating Cardinality Cardinality(T 1 T n WHERE P 1 P m ) = icardinality(t i )* iselectivity(p i )
104 Calculating Cardinality Cardinality(T 1 T n WHERE P 1 P m ) = icardinality(t i )* iselectivity(p i )
105 Calculating Cardinality Cardinality(T 1 T n WHERE P 1 P m ) = icardinality(t i )* iselectivity(p i ) Not Linear!
106 Calculating Cardinality Cardinality(T 1 T n WHERE P 1 P m ) = icardinality(t i )* iselectivity(p i )
107 Calculating Cardinality Log(Cardinality(T 1 T n WHERE P 1 P m )) = Log( icardinality(t i )* iselectivity(p i ))
108 Calculating Cardinality Log(Cardinality(T 1 T n WHERE P 1 P m )) = Log( icardinality(t i )* iselectivity(p i )) = ilog(cardinality(t i ))+ ilog(selectivity(p i ))
109 Calculating Cardinality Log(Cardinality(T 1 T n WHERE P 1 P m )) = Log( icardinality(t i )* iselectivity(p i )) = ilog(cardinality(t i ))+ ilog(selectivity(p i ))
110 Calculating Cardinality Log(Cardinality(T 1 T n WHERE P 1 P m )) = Log( icardinality(t i )* iselectivity(p i )) = ilog(cardinality(t i ))+ ilog(selectivity(p i )) Linear!
111 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality BP? Operand Tables Applicable Predicates BR RC BS BR Cardinality BS Operand Tables BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
112 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality RL BP? Operand Tables Log-Cardinality Applicable Predicates BR RC BS BR Cardinality BS Operand Tables BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
113 Calculating Cardinality II Cardinality Log Cardinality
114 Calculating Cardinality II Cardinality Log Cardinality
115 Calculating Cardinality II Cardinality Log Cardinality BT1 BT2 BT3
116 Calculating Cardinality II Cardinality =0 =1 Log Cardinality =0 BT1 BT2 BT3 =1 =0 =1
117 Calculating Cardinality II Cardinality Approximate Cardinality =0 =1 Log Cardinality =0 BT1 BT2 BT3 =1 =0 =1
118 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality RL BP? Operand Tables Log-Cardinality Applicable Predicates BR RC BS BR Cardinality BS Operand Tables BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
119 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality BT1 BT2 RL BP? Operand Tables Cardinality Bounds Log-Cardinality Applicable Predicates BR RC BS BR Cardinality BS Operand Tables BT3 BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
120 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality RC Approximate Cardinality BT1 BT2 RL BP? Operand Tables Cardinality Bounds Log-Cardinality Applicable Predicates BR RC BS BR Cardinality BS Operand Tables BT3 BT RC BR BS BT =1 Cardinality? Operand Tables BT =1 BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
121 RC SELECT * FROM R,S,T WHERE P(R,S) Cardinality RC Approximate Cardinality BT1 BT2 RL BP? Operand Tables Cardinality Bounds Log-Cardinality Applicable Predicates BR BS Cardinality Operand Tables BT3 BT =1 Cardinality? Operand Tables =1 #Variables O(nrTables 2 ) RC BR BS RC BR BS BT BT BR BS?? Operand Tables BT =1 BR BS Operand Tables BT =1 Constraints Variables
122 Experimental Results
123 Prototypical Implementation Postgres Parser Optimizer ILP Solver Execution
124 Prototypical Implementation Postgres Parser Optimizer ILP Solver Execution
125 Prototypical Implementation Postgres Parser Optimizer ILP Solver Decomposition Join Ordering Execution
126 Prototypical Implementation Postgres Parser Optimizer ILP Solver Decomposition Join Ordering Genetic Algorithm Dynamic Programming Execution
127 Prototypical Implementation Postgres Parser Optimizer ILP Solver Decomposition Join Ordering Genetic Algorithm Dynamic Programming MILP Proxy Execution Query & Statistics Optimal Order MILP Problem MILP MILP Solution Generation
128 Prototypical Implementation Postgres Parser Optimizer ILP Solver 32 LOC Decomposition Join Ordering Genetic Algorithm Dynamic Programming MILP Proxy Execution Query & Statistics Optimal Order MILP Problem 396 LOC MILP MILP Solution Generation
129 Experimental Scope
130 Parameters Experimental Scope
131 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database
132 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Star Chain Cycle Irregular Queries
133 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Integer Programming Dynamic Programming: Bushy Dynamic Programming: LD+Cross Dynamic Programming: Left-Deep Genetic Algorithm: Left-Deep Genetic Algorithm: Bushy Optimizers Star Chain Cycle Irregular Queries
134 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Integer Programming Dynamic Programming: Bushy Dynamic Programming: LD+Cross Dynamic Programming: Left-Deep Genetic Algorithm: Left-Deep Genetic Algorithm: Bushy Optimizers Star Chain Cycle Irregular Queries Metrics
135 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Integer Programming Dynamic Programming: Bushy Dynamic Programming: LD+Cross Dynamic Programming: Left-Deep Genetic Algorithm: Left-Deep Genetic Algorithm: Bushy Optimizers Star Chain Cycle Irregular Queries Metrics Optimization Time Plan Quality
136 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Integer Programming Dynamic Programming: Bushy Dynamic Programming: LD+Cross Dynamic Programming: Left-Deep Genetic Algorithm: Left-Deep Genetic Algorithm: Bushy Optimizers Star Chain Cycle Irregular Queries Metrics Optimization Time Plan Quality Guarantees Actual
137 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Integer Programming Dynamic Programming: Bushy Dynamic Programming: LD+Cross Dynamic Programming: Left-Deep Genetic Algorithm: Left-Deep Genetic Algorithm: Bushy Optimizers Star Chain Cycle Irregular Queries Metrics Optimization Time Plan Quality Guarantees Actual Estimated Measured
138 Parameters Experimental Scope NFL TPC-H Graphs Loans Survey Musicbrainz Database Integer Programming Dynamic Programming: Bushy Dynamic Programming: LD+Cross Dynamic Programming: Left-Deep Genetic Algorithm: Left-Deep Genetic Algorithm: Bushy Optimizers Star Chain Cycle Irregular Queries Metrics Optimization Time Plan Quality Guarantees Actual Estimated Approximated Model Postgres Model Measured DBMS-X Postgres
139 Optimization Time DP-Left DP-Left+Cartesian DP-Bushy IP T-out 500 sec IP T-out 50 sec IP T-out 5 sec 500 sec 50 sec 5 sec Optimization Time (ms) Memory out Memory out Memory out Number of Joined Tables DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
140 Optimization Time DP-Left DP-Left+Cartesian DP-Bushy IP T-out 500 sec IP T-out 50 sec IP T-out 5 sec 500 sec 50 sec 5 sec Optimization Time (ms) Memory out Memory out Memory out Number of Joined Tables DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
141 Optimization Time DP-Left DP-Left+Cartesian DP-Bushy IP T-out 500 sec IP T-out 50 sec IP T-out 5 sec 500 sec 50 sec 5 sec Optimization Time (ms) Memory out Memory out Memory out Join Orders x Relations x Number of Joined Tables DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
142 MILP Optimizer Timeouts Budget 5 Timeouts Budget 50 Timeouts Budget 500 Timeouts Timeout (%) Number of Joined Tables DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
143 Guarantees on Plan Quality After 5 Seconds Survey-Star Loan-Star NFL-Star Graph-Chain Graph-Cycle Plan Cost Bound Number of Joined Tables DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
144 Approximate Cost Model vs. Postgres Cost Model Theoretical Worst Case Empirical Worst Case Optimal Plan 4 Cost Overhead Number of Joined Tables DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
145 10000 Musicbrainz : Combined Optimization and Execution Time Genetic Optimizer Left Genetic Optimizer Bushy MILP Optimizer Genetic Optimizer Left Genetic Optimizer Bushy MILP Optimizer Genetic Optimizer Left Genetic Optimizer Bushy MILP Optimizer Joined Tables Joined Tables Joined Tables Optimization Time Execution Time Total Time DBMS: Postgres 9.6 ILP Solver: Gurobi 6.5 Hardware: 12 GB Memory, 2.6 GHz CPU
146 Replacing Cost Models Motivation: unreliable execution cost estimates Correlated data User-defined predicates Goal: regret bounded query evaluation Method: use reinforcement learning (UCT)
147 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
148 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
149 The FactChecker
150 Text Summarizing Data
151 Text Summarizing Data
152 Text Summarizing Data
153 Authoring Tools Spell Checker
154 Authoring Tools Fact Checker
155 Input Overview Output Data Text Fact Checker Verified
156 Input Overview Output Data Text Fact Checker Claim: The average income in New York was $61,437. Verified
157 Input Overview Output Data Text Fact Checker Claim: The average income in New York was $61,437. Query: SELECT Avg(Income) FROM Population WHERE State = NY Claimed Result: 61,437 Verified
158 Input Overview Output Data Text Fact Checker Claim: The average income in New York was $61,437. Query: SELECT Avg(Income) FROM Population WHERE State = NY Claimed Result: 61,437 Evaluation Result: 60,419 Verified
159 Input Overview Output Data Text No Match! Fact Checker Claim: The average income in New York was $61,437. Query: SELECT Avg(Income) FROM Population WHERE State = NY Claimed Result: 61,437 Evaluation Result: 60,419 Verified
160 Fact Checking vs. Natural Language Querying Criterion NL Querying Fact Checking Input Type Query Query + Result Input Format One Sentence Multiple Sentences Batch Size One Query Multiple Related Claims
161 High-Level Overview Main Ideas Massive scale evaluation of candidates Search for common document theme Scope Focus on query sub-class E.g., SELECT AVG(Income)WHERE Profession = Computer Scientist AND Location = NY Semi-automatic fact checking
162 Matching
163 Matching Query Fragments Avg() Average, mean, State = NY NY, State, New York State, Empire State Income Income, Gain,
164 Matching Query Fragments Avg() Average, mean, State = NY NY, State, New York State, Empire State Claim Sentence Income Income, Gain,
165 Claim Keywords Matching Query Fragments Headline Section Headline Avg() Average, mean, Paragraph Start Claim Sentence State = NY NY, State, New York State, Empire State Income Income, Gain,
166 Claim Keywords IR Matching Query Fragments Headline Section Headline Avg() Average, mean, Paragraph Start Claim Sentence State = NY NY, State, New York State, Empire State Income Income, Gain,
167 Probabilistic Model Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate)
168 Probabilistic Model Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate) Claim 1 Variables Claim n Variables Q1 Qn
169 Probabilistic Model Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate) Claim 1 Variables Claim n Variables Q1 Qn S1 E1 Sn En
170 Probabilistic Model Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate) Claim 1 Variables Claim n Variables Q1 Qn S1 E1 Sn En
171 Probabilistic Model Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate) Claim 1 Variables Expectation Claim n Variables Q1 Qn S1 E1 Sn En
172 Probabilistic Model Maximization Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate) Claim 1 Variables Expectation Claim n Variables Q1 Qn S1 E1 Sn En
173 Probabilistic Model Maximization Document Parameters Pr(Function) Pr(Aggregate) Pr(Predicate) Claim 1 Variables Expectation Claim n Variables Q1 Qn S1 E1 Sn En Requires Execution!
174 Query Evaluation Fact Checker - Execution Engine Cost Model + Optimizer Query Merging Result Cache Incremental Execution Relational DBMS
175 User Interface
176 User Interface
177 User Interface
178 User Interface
179 Test Cases 34 articles with relational data sets Sources New York Times FiveThirtyEight Stack Overflow Data set size: from few KB to ~100 MB
180 Analysis of Test Cases
181 Claims by Article Correct Incorrect # Claims Articles
182 Claim Statistics Granularity Total Erroneous Erroneous (%) Articles Claims
183 Query Complexity No Predicates One Predicate Two Predicates 12% 18% 70%
184 Impact of Document Theme Aggregation Function Set of Restricted Columns Aggregated Column Coverage (%) Top-N Likely Candidates (N)
185 Evaluating Fact Checking
186 Accuracy of Query Mapping Total Correct Claim Wrong Claim Coverage (%) Top-N Queries (N)
187 Number of Possible Queries 1E+12 #Possible Queries 1E+06 1E+00 Test Cases
188 Impact of Keyword Source 80 Sentence +Previous +First in Paragraph +Synonyms +Headlines 70 Coverage (%) Top 1 Top 5 Top 10 Top-N Likely Candidates (N)
189 Time vs. Accuracy: #Iterations Top Coverage (%) Execution Time (s)
190 Impact of Optimizations on Execution Time Version Run Time (s) Speedup Naive Query Merging 82 x62 + Result Caching 57 x1.4 + Incremental Evaluation 54 x1.05
191 Execution Time Breakdown Indexing Expectation Maximization Information Retrieval Query Processing 10% 5% 24% 20% 6% 61% 67% 7% Small Data Large Data
192 User Study
193 User Study Setup Goal: compare FactChecker vs. SQL interface Criterion: time to verify claims in documents Participants: eight participants (7 CS majors) Setup: Intro: six minutes tutorial for fact checker Alternate between two interfaces Two long articles, 20 minutes per article Four short articles, 5 minutes per article
194 #Claims Verified by Article Stack Overflow 2016 Stack Overflow 2015 HipHop Candidate Airline Etiquettes Elo Blattner Stack Overflow # Claims Verified Time (s) Time (s) Time (s) Time (s) Fact Checker (AVG) SQL (AVG) Time (s) Time (s)
195 FC-Speedup by User 18 Speedup (#Verifications/Minute) U1 U2 U3 U4 U5 U6 U7 U8 AVG User
196 Features Used Top-1 Top-5 Top-10 Custom 5% 13% 45% 38%
197 Precision and Recall Approach Recall Precision F1 Score FactChecker + User FactChecker (Automatic) 100% 91% 95% 67% 67% 67% SQL + User 30% 57% 39%
198 Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query Optimization
199 ?
Accelerating Analytical Workloads
Accelerating Analytical Workloads Thomas Neumann Technische Universität München April 15, 2014 Scale Out in Big Data Analytics Big Data usually means data is distributed Scale out to process very large
More informationColumn Stores vs. Row Stores How Different Are They Really?
Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background
More informationJoin Processing for Flash SSDs: Remembering Past Lessons
Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of
More informationVocalizing Large Time Series Efficiently
Vocalizing Large Time Series Efficiently Immanuel Trummer, Mark Bryan, Ramya Narasimha {it224,mab539,rn343}@cornell.edu Cornell University ABSTRACT We vocalize query results for time series data. We describe
More informationColumn-Stores vs. Row-Stores. How Different are they Really? Arul Bharathi
Column-Stores vs. Row-Stores How Different are they Really? Arul Bharathi Authors Daniel J.Abadi Samuel R. Madden Nabil Hachem 2 Contents Introduction Row Oriented Execution Column Oriented Execution Column-Store
More informationWeaving Relations for Cache Performance
VLDB 2001, Rome, Italy Best Paper Award Weaving Relations for Cache Performance Anastassia Ailamaki David J. DeWitt Mark D. Hill Marios Skounakis Presented by: Ippokratis Pandis Bottleneck in DBMSs Processor
More informationParallel Query Optimisation
Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation
More informationAdvanced Data Management
Advanced Data Management Medha Atre Office: KD-219 atrem@cse.iitk.ac.in Aug 11, 2016 Assignment-1 due on Aug 15 23:59 IST. Submission instructions will be posted by tomorrow, Friday Aug 12 on the course
More informationXML Data Stream Processing: Extensions to YFilter
XML Data Stream Processing: Extensions to YFilter Shaolei Feng and Giridhar Kumaran January 31, 2007 Abstract Running XPath queries on XML data steams is a challenge. Current approaches that store the
More informationColumn-Stores vs. Row-Stores: How Different Are They Really?
Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian
More informationTriangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017
Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017 Joe Sack, Principal Program Manager, Microsoft Joe.Sack@Microsoft.com Adaptability Adapt based on customer
More informationRelational Database Index Design and the Optimizers
Relational Database Index Design and the Optimizers DB2, Oracle, SQL Server, et al. Tapio Lahdenmäki Michael Leach (C^WILEY- IX/INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface xv 1
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Georgia Institute of Technology: Haicheng Wu, Ifrah Saeed, Sudhakar Yalamanchili LogicBlox Inc.: Daniel Zinn, Martin Bravenboer,
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationDatabase Learning: Toward a Database that Becomes Smarter Over Time
Database Learning: Toward a Database that Becomes Smarter Over Time Yongjoo Park Ahmad Shahab Tajik Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Today s databases Database Users
More informationApplying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems
V Brazilian Symposium on Computing Systems Engineering Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems Alessandro Trindade, Hussama Ismail, and Lucas Cordeiro Foz
More informationObstacle-Aware Longest-Path Routing with Parallel MILP Solvers
, October 20-22, 2010, San Francisco, USA Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers I-Lun Tseng, Member, IAENG, Huan-Wen Chen, and Che-I Lee Abstract Longest-path routing problems,
More informationOutline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012
Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Haicheng Wu 1, Gregory Diamos 2, Tim Sheard 3, Molham Aref 4, Sean Baxter 2, Michael Garland 2, Sudhakar Yalamanchili 1 1. Georgia
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationOutline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015
Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Examples Unit 6 WS 2014/2015 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet.
More informationElementary IR: Scalable Boolean Text Search. (Compare with R & G )
Elementary IR: Scalable Boolean Text Search (Compare with R & G 27.1-3) Information Retrieval: History A research field traditionally separate from Databases Hans P. Luhn, IBM, 1959: Keyword in Context
More informationA Class of Submodular Functions for Document Summarization
A Class of Submodular Functions for Document Summarization Hui Lin, Jeff Bilmes University of Washington, Seattle Dept. of Electrical Engineering June 20, 2011 Lin and Bilmes Submodular Summarization June
More informationCS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing
CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2
More informationInternational Journal of Advance Engineering and Research Development. Performance Enhancement of Search System
Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationRecord Placement Based on Data Skew Using Solid State Drives
BPOE-5 Record Placement Based on Data Skew Using Solid State Drives Jun Suzuki 1,2, Shivaram Venkataraman 2, Sameer Agarwal 2, Michael Franklin 2, and Ion Stoica 2 1 Green Platform Research Laboratories,
More informationAutomated Configuration of MIP solvers
Automated Configuration of MIP solvers Frank Hutter, Holger Hoos, and Kevin Leyton-Brown Department of Computer Science University of British Columbia Vancouver, Canada {hutter,hoos,kevinlb}@cs.ubc.ca
More informationCMPT 354: Database System I. Lecture 7. Basics of Query Optimization
CMPT 354: Database System I Lecture 7. Basics of Query Optimization 1 Why should you care? https://databricks.com/glossary/catalyst-optimizer https://sigmod.org/sigmod-awards/people/goetz-graefe-2017-sigmod-edgar-f-codd-innovations-award/
More informationBridging the Processor/Memory Performance Gap in Database Applications
Bridging the Processor/Memory Performance Gap in Database Applications Anastassia Ailamaki Carnegie Mellon http://www.cs.cmu.edu/~natassa Memory Hierarchies PROCESSOR EXECUTION PIPELINE L1 I-CACHE L1 D-CACHE
More informationChapter 9. Cardinality Estimation. How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11
Chapter 9 How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11 Wilhelm-Schickard-Institut für Informatik Universität Tübingen 9.1 Web Forms Applications
More informationLecture #14 Optimizer Implementation (Part I)
15-721 ADVANCED DATABASE SYSTEMS Lecture #14 Optimizer Implementation (Part I) Andy Pavlo / Carnegie Mellon University / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 2 TODAY S AGENDA
More informationQUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION
E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Database Engines Main Components Query Processing Transaction Processing Access Methods JAN 2014 Slide
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More informationHANA Performance. Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business
More informationA Case for Merge Joins in Mediator Systems
A Case for Merge Joins in Mediator Systems Ramon Lawrence Kirk Hackert IDEA Lab, Department of Computer Science, University of Iowa Iowa City, IA, USA {ramon-lawrence, kirk-hackert}@uiowa.edu Abstract
More informationVenice: Reliable Virtual Data Center Embedding in Clouds
Venice: Reliable Virtual Data Center Embedding in Clouds Qi Zhang, Mohamed Faten Zhani, Maissa Jabri and Raouf Boutaba University of Waterloo IEEE INFOCOM Toronto, Ontario, Canada April 29, 2014 1 Introduction
More informationA Fast Randomized Algorithm for Multi-Objective Query Optimization
A Fast Randomized Algorithm for Multi-Objective Query Optimization Immanuel Trummer and Christoph Koch {firstname}.{lastname}@epfl.ch École Polytechnique Fédérale de Lausanne ABSTRACT Query plans are compared
More informationCSE 232A Graduate Database Systems
CSE 232A Graduate Database Systems Arun Kumar Topic 4: Query Optimization Chapters 12 and 15 of Cow Book Slide ACKs: Jignesh Patel, Paris Koutris 1 Lifecycle of a Query Query Result Query Database Server
More informationArchitecture-Conscious Database Systems
Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query
More informationPathfinder/MonetDB: A High-Performance Relational Runtime for XQuery
Introduction Problems & Solutions Join Recognition Experimental Results Introduction GK Spring Workshop Waldau: Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Database & Information
More informationOptimizing Join Enumeration in Transformation- based Query Optimizers
Optimizing Join Enumeration in Transformation- based Query Optimizers ANIL SHANBHAG, S. SUDARSHAN IIT BOMBAY anils@mit.edu, sudarsha@cse.iitb.ac.in VLDB 2014 Query Optimization: Quick Background System
More informationHYRISE In-Memory Storage Engine
HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University
More informationLeveraging Transitive Relations for Crowdsourced Joins*
Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,
More informationBackground: disk access vs. main memory access (1/2)
4.4 B-trees Disk access vs. main memory access: background B-tree concept Node structure Structural properties Insertion operation Deletion operation Running time 66 Background: disk access vs. main memory
More informationFaloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection
More informationHow Achaeans Would Construct Columns in Troy. Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte
How Achaeans Would Construct Columns in Troy Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte Number of Visas Received 1 0,75 0,5 0,25 0 Alekh Jens Health Level
More informationSubmodular Optimization
Submodular Optimization Nathaniel Grammel N. Grammel Submodular Optimization 1 / 28 Submodularity Captures the notion of Diminishing Returns Definition Suppose U is a set. A set function f :2 U! R is submodular
More informationRelational Database Index Design and the Optimizers
Relational Database Index Design and the Optimizers DB2, Oracle, SQL Server, et al. Tapio Lahdenmäki Michael Leach A JOHN WILEY & SONS, INC., PUBLICATION Relational Database Index Design and the Optimizers
More informationAutomatic Scaling Iterative Computations. Aug. 7 th, 2012
Automatic Scaling Iterative Computations Guozhang Wang Cornell University Aug. 7 th, 2012 1 What are Non-Iterative Computations? Non-iterative computation flow Directed Acyclic Examples Batch style analytics
More informationOPEN INFORMATION EXTRACTION FROM THE WEB. Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni
OPEN INFORMATION EXTRACTION FROM THE WEB Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni Call for a Shake Up in Search! Question Answering rather than indexed key
More informationJHPCSN: Volume 4, Number 1, 2012, pp. 1-7
JHPCSN: Volume 4, Number 1, 2012, pp. 1-7 QUERY OPTIMIZATION BY GENETIC ALGORITHM P. K. Butey 1, Shweta Meshram 2 & R. L. Sonolikar 3 1 Kamala Nehru Mahavidhyalay, Nagpur. 2 Prof. Priyadarshini Institute
More informationPrinciples of Data Management. Lecture #9 (Query Processing Overview)
Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I
Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2015 Quiz I There are 12 questions and 13 pages in this quiz booklet. To receive
More informationRecord Placement Based on Data Skew Using Solid State Drives
Record Placement Based on Data Skew Using Solid State Drives Jun Suzuki 1, Shivaram Venkataraman 2, Sameer Agarwal 2, Michael Franklin 2, and Ion Stoica 2 1 Green Platform Research Laboratories, NEC j-suzuki@ax.jp.nec.com
More informationIntroduction to Column Stores with MemSQL. Seminar Database Systems Final presentation, 11. January 2016 by Christian Bisig
Final presentation, 11. January 2016 by Christian Bisig Topics Scope and goals Approaching Column-Stores Introducing MemSQL Benchmark setup & execution Benchmark result & interpretation Conclusion Questions
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationBDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz
BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,
More informationOverview. DW Performance Optimization. Aggregates. Aggregate Use Example
Overview DW Performance Optimization Choosing aggregates Maintaining views Bitmapped indices Other optimization issues Original slides were written by Torben Bach Pedersen Aalborg University 07 - DWML
More informationD2.P2: A Query Planner Based on Quantitative and Structural Information. Francesco Scarcello Alfredo Mazzitelli Università della Calabria
D2.P2: A Query Planner Based on Quantitative and Structural Information Francesco Scarcello Alfredo Mazzitelli Università della Calabria Query Planners Given a query and some information on the database,
More informationReview. Support for data retrieval at the physical level:
Query Processing Review Support for data retrieval at the physical level: Indices: data structures to help with some query evaluation: SELECTION queries (ssn = 123) RANGE queries (100
More informationHistogram-Aware Sorting for Enhanced Word-Aligned Compress
Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes 1- University of New Brunswick, Saint John 2- Université du Québec at Montréal (UQAM) October 23, 2008 Bitmap indexes SELECT
More informationParallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Parallel DBMS Prof. Yanlei Diao University of Massachusetts Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke I. Parallel Databases 101 Rise of parallel databases: late 80 s Architecture: shared-nothing
More informationPerformance Optimization for Informatica Data Services ( Hotfix 3)
Performance Optimization for Informatica Data Services (9.5.0-9.6.1 Hotfix 3) 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,
More informationSmooth Scan: Statistics-Oblivious Access Paths. Renata Borovica-Gajic Stratos Idreos Anastasia Ailamaki Marcin Zukowski Campbell Fraser
Smooth Scan: Statistics-Oblivious Access Paths Renata Borovica-Gajic Stratos Idreos Anastasia Ailamaki Marcin Zukowski Campbell Fraser Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q16 Q18 Q19 Q21 Q22
More informationOverview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri
Overview of Data Exploration Techniques Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri data exploration not always sure what we are looking for (until we find it) data has always been big volume
More informationWeaving Relations for Cache Performance
Weaving Relations for Cache Performance Anastassia Ailamaki Carnegie Mellon Computer Platforms in 198 Execution PROCESSOR 1 cycles/instruction Data and Instructions cycles
More informationFASTER UPPER BOUNDING OF INTERSECTION SIZES
FASTER UPPER BOUNDING OF INTERSECTION SIZES IBM Research Tokyo - Japan Daisuke Takuma Hiroki Yanagisawa 213 IBM Outline Motivation of upper bounding Contributions Data structure Proposal Evaluation (introduce
More informationApproximation Schemes for Many-Objective Query Optimization
Approximation Schemes for Many-Objective Query Optimization Immanuel Trummer and Christoph Koch École Polytechnique Fédérale de Lausanne {firstname}.{lastname}@epfl.ch ABSTRACT The goal of multi-objective
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationDistributed Query Optimization: Use of mobile Agents Kodanda Kumar Melpadi
Distributed Query Optimization: Use of mobile Agents Kodanda Kumar Melpadi M.Tech (IT) GGS Indraprastha University Delhi mk_kumar_76@yahoo.com Abstract DDBS adds to the conventional centralized DBS some
More informationJust In Time Compilation in PostgreSQL 11 and onward
Just In Time Compilation in PostgreSQL 11 and onward Andres Freund PostgreSQL Developer & Committer Email: andres@anarazel.de Email: andres.freund@enterprisedb.com Twitter: @AndresFreundTec anarazel.de/talks/2018-09-07-pgopen-jit/jit.pdf
More informationImproving IBM Red Brick Warehouse Query Performance
Improving IBM Red Brick Warehouse Query Performance Aman Sinha, Richard Taylor, Mandar Pimpale, David Wilhite, Cindy Fung European Red Brick Users Group Conference 2003 Milano, Italy September 9 September
More informationORACLE TRAINING CURRICULUM. Relational Databases and Relational Database Management Systems
ORACLE TRAINING CURRICULUM Relational Database Fundamentals Overview of Relational Database Concepts Relational Databases and Relational Database Management Systems Normalization Oracle Introduction to
More informationAsking the Right Questions in Crowd Data Sourcing
MoDaS Mob Data Sourcing Asking the Right Questions in Crowd Data Sourcing Tova Milo Tel Aviv University Outline Introduction to crowd (data) sourcing Databases and crowds Declarative is good How to best
More informationCOLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE)
COLUMN-STORES VS. ROW-STORES: HOW DIFFERENT ARE THEY REALLY? DANIEL J. ABADI (YALE) SAMUEL R. MADDEN (MIT) NABIL HACHEM (AVANTGARDE) PRESENTATION BY PRANAV GOEL Introduction On analytical workloads, Column
More informationKeywords : Skyline Join, Skyline Objects, Non Reductive, Block Nested Loop Join, Cardinality, Dimensionality
Volume 4, Issue 3, March 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Skyline Evaluation
More informationPS2 out today. Lab 2 out today. Lab 1 due today - how was it?
6.830 Lecture 7 9/25/2017 PS2 out today. Lab 2 out today. Lab 1 due today - how was it? Project Teams Due Wednesday Those of you who don't have groups -- send us email, or hand in a sheet with just your
More informationGraph-Based Synopses for Relational Data. Alkis Polyzotis (UC Santa Cruz)
Graph-Based Synopses for Relational Data Alkis Polyzotis (UC Santa Cruz) Data Synopses Data Query Result Data Synopsis Query Approximate Result Problem: exact answer may be too costly to compute Examples:
More informationR & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:
Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationEvaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi
Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,
More informationMulti-threaded Queries. Intra-Query Parallelism in LLVM
Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted
More informationPrimavera Compression Server 5.0 Service Pack 1 Concept and Performance Results
- 1 - Primavera Compression Server 5.0 Service Pack 1 Concept and Performance Results 1. Business Problem The current Project Management application is a fat client. By fat client we mean that most of
More informationColumnstore and B+ tree. Are Hybrid Physical. Designs Important?
Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 B+ tree & Columnstore on same table = Hybrid design 4? C O L C O L B+ tree B+ tree ? C O L C O L B+ tree B+ tree
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More informationSQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez
SQL Queries for Mere Mortals Third Edition A Hands-On Guide to Data Manipulation in SQL John L. Viescas Michael J. Hernandez r A TT TAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco
More informationThe Heuristic (Dark) Side of MIP Solvers. Asja Derviskadic, EPFL Vit Prochazka, NHH Christoph Schaefer, EPFL
The Heuristic (Dark) Side of MIP Solvers Asja Derviskadic, EPFL Vit Prochazka, NHH Christoph Schaefer, EPFL 1 Table of content [Lodi], The Heuristic (Dark) Side of MIP Solvers, Hybrid Metaheuristics, 273-284,
More informationImpala Intro. MingLi xunzhang
Impala Intro MingLi xunzhang Overview MPP SQL Query Engine for Hadoop Environment Designed for great performance BI Connected(ODBC/JDBC, Kerberos, LDAP, ANSI SQL) Hadoop Components HDFS, HBase, Metastore,
More informationCSE 530A. Query Planning. Washington University Fall 2013
CSE 530A Query Planning Washington University Fall 2013 Scanning When finding data in a relation, we've seen two types of scans Table scan Index scan There is a third common way Bitmap scan Bitmap Scans
More informationCS 310: Memory Hierarchy and B-Trees
CS 310: Memory Hierarchy and B-Trees Chris Kauffman Week 14-1 Matrix Sum Given an M by N matrix X, sum its elements M rows, N columns Sum R given X, M, N sum = 0 for i=0 to M-1{ for j=0 to N-1 { sum +=
More informationAn Initial Study of Overheads of Eddies
An Initial Study of Overheads of Eddies Amol Deshpande University of California Berkeley, CA USA amol@cs.berkeley.edu Abstract An eddy [2] is a highly adaptive query processing operator that continuously
More informationColumn-Stores vs. Row-Stores: How Different Are They Really?
Column-Stores vs. Row-Stores: How Different Are They Really? Daniel Abadi, Samuel Madden, Nabil Hachem Presented by Guozhang Wang November 18 th, 2008 Several slides are from Daniel Abadi and Michael Stonebraker
More informationRestricted Use Case Modeling Approach
RUCM TAO YUE tao@simula.no Simula Research Laboratory Restricted Use Case Modeling Approach User Manual April 2010 Preface Use case modeling is commonly applied to document requirements. Restricted Use
More informationDATABASE MANAGEMENT SYSTEMS
www..com Code No: N0321/R07 Set No. 1 1. a) What is a Superkey? With an example, describe the difference between a candidate key and the primary key for a given relation? b) With an example, briefly describe
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II
Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2008 Quiz II There are 14 questions and 11 pages in this quiz booklet. To receive
More informationParser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text.
Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text. Lifecycle of a Query CSE 190D Database System Implementation Arun Kumar Query Query Result
More informationRobustness in Automatic Physical Database Design
Robustness in Automatic Physical Database Design Kareem El Gebaly Ericsson kareem.gebaly@ericsson.com Ashraf Aboulnaga University of Waterloo ashraf@cs.uwaterloo.ca ABSTRACT Automatic physical database
More informationOptimal Sequential Multi-Way Number Partitioning
Optimal Sequential Multi-Way Number Partitioning Richard E. Korf, Ethan L. Schreiber, and Michael D. Moffitt Computer Science Department University of California, Los Angeles Los Angeles, CA 90095 IBM
More informationLa Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes
National Engineering School of Mechanic & Aerotechnics 1, avenue Clément Ader - BP 40109-86961 Futuroscope cedex France La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes
More informationWeb User Session Reconstruction Using Integer Programming
Calhoun: The NPS Institutional Archive DSpace Repository Faculty and Researchers Faculty and Researchers Collection 2008 Web User Session Reconstruction Using Integer Programming Dell, Robert F.; Román,
More information