Optimization of logical query plans Eliminating redundant joins

Similar documents
Database Systems Architecture. Stijn Vansummeren

Lecture 1: Conjunctive Queries

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Algebraic laws extensions to relational algebra

Information Systems for Engineers. Exercise 10. ETH Zurich, Fall Semester Hand-out Due

Foundations of Databases

Query Processing and Optimization

Background material for Chapter 3 taken from CS 245

Relational Algebra. Spring 2012 Instructor: Hassan Khosravi

Database Modifications and Transactions

Relational Model, Relational Algebra, and SQL

SQL queries II. Set operations and joins

Optimization of Logical Queries

Exam II Computer Programming 420 Dr. St. John Lehman College City University of New York 20 November 2001

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Data integration lecture 2

Relational Algebra and SQL. Basic Operations Algebra of Bags

Relational Algebra. Algebra of Bags

Joins, NULL, and Aggregation

The Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination

Programming and Database Fundamentals for Data Scientists

Midterm 1 157A Fall /22

The query processor turns user queries and data modification commands into a query plan - a sequence of operations (or algorithm) on the database

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

AC61/AT61 DATABASE MANAGEMENT SYSTEMS DEC 2013

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Schema Mappings and Data Exchange

Relational Algebra BASIC OPERATIONS DATABASE SYSTEMS AND CONCEPTS, CSCI 3030U, UOIT, COURSE INSTRUCTOR: JAREK SZLICHTA

Query Evaluation and Optimization

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Notes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes

Optimizing logical query plans

Relational Algebra and SQL

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Transactions and Concurrency Control

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 16th January 2014 Time: 09:45-11:45. Please answer BOTH Questions

CMPS 277 Principles of Database Systems. Lecture #4

Chapter 2 The relational Model of data. Relational algebra

Data Definition Language (DDL), Views and Indexes Instructor: Shel Finkelstein

COMPUTER SCIENCE TRIPOS

Query Processing & Optimization. CS 377: Database Systems

CS 377 Database Systems

RELATIONAL DATA MODEL: Relational Algebra

Chapter 2: Intro to Relational Model

L22: The Relational Model (continued) CS3200 Database design (sp18 s2) 4/5/2018

Optimization Overview

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Relational Query Languages: Relational Algebra. Juliana Freire

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Relational Databases

Chapter 6 The Relational Algebra and Calculus

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

Inverting Schema Mappings: Bridging the Gap between Theory and Practice

Automatic Reasoning (Section 8.3)

Parser: SQL parse tree

Logical Aspects of Massively Parallel and Distributed Systems

Background material for Chapter 3 taken from CS 245

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Chapter 3: Relational Model

Query Processing SL03

CMPS 277 Principles of Database Systems. Lecture #3

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

CS34800 Information Systems. The Relational Model Prof. Walid Aref 29 August, 2016

Databases Relational algebra Lectures for mathematics students

Database Systems. Basics of the Relational Data Model

Foundations of Databases

The Complexity of Relational Queries: A Personal Perspective

Advanced Query Processing

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Database Technology Introduction. Heiko Paulheim

Ian Kenny. November 28, 2017

Relational Model and Relational Algebra

Incomplete Information: Null Values

Chapter 6 5/2/2008. Chapter Outline. Database State for COMPANY. The Relational Algebra and Calculus

Principles of Database Management Systems

EECS-3421a: Test #2 Queries

The Inverse of a Schema Mapping

Three Query Language Formalisms

Chapter 6 Part I The Relational Algebra and Calculus

Database Systems CSE 303. Lecture 02

Database Systems CSE 303. Lecture 02

Three Query Language Formalisms

Recursive Functions. Recursive functions are built up from basic functions by some operations.

DATABASE THEORY. Lecture 11: Introduction to Datalog. TU Dresden, 12th June Markus Krötzsch Knowledge-Based Systems

Chapter 5 Relational Algebra. Nguyen Thi Ai Thao

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

Chapter 2: Intro to Relational Model

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

Integrity Constraints (Chapter 7.3) Overview. Bottom-Up. Top-Down. Integrity Constraint. Disjunctive & Negative Knowledge. Proof by Refutation

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

DATABASE SYSTEMS. Database Systems 1 L. Libkin

CSE344 Midterm Exam Winter 2017

Today s Lecture 4/13/ WFFs/ Free and Bound Variables 9.3 Proofs for Pred. Logic (4 new implicational rules)!

Relational Model, Key Constraints

Query Decomposition and Data Localization

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Database Constraints and Design

Chapter 2: The Relational Algebra

Transcription:

Optimization of logical query plans Eliminating redundant joins 66

Optimization of logical query plans Query Compiler Execution Engine SQL Translation Logical query plan "Intermediate code" Logical plan optimization Optimized logical query plan Physical plan selection Physical query plan "Machine code" Result Statistics and Metadata Physical Data Storage 67

Optimization of logical query plans Logical plans = execution trees A logical query plan like π A,B (σ A=5 (R) (S T )) is essentially an execution tree π σ R S T A physical query plan is a logical query plan in which each node is assigned the algorithm that should be used to evaluate the corresponding relational algebra operator. (There are multiple ways to evaluate each algebra operator) 68

Optimization of logical query plans The need to optimize Every internal node (i.e., each occurrence of an operator) in the plan must be executed. Hence, the fewer nodes we have, the faster the execution Definition A relational algebra expression e is optimal if there is no other expression e that is both (1) equivalent to e and (2) shorter than e (i.e., has fewer occurrence of operators than e). The optimization problem: Input: a relational algebra expression e Output: an optimal relational algebra expression e equivalent to e. 69

Optimization of logical query plans The optimization problem is undecidable: It is known that the following problem is undecidable: Input: relational algebra expressions e 1 and e 2 over a single relation R(A, B) Output: e 1 e 2? Proof: Suppose that we can compute e, the optimal expression for (e 1 e 2 ) (e 2 e 1 ). Note that e 1 e 2 if, and only if, e is either σ false (R) or R R. Conclusion: the optimization problem is undecidable Question: can we optimize plans that are of a particular form? 70

In practice, most SQL queries are of the following form: SELECT... FROM R1, R2,..., Rm WHERE A1 = B1 AND A2 = B2 AND... AND An = Bn The corresponding logical query plans are of the form: π... σ A1 =B 1 A 2 =B 2 A n =B n (R 1 R 2 R m ). We call such relational algebra expressions select-project-join expressions (SPJ expressions for short) 71

Removing redundant joins: A careless SQL programmer writes the following query: SELECT movietitle FROM StarsIn S1 WHERE starname IN (SELECT name FROM MovieStar, StarsIn S2 WHERE birthdate = 1960 AND S2.movieTitle = S1.movieTitle) This query is equivalent to the following one, which has one join less to execute! SELECT movietitle FROM StarsIn WHERE starname IN (SELECT name FROM MovieStar WHERE birthdate = 1960) 72

Here are the corresponding logical query plans: versus π S1.movieTitle(ρ S2 (StarsIn) π S1.movieTitle(ρ S1 (StarsIn) ρ S S 2.movieTitle=S 1.movieTitle 1 (StarsIn) σ birthdate=1960(moviestar)) S 1.starName=name S 1.starName=name σ birthdate=1960(moviestar)). Redundant joins may also be introduced because of view expansion Can we optimize select-project-join expressions? (And hence remove redundant joins?) 73

Definition A conjunctive query is an expression of the form Q (x 1,..., x n ) }{{} head R(t 1,..., t m ),..., S(t 1,..., t k) }{{} body Here t 1,..., t k denote variables and constants, and x 1,..., x n must be variables that occur in t 1,..., t k. We call an expression like R(t 1,..., t m ) an atom. If an atom does not contain any variables, and hence consists solely of contstants, then it is called a fact. 74

Semantics of conjunctive queries Consider the following toy database D: R 1 2 2 3 2 5 6 7 7 5 5 5 as well as the following conjunctive query over the relations R(A, B) and S(C): Q(x, y) R(x, y), R(y, 5), S(y). Intuitively, Q wants to retrieve all pairs of values (x, y) such that (1) this pair occurs in relation R; (2) y occurs together with the constant 5 in a tuple in R; and (3) y occurs as a value in S. The formal definition is as follows. S 2 7 75

Semantics of conjunctive queries Consider the following toy database D: R 1 2 2 3 2 5 6 7 7 5 5 5 as well as the following conjunctive query over the relations R(A, B) and S(C): Q(x, y) R(x, y), R(y, 5), S(y). A substitution f of Q into D is a function that maps variables in Q to constants in D. For example: f : x 1 y 2 S 2 7 76

Semantics of conjunctive queries Consider the following toy database D: R 1 2 2 3 2 5 6 7 7 5 5 5 as well as the following conjunctive query over the relations R(A, B) and S(C): Q(x, y) R(x, y), R(y, 5), S(y). S 2 7 A matching is a substitution that maps the body of Q into facts in D. example: f : x 1 y 2 For 77

Semantics of conjunctive queries Consider the following toy database D: R 1 2 2 3 2 5 6 7 7 5 5 5 as well as the following conjunctive query over the relations R(A, B) and S(C): Q(x, y) R(x, y), R(y, 5), S(y). The result of a conjunctive query is obtained by applying all possible matchings to the head of the query. In our example: S 2 7 Q(D) = {(1, 2), (6, 7)}. 78

Translation of SPJ expressions into conjunctive queries SPJ: π title (Movie title=movietitle StarsIn starname=name σ birthdate=1960 (MovieStar)) CQ: Q 1 (t) Movie(t, y, l, i, s, p), StarsIn(t, y 2, n), MovieStar(n, a, g, 1960) Translation of conjunctive queries into SPJ expressions CQ: Q(x, y) R(x, y), R(y, 5), S(y) SPJ: π R1.A,R 1.B σ R1.B=R 2.A σ R2.B=5 σ R1.B=C(ρ R1 (R) ρ R2 (R) S) Conclusion: Select-project-join expressions and conjunctive queries are two separate syntaxes for the same class of queries. 79

In-class exercise Consider the following SQL expression over the relations R(A, B) and S(B, C): SELECT R1.A, S1.B FROM R R1, R R2, R R3, S S1, S S2 WHERE R1.A = R2.A AND R2.B =4 AND R2.A = R3.A AND R3.B = S1.B AND S1.C = S2.C AND S2.B=4 Exercise: Translate this SQL expression into a logical query plan. If this plan is an SPJ-expression, then translate this plan into a conjunctive query. 80

Another in-class exercise Consider the following conjunctive query over the relations MovieStar(name: string, address: string, gender: char, birthdate: date) StarsIn(movieTitle: string, movieyear: string, starname: string) Q(t) MovieStar(n, a, g, 1940), StarsIn(t, y, n) Translate this conjunctive query into an SPJ expression. What is the corresponding SQL query? 81

Containment of conjunctive queries Q 1 is contained in Q 2 if Q 1 (D) Q 2 (D), for every database D. Example: A(x, y) R(x, w), G(w, z), R(z, y) B(x, y) R(x, w), G(w, w), R(w, y) Then B is contained in A. Proof: 1. Let D be an arbitrary database, and let t B(D). 2. Then there exists a matching f of B into D such that t = (f(x), f(y)). We need to show that (f(x), f(y)) A(D). 3. Let h be the following substitution: x f(x) y f(y) w f(w) z f(w) 4. Then h is a matching of A into D, and hence t = (f(x), f(y)) = (h(x), h(y)) A(D) 82

Containment of conjunctive queries is decidable A(x, y) R(x, w), G(w, z), R(z, y) B(x, y) R(x, w), G(w, w), R(w, y) Golden method to check whether B A: 1. First calculate the canonical database D for B: R x w w y G w w 2. Then check whether (x, y) A(D). If so, B A, otherwise B A. 83

Containment of conjunctive queries is decidable A(x, y) R(x, w), G(w, z), R(z, y) B(x, y) R(x, w), G(w, w), R(w, y) Fact: B A (x, y) A(D) with D the canonical database for B. First possibility: (x, y) A(D) In this case we have just constructed a counter-example because (x, y) B(D). R x w w y G w w 84

Containment of conjunctive queries is decidable A(x, y) R(x, w), G(w, z), R(z, y) B(x, y) R(x, w), G(w, w), R(w, y) Fact: B A (x, y) A(D) with D the canonical database for B. Second possibility: (x, y) A(D) There hence exists a matching h of A into D such that h(x) = x and h(y) = y Let D be an arbitrary other database, and pick t B(D ). There hence exists a matching f such that t = (f(x), f(y)). Then f h is a matching of A on D : R x w z y A G w z h R x w w y D = B G w w f D R............ G............ 85

Containment of conjunctive queries is decidable A(x, y) R(x, w), G(w, z), R(z, y) B(x, y) R(x, w), G(w, w), R(w, y) Fact: B A (x, y) A(D) with D the canonical database for B. Second possibility: (x, y) A(D) There hence exists a matching h of A into D such that h(x) = x and h(y) = y Let D be an arbitrary other database, and pick t B(D ). There hence exists a matching f such that t = (f(x), f(y)). Then f h is a matching of A on D : And hence t = (f(x), f(y)) = (f(h(x)), f(h(y)) A(D ) 86

Conclusion: Containment of conjunctive queries is decidable Consequently the equivalence of conjunctive queries is also decidable 87

Optimizing conjunctive queries Input: A conjunctive query Q Output: A conjunctive query Q equivalent to Q that is optimal (i.e., has the least number of atoms in its body). For each conjunctive query we can obtain an equivalent, optimal query, by removing atoms from its body Let Q be a CQ and let P be an arbitrary optimal and equivalent query. Then Q P and hence (x, y) P (D Q ) with D Q the canonical database for Q. Let f be the matching that ensures this fact. Let Q be obtained by removing from Q all atoms that are not in the range of f Then Q Q Moreover, also Q P (because (x, y) P (D Q ) still holds) and P Q. Hence Q Q. Note that Q contains at most the same number of atoms as P. Hence Q is optimal. 88

Optimization of conjunctive queries Input: A conjunctive query Q Output: A conjunctive query Q equivalent to Q that is optimal (i.e., has the least number of atoms in its body). Optimization algorithm A conjunctive query is given. Consider for example: Q(x) R(x, x), R(x, y) We check, atom by atom, what atoms in its body are redundant. In our example we first try to delete R(x, x): Q 1 (x) R(x, y) Note that Q Q 1 but Q 1 Q. We hence cannot remove this atom. 89

Optimization of conjunctive queries Input: A conjunctive query Q Output: A conjunctive query Q equivalent to Q that is optimal (i.e., has the least number of atoms in its body). Optimization algorithm A conjunctive query is given. Consider for example: Q(x) R(x, x), R(x, y) We check, atom by atom, what atoms in its body are redundant. We next try to remove R(x, y): Q 2 (x) R(x, x) Note that Q Q 2 and Q 2 Q. Q 2 is certainly shorter than Q and hence closer to the optimal query. Since there remain no other atoms to test, our result is Q 2. 90

Optimization of select-project-join expressions 1. Translate the select-project-join expression e into an conjunctive query Q. 2. Optimize Q. 3. Translate Q back into a select-project-join expression. 91

Optimization of logical query plans Optimization of complete, arbitrary query plans 1. Detect and optimize the select-project-join sub-expressions in the plan 2. Then use heuristics to further optimize the modified plan. Heuristic: rewriting through algebraic laws Consider relations R(A, B) and S(C, D) and the following expression that we want to optimize π A σ A=5 B<D (R S) Pushing selections: π A σ B<D (σ A=5 (R) S) Recognizing joins: π A (σ A=5 (R) S) B<D Introduce projections where possible: π A (σ A=5 (R) B<D π D (S)) 92

Optimization of logical query plans An in-class integrated exercise Recall the relational schema. MovieStar(name: string, address: string, gender: char, birthdate: date) StarsIn(movieTitle: string, movieyear: string, starname: string) A careless SQL programmer writes the following query: SELECT movietitle FROM StarsIn S1 WHERE starname IN (SELECT name FROM MovieStar, StarsIn S2 WHERE birthdate = 1960 AND S2.movieTitle = S1.movieTitle) Translate this query into a logical query plan, remove redundant joins from it, and then use heuristics to further optimize the obtained logical plan. 93