Foundations of Databases

Similar documents
Transactions and Concurrency Control

Foundations of Databases

Optimization of logical query plans Eliminating redundant joins

Lecture 1: Conjunctive Queries

Database Theory VU , SS Codd s Theorem. Reinhard Pichler

Incomplete Information: Null Values

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Informationslogistik Unit 4: The Relational Algebra

Schema Mappings and Data Exchange

Relational Databases

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

CS 377 Database Systems

Chapter 2: Intro to Relational Model

Foundations of Databases

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

CSCC43H: Introduction to Databases. Lecture 3

DATABASE SYSTEMS. Database Systems 1 L. Libkin

Relational Model and Relational Algebra A Short Review Class Notes - CS582-01

Ian Kenny. November 28, 2017

Introduction Alternative ways of evaluating a given query using

Chapter 8: The Relational Algebra and The Relational Calculus

Database Systems. Basics of the Relational Data Model

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Database Systems Architecture. Stijn Vansummeren

Chapter 2: Intro to Relational Model

Relational Model and Relational Algebra

Database Theory VU , SS Introduction: Relational Query Languages. Reinhard Pichler

Principles of Database Management Systems

Relational Model, Relational Algebra, and SQL

DATABASE SYSTEMS. Database Systems 1 L. Libkin

Relational Query Languages: Relational Algebra. Juliana Freire

Element Algebra. 1 Introduction. M. G. Manukyan

Database System Concepts

CMPS 277 Principles of Database Systems. Lecture #4

Relational Query Languages. Relational Algebra. Preliminaries. Formal Relational Query Languages. Relational Algebra: 5 Basic Operations

Chapter 6 The Relational Algebra and Calculus

v Conceptual Design: ER model v Logical Design: ER to relational model v Querying and manipulating data

Chapter 6 Part I The Relational Algebra and Calculus

Relational algebra. Iztok Savnik, FAMNIT. IDB, Algebra

Relational Algebra. Relational Query Languages

Semantic Errors in Database Queries

Relational Algebra and SQL

Computational Logic. Relational Query Languages. Free University of Bozen-Bolzano, Werner Nutt

Uncertainty in Databases. Lecture 2: Essential Database Foundations

Databases-1 Lecture-01. Introduction, Relational Algebra

Static type safety guarantees for the operators of a relational database querying system. Cédric Lavanchy

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Query Processing SL03

Query Processing Strategies and Optimization

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Relational Algebra. Procedural language Six basic operators

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

COMP 244 DATABASE CONCEPTS AND APPLICATIONS

Query Decomposition and Data Localization

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Databases. Relational Model, Algebra and operations. How do we model and manipulate complex data structures inside a computer system? Until

Ontology and Database Systems: Foundations of Database Systems

Complexity Theory VU , SS General Information. Reinhard Pichler

Query Processing & Optimization

Introduction to Data Management. Lecture #11 (Relational Algebra)

Relational Algebra 1

Lecture #8 (Still More Relational Theory...!)

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Detecting Logical Errors in SQL Queries

Relational Algebra Part I. CS 377: Database Systems

Course: Database Management Systems. Lê Thị Bảo Thu

Relational Algebra. Algebra of Bags

Database Technology Introduction. Heiko Paulheim

Database Tuning and Physical Design: Basics of Query Execution

The Relational Model

Relational Query Languages

Query Rewriting Using Views in the Presence of Inclusion Dependencies

Relational Algebra BASIC OPERATIONS DATABASE SYSTEMS AND CONCEPTS, CSCI 3030U, UOIT, COURSE INSTRUCTOR: JAREK SZLICHTA

Finding Equivalent Rewritings in the Presence of Arithmetic Comparisons

Delhi Noida Bhopal Hyderabad Jaipur Lucknow Indore Pune Bhubaneswar Kolkata Patna Web: Ph:

Informationslogistik Unit 5: Data Integrity & Functional Dependency

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

FOL Modeling of Integrity Constraints (Dependencies)

RELATIONAL QUERY LANGUAGES

Databases 1. Daniel POP

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Informatics 1: Data & Analysis

Stored Relvars 18 th April 2013 (30 th March 2001) David Livingstone. Stored Relvars

Chapter 3: Relational Model

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Database Constraints and Design

Chapter 14: Query Optimization

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Databases Lectures 1 and 2

Query Evaluation and Optimization

Informatics 1: Data & Analysis

Schema Design for Uncertain Databases

Relational Algebra. Relational Algebra Overview. Relational Algebra Overview. Unary Relational Operations 8/19/2014. Relational Algebra Overview

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

CMPT 354: Database System I. Lecture 5. Relational Algebra

Logic As a Query Language. Datalog. A Logical Rule. Anatomy of a Rule. sub-goals Are Atoms. Anatomy of a Rule

Transcription:

Foundations of Databases Free University of Bozen Bolzano, 2004 2005 Thomas Eiter Institut für Informationssysteme Arbeitsbereich Wissensbasierte Systeme (184/3) Technische Universität Wien http://www.kr.tuwien.ac.at/staff/eiter (Most of the slides based on material by Leonid Libkin)

Foundations of Databases 1 Query optimization: finding a good way to evaluate a query Queries are declarative, and can be translated into procedural languages in more than one way Hence one has to choose the best (or at least good) procedural query This happens in the context of query processing A query processor turns queries and updates into sequences of operations on the database

Foundations of Databases 2 Query processing and optimization stages Which relational algebra expression, equivalent to a given declarative query, will lead to the most efficient algorithm? For each algebraic operator, what algorithm do we use to compute that operator? How do operations pass data (main memory buffer, disk buffer)? Issues: Finding equivalent relational algebra expressions ( query plans ) Assessing efficiency of their evaluation: We need to know how data is stored, and how it is accessed, etc. Use general guidelines and statistics information

Foundations of Databases 3 Overview of query processing Start with a declarative query: SELECT R.A, S.B, T.E FROM R,S,T WHERE R.C=S.C AND S.D=T.D AND R.A>5 AND S.B<3 AND T.D=T.E Translate into an algebra expression: π R.A,S.B,T.E (σ R.A>5 S.B<3 T.D=T.E (R S T )) Optimization step: rewrite to an equivalent but more efficient expression: π R.A,S.B,T.E (σ A>5 (R) σ B<3 (S) σ D=E (T ))) Why is it more efficient? Because selections are evaluated early, and joined relations are not as large as R, S, T.

Foundations of Databases 4 Overview of query processing cont d Evaluating the optimized expression. Choices to make: order of joins. First query plan (out of two): A,B A>5 B<3 D=E R S T first join S and T, and then join the result with R.

Foundations of Databases 5 Alternative query plan: A,B A>5 B<3 D=E R S T It first joins S, T, and then joins the result with R. Both query plans produce the same result. How to choose one?

Foundations of Databases 6 Optimization by algebraic manipulations Given a relational algebra expression e, find another expression e equivalent to e that is easier (faster) to evaluate. Basic question: Given two relational algebra expressions e 1, e 2, are they equivalent? This is the same as asking if an expression e always produces the empty answer: e 1 = e 2 e 1 e 2 = and e 2 e 1 = Problem: testing e = is undecidable for relational algebra expressions. Good news: We can still list some useful equalities, and It is decidable for very important classes of queries (SPJ queries)

Foundations of Databases 7 Optimization by Algebraic Equivalences Systematic way of query optimization: Apply equivalences and are commutative and associative, hence applicable in any order Cascaded projections might simplified: If the attributes A 1,..., A n are among B 1,..., B m, then Cascaded selections might be merged: π A1,...,A n (π B1,...,B m (E)) = π A1,...,A n (E) σ c1 (σ c2 (E)) = σ c1 c 2 (E) Commuting selection with join. If c only involves attributes from E 1, then etc We will not treat this here. σ c (E 1 E 2 ) = σ c (E 1 ) E 2

Foundations of Databases 8 Optimization of conjunctive queries Reminder: conjunctive queries = SPJR queries = simple SELECT-FROM-WHERE SQL queries (only AND and (in)equality in the WHERE clause) Extremely common, and thus special optimization techniques have been developed Reminder: for two relational algebra expressions e 1, e 2, e 1 = e 2 is undecidable. But for conjunctive queries, even e 1 e 2 is decidable. Main goal of optimizing conjunctive queries: reduce the number of joins.

Foundations of Databases 9 Optimization of conjunctive queries: an example Given a relation R with two attributes A, B Two SQL queries: Q1 SELECT R1.B, R1.A FROM R R1, R R2 WHERE R2.A=R1.B Q2 SELECT R3.A, R1.A FROM R R1, R R2, R R3 WHERE R1.B=R2.B AND R2.B=R3.A Are they equivalent? If they are, we saved one join operation. In relational algebra: Q 1 = π 2,1 (σ 2=3 (R R)) Q 2 = π 5,1 (σ 2=4 4=5 (R R R))

Foundations of Databases 10 Optimization of conjunctive queries cont d Are Q 1 and Q 2 equivalent? If they are, we cannot show it by using equivalences for relational algebra expression. Because: they don t decrease the number of or operators, but Q 1 has 1 join, and Q 2 has 2. But Q 1 and Q 2 are equivalent. How can we show this? But representing queries as databases. This representation is very close to rule-based queries. Q 1 (x, y) : R(y, x), R(x, z) Q 2 (x, y) : R(y, x), R(w, x), R(x, u)

Foundations of Databases 11 Conjunctive queries into tableaux Tableau: representation of a conjunctive query as a database We first consider queries over a single relation Tableaux: Q 1 (x, y) : R(y, x), R(x, z) Q 2 (x, y) : R(y, x), R(w, x), R(x, u) A B A B y x y x x z w x x y answer line x u Variables in the answer line are called distinguished x y answer line

Foundations of Databases 12 Tableau homomorphisms A homomorphism of two tableaux f : T 1 T 2 is a mapping f : {variables of T 1 } {variables of T 2 } {constants} For every distinguished x, f(x) = x For every row x 1,..., x k in T 1, f(x 1 ),..., f(x k ) is a row of T 2 Defn. Query containment: Q Q if Q(I) Q (I) for every database instance I. Homomorphism Theorem: Let Q, Q be two conjunctive queries, and T, T their tableaux. Then Q Q there exists a homomorphism f : T T

Foundations of Databases 13 Applying the Homomorphism Theorem: Q 1 = Q 2 T1 A B y x x z x y T2 A B y x w x x u x y f(x)=x, f(y)=y f(u)=z, f(w)=y Hence Q1 Q2 T1 A B y x x z x y T2 A B y x w x x u x y f(x)=x, f(y)=y f(z)=u Hence Q2 Q1

Foundations of Databases 14 Applying the Homomorphism Theorem: Complexity Given two conjunctive queries, how hard is it to test if Q 1 = Q 2? it is easy to transform them into tableaux, from either SPJ relational algebra queries, or SQL queries, or rule-based queries. However, a polynomial algorithm for deciding equivalence is unlikely to exist: Theorem. Given two tableaux, deciding the existence of a homomorphism between them is NP-complete. In practice, query expressions are small, and thus conjunctive query optimization is nonetheless feasible in polynomial time

Foundations of Databases 15 Minimizing Conjunctive Queries Goal: Given a conjunctive query Q, find an equivalent conjunctive query Q with the minimum number of joins. Assume Q is Q( x) : R 1 ( u 1 ),..., R k ( u k ) Assume that there is an equivalent conjunctive query Q of the form Q ( x) : S 1 ( v 1 ),..., S l ( v l ), l < k. Then Q is equivalent to a query of the form Q ( x) : R i1 ( u i1 ),..., R l ( u il ) In other words, to minimize a conjunctive query, one has to delete some subqueries on the right of :

Foundations of Databases 16 Minimizing conjunctive queries cont d Given a conjunctive query Q, transform it into a tableau T. Let Q be a minimal conjunctive query equivalent to Q. Then its tableau T is a subset of T. Minimization algorithm: T := T ; repeat until no change end. choose a row t in T ; if there is a homomorphism f : T T \ {t} then T := T \ {t} Note: If a homomorphism T T \ {t} exists, then T, T \ {t} define equivalent queries, as a homomorphism from T \ {t} to T exists. (Why?)

Foundations of Databases 17 Minimizing SPJ/conjunctive queries: example R with three attributes A, B, C SPJ query Q = π AB (σ B=4 (R)) π BC (π AB (R) π AC (σ B=4 (R))) Translate into relational calculus: ` z1 R(x, y, z 1 ) y = 4 x 1 ` z2 R(x 1, y, z 2 ) ` y 1 R(x 1, y 1, z) y 1 = 4 Simplify, by substituting the constant, and putting quantifiers forward: x 1, z 1, z 2 (R(x, 4, z 1 ) R(x 1, 4, z 2 ) R(x 1, 4, z) y = 4) Conjunctive query: Q(x, y, z) : R(x, 4, z 1 ), R(x 1, 4, z 2 ), R(x 1, 4, z), y = 4

Foundations of Databases 18 Minimizing SPJ/conjunctive queries cont d Tableau T : A B C x 4 z 1 x 1 4 z 2 x 1 4 z x 4 z Minimization, step 1: Is there a homomorphism from T to A B C x 1 4 z 2 x 1 4 z x 4 z Answer: No. For any homomorphism f, f(x) = x (why?), thus the image of the first row is not in the small tableau.

Foundations of Databases 19 Minimizing SPJ/conjunctive queries cont d A B C Step 2: Is T equivalent to x 4 z 1 x 1 4 z x 4 z Answer: Yes. Homomorphism f : f(z 2 ) = z, all other variables stay the same. The new tableau is not equivalent to A B C x 4 z 1 x 4 z or A B C x 1 4 z x 4 z Because f(x) = x, f(z) = z, and the image of one of the rows is not present.

Foundations of Databases 20 Minimizing SPJ/conjunctive queries cont d A B C Minimal tableau: x 4 z 1 x 1 4 z x 4 z Back to conjunctive query: Q (x, y, z) : R(x, y, z 1 ), R(x 1, y, z), y = 4 An SPJ query: σ B=4 (π AB (R) π BC (R)) Pushing selections: π AB (σ B=4 (R)) π BC (σ B=4 (R))

Foundations of Databases 21 Review of the journey We started with π AB (σ B=4 (R)) π BC (π AB (R) π AC (σ B=4 (R))) Translated into a conjunctive query Built a tableau and minimized it Translated back into conjunctive query and SPJ query Applied algebraic equivalences and obtained π AB (σ B=4 (R)) π BC (σ B=4 (R)) Savings: one join.

Foundations of Databases 22 All minimizations are equivalent Let Q be a conjunctive query, and Q 1, Q 2 two conjunctive queries equivalent to Q Assume that Q 1 and Q 2 are both minimal, and let T 1 and T 2 be their tableaux. Then T 1 and T 2 are isomorphic; that is, T 2 can be obtained from T 1 by renaming of variables. That is, all minimizations are equivalent. In particular, in the minimization algorithm, the order in which rows are considered, is irrelevant.

Foundations of Databases 23 Equivalence of conjunctive queries: multiple relations So far we assumed that there is only one relation R, but what if there are many? Construct tableaux as before: Q(x, y): B(x, y), R(y, z), R(y, w), R(w, y) Tableau: A B B: A x B y R: y y z w w y x y Formally, a tableau is just a database where variables can appear in tuples, plus a set of distinguished variables.

Foundations of Databases 24 Tableaux and multiple relations Given two tableaux T 1 and T 2 over the same set of relations, and the same distinguished variables, a homomorphism f : T 1 T 2 is a mapping f : {variables of T 1 } {variables of T 2 } such that - f(x) = x for every distinguished variable, and - for each row t in R in T 1, f( t) is in R in T 2. Homomorphism theorem: let Q 1 and Q 2 be conjunctive queries, and T 1, T 2 their tableaux. Then Q 2 Q 1 there exists a homomorphism f : T 1 T 2

Foundations of Databases 25 Minimization with multiple relations The algorithm is the same as before, but one has to try rows in different relations. Consider homomorphism f(z) = w, and f is the identity for other variables. Applying this to the tableau for Q yields B: A x B y R: A y w B w y x y This can t be further reduced, as for any homomorphism f, f(x) = x, f(y) = y. Thus Q is equivalent to Q (x, y) : B(x, y), R(y, w), R(w, y) One join is eliminated.

Foundations of Databases 26 Conjunctive Queries with Equalities and Inequalities Equality / Inequality atoms x = y, x = a, x z, etc Let T 1, T 2 be the tableaux of the parts of conjunctive queries Q 1 and Q 2 with ordinary relations Q 2 Q 1 if there exists a homomorphism f : T 1 T 2 such that for each (in)equality atom t 1 θt 2 in Q 1, f(t 1 )θf(t 2 ) is logically implied by the equality and inequality atoms in Q 2 The converse does not hold in general It holds under certain conditions, though Note: Deciding whether a set of equality / inequality atoms A logically implies an equality / inequality atom is easy.

Foundations of Databases 27 Query optimization and integrity constraints Additional equivalences can be inferred if integrity constraints are known Example: Let R have attributes A, B, C. Assume that R satisfies A B. Then it holds that R = π AB (R) π AC (R) Tableaux can help with these optimizations! π AB (R) π AC (R) as a conjunctive query: Q(x, y, z): R(x, y, z 1 ), R(x, y 1, z)

Foundations of Databases 28 Query optimization and integrity constraints cont d A B C Tableau: x y z 1 x y 1 z x y z Using the FD A B infer y = y 1 A B C A B C Next, minimize the resulting tableau: x y z 1 x y z x y z x y z x y z And this says that the query is equivalent to Q (x, y, z): R(x, y, z), that is, R.

Foundations of Databases 29 Query optimization and integrity constraints cont d General idea: simplify the tableau using functional dependencies and then minimize. Given: a conjunctive query Q, and a set of FDs F Algorithm: Step 1. Compute the tableau T for Q. Step 2. Apply algorithm CHASE(T, F ). Step 3. Minimize output of CHASE(T, F ). Step 4. Construct a query from the tableau produced in Step 3.

Foundations of Databases 30 The CHASE Assume that all FDs are of the form X A, where A is an attribute. for each X A in F do for each t 1, t 2 in T such that t 1.X = t 2.X and t 1.A t 2.A do case t 1.A, t 2.A of both nondistinguished replace one by the other one distinguished, one nondistinguished replace nondistinguished by distinguished one constant, one variable replace variable by constant both constants output and STOP end end.

Foundations of Databases 31 Query optimization and integrity constraints: example R is over A, B, C; F contains B A Q = π BC (σ A=4 (R)) π AB (R) Q as a conjunctive query: Q(x, y, z) : R(4, y, z), R(x, y, z 1 ) Tableau: A B C 4 y z x y z 1 x y z CHASE A B C 4 y z 4 y z 1 4 y z minimize A B C 4 y z 4 y z Final result: Q(x, y, z) : R(x, y, z), x = 4, that is, σ A=4 (R).

Foundations of Databases 32 Query optimization and integrity constraints: example Same R and F ; the query is: Q = π BC (σ A=4 (R)) π AB (σ A=5 (R)) As a conjunctive query: Q(x, y, z) : R(4, y, z), R(x, y, z 1 ), x = 5 A B C Tableau: 4 y z 5 y z 1 CHASE 5 y z Final result: This equivalence is not true without the FD B A

Foundations of Databases 33 Query optimization and integrity constraints: example Sometimes simplifications are quite dramatic Same R, FD is A B, the query is Q = π AB (R) π A (σ B=4 (R)) π AB (π AC (R) π BC (R)) Convert into conjunctive query: Q(x, y) : R(x, y, z 1 ), R(x, y 1, z), R(x 1, y, z), R(x, 4, z 2 )

Foundations of Databases 34 Tableau: A B C A B C x y z 1 x y 1 z x 1 y z x 4 z 2 CHASE x 4 z 1 x 4 z x 1 4 z x 4 z 2 minimize A B C x 4 z x 4 x y x 4

Foundations of Databases 35 Query optimization and integrity constraints: example cont d A B C x 4 z is translated into Q(x, y) : R(x, y, z), y = 4 x 4 or, equivalently π AB (σ B=4 (R)). Thus, π AB (R) π A (σ B=4 (R)) π AB (π AC (R) π BC (R)) = π AB (σ B=4 (R)) in the presence of FD A B. Savings: 3 joins! This cannot be derived by algebraic manipulations, nor conjunctive query minimization without using CHASE.

Foundations of Databases 36 Bibliography [1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. [2] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems The Complete Book. Prentice Hall, 2002. [3] D. Maier. The Theory of Relational Databases. Computer Science Press, Rockville, Md., 1983. [4] J. D. Ullman. Principles of Database and Knowledge Base Systems. Computer Science Press, 1989.