Database System Concepts

Similar documents
CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

Chapter 14: Query Optimization

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Chapter 12: Query Processing

Chapter 11: Query Optimization

Chapter 13: Query Optimization. Chapter 13: Query Optimization

Query Processing & Optimization

Ch 5 : Query Processing & Optimization

Introduction Alternative ways of evaluating a given query using

Database System Concepts

14.3 Transformation of Relational Expressions

Chapter 13: Query Optimization

Query processing and optimization

Database System Concepts

Query Processing Strategies and Optimization

Chapter 4: SQL. Basic Structure

CMSC424: Database Design. Instructor: Amol Deshpande

Query Optimization. Chapter 13

CSIT5300: Advanced Database Systems

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

SQL. Lecture 4 SQL. Basic Structure. The select Clause. The select Clause (Cont.) The select Clause (Cont.) Basic Structure.

Database System Concepts

Chapter 3: SQL. Chapter 3: SQL

Silberschatz, Korth and Sudarshan See for conditions on re-use

Overview of Query Processing and Optimization

Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Data Definition

Chapter 13: Query Processing

Chapter 3: SQL. Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

DATABASE TECHNOLOGY - 1MB025

Advanced Database Systems

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Lecture 3 SQL. Shuigeng Zhou. September 23, 2008 School of Computer Science Fudan University

CS 582 Database Management Systems II

SQL QUERIES. CS121: Relational Databases Fall 2017 Lecture 5

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Database Systems SQL SL03

DATABASE TECHNOLOGY - 1MB025

DATABASTEKNIK - 1DL116

Database Systems SQL SL03

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Database Tuning and Physical Design: Basics of Query Execution

Module 9: Query Optimization

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Chapter 12: Query Processing

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Relational Query Optimization

CS122 Lecture 4 Winter Term,

Relational Algebra. Relational Algebra. 7/4/2017 Md. Golam Moazzam, Dept. of CSE, JU

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

The SQL database language Parts of the SQL language

Querying Data with Transact SQL

Chapter 12: Query Processing. Chapter 12: Query Processing

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

DATABASE DESIGN I - 1DL300

The Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination

Query Processing. Solutions to Practice Exercises Query:

Relational Model, Relational Algebra, and SQL

Relational Algebra. ICOM 5016 Database Systems. Roadmap. R.A. Operators. Selection. Example: Selection. Relational Algebra. Fundamental Property

Chapter 6: Formal Relational Query Languages

CS122 Lecture 10 Winter Term,

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Relational Query Optimization. Highlights of System R Optimizer

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Chapter 3: Relational Model

Database System Concepts

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Chapter 2: Relational Model

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

CS34800 Information Systems. The Relational Model Prof. Walid Aref 29 August, 2016

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Parser: SQL parse tree

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Chapter 19 Query Optimization

Introduction to Database Systems CSE 444

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

Query Optimization Overview

Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Principles of Data Management. Lecture #12 (Query Optimization I)

CS3DB3/SE4DB3/SE6M03 TUTORIAL

Query Processing and Query Optimization. Prof Monika Shah

ATYPICAL RELATIONAL QUERY OPTIMIZER

15-415/615 Faloutsos 1

DATABASE TECHNOLOGY. Spring An introduction to database systems

INF1383 -Bancos de Dados

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Transcription:

Chapter 14: Optimization Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2007/2008 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth and Sudarshan.

Outline 1 2 of Plans 3

Outline 1 2 3

Alternative ways of evaluating a given query Equivalent expressions Different algorithms for each operation Cost difference between a good and a bad way of evaluating a query can be enormous Need to estimate the cost of operations Statistical information about relations. Examples: number of tuples number of distinct values for an attributes... Statistics estimation for intermediate results to compute cost of complex expressions

Equivalent Relations generated by two equivalent expressions have the same set of attributes and contain the same set of tuples although their tuples/attributes may be ordered differently π customer name σ branch city=brooklyn π customer name σ branch city=brooklyn branch branch account depositor account depositor

Cost-Based Optimization Generation of query-evaluation plans for an expression involves several steps: Generating logically equivalent expressions using equivalence rules Annotating resultant expressions to get alternative query plans Choosing the cheapest plan based on estimated cost The overall process is called cost-based optimization

Outline of Plans 1 2 of Plans 3

of of Plans Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples Note: order of tuples is irrelevant In SQL, inputs and outputs are multisets of tuples Two expressions in the multiset version of the relational algebra are said to be equivalent if on every legal database instance the two expressions generate the same multiset of tuples An equivalence rule says that expressions of two forms are equivalent Can replace expression of first form by second, or vice versa

Equivalence Rules I of Plans 1) Conjunctive selection operations can be deconstructed into a sequence of individual selections σ θ1 θ 2 (E) = σ θ1 (σ θ2 (E)) 2) Selection operations are commutative σ θ1 (σ θ2 (E)) = σ θ2 (σ θ1 (E)) 3) Only the last in a sequence of projection operations is needed, the others can be omitted π L1 (π L2 (...(π Ln (E))...)) = π L1 (E)

Equivalence Rules II of Plans 4) Selections can be combined with Cartesian products and theta joins σ θ (E 1 E 2 ) = E 1 θ E 2 σ θ1 (E 1 θ2 E 2 ) = E 1 θ1 θ 2 E 2 5) Theta-join operations and natural joins are commutative E 1 θ E 2 = E 2 θ E 1

Equivalence Rules III of Plans 6) Join operations are associative (a) Natural join operations are associative (E 1 E 2 ) E 3 = E 1 (E 2 E 3 ) (b) Theta joins are associative in the following manner (E 1 θ1 E 2 ) θ2 θ 3 E 3 = E 1 θ1 θ 3 (E 2 θ2 E 3 ) where θ 2 involves attributes from only E 2 and E 3

Equivalence Rules IV of Plans Example (customer c.c name=d.c name depositor) d.a number=a.a number balance>1000 account = customer c.c name=d.c name balance>1000 (depositor d.a number=a.a number account)

Equivalence Rules V of Plans 7) The selection operation distributes over the theta join operation under the following two conditions: (a) When all the attributes in θ 0 involve only the attributes of one of the expressions (E 1 ) being joined σ θ0 (E 1 θ E 2 ) = (σ θ0 (E 1 )) θ E 2 (b) When θ 1 involves only the attributes of E 1 and θ 2 involves only the attributes of E 2 σ θ1 θ 2 (E 1 θ E 2 ) = (σ θ1 (E 1 )) θ (σ θ2 (E 2 ))

Equivalence Rules VI of Plans Example σ c name=smith balance>1000 (depositor account) = σ c name=smith (depositor) σ balance>1000 (account)

Equivalence Rules VII of Plans 8) The projections operation distributes over the theta join operation as follows: (a) Let L 1 and L 2 be attributes from E 1 and E 2. If the join involves only attributes in L 1 L 2 π L1 L 2 (E 1 θ E 2 ) = π L1 (E 1 ) θ π L2 (E 2 ) (b) Let L 3 be attributes of E 1 that are involved in the join condition, but are not in L 1 L 2, and let L 4 be attributes of E 2 that are involved in the join condition, but are not in L 1 L 2 π L1 L 2 (E 1 θ E 2 ) = π L1 L 2 ((π L1 L 3 (E 1 )) θ (π L2 L 4 (E 2 )))

Equivalence Rules VIII of Plans Example π b name,c name (account a.a number=d.a number depositor) = π b name,c name ((π b name,a number (E 1 )) a.a number=d.a number (π c name,a number (E 2 )))

Equivalence Rules IX of Plans 9) The set operations union and intersection are commutative E 1 E 2 = E 2 E 1 E 1 E 2 = E 2 E 1 10) Set union and intersection are associative (E 1 E 2 ) E 3 = E 1 (E 2 E 3 ) (E 1 E 2 ) E 3 = E 1 (E 2 E 3 )

Equivalence Rules X of Plans 11) The selection operation distributes over, and σ θ (E 1 E 2 ) = σ θ (E 1 ) σ θ (E 2 ) and similarly for and. Also σ θ (E 1 E 2 ) = σ θ (E 1 ) E 2 and similarly for and 12) The projection operation distributes over union π L (E 1 E 2 ) = π L (E 1 ) π L (E 2 ) Other rules also exist for outer joins, aggregations,...

Graphical Examples θ E 1 E 2 (join commutativity) θ E 2 E 1 of Plans E 1 E 2 E 3 (natural join associativity) E 1 E 2 E 3 σ θ E 1 E 2 (selection distributivity over join, rule 7(a)) σ θ E 1 E 2

Example of Plans : Find the names of all customers who have an account at some branch located in Brooklyn π customer name (σ branch city=brooklyn (branch using rule 7(a) (account depositor))) π customer name ((σ branch city=brooklyn (branch)) (account depositor)) Performing the selection as early as possible reduces the size of the relation to be joined

Multiple s : Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000 of Plans π customer name ((σ branch city=brooklyn balance>1000 (branch (account depositor))) using join associatively (rule 6(a)): π customer name ((σ branch city=brooklyn balance>1000 (branch account) depositor)) Second form provides an opportunity to apply the perform selections early rule, resulting in the expression π customer name ((σ branch city=brooklyn (branch) σ balance>1000 (account)) depositor)

Multiple s (cont.) π customer name π customer name of Plans σ branch city = Brooklyn balance > 1000 depositor branch σ branch city=brooklyn σ balance>1000 account depositor branch account

Projection Operation Example Consider the query of Plans π customer name ((σ branch city=brooklyn (branch) account) depositor) When we compute σ branch city=brooklyn (branch) account, we get a relation whose schema is branch name, branch city, assets, account number, balance) Push projections using equivalence rules 8(a) and 8(b) eliminate unneeded attributes from intermediate results π customer name (π account number (σ branch city=brooklyn (branch) account) depositor)

Join Ordering Example Consider the expression of Plans π customer name ((σ branch city=brooklyn (branch)) (account depositor)) Could compute (account depositor) first, and join result with σ branch city=brooklyn (branch) However, (account depositor) is likely to be a large relation...... and only a small fraction of the bank s customers are likely to have accounts in branches located in Brooklyn Thus, it is better to compute first σ branch city=brooklyn (branch) account

Generating Equivalent of Plans optimizers use equivalence rules to systematically generate expressions equivalent to the given expression Conceptually, generate all equivalent expressions by repeatedly executing the following step until no more expressions can be found: for each expression found so far, use all applicable equivalence rules add newly generated expressions to the set of expressions found so far The above approach is very expensive in space and time Space requirements reduced by sharing common subexpressions: when E 1 is generated from E 2 by an equivalence rule, usually only the top level of the two are different, subtrees below are the same and can be shared Time requirements are reduced by not generating all expressions

Plan of Plans An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated pipeline σ branch city=brooklyn (index 1) π customer name sort (merge join) (hash join) depositor pipeline σ balance<1000 (linear scan) branch account

Choice of Plans of Plans Must consider the interaction of evaluation techniques when choosing evaluation plans: choosing the cheapest algorithm for each operation independently may not yield best overall algorithm merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation nested-loop join may provide opportunity for pipelining Practical query optimizers incorporate elements of the following two broad approaches: 1 Search all the plans and choose the best plan in a cost-based fashion 2 Uses heuristics to choose a plan

Heuristic Optimization of Plans Cost-based optimization is expensive, even with dynamic programming s may use heuristics to reduce the number of choices that must be made in a cost-based fashion Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance: Perform selection early (reduces the number of tuples) Perform projection early (reduces the number of attributes) Perform most restrictive selection and join operations before other similar operations Some systems use only heuristics, others combine heuristics with partial cost-based optimization

Typical Heuristic Optimization of Plans 1 Deconstruct conjunctive selections into a sequence of single selection operations (rule 1) 2 Move selection operations down the query tree for the earliest possible execution (rules 2, 7(a), 7(b), and 11) 3 Execute first those selection and join operations that will produce the smallest relations (rule 6) 4 Replace Cartesian product operations that are followed by a selection condition by join operations (rule 4(a)) 5 Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed (rules 3, 8(a), 8(b), and 12) 6 Identify those subtrees whose operations can be pipelined, and execute them using pipelining

Outline 1 2 3

Optimizing SQL treats nested subqueries in the where clause as functions that take parameters and return value(s) Parameters are variables from outer level query that are used in the nested subquery; such variables are called correlation variables select customer name from borrower b where exists (select * from depositor d where d.customer name = b.customer name) Conceptually, nested subquery is executed once for each tuple in the cross-product generated by the outer level from clause Such evaluation is called correlated evaluation Note: other conditions in the where clause may be used to compute a join before executing the nested subquery

Optimizing (cont.) Correlated evaluation may be quite inefficient since a large number of calls may be made to the nested query there may be unnecessary random I/O as a result SQL optimizers attempt to transform nested subqueries to joins where possible E.g.: earlier nested query can be rewritten as select customer name from borrower b, depositor d where d.customer name = b.customer name Note: above query doesn t correctly deal with duplicates In general, it is not possible/straightforward to move the entire nested subquery into the outer level query from clause A temporary relation is created instead, and used in body of outer level query

Creating Temporary Tables In general, SQL queries of the form below can be rewritten as shown: select... from L 1 where P 1 and exists (select * from L 2 where P 2) create table t 1 as select distinct V from L 2 where P 1 2 select... from L 1, t 1 where P 1 and P 2 2 P2 1 contains predicates in P 2 that do not involve any correlation variables P2 2 reintroduces predicates involving correlation variables, with relations renamed appropriately V contains all attributes used in predicates with correlation variables

Decorrelation In our example select customer name from borrower b where exists (select * from depositor d where d.customer name = b.customer name) translates to create table t 1 as select distinct customer name from depositor select customer name from borrower, t1 where t1.customer name = borrower.customer name

Decorrelation (cont.) The process of replacing a nested query by a query with a join (possibly with a temporary relation) is called decorrelation Decorrelation is more complicated when the nested subquery uses aggregation when the result of the nested subquery is used to test for equality when the condition linking the nested subquery to the other query is not exists and many other... Therefore avoid complex nested subqueries

End of Chapter 14