Query Processing and Optimization


Query Processing and Optimization (Part-1) Prof Monika Shah

Overview of Query Execution

Phases: Compile, Optimize, Execute.

SQL query
  -> parse -> parse tree
  -> convert -> logical query plan (l.q.p.)
  -> apply laws -> improved l.q.p.
  -> estimate result sizes (using statistics) -> l.q.p. + sizes
  -> consider physical plans -> {P1, P2, ...}
  -> estimate costs -> {(P1,C1), (P2,C2), ...}
  -> pick best -> Pi
  -> execute -> answer

Logical Plans vs. Physical Plans

A physical plan specifies how each operator will execute (which algorithm). E.g., a join can be nested-loop, hash-based, merge-based, or sort-based. Each logical plan maps to multiple physical plans.

Logical plan:
  π_title( StarsIn ⋈_{starname=name} π_name( σ_{birthdate LIKE '%1960'}(MovieStar) ) )

One physical plan:
  Hash join (parameters: join order, memory size, projected attributes, ...)
  over a SEQ scan and an index scan (parameters: select condition, ...) of StarsIn and MovieStar

Example: SQL query

SELECT title
FROM StarsIn
WHERE starname IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);

(Find the movies with stars born in 1960.)

Example: Parse Tree

<Query>
  <SFW>
    SELECT <SelList>
      <Attribute> title
    FROM <FromList>
      <RelName> StarsIn
    WHERE <Condition>
      <Tuple>
        <Attribute> starname
      IN
      <Query>
        ( <Query> )
          <SFW>
            SELECT <SelList>
              <Attribute> name
            FROM <FromList>
              <RelName> MovieStar
            WHERE <Condition>
              <Attribute> birthdate
              LIKE
              <Pattern> '%1960'

Preprocessing: unfold views, semantic check

Input: SQL query

SELECT title
FROM ParamountMovies
WHERE year = 1979;

where the view is defined as

CREATE VIEW ParamountMovies AS
    SELECT title, year
    FROM Movies
    WHERE studioname = 'Paramount';

The preprocessor substitutes the view definition into the query, then simplifies the result.
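The view unfolding described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the three sample rows are made up for illustration, and the "unfolded" query is written by hand to mimic what a preprocessor might produce after substitution and simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (title TEXT, year INTEGER, studioname TEXT);
    -- Illustrative rows only; not taken from the slides.
    INSERT INTO Movies VALUES
        ('Star Trek: The Motion Picture', 1979, 'Paramount'),
        ('Alien', 1979, 'Fox'),
        ('Grease', 1978, 'Paramount');
    CREATE VIEW ParamountMovies AS
        SELECT title, year FROM Movies WHERE studioname = 'Paramount';
""")

# Query written against the view, as the user would submit it.
via_view = conn.execute(
    "SELECT title FROM ParamountMovies WHERE year = 1979"
).fetchall()

# Hand-unfolded form: the view body substituted in, conditions merged.
unfolded = conn.execute(
    "SELECT title FROM Movies "
    "WHERE studioname = 'Paramount' AND year = 1979"
).fetchall()

print(via_view)
print(unfolded)
```

Both queries return the same rows, which is the property the preprocessing step relies on.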

Query Plan Generator: transform the parse tree into relational algebra

Example: Generating Relational Algebra

SELECT title
FROM StarsIn
WHERE starname IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);

(Find the movies with stars born in 1960.)

π_title
  σ (two-argument)
    StarsIn
    <condition>
      <tuple>
        <attribute> starname
      IN
      π_name
        σ_{birthdate LIKE '%1960'}
          MovieStar

Fig. 7.15: An expression using a two-argument σ, midway between a parse tree and relational algebra.

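Example: Logical Query Plan

Before (two-argument selection):

π_title
  σ (two-argument)
    StarsIn
    <condition>
      <tuple>
        <attribute> starname
      IN
      π_name
        σ_{birthdate LIKE '%1960'}
          MovieStar

After (the subquery condition becomes a selection over a product):

π_title
  σ_{starname=name}
    ×
      StarsIn
      π_name
        σ_{birthdate LIKE '%1960'}
          MovieStar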

Rewrite: translate into the best equivalent logical query plan (using algebraic laws), i.e., an optimal sequence of operations

Algebraic Transformation Laws

- Commutativity: A op B = B op A, for op in {×, ⋈, ∪, ∩}; but not for −.
- Associativity: A op (B op C) = (A op B) op C, for op in {×, ⋈, ∪, ∩}; but not for −.
- Distribution law (sets): A ∩_S (B ∪_S C) = (A ∩_S B) ∪_S (A ∩_S C).
  But for bags: A ∩_B (B ∪_B C) ≠ (A ∩_B B) ∪_B (A ∩_B C).
- Selection:
    σ_c(R ∪ S) = σ_c(R) ∪ σ_c(S)
    σ_c(R × S) = σ_c(R) × S, if c is only applicable to R
    σ_c(R − S) = σ_c(R) − S
- Other basic laws: R […] = R; R […] = R; R ∩ S = S, if S ⊆ R.
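The difference between the set and bag versions of the distribution law can be checked with a short sketch. It assumes Python's collections.Counter as a stand-in for bags, where `&` takes the minimum of the counts (bag intersection) and `+` adds the counts (bag union); the sample values are arbitrary.

```python
from collections import Counter

# Sets: intersection distributes over union.
A, B, C = {1, 2}, {2, 3}, {2, 4}
set_lhs = A & (B | C)
set_rhs = (A & B) | (A & C)
print(set_lhs == set_rhs)   # the law holds for sets

# Bags: each bag holds a single copy of the same tuple "x".
bag_a, bag_b, bag_c = Counter("x"), Counter("x"), Counter("x")
bag_lhs = bag_a & (bag_b + bag_c)            # min(1, 1 + 1) copies of "x"
bag_rhs = (bag_a & bag_b) + (bag_a & bag_c)  # min(1, 1) + min(1, 1) copies
print(bag_lhs["x"], bag_rhs["x"])            # the two sides differ for bags
```

The left side keeps one copy while the right side ends up with two, so an optimizer working on bags cannot apply this rewrite.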

Rewrite: translate into the best equivalent logical query plan (approaches)

Approach 1: Cost-based optimization
Approach 2: Heuristic-based optimization

Heuristic Optimization Laws

Goal: reduce the size of intermediate results.

Heuristics reduce the number of choices. They are a set of rules that typically (but not in all cases) improve execution performance:
- Perform selection early (reduces the number of tuples)
- Perform projection early (reduces the number of attributes)
- Perform the most restrictive selection and join operations before other similar operations
- Multi-way join ordering
- Replace Cartesian product with join
- And many others
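A minimal sketch of the "perform selection early" heuristic, using plain Python lists as stand-ins for the StarsIn and MovieStar relations. The rows are invented for illustration, and `endswith("1960")` plays the role of `LIKE '%1960'`.

```python
# (title, starname) and (name, birthdate); hypothetical sample rows.
stars_in = [("Movie A", "Star 1"), ("Movie B", "Star 2"), ("Movie C", "Star 3")]
movie_star = [("Star 1", "03/14/1960"), ("Star 2", "07/01/1955"),
              ("Star 3", "11/30/1960")]

# Late selection: build the full Cartesian product, then filter.
product = [(s, m) for s in stars_in for m in movie_star]
late = sorted(s[0] for s, m in product
              if s[1] == m[0] and m[1].endswith("1960"))

# Early selection: filter MovieStar first, then join on starname = name.
born_1960 = [m for m in movie_star if m[1].endswith("1960")]
joined = [(s, m) for s in stars_in for m in born_1960 if s[1] == m[0]]
early = sorted(s[0] for s, m in joined)

print(late == early)              # both plans compute the same answer
print(len(product), len(joined))  # intermediate result sizes
```

Both plans return the same titles, but the early-selection plan never materializes the nine-row product.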

Example: Improved Logical Query Plan

Before (Fig. 7.18):

π_title
  σ_{starname=name}
    ×
      StarsIn
      π_name
        σ_{birthdate LIKE '%1960'}
          MovieStar

After:

π_title
  ⋈_{starname=name}
    StarsIn
    π_name
      σ_{birthdate LIKE '%1960'}
        MovieStar

Question: Push project to StarsIn?

Fig. 7.20: An improvement on fig. 7.18.

Example: Estimate Result Sizes

Need the expected size of the intermediate results computed from StarsIn and MovieStar.

Cost Estimation for various operators:

Selection
- Equality: T(S) = T(R) / V(R,a), where S = σ_{a=c}(R)
- Inequality: T(S) = T(R) / 3, where S = σ_{a<c}(R)
- AND: T(S) = (1/3) · T(R) / V(R,b), where S = σ_{a<c AND b=2}(R)
- OR: since c1 OR c2 = NOT(NOT c1 AND NOT c2),
  T(S) = T(R) · [1 − (1 − 1/3) · (1 − 1/V(R,b))], where S = σ_{a<c OR b=2}(R)

Join
- T(R ⋈_a S) = 0, where R and S are disjoint on a
- T(R ⋈_a S) = T(S), where a is a key of R
- T(R ⋈_a S) = T(R) · T(S), where a is a non-key and all tuples have the same value of a
- Hence, the average estimate is T(R ⋈_a S) = T(R) · T(S) / max(V(R,a), V(S,a))
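The selection estimates can be sketched as small functions. This is an illustration under stated assumptions: the fractions 1/3 (inequality) and 1/V(R,b) (equality) are treated as independent selectivities, the function names are made up, and the statistics T(R) = 3000, V(R,a) = 50, V(R,b) = 10 are hypothetical.

```python
def eq_size(t_r, v):
    """Estimated size of sigma_{a=c}(R): T(R) / V(R,a)."""
    return t_r / v

def ineq_size(t_r):
    """Estimated size of sigma_{a<c}(R): T(R) / 3."""
    return t_r / 3

def and_size(t_r, v_b):
    """sigma_{a<c AND b=2}(R): multiply the two selectivities."""
    return t_r * (1 / 3) * (1 / v_b)

def or_size(t_r, v_b):
    """sigma_{a<c OR b=2}(R): 1 minus the chance that both conditions fail."""
    s1, s2 = 1 / 3, 1 / v_b
    return t_r * (1 - (1 - s1) * (1 - s2))

# Hypothetical statistics, not from the slides.
T_R, V_A, V_B = 3000, 50, 10
print(eq_size(T_R, V_A))    # equality on a
print(ineq_size(T_R))       # inequality on a
print(and_size(T_R, V_B))   # a<c AND b=2
print(or_size(T_R, V_B))    # a<c OR b=2
```

Note that the OR estimate never exceeds T(R), which a plain sum of the two individual estimates could do.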

Cost Estimation for various operators:

Complex join
- Join on multiple attributes:
  T(R ⋈_{a,b} S) = T(R) · T(S) / ( max(V(R,a), V(S,a)) · max(V(R,b), V(S,b)) )
- Multiple join:
  T(R_1 ⋈_a R_2 ⋈_a ... ⋈_a R_k) = T(R_1) · T(R_2) · ... · T(R_k) / (product of the largest k−1 values of V(R_i,a))

Cost Estimation for various operators:

Union
- T(S) = T(R) + T(V), where S = R ∪_B V (bag union), or S = R ∪_S V with R and V disjoint
- T(S) = max(T(R), T(V)), where S = R ∪_S V and one relation contains the other
- Average estimate for set union: larger + smaller/2

Intersection
- T(S) = ½ · min(T(R), T(V)) (average estimate)

Difference
- T(S) = T(R) − ½ · T(V) (average estimate; the true size lies between T(R) − T(V) and T(R))

Self Review Questions

- Why do we need to compute the estimated result size of each logical operator?
- Is it required to compute the cost of each operator of the logical query plan?
- What is the difference between a logical query plan and a physical query plan?
- What if we generate the physical query plan directly from relational algebra?

Self Review Questions (contd.)

Here are the statistics for four relations W, X, Y, Z:

W(A,B)        X(B,C)        Y(C,D)        Z(D,E)
T(W) = 100    T(X) = 200    T(Y) = 300    T(Z) = 400
V(W,A) = 20   V(X,B) = 50   V(Y,C) = 50   V(Z,D) = 40
V(W,B) = 60   V(X,C) = 100  V(Y,D) = 50   V(Z,E) = 100

Estimate the number of tuples in the results of the following expressions:
1. σ_{A=35}(W)
2. σ_{A=35 ∧ B=5}(W)
3. W ⋈ X
4. X ⋈ Y
5. W ⋈ X ⋈ Y ⋈ Z
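One way to work through these exercises is the sketch below. It assumes the estimation rules T(R)/V(R,a) for an equality selection and T(R)·T(S)/max(V(R,a), V(S,a)) for a join, that the conditions in a conjunction are independent, and that the joins are natural joins on the single shared attribute of each adjacent pair. The helper names `select_eq` and `chain_join` are made up for this illustration.

```python
from math import prod

# Statistics copied from the exercise.
T = {"W": 100, "X": 200, "Y": 300, "Z": 400}
V = {("W", "A"): 20, ("W", "B"): 60, ("X", "B"): 50, ("X", "C"): 100,
     ("Y", "C"): 50, ("Y", "D"): 50, ("Z", "D"): 40, ("Z", "E"): 100}

def select_eq(rel, *attrs):
    """Equality selection on each listed attribute: divide by V(rel, attr)."""
    size = T[rel]
    for attr in attrs:
        size /= V[(rel, attr)]
    return size

def chain_join(rels, attrs):
    """Join rels[i] with rels[i+1] on attrs[i]; divide by the larger V each time."""
    size = prod(T[r] for r in rels)
    for left, right, attr in zip(rels, rels[1:], attrs):
        size /= max(V[(left, attr)], V[(right, attr)])
    return size

print(select_eq("W", "A"))                                  # expression 1
print(select_eq("W", "A", "B"))                             # expression 2
print(chain_join(["W", "X"], ["B"]))                        # expression 3
print(chain_join(["X", "Y"], ["C"]))                        # expression 4
print(chain_join(["W", "X", "Y", "Z"], ["B", "C", "D"]))    # expression 5
```

An estimate below one tuple, as in expression 2, simply means the result is expected to be empty most of the time.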

Complete Cost Evaluation: Schema for Examples

Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

Reserves: each tuple is 40 bytes long, 100 tuples per block, 1000 blocks. Assume there are 100 boats.
Sailors: each tuple is 50 bytes long, 80 tuples per block, 500 blocks. Assume there are 10 different ratings.
Assume we have 5 blocks in our buffer pool.
Assume 1 I/O takes 200 ms.

Motivating Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

Plan:

π_sname (on-the-fly)
  σ_{bid=100 ∧ rating>5} (on-the-fly)
    ⋈_{sid=sid} (block nested join)
      Sailors
      Reserves

Cost: 500 + 500*1000 I/Os.

By no means the worst plan! It misses several opportunities: selections could have been pushed earlier, no use is made of any available indexes, etc.

Goal of optimization: to find more efficient plans that compute the same answer.
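The cost figure above can be reproduced with a small sketch. The function `nested_loops_io` is a hypothetical helper: it charges one scan of the outer relation plus one scan of the inner relation per chunk of outer pages. A chunk of 1 page reproduces 500 + 500*1000. The 3-page chunk is an assumption about how a 5-block buffer pool might be split (one block for the inner input, one for output, three for the outer) and is not stated in the slides.

```python
from math import ceil

def nested_loops_io(outer_pages, inner_pages, outer_chunk_pages=1):
    """I/Os = read the outer once + scan the inner once per outer chunk."""
    return outer_pages + ceil(outer_pages / outer_chunk_pages) * inner_pages

SAILORS_PAGES = 500     # from the schema slide
RESERVES_PAGES = 1000   # from the schema slide
MS_PER_IO = 200         # from the schema slide

page_at_a_time = nested_loops_io(SAILORS_PAGES, RESERVES_PAGES)
chunked = nested_loops_io(SAILORS_PAGES, RESERVES_PAGES, outer_chunk_pages=3)
hours = page_at_a_time * MS_PER_IO / 1000 / 3600

print(page_at_a_time)   # I/Os with one outer page per inner scan
print(chunked)          # I/Os with three outer pages per inner scan (assumed split)
print(round(hours, 1))  # elapsed hours for the page-at-a-time plan
```

Even before pushing selections or using indexes, holding more of the outer relation in memory per inner scan cuts the I/O count to roughly a third.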

Compute the I/O cost of Plan A, Plan B and Plan C. Which one is the most cost-effective?

Plan A:

π_sname (on-the-fly)
  σ_{bid=100} (on-the-fly)
    σ_{rating>5} (on-the-fly)
      ⋈_{sid=sid} (block-oriented nested loops)
        Sailors
        Reserves

Plan B:

π_sname (on-the-fly)
  ⋈_{sid=sid} (block-oriented nested loops)
    σ_{rating>5} (on-the-fly)
      Sailors
    σ_{bid=100} (on-the-fly)
      Reserves

Plan C:

π_sname (on-the-fly)
  σ_{rating>5} (on-the-fly)
    ⋈_{sid=sid} (index nested join)
      σ_{bid=100} (use index)
        Reserves (hash index on bid)
      Sailors (hash index on sno)