Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto


Lecture 02.03. Query evaluation: combining operators. Logical query optimization. By Marina Barsky, Winter 2016, University of Toronto

Quick recap: Relational Algebra Operators
Core operators: Selection σ, Projection π, Cartesian product ×, Union ∪, Difference −, Renaming ρ
Derived operators: Join ⋈, Intersection ∩

Core RA operators

Slice operations: Projection
Produces from relation R a new relation that has only the A1, ..., An columns of R.
S = π_{attribute list}(R)

Slice operations: Selection
Produces a new relation with those tuples of R which satisfy condition C.
S = σ_{condition}(R)

Join operation: Cartesian product
1. The result is the set of tuples rs formed by choosing the first part (r) to be any tuple of R and the second part (s) to be any tuple of S.
2. The schema of the resulting relation is the union of the schemas of R and S.
3. If R and S happen to have some attributes in common, prefix those attributes with the relation name.
T = R × S

Union
T = R ∪ S

Difference
T = R − S

Renaming Operator
ρ_{S(A1, A2, ..., An)}(R)
1. The resulting relation has exactly the same tuples as R, but the name of the relation is S.
2. Moreover, the attributes of the result relation S are named A1, A2, ..., An, in order from the left.

Query with renaming: example
Find all true friends in a Twitter dataset T(uid1, uid2):

uid1  uid2
A     B
B     A
B     C
A     C
C     B

SELECT R.uid1, R.uid2
FROM T AS R, T AS S
WHERE R.uid1 = S.uid2 AND R.uid2 = S.uid1

By renaming T we create two identical relations R and S, and we extract all tuples where for each pair (X, Y) in R there is a pair (Y, X) in S:
π_{R.uid1, R.uid2}(σ_{R.uid1 = S.uid2 AND R.uid2 = S.uid1}(ρ_R(T) × ρ_S(T)))
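The renaming query can be traced step by step in a few lines of Python. This is only a sketch over the five sample tuples from the slide; the function name true_friends is made up for illustration and is not part of the lecture.

```python
# Sample instance of T(uid1, uid2) taken from the slide.
T = [("A", "B"), ("B", "A"), ("B", "C"), ("A", "C"), ("C", "B")]

def true_friends(t):
    """Evaluate pi(sigma(rho_R(T) x rho_S(T))) naively (hypothetical helper)."""
    # Cartesian product of the two renamed copies of T.
    product = [(r, s) for r in t for s in t]
    # Selection: R.uid1 = S.uid2 AND R.uid2 = S.uid1.
    selected = [(r, s) for r, s in product if r[0] == s[1] and r[1] == s[0]]
    # Projection on R.uid1, R.uid2, with duplicates removed (set semantics).
    return sorted({r for r, _ in selected})

result = true_friends(T)
print(result)
```

The pair (A, C) is dropped because T contains no tuple (C, A).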

Core operators are sufficient to express any query in the relational model
The relational model is due to Edgar "Ted" Codd, a mathematician at IBM, in 1970: "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.
He showed that any query of the relational calculus can be expressed using these core operators: σ, π, ×, ∪, −, ρ
The relational model is precise, implementable, and we can operate on it (query/update, etc.)

Relational algebra: closure
Students(sid, sname, gpa)

SELECT DISTINCT sname, gpa
FROM Students
WHERE gpa > 3.5;

How do we represent this query in RA? Note that every RA operator returns a relation, so we can compose complex queries from known operators:
π_{sname,gpa}(σ_{gpa>3.5}(Students))
σ_{gpa>3.5}(π_{sname,gpa}(Students))
Are these logically equivalent?
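One way to explore the closure property is to model each operator as a function from relations to relations and compose them in both orders. The sketch below assumes set semantics and uses invented sample rows for Students; select and project are illustrative helpers, not a real library API.

```python
# Hypothetical sample instance of Students(sid, sname, gpa).
Students = [
    {"sid": 1, "sname": "Ann", "gpa": 3.9},
    {"sid": 2, "sname": "Bob", "gpa": 3.2},
    {"sid": 3, "sname": "Ann", "gpa": 3.9},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """pi under set semantics: keep the listed attributes, drop duplicates."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

high_gpa = lambda t: t["gpa"] > 3.5

# Both operators return relations, so they compose in either order.
plan1 = project(["sname", "gpa"], select(high_gpa, Students))
plan2 = select(high_gpa, project(["sname", "gpa"], Students))
print(plan1, plan2)
```

On this instance the two plans agree. The selection can run after the projection here only because gpa, the attribute it tests, is kept by the projection.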

RA has limitations!
Cannot compute transitive closure.

Name1   Name2   Relationship
Fred    Mary    Father
Mary    Joe     Cousin
Mary    Bill    Spouse
Nancy   Lou     Sister

Find all direct and indirect relatives of Fred: cannot be expressed in RA!
Need to write a C program, use a graph engine, or use PL/SQL.
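Outside of RA, the transitive closure is a short loop that repeats a join step until no new names appear. The sketch below uses the four tuples from the slide and assumes the relationship is followed only from Name1 to Name2; reachable is a hypothetical helper name.

```python
# (Name1, Name2) pairs from the slide; the Relationship column is not needed here.
relatives = [("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")]

def reachable(start, edges):
    """Collect everyone reachable from start by following edges repeatedly."""
    found = set()
    frontier = {start}
    while frontier:
        # One "join" step: people directly related to anyone in the frontier.
        step = {b for a, b in edges if a in frontier} - found
        found |= step
        frontier = step  # stop once a step adds nobody new
    return found

freds_relatives = reachable("Fred", relatives)
print(sorted(freds_relatives))
```

The number of iterations depends on the data, which is why no fixed RA expression can express this query.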

Derived RA operators

Join operation: Theta-join
1. The result of this operation is constructed as follows:
a) Take the Cartesian product of R and S.
b) Select from the product only those tuples that satisfy the condition C.
2. The schema of the result is the union of the schemas of R and S, with an R or S prefix as necessary.
T = R ⋈_{condition} S
Shortcut for T = σ_{condition}(R × S)

Join operation: Equijoin
1. Equijoins are the subset of theta-joins in which the join condition is an equality.
T = R ⋈_{R.A = S.B} S
Shortcut for T = σ_{R.A = S.B}(R × S)

Natural Join
T = R ⋈ S
A special case of equijoin, used when the attributes we want to join on have the same name in both tables.
Let A1, A2, ..., An be the attributes that appear in both the schema of R and the schema of S. Then a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes A1, A2, ..., An.

Outer join
1. For each tuple in R, include all tuples in S which satisfy the join condition, but also include the tuples of R that have no match in S.
2. In that case, pair the tuples of R with NULL.
Left outer join: T = R ⟕_{condition} S

Outer join: example

Anonymous patient P
age  zip    disease
54   99999  heart
20   44444  flu
33   66666  lung

Anonymous occupation O
age  zip    job
54   99999  lawyer
20   44444  cashier

T = P ⟕ O
age  zip    disease  job
54   99999  heart    lawyer
20   44444  flu      cashier
33   66666  lung     NULL
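The example can be reproduced with a naive nested-loop left outer join. This is a sketch only: it joins on (age, zip), as the natural join in the slide does, and uses Python's None to stand in for NULL; left_outer_join is an illustrative name.

```python
# P(age, zip, disease) and O(age, zip, job) from the slide.
P = [(54, 99999, "heart"), (20, 44444, "flu"), (33, 66666, "lung")]
O = [(54, 99999, "lawyer"), (20, 44444, "cashier")]

def left_outer_join(left, right):
    """Join on (age, zip); unmatched left tuples are padded with None."""
    out = []
    for age, zip_code, disease in left:
        jobs = [job for a, z, job in right if (a, z) == (age, zip_code)]
        if jobs:
            out.extend((age, zip_code, disease, job) for job in jobs)
        else:
            out.append((age, zip_code, disease, None))  # None plays the role of NULL
    return out

T = left_outer_join(P, O)
for row in T:
    print(row)
```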

Quick question
If I have a relation R with 100 records and a relation S with exactly 1 record, how many records will be in the result of R LEFT OUTER JOIN S?
A. At least 100, but could be more
B. Could be any number between 0 and 100 inclusive
C. 0
D. 1
E. 100


Intersection
T = R ∩ S

Why is intersection not a core operation?
R − S contains the tuples that are in R but not in S.
R − (R − S) removes those tuples from R, leaving exactly the tuples that are in both R and S.

R ∩ S is a shortcut for R − (R − S), so it can be derived using core operations.
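The identity is easy to check with Python sets, whose - and & operators are set difference and intersection. The sample values below are arbitrary and only serve as a sanity check, not a proof.

```python
def intersect_via_difference(r, s):
    """Compute R intersect S using only set difference: R - (R - S)."""
    return r - (r - s)

R = {1, 2, 3, 4}
S = {3, 4, 5}

derived = intersect_via_difference(R, S)
print(derived, R & S)
```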

Set vs. bag (multi-set) semantics
Sets: {a,b,c}, {a,d,e,f}, ...
Bags: {a,a,b,c}, {b,b,b,b,b}, ...
Relational algebra has two semantics:
Set semantics = standard relational algebra
Bag semantics = extended relational algebra
Rule of thumb: every paper will assume set semantics; every implementation will assume bag semantics.

Operations on multisets
All RA operations need to be defined carefully on bags:
σ_C(R): preserves the number of occurrences
π_A(R): no duplicate elimination
Cross-product, join: no duplicate elimination
This is important: relational DBMSs work on multisets, not sets!
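The difference between bag and set projection can be seen with collections.Counter, which stores each value with its number of occurrences. The relation R below is invented sample data.

```python
from collections import Counter

# Hypothetical relation R(A, B); the value "a" occurs twice in column A.
R = [("a", 1), ("a", 2), ("b", 3)]

# Bag projection on A: every input tuple yields one output tuple.
bag_projection = Counter(a for a, _b in R)

# Set projection on A: same values, duplicates eliminated in an extra step.
set_projection = set(bag_projection)

print(bag_projection)
print(set_projection)
```

The bag result keeps three tuples in total, while the set result keeps two.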

Quick question
Which implementation would have a smaller cost: bag projection or set projection? Why?
A. Set projection. The number of records in a set is typically smaller than in a bag, and cost is a function of the number of records in the collection.
B. Bag projection. A bag is easier to reason about formally, and therefore allows more aggressive optimization opportunities.
C. Bag projection. Removing duplicates requires an extra step, which can be expensive and is not always required by the application.


Extended operators on bags
Duplicate elimination δ
Sorting τ
Grouping and aggregation γ

RDBMS query evaluation
How does an RDBMS answer your query?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
1. SQL Query: a declarative query (the user declares what results are needed).
2. RA Plan: translate the query to a relational algebra expression.
3. Optimized RA Plan: find a logically equivalent but more efficient RA expression.
4. Execution: execute each operator of the optimized plan!

RDBMS query optimizer: steps
question → SQL query → parse → parse tree → convert → logical query plan → improve → logically improved l.q.p. → estimate sizes (using statistics) → l.q.p. + sizes → consider physical plans → {P1, P2, ...} → estimate costs → pick best → execute → answer
1. Convert the parsed SQL into the corresponding RA expression.
2. Apply known algebraic transformations to produce an improved logical query plan.
3. Transform based on estimated cost; choose one min-cost logical expression.
4. For each step, consider alternative physical implementations; choose the physical plan with min I/Os.
5. Execute.


Example: two SQL queries, the same execution plan
Two ways to request the same results:

Join: R ⋈_{R.A = S.B} S
SELECT * FROM R JOIN S ON R.A = S.B

Cross-product with selection: σ_{R.A = S.B}(R × S)
SELECT * FROM R, S WHERE R.A = S.B

The optimizer does not care about the syntax of the SQL query: it is going to work on the algebraic representation anyway.
Because the algebraic expressions are equivalent, the optimizer will have the same final plan for both queries.
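The two formulations can be compared on a small in-memory SQLite database using Python's standard sqlite3 module. This sketch checks only that both queries return the same rows; the extra columns X and Y and the sample data are assumptions added for illustration, and the plan a particular engine chooses is not inspected here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R(A INTEGER, X TEXT);
    CREATE TABLE S(B INTEGER, Y TEXT);
    INSERT INTO R VALUES (1, 'r1'), (2, 'r2'), (3, 'r3');
    INSERT INTO S VALUES (2, 's2'), (3, 's3'), (3, 's3b'), (4, 's4');
""")

# Explicit join syntax.
join_rows = sorted(conn.execute("SELECT * FROM R JOIN S ON R.A = S.B").fetchall())

# Cross-product with a selection in the WHERE clause.
cross_rows = sorted(conn.execute("SELECT * FROM R, S WHERE R.A = S.B").fetchall())

print(join_rows == cross_rows)
conn.close()
```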

RDBMS query evaluation
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
Relational algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!


Translating general SQL queries (SELECT-FROM-WHERE) into RA
What is the general form of an SFW query in RA? Given a general SFW SQL query:

SELECT A_1, ..., A_n
FROM R_1, ..., R_m
WHERE c_1 AND ... AND c_k;

we can express it in relational algebra as follows:
π_{A1,...,An}(σ_{c1}(... σ_{ck}(R1 × ... × Rm) ...))
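The general form can be read directly as an evaluation recipe: build the product, apply every condition, then project. The sketch below follows that recipe naively with bag semantics (no duplicate elimination, as in SQL without DISTINCT); the function sfw, the qualified attribute names such as "R.A", and the sample relations are all illustrative assumptions.

```python
from itertools import product

def sfw(select_attrs, relations, conditions):
    """Evaluate pi_attrs(sigma_c1(...sigma_ck(R1 x ... x Rm))) naively.

    relations: dict mapping a relation name to a list of dict rows.
    conditions: list of predicates over a merged row with qualified names.
    """
    names = list(relations)
    out = []
    for combo in product(*(relations[n] for n in names)):   # R1 x ... x Rm
        row = {f"{n}.{k}": v for n, t in zip(names, combo) for k, v in t.items()}
        if all(cond(row) for cond in conditions):            # sigma_c1 ... sigma_ck
            out.append(tuple(row[a] for a in select_attrs))  # pi_A1..An
    return out

R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
S = [{"B": 10, "C": "x"}, {"B": 30, "C": "y"}]

# SELECT R.A, S.C FROM R, S WHERE R.B = S.B
rows = sfw(["R.A", "S.C"], {"R": R, "S": S}, [lambda t: t["R.B"] == t["S.B"]])
print(rows)
```

A real optimizer would never materialize the full product; this form is only the starting point for the algebraic rewrites.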

We can visualize the RA expression as a tree
π_B(R(A,B) ⋈ S(B,C))
[Tree: π_B at the root, ⋈ below it, with leaves R(A,B) and S(B,C)]
Bottom-up tree traversal = order of operation execution!

From RA to SQL
[Tree: π_B at the root, ⋈ below it, with leaves R(A,B) and S(B,C)]
What SQL query does this correspond to?
Are there any logically equivalent RA expressions?

From SQL to RA: example 1
R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B AND S.C = T.C AND R.A < 10;

π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))
[Tree: π_{A,D} at the root, then σ_{A<10}, then ⋈ with children T(C,D) and R(A,B) ⋈ S(B,C)]

From SQL to RA: example 2
S(product, city, price)

SELECT city, count(*)
FROM S
GROUP BY city
HAVING sum(price) > 100

π_{city,c}(σ_{p>100}(γ_{city, sum(price)→p, count(*)→c}(S)))
[Tree: π_{city,c} at the root, then σ_{p>100}, then γ_{city, sum(price)→p, count(*)→c}, then S(product, city, price)]
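The three operators of this plan can be applied one after another in Python. The rows of S below are invented for illustration, and group_by_city is a hypothetical helper that plays the role of γ.

```python
from collections import defaultdict

# Hypothetical instance of S(product, city, price).
S = [
    ("pen", "Toronto", 60),
    ("ink", "Toronto", 50),
    ("cup", "Ottawa", 30),
    ("mug", "Ottawa", 40),
]

def group_by_city(rows):
    """gamma: group by city, computing sum(price) as p and count(*) as c."""
    agg = defaultdict(lambda: [0, 0])
    for _product, city, price in rows:
        agg[city][0] += price
        agg[city][1] += 1
    return [(city, p, c) for city, (p, c) in agg.items()]

grouped = group_by_city(S)                         # gamma
filtered = [g for g in grouped if g[1] > 100]      # sigma p > 100 (the HAVING clause)
result = [(city, c) for city, _p, c in filtered]   # pi city, c
print(result)
```

The HAVING condition becomes an ordinary selection because it is applied after grouping, on the aggregate column p.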

RDBMS query evaluation
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We transform the original RA expression into equivalent expressions using algebraic laws.


RA laws involving selection (σ)
Selection on a single relation
Splitting law: σ_{C AND D}(R) = σ_C(σ_D(R))
Commutative law (order is flexible): σ_C(σ_D(R)) = σ_D(σ_C(R))

RA laws involving selection (σ)
Binary selection (on 2 relations)
σ_C(R × S) = R ⋈_C S
σ_C(R ⋈ S) = σ_C(R) ⋈ S
σ_{C AND D}(R ⋈ S) = σ_C(R) ⋈ σ_D(S)
For the binary operators, we push the selection to R only if all attributes in the condition C are in R (and likewise push D to S only if all attributes in D are in S).

Pushing selections: example
Consider R(A,B) and S(B,C) and the expression below:
σ_{A=1 AND B<C}(R ⋈ S)
1. Splitting AND: σ_{A=1}(σ_{B<C}(R ⋈ S))
2. Push to S: σ_{A=1}(R ⋈ σ_{B<C}(S))
3. Push to R: σ_{A=1}(R) ⋈ σ_{B<C}(S)
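The original expression and the fully pushed-down one can be compared on a small instance. The tuples below are made-up sample data, and natural_join is a naive nested-loop helper written for this sketch; agreement on one instance illustrates the law but does not prove it.

```python
# Hypothetical instances of R(A, B) and S(B, C).
R = [(1, 5), (1, 7), (2, 5)]
S = [(5, 9), (5, 3), (7, 8)]

def natural_join(r, s):
    """Join R(A, B) with S(B, C) on the shared attribute B."""
    return [(a, b, c) for a, b in r for b2, c in s if b == b2]

# Original plan: join first, then apply A = 1 AND B < C.
original = [t for t in natural_join(R, S) if t[0] == 1 and t[1] < t[2]]

# Rewritten plan: push A = 1 down to R and B < C down to S, then join.
r_filtered = [t for t in R if t[0] == 1]
s_filtered = [t for t in S if t[0] < t[1]]
pushed = natural_join(r_filtered, s_filtered)

print(original, pushed)
print(len(natural_join(R, S)), "joined tuples before filtering vs", len(pushed))
```

The rewritten plan joins 2 tuples with 2 tuples instead of 3 with 3, which is the point of pushing selections down.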

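Laws for (bag) projection
A simple law: project out attributes that are not needed later, i.e. keep only the output attributes and any join attributes.
π_L(R ⋈ S) = π_L(π_M(R) ⋈ π_N(S))
π_L(R ⋈_C S) = π_L(π_M(R) ⋈_C π_N(S))
π_L(R × S) = π_L(π_M(R) × π_N(S))
π_L(σ_C(R)) = π_L(σ_C(π_M(R)))
Here M and N are the attributes of R and S, respectively, that appear in L, in the join, or in the condition C.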

Pushing projection: example
Schema R(a,b,c), S(c,d,e)
π_{a+e→x}(R ⋈ S) = π_{a+e→x}(π_{a,c}(R) ⋈ π_{c,e}(S))
π_{a+b→x, d+e→y}(R ⋈ S) = π_{x,y}(π_{a+b→x, c}(R) ⋈ π_{d+e→y, c}(S))

Why push projections?
[Figure: two equivalent plan trees for π_B(R(A,B) ⋈ S(B,C)); in the second plan a projection on B is also applied below the join]
Why might we prefer this plan?

Commutative and associative laws for joins
R ⋈ S = S ⋈ R
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
The above laws are applicable to both sets and bags.

Quick question
Given relation R(A,B):
Here, projection and selection commute:
σ_{A=5}(π_A(R)) = π_A(σ_{A=5}(R))
What about here?
π_B(σ_{A=5}(R)) = σ_{A=5}(π_B(R)) ?
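Both cases can be tried out on a small instance with rows modeled as dictionaries. The data is invented, the projection follows bag semantics, and the helper names are illustrative; a missing attribute shows up here as a Python KeyError.

```python
# Hypothetical instance of R(A, B).
R = [{"A": 5, "B": 1}, {"A": 6, "B": 1}, {"A": 5, "B": 2}]

def select_a5(rel):
    """sigma A=5: needs attribute A to be present in every tuple."""
    return [t for t in rel if t["A"] == 5]

def project(attrs, rel):
    """Bag projection: keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in rel]

# Case 1: the projection keeps A, so the two orders agree.
case1_left = select_a5(project(["A"], R))
case1_right = project(["A"], select_a5(R))

# Case 2: selecting first works, but projecting on B first throws A away.
case2_left = project(["B"], select_a5(R))
try:
    case2_right = select_a5(project(["B"], R))
except KeyError:
    case2_right = None  # sigma A=5 cannot be evaluated without attribute A

print(case1_left == case1_right, case2_left, case2_right)
```

A selection can move below a projection only when the projection keeps every attribute the condition mentions.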

Logical optimization: example
R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B AND S.C = T.C AND R.A < 10;

Initial plan:
π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))

Push down the selection on A so it occurs earlier:
π_{A,D}(T ⋈ (σ_{A<10}(R) ⋈ S))

Push down the projection so it occurs earlier:
π_{A,D}(T ⋈ π_{A,C}(σ_{A<10}(R) ⋈ S))

We eliminate B earlier! In general, when is an attribute not needed?

Improving logical query plans using algebraic laws: summary
1. Push σ as far down as possible.
2. Split complex conditions in σ in order to push them even further.
3. Push π as far down as possible; introduce new π early (but take care with exceptions).
4. Combine σ with × to produce θ-joins or equijoins.
5. Choose an order for joins (a topic by itself).

Why are so many different plans still selected for the same query?
It depends on the sizes of intermediate outputs.

SELECT extendedprice
FROM lineitem, supplier
WHERE lineitem.sid = supplier.sid
AND extendedprice: varies
AND supplier.accountbalance: varies

The same query, executed with selection cardinalities covering from 0 to 100% of all values, results in completely different plans.
[Figure: plan diagram from the Picasso Database Query Optimizer Visualizer; 89 plans, each in a different color]

Coming next: cost-based transformations