CSE 444: Database Internals. Section 4: Query Optimizer

Plan for Today
Problems 1A, 1B: Estimating the cost of a plan
    You try to compute the cost for 5 minutes
    We will go over the solution together
Problem 2: Selinger optimizer
    We will do it together

1. Estimating the Cost of a Given Plan
Student(sid, name, age, address)
Book(bid, title, author)
Checkout(sid, bid, date)
Query:
SELECT S.name
FROM Student S, Book B, Checkout C
WHERE S.sid = C.sid AND B.bid = C.bid
  AND B.author = 'Olden Fames'
  AND S.age >= 13 AND S.age <= 19

Assumptions
Abbreviations: Student = S(sid, name, age, addr), Book = B(bid, title, author), Checkout = C(sid, bid, date)
sid and bid are foreign keys in C referencing S and B, respectively.
There are 10,000 Student records stored on 1,000 pages.
There are 50,000 Book records stored on 5,000 pages.
There are 300,000 Checkout records stored on 15,000 pages.
There are 500 different authors.
Student ages are integers distributed uniformly from 7 to 24.

Physical Query Plan 1A
S(sid, name, age, addr)   B(bid, title, author)   C(sid, bid, date)
T(S) = 10,000   T(B) = 50,000   T(C) = 300,000
B(S) = 1,000    B(B) = 5,000    B(C) = 15,000
V(B, author) = 500   7 <= age <= 24

Plan (root first):
(d) π name
(c) σ 13 <= age <= 19 ∧ author = 'Olden Fames'
(b) ⋈ bid (tuple-based nested loop, B inner)
        (a) ⋈ sid (block nested loop, S outer, C inner)
                Student S (file scan)
                Checkout C (file scan)
        Book B (file scan)

Q. Compute
1. the cost and cardinality in steps (a) to (d)
2. the total cost
Assumption: Data is not sorted on any attribute.

Solution 1A
(a) S ⋈ C on sid (block nested loop, S outer, C inner)
    Cost (I/O) = B(S) + B(S) * B(C) = 1,000 + 1,000 * 15,000 = 15,001,000
    Cardinality = T(S) * T(C) / V(S, sid) = 300,000 (foreign key join)
(b) (S ⋈ C) ⋈ B on bid (tuple-based nested loop, B inner)
    Cost (I/O) = T(S ⋈ C) * B(B) = 300,000 * 5,000 = 15 * 10^8
    Cardinality = T(S ⋈ C) * T(B) / V(B, bid) = 300,000 (foreign key join)
(c, d) Selection and projection
    Cost (I/O) = 0 (on the fly)
    Cardinality = 300,000 * 1/500 * 7/18 = 234 (approx.), assuming uniformity and independence
Total cost = 1,515,001,000
Final cardinality = 234 (approx.)
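
The arithmetic of Solution 1A can be double-checked with a short Python sketch. The variable names are ours, not from the slides, and the sketch assumes V(S, sid) = T(S) and V(B, bid) = T(B) because sid and bid are keys of Student and Book.

```python
import math
from fractions import Fraction

# Statistics from the slides.
T_S, T_B, T_C = 10_000, 50_000, 300_000   # tuples per relation
B_S, B_B, B_C = 1_000, 5_000, 15_000      # pages per relation
V_B_AUTHOR = 500                          # distinct authors
V_S_SID = T_S   # assumption: sid is the key of Student
V_B_BID = T_B   # assumption: bid is the key of Book

# (a) S join C, S outer, C inner: read S once, scan C once per page of S.
cost_a = B_S + B_S * B_C
card_a = T_S * T_C // V_S_SID

# (b) Tuple-based nested loop with B inner: scan B once per outer tuple.
cost_b = card_a * B_B
card_b = card_a * T_B // V_B_BID

# (c, d) Selection and projection run on the fly: no extra I/O.
sel_author = Fraction(1, V_B_AUTHOR)
sel_age = Fraction(19 - 13 + 1, 24 - 7 + 1)   # 7 of the 18 integer ages
final_card = math.ceil(card_b * sel_author * sel_age)  # the slides round 233.3 up

total_cost = cost_a + cost_b
print(f"(a) cost = {cost_a:,}, cardinality = {card_a:,}")
print(f"(b) cost = {cost_b:,}, cardinality = {card_b:,}")
print(f"total cost = {total_cost:,}, final cardinality = {final_card}")
```

Exact fractions are used for the selectivities so that the rounding step is the only place where an approximation enters.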

Physical Query Plan 1B

Plan (root first):
(g) π name
(f) σ 13 <= age <= 19
(e) ⋈ sid (block nested loop, S inner)
        (d) π sid
        (c) ⋈ bid (indexed nested loop, B outer, C inner)
                (b) π bid
                (a) σ author = 'Olden Fames'
                        Book B (index scan)
                Checkout C (index scan)
        Student S (file scan)

Q. Compute
1. the cost and cardinality in steps (a) to (g)
2. the total cost
Assumptions:
    Unclustered B+ tree index on B.author
    Clustered B+ tree index on C.bid
    All index pages are in memory
    Unlimited memory

Solution 1B
Indexes: unclustered B+ tree on B.author, clustered B+ tree on C.bid
(a) Cost (I/O) = T(B) / V(B, author) = 50,000 / 500 = 100 (unclustered index: one I/O per matching tuple)
    Cardinality = 100
(b) Cost = 0, cardinality = 100
(c) i.   One index lookup per outer B tuple
    ii.  1 book has 6 checkouts (uniformity)
    iii. Number of C tuples per page = T(C) / B(C) = 20
    iv.  6 tuples span at most 2 consecutive pages (clustered index), or 1 page if they all fit on the same page
    Cost = 100 * 2 = 200
    Cardinality = 100 * 6 = 600
(d) Cost = 0, cardinality = 600
(e) The outer relation is already in memory; we only need to scan S
    Cost = B(S) = 1,000
    Cardinality = 600
(f) Cost = 0
    Cardinality = 600 * 7/18 = 234 (approx.)
(g) Cost = 0, cardinality = 234
Total cost = 1,300 (compare with 1,515,001,000 in 1A)
Final cardinality = 234 (approx.), the same as in 1A
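
The same kind of check works for Solution 1B. This is a sketch with our own variable names. The worst-case page count in step (c) uses the formula ceil((k - 1) / p) + 1 for k consecutive tuples with p tuples per page, which is our generalization of the "at most 2 consecutive pages" argument on the slide.

```python
import math
from fractions import Fraction

# Statistics from the slides.
T_B, T_C = 50_000, 300_000   # tuples in Book and Checkout
B_S, B_C = 1_000, 15_000     # pages in Student and Checkout
V_B_AUTHOR = 500             # distinct authors

# (a) Unclustered index on author: one page read per matching Book tuple.
card_a = T_B // V_B_AUTHOR
cost_a = card_a

# (c) Index nested loop into the clustered index on C.bid.
checkouts_per_book = T_C // T_B      # 6, by uniformity
tuples_per_page = T_C // B_C         # 20
# Worst case: the run of matching tuples straddles a page boundary.
pages_per_lookup = math.ceil((checkouts_per_book - 1) / tuples_per_page) + 1
cost_c = card_a * pages_per_lookup
card_c = card_a * checkouts_per_book

# (e) Outer side is in memory (unlimited memory), so S is scanned once.
cost_e = B_S
card_e = card_c                      # foreign key join keeps every C tuple

# (f) Age selection on the fly; (b), (d), (g) are free projections.
card_f = math.ceil(card_e * Fraction(7, 18))

total_cost = cost_a + cost_c + cost_e
print(f"(a) {cost_a}, (c) {cost_c}, (e) {cost_e}, total = {total_cost}")
print(f"final cardinality = {card_f}")
```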

2. Selinger Optimization Example
Sailors(sid, sname, srating, age)
Boats(bid, bname, color)
Reserves(sid, bid, date, rname)
Query:
SELECT S.sid, R.rname
FROM Sailors S, Boats B, Reserves R
WHERE S.sid = R.sid AND B.bid = R.bid AND B.color = 'red'
The example is from the Ramakrishnan book.

Available Indexes
Abbreviations: Sailors = S(sid, sname, srating, age), Boats = B(bid, bname, color), Reserves = R(sid, bid, date, rname)
sid and bid are foreign keys in R referencing S and B, respectively.
Sailors
    Unclustered B+ tree index on sid
    Unclustered hash index on sid
Boats
    Unclustered B+ tree index on color
    Unclustered hash index on color
Reserves
    Unclustered B+ tree index on sid
    Clustered B+ tree index on bid

First Pass
S(sid, sname, srating, age): 1. B+ tree on sid, 2. hash index on sid
B(bid, bname, color): 1. B+ tree on color, 2. hash index on color
R(sid, bid, date, rname): 1. B+ tree on sid, 2. clustered B+ tree on bid
SELECT S.sid, R.rname WHERE S.sid = R.sid AND B.bid = R.bid AND B.color = 'red'

Where to start? Decide how to access each relation, assuming it is the first relation being read.
A file scan is also available!
Sailors? No selection matches an index, so use a file scan (no overhead).
Reserves? Same as Sailors.
Boats? The hash index on color matches B.color = 'red'.
    The B+ tree also matches the predicate, but the hash index is cheaper.
    The B+ tree would be cheaper for range queries.
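
The matching step of the first pass can be sketched in Python as follows. The catalog layout and the function name `access_paths` are hypothetical, chosen only for this illustration. The sketch lists candidate access paths and does not cost them.

```python
# Index catalog from the slides: relation -> list of (index kind, attribute).
INDEXES = {
    "Sailors": [("B+ tree", "sid"), ("hash", "sid")],
    "Boats": [("B+ tree", "color"), ("hash", "color")],
    "Reserves": [("B+ tree", "sid"), ("clustered B+ tree", "bid")],
}

# Selection (non-join) predicates of the query: relation -> (attribute, operator).
SELECTIONS = {"Boats": [("color", "=")]}


def access_paths(relation):
    """Return the access paths usable when `relation` is read first."""
    paths = ["file scan"]                      # always available
    for attr, op in SELECTIONS.get(relation, []):
        for kind, indexed_attr in INDEXES[relation]:
            if indexed_attr != attr:
                continue                       # index is on another attribute
            if kind == "hash" and op != "=":
                continue                       # hash indexes serve equality only
            paths.append(f"{kind} on {indexed_attr}")
    return paths


for rel in INDEXES:
    print(rel, "->", access_paths(rel))
```

Only Boats gets index-based candidates, because B.color = 'red' is the only selection predicate in the query. The join predicates are handled in the later passes.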

Second Pass
What next?
For each of the plans from Pass 1 taken as outer, consider joining another relation as inner.
What are the combinations? How many new options?

Outer            Inner   OPTION 1       OPTION 2        OPTION 3
R (file scan)    B       B+ on color    hash on color   file scan
R (file scan)    S       B+ on sid      hash on sid     file scan
S (file scan)    B       B+ on color    hash on color   file scan
S (file scan)    R       B+ on sid      Cl. B+ on bid   file scan
B (hash index)   R       B+ on sid      Cl. B+ on bid   file scan
B (hash index)   S       B+ on sid      hash on sid     file scan

Second Pass
Which outer-inner combinations can be discarded?
(B, S) and (S, B): Cartesian product! The query has no join predicate between Sailors and Boats.

Outer            Inner   OPTION 1       OPTION 2        OPTION 3    Kept?
R (file scan)    B       B+ on color    hash on color   file scan   yes
R (file scan)    S       B+ on sid      hash on sid     file scan   yes
S (file scan)    B       B+ on color    hash on color   file scan   no
S (file scan)    R       B+ on sid      Cl. B+ on bid   file scan   yes
B (hash index)   S       B+ on sid      hash on sid     file scan   no
B (hash index)   R       B+ on sid      Cl. B+ on bid   file scan   yes

OPTION 3 is not shown on the next slide; it is expected to be more expensive.
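
The pruning rule of the second pass, dropping pairs that would form a Cartesian product, can be sketched as follows. The helper name `joinable` is ours; the join predicates are the two from the query.

```python
from itertools import permutations

RELATIONS = ["S", "B", "R"]

# Join predicates from the query: S.sid = R.sid and B.bid = R.bid.
JOIN_PREDICATES = [("S", "R"), ("B", "R")]


def joinable(outer, inner):
    """True if some join predicate connects the two relations."""
    return (outer, inner) in JOIN_PREDICATES or (inner, outer) in JOIN_PREDICATES


# All ordered (outer, inner) pairs, as in the table above.
all_pairs = list(permutations(RELATIONS, 2))
kept = [pair for pair in all_pairs if joinable(*pair)]
discarded = [pair for pair in all_pairs if not joinable(*pair)]

print("kept:     ", kept)
print("discarded:", discarded)
```

Four of the six ordered pairs survive. Each surviving pair is then considered with every access path of the inner relation.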

Second Pass: remaining options

Outer R (file scan), inner S
    OPTION 1 (B+ tree on sid): slower than the hash index (we need the Sailors tuples matching S.sid = value, where value comes from an outer R tuple)
    OPTION 2 (hash index on sid): likely to be faster
    2A. Index nested loop join
    2B. Sort-merge join (sorted by sid)

Outer R (file scan), inner B
    OPTION 1 (B+ tree on color): not useful
    OPTION 2 (hash index on color): select the tuples where B.color = 'red' using the color index (note: there is no index on bid)

Outer S (file scan), inner R
    OPTION 1 (B+ tree on sid): consider all join methods
    OPTION 2 (clustered B+ tree on bid): not useful

Outer B (hash index), inner R
    OPTION 1 (B+ tree on sid): not useful
    OPTION 2 (clustered B+ tree on bid):
        2A. Index nested loop join
        2B. Sort-merge join (sorted on bid)

Keep the least-cost plan between
    (R, S) and (S, R)
    (R, B) and (B, R)

Third Pass: Join with the third relation
For each option retained in Pass 2, join with the third relation.
E.g. Boats (B+ tree on color) sort-merge join Reserves (B+ tree on bid)
    Join the result with Sailors (B+ tree on sid) using a sort-merge join
    Need to sort (B join R) by sid; it was sorted on bid before
    Outputs tuples sorted by sid
    Not useful here, but it would be useful if we had GROUP BY on sid
In general, a higher-cost plan with an interesting order may be retained (e.g. a sort operator at the root, a grouping attribute in a later GROUP BY, or a join attribute in a later join).
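
The retention rule for interesting orders can be sketched as "keep the cheapest plan per (set of relations, output order)" instead of per set of relations alone. The plan records, the function name `prune`, and all cost numbers below are hypothetical, made up only to exercise the rule; they are not costs computed in this section.

```python
def prune(plans):
    """Keep the cheapest plan for each (relations, output order) pair."""
    best = {}
    for plan in plans:
        key = (plan["rels"], plan["order"])
        if key not in best or plan["cost"] < best[key]["cost"]:
            best[key] = plan
    return list(best.values())


# Hypothetical candidates for the full join of B, R and S.
# "order" is the attribute the output is sorted on, or None if unordered.
candidates = [
    {"name": "plan X", "rels": frozenset("BRS"), "order": None, "cost": 900},
    {"name": "plan Y", "rels": frozenset("BRS"), "order": "sid", "cost": 1200},
    {"name": "plan Z", "rels": frozenset("BRS"), "order": "sid", "cost": 1500},
]

retained = prune(candidates)
for plan in retained:
    print(plan["name"], plan["order"], plan["cost"])
```

Plan Y costs more than plan X but survives because its output is sorted by sid. Plan Z is dropped because plan Y delivers the same order at lower cost.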

Homework 5
Query plan cost computation
Query optimization