CSIT5300: Advanced Database Systems

Similar documents
CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CSE 444: Database Internals. Section 4: Query Optimizer

CSE 444: Database Internals. Sec2on 4: Query Op2mizer

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Relational Query Optimization

Overview of Query Evaluation. Chapter 12

ATYPICAL RELATIONAL QUERY OPTIMIZER

Database Applications (15-415)

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

CS330. Query Processing

Overview of Implementing Relational Operators and Query Evaluation

External Sorting Implementing Relational Operators

Overview of Query Evaluation. Overview of Query Evaluation

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Overview of Query Evaluation

CSIT5300: Advanced Database Systems

Introduction to Data Management. Lecture 14 (SQL: the Saga Continues...)

Principles of Data Management. Lecture #9 (Query Processing Overview)

Relational Query Optimization. Highlights of System R Optimizer

Basic form of SQL Queries

CompSci 516 Data Intensive Computing Systems

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Database Applications (15-415)

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems

Implementing Joins 1

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Overview of Query Processing

Evaluation of Relational Operations

This lecture. Step 1

Evaluation of Relational Operations

Database Management System

Query Processing and Query Optimization. Prof Monika Shah

Evaluation of Relational Operations: Other Techniques

SQL: Queries, Programming, Triggers

15-415/615 Faloutsos 1

Evaluation of Relational Operations: Other Techniques

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

Administrivia. Relational Query Optimization (this time we really mean it) Review: Query Optimization. Overview: Query Optimization

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

Database Applications (15-415)

Implementation of Relational Operations: Other Operations

Query Evaluation Overview, cont.

Evaluation of Relational Operations: Other Techniques

Chapter 12: Query Processing. Chapter 12: Query Processing

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Enterprise Database Systems

Query Evaluation Overview, cont.

CIS 330: Applied Database Systems

SQL. Chapter 5 FROM WHERE

Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis

Query Processing: The Basics. External Sorting

Συστήματα Διαχείρισης Βάσεων Δεδομένων

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

ECS 165B: Database System Implementa6on Lecture 7

SQL Part 2. Kathleen Durant PhD Northeastern University CS3200 Lesson 6

Implementation of Relational Operations

Evaluation of relational operations

IMPORTANT: Circle the last two letters of your class account:

Chapter 12: Query Processing

Database Applications (15-415)

Evaluation of Relational Operations

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Principles of Data Management. Lecture #12 (Query Optimization I)

Evaluation of Relational Operations. Relational Operations

Advanced Database Systems

SQL: Queries, Programming, Triggers. Basic SQL Query. Conceptual Evaluation Strategy. Example of Conceptual Evaluation. A Note on Range Variables

Database System Concepts

SQL: Queries, Constraints, Triggers

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

Chapter 13: Query Processing

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst March 8 and 13, 2007

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Relational Query Optimization

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing and Optimization

Database Systems. October 12, 2011 Lecture #5. Possessed Hands (U. of Tokyo)

Database Systems ( 資料庫系統 )

Query Evaluation (i)

Database Applications (15-415)

Operator Implementation Wrap-Up Query Optimization

Review: Where have we been?

Homework 2: Query Processing/Optimization, Transactions/Recovery (due February 16th, 2017, 9:30am, in class hard-copy please)

System R Optimization (contd.)

QUERY EXECUTION: How to Implement Relational Operations?

Transcription:

CSIT5300: Advanced Database Systems E11: Exercises on Query Optimization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR, China kwtleung@cse.ust.hk CSIT5300

Exercise #1 Single Relation Plans Consider the file Sailors (sname, sid, age, rating) n sailors =10,000 records, b sailors =1,000 pages (10 records/page) V(rating, Sailors)=10 and V(age, Sailors)=100 Query: SELECT name FROM Sailors WHERE rating = 7 AND age=40 Describe alternative plans to process the query and estimate their cost assuming uniformity and attribute independence How many records you expect in the result? 10,000*1/10*1/100=10 records kwtleung@cse.ust.hk CSIT5300 2

Exercise #1 Single Relation Plans (cont.) 1] Sequential (Linear) Scan Read the whole Sailors file and select the records that satisfy both conditions. Cost = b Sailors =1,000 pages 2] Binary Search If the file is sorted on Rating (or Age), do binary search and find the first record satisfying the condition Rating=7 (or Age=40). Retrieve the remaining records sequentially and filter out non qualifying tuples. Question: What is cheaper: Sorting on Rating or Age? Cost for sorting on Rating = log 2 1000+99=109 Cost for sorting on Age = log 2 1000+9=19 3] Single-index Use an index for one of the two conditions and check the other condition in memory: if we have an index on Rating we use this index to find records where Rating=7. For each such record we check the condition Age=40. Not necessarily good solution if index is not clustered. There are 1000 sailors with Rating=7. If index on Rating is not clustered, we need 1,000 random page accesses to retrieve these sailors. Better to do sequential scan. If the index is clustered, all sailors with Rating=7 are in 100 consecutive pages. kwtleung@cse.ust.hk CSIT5300 3

Exercise #1 Single Relation Plans (cont.) 4] Multidimensional index If we can have a multidimensional index (e.g., Grid file) on rating and age we can use this index to retrieve the records that satisfy both conditions. Cost depends on the index. 5] Multiple-indexes Use more than one index and do intersection of record pointers (Rids) before retrieval of actual records: use the index on Age and find all Rids of records where Age=40. Then use an index on Rating to find all Rids where Rating=7. Find the intersection of these two sets of Rids (using sorting) and only retrieve records that belong to the intersection. Applicable only if both indexes contain record pointers. 6] Covering Index If all attributes mentioned in the query match a dense index we can do an index-only scan without accessing the actual file. If, for instance, we have a B + -Tree on <name, age, rating> we can answer the query by doing a sequential scan of the index. If the B + -Tree is on <age, rating, name> we only need to read the part of the index for age = 40 and rating = 7. kwtleung@cse.ust.hk CSIT5300 4

Exercise #2 Multi-Relation Plans, Cost Estimation Assume that there are 10,000 Sailors, 100,000 Reservations, 1,000 Boats and V(date, Reserves)=1000, V(color, Boat)=10 What is the expected number of records in the result of the query? SELECT * FROM Sailor S, Reserves R, Boats B WHERE S.sid=R.sid and R.bid = B.bid and R.date=1.1.2005 and B.color=red There are 100 reservations (n reserves /V(date, Reserves)) on 1.1.2005. 10% of these reservations (i.e., 10) are on red boats. Thus, the expected result contains 10 records. Alternative solution by estimating the join result before applying selections: S JOIN R contains 100,000 records (S JOIN R) JOIN B contains 100,000 records The output size of σ R.date=1.1.2005 and B.color=red (S JOIN R JOIN B) is 100,000*Selectivity R.date=1.1.2005 *Selectivity color=red =100,000*1/1000*1/10=10 kwtleung@cse.ust.hk CSIT5300 5

Exercise #3 Evaluation/Execution Plans Assume that for all files there are 10 records per page and you have the following indexes: 1] Hash index on S.sid for Sailors (no overflow buckets) 2] Clustered B + -Tree on R.date for Reserves (2 levels) 3] Hash index on B.bid for Boats (no overflow buckets) Describe different plans for processing the query and estimate their cost. Alternative join orders after heuristic optimization (pushing the selections down): 1] (Sailor JOIN σ R.date=1.1.2005 Reserves) JOIN σ color=red Boats 2] Sailor JOIN (σ R.date=1.1.2005 Reserves JOIN σ color=red Boats) kwtleung@cse.ust.hk CSIT5300 6

Exercise #3 Evaluation/Execution Plans (cont.) Cost estimation for 1 st expression (Sailor JOIN σ R.date=1.1.2005 Reserves) JOIN σ color=red Boats Total cost: C1+C2+C3 C1: Cost of computing Temp1=(Sailor JOIN σ R.date=1.1.2005 Reserves) C2: Cost of computing Temp2=σ color=red Boats C3: Cost of Temp1 JOIN Temp2 In order to estimate C1, we need to determine the best sub-plan P1 for computing Temp1 Some alternatives: 1] BNL using Sailor as the outer relation 2] BNL using Reserves as the outer relation 3] Sort-merge join (we have to sort both tables on sid) 4] Hash join 5] Index nested loop with Reserves as the outer relation. Sailors contains a hash index on the join attribute sid. Furthermore, we have a selective condition (σ R.date=1.1.2005 Reserves) and a clustering index on Reserves.date. Best Option kwtleung@cse.ust.hk CSIT5300 7

Exercise #3 Evaluation/Execution Plans (cont.) Cost Estimation of C1 Temp1 = Sailor JOIN σ R.date=1.1.2005 Reserves Sub-Plan P1: Use clustering B + -Tree on Reserves.date to find all reservations on 1.1.2005. How many? n Reserves /V(Date,Reserves)=100,000/1,000=100 How many pages do I need to access, in order to find these reservations: 2+10 (2 in the index and 10 in the file the reservations are ordered on the date) For each reservation, I retrieve the corresponding sailor record using the hash index on Sailor.sid, with cost 2 per reservation. Total cost: 12+100*2=212 Do I need to consider other algorithms that produce results in interesting orders? In this case no, because the only useful order would be on the Reserves.bid, which cannot be generated by any algorithm in this example. kwtleung@cse.ust.hk CSIT5300 8

Exercise #3 Evaluation/Execution Plans (cont.) Cost Estimation of C2: Temp2 = σ color=red Boats C3: Temp1 JOIN Temp2 C2 using Sub-Plan P2: For Boats, I only have a hash index on bid (no index on color). Therefore, in order to find red boats, I need to scan the entire (1,000 boat records) file with cost b Boats =100 pages. Since only 10% of the boats are red, I expect to retrieve 100 records. C3 using Sub-Plan P3 (materialization). Store the intermediate results of P2 and P1, then read them and join them on the bid using any join algorithm. Very expensive. C3 using Sub-Plan P3 (pipelining). Recall that during P1, we generate 100 (Sailor JOIN σ R.date=1.1.2005 Reserves) records. When each such record is generated, we find the corresponding Boat, using the hash index on Boats.bid with cost 2. If the color is not red I discard the record (i.e., I perform the selection on the fly without the need for P2). This corresponds to the Index Nested Loops algorithm and has cost 100*2. Total cost for 1 st expression: C1+C3=212+200=412. kwtleung@cse.ust.hk CSIT5300 9

Exercise #3 Evaluation/Execution Plans (cont.) Evaluation plan for 1 st expression (Sailor JOIN σ R.date=1.1.2005 Reserves) JOIN σ color=red Boats Total cost=12+200+200=412 2. Index nested loop for every reservation on 1.1.2005 use hash index on S.sid to find information about the sailor with cost 100(1+1) Output contains 100 records JOIN 3. Index nested loop JOIN for every record of step 2 use hash index on B.bid to find information about the boat filter B.color condition in memory The cost is 1+1 per reservation, i.e., total cost 200 for all reservations of step 1 The output contains only 10 records (10% of the reservations are on red boats σ R.date=1.1.2005 Sailor S 1. Use B+-tree on R.date Will retrieve 100 records with cost 2+10 Reserves R σ B.color=red Boats B kwtleung@cse.ust.hk CSIT5300 10

Exercise #3 Evaluation/Execution Plans (cont.) Evaluation plan for 2 nd expression Sailor JOIN (σ R.date=1.1.2005 Reserves JOIN σ color=red Boats) using the same concepts Total cost=12+200+20=232 JOIN 3. Index nested loop for every reservation that satisfies steps 1 and 2 use hash index on S.sid to find information about the sailor The cost is 1+1 per reservation, i.e., total cost 20 for all reservations of step 2 Sailor S JOIN σ R.date=1.1.2005 1. Use B+-tree on R.date Will retrieve 100 records with cost 2+10 Reserves R 2. Index nested loop for every reservation that satisfies the date condition use hash index on B.bid to find information about the boat filter B.color condition in memory The cost is 1+1 per reservation, i.e., total cost 200 for all reservations of step 1 The output contains only 10 records (10% of the reservations are on red boats σ B.color=red Boats B kwtleung@cse.ust.hk CSIT5300 11

Exercise #3 Evaluation/Execution Plans (cont.) Final evaluation plan The optimizer will choose to execute the last plan, i.e., for Sailor JOIN (σ R.date=1.1.2005 Reserves JOIN σ color=red Boats), because it is cheaper Intuitively, this is expected because this expression first performs the joins on the tables that involve selections Real optimizers follow similar principles but include additional factors such as CPU time, and the difference between random and sequential I/O operation kwtleung@cse.ust.hk CSIT5300 12