Introduction to Data Management CSE 344. Lecture 12: Cost Estimation Relational Calculus

Similar documents
CSE 344 APRIL 27 TH COST ESTIMATION

CSE 344 FEBRUARY 21 ST COST ESTIMATION

Database Systems CSE 414

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

Introduction to Database Systems CSE 444, Winter 2011

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

Lecture 19: Query Optimization (1)

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Review. Administrivia (Preview for Friday) Lecture 21: Query Optimization (1) Where We Are. Relational Algebra. Relational Algebra.

Lecture 21: Query Optimization (1)

CSE 344 APRIL 20 TH RDBMS INTERNALS

Review: Query Evaluation Steps. Example Query: Logical Plan 1. What We Already Know. Example Query: Logical Plan 2.

Introduction to Database Systems CSE 344

Introduction to Database Systems CSE 414. Lecture 16: Query Evaluation

Introduction to Data Management CSE 344. Lectures 9: Relational Algebra (part 2) and Query Evaluation

CSE 544 Principles of Database Management Systems

Database Management Systems CSEP 544. Lecture 6: Query Execution and Optimization Parallel Data processing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CSE 444: Database Internals. Lectures 5-6 Indexing

Announcements. From SQL to RA. Query Evaluation Steps. An Equivalent Expression

Database Applications (15-415)

CSE 344 FEBRUARY 14 TH INDEXING

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

CSE 444: Database Internals. Lecture 3 DBMS Architecture

Lecture 20: Query Optimization (2)

Introduction to Database Systems CSE 444

CSE 444: Database Internals. Lecture 3 DBMS Architecture

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

Database Applications (15-415)

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

University of Waterloo Midterm Examination Sample Solution

Principles of Data Management. Lecture #9 (Query Processing Overview)

Overview of Query Evaluation. Chapter 12

CSE wi Final Exam Sample Solution

Implementation of Relational Operations

Evaluation of Relational Operations

Database Systems CSE 414

Introduction to Database Systems CSE 344

CSE344 Midterm Exam Winter 2017

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

Database Management System

Database Applications (15-415)

CSE 444: Database Internals. Sec2on 4: Query Op2mizer

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

CSIT5300: Advanced Database Systems

CMPT 354: Database System I. Lecture 7. Basics of Query Optimization

CSE 444: Database Internals. Section 4: Query Optimizer

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Evaluation of Relational Operations. Relational Operations

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Principles of Database Management Systems

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Spring 2017 QUERY PROCESSING [JOINS, SET OPERATIONS, AND AGGREGATES] 2/19/17 CS 564: Database Management Systems; (c) Jignesh M.

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Dtb Database Systems. Announcement

CompSci 516 Data Intensive Computing Systems

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

CSE 444: Database Internals. Lecture 22 Distributed Query Processing and Optimization

EECS 647: Introduction to Database Systems

What we already know. Late Days. CSE 444: Database Internals. Lecture 3 DBMS Architecture

Implementation of Relational Operations: Other Operations

Query Processing and Advanced Queries. Query Optimization (4)

University of Waterloo Midterm Examination Solution

Overview of Implementing Relational Operators and Query Evaluation

CSE 544, Winter 2009, Final Examination 11 March 2009

Implementing Joins 1

CSE 444 Final Exam. August 21, Question 1 / 15. Question 2 / 25. Question 3 / 25. Question 4 / 15. Question 5 / 20.

External Sorting Implementing Relational Operators

Evaluation of Relational Operations

Query Execution [15]

CSE344 Midterm Exam Fall 2016

CSE 444, Winter 2011, Final Examination. 17 March 2011

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

CSE 444: Database Internals. Lecture 11 Query Optimization (part 2)

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

Evaluation of Relational Operations: Other Techniques

Lecture 17: Query execution. Wednesday, May 12, 2010

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

CSE 544 Principles of Database Management Systems

ECS 165B: Database System Implementa6on Lecture 7

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques

Query Evaluation Overview, cont.

Query Processing: The Basics. External Sorting

CSEP 544: Lecture 04. Query Execution. CSEP544 - Fall

Introduction to Data Management. Lecture #17 (Physical DB Design!)

Evaluation of Relational Operations

Sorting & Aggregations

Query Evaluation Overview, cont.

Transcription:

Introduction to Data Management CSE 344 Lecture 12: Cost Estimation Relational Calculus CSE 344 - Winter 2017 1

HW3 due tonight Announcements WQ4 and HW4 out Due on Thursday 2/9 2

Midterm! Monday, February 13 th in class Contents Lectures and sections through February 8th Homework 1 through 4 Webquiz 1 through 4 Closed book. No computers, phones, watches, etc.! Can bring one letter-sized piece of paper with notes Can write on both sides You might want to save it for the final 3

Today s Outline Finish cost estimation Relational calculus CSE 344 - Winter 2017 4

Review Estimate cost of physical query plans Based on # of I/O operations Estimate cost for each operator Cost of entire plan = Σ operator cost Cost for selection operator Cost for join operator 5

Review: Cost Parameters Cost = I/O + CPU + Network BW We will focus on I/O in this class Parameters: B(R) = # of blocks (i.e., pages) for relation R T(R) = # of tuples in relation R V(R, a) = # of distinct values of attribute a When a is a key, V(R,a) = T(R) When a is not a key, V(R,a) can be anything <= T(R) Where do these values come from? DBMS collects statistics about data on disk CSE 344 - Winter 2017 6

Index Based Selection Example: B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of σ a=v (R) =? Table scan: Index based selection: CSE 344 - Winter 2017 7

Index Based Selection Example: B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of σ a=v (R) =? Table scan: B(R) = 2,000 I/Os Index based selection: CSE 344 - Winter 2017 8

Index Based Selection Example: B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of σ a=v (R) =? Table scan: B(R) = 2,000 I/Os Index based selection: If index is clustered: If index is unclustered: CSE 344 - Winter 2017 9

Index Based Selection Example: B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of σ a=v (R) =? Table scan: B(R) = 2,000 I/Os Index based selection: If index is clustered: B(R) * 1/V(R,a) = 100 I/Os If index is unclustered: CSE 344 - Winter 2017 10

Index Based Selection Example: B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of σ a=v (R) =? Table scan: B(R) = 2,000 I/Os Index based selection: If index is clustered: B(R) * 1/V(R,a) = 100 I/Os If index is unclustered: T(R) * 1/V(R,a) = 5,000 I/Os CSE 344 - Winter 2017 11

Index Based Selection Example: B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of σ a=v (R) =? Table scan: B(R) = 2,000 I/Os Index based selection: If index is clustered: B(R) * 1/V(R,a) = 100 I/Os If index is unclustered: T(R) * 1/V(R,a) = 5,000 I/Os Lesson: Don t build unclustered indexes when V(R,a) is small! CSE 344 - Winter 2017 12

Cost of Executing Operators (Focus on Joins) CSE 344 - Winter 2017 13

Outline Join operator algorithms One-pass algorithms (Sec. 15.2 and 15.3) Index-based algorithms (Sec 15.6) Note about readings: In class, we discuss only algorithms for joins Other operators are easier: read the book CSE 344 - Winter 2017 14

Join Algorithms Nested loop join Hash join Sort-merge join CSE 344 - Winter 2017 15

Nested Loop Joins Tuple-based nested loop R S R is the outer relation, S is the inner relation for each tuple t 1 in R do for each tuple t 2 in S do if t 1 and t 2 join then output (t 1,t 2 ) What is the Cost? CSE 344 - Winter 2017 16

Nested Loop Joins Tuple-based nested loop R S R is the outer relation, S is the inner relation for each tuple t 1 in R do for each tuple t 2 in S do if t 1 and t 2 join then output (t 1,t 2 ) Cost: B(R) + T(R) B(S) Multiple-pass since S is read many times What is the Cost? CSE 344 - Winter 2017 17

Page-at-a-time Refinement for each page of tuples r in R do for each page of tuples s in S do for all pairs of tuples t 1 in r, t 2 in s if t 1 and t 2 join then output (t 1,t 2 ) Cost: B(R) + B(R)B(S) What is the Cost? CSE 344 - Winter 2017 18

Hash Join Hash join: R S Scan R, build buckets in main memory Then scan S and join Cost: B(R) + B(S) Which relation to build the hash table on? One-pass algorithm when B(R) M M = number of memory pages available CSE 344 - Winter 2017 23

Hash Join Example Patient(pid, name, address) Insurance(pid, provider, policy_nb) Patient Insurance Two tuples per page Patient 1 Bob Seattle 2 Ela Everett 3 Jill Kent 4 Joe Seattle Insurance 2 Blue 123 4 Prem 432 4 Prem 343 3 GrpH 554 24

Patient Showing pid only Hash Join Example Insurance Memory M = 21 pages Some largeenough # Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 8 5 8 9 This is one page with two tuples 25

Hash Join Example Step 1: Scan Patient and build hash table in memory Can be done in Memory M = 21 pages method open() Hash h: pid % 5 5 1 6 2 3 8 4 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 8 5 8 9 1 2 Input buffer 26

Hash Join Example Step 2: Scan Insurance and probe into hash table Done during Memory M = 21 pages calls to next() Hash h: pid % 5 5 1 6 2 3 8 4 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 8 5 8 9 21 42 Input buffer 2 2 Write to disk or pass to next operator Output buffer 27

Hash Join Example Step 2: Scan Insurance and probe into hash table Done during Memory M = 21 pages calls to next() Hash h: pid % 5 5 1 6 2 3 8 4 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 8 5 8 9 21 42 Input buffer 4 4 Output buffer 28

Hash Join Example Step 2: Scan Insurance and probe into hash table Done during Memory M = 21 pages calls to next() Hash h: pid % 5 5 1 6 2 3 8 4 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 8 5 8 9 41 32 Input buffer 4 4 Output buffer Keep going until read all of Insurance Cost: B(R) + B(S) 29

Sort-Merge Join Sort-merge join: R S Scan R and sort in main memory Scan S and sort in main memory Merge R and S Cost: B(R) + B(S) One pass algorithm when B(S) + B(R) <= M Typically, this is NOT a one pass algorithm CSE 344 - Winter 2017 30

Sort-Merge Join Example Step 1: Scan Patient and sort in memory Memory M = 21 pages 1 2 3 4 5 6 8 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 8 5 8 9 31

Sort-Merge Join Example Step 2: Scan Insurance and sort in memory Memory M = 21 pages 1 2 3 4 5 6 8 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 1 2 2 3 3 4 4 6 6 8 8 9 8 5 8 9 32

Sort-Merge Join Example Step 3: Merge Patient and Insurance Memory M = 21 pages 1 2 3 4 5 6 8 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 1 2 2 3 3 4 4 6 6 8 8 9 1 1 Output buffer 8 5 8 9 33

Sort-Merge Join Example Step 3: Merge Patient and Insurance Memory M = 21 pages 1 2 3 4 5 6 8 9 Disk Patient Insurance 1 2 2 4 6 6 3 4 4 3 1 3 9 6 2 8 1 2 2 3 3 4 4 6 6 8 8 9 2 2 Output buffer Keep going until end of first relation 8 5 8 9 34

Cost of Query Plans CSE 344 - Winter 2017 36

T(Supplier) = 1000 T(Supply) = 10,000 B(Supplier) = 100 B(Supply) = 100 V(Supplier,scity) = 20 V(Supplier,state) = 10 V(Supply,pno) = 2,500 M = 11 Physical Query Plan 1 (On the fly) (On the fly) π sname Selection and project on-the-fly à No additional cost. σ scity= Seattle and sstate= WA and pno=2 (Nested loop) Supplier (File scan) sid = sid Total cost of plan is thus cost of join: = B(Supplier)+B(Supplier)*B(Supply) = 100 + 100 * 100 = 10,100 I/Os SELECT sname FROM Supplier x, Supply y Supply WHERE x.sid = y.sid and y.pno = 2 (File scan) and x.scity = Seattle CSE 344 - Winter 2017 and x.sstate = WA 37

T(Supplier) = 1000 T(Supply) = 10,000 B(Supplier) = 100 B(Supply) = 100 V(Supplier,scity) = 20 V(Supplier,state) = 10 V(Supply,pno) = 2,500 M = 11 Physical Query Plan 2 4. (On the fly) 3. (Sort-merge join) π sname sid = sid (Scan write to T1) 1. σ scity= Seattle and sstate= WA (Scan write to T2) 2. σ pno=2 Total cost = 100 + 100 * 1/20 * 1/10 (step 1) + 100 + 100 * 1/2500 (step 2) + 2 (step 3) + 0 (step 4) Total cost 204 I/Os Supplier (File scan) SELECT sname Supply FROM Supplier x, Supply y WHERE x.sid = y.sid (File scan) and y.pno = 2 and x.scity = Seattle CSE 344 - Winter 2017 38 and x.sstate = WA