Overview of Query Evaluation

Similar documents
Overview of Query Evaluation. Overview of Query Evaluation

CS330. Query Processing

Overview of Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Relational Query Optimization

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Principles of Data Management. Lecture #9 (Query Processing Overview)

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

ECS 165B: Database System Implementa6on Lecture 7

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst March 8 and 13, 2007

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Relational Query Optimization. Highlights of System R Optimizer

Query Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13!

Evaluation of relational operations

Relational Query Optimization

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

QUERY OPTIMIZATION [CH 15]

Overview of Query Evaluation. Chapter 12

Principles of Data Management. Lecture #12 (Query Optimization I)

Query Evaluation Overview, cont.

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

Implementation of Relational Operations: Other Operations

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Query Evaluation Overview, cont.

CS330. Some Logistics. Three Topics. Indexing, Query Processing, and Transactions. Next two homework assignments out today Extra lab session:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Evaluation of Relational Operations: Other Techniques

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Evaluation of Relational Operations: Other Techniques

Overview of Query Processing

Database Applications (15-415)

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

15-415/615 Faloutsos 1

Administrivia. Relational Query Optimization (this time we really mean it) Review: Query Optimization. Overview: Query Optimization

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Query Optimization. Kyuseok Shim. KDD Laboratory Seoul National University. Query Optimization

CompSci 516 Data Intensive Computing Systems

Query Evaluation (i)

Evaluation of Relational Operations: Other Techniques

Database Applications (15-415)

Database Applications (15-415)

Evaluation of Relational Operations

Hash-Based Indexing 165

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

ATYPICAL RELATIONAL QUERY OPTIMIZER

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

QUERY EXECUTION: How to Implement Relational Operations?

Evaluation of Relational Operations

Query Processing & Optimization

System R Optimization (contd.)

CSE 444: Database Internals. Sec2on 4: Query Op2mizer

External Sorting Implementing Relational Operators

CSE 444: Database Internals. Section 4: Query Optimizer

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

Principles of Data Management. Lecture #13 (Query Optimization II)

Συστήματα Διαχείρισης Βάσεων Δεδομένων

Evaluation of Relational Operations

Evaluation of Relational Operations. Relational Operations

Database Management System

Database Applications (15-415)

RELATIONAL OPERATORS #1

7. Query Processing and Optimization

Chapter 12: Query Processing. Chapter 12: Query Processing

Module 9: Query Optimization

Query Optimization in Relational Database Systems

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Chapter 12: Query Processing

Managing Resources, cont., Intro Query Processing

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Advanced Database Systems

User Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM

Ch 5 : Query Processing & Optimization

Lecture 20: Query Optimization (2)

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Query and Join Op/miza/on 11/5

Managing Resources, cont., Intro Query Processing CS634 Lecture 8

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

Sorting a file in RAM. External Sorting. Why Sort? 2-Way Sort of N pages Requires Minimum of 3 Buffers Pass 0: Read a page, sort it, write it.

Parser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text.

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Operator Implementation Wrap-Up Query Optimization

Architecture and Implementation of Database Systems (Winter 2015/16)

Database System Concepts

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

Query Processing and Query Optimization. Prof Monika Shah

Transcription:

Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Overview of Query Evaluation Plan: Tree of R.A. operators, with choice of algorithm for each operator. Two main issues in query optimization: For a given query, what plans are considered? Algorithm to search plan space for cheapest (estimated) plan. How is the cost of a plan estimated? Ideally: Want to find best plan. Practically: Avoid worst plans! We will study the System R approach. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

Statistics and Catalogs Need information about the relations and indexes involved. Catalogs typically contain at least: # tuples (NTuples) and # pages (NPages) for each relation. # distinct key values (NKeys) and NPages for each index. Index height, low/high key values (Low/High) for each tree index. Catalogs updated periodically. Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3

Access Paths An access path is a method of retrieving tuples: Table scan, or index that matches a selection (in the query) A tree index matches terms that involve only attributes in a prefix of the search key. E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3. A hash index matches terms that has a term attribute = value for every attribute in the search key of the index. E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

A Note on Complex Selections (day<8/9/94 AND rname= Paul ) OR bid=5 OR sid=3 Selection conditions are first converted to conjunctive normal form (CNF): (day<8/9/94 OR bid=5 OR sid=3 ) AND (rname= Paul OR bid=5 OR sid=3) We only discuss case with no ORs; see text if you are curious about the general case. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5

One Approach to Selections Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don t match the index: Most selective access path: An index or file scan that we estimate will require the fewest page I/Os. Terms that match this index reduce the number of tuples retrieved Other terms are used to discard some retrieved tuples, but do not affect number of tuples/pages fetched. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

One Approach to Selections (Cont d) Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used, then bid=5 and sid=3 must be checked for each retrieved tuple. Again, consider day<8/9/94 AND bid=5 AND sid=3 A hash index on <bid, sid> could be used, then day<8/9/94 must then be checked. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7

Using an Index for Selections Cost of using an index depends on: #qualifying tuples, and clustering. For example, assuming that the query below selects 10% of the tuples(#pages = 100, #tuples = 10,000, with a clustered index Cost is little more than 100 I/Os a non-clustered, up to 10000 I/Os! SELECT * FROM Reserves R WHERE R.rname < C% What if we select all tuples? SELECT * FROM Reserves Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

Some terminologies first. Query Evaluation Pipeline (QEP) Pipelined evaluation The Iterator interface open() //or init() getnext() Returns a tuple instance reset() close() Lazy Evaluation Pull Model Left-deep Plans vs. Right-deep Plans vs. Bushy Plans Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9

Highlights of System R Optimizer Impact: Most widely used currently; works well for < 10 joins. Cost estimation: Approximate art at best. Statistics, maintained in system catalogs, used to estimate cost of operations and result sizes. Considers combination of CPU and I/O costs. Plan Space: Too large, must be pruned. Only the space of left-deep plans is considered. Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation. Cartesian products avoided. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

Cost Estimation For each plan considered, must estimate cost: Must estimate cost of each operation in plan tree. Depends on input cardinalities. We ve already discussed how to estimate the cost of operations (sequential scan, index scan) Must also estimate size of result for each operation in tree! Use information about the input relations. For selections and joins, assume independence of predicates. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11

Size Estimation and Reduction Factors Consider a query block: SELECT attribute list FROM relation list WHERE term1 AND... AND termk Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause. Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size. Result cardinality = Max # tuples * product of all RF s. Implicit assumption that terms are independent! Term col=value has RF 1/NKeys(I), given index I on col Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2)) Term col>value has RF (High(I)-value)/(High(I)-Low(I)) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

Size Estimation and Reduction Factors Consider this query block where ZipCode has an index SELECT * FROM Students WHERE ZipCode = 47907 Assume that NTuples(Students) = 1000, and that NKeys(ZipCode) = 2 Then, reduction factor for ZipCode = 47907 is 1 / 2 = 0.5 And, estimated result size = 1000 * 0.5 = 500 tuples SELECT * FROM Students WHERE ZipCode = 47907 And StudentId > 800 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13

Size Estimation and Reduction Factors Consider this query block where ZipCode has an index and StudentId has another index SELECT * FROM Students WHERE ZipCode = 47907 And StudentId > 800 Assume that NTuples(Students) = 1000, and that NKeys(ZipCode) = 2, High(StudentId) = 1000 and Low(StudentId) = 1 Then, RF(ZipCode = 47907) = 1 / 2 = 0.5 Then, RF(StudentId > 800) = (1000 800) / (1000-1) 0.2 And, estimated result size = 1000 * 0.5 * 0.2 = 100 tuples Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14