Notes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes

Similar documents
Algebraic laws extensions to relational algebra

University of Waterloo Midterm Examination Sample Solution

CSC 261/461 Database Systems Lecture 19

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Notes. These slides are based on slide sets provided by M. T. Öszu and by R. Ramakrishnan and J. Gehrke. CS 640 Views Winter / 15.

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Advanced Database Systems

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Database System Concepts

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

Query Processing & Optimization. CS 377: Database Systems

Chapter 13: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Database Applications (15-415)

Chapter 12: Query Processing. Chapter 12: Query Processing

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Chapter 12: Query Processing

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Lecture 14. Lecture 14: Joins!

Statistics. Duplicate Elimination

RELATIONAL OPERATORS #1

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande

The query processor turns user queries and data modification commands into a query plan - a sequence of operations (or algorithm) on the database

Database Applications (15-415)

University of Waterloo Midterm Examination Solution

Chapter 12: Query Processing

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

CS 245 Midterm Exam Solution Winter 2015

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Query processing and optimization

Query Processing and Optimization

Principles of Database Management Systems

Query Processing and Optimization *

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

CompSci 516 Data Intensive Computing Systems

Parser: SQL parse tree

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Query Processing & Optimization

Datenbanksysteme II: Implementing Joins. Ulf Leser

EECS 647: Introduction to Database Systems

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

CSE 544 Principles of Database Management Systems

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Background material for Chapter 3 taken from CS 245

7. Query Processing and Optimization

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Today's Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Example Database. Query Plan Example

Relational Query Optimization. Highlights of System R Optimizer

Advanced Databases: Parallel Databases A.Poulovassilis

CSE 344 FEBRUARY 14 TH INDEXING

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

External Sorting Implementing Relational Operators

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Chapter 3. Algorithms for Query Processing and Optimization

Lecture 15: The Details of Joins

Database Tuning and Physical Design: Basics of Query Execution

Relational Query Optimization

Introduction to Database Systems CSE 344

Hash-Based Indexing 165

CSIT5300: Advanced Database Systems

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

CSE 344 APRIL 20 TH RDBMS INTERNALS

DBMS Query evaluation

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Carnegie Mellon University in Qatar Spring Problem Set 4. Out: April 01, 2018

Datenbanksysteme II: Caching and File Structures. Ulf Leser

Introduction to Database Systems CSE 444

Database Systems CSE 414

CS 564 Final Exam Fall 2015 Answers

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Project. Building a Simple Query Optimizer with Performance Evaluation Experiment on Query Rewrite Optimization

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

15-415/615 Faloutsos 1

Chapter 17: Parallel Databases

Review. Support for data retrieval at the physical level:

Query Execution [15]

Relational Query Optimization

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

Databases 2 Lecture IV. Alessandro Artale

Evaluation of Relational Operations

Transcription:

uery Processing Olaf Hartig David R. Cheriton School of Computer Science University of Waterloo CS 640 Principles of Database Management and Use Winter 2013 Some of these slides are based on a slide set provided by Ulf Leser. CS 640 uery Processing Winter 2013 1 / 30 Outline 1 uery Processing Steps 2 uery Representations Logical Plans Physical Plans 3 Examples for Physical Operators Table Scan Sorting Duplicate Elimination Nested Loop Join Sort-Merge Join Hash Join 4 Summary CS 640 uery Processing Winter 2013 2 / 30 uery Processing Steps CS 640 uery Processing Winter 2013 3 / 30

uery Processing Steps () ˆ Input: SL query ˆ Output: internal representation (e.g. based on relational algebra) CS 640 uery Processing Winter 2013 4 / 30 uery Processing Steps () Check whether: ˆ mentioned tables, attributes, etc. exist, ˆ comparisons are feasible (e.g. comparability of attribute types), ˆ aggregation queries have a valid SELECT clause, ˆ etc. CS 640 uery Processing Winter 2013 5 / 30 uery Processing Steps () ˆ Substitute references to views by the view denitions ˆ (More on views in a later lecture) CS 640 uery Processing Winter 2013 6 / 30

uery Processing Steps () ˆ Considers possible query execution plans (EPs) ˆ Dierent EPs have dierent costs (that is, resources needed for their execution) ˆ Output: an ecient EP (i.e. estimated cost is the lowest or comparatively low) ˆ This task is all but trivial... CS 640 uery Processing Winter 2013 7 / 30 uery Processing Steps (, cont'd) ˆ The set of all possible EPs is huge. ˆ Optimizers enumerate only a restricted subset (search space) ˆ A desirable optimizer: ˆ cost estimation is accurate ˆ search space includes low cost EPs ˆ enumeration algorithm is ecient ˆ (uery optimization will be the main topic of our discussion next week.) CS 640 uery Processing Winter 2013 8 / 30 uery Processing Steps () ˆ Translate the selected EP into a representation ready for execution (e.g. compiled machine code, interpreted language) CS 640 uery Processing Winter 2013 9 / 30

uery Processing Steps () ˆ Execute the compiled plan ˆ Return the query result via the respective interface CS 640 uery Processing Winter 2013 10 / 30 uery Representations While it is processed by a DBMS, a query goes through multiple representations. Usually, these are: 1 an expression in the query language (e.g. SL query) 2 parse tree 3 logical plan 4 physical plan 5 compiled program CS 640 uery Processing Winter 2013 11 / 30 Logical Plans ˆ represented as an expression in a logical algebra (such as the relational algebra) ˆ closely related to the logical data model ˆ can be visualized as a tree of logical operators SELECT title FROM StarsIn WHERE starname IN ( SELECT name FROM MovieStar WHERE dob LIKE "% 1960" ) title starname=name StarsIn name dob LIKE "% 1960s" MovieStar CS 640 uery Processing Winter 2013 12 / 30

Logical Plans (cont'd) Remember, algebra expressions may be rewritten into semantically equivalent expressions. title starname=name StarsIn name dob LIKE "% 1960s" MovieStar title starname=name StarsIn name dob LIKE "% 1960" MovieStar CS 640 uery Processing Winter 2013 13 / 30 Physical Plans ˆ Also often called query execution plan (EP) ˆ Represented as an expression in a physical algebra ˆ Physical operators come with a specic algorithm. ˆ e.g. a nested loop oin CS 640 uery Processing Winter 2013 14 / 30 Physical Plans (Example) StarsIn title starname=name name dob LIKE "% 1960" MovieStar A possible physical plan for the given logical plan: Proect (Attributes: {title}) Hash Join (Predicate: starname=name) Sequential Scan Index Scan (Table: StarsIn) (Predicate: dob LIKE "% 1960" Table: MovieStar) CS 640 uery Processing Winter 2013 15 / 30

Logical Operators vs. Physical Operators ˆ Logical operators and physical operators do not necessarily map directly into one another. ˆ e.g. most oin algorithms can proect out attributes (without duplicate elimination) ˆ e.g. a (physical) duplicate removal operator implements only a part of the (logical) proection operator ˆ e.g. a (physical) sort operator has no counterpart in a (set based) logical algebra ˆ Physical operators can be associated with a cost function (to compute the amount of resources needed for their execution). CS 640 uery Processing Winter 2013 16 / 30 Examples for Physical Operators ˆ Table scan ˆ Sorting ˆ Duplicate elimination ˆ Selection ˆ Proection CS 640 uery Processing Winter 2013 17 / 30 Table Scan ˆ The leaves of each physical operator tree are (physical) tables. ˆ Accessing them completely implies a sequential scan. ˆ Load each page of the table. ˆ Sequentially scanning a table that occupies n pages has n I/O cost. Combining the scan with the next operation in the plan is often better. Example: SELECT A, B FROM t WHERE A = 5 Selection: If index on A available, perform an index scan instead (i.e. obtain relevant tuples by accessing the index) ˆ Especially eective if A is a key Proection: Integrate into the table scan (i.e. read all tuples but only pass on attributes that are needed) ˆ Can also be combined with an index scan. CS 640 uery Processing Winter 2013 18 / 30

Sorting ˆ Many physical operators require input to be sorted. ˆ However, the (unsorted) input may not t into main memory. ˆ We need an external sorting algorithm. ˆ Intermediate results are temporarily stored on secondary memory. CS 640 uery Processing Winter 2013 19 / 30 (Simple) External Merge Sort ˆ First, sort each page internally. ˆ Group these sorted pages into pairs and for each pair merge its two pages. ˆ Each of these groups is then merged with another group, resulting in groups of four pages. ˆ And so on. ˆ The nal group is the completely sorted le. ˆ This strategy makes use of 3 page buers in main memory. ˆ If more buers are available, we should exploit them... CS 640 uery Processing Winter 2013 20 / 30 External Merge Sort ˆ Suppose: ˆ m+1 page buers are available in main memory; ˆ the input occupies p pages. ˆ Stage 1: groups. ˆ Group the pages into p m ˆ Sort each group (using an m-way merge after sorting each of its pages internally). ˆ Each additional stage n (n > 1): ˆ Group the sorted groups from stage n-1 into larger groups such that each larger group consists of m of the previous groups (and m n pages). ˆ For each of these new groups perform an m-way merge. ˆ If m n+1 p we are done. ˆ Maximum number of stages: log m (p) ˆ Maximum number of pages read and written: 2 p log m (p) CS 640 uery Processing Winter 2013 21 / 30

Duplicate Elimination (Option 1) Two steps: 1 Sort input table (or intermediate result) on DISTINCT column(s) ˆ Can be skipped if input is sorted already. 2 Scan sorted table and output unique tuples. ˆ Generated output is sorted ˆ Cannot be pipelined CS 640 uery Processing Winter 2013 22 / 30 Duplicate Elimination (Option 2) ˆ Idea: Scan the input and gradually populate an internal data structure that holds each unique input tuple once. ˆ For each input tuple check whether it has already been seen ˆ No: insert tuple into the data structure and output the tuple ˆ Yes: skip to the next input tuple ˆ Possible data structures: ˆ Hash table might be faster, needs good hash function ˆ Binary tree some cost for balancing, robust ˆ Does not sort output (but existing sorting would remain) ˆ Can be pipelined CS 640 uery Processing Winter 2013 23 / 30 Nested Loop Join ˆ General idea: (We focus on equi oins and natural oins.) FOR EACH tuple r in relation R DO FOR EACH tuple s in relation S DO IF r:a = s:a THEN output tuple r [ s. ˆ Of course, we do this for relations that are distributed over multiple pages: FOR EACH page p of R DO FOR EACH page q of S DO FOR EACH tuple r on page p DO FOR EACH tuple s on page q DO IF r:a = s:a THEN output tuple r [ s. ˆ I/O cost: pages(r) + pages(r) pages(s) CS 640 uery Processing Winter 2013 24 / 30

Nested Loop Join (Possible Improvements) ˆ Block nested loop oin ˆ Zig-zag oin ˆ Index nested loop oin CS 640 uery Processing Winter 2013 25 / 30 Sort-Merge Join ˆ Sort phase: ˆ Sort both inputs on the oin attributes ˆ May need to use an external sorting algorithm ˆ Sorting may be skipped for inputs that are already sorted ˆ Merge phase: ˆ Synchronously iterate over both (sorted) inputs ˆ Merge and output tuples that can be oined ˆ Caution if oin values may appear multiple times CS 640 uery Processing Winter 2013 26 / 30 Sort-Merge Join (Example) R A B a2 1 a4 0 a3 2 a4 1! sort(b) A B a4 0 a2 1 a4 1 a3 2 ( ( ( ( ) ) ) ) B C 0 c2 2 c3 2 c1 3 c4 sort(b) S B C 3 c4 2 c1 0 c2 2 c3 R S A B C a4 0 c2 a3 2 c3 a3 2 c1 CS 640 uery Processing Winter 2013 27 / 30

Sort-Merge Join (Cost Estimation) ˆ I/O costs for sort phase are approximately: l 2 pages(r) log pages(r) m {z } sorting R ˆ I/O costs for merge phase are: l + 2 pages(s) log pages(s) m {z } sorting S pages(r) + pages(s) CS 640 uery Processing Winter 2013 28 / 30 Hash Join ˆ Idea: use oin attributes as hash keys in both input relations ˆ Choose hash function for building hash tables with m buckets (where m is the number of page buers available in main memory) ˆ Partitioning phase: ˆ Scan relation R and populate its hash table. ˆ Whenever the page buer for a hash bucket is full, write it to disc and start a new page for that buer. ˆ Finally, write the remaining page buers to disc. ˆ Do the same for relation S (using the same hash function!). ˆ Result: hash-partitioned versions of both relations on disc ˆ Now, we only need to compare tuples in corresponding partitions. ˆ Probing phase: ˆ Read in a complete partition from R (assuming R < S ). ˆ Scan over the corresponding partition of S and produce oin tuples. pages(r) + pages(s) ˆ I/O costs: 2 {z } partitioning + pages(r) + pages(s) {z } probing CS 640 uery Processing Winter 2013 29 / 30 Summary ˆ For each query dierent physical operators can be combined into dierent, semantically equivalent EPs (query execution plans). ˆ Each physical operator comes with an algorithm. ˆ Commonly used techniques for many of these algorithms: ˆ Combining: Multiple tasks may be combined once some input data has been read in. ˆ Partitioning: Either by sorting or by hashing, we can partition the input(s) and do less work by ignoring many irrelevant combinations. ˆ Indexing: Existing indexes may be exploited for reducing work to relevant parts of the input. ˆ Each of these algorithms has a certain cost. ˆ Thus, dierent EPs have dierent costs. ˆ The actual cost can only be estimated (as long as we do not execute the EP). CS 640 uery Processing Winter 2013 30 / 30