
376a. Database Design
Dept. of Computer Science, Vassar College
http://www.cs.vassar.edu/~cs376
Class 16: Query optimization

What happens
The database is given a query.
- The query is scanned: the scanner creates a list of the tokens recognized by the language.
- The query is parsed: the tokens are checked for syntactic correctness.
- The query is validated: the referenced attributes and relations must exist, and the query is checked for semantic validity.

Query conversion
The query is converted to an intermediate format, either a query tree or a query graph.

Execution strategy
The execution strategy is how the DBMS takes the tree or graph and executes it against the database. There are many different strategies, and this is where query optimization comes into play. Query optimization is not, in general, picking the best execution plan (in time, disk accesses, etc.); it is picking a reasonably efficient strategy.

Next
The execution plan is converted to code (query code generation). There are two types of execution:
- Interpreted: the code is executed directly.
- Compiled: the execution strategy is stored and executed at a later time.
At runtime, the DBMS executes the final query code.

Two types of techniques
- Heuristic rules for ordering the operations in a query.
- Systematic estimation of the costs of different strategies.
Usually a combination of these two strategies is used.
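The scan, parse, and validate steps listed under "What happens" can be sketched for a tiny SQL subset. This is a minimal illustration, assuming a hypothetical in-memory catalog (`CATALOG`) and a grammar restricted to `SELECT a {, a} FROM r [WHERE a op n {AND a op n}] [;]`; the function names `scan`, `parse`, and `validate` are illustrative, not any real DBMS API.

```python
import re

# Hypothetical catalog (an assumption for illustration): relation -> attribute names.
CATALOG = {"EMPLOYEE": {"Fname", "Lname", "Ssn", "Salary", "Dno"}}
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}
COMPARISONS = {"=", "<", ">", "<=", ">=", "<>"}
# One token: a number, a word (keyword or identifier), or a symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(<=|>=|<>|[=<>,;]))")


def scan(sql):
    """Scanner: turn the query text into a list of (kind, text) tokens."""
    tokens, pos, sql = [], 0, sql.rstrip()
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unrecognized input at position {pos}")
        number, word, symbol = m.groups()
        if number:
            tokens.append(("NUM", number))
        elif word:
            kind = "KW" if word.upper() in KEYWORDS else "ID"
            tokens.append((kind, word))
        else:
            tokens.append(("SYM", symbol))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parser: check the token sequence against the small grammar above."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(kind, text=None):
        nonlocal pos
        k, t = peek()
        if k != kind or (text is not None and t.upper() != text):
            raise SyntaxError(f"expected {text or kind}, found {t!r}")
        pos += 1
        return t

    expect("KW", "SELECT")
    attrs = [expect("ID")]
    while peek() == ("SYM", ","):
        pos += 1
        attrs.append(expect("ID"))
    expect("KW", "FROM")
    relation = expect("ID")
    where = []
    if peek()[0] == "KW" and peek()[1].upper() == "WHERE":
        pos += 1
        while True:
            attr = expect("ID")
            op = expect("SYM")
            if op not in COMPARISONS:
                raise SyntaxError(f"expected a comparison, found {op!r}")
            where.append((attr, op, int(expect("NUM"))))
            if peek()[0] == "KW" and peek()[1].upper() == "AND":
                pos += 1
            else:
                break
    if peek() == ("SYM", ";"):
        pos += 1
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return {"attrs": attrs, "relation": relation, "where": where}


def validate(query, catalog=CATALOG):
    """Semantic check: the relation and every referenced attribute must exist."""
    attrs = catalog.get(query["relation"])
    if attrs is None:
        raise ValueError(f"unknown relation {query['relation']!r}")
    used = query["attrs"] + [a for a, _, _ in query["where"]]
    missing = [a for a in used if a not in attrs]
    if missing:
        raise ValueError(f"unknown attribute(s): {missing}")
    return query
```

For example, `validate(parse(scan("SELECT Lname, Fname FROM EMPLOYEE WHERE Dno = 5;")))` yields a small dictionary describing the query, while `SELECT FROM EMPLOYEE` fails in the parser and `SELECT Bonus FROM EMPLOYEE` fails in validation. A real system would go on to build a query tree from the parsed form.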

Query operations
Tasks include search, sort, merge, union, intersection, etc. A DBMS typically has several algorithms to perform each task. Most start with SQL:
- Take the SQL query and break it into query blocks (a block is a single SELECT-FROM-WHERE expression).
- Convert each block to a relational algebra expression (a tree data structure).
- Optimize this expression.

Example
Convert this nested query into two query blocks:
SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5);
The inner block
SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5;
becomes $\mathcal{F}_{\text{MAX Salary}}(\sigma_{\text{Dno}=5}(\text{EMPLOYEE}))$, where $\mathcal{F}$ is the aggregate function operator. The outer block
SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > c;
becomes $\pi_{\text{Lname, Fname}}(\sigma_{\text{Salary}>c}(\text{EMPLOYEE}))$, where c is the result of the inner block.
This is more difficult with a correlated nested query, where the inner block refers to attributes of the outer block.

Order of coverage
Optimization is covered in the following order:
- External sorting (required for sort-merge operations)
- SELECT operations
- JOIN operations
- PROJECT operations
- Set operations (union, intersection, difference)
- Aggregate operations (MIN, MAX, AVERAGE, COUNT)

External sorting
Required for almost everything: sort-merge join, union, intersection, difference, and duplicate elimination. The file is broken into subfiles (runs) that are processed separately and then merged. There are two phases:
- Sort
- Merge
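For the nested-query example, the decomposition into two query blocks can be sketched over a few in-memory rows. This is a minimal illustration under stated assumptions: the `EMPLOYEE` rows are hypothetical sample data, and `inner_block`/`outer_block` are illustrative names. Because the subquery is uncorrelated, the inner block runs once and its result becomes the constant c.

```python
# Hypothetical sample rows for the EMPLOYEE relation used on the slide.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"Fname": "Franklin", "Lname": "Wong", "Salary": 40000, "Dno": 5},
    {"Fname": "Jennifer", "Lname": "Wallace", "Salary": 43000, "Dno": 4},
    {"Fname": "James", "Lname": "Borg", "Salary": 55000, "Dno": 1},
]


def inner_block(employees):
    """F_{MAX Salary}(sigma_{Dno=5}(EMPLOYEE)); None plays the role of SQL NULL."""
    return max((e["Salary"] for e in employees if e["Dno"] == 5), default=None)


def outer_block(employees, c):
    """pi_{Lname, Fname}(sigma_{Salary > c}(EMPLOYEE)) with c already computed."""
    if c is None:  # Salary > NULL evaluates to UNKNOWN in SQL, so no row qualifies
        return []
    return [(e["Lname"], e["Fname"]) for e in employees if e["Salary"] > c]


c = inner_block(EMPLOYEE)  # evaluated once: the subquery is uncorrelated
result = outer_block(EMPLOYEE, c)
```

With this sample data, c is 40000 and the result is Wallace and Borg. A correlated subquery could not be hoisted out like this, since its value would change for each outer row.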

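The two-phase external sort-merge can be simulated in memory. This is a sketch, assuming the "file" is a Python list of blocks (each block a list of keys) and that `n_B` buffer blocks are available; the sort phase sorts `n_B` blocks at a time into a run, and the merge phase merges up to $d_M = n_B - 1$ runs per step, leaving one buffer block for output. It uses the standard library's `heapq.merge` for the multiway merge.

```python
import heapq


def external_sort(blocks, n_B):
    """Two-phase external sort-merge over a file given as a list of blocks.

    blocks: list of lists of keys (each inner list stands for one disk block).
    n_B: number of available main-memory buffer blocks (at least 3).
    Returns (sorted_records, number_of_merge_passes).
    """
    if n_B < 3:
        raise ValueError("need at least 3 buffer blocks")
    # Sort phase: read n_B blocks at a time, sort them in memory, write one run.
    runs = []
    for i in range(0, len(blocks), n_B):
        chunk = [rec for block in blocks[i:i + n_B] for rec in block]
        runs.append(sorted(chunk))
    # Merge phase: merge up to d_M = n_B - 1 runs at a time until one run remains.
    d_M = n_B - 1
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[j:j + d_M])) for j in range(0, len(runs), d_M)]
        passes += 1
    return (runs[0] if runs else []), passes
```

For instance, 20 one-record blocks with `n_B = 3` produce 7 initial runs, which a two-way merge reduces to 4, then 2, then 1 run, so 3 merge passes.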
External sort calculations
$n_R = \lceil b / n_B \rceil$
where $n_R$ is the number of initial runs, $b$ the number of file blocks, and $n_B$ the available buffer space in main memory (in blocks). E.g., if the buffer space is 100 blocks and $b = 64000$ blocks, then the number of runs is $n_R = 640$.

Merge calculations
$d_M$ is the degree of merging (how many runs can be merged in each pass):
$d_M = \min(n_B - 1, n_R)$
$\text{passes} = \lceil \log_{d_M} n_R \rceil$
For $n_R = 640$ and $n_B = 100$, $d_M = 99$, so the 640 runs can be merged in 2 passes. (Remember that in the worst case, $n_B = 3$ and $d_M$ is only 2!)

Worst-case performance
$(2 \cdot b) + (2 \cdot b \cdot \log_2 b)$
The first term is one read and one write of every block during the sort phase; the second term is the disk accesses for merging. For the general case, replace $\log_2 b$ with $\lceil \log_{d_M} n_R \rceil$.

SELECT algorithms
The choice depends on whether the condition is on indexed or non-indexed attributes. Simple methods (file scan or index scan):
- Linear search: check every record (non-indexed).
- Binary search: when = is the comparison operator and the file is ordered on that attribute (non-indexed).
- Primary index for an equality test.
- Primary index for a range (or other comparison) test.
- Clustering index to retrieve multiple records.

SELECT (complex select)
Uses conjunctions or disjunctions of simple conditions. For conjunctions:
- Use a simple method on one condition, then check the remaining simple conditions on each retrieved record.
- Use a composite index or composite hash key.
- Intersect record pointers, when there are secondary indexes on several of the attributes.
Note: here an access path means an index.

Complex SELECT
The optimizer should choose the access path that retrieves the fewest records in the most efficient way. The selectivity $S$ of an access path is
$S = r_{\text{satisfying}} / r_{\text{total}}$
the fraction of the total tuples that satisfy the condition, so $0 \le S \le 1$.
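The external sort formulas can be checked numerically. This sketch assumes the general cost $2b + 2b\lceil\log_{d_M} n_R\rceil$ block accesses; the function names are illustrative. The number of passes is computed with integer arithmetic (each pass turns $n$ runs into $\lceil n / d_M \rceil$ runs), which equals $\lceil \log_{d_M} n_R \rceil$ without floating-point rounding trouble.

```python
def num_runs(b, n_B):
    """n_R = ceil(b / n_B): number of initial sorted runs."""
    return (b + n_B - 1) // n_B


def degree_of_merge(n_B, n_R):
    """d_M = min(n_B - 1, n_R): runs merged per pass (one buffer block is for output)."""
    return min(n_B - 1, n_R)


def merge_passes(n_R, d_M):
    """ceil(log_{d_M}(n_R)), computed by simulating the passes with integers."""
    passes = 0
    while n_R > 1:
        n_R = (n_R + d_M - 1) // d_M
        passes += 1
    return passes


def sort_merge_cost(b, n_B):
    """Block accesses: 2*b for the sort phase plus 2*b for every merge pass."""
    if n_B < 3:
        raise ValueError("need at least 3 buffer blocks")
    n_R = num_runs(b, n_B)
    d_M = degree_of_merge(n_B, n_R)
    return 2 * b + 2 * b * merge_passes(n_R, d_M)
```

For the slide's example, `num_runs(64000, 100)` is 640, `degree_of_merge(100, 640)` is 99, `merge_passes(640, 99)` is 2, and `sort_merge_cost(64000, 100)` is 128000 + 256000 = 384000 block accesses.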

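A conjunctive selection that intersects record pointers from several secondary indexes, together with the selectivity $S$, might look like this. It is a sketch, assuming hypothetical in-memory indexes that map each attribute value to a set of record ids and equality-only conditions; conditions are applied in order of increasing number of matching records, so the access path that retrieves the fewest records is used first.

```python
def build_index(records, attr):
    """Hypothetical secondary index: value -> set of record ids (positions)."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(rid)
    return index


def selectivity(index, value, total):
    """S = (records satisfying attr = value) / (total records)."""
    return len(index.get(value, ())) / total


def conjunctive_select(records, indexes, conditions):
    """attr1 = v1 AND attr2 = v2 ...: intersect record pointers, fewest matches first."""
    if not conditions:
        return list(records)
    ordered = sorted(conditions.items(), key=lambda kv: len(indexes[kv[0]].get(kv[1], ())))
    rids = None
    for attr, value in ordered:
        matches = indexes[attr].get(value, set())
        rids = set(matches) if rids is None else rids & matches
        if not rids:  # empty intersection: no record can satisfy the conjunction
            break
    return [records[r] for r in sorted(rids)]


# Hypothetical sample data.
records = [
    {"Lname": "Smith", "Dno": 5, "Sex": "M"},
    {"Lname": "Wong", "Dno": 5, "Sex": "M"},
    {"Lname": "Zelaya", "Dno": 4, "Sex": "F"},
    {"Lname": "Wallace", "Dno": 4, "Sex": "F"},
    {"Lname": "Narayan", "Dno": 5, "Sex": "M"},
    {"Lname": "English", "Dno": 5, "Sex": "F"},
]
indexes = {"Dno": build_index(records, "Dno"), "Sex": build_index(records, "Sex")}
```

Here `Sex = 'F'` has selectivity 3/6 and `Dno = 5` has 4/6, so the `Sex` index is probed first, and `conjunctive_select(records, indexes, {"Dno": 5, "Sex": "F"})` returns only the English record.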
SELECT: disjunctions
SELECT ... FROM ... WHERE a < x OR b < 6;
The result is the UNION of the records satisfying each condition. This is limited by attributes without indexes: the query can only be optimized if every disjunct has an index; otherwise a linear search is needed anyway.

JOIN algorithms
Equijoin or natural join: $R \bowtie_{a=b} S$. Algorithms:
- Nested-loop join: brute force; for each record of R, scan S for matches.
- Single-loop join: use an index on one relation to search for matching records.
- Sort-merge join: requires both relations to be sorted on the join attribute; very efficient when the files are physically ordered on the join attributes.

JOIN algorithms (continued)
Hash join: the records of both relations are hashed to the same hash table with the same hash function. The smaller of the two is hashed first; the second phase is the probing phase, in which records of the other relation are hashed and matched against the table.

JOIN analysis
Buffer space: the join requires blocks for both tables and 1 block for the join result.
Join selection factor: the fraction of a table's records that will be joined with records of the other table. For the single-loop join, use the table with the highest join selection factor in the outer loop.

JOIN analysis (continued)
- Sort-merge join: linear if the relations are already sorted, $O(n \log n)$ if they are not.
- Partition-hash join: use the same hash function for both relations. If the hash table fits in main memory, it is very fast; otherwise it is more complicated.

PROJECT algorithms
If the projection list includes a key of R, the result has the same number of tuples as R. If it does not, duplicates may need to be removed by sorting or hashing. Remember that SQL queries do not remove duplicates unless the DISTINCT keyword is used.
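Three of the equijoin algorithms above can be compared side by side on small in-memory relations. This is a sketch, assuming relations are lists of Python tuples and `a`, `b` are the positions of the join attributes; the sample EMPLOYEE (Lname, Dno) and DEPARTMENT (Dnumber, Dname) rows are hypothetical, and the single-loop (index) join is omitted. All three should return the same set of (r, s) pairs, though in different orders.

```python
from collections import defaultdict


def nested_loop_join(R, S, a, b):
    """Brute force: compare every tuple of R with every tuple of S."""
    return [(r, s) for r in R for s in S if r[a] == s[b]]


def sort_merge_join(R, S, a, b):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    R = sorted(R, key=lambda t: t[a])
    S = sorted(S, key=lambda t: t[b])
    i = j = 0
    out = []
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1
        elif R[i][a] > S[j][b]:
            j += 1
        else:
            key = R[i][a]
            j_end = j  # find the group of S tuples sharing this key
            while j_end < len(S) and S[j_end][b] == key:
                j_end += 1
            while i < len(R) and R[i][a] == key:  # pair each matching R tuple with the group
                out.extend((R[i], s) for s in S[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(R, S, a, b):
    """Hash the smaller input into a table (build), then probe it with the other."""
    build_r = len(R) <= len(S)
    build, probe = (R, S) if build_r else (S, R)
    build_key, probe_key = (a, b) if build_r else (b, a)
    table = defaultdict(list)
    for t in build:  # partitioning (build) phase
        table[t[build_key]].append(t)
    out = []
    for t in probe:  # probing phase; .get does not insert into the defaultdict
        for m in table.get(t[probe_key], []):
            out.append((m, t) if build_r else (t, m))
    return out


EMPLOYEE = [("Smith", 5), ("Wong", 5), ("Wallace", 4), ("Borg", 1), ("Zelaya", 4)]
DEPARTMENT = [(5, "Research"), (4, "Administration"), (1, "Headquarters")]
```

Joining `EMPLOYEE` on position 1 with `DEPARTMENT` on position 0 gives five pairs under every algorithm. The hash join builds on `DEPARTMENT` here because it is the smaller input, matching the slide's advice to hash the smaller relation first.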

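The PROJECT rules can be illustrated with both duplicate-elimination strategies the slide mentions. This sketch assumes relations are lists of tuples and `positions` selects the projected attributes; `distinct=False` mirrors SQL without DISTINCT, which keeps duplicates.

```python
def project(relation, positions, distinct=False):
    """pi over tuple positions; duplicates are kept unless distinct=True."""
    rows = [tuple(t[p] for p in positions) for t in relation]
    if not distinct:
        return rows
    seen = set()  # hash-based duplicate elimination, keeping first occurrences
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def project_distinct_by_sort(relation, positions):
    """Sort-based duplicate elimination: equal rows become adjacent after sorting."""
    rows = sorted(tuple(t[p] for p in positions) for t in relation)
    return [row for k, row in enumerate(rows) if k == 0 or row != rows[k - 1]]


# Hypothetical EMPLOYEE rows: (Lname, Dno, Salary); Lname acts as the key here.
R = [("Smith", 5, 30000), ("Wong", 5, 40000), ("Wallace", 4, 43000)]
```

Projecting `R` on the key position 0 keeps all three tuples, while projecting on `Dno` (position 1) yields `[(5,), (5,), (4,)]` without DISTINCT and two tuples with it.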
Set algorithms
UNION, INTERSECTION, SET DIFFERENCE, CARTESIAN PRODUCT.
- Avoid the Cartesian product like the plague!
- The others require union-compatible relations.
Use sort-merge algorithms; hashing with partitioning and probing also works well.

Set operations using sort-merge
- UNION: sort both tables and merge them simultaneously.
- INTERSECTION: sort both tables; keep a tuple in the merge only if it is found in both tables.
- DIFFERENCE: sort both tables; keep a tuple in the merge only if it is in the first table but not in the second.

Set operations using hashing
- UNION: hash R, then hash S; on a match, don't add the tuple again.
- INTERSECTION: hash R, then hash S; on a match, copy the tuple to the result set.
- DIFFERENCE: hash R, then hash S; on a match, mark the record invalid (but keep it in the hash table).

Aggregate operations
Use a table scan or an index.
- MIN and MAX are good index operations.
- COUNT, AVERAGE, and SUM can only use an index if it is dense, since every matching record must be counted.
- COUNT DISTINCT can be used with a sparse index.

GROUP BY
Partition the records using sorting or hashing, then apply the function to the records in each group. A clustering index already provides this partitioning, so GROUP BY operations are easy to perform if the table has a clustering index on the grouping attribute.

OUTER JOIN algorithm
Either modify the join operations, or use relational algebra operations (shown for a left outer join of R and S):
1. Inner join R and S.
2. Find the tuples of R that are not in the join result.
3. Pad those tuples with nulls for the attributes of S.
4. Take the union of the results of steps 1 and 3.
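The hashing recipes for the set operations can be sketched with Python dictionaries standing in for the hash table. This is an in-memory illustration, assuming union-compatible relations given as lists of tuples; a real system would partition large inputs to disk first. The difference follows the slide literally: matching R records are marked invalid but stay in the table.

```python
def hash_union(R, S):
    """Hash R, then hash S; a tuple of S that matches is not added again."""
    table = dict.fromkeys(R)  # insertion-ordered hash table; drops duplicates within R
    for t in S:
        if t not in table:
            table[t] = None
    return list(table)


def hash_intersection(R, S):
    """Hash R, then hash S; on a match, copy the tuple to the result (once)."""
    table = dict.fromkeys(R)
    result = {}
    for t in S:
        if t in table:
            result[t] = None
    return list(result)


def hash_difference(R, S):
    """R - S: hash R, then hash S; on a match, mark the R tuple invalid but keep it."""
    valid = dict.fromkeys(R, True)
    for t in S:
        if t in valid:
            valid[t] = False
    return [t for t, ok in valid.items() if ok]


# Hypothetical union-compatible relations with a single attribute (Lname).
R = [("Smith",), ("Wong",), ("Wallace",)]
S = [("Wong",), ("Wallace",), ("Borg",)]
```

With these inputs, the union has four tuples, the intersection is Wong and Wallace, and the difference R − S is Smith alone.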

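The four relational algebra steps for the outer join can be followed directly. This sketch assumes relations are lists of tuples, `a` and `b` are the join attribute positions, `s_width` is the number of attributes of S, and Python's `None` stands for NULL; the sample rows are hypothetical. The inner join uses a nested loop only for brevity.

```python
def left_outer_join(R, S, a, b, s_width):
    """R left outer join S on R[a] = S[b], built from four relational algebra steps."""
    # 1. Inner join R and S (a NULL join value never matches in SQL).
    inner = [r + s for r in R for s in S if r[a] is not None and r[a] == s[b]]
    # 2. Find the R tuples not in the join result: R minus pi_{R attributes}(inner).
    r_width = len(R[0]) if R else 0
    matched = {row[:r_width] for row in inner}
    unmatched = [r for r in R if r not in matched]
    # 3. Pad those tuples with NULLs for the attributes of S.
    padded = [r + (None,) * s_width for r in unmatched]
    # 4. Union of steps 1 and 3; the parts cannot overlap, so concatenation suffices.
    return inner + padded


# Hypothetical EMPLOYEE (Lname, Dno) and DEPARTMENT (Dnumber, Dname) rows.
EMPLOYEE = [("Smith", 5), ("Wallace", 4), ("Borg", 1)]
DEPARTMENT = [(5, "Research"), (4, "Administration")]
result = left_outer_join(EMPLOYEE, DEPARTMENT, 1, 0, 2)
```

Borg has no matching department, so step 3 pads his tuple to `("Borg", 1, None, None)`, while Smith and Wallace come from the inner join.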
How to do all this quickly
Pipelining (stream-based processing): don't write intermediate results out to disk. For example:
SELECT Lname FROM EMPLOYEE E, WORKS_ON W WHERE E.Ssn = W.Ssn AND W.Pno = 4 AND E.Dno = 4;
The results of the selections are fed directly into the join, and the join results directly into the projection, rather than creating 4 temporary files.

Query tree
The data structure used to hold a relational algebra (RA) or extended RA expression. Relations are leaf nodes; relational algebra operations are internal nodes. The initial tree generated by the parser is not the best one, so heuristics are used to optimize these trees.

Canonical form of a query tree
- The top node is a PROJECT (π).
- The next node is a SELECT (σ).
- The leaf relations are combined by Cartesian products into one relation (with all attributes and all tuples), which is connected to the big σ.
This tree is VERY EXPENSIVE to execute! Still, the canonical form is a good place to start optimization. No heuristic may change the result of the query: every transformation must produce an equivalent tree.

Example
$\pi_{\text{Pnum, Dnum, Lname, Addr, Bdate}}\big(((\sigma_{\text{Ploc}=\text{'Stafford'}}(\text{PROJECT})) \bowtie_{\text{Dnum}=\text{Dnum}} \text{DEPARTMENT}) \bowtie_{\text{Mgrssn}=\text{Ssn}} \text{EMPLOYEE}\big)$
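A query tree evaluated with pipelining can be sketched with Python iterators: each internal node pulls rows from its children on demand, so the selections feed the join and the join feeds the projection without writing temporary results. This is an illustrative sketch, not any DBMS's executor: the node classes and sample rows are hypothetical, the tree already has the selections below the join, and the join node buffers its right input in an in-memory hash table while everything else streams.

```python
class Relation:
    """Leaf node of the query tree: a stored relation (an in-memory list here)."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        return iter(self.rows)


class Select:
    """sigma node: streams child rows through a predicate."""

    def __init__(self, predicate, child):
        self.predicate, self.child = predicate, child

    def __iter__(self):
        return (row for row in self.child if self.predicate(row))


class Join:
    """Equijoin node: buffers the right input in a hash table, streams the left."""

    def __init__(self, left, right, left_key, right_key):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key

    def __iter__(self):
        table = {}
        for right_row in self.right:
            table.setdefault(right_row[self.right_key], []).append(right_row)
        for left_row in self.left:
            for right_row in table.get(left_row[self.left_key], []):
                yield {**left_row, **right_row}


class Project:
    """pi node: keeps only the listed attributes (no duplicate elimination, like SQL)."""

    def __init__(self, attrs, child):
        self.attrs, self.child = attrs, child

    def __iter__(self):
        return ({a: row[a] for a in self.attrs} for row in self.child)


# Hypothetical rows for the slide's query.
EMPLOYEE = Relation([
    {"E.Lname": "Smith", "E.Ssn": "111", "E.Dno": 5},
    {"E.Lname": "Wallace", "E.Ssn": "222", "E.Dno": 4},
    {"E.Lname": "Zelaya", "E.Ssn": "333", "E.Dno": 4},
])
WORKS_ON = Relation([
    {"W.Ssn": "111", "W.Pno": 4},
    {"W.Ssn": "222", "W.Pno": 4},
    {"W.Ssn": "333", "W.Pno": 10},
])

# pi_{Lname}( sigma_{Dno=4}(E) join_{E.Ssn=W.Ssn} sigma_{Pno=4}(W) )
plan = Project(
    ["E.Lname"],
    Join(
        Select(lambda e: e["E.Dno"] == 4, EMPLOYEE),
        Select(lambda w: w["W.Pno"] == 4, WORKS_ON),
        "E.Ssn",
        "W.Ssn",
    ),
)
result = list(plan)
```

Only Wallace is in department 4 and works on project 4, so `result` holds a single row with `E.Lname` equal to "Wallace". Nothing is computed until `list(plan)` pulls rows through the tree.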