Query Processing and Query Optimization. Prof Monika Shah

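Query Processing
Stages of processing an SQL query, in order:
- SQL query arrives; check whether it is already in the library cache
- Scan and verify relations, using the system catalog (dictionary / dictionary cache)
- Parse into a parse tree (relational calculus)
- View unfolding, using the view definitions
- Query transformations into alternative relational algebra expressions
- Query optimizer, using statistics and index information, produces an execution plan
- Query evaluation over the data and indices produces the query result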

Query Optimization
- Cost-based query optimization (recommended)
- Rule-based query optimization (for backward compatibility with legacy applications)

Cost-Based Query Optimization
Implementation of single relational operations: choices depend on indexes, memory, statistics, ...
Joins:
- Blocked nested loops: simple, exploits extra memory
- Indexed nested loops: best if one relation is small and one is indexed
- Sort-merge join: good with a small amount of memory, bad with duplicates
- Hash join: fast (given enough memory), bad with skewed data
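To make two of the join methods listed above concrete, here is a minimal in-memory sketch in Python. It ignores pages, buffers and I/O entirely, and the sample rows are made up for illustration; it only shows that a nested loops join and a hash join compute the same result by different means.

```python
from collections import defaultdict

def nested_loops_join(outer, inner, key):
    """Compare every outer tuple with every inner tuple."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(build, probe, key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    # table.get avoids inserting empty buckets for keys with no match
    return [(b, p) for p in probe for b in table.get(p[key], [])]

# Made-up sample data (not from the slides)
sailors = [{"sid": 1, "sname": "A"}, {"sid": 2, "sname": "B"}]
reserves = [{"sid": 1, "bid": 100}, {"sid": 1, "bid": 101}, {"sid": 2, "bid": 100}]

nl_result = nested_loops_join(sailors, reserves, "sid")
hj_result = hash_join(sailors, reserves, "sid")
print(len(nl_result), nl_result == hj_result)
```

The nested loops version does |outer| * |inner| comparisons, while the hash join touches each input once, which is why it is fast when the build side fits in memory.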

Cost-Based Query Optimization (contd.)
- A query can be converted to relational algebra
- The relational algebra expression is converted to a tree, with joins as branches
- Each operator has implementation choices
- Operators can also be applied in different orders!
Example:
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Relational algebra: π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors))
Tree, top to bottom: π_sname; σ_{bid=100 ∧ rating>5}; ⋈_{sid=sid}; leaves Reserves and Sailors

Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
As seen in previous lectures:
- Reserves: each tuple is 40 bytes long, 100 tuples per page, 1000 pages. Assume there are 100 boats.
- Sailors: each tuple is 50 bytes long, 80 tuples per page, 500 pages. Assume there are 10 different ratings.
- Assume we have 5 pages in our buffer pool!

Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Plan: π_sname(σ_{bid=100 ∧ rating>5}(Sailors ⋈_{sid=sid} Reserves)), with the join done by page-oriented nested loops
Cost: 500+500*1000 I/Os
- By no means the worst plan!
- Misses several opportunities: selections could have been "pushed" earlier, no use is made of any available indexes, etc.
Goal of optimization: to find more efficient plans that compute the same answer.
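The cost figure on this slide can be reproduced with a few lines of Python. This is a sketch of the usual page-oriented nested loops formula (read the outer once, and scan the whole inner once per outer page); the function name is an illustrative choice, and the page counts come from the example schema.

```python
def page_nested_loops_cost(outer_pages, inner_pages):
    """I/O cost: scan the outer once, scan the inner once per outer page."""
    return outer_pages + outer_pages * inner_pages

SAILORS_PAGES = 500     # from the example schema
RESERVES_PAGES = 1000   # from the example schema

sailors_outer = page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES)
reserves_outer = page_nested_loops_cost(RESERVES_PAGES, SAILORS_PAGES)
print(sailors_outer)   # 500500, the cost quoted on the slide
print(reserves_outer)  # 501000, slightly worse with the larger relation outside
```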

Alternative Plans: Push Selects (No Indexes)
All joins are on sid=sid and use page-oriented nested loops; the outer relation is written first.
- π_sname(σ_{bid=100 ∧ rating>5}(Sailors ⋈ Reserves)): 500,500 I/Os
- π_sname(σ_{bid=100}(σ_{rating>5}(Sailors) ⋈ Reserves)): 250,500 I/Os

Alternative Plans: Push Selects (No Indexes)
All joins are on sid=sid and use page-oriented nested loops; the outer relation is written first.
- π_sname(σ_{bid=100}(σ_{rating>5}(Sailors) ⋈ Reserves)): 250,500 I/Os
- π_sname(σ_{rating>5}(Sailors) ⋈ σ_{bid=100}(Reserves)): 250,500 I/Os

Alternative Plans: Push Selects (No Indexes)
All joins are on sid=sid and use page-oriented nested loops; the outer relation is written first.
- π_sname(σ_{rating>5}(Sailors) ⋈ σ_{bid=100}(Reserves)): 250,500 I/Os
- π_sname(σ_{bid=100}(Reserves) ⋈ σ_{rating>5}(Sailors)): 6000 I/Os

Alternative Plans: Push Selects (No Indexes)
All joins are on sid=sid and use page-oriented nested loops; the outer relation is written first.
- π_sname(σ_{bid=100}(Reserves) ⋈ σ_{rating>5}(Sailors)): 6000 I/Os
- π_sname(σ_{bid=100}(Reserves) ⋈ T2), where T2 = σ_{rating>5}(Sailors) (scan and write to temp T2): 4250 I/Os = 1000 + 500 + 250 + (10 * 250)

Alternative Plans: Push Selects (No Indexes)
All joins are on sid=sid and use page-oriented nested loops; the outer relation is written first.
- π_sname(σ_{bid=100}(Reserves) ⋈ T2), where T2 = σ_{rating>5}(Sailors) (scan and write to temp T2): 4250 I/Os
- π_sname(σ_{rating>5}(Sailors) ⋈ T2), where T2 = σ_{bid=100}(Reserves) (scan and write to temp T2): 4010 I/Os = 500 + 1000 + 10 + (250 * 10)
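The I/O counts quoted across the "Push Selects" slides can be recomputed as follows. This sketch assumes uniform distributions (so bid=100 keeps 1/100 of Reserves and rating>5 keeps half of Sailors) and assumes which relation is the outer one in each plan, inferred from the cost figures; the plan labels are illustrative.

```python
RESERVES_PAGES = 1000
SAILORS_PAGES = 500
NUM_BOATS = 100

# Pages that survive each selection, assuming uniform distributions
bid_pages = RESERVES_PAGES // NUM_BOATS   # bid=100  -> 10 pages
rating_pages = SAILORS_PAGES // 2         # rating>5 -> 250 pages (5 of 10 ratings)

plan_costs = {
    # scan Sailors, then scan all of Reserves for every Sailors page
    "no pushdown, Sailors outer":
        SAILORS_PAGES + SAILORS_PAGES * RESERVES_PAGES,
    # only the 250 qualifying Sailors pages drive scans of Reserves
    "rating>5 pushed, Sailors outer":
        SAILORS_PAGES + rating_pages * RESERVES_PAGES,
    # only the 10 qualifying Reserves pages drive scans of Sailors
    "both pushed, Reserves outer":
        RESERVES_PAGES + bid_pages * SAILORS_PAGES,
    # write the Sailors selection to a temp, then rescan the 250-page temp
    "Reserves outer, Sailors selection materialized":
        RESERVES_PAGES + SAILORS_PAGES + rating_pages + bid_pages * rating_pages,
    # write the Reserves selection to a temp, then rescan the 10-page temp
    "Sailors outer, Reserves selection materialized":
        SAILORS_PAGES + RESERVES_PAGES + bid_pages + rating_pages * bid_pages,
}

for plan, cost in plan_costs.items():
    print(f"{cost:>7}  {plan}")
```

Pushing a selection below the join only pays off here once the selected relation either drives the loop as the outer or is materialized, since otherwise the inner relation is still scanned in full on every iteration.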

More Alternative Plans (No Indexes)
Main difference: sort-merge join.
Plan: π_sname(T1 ⋈_{sid=sid} T2) using sort-merge join, where T1 = σ_{bid=100}(Reserves) (scan; write to temp T1) and T2 = σ_{rating>5}(Sailors) (scan; write to temp T2).
With 5 buffers, cost of plan:
- Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution) = 1010.
- Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings) = 750.
- Sort T1 (2*2*10) + sort T2 (2*4*250) + merge (10+250) = 2300.
- Total: 4060 page I/Os.
If we use BNL join, join cost = 10+4*250, total cost = 2770.
Can also "push" projections, but must be careful! T1 has only sid, T2 only sid and sname: T1 fits in 3 pages, cost of BNL is under 250 pages, total < 2000.
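The pass counts 2 and 4 in the sort costs follow from the standard external merge sort with B buffers: pass 0 produces ceil(N/B) runs, and each later pass merges B-1 runs at a time. The sketch below recomputes the slide's numbers under that assumption; the function names are illustrative.

```python
import math

def external_sort_passes(pages, buffers):
    """Passes of external merge sort: pass 0 makes runs, then (buffers-1)-way merges."""
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes

def external_sort_cost(pages, buffers):
    """Each pass reads and writes every page once."""
    return 2 * pages * external_sort_passes(pages, buffers)

BUFFERS = 5
T1_PAGES, T2_PAGES = 10, 250           # temp sizes from the slide

select_cost = (1000 + T1_PAGES) + (500 + T2_PAGES)            # 1010 + 750
smj_cost = (external_sort_cost(T1_PAGES, BUFFERS)             # 2*2*10
            + external_sort_cost(T2_PAGES, BUFFERS)           # 2*4*250
            + T1_PAGES + T2_PAGES)                            # merge
# Block nested loops: outer T1 read in blocks of (BUFFERS - 2) pages
bnl_cost = T1_PAGES + math.ceil(T1_PAGES / (BUFFERS - 2)) * T2_PAGES

print(select_cost + smj_cost)  # 4060
print(select_cost + bnl_cost)  # 2770
```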

More Alternative Plans: Indexes
Plan: π_sname(σ_{rating>5}(σ_{bid=100}(Reserves) ⋈_{sid=sid} Sailors)), using index nested loops with pipelining. The selection bid=100 uses the hash index and its result is not written to a temp.
- With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
- INL with outer not materialized. Projecting out unnecessary fields from the outer doesn't help.
- Join column sid is a key for Sailors: at most one matching tuple, so an unclustered index on sid is OK.
- The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
- Cost: selection of Reserves tuples (10 I/Os); then, for each, must get the matching Sailors tuple (1000*1.2); total 1210 I/Os.
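The 1210 I/O estimate can be retraced step by step. The sketch below uses the slide's figures; the value 1.2 is taken as the average I/O cost of one hash-index probe on Sailors, as the slide's arithmetic implies, and the variable names are illustrative.

```python
RESERVES_TUPLES = 100_000
NUM_BOATS = 100
TUPLES_PER_PAGE = 100       # Reserves tuples per page
PROBE_IO = 1.2              # assumed average I/Os per index probe on Sailors.sid

matching_tuples = RESERVES_TUPLES // NUM_BOATS          # 1000 tuples with bid=100
selection_io = matching_tuples // TUPLES_PER_PAGE       # 10 pages via clustered index
probe_io = round(matching_tuples * PROBE_IO)            # one probe per outer tuple

total_io = selection_io + probe_io
print(total_io)  # 1210
```

The join cost is driven by the number of outer tuples rather than pages, which is why keeping the outer small (1000 tuples) matters more than projecting away fields.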

Cost-Based Query Optimization: Summary
- Find alternative plans
- Estimate the cost of each alternative plan
- Choose the query plan with the least cost
Disadvantage: cost estimation is expensive when a large number of alternative plans is generated.
- For example, find the best join order for r1 ⋈ r2 ⋈ ... ⋈ rn.
- There are (2(n-1))!/(n-1)! different join orders for the above expression.
- For n = 7 the number is 665,280; for n = 10 it is about 17.6 billion!
Solution: no need to generate all the join orders. Use dynamic programming to find the least-cost join order.
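Both halves of this slide can be illustrated in Python. The first function evaluates the join-order count; the second is a minimal sketch of dynamic programming over subsets of relations. The cost model (sum of intermediate result sizes), the cardinalities and the selectivities are all assumptions made up for the example, not part of the slides.

```python
from itertools import combinations
from math import factorial

def join_orders(n):
    """Number of join orders for n relations: (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)

print(join_orders(7), join_orders(10))

def best_join_order(cards, sel):
    """cards: {relation: rows}; sel: {frozenset({a, b}): join selectivity}.
    Toy cost = total rows of all intermediate results.
    Returns (cost, rows, plan) for the cheapest plan found."""
    names = sorted(cards)
    best = {frozenset([r]): (0, cards[r], r) for r in names}
    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    l_cost, l_rows, l_plan = best[left]
                    r_cost, r_rows, r_plan = best[right]
                    s = 1.0   # no predicate between a pair means a cross product
                    for a in left:
                        for b in right:
                            s *= sel.get(frozenset((a, b)), 1.0)
                    rows = l_rows * r_rows * s
                    cost = l_cost + r_cost + rows
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, rows, (l_plan, r_plan))
    return best[frozenset(names)]

# Hypothetical statistics for three relations A, B, C
cards = {"A": 1000, "B": 10, "C": 100}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, rows, plan = best_join_order(cards, sel)
print(cost, rows, plan)
```

Each subset's best plan is computed once and reused by every larger subset containing it, so the work grows with the number of subsets rather than with the number of complete join orders.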

Materialization
- Create and read temporary relations
- Create implies writing to disk: more page writes
Example plan: π_name(σ_{coursename=advanced DBs}((student ⋈_{cid; hash join} takes) ⋈_{courseid; index-nested loop} course))

Pipelining (1/2)
- Creating a pipeline of operations reduces the number of read-write operations
- Implementations:
  - demand-driven (data pull)
  - producer-driven (data push)
Example plan: π_name(σ_{coursename=advanced DBs}((student ⋈_{cid; hash join} takes) ⋈_{courseid; index-nested loop} course))
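Demand-driven (pull) pipelining maps naturally onto Python generators: each operator asks its child for the next row only when its own consumer asks for one, and no temporary relation is built. The operators and rows below are a made-up sketch, with a log added only to show when rows are actually read.

```python
def scan(rows, log):
    """Leaf operator: hands out one row at a time and records each read."""
    for row in rows:
        log.append(row["courseid"])
        yield row

def select(pred, child):
    """Passes on only the rows that satisfy the predicate."""
    for row in child:
        if pred(row):
            yield row

def project(cols, child):
    """Keeps only the requested columns."""
    for row in child:
        yield {c: row[c] for c in cols}

def build_pipeline(rows, log):
    return project(
        ["courseid"],
        select(lambda r: r["coursename"] == "advanced DBs", scan(rows, log)),
    )

# Made-up rows for illustration
courses = [
    {"courseid": 1, "coursename": "advanced DBs"},
    {"courseid": 2, "coursename": "compilers"},
]

log = []
pipeline = build_pipeline(courses, log)
print(log)             # nothing has been read yet
first = next(pipeline)
print(first, log)      # one result pulled, one row read
rest = list(pipeline)
print(rest, log)       # draining the pipeline reads the remaining row
```

A producer-driven (push) design would invert the control flow, with the scan calling into its consumer for every row it produces.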

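Pipelining (2/2)
- Can pipelining always be used? With any algorithm?
- Cost of R ⋈ S:
  - materialization and hash join: B_R + 3(B_R + B_S)
  - pipelining and indexed nested loop join: N_R * HT_i
Figure: the example plan over student, takes and course (joins on cid and courseid, selection σ_{coursename=advanced DBs}), with the join inputs labelled R and S and the annotations "pipelined" and "materialized".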

Heuristic Optimization
- Cost-based optimization is expensive, even with dynamic programming.
- Solution: reduce the search space using (1) a randomized algorithm (iterative improvement) or (2) heuristic optimization.
- Goal: reduce the size of intermediate results.
- Heuristics reduce the number of choices.
Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance:
- Perform selection early (reduces the number of tuples)
- Perform projection early (reduces the number of attributes)
- Perform the most restrictive selection and join operations before other similar operations
- And many others
Some systems use only heuristics; others combine heuristics with partial cost-based optimization.
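The effect of the "perform selection early" rule can be seen on a small in-memory example. The sketch below uses made-up Sailors and Reserves rows and a naive nested loops join; it checks that selecting before or after the join gives the same answer while the early version joins far fewer tuples.

```python
def join(left, right, key):
    """Naive nested loops join on a common key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Made-up data: 20 sailors and 20 reservations
sailors = [{"sid": s, "rating": s % 10 + 1} for s in range(1, 21)]
reserves = [{"sid": s, "bid": 100 + s % 4} for s in range(1, 21)]

# Selection late: join everything, then filter
joined_all = join(reserves, sailors, "sid")
late = [t for t in joined_all if t["bid"] == 100 and t["rating"] > 5]

# Selection early: filter each input, then join
reserves_sel = [r for r in reserves if r["bid"] == 100]
sailors_sel = [s for s in sailors if s["rating"] > 5]
early = join(reserves_sel, sailors_sel, "sid")

print(late == early)                          # same answer either way
print(len(reserves) * len(sailors))           # comparisons with late selection
print(len(reserves_sel) * len(sailors_sel))   # comparisons with early selection
```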

Heuristic Optimization (Example)

Rule/Hint Based Query Optimization
Oracle allows hints to be embedded in SQL statements to guide the optimizer towards making more efficient choices.
Syntax: SELECT /*+ hint */ cola, colb, ... FROM tab1, tab2, ...
Here /* and */ normally delimit a comment; the + sign causes the comment to be treated as a hint.
Values for hint include:
- ALL_ROWS: optimize the query for best throughput (lowest resource utilization). CBO approach irrespective of the presence of statistics.
- FIRST_ROWS(n): optimize for fastest response time. CBO approach irrespective of the presence of statistics.
- CHOOSE: the optimizer chooses either rule-based or cost-based. If statistics are available (via the ANALYZE TABLE command), cost-based is chosen; otherwise rule-based is chosen.
- RULE: force the use of the rule-based optimizer.

Rule/Hint Based Query Optimization (contd.)
Other hints in Oracle, for every possible step within execution plans:
- Global hints: rule, first_rows, first_rows_n, all_rows, driving_site
- Table join hints: use_nl, use_hash, ordered
- Index hints: specify an index name
- Table access hints: parallel, full, cardinality
The system ignores irrelevant hints, e.g.:
- Specifying an index hint on a table that has no indexes
- Specifying a parallel hint for an index range scan
- Specifying mutually exclusive hints together (like index and parallel both)

Rule/Hint Based Query Optimization (contd.)
FIRST_ROWS(n)
Used when: users are typically interested in seeing only the first few rows.
This hint is ignored for DELETE and UPDATE statements, and for SELECT statements containing any of the following:
- Set operators (UNION, INTERSECT, MINUS, UNION ALL)
- GROUP BY clause
- FOR UPDATE clause
- Aggregate functions
- DISTINCT operator
- ORDER BY clauses, when there is no index on the ordering columns
Example: best response time to retrieve the first 10 rows
SELECT /*+ FIRST_ROWS(10) */ employee_id, last_name, salary, job_id
FROM employees
WHERE department_id = 20;

Rule/Hint Based Query Optimization (contd.)
Hint FULL
- Ignores indexes
- Blocks are read sequentially
- Reads can use I/O calls larger than a single block, which speeds up a full table scan
Used when: the table is small, or users are typically interested in seeing only the first few rows.
Example (ignore the index on last_name):
SELECT /*+ FULL(e) */ employee_id, last_name
FROM employees e
WHERE last_name LIKE :b1
A full table scan is applied by default when a function is used on an indexed column in the WHERE clause, e.g. with an index on last_name:
SELECT last_name, first_name
FROM employees
WHERE UPPER(last_name) LIKE :b1

Rule/Hint Based Query Optimization (contd.)
Hints in MS SQL Server: query hints can be added using the OPTION clause at the end of the statement.
Syntax:
SELECT select_list
FROM table_source
WHERE search_condition
GROUP BY group_by_expression
HAVING search_condition
ORDER BY order_expression
OPTION (query options)
where the query options include:
- {HASH | ORDER} GROUP: use hashing or ordering in the GROUP BY or COMPUTE
- {MERGE | HASH | CONCAT} UNION: use merging, hashing or concatenating in UNION. If more than one hint is specified, the query optimizer selects the least expensive strategy.
- {LOOP | MERGE | HASH} JOIN: use the specified join in the whole query. If more than one join hint is specified, the query optimizer selects the least expensive.
- FORCE ORDER: specifies that the join order indicated by the query syntax is preserved during query optimization.

Distributed Query Processing Methodology
Input: calculus query on distributed relations
At the control site:
- Query decomposition (uses the global schema) produces an algebraic query on distributed relations
- Data localization (uses the fragment schema) produces a fragment query
- Global optimization (uses statistics on fragments) produces an optimized fragment query with communication operations
At the local sites:
- Local optimization (uses the local schemas) produces optimized local queries

MDB Query Processing Architecture
- Global/local correspondences
- Allocation and capabilities
- Local/DBMS mappings

Distributed Query Optimization

INGRES Algorithm
1. Decompose each multi-variable query into a sequence of mono-variable queries with a common variable
2. Process each by a one-variable query processor
- Choose an initial execution plan (heuristics)
- Order the rest by considering intermediate relation sizes
- Apply tuple substitution to integrate query q_{i-1} to q_i
No statistical information is maintained.

System R* Algorithm
1. Simple (i.e., mono-relation) queries are executed according to the best access path.
2. Execute joins:
- Determine the possible orderings of joins
- Determine the cost of each ordering
- Choose the join ordering with minimal cost
- Ship whole / fetch as needed (semijoin)

Join Ordering
- Ordering joins: Distributed INGRES, System R*
- Semijoin ordering: SDD-1
- Better if

SDD-1
- Based on the hill climbing algorithm
- Semijoins
- No replication
- No fragmentation
- Minimize total time or response time
- Does not consider the cost of transferring data from the result site to the user site
- Ignores local processing cost