QUERY PROCESSING & OPTIMIZATION CHAPTER 19 (6/E) CHAPTER 15 (5/E)

2 LECTURE OUTLINE
- Query Processing Methodology
- Basic Operations and Their Costs
- Generation of Execution Plans

3 QUERY PROCESSING IN A DDBMS
[Figure: high-level user query -> query processor -> low-level data manipulation commands for D-DBMS]

4 SELECTING ALTERNATIVES
  SELECT ENAME
  FROM EMP, ASG
  WHERE EMP.ENO = ASG.ENO
  AND ASG.RESP = "Manager"
Strategy 1
  $\Pi_{ENAME}(\sigma_{RESP = \text{"Manager"} \wedge EMP.ENO = ASG.ENO}(EMP \times ASG))$
Strategy 2
  $\Pi_{ENAME}(EMP \bowtie_{ENO} (\sigma_{RESP = \text{"Manager"}}(ASG)))$
Strategy 2 avoids the Cartesian product, so it may be better.
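To make the difference between the two strategies concrete, the following is a minimal Python sketch that evaluates both on a tiny in-memory data set and reports the largest intermediate result each one builds. The tuples, the function names, and the assumption that ENO is unique in EMP are all invented for illustration; they do not come from the slides.

```python
# Toy relations; the contents are hypothetical.
EMP = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")]            # (ENO, ENAME)
ASG = [(1, "Manager"), (2, "Analyst"),
       (3, "Manager"), (4, "Programmer")]                        # (ENO, RESP)

def strategy_1(emp, asg):
    """Cartesian product first, then selection, then projection."""
    product = [(e, a) for e in emp for a in asg]
    selected = [(e, a) for e, a in product
                if a[1] == "Manager" and e[0] == a[0]]
    names = sorted({e[1] for e, _ in selected})
    return names, max(len(product), len(selected))

def strategy_2(emp, asg):
    """Select managers first, then join on ENO, then project."""
    managers = [a for a in asg if a[1] == "Manager"]
    emp_by_eno = {e[0]: e for e in emp}       # assumes ENO is unique in EMP
    joined = [(emp_by_eno[a[0]], a) for a in managers if a[0] in emp_by_eno]
    names = sorted({e[1] for e, _ in joined})
    return names, max(len(managers), len(joined))

names_1, size_1 = strategy_1(EMP, ASG)
names_2, size_2 = strategy_2(EMP, ASG)
print(names_1, size_1)
print(names_2, size_2)
```

Both functions return the same names, but Strategy 1 materializes every EMP/ASG pair before filtering, while Strategy 2 never holds more tuples than there are managers.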

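5 PICTORIALLY
Strategy 1 (operator tree, root first)
  $\Pi_{ENAME}$
    $\sigma_{ASG.RESP = \text{"Manager"} \wedge EMP.ENO = ASG.ENO}$
      $\times$
        EMP
        ASG
Strategy 2 (operator tree, root first)
  $\Pi_{ENAME}$
    $\bowtie_{ENO}$
      EMP
      $\sigma_{ASG.RESP = \text{"Manager"}}$
        ASG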

6 QUERY PROCESSING METHODOLOGY
Query in high-level language (SQL)
  -> Query Decomposition & Translation
       - check SQL syntax
       - check existence of relations and attributes
       - replace views by their definitions
       - transform query into an internal form
  -> Algebraic Query
  -> Query Optimization
       - generate alternative access plans, i.e., procedures, for processing the query
       - select an efficient access plan
  -> Query Execution Plan
  -> Query Code Generator
  -> Code to execute query
  -> Runtime Processor
  -> Query Result

7 EXAMPLE
  SELECT V.Vno, Vname, count(*), sum(amount)
  FROM Vendor V, Transaction T
  WHERE V.Vno = T.Vno AND V.Vno between 1000 and 2000
  GROUP BY V.Vno, Vname
  HAVING sum(amount) > 100
- Scan the Vendor table, select all tuples where Vno is in the range [1000, 2000], eliminate attributes other than Vno and Vname, and place the result in a temporary relation R1.
- Join the tables R1 and Transaction, eliminate attributes other than Vno, Vname, and Amount, and place the result in a temporary relation R2. This may involve:
  - sorting R1 on Vno
  - sorting Transaction on Vno
  - merging the two sorted relations to produce R2
- Perform grouping on R2, and place the result in a temporary relation R3. This may involve:
  - sorting R2 on Vno and Vname
  - grouping tuples with identical values of Vno and Vname
  - counting the number of tuples in each group, and adding their Amounts
- Scan R3, select all tuples with sum(amount) > 100 to produce the result.
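The four steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the sample tuples are invented, relations are plain lists of tuples, and Vno is assumed to be unique in Vendor so that the merge step needs a single cursor on R1.

```python
from itertools import groupby

# Hypothetical sample data.
VENDOR = [(900, "Acme"), (1000, "Bolt"), (1500, "Crate"), (2500, "Dyna")]  # (Vno, Vname)
TRANSACTION = [(1000, 60), (1000, 70), (1500, 40), (900, 500), (1500, 30)]  # (Vno, Amount)

def run_plan(vendor, transaction):
    # Step 1: scan Vendor, keep 1000 <= Vno <= 2000, project (Vno, Vname) -> R1
    r1 = [(vno, vname) for vno, vname in vendor if 1000 <= vno <= 2000]

    # Step 2: sort both inputs on Vno and merge them -> R2 (Vno, Vname, Amount)
    r1.sort(key=lambda t: t[0])
    trans = sorted(transaction, key=lambda t: t[0])
    r2, i = [], 0
    for vno, amount in trans:
        while i < len(r1) and r1[i][0] < vno:
            i += 1                      # advance the R1 cursor (Vno unique in R1)
        if i < len(r1) and r1[i][0] == vno:
            r2.append((vno, r1[i][1], amount))

    # Step 3: sort R2 on (Vno, Vname), group, count and sum -> R3
    r2.sort(key=lambda t: (t[0], t[1]))
    r3 = []
    for (vno, vname), rows in groupby(r2, key=lambda t: (t[0], t[1])):
        amounts = [amount for _, _, amount in rows]
        r3.append((vno, vname, len(amounts), sum(amounts)))

    # Step 4: scan R3 and keep groups with sum(amount) > 100
    return [row for row in r3 if row[3] > 100]

result = run_plan(VENDOR, TRANSACTION)
print(result)
```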

8 EXAMPLE
  SELECT V.Vno, Vname, count(*), sum(amount)
  FROM Vendor V, Transaction T
  WHERE V.Vno = T.Vno AND V.Vno between 1000 and 2000
  GROUP BY V.Vno, Vname
  HAVING sum(amount) > 100
Execution plan (operator tree, root first)
  Scan (Sum(Amount) > 100)
    Grouping (Vno, Vname)
      Join (V.Vno = T.Vno)
        Scan (Vno between 1000 and 2000)
          Vendor
        Transaction

9 QUERY OPTIMIZATION ISSUES
- Determining the shape of the execution plan
  - Order of execution
- Determining how each node in the plan should be executed
  - Operator implementations
These are interdependent, and an optimizer does both when generating the execution plan.

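10 SHAPE OF THE EXECUTION PLAN
Finding query trees that are equivalent
- Produce the same result, provably
- These are based on the transformation (equivalence) rules
Commutativity of selection
  $\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) \Leftrightarrow \sigma_{p_2(A_2)}(\sigma_{p_1(A_1)}(R))$
Commutativity of binary operations
  $R \times S \Leftrightarrow S \times R$
  $R \bowtie S \Leftrightarrow S \bowtie R$
  $R \cup S \Leftrightarrow S \cup R$
  $R \cap S \Leftrightarrow S \cap R$
Associativity of binary operations
  $(R \times S) \times T \Leftrightarrow R \times (S \times T)$
  $(R \bowtie S) \bowtie T \Leftrightarrow R \bowtie (S \bowtie T)$
  $(R \cup S) \cup T \Leftrightarrow (S \cup R) \cup T$
Cascading of unary operations
  $\Pi_{A'}(\Pi_{A''}(R)) \Leftrightarrow \Pi_{A'}(R)$
    where $R[A]$ and $A' \subseteq A$, $A'' \subseteq A$ and $A' \subseteq A''$
  $\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) \Leftrightarrow \sigma_{p_1(A_1) \wedge p_2(A_2)}(R)$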
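A quick way to build intuition for these rules is to check a few of them on a small relation. The sketch below does that in Python for the unary rules; the helper names (`select`, `project`, `same`), the sample relation, and the predicates are assumptions made for this illustration, and passing on one relation is of course not a proof of equivalence.

```python
def select(pred, rel):
    """sigma: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(attrs, rel):
    """pi: keep only attrs, eliminating duplicate rows (set semantics)."""
    seen, out = set(), []
    for row in rel:
        key = tuple((a, row[a]) for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(key))
    return out

def same(r1, r2):
    """Two relations are equal if they hold the same set of rows."""
    canon = lambda rel: {frozenset(row.items()) for row in rel}
    return canon(r1) == canon(r2)

# Hypothetical relation R[A1, A2] and predicates p1(A1), p2(A2).
R = [{"A1": 1, "A2": "x"}, {"A1": 2, "A2": "y"},
     {"A1": 3, "A2": "x"}, {"A1": 3, "A2": "y"}]
p1 = lambda row: row["A1"] >= 2
p2 = lambda row: row["A2"] == "x"

checks = {
    "commutativity of selection":
        same(select(p1, select(p2, R)), select(p2, select(p1, R))),
    "cascading of selection":
        same(select(p1, select(p2, R)),
             select(lambda row: p1(row) and p2(row), R)),
    "cascading of projection":
        same(project(["A1"], project(["A1", "A2"], R)), project(["A1"], R)),
}
for rule, holds in checks.items():
    print(rule, holds)
```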

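11 OTHER TRANSFORMATION RULES
Commuting selection with projection
  $\Pi_B(\sigma_{p(A)}(R)) \Leftrightarrow \sigma_{p(A)}(\Pi_B(R))$ (where $B \supseteq A$)
Commuting selection with binary operations
  $\sigma_{p(A)}(R \times S) \Leftrightarrow (\sigma_{p(A)}(R)) \times S$ (where $A$ belongs to $R$ only)
  $\sigma_{p(A_i)}(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow (\sigma_{p(A_i)}(R)) \bowtie_{(A_j,B_k)} S$ (where $A_i$ belongs to $R$ only)
  $\sigma_{p(A_i)}(R \cup S) \Leftrightarrow \sigma_{p(A_i)}(R) \cup \sigma_{p(A_i)}(S)$ (where $A_i$ belongs to $R$ and $S$)
  $\sigma_{p(A_i)}(R \cap S) \Leftrightarrow \sigma_{p(A_i)}(R) \cap \sigma_{p(A_i)}(S)$ (where $A_i$ belongs to $R$ and $S$)
Commuting projection with binary operations
  $\Pi_C(R \times S) \Leftrightarrow \Pi_{A'}(R) \times \Pi_{B'}(S)$
  $\Pi_C(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow \Pi_{A'}(R) \bowtie_{(A_j,B_k)} \Pi_{B'}(S)$
  $\Pi_C(R \cup S) \Leftrightarrow \Pi_C(R) \cup \Pi_C(S)$
  where $R[A]$ and $S[B]$; $C = A' \cup B'$ where $A' \subseteq A$, $B' \subseteq B$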

12 EXAMPLE TRANSFORMATION
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.
  SELECT ENAME
  FROM PROJ P, ASG G, EMP E
  WHERE G.ENO=E.ENO
  AND G.PNO=P.PNO
  AND E.ENAME <> 'J. Doe'
  AND P.PNAME='CAD/CAM'
  AND (G.DUR=12 OR G.DUR=24)
Initial operator tree (root first)
  $\Pi_{E.ENAME}$                                   (Project)
    $\sigma_{G.DUR=12 \vee G.DUR=24}$               (Select)
      $\sigma_{P.PNAME=\text{"CAD/CAM"}}$
        $\sigma_{E.ENAME \neq \text{"J. Doe"}}$
          $\bowtie_{PNO}$                           (Join)
            PROJ
            $\bowtie_{ENO}$
              ASG
              EMP

13 EQUIVALENT QUERY
  $\Pi_{E.ENAME}(\sigma_{P.PNAME=\text{"CAD/CAM"} \wedge (G.DUR=12 \vee G.DUR=24) \wedge E.ENAME \neq \text{"J. Doe"}}((EMP \times PROJ) \bowtie_{PNO,ENO} ASG))$

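14 ANOTHER EQUIVALENT QUERY
Operator tree (root first)
  $\Pi_{ENAME}$
    $\bowtie_{PNO}$
      $\Pi_{PNO}$
        $\sigma_{PNAME = \text{"CAD/CAM"}}$
          PROJ
      $\Pi_{PNO,ENAME}$
        $\bowtie_{ENO}$
          $\Pi_{PNO,ENO}$
            $\sigma_{DUR=12 \vee DUR=24}$
              ASG
          $\Pi_{ENO,ENAME}$
            $\sigma_{ENAME \neq \text{"J. Doe"}}$
              EMP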
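As a sanity check on the CAD/CAM example, the sketch below evaluates two plans on a small hypothetical database: one that joins first and filters afterwards, and one that pushes selections and projections down to the base relations before joining. The helper functions and all sample rows are assumptions made for illustration; agreement on one data set illustrates the equivalence but does not prove it.

```python
def select(pred, rel):
    return [row for row in rel if pred(row)]

def project(attrs, rel):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for row in rel:
        key = tuple((a, row[a]) for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(key))
    return out

def join(left, right, attr):
    """Equi-join on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

# Hypothetical sample rows.
EMP = [{"ENO": 1, "ENAME": "J. Doe"}, {"ENO": 2, "ENAME": "M. Smith"},
       {"ENO": 3, "ENAME": "A. Lee"}, {"ENO": 4, "ENAME": "B. Casey"}]
PROJ = [{"PNO": "P1", "PNAME": "Instrumentation"},
        {"PNO": "P3", "PNAME": "CAD/CAM"}]
ASG = [{"ENO": 1, "PNO": "P3", "DUR": 12}, {"ENO": 2, "PNO": "P3", "DUR": 24},
       {"ENO": 3, "PNO": "P3", "DUR": 36}, {"ENO": 4, "PNO": "P1", "DUR": 12},
       {"ENO": 3, "PNO": "P1", "DUR": 24}]

not_doe = lambda r: r["ENAME"] != "J. Doe"
is_cadcam = lambda r: r["PNAME"] == "CAD/CAM"
dur_ok = lambda r: r["DUR"] in (12, 24)

# Plan A: join everything first, then apply the selections and the projection.
plan_a = project(["ENAME"],
                 select(dur_ok, select(is_cadcam, select(not_doe,
                        join(PROJ, join(ASG, EMP, "ENO"), "PNO")))))

# Plan B: push selections and projections down before each join.
asg_emp = project(["PNO", "ENAME"],
                  join(project(["PNO", "ENO"], select(dur_ok, ASG)),
                       project(["ENO", "ENAME"], select(not_doe, EMP)), "ENO"))
plan_b = project(["ENAME"],
                 join(project(["PNO"], select(is_cadcam, PROJ)), asg_emp, "PNO"))

print(plan_a, plan_b)
```

Plan B joins far smaller inputs than Plan A, which is the point of applying selections and projections as early as possible.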

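15 CLICKER QUESTION #36
Is the right query plan equivalent to the left query plan?
Left plan
  $\Pi_{E.ENAME}(\sigma_{G.DUR=12 \vee G.DUR=24}(\sigma_{P.PNAME=\text{"CAD/CAM"}}(\sigma_{E.ENAME \neq \text{"J. Doe"}}(PROJ \bowtie_{PNO} (ASG \bowtie_{ENO} EMP)))))$
Right plan
  $\Pi_{E.ENAME}((\sigma_{P.PNAME=\text{"CAD/CAM"}}(PROJ) \bowtie_{PNO} \sigma_{G.DUR=12 \vee G.DUR=24}(ASG)) \bowtie_{ENO} \sigma_{E.ENAME \neq \text{"J. Doe"}}(EMP))$
(a) Yes
(b) No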

16 IMPORTANT PROBLEM: JOIN ORDER
Assume you have $R \bowtie S \bowtie T \bowtie W$
Linear join tree
  $((R \bowtie S) \bowtie T) \bowtie W$
Bushy join tree
  $(R \bowtie S) \bowtie (T \bowtie W)$
Most systems implement linear join trees
- Left-linear

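17 JOIN ORDERING
Even with left-linear trees, how do you know which order to use?
- Assume natural join over common attributes
[Figure: three alternative left-linear join trees over R, S, T, and W, for example $((R \bowtie S) \bowtie T) \bowtie W$ versus $((R \bowtie S) \bowtie W) \bowtie T$]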
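One naive way to answer the ordering question is to enumerate every left-linear order and rank them with a cost estimate. The Python sketch below does this for four relations. The cardinalities, the join selectivities, and the cost model (the sum of estimated intermediate result sizes) are all assumptions made for this illustration; real optimizers use richer statistics and prune the search, typically with dynamic programming, rather than enumerating all n! orders.

```python
from itertools import permutations

# Hypothetical statistics: relation cardinalities and join selectivities.
CARD = {"R": 1000, "S": 100, "T": 10, "W": 500}
# Pairs that share attributes; any pair not listed is treated as a
# Cartesian product (selectivity 1.0).
SEL = {frozenset("RS"): 0.01, frozenset("ST"): 0.1, frozenset("TW"): 0.002}

def plan_cost(order):
    """Cost of a left-linear plan = sum of estimated intermediate sizes."""
    joined = {order[0]}
    size = CARD[order[0]]
    cost = 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:
            sel *= SEL.get(frozenset((prev, rel)), 1.0)
        size = size * CARD[rel] * sel      # estimated size after this join
        cost += size
        joined.add(rel)
    return cost

orders = list(permutations(CARD))          # 4! = 24 left-linear orders
best = min(orders, key=plan_cost)
worst = max(orders, key=plan_cost)
print("best :", best, plan_cost(best))
print("worst:", worst, plan_cost(worst))
```

Even in this toy setting the cheapest and most expensive orders differ by orders of magnitude, because orders that join relations without common attributes early produce Cartesian products.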

18 SOME OPERATOR IMPLEMENTATIONS
- Tuple Selection
  - without an index
  - with a clustered index
  - with an unclustered index
  - with multiple indices
- Projection
- Joining
  - nested loop join
  - sort-merge join
  - and others...
- Grouping and Duplicate Elimination
  - by sorting
  - by hashing
- Sorting

19 EXAMPLE JOIN ALGORITHMS
  SELECT C.Cnum, A.Balance
  FROM Customer C, Accounts A
  WHERE C.Cnum = A.Cnum
Nested loop join:
  for each tuple c in Customer do
    for each tuple a in Accounts do
      if c.cnum = a.cnum then
        output c.cnum, a.balance
    end
  end
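The nested loop pseudocode translates almost line for line into Python. In this minimal sketch the relations are in-memory lists of tuples and the sample rows are invented; a real DBMS would iterate over disk blocks rather than individual tuples.

```python
# Hypothetical sample data.
CUSTOMER = [(1, "Ann"), (2, "Bob"), (3, "Cy")]       # (Cnum, Name)
ACCOUNTS = [(1, 250.0), (1, 40.0), (3, 99.5)]        # (Cnum, Balance)

def nested_loop_join(customer, accounts):
    """Compare every Customer tuple with every Accounts tuple."""
    result = []
    for c in customer:                  # outer relation
        for a in accounts:              # inner relation, rescanned each time
            if c[0] == a[0]:            # join predicate C.Cnum = A.Cnum
                result.append((c[0], a[1]))
    return result

joined = nested_loop_join(CUSTOMER, ACCOUNTS)
print(joined)
```

The inner relation is scanned once per outer tuple, so the number of comparisons is the product of the two cardinalities.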

20 EXAMPLE JOIN ALGORITHMS (2)
  SELECT C.Cnum, A.Balance
  FROM Customer C, Accounts A
  WHERE C.Cnum = A.Cnum
Index join:
  for each tuple c in Customer do
    use the index to find Accounts tuples a where a.cnum matches c.cnum
    if there are any such tuples a then
      output c.cnum, a.balance
    end
  end
Sort-merge join:
  sort Customer and Accounts on Cnum
  merge the resulting sorted relations
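Both algorithms can be sketched in a few lines of Python. This is an illustration under stated assumptions: the sample rows are invented, and a plain dictionary built in memory stands in for whatever index on Accounts.Cnum the DBMS actually maintains.

```python
from collections import defaultdict

# Hypothetical sample data.
CUSTOMER = [(3, "Cy"), (1, "Ann"), (2, "Bob")]       # (Cnum, Name)
ACCOUNTS = [(1, 250.0), (3, 99.5), (1, 40.0)]        # (Cnum, Balance)

def index_join(customer, accounts):
    """Probe an index on Accounts.Cnum once per Customer tuple."""
    index = defaultdict(list)           # stand-in for an existing index
    for a in accounts:
        index[a[0]].append(a)
    return [(c[0], a[1]) for c in customer for a in index.get(c[0], [])]

def sort_merge_join(customer, accounts):
    """Sort both inputs on Cnum, then merge them in a single pass."""
    cs = sorted(customer, key=lambda c: c[0])
    acc = sorted(accounts, key=lambda a: a[0])
    result, i, j = [], 0, 0
    while i < len(cs) and j < len(acc):
        if cs[i][0] < acc[j][0]:
            i += 1
        elif cs[i][0] > acc[j][0]:
            j += 1
        else:
            key = cs[i][0]
            j_end = j                   # find the run of matching Accounts tuples
            while j_end < len(acc) and acc[j_end][0] == key:
                j_end += 1
            while i < len(cs) and cs[i][0] == key:
                result.extend((key, a[1]) for a in acc[j:j_end])
                i += 1
            j = j_end
    return result

by_index = index_join(CUSTOMER, ACCOUNTS)
by_merge = sort_merge_join(CUSTOMER, ACCOUNTS)
print(sorted(by_index) == sorted(by_merge))
```

The two functions return the same tuples, possibly in a different order: the index join follows the order of Customer, while the sort-merge join produces output sorted on Cnum.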

21 COMPLEXITY OF OPERATORS
Assume
- Relations of cardinality n
- Sequential scan

Operation                                   Complexity
Select                                      O(n)
Project (without duplicate elimination)     O(n)
Project (with duplicate elimination)        O(n log n)
Group                                       O(n log n)
Join                                        O(n log n)
Semi-join                                   O(n log n)
Division                                    O(n log n)
Set Operators                               O(n log n)
Cartesian Product                           O(n^2)

22 COST OF PLANS
Alternative access plans may be compared according to cost.
The cost of an access plan is the sum of the costs of its component operations.
There are many possible cost metrics. However, most metrics reflect the amounts of system resources consumed by the access plan.
System resources may include:
- disk block I/Os
- processing time
- network bandwidth
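The "sum of component costs" idea can be written down directly. In the Python sketch below, every number, the `OpCost` record, and the weights that combine the three resources into a single figure are hypothetical; the slides do not prescribe a particular metric or weighting.

```python
from dataclasses import dataclass

@dataclass
class OpCost:
    name: str
    block_ios: int      # disk block I/Os
    cpu_ms: float       # processing time
    net_bytes: int      # network traffic

def plan_cost(ops, w_io=1.0, w_cpu=0.0, w_net=0.0):
    """Plan cost = sum over operations of a weighted resource total."""
    return sum(w_io * op.block_ios + w_cpu * op.cpu_ms + w_net * op.net_bytes
               for op in ops)

# Two hypothetical plans for the same query.
plan_a = [OpCost("scan Customer", 1000, 50.0, 0),
          OpCost("nested loop join", 200, 400.0, 0),
          OpCost("project", 50, 5.0, 0)]
plan_b = [OpCost("index scan Customer", 300, 20.0, 0),
          OpCost("sort-merge join", 200, 120.0, 0),
          OpCost("project", 50, 5.0, 0)]

# With the default weights only disk block I/Os count.
print(plan_cost(plan_a), plan_cost(plan_b))
```

Changing the weights changes which plan wins, which is why the choice of cost metric matters as much as the cost estimates themselves.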

23 LECTURE SUMMARY
- Query processing methodology
- Basic query operations and their costs
- Generation of execution plans