Spring 2017 QUERY PROCESSING [JOINS, SET OPERATIONS, AND AGGREGATES] 2/19/17 CS 564: Database Management Systems; (c) Jignesh M.


Spring 2017 QUERY PROCESSING [JOINS, SET OPERATIONS, AND AGGREGATES] 2/19/17 CS 564: Database Management Systems; (c) Jignesh M. Patel, 2013 1

Joins
The focus here is on equijoins.
- These are very common, given how we design database schemas using primary and foreign keys.
- Equijoins are used to bring the tuples back together.
Example:
    SELECT U.login AS login, COUNT(*) AS NumMsgsToday
    FROM User U, Messages M
    WHERE U.uid = M.uid
      AND DATE(M.tstamp) = CURRENT_DATE  -- select msgs posted today
    GROUP BY U.login                     -- group by login
    ORDER BY NumMsgsToday DESC           -- order by descending msg count
We look at equijoin algorithms next.

Page Nested Loops Join: PNL
1. For each page in the User table, p_u
2.   For each page of Messages, p_m
3.     Join the tuples on page p_u with the tuples on page p_m
4.     Output matching tuples (after applying any projection)
Let |U| denote the # pages in the User table and |M| denote the # pages in the Messages table.
Then, the IO cost of the PNL algorithm is: |U| + |U| * |M|
How many buffer pool pages does this algorithm use?
Can we do better if we have a larger buffer pool with B pages, where B >> 3?
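The PNL steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each table is modeled as an in-memory list of pages, each page is a list of (uid, payload) tuples, and the function name and the `page_reads` counter are hypothetical, not part of the slides.

```python
def page_nested_loops_join(user_pages, msg_pages):
    """Join User and Messages on uid, one pair of pages at a time.

    Returns (results, page_reads), where page_reads counts simulated page IOs.
    """
    results = []
    page_reads = 0
    for p_u in user_pages:            # outer loop: each page of User
        page_reads += 1
        for p_m in msg_pages:         # inner loop: full scan of Messages
            page_reads += 1
            for uid_u, login in p_u:
                for uid_m, text in p_m:
                    if uid_u == uid_m:
                        results.append((login, text))
    return results, page_reads


users = [[(1, "ann"), (2, "bob")], [(3, "cat")]]               # 2 pages
msgs = [[(1, "hi"), (3, "yo")], [(2, "hey")], [(1, "bye")]]    # 3 pages
rows, reads = page_nested_loops_join(users, msgs)
print(sorted(rows), reads)
```

With 2 User pages and 3 Messages pages, the counter comes out to 2 + 2 * 3 = 8 page reads, matching the pattern "read the outer table once, and scan the inner table once per outer page".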

Block Nested Loops Join: BNL
1. Scan the User table B-2 pages at a time
2.   For each page of Messages, p_m
3.     Probe the hash table with each tuple on the page p_m
4.     Output matching tuples (after applying any projection)
Let |U| denote the # pages in the User table and |M| denote the # pages in the Messages table.
Then, the IO cost of the BNL algorithm is: O(|U| + (|U| / (B-2)) * |M|)
What is the CPU cost for this algorithm?

Block Nested Loops Join: BNL
1. Scan the User table B-2 pages at a time
2.   Insert the User tuples into an in-memory hash table on the join attribute
3.   For each page of Messages, p_m
4.     Probe the hash table with each tuple on the page p_m
5.     Output matching tuples (after applying any projection)
Let |U| denote the # pages in the User table and |M| denote the # pages in the Messages table.
Then, the IO cost of the BNL algorithm is: O(|U| + (|U| / (B-2)) * |M|)
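A minimal sketch of BNL with the in-memory hash table, under the same toy model as before (tables are lists of pages of (uid, payload) tuples). The function name, the `page_reads` counter, and the use of a Python dictionary as the hash table are assumptions made for illustration.

```python
from collections import defaultdict


def block_nested_loops_join(user_pages, msg_pages, B):
    """Join on uid, reading User in blocks of B-2 pages.

    Two of the B buffer pages are assumed to be reserved: one for the
    current Messages page and one for output.
    """
    assert B >= 3, "need at least one page for the User block"
    block_size = B - 2
    results, page_reads = [], 0
    for start in range(0, len(user_pages), block_size):
        block = user_pages[start:start + block_size]
        page_reads += len(block)
        # Build a hash table on the join attribute for this block of User.
        table = defaultdict(list)
        for page in block:
            for uid, login in page:
                table[uid].append(login)
        # Scan Messages once per block and probe the hash table.
        for p_m in msg_pages:
            page_reads += 1
            for uid, text in p_m:
                for login in table.get(uid, ()):
                    results.append((login, text))
    return results, page_reads


users = [[(1, "ann"), (2, "bob")], [(3, "cat")]]
msgs = [[(1, "hi"), (3, "yo")], [(2, "hey")], [(1, "bye")]]
print(block_nested_loops_join(users, msgs, B=3)[1])
print(block_nested_loops_join(users, msgs, B=4)[1])
```

In this toy run, B=3 degenerates to one User page per block (8 page reads), while B=4 fits both User pages in one block so Messages is scanned only once (5 page reads). The hash table also replaces the tuple-by-tuple comparison of PNL with one probe per Messages tuple, which is what reduces the CPU cost.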

Index Nested Loops Join: INL
Can be used when there is an index on the join attribute on one of the tables.
1. For each page in the User table, p_u
2.   For each tuple on page p_u
3.     Probe the index on the join attribute on Messages
4.     Output matching tuples (after applying any projection)
Let |U| denote the # pages in the User table, ||U|| denote the # tuples in the User table, and I_m denote the cost of one index probe on the Messages table.
Then, the IO cost of the INL algorithm is: |U| + ||U|| * I_m
The cost of I_m depends on the type of the index and on whether the index is clustered or unclustered.
How many buffer pages does this use?
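The INL steps can be sketched as follows. The index on Messages is modeled, purely as an assumption, by a Python dictionary from uid to the list of matching message payloads; a real index probe would cost some number of page IOs (the I_m term), which the hypothetical `estimated_io` helper takes as a parameter.

```python
def index_nested_loops_join(user_pages, msg_index):
    """Join on uid by probing an index on Messages once per User tuple.

    Returns (results, page_reads, probes).
    """
    results, page_reads, probes = [], 0, 0
    for p_u in user_pages:                 # each page of User is read once
        page_reads += 1
        for uid, login in p_u:             # one index probe per User tuple
            probes += 1
            for text in msg_index.get(uid, ()):
                results.append((login, text))
    return results, page_reads, probes


def estimated_io(num_user_pages, num_user_tuples, probe_cost):
    """IO estimate: scan User once, plus one index probe per User tuple."""
    return num_user_pages + num_user_tuples * probe_cost


users = [[(1, "ann"), (2, "bob")], [(3, "cat")]]
index = {1: ["hi", "bye"], 2: ["hey"], 3: ["yo"]}
rows, reads, probes = index_nested_loops_join(users, index)
print(rows, reads, probes, estimated_io(reads, probes, probe_cost=2))
```

The probe cost of 2 in the usage line is an arbitrary illustrative value, not a figure from the slides.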

Blocked Index Nested Loops Join: BINL
1. Scan the User table B-2 pages at a time
2.   Sort the tuples in the B-2 pages on the join key
3.   For each tuple of the User table
4.     Probe the index on the join attribute on Messages
5.     Output matching tuples (after applying any projection)
Why does sorting help?
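One plausible answer to the question above is locality: probing in key order tends to touch each index leaf page once, instead of jumping back and forth. The toy simulation below illustrates this under strong assumptions that are not in the slides: consecutive keys share a leaf page (`keys_per_leaf` keys each), and only the most recently used leaf stays in memory.

```python
def leaf_fetches(probe_keys, keys_per_leaf):
    """Count leaf-page fetches when only the last-used leaf is cached."""
    fetches = 0
    last_leaf = None
    for key in probe_keys:
        leaf = key // keys_per_leaf   # hypothetical layout: key ranges per leaf
        if leaf != last_leaf:
            fetches += 1
            last_leaf = leaf
    return fetches


block_keys = [7, 1, 8, 2, 9, 0]
print(leaf_fetches(block_keys, 4))          # probes in arrival order
print(leaf_fetches(sorted(block_keys), 4))  # probes in sorted order
```

In this example the unsorted probes fetch a leaf on every probe (6 fetches), while the sorted probes fetch each of the 3 leaves once.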

Sort-Merge Join: SMJ
1. Generate sorted runs for U (Pass 0)
2. Generate sorted runs for M (Pass 0)
3. Merge the sorted runs for U and M
4. While merging, check for the join condition
5. Output matching tuples
Runs of U are on average ~2B pages long (with B buffer pages); runs of M are also ~2B pages long.
So we have |U|/2B and |M|/2B runs after Line 2.
We need to hold one page from each of these runs in memory for Line 3.
So, |U|/2B + |M|/2B <= B.
If M is the larger relation, then |U|/2B + |M|/2B <= |M|/B, so a safe criterion is: B >= sqrt(|M|)
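The merge step (Lines 3 to 5) can be sketched on two inputs that are already sorted on the join key. This sketch assumes the inputs are in-memory lists of (uid, payload) tuples and leaves out run generation and buffer management entirely; the function name is hypothetical.

```python
def merge_join(sorted_users, sorted_msgs):
    """Equijoin two lists of (uid, payload) tuples sorted by uid."""
    results = []
    i = j = 0
    while i < len(sorted_users) and j < len(sorted_msgs):
        ku, km = sorted_users[i][0], sorted_msgs[j][0]
        if ku < km:
            i += 1                      # advance the side with the smaller key
        elif ku > km:
            j += 1
        else:
            # Find the group of tuples with this key on each side.
            i_end = i
            while i_end < len(sorted_users) and sorted_users[i_end][0] == ku:
                i_end += 1
            j_end = j
            while j_end < len(sorted_msgs) and sorted_msgs[j_end][0] == ku:
                j_end += 1
            # Output the cross product of the two groups.
            for _, login in sorted_users[i:i_end]:
                for _, text in sorted_msgs[j:j_end]:
                    results.append((login, text))
            i, j = i_end, j_end
    return results


users = sorted([(3, "cat"), (1, "ann"), (2, "bob")], key=lambda t: t[0])
msgs = sorted([(1, "hi"), (3, "yo"), (2, "hey"), (1, "bye")], key=lambda t: t[0])
print(merge_join(users, msgs))
```

Handling groups of equal keys on both sides is what makes the merge correct when the join key is not unique in either input.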

(Simple) Hash Join Algorithm: HJ
1. Partition U into P partitions, using a hash function h1 on the join key
2. Partition M into P partitions, using a hash function h1 on the join key
3. Join each partition of U with the corresponding partition of M
   (using hashing as in BNL, so build a hash table on the U partition)
   // Note: the hash function in the second part must be different from h1
With B buffer pages, the # partitions is ~B (for each of U and M).
Each partition of U must fit in memory (with its hash table). Assume that the hash table increases the space required by a factor of F.
Thus, the largest U that can be joined in two passes is constrained by: F * |U| / B <= B, i.e., |U| <= B^2 / F
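The two phases can be sketched as below. Everything here is held in memory, so this only illustrates the control flow, not the IO behavior. The partitioning function `h1` (uid modulo P, assuming integer keys) is a hypothetical choice, and Python's dictionary hashing stands in for the second hash function.

```python
from collections import defaultdict


def hash_join(users, msgs, P):
    """Partition both inputs with h1, then build and probe per partition."""
    def h1(key):
        return key % P                  # assumption: integer join keys

    # Phase 1: partition U and M with the same hash function.
    user_parts = [[] for _ in range(P)]
    msg_parts = [[] for _ in range(P)]
    for t in users:
        user_parts[h1(t[0])].append(t)
    for t in msgs:
        msg_parts[h1(t[0])].append(t)

    # Phase 2: join partition i of U only with partition i of M.
    results = []
    for u_part, m_part in zip(user_parts, msg_parts):
        table = defaultdict(list)       # dict hashing plays the role of h2
        for uid, login in u_part:
            table[uid].append(login)
        for uid, text in m_part:
            for login in table.get(uid, ()):
                results.append((login, text))
    return results


users = [(1, "ann"), (2, "bob"), (3, "cat")]
msgs = [(1, "hi"), (3, "yo"), (2, "hey"), (1, "bye")]
print(sorted(hash_join(users, msgs, P=2)))
```

Because matching tuples always hash to the same partition number under h1, no pairs are missed by joining only corresponding partitions.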

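Hash-Join
[Figure, partitioning phase: the original relation is read from disk through an INPUT buffer and split by hash function h1 into B-1 OUTPUT partitions written back to disk, using B main memory buffers.]
[Figure, probing phase: the partitions of R & S are read from disk; a hash table for partition R_i (k < B-1 pages) is built with hash function h2, S_i is read through an input buffer and probed with h2, and the join result goes to an output buffer, using B main memory buffers.]
What if F * |U_i| > B-2?
2/19/17 EECS 484: Database Management Systems, Jignesh M. Patel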

Hash Join versus Sort-Merge Join
Need to join U with M, where |M| > |U|, using B buffer pages.
To do a two-pass join, SMJ needs B >= sqrt(|M|). In this case the IO cost is: 3 * (|U| + |M|)
To do a two-pass join, HJ needs B >= sqrt(F * |U|). In this case the IO cost is: 3 * (|U| + |M|)
So HJ can join two relations with fewer buffer pages!
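A small calculation makes the comparison concrete. It assumes the usual textbook approximations, namely that two-pass SMJ needs roughly B >= sqrt(|M|) for the larger relation M and two-pass HJ needs roughly B >= sqrt(F * |U|) for the smaller relation U, and it ignores the few pages reserved for input and output. The function names and the example sizes are hypothetical.

```python
import math


def min_buffers_smj(m_pages):
    """Smallest B with B * B >= |M| (larger relation)."""
    b = math.isqrt(m_pages)
    return b if b * b >= m_pages else b + 1


def min_buffers_hj(u_pages, fudge):
    """Smallest B with B * B >= F * |U| (smaller relation, fudge factor F)."""
    return math.ceil(math.sqrt(fudge * u_pages))


def two_pass_io(u_pages, m_pages):
    """Both two-pass joins read, write, and re-read each relation once."""
    return 3 * (u_pages + m_pages)


u_pages, m_pages, fudge = 1000, 10000, 1.2
print(min_buffers_smj(m_pages), min_buffers_hj(u_pages, fudge), two_pass_io(u_pages, m_pages))
```

For these sizes the sketch gives about 100 buffer pages for SMJ and about 35 for HJ, at the same IO cost of 33000 page IOs. The gap comes from HJ depending on the smaller relation while SMJ depends on the larger one.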

General Join Conditions
Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname):
- Index NL: index on <sid, sname>, or index on sid or sname.
- SM and Hash: sort/hash on the combination of join attributes.
Inequality conditions (e.g., R.rname < S.sname):
- For Index NL, need a (clustered!) B+ tree index. Large # of index matches.
- SM and Hash are not applicable.
- Block NL is likely to be the winner.

Set Operations
∩ and × are special cases of join. ∪ and − are similar; we'll do ∪.
Duplicate elimination:
Sorting:
- Sort both relations (on all attributes).
- Merge sorted relations, eliminating duplicates.
- Alternative: merge sorted runs from both relations.
Hashing:
- Partition R and S.
- Build hash table for R_i.
- Probe with tuples in S_i; add to table if not a duplicate.
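Both approaches to union with duplicate elimination can be sketched on small in-memory relations. The assumptions: tuples are hashable Python tuples, `sorted` stands in for an external sort, a Python set stands in for the per-partition hash table, and the function names are hypothetical.

```python
def union_by_sorting(r, s):
    """Sort all tuples on all attributes, then drop adjacent duplicates."""
    out = []
    for t in sorted(r + s):
        if not out or out[-1] != t:
            out.append(t)
    return out


def union_by_hashing(r, s, num_partitions):
    """Partition R and S, then deduplicate each pair of partitions."""
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for t in r:
        r_parts[hash(t) % num_partitions].append(t)
    for t in s:
        s_parts[hash(t) % num_partitions].append(t)
    out = []
    for r_i, s_i in zip(r_parts, s_parts):
        table = set()
        for t in r_i:
            table.add(t)        # build the table for R_i
        for t in s_i:
            table.add(t)        # added only if not already present
        out.extend(table)
    return out


r = [(1, "a"), (2, "b"), (1, "a")]
s = [(2, "b"), (3, "c")]
print(union_by_sorting(r, s))
print(sorted(union_by_hashing(r, s, num_partitions=2)))
```

The hashing variant returns tuples in no particular order, which is why the usage line sorts its output before printing.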

Aggregates: Sorting
- Sort on group by attributes (if any)
- Scan sorted tuples, computing the running aggregate
  - Max: keep Max
  - Average: keep Sum and Count
- If the group by attribute changes, output the aggregate result
Cost: sorting cost
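A sketch of the sort-based approach for AVG with a single group by attribute. It assumes rows are (group_key, value) pairs in memory and uses Python's `sorted` in place of an external sort; the function name is hypothetical.

```python
def sort_aggregate_avg(rows):
    """Return [(group_key, average)] by sorting and scanning once."""
    out = []
    current, total, count = None, 0, 0
    for key, value in sorted(rows, key=lambda r: r[0]):
        if count and key != current:
            # The group by attribute changed: output the finished group.
            out.append((current, total / count))
            total, count = 0, 0
        current = key
        total += value          # running Sum
        count += 1              # running Count
    if count:
        out.append((current, total / count))   # flush the last group
    return out


print(sort_aggregate_avg([("a", 1), ("b", 10), ("a", 3)]))
```

Only the running Sum and Count for the current group are kept at any time, which is why the cost is dominated by the sort.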

Aggregates: Hashing
- Hash on group by attributes (if any)
  - Hash entry: group attributes + running aggregate
- Scan tuples, probe hash table, update hash entry
- Scan hash table, and output each hash entry
Cost: scan the relation!
What if we have a large # of groups?
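The hash-based approach can be sketched the same way, again for AVG over (group_key, value) rows. The Python dictionary acting as the hash table and the function name are assumptions; the sketch also assumes that all groups fit in memory, which is exactly the condition the last question on the slide asks about.

```python
def hash_aggregate_avg(rows):
    """Return {group_key: average} using one scan and a hash table."""
    table = {}
    for key, value in rows:
        # Hash entry: running [Sum, Count] for this group.
        entry = table.setdefault(key, [0, 0])
        entry[0] += value
        entry[1] += 1
    # Scan the hash table and output each entry.
    return {key: total / count for key, (total, count) in table.items()}


print(hash_aggregate_avg([("a", 1), ("b", 10), ("a", 3)]))
```

Unlike the sort-based version, the input is scanned once with no sort, but one hash entry per group must be held in memory at the same time.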

Aggregates: Index
Without grouping:
- Can use a B+tree on the aggregate attribute(s)
- Where clause?
With grouping:
- B+tree on all attributes in the SELECT, WHERE and GROUP BY clauses
- Index-only scan
- If the group-by attributes are a prefix of the search key => data entries/tuples are retrieved in group-by order
- Else => get data entries and then use a sort or hash aggregate algorithm