Carnegie Mellon University in Qatar Spring Problem Set 4. Out: April 01, 2018

Similar documents
Reminders - IMPORTANT:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Database Systems CSE 414

Relational Query Optimization. Highlights of System R Optimizer

RELATIONAL OPERATORS #1

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

Hash-Based Indexing 165

Overview of Implementing Relational Operators and Query Evaluation

Principles of Data Management. Lecture #12 (Query Optimization I)

Notes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Relational Query Optimization

CSE 414 Midterm. Friday, April 29, 2016, 1:30-2:20. Question Points Score Total: 100

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Database Applications (15-415)

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Database Applications (15-415)

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

CS 245 Midterm Exam Solution Winter 2015

CSE 414 Midterm. April 28, Name: Question Points Score Total 101. Do not open the test until instructed to do so.

CSE 190D Spring 2017 Final Exam Answers

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Implementation of Relational Operations

CS 4320/5320 Homework 2

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

CSE 190D Spring 2017 Final Exam

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

CS222P Fall 2017, Final Exam

Parser: SQL parse tree

External Sorting Implementing Relational Operators

QUERY OPTIMIZATION [CH 15]

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

University of Waterloo Midterm Examination Sample Solution

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

CS W Introduction to Databases Spring Computer Science Department Columbia University

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

CSE 444 Homework 1 Relational Algebra, Heap Files, and Buffer Manager. Name: Question Points Score Total: 50

Evaluation of Relational Operations

CS 564 Final Exam Fall 2015 Answers

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

Information Systems Engineering. SQL Structured Query Language DDL Data Definition (sub)language

Homework 4: Query Processing, Query Optimization (due March 21 st, 2016, 4:00pm, in class hard-copy please)

Parser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text.

CSE 530 Midterm Exam

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Using Relational Databases for Digital Research

Administrivia. Relational Query Optimization (this time we really mean it) Review: Query Optimization. Overview: Query Optimization

Assignment 6 Solutions

Principles of Data Management. Lecture #9 (Query Processing Overview)

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 2: Relations and SQL. September 5, Lecturer: Rasmus Pagh

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

CSE 232A Graduate Database Systems

CSC 261/461 Database Systems Lecture 19

Query Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13!

Evaluation of Relational Operations. Relational Operations

Database Applications (15-415)

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Query Processing and Optimization *

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

CSE 444: Database Internals. Lecture 11 Query Optimization (part 2)

Modern Database Systems Lecture 1

Spring 2017 EXTERNAL SORTING (CH. 13 IN THE COW BOOK) 2/7/17 CS 564: Database Management Systems; (c) Jignesh M. Patel,

EECS 647: Introduction to Database Systems

CSE344 Midterm Exam Winter 2017

L20: Joins. CS3200 Database design (sp18 s2) 3/29/2018

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Queen s University Faculty of Arts and Science School of Computing CISC 432* / 836* Advanced Database Systems

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

CSE344 Final Exam Winter 2017

Evaluation of relational operations

Database Systems CSE 414

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

CompSci 516 Data Intensive Computing Systems

QUERY PROCESSING & OPTIMIZATION CHAPTER 19 (6/E) CHAPTER 15 (5/E)

1. Data Definition Language.

CGS 3066: Spring 2017 SQL Reference

Evaluation of Relational Operations

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

15-415/615 Faloutsos 1

Database Applications (15-415)

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Query Evaluation (i)

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015

CMPS182 Midterm Examination. Name: Question 1: Question 2: Question 3: Question 4: Question 5: Question 6:

Carnegie Mellon University in Qatar Spring Problem Set 2. Out: January 23, 2018

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

Fundamentals of Database Systems

Evaluation of Relational Operations

QUERY PROCESSING & OPTIMIZATION CHAPTER 19 (6/E) CHAPTER 15 (5/E)

CSE 344 Midterm Exam

Transcription:

Carnegie Mellon University in Qatar 15415 - Spring 2018 Problem Set 4 Out: April 01, 2018 Due: April 11, 2018 1

1 B + External Sorting [20 Points] Suppose that you just nished inserting several records into a heap le and now want to sort those records. Assume that the DBMS uses external sort and makes ecient use of the available buer space when it sorts a le. Below is some potentially useful information about the newly loaded le and the DBMS software available to operate on it: The number of records in the le is 4500. The sort key for the le is 4 bytes long. You can assume that rids are 8 bytes long and page ids are 4 bytes long. Each record is a total of 48 bytes long. The page size is 512 bytes. Each page has 12 bytes of control information on it. Four buer pages are available. 4pts 1. How many sorted sub-les will there be after the initial pass of the sort? How long will each sub-le be? 2pts 2pts 2. 3. How many passes (including the initial pass just considered) are required to sort this le? What is the total I/O cost for sorting this le? 4pts 8pts 4. 5. What is the largest le, in terms of the number of records, can you sort with just four buer pages in two passes? How would your answer change if you had 257 buer pages? Suppose that you have a B+ tree index with the search key being the same as the desired sort key. Find the cost of using the index to retrieve the records in sorted order for each of the following cases: (a) The index uses Alternative (1) for data entries. (b) The index uses Alternative (2) and is unclustered (assume the worst-case). Page 2

2 Evaluating Relational Operators [20 points] Consider a join operation between two relations R and S : R R.a=S.b S. Read the information below about the two relations to be joined, then answer the following questions. Assume the cost metric as the number of I/Os and ignore the cost of writing out the result. Relation R contains 10,000 tuples and has 10 tuples per page. Relation S contains 2000 tuples and also has 10 tuples per page. Attribute b of relation S is the primary key for S. Both relations are stored as simple heap les. 52 buer pages are available. 8pts 1. If unclustered B + indexes existed on R.a and S.b, which would incur a lower cost; an index nested loop join or a block nested loop join? Explain (assume the worst-case). 3pts 2. Would your answer change if only ve buer pages were available? 6pts 3. Would your answer change if S contained only 10 tuples instead of 2000? 3pts 4. If R.a is a foreign key that refers to S.b, which of R or S would you choose to be the inner relation? Explain. Assignment continues on the next pages Page 3

3 Query Optimization in PostgreSQL [60 Points] The purpose of this question is to expose the query plans elected by PostgreSQL and observe how indices aect its choice of query plans. For your convenience, Part 3.1 presents a brief summary of the steps involved in query optimization (from the course lectures). In Part 3.2, you must run the given queries on PostgreSQL, analyze the execution plans elected, and answer the given questions. In these questions, we refer to the estimated total cost and the actual total cost; the former is in arbitrary units, while the latter is in milliseconds. Both are straightforwardly obtained from the output. 3.1 Summary of Query Optimization As shown in Figure 1 below, given a query, the Query Optimizer is the DBMS component responsible for generating a subset of all possible plans, evaluating each such plan, and nally electing a (sub) optimal plan. The sub-component responsible for generating plans is the Plan Generator. For a given query, the plan generator does the following: By convention, it produces all the left-deep relational algebra trees. For each tree produced in (1), it produces the extended relational algebra trees (by enumerating the dierent algorithms for each operator as well as access paths for each relation). Figure 1: Components of a DBMS. The sub-component that cooperates very closely with the Plan Generator is the Plan Cost Estimator. For every extended relational algebra tree (a.k.a. a query plan) produced by the Generator, the Plan Cost Estimator estimates its cost by leveraging statistics from the System Catalog about the size of relations and subsequently computing the cost of the algorithms involved. The Plan Generator, in conclusion, elects the query plan with the least cost and deems it the execution plan for the given query. Page 4

3.2 Analyzing Query Optimization in PostgreSQL Consider the two relations, movies and actors, from the MovieLens database of Project 1, whose schemas are shown below: movies (mid: integer, title: varchar, year: date, rating: real, num_ratings: integer) actors (mid: integer, name: varchar, cast_position: integer) For every relation in a database, PostgreSQL, by default, creates an index on the primary key. The rst task is to list all indices using the command \di followed by dropping the indices on movies and actors (hint: for the default indices use the SQL command: alter table.. drop constraint..) 25pts 1. Consider the following query that nds the name and cast position of the actors that acted in the movie with mid = 1000, and answer the following questions WHERE mid = 100 We will use the EXPLAIN 1 2 and ANALYZE SQL commands to expose the execution plan for the given query. Using EXPLAIN and ANALYZE, execute the given query. (a) What is the execution plan? Also, include the output of PostgreSQL. (b) What is the estimated total cost of the plan? (c) What is the actual total cost of the plan? Next, build an index on the eld mid (hint: use the SQL command: create index..) and re-run the given query. (d) What is the execution plan? Also, include the output of PostgreSQL. (e) What is the estimated total cost of the plan with the built index? (f) What is the actual total cost of the plan with the built index? (g) Did the estimated and actual costs change? (Yes/No) (h) How did the costs change? (increased/decreased/unchanged) (i) Explain why the costs remained the same or changed with the built index. (j) Can you deduce the type of index built? (Yes/No) (k) Which type of tree was built? Explain. (tree-based/hash-based/either) 1 Read about the syntax of EXPLAIN at https://www.postgresql.org/docs/current/static/sql-explain.html 2 Read about the output of EXPLAIN at https://www.postgresql.org/docs/current/static/performancetips.html Page 5

Finally, drop the index built above (hint: use the SQL command: drop index..). Build a new index on the elds mid and name and re-run the given query. (l) What is the execution plan? Also, include the output of PostgreSQL. (m) What is the estimated total cost of the plan? (n) What is the actual total cost of the plan with? (o) Did the estimated and actual costs change? (Yes/No) (p) How did the costs change? (increased/decreased/unchanged) (q) Explain why the costs remained the same or changed. (r) Can you deduce the type of index built? (Yes/No) (s) Which type of tree was built? Explain. (tree-based/hash-based/either) 5pts 2. Consider the following query. Using EXPLAIN and ANALYZE and given the index on mid and name, execute the given query. WHERE mid > 1000 and mid < 2000 (a) What is the execution plan? Also, include the output of PostgreSQL. (b) What is the estimated total cost of the plan? (c) What is the actual total cost of the plan? (d) Can you deduce the type of index built? (Yes/No) (e) Which type of tree was built? Explain. (tree-based/hash-based/either) 10pts 3. Consider the following two queries. Using EXPLAIN and ANALYZE and given the index on mid and name, execute the given queries. Q1 : WHERE mid > 1000 and cast_position = 1 Q2 : WHERE mid > 6000 and cast_position = 1 (a) What is the execution plan of each query? Also, include the output of PostgreSQL. (b) What is the estimated total cost of each plan? (c) What is the actual total cost of each plan? (d) Are the execution plans of Q1 and Q2 the same? (Yes/No) (e) Explain why the execution plans are the same or dierent. Page 6

20pts 4. Consider the following query involving a join between the two relations, movies and actors. After dropping the index on mid and name, execute the given query. SELECT m. mid, a. name FROM movies m, actors a WHERE a. mid = m. mid (a) What is the execution plan? Also, include the output of PostgreSQL. (b) What is the estimated total cost of the plan? (c) Which join algorithm is used in the execution plan? Next, build an index on the eld m.mid and re-run the given query. (d) What is the execution plan? Also, include the output of PostgreSQL. (e) What is the estimated total cost of the plan? (f) Which join algorithm is used in the execution plan? (g) Did the execution plan change? (Yes/No) (h) Explain why the execution plan changed or remained the same. Next, build an index on the eld a.mid and re-run the given query. (i) What is the execution plan? Also, include the output of PostgreSQL. (j) What is the estimated total cost of the plan? (k) Which join algorithm is used in the execution plan? (l) Did the execution plan change after the creation of the second index? (Yes/No) (m) Explain why the execution plan changed or remained the same after the creation of both indices. Finally, using the SET 3 command, disable the join algorithm used in part (l) above and re-run the given query. (n) What is the execution plan? Also, include the output of PostgreSQL. (o) What is the actual total cost of the plan? (p) Which join algorithm is used in the execution plan? 3 The SET command can be used to change the behavior of the query planner. For example, set enable_hash_join = false; disables the query planner's use of hash join query plans. Page 7