CS 222/122C Fall 2017, Final Exam. Sample solutions

Similar documents
CS222P Fall 2017, Final Exam

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing:

CS 564 Final Exam Fall 2015 Answers

CS 222/122C Fall 2016, Midterm Exam

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

University of Waterloo Midterm Examination Sample Solution

Database Management Systems (COP 5725) Homework 3

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Parser: SQL parse tree

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

Database Applications (15-415)

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Query Processing. Solutions to Practice Exercises Query:

Chapter 12: Query Processing

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

CSE 190D Spring 2017 Final Exam

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

RELATIONAL OPERATORS #1

CSE 190D Spring 2017 Final Exam Answers

CSIT5300: Advanced Database Systems

Advanced Database Systems

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Chapter 13: Query Processing

IMPORTANT: Circle the last two letters of your class account:

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Hash-Based Indexing 165

CSE 544 Principles of Database Management Systems

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

CSE 530A. B+ Trees. Washington University Fall 2013

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

LAB 3 Part 1 ADBC - Interactive SQL Queries-Advanced

University of Waterloo Midterm Examination Solution

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

CS 186/286 Spring 2018 Midterm 1

Dtb Database Systems. Announcement

Final Exam Review. Kathleen Durant CS 3200 Northeastern University Lecture 22

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Query Processing: The Basics. External Sorting

R16 SET - 1 '' ''' '' ''' Code No: R

Database System Concepts

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Query Processing & Optimization. CS 377: Database Systems

Problem Set 2 Solutions

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Review: Memory, Disks, & Files. File Organizations and Indexing. Today: File Storage. Alternative File Organizations. Cost Model for Analysis

CS-245 Database System Principles

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Final Exam Review. Kathleen Durant PhD CS 3200 Northeastern University

Roadmap DB Sys. Design & Impl. Overview. Reference. What is Join Processing? Example #1. Join Processing

IMPORTANT: Circle the last two letters of your class account:

Kathleen Durant PhD Northeastern University CS Indexes

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Magda Balazinska - CSE 444, Spring

CS 186, Fall 1999 Midterm 2 Professor Hawthorn

EXTERNAL SORTING. CS 564- Spring ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

Sorting & Aggregations

CSE 444: Database Internals. Lectures 5-6 Indexing

CS 186/286 Spring 2018 Midterm 1

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.

Evaluation of Relational Operations

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Spring 2017 QUERY PROCESSING [JOINS, SET OPERATIONS, AND AGGREGATES] 2/19/17 CS 564: Database Management Systems; (c) Jignesh M.

Indexing: Overview & Hashing. CS 377: Database Systems

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

CSIT5300: Advanced Database Systems

CMSC424: Database Design. Instructor: Amol Deshpande

ECS 165B: Database System Implementa6on Lecture 7

Spring 2013 CS 122C & CS 222 Midterm Exam (and Comprehensive Exam, Part I) (Max. Points: 100)

CS 186, Fall 1999 Midterm 2 Professor Hawthorn

Query Processing & Optimization

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

PS2 out today. Lab 2 out today. Lab 1 due today - how was it?

Lecture 8 Index (B+-Tree and Hash)

Last Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Why do we need to sort? Today's Class

CSE 544, Winter 2009, Final Examination 11 March 2009

Overview of Implementing Relational Operators and Query Evaluation

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CS122 Lecture 8 Winter Term,

External Sorting Implementing Relational Operators

CSE 444, Winter 2011, Final Examination. 17 March 2011

EXTERNAL SORTING. Sorting

Spring 2017 EXTERNAL SORTING (CH. 13 IN THE COW BOOK) 2/7/17 CS 564: Database Management Systems; (c) Jignesh M. Patel,

Module 9: Query Optimization

Context. File Organizations and Indexing. Cost Model for Analysis. Alternative File Organizations. Some Assumptions in the Analysis.

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Midterm Review. March 27, 2017

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2009 Quiz I Solutions

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

Transcription:

CS 222/122C Fall 2017, Final Exam Principles of Data Management Department of Computer Science, UC Irvine Prof. Chen Li (Max. Points: 100 + 15) Sample solutions Question 1: Short questions (15 points) a) (3 pts) Operators can be easily connected to generate a plan since they have the same interface. b) (3 pts) The subplan generates intermediate results sorted on the attribute A. In addition, the attribute A is used later in the remaining plan, i.e., it s used in a join, group by, distinct, or order by operator. c) (3 pts) These results could be pipelined to the next operator without going to the disk. d) (3 pts) (1) ((R JOIN S) JOIN T) JOINT W; (2) (R JOIN (S JOIN T)) JOIN W; (3) (R JOIN S) JOIN (T JOIN W). e) (3 pts) Approach (1) uses space more efficiently since it stores a duplicate key only once in data entries. A main drawback is that it needs to maintain overflow pages when the list of rid s cannot fit in one page. Question 2: B+ Tree (10 points) a) (2 pts) Fewer I/O s, Fewer splits & rebalancing. b) (4 pts) CS222 Final: Sample solutions, Fall 2017, Page 1

c) (4 pts) Assume the condition deptid = 10 or 11 can be handled as a range. The plan of using <deptid, gpa> has fewer IOs. Question 3: External Sort (13 points) a) (4 pts) Pass 0: 300 sorted runs of 31 pages each; Pass 1: 10 sorted runs of 930 pages each; Pass 2: 1 sorted file of 9300 pages 5 * 9300 = 46500 b) (4 pts) 2 * log(a) * (A * B * C). Each record needs to be pushed into and popped out of the heap (priority queue), and each push/pop operation has a cost of log(a). If we combine the pop of the previous top element and the push of the new element into one step, we may get rid of the 2 * in the formula. c) (5 pts) We can allocate some space in memory to store a directory of pointers pointing to each record. Then when you sort the variable-length records, we swap the pointers instead of the actual record. If the directory is small, then we need at least one buffer page. If there are too many records and one buffer page cannot hold that many pointers, we need more buffer pages as needed. Question 4: Join (18 points) a) (4 pts) 200 + 3000 * (1 + 20,000/3,000) Assumption: we only need to access one leaf node of the Order.cid B+ tree. b) (4 pts) CS222 Final: Sample solutions, Fall 2017, Page 2

Need 1 partition pass. (2 * 2-1) * (200 + 500) = 2100 c) (5 pts) Suppose R is the smaller relation. In step 1, read the records of R page by page. For each record, apply a hash function h. If the hash value h(x) is 0, keep it in memory to build a hash table. Otherwise, pass it to the disk through a buffer page. In step 2, read the records of S page by page. For each record, apply the same hash function h. If the hash value h(x) is 0, do a lookup in the in-memory hash table and find matching results, which are output as results though one page buffer. Otherwise, pass it to the disk through one page buffer. Repeat steps 1 and 2 for those passed-over records on the disk using a sequence of different hash functions, until the remaining disk records of R can fit into memory. Then load these pages into memory to build a hash table, and scan the disk pages of S to do the in-memory join. d) (5 pts) CS222 Final: Sample solutions, Fall 2017, Page 3

Suppose R is the smaller relation. In step 1, read the records of R page by page. For each record, apply a hash function h. If the hash value h(x) is 0, keep it in memory to build a hash table for partition R0. Otherwise, pass it to the buffer page i = h(x), which will eventually flushed to the disk to generate partition Ri. In step 2, read the records of S page by page. For each record, apply the same hash function h. If the hash value h(x) is 0, do a lookup in the in-memory hash table of R0 and find matching results, which are output as results though one page buffer. Otherwise, pass it to the buffer page i = h(x), which will eventually be flushed to the disk to generate partition Si. Use the second step of grace join to join each (Ri, Si) partition pairs. This join algorithm combines the idea of keeping one partition in memory as a hash table to reduce their disk IOs from Simple Hash Join and the idea of partitioning those remaining records using multiple output buffers from Grace Hash Join. Question 5: Distinct (12 points) a) (3 pts) select count(distinct name) from Applicant CS222 Final: Sample solutions, Fall 2017, Page 4

b) (3 pts) We need to two merge passes. Thus, the total is 1000 + 400 * 4 = 1000 + 1600 = 2600 c) (3 pts) Same to b), 2600 d) (3 pts) For b, eliminate duplicate results during sort/merge phase. For c, eliminate duplicate results during partitioning/building phase. Question 6: Cost Estimation (12 points) a) (3 pts) b) (3 pts) 35 c) (3 pts) d) (3 pts) 4/24 * 100 CS222 Final: Sample solutions, Fall 2017, Page 5

Question 7: System-R Optimizer (20 points) a) (2 pts) Write the meaning of the query in plain English. For each supplier, for all its products with a price more than $200 and sold to a California customer, compute their total price. b) (6 pts) Products(pid, pname, price) 1) B+ tree scan on pid, interesting order on pid; kept; 2) B+ tree search on price for the condition price > 200 ; kept if its cost is less than the access method 1). 3) B+ tree search on pid for a constant, kept for a later index-based join Customers(cid, cname, email, state) 1) B+ tree scan on cid, interesting order on cid; kept; 2) B+ tree search on cname for the condition state=ca ; kept if its cost is less than the access method 1); 3) B+ tree search on cid for a constant, kept for a later index-based join c) (4 pts) 1) Products join Buys: For each access method on Products kept from the previous step, for each access method on Buys kept from the previous step, consider all possible valid join methods: block nested loop join, index-based join, sort merge join, hash join, etc. Estimate the cost of each subplan. Use the interesting order from the previous method, if any, when estimating the cost of a sort-merge join. For each interesting order, select the join method with the smallest cost, and remove those subplans that are dominated by another subplan in terms of both cost and interesting order(s). 2) Repeat the same step for Buys join Products; d) (4 pts) 1) (C, CP, P, PS ) 2) ( CP, P, PS, S) 3) (C, CP, PS, S) Notice that CP and PS can still join their pid attributes. If your answer assumes CP and PS do not join when P is not included, we can also accept an answer that does not include 3). System-R optimizer does not consider cross products. So a subset that produces a cross product is wrong. CS222 Final: Sample solutions, Fall 2017, Page 6

e) (4 pts) - Interesting orders - Left-deep trees only - Avoid Cartesian products - Dynamic programming to do plan enumeration - Selection push down - Deal with group by at the end - Deal with nested subqueries as blocks, and optimize them separately (not shown in the example) Question 8: Extra-Credit Question (15 points) a) (5 pts) Sort the new records. Perform a full scan over the original B+ tree. Merge the results of sorted new records with the results from scanning B+ tree to bulk load a new B+ tree. b) (5 pts) Use a tombstone record to indicate that a key has been deleted. c) (5 pts) Perform a full scan over multiple B+ trees to get sorted records. During the full scan, use a priority queue to merge results so that a key can be returned at most once. If a key has been deleted (as a tomebstone), we ignore that key. For the sorted records, we bulkload a new B+tree. CS222 Final: Sample solutions, Fall 2017, Page 7