CS 564 Final Exam Fall 2015 Answers

Similar documents
CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam

Database Management Systems (COP 5725) Homework 3

IMPORTANT: Circle the last two letters of your class account:

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

IMPORTANT: Circle the last two letters of your class account:

CS 222/122C Fall 2017, Final Exam. Sample solutions

RELATIONAL OPERATORS #1

McGill April 2009 Final Examination Database Systems COMP 421

Final Exam CSE232, Spring 97

Assignment 6 Solutions

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

Database Management Systems Written Examination

Midterm 2: CS186, Spring 2015

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

CSE 414 Final Examination

Final Review. May 9, 2017

Final Review. May 9, 2018 May 11, 2018

University of Waterloo Midterm Examination Sample Solution

CSE 444 Homework 1 Relational Algebra, Heap Files, and Buffer Manager. Name: Question Points Score Total: 50

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

Database Management Systems Paper Solution

Midterm 1: CS186, Spring 2015

CISC437/637 Database Systems Final Exam

CSE 544, Winter 2009, Final Examination 11 March 2009

Final Exam CSE232, Spring 97, Solutions

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Exam 1. March 20th, CS525 - Midterm Exam Solutions

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CISC437/637 Database Systems Final Exam

CSE 444, Winter 2011, Final Examination. 17 March 2011

Relational DBMS Internals Solutions Manual. A. Albano, D. Colazzo, G. Ghelli and R. Orsini

Problem 1 (12P) Indicate whether each of the following statements is true or false; explain briey. a) In query optimization, it is always better to pe

CSE344 Midterm Exam Winter 2017

Query Processing and Advanced Queries. Query Optimization (4)

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

CS 461: Database Systems. Final Review. Julia Stoyanovich

CSE344 Midterm Exam Fall 2016

CSE344 Final Exam Winter 2017

Chapter 3. Algorithms for Query Processing and Optimization

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

CS/B.Tech/CSE/New/SEM-6/CS-601/2013 DATABASE MANAGEMENENT SYSTEM. Time Allotted : 3 Hours Full Marks : 70

CS 186/286 Spring 2018 Midterm 1

Name Class Account UNIVERISTY OF CALIFORNIA, BERKELEY College of Engineering Department of EECS, Computer Science Division J.

External Sorting Implementing Relational Operators

Lecture 12. Lecture 12: The IO Model & External Sorting

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Goal of Concurrency Control. Concurrency Control. Example. Solution 1. Solution 2. Solution 3

CSE Midterm - Spring 2017 Solutions

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

CSE 444, Winter 2011, Midterm Examination 9 February 2011

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

CS 222/122C Fall 2016, Midterm Exam

Database Management Systems Written Exam

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Homework 1: RA, SQL and B+-Trees (due September 24 th, 2014, 2:30pm, in class hard-copy please)

Chapter 12: Query Processing

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Delhi Noida Bhopal Hyderabad Jaipur Lucknow Indore Pune Bhubaneswar Kolkata Patna Web: Ph:

CSE 344 Final Examination

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Query Processing: The Basics. External Sorting

CSE 344 Final Examination

Homework 2: Query Processing/Optimization, Transactions/Recovery (due February 16th, 2017, 9:30am, in class hard-copy please)

CS222P Fall 2017, Final Exam

From SQL-query to result Have a look under the hood

CSE 344 Midterm Nov 1st, 2017, 1:30-2:20

CSE 544 Principles of Database Management Systems

Overview of Transaction Management

Database Applications (15-415)

Fundamentals of Database Systems

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases. Introduction

VIEW OTHER QUESTION PAPERS

192 Chapter 14. TotalCost=3 (1, , 000) = 6, 000

Chapter 17: Parallel Databases

CMSC 461 Final Exam Study Guide

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC

CSE 444 Final Exam. August 21, Question 1 / 15. Question 2 / 25. Question 3 / 25. Question 4 / 15. Question 5 / 20.

CSIT5300: Advanced Database Systems

Queen s University Faculty of Arts and Science School of Computing CISC 432* / 836* Advanced Database Systems

1 (10) 2 (8) 3 (12) 4 (14) 5 (6) Total (50)

EECS 647: Introduction to Database Systems

CSE 562 Final Exam Solutions

CSE 344 Final Examination

Final Review. CS634 May 11, Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Transcription:

CS 564 Final Exam Fall 015 Answers A: STORAGE AND INDEXING [0pts] I. [10pts] For the following questions, clearly circle True or False. 1. The cost of a file scan is essentially the same for a heap file and a sorted file.. It is possible to store fixed-length records in a page format (layout) designed for variable-length records. 3. If a B+ tree index follows the alternative of storing the data records directly in the leaf pages, it is automatically a clustered index. 4. Among all the non-leaf nodes of a B+ tree index, duplicate entries of search keys are allowed in the root node. 5. A hash index is suitable for range search predicates. II. [10pts] Consider the following extendible hash index. 00 16* 8* 40* Bucket A Bucket B 01 10 11 Bucket C 10* 18* 90* 6* 5* 17* 1* Bucket D 7* 11* 3* 1. [8pts] Draw the index after the following keys have been inserted: 4*, 1*, and 5*. 1

000 001 010 011 3 Bucket A 16* 8* 40* 4* Bucket C 10* 18* 90* 6* 3 Bucket B1 17* 1* 5* Bucket D 100 7* 11* 3* 101 110 111 3 Bucket B 5* 1*. [pts] Having inserted the above three keys, what is the minimum number of delete operations needed for the global depth to decrease? Clearly circle the correct answer. (Hint: A bucket is merged with its split image if and only if it becomes empty.) (a) 0 (b) 1 (c) (d) 3 (e) 4 ANSWER: (c) B: OPERATOR IMPLEMENTATIONS [30pts] I. [6pts] For the following questions, clearly circle either True or False. 1. It is not possible to speed up the evaluation of the aggregation γ COUNT( ) (R) by using a B+ tree index on R.. A block nested loop join algorithm can achieve the same number of page I/Os as a hash join algorithm for some database instances. 3. Using the double buffering technique typically reduces the total number of passes needed for an external merge sort. II. [10pts] We are given a relation R(A, B, C) with 1,000 pages. There is a clustered B+ tree index on R on the composite key (A, B). Data records are stored in a file separate from the

index. The index has non-leaf levels and 00 leaf pages. The buffer pool has 00 frames. What is the lowest possible I/O cost of computing π A (R)? Ignore the cost of writing the output to disk. Describe the optimal algorithm and explain how it achieves that cost. ANSWER: The optimal cost is 0 I/Os by using only the index file ( non-leaf pages to get to the first leaf page and 00 leaf pages). This is possible because the projection list attributes (A) form a prefix of the search key (A,B). Thus, the leaf pages are already sorted on the projection list, which enables a sort-based project using a simple sequential scan of the leaf pages. Notice that we do not have to scan the full data records, because we do not need to access attribute C to project on A. III. [14pts] We are given two relations: R with 1,000 pages and S with 40,000 pages. The buffer pool has 10 frames. We are performing a key-foreign key join of R and S wherein S has the foreign key attribute. Which of the following algorithms has the lowest I/O cost: block nested loop join, hash join, or sort-merge join? Ignore the cost of writing the output to disk. Assume that the fudge factor for constructing a hash table is f = 1.4. Explain your answer in detail. (Hint: It is possible to obtain and justify the answer fully without even computing all the I/O costs exactly.) ANSWER: Hash join. Since 10 > 1.4 1, 000, hash join will have an I/O cost of 3 ( R + S ). But since 10 < 40, 000, sort-merge will not finish in passes, and will need to do a full external merge sort, which raises the I/O cost compared to the hash join. Block nested loop join will obviously be much slower than hash join, since there will be 1.4 1, 000/100 = 14 blocks of R, and hence, 14 passes over S. C: QUERY OPTIMIZATION [0pts] I. [14pts] Consider the following database schema: Product (PID, PName, Price) Company (CID, CName, Location, PID) Company.PID is a foreign key referring to Product.PID. Product has 0,000 tuples, and each record is 0 bytes long. Company has 1,000 tuples, and each record is 5 bytes long. Each page can hold 5,000 bytes. The buffer pool has 10 frames. Assume that there are 500 distinct values for the attribute Location and that its distribution is uniform. As always, the fudge factor for a hash table is f = 1.4. Consider the following SQL query: SELECT FROM WHERE PName, CName Product, Company Product.PID = Company.PID 3

AND Location = "Madison" ; Propose a physical plan (does not need to be optimal) that evaluates the above query. Compute its total I/O cost and briefly explain each component of the total cost. Ignore the cost of writing the output to disk. ANSWER: The number of pages for Product is (0, 000 0)/5, 000 = 80, and for Company is (1, 000 5)/5, 000 = 5. Thus, both tables easily fit together in the buffer pool. Thus, a simple physical plan is to first compute the join. Any join algorithm can be used; the I/O cost is just one read of each table, i.e., 80 + 5 = 85. After the join, we simply pipeline the result to an in-memory selection on Location and then an in-memory projection without duplicate elimination. Thus, we have no more I/O costs. II. [6pts] Consider the following database schema: R(A, B), S(A, B, C), T(B, D, E) For the following questions, clearly circle either True or False. 1. The following two relational algebra queries are equivalent: Q 1 =σ A=1,B> ((R S) T) Q =(σ A=1,B> (R S)) T. The following two relational algebra queries are equivalent: Q 3 =π E (σ D=1 (T S)) Q 4 =π B (S) π B,E (σ D=1 (T)) D: TRANSACTION MANAGEMENT [10pts] I. [4pts] For the following questions, clearly circle either True or False. 1. Using PL guarantees that there will be no deadlocks during a concurrent execution of transactions. 4

. All four SQL isolation levels guarantee that there will be no dirty reads during a concurrent execution of transactions. II. [6pts] Consider the following two transactions: T1 : R(A), W(A), R(B), W(B), Commit T : R(B), R(C), W(C), W(B), Commit Consider the following interleaved schedule of the two transactions: R T1 (A), R T (B), R T (C), W T1 (A), R T1 (B), W T1 (B), W T (C), W T (B), Commit T1, Commit T Is the schedule serializable? If you claim yes, write an equivalent serial (non-interleaved) execution of the two transactions. If you claim no, explain why it is not serializable. ANSWER: No, it is not serializable. There is a write-write conflict on B; the update by T1 is lost, since T overwrites it after reading an older value of B. Thus, it is not equivalent to either T1 T or T T1. E: RELATIONAL ALGEBRA AND SQL [0pts] I. [10pts] For the following two questions, clearly circle all the correct options and only the correct options. If you circle only some of the correct options and no wrong options, you will get proportional points. But if you circle even one wrong option, you will get zero for that question. (Hint: If you are not sure about an option, it is probably better not to circle it.) 1. [4pts] Which of the following symbols do not represent relational operators from Codd s original relational algebra? (a) γ (b) θ (c) δ (d) + (e) ANSWER: (a), (b), (c), (d). [6pts] This question is directly from Homework 1. Consider the following database schema: Likes (drinker, beer) Frequents (drinker, bar) Serves (bar, beer) Which of the following relational algebra queries answer the following question: which drinkers frequent only bars that serve only beers they like? 5

(a) π drinker (Likes Frequents Serves) (b) π drinker (Frequents) π drinker (((π drinker (Frequents) π beer (Serves)) Likes) Frequents Serves) (c) π drinker (Frequents) π drinker (Frequents π drinker,bar (Likes Serves)) (d) π drinker (Frequents) π drinker (Likes Frequents Serves) (e) π drinker (Frequents) π drinker (π drinker,beer (Frequents Serves) Likes) ANSWER: (b), (e) II. [10pts] Consider the following database schema (the Expedia database): Locations (Zipcode, City, State) Hotels (HotelID, Name, Zipcode, NumRooms) Hotels.Zipcode is a foreign key referring to Locations.Zipcode. Searches (SearchID, Zipcode, StartDate, Duration, NumPeople) Searches.Zipcode is a foreign key referring to Locations.Zipcode. Results (HotelID, SearchID, Rank, Price) Results.HotelID is a foreign key referring to Hotels.HotelID. Results.SearchID is a foreign key referring to Searches.SearchID. Write a single SQL query (it should be one statement) to answer the following: Get the names and cities of the top 10 hotels (in increasing order of their rank) for the search request with SearchID 564 such that all the hotels output are in the same state as the state corresponding to the zipcode from the given search request. ANSWER: SELECT H.Name, H.City FROM Results R, Hotels H, Searches S, Locations LH, Locations LS WHERE S.SearchID = 564 AND R.HotelID = H.HotelID AND R.SearchID = S.SearchID AND LH.Zipcode = H.Zipcode AND LS.Zipcode = S.Zipcode AND LH.State = LS.State ORDER BY R.Rank LIMIT 10; 6

EXTRA CREDIT (OPTIONAL) [+5pts] Consider the relation R(K, A 1,..., A n ), where n is a positive integer, and K is the key. What is the number of non-trivial functional dependencies on R? Explain your answer. ANSWER: K is the key of R. Thus, any non-trivial FD has to be of the form XK Y. X can be any subset of A 1... A n. Fix some X of size k (= 0 to n). The number of choices for Y is n+1 k+1, since it could be any set of attributes, except a subset of XK. Thus, the total number of non-trivial FDs is n k=0 (n k )(n+1 k+1 ) = n n+1 3 n = (4 n 3 n ). 7