Fall 2001
University of California, Berkeley
College of Engineering
Computer Science Division, EECS
Prof. Michael J. Franklin

FINAL EXAM
CS 186 Introduction to Database Systems

NAME:                              STUDENT ID:

IMPORTANT: Circle the last two letters of your class account:
cs186   a b c d e f g h i j k l m n o p q r s t u v w x y z
        a b c d e f g h i j k l m n o p q r s t u v w x y z

DISCUSSION SECTION DAY & TIME:                    TA NAME:

General Information:
This is a closed book examination, but you are allowed two 8.5 x 11 sheets of notes (double sided). You have 2 hours and 45 minutes to answer as many questions as possible. Partial credit will be given. There are 100 points in all. You should read all of the questions before starting the exam, as some of the questions are substantially more time-consuming than others.

Write all of your answers directly on this paper. Be sure to clearly indicate your final answer for each question. Also, be sure to state any assumptions that you are making in your answers.

GOOD LUCK!!!

Problem                          Possible   Score
1. Logical Database Design          10
2. Physical Database Design          8
3. B+Trees                          12
4. Hashing                           8
5. Query Optimization               15
6. Query Estimation                  9
7. SQL                              15
8. Concurrency Control              15
9. Recovery                          8
TOTAL                              100
Use this page for scratch space if you like. CS 186 Final Exam December 18, 2001 Page 2 of 18
Question 1 Logical and Physical Database Design [4 parts, 10 points total]:

An intramural softball league plays on Kleeburger field every Monday night from 6pm to 10pm. There are eight teams in the league. So, every Monday there are four one-hour games, each team playing in one of the games.

- Each team plays all the others exactly once during a seven-week season.
- For each game, there is one Home team and one Visiting team.
- Every team has one member who serves as captain.
- The league has the phone numbers of the captains and calls the captain of the Home team if a game is cancelled for any reason.
- All team names and all phone numbers are unique.
- Player names are not unique.

The data for the league is stored in the relation G = (D, T, H, V, C, P), where the attributes are date (D), time (T), home team (H), visiting team (V), captain name (C), and phone number (P).

Queries of the following form are frequently asked, and you must be able to answer them without computing a join:
- What is the phone number and name of the captain of team X?
- Given a date Y and a time Z, who is the home team and who is the visiting team?

a) [3 points] The following two FDs hold: 1) P -> CH, 2) H -> CP. In addition, there are 6 other FDs that you must find based on the above information. Every one of them has DTHVCP as the right side. List the 6 candidate keys that make up the left sides of these FDs.

b) [2 points] Is the schema G in 3NF? If so, why? If not, give a specific reason (including a specific FD) why not.

c) [3 points] Design a lossless BCNF database schema for the intramural league that satisfies the query requirements stated above.

d) [2 points] Give an example of a query that is likely to run slower on this schema than on the relation G (English description is sufficient).
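Candidate keys in Question 1 can be checked with the standard attribute-closure algorithm. The following is a minimal sketch, not part of the exam: the function names `closure` and `is_superkey` are hypothetical, and only the two FDs stated in part (a) are encoded, so it says nothing about the six FDs you are asked to find.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under a list of FDs.

    attrs is a string of single-letter attributes, e.g. "H".
    Each FD is a (lhs, rhs) pair of such strings, e.g. ("P", "CH").
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, fds, schema):
    """True if attrs functionally determines every attribute of schema."""
    return closure(attrs, fds) >= set(schema)


# Assumption: only the two FDs given in part (a) are listed here.
STATED_FDS = [("P", "CH"), ("H", "CP")]
```

Under these two FDs alone, `closure("H", STATED_FDS)` yields the attributes H, C, and P, so H by itself is not a superkey of DTHVCP. Adding further FDs to the list and re-running `is_superkey` is one way to test a proposed key.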
Question 2 Physical Database Design [4 parts, 8 points total]:

Consider the following relation, with primary key eid:

Emp (eid: integer, sal: integer, age: real, deptid: integer)

There is a clustered index on eid and an unclustered index on age.

a) [2 points] How would you use the indexes to enforce the constraint that eid is a key?

b) [2 points] Give an example of an update that is definitely speeded up because of the available indexes. (English description is sufficient.)

c) [2 points] Give an example of an update that is definitely slowed down because of the indexes. (English description is sufficient.)

d) [2 points] Give an example of an update that is neither speeded up nor slowed down by the indexes.
Question 3 B+Trees [6 parts, 12 points total]:

For each of the following B+ Trees, decide whether it is a valid B+ Tree (i.e., one that could exist after numerous inserts and deletes) or if it is invalid. Circle your choice, and if it is invalid, describe in one sentence the single main reason why. The trees follow all rules in the book, including merging on delete. All of the trees are of order d=2.

[Figure: B+ tree for part (a). Node keys and leaf entries as extracted; the original tree layout is not recoverable: 20 45 10 17 25 32 56 70 90 2* 3* 5* 20* 22* 23* 10* 12* 14* 15* 26* 28* 30* 17* 18* 19* 32* 35* 44* 45* 47* 52* 58* 62* 68* 70* 72* 74* 85* 92* 95* 96* 98*]

a) [2 points] circle one: valid invalid
If invalid, why?

[Figure: B+ tree for part (b). Node keys and leaf entries as extracted; the original tree layout is not recoverable: 20 40 60 80 3* 8* 14* 17* 23 29 45 50 55 60* 62* 65* 75* 90* 93* 95* 99* 20* 21* 22* 24* 26* 27* 28* 31* 32* 39* 40* 42* 43* 45* 47* 48* 49* 50* 52* 54* 56* 57* 58*]

b) [2 points] circle one: valid invalid
If invalid, why?
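[Figure: B+ tree for part (c). Node keys and leaf entries as extracted; the original tree layout is not recoverable: 10 2* 3* 5* 10* 12* 14* 15*]

c) [2 points] circle one: valid invalid
If invalid, why?

[Figure: B+ tree for part (d). Node keys and leaf entries as extracted; the original tree layout is not recoverable: 20 45 10 17 25 56 70 90 2* 3* 5* 10* 12* 14* 15* 17* 18* 19* 20* 22* 23* 26* 28* 30* 45* 47* 52* 58* 62* 68* 70* 72* 74* 85* 92*]

d) [2 points] circle one: valid invalid
If invalid, why?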
[Figure: B+ tree for part (e). Node keys and leaf entries as extracted; the original tree layout is not recoverable: 24 47 10 17 24 36 47 70 91 2* 3* 5* 21* 22* 23* 10* 12* 14* 15* 26* 28* 30* 18* 19* 20* 36* 38* 44* 47* 49* 52* 58* 62* 68* 70* 72* 74* 85* 92* 95* 96*]

e) [2 points] circle one: valid invalid
If invalid, why?

[Figure: B+ tree for part (f). Node keys and leaf entries as extracted; the original tree layout is not recoverable: 40 84 30 37 50 65 73 90 95 20* 22* 25* 30* 32* 35* 36* 37* 38* 39* 41* 43* 46* 47* 50* 53* 62* 67* 68* 70* 72* 74* 76* 82* 83* 85* 86* 89* 91* 92* 94* 95* 98* 99*]

f) [2 points] circle one: valid invalid
If invalid, why?
Question 4 Hashing [1 part, 8 points]:

Consider the following 5 update operations.

operation no.   operation   key value (binary)
1               insert      20 (10100)
2               insert      46 (101110)
3               delete      13 (1101)
4               insert      18 (10010)
5               insert      23 (10111)

Now, consider an extendible hash structure where each bucket can hold up to 4 entries, with a depth 2 and an initial state as shown below.

hash function h(n) = n mod

Draw the extendible hash structure and its contents after the 5 operations have occurred in the order shown.

We recommend that you do your scratch work on this page at first. But, this page will not be graded. You MUST put your final answer on the following page!!

[Figure: initial extendible hash structure. Directory entries 00, 01, 10, 11; depth and bucket values as extracted, the original layout is not recoverable: 2 2 8 16 1 5 7 13 21 2 6 10 22]
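The binary key values in the operation table suggest that directory entries are selected by the low-order bits of the key. The sketch below illustrates that convention only; it is an assumption, since the hash function as printed above is incomplete, and the helper names `directory_slot` and `slot_label` are hypothetical. It does not simulate bucket splits or directory doubling.

```python
def directory_slot(key, depth):
    """Directory index taken from the `depth` low-order bits of key.

    Assumption: the structure indexes on least-significant bits,
    which is the same as key mod 2**depth.
    """
    return key & ((1 << depth) - 1)


def slot_label(key, depth):
    """Directory index as a zero-padded binary label, e.g. '10'."""
    return format(directory_slot(key, depth), "0{}b".format(depth))
```

Under this assumption, with depth 2 the key 20 (10100) maps to entry 00 and the key 46 (101110) maps to entry 10; with depth 3 the same keys map to 100 and 110.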
Final answer for Question 4 - Extendible Hashing:

Only this page will be graded for question 4.

The final structure should have a directory of size 8, so use the template below.
- Show all buckets and pointers.
- Label the directory entries with their corresponding hash value (as on the previous page).
- Make sure to include local depths for all buckets and the global depth of the directory.
Question 5 Query Plan Optimization [5 parts, 15 points total]:

Consider the following 2 relations:

                                Sailors             Reserves
# of pages                      500                 5,000
# of tuples                     2,500               100,000
tuples/page                     5                   20
indexes                         B+ Tree on rating   Hash Table on sid
sorted by                       sid                 date
clustered by                    sid                 date
cost of sorting by any column   2,000 I/Os          30,000 I/Os

Find the # of I/Os that will be estimated for each join on the following pages. Just to recap, here are the rules we're using.

- The indexes use Alternative 2.
- Do not include the cost of outputting the final result.
- Assume any duplicates exist together on the same page.
- The fudge factor, f, is 1. This means you can ignore it.
- The optimizer only knows how to use the simplest methods for Sort-Merge-Join and Hash-Join. No special optimizations are used for these.
- A Hash table lookup costs 1.2 I/Os to get the rid.
- A B-Tree lookup costs 3 I/Os to get the rid.

You have a buffer with 52 pages available. One page is used as the output buffer. That leaves 51 pages for you to work with. The arithmetic for this question should be very simple.
Please write neatly and circle your answer. All of these joins are on sid=sid.

a) [3 points] Cost of S Join R (S as the outer) using Index Nested Loops. If not possible, explain why.

b) [3 points] Cost of R Join S (R as the outer) using Index Nested Loops. If not possible, explain why.

c) [3 points] Cost of S Join R (S as the outer) using Block Nested Loops. If not possible, explain why.
d) [3 points] Cost of S Join R using Sort-Merge Join. If not possible, explain why.

e) [3 points] Cost of R Join S using Hash-Join. If not possible, explain why.
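The join costs in Question 5 follow the usual textbook I/O formulas. The sketch below writes three of them as functions; it assumes the common textbook forms (output cost excluded) and may not match every rule intended by the exam. All function and parameter names are hypothetical, and the numbers in the usage note are made up rather than taken from the table above.

```python
import math


def block_nested_loops_cost(outer_pages, inner_pages, block_pages):
    """Read the outer once; scan the inner once per block of the outer.

    block_pages is the number of buffer pages holding the outer block
    (an assumption about how the buffer is divided).
    """
    blocks = math.ceil(outer_pages / block_pages)
    return outer_pages + blocks * inner_pages


def index_nested_loops_cost(outer_pages, outer_tuples, lookup_cost, fetch_cost):
    """Read the outer once; probe the inner index once per outer tuple.

    lookup_cost is the I/O cost to get the rid(s) from the index and
    fetch_cost is the I/O cost to fetch the matching tuples per probe.
    """
    return outer_pages + outer_tuples * (lookup_cost + fetch_cost)


def simple_hash_join_cost(outer_pages, inner_pages):
    """Partition both relations (read + write), then read both to probe."""
    return 3 * (outer_pages + inner_pages)
```

For example, with a hypothetical 100-page outer, a 400-page inner, and 10 pages for the outer block, `block_nested_loops_cost(100, 400, 10)` gives 100 + 10 * 400 = 4100 I/Os.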
Question 6 Query Plan Estimation [5 parts, 9 points total]:

Consider the following SQL Query:

    SELECT B.name, S.name
    FROM Boats B, Reserves R, Sailors S
    WHERE B.bid = R.bid AND R.sid = S.sid
      AND B.color = 'Red' AND S.rating > 5

A (rather bad) query optimizer decides to use the following plan:

[Figure: query plan tree with the operators σ rating>5, σ color='Red', and π bid,sid over the relations S, B, and R; the original tree layout is not recoverable]

The following is part of the system catalog:

Boats (500 tuples):
          size       min value   max value   distinct values
bid       4 bytes    1           100         100
color     10 bytes   Blue        Yellow      5
name      20 bytes   Anabelle    Zues        100

Reserves (100,000 tuples):
          size       min value   max value   distinct values
bid       4 bytes    20          90          50
sid       4 bytes    12          462         400
date      8 bytes    01/12/76    03/22/01    1,500

Sailors (2,500 tuples):
          size       min value   max value   distinct values
sid       4 bytes    1           500         500
name      25 bytes   Aaron       Wendy       400
rating    4 bytes    1           20          20
a) [1 point] What is the reduction factor of the selection on color?

b) [1 point] What is the reduction factor of the projection on bid,sid?

c) [1 point] What is the reduction factor of the selection on rating?

d) [3 points] The sailing club has a policy that only high ranking members are allowed to reserve red boats. Explain how the actual output size after reduction might differ from the one a System-R optimizer would calculate for the selection on rating, due to this policy.

e) [3 points] Use System-R estimation to find out how many tuples are in the final result.
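System-R style estimation multiplies the input cardinalities by one reduction factor per predicate, treating the predicates as independent. The sketch below shows the commonly taught formulas; the function names are hypothetical, exact fractions are used to avoid rounding, and the numbers in the usage note are made up rather than taken from the catalog above.

```python
from fractions import Fraction


def rf_equality(distinct_values):
    """Reduction factor for `column = value`: 1 / (number of distinct values)."""
    return Fraction(1, distinct_values)


def rf_greater_than(value, low, high):
    """Reduction factor for `column > value`: (high - value) / (high - low)."""
    return Fraction(high - value, high - low)


def rf_equijoin(distinct_left, distinct_right):
    """Reduction factor for `left = right`: 1 / max of the distinct counts."""
    return Fraction(1, max(distinct_left, distinct_right))


def estimated_cardinality(input_cardinalities, reduction_factors):
    """Product of the input sizes times the product of all reduction factors."""
    size = Fraction(1)
    for n in input_cardinalities:
        size *= n
    for rf in reduction_factors:
        size *= rf
    return size
```

For instance, joining hypothetical relations of 1000 and 200 tuples with factors `rf_equality(4)` and `rf_equijoin(10, 40)` gives 1000 * 200 * 1/4 * 1/40 = 1250 tuples. The independence assumption built into this product is what part (d) asks you to examine.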
Question 7 SQL [3 parts, 15 points total]:

Use the following relational schema for an employee database (primary keys are underlined):

Employee(emp_SSN, emp_name, street, city, salary, manager_ssn)
Work(emp_SSN, proj_id, time)
Projects(proj_ID, project_name, budget)

Note that, in the above relations, managers are also employees. Express the following in SQL.

a) [5 points] Find the names of all employees who manage at least one other manager. Do not return duplicates.

b) [5 points] For each employee, return his/her SSN, name, and the number of projects that he/she works on. Sort your result by SSN. All employees should appear exactly once in the result. If an employee does not work on any projects, they should be returned with a count of zero.

c) [5 points] For each project that has more than 5 employees working on it, return the name of the project, its budget, and the number of workers working on it.
Question 8 Concurrency Control [4 parts, 15 points total]:

T1: W(A) W(B) COMMIT
T2: R(A) R(B) COMMIT
T3: W(B) R(?) COMMIT

I. Producible using 2 Phase Locking
II. Conflict Serializable

a) [4 points] If ? = A, this schedule is which of the following:
a. I & II   b. I only   c. II only   d. neither I nor II

b) [4 points] If ? = B, this schedule is which of the following:
a. I & II   b. I only   c. II only   d. neither I nor II

c) [4 points] If ? = C, this schedule is which of the following:
a. I & II   b. I only   c. II only   d. neither I nor II

d) [3 points] In a system that implements intention locking, why is it ok to allow two separate transactions to each hold an IX lock on the same object?
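Conflict serializability can be tested with a precedence graph: add an edge Ti -> Tj whenever an action of Ti conflicts with a later action of Tj, then check for a cycle. The following is a minimal sketch with hypothetical function names. It takes a schedule as an ordered list of (transaction, action, object) triples; the example schedules in the usage note are made up and are not the schedule from this question.

```python
def precedence_edges(schedule):
    """Edges (Ti, Tj) for each pair of conflicting actions, Ti acting first.

    Two actions conflict if they come from different transactions,
    touch the same object, and at least one of them is a write.
    """
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: node is already on the current path
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)


def is_conflict_serializable(schedule):
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    return not has_cycle(precedence_edges(schedule))
```

As a usage example, the hypothetical schedule T1:R(A), T2:W(A), T2:W(B), T1:W(B) produces the edges T1 -> T2 and T2 -> T1, so `is_conflict_serializable` returns False for it.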
Question 9 Recovery [2 parts, 8 points total]:

Your project partner decides to implement a database system that uses a buffer manager with a Steal/No Force policy and a recovery system similar to ARIES.

a) [4 points] Your project partner decides to implement the recovery system without using any checkpoints. What effect does this decision have on each of the three phases (Analysis, Redo, Undo)?

b) [4 points] Now your partner decides that the Undo phase is not needed because all the changes were lost at the time of the crash. Explain why your partner is wrong.