CSE 344 Final Examination

Similar documents
CSE 414 Database Systems section 10: Final Review. Joseph Xu 6/6/2013

CSE 344 Final Examination

CSE 344 Final Examination

CSE 414 Final Examination

McGill April 2009 Final Examination Database Systems COMP 421

CSE 344 Final Review. August 16 th

CSE 190D Spring 2017 Final Exam

CSE 444 Midterm Exam

CSE 190D Spring 2017 Final Exam Answers

CSE 344 Final Examination

CSE 344 Midterm. Wednesday, February 19, 2014, 14:30-15:20. Question Points Score Total: 100

CSE344 Final Exam Winter 2017

CSE 344 Final Examination

CSE 444, Fall 2010, Midterm Examination 10 November 2010

CSE 344 Final Exam. March 17, Question 1 / 10. Question 2 / 30. Question 3 / 18. Question 4 / 24. Question 5 / 21.

CSE 444, Winter 2011, Midterm Examination 9 February 2011

Solutions to Final Examination

IMPORTANT: Circle the last two letters of your class account:

CSE 344 AUGUST 1 ST ENTITIES

CSE 344 Midterm. Wednesday, February 19, 2014, 14:30-15:20. Question Points Score Total: 100

Database Management Systems Paper Solution

Introduction to Database Systems. Announcements CSE 444. Review: Closure, Key, Superkey. Decomposition: Schema Design using FD

CS 564 Final Exam Fall 2015 Answers

Midterm 2: CS186, Spring 2015

CSE 344 Final Examination

IMPORTANT: Circle the last two letters of your class account:

CSE 344 JULY 30 TH DB DESIGN (CH 4)

Name Class Account UNIVERISTY OF CALIFORNIA, BERKELEY College of Engineering Department of EECS, Computer Science Division J.

Database Systems CSE 414

CSEP 514 Midterm. Tuesday, Feb. 7, 2017, 5-6:20pm. Question Points Score Total: 150

CS/B.Tech/CSE/New/SEM-6/CS-601/2013 DATABASE MANAGEMENENT SYSTEM. Time Allotted : 3 Hours Full Marks : 70

CSE 344 MAY 7 TH EXAM REVIEW

Midterm Exam (Version B) CS 122A Spring 2017

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

Announcements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II

CSE 544, Winter 2009, Final Examination 11 March 2009

Introduction to Data Management CSE 344

Final Exam. December 5th, :00-4:00. CS425 - Database Organization Results

Announcements. Database Systems CSE 414. Why compute in parallel? Big Data 10/11/2017. Two Kinds of Parallel Data Processing

CSE wi Final Exam Sample Solution

Multiple-Choice. 3. When you want to see all of the awards, even those not yet granted to a student, replace JOIN in the following

CSE 562 Final Exam Solutions

Databases - Transactions II. (GF Royle, N Spadaccini ) Databases - Transactions II 1 / 22

CSE 444 Midterm Exam

A7-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

CSE 344 Midterm. Monday, November 9th, 2015, 9:30-10:20. Question Points Score Total: 70

Solutions to Final Examination

VIEW OTHER QUESTION PAPERS

CSE 344 FEBRUARY 14 TH INDEXING

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Introduction to Database Systems CSE 414. Lecture 16: Query Evaluation

CSE344 Midterm Exam Winter 2017

Introduction. Examination, INF3100, No examination support material is allowed

CSE 414 Final Examination. Name: Solution

CSE 344 Midterm. Monday, November 9th, 2015, 9:30-10:20. Question Points Score Total: 70

DATABASE MANAGEMENT SYSTEMS

Introduction to Data Management CSE 344

CPSC 421 Database Management Systems. Lecture 19: Physical Database Design Concurrency Control and Recovery

Database Management

Introduction to Database Systems CSE 344

EECS 647: Introduction to Database Systems

Database Management

CSE 444: Database Internals. Lectures Transactions

L Information Systems for Engineers. Final exam. ETH Zurich, Autumn Semester 2017 Friday

Final Review. May 9, 2017

Final Review. May 9, 2018 May 11, 2018

TRANSACTION PROPERTIES

CSE 444: Database Internals. Lectures 13 Transaction Schedules

In these relations, vin is a foreign key in ORDER referring to the CAR relation, and techid is a foreign key

Example Examination. Allocated Time: 100 minutes Maximum Points: 250

Transactions. Juliana Freire. Some slides adapted from L. Delcambre, R. Ramakrishnan, G. Lindstrom, J. Ullman and Silberschatz, Korth and Sudarshan

CSE 344 MARCH 9 TH TRANSACTIONS

CSE 344 MAY 14 TH ENTITIES

The University of British Columbia Computer Science 304 Practice Final Examination

The Relational Data Model

CSE 344 Midterm. Friday, February 8, 2013, 9:30-10:20. Question Points Score Total: 100

L i (A) = transaction T i acquires lock for element A. U i (A) = transaction T i releases lock for element A

Database Applications (15-415)

Introduction to Data Management CSE 344

CS2255 DATABASE MANAGEMENT SYSTEMS QUESTION BANK UNIT I

CSE 344 MAY 2 ND MAP/REDUCE

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414

CS6302- DATABASE MANAGEMENT SYSTEMS- QUESTION BANK- II YEAR CSE- III SEM UNIT I

Database Architectures

CSE 344 Midterm. Monday, Nov 4, 2013, 9:30-10:20. Question Points Score Total: 100

Bachelor in Information Technology (BIT) O Term-End Examination

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Examination paper for TDT4145 Data Modelling and Database Systems

Database Management Systems (CS 601) Assignments

Midterm Exam #1 Version A CS 122A Winter 2017

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations

Advanced Data Management Technologies Written Exam

CSE 344 Midterm. November 9, 2011, 9:30am - 10:20am. Question Points Score Total: 100

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF CSE COURSE PLAN

CS186 Final Exam Spring 2016

Integrity Constraints, Triggers, Transactions and Procedures

Transcription:

CSE 344 Final Examination December 12, 2012, 8:30am - 10:20am Name: Question Points Score 1 30 2 20 3 30 4 20 Total: 100 This exam is open book and open notes but NO laptops or other portable devices. You have 1h:50 minutes; budget time carefully. Please read all questions carefully before answering them. Some questions are easier, others harder; if a question sounds hard, skip it and return later. Good luck! 1

1 E/R Diagrams, Constraints, Conceptual Design 1. (30 points) (a) (10 points) For each E/R diagram, indicate if the CREATE TABLE statement that follows the diagram is consistent with the diagram or not. If it is not consistent, make the necessary changes. Only make changes that are necessary. a2 a3 b c2 a1 c1 A B C CREATE TABLE A (a1 int PRIMARY KEY, a2 int, b int, c1 int NOT NULL REFERENCES C) p1 p2 P c1 isa isa d1 C D CREATE TABLE C (p1 int PRIMARY KEY, c1 int) Page 2

h1 l1 L k1 k2 H G K g1 CREATE TABLE G (h1 int REFERENCES H, k1 int REFERENCES K, l1 int REFERENCES L, g1 int, PRIMARY KEY(h1,k1,l1 ) M N O m1 m2 o1 CREATE TABLE O (o1 int, m1 int REFERENCES M, PRIMARY KEY(m1,o1)) Answer (Corrections, if any, to the above CREATE TABLE statements): Write below or edit directly above. Solution: CREATE TABLE A (a1 int PRIMARY KEY, a2 int, a3 int, b int, c1 REFERENCES C) CREATE TABLE C (p1 int PRIMARY KEY REFERENCES P, c1 int) CREATE TABLE G is correct CREATE TABLE O (o1 int PRIMARY KEY) Page 3

(b) (10 points) You are working with a customer who just designed his E/R diagram and started to write the following CREATE TABLE statements. CREATE TABLE Product (pid INT PRIMARY KEY, name VARCHAR(20)) CREATE TABLE Inventory(pid INT PRIMARY KEY, quantity INT, FOREIGN KEY (pid) REFERENCES Product) CREATE TABLE Supplier(sid INT PRIMARY KEY, name VARCHAR(20), address VARCHAR(50)) CREATE TABLE Supplies(sid INT REFERENCES Supplier, pid INT REFERENCES Product, PRIMARY KEY (sid, pid)) CREATE TABLE PurchaseOrder (pid INT REFERENCES Product, quantity INT) The customer would like to add certain constraints to his database. Advise your customer on the appropriate implementation for each of the following constraints. If appropriate, show how to modify the CREATE TABLE statements: Ensure that attribute quantity from table Inventory is always greater than or equal to 0. Answer (Your suggestion): Solution: CREATE TABLE Inventory(pid INT PRIMARY KEY, quantity INT CHECK (quantity >= 0), FOREIGN KEY (pid) REFERENCES Product) Page 4

Whenever someone deletes a tuple in Supplier, any tuple in Supplies that referred to it should also be deleted. Answer (Your suggestion): Solution: CREATE TABLE Supplies(sid INT REFERENCES Supplier ON DELETE CASCADE, pid INT REFERENCES Product, PRIMARY KEY (sid, pid)) If the quantity of a product in inventory goes down to zero, automatically insert a tuple associated with this product in the PurchaseOrder relation (explain your suggestion in English): Answer (Your suggestion): Solution: Create a row-level trigger. Event is an update to the quantity attribute of the Inventory relation. Condition is that the quantity is now zero Action is to insert a tuple into PurchaseOrder Page 5

(c) (10 points) Consider the following relational schema and set of functional dependencies. R(A,B,C,D,E,F,G) with functional dependencies: A D D C F EG DC BF Decompose R into BCNF. Show your work for partial credit. Your answer should consist of a list of table names and attributes and an indication of the keys in each table (underlined attributes). Answer (Decompose R into BCNF): Solution: Watch-out! The first FD does NOT violate BCNF so we need to pick another one to decompose. We try the second one: Try {D} + = {B, C, D, E, F, G}. Decompose into R1(B, C, D, E, F, G) and R2(A,D). R2 has two attributes, so it is necessarily in BCNF. For R1, again not all FDs violate BCNF so we need to be careful. Try {F } + = {E, F, G}. Decompose into R11(E, F, G) and R12(B, C, D, F). Both R11 and R12 are in BCNF. Page 6

2 Transactions 2. (20 points) (a) (10 points) Consider the following transaction schedules. For each schedule, indicate if it is conflict-serializable or not: r1(a); r2(b); r1(b); w2(b); w1(a); w1(b); r2(a); w2(a); c1; c2 Answer (YES/NO): Solution: NO r1(a); r2(b); r3(a); r2(a); r3(c); r1(b); r3(b); r1(c); r2(c); c1; c2; c3 Answer (YES/NO): Solution: YES w1(a); w2(a); w1(b); w3(b); w1(c); w3(c); w2(c); c1; c2; c3 Answer (YES/NO): Solution: YES Page 7

(b) (10 points) Your friend Bob just wrote a Java application that talks to a backend SQL Server DBMS. The application enables students to register for courses. Looking through the code, you notice that Bob s application performs the following sequence of operations. Prompt the user for a student ID and password. Start a new transaction. Look up the student in the database. If the student ID is not in the database or the password is incorrect, abort the transaction. Look up the courses recommended for the student. Display the courses on the screen. While the user does not choose QUIT Prompt the user to select a course. If the course is available, then register the student. End while Commit the transaction. Explain the problem in Bob s design. Explain why this is a problem. Explain how to fix the problem. You do not need to write the updated application pseudo-code. Page 8

Answer (Be careful to answer all three parts of the question): Solution: What is the problem? Bob put too many operations, including waiting for user input, inside a single, very long transaction. Why is this a problem? This is a problem because it will reduce the throughput of the application considerably. Each transaction will hold locks for an unnecessary long period of time, preventing concurrent accesses to the data by other users. Additionally, if a failure occurs, the user will lose a lot of work as all his or her course registrations will get rolled back. How to fix the problem? The solution is to use finer-grained transactions: Check the login and password as one transaction. Look-up the recommended courses as a second transaction. Then each attempt to register for one course should be executed as its own transaction. Page 9

3 Parallel Data Processing 3. (30 points) (a) (15 points) Consider a relation R(a,b) that is horizontally partitioned across N = 3 machines as shown in the diagram below. Each machine locally stores approximately 1 of the tuples in R. The tuples are randomly organized across machines (i.e., R is N block partitioned across machines). Show a relational algebra plan for this query and how it will be executed across the N = 3 machines. Pick an efficient plan that leverages the parallelism as much as possible. Include operators that need to re-shuffle data and add a note explaining how these operators will re-shuffle that data. SELECT a, max(b) as topb FROM R WHERE a > 0 GROUP BY a Answer (Draw the parallel query plan): Machine 1 Machine 2 Machine 3 1/3 of R 1/3 of R 1/3 of R Solution: Page 10

γ a, max(b) topb γ a, max(b) topb γ a, max(b) topb hash on a hash on a hash on a γ a, max(b) b γ a, max(b) b γ a, max(b) b σ a>0 σ a>0 σ a>0 scan Machine 1 scan Machine 2 scan Machine 3 1/3 of R 1/3 of R 1/3 of R Page 11

(b) (10 points) Explain how the query would be executed in MapReduce (not Pig). Make sure to specify the computation performed in the map and the reduce functions. Answer (Describe the map and reduce functions): Solution: We describe the functions and the overall execution. Each map task scans a block of R and calls the map function for each tuple. The map function applies the selection predicate to the tuple. For each tuples that passes the selection, it outputs a record with key=a and value=b. map(string key, String value): // key: irrelevant // value: tuple if a > 0 EmitIntermediate(a, b); The MapReduce engine reshuffles the output of the map phase and groups it on the intermediate key, which is attribute a. Each reduce task computes the aggregate value for each group assigned to it (by calling the reduce function) and outputs the final result. reduce(string key, Iterator values): // key: attribute a // values: all values of b in the group topb = compute max value in the group Emit(topb); It is possible to use the combiner to perform a local aggregation before the data gets re-shuffled. Page 12

(c) (5 points) What would change if we hash-partitioned R on R.a before executing the above query? Please discuss the case of the parallel DBMS and the case of MapReduce. Answer (Explain the difference for a parallel DBMS and for MapReduce): Solution: The parallel DBMS would avoid the data re-shuffling step. It would compute the aggregates locally. Hash-partitioning the data would not make any difference for the MapReduce job as MapReduce wouldn t know that the data is hash-partitioned. Additionally, MapReduce treats map and reduce functions as black-boxes and does not perform any optimizations on them. Page 13

4 NoSQL and Data Integration 4. (20 points) (a) (10 points) In order to scale a DBMS, one approach is to horizontally partition each relation across a set of machines in a cluster. Why is it expensive to execute transactions that touch arbitrary data in such a configuration? Answer: Solution: It is difficult to ensure ACID properties for transactions that touch data on multiple machines because the DBMS instances running on those machines must agree to either COMMIT or ABORT the whole transaction. Such an agreement requires that the machines involved in the transaction run the twophase commit (2PC) protocol, which requires the exchange of multiple messages and is thus expensive. Additionally, if a transaction is allowed to touch arbitrary data, it may include expensive SQL queries that require a large amount of data to be re-shuffled between machines, which can also be very expensive. Page 14

(b) (10 points) Jack is looking at customer data from two different branches of his store. Luckily, Jack made sure that the database in each branch followed the same schema. Each branch has a table CustomersBranch holding the ID, name, and state of each customer. He thus integrates the data by defining the following view CREATE VIEW CustomersData AS SELECT B1.name, B1.state FROM CustomersBranch1 B1 UNION SELECT B2.name, B2.state FROM CustomersBranch2 B2 He now wants to analyze this data and he executes the following SQL query: SELECT state, count(*) FROM CustomersData GROUP BY state But the results are different from what he expected: Both the number of groups and the counts of customers are much higher than what he expected. Give a possible explanation for each of these two problems.. Answer (Give two reasons): Solution: Problem with too many groups: State names might be misspelled or perhaps one store uses abbreviations while the other one doesn t (value heterogeneity). Problem with too many customers: Customer names may appear with slight variations in both databases even though both refer to the same physical person (duplicate records). Page 15