CSEP 514 Midterm. Tuesday, Feb. 7, 2017, 5-6:20pm. Total: 150 points


Name:

Question  Points  Score
1         50
2         25
3         50
4         25
Total:    150

This exam is CLOSED book and CLOSED devices.
You are allowed ONE letter-size page with notes (both sides).
You have 80 minutes; budget time carefully.
Please read all questions carefully before answering them.
Some questions are easier, others harder. Plan to answer all questions; do not get stuck on one question.
If you have no idea how to answer a question, write your thoughts about the question for partial credit.
Good luck!

1 SQL and Indexing

1. (50 points) The following database contains the answers collected during a poll:

    Subject(sid, name, age)
    Question(qid, description, category)
    Answer(sid, qid, vote)

Subject stores all subjects that have been polled. Question stores a set of questions. For example, a question could be "Would you rather live in a large city than in a suburb?" Answer stores the answers. The vote is 0 or 1, representing no and yes respectively. Answers are voluntary, and not every subject answers every question: if a subject did not answer a question, then that (subject, question) pair is not inserted in the relation Answer.

All primary keys are underlined. The attribute types are as follows: sid, qid, age, vote are integers; name, description, category are text.

(a) (12 points) Write SQL statements to create the tables for the polling database. Choose the right types for the attributes, and define all key and foreign key constraints. You should turn in CREATE TABLE statements.

Solution:

    drop table if exists Answer;
    drop table if exists Subject;
    drop table if exists Question;

    create table Subject(sid int primary key, name text, age int);
    create table Question(qid int primary key, description text, category text);
    create table Answer(sid int references Subject,
                        qid int references Question,
                        vote int,
                        primary key (sid, qid));

    insert into Subject values(1, 'Alice', 22);
    insert into Subject values(2, 'Bob', 33);
    insert into Subject values(3, 'Carol', 44);

    insert into Question values(10, 'Q1?', 'sports');
    insert into Question values(20, 'Q2?', 'sports');
    insert into Question values(30, 'Q3?', 'living');

    insert into Answer values(1,10,1);
    insert into Answer values(1,30,1);
    insert into Answer values(2,10,0);
    insert into Answer values(2,20,1);
    insert into Answer values(2,30,1);
    insert into Answer values(3,20,0);
    insert into Answer values(3,30,0);

Grading notes:
2 points off for missing keys
2 points off for missing references

(b) (15 points) Write a SQL query to compute, for every question, the number of yes votes. Return the questions in decreasing order of their number of yes votes. Your query should return the question id, its description, and its number of yes votes.

Solution:

    select z.qid, z.description, count(*) as cnt
    from Subject x, Answer y, Question z
    where x.sid = y.sid and y.qid = z.qid and y.vote = 1
    group by z.qid, z.description
    order by cnt desc;

Another solution is to replace count(*) with sum(y.vote) and drop the condition y.vote = 1.

Grading notes:
2 points off if vote is ignored.
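To see what the yes-vote query returns, the following sketch runs it on the three subjects and three questions inserted in the sample solution for part (a). It uses Python's built-in sqlite3 module purely as a convenient stand-in for whatever system the course used; the variable names (`YES_VOTES`, `SUM_VARIANT`, `yes_rows`, `sum_rows`) are illustrative and not part of the exam.

```python
import sqlite3

# In-memory SQLite database; schema and rows mirror the sample solution for part (a).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Subject(sid int primary key, name text, age int);
create table Question(qid int primary key, description text, category text);
create table Answer(sid int references Subject, qid int references Question,
                    vote int, primary key (sid, qid));
insert into Subject values (1, 'Alice', 22), (2, 'Bob', 33), (3, 'Carol', 44);
insert into Question values (10, 'Q1?', 'sports'), (20, 'Q2?', 'sports'),
                            (30, 'Q3?', 'living');
insert into Answer values (1,10,1), (1,30,1), (2,10,0), (2,20,1),
                          (2,30,1), (3,20,0), (3,30,0);
""")

# The solution query: count only the rows with vote = 1.
YES_VOTES = """
select z.qid, z.description, count(*) as cnt
from Subject x, Answer y, Question z
where x.sid = y.sid and y.qid = z.qid and y.vote = 1
group by z.qid, z.description
order by cnt desc;
"""

# The alternative mentioned in the solution: sum the 0/1 votes instead.
SUM_VARIANT = """
select z.qid, z.description, sum(y.vote) as cnt
from Subject x, Answer y, Question z
where x.sid = y.sid and y.qid = z.qid
group by z.qid, z.description
order by cnt desc;
"""

yes_rows = conn.execute(YES_VOTES).fetchall()
sum_rows = conn.execute(SUM_VARIANT).fetchall()
print(yes_rows)
print(sum_rows)
```

On this data the two variants agree because every question has at least one yes vote. A question that received only no votes would be absent from the count(*) result but would appear with a total of 0 in the sum(y.vote) result.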

(c) (15 points) We say that two subjects are similar if they gave the same answers to at least 50 questions. Write a SQL query to return the names of all subjects who are similar to Alice. You may assume that Alice is an existing subject in your database, and that the name Alice is unique.

Solution:

    select x.sid, x.name
    from Subject a, Answer b, Subject x, Answer y
    where a.name = 'Alice' and a.sid = b.sid
      and x.sid = y.sid
      and b.qid = y.qid
      and b.vote = y.vote
    group by x.sid, x.name
    having count(*) >= 50;

Grading notes:
2 points off for computing the aggregate in a subquery
3 points partial credit for solutions that made no sense to me
7-8 points partial credit for using having count(*) < 50 combined with not exists
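The similarity query is easier to check on a small dataset if the threshold is a parameter. The sketch below, again using Python's sqlite3 as a stand-in, runs it on the sample rows from part (a). It assumes that "same answers" means the same vote on the same question, so it matches the two Answer rows on qid as well as on vote; the function name `similar_to` and the bound parameters are illustrative additions, not part of the exam.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Subject(sid int primary key, name text, age int);
create table Question(qid int primary key, description text, category text);
create table Answer(sid int references Subject, qid int references Question,
                    vote int, primary key (sid, qid));
insert into Subject values (1, 'Alice', 22), (2, 'Bob', 33), (3, 'Carol', 44);
insert into Question values (10, 'Q1?', 'sports'), (20, 'Q2?', 'sports'),
                            (30, 'Q3?', 'living');
insert into Answer values (1,10,1), (1,30,1), (2,10,0), (2,20,1),
                          (2,30,1), (3,20,0), (3,30,0);
""")

# a/b range over the reference subject and their answers,
# x/y over every candidate subject and their answers.
SIMILAR = """
select x.sid, x.name
from Subject a, Answer b, Subject x, Answer y
where a.name = ? and a.sid = b.sid
  and x.sid = y.sid
  and b.qid = y.qid      -- assumption: compare answers to the same question
  and b.vote = y.vote
group by x.sid, x.name
having count(*) >= ?;
"""

def similar_to(name, threshold):
    """Subjects sharing at least `threshold` identical answers with `name`."""
    return sorted(conn.execute(SIMILAR, (name, threshold)).fetchall())

print(similar_to("Alice", 1))   # low threshold, suited to the tiny sample
print(similar_to("Alice", 50))  # the exam's threshold
```

Note that a subject always agrees with themselves on every question they answered, so the reference subject appears in the result whenever they answered enough questions; adding `x.sid <> a.sid` would exclude them if that is not wanted.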

(d) Consider the following four queries stated in English:

1. Find all categories where some question received a yes vote.
2. Find all categories where every question received a yes vote.
3. Find all categories where some question received only yes votes.
4. Find all categories where every question received only yes votes.

For each of the SQL queries below, indicate which of the four English queries above it corresponds to, or write NONE if it does not correspond to any English query.

i. (2 points)

    select distinct x.category
    from question x
    where not exists (select *
                      from question u, answer v
                      where x.category = u.category
                        and u.qid = v.qid and v.vote = 0);

Answer: 4

ii. (2 points)

    select distinct x.category
    from question x
    where not exists (select *
                      from answer v
                      where x.qid = v.qid and v.vote = 0);

Answer: 3

iii. (2 points)

    select distinct x.category
    from question x
    where not exists (select *
                      from question u
                      where x.category = u.category
                        and not exists (select *
                                        from answer v
                                        where u.qid = v.qid and v.vote = 1));

Answer: 2

iv. (2 points)

    select distinct x.category
    from question x, answer v
    where x.qid = v.qid and v.vote = 1;

Answer: 1
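The four SQL queries only differ in where the quantifiers sit, so a dataset on which all four return different answers makes the distinction concrete. The rows below are hypothetical (they are not from the exam): category a has only yes votes, category b has a yes and a no on every question, category c has one all-yes question and one all-no question, and category d has a single question with only a no vote. Python's sqlite3 is used as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Question(qid int primary key, description text, category text);
create table Answer(sid int, qid int references Question, vote int,
                    primary key (sid, qid));
-- Hypothetical data chosen so that the four queries give four different results.
insert into Question values (1,'q1','a'), (2,'q2','a'), (3,'q3','b'), (4,'q4','b'),
                            (5,'q5','c'), (6,'q6','c'), (7,'q7','d');
insert into Answer values (1,1,1), (1,2,1),
                          (1,3,1), (2,3,0), (1,4,1), (2,4,0),
                          (1,5,1), (1,6,0),
                          (1,7,0);
""")

QUERIES = {
    "i": """select distinct x.category from question x
            where not exists (select * from question u, answer v
                              where x.category = u.category
                                and u.qid = v.qid and v.vote = 0)""",
    "ii": """select distinct x.category from question x
             where not exists (select * from answer v
                               where x.qid = v.qid and v.vote = 0)""",
    "iii": """select distinct x.category from question x
              where not exists (select * from question u
                                where x.category = u.category
                                  and not exists (select * from answer v
                                                  where u.qid = v.qid and v.vote = 1))""",
    "iv": """select distinct x.category from question x, answer v
             where x.qid = v.qid and v.vote = 1""",
}

# Map each query label to the sorted list of categories it returns.
results = {
    label: sorted(row[0] for row in conn.execute(sql))
    for label, sql in QUERIES.items()
}
for label, categories in results.items():
    print(label, categories)
```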

2 Relational Algebra

2. (25 points)

(a) (10 points) Write a Relational Algebra expression in the form of a logical query plan (i.e., draw a tree, or write an RA expression) that is equivalent to the SQL query below.

    select distinct x.category
    from question x
    where not exists (select *
                      from question u, answer v
                      where x.category = u.category
                        and u.qid = v.qid and v.vote = 0);

Solution:

[Figure: logical query plan drawn as a tree. Recoverable node labels: Π category, Π category, category, Question, Question, σ vote=0, Answer. Annotation in the figure: "This join may be dropped".]

Grading notes:
-5 if a join
-3 for extra join
OK if missing δ
-2 points if missing or Π *
3 points partial credit for nonsense
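One way to read a plan for this query is as a set difference: all categories, minus the categories that contain a question with a no vote. The sketch below checks that reading against the original NOT EXISTS query using SQL's EXCEPT. This is an assumed formulation for illustration, not a transcription of the solution figure, and the rows are hypothetical; Python's sqlite3 serves as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Question(qid int primary key, description text, category text);
create table Answer(sid int, qid int references Question, vote int,
                    primary key (sid, qid));
-- Hypothetical data: only category 'a' has no question with a no vote.
insert into Question values (1,'q1','a'), (2,'q2','a'), (3,'q3','b'), (4,'q4','c');
insert into Answer values (1,1,1), (1,2,1), (1,3,1), (2,3,0), (1,4,0);
""")

# The SQL query from the exam.
NOT_EXISTS = """
select distinct x.category
from question x
where not exists (select *
                  from question u, answer v
                  where x.category = u.category
                    and u.qid = v.qid and v.vote = 0);
"""

# Assumed RA reading: project category from Question, then subtract the
# categories obtained by joining Question with the no-vote answers.
DIFFERENCE = """
select category from question
except
select u.category
from question u, answer v
where u.qid = v.qid and v.vote = 0;
"""

via_not_exists = sorted(r[0] for r in conn.execute(NOT_EXISTS))
via_difference = sorted(r[0] for r in conn.execute(DIFFERENCE))
print(via_not_exists, via_difference)
```

EXCEPT removes duplicates, which plays the role of the duplicate elimination implied by `select distinct`.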

(b) Consider three relations R(A, B), S(C, D), T(E, F), where all attributes are integers. Which of the following relational algebra expressions are equivalent?

i. (3 points) $(R \bowtie_{B=C} S) \bowtie_{D=E} T = R \bowtie_{B=C} (S \bowtie_{D=E} T)$
Equivalent? Answer: Yes

ii. (3 points) $\sigma_{A\,\_\,D}(R \bowtie_{B=C} S) = \sigma_{A\,\_\,B}(R) \bowtie_{B=C} \sigma_{C\,\_\,D}(S)$
Equivalent? Answer: No

iii. (3 points) $\gamma_{A,\,\mathrm{sum}(D) \to K}(R \bowtie_{B=C} S) = \pi_{A,K}(R \bowtie_{B=C} \gamma_{C,\,\mathrm{sum}(D) \to K}(S))$
Equivalent? Answer: No

iv. (3 points) $\gamma_{A,\,\mathrm{sum}(D) \to K}(R \bowtie_{B=C} S) = \gamma_{A,\,\mathrm{sum}(L) \to K}(R \bowtie_{B=C} \gamma_{C,\,\mathrm{sum}(D) \to L}(S))$
Equivalent? Answer: Yes

v. (3 points) Assume B is a key in R(A, B): $R \bowtie_{B=C} S = R \bowtie_{B=C} \Pi_{C,D}(R \bowtie_{B=C} S)$
Equivalent? Answer: Yes
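A small counterexample shows why pushing the aggregate below the join works in iv but not in iii. In the hypothetical instance below, the single A value 1 joins with two different C groups; the right-hand side of iii then returns one row per group, while iv re-aggregates the partial sums per A. The SQL translations of the γ and π operators are a sketch, run with Python's sqlite3 as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table R(A int, B int);
create table S(C int, D int);
-- Hypothetical instance: A = 1 joins with two different C groups.
insert into R values (1, 10), (1, 20);
insert into S values (10, 5), (20, 7);
""")

# Left-hand side of both iii and iv: group the join result by A.
LHS = "select A, sum(D) as K from R, S where B = C group by A"

# Right-hand side of iii: aggregate S by C, join, then only project.
RHS_III = """
select A, K
from R, (select C, sum(D) as K from S group by C) g
where B = g.C
"""

# Right-hand side of iv: aggregate S by C, join, then aggregate again by A.
RHS_IV = """
select A, sum(L) as K
from R, (select C, sum(D) as L from S group by C) g
where B = g.C
group by A
"""

lhs = sorted(conn.execute(LHS).fetchall())
rhs_iii = sorted(conn.execute(RHS_III).fetchall())
rhs_iv = sorted(conn.execute(RHS_IV).fetchall())
print(lhs, rhs_iii, rhs_iv)
```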

3 Query Execution and Indexes

3. (50 points)

(a) Answer true or false:

i. (2 points) Physical data independence means the ability of the query optimizer to select the best plan.
Answer: False

ii. (2 points) Physical data independence means that the database does not need to change when the underlying technology evolves over the years, such as the increased density of the data on hard disks.
Answer: False

iii. (2 points) Physical data independence means that the SQL queries don't need to change when we modify the physical store of the database, such as adding or removing indices or reorganizing the layout of the relations.
Answer: True

iv. (2 points) Given sufficient time and manpower, every Java program can be rewritten entirely in SQL.
Answer: False

(b) Assume that the table Answer(sid, qid, vote) is very large, and consider the following three queries:

    Q1 = select * from Answer where sid = 123456;
    Q2 = select * from Answer where qid = 333333;
    Q3 = select * from Answer where sid = 123456 and qid = 333333;

Further assume that there are many subjects and many questions, that each subject answered only a small number of questions, and that each question is answered by only a small number of subjects.

i. (2 points) Which of the queries Q1, Q2, Q3 can be answered efficiently using a B+-tree index on Answer(sid)? Indicate all queries that might benefit from this index.
Answer: Q1, Q3

ii. (2 points) Which of the queries Q1, Q2, Q3 can be answered efficiently using a B+-tree index on Answer(qid)? Indicate all queries that might benefit from this index.
Answer: Q2, Q3

iii. (2 points) Which of the queries Q1, Q2, Q3 can be answered efficiently using a B+-tree index on Answer(sid, qid)? Indicate all queries that might benefit from this index.
Answer: Q1, Q3

iv. (2 points) Which of the queries Q1, Q2, Q3 can be answered efficiently using a B+-tree index on Answer(qid, sid)? Indicate all queries that might benefit from this index.
Answer: Q2, Q3
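The prefix rule behind these answers (a composite index helps only when the predicate constrains its leading attribute) can be observed by asking a real planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3; SQLite is only a stand-in here, its B-tree indexes and planner are not the system the exam has in mind, and other engines may decide differently. The helper `uses_index` and the index name `idx` are illustrative.

```python
import sqlite3

# Predicates of Q1, Q2, Q3 from the question.
QUERIES = {
    "Q1": "sid = 123456",
    "Q2": "qid = 333333",
    "Q3": "sid = 123456 and qid = 333333",
}

def uses_index(index_columns, predicate):
    """True if SQLite plans `select * from Answer where predicate` via index idx."""
    conn = sqlite3.connect(":memory:")
    # No primary key here, so the only index available is the one under test.
    conn.execute("create table Answer(sid int, qid int, vote int)")
    conn.execute(f"create index idx on Answer({index_columns})")
    plan = conn.execute(
        f"explain query plan select * from Answer where {predicate}"
    ).fetchall()
    conn.close()
    # The fourth column of each plan row is the human-readable detail string.
    return any("USING INDEX idx" in row[3] for row in plan)

# For each candidate index, list the queries whose plan uses it.
helped = {
    cols: [name for name, pred in QUERIES.items() if uses_index(cols, pred)]
    for cols in ("sid", "qid", "sid, qid", "qid, sid")
}
for cols, names in helped.items():
    print(f"index on Answer({cols}): {names}")
```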

(c) We have a very large relation Subject(sid, name, age) and need to compute the following logical query plan: $\sigma_{age=30}(Subject)$

i. (2 points) If there exists a clustered index on age then an index-based selection is always more efficient than a sequential scan. True or false?
Answer: True

ii. (2 points) If there exists an unclustered index on age then an index-based selection is always more efficient than a sequential scan. True or false?
Answer: False

(d) Let R(A, B), S(C, D) be two large relations, and assume we have four indexes, on R(A), on R(B), on S(C), and on S(D), denoted IA, IB, IC, ID. For each expression below, indicate which indices may be useful to compute it. If you have a choice, then write accordingly, e.g. IA or both (IB and IC); if it is best not to use an index at all, then write NONE.

i. (2 points) $R \bowtie_{B=C} S$
Answer: NONE

ii. (2 points) $\sigma_{A=1234}(R) \bowtie_{B=C} S$
Answer: IA and IC

iii. (2 points) $R \bowtie_{B=C} \sigma_{D=5678}(S)$
Answer: ID and IB

iv. (2 points) $\sigma_{A=1234}(R) \bowtie_{B=C} \sigma_{D=5678}(S)$
Answer: (IA and IC) or (ID and

(e) Consider three large relations R(A, B), S(C, D), T(E, F), and the following query plan:

$(R \bowtie_{B=C} S) \bowtie_{D=E} T$

The optimizer uses the following physical plan:

Create in main memory a hash table for S
Create in main memory a hash table for T
Probe R.

Assume that the result of the plan is sent directly to the client, and is not stored in the main memory.

i. (2 points) Is R pipelined?
Answer: Yes

ii. (2 points) Is S pipelined?
Answer: No

iii. (2 points) This physical plan is executable if and only if the relation R fits entirely in main memory.
Answer: No

iv. (2 points) This physical plan is executable if and only if both S and T fit together in main memory.
Answer: Yes

v. (2 points) This physical plan is executable if and only if all three relations R, S, and T fit together in main memory.
Answer: No
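The physical plan can be sketched in a few lines of Python: S and T are materialized as in-memory hash tables, while R is consumed one tuple at a time and each result is handed to the caller immediately. This is a minimal illustration under the question's assumptions, not a real executor; the function names and the sample tuples are hypothetical.

```python
from collections import defaultdict

def build_hash_table(rows, key_index):
    """Materialize a relation in memory, keyed on one attribute."""
    table = defaultdict(list)
    for row in rows:
        table[row[key_index]].append(row)
    return table

def pipelined_plan(r_rows, s_rows, t_rows):
    """Evaluate (R join S on B=C) join T on D=E, hashing S and T and streaming R."""
    s_by_c = build_hash_table(s_rows, 0)  # S(C, D) keyed on C: read completely
    t_by_e = build_hash_table(t_rows, 0)  # T(E, F) keyed on E: read completely
    for a, b in r_rows:                   # R(A, B): one tuple in memory at a time
        for c, d in s_by_c.get(b, []):
            for e, f in t_by_e.get(d, []):
                yield (a, b, c, d, e, f)  # result goes straight to the caller

# Hypothetical sample tuples.
R = [(1, 10), (2, 20), (3, 30)]
S = [(10, 100), (20, 200), (20, 201)]
T = [(100, "x"), (201, "y")]

# R is passed as an iterator to stress that it is never stored as a whole.
result = list(pipelined_plan(iter(R), S, T))
print(result)
```

Only the two hash tables occupy memory for the duration of the plan, which is why the memory requirement concerns S and T but not R.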

(f) Answer yes or no:

i. (2 points) If a physical operator is pipelined, then it can return answers to the parent before its child operator finishes processing all the data.
Answer: Yes

ii. (2 points) If a physical operator is blocking, then it can return answers to the parent before its child operator finishes processing all the data.
Answer: No

iii. (2 points) In general, a blocking operator is more efficient than a pipelined operator.
Answer: No

iv. (2 points) In general, a blocking operator requires more memory or more disk space than a pipelined operator.
Answer: Yes

v. (2 points) A selection operator $\sigma_{A=30}(R)$ is blocking.
Answer: No

vi. (2 points) A merge-join operator $R \bowtie_{B=C} S$ is blocking.
Answer: Yes
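The difference between a pipelined and a blocking operator can be made visible by counting how many input tuples have been read when the first output tuple appears. The sketch below models operators as Python generators; the sort operator stands in for the kind of work that makes an operator blocking (for instance a sorting phase), and all names and tuples are hypothetical.

```python
def counting_source(rows, counter):
    """Yield rows one at a time, recording how many have been read so far."""
    for row in rows:
        counter["read"] += 1
        yield row

def select_op(child, predicate):
    """Pipelined selection: emits each qualifying tuple as soon as it arrives."""
    for row in child:
        if predicate(row):
            yield row

def sort_op(child, key):
    """Blocking sort: consumes the whole input before emitting anything."""
    yield from sorted(child, key=key)

# Hypothetical tuples; the first attribute plays the role of A.
rows = [(30, "a"), (25, "b"), (30, "c"), (41, "d")]

# Ask the selection for one tuple and see how much input it needed.
select_counter = {"read": 0}
first_selected = next(
    select_op(counting_source(rows, select_counter), lambda r: r[0] == 30)
)

# Ask the sort for one tuple and see how much input it needed.
sort_counter = {"read": 0}
first_sorted = next(
    sort_op(counting_source(rows, sort_counter), key=lambda r: r[0])
)

print(first_selected, select_counter["read"])
print(first_sorted, sort_counter["read"])
```

The selection produces its first answer after reading a single tuple, while the sort has to read, and hold, the entire input first.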

4 Entity-Relationship Diagrams

4. (25 points)

(a) (15 points) You are volunteering to help a political party reach out to voters. Design an E/R diagram for your database to store the following information.

Political parties: they have a name.
People: they have names, address, phone number.
Voters are people.
Volunteers are people; each volunteer is affiliated with a party.
Contact: is a relationship between a volunteer and a voter. Each contact has a date, when that contact was made.
Leans: some voters lean towards a political party (only one).

Solution:

[Figure: E/R diagram. Entity sets Person (attributes name, address, phone), Volunteer and Voter (both isa Person), and Party; relationships Contact (attribute date) between Volunteer and Voter, Affiliation between Volunteer and Party, and Leans between Voter and Party.]

(b) (10 points) Consider the E/R Diagram below:

[Figure: E/R diagram. Entity set Employee (attributes eid, name); Developer (attribute level) and Manager (attribute department) are both isa Employee; relationships Manages and Review.]

Design the corresponding relational schema. Choose reasonable types for the attributes (integer or text). Show all keys and foreign keys. You should turn in a set of CREATE TABLE statements.

Solution:

    drop table if exists Review;
    drop table if exists Developer;
    drop table if exists Manager;
    drop table if exists Employee;

    create table Employee (
      eid int primary key,
      name text,
      m int -- references Manager
      -- note: circular references in Postgres require ALTER TABLE
      -- of course, this was not required on the exam
    );

    create table Manager (
      eid int primary key references Employee,
      department text);

    create table Developer (
      eid int primary key references Employee,
      level int);

    create table Review (
      did int references Developer,
      mid int references Manager);