CSE344 Midterm Exam Fall 2016

CSE344 Midterm Exam Fall 2016
November 7, 2016

Please read all instructions (including these) carefully.

- This is a closed book exam. You are allowed a one page handwritten cheat sheet.
- Write your name and UW student number below.
- No electronic devices are allowed, including cell phones used merely as watches. Silence your cell phones and place them in your bag.
- Solutions will be graded on correctness and clarity. Each problem has a relatively simple and straightforward solution. Partial solutions will be graded for partial credit.
- There are 6 pages in this exam, not including this one.
- There are 3 questions, each with multiple parts. If you get stuck on a question, move on and come back to it later.
- You have 50 minutes to work on the exam.
- Please write your answers in the space provided on the exam, and clearly mark your solutions. You may use the last blank page as scratch paper. Do not use any additional scratch paper.

Good luck!

By writing your name below, you certify that you have not received any unpermitted aid for this exam, and that you will not disclose the contents of the exam to anyone in the class who has not taken it.

NAME: Solutions
STUDENT NUMBER:

Problem   Points
1         / 10
2         / 52
3         / 38
Total     / 100

Problem 1: Warm up (10 points total)

Select either True or False for each of the following questions. For each question you get 1 point for answering it correctly, -0.5 point for an incorrect answer, and 0 points for no answer. The minimum you will get for this entire problem is 0.

a) The arity of the relation R(A int, B int) with tuples (4,2), (2,2), (4,2), (3,3), (4,2) is 3.
b) Data is encoded in JSON documents using key-value pairs.
c) A relation can have a clustered index on a set of attributes.
d) A Datalog rule is safe if and only if every variable appears in a relational atom.
e) A SQL query with GROUP BY can select all attributes that are grouped on, joined upon, or are aggregates.
f) Every natural join can be rewritten using cross product and selection.
g) A foreign key does not have to reference a primary key.
h) An inner join between relations R and S always includes all tuples from R in the result.
i) Queries with universal quantifiers cannot be unnested.
j) A subquery in the SELECT clause in SQL can refer to tuple variables defined in the WHERE clause.

Problem 2: Writing Queries (52 points total)

Write the following queries using the schema below:

Product(pid, name, cid)
    cid is a foreign key to Company.cid
Company(cid, cname, city)
Purchase(pid, custid, quantity, price)
    pid is a foreign key to Product.pid, custid is a foreign key to Customer.custid
Customer(custid, name, city)

a) Write a SQL query that returns the names of companies, along with the number of products sold, for companies that have sold at least 2 different types of products anywhere. (13 points)

    SELECT c.cname, COUNT(*)
    FROM Product p, Company c, Purchase p2
    WHERE p.pid = p2.pid AND c.cid = p.cid
    GROUP BY c.cid, c.cname
    HAVING COUNT(DISTINCT p.pid) >= 2

b) Write a relational algebra query that returns the distinct names of all customers from Seattle who purchased any one type of product with quantity > 10. Write your query as a tree or a single relational algebra expression. (13 points)

$\delta(\pi_{name}(\sigma_{city='Seattle'}(Customer) \bowtie_{Customer.custid = Purchase.custid} \sigma_{quantity > 10}(Purchase)))$
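The answers to parts a) and b) can be checked on a small instance. The following is a minimal sketch using Python's built-in sqlite3 module; the rows are made-up sample data, not part of the exam. It assumes that "at least 2 different types of products" means at least two distinct pids with a purchase, and it writes the relational algebra expression of part b) as SELECT DISTINCT over a join.

```python
import sqlite3

# In-memory database with the exam schema; all rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company  (cid INTEGER PRIMARY KEY, cname TEXT, city TEXT);
CREATE TABLE Product  (pid INTEGER PRIMARY KEY, name TEXT,
                       cid INTEGER REFERENCES Company(cid));
CREATE TABLE Customer (custid INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE Purchase (pid INTEGER REFERENCES Product(pid),
                       custid INTEGER REFERENCES Customer(custid),
                       quantity INTEGER, price REAL);

INSERT INTO Company  VALUES (1, 'Acme', 'Seattle'), (2, 'Globex', 'Portland');
INSERT INTO Product  VALUES (10, 'widget', 1), (11, 'gadget', 1), (20, 'gizmo', 2);
INSERT INTO Customer VALUES (100, 'Ann', 'Seattle'), (101, 'Bob', 'Tacoma');
INSERT INTO Purchase VALUES (10, 100, 12, 5.0), (11, 101, 1, 7.0),
                            (20, 100, 3, 9.0), (20, 101, 20, 9.0);
""")

# Part a): companies with purchases of at least 2 distinct products,
# together with their total number of purchase rows.
part_a = conn.execute("""
    SELECT c.cname, COUNT(*)
    FROM Product p, Company c, Purchase p2
    WHERE p.pid = p2.pid AND c.cid = p.cid
    GROUP BY c.cid, c.cname
    HAVING COUNT(DISTINCT p.pid) >= 2
""").fetchall()

# Part b): distinct names of Seattle customers with a purchase of quantity > 10.
part_b = conn.execute("""
    SELECT DISTINCT cu.name
    FROM Customer cu JOIN Purchase pu ON cu.custid = pu.custid
    WHERE cu.city = 'Seattle' AND pu.quantity > 10
""").fetchall()

print(part_a)  # Acme has purchases of two distinct products; Globex only one
print(part_b)  # Ann is the only Seattle customer with quantity > 10
```

In this sample, Globex has two purchase rows but only one distinct product, which is why the HAVING clause counts distinct pids rather than rows.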

Schema repeated here for your reference:

Product(pid, name, cid)
    cid is a foreign key to Company.cid
Company(cid, cname, city)
Purchase(pid, custid, quantity, price)
    pid is a foreign key to Product.pid, custid is a foreign key to Customer.custid
Customer(custid, name, city)

c) Write a domain-independent relational calculus query that returns the CIDs of all companies where all of their products have been sold at least once. (13 points)

$A(x) = \exists y\, \exists z\, Company(x, y, z) \land \forall a\, (\exists b\, Product(a, b, x) \Rightarrow \exists d\, \exists e\, \exists f\, Purchase(a, d, e, f))$

d) Write a safe Datalog+negation program that returns the PIDs of the products that have never been sold, along with the names of the companies that made them. Label your answer relation Ans. (13 points)

    NonAnswers(n, p) :- Product(p, _, c), Company(c, n, _), Purchase(p, i, _, _), Customer(i, _, _)
    All(n, p) :- Product(p, _, c), Company(c, n, _)
    Ans(n, p) :- All(n, p), not NonAnswers(n, p)
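The answers to parts c) and d) can be evaluated by hand on a tiny instance. The sketch below uses plain Python sets as relations; the tuples are hypothetical sample data, and the variable names (`all_sold`, `non_answers`, `all_pairs`, `ans`) are chosen here for illustration. The universal quantifier of part c) becomes `all(...)`, and the negation of part d) becomes a set difference.

```python
# Relations as sets of tuples, following the exam schema (hypothetical data).
company = {(1, "Acme", "Seattle"), (2, "Globex", "Portland")}   # (cid, cname, city)
product = {(10, "widget", 1), (11, "gadget", 1), (20, "gizmo", 2)}  # (pid, name, cid)
customer = {(100, "Ann", "Seattle")}                            # (custid, name, city)
purchase = {(10, 100, 12, 5.0), (20, 100, 3, 9.0)}              # (pid, custid, quantity, price)

# Part c): cids of companies all of whose products appear in some purchase.
purchased_pids = {pid for (pid, _, _, _) in purchase}
all_sold = {
    cid
    for (cid, _, _) in company
    if all(pid in purchased_pids for (pid, _, pcid) in product if pcid == cid)
}

# Part d): evaluate the three Datalog rules bottom-up.
customer_ids = {custid for (custid, _, _) in customer}

# NonAnswers(n, p) :- Product(p, _, c), Company(c, n, _), Purchase(p, i, _, _), Customer(i, _, _)
non_answers = {
    (cname, pid)
    for (pid, _, pcid) in product
    for (cid, cname, _) in company
    if pcid == cid
    for (ppid, custid, _, _) in purchase
    if ppid == pid and custid in customer_ids
}

# All(n, p) :- Product(p, _, c), Company(c, n, _)
all_pairs = {
    (cname, pid)
    for (pid, _, pcid) in product
    for (cid, cname, _) in company
    if pcid == cid
}

# Ans(n, p) :- All(n, p), not NonAnswers(n, p)
ans = all_pairs - non_answers

print(all_sold)  # only Globex (cid 2) has every product sold
print(ans)       # Acme's gadget (pid 11) was never sold
```

Note that a company with no products at all would also satisfy the formula in part c), because the implication holds vacuously; the `all(...)` call over an empty sequence behaves the same way.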

Problem 3: Short Questions (38 points total)

For the following schema:

Purchase(pid, custid, quantity, price)
    pid is a foreign key to Product.pid, custid is a foreign key to Customer.custid
Customer(custid, name, city)

With statistics:

T(Purchase) = 1000
B(Purchase) = 100
V(Purchase, price) = 100, ranging from 0 to 200, with all values equally likely
T(Customer) = 3000
B(Customer) = 200
V(Customer, custid) = 3000 (corrected from 50 by a clarification during the exam)
Number of memory pages available = 20

a) (6 points) Given this query:

    SELECT *
    FROM Purchase p, Customer c
    WHERE p.custid = c.custid AND p.price < 100 AND c.custid = 42

Indicate whether each of the indexes below can help speed up query execution, assuming that it is the only index available. For each index below you get 1 point for answering it correctly, -0.5 point for an incorrect answer, and 0 points for no answer. The minimum you will get for this problem is 0.

1) Hashtable index on Purchase(price)        Yes / No
2) B-tree index on Purchase(pid, price)      Yes / No
3) Hashtable index on Customer(custid)       Yes / No
4) Hashtable index on Purchase(custid)       Yes / No
5) B-tree index on Purchase(price, pid)      Yes / No
6) Hashtable index on Purchase(price, pid)   Yes / No

b) Which join algorithm would you use to execute the join in a) to minimize execution time? Assume that there are no indexes available. Be clear about how the join will be executed, i.e., what attribute you will sort on if sorting is involved, what relation you will construct a hashtable on if one is needed, etc. Briefly explain why. (8 points)

We will first perform the selections to reduce the size of the input tables. Evaluating the selection on p.price will result in 100 * (100/200) = 50 pages (call this T1), and evaluating the selection on c.custid will result in 200 * (1/3000) pages, which rounds up to 1 page (call this T2). Since we don't have enough pages in memory to hold both T1 and T2 (51 pages versus 20 available), we don't want to perform the join using sort-merge. We can either do a nested loop join (with T2 being the outer loop relation that always resides in memory), or a hash join after constructing a hashtable on T2.

c) Are these two queries semantically equivalent?

Query 1:

    SELECT COUNT(*)
    FROM A
    WHERE A.a IN (SELECT x FROM B WHERE B.c = A.b)

Query 2:

    SELECT COUNT(*)
    FROM A, B
    WHERE A.a = B.x AND B.c = A.b

If equivalent, write "equivalent" below. If not, write "not equivalent", and describe the contents of A and B such that the two queries output different results. (8 points)

Not equivalent. Consider A(a,b) = {(1,1)} and B(x,c) = {(1,1), (1,1)}, i.e., B contains the same tuple twice. Here Query 2 returns a count of 2 while Query 1 returns a count of 1. Note that Query 1 contains a correlated subquery, which means that the inner query is evaluated once per record in A, supplying the A.b value and then checking whether the corresponding A.a value satisfies the IN clause. Each record of A is therefore counted at most once, whereas the join in Query 2 counts it once per matching record in B.
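The counterexample in part c) is easy to reproduce. The following is a small sketch using Python's built-in sqlite3 module, with tables A and B populated exactly as in the answer above; the variable names `q1` and `q2` are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a INTEGER, b INTEGER);
CREATE TABLE B (x INTEGER, c INTEGER);
INSERT INTO A VALUES (1, 1);
INSERT INTO B VALUES (1, 1), (1, 1);  -- the same tuple twice (bag semantics)
""")

# Query 1: correlated IN subquery; each row of A is counted at most once.
q1 = conn.execute("""
    SELECT COUNT(*) FROM A
    WHERE A.a IN (SELECT x FROM B WHERE B.c = A.b)
""").fetchone()[0]

# Query 2: join; each row of A is counted once per matching row of B.
q2 = conn.execute("""
    SELECT COUNT(*) FROM A, B
    WHERE A.a = B.x AND B.c = A.b
""").fetchone()[0]

print(q1, q2)  # the two counts differ because of the duplicate in B
```

The two queries agree whenever each row of A matches at most one row of B, for example when (x, c) is a key of B.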

d) Describe two differences between how relational and semi-structured data models manage data instances. (8 points)

Many possibilities. For instance:
- The relational model encodes instances using relations, whereas the semi-structured model uses structured key-value pairs.
- The semi-structured model allows arbitrary nesting of key-value pairs, whereas the relational model flattens nested data into multiple rows.
- The semi-structured model stores collections directly as attributes, whereas the relational model requires flattening them.

e) Would using an unclustered index ever perform worse than a sequential scan on the same table? If yes, describe a scenario for which it is true, otherwise write "no" below. (8 points)

Yes. When using the select operator to scan a table R, a sequential scan has a cost of B(R), whereas using an unclustered index has a cost of T(R) * X, where X is the selectivity. If T(R) * X > B(R), or in general if the number of distinct values of the indexed attribute is small (so that each value matches many tuples), then using an unclustered index will be more costly than a sequential scan.

-- END OF EXAM --