Exam 1. March 20th, CS525 - Midterm Exam Solutions

Similar documents
CS 564 Final Exam Fall 2015 Answers

Midterm Exam. October 30th, :15-4:30. CS425 - Database Organization Results

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

Final Exam. May 3rd, :30-12:30. CS520 - Data Integration, Warehousing, and Provenance

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

Homework Assignment 2. Due Date: October 21th, :30pm (noon) CS425 - Database Organization Results

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

CS 186/286 Spring 2018 Midterm 1

CSE344 Midterm Exam Winter 2017

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Midterm Exam. Name: CSE232A, Winter February 21, Brief Directions:

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Database Applications (15-415)

WEEK 3. EE562 Slides and Modified Slides from Database Management Systems, R.Ramakrishnan 1

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

CSE Midterm - Spring 2017 Solutions

Database Applications (15-415)

CSE344 Midterm Exam Fall 2016

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

CSIT5300: Advanced Database Systems

CSE 190D Spring 2017 Final Exam

CSE 562 Midterm Solutions

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

CS54100: Database Systems

Oregon State University Practice problems 2 Winster Figure 1: The original B+ tree. Figure 2: The B+ tree after the changes in (1).

CS-245 Database System Principles

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Relational Model, Relational Algebra, and SQL

Database Applications (15-415)

Chapter 3. Algorithms for Query Processing and Optimization

CS 186/286 Spring 2018 Midterm 1

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

EECS 647: Introduction to Database Systems

CS 245 Midterm Exam Winter 2014

Database Management Systems (CS 601) Assignments

CSE 530A. B+ Trees. Washington University Fall 2013

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Query Processing and Advanced Queries. Query Optimization (4)

Today's Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Example Database. Query Plan Example

CS 222/122C Fall 2017, Final Exam. Sample solutions

CSE 190D Spring 2017 Final Exam Answers

Database Management Systems Written Examination

CSE 344 Midterm Nov 1st, 2017, 1:30-2:20

CSE 344 APRIL 27 TH COST ESTIMATION

Material You Need to Know

CSE 544 Principles of Database Management Systems

Review. Support for data retrieval at the physical level:

Tree-Structured Indexes

CSIT5300: Advanced Database Systems

Question Score Points Out Of 25

Section A Arithmetic ( 5) Exercise A

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Multiple-choice (35 pt.)

Database System Concepts

CSE 444, Winter 2011, Final Examination. 17 March 2011

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Homework 1: RA, SQL and B+-Trees (due Feb 7, 2017, 9:30am, in class hard-copy please)

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

CSC 261/461 Database Systems Lecture 19

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

University of Waterloo Midterm Examination Sample Solution

Chapters 15 and 16b: Query Optimization

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

DBMS Query evaluation

CSE 444 Final Exam. August 21, Question 1 / 15. Question 2 / 25. Question 3 / 25. Question 4 / 15. Question 5 / 20.

Exam. Question: Total Points: Score:

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Practice Midterm Exam Solutions

Introduction to Database Systems CSE 444, Winter 2011

Administrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction

CS 186, Fall 1999 Midterm 2 Professor Hawthorn

3. Relational Data Model 3.5 The Tuple Relational Calculus

FDB: A Query Engine for Factorised Relational Databases

Midterm 1: CS186, Spring 2015

B-Tree. CS127 TAs. ** the best data structure ever

Fundamentals of Database Systems

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Final Exam. December 5th, :00-4:00. CS425 - Database Organization Results

Midterm Review. March 27, 2017

Query Decomposition and Data Localization

CSE 414 Midterm. Friday, April 29, 2016, 1:30-2:20. Question Points Score Total: 100

CPS 216 Spring 2003 Homework #1 Assigned: Wednesday, January 22 Due: Monday, February 10

Advanced Database Systems

CSE414 Midterm Exam Spring 2018

SQL - Data Query language

Magda Balazinska - CSE 444, Spring

CS 186, Fall 1999 Midterm 2 Professor Hawthorn

Indexing: B + -Tree. CS 377: Database Systems

CS251-SE1. Midterm 2. Tuesday 11/1 8:00pm 9:00pm. There are 16 multiple-choice questions and 6 essay questions.

Storage hierarchy. Textbook: chapters 11, 12, and 13

Database Management Systems (COP 5725) Homework 3

Transcription:

Name CWID Exam 1 March 20th, 2017 CS525 - Midterm Exam s Please leave this empty! 1 2 3 4 Sum

Things that you are not allowed to use Personal notes Textbook Printed lecture notes Phone The exam is 75 minutes long Instructions Multiple choice questions are graded in the following way: You get points for correct answers and points subtracted for wrong answers. The minimum points for each questions is 0. For example, assume there is a multiple choice question with 6 answers - each may be correct or incorrect - and each answer gives 1 point. If you answer 3 questions correct and 3 incorrect you get 0 points. If you answer 4 questions correct and 2 incorrect you get 2 points.... For your convenience the number of points for each part and questions are shown in parenthesis. There are 4 parts in this exam 1. SQL (32) 2. Relational Algebra (26) 3. Index Structures (24) 4. I/O Estimation (18) DB - Spring 2017: Page 2 (of 16)

Part 1 SQL (Total: 32 Points) Consider the following database schema and instance storing information about cities, points of interest (POI) and user reviews of points of interest. The x and y coordinates are geographical locations of POI. For cities we record the geographical extend of the city as a rectangle represented as two points: the lower left corner (xlow,ylow) and the upper right corner (xhigh,yhigh). POI pname x y category Cloud Gate 1010 915 Sculpture Pizza Joe 1014 915 Restaurant Fields Museum 1016 916 Museum Moma 1340 830 Museum city cname xlow xhigh ylow yhigh Chicago 950 1060 900 940 New York 1250 1400 790 850 reviews poi user rating review Could Gate Alice 4 looks cool Pizza Joe Peter 1 seriously overrated Pizza Joe Alice 3 my personal favourite Hints: When writing queries do only take the schema into account and not the example data given here. That is your queries should return correct results for all potential instances of this schema. Attributes with black background form the primary key of an relation. primary key of relation POI (points of interest). For example, pname is the The attribute poi of relation reviews is a foreign key to relation POI. DB - Spring 2017: Page 3 (of 16)

Question 1.1 (6 Points) Write an SQL query that returns the names of all restaurants (POIs with category equal to Restaurant ) that are in Chicago their coordinate (x,y) lies within the bounds of the rectangle stored for Chicago. SELECT pname FROM POI p, c i t y c WHERE x BETWEEN xlow AND xhigh AND y BETWEEN ylow AND yhigh AND cname = Chicago AND category = Restaurant ; DB - Spring 2017: Page 4 (of 16)

Question 1.2 (10 Points) Write an SQL query that returns the POI category with the highest number of POIs relative to its extend. That is normalize the number of POIs within each city by the area of the rectangle (xlow,ylow,xhigh,yhigh). WITH poicity AS ( SELECT cname, ( xhigh xlow) ( yhigh ylow) AS a r e a FROM POI p, c i t y c x BETWEEN xlow AND xhigh AND y BETWEEN ylow AND yhigh ), poinum AS ( SELECT cname, count ( ) / area AS relpoi FROM poicity ) SELECT cname FROM poinum p WHERE relpoi = ( SELECT max( relpoi ) FROM poinum ) ; DB - Spring 2017: Page 5 (of 16)

Question 1.3 (7 Points) Write an SQL query that returns the number of POI categories with an average rating of POIs in this category that is higher than 4. SELECT count ( ) AS numc FROM ( SELECT category FROM POI p, reviews r WHERE pname = poi GROUP BY category HAVING AVG( r a t i n g ) > 4) AS avgr DB - Spring 2017: Page 6 (of 16)

Question 1.4 (9 Points) Find the closest point of interest (POI) to geographical location (x=500, y=600) that belongs to category Restaurant and where at least 25% of ratings for this POI are higher than 3. Use euclidian distance to compute distance d(p 1, p 2 ) = (p 1.x p 2.x) 2 + (p 1.y p 2.y) 2. You can assume that the database system supports a function sqrt that computes the square root of a number. WITH AS r e s t ( SELECT pname, x, y FROM POI p JOIN r eviews r ON (pname = poi ) WHERE category = Restaurant GROUP BY pname, x, y HAVING count ( 1 ) AS totalcount / sum ( CASE WHEN r a t i n g > 3 THEN 1 ELSE 0) ), SELECT c i t y, s q r t ( ( x 500) ( x 500) + ( y 600) ( y 600)) AS d i s t FROM r e s t ORDER BY d i s t LIMIT 1 ; DB - Spring 2017: Page 7 (of 16)

Part 2 Relational Algebra (Total: 26 Points) Question 2.1 Relational Algebra (10 Points) Write an relational algebra expression over the schema from the SQL part that returns the name of the user which has written the most reviews. Use the bag semantics version of relational algebra. rewp eruser = user α a count(1) (reviews) q = π user (rewp eruser a=b user α max(a) (rewp eruser)) DB - Spring 2017: Page 8 (of 16)

Question 2.2 Relational Algebra (10 Points) Write an relational algebra expression over the schema from the SQL part that returns the distance between all pairs of POIs in New York. Ensure that each pair is only returned once. For example, return only one of (Cloud Gate, Pizza Joe) and (Pizza Joe, Cloud Gate). Also do not pair a point with itself, e.g., (Cloud Gate, Cloud Gate). Similar to the SQL case assume the existence of a function sqrt for computing the square root of a number. Use the bag semantics version of relational algebra. nyp oi = π pname,x,y (P OI xlow x x xhigh ylow y y yhigh σ pname= NewY ork (city)) q = π pname,oname,sqrt((x1 x2) (x1 x2)+(y1 y2) (y1 y2)) (ρ oname,x2,y2 (nyp oi) pname<oname ρ oname,x1,y1 (nyp oi) DB - Spring 2017: Page 9 (of 16)

Question 2.3 Relational Algebra (6 Points) Write an relational algebra expression over the schema from the SQL part that returns for each POI the highest and lowest rating. Use the bag semantics version of relational algebra. q = poi α min(rating),max(rating) (reviews) DB - Spring 2017: Page 10 (of 16)

Part 3 Index Structures (Total: 24 Points) Question 3.1 B+-tree Operations (24 Points) Given is the B+-tree shown below (n = 3 ). Execute the following operations and write down the resulting B+-tree after each step: insert(88),insert(99),insert(6),delete(3) When splitting or merging nodes follow these conventions: Leaf Split: In case a leaf node needs to be split, the left node should get the extra key if the keys cannot be split evenly. Non-Leaf Split: In case a non-leaf node is split evenly, the middle value should be taken from the right node. Node Underflow: In case of a node underflow you should first try to redistribute and only if this fails merge. Both approaches should prefer the left sibling. 10 40 60 3 4 5 15 22 45 48 61 62 DB - Spring 2017: Page 11 (of 16)

insert(88) 10 40 60 insert(99) 3 4 5 15 22 45 48 61 62 88 60 10 40 88 insert(6) 3 4 5 15 22 45 48 61 62 88 99 60 5 10 40 88 3 4 5 6 15 22 45 48 61 62 88 99 delete(3) 60 10 40 88 4 5 6 15 22 45 48 61 62 88 99 DB - Spring 2017: Page 12 (of 16)

Part 4 I/O Estimation (Total: 18 Points) Question 4.1 I/O Cost Estimation (12 = 4 + 4 + 4 Points) Consider two relations R and S with B(R) = 300 and B(S) = 20, 000. You have M = 201 memory pages available. Compute the number of I/O operations needed to join these two relations using block-nested-loop join, merge-join (the inputs are not sorted), and hash-join. You can assume that the hash function distributes keys evenly across buckets. Justify you result by showing the I/O cost estimation for each join method. Block Nested-loop: Use smaller table R as the inner. We only have three chunks of size 200. Thus, we get 2 (200 + B(S)) = 40,400 I/Os. Merge-join: Relation R can be sorted with one merge phase memory resulting in 2 2 B(R) = 1, 200 I/Os. Relation S requires one merge phase, merging 100 runs in the last phase: 2 2 B(S) = 80, 000 I/Os. The last merge phase of relation S can be combined with the last merge phase of R (2 + 100 = 102 blocks of memory required). The merge join can be execute during these merge phases avoiding on read of relations R and S. Without optimizations we get 7 (B(R) + B(S)) = 142, 100. If we execute the merge-join during the last merge phases we get 5 (B(R) + B(S)) = 101, 500. Hash-join: We need one partitioning phase for the partitions of Relation R to fit into memory. Thus, the hash-join requires 3 (B(R) + B(S)) = 60, 900 I/O. Question 4.2 External Sorting (6 Points) Consider a relation R with B(R) = 64, 000, 000. Assume that M = 33 memory pages are available for sorting. How many I/O operations are needed to sort this relation using no more than M memory pages. External sorting requires 2 (1 + log M 1 ( B(R) M ) ) B(R) = 2 5 64, 000, 000 = 640, 000, 000 I/Os. DB - Spring 2017: Page 13 (of 16)

DB - Spring 2017: Page 14 (of 16)

DB - Spring 2017: Page 15 (of 16)

DB - Spring 2017: Page 16 (of 16)