Homework 5: Miscellanea (due April 26 th, 2013, 9:05am, in class hard-copy please)

Similar documents
Homework 4: Query Processing, Query Optimization (due March 21 st, 2016, 4:00pm, in class hard-copy please)

Project Assignment 2 (due April 6 th, 2016, 4:00pm, in class hard-copy please)

Homework 6: FDs, NFs and XML (due April 15 th, 2015, 4:00pm, hard-copy in-class please)

Homework 2: Query Processing/Optimization, Transactions/Recovery (due February 16th, 2017, 9:30am, in class hard-copy please)

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Project Assignment 2 (due April 6 th, 2015, 4:00pm, in class hard-copy please)

Homework 7: Transactions, Logging and Recovery (due April 22nd, 2015, 4:00pm, in class hard-copy please)

Homework 1: Relational Algebra and SQL (due February 10 th, 2016, 4:00pm, in class hard-copy please)

Homework 1: RA, SQL and B+-Trees (due Feb 7, 2017, 9:30am, in class hard-copy please)

Homework 1: RA, SQL and B+-Trees (due September 24 th, 2014, 2:30pm, in class hard-copy please)

Homework 2: E/R Models and More SQL (due February 17 th, 2016, 4:00pm, in class hard-copy please)

EXAMINATION PAPER. Exam in: INF-2700 Database Systems Date: Thursday Time: Kl 15:00-19:00 Place: Åsgårdveien 9. Approved aids: None

McGill April 2009 Final Examination Database Systems COMP 421

Homework 3: Map-Reduce, Frequent Itemsets, LSH, Streams (due March 16 th, 9:30am in class hard-copy please)

SYED AMMAL ENGINEERING COLLEGE

MaanavaN.Com DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

Final Exam CSE232, Spring 97

CS 564 Final Exam Fall 2015 Answers

CS 186, Fall 1999 Midterm 2 Professor Hawthorn

CSE 414 Database Systems section 10: Final Review. Joseph Xu 6/6/2013

CS348: INTRODUCTION TO DATABASE MANAGEMENT (Winter, 2011) FINAL EXAMINATION

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

CS 186, Fall 1999 Midterm 2 Professor Hawthorn

VIEW OTHER QUESTION PAPERS

Northern India Engineering College, New Delhi Question Bank Database Management System. B. Tech. Mechanical & Automation Engineering V Semester

CS145 Midterm Examination

Handout 3: Functional Dependencies and Normalization

Relational Model 2: Relational Algebra

Total No. of Questions :09] [Total No. of Pages : 02. II/IV B.Tech. DEGREE EXAMINATIONS, NOV/DEC Second Semester CSE/IT DBMS

Midterm 2: CS186, Spring 2015

Final Exam. December 5th, :00-4:00. CS425 - Database Organization Results

LAB 3 Part 1 ADBC - Interactive SQL Queries-Advanced

CSE 344 Final Examination

Administrivia. CS186 Class Wrap-Up. News. News (cont) Top Decision Support DBs. Lessons? (from the survey and this course)

Carnegie Mellon University Department of Computer Science /615 - Database Applications C. Faloutsos & A. Pavlo, Fall 2016

CMU - SCS / Database Applications Spring 2013, C. Faloutsos Homework 1: E.R. + Formal Q.L. Deadline: 1:30pm on Tuesday, 2/5/2013

CSE 544 Principles of Database Management Systems

VALLIAMMAI ENGINEERING COLLEGE

Database Tuning and Physical Design: Execution of Transactions

CSE 344 Final Examination

Department of Information Technology B.E/B.Tech : CSE/IT Regulation: 2013 Sub. Code / Sub. Name : CS6302 Database Management Systems

CS/B.Tech/CSE/New/SEM-6/CS-601/2013 DATABASE MANAGEMENENT SYSTEM. Time Allotted : 3 Hours Full Marks : 70

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang

CMPSCI445: Information Systems

CS 474, Spring 2016 Midterm Exam #2

Final Exam CSE232, Spring 97, Solutions

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

PLEASE HAND IN UNIVERSITY OF TORONTO Faculty of Arts and Science

15-415/615 Homework 8, Page 2 of 11 Thu, 4/24/2014, 1:30pm

Sankalchand Patel College of Engineering, Visnagar B.E. Semester III (CE/IT) Database Management System Question Bank / Assignment

Introduction to Data Management CSE 344

CSE 444 Midterm Exam

CSE 444, Winter 2011, Midterm Examination 9 February 2011

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

Outline. Databases and DBMS s. Recent Database Applications. Earlier Database Applications. CMPSCI445: Information Systems.

One Size Fits All: An Idea Whose Time Has Come and Gone

Introduction to Data Management CSE 414

CSE 544 Principles of Database Management Systems

L i (A) = transaction T i acquires lock for element A. U i (A) = transaction T i releases lock for element A

TRANSACTION MANAGEMENT

CSE 344 Final Examination

CSE 344 Final Review. August 16 th

MCA (Revised) Term-End Examination December,

Babu Banarasi Das National Institute of Technology and Management

CSE 444: Database Internals. Lectures Transactions

Solutions to Final Examination

Concurrency Control - Two-Phase Locking

L Information Systems for Engineers. Final exam. ETH Zurich, Autumn Semester 2017 Friday

CS6302- DATABASE MANAGEMENT SYSTEMS- QUESTION BANK- II YEAR CSE- III SEM UNIT I

CSE 414 Midterm. April 28, Name: Question Points Score Total 101. Do not open the test until instructed to do so.

Attach extra pages as needed. Write your name and ID on any extra page that you attach. Please, write neatly.

A7-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Modern Database Systems CS-E4610

GUJARAT TECHNOLOGICAL UNIVERSITY

CSE 190D Spring 2017 Final Exam

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

CMSC 461 Final Exam Study Guide

Introduction to Database Systems CSE 444

CSCE 4523 Introduction to Database Management Systems Final Exam Spring I have neither given, nor received,unauthorized assistance on this exam.

Database Systems (INFR10070) Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

CS-245 Database System Principles Winter 2002 Assignment 4

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

DATABASE MANAGEMENT SYSTEMS

Database Systems CSE 414

CSE 544, Winter 2009, Final Examination 11 March 2009

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

Reminders - IMPORTANT:

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Midterm Examination CS 265 Spring 2016 Name: I will not use notes, other exams, or any source other than my own brain on this exam: (please sign)

Introduction to Data Management CSE 344

CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Final Exam Fall 2005

Final Exam. May 3rd, :30-12:30. CS520 - Data Integration, Warehousing, and Provenance

Name Class Account UNIVERISTY OF CALIFORNIA, BERKELEY College of Engineering Department of EECS, Computer Science Division J.

Bachelor in Information Technology (BIT) O Term-End Examination

CSE344 Final Exam Winter 2017

Relational Model and Relational Algebra

PLEASE HAND IN UNIVERSITY OF TORONTO Faculty of Arts and Science

The University of British Columbia Computer Science 304 Practice Final Examination

CHALMERS UNIVERSITY OF TECHNOLOGY Department of Computer Science and Engineering Examination in Databases, TDA357/DIT620 Thursday 27 March 2008, 08:30

Transcription:

Virginia Tech. Computer Science CS 4604 Introduction to DBMS Spring 2013, Prakash Homework 5: Miscellanea (due April 26 th, 2013, 9:05am, in class hard-copy please) Reminders: a. Out of 100 points. b. Rough time-estimates: ~4-6 hours. c. Please type your answers. Illegible handwriting may get no points, at the discretion of the grader. Only drawings may be hand-drawn, as long as they are neat and legible. d. There could be more than one correct answer. We shall accept them all. e. Whenever you are making an assumption, please state it clearly. Q1: B-Tree [15 points] Assume the following B-tree exists with d = 2: 9 23 1 3 5 7 10 14 27 32 38 (3x5=15 points) Sketch the state of the B-tree after each step in the following sequence of insertions and deletions: Insert 28, Insert 4, Delete 38, Delete 14, Delete 9 Note: Use the insertion and deletion algorithms given in the lecture slides. Q2: MDs and 4NF [15 points] Consider the relation R(A, B, C, D) with the following set of dependencies: A B C, C D, D B Q2.1 (6 points) Use the chase process to prove that the MD C B also holds. Q2.2 (9 points) Is this relation in 4NF? Explain your answer: if it is not in 4NF, decompose it into a set of 4NF relations. Show your steps and make sure your decomposition is lossless.

Q3: Transactions [14 points] Consider the following Schedule #1: Answer the following questions (in each case explain your answer). Let s assume A = 10 and B = 5, before Schedule #1 is executed. Q3.1 (2 points) What are the values of A and B at the end of Schedule #1? Q3.2 (2 points) Is this schedule conflict serializable? Consider the following (different) Schedule #2: Answer the following questions (in each case explain your answer). Assume every Read() is lock-shared and every Write() is lock-exclusive.

Q3.3 (3 points) Is Schedule# 2 conflict serializable? Please draw the dependency graph of this schedule. Q3.4 (2 points) Is this schedule cascadeless (i.e. does not have cascading aborts)? Q3.5 (2 points) Is this schedule possible under a (non-strict) 2PL protocol? Q3.6 (2 points) Is this schedule possible under a strict 2PL protocol? Q4: XML [12 points] Q4.1 (4 points) Write the XML Schema definition of Fig. 11.20 (page 513 in textbook) as a DTD. Q4.2 (5 points) Consider the following table: The table records the pizza name, price, which ingredient is used in a particular pizza, and how much ingredient is used (recall the DB-Pizza Store example in Homework #1). Translate it to pizzas.xml, like in Figure 12.7 (Page 532 in textbook). Q4.3 (3 points) Write the following query in XQuery: Return the names of pizzas that contain 4 units of the ingredient Tomato. Q5: Data Mining and Warehousing [12 points] Data Mining is the examination of data for patterns or trends. For example, by analyzing the sales data in a company like Amazon, we can understand which item was the most profitable during a particular period; which is the most popular color for a given dress; and so on. Let us consider the following relation: clothes_sale (clothes_type, category, size, color, season, amount) In this relation, every tuple has a clothes_type (e.g., skirt, dress, and shirt ), category (e.g., men s wear, women s wear, and kid s wear ), size (e.g., Large, Medium, Small ), color (e.g., Red ), season (e.g., Winter ), and amount (e.g., 20 ) that is how many items are sold. Q5.1 (2 points) Write the SQL statement to create the CUBE(clothes_sale) by constructing a MATERIALIZED VIEW.

Q5.2 (3 points) What is the ratio of the size of CUBE(clothes_sale) to the size of clothes_sale? Assume that the fact table clothes_sale has 5 dimensions each with 5 different values. Also assume that there are 500,000 tuples in clothes_sale. Q5.3 (7 points) Answer the following queries by using the materialized view you created before in Q5.1. Q5.3.1 (2 points) Find the total sales of red shirt for each season. Q5.3.2 (3 points) Find the most popular color for each clothes_type. Q5.3.3 (2 points) Find which clothes_type is sold most for each category. Q6: Query Optimization (Pen-n-Paper) [18 points] Consider the following relational design (the primary key of the relations have been underlined): Product (pid, pname, price, description) Customer (cid, cname, address, city, phone) Order (cid, pid, orderdate, quantity) Assume that there are 400,000 different products, 20 million customers, and 1 billion order records. Furthermore, we assume there are 20,000 cities, the maximum price is 10,000 dollars and the minimum price is 10 dollars. Consider the following three queries on the given relations: Query 1: Query 2: Query 3: SELECT c.cid, cname FROM Customer as c, Order as o, Product as P WHERE c.cid = o.cid AND o.pid = p.pid AND city = Blacksburg AND pname = FIFA 2012 ; SELECT o.cid, o.pid, orderdate FROM Customer as c, Product as p, Order as o WHERE c.cid = o.cid AND o.pid = p.pid AND price > 1000; SELECT cname, city FROM Customer as c, Order as o, Product as p WHERE c1.cid = o.cid AND o.pid = p.pid AND pname = Nike Air Force OR pname = Adidas Soccer ;

Q6.1 (1x3=3 points) Describe in plain English what each query does. Q6.2 (3x3=9 points) Translate the three queries each into a left-deep-join parse tree, and then transform them to canonical form by using the transformation rules given in the lecture slides. Make sure you mention exactly which rules you apply. Note that your canonical form for each query should be a left-deep-join tree too. Q6.3 (2x3=6 points) Compute the selectivity of the predicates in the canonical form of each query you get in Q6.2. Show your steps. Q7: Query Optimization (Hands-on) [14 points] Consider the following two relations: hw5_pub (id, title, type, ee, ee_pdf, titlesignature, dblp_key, publisher, doi) hw5_author_pub (id, author, authornum) These two tables are very similar to the dblp_pub_new and dblp_author_ref_new tables of your project. In the table hw5_pub, id is the identifier of the publication, type is the type of publications, ee is url of publications and so on. In the table hw5_author_pub, author is author s name and authornum is the rank of this author in a particular publication. We have created sample instances of these two tables on the cs4604.cs.vt.edu server. Use the following commands to copy the tables to your private database: pg_dump -U YOUR-PID -t hw5_pub s13dblp psql -d YOUR-PID pg_dump -U YOUR-PID -t hw5_author_pub s13dblp psql -d YOUR- PID Then answer the following questions: Q7.1 (3 points) What is the number of disk pages occupied by the two tables respectively? Also write down the query you used to find this information. Q7.2 (4 points) Consider the following query which lists the paper titles and years of the authors who are collaborators of Philip S. Yu : Query 4: SELECT title, year FROM hw5_author_pub AS t1, hw5_author_pub AS t2, hw5_author_pub AS t3, hw5_pub AS t4 WHERE t1.author = 'Philip S. Yu' AND t1.id = t2.id AND t2.author <> 'Philip S. Yu' AND t2.author = t3.author AND t3.id = t4.id;

Use the EXPLAIN command to get the query plan returned by the optimizer for Query 4. Copy-paste the output. In addition, draw the plan using the tree/relational algebra notation as in the lecture slides (see slides #61-62, lecture 19). What is the estimated cardinality of the result of this query? Next, run the query and report the number of rows actually returned. Also report the total runtime of the query. Q7.3 (2 points) Create an index on the attribute author in the table hw5_author_pub and an index on the attribute id in the table hw5_pub. Write the SQL commands you used. Update the statistics by running VACUUM and then ANALYZE. Q7.4 (5 points) Use the EXPLAIN command again on Query 4 (after doing Q7.3), copy-paste the output and also draw the query plan returned in the tree/relational algebra notation. What is the estimated result cardinality? Report the total runtime of the query as well. Has the performance of Query 4 improved (compared to before you had the indexes)? Hints: 1. Check the statistics collected by PostgreSQL: http://www.postgresql.org/docs/8.4/static/planner-stats.html 2. How to use EXPLAIN command and understand its output: http://www.postgresql.org/docs/8.4/static/sql-explain.html http://www.postgresql.org/docs/8.4/static/performance-tips.html 3. You can use EXPLAIN ANALYZE to get the runtime of a query.