Homework 2: Query Processing/Optimization, Transactions/Recovery (due February 16th, 2017, 9:30am, in class hard-copy please)

Similar documents
Homework 4: Query Processing, Query Optimization (due March 21 st, 2016, 4:00pm, in class hard-copy please)

Homework 7: Transactions, Logging and Recovery (due April 22nd, 2015, 4:00pm, in class hard-copy please)

UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Homework 5: Miscellanea (due April 26 th, 2013, 9:05am, in class hard-copy please)

Homework 1: RA, SQL and B+-Trees (due Feb 7, 2017, 9:30am, in class hard-copy please)

Homework 1: RA, SQL and B+-Trees (due September 24 th, 2014, 2:30pm, in class hard-copy please)

CSE 444, Winter 2011, Midterm Examination 9 February 2011

Homework 1: Relational Algebra and SQL (due February 10 th, 2016, 4:00pm, in class hard-copy please)

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Project Assignment 2 (due April 6 th, 2016, 4:00pm, in class hard-copy please)

IMPORTANT: Circle the last two letters of your class account:

Database Management Systems (COP 5725) Homework 3

Final Review. CS634 May 11, Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Homework 2: E/R Models and More SQL (due February 17 th, 2016, 4:00pm, in class hard-copy please)

Project Assignment 2 (due April 6 th, 2015, 4:00pm, in class hard-copy please)

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

CSE 190D Spring 2017 Final Exam

CSIT5300: Advanced Database Systems

CS 564 Final Exam Fall 2015 Answers

Reminders - IMPORTANT:

Database Management Systems (CS 601) Assignments

CompSci 516 Data Intensive Computing Systems

Homework 3: Map-Reduce, Frequent Itemsets, LSH, Streams (due March 16 th, 9:30am in class hard-copy please)

CSE 444: Database Internals. Section 4: Query Optimizer

Basic form of SQL Queries

Assignment 6 Solutions

CITS3240 Databases Mid-semester 2008

Introduction to Data Management. Lecture 14 (SQL: the Saga Continues...)

CSIT5300: Advanced Database Systems

ATYPICAL RELATIONAL QUERY OPTIMIZER

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

CSD Univ. of Crete Fall 2018 TUTORIAL ON QUERY OPTIMIZATION

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

CS6302- DATABASE MANAGEMENT SYSTEMS- QUESTION BANK- II YEAR CSE- III SEM UNIT I

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Homework 6: FDs, NFs and XML (due April 15 th, 2015, 4:00pm, hard-copy in-class please)

Database Applications (15-415)

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Name Class Account UNIVERISTY OF CALIFORNIA, BERKELEY College of Engineering Department of EECS, Computer Science Division J.

CSE 444 Midterm Exam

CSE 190D Spring 2017 Final Exam Answers

Query Processing and Query Optimization. Prof Monika Shah

Question 1 (a) 10 marks

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Final Exam CSE232, Spring 97

DATABASE MANAGEMENT SYSTEMS

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

CS2255 DATABASE MANAGEMENT SYSTEMS QUESTION BANK UNIT I

Overview of Implementing Relational Operators and Query Evaluation

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Database Applications (15-415)

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC

CS348: INTRODUCTION TO DATABASE MANAGEMENT (Winter, 2011) FINAL EXAMINATION

CIS 330: Applied Database Systems

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Experimenting with bags (tables and query answers with duplicate rows):

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

CMSC 461 Final Exam Study Guide

Transaction Management: Introduction (Chap. 16)

Review: Where have we been?

Relational Query Optimization

The transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery

Relational DBMS Internals Solutions Manual. A. Albano, D. Colazzo, G. Ghelli and R. Orsini

Principles of Data Management. Lecture #9 (Query Processing Overview)

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

IMPORTANT: Circle the last two letters of your class account:

CSE 444: Database Internals. Sec2on 4: Query Op2mizer

SQL Part 2. Kathleen Durant PhD Northeastern University CS3200 Lesson 6

Cost Models. the query database statistics description of computational resources, e.g.

Intro to Transaction Management

Queen s University Faculty of Arts and Science School of Computing CISC 432* / 836* Advanced Database Systems

Weak Levels of Consistency

SQL: Queries, Constraints, Triggers

CS330. Some Logistics. Three Topics. Indexing, Query Processing, and Transactions. Next two homework assignments out today Extra lab session:

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

A transaction is a sequence of one or more processing steps. It refers to database objects such as tables, views, joins and so forth.

SQL: Queries, Programming, Triggers

CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Final Exam Fall 2005

Introduction to Data Management. Lecture #14 (Relational Languages IV)

Introduction to Data Management. Lecture #13 (Relational Calculus, Continued) It s time for another installment of...

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

CISC437/637 Database Systems Final Exam

Transactions and Recovery Study Question Solutions

SQL. Chapter 5 FROM WHERE

Database Applications (15-415)

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz I

CS 186 Databases. and SID:

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

Midterm Exam #2 (Version A) CS 122A Winter 2017

Introduction to Data Management. Lecture #26 (Transactions, cont.)

Database Applications (15-415)

Total No. of Questions :09] [Total No. of Pages : 02

A7-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Transaction Management Overview. Transactions. Concurrency in a DBMS. Chapter 16

Transaction Management: Concurrency Control, part 2

Locking for B+ Trees. Transaction Management: Concurrency Control, part 2. Locking for B+ Trees (contd.) Locking vs. Latching

CSE 544, Winter 2009, Final Examination 11 March 2009

Transcription:

Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Spring 2017, Prakash Homework 2: Query Processing/Optimization, Transactions/Recovery (due February 16th, 2017, 9:30am, in class hard-copy please) Reminders: a. Out of 100 points. Contains 7 pages. b. Rough time-estimates: 5~8 hours. c. Please type your answers. Illegible handwriting may get no points, at the discretion of the grader. Only drawings may be hand-drawn, as long as they are neat and legible. d. There could be more than one correct answer. We shall accept them all. e. Whenever you are making an assumption, please state it clearly. f. Each HW has to be done individually, without taking any help from non-class resources (e.g. websites etc). Q1. Sorting [20 points] Suppose you have a file with N = 5 x 10 7 pages. Q1.1. (5 points) What is the total I/O cost of sorting the file using the two-way merge sort (with three buffer pages)? Now suppose you have B = 129 buffer pages. Answer the following questions using the general external sorting algorithm outlined in section 13.3 of the textbook (page 424). Please write the formula you used in calculating the answers. Q1.2. (5 points) How many runs will you produce in the first pass? Q1.3. (5 points) Now assume that we have a disk with an average seek time of 10ms, average rotation delay of 5ms and a transfer time of 1ms for each page. Assuming the cost of reading/writing a page is the sum of those values (i.e. 16ms) and do not distinguish between sequential and random disk-access any access is 16ms, what is the total running time to sort the file? Q1.4. (5 points) With 129 buffer pages, what s the maximum size of the file (in number of pages) that we can sort with 3 passes? Q2. Query Optimization (Pen-n-Paper) [20 points] Consider the following schema: Sailors(sid, sname, rating, age) Reserve(sid, did, day) Boats(bid, bname, size) 1

Reserver.sid is a foreign key to Sailors and Reserves.bid is a foreign key to Boats.bid. We are given the following information about the database: Reserves contains 10,000 records with 40 records per page. Sailors contains 1000 records with 20 records per page. Boats contains 100 records with 10 records per page. There are 50 values for Reserves.bid. There are 10 values for Sailors.rating(1..10). There are 10 values for Boat.size There are 500 values for Reserves.day. Consider the following queries: Query 1: SELECT S.sid, S.sname, B.bname FROM Sailors S, Reserves R, Boats B WHERE S.sid=R.sid AND R.bid = B.bid Query 2: SELECT S.sid, S.sname, B.bname FROM Sailors S, Reserves R, Boats B WHERE S.sid=R.sid AND R.bid = B.bid AND B.size>5 AND R.day= July 4, 2003 Q2.1. (4 points) Assuming uniform distribution of values and column independence, estimate the number of tuples returned by Query 2. Also assume you are given that the size of Query 1 is 10^4 tuples. (Hint: Estimate the selectivities of Boat.size>5 and Reserves.day= July 4, 2003 ) Q2.2. (4 points) Draw all possible left-deep join query trees for Query 1. Q2.3. (12 points) For the first join in each query plan you get in Q2.2 (i.e. the join at the bottom of the tree), what join algorithm would work best (i.e. cost the least)? Assume that you have 50 pages of buffer memory. There are no indexes; so indexed nested loop is not an option. Consider the Page Oriented Nested Join (PNJ), Block-NJ, Sort-Merge-J, and Hash-J. Make sure you also write down the formula you use for each case (check slides). Q3. Query Optimization (Hands-on) [25 points] Consider the following two relations: employees(emp_no, birth_date, first_name, last_name, gender, hire_date); emp_salaries(emp_no, salary, from_date, to_date); 2

The employees table has the id of the employee, birth date, first name, last name, gender (M or F) and hire_date. Each employee could have many salaries through their careers. These salaries are present in the emp_salaries table that has the emp_no, the salary and the dater range he had this particular salary. Assume there are no existing indexes on these tables and both relations are stores as simple heap files. We have created sample instances of these two tables on the cs4604.cs.vt.edu server. This is the first time you will be accessing the PostgreSQL server, so refer to the guidelines (do not wait till the HW due date to check your access to the server!): http://people.cs.vt.edu/~badityap/classes/cs5614-spr17/homeworks/hw2/pgsql-instructions.pdf Use the following commands on the command prompt (before logging in to psql) to copy the tables to your private database: pg_dump U YOUR-PID t employees cs5614s17 psql d YOUR-PID pg_dump U YOUR-PID t emp_salaries cs5614s17 psql d YOUR-PID Sanity Check: run the following two statements and verify the output. select count(*) from employees; // output-count = 300024 select count(*) from emp_salaries; // output-count = 2844047 For this question, it may help to familiarize yourselves with the pg_class and pg_stats tables, provided by PostgreSQL as part of their catalog. Please see the links below: http://www.postgresql.org/docs/8.4/static/view-pg-stats.html http://www.postgresql.org/docs/8.4/static/catalog-pg-class.html Now answer the following questions: Q3.1. (5 points) Using a single SQL SELECT query on the pg_class table, find if the relations have primary keys, the number of attributes and the number of disks pages occupied by each relation (employees and emp_salaries). Write down the SQL query you used and also paste the output. Q3.2. (5x3=15 points) We want to find if the employees are mostly males or females, using the gender attribute in the employees table. We will do this via two different ways. A. Write a SQL query that uses the employees table to analyze and find the most frequent gender between the males and females employees. Paste the output too. 3

B. Now write a SQL query that uses only the pg_stats table instead to analyze and find the most frequent gender. Again paste the output. C. Use the EXPLAIN ANALYZE command to find out the total runtime and analyze the execution plan that the PostgreSQL planner generates for the two queries from parts A and B. Also compare the outputs you get in parts A and B. What do you observe? Q3.3. (2 points) Consider the following query that retrieves every salary that was granted after Feb 9, 1952. Note: You do not need to run anything on the server for this part. select * from emp_salaries where from_date > '1953-09-02'; Will a non-clustered B+- Tree index on the from_date attribute help to speed up the retrieval? Explain in only 1 line. Q3.4. (3 points) Create an index on your database that will decrease the runtime of the query in Q3.3. Write the index and the time improvement. Q4: Concurrency Control and Deadlocks [15 points] Q4.1. (5 points) Consider the schedule below. R(.) stands for Read and W(.) for write. Answer the following questions. a. (1 point) Is the schedule serial? b. (2 points) Give the dependency graph. Is the schedule Conflict-Serializable? If yes, give the equivalent serial schedule, else explain briefly. c. (2 points) Could this schedule have been produced by 2PL? Either show the sequence of locks and unlocks obeying 2PL or explain why not. Q4.2. (4 points) Consider the schedule below. R(.) stands for Read and W(.) for write. 4

Answer the following questions: a. (2 points) Could this schedule have been produced by 2PL? Either show a valid sequence of locks and unlocks obeying 2PL or explain why not. b. (2 points) Could this schedule have been produced by strict-2pl? Either show a valid sequence of locks and unlocks obeying strict-2pl or explain why not. Q4.3. (6 points) Consider the two lock schedules below. S(.) stands for a shared lock and X(.) denotes an exclusive lock. Schedule 1 Schedule 2 5

Answer the following questions: a. (2 points) For both schedules, which locks will be granted or blocked by the lock manager? b. (2 points) Give the waits-for graphs for both the schedules. c. (2 points) Is there a deadlock at the end of Schedule 1? What about Schedule 2? For each explain in 1 line. Q5: Logging and Recovery [20 points] Consider the log below. The records are of the form: (t-id, object-id, old-value, newvalue). Assumptions: a. The PrevLSN has been omitted (it s easy to figure it out yourself); b. for simplicity we assume that A, B, C and D each represents a page; 1. (T1, start) 2. (T1, A, 45, 10) 3. (T2, start) 4. (T2, B, 5, 10) 5. (T2, C, 35, 10) 6. (T1, D, 15, 5) 7. (T1, commit) 8. (T3, start) 9. (T3, A, 10, 15) 10. (T2, D, 5, 20) 11. (begin checkpoint, end checkpoint) 12. (T2, commit) 13. (T4, start) 14. (T4, D, 20, 30) 15. (T3, C, 10, 15) 16. (T3, commit) 17. (T4, commit) Q5.1. (3x4=12 points) What are the values of pages A, B, C and D in the buffer pool after recovery? Also specify which transactions have been ReDo(ne) and which transactions have been UnDo(ne). You are not required to show the details of the intermediate step. a. If the system crashes just before line 6 is written to the stable storage? b. If the system crashes just before line 10 is written to the stable storage? c. If the system crashes just before line 13 is written to the stable storage? d. If the system crashes just before line 17 is written to the stable storage? 6

Q5.2. (8 points) Assume only the crash as listed in Q5.1c has really happened and a recovery has then been performed, and the dirty pages caused by T1 have been flushed to disk before line 8. a. (2 points) Show the content of the transaction table and the dirty page table at both the beginning and the end of the Analysis phase. You may assume that both the transaction table and dirty page table are empty at the beginning of line 1 of the log. b. (6 points) Show the content of log when the recovery has completed. 7