University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

CMPSCI 445 Midterm Practice Questions

NAME:
LOGIN:

Write all of your answers directly on this paper. Be sure to clearly indicate your final answer for each question. Also, be sure to state any assumptions that you are making in your answers.

Problem (Maximum Score)
1. True/False Statements
2. Relational Algebra and SQL
3. B+ Tree Indexes
4. Query Evaluation
TOTAL

Question 1 [5 parts]: True/False Statements

State if the following statements are TRUE or FALSE. Please write your answers in the boxes below. No explanation is needed.

(a) Consider relational operators σ and −. For any union-compatible relations R1 and R2 and any predicate p, σ_p(R1 − R2) ≡ σ_p(R1) − σ_p(R2). (≡ means that the left expression and the right expression always return the same answer.)

(b) Consider relational operators π and −. For any union-compatible relations R1 and R2 and any attribute set S, π_S(R1 − R2) ≡ π_S(R1) − π_S(R2).

(c) On average, random accesses to N pages are faster than a sequential scan of N pages because random I/Os tend to access different cylinders and therefore cause less contention.

(d) It is a good idea to create as many indexes as possible to expedite query processing because there is no disadvantage to having many indexes.

(e) Using an unclustered B+ tree index on age to retrieve records in sorted order of age is faster than performing a two-pass external merge sort (assuming that we have enough memory to do so).

Question 2 [3 parts]: Relational Algebra and SQL

Consider the following relational schema.

Emp(eid, ename, age, salary)
Works(eid, did)
Dept(did, dname, budget, managerid)

An employee can work in more than one department. And managerid in Dept is a foreign key referencing eid in Emp.

(a) Print the name and age of each employee who works in both the Hardware department and the Software department. Write an SQL statement for this query.

(b) Write an expression in Relational Algebra for the query in Part (a).

(c) For each employee who manages some department, print his name and the sum of the budgets of all the departments that he manages. Write an SQL statement for this query.

Question 3 [1 part]: B+ Tree Indexes

(a) Create a B+ tree, in which each node can hold at most 3 keys and 4 pointers, by inserting the following keys in the following order: 1, 10, 2, 11, 3, 4, 8, 5, 7, 6. Show the final tree below.

Question 4 [2 parts]: Query Evaluation

(a) Evaluation of Selection

Consider the following schema:

Employees(eid: integer, ename: string, sal: integer, title: string, age: integer)

Suppose that the following indexes, all using Alternative (2) for data entries, exist:
- An unclustered B+ tree index on sal.
- A clustered B+ tree index on <age, sal>.

The Employees relation contains 10,000 pages and 200,000 data records. Each Employees record is 100 bytes long and each index data entry is 20 bytes long.

Consider the following selection condition:

sal > 200 ∧ title = 'VP'

Assume that the reduction factor (RF) for an equality predicate is 1% and that for an inequality predicate is 10%. Compute the cost of the most selective access method (among the file scan and available index scans) for evaluating this selection condition.

Answer to Question 1:

(a) True.

(b) False. Suppose E1 and E2 have the same schema (name, gpa), E1 has one tuple ('Sam', 4.0), and E2 has one tuple ('Sam', 3.0). π_name(E1 − E2) returns one tuple ('Sam'), but π_name(E1) − π_name(E2) returns no result.

(c) False. N random I/Os each incur seek time and rotational delay, and are hence slower than a sequential scan of N pages.

(d) False. Indexes consume a lot of space and can slow down insertions.

(e) False. An unclustered index can lead to 1 random I/O for each record, which is in general worse than a two-pass external sort. For details, see slides 22-25 in Lec 11.

Answer to Question 2:

(a) SQL statement

SELECT E.ename, E.age
FROM Emp E, Works W1, Works W2, Dept D1, Dept D2
WHERE E.eid = W1.eid AND W1.did = D1.did AND D1.dname = 'Hardware'
  AND E.eid = W2.eid AND W2.did = D2.did AND D2.dname = 'Software'

(b) Expression in Relational Algebra

ρ(R1, π_eid(σ_dname='Hardware'(Dept) ⋈ Works))
ρ(R2, π_eid(σ_dname='Software'(Dept) ⋈ Works))
π_ename,age((R1 ∩ R2) ⋈ Emp)

(c) SQL statement

SELECT E.ename, SUM(D.budget)
FROM Dept D, Emp E
WHERE D.managerid = E.eid
GROUP BY D.managerid, E.ename

Answer to Question 3:

(a) The final B+ tree:

Root: (7)
Level 1: (3, 5) (10)
Level 2: (1, 2) (3, 4) (5, 6) (7, 8) (10, 11)
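The tree in the answer to Question 3 can be checked with a small simulation. The sketch below is not part of the original solution and makes some assumptions that the exam does not spell out: an overfull leaf (4 keys) splits 2/2 and copies the first key of the new right leaf up, and an overfull internal node pushes its middle key (index 2 of 4) up. The names `Node`, `insert`, `build`, and `levels` are hypothetical helpers introduced only for illustration.

```python
from bisect import bisect_right, insort

MAX_KEYS = 3  # each node holds at most 3 keys and 4 pointers (from the question)


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def insert(node, key):
    """Insert key under node. Return (separator, new_right_node) on a split, else None."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        # Assumed leaf split policy: 2 keys stay left, 2 move right, copy up.
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right

    i = bisect_right(node.keys, key)  # keys equal to a separator go right
    split = insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    # Assumed internal split policy: the middle key is pushed up (not copied).
    mid = len(node.keys) // 2
    sep_up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return sep_up, right


def build(keys):
    root = Node([])
    for k in keys:
        split = insert(root, k)
        if split is not None:  # the root split, so the tree grows by one level
            sep, right = split
            root = Node([sep], [root, right])
    return root


def levels(root):
    """Return the keys of every node, level by level, from the root down."""
    out, frontier = [], [root]
    while frontier:
        out.append([tuple(n.keys) for n in frontier])
        frontier = [c for n in frontier if n.children for c in n.children]
    return out


tree_levels = levels(build([1, 10, 2, 11, 3, 4, 8, 5, 7, 6]))
for depth, nodes in enumerate(tree_levels):
    print(depth, nodes)
```

Under these split policies, the three printed levels correspond to Root, Level 1, and Level 2 as listed in the answer. A different but equally valid split policy (for example, keeping 3 keys in the left leaf) could produce a different final tree.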

Answer to Question 4:

(a) sal > 200 ∧ title = 'VP'

Option 1: File scan, with a cost of 10,000 pages.

Option 2: Unclustered B+ tree index on sal.
- Index leaf pages: 10,000 / (100/20) = 2,000 leaf pages * 10% matches = 200 pages.
- Data records: 20 * 10,000 * 10% = 20,000 matches. With one I/O per match, that is 20,000 I/Os. Even with the refinement of sorting the rids by page before fetching, the cost is still about 10,000 pages.

Option 3: The clustered B+ tree index on <age, sal> does not help, because the condition has no predicate on age, the leading attribute of the search key.

In this case, the file scan is the best available method to use, with a cost of 10,000 I/Os.
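The arithmetic behind the three options can be reproduced in a few lines. This is a rough sketch, not the official solution: all variable names are hypothetical, the index cost is assumed to be the leaf pages scanned plus the data page fetches (traversal from the root is ignored), and the sorted-rid variant is assumed to touch at most every data page once.

```python
# Figures stated in Question 4.
data_pages = 10_000
records = 200_000
record_bytes = 100
entry_bytes = 20
rf_inequality_pct = 10  # reduction factor for sal > 200, in percent

# Option 1: full file scan reads every data page once.
file_scan = data_pages

# Option 2: unclustered B+ tree index on sal.
# Data entries are 1/5 the size of a record, so the leaf level is 1/5 of the file.
leaf_pages = data_pages * entry_bytes // record_bytes
leaf_pages_scanned = leaf_pages * rf_inequality_pct // 100
matches = records * rf_inequality_pct // 100

# Assumption: one data page I/O per matching record, since the index is unclustered.
unclustered = leaf_pages_scanned + matches

# Assumption: sorting rids by page caps data fetches at one I/O per data page.
unclustered_sorted = leaf_pages_scanned + min(matches, data_pages)

# Option 3 (clustered index on <age, sal>) is skipped: no predicate on age.
costs = {
    "file scan": file_scan,
    "unclustered index": unclustered,
    "unclustered index, sorted rids": unclustered_sorted,
}
best = min(costs, key=costs.get)
for name, cost in costs.items():
    print(f"{name}: {cost:,} I/Os")
print("cheapest:", best)
```

Using integer percentages instead of floating-point fractions keeps every intermediate count an exact integer.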