CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005

Similar documents
CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Final Exam Fall 2005

CS-245 Database System Principles

CS-245 Database System Principles Winter 2002 Assignment 4

CPSC 311: Analysis of Algorithms (Honors) Exam 1 October 11, 2002

CS 245 Midterm Exam Winter 2014

CS 245 Midterm Exam Solution Winter 2015

Database Management Systems Written Exam

Spring 2013 CS 122C & CS 222 Midterm Exam (and Comprehensive Exam, Part I) (Max. Points: 100)

Database Management Systems Written Examination

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001

CS 186/286 Spring 2018 Midterm 1

Some Practice Problems on Hardware, File Organization and Indexing

CSE344 Midterm Exam Fall 2016

University of Waterloo Midterm Examination Sample Solution

CS 186/286 Spring 2018 Midterm 1

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

Query Processing Strategies and Optimization

IMPORTANT: Circle the last two letters of your class account:

Database Management Systems (COP 5725) Homework 3

CS222P Fall 2017, Final Exam

Query Processing & Optimization

Storage hierarchy. Textbook: chapters 11, 12, and 13

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

Advanced Database Systems


CSE 544 Principles of Database Management Systems

Database Systems CSE 414

Chapter 12: Query Processing. Chapter 12: Query Processing

CSE344 Midterm Exam Winter 2017

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Topics to Learn. Important concepts. Tree-based index. Hash-based index

CS140 Operating Systems and Systems Programming Final Exam

Review. Support for data retrieval at the physical level:

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Database System Concepts

CMSC 424 Database design Lecture 13 Storage: Files. Mihai Pop

B-Tree. CS127 TAs. ** the best data structure ever

Problem Set 2 Solutions

Database Management Systems (COP 5725) (Spring 2007)

CS 222/122C Fall 2016, Midterm Exam

CS 564 Final Exam Fall 2015 Answers

Chapter 12: Query Processing

IMPORTANT: Circle the last two letters of your class account:

CIS 550 Fall Final Examination. December 13, Name: Penn ID:

Midterm Exam. Name: CSE232A, Winter February 21, Brief Directions:

CS140 Operating Systems and Systems Programming Final Exam

Introduction to Algorithms April 21, 2004 Massachusetts Institute of Technology. Quiz 2 Solutions

Database Applications (15-415)

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Database Applications (15-415)

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

CS 461: Database Systems. Final Review. Julia Stoyanovich

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CSE Midterm - Spring 2017 Solutions

CS 222/122C Fall 2017, Final Exam. Sample solutions

CSE 344 Midterm. Monday, November 9th, 2015, 9:30-10:20. Question Points Score Total: 70

CS140 Operating Systems and Systems Programming Final Exam

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Optimization Overview

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

RELATIONAL OPERATORS #1

CSCE 206: Structured Programming in C++

rate name People Consult Topics name rate name People Consult Topics name

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

CMSC 461 Final Exam Study Guide

CMPUT 391 Database Management Systems. An Overview of Query Processing. Textbook: Chapter 11 (first edition: Chapter 14)

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

CS 186 Midterm, Spring 2003 Page 1

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

ECE 2035 A Programming HW/SW Systems Spring problems, 5 pages Exam Three 13 April Your Name (please print clearly)

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Query Execution [15]

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Relational Query Optimization

Principles of Data Management. Lecture #9 (Query Processing Overview)

File Structures and Indexing

Final Examination CSE 100 UCSD (Practice)

Query Processing & Optimization. CS 377: Database Systems

Physical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.

Chapter 12: Indexing and Hashing

Database Languages and their Compilers

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

CompSci 516 Data Intensive Computing Systems

DBMS Query evaluation

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

CS 3330 Exam 2 Fall 2017 Computing ID:

Chapter 12: Query Processing

Transcription:

CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005 Name: Instructions: 1. This is a closed book exam. Do not use any notes or books, other than your two 8.5-by-11 inch review sheets. Do not confer with any other student. Do not use any computer equipment. 2. Show your work. Partial credit will be given. Grading will be based on correctness, clarity and neatness. 3. I suggest that you read the whole exam before beginning to work any problem. Budget your time wisely according to the point distribution. 4. There are 4 questions worth a total of 100 points, on 7 pages (including this page). DO NOT BEGIN THE EXAM UNTIL INSTRUCTED TO DO SO. GOOD LUCK! Please sign the academic integrity statement: On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work. In particular, I certify that I have not received or given any assistance that is contrary to the letter or the spirit of the guidelines for this exam. Signature: References for these problems: web site for our textbook 1

1. (30 pts total) True/False. (a) (3 pts) On average, repeated random disk I/O s are faster than repeated sequential disk I/O s because random I/O s tend to access different cylinders and therefore cause less contention. (b) (3 pts) For any data file, it is possible to construct two separate sparse indexes on different keys. (c) (3 pts) There is a benefit to constructing a two-level index that has a dense first level and a dense second level. (d) (3 pts) RAID Level 5 (having each data disk serve as the spare disk for some blocks) allows the system to tolerate up to 5 disk crashes at the same time. (e) (3 pts) Since there is no downside to having indexes, we may as well build an index on every attribute of every relation to speed up as many queries as possible. (f) (3 pts) A drawback of hashing indexes is that they are slower than B-trees for queries of the form attribute equals constant. (g) (3 pts) In order to perform bag union of relations Ê and Ë with Å memory buffers, the size of the smaller relation must be at most Å ¾. (h) (3 pts) Say a randomly chosen block is being fetched from disk. The overall access delay per byte decreases as the block size increases. (i) (3 pts) Consider relation Ê µ and a query that asks for all tuples of Ê with attribute between 30 and 100. Suppose relation Ê is stored in a file using 100 blocks, and there is a B-tree index of height 2 over Ê with key that has 30 leaf nodes. It would take fewer disk I/O s to use the index to execute the query than to perform a sequential scan on the file. (j) (3 pts) Since natural join is associative and commutative, the same number of disk I/O s are executed no matter what order a series of joins are done in. 2

2. (25 pts total) Disk Storage Organization. Suppose we have a schema with three relations, all with fixed length, fixed format records. Company(Number,...) : Contains 16,000 tuples. Each tuple is 250 bytes. Number is key and is 20 bytes. Product(Number,Id,...) : Contains 54,000 tuples. Each tuple is 200 bytes. Number refers to the number of the company that makes the product and is 20 bytes. Id is the product identifier and is 25 bytes. Model(Id,Price,...) : Contains 144,000 tuples. Each tuple is 100 bytes. Id is the id of the product involved and is 25 bytes. Price is the price of this model and is 10 bytes. Assume each company has 3 products and each product by a given company has 3 models. Compare two different strategies for organizing the records on disk. Assume a disk block holds 4096 bytes, with 96 bytes per block devoted to header information. Records are not split across blocks. Sequential Organization: All Company records are placed sequentially, in order of company number, in one set of blocks. All Product records are placed sequentially, in order of company number, in another set of blocks. All Model records are placed sequentially, in order of product id, in a third set of blocks. Clustered Organization: For each Company record, the three Product records for that company and the nine Model records for that company reside in the same block. Company records are sequenced by company number. (a) (6 pts) How many disk blocks are needed to store the three relations using the sequential organization? Justify your answer. (b) (6 pts) How many disk blocks are needed to store the three relations using the clustered organization? Justify your answer. 3

(c) (6 pts) Which storage organization is better for queries in which all Company records need to be scanned in order of company id? Justify your answer. Now consider indexing these relations. Suppose we want a primary index on Company.Number. Each index entry associates a 10 byte pointer with a key. The blocks available for use by the index hold 4096 bytes each, with 96 bytes reserved for header information. (d) (7 pts) For the sequential organization described above, how many index blocks are needed for a sparse primary index? Justify your answer. 4

3. (20 pts total) Query Execution. Suppose we have two unary relations, Ê and Ë with these tuples: Ê: 7, 2, 9, 8, 3, 9, 1, 3, 6 Ë: 8, 4, 2, 1, 3, 2, 7, 3 Show the result of joining Ê and Ë using each of the following algorithms. List the results in the order that they would be output by the join algorithm. (a) (10 pts) Naive (one tuple at a time) nested loops join. Use Ê for the outer loop and Ë for the inner loop. (b) (10 pts) Hash join. Assume there are two hash buckets, numbered 0 and 1, and that the hash function sends even values to bucket 0 and odd values to bucket 1. In the second phase, assume that bucket 0 is processed before bucket 1 and that the contents of a bucket are read in the same order as they were written. 5

4.(25 pts total) Query Compilation. Consider this schema for an on-line bookstore: Cust (CustID, Name, Address, State, Zip) Book (BookID, Title, Author, Price, Category) Order (OrderID, CustID, BookID, ShipDate) (a) (8 pts) For the following relational algebra expression (Find books by JK Rowling bought by customers in Texas for under $20), show the initial logical query plan, drawn as a tree of relational operators with relations at the leaves. Ì ØÐ ÓÓ Á ËØ Ø ¼ Ì ¼ Æ ÙØ ÓÖ ¼ ÂÃÊÓÛÐ Ò ¼ Æ È Ö ¾¼ Ù Ø º» ÇÖ Ö º» ÓÓ µµ (b) (8 pts) Transform your logical query plan by pushing selections and projections down as far as you can. Show the result as another logical query plan, drawn as a tree of relational operators with relations at the leaves 6

(c) (9 pts) Consider the following logical query plan, which finds the names of all customers who ordered a book that was shipped after Jan 1, 2005. Label each relational operator in the tree with the expected result size in terms of the number of tuples. Assume Containment of Value Sets and Preservation of Value Sets. Use the heuristics discussed in lecture, which are from the textbook. Use these statistics: Ì Orderµ ¼ ¼¼¼. Î Order,OrderIDµ ½ ¼¼¼. Î Order,CustIDµ ¼¼¼. Î Order,BookIDµ ½ ¼¼¼. Î Order,ShipDateµ ¾ ¼¼. Ì Custµ ¼¼¼. Î Cust,CustIDµ ¼¼¼. Î Cust,Nameµ ¾ ¼. Ì Bookµ ¼¼¼. 7