CS 245 Midterm Exam Winter 2014

Similar documents
CS 245 Midterm Exam Solution Winter 2015

CS-245 Database System Principles

CS161 - Final Exam Computer Science Department, Stanford University August 16, 2008

CS 245 Final Exam Winter 2016

CS 155 Final Exam. CS 155: Spring 2005 June 2005

CS 245 Final Exam Winter 2017

CS 155 Final Exam. CS 155: Spring 2004 June 2004

CS 186/286 Spring 2018 Midterm 1

CS 186/286 Spring 2018 Midterm 1

CS 155 Final Exam. CS 155: Spring 2009 June 2009

Some Practice Problems on Hardware, File Organization and Indexing

IMPORTANT: Circle the last two letters of your class account:

Database Management Systems (COP 5725) Homework 3

Spring 2013 CS 122C & CS 222 Midterm Exam (and Comprehensive Exam, Part I) (Max. Points: 100)

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

Chapter 17. Disk Storage, Basic File Structures, and Hashing. Records. Blocking

Physical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.

University of Waterloo Midterm Examination Solution

CSE344 Midterm Exam Winter 2017

Homework Assignment 2: Java Console and Graphics

Indexing Methods. Lecture 9. Storage Requirements of Databases

CS 155 Final Exam. CS 155: Spring 2012 June 11, 2012

Final Examination CSE 100 UCSD (Practice)

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Chapter 12: Query Processing. Chapter 12: Query Processing

COSC-4411(M) Midterm #1

Chapter 12: Query Processing

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005

University of Waterloo Midterm Examination Sample Solution

CS 155 Final Exam. CS 155: Spring 2011 June 3, 2011

Storage hierarchy. Textbook: chapters 11, 12, and 13

Database Management Systems Paper Solution

Indexing: Overview & Hashing. CS 377: Database Systems

CS 155 Final Exam. CS 155: Spring 2006 June 2006

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Practice Midterm Exam Solutions

Database Applications (15-415)

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005.

RAID in Practice, Overview of Indexing

Database Technology. Topic 7: Data Structures for Databases. Olaf Hartig.

Queen s University Faculty of Arts and Science School of Computing CISC 432* / 836* Advanced Database Systems

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Chapter 17 Indexing Structures for Files and Physical Database Design

Database Management Systems Written Examination

Database Applications (15-415)

External Sorting Implementing Relational Operators

Database Applications (15-415)

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC

COSC-4411(M) Midterm #1

CS 222/122C Fall 2016, Midterm Exam

Material You Need to Know

CS34800 Information Systems

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

File Structures and Indexing

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

CSIT5300: Advanced Database Systems

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

Lecture Data layout on disk. How to store relations (tables) in disk blocks. By Marina Barsky Winter 2016, University of Toronto

Database Applications (15-415)

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Hashing. Given a search key, can we guess its location in the file? Goal: Method: hash keys into addresses

Final Exam April 14, 2007 COSC 3407: Operating Systems

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Data Management for Data Science

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

Database System Concepts

Chapter 11: Indexing and Hashing

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Advanced Database Systems

RELATIONAL OPERATORS #1

Indexing: B + -Tree. CS 377: Database Systems

Overview of Storage and Indexing

Chapter 13: Query Processing

On my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise.

CS145 Midterm Examination

CSE 562 Database Systems

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

Chapter 12: Indexing and Hashing. Basic Concepts

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

Data and File Structures Chapter 11. Hashing

CS222P Fall 2017, Final Exam

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

Name: Problem 1 Consider the following two transactions: T0: read(a); read(b); if (A = 0) then B = B + 1; write(b);


CS ) PROGRAMMING ASSIGNMENT 11:00 PM 11:00 PM

Chapter 12: Indexing and Hashing

Unit 8 Physical Design And Query Execution Concepts Zvi M. Kedem 1

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

Midterm Review. March 27, 2017

CSE 444: Database Internals. Lectures 5-6 Indexing

Chapter 13: Indexing. Chapter 13. ? value. Topics. Indexing & Hashing. value. Conventional indexes B-trees Hashing schemes (self-study) record

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

CS122 Lecture 15 Winter Term,

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

Transcription:

CS 245 Midterm Exam Winter 2014 This exam is open book and notes. You can use a calculator and your laptop to access course notes and videos (but not to communicate with other people). You have 70 minutes to complete the exam. Print your name: The Honor Code is an undertaking of the students, individually and collectively: 1. that they will not give or receive aid in examinations; that they will not give or receive unpermitted aid in class work, in the preparation of reports, or in any other work that is to be used by the instructor as the basis of grading; 2. that they will do their share and take an active part in seeing to it that others as well as themselves uphold the spirit and letter of the Honor Code. The faculty on its part manifests its confidence in the honor of its students by refraining from proctoring examinations and from taking unusual and unreasonable precautions to prevent the forms of dishonesty mentioned above. The faculty will also avoid, as far as practicable, academic procedures that create temptations to violate the Honor Code. While the faculty alone has the right and obligation to set academic requirements, the students and faculty will work together to establish optimal conditions for honorable academic work. I acknowledge and accept the Honor Code. Signed: Problem Points Maximum 1 10 2 10 3 10 4 10 5 10 6 10 Total 60 1

Problem 1 (10 points) Suppose blocks of 1024 bytes are used to store variable-length records. The fixed part of the block header is 24 bytes, and includes a count of the number of records. In addition, the header contains a 4-byte pointer (offset) to each record in the block. The first record is placed at the end of the block, the second record starts where the first one ends, and so on. The size of each record is not stored explicitly, since the size can be computed from the pointers. (a) How many records of size 50 bytes could we fully fit (no spanning) in such a block? (The block is designed to hold variable size records, but these records by coincidence happen to be all the same size.) Number of records: (b) Suppose we want to fully store 20 records in the block, again all of the same size. What is the maximum size of such records? Maximum record size: (c) Now assume the block layout is changed to accommodate only fixed size records. In this case, the block header is still 24 bytes and contains the common length of the records in the block. How many records of size 50 bytes could we fully fit in such a block? Number of records: (d) For this part, assume we allow record spanning. Records are fixed size, but since we can have record fragments we need to add a 5-byte header to every fragment to store its length (and other information). (The block header is still 24 bytes.) Suppose we have 2 blocks of 1024 bytes each, we would like to store 3 records of 600 bytes each. We first allocate as much as we can of the records into the first block, and then use space from the second block. After the 3 records are stored, how many bytes are still available in the second block? Free bytes available: 2

Problem 2 (10 points) Consider a relational DBMS with the following relation E: E has attributes id, name, age, salary. id is the primary key attribute (no duplicate ids). E holds 1000 tuples with id values 1,2,3,...,1000. All records (tuples) in E are stored sequentially (in increasing order) based on their id. Each data block that holds E records can hold up to 5 records. We are building a traditional index (not a B+-tree) for this sequential file. Each index block can hold up to 10 (value, pointer) entries (where the pointer can identify either a block or a record). Index entries are sorted by value. To search for a key in an index, assume the system traverses the blocks that make up the index is sequence, starting from the first block. (a) Suppose we construct a dense index INDEX1 on E.id. How many blocks must the system read to retrieve the contents of the record with id = 567? Number of blocks: (b) To improve access speed, we add an index INDEX2 on INDEX1. What type of index makes sense, sparse or dense? Type of index: (c) After building INDEX2, how many blocks must the system read to retrieve the contents of the record with id = 567? Number of blocks: 3

(d) Suppose now build a secondary index on E.salary. What type of index makes sense, sparse or dense? Type of index: (e) Continuing with the secondary E.salary index, suppose that there are 50 distinct salary values, with an equal number of records per salary value. First let us say that we implement the E.salary index in a naive way, where duplicate values are stored in the index. That is, if there are x records salary = 1000, the index contains x entries of the form (1000, pointer). (Salaries are sorted in the index.) Say we want to read the content of all records with the largest salary. In the worst case scenario, how many blocks need to be fetched? Number of blocks: (f) Next, suppose we build E.salary without duplicate values. For instance, if there are x records with salary = 1000, there will be a single entry in the index, pointing to a set of contiguous blocks that store the x record pointers. Salaries are still sorted in the index. Ignore space requirements for pointer counts and next block pointers. In the worst case scenario, how many blocks need to be fetched to read the content of all records with the largest salary? Number of blocks: 4

Problem 3 (10 points) (a) Consider an extensible hash structure with the following characteristics: Buckets can hold up to two records. No overflow blocks are allowed. The hash function we use generates b = 4 bits total. Initially the extensible hash table is empty. Say we insert X records, where the search key of each record generates a distinct 4-bit hash value (no collisions). No records are deleted during this process. We are told that after the X insertions, 4 buckets have been allocated. (Note that the previous sentence does not refer to the size of the directory.) What is the minimum possible value of X? Minimum value of X: (b) In the same scenario as part (a), what is the maximum possible value of X? Maximum value of X: 5

(c) Next consider a linear hash table (not the extensible hash table of parts (a) and (b)). This table has the following characteristics: Buckets can hold up to two records. No overflow blocks are allowed, so when necessary the table is expanded to avoid the use of overflow chains. The hash function we use generates b = 4 bits total. Initially the linear hash table is empty. We first insert three records with hash keys 0000, 0111, and 1111. After that, two more records are inserted. The keys of all five inserted records are distinct, and there are no deletions at any point. We are told that after the insertions, a total of Y blocks have been allocated. What is the minimum possible value of Y? Minimum value of Y : (d) In the same scenario of part (c), what is the maximum possible value of Y? Maximum value of Y : 6

Problem 4 (10 points) Consider a B+ Tree where each node can have at most 3 keys. Suppose the tree initially has a root node with two children leaf nodes, which contain keys {1, 3, 5} and {7, 9, 11} respectively. The root node has a single key 7. (a) Four distinct keys from the set {0, 2, 4, 6, 8, 10, 12} are added to the tree in some order, which causes the tree to grow by one level. Which four keys were added to the tree? Write the four keys separated by commas. If there are multiple solutions, write any one solution. Sequence of 4 keys: (b) Suppose we are back to the original tree with two leaf nodes having keys {1, 3, 5} and {7, 9, 11}. Now X distinct keys from the set {0, 2, 4, 6, 8, 10, 12} are added to the tree in some order, and the tree does not grow a level. What is the minimum possible value of X? Minimum value of X: (c) In the same scenario as part (b), what is the maximum possible value of X? Maximum value of X: 7

Problem 5 (10 points) Consider two relations R(A, B, C) and S(B, C, D). We want to estimate the number of tuples and the size of the following expression: U = π ACD [(σ A=3 B=5 R) S)]. We are given the following information: T (R) = 100000; V (R, A) = 20; V (R, B) = 50; V (R, C) = 150. T (S) = 5000; V (S, B) = 100; V (S, C) = 200; V (S, D) = 30. All attributes are 10 bytes in size. We use the containment of value sets assumption. We assume query values are selected from values in the relations. (a) First consider the innermost select W = σ A=3 B=5 R. Compute the following values. T (W ) = S(W ) = V (W, A) = V (W, B) = V (W, C) = (b) Next consider the join Y = W S. Compute the following values. T (Y ) = S(Y ) = V (Y, A) = V (Y, B) = V (Y, C) = V (Y, D) = 8

(c) Finally consider the full expression U = π ACD [(σ A=3 B=5 R) S)]. Compute the following values. T (U) = S(U) = V (U, A) = V (U, C) = V (U, D) = 9

Problem 6 (10 points) State if the following statements are true or false. Please write TRUE or FALSE in the space provided. (You will be penalized for incorrect answers, so leave the box blank if you don t know.) (a) The hash join algorithm assumes that the input relations are sorted. (b) The transformation σ p (R S) = σ p (R) S is valid for both sets and bags. (c) Keeping full histograms can result in more accurate estimates than just keeping a value count, but histograms requires more processing to maintain, so they could result in worse overall performance. (d) Consider relation R(A, B, C, D). The following transformation is valid for sets, where P is a predicate involving attributes B and C: π A [σ P (R)] = σ P [π A (R)] (e) Consider a disk that partitions tracks into three groups: the outermost tracks store more data, the middle tracks store less data, and the innermost the least. In such a disk, the expected rotational delay will depend on what track is being accessed. (f) Extensible hashing makes searching over an encrypted database very efficient. 10

(g) The five minute rule can be useful for determining what data resides in memory and what data resides on disk. (h) Data access on an SSD is random access (as in main memory), not block oriented (as on a disk). (i) A traditional row store organization is better than a column store for transactional applications that access data for a single customer or entity at a time. (j) The elevator disk-scheduling algorithm is named after the way real elevators make their stops. 11