CS 186 Midterm, Spring 2003 Page 1

Similar documents
University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

CS 186/286 Spring 2018 Midterm 1

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] cs186-

UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

IMPORTANT: Circle the last two letters of your class account:

CS 186/286 Spring 2018 Midterm 1

Midterm 1: CS186, Spring I. Storage: Disk, Files, Buffers [11 points] SOLUTION. cs186-

Spring 2013 CS 122C & CS 222 Midterm Exam (and Comprehensive Exam, Part I) (Max. Points: 100)

2(Y/8K) * (1 + log15(y/(8k*16)) I/Os

Midterm 1: CS186, Spring 2015

CS-245 Database System Principles

IMPORTANT: Circle the last two letters of your class account:

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

CS 222/122C Fall 2016, Midterm Exam

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

Database Management Systems (COP 5725) Homework 2

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization

IMPORTANT: Circle the last two letters of your class account:

Storing Data: Disks and Files. Administrivia (part 2 of 2) Review. Disks, Memory, and Files. Disks and Files. Lecture 3 (R&G Chapter 7)

CS 245 Midterm Exam Winter 2014

Storing Data: Disks and Files

Disks and Files. Jim Gray s Storage Latency Analogy: How Far Away is the Data? Components of a Disk. Disks

Storing Data: Disks and Files

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Disks & Files. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

CSE 444 Homework 1 Relational Algebra, Heap Files, and Buffer Manager. Name: Question Points Score Total: 50

CSE 190D Spring 2017 Final Exam

Midterm Exam (Version B) CS 122A Spring 2017

CSE 544 Principles of Database Management Systems

Midterm 1: CS186, Spring 2012

Basant Group of Institution

Mahathma Gandhi University

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Chapter 7

Storing and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes?

1 (10) 2 (8) 3 (12) 4 (14) 5 (6) Total (50)

CS 405G: Introduction to Database Systems. Storage

STORING DATA: DISK AND FILES

Some Practice Problems on Hardware, File Organization and Indexing

Name Class Account UNIVERISTY OF CALIFORNIA, BERKELEY College of Engineering Department of EECS, Computer Science Division J.

CSE 190D Spring 2017 Final Exam Answers

CS 564 Final Exam Fall 2015 Answers

Midterm 2, CS186 Prof. Hellerstein, Spring 2012

Review 1-- Storing Data: Disks and Files

Midterm Exam #2 (Version A) CS 122A Winter 2017

Storing Data: Disks and Files. Storing and Retrieving Data. Why Not Store Everything in Main Memory? Database Management Systems need to:

Review The Big Picture

CSC 261/461 Database Systems Lecture 8. Spring 2018

Storing and Retrieving Data. Storing Data: Disks and Files. Solution 1: Techniques for making disks faster. Disks. Why Not Store Everything in Tapes?

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

Midterm 2: CS186, Spring 2015

MIDTERM EXAMINATION Spring 2010 CS403- Database Management Systems (Session - 4) Ref No: Time: 60 min Marks: 38

Disks and Files. Storage Structures Introduction Chapter 8 (3 rd edition) Why Not Store Everything in Main Memory?

Lecture 12. Lecture 12: The IO Model & External Sorting

CSE344 Final Exam Winter 2017

Storing Data: Disks and Files

Storing Data: Disks and Files

Chapter 12: Query Processing

PLEASE HAND IN UNIVERSITY OF TORONTO Faculty of Arts and Science

L9: Storage Manager Physical Data Organization

CS 245 Midterm Exam Solution Winter 2015

Database Systems. November 2, 2011 Lecture #7. topobo (mit)

Principles of Algorithm Design

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

IMPORTANT: Circle the last two letters of your class account:

File Structures and Indexing

CS 461: Database Systems. Final Review. Julia Stoyanovich

8) A top-to-bottom relationship among the items in a database is established by a

Storing Data: Disks and Files

Informatics 1: Data & Analysis

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

CompSci 516 Data Intensive Computing Systems

Principles of Data Management. Lecture #9 (Query Processing Overview)

CSE344 Midterm Exam Winter 2017

Query Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems

The Entity-Relationship Model (ER Model) - Part 2

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

Principles of Data Management. Lecture #2 (Storing Data: Disks and Files)

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

CSE 444: Database Internals. Lectures 5-6 Indexing

Database Management

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Chapter 13: Query Processing

Database Applications (15-415)

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

Evaluation of Relational Operations

Homework 2: E/R Models and More SQL (due February 17 th, 2016, 4:00pm, in class hard-copy please)

Storing Data: Disks and Files

CMSC 424 Database design Lecture 12 Storage. Mihai Pop

Midterm 1: CS186, Spring 2012

Implementation of Relational Operations

CSE544: Principles of Database Systems. Lectures 5-6 Database Architecture Storage and Indexes

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Project Assignment 2 (due April 6 th, 2016, 4:00pm, in class hard-copy please)

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Database Applications (15-415)

Evaluation of relational operations

3 February 2011 CSE-3421M Test #1 p. 1 of 14. CSE-3421M Test #1. Design

15110 PRINCIPLES OF COMPUTING SAMPLE EXAM 2

Transcription:

UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division CS 186 Spring 2003 J. Hellerstein Midterm Midterm Exam: Introduction to Database Systems This exam has five problems, worth a total of 100 points. Each problem is made up of multiple questions. You should read through the exam quickly and plan your time-management accordingly. Before beginning to answer a question, be sure to read it carefully and to answer all parts of every question! Except where specifically directed otherwise, you must write your answers on the answer sheets provided. You may use the sheets with questions as scratch paper. You also must write your name at the top of every page, and you must turn in all the pages of the exam. Do not remove pages from the stapled exam! Two pages of extra answer space have been provided at the back in case you run out of space while answering. If you run out of space, be sure to make a forward reference to the page number where your answer continues. Good luck! 1) Disk and Buffers (15 points) The specifications sheet for the Seagate Cheetah Xl 5-36LP 40 GB disk says that it has a sector size of 512 bytes, about 80 million (8x10 10 ) sectors, about 20,000 (2x10 4 ) cylinders, 4 double-sided platters, and an average seek time of about 4 msec. The disk platters rotate at about 18,000 (18x10 3 ) rpm (revolutions per minute), and one track of data can be transferred per revolution. We will put our database on this disk. The buffer pool of this DBMS consists of buffer pages with 1024 bytes each. The disk block size is the same as buffer page size. If your arithmetic is getting messy in this question, you can leave your answer unsimplified, as an equation of 0 variables (e.g. 9467/32 + 212 ). Note that you may not need all the information above to answer these questions. a) (5 points) Based on the information above, how many tracks are there on each platter of this disk? b) (5 points) Suppose that a table containing 10,000 records of 100 bytes each is to be stored on such a disk, and no record is allowed to span two blocks. How many blocks are required to store the entire table? (Hint: first compute the number of records per block! You may assume that the disk blocks are arranged for fixed-length records, and that the header information for each block is 20 bytes long.) c) (5 points) Suppose that your answer to part (b) was some number B. You want to join that table with another table of B blocks, using nested-loops join. How many buffer pages (at least) do we need to avoid sequential flooding? Assume that the buffer replacement policy is LRU and all the buffers are unpinned and put in the free list before executing the join. Also assume we are writing the output to disk. Your answer should be an equation on the single variable B. CS 186 Midterm, Spring 2003 Page 1

2) Hashing (20 points) In this question, you are implementing a database to handle applicants to be contestants on the TV show The Bachelorette. Consider the relation Applicants (name, age, IQ, haircolor. homes tate, walletthickness). You are interested in choosing people with a variety of attributes for the show. Consider the following query (Q1): select distinct name, haircolor, homestate from Applicants; Let Applicants be N blocks big. We will consider hash-based implementations of duplicate elimination. a) (4 points) Assume that you have a naïve implementation of hashing, which is unable to spill to disk (i.e. the code in Postgres before you added your HW2!) What is the largest value of N that can be handled efficiently for Qi in this implementation? b) (4 points) Now imagine you improve the hashing implementation somewhat, to support a simple, 2-phase disk-based hashing strategy as described in lecture (partition+rehash not hybrid hashing!). Assuming a perfect hash function, give an expression for the number of L Os for processing duplicate elimination in Q1. Do not count any ljos for reading the input to the partitioning phase, or writing the output of the rehashing phase! c) (6 points) Suppose that the hash function you used does not work very well. Describe what could go wrong in Qi using the scheme in (b), and what a (thorough) implementation would do to compensate. d) (6 points) When you originally set up the entry form, you allowed people to type any string they wanted for the haircolor field. Some weird stuff came in! Values included fuschia, dishwater and noneo1ourbusiness. Seems like a lot of the unusual entries came from Florida and California. Since you re curious, you ask the following query (Q2): select homestate, count(distinct hairco].or) from Applicants group by homestate Would the hash-based grouping you implemented in Homework 2 work for this query? Why or why not? Would sorting work better? CS 186 Midterm, Spring 2003 Page 2

3) ER Diagram (25 points) The UC Berkeley housing office has decided to computerize its information on graduate students and their dependents living in the married housing apartments at Albany Village. Here are the rules of the database: Graduate students are identified by their student ID (SID). They also have name, age. sex and department as attributes. Graduate students have zero, one or more children (with name and age as attributes). Each child has exactly one graduate student listed as their parent. We are not interested in information about a child once the graduate student leaves. Assume that name and age are not unique (nor is the pair <name,age>). Apartments can be inhabited by the families of zero, one or more graduate students. Each apartment exists within exactly one building and has a apartment number (AID) unique to the building. Apartment numbers (AIDs) may be reused across buildings. An apartment also has a capacity attribute to indicate the number of persons that can stay in the apartment. All buildings in the village are assigned a unique building number (BID) assigned by village authorities, and have an age field to indicate how old the building is. Each building must have at least one apartment. Each graduate student is assigned to exactly one apartment within the village. a) (10 points) The Entity-Relation diagram below is incomplete. Apart from attributes of entities, some participation and key constraints have not been represented. One or more entities also need to be changed to a weak entity. Complete the following Entity-Relation diagram based on the information above draw directly on the diagram below. You will not need to draw any new lines between the boxes and diamonds, but you may need to modify some of the existing lines. The current lines are all thin lines; be sure to make any changes to these lines that are necessary, and be sure that the distinctions among your lines clear! Do not forget to underline your primary and partial keys as appropriate. Apartment Stays_In Graduate Student In_Building Parenting Building BID Age Child CS 186 Midterm, Spring 2003 Page 3

b) (10 points) Using the blank spaces provided just below, write down a relational schema for the rules above, and identify all foreign keys. As an example of our notation, we provide a dummy answer for table Foo with integer attributes X and string attribute Y, where y is a foreign key to column Z of table Baz, and y cannot be null. We have also written the schema for Building as a starting point. (Notes: If your answer to (a) is correct, then this schema will be consistent with (a). But you need not answer (a) to answer this! Also, you do not have to fill in all rows of the table below!! We provided more rows than needed.) For simplicity, use only integers or strings as the attribute types. Table Name Attributes Primary Key Foreign Keys c) (5 points) Based on your information in part (b), there can potentially be at least one rule stated above for the database that would require the use of table constraints or SQL assertions. Identify in words one such rule (the rule, not the SQL). If you think there are no such rules, indicate so. CS 186 Midterm, Spring 2003 Page 4

4) Relational Algebra and Calculus (25 points) We revisit the NBA schema from homework 3. Player (playerid: integer, name : varchar(50), position varchar(10), height : integer, weight : integer, team: varchar(30)) Each Player is assigned a unique playerid. The position of a player can either be Guard, Center or Forward. The height of a player is in inches while the weight is in pounds. Each player plays for only one team. The team field is a foreign key to Team. Team (name: varchar(30, city: varchar(20)) Each Team has a unique name associated with it. There can be multiple teams from the same city. Game (gameid: integer, hometeam: varchar(30), awayteam : varchar(30), homescore : integer, awayscore : integer) Each Game has a unique gameid. The fields hometeam and awayteam are foreign keys to Team. Two teams may play each other multiple times each season. There is a check constraint to ensure that hometeam and awayteam are different. GameStats (plaverid : integer, gameid: integer, points : integer, assists : integer, rebounds : integer) GameStats records the performance statistics of a player within a game. If a player does not play in a particular game, they will not have any statistics recorded for that game. gameid is a foreign key to Games. playerid is a foreign key to Player. Assume that two assertions are in place. The first is to ensure that the player involved belongs to either the involving home or away teams, and the second is to ensure that the total score obtained by a team recorded (in Game) is consistent with the total sum (in GameStats) of individual players in the team playing in the game Write the relational calculus for the following. Please output extra columns if that makes your query simpler to write. a) (6 points) Find the tallest player(s) from each team. b) (7 points) List the players who played in all away games for their team. Write the relational algebra for the following. Be sure to get exactly the output columns requested. Feel free to use compound relational algebra operators, such as division. c) (6 points) List each player s playerid, and the gameids where they earned triple doubles. (A triple double is a game in which the player s number of assists, rebounds, and points are all at least in the double-digit range or higher). d) (6 points) List all names of Chicago Bulls players who have played in all Bulls games. CS 186 Midterm, Spring 2003 Page 5

5. Indexing (15 points) Consider the following picture of a clustered B+-tree index and heap file on the relation Artists. The first column of the table in the heap file is called FirstName, and is indexed by the tree; the second column Instrument is also shown, but the remaining columns have been omitted from the picture (they re the... in each tuple.) Assume that the FirstName column is a fixed-length field all entries are padded to 10 characters. Each block in the index is shown as a large rectangle there are currently three of these. Each block shows both the entries in the block, and the free space where more entries can fit. In the heap file, each tuple is an entire block long. Hence each rectangle at the heap file level of the picture is a disk block of its own. a) (3 points) What is the maximum number of data entries that the index below could hold before a node would have to split? b) (8 points) Give a sequence of operations on the database that would cause the tree to look the way it does. Be sure to include the initial construction of the index as one of your operations! If you like, you can use SQL commands, but quick English descriptions of each operation will be fine. c) (4 points) Redraw the B+-tree portion of the picture below to reflect the way it would look after the deletion of Charles. B+-tree. root Ferd Charles Clifford Lester Louis Ornette (Charles, Bass, ) (Clifford, Trumpet, ) (Lester, sax, ) (Ornette, sax, ) (Louis, trumpet, ) CS 186 Midterm, Spring 2003 Page 6