CSE 444 Final Exam. August 21, Question 1 / 15. Question 2 / 25. Question 3 / 25. Question 4 / 15. Question 5 / 20.

Similar documents
CSE 444 Final Exam 2 nd Midterm

CSE 444 Final Exam. December 17, Question 1 / 24. Question 2 / 20. Question 3 / 16. Question 4 / 16. Question 5 / 16.

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

CSE344 Midterm Exam Winter 2017

Database Systems CSE 414

CSE 414 Database Systems section 10: Final Review. Joseph Xu 6/6/2013

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CSE 344 FEBRUARY 21 ST COST ESTIMATION

University of Waterloo Midterm Examination Solution

Principles of Database Management Systems

McGill April 2009 Final Examination Database Systems COMP 421

CSE 344 Midterm. Wednesday, February 19, 2014, 14:30-15:20. Question Points Score Total: 100

CSE 344 Midterm. Wednesday, February 19, 2014, 14:30-15:20. Question Points Score Total: 100

CSE 190D Spring 2017 Final Exam Answers

CS 564 Final Exam Fall 2015 Answers

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

Introduction to Database Systems CSE 444, Winter 2011

CSE 344 Midterm Exam

Introduction to Data Management CSE 344. Lecture 12: Cost Estimation Relational Calculus

CSE 344 APRIL 27 TH COST ESTIMATION

CSE 344 Final Review. August 16 th

University of Waterloo Midterm Examination Sample Solution

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

EECS 647: Introduction to Database Systems

CSE 190D Spring 2017 Final Exam

CSE 544 Principles of Database Management Systems

CSE344 Midterm Exam Fall 2016

CISC437/637 Database Systems Final Exam

L20: Joins. CS3200 Database design (sp18 s2) 3/29/2018

CISC437/637 Database Systems Final Exam

CSE wi Final Exam Sample Solution

CSE 344 Midterm. November 9, 2011, 9:30am - 10:20am. Question Points Score Total: 100

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Fundamentals of Database Systems

CSEP 514 Midterm. Tuesday, Feb. 7, 2017, 5-6:20pm. Question Points Score Total: 150

Query Processing & Optimization

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

CSE 344 Midterm. November 9, 2011, 9:30am - 10:20am. Question Points Score Total: 100

Review: Query Evaluation Steps. Example Query: Logical Plan 1. What We Already Know. Example Query: Logical Plan 2.

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

RELATIONAL OPERATORS #1

Final Exam CSE232, Spring 97

Relational Query Optimization. Highlights of System R Optimizer

Computer Science 425 Fall 2006 Second Take-home Exam Out: 2:50PM Wednesday Dec. 6, 2006 Due: 5:00PM SHARP Friday Dec. 8, 2006

CSE 344 Final Exam. March 17, Question 1 / 10. Question 2 / 30. Question 3 / 18. Question 4 / 24. Question 5 / 21.

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Query Processing: The Basics. External Sorting

CSE 444, Winter 2011, Final Examination. 17 March 2011

CSE 444: Database Internals. Lecture 11 Query Optimization (part 2)

Optimizer Challenges in a Multi-Tenant World

CS-245 Database System Principles

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Advanced Database Systems

Database System Concepts

CompSci 516 Data Intensive Computing Systems

CSE 562 Final Exam Solutions

From SQL-query to result Have a look under the hood

Database Management Systems (COP 5725) Homework 3

Implementation of Relational Operations

Overview of Implementing Relational Operators and Query Evaluation

Database Applications (15-415)

Evaluation of Relational Operations. Relational Operations

Database Systems. Project 2

Ch 5 : Query Processing & Optimization

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

CSC 261/461 Database Systems Lecture 19

CSEP 544: Lecture 04. Query Execution. CSEP544 - Fall

Semistructured Data and XML

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

CSE 143 Final Exam Part 1 - August 18, 2011, 9:40 am

Evaluation of Relational Operations

CS145 Midterm Examination

Database Management Systems Paper Solution

CSE 344 MAY 7 TH EXAM REVIEW

Chapter 13: Query Processing

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Chapter 12: Query Processing

Relational Query Optimization

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

Query Processing. Solutions to Practice Exercises Query:

CSE 414 Midterm. Friday, April 29, 2016, 1:30-2:20. Question Points Score Total: 100

Chapter 12: Query Processing

CSE 344 Final Examination

QUERY OPTIMIZATION [CH 15]

Chapter 3. Algorithms for Query Processing and Optimization

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

CMSC424: Database Design. Instructor: Amol Deshpande

Chapter 12: Query Processing. Chapter 12: Query Processing

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

Transcription:

CSE 444 Final Exam August 21, 2009 Name Question 1 / 15 Question 2 / 25 Question 3 / 25 Question 4 / 15 Question 5 / 20 Total / 100 CSE 444 Final, August 21, 2009 Page 1 of 10

Question 1. B+ trees (15 points) Suppose we have the following B+ tree, where each index node has a maximum of 4 children. 30 50 5 10 15 30 35 40 45 50 60 5 10 15 30 35 40 45 50 60 Show how this B+ tree changes if the values 65 and 42 are inserted into the tree one after the other. CSE 444 Final, August 21, 2009 Page 2 of 10

Question 2. Logical query plans (25 points) This problem and the next concern two relations R(a, b, c) and S(x, y, z) that have the following characteristics: B(R) = 600 B(S) = 800 T(R) = 3000 T(S) = 4000 V(R, a) = 300 V(S, x) = 100 V(R, b) = 100 V(S, y) = 400 V(R, c) = 50 V(S, z) = 40 We also have M = 1000 (number of memory blocks). Relation R has a clustered index on attribute a and an unclustered index on attribute c. Relation S has a clustered index on attribute x and an unclustered index on attribute z. All indices are B+ trees. Now consider the following logical query plan for a query involving these two relations. R.c, ans γ R.c, sum(s.x) ans σ R.c = 10 and S.z = 17 R.b = S.y R S (Answer the questions about this query on the following page) CSE 444 Final, August 21, 2009 Page 3 of 10

Question 2. (cont.) (a) Write a SQL query that is equivalent to the logical query plan on the previous page. (b) Change or rearrange the original logical query plan to produce one that is equivalent (has the same final results), but which is estimated to be significantly faster, if possible. Recall that logical query optimization does not consider the final physical operators used to execute the query, but only things at the logical level, such as the sizes of relations and estimated sizes of intermediate results. You should include a brief but specific explanation of how much you expect your changes to improve the speed of the query and why. Draw your new query plan and write your explanation below. If you can clearly and easily show the changes on the original diagram, you may do so, otherwise draw a new diagram here. CSE 444 Final, August 21, 2009 Page 4 of 10

Question 3. Physical query plans (25 points) This problem uses the same two relations and statistics as the previous one, but for a different operation not related to the previous query. As before, the relations are R(a, b, c) and S(x, y, z). Here are the statistics again: B(R) = 600 B(S) = 800 T(R) = 3000 T(S) = 4000 V(R, a) = 300 V(S, x) = 100 V(R, b) = 100 V(S, y) = 400 V(R, c) = 50 V(S, z) = 40 M = 1000 (number of memory blocks) Also as before, relation R has a clustered index on attribute a and an unclustered index on attribute c. Relation S has a clustered index on attribute x and an unclustered index on attribute z. All indices are B+ trees. Your job for this problem is to specify and justify a good physical plan for performing the join R a=z S (i.e., join R and S using the condition R.a = S.z) Your answer should specify the physical join operator used (hash, nested loop, sort merge, or other) and the access methods used to read relations R and S (sequential scan, index, etc.). Be sure to give essential details: i.e., if you use a hash join, which relations(s) are included in hash tables; if you use nested loops, which relation(s) are accessed in the inner and outer loops, etc. Give the estimated cost of your solution in terms of number of disk I/O operations needed (you should ignore CPU time and the cost to read any index blocks). You should give a brief justification why this is the best (cheapest) way to implement the join. You do not need to exhaustively analyze all the other possible join implementations and access methods, but you should give a brief discussion of why your solution is the preferred one compared to the other possibilities. (Write your answer on the next page. You may remove this page for reference if you wish.) CSE 444 Final, August 21, 2009 Page 5 of 10

Question 3. (cont) Give your solution and explanation here. CSE 444 Final, August 21, 2009 Page 6 of 10

Question 4. XML (15 points) It s never too early to think about the holidays at least not if you are in the toy business. Santa s elves are using XML to keep track of the toys in Santa s sack this year. The DTD for their data is the following: <!DOCTYPE sack [ <!ELEMENT sack (toy)* > <!ELEMENT toy (name, color*, age?) > <!ELEMENT name (#PCDATA) > <!ELEMENT color (#PCDATA) > <!ELEMENT age (#PCDATA) > ]> Although it is not specified in the DTD itself, the values for age will be one of baby, child, teen, adult, or everyone, if age is present in an entry. If a toy is available in multiple colors, it will have multiple color attributes in its entry. Here is a short example of the kind of data that we might find in Santa s database: <sack> <toy> <name>firetruck</name> <color>red</color> <age>child</age> </toy> <toy> <name>rattle</name> <color>pink</color> <color>blue</color> <age>baby</age> </toy> </sack> On the next page, write XPath or XQuery expressions as requested to perform the desired tasks. (You may detach this page for reference if you wish.) CSE 444 Final, August 21, 2009 Page 7 of 10

Question 4. (cont). Use the XML schema on the previous page to answer these questions. If it matters, you can assume that the data is stored in a file named sack.xml. (a) Write an XPath expression (not a more general XQuery) that returns the names of all toys that are available in the color purple (i.e., have at least one color entity with the value purple ). (b) Write an XQuery expression that reformats a valid XML document specified by the original DTD to give a list of toy names grouped by age. The resulting document should include all of the toys in the original document and should match this DTD: <!DOCTYPE agelist [ <!ELEMENT agelist (group)* > <!ELEMENT group (age, name*) > <!ELEMENT name (#PCDATA) > <!ELEMENT age (#PCDATA) > ]> CSE 444 Final, August 21, 2009 Page 8 of 10

Question 5. Map Reduce (20 points) For each of the following problems describe how you would solve it using map reduce. You should explain how the input is mapped into (key, value) pairs by the map stage, i.e., specify what is the key and what is the associated value in each pair, and, if needed, how the key(s) and value(s) are computed. Then you should explain how the (key, value) pairs produced by the map stage are processed by the reduce stage to get the final answer(s). If the job cannot be done in a single map reduce pass, describe how it would be structured into two or more map reduce jobs with the output of the first job becoming input to the next one(s). You should just describe your solution algorithm. You should not translate the solution into Pig Latin or any other detailed programming language. (a) The input is a list of housing data where each input record contains information about a single house: (address, city, state, zip, value). The output should be the average house value in each zip code. CSE 444 Final, August 21, 2009 Page 9 of 10

Question 5. (cont.) (b) [Same as the last part of project 3, only this time using map reduce instead of SQL] The input contains two lists. One list gives voter information for every registered voter: (voter id, name, age, zip). The other list gives disease information: (zip, age, disease). For each unique pair of age and zip values, the output should give a list of names and a list of diseases for people in that zip code with that age. If a particular age/zip pair appears in one input list but not the other, then that age/zip pair can appear in the output with an empty list of names or diseases, or you can omit it from the output entirely, depending on which is easier. (Hint: the keys in a map/reduce step do not need to be single atomic values.) CSE 444 Final, August 21, 2009 Page 10 of 10