CSCI-6421 Final Exam York University Fall Term 2004


6 December 2004 CS-6421 Final Exam p. 1 of 7

CSCI-6421 Final Exam
York University
Fall Term 2004
Due: 6pm Wednesday 15 December 2004

Last Name:
First Name:
Instructor: Parke Godfrey
Exam Duration: take home
Term: Fall 2004

Your assignment, should you choose to accept it, is to answer the following questions to the best of your knowledge. Try to keep answers brief and to the point, but be precise and be careful. Write any assumptions you make along with your answers, whenever necessary.

The exam is open-book and open-notes. The exam is take-home.

There are ten main questions. Each is worth five points. #1 and #10 are compulsory. You must do the compulsories and five others of your choosing, for seven questions in all. So the test is 35 points in total. You may do an additional (eighth) problem. If so, I will drop the non-compulsory with the lowest score. If you do more than eight (the two compulsories plus more than six others), I shall randomly dispense with non-compulsories until I have eight to grade!

For the logicians:

1. (5 points) Datalog. What's that again, in English this time!?

Consider (again) the schema

    Movie(title, director, year)
    Cast(actor, title, role)
        FK (title) refs Movie

Consider the following rules, which are used in the queries to follow.

    castin(A, M) ← cast(A, M, R).
    actor(A) ← castin(A, M).
    castout(A, M) ← castin(A, M), castin(A, N), M ≠ N.
    dicast(A, M) ← cast(A, M, R1), cast(A, M, R2), R1 ≠ R2.

For each of the following Datalog queries, restate the query in concise, understandable English.

a. query(A) ← dicast(A, M).
b. query(A) ← actor(A), movie(M, D, Y), not castin(A, M).
   actor(A), not query(A).
c. query(A) ← castin(A, M1), castout(A, M2).
d. query(A) ← castin(A, M), not castout(A, M).
e. query(A) ← castin(A, M), not dicast(A, M), not castout(A, M).

2. (5 points) Negation Semantics. A somewhat stable model.

a. (3 points) Is there a Datalog database P such that p (a positive atomic consequence) is a consequence of P with respect to the stable model semantics, but p is not a consequence of P with respect to the well founded semantics, and P has a unique stable model? If this cannot happen, explain why not. Otherwise, provide an example.

b. (2 points) Is there a Datalog database P such that p (a positive atomic consequence) is a consequence of P with respect to the well founded semantics, but p is not a consequence of P with respect to the stable model semantics? (Note that when P has no stable models, everything is a consequence of P with respect to the stable model semantics.) If this cannot happen, explain why not. Otherwise, provide an example.
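To make the four helper rules of Question 1 concrete, here is a minimal sketch that evaluates them bottom-up over a tiny database. It assumes each rule reads as head ← body and that the comparisons in castout and dicast are inequalities (M ≠ N, R1 ≠ R2). The facts in `cast` are hypothetical and are not part of the exam.

```python
# Hypothetical toy facts for cast(actor, title, role); NOT from the exam.
cast = {
    ("ann", "m1", "hero"),
    ("ann", "m1", "narrator"),
    ("bob", "m1", "villain"),
    ("bob", "m2", "cop"),
}

# castin(A, M) <- cast(A, M, R).
castin = {(a, m) for (a, m, r) in cast}

# actor(A) <- castin(A, M).
actor = {a for (a, m) in castin}

# castout(A, M) <- castin(A, M), castin(A, N), M != N.
castout = {
    (a, m)
    for (a, m) in castin
    for (a2, n) in castin
    if a == a2 and m != n
}

# dicast(A, M) <- cast(A, M, R1), cast(A, M, R2), R1 != R2.
dicast = {
    (a, m)
    for (a, m, r1) in cast
    for (a2, m2, r2) in cast
    if a == a2 and m == m2 and r1 != r2
}

print(sorted(castin))
print(sorted(actor))
print(sorted(castout))
print(sorted(dicast))
```

Each set comprehension mirrors one rule: the loops range over the body atoms, and the `if` clause enforces shared variables and the inequality.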

3. (5 points) Datalog to SQL. To the max!

Consider the Datalog rules

    maximalp(X1, ..., Xk) ← p(X1, ..., Xk), not trumpedp(X1, ..., Xk).
    trumpedp(X1, ..., Xk) ← p(X1, ..., Xk), p(Y1, ..., Yk),
                            greater([Y1, ..., Yk], [X1, ..., Xk]).
    greater([X | Xs], [Y | Ys]) ← X > Y, greatereq(Xs, Ys).
    greater([X | Xs], [X | Ys]) ← greater(Xs, Ys).
    greatereq([X | Xs], [Y | Ys]) ← X ≥ Y, greatereq(Xs, Ys).
    greatereq([], []).

and the query

    maximalp(X1, ..., Xk).

a. (2 points) What does this query evaluate?
b. (3 points) Write the query in SQL.

4. (5 points) Containment. Stop being repetitious and redundant.

A rule R (defining predicate r) is logically redundant with respect to database DB if any query Q possible with respect to DB has the same answers whether evaluated against DB or against DB ∪ {R}. (Assume r has other rules defining it already in DB and that R is an additional rule for r.)

a. (1 point) Consider the DB and R

    DB:
    p(X, Y) ← e(X, A), e(A, B), e(B, Y).
    e(X, Y) ← f(X, Y).
    e(X, Y) ← f(Y, X).
    f(a, b).  f(a, f).  f(f, h).
    f(b, c).  f(b, g).  f(f, i).
    f(c, d).  f(c, h).  f(g, i).
    f(d, e).  f(d, i).  f(g, j).
    f(e, a).  f(e, j).  f(h, j).

    R:
    p(X, Y) ← e(X, Y).

Is R logically redundant with respect to DB? Why or why not?

b. (3 points) Describe a general method to determine whether a rule R is redundant with respect to DB.

c. (1 point) Is your procedure decidable? Why or why not?
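The list predicates greater and greatereq from Question 3 can be read as recursive functions. The sketch below is a direct transliteration under two assumptions: each rule reads as head ← body, and the comparison in the greatereq rule is ≥ between the two list heads. Python lists stand in for the Datalog lists.

```python
def greatereq(xs, ys):
    """greatereq([], []) holds; otherwise heads compare with >= and tails recurse."""
    if not xs and not ys:
        return True
    if not xs or not ys:
        return False  # lists of different lengths match no rule
    return xs[0] >= ys[0] and greatereq(xs[1:], ys[1:])


def greater(xs, ys):
    """Transliteration of the two rules for greater."""
    if not xs or not ys:
        return False  # no rule for greater mentions an empty list
    # Rule 1: X > Y, and the tails satisfy greatereq.
    if xs[0] > ys[0] and greatereq(xs[1:], ys[1:]):
        return True
    # Rule 2: equal heads, and the tails satisfy greater.
    return xs[0] == ys[0] and greater(xs[1:], ys[1:])


print(greater([3, 2], [3, 1]))
print(greater([3, 2], [3, 2]))
print(greater([4, 1], [3, 2]))
```

Trying a few pairs by hand against the rules is a quick way to check a proposed reading of trumpedp before translating the query to SQL.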

5. (5 points) Expressiveness. Express yourself!

a. One could, albeit with much effort, code up chess via win and recursion-through-negation, as we did for the stones game in class. If our chess program is locally stratified, then there is a perfect model, and everything is assigned true or false. This means win(beginning board state) is either true or false. So it would be known that white (the first player) could always win playing a perfect game or that black (the second player) could always win playing a perfect game. Does this mean that a game of chess is necessarily winnable by the perfect white player or the perfect black player? Why or why not?

b. Are there any types of queries that can be expressed in SQL but not Datalog?

c. Are there any types of queries that can be expressed in SQL but not Datalog? (Careful.)

d. Is Datalog a superset of first-order predicate calculus (logic)? Why or why not?

e. Is Datalog, interpreted under negation-as-finite-failure, the well founded semantics, or the stable model semantics, a subset of first-order predicate calculus (logic)? Why or why not?

For the engineers:

6. (5 points) Sequential Reads. Speed it up.

Consider each of the join algorithms that we have studied:

a. BNLJ (block nested loops join),
b. INLJ (index nested loops join),
c. HJ (two-pass hash join),
d. SMJ (two-pass sort merge join), and
e. MJ (merge join, with outer and inner sorted prior).

Explain briefly whether sequential reads and writes would be advantageous in each case. Assume that sequential reads are generally not advantageous for filescans of base tables. Base tables become fragmented on disk over time due to inserts and deletes.

7. (5 points) Indexes. In a mess.

You have just joined the team at Very Small Databases, Inc., (VSDB). You have been assigned to work with the infamous database expert Dr. Mark Dogfurry. Your first job is to work with him to tune a database being built for the company Geisel & Associates. The Geisel & Associates database includes two tables, Sneech and Whovillian. Dr. Dogfurry tells you the following.

For table Sneech, there are the following indexes.

1. A unique clustered hash index on name.
2. An unclustered B+ tree index on specialty + no stars.
3. An unclustered B+ tree index on birthdate + hometown.
4. A clustered B+ tree index on birthdate + hometown + school level.

For table Whovillian, there are the following indexes.

5. A unique unclustered hash index on name.
6. A unique unclustered hash index on age.
7. A unique hash index on address + name.
8. An unclustered B+ tree index on grade.

The order of the attributes as listed for the composite index keys (for example, birthdate + hometown) is important: this is the order of the attributes by which the index is built.

The information that Dr. Dogfurry has given you is suspect. That is, there is reason to believe that there are mistakes in what he has told you. State five distinct problems with the above information. Explain why each is a problem: that it is an impossibility; that it is useless; that it is redundant; and so forth.

8. (5 points) Index Mechanics. Always losing your keys?

a. (3 points) A linear hash has just been started. The linear hashed file currently just has one bucket (primary page). The current hash function pair is h0, h1. Here, h0 masks for zero (!) right-hand bits from the hashed key, and so always returns bucket address 0. Hash function h1 masks for 1 right-hand bit, h2 for 2, and so forth. Assume that each page can hold two entries. The file currently has one entry of 21 (10101₂).

[Figure: the initial file, showing bucket 0 holding the entry 21, and the next pointer.]

A split should be triggered whenever an overflow page is created. Show the linear hashed file after each of the following inserts: 30 (11110₂), 18 (10010₂), 35 (100011₂), 17 (10001₂), and 13 (1101₂). The insertions are cumulative, so your final hashed file should contain 30, 18, 35, 17, and 13.

b. (2 points) Consider an extendible hash index that has 2^10 directory slots. What can you say about how many buckets (so data-record pages if alternative #1, data-entry pages if alternative #2 or #3) the index has?

9. (5 points) Joins. Is it further to New York or by train?

Consider the schema R(A, B) and S(C, A). The underlined attributes designate the primary key. S has a foreign key cast on R through A, and S.A is not nullable. Consider the query

    select * from R, S where R.A = S.A;

There is a clustered tree index on R.A and an unclustered tree index on S.A. Each index is of alternative #2 and has two layers of index pages. So the third layer in each case consists of the data-entry pages.

Let N_T generically denote the number of records in table T, and V_T.C denote the number of distinct values found in column C of table T. N_R ≪ N_S. That is, the number of records in table R is much less than the number of records in table S.

a. (3 points) Which INLJ (index nested loops) join is better for the query?

    A. R as the outer, using the unclustered index on S.A for probing.
    B. S as the outer, using the clustered index on R.A for probing.

Justify your claim. (Use N_R, N_S, V_R.A, etc. in your argument.)

b. (2 points) Would employing an INL join for this query make sense? Why or why not?
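The hash-function family described in Question 8a can be sketched in a few lines. The sketch assumes the hashed key is the entry's integer value itself, as the binary expansions in the question suggest. The helper `bucket_address` and its parameters `level` and `next_ptr` are illustrative names for the usual linear-hashing lookup rule; they are not taken from the exam. Only addressing is shown, not the insert-and-split procedure.

```python
def h(i, key):
    """h_i keeps the i right-hand bits of the key, so h_0 always returns 0."""
    return key & ((1 << i) - 1)


def bucket_address(key, level, next_ptr):
    """Usual linear-hashing lookup with the hash pair (h_level, h_{level+1}).

    Buckets below next_ptr have already been split in the current round,
    so keys that land there are re-addressed with the finer function.
    """
    addr = h(level, key)
    if addr < next_ptr:
        addr = h(level + 1, key)
    return addr


# With a single bucket (level 0, next pointer at 0), every key maps to bucket 0.
for key in (21, 30, 18):
    print(key, bin(key), bucket_address(key, level=0, next_ptr=0))
```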

10. (5 points) Query Optimization. Simply the best plan available.

Schema:

    Employee(eid, name, did, jobcat, salary)
    JobBenefits(title, jobcat, since)
        FK (title) refs Benefit
    Benefit(title, description, cost)

Statistics:

    Employee: 100,000 records on 2,000 pages
        jobcat: 500 distinct values
        did (department ID): 100 distinct values (department #13 is accounting)
    JobBenefits: 3,500 records on 70 pages
        title: 200 distinct values
        jobcat: 500 distinct values (same values as in Employee.jobCat)
    Benefit: 200 records on 20 pages
        cost: ranges over $500,...,$10,500

Indexes:

    Employee:
        Clustered tree index on eid. (Index pages two deep; third layer, data-entry pages.)
        Unclustered tree index on did, jobcat. (Index pages two deep; third layer, data-entry pages.)
    JobBenefits:
        Clustered tree index on jobcat, title. (Index page one deep; second layer, data-entry pages.)
    Benefit:
        Hash index on title.

Query:

    select name, eid, B.title
    from Employee E, JobBenefits J, Benefit B
    where E.jobCat = J.jobCat
      and J.title = B.title
      and E.did = 13
      and B.cost > 10000;

You have an allocation of twelve buffer frames.

a. (1 point) Estimate the cardinality (the resulting number of records) of the query.

b. (4 points) Devise a good query plan for the query. Show the query tree, fully annotated with the chosen algorithms and access paths. Estimate the cost of your plan. For full credit, you should have a plan that costs less than 1,500 I/Os.
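One common textbook way to approach an estimate like the one in part (a) multiplies the sizes of the tables by a reduction factor per predicate. The sketch below applies that approach to the statistics above. It assumes uniformly distributed values, independent predicates, and that Benefit.title has 200 distinct values (one per record). These are assumptions of the sketch, not statements from the exam, and other estimation methods may give a different figure.

```python
from fractions import Fraction

# Table sizes from the Statistics section.
n_employee, n_jobbenefits, n_benefit = 100_000, 3_500, 200

# Reduction factors under uniformity and independence (assumed).
rf_did = Fraction(1, 100)                   # E.did = 13, 100 distinct did values
rf_jobcat_join = Fraction(1, max(500, 500)) # E.jobCat = J.jobCat
rf_title_join = Fraction(1, max(200, 200))  # J.title = B.title (200 distinct assumed in Benefit)
rf_cost = Fraction(10_500 - 10_000, 10_500 - 500)  # B.cost > 10000 over $500..$10,500

estimate = (
    n_employee * n_jobbenefits * n_benefit
    * rf_did * rf_jobcat_join * rf_title_join * rf_cost
)

print(estimate)
```

Exact rational arithmetic is used so that the product is not blurred by floating-point rounding.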