CS-245 Database System Principles Winter 2002 Assignment 4

Similar documents
CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005

Question Score Points Out Of 25

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

2.2.2.Relational Database concept

Database Management Systems

Relational Model 2: Relational Algebra

Homework 1: Relational Algebra and SQL (due February 10 th, 2016, 4:00pm, in class hard-copy please)

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

This is an almost-two week homework; it is almost twice as long as usual. You should complete the first half of it by October 2.

CSE Midterm - Spring 2017 Solutions

Problem Set 2 Solutions

2(Y/8K) * (1 + log15(y/(8k*16)) I/Os

CMPT 354 Assignment 2 Key Instructor: G. Louie

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

CSE 444 Homework 1 Relational Algebra, Heap Files, and Buffer Manager. Name: Question Points Score Total: 50

Introduction to Database Systems CSE 444

CS 245 Midterm Exam Solution Winter 2015

Homework 5: Miscellanea (due April 26 th, 2013, 9:05am, in class hard-copy please)

The Relational Model. Suan Lee

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

CS 564 Final Exam Fall 2015 Answers

CS145 Midterm Examination

CS434 Notebook. April 19. Data Mining and Data Warehouse

CSE-3421M Test #2. Queries

CS 245 Midterm Exam Winter 2014

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

PLEASE HAND IN UNIVERSITY OF TORONTO Faculty of Arts and Science

University of Waterloo Midterm Examination Sample Solution

The URL of the whole system is:

CSE 444, Winter 2011, Final Examination. 17 March 2011

CS 525: Advanced Database Organisation

Midterm Exam. October 30th, :15-4:30. CS425 - Database Organization Results

Instructor: Craig Duckett. Lecture 11: Thursday, May 3 th, Set Operations, Subqueries, Views

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Section 1: Redundancy Anomalies [10 points]

Optimization Overview

Midterm Exam (Version B) CS 122A Spring 2017

CSE 544 Principles of Database Management Systems

Relational Database: The Relational Data Model; Operations on Database Relations

Ian Kenny. December 1, 2017

DSE 203 DAY 1: REVIEW OF DBMS CONCEPTS

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

CS145 Midterm Examination

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

Section 2.2: Relational Databases

2.3 Algorithms Using Map-Reduce

Name: Database Systems ( 資料庫系統 ) Midterm exam, November 15, 2006

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

Data warehouse design

Relational Databases Lecture 2

Principles of Data Management. Lecture #12 (Query Optimization I)

The University of British Columbia Computer Science 304 Practice Final Examination

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Relational Databases Fall 2017 Lecture 1

CS348: INTRODUCTION TO DATABASE MANAGEMENT (Winter, 2011) FINAL EXAMINATION

COURSE OVERVIEW THE RELATIONAL MODEL. CS121: Introduction to Relational Database Systems Fall 2016 Lecture 1

CSE344 Midterm Exam Winter 2017

CompSci 516 Data Intensive Computing Systems

CS W Introduction to Databases Spring Computer Science Department Columbia University

Examination examples

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CS 453 Electronic Commerce Technologies. Homework # 4 PHP-based E-Store

Midterm Review. Winter Lecture 13

CS 327E Lecture 2. Shirley Cohen. January 27, 2016

Midterm Exam ANSWER KEY

CMU - SCS / Database Applications Spring 2013, C. Faloutsos Homework 1: E.R. + Formal Q.L. Deadline: 1:30pm on Tuesday, 2/5/2013

CS352 Lecture - Relational Calculus; QBE. Objectives: 1. To briefly introduce the tuple and domain relational calculi 2. To briefly introduce QBE.

RELATIONAL OPERATORS #1

The University of British Columbia

Final Exam. May 3rd, :30-12:30. CS520 - Data Integration, Warehousing, and Provenance

CS122 Lecture 15 Winter Term,

Evaluating XPath Queries

Lecture 04: SQL. Wednesday, October 4, 2006

CSC443 Winter 2018 Assignment 1. Part I: Disk access characteristics

Fundamentals of Database Systems

An Oracle White Paper August Building Highly Scalable Web Applications with XStream

Homework 6: FDs, NFs and XML (due April 15 th, 2015, 4:00pm, hard-copy in-class please)

THE COPPERBELT UNIVERSITY

CS 4320/5320 Homework 2

From SQL-query to result Have a look under the hood

Course Web Site. 445 Staff and Mailing Lists. Textbook. Databases and DBMS s. Outline. CMPSCI445: Information Systems. Yanlei Diao and Haopeng Zhang

CS122 Lecture 4 Winter Term,

Query Processing and Query Optimization. Prof Monika Shah

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

Schema And Draw The Dependency Diagram

CSE4701 Principles of Databases Midterm Exam

CSE 444: Database Internals. Lecture 3 DBMS Architecture

CMP-3440 Database Systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

CSE344 Midterm Exam Fall 2016

CMPSCI445: Information Systems

Database Management Systems Written Examination

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE

UNIVERSITY OF BOLTON CREATIVE TECHNOLOGIES COMPUTING PATHWAY SEMESTER TWO EXAMINATION 2014/2015 ADVANCED DATABASE SYSTEMS MODULE NO: CPU6007

[18 marks] Consider the following schema for tracking customers ratings of books for a bookstore website.

Relational Model and Relational Algebra

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

CMPS182 Midterm Examination. Name: Question 1: Question 2: Question 3: Question 4: Question 5: Question 6:

Principles of Data Management. Lecture #9 (Query Processing Overview)

Transcription:

CS-245 Database System Principles Winter 2002 Assignment 4 Due at the beginning of class on Tuesday, February 19 State all assumptions and show all work. Subscribe to cs245@lists.stanford.edu to receive clarifications and changes. You can email questions to cs245-staff@cs.stanford.edu Problem 1. (15 points) Assume that we have two relations, R and S. Let p be a predicate containing only R attributes. Let q be a predicate containing only S attributes. Let m be a predicate containing only attributes from both R and S. Demonstrate how to derive the following relational algebra equivalence rules from simpler rules. You may use any rule mentioned in the book or in lecture, but state explicitly which rule you are using. p q m m [ ( p q S) ] p q p R) S ] q S) ] Problems 2-4. For the next three problems, use the following schema, for an on-line bookstore: Cust (CustID, Name, Address, State, Zip) Book (BookID, Title, Author, Price, Category) Order (OrderID, CustID, BookID, ShipDate) Inventory (BookID, Quantity, WarehouseID, ShelfLocation) Warehouse (WarehouseID, State) This schema represents customers (Cust) and books (Book). When a customer buys a book, a tuple is entered into the Order table. The inventory of books is recorded in the Inventory table, which records the quantity, warehouse and shelf location for each different book. The Warehouse table records the state where each warehouse is located in. You may assume that these relations are sets. You may assume Price is a numeric value. You may also assume that ShipDate is an integer representation of a date. Finally, you may assume that there is a constant, :today, that contains the integer representation of today s date.

Problem 2. (15 points) Write relational algebra expressions for the following SQL queries. a. Find books under $10: SELECT Title, Author FROM Book WHERE Price < 10.00; b. Find books ordered by customer 53: SELECT Title, Author FROM Cust NATURALJOIN Order NATURALJOIN Book WHERE CustID = 53; c. Find other books purchased by the same customers that purchased book 1085: SELECT B.Title, B.Author, B.BookID FROM Book B, Order Q, Order R WHERE Q.BookID=1085 AND Q.CustID=R.CustID AND R.BookID=B.BookID AND B.BookID <> 1085; Problem 3. (30 points) For each of the following relational algebra expressions: Show the logical query plan for the given expression, drawn as a tree of relational operators with relations at the leaves. Transform the logical query plan you have drawn by pushing selections and projections down as far as you can. Show the result as another logical query plan drawn as a tree of relational operators with relations at the leaves. (For simplicity, you should not introduce projections over attributes other than those attributes already projected.) Write the transformed query plan (with the selections and projections pushed down) as a relational algebra expression. a. Find books by Oprah Winfrey bought by customers in California for under $20. π Title,BookID (σ State= CA Author= Oprah Winfrey Price<20 (Cust Order Book) ) b. Find books in warehouse 12 that are supposed to be shipped today. π BookID,ShelfLocation (σ ShipDate=:today WarehouseID=12 (Order Inventory))

c. Find books that customer 89 has not bought yet but that are in the same category as some book customer 89 has bought already. π Title,Author,BookID (σ CustID=89 [(Book (π Category,CustID (Order Book) ) ) (π CustID,BookID,Title,Author,Price,Category (Order Book))]) Problem 4. (30 points) For each of the following logical query plans, label each relational operator with the expected result size: the number of tuples. You may assume containment of value sets and preservation of value sets. You may also make the assumption stated in the lecture notes, that Values in select expression Z=val are uniformly distributed over possible V(R,Z) values. Be sure to justify each value you calculate. Note that since we are only asking for the number of tuples, you do not need to examine the effect of projections. Now make the assumption stated in lecture that Values in select expression Z=val are uniformly distributed over a domain with DOM(R,Z) values. Which, if any, of the values you calculated change? Give the new values. Use the following statistics: T(Order) = 30,000 V(Order,BookID) = 1,000 V(Order,CustID) = 3,000 V(Order,ShipDate) = 2,500 DOM(Order,ShipDate) = 10,000 T(Inventory) = 100,000 V(Inventory,BookID) = 2,000 V(Inventory,WarehouseID) = 50 T(Cust) = 3,000 V(Cust,CustID) = 3,000 V(Cust,State) = 50 DOM(Cust,State) = 50 T(Warehouse) = 50 V(Warehouse,WarehouseID) = 50 V(Warehouse,State) = 50 T(Book) = 1,000 V(Book,BookID) =1,000 V(Book,Category) = 20 DOM(Book,Category) = 25 (HINT: You have enough information in these statistics to solve this problem.)

a. Find books that have yet to be shipped that are in a warehouse in the same state as the customer that ordered them. π BookID,WarehouseID σ ShipDate>:today Order Inventory Cust Warehouse b. Find books from category Mystery ordered by people in California that shipped today. π Title,Author,BookID σ ShipDate=:today σ Category= Mystery σ State= CA Order Book Cust

Problem 5. (10 points) Imagine a large relation R stored in m disk blocks where n tuples are stored per block. Consider a query that involves a selection over R. Using statistics, we deduce that the selection is likely to return at least m tuples, and we know that the tuples are uniformly distributed throughout the disk blocks. In other words, we can reasonably expect that each disk block will contain at least one tuple that will be a part of our selection result. The basic way to answer the selection is to scan the table, one block at a time, retrieving tuples that match the selection; the cost of this plain table scan operation is m I/Os. How much improvement can we expect in performance if we use an index to support our selection instead of a plain table scan?