CSC 443 Term Test Solutions, Fall 2017

Question 1. [12 marks] Short Answer Question

Part (a) [2 marks] What are the different types of data independence? Explain each in one line.

Logical data independence: protection from changes in the logical structure of the data.
Physical data independence: protection from changes in the physical structure of the data.

Part (b) [3 marks] What is atomicity? Explain an algorithm that ensures atomicity.

The DBMS ensures atomicity (the all-or-nothing property) even if the system crashes in the middle of a Xact. Idea: keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts. Before a change is made to the database, the corresponding log entry is forced to a safe location (the WAL protocol; OS support for this is often inadequate). Under Write-Ahead Logging (WAL), if a log entry was not saved before the crash, the corresponding change was not applied to the database. After a crash, the effects of partially executed transactions are undone using the log.
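The logging idea in Part (b) can be illustrated with a minimal sketch. It is not a real recovery manager: the class name `TinyWAL`, the in-memory list standing in for the stable log, and the undo-only recovery are all assumptions made for illustration, and stored values are assumed never to be `None`.

```python
class TinyWAL:
    """Toy key-value store with an undo-only write-ahead log (illustrative only)."""

    def __init__(self):
        self.log = []   # stands in for the log on stable storage
        self.db = {}    # stands in for the database pages

    def write(self, xact, key, value):
        # WAL rule: log the old value BEFORE changing the database.
        # Assumption: None is never stored, so None means "key was absent".
        self.log.append(("update", xact, key, self.db.get(key)))
        self.db[key] = value

    def commit(self, xact):
        self.log.append(("commit", xact))

    def recover(self):
        """After a crash, undo the updates of Xacts that have no commit record."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):          # undo in reverse log order
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old = rec
                if old is None:
                    self.db.pop(key, None)      # key did not exist before
                else:
                    self.db[key] = old          # restore the logged old value


store = TinyWAL()
store.write("T1", "a", 1)
store.commit("T1")
store.write("T2", "a", 2)   # T2 never commits: simulated crash follows
store.write("T2", "b", 3)
store.recover()
```

After `recover()`, only the committed effects of T1 remain; both of T2's updates are rolled back from the logged old values.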

Part (c) [2 marks] What is the difference between static and dynamic hashing? What is the difference between linear and extendible hashing?

Static hashing: long overflow chains can develop and degrade performance.
Dynamic hashing: extendible hashing and linear hashing are dynamic techniques that fix this problem.
See the slides for details, but the main difference is that linear hashing (LH) handles the problem of long overflow chains without using a directory, and handles duplicates.

Part (d) [2 marks] Briefly explain how Index Nested Loops works.

Basically:

    foreach tuple r in R do
        foreach tuple s in S where r_i == s_j do
            add <r, s> to result

The matching tuples s are found by probing an index on the join column of S rather than by scanning S.

Part (e) [3 marks] What is the cost of a hash join for two relations with M and N pages? Explain briefly.

3(M + N) I/Os: the partitioning phase reads and writes both relations, 2(M + N), and the probing phase reads every partition once, M + N. See the slides.
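As a rough illustration of Parts (d) and (e), the sketch below runs an index nested loops join with a Python dictionary standing in for the index on S, and computes the 3(M + N) hash join cost. The function names and the tuple layout are assumptions for this example, and the cost function assumes there is enough memory that no partition needs to be re-partitioned and that the cost of writing the result is ignored.

```python
from collections import defaultdict


def index_nested_loops_join(R, S, r_col, s_col):
    """Join R and S on R[r_col] == S[s_col] by probing an index on S."""
    # In-memory stand-in for an index on the join column of S.
    index = defaultdict(list)
    for s in S:
        index[s[s_col]].append(s)

    result = []
    for r in R:                                  # outer relation: scanned once
        for s in index.get(r[r_col], []):        # inner: index probe, not a scan
            result.append(r + s)
    return result


def hash_join_io_cost(M, N):
    """Page I/Os for a two-phase hash join of relations with M and N pages."""
    partition = 2 * (M + N)   # read both relations and write their partitions
    probe = M + N             # read every partition once
    return partition + probe


R = [(1, "a"), (2, "b")]
S = [(2, "x"), (2, "y"), (3, "z")]
joined = index_nested_loops_join(R, S, r_col=0, s_col=0)
cost = hash_join_io_cost(1000, 500)
```

Only the tuple of R with value 2 finds matches, and it pairs with both matching S tuples; the cost for M = 1000 and N = 500 is 3 × 1500 page I/Os.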

Question 2. [12 marks] Cost Calculation

Fill in the blanks in the following table.

B: the number of data pages
R: the number of records per page
D: (average) time to read or write a disk page
Hash: no overflow buckets, 80% page occupancy
Tree: 67% occupancy

                          Scan    Equality    Range    Insert
Sorted
Unclustered Tree index
Unclustered Hash index

Question 3. [14 marks]

Hash the following data entries using both linear hashing and extendible hashing, taking the least significant bits of each key:

8, 16, 24, 32, 40, 48, 56, 64, 128, 7, 15, 31, 63, 127, 1, 10, 4
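To see how these keys spread when only the least significant bits are used, the following sketch groups them by their last k bits. It is only a helper for inspecting the data, not a linear or extendible hashing implementation; the function names and the choice k = 2 are assumptions for illustration, since the question text here does not state a bucket capacity or a starting depth.

```python
from collections import defaultdict

KEYS = [8, 16, 24, 32, 40, 48, 56, 64, 128, 7, 15, 31, 63, 127, 1, 10, 4]


def low_bits(key, k):
    """Return the k least significant bits of key as an integer."""
    return key & ((1 << k) - 1)


def group_by_low_bits(keys, k):
    """Map each k-bit suffix (as a binary string) to the keys that end in it."""
    groups = defaultdict(list)
    for key in keys:
        groups[format(low_bits(key, k), f"0{k}b")].append(key)
    return dict(groups)


groups = group_by_low_bits(KEYS, 2)
```

With k = 2, every multiple of 8 (and the key 4) falls under suffix 00, which is why this data set forces repeated splits on that bucket in either scheme.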

Question 4. [12 marks] Join Algorithms

The nearest-neighbor join is a variant of the relational equi-join operator. The nearest-neighbor join of R and S on attribute B returns each tuple r of R paired with all tuples s of S such that r[B] ≥ s[B] and r[B] − s[B] is the smallest. Formally, if (r, s) is in the output of the nearest-neighbor join of R and S on B, then r[B] ≥ s[B] and there does not exist a tuple s' in S with s'[B] ≤ r[B] where r[B] − s[B] > r[B] − s'[B]. Hence, for a tuple of R, if there are tuples of S with the same B value, they will be returned (this is the same as in an equi-join). For a tuple r of R where there are no tuples of S with the same B value, we return the set of tuples in S whose B values are as close as possible to r[B] and lower than r[B].

Consider Reserves(sid, bid) and Boat(bid, bname), where bid is not a key for Boat. Assume we have four boats (5, "African Queen"), (10, "Lebarge"), (12, "Bluenose"), (12, "Canadian Dime") and five reservations (s1, 1), (s10, 5), (s2, 9), (s3, 9), (s4, 13). Then Reserves nearest-neighbor join Boat on bid is:

(s10, 5, 5, "African Queen")
(s2, 9, 5, "African Queen")
(s3, 9, 5, "African Queen")
(s4, 13, 12, "Bluenose")
(s4, 13, 12, "Canadian Dime")

Of course, a tuple of R joins with more than one tuple of S only if all of those S tuples have the same value of B. So in our example, (s4, 13) joins with multiple tuples, but they all have the same bid value (12). Notice that (s1, 1) is not in the result since there is no boat with bid 1 or lower.

Part (a) [5 marks] Sort Merge Join

Can you adapt the sort merge join to produce the nearest-neighbor join? Describe essential modifications to the data structures and algorithm. Be precise. If it is not possible to do so, explain why. If it is possible, state whether the I/O cost will be higher or lower than the cost of an equi-join sort merge.

Solution: The sort phase is unchanged. In the merge phase, maintain two pointers on S and modify the inner loop so that it still checks for equality, but if the check fails, it backs up one entry in S and returns r paired with that preceding entry (s-1). The cost should be the same as for the sort merge equi-join. Students may have (correctly) commented that the size of the output is the same or larger, so if output size is taken into account (which we don't), the cost is larger.

Part (b) [5 marks] Index Nested Loop Join

Can you adapt the index nested loop join algorithm to process nearest-neighbor joins? Describe essential modifications to the data structures and algorithm. Be precise. Does the type of index matter? If it is not possible to do so, explain why. Does it matter which relation is outer? Will the I/O cost be different from an index nested loop equi-join? State whether it will be higher or lower and whether this depends on the relative sizes of R and S. Give the I/O cost of your implementation for Reserves nearest-neighbor join Boats on bid (using the sizes, buffer pages, and indices available in the previous question).

Solution: Permit an answer with outer join (note that a left outer join is easy with INL: it finds the tuples of R that don't join with S. Finding the tuples of S that don't join with R is harder; you either have to do R INL S followed by S INL R, or use some other trick).

Answer for NNJ: the NN join is not symmetric, so in R nn S, R must be the outer relation and the index must be a B+ tree on S. Hash indices can't be used. The algorithm is the same except that when a probe fails, we scan backward in the index to the next closest (lower) value.

Cost: read Reserves, 100 pages. For each of the 10,000 tuples of Reserves, probe the index; the cost per probe is height 2 + data entry page + 1 (since bid is a key) = 4 I/Os. Total cost: 100 + 10,000 × 4 = 40,100.

The cost is the same as for the equi-join (it might be a bit higher, since when we have to back up one entry to find the next lowest bid, we might occasionally incur an extra I/O). But it is fine if they didn't notice this.
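Both Question 4 solutions can be sketched in memory and checked against the Reserves/Boat example from the question. This is a hedged illustration rather than the graders' reference implementation: the function names are invented, a sorted Python list searched with `bisect` stands in for the leaf level of a B+ tree on S, and tuples are assumed to carry the join attribute at the given column positions.

```python
from bisect import bisect_right


def nn_join_index(R, S, r_col, s_col):
    """Index-style NN join: probe a sorted 'index' on S once per tuple of R."""
    entries = sorted(S, key=lambda s: s[s_col])   # stand-in for B+ tree leaves
    keys = [s[s_col] for s in entries]
    out = []
    for r in R:
        pos = bisect_right(keys, r[r_col])        # entries[:pos] have key <= r's value
        if pos == 0:
            continue                              # no S value at or below r's value
        best = keys[pos - 1]                      # equal value, or the next lower one
        matches = []
        i = pos - 1
        while i >= 0 and keys[i] == best:         # scan backward over duplicates
            matches.append(entries[i])
            i -= 1
        out.extend(r + s for s in reversed(matches))
    return out


def nn_join_merge(R, S, r_col, s_col):
    """Sort-merge-style NN join using two pointers (start, end) on sorted S."""
    Rs = sorted(R, key=lambda r: r[r_col])
    Ss = sorted(S, key=lambda s: s[s_col])
    out = []
    start = 0   # first S tuple of the current candidate group
    end = 0     # one past the last S tuple whose key is <= the current r's key
    for r in Rs:
        while end < len(Ss) and Ss[end][s_col] <= r[r_col]:
            if Ss[end][s_col] != Ss[start][s_col]:
                start = end                       # a new, larger key group begins
            end += 1
        if end == 0:
            continue                              # nothing in S at or below r's value
        out.extend(r + s for s in Ss[start:end])
    return out


reserves = [("s1", 1), ("s10", 5), ("s2", 9), ("s3", 9), ("s4", 13)]
boats = [(5, "African Queen"), (10, "Lebarge"),
         (12, "Bluenose"), (12, "Canadian Dime")]
by_index = nn_join_index(reserves, boats, r_col=1, s_col=0)
by_merge = nn_join_merge(reserves, boats, r_col=1, s_col=0)
```

On this input both functions produce the five result tuples listed in the question, with (s1, 1) dropped. The merge version never moves its S pointers backward past the current group, which is consistent with the claim that the I/O cost matches the equi-join sort merge.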