Databases IIB: DBMS-Implementation Exercise Sheet 13

Similar documents
Databases IIB: DBMS-Implementation Exercise Sheet 10

Semantic Errors in Database Queries

@vmahawar. Agenda Topics Quiz Useful Links

Detecting Logical Errors in SQL Queries

SQL Structured Query Language Introduction

Databases. Relational Model, Algebra and operations. How do we model and manipulate complex data structures inside a computer system? Until

Objectives. After completing this lesson, you should be able to do the following:

CSE 444 Homework 1 Relational Algebra, Heap Files, and Buffer Manager. Name: Question Points Score Total: 50

Suppose we need to get/retrieve the data from multiple columns which exists in multiple tables...then we use joins..

Queen s University Faculty of Arts and Science School of Computing CISC 432* / 836* Advanced Database Systems

Creating and Managing Tables Schedule: Timing Topic

Databases - 4. Other relational operations and DDL. How to write RA expressions for dummies

Hash-Based Indexing 165

1 SQL Structured Query Language

INFSCI 2711 Database Analysis and Design Example I for Final Exam: Solutions

Oracle DB-Tuning Essentials

Oracle Hints. Using Optimizer Hints

Do You Know SQL? About Semantic Errors in SQL Queries. Christian Goldberg

1 SQL Structured Query Language

Database Management Systems (COP 5725) Homework 3

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

192 Chapter 14. TotalCost=3 (1, , 000) = 6, 000

Databases - 3. Null, Cartesian Product and Join. Null Null is a value that we use when. Something will never have a value

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Creating SQL Tables and using Data Types

DRAFT SOLUTION. Query 1(a) DATABASE MANAGEMENT SYSTEMS PRACTICE 1

CS 222/122C Fall 2016, Midterm Exam

ENHANCING DATABASE PERFORMANCE

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh

CSC 261/461 Database Systems Lecture 19

CS2 Current Technologies Note 1 CS2Bh

5 Integrity Constraints and Triggers

University of Waterloo Midterm Examination Sample Solution

Databases - 3. Null, Cartesian Product and Join. Null Null is a value that we use when. Something will never have a value

Q.1 Short Questions Marks 1. New fields can be added to the created table by using command. a) ALTER b) SELECT c) CREATE. D. UPDATE.

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Real-World Performance Training SQL Introduction

GIFT Department of Computing Science Data Selection and Filtering using the SELECT Statement

King Fahd University of Petroleum and Minerals

Database Foundations. 6-3 Data Definition Language (DDL) Copyright 2015, Oracle and/or its affiliates. All rights reserved.

CS 564 Final Exam Fall 2015 Answers

Objectives. After completing this lesson, you should be able to do the following:

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

Programming Languages

Database Management Systems Paper Solution

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

Principles of Database Management Systems

Create Rank Transformation in Informatica with example

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

CS2 Current Technologies Lecture 3: SQL - Joins and Subqueries

IMPORTANT: Circle the last two letters of your class account:

DC62 Database management system JUNE 2013

Chapter 3. Introducation to Relational Database 9/2/2014 CSC4341 1

TotalCost = 3 (1, , 000) = 6, 000

Final Exam CSE232, Spring 97

MySQL Query Tuning 101. Sveta Smirnova, Alexander Rubin April, 16, 2015

IMPORTANT: Circle the last two letters of your class account:

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

QUERY OPTIMIZATION [CH 15]

Database Management Systems Written Examination

To understand the concept of candidate and primary keys and their application in table creation.

D B M G Data Base and Data Mining Group of Politecnico di Torino

Database implementation Further SQL

Sample Question Paper

Indexes (continued) Customer table with record numbers. Source: Concepts of Database Management

Why Relational Databases? Relational databases allow for the storage and analysis of large amounts of data.

Database Systems CSE 414

Visit for more.

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

ORACLE VIEWS ORACLE VIEWS. Techgoeasy.com

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Components of a DMBS

COEN244: Class & function templates

SQL. Char (30) can store ram, ramji007 or 80- b

Definitions. Database Architecture. References Fundamentals of Database Systems, Elmasri/Navathe, Chapter 2. HNC Computing - Databases

2. List Implementations (a) Class Templates (b) Contiguous (c) Simply Linked (d) Simply Linked with Position Pointer (e) Doubly Linked

Graduate Examination. Department of Computer Science The University of Arizona Spring March 5, Instructions

Advanced Database Systems

CSIT5300: Advanced Database Systems

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

Databases. Jörg Endrullis. VU University Amsterdam

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

Part III. Data Modelling. Marc H. Scholl (DBIS, Uni KN) Information Management Winter 2007/08 1

QUERY OPTIMIZATION. CS 564- Spring ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan

Exam Actual. Higher Quality. Better Service! QUESTION & ANSWER

D B M G. SQL language: basics. Managing tables. Creating a table Modifying table structure Deleting a table The data dictionary Data integrity

P.G.D.C.M. (Semester I) Examination, : ELEMENTS OF INFORMATION TECHNOLOGY AND OFFICE AUTOMATION (2008 Pattern)

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

Databases 2 Lecture IV. Alessandro Artale

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

CSC 210, Exam Two Section February 1999

INTERNATIONAL INDIAN SCHOOL, RIYADH XI XII BOYS SECTION

Advanced Systems Programming

Now, we can refer to a sequence without having to use any SELECT command as follows:

QUIZ: Is either set of attributes a superkey? A candidate key? Source:

RNDr. Michal Kopecký, Ph.D. Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague

CS-245 Database System Principles

RELATIONAL OPERATORS #1

Transcription:

Prof. Dr. Stefan Brass January 27, 2017 Institut für Informatik MLU Halle-Wittenberg Databases IIB: DBMS-Implementation Exercise Sheet 13 As requested by the students, the repetition questions a) will not be discussed in class unless somebody asks for the solution to a specific question. So please have a look at them before the meeting and decide which questions do you wish to be discussed. Of course, you can also ask any question of your own on the topics of the course. You only have to submit solutions to the homework exercises, i.e. Part c) to h). The official deadline for Homework 13A is January 31, 12:00 (before the problem/lab session). For Homework 13B and 13C, it is February 16. Repetition Questions a) What would you answer to the following questions in an oral exam? (Note that also the written exam might contain such a question.) Explain how the Mergesort algorithm works. How are sorted runs merged? How can it be implemented with four files? What is the complexity of Mergesort (and of sorting in general)? How can the Mergesort algorithm be improved if the input is already partially sorted? How can the Mergesort algorithm be improved if some memory is available (not enough to do the complete sort in memory)? Name four different join algorithms. Explain the nested loop join. What is the complexity of the nested loop join if both tables have n rows? How can the nested loop join be improved to make use of available memory? Explain the merge join. What is the complexity of the merge join if both tables have n rows, and the join attribute is a key on one side? Compare nested loop join and merge join. What can the nested loop join do that is not possible with the merge join? Explain the index join. What is its complexity? In which case is the index join clearly better than nested loop and merge join? What is an advantage of the merge join?

Databases IIB: DBMS-Implementation Exercise Sheet 13 2 Explain the hash join. What is its basic idea? Name some equivalences of relational algebra that can be used for algebraic query optimization. Why is it usually better to do a selection before a join? Explain an example where this is not better. Suppose an SQL query declares tuple variables over four relations R1, R2, R3, R4 under FROM. Which structure of joins will a normal DBMS consider? What is a bushy join? In-Class Exercises b) We will continue to discuss the exam from 2015/16, which you find here: [http://www.informatik.uni-halle.de/ brass/dbi17/exam15.pdf] c) Consider again the EMP-DEPT database (a simplified version of a database used in Oracle tutorials): EMP(EMPNO, ENAME, SAL, DEPTNO DEPT, MGR EMP) DEPT(DEPTNO, DNAME, LOC) Which QEP does Oracle use for the query from Homework 12A? SELECT EMPNO, SAL, DNAME FROM EMP E, DEPT D WHERE E.DEPTNO = D.DEPTNO AND E.MGR = 7839 ORDER BY SAL Of course, the real example tables have only a few rows. Create an exactly fitting index. Does the query evaluation plan change? Try also ALTER SESSION SET OPTIMIZER_MODE = first_rows Does this change the query evaluation plan?

Databases IIB: DBMS-Implementation Exercise Sheet 13 3 Homework 13A (Deadline: January 31) Please read the slides 6-36 to 6-65 in the full version of the chapter about query evaluation: [http://users.informatik.uni-halle.de/ brass/dbi17/old c6 qeval.pdf] In particular, have a look at the Oracle query evaluation plans for the given example queries. d) Consider the following query to the EMP-DEPT database: SELECT EMPNO, ENAME, JOB, DEPTNO FROM EMP E WHERE EXISTS(SELECT EMPNO FROM EMP U WHERE U.MGR = E.EMPNO) There is a unique index EMP_EMPNO_PK on EMP(EMPNO). For the optimizer mode all_rows, which tries to optimize the total runtime of the query, an earlier version of Oracle generated the following query evaluation plan: 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=6 Bytes=150) 1 0 MERGE JOIN (SEMI) (Cost=6 Card=6 Bytes=150) 2 1 TABLE ACCESS (BY INDEX ROWID) OF EMP (TABLE) (Cost=2 Card=14 Bytes=294) 3 2 INDEX (FULL SCAN) OF EMP_EMPNO_PK (INDEX (UNIQUE)) (Cost=1 Card=14) 4 1 SORT (UNIQUE) (Cost=4 Card=13 Bytes=52) 5 4 TABLE ACCESS (FULL) OF EMP (TABLE) (Cost=3 Card=13 Bytes=52) (You do not have to understand the cost estimates of the optimizer. The table EMP has 14 rows. However, for one of the rows, the attribute MGR contains a null value.) For the optimizer mode first_rows, which tries to return the first row as quickly as possible, that version of Oracle produced the following query evaluation plan: 0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=10 Card=6 Bytes=150) 1 0 NESTED LOOPS 2 1 NESTED LOOPS (Cost=10 Card=6 Bytes=150) 3 2 SORT (UNIQUE) (Cost=3 Card=13 Bytes=52) 4 3 TABLE ACCESS (FULL) OF EMP (TABLE) (Cost=3 Card=13 Bytes=52) 5 2 INDEX (UNIQUE SCAN) OF EMP_EMPNO_PK (INDEX (UNIQUE)) (Cost=0 Card=1) 6 1 TABLE ACCESS (BY INDEX ROWID) OF EMP (TABLE) (Cost=1 Card=1 Bytes=21) Please explain both query evaluation plans in your own words. How is the query evaluated? Also explain why Oracle chooses these different plans for the different optimization goals.

Databases IIB: DBMS-Implementation Exercise Sheet 13 4 e) Let a relation R(A,B,C,D) be given. The attribute types are: A, B, C: NUMERIC(3), D: VARCHAR(1000). Attribute A is the key of the relation. Consider the following query: There are the following indexes: I1 on R(A), I2 on R(B), and I3 on R(C). SELECT A, D FROM R WHERE B = 5 AND C BETWEEN 30 AND 100 First consider the evaluation of the query with a full table scan. Give the Oracle query evaluation plan (in the graphical tree notation or the textual indented notation) for this execution of the query. Furthermore, give an SQL query to the data dictionary that returns an estimate of the number of block accesses that the query will need. Of course, you can assume that the ANALYZE TABLE has just been done, so the size information in the data dictionary is available and current. If data should be missing in the Oracle data dictionary, explain what information would be needed. f) Now do the same for access via I2 (with condition B = 5): Give the Oracle QEP and an SQL query that returns an estimate of the number of blocks that will be accessed. Homework 13BCD (Deadline: February 16) This exercise continues the implementation project. If you finish it, it counts as three homework exercises, but it is optional (it does not count for the number of homeworks you have to complete). g) Please write a class btree_c that implements a mapping from an integer key value to an integer data value with a B + -tree (in a file on disk using your buffer manager). There are several possible extensions, but they are not required in this exercise because they would take too much time: This exercise considers only a mapping of single attributes of a fixed type (int). One could also implement a template for mappings between two arbitrary row types, so that one could handle attribute combinations. A variant of this would be that the length of the tuples is determined at runtime, e.g. in arguments to the constructor of a B-tree object. The template class would have to be instantiated and compiled for each combination of row types that the

Databases IIB: DBMS-Implementation Exercise Sheet 13 5 application needs. The dynamic version is slightly slower (because the tuple size cannot be compiled in), but is more flexible. Another variant of this is that each tuple has its own length. That is even more flexible and could handle also strings of varying length. The homework exercise only requires a UNIQUE index (i.e. a map, not a multimap). For simplicity, it is also ok if the file size is determined when it is created and afterwards remains fixed, i.e. insertion fails if more blocks would be needed. Of course, you can also extend the file if you want to program that. Then you should think about extending the file not by a single block, but instead a larger piece (in the hope that the operating system selects consecutive blocks on disk). We also consider only a simple lookup for a given key value. A B-tree would normally support also other possibilities to access its data, namely range queries and a full scan of all entries in the leaf nodes. A final simplification is that you do not need to support transactions. E.g., if an insertion splits several blocks, and there is a power failure after some of these blocks were written, the data structure will be destroyed. For a real B-tree library, one would have to handle this case, but this is too much for a weekly homework. The class btree_c should have the following interface: btree_c(int id, const char *name): The constructor is called with a file ID and a file name. It only has to store this information for later use in create or open. bool create(int size): This method creates a new file with the name and ID given in the constructor, and initializes it as an empty B-tree. This method has an argument for the number of blocks that should be the initial file size. Basically, the method create of the class file_c is called. If file creation fails, false is returned, otherwise true. bool open(): This method opens the file with the data given in the constructor. It calls the method open of the file class and may do some checking, e.g. load the first block of the file and check whether it has the right block type. The method returns true if the opening was successful, false otherwise. bool insert(int key, int value): This method inserts a key-value pair (both of type int) into the B-tree. If the insertion fails (e.g., because the key is already present in the B-tree), the method returns false. If the insertion was successful, it returns true. const int *lookup(int key): This method searches the given key value in the B-tree. If it is not found, it

Databases IIB: DBMS-Implementation Exercise Sheet 13 6 returns a null pointer. If it is found, it returns a pointer to the corresponding data value (stored somewhere in the database block buffer). The solution with the pointer makes it possible to have one additional value (the null pointer) besides all possible int values. Furthermore, if the implementation is extended to other data types or entire tuples, one would not have to copy a possibly large data value. Of course, the pointer is to read-only memory (it is const). The user of this class cannot change data values in the B-tree. bool checkpoint(): This method ensures that all modified blocks are really written to disk. You have to extend the buf_c class, as already mentioned in Homework 8B. You are also allowed to immediately write modified blocks, in which case checkpoint() does nothing. It should return the information whether all blocks were successfully written.