CMSC424: Programming Project

Similar documents
CMSC424: Programming Project

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

CS 564 Final Exam Fall 2015 Answers

CS448 Designing and Implementing a Mini Relational DBMS

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Query Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13!

Interpreting Explain Plan Output. John Mullins

Problem Solving through Programming In C Prof. Anupam Basu Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Principles of Data Management. Lecture #9 (Query Processing Overview)

ORACLE TRAINING CURRICULUM. Relational Databases and Relational Database Management Systems

Database Applications (15-415)

CMSC424: Database Design. Instructor: Amol Deshpande

University of Waterloo Midterm Examination Sample Solution

Notes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Deliverables. Problem Description

Operator Implementation Wrap-Up Query Optimization

Relational Query Optimization. Highlights of System R Optimizer

CSE 344 APRIL 20 TH RDBMS INTERNALS

Query Processing and Query Optimization. Prof Monika Shah

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

Database Systems. Project 2

Query Processing: an Overview. Query Processing in a Nutshell. .. CSC 468 DBMS Implementation Alexander Dekhtyar.. QUERY. Parser.

Project 4 Query Optimization Part 2: Join Optimization

JavaCC: SimpleExamples

CS101 Introduction to Programming Languages and Compilers

COSC6340: Database Systems Mini-DBMS Project evaluating SQL queries

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Oracle 9i Application Development and Tuning

CSCE-608 Database Systems. COURSE PROJECT #2 (Due December 5, 2018)

CSE 344 FEBRUARY 14 TH INDEXING

CS 4321/5321 Project 2

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

20 Essential Oracle SQL and PL/SQL Tuning Tips. John Mullins

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Introduction to Database Systems CSE 344

Principles of Data Management. Lecture #12 (Query Optimization I)

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Evaluation of Relational Operations

ADVANTAGES. Via PL/SQL, all sorts of calculations can be done quickly and efficiently without use of Oracle engine.

CMSC424: Database Design. Instructor: Amol Deshpande

UNIT 2 Set Operators

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Query Optimizer MySQL vs. PostgreSQL

ASSIGNMENT 5 Objects, Files, and More Garage Management

CSE 190D Spring 2017 Final Exam Answers

Overview of Implementing Relational Operators and Query Evaluation

CSC443 Winter 2018 Assignment 1. Part I: Disk access characteristics

Databases IIB: DBMS-Implementation Exercise Sheet 13

Carnegie Mellon University in Qatar Spring Problem Set 4. Out: April 01, 2018

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Project. CIS611 Spring 2014 SS Chung Due by April 15. Performance Evaluation Experiment on Query Rewrite Optimization

CS451 - Assignment 8 Faster Naive Bayes? Say it ain t so...

Announcements. From SQL to RA. Query Evaluation Steps. An Equivalent Expression

Query Processing & Optimization

CompSci 516 Data Intensive Computing Systems

CSE 444 Final Exam. August 21, Question 1 / 15. Question 2 / 25. Question 3 / 25. Question 4 / 15. Question 5 / 20.

CIS 550, Database and Information Systems Homework 5 (Due by 11:59:59 on April 14)

CSC 261/461 Database Systems Lecture 19

Oracle Database 11g: SQL Tuning Workshop

Implementation of Relational Operations

Programming Standards: You must conform to good programming/documentation standards. Some specifics:

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Teiid Designer User Guide 7.5.0

Relational Query Optimization

Make sure you have the latest Hive trunk by running svn up in your Hive directory. More detailed instructions on downloading and setting up

Database Systems CSE 414

Performance Optimization for Informatica Data Services ( Hotfix 3)

RELATIONAL OPERATORS #1

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Implementing a Logical-to-Physical Mapping

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Oracle. Exam Questions 1Z Oracle Database 11g Release 2: SQL Tuning Exam. Version:Demo

Evaluation of Relational Operations. Relational Operations

Principles of Database Management Systems

Query Evaluation (i)

Fundamentals of Database Systems

Evaluation of relational operations

Query Optimizer MySQL vs. PostgreSQL

R has a ordered clustering index file on its tuples: Read index file to get the location of the tuple with the next smallest value

University of Waterloo Midterm Examination Solution

External Sorting Sorting Tables Larger Than Main Memory

INFSCI 2711 Database Analysis and Design Example I for Final Exam: Solutions

Tuning SQL without the Tuning Pack. John Larkin JP Morgan Chase

The query processor turns user queries and data modification commands into a query plan - a sequence of operations (or algorithm) on the database

CS2112 Fall Assignment 4 Parsing and Fault Injection. Due: March 18, 2014 Overview draft due: March 14, 2014

An Introduction to Python

Parser: SQL parse tree

Learn about Oracle DECODE and see some examples in this article. section below for an example on how to use the DECODE function in the WHERE clause.

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Database Applications (15-415)

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

MySQL for Developers Ed 3

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

3. When you process a largest recent earthquake query, you should print out:

ASSIGNMENT 5 Data Structures, Files, Exceptions, and To-Do Lists

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Transcription:

CMSC424: Programming Project Due: April 24, 2012 There are two parts to this assignment. The first one involves generating and analyzing the query plans that Oracle generates. The second part asks you to implement a query processing operator in a Toy Relational Database Management System. Submission of the second part is through submit server; first part should be submitted as a hardcopy in the class. This assignment is to be done in groups of 2. Part 1 (4 pts) Oracle has following commands to help with analyzing query execution plans, and tuning the execution. Other database systems have their own set of commands. set timing on; : If this is set, the time taken to execute a query is output at the end. set autotrace on; : The plan used to execute the query is shown at the end of execution. Oracle Hints: Oracle also allows you to provide hints to the optimizer on what kinds of algorithms to use, what not to use etc. This can be used to change the plan to something else, if you think you know better than Oracle optimizer (not uncommon in practice). A list of hints can be found on the web (e.g., http://psoug.org/reference/hints.html). Here are a couple of examples you might want to try out: Changing join method: You can use hints to change the join method used by the optimizer. select /*+ use_nl(c1, c2) */ count(*) from counties c1, counties c2 where c1.name = c2.name Without the use nl there 1, this query would probably use hash join or merge join. use nl forces it to use nested loops join for joining those two tables. Change use nl to use merge or use hash to force Oracle to use other plans. Enforcing join order: By adding a clause called ordered, you can force the optimizer to use the same join order as specified in the from clauses. Using index or not using index: Simiarly, say I had an index on customers: create index counties_name_idx on counties (name); The following query: select /*+ index(counties) */ * from counties where name = Lee ;..forces the optimizer to use an index on the counties table to execute this query, whereas: select /*+ no_index(counties) */ * from counties where name = Lee ; 1 The syntax is very specific. Hints must appear as shown here.

Your tasks:..forces it to not use the index. By default, it would have probably used an index if one existed. 1. Oracle (as with all database systems) stores the metadata information in some special tables. For example, all tables stores the names of all tables, and bunch of additional information. A second table all tab columns contains the information about all the table attributes, and is used to answer queries such as describe states;. Write a query that returns the same answer as the describe states; query using the second table. Your query must return similar answer, but not identical answer that would be somewhat harder to manage. (Hint: Use describe to see the attributes of second table first). 2. Use the autotrace feature to find the plans for the following four queries and draw them. Note that the autotrace feature actually only prints out the operators; but the indentation levels, and also the order in which operators are printed out, can be used to deduce the query plan. Also, note down the time it took to execute the query (it will probably be too small to be measurable). (a) Query 1: Draw the query plan for this query. select states.name, count(*) as num from states, senators, committees where states.statecode = senators.statecode and senators.name = committees.ranking_member group by states.name; (b) Query 2: Draw and briefly explain the query plan for this query. Why is the query execution plan not performing a join operation? select count(*) from counties, states where counties.statecode = states.statecode; (c) Query 3: Draw and briefly explain the query plan for this query. More specifically, describe how the query is being executed, and what operators Oracle is using for answering this query. with counties_counts as ( select states.name, count(*) as num_counties from states, counties where states.statecode = counties.statecode group by states.name ) select name from counties_counts where num_counties = (select min(num_counties) from counties_counts); (d) Query 4: Draw and briefly explain the query plan for this query. More specifically, describe how the query is being executed (in particular the not exists part), and what operators Oracle is using for answering this query. select name from senators s where not exists (select * from committees c where c.chairman = s.name) and affiliation = D order by name;

3. Explain why the query plans chosen by the following two queries (one of which uses a hint) are different from each other. select * from counties, states where states.statecode = counties.statecode; select /*+ FIRST_ROWS(10) */ * from counties, states where states.statecode = counties.statecode; 4. Force Query 1 above to use nested-loops join instead of sort or hash joins. It is fine if you are not successful in making Oracle do what you want as long as you use the hint correctly. Oracle may ignore the hints in some cases, and in other cases, there maybe requirements for using the hint that are not described. Part 2 (6 pts) The second part of the programming assignment requires you to write an operator in a toy Relational Database Management System, built by us. Description of ToyRDBMS: The ToyRDBMS is written in Java, and uses BerkeleyDB as the underlying storage engine. BerkeleyDB does not provide any relational interface at all. In essence you can think of BerkeleyDB as a persistent key-value store (or more simply, as a disk-resident hash table). The Data in BerkeleyDB is stored in a set of Databases, each of which is a key-value store. Our ToyRDBMS maps relations to BerkeleyDB Databases. So each Database stores the tuples corresponding to a single relation, indexed by the primary key (which must be specified in the CREATE TABLE command). ToyRDBMS supports a very small subset of SQL with many restrictions. The supported commands are: 1. CREATE TABLE: Only two data types are supported: integer and string. The primary key must be identified, can only consist of a single attribute, and further must be the first attribute in the list. Example: create table R (i integer primary key, j integer, s string); 2. DROP TABLE. 3. INSERT VALUES: The usual SQL syntax should work. 4. SELECT: ToyRDBMS supports a very small subset of the form: select [DISTINCT] <list of attributes> from <list of tables> where <list of predicates>; [order by <list of attributes>]; The Select Clause can either contain * or a list of attributes. No aliasing is allowed in either the select clause or the from clause. Only equality predicates are allowed. If a predicate of the type R.a = 10 is used, R.a must be on the left hand side.

Finally, queries should not contain cycles or should not require Cartesian products. Some further details about the implementation follow. Many of the commands listed below are present in the file Makefile. You can run them (on most UNIX-like machines) by: make X (where X is the shortcut). The make commands are listed below. You are provided with a zip file. On glue machines, unzip it to create a directory ToyRDBMS. All commands below must be executed from within this (ToyRDBMS) directory. ToyRDBMS is written in Java, with a handful of files. You can compile it using javac. make all javac -classpath.:./javacc.jar:./je-3.3.82.jar *.java The required libraries are present in the working directory. ToyRDBMS uses javacc for parsing SQL. The parser is specified in the file SQLParser.jj. javacc is used to convert the.jj file into a set of java files. This has already been done for you (you will see several files like: SQLParserConstants.java, SQLParserTokenManager.java etc., which were all automatically generated). However if you modify the parser, you can recreate those files using: make SQLParser.java java -cp./javacc.jar:. javacc SQLParser.jj Ignore the warnings. The LOOKAHEAD is set so that the conflicts should not be a problem. Many of the limitations of the supported syntax are essentially limitations of this parser, and can be addressed quite easily. Some of key code files in ToyRDBMS are: SQLParser.jj: As described above, this contains the SQL parser. CommandLine.java: ToyRDBMS only has a commandline interface at this time. This class implements that interface by accepting user input from the standard input, and executing those statements. Globals.java: This contains some of the Global definitions, and more importantly, much of the code that talks to BerkeleyDB, including creating databases, inserting tuples etc. Operator.java: This is the abstract Operator class that defines the get next() iterator interface. ScanOperator.java: Implements Sequential Scan. JoinOperator.java: Another abstract Operator class. All join operators should be its subclasses. NestedLoopsJoinOperator.java: Implements the Nested Loops join operation. Query.java: Contains the code for the analzing a query, and creating a query plan for it. Currently the query plan is created in a fairly dumb fashion. Read the comments for details. RelationSchema.java, Predicate.java, Tuple.java: All of these should be self-explanatory. ProjectOperator.java: Implements the final project operation. You should only modify this file to implement the distinct and order by.

Running and Testing ToyRDBMS: As mentioned above, you can compile all the java files using the javac command (or you can import all of these into Eclipse if you like, but we won t be able to help you with any compile or runtime issues). ToyRDBMS stores the data in a directory which is a parameter to CommandLine command (see below). This directory must exist. The commands below assume that the directory is named Data and is in the current directory. For testing purposes, one populate file is provided, populate-states.sql, that contains a subset of the data from the SQL Assignment, modified to satisfy the restrictions of ToyRDBMS. You can run these files as: make populate-states cat populate-states.sql java -ea -classpath.:./classes:./javacc.jar:./je-3.3.82.jar CommandLine Data These will create the tables and insert tuples into it. As you can see, CommandLine takes in one argument, the data directory (which must be already created). You can test it using: make query1 echo SELECT * FROM states; java -ea -classpath./classes:./javacc.jar:./je-3.3.82.jar CommandLine Data The Makefile contains a list of other queries that you can execute. Your Task: Your task is to extend the ProjectOperator to handle distinct and order by. Currently the database will exit if it encounters either of these (see queries 5, 6, 7 in the Makefile). Order by may be given a list of attributes (see query 7). The tuples should be ordered in the increasing order by the attribute values. More Details on How: The code already contains a file: ProjectOperator.java, which exits if it encounters either distinct or order by. You are to modify the file appropriately to implement both of those. Submission: You will be submitting just your ProjectOperator.java file, and we will test it automatically against a set of input data files. We will essentially copy your file into our setup, compile it, and test it. You cannot submit any other files in the codebase. The submission is through the submit server (just like the SQL assignment).