Homework 1: RA, SQL and B+-Trees (due Feb 7, 2017, 9:30am, in class hard-copy please)


Virginia Tech. Computer Science
CS 5614 (Big) Data Management Systems
Spring 2017, Prakash

Reminders:
a. Out of 100 points. Contains 5 pages.
b. Rough time estimate: 5~8 hours.
c. Please type your answers. Illegible handwriting may get no points, at the discretion of the grader. Only drawings may be hand-drawn, as long as they are neat and legible.
d. There could be more than one correct answer. We shall accept them all.
e. Whenever you are making an assumption, please state it clearly.
f. Unless otherwise mentioned, you may use any SQL/RA operator seen in class/in textbook.
g. Unless otherwise specified, assume set-semantics for RA and bag-semantics for SQL.
h. Feel free to use the linear notation for RA and create intermediate views for SQL (unless otherwise mentioned in the problem).
i. Each HW has to be done individually, without taking any help from non-class resources (e.g., websites).

Q1. RA: Bars [30 points]

Consider the following relational database that stores information about bars and customers:

Drinker (name, address)
Bar (name, address)
Beer (name, brewer)
Frequents (drinker, bar, times_a_week)
Likes (drinker, beer)
Serves (bar, beer, price)

Write the following queries in relational algebra:

Q1.1. (2 points) Find all drinkers who frequent James Joyce Pub.
Q1.2. (2 points) Find all bars that serve both Amstel and Corona.
Q1.3. (3 points) Find all bars that serve at least one of the beers Amy likes for no more than $2.50.
Q1.4. (3 points) For each bar, find all beers served at this bar that are liked by none of the drinkers who frequent that bar.
Q1.5. (5 points) Find all drinkers who frequent only those bars that serve some beers they like.

Q1.6. (5 points) Find all drinkers who frequent every bar that serves some beers they like.
Q1.7. (10 points) Find those drinkers who enjoy exactly the same set of beers as Amy.

Q2. SQLite [25 points]

This problem will use a database containing data about a university. The relations are in a SQLite database. Download and install SQLite3 from http://www.sqlite.org

The schema of the database is provided below (keys are in bold, field types are omitted and can easily be identified using SQLite):

student(sid, sname, sex, age, year, gpa)
dept(dname, numphds)
prof(pname, dname)
course(cno, cname, dname)
major(dname, sid)
section(dname, cno, sectno, pname)
enroll(sid, grade, dname, cno, sectno)

Before you start, it would be a good idea to take a look at the database file and familiarize yourself with its contents. You can run this file on SQLite, and the database and tables will be loaded properly. The file can be found here: http://people.cs.vt.edu/~badityap/classes/cs5614-spr17/homeworks/hw1/database.txt

In this assignment, you will only deal with the querying part of SQL. You are NOT allowed to tamper with (change the contents of) the database, i.e., no CREATE, INSERT, DELETE, ALTER, UPDATE etc. However, please feel free to issue any query-oriented SQL statements, even if they are not related to the questions in this assignment.

Queries

Write SQL queries that answer the questions below (one query per question) and run them on SQLite. The query answers must not contain duplicates, but you should use the SQL keyword distinct only when necessary. For this question, creation of temporary tables is NOT allowed, i.e., for each question you have to write exactly one SQL statement (possibly using nested SQL). Note that the answer to some of the questions may be empty.

Q2.1. (2 points) Find the name of the oldest student.

Q2.2. (2 points) Find the names and gpas of the students who are enrolled in 312.
Q2.3. (2 points) Find the names and majors of students who are taking one of the Artificial Intelligence courses.
Q2.4. (2 points) Find the names of students who are enrolled in a course from both the "Computer Sciences" and "Chemical Engineering" departments.
Q2.5. (3 points) For each department, find the average age of the students majoring in that department along with the age difference between the oldest and youngest students.
Q2.6. (3 points) Find the names of students being taught by professor "Jason Singer".
Q2.7. (4 points) How many students have more than one major? (Hint: requires a nested query)
Q2.8. (4 points) Find the name(s) of the oldest first year student {year = 1} (Hint: requires a nested query)
Q2.9. (3 points) For those departments that have no majors taking a "Computer Sciences" course, print the department name and the number of PhD students in the department.

Assignment Submission

Format your answers as follows (in the hardcopy itself):

1. Query: SQL statement 1 (for query 1);
   Result: Copy-paste output for query 1
2. Query: SQL statement 2 (for query 2);
   Result: Copy-paste output for query 2
...
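The rule in Q2 about using distinct only when necessary can be illustrated with a small sketch. It uses Python's standard sqlite3 module and a throwaway in-memory database, so the course database is not touched. The two tables borrow the names student and major from the schema above, but the rows, the column types and the choice of keys are assumptions made up for this illustration; they are not taken from database.txt, and none of the queries answers a homework question.

```python
import sqlite3

# Hypothetical miniature of two relations from the Q2 schema. The rows,
# column types and key declarations are invented for illustration only.
SETUP_SQL = """
CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, sex TEXT,
                      age INTEGER, year INTEGER, gpa REAL);
CREATE TABLE major (dname TEXT, sid INTEGER, PRIMARY KEY (dname, sid));
INSERT INTO student VALUES (1, 'Ann', 'f', 20, 2, 3.7);
INSERT INTO student VALUES (2, 'Bob', 'm', 22, 4, 3.1);
INSERT INTO major VALUES ('Mathematics', 1);
INSERT INTO major VALUES ('Physics', 1);
INSERT INTO major VALUES ('Physics', 2);
"""

JOIN_SQL = """
SELECT {select} s.sname
FROM student AS s, major AS m
WHERE s.sid = m.sid
ORDER BY s.sname
"""

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
conn.executescript(SETUP_SQL)

# SQL uses bag semantics: Ann has two majors, so the join lists her twice.
with_dupes = conn.execute(JOIN_SQL.format(select="")).fetchall()

# DISTINCT is necessary here to remove the duplicate row.
no_dupes = conn.execute(JOIN_SQL.format(select="DISTINCT")).fetchall()

# Selecting a key of a single table can never repeat a row, so DISTINCT
# would be unnecessary in this query.
key_only = conn.execute("SELECT sid FROM student ORDER BY sid").fetchall()

print(with_dupes)  # [('Ann',), ('Ann',), ('Bob',)]
print(no_dupes)    # [('Ann',), ('Bob',)]
print(key_only)    # [(1,), (2,)]
```

The same pattern (connect, run one SELECT, print the rows) works against a file-backed database by passing a file path to sqlite3.connect instead of ":memory:".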

Q3: Crypt-arithmetic [20 points]

This exercise is designed to help you think out of the box on the use of database programming for solving problems. You are given the crypt-arithmetic puzzle:

      SEND
    + MORE
    ------
     MONEY

The goal of the puzzle is to substitute numbers (from zero to nine) for letters, so that the addition works out. There are some constraints your solution should respect:

1. The same number should be used for a given letter, throughout. For example, if you guess "5" for the letter E, then E should get the value "5" at all the places it occurs.
2. Different letters should get different numbers, e.g., you cannot assign "4" to both E and M.
3. None of the numbers SEND, MORE, or MONEY have any leading zeroes, i.e., they do not begin with a sequence of zeroes.

Explain how you will solve this puzzle by creating database tables and writing a query.

Q3.1. (5 points) The schema of the tables you use.
Q3.2. (10 points) Your SQL query.
Q3.3. (5 points) The solution you get for the puzzle when you use an SQL interpreter and RDBMS to solve this puzzle. Copy-paste the output you get.

Hints

The SQL query may be quite long, so you may find it useful to create the query in a text file and use the source command (or equivalent) in your SQL interpreter to read in and execute the query.

Q4: Life without HAVING [10 points]

You are given the following relations:

Take (StudentID, CourseID)
RequiredForGraduation (CourseID)

The Take relation lists IDs of students and IDs of courses taken by the students. (Note that in this problem, we are assuming that each course has a unique ID, as opposed to the schema we used in class.) The RequiredForGraduation relation lists the courses every student must take to graduate. The following query finds the students who have satisfied all the requirements for graduation.

    SELECT StudentId
    FROM Take AS T, RequiredForGraduation AS R
    WHERE T.CourseID = R.CourseID
    GROUP BY T.StudentID
    HAVING COUNT(T.CourseId) = (SELECT COUNT(CourseId)
                                FROM RequiredForGraduation);

You hate the HAVING clause and do not see the point of views. Rewrite this query without using views or the HAVING clause.

Q5. B+ Tree [15 points]

Assume the following B+ tree exists with d = 2:

[B+ tree diagram not reproduced in this transcription]

Sketch the state of the B+ tree after each step in the following sequence of insertions and deletions, maintaining at least 50% occupancy at each step and splitting a node only when it overflows. In the diagram above we have not shown pointers in the leaf nodes for simplicity, but remember that the leaf nodes are chained together in a linked list.

Note: Use the insertion and deletion algorithms given in the textbook, section 10.5 (page 349) and section 10.6 (page 353) respectively (also in the slides). The root node can have 1 to 2d keys. During deletion, redistribute the leaf pages wherever possible.

Q5.1. (3 points) Insert 34
Q5.2. (3 points) Insert 2
Q5.3. (3 points) Insert 15
Q5.4. (3 points) Delete 28
Q5.5. (3 points) Delete 8
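
As a reminder of what "overflow" means for a leaf when d = 2, here is a minimal sketch of a single leaf insertion. It assumes the common textbook convention that a leaf holds at most 2d keys, that on overflow the first d keys stay in place and the remaining d + 1 move to a new right sibling, and that the smallest key of the new sibling is copied up to the parent. The function name insert_into_leaf and the example keys are hypothetical; they are not taken from the tree in Q5, and the sketch does not handle parent updates or deletions.

```python
import bisect

def insert_into_leaf(keys, key, d):
    """Insert key into a sorted leaf that may hold at most 2*d keys.

    Returns (left, right, copied_up). When the leaf does not overflow,
    right and copied_up are None and left is the updated leaf.
    """
    new_keys = list(keys)
    bisect.insort(new_keys, key)       # keep the leaf sorted
    if len(new_keys) <= 2 * d:         # still fits: no split needed
        return new_keys, None, None
    left = new_keys[:d]                # first d keys stay in the old leaf
    right = new_keys[d:]               # remaining d + 1 keys go to a new leaf
    return left, right, right[0]       # smallest key of the new leaf is copied up

# Leaf with room left (hypothetical keys): no split.
print(insert_into_leaf([14, 16], 15, d=2))       # ([14, 15, 16], None, None)

# Full leaf (hypothetical keys): overflow triggers a split.
print(insert_into_leaf([2, 3, 5, 7], 8, d=2))    # ([2, 3], [5, 7, 8], 5)
```

Both halves of a split leaf end up with at least d keys, which is how the 50% occupancy requirement is preserved on insertion.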