Homework 1: RA, SQL and B+-Trees (due September 24th, 2014, 2:30pm, in class hard-copy please)

Virginia Tech, Computer Science
CS 5614 (Big) Data Management Systems, Fall 2014, Prakash

Reminders:
a. Out of 100 points. Contains 5 pages.
b. Rough time estimate: 5~8 hours.
c. Please type your answers. Illegible handwriting may get no points, at the discretion of the grader. Only drawings may be hand-drawn, as long as they are neat and legible.
d. There could be more than one correct answer. We shall accept them all.
e. Whenever you are making an assumption, please state it clearly.
f. Unless otherwise mentioned, you may use any SQL/RA operator seen in class/in textbook.
g. Unless otherwise specified, assume set-semantics for RA and bag-semantics for SQL.
h. Feel free to use the linear notation for RA and create intermediate views for SQL (unless otherwise mentioned in the problem).
i. Each HW has to be done individually, without taking any help from non-class resources (e.g., websites).

Q1. RA: Bars [30 points]

Consider the following relational database that stores information about bars and customers:

Drinker (name, address)
Bar (name, address)
Beer (name, brewer)
Frequents (drinker, bar, times_a_week)
Likes (drinker, beer)
Serves (bar, beer, price)

Write the following queries in relational algebra:

Q1.1. (2 points) Find all drinkers who frequent James Joyce Pub.
Q1.2. (2 points) Find all bars that serve both Amstel and Corona.
Q1.3. (3 points) Find all bars that serve at least one of the beers Amy likes for no more than $2.50.
Q1.4. (3 points) For each bar, find all beers served at this bar that are liked by none of the drinkers who frequent that bar.
Q1.5. (5 points) Find all drinkers who frequent only those bars that serve some beers they like.

Q1.6. (5 points) Find all drinkers who frequent every bar that serves some beers they like.
Q1.7. (10 points) Find those drinkers who enjoy exactly the same set of beers as Amy.

Q2. SQLite [25 points]

This problem uses a database containing data about a university. The relations are in a SQLite database. Download and install SQLite3 from http://www.sqlite.org

The schema of the database is provided below (keys are in bold, field types are omitted and they could be easily identified using SQLite):

student(sid, sname, sex, age, year, gpa)
dept(dname, numphds)
prof(pname, dname)
course(cno, cname, dname)
major(dname, sid)
section(dname, cno, sectno, pname)
enroll(sid, grade, dname, cno, sectno)

Before you start, it would be a good idea to take a look at the database file and familiarize yourself with its contents. You can run this file on SQLite, and the database and tables will be loaded properly. The file can be found here: http://people.cs.vt.edu/~badityap/classes/cs5614-fall14/homeworks/hw1/database.txt

In this assignment, you will only deal with the querying part of SQL. You are NOT allowed to tamper with (change the contents of) the database, i.e., no CREATE, INSERT, DELETE, ALTER, UPDATE, etc. However, please feel free to issue any query-oriented SQL statements, even if they are not related to the questions in this assignment.

Queries

Write SQL queries that answer the questions below (one query per question) and run them on SQLite. The query answers must not contain duplicates, but you should use the SQL keyword distinct only when necessary. For this question, creation of temporary tables is NOT allowed, i.e., for each question you have to write exactly one SQL statement (possibly using nested SQL). Note that the answer to some of the questions may be empty.

Q2.1. (2 points) Find the name of the oldest student.

Q2.2. (2 points) Find the names and gpas of the students who are enrolled in 312.
Q2.3. (2 points) Find the names and majors of students who are taking one of the Artificial Intelligence courses.
Q2.4. (2 points) Find the names of students who are enrolled in a course from both the "Computer Sciences" and "Chemical Engineering" departments.
Q2.5. (3 points) For each department, find the average age of the students majoring in that department along with the age difference between the oldest and youngest students.
Q2.6. (3 points) Find the names of students being taught by professor "Jason Singer".
Q2.7. (4 points) How many students have more than one major? (Hint: requires a nested query)
Q2.8. (4 points) Find the name(s) of the oldest first year student {year = 1} (Hint: requires a nested query)
Q2.9. (3 points) For those departments that have no majors taking a "Computer Sciences" course, print the department name and the number of PhD students in the department.

Assignment Submission

Format your answers as follows (in the hardcopy itself):

1. Query: SQL statement 1 (for query 1);
   Result: Copy-paste output for query 1
2. Query: SQL statement 2 (for query 2);
   Result: Copy-paste output for query 2
...
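The Q2 workflow (load the database script into SQLite, run one read-only query per question, and copy the query together with its output) can be sketched in Python with the standard sqlite3 module. This is only a minimal sketch under assumptions: the rows, the helper names load_database and format_answer, and the sample query are hypothetical and are not part of the assignment. With the real data one would presumably pass the contents of database.txt in place of SETUP_SQL.

```python
import sqlite3

# Hypothetical stand-in for the course's database.txt: it reuses the student
# schema from the handout, but every row below is made up for illustration.
SETUP_SQL = """
CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, sex TEXT,
                      age INTEGER, year INTEGER, gpa REAL);
INSERT INTO student VALUES (1, 'Ann Lee', 'f', 19, 1, 3.7);
INSERT INTO student VALUES (2, 'Bob Ray', 'm', 22, 3, 3.1);
INSERT INTO student VALUES (3, 'Cy Diaz', 'm', 20, 2, 3.9);
"""


def load_database(script: str) -> sqlite3.Connection:
    """Create an in-memory database and run the given SQL script once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


def format_answer(conn: sqlite3.Connection, number: int, query: str) -> str:
    """Run one read-only query and lay it out in the 'Query / Result' format."""
    rows = conn.execute(query).fetchall()
    lines = [f"{number}. Query: {query}", "Result:"]
    lines += ["|".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


conn = load_database(SETUP_SQL)
# Sample query (not one of the homework questions): students with gpa above 3.5.
answer = format_answer(
    conn, 1, "SELECT sname, gpa FROM student WHERE gpa > 3.5 ORDER BY sname;"
)
print(answer)
```

Each question still maps to exactly one SELECT statement; the helper only runs it and arranges the output for the hardcopy.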

Q3: Crypt-arithmetic [20 points]

This exercise is designed to help you think outside the box about the use of database programming for solving problems. You are given the crypt-arithmetic puzzle:

   SEND
 + MORE
 ------
  MONEY

The goal of the puzzle is to substitute numbers (from zero to nine) for letters, so that the addition works out. There are some constraints your solution should respect:

1. The same number should be used for a given letter, throughout. For example, if you guess "5" for the letter E, then E should get the value "5" at all the places it occurs.
2. Different letters should get different numbers, e.g., you cannot assign "4" to both E and M.
3. None of the numbers SEND, MORE, or MONEY have any leading zeroes, i.e., they do not begin with a sequence of zeroes.

Explain how you will solve this puzzle by creating database tables and writing a query.

Q3.1. (5 points) The schema of the tables you use.
Q3.2. (10 points) Your SQL query.
Q3.3. (5 points) The solution you get for the puzzle when you use an SQL interpreter and RDBMS to solve this puzzle. Copy-paste the output you get.

Hints

The SQL query may be quite long, so you may find it useful to create the query in a text file and use the source command (or equivalent) in your SQL interpreter to read in and execute the query.

Q4: Social Network Friends [10 points]

We are in charge of an online social network, MyBook. Consider the relation MyBookFriends(Id, FriendId), a giant table that keeps track of all friends of each user on MyBook. Together, the two attributes comprise the (only) key for this relation. Researchers studying social networks are interested in counting the number of people who have k friends, for every possible value of k.

(10 points) Write a SQL query that operates on MyBookFriends and returns a relation Counts(NumFriends, NumIds). If this relation stores the tuple (k, l), then it means that there are l distinct users (Id values) in MyBookFriends, each of whom has exactly k friends.

For no points: imagine you run this query on a real social network like Facebook, and then plot the values with k on the x-axis and l on the y-axis. What is the shape of the plot you expect? Uniform? Linear? Non-linear? Any other particular function?

Hints

As you can imagine, you do not know beforehand all the different values of k (or l) that should appear in Counts. Hence your SQL query should be able to figure out all these values automatically and correctly.

Q5. B+ Tree [15 points]

Assume the following B+ tree exists with d = 2:

Sketch the state of the B+ tree after each step in the following sequence of insertions and deletions, maintaining at least 50% occupancy at each step and using overflow-triggered splits. In the diagram above we have not shown pointers in the leaf nodes for simplicity, but remember that the leaf nodes form a linked list.

Note: Use the insertion and deletion algorithms given in the textbook, sections 10.5 (page 349) and 10.6 (page 353) respectively (also in the slides). The root node can have 1 to 2d keys. During deletion, redistribute the leaf pages wherever possible.

Q5.1. (3 points) Insert 34
Q5.2. (3 points) Insert 2
Q5.3. (3 points) Insert 15
Q5.4. (3 points) Delete 28
Q5.5. (3 points) Delete 8
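As a reminder of what "overflow-triggered split" means for d = 2, the sketch below shows a single leaf insertion in Python. It rests on assumptions: the function name insert_into_leaf is hypothetical, the keys are made up and are not taken from the assignment's tree, and the split convention (the first d keys stay, the remaining d + 1 move to a new leaf, and the smallest key of the new leaf is copied up to the parent) is the one commonly given in textbooks. The course textbook and slides remain the authority if they differ.

```python
import bisect

D = 2  # order of the tree: every non-root node holds between D and 2*D keys


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf and split the leaf if it overflows.

    Returns (left, right, copied_up_key). When no split is needed,
    right and copied_up_key are both None.
    """
    keys = list(leaf)
    bisect.insort(keys, key)       # keep the leaf's keys in sorted order
    if len(keys) <= 2 * D:         # still fits: no overflow, no split
        return keys, None, None
    left, right = keys[:D], keys[D:]   # D keys stay, D + 1 keys move
    return left, right, right[0]       # smallest key of new leaf is copied up


# Hypothetical full leaf with 2*D keys; inserting one more key forces a split.
left, right, up = insert_into_leaf([5, 7, 9, 11], 8)
print(left, right, up)
```

Both halves of the split hold at least d keys, which is how the 50% occupancy requirement is maintained after an insertion. Deletion with redistribution is not covered by this sketch.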