In-class activities: Sep 25, 2017

Similar documents
Please pick up your name card

HW1 is due tonight HW2 groups are assigned. Outline today: - nested queries and witnesses - We start with a detailed example! - outer joins, nulls?

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

L07: SQL: Advanced & Practice. CS3200 Database design (sp18 s2) 1/11/2018

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

Introduction to Database S ystems Systems CSE 444 Lecture 1 Introduction CSE Summer

SQLite, Firefox, and our small IMDB movie database. CS3200 Database design (sp18 s2) Version 1/17/2018

Introduction to Database S ystems Systems CSE 444 Lecture 1 Introduction CSE Summer

Introduc)on to Database Systems CSE 444. Lecture #1 March 29, 2010

Introduction to Database Systems CSE 444. Lecture 1 Introduction

L20: Joins. CS3200 Database design (sp18 s2) 3/29/2018

SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName

Introduction to Database Systems CSE 414

SECTION 1 DBMS LAB 1.0 INTRODUCTION 1.1 OBJECTIVES 1.2 INTRODUCTION TO MS-ACCESS. Structure Page No.

CSC 101 Spring 2010 Lab #8 Report Gradesheet

Graph Database Topics. Assignments Neo4j

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2015 Quiz I

Introduction to Data Management CSE 344

$99.95 per user. Writing Queries for SQL Server (2005/2008 Edition) CourseId: 160 Skill level: Run Time: 42+ hours (209 videos)

Microsoft Access 2016

Microsoft Access 2016

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Database Systems CSE 414

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

CS 4400 Introduction to Database Systems 2002 Spring Term Project (Section A)

The Relational Algebra

SQL queries II. Set operations and joins

CMSC 201 Fall 2018 Lab 04 While Loops

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

INF 315E Introduction to Databases School of Information Fall 2015

NCSS: Databases and SQL

CSC 261/461 Database Systems Lecture 5. Fall 2017

Neo4j. Graph database for highly interconnected data

Exercise #6: Sales by Item Year over Year Report

Introduction to Database Systems CSE 444. Lecture #1 March 26, 2007

CSE 344 Introduction to Data Management. Section 2: More SQL

Lecture 4: Advanced SQL Part II

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Query Processing & Optimization

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2017 Quiz I

Human-Computer Interaction Design

CS 338 Nested SQL Queries

Lecture 03: SQL. Friday, April 2 nd, Dan Suciu Spring

Lab 6 Counting with SQL

Hanoi, Counting and Sierpinski s triangle Infinite complexity in finite area

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2009 Quiz I Solutions

Name: Lirong TAN 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G.

CSE 1325 Project Description

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

WEEK 3 TERADATA EXERCISES GUIDE

Introduction to database design

Announcements. Outline UNIQUE. (Inner) joins. (Inner) Joins. Database Systems CSE 414. WQ1 is posted to gradebook double check scores

Instructor: Craig Duckett. Lecture 11: Thursday, May 3 th, Set Operations, Subqueries, Views

Teradata. This was compiled in order to describe Teradata and provide a brief overview of common capabilities and queries.

Introduction to Data Management CSE 344

Making ERAS work for you

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

2. E/R Design Considerations

Editing and Formatting Worksheets

Xerox Wide Format IJP Variable Information with Xerox FreeFlow Design Express Solution Guide

CS 1110, LAB 1: PYTHON EXPRESSIONS.

CS 564 PS1. September 10, 2017

Chapter 11: Query Optimization

Homework 1: RA, SQL and B+-Trees (due September 24 th, 2014, 2:30pm, in class hard-copy please)

Advanced Database Systems

ProgressTestA Unit 5. Vocabulary. Grammar

Homework , Fall 2013 Software process Due Wednesday, September Automated location data on public transit vehicles (35%)

CSCI 1100L: Topics in Computing Lab Lab 1: Introduction to the Lab! Part I

Welcome to today s Webcast. Thank you so much for joining us today!

CMSC 201 Spring 2017 Lab 05 Lists

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

CSC 261/461 Database Systems Lecture 19

Previously everyone in the class used the mysql account: Username: csci340user Password: csci340pass

Act! User's Guide Working with Your Contacts

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Introduction to Programming Style

L11: ER modeling 4. CS3200 Database design (sp18 s2) 2/15/2018

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

Stat Wk 3. Stat 342 Notes. Week 3, Page 1 / 71

EECS-3421a: Test #2 Queries

Intellicus Enterprise Reporting and BI Platform

Training Guide. Fees and Invoicing. April 2011

Welcome to Moodle! How To Moodle

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

SQL BASICS WITH THE SMALLBANKDB STEFANO GRAZIOLI & MIKE MORRIS

SQL Data Query Language

CMSC424: Database Design. Instructor: Amol Deshpande

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Three Query Language Formalisms

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

CS122 Lecture 10 Winter Term,

Homework 1: Relational Algebra and SQL (due February 10 th, 2016, 4:00pm, in class hard-copy please)

Introduction to Database Systems CSE 414

RELATIONAL DATA MODEL: Relational Algebra

Transcription:

In-class activities: Sep 25, 2017 Activities and group work this week function the same way as our previous activity. We recommend that you continue working with the same 3-person group. We suggest that you switch who within your group operates the laptop throughout class time, so you can all get hands-on practice. We know that you may not have completed some of the activity steps from last week and that s OK. You will complete some of those in conjunction with today s activities. Today s activity sheet refers back to last week s document and prompts you to complete a particular step or steps from there before you continue with certain steps here. Pacing: In a flipped classroom, the activities give you some structure, but the objective is that everyone can adjust the pace according to their needs. That means that you are in charge of your pace. However, we urge you to maintain focus and efficiency through the activities to ensure that you don t fall behind. We do not expect you to finish all the activities on this sheet today, as the queries in steps 5 and 7 are quite hard. Next week, we will have work wrap-up, which means that you will all get a chance to work through the problems you didn t get to. Today, we expect all of you to finish at least steps 1, 2, and 3 from this sheet, and steps 6 and 7 from last week s sheet for those who have not completed them already. However, we hope you will get farther, so that you are not left with too much work to cover next week. Quiz: There will be a short end-of-class quiz. As before, you should indicate your 3-person team on the quiz. We keep track of that based on table and laptop numbers. Note your table number, indicated on the whiteboard and monitor assigned to your table. There are two laptop numbers, one on yellow sticker and one on white sticker. Indicate both if possible, but at least one is necessary. If you cannot find these numbers, please talk to the instructors or TAs before you turn in your quiz.

Step 1: Load the data for the activity This week, we will work on our practicedb database, but we have modified the data slightly. We will drop the previous database, re-create it, and load the new data. Don t skip this step, as someone may have modified the database as was created last week on the machine you are working on, and some of the exercises may not work as intended. Start postgres and connect to psql (instructions on Moodle). Type \c to check what database you are connected to. You should be connected to a database other than practicedb. If you are connected to practicedb, connect to any other database with \c <database_name> Drop the previous practicedb database with the following command: DROP DATABASE practicedb; Then recreate it: CREATE DATABASE practicedb; Now connect to the practicedb database: \c practicedb Open the file practicedata2.sql, from this week s schedule and copy-paste the entire contents into your prompt. Check that you didn t get any errors and the proper tables are created.

Step 2: Advanced join practice If you have not completed (or you feel you need to practice again) steps 6 and 7 of last week s activity, do so now. Joins are very important operations in databases and you need a lot of practice with them. Writing join queries can get complicated, and some of these may take you some time to figure out. Don t hesitate to ask other teams for help and feedback, and if you feel stuck, talk to the TAs and instructors. Write a query to answer each of the following questions: 1. Which country or countries have sold products during the month of June? Hint: Note that there is a new table in your database (Sales) that records the number of each product item sold each month. How many tables do you need to join for this query? 2. Return a list of names of all employees who work on two distinct projects. Hint: This query is similar to the 2 nd query from last week s step 6: team-to-table. 3. List all employee names and the projects that each is involved in. Do not omit any employees even if they don t have a project. Hint: This query is similar to a query from last week s step 6, but not exactly the same. This needs a different type of join. 4. Return a list of all employee names and the names of their managers. Do not omit any employees even if they have no manager. Hint: This query is similar to a query from last week s step 6, but not exactly the same. This needs a different type of join. * * * Team-to-table * * * Reflect on how you determine how to write these joins. How do you determine how many tables you need to join? How do you determine if do you need a self-join? When do you need an outer join?

Step 3: Aggregation We will practice aggregate queries. These aggregates are over a single table (i.e., they don t involve joins). Write the following queries: 1. Compute the average price of all products. 2. Compute the average product price for each category of products. (You should write a single query that does that, not a separate query for each category!) 3. Compute the average price of all products cheaper than $150, for each category of products. 4. Compute the average price of products cheaper than $150, for each category, but only keep the categories where the maximum price of any product in this category is greater than $80. 5. Compute the average price of products cheaper than $150, for each category, but only keep the categories that start with the letter G. * * * Team-to-table * * * Check your understanding of the HAVING clause. Can you explain which conditions should go to the HAVING clause vs which conditions should go to the WHERE clause? Think about queries 4 and 5 above. Do your queries have a HAVING clause? If they do, can the condition(s) of the HAVING clause move to the WHERE clause? If they do not have a HAVING clause, can a WHERE condition move to the HAVING clause?

Step 4: More complex queries Aggregations and joins In this step, our queries will get more complex, as you need to combine aggregation and joins. 1. How many products have been sold from the category Gadgets? (Sales are recorded in the Sales table.) 2. What is the total revenue made by sales of Gadgets during the month of December? 3. For each employee, list how many of projects they are involved in. Check that you are reporting the number of projects for all employees and that the counts are correct. * * * Team-to-table * * * While some aggregates are simple (e.g., finding the price of the cheapest product), finding witnesses for these aggregates (e.g., which is the cheapest product) requires more complicated, typically nested queries. Work with the teams at your table to solve these harder queries: 1. For each product, find the month during which the product had the most sales out of the year. For example, if out of all the months of the year, Gizmo had the most sales in February, then the tuple (Gizmo, February) should be in the result. 2. Which company makes the most products? 3. Find the country that produces the cheapest product. 4. Find the employee with the most projects.

Step 5: Advanced practice on real-world data Connect to the IMDB database on your system. You can see all available datasets with the command \l You can see the tables in this dataset with the \d command. This dataset has the following tables: ACTOR (id, fname, lname, gender) MOVIE (id, name, year) DIRECTORS (id, fname, lname) CASTS (pid, mid, role) MOVIE_DIRECTORS (did, mid) GENRE (mid, genre) You will be working on queries of varying difficulty on this dataset. If you have not completed (or you feel you need to practice again) step 9 of last week s activity, do so now. Once you are done, continue with the queries on the following page. These are all challenging queries, so we suggest checking your work with the other teams at your table for all of them.

* * * Team-to-table * * * 1. List all directors who directed 30 or more thrillers, in descending order of the number of thrillers they directed. Return the directors' names and the number of thriller movies each of them directed. 2. We want to find actors that played five or more roles in the same movie during the year 2015. Notice that CASTS may have occasional duplicates, but we are not interested in these: we want actors that had five or more distinct roles in the same movie in the year 2015. a. Write a query that returns the actors' names, the movie name, and the number of distinct roles that they played in that movie (which will be 5). b. Write a query that returns the actors' names, the movie name, and all the distinct roles (five or more) that the actor played in that movie. 3. [Hard query] For simplicity, the IMDB dataset lists an actor s gender as M or F. (The dataset doesn t provision for non-binary actors, and the questions below are based on this assumption.) a. For each year, count the number of movies in that year that had only female actors. For movies where noone was casted, you can decide whether to consider them female-only or male-only. b. Now make a small change: for each year, report the percentage of movies with only female actors made that year, and also the total number of movies made that year. For example, one answer may be: 1990 31.81 13522 meaning that in 1990 there were 13,522 movies, and 31.81% had only female actors. You do not need to round your answer.

Step 6: Wrapping up Complete steps 8 and 10, and any other steps you may have omitted from the previous week s activity sheet. If you have additional time, work on the harder queries on step 7 of the following page.

Step 7: Harder queries [OPTIONAL] This step is optional. We suggest that you work on it only after you complete all other steps. First, note that these queries are quite complex, and may take a while to run. Think together with your teammates how you can work through them. One possibility is to create a smaller, sample dataset, which can help you write and debug the queries. This can be a bit of a challenge in itself! 1. Which is the film (or films) with the largest cast? Return the movie title and the size of the cast. By "cast size" we mean the number of distinct actors that played in that movie: if an actor played multiple roles, or if the actor is simply listed more than once in CASTS, we still count her/him only once. 2. Which decade has the largest number of films? We consider that a decade is a sequence of any 10 consecutive years. For example 1965, 1966,..., 1974 is a decade, and so is 1967, 1968,..., 1976. 3. How many actors have Bacon number 2? The Bacon number of an actor is the length of the shortest path between the actor and Kevin Bacon in the "coacting" graph. That is, Kevin Bacon has Bacon number 0; all actors who acted in the same film as KB have Bacon number 1; all actors who acted in the same film as some actor with Bacon number 1 have Bacon number 2, etc. To ponder: how can you compute the Bacon number of each actor in the database?