Introduction to Data Management CSE 344

Similar documents
Introduction to Database Systems CSE 414

SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName

Lecture 03: SQL. Friday, April 2 nd, Dan Suciu Spring

Database Systems CSE 414

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

Introduction to Data Management CSE 344

Lecture 4: Advanced SQL Part II

CSC 261/461 Database Systems Lecture 5. Fall 2017

Lecture 04: SQL. Monday, April 2, 2007

Announcements. Outline UNIQUE. (Inner) joins. (Inner) Joins. Database Systems CSE 414. WQ1 is posted to gradebook double check scores

Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

Lecture 04: SQL. Wednesday, October 4, 2006

SQL. Assit.Prof Dr. Anantakul Intarapadung

SQL. Structured Query Language

Principles of Database Systems CSE 544. Lecture #2 SQL, Relational Algebra, Relational Calculus

L07: SQL: Advanced & Practice. CS3200 Database design (sp18 s2) 1/11/2018

CS 2451 Database Systems: More SQL

CSE 544 Principles of Database Management Systems

CS 564 Midterm Review

Lectures 2&3: Introduction to SQL

Introduction to Database Systems CSE 414

Please pick up your name card

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344

Subqueries. 1. Subqueries in SELECT. 1. Subqueries in SELECT. Including Empty Groups. Simple Aggregations. Introduction to Database Systems CSE 414

Database design and implementation CMPSCI 645. Lecture 04: SQL and Datalog

CSE 544 Principles of Database Management Systems

Introduction to Relational Databases

Introduction to Data Management CSE 344

CSE 344 MAY 14 TH ENTITIES

Introduction to Data Management CSE 344

CSE 344 AUGUST 1 ST ENTITIES

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Introduction to Database Systems CSE 414

CSE 344 JULY 30 TH DB DESIGN (CH 4)

HW1 is due tonight HW2 groups are assigned. Outline today: - nested queries and witnesses - We start with a detailed example! - outer joins, nulls?

Database Systems CSE 414

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

Database Systems CSE 414

Introduction to Database Systems CSE 444

CSC 261/461 Database Systems Lecture 14. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Database Systems CSE 414

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

Database design and implementation CMPSCI 645. Lecture 06: Constraints and E/R model

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Introduction to Database Systems

Introduction to Data Management CSE 414

Announcements. Multi-column Keys. Multi-column Keys (3) Multi-column Keys. Multi-column Keys (2) Introduction to Data Management CSE 414

Introduction to Database Systems CSE 414. Lecture 8: SQL Wrap-up

Database Design Process Entity / Relationship Diagrams

Announcements. Database Design. Database Design. Database Design Process. Entity / Relationship Diagrams. Introduction to Data Management CSE 344

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Querying Data with Transact SQL

Entity/Relationship Modelling

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Keys, SQL, and Views CMPSCI 645

Announcements. Multi-column Keys. Multi-column Keys. Multi-column Keys (3) Multi-column Keys (2) Introduction to Data Management CSE 414

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2006 Lecture 3 - Relational Model

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

Announcements. Database Design. Database Design. Database Design Process. Entity / Relationship Diagrams. Database Systems CSE 414

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

COMP 430 Intro. to Database Systems

Database Systems CSE 303. Lecture 02

Database Systems CSE 303. Lecture 02

Chapter 7 Advanced SQL. Objectives

Introduction to SQL. Select-From-Where Statements Multirelation Queries Subqueries

CSE 344 Introduction to Data Management. Section 2: More SQL

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CMPT 354: Database System I. Lecture 4. SQL Advanced

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

CMPT 354: Database System I. Lecture 3. SQL Basics

Principles of Database Systems CSE 544p. Lecture #1 September 28, 2011

Lecture 2: Introduction to SQL

INF 212 ANALYSIS OF PROG. LANGS SQL AND SPREADSHEETS. Instructors: Crista Lopes Copyright Instructors.

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

SQL Functionality SQL. Creating Relation Schemas. Creating Relation Schemas

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

CS 2451 Database Systems: SQL

MIS2502: Data Analytics SQL Getting Information Out of a Database Part 1: Basic Queries

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

SQL Continued! Outerjoins, Aggregations, Grouping, Data Modification

Querying Data with Transact-SQL

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Midterm Review. Winter Lecture 13

Introduction to SQL. IT 5101 Introduction to Database Systems. J.G. Zheng Fall 2011

Before we talk about Big Data. Lets talk about not-so-big data

In-class activities: Sep 25, 2017

Announcements. Joins in SQL. Joins in SQL. (Inner) joins. Joins in SQL. Introduction to Database Systems CSE 414

Database Management

The Relational Data Model

Structured Query Language Continued. Rose-Hulman Institute of Technology Curt Clifton

Introduction to Database S ystems Systems CSE 444 Lecture 1 Introduction CSE Summer

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

CSE 344 JANUARY 8 TH SQLITE AND JOINS

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

Transcription:

Introduction to Data Management CSE 344 Lectures 4 and 5: Aggregates in SQL Dan Suciu - CSE 344, Winter 2012 1

Announcements Homework 1 is due tonight! Quiz 1 due Saturday Homework 2 is posted (due next week) You have accounts on SQL Server Needed in Homework 3 Start Management Studio (on Windows) Connect to IISQLSRV Use SQL Server Authentication You @uw login Password: in class. Then change!!

Outline Nulls (6.1.6-6.1.7) Outer joins (6.3.8) Aggregations (6.4.3 6.4.6) Examples, examples, examples Dan Suciu - CSE 344, Winter 2012 3

NULLS in SQL Whenever we don t have a value, we can put a NULL Can mean many things: Value does not exists Value exists but is unknown Value not applicable Etc. The schema specifies for each attribute if can be null (nullable attribute) or not How does SQL cope with tables that have NULLs? Dan Suciu - CSE 344, Winter 2012 4

Null Values If x= NULL then 4*(3-x)/7 is still NULL If x= NULL then x= Joe is UNKNOWN In SQL there are three boolean values: FALSE = 0 UNKNOWN = 0.5 TRUE = 1 Dan Suciu - CSE 344, Winter 2012 5

Null Values C1 AND C2 = min(c1, C2) C1 OR C2 = max(c1, C2) NOT C1 = 1 C1 SELECT * FROM Person WHERE (age < 25) AND (height > 6 OR weight > 190) E.g. age=20 height=null weight=200 Rule in SQL: include only tuples that yield TRUE Dan Suciu - CSE 344, Winter 2012 6

Null Values Unexpected behavior: SELECT * FROM Person WHERE age < 25 OR age >= 25 Some Person tuples are not included! Dan Suciu - CSE 344, Winter 2012 7

Null Values Can test for NULL explicitly: x IS NULL x IS NOT NULL SELECT * FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL Now it includes all Person tuples Dan Suciu - CSE 344, Winter 2012 8

Outerjoins Product(name, category) Purchase(prodName, store) An inner join : SELECT Product.name, Purchase.store FROM Product, Purchase WHERE Product.name = Purchase.prodName Same as: SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName But Products that never sold will be lost! Dan Suciu - CSE 344, Winter 2012 9

Outerjoins Product(name, category) Purchase(prodName, store) If we want the never-sold products, need an outerjoin : SELECT Product.name, Purchase.store FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName Dan Suciu - CSE 344, Winter 2012 10

Product Purchase Name Category ProdName Store Gizmo gadget Gizmo Wiz Camera Photo Camera Ritz OneClick Photo Camera Wiz Name Gizmo Camera Camera OneClick Store Wiz Ritz Wiz NULL

Outer Joins Left outer join: Include the left tuple even if there s no match Right outer join: Include the right tuple even if there s no match Full outer join: Include both left and right tuples even if there s no match Dan Suciu - CSE 344, Winter 2012 12

Aggregation in SQL sqlite3 lecture04 create table Purchase (pid int primary key, product varchar(15), price float, quantity int, month varchar(15));.import data.txt Purchase Specify a filename where the database will be stored Other DBMSs have other ways of impor5ng data Dan Suciu - CSE 344, Winter 2012 13

Simple Aggregations Five basic aggregate operations in SQL select count(*) from Purchase select count(quantity) from Purchase select sum(quantity) from Purchase select avg(price) from Purchase select max(quantity) from Purchase select min(quantity) from Purchase Except count, all aggregations apply to a single attribute Dan Suciu - CSE 344, Winter 2012 14

Aggregates and NULL Values Null values are not used in aggregates insert into Purchase values(11, gadget, NULL, NULL, april ) Let s try the following: select count(*) from Purchase select count(quantity) from Purchase select sum(quantity) from Purchase Dan Suciu - CSE 344, Winter 2012 15

Counting Duplicates COUNT applies to duplicates, unless otherwise stated: SELECT Count(product) FROM Purchase WHERE price > 4.99 same as Count(*) We probably want: SELECT Count(DISTINCT product) FROM Purchase WHERE price> 4.99 Dan Suciu - CSE 344, Winter 2012 16

More Examples SELECT Sum(price * quantity) FROM Purchase SELECT Sum(price * quantity) FROM Purchase WHERE product = bagel What do they mean? Dan Suciu - CSE 344, Winter 2012 17

Simple Aggregations Purchase Product Price Quantity Bagel 3 20 Bagel 1.50 20 Banana 0.5 50 Banana 2 10 Banana 4 10 SELECT Sum(price * quantity) FROM Purchase WHERE product = Bagel Dan Suciu - CSE 344, Winter 2012 90 (= 60+30) 18

Grouping and Aggregation Purchase(product, price, quantity) Find total quantities for all sales over $1, by product. SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product Let s see what this means Dan Suciu - CSE 344, Winter 2012 19

Grouping and Aggregation 1. Compute the FROM and WHERE clauses. 2. Group by the attributes in the GROUPBY 3. Compute the SELECT clause: grouped attributes and aggregates. Dan Suciu - CSE 344, Winter 2012 20

1&2. FROM-WHERE-GROUPBY Product Price Quantity Bagel 3 20 Bagel 1.50 20 Banana 0.5 50 Banana 2 10 Banana 4 10 Dan Suciu - CSE 344, Winter 2012 21

3. SELECT Product Price Quantity Bagel 3 20 Bagel 1.50 20 Banana 0.5 50 Banana 2 10 Banana 4 10 Product TotalSales Bagel 40 Banana 20 SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product 22

Other Examples Compare these two queries: SELECT product, count(*) FROM Purchase GROUP BY product SELECT month, count(*) FROM Purchase GROUP BY month SELECT product, sum(quantity) AS SumQuantity, max(price) AS MaxPrice FROM Purchase GROUP BY product Dan Suciu - CSE 344, Winter 2012 What does it mean? 23

Need to be Careful SELECT product, max(quantity) FROM Purchase GROUP BY product SELECT product, quantity FROM Purchase GROUP BY product Product Price Quantity Bagel 3 20 Bagel 1.50 20 Banana 0.5 50 Banana 2 10 Banana 4 10 Sqlite is WRONG on this query. SQL Server correctly gives an error 24

Ordering Results SELECT product, sum(price*quantity) as rev FROM purchase GROUP BY product ORDER BY rev desc Dan Suciu - CSE 344, Winter 2012 25

HAVING Clause Same query as earlier, except that we consider only products that had at least 30 sales. SELECT product, Sum(quantity) FROM Purchase WHERE price > 1 GROUP BY product HAVING Sum(quantity) > 30 HAVING clause contains conditions on aggregates. Dan Suciu - CSE 344, Winter 2012 26

WHERE vs HAVING WHERE condition is applied to individual rows The rows may or may not contributed to the aggregate No aggregates allowed here HAVING condition is applied to the entire group Entire group is returned, or not al all May use aggregate functions in the group Dan Suciu - CSE 344, Winter 2012 27

Aggregates and Joins create table Product (pid int primary key, pname varchar (15), manufacturer varchar(15)); insert into product values(1, 'bagel', 'Sunshine Co.'); insert into product values(2, 'banana', 'BusyHands'); insert into product values(3, 'gizmo', 'GizmoWorks'); insert into product values(4, 'gadget', 'BusyHands'); insert into product values(5, 'powergizmo, 'PowerWorks'); Dan Suciu - CSE 344, Winter 2012 28

Aggregate + Join Example SELECT x.manufacturer, count(*) FROM Product x, Purchase y WHERE x.pname = y.product GROUP BY x.manufacturer What do these query mean? SELECT x.manufacturer, y.month, count(*) FROM Product x, Purchase y WHERE x.pname = y.product GROUP BY x.manufacturer, y.month Dan Suciu - CSE 344, Winter 2012 29

General form of Grouping and Aggregation SELECT S FROM R 1,,R n WHERE C1 GROUP BY a 1,,a k HAVING C2 Why? S = may contain attributes a 1,,a k and/or any aggregates but NO OTHER ATTRIBUTES C1 = is any condition on the attributes in R 1,,R n C2 = is any condition on aggregate expressions and on attributes a 1,,a k Dan Suciu - CSE 344, Winter 2012 30

Semantics of SQL With Group-By SELECT S FROM R 1,,R n WHERE C1 GROUP BY a 1,,a k HAVING C2 Evaluation steps: 1. Evaluate FROM-WHERE using Nested Loop Semantics 2. Group by the attributes a 1,,a k 3. Apply condition C2 to each group (may have aggregates) 4. Compute aggregates in S and return the result Dan Suciu - CSE 344, Winter 2012 31

Empty Groups In the result of a group by query, there is one row per group in the result No group can be empty! In particular, count(*) is never 0 SELECT x.manufacturer, count(*) FROM Product x, Purchase y WHERE x.pname = y.product GROUP BY x.manufacturer What if there are no purchases for a manufacturer Dan Suciu - CSE 344, Winter 2012 32

Empty Groups: Example SELECT product, count(*) FROM purchase GROUP BY product SELECT product, count(*) FROM purchase WHERE price > 2.0 GROUP BY product 4 groups in our example dataset 3 groups in our example dataset Dan Suciu - CSE 344, Winter 2012 33

Empty Group Problem SELECT x.manufacturer, count(*) FROM Product x, Purchase y WHERE x.pname = y.product GROUP BY x.manufacturer What if there are no purchases for a manufacturer Dan Suciu - CSE 344, Winter 2012 34

Empty Group Solution: Outer Join SELECT x.manufacturer, count(y.pid) FROM Product x LEFT OUTER JOIN Purchase y ON x.pname = y.product GROUP BY x.manufacturer Dan Suciu - CSE 344, Winter 2012 35