Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

Similar documents
CSC 261/461 Database Systems Lecture 5. Fall 2017

Lecture 4: Advanced SQL Part II

Lecture 03: SQL. Friday, April 2 nd, Dan Suciu Spring

Lectures 2&3: Introduction to SQL

Introduction to Data Management CSE 344

Introduction to Database Systems CSE 414

SQL. Structured Query Language

CS 564 Midterm Review

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

CS 2451 Database Systems: More SQL

SQL. Assit.Prof Dr. Anantakul Intarapadung

Database Systems CSE 414

Lecture 04: SQL. Monday, April 2, 2007

Lecture 04: SQL. Wednesday, October 4, 2006

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Introduction to Data Management CSE 344

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2006 Lecture 3 - Relational Model

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Announcements. Outline UNIQUE. (Inner) joins. (Inner) Joins. Database Systems CSE 414. WQ1 is posted to gradebook double check scores

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Principles of Database Systems CSE 544. Lecture #2 SQL, Relational Algebra, Relational Calculus

Please pick up your name card

HW1 is due tonight HW2 groups are assigned. Outline today: - nested queries and witnesses - We start with a detailed example! - outer joins, nulls?

L07: SQL: Advanced & Practice. CS3200 Database design (sp18 s2) 1/11/2018

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems

Introduction to Relational Databases

Querying Data with Transact SQL

Database design and implementation CMPSCI 645. Lecture 04: SQL and Datalog

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

ASSIGNMENT NO Computer System with Open Source Operating System. 2. Mysql

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

CS 2451 Database Systems: SQL

Introduction to Database Systems CSE 414

CSE 344 JULY 30 TH DB DESIGN (CH 4)

SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName

Database Systems CSE 414

Keys, SQL, and Views CMPSCI 645

Introduction to Data Management CSE 344

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

Lecture 2: Introduction to SQL

Introduction to Database Systems

Introduction to Data Management CSE 344

Subqueries. 1. Subqueries in SELECT. 1. Subqueries in SELECT. Including Empty Groups. Simple Aggregations. Introduction to Database Systems CSE 414

CSE 344 MAY 14 TH ENTITIES

Introduction to Data Management CSE 344

Lecture 3: SQL Part II

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Querying Data with Transact-SQL

Database Systems CSE 414

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Database Systems CSE 414

Introduction to Data Management CSE 344

CSC 261/461 Database Systems Lecture 14. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Structured Query Language Continued. Rose-Hulman Institute of Technology Curt Clifton

Querying Data with Transact-SQL

Announcements. Multi-column Keys. Multi-column Keys (3) Multi-column Keys. Multi-column Keys (2) Introduction to Data Management CSE 414

Introduction to Data Management CSE 414

Querying Data with Transact-SQL (761)

20461: Querying Microsoft SQL Server 2014 Databases

CMPT 354: Database System I. Lecture 4. SQL Advanced

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

Announcements. Multi-column Keys. Multi-column Keys. Multi-column Keys (3) Multi-column Keys (2) Introduction to Data Management CSE 414

Oracle Database: SQL and PL/SQL Fundamentals NEW

Database Management Systems,

CSE 344 AUGUST 1 ST ENTITIES

Oracle Database 11g: SQL and PL/SQL Fundamentals

20761 Querying Data with Transact SQL

Where Are We? Next Few Lectures. Integrity Constraints Motivation. Constraints in E/R Diagrams. Keys in E/R Diagrams

CSE 544 Principles of Database Management Systems

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

CSE 344 Introduction to Data Management. Section 2: More SQL

Database Design Process Entity / Relationship Diagrams

WHAT IS SQL. Database query language, which can also: Define structure of data Modify data Specify security constraints

Introduction to database design

Sql Server Syllabus. Overview

Midterm Review. Winter Lecture 13

Oracle Syllabus Course code-r10605 SQL

CSC Web Programming. Introduction to SQL

Lab # 6. Using Subqueries and Set Operators. Eng. Alaa O Shama

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

Handout 9 CS-605 Spring 18 Page 1 of 8. Handout 9. SQL Select -- Multi Table Queries. Joins and Nested Subqueries.

Introduction to SQL. Select-From-Where Statements Multirelation Queries Subqueries

Oracle Database: SQL and PL/SQL Fundamentals Ed 2

Announcements. Database Design. Database Design. Database Design Process. Entity / Relationship Diagrams. Introduction to Data Management CSE 344

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT

Part I: Structured Data

1) Introduction to SQL

COURSE OUTLINE MOC 20461: QUERYING MICROSOFT SQL SERVER 2014

After completing this course, participants will be able to:

Introduction to Database Systems CSE 414

COMP 430 Intro. to Database Systems

INDEX. 1 Basic SQL Statements. 2 Restricting and Sorting Data. 3 Single Row Functions. 4 Displaying data from multiple tables

Transcription:

Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

Lecture 3 Lecture Overview 1. Aggregation & GROUP BY 2. Set operators & nested queries 3. Advanced SQL-izing (NULL, Outer Joins, etc.) 2

Lecture 3 > Section 2 AGGREGATION, GROUP BY AND HAVING CLAUSE 3

Lecture 3 > Section 2 > Aggregation Aggregation SELECT AVG(price) FROM Product WHERE maker = Toyota SELECT COUNT(*) FROM Product WHERE year > 1995 SQL supports several aggregation operations: SUM, COUNT, MIN, MAX, AVG Except for COUNT, all aggregations apply to a single attribute! 4

Lecture 3 > Section 2 > Aggregation Aggregation: COUNT COUNT counts the number of tuples including duplicates. SELECT COUNT(category) FROM Product WHERE year > 1995 Note: Same as COUNT(*). Why? We probably want: SELECT COUNT(DISTINCT category) FROM Product WHERE year > 1995 5

Lecture 3 > Section 2 > Aggregation More Examples Purchase(product, date, price, quantity) SELECT SUM(price * quantity) FROM Purchase SELECT SUM(price * quantity) FROM Purchase WHERE product = bagel What do these mean? 6

Lecture 3 > Section 2 > Aggregation Simple Aggregations Purchase Product Date Price Quantity bagel 10/21 1 20 banana 10/3 0.5 10 banana 10/10 1 10 bagel 10/25 1.50 20 SELECT SUM(price * quantity) FROM Purchase WHERE product = bagel 50 (= 1*20 + 1.50*20) 7

Lecture 3 > Section 2 > GROUP BY Grouping and Aggregation Purchase(product, date, price, quantity) SELECT product, SUM(price * quantity) AS TotalSales FROM Purchase WHERE date > 10/01 GROUP BY product Let s see what this means Find total sales after 10/1 per product. Note: Be very careful with dates! Use date/time related functions! 8

Lecture 3 > Section 2 > GROUP BY Grouping and Aggregation Semantics of the query: 1. Compute the FROM and WHERE clauses 2. Group by the attributes in the GROUP BY 3. Compute the SELECT clause: grouped attributes and aggregates 9

Lecture 3 > Section 2 > GROUP BY 1. Compute the FROM and WHERE clauses SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > 10/01 GROUP BY product FROM Product Date Price Quantity Bagel 10/21 1 20 Bagel 10/25 1.50 20 Banana 10/3 0.5 10 Banana 10/10 1 10 10

Lecture 3 > Section 2 > GROUP BY 2. Group by the attributes in the GROUP BY SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > 10/01 GROUP BY product Product Date Price Quantity Bagel 10/21 1 20 Bagel 10/25 1.50 20 Banana 10/3 0.5 10 Banana 10/10 1 10 GROUP BY Product Date Price Quantity Bagel 10/21 1 20 10/25 1.50 20 Banana 10/3 0.5 10 10/10 1 10 11

Lecture 3 > Section 2 > GROUP BY 3. Compute the SELECT clause: grouped attributes and aggregates SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > 10/01 GROUP BY product Product Date Price Quantity Bagel 10/21 1 20 10/25 1.50 20 Banana 10/3 0.5 10 10/10 1 10 SELECT Product TotalSales Bagel 50 Banana 15 12

Lecture 3 > Section 1 > ACTIVITY Activity 1) What do the next two queries calculate? Company(Cname, country) Product(PName, price, category, manufacturer) Purchase(id, product, buyer) SELECT SUM(price) AS total, SUM(price) *1.08 AS totalplustax FROM Product pr JOIN Purchase p ON pr.pname = p.product WHERE p.buyer = 'Joe Blow' SELECT p.buyer, SUM(price) AS total, SUM(price) *1.08 AS totalplustax FROM Product pr JOIN Purchase p ON pr.pname = p.product GROUP BY p.buyer ORDER BY 1 2) Write a query to find the price of the most expensive product in each category. 13

Lecture 3 > Section 2 > GROUP BY HAVING Clause Purchase(product, date, price, quantity) SELECT product, SUM(price*quantity) FROM Purchase WHERE date > 10/1/2005 GROUP BY product HAVING SUM(quantity) > 100 Same query as before, except that we consider only products that have more than 100 buyers HAVING clauses contains conditions on aggregates Whereas WHERE clauses condition on individual tuples 14

Lecture 3 > Section 2 > GROUP BY General form of Grouping and Aggregation SELECT S FROM R 1,,R n WHERE C 1 GROUP BY a 1,,a k HAVING C 2 S = Can ONLY contain attributes a 1,,a k and/or aggregates over other attributes C 1 = is any condition on the attributes in R 1,,R n C 2 = is any condition on the aggregate expressions Why? 15

Lecture 3 > Section 2 > GROUP BY General form of Grouping and Aggregation Evaluation steps: SELECT S FROM R 1,,R n WHERE C 1 GROUP BY a 1,,a k HAVING C 2 1. Evaluate FROM-WHERE: apply condition C 1 on the attributes in R 1,,R n 2. GROUP BY the attributes a 1,,a k 3. Apply condition C 2 to each group (may have aggregates) 4. Compute aggregates in S and do projection (SELECT) 16

Lecture 3 > Section 2 > ACTIVITY Activity 1) What does this query do? Company(Cname, country) Product(PName, price, manufacturer) Purchase(id, product, buyer) SELECT p.buyer, SUM(price) AS total, COUNT(*) AS purchases FROM Product pr JOIN Purchase p ON pr.pname = p.product GROUP BY p.buyer HAVING purchases >2 ORDER BY 1 2) What is the revenue per product in the DB? 17

Lecture 3 > Section 1 SET OPERATORS & NESTED QUERIES 18

Lecture 3 > Section 1 > Set Operators Multisets Multiset X Tuple (1, a) (1, a) (1, b) (2, c) (2, c) (2, c) (1, d) (1, d) Equivalent Representations of a Multiset λ X = Count of tuple in X (Items not listed have implicit count 0) Multiset X Tuple λ(x) (1, a) 2 (1, b) 1 (2, c) 3 (1, d) 2 Note: In a set all counts are {0,1}. 20

Lecture 3 > Section 1 > Set Operators Explicit Set Operators: INTERSECT INTERSECT clause/operator is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement. Example: Find all departments with an ID >= 25 that do not have an employee by the name of Anderson. SELECT department_id FROM departments WHERE department_id >= 25 INTERSECT SELECT department_id FROM employees WHERE last_name <> 'Anderson'; Q 1 Q 2 24

Lecture 3 > Section 1 > Set Operators UNION The UNION operator is used to combine the result sets of 2 or more SELECT statements. It removes duplicate rows between the various SELECT statements. Example: Selects all the different cities (only distinct values) for customers and suppliers. SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City Q 1 Q 2 Result does not contain duplicates! 25

Lecture 3 > Section 1 > Set Operators UNION ALL Same as UNION, but it does not remove duplicate rows. So we could later count how many customers/suppliers we have in each city. SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers ORDER BY City Q 1 Q 2 ALL indicates Multiset operations! 26

Lecture 3 > Section 1 > Set Operators EXCEPT The EXCEPT operator is used to return all rows in the first SELECT statement that are not returned by the second SELECT statement. Example: Find cities where we have customers, but no suppliers. SELECT City FROM Customers EXCEPT SELECT City FROM Suppliers ORDER BY City Q 1 Q 2 27

Lecture 3 > Section 1 > Set Operators NESTED QUERIES 28

Lecture 3 > Section 1 > Nested Queries High-level note on nested queries We can do nested queries because SQL is compositional: Everything (inputs / outputs) is represented as multisets (a relation). The output of one query can thus be used as the input to another (nesting)! This is extremely powerful!

Lecture 3 > Section 1 > Nested Queries Nested Queries Company(Cname, country) Product(PName, price, manufacturer) Purchase(id, product, buyer) SELECT c.country FROM Company c WHERE c.name IN ( SELECT pr.manufacturer FROM Purchase p, Product pr WHERE p.product = pr.name AND p.buyer = Joe Blow ) Countries where one can find companies that manufacture products bought by Joe Blow 30

Lecture 3 > Section 1 > Nested Queries Nested Queries Is the query equivalent to this 3-way join? SELECT c.country FROM Company c, Product pr, Purchase p WHERE c.name = pr.manufacturer AND pr.name = p.product AND p.buyer = Joe Blow Cities where one can find companies that manufacture products bought by Joe Blow Beware of duplicates! 31

Lecture 3 > Section 1 > Nested Queries Nested Queries SELECT DISTINCT c.country FROM Company c, Product pr, Purchase p WHERE c.name = pr.manufacturer AND pr.name = p.product AND p.buyer = Joe Blow SELECT DISTINCT c.country FROM Company c WHERE c.name IN ( SELECT pr.manufacturer FROM Purchase p, Product pr WHERE p.product = pr.name AND p.buyer = Joe Blow ) Now they are equivalent 32

Lecture 3 > Section 1 > Nested Queries Subqueries Returning Relations You can also use operations of the form: s <operator> ALL R s <operator> ANY R EXISTS R Ex: Product(name, price, category, maker) SQLite: ANY and ALL are not supported! They can be typically implemented using max/min and grouping. EXISTS is supported. SELECT name FROM Product WHERE price > ALL( SELECT price FROM Product WHERE maker = Gizmo-Works ) Find products that are more expensive than all those produced by Gizmo-Works 33

Lecture 3 > Section 1 > Nested Queries Subqueries Returning Relations You can also use operations of the form: s <operator> ALL R s <operator> ANY R EXISTS R Ex: Product(name, price, category, maker) SELECT p1.name FROM Product p1 WHERE p1.maker = Gizmo-Works AND EXISTS( SELECT p2.name FROM Product p2 WHERE p2.maker <> Gizmo-Works AND p1.name = p2.name) Find copycat products, i.e. products made by competitors with the same names as products made by Gizmo-Works <> means!= (not equal) 34

Lecture 3 > Section 1 > Nested Queries Correlated Queries A correlated subquery (or synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query. Because the subquery is evaluated once for each row processed by the outer query, it can be inefficient. Movie(title, year, director, length) SELECT DISTINCT title FROM Movie AS m WHERE year <> ANY( SELECT year FROM Movie WHERE title = m.title) Find movies whose title appears more than once (in different years). 35

Lecture 3 > Section 1 > Nested Queries Complex Correlated Query Product(name, price, category, maker, year) SELECT DISTINCT x.name, x.maker FROM Product AS x WHERE x.price > ALL( SELECT y.price FROM Product AS y WHERE x.maker = y.maker AND y.year < 1972) Find products (and their manufacturers) that are more expensive than all products made by the same manufacturer before 1972 Can be very powerful but is much harder to optimize. It is best to avoid correlated queries. 36

Lecture 3 > Section 1 > Nested Queries Subqueries in other places SELECT * FROM (SELECT product, COUNT(product) AS count FROM Purchase GROUP BY product) WHERE count > 2 SELECT *, (SELECT count(*) FROM Product p1 WHERE p1.category = p2.category) AS '# Prod. in Cat.' FROM Product p2 Subqueries can appear wherever a table or a value is needed. 37

Lecture 3 > Section 1 > Summary Basic SQL Summary SQL provides a high-level declarative language for manipulating data (DML) Set operators are powerful but have some subtleties (duplicates). Powerful, nested queries also allowed. 38

Lecture 3 > Section 1 > ACTIVITY Activity Company(Cname, country) Product(PName, price, manufacturer) Purchase(id, product, buyer) What products did Joe Blow purchase and how much were they? 1) Solve as a multiple join. 2) Solve as a correlated subquery (in SELECT). 39

Lecture 3 > Section 3 ADVANCED SQL: NULLS, CASTING AND OUTER JOINS 40

Lecture 3 > Section 3 > NULLs NULL Values Whenever we do not have a value, we can use NULL Can mean many things: Value does not exists Value exists but is unknown (n/a, not available) Value not applicable The schema specifies for each attribute if it can be null (nullable attribute) or not with NOT NULL 43

Lecture 3 > Section 3 > NULLs Null Values and Operators For numerical operations, NULL NULL: If x = NULL then 4*(3-x)/7 is also NULL For boolean operations, in SQL there are three values: FALSE = 0 TRUE = 1 UNKNOWN If x= NULL then x= Joe is UNKNOWN Note: comparison in SQL is a single = SQLite does not have a boolean datatype. It uses Integer instead! Try: SELECT 2>1 SELECT 2>NULL SELECT 1+NULL 44

Lecture 3 > Section 3 > NULLs Null Values in the WHERE Clause SELECT * FROM Person WHERE (age < 25) AND (height > 6 AND weight > 190) Will not return age=20, height=null, weight=200 Since NULL > 6 is UNKNOWN! 45

Lecture 3 > Section 3 > NULLs Null Values in WHERE Clauses Unexpected behavior: SELECT * FROM Person WHERE age < 25 OR age >= 25 Should return all persons, but persons with NULL as age are not included! You can use CASE with IS NULL, ISNULL(), IFNULL() or COALESCE() to handle NULL values. 46

Lecture 3 > Section 3 > NULLs CASTing Data Types SQL is a typed language. I.e., values and columns have a data type. SELECT 3/2 SELECT 3.0/2 SELECT 3/2.0 SELECT CAST(3 AS DOUBLE)/2 1 1.5 1.5 1.5 Typecasting rules are similar to other typed languages like C++. 47

Lecture 3 > Section 3 > Outer Joins RECAP: Inner Joins Inner joins select all rows from both tables as long as there is a match between the columns in both tables. Inner joins are the default in SQL. Example: What stores sell what products? Product(name, category) Purchase(prodName, store) SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName SELECT Product.name, Purchase.store FROM Product, Purchase WHERE Product.name = Purchase.prodName Both equivalent: Both INNER JOINS! 48

Lecture 3 > Section 3 > Outer Joins Inner Joins + NULLS = Lost data? Product(name, category) Purchase(prodName, store) SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName SELECT Product.name, Purchase.store FROM Product, Purchase WHERE Product.name = Purchase.prodName However: Products that were never sold in any store (with no Purchase tuple) will be lost! 49

Lecture 3 > Section 3 > Outer Joins Outer Joins An outer join returns also tuples from the joined relations that do not have a corresponding tuple in the other relations (filled with NULL values). Left outer joins in SQL: SELECT Product.name, Purchase.store FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName Now we ll get products even if they didn t sell 50

Lecture 3 > Section 3 > Outer Joins Product INNER JOIN: Purchase name category prodname store Gizmo gadget Gizmo Wiz Camera Photo Camera Ritz OneClick Photo Camera Wiz SELECT Product.name, Purchase.store FROM Product INNER JOIN Purchase ON Product.name = Purchase.prodName name Gizmo Camera Camera store Wiz Ritz Wiz 51

Lecture 3 > Section 3 > Outer Joins LEFT OUTER JOIN: Product Purchase name category prodname store Gizmo gadget Gizmo Wiz Camera Photo Camera Ritz OneClick Photo Camera Wiz SELECT Product.name, Purchase.store FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName name Gizmo Camera Camera store Wiz Ritz Wiz OneClick NULL 52

Lecture 3 > Section 3 > Outer Joins Other Outer Joins Left outer join: Include the left tuple even if there s no match Right outer join: Include the right tuple even if there s no match Full outer join: Include the both left and right tuples even if there s no match SQLite currently only supports LEFT OUTER JOIN, but you can easily just change the order of the tables in the query. 53

Lecture 3 > Section 3 > INSERT Adding Data INSERT INTO TABLE_NAME [(column1, column2, column3,...columnn)] VALUES (value1, value2, value3,...valuen); Note: column names are optional. INSERT INTO Product VALUES ('Gizmo', 19, 'Gadgets', 'GWorks') 54

Lecture 3 > Section 3 > INSERT Adding Data The data can also come from an existing table. INSERT INTO first_table_name [(column1, column2,... columnn)] SELECT column1, column2,...columnn FROM second_table_name [WHERE condition]; 55

Lecture 3 > Section 3 > DROP Removing a Table DROP TABLE database_name.table_name 56

Select Syntax Diagram (SQLite) http://www.sqlite.org/lang.html

Lecture 3 > Section 3 > ACTIVITY Activity Review (http://www.tutorialspoint.com/sqlite/): Transaction control Views Indexes Date & Time 58