CSC 261/461 Database Systems Lecture 5. Fall 2017

Similar documents
Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

Lecture 4: Advanced SQL Part II

Lecture 03: SQL. Friday, April 2 nd, Dan Suciu Spring

Lectures 2&3: Introduction to SQL

SQL. Structured Query Language

Introduction to Data Management CSE 344

Introduction to Database Systems CSE 414

CS 564 Midterm Review

CS 2451 Database Systems: More SQL

Lecture 04: SQL. Monday, April 2, 2007

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

SQL. Assit.Prof Dr. Anantakul Intarapadung

Database Systems CSE 414

Lecture 04: SQL. Wednesday, October 4, 2006

HW1 is due tonight HW2 groups are assigned. Outline today: - nested queries and witnesses - We start with a detailed example! - outer joins, nulls?

Introduction to Data Management CSE 344

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2006 Lecture 3 - Relational Model

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

L07: SQL: Advanced & Practice. CS3200 Database design (sp18 s2) 1/11/2018

CSE 544 Principles of Database Management Systems

Announcements. Outline UNIQUE. (Inner) joins. (Inner) Joins. Database Systems CSE 414. WQ1 is posted to gradebook double check scores

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

CSE 544 Principles of Database Management Systems

Principles of Database Systems CSE 544. Lecture #2 SQL, Relational Algebra, Relational Calculus

Please pick up your name card

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

Database design and implementation CMPSCI 645. Lecture 04: SQL and Datalog

CS 2451 Database Systems: SQL

Introduction to Relational Databases

Querying Data with Transact SQL

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

Database Systems CSE 414

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

Introduction to Database Systems CSE 414

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName

Querying Data with Transact-SQL

CSC 261/461 Database Systems Lecture 14. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Keys, SQL, and Views CMPSCI 645

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344

CSE 344 MAY 14 TH ENTITIES

CSE 344 JULY 30 TH DB DESIGN (CH 4)

Subqueries. 1. Subqueries in SELECT. 1. Subqueries in SELECT. Including Empty Groups. Simple Aggregations. Introduction to Database Systems CSE 414

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Introduction to Data Management CSE 344

Relational Model, Relational Algebra, and SQL

Database Systems CSE 414

Lecture 3: SQL Part II

Database Systems CSE 414

Querying Data with Transact-SQL

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

Introduction to Data Management CSE 344

CSE 344 AUGUST 1 ST ENTITIES

SQL Continued! Outerjoins, Aggregations, Grouping, Data Modification

Structured Query Language Continued. Rose-Hulman Institute of Technology Curt Clifton

Lecture 2: Introduction to SQL

CS 582 Database Management Systems II

CSC 261/461 Database Systems Lecture 19

Introduction to SQL. Select-From-Where Statements Multirelation Queries Subqueries. Slides are reused by the approval of Jeffrey Ullman s

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Introduction to SQL. Select-From-Where Statements Multirelation Queries Subqueries

SQL: Data Manipulation Language. csc343, Introduction to Databases Diane Horton Winter 2017

SQL: csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Sina Meraji. Winter 2018

COMP 430 Intro. to Database Systems

Entity/Relationship Modelling

Introduction to Database Systems CSE 414. Lecture 8: SQL Wrap-up

Querying Data with Transact-SQL (20761)

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

Querying Microsoft SQL Server 2014

RELATIONAL ALGEBRA. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT

Relational Algebra and SQL

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

20461: Querying Microsoft SQL Server 2014 Databases

Chapter 6 The database Language SQL as a tutorial

Database Design Process Entity / Relationship Diagrams

Introduction to Database Systems CSE 414

Multisets and Duplicates. SQL: Duplicate Semantics and NULL Values. How does this impact Queries?

CMPT 354: Database System I. Lecture 3. SQL Basics

CMPT 354: Database System I. Lecture 4. SQL Advanced

After completing this course, participants will be able to:

RELATIONAL ALGEBRA II. CS121: Relational Databases Fall 2017 Lecture 3

EECS 647: Introduction to Database Systems

Introduction to SQL. Multirelation Queries Subqueries. Slides are reused by the approval of Jeffrey Ullman s

Introduction to Database Systems CSE 444

SQL: Data Manipulation Language

Introduction to Database Systems

Nested Queries. Dr Paolo Guagliardo. Aggregate results in WHERE The right way. Fall 2018

Handout 9 CS-605 Spring 18 Page 1 of 8. Handout 9. SQL Select -- Multi Table Queries. Joins and Nested Subqueries.

Database design and implementation CMPSCI 645. Lecture 06: Constraints and E/R model

Querying Microsoft SQL Server 2012/2014

Midterm Review. Winter Lecture 13

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

Transcription:

CSC 261/461 Database Systems Lecture 5 Fall 2017

MULTISET OPERATIONS IN SQL 2

UNION SELECT R.A FROM R, S WHERE R.A=S.A UNION SELECT R.A FROM R, T WHERE R.A=T.A Q 1 Q 2 r. A r. A = s. A r. A r. A = t. A} Why aren t there duplicates? What if we want duplicates? 3

UNION ALL SELECT R.A FROM R, S WHERE R.A=S.A UNION ALL SELECT R.A FROM R, T WHERE R.A=T.A Q 1 Q 2 r. A r. A = s. A r. A r. A = t. A} ALL indicates Multiset operations 4

EXCEPT SELECT R.A FROM R, S WHERE R.A=S.A EXCEPT SELECT R.A FROM R, T WHERE R.A=T.A Q 1 Q 2 r. A r. A = s. A \{r. A r. A = t. A} 5

Nested queries: Sub-queries Returning Relations Another example : Company(name, city) Product(name, maker) Purchase(id, product, buyer) SELECT DISTINCT c.city FROM Company c WHERE c.name IN ( SELECT pr.maker FROM Purchase p, Product pr WHERE p.product = pr.name AND p.buyer = Joe Blow ) Cities where one can find companies that manufacture products bought by Joe Blow 6

Subqueries Returning Relations You can also use operations of the form: s > ALL R s < ANY R EXISTS R ANY and ALL not supported by SQLite. Ex: Product(name, price, category, maker) SELECT name FROM Product WHERE price > ALL( SELECT price FROM Product WHERE maker = Gizmo-Works ) Find products that are more expensive than all those produced by Gizmo-Works 7

Subqueries Returning Relations You can also use operations of the form: s > ALL R s < ANY R EXISTS R Ex: Product(name, price, category, maker) SELECT p1.name FROM Product p1 WHERE p1.maker = Gizmo-Works AND EXISTS( SELECT p2.name FROM Product p2 WHERE p2.maker <> Gizmo-Works AND p1.name = p2.name) <> means!= Find copycat products, i.e. products made by competitors with the same names as products made by Gizmo-Works 8

Nested queries as alternatives to INTERSECT and EXCEPT (SELECT R.A, R.B FROM R) INTERSECT (SELECT S.A, S.B FROM S) SELECT R.A, R.B FROM R WHERE EXISTS( SELECT * FROM S WHERE R.A=S.A AND R.B=S.B) (SELECT R.A, R.B FROM R) EXCEPT (SELECT S.A, S.B FROM S) SELECT R.A, R.B FROM R WHERE NOT EXISTS( SELECT * FROM S WHERE R.A=S.A AND R.B=S.B) 9

Correlated Queries Movie(title, year, director, length) SELECT DISTINCT title FROM Movie AS m WHERE year <> ANY( SELECT year FROM Movie WHERE title = m.title) Find movies whose title appears more than once. Note the scoping of the variables! 10

Basic SQL Summary SQL provides a high-level declarative language for manipulating data (DML) The workhorse is the SFW block Set operators are powerful but have some subtleties Powerful, nested queries also allowed. 11

2. AGGREGATION & GROUP BY 12

What you will learn about in this section 1. Aggregation operators 2. GROUP BY 3. GROUP BY: with HAVING, semantics 13

Aggregation SELECT AVG(price) FROM Product WHERE maker = Toyota SELECT COUNT(*) FROM Product WHERE year > 1995 SQL supports several aggregation operations: SUM, COUNT, MIN, MAX, AVG Except COUNT, all aggregations apply to a single attribute 14

Aggregation: COUNT COUNT applies to duplicates, unless otherwise stated SELECT COUNT(category) FROM Product WHERE year > 1995 Note: Same as COUNT(*). Why? We probably want: SELECT COUNT(DISTINCT category) FROM Product WHERE year > 1995 15

More Examples Purchase(product, date, price, quantity) SELECT SUM(price * quantity) FROM Purchase SELECT SUM(price * quantity) FROM Purchase WHERE product = bagel What do these mean? 16

Simple Aggregations Purchase Product Date Price Quantity bagel 10/21 1 20 banana 10/3 0.5 10 banana 10/10 1 10 bagel 10/25 1.50 20 SELECT SUM(price * quantity) FROM Purchase WHERE product = bagel 50 (= 1*20 + 1.50*20) 17

Grouping and Aggregation Purchase(product, date, price, quantity) SELECT product, SUM(price * quantity) AS TotalSales FROM Purchase WHERE date > 10/1/2005 GROUP BY product Find total sales after 10/1/2005 per product. Let s see what this means 18

Grouping and Aggregation Semantics of the query: 1. Compute the FROM and WHERE clauses 2. Group by the attributes in the GROUP BY 3. Compute the SELECT clause: grouped attributes and aggregates 19

1. Compute the FROM and WHERE clauses SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > 10/1/2005 GROUP BY product FROM Product Date Price Quantity Bagel 10/21 1 20 Bagel 10/25 1.50 20 Banana 10/3 0.5 10 Banana 10/10 1 10 20

2. Group by the attributes in the GROUP BY SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > 10/1/2005 GROUP BY product Product Date Price Quantity Bagel 10/21 1 20 Bagel 10/25 1.50 20 Banana 10/3 0.5 10 Banana 10/10 1 10 GROUP BY Product Date Price Quantity Bagel 10/21 1 20 10/25 1.50 20 Banana 10/3 0.5 10 10/10 1 10 21

3. Compute the SELECT clause: grouped attributes and aggregates SELECT product, SUM(price*quantity) AS TotalSales FROM Purchase WHERE date > 10/1/2005 GROUP BY product Product Date Price Quantity Bagel 10/21 1 20 10/25 1.50 20 Banana 10/3 0.5 10 10/10 1 10 SELECT Product TotalSales Bagel 50 Banana 15 22

HAVING Clause SELECT product, SUM(price*quantity) FROM Purchase WHERE date > 10/1/2005 GROUP BY product HAVING SUM(quantity) > 100 HAVING clauses contains conditions on aggregates Same query as before, except that we consider only products that have more than 100 buyers Whereas WHERE clauses condition on individual tuples 23

General form of Grouping and Aggregation SELECT S FROM R 1,,R n WHERE C 1 GROUP BY a 1,,a k HAVING C 2 S = Can ONLY contain attributes a 1,,a k and/or aggregates over other attributes Why? C 1 = is any condition on the attributes in R 1,,R n C 2 = is any condition on the aggregate expressions 24

General form of Grouping and Aggregation Evaluation steps: SELECT S FROM R 1,,R n WHERE C 1 GROUP BY a 1,,a k HAVING C 2 1. Evaluate FROM-WHERE: apply condition C 1 on the attributes in R 1,,R n 2. GROUP BY the attributes a 1,,a k 3. Apply condition C 2 to each group (may have aggregates) 4. Compute aggregates in S and return the result 25

Group-by vs. Nested Query Author(login, name) Wrote(login, url) Find authors who wrote ³ 10 documents: Attempt 1: with nested queries SELECT DISTINCT Author.name FROM Author WHERE COUNT( SELECT Wrote.url FROM Wrote WHERE Author.login = Wrote.login) > 10 This is SQL by a novice 26

Group-by vs. Nested Query Find all authors who wrote at least 10 documents: Attempt 2: SQL style (with GROUP BY) SELECT Author.name FROM Author, Wrote WHERE Author.login = Wrote.login GROUP BY Author.name HAVING COUNT(Wrote.url) > 10 This is SQL by an expert No need for DISTINCT: automatically from GROUP BY 27

Group-by vs. Nested Query Which way is more efficient? Attempt #1- With nested: How many times do we do a SFW query over all of the Wrote relations? Attempt #2- With group-by: How about when written this way? With GROUP BY can be much more efficient!

Topics Covered 1. NULLs 2. Outer Joins 3. With and Case 4. Constraint 5. Schema Change Statements 29

Slide 7-6 Comparisons Involving NULL and Three-Valued Logic (cont d.)

NULLS in SQL Whenever we don t have a value, we can put a NULL Can mean many things: Value does not exists Value exists but is unknown Value not applicable Etc. The schema specifies for each attribute if can be null (nullable attribute) or not Each individual NULL value considered to be different from every other NULL value SQL uses a three-valued logic: TRUE, FALSE, and UNKNOWN (like Maybe) NULL = NULL comparison is avoided How does SQL cope with tables that have NULLs? 31

Null Values Unexpected behavior: SELECT * FROM Person WHERE age < 25 OR age >= 25 Some Persons are not included! 32

Null Values Can test for NULL explicitly: x IS NULL x IS NOT NULL SELECT * FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL Now it includes all Persons! 33

RECAP: Inner Joins By default, joins in SQL are inner joins : Product(name, category) Purchase(prodName, store) SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName SELECT Product.name, Purchase.store FROM Product, Purchase WHERE Product.name = Purchase.prodName Both equivalent: Both INNER JOINS! 34

Inner Joins + NULLS = Lost data? By default, joins in SQL are inner joins : Product(name, category) Purchase(prodName, store) SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName SELECT Product.name, Purchase.store FROM Product, Purchase WHERE Product.name = Purchase.prodName However: Products that never sold (with no Purchase tuple) will be lost! 35

Outer Joins An outer join returns tuples from the joined relations that don t have a corresponding tuple in the other relations I.e. If we join relations A and B on a.x = b.x, and there is an entry in A with X=5, but none in B with X=5 A LEFT OUTER JOIN will return a tuple (a, NULL)! Left outer joins in SQL: SELECT Product.name, Purchase.store FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName Now we ll get products even if they didn t sell 36

Product INNER JOIN: Purchase name category prodname store Gizmo gadget Gizmo Wiz Camera Photo Camera Ritz OneClick Photo Camera Wiz SELECT Product.name, Purchase.store FROM Product INNER JOIN Purchase ON Product.name = Purchase.prodName Note: another equivalent way to write an INNER JOIN! name Gizmo Camera Camera store Wiz Ritz Wiz 37

Product LEFT OUTER JOIN: Purchase name category prodname store Gizmo gadget Gizmo Wiz Camera Photo Camera Ritz OneClick Photo Camera Wiz SELECT Product.name, Purchase.store FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName name Gizmo Camera Camera OneClick store Wiz Ritz Wiz NULL 38

Other Outer Joins Left outer join: Include the left tuple even if there s no match Right outer join: Include the right tuple even if there s no match Full outer join: Include the both left and right tuples even if there s no match 39

Acknowledgement Some of the slides in this presentation are taken from the slides provided by the authors. Many of these slides are taken from cs145 course offered by Stanford University. Thanks to YouTube, especially to Dr. Daniel Soper for his useful videos. CSC 261, Spring 2017, UR