CMPT 354: Database System I. Lecture 4. SQL Advanced

Similar documents
CMPT 354: Database System I. Lecture 2. Relational Model

CMPT 354: Database System I. Lecture 3. SQL Basics

CMPT 354: Database System I. Lecture 5. Relational Algebra

CMPT 354: Database System I. Lecture 7. Basics of Query Optimization

Simple queries Set operations Aggregate operators Null values Joins Query Optimization. John Edgar 2

CMPT 354: Database System I. Lecture 1. Course Introduction

EECS 647: Introduction to Database Systems

CMPT 354: Database System I. Lecture 11. Transaction Management

Querying Data with Transact SQL

SQL - Data Query language

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Announcements (September 14) SQL: Part I SQL. Creating and dropping tables. Basic queries: SFW statement. Example: reading a table

Database Languages. A DBMS provides two types of languages: Language for accessing & manipulating the data. Language for defining a database schema

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

SQL-Nested Queries & Aggregate functions. Lecture By Binu Jasim 02-Aug-2016

SELECT Product.name, Purchase.store FROM Product JOIN Purchase ON Product.name = Purchase.prodName

SQL QUERIES. CS121: Relational Databases Fall 2017 Lecture 5

Constraints. Primary Key Foreign Key General table constraints Domain constraints Assertions Triggers. John Edgar 2

SQL Aggregation and Grouping. Outline

SQL Aggregation and Grouping. Davood Rafiei

Querying Data with Transact-SQL

Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

A subquery is a nested query inserted inside a large query Generally occurs with select, from, where Also known as inner query or inner select,

CSC 261/461 Database Systems Lecture 5. Fall 2017

SQL: Structured Query Language Nested Queries

Querying Data with Transact-SQL

CSC 261/461 Database Systems Lecture 19

Announcements. Joins in SQL. Joins in SQL. (Inner) joins. Joins in SQL. Introduction to Database Systems CSE 414

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

Structured Query Language Continued. Rose-Hulman Institute of Technology Curt Clifton

INTERMEDIATE SQL GOING BEYOND THE SELECT. Created by Brian Duffey

Introduction to database design

Database Management and Tuning

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

SQL Part 2. Kathleen Durant PhD Northeastern University CS3200 Lesson 6

Database Applications (15-415)

Introduction to Data Management. Lecture 14 (SQL: the Saga Continues...)

CS122 Lecture 5 Winter Term,

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

COMP 244 DATABASE CONCEPTS & APPLICATIONS

Nested Queries. Dr Paolo Guagliardo. Aggregate results in WHERE The right way. Fall 2018

Automatically Synthesizing SQL Queries from Input-Output Examples

An Introduction to Structured Query Language

Lecture 6 - More SQL

An Introduction to Structured Query Language

SQL Data Manipulation Language. Lecture 5. Introduction to SQL language. Last updated: December 10, 2014

An Introduction to Structured Query Language

NULL. The special value NULL could mean: Unknown Unavailable Not Applicable

Database Systems CSE 303. Outline. Lecture 06: SQL. What is Sub-query? Sub-query in WHERE clause Subquery

An Introduction to Structured Query Language

SQL Data Querying and Views

SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

Introduction to Data Management CSE 344

Database Management Systems,

RELATIONAL ALGEBRA II. CS121: Relational Databases Fall 2017 Lecture 3

An Introduction to Structured Query Language

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

Structured Query Language (SQL)

SQL: Data Querying. B0B36DBS, BD6B36DBS: Database Systems. h p:// Lecture 4

Database Management Systems. Chapter 5

Querying Data with Transact-SQL (761)

Part I: Structured Data

COMP 430 Intro. to Database Systems

SQL Data Query Language

In This Lecture. Yet More SQL SELECT ORDER BY. SQL SELECT Overview. ORDER BY Example. ORDER BY Example. Yet more SQL

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

SQL - Lecture 3 (Aggregation, etc.)

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

CSC 343 Winter SQL: Aggregation, Joins, and Triggers MICHAEL LIUT

Missing Information. We ve assumed every tuple has a value for every attribute. But sometimes information is missing. Two common scenarios:

Advanced SQL GROUP BY Clause and Aggregate Functions Pg 1

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. General Overview - rel. model. Overview - detailed - SQL

Database Systems CSE 414

Multisets and Duplicates. SQL: Duplicate Semantics and NULL Values. How does this impact Queries?

Database Systems CSE 414

Slicing and Dicing Data in CF and SQL: Part 1

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

20461: Querying Microsoft SQL Server 2014 Databases

Querying Microsoft SQL Server 2014

Instructor: Craig Duckett. Lecture 03: Tuesday, April 3, 2018 SQL Sorting, Aggregates and Joining Tables

RDBMS- Day 4. Grouped results Relational algebra Joins Sub queries. In today s session we will discuss about the concept of sub queries.

Querying Data with Transact-SQL (20761)

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

Database Applications (15-415)

Teradata SQL Features Overview Version

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Advanced SQL Processing Prepared by Destiny Corporation

Querying Data with Transact-SQL

[AVNICF-MCSASQL2012]: NICF - Microsoft Certified Solutions Associate (MCSA): SQL Server 2012

Lecture 16. The Relational Model

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Lecture 8. Monday, September 29, 2014

Computing for Medicine (C4M) Seminar 3: Databases. Michelle Craig Associate Professor, Teaching Stream

COURSE OUTLINE MOC 20461: QUERYING MICROSOFT SQL SERVER 2014

QUERYING MICROSOFT SQL SERVER COURSE OUTLINE. Course: 20461C; Duration: 5 Days; Instructor-led

Relational Algebra for sets Introduction to relational algebra for bags

Databases (MariaDB/MySQL) CS401, Fall 2015

Ø Last Thursday: String manipulation & Regular Expressions Ø guest lecture from the amazing Sam Lau Ø reviewed in section and in future labs & HWs

20761 Querying Data with Transact SQL

Transcription:

CMPT 354: Database System I Lecture 4. SQL Advanced 1

Announcements! A1 is due today A2 is released (due in 2 weeks) 2

Outline Joins Inner Join Outer Join Aggregation Queries Simple Aggregations Group By Having Discussion 3

Joins: Recap Student name gpa Mary 3.8 Tom 3.6 Jack 3.7 Enroll stdname course Mary 354 Tom 354 Tom 454 Bob 354 SELECT name, course FROM Student, Enroll WHERE name = stdname name gpa Mary 354 Tom 354 Tom 454 4

Two equivalent ways to write joins SELECT name, course FROM Student, Enroll WHERE name = stdname SELECT name, course FROM Student JOIN Enroll ON name = stdname 5

Join Types SELECT name, course FROM Student INNER JOIN Enroll ON name = stdname SELECT name, course FROM Student FULL OUTER JOIN Enroll ON name = stdname SELECT name FROM Student LEFT OUTER JOIN Enroll ON name = stdname SELECT name FROM Student RIGHT OUTER JOIN Enroll ON name = stdname 6

Join Types SELECT name, course FROM Student INNER JOIN Enroll ON name = stdname SELECT name, course FROM Student FULL OUTER JOIN Enroll ON name = stdname SELECT name FROM Student LEFT OUTER JOIN Enroll ON name = stdname SELECT name FROM Student RIGHT OUTER JOIN Enroll ON name = stdname 7

Left Join Student name gpa Mary 3.8 Tom 3.6 Jack 3.7 Enroll stdname course Mary 354 Tom 354 Tom 454 Bob 354 SELECT name, course FROM Student JOIN LEFT JOIN Enroll Enroll ON ON name = stdname We want to include all students no matter whether they enroll a course or not. How? 8

SELECT name, course FROM Student LEFT JOIN Enroll ON name = stdname Student name gpa Mary 3.8 Tom 3.6 Jack 3.7 Enroll stdname course Mary 354 Tom 354 Tom 454 Bob 354 Output name course Mary 354 Tom 354 Tom 454 Jack NULL 9

SELECT name, course FROM Student RIGHT JOIN Enroll ON name = stdname Student name gpa Mary 3.8 Tom 3.6 Jack 3.7 Enroll stdname course Mary 354 Tom 354 Tom 454 Bob 354 Output name course Mary 354 Tom 354 Tom 454 NULL 354 10

SELECT name, course FROM Enroll FULL JOIN Student ON name = stdname Enroll stdname course Mary 354 Tom 354 Tom 454 Bob 354 Student name gpa Mary 3.8 Tom 3.6 Jack 3.7 Output name course Mary 354 Tom 354 Tom 454 Jack NULL NULL 354 11

Outer Join TableA (LEFT/RIGHT/FULL) JOIN TableB Left outer join: Include tuples from tablea even if no match Right outer join: Include tuples from tablea even if no match Full outer join: Include tuples from both even if no match 12

Exercise - 1 SELECT name, course FROM Student LEFT JOIN Enroll ON name = stdname AND course = 354 Student name gpa Mary 3.8 Tom 3.6 Jack 3.7 Enroll stdname course Mary 354 Tom 354 Tom 454 Bob 354 name course Mary 354 Tom 354 Jack NULL (A) name course Mary 354 Tom 354 (B) 13

Exercise - 2 SELECT name, course FROM Student LEFT JOIN Enroll ON name = stdname WHERE course = 354 name Student gpa Mary 3.8 Tom 3.6 Jack 3.7 stdname Enroll course Mary 354 Tom 354 Tom 454 Bob 354 name course Mary 354 Tom 354 Jack NULL (A) name course Mary 354 Tom 354 (B) 14

Outline Joins Inner Join Outer Join Aggregation Queries Simple Aggregations Group By Having Discussion 15

Simple Aggregation SELECT agg(column) FROM <table name> WHERE <conditions> agg = COUNT, SUM, AVG, MAX, MIN, etc. Except count, all aggregations apply to a single attribute 16

Examples name gender gpa Bob M 3 Mike M 3 Alice F 3 Mary F 4 Tom M 4 SELECT COUNT(*) FROM Student SELECT SUM(gpa) FROM Student SELECT AVG(gpa) FROM Student SELECT MIN(gpa) FROM Student 5 17 3.4 3 SELECT MAX(gpa) FROM Student 4 17

Examples name gender gpa Bob M 3 Mike M 3 Alice F 3 Mary F 4 Tom M 4 SELECT COUNT(DISTINCT gpa) FROM Student 2 SELECT SUM(DISTINCT gpa) FROM Student 7 SELECT AVG(gpa) FROM Student WHERE gender = F 3.5 18

The need for Group By How to get AVG(gpa) for each gender? SELECT AVG(gpa) FROM Student WHERE gender = M SELECT AVG(gpa) FROM Student WHERE gender = F How to get AVG(gpa) for each age? SELECT AVG(gpa) FROM Student WHERE age = 18 SELECT AVG(gpa) FROM Student WHERE age = 19 SELECT AVG(gpa) FROM Student WHERE age = 20. 19

Grouping and Aggregation SELECT agg(column) FROM <table name> WHERE <conditions> GROUP BY <columns> How to get AVG(gpa) for each gender? SELECT AVG(gpa) FROM Student GROUP BY gender How to get AVG(gpa) for each age? SELECT AVG(gpa) FROM Student GROUP BY age 20

Grouping and Aggregation How is the following query processed? SELECT gender, AVG(gpa) FROM Student WHERE gpa > 2.5 GROUP BY gender Semantics of the query 1. Compute the FROM and WHERE clauses 2. Group by the attributes in the GROUP BY 3. Compute the SELECT clause: grouped attributes and aggregates 21

1. Compute the FROM and WHERE clauses SELECT gender, AVG(gpa) FROM Student WHERE gpa > 2.5 GROUP BY gender name gender gpa Bob M 2 Mike M 3 Alice F 3 Mary F 4 Tom M 3 name gender gpa Mike M 3 Alice F 3 Mary F 4 Tom M 3 22

2. Group by the attributes in the GROUP BY SELECT gender, AVG(gpa) FROM Student WHERE gpa > 2.5 GROUP BY gender name gender gpa Mike M 3 Alice F 3 Mary F 4 Tom M 3 gender name gpa M Mike 3 Tom 3 F Alice 3 Mary 4 23

3. Compute the SELECT clause: grouped attributes and aggregates SELECT gender, AVG(gpa) FROM Student WHERE gpa > 2.5 GROUP BY gender gender name gpa M Mike 3 Tom 3 F Alice 3 Mary 4 gender AVG(gpa) M 3 F 3.5 24

Exercise: Empty Group name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa) FROM Student WHERE gpa > 3.5 GROUP BY gender gender AVG(gpa) F 4 VS gender AVG(gpa) F 4 M NULL (A) (B) 25

Exercise: Empty Group name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa) FROM Student WHERE gpa > 3.5 GROUP BY gender name gender gpa Alice F 4 Mary F 4 26

Exercise: Empty Group name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa) FROM Student WHERE gpa > 3.5 GROUP BY gender name gender gpa Alice F 4 Mary F 4 gender name gpa Alice 4 F Mary 4 27

Exercise: Empty Group name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa) FROM Student WHERE gpa > 3.5 GROUP BY gender name gender gpa Alice F 4 Mary F 4 gender name gpa Alice 4 F Mary 4 gender AVG(gpa) F 4 28

Exercise: Invalid Selection name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa), name FROM Student WHERE gpa > 3.5 GROUP BY gender gender AVG(gpa) name F 4 Alice VS gender AVG(gpa) name F 4 Mary (A) (B) 29

Exercise: Invalid Selection name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa), name FROM Student WHERE gpa > 3.5 GROUP BY gender name gender gpa Alice F 4 Mary F 4 30

Exercise: Invalid Selection name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa), name FROM Student WHERE gpa > 3.5 GROUP BY gender name gender gpa Alice F 4 Mary F 4 gender name gpa Alice 4 F Mary 4 31

Exercise: Invalid Selection Everything in SELECT must be either a GROUP-BY attribute, or an aggregate name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa), name FROM Student WHERE gpa > 3.5 GROUP BY gender name gender gpa Alice F 4 Mary F 4 gender name gpa Alice 4 F Mary 4 gender AVG(gpa) name F 4??? 32

HAVING Clause Specify which groups you are interested in SELECT agg(column) FROM <table name> WHERE <conditions> GROUP BY <columns> HAVING <columns> 33

HAVING Clause Same query as before, except that we require each group has more than 10 students SELECT AVG(gpa), gender FROM Student WHERE gpa > 2.5 GROUP BY gender HAVING COUNT(*) > 10 HAVING clause contains conditions on aggregates. 34

Order of Evaluation SELECT S FROM R 1,,R n WHERE C 1 GROUP BY a 1,,a k HAVING C 2 Create the cross product of the tables in the FROM clause Remove rows not meeting the WHERE condition Divide records into groups by the GROUP BY clause Remove groups not meeting the HAVING clause Create one row for each group and remove columns not in the SELECT clause 35

Exercise StudentInfo name gender gpa Bob M 3 Mike M 3 Alice F 4 Mary F 4 Tom M 3 SELECT gender, AVG(gpa) FROM StudentInfo WHERE gpa > 2.5 GROUP BY gender HAVING COUNT(*) > 2 SELECT gender, AVG(gpa) FROM StudentInfo WHERE gpa > 2.5 GROUP BY gender HAVING SUM(gpa) < 9 gender AVG(gpa) gender AVG(gpa) gender AVG(gpa) M 3 F 4 M 3 F 4 (A) (B) (C) 36

Imagine you are a data scientist at a Bank 37

Computer Science vs. Data Science What When Who Goal Computer Science 1950- Software Engineer Write software to make computers work Plan à Design à Develop à Test à Deploy à Maintain What When Who Goal Data Science 2010- Data Scientist Extract insights from data to answer questions Collect à Clean à Integrate à Analyze à Visualize à Communicate 38

Discussion Q1. Who is the richest customer? Q2. Which customers have ONLY one account? 39

Discussion Q3. Howmany customers does each branch have? Q4. Which branch have a higher pay? 40

Outline Joins Inner Join Outer Join Self Join Aggregation Queries Simple Aggregations Group By Having Subqueries In the FROM clause In the WHERE clause 41

Subqueries A subquery is a SQL query nested inside a larger query Such inner-outer queries are called nested queries SELECT C.customerID, C.birthDate, C.income FROM Customer C WHERE C.customerID IN ( SELECT O.customerID FROM Account A, Owns A WHERE A.accNumber = O.accNumber AND A.branchName = 'Lonsdale ) Outer Query Inner Query 42

Subqueries Subqueries may appear in A FROM clause, A WHERE clause, and A HAVING clause SELECT <columns> FROM <table name> WHERE <conditions> GROUP BY <columns> HAVING <columns> 43

Subqueries in FROM Sometimes we need to compute an intermediate table only to use it later in a SELECT-FROM-WHERE Who is the richest customer? SELECT firstname, lastname, MAX(sumBalance) FROM (SELECT firstname, lastname, sum(balance) AS sumbalance FROM Customer C, Account A, Owns O WHERE C.customerID = O.customerID AND O.accNumber = A.accNumber GROUP BY C.customerID ) 44

Subqueries in FROM Sometimes we need to compute an intermediate table only to use it later in a SELECT-FROM-WHERE Which customers have a total balance equal to 0? SELECT firstname, lastname, sumbalance FROM (SELECT firstname, lastname, sum(balance) AS sumbalance FROM Customer C, Account A, Owns O WHERE C.customerID = O.customerID AND O.accNumber = A.accNumber GROUP BY C.customerID) AS T WHERE T. sumbalance = 0 45

Subqueries in FROM Sometimes we need to compute an intermediate table only to use it later in a SELECT-FROM-WHERE Which customers have a total balance equal to 0? SELECT firstname, lastname, sum(balance) AS sumbalance FROM Customer C, Account A, Owns O WHERE C.customerID = O.customerID AND O.accNumber = A.accNumber GROUP BY C.customerID HAVING sumbalance = 0 Rule of thumb: avoid nested queries when possible 46

Subqueries in WHERE Subqueries return a single constant >, <, =, <>, >=, <= Find the customerids of customers whose income is larger than avg(income) SELECT C.customerID FROM Customer C1 WHERE C1.income > (SELECT avg(c2.income) FROM Customer C2) 47

Subqueries in WHERE Subqueries return a relation IN NOT IN EXISTS NOT EXISTS ANY ALL 48

Accounts IN Burnaby Find the customerids of customers with an account at the Burnaby branch SELECT C.customerID FROM Customer C WHERE C.customerID IN (SELECT O.customerID FROM Account A, Owns O WHERE A.accNumber = O.accNumber AND A.branchName = Burnaby ) 49

Accounts NOT IN Burnaby Find the customerids of customers who do not have an account at the Burnaby branch SELECT C.customerID FROM Customer C WHERE C.customerID NOT IN( SELECT O.customerID FROM Account A, Owns O WHERE A.accNumber = O.accNumber AND A.branchName = Burnaby ) 50

Uncorrelated Queries The query shown previously contains an uncorrelated, or independent, sub-query The sub-query does not contain references to attributes of the outer query An independent sub-query can be evaluated before evaluation of the outer query And needs to be evaluated only once The sub-query result can be checked for each row of the outer query The cost is the cost for performing the sub-query (once) and the cost of scanning the outer relation 51

EXISTing BurnabyAccounts Find the customerids of customers with an account at the Burnaby branch SELECT C.customerID FROMCustomer C WHERE EXISTS ( SELECT * FROM Account A, Owns O WHERE C.customerID = O.customerID AND A.accNumber = O.accNumber AND A.branchName = Burnaby ) EXISTS and NOT EXISTS test whether the associated sub-query is non-empty or empty 52

Correlated Queries The previous query contained a correlated subquery With references to attributes of the outer query WHERE C.customerID = O.customerID It is evaluated once for each row in the outer query i.e. for each row in the Customer table Correlated queries are often inefficient 53

EXISTing BurnabyAccounts Find the customerids of customers with an account at the Burnaby branch SELECT DISTINCT C.customerID FROM Customer C, Account A, Owns O WHERE C.customerID = A.accNumber AND A.accNumber = O.customerID AND A.branchName = Burnaby 54

Have an account in all branches Find the customerids of customers who have an account in all branches SQ1 A list of all branch names EXCEPT SQ2 A list of branch names that a customer has an account at If the customer has an account at every branch then this result is empty 55

Have an account in all branches Putting it all together we have SELECT C.customerID FROM Customer C WHERE NOT EXISTS ( (SELECT B.branchName FROM Branch B) EXCEPT (SELECT A.branchName FROM Account A, Owns O WHERE O.customerID = C.customerID AND O.accNumber = A.accNumber)) 56

ANYone Richer Than Bruce Find the customerids of customers who earn more than some customer called Bruce SELECT C.customerID FROM Customer C WHERE C.income > ANY (SELECT Bruce.income FROM Customer Bruce WHERE Bruce.firstName = 'Bruce') Customers in the result table must have incomes greater than at least one of the rows in the sub-query result 57

Richer Than ALL the Bruces Find the customerids of customers who earn more than all customer called Bruce SELECT C.customerID FROM Customer C WHERE C.income > ALL (SELECT Bruce.income FROM Customer Bruce WHERE Bruce.firstName = 'Bruce') If there were no customers called Bruce this query would return all customers 58

Summary Selection Projection Set Operators (UNION, INTERSECT, EXCEPT) Joins (INNER, OUTER) Aggregation Group By Having Order By Distinct Subqueries 59 SQL operators can be composed just like building LEGO buildings

Acknowledge Some lecture slides were copied from or inspired by the following course materials W4111:Introduction to databases by Eugene Wu at Columbia University CSE344: Introduction to Data Management by Dan Suciu at University of Washington CMPT354: Database System I by John Edgar at Simon Fraser University CS186: Introduction to Database Systems by Joe Hellerstein at UC Berkeley CS145: Introduction to Databases by Peter Bailis at Stanford CS 348: Introduction to Database Management by Grant Weddell at University of Waterloo 60