More on SQL Nested Queries Aggregate operators and Nulls

Similar documents
SQL: Queries, Constraints, Triggers

SQL: Queries, Programming, Triggers

SQL. Chapter 5 FROM WHERE

SQL: Queries, Programming, Triggers. Basic SQL Query. Conceptual Evaluation Strategy. Example of Conceptual Evaluation. A Note on Range Variables

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

CIS 330: Applied Database Systems

Database Applications (15-415)

Basic form of SQL Queries

SQL - Lecture 3 (Aggregation, etc.)

Database Management Systems. Chapter 5

Enterprise Database Systems

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

SQL Part 2. Kathleen Durant PhD Northeastern University CS3200 Lesson 6

Introduction to Data Management. Lecture 14 (SQL: the Saga Continues...)

The SQL Query Language. Creating Relations in SQL. Adding and Deleting Tuples. Destroying and Alterating Relations. Conceptual Evaluation Strategy

Database Systems. October 12, 2011 Lecture #5. Possessed Hands (U. of Tokyo)

Keys, SQL, and Views CMPSCI 645

Database Systems ( 資料庫系統 )

CSEN 501 CSEN501 - Databases I

Introduction to Data Management. Lecture #14 (Relational Languages IV)

QUERIES, PROGRAMMING, TRIGGERS

SQL: Queries, Constraints, Triggers (iii)

SQL: Queries, Programming, Triggers

Introduction to Data Management. Lecture #13 (Relational Calculus, Continued) It s time for another installment of...

The Database Language SQL (i)

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL and Rela2onal Algebra- - - Part 1

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Database Applications (15-415)

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

CompSci 516 Data Intensive Computing Systems

Implementation of Relational Operations

This lecture. Step 1

Databases - Relational Algebra. (GF Royle, N Spadaccini ) Databases - Relational Algebra 1 / 24

This lecture. Projection. Relational Algebra. Suppose we have a relation

Evaluation of relational operations

SQL: The Query Language Part 1. Relational Query Languages

Review: Where have we been?

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Evaluation of Relational Operations. Relational Operations

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

COMP 244 DATABASE CONCEPTS & APPLICATIONS

Midterm Exam #2 (Version A) CS 122A Winter 2017

SQL: Data Manipulation Language. csc343, Introduction to Databases Diane Horton Winter 2017

CS330. Query Processing

SQL: csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Sina Meraji. Winter 2018

Lecture 6 - More SQL

Chapter 3: Introduction to SQL. Chapter 3: Introduction to SQL

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Processing

Experimenting with bags (tables and query answers with duplicate rows):

CS 582 Database Management Systems II

Rewrite INTERSECT using IN

Announcements (September 14) SQL: Part I SQL. Creating and dropping tables. Basic queries: SFW statement. Example: reading a table

SQL Data Manipulation Language. Lecture 5. Introduction to SQL language. Last updated: December 10, 2014

Relational Query Optimization. Highlights of System R Optimizer

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #4: Subqueries in SQL

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Database Applications (15-415)

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

Introduction to Data Management. Lecture #10 (Relational Calculus, Continued)

Principles of Data Management. Lecture #12 (Query Optimization I)

The SQL Query Language. Creating Relations in SQL. Referential Integrity in SQL

Overview of Query Evaluation

Relational Calculus. Chapter Comp 521 Files and Databases Fall

Introduction SQL DRL. Parts of SQL. SQL: Structured Query Language Previous name was SEQUEL Standardized query language for relational DBMS:

Database Management Systems. Chapter 5

Evaluation of Relational Operations

Evaluation of Relational Operations

SQL - Data Query language

Database Applications (15-415)

Data Science 100. Databases Part 2 (The SQL) Slides by: Joseph E. Gonzalez & Joseph Hellerstein,

CMP-3440 Database Systems

Chapter 3: Introduction to SQL

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

SQL Lab Query 5. More SQL Aggregate Queries, Null Values. SQL Lab Query 8. SQL Lab Query 5. SQL Lab Query 8. SQL Lab Query 9

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

Database Management Systems. Chapter 5

Missing Information. We ve assumed every tuple has a value for every attribute. But sometimes information is missing. Two common scenarios:

Querying Data with Transact SQL

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

ECS 165B: Database System Implementa6on Lecture 7

HW2 out! Administrative Notes

Lesson 2. Data Manipulation Language

The SQL data-definition language (DDL) allows defining :

Relational Query Languages

SQL. Lecture 4 SQL. Basic Structure. The select Clause. The select Clause (Cont.) The select Clause (Cont.) Basic Structure.

SQL: Data Querying. B0B36DBS, BD6B36DBS: Database Systems. h p:// Lecture 4

II. Structured Query Language (SQL)

Relational Algebra 1

Relational Query Optimization

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

15-415/615 Faloutsos 1

WHAT IS SQL. Database query language, which can also: Define structure of data Modify data Specify security constraints

INDEX. 1 Basic SQL Statements. 2 Restricting and Sorting Data. 3 Single Row Functions. 4 Displaying data from multiple tables

SQL Data Querying and Views

Transcription:

Today s Lecture More on SQL Nested Queries Aggregate operators and Nulls Winter 2003 R ecom m en ded R eadi n g s Chapter 5 Section 5.4-5.6 http://philip.greenspun.com/sql/ Simple queries, more complex queries Winter 2003 2

B asi c f orm of S Q L We have already seen the basic form of SQL: SELECT FROM [WHERE [DISTINCT] target-list relation-list qualification] relation-list: A list of relation names (possibly with a range-variable after each name). E.g., FROM Sailors s, Reserves r target-list: A list of attributes of relations in relation-list. * can be used to denote all attributes. E.g., SELECT s.sname, SELECT * qualification: Comparisons (Attr op const or Attr op Attr2, where op is one of <, >, =,,, combined using AND, OR and NOT. DISTINCT (optional) keyword indicates that the answer should not contain duplicates. Default is that duplicates are not eliminated! Winter 2003 3 S em an ti cs of th e S Q L q uery Informally: Result: Compute the cartesian product of all relations in the relation list Result2: Apply the conditions in qualification list to eliminate tuples from Result that do not satisfy the conditions Result3: Retrieve only the required components from each tuple in Result2 according to target list Result: Eliminate duplicate rows in Result3 if DISTINCT keyword is present in the SELECT clause Winter 2003 4 2

S Q L an d R el ati on al A l g eb ra Indeed, the procedure to arrive at the answer of an SQL query can be expressed in relational algebra SELECT DISTINCT A,, A n FROM R,, R m WHERE condition π A,, An (σ condition (R x x R m )) Winter 2003 5 E x am p l e D atab ase Activities sid 2336 2336 2336 88 88 2334 sid 2336 Customers 2334 88 2339 slope-id day s3 0/05/03 s 0/06/03 s 0/0/03 s2 0/0/03 s 0/0/03 s2 0/05/03 cname level type age Cho Beginner snowboard 8 Luke Inter snowboard 25 Ice Advanced ski 20 Paul Beginner ski 33 Slopes slope-id name color s Mountain Run blue s2 Olympic Lady black s3 Magic Carpet green s4 KT-22 black Winter 2003 6 3

Expressions and Strings in the SELECT c l au se General form of SELECT clause SELECT expr AS col-name,, expr n AS col-name n Expr is any arithmetic or string expressions over column names and constants Col-name is a new name for the corresponding output column according to expr Winter 2003 E x am p l es SELECT sid, cname, age+ FROM Customers SELECT sid AS id, cname AS name, (age*00 sid)/33 AS shoe-size FROM Customers Winter 2003 8 4

S tri n g m atch i n g Support for string matching through LIKE clause % denotes zero or more arbitrary characters _ denotes exactly one arbitrary character Examples abc LIKE a % is true abc LIKE %%_c is true abc LIKE a%b%_c is false Find the ids of all customers whose name ends with the letter e SELECT sid FROM Customers WHERE name LIKE %e Winter 2003 9 N ested Q ueri es A nested query is a query with another query embedded in it Embedded query = subquery A subquery can in turn have subqueries Nested queries are convenient for computing results that depend on some intermediate results that need to be computed In SQL, a subquery can appear in the FROM or WHERE or HAVING clause Winter 2003 0 5

E x am p l es Find the names of customers who went on the slope on 0/0/03 SELECT cname, Activities a WHERE a.day= 0/0/03 AND a.slope-id = c.sid SELECT cname WHERE c.sid IN (SELECT a.sid FROM Activities a WHERE a.day= 0/0/03 ) Subquery: computes the ids of all customers who skied on 0/0/03 Winter 2003 E x am p l es Find the names of customers who did not go on the slope on 0/0/03 SELECT cname WHERE c.sid NOT IN (SELECT a.sid FROM Activities a WHERE a.day= 0/0/03 ) Subquery computes the ids of all customers who skied on 0/0/03 Winter 2003 2 6

M N Q u W H c l eaning of ested ery in ER E au se Result: Compute the cartesian product of all relations in the relation list in outer query Result2: For every tuple in Result, apply the condition stated in the qualification list. ((Re)compute the subquery in order to apply the condition in the qualification list) Eliminate the tuple if the condition is not satisfied. Result3: Retrieve only the required components from each tuple in Result2 according to target list Result: Eliminate duplicate rows in Result3 if DISTINCT keyword is present in the SELECT clause Winter 2003 3 C orrel ated N ested Q ueri es In all the examples so far, the inner query is independent of the outer query The inner query could also depend on the outer query Example: Find the names of customers who have been on the slope on 0/0/03 Correlation via range variable c SELECT cname WHERE EXISTS (SELECT * FROM Activities a WHERE a.sid = c.sid a.day= 0/0/03 ) Checks that the subquery returns a non-empty result Winter 2003 4

A n oth er ex am p l e Find all customers who went on the slope Olympic Lady SELECT sid WHERE EXISTS (SELECT a.sid FROM Activities a WHERE a.sid = c.sid AND a.slope-id IN (SELECT slope-id FROM Slopes s WHERE s.name= Olympic Lady )) Winter 2003 5 S et com p ari son op erators IN, EXISTS, UNIQUE, op ANY (or op SOME), op ALL where op is one of {<,, =, <>,, >} x IN Q Returns true if x occurs in the set Q x NOT IN Q Returns true if x does not occur in Q EXISTS Q Returns true if Q is a non-empty set NOT EXISTS Q Returns true if Q is empty What does the following mean? NOT EXISTS (B EXCEPT A) Winter 2003 6 8

S et com p ari son op erators UNIQUE Q Returns true if every row in Q does not appear more than once If Q is empty? NOT UNIQUE Q Returns true if there exists a row in Q that appears more than once Find the names of all customers who went to the slopes at most once SELECT cname WHERE UNIQUE (SELECT c.sid FROM Activities a WHERE c.sid = a.sid) Answer: Paul and Luke Winter 2003 S et com p ari son op erators op ANY (or op SOME), op ALL where op is one of {<,, =, <>,, >} x op SOME Q Returns true if there exists y in Q such that x op y is true Existential quantification x op ALL Q Returns true if for every y in Q, x op y is true Universal quantification Examples: Find the names of all customers who went on some slopes SELECT c.cname WHERE c.sid = SOME (SELECT sid FROM Activities a) Winter 2003 8 9

M ore ex am p l es Find the names of all skiers whose age is greater than every snowboarder SELECT c.name WHERE c.type= skier c.age >ALL (SELECT c.age WHERE c.type = snowboard ) Find the name of the oldest customers SELECT cname FROM Customer c WHERE c.age >= ALL (SELECT c.age ) Winter 2003 9 A g g reg ates Compute summary results over a table. E.g., find the average/min/max score of all students who took CS80 find the total number of snowboarders find total salary of employees in Sales department SQL: COUNT, SUM, AVG, MIN, MAX Winter 2003 20 0

Find the total number of customers SELECT COUNT(sid) FROM Customers C O U N T Find the total number of days of operation SELECT COUNT(distinct(day)) FROM Activities Answer: 3 SELECT COUNT(day) FROM Activities Answer: 6 Alternatively, the last query could have been written as SELECT COUNT(*) FROM Activities Winter 2003 2 S U M, A V G SUM(_) and SUM(DISTINCT(_)) have similar obvious meanings Find the total profit of the company SELECT SUM(qty*price) FROM Sales Similarly for AVG(_) and AVG(DISTINCT(_)) Winter 2003 22

M I N, M A X Find the name and age of the oldest snowboarder SELECT cname, MAX(age) WHERE c.type= snowboard WRONG! The combination of aggregate and non-aggregate output for the SELECT clause is not allowed unless the query also contains a GROUP BY clause Winter 2003 23 M I N, M A X Find the name and age of the oldest snowboarder SELECT cname, age WHERE age = (SELECT MAX(age) FROM Customers) SQL allows this even though the query, rightfully, does not type-check! Winter 2003 24 2

O n a si m i l ar n ote Find the activities of Luke SELECT * FROM Activities a WHERE a.sid = (SELECT sid WHERE cname= Luke ) If there is only one Luke in the Customers table, the subquery returns only one sid value. SQL returns that single sid value to be compared with a.sid However, if the subquery returns more than one value, a run-time error occurs Winter 2003 25 M ore E x am p l es Find the names of all skiers whose age is greater than every snowboarder SELECT c.name Previously WHERE c.age >ALL (SELECT c.age WHERE c.type = snowboard ) SELECT c.name WHERE c.age > (SELECT MAX(c.age) WHERE c.type = snowboard ) Winter 2003 26 3

O M ore E x am p l es Find the names of all skiers whose age is greater than some snowboarder SELECT c.name WHERE c.age >SOME (SELECT c.age WHERE c.type = snowboard ) SELECT c.name WHERE c.age > (SELECT MIN(c.age) WHERE c.type = snowboard ) Winter 2003 2 R D E R B Y SELECT FROM WHERE ORDER BY c,, c n Sorts the result according to columns c,, c n By default, it sorts the result in ascending order Specify descending through ORDER BY c,, c n DESC Winter 2003 28 4

G R O U P B Y an d H A V I N G So far the aggregate operations have been applied to all qualifying tuples We also want to apply aggregate operations to different groups of tuples. E.g., find the total regional sales Find the average age of customers in each category Winter 2003 29 G R O U P B Y an d H A V I N G SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification Find the average age of customers in each category SELECT type, AVG(age) AS avg-age FROM Customers GROUP BY type type snowboard ski Avg-age 22 2 Winter 2003 30 5

G R O U P B Y an d H A V I N G SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification More specifically, the target list consists of a list of column names and a list of terms having the form agg(column-name) AS new-name Every column in the list of column names must appear in grouping-list The expressions appearing in group-qualification must have a single value per group. HAVING states the condition that the resulting tuple (for a group) must satisfy before it can be emitted. If GROUP BY is omitted, the entire table is regarded as a group Winter 2003 3 S em an ti cs Compute the cartesian product of all relations in the relation list in outer query For every tuple in Result, apply the condition stated in the qualification list. ((Re)compute the subquery in order to apply the condition in the qualification list) Eliminate the tuple if the condition is not satisfied. Keep only the required columns: columns in the SELECT clause, GROUP BY, and HAVING clauses are kept Sort the table according to the GROUP BY columns Apply group-qualification to each group (according to the GROUP BY columns). Eliminate groups which do not satisfy the groupqualification Generate one tuple for each group according to SELECT clause as before Winter 2003 32 6

E x am p l e Find the age of the youngest sailor with age 8, for each rating with at least 2 such sailors sid 22 sname Dustin rating age 45.0 SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > 3 64 92 38 Lubber Zorba Horatio Frodo Sam 8 0 55.5 6.0 28.0 30.0 29 Brutus 33.0 58 Rusty 0 Winter 2003 33 E x am p l e Take the cross product of all relations in the FROM clause SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > sid 22 3 64 92 38 29 58 sname Dustin Lubber Zorba Horatio Frodo Sam Brutus Rusty rating 8 0 0 age 45.0 55.5 6.0 28.0 30.0 33.0 Winter 2003 34

E x am p l e Apply the condition in the WHERE clause to every tuple SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > sid 22 3 64 92 38 29 58 sname Dustin Lubber Zorba Horatio Frodo Sam Brutus Rusty rating 8 0 0 age 45.0 55.5 6.0 28.0 30.0 33.0 Winter 2003 35 E x am p l e Keep only the required columns: columns in the SELECT clause, GROUP BY, and HAVING clauses are kept SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > sid 22 3 64 92 38 29 58 sname Dustin Lubber Horatio Frodo Sam Brutus Rusty rating 8 0 age 45.0 55.5 28.0 30.0 33.0 Winter 2003 36 8

E x am p l e rating Sort the table according to the GROUP BY columns 28.0 SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > 30.0 Winter 2003 3 8 0 8 0 age 45.0 55.5 33.0 rating age 28.0 30.0 33.0 45.0 55.5 E x am p l e Apply group-qualification to each group according to the GROUP BY columns. Eliminate groups which do not satisfy the group-qualification SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > rating 8 0 age 28.0 30.0 33.0 45.0 55.5 Winter 2003 38 9

E x am p l e Generate one tuple for each group according to SELECT clause rating age 28.0 SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > Winter 2003 39 E V E R Y an d A N Y i n H A V I N G SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > AND EVERY (S.age 40) rating age 28.0 30.0 rating age 33.0 28.0 45.0 8 55.5 0 Winter 2003 40 20

N E V E R Y an d A N Y i n H A V I N G SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 8 GROUP BY S.rating HAVING COUNT (*) > AND SOME (S.age > 40) rating age 28.0 30.0 rating age 33.0 45.0 8 55.5 0 Winter 2003 4 ul l V al ues The value of an attribute can be unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse). SQL provides a special value null for such situations. The presence of null complicates many issues. E.g.: Special operators needed to check if value is/is not null (IS NULL, IS NOT NULL in SQL). Is rating>8 true or false when rating is equal to null? What about AND, OR and NOT connectives? 3- valued logic (true, false and unknown). Meaning of constructs must be defined carefully. (e.g., WHERE clause eliminates rows that don t evaluate to true.) Winter 2003 42 2

O uter J oi n A variant of the join that relies on null values: SELECT c.sid, a.slope-id NATURAL LEFT OUTER JOIN Activities a Tuples of Customers table that do not match some tuple in Activities table would normally be excluded from the result; the left outer join preserves them with null values for the missing slope-id attribute. RIGHT OUTER JOIN FULL OUTER JOIN sid 2336 2336 2336 88 88 2334 2339 slope-id s3 s s s2 s s2 NULL Winter 2003 43 S um m ary Note that aggregation, GROUP BY cannot be expressed in relational algebra SQL is relationally complete: all of the operators of the relational algebra can be simulated in SQL. Additional features: string comparisons, set membership, arithmetic and grouping, nested queries. Winter 2003 44 22