More SQL Aggregate Queries, Null Values

Slides based on Database Management Systems, 3rd ed., Ramakrishnan and Gehrke
CS430/630 Lecture 8

SQL Lab Query 5

Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, name, color)

For each boat color, list the number of reservations for those boats.

Analysis:
- Number of reservations: an aggregate, the number of rows of reserves, so count(*).
- "For each ... number of ...": need groups, so GROUP BY.
- Involves boats and reserves: need a join.

    select b.color, count(*)
    from reserves r, boats b
    where r.bid = b.bid
    group by b.color

SQL Lab Query 8

Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, name, color)

For each boat color, list the number of reservations for those boats, but only if that number exceeds 2.

    select b.color, count(*)
    from reserves r, boats b
    where r.bid = b.bid
    group by b.color
    having count(*) > 2   -- qualification on group

SQL Lab Query 8

suppliers(sid, sname, address)
parts(pid, pname, color)
catalog(sid, pid, cost)

Find the number of parts each supplier provides. List sid, sname, and count of parts.

Analysis: "each supplier" can be reworded to "For each supplier", so we need GROUP BY. Consider a supplier. The rows in catalog for that sid show all the parts that supplier provides, so we want count(*) for the group. We need to join to suppliers to pull in the sname.

    select s.sid, s.sname, count(*)
    from suppliers s, catalog c
    where s.sid = c.sid
    group by s.sid, s.sname

SQL Lab Query 9

suppliers(sid, sname, address)
parts(pid, pname, color)
catalog(sid, pid, cost)

Find the name of the cheapest supplier of part 4.

Remember: to use min, we need a select min(...) query to run the loop that calculates the min.

Analysis: Consider part 4. The rows in catalog for that pid show all the suppliers that part has, and the cost. The min cost of these is

    select min(c1.cost)
    from catalog c1
    where c1.pid = 4

We just need to find the suppliers of part 4 with this specific cost, and we need to join to suppliers to pull in the sname.

    select s.sname
    from suppliers s, catalog c
    where s.sid = c.sid and c.pid = 4
      and c.cost = (select min(c1.cost)
                    from catalog c1
                    where c1.pid = 4)
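The cheapest-supplier query can be tried out with a small script. This is only a sketch: it assumes SQLite (through Python's sqlite3 module) rather than the Oracle or MySQL servers used in the lab, and the supplier names and costs are made-up sample data. It also shows that the query returns every supplier tied at the minimum cost.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the lab data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table suppliers(sid integer primary key, sname text, address text);
    create table catalog(sid integer, pid integer, cost real);
    insert into suppliers values
        (1, 'Acme', 'A St'), (2, 'Best', 'B St'), (3, 'Cheap', 'C St');
    insert into catalog values
        (1, 4, 20.5), (2, 4, 11.7), (3, 4, 11.7), (3, 5, 0.55);
""")

# The subquery computes the min cost for part 4; the outer query finds
# the suppliers of part 4 that charge exactly that cost.
cheapest = [row[0] for row in conn.execute("""
    select s.sname
    from suppliers s, catalog c
    where s.sid = c.sid and c.pid = 4
      and c.cost = (select min(c1.cost) from catalog c1 where c1.pid = 4)
    order by s.sname
""")]
print(cheapest)
```

With this sample data two suppliers tie at the minimum cost for part 4, so both names come back.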

Aircraft Example 1

Find the minimum salary of employees certified for some airplane with cruisingrange > 2000.

    SELECT MIN(e.salary)
    FROM employees e, certified c, aircraft a
    WHERE e.eid = c.eid AND c.aid = a.aid
      AND a.cruisingrange > 2000;

Aircraft Example 2

Find the average salary of employees certified for some airplane with cruisingrange > 2000.

    SELECT AVG(e.salary)
    FROM employees e, certified c, aircraft a
    WHERE e.eid = c.eid AND c.aid = a.aid
      AND a.cruisingrange > 2000;

But this is the average over the certifications, and one employee can have multiple 2000+ certifications, so this is not a proper average over employees. We saw this problem before with ages of students enrolled in courses.

Aircraft Example 2, fixed

Find the average salary of employees certified for some airplane with cruisingrange > 2000.

    SELECT AVG(e.salary)
    FROM employees e
    WHERE e.eid IN (SELECT c.eid
                    FROM certified c, aircraft a
                    WHERE c.aid = a.aid
                      AND a.cruisingrange > 2000);
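The difference between averaging over certifications and averaging over employees is easy to see on a tiny instance. The sketch below assumes SQLite via Python's sqlite3 module, assumes the join conditions e.eid = c.eid and c.aid = a.aid, and uses invented salaries and cruising ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees(eid integer primary key, ename text, salary integer);
    create table aircraft(aid integer primary key, aname text, cruisingrange integer);
    create table certified(eid integer, aid integer);
    -- hypothetical sample data
    insert into employees values (1, 'ann', 100000), (2, 'bob', 50000), (3, 'cy', 20000);
    insert into aircraft values (10, 'long-a', 3000), (11, 'long-b', 5000), (12, 'short', 1000);
    -- ann holds two 2000+ certifications, bob one, cy none
    insert into certified values (1, 10), (1, 11), (2, 10), (3, 12);
""")

# Average over the join: ann's salary is counted once per certification.
over_certifications = conn.execute("""
    select avg(e.salary)
    from employees e, certified c, aircraft a
    where e.eid = c.eid and c.aid = a.aid and a.cruisingrange > 2000
""").fetchone()[0]

# Average over employees: IN keeps each qualifying employee exactly once.
over_employees = conn.execute("""
    select avg(e.salary)
    from employees e
    where e.eid in (select c.eid
                    from certified c, aircraft a
                    where c.aid = a.aid and a.cruisingrange > 2000)
""").fetchone()[0]

print(over_certifications, over_employees)
```

Here the join-based average is pulled upward because the highest-paid employee appears in two joined rows, while the IN version averages over the two qualifying employees only.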

Aircraft Example 3

For each aircraft name, find the minimum salary of employees certified for that aircraft.

    SELECT a.aname, MIN(e.salary)
    FROM employees e, certified c, aircraft a
    WHERE e.eid = c.eid AND c.aid = a.aid
    GROUP BY a.aname

Aircraft Example 4

For each aircraft aid, find the average salary of employees certified for that aircraft.

    SELECT a.aid, AVG(e.salary)
    FROM employees e, certified c, aircraft a
    WHERE e.eid = c.eid AND c.aid = a.aid
    GROUP BY a.aid

Here there is no problem with AVG: a particular aircraft is connected only once per employee through the certified relationship.

Aircraft Example 5

For each aircraft aid, find the count of employees certified for that aircraft.

    SELECT a.aid, count(*)
    FROM employees e, certified c, aircraft a
    WHERE e.eid = c.eid AND c.aid = a.aid
    GROUP BY a.aid

Aircraft Example 6

Find aircraft by aid that have fewer than 3 employees certified for that airplane. In these cases, report the number of such employees.

    SELECT a.aid, count(*)
    FROM employees e, certified c, aircraft a
    WHERE e.eid = c.eid AND c.aid = a.aid
    GROUP BY a.aid
    HAVING count(*) < 3
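A small run of the HAVING query from Example 6 shows how the group qualification filters whole groups after counting. This sketch assumes SQLite via Python's sqlite3 module, joins only certified and aircraft (enough for counting, and an assumption about the join), and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table aircraft(aid integer primary key, aname text, cruisingrange integer);
    create table certified(eid integer, aid integer);
    -- hypothetical sample data: aircraft 1 has 3 certified employees,
    -- aircraft 2 has 2, aircraft 3 has 1
    insert into aircraft values (1, 'a1', 3000), (2, 'a2', 1500), (3, 'a3', 800);
    insert into certified values (101, 1), (102, 1), (103, 1),
                                 (101, 2), (102, 2),
                                 (103, 3);
""")

# GROUP BY forms one group per aircraft; HAVING keeps only the small groups.
few_certified = conn.execute("""
    select a.aid, count(*)
    from certified c, aircraft a
    where c.aid = a.aid
    group by a.aid
    having count(*) < 3
    order by a.aid
""").fetchall()
print(few_certified)
```

Aircraft 1 is dropped because its group has three rows; the other two groups pass the HAVING test.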

Yelpdb Example 1

business(id, name, neighborhood, city, state, latitude, longitude, stars, ...)
review(id, stars, review_date, text, useful, funny, cool, business_id, user_id)

Find businesses by id and name in Las Vegas, NV that have fewer than 3 stars.

    SELECT b.id, b.name
    FROM business b
    WHERE b.city = 'Las Vegas' and b.state = 'NV' and b.stars < 3

    _WVPws8wHGOxPtGvKPr3vA   3 Kings Hookah Lounge
    3qOwWFBUE8mdOToI7YrQ     Custom Kings
    IsqCZAF9YTcvKPKj2dZg     Z Gallerie
    +------------------------+-------------------------------------+
    4425 rows in set (0.15 sec)

Yelpdb Example 2

business(id, name, neighborhood, city, state, latitude, longitude, stars, ...)
review(id, stars, review_date, text, useful, funny, cool, business_id, user_id)

Check on '3 Kings Hookah Lounge', id _WVPws8wHGOxPtGvKPr3vA. How did the reviews of this business break down by number of stars?

Note: "break down by" is another way of saying "for each".

    select stars, count(stars)
    from review
    where business_id = '_WVPws8wHGOxPtGvKPr3vA'
    group by stars;

    +-------+--------------+
    | stars | count(stars) |
    +-------+--------------+
    |     1 |            7 |
    |     2 |            2 |
    |     3 |            2 |
    |     4 |            1 |
    |     5 |            1 |
    +-------+--------------+
    5 rows in set (0.00 sec)

The avg is 2.0, the value in business.stars.

Notes on business.stars

- We see that the stars value for a business is obtained by computing the average stars assigned by users in their reviews of that business.
- Thus business.stars is a derived value, not actually needed for full information in the database.
- The app could just compute this average from the review table data, but that would take a little time on each query.
- So for performance's sake, the database keeps this average in the business table.
- That means that to add a review to the system, the app has to adjust the target business's stars value.
- This duplication of data means the database is not "fully normalized", a concept we will study later.
- None of the databases in the book have derived values like this.

Histograms, i.e., counts by intervals of values

Suppose we want to summarize the costs of parts more finely than just average, min and max.

Cost values: .55, 7.95, 11.7, 15.3, 16.5, 20.5, 20.5, 22.2, 36.1, 42.3, 48.6, 57.3, 75.2, 124.23, 124.23

We can make a histogram like this, with buckets of values:

    Cost     count
    0-10     2
    10-20    3
    20-30    3

How can we get this data using SQL?

Note: Bucket-number = floor(cost/10). For example, cost = 22 has bucket-number floor(2.2) = 2.

This is clearly related to GROUP BY somehow, but GROUP BY works with column values directly, not expressions like floor(cost/10), according to R&G and the SQL standards (SQL-92 and SQL-2003).

But first try GROUP BY floor(cost/10) in Oracle.

Histogram in Oracle

    select floor(c.cost/10)*10 as "range-start", count(*)
    from catalog c
    group by floor(c.cost/10)    -- not OK by SQL 92 or 2003
    order by floor(c.cost/10);   -- OK by SQL 2003, but not SQL 92

    range-start   COUNT(*)
    ----------- ----------
              0          3
             10          4
             20          3
             30          1
             40          2
             50          1
             70          1
            120          2

    8 rows selected.

This shows that Oracle allows GROUP BY <expression>, at least for simple enough expressions.

How about mysql?

    select floor(c.cost/10)*10 as "range-start", count(*)
    from catalog c
    group by floor(c.cost/10)
    order by floor(c.cost/10);

    ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'eoneil1db.c.cost' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Hmm, mysql refuses to go beyond the standard in this particular case? Note that the message complains about the SELECT list, not the GROUP BY clause: with only_full_group_by set, mysql does not accept floor(c.cost/10)*10 as being determined by the grouping expression floor(c.cost/10). What if we change the sql_mode?

Turn off SQL modes for the session, and the query works!

    SET sql_mode = '';

    select floor(c.cost/10)*10 as "range-start", count(*)
    from catalog c
    group by floor(c.cost/10)
    order by floor(c.cost/10);

    +-------------+----------+
    | range-start | count(*) |
    +-------------+----------+
    |           0 |        3 |
    |          10 |        4 |
    |          20 |        3 |
    |          30 |        1 |
    |          40 |        2 |
    |          50 |        1 |
    |          70 |        1 |
    |         120 |        2 |
    +-------------+----------+
    8 rows in set (0.01 sec)

What if we are restricted to GROUP BY <column>?

We can use a query in the FROM clause to generate the bucket numbers, the floor(cost/10) values:

    select bucket*10 as "range-start", count(*)
    from catalog c,
         (select distinct floor(cost/10) as bucket from catalog) b
    where floor(c.cost/10) = b.bucket
    group by b.bucket;

This provides the same output. We have made the computed bucket numbers show up in a column in the table-query.
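Both histogram queries can be compared side by side on the fifteen cost values listed on the histogram slide. This is a sketch under assumptions: it uses SQLite via Python's sqlite3 module instead of Oracle or MySQL, registers a floor function from Python because not every SQLite build ships one, and fills sid and pid with placeholder values.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Provide floor() ourselves; it is not available in all SQLite builds.
conn.create_function("floor", 1, math.floor)
conn.execute("create table catalog(sid integer, pid integer, cost real)")

costs = [.55, 7.95, 11.7, 15.3, 16.5, 20.5, 20.5, 22.2,
         36.1, 42.3, 48.6, 57.3, 75.2, 124.23, 124.23]
# sid and pid are placeholders; only cost matters for the histogram.
conn.executemany("insert into catalog values (1, ?, ?)",
                 list(enumerate(costs)))

# Version 1: group directly on the bucket expression.
by_expression = conn.execute("""
    select floor(c.cost/10)*10 as range_start, count(*)
    from catalog c
    group by floor(c.cost/10)
    order by floor(c.cost/10)
""").fetchall()

# Version 2: compute bucket numbers in a FROM-clause query,
# then group on the resulting column.
by_column = conn.execute("""
    select b.bucket*10 as range_start, count(*)
    from catalog c,
         (select distinct floor(cost/10) as bucket from catalog) b
    where floor(c.cost/10) = b.bucket
    group by b.bucket
    order by b.bucket
""").fetchall()

print(by_expression)
print(by_column)
```

The counts here differ from the Oracle and mysql outputs above, since those were run on the full catalog table, while this sketch loads only the fifteen listed values. An ORDER BY was added to the second query so the two results can be compared row by row.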
Null Values

Field values in a tuple may sometimes be
- unknown: e.g., a rating has not been assigned, or a new column is added to the table
- inapplicable: e.g., CEO has no manager, single person has no spouse

SQL provides a special value NULL for such situations.

Special operators: IS NULL, IS NOT NULL

    SELECT * FROM Sailors WHERE rating IS NOT NULL

Note: NULL must not be used as a constant in expressions!

A field can be declared as NOT NULL, which means NULL values are not allowed (by default, PK fields are NOT NULL).

Try it out

    SQL> select * from class;    -- unreadable mess
    SQL> col name format a20;
    SQL> select * from class;

    NAME                 MEETS_AT             ROOM       FID
    -------------------- -------------------- ---------- ----------
    Data Structures      MWF 10               R128       489456522
    Intoduction to Math  TuTh 8-9:30          R128       489221823
    Artificial Intellige                      UP328
    nce

The blank entries are what null column values look like.

Try out is null

    SQL> select * from class where meets_at is null;

    NAME                 MEETS_AT             ROOM       FID
    -------------------- -------------------- ---------- ----------
    Psychology                                           619023588
    Artificial Intellige                      UP328
    nce

    SQL> select * from class where room is null;

    NAME                 MEETS_AT             ROOM       FID
    -------------------- -------------------- ---------- ----------
    Psychology                                           619023588

Dealing with Null Values

- The presence of NULL complicates some issues.
- NULL op value has NULL as its result (op is +, -, *, /).
- What does rating > 8 evaluate to if rating is NULL? Answer: unknown.
- 3-valued logic: true, false and unknown.
- Recall that WHERE eliminates rows that don't evaluate to true.
- What about the AND, OR and NOT connectives?
    unknown AND true = unknown
    unknown OR false = unknown
    NOT unknown = unknown
- Also, <NULL_value> = <NULL_value> is unknown!

Null Values and Aggregates

- The COUNT(*) result includes tuples with NULL.
- COUNT(A) only counts tuples where the value of attribute A is not NULL.
- All other aggregates skip NULL values (if the aggregate is on the field that is NULL).
- If all values are NULL on the aggregated field, the result of the aggregate is also NULL (except COUNT(A), which returns 0).

Null Values and Aggregates

The following two queries DO NOT RETURN THE SAME RESULT if there are NULLs (in field sname):

    SELECT COUNT(*) FROM Sailors S
    SELECT COUNT(S.sname) FROM Sailors S

The following two queries DO NOT RETURN THE SAME RESULT if there are NULLs (in field rating):

    SELECT COUNT(*) FROM Sailors S
    SELECT COUNT(*) FROM Sailors WHERE (rating > 8) OR (rating <= 8)

Table class has a row with a null room; count(room) counts non-null room values.

    SQL> select count(*) from class;

      COUNT(*)
    ----------
            23

    SQL> select count(room) from class;

    COUNT(ROOM)
    -----------
             22

WHERE clause evaluation with nulls

Simpler Sailors instance, with a null rating:

           SID SNAME                              RATING        AGE
    ---------- ------------------------------ ---------- ----------
            18 jones                                   3         30
            41 jonah                                   6         56
            22 ahab                                    7         44
            63 moby                                              15

    SQL> select count(*) from sailors WHERE (rating > 8) OR (rating <= 8);

      COUNT(*)
    ----------
             3

For the row with rating = null: WHERE (null > 8) or (null <= 8) becomes unknown or unknown. The result is unknown, so the row is not qualified by the WHERE clause. WHERE only qualifies TRUE rows, not UNKNOWN or FALSE ones.
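The four-row Sailors instance above is small enough to replay directly. The sketch below assumes SQLite via Python's sqlite3 module rather than Oracle (a SQL NULL comes back as Python's None), and the helper name scalar is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sailors(sid integer primary key, sname text,
                         rating integer, age integer);
    -- the simpler Sailors instance from the slide; moby has a null rating
    insert into sailors values
        (18, 'jones', 3, 30), (41, 'jonah', 6, 56),
        (22, 'ahab', 7, 44), (63, 'moby', null, 15);
""")

def scalar(sql):
    """Run a query that returns one row with one column; return that value."""
    return conn.execute(sql).fetchone()[0]

all_rows = scalar("select count(*) from sailors")        # counts every tuple
rated = scalar("select count(rating) from sailors")      # skips the null rating
tautology = scalar(
    "select count(*) from sailors where (rating > 8) or (rating <= 8)")
null_equals_null = scalar("select null = null")          # unknown, shown as None
avg_rating = scalar("select avg(rating) from sailors")   # average of 3, 6, 7

print(all_rows, rated, tautology, null_equals_null, avg_rating)
```

The condition that looks like a tautology still loses the moby row, because both comparisons evaluate to unknown for a null rating.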

Null Values and Duplicates

- Comparing two NULL values gives unknown as the result.
- But this rule does not hold when checking for duplicates! NULL values are considered equal in this case!
- Two tuples are duplicates if they match in all non-null attributes (of both) and have nulls in the other attributes.
- (1, null) and (1, null) are dups, but (1, null) and (1, 2) are not dups.
- Implications for DISTINCT, UNIQUE subqueries, set operations!
- Tuples with NULL in some group-by attributes are placed in the same group if all non-null group-by attributes match!
- In the one-column case: all null values are put in just one group-by group (in applications, best to use non-null group-by columns).
- DISTINCT: if two tuples have equal values in all non-null attributes and matching nulls otherwise, only one of them is output.
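The contrast between comparison and duplicate checking can be seen with the tuples from the slide. This sketch assumes SQLite via Python's sqlite3 module, and the table name t and column names a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t(a integer, b integer);
    -- (1, null) appears twice, (1, 2) once
    insert into t values (1, null), (1, null), (1, 2);
""")

# Comparison: b = b is unknown for the null rows, so WHERE drops them.
self_equal = conn.execute("select count(*) from t where b = b").fetchone()[0]

# Duplicate elimination: the two (1, null) tuples collapse into one.
distinct_rows = conn.execute(
    "select distinct a, b from t order by b").fetchall()

# Grouping: all null values of b land in a single group.
groups = conn.execute(
    "select b, count(*) from t group by b order by b").fetchall()

print(self_equal, distinct_rows, groups)
```

In SQLite, nulls sort first in ascending order, which is why the null tuple and the null group come first in the two result lists.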