Announcements (September 14) SQL: Part I SQL. Creating and dropping tables. Basic queries: SFW statement. Example: reading a table

Similar documents
SQL: Part II. Announcements (September 18) Incomplete information. CPS 116 Introduction to Database Systems. Homework #1 due today (11:59pm)

EECS 647: Introduction to Database Systems

Announcements (September 18) SQL: Part II. Solution 1. Incomplete information. Solution 3? Solution 2. Homework #1 due today (11:59pm)

EECS 647: Introduction to Database Systems

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

CSEN 501 CSEN501 - Databases I

SQL - Data Query language

Relational Model & Algebra. Announcements (Tue. Sep. 3) Relational data model. CompSci 316 Introduction to Database Systems

COMP 244 DATABASE CONCEPTS & APPLICATIONS

Announcements (September 21) SQL: Part III. Triggers. Active data. Trigger options. Trigger example

CSC Web Programming. Introduction to SQL

CMPT 354: Database System I. Lecture 3. SQL Basics

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Database Systems SQL SL03

Chapter 3: Introduction to SQL

The SQL data-definition language (DDL) allows defining :

Chapter 3: Introduction to SQL. Chapter 3: Introduction to SQL

Database Systems SQL SL03

Chapter 3: Introduction to SQL

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

EECS 647: Introduction to Database Systems

SQL: Data Manipulation Language. csc343, Introduction to Databases Diane Horton Winter 2017

SQL: csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Sina Meraji. Winter 2018

CSE 344 Midterm. Wednesday, February 19, 2014, 14:30-15:20. Question Points Score Total: 100

EECS 647: Introduction to Database Systems

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 7 Introduction to Structured Query Language (SQL)

Lecture 6 - More SQL

SQL Data Query Language

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Relational Model and Algebra. Introduction to Databases CompSci 316 Fall 2018

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

Keys, SQL, and Views CMPSCI 645

Introduction SQL DRL. Parts of SQL. SQL: Structured Query Language Previous name was SEQUEL Standardized query language for relational DBMS:

Computing for Medicine (C4M) Seminar 3: Databases. Michelle Craig Associate Professor, Teaching Stream

Lecture 2: Introduction to SQL

Oracle Database 11g: SQL and PL/SQL Fundamentals

Database Management Systems. Chapter 5

Basic operators: selection, projection, cross product, union, difference,

SQL: The Query Language Part 1. Relational Query Languages

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

SQL. Lecture 4 SQL. Basic Structure. The select Clause. The select Clause (Cont.) The select Clause (Cont.) Basic Structure.

WHAT IS SQL. Database query language, which can also: Define structure of data Modify data Specify security constraints

Chapter 4 SQL. Database Systems p. 121/567

SQL. Chapter 5 FROM WHERE

CS 582 Database Management Systems II

The Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination

SQL Part 2. Kathleen Durant PhD Northeastern University CS3200 Lesson 6

COMP 244 DATABASE CONCEPTS AND APPLICATIONS

SQL - Lecture 3 (Aggregation, etc.)

1) Introduction to SQL

SQL. SQL Queries. CISC437/637, Lecture #8 Ben Cartere9e. Basic form of a SQL query: 3/17/10

SQL: Part II. Introduction to Databases CompSci 316 Fall 2018

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

SQL functions fit into two broad categories: Data definition language Data manipulation language

Midterm Review. Winter Lecture 13

Silberschatz, Korth and Sudarshan See for conditions on re-use

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

SQL: Queries, Programming, Triggers

Oracle Database: SQL and PL/SQL Fundamentals NEW

SQL: Queries, Constraints, Triggers

Oracle Database: SQL and PL/SQL Fundamentals Ed 2

Database Management Systems. Chapter 5

Set theory is a branch of mathematics that studies sets. Sets are a collection of objects.

QQ Group

CMPT 354: Database System I. Lecture 4. SQL Advanced

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

Chapter 3: SQL. Chapter 3: SQL

Querying Data with Transact SQL

More on SQL Nested Queries Aggregate operators and Nulls

Relational Database Design Part I. Announcement. Relational model: review. CPS 116 Introduction to Database Systems

Lecture 3 SQL. Shuigeng Zhou. September 23, 2008 School of Computer Science Fudan University

SQL: Data Querying. B0B36DBS, BD6B36DBS: Database Systems. h p:// Lecture 4

CIS 330: Applied Database Systems

Relational Model, Relational Algebra, and SQL

Relational Algebra and SQL

Relational Database Design Part I. Announcements (September 5) Relational model: review. CPS 116 Introduction to Database Systems

CS425 Fall 2017 Boris Glavic Chapter 4: Introduction to SQL

SQL Data Querying and Views

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

Why Relational Databases? Relational databases allow for the storage and analysis of large amounts of data.

Chapter # 7 Introduction to Structured Query Language (SQL) Part II

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. General Overview - rel. model. Overview - detailed - SQL

Part I: Structured Data

Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Data Definition

Chapter 3: SQL. Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database Systems CSE 414

NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)

Introduction to SQL. ECE 650 Systems Programming & Engineering Duke University, Spring 2018

Database Languages. A DBMS provides two types of languages: Language for accessing & manipulating the data. Language for defining a database schema

Chapter 4: SQL. Basic Structure

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Enterprise Database Systems

Transcription:

Announcements (September 14) 2 SQL: Part I Books should have arrived by now Homework #1 due next Tuesday Project milestone #1 due in 4 weeks CPS 116 Introduction to Database Systems SQL 3 Creating and dropping tables 4 SQL: Structured Query Language Pronounced S-Q-L or sequel The standard query language supported by most commercial DBMS A brief history IBM System R ANSI SQL89 ANSI SQL92 (SQL2) ANSI SQL99 (SQL3) ANSI SQL 2003 (+OLAP, XML, etc.) CREATE TABLE table_name (, column_name i column_type i, ); DROP TABLE table_name; Examples create table Student (SID integer, name varchar(30), email varchar(30), age integer, GPA float); create table Course (CID char(10), title varchar(100)); create table Enroll (SID integer, CID char(10)); drop table Student; drop table Course; drop table Enroll; -- everything from -- to the end of the line is ignored. -- SQL is insensitive to white space. -- SQL is insensitive to case (e.g.,...course... is equivalent to --...COURSE...) Basic queries: SFW statement 5 Example: reading a table 6 SELECT A 1, A 2,, A n FROM R 1, R 2,, R m WHERE condition; Also called an SPJ (select-project-join) query Equivalent (not really!) to relational algebra query π A1, A 2,, A n ( σ condition (R 1 R 2 R m )) SELECT * ; Single-table query, so no cross product here WHERE clause is optional * is a short hand for all columns

Example: selection and projection 7 Example: join 8 Name of students under 18 SELECT name WHERE age < 18; When was Lisa born? SELECT 2006 age WHERE name = Lisa ; SELECT list can contain expressions Can also use built-in functions such as SUBSTR, ABS, etc. String literals (case sensitive) are enclosed in single quotes SID s and names of students taking courses with the word Database in their titles SELECT Student.SID, Student.name, Enroll, Course WHERE Student.SID = Enroll.SID AND Enroll.CID = Course.CID AND title LIKE %Database% ; LIKE matches a string against a pattern % matches any sequence of 0 or more characters Okay to omit table_name in table_name.column_name if column_name is unique Example: rename 9 A more complicated example 10 SID s of all pairs of classmates Relational algebra query: π e1.sid, e2.sid ( ρ e1 Enroll e1.cid = e2.cid e1.sid > e2.sid ρ e2 Enroll ) SQL: SELECT e1.sid AS SID1, e2.sid AS SID2 FROM Enroll AS e1, Enroll AS e2 WHERE e1.cid = e2.cid AND e1.sid > e2.sid; AS keyword is completely optional Titles of all courses that Bart and Lisa are taking together SELECT c.title sb, Student sl, Enroll eb, Enroll el, Course c WHERE sb.name = Bart AND sl.name = Lisa AND eb.sid = sb.sid AND el.sid = sl.sid AND eb.cid = c.cid AND el.cid = c.cid; Tip: Write the FROM clause first, then WHERE, and then SELECT Why SFW statements? 11 Set versus bag semantics 12 Out of many possible ways of structuring SQL statements, why did the designers choose SELECT- FROM-WHERE? A large number of queries can be written using only selection, projection, and cross product (or join) Any query that uses only these operators can be written in a canonical form: π L (σ p (R 1 R m )) Example: π R.A, S.B (R p1 S) p2 (π T.C σ p3 T) = π R.A, S.B, T.C σ p1 p2 p3 ( R S T ) SELECT-FROM-WHERE captures this canonical form Set No duplicates Relational model and algebra use set semantics Bag Duplicates allowed Number of duplicates is significant SQL uses bag semantics by default

Set versus bag example 13 A case for bag semantics 14 Enroll SID CID 142 CPS196 142 CPS114 123 CPS196 857 CPS196 857 CPS130 456 CPS114...... π SID Enroll SELECT SID FROM Enroll; SID 142 123 857 456... SID 142 142 123 857 857 456... Efficiency Saves time of eliminating duplicates Which one is more useful? π GPA Student SELECT GPA ; The first query just returns all possible GPA s The second query returns the actual GPA distribution Besides, SQL provides the option of set semantics with DISTINCT keyword Forcing set semantics 15 Operational semantics of SFW 16 SID s of all pairs of classmates SELECT e1.sid AS SID1, e2.sid AS SID2 FROM Enroll AS e1, Enroll AS e2 WHERE e1.cid = e2.cid AND e1.sid > e2.sid; Say Bart and Lisa both take CPS116 and CPS114 SELECT DISTINCT e1.sid AS SID1, e2.sid AS SID2... With DISTINCT, all duplicate (SID1, SID2) pairs are removed from the output SELECT [DISTINCT] E 1, E 2,, E n FROM R 1, R 2,, R m WHERE condition; For each t 1 in R 1 : For each t 2 in R 2 : For each t m in R m : If condition is true over t 1, t 2,, t m : Compute and output E 1, E 2,, E n as a row If DISTINCT is present Eliminate duplicate rows in output t 1, t 2,, t m are often called tuple variables SQL set and bag operations UNION, EXCEPT, INTERSECT Set semantics Duplicates in input tables, if any, are first eliminated Exactly like set,, and in relational algebra UNION ALL, EXCEPT ALL, INTERSECT ALL Bag semantics Think of each row as having an implicit count (the number of times it appears in the table) Bag union: sum up the counts from two tables Bag difference: proper-subtract the two counts Bag intersection: take the minimum of the two counts 17 Examples of bag operations Bag1 Bag2 Bag1 UNION ALL Bag2 Bag1 EXCEPT ALL Bag2 Bag1 INTERSECT ALL Bag2 18

Examples of set versus bag operations Enroll(SID, CID), ClubMember(club, SID) (SELECT SID FROM ClubMember) EXCEPT (SELECT SID FROM Enroll); SID s of students who are in clubs but not taking any classes (SELECT SID FROM ClubMember) EXCEPT ALL (SELECT SID FROM Enroll); SID s of students who are in more clubs than classes 19 20 Summary of SQL features covered so far SELECT-FROM-WHERE statements (select-project-join queries) Set and bag operations Next: how to nest SQL queries Table expression Use query result as a table In set and bag operations, FROM clauses, etc. A way to nest queries Example: names of students who are in more clubs than classes SELECT DISTINCT name, ((SELECT SID FROM ClubMember) EXCEPT ALL (SELECT SID FROM Enroll) ) AS S WHERE Student.SID = S.SID; 21 Scalar subqueries A query that returns a single row can be used as a value in WHERE, SELECT, etc. Example: students at the same age as Bart SELECT * What s Bart s age? WHERE age = ( SELECT age WHERE name = Bart ); Runtime error if subquery returns more than one row Under what condition will this runtime error never occur? name is a key of Student What if subquery returns no rows? The value returned is a special NULL value, and the comparison fails Can be used in SELECT to compute a value for an output column 22 IN subqueries 23 EXISTS subqueries 24 x IN (subquery) checks if x is in the result of subquery Example: students at the same age as (some) Bart SELECT * What s Bart s age? WHERE age IN ( SELECT age WHERE name = Bart ); EXISTS (subquery) checks if the result of subquery is non-empty Example: students at the same age as (some) Bart SELECT * AS s WHERE EXISTS (SELECT * WHERE name = Bart AND age = s.age); This happens to be a correlated subquery a subquery that references tuple variables in surrounding queries

Operational semantics of subqueries 25 Scoping rule of subqueries 26 SELECT * AS s WHERE EXISTS (SELECT * WHERE name = Bart AND age = s.age); For each row s in Student Evaluate the subquery with the appropriate value of s.age If the result of the subquery is not empty, output s.* The DBMS query optimizer may choose to process the query in an equivalent, but more efficient way (example?) To find out which table a column belongs to Start with the immediately surrounding query If not found, look in the one surrounding that; repeat if necessary Use table_name.column_name notation and AS (renaming) to avoid confusion Another example 27 Quantified subqueries 28 SELECT * s WHERE EXISTS (SELECT * FROM Enroll e WHERE SID = s.sid AND EXISTS (SELECT * FROM Enroll WHERE SID = s.sid AND CID <> e.cid)); Students who are taking at least two courses A quantified subquery can be used as a value in a WHERE condition Universal quantification (for all): WHERE xopall (subquery) True iff for all t in the result of subquery, x opt Existential quantification (exists): WHERE xopany (subquery) True iff there exists some t in the result of subquery such that x opt Beware In common parlance, any and all seem to be synonyms In SQL, ANY really means some Examples of quantified subqueries Which students have the highest GPA? SELECT * WHERE GPA >= ALL (SELECT GPA ); SELECT * WHERE NOT (GPA < ANY (SELECT GPA ); Use NOT to negate a condition 29 30 More ways of getting the highest GPA Which students have the highest GPA? SELECT * AS s WHERE NOT EXISTS (SELECT * WHERE GPA > s.gpa); SELECT * WHERE SID NOT IN (SELECT s1.sid AS s1, Student AS s2 WHERE s1.gpa < s2.gpa);

31 Summary of SQL features covered so far SELECT-FROM-WHERE statements Set and bag operations Table expressions, subqueries Subqueries allow queries to be written in more declarative ways (recall the highest GPA query) But they do not add much expressive power Try translating other forms of subqueries into [NOT] EXISTS, which in turn can be translated into join (and difference) Aggregates Standard SQL aggregate functions: COUNT, SUM, AVG, MIN, MAX Example: number of students under 18, and their average GPA SELECT COUNT(*), AVG(GPA) WHERE age < 18; COUNT(*) counts the number of rows 32 Next: aggregation and grouping Aggregates with DISTINCT 33 GROUP BY 34 Example: How many students are taking classes? SELECT COUNT(DISTINCT SID) FROM Enroll; is equivalent to: SELECT COUNT(*) FROM (SELECT DISTINCT SID, FROM Enroll); SELECT FROM WHERE GROUP BY list_of_columns; Example: find the average GPA for each age group SELECT age, AVG(GPA) GROUP BY age; Operational semantics of GROUP BY SELECT FROM WHERE GROUP BY ; Compute FROM ( ) Compute WHERE (σ) Compute GROUP BY: group rows according to the values of GROUP BY columns Compute SELECT for each group (π) For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group Number of groups = number of rows in the final output 35 Example of computing GROUP BY SELECT age, AVG(GPA) GROUP BY age; SID name age GPA 142 Bart 10 2.3 857 Lisa 8 4.3 123 Milhouse 10 3.1 456 Ralph 8 2.3 Compute GROUP BY: group rows according to the values of GROUP BY columns............ SID name age GPA 142 Bart 10 2.3 123 Milhouse 10 3.1 857 Lisa 8 4.3 456 Ralph 8 2.3 age AVG_GPA............ 10 2.7 8 3.3...... Compute SELECT for each group 36

Aggregates with no GROUP BY 37 Restriction on SELECT 38 An aggregate query with no GROUP BY clause represent a special case where all rows go into one group SELECT AVG(GPA) ; SID name age GPA 142 Bart 10 2.3 857 Lisa 8 4.3 123 Milhouse 10 3.1 456 Ralph 8 2.3............ Group all rows into one group SID name age GPA 142 Bart 10 2.3 857 Lisa 8 4.3 123 Milhouse 10 3.1 456 Ralph 8 2.3............ Compute aggregate over the group AVG_GPA 3 If a query uses aggregation/group by, then every column referenced in SELECT must be either Aggregated, or A GROUP BY column This restriction ensures that any SELECT expression produces only one value for each group Examples of invalid queries 39 HAVING 40 SELECT SID, age GROUP BY age; Recall there is one output row per group There can be multiple SID values per group SELECT SID, MAX(GPA) ; Recall there is only one group for an aggregate query with no GROUP BY clause There can be multiple SID values Wishful thinking (that the output SID value is the one associated with the highest GPA) does NOT work Another way of writing the max GPA query? Used to filter groups based on the group properties (e.g., aggregate values, GROUP BY column values) SELECT FROM WHERE GROUP BY HAVING condition; Compute FROM ( ) Compute WHERE (σ) Compute GROUP BY: group rows according to the values of GROUP BY columns Compute HAVING (another σ over the groups) Compute SELECT (π) for each group that passes HAVING HAVING examples 41 42 Summary of SQL features covered so far Find the average GPA for each age group over 10 SELECT age, AVG(GPA) GROUP BY age HAVING age > 10; Can be written using WHERE without table expressions List the average GPA for each age group with more than a hundred students SELECT age, AVG(GPA) GROUP BY age HAVING COUNT(*) > 100; Can be written using WHERE and table expressions SELECT-FROM-WHERE statements Set and bag operations Table expressions, subqueries Aggregation and grouping More expressive power than relational algebra Next: ordering output rows

ORDER BY 43 ORDER BY example 44 SELECT [DISTINCT]... FROM WHERE GROUP BY HAVING ORDER BY output_column [ASC DESC], ; ASC = ascending, DESC = descending Operational semantics After SELECT list has been computed and optional duplicate elimination has been carried out, sort the output according to ORDER BY specification List all students, sort them by GPA (descending) and name (ascending) SELECT SID, name, age, GPA ORDER BY GPA DESC, name; ASC is the default option Strictly speaking, only output columns can appear in ORDER BY clause (although some DBMS support more) Can use sequence numbers instead of names to refer to output columns: ORDER BY 4 DESC, 2; 45 Summary of SQL features covered so far SELECT-FROM-WHERE statements Set and bag operations Table expressions, subqueries Aggregation and grouping Ordering Next: NULL s, outerjoins, data modification, constraints,