Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Similar documents
Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2006 Lecture 3 - Relational Model

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems

Announcements. From SQL to RA. Query Evaluation Steps. An Equivalent Expression

Introduction to Data Management CSE 344. Lectures 9: Relational Algebra (part 2) and Query Evaluation

SQL. Structured Query Language

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

SQL. Assit.Prof Dr. Anantakul Intarapadung

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Introduction to Database Systems CSE 444

Lecture 2: Introduction to SQL

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Database Management Systems CSEP 544. Lecture 3: SQL Relational Algebra, and Datalog

Before we talk about Big Data. Lets talk about not-so-big data

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

Lecture 3: SQL Part II

CSE 344 JANUARY 26 TH DATALOG

Lectures 2&3: Introduction to SQL

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

Introduction to Relational Databases

CSC 261/461 Database Systems Lecture 5. Fall 2017

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

CS 564 Midterm Review

Introduction to Database Systems CSE 444

INF 212 ANALYSIS OF PROG. LANGS SQL AND SPREADSHEETS. Instructors: Crista Lopes Copyright Instructors.

CSE 544 Principles of Database Management Systems

Lecture 03: SQL. Friday, April 2 nd, Dan Suciu Spring

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Keys, SQL, and Views CMPSCI 645

CSE 344 APRIL 20 TH RDBMS INTERNALS

12. MS Access Tables, Relationships, and Queries

Database Systems CSE 414

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

CS 2451 Database Systems: More SQL

Introduction to Data Management CSE 344. Lecture 2: Data Models

Introduction to Databases CSE 414. Lecture 2: Data Models

Lecture 19: Query Optimization (1)

CSE 544 Principles of Database Management Systems

Introduction to Data Management CSE 414

Review. Administrivia (Preview for Friday) Lecture 21: Query Optimization (1) Where We Are. Relational Algebra. Relational Algebra.

Lecture 21: Query Optimization (1)

Introduction to Data Management CSE 344

CSEP 544: Lecture 04. Query Execution. CSEP544 - Fall

CMP-3440 Database Systems

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

Database design and implementation CMPSCI 645. Lecture 04: SQL and Datalog

Relational Databases

Introduction to Database Systems CSE 414. Lecture 16: Query Evaluation

Database Systems CSE 414

Lecture 17: Query execution. Wednesday, May 12, 2010

CSE 344 Midterm Nov 1st, 2017, 1:30-2:20

Relational Algebra. B term 2004: lecture 10, 11

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Relational Model, Relational Algebra, and SQL

RELATIONAL ALGEBRA. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

MTAT Introduction to Databases

L07: SQL: Advanced & Practice. CS3200 Database design (sp18 s2) 1/11/2018

Database Systems CSE 303. Lecture 02

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Database Systems CSE 303. Lecture 02

Introduction to Data Management CSE 344. Lecture 12: Cost Estimation Relational Calculus

CIS 330: Applied Database Systems

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

CS 2451 Database Systems: SQL

Introduction to Data Management CSE 344

COMP 430 Intro. to Database Systems

Relational Model: History

3. Relational Data Model 3.5 The Tuple Relational Calculus

SQL. Lecture 4 SQL. Basic Structure. The select Clause. The select Clause (Cont.) The select Clause (Cont.) Basic Structure.

Database Systems SQL SL03

Lecture 4: Advanced SQL Part II

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

Lecture 16. The Relational Model

CSE 344 JANUARY 8 TH SQLITE AND JOINS

Announcements. Multi-column Keys. Multi-column Keys. Multi-column Keys (3) Multi-column Keys (2) Introduction to Data Management CSE 414

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

Database Systems SQL SL03

CSE344 Midterm Exam Fall 2016

Announcements. Multi-column Keys. Multi-column Keys (3) Multi-column Keys. Multi-column Keys (2) Introduction to Data Management CSE 414

COMP 430 Intro. to Database Systems

The SQL data-definition language (DDL) allows defining :

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

Announcements. Outline UNIQUE. (Inner) joins. (Inner) Joins. Database Systems CSE 414. WQ1 is posted to gradebook double check scores

CSE 414 Midterm. Friday, April 29, 2016, 1:30-2:20. Question Points Score Total: 100

Database Systems CSE 414

Lesson 2. Data Manipulation Language

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

Subqueries. 1. Subqueries in SELECT. 1. Subqueries in SELECT. Including Empty Groups. Simple Aggregations. Introduction to Database Systems CSE 414

Data Informatics. Seon Ho Kim, Ph.D.

Introduction to Database Systems CSE 344

Transcription:

Agenda CSE 444: Database Internals Review Relational Model Lecture 2 Review of the Relational Model Review Queries (will skip most slides) Relational Algebra SQL Review translation SQL à RA Needed for HW1 1 2 Database/Relation/Tuple A Database is collection of relations A Relation R is subset of S 1 x S 2 x x S n Where S i is the domain of attribute i n is number of attributes of the relation A relation is a set of tuples A Tuple t is an element of S 1 x S 2 x x S n Other names: relation = table; tuple = row Discussion Rows in a relation: Data independence! Ordering immaterial (a relation is a set) All rows are distinct set semantics Query answers may have duplicates bag semantics Columns in a tuple: Ordering is significant Applications refer to columns by their names Domain of each column is a primitive type 3 4 Schema Relation schema: describes column heads Relation name Name of each field (or column, or attribute) Domain of each field Degree (or arity) of relation: # attributes Database schema: set of all relation schemas Instance Relation instance: concrete table content Set of tuples (also called records) matching the schema Cardinality of relation instance: # tuples Database instance: set of all relation instances 5 6 1

Supplier What is the schema? What is the instance? sno sname scity sstate 1 s1 city 1 WA 2 s2 city 1 WA 3 s3 city 2 MA 4 s4 city 2 MA What is the schema? What is the instance? Relation schema Supplier(sno: integer, sname: string, scity: string, sstate: string) Supplier sno sname scity sstate 1 s1 city 1 WA 2 s2 city 1 WA 3 s3 city 2 MA 4 s4 city 2 MA instance 7 8 Integrity Constraints Condition specified on a database schema Restricts data that can be stored in db instance DBMS enforces integrity constraints Ensures only legal database instances exist Simplest form of constraint is domain constraint Attribute values must come from attribute domain Key Constraints Super Key: set of attributes that functionally determines all attributes Key: Minimal super-key; a.k.a. candidate key Primary key: One minimal key can be selected as primary key 9 10 Foreign Key Constraints A relation can refer to a tuple in another relation Foreign key Field that refers to tuples in another relation Typically, this field refers to the primary key of other relation Can pick another field as well Key Constraint SQL Examples PRIMARY KEY (pno) 11 12 2

Key Constraint SQL Examples Key Constraint SQL Examples CREATE TABLE Supply( sno integer, qty integer, price integer PRIMARY KEY (pno) CREATE TABLE Supply( sno integer, qty integer, price integer, PRIMARY KEY (sno,pno) PRIMARY KEY (pno) 13 14 Key Constraint SQL Examples Key Constraint SQL Examples CREATE TABLE Supply( sno integer, qty integer, price integer, PRIMARY KEY (sno,pno), FOREIGN KEY (sno) REFERENCES Supplier, FOREIGN KEY (pno) REFERENCES Part PRIMARY KEY (pno) CREATE TABLE Supply( sno integer, qty integer, price integer, PRIMARY KEY (sno,pno), PRIMARY KEY (pno) FOREIGN KEY (sno) REFERENCES Supplier ON DELETE NO ACTION, FOREIGN KEY (pno) REFERENCES Part ON DELETE CASCADE 15 16 General Constraints Table constraints serve to express complex constraints over a single table PRIMARY KEY (pno), CHECK ( psize > 0 ) Relational Query Languages Note: Also possible to create constraints over many tables 17 18 3

Relational Query Language Set-at-a-time: Query inputs and outputs are relations Two variants of the query language: Relational algebra: specifies order of operations Relational calculus / SQL: declarative Note We will go very quickly in class over the Relational Algebra and SQL Please review at home: Read the slides that we skipped in class Review material from 344 as needed 19 20 Relational Algebra Queries specified in an operational manner A query gives a step-by-step procedure Relational operators Take one or two relation instances as argument Return one relation instance as result Easy to compose into relational algebra expressions 21 Five Basic Relational Operators Selection: σ condition (S) Condition is Boolean combination (, ) of atomic predicates (<, <=, =,, >=, >) Projection: π list-of-attributes (S) Union ( ) Set difference ( ), Cross-product/cartesian product ( ), Join: R θ S = σ θ (R S) Other operators: anti-semijoin (read about it!), renaming 22 Logical Query Plans Logical Query Plans Π sname,scity pno=pno sno=sno σ psize > 10 What does this query compute? 23 Supplier Supply Part 24 4

Selection & Projection Examples Patient π zip,disease (Patient) no name zip disease zip disease 1 p1 98125 flu 98125 flu 2 p2 98125 heart 98125 heart 3 p3 98120 lung 98120 lung 4 p4 98120 heart 98120 heart σ disease= heart (Patient) π zip (σ disease= heart (Patient)) no name zip disease zip 2 p2 98125 heart 98120 4 p4 98120 heart 98125 Cross-Product Example AnonPatient P Voters V age zip disease name age zip 54 98125 heart p1 54 98125 20 98120 flu p2 20 98120 P V P.age P.zip disease name V.age V.zip 54 98125 heart p1 54 98125 54 98125 heart p2 20 98120 20 98120 flu p1 54 98125 20 98120 flu p2 20 98120 25 26 Different Types of Join Theta-join: R θ S = σ θ (R x S) Join of R and S with a join condition θ Cross-product followed by selection θ Equijoin: R θ S = π A (σ θ (R x S)) Join condition θ consists only of equalities Projection π A drops all redundant attributes Natural join: R S = π A (σ θ (R x S)) Equijoin Equality on all fields with same name in R and in S AnonPatient P Theta-Join Example Voters V age zip disease name age zip 50 98125 heart p1 54 98125 19 98120 flu p2 20 98120 P P.zip = V.zip and P.age <= V.age + 1 and P.age >= V.age - 1 V P.age P.zip disease name V.age V.zip 19 98120 flu p2 20 98120 27 28 Equijoin Example AnonPatient P Voters V age zip disease name age zip 54 98125 heart p1 54 98125 20 98120 flu p2 20 98120 Natural Join Example AnonPatient P Voters V age zip disease name age zip 54 98125 heart p1 54 98125 20 98120 flu p2 20 98120 P P.age=V.age V age P.zip disease name V.zip 54 98125 heart p1 98125 20 98120 flu p2 98120 P V age zip disease name 54 98125 heart p1 20 98120 flu p2 29 30 5

More Joins Outer join Include tuples with no matches in the output Use NULL values for missing attributes Variants Left outer join Right outer join Full outer join Outer Join Example AnonPatient P Voters V age zip disease name age zip 54 98125 heart p1 54 98125 20 98120 flu p2 20 98120 33 98120 lung age zip disease name 54 98125 heart p1 P o V 20 98120 flu p2 33 98120 lung null 31 32 Example of Algebra Queries Q1: Names of patients who have heart disease π name (Voter (σ disease= heart (AnonPatient)) More Examples Relations Q2: Name of supplier of parts with size greater than 10 π sname (Supplier Supply (σ psize>10 (Part)) Q3: Name of supplier of red parts or parts with size greater than 10 π sname (Supplier Supply (σ psize>10 (Part) σ pcolor= red (Part) ) ) (Many more examples in the book) 33 34 Supplier sno=sno Logical Query Plans Π sname,scity Supply pno=pno σ psize > 10 Part Extended Operators of Relational Algebra Duplicate elimination (δ) Since commercial DBMSs operate on multisets not sets Aggregate operators (γ) Min, max, sum, average, count Grouping operators (γ) Partitions tuples of a relation into groups Aggregates can then be applied to groups Sort operator (τ) 35 36 6

Structured Query Language: SQL Declarative query language, based on the relational calculus (see 344) Data definition language Statements to create, modify tables and views Data manipulation language Statements to issue queries, insert, delete data SQL Query Basic form: (plus many many more bells and whistles) SELECT <attributes> FROM <one or more relations> WHERE <conditions> 37 38 Quick Review of SQL Quick Review of SQL SELECT DISTINCT z.pno, z.pname FROM Supplier x, Supply y, Part z WHERE x.sno = y.sno and y.pno = z.pno and x.scity = Seattle and y.price < 100 What does this query compute? 39 40 Quick Review of SQL SELECT z.pname, count(*) as cnt, min(y.price) FROM Supplier x, Supply y, Part z WHERE x.sno = y.sno and y.pno = z.pno GROUP BY z.pname What about this one? Simple SQL Query Product PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi SELECT * WHERE category= Gadgets selection 41 PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks 42 7

Simple SQL Query Details Product SELECT PName, Price, Manufacturer WHERE Price > 100 selection and projection PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi PName Price Manufacturer SingleTouch $149.99 Canon MultiTouch $203.99 Hitachi Case insensitive: Same: SELECT Select select Same: Product product Different: Seattle seattle Constants: abc - yes abc - no 43 44 Eliminating Duplicates Ordering the Results SELECT DISTINCT category Compare to: SELECT category Category Gadgets Photography Household Category Gadgets Gadgets Photography Household SELECT pname, price, manufacturer WHERE category= gizmo AND price > 50 ORDER BY price, pname Ties are broken by the second attribute on the ORDER BY list, etc. Ordering is ascending, unless you specify the DESC keyword. 45 46 Joins Tuple Variables Product (pname, price, category, manufacturer) Company (cname, stockprice, country) Find all products under $200 manufactured in Japan; return their names and prices. SELECT PName, Price, Company WHERE Manufacturer=CName AND Country= Japan AND Price <= 200 Person(pname, address, worksfor) Company(cname, address) SELECT DISTINCT pname, address FROM Person, Company WHERE worksfor = cname 47 Which address? SELECT DISTINCT Person.pname, Company.address FROM Person, Company WHERE Person.worksfor = Company.cname SELECT DISTINCT x.pname, y.address FROM Person AS x, Company AS y 48 WHERE x.worksfor = y.cname 8

Nested Queries Nested query Query that has another query embedded within it The embedded query is called a subquery Why do we need them? Enables to refer to a table that must itself be computed Subqueries can appear in WHERE clause (common) FROM clause (less common) HAVING clause (less common) Subqueries Returning Relations Company(name, city) Product(pname, maker) Purchase(id, product, buyer) Return cities where one can find companies that manufacture products bought by Joe Blow SELECT Company.city FROM Company WHERE Company.name IN (SELECT Product.maker FROM Purchase, Product WHERE Product.pname=Purchase.product 50 AND Purchase.buyer = Joe Blow 49 Subqueries Returning Relations Correlated Queries You can also use: s > ALL R s > ANY R EXISTS R Product ( pname, price, category, maker) Find products that are more expensive than all those produced By Gizmo-Works SELECT name WHERE price > ALL (SELECT price FROM Purchase Movie (title, year, director, length) Find movies whose title appears more than once. SELECT DISTINCT title FROM Movie AS x WHERE year <> ANY (SELECT year FROM Movie WHERE title = x.title Note (1) scope of variables (2) this can still be expressed as single SFW 51 WHERE maker= Gizmo-Works ) correlation 52 SELECT avg(price) WHERE maker= Toyota Aggregation SELECT count(*) WHERE year > 1995 SQL supports several aggregation operations: sum, count, min, max, avg Except count, all aggregations apply to a single attribute Grouping and Aggregation SELECT S FROM R 1,,R n WHERE C1 GROUP BY a 1,,a k HAVING C2 Conceptual evaluation steps: 1. Evaluate FROM-WHERE, apply condition C1 2. Group by the attributes a 1,,a k 3. Apply condition C2 to each group (may have aggregates) 4. Compute aggregates in S and return the result Read more about it in the book... 53 54 9

Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) From SQL to RA From SQL to RA SELECT DISTINCT x.name, z.name x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = y.cid and x.price > 100 and z.city = Seattle 55 56 Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) From SQL to RA δ Π x.name,z.name An Equivalent Expression Query optimization = finding cheaper, equivalent expressions δ Π x.name,z.name σ price>100 and city= Seattle cid=cid Product cid=cid pid=pid σ city= Seattle pid=pid σ Customer price>100 Customer Purchase Product Purchase 57 58 Extended RA: Operators on Bags Logical Query Plan Duplicate elimination δ Grouping γ Sorting τ SELECT city, count(*) FROM sales GROUP BY city HAVING sum(price) > 100 Π city, c σ p > 100 T3(city, c) T2(city,p,c) T1(city,p,c) γ city, sum(price) p, count(*) c T1, T2, T3 = temporary tables sales(product, city, price) 59 60 10

Typical Plan for Block (1/2)... Typical Plan For Block (2/2) having condition π fields γ fields, sum/count/min/max(fields) σ selection condition join condition SELECT-PROJECT-JOIN Query π fields σ selection condition join condition join condition R S 61 62 Benefits of Relational Model Physical data independence Can change how data is organized on disk without affecting applications Logical data independence Can change the logical schema without affecting applications (not 100%... consider updates) Query Evaluation Steps Preview Query optimization SQL query Parse & Rewrite Query Select Logical Plan Select Physical Plan Query Execution Logical plan Physical plan 63 Disk 64 11