Lecture 3: SQL Part II

Similar documents
Lecture 2: Introduction to SQL

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Lectures 2&3: Introduction to SQL

Introduction to Database Systems CSE 444

CS145: Intro to Databases. Lecture 1: Course Overview

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

CS145: Intro to Databases. Lecture 1: Course Overview

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

SQL. Assit.Prof Dr. Anantakul Intarapadung

INF 212 ANALYSIS OF PROG. LANGS SQL AND SPREADSHEETS. Instructors: Crista Lopes Copyright Instructors.

CMPT 354: Database System I. Lecture 3. SQL Basics

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

SQL. Structured Query Language

Before we talk about Big Data. Lets talk about not-so-big data

Introduction to Databases CSE 414. Lecture 2: Data Models

Database Systems CSE 303. Lecture 02

Database Systems CSE 303. Lecture 02

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2006 Lecture 3 - Relational Model

MTAT Introduction to Databases

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

COMP 430 Intro. to Database Systems

Introduction to Data Management CSE 344. Lecture 2: Data Models

Lecture 3 Additional Slides. CSE 344, Winter 2014 Sudeepa Roy

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

CSE 344 JANUARY 8 TH SQLITE AND JOINS

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

Introduction to Database Systems CSE 414. Lecture 3: SQL Basics

COMP 430 Intro. to Database Systems

CS 564 Midterm Review

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

Keys, SQL, and Views CMPSCI 645

Operating systems fundamentals - B07

Introduction to Relational Databases

Introduction to Database Systems. The Relational Data Model

Introduction to Database Systems. The Relational Data Model. Werner Nutt

Data Informatics. Seon Ho Kim, Ph.D.

Introduction to Data Management CSE 414

Announcements. Using Electronics in Class. Review. Staff Instructor: Alvin Cheung Office hour on Wednesdays, 1-2pm. Class Overview

CSE 544 Principles of Database Management Systems

CSC 261/461 Database Systems Lecture 13. Fall 2017

The Entity-Relationship Model (ER Model) - Part 2

Lecture 16. The Relational Model

Database Systems CSE 414

CSE 544 Principles of Database Management Systems

CMPT 354: Database System I. Lecture 2. Relational Model

Introduction to Data Management. Lecture #4 (E-R Relational Translation)

The Relational Model. Week 2

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

The Relational Data Model

Lecture 4. Lecture 4: The E/R Model

CSC 261/461 Database Systems Lecture 5. Fall 2017

Announcements. Multi-column Keys. Multi-column Keys (3) Multi-column Keys. Multi-column Keys (2) Introduction to Data Management CSE 414

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Lecture 2 SQL. Instructor: Sudeepa Roy. CompSci 516: Data Intensive Computing Systems

Introduction to SQL Part 2 by Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)

Database Applications (15-415)

Data Modeling. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Lecture 5. Lecture 5: The E/R Model

The Relational Model. Chapter 3. Comp 521 Files and Databases Fall

Announcements. Multi-column Keys. Multi-column Keys. Multi-column Keys (3) Multi-column Keys (2) Introduction to Data Management CSE 414

Review. The Relational Model. Glossary. Review. Data Models. Why Study the Relational Model? Why use a DBMS? OS provides RAM and disk

Principles of Database Systems CSE 544. Lecture #1 Introduction and SQL

Administrivia. The Relational Model. Review. Review. Review. Some useful terms

SQL: Part III. Announcements. Constraints. CPS 216 Advanced Database Systems

The Relational Data Model. Data Model

The Relational Model. Roadmap. Relational Database: Definitions. Why Study the Relational Model? Relational database: a set of relations

CSC 261/461 Database Systems Lecture 7

The Relational Model

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Relational data model

CSC 261/461 Database Systems Lecture 6. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Database Technology Introduction. Heiko Paulheim

CS143: Relational Model

CS275 Intro to Databases

CS317 File and Database Systems

Lecture 2 SQL. Announcements. Recap: Lecture 1. Today s topic. Semi-structured Data and XML. XML: an overview 8/30/17. Instructor: Sudeepa Roy

Computing for Medicine (C4M) Seminar 3: Databases. Michelle Craig Associate Professor, Teaching Stream

The Relational Model. Chapter 3

The Relational Model of Data (ii)

The Relational Model. Chapter 3. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

CompSci 516 Database Systems. Lecture 2 SQL. Instructor: Sudeepa Roy

Chapter 2 Introduction to Relational Models

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

ENTITY-RELATIONSHIP MODEL. CS 564- Spring 2018

The Relational Model. Outline. Why Study the Relational Model? Faloutsos SCS object-relational model

CS639: Data Management for Data Science. Lecture 1: Intro to Data Science and Course Overview. Theodoros Rekatsinas

Lecture 3 More SQL. Instructor: Sudeepa Roy. CompSci 516: Database Systems

The Relational Model Constraints and SQL DDL

SQL: Data Definition Language

Introduction to Data Management. Lecture #5 Relational Model (Cont.) & E-Rà Relational Mapping

Relational Databases BORROWED WITH MINOR ADAPTATION FROM PROF. CHRISTOS FALOUTSOS, CMU /615

CS 2451 Database Systems: Relational Data Model

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Introduction and Overview

Transcription:

Lecture 3 Lecture 3: SQL Part II Copyright: These slides are the modified version of the slides used in CS145 Introduction to Databases course at Stanford by Dr. Peter Bailis

Lecture 3 Today s Lecture 1. Quick review of the previous 2 sessions ACTIVITY: Single-table queries (left from previous session) 2. Multi-table queries ACTIVITY: Multi-table queries 2

Section 1 > Review Why should you study databases? Mercenary- make more $$$: Startups need DB talent right away = low employee # Massive industry Intellectual: Science: data poor to data rich No idea how to handle the data! Fundamental ideas to/from all of CS: Systems, theory, AI, logic, stats, analysis. Many great computer systems ideas started in DB. 3

Section 1 > Review Who we are Instructor (me) Mohammad Dashti Faculty in Software Engineering First year at Yazd University, first time teaching Database Systems! Research: database and machine learning systems TAs? N/A (for now) 4

Section 1 > Review Attendance I dislike mandatory attendance but in the past we noticed People who did not attend did worse L People who did not attend used more course resources L People who did not attend were less happy with the course L Thus: mandatory attendance 5

Section 1 > Review Jupyter Notebook Hello World Jupyter notebooks are interactive shells which save output in a nice notebook format They also can display markdown, LaTeX, HTML, js FYI: Jupyter Notebook are also called ipython notebooks but they handle other languages too. You ll use these for in-class activities interactive lecture supplements/recaps homeworks, projects, etc.- if helpful! Note: you do need to know or learn python for this course! 6

Section 1 > Review Jupyter Notebook Setup 1. HIGHLY RECOMMENDED. Install on your laptop via the instructions on the next slide 2. Other options running via one of the alternative methods: 1. Ubuntu VM. 2. Corn Please help out your peers by posting issues / solutions on the forum (once it s created!) As a general policy in upper-level CS courses, Windows is not officially supported. However we are making a best-effort attempt to provide some solutions here! 7

Section 1 > Review Jupyter Notebook Setup Instructions on course page (for session 1): http://el.yazd.ac.ir/lms/course/view.php?id=1112 8

Section 1 > Review What is a DBMS? A large, integrated collection of data Models a real-world enterprise Entities (e.g., Students, Courses) Relationships (e.g., Ali is enrolled in the Database Systems course) A Database Management System (DBMS) is a piece of software designed to store and manage databases 9

Section 1 > Review A Motivating, Running Example Consider building a course management system (CMS): Students Courses Professors Entities Who takes what Who teaches what Relationships 10

Section 1 > Review Data models A data model is a collection of concepts for describing data The relational model of data is the most widely used model today Main Concept: the relation- essentially, a table A schema is a description of a particular collection of data, using the given data model E.g. every relation in a relational data model has a schema describing types, etc. 11

Section 1 > Review SQL Introduction SQL is a standard language for querying and manipulating data SQL is a very high-level programming language This works because it is optimized well! SQL stands for Structured Query Language Many standards out there: ANSI SQL, SQL92 (a.k.a. SQL2), SQL99 (a.k.a. SQL3),. Vendors support various subsets NB: Probably the world s most successful parallel programming language (multicore?)

Section 1 > Review SQL is a Data Definition Language (DDL) Define relational schemata Create/alter/delete tables and their attributes Data Manipulation Language (DML) Insert/delete/modify tuples in tables Query one or more tables discussed next! 13

Section 1 > Review Tables in SQL Product PName Price Manufacturer Gizmo $19.99 GizmoWorks Powergizmo $29.99 GizmoWorks SingleTouch $149.99 Canon MultiTouch $203.99 Hitachi A relation or table is a multiset of tuples having the attributes specified by the schema Let s break this definition down 14

Lecture 3 > Section 1 > Review Tables in SQL Product PName Price Manufacturer Gizmo $19.99 GizmoWorks Powergizmo $29.99 GizmoWorks SingleTouch $149.99 Canon MultiTouch $203.99 Hitachi A multiset is an unordered list (or: a set with multiple duplicate instances allowed) List: [1, 1, 2, 3] Set: {1, 2, 3} Multiset: {1, 1, 2, 3} i.e. no next(), etc. methods! 15

Lecture 3 > Section 1 > Review Tables in SQL Product PName Price Manufacturer Gizmo $19.99 GizmoWorks Powergizmo $29.99 GizmoWorks An attribute (or column) is a typed data entry present in each tuple in the relation SingleTouch $149.99 Canon MultiTouch $203.99 Hitachi NB: Attributes must have an atomic type in standard SQL, i.e. not a list, set, etc. 16

Lecture 3 > Section 1 > Review Tables in SQL Product PName Price Manufacturer Gizmo $19.99 GizmoWorks Powergizmo $29.99 GizmoWorks SingleTouch $149.99 Canon MultiTouch $203.99 Hitachi Also referred to sometimes as a record A tuple or row is a single entry in the table having the attributes specified by the schema 17

Lecture 3 > Section 1 > Review Tables in SQL Product PName Price Manufacturer Gizmo $19.99 GizmoWorks Powergizmo $29.99 GizmoWorks SingleTouch $149.99 Canon The number of tuples is the cardinality of the relation MultiTouch $203.99 Hitachi The number of attributes is the arity of the relation 18

Lecture 3 > Section 1 > Review Data Types in SQL Atomic types: Characters: CHAR(20), VARCHAR(50) Numbers: INT, BIGINT, SMALLINT, FLOAT Others: MONEY, DATETIME, Every attribute must have an atomic type Hence tables are flat 19

Lecture 3 > Section 1 > Review Table Schemas The schema of a table is the table name, its attributes, and their types: Product(Pname: string, Price: float, Category: string, Manufacturer: string)! A key is an attribute whose values are unique; we underline a key Product(Pname: string, Price: float, Category: string, Manufacturer: string)! 20

Lecture 3 > Section 1 > Review Key constraints A key is a minimal subset of attributes that acts as a unique identifier for tuples in a relation A key is an implicit constraint on which tuples can be in the relation i.e. if two tuples agree on the values of the key, then they must be the same tuple! Students(sid:string, name:string, gpa: float)! 1. Which would you select as a key? 2. Is a key always guaranteed to exist? 3. Can we have more than one key?

Lecture 3 > Section 1 > Review NULL and NOT NULL To say don t know the value we use NULL! NULL has (sometimes painful) semantics, more detail later Students(sid:string, name:string, gpa: float)! sid name gpa 123 Bob 3.9 143 Jim NULL Say, Jim just enrolled in his first class. In SQL, we may constrain a column to be NOT NULL, e.g., name in this table

Lecture 3 > Section 1 > Review General Constraints We can actually specify arbitrary assertions E.g. There cannot be 25 people in the DB class In practice, we don t specify many such constraints. Why? Performance! Whenever we do something ugly (or avoid doing something convenient) it s for the sake of performance

Lecture 3 > Section 1 > Review SQL Query Basic form (there are many many more bells and whistles) SELECT <attributes>! FROM <one or more relations>! WHERE <conditions>! Call this a SFW query. 24

Lecture 3 > Section 1 > Review Simple SQL Query: Selection Selection is the operation of filtering a relation s tuples on some condition PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi SELECT *! FROM Product! WHERE Category = Gadgets! PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks 25

Lecture 3 > Section 1 > Review Simple SQL Query: Projection Projection is the operation of producing an output table with tuples that have a subset of their prior attributes PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi SELECT Pname, Price, Manufacturer! FROM Product! WHERE Category = Gadgets! PName Price Manufacturer Gizmo $19.99 GizmoWorks Powergizmo $29.99 GizmoWorks 26

Lecture 3 > Section 1 > Review LIKE: Simple String Pattern Matching SELECT *! FROM Products! WHERE PName LIKE %gizmo%! s LIKE p: pattern matching on strings p may contain two special symbols: % = any sequence of characters _ = any single character 27

Lecture 3 > Section 1 > Review DISTINCT: Eliminating Duplicates SELECT DISTINCT Category! FROM Product! Versus SELECT Category! FROM Product! Category Gadgets Photography Household Category Gadgets Gadgets Photography Household 28

Lecture 3 > Section 1 > Review ORDER BY: Sorting the Results SELECT PName, Price, Manufacturer! FROM Product! WHERE Category= gizmo AND Price > 50! ORDER BY Price, PName! Ties are broken by the second attribute on the ORDER BY list, etc. Ordering is ascending, unless you specify the DESC keyword. 29

Lecture 3 > Section 1 > ACTIVITY ACTIVITY: Activity-2-2.ipynb 30

Lecture 3 > Section 2 2. Multi-table queries 31

Lecture 3 > Section 2 What you will learn about in this section 1. Foreign key constraints 2. Joins: basics 3. Joins: SQL semantics 4. ACTIVITY: Multi-table queries 32

Lecture 3 > Section 2 > Foreign Keys Foreign Key constraints Suppose we have the following schema: Students(sid: string, name: string, gpa: float)! Enrolled(student_id: string, cid: string, grade: string)! And we want to impose the following constraint: a student must appear in the Students table to enroll in a class Students sid name gpa 101 Bob 3.2 123 Mary 3.8 Enrolled student_id cid grade 123 564 A 123 537 A+ We say that student_id is a foreign key that refers to Students student_id alone is not a key- what is?

Lecture 3 > Section 2 > Foreign Keys Declaring Foreign Keys Students(sid: string, name: string, gpa: float)! Enrolled(student_id: string, cid: string, grade: string)!! CREATE TABLE Enrolled(! student_id CHAR(20),! cid CHAR(20),! grade CHAR(10),! PRIMARY KEY (student_id, cid),! FOREIGN KEY (student_id) REFERENCES Students(sid)! )!

Lecture 3 > Section 2 > Foreign Keys Foreign Keys and update operations Students(sid: string, name: string, gpa: float)! Enrolled(student_id: string, cid: string, grade: string)! What if we insert a tuple into Enrolled, but no corresponding student? INSERT is rejected (foreign keys are constraints)! What if we delete a student? 1. Disallow the delete 2. Remove all of the courses for that student 3. SQL allows a third via NULL (not yet covered) DBA chooses (syntax in the book)

Lecture 3 > Section 2 > Foreign Keys Keys and Foreign Keys Company CName StockPrice Country GizmoWorks 25 USA Canon 65 Japan Hitachi 15 Japan What is a foreign key vs. a key here? Product PName Price Category Manufacturer Gizmo $19.99 Gadgets GizmoWorks Powergizmo $29.99 Gadgets GizmoWorks SingleTouch $149.99 Photography Canon MultiTouch $203.99 Household Hitachi 36

Lecture 3 > Section 2 > Joins: Basics Joins Product(PName, Price, Category, Manufacturer)! Company(CName, StockPrice, Country)! Ex: Find all products under $200 manufactured in Japan; return their names and prices. Note: we will often omit attribute types in schema definitions for brevity, but assume attributes are always atomic types SELECT PName, Price! FROM Product, Company! WHERE Manufacturer = CName! AND Country= Japan! AND Price <= 200! 37

Lecture 3 > Section 2 > Joins: Basics Joins Product(PName, Price, Category, Manufacturer)! Company(CName, StockPrice, Country)! Ex: Find all products under $200 manufactured in Japan; return their names and prices. SELECT PName, Price! FROM Product, Company! WHERE Manufacturer = CName! AND Country= Japan! AND Price <= 200! A join between tables returns all unique combinations of their tuples which meet some specified join condition 38

Lecture 3 > Section 2 > Joins: Basics Joins Product(PName, Price, Category, Manufacturer)! Company(CName, StockPrice, Country)! Several equivalent ways to write a basic join in SQL: SELECT PName, Price! FROM Product, Company! WHERE Manufacturer = CName! AND Country= Japan! AND Price <= 200! SELECT PName, Price! FROM Product! JOIN Company ON Manufacturer = Cname! AND Country= Japan! WHERE Price <= 200! A few more later on 39

Lecture 3 > Section 2 > Joins: Basics Joins Product PName Price Category Manuf Gizmo $19 Gadgets GWorks Powergizmo $29 Gadgets GWorks SingleTouch $149 Photography Canon MultiTouch $203 Household Hitachi Company Cname Stock Country GWorks 25 USA Canon 65 Japan Hitachi 15 Japan SELECT PName, Price! FROM Product, Company! WHERE Manufacturer = CName! AND Country= Japan! AND Price <= 200! PName Price SingleTouch $149.99 40

Lecture 3 > Section 2 > Joins: Semantics Tuple Variable Ambiguity in Multi-Table Person(name, address, worksfor)! Company(name, address)! SELECT DISTINCT name, address! FROM Person, Company! WHERE worksfor = name! Which address does this refer to? Which name s?? 41

Lecture 3 > Section 2 > Joins: Semantics Tuple Variable Ambiguity in Multi-Table Person(name, address, worksfor)! Company(name, address)! Both equivalent ways to resolve variable ambiguity SELECT DISTINCT Person.name, Person.address! FROM Person, Company! WHERE Person.worksfor = Company.name! SELECT DISTINCT p.name, p.address! FROM Person p, Company c! WHERE p.worksfor = c.name! 42

Lecture 3 > Section 2 > Joins: semantics Meaning (Semantics) of SQL Queries SELECT x 1.a 1, x 1.a 2,, x n.a k! FROM R 1 AS x 1, R 2 AS x 2,, R n AS x n! WHERE Conditions(x 1,, x n )! Almost never the fastest way to compute it! Answer = {} for x 1 in R 1 do for x 2 in R 2 do.. for x n in R n do if Conditions(x 1,, x n ) then Answer = Answer {(x 1.a 1, x 1.a 2,, x n.a k )} return Answer Note: this is a multiset union 43

Lecture 3 > Section 2 > Joins: semantics An example of SQL semantics 1 3 A B C 2 3 3 4 3 5 Cross Product SELECT R.A! FROM R, S! WHERE R.A = S.B! A B C 1 2 3 1 3 4 1 3 5 3 2 3 3 3 4 3 3 5 Output Apply Selections / Conditions A 3 3 A B C 3 3 4 3 3 5 Apply Projection 44

Lecture 3 > Section 2 > Joins: semantics Note the semantics of a join SELECT R.A! FROM R, S! WHERE R.A = S.B! 1. Take cross product: X=R S 2. Apply selections / conditions: Y={(r,s) X r.a==r.b} Recall: Cross product (A X B) is the set of all unique tuples in A,B Ex: {a,b,c} X {1,2} = {(a,1), (a,2), (b,1), (b,2), (c,1), (c,2)} = Filtering! 3. Apply projections to get final output: Z=(y.A,) for y Y = Returning only some attributes Remembering this order is critical to understanding the output of certain queries (see later on ) 45

Lecture 3 > Section 2 > Joins: semantics Note: we say semantics not execution order The preceding slides show what a join means Not actually how the DBMS executes it under the covers

Lecture 3 > Section 2 > ACTIVITY Joins: semantics A Subtlety about Joins Product(PName, Price, Category, Manufacturer)! Company(CName, StockPrice, Country)! Find all countries that manufacture some product in the Gadgets category. SELECT Country! FROM Product, Company! WHERE Manufacturer=CName AND Category= Gadgets! 47

Lecture 3 > Section 2 > ACTIVITY Joins: semantics A subtlety about Joins Product PName Price Category Manuf Gizmo $19 Gadgets GWorks Powergizmo $29 Gadgets GWorks SingleTouch $149 Photography Canon Company Cname Stock Country GWorks 25 USA Canon 65 Japan Hitachi 15 Japan MultiTouch $203 Household Hitachi SELECT Country! FROM Product, Company! WHERE Manufacturer=Cname! AND Category= Gadgets! What is the problem? What s the solution? Country?? 48

Lecture 3 > Section 2 > ACTIVITY ACTIVITY: Lecture-3-1.ipynb 49

Lecture 3 > Section 2 > ACTIVITY An Unintuitive Query SELECT DISTINCT R.A! FROM R, S, T! WHERE R.A=S.A OR R.A=T.A! What does it compute? 50

Lecture 3 > Section 2 > ACTIVITY An Unintuitive Query SELECT DISTINCT R.A! FROM R, S, T! WHERE R.A=S.A OR R.A=T.A! S T R But what if S = φ? Computes R (S T) Go back to the semantics! 51

Lecture 3 > Section 2 > ACTIVITY An Unintuitive Query Recall the semantics! 1. Take cross-product 2. Apply selections / conditions 3. Apply projection SELECT DISTINCT R.A! FROM R, S, T! WHERE R.A=S.A OR R.A=T.A! If S = {}, then the cross product of R, S, T = {}, and the query result = {}! Must consider semantics here. Are there more explicit way to do set operations like this? 52