Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)


Lecture 2: Lecture Overview
1. SQL introduction & schema definitions
2. Basic single-table queries
3. Multi-table queries

1. SQL INTRODUCTION & DEFINITIONS

What you will learn about in this section
1. What is SQL?
2. Basic schema definitions
3. Keys & constraints intro
4. Activities: CREATE TABLE statements

Basic SQL

SQL Introduction
SQL stands for Structured Query Language.
SQL is a standard language for querying and manipulating data.
SQL is a high-level, declarative programming language.
SQL execution is highly optimized and parallelized.
Many standards out there: first standardized in 1986/87 (ANSI SQL, SQL-86), followed by SQL-92 (a.k.a. SQL2), SQL:1999 (a.k.a. SQL3), and SQL:2011.
Vendors support various subsets (e.g., SQLite implements most of the SQL-92 standard).

SQL is a
Data Definition Language (DDL)
  Define relational schemata
  Create/alter/delete tables and their attributes
Data Manipulation Language (DML)
  Insert/delete/modify tuples in tables
  Query one or more tables

Tables in SQL

Product
PName        Price    Manufacturer
Gizmo        $19.99   GizmoWorks
Powergizmo   $29.99   GizmoWorks
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi

A relation or table is a multiset of tuples having the attributes specified by the schema. This is where the name "relational database" comes from.

A multiset is an unordered list (or: a set with multiple duplicate instances allowed), i.e., there are no next(), etc. methods!
  List: [1, 1, 2, 3]
  Set: {1, 2, 3}
  Multiset: {1, 1, 2, 3}

An attribute (or column) is a typed data entry present in each tuple in the relation. Attributes must have an atomic type in standard SQL, i.e., not a list, set, etc.

A tuple or row is a single entry in the table having the attributes specified by the schema. It is sometimes also referred to as a record.

The number of tuples is the cardinality of the relation. The number of attributes is the arity of the relation. The Product table above has cardinality 4 and arity 3.

Data Types in SQL
Atomic types:
  Characters: CHAR(20), VARCHAR(50)
  Numbers: INT, BIGINT, SMALLINT, FLOAT
  Others: MONEY, DATETIME, ...
SQLite uses: integer, text and real
Every attribute must have an atomic type. Why?

Table Schemas
The schema of a table is the table name, its attributes, and their types:
  Product(Pname: string, Price: float, Category: string, Manufacturer: string)
A key is an attribute (or combination of attributes) that identifies a tuple uniquely.
  Product(Pname: string, Price: float, Category: string, Manufacturer: string)

Key constraints
A key is a minimal subset of attributes that acts as a unique identifier for tuples in a relation.
A key is an implicit constraint on which tuples can be in the relation, i.e., if two tuples agree on the values of the key, then they must be the same tuple!
  Students(sid: string, name: string, gpa: float)
1. Which would you select as a key?
2. Is a key always guaranteed to exist?
3. Can we have more than one key? (key candidates and primary key)

NULL and NOT NULL
To say "don't know the value" we use NULL.
  Students(sid: string, name: string, gpa: float)

sid  name  gpa
123  Bob   3.9
143  Jim   NULL

Say, Jim just enrolled in his first class.
In SQL, we may constrain a column to be NOT NULL, e.g., "name" in this table.
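The key and NOT NULL ideas above can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module; choosing sid as the primary key, the in-memory database, the helper try_insert, and the extra rows (Alice, sid 150) are assumptions made for illustration and are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed throwaway in-memory database

# Assumption: sid is chosen as the primary key; name must not be NULL; gpa may be NULL.
conn.execute("""
    CREATE TABLE Students (
        sid  TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        gpa  REAL
    )
""")
conn.execute("INSERT INTO Students VALUES ('123', 'Bob', 3.9)")
conn.execute("INSERT INTO Students VALUES ('143', 'Jim', NULL)")  # gpa not known yet


def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it (hypothetical helper)."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


# Two tuples may not agree on the key, and name may not be NULL.
dup_key = try_insert("INSERT INTO Students VALUES ('123', 'Alice', 3.5)")
null_name = try_insert("INSERT INTO Students VALUES ('150', NULL, 3.0)")
print(dup_key)
print(null_name)

rows = conn.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
print(rows)
```

Both extra inserts should be rejected, while Jim's row with a NULL gpa is stored; NULL comes back to Python as None.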

General Constraints
We could specify arbitrary assertions, e.g., "There cannot be 25 people in the DB class."
In practice, we don't specify many constraints in the database. Why? Performance!

Summary of Schema Information
Schema and constraints are how databases understand the semantics (meaning) of data.
They are also useful for optimization.
SQL supports general constraints:
  Keys and foreign keys are most important
  We will learn about other constraints

Activities
SQLite data types (http://www.tutorialspoint.com/sqlite/)
DB Browser:
  Create a database
  Create a Product table
  Add the shown data

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

2. SINGLE-TABLE QUERIES

What you will learn about in this section
1. The SFW query
2. Other useful operators: LIKE, DISTINCT, ORDER BY
3. Activities: Single-table queries

SQL Query
Basic form (there are many, many more bells and whistles):
  SELECT <attributes>
  FROM <one or more relations>
  WHERE <conditions>
Call this a SFW query.

Simple SQL Query: Selection
Selection is the operation of filtering a relation's tuples on some condition.
  SELECT *
  FROM Product
  WHERE Category = 'Gadgets'

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Result:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks

Simple SQL Query: Projection
Projection is the operation of producing an output table with tuples that have a subset of their prior attributes.
  SELECT Pname, Price, Manufacturer
  FROM Product
  WHERE Category = 'Gadgets'

Product:
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Result:
PName        Price    Manufacturer
Gizmo        $19.99   GizmoWorks
Powergizmo   $29.99   GizmoWorks
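The selection and projection queries above can be run as written. Here is a minimal sketch using Python's built-in sqlite3 module; the in-memory database, the column types, and the variable names are assumptions for illustration, and prices are stored as plain numbers without the dollar sign.

```python
import sqlite3

# Assumed setup: an in-memory SQLite database holding the Product table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# Selection: keep only the rows that satisfy the WHERE condition (all columns).
selection = conn.execute(
    "SELECT * FROM Product WHERE Category = 'Gadgets'").fetchall()

# Projection: same filter, but return only some of the columns.
projection = conn.execute(
    "SELECT PName, Price, Manufacturer FROM Product WHERE Category = 'Gadgets'").fetchall()

print(selection)
print(projection)
```

Each result row is a Python tuple; the projected rows simply have fewer fields than the selected ones.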

Notation
Input schema: Product(PName, Price, Category, Manufacturer)
  SELECT Pname, Price, Manufacturer
  FROM Product
  WHERE Category = 'Gadgets'
Output schema: Answer(PName, Price, Manufacturer)

A Few Details
SQL commands are case insensitive:
  Same: SELECT, Select, select
  Same: Product, product
Values are not:
  Different: 'Seattle', 'seattle'
Use single quotes for text constants:
  'abc' - yes
  "abc" - no

LIKE: Simple String Pattern Matching
  SELECT *
  FROM Product
  WHERE PName LIKE '%gizmo%'
s LIKE p: pattern matching on strings
p may contain two special symbols:
  % = any sequence of characters
  _ = any single character
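A small sketch of both wildcard symbols, again using Python's sqlite3 module with an assumed in-memory copy of the product names. Note that SQLite's LIKE ignores case for ASCII letters by default, so '%gizmo%' also matches "Gizmo"; other database systems may behave differently.

```python
import sqlite3

# Assumed setup: only the PName column is needed to illustrate LIKE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT)")
conn.executemany("INSERT INTO Product VALUES (?)",
                 [("Gizmo",), ("Powergizmo",), ("SingleTouch",), ("MultiTouch",)])

# % matches any sequence of characters (including the empty one).
contains_gizmo = [row[0] for row in conn.execute(
    "SELECT PName FROM Product WHERE PName LIKE '%gizmo%'")]

# _ matches exactly one character, so '_izmo' needs a five-letter name ending in "izmo".
one_char_then_izmo = [row[0] for row in conn.execute(
    "SELECT PName FROM Product WHERE PName LIKE '_izmo'")]

print(contains_gizmo)
print(one_char_then_izmo)
```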

DISTINCT: Eliminating Duplicates
  SELECT DISTINCT Category
  FROM Product

Category
Gadgets
Photography
Household

Versus
  SELECT Category
  FROM Product

Category
Gadgets
Gadgets
Photography
Household
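The difference is easy to see by counting rows. A minimal sketch with Python's sqlite3 module (the in-memory table and variable names are assumptions for illustration):

```python
import sqlite3

# Assumed setup: product names and categories from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Category TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [
    ("Gizmo", "Gadgets"), ("Powergizmo", "Gadgets"),
    ("SingleTouch", "Photography"), ("MultiTouch", "Household"),
])

# Without DISTINCT the result is a multiset: one output row per input row.
all_categories = [row[0] for row in conn.execute("SELECT Category FROM Product")]

# With DISTINCT duplicate rows are removed.
distinct_categories = [row[0] for row in conn.execute("SELECT DISTINCT Category FROM Product")]

print(len(all_categories), len(distinct_categories))
```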

ORDER BY: Sorting the Results
  SELECT PName, Price, Manufacturer
  FROM Product
  WHERE Category = 'gizmo' AND Price > 50
  ORDER BY Price, PName
Ties are broken by the second attribute on the ORDER BY list, etc.
Ordering is ascending, unless you specify the DESC keyword.
Text is ordered alphabetically.
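To see tie-breaking and DESC at work on the sample data, the sketch below sorts by Category first and by descending Price within each category. This query is an assumed variation of the one on the slide, chosen so that it returns rows from the four sample products; it uses Python's sqlite3 module with an in-memory table.

```python
import sqlite3

# Assumed setup: the Product table from the slides, prices stored as numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# Category ascending (alphabetical); ties within a category are broken by Price, highest first.
ordered = conn.execute(
    "SELECT Category, PName, Price FROM Product ORDER BY Category, Price DESC").fetchall()

for row in ordered:
    print(row)
```

The two Gadgets rows tie on Category, so the second sort key decides that Powergizmo comes before Gizmo.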

CASE Statement
  CASE
    WHEN [condition1] THEN [expression1]
    WHEN [condition2] THEN [expression2]
    ELSE [default expression]
  END

Product
name      category  price
Gizmo     gadget    50
Camera    Photo     299
OneClick  Photo     89

Example:
  SELECT name,
         CASE WHEN price > 200 THEN 'Yes' ELSE 'No' END AS expensive
  FROM Product
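A runnable sketch of the CASE example, using Python's sqlite3 module and the three-row Product table from this slide (the in-memory database and the added ORDER BY, used only to make the output order predictable, are assumptions):

```python
import sqlite3

# Assumed setup: the small Product(name, category, price) table from this slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (name TEXT, category TEXT, price INTEGER)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", "gadget", 50), ("Camera", "Photo", 299), ("OneClick", "Photo", 89),
])

# CASE computes a new column per row; AS gives that column a name.
labelled = conn.execute("""
    SELECT name,
           CASE WHEN price > 200 THEN 'Yes' ELSE 'No' END AS expensive
    FROM Product
    ORDER BY name
""").fetchall()

print(labelled)
```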

IN and BETWEEN
The IN operator allows you to specify multiple values in a WHERE clause.
  SELECT column_name(s)
  FROM table_name
  WHERE column_name IN (value1, value2, ...)
The BETWEEN operator selects values within a range. The values can be numbers, text, or dates.
  SELECT column_name(s)
  FROM table_name
  WHERE column_name BETWEEN value1 AND value2
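Applied to the Product table from the earlier slides, the two operators might look as follows. This is a sketch with Python's sqlite3 module; the specific category list and price range are assumptions picked for illustration. BETWEEN includes both endpoints.

```python
import sqlite3

# Assumed setup: the Product table from the slides, prices stored as numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets"), ("Powergizmo", 29.99, "Gadgets"),
    ("SingleTouch", 149.99, "Photography"), ("MultiTouch", 203.99, "Household"),
])

# IN: the category must equal one of the listed values.
in_rows = [row[0] for row in conn.execute(
    "SELECT PName FROM Product WHERE Category IN ('Gadgets', 'Household')")]

# BETWEEN: value1 <= Price <= value2 (both bounds inclusive).
between_rows = [row[0] for row in conn.execute(
    "SELECT PName FROM Product WHERE Price BETWEEN 20 AND 150")]

print(in_rows)
print(between_rows)
```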

LIMIT Clause
Limits the number of rows returned by the SELECT statement.
Example: Find the 5 most expensive products
  SELECT *
  FROM product
  ORDER BY price DESC
  LIMIT 5
Syntax: LIMIT [no of rows] OFFSET [row num]
Note: LIMIT is not standard SQL (e.g., MS SQL Server uses SELECT TOP)
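Since the sample table has only four products, the sketch below asks for the top 2 instead of the top 5, and then uses OFFSET to skip the first row. It uses Python's sqlite3 module; the in-memory table and the chosen numbers are assumptions for illustration.

```python
import sqlite3

# Assumed setup: product names and prices from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [
    ("Gizmo", 19.99), ("Powergizmo", 29.99),
    ("SingleTouch", 149.99), ("MultiTouch", 203.99),
])

# The two most expensive products.
top2 = [row[0] for row in conn.execute(
    "SELECT PName FROM Product ORDER BY Price DESC LIMIT 2")]

# Skip the most expensive one, then take the next two.
next2 = [row[0] for row in conn.execute(
    "SELECT PName FROM Product ORDER BY Price DESC LIMIT 2 OFFSET 1")]

print(top2)
print(next2)
```

Without ORDER BY, which rows LIMIT returns is not defined, so the two are usually combined.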

COUNT
COUNT is an aggregation function that returns the number of elements.
Example: Find the number of products with a price of $20 or more.
  SELECT COUNT(*)
  FROM product
  WHERE price >= 20
Syntax: COUNT([ALL | DISTINCT] expression)
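A short sketch of the COUNT variants on the Product data from the earlier slides, using Python's sqlite3 module (the in-memory table and variable names are assumptions for illustration):

```python
import sqlite3

# Assumed setup: the Product table from the slides, prices stored as numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets"), ("Powergizmo", 29.99, "Gadgets"),
    ("SingleTouch", 149.99, "Photography"), ("MultiTouch", 203.99, "Household"),
])

# COUNT(*) counts the rows that pass the WHERE filter.
at_least_20 = conn.execute("SELECT COUNT(*) FROM Product WHERE Price >= 20").fetchone()[0]

# COUNT(expr) counts non-NULL values; COUNT(DISTINCT expr) counts distinct non-NULL values.
category_values = conn.execute("SELECT COUNT(Category) FROM Product").fetchone()[0]
distinct_categories = conn.execute("SELECT COUNT(DISTINCT Category) FROM Product").fetchone()[0]

print(at_least_20, category_values, distinct_categories)
```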

Activities
SQLite: Operators, Expressions, WHERE clauses, AND & OR clauses (http://www.tutorialspoint.com/sqlite/)
1. How many gadgets are less than $20?
2. How much does it cost to buy all Gadgets?
3. What happens if the manufacturer GizmoWorks changes its name?
This is why we need multiple tables!

3. MULTI-TABLE QUERIES

What you will learn about in this section
1. Foreign key constraints
2. Joins: basics
3. Joins: SQL semantics
4. Activities: Multi-table queries

Foreign Key Constraints
Suppose we have the following schema:
  Students(sid: string, name: string, gpa: float)
  Enrolled(student_id: string, cid: string, grade: string)
And we want to impose the following constraint: "Only existing students may enroll in courses", i.e., a student must appear in the Students table to enroll in a class.

Students
sid  name  gpa
101  Bob   3.2
123  Mary  3.8

Enrolled
student_id  cid  grade
123         564  A
123         537  A+

We say that student_id is a foreign key that refers to Students.
Note: student_id alone is not a key. What is?

Declaring Foreign Keys
  Students(sid: string, name: string, gpa: float)
  Enrolled(student_id: string, cid: string, grade: string)

  CREATE TABLE Enrolled(
    student_id CHAR(20),
    cid CHAR(20),
    grade CHAR(10),
    PRIMARY KEY (student_id, cid),                 -- primary key
    FOREIGN KEY (student_id) REFERENCES Students   -- foreign key
  )

Foreign Keys and Update Operations
  Students(sid: string, name: string, gpa: float)
  Enrolled(student_id: string, cid: string, grade: string)
What if we insert a tuple into Enrolled, but there is no corresponding student?
  INSERT is rejected (foreign keys are constraints)!
What if we delete a student?
1. Disallow the delete
2. Remove all of the courses for that student
3. SQL allows a third via NULL (not yet covered)
SQLite: Enable foreign keys with PRAGMA foreign_keys = ON;
DB Browser: check Foreign Keys in Edit Pragma
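The behaviour described above can be observed with a few statements. The sketch below uses Python's sqlite3 module; declaring sid as the primary key of Students, the in-memory database, the helper attempt, and the student id 999 are assumptions for illustration. The delete is rejected here because no ON DELETE action is declared, which corresponds to option 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is typically off by default in SQLite

# Assumption: sid is the primary key of Students, so REFERENCES Students points at it.
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(50), gpa FLOAT)")
conn.execute("""
    CREATE TABLE Enrolled (
        student_id CHAR(20),
        cid        CHAR(20),
        grade      CHAR(10),
        PRIMARY KEY (student_id, cid),
        FOREIGN KEY (student_id) REFERENCES Students
    )
""")
conn.execute("INSERT INTO Students VALUES ('101', 'Bob', 3.2)")
conn.execute("INSERT INTO Students VALUES ('123', 'Mary', 3.8)")
conn.execute("INSERT INTO Enrolled VALUES ('123', '564', 'A')")


def attempt(sql):
    """Run a statement and report whether a constraint rejected it (hypothetical helper)."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


orphan_insert = attempt("INSERT INTO Enrolled VALUES ('999', '564', 'B')")  # no student 999
delete_enrolled = attempt("DELETE FROM Students WHERE sid = '123'")         # Mary is enrolled
delete_unenrolled = attempt("DELETE FROM Students WHERE sid = '101'")       # Bob is not

print(orphan_insert)
print(delete_enrolled)
print(delete_unenrolled)
```

Options 2 and 3 from the list correspond to declaring ON DELETE CASCADE or ON DELETE SET NULL on the foreign key.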

Keys and Foreign Keys

Company
CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

What is a foreign key vs. a key here?

Keys and Foreign Keys
  Company(CName, StockPrice, Country)
  Product(PName, Price, Category, Manufacturer)
This example uses natural keys. Often surrogate keys are used instead:
  Company(Cnumber, CName, StockPrice, Country)
  Product(Pnumber, Pname, Price, Category, ManufNumber)
Why? Why do we use SMUIDs and Social Security Numbers?

Joins
  Product(PName, Price, Category, Manufacturer)
  Company(CName, StockPrice, Country)
Ex: Find all products under $200 manufactured in Japan; return their names and prices.
  SELECT PName, Price
  FROM Product, Company
  WHERE Manufacturer = CName
    AND Country = 'Japan'
    AND Price <= 200
A join between tables returns all unique combinations of their tuples which meet some specified join condition.
Note: we will often omit attribute types in schema definitions for brevity, but assume attributes are always typed.

Joins
  Product(PName, Price, Category, Manufacturer)
  Company(CName, StockPrice, Country)
Several equivalent ways to write a basic join in SQL:
  SELECT PName, Price
  FROM Product, Company
  WHERE Manufacturer = CName
    AND Country = 'Japan'
    AND Price <= 200

  SELECT PName, Price
  FROM Product JOIN Company ON Manufacturer = Cname
  WHERE Price <= 200
    AND Country = 'Japan'

Joins

Product
PName        Price  Category     Manuf
Gizmo        $19    Gadgets      GWorks
Powergizmo   $29    Gadgets      GWorks
SingleTouch  $149   Photography  Canon
MultiTouch   $203   Household    Hitachi

Company
Cname    Stock  Country
GWorks   25     USA
Canon    65     Japan
Hitachi  15     Japan

  SELECT PName, Price
  FROM Product JOIN Company ON Manufacturer = Cname
  WHERE Price <= 200
    AND Country = 'Japan'

Result:
PName        Price
SingleTouch  $149
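Both join spellings can be checked against each other. The following sketch uses Python's sqlite3 module with the full-precision prices and full column names from the earlier slides; the in-memory database and variable names are assumptions for illustration.

```python
import sqlite3

# Assumed setup: Company and Product as shown on the earlier slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (CName TEXT PRIMARY KEY, StockPrice INTEGER, Country TEXT)")
conn.execute("CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", [
    ("GizmoWorks", 25, "USA"), ("Canon", 65, "Japan"), ("Hitachi", 15, "Japan"),
])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
])

# Form 1: list both tables in FROM and put the join condition in WHERE.
implicit_join = conn.execute("""
    SELECT PName, Price FROM Product, Company
    WHERE Manufacturer = CName AND Country = 'Japan' AND Price <= 200
""").fetchall()

# Form 2: explicit JOIN ... ON, with the remaining filters in WHERE.
explicit_join = conn.execute("""
    SELECT PName, Price FROM Product JOIN Company ON Manufacturer = CName
    WHERE Price <= 200 AND Country = 'Japan'
""").fetchall()

print(implicit_join)
print(explicit_join)
```

MultiTouch is made in Japan but costs more than 200, so only SingleTouch remains in both results.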

Tuple Variable Ambiguity in Multi-Table
  Person(name, address, worksfor)
  Company(name, address)

  SELECT DISTINCT name, address
  FROM Person, Company
  WHERE worksfor = name
Which "address" does this refer to? Which "name"s?

Tuple Variable Ambiguity in Multi-Table
  Person(name, address, worksfor)
  Company(name, address)
Both are equivalent ways to resolve variable ambiguity:
  SELECT DISTINCT Person.name, Person.address
  FROM Person, Company
  WHERE Person.worksfor = Company.name

  SELECT DISTINCT p.name, p.address
  FROM Person p, Company c
  WHERE p.worksfor = c.name
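In SQLite the unqualified query is not silently resolved; it is rejected. The sketch below, using Python's sqlite3 module, shows the error and the alias-based fix. The sample rows (Ann, Acme and their addresses) and the in-memory database are hypothetical and only serve the illustration; other database systems may word the error differently.

```python
import sqlite3

# Assumed setup: hypothetical sample rows for Person and Company.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, address TEXT, worksfor TEXT)")
conn.execute("CREATE TABLE Company (name TEXT, address TEXT)")
conn.execute("INSERT INTO Person VALUES ('Ann', '12 Oak St', 'Acme')")
conn.execute("INSERT INTO Company VALUES ('Acme', '1 Main St')")

# Unqualified column names that exist in both tables cannot be resolved.
try:
    conn.execute("SELECT DISTINCT name, address FROM Person, Company WHERE worksfor = name")
    error = None
except sqlite3.OperationalError as err:
    error = str(err)

# Tuple variables (aliases) p and c make every column reference unambiguous.
resolved = conn.execute("""
    SELECT DISTINCT p.name, p.address
    FROM Person p, Company c
    WHERE p.worksfor = c.name
""").fetchall()

print(error)
print(resolved)
```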

An Example of SQL Semantics
  SELECT R.A
  FROM R, S
  WHERE R.A = S.B

R
A
1
3

S
B  C
2  3
3  4
3  5

Step 1, cross product R x S:
A  B  C
1  2  3
1  3  4
1  3  5
3  2  3
3  3  4
3  3  5

Step 2, apply selections / conditions:
A  B  C
3  3  4
3  3  5

Step 3, apply projection (the output):
A
3
3

Note the Semantics of a Join
  SELECT R.A
  FROM R, S
  WHERE R.A = S.B
1. Take cross product: $X = R \times S$
   Recall: the cross product (A x B) is the set of all combinations of the tuples in A and B.
   Ex: {a,b,c} x {1,2} = {(a,1), (a,2), (b,1), (b,2), (c,1), (c,2)}
2. Apply selections / conditions: $Y = \{(r, s) \in X \mid r.A = s.B\}$ (= filtering!)
3. Apply projections to get final output: $Z = (y.A,)$ for $y \in Y$ (= returning only some attributes)
Remembering this order is critical to understanding the output of complicated queries!
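The three steps can be spelled out in plain Python and compared with what the database returns. This is a sketch of the semantics only, under the assumption that relations are represented as Python lists of tuples; it says nothing about how a DBMS actually executes the query. It uses the R and S tables from the preceding example.

```python
import sqlite3
from itertools import product

R = [(1,), (3,)]                # R(A)
S = [(2, 3), (3, 4), (3, 5)]    # S(B, C)

# 1. Cross product: every tuple of R concatenated with every tuple of S.
X = [r + s for r, s in product(R, S)]
# 2. Selection: keep the combinations where R.A equals S.B.
Y = [(a, b, c) for (a, b, c) in X if a == b]
# 3. Projection: return only attribute A (duplicates are kept, this is a multiset).
Z = [(a,) for (a, b, c) in Y]

# Cross-check against SQLite running the actual query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER)")
conn.execute("CREATE TABLE S (B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO R VALUES (?)", R)
conn.executemany("INSERT INTO S VALUES (?, ?)", S)
sql_result = conn.execute("SELECT R.A FROM R, S WHERE R.A = S.B").fetchall()

print(len(X), Y, Z)
print(sql_result)
```

The value 3 appears twice in the output because two tuples of S match it; nothing removes duplicates unless DISTINCT is requested.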

Another Note on Semantics
"Semantics" is not equal to "execution order".
The preceding slides show what a join means, not actually how the DBMS executes it under the covers.

A Subtlety About Joins
  Product(PName, Price, Category, Manufacturer)
  Company(CName, StockPrice, Country)
Find all countries that manufacture some product in the Gadgets category.
  SELECT Country
  FROM Product, Company
  WHERE Manufacturer = CName
    AND Category = 'Gadgets'

Product
PName        Price  Category     Manuf
Gizmo        $19    Gadgets      GWorks
Powergizmo   $29    Gadgets      GWorks
SingleTouch  $149   Photography  Canon
MultiTouch   $203   Household    Hitachi

Company
Cname    Stock  Country
GWorks   25     USA
Canon    65     Japan
Hitachi  15     Japan

Result:
Country
?
?

What is the problem? What is the solution?
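Running the query shows what goes wrong. The sketch below uses Python's sqlite3 module with the abbreviated data from this slide (the in-memory database and variable names are assumptions); it then tries DISTINCT as one possible remedy.

```python
import sqlite3

# Assumed setup: the abbreviated Product and Company tables from this slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (CName TEXT, StockPrice INTEGER, Country TEXT)")
conn.execute("CREATE TABLE Product (PName TEXT, Price INTEGER, Category TEXT, Manufacturer TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", [
    ("GWorks", 25, "USA"), ("Canon", 65, "Japan"), ("Hitachi", 15, "Japan"),
])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("Gizmo", 19, "Gadgets", "GWorks"), ("Powergizmo", 29, "Gadgets", "GWorks"),
    ("SingleTouch", 149, "Photography", "Canon"), ("MultiTouch", 203, "Household", "Hitachi"),
])

# One output row per matching (product, company) pair, so the country repeats.
with_duplicates = [row[0] for row in conn.execute("""
    SELECT Country FROM Product, Company
    WHERE Manufacturer = CName AND Category = 'Gadgets'
""")]

# DISTINCT collapses the repeated rows.
deduplicated = [row[0] for row in conn.execute("""
    SELECT DISTINCT Country FROM Product, Company
    WHERE Manufacturer = CName AND Category = 'Gadgets'
""")]

print(with_duplicates)
print(deduplicated)
```

Because the join produces one row for each gadget, a country is listed once per gadget it manufactures.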

Activities
1. Create the product/company database from the slide set. Add the relation Purchase(id, product, buyer) with the appropriate foreign key constraints and add some data.
2. Find all countries that manufacture some product in the Gadgets category (show each country only once).
3. Find all products that are manufactured in the US, sorted by price.
4. In how many different countries are the products a specific buyer purchases manufactured?