CSC 261/461 Database Systems Lecture 13. Fall 2017


Announcement: Start learning HTML, CSS, JavaScript, PHP + SQL. We will cover the basics next week (https://www.w3schools.com/php/php_mysql_intro.asp). Project 1 Milestone 3 will be out soon; it is a combination of theory and application (BCNF decomposition, plus PHP and MySQL). Term paper for grad students: http://www.cs.rochester.edu/courses/261/fall2017/projects/gradtopics.html

Clarification: If we remove FD2, then LOTS1A is in which normal form? 2NF.

Agenda: Relational Algebra (today); Relational Calculus (we will not cover it).

RELATIONAL ALGEBRA

Motivation: Relational Algebra provides a formal foundation for relational model operations. It is the basis for implementing and optimizing queries in any RDBMS. The core operations of most relational systems are based on Relational Algebra.

The Relational Model: Schemata. Relational schema: Students(sid: string, name: string, gpa: float). Here Students is the relation name; sid, name, and gpa are the attributes; string, float, int, etc. are the domains of the attributes.

The Relational Model: Data. An attribute (or column) is a typed data entry present in each tuple in the relation. The number of attributes is the arity of the relation.
Student
sid  name   gpa
001  Bob    3.2
002  Joe    2.8
003  Mary   3.8
004  Alice  3.5

The Relational Model: Data. A tuple or row (or record) is a single entry in the table having the attributes specified by the schema. The number of tuples is the cardinality of the relation.
Student
sid  name   gpa
001  Bob    3.2
002  Joe    2.8
003  Mary   3.8
004  Alice  3.5

The Relational Model: Data. A relational instance is a set of tuples all conforming to the same schema. Recall: in practice DBMSs relax the set requirement and use multisets.
Student
sid  name   gpa
001  Bob    3.2
002  Joe    2.8
003  Mary   3.8
004  Alice  3.5

Relation Instances. Relational DB schema:
Students(sid: string, name: string, gpa: float)
Courses(cid: string, cname: string, credits: int)
Enrolled(sid: string, cid: string, grade: string)
Note that the schemas impose effective domain / type constraints, i.e. gpa can't be "Apple".
Relation instances:
Students
sid  name  gpa
101  Bob   3.2
123  Mary  3.8
Enrolled
sid  cid  grade
123  564  A
Courses
cid  cname  credits
564  564-2  4
308  417    2

Querying. Find the names of all students with GPA > 3.5. SQL: SELECT S.name FROM Students S WHERE S.gpa > 3.5; We don't tell the system how or where to get the data, just what we want; i.e., querying is declarative. To make this happen, we need to translate the declarative query into a series of operators. We'll see this next!

Relational Algebra

RDBMS Architecture: How does a SQL engine work? SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution. SQL Query: a declarative query (from the user). RA Plan: translate the query to a relational algebra expression. Optimized RA Plan: find a logically equivalent, but more efficient, RA expression. Execution: execute each operator of the optimized plan!

RDBMS Architecture: How does a SQL engine work? SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution. Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!

Relational Algebra (RA). Five basic operators: 1. Selection: σ 2. Projection: Π 3. Cartesian Product: × 4. Union: ∪ 5. Difference: -. Derived or auxiliary operators: Intersection; Joins (natural, equi-join, theta join, semi-join); Renaming: ρ; Division. We'll start with the basic operators, and also look at one example of a derived operator (natural join) and a special operator (renaming).

Keep in mind: RA operates on sets! RDBMSs use multisets; however, in the relational algebra formalism we will consider sets. Also: we will consider the named perspective, where every attribute must have a unique name, so attribute order does not matter. Now on to the basic RA operators.

1. Selection (σ). Students(sid,sname,gpa). Returns all tuples which satisfy a condition. Notation: σ_c(R). Examples: σ_{Salary > 40000}(Employee), σ_{name = 'Smith'}(Employee). The condition c can use =, <, ≤, >, ≥, <>. SQL: SELECT * FROM Students WHERE gpa > 3.5; RA: σ_{gpa > 3.5}(Students)

Another example:
Employee
SSN      Name   Salary
1234545  John   200000
5423341  Smith  600000
4352342  Fred   500000
σ_{Salary > 400000}(Employee)
SSN      Name   Salary
5423341  Smith  600000
4352342  Fred   500000
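The selection example can be sketched in a few lines of Python. This is only an illustration, not how an RDBMS implements σ: the relation is modeled as a list of dicts, the helper name `select` is made up for this sketch, and the threshold 400000 is assumed here so that exactly the Smith and Fred tuples qualify.

```python
# A relation is modeled as a list of dicts, one dict per tuple (an assumption
# of this sketch, not a DBMS storage format).
employee = [
    {"SSN": 1234545, "Name": "John", "Salary": 200000},
    {"SSN": 5423341, "Name": "Smith", "Salary": 600000},
    {"SSN": 4352342, "Name": "Fred", "Salary": 500000},
]


def select(relation, condition):
    """sigma_c(R): keep exactly the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]


# sigma_{Salary > 400000}(Employee); the threshold is assumed for illustration.
high_paid = select(employee, lambda t: t["Salary"] > 400000)
names = [t["Name"] for t in high_paid]
```

Selection never changes the schema: every surviving tuple still has all three attributes.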

2. Projection (Π). Students(sid,sname,gpa). Eliminates columns, then removes duplicates. Notation: Π_{A1,…,An}(R). Example: project social-security number and names: Π_{SSN, Name}(Employee). Output schema: Answer(SSN, Name). SQL: SELECT DISTINCT sname, gpa FROM Students; RA: Π_{sname,gpa}(Students)

Another example:
Employee
SSN      Name  Salary
1234545  John  200000
5423341  John  600000
4352342  John  200000
Π_{Name,Salary}(Employee)
Name  Salary
John  200000
John  600000
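A minimal Python sketch of set-semantics projection on the same data, assuming a list-of-dicts model of a relation; the helper name `project` is hypothetical. It shows why three input tuples collapse to two output tuples.

```python
employee = [
    {"SSN": 1234545, "Name": "John", "Salary": 200000},
    {"SSN": 5423341, "Name": "John", "Salary": 600000},
    {"SSN": 4352342, "Name": "John", "Salary": 200000},
]


def project(relation, attrs):
    """Pi_attrs(R) under set semantics: keep attrs, then drop duplicates."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # duplicate elimination
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Pi_{Name,Salary}(Employee)
name_salary = project(employee, ["Name", "Salary"])
```

The first and third tuples differ only in SSN, so once SSN is dropped they become the same tuple and only one copy is kept.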

Note that RA Operators are Compositional! Students(sid,sname,gpa). SQL: SELECT DISTINCT sname, gpa FROM Students WHERE gpa > 3.5; How do we represent this query in RA? Π_{sname,gpa}(σ_{gpa > 3.5}(Students)) or σ_{gpa > 3.5}(Π_{sname,gpa}(Students)). Are these logically equivalent?

3. Cross-Product (×). Students(sid,sname,gpa), People(ssn,pname,address). Each tuple in R1 with each tuple in R2. Notation: R1 × R2. Example: Employee × Dependents. Rare in practice; mainly used to express joins. SQL: SELECT * FROM Students, People; RA: Students × People

Another example:
People
ssn      pname  address
1234545  John   216 Rosse
5423341  Bob    217 Rosse
Students
sid  sname  gpa
001  John   3.4
002  Bob    1.3
Students × People
ssn      pname  address    sid  sname  gpa
1234545  John   216 Rosse  001  John   3.4
5423341  Bob    217 Rosse  001  John   3.4
1234545  John   216 Rosse  002  Bob    1.3
5423341  Bob    217 Rosse  002  Bob    1.3
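The cross-product can be sketched as a pairing of every Students tuple with every People tuple. The list-of-dicts model and the helper name `cross` are assumptions of this sketch.

```python
from itertools import product

people = [
    {"ssn": 1234545, "pname": "John", "address": "216 Rosse"},
    {"ssn": 5423341, "pname": "Bob", "address": "217 Rosse"},
]
students = [
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Bob", "gpa": 1.3},
]


def cross(r1, r2):
    """R1 x R2: pair every tuple of r1 with every tuple of r2.

    Assumes the two schemas share no attribute names; otherwise one side
    would have to be renamed first.
    """
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


pairs = cross(students, people)
```

The result has |Students| * |People| = 4 tuples, each carrying all six attributes.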

Renaming (ρ). Students(sid,sname,gpa). Changes the schema, not the instance. A special operator, neither basic nor derived. Notation: ρ_{B1,…,Bn}(R). Note: this is shorthand for the proper form (since names, not order, matter!): ρ_{A1→B1,…,An→Bn}(R). SQL: SELECT sid AS studId, sname AS name, gpa AS gradePtAvg FROM Students; RA: ρ_{studId,name,gradePtAvg}(Students). We care about this operator because we are working in a named perspective.

Another example:
Students
sid  sname  gpa
001  John   3.4
002  Bob    1.3
ρ_{studId,name,gradePtAvg}(Students)
Students
studId  name  gradePtAvg
001     John  3.4
002     Bob   1.3
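A short Python sketch of renaming in its proper form ρ_{A1→B1,…,An→Bn}(R), written as an explicit old-name to new-name mapping. The helper name `rename` and the dict-based model are assumptions made for illustration.

```python
students = [
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Bob", "gpa": 1.3},
]


def rename(relation, mapping):
    """rho_{A->B}(R): change attribute names; the values are untouched."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


renamed = rename(
    students, {"sid": "studId", "sname": "name", "gpa": "gradePtAvg"}
)
```

Only the schema changes: the renamed relation has the same number of tuples and the same values.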

Natural Join (⋈). Note: Textbook notation is *. Students(sid,name,gpa), People(ssn,name,address). Notation: R1 ⋈ R2. Joins R1 and R2 on equality of all shared attributes. If R1 has attribute set A, and R2 has attribute set B, and they share attributes A ∩ B = C, it can also be written: R1 ⋈_C R2. Our first example of a derived RA operator. Meaning: R1 ⋈ R2 = Π_{A ∪ B}(σ_{C=D}(ρ_{C→D}(R1) × R2)), where: the rename ρ_{C→D} renames the shared attributes in one of the relations; the selection σ_{C=D} checks equality of the shared attributes; the projection Π_{A ∪ B} eliminates the duplicate common attributes. SQL: SELECT DISTINCT sid, S.name, gpa, ssn, address FROM Students S, People P WHERE S.name = P.name; RA: Students ⋈ People

Another example:
Students S
sid  S.name  gpa
001  John    3.4
002  Bob     1.3
People P
ssn      P.name  address
1234545  John    216 Rosse
5423341  Bob     217 Rosse
Students ⋈ People
sid  S.name  gpa  ssn      address
001  John    3.4  1234545  216 Rosse
002  Bob     1.3  5423341  217 Rosse

Natural Join. Given schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S? Given R(A, B, C), S(D, E), what is R ⋈ S? Given R(A, B), S(A, B), what is R ⋈ S?
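As a rough sketch, a natural join can be written as a nested loop that keeps a pair of tuples only when they agree on every shared attribute name. The helper name `natural_join` and the list-of-dicts model are assumptions; real engines use far more efficient join algorithms.

```python
students = [
    {"sid": "001", "name": "John", "gpa": 3.4},
    {"sid": "002", "name": "Bob", "gpa": 1.3},
]
people = [
    {"ssn": 1234545, "name": "John", "address": "216 Rosse"},
    {"ssn": 5423341, "name": "Bob", "address": "217 Rosse"},
]


def natural_join(r1, r2):
    """R1 join R2 on equality of all attributes the two schemas share."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                merged = {**t1, **t2}  # shared attributes appear only once
                if merged not in result:  # set semantics
                    result.append(merged)
    return result


joined = natural_join(students, people)
```

Merging the two dicts plays the role of the final projection: the shared attribute `name` appears once in each output tuple.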

Example: Converting SFW Query -> RA. Students(sid,name,gpa), People(ssn,name,address). SQL: SELECT DISTINCT gpa, address FROM Students S, People P WHERE gpa > 3.5 AND S.name = P.name; How do we represent this query in RA? Π_{gpa,address}(σ_{gpa > 3.5}(S ⋈ P))

Logical Equivalence of RA Plans. Given relations R(A,B) and S(B,C): Here, projection & selection commute: σ_{A=5}(Π_A(R)) = Π_A(σ_{A=5}(R)). What about here? σ_{A=5}(Π_B(R)) ?= Π_B(σ_{A=5}(R))
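The two cases can be checked on a small instance. The sketch below assumes a hypothetical three-tuple instance of R(A,B) and the helper names `select` and `project`; it illustrates the point on one instance and is not a proof of equivalence.

```python
# Hypothetical instance of R(A, B), chosen only for illustration.
R = [{"A": 5, "B": 1}, {"A": 5, "B": 2}, {"A": 7, "B": 1}]


def select(relation, condition):
    """sigma_c(R)."""
    return [t for t in relation if condition(t)]


def project(relation, attrs):
    """Pi_attrs(R) with duplicate elimination."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result


def a_is_5(t):
    return t["A"] == 5


# Case 1: the selection only needs A, which the projection keeps.
lhs = select(project(R, ["A"]), a_is_5)
rhs = project(select(R, a_is_5), ["A"])

# Case 2: after projecting onto B, attribute A no longer exists,
# so the selection on A cannot be evaluated at all.
try:
    select(project(R, ["B"]), a_is_5)
    select_after_project_b_works = True
except KeyError:
    select_after_project_b_works = False

valid_order = project(select(R, a_is_5), ["B"])
```

In the second case only one order is well formed: the selection has to run before the projection discards the attribute it tests.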

RDBMS Architecture: How does a SQL engine work? SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution. We saw how we can transform declarative SQL queries into precise, compositional RA plans.

RDBMS Architecture: How does a SQL engine work? SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution. We'll look at how to then optimize these plans later in this lecture.

RDBMS Architecture: How is the RA plan executed? SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution. We already know how to execute all the basic operators!

2. ADV. RELATIONAL ALGEBRA

What you will learn about in this section: 1. Set Operations in RA 2. Fancier RA

Relational Algebra (RA). Five basic operators: 1. Selection: σ 2. Projection: Π 3. Cartesian Product: × 4. Union: ∪ 5. Difference: -. Derived or auxiliary operators: Intersection; Joins (natural, equi-join, theta join, semi-join); Renaming: ρ; Division. We'll look at the remaining basic operators, and also at some of these derived operators.

1. Union (∪) and 2. Difference (-). Union: R1 ∪ R2. Example: ActiveEmployees ∪ RetiredEmployees. Difference: R1 - R2. Example: AllEmployees - RetiredEmployees.

What about Intersection (∩)? It is a derived operator: R1 ∩ R2 = R1 - (R1 - R2). Also expressed as a join! Example: UnionizedEmployees ∩ RetiredEmployees.
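Python's built-in sets give a quick way to check the identity R1 ∩ R2 = R1 - (R1 - R2). The employee names below are made up for illustration; only the relation names come from the slides.

```python
# Hypothetical single-attribute relations, modeled as Python sets of names.
all_employees = {"Ann", "Bob", "Cy", "Dee"}
active_employees = {"Ann", "Cy"}
retired_employees = {"Bob", "Dee"}
unionized_employees = {"Ann", "Bob"}

# Basic operators.
union = active_employees | retired_employees   # ActiveEmployees U RetiredEmployees
difference = all_employees - retired_employees  # AllEmployees - RetiredEmployees

# Intersection derived from difference alone: R1 - (R1 - R2).
derived_intersection = unionized_employees - (
    unionized_employees - retired_employees
)
```

The derived form agrees with the built-in intersection `unionized_employees & retired_employees`, which is why intersection does not need to be a basic operator.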

Fancier RA

Theta Join (⋈_θ). Students(sid,sname,gpa), People(ssn,pname,address). A join that involves a predicate: R1 ⋈_θ R2 = σ_θ(R1 × R2). Here θ can be any condition. SQL: SELECT * FROM Students, People WHERE θ; RA: Students ⋈_θ People. Note that natural join is a theta join + a projection.

Equi-join (⋈_{A=B}). A theta join where θ is an equality: R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2). Example: Employee ⋈_{SSN=SSN} Dependents. Students(sid,sname,gpa), People(ssn,pname,address). SQL: SELECT * FROM Students S, People P WHERE sname = pname; RA: S ⋈_{sname=pname} P. Most common join in practice!
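The definition R1 ⋈_θ R2 = σ_θ(R1 × R2) translates almost literally into Python: form every pair, keep the pairs that satisfy the predicate. The helper name `theta_join` and the sample rows are assumptions of this sketch.

```python
students = [
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Bob", "gpa": 1.3},
]
people = [
    {"ssn": 1234545, "pname": "John", "address": "216 Rosse"},
    {"ssn": 5423341, "pname": "Bob", "address": "217 Rosse"},
]


def theta_join(r1, r2, theta):
    """sigma_theta(R1 x R2): keep the pairs (t1, t2) with theta(t1, t2) true."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if theta(t1, t2)]


# Equi-join: theta is the equality sname = pname.
equi = theta_join(students, people, lambda s, p: s["sname"] == p["pname"])

# Theta can be any condition, for example an inequality.
not_same_name = theta_join(
    students, people, lambda s, p: s["sname"] != p["pname"]
)
```

Unlike the natural join, both compared attributes (`sname` and `pname`) remain in the output, since no projection is applied.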

Semijoin (⋉). R ⋉ S = Π_{A1,…,An}(R ⋈ S), where A1, …, An are the attributes in R. Example: Employee ⋉ Dependents. Students(sid,sname,gpa), People(ssn,pname,address). SQL: SELECT DISTINCT sid, sname, gpa FROM Students, People WHERE sname = pname; RA: Students ⋉_{sname=pname} People (the join condition is written explicitly here because these two schemas share no attribute names).

Division (÷). T(Y) = R(Y,X) ÷ S(X). Y is the set of attributes of R that are not attributes of S. For a tuple t to appear in the result T of the division, the values in t must appear in R in combination with every tuple in S.

Example: R(Y,X) ÷ S(X) = T(Y). SQL: SELECT PS1.pilot_name FROM PilotSkills AS PS1, Hangar AS H1 WHERE PS1.plane_name = H1.plane_name GROUP BY PS1.pilot_name HAVING COUNT(PS1.plane_name) = (SELECT COUNT(plane_name) FROM Hangar); Source: https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/
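A semijoin keeps the tuples of the left relation that have at least one join partner on the right, without adding any of the right relation's attributes. The sketch below assumes an explicit predicate (sname = pname), a hypothetical helper `semijoin`, and made-up rows including one student with no match.

```python
students = [
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Bob", "gpa": 1.3},
    {"sid": "003", "sname": "Mary", "gpa": 3.8},  # no matching person
]
people = [
    {"ssn": 1234545, "pname": "John", "address": "216 Rosse"},
    {"ssn": 5423341, "pname": "Bob", "address": "217 Rosse"},
]


def semijoin(r, s, theta):
    """Tuples of r that join with at least one tuple of s under theta."""
    return [t for t in r if any(theta(t, u) for u in s)]


matched = semijoin(students, people, lambda s, p: s["sname"] == p["pname"])
```

Because each left tuple is emitted at most once, the result never has more tuples than the left input, no matter how many partners a tuple has.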
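Division can also be sketched directly from its definition: group R by the Y attributes and keep the groups whose X values cover all of S. The relation and attribute names (PilotSkills, Hangar, pilot_name, plane_name) follow the SQL example; the rows and the helper name `divide` are hypothetical.

```python
# Hypothetical rows for PilotSkills(pilot_name, plane_name) and Hangar(plane_name).
pilot_skills = [
    {"pilot_name": "Smith", "plane_name": "B-1"},
    {"pilot_name": "Smith", "plane_name": "F-14"},
    {"pilot_name": "Jones", "plane_name": "B-1"},
    {"pilot_name": "Wilson", "plane_name": "B-1"},
    {"pilot_name": "Wilson", "plane_name": "F-14"},
    {"pilot_name": "Wilson", "plane_name": "F-17"},
]
hangar = [{"plane_name": "B-1"}, {"plane_name": "F-14"}]


def divide(r, s, x_attrs):
    """R(Y, X) / S(X): the Y-values that occur in R with every tuple of S.

    Assumes r is non-empty so its schema can be read from the first tuple.
    """
    required = {tuple(u[a] for a in x_attrs) for u in s}
    y_attrs = [a for a in r[0] if a not in x_attrs]
    x_values_by_y = {}
    for t in r:
        y = tuple(t[a] for a in y_attrs)
        x = tuple(t[a] for a in x_attrs)
        x_values_by_y.setdefault(y, set()).add(x)
    # Keep a Y-value only if its X-values include all required ones.
    return [
        dict(zip(y_attrs, y))
        for y, xs in x_values_by_y.items()
        if required <= xs
    ]


can_fly_everything = divide(pilot_skills, hangar, ["plane_name"])
```

Jones is excluded because the F-14 pairing is missing; Wilson qualifies even though Wilson can also fly a plane that is not in the hangar.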

Multisets

Recall that SQL uses multisets.
Multiset X (one tuple per row): (1, a), (1, a), (1, b), (2, c), (2, c), (2, c), (1, d), (1, d)
Equivalent representation of a multiset: λ(X) = count of each tuple in X (items not listed have implicit count 0).
Multiset X
Tuple   λ(X)
(1, a)  2
(1, b)  1
(2, c)  3
(1, d)  2
Note: in a set all counts are in {0,1}.

Generalizing Set Operations to Multiset Operations: Multiset X ∩ Multiset Y = Multiset Z
Tuple   λ(X)  λ(Y)  λ(Z)
(1, a)  2     5     2
(1, b)  0     1     0
(2, c)  3     2     2
(1, d)  0     2     0
λ(Z) = min(λ(X), λ(Y)). For sets, this is intersection.

Generalizing Set Operations to Multiset Operations: Multiset X ∪ Multiset Y = Multiset Z
Tuple   λ(X)  λ(Y)  λ(Z)
(1, a)  2     5     7
(1, b)  0     1     1
(2, c)  3     2     5
(1, d)  0     2     2
λ(Z) = λ(X) + λ(Y). For sets, this is union.

Operations on Multisets. σ_C(R): preserves the number of occurrences. Π_A(R): no duplicate elimination. Cross-product, join: no duplicate elimination. This is important: relational engines work on multisets, not sets!

Complete Set of Relational Operations. The set of operations including Select σ, Project Π, Union ∪, Difference -, Rename ρ, and Cartesian Product × is called a complete set because any other relational algebra expression can be expressed by a combination of these operations. For example: R ∩ S = (R ∪ S) - ((R - S) ∪ (S - R)), and R ⋈_{<join condition>} S = σ_{<join condition>}(R × S).
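Python's `collections.Counter` stores exactly the λ counts described above, so both multiset operations can be reproduced with its built-in operators. This is a sketch of the counting rule, using the X and Y multisets from the tables; it says nothing about how a DBMS stores multisets.

```python
from collections import Counter

# lambda(X) and lambda(Y); tuples with count 0 are simply not listed.
X = Counter({(1, "a"): 2, (2, "c"): 3})
Y = Counter({(1, "a"): 5, (1, "b"): 1, (2, "c"): 2, (1, "d"): 2})

# Multiset intersection: lambda(Z) = min(lambda(X), lambda(Y)).
intersection = X & Y

# Multiset union as defined above: lambda(Z) = lambda(X) + lambda(Y).
union = X + Y
```

Note that Counter's `|` operator takes the maximum of the counts rather than the sum, so `+` is the operator that matches the union defined here.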

Table 8.1: Operations of Relational Algebra

Query Tree Notation. A query tree is an internal data structure used to represent a query. It is the standard technique for estimating the work involved in executing the query, the generation of intermediate results, and the optimization of execution. Internal nodes stand for operations like selection, projection, join, renaming, division, etc. Leaf nodes represent base relations. A tree gives a good visual feel of the complexity of the query and the operations involved. Algebraic query optimization consists of rewriting the query or modifying the query tree into an equivalent tree.

Example of Query Tree: For every project located in Stafford, list the project number, dept. number, manager's last name, address, and birth date (page 258).

Summary. Eight operators in total. Unary relational operators (3): Selection: σ; Projection: Π; Renaming: ρ. Binary relational operators (5): Union: ∪; Intersect: ∩; Set difference: -; Cartesian Product: × (and the joins built on it: Natural Join, Theta Join, Equi-Join, Semi-Join); Division: ÷. Together they tell us how the query may be executed.

Acknowledgement: Some of the slides in this presentation are taken from the slides provided by the authors. Many of these slides are taken from the CS145 course offered by Stanford University.
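One possible way to picture a query tree in code is a small set of node classes plus a recursive evaluator. The class names (`Relation`, `Select`, `Project`), the `evaluate` function, and the sample rows are all assumptions of this sketch; real systems use much richer plan representations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Relation:
    """Leaf node: a base relation, modeled as a list of dicts."""
    rows: list


@dataclass
class Select:
    """Internal node for sigma_cond(child)."""
    cond: Callable
    child: object


@dataclass
class Project:
    """Internal node for Pi_attrs(child)."""
    attrs: list
    child: object


def evaluate(node):
    """Evaluate a query tree bottom-up, children before parents."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [t for t in evaluate(node.child) if node.cond(t)]
    if isinstance(node, Project):
        out = []
        for t in evaluate(node.child):
            row = {a: t[a] for a in node.attrs}
            if row not in out:  # set semantics
                out.append(row)
        return out
    raise TypeError(f"unknown node type: {type(node).__name__}")


students = Relation([
    {"sid": "001", "sname": "Bob", "gpa": 3.2},
    {"sid": "003", "sname": "Mary", "gpa": 3.8},
])

# Tree for Pi_{sname,gpa}(sigma_{gpa > 3.5}(Students)).
plan = Project(["sname", "gpa"], Select(lambda t: t["gpa"] > 3.5, students))
result = evaluate(plan)
```

Rewriting the plan then amounts to building a different but equivalent tree, for example swapping the `Project` and `Select` nodes when the selection only uses projected attributes.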