CMPT 354: Database System I. Lecture 5. Relational Algebra

Similar documents
Introduction Relational Algebra Operations

CMPT 354: Database System I. Lecture 2. Relational Model

CMPT 354: Database System I. Lecture 4. SQL Advanced

CMPT 354: Database System I. Lecture 3. SQL Basics

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Relational Query Languages: Relational Algebra. Juliana Freire

Relational Algebra. Study Chapter Comp 521 Files and Databases Fall

Relational Algebra. Relational Query Languages

Relational Algebra. [R&G] Chapter 4, Part A CS4320 1

CMPT 354: Database System I. Lecture 7. Basics of Query Optimization

CMPT 354: Database System I. Lecture 1. Course Introduction

Relational Model and Relational Algebra

Introduction to Data Management. Lecture #11 (Relational Algebra)

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Relational Query Languages. Preliminaries. Formal Relational Query Languages. Example Schema, with table contents. Relational Algebra

CSC 261/461 Database Systems Lecture 13. Fall 2017

CAS CS 460/660 Introduction to Database Systems. Relational Algebra 1.1

Relational Algebra. Note: Slides are posted on the class website, protected by a password written on the board

CSCC43H: Introduction to Databases. Lecture 3

CMPT 354: Database System I. Lecture 11. Transaction Management

Ian Kenny. November 28, 2017

Relational Query Languages. Relational Algebra. Preliminaries. Formal Relational Query Languages. Relational Algebra: 5 Basic Operations

Relational Algebra Part I. CS 377: Database Systems

Relational Model, Relational Algebra, and SQL

Lecture 16. The Relational Model

The Relational Model

Relational algebra. Iztok Savnik, FAMNIT. IDB, Algebra

Relational Algebra. Chapter 4, Part A. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Relational Query Languages

Relational Algebra 1. Week 4

RELATIONAL DATA MODEL: Relational Algebra

Relational Algebra Homework 0 Due Tonight, 5pm! R & G, Chapter 4 Room Swap for Tuesday Discussion Section Homework 1 will be posted Tomorrow

SQL: Data Manipulation Language. csc343, Introduction to Databases Diane Horton Winter 2017

Chapter 6 The Relational Algebra and Relational Calculus

SQL: csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Sina Meraji. Winter 2018

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

CS 377 Database Systems

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

Today s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls

Relational Algebra 1

2.2.2.Relational Database concept

CMP-3440 Database Systems

Relational Algebra. Procedural language Six basic operators

v Conceptual Design: ER model v Logical Design: ER to relational model v Querying and manipulating data

Chapter 2: Intro to Relational Model

Relational Model & Algebra. Announcements (Thu. Aug. 27) Relational data model. CPS 116 Introduction to Database Systems

Database Management Systems. Chapter 4. Relational Algebra. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Relational Model & Algebra. Announcements (Tue. Sep. 3) Relational data model. CompSci 316 Introduction to Database Systems

MIS Database Systems Relational Algebra

Announcements. Relational Model & Algebra. Example. Relational data model. Example. Schema versus instance. Lecture notes

Database Applications (15-415)

Basic operators: selection, projection, cross product, union, difference,

Relational Database Systems 1

Chapter 6 Part I The Relational Algebra and Calculus

Relational Algebra. Lecture 4A Kathleen Durant Northeastern University

4/10/2018. Relational Algebra (RA) 1. Selection (σ) 2. Projection (Π) Note that RA Operators are Compositional! 3.

Relational Algebra for sets Introduction to relational algebra for bags

Informatics 1: Data & Analysis

Informationslogistik Unit 4: The Relational Algebra

COMP 244 DATABASE CONCEPTS AND APPLICATIONS

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Relational Algebra. Relational Algebra Overview. Relational Algebra Overview. Unary Relational Operations 8/19/2014. Relational Algebra Overview

Relational Algebra 1

EECS 647: Introduction to Database Systems

Missing Information. We ve assumed every tuple has a value for every attribute. But sometimes information is missing. Two common scenarios:

RELATIONAL ALGEBRA. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

Relational Algebra and SQL

Relational Algebra. Relational Query Languages

CompSci 516: Database Systems

Simple queries Set operations Aggregate operators Null values Joins Query Optimization. John Edgar 2

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #2: The Relational Model and Relational Algebra

Relational Model and Algebra. Introduction to Databases CompSci 316 Fall 2018

Relational Algebra. ICOM 5016 Database Systems. Roadmap. R.A. Operators. Selection. Example: Selection. Relational Algebra. Fundamental Property

Chapter 3. The Relational Model. Database Systems p. 61/569

L22: The Relational Model (continued) CS3200 Database design (sp18 s2) 4/5/2018

Faloutsos - Pavlo CMU SCS /615

Overview. Carnegie Mellon Univ. School of Computer Science /615 - DB Applications. Concepts - reminder. History

Relational Algebra. Mr. Prasad Sawant. MACS College. Mr.Prasad Sawant MACS College Pune

Database Management System. Relational Algebra and operations

Chapter 6: Formal Relational Query Languages

Comp 5311 Database Management Systems. 2. Relational Model and Algebra

Database Technology Introduction. Heiko Paulheim

Relational Model: History

A subquery is a nested query inserted inside a large query Generally occurs with select, from, where Also known as inner query or inner select,

Chapter 6 The Relational Algebra and Calculus

CS2300: File Structures and Introduction to Database Systems

Chapter 6 Formal Relational Query Languages

Relational Query Languages

The Relational Algebra

1. Introduction; Selection, Projection. 2. Cartesian Product, Join. 5. Formal Definitions, A Bit of Theory

Chapter 8: Relational Algebra

Relational Model 2: Relational Algebra

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

Relational Database Systems 2 5. Query Processing

Chapter 5 Relational Algebra. Nguyen Thi Ai Thao

CS

Relational Algebra. Relational Algebra. 7/4/2017 Md. Golam Moazzam, Dept. of CSE, JU

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model

Relational Model History. COSC 304 Introduction to Database Systems. Relational Model and Algebra. Relational Model Definitions.

Computing for Medicine (C4M) Seminar 3: Databases. Michelle Craig Associate Professor, Teaching Stream

Transcription:

CMPT 354: Database System I Lecture 5. Relational Algebra 1

What have we learned Lec 1. DatabaseHistory Lec 2. Relational Model Lec 3-4. SQL 2

Why Relational Algebra matter? An essential topic to understand how query processingand optimization work What happened when an SQL is issued to a database? Help you master the kills to quickly learn a new query language How to quickly learn XML QL and MangoDB QL? 3

Relational Query Languages Query languages allow the manipulation and retrieval of data from a database Traditionally: QL!= programming language Doesn t need to be turing complete Not designed for computation Supports easy, efficient access to large databases Recent Years: Everything interesting involves a large data set QLs are quite powerful for expressing algorithms at scale 4

Formal Query Languages Relational Algebra Procedural query language used to represent execution plans Relational Calculus Non-procedural (declarative) querylanguage Describe what you want, rather than how to compute it Foundation for SQL 5

Results of a Query Queryisa function over relations Q(R1,...,Rn) = Rresult The schema of the result relation is determined by the input relation and the query Because the result of a query is a relation, it can be used as input to another query Q( ) =,Q( ) =, 6

Sets v.s. Bags Sets: {a, b, c}, {a, d, e, f}, {}, Bags: {a, a, b, c}, {b, b, b, b, b}, Relational Algebra has two flavors: Set semantics = standard Relational Algebra Bag semantics = extended Relational Algebra DB systems implement bag semantics (Why?) 7

Sets v.s. Bags Sets: {a, b, c}, {a, d, e, f}, {}, Bags: {a, a, b, c}, {b, b, b, b, b}, Relational Algebra has two flavors: Set semantics = standard Relational Algebra Bag semantics = extended Relational Algebra DB systems implement bag semantics (Why?) 8

Relational Algebra Operators Core 5 operators Selection (s) Projection (p) Union ( ) Set Difference (-) Cross product (X) Additional operators Rename (ρ) join ( ) Intersect ( ) 9

Selection The selection operator, s (sigma), specifies the rows to be retained from the input relation A selection has the form: s condition (relation), where condition is a Boolean expression Terms in the condition are comparisons between two fields (or a field and a constant) Using one of the comparison operators: <,, =, ¹, ³, > Terms may be connected by Ù (and), or Ú (or), Terms may be negated using (not) 10

Selection Example s birth < 1981 (Customer) Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 555 Dawn Summers 1984 s lastname = "Summers" (Customer) sin firstname lastname birth 333 Cordelia Chase 1980 444 Rupert Giles 1955 sin firstname lastname birth 111 Buffy Summers 1981 555 Dawn Summers 1984 11

Projection The projection operator, p (pi), specifies the columns to be retained from the input relation A selection has the form: p columns (relation) Where columns is a comma separated list of column names The list contains the names of the columns to be retained in the result relation 12

Projection Example p firstname,lastname (Customer) firstname lastname Customer Buffy Summers sin firstname lastname birth Xander Harris 111 Buffy Summers 1981 Cordelia Chase 222 Xander Harris 1981 Rupert Giles 333 Cordelia Chase 1980 Dawn Summers 444 Rupert Giles 1955 birth 555 Dawn Summers 1984 1981 1980 p birth (Customer) 1955 1984 13

Selection and Projection Notes Selection and projection eliminate duplicates Since relations are sets Both operations require one input relation The schema of the result of a selection is the same as the schema of the input relation The schema of the result of a projection contains just those attributes in the projection list 14

Composing Selection and Projection p sin, firstname (s birth < 1982 Ù lastname = "Summers" (Customer)) Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 555 Dawn Summers 1984 intermediate relation sin firstname lastname birth 111 Buffy Summers 1981 sin firstname 111 Buffy 15

Composing Selection and Projection Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 555 Dawn Summers 1984 p birth (s birth < 1981 (Customer)) s birth < 1981 (p birth (Customer)) birth 1980 1955 birth 1980 1955 16

Commutative property For example: x + y = y + x x * y = y * x Does it hold for projection and selection? p columns (s condition (R)) = p condition (s columns (R))? What about p firstname (s birth < 1981 (Customer))? 17

Commutative property Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 555 Dawn Summers 1984 p firstname (s birth < 1981 (Customer)) p firstname s birth birth < (s < 1981 1981 birth (p (p < firstname, 1981 (p firstname, birth (Customer)) birth (Customer))) firstname Cordelia Rupert firstname Cordelia Rupert 18

Set Operations Review A = {1, 3, 6} B = {1, 2, 5, 6} Union (È) A È B º B È A A È B = {1, 2, 3, 5, 6} Intersection(Ç) A Ç B º B Ç A A Ç B = {1, 6} Set Difference(-) A - B ¹ B - A A - B = {3} B - A = {2, 5} 19

Union Compatible Relations where op = È, Ç, or - A op B = R result A and B must be union compatible Same number of fields Field i in each schema have the same type 20

Union Compatible Relations Intersection of the Employee and Customer relations Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 555 Dawn Summers 1984 Employee sin firstname lastname salary 208 Clark Kent 80000.55 111 Buffy Summers 22000.78 412 Carol Danvers 64000.00 The two relations are not union compatible as birth is a DATE and salary is a REAL We can carry out preliminary operations to make the relations union compatible p sin, firstname, lastname (Customer) Ç p sin, firstname, lastname (Employee) 21

Union Compatible Relations where op = È, Ç, or - A op B = R result A and B must be union compatible Same number of fields Field I in each schema have the same type Result schema borrowed from A A(age int) È B(num int) =? R result (age int) 22

Union A sin firstname lastname 111 Buffy Summers 222 Xander Harris 333 Cordelia Chase 444 Rupert Giles 555 Dawn Summers B sin firstname lastname 208 Clark Kent 111 Buffy Summers 412 Carol Danvers A È B sin firstname lastname 111 Buffy Summers 222 Xander Harris 333 Cordelia Chase 444 Rupert Giles 555 Dawn Summers 208 Clark Kent 412 Carol Danvers 23

Set Difference A sin firstname lastname 111 Buffy Summers 222 Xander Harris 333 Cordelia Chase 444 Rupert Giles 555 Dawn Summers B sin firstname lastname 208 Clark Kent 111 Buffy Summers 412 Carol Danvers A - B B - A sin firstname lastname 222 Xander Harris 333 Cordelia Chase 444 Rupert Giles 555 Dawn Summers sin firstname lastname 208 Clark Kent 412 Carol Danvers 24

Note on Set Difference Notice that most operators are monotonic Increasing size of inputs à outputs grow Set Difference is non-monotonic Example: A B Increasing the size of B could decrease output size Set difference is blocking: For A B, must wait for all B tuples before any results 25

Intersection A sin firstname lastname 111 Buffy Summers 222 Xander Harris 333 Cordelia Chase 444 Rupert Giles 555 Dawn Summers B sin firstname lastname 208 Clark Kent 111 Buffy Summers 412 Carol Danvers A Ç B sin firstname lastname 111 Buffy Summers 26

Note on Intersect A B = R result A B Can we express using other operators? A B =? 27

Note on Intersect A B = R result A B Can we express using other operators? A B = A -?(A B) A B = A B 28

Cartesian Product A(a 1,, a n ) x B(a n+1,,a m ) = R result (a 1,,a m ) Each row of A paired with each row of B Result schema concats A and B s fields Names are inherited if possible (i.e. if not duplicated) If two field names are the same (i.e., a naming conflict occurs) and the affected columns are referred to by position If R contains m records, and S contains n records, the result relation will contain m * n records 29

Cartesian Product Example s lastname = "Summers" (Customer) Account sin firstname lastname birth acc type balance sin 111 Buffy Summers 1981 01 CHQ 2101.76 111 555 Dawn Summers 1984 02 SAV 11300.03 333 s lastname = "Summers" (Customer) Account 03 CHQ 20621.00 444 1 firstname lastname birth acc type balance 8 111 Buffy Summers 1981 01 CHQ 2101.76 111 111 Buffy Summers 1981 02 SAV 11300.03 333 111 Buffy Summers 1981 03 CHQ 20621.00 444 555 Dawn Summers 1984 01 CHQ 2101.76 111 555 Dawn Summers 1984 02 SAV 11300.03 333 555 Dawn Summers 1984 03 CHQ 20621.00 444 30

Renaming It is sometimes useful to assign names to the results of a relational algebra query The rename operator, r (rho) r S (R) renames a relation r S(a1,a2,,an) (R) renames a relation and its attributes r new/old (R) renames specified attributes R r sid1/1, sid2/8 (R) 31

Largest Balance Find the account with the largest balance; return accnumber 1. Find accounts which are less than some other account s account.balance < d.balance (Account r d (Account)) 2. Use set difference to find the account with the largest balance p accnumber (Account) p account.accnumber (s account.balance < d.balance (Account r d (Account))) 32

Relational Algebra Operators Core 5 operations Selection (s) Projection (p) Union ( ) Set Difference (-) Cross product (X) Additional operations Rename (ρ) Intersect ( ) Join ( ) 33

Relational Algebra Exercises Student (sid, lastname, firstname, cgpa) 101, Jordan, Michael, 3.8 Offering (oid, dept, cnum, term, instructor) abc, CMPT, 354, Fall 2018, Jiannan Took (sid, oid, grade) 101, abc, 95 1. sid of all students who have earned some grade over 80 and some grade below 50. p sid (s grade > 80 (Took)) p sid (s grade < 50 (Took)) 34

Relational Algebra Exercises Student (sid, lastname, firstname, cgpa) 101, Jordan, Michael, 3.8 Offering (oid, dept, cnum, term, instructor) abc, CMPT, 354, Fall 2018, Jiannan Took (sid, oid, grade) 101, abc, 95 2. Student number of all students who have taken CMPT 354 p sid (s Offering.oID = Took.oID Ù dept = CMPT Ù cnum = 354 (Offering x Took)) 35

(Inner) Joins Motivation Simplify some queries that require a Cartesian product Natural Join: R S = π A (σ θ (R S)) Theta Join: R θ S = σ θ (R S) Equijoin: R θ S = σ θ (R S) Join condition θ consists only of equalities 36

Natural Join There is often a natural way to join two relations Join based on common attributes Eliminate duplicate common attributes fromthe result Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 Employee sin firstname lastname salary 208 Clark Kent 80000.55 111 Buffy Summers 22000.78 396 Dawn Allen 41000.21 Customer Employee sin firstname lastname birth salary 111 Buffy Summers 1981 22000.78 37

Natural Join Meaning: R S R S = π A (σ θ (R S)) Where: Selection σ θ checks equality of all common attributes (i.e., attributes with same names) Projection π A eliminates duplicate common attributes The natural join of two tables with no fields in common is the Cartesian product Not the empty set 38

Natural Join Example R A B C D 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 S A B C E 208 Clark Kent 80000.55 111 Buffy Summers 22000.78 396 Dawn Allen 41000.21 R S= π A,B,C,D,E (σ R.A=S.A Ù R.B=S.B Ù R.C=S.C (R S)) A B C D E 111 Buffy Summers 1981 22000.78 39

Theta Join R θ S = σ θ (R x S) Most general form θ can be any condition No projection in this case! Result schema same as cross product 40

Theta Join Example Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 555 Dawn Summers 1984 Employee sin firstname lastname salary 208 Clark Kent 80000.55 111 Buffy Summers 22000.78 412 Carol Danvers 64000.00 Customer Customer.sin < Employee.sin Employee 1 2 3 birth 5 6 7 salary 111 Buffy Summers 1981 208 Clark Kent 80000.55 111 Buffy Summers 1981 412 Carol Danvers 64000.00 222 Xander Harris 1981 412 Carol Danvers 64000.00 333 Cordelia Chase 1980 412 Carol Danvers 64000.00 41

Equi-Joins R θ S = σ θ (R x S) A theta join where θ is an equality predicate Customer sin firstname lastname birth 111 Buffy Summers 1981 222 Xander Harris 1981 333 Cordelia Chase 1980 444 Rupert Giles 1955 Employee sin firstname lastname salary 208 Clark Kent 80000.55 111 Buffy Summers 22000.78 396 Dawn Allen 41000.21 Customer Customer.sin = Employee.sin Employee 1 2 3 birth 5 6 7 salary 111 Buffy Summers 1981 111 Buffy Summers 22000.78 42

(Inner) Joins Summary Natural Join: R S = π A (σ θ (R S)) Equality on all fields with same name in R and in S Projection π A drops all redundant attributes Theta Join: R θ S = σ θ (R S) Join of R and S with a join condition θ Cross-product followed by selection θ No projection Equijoin: R θ S = σ θ (R S) Join condition θ consists only of equalities No projection 43

Relational Algebra Exercises Student (sid, lastname, firstname, cgpa) 101, Jordan, Michael, 3.8 Course (dept, cnum, name, breadth) CMPT, 354, DB, True Offering (oid, dept, cnum, term, instructor) abc, CMPT, 354, Fall 2018, Jiannan Took (sid, oid, grade) 101, abc, 95 The names of all students who have passed a breadth course (grade >= 60 and breadth = True) with Martin p lastname, firstname (s breadth = True Ù grade > 60 Ù instructor = Martin (Student Took Offering Course)) 44

Different Plans, Same Results Semantic equivalence: results are always the same π name (σ cnum=354 (R S)) π name (σ cnum=354 (R) S) Are they equivalent? Which one is more efficient? Can you make it even more efficient? 45

Other Operators There are additional relational algebra operators Usually used in the context of query optimization Duplicate elimination d Used to turn a bag into a set Aggregation operators e.g. sum, average Grouping g Used to partition tuples into groups Typically used with aggregation 46

Summary Relational Algebra (RA) operators Five core operators: selection, projection, cross-product, union and set difference Additional operators are defined in terms of the core operators: rename, intersection, join Theorem: SQL and RA can express exactly the same class of queries MultipleRAqueries can beequivalent Same semantics butdifference performance Form basis for optimizations RDBMS translate SQL à RA, then optimize RA 47

Acknowledge Some lecture slides were copied from or inspired by the following course materials W4111:Introduction to databases by Eugene Wu at Columbia University CSE344: Introduction to Data Management by Dan Suciu at University of Washington CMPT354: Database System I by John Edgar at Simon Fraser University CS186: Introduction to Database Systems by Joe Hellerstein at UC Berkeley CS145: Introduction to Databases by Peter Bailis at Stanford CS 348: Introduction to Database Management by Grant Weddell at University of Waterloo 48