Relational Algebra. Spring 2012 Instructor: Hassan Khosravi


Querying relational databases: lecture given by Dr. Widom on querying relational models.

2.1 An Overview of Data Models
2.1.1 What is a Data Model?
2.1.2 Important Data Models
2.1.3 The Relational Model in Brief
2.1.4 The Semi-structured Model in Brief
2.1.5 Other Data Models
2.1.6 Comparison of Modeling Approaches

2.1.1 What is a Data Model?
A data model is a notation for describing data or information. It maps the real world onto a mathematical model with three parts:
1. Structure of the data (e.g., tuples).
2. Operations on the data: queries to retrieve information and modifications to change it.
3. Constraints on the data: e.g., year must be an integer and name must be a string.
Important data models: the relational model and the semi-structured data model (XML).

Relational Model in Brief

    Title               Year  Length  Genre
    Gone With the Wind  1939  231     Drama
    Star Wars           1977  124     SciFi
    Wayne's World       1992  95      Comedy

The relational model is based on tables.
Operations: query and modify.
Constraints: e.g., year must be an integer between 1930 and 2012.
The structure may appear to resemble an array of structs in C. In that reading, the column headers are the field names and each row represents the values of one struct in the array. The difference lies in scale. Relations are not normally implemented as main-memory structures, and their implementation must take into account the need to access relations stored on disk.

The Semi-structured Model in Brief

    <Movies>
        <Movie title="Gone With the Wind">
            <Year>1939</Year>
            <Length>231</Length>
            <Genre>drama</Genre>
        </Movie>
        <Movie title="Star Wars">
            <Year>1977</Year>
            <Length>124</Length>
            <Genre>sciFi</Genre>
        </Movie>
        <Movie title="Wayne's World">
            <Year>1992</Year>
            <Length>95</Length>
            <Genre>comedy</Genre>
        </Movie>
    </Movies>

Semi-structured data resembles trees or graphs rather than tables or arrays. Operations usually involve following paths in the tree from an element to its nested subelements, e.g., "find the movies with the comedy genre." Constraints often involve the data types of values associated with a tag; for example, values associated with the Length tag are integers.
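The two operations described above (following a path to find comedies, and checking that Length values are integers) can be sketched with Python's standard `xml.etree.ElementTree` module. The constant `MOVIES_XML` and the helper functions `movies_with_genre` and `lengths_are_integers` are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# The Movies document from the slide (closing tags repaired).
MOVIES_XML = """<Movies>
  <Movie title="Gone With the Wind">
    <Year>1939</Year><Length>231</Length><Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year><Length>124</Length><Genre>sciFi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year><Length>95</Length><Genre>comedy</Genre>
  </Movie>
</Movies>"""


def movies_with_genre(xml_text, genre):
    """Follow Movies -> Movie -> Genre and return the titles whose genre matches."""
    root = ET.fromstring(xml_text)
    return [movie.get("title")
            for movie in root.findall("Movie")
            if movie.findtext("Genre") == genre]


def lengths_are_integers(xml_text):
    """Check the type constraint: every <Length> element holds an integer."""
    root = ET.fromstring(xml_text)
    return all(length.text.strip().isdigit() for length in root.iter("Length"))


print(movies_with_genre(MOVIES_XML, "comedy"))  # ["Wayne's World"]
print(lengths_are_integers(MOVIES_XML))         # True
```

A query here is a walk down the tree, not a scan over rows of a table.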

Comparison of Modeling Approaches
Semi-structured models have more flexibility than relations. However, the relational model is still preferred in DBMSs, for two reasons:
1. Efficiency of access to data, and efficiency of modifications to that data, are more important than flexibility.
2. Ease of use is more important than flexibility. SQL enables the programmer to express their wishes at a very high level, and its strongly limited set of operations can be optimized to run very fast.

Basics of the Relational Model

    Title               Year  Length  Genre
    Gone With the Wind  1939  231     Drama
    Star Wars           1977  124     SciFi
    Wayne's World       1992  95      Comedy

Attributes: the columns of a relation are named by attributes.
Schema: the name of the relation together with its set of attributes, e.g., Movies(title, year, length, genre).
Tuples: the rows of a relation, other than the header row.
Domains: each attribute has a domain of values. The value of each component must be atomic; it cannot be a structure such as a record or a list.

Equivalent Representations of a Relation
Relations are sets of tuples, not lists of tuples, so the order of the tuples does not matter. The attributes can be reordered too, as long as each tuple's components are reordered along with the header. The following table is the same relation as the one above:

    Year  Genre   Title               Length
    1977  SciFi   Star Wars           124
    1992  Comedy  Wayne's World       95
    1939  Drama   Gone With the Wind  231

How many different ways can we present the given relation?
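One way to see why the two tables are "the same relation" is to compare them as sets of attribute-to-value records, so that neither row order nor column order affects equality. The minimal sketch below does this and also counts the possible presentations: any ordering of the 3 tuples combined with any ordering of the 4 attributes, giving 3! × 4! = 144. The names `as_relation` and `reorder_columns` are hypothetical helpers for this illustration only.

```python
from math import factorial

# The Movies relation from the slide: attribute names plus a list of tuples.
ATTRS = ("title", "year", "length", "genre")
ROWS = [
    ("Gone With the Wind", 1939, 231, "Drama"),
    ("Star Wars", 1977, 124, "SciFi"),
    ("Wayne's World", 1992, 95, "Comedy"),
]


def as_relation(attrs, rows):
    """Turn a presentation into a set of {attribute: value} records.
    Using sets makes both row order and column order irrelevant."""
    return {frozenset(zip(attrs, row)) for row in rows}


def reorder_columns(attrs, rows, new_attrs):
    """Present the same data with the columns in a different order."""
    positions = [attrs.index(a) for a in new_attrs]
    return new_attrs, [tuple(row[i] for i in positions) for row in rows]


# The second presentation on the slide: columns reordered, rows shuffled.
alt_attrs, alt_rows = reorder_columns(ATTRS, ROWS, ("year", "genre", "title", "length"))
alt_rows = [alt_rows[1], alt_rows[2], alt_rows[0]]

print(as_relation(ATTRS, ROWS) == as_relation(alt_attrs, alt_rows))  # True

# Distinct presentations: any order of the 3 tuples times any order of the 4 attributes.
n_presentations = factorial(len(ROWS)) * factorial(len(ATTRS))
print(n_presentations)  # 144
```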

Relation Instances and Keys
A set of tuples for a given relation is called an instance of that relation. The instance of a relation is expected to change over time; for example, new movies are added to the table. It is less common for the schema of a relation to change, because adding a new attribute to the schema means supplying a value of that attribute for every current tuple, which is hard.
Keys of relations
Key constraint: a set of attributes forms a key for a relation if we do not allow two tuples in a relation instance to have the same values in all the attributes of the key. We indicate the attributes that form a key by underlining them, e.g., in Movies(title, year, length, genre) the key is title and year.
The key must hold for all possible instances of the relation, not just for a specific instance. For example, genre is not a key, since many movies can share a genre.
What if our data does not have a key? Generate an artificial ID, such as a student number.
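The key constraint can be checked against a single instance, as in the sketch below. Keep in mind that such a check can only refute a candidate key; passing on one instance does not prove the attributes form a key for all possible instances. The function `satisfies_key` and the four-tuple instance (built from movies that appear in these slides) are assumptions made for illustration.

```python
# Hypothetical instance of Movies(title, year, length, genre), using films from these slides.
ATTRS = ("title", "year", "length", "genre")
ROWS = [
    ("Gone With the Wind", 1939, 231, "Drama"),
    ("Star Wars", 1977, 124, "SciFi"),
    ("Wayne's World", 1992, 95, "Comedy"),
    ("Galaxy Quest", 1999, 104, "Comedy"),
]


def satisfies_key(attrs, rows, key):
    """Return True if no two tuples in this instance agree on every key attribute.
    One instance can only refute a key; a real key must hold for all instances."""
    positions = [attrs.index(a) for a in key]
    seen = set()
    for row in rows:
        key_value = tuple(row[i] for i in positions)
        if key_value in seen:
            return False
        seen.add(key_value)
    return True


print(satisfies_key(ATTRS, ROWS, ("title", "year")))  # True
print(satisfies_key(ATTRS, ROWS, ("genre",)))         # False: two comedies
```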

Database Schema about Movies

    Movies(title: string, year: integer, length: integer, genre: string, studioName: string, producerC#: integer)
    MovieStar(name: string, address: string, gender: char, birthdate: date)
    MovieExec(name: string, address: string, cert#: integer, netWorth: integer)
    Studio(name: string, address: string, presC#: integer)
    StarsIn(movieTitle: string, movieYear: integer, starName: string)

2.3 Defining a Relation Schema in SQL
2.3.1 Relations in SQL
2.3.2 Data Types
2.3.3 Simple Table Declarations
2.3.4 Modifying Relation Schemas
2.3.5 Default Values
2.3.6 Declaring Keys
2.3.7 Exercises for Section 2.3

2.3.1 Relations in SQL
SQL (also pronounced "sequel") is the principal language used to describe and manipulate relational databases. SQL distinguishes three kinds of relations:
1. Stored relations (tables): relations that exist in the database and that we can query and modify.
2. Views: relations defined by a computation. They are not stored but constructed when needed, and we query them (Chapter 8).
3. Temporary tables: relations constructed by the SQL language processor while it executes queries. They are neither stored nor seen by the user.

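2.3.2 Data Types
CHAR(n): a fixed-length string of n characters. A shorter value is padded with blanks, so 'foo' stored in a CHAR(5) attribute becomes 'foo  ' (two trailing blanks).
VARCHAR(n): a variable-length string of up to n characters. 'foo' in a VARCHAR(5) attribute is stored as 'foo'.
BIT(n) and BIT VARYING(n): fixed- and variable-length strings of up to n bits.
BOOLEAN: TRUE, FALSE, and (although it would surprise George Boole) UNKNOWN.
INT or INTEGER: typical integer values.
FLOAT or REAL: typical real (floating-point) values.
DECIMAL(n,d): n decimal digits in total, d of them after the decimal point; e.g., DECIMAL(6,2) can hold 0123.45.
DATE and TIME: essentially character strings with constraints on their format.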
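The padding rule for CHAR versus VARCHAR, and the digit limits of DECIMAL(n,d), can be mimicked in a few lines. This is only a model of the rules stated above, not how any DBMS stores data. The helpers `store_char`, `store_varchar`, and `fits_decimal` are hypothetical. Note also that this sketch rejects extra fractional digits, whereas real systems may round them or raise an error, depending on the product.

```python
from decimal import Decimal


def store_char(value, n):
    """CHAR(n): fixed length, so shorter strings are padded with blanks."""
    if len(value) > n:
        raise ValueError(f"{value!r} does not fit in CHAR({n})")
    return value.ljust(n)


def store_varchar(value, n):
    """VARCHAR(n): variable length, so the string is kept as given."""
    if len(value) > n:
        raise ValueError(f"{value!r} does not fit in VARCHAR({n})")
    return value


def fits_decimal(text, precision, scale):
    """DECIMAL(precision, scale): at most `scale` digits after the point and
    at most `precision - scale` digits before it (this sketch rejects, never rounds)."""
    value = Decimal(text)
    has_allowed_scale = value == value.quantize(Decimal(1).scaleb(-scale))
    return has_allowed_scale and abs(value) < 10 ** (precision - scale)


print(repr(store_char("foo", 5)))     # 'foo  '
print(repr(store_varchar("foo", 5)))  # 'foo'
print(fits_decimal("0123.45", 6, 2))  # True
print(fits_decimal("1.234", 6, 2))    # False: three digits after the point
```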

2.3.3 Simple Table Declarations
The simplest declaration is CREATE TABLE followed by the relation name and a parenthesized, comma-separated list of attribute names and their types (with no comma after the last attribute):

    CREATE TABLE Movies (
        title       VARCHAR(255),
        year        INTEGER,
        length      INTEGER,
        genre       CHAR(10),
        studioName  CHAR(50),
        producerC#  INTEGER
    );

    CREATE TABLE MovieStar (
        name       CHAR(30),
        address    VARCHAR(50),
        gender     CHAR(1),
        birthdate  DATE
    );

These declarations implement the Movies and MovieStar schemas listed above.

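2.3.4 Modifying Relation Schemas
We can delete a table R with the SQL command

    DROP TABLE R;

We can modify the schema of an existing table with ALTER TABLE, e.g., to add or drop an attribute:

    ALTER TABLE MovieStar ADD phone CHAR(16);
    ALTER TABLE MovieStar DROP birthdate;

2.3.5 Default Values
A DEFAULT clause gives the value to use when a tuple is created without a value for some attribute. For example, we might use the character '?' as the default for an unknown gender, and the earliest possible date, DATE '0000-00-00', for an unknown birthdate:

    gender CHAR(1) DEFAULT '?',
    birthdate DATE DEFAULT DATE '0000-00-00'

A default can also be given when adding an attribute:

    ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';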
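The effect of DEFAULT can be observed with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python. It does not enforce CHAR lengths, and the DATE '0000-00-00' default is left out because that literal form is not portable. The sketch shows that an omitted gender receives '?' and that an existing tuple receives 'unlisted' after a column is added with that default.

```python
import sqlite3

# In-memory SQLite database; SQLite is used only because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MovieStar (
        name    CHAR(30),
        address VARCHAR(255),
        gender  CHAR(1) DEFAULT '?'
    )
""")

# No gender is supplied, so the DEFAULT '?' is stored.
conn.execute("INSERT INTO MovieStar (name, address) VALUES (?, ?)",
             ("Carrie Fisher", "123 Maple St., Hollywood"))

# Adding an attribute with a default gives the existing tuple the value 'unlisted'.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted'")

row = conn.execute("SELECT name, gender, phone FROM MovieStar").fetchone()
print(row)  # ('Carrie Fisher', '?', 'unlisted')
```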

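2.3.6 Declaring Keys
There are two ways to declare a key in a CREATE TABLE statement: PRIMARY KEY or UNIQUE. Attributes declared PRIMARY KEY can never be NULL, while attributes declared UNIQUE may be NULL. Replace PRIMARY KEY with UNIQUE in the examples below to get the corresponding UNIQUE declarations.
The key can be declared as part of an attribute's declaration:

    CREATE TABLE MovieStar (
        name       CHAR(30) PRIMARY KEY,
        address    VARCHAR(255),
        gender     CHAR(1),
        birthdate  DATE
    );

or as a separate element of the statement:

    CREATE TABLE MovieStar (
        name       CHAR(30),
        address    VARCHAR(255),
        gender     CHAR(1),
        birthdate  DATE,
        PRIMARY KEY (name)
    );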

Example 2.7: The relation Movies has a key made up of the pair of attributes title and year. Because this key has two attributes, it cannot be declared next to a single attribute. It must instead be declared as a separate element, like this:

    CREATE TABLE Movies (
        title       CHAR(100),
        year        INTEGER,
        length      INTEGER,
        genre       CHAR(10),
        studioName  CHAR(30),
        producerC#  INTEGER,
        PRIMARY KEY (title, year)
    );
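The sketch below uses Python's `sqlite3` module to show that the composite key rejects a second tuple with the same (title, year), while the same title with a different year is accepted. Some assumptions apply:

- The column `producerC#` is double-quoted to be safe, since '#' is not a standard SQL identifier character.
- The "Star Wars" 1999 tuple is a hypothetical remake invented for the example.
- `try_insert` is a hypothetical helper.
- SQLite, for historical reasons, allows NULLs in most PRIMARY KEY columns unless NOT NULL is also declared. For that reason, this sketch tests only duplicate rejection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "producerC#" is double-quoted because '#' is not a standard SQL identifier character.
conn.execute("""
    CREATE TABLE Movies (
        title        CHAR(100),
        year         INTEGER,
        length       INTEGER,
        genre        CHAR(10),
        studioName   CHAR(30),
        "producerC#" INTEGER,
        PRIMARY KEY (title, year)
    )
""")


def try_insert(movie):
    """Insert one tuple; return False if it repeats an existing (title, year)."""
    try:
        conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?, ?, ?)", movie)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("Star Wars", 1977, 124, "SciFi", "Fox", 12345)))  # True
# Hypothetical remake: same title, different year, so the key is not violated.
print(try_insert(("Star Wars", 1999, 130, "SciFi", "Fox", 12345)))  # True
# Same (title, year) as the first tuple: rejected even though length differs.
print(try_insert(("Star Wars", 1977, 121, "SciFi", "Fox", 12345)))  # False
```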

Quick summary: lecture given by Dr. Widom on the relational model definition.

2.4 An Algebraic Query Language
2.4.1 Why Do We Need a Special Query Language?
2.4.2 What is an Algebra?
2.4.3 Overview of Relational Algebra
2.4.4 Set Operations on Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta-Joins
2.4.10 Combining Operations to Form Queries
2.4.11 Naming and Renaming
2.4.12 Relationships Among Operations
2.4.13 A Linear Notation for Algebraic Expressions
2.4.14 Exercises for Section 2.4

2.4.1 Why Do We Need a Special Query Language?
Why not just use C or Java instead of introducing relational algebra? Relational algebra is useful precisely because it is less powerful than C or Java. Database querying is one of the few areas where a non-Turing-complete language makes sense. For example, relational algebra cannot determine whether the number of tuples in a relation is odd or even. Being less powerful helps because it gives:
1. Ease of programming
2. Ease of compilation
3. Ease of optimization

Projection
The projection operator $\pi$, applied to a relation R, produces a new relation containing a subset of R's columns. Duplicate tuples in the result are eliminated.
Movies:

    Title          Year  Length  Genre   StudioName  producerC#
    Star Wars      1977  124     SciFi   Fox         12345
    Galaxy Quest   1999  104     Comedy  DreamWorks  67890
    Wayne's World  1992  95      Comedy  Paramount   99999

$\pi_{title,\,year,\,length}(Movies)$:

    Title          Year  Length
    Star Wars      1977  124
    Galaxy Quest   1999  104
    Wayne's World  1992  95

$\pi_{genre}(Movies)$:

    Genre
    SciFi
    Comedy

Comedy appears only once, because the duplicate produced by the two comedies is eliminated.
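Projection is easy to model by treating a relation as a pair (attribute names, set of tuples). Because the result is built as a Python set, duplicate elimination comes for free. This representation and the helper `project` are assumptions for illustration, not a standard API.

```python
# A relation is modeled as (tuple of attribute names, set of tuples).
MOVIES = (
    ("title", "year", "length", "genre", "studioName", "producerC#"),
    {
        ("Star Wars", 1977, 124, "SciFi", "Fox", 12345),
        ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
        ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
    },
)


def project(relation, attrs):
    """pi_attrs(R): keep only the listed columns.
    Collecting the result in a set eliminates duplicate tuples."""
    schema, rows = relation
    positions = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in positions) for row in rows}


print(project(MOVIES, ["title", "year", "length"]))
print(project(MOVIES, ["genre"]))  # two tuples, SciFi and Comedy (set order may vary)
```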

Selection and projection: lecture given by Dr. Widom on selection and projection.

2.4.6 Selection
The selection operator $\sigma_C$, applied to a relation R, produces a new relation containing the subset of R's tuples that satisfy condition C. Applied to the Movies relation used for projection above:
$\sigma_{length \ge 100}(Movies)$:

    Title          Year  Length  Genre   StudioName  producerC#
    Star Wars      1977  124     SciFi   Fox         12345
    Galaxy Quest   1999  104     Comedy  DreamWorks  67890

Example for Selection
Find the set of tuples in the relation Movies that represent Fox movies at least 100 minutes long:
$\sigma_{length \ge 100 \text{ AND } studioName = \text{'Fox'}}(Movies)$:

    Title          Year  Length  Genre   StudioName  producerC#
    Star Wars      1977  124     SciFi   Fox         12345
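Using the same (attribute names, set of tuples) representation, selection keeps the tuples for which a condition holds. The sketch passes each tuple to the condition as a dict keyed by attribute name, so conditions read like the slide's formulas. The helper `select` is a hypothetical name, and the condition is an ordinary Python function rather than a parsed expression.

```python
MOVIES = (
    ("title", "year", "length", "genre", "studioName", "producerC#"),
    {
        ("Star Wars", 1977, 124, "SciFi", "Fox", 12345),
        ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
        ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
    },
)


def select(relation, condition):
    """sigma_C(R): keep the tuples for which condition(tuple_as_dict) is true."""
    schema, rows = relation
    return schema, {row for row in rows if condition(dict(zip(schema, row)))}


long_movies = select(MOVIES, lambda t: t["length"] >= 100)
long_fox_movies = select(MOVIES, lambda t: t["length"] >= 100 and t["studioName"] == "Fox")

print(sorted(t[0] for t in long_movies[1]))  # ['Galaxy Quest', 'Star Wars']
print(long_fox_movies[1])  # {('Star Wars', 1977, 124, 'SciFi', 'Fox', 12345)}
```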

2.4.7 Cartesian Product
The Cartesian product R × S of two sets R and S is the set of pairs formed by choosing the first element from R and the second from S. For relations, each pair of tuples is concatenated into one longer tuple: the components of the R-tuple, followed by the components of the S-tuple.
Relation R:

    A  B
    1  2
    3  4

Relation S:

    B  C   D
    2  5   6
    4  7   8
    9  10  11

R × S:

    A  R.B  S.B  C   D
    1  2    2    5   6
    1  2    4    7   8
    1  2    9    10  11
    3  4    2    5   6
    3  4    4    7   8
    3  4    9    10  11

If R and S have an attribute in common, we need to invent new names for the identical attributes, e.g., R.B and S.B.
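The product can be sketched as a nested loop over the two sets of tuples. In this sketch, attribute names shared by both schemas are qualified with the relation names, following the R.B / S.B convention above. The function `product` and its `r_name`/`s_name` parameters are hypothetical.

```python
def product(r, s, r_name="R", s_name="S"):
    """R x S: pair every tuple of R with every tuple of S.
    Attributes present in both schemas are qualified as R.B and S.B."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = set(r_attrs) & set(s_attrs)
    attrs = (tuple(f"{r_name}.{a}" if a in common else a for a in r_attrs)
             + tuple(f"{s_name}.{a}" if a in common else a for a in s_attrs))
    return attrs, {rt + st for rt in r_rows for st in s_rows}


R = (("A", "B"), {(1, 2), (3, 4)})
S = (("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})

attrs, rows = product(R, S)
print(attrs)      # ('A', 'R.B', 'S.B', 'C', 'D')
print(len(rows))  # 6 = 2 tuples of R times 3 tuples of S
```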

Cartesian product: lecture given by Dr. Widom on duplicates and the cross product.


2.4.8 Natural Joins
The natural join of two relations R and S, written $R \bowtie S$, pairs only those tuples from R and S that agree in whatever attributes are common to the schemas of R and S. Let $A_1, A_2, \ldots, A_n$ be all the attributes that are in both R and S. A tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of $A_1, A_2, \ldots, A_n$. The resulting tuple has one component for each attribute in the union of the two schemas.
Relation R:

    A  B
    1  2
    3  4

Relation S:

    B  C   D
    2  5   6
    4  7   8
    9  10  11

$R \bowtie S$:

    A  B  C  D
    1  2  5  6
    3  4  7  8

Example for Natural Join
A more complicated example of natural join, where the relations share two attributes, B and C.
Relation U:

    A  B  C
    1  2  3
    6  7  8
    9  7  8

Relation V:

    B  C  D
    2  3  4
    2  3  5
    7  8  10

$U \bowtie V$:

    A  B  C  D
    1  2  3  4
    1  2  3  5
    6  7  8  10
    9  7  8  10
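A straightforward (nested-loop) natural join is sketched below and checked against both examples. Real DBMSs use faster join algorithms, such as hash or sort-merge joins. The nested loop is chosen only because it mirrors the definition directly. The function name `natural_join` and the (attribute names, set of tuples) representation are assumptions for illustration.

```python
def natural_join(r, s):
    """R ⋈ S: pair tuples that agree on every attribute common to both schemas;
    the common attributes appear only once in the result."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    result = set()
    for rt in r_rows:
        r_vals = dict(zip(r_attrs, rt))
        for st in s_rows:
            s_vals = dict(zip(s_attrs, st))
            if all(r_vals[a] == s_vals[a] for a in common):
                result.add(rt + tuple(s_vals[a] for a in s_only))
    return tuple(r_attrs) + tuple(s_only), result


R = (("A", "B"), {(1, 2), (3, 4)})
S = (("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})
U = (("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)})
V = (("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)})

print(natural_join(R, S))  # schema ('A', 'B', 'C', 'D'); tuples (1, 2, 5, 6) and (3, 4, 7, 8)
print(natural_join(U, V))  # schema ('A', 'B', 'C', 'D'); the four tuples listed above
```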

Natural join: lecture given by Dr. Widom on natural join.


2.4.9 Theta-Joins
It is sometimes desirable to pair tuples on some condition other than equality on all the common attributes. The theta-join of relations R and S based on condition C is written $R \bowtie_{C} S$. The result is constructed as follows:
1. Take the product of R and S.
2. Select those tuples that satisfy C.
Example: $U \bowtie_{A < D} V$, with relations U and V as in the natural-join example above:

    A  U.B  U.C  V.B  V.C  D
    1  2    3    2    3    4
    1  2    3    2    3    5
    1  2    3    7    8    10
    6  7    8    7    8    10
    9  7    8    7    8    10

Example on Theta-Joins
Consider a theta-join of U and V with a more complex condition: $U \bowtie_{A < D \text{ AND } U.B \ne V.B} V$. For successful pairing, we require that the A component of the U-tuple be less than the D component of the V-tuple. We also require that the two tuples disagree on their respective B components. The result is:

    A  U.B  U.C  V.B  V.C  D
    1  2    3    7    8    10
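The two-step recipe (take the product, then select) translates almost literally into code. The sketch below reproduces both theta-join results for U and V. The name `theta_join` and its `r_name`/`s_name` parameters, used to qualify the shared attributes B and C, are hypothetical.

```python
def theta_join(r, s, condition, r_name="U", s_name="V"):
    """R ⋈_C S, computed literally as sigma_C(R x S).
    Shared attribute names are qualified as U.B, V.B, and so on."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = set(r_attrs) & set(s_attrs)
    attrs = (tuple(f"{r_name}.{a}" if a in common else a for a in r_attrs)
             + tuple(f"{s_name}.{a}" if a in common else a for a in s_attrs))
    product_rows = {rt + st for rt in r_rows for st in s_rows}                 # R x S
    return attrs, {t for t in product_rows if condition(dict(zip(attrs, t)))}  # sigma_C


U = (("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)})
V = (("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)})

attrs, first = theta_join(U, V, lambda t: t["A"] < t["D"])
_, second = theta_join(U, V, lambda t: t["A"] < t["D"] and t["U.B"] != t["V.B"])
print(attrs)        # ('A', 'U.B', 'U.C', 'V.B', 'V.C', 'D')
print(len(first))   # 5
print(second)       # {(1, 2, 3, 7, 8, 10)}
```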

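2.4.10 Combining Operations to Form Queries
Example: What are the titles and years of movies made by Fox that are at least 100 minutes long?
One way is to select the long movies and the Fox movies separately, intersect the two results, and project onto title and year:
$\pi_{title,\,year}\big(\sigma_{length \ge 100}(Movies) \cap \sigma_{studioName = \text{'Fox'}}(Movies)\big)$
Equivalently, the two conditions can be combined in a single selection:
$\pi_{title,\,year}\big(\sigma_{length \ge 100 \text{ AND } studioName = \text{'Fox'}}(Movies)\big)$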
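The claim that the two expressions are equivalent can be spot-checked on the Movies instance from the selection slides. Agreement on one instance is evidence, not a proof. The helpers `select`, `project`, and `intersect` are hypothetical and operate on (attribute names, set of tuples) pairs.

```python
MOVIES = (
    ("title", "year", "length", "genre", "studioName", "producerC#"),
    {
        ("Star Wars", 1977, 124, "SciFi", "Fox", 12345),
        ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
        ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
    },
)


def select(relation, condition):
    """sigma_C(R) with the condition written as a Python function."""
    schema, rows = relation
    return schema, {row for row in rows if condition(dict(zip(schema, row)))}


def project(relation, attrs):
    """pi_attrs(R); the set result drops duplicates."""
    schema, rows = relation
    positions = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in positions) for row in rows}


def intersect(r, s):
    """R ∩ S for relations with the same schema."""
    if r[0] != s[0]:
        raise ValueError("intersection needs identical schemas")
    return r[0], r[1] & s[1]


long_movies = select(MOVIES, lambda t: t["length"] >= 100)
fox_movies = select(MOVIES, lambda t: t["studioName"] == "Fox")

q1 = project(intersect(long_movies, fox_movies), ["title", "year"])
q2 = project(select(MOVIES, lambda t: t["length"] >= 100 and t["studioName"] == "Fox"),
             ["title", "year"])
print(q1 == q2, q1[1])  # True {('Star Wars', 1977)}
```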

Relational Algebra
An algebra, in general, consists of operators and atomic operands. In the algebra of arithmetic, the operands are variables and constants, and the operators are +, -, *, and /. Any algebra lets us build expressions by applying operators to operands and to other expressions, e.g., (x+y)/z.
The set operations below use the following two relations, which have the same schema.
Relation R:

    Name           Address                      Gender  Birthdate
    Carrie Fisher  123 Maple St., Hollywood     F       9/9/99
    Mark Hamill    456 Oak Rd., Brentwood       M       8/8/88

Relation S:

    Name           Address                      Gender  Birthdate
    Carrie Fisher  123 Maple St., Hollywood     F       9/9/99
    Harrison Ford  789 Palm Dr., Beverly Hills  M       7/7/77

Operations of Relational Algebra: Union
Union ($R \cup S$): the set of elements that are in R, in S, or in both. An element that is in both R and S appears only once in the union. Set operations require R and S to have the same schema. For R and S above, $R \cup S$ is:

    Name           Address                      Gender  Birthdate
    Carrie Fisher  123 Maple St., Hollywood     F       9/9/99
    Mark Hamill    456 Oak Rd., Brentwood       M       8/8/88
    Harrison Ford  789 Palm Dr., Beverly Hills  M       7/7/77

Operations of Relational Algebra: Intersection
Intersection ($R \cap S$): the set of elements that are in both R and S. Each element appears only once in the intersection. For R and S above, $R \cap S$ is:

    Name           Address                      Gender  Birthdate
    Carrie Fisher  123 Maple St., Hollywood     F       9/9/99

Operations of Relational Algebra: Difference
Difference ($R - S$): the set of elements that are in R and not in S. Each element appears only once in the difference. For R and S above, $R - S$ is:

    Name           Address                      Gender  Birthdate
    Mark Hamill    456 Oak Rd., Brentwood       M       8/8/88
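Because relations are sets of tuples over a shared schema, union, intersection, and difference correspond directly to Python's set operators `|`, `&`, and `-`. The sketch below encodes R and S from the slides as sets of 4-tuples. The variable names are illustrative only.

```python
# Both relations have the schema (name, address, gender, birthdate).
R = {
    ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99"),
    ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88"),
}
S = {
    ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99"),
    ("Harrison Ford", "789 Palm Dr., Beverly Hills", "M", "7/7/77"),
}

union = R | S         # in R, in S, or both; Carrie Fisher appears once
intersection = R & S  # in both R and S
difference = R - S    # in R but not in S

print(len(union))                                   # 3
print(sorted(t[0] for t in intersection))           # ['Carrie Fisher']
print(sorted(t[0] for t in difference))             # ['Mark Hamill']
```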

Set operations: lecture given by Dr. Widom on union, difference, and intersection.


2.4.11 Naming and Renaming
The renaming operator $\rho$ explicitly renames a relation and its attributes. $\rho_{S(A_1, A_2, \ldots, A_n)}(R)$ results in a relation named S that has exactly the same tuples as R, but whose attributes are named $A_1, A_2, \ldots, A_n$, starting from the leftmost attribute.
Example: with R and S as in the Cartesian-product example, renaming attribute B of S to X avoids the need for the qualified names R.B and S.B. The result of $R \times \rho_{S(X,\,C,\,D)}(S)$ is:

    A  B  X  C   D
    1  2  2  5   6
    1  2  4  7   8
    1  2  9  10  11
    3  4  2  5   6
    3  4  4  7   8
    3  4  9  10  11
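Renaming changes only the attribute names, never the tuples. The minimal sketch below pairs a `rename` helper with a product that refuses to run when the schemas share an attribute, which shows why renaming is useful before a product. Both helpers are hypothetical. This representation tracks attribute names only, not the relation name S.

```python
def rename(relation, new_attrs):
    """rho_{S(A1,...,An)}(R): same tuples; attributes renamed from the left.
    (This sketch tracks only attribute names, not the new relation name.)"""
    attrs, rows = relation
    if len(new_attrs) != len(attrs):
        raise ValueError("need exactly one new name per attribute")
    return tuple(new_attrs), set(rows)


def product(r, s):
    """R x S for schemas with no attribute in common (rename first otherwise)."""
    shared = set(r[0]) & set(s[0])
    if shared:
        raise ValueError(f"shared attributes {shared}: rename first")
    return r[0] + s[0], {rt + st for rt in r[1] for st in s[1]}


R = (("A", "B"), {(1, 2), (3, 4)})
S = (("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})

attrs, rows = product(R, rename(S, ("X", "C", "D")))
print(attrs)      # ('A', 'B', 'X', 'C', 'D')
print(len(rows))  # 6
```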

Renaming: lecture given by Dr. Widom on renaming.

Relationships Among Operations
Some operations can be expressed in terms of others.
Intersection can be expressed using difference (see the video):
$R \cap S = R - (R - S)$
A theta-join can be expressed by product and selection:
$R \bowtie_{C} S = \sigma_{C}(R \times S)$
A natural join can be rewritten using product, selection, and projection. For example, with U, V, and $U \bowtie V$ as in the natural-join example:
$U \bowtie V = \pi_{A,\,U.B,\,U.C,\,D}\big(\sigma_{U.B = V.B \text{ AND } U.C = V.C}(U \times V)\big)$
These are the only redundancies. The remaining operations (union, difference, selection, projection, product, and renaming) form an independent set: none of them can be written in terms of the other five.
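Two of these identities can be spot-checked on small instances. The first builds $U \bowtie V$ from product, selection, and projection and compares it with the result listed on the natural-join slide. The second checks $R \cap S = R - (R - S)$. The second check uses two hypothetical sets of tuples. As with any test on specific data, this illustrates the identities rather than proving them.

```python
# U and V from the natural-join example.
U = (("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)})
V = (("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)})

# U x V with shared attributes qualified, each tuple viewed as a dict.
prod_attrs = ("A", "U.B", "U.C", "V.B", "V.C", "D")
prod = [dict(zip(prod_attrs, u + v)) for u in U[1] for v in V[1]]

# sigma_{U.B = V.B AND U.C = V.C}, then pi_{A, U.B, U.C, D}.
selected = [t for t in prod if t["U.B"] == t["V.B"] and t["U.C"] == t["V.C"]]
join_via_basic_ops = {(t["A"], t["U.B"], t["U.C"], t["D"]) for t in selected}

# U ⋈ V as listed on the natural-join slide.
expected_join = {(1, 2, 3, 4), (1, 2, 3, 5), (6, 7, 8, 10), (9, 7, 8, 10)}
print(join_via_basic_ops == expected_join)  # True

# Intersection via difference, on two hypothetical sets of tuples.
R = {(1, 2), (3, 4)}
S = {(3, 4), (5, 6)}
print(R & S == R - (R - S))  # True
```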

2.5 Constraints on Relations
2.5.1 Relational Algebra as a Constraint Language
2.5.2 Referential Integrity Constraints
2.5.3 Key Constraints
2.5.4 Additional Constraint Examples
2.5.5 Exercises for Section 2.5
2.6 Summary of Chapter 2
2.7 References for Chapter 2

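Referential Integrity Constraints
A referential integrity constraint asserts that a value appearing in one context also appears in another, related context. In relational algebra, it is written as a containment ($\subseteq$) between projections.
Example: every movie mentioned in StarsIn must appear in Movies.

    StarsIn(movieTitle, movieYear, starName)
    Movies(title, year, length, genre, studioName, producerC#)

$\pi_{movieTitle,\,movieYear}(StarsIn) \subseteq \pi_{title,\,year}(Movies)$
Example: every producer of a movie must appear in MovieExec.

    Movies(title, year, length, genre, studioName, producerC#)
    MovieExec(name, address, cert#, netWorth)

$\pi_{producerC\#}(Movies) \subseteq \pi_{cert\#}(MovieExec)$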
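A containment constraint of this kind is a subset test between two projections, which Python expresses with `<=` on sets. The instances below are hypothetical: the movies and producerC# values come from the slides, while MovieExec is reduced to just its cert# attribute for brevity. The helper `project` works on relations stored as lists of dicts and is an assumption of this sketch.

```python
def project(rel, attrs):
    """pi over a relation stored as a list of dicts; returns a set of tuples."""
    return {tuple(t[a] for a in attrs) for t in rel}


# Hypothetical sample instances (MovieExec is reduced to its cert# attribute).
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "SciFi",
     "studioName": "Fox", "producerC#": 12345},
    {"title": "Wayne's World", "year": 1992, "length": 95, "genre": "Comedy",
     "studioName": "Paramount", "producerC#": 99999},
]
stars_in = [{"movieTitle": "Star Wars", "movieYear": 1977, "starName": "Carrie Fisher"}]
movie_exec = [{"cert#": 12345}, {"cert#": 99999}]

# pi_{movieTitle, movieYear}(StarsIn) ⊆ pi_{title, year}(Movies)
print(project(stars_in, ["movieTitle", "movieYear"]) <= project(movies, ["title", "year"]))  # True
# pi_{producerC#}(Movies) ⊆ pi_{cert#}(MovieExec)
print(project(movies, ["producerC#"]) <= project(movie_exec, ["cert#"]))  # True
```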

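Key Constraints
Recall that name is the key for relation MovieStar(name, address, gender, birthdate). So no two tuples may agree on name but disagree on address. This requirement can be expressed by the algebraic expression
$\sigma_{MS1.name = MS2.name \text{ AND } MS1.address \ne MS2.address}(MS1 \times MS2) = \emptyset$
In the product MS1 × MS2, MS1 is shorthand for the renaming $\rho_{MS1(name,\,address,\,gender,\,birthdate)}(MovieStar)$, and MS2 is the analogous renaming of MovieStar. Similar expressions state that two tuples agreeing on name must also agree on gender and on birthdate.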
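The expression pairs every MovieStar tuple with every MovieStar tuple, using the two renamed copies MS1 and MS2, and keeps the pairs that violate the key. The constraint holds exactly when that selection is empty. The sketch below shows this with the two stars from the set-operation slides. The function `key_violations` and the second "Carrie Fisher" address are hypothetical.

```python
movie_star = [
    {"name": "Carrie Fisher", "address": "123 Maple St., Hollywood", "gender": "F", "birthdate": "9/9/99"},
    {"name": "Mark Hamill", "address": "456 Oak Rd., Brentwood", "gender": "M", "birthdate": "8/8/88"},
]


def key_violations(rel):
    """sigma_{MS1.name = MS2.name AND MS1.address <> MS2.address}(MS1 x MS2).
    ms1 and ms2 range over two copies (renamings) of the same relation."""
    return [(ms1, ms2) for ms1 in rel for ms2 in rel
            if ms1["name"] == ms2["name"] and ms1["address"] != ms2["address"]]


print(key_violations(movie_star) == [])  # True: the constraint holds
# A hypothetical second address for the same name breaks the key.
bad = movie_star + [{"name": "Carrie Fisher", "address": "Somewhere Else",
                     "gender": "F", "birthdate": "9/9/99"}]
print(len(key_violations(bad)))  # 2: the offending pair is found in both orders
```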

Example 2.24: The only legal values for the gender attribute of MovieStar are 'F' and 'M'. We can express this constraint algebraically by:
$\sigma_{gender \ne \text{'F'} \text{ AND } gender \ne \text{'M'}}(MovieStar) = \emptyset$
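This constraint has the same "selection must be empty" shape as the key constraint. The sketch below selects the offending tuples directly. The function `gender_violations` and the "Hypothetical Star" tuple are made up for illustration.

```python
def gender_violations(movie_star):
    """sigma_{gender <> 'F' AND gender <> 'M'}(MovieStar); must be empty."""
    return [t for t in movie_star if t["gender"] != "F" and t["gender"] != "M"]


movie_star = [
    {"name": "Carrie Fisher", "gender": "F"},
    {"name": "Mark Hamill", "gender": "M"},
]

print(gender_violations(movie_star))  # []
# A hypothetical tuple with an illegal gender value makes the selection non-empty.
print(gender_violations(movie_star + [{"name": "Hypothetical Star", "gender": "?"}]))
```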

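Example 2.25: Suppose one must have a net worth of at least $100,000,000 to be the president of a movie studio. The relations involved are:

    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)

First way: theta-join the two relations, matching each studio with the executive who is its president, and require that no such pair has too small a net worth:
$\sigma_{netWorth < 100000000}(Studio \bowtie_{presC\# = cert\#} MovieExec) = \emptyset$
Second way: require that no president's certificate number is the certificate number of an executive with too small a net worth:
$\pi_{presC\#}(Studio) \cap \pi_{cert\#}\big(\sigma_{netWorth < 100000000}(MovieExec)\big) = \emptyset$
Which one is more efficient?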
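Both formulations can be evaluated on a small instance to confirm that they accept and reject the same data. All names, certificate numbers, and net worths below are hypothetical, and so are the helper names `first_way` and `second_way`. Note that the second way only compares two sets of certificate numbers; it never builds the joined pairs.

```python
LIMIT = 100_000_000


def first_way(studio, movie_exec):
    """sigma_{netWorth < LIMIT}(Studio ⋈_{presC# = cert#} MovieExec) must be empty."""
    joined = [(s, e) for s in studio for e in movie_exec if s["presC#"] == e["cert#"]]
    return not [pair for pair in joined if pair[1]["netWorth"] < LIMIT]


def second_way(studio, movie_exec):
    """pi_{presC#}(Studio) ∩ pi_{cert#}(sigma_{netWorth < LIMIT}(MovieExec)) must be empty."""
    presidents = {s["presC#"] for s in studio}
    poor = {e["cert#"] for e in movie_exec if e["netWorth"] < LIMIT}
    return not (presidents & poor)


# Hypothetical data.
movie_exec = [{"name": "Exec A", "cert#": 1, "netWorth": 200_000_000},
              {"name": "Exec B", "cert#": 2, "netWorth": 5_000_000}]
good = [{"name": "Studio X", "presC#": 1}]
bad = [{"name": "Studio Y", "presC#": 2}]

print(first_way(good, movie_exec), second_way(good, movie_exec))  # True True
print(first_way(bad, movie_exec), second_way(bad, movie_exec))    # False False
```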

Summary of relational algebra: lecture given by Dr. Widom on the relational model.