

Lecture 4. Relational Algebra

By now, we know that (a) all data in a relational DB is stored in tables, (b) there are several tables in each DB (because of normalization), and (c) often the information we need is divided between different tables. Therefore we need a logical method to combine data in different tables, to select some subset of the data, and so on. Here, we learn how to do so using the algebra of relational tables.

What is an algebra?

A formal system of manipulation of symbols to deal with general statements of relations. Algebraic systems are abstract, in the sense that a system provides only a set of symbols, a set of rules to construct valid expressions, and a set of rules by which expressions and symbols can be manipulated.

The algebra we learn in high school is the algebra of real numbers. All symbols for variables represent real numbers, and all manipulations of symbols give relations between real numbers. Later we learnt the algebra of complex numbers, which works in much the same way. Now we learn the algebra of relations, where every relational table is a set of tuples, and each tuple is an ordered sequence of attribute values.

Just as real algebra has operations (+, -, ×, /), Relational Algebra (RA) also has operations. The main RA operations: Select, Project, Join, Divide.

Notice that the result of any operation in real algebra is also a real number. Thus, for real numbers x, y: (x + y) is real. So is (x - y). So is (x / y) for all values where / is defined. [When is it not defined?] Similarly, in RA, whenever an operation is defined, its result is a relational table.

Why is this important? Because it allows us to combine a sequence of RA operations in arbitrary order! [Why is it important to combine arbitrary sequences of RA operations?]

Now we shall see that extraction/modification of ANY information in a relational schema can be done by some combination of RA operations. We shall demonstrate with the following tables:

EMPLOYEE
Name      ID   SupervisorID  DeptNo
John      111  222           5
Frankie   222  777           4
Alice     333  444           4
Jennifer  444  777           4
William   555  222           5
Joyce     666  222           5
James     777  222           5
John      888  null          1

WORKS_ON
IDno  ProjNo  Hours
111   1       2.5
111   2       1.5
555   3       2.5
666   1       3.5
666   2       3.5
222   2       2.5
222   3       5.5
222   4       5.5
222   5       1.5

DEPENDENT
EmpID  DepName  Relationship
222    Jack     Son
222    Jill     Wife
222    John     Son
444    Ted      Son
111    Mike     Son
111    Anita    Daughter
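For readers who want to experiment, the three tables can be written down in Python. This is only a sketch under an assumed representation, not part of the lecture: each relation is a list of rows, each row is a dict from attribute name to value, and None stands in for null. The helper name make_relation is hypothetical.

```python
def make_relation(attributes, rows):
    """Build a relation (list of dict rows) from attribute names and value tuples."""
    return [dict(zip(attributes, row)) for row in rows]


# None plays the role of null in the EMPLOYEE table.
EMPLOYEE = make_relation(
    ("Name", "ID", "SupervisorID", "DeptNo"),
    [
        ("John", 111, 222, 5),
        ("Frankie", 222, 777, 4),
        ("Alice", 333, 444, 4),
        ("Jennifer", 444, 777, 4),
        ("William", 555, 222, 5),
        ("Joyce", 666, 222, 5),
        ("James", 777, 222, 5),
        ("John", 888, None, 1),
    ],
)

WORKS_ON = make_relation(
    ("IDno", "ProjNo", "Hours"),
    [
        (111, 1, 2.5), (111, 2, 1.5), (555, 3, 2.5),
        (666, 1, 3.5), (666, 2, 3.5), (222, 2, 2.5),
        (222, 3, 5.5), (222, 4, 5.5), (222, 5, 1.5),
    ],
)

DEPENDENT = make_relation(
    ("EmpID", "DepName", "Relationship"),
    [
        (222, "Jack", "Son"), (222, "Jill", "Wife"), (222, "John", "Son"),
        (444, "Ted", "Son"), (111, "Mike", "Son"), (111, "Anita", "Daughter"),
    ],
)

print(len(EMPLOYEE), len(WORKS_ON), len(DEPENDENT))
```

A list of dicts keeps the attribute names visible in each row, which makes selection conditions easy to read; it does not by itself enforce the set property (no duplicate rows).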

Select Operation

A relational table is composed of an unordered set of tuples (rows). The SELECT operation allows us to select a SUBSET of the tuples of a relational table, namely those that satisfy some specified conditions.

SELECT [conditions] (TABLE)

RESULT = SELECT [DeptNo = 5] (EMPLOYEE)

RESULT
Name     ID   SupervisorID  DeptNo
John     111  222           5
William  555  222           5
Joyce    666  222           5
James    777  222           5

All tuples in relational table EMPLOYEE for which the condition [DeptNo = 5] was TRUE were placed in RESULT.

NOTES:
1. SELECT looks at each tuple of its argument and evaluates the specified conditions. If the conditions are true, that tuple is placed in RESULT; otherwise the tuple is rejected.
2. The result of the SELECT operation is also a relational table! In fact, it is a subset of all the tuples of its argument.
3. Selection conditions must always evaluate to a logical value: TRUE or FALSE. Hence all conditions are EXPRESSIONS, possibly connected by LOGICAL operators (AND, OR, NOT).

RESULT = SELECT [(DeptNo != 4) AND ((ID = 222) OR (ID = 111))] (EMPLOYEE)

RESULT
Name  ID   SupervisorID  DeptNo
John  111  222           5
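The behaviour described in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes rows are dicts keyed by attribute name, and the function name select and the use of a Python callable for the condition are choices made here, not part of the lecture.

```python
def select(condition, table):
    """SELECT [condition] (table): keep each row for which condition(row) is true."""
    return [row for row in table if condition(row)]


EMPLOYEE = [
    {"Name": "John", "ID": 111, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "Frankie", "ID": 222, "SupervisorID": 777, "DeptNo": 4},
    {"Name": "Alice", "ID": 333, "SupervisorID": 444, "DeptNo": 4},
    {"Name": "Jennifer", "ID": 444, "SupervisorID": 777, "DeptNo": 4},
    {"Name": "William", "ID": 555, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "Joyce", "ID": 666, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "James", "ID": 777, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "John", "ID": 888, "SupervisorID": None, "DeptNo": 1},
]

# SELECT [DeptNo = 5] (EMPLOYEE)
dept5 = select(lambda r: r["DeptNo"] == 5, EMPLOYEE)

# SELECT [(DeptNo != 4) AND ((ID = 222) OR (ID = 111))] (EMPLOYEE)
compound = select(
    lambda r: r["DeptNo"] != 4 and (r["ID"] == 222 or r["ID"] == 111),
    EMPLOYEE,
)

print([r["Name"] for r in dept5])
print(compound)
```

The output of select is again a list of rows of the same shape, so it can be passed straight into another operation.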

Project Operation

The SELECT operation outputs a subset of the rows of a relational table. In contrast, the PROJECT operation outputs a subset of the columns of the relational table.

PROJECT [attribute-list] (TABLE)

RESULT = PROJECT [Name, ID] (EMPLOYEE)

RESULT
Name      ID
John      111
Frankie   222
Alice     333
Jennifer  444
William   555
Joyce     666
James     777
John      888

NOTES:
1. Once again, the result of the PROJECT operation is another relational table.
2. Since relational tables are sets of tuples, PROJECT will not output identical tuples twice! For example, employee 222 has two sons in DEPENDENT, but the tuple <222, Son> appears only once in the result below.

RESULT = PROJECT [EmpID, Relationship] (DEPENDENT)

RESULT
EmpID  Relationship
222    Son
222    Wife
444    Son
111    Son
111    Daughter
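The duplicate-elimination rule in note 2 is the part of PROJECT that is easiest to get wrong, so here is a small Python sketch of it. It assumes rows are dicts keyed by attribute name; the function name project is a choice made for this illustration.

```python
def project(attributes, table):
    """PROJECT [attributes] (table): keep only the listed columns, without duplicates."""
    seen = set()
    result = []
    for row in table:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation is a set: emit each tuple only once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


DEPENDENT = [
    {"EmpID": 222, "DepName": "Jack", "Relationship": "Son"},
    {"EmpID": 222, "DepName": "Jill", "Relationship": "Wife"},
    {"EmpID": 222, "DepName": "John", "Relationship": "Son"},
    {"EmpID": 444, "DepName": "Ted", "Relationship": "Son"},
    {"EmpID": 111, "DepName": "Mike", "Relationship": "Son"},
    {"EmpID": 111, "DepName": "Anita", "Relationship": "Daughter"},
]

# PROJECT [EmpID, Relationship] (DEPENDENT)
projected = project(["EmpID", "Relationship"], DEPENDENT)
for row in projected:
    print(row["EmpID"], row["Relationship"])
```

DEPENDENT has six rows, but the projection has five, because the two sons of employee 222 collapse into one tuple.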

Combinations of RA operations

Similar to real algebra, RA operations can be used in arbitrary combinations:

RESULT = PROJECT [Name, ID] (SELECT [SupervisorID = 222] (EMPLOYEE))

This gets evaluated in two steps: first, the SELECT returns RESULT1, and then the PROJECT returns the final RESULT.

Step 1: RESULT1
Name     ID   SupervisorID  DeptNo
John     111  222           5
William  555  222           5
Joyce    666  222           5
James    777  222           5

Step 2: RESULT
Name     ID
John     111
William  555
Joyce    666
James    777

Join Operations

The JOIN operation combines the information in two relational tables.

JOIN [conditions] (TABLE1, TABLE2)

In the above, TABLE1 and TABLE2 can possibly be the same table. JOIN forms combinations of the tuples of TABLE1 and TABLE2. The output is another table, with all the attributes of TABLE1 and all the attributes of TABLE2.
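Returning to the PROJECT-over-SELECT combination above: because each operation returns a table, the two steps nest like ordinary function calls. The following Python sketch assumes rows are dicts keyed by attribute name; select and project are illustrative helper names, not part of the lecture.

```python
def select(condition, table):
    """SELECT: keep rows satisfying the condition."""
    return [row for row in table if condition(row)]


def project(attributes, table):
    """PROJECT: keep the listed columns and drop duplicate tuples."""
    seen, result = set(), []
    for row in table:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


EMPLOYEE = [
    {"Name": "John", "ID": 111, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "Frankie", "ID": 222, "SupervisorID": 777, "DeptNo": 4},
    {"Name": "Alice", "ID": 333, "SupervisorID": 444, "DeptNo": 4},
    {"Name": "Jennifer", "ID": 444, "SupervisorID": 777, "DeptNo": 4},
    {"Name": "William", "ID": 555, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "Joyce", "ID": 666, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "James", "ID": 777, "SupervisorID": 222, "DeptNo": 5},
    {"Name": "John", "ID": 888, "SupervisorID": None, "DeptNo": 1},
]

# Step 1: the inner SELECT produces an intermediate table.
result1 = select(lambda r: r["SupervisorID"] == 222, EMPLOYEE)

# Step 2: the outer PROJECT consumes that table.
result = project(["Name", "ID"], result1)

# The same thing written as one nested expression.
nested = project(["Name", "ID"], select(lambda r: r["SupervisorID"] == 222, EMPLOYEE))

print(result)
```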

How it works:

Since relational tables are sets of tuples, first form the CARTESIAN PRODUCT of the tables. If TABLE1 has A rows and B columns, and TABLE2 has N rows and M columns, then the Cartesian product will have A*N tuples, and each tuple will have (B + M) attributes. The RESULT of the JOIN will contain every tuple of the Cartesian product for which [conditions] evaluate to TRUE.

RESULT = JOIN [ID = IDno] (EMPLOYEE, WORKS_ON)

RESULT
Name     ID   SupervisorID  DeptNo  IDno  ProjNo  Hours
John     111  222           5       111   1       2.5
John     111  222           5       111   2       1.5
Frankie  222  777           4       222   2       2.5
Frankie  222  777           4       222   3       5.5
Frankie  222  777           4       222   4       5.5
Frankie  222  777           4       222   5       1.5
William  555  222           5       555   3       2.5
Joyce    666  222           5       666   1       3.5
Joyce    666  222           5       666   2       3.5

NOTES:
1. The result of a JOIN with a general condition (also called a THETA-JOIN) is also a relational table.
2. DOT-convention: Sometimes, the names of attributes in TABLE1 and TABLE2 can be the same. Whenever there is confusion, we shall refer to such attributes in RESULT as TABLE_NAME.Attribute. Thus, the attribute Name in RESULT can equivalently be called EMPLOYEE.Name. Likewise, the attributes can all be named as: EMPLOYEE.Name, EMPLOYEE.ID, ..., WORKS_ON.IDno, ..., WORKS_ON.Hours.
3. NATURAL-JOIN: A special case of the JOIN operation is often used. In a NATURAL-JOIN, attributes of TABLE1 and TABLE2 that have the SAME NAME must be equal in value.

Data loss in a JOIN:

In a JOIN operation, if there is no tuple from TABLE2 matching the conditions with a tuple of TABLE1, then that tuple of TABLE1 does not occur in the RESULT. Sometimes, when performing JOIN operations, we may specifically require that all tuples of TABLE1 must occur at least once in the RESULT. If no matching tuples are found in TABLE2, then just enter NULL values for the attributes related to TABLE2. This operation is called a LEFT-OUTER-JOIN.

RESULT = LEFT-OUTER-JOIN [ID = EmpID] (EMPLOYEE, DEPENDENT)

RESULT
Name      ID   SupervisorID  DeptNo  EmpID  DepName  Relationship
John      111  222           5       111    Mike     Son
John      111  222           5       111    Anita    Daughter
Frankie   222  777           4       222    Jack     Son
Frankie   222  777           4       222    Jill     Wife
Frankie   222  777           4       222    John     Son
Alice     333  444           4       null   null     null
Jennifer  444  777           4       444    Ted      Son
William   555  222           5       null   null     null
Joyce     666  222           5       null   null     null
James     777  222           5       null   null     null
John      888  null          1       null   null     null

Similarly, we can define a RIGHT-OUTER-JOIN, where all tuples of TABLE2 appear at least once, with null values for the attributes of TABLE1 when there is no match.
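The JOIN and LEFT-OUTER-JOIN described above can both be sketched as a filter over the Cartesian product. This Python version is only an illustration under stated assumptions: rows are dicts keyed by attribute name, None stands in for null, the two tables share no attribute names (so no DOT-convention is needed), and the names join, left_outer_join, rel and table2_attributes are all hypothetical.

```python
def rel(attributes, rows):
    """Helper: build a list of dict rows from attribute names and value tuples."""
    return [dict(zip(attributes, r)) for r in rows]


def join(condition, table1, table2):
    """JOIN [condition] (table1, table2): filter the Cartesian product."""
    # Assumes no shared attribute names; otherwise table2's value would win.
    return [{**r1, **r2} for r1 in table1 for r2 in table2 if condition(r1, r2)]


def left_outer_join(condition, table1, table2, table2_attributes):
    """Like join, but every row of table1 appears at least once."""
    result = []
    for r1 in table1:
        matches = [{**r1, **r2} for r2 in table2 if condition(r1, r2)]
        if not matches:
            # No partner in table2: pad table2's attributes with None (null).
            matches = [{**r1, **dict.fromkeys(table2_attributes)}]
        result.extend(matches)
    return result


EMPLOYEE = rel(("Name", "ID", "SupervisorID", "DeptNo"), [
    ("John", 111, 222, 5), ("Frankie", 222, 777, 4), ("Alice", 333, 444, 4),
    ("Jennifer", 444, 777, 4), ("William", 555, 222, 5), ("Joyce", 666, 222, 5),
    ("James", 777, 222, 5), ("John", 888, None, 1),
])
WORKS_ON = rel(("IDno", "ProjNo", "Hours"), [
    (111, 1, 2.5), (111, 2, 1.5), (555, 3, 2.5), (666, 1, 3.5), (666, 2, 3.5),
    (222, 2, 2.5), (222, 3, 5.5), (222, 4, 5.5), (222, 5, 1.5),
])
DEPENDENT = rel(("EmpID", "DepName", "Relationship"), [
    (222, "Jack", "Son"), (222, "Jill", "Wife"), (222, "John", "Son"),
    (444, "Ted", "Son"), (111, "Mike", "Son"), (111, "Anita", "Daughter"),
])

# JOIN [ID = IDno] (EMPLOYEE, WORKS_ON)
joined = join(lambda e, w: e["ID"] == w["IDno"], EMPLOYEE, WORKS_ON)

# LEFT-OUTER-JOIN [ID = EmpID] (EMPLOYEE, DEPENDENT)
outer = left_outer_join(
    lambda e, d: e["ID"] == d["EmpID"],
    EMPLOYEE, DEPENDENT, ("EmpID", "DepName", "Relationship"),
)

print(len(joined), len(outer))
```

Employees without a WORKS_ON row (for example Alice) are absent from joined, which is the data loss discussed above; in outer they are kept and padded with None.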

Set theoretic Operations:

Since relational tables are sets of tuples, common set operations can be easily defined. These include Union, Intersection, Difference, Division.

Union: If two tables have the same attributes, you can perform a union.

UNION (TABLE1, TABLE2)

X = UNION ((SELECT [EmpID = 222] (DEPENDENT)), (SELECT [EmpID = 444] (DEPENDENT)))

X
EmpID  DepName  Relationship
222    Jack     Son
222    Jill     Wife
222    John     Son
444    Ted      Son

Intersection: Performs set intersection on the tables, provided they have the same attributes.

INTERSECTION (TABLE1, TABLE2)

X = INTERSECTION ((SELECT [EmpID = 222] (DEPENDENT)), (SELECT [Relationship = Son] (DEPENDENT)))

X
EmpID  DepName  Relationship
222    Jack     Son
222    John     Son

Difference: Performs set difference on the two tables (every element of the first set which is NOT a member of the second set is output); the two tables should have an identical set of attributes.
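Because these are ordinary set operations, they map directly onto Python's built-in set type. The sketch below assumes a hypothetical representation in which each DEPENDENT row is a plain (EmpID, DepName, Relationship) tuple; it covers union, intersection, and the difference operation just defined.

```python
# Each row is an (EmpID, DepName, Relationship) tuple; the table is a set of rows.
DEPENDENT = {
    (222, "Jack", "Son"), (222, "Jill", "Wife"), (222, "John", "Son"),
    (444, "Ted", "Son"), (111, "Mike", "Son"), (111, "Anita", "Daughter"),
}

# The SELECTs used as operands in the examples.
emp222 = {t for t in DEPENDENT if t[0] == 222}
emp444 = {t for t in DEPENDENT if t[0] == 444}
sons = {t for t in DEPENDENT if t[2] == "Son"}

x_union = emp222 | emp444        # UNION
x_intersection = emp222 & sons   # INTERSECTION
y_difference = emp222 - sons     # DIFFERENCE (first minus second)
reverse_difference = sons - emp222

print(sorted(x_union))
print(sorted(x_intersection))
print(sorted(y_difference))
print(sorted(reverse_difference))
```

Swapping the operands of the difference gives the sons who are not dependents of employee 222, which is a different table from y_difference.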

DIFFERENCE (TABLE1, TABLE2)

Y = DIFFERENCE ((SELECT [EmpID = 222] (DEPENDENT)), (SELECT [Relationship = Son] (DEPENDENT)))

Y
EmpID  DepName  Relationship
222    Jill     Wife

Note that DIFFERENCE is not commutative: in general, DIFFERENCE (A, B) ≠ DIFFERENCE (B, A).

DivideBy: You may also perform a set division operation on two tables. The operation is described using the simple tables and example that follow.

DIVIDEBY (TABLE1, TABLE2)

The result is defined as follows:
1. Attributes of TABLE2 must be a proper subset of attributes of TABLE1.
2. Let the attributes of TABLE2 be {B1, ..., Bn}, and of TABLE1 be {A1, ..., Am, B1, ..., Bn}.
3. The output of the DIVIDEBY is a table with attributes {A1, ..., Am}.
4. The output contains all tuples with values <A1i, ..., Ami> such that for every distinct tuple in TABLE2, with value <B1j, ..., Bnj>, there is a tuple <A1i, ..., Ami, B1j, ..., Bnj> in TABLE1.

Consider the following tables (a simplified WORKS_ON, different from the one used earlier):

WORKS_ON
EmployeeID  ProjectNo
111         1
111         2
222         2
222         3
222         1
333         2

PROJECTS
ProjectNo
1
2
3

We compute RESULT = DIVIDEBY (WORKS_ON, PROJECTS)

1. RESULT will be a table with those attributes of WORKS_ON that are not in PROJECTS, namely, {EmployeeID}.
2. From the PROJECTS table, there are three distinct values of <ProjectNo>: <1>, <2> and <3>.
3. From WORKS_ON, there is exactly one value of {EmployeeID}, namely <222>, such that there are rows in WORKS_ON corresponding to <222, 1>, <222, 2> and <222, 3>. Note that <111> does not qualify since there is no tuple in WORKS_ON with values <111, 3>; likewise, <333> does not qualify.
4. Thus, we get the result:

RESULT
EmployeeID
222

Notice that the above operation answers the question: which employee works on all the projects?

RA is an elegant mathematical tool for accessing and manipulating data stored in relational schemas. It can be shown that RA is complete, in the sense that any modification you would like to do on any subset of a set of relational tables can be done using some combination of RA commands. Thus, RA can be used to construct a Data Manipulation Language (DML) for relational DBs; however, the de facto standard DML used by all DBMSs is SQL, which we shall learn next. SQL is based on a mathematical system called Relational Calculus (which we will not learn here).
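Returning to the DIVIDEBY example above, the definition translates into a short Python sketch. It assumes a hypothetical representation in which each table is a set of plain tuples, the B attributes are the trailing columns of TABLE1 in the same order as in TABLE2, and TABLE2 is non-empty; the function name divide_by is a choice made for this illustration.

```python
def divide_by(table1, table2):
    """DIVIDEBY (table1, table2) for tables stored as sets of tuples.

    Assumes table2 is non-empty and that its columns are the trailing
    columns of table1, in the same order.
    """
    n = len(next(iter(table2)))               # number of B attributes
    candidates = {t[:-n] for t in table1}     # distinct <A1, ..., Am> values
    # Keep a candidate only if it is paired with EVERY tuple of table2.
    return {a for a in candidates if all(a + b in table1 for b in table2)}


# The simplified tables from the example: (EmployeeID, ProjectNo) and (ProjectNo,).
WORKS_ON = {(111, 1), (111, 2), (222, 2), (222, 3), (222, 1), (333, 2)}
PROJECTS = {(1,), (2,), (3,)}

result = divide_by(WORKS_ON, PROJECTS)
print(result)
```

Employee 111 is rejected because (111, 3) is missing from WORKS_ON, and 333 because both (333, 1) and (333, 3) are missing, which mirrors step 3 of the worked example.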