Relational Algebra
Motivation
  The relational data model provides a means of defining the database structure and constraints
    NAME      SALARY  ADDRESS       DEPT
    Smith     50k     St. Lucia     Printing
    Dilbert   40k     Taringa       Printing
    Jones     60k     Kenmore       Printing
    Trump     65k     Auchenflower  Head Office
    Harrison  78k     St. Lucia     Head Office
  A data model must also provide a set of operations to manipulate the data
    Find the names and departments of all employees who earn more than 55k
    Increment the salary of all employees in the Printing department by 10%
    What is the address of employee Jones?
  The basic set of relational model operations constitutes the Relational Algebra
Contents
  Relational Algebra
    What is a Relational Query
    Relational Query Languages
    Relational Algebra Operations
    Query Formulation in Relational Algebra
    Exercises in Relational Algebra
What is a Relational Query
  Data in a relational database can be manipulated in the following ways:
    INSERT: New tuples may be inserted
    DELETE: Existing tuples may be deleted
    UPDATE: Values of attributes in existing tuples may be changed
    RETRIEVE: Attributes of specific tuples, entire tuples, or even entire relations may be retrieved
  Relational Query Languages should provide all of the above
Relational Query Languages
  Relational queries are formulated in relational query languages
    Relational Algebra (RA)
      Formal query language for a relational database
    Structured Query Language (SQL)
      Comprehensive, commercial query language with a widely accepted international standard
    Query by Example (QBE)
      Commercial, graphical query language with minimal syntax
SQL and Relational Algebra
  SQL
    Declarative language
      Users specify what the result of the query should be; the DBMS decides the operations and their order of execution
    Operations
      Provides commands to create and modify database structure and constraints (DDL)
      Provides commands to insert, delete, update and retrieve (DML)
  RA
    Procedural language
      Algebraic expressions specify an order of operations, i.e. how the query will be processed
    Operations
      Provides operators that enable a user to specify retrieval requests only
Relational Algebra Operations
  Relational algebra operations are applied to relations
  The results of relational algebra operations are also relations, i.e. the algebra operations produce new relations from old
  A sequence of relational algebra operations forms a relational algebra expression, whose result is also a relation
Types of RA Operations
  Set operations from mathematical set theory (applicable because each relation is also a set of tuples)
    UNION
    INTERSECTION
    DIFFERENCE
    CARTESIAN PRODUCT
  Operations developed specifically for RDBs
    SELECT
    PROJECT
    JOIN
    DIVISION
Operators and Notation
  Traditional Set Operators
    Intersection       ∩
    Union              ∪
    Difference         -
    Cartesian Product  ×
  Specific Database Operators
    Select             σ
    Project            Π
    Join               ⋈
    Division           ÷
Understanding RA Operations
  Topics, discussed in three stages (first, second, third) in the order listed:
    SELECT
    PROJECT
    Assignment and Naming
    UNION
    INTERSECTION
    DIFFERENCE
    Properties of operators
    CARTESIAN PRODUCT
    JOIN
    DIVISION
Select
  σ <selection condition> (<relation name>)
  Select those rows which satisfy a given condition
  This operation is also called restriction
  (Figure: the table below with the selected tuples marked)
    NAME      SALARY  ADDRESS       DEPT
    Smith     50k     St. Lucia     Printing
    Dilbert   40k     Taringa       Printing
    Jones     60k     Kenmore       Printing
    Trump     65k     Auchenflower  Head Office
    Harrison  78k     St. Lucia     Head Office
Select Example
  1. List all details of employees working in department 4.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  σ Dno = 4 (EMPLOYEE)
Select Example
  2. List all details of employees earning more than $30000.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  σ Salary > 30000 (EMPLOYEE)
Select Example
  3. List all details about employees who work in department 4 and earn over $25000, or work in department 5 and earn over $30000.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  σ (Dno = 4 and Salary > 25000) or (Dno = 5 and Salary > 30000) (EMPLOYEE)
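The selection in Example 3 can be sketched in Python as a filter over rows. This is only an illustration: the `select` helper and the four sample rows are assumptions made for the example and are not part of the Company database.

```python
# Hypothetical sample rows; attribute names follow the EMPLOYEE schema.
EMPLOYEE = [
    {"Ename": "Ann", "Salary": 26000, "Dno": 4},
    {"Ename": "Bob", "Salary": 24000, "Dno": 4},
    {"Ename": "Cy", "Salary": 31000, "Dno": 5},
    {"Ename": "Di", "Salary": 28000, "Dno": 5},
]


def select(condition, relation):
    """sigma: keep the tuples (rows) for which condition(row) is true."""
    return [row for row in relation if condition(row)]


# Example 3: (Dno = 4 and Salary > 25000) or (Dno = 5 and Salary > 30000)
result = select(
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
    EMPLOYEE,
)
names = [t["Ename"] for t in result]
```

Every attribute of each qualifying row is kept; selection only removes rows, never columns.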
Project
  Π <attribute list> (<relation name>)
  Produce a new relation with only some of the attributes of the original relation
  Duplicate tuples are eliminated in the result relation
    NAME      SALARY  ADDRESS       DEPT
    Smith     50k     St. Lucia     Printing
    Dilbert   50k     Taringa       Printing
    Jones     60k     Kenmore       Printing
    Trump     65k     Auchenflower  Head Office
    Harrison  65k     St. Lucia     Head Office
  Duplicated tuples: projecting on SALARY and DEPT produces (50k, Printing) and (65k, Head Office) twice each; each is kept only once
Project Example
  4. For each employee, list their name, date of birth and salary.
  EMPLOYEE [Ename, SSN, Bdate, Address, Sex, Salary, SuperSSN, Dno]
  Π Ename, Bdate, Salary (EMPLOYEE)
Project Example
  5. List the salaries paid to employees in each department and the department number.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  Π Dno, Salary (EMPLOYEE)
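A minimal Python sketch of projection with duplicate elimination, in the spirit of Example 5. The `project` helper and the sample rows are hypothetical and only serve the illustration.

```python
# Hypothetical sample rows; two employees share the same (Dno, Salary) pair.
EMPLOYEE = [
    {"Ename": "Ann", "Salary": 30000, "Dno": 5},
    {"Ename": "Bob", "Salary": 30000, "Dno": 5},
    {"Ename": "Cy", "Salary": 25000, "Dno": 4},
]


def project(attributes, relation):
    """Pi: keep only the named attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation is a set, so repeats are skipped
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# Example 5: Pi Dno, Salary (EMPLOYEE)
dept_salaries = project(["Dno", "Salary"], EMPLOYEE)
```

The input has three rows but the result has two, because Ann and Bob project onto the same tuple.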
Handling Complex Queries
  Formulation of complex queries may require several relational algebra operations, one after the other
    Operations can be written as a single relational algebra expression by nesting the operations
    Operations can be applied one at a time by creating intermediate result relations
  Intermediate results have to be assigned to temporary relations, which must be named
Relation Assignment and Naming
  Relation assignment
    Result Relation ← Relational Expression
  Relation naming
    TEMP ← Relational Expression
  Attribute (re)naming
    TEMP (dept, emp-salary) ← Relational Expression
Assignment Example
  6. Create a new relation named RESULT, containing each employee and their date of birth. Label the resulting columns with Employee and DOB.
  EMPLOYEE [Ename, SSN, Bdate, Address, Sex, Salary, SuperSSN, Dno]
  RESULT (Employee, DOB) ← Π Ename, Bdate (EMPLOYEE)
Assignment Example
  7. List the names and salaries of all employees who work for department 5.
  EMPLOYEE [Ename, SSN, Bdate, Address, Sex, Salary, SuperSSN, Dno]
  Query with Expression
    Π Ename, Salary (σ Dno = 5 (EMPLOYEE))
  Query with Intermediate Relations
    EMPS-DEP5 ← σ Dno = 5 (EMPLOYEE)
    RESULT ← Π Ename, Salary (EMPS-DEP5)
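The two formulations of Example 7 can be compared with a small Python sketch. The relation is modelled as a set of tuples reduced to (Ename, Salary, Dno); the helper names and the data are assumptions made for the illustration.

```python
# Hypothetical EMPLOYEE tuples, reduced to (Ename, Salary, Dno) for brevity.
EMPLOYEE = {("Ann", 30000, 5), ("Bob", 25000, 4), ("Cy", 40000, 5)}


def select_dno(relation, dno):
    """sigma Dno = dno: keep tuples whose third attribute equals dno."""
    return {t for t in relation if t[2] == dno}


def project_name_salary(relation):
    """Pi Ename, Salary: keep the first two attributes; the set drops duplicates."""
    return {(t[0], t[1]) for t in relation}


# Query written as a single nested expression.
nested = project_name_salary(select_dno(EMPLOYEE, 5))

# Same query with a named intermediate relation.
emps_dep5 = select_dno(EMPLOYEE, 5)
result = project_name_salary(emps_dep5)
```

Both forms produce the same relation; the intermediate name only makes the order of operations explicit.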
Basic Set Operators
  A relation is a set of tuples (no duplicates)
  Set theory, and hence the elementary set operators, also apply to relations
    UNION
    INTERSECTION
    DIFFERENCE
    CARTESIAN PRODUCT
  (Figure: Venn diagrams of two sets A and B showing Union A ∪ B, Intersection A ∩ B and Difference A - B)
Union Compatibility in Relations
  Two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bn) are union compatible iff
    They have the same degree n (number of columns)
    Their columns have corresponding domains, i.e. dom(Ai) = dom(Bi) for 1 ≤ i ≤ n
  Applies to union, intersection and difference
Union Compatibility
  Although the domains need to correspond, the attributes do not have to have the same names
    WORKS_ON [ESSN, Pno, Hours]
    WORKED_ON [Employee, Project, Duration]
  where
    dom (ESSN) = dom (Employee)
    dom (Pno) = dom (Project)
    dom (Hours) = dom (Duration)
Union
  R1 ∪ R2
  Produces a relation that includes all tuples that appear only in R1, or only in R2, or in both R1 and R2
  Duplicate tuples are eliminated
  R1 and R2 must be union compatible
Union Example
  8. Identify the employees who both work on projects and also have dependents.
  WORKS_ON [ESSN, PNo, Hours]
  DEPENDENT [ESSN, Dep_Name, Sex, DOB, Relationship]
  WORKS_ON ∪ DEPENDENT
  The relations are not UNION compatible!
Union Example
  9. List the ESSNs of employees who either have dependents or work on projects.
  WORKS_ON [ESSN, PNo, Hours]
  DEPENDENT [ESSN, Dep_Name, Sex, DOB, Relationship]
  Π ESSN (DEPENDENT) ∪ Π ESSN (WORKS_ON)
Intersection
  R1 ∩ R2
  Produces a relation that includes the tuples that appear in both R1 and R2
  R1 and R2 must be union compatible
Intersection Example
  10. List the ESSNs of employees who have dependents and work on projects.
  WORKS_ON [ESSN, PNo, Hours]
  DEPENDENT [ESSN, Dep_Name, Sex, DOB, Relationship]
  Π ESSN (DEPENDENT) ∩ Π ESSN (WORKS_ON)
Difference
  R1 - R2
  Produces a relation that includes all the tuples that appear in R1 but do not appear in R2
  R1 and R2 must be union compatible
Difference Example
  11. List the ESSNs of employees who have dependents but do not work on projects.
  WORKS_ON [ESSN, PNo, Hours]
  DEPENDENT [ESSN, Dep_Name, Sex, DOB, Relationship]
  Π ESSN (DEPENDENT) - Π ESSN (WORKS_ON)
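Examples 9 to 11 map directly onto Python's built-in set operators once both relations have been projected onto ESSN, which is what makes them union compatible. The tuples below are hypothetical sample data.

```python
# Hypothetical tuples following the WORKS_ON and DEPENDENT schemas.
WORKS_ON = {("111", 1, 10.0), ("222", 2, 5.0)}
DEPENDENT = {
    ("111", "Sam", "M", "2001-01-01", "Son"),
    ("333", "Eve", "F", "1999-05-05", "Daughter"),
}

# Pi ESSN (...): after projection both relations have degree 1 and the same domain.
essn_dependent = {t[0] for t in DEPENDENT}
essn_works_on = {t[0] for t in WORKS_ON}

either = essn_dependent | essn_works_on         # Example 9: union
both = essn_dependent & essn_works_on           # Example 10: intersection
only_dependents = essn_dependent - essn_works_on  # Example 11: difference
```

Without the projections the tuples have different lengths (3 and 5 attributes), which mirrors the union-compatibility failure in Example 8.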
Properties of Operators
  Commutative and Associative Operators
  Precedence among operators in relational algebra expressions
  De Morgan's Laws
Commutative and Associative
  A ∪ B
    commutative: A ∪ B = B ∪ A
    associative: (A ∪ B) ∪ C = A ∪ (B ∪ C)
  A ∩ B
    commutative: A ∩ B = B ∩ A
    associative: (A ∩ B) ∩ C = A ∩ (B ∩ C)
  A - B
    not commutative: A - B ≠ B - A
    not associative: (A - B) - C ≠ A - (B - C)
Operator Precedence
  From higher to lower precedence:
    =, ≠, <, >, ≤, ≥
    not
    and
    or
    σ, Π and the binary relational operators
  Operators are performed left to right in the expression
  ( ) can be used to alter operator precedence: operations in ( ) are performed first, even if they have a lower precedence
Precedence Example
  12. List all employees who are male, and either earn less than $40000 or work for department 5.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dnum]
  Π Ename (σ (Sex = 'M' and (Salary < 40000 or Dnum = 5)) (EMPLOYEE))
  How does the above solution differ from the following?
  Π Ename (σ (Sex = 'M' and Salary < 40000 or Dnum = 5) (EMPLOYEE))
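The difference between the two conditions in Example 12 can be checked in Python, whose `and` also binds more tightly than `or`. The sample row is hypothetical: a female employee in department 5 earning $50000.

```python
def with_parentheses(t):
    # Sex = 'M' and (Salary < 40000 or Dnum = 5)
    return t["Sex"] == "M" and (t["Salary"] < 40000 or t["Dnum"] == 5)


def without_parentheses(t):
    # Parsed as (Sex = 'M' and Salary < 40000) or Dnum = 5
    return t["Sex"] == "M" and t["Salary"] < 40000 or t["Dnum"] == 5


# Hypothetical tuple that separates the two conditions.
row = {"Ename": "Joyce", "Sex": "F", "Salary": 50000, "Dnum": 5}

first = with_parentheses(row)
second = without_parentheses(row)
```

The first condition rejects the row because the employee is not male; the second accepts it because `Dnum = 5` alone satisfies the `or`, so the second query also returns every employee of department 5 regardless of sex.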
De Morgan's Laws
  not (p and q) ≡ (not p) or (not q)
  not (p or q) ≡ (not p) and (not q)
  where p and q are predicates, e.g. Age > 20, Dept = 'Research'
  e.g. not ((Salary > 40000) and (Dept = 'Research')) ≡ not (Salary > 40000) or not (Dept = 'Research')
De Morgan's Law Example
  13. List all projects which are neither located in Brisbane, nor controlled by department 4.
  PROJECT [PName, PNo, Plocation, Dnum]
  Π Pname (σ not (Plocation = 'Brisbane') and not (Dnum = 4) (PROJECT))
  Π Pname (σ not (Plocation = 'Brisbane' or Dnum = 4) (PROJECT))
  Π Pname (σ (Plocation <> 'Brisbane' and Dnum <> 4) (PROJECT))
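De Morgan's laws can be verified exhaustively, since two predicates have only four combinations of truth values. The sketch below does that and then evaluates the first two formulations of Example 13 on a hypothetical project tuple.

```python
from itertools import product


def law_and(p, q):
    """not (p and q) is equivalent to (not p) or (not q)."""
    return (not (p and q)) == ((not p) or (not q))


def law_or(p, q):
    """not (p or q) is equivalent to (not p) and (not q)."""
    return (not (p or q)) == ((not p) and (not q))


# Check both laws for every combination of truth values.
all_hold = all(
    law_and(p, q) and law_or(p, q) for p, q in product([False, True], repeat=2)
)

# Hypothetical project tuple: located in Brisbane, controlled by department 5.
project_row = {"Plocation": "Brisbane", "Dnum": 5}
form1 = not (project_row["Plocation"] == "Brisbane") and not (project_row["Dnum"] == 4)
form2 = not (project_row["Plocation"] == "Brisbane" or project_row["Dnum"] == 4)
```

Both forms reject this project, as expected for a project located in Brisbane.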
Cartesian Product
  R1 × R2
  Also known as a cross-product or cross-join
  R1 and R2 need NOT be union compatible
  The result of R1 (A1, A2, ..., An) × R2 (B1, B2, ..., Bm) is a relation Q with n + m attributes, Q (A1, A2, ..., An, B1, B2, ..., Bm), in that order
  Q has one tuple for each combination of tuples from R1 and R2; thus if R1 has r tuples and R2 has t tuples, then Q will have r * t tuples
Cartesian Product Example
  R1:
    Subject
    CS114
    CS115
    CS180
  R2:
    Student  Degree
    Anna     BIT
    Fred     BSc
  R1 × R2:
    Subject  Student  Degree
    CS114    Anna     BIT
    CS114    Fred     BSc
    CS115    Anna     BIT
    CS115    Fred     BSc
    CS180    Anna     BIT
    CS180    Fred     BSc
Cartesian Product Example
  14. For each female employee, list the names of all of her dependents.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPENDENT [ESSN, DepName, Sex, DOB, Relationship]
  FEMALE_EMPS ← σ Sex = 'F' (EMPLOYEE)
  EMP_NAMES ← Π Ename, SSN (FEMALE_EMPS)
  EMP_DEPEND ← EMP_NAMES × DEPENDENT
  ACTUAL_DEPEND ← σ SSN = ESSN (EMP_DEPEND)
  RESULT ← Π Ename, DepName (ACTUAL_DEPEND)
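The last three steps of Example 14 can be sketched with `itertools.product`. The data is hypothetical, and DEPENDENT is reduced to (ESSN, DepName) to keep the tuples short.

```python
from itertools import product

# Hypothetical data: EMP_NAMES as (Ename, SSN), DEPENDENT reduced to (ESSN, DepName).
EMP_NAMES = [("Alicia", "101"), ("Joyce", "102")]
DEPENDENT = [("101", "Tom"), ("102", "Kim"), ("103", "Lee")]

# EMP_DEPEND <- EMP_NAMES x DEPENDENT: one tuple per combination (2 * 3 = 6).
EMP_DEPEND = [e + d for e, d in product(EMP_NAMES, DEPENDENT)]

# ACTUAL_DEPEND <- sigma SSN = ESSN (EMP_DEPEND): keep only the matching pairs.
ACTUAL_DEPEND = [t for t in EMP_DEPEND if t[1] == t[2]]

# RESULT <- Pi Ename, DepName (ACTUAL_DEPEND)
RESULT = [(t[0], t[3]) for t in ACTUAL_DEPEND]
```

Most of the six product tuples pair an employee with someone else's dependent, which is why the selection on `SSN = ESSN` is needed before the result is meaningful.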
Join Operations
  A Join is similar to a Cartesian Product, but only selected pairs of tuples appear in the result
  It is used to combine related tuples from two relations into a single tuple in a new relation. This is needed when information is contained in more than one relation
  There are three types of Join operations:
    Theta-Join
    Equi-Join
    Natural Join
Theta-Join
  R1 ⋈ <join condition> R2
  A join condition is of the form A θ B, where A ∈ R1 and B ∈ R2, and θ is one of {=, ≠, <, ≤, >, ≥}
Theta-Join Example
  15. For each employee, list all the employees who earn more (than the first employee).
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPARTMENT [DName, DNumber, MgrSSN, MgrStart]
  A ← EMPLOYEE
  B ← EMPLOYEE
  RESULT ← Π A.Ename, B.Ename (A ⋈ A.Salary < B.Salary B)
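A theta-join such as the one in Example 15 can be sketched as a nested loop that keeps only the pairs satisfying the condition. The `theta_join` helper is hypothetical; the three (name, salary in thousands) tuples are taken from the table in the Motivation slide.

```python
# (Ename, Salary in thousands), taken from the Motivation table.
EMPLOYEE = [("Smith", 50), ("Dilbert", 40), ("Jones", 60)]


def theta_join(r, s, condition):
    """Concatenate each pair of tuples (a, b) for which condition(a, b) holds."""
    return [a + b for a in r for b in s if condition(a, b)]


# A and B are two copies of EMPLOYEE, as in the example.
A = EMPLOYEE
B = EMPLOYEE
joined = theta_join(A, B, lambda a, b: a[1] < b[1])  # A.Salary < B.Salary

# Pi A.Ename, B.Ename: positions 0 and 2 of the concatenated tuple.
RESULT = {(t[0], t[2]) for t in joined}
```

Jones earns the most in this sample, so no pair starts with Jones.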
Equi-Join
  R1 ⋈ <join condition> R2
  Specialization of Join
  The join condition has equality comparisons only
Equi-Join Example
  16. List the names of the managers of each department.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPARTMENT [DName, DNumber, MgrSSN, MgrStart]
  DEPT_MGR ← DEPARTMENT ⋈ MgrSSN = SSN EMPLOYEE
  RESULT ← Π Ename (DEPT_MGR)
Natural Join
  R1 * R2
  Similar to equi-join, except that the attributes used for the join are those that have the same name in each relation
  Consequently, they are not explicitly specified
  The duplicate column is eliminated
Natural Join Example
  R1:
    Subject  Student
    CS114    Anna
    CS115    Fred
    CS180    Anna
    CS214    Bobby
  R2:
    Student  Degree
    Anna     BIT
    Anna     BA
    Fred     BSc
  R1 * R2:
    Subject  Student  Degree
    CS114    Anna     BIT
    CS114    Anna     BA
    CS115    Fred     BSc
    CS180    Anna     BIT
    CS180    Anna     BA
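The natural join example above can be reproduced with a short Python sketch. The `natural_join` helper is a hypothetical illustration that joins on whatever attribute names the two relations share.

```python
R1 = [
    {"Subject": "CS114", "Student": "Anna"},
    {"Subject": "CS115", "Student": "Fred"},
    {"Subject": "CS180", "Student": "Anna"},
    {"Subject": "CS214", "Student": "Bobby"},
]
R2 = [
    {"Student": "Anna", "Degree": "BIT"},
    {"Student": "Anna", "Degree": "BA"},
    {"Student": "Fred", "Degree": "BSc"},
]


def natural_join(r, s):
    """Join on all attributes with the same name; keep one copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                merged = {**t, **u}  # shared attributes appear only once
                if merged not in result:
                    result.append(merged)
    return result


joined = natural_join(R1, R2)
```

Bobby's CS214 row has no partner in R2, so it does not appear in the result.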
Natural Join Example
  17. What is the result schema of the following query? What attributes is the join performed on?
  DEPARTMENT [DName, DNumber, MgrSSN, MgrStart]
  DEPT_LOCS [DNumber, Dlocation]
  DEPARTMENT * DEPT_LOCS
Natural Join Example
  18. What is the difference between the results of the following queries? What attributes are the joins performed on?
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPARTMENT [DName, DNumber, MgrSSN, MgrStart]
  EMPLOYEE * DEPARTMENT
  and
  EMP (MgrSSN, Dnumber) ← Π SSN, Dno (EMPLOYEE)
  RESULT ← EMP * DEPARTMENT
Division
  R1 ÷ R2
  The result relation contains the columns that are in R1 but not in R2
  Relations R1 and R2 must be division compatible, i.e. the last n columns of R1 must be identically named to the columns of R2, where n is the degree of R2
  The result relation contains each tuple t such that t appears in R1 in combination with every tuple in R2
Division Example
  R1:
    Student  Degree  Subject
    Anna     BIT     CS114
    Anna     BIT     CS115
    Anna     BIT     CS180
    Fred     BSc     CS114
    Fred     BSc     CS180
  R2:
    Subject
    CS114
    CS115
  R1 ÷ R2:
    Student  Degree
    Anna     BIT
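The division example above can be checked with a direct Python sketch of the definition. The `divide` helper is hypothetical and assumes, as stated for division compatibility, that the last n columns of R1 line up with the columns of R2.

```python
R1 = {
    ("Anna", "BIT", "CS114"),
    ("Anna", "BIT", "CS115"),
    ("Anna", "BIT", "CS180"),
    ("Fred", "BSc", "CS114"),
    ("Fred", "BSc", "CS180"),
}
R2 = {("CS114",), ("CS115",)}


def divide(r, s, n):
    """r divided by s, where the last n columns of r correspond to the columns of s."""
    candidates = {t[:-n] for t in r}  # the columns of r that are not in s
    # Keep a candidate only if it is paired in r with every tuple of s.
    return {c for c in candidates if all(c + u in r for u in s)}


quotient = divide(R1, R2, 1)
```

Fred is excluded because he has no tuple for CS115, even though he takes CS114.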
Division Example
  22. Retrieve the names of employees who work on all projects that John Smith works on.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  WORKS_ON [ESSN, Pno, Hours]
  SMITH ← σ Ename = 'John Smith' (EMPLOYEE)
  SMITH_PNOS ← Π Pno (WORKS_ON ⋈ ESSN = SSN SMITH)
  SSN_PNOS ← Π ESSN, Pno (WORKS_ON)
  SSNS (SSN) ← SSN_PNOS ÷ SMITH_PNOS
  RESULT ← Π Ename (SSNS * EMPLOYEE)
Query Formulation in RA
  Understand what the English query means
  Identify which relations, tuples (SELECT) and attributes (PROJECT) will be required for the query
  Identify the relationships between the required relations and, accordingly, which binary operators can be used (JOIN, PRODUCT, UNION, DIVISION, ...)
  Formulate the query keeping in mind operator properties (commutative/associative, order of precedence, De Morgan's Laws)
Which RA Operator to use?
  Operators: SELECT, PROJECT, UNION, INTERSECTION, DIFFERENCE, CARTESIAN PRODUCT, JOIN, DIVISION
  Use the unary operators SELECT / PROJECT when choosing tuples / attributes respectively from a single relation
  Use the binary operators UNION, PRODUCT, JOIN, ... when defining the relationship between 2 or more relations
  Complete Set of Operations: {σ, Π, ∪, -, ×}
Complete Set of RA Operators
  It has been proved that {σ, Π, ∪, -, ×} is a complete set of RA operators
  Each remaining relational algebra operator can be expressed as a sequence of operations from this set
  These remaining operators have been defined primarily for convenience!
Expressing other operators
  Intersection
    R ∩ S ≡ (R ∪ S) - ((R - S) ∪ (S - R))
  (Theta/Equi) Join
    R ⋈ <condition> S ≡ σ <condition> (R × S)
  Natural Join
    R1 (B1, A2, A3, ..., An) ← Π (A1, A2, A3, ..., An) (R)
    R * S ≡ Π (B1, A2, A3, ..., An, B2, ..., Bm) (σ <R1.B1 = S.B1> (R1 × S))
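The first two identities can be checked on small sets in Python. The data is hypothetical; the intersection is built only from union and difference, and the join is built as a selection over a Cartesian product.

```python
from itertools import product

# Intersection from union and difference only.
R = {1, 2, 3}
S = {2, 3, 4}
via_identity = (R | S) - ((R - S) | (S - R))

# Join as a selection over the Cartesian product.
EMP = {("Ann", 5), ("Bob", 4)}           # (Ename, Dno)
DEPT = {(5, "Research"), (4, "Admin")}   # (Dnumber, Dname)
cross = {e + d for e, d in product(EMP, DEPT)}    # EMP x DEPT, 4 tuples
join_result = {t for t in cross if t[1] == t[2]}  # sigma Dno = Dnumber
```

The expression inside the outer difference is the symmetric difference of R and S, so removing it from the union leaves exactly the common tuples.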
Expressing other operators
  Division
    T1 ← Π Y (R)
    T2 ← Π Y ((S × T1) - R)
    R ÷ S ≡ T1 - T2
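The division identity can be traced step by step in Python. This sketch assumes R holds (Y, X) pairs with Y = Student and X = Subject, and that S holds the X values; the product is built in R's column order so the set difference compares like with like. The data is a reduced version of the earlier division example.

```python
# R as (Student, Subject) pairs; S as the set of required subjects.
R = {
    ("Anna", "CS114"),
    ("Anna", "CS115"),
    ("Anna", "CS180"),
    ("Fred", "CS114"),
    ("Fred", "CS180"),
}
S = {"CS114", "CS115"}

# T1 <- Pi Y (R): every candidate value of Y.
T1 = {y for (y, _) in R}

# (S x T1) - R: the required combinations that are missing from R.
missing = {(y, x) for y in T1 for x in S} - R

# T2 <- Pi Y (...): candidates that lack at least one required combination.
T2 = {y for (y, _) in missing}

# R divided by S is T1 - T2.
quotient = T1 - T2
```

Here the only missing combination is Fred with CS115, so T2 contains Fred and the quotient contains Anna alone.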
Relational Algebra Exercises
  These exercises use the Company database as an example to illustrate relational algebra queries that require the use of multiple relational algebra operators
    EMPLOYEE [Ssn, Fname, Mit, Lname, Dob, Address, Sex, Salary, Dno, SuperSSN]
    DEPARTMENT [Dnumber, Dname, MGRSSN, MgrStart]
    PROJECT [Pno, PName, Plocation, DNum]
    DEPENDENT [ESSN, DepName, Sex, DOB, Relationship]
    WORKS_ON [ESSN, PNo, Hours]
    DEPT_LOCS [DNumber, DLocation]
RA Exercise
  23. Retrieve the name and address of all employees who work for the Research Department.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPARTMENT [Dname, Dnumber, MgrSSN, MgrStart]
  RESEARCH_DEPT ← σ Dname = 'Research' (DEPARTMENT)
  RESEARCH_DEPT_EMPS ← (RESEARCH_DEPT ⋈ Dnumber = Dno EMPLOYEE)
  RESULT ← Π Ename, Address (RESEARCH_DEPT_EMPS)
RA Exercise
  24. For every project located in Ipswich, list the project number, the controlling department number, and the department manager's name, address and birth date.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPARTMENT [Dname, Dnumber, MgrSSN, MgrStart]
  PROJECT [PName, Pnumber, Plocation, Dnum]
  IPSWICH_PROJS ← σ Plocation = 'Ipswich' (PROJECT)
  CONTR_DEPT ← (IPSWICH_PROJS ⋈ Dnum = Dnumber DEPARTMENT)
  PROJ_DEPT_MGR ← (CONTR_DEPT ⋈ MgrSSN = SSN EMPLOYEE)
  RESULT ← Π Pnumber, Dnum, Ename, Address, DOB (PROJ_DEPT_MGR)
RA Exercise
  25. Find the names of employees who work on all projects controlled by department 5.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  PROJECT [PName, Pnumber, Plocation, Dnum]
  WORKS_ON [ESSN, Pno, Hours]
  DEPT5_PROJS (Pno) ← Π Pnumber (σ Dnum = 5 (PROJECT))
  EMP_PROJ (SSN, Pno) ← Π ESSN, Pno (WORKS_ON)
  RESULT_EMP_SSNS ← EMP_PROJ ÷ DEPT5_PROJS
  RESULT ← Π Ename (RESULT_EMP_SSNS * EMPLOYEE)
RA Exercise
  26. List project numbers for projects that involve an employee whose name is Smith, either as a worker or as a manager of the department that controls the project.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  PROJECT [PName, Pnumber, Plocation, Dnum]
  DEPARTMENT [Dname, Dnumber, MgrSSN, MgrStart]
  WORKS_ON [ESSN, Pno, Hours]
  SMITHS (ESSN) ← Π SSN (σ Ename = 'Smith' (EMPLOYEE))
  SMITH_WORKER_PROJS ← Π Pno (WORKS_ON * SMITHS)
  MGRS ← Π Ename, Dnumber (EMPLOYEE ⋈ SSN = MgrSSN DEPARTMENT)
  SMITH_MGRS ← σ Ename = 'Smith' (MGRS)
  SMITH_MANAGED_DEPTS (Dnum) ← Π Dnumber (SMITH_MGRS)
  SMITH_MGR_PROJS (Pno) ← Π Pnumber (SMITH_MANAGED_DEPTS * PROJECT)
  RESULT ← SMITH_WORKER_PROJS ∪ SMITH_MGR_PROJS
RA Exercise
  27. Retrieve the names of employees who have no dependents.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPENDENT [ESSN, Dep_Name, Sex, DOB, Relationship]
  ALL_EMPS ← Π SSN (EMPLOYEE)
  EMPS_WITH_DEPS (SSN) ← Π ESSN (DEPENDENT)
  EMPS_WITHOUT_DEPS ← (ALL_EMPS - EMPS_WITH_DEPS)
  RESULT ← Π Ename (EMPS_WITHOUT_DEPS * EMPLOYEE)
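Exercise 27 combines projection, difference and a natural join. A compact Python sketch of the same steps follows; the data is hypothetical, with EMPLOYEE reduced to (Ename, SSN) and DEPENDENT to (ESSN, Dep_Name).

```python
# Hypothetical data, reduced to the attributes the query needs.
EMPLOYEE = {("Ann", "101"), ("Bob", "102"), ("Cy", "103")}  # (Ename, SSN)
DEPENDENT = {("101", "Tom"), ("103", "Kim")}                # (ESSN, Dep_Name)

# ALL_EMPS <- Pi SSN (EMPLOYEE)
ALL_EMPS = {ssn for (_, ssn) in EMPLOYEE}

# EMPS_WITH_DEPS(SSN) <- Pi ESSN (DEPENDENT)
EMPS_WITH_DEPS = {essn for (essn, _) in DEPENDENT}

# EMPS_WITHOUT_DEPS <- ALL_EMPS - EMPS_WITH_DEPS
EMPS_WITHOUT_DEPS = ALL_EMPS - EMPS_WITH_DEPS

# RESULT <- Pi Ename (EMPS_WITHOUT_DEPS * EMPLOYEE): join back on SSN for the names.
RESULT = {ename for (ename, ssn) in EMPLOYEE if ssn in EMPS_WITHOUT_DEPS}
```

The difference has to be taken on SSN values rather than names, because the two relations are only union compatible after projecting onto that single attribute.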
RA Exercise
  28. List the names of managers who have at least one dependent.
  EMPLOYEE [Ename, SSN, DOB, Address, Sex, Salary, SuperSSN, Dno]
  DEPARTMENT [Dname, Dnumber, MgrSSN, MgrStart]
  DEPENDENT [ESSN, Dep_Name, Sex, DOB, Relationship]
  MGRS (SSN) ← Π MgrSSN (DEPARTMENT)
  EMPS_WITH_DEPS (SSN) ← Π ESSN (DEPENDENT)
  MGRS_WITH_DEPS ← (MGRS ∩ EMPS_WITH_DEPS)
  RESULT ← Π Ename (MGRS_WITH_DEPS * EMPLOYEE)
Review
  Relational algebra gives the theoretical foundations for relational query languages
  Relational algebra operations operate on entire relations, and produce results which are also relations
  Relational algebra expressions, consisting of a sequence of relational algebra operators, specify a high-level procedure to achieve a query result
  However, query formulation in relational algebra is procedural, and therefore focuses on how a query result can be achieved
  Declarative query languages, e.g. SQL, allow the user to specify what information is wanted rather than how the result is to be obtained