CS 377 Database Systems
Relational Algebra and Calculus
Li Xiong
Department of Mathematics and Computer Science
Emory University
[Figure: ER Diagram of Company Database]
Relational Algebra and Relational Calculus
Previous lecture on relational model presented the structures and constraints for the relational model
Relational Algebra
  Formal foundation for relational model operations
  Basis for implementing and optimizing queries in RDBMS
  Basis for practical query languages such as SQL
Relational Calculus
  Formal declarative language for relational queries
Outline
Relational Algebra
  Unary Relational Operations
  Relational Algebra Operations From Set Theory
  Binary Relational Operations
  Additional Relational Operations
Relational Calculus
  Tuple Relational Calculus
  Domain Relational Calculus
Coming up: SQL
Relational Algebra
Relational algebra is a mathematical language with a basic set of operations for manipulating relations.
A relational algebra operation operates on one or more relations and produces a new relation, which can be further manipulated using operations of the same algebra.
A relational algebra expression is a sequence of relational algebra operations.
Relational Algebra Operations
Unary Relational Operations - Select
SELECT Operation: selects the subset of the tuples of a relation that satisfy a selection condition.
Example: To select the EMPLOYEE tuples whose department number is 4, or those whose salary is greater than $30,000, the following notation is used:
  σ dno = 4 (EMPLOYEE)
  σ salary > 30000 (EMPLOYEE)
Notation: σ <selection condition> (R)
The selection condition is a Boolean expression containing clauses of the form:
  <attribute name> <comparison op> <constant value>
  <attribute name> <comparison op> <attribute name>
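The SELECT operation can be sketched in a few lines of Python. This is only an illustration: the relation is modeled as a list of dicts, the sample rows are made up, and the helper name `select` is an assumption, not part of any library.

```python
# Made-up sample rows for an EMPLOYEE relation: one dict per tuple.
EMPLOYEE = [
    {"ssn": "E1", "lname": "Smith", "dno": 5, "salary": 30000},
    {"ssn": "E2", "lname": "Wong", "dno": 5, "salary": 40000},
    {"ssn": "E3", "lname": "Zelaya", "dno": 4, "salary": 25000},
    {"ssn": "E4", "lname": "Wallace", "dno": 4, "salary": 43000},
]


def select(relation, condition):
    """Hypothetical helper: keep the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]


# sigma dno = 4 (EMPLOYEE)
dept4 = select(EMPLOYEE, lambda t: t["dno"] == 4)

# sigma salary > 30000 (EMPLOYEE)
high_paid = select(EMPLOYEE, lambda t: t["salary"] > 30000)

print([t["ssn"] for t in dept4])      # ['E3', 'E4']
print([t["ssn"] for t in high_paid])  # ['E2', 'E4']
```

Each result has the same attributes as EMPLOYEE; only the set of tuples shrinks.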
SELECT Operation Properties
The SELECT operation σ <selection condition> (R) produces a relation S that has the same schema as R.
The SELECT operation σ is commutative; i.e.,
  σ <condition1> (σ <condition2> (R)) = σ <condition2> (σ <condition1> (R))
A cascade of SELECT operations may be applied in any order; i.e.,
  σ <condition1> (σ <condition2> (σ <condition3> (R))) = σ <condition2> (σ <condition3> (σ <condition1> (R)))
A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions; i.e.,
  σ <condition1> (σ <condition2> (σ <condition3> (R))) = σ <condition1> AND <condition2> AND <condition3> (R)
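The commutativity and cascade properties can be checked on a small example. The sketch below assumes a relation modeled as a list of dicts with made-up values; `select`, `c1`, and `c2` are hypothetical names used only for illustration.

```python
# Made-up relation with six tuples.
R = [{"dno": d, "salary": s} for d in (4, 5) for s in (20000, 30000, 40000)]


def select(relation, condition):
    """Hypothetical helper: keep the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]


def c1(t):
    return t["dno"] == 4


def c2(t):
    return t["salary"] > 25000


order_a = select(select(R, c2), c1)                 # sigma c1 (sigma c2 (R))
order_b = select(select(R, c1), c2)                 # sigma c2 (sigma c1 (R))
conjunction = select(R, lambda t: c1(t) and c2(t))  # sigma c1 AND c2 (R)

print(order_a == order_b == conjunction)  # True
```

A check on one relation is not a proof, but it shows why a query optimizer is free to reorder or merge selections.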
Unary Relational Operations - Project
PROJECT Operation: selects certain columns from the table and discards the other columns.
Example: To list each employee's last name, first name, and salary:
  π LNAME, FNAME, SALARY (EMPLOYEE)
Notation: π <attribute list> (R)
Duplicate Elimination: the PROJECT operation removes any duplicate tuples, so the result of the PROJECT operation is a set of tuples.
PROJECT Operation Properties
The number of tuples in π <list> (R) is always less than or equal to the number of tuples in R.
If the list of attributes includes a key of R, then the number of tuples in π <list> (R) is equal to the number of tuples in R.
π <list1> (π <list2> (R)) = π <list1> (R) as long as <list2> contains the attributes in <list1>.
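A minimal sketch of PROJECT with duplicate elimination, which also illustrates the three properties above. The relation is a list of dicts with made-up rows, and `project` is a hypothetical helper name.

```python
# Made-up sample rows; ssn plays the role of the key.
EMPLOYEE = [
    {"ssn": "E1", "lname": "Smith", "salary": 30000},
    {"ssn": "E2", "lname": "Wong", "salary": 40000},
    {"ssn": "E3", "lname": "Zelaya", "salary": 30000},
]


def project(relation, attrs):
    """Hypothetical helper: keep only attrs and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result


salaries = project(EMPLOYEE, ["salary"])         # duplicates collapse
with_key = project(EMPLOYEE, ["ssn", "salary"])  # key included, nothing collapses
nested = project(project(EMPLOYEE, ["lname", "salary"]), ["salary"])

print(len(salaries), len(with_key))  # 2 3
print(nested == salaries)            # True
```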
Sequences of Operations, In-line Expressions, and the RENAME Operation
Sequence of operations: rename attributes in intermediate results with the RENAME operation
Example: ρ DEPT5_EMPS (σ dno = 5 (EMPLOYEE))
Relational Algebra Operations From Set Theory
UNION Operation: denoted by R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
INTERSECTION Operation: denoted by R ∩ S, is a relation that includes all tuples that are in both R and S.
Set Difference (or MINUS) Operation: denoted by R - S, is a relation that includes all tuples that are in R but not in S.
Type Compatibility: The two operands must be type compatible. The operands R(A1, A2, ..., An) and S(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai) = dom(Bi) for i = 1, 2, ..., n.
The resulting relation for R ∪ S, R ∩ S, or R - S has the same attribute names as R (by convention).
Relational Algebra Operations From Set Theory - Properties
Both union and intersection are commutative operations; that is,
  R ∪ S = S ∪ R, and R ∩ S = S ∩ R
Both union and intersection can be treated as n-ary operations applicable to any number of relations, as both are associative operations; that is,
  R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)
The minus operation is not commutative; that is, in general,
  R - S ≠ S - R
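Python's built-in `set` type behaves like a relation under these three operations when each tuple is stored as a Python tuple. The sketch below uses made-up, type-compatible relations R, S, and T (two string attributes each) to check the properties listed above.

```python
# Made-up, type-compatible relations: each tuple is (first name, last name).
R = {("Susan", "Yao"), ("Ramesh", "Shah"), ("John", "Smith")}
S = {("John", "Smith"), ("Ricardo", "Browne")}
T = {("Ramesh", "Shah"), ("Francis", "Johnson")}

union = R | S         # R UNION S, duplicates eliminated automatically
intersection = R & S  # R INTERSECTION S
difference = R - S    # R MINUS S

print(len(union))                  # 4
print(intersection)                # {('John', 'Smith')}
print(R | S == S | R)              # True: union is commutative
print((R | S) | T == R | (S | T))  # True: union is associative
print(R - S == S - R)              # False: minus is not commutative
```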
Relational Algebra Operations From Set Theory - Cartesian Product
CARTESIAN PRODUCT (or cross product) Operation: combines tuples from two relations.
In general, the result of R(A1, A2, ..., An) x S(B1, B2, ..., Bm) is a relation Q of degree n + m, with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
The resulting relation Q has one tuple for each combination of tuples, one from R and one from S.
Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R x S will have nR * nS tuples.
The two operands do NOT have to be "type compatible".
Example:
  FEMALE_EMPS <- σ SEX = 'F' (EMPLOYEE)
  EMPNAMES <- π FNAME, LNAME, SSN (FEMALE_EMPS)
  EMP_DEPENDENTS <- EMPNAMES x DEPENDENT
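The size and degree of a Cartesian product can be seen in a short sketch. It assumes relations modeled as lists of dicts with disjoint attribute names; the rows are made up and `cartesian_product` is a hypothetical helper built on the standard `itertools.product`.

```python
from itertools import product

# Made-up sample rows; the two relations share no attribute names.
EMPNAMES = [
    {"fname": "Alicia", "lname": "Zelaya", "ssn": "E1"},
    {"fname": "Jennifer", "lname": "Wallace", "ssn": "E2"},
]
DEPENDENT = [
    {"essn": "E2", "dependent_name": "Abner"},
    {"essn": "E3", "dependent_name": "Alice"},
    {"essn": "E3", "dependent_name": "Michael"},
]


def cartesian_product(r, s):
    """Hypothetical helper: pair every tuple of r with every tuple of s."""
    return [{**t_r, **t_s} for t_r, t_s in product(r, s)]


emp_dependents = cartesian_product(EMPNAMES, DEPENDENT)

print(len(emp_dependents))     # 6, i.e. 2 * 3
print(len(emp_dependents[0]))  # 5 attributes, i.e. 3 + 2
```

Most of the six combinations pair an employee with someone else's dependent, which is why a product is usually followed by a selection.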
Binary Relational Operations - Join
JOIN Operation: a Cartesian product followed by a select.
Notation: R ⋈ <join condition> S
where R and S can be any relations that result from general relational algebra expressions.
ACTUAL_DEPENDENTS <- EMPNAMES ⋈ SSN = ESSN DEPENDENT
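The ACTUAL_DEPENDENTS join can be sketched as exactly that sequence: a Cartesian product followed by a select. Relations are modeled as lists of dicts with made-up rows, and the helper names are hypothetical. A real RDBMS would not materialize the full product; this only mirrors the definition.

```python
# Made-up sample rows; attribute names follow the join condition SSN = ESSN.
EMPNAMES = [
    {"fname": "Alicia", "lname": "Zelaya", "ssn": "E1"},
    {"fname": "Jennifer", "lname": "Wallace", "ssn": "E2"},
]
DEPENDENT = [
    {"essn": "E2", "dependent_name": "Abner"},
    {"essn": "E3", "dependent_name": "Alice"},
]


def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s."""
    return [{**t_r, **t_s} for t_r in r for t_s in s]


def select(relation, condition):
    """Keep the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]


def join(r, s, condition):
    """Hypothetical helper: join defined as product followed by select."""
    return select(cartesian_product(r, s), condition)


actual_dependents = join(EMPNAMES, DEPENDENT, lambda t: t["ssn"] == t["essn"])

print(len(actual_dependents))  # 1
print(actual_dependents[0]["fname"], actual_dependents[0]["dependent_name"])  # Jennifer Abner
```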
Retrieve the department and the manager's information:
  DEPT_MGR <- DEPARTMENT ⋈ MGRSSN = SSN EMPLOYEE
Variations of Join
EQUIJOIN Operation
  Involves join conditions with equality comparisons only.
  The result always has one or more pairs of attributes (whose names need not be identical) that have identical values in every tuple.
NATURAL JOIN Operation, denoted by *
  Gets rid of the second (superfluous) attribute in an EQUIJOIN condition.
  Requires that the two join attributes, or each pair of corresponding join attributes, have the same name in both relations. If this is not the case, a renaming operation is applied first.
Proj_Dept <- Project * Department
Dept_Locs <- Department * Dept_Locations
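A natural join such as Dept_Locs can be sketched by matching on every attribute name the two relations share and keeping a single copy of it. The rows below are made up, the shared attribute name `dnumber` is an assumption about the schema, and `natural_join` is a hypothetical helper.

```python
# Made-up sample rows; dnumber is assumed to be the shared attribute.
DEPARTMENT = [
    {"dname": "Research", "dnumber": 5},
    {"dname": "Administration", "dnumber": 4},
]
DEPT_LOCATIONS = [
    {"dnumber": 5, "dlocation": "Houston"},
    {"dnumber": 5, "dlocation": "Bellaire"},
    {"dnumber": 4, "dlocation": "Stafford"},
]


def natural_join(r, s):
    """Hypothetical helper: equijoin on all shared attribute names."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**t_r, **t_s}  # merging the dicts keeps one copy of each shared attribute
        for t_r in r
        for t_s in s
        if all(t_r[a] == t_s[a] for a in common)
    ]


dept_locs = natural_join(DEPARTMENT, DEPT_LOCATIONS)

print(len(dept_locs))        # 3
print(sorted(dept_locs[0]))  # ['dlocation', 'dname', 'dnumber']
```

If the two relations shared no attribute names, this sketch would return the full Cartesian product, since the `all(...)` check over an empty list is true.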
Additional Relational Operations - Outer Join
In NATURAL JOIN, tuples without a matching (or related) tuple are eliminated from the join result. Tuples with null in the join attributes are also eliminated.
Outer joins can be used when we want to keep all the tuples in R or S, regardless of whether or not they have matching tuples in the other relation.
The left outer join operation, R ⟕ S, keeps every tuple in the first or left relation R; if no matching tuple is found in S, then the attributes of S in the join result are filled or padded with null values.
The right outer join, R ⟖ S, keeps every tuple in the second or right relation S.
A third operation, full outer join, denoted by R ⟗ S, keeps all tuples in both the left and the right relations.
Outer Join Example
  EMPLOYEE ⟕ SSN = Mgr_SSN DEPARTMENT
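The padding behavior of a left outer join can be sketched as follows, using Python's `None` to stand in for null. The rows are made up, and `left_outer_join` is a hypothetical helper that takes the attribute names of the right relation explicitly so it can pad unmatched tuples.

```python
# Made-up sample rows: only E1 manages a department.
EMPLOYEE = [
    {"ssn": "E1", "lname": "Wong"},
    {"ssn": "E2", "lname": "Smith"},
]
DEPARTMENT = [{"dname": "Research", "mgr_ssn": "E1"}]


def left_outer_join(r, s, condition, s_attrs):
    """Hypothetical helper: keep every tuple of r, padding with None if unmatched."""
    result = []
    for t_r in r:
        matches = [t_s for t_s in s if condition(t_r, t_s)]
        if matches:
            result.extend({**t_r, **t_s} for t_s in matches)
        else:
            result.append({**t_r, **{a: None for a in s_attrs}})
    return result


emp_dept = left_outer_join(
    EMPLOYEE,
    DEPARTMENT,
    lambda e, d: e["ssn"] == d["mgr_ssn"],
    ["dname", "mgr_ssn"],
)

for row in emp_dept:
    print(row["lname"], row["dname"])
# Wong Research
# Smith None
```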
Binary Relational Operations - Division
DIVISION Operation: R(Z) ÷ S(X), where X ⊆ Z.
Example: retrieve the SSNs of employees who work on all the projects that John Smith is working on.
Let Y = Z - X (and hence Z = X ∪ Y).
The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S.
For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S.
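The division example can be sketched for the simplest case, where Y and X are single attributes. The sketch assumes R is a set of (essn, pno) pairs and S is the set of project numbers John Smith works on; all values are made up and `divide` is a hypothetical helper.

```python
# Made-up WORKS_ON(essn, pno) tuples; E1 stands in for John Smith.
WORKS_ON = {
    ("E1", 1), ("E1", 2),
    ("E2", 1), ("E2", 2), ("E2", 3),
    ("E3", 2), ("E3", 3),
}
SMITH_PNOS = {1, 2}  # the projects E1 works on


def divide(r, s):
    """Hypothetical helper: R(Y, X) divided by S(X), for single attributes Y and X."""
    candidates = {y for y, _ in r}
    # Keep y only if it appears in r together with every x in s.
    return {y for y in candidates if all((y, x) in r for x in s)}


result = divide(WORKS_ON, SMITH_PNOS)

print(sorted(result))  # ['E1', 'E2']
```

E3 is excluded because it is missing project 1, and E1 appears in the result because Smith trivially works on all of his own projects.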