Relational Algebra for Sets
Introduction to Relational Algebra for Bags
Thursday, September 27, 2012

Terminology for Relational Databases
(Slide repeated from Lecture 1.)

Account (a relation)
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Each entry in the table is called a row or a tuple. Sometimes an entry in the table is called a record.
The instance is the current set of rows (or tuples).

Codd's Original Relational Algebra Operators
Eight operators defined for sets:
    project
    select
    cross product
    join
    union
    intersection
    difference
    division
Plus renaming (to provide names for the relation and attributes of the answer).

Project operator (π) in relational algebra
Operator invented by Codd (not part of set theory).

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Consider the query:
    π_{Number, Owner} (Account)
The subscript is the list of attributes to retain.

Project operator (π) in relational algebra
Always applied to a single relation: a unary operator.

For the query:
    π_{Number, Owner} (Account)
the query answer is:

Number  Owner
101     J. Smith
102     W. Wei
103     J. Smith
104     M. Jones
105     H. Martin

Project operator (π): another example

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Consider the query:
    π_{Owner} (Account)
The subscript is the list of attributes to retain.

Project operator example (cont.)
Consider the query:
    π_{Owner} (Account)
The query answer is:

Owner
W. Wei
J. Smith
M. Jones
H. Martin

In relational algebra defined on sets, the query answer is a set, so J. Smith appears just once in the query answer.

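The set behavior of project can be tried out in a few lines of Python. This is only a sketch: the function name project and the representation of a relation as a Python set of tuples are assumptions made for illustration, not part of the lecture.

```python
# Each tuple follows the schema (Number, Owner, Balance, Type).
SCHEMA = ("Number", "Owner", "Balance", "Type")
ACCOUNT = {
    (101, "J. Smith", 1000.00, "checking"),
    (102, "W. Wei", 2000.00, "checking"),
    (103, "J. Smith", 5000.00, "savings"),
    (104, "M. Jones", 1000.00, "checking"),
    (105, "H. Martin", 10000.00, "checking"),
}


def project(relation, schema, attributes):
    """Keep only the named attributes (hypothetical helper).

    The result is a Python set, so duplicate tuples collapse
    automatically, as in relational algebra defined on sets.
    """
    positions = [schema.index(name) for name in attributes]
    return {tuple(row[i] for i in positions) for row in relation}


owners = project(ACCOUNT, SCHEMA, ["Owner"])
number_owner = project(ACCOUNT, SCHEMA, ["Number", "Owner"])
print(sorted(owners))
print(sorted(number_owner))
```

Projecting on Owner alone leaves four tuples because the two J. Smith rows collapse into one; projecting on Number and Owner keeps all five.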
Select operator (σ) in relational algebra
Invented by Codd (not part of set theory).
Given the following relation (and instance):

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Consider the query:
    σ_{Balance < 3000} (Account)

Select operator example (cont.)
    σ_{Balance < 3000} (Account)
The select predicate is evaluated for each tuple.

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Select operator example (cont.)
For this query:
    σ_{Balance < 3000} (Account)
the query answer is:

Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
104     M. Jones    1000.00   checking

Select operator in relational algebra
Always applied to a single relation: a unary operator.

    σ_{Balance < 3000} (Account)

σ is the select operator.
Account is a relation name (or a relation expression).
Balance < 3000 is the predicate, which consists of:
    an attribute
    a comparator (<, >, ≤, ≥, =, ≠)
    an attribute or a constant

Examples using the select operator
    σ_{Balance < 3000} (Account)
    σ_{Number = 103} (Account)
    σ_{Balance = Number} (Account)                         Attribute compared to attribute!
    σ_{Type = 'checking'} (σ_{Balance < 3000} (Account))   Applied to a relational expression!

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

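The select examples can be mimicked in Python by filtering a set with a predicate. This is a sketch under assumptions: the select helper and the namedtuple representation of Account rows are invented here for illustration.

```python
from collections import namedtuple

# Hypothetical row type mirroring the Account schema from the slides.
Account = namedtuple("Account", ["Number", "Owner", "Balance", "Type"])

ACCOUNT = {
    Account(101, "J. Smith", 1000.00, "checking"),
    Account(102, "W. Wei", 2000.00, "checking"),
    Account(103, "J. Smith", 5000.00, "savings"),
    Account(104, "M. Jones", 1000.00, "checking"),
    Account(105, "H. Martin", 10000.00, "checking"),
}


def select(relation, predicate):
    """Return the tuples for which the predicate is true."""
    return {t for t in relation if predicate(t)}


# Attribute compared to a constant.
low_balance = select(ACCOUNT, lambda t: t.Balance < 3000)
# Attribute compared to another attribute.
balance_equals_number = select(ACCOUNT, lambda t: t.Balance == t.Number)
# Select applied to a relational expression (the result of another select).
low_checking = select(low_balance, lambda t: t.Type == "checking")

print(sorted(t.Number for t in low_balance))
print(len(balance_equals_number))
```

Because select returns a relation, its result can be fed straight into another select, which is what the nested example does.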
Example (Useless) Query with Answer Account Query answer is empty. But that s allowed. Number Owner Balance Type 101 J. Smith 1000.00 checking 102 W. Wei 2000.00 checking 103 J. Smith 5000.00 savings 104 M. Jones 1000.00 checking 105 H. Martin 10,000.00 checking Type= checking AND Type = savings ATMWithdrawal Number Owner Balance Type But why is this a useless query? 13
Select and Project can be combined

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

    π_{Owner} (σ_{Balance < 3000} (Account))
    σ_{Balance < 3000} (π_{Owner, Balance} (Account))
    π_{Owner} (σ_{Balance < 3000} (π_{Owner, Balance} (Account)))
    σ_{Balance < 3000} (π_{Owner} (Account))        Is this one well-formed?

Note: two queries are equivalent if they are guaranteed to return the same query answer for every possible DB instance.
Which pairs of these queries are equivalent, if any?

Cross Product: an operator from set theory
Suppose
    A = {a, b, c}
    B = {1, 2}
Then in set theory, the cross product is defined as:
    A X B = {(a, 1), (b, 1), (c, 1), (a, 2), (b, 2), (c, 2)}
A X B is a set consisting of pairs (2-tuples), where each pair consists of an element from A and an element from B.

Cross Product in Set Theory
Suppose
    A = {a, b, c}
    B = {1, 2}
    C = {x, y}
Then
    A X B = {(a, 1), (b, 1), (c, 1), (a, 2), (b, 2), (c, 2)}
and
    (A X B) X C = {((a,1),x), ((b,1),x), ((c,1),x), ((a,2),x), ((b,2),x), ((c,2),x),
                   ((a,1),y), ((b,1),y), ((c,1),y), ((a,2),y), ((b,2),y), ((c,2),y)}

Cross Product in Relational Algebra vs. Set Theory
Given
    A = {a, b, c}
    B = {1, 2}
    C = {x, y}
then (A X B) X C, in set theory, is
    {((a,1),x), ((b,1),x), ((c,1),x), ((a,2),x), ((b,2),x), ((c,2),x),
     ((a,1),y), ((b,1),y), ((c,1),y), ((a,2),y), ((b,2),y), ((c,2),y)}
Codd simplified it in relational algebra to
    {(a,1,x), (b,1,x), (c,1,x), (a,2,x), (b,2,x), (c,2,x),
     (a,1,y), (b,1,y), (c,1,y), (a,2,y), (b,2,y), (c,2,y)}
by eliminating the inner parentheses, that is, by flattening the tuples.

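The difference between the nested set-theory result and the flattened relational result can be checked with Python's itertools.product. This is a sketch; the variable names nested, flat, and flattened are chosen here for illustration.

```python
from itertools import product

A = {"a", "b", "c"}
B = {1, 2}
C = {"x", "y"}

# Set theory: (A X B) X C is a set of pairs whose first element is itself a pair.
nested = {(ab, c) for ab in product(A, B) for c in C}

# Relational algebra: the same combinations, but as flat 3-tuples.
flat = set(product(A, B, C))

# Flattening each nested pair ((a, 1), x) into (a, 1, x) gives the flat result.
flattened = {(*ab, c) for ab, c in nested}

print(len(nested), len(flat))
print((("a", 1), "x") in nested, ("a", 1, "x") in flat)
```

Both sets have 3 * 2 * 2 = 12 elements; only the shape of each element differs.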
Example Database to show how cross product can be used in a query
Imagine that we have these two relations in a university database:
    Teacher (t-num, t-name)
    Course (c-num, c-name)
In reality, the relations would probably be more detailed, with attribute names as follows:
    Teacher (Number, Name, Office, E-mail)
    Course (Number, Name, Description)
    Taught-By (Quarter, Course, Section, Teacher, TimeDays)
    etc.

Cross product operator (X): produces every possible combination

Teacher                 Course
t-num  t-name           c-num  c-name
101    Smith            586    Intro to DB
105    Jones            533    Intro to OS
110    Fong

Cross product produces every possible combination of a teacher and a course.

Teacher X Course
t-num  t-name  c-num  c-name
101    Smith   586    Intro to DB
105    Jones   586    Intro to DB
110    Fong    586    Intro to DB
101    Smith   533    Intro to OS
105    Jones   533    Intro to OS
110    Fong    533    Intro to OS

Cross product followed by select

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Deposit
Account  T-id  Date      Amount
102      1     10/22/00     500.00
102      2     10/29/00     200.00
104      3     10/29/00    1000.00
105      4     11/02/00  10,000.00

Query:
    σ_{Balance > 1000 AND Number = Account} (Account X Deposit)

Notice the columns of Account X Deposit:
    Number  Owner  Balance  Type  Account  T-id  Date  Amount

Each Account tuple is combined with each Deposit tuple, and the predicate is evaluated on every combination:
    Account 101 with deposits 1, 2, 3, 4: No! Throw them away.
    Account 102 with deposits 1, 2: Yes! Place in query answer.
    Account 102 with deposits 3, 4: No! Throw them away.
    Account 103 with deposits 1, 2, 3, 4: All combinations fail!
    Account 104 with deposits 1, 2, 3, 4: No! Throw them away. Why? For deposit 3, Number = Account holds, but Balance = 1000.00 is not greater than 1000.
    Account 105 with deposits 1, 2, 3: The first three fail.
    Account 105 with deposit 4: Yes! Place in query answer.

Final answer:
Number  Owner      Balance    Type      Account  T-id  Date      Amount
102     W. Wei      2000.00   checking  102      1     10/22/00     500.00
102     W. Wei      2000.00   checking  102      2     10/29/00     200.00
105     H. Martin  10,000.00  checking  105      4     11/02/00  10,000.00

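The walkthrough above can be reproduced in Python by building every combination and then filtering. This is a minimal sketch that assumes relations are sets of tuples and refers to attributes by position; the variable names are illustrative only.

```python
from itertools import product

# Account schema: (Number, Owner, Balance, Type)
ACCOUNT = {
    (101, "J. Smith", 1000.00, "checking"),
    (102, "W. Wei", 2000.00, "checking"),
    (103, "J. Smith", 5000.00, "savings"),
    (104, "M. Jones", 1000.00, "checking"),
    (105, "H. Martin", 10000.00, "checking"),
}
# Deposit schema: (Account, T-id, Date, Amount)
DEPOSIT = {
    (102, 1, "10/22/00", 500.00),
    (102, 2, "10/29/00", 200.00),
    (104, 3, "10/29/00", 1000.00),
    (105, 4, "11/02/00", 10000.00),
}

# Cross product: concatenate every Account tuple with every Deposit tuple.
# Positions in the result: 0 Number, 1 Owner, 2 Balance, 3 Type,
#                          4 Account, 5 T-id, 6 Date, 7 Amount
cross = {a + d for a, d in product(ACCOUNT, DEPOSIT)}

# Select: Balance > 1000 AND Number = Account
answer = {t for t in cross if t[2] > 1000 and t[0] == t[4]}

print(len(cross))
for row in sorted(answer):
    print(row)
```

The cross product holds 5 * 4 = 20 combinations, and only three survive the select, matching the final answer on the slide.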
Join operator ⋈ (defined using σ and X)

    Account (Number, Owner, Balance, Type)
    Deposit (Account, Transaction-id, Date, Amount)
    Check (Account, Check-number, Date, Amount)

    σ_{A.Number = Deposit.Account} (Account X Deposit)
is equivalent to
    Account ⋈_{A.Number = Deposit.Account} Deposit

Notice: the select condition in the first query is used as the join condition in the second query.

A few details about join
Each simple Boolean predicate in the join condition must compare an attribute from one relation to an attribute in the other relation.
In this query:
    Account A ⋈_{A.Number = D.Account AND D.type = 'checking'} Deposit D
the predicate D.type = 'checking' isn't a join condition.
If you have a join with NO condition, then it is a cross product by definition.

Join with all six comparators
    Student ⋈_{advisor = number} Faculty
    Student S ⋈_{S.age < F.age} Faculty F
    Student S ⋈_{S.salary ≥ F.salary} Faculty F
    etc.
Join is sometimes called theta-join or θ-join, where θ represents any of the 6 comparators (<, >, =, ≠, ≤, ≥).
(In PostgreSQL the 6 comparators are <, >, =, != or <>, >=, <=.)
The most common join (with equality) is called equi-join.

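Since join is defined as a cross product followed by a select, a theta-join fits in one Python comprehension. This is a sketch: theta_join is a hypothetical helper, and the Student and Faculty rows are invented sample data, not taken from the lecture.

```python
# Hypothetical sample data.
# Student schema: (name, age, advisor); Faculty schema: (number, name, age)
STUDENT = {("Ann", 22, 1), ("Bob", 45, 2)}
FACULTY = {(1, "Lee", 40), (2, "Kim", 35)}


def theta_join(r, s, condition):
    """Cross product of r and s, keeping only the pairs that satisfy condition."""
    return {x + y for x in r for y in s if condition(x, y)}


# Equi-join: Student.advisor = Faculty.number
advised_by = theta_join(STUDENT, FACULTY, lambda st, f: st[2] == f[0])

# Theta-join with "<": students younger than a faculty member
younger = theta_join(STUDENT, FACULTY, lambda st, f: st[1] < f[2])

# A join with no condition is a cross product by definition.
no_condition = theta_join(STUDENT, FACULTY, lambda st, f: True)

print(sorted(advised_by))
print(len(younger), len(no_condition))
```

Any of the six comparators can be used inside the condition; only the equality case is called an equi-join.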
Exercise
    Class(course, term, room, teacher)
        teacher is a foreign key referencing Faculty.id
    Faculty(id, name, office)
    Student(id, name, major)
    Enrolled(id, course, term, grade)
        id is a foreign key referencing Student.id
        (course, term) together is a foreign key referencing Class.(course, term)

    Faculty(id, name, office)
    Class(course, term, room, teacher)
    Enrolled(id, course, term, grade)
    Student(id, name, major)

Provide sample data with several faculty, students, and classes in several terms.
Write a relational algebra query that lists faculty id and student id pairs where the student is enrolled in a class that is taught by the faculty member.
Write a relational algebra query that lists faculty ids for faculty who teach at least two classes in the same term.

Equi-join (reminder)
Equi-join:
    Account ⋈_{Number = Account} Deposit
When the join is based on equality, we always have two identical attributes (columns) in the answer.

Number  Owner      Balance    Type      Account  Trans-id  Date      Amount
102     W. Wei      2000.00   checking  102      1         10/22/00     500.00
102     W. Wei      2000.00   checking  102      2         10/29/00     200.00
104     M. Jones    1000.00   checking  104      3         10/29/00    1000.00
105     H. Martin  10,000.00  checking  105      4         11/02/00  10,000.00

If we use natural join, the duplicate column is eliminated.

Natural join
Joins two relations by checking for equality on all pairs of attributes with the same name.
Eliminates duplicate columns from the query answer.
This is risky if you use natural join in SQL queries: the meaning of your queries might change if you change your schema.
This is great for textbooks: queries are simpler.

NATURAL JOIN
NATURAL JOIN is like a macro that joins tables with an equality check for all attributes with the same name.
    Course (CNumber, CName, Description)
    Teacher (TNumber, TName, Phone)
    Offering (CNumber, TNumber, Time, Days, Room)

    Teacher ⋈ Offering ⋈ Course

This query (with natural join) does just what you want. But it requires the schema to be just right.

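The "macro" behavior of natural join can be sketched in Python with rows stored as dictionaries keyed by attribute name. Everything here is an assumption for illustration: the natural_join helper, the use of lists of dicts in place of sets, and the sample rows (phone numbers, times, and rooms are invented).

```python
def natural_join(r, s):
    """Join rows that agree on every attribute name the two relations share.

    Rows are dicts, so merging them keeps a single copy of each shared column.
    """
    result = []
    for x in r:
        for y in s:
            shared = x.keys() & y.keys()
            if all(x[name] == y[name] for name in shared):
                result.append({**x, **y})
    return result


# Hypothetical sample data following the schemas on the slide.
TEACHER = [
    {"TNumber": 101, "TName": "Smith", "Phone": "555-0101"},
    {"TNumber": 105, "TName": "Jones", "Phone": "555-0105"},
]
COURSE = [
    {"CNumber": 586, "CName": "Intro to DB", "Description": "databases"},
    {"CNumber": 533, "CName": "Intro to OS", "Description": "operating systems"},
]
OFFERING = [
    {"CNumber": 586, "TNumber": 101, "Time": "10:00", "Days": "MW", "Room": "A1"},
    {"CNumber": 533, "TNumber": 105, "Time": "14:00", "Days": "TR", "Room": "B2"},
]

# Teacher joined with Offering on TNumber, then with Course on CNumber.
schedule = natural_join(natural_join(TEACHER, OFFERING), COURSE)

# Teacher and Course share no attribute names, so this is a cross product.
no_shared_names = natural_join(TEACHER, COURSE)

for row in schedule:
    print(row["TName"], row["CName"], row["Days"], row["Time"])
```

The sketch also shows why the schema has to be "just right": if Teacher and Course both had an attribute called Name, the join would silently start comparing teacher names to course names.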
A simple relational algebra query with zero operators
Relational algebra query:
    Student
A relation name, by itself, is a valid relational algebra query. It returns all of the tuples in the relation in the query answer.

Relational Algebra Operators
There are eight operators:
    project
    select
    union, intersection, difference (three operators from set theory)
    cross product
    join
    division
Plus renaming (to provide names for the relation and attributes of the answer).

Union in set theory vs. relational algebra
In set theory, the elements of a set can be all different types:
    S = {'a', 7053, (1, 2, 'Smith'), (3, 4, 5, 6, 7, 8, 9)}
(atomic values as well as tuples of different lengths)
In set theory, you can take the union (or intersection or difference) of any two sets:
    A = {1, (3, 4, 'a'), 5.3}
    B = {7, 1, (2, 3)}
    A ∪ B = {1, (3, 4, 'a'), 5.3, 7, (2, 3)}
    A ∩ B = {1}
    A − B = {(3, 4, 'a'), 5.3}
But in relational algebra, relations must have the same shape (be union-compatible) before you can take ∪, ∩, or −.

Union Compatible
Two relations are union-compatible if they have the same number of attributes and the corresponding attributes have the same name and are defined on the same domains.
(This is imprecise because domains/datatypes may not be precise; domains should be compatible.)
Suppose we have these relations:
    Checking-Account (num, owner, balance)
    Savings-Account (num, owner, balance)
These are union-compatible relations.

Union in Relational Algebra
Consider this query:
    Checking-account ∪ Savings-account

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

Query answer:
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00
103  J. Smith    5000.00

Intersection in Relational Algebra (example 1)
Consider this query:
    Checking-account ∩ Savings-account

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

What's the answer to this query?

Intersection in Relational Algebra (ex. 1 cont.)
What is the answer to this query?
    Checking-account ∩ Savings-account

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

It's empty. There are no tuples that are in both relations.

Intersection in Relational Algebra (example 2)
What's the answer to this query?
    (π_{owner} Checking-account) ∩ (π_{owner} Savings-account)

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

Intersection in Relational Algebra (ex. 2 cont.)
    (π_{owner} Checking-account) ∩ (π_{owner} Savings-account)

Intermediate query answers:

π_{owner} Checking-account     π_{owner} Savings-account
owner                          owner
J. Smith                       J. Smith
W. Wei
M. Jones
H. Martin

Query answer (using the attribute name from Checking-account):
owner
J. Smith

Set Difference: Relational Algebra (ex. 1)
Consider this query:
    Checking-account − Savings-account
Find all the tuples (rows) that are in the Checking-account relation and are not in the Savings-account relation.

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

What is the answer?

Set Difference: Relational Algebra (ex. 1 cont.)
Consider this query:
    Checking-account − Savings-account
Find all the tuples (rows) that are in the Checking-account relation and are not in the Savings-account relation.

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

What is the answer? All of the rows in the Checking-account table.

Set Difference: Relational Algebra (ex. 2)
    (π_{owner} Checking-account) − (π_{owner} Savings-account)

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

Compute the intermediate query answers. Then, what is the final query answer?

Set Difference: Relational Algebra (ex. 2 cont.)
    (π_{owner} Checking-account) − (π_{owner} Savings-account)

Checking-account
num  owner      balance
101  J. Smith    1000.00
102  W. Wei      2000.00
104  M. Jones    1000.00
105  H. Martin  10,000.00

Savings-account
num  owner      balance
103  J. Smith    5000.00

Compute the intermediate query answers. Then, what is the final query answer?

owner
W. Wei
M. Jones
H. Martin

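Python's built-in set operators line up directly with union, intersection, and difference, so the examples above can be replayed. This is a sketch that assumes each relation is a set of (num, owner, balance) tuples; the variable names are illustrative.

```python
# Both relations share the schema (num, owner, balance), so they are union-compatible.
CHECKING = {
    (101, "J. Smith", 1000.00),
    (102, "W. Wei", 2000.00),
    (104, "M. Jones", 1000.00),
    (105, "H. Martin", 10000.00),
}
SAVINGS = {(103, "J. Smith", 5000.00)}

union = CHECKING | SAVINGS          # all five accounts
intersection = CHECKING & SAVINGS   # empty: no tuple is in both relations
difference = CHECKING - SAVINGS     # every Checking-account tuple

# Project on owner first, then apply the set operators.
checking_owners = {(owner,) for _, owner, _ in CHECKING}
savings_owners = {(owner,) for _, owner, _ in SAVINGS}

owners_in_both = checking_owners & savings_owners
owners_only_checking = checking_owners - savings_owners

print(len(union), len(intersection), len(difference))
print(sorted(owners_in_both), sorted(owners_only_checking))
```

Whole tuples never match across the two relations, but after projecting on owner the value J. Smith appears on both sides, which is why the two versions of each query give different answers.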
Another example for set operators
    Graduate-student (id, name, GPA, phone)
    Undergrad-student (id, name, GPA, phone)
These tables are union-compatible; we can issue the following queries:
    1. Graduate-student ∪ Undergrad-student
    2. Graduate-student ∩ Undergrad-student
    3. Undergrad-student − Graduate-student
What do these queries compute, described in English?

Relational Algebra: Divide Operator
Suppose we have this extra table in the Bank database:

Account-types
Type
checking
savings

Suppose we would like to know which customers have at least one account of each type of account. That is, we want to know who has accounts of ALL the types.

We can use the Divide operator in Rel. Alg.
    (π_{Owner, Type} Account) ÷ Account-types
Find account owners who have ALL types of accounts.

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Account-types
Type
checking
savings

Query answer:
Owner
J. Smith

Divide Operator
For R ÷ S where R(r1, r2, r3, r4) and S(s1, s2):
Since S has two attributes, there must be two attributes in R (say r3 and r4) that are defined on the same domains, respectively, as s1 and s2.
We could say that (r3, r4) is union-compatible with (s1, s2).
The query answer has the remaining attributes (r1, r2).
A tuple (r1, r2) is in the answer if that (r1, r2) value appears in R together with every tuple of S.

How does divide work?
    (π_{Owner, Type} Account) ÷ Account-types

π_{Owner, Type} Account        Account-types
Owner      Type                Type
J. Smith   checking            checking
W. Wei     checking            savings
J. Smith   savings
M. Jones   checking
H. Martin  checking

Can we find an owner where there are enough tuples in this table for that owner so that we can match EVERY tuple in Account-types? List all such owners.

Owner

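The "match EVERY tuple" test can be sketched in Python for the two-column case used here. Both helpers are hypothetical names: divide checks the condition directly, and divide_via_difference uses the standard rewriting of division in terms of project, cross product, and difference, which is not shown on the slides.

```python
OWNER_TYPE = {
    ("J. Smith", "checking"),
    ("W. Wei", "checking"),
    ("J. Smith", "savings"),
    ("M. Jones", "checking"),
    ("H. Martin", "checking"),
}
ACCOUNT_TYPES = {("checking",), ("savings",)}


def divide(r, s):
    """r holds (x, y) pairs and s holds (y,) tuples.

    Keep each x that appears in r together with every y in s.
    """
    candidates = {x for x, _ in r}
    return {(x,) for x in candidates if all((x, y) in r for (y,) in s)}


def divide_via_difference(r, s):
    """Same result using only project, cross product, and difference."""
    candidates = {x for x, _ in r}
    all_pairs = {(x, y) for x in candidates for (y,) in s}
    missing = all_pairs - r                      # pairs that would be needed but are absent
    disqualified = {(x,) for x, _ in missing}    # owners lacking at least one type
    return {(x,) for x in candidates} - disqualified


print(divide(OWNER_TYPE, ACCOUNT_TYPES))
print(divide_via_difference(OWNER_TYPE, ACCOUNT_TYPES))
```

Only J. Smith is paired with both checking and savings, so J. Smith is the only owner in the answer.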
Write this query in relational algebra
    Customer (Number, Name, Address, CRating, CAmount, CBalance, Salesperson)
    Salesperson (Number, Name, Address, Office)
Find the name of salespersons (if there are any) who are assigned to ALL customers.

    π_{S.Number, S.Name} (((π_{Salesperson, Number} Customer) ÷ (π_{Number} Customer)) ⋈_{Salesperson = S.Number} Salesperson S)

Order of evaluation: (1) π_{Salesperson, Number} Customer, (2) π_{Number} Customer, (3) the divide, (4) the join with Salesperson, (5) the final project.

Why do we use Relational Algebra?
Because:
    It is mathematically defined.
    We can prove that two relational algebra expressions are equivalent.
For example:
    σ_{cond1} (σ_{cond2} R) ≡ σ_{cond2} (σ_{cond1} R) ≡ σ_{cond1 AND cond2} R ≡ (σ_{cond1} R) ∩ (σ_{cond2} R)
    R1 ⋈_{cond} R2 ≡ σ_{cond} (R1 X R2)

Equivalences for AND, OR, and NOT
    σ_{cond1 OR cond2} R ≡ (σ_{cond1} R) ∪ (σ_{cond2} R)
    σ_{cond1 AND cond2} R ≡ (σ_{cond1} R) ∩ (σ_{cond2} R)
    σ_{cond1 AND NOT cond2} R ≡ (σ_{cond1} R) − (σ_{cond2} R)
The WHERE clause (and the predicate for the σ operator) may contain AND, OR, as well as NOT.

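These equivalences can be spot-checked on the Account instance in Python. Note that checking one instance only illustrates the equivalences; it does not prove them for every instance. The select helper and the two conditions below are assumptions chosen for illustration.

```python
# Account schema: (Number, Owner, Balance, Type)
ACCOUNT = {
    (101, "J. Smith", 1000.00, "checking"),
    (102, "W. Wei", 2000.00, "checking"),
    (103, "J. Smith", 5000.00, "savings"),
    (104, "M. Jones", 1000.00, "checking"),
    (105, "H. Martin", 10000.00, "checking"),
}


def select(relation, predicate):
    return {t for t in relation if predicate(t)}


def cond1(t):
    return t[2] < 3000          # Balance < 3000


def cond2(t):
    return t[1] == "J. Smith"   # Owner = 'J. Smith'


r = ACCOUNT

# Cascaded selects commute and equal a single select with AND.
cascade_12 = select(select(r, cond2), cond1)
cascade_21 = select(select(r, cond1), cond2)
with_and = select(r, lambda t: cond1(t) and cond2(t))

# AND, OR, and AND NOT correspond to intersection, union, and difference.
with_or = select(r, lambda t: cond1(t) or cond2(t))
with_and_not = select(r, lambda t: cond1(t) and not cond2(t))

print(cascade_12 == cascade_21 == with_and == select(r, cond1) & select(r, cond2))
print(with_or == select(r, cond1) | select(r, cond2))
print(with_and_not == select(r, cond1) - select(r, cond2))
```

A query optimizer relies on exactly these rewrites being valid on every instance, which is what the algebraic proofs guarantee and a single test run cannot.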
Uses of Relational Algebra Equivalences
To help query writers: they can write queries in several different ways.
To help query optimizers: they can choose among different ways to execute the query.
In both cases we know for sure that the two queries (the original and the replacement) are equivalent, that is, they will produce the same answer on all database instances.

Queries

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Notice that a query is expressed against the schema:
    σ_{Balance > 1000 AND Number = Account} (Account X Deposit)
But the query runs (executes) against the instance (the data), and may give different answers on different instances.

Owner
J. Smith
W. Wei
M. Jones
H. Martin

Comments on Queries

Account
Number  Owner      Balance    Type
101     J. Smith    1000.00   checking
102     W. Wei      2000.00   checking
103     J. Smith    5000.00   savings
104     M. Jones    1000.00   checking
105     H. Martin  10,000.00  checking

Notice that the answer to a query is always a relation!
It doesn't have a name.
The attribute names are taken from the input tables.
It might or might not have any rows.

Owner
J. Smith
W. Wei
M. Jones
H. Martin

Comments on Queries
Because the answer to a relational query is always a table, we can use the answer from one query as input to another query.
This means that we can create arbitrarily complex queries!
A relational query language is closed if it has this property.

Example of Codd's Definition of a Relation
Suppose we have a relation defined as:
    Person(name, salary, num, status)
with domains defined as:
    Name-values = {all possible strings of 30 characters}
    Sal-values = {real numbers between 0 and 100,000}
    Status-values = {'f', 'p'}
    Num-values = {integers between 0 and 9999}
Any instance of the relation is always a subset (⊆) of:
    Name-values X Sal-values X Num-values X Status-values
Note: a domain is a set of simple, atomic values.

Mathematical Definition of a Relational DB (cont.)
Each (instance of a) relation is a subset of the cross product of its domains.
One element of a relation is called a tuple.
A relation is ALWAYS a set by definition.
If you add the element 2 to the set {1, 2, 3, 4}, the resulting set is {1, 2, 3, 4}.
If you add the tuple (101, 'J. Smith', 1000.00, 'checking') to the relation on the next slide, you still only have five tuples.

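Both points, that an instance is a subset of the cross product of its domains and that adding an existing tuple changes nothing, can be tried in Python. This is a sketch: the tiny domains below are hypothetical stand-ins for the much larger domains in the Person example, and the five-tuple relation is the Account instance used throughout these slides.

```python
from itertools import product

# Tiny hypothetical domains standing in for Name-values, Num-values, Status-values.
NAME_VALUES = {"Ann", "Bob"}
NUM_VALUES = {0, 1}
STATUS_VALUES = {"f", "p"}

# The cross product of the domains: every tuple that could ever appear.
all_possible = set(product(NAME_VALUES, NUM_VALUES, STATUS_VALUES))

# One instance of the relation: a subset of the cross product.
instance = {("Ann", 0, "f"), ("Bob", 1, "p")}
is_subset = instance <= all_possible

# A relation is a set, so adding a tuple that is already present has no effect.
ACCOUNT = {
    (101, "J. Smith", 1000.00, "checking"),
    (102, "W. Wei", 2000.00, "checking"),
    (103, "J. Smith", 5000.00, "savings"),
    (104, "M. Jones", 1000.00, "checking"),
    (105, "H. Martin", 10000.00, "checking"),
}
size_before = len(ACCOUNT)
ACCOUNT.add((101, "J. Smith", 1000.00, "checking"))
size_after = len(ACCOUNT)

print(len(all_possible), is_subset)
print(size_before, size_after)
```

Bags, introduced later in the course, drop this last property: a bag would keep both copies of the tuple.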
Can we define tables (to use in relational algebra)?
If you need to define a relation, you could say something like:
    Let R = {('John', 5, 'male'), ('Sue', 6, 'female')}
or let R be the table

Name  Age  Gender
John  5    male
Sue   6    female

and then use R in expressions like R X Student or whatever.