Functional Dependencies

Redundancy in Database Design
A table Students-take-courses(stud-id, name, address, phone, crs-id, instructor-name, office)
  Students(stud-id, name, address, phone, ...)
  Instructors(name, office, ...)
Redundant information
  If a student takes 20 courses, her/his name, address, and phone number have to be repeated 20 times
  If an instructor teaches 2 courses with 120 students in total, her/his office number is repeated 120 times
CMPT 354: Database I -- Functional Dependencies

Why Could Redundancy Be Bad?
Space cost
Maintenance overhead
  If a student updates her/his address, 20 records need to be updated
  If an instructor moves to a new office, 120 records need to be updated
  What if an inconsistency arises during the update?

Why Could Redundancy Be Good?
Students-take-courses(stud-id, name, crs-id)
  Student name is redundant if we have the table Students(stud-id, name, address, phone, ...)
  Only Students-take-courses(stud-id, crs-id) is needed
What if we often need to generate class rosters?
  Fast query answering: keeping the name avoids joining the two tables many times

Requirements of Good Design
Correctness: no information loss
  Must be guaranteed
Efficiency
  Minimal (or as little as possible) redundant (repeated) information
  Good performance with respect to the (expected) typical workload
  May have to trade off between space and query answering time
    Redundant information may help query answering

Atomic Domains
A domain is atomic if its elements are considered to be indivisible units
  Example: Course-id consisting of department code and course number, e.g., CMPT 354
  Bad examples: all accounts of a customer, all owners of an account
Non-atomic values complicate storage and query answering, and encourage redundant (repeated) storage of data
  Storage and redundancy: a set of accounts stored with each customer, and a set of owners stored with each account

First Normal Form
Normal form: a quality criterion that the database design should meet
A relational schema R is in first normal form if the domains of all attributes of R are atomic
  All relations are assumed to be in first normal form
Atomicity is a property of how the elements of the domain are used
  Strings would normally be considered indivisible
  Course-id is not atomic if its two encoded pieces of information (department code and course number) are used separately

Combine Schemas?
Combine borrower and loan to get bor_loan = (customer_id, loan_number, amount)
The result is possible repetition of information (for example, loan L-100 appearing in more than one tuple)
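
The repetition can be seen on a tiny instance. The sketch below is only illustrative: the customer ids and the amount are made-up values (only the loan number L-100 comes from the slide), and the relations are modelled as plain Python lists and dicts.

```python
# Hypothetical data: two customers share loan L-100.
borrower = [("C-1", "L-100"), ("C-2", "L-100")]   # (customer_id, loan_number)
loan = {"L-100": 10000}                           # loan_number -> amount

# Combine the two relations into bor_loan = (customer_id, loan_number, amount).
bor_loan = [(cid, ln, loan[ln]) for cid, ln in borrower]

for row in bor_loan:
    print(row)

# The amount of L-100 is stored once in loan but once per borrower in bor_loan.
copies = sum(1 for _cid, ln, _amt in bor_loan if ln == "L-100")
print(copies)
```

Every additional borrower of the same loan adds another copy of the amount.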

Why Decomposition?
Suppose we had started with bor_loan. How would we know to split it up (decompose it) into borrower and loan?
Write a rule: "if there were a schema (loan_number, amount), then loan_number would be a candidate key"

Why Decomposition? (continued)
Denote the rule as a functional dependency: loan_number → amount
In bor_loan, because loan_number is not a candidate key, the amount of a loan may have to be repeated
  This indicates the need to decompose bor_loan

Combined Schema w/o Repetition
Consider combining loan_branch and loan
  loan_amt_br = (loan_number, amount, branch_name)
No repetition

Decomposition Is Not Always Good
Suppose we decompose employee into
  employee1 = (employee_id, employee_name)
  employee2 = (employee_name, telephone_number, start_date)
We cannot reconstruct the original employee relation if there are two employees having the same name

A Lossy Decomposition
Getting more tuples after rejoining the tables counts as a loss of information, not a gain: we can no longer tell which of the rejoined tuples were in the original relation
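
A minimal sketch of the lossy decomposition, assuming a made-up employee instance in which two employees share a name (all values below are hypothetical). Relations are modelled as Python sets of tuples and the natural join is written out by hand.

```python
# Hypothetical instance: two different employees are both named "Kim".
# Schema: (employee_id, employee_name, telephone_number, start_date)
employee = {
    (1, "Kim", "555-0101", "2020-01-15"),
    (2, "Kim", "555-0202", "2021-06-01"),
}

# Decompose as on the slide.
employee1 = {(eid, name) for (eid, name, _tel, _start) in employee}
employee2 = {(name, tel, start) for (_eid, name, tel, start) in employee}

# Natural join of employee1 and employee2 on employee_name.
rejoined = {
    (eid, name1, tel, start)
    for (eid, name1) in employee1
    for (name2, tel, start) in employee2
    if name1 == name2
}

# Tuples that the join produces but that were never in the original relation.
spurious = rejoined - employee

print(len(employee), len(rejoined))  # 2 4
print(sorted(spurious))
```

The rejoined relation contains every original tuple plus two spurious ones, so the original cannot be recovered from it.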

Designing by Decomposition
Start from a wide table, the universal table
  Containing all pieces of information
Decide whether a particular relation R is in good form
If a relation R is not in good form, decompose it into a set of relations {R1, R2, ..., Rn} such that
  Each relation is in good form
  The decomposition does not lose information

Functional Dependencies
Constraints on the set of legal relations
Require that the value for a certain set of attributes uniquely determines the value for another set of attributes
A functional dependency is a generalization of the notion of a key

Functional Dependencies (definition)
Let R be a relation schema, α ⊆ R and β ⊆ R
The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β
  t1[α] = t2[α] ⇒ t1[β] = t2[β]

Example
Consider r(A, B) with the following instance of r:
  A  B
  1  4
  1  5
  3  7
On this instance, A → B does NOT hold, but B → A does hold
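
The definition can be checked mechanically on a single instance. The following is a small sketch, not part of the course material: the function name satisfies and the list-of-dicts representation are choices made here for illustration.

```python
def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}  # maps each lhs value to the first rhs value seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores val on first sight of key, otherwise returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True

# The instance of r(A, B) from the example above.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(satisfies(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(satisfies(r, ["B"], ["A"]))  # True: every B value appears with one A value
```

Such a check only shows whether one instance satisfies a dependency; whether the dependency holds on the schema is a statement about all legal instances.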

Superkeys and Candidate Keys
K is a superkey for relation schema R if and only if K → R
K is a candidate key for R if and only if
  K → R, and
  for no α ⊂ K, α → R

Dependencies and Constraints
Functional dependencies can express constraints that cannot be expressed using superkeys
Consider the schema bor_loan = (customer_id, loan_number, amount)
  We expect loan_number → amount to hold
  We do not expect amount → customer_id to hold

Use of Functional Dependencies
Testing relations to see if they are legal under a given set of functional dependencies
  If a relation r is legal under a set F of functional dependencies, we say that r satisfies F
Specifying constraints on the set of legal relations
  We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F
A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances
  For example, a specific instance of loan may, by chance, satisfy amount → customer_name

Trivial Functional Dependencies
A functional dependency is trivial if it is satisfied by all instances of a relation
Examples:
  customer_name, loan_number → customer_name
  customer_name → customer_name
In general, α → β is trivial if β ⊆ α

Closure
A set of functional dependencies may logically imply other functional dependencies
  If A → B and B → C, then A → C
The set of all functional dependencies logically implied by F is the closure of F
  We denote the closure of F by F+
  F+ is a superset of F

Armstrong's Axioms
Finding F+
  (reflexivity) If β ⊆ α, then α → β
  (augmentation) If α → β, then γα → γβ
  (transitivity) If α → β and β → γ, then α → γ
These rules are
  Sound: they generate only functional dependencies that actually hold
  Complete: they generate all functional dependencies that hold

Example
R = (A, B, C, G, H, I)
F = { A → B, A → C, CG → H, CG → I, B → H }
Some members of F+:
  A → H
    By using transitivity from A → B and B → H
  AG → I
    By augmenting A → C with G to get AG → CG, and then using transitivity with CG → I
  CG → HI
    By augmenting CG → I to infer CG → CGI, augmenting CG → H to infer CGI → HI, and then using transitivity

Procedure for Computing F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
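
The procedure can be sketched in Python for very small schemas. This is an illustrative brute-force version, not the course's reference implementation: the names subsets and fd_closure are invented here, attributes are assumed to be single characters, and reflexivity is applied once up front over all subsets of R rather than per dependency. The number of candidate dependencies grows exponentially with the number of attributes, so it is only practical for a handful of attributes.

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of attrs (including the empty set) as frozensets."""
    items = sorted(attrs)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def fd_closure(schema, fds):
    """Compute F+ as a set of (lhs, rhs) frozenset pairs using Armstrong's axioms."""
    all_subsets = subsets(schema)
    closure = {(frozenset(a), frozenset(b)) for a, b in fds}
    # Reflexivity: alpha -> beta whenever beta is a subset of alpha.
    for alpha in all_subsets:
        for beta in subsets(alpha):
            closure.add((alpha, beta))
    changed = True
    while changed:
        new = set()
        # Augmentation: from alpha -> beta infer (gamma alpha) -> (gamma beta).
        for alpha, beta in closure:
            for gamma in all_subsets:
                new.add((alpha | gamma, beta | gamma))
        # Transitivity: from alpha -> beta and beta -> gamma infer alpha -> gamma.
        by_lhs = {}
        for lhs, rhs in closure:
            by_lhs.setdefault(lhs, set()).add(rhs)
        for alpha, beta in closure:
            for gamma in by_lhs.get(beta, ()):
                new.add((alpha, gamma))
        changed = not new <= closure  # stop once nothing new is derived
        closure |= new
    return closure

# Small example: R = (A, B, C), F = {A -> B, B -> C}.
f_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print((frozenset("A"), frozenset("C")) in f_plus)  # True, by transitivity
print((frozenset("C"), frozenset("A")) in f_plus)  # False, not implied by F
```

Because the axioms are sound and complete, the fixpoint contains exactly the dependencies implied by F over the given schema.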

Auxiliary Rules
We can further simplify manual computation of F+ by using the following additional rules
  (union) If α → β holds and α → γ holds, then α → βγ holds
  (decomposition) If α → βγ holds, then α → β holds and α → γ holds
  (pseudotransitivity) If α → β holds and γβ → δ holds, then αγ → δ holds
The above rules can be inferred from Armstrong's axioms

Summary
First normal form
Decomposition in database design
Functional dependencies
Armstrong's axioms and auxiliary rules for closure computation

To-Do List
Please prove the auxiliary rules using Armstrong's axioms