The Relational Data Model Lecture 6 1 Outline Relational Data Model Functional Dependencies Logical Schema Design Reading Chapter 8 2 1
The Relational Data Model Data Modeling Relational Schema Physical storage Have seen this in SQL E/R diagrams Have seen this too Tables: column names: attributes rows: tuples Discuss next Complex file organization and index structures. 3 Terminology Table name or relation name Attribute names Products: Name Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi Tuples or rows or records 4 2
Schemas Relational Schema: Relation name plus attribute names E.g. Product(Name, Price, Category, Manufacturer) In practice we add the domain for each attribute Database Schema Set of relational schemas E.g. Product(Name, Price, Category, Manufacturer), Company(Name, Address, Phone),....... 5 Instances Relational schema = R(A 1,,A k ) Instance = relation with k attributes (of type R) values of corresponding domains Database schema = R 1 ( ), R 2 ( ),, R n ( ) Instance = n relations, of types R 1, R 2,..., R n This is all mathematics, not to be confused with SQL tables! (What's a difference?) 6 3
Example Relational schema:product(name, Price, Category, Manufacturer) Instance: Name Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi 7 First Normal Form (1NF) A database schema is in First Normal Form if all tables are flat Student Student Name Alice Bob Carol GPA 3.8 3.7 3.9 Courses Math DB OS DB OS Math OS May need to add keys Name Alice Bob Carol Takes Student Alice Carol Alice Bob Alice Carol GPA DB OS OS 3.8 3.7 3.9 Course Math Math DB Course Course Math DB OS 8 4
Functional Dependencies A form of constraint hence, part of the schema Finding them is part of the database design Also used in normalizing the relations Warning: this is the most abstract, and hardest part of the course. 9 Definition: Functional Dependencies If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes B 1, B 2,, B m Formally: A 1, A 2,, A n B 1, B 2,, B m 10 5
Examples EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID Name, Phone, Position Position Phone but Phone Position 11 In General To check A B, erase all other columns A X1 X2 B Y1 Y2 check if the remaining relation is many-one (called functional in mathematics) 12 6
Example EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position Phone 13 Typical Examples of FDs Product: name price, manufacturer Person: ssn name, age Company: name stockprice, president 14 7
Example Product(name, category, color, department, price) Consider these FDs: name color category department color, category price What do they say? 15 Example FD s are constraints on relations: On some instances they hold On others they don t name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Does this instance satisfy all the FDs? 16 8
Example name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 What about this one? 17 Example If some FDs are satisfied, then others are satisfied too If all these FDs are true: name color category department color, category price Then this FD also holds: name, category price Why?? 18 9
Inference Rules for FD s A 1, A 2,, A n B 1, B 2,, B m Is equivalent to Splitting rule and Combining rule A 1, A 2,, A n B 1 A 1, A 2,, A n B 2..... A 1, A 2,, A n B m A1... Am B1... Bm 19 Inference Rules for FD s (continued) A 1, A 2,, A n A i Trivial Rule where i = 1, 2,..., n A 1 A m Why? 20 10
Inference Rules for FD s (continued) Transitive Closure Rule If A 1, A 2,, A n B 1, B 2,, B m and B 1, B 2,, B m C 1, C 2,, C p then A 1, A 2,, A n C 1, C 2,, C p Why? 21 A 1 A m B 1 B m C 1... C p 22 11
Example (continued) Start from the following FDs: Infer the following FDs: Inferred FD 4. name, category name 5. name, category color 6. name, category category 7. name, category color, category 8. name, category price 1. name color 2. category department 3. color, category price Which Rule did we apply? 23 Example (continued) Answers: 1. name color 2. category department 3. color, category price Inferred FD 4. name, category name 5. name, category color 6. name, category category 7. name, category color, category 8. name, category price Which Rule did we apply? Trivial rule Transitivity on 4, 1 Trivial rule Split/combine on 5, 6 Transitivity on 3, 7 24 12
Another Example Enrollment(student, major, course, room, time) student major major, course room course time What else can we infer? 25 Another Rule Augmentation If A 1, A 2,, A n B then A 1, A 2,, A n, C 1, C 2,, C p B Augmentation follows from trivial rules and transitivity How? 26 13
Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed? Try all possible FDs, apply all 3 rules E.g. R(A, B, C, D): how many FDs are possible? Drop trivial FDs, drop augmented FDs Still way too many Better: use the Closure Algorithm (next) 27 Closure of a set of Attributes Given a set of attributes A 1,, A n The closure, {A 1,, A n } +, is the set of attributes B s.t. A 1,, A n B Example: name color category department color, category price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color} 28 14
Closure Algorithm Start with X={A1,, An}. Example: Repeat until X doesn t change do: if B 1,, B n C is a FD and B 1,, B n are all in X name color category department color, category price then add C to X. {name, category} + = {name, category, color, department, price} 29 Example R(A,B,C,D,E,F) A, B C A, D E B D A, F B Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } 30 15
Using Closure to Infer ALL FDs Example: A, B C A, D B B D Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD s X Y, s.t. Y X + and X Y = : AB CD, AD BC, ABC D, ABD C, ACD B 31 Problem: Finding FDs Approach 1: During Database Design Designer derives them from real-world knowledge of users Problem: knowledge might not be available Approach 2: From a Database Instance Analyze given database instance and find all FD s satisfied by that instance Useful if designers don t get enough information from users Problem: FDs might be artifical for the given instance 32 16
Find All FDs Student Dept Course Room Alice CSE C++ 020 Bob Alice Carol CSE EE CSE C++ HW DB 020 040 045 Do all FDs make sense in practice? Dan CSE Java 050 Elsa CSE DB 045 Frank EE Circuits 020 33 Answer Course Dept, Room Dept, Room Course Student, Dept Course, Room Student, Course Dept, Room Student, Room Dept, Course Do all FDs make sense in practice? 34 17
Keys A key is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n B A minimal key is a set of attributes which is a key and for which no subset is a key Note: book calls them superkey and key 35 Computing Keys Compute X + for all sets X If X + = all attributes, then X is a key List only the minimal keys Note: there can be many minimal keys! Example: R(A,B,C), AB C, BC A Minimal keys: AB and BC 36 18
Examples of Keys Product(name, price, category, color) name, category price category color Keys are: {name, category} and all supersets Enrollment(student, address, course, room, time) student address room, time course student, course room, time 37 Relational Schema Design (or Logical Schema Design) Main idea: Start with some relational schema Find out its FD s Use them to design a better relational schema 38 19
Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Update anomalies: need to change in several places Delete anomalies: may lose data when we don t want 39 Relational Schema Design Example: Persons with several phones Name Fred SSN 123-45-6789 PhoneNumber 206-555-1234 City Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield SSN Name, City but not SSN PhoneNumber Anomalies: Redundancy = repeat data Update anomalies = Fred moves to Bellevue Deletion anomalies = Joe deletes his phone number: what is his city? 40 20
Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name Fred Joe SSN 123-45-6789 987-65-4321 City Seattle Westfield SSN Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone number (how?) 123-45-6789 123-45-6789 987-65-4321 PhoneNumber 206-555-1234 206-555-6543 908-555-2121 41 Relational Schema Design Conceptual Model: name Product buys Person price name ssn Relational Model: plus FD s Normalization: Eliminates anomalies 42 21
Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p 43 Decomposition Sometimes it is correct: Name Gizmo OneClick Gizmo Price 19.99 24.99 19.99 Category Gadget Camera Camera Name Price Name Category Gizmo 19.99 Gizmo Gadget OneClick 24.99 OneClick Camera Gizmo 19.99 Gizmo Camera Lossless decomposition 44 22
Incorrect Decomposition Sometimes it is not: Name Price Category Gizmo OneClick Gizmo 19.99 24.99 19.99 Gadget Camera Camera What s incorrect?? Name Gizmo OneClick Gizmo Category Gadget Camera Camera Lossy decomposition Price 19.99 24.99 19.99 Category Gadget Camera Camera 45 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) If A 1,..., A n B 1,..., B m Then the decomposition is lossless Note: don t need necessarily A 1,..., A n C 1,..., C p Example: name price, hence the first decomposition is lossless 46 23
Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others... 47 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: A relation R is in BCNF if: If A 1,..., A n B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R. 48 24
BCNF Decomposition Algorithm Repeat choose A 1,, A m B 1,, B n that violates the BNCF condition split R into R 1 (A 1,, A m, B 1,, B n ) and R 2 (A 1,, A m, [others]) continue with both R 1 and R 2 Until no more violations B s A s Others Is there a 2-attribute relation that is not in BCNF? R 1 R 2 49 Example Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield What are the dependencies? SSN Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF? 50 25
Decompose it into BCNF Name Fred SSN 123-45-6789 City Seattle SSN Name, City Joe 987-65-4321 Westfield SSN 123-45-6789 123-45-6789 987-65-4321 987-65-4321 PhoneNumber 206-555-1234 206-555-6543 908-555-2121 908-555-1234 Let s check anomalies: Redundancy? Update? Delete? 51 Summary of BCNF Decomposition Find a dependency that violates the BCNF condition: Heuristics: choose B, B, B as large as possible 1 2 m Decompose: A 1, A 2,, A n B 1, B 2,, B m Others R1 A s B s R2 Continue until there are no BCNF violations left. 2-attribute relations are BCNF 52 26
Example Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN name, age age haircolor Decompose in BCNF (in class): Step 1: find all keys (How? Compute S +, for various sets S) Step 2: now decompose 53 Other Example R(A,B,C,D) A B, B C Key: AD Violations of BCNF: A B, A C, A BC Pick A BC: split into R1(A,BC) R2(A,D) What happens if we pick A B first? 54 27
Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) Decompose R1(A,B) R2(A,C) Recover R (A,B,C) should be the same as R(A,B,C) R is in general larger than R. Must ensure R = R 55 Lossless Decompositions Given R(A,B,C) s.t. A B, the decomposition into R1(A,B), R2(A,C) is lossless 56 28
3NF: A Problem with BCNF Unit Company Product FD s: Unit Company; Company, Product Unit So, there is a BCNF violation, and we decompose. Unit Company Unit Company Unit Product No FDs Notice: we loose the FD: Company, Product Unit 57 So What s the Problem? Unit Company Unit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD s are satisfied. Let s put all the data back into a single table again (anomalies?): Unit Company Product Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit! 58 29
Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies 59 30