Normalization 03 CSE3421 notes
Example
F:
A → B (1)
ABCD → E (2)
EF → G (3)
EF → H (4)
ACDF → EG (5)
Calculate the minimal cover of F.

Step 1: Put F in standard form
FDs (1) to (4) are already in standard form. FD (5) splits into:
ACDF → E (5.1)
ACDF → G (5.2)
Step 2: Eliminate extraneous attributes from LHSs (minimize LHSs)
(1) A → B: nothing to eliminate.
(2) ABCD → E. If we delete A, we get BCD → E. Is this LHS good enough? It is if either BCD → E, or BCD → W such that W contains ABCD (the original LHS).
Note that (BCD)+ = BCD. Therefore, A cannot be deleted.

Can we delete B? If so, we get ACD → E. Is this LHS good enough? It is if either ACD → E, or ACD → W such that W contains ABCD.
(ACD)+: ACD, then ABCD by (1). Therefore ACD → W = ABCD, and B can be eliminated.
So ABCD → E becomes ACD → E.

Can we delete any more, i.e., delete C or D?
If we delete C, then ACD → E becomes AD → E. Test: (AD)+: AD, then ABD by (1), which contains neither E nor ACD. Therefore, C cannot be deleted. Similarly, D cannot be deleted.
Since we have scanned the entire LHS of this FD, step 2 is finished for this FD. The resulting FD, ACD → E (2.1), replaces (2) of the original F.
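The closure test used above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names `closure` and `is_extraneous` and the encoding of FDs as pairs of strings are assumptions made for the example. An attribute is treated as extraneous when the closure of the reduced LHS contains the RHS, which is equivalent to the "contains the original LHS" test on the slide.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its whole LHS is already derived and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_extraneous(attr, lhs, rhs, fds):
    """True if `attr` can be dropped from the LHS of lhs -> rhs without changing F+."""
    reduced = set(lhs) - {attr}
    return set(rhs) <= closure(reduced, fds)


# F after step 1 (standard form), one character per attribute.
F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"),
     ("ACDF", "E"), ("ACDF", "G")]

print(sorted(closure("BCD", F)))            # closure of BCD
print(is_extraneous("A", "ABCD", "E", F))   # can A be deleted from ABCD -> E?
print(is_extraneous("B", "ABCD", "E", F))   # can B be deleted from ABCD -> E?
```

Running the checks for A and B reproduces the two conclusions reached by hand for FD (2).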
Repeat the above process for FDs (3), (4), (5.1), (5.2).
For (3): EF → G.
1. Can we delete E? If so, we get F → G. Not possible (check).
2. Can we delete F? If so, we get E → G. Not possible (check).
Therefore, there is no change in (3).

For (4): EF → H. Again, nothing can be deleted from the LHS.
For (5.1) [ACDF → E]: Delete A? Then CDF → E, or CDF → W that contains ACDF? (CDF)+ = CDF, which contains neither E nor ACDF. A cannot be deleted.
For (5.1) [ACDF → E]: Delete C? Then ADF → E, or ADF → W that contains ACDF? (ADF)+: ADF, then ABDF by (1), which does not contain E or ACDF. C cannot be deleted.
For (5.1) [ACDF → E]: Delete D? Then ACF → E, or ACF → W that contains ACDF? (ACF)+: ACF, then ABCF by (1), which does not contain E or ACDF. D cannot be deleted.

For (5.1) [ACDF → E]: Delete F? Then ACD → E, or ACD → W that contains ACDF? (ACD)+: ACD, then ABCD by (1), then ABCDE by (2), which contains E. Therefore, F can be deleted from the LHS of (5.1), and (5.1) becomes ACD → E (5.1.1).
Step 2 is finished for (5.1). On to (5.2).

ACDF → G (5.2): repeat the above process and find that nothing can be eliminated from the LHS of (5.2).
Step 2 of the minimal cover computation is now finished (we have minimized the LHSs of all FDs).
The resulting FDs from step 2 (with the original FD each one came from):
A → B (1)           from (1)
ACD → E (2.1)       from (2)
EF → G (3)          from (3)
EF → H (4)          from (4)
ACD → E (5.1.1)     from (5); same as (2.1)
ACDF → G (5.2)      from (5)
Now proceed to step 3: Eliminate redundant FDs
Start top-to-bottom (we can also start bottom-to-top, but may get a different result, although still a minimal cover).
Is (1) A → B redundant? If so, then the rest of the FDs must produce A → B. Observe that if A → B is deleted, then A can generate only A (A+ = A), and thus A → B cannot be deleted.

Is (2) [ACD → E] redundant? (ACD)+ = ? ACD, then ABCD by (1) [A → B]. Since ABCD does not contain E, we conclude that ACD → E is not redundant.

Is (3) [EF → G] redundant? (EF)+ = ? EF, then EFH by (4) [EF → H]. Since EFH does not contain G, we conclude that EF → G is not redundant.

Is (4) [EF → H] redundant? (EF)+ = ? EF, then EFG by (3) [EF → G]. EFG does not contain H, so EF → H is not redundant.

Is (5) [ACDF → G] redundant? (ACDF)+ = ? ACDF, then ABCDF by (1) [A → B], then ABCDFE by (2) [ACD → E], then ABCDFEG by (3) [EF → G]. Since ABCDFEG contains G, ACDF → G is redundant, so we can eliminate this FD.
This is the end of step 3.
The minimal cover of F is:
A → B (1)
ACD → E (2)
EF → G (3)
EF → H (4)
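The three steps worked through above can be put together in one short Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the name `minimal_cover`, the string encoding of FDs, the alphabetical scan of LHS attributes in step 2, and the top-to-bottom scan in step 3 are all choices made for this example. A different scan order may return a different, equally valid, minimal cover.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, one attribute on every RHS.
    work = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]

    # Step 2: remove extraneous attributes from each LHS.
    for i in range(len(work)):
        lhs, rhs = work[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
                work[i] = (lhs, rhs)

    # FDs that became identical (such as (2.1) and (5.1.1)) are kept once.
    work = list(dict.fromkeys(work))

    # Step 3: remove redundant FDs, scanning top to bottom.
    i = 0
    while i < len(work):
        lhs, rhs = work[i]
        rest = work[:i] + work[i + 1:]
        if rhs <= closure(lhs, rest):
            work = rest          # derivable from the others: drop it
        else:
            i += 1
    return work


def show(fds):
    """Render FDs as readable strings such as 'ACD -> E'."""
    return ["".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in fds]


F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "EG")]
print(show(minimal_cover(F)))
```

On the example F this yields the same four FDs as the hand computation.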
Another example of minimal cover
F:
ABH → C (1)
A → D (2)
C → E (3)
BGH → F (4)
F → AD (5)
E → F (6)
BH → E (7)
Find a minimal cover for F.

Step 1: Put F in standard form
All FDs are already in standard form, except (5). FD (5) F → AD splits into:
F → A (5.1)
F → D (5.2)
Step 2: Eliminate redundant attributes from LHSs
Check FD (1): ABH → C
Is A redundant? If so, then BH → C, or BH → W that contains ABH.
(BH)+: BH, then BHE by (7) [BH → E], then BHEF by (6) [E → F], then BHEFAD by (5) [F → AD].
Since BHEFAD contains ABH, A is redundant. ABH → C becomes BH → C (1.1).

In BH → C, is B redundant? No (check). Is H redundant? No (check).
Check FD (2) A → D: nothing can be redundant.
Check FD (3) C → E: nothing can be redundant.

Check FD (4): BGH → F
Is B redundant? If so, then GH → F, or GH → W that contains BGH. (GH)+ = GH; GH is the only thing it can derive. Therefore, B is not redundant.
Is G redundant? (BH)+: BH, then BHC by (1.1) [BH → C], then BHCE by (3) [C → E], then BHCEF by (6) [E → F], which contains F. G is redundant.
Note: upon eliminating G here, attribute G is completely lost from the set of FDs.
Is H redundant? No (check).
Therefore, BGH → F becomes BH → F (4.1).

Check FDs F → AD (5) [F → A (5.1), F → D (5.2)], E → F (6), BH → E (7): nothing redundant.
Step 3: Eliminate redundant FDs
Check BH → C (1.1). Is this FD redundant? If so, then BH → W that contains C.
(BH)+: BH, then BHE by (7) [BH → E], then BHEF by (6) [E → F], then BHEFAD by (5.1), (5.2) [F → A, F → D].
BHEFAD does not contain C. So BH → C is not redundant.

Is (2) A → D redundant? A+ = A only. Not redundant.

Is (3) C → E redundant? C+ = C only. Not redundant.
Is (4.1) BH → F redundant? (BH)+: BH, then BHC by (1.1) [BH → C], then BHCE by (3) [C → E], then BHCEF by (6) [E → F], which contains F. Therefore, BH → F is redundant. Eliminate BH → F.

Is (5.1) F → A redundant? F+: F, then {F, D} by (5.2) [F → D], which does not contain A. F → A is not redundant.

Is (5.2) F → D redundant? F+: F, then FA by (5.1) [F → A], then FAD by (2) [A → D], which contains D. F → D is redundant.

Is (6) E → F redundant? E+ = E only. E → F is not redundant.

Is (7) BH → E redundant? (BH)+: BH, then BHC by (1.1) [BH → C], then BHCE by (3) [C → E], which contains E. BH → E is redundant. Eliminate BH → E.

The minimal cover is:
BH → C (1)
A → D (2)
C → E (3)
F → A (4)
E → F (5)
Note: the minimal cover does not involve the original attribute G (i.e., G is lost).
Cost of calculating the minimal cover: O(mc(F)) = ?
Assume F has k FDs.
Assume F has N attributes (over all FDs).
O(mc) = O(step 1) + O(step 2) + O(step 3)

Cost of step 1: O(step 1)
Put each FD in standard form.
We have to scan the entire FD: O(N) for each FD, so k * O(N) for all FDs.

Cost of step 2: O(step 2)
Eliminate redundant attributes from LHSs.
For each FD X → Y, and for each attribute of the FD: the cost to check whether that attribute can be eliminated from the LHS equals the cost to calculate the closure of a set of attributes, call it cc. [cc can be around O(N^2) or O(N^3).]
Therefore, for each FD the cost is O(N) * cc, and for all FDs the cost is k * O(N) * cc.

Cost of step 3: O(step 3)
Eliminate redundant FDs.
For each FD X → Y, check if this FD is redundant. The FD is redundant if X+ contains Y, using the other FDs. The cost to calculate that is cc for each FD.
Therefore, for all FDs, the cost is k * cc.
Total cost (cost of minimal cover calculation)
O(mc) = O(step 1) + O(step 2) + O(step 3)
      = k * O(N) + k * O(N) * cc + k * cc
      = O(k * N * cc)
      = O(k * N^3), assuming that cc = O(N^2), or O(k * N^4), assuming that cc = O(N^3).

Cost on the average
On the average, we assume that the attributes are equally distributed among the FDs and also between the LHS and RHS of each FD. Then each RHS (and each LHS) has N/(2k) attributes, and the cost of MC is:
O(k * N/(2k) * cc) = O(cc * N/2) = O(N * cc) = O(N^3) or O(N^4).
Non-uniqueness of minimal covers
When calculating an MC, the outcome of steps 2 and 3 may depend on the order in which we test the candidates for removal (both attributes and FDs), i.e., a set of FDs may have several minimal covers.

Example: Find the MC
F:
A → B (1)
B → C (2)
C → A (3)
A → C (4)
C → B (5)
B → A (6)

Step 1: Put F in standard form
Nothing to do. It is already in standard form.

Step 2: Eliminate redundant attributes from LHSs
Nothing to do. LHSs are already minimal.
Step 3: Eliminate redundant FDs
Start from the bottom of F.
Is (6) B → A redundant? B+: B, then BC by (2) [B → C], then BCA by (3) [C → A], which contains A. Therefore, B → A is redundant.
Is (5) C → B redundant? C+: C, then CA by (3) [C → A], then CAB by (1) [A → B], which contains B. Therefore, C → B is redundant.
Is (4) A → C redundant? A+: A, then AB by (1) [A → B], then ABC by (2) [B → C], which contains C. Therefore, A → C is redundant.

Is (3) C → A redundant? C+ = C. Therefore, C → A is not redundant.
Is (2) B → C redundant? B+ = B. Therefore, B → C is not redundant.
Is (1) A → B redundant? A+ = A. Therefore, A → B is not redundant.

The minimal cover is:
A → B (1)
B → C (2)
C → A (3)
Now process from the top of F, for step 3.
Is (1) A → B redundant? A+: A, then AC by (4) [A → C], then ACB by (5) [C → B], which contains B. Therefore, A → B is redundant.
Is (2) B → C redundant? B+: B, then BA by (6) [B → A], then BAC by (4) [A → C], which contains C. Therefore, B → C is redundant.
Is (3) C → A redundant? C+: C, then CB by (5) [C → B], then CBA by (6) [B → A], which contains A. Therefore, C → A is redundant.

Is (4) A → C redundant? A+ = A. Therefore, A → C is not redundant.
Is (5) C → B redundant? C+ = C. Therefore, C → B is not redundant.
Is (6) B → A redundant? B+ = B. Therefore, B → A is not redundant.

The minimal cover is:
A → C (4)
C → B (5)
B → A (6)
(different from the minimal cover obtained when we process the FDs bottom-to-top).
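The order dependence of step 3 is easy to reproduce. The following sketch runs only the redundancy-elimination step over the six FDs of this example, once top-to-bottom and once bottom-to-top. The helper name `drop_redundant` and its `order` parameter are assumptions introduced for the illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def drop_redundant(fds, order):
    """Step 3 only: test the FDs for redundancy in the given index order."""
    kept = list(fds)
    for fd in [fds[i] for i in order]:
        lhs, rhs = fd
        rest = [g for g in kept if g != fd]
        # The FD is redundant if the remaining FDs still derive its RHS.
        if set(rhs) <= closure(lhs, rest):
            kept = rest
    return kept


F = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"), ("C", "B"), ("B", "A")]

top_down = drop_redundant(F, range(len(F)))
bottom_up = drop_redundant(F, reversed(range(len(F))))
print(top_down)
print(bottom_up)
```

The two runs keep different sets of three FDs, and each set is a minimal cover of F.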
Another minimal cover example (try it as an exercise)
F: AB → C, C → A, BC → D, ACD → B, D → E, D → G, BE → C, CG → B, CG → D, CE → A, CE → G
MC 1: AB → C, C → A, BC → D, CD → B, D → E, D → G, BE → C, CG → D, CE → G
MC 2: AB → C, C → A, BC → D, D → E, D → G, BE → C, CG → B, CE → G
Where are we
X+: closure of a set of attributes
F+: closure of a set of FDs
MC: minimal cover of F
Preservation of dependencies
Lossless join property
Algorithm to compute 3NF

3NF synthesis algorithm
Given a relation R, find a decomposition of R that:
1) Is in 3NF
2) Has the lossless join property
3) Preserves dependencies

Steps for 3NF synthesis
Given a set of FDs F:
I. Construct a minimal cover of F.
II. Group FDs with the same LHS. All FDs with the same LHS become one FD with that LHS and, as its RHS, the union of all their RHSs.
III. Add a key relation, if necessary. If X is a key for R and X is not contained in a relation resulting from steps I and II, then add the relation X (this guarantees lossless join).

In the 3NF synthesis algorithm:
Step I + Step II give 3NF + preserved dependencies.
Step III gives 3NF + preserved dependencies + lossless join property.
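Steps II and III can be sketched as follows, assuming the minimal cover from step I is already available. The function names (`synthesize_3nf`, `candidate_key`) and the string encoding of attributes are hypothetical, chosen for the example. The key is found by shrinking the full attribute set, which returns one key, not all of them.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_key(attrs, fds):
    """Shrink the full attribute set to one key of R."""
    key = set(attrs)
    for a in sorted(attrs):
        # Drop `a` if the remaining attributes still determine everything.
        if closure(key - {a}, fds) >= set(attrs):
            key.discard(a)
    return key


def synthesize_3nf(attrs, min_cover):
    # Step II: group FDs with the same LHS; each group becomes one relation.
    groups = {}
    for lhs, rhs in min_cover:
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    relations = [set(lhs) | rhs for lhs, rhs in groups.items()]

    # Step III: add a key relation if no relation already contains a key.
    if not any(closure(r, min_cover) >= set(attrs) for r in relations):
        relations.append(candidate_key(attrs, min_cover))
    return relations


def names(relations):
    """Render relations as sorted attribute strings for easy comparison."""
    return sorted("".join(sorted(r)) for r in relations)


# R = (A, B, C) with minimal cover A -> B, C -> B
print(names(synthesize_3nf("ABC", [("A", "B"), ("C", "B")])))
```

A relation contains a key exactly when its closure covers all attributes of R, which is the test used in step III of the sketch.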
Example of 3NF synthesis
F:
A → B (1)
C → B (2)

I. Construct MC
Step 1: standard form. FDs are already in standard form.
Step 2: eliminate redundant attributes. There are none to eliminate.
Step 3: eliminate redundant FDs. There are none to eliminate.
The minimal cover is F itself.

II. Combine FDs with the same LHS. Already done (there are none to combine).
III. Add key relation, if necessary. AC is a key in R, so add relation AC.

Result
The resulting decomposition is (R1, R2, R3):
R1 (A, B): A → B
R2 (B, C): C → B
R3 (A, C): no FDs
Another example of 3NF synthesis
F:
AB → C (1)
A → B (2)
B → A (3)

3NF synthesis example (I). Construct MC
Step 1: standard form. FDs are already in standard form.

3NF synthesis example (I). Construct MC
Step 2: eliminate redundant attributes
Check AB → C. Is A redundant? If so, then B → C, or B → W that contains AB.
B+: B, then BA by (3), which contains AB. A is redundant.
AB → C becomes B → C (1.1).
Check FDs (2) and (3): nothing redundant there.
F becomes:
B → C (1.1)
A → B (2)
B → A (3)

3NF synthesis example (I). Construct MC
Step 3: eliminate redundant FDs
Is (1.1) B → C redundant? If so, then (2) and (3) can produce an equivalent.
B+: B, then BA by (3), which does NOT contain C. Therefore, (1.1) B → C is not redundant, and no other FD is redundant either (check).
Done with the MC computation. The MC is:
B → C (1)
A → B (2)
B → A (3)

3NF synthesis example (II). Combine FDs with the same LHS
The MC is:
B → C (1)
A → B (2)
B → A (3)
The new F:
(1) & (3): B → AC
(2): A → B

3NF synthesis example (III). Add relation with key X, if necessary
This is not necessary here, since B is a key and B → AC forms a relation already.

The resulting schema is (R1, R2):
R1 (A, B, C): B → AC
R2 (A, B): A → B
(notice the redundancy!)
Another example of 3NF synthesis (from a previous example)
F:
ABH → C (1)
A → D (2)
C → E (3)
BGH → F (4)
F → AD (5)
E → F (6)
BH → E (7)
Perform 3NF synthesis on the above F.

(I). Construct MC
Done before. The MC is:
BH → C (1)
A → D (2)
C → E (3)
F → A (4)
E → F (5)
Note: the minimal cover does not involve the original attribute G (i.e., G is lost).

(II). Combine FDs with the same LHS
Nothing to do.

(III). Add key relation, if necessary
Note that none of (1) to (5) of the MC contains enough attributes to form a key for the entire set of attributes (since attribute G was lost along the way). The closest such FD is BH → C, since (BH)+ = ABCDEFH: all attributes are derived, except G.
Note that BGH is a key for the entire set of attributes (including attribute G). So we add BGH as the key relation.

The resulting schema is (R1, R2, R3, R4, R5, R6):
R1 (B, C, H): BH → C
R2 (A, D): A → D
R3 (C, E): C → E
R4 (A, F): F → A
R5 (E, F): E → F
R6 (B, G, H): no FDs
Where are we
X+: closure of a set of attributes
F+: closure of a set of FDs
MC: minimal cover of F
Preservation of dependencies
Lossless join property
Algorithm to compute 3NF

BCNF (Boyce-Codd Normal Form) (a.k.a. 3 ½ NF)
A relation R with F is in BCNF if, for any FD X → A in F such that A is not contained in X, X is a superkey of R (i.e., X is or contains a key of R).
In other words, the only non-trivial dependencies are those in which a key functionally determines one or more other attributes.
Example 1
R = (C, S, Z)
C: city, S: state, Z: zip code
F:
CS → Z
Z → C
R is NOT in BCNF, since Z → C violates BCNF: Z → C is in F, C is not contained in Z, and Z is NOT a superkey of R (and hence not a key of R either).

Example 2
Person = (SSN, name, address, hobby) [key :: SSN, hobby]
F:
SSN → name (1)
SSN, hobby → name, address (2)
(FD (1) is given, FD (2) is derived from relation Person.)
FD (1) violates BCNF since SSN is not a key. Person is not in BCNF.

Example 3
HasAccount = (acc#, clientid, officeid) [key :: clientid, officeid]
F:
acc# → officeid (1)
clientid, officeid → acc# (2)
(FD (1) is given, FD (2) is derived from relation HasAccount.)
FD (1) violates BCNF since acc# is not a key. HasAccount is not in BCNF.

Example 4
Members = (name, address, balance) [key :: name]
Orders = (order#, name, item, qty) [key :: order#]
F:
name → address, balance (1) in Members
order# → name, item, qty (2) in Orders
This schema is in BCNF.
How to decompose R into BCNF?
Assume R is not in BCNF. Then there is an FD X → A that violates BCNF.
Decompose R into R − A and XA; repeat if R − A and/or XA is not in BCNF.
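The decomposition rule can be sketched recursively. This is a simplified illustration, not a complete algorithm: `project` is an assumed helper that keeps only those FDs of F whose attributes all lie in the sub-relation (a full algorithm would project F+ instead), and the first violating FD found in list order is the one used for the split. The FDs below are those of the course/teacher example (C, T, H, R, S, G), listed with CS → G first.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project(fds, rel):
    """Simplification: keep the FDs of F that mention only attributes of `rel`."""
    return [(l, r) for l, r in fds if set(l) | set(r) <= rel]


def bcnf_decompose(rel, fds):
    rel = set(rel)
    for lhs, rhs in project(fds, rel):
        x, a = set(lhs), set(rhs) - set(lhs)
        # X -> A violates BCNF if it is non-trivial and X is not a superkey of rel.
        if a and not closure(x, fds) >= rel:
            return bcnf_decompose(rel - a, fds) + bcnf_decompose(x | a, fds)
    return [rel]        # no violation found: rel is kept as is


F = [("CS", "G"), ("C", "T"), ("HR", "C"), ("HT", "R"), ("HS", "R")]

for r in bcnf_decompose("CTHRSG", F):
    print("".join(sorted(r)))
```

Listing the FDs in a different order can pick a different violation first and so produce a different BCNF decomposition.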
Example 1
R = (C, T, H, R, S, G)
C: course, T: teacher, H: hour, R: room, S: student, G: grade

F:
C → T // each course has one teacher.
HR → C // only 1 course per room at a time.
HT → R // only 1 teacher in a room at a time.
CS → G // each student has 1 grade in each course.
HS → R // a student can only be in 1 room at one time.
(Notice that the only key is HS.) (HS)+: HS, then HSR, HSRC, HSRCG, HSRCGT: all attributes.
CS → G of R violates BCNF since CS is not a key. Decompose R.
Decompose R = (C, T, H, R, S, G): CS → G of R violates BCNF (A = G; X = CS)
(R − A) = (C, T, H, R, S), with FDs C → T, HR → C, HT → R, HS → R
(XA) = (C, S, G), with FD CS → G. In BCNF.
Check if CTHRS is in BCNF: C → T violates BCNF, since C is not a key. CTHRS is not in BCNF.

Decompose (C, T, H, R, S): its C → T violates BCNF (A = T; X = C)
(R − A) = (C, H, R, S), with FDs HR → C, HS → R
(XA) = (C, T), with FD C → T. In BCNF.
Check if CHRS is in BCNF: HR → C violates BCNF, since HR is not a key. CHRS is not in BCNF.
Note: HT → R is lost, i.e., not preserved. (BCNF decomposition does not necessarily preserve dependencies.)

Decompose (C, H, R, S): its HR → C violates BCNF (A = C; X = HR)
(R − A) = (H, R, S), with FD HS → R. In BCNF.
(XA) = (H, R, C), with FD HR → C. In BCNF.

Overview of the decomposition
(C, T, H, R, S, G)
  (C, S, G): CS → G. In BCNF.
  (C, T, H, R, S): C → T, HR → C, HT → R, HS → R
    (C, T): C → T. In BCNF.
    (C, H, R, S): HR → C, HS → R
      (H, R, C): HR → C. In BCNF.
      (H, R, S): HS → R. In BCNF.
Example 2
R = (St, C, Sem, P, T, R)
St: student, C: course, Sem: semester, P: professor, T: time, R: room

F:
St, C, Sem → P (1)
P, Sem → C (2)
C, Sem, T → P (3)
P, Sem, T → C, R (4)
P, Sem, C, T → R (5)
P, Sem, T → C (6)
Convert to BCNF.

Another approach
1. Convert to 3NF using the 3NF synthesis algorithm (resulting in 3NF, with preserved dependencies and the lossless join property).
I. Construct MC
II. Group FDs with the same LHS
III. Add key relation, if necessary
2. Then convert those 3NF relations that are not in BCNF into BCNF (may lose some dependencies, but that is acceptable).

(I) Construct minimal cover
Step 1: convert to standard form. F is in standard form already, except FD (4):
P, Sem, T → C (4.1)
P, Sem, T → R (4.2)
Step 2: eliminate redundant attributes from LHSs (check and do it).
Step 3: eliminate redundant FDs (check and do it).

The minimal cover is:
St, C, Sem → P (1)
P, Sem → C (2)
C, Sem, T → P (3)
P, Sem, T → R (4)

(II) Group FDs with the same LHS
Already done:
St, C, Sem → P (1)
P, Sem → C (2)
C, Sem, T → P (3)
P, Sem, T → R (4)
After (I) and (II), we have 3NF and preserved dependencies: (R1, R2, R3, R4)
R1 (St, C, Sem, P): St, C, Sem → P
R2 (P, Sem, C): P, Sem → C
R3 (C, Sem, T, P): C, Sem, T → P
R4 (P, Sem, T, R): P, Sem, T → R
Notice that the dependency P, Sem → C of R2 also applies to R1 and R3.
To complete the 3NF synthesis algorithm, we should look for a key relation as well. But since we already have 3NF and are only looking for BCNF, this step is not necessary (try it as an exercise).
The above schema is not in BCNF
For example, in R1 = (St, C, Sem, P):
St, C, Sem → P (1)
P, Sem → C (2)
FD (1) satisfies BCNF, but FD (2) does not satisfy BCNF, since P, Sem is not a key. Decompose R1.

Decompose R1 = (St, C, Sem, P): P, Sem → C of R1 violates BCNF (A = C; X = P, Sem)
R11 (R1 − A) = (St, Sem, P): no FDs here (St, C, Sem → P is lost). In BCNF.
R12 (XA) = (P, Sem, C): P, Sem → C. In BCNF. Identical to R2, so one of them will be discarded.

Also, R3 = (C, Sem, T, P) is not in BCNF, since FD (2) [P, Sem → C] violates BCNF. Decompose R3:
R31 (R3 − A) = (Sem, T, P): no FDs (C, Sem, T → P is lost). In BCNF.
R32 (XA) = (P, Sem, C): P, Sem → C. In BCNF. Identical to R2 and R12, so only one of them will be kept.

The final BCNF decomposition is (R11, R12, R31, R4):
R11 (St, Sem, P): no FDs
R12 (or R2, or R32) (P, Sem, C): P, Sem → C
R31 (Sem, T, P): no FDs
R4 (P, Sem, T, R): P, Sem, T → R
Also, notice
If we add, for example, the relation (St, T, Sem, P) with no dependencies to the original 3NF schema, we get a lossless join. If we add it to the final BCNF schema, we also have BCNF with a lossless join.
End of Normalization 03