Combining schemas. Problems: redundancy, hard to update, possible NULLs

Handout


Problems? Conclusion: Whether the join attribute is PK or not makes a great difference when combining schemas!

Splitting schemas, a.k.a. decomposition (reverse the arrows below)
Coincidence or not? (And why it matters)
Functional dependency: loan_number → amount

An even worse decomposition: lossy! Why do we say lossy when in fact we end up with more data?
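A minimal sketch of the "more data, yet lossy" effect. The two tuples and the choice to split on amount are made up for illustration; they are not taken from the handout's figure.

```python
# Hypothetical instance of bor_loan = (customer_id, loan_number, amount).
bor_loan = {("C1", "L1", 1000), ("C2", "L2", 1000)}

# Decompose on the non-key attribute amount (a deliberately bad choice).
cust_amount = {(c, a) for (c, l, a) in bor_loan}
loan_amount = {(l, a) for (c, l, a) in bor_loan}

# Natural join of the two projections on amount.
rejoined = {(c, l, a1)
            for (c, a1) in cust_amount
            for (l, a2) in loan_amount
            if a1 == a2}

# Tuples that appear after the join but were never in the original relation.
spurious = rejoined - bor_loan
print(len(bor_loan), len(rejoined))
print(sorted(spurious))
```

The join returns every original tuple plus spurious ones, so we can no longer tell which customer holds which loan. What is lost is information, not tuples.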

7.3 Decomposition using FDs

FD algebra

Example:

The most useful normal form:

loan = (loan_number, amount) borrower = (customer_id, loan_number) Find the set of all (non-trivial) FDs for the relation bor_loan

Another example: Is this schema in BCNF?

In bor_loan, the violating FD is loan_number → amount, so we set α = loan_number, β = amount, and decompose R into (α ∪ β) and (R − (β − α)).
Why not simply say R − β?

Another example: It was found earlier that this schema is not in BCNF. The violating FD is B → C. Apply the BCNF decomposition algorithm!

Is this relation in BCNF? If not, decompose it!

To do for next time: Rework all the BCNF examples!

BCNF and preservation of dependencies
E-R design from Ch.6: a customer can have at most 1 personal banker.
Modified design: a customer can have more than 1 personal banker, but at most one at any given branch. (?)

A ternary relationship-set is needed:
Implementation: R = cust_banker_branch = (customer_id, employee_id, branch_name, type)
FDs:
FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)

Is cust_banker_branch in BCNF? No. Apply the decomposition algorithm!
Decomposition:
R1 = (employee_id, branch_name)
R2 = (customer_id, employee_id, type)
Problem: FD2 is now spread across two relations!
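The decomposition step above can be sketched as follows. The helper names closure and bcnf_split are hypothetical, and FDs are represented as pairs of frozensets; this is one possible encoding, not the textbook's pseudocode.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_split(R, alpha, beta):
    """One decomposition step for a violating FD alpha -> beta:
    R is replaced by (alpha U beta) and (R - (beta - alpha))."""
    return alpha | beta, R - (beta - alpha)


R = frozenset({"customer_id", "employee_id", "branch_name", "type"})
FD1 = (frozenset({"employee_id"}), frozenset({"branch_name"}))
FD2 = (frozenset({"customer_id", "branch_name"}),
       frozenset({"employee_id", "type"}))
F = [FD1, FD2]

# FD1 violates BCNF because its left-hand side is not a superkey of R.
fd1_lhs_is_superkey = closure(FD1[0], F) == R
R1, R2 = bcnf_split(R, *FD1)
print(fd1_lhs_is_superkey, sorted(R1), sorted(R2))
```

Neither R1 nor R2 contains all of customer_id, branch_name, employee_id and type, which is why FD2 can no longer be checked inside a single relation.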

Conclusion: BCNF decomposition is not always dependency preserving.
Extra credit: What if we started the BCNF decomposition with FD2 instead of FD1? Time: 2

Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker NF, known as third normal form (3NF).
Show that cust_banker_branch is in 3NF.
R = cust_banker_branch = (customer_id, employee_id, branch_name, type)
FDs:
FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
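A brute-force sketch for checking your 3NF answer. It assumes the usual 3NF condition (for each non-trivial FD, either the left-hand side is a superkey or every attribute of the right-hand side that is not in the left-hand side belongs to some candidate key) and tests only the FDs in F. The function names are hypothetical, and the key search is exponential, so it is meant for tiny schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(R, fds):
    """All minimal superkeys, found by trying subsets in increasing size."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = frozenset(combo)
            if closure(s, fds) == R and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def in_3nf(R, fds):
    prime = set().union(*candidate_keys(R, fds))  # attributes in some key
    for lhs, rhs in fds:
        if rhs <= lhs:                 # trivial FD
            continue
        if closure(lhs, fds) == R:     # left-hand side is a superkey
            continue
        if (rhs - lhs) <= prime:       # remaining attributes are all prime
            continue
        return False
    return True


R = frozenset({"customer_id", "employee_id", "branch_name", "type"})
F = [
    (frozenset({"employee_id"}), frozenset({"branch_name"})),              # FD1
    (frozenset({"customer_id", "branch_name"}),
     frozenset({"employee_id", "type"})),                                  # FD2
]
print([sorted(k) for k in candidate_keys(R, F)])
print(in_3nf(R, F))
```

Compare the candidate keys it prints with the ones you found by hand before looking at the final verdict.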

Whatever happened to 2NF? In a nutshell, it forbids attributes from depending on parts of keys. See the Wikipedia article "Second normal form" for more details.
Another BCNF/3NF example: books (B-Name, Ed, A-Name, A-SSN, Nr-pag)
Is it in BCNF? 3NF?
A_Name → A_SSN
To do for next time: Rework all the BCNF & 3NF examples!

Higher NFs
Consider this relation: classes (course, teacher, book)
(c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c.
What are the FDs for this relation? Is it in BCNF? Is it in 3NF?

We still have redundancies and insertion anomalies, e.g., if Marilyn is a new teacher who can teach database, two tuples need to be inserted:
(database, Marilyn, DB Concepts)
(database, Marilyn, Ullman)


The big picture

7.4 FD Theory
7.4.1 The Closure of a set of FDs
Yes, this is a trivial FD!

Algorithm to compute F+

Although Armstrong's axioms are sufficient to obtain the closure, in practice we want more tools.
How about these?
Idempotency: XX = X
Commutativity: XY = YX
They are true, but it is customary to write all attributes as sets with no repeating values, sorted in alphabetical order.

Important lemma: if and only if Proof: Left as individual work for next time. Use the definition of a FD from p.271:

Practice exercise 7.4

Example: Quiz: Generate 4 more FDs that are in F+

7.4.2 The Closure of a set of attributes (under the set of FDs)
Compare to the inefficient algorithm, based on F+
For next time: Read and understand the example on p.281
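A minimal sketch of the attribute-closure computation: keep adding right-hand sides of FDs whose left-hand side is already covered, until nothing changes. The FD set F below is an assumed example with single-letter attributes (each FD written as a pair of strings); it is not copied from the handout.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds.

    attrs: iterable of single-character attribute names, e.g. "AG".
    fds:   list of (lhs, rhs) string pairs, e.g. ("CG", "H") for CG -> H.
    """
    result = set(attrs)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Assumed example: R = (A, B, C, G, H, I)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))   # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(closure("A", F)))    # ['A', 'B', 'C', 'H']
```

Each pass over F either adds at least one attribute or stops, so the loop runs at most as many times as there are attributes in R. This is far cheaper than enumerating F+.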

Applications of attribute closure:
Check if a set of attributes is a superkey
Check if a set of attributes is a candidate key (i.e. superkey + minimal)
Check if a functional dependency α → β holds (i.e. if it is in F+)
  o Find α+ and then check if β ⊆ α+
Computing the closure F+ of F
  o For each set of attributes γ ⊆ R, find the closure γ+, and for each S ⊆ γ+ output a functional dependency γ → S
Attribute closure gives another algorithm to find the FD closure F+! Compare it with the first algorithm from fig. 7.8. Which one do you think is more efficient? Explain!
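The first three applications can be sketched on top of a closure function. The helper names and the schema R with FD set F are assumptions chosen for illustration (single-letter attributes, FDs as pairs of strings).

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, R, fds):
    return closure(attrs, fds) == set(R)


def is_candidate_key(attrs, R, fds):
    # Minimal superkey: dropping any single attribute must break the
    # superkey property (supersets of superkeys are superkeys, so testing
    # single removals is enough).
    attrs = set(attrs)
    return is_superkey(attrs, R, fds) and not any(
        is_superkey(attrs - {a}, R, fds) for a in attrs
    )


def fd_holds(lhs, rhs, fds):
    # lhs -> rhs is in F+ exactly when rhs is contained in lhs+.
    return set(rhs) <= closure(lhs, fds)


# Assumed example schema and FDs.
R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(is_superkey("AG", R, F), is_candidate_key("AG", R, F))
print(is_candidate_key("ABG", R, F))
print(fd_holds("AG", "I", F), fd_holds("A", "G", F))
```

Here ABG is a superkey but not a candidate key, because B can be dropped without losing the superkey property.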

Example:

In general, an FD is of the form α → β, with α and β sets of attributes, e.g. EFG → KL.
Food for thought:
Can α be the empty set? ("nothing")
Can α be the total set? ("everything")
Can β be the empty set? ("nothing")
Can β be the total set? ("everything")

Extraneous attributes
This part is trivial, so it doesn't need to be checked (it was included just for symmetry).
In English: If we remove the attribute, the closure F+ does not change.
Why is this of practical importance?

Examples:
Given F = {A → C, AB → C}: B is extraneous in AB → C because AB → C can be derived from A → C (How?)
As seen in this example, sometimes removal of extraneous attributes makes an entire FD disappear (because it's a duplicate).
Given F = {A → C, AB → CD}: C is extraneous in AB → CD since AB → C can be derived even after deleting C (How?)
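Both examples can be checked with attribute closure. This sketch assumes the usual tests: an attribute A in the left-hand side α is extraneous if (α − A)+ under F already contains β, and an attribute A in the right-hand side β is extraneous if α+ still contains A after α → β is replaced by α → (β − A). The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(attr, lhs, rhs, fds):
    # Drop attr from the left-hand side and see whether rhs is still implied
    # by the ORIGINAL set of FDs.
    return set(rhs) <= closure(set(lhs) - {attr}, fds)


def extraneous_in_rhs(attr, lhs, rhs, fds):
    # Replace lhs -> rhs by lhs -> (rhs - attr) and see whether attr can
    # still be derived from lhs under the MODIFIED set of FDs.
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, "".join(sorted(set(rhs) - {attr}))))
    return attr in closure(lhs, reduced)


F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", "AB", "C", F1))     # True

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", "AB", "CD", F2))    # True
print(extraneous_in_rhs("D", "AB", "CD", F2))    # False
```

Note the asymmetry: the left-hand-side test uses F unchanged, while the right-hand-side test must use the modified set, otherwise every right-hand-side attribute would trivially look extraneous.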

Algorithm: Add to the list of applications of attribute closure! Exercise

Answer:
Exercise

Answer: A+ = {A, B, C, D}, so A+ contains C, so C is extraneous in A → CD
Exercise: Same scenario as above. Is D extraneous in A → CD?
Exercise: F = {A → B, B → C, A → C}. Is C extraneous in A → C? So what do we do about A → C?
For next time: solve all the exercises above, plus the one on p.283!

Why is this of practical importance? Algorithm:

Example not from text: Solve for practice!

Solution:

Two things must be preserved when we perform decompositions:
Data (tuples)
FDs

Efficient algorithm (uses only attribute closure, not FD closure!) How much of Ri can we recover, based on the current result?

Example (not in text, but in text slides): Trivial, don't need algorithm! Apply the algorithm above to prove this!

Solution: Prove that the decomposition R1 = (A, B), R2 = (A, C) is not dependency preserving.
The FD that needs to be recovered is B → C.
Apply algorithm:
result = {B}
Consider R1: result ∩ R1 = {B}; {B}+ = {B, C}; {B, C} ∩ R1 = {B}; result ∪ {B} = {B}
Consider R2: result ∩ R2 = Ø; Ø+ = Ø; result = {B}
No progress, so the algorithm stops. We could not obtain the RHS of B → C, so the FD cannot be recovered.
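The trace above can be replayed in code. The FD set F = {A → B, B → C} over R = (A, B, C) is an assumption, chosen because it agrees with {B}+ = {B, C} in the trace (the example's F is not reproduced in this handout). The function name preserved is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(lhs, rhs, decomposition, fds):
    """Test whether lhs -> rhs survives the decomposition, using only
    attribute closures (no FD closure)."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            # How much of Ri can we recover from the current result?
            t = closure(result & set(Ri), fds) & set(Ri)
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


F = [("A", "B"), ("B", "C")]          # assumed FD set

print(preserved("B", "C", ["AB", "AC"], F))   # False
print(preserved("A", "B", ["AB", "AC"], F))   # True
print(preserved("B", "C", ["AB", "BC"], F))   # True
```

A decomposition is dependency preserving only if this test succeeds for every FD in F, so a single failure (here B → C) is enough to settle the question.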

Week 12, Lect 1
7.5 Decomposition using FDs
Problem: The definitions of both BCNF and 3NF require F+, which is expensive to compute!

FYI there is a sketched proof for this on p.289 (not required for final)

Can you find super-keys? Intuitively, we can feel that AC → BDE, but how to prove it? Hint: Armstrong's axioms (and theorems)
So AC is a super-key. But is it a candidate key? (What's the difference?)

Do you think there are other candidate keys? Why or why not?
Are there any BCNF violations? Hint: To find BCNF violations, do we need to check F or F+? Why?
Which one do we choose to start the decomposition?

Now write down the two relations resulting from decomposition, including their FDs F1 and F2 and their candidate keys:

SKIP the remainder of Section 7.5, starting with 7.5.1.2 (p.289)
SKIP 7.6, 7.7
Read and take notes: Sections 7.8, 7.9
Homework for Ch.7: 1, 3, 5, 6, 7, 11