Functional dependency theory
Introduction to Database Design 2012, Lecture 8

Overview
- Course evaluation
- Recalling normal forms
- Functional dependency theory
- Computing closures of attribute sets
- BCNF decomposition
Next time:
- Canonical covers
- 3NF decomposition

Course evaluation
Main points
- More TAs
- Exercises after lecture rather than before (why?)
Question: Hand-in schedule
Minor points
- More examples (SQL was hard)
- Recap at start of lecture
- Correction of non-mandatory exercises
- Exam on paper

April 16
Teaching cancelled April 16
Instead we continue one more week

Normal forms

Definition. A legal instance of a database schema is an instance that does not break the rules of the real world
Definition. A functional dependency (FD) α → β holds if for all pairs of tuples t, u in any legal instance: if t[α] = u[α] then t[β] = u[β]
Here α, β denote sets of attributes
Example: dept_name → budget, because if t[dept_name] = u[dept_name] then t[budget] = u[budget]
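
The FD definition can be tried out on a concrete instance with a minimal Python sketch. The function name fd_holds and the sample rows are illustrative assumptions, not part of the lecture. Note that a single instance can only refute an FD: the definition quantifies over all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this one instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names.
    """
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical sample instance (made-up values).
rows = [
    {"id": 1, "name": "Ann", "dept_name": "Physics", "budget": 70000},
    {"id": 2, "name": "Bob", "dept_name": "Physics", "budget": 70000},
    {"id": 3, "name": "Cay", "dept_name": "Music", "budget": 80000},
]

print(fd_holds(rows, ["dept_name"], ["budget"]))  # True
print(fd_holds(rows, ["budget"], ["id"]))         # False
```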

Boyce-Codd normal form (BCNF)
A table r(R) is in BCNF if for all functional dependencies α → β either
- β ⊆ α (α → β is trivial)
- or α is a superkey
A schema is in BCNF if all tables are in BCNF
A schema in BCNF does not allow for (most) redundancies
Example of a non-BCNF schema:
- instructor(id, name, salary, dept_name, building, budget)

Third normal form (3NF)
A table r(R) is in 3NF if for all functional dependencies α → β either
- β ⊆ α (α → β is trivial)
- or α is a superkey
- or each A in β - α is part of a candidate key
Any schema in BCNF is also in 3NF
A schema is in 3NF if all tables are in 3NF
Example: dept_advisor(s_id, i_id, dept_name) is in 3NF:
- i_id → dept_name
- s_id, dept_name → i_id

The 3NF oath
Bill Kent: [Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key (so help me Codd)
Explanation:
- Think of an FD α → A as "A provides a fact about α"
- A key attribute is an attribute that is part of a candidate key

Motivation for 3NF
Schemas in 3NF can contain some redundancies not found in schemas in BCNF
It is not always possible to reduce to BCNF in a dependency-preserving way
This means that 3NF schemas can be more efficient to work with, since a preserved FD can be checked within a single table without computing a join
See this week's exercises for an illustration!

Functional-dependency theory

Motivating example
Consider the following schema
- departure(fn, date, an)
- flight_time(fn, std, sta)
- airport(ac, an, city)
- destination(fn, ac)
where
- fn = flight number
- an = airport name
- ac = airport code
- std, sta: scheduled time of departure / arrival
Is this BCNF?

Motivating example
Primary keys as FDs
- departure(fn, date, an): fn, date → an
- flight_time(fn, std, sta): fn → std, sta
- airport(ac, an, city): ac → an, city
- destination(fn, ac): fn → ac
None of these violate BCNF
A logically implied FD: fn → an
This violates BCNF in departure

Moral
- We need to take logically implied FDs into account when reasoning about BCNF and 3NF
- We need to know rules for constructing new logically implied FDs
- We need to know algorithms for finding all implied FDs
On the other hand:
- If there is only one table, then it suffices to inspect the given set of FDs
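
The claim that fn → an is implied, and that it violates BCNF in departure, can be checked mechanically. The sketch below uses the attribute-closure test that the lecture introduces later; the function names closure and implies are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds."""
    return set(rhs) <= closure(lhs, fds)


# The four primary-key FDs of the flight schema.
F = [
    (("fn", "date"), ("an",)),
    (("fn",), ("std", "sta")),
    (("ac",), ("an", "city")),
    (("fn",), ("ac",)),
]
departure = {"fn", "date", "an"}

print(implies(F, ["fn"], ["an"]))        # True: fn -> an is implied
print(closure(["fn"], F) >= departure)   # False: fn is not a superkey of departure
```

Since fn → an is non-trivial and fn is not a superkey of departure, the implied FD is a BCNF violation even though none of the four given FDs is.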

Derived FDs
We use F to range over sets of FDs
So F could be:
- fn, date → an
- fn → std, sta
- ac → an, city
- fn → ac
Definition. An FD α → β is logically implied by F if all instances satisfying F also satisfy α → β
Example. The FD fn → an is logically implied by the above F
F+ is the set of all FDs logically implied by F

Armstrong's axioms
Axioms for deriving implied functional dependencies
- Reflexivity: If β ⊆ α then α → β
- Augmentation: If α → β then α, γ → β, γ
- Transitivity: If α → β and β → γ then also α → γ
Exercise: Check that these are valid
Theorem. All FDs logically implied by F can be derived using Armstrong's axioms
Derived rule: (α → β and α → γ) if and only if α → β, γ

An example derivation
Given
- fn, date → an
- fn → std, sta
- ac → an, city
- fn → ac
we derive
- By transitivity (from fn → ac and ac → an, city): fn → an, city
- By reflexivity: an, city → an
- By transitivity again: fn → an

Closures of attribute sets
Definition. Given attribute set α and set of FDs F, α+ is the largest set of attributes such that α → α+ is logically implied by F
Example. If F is
- i_id → dept_name
- s_id, dept_name → i_id
then
- (s_id, dept_name)+ = (s_id, dept_name, i_id)
- (s_id, i_id)+ = (s_id, dept_name, i_id)
- (i_id)+ = (dept_name, i_id)

An algorithm for computing closures
Computing α+:
Set result = α
Repeat
- Find an FD β → γ in F such that β ⊆ result and γ is not a subset of result
- Set result = result ∪ γ
Terminate when no such FD exists

Example
Consider schema R(A,B,C,D) with dependencies
- AB → C
- C → D
- D → A
Compute all candidate keys
Is R in BCNF? Is it in 3NF?
(this is a typical exam exercise)
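
The closure algorithm, and the candidate-key and normal-form checks built on it, can be sketched in Python. This is a minimal sketch for the single-table case, where inspecting the given FDs suffices; the function names and the brute-force key search are illustrative assumptions. It is run on the dept_advisor example, whose closures and 3NF status are stated earlier in the lecture.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute attrs+ by repeatedly applying FDs whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(combo, fds) >= set(schema):
                keys.append(combo)
    return keys


def violations(schema, fds, keys, form):
    """Given FDs that violate form ("BCNF" or "3NF") in a single table."""
    prime = {a for k in keys for a in k}  # attributes of some candidate key
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial FD
        if closure(lhs, fds) >= set(schema):
            continue  # left side is a superkey
        if form == "3NF" and (rhs - lhs) <= prime:
            continue  # every new attribute is part of a candidate key
        bad.append((lhs, rhs))
    return bad


dept_advisor = {"s_id", "i_id", "dept_name"}
F = [(("i_id",), ("dept_name",)), (("s_id", "dept_name"), ("i_id",))]

keys = candidate_keys(dept_advisor, F)
print(sorted(closure(["i_id"], F)))               # ['dept_name', 'i_id']
print(keys)                                       # [('dept_name', 's_id'), ('i_id', 's_id')]
print(violations(dept_advisor, F, keys, "BCNF"))  # [({'i_id'}, {'dept_name'})]
print(violations(dept_advisor, F, keys, "3NF"))   # []
```

The same three functions can be used to check a hand-computed answer for R(A,B,C,D) by passing its schema and FDs instead.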

Example
Consider schema R(A,B,C), S(C,D,E) with dependencies
- AB → C
- C → D
- D → B
Compute all candidate keys
Are R, S in BCNF? Are they in 3NF?

BCNF normalisation

Decomposition into BCNF
The schema instructor(id, name, salary, dept_name, building, budget) is not in BCNF
Decompose to BCNF:
- instructor(id, name, salary, dept_name)
- department(dept_name, building, budget)

BCNF decomposition
Compute F+
Repeat the following while the schema is not BCNF
- Find a BCNF violation A1 A2 ... An → B1 B2 ... Bm in schema R(α)
- Decompose R into ((α - B1 B2 ... Bm) A1 A2 ... An) and (A1 A2 ... An B1 B2 ... Bm)

Example
Decompose the relation
- cd_shop(cd_id, artist, title, order_id, order_date, quantity, customer_id, name, address)
With the functional dependencies
- cd_id → artist, title
- customer_id → name, address
- order_id → order_date, customer_id
- order_id, cd_id → quantity

Non-determinism
Much depends on the choice of BCNF violation
Try e.g. decomposing first using
- order_id → order_date, customer_id
There is no guarantee that the decomposition is dependency preserving (even if there is a dependency-preserving decomposition)
One heuristic is to maximise the right-hand sides of BCNF violations
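
The BCNF decomposition loop can be sketched in Python and applied to the cd_shop example. This is a sketch under stated assumptions, not the lecture's reference implementation: function names are illustrative, violations are found by brute force over attribute subsets using closures under the original F (instead of materialising F+), the first violation in sorted attribute order is chosen, and its right-hand side is maximised as the heuristic suggests.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (lhs, rhs) violating BCNF inside schema, or None if there is none."""
    for size in range(1, len(schema)):
        for lhs in combinations(sorted(schema), size):
            reach = closure(lhs, fds) & schema  # what lhs determines within schema
            rhs = reach - set(lhs)
            if rhs and reach != schema:  # non-trivial and lhs is not a superkey
                return set(lhs), rhs     # rhs is as large as possible
    return None


def bcnf_decompose(schema, fds):
    """Split tables on BCNF violations until none remain."""
    todo, done = [frozenset(schema)], []
    while todo:
        table = todo.pop()
        violation = find_violation(table, fds)
        if violation is None:
            done.append(table)
        else:
            lhs, rhs = violation
            todo.append(frozenset(lhs | rhs))   # (A1..An B1..Bm)
            todo.append(frozenset(table - rhs)) # (alpha - B1..Bm), still contains lhs
    return done


cd_shop = {"cd_id", "artist", "title", "order_id", "order_date",
           "quantity", "customer_id", "name", "address"}
F = [
    (("cd_id",), ("artist", "title")),
    (("customer_id",), ("name", "address")),
    (("order_id",), ("order_date", "customer_id")),
    (("order_id", "cd_id"), ("quantity",)),
]

tables = bcnf_decompose(cd_shop, F)
for table in tables:
    print(sorted(table))
```

With this choice of violations the result has four tables, one per given FD. A different selection rule in find_violation can give a different decomposition, which is the non-determinism the slide describes.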

Correctness
- Tables become smaller for every decomposition
- Every 2-attribute table is BCNF
- So in the end, the schema must be BCNF
Every decomposition is lossless
In fact if α → β then decomposition of R(αβγ) into (αβ) and (αγ) is always lossless (book page 346)

Discussion
The BCNF algorithm suggests a new strategy for DB design:
- Put everything in one table
- Write up all FDs
- Apply BCNF decomposition
This is not a good strategy
- This can lead to terrible designs
- Without an ER diagram, it is also harder to extend an existing DB design
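
The lossless-join claim from the Correctness slide can be illustrated on a small instance: project a table onto the two parts of a decomposition, join the projections again, and compare with the original. The helper names and the sample rows below are illustrative assumptions; an instance-level test like this only illustrates the property and does not prove it.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of tuples of (attribute, value) pairs."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Keep the pair only if it agrees on all shared attributes.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Hypothetical instance in which dept_name -> building holds.
rows = [
    {"id": 1, "name": "Ann", "dept_name": "Physics", "building": "Watson"},
    {"id": 2, "name": "Bob", "dept_name": "Music", "building": "Watson"},
]

# Split on the FD dept_name -> building: (alpha gamma) and (alpha beta).
print(rejoins_exactly(rows, ["id", "name", "dept_name"], ["dept_name", "building"]))  # True
# Split on building, which determines nothing here: spurious tuples appear.
print(rejoins_exactly(rows, ["id", "name", "building"], ["dept_name", "building"]))   # False
```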

Summary
- When checking BCNF of schemas with more than one table, one must also inspect logically implied FDs
- Armstrong's axioms provide an algorithm for computing closures of attribute sets
- We can always decompose a relation to a BCNF schema

Intended learning outcomes
After this lecture you should be able to
- Compute closures of attribute sets
- Judge if a design is BCNF or 3NF
- Apply the BCNF algorithm