Normalisation theory
Introduction to Database Design 2012, Lecture 7

Overview
- Challenging exercises
- E-R diagrams example
- Normalisation theory, motivation
- Functional dependencies
- Boyce-Codd normal form (BCNF)
- 3rd normal form (3NF)

Next two lectures
- Functional dependency theory
- Normalisation algorithms

Challenging exercises

Second challenging exercise
- Find all cds that were bought in all orders in which 'Paranoid' by 'Black Sabbath' was bought.

Reformulate as we did last week
- Find all cds c such that there is no order containing 'Paranoid' by 'Black Sabbath' not containing c

Further reformulation
- Find all cds c such that there is no order containing 'Paranoid' by 'Black Sabbath' not in the set of orders containing c

Challenging exercises

Problem
- Find all cds c such that there is no order containing 'Paranoid' by 'Black Sabbath' not containing c

Solution

    select artist, title from cd as S
    where not exists
      (select * from cd natural join purch_cd
       where title = 'Paranoid' and artist = 'Black Sabbath'
       and purch_id not in
         (select purch_id from purch_cd where cd_id = S.cd_id));

- The innermost subquery selects the orders containing c
- The middle subquery selects the orders containing Paranoid, but not c
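
The solution query can be tried out on a tiny in-memory database. The sketch below uses Python's sqlite3 module and assumes a minimal schema, cd(cd_id, artist, title) and purch_cd(purch_id, cd_id); the lecture's real tables may have more columns, and the rows for 'Artist A' and 'Artist B' are made up for illustration.

```python
import sqlite3

# ASSUMPTION: minimal hypothetical schema and data, not the course database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cd (cd_id integer primary key, artist text, title text);
    create table purch_cd (purch_id integer, cd_id integer);
    insert into cd values (1, 'Black Sabbath', 'Paranoid'),
                          (2, 'Artist A', 'Album A'),
                          (3, 'Artist B', 'Album B');
    -- orders 10 and 11 contain Paranoid, order 12 does not
    insert into purch_cd values (10, 1), (10, 2),
                                (11, 1), (11, 2), (11, 3),
                                (12, 3);
""")

# The solution from the slide, unchanged apart from layout.
QUERY = """
select artist, title from cd as S
where not exists
  (select * from cd natural join purch_cd
   where title = 'Paranoid' and artist = 'Black Sabbath'
     and purch_id not in
       (select purch_id from purch_cd where cd_id = S.cd_id));
"""

# Sort the rows so the result does not depend on the scan order.
in_all_paranoid_orders = sorted(conn.execute(QUERY).fetchall())
print(in_all_paranoid_orders)
```

With this data, 'Album B' is missing from order 10, which contains Paranoid, so it is filtered out; 'Album A' and 'Paranoid' itself appear in both Paranoid orders and are returned.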

Benjamin's solution

    select count(*) as AppearsInNumberOfOrdersWithParanoid, cd_id, title, artist
    from purchase natural join purch_cd natural join cd
    where purch_id in (
        # Purch_ID for all paranoid orders
        select purch_id
        from purchase natural join purch_cd natural join cd
        where title = 'paranoid' and artist = 'black sabbath')
    group by cd_id
    having AppearsInNumberOfOrdersWithParanoid = (
        select count(*)
        from purchase natural join purch_cd natural join cd
        where title = 'paranoid' and artist = 'black sabbath'
        group by cd_id
        order by purch_id);

E-R diagrams

Generalisation in E-R diagrams, example
- Many-to-one relationship assigned_to cannot be described without using generalisation

[E-R diagram; labels in the figure: task, id, description, assigned_to, employee, id, name, total, permanent, job title, trainee, start-date, supervisor]

Normal forms

Challenge: Avoiding redundancy
- A poor database design [table not reproduced]
- Redundancy wastes space and leads to inconsistency issues
- Normalisation theory deals with this issue

Anomalies caused by redundancy
- Update anomalies: occur when information is updated in one but not all of the places where it occurs
- Deletion anomalies: occur when deleting one fact leads to the unwanted deletion of other facts
- Insertion anomalies: occur when information about one thing cannot be inserted without knowing additional information about something else
- We should design the database such that these anomalies cannot occur

Functional dependencies
- Cause of redundancy
  - value of budget is determined by value of department
  - department is not a superkey
  - so budget information is repeated between tuples with the same department value
- We say there is a functional dependency from department to budget
- Notation: dept_name → budget

Functional dependencies
- Can involve sets of attributes
- A poor flight db design:
  flight_all(flight_num, dept_date, capacity, dept_airport, arr_airport, STD, STA, date_offset)
- Functional dependencies
  - flight_num, dept_date → capacity
  - flight_num → STD, STA
  - flight_num, dept_date → STD, STA
- The last one is not minimal (in a sense to be made precise later) but still true

Functional dependencies
- Functional dependencies are rules derived from the real-world situation we are modelling
- Functional dependencies are a form of integrity constraint
- Goal is to make the database enforce these constraints by design

Definition
- A legal instance of a database schema is an instance that does not break the rules of the real world
- Definition. A functional dependency α → β holds if, for all pairs of tuples t, u in any legal instance: if t[α] = u[α] then t[β] = u[β]
- Here α, β denote sets of attributes
- t[α] = u[α] means that tuples t and u agree on the values of all attributes in α
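
The definition can be turned into a small check on one concrete instance. Note the limitation: a functional dependency is a statement about all legal instances, so a single instance can refute a dependency but never prove it. The sketch below is illustrative only; the function name fd_holds and the sample rows are made up for this example.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: List[Row], alpha: Sequence[str], beta: Sequence[str]) -> bool:
    """True if no two rows agree on alpha but differ on beta."""
    seen: Dict[Tuple, Tuple] = {}
    for t in rows:
        key = tuple(t[a] for a in alpha)   # t[alpha]
        val = tuple(t[b] for b in beta)    # t[beta]
        # Remember the first beta-value seen for this alpha-value;
        # any later row with the same alpha must carry the same beta.
        if seen.setdefault(key, val) != val:
            return False
    return True


# ASSUMPTION: hypothetical sample data, not taken from the lecture.
instructor = [
    {"id": 1, "name": "Ann", "dept_name": "Physics", "budget": 100},
    {"id": 2, "name": "Bob", "dept_name": "Physics", "budget": 100},
    {"id": 3, "name": "Cy", "dept_name": "Music", "budget": 50},
]

print(fd_holds(instructor, ["dept_name"], ["budget"]))  # not refuted by this instance
print(fd_holds(instructor, ["budget"], ["id"]))         # refuted: budget 100 has two ids
```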

Examples
- α → β always holds if β ⊆ α (a trivial dependency)
- A set of attributes α is a superkey for relation r(R) if α → R
- A candidate key is a minimal superkey
- Example: flight_all table (3 slides back)
  - flight_num, dept_date, capacity is a superkey
  - flight_num, dept_date is a candidate key

Boyce-Codd normal form (BCNF)
- Definition. A table r(R) is in BCNF if for all functional dependencies α → β either
  - β ⊆ α (α → β is trivial)
  - or α is a superkey
- A schema is in BCNF if all tables are in BCNF
- Example:
  - instructor(id, name, salary, dept_name, building, budget) is not BCNF
  - because of dept_name → building, budget

Decomposition into BCNF
- Not BCNF [table not reproduced]
- Decompose to BCNF:
  - instructor(id, name, salary, dept_name)
  - department(dept_name, building, budget)

Another example
- Not BCNF:
  flight_all(flight_num, dept_date, capacity, dept_airport, arr_airport, STD, STA, date_offset)
- Decompose as
  - flight(flight_num, dept_airport, arr_airport, STD, STA, date_offset)
  - departure(flight_num, dept_date, capacity)
- This is BCNF
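
The superkey and BCNF conditions can be checked mechanically once the functional dependencies are written down. The sketch below uses attribute closure, a standard technique that these slides do not introduce, and it only inspects the dependencies that are listed explicitly. The dependency id → name, salary, dept_name is an assumption (the slides state only dept_name → building, budget), and all function names are made up for this example.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency lhs -> rhs from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs: Iterable[str], schema: Iterable[str], fds: List[FD]) -> bool:
    """alpha is a superkey if alpha -> R, i.e. its closure covers the schema."""
    return closure(attrs, fds) >= set(schema)


def bcnf_violations(schema: Iterable[str], fds: List[FD]) -> List[FD]:
    """Listed dependencies that are neither trivial nor have a superkey on the left."""
    schema = set(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, schema, fds)]


# ASSUMPTION: BY_ID is not stated in the lecture; BY_DEPT is.
BY_ID = fd("id", "name salary dept_name")
BY_DEPT = fd("dept_name", "building budget")

big = "id name salary dept_name building budget".split()
print(bcnf_violations(big, [BY_ID, BY_DEPT]))                         # the dept_name dependency
print(bcnf_violations("id name salary dept_name".split(), [BY_ID]))   # decomposed instructor
print(bcnf_violations("dept_name building budget".split(), [BY_DEPT]))  # department
```

Under these assumptions the undecomposed table reports dept_name → building, budget as a violation, while the two decomposed tables report none.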

Lossless decomposition
- Decomposition into
  - instructor(id, name, salary, dept_name)
  - department(dept_name, building, budget)
  is a lossless decomposition, because we can recover the big table by joining instructor and department
- A lossy decomposition of instructor:
  - instructor_name(id, name)
  - instructor(name, salary, dept_name)
  (what happens if two instructors have the same name?)

Lossless decomposition
- Definition. A decomposition of r(R) into s(α), s′(β) is lossless if for any legal instance of r
  r = Π_α(r) ⋈ Π_β(r)
- This is the case for the decomposition into instructor, department
- Another example of a lossy decomposition
  - instructor(id, name, salary)
  - department(dept_name, building, budget)
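
The question about two instructors with the same name can be answered by projecting a sample instance and joining the projections back together. The sketch below implements projection and natural join on small in-memory relations; the helper names and the sample rows are made up for this example. As with any check on one instance, it can show that a decomposition is lossy, but it cannot prove that it is lossless for all legal instances.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Row = FrozenSet[Tuple[str, object]]   # one tuple, as (attribute, value) pairs
Relation = Set[Row]


def relation(rows: List[Dict[str, object]]) -> Relation:
    return {frozenset(r.items()) for r in rows}


def project(rel: Relation, attrs: Iterable[str]) -> Relation:
    """Projection onto attrs; duplicates disappear because the result is a set."""
    keep = set(attrs)
    return {frozenset((a, v) for a, v in row if a in keep) for row in rel}


def natural_join(left: Relation, right: Relation) -> Relation:
    """Combine every pair of rows that agree on all shared attributes."""
    out: Relation = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def rejoins_exactly(rel: Relation, alpha: Iterable[str], beta: Iterable[str]) -> bool:
    """Check r = project(r, alpha) joined with project(r, beta) on this instance."""
    return natural_join(project(rel, alpha), project(rel, beta)) == rel


# ASSUMPTION: hypothetical sample data with two instructors named Kim.
big = relation([
    {"id": 1, "name": "Kim", "salary": 100, "dept_name": "Physics", "building": "Watson", "budget": 70},
    {"id": 2, "name": "Kim", "salary": 200, "dept_name": "Music", "building": "Packard", "budget": 80},
    {"id": 3, "name": "Lee", "salary": 150, "dept_name": "Physics", "building": "Watson", "budget": 70},
])
instructor = project(big, ["id", "name", "salary", "dept_name"])

print(rejoins_exactly(big, ["id", "name", "salary", "dept_name"],
                      ["dept_name", "building", "budget"]))
print(rejoins_exactly(instructor, ["id", "name"], ["name", "salary", "dept_name"]))
print(rejoins_exactly(big, ["id", "name", "salary"], ["dept_name", "building", "budget"]))
```

On this instance only the first decomposition rejoins to the original table. Splitting on name pairs each Kim with both Kim rows, and the last decomposition shares no attribute, so the join degenerates into a cross product.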

3rd normal form

3NF, motivation
- A database in BCNF is to a large extent free of redundancy
- But reducing to BCNF can also lead to inefficient databases
- 3NF is a less strict normal form
- Reducing to 3NF is often enough
- In some situations 3NF allows for more efficient databases than BCNF

Dependency preservation, example
- DB schema
  - department, instructor, student
  - dept_advisor(s_id, i_id, dept_name)
- Functional dependencies
  - i_id → dept_name
  - s_id, dept_name → i_id

Dependency preservation, example
- Functional dependencies
  - i_id → dept_name
  - s_id, dept_name → i_id
- The schema on the previous slide is not BCNF
- Consider the decomposition of dept_advisor into
  - (s_id, i_id)
  - (i_id, dept_name) (this table can be dropped because it is contained in the instructor table)
- The second functional dependency involves 2 tables
- This decomposition is not dependency preserving

Efficiency of insertions, 1st design
- Functional dependencies
  - i_id → dept_name
  - s_id, dept_name → i_id
- Consider first the single-table solution
  - dept_advisor(s_id, i_id, dept_name)
- When inserting a tuple into dept_advisor, check that
  - the given i_id works in department dept_name; this is a lookup into instructor
  - s_id, dept_name → i_id is not violated; this is a primary key condition
- Both these conditions can be checked efficiently

Efficiency of insertions, 2nd design
- Functional dependencies
  - i_id → dept_name
  - s_id, dept_name → i_id
- Consider next the decomposed solution
  - dept_advisor(s_id, i_id)
- When inserting a tuple into dept_advisor, check that
  - s_id, dept_name → i_id is not violated; since dept_name is no longer stored in dept_advisor, this involves computing a join of dept_advisor and instructor
- Computing joins is slow and we would not like to do that for each insertion!

Third normal form (3NF)
- A table r(R) is in 3NF if for all functional dependencies α → β either
  - β ⊆ α (α → β is trivial)
  - or α is a superkey
  - or each attribute A in β - α is part of a candidate key
- A schema is in 3NF if all tables are in 3NF

Example in 3NF
- The schema is in 3NF
  - dept_advisor(s_id, i_id, dept_name)
- Functional dependencies
  - i_id → dept_name
  - s_id, dept_name → i_id
- Two candidate keys
  - (s_id, dept_name)
  - (s_id, i_id)
- So every attribute is part of a candidate key
- i.e. the third condition holds for any functional dependency
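
The three conditions of the 3NF definition can be checked for dept_advisor by brute force. The sketch below enumerates candidate keys by trying attribute subsets in increasing size, which is only sensible for very small schemas, and it only inspects the dependencies that are listed explicitly. Attribute closure and all function names are assumptions of this example, not material from the slides; the second check reuses the instructor table with the assumed dependency id → name, salary, dept_name.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency lhs -> rhs from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: Iterable[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Minimal superkeys, found by trying subsets from small to large."""
    attrs = set(schema)
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            # Skip supersets of a key already found: they are not minimal.
            if closure(s, fds) >= attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def in_3nf(schema: Iterable[str], fds: List[FD]) -> bool:
    attrs = set(schema)
    prime = set().union(*candidate_keys(attrs, fds))  # attributes in some candidate key
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= attrs
        rhs_prime = all(a in prime for a in rhs - lhs)
        if not (trivial or superkey or rhs_prime):
            return False
    return True


dept_advisor = "s_id i_id dept_name".split()
advisor_fds = [fd("i_id", "dept_name"), fd("s_id dept_name", "i_id")]
print(candidate_keys(dept_advisor, advisor_fds))
print(in_3nf(dept_advisor, advisor_fds))

# ASSUMPTION: id -> name, salary, dept_name is not stated in the lecture.
big = "id name salary dept_name building budget".split()
big_fds = [fd("id", "name salary dept_name"), fd("dept_name", "building budget")]
print(in_3nf(big, big_fds))
```

For dept_advisor the enumeration finds the two candidate keys named on the slide, so i_id → dept_name passes through the third condition. The undecomposed instructor table fails, because building and budget are not part of any candidate key.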

Summary
- Normalisation theory seeks to minimise redundancy
- A DB design satisfying BCNF is free of redundancy caused by functional dependencies
- Sometimes there are tradeoffs between efficiency and removal of redundancy
- In this case the weaker 3NF may be preferable

Learning objectives
- You should be able to see if a db schema satisfies BCNF or 3NF
- Next time:
  - BCNF decomposition: can always decompose a db design into BCNF in a lossless way
  - 3NF decomposition: can always decompose into 3NF in a lossless and dependency-preserving way