CS317 File and Database Systems
Lecture 8: Introduction to Normalization
October 17, 2017
Sam Siewert
http://dilbert.com/strips/comic/2010-08-24/
Exam #1 Questions?
Reminders
  Working on Grading Ex #3 - Return Next Week
  Grading Breakdown here - http://mercury.pr.erau.edu/~siewerts/cs317/policies/grading-breakdown.pdf
  Assignment #4, Wednesday, Normalization
  Assignment #5, Logical and Physical DB Design
  Assignment #6, DBMS Project of Your Interest
2
Normalization
Concern is Duplication of Data in DBMS
  Wastes Space
  Insert Hazard (Update Multiple Tables?)
  Delete Hazard (Delete from Multiple Tables?)
  Modification Hazard (Modify in Multiple Tables?)
  Foreign Keys are the Exception (Expected Redundancy for Relational Model)
Minimal Attributes (Columns in Relations [Tables])
Attributes in Table with Close Logical Relationship
Functionally Dependent Attributes in Same Relation
  Models of Functional Dependency
Minimal Redundancy [Foreign Keys Only]
3
How Normalization Supports Database Design (Ref. Connolly-Begg) 4
Data Redundancy and Update Anomalies [Figure annotations: FK Duplication [ok]; Redundant Attribute Data] 5
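Example Functional Dependency that Holds for All Time
Consider the values shown in the staffno and sname attributes of the Staff relation (previous slide). Based on the sample data, the following functional dependencies appear to hold:
  staffno -> sname
  sname -> staffno
Sample data can only suggest a dependency. Only staffno -> sname holds for all time, because two members of staff may share the same name.
6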
Data Redundancy and Update Anomalies StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff. In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchno) is repeated in the Staff relation, to represent where each member of staff is located. 7
Duplicate Data and Update Anomalies
Relations that contain redundant information may suffer from update anomalies. There are three update anomalies:
  Row Insertion: Enter SL99, assigned to B003, and fat-finger the baddress. SG37, SG14, and SG5 share B003 with SL99. Which baddress is right?
  Deletion: Delete SA9 (fired). What is the baddress of B007? Do we still have B007?
  Modification: Correct a bad street number for Deer Rd. Which row gets the update, the SL21 row or the SL41 row?
8
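The deletion and modification anomalies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuples are sample rows modeled on the StaffBranch example, the street addresses are illustrative values, "23 Deer Rd" is a hypothetical corrected address, and the helper name `addresses` is made up for this sketch.

```python
# Sample (staffno, branchno, baddress) rows; addresses are illustrative values.
staff_branch = [
    ("SL21", "B005", "22 Deer Rd, London"),
    ("SL41", "B005", "22 Deer Rd, London"),
    ("SA9", "B007", "16 Argyll St, Aberdeen"),
]

def addresses(rows, branchno):
    """Return the set of distinct addresses recorded for one branch."""
    return {addr for _, b, addr in rows if b == branchno}

# Modification anomaly: the street number is corrected in the SL21 row only.
modified = [
    (s, b, "23 Deer Rd, London") if s == "SL21" else (s, b, a)
    for s, b, a in staff_branch
]
print(len(addresses(modified, "B005")))  # → 2 (B005 now has conflicting addresses)

# Deletion anomaly: removing SA9 also removes the only record of B007.
after_delete = [row for row in staff_branch if row[0] != "SA9"]
print(addresses(after_delete, "B007"))  # → set()

# Separate Staff and Branch relations: each address is stored exactly once.
staff = {"SL21": "B005", "SL41": "B005", "SA9": "B007"}
branch = {"B005": "22 Deer Rd, London", "B007": "16 Argyll St, Aberdeen"}
del staff["SA9"]                      # B007 survives the deletion
branch["B005"] = "23 Deer Rd, London" # one update, no conflicting copies
print("B007" in branch)  # → True
```

In the separated form the only repeated value is branchno, which is the expected foreign key redundancy.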
Lossless-join and Dependency Preservation Properties
Two important properties of decomposition:
  The lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations. (I can recreate the UNF table as a view if I want to!)
  The dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations. E.g., domain constraints, referential integrity (all staff must have one branch assignment), StaffNo must be unique, etc.
9
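A rough way to see the lossless-join property is to project a relation into two smaller ones and join them back. The sketch below assumes a handful of sample rows modeled on the StaffBranch example (the positions and addresses are illustrative values) and uses Python sets as stand-ins for relations.

```python
# Sample (staffno, position, branchno, baddress) rows; values are illustrative.
staff_branch = {
    ("SL21", "Manager", "B005", "22 Deer Rd, London"),
    ("SL41", "Assistant", "B005", "22 Deer Rd, London"),
    ("SG5", "Manager", "B003", "163 Main St, Glasgow"),
    ("SG37", "Assistant", "B003", "163 Main St, Glasgow"),
}

# Decompose on branchno, which determines baddress.
staff = {(s, p, b) for s, p, b, _ in staff_branch}
branch = {(b, a) for _, _, b, a in staff_branch}

# Natural join on branchno recovers exactly the original rows.
rejoined = {(s, p, b, a) for s, p, b in staff for b2, a in branch if b == b2}
print(rejoined == staff_branch)  # → True

# A poor decomposition on position, which determines nothing useful here.
r1 = {(s, p) for s, p, _, _ in staff_branch}
r2 = {(p, b, a) for _, p, b, a in staff_branch}
bad_join = {(s, p, b, a) for s, p in r1 for p2, b, a in r2 if p == p2}
print(len(bad_join - staff_branch))  # → 4 (spurious rows, so the join is lossy)
```

The first decomposition works because the shared attribute, branchno, is a key of the Branch projection; position is a key of neither projection in the second.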
Functional Dependencies
Important concept associated with normalization. A functional dependency describes a relationship between attributes. For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A -> B) if each value of A in R is associated with exactly one value of B in R.
10
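The definition translates directly into a check over sample rows. The sketch below is a minimal illustration, assuming a hypothetical helper named `fd_holds` and a few sample Staff rows (names are illustrative; SL100 and B099 follow the "two employees named John White" example used later in this lecture). A check like this can only refute a dependency; it cannot prove that one holds for all time.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each lhs value is paired with exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

staff = [
    {"staffno": "SL21", "sname": "John White", "branchno": "B005"},
    {"staffno": "SG37", "sname": "Ann Beech", "branchno": "B003"},
    {"staffno": "SA9", "sname": "Mary Howe", "branchno": "B007"},
]

print(fd_holds(staff, ["staffno"], ["sname"]))  # → True
print(fd_holds(staff, ["sname"], ["staffno"]))  # → True (only by accident of the sample)

# A second John White joins: the sample no longer supports sname -> staffno.
staff.append({"staffno": "SL100", "sname": "John White", "branchno": "B099"})
print(fd_holds(staff, ["staffno"], ["sname"]))  # → True
print(fd_holds(staff, ["sname"], ["staffno"]))  # → False
```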
Characteristics of Functional Dependencies
A functional dependency is a property of the meaning or semantics of the attributes in a relation. It can be shown with a diagrammatic representation. The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow.
11
An Example Functional Dependency 12
Characteristics of Functional Dependencies
Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. E.g., branch assignment does not depend on your salary or position, just on who you are.
13
Functional Dependencies
Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right-hand side. This requirement is called full functional dependency. E.g., a staff member is assigned to one and only one branch; a branch has many staff members assigned to it.
14
Full vs. Partial Functional Dependency
Staff relation: staffno, sname -> branchno
Each value of (staffno, sname) is associated with a single value of branchno. However, branchno is also functionally dependent on a subset of (staffno, sname), namely staffno. The example above is therefore a partial dependency (the name is irrelevant).
15
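The full versus partial distinction can be tested mechanically on sample rows by checking every proper subset of the determinant. The sketch below is an illustration only: the helper names `fd_holds` and `is_full_dependency` and the sample rows are assumptions made for this example, and a check on sample data says nothing certain about all possible data.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if each lhs value is paired with exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full_dependency(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False  # a smaller determinant suffices, so lhs -> rhs is partial
    return True

# Illustrative sample rows, including two staff members named John White.
staff = [
    {"staffno": "SL21", "sname": "John White", "branchno": "B005"},
    {"staffno": "SL100", "sname": "John White", "branchno": "B099"},
    {"staffno": "SL41", "sname": "Julie Lee", "branchno": "B005"},
]

print(is_full_dependency(staff, ["staffno", "sname"], ["branchno"]))  # → False (partial)
print(is_full_dependency(staff, ["staffno"], ["branchno"]))           # → True (full)
```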
Better Staff Branch Relations
[Diagram labels: Full functional, "assigned to only one"; Partial functional, "assigned to only one"]
E.g., two employees named John White: SL21 and a new John White as SL100
  SL21 -> B005
  SL100 -> B099
16
Transitive Dependencies
It is important to recognize a transitive dependency because its existence in a relation can potentially cause update anomalies. A transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A -> B and B -> C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
17
Example Transitive Dependency
Consider the functional dependencies in the StaffBranch relation (see Slide 17):
  staffno -> sname, position, salary, branchno, baddress
  branchno -> baddress
baddress is transitively dependent on staffno via branchno, because staffno -> branchno and branchno -> baddress.
18
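One way to find what a determinant reaches transitively is to compute its attribute closure. The sketch below assumes a hypothetical helper named `closure` and lists only the direct dependencies, so baddress is deliberately left off the right-hand side of the staffno dependency to show that it is still reached through branchno.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once its whole left-hand side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Direct dependencies only; staffno -> baddress is not listed on purpose.
fds = [
    (("staffno",), ("sname", "position", "salary", "branchno")),
    (("branchno",), ("baddress",)),
]

print("baddress" in closure({"staffno"}, fds))  # → True (reached via branchno)
print(sorted(closure({"branchno"}, fds)))       # → ['baddress', 'branchno']
```

The second result also confirms the side condition in the definition: staffno is not in the closure of branchno, so staffno is not functionally dependent on branchno.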
Better Staff Branch Relations
[Diagram labels: "assigned to only one"; "has only one address"]
Branch address of StaffNo is transitive
19
The Process of Normalization 20
The Process of Normalization 21
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
Worst case:
  May also have Full/Partial Functional Dependencies
  May also have Transitive Functional Dependencies
E.g., Most Excel Spreadsheets!!!
22
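A repeating group can be pictured as a list stored inside a single row. The sketch below is a hypothetical illustration, not taken from the slides: it assumes a branch table whose `staff` field holds several staff numbers, and flattens it so that every field holds one atomic value.

```python
# Hypothetical unnormalized rows: "staff" is a repeating group.
unf = [
    {"branchno": "B005", "baddress": "22 Deer Rd, London", "staff": ["SL21", "SL41"]},
    {"branchno": "B007", "baddress": "16 Argyll St, Aberdeen", "staff": ["SA9"]},
]

# Flatten: one row per (branch, staff member), every value atomic.
flat = [
    {"branchno": row["branchno"], "baddress": row["baddress"], "staffno": staffno}
    for row in unf
    for staffno in row["staff"]
]

for row in flat:
    print(row["staffno"], row["branchno"], row["baddress"])
print(len(flat))  # → 3
```

Flattening removes the repeating group but repeats baddress for every staff member at a branch, so the partial and transitive dependencies still have to be dealt with separately.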
Case in Point
Omission of data
  Would an RDBMS have caught it?
  Perhaps, if the data for the plot had been queried from a well-formed schema
  Spreadsheets tend to use ranges rather than predicates
BBC News story: "Reinhart, Rogoff... and Herndon: The student who caught out the profs"
23