SVY227: Databases for GIS
Lecture 6: Relational Database Normalisation 2

Dr Stuart Barr
School of Civil Engineering & Geosciences
University of Newcastle upon Tyne
Email: S.L.Barr@ncl.ac.uk

Lecture Objectives & Contents
Objectives:
  To introduce formal methods of database structure.
Contents:
  The Normal Forms.
  Summary.

Formal Methods - Normalisation
Normalisation involves applying formal methods to design an efficient database.
It relies on the concept of Functional Dependence.
Functional dependence is used to apply three stages, or rules, that refine and check the quality of the database.
These are known as First, Second and Third Normal Form.
Other higher-order forms also exist, but we will limit our coverage to the first three.

Formal Methods - Functional Dependence
A functional dependency is a constraint between two sets of attributes.
Imagine R = {A1, A2, A3, ..., An} is a single relation of all attributes.
  This is called a universal relation scheme.
A functional dependency X → Y:
  Occurs between two sets X and Y that are subsets of R.
  Is a constraint on the possible tuples that can form a relation r of R.
  The constraint is that any two tuples t1 and t2 in r that have t1[X] = t2[X] must also have t1[Y] = t2[Y].
  The values of the Y attributes of a tuple are dependent on (determined by) the values of the X attributes of the tuple.
So X → Y means X functionally determines Y.
  X is called the left-hand side of the FD.
  Y is called the right-hand side of the FD.

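
The tuple-based definition of X → Y can be tested directly against a relation instance. The following is a minimal sketch in Python, not part of the lecture material: the function name fd_holds, the sample rows and the attribute values are all invented for illustration. Note that such a test can only show that an FD is violated by the data; an FD is a constraint on the scheme, so one instance satisfying it does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance.

    rows: list of dicts (tuples of the relation r)
    lhs, rhs: lists of attribute names (the sets X and Y)
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # If t1[X] = t2[X] but t1[Y] != t2[Y], the FD is violated.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample tuples, invented for this sketch.
rows = [
    {"StaffID": 1, "ProjID": 10, "Hours": 5, "Sname": "Smith"},
    {"StaffID": 1, "ProjID": 20, "Hours": 8, "Sname": "Smith"},
    {"StaffID": 2, "ProjID": 10, "Hours": 3, "Sname": "Jones"},
]

staff_determines_name = fd_holds(rows, ["StaffID"], ["Sname"])
staff_determines_hours = fd_holds(rows, ["StaffID"], ["Hours"])
pair_determines_hours = fd_holds(rows, ["StaffID", "ProjID"], ["Hours"])
```

In this sample StaffID alone does not determine Hours, because staff member 1 appears with two different Hours values.
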
Formal Methods - Functional Dependence
Functional dependence:
  Describes a relation scheme by specifying constraints on its attributes.
  The constraints MUST hold at all times for the relation scheme.
An example:
  StaffID → Sname.
  ProjID → {Pname, Plocation}.
  {StaffID, ProjID} → Hours.

SurvProj(StaffID, ProjID, Hours, Sname, Pname, Plocation)

Formal Methods - Functional Dependence
The set of functional dependencies (FDs) of R is denoted F.
Obvious members of F are usually pre-determined as part of the database design process.
There are often many further FDs that can be inferred from F.
Inferring all of them gives the closure of F, denoted F+, which is the full set of functional dependencies.

Staff(StaffID, Sname, Fname, Bdate, Dnumber, Dname, Dmanager)

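
Listing every member of F+ quickly becomes unwieldy, so a common shortcut is to compute the closure of a set of attributes instead: all attributes that a given set X determines under F. The sketch below is an illustration rather than part of the lecture; the function name attribute_closure is an assumption, and the two FDs are the base dependencies of the Staff relation scheme.

```python
def attribute_closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains the whole left-hand side,
            # the right-hand side is determined as well.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Base FDs of the Staff relation scheme.
F = [
    (("StaffID",), ("Sname", "Fname", "Bdate", "Dnumber")),
    (("Dnumber",), ("Dname", "Dmanager")),
]

staff_closure = attribute_closure({"StaffID"}, F)
dept_closure = attribute_closure({"Dnumber"}, F)
```

An FD X → Y is in F+ exactly when Y is a subset of the closure of X, so StaffID → Dmanager can be confirmed by checking that Dmanager is in staff_closure.
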
Formal Methods - Functional Dependence
An example of inferred FDs:
  F = {StaffID → {Sname, Fname, Bdate, Dnumber}, Dnumber → {Dname, Dmanager}}.
Applying the reflexive, transitive and projective rules also gives:
  StaffID → StaffID (reflexive).
  StaffID → {Dname, Dmanager} (transitive).
  StaffID → Fname (projective).
  Dnumber → Dname (projective).

Staff(StaffID, Sname, Fname, Bdate, Dnumber, Dname, Dmanager)

Formal Methods - Functional Dependency & Normal Forms
FDs are used to help improve the meaning of relation schemes through the process of Normalisation:
  Analysis of relation schemes using FDs and primary keys.
  To minimise redundancy.
  To minimise later problems with insertion, deletion and update operations.
Normalisation involves applying Normal Forms:
  A series of tests to certify that a relation scheme satisfies a particular normal form.
  The normal form of a relation indicates the degree to which it has been normalised.
  If a relation scheme does not meet a particular normal form it can be decomposed until it does.

First Normal Form
The domain of an attribute MUST include only atomic (simple) values AND the value of any attribute in a tuple must be a single value from the domain of the attribute.

Not in 1NF:
Department(Dnumber, Dname, Dmanager, Dlocations)

Department
Dnumber  Dname  Dmanager  Dlocations
1        CEGS   3         {Cassie, Bedson, }

Now in 1NF BUT with redundancy:
Department
Dnumber  Dname  Dmanager  Dlocation
1        CEGS   3         Cassie
1        CEGS   3         Bedson

First Normal Form
Now in 1NF and redundancy removed:
Department
Dnumber  Dname  Dmanager
1        CEGS   3
2        GEOG   4

DeptLoc
Dnumber  Dlocation
1        Cassie
1        Bedson
2        Daysh

1NF in simple terms: a relation scheme must not contain any repeating attributes; items must be simple.

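
The move from the multi-valued Dlocations attribute to the separate DeptLoc relation can be sketched in a few lines of Python. This is an illustration only: the function name to_1nf and its parameters are assumptions, and the rows reuse the values shown in the Department and DeptLoc tables.

```python
def to_1nf(rows, key, multi_attr, single_name):
    """Split a multi-valued attribute out into its own relation.

    Returns (parent, child): parent keeps the atomic attributes,
    child holds one (key, value) tuple per repeated value.
    """
    parent = []
    child = []
    for row in rows:
        # Keep every attribute except the multi-valued one.
        parent.append({k: v for k, v in row.items() if k != multi_attr})
        # One child tuple per value of the multi-valued attribute.
        for value in row[multi_attr]:
            child.append({key: row[key], single_name: value})
    return parent, child


# Department with a non-atomic Dlocations attribute (not in 1NF).
department_unnormalised = [
    {"Dnumber": 1, "Dname": "CEGS", "Dmanager": 3, "Dlocations": ["Cassie", "Bedson"]},
    {"Dnumber": 2, "Dname": "GEOG", "Dmanager": 4, "Dlocations": ["Daysh"]},
]

department, dept_loc = to_1nf(
    department_unnormalised, "Dnumber", "Dlocations", "Dlocation"
)
```

Each department now appears once in department, and each location appears once in dept_loc, linked back through Dnumber.
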
Second Normal Form
A relation scheme is in second normal form if it is in first normal form and every nonprime attribute in R is fully functionally dependent on the primary key of R.

Second Normal Form
Full Functional Dependence:
  X → Y is a full functional dependency (FFD) if the removal of any attribute A from X means that the dependency no longer holds.
  X → Y is not an FFD (it is termed partial) if you can remove an attribute A from X and the dependency X → Y still holds.
E.g.:
  {StaffID, ProjID} → Hours is an FFD; removing ProjID gives StaffID → Hours, which does not hold (and neither does ProjID → Hours).
  {StaffID, ProjID} → Sname is not an FFD; after removing ProjID, StaffID → Sname still holds (it is one of the base FDs).

SurvProj(StaffID, ProjID, Hours, Sname, Pname, Plocation)

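
The test for full functional dependence can be automated: remove each attribute of X in turn and check whether Y is still determined by what remains. The sketch below is illustrative and assumes the three base FDs of SurvProj; the helper names attribute_closure and is_full_fd are not from the lecture.

```python
def attribute_closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_full_fd(lhs, rhs, fds):
    """Return True if lhs -> rhs holds and is a full functional dependency."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= attribute_closure(lhs, fds):
        return False  # the dependency does not hold at all
    for attr in lhs:
        # If rhs is still determined without attr, the FD is partial.
        if rhs <= attribute_closure(lhs - {attr}, fds):
            return False
    return True


# Base FDs of the SurvProj relation scheme.
F = [
    (("StaffID",), ("Sname",)),
    (("ProjID",), ("Pname", "Plocation")),
    (("StaffID", "ProjID"), ("Hours",)),
]

hours_is_full = is_full_fd({"StaffID", "ProjID"}, {"Hours"}, F)
sname_is_full = is_full_fd({"StaffID", "ProjID"}, {"Sname"}, F)
```

Removing one attribute at a time is enough: if any smaller subset of X determined Y, then X with a single attribute removed would determine Y as well.
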
Second Normal Form
A relation scheme is in second normal form if it is in first normal form and every nonprime attribute in R is fully functionally dependent on the primary key of R.
SO:
  Second normal form essentially ensures that you have no partial functional dependencies in your relation scheme.
  That is, every nonprime attribute is fully functionally dependent on the primary key.
  Hence, every nonprime attribute depends on the whole primary key and not on just part of it.

Second Normal Form - Checks
If the primary key has only one attribute:
  The relation is in second normal form.
  All attributes are fully functionally dependent on the primary key.
  This does not mean you have a good relation scheme design.
  The attributes may not be logical for the relation scheme.
If the primary key has two or more attributes:
  You need to test for partial functional dependencies, where the attributes on the left-hand side of an FD are only part of the primary key.
  If partial functional dependencies are found, you need to decompose the relation into a new series of relations, each of which contains only full functional dependencies.

Second Normal Form - Example
Partial functional dependencies:
  {StaffID, ProjID} → Sname violates 2NF.
  {StaffID, ProjID} → Pname violates 2NF.
  {StaffID, ProjID} → Plocation violates 2NF.

SurvProj(StaffID, ProjID, Hours, Sname, Pname, Plocation)

So what would not violate full functional dependency?
  {StaffID, ProjID} → Hours.
  {StaffID} → Sname.
  {ProjID} → {Pname, Plocation}.

Second Normal Form - Example
So:
  SurvProj(StaffID, ProjID, Hours, Sname, Pname, Plocation)
Becomes:
  R1(StaffID, ProjID, Hours)
  R2(StaffID, Sname)
  R3(ProjID, Pname, Plocation)

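
The decomposition of SurvProj into R1, R2 and R3 amounts to projecting the relation onto three attribute sets and dropping duplicate tuples. The sketch below illustrates this and then rejoins the pieces to confirm that no information is lost. The sample tuples, names and places are invented for illustration, and the helper names project and natural_join are assumptions.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left, right):
    """Join two relations on all attribute names they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical SurvProj tuples, invented for this sketch.
surv_proj = [
    {"StaffID": 1, "ProjID": 10, "Hours": 5, "Sname": "Smith",
     "Pname": "Survey A", "Plocation": "Newcastle"},
    {"StaffID": 1, "ProjID": 20, "Hours": 8, "Sname": "Smith",
     "Pname": "Survey B", "Plocation": "Durham"},
    {"StaffID": 2, "ProjID": 10, "Hours": 3, "Sname": "Jones",
     "Pname": "Survey A", "Plocation": "Newcastle"},
]

r1 = project(surv_proj, ["StaffID", "ProjID", "Hours"])
r2 = project(surv_proj, ["StaffID", "Sname"])
r3 = project(surv_proj, ["ProjID", "Pname", "Plocation"])

# Rejoining the three relations should give back the original tuples.
rejoined = natural_join(natural_join(r1, r2), r3)
lossless = ({frozenset(row.items()) for row in rejoined}
            == {frozenset(row.items()) for row in surv_proj})
```

In this sample the staff name "Smith" and the project "Survey A" are each stored once after decomposition rather than twice.
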
Second Normal Form - Golden Rule
When creating a relation scheme, only ever include attributes that are fully functionally dependent on the primary key.
That is, don't include attributes in a relation that are not needed there.

Third Normal Form
A relation scheme is in third normal form if it is in first and second normal form and no nonprime attribute is transitively dependent on the primary key.

Third Normal Form
R is said to have a transitive dependence X → Y if:
  There is a set of attributes Z that is neither a primary key nor a subset of any primary key of R.
  And X → Z and Z → Y both hold (and so, by inference, X → Y).
E.g., StaffID → Dmanager is transitive, via Dnumber.
Note: the relation is in second normal form.

Staff(StaffID, Sname, Fname, Bdate, Dnumber, Dname, Dmanager)

Third Normal Form - Solution
Again, decompose into a set of relations that are in third normal form.
  Staff(StaffID, Sname, Fname, Bdate, Dnumber, Dname, Dmanager)
So the above becomes:
  R1(StaffID, Sname, Fname, Bdate, Dnumber)
  R2(Dnumber, Dname, Dmanager)

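
A simple check for transitive dependencies looks for an FD Z → Y where Z is neither a key of the relation nor part of the primary key, and Y is a nonprime attribute. The sketch below is illustrative only. It assumes the relation has a single candidate key (the primary key), so the prime attributes are exactly the key attributes, and it inspects only the FDs it is given rather than all of F+. The function names are assumptions.

```python
def attribute_closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def transitive_violations(attrs, key, fds):
    """List (Z, A) pairs where nonprime A depends on a non-key set Z.

    Assumes key is the only candidate key of the relation.
    """
    key = set(key)
    violations = []
    for lhs, rhs in fds:
        z = set(lhs)
        is_superkey = attribute_closure(z, fds) >= set(attrs)
        if is_superkey or z <= key:
            continue  # Z is a key, or part of the primary key
        for attr in rhs:
            if attr not in key and attr not in z:
                violations.append((tuple(lhs), attr))
    return violations


staff_attrs = ["StaffID", "Sname", "Fname", "Bdate", "Dnumber", "Dname", "Dmanager"]
staff_fds = [
    (("StaffID",), ("Sname", "Fname", "Bdate", "Dnumber")),
    (("Dnumber",), ("Dname", "Dmanager")),
]
staff_problems = transitive_violations(staff_attrs, ["StaffID"], staff_fds)

# After decomposition, each relation is checked against its own FDs.
r1_problems = transitive_violations(
    ["StaffID", "Sname", "Fname", "Bdate", "Dnumber"], ["StaffID"], staff_fds[:1])
r2_problems = transitive_violations(
    ["Dnumber", "Dname", "Dmanager"], ["Dnumber"], staff_fds[1:])
```

For Staff the check reports Dname and Dmanager as dependent on Dnumber, which is the dependency that the decomposition into R1 and R2 removes.
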
Lecture 6: Reading
Elmasri, R., and Navathe, S., 2003. Fundamentals of Database Systems. Addison Wesley (London). Pages 293-332.

SVY227 L5: Relational Database Normalisation
Today we have:
  Introduced formal methods based on Codd's normal forms.
Next lecture:
  Introduction to Entity Relationship Models.