Institute of Southern Punjab, Multan
Mr. Muhammad Nouman Farooq
BSC-H (Computer Science), MS (Telecomm. and Networks)
Honors: Magna Cum Laude Honors Degree, Gold Medalist
Blog URL: noumanfarooqatisp.wordpress.com
E-Mail: noman.iefr@hotmail.com
Advance Database Systems Advanced Normalization Lecture# 6
Lecture 6: Advanced Normalization
Normalization (Brief Overview)
Functional Dependencies and Keys
1st Normal Form
2nd Normal Form
3rd Normal Form
3.5 Normal Form (Boyce-Codd Normal Form, BCNF)
4th Normal Form
5th Normal Form (Project-Join Normal Form, PJNF)
Domain Key Normal Form (DKNF)
6th Normal Form
Class Activity
Normalization (Brief Overview)
The main goal of Database Normalization is to restructure the logical data model of a database to:
1. Eliminate redundancy.
2. Organize data efficiently.
3. Reduce the possibility of data anomalies/irregularities.
Data anomalies are inconsistencies in the data stored in a database that result from an operation such as an update, insertion or deletion. Such inconsistencies may arise when a particular record is stored in multiple locations and not all of the copies are updated. We can prevent such anomalies by implementing 7 different levels of normalization called Normal Forms (NF). We'll only look at the first three.
Database Normalization was first proposed by Edgar F. Codd. Codd defined the first three of the 7 known Normal Forms, which are the ones we'll look into. In order to normalize, we must know the requirements of each of the three Normal Forms that we'll go over. One of the key points to remember is that Normal Forms are progressive: to achieve 3rd NF we must have 2nd NF, and to have 2nd NF we must have 1st NF.
Levels of Normalization
The levels of normalization are based on the amount of redundancy in the database. The various levels are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF) or 3.5 NF
Fourth Normal Form (4NF)
Fifth Normal Form (Project-Join Normal Form, PJNF)
Domain Key Normal Form (DKNF)
Sixth Normal Form (6NF)
Moving down this list, the number of tables increases and redundancy decreases.
Data Anomalies
1. Insertion Anomaly: We cannot make a record of Jones's address because he is not taking any classes.
2. Update Anomaly: Name and Address are clearly redundant (the relation is larger than it needs to be, and we have to update 3 rows to change the Address).
3. Delete Anomaly: We cannot delete Jones's enrolment without losing his address as well.
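The three anomalies can be reproduced in a few lines of Python. This is only an illustrative sketch: the enrolment table, its values and the helper functions `insert_row`, `update_address`, `delete_enrolments` and `known_addresses` are all hypothetical, since the original table from the slide is not available here.

```python
# Hypothetical single-table design: (student, course) acts as the key, and
# the student's address is repeated on every enrolment row.
enrolment = [
    {"student": "Smith", "address": "12 Canal Road", "course": "Databases"},
    {"student": "Smith", "address": "12 Canal Road", "course": "Networks"},
    {"student": "Smith", "address": "12 Canal Road", "course": "Algorithms"},
    {"student": "Jones", "address": "3 Fort Street", "course": "Databases"},
]


def insert_row(rows, student, address, course):
    """Insertion anomaly: a student without a course has no valid key."""
    if course is None:
        raise ValueError("course is part of the key and cannot be empty")
    rows.append({"student": student, "address": address, "course": course})


def update_address(rows, student, new_address):
    """Update anomaly: every copy of the address must be changed."""
    touched = 0
    for row in rows:
        if row["student"] == student:
            row["address"] = new_address
            touched += 1
    return touched


def delete_enrolments(rows, student):
    """Delete anomaly: dropping the enrolments also drops the address."""
    return [row for row in rows if row["student"] != student]


def known_addresses(rows):
    """Addresses that can still be looked up from the table."""
    return {row["student"]: row["address"] for row in rows}


rows_touched = update_address(enrolment, "Smith", "7 Mall Road")
after_delete = delete_enrolments(enrolment, "Jones")
print(rows_touched)                              # rows changed for one address
print("Jones" in known_addresses(after_delete))  # is his address still known?
```

Splitting the data into a student table and an enrolment table would remove all three problems, which is what the normal forms in the rest of the lecture formalize.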
Functional Dependencies and Keys
An example is shown on the next slide.
Prime and Non-Prime Attributes
In normalization terminology, any attribute that is a member of at least one Candidate Key (on its own or as part of a composite key) is known as a Prime Attribute, instead of the more common term Key Attribute. A Non-Prime or Non-Key attribute, therefore, is not part of any Candidate Key.
Closure Set of Attributes
Trivial means that the closures of D and E contain only the attribute itself.
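The closure of a set of attributes can be computed by repeatedly applying the functional dependencies until nothing new is added. The sketch below assumes a hypothetical relation R(A, B, C, D, E) with the dependencies A --> B and B --> C; the slide's own example is not available, so these dependencies are made up for illustration. Attributes are single letters, so a string such as "AB" stands for the set {A, B}.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already in the closure,
            # everything it determines joins the closure too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical dependencies on R(A, B, C, D, E): A --> B and B --> C.
fds = [("A", "B"), ("B", "C")]

print(sorted(closure("A", fds)))  # attributes reachable from A
print(sorted(closure("D", fds)))  # D determines nothing but itself
print(sorted(closure("E", fds)))  # E determines nothing but itself
```

In this sketch D and E appear on no left-hand side, so their closures are trivial in the sense used above.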
Finding Candidate Keys using Closure Set of Attributes
There are 15 possible keys (the non-empty subsets of the relation's attributes; for four attributes, 2^4 - 1 = 15). We have to find the Candidate Keys among these possible keys:
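The search over the possible keys can be sketched as a brute-force loop: take every non-empty subset of the attributes, smallest first, and keep those whose closure covers the whole relation and that do not contain a smaller key. The relation R(A, B, C, D) and the dependencies AB --> C, C --> D and D --> A below are hypothetical, chosen only because four attributes give the 15 subsets mentioned above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def all_subsets(attributes):
    """Every non-empty subset of the attributes, smallest first."""
    attributes = sorted(attributes)
    return [combo
            for size in range(1, len(attributes) + 1)
            for combo in combinations(attributes, size)]


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for combo in all_subsets(attributes):
        if any(set(key) <= set(combo) for key in keys):
            continue  # contains a smaller key, so it is not minimal
        if closure(combo, fds) == set(attributes):
            keys.append(combo)
    return keys


# Hypothetical relation and dependencies, single-letter attributes.
attributes = "ABCD"
fds = [("AB", "C"), ("C", "D"), ("D", "A")]

print(len(all_subsets(attributes)))     # number of possible keys
print(candidate_keys(attributes, fds))  # the minimal ones
```

Brute force is fine for classroom-sized relations; the number of subsets doubles with every extra attribute, so larger schemas need a smarter search.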
1st Normal Form
The requirements to satisfy the 1st NF are:
1) Each table has a Primary Key: a minimal set of attributes that uniquely identifies a record.
2) The values in each column of a table are atomic (no multi-valued attributes are allowed, e.g. stdcellno, stdemailaddress).
3) There are no repeating groups: two columns do not store similar information in the same table.
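As a small illustration of the atomicity requirement, the sketch below moves a multi-valued stdcellno attribute into its own table with one value per row. The student records, the stdid and stdname columns and the helper `split_multivalued` are hypothetical; only the attribute name stdcellno comes from the slide.

```python
def split_multivalued(records, key, multi_valued_field):
    """Return (main_rows, value_rows) with one atomic value per row."""
    main_rows, value_rows = [], []
    for record in records:
        # The main table keeps every column except the multi-valued one.
        main_rows.append(
            {k: v for k, v in record.items() if k != multi_valued_field}
        )
        # The new table holds one (key, value) row per stored value.
        for value in record[multi_valued_field]:
            value_rows.append({key: record[key], multi_valued_field: value})
    return main_rows, value_rows


# Hypothetical un-normalized data: stdcellno holds a list of numbers.
students = [
    {"stdid": 1, "stdname": "Ali", "stdcellno": ["0300-0000001", "0300-0000002"]},
    {"stdid": 2, "stdname": "Sara", "stdcellno": ["0300-0000003"]},
]

student_rows, cell_rows = split_multivalued(students, "stdid", "stdcellno")
for row in cell_rows:
    print(row)
```

In the resulting cell-number table the pair (stdid, stdcellno) can serve as the Primary Key, so all three 1NF requirements hold.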
2nd Normal Form
The requirements to satisfy the 2nd NF are:
1) All requirements for 1st NF must be met.
2) Any partial functional dependencies have been removed (i.e., every non-key attribute is identified by the whole Primary Key).
Example: bookisbn --> booktitle, bookfirstauthorname, bookpublisher
In the example above, the title of a book, the name of its first author and its publisher are functionally dependent on the book's International Standard Book Number (ISBN). In other words, all the listed attributes are identified uniquely by the whole Primary Key, bookisbn, so this relation/file/table/entity is in 2nd Normal Form.
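A partial dependency can be spotted mechanically: it is a dependency whose determinant is a proper subset of the Primary Key and whose right-hand side contains a non-key attribute. The sketch below applies that check to the bookisbn example and to a hypothetical enrolment relation with a composite key (studentid, courseid); the second relation and the function name `partial_dependencies` are assumptions made for illustration. The check only inspects the dependencies as listed and does not derive implied ones.

```python
def partial_dependencies(primary_key, fds):
    """List the stated dependencies that violate 2NF."""
    key = set(primary_key)
    partial = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        # Proper subset of the key determining a non-key attribute.
        if set(lhs) < key and non_key:
            partial.append((tuple(lhs), sorted(non_key)))
    return partial


# The book example: a single-attribute key cannot have partial dependencies.
book_fds = [
    (("bookisbn",), ("booktitle", "bookfirstauthorname", "bookpublisher")),
]
print(partial_dependencies(("bookisbn",), book_fds))

# Hypothetical relation with a composite key (studentid, courseid).
enrolment_fds = [
    (("studentid", "courseid"), ("grade",)),
    (("studentid",), ("studentname",)),  # depends on part of the key only
]
print(partial_dependencies(("studentid", "courseid"), enrolment_fds))
```

For the hypothetical enrolment relation, moving studentname into a separate student table keyed by studentid would remove the violation.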
3rd Normal Form
We have seen how Database Normalization can decrease redundancy, increase efficiency and reduce anomalies by implementing three of the seven levels of normalization, called Normal Forms. The first three Normal Forms (1NF to 3NF) are usually sufficient for small to medium-sized applications.
3.5 Normal Form (Boyce-Codd Normal Form, BCNF)
4th Normal Form
Conditions for a Relation R to be in 4NF
A relation R is in 4NF if and only if the following conditions are satisfied:
1) R must be in 3.5 NF (BCNF).
2) It must not contain any non-trivial Multi-Valued Dependency (MVD) whose determinant is not a superkey.
Note: Multi-Valued Attributes and Multi-Valued Dependencies are different concepts.
Multi-Valued Dependency (MVD)
An MVD is a dependency in which one attribute value determines a set of values of another attribute, independently of the remaining attributes.
Important points:
1) The relation must have three or more attributes (in the example there are three: Person, Mobile and Food_Like).
2) The dependent attributes (or subsets of attributes) must be independent of each other (in the example, Mobile and Food_Like are independent of each other).
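Whether an MVD holds in a given table can be checked directly from the definition: for every Person, the stored (Mobile, Food_Like) pairs must be exactly all combinations of that person's mobiles and that person's foods. The sketch below uses the attribute names from the example, but the row values and the functions `mvd_holds` and `project` are hypothetical.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check x ->> y in a three-attribute relation with attributes x, y, z."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {pair[0] for pair in pairs}
        zs = {pair[1] for pair in pairs}
        # Independence means every y value is paired with every z value.
        if pairs != set(product(ys, zs)):
            return False
    return True


def project(rows, attrs):
    """Distinct rows restricted to the given attributes."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


# Hypothetical data: Ali has two mobiles and likes two foods.
person_info = [
    {"Person": "Ali", "Mobile": "M1", "Food_Like": "Burger"},
    {"Person": "Ali", "Mobile": "M1", "Food_Like": "Pizza"},
    {"Person": "Ali", "Mobile": "M2", "Food_Like": "Burger"},
    {"Person": "Ali", "Mobile": "M2", "Food_Like": "Pizza"},
]

print(mvd_holds(person_info, "Person", "Mobile", "Food_Like"))

# 4NF decomposition: one table per independent fact.
print(project(person_info, ["Person", "Mobile"]))
print(project(person_info, ["Person", "Food_Like"]))
```

The four rows of the original table shrink to two rows in each projection, which is the redundancy that 4NF removes.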
Overview of Definitions of 4-NF and Multivalued Dependencies (MVD)
Example:
Anomalies in a Relation that is not in 4NF
Class Activity:
a) If the given relation is not in 4NF, then decompose it into 4NF and design an ERD in the Information Engineering standard.
b) Write the complete code for the ERD of part a) in MySQL, using XAMPP as the simulator, and show the Relational Schema Diagram for a database named 4NF.
5th Normal Form (Project-Join Normal Form, PJNF)
Why Do We Require Decomposition of a Relation?
1. Redundancy can be reduced.
2. Data independence can be maintained.
3. Anomalies can be reduced.
In a Natural Join operation, the tables are joined on their common attributes and repeated records are eliminated from the result, as shown in the figures.
In an Additive (Lossy) Join, spurious records/tuples/instances, i.e. extra rows that were not in the original table, are created after natural joining two or more tables. (In some situations such a row can be read as information that was missing from the original table.)
In a Non-Additive (Lossless) Join, we lose no information and create no spurious/false records/tuples in the relation/table/file; only the repetition in records is removed.
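A decomposition of R into R1, R2, ..., Rn is a Non-Additive (Lossless) Join when the natural join of the projections gives back R:
$$R = \pi_{R_1}(R) \bowtie \pi_{R_2}(R) \bowtie \dots \bowtie \pi_{R_n}(R)$$
That is: projection of R on R1, natural join, projection of R on R2, natural join, ..., natural join, projection of R on Rn = R.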
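When the data is available, this condition can be tested directly: project the relation, join the projections back together and compare with the original. The following is a minimal sketch with a hypothetical relation R(A, B, C) of two rows; the helper names `project`, `natural_join` and `as_set` are illustrative, not part of any library.

```python
def project(rows, attrs):
    """Distinct rows restricted to attrs (relational projection)."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Join two tables on their common attributes, dropping duplicates."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in common):
                merged = {**l_row, **r_row}
                if merged not in joined:
                    joined.append(merged)
    return joined


def as_set(rows):
    """Order-independent form of a table, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical relation R(A, B, C).
r = [
    {"A": "a1", "B": "x", "C": "p"},
    {"A": "a2", "B": "x", "C": "q"},
]

# Decomposing on A: the join gives back exactly R (lossless).
on_a = natural_join(project(r, ["A", "B"]), project(r, ["A", "C"]))
print(as_set(on_a) == as_set(r))

# Decomposing on B: the join creates extra rows (lossy).
on_b = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
print(len(on_b), as_set(on_b) == as_set(r))
```

The second decomposition fails because B has the same value in both rows, so joining on it pairs every A with every C.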
Conditions for a Relation R to be in 5NF
A relation R is in 5NF if and only if:
1) R is in 4NF.
2) No Join Dependency (JD) exists in R, other than those implied by its candidate keys.
a) A Non-Additive (Lossless) Join qualifies under one condition: the decomposition of the main table/relation into smaller relations must eliminate redundant records without losing original information and without creating false/spurious records.
b) An Additive (Lossy) Join is also treated as 5NF under one condition: the extra records carry meaningful information that was missing from the original relation (in that situation they are not called spurious records).
Lossy and Lossless Join Examples under Two Conditions
Condition 1: When data is given in a relation
Condition 2: When data is not given in a relation
Condition 1: When Data is Given in a Relation
Is the following relation a Lossy Join or a Lossless Join?
Condition 1: Example 2 R1 is a Super Set of R2:
Continued: tables R1, R2 and R3, followed by the natural join of their projections, $\pi_{R_1}(R) \bowtie \pi_{R_2}(R) \bowtie \pi_{R_3}(R)$.
Condition 1: Example 3
Condition 2: When Data is not given in a Relation
Solution:
Step 1: Write the data in tabular form (one row per decomposed relation, one column per attribute, with a star marking each attribute that the relation contains).
Step 2:
1. Apply the functional dependencies provided in the question.
2. For each dependency, select the determinant columns that contain a minimum of two stars.
Step 2 (continued): Now apply the functional dependencies b --> c and c --> d.
Step 3: Check the number of stars in each row. If any row contains stars in all columns, then the join is lossless; otherwise it is lossy.
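The star method above corresponds to the standard tableau (chase) test for a lossless join, which can be sketched in Python as follows. The dependencies b --> c and c --> d come from the question, but the relation R(a, b, c, d) and both decompositions below are assumptions, since the original question is not reproduced here. Cells without a star hold a unique placeholder symbol so that rows can be compared.

```python
def is_lossless(attributes, decomposition, fds):
    """Tableau (chase) test: True if the decomposition is a lossless join."""
    # Step 1: one row per sub-relation; "*" marks attributes it contains,
    # every other cell gets its own placeholder symbol.
    tableau = [
        {a: "*" if a in part else f"b{i}_{a}" for a in attributes}
        for i, part in enumerate(decomposition)
    ]
    # Step 2: apply the dependencies until the table stops changing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in tableau:
                for r2 in tableau:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r1[a] == r2[a]:
                            continue
                        # Rows agree on the determinant, so make them agree
                        # on the dependent column, preferring the star.
                        if "*" in (r1[a], r2[a]):
                            keep = "*"
                        else:
                            keep = min(r1[a], r2[a])
                        drop = r2[a] if r1[a] == keep else r1[a]
                        for row in tableau:
                            if row[a] == drop:
                                row[a] = keep
                        changed = True
    # Step 3: lossless if some row has a star in every column.
    return any(all(v == "*" for v in row.values()) for row in tableau)


# Assumed relation R(a, b, c, d) with b --> c and c --> d.
attributes = ["a", "b", "c", "d"]
fds = [(["b"], ["c"]), (["c"], ["d"])]

print(is_lossless(attributes, [{"a", "b"}, {"b", "c"}, {"c", "d"}], fds))
print(is_lossless(attributes, [{"a", "b"}, {"c", "d"}], fds))
```

In the first decomposition the row for R1(a, b) picks up a star under c from b --> c and then under d from c --> d, so it ends with stars in every column. The second decomposition shares no attribute between its parts, so no dependency can ever be applied.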
Domain Key Normal Form (DKNF)
6th Normal Form
Class Activity
Class Activity 1: Answer the following questions
Class Activity 2: Answer the following questions
a) Indicate the level of Normal Form of the relation given below.
b) If the relation is not in 4NF, then decompose it into 4NF relations.
c) Design an ERD of the 4NF relations in the Information Engineering standard.
Recommended Readings
Chapter 5 from: Modern Database Management, 8th Edition, by Jeffrey A. Hoffer, Mary B. Prescott and Fred R. McFadden (pages 211-219)
Advanced Normal Forms from: Modern Database Management, 8th Edition, by Jeffrey A. Hoffer, Mary B. Prescott and Fred R. McFadden (pages 605-610)
Chapter 14 from: Database Systems: A Practical Approach to Design, Implementation and Management, 4th Edition, by Thomas Connolly and Carolyn Begg (pages 481-485)
Chapter 4 from: Database Systems: A Pragmatic Approach, by Elvis C. Foster and Shripad V. Godbole (pages 72-79)
Chapter 6 from: Database Systems: Design, Implementation and Management, 9th Edition, by Carlos Coronel, Steven Morris and Peter Rob (pages 204-246)
Summary of Lecture 6
Normalization (Brief Overview)
Functional Dependencies and Keys
1st Normal Form
2nd Normal Form
3rd Normal Form
3.5 Normal Form (Boyce-Codd Normal Form, BCNF)
4th Normal Form
5th Normal Form (Project-Join Normal Form, PJNF)
Domain Key Normal Form (DKNF)
6th Normal Form
Class Activity
END OF LECTURE 6