The Relational Model and Normalization 1. Introduction 2 2. Relational Model Terminology 3 4. Normal Forms 11 5. Multi-valued Dependency 21 6. The Fifth Normal Form 22
The Relational Model and Normalization 1. Introduction Questions? - Should we store these two tables as they are, or - Should we combine them into one table in our new database? We need to understand: - The relational model - Relational model terminology
2. Relational Model Terminology Relational Model - Introduced in 1970 Created by E.F. Codd - He was an IBM engineer - The model used mathematics known as "relational algebra" Now the standard model for commercial DBMS products The most Important Terms used by the Relational Model Entity Relation Functional Dependency Determinant Candidate Key Composite Key Primary Key Surrogate Key Foreign Key Referential integrity constraint Normal Form Multivalued Dependency (1) Entity An entity is some identifiable thing that users want to track: - Customers - Computers - Sales
(2) Relation A Relation : - Figure 3-5 : Tables that are not relations (a) Table with Multiple entries per cell (b) Table with Required Row order - Alternative Terminology
(3) Functional Dependency A functional dependency occurs when the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s): StudentID StudentName StudentID (DormName, DormRoom, Fee) X Y : If the value of X determines the value of Y, attribute Y is functionally dependent on attribute X Determinant - The attribute on the left side of the functional dependency is called the determinant - Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice) ExtendedPrice - Function dependencies are not equations! Composite Determinant - A determinant of a functional dependency that consists of more than one attribute (StudentName, ClassName) (Grade) Functional Dependency Rule 1 If X (Y, Z), then X Y and X Z 2 But, If (X, Y) Z, it is not true X Y and Y Z neither X nor Y determines Z by itself
Formal Description Let R be a relation schema α R, β R The functional dependency α β holds on R if and only if for any legal relation r(r), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes on the β That is, t1[α] = t2[α] t1[β] = t2[β] l Example : Supplier Relation S# (SNAME, STATUS, CITY) S# SNAME STATUS CITY Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens S# FD diagram SNAME STATUS CITY Finding Functional Dependency in SKU_DATA Relation SKU (SKU_Description, Department, Buyer) SKU_Description (SKU, Department, Buyer) Buyer Department
Finding Functional Dependency in Order_ITEM Relation (OrderNumber, SKU) (Quantity, Price, ExtendedPrice) (Quantity, Price) (ExtendedPrice) SKU PRICE (the price can be changed after an order has been processed) What Are Determinant Values Unique? A determinant is unique in a relation if, and only if, it determines every other column in the relation You cannot find the determinants of all functional dependencies simply by looking for unique values in one column: - Data set limitations - Must be logically a determinant To determine if attribute A determines attribute B "Every time that a value of attribute A appears is it mateched with the same value of attribute B?" - sample data can be incomplete
Keys Key A group of one or more attributes that uniquely identifies a row K is a super key for relation schema R if and only if K R (uniqueness property) K is candidate key for R if and only if K R and there is no α K, α R ( minimality property) A primary key is a candidate key selected as the primary means of identifying rows in a relation: - There is one and only one primary key per relation - The primary key may be a composite key There is at least one candidate key - because relations do not contain duplicate tuples - at least the combination of all attributes of the relation has the uniqueness property A surrogate key as an artificial column added to a relation to serve as a primary key: A foreign key is the primary key of one relation that is placed in another relation to form a link between the relations: - A foreign key can be a single column or a composite key - The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values DEPARTMENT (DepartmentName, BudgetCode, ManagerName) EMPLOYEE(EmployeeNumber,EmployeeName, DepartmentName)
Definition of Foreign Key Let R2 be a base relation. Then foreign key in R2 is a subset of the set of attributes of R2, say FK, such that 1 There exists a base relation R1 with a candidate key CK, (R1 and R2 not necessarily distinct) 2 For all time, each value of FK in the current value of R2 is identical to the value of CK in some tuple in the current value of R1 R1 CK R2 FK Referential Integrity Rule The database must not contain any unmatched foreign key values If B references A, A must exist "foreign key" and "referential integrity" are defined in terms of each other Database Designer - specify which operations should be rejected - specify which compensating operations should be performed What should happen on an attempt to delete the target of a foreign key reference? RESTRICTED : "restricted" to the case where there are no matching. CASCADES : " cascades" to delete those matching also
Nulls - as a basis for dealing with the problem of missing information - "date of birth unknown","to be announced" - Nulls are not the same as blank or zero three valued logic AND True False unknown True True False unknown False False False False unknown Unknown False unknown OR True False unknown True True True True False True False unknown unknown True unknown unknown l Entity Integrity Rule No component of the primary key of a base relation is allowed to accept nulls. definition of every attribute involved in primary key must include the specification NULLS NOT ALLOWED l Integrity Rules - Summary Referential Integrity Rule Entity Integrity Rule
4. Normal Forms Modification Anomalies update anomaly (before update) (after update) Deletion Anomaly - To lose facts that tow different things, ( a machine and a repair) - If you delete a tuple that has the item number 200 or 300 Insertion Anomaly Primary Key :( OrderNumber, SKU) - We assume that situation when we are going to insert the fact that related with only the SKU ( SKU, SKU_Descripion, Department, Buyer) - Is it possible?
Normalization Theory All Possible Relational Schema Repetition of information Inability to represent certain information Inefficient retrieval and update performance Normalization Good Database Design Relations are categorized as a normal form based on which modification anomalies or other problems that they are subject to:
Normalization Category 1NF - A table that qualifies as a relation is in 1NF 2NF - A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key 3NF - A relation is in 3NF if it is in 2NF and has no determinants except the primary key - The relation R is 3NF iff it is 2NF and has no transitive dependencies Boyce-Codd Normal Form (BCNF) - A relation is in BCNF if every determinant is a candidate key Eliminating Anomalies from Functional Dependencies Put all relations into Boyce-Codd Normal Form (BCNF):
Eliminating Anomalies from Functional Dependencies : Example 1 (page 84) - SKU_DATA relation FDs SKU (SKU_Description, Department, Buyer) SKU_Description (SKU, Department, Buyer) Buyer Department SKU and SKU_Description ; all of the attributes in the table, so they are candidate keys Buyer ; It does not determine all of the other attributes Hence, the SKU_DATA relation is not in BCNF Step 3-A in Figure 3-11 : Step 3-B : make the Buyer attribute the primary key BUYER(Buyer, Department) Step 3-C and 3-D : SKU_DATA_2(SKU, SKU_Description, Buyer)
[Report #3] Describe the whole process of below examples and write your explanation and PKs, FKs of the decomposed relations for each step of process for putting a relation into BCNF. 1) Eliminating Anomalies from Functional Dependencies : Example 2 (page 85) 2) Eliminating Anomalies from Functional Dependencies : Example 3 (page 86) 3) Eliminating Anomalies from Functional Dependencies : Example 4 (page 87) 4) Eliminating Anomalies from Functional Dependencies : Example 5 (page 89)
Theoretical Approach : The First Normal Form definition - Any table of data that meets the definition of a relation Formal Definition A relation is in 1NF iff all underlying domains contain scalar values only Example " 상태 20 인도시 런던 에위치한공급자 이부품 P1 을 300 개공급한다 S# STATUS CITY 20 London 20 London 20 London 20 London S2 10 Paris S2 10 Paris S3 10 Paris P# P1 P2 P3 P4 P1 P2 P2 QTY 300 200 400 200 300 400 200 S5 40 Athens P4 100 S# P# QTY CITY STATUS l Anomalies in 1NF 3 Insertion Anomaly New Supplier without P# 4 Deletion Anomaly Unexpected loss of bundled information 5 Update Anomaly FIRST London Paris redundancy S# STATUS CITY P# QTY 20 London P1 300 20 London P2 200 20 London P3 400 20 London P4 200 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 90 New York P2 200 S5 40 Athens P4 100 S6 70 Florence Null Null
l Solution Decomposition First SECOND(S#, STATUS, CITY), SP(S#, P#, QTY) l Nonloss(lossless decomposition) - Reversibility S S# STATUS CITY S3 30 Paris S5 30 Athens nonloss S# STATUS S3 30 S5 30 SC S# CITY S3 Paris S5 Athens lossy S# STATUS S3 30 S5 30 STC STATUS CITY 30 Paris 30 Athens l test for the lossless decomposition nonloss (a) SST S# STATUS SC S# CITY S# STATUS CITY S S3 30 S3 Paris S3 30 Paris S5 30 S5 Athens S5 30 Athens lossy (b) SST S# STATUS STC STATUS CITY S# STATUS CITY S S3 30 30 Paris S3 30 Paris S5 30 30 Athens S3 30 Athens S5 30 Paris S5 30 Athens
l Decomposition : In a view of functional dependency S S# STATUS CITY S3 30 Paris S5 30 Athens S# S# STATUS S# S# CITY CITY (a) SST S# STATUS S3 30 S5 30 SC S# CITY S3 Paris S5 Athens S# S# STATUS S# S# CITY CITY (b) SST We cannot tell which supplier has which city S# STATUS S3 30 S5 30 STC STATUS CITY 30 Paris 30 Athens S# STATUS S# STATUS S# CITY is lost l Heath's Theorem Let R(A, B, C) be a Relation, where A, B, C are set of attributes, - If R satisfies the FD A B, - then R is equal to join of its projections on R(A, B) and R(B, C) l First SECOND(S#, STATUS, CITY), SP(S#, P#, QTY) SECOND S# S2 S3 S5 STATUS CITY 20 London 10 Paris 10 Paris 40 Athens SP S# P# QTY P1 300 P2 200 P3 400 P4 200 S2 P1 300 S2 P2 400 S3 P2 200 S5 P4 100
Theoretical Approach : The Second Normal Form l Definition - R is in 2NF When all of a relation's nonkey attributes are dependent of a key l Formal Definition A relation R is in 2NF if and only if - 1NF - Every non-key attribute is irreducibly(fully) dependent on the primary key Anomaly in 2NF Insertion Anomaly - Insertion {Florence, Italy} Deletion Anomaly - Deletion of tuple : Unexpected loss of bundled information Update Anomaly SECOND S# S2 S3 S5 STATUS CITY 20 London 10 Paris 10 Paris 40 Athens S# CITY STATUS Because, Transitive Dependency Decomposition SECOND(S#, STATUS, CITY) decomposition SC(S#, STATUS), CS(CITY, STATUS) SC S# CITY CITY STATUS CS London Athens 30 S2 Paris London 20 S3 Paris Paris 10 S5 Athens Rome 50
Theoretical Approach : The Third Normal Form Definition The relation R is 3NF iff it is 2NF and has no transitive dependencies Formal Definition - 2NF - Every non-key attribute is non-transitively(no mutual dependency) on the primary key Anomaly in 3NF 3NF o w S Smith Smith Jones Jones Semantic constraints J Math Physics Math Physics T Prof. White Prof. Green Prof. White Prof. Brown S J T Two candidate keys : {S,J}, {S, T} w For each subject, each student of that subject is taught by only one teacher { S, J } T w Each teacher teaches only one subject T J w Each subject is taught by several teachers Anomaly w Jones drops the physics w delete the last tuple w we loss the information Prof. Brown teaches Physic Solution : Decomposition - SJT ST(S, T), TJ((T, J) - BCNF(Boyce-Codd Normal Form)
5. Multi-valued Dependency A multi-valued dependency occurs when a determinant determines a particular set of values: Employee Degree Employee Sibling PartKit Part The determinant of a multivalued dependency can never be a primary key Multivalued dependencies are not a problem if they are in a separate relation, so: - Always put multivaled dependencies into their own relation - This is known as Fourth Normal Form (4NF)
A relation is in 4NF iff, for all multi-valued dependency A B, non-key attributes are dependent on A C T B database database Prof. White Prof. White Korth Ullman database Prof. White Prof. Green Korth Ullman database Prof. Green Korth database Prof. Green Ullman insertion C T C B database Prof. Sara Ullman database Prof. Sara Korth 6. The Fifth Normal Form A relation is in 5NF iff, for all join dependency (R 1, R 2,..., R n ), R 1, R 2,, R n is a candidate key S# P# P1 P2 J# J2 J1 S2 P1 J1 P1 J1 S# P# P1 P2 P# P1 P2 J# J2 J1 S# J# J2 J1 S# P# P1 P2 P# P1 P2 J# J2 J1 S# J# J2 J1 S2 P1 P1 J1 S2 J1 P1 J1
[Report #4] 1. make the relation that is the result of join on the relation CUSTOMER and ORDER. CUSTOMER (Cid, Name, Address) Cid (Name, Address, zipcode) Address Zipcode ORDER (BookNumber, Cid, Price, Orderdate) (BookNumber, Cid) (Price, Orderdate) 2. Write a detail process the following problems. (1) Describe the FDs in result relation 1 and illustrate the anomaly (2) Transform into the second Normal Form (3) Transform into the third Normal Form (4) Transform into the BCNF (5) make sure that result of (4) is the same as the result of problem 1