The Relational Model and Normalization


Contents
1. Introduction
2. Relational Model Terminology
4. Normal Forms
5. Multi-valued Dependency
6. The Fifth Normal Form

1. Introduction

Questions:
- Should we store these two tables as they are, or
- should we combine them into one table in our new database?

To answer these questions we need to understand:
- The relational model
- Relational model terminology

2. Relational Model Terminology

Relational Model
- Introduced in 1970
- Created by E.F. Codd
  - He was an IBM engineer
  - The model used mathematics known as "relational algebra"
- Now the standard model for commercial DBMS products

The most important terms used by the relational model:
- Entity
- Relation
- Functional Dependency
- Determinant
- Candidate Key
- Composite Key
- Primary Key
- Surrogate Key
- Foreign Key
- Referential integrity constraint
- Normal Form
- Multivalued Dependency

(1) Entity
An entity is some identifiable thing that users want to track:
- Customers
- Computers
- Sales

(2) Relation
A relation:
- Figure 3-5: Tables that are not relations
  (a) Table with multiple entries per cell
  (b) Table with required row order
- Alternative terminology

(3) Functional Dependency
A functional dependency occurs when the value of one attribute (or a set of attributes) determines the value of a second attribute (or set of attributes):
- StudentID → StudentName
- StudentID → (DormName, DormRoom, Fee)
- X → Y: if the value of X determines the value of Y, attribute Y is functionally dependent on attribute X

Determinant
- The attribute on the left side of the functional dependency is called the determinant
- Functional dependencies may be based on equations:
  ExtendedPrice = Quantity × UnitPrice
  (Quantity, UnitPrice) → ExtendedPrice
- Functional dependencies are not equations!

Composite Determinant
- A determinant of a functional dependency that consists of more than one attribute:
  (StudentName, ClassName) → (Grade)

Functional Dependency Rules
1. If X → (Y, Z), then X → Y and X → Z
2. But if (X, Y) → Z, it does not follow that X → Z or Y → Z; in general, neither X nor Y determines Z by itself

Formal Description
Let R be a relation schema, with α ⊆ R and β ⊆ R.
The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
  t1[α] = t2[α] ⇒ t1[β] = t2[β]

Example: Supplier Relation
S# → (SNAME, STATUS, CITY)

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

FD diagram: S# → SNAME, S# → STATUS, S# → CITY

Finding Functional Dependencies in the SKU_DATA Relation
- SKU → (SKU_Description, Department, Buyer)
- SKU_Description → (SKU, Department, Buyer)
- Buyer → Department
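The formal definition can be turned into a small check against sample data. The sketch below is only an illustration: the function name fd_holds is hypothetical, and the first supplier row is assumed to be S1. A sample can refute a functional dependency, but it can never prove that one holds for every legal relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, val) != val:
            return False
    return True


# Supplier relation from the example (first row assumed to be S1)
suppliers = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]

print(fd_holds(suppliers, ["S#"], ["SNAME", "STATUS", "CITY"]))  # True
print(fd_holds(suppliers, ["CITY"], ["STATUS"]))                 # False
print(fd_holds(suppliers, ["SNAME"], ["S#"]))                    # True
```

SNAME → S# passes only because no two suppliers in this sample share a name; whether it is a real functional dependency is a question about the business rules, not about the data.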

Finding Functional Dependencies in the ORDER_ITEM Relation
- (OrderNumber, SKU) → (Quantity, Price, ExtendedPrice)
- (Quantity, Price) → (ExtendedPrice)
- SKU does not determine Price (the price can be changed after an order has been processed)

When Are Determinant Values Unique?
A determinant is unique in a relation if, and only if, it determines every other column in the relation.
You cannot find the determinants of all functional dependencies simply by looking for unique values in one column:
- Data set limitations
- It must logically be a determinant

To decide whether attribute A determines attribute B, ask:
"Every time that a value of attribute A appears, is it matched with the same value of attribute B?"
- Sample data can be incomplete

Keys

Key
- A group of one or more attributes that uniquely identifies a row
- K is a super key for relation schema R if and only if K → R (uniqueness property)
- K is a candidate key for R if and only if K → R and there is no α ⊂ K with α → R (minimality property)

A primary key is a candidate key selected as the primary means of identifying rows in a relation:
- There is one and only one primary key per relation
- The primary key may be a composite key

There is at least one candidate key:
- because relations do not contain duplicate tuples
- at least the combination of all attributes of the relation has the uniqueness property

A surrogate key is an artificial column added to a relation to serve as a primary key.

A foreign key is the primary key of one relation that is placed in another relation to form a link between the relations:
- A foreign key can be a single column or a composite key
- The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values

DEPARTMENT (DepartmentName, BudgetCode, ManagerName)
EMPLOYEE (EmployeeNumber, EmployeeName, DepartmentName)
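The uniqueness and minimality properties can be checked mechanically from a set of functional dependencies. The following is a minimal sketch, assuming the SKU_DATA dependencies listed earlier; the function names closure and candidate_keys are illustrative, and the brute-force search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            # Skip supersets of a key already found (minimality property)
            if any(set(k).issubset(combo) for k in keys):
                continue
            # Uniqueness property: combo determines every attribute
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys


sku_data = {"SKU", "SKU_Description", "Department", "Buyer"}
sku_fds = [
    (("SKU",), ("SKU_Description", "Department", "Buyer")),
    (("SKU_Description",), ("SKU", "Department", "Buyer")),
    (("Buyer",), ("Department",)),
]

print(candidate_keys(sku_data, sku_fds))
# [('SKU',), ('SKU_Description',)]
```

Every superset of a candidate key is a super key, which is why the search skips supersets once a key has been found.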

Definition of Foreign Key
Let R2 be a base relation. Then a foreign key in R2 is a subset of the set of attributes of R2, say FK, such that:
1. There exists a base relation R1 with a candidate key CK (R1 and R2 not necessarily distinct)
2. For all time, each value of FK in the current value of R2 is identical to the value of CK in some tuple in the current value of R1

R1 (CK) ← R2 (FK)

Referential Integrity Rule
- The database must not contain any unmatched foreign key values
- If B references A, A must exist
- "Foreign key" and "referential integrity" are defined in terms of each other

Database Designer
- specifies which operations should be rejected
- specifies which compensating operations should be performed

What should happen on an attempt to delete the target of a foreign key reference?
- RESTRICTED: the delete is "restricted" to the case where there are no matching foreign key values; otherwise it is rejected
- CASCADES: the delete "cascades" to delete the matching tuples as well
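The two delete rules can be tried out with SQLite through Python's standard sqlite3 module. This is a sketch under stated assumptions: the helper names and the sample row values ('Accounting', 'BC-100', 'Smith', 'Jones') are made up for illustration, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def make_db(on_delete):
    """Build DEPARTMENT and EMPLOYEE with the given ON DELETE rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        "CREATE TABLE DEPARTMENT ("
        " DepartmentName TEXT PRIMARY KEY,"
        " BudgetCode TEXT,"
        " ManagerName TEXT)"
    )
    conn.execute(
        "CREATE TABLE EMPLOYEE ("
        " EmployeeNumber INTEGER PRIMARY KEY,"
        " EmployeeName TEXT,"
        " DepartmentName TEXT REFERENCES DEPARTMENT(DepartmentName)"
        f" ON DELETE {on_delete})"
    )
    # Made-up sample rows, for illustration only
    conn.execute("INSERT INTO DEPARTMENT VALUES ('Accounting', 'BC-100', 'Smith')")
    conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Jones', 'Accounting')")
    return conn


def try_delete_department(conn):
    """Attempt to delete the target of the foreign key reference."""
    try:
        conn.execute("DELETE FROM DEPARTMENT WHERE DepartmentName = 'Accounting'")
        return "deleted"
    except sqlite3.IntegrityError:
        return "rejected"


def employee_count(conn):
    return conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]


restricted = make_db("RESTRICT")
print(try_delete_department(restricted), employee_count(restricted))  # rejected 1

cascading = make_db("CASCADE")
print(try_delete_department(cascading), employee_count(cascading))    # deleted 0
```

Inserting an EMPLOYEE row whose DepartmentName has no match in DEPARTMENT is rejected in the same way, which is the "no unmatched foreign key values" rule.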

Nulls
- A basis for dealing with the problem of missing information
  - "date of birth unknown", "to be announced"
- Nulls are not the same as blank or zero
- Nulls lead to three-valued logic:

AND       True      False     Unknown
True      True      False     Unknown
False     False     False     False
Unknown   Unknown   False     Unknown

OR        True      False     Unknown
True      True      True      True
False     True      False     Unknown
Unknown   True      Unknown   Unknown

Entity Integrity Rule
- No component of the primary key of a base relation is allowed to accept nulls
- The definition of every attribute involved in the primary key must include the specification NULLS NOT ALLOWED

Integrity Rules - Summary
- Referential Integrity Rule
- Entity Integrity Rule
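The two truth tables can be written as a pair of small functions. In this sketch Python's None stands in for the "unknown" truth value; the names and3 and or3 are illustrative.

```python
def and3(a, b):
    """Three-valued AND; None represents 'unknown'."""
    if a is False or b is False:
        return False          # one False is enough to decide
    if a is None or b is None:
        return None           # otherwise unknown stays unknown
    return True


def or3(a, b):
    """Three-valued OR; None represents 'unknown'."""
    if a is True or b is True:
        return True           # one True is enough to decide
    if a is None or b is None:
        return None
    return False


values = [True, False, None]
for a in values:
    print([and3(a, b) for b in values], [or3(a, b) for b in values])
```

Each printed row corresponds to one row of the AND table followed by the same row of the OR table.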

4. Normal Forms

Modification Anomalies

Update Anomaly
- [Figure: the relation before update and after update]

Deletion Anomaly
- Deleting one row loses facts about two different things (a machine and a repair)
- This happens if you delete a tuple that has the item number 200 or 300

Insertion Anomaly
- Primary key: (OrderNumber, SKU)
- Suppose we want to insert a fact that relates only to the SKU: (SKU, SKU_Description, Department, Buyer)
- Is it possible?

Normalization Theory

All Possible Relational Schemas → Normalization → Good Database Design

Problems that normalization addresses:
- Repetition of information
- Inability to represent certain information
- Inefficient retrieval and update performance

Relations are categorized into normal forms based on which modification anomalies or other problems they are subject to.

Normalization Category
- 1NF: A table that qualifies as a relation is in 1NF
- 2NF: A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key
- 3NF: A relation is in 3NF if it is in 2NF and has no determinants except the primary key
  - The relation R is in 3NF iff it is in 2NF and has no transitive dependencies
- Boyce-Codd Normal Form (BCNF): A relation is in BCNF if every determinant is a candidate key

Eliminating Anomalies from Functional Dependencies
- Put all relations into Boyce-Codd Normal Form (BCNF)

Eliminating Anomalies from Functional Dependencies: Example 1 (page 84)

SKU_DATA relation FDs:
- SKU → (SKU_Description, Department, Buyer)
- SKU_Description → (SKU, Department, Buyer)
- Buyer → Department

- SKU and SKU_Description each determine all of the attributes in the table, so they are candidate keys
- Buyer is a determinant, but it does not determine all of the other attributes, so it is not a candidate key
- Hence, the SKU_DATA relation is not in BCNF

Step 3-A in Figure 3-11:
Step 3-B: make the Buyer attribute the primary key
  BUYER (Buyer, Department)
Step 3-C and 3-D:
  SKU_DATA_2 (SKU, SKU_Description, Buyer)
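The BCNF test in Example 1 ("is every determinant a candidate key?") can be sketched with attribute closures. The function names below are illustrative, and the dependencies given for the two decomposed relations are an assumption: they are the original FDs restricted by hand to the attributes of each relation.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose determinant does not determine the whole schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs).issubset(lhs) and closure(lhs, fds) != set(schema)
    ]


sku_data = {"SKU", "SKU_Description", "Department", "Buyer"}
sku_fds = [
    (("SKU",), ("SKU_Description", "Department", "Buyer")),
    (("SKU_Description",), ("SKU", "Department", "Buyer")),
    (("Buyer",), ("Department",)),
]
print(bcnf_violations(sku_data, sku_fds))
# [(('Buyer',), ('Department',))]

# Decomposed relations, with the FDs restricted by hand (an assumption)
buyer = {"Buyer", "Department"}
buyer_fds = [(("Buyer",), ("Department",))]
sku_data_2 = {"SKU", "SKU_Description", "Buyer"}
sku_data_2_fds = [
    (("SKU",), ("SKU_Description", "Buyer")),
    (("SKU_Description",), ("SKU", "Buyer")),
]
print(bcnf_violations(buyer, buyer_fds))            # []
print(bcnf_violations(sku_data_2, sku_data_2_fds))  # []
```

The single violation reported for SKU_DATA is the dependency that Step 3-B moves into its own relation, BUYER.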

[Report #3]
Describe the whole process for each of the examples below. For each step of putting a relation into BCNF, write your explanation and give the PKs and FKs of the decomposed relations.
1) Eliminating Anomalies from Functional Dependencies: Example 2 (page 85)
2) Eliminating Anomalies from Functional Dependencies: Example 3 (page 86)
3) Eliminating Anomalies from Functional Dependencies: Example 4 (page 87)
4) Eliminating Anomalies from Functional Dependencies: Example 5 (page 89)

Theoretical Approach: The First Normal Form

Definition
- Any table of data that meets the definition of a relation

Formal Definition
- A relation is in 1NF iff all underlying domains contain scalar values only

Example
"Supplier S1, located in the city London with status 20, supplies 300 units of part P1."

FIRST
S#   STATUS   CITY     P#   QTY
S1   20       London   P1   300
S1   20       London   P2   200
S1   20       London   P3   400
S1   20       London   P4   200
S2   10       Paris    P1   300
S2   10       Paris    P2   400
S3   10       Paris    P2   200
S5   40       Athens   P4   100

FD diagram: (S#, P#) → QTY, S# → CITY, S# → STATUS, CITY → STATUS

Anomalies in 1NF
- Insertion anomaly: a new supplier without a P# cannot be inserted
- Deletion anomaly: unexpected loss of bundled information
- Update anomaly: redundancy (London and Paris are repeated in FIRST)

FIRST with a changed S3 row and a new supplier S6 that supplies no parts:
S#   STATUS   CITY       P#     QTY
S1   20       London     P1     300
S1   20       London     P2     200
S1   20       London     P3     400
S1   20       London     P4     200
S2   10       Paris      P1     300
S2   10       Paris      P2     400
S3   90       New York   P2     200
S5   40       Athens     P4     100
S6   70       Florence   Null   Null

Solution: Decomposition
FIRST → SECOND(S#, STATUS, CITY), SP(S#, P#, QTY)

Nonloss (Lossless) Decomposition
- Reversibility

S
S#   STATUS   CITY
S3   30       Paris
S5   30       Athens

(a) Nonloss
SST                SC
S#   STATUS        S#   CITY
S3   30            S3   Paris
S5   30            S5   Athens

(b) Lossy
SST                STC
S#   STATUS        STATUS   CITY
S3   30            30       Paris
S5   30            30       Athens

Test for the Lossless Decomposition

(a) Nonloss: joining SST and SC gives back S
S#   STATUS   CITY
S3   30       Paris
S5   30       Athens

(b) Lossy: joining SST and STC gives tuples that are not in S
S#   STATUS   CITY
S3   30       Paris
S3   30       Athens
S5   30       Paris
S5   30       Athens
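The lossless test can be reproduced by projecting S and joining the projections back together. This is a minimal sketch with hypothetical helper names (relation, project, natural_join); relations are modelled as sets of tuples so that duplicates disappear, as they do in the relational model.

```python
def relation(attrs, rows):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, attrs):
    """Projection: keep only attrs, dropping duplicate tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


S = relation(("S#", "STATUS", "CITY"),
             [("S3", 30, "Paris"), ("S5", 30, "Athens")])

SST = project(S, ("S#", "STATUS"))
SC = project(S, ("S#", "CITY"))
STC = project(S, ("STATUS", "CITY"))

join_a = natural_join(SST, SC)    # decomposition (a)
join_b = natural_join(SST, STC)   # decomposition (b)

print(join_a == S)                # True: nonloss
print(join_b == S, len(join_b))   # False 4: lossy, two spurious tuples
```

A lossy join never drops original tuples; the information is lost because extra tuples appear and the original ones can no longer be told apart from them.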

Decomposition: In View of Functional Dependency

S
S#   STATUS   CITY
S3   30       Paris
S5   30       Athens
FDs: S# → STATUS, S# → CITY

(a) SST and SC
SST                SC
S#   STATUS        S#   CITY
S3   30            S3   Paris
S5   30            S5   Athens
FDs: S# → STATUS (in SST), S# → CITY (in SC)

(b) SST and STC
- We cannot tell which supplier has which city
SST                STC
S#   STATUS        STATUS   CITY
S3   30            30       Paris
S5   30            30       Athens
FDs: S# → STATUS (in SST); S# → CITY is lost

Heath's Theorem
Let R(A, B, C) be a relation, where A, B, C are sets of attributes.
- If R satisfies the FD A → B,
- then R is equal to the join of its projections on {A, B} and {A, C}

FIRST → SECOND(S#, STATUS, CITY), SP(S#, P#, QTY)

SECOND
S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S5   40       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S2   P1   300
S2   P2   400
S3   P2   200
S5   P4   100

Theoretical Approach: The Second Normal Form

Definition
- R is in 2NF when all of a relation's nonkey attributes are dependent on the whole key

Formal Definition
A relation R is in 2NF if and only if
- it is in 1NF, and
- every non-key attribute is irreducibly (fully) dependent on the primary key

Anomalies in 2NF
- Insertion anomaly: insertion of {Florence, Italy}
- Deletion anomaly: deleting a tuple causes unexpected loss of bundled information
- Update anomaly

SECOND
S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S5   40       Athens

FDs: S# → CITY, CITY → STATUS
The anomalies arise because STATUS depends on S# only through CITY (a transitive dependency).

Decomposition
SECOND(S#, STATUS, CITY) → SC(S#, CITY), CS(CITY, STATUS)

SC               CS
S#   CITY        CITY     STATUS
S1   London      Athens   30
S2   Paris       London   20
S3   Paris       Paris    10
S5   Athens      Rome     50

Theoretical Approach: The Third Normal Form

Definition
- The relation R is in 3NF iff it is in 2NF and has no transitive dependencies

Formal Definition
- It is in 2NF
- Every non-key attribute is non-transitively dependent on the primary key (no mutual dependency)

Anomaly in 3NF

SJT
S       J         T
Smith   Math      Prof. White
Smith   Physics   Prof. Green
Jones   Math      Prof. White
Jones   Physics   Prof. Brown

Two candidate keys: {S, J}, {S, T}

Semantic constraints
- For each subject, each student of that subject is taught by only one teacher: {S, J} → T
- Each teacher teaches only one subject: T → J
- Each subject is taught by several teachers

Anomaly
- Jones drops Physics
- We delete the last tuple
- We lose the information that Prof. Brown teaches Physics

Solution: Decomposition
- SJT → ST(S, T), TJ(T, J)
- BCNF (Boyce-Codd Normal Form)

5. Multi-valued Dependency

A multi-valued dependency occurs when a determinant determines a particular set of values:
- Employee →→ Degree
- Employee →→ Sibling
- PartKit →→ Part

The determinant of a multivalued dependency can never be a primary key.

Multivalued dependencies are not a problem if they are in a separate relation, so:
- Always put multivalued dependencies into their own relation
- This is known as Fourth Normal Form (4NF)

A relation is in 4NF iff, for every nontrivial multi-valued dependency A →→ B that holds in it, all attributes of the relation are also functionally dependent on A.

CTB
C          T             B
database   Prof. White   Korth
database   Prof. White   Ullman
database   Prof. Green   Korth
database   Prof. Green   Ullman

Insertion: adding a new teacher requires one row per book
C          T             B
database   Prof. Sara    Ullman
database   Prof. Sara    Korth

Decomposition: CTB → CT(C, T), CB(C, B)

6. The Fifth Normal Form

A relation is in 5NF iff every nontrivial join dependency *(R1, R2, ..., Rn) that holds in it is implied by the candidate keys of the relation.

SPJ
S#   P#   J#
S1   P1   J2
S1   P2   J1
S2   P1   J1
S1   P1   J1

Projections of SPJ
SP             PJ             JS
S#   P#        P#   J#        J#   S#
S1   P1        P1   J2        J2   S1
S1   P2        P2   J1        J1   S1
S2   P1        P1   J1        J1   S2

Joining SP and PJ gives the spurious tuple (S2, P1, J2); joining that result with JS removes it and returns the original SPJ.
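The join dependency on SPJ can be checked the same way as a lossless decomposition, except that all three projections are needed. The sketch below assumes the SPJ rows are (S1, P1, J2), (S1, P2, J1), (S2, P1, J1) and (S1, P1, J1), which is the classic form of this example; the helper names are hypothetical.

```python
def relation(attrs, rows):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, attrs):
    """Projection: keep only attrs, dropping duplicate tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


# SPJ rows as assumed in the lead-in
SPJ = relation(("S#", "P#", "J#"), [
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
])

SP = project(SPJ, ("S#", "P#"))
PJ = project(SPJ, ("P#", "J#"))
JS = project(SPJ, ("J#", "S#"))

two_way = natural_join(SP, PJ)
three_way = natural_join(two_way, JS)

spurious = relation(("S#", "P#", "J#"), [("S2", "P1", "J2")])
print(two_way - SPJ == spurious)  # True: the only extra tuple is (S2, P1, J2)
print(three_way == SPJ)           # True: the three-way join restores SPJ
```

SPJ cannot be losslessly split into any two of its projections, yet it equals the join of all three; that is the kind of join dependency 5NF is concerned with.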

[Report #4]
1. Make the relation that is the result of joining the relations CUSTOMER and ORDER.
   CUSTOMER (Cid, Name, Address)
   - Cid → (Name, Address, Zipcode)
   - Address → Zipcode
   ORDER (BookNumber, Cid, Price, Orderdate)
   - (BookNumber, Cid) → (Price, Orderdate)
2. Write out the detailed process for the following problems.
   (1) Describe the FDs in the result relation of problem 1 and illustrate the anomalies
   (2) Transform it into the Second Normal Form
   (3) Transform it into the Third Normal Form
   (4) Transform it into BCNF
   (5) Make sure that the result of (4) is the same as the result of problem 1