NORMALISATION (Relational Database Schema Design Revisited)

Similar documents
Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Relational Design 1 / 34

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

COSC Dr. Ramon Lawrence. Emp Relation

Dr. Anis Koubaa. Advanced Databases SE487. Prince Sultan University

The Relational Algebra

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

FUNCTIONAL DEPENDENCIES CHAPTER , 15.5 (6/E) CHAPTER , 10.5 (5/E)

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

CS 2451 Database Systems: Database and Schema Design

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

CS 338 Functional Dependencies

The strategy for achieving a good design is to decompose a badly designed relation appropriately.

UNIT 3 DATABASE DESIGN

We shall represent a relation as a table with columns and rows. Each column of the table has a name, or attribute. Each row is called a tuple.

Database Design Theory and Normalization. CS 377: Database Systems

Informal Design Guidelines for Relational Databases

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

Relational Database design. Slides By: Shree Jaswal

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

Chapter 14 Outline. Normalization for Relational Databases: Outline. Chapter 14: Basics of Functional Dependencies and

Relational Databases Overview

Steps in normalisation. Steps in normalisation 7/15/2014

Schema Refinement: Dependencies and Normal Forms

Normalisation. Normalisation. Normalisation

CMP-3440 Database Systems

Schema Refinement: Dependencies and Normal Forms

Mapping ER Diagrams to. Relations (Cont d) Mapping ER Diagrams to. Exercise. Relations. Mapping ER Diagrams to Relations (Cont d) Exercise

TDDD12 Databasteknik Föreläsning 4: Normalisering

Schema Refinement: Dependencies and Normal Forms

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

Database Normalization. (Olav Dæhli 2018)

IS 263 Database Concepts

Chapter 16. Relational Database Design Algorithms. Database Design Approaches. Top-Down Design

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

MODULE: 3 FUNCTIONAL DEPENDENCIES

Redundancy:Dependencies between attributes within a relation cause redundancy.

More Relational Algebra

In This Lecture. Normalisation to BCNF. Lossless decomposition. Normalisation so Far. Relational algebra reminder: product

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

COSC344 Database Theory and Applications. σ a= c (P) S. Lecture 4 Relational algebra. π A, P X Q. COSC344 Lecture 4 1

Unit IV. S_id S_Name S_Address Subject_opted

SVY227: Databases for GIS

Relational Model. CS 377: Database Systems

CSE 562 Database Systems

Normal Forms. Winter Lecture 19

Lecture 11 - Chapter 8 Relational Database Design Part 1

V. Database Design CS448/ How to obtain a good relational database schema

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016

Normalization in DBMS

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

In Chapters 3 through 6, we presented various aspects

Functional Dependencies and Finding a Minimal Cover

Relational Design: Characteristics of Well-designed DB

CSCI 403: Databases 13 - Functional Dependencies and Normalization

Normalization is based on the concept of functional dependency. A functional dependency is a type of relationship between attributes.

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, Normal Forms. University of Edinburgh - January 30 th, 2017

Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition. Chapter 9 Normalizing Database Designs

Relational Databases and Web Integration. Week 7

Relational Design Theory. Relational Design Theory. Example. Example. A badly designed schema can result in several anomalies.

DATABASTEKNIK - 1DL116

Chapter 3. The Relational database design

DATABASE DESIGN I - 1DL300

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 6 Normalization of Database Tables

DBMS Chapter Three IS304. Database Normalization-Comp.

Lecture5 Functional Dependencies and Normalization for Relational Databases

Schema And Draw The Dependency Diagram

Relational Model. Rab Nawaz Jadoon DCS. Assistant Professor. Department of Computer Science. COMSATS IIT, Abbottabad Pakistan

Chapter 6: Relational Database Design

The University of Nottingham

ECE 650 Systems Programming & Engineering. Spring 2018

A database can be modeled as: + a collection of entities, + a set of relationships among entities.

GUJARAT TECHNOLOGICAL UNIVERSITY

Functional dependency theory

CS 348 Introduction to Database Management Assignment 2

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Normalization Normalization

CS430 Final March 14, 2005

DATABASE MANAGEMENT SYSTEM SHORT QUESTIONS. QUESTION 1: What is database?

Database Systems. Answers

Database Technologies. Madalina CROITORU IUT Montpellier

CS211 Lecture: Database Design

Part II: Using FD Theory to do Database Design

Database Management System

Database Foundations. 3-9 Validating Data Using Normalization. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Normalization Rule. First Normal Form (1NF) Normalization rule are divided into following normal form. 1. First Normal Form. 2. Second Normal Form

Lecture 15: Database Normalization

Edited by: Nada Alhirabi. Normalization

Normalisation. Databases: Topic 4

Chapter 14. Chapter 14 - Objectives. Purpose of Normalization. Purpose of Normalization

7+8+9: Functional Dependencies and Normalization

ACS-2914 Normalization March 2009 NORMALIZATION 2. Ron McFadyen 1. Normalization 3. De-normalization 3

UFCEKG : Dt Data, Schemas Sh and Applications. Lecture 10 Database Theory & Practice (4) : Data Normalization

Relational Database Design (II)

Practice questions recommended before the final examination

More Normalization Algorithms. CS157A Chris Pollett Nov. 28, 2005.

Case Study: Lufthansa Cargo Database

Database Constraints and Design

Transcription:

NORMALISATION (Relational Database Schema Design Revisited) Designing an ER Diagram is fairly intuitive, and faithfully following the steps to map an ER diagram to tables may not always result in the best table design. This is a top-down approach In this lecture we discuss some design criteria, and introduce the concept of normalisation, which is the process of transforming a collection of attributes into relations which hsatisfy a collection of rules. This is a bottom-up approach In constructing a relational database there is a considerable freedom which ranges from: putting all of the data in one relation having lots of small relations Some Guidelines Be guided by the semantics or meaning of the data put together in one relation data that belongs together (e.g. using the ER approach) Avoid idhaving too many null values if only 10% of employees have offices we would prefer to set up a separate relation pairing employees and offices than add a column to employee Beware of keeping multiple versions of information Avoid breaking up relations in such a way that spurious information is created Preserve known functional dependencies How do we decide which makes the best set of relations for our purpose? MSc/Dip IT ISD L18 Normalisation (425-456) 425 MSc/Dip IT ISD L18 Normalisation (425-456) 426 Ill Effects of Multiple Copies of Data It s Easy to Create False Data ni# name bdate dnum dname mgrni# 666 Jim Grey 01/12/67 5 Admin 666 777 Jo White 04/06/53 5 Admin 666 888 Sue Low 12/09/88 6 Finance 888 999 Tim Lee 30/07/78 7 HR 999 898 Mary Law 22/02/60 7 HR 999 Insertion. We cannot add a new department which has no employees as yet. Entering a new employee in department 5 means repeating existing departmental data Let s try to split up the ni# name pname ploc dt data about employees 666 Jim Grey Jupiter Perth and projects into two 666 Jim Grey Saturn Dundee tables 888 Sue Low Mars Perth ni# name pname ni# ploc This loses the 666 Jim Grey Jupiter 666 Perth info that Jim 666 Jim Grey Saturn 666 Dundee works on Project 888 Sue Low Mars 888 Perth Jupiter AT Perth Deletion. Deleting Sue Low loses details about department 6 Modification. If the manager of department 5 is changed, it must be changed in all tuples ni# name pname ploc 666 Jim Grey Jupiter Perth 666 Jim Grey Jupiter Dundee etc etc etc etc Joining them back together, we get NEW TUPLES! Jim works on Jupiter and works at Perth MSc/Dip IT ISD L18 Normalisation (425-456) 427 MSc/Dip IT ISD L18 Normalisation (425-456) 428

Preserving Functional Dependencies Some Examples of Functional Dependency Functional Dependencies are known connections between items in the database which can be installed into the database by good schema design Given ni#, we can determine the employee s name, date of birth, salary and department i.e. columns name, bdate, salary, department all depend on ni# Also, from knowing the ni#, we know the department number, and therefore we can find out all about the other department information as well fd1 dname, mgrni#, mrgrstartdate all depend on dnum dnum, mgrni#, mrgrstartdate all depend on dname ni# name bdate dnum dname mgrni# mgrstartdate fd2 fd3 We will proceed to use functional dependencies to guide our database design Another example is that hours depends on ni#, pnum MSc/Dip IT ISD L18 Normalisation (425-456) 429 MSc/Dip IT ISD L18 Normalisation (425-456) 430 Defining Functional Dependence Full Functional Dependency An attribute, X, of a relation is functionally dependent on attributes A, B - N if the same values of R A... R N always lead to the same value of R X We write R A... R N R X. e.g. ni# name, bdate, dnum ni#, pnum hours This is a property of the meaning of fthe data, not a property that t emerges from the current values of the data X fully depends on A... N if it is not dependent on any subset of A... N. Example for Employees : bdate is dependent on ni# and name but only fully dependent on ni# However WorksOn : hours is fully dependent on ni# and pnum i.e. if you happened to have no two employees with the same name, you may indeed be able to infer birth date from name for this data set, but this would not constitute genuine dependence because next week you might employ someone with the same name as a current staff member MSc/Dip IT ISD L18 Normalisation (425-456) 431 MSc/Dip IT ISD L18 Normalisation (425-456) 432

Normalisation Normalisation is the process of taking a set of relations and decomposing them into more relations which satisfy our criteria better First Normal Form A relation is in First Normal Form (1NF) if all values are atomic i.e. all the cells hold single values. The decomposition is essentially a series of projections so that the original data can be reconstituted using joins The form of the relations which satisfy our conditions is called a normal form There are a number of these of increasing stringency Specifically the following are not permitted Multi-valued l attributes / sets e.g. An attribute locations which is intended to hold more than one location per department cannot be held in a single cell Composite attributes / nested relations e.g. An attribute name which is divided up into hold a person s title, surname and forenames cannot be held in a single cell MSc/Dip IT ISD L18 Normalisation (425-456) 433 MSc/Dip IT ISD L18 Normalisation (425-456) 434 Universal Relation Candidate Keys We start by creating the Universal Relation which is one giant relation schema including every database attribute (using unique names) It must be in first normal form (or it can t go in the database at all!) A (usually composite) primary key can be chosen which uniquely identifies each row. A candidate key is a set of attributes which uniquely identifies each entry in the relation, and therefore can be the primary key of that relation A relation can have more than one candidate key (e.g. dnum and dname), in which case one (usually the simplest) is chosen to be the primary key The following universal relation is for a subset of the Employee database consisting of Employee, Department and Project only. Composite attributes have been broken down, multi-valued attributes are shown as one attribute. What is the primary key? ni#, lastname, firstnames, street, city, postcode, salary, bdate, gender, dnum, dname, mgrni#, mgrstartdate, pnum, pname, plocation, hours MSc/Dip IT ISD L18 Normalisation (425-456) 435 MSc/Dip IT ISD L18 Normalisation (425-456) 436

Functional Dependencies for the Company Database Second Normal Form ni# -> lastname, firstnames, street, city, postcode, salary, bdate, gender, dnum, dname, mgrni#, mgrstartdate pnum -> pname, plocation dnum -> dname, mgrni#, mgrstartdate ni#, pnum -> hours Note that everything is directly or indirectly dependent on ni# and pnum so these will make an effective ec key of the Universal Relation By definition, every other attribute is functionally dependent on the primary key The relation is in Second Normal Form (2NF), if it is in First Normal Form AND all the non-primary-key attributes are fully functionally dependent on the chosen primary key, and also on any other candidate key Clearly, any relation with a single primary key will be 2NF If there are two primary key attributes, A & B, then each of the other attributes is either (i) dependent on A alone (ii) dependent on B alone or (iii) dependent on both For a relation to be in 2NF : only the third option is allowed 2NF normalisation consists of creating a separate relation for each of the three cases MSc/Dip IT ISD L18 Normalisation (425-456) 437 MSc/Dip IT ISD L18 Normalisation (425-456) 438 Making our Universal Relation 2NF 2NF Normalisation This uses a subset of the attributes. The national insurance number together with the project name is a candidate key Consider each combination of the composite primary key attributes Which other attributes depend on just a subset of them? Repeat for any candidate keys ni# pnum and all the other attributes hours lastname pname plocation WORKS_ONON wni# wpnum hours fd1 PROJECT pnum pname plocation fd3 fd4 fd1 fd2 fd3 fd4 EMPLOYEE ni# lastname firstnames street... mgrstartdate fd2 MSc/Dip IT ISD L18 Normalisation (425-456) 439 MSc/Dip IT ISD L18 Normalisation (425-456) 440

Third Normal Form Employee/Dept p Relation : turn to 3NF Third Normal Form eliminates transitive dependencies - i.e. those dependencies which hold only because of some intermediary attribute An attribute is transitively dependent on the primary key if there is some other attribute which it is dependent on and which is, in turn, dependent on the key ni# --> dnum --> mgrni# Here mgrni# is dependent of ni# as required by 2NF, but only because it is dependent on dnum which is in turn dependent on ni# Non-3NF-relations are likely to hold redundant information Again using a subset of the Employee attributes Look for transitive dependencies ni# --> dnum --> dname, mgrni# ni# --> dname --> dnum, mgrni# We choose the primary key of the 3NF relation to be dnum A relation is in Third Normal Form if it is in second normal form AND no non-prime-key attribute is transitively dependent on the primary key. i.e. for any pair of attribute A & B such that R A R B, there is no attribute such that R A R X and R X R A MSc/Dip IT ISD L18 Normalisation (425-456) 441 MSc/Dip IT ISD L18 Normalisation (425-456) 442 ni# ename bdate dnum dname mgrni# LOTS : Introduction In this example, the database holds details of a number of real estate properties The Universal Relations holds 3NF Normalisation propid county lot# area price taxrate A property is identified by the county and the lot number EMPLOYEE ni# ename bdate ednum DEPARTMENT dnum dname mgrni# It also has a unique property p ID number The tax rate is fixed by county regardless of the area (size) MSc/Dip IT ISD L18 Normalisation (425-456) 443 The price is dependent d on the area regardless of the county. MSc/Dip IT ISD L18 Normalisation (425-456) 444

PropertyID is chosen to be the primary key County and Lot Number are a composite candidate key, but they were not chosen to be the primary key Here are the functional dependencies Consider each combination of the composite primary key attributes. Which other attributes depend on just a subset of them? Repeat for any candidate keys 2NF Normalisation propid county lot# area price taxrate LOTS1 propid county lot# area price COUNTY county taxrate MSc/Dip IT ISD L18 Normalisation (425-456) 445 MSc/Dip IT ISD L18 Normalisation (425-456) 446 Look for transitive dependencies, e.g. propid -> area -> price LOTS1 propid county lot# area price COUNTY county taxrate 3NF Normalisation LOTS1A LOTS1B propid county lot# area area price Boyce-Codd Normal Form Even a 3NF database can have redundancy so there is a further strengthening called Boyce Codd Normal Form Imagine that in the LOTS example different counties have different areas For instance areas in Clackmannan are all small while areas in Lanarkshire are all large so we have the dependency: area -> county which means that counties are unnecessarily duplicated when areas are duplicated BCNF removes this by insisting that : if we have a dependency A -> B then A must be a candidate key (i.e. a set of attributes which could have been used as the primary key) Area is not a candidate key, so we exclude this possibility by partitioning LOTS1A into LOTS1AX LOTS1AY propid lot# area area county MSc/Dip IT ISD L18 Normalisation (425-456) 447 MSc/Dip IT ISD L18 Normalisation (425-456) 448

A Final Example : Medical Records Patients have a unique NI#, each having a name and a GP; Each GP has an id number, name & address; A hospital appointment connects a patient, a date, a hospital and a consultant; A consultant has a phone number and visits one hospital on any given day; Ah hospital lhas an address. The following example assumes a patient will only have one appointment on a particular day, and that t consultant t names are unique. Functional Dependencies From a patient's national insurance number, we can determine the patient's name and GP ni# patname, gp#, gpaddress, gpname From a GP's ID we determine the GP's name and address: gp# gpaddress, gpname Each hospital has only one address: hosp hospaddress The Universal Relation (choose primary key) U (ni#, appdate, apptime, patname, gp#, gpname, gpaddress, cname, cphone, hosp,hospadd) MSc/Dip IT ISD L18 Normalisation (425-456) 449 MSc/Dip IT ISD L18 Normalisation (425-456) 450 From a particular patient on a particular day we can determine the consultant information, the time of the appointment and which was the hospital: ni#, appdate cname, cphone, apptime, hosp, hospadd Given the consultant, t appointment t date and time, we can determine the patient: t cname, appdate, apptime ni# Each consultant is at one hospital on any day: cname,appdate hosp, hospaddress Each consultant has one phone number cname cphone Normalising the Example Starting with the Universal Relation we find all kinds of redundancy. U(ni# (ni#, appdate, apptime, patname, gp#, gpname, gpaddress, cname, cphone, hosp,hospadd) So moving to 2NF, split off those attributes only dependent on part of the primary key: Patient (ni#, patname, gp#, gpaddress, gpname) Appt (ni#, appdate, apptime, cname, cphone, hosp, hospaddress) MSc/Dip IT ISD L18 Normalisation (425-456) 451 MSc/Dip IT ISD L18 Normalisation (425-456) 452

Patient (ni#, patname, gp#, gpaddress, gpname) Patient is 2NF but not 3NF since there is a transitive dependency ni# gp# gpadd, gpname So split off the GP information Pat (ni#, patname, gp# ) GP (gp#, gpadd, gpname) Appt is also 2NF but not 3NF since there is are 2 transitive dependencies ni#, appdate cname cphone ni#, appdate hosp hospadd Appt (ni#, appdate, apptime, cname, cphone, hosp, hospaddress) Appt becomes App (ni#, appdate, apptime, cname, hosp) Con (cname, cphone) Hospital (hosp, hospadd) MSc/Dip IT ISD L18 Normalisation (425-456) 453 MSc/Dip IT ISD L18 Normalisation (425-456) 454 So far we have 3NF but not BCNF, because in the App table App (ni#, appdate, apptime, cname, hosp) there is the functional dependency appdate, cname --> hosp yet appdate, cname is not a candidate key Final Tables Pat (ni##, patname, gp# #) GP (gp#, gpadd, gpname) Con (cname, cphone) Hospital (hosp, hospadd) Appointment (ni#, appdate, apptime, cname) ConsHosp (appdate, cname, hosp) Therefore we could go to Appointment (ni#, appdate, apptime, cname) ConsHosp (appdate, cname, hosp) MSc/Dip IT ISD L18 Normalisation (425-456) 455 MSc/Dip IT ISD L18 Normalisation (425-456) 456