Introduction to Database Design, fall 2011 IT University of Copenhagen. Normalization. Rasmus Pagh

Similar documents
Introduction to Databases, Fall 2003 IT University of Copenhagen. Lecture 4: Normalization. September 16, Lecturer: Rasmus Pagh

Normalization. Anomalies Functional Dependencies Closures Key Computation Projecting Relations BCNF Reconstructing Information Other Normal Forms

Case Study: Lufthansa Cargo Database

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Database Systems. Basics of the Relational Data Model

CS411 Database Systems. 05: Relational Schema Design Ch , except and

Concepts from

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

Informal Design Guidelines for Relational Databases

CSCI 403: Databases 13 - Functional Dependencies and Normalization

CMU SCS CMU SCS CMU SCS CMU SCS whole nothing but

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

Part II: Using FD Theory to do Database Design

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Overview - detailed. Goal. Faloutsos & Pavlo CMU SCS /615

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 2: Relations and SQL. September 5, Lecturer: Rasmus Pagh

Exam I Computer Science 420 Dr. St. John Lehman College City University of New York 12 March 2002

Lecture 11 - Chapter 8 Relational Database Design Part 1

Lectures 12: Design Theory I. 1. Normal forms & functional dependencies 2/19/2018. Today s Lecture. What you will learn about in this section

Theory of Normal Forms Decomposition of Relations. Overview

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Birkbeck. (University of London) BSc/FD EXAMINATION. Department of Computer Science and Information Systems. Database Management (COIY028H6)

Redundancy:Dependencies between attributes within a relation cause redundancy.

Database Normalization. (Olav Dæhli 2018)

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

Final Examination Computer Science 420 Dr. St. John Lehman College City University of New York 21 May 2002

Niklas Fors The Relational Data Model 1 / 17

Database design III. Quiz time! Using FDs to detect anomalies. Decomposition. Decomposition. Boyce-Codd Normal Form 11/4/16

CSE 562 Database Systems

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Lecture 4. Database design IV. INDs and 4NF Design wrapup

COSC Dr. Ramon Lawrence. Emp Relation

CSIT5300: Advanced Database Systems

Functional dependency theory

CS 338 Functional Dependencies

Lectures 5 & 6. Lectures 6: Design Theory Part II

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

Database Design Theory and Normalization. CS 377: Database Systems

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

Normalisation Chapter2 Contents

Relational Database design. Slides By: Shree Jaswal

Final Review. Zaki Malik November 20, 2008

Databases Tutorial. March,15,2012 Jing Chen Mcmaster University

Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition. Chapter 9 Normalizing Database Designs

UNIT 3 DATABASE DESIGN

Handout 3: Functional Dependencies and Normalization

Chapter 6: Relational Database Design

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016

Schema Refinement: Dependencies and Normal Forms

More Normalization Algorithms. CS157A Chris Pollett Nov. 28, 2005.

Relational Design Theory. Relational Design Theory. Example. Example. A badly designed schema can result in several anomalies.

Relational Database Design (II)

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

Normal Forms. Winter Lecture 19

Relational Design: Characteristics of Well-designed DB

Normalization in DBMS

DATABASE MANAGEMENT SYSTEM SHORT QUESTIONS. QUESTION 1: What is database?

Databases The theory of relational database design Lectures for m

DBMS Chapter Three IS304. Database Normalization-Comp.

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

Chapter 14 Outline. Normalization for Relational Databases: Outline. Chapter 14: Basics of Functional Dependencies and

In This Lecture. Normalisation to BCNF. Lossless decomposition. Normalisation so Far. Relational algebra reminder: product

CS211 Lecture: Database Design

customer = (customer_id, _ customer_name, customer_street,

Schema Refinement: Dependencies and Normal Forms

CSE 544 Principles of Database Management Systems

Normalisation. Normalisation. Normalisation

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

Databases Lecture 7. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

Schema Refinement: Dependencies and Normal Forms

Relational Design 1 / 34

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 10: Transaction processing. November 14, Lecturer: Rasmus Pagh

Steps in normalisation. Steps in normalisation 7/15/2014

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, Normal Forms. University of Edinburgh - January 30 th, 2017

The Relational Data Model

The strategy for achieving a good design is to decompose a badly designed relation appropriately.

FUNCTIONAL DEPENDENCIES CHAPTER , 15.5 (6/E) CHAPTER , 10.5 (5/E)

UNIT -III. Two Marks. The main goal of normalization is to reduce redundant data. Normalization is based on functional dependencies.

Database Management

Design Theory for Relational Databases

ACS-2914 Normalization March 2009 NORMALIZATION 2. Ron McFadyen 1. Normalization 3. De-normalization 3

A7-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Relational Model. Rab Nawaz Jadoon DCS. Assistant Professor. Department of Computer Science. COMSATS IIT, Abbottabad Pakistan

Database Normalization Complete

DATABASE MANAGEMENT SYSTEMS

Relational Model and Relational Algebra A Short Review Class Notes - CS582-01

Relational Database Design Theory. Introduction to Databases CompSci 316 Fall 2017

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out end of week start early!

Learning outcomes. On successful completion of this unit you will: 1. Understand data models and database technologies.

Database Systems. Normalization Lecture# 7

Functional Dependencies and Finding a Minimal Cover

Unit 3 : Relational Database Design

Introduction to Database Systems. Announcements CSE 444. Review: Closure, Key, Superkey. Decomposition: Schema Design using FD

Chapter 8: Relational Database Design

Chapter 7: Relational Database Design

Database Constraints and Design

Functional Dependencies and Normalization

Transcription:

Introduction to Database Design, fall 2011 IT University of Copenhagen Normalization Rasmus Pagh Based on KBL sections 6.1-6.8 (except p. 203 207m), 6.9 (until Multivalued dependencies ), 6.11, and 6.12.

Today s lecture Anomalies in relations. Functional dependencies. Normal forms: Boyce-Codd normal form, 3rd normal form, and a little bit on higher normal forms. 1

What you should remember from previously. In this lecture I will assume that you remember: Key concepts of the relational data model: Relation Attributes Relation schema Relation instance 2

Key concepts in SQL Selection by example: SELECT DISTINCT title, year FROM Movie Join by example: SELECT * FROM Movie, Director WHERE Movie.director = Director.id 3

Redundancy in a relation Redundant (i.e., unnecessary ) information occurs in a relation if the same fact is repeated in several different tuples. One obvious problem with redundant information is that we use more memory than is necessary. Redundancy is an example of an anomaly of the relation schema. Example: Instance of Movies(title, year, length, filmtype, studioname, starname) where the length of a movie is repeated several times. 4

Other kinds of anomalies The other principal kinds of unwanted anomalies are: Update anomalies. Occurwhenitispossibletochangeafactinone tuple but leave the same fact unchanged in another. (E.g., the length of Star Wars in the Movies relation.) Deletion anomalies. Occurwhendeletingatuple(recordingsome fact) may delete another fact from the database. (E.g., information on amovieinthemovies relation.) Insertion anomalies. The dual ofdeletionanomalies. Ideally, we would like relation schemas that do not allow anomalies. Normalization is a process that can often be used to arrive at such schemas. 5

Anomalies and good design As we will see, anomalies are a sign that we tried to encode several, unrelated types of facts into a single relation. Thus, normalization can help improving the design such that the unrelatedness of these facts are captured by the database schema (and E-R diagram). 6

Decomposing relations The anomalies in the example we saw can be eliminated by splitting (or decomposing) therelationschema Movies(title, year, length, filmtype, studioname, starname) into two relation schemas Movies1(title, year, length, filmtype, studioname) Movies2(title, year, starname) 7

Decomposition and projection The relation instances for Movies1 and Movies2 were found by projection of Movies onto their attributes. In SQL, Movies2 could be computed as follows: SELECT DISTINCT title, year, starname FROM Movies This is a general rule when decomposing: The decomposed relation instances are found by projection of the original relation instance. 8

Recombining relations We need the decomposed relations to contain the same information as the original relation. In particular, we must be able to recombine them to recover the original relation. (The lossless join property.) If decomposition is done properly, recombining can be done by joining the relations on attributes of the same name (this is called a natural join). Example: In SQL we can compute Movies as follows: SELECT * FROM Movies1, Movies2 WHERE Movies1.title = Movies2.title AND Movies1.year = Movies2.year 9

Candidate keys of a relation A candidate key for a relation is a set of its attributes that satisfy: Uniqueness. tuple. The values of the attributes uniquely identify a Minimality. No proper subset of the attributes has the uniqueness property. If uniqueness is satisfied (but not necessarily minimality) the attributes are said to form a superkey. Examples: {Title, year, starname} is a candidate key for the Movies relation. {Title, year, starname, length} is a superkey, but not a candidate key, for the Movies relation. {Title, year} does not satisfy uniqueness for Movies. 10

Candidate keys vs primary keys Note that the concept of a candidate key is defined with respect to the relation (schema), and not with respect to any particular instance of the relation. The primary key of a relation in a DBMS should be a candidate key, but there could be several candidate keys to choose from. When talking about normalization, it is irrelevant which key is chosen as primary key. 11

Next: Functional dependencies and normal forms

Functional dependencies cause anomalies When values of attribute B can be derived (by an all-knowing entity) from the attributes A 1,...,A n we say that B is functionally dependent on A 1,...,A n.thisiswrittenasfollows: A 1 A 2...A n B Example: Movies has the functional dependency (FD) title year length but not the FD title year starname This is in fact the very reason for the anomalies we saw! 13

Unavoidable functional dependency Functional dependency on a candidate key If the attributes of some candidate key is included in {A 1,...,A n },these attributes uniquely identify the tuple from which the values come. In particular, we can determine the value of any other attribute B in the relation, so we unavoidably have the FD Trivial functional dependency A 1 A 2...A n B Also, we can always determine the value of attribute A i from the value of attribute A i.soweunavoidablyhavethefd A 1 A 2...A n A i 14

First and second normal form A normal form is a criterion on a relation schema. Arelationschemaisinfirst normal form (1NF) if it specifies that all tuples contain the same number of atomic attribute values. In pure relational DBMSs it is only possible to have 1NF relations schemas. Arelationschemaisinsecond normal form (2NF) if it is in 1NF and there are no functional dependencies with a proper subset of a candidate key on the left hand side. Example: Movies has the functional dependency title year length where {title, year} is a subset of the candidate key {title, year,starname}. Thus, Movies is not in 2NF. 15

Boyce-Codd normal form (BCNF) ArelationschemaisinBoyce-Codd normal form (BCNF) if there are only unavoidable functional dependencies among its attributes. Example: Movies has the functional dependency title year length which is not unavoidable because it is nontrivial and {title, year} is not a superkey. Thus, Movies is not in BCNF. 16

Examples of relations in BCNF The relations of our decomposition: Movies1(title, year, length, filmtype, studioname) Movies2(title, year, starname) are in BCNF. The only nontrivial nonreducible FDs are (all in Movies1): title year length title year filmtype title year studioname and they are unavoidable since {title, year} is a candidate key for Movies1. 17

Writing functional dependencies Reducing FDs: Whenever we can reduce the number of attributes when writing an FD we do so. For example, Movies1 has the FDs which can both be reduced to title year f ilmt ype length title year studion ame length title year length Combining FDs: Whenever several FDs have the same left hand side we combine them. For example, the three FDs we saw for Movies can be written succinctly as: title year length filmt ype studioname 18

Decomposing a relation into BCNF Suppose we have a relation R which is not in BCNF. Then there is an FD which is not unavoidable. A 1 A 2...A n B 1 B 2...B m To eliminate the FD we split R into two relations: One with all attributes of R except B 1,B 2,...,B m. One with attributes A 1,A 2,...,A n,b 1,B 2,...,B m. If any of the resulting relations is not in BCNF, the process is repeated. Note: A 1,A 2,...,A n is a superkey for the second relation therefore we can recover R as the natural join of the two relations. 19

BCNF decomposition example Recall the relation Movies with schema Movies(title, year, length, filmtype, studioname, starname) It has the following FD, which is not unavoidable: title year length filmt ype studioname Thus the decomposition yields the following relations (both in BCNF): Movies1(title, year, length, filmtype, studioname) Movies2(title, year, starname) 20

BCNF decomposition example 2 Movies(title, year, length, filmtype, studioname, starname) could also have been decomposed by using the following FDs, one by one: title year length title year filmtype title year studioname Then the decomposition yields the following relations (all in BCNF): Movies1(title, year, length), Movies2(title, year, filmtype), Movies3(title, year, studioname), Movies4(title, year, starname) To avoid too many relations, as in this example, you should generally use maximal FDs where it is not possible to add attributes on the right side. 21

Problem session Consider a relation containing an inventory record: Inventory(part, warehouse, quantity, warehouse-address) What are the candidate keys of the relation? What are the avoidable functional dependencies? Perform a decomposition into BCNF. 22

Next: 3rd normal form and dependency preservation

Interrelation dependencies Consider the relation Bookings(title,theater,city) with FDs: theater city (and theater is not a candidate key). title city theater. BCNF dec.: Bookings1(theater,city) Bookings2(theater,title). These schemas and their FDs allow, e.g., the relation instances: theater city theater title Guild Park Menlo Park Menlo Park Guild Park The net The net which violate the presumed FD title city theater. Thus, there are implicit dependencies between values in different relations. We cannot check FDs separately in each relation to see such a dependency. 24

Splitting candidate keys As we just saw, decomposition can result in a relational database schema where a functional dependency disappeared. (The decomposition is not dependency preserving.) The problem in the previous example arose because we decomposed according to the FD theater city, wherecity is part of a candidate key for the Bookings relation. Thus we ended up splitting the candidate key {city, theater}. This problem of FDs that are not preserved never arises if we do not decompose in this case. 25

Third normal form We have motivated the following normal form that never splits a candidate keys of the original relation: Arelationschemaisin3rd normal form (3NF) if any functional dependency among its attributes is either unavoidable, or has a member of some candidate key on the right hand side. In words: A relation is in 3NF if there are no unavoidable functional dependencies among non-candidate key attributes. 26

Third normal form, another example HasAccount(Account-number,ClientId,OfficeId) Functional dependencies: ClientId OfficeId AccountNumber AccountNumber OfficeId Is in 3rd normal form, but not BCNF (why?). Can be decomposed losslessly: AcctOffice(Account-number,OfficeId) AcctClient(Account-number,ClientId) 27

3NF by schema synthesis A3NFschemacanalsobefound directly byschema synthesis. Given a relation R and a set F of FDs. Find a minimal set of FDs that cover (imply) all FDs in F. Create a relation R i for each FD (with the attributes of the FD). If needed, add a relation R 0 containing a key of R. We will not go into why this procedure produces a 3NF schema. 28

When to stop decomposition at 3NF? Whether it is a good idea to stop decomposition when third normal form is reached depends on the specific scenario. Mostly, 3NF and BCNF coincide, so there is nothing to consider. If not, the redundancy in tuples in 3NF should be weighed against the fact that some FD is difficult to check/maintain in BCNF. Example: In the Bookings example, we might want to make the DBMS check that to every title and city, there is at most one theater. For the BCNF decomposed relations, this would involve a query on Bookings1 for every change of Bookings2, andviceversa. 29

Next: Higher normal forms.

Redundancy in BCNF relations Boyce-Codd normal form eliminates redundancy in each tuple, but may leave redundancy among tuples in a relation. This typically happens if two many-many relationships (or in general: a combination of two types of facts) are represented in one relation. Example: name street city title year C. Fisher 123 Maple St. Hollywood Star Wars 1977 C. Fisher 123 Maple St. Hollywood Empire Strikes Back 1980 C. Fisher 123 Maple St. Hollywood Return of the Jedi 1983 C. Fisher 5LocustLn. Malibu Star Wars 1977 C. Fisher 5LocustLn. Malibu Empire Strikes Back 1980 C. Fisher 5LocustLn. Malibu Return of the Jedi 1983 31

Curing it with NULL values? Then what about something like one of these: name street city title year C. Fisher 123 Maple St. Hollywood NULL NULL C. Fisher 5LocustLn. Malibu NULL NULL C. Fisher NULL NULL Star Wars 1977 C. Fisher NULL NULL Empire Strikes Back 1980 C. Fisher NULL NULL Return of the Jedi 1983 name street city title year C. Fisher 123 Maple St. Hollywood Star Wars 1977 C. Fisher 5LocustLn. Malibu Empire Strikes Back 1980 C. Fisher NULL NULL Return of the Jedi 1983 32

Decomposition AbetterideaistoeliminateredundancybydecomposingStarsIn as follows: name street city C. Fisher 123 Maple St. Hollywood C. Fisher 5LocustLn. Malibu name title year C. Fisher Star Wars 1977 C. Fisher Empire Strikes Back 1980 C. Fisher Return of the Jedi 1983 33

4th and 5th normal form Roughly speaking, a relation is in 4th normal form if it cannot be meaningfully decomposed into two relations. Example: StarsIn is not in 4th normal form, since it can be decomposed, as we have just shown. Roughly speaking, a relation is in 5th normal form if it cannot be meaningfully decomposed into some number of relations. 34

Relationship among normal forms Inclusion among normal forms: Any relation in 5NF is also in 4NF. Any relation in 4NF is also in BCNF. Any relation in BCNF is also in 3NF. Properties of normal forms: A higher normalformhaslessredundancy,butmaynotpreserve functional and multivalued dependencies. 35

How should normal forms be used? The various normal forms may be seen as guidelines for designing a good relation schema. Some complexities that arise are: Should we split candidate keys, introducing dependencies between relations (in 3NF we do not)? What is the effect of decomposition on performance? (More on this later.) If one chooses a relation schema with avoidable functional dependencies, one should be aware of the extra effort that is needed to avoid anomalies. 36

Most important points in this lecture Your goals related to today s lecture should be to: Understand the significance of normalization. Be able to determine whether a relation is in Boyce Codd normal form or 3rd normal form. Be able to split a relation in several relations to achieve any of the normal forms. Know how to recombine normalized relations in SQL. 37