CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization


References
- R&G Book, Chapter 19: Schema refinement and normal forms
- Also relevant to this lecture: Chapter 2 (Introduction to database design) and Section 3.5 (Logical database design: ER to relational)

Goal
Question: The relational model is great, but how do I go about designing my database schema?

Outline
- Conceptual db design: entity-relationship model
- Problematic database designs
- Functional dependencies
- Normal forms and schema normalization

Database Design Process
[Figure: the design pipeline. Stages shown: data modeling (ER diagrams), refinement (relations), SQL (tables), files. The pipeline runs from the conceptual schema to the physical schema.]

Conceptual Schema Design
- Conceptual model: [Figure: ER diagram with entity sets Patient and Doctor, relationship set patient_of, and attributes name, zip, dno]
- Relational model: relations plus FDs (FD = functional dependency)
- Normalization: eliminates anomalies

Entity-Relationship Diagrams
- Attributes (e.g., name)
- Entity sets (e.g., Patient)
- Relationship sets (e.g., patient_of)

Example ER Diagram
[Figure: entity set Patient (pno, name, zip) and entity set Doctor (dno, name, specialty), connected by the relationship set patient_of (since).]

Entity-Relationship Model
- Typically, each entity has a key
- ER relationships can include multiplicity
    - One-to-one, one-to-many, etc.
    - Indicated with arrows
- Can model multi-way relationships
- Can model subclasses
- And more...

Example with Inheritance
[Figure: entity set Person (id, name) with subclasses Employee (dept) and Customer (credit_score, billing_addr).]
Example from Phil Bernstein's SIGMOD '07 keynote talk

Converting into Relations
- One way to translate our ER diagram into relations:
    HR (id, name)
    Empl (id, dept), where id is also a foreign key referencing HR
    Client (id, name, credit_score, billing_addr)
- Today, we only talk about using ER diagrams to help us design the conceptual schema of a database
- In general, apps may need to operate on a view of the data closer to the ER model (e.g., an OO view of the data) while the db contains relations
    - Need to translate between objects and relations
    - This can be hard: it is the model management problem

Back to Our Simpler Example
[Figure: the Patient / patient_of / Doctor ER diagram again, with attributes pno, name, zip on Patient, since on patient_of, and dno, name, specialty on Doctor.]

Resulting Relations
- One way to translate the diagram into relations:
    PatientOf (pno, name, zip, dno, since)
    Doctor (dno, dname, specialty)


Problematic Designs
- Some db designs lead to redundancy
    - Same information stored multiple times
- Problems
    - Redundant storage
    - Update anomalies
    - Insertion anomalies
    - Deletion anomalies

Problem Examples

PatientOf
pno  name  zip    dno  since
1    p1    98125  2    2000
1    p1    98125  3    2003
2    p2    98112  1    2002
3    p1    98143  1    1985

- Redundant: the name and zip of patient 1 are stored in both of the first two rows. If we update the zip to 98119 in only one of them, we get an inconsistency
- What if we want to insert a patient without a doctor?
- What if we want to delete the last doctor for a patient?
- Both are illegal: (pno, dno) is the primary key, so it cannot contain nulls

Solution: Decomposition

Patient
pno  name  zip
1    p1    98125
2    p2    98112
3    p1    98143

PatientOf
pno  dno  since
1    2    2000
1    3    2003
2    1    2002
3    1    1985

Decomposition solves the problem, but we need to be careful

Lossy Decomposition

Patient
pno  name  zip
1    p1    98125
2    p2    98112
3    p1    98143

PatientOf
name  dno  since
p1    2    2000
p1    3    2003
p2    1    2002
p1    1    1985

Decomposition can cause us to lose information!
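To make the difference between the two decompositions concrete, the sketch below projects the PatientOf instance from the slides both ways and joins the pieces back together. The helper names (`project`, `natural_join`, `as_set`) are hypothetical and written only for this illustration; relations are modeled as lists of dicts under set semantics.

```python
# Rows of the original PatientOf(pno, name, zip, dno, since) instance from the slides.
ATTRS = ("pno", "name", "zip", "dno", "since")
ROWS = [
    (1, "p1", 98125, 2, 2000),
    (1, "p1", 98125, 3, 2003),
    (2, "p2", 98112, 1, 2002),
    (3, "p1", 98143, 1, 1985),
]
original = [dict(zip(ATTRS, row)) for row in ROWS]


def project(relation, attrs):
    """Project onto attrs, dropping duplicate tuples (set semantics)."""
    seen = {tuple(t[a] for a in attrs) for t in relation}
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(r, s):
    """Combine tuples that agree on every attribute the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


def as_set(relation):
    """Order-independent form, so two relations can be compared."""
    return {tuple(sorted(t.items())) for t in relation}


# Decomposition that shares pno: joining gives back exactly the original tuples.
lossless = natural_join(project(original, ("pno", "name", "zip")),
                        project(original, ("pno", "dno", "since")))

# Decomposition that shares only name: the join invents tuples.
lossy = natural_join(project(original, ("pno", "name", "zip")),
                     project(original, ("name", "dno", "since")))

spurious = as_set(lossy) - as_set(original)
print(len(as_set(lossless)), len(as_set(lossy)), len(spurious))
```

Note that "losing information" here shows up as extra tuples: after the join on name we can no longer tell which patient named p1 saw which doctor.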

Schema Refinement Challenges
- How do we know that we should decompose a relation?
    - Functional dependencies
    - Normal forms
- How do we make sure decomposition does not lose info?
    - Lossless-join decompositions
    - Dependency-preserving decompositions


Functional Dependency
- A functional dependency (FD) is an integrity constraint that generalizes the concept of a key
- An instance of relation R satisfies the FD X → Y if, for every pair of tuples t1 and t2, t1.X = t2.X implies t1.Y = t2.Y, where X and Y are two nonempty sets of attributes in R
- We say that X determines Y
- FDs come from domain knowledge
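The definition can be checked mechanically on a given instance. The following is a minimal sketch, assuming relations are lists of dicts; `satisfies_fd` is a hypothetical helper name, and the data is the PatientOf instance used elsewhere in this lecture. Keep in mind that an instance can only refute an FD: passing the check does not establish the FD, which still has to come from domain knowledge.

```python
ATTRS = ("pno", "name", "zip", "dno", "since")
ROWS = [
    (1, "p1", 98125, 2, 2000),
    (1, "p1", 98125, 3, 2003),
    (2, "p2", 98112, 1, 2002),
    (3, "p1", 98143, 1, 1985),
]
patient_of = [dict(zip(ATTRS, row)) for row in ROWS]


def satisfies_fd(relation, lhs, rhs):
    """True if all tuples that agree on lhs also agree on rhs."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for t in relation:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same lhs value, different rhs value
    return True


print(satisfies_fd(patient_of, ["pno"], ["name", "zip"]))   # holds on this instance
print(satisfies_fd(patient_of, ["name"], ["zip"]))          # two patients named p1 differ
print(satisfies_fd(patient_of, ["pno", "dno"], ["since"]))  # (pno, dno) is a key
```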

Closure of FDs
- Some FDs imply others
    - For example: Employee(ssn, position, salary)
    - FD1: ssn → position and FD2: position → salary
    - Imply FD3: ssn → salary
- Can compute the closure of a set of FDs
    - Set F+ of all FDs implied by a given set F of FDs
- Armstrong's Axioms: sound and complete
    - Reflexivity: if Y ⊆ X then X → Y
    - Augmentation: if X → Y then XZ → YZ for any Z
    - Transitivity: if X → Y and Y → Z then X → Z

Closure of a Set of Attributes
- Given a set of attributes A1, ..., An
- The closure, {A1, ..., An}+, is the set of attributes B s.t. A1, ..., An → B

Closure Algo. (for Attributes)
- Start with X = {A1, ..., An}
- Repeat until X doesn't change: if B1, ..., Bn → C is an FD and B1, ..., Bn are all in X, then add C to X
- Can use this algorithm to find keys
    - Compute X+ for all sets X
    - If X+ = all attributes, then X is a superkey
    - Consider only the minimal superkeys

Closure Example (for Attributes)
Example:
    name → color
    category → department
    color, category → price
Closures:
    name+ = {name, color}
    {name, category}+ = {name, category, color, department, price}
    color+ = {color}
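The attribute-closure algorithm and its use for finding keys can be sketched as follows, using the FDs from this example (name → color; category → department; color, category → price). The function names `closure` and `candidate_keys` and the encoding of an FD as a (left side, right side) pair are assumptions made for this illustration. The key search is brute force over all attribute subsets, so it is only meant for small schemas.

```python
from itertools import combinations

# FDs of the example, each as (left-hand side, right-hand side).
FDS = [
    (("name",), ("color",)),
    (("category",), ("department",)),
    (("color", "category"), ("price",)),
]
ALL_ATTRS = {"name", "color", "category", "department", "price"}


def closure(attrs, fds):
    """Repeatedly apply FDs whose left side is already covered, until no change."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal superkeys: try subsets by increasing size, skip supersets of known keys."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(x, fds) == all_attrs:
                keys.append(x)
    return keys


print(closure({"name"}, FDS))
print(closure({"name", "category"}, FDS))
print(candidate_keys(ALL_ATTRS, FDS))
```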

Closure Algo. (for FDs)
Example:
    A, B → C
    A, D → B
    B → D
Step 1: Compute X+ for every X:
    A+ = A, B+ = BD, C+ = C, D+ = D
    AB+ = ABCD, AC+ = AC, AD+ = ABCD
    ABC+ = ABD+ = ACD+ = ABCD
    BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all X, output X → X+ - X:
    AB → CD, AD → BC, ABC → D, ABD → C, ACD → B
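The two steps can be sketched in a few lines for the example FDs A, B → C; A, D → B; B → D. Attributes are single characters here, so an attribute set is written as a string; `closure` and `implied_fds` are hypothetical helper names. The sketch reports every X with a non-empty X+ - X, so besides the five FDs listed on the slide it also prints the given FD B → D and the implied FD BC → D.

```python
from itertools import combinations

FDS = [("AB", "C"), ("AD", "B"), ("B", "D")]  # (left side, right side)


def closure(attrs, fds):
    """Attribute closure: apply FDs until nothing new can be added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implied_fds(all_attrs, fds):
    """Step 1 and step 2: for every X, compute X+ and output X -> X+ - X."""
    out = {}
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            extra = closure(combo, fds) - set(combo)
            if extra:  # skip X whose closure adds nothing
                out["".join(combo)] = "".join(sorted(extra))
    return out


for lhs, rhs in implied_fds("ABCD", FDS).items():
    print(f"{lhs} -> {rhs}")
```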

Decomposition Problems
- FDs will help us identify possible redundancy
    - Identify redundancy and split relations to avoid it
- Can we get the data back correctly?
    - Lossless-join decomposition
- Can we recover the FDs on the big table from the FDs on the small tables?
    - Dependency-preserving decomposition
    - So that we can enforce all FDs without performing joins


Normal Forms
- Based on functional dependencies
    - 2nd Normal Form (obsolete)
    - 3rd Normal Form
    - Boyce-Codd Normal Form (BCNF)
    - We only discuss the last two (3NF and BCNF)
- Based on multivalued dependencies
    - 4th Normal Form
- Based on join dependencies
    - 5th Normal Form

BCNF
- A simple condition for removing anomalies from relations
- A relation R is in BCNF if: whenever A1, ..., An → B is a non-trivial dependency in R, {A1, ..., An} is a superkey for R
- BCNF ensures that no redundancy can be detected using FD information alone

Our Example

PatientOf
pno  name  zip    dno  since
1    p1    98125  2    2000
1    p1    98125  3    2003
2    p2    98112  1    2002
3    p1    98143  1    1985

(pno, dno) is a key, but pno → name, zip holds and pno is not a superkey: a BCNF violation, so we decompose

Decomposition in General
    R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
    R1(A1, ..., An, B1, ..., Bm) = projection of R on A1, ..., An, B1, ..., Bm
    R2(A1, ..., An, C1, ..., Cp) = projection of R on A1, ..., An, C1, ..., Cp
- Theorem: if A1, ..., An → B1, ..., Bm, then the decomposition is lossless
- Note: we don't necessarily need A1, ..., An → C1, ..., Cp
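The theorem gives a test that needs only the FDs, not the data: a split into R1 and R2 is lossless if the shared attributes determine all of R1 or all of R2. Below is a minimal sketch of that test, applied to the two PatientOf decompositions from earlier in the lecture. The name `lossless_by_fds` is hypothetical, and the FD pno, dno → since is an assumption derived from the statement that (pno, dno) is a key.

```python
# FDs assumed for PatientOf(pno, name, zip, dno, since).
FDS = [
    (("pno",), ("name", "zip")),
    (("pno", "dno"), ("since",)),  # assumption: follows from (pno, dno) being a key
]


def closure(attrs, fds):
    """Attribute closure: apply FDs until nothing new can be added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_by_fds(r1, r2, fds):
    """Apply the theorem: the common attributes must determine all of r1 or all of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


# Shared attribute pno determines (pno, name, zip): the theorem applies.
print(lossless_by_fds(("pno", "name", "zip"), ("pno", "dno", "since"), FDS))
# Shared attribute name determines nothing else: the theorem does not apply.
print(lossless_by_fds(("pno", "name", "zip"), ("name", "dno", "since"), FDS))
```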

BCNF Decomposition Algorithm
Repeat
    choose A1, ..., Am → B1, ..., Bn that violates the BCNF condition
    split R into R1(A1, ..., Am, B1, ..., Bn) and R2(A1, ..., Am, [rest])
    continue with both R1 and R2
Until no more violations
- Lossless-join decomposition: the attributes common to R1 and R2 must contain a key for either R1 or R2
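The loop above can be sketched recursively. This is one possible reading of the algorithm, not a definitive implementation: it finds a violating left side X by brute force over attribute subsets, uses everything X determines inside the current relation as the right side, and tests sub-relations against the closure computed from the original FDs. The names `closure`, `find_violation`, and `bcnf_decompose` are hypothetical, and the FD pno, dno → since is an assumption derived from (pno, dno) being a key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: apply FDs until nothing new can be added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ - X) for some X that determines something but is not a superkey of rel."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            reach = closure(x, fds) & rel  # what X determines inside rel
            if reach != x and reach != rel:
                return x, reach - x
    return None


def bcnf_decompose(rel, fds):
    """Split on a violation and recurse on both parts until none remains."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [frozenset(rel)]
    lhs, rhs = violation
    r1 = lhs | rhs        # R1(A1..Am, B1..Bn)
    r2 = set(rel) - rhs   # R2(A1..Am, rest)
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


PATIENT_FDS = [(("pno",), ("name", "zip")), (("pno", "dno"), ("since",))]
result = bcnf_decompose({"pno", "name", "zip", "dno", "since"}, PATIENT_FDS)
print([sorted(r) for r in result])
```

On the PatientOf example this yields the same two relations as the decomposition shown earlier: (pno, name, zip) and (pno, dno, since).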


BCNF and Dependencies
    (Unit, Company, Product)
    FDs: Unit → Company; Company, Product → Unit
- So, there is a BCNF violation, and we decompose:
    (Unit, Company)    FDs: Unit → Company
    (Unit, Product)    No FDs
- In BCNF we lose the FD: Company, Product → Unit

3NF
- A simple condition for removing anomalies from relations
- A relation R is in 3rd normal form if: whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a superkey for R, or B is part of a key
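The 3NF condition differs from BCNF only in the extra escape clause "B is part of a key". The sketch below checks both conditions by brute force over attribute subsets and runs them on the (Unit, Company, Product) relation with FDs Unit → Company and Company, Product → Unit. All function names are hypothetical, and the approach is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: apply FDs until nothing new can be added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def subsets(attrs):
    """All non-empty subsets of attrs, smallest first."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            yield set(combo)


def candidate_keys(attrs, fds):
    """Minimal superkeys, found by increasing subset size."""
    keys = []
    for x in subsets(attrs):
        if not any(k <= x for k in keys) and closure(x, fds) == set(attrs):
            keys.append(x)
    return keys


def is_bcnf(attrs, fds):
    """Every X that determines anything new must be a superkey."""
    return all(closure(x, fds) in (x, set(attrs)) for x in subsets(attrs))


def is_3nf(attrs, fds):
    """Like BCNF, but a determined attribute may instead be part of some key."""
    prime = set().union(*candidate_keys(attrs, fds))
    for x in subsets(attrs):
        reach = closure(x, fds)
        if reach != set(attrs) and not (reach - x) <= prime:
            return False
    return True


ATTRS = {"Unit", "Company", "Product"}
FDS = [(("Unit",), ("Company",)), (("Company", "Product"), ("Unit",))]
print(candidate_keys(ATTRS, FDS))
print(is_bcnf(ATTRS, FDS), is_3nf(ATTRS, FDS))
```

Here Unit → Company violates BCNF because Unit is not a superkey, but Company belongs to the key (Company, Product), so the relation is already in 3NF and no FD has to be given up.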

3NF Discussion
- 3NF decomposition vs. BCNF decomposition:
    - Use the same decomposition steps, for a while
    - 3NF may stop decomposing, while BCNF continues
- Tradeoffs
    - BCNF = no anomalies, but may lose some FDs
    - 3NF = keeps all FDs, but may have some anomalies

Summary
- Database design is not trivial
    - Use ER models
    - Translate ER models into relations
    - Normalize to eliminate anomalies
- Normalization tradeoffs
    - BCNF: no anomalies, but may lose some FDs
    - 3NF: keeps all FDs, but may have anomalies
    - Too many small tables affect performance