
Concepts from 3.1-3.2
- Functional dependencies
- Keys & superkeys of a relation
- Reasoning about FDs
- Closure of a set of attributes
- Closure of a set of FDs
- Minimal basis for a set of FDs

Plan
- How can we use FDs to show that a relation has an anomaly (a potential problem)?
- How can we algorithmically fix the problem?

Projecting sets of FDs
Suppose we have a relation R and a set of FDs F. Let S be a relation obtained by projecting R onto a subset of the attributes of R, i.e., S = π_Attributes(R).
The projection of F, written F_S, is the set of FDs that
- follow from F, and
- hold in S (involve only attributes of S).

Projecting sets of FDs
Algorithm for computing F_S:
- Compute the closure F+.
- F_S is the set of all FDs in F+ that involve only the attributes in S.
The book describes a different algorithm in Section 3.2.8. The book's algorithm also shows how to compute a minimal basis of F_S.

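Projecting sets of FDs
R(A, B, C, D); F = {A -> B, B -> C, C -> D}
Which FDs hold in S(A, C, D)?
F+ includes {A -> B, B -> C, C -> D, A -> C, A -> D, B -> D} (listing only the nontrivial FDs with a single attribute on each side).
F_S is {C -> D, A -> C, A -> D}.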
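The projection example can be reproduced with a short Python sketch. This is a brute-force illustration, not the book's algorithm from Section 3.2.8: it tries every subset of S's attributes, so it is exponential in the number of attributes. The function names `closure` and `project_fds`, and the choice to represent an FD as a `(lhs, rhs)` pair, are assumptions made for this sketch.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is in the closure, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def project_fds(fds, sub_attrs):
    """Nontrivial FDs X -> A that follow from fds and mention only sub_attrs."""
    sub_attrs = frozenset(sub_attrs)
    projected = set()
    for size in range(1, len(sub_attrs) + 1):
        for combo in combinations(sorted(sub_attrs), size):
            lhs = frozenset(combo)
            # Keep only attributes of S that are not already on the left side.
            for attr in (closure(lhs, fds) & sub_attrs) - lhs:
                projected.add((lhs, attr))
    return projected

# Slide example: R(A, B, C, D) with F = {A -> B, B -> C, C -> D}, S(A, C, D).
F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("C"), frozenset("D"))]
F_S = project_fds(F, "ACD")

# FDs with a single attribute on the left, as listed on the slide.
single_lhs = {("".join(lhs), attr) for lhs, attr in F_S if len(lhs) == 1}
print(sorted(single_lhs))
```

The sketch also finds AC -> D and AD -> C. Both follow from the three FDs on the slide, so the result is a correct projection but not a minimal basis.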

Anomalies
An anomaly is a problem that arises when we try to add too many attributes to a single relation. It arises from redundancy: information repeated unnecessarily. When designing schemas, try to ensure you never repeat yourself!

title               year  length  genre   studio     star
Star Wars           1977  124     SciFi   Fox        Carrie Fisher
Star Wars           1977  124     SciFi   Fox        Mark Hamill
Star Wars           1977  124     SciFi   Fox        Harrison Ford
Gone With the Wind  1939  231     Drama   MGM        Vivien Leigh
Wayne's World       1992  95      Comedy  Paramount  Dana Carvey
Wayne's World       1992  95      Comedy  Paramount  Mike Myers

Anomalies
Update anomaly: when you change information in one tuple but leave the same information in a different tuple unchanged. For example, in the movie table above, correcting the length of Star Wars in only one of its three tuples leaves the table inconsistent.

Anomalies
Deletion anomaly: when deleting one or more tuples removes information that we didn't want to lose. For example, in the movie table above, deleting the Vivien Leigh tuple also removes the length, genre, and studio of Gone With the Wind.

Anomalies
Insertion anomaly (left out of book): when storing a piece of information forces us to store an unrelated piece of information as well. For example, in the movie table above, we cannot record a new movie's length, genre, and studio without also storing at least one star for it.


Decomposing Relations
Given a relation R(A1, A2, ..., An), two relations S(B1, B2, ..., Bm) and T(C1, C2, ..., Ck) form a decomposition of R if:
1. the attributes of S and T together make up the attributes of R, i.e., {A1, ..., An} = {B1, ..., Bm} U {C1, ..., Ck}
2. the tuples in S are the projections onto {B1, ..., Bm} of the tuples of R, i.e., S = π_{B1,B2,...,Bm}(R)
3. the tuples in T are the projections onto {C1, ..., Ck} of the tuples of R, i.e., T = π_{C1,C2,...,Ck}(R)

Decompose the movie relation above into:
- Movies(title, year, length, genre, studio)
- Stars(title, year, star)
Are the anomalies removed? Is anything redundant? Why or why not? Do you see a connection to FDs?

BCNF
Anomalies caused by FDs are guaranteed not to exist when a relation is in Boyce-Codd normal form (BCNF).
A relation R is in BCNF iff whenever there is a nontrivial FD A1 ... An -> B1 ... Bm for R, {A1, ..., An} is a superkey for R.
Informally, the left side of every nontrivial FD must be a superkey.

Check for BCNF violations
- List all nontrivial FDs in R.
- Ensure the left side of each nontrivial FD is a superkey. (First we have to find all the keys!)
Note: a relation with two attributes is always in BCNF.

For the decomposition of the movie relation into
- Movies(title, year, length, genre, studio)
- Stars(title, year, star)
what are the new FDs and keys?

Example
Is Courses(Number, DepartmentName, CourseName, Classroom, Enrollment, StudentName, Address) in BCNF?
FDs:
- Number DepartmentName -> CourseName
- Number DepartmentName -> Classroom
- Number DepartmentName -> Enrollment
What is {Number, DepartmentName}+ under the FDs? {Number, DepartmentName, CourseName, Classroom, Enrollment}
So the key is {Number, DepartmentName, StudentName, Address}, and {Number, DepartmentName} alone is not a superkey.
So the relation is not in BCNF.
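The check on the Courses relation can be sketched in Python. The helper names `closure` and `bcnf_violations` are assumptions made for this illustration. The sketch tests only the FDs it is given, which is enough to decide whether the relation itself is in BCNF, but not enough for a projection of it (that needs the projected FDs).

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violations(all_attrs, fds):
    """Given FDs that are nontrivial and whose left side is not a superkey."""
    all_attrs = frozenset(all_attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                        # skip trivial FDs
            and not closure(lhs, fds) >= all_attrs]  # left side is not a superkey

COURSES = {"Number", "DepartmentName", "CourseName", "Classroom",
           "Enrollment", "StudentName", "Address"}
LHS = frozenset({"Number", "DepartmentName"})
FDS = [(LHS, frozenset({"CourseName"})),
       (LHS, frozenset({"Classroom"})),
       (LHS, frozenset({"Enrollment"}))]

lhs_closure = closure(LHS, FDS)
violations = bcnf_violations(COURSES, FDS)
print(sorted(lhs_closure))
print(len(violations), "violating FDs")
```

All three FDs are reported as violations, because the closure of their shared left side lacks StudentName and Address.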

Decomposition into BCNF
Suppose R is a relation schema that violates BCNF. We can decompose R into a set S of new relations such that:
- each relation in S is in BCNF, and
- we can recover R from the relations in S, i.e., we can reconstruct R exactly from the relations in S.

Algorithm
Given a relation R and a set of FDs F, check if R is in BCNF. If it is not:
1. Pick an FD that violates BCNF and call it X -> Y.
2. Compute X+.
3. Let R1 = X+ and R2 = X together with all attributes not in X+.
4. Compute the FDs for R1 and R2 (projection algorithm for FDs).
5. Check if R1 and R2 are in BCNF, and repeat if needed.

For the movie relation R(title, year, length, genre, studio, star):
FDs: title year -> length genre studio
Key: {title, year, star}
This relation is not in BCNF: the single FD is a violation because its left side, {title, year}, is not a superkey.
Decompose:
- Compute {title, year}+ = {title, year, length, genre, studio}
- New relation: R1(title, year, length, genre, studio). Key = {title, year}
- New relation: R2(title, year, star). Key = {title, year, star}
- FDs for R1: same FD as for the original relation. FDs for R2: none
- No BCNF violations in R1 or R2. (The left side of the FD in R1 is a superkey.)
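The decomposition of the movie relation can be traced with a recursive Python sketch of the algorithm. The names `closure`, `project_fds`, and `bcnf_decompose` are assumptions made for this illustration. The FD projection is brute force (exponential in the number of attributes), so this is suitable only for small classroom examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def project_fds(fds, sub_attrs):
    """Nontrivial FDs X -> A implied by fds that mention only sub_attrs."""
    projected = []
    for size in range(1, len(sub_attrs) + 1):
        for combo in combinations(sorted(sub_attrs), size):
            lhs = frozenset(combo)
            for attr in (closure(lhs, fds) & sub_attrs) - lhs:
                projected.append((lhs, frozenset({attr})))
    return projected

def bcnf_decompose(attrs, fds):
    """Split attrs on a violating FD X -> Y until no violation remains."""
    attrs = frozenset(attrs)
    for lhs, rhs in fds:
        x_plus = closure(lhs, fds) & attrs
        if not rhs <= lhs and x_plus != attrs:   # nontrivial, X not a superkey
            r1 = x_plus                          # R1 = X+
            r2 = lhs | (attrs - x_plus)          # R2 = X plus everything not in X+
            return (bcnf_decompose(r1, project_fds(fds, r1)) +
                    bcnf_decompose(r2, project_fds(fds, r2)))
    return [attrs]

MOVIE_ATTRS = {"title", "year", "length", "genre", "studio", "star"}
MOVIE_FDS = [(frozenset({"title", "year"}),
              frozenset({"length", "genre", "studio"}))]

result = bcnf_decompose(MOVIE_ATTRS, MOVIE_FDS)
for schema in result:
    print(sorted(schema))
```

Which violating FD is picked first can change the final decomposition in general. Here there is only one candidate, so the result is R1 and R2 as above.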

Schema is Courses(Number, DeptName, CourseName, Classroom, Enrollment, StudentName, Address).
BCNF-violating FD is Number DeptName -> CourseName Classroom Enrollment.
What is {Number, DeptName}+? {Number, DeptName, CourseName, Classroom, Enrollment}
Decompose Courses into
- Courses1(Number, DeptName, CourseName, Classroom, Enrollment)
- Courses2(Number, DeptName, StudentName, Address)
Are there any BCNF violations in the two new relations?

Students and Profs
Suppose we have one single relation with attributes:
- R#
- StudentName
- ProfID (ID of professor teaching a class with the student)
- ProfName
- AdvisorID
- AdvisorName

Warmup
Attributes: Author, Address (of a person), Title (of a book), Genre (of a book), Pages (in a book).
Suppose we have the FDs:
- Author -> Address
- Title -> Genre Pages
Decompose into BCNF.


There are other types of decomposition besides BCNF. Why should we use this one and not another?
We'd like a decomposition to:
1. eliminate anomalies
2. let us recover the original relation with a join (lossless join property)
3. let us recover the original FDs when recovering the original relation (dependency preservation property)
BCNF decomposition gives us 1 & 2, but not always 3.

BCNF decomposition guarantees:
- There are no redundancy, insertion, update, or deletion anomalies caused by FDs.
- We can recover the original relation with a natural join (lossless join property).
However, we might lose some original FDs in the natural join.
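The lossless join property can be checked on the movie data with a small Python sketch: project the table onto the two BCNF schemas, join the projections, and compare with the original. The helpers `project` and `natural_join` are hypothetical names written for this illustration. Relations are modelled as sets of tuples, which matches the set semantics of projection.

```python
COLUMNS = ("title", "year", "length", "genre", "studio", "star")
ROWS = [
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Harrison Ford"),
    ("Gone With the Wind", 1939, 231, "Drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", "Mike Myers"),
]

def project(rows, columns, keep):
    """Projection with duplicate elimination: a set of tuples over keep."""
    idx = [columns.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(left, left_cols, right, right_cols):
    """Natural join of two sets of tuples on their shared column names."""
    shared = [c for c in left_cols if c in right_cols]
    extra = [c for c in right_cols if c not in left_cols]
    joined = set()
    for l in left:
        for r in right:
            # Keep the pair only if it agrees on every shared column.
            if all(l[left_cols.index(c)] == r[right_cols.index(c)] for c in shared):
                joined.add(l + tuple(r[right_cols.index(c)] for c in extra))
    return joined, tuple(left_cols) + tuple(extra)

MOVIE_COLS = ("title", "year", "length", "genre", "studio")
STAR_COLS = ("title", "year", "star")
movies = project(ROWS, COLUMNS, MOVIE_COLS)
stars = project(ROWS, COLUMNS, STAR_COLS)
joined, joined_cols = natural_join(movies, MOVIE_COLS, stars, STAR_COLS)

print(len(movies), "movie tuples,", len(stars), "star tuples")
print("join recovers the original:", joined == set(ROWS))
```

Movies holds each film's length, genre, and studio once instead of once per star, and the join still returns exactly the six original tuples.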

Name     Type       Closest Restaurant of Type
Alice    BBQ        Cozy Corner
Alice    Thai       Bhan Thai
Bob      Pizza      Broadway Pizza
Charlie  Doughnuts  Gibson's Donuts
Charlie  Thai       Bangkok Alley
Charlie  BBQ        Cozy Corner

Suppose we want to store information about the closest restaurant to various people that serves a certain type of cuisine. What are the FDs/keys? Is this in BCNF?

Name     Closest
Alice    Cozy Corner
Alice    Bhan Thai
Bob      Broadway Pizza
Charlie  Donald's Donuts
Charlie  Bangkok Alley
Charlie  Cozy Corner

Restaurant       Type
Cozy Corner      BBQ
Bhan Thai        Thai
Broadway Pizza   Pizza
Donald's Donuts  Doughnuts
Bangkok Alley    Thai

Book's example
Traveling shows: store theater names, the cities they are in, and the title of the show playing. A show never plays at more than one theater per city.
- theater -> city
- title city -> theater

3rd Normal Form (3NF)
- Allows for lossless joins and dependency preservation.
- Does not fix all anomalies.
- 3NF is a weaker condition than BCNF (anything in BCNF is automatically in 3NF).

3rd Normal Form (3NF)
A relation R is in 3NF iff for every nontrivial FD A1 ... An -> B for R, one of the following is true:
- {A1, ..., An} is a superkey for R (the BCNF test)
- B is a prime attribute (an attribute in some key for R)

Example
R(C, D, P, S, Y) has FDs
- PSY -> CD
- CD -> S
Keys are {P, S, Y} and {C, D, P, Y}.
CD -> S violates BCNF. However, R is in 3NF because S is part of a key.

3NF Decomposition
Given a relation R and a set F of functional dependencies:
1. Find a minimal basis, G, for F.
2. For each FD X -> A in G, use XA as the schema of one of the relations in the decomposition.
3. If none of the schemas from Step 2 is a superkey for R, add another relation whose schema is a key for R.
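Steps 2 and 3 can be sketched in Python under the assumption that Step 1 is already done, i.e., the FDs passed in are a minimal basis. The relation R(A, B, C, D) with A -> B and C -> D is a hypothetical example that does not come from the slides. It was chosen because it triggers Step 3. The names `closure`, `find_key`, and `synthesize_3nf` are likewise assumptions.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_key(all_attrs, fds):
    """Shrink the full attribute set to one key (there may be others)."""
    all_attrs = frozenset(all_attrs)
    key = set(all_attrs)
    for attr in sorted(all_attrs):
        # Drop attr if the remaining attributes still determine everything.
        if closure(key - {attr}, fds) >= all_attrs:
            key.discard(attr)
    return frozenset(key)

def synthesize_3nf(all_attrs, minimal_basis):
    """Steps 2 and 3; minimal_basis is assumed to be minimal already."""
    all_attrs = frozenset(all_attrs)
    schemas = []
    for lhs, rhs in minimal_basis:        # Step 2: one schema XA per FD X -> A
        schema = lhs | rhs
        if schema not in schemas:
            schemas.append(schema)
    # Step 3: add a key if no schema so far is a superkey for R.
    if not any(closure(s, minimal_basis) >= all_attrs for s in schemas):
        schemas.append(find_key(all_attrs, minimal_basis))
    return schemas

# Hypothetical example: R(A, B, C, D) with minimal basis {A -> B, C -> D}.
G = [(frozenset("A"), frozenset("B")),
     (frozenset("C"), frozenset("D"))]
schemas = synthesize_3nf("ABCD", G)
print([sorted(s) for s in schemas])
```

Neither AB nor CD is a superkey of R, so the sketch adds the key AC as a third relation.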

Example
R(A, B, C), F = {A -> B, C -> B}
What is the minimal basis set of FDs? What is the decomposition to 3NF?

More redundancy?

Course    Textbook                           Prof
ENGL 101  Writing for Dummies                Smith
ENGL 101  Wikipedia Is Not a Primary Source  Smith
ENGL 101  Writing for Dummies                Jones
ENGL 101  Wikipedia Is Not a Primary Source  Jones
COMP 142  How to Program in C++              Smith
COMP 142  How to Program in C++              Jones

Every professor always uses the same set of books. Is this in BCNF?

Redundancies can still arise in relations that conform to BCNF. This occurs when a single table tries to contain two (or more) many-one (or many-many) relationships, as in the Course/Textbook/Prof table above.

Multivalued dependencies
An MVD is a constraint that two sets of attributes are independent of each other.
An MVD A1 ... An ->-> B1 ... Bm holds in R if in every instance of R: for every pair of tuples t and u that agree on all the As, we can find a tuple v in R that agrees
- with both t and u on the As,
- with t on the Bs,
- with u on all those attributes of R that are not As or Bs.
In other words, the information in A1 ... An determines the values of the set of tuples for B1 ... Bm, and those tuples are independent of any other attributes in the relation.

Consider a table with actors/actresses, their street addresses with cities/states/zips, and the movies they've been in (title/year).

Consider an MVD A1 ... An ->-> B1 ... Bm. Call the attributes not among the As or Bs the Cs.
This MVD holds in R if: whenever we have two tuples of R that agree on the As but differ on the Bs and Cs, we should be able to find (or create) two new tuples with the same As but swapped Bs and Cs.
Equivalently: knowing A1 ... An determines a unique set of tuples for B1 ... Bm that is independent of the Cs.

In the Course/Textbook/Prof table above, Course ->-> Textbook is an MVD. What else?
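The tuple-based definition of an MVD can be tested against a single instance with a short Python sketch. An instance can only refute an MVD: passing the test on one table does not prove the MVD holds in every instance. The function name `mvd_holds` is an assumption made for this illustration.

```python
COLUMNS = ("Course", "Textbook", "Prof")
ROWS = [
    ("ENGL 101", "Writing for Dummies", "Smith"),
    ("ENGL 101", "Wikipedia Is Not a Primary Source", "Smith"),
    ("ENGL 101", "Writing for Dummies", "Jones"),
    ("ENGL 101", "Wikipedia Is Not a Primary Source", "Jones"),
    ("COMP 142", "How to Program in C++", "Smith"),
    ("COMP 142", "How to Program in C++", "Jones"),
]

def mvd_holds(rows, columns, lhs, rhs):
    """Check lhs ->-> rhs on one instance, following the t, u, v definition."""
    a_idx = [columns.index(c) for c in lhs]
    b_idx = [columns.index(c) for c in rhs]
    present = set(rows)
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in a_idx):
                # Build v: the Bs come from t, every other attribute from u.
                v = list(u)
                for i in b_idx:
                    v[i] = t[i]
                if tuple(v) not in present:
                    return False
    return True

course_textbook = mvd_holds(ROWS, COLUMNS, ["Course"], ["Textbook"])
prof_course = mvd_holds(ROWS, COLUMNS, ["Prof"], ["Course"])
print("Course ->-> Textbook:", course_textbook)
print("Prof ->-> Course:", prof_course)
```

Prof ->-> Course fails because Smith teaches both ENGL 101 and COMP 142, yet there is no tuple pairing ENGL 101 with the C++ textbook.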

FDs vs MVDs
- An FD A -> B says "Each A determines a unique B" or, "Each A determines 0 or 1 Bs."
- An MVD A ->-> B says "Each A determines a set of Bs where the Bs are independent of anything in the relation that is not an A or a B."

Rules for MVDs
FD promotion: every FD A -> B is an MVD A ->-> B.
Trivial MVDs:
1. If A ->-> B, then A ->-> AB.
2. If A1, A2, ..., An and B1, B2, ..., Bm make up all the attributes of a relation, then A1 A2 ... An ->-> B1 B2 ... Bm holds in the relation.

Transitive rule: given A ->-> B and B ->-> C, we can infer A ->-> C.
Complementation rule: if we know A ->-> B, then we know A ->-> C, where the Cs are all the attributes not among the As or Bs.

Note that the splitting rule does not hold! If A ->-> BC, it does not necessarily follow that A ->-> B and A ->-> C.

Fourth Normal Form (4NF)
"Stronger" than BCNF.
A relation R is in 4NF iff: for all nontrivial MVDs A1 ... An ->-> B1 ... Bm, {A1, ..., An} is a superkey of R.

4NF Decomposition
Consider a relation R with set of attributes X, in which A1 A2 ... An ->-> B1 B2 ... Bm violates 4NF.
Decompose R into two relations whose attributes are:
1. The As and Bs together, i.e., {A1, A2, ..., An, B1, B2, ..., Bm}
2. All the attributes of R which are not Bs, i.e., X - {B1, B2, ..., Bm}
Then recursively check if the new relations are in 4NF and repeat.

Example
In the Course/Textbook/Prof table above:
- Course ->-> Textbook
- Course ->-> Prof

Example
Drinkers(name, addr, phones, beer)
FD: name -> addr
Nontrivial MVDs: name ->-> phones and name ->-> beer.
Only key: {name, phones, beer}
All three dependencies above violate 4NF.
Successive decomposition yields 4NF relations:
- D1(name, addr)
- D2(name, phones)
- D3(name, beer)
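The successive decomposition of Drinkers can be traced with a minimal Python sketch of one 4NF split. The function name `split_on_mvd` is an assumption. The sketch does not detect violations: the violating dependencies are supplied by hand, in the order used here. It keeps the As in the second relation, which equals X minus the Bs whenever the As and Bs do not overlap.

```python
def split_on_mvd(attrs, lhs, rhs):
    """One 4NF decomposition step for a violating MVD lhs ->-> rhs."""
    attrs, lhs, rhs = frozenset(attrs), frozenset(lhs), frozenset(rhs)
    r1 = lhs | rhs               # the As and Bs together
    r2 = attrs - (rhs - lhs)     # everything except the Bs (the As are kept)
    return r1, r2

DRINKERS = {"name", "addr", "phones", "beer"}

# The FD name -> addr is promoted to the MVD name ->-> addr and used first.
d1, rest = split_on_mvd(DRINKERS, {"name"}, {"addr"})
# The remaining relation still violates 4NF because of name ->-> phones.
d2, d3 = split_on_mvd(rest, {"name"}, {"phones"})

for schema in (d1, d2, d3):
    print(sorted(schema))
```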

Relationships Among Normal Forms
- 4NF implies BCNF, i.e., if a relation is in 4NF, it is also in BCNF.
- BCNF implies 3NF, i.e., if a relation is in BCNF, it is also in 3NF.

Property                           3NF    BCNF   4NF
Eliminates redundancy due to FDs   Maybe  Yes    Yes
Eliminates redundancy due to MVDs  No     No     Yes
Preserves FDs                      Yes    Maybe  Maybe
Preserves MVDs                     Maybe  Maybe  Maybe

Normal Forms
- First Normal Form: each attribute is atomic
- Second Normal Form: no nontrivial FD has a left side that is a proper subset of a key
- Third Normal Form: just discussed it
- Fourth Normal Form: just discussed it
- Fifth Normal Form: outside the scope of this class
- Sixth Normal Form: different versions exist. One version was developed for temporal databases
- Seventh Normal Form: just kidding :)

Database Design Mantra
"Everything should depend on the key, the whole key, and nothing but the key."