Relational Database Design Theory. Introduction to Databases CompSci 316 Fall 2017

Similar documents
Lecture 6a Design Theory and Normalization 2/2

Announcements (January 20) Relational Database Design. Database (schema) design. Entity-relationship (E/R) model. ODL (Object Definition Language)

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

Functional Dependencies and Finding a Minimal Cover

Part II: Using FD Theory to do Database Design

Relational Design: Characteristics of Well-designed DB

UNIT 3 DATABASE DESIGN

Database Design Principles

Chapter 8: Relational Database Design

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

MODULE: 3 FUNCTIONAL DEPENDENCIES

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

CS411 Database Systems. 05: Relational Schema Design Ch , except and

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

FUNCTIONAL DEPENDENCIES

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Unit 3 : Relational Database Design

Database Systems. Basics of the Relational Data Model

Schema Refinement: Dependencies and Normal Forms

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

CSE 562 Database Systems

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

COSC Dr. Ramon Lawrence. Emp Relation

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

Relational Database Design (II)

Database Design Theory and Normalization. CS 377: Database Systems

Functional Dependencies CS 1270

Informal Design Guidelines for Relational Databases

CSCI 403: Databases 13 - Functional Dependencies and Normalization

Schema Refinement: Dependencies and Normal Forms

Lecture 5 Design Theory and Normalization

Schema Refinement: Dependencies and Normal Forms

Databases The theory of relational database design Lectures for m

Chapter 7: Relational Database Design

Lecture 4. Database design IV. INDs and 4NF Design wrapup

Normalization. Anomalies Functional Dependencies Closures Key Computation Projecting Relations BCNF Reconstructing Information Other Normal Forms

Relational Database design. Slides By: Shree Jaswal

Theory of Normal Forms Decomposition of Relations. Overview

Functional dependency theory

Relational Design Theory. Relational Design Theory. Example. Example. A badly designed schema can result in several anomalies.

Design Theory for Relational Databases

customer = (customer_id, _ customer_name, customer_street,

Schema Refinement and Normal Forms

CS 2451 Database Systems: Database and Schema Design

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

Normalisation. Normalisation. Normalisation

Review: Attribute closure

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Relational Design Theory

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Database design III. Quiz time! Using FDs to detect anomalies. Decomposition. Decomposition. Boyce-Codd Normal Form 11/4/16

Relational Database Systems 1

Design Theory for Relational Databases

Concepts from

Lectures 5 & 6. Lectures 6: Design Theory Part II

V. Database Design CS448/ How to obtain a good relational database schema

Final Review. Zaki Malik November 20, 2008

Relational Model and Relational Algebra A Short Review Class Notes - CS582-01

Database Management System

Fundamentals of Database Systems

UNIT -III. Two Marks. The main goal of normalization is to reduce redundant data. Normalization is based on functional dependencies.

Lecture 11 - Chapter 8 Relational Database Design Part 1

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

The Relational Data Model

Database Normalization. (Olav Dæhli 2018)

Combining schemas. Problems: redundancy, hard to update, possible NULLs

Desirable database characteristics Database design, revisited

Relational Database Systems 1. Christoph Lofi Simon Barthel Institut für Informationssysteme Technische Universität Braunschweig

SQL: Part II. Introduction to Databases CompSci 316 Fall 2018

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Overview - detailed. Goal. Faloutsos & Pavlo CMU SCS /615

CS352 Lecture - Conceptual Relational Database Design

Chapter 6: Relational Database Design

CSIT5300: Advanced Database Systems

CMU SCS CMU SCS CMU SCS CMU SCS whole nothing but

Mapping ER Diagrams to. Relations (Cont d) Mapping ER Diagrams to. Exercise. Relations. Mapping ER Diagrams to Relations (Cont d) Exercise

Schema Refinement & Normalization Theory 2. Week 15

Relational Database Systems 1 Wolf-Tilo Balke Hermann Kroll, Janus Wawrzinek, Stephan Mennicke

Functional Dependency: Design and Implementation of a Minimal Cover Algorithm

CS352 Lecture - Conceptual Relational Database Design

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out end of week start early!

5 Normalization:Quality of relational designs

Unit IV. S_id S_Name S_Address Subject_opted

Lectures 12: Design Theory I. 1. Normal forms & functional dependencies 2/19/2018. Today s Lecture. What you will learn about in this section

Chapter 14 Outline. Normalization for Relational Databases: Outline. Chapter 14: Basics of Functional Dependencies and

Functional Dependencies and Normalization

Unit- III (Functional dependencies and Normalization, Relational Data Model and Relational Algebra)

18. Relational Database Design

Database Constraints and Design

Typical relationship between entities is ((a,b),(c,d) ) is best represented by one table RS (a,b,c,d)

Functional Dependencies and Single Valued Normalization (Up to BCNF)

Redundancy:Dependencies between attributes within a relation cause redundancy.

Relational Database Design: Part I. Introduction to Databases CompSci 316 Fall 2017

Functional Dependencies

Normalization. VI. Normalization of Database Tables. Need for Normalization. Normalization Process. Review of Functional Dependence Concepts

Normalization is based on the concept of functional dependency. A functional dependency is a type of relationship between attributes.

Transcription:

Relational Database Design Theory Introduction to Databases CompSci 316 Fall 2017

2 Announcements (Thu. Sep. 14) Homework #1 due next Tuesday (11:59pm) Course project description posted Read it! Mixer in a week and a half Milestone #1 right after fall break Teamwork required: 5 people per team on average

3 Motivation uid uname gid 142 Bart dps 123 Milhouse gov 857 Lisa abc 857 Lisa gov 456 Ralph abc 456 Ralph gov Why is UserGroup (uid, uname, gid) a bad design? It has redundancy user name is recorded multiple times, once for each group that a user belongs to Leads to update, insertion, deletion anomalies Wouldn t it be nice to have a systematic approach to detecting and removing redundancy in designs? Dependencies, decompositions, and normal forms

4 Functional dependencies A functional dependency (FD) has the form X Y, where X and Y are sets of attributes in a relation R X Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y Must be b X Y Z a b c a b?? Could be anything

5 FD examples Address (street_address, city, state, zip) street_address, city, state zip zip city, state zip, state zip? This is a trivial FD Trivial FD: LHS RHS zip state, zip? This is non-trivial, but not completely non-trivial Completely non-trivial FD: LHS RHS =

6 Redefining keys using FD s A set of attributes K is a key for a relation R if K all (other) attributes of R That is, K is a super key No proper subset of K satisfies the above condition That is, K is minimal

7 Reasoning with FD s Given a relation R and a set of FD s F Does another FD follow from F? Are some of the FD s in F redundant (i.e., they follow from the others)? Is K a key of R? What are all the keys of R?

8 Attribute closure Given R, a set of FD s F that hold in R, and a set of attributes Z in R: The closure of Z (denoted Z 3 ) with respect to F is the set of all attributes A 5, A 7, functionally determined by Z (that is, Z A 5 A 7 ) Algorithm for computing the closure Start with closure = Z If X Y is in F and X is already in the closure, then also add Y to the closure Repeat until no new attributes can be added

9 A more complex example UserJoinsGroup (uid, uname, twitterid, gid, fromdate) Assume that there is a 1-1 correspondence between our users and Twitter accounts uid uname, twitterid twitterid uid uid, gid fromdate Not a good design, and we will see why shortly

10 Example of computing closure gid, twitterid 3 =? twitterid uid Add uid Closure grows to { gid, twitterid, uid } F includes: uid uname, twitterid twitterid uid uid, gid fromdate uid uname, twitterid Add uname, twitterid Closure grows to { gid, twitterid, uid, uname } uid, gid fromdate Add fromdate Closure is now all attributes in UserJoinsGroup

11 Using attribute closure Given a relation R and set of FD s F Does another FD X Y follow from F? Compute X 3 with respect to F If Y X 3, then X Y follows from F Is K a key of R? Compute K 3 with respect to F If K 3 contains all the attributes of R, K is a super key Still need to verify that K is minimal (how?)

12 Rules of FD s Armstrong s axioms Reflexivity: If Y X, then X Y Augmentation: If X Y, then XZ YZ for any Z Transitivity: If X Y and Y Z, then X Z Rules derived from axioms Splitting: If X YZ, then X Y and X Z Combining: If X Y and X Z, then X YZ FUsing these rules, you can prove or disprove an FD given a set of FDs

13 Non-key FD s Consider a non-trivial FD X Y where X is not a super key Since X is not a super key, there are some attributes (say Z) that are not functionally determined by X X Y Z a b c 5 a b c 7 That b is associated with a is recorded multiple times: redundancy, update/insertion/deletion anomaly

14 Example of redundancy UserJoinsGroup (uid, uname, twitterid, gid, fromdate) uid uname, twitterid ( plus other FD s) uid uname twitterid gid fromdate 142 Bart @BartJSimpson dps 1987-04-19 123 Milhouse @MilhouseVan_ gov 1989-12-17 857 Lisa @lisasimpson abc 1987-04-19 857 Lisa @lisasimpson gov 1988-09-01 456 Ralph @ralphwiggum abc 1991-04-25 456 Ralph @ralphwiggum gov 1992-09-01

15 Decomposition uid uname twitterid gid fromdate 142 Bart @BartJSimpson dps 1987-04-19 123 Milhouse @MilhouseVan_ gov 1989-12-17 857 Lisa @lisasimpson abc 1987-04-19 857 Lisa @lisasimpson gov 1988-09-01 456 Ralph @ralphwiggum abc 1991-04-25 456 Ralph @ralphwiggum gov 1992-09-01 uid uname twitterid 142 Bart @BartJSimpson 123 Milhouse @MilhouseVan_ 857 Lisa @lisasimpson 456 Ralph @ralphwiggum Eliminates redundancy To get back to the original relation: uid gid fromdate 142 dps 1987-04-19 123 gov 1989-12-17 857 abc 1987-04-19 857 gov 1988-09-01 456 abc 1991-04-25 456 gov 1992-09-01

16 Unnecessary decomposition uid uname twitterid 142 Bart @BartJSimpson 123 Milhouse @MilhouseVan_ 857 Lisa @lisasimpson 456 Ralph @ralphwiggum uid uname uid twitterid 142 Bart 123 Milhouse 857 Lisa 456 Ralph 142 @BartJSimpson 123 @MilhouseVan_ 857 @lisasimpson 456 @ralphwiggum Fine: join returns the original relation Unnecessary: no redundancy is removed; schema is more complicated (and uid is stored twice!)

17 Bad decomposition uid gid fromdate 142 dps 1987-04-19 123 gov 1989-12-17 857 abc 1987-04-19 857 gov 1988-09-01 456 abc 1991-04-25 456 gov 1992-09-01 uid gid uid fromdate 142 dps 123 gov 857 abc 857 gov 456 abc 456 gov 142 1987-04-19 123 1989-12-17 857 1987-04-19 857 1988-09-01 456 1991-04-25 456 1992-09-01 Association between gid and fromdate is lost Join returns more rows than the original relation

18 Lossless join decomposition Decompose relation R into relations S and T attrs R = attrs S attrs T S = π BCCDE F R T = π BCCDE G R The decomposition is a lossless join decomposition if, given known constraints such as FD s, we can guarantee that R = S T Any decomposition gives R S T (why?) A lossy decomposition is one with R S T

19 Loss? But I got more rows! Loss refers not to the loss of tuples, but to the loss of information Or, the ability to distinguish different original relations uid gid 142 dps 123 gov 857 abc 857 gov 456 abc 456 gov uid gid fromdate 142 dps 1987-04-19 123 gov 1989-12-17 857 abc 1987-04-19 1988-09-01 857 gov 1988-09-01 1987-04-19 456 abc 1991-04-25 456 gov 1992-09-01 No way to tell which is the original relation uid fromdate 142 1987-04-19 123 1989-12-17 857 1987-04-19 857 1988-09-01 456 1991-04-25 456 1992-09-01

20 Questions about decomposition When to decompose How to come up with a correct decomposition (i.e., lossless join decomposition)

21 An answer: BCNF A relation R is in Boyce-Codd Normal Form if For every non-trivial FD X Y in R, X is a super key That is, all FDs follow from key other attributes When to decompose As long as some relation is not in BCNF How to come up with a correct decomposition Always decompose on a BCNF violation (details next) FThen it is guaranteed to be a lossless join decomposition!

22 BCNF decomposition algorithm Find a BCNF violation That is, a non-trivial FD X Y in R where X is not a super key of R Decompose R into R 5 and R 7, where R 5 has attributes X Y R 7 has attributes X Z, where Z contains all attributes of R that are in neither X nor Y Repeat until all relations are in BCNF

23 BCNF decomposition example uid uname, twitterid twitterid uid uid, gid fromdate UserJoinsGroup (uid, uname, twitterid, gid, fromdate) BCNF violation: uid uname, twitterid User (uid, uname, twitterid) uid uname, twitterid twitterid uid BCNF Member (uid, gid, fromdate) uid, gid fromdate BCNF

24 Another example uid uname, twitterid twitterid uid uid, gid fromdate UserJoinsGroup (uid, uname, twitterid, gid, fromdate) BCNF violation: twitterid uid UserId (twitterid, uid) BCNF UserJoinsGroup (twitterid, uname, gid, fromdate) twitterid uname twitterid, gid fromdate BCNF violation: twitterid uname UserName (twitterid, uname) BCNF Member (twitterid, gid, fromdate) BCNF

Why is BCNF decomposition lossless 25 Given non-trivial X Y in R where X is not a super key of R, need to prove: Anything we project always comes back in the join: R π IJ R π IK R Sure; and it doesn t depend on the FD Anything that comes back in the join must be in the original relation: R π IJ R π IK R Proof will make use of the fact that X Y

26 Recap Functional dependencies: a generalization of the key concept Non-key functional dependencies: a source of redundancy BCNF decomposition: a method for removing redundancies BNCF decomposition is a lossless join decomposition BCNF: schema in this normal form has no redundancy due to FD s

27 BCNF = no redundancy? User (uid, gid, place) A user can belong to multiple groups A user can register places she s visited Groups and places have nothing to do with other FD s? None BCNF? Yes Redundancies? Tons! uid gid place 142 dps Springfield 142 dps Australia 456 abc Springfield 456 abc Morocco 456 gov Springfield 456 gov Morocco

28 Multivalued dependencies A multivalued dependency (MVD) has the form X Y, where X and Y are sets of attributes in a relation R X Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two rows that are also in R X Y Z a b 5 c 5 a b 7 c 7 a b 7 c 5 a b 5 c 7

29 MVD examples User (uid, gid, place) uid gid uid place Intuition: given uid, gid and place are independent uid, gid place Trivial: LHS RHS = all attributes of R uid, gid uid Trivial: LHS RHS

30 Complete MVD + FD rules FD reflexivity, augmentation, and transitivity MVD complementation: If X Y, then X attrs R X Y MVD augmentation: If X Y and V W, then XW YV MVD transitivity: If X Y and Y Z, then X Z Y Replication (FD is MVD): If X Y, then X Y Coalescence: If X Y and Z Y and there is some W disjoint from Y such that W Z, then X Z Try proving things using these!?

31 An elegant solution: chase Given a set of FD s and MVD s D, does another dependency d (FD or MVD) follow from D? Procedure Start with the premise of d, and treat them as seed tuples in a relation Apply the given dependencies in D repeatedly If we apply an FD, we infer equality of two symbols If we apply an MVD, we infer more tuples If we infer the conclusion of d, we have a proof Otherwise, if nothing more can be inferred, we have a counterexample

32 Proof by chase In R A, B, C, D, does A B and B C imply that A C? Have: A B C D a b 5 c 5 d 5 a b 7 c 7 d 7 Need: A B C D a b 5 c 7 d 5 a b 7 c 5 d 7 A A A B B C B C a b 7 c 5 d 5 a b 5 c 7 d 7 a b 7 c 5 d 7 a b 7 c 7 d 5 a b 5 c 7 d 5 a b 5 c 5 d 7

33 Another proof by chase In R A, B, C, D, does A B and B C imply that A C? Have: A B C D a b 5 c 5 d 5 a b 7 c 7 d 7 A B b 5 = b 7 B C c 5 = c 7 Need: c 5 = c 7 A In general, with both MVD s and FD s, chase can generate both new tuples and new equalities

34 Counterexample by chase In R A, B, C, D, does A BC and CD B imply that A B? Have: A B C D a b 5 c 5 d 5 a b 7 c 7 d 7 Need: b 5 = b 7 D A BC a b 7 c 7 d 5 a b 5 c 5 d 7 Counterexample!

35 4NF A relation R is in Fourth Normal Form (4NF) if For every non-trivial MVD X Y in R, X is a superkey That is, all FD s and MVD s follow from key other attributes (i.e., no MVD s and no FD s besides key functional dependencies) 4NF is stronger than BCNF Because every FD is also a MVD

36 4NF decomposition algorithm Find a 4NF violation A non-trivial MVD X Y in R where X is not a superkey Decompose R into R 5 and R 7, where R 5 has attributes X Y R 7 has attributes X Z (where Z contains R attributes not in X or Y) Repeat until all relations are in 4NF Almost identical to BCNF decomposition algorithm Any decomposition on a 4NF violation is lossless

37 4NF decomposition example uid gid place 142 dps Springfield 142 dps Australia User (uid, gid, place) 4NF violation: uid gid 456 abc Springfield 456 abc Morocco 456 gov Springfield 456 gov Morocco Member (uid, gid) 4NF uid gid 142 dps 456 abc 456 gov Visited (uid, place) uid place 4NF 142 Springfield 142 Australia 456 Springfield 456 Morocco

38 Summary Philosophy behind BCNF, 4NF: Data should depend on the key, the whole key, and nothing but the key! You could have multiple keys though Other normal forms 3NF: More relaxed than BCNF; will not remove redundancy if doing so makes FDs harder to enforce 2NF: Slightly more relaxed than 3NF 1NF: All column values must be atomic