CSE 562 Database Systems

Database Design

Some slides are based on or modified from originals by Magdalena Balazinska.
UB CSE 562

Goal

Question: The relational model is great, but how do I go about designing my database schema?

Outline

- Conceptual DB Design: Entity/Relationship Model
- Problematic Database Designs
- Functional Dependencies
- Normal Forms and Schema Normalization

Database Design Process

[Figure: the database design pipeline. Labels: Data Modeling, Refinement, SQL Tables, Files; E/R Diagrams, Relations, Conceptual Schema, Physical Schema.]

Conceptual Schema Design

- Conceptual Model: E/R diagram (Patient, patient_of, Doctor)
- Relational Model + Functional Dependencies (FDs)
- Normalization: Eliminates Anomalies

Entity/Relationship Diagrams

- Attributes (e.g., name)
- Entity Sets (e.g., Patient)
- Relationship Sets (e.g., patient_of)

Example E/R Diagram

[Figure: entity set Patient (pno, name, zip) connected through relationship set patient_of (since) to entity set Doctor (dno, name, specialty).]

Resulting Relations

One way to translate the diagram into relations:

  PatientOf (pno, name, zip, dno, since)
  Doctor (dno, dname, specialty)

Entity/Relationship Model

- Typically, each entity has a key
- E/R relationships can include multiplicity
  - One-to-one, one-to-many, etc.
  - Indicated with arrows
- Can model multi-way relationships
- Can model subclasses
- And more...

Example with Inheritance

[Figure: entity set Person (id, name) with subclasses Employee (dept) and Customer (credit_score, billing_addr).]

Example from Phil Bernstein's SIGMOD '07 keynote talk.

Converting Into Relations

One way to translate our E/R diagram into relations:

  HR (id, name)
  Empl (id, dept)    id is also a foreign key referencing HR
  Client (id, name, credit_score, billing_addr)

- Today, we only talk about using E/R diagrams to help us design the conceptual schema of a database
- In general, apps may need to operate on a view of the data closer to the E/R model (e.g., an OO view of data) while the DB contains relations
  - Need to translate between objects and relations
  - Object-Relational Mapping (ORM)
  - Hibernate, Microsoft ADO.NET Entity Framework, etc.

Outline

- Conceptual DB Design: Entity/Relationship Model
- Problematic Database Designs
- Functional Dependencies
- Normal Forms and Schema Normalization

Problematic Designs

- Some DB designs lead to redundancy
  - Same information stored multiple times
- Problems:
  - Redundant Storage
  - Update Anomalies
  - Insertion Anomalies
  - Deletion Anomalies

Problem Examples

PatientOf

  pno  name  zip    dno  since
  1    p1    98125  2    2000
  1    p1    98125  3    2003
  2    p2    98112  1    2002
  3    p1    98143  1    1985

- Redundant: the name and zip of patient 1 are stored twice
- If we update one zip to 98119, we get inconsistency
- What if we want to insert a patient without a doctor?
- What if we want to delete the last doctor for a patient?
  - Illegal: as (pno, dno) is the primary key, it cannot have nulls

Solution: Decomposition

Patient

  pno  name  zip
  1    p1    98125
  2    p2    98112
  3    p1    98143

PatientOf

  pno  dno  since
  1    2    2000
  1    3    2003
  2    1    2002
  3    1    1985

Decomposition solves the problem, but we need to be careful.

Lossy Decomposition

Patient

  pno  name  zip
  1    p1    98125
  2    p2    98112
  3    p1    98143

PatientOf

  name  dno  since
  p1    2    2000
  p1    3    2003
  p2    1    2002
  p1    1    1985

Decomposition can cause us to lose information!
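The difference between the two decompositions above can be checked by joining the pieces back together. The following is a minimal sketch in plain Python (sets of tuples stand in for relations; the helper name `project` is an illustrative assumption, not part of the slides). Joining on pno reproduces the original PatientOf exactly, while joining on name produces extra rows that were never in the original.

```python
# Rows of the original PatientOf relation: (pno, name, zip, dno, since).
original = {
    (1, "p1", "98125", 2, 2000),
    (1, "p1", "98125", 3, 2003),
    (2, "p2", "98112", 1, 2002),
    (3, "p1", "98143", 1, 1985),
}

def project(rows, positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in rows}

# Patient(pno, name, zip) is shared by both decompositions.
patient = project(original, (0, 1, 2))

# Good decomposition: PatientOf(pno, dno, since), joined back on pno.
patient_of = project(original, (0, 3, 4))
good_join = {
    (pno, name, zip_code, dno, since)
    for (pno, name, zip_code) in patient
    for (pno2, dno, since) in patient_of
    if pno == pno2
}

# Lossy decomposition: PatientOf(name, dno, since), joined back on name.
patient_of_by_name = project(original, (1, 3, 4))
lossy_join = {
    (pno, name, zip_code, dno, since)
    for (pno, name, zip_code) in patient
    for (name2, dno, since) in patient_of_by_name
    if name == name2
}

# Rows that appear after the join but were never in the original.
spurious = lossy_join - original
print(good_join == original)   # the pno-based join recovers the original
print(sorted(spurious))        # the name-based join invents rows
```

"Losing information" here means gaining spurious tuples: because two different patients share the name p1, the name-based join can no longer tell which doctor belongs to which patient.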

Schema Refinement Challenges

- How do we know that we should decompose a relation?
  - Functional dependencies
  - Normal forms
- How do we make sure decomposition does not lose any information?
  - Lossless-join decompositions
  - Dependency-preserving decompositions

Outline

- Conceptual DB Design: Entity/Relationship Model
- Problematic Database Designs
- Functional Dependencies
- Normal Forms and Schema Normalization

Functional Dependency

- A functional dependency (FD) is an integrity constraint that generalizes the concept of a key
- An instance of relation $R$ satisfies the FD $X \to Y$ if for every pair of tuples $t_1$ and $t_2$: if $t_1.X = t_2.X$ then $t_1.Y = t_2.Y$, where $X$, $Y$ are two nonempty sets of attributes in $R$
- We say that $X$ determines $Y$
- FDs come from domain knowledge

FD Illustration

The FD $A_1, \dots, A_m \to B_1, \dots, B_n$ holds in $R$ if:

$$\forall t, t' \in R:\ (t.A_1 = t'.A_1 \land \dots \land t.A_m = t'.A_m) \Rightarrow (t.B_1 = t'.B_1 \land \dots \land t.B_n = t'.B_n)$$

[Figure: relation R with columns $A_1 \dots A_m$ and $B_1 \dots B_n$ and two rows t and t'. If t, t' agree on the A columns, then t, t' agree on the B columns.]

FD Example

An FD holds, or does not hold, on an instance:

  EmpID  Name   Phone  Position
  E0045  Smith  1234   Clerk
  E3542  Mike   9876   Salesrep
  E1111  Smith  9876   Salesrep
  E9999  Mary   1234   Lawyer

- EmpID → Name, Phone, Position
- Position → Phone
- but not Phone → Position

FD Terminology

- FDs are constraints
  - On some instances they hold
  - On others they do not
- If for every instance of R a given FD will hold, then we say that R satisfies the FD
  - If we say that R satisfies an FD, we are stating a constraint on R
- FDs come from domain knowledge

An Interesting Observation

If all these FDs are true:

  name → color
  category → department
  color, category → price

Then this FD also holds:

  name, category → price

Why???

How Is This All Useful?

- Anomalies occur when certain bad FDs hold
- We know some of the FDs
- Need to find all FDs
- Then look for the bad ones
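Whether an FD holds on a particular instance, as in the EmpID example above, can be tested mechanically. The sketch below is one simple way to do it in Python, assuming rows are represented as dictionaries; the function name `fd_holds` is illustrative and not from the slides. It remembers the right-hand-side values seen for each left-hand-side value and reports a failure as soon as two tuples agree on the left but differ on the right.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the instance `rows` (a list of dicts) satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores `right` the first time `left` is seen and
        # returns the stored value on every later occurrence.
        if seen.setdefault(left, right) != right:
            return False
    return True

emp = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": 1234, "Position": "Clerk"},
    {"EmpID": "E3542", "Name": "Mike", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": 1234, "Position": "Lawyer"},
]

print(fd_holds(emp, ["EmpID"], ["Name", "Phone", "Position"]))
print(fd_holds(emp, ["Position"], ["Phone"]))
print(fd_holds(emp, ["Phone"], ["Position"]))
```

A check like this only shows that an FD holds on one instance. Declaring that R satisfies the FD is a statement about every instance and has to come from domain knowledge.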

Closure of FDs

- Some FDs imply others
  - For example: Employee(ssn, position, salary)
  - FD1: ssn → position and FD2: position → salary
  - Imply FD3: ssn → salary
- Can compute the closure of a set of FDs
  - Set $F^+$ of all FDs implied by a given set $F$ of FDs
- Armstrong's Axioms: sound and complete
  - Reflexivity: if $Y \subseteq X$ then $X \to Y$
  - Augmentation: if $X \to Y$ then $XZ \to YZ$ for any $Z$
  - Transitivity: if $X \to Y$ and $Y \to Z$ then $X \to Z$
- Convenient split/combine rule:
  - If $X \to Y$ and $X \to Z$ then $X \to YZ$

Example (cont'd)

Starting from these FDs:

  1. name → color
  2. category → department
  3. color, category → price

Infer the following FDs. Which rule did we apply?

  Inferred FD                           Rule
  4. name, category → name
  5. name, category → color
  6. name, category → category
  7. name, category → color, category
  8. name, category → price

Example (cont'd)

Starting from the same FDs 1-3:

  Inferred FD                           Rule
  4. name, category → name              Reflexivity
  5. name, category → color             Transitivity on 4 and 1
  6. name, category → category          Reflexivity
  7. name, category → color, category   Split/Combine on 5 and 6
  8. name, category → price             Transitivity on 7 and 3

TOO HARD! Let's see an easier way.

Closure of a Set of Attributes

Given a set of attributes $A_1, \dots, A_n$, the closure $\{A_1, \dots, A_n\}^+$ is the set of attributes $B$ such that $A_1, \dots, A_n \to B$.

Example:

  category → department
  name → color
  color, category → price

Closures:

  name+ = {name, color}
  {name, category}+ = {name, category, color, department, price}
  color+ = {color}
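The split/combine rule listed with Armstrong's axioms is a convenience rather than a fourth axiom. A short derivation of the combine direction, using only augmentation and transitivity:

```latex
\begin{align*}
X \to Y &\;\Rightarrow\; X \to XY    && \text{(augment with } X \text{, since } XX = X\text{)} \\
X \to Z &\;\Rightarrow\; XY \to YZ   && \text{(augment with } Y\text{)} \\
X \to XY,\ XY \to YZ &\;\Rightarrow\; X \to YZ && \text{(transitivity)}
\end{align*}
```

The split direction is shorter still: from $X \to YZ$ and $YZ \to Y$ (reflexivity), transitivity gives $X \to Y$.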

Closure Algorithm For Attributes

To find the closure $\{A_1, \dots, A_n\}^+$:

  1. Start with $X = \{A_1, \dots, A_n\}$
  2. Repeat until $X$ doesn't change:
  3.   if $B_1, \dots, B_n \to C$ is an FD and $B_1, \dots, B_n$ are all in $X$
  4.   then add $C$ to $X$

Can use this algorithm to find keys:

- Compute $X^+$ for all sets $X$
- If $X^+$ = all attributes, then $X$ is a superkey
- Minimal superkeys are keys

Closure For Attributes Example

Example:

  category → department
  name → color
  color, category → price

Closures:

  name+ = {name, color}
  {name, category}+ = {name, category, color, department, price}
  color+ = {color}

Another Example

R(A, B, C, D, E, F) with FDs:

  A, B → C
  A, D → E
  B → D
  A, F → B

- Compute {A, B}+: X = {A, B, C, D, E}
- Compute {A, F}+: X = {A, F, B, C, D, E}

Using Closure To Infer ALL FDs

Example:

  A, B → C
  A, D → B
  B → D

1. Step 1: Compute $X^+$ for every $X$:

  A+ = A, B+ = BD, C+ = C, D+ = D
  AB+ = ABCD, AC+ = AC, AD+ = ABCD, BC+ = BCD, BD+ = BD, CD+ = CD
  ABC+ = ABD+ = ACD+ = ABCD, BCD+ = BCD
  ABCD+ = ABCD

2. Step 2: Enumerate all FDs $X \to Y$ such that $Y \subseteq X^+$ and $X \cap Y = \emptyset$ (right-hand sides shown at their largest):

  B → D, AB → CD, AD → BC, BC → D, ABC → D, ABD → C, ACD → B
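The closure algorithm and the key-finding recipe above translate almost line for line into code. The following Python sketch assumes single-letter attribute names so that a string such as "AB" can stand for the set {A, B}; the function names `closure` and `candidate_keys` are illustrative. The key search is brute force over all attribute subsets, which is fine for classroom-sized schemas but exponential in general.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # if the whole left side is in X, add the right side to X
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets X whose closure is the full schema."""
    all_attrs = set(all_attrs)
    keys = []
    # Try smaller sets first so that any superset of a key can be skipped.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(key <= x for key in keys):
                continue                # contains a key, so not minimal
            if closure(x, fds) == all_attrs:
                keys.append(x)
    return keys

# R(A, B, C, D, E, F) with the FDs from "Another Example".
fds = [("AB", "C"), ("AD", "E"), ("B", "D"), ("AF", "B")]

print(sorted(closure("AB", fds)))
print(sorted(closure("AF", fds)))
print(candidate_keys("ABCDEF", fds))
```

Since {A, F}+ contains every attribute and neither A+ nor F+ does, {A, F} is a key of R under these FDs.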

Decomposition Problems

- FDs will help us identify possible redundancy
  - Identify redundancy and split relations to avoid it
- Can we get the data back correctly?
  - Lossless-join decomposition
- Can we recover the FDs on the big table from the FDs on the small tables?
  - Dependency-preserving decomposition
  - So that we can enforce all FDs without performing joins

Outline

- Conceptual DB Design: Entity/Relationship Model
- Problematic Database Designs
- Functional Dependencies
- Normal Forms and Schema Normalization

Normal Forms

- Based on Functional Dependencies
  - 3rd Normal Form
  - Boyce-Codd Normal Form (BCNF)
- Based on Multi-valued Dependencies
  - 4th Normal Form
- Based on Join Dependencies
  - 5th Normal Form

BCNF

A simple condition for removing anomalies from relations:

- A relation R is in BCNF if: whenever $A_1, \dots, A_n \to B$ is a non-trivial dependency in R, then $\{A_1, \dots, A_n\}$ is a superkey for R
- BCNF ensures that no redundancy can be detected using FD information alone
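The BCNF condition above can be tested with attribute closures: a left-hand side is a superkey exactly when its closure contains every attribute of the relation. The sketch below applies this to the Employee(ssn, position, salary) example from the closure slides. The function names are illustrative, and the check looks only at the FDs that are given, which is adequate for a relation together with its full FD set but is an assumption that does not carry over unchanged to the pieces of a decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the given non-trivial FDs whose left side is not a superkey."""
    attrs = set(attrs)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                        # trivial FD, never a violation
        if not attrs <= closure(lhs, fds):  # left side is not a superkey
            bad.append((lhs, rhs))
    return bad

employee = ["ssn", "position", "salary"]
employee_fds = [
    (["ssn"], ["position"]),      # FD1
    (["position"], ["salary"]),   # FD2
]

print(bcnf_violations(employee, employee_fds))
```

Here ssn is a key, so FD1 is harmless, but position is not a superkey, so FD2 is the "bad" FD that makes Employee violate BCNF.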

Example

PatientOf

  pno  name  zip    dno  since
  1    p1    98125  2    2000
  1    p1    98125  3    2003
  2    p2    98112  1    2002
  3    p1    98143  1    1985

- {pno, dno} is a key, but pno → name, zip
- BCNF violation, so we decompose

Decomposition in General

$R(A_1, \dots, A_n, B_1, \dots, B_m, C_1, \dots, C_p)$ is decomposed into $R_1(A_1, \dots, A_n, B_1, \dots, B_m)$ and $R_2(A_1, \dots, A_n, C_1, \dots, C_p)$, where:

- $R_1$ = projection of $R$ on $A_1, \dots, A_n, B_1, \dots, B_m$
- $R_2$ = projection of $R$ on $A_1, \dots, A_n, C_1, \dots, C_p$

Theorem: If $A_1, \dots, A_n \to B_1, \dots, B_m$, then the decomposition is lossless.

Note: we don't necessarily need $A_1, \dots, A_n \to C_1, \dots, C_p$.

BCNF Decomposition Algorithm

  Repeat
    choose $A_1, \dots, A_m \to B_1, \dots, B_n$ that violates the BCNF condition
    split R into $R_1(A_1, \dots, A_m, B_1, \dots, B_n)$ and $R_2(A_1, \dots, A_m, [\text{rest}])$
    continue with both $R_1$ and $R_2$
  Until no more violations

Lossless-join decomposition: attributes common to $R_1$ and $R_2$ must contain a key for either $R_1$ or $R_2$.

BCNF and Dependencies

Relation with attributes (Unit, Company, Product).

FDs:

  Unit → Company
  Company, Product → Unit

So there is a BCNF violation, and we decompose.
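The decomposition loop above can be sketched in Python. This version is a common closure-based variant rather than a literal transcription of the slide: instead of picking one violating FD, it searches for an attribute set X whose closure within the relation is neither X itself nor the whole relation, then splits into X+ and X plus the rest. Function names are illustrative, and the subset search is exponential, so this is meant for small examples only. The FD pno, dno → since for PatientOf is an assumption consistent with {pno, dno} being the key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some X violating BCNF, else None."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel
            # Non-trivial (X+ is bigger than X) and X is not a superkey of rel.
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Recursively split `rel` until no piece has a BCNF violation."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus                  # X together with everything it determines
    r2 = x | (rel - x_plus)      # X together with the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

ucp_fds = [(["Unit"], ["Company"]), (["Company", "Product"], ["Unit"])]
print(bcnf_decompose(["Unit", "Company", "Product"], ucp_fds))

patient_fds = [(["pno"], ["name", "zip"]), (["pno", "dno"], ["since"])]
print(bcnf_decompose(["pno", "name", "zip", "dno", "since"], patient_fds))
```

Each split keeps X in both pieces and X determines all of the first piece, so by the theorem above every step is a lossless-join decomposition.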

BCNF and Dependencies

Relation with attributes (Unit, Company, Product).

FDs:

  Unit → Company
  Company, Product → Unit

So there is a BCNF violation, and we decompose into:

- (Unit, Company), with FD Unit → Company
- (Unit, Product), with no FDs

In BCNF we lose the FD: Company, Product → Unit

3NF

A simple condition for removing anomalies from relations:

- A relation R is in 3rd normal form if: whenever there is a nontrivial dependency $A_1, A_2, \dots, A_n \to B$ for R, then $\{A_1, A_2, \dots, A_n\}$ is a superkey for R, or B is part of a key

3NF Discussion

- 3NF decomposition vs. BCNF decomposition:
  - Use the same decomposition steps, for a while
  - 3NF may stop decomposing, while BCNF continues
- Tradeoffs
  - BCNF = no anomalies, but may lose some FDs
  - 3NF = keeps all FDs, but may have some anomalies

Summary

- Database design is not trivial
  - Use E/R models
  - Translate E/R models into relations
  - Normalize to eliminate anomalies
- Normalization tradeoffs
  - BCNF: no anomalies, but may lose some FDs
  - 3NF: keeps all FDs, but may have anomalies
  - Too many small tables affect performance
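The 3NF condition stated above differs from BCNF only in its escape clause: a right-hand-side attribute that is part of some key is tolerated. The Python sketch below checks this for the (Unit, Company, Product) example. It is an illustration under stated assumptions: function names are hypothetical, keys are found by brute force, and only the given FDs are inspected.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = frozenset(combo)
            if any(key <= x for key in keys):
                continue                    # contains a key, so not minimal
            if attrs <= closure(x, fds):
                keys.append(x)
    return keys

def is_3nf(attrs, fds):
    """Check each given FD: left side is a superkey, or new attributes are in a key."""
    attrs = frozenset(attrs)
    prime = set().union(*candidate_keys(attrs, fds))  # attributes in some key
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if attrs <= closure(lhs, fds):
            continue                        # left side is a superkey
        if not (rhs - lhs) <= prime:
            return False                    # determines a non-key attribute
    return True

ucp = ["Unit", "Company", "Product"]
ucp_fds = [(["Unit"], ["Company"]), (["Company", "Product"], ["Unit"])]
print(candidate_keys(ucp, ucp_fds))
print(is_3nf(ucp, ucp_fds))
```

Both {Company, Product} and {Unit, Product} are keys here, so Company is part of a key and Unit → Company is acceptable in 3NF even though it violates BCNF. This is the case where 3NF stops decomposing and keeps the FD Company, Product → Unit enforceable on a single table.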

This Time

- Design Theory for Relational Databases: Chapter 3, Sections 3.1-3.5
- High-Level Database Models: Chapter 4, Sections 4.1-4.6