The Relational Data Model

Similar documents
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

CS411 Database Systems. 05: Relational Schema Design Ch , except and

CSE 562 Database Systems

Lectures 5 & 6. Lectures 6: Design Theory Part II

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

Introduction to Database Systems CSE 544. Lecture #2 January 16, 2007

FUNCTIONAL DEPENDENCIES

Introduction to Data Management CSE 344

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Lectures 12: Design Theory I. 1. Normal forms & functional dependencies 2/19/2018. Today s Lecture. What you will learn about in this section

CSE 344 AUGUST 1 ST ENTITIES

CS 564 Midterm Review

BCNF. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong BCNF

CSE 544 Principles of Database Management Systems

Design Theory for Relational Databases

CSE 544 Principles of Database Management Systems

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Overview - detailed. Goal. Faloutsos & Pavlo CMU SCS /615

CMU SCS CMU SCS CMU SCS CMU SCS whole nothing but

Database Design Theory and Normalization. CS 377: Database Systems

In This Lecture. Normalisation to BCNF. Lossless decomposition. Normalisation so Far. Relational algebra reminder: product

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

Functional Dependencies CS 1270

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

Review: Attribute closure

Fundamentals of Database Systems

Lecture 11 - Chapter 8 Relational Database Design Part 1

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

Introduction to Databases CSE 414. Lecture 2: Data Models

CSC 261/461 Database Systems Lecture 14. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Lecture 4. Database design IV. INDs and 4NF Design wrapup

Functional Dependencies and Finding a Minimal Cover

COSC Dr. Ramon Lawrence. Emp Relation

CSIT5300: Advanced Database Systems

Functional dependency theory

CSE 444 Midterm Test

Database design III. Quiz time! Using FDs to detect anomalies. Decomposition. Decomposition. Boyce-Codd Normal Form 11/4/16

Lecture 2: Introduction to SQL

Schema Refinement: Dependencies and Normal Forms

The Entity-Relationship Model (ER Model) - Part 2

Introduction to SQL Part 1 By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Schema Refinement and Normal Forms

Chapter 8: Relational Database Design

Database Systems. Basics of the Relational Data Model

Unit 3 : Relational Database Design

DS Introduction to SQL Part 1 Single-Table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Schema Refinement: Dependencies and Normal Forms

Part II: Using FD Theory to do Database Design

Administrivia. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Relational Model: Keys. Correction: Implied FDs

Lecture 9: Database Design. Wednesday, February 18, 2015

Introduction to Database Systems. The Relational Data Model

Introduction to Database Systems. The Relational Data Model. Werner Nutt

Schema Refinement: Dependencies and Normal Forms

Databases The theory of relational database design Lectures for m

UNIT 3 DATABASE DESIGN

customer = (customer_id, _ customer_name, customer_street,

CS 338 Functional Dependencies

Normal Forms. Winter Lecture 19

Relational Design 1 / 34

Desired properties of decompositions

Introduction to Database Systems. Announcements CSE 444. Review: Closure, Key, Superkey. Decomposition: Schema Design using FD

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Functional Dependencies and Normalization

5 Normalization:Quality of relational designs

SCHEMA REFINEMENT AND NORMAL FORMS

Normalization 03. CSE3421 notes

Database Management

Relational Database Design (II)

Introduction to Databases, Fall 2003 IT University of Copenhagen. Lecture 4: Normalization. September 16, Lecturer: Rasmus Pagh

CSE 344 JANUARY 5 TH INTRO TO THE RELATIONAL DATABASE

Database Management System

Relational Database Design Theory. Introduction to Databases CompSci 316 Fall 2017

INF 212 ANALYSIS OF PROG. LANGS SQL AND SPREADSHEETS. Instructors: Crista Lopes Copyright Instructors.

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Informal Design Guidelines for Relational Databases

Relational Design: Characteristics of Well-designed DB

Databases Lecture 7. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

Concepts from

Entity/Relationship Modelling

Relational Design Theory

DS Introduction to SQL Part 2 Multi-table Queries. By Michael Hahsler based on slides for CS145 Introduction to Databases (Stanford)

Chapter 7: Relational Database Design

CS 2451 Database Systems: Database and Schema Design

Normalization. Anomalies Functional Dependencies Closures Key Computation Projecting Relations BCNF Reconstructing Information Other Normal Forms

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

The Relational Data Model. Functional Dependencies. Example. Functional Dependencies

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Introduction to Data Management CSE 344

V. Database Design CS448/ How to obtain a good relational database schema

Sample Exam for CSE 480 (2017) KEY

Introduction to Data Management CSE 344

CSCI 403: Databases 13 - Functional Dependencies and Normalization

Database Design Principles

CSE 344 MAY 11 TH ENTITIES

LOGICAL DATABASE DESIGN Part #2/2

Database Management System 15

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Draw A Relational Schema And Diagram The Functional Dependencies In The Relation >>>CLICK HERE<<<

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

Transcription:

The Relational Data Model Lecture 6 1 Outline Relational Data Model Functional Dependencies Logical Schema Design Reading Chapter 8 2 1

The Relational Data Model Data Modeling Relational Schema Physical storage Have seen this in SQL E/R diagrams Have seen this too Tables: column names: attributes rows: tuples Discuss next Complex file organization and index structures. 3 Terminology Table name or relation name Attribute names Products: Name Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi Tuples or rows or records 4 2

Schemas Relational Schema: Relation name plus attribute names E.g. Product(Name, Price, Category, Manufacturer) In practice we add the domain for each attribute Database Schema Set of relational schemas E.g. Product(Name, Price, Category, Manufacturer), Company(Name, Address, Phone),....... 5 Instances Relational schema = R(A 1,,A k ) Instance = relation with k attributes (of type R) values of corresponding domains Database schema = R 1 ( ), R 2 ( ),, R n ( ) Instance = n relations, of types R 1, R 2,..., R n This is all mathematics, not to be confused with SQL tables! (What's a difference?) 6 3

Example Relational schema:product(name, Price, Category, Manufacturer) Instance: Name Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi 7 First Normal Form (1NF) A database schema is in First Normal Form if all tables are flat Student Student Name Alice Bob Carol GPA 3.8 3.7 3.9 Courses Math DB OS DB OS Math OS May need to add keys Name Alice Bob Carol Takes Student Alice Carol Alice Bob Alice Carol GPA DB OS OS 3.8 3.7 3.9 Course Math Math DB Course Course Math DB OS 8 4

Functional Dependencies A form of constraint hence, part of the schema Finding them is part of the database design Also used in normalizing the relations Warning: this is the most abstract, and hardest part of the course. 9 Definition: Functional Dependencies If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes B 1, B 2,, B m Formally: A 1, A 2,, A n B 1, B 2,, B m 10 5

Examples EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID Name, Phone, Position Position Phone but Phone Position 11 In General To check A B, erase all other columns A X1 X2 B Y1 Y2 check if the remaining relation is many-one (called functional in mathematics) 12 6

Example EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position Phone 13 Typical Examples of FDs Product: name price, manufacturer Person: ssn name, age Company: name stockprice, president 14 7

Example Product(name, category, color, department, price) Consider these FDs: name color category department color, category price What do they say? 15 Example FD s are constraints on relations: On some instances they hold On others they don t name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Does this instance satisfy all the FDs? 16 8

Example name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 What about this one? 17 Example If some FDs are satisfied, then others are satisfied too If all these FDs are true: name color category department color, category price Then this FD also holds: name, category price Why?? 18 9

Inference Rules for FD s A 1, A 2,, A n B 1, B 2,, B m Is equivalent to Splitting rule and Combining rule A 1, A 2,, A n B 1 A 1, A 2,, A n B 2..... A 1, A 2,, A n B m A1... Am B1... Bm 19 Inference Rules for FD s (continued) A 1, A 2,, A n A i Trivial Rule where i = 1, 2,..., n A 1 A m Why? 20 10

Inference Rules for FD s (continued) Transitive Closure Rule If A 1, A 2,, A n B 1, B 2,, B m and B 1, B 2,, B m C 1, C 2,, C p then A 1, A 2,, A n C 1, C 2,, C p Why? 21 A 1 A m B 1 B m C 1... C p 22 11

Example (continued) Start from the following FDs: Infer the following FDs: Inferred FD 4. name, category name 5. name, category color 6. name, category category 7. name, category color, category 8. name, category price 1. name color 2. category department 3. color, category price Which Rule did we apply? 23 Example (continued) Answers: 1. name color 2. category department 3. color, category price Inferred FD 4. name, category name 5. name, category color 6. name, category category 7. name, category color, category 8. name, category price Which Rule did we apply? Trivial rule Transitivity on 4, 1 Trivial rule Split/combine on 5, 6 Transitivity on 3, 7 24 12

Another Example Enrollment(student, major, course, room, time) student major major, course room course time What else can we infer? 25 Another Rule Augmentation If A 1, A 2,, A n B then A 1, A 2,, A n, C 1, C 2,, C p B Augmentation follows from trivial rules and transitivity How? 26 13

Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed? Try all possible FDs, apply all 3 rules E.g. R(A, B, C, D): how many FDs are possible? Drop trivial FDs, drop augmented FDs Still way too many Better: use the Closure Algorithm (next) 27 Closure of a set of Attributes Given a set of attributes A 1,, A n The closure, {A 1,, A n } +, is the set of attributes B s.t. A 1,, A n B Example: name color category department color, category price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color} 28 14

Closure Algorithm Start with X={A1,, An}. Example: Repeat until X doesn t change do: if B 1,, B n C is a FD and B 1,, B n are all in X name color category department color, category price then add C to X. {name, category} + = {name, category, color, department, price} 29 Example R(A,B,C,D,E,F) A, B C A, D E B D A, F B Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } 30 15

Using Closure to Infer ALL FDs Example: A, B C A, D B B D Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD s X Y, s.t. Y X + and X Y = : AB CD, AD BC, ABC D, ABD C, ACD B 31 Problem: Finding FDs Approach 1: During Database Design Designer derives them from real-world knowledge of users Problem: knowledge might not be available Approach 2: From a Database Instance Analyze given database instance and find all FD s satisfied by that instance Useful if designers don t get enough information from users Problem: FDs might be artifical for the given instance 32 16

Find All FDs Student Dept Course Room Alice CSE C++ 020 Bob Alice Carol CSE EE CSE C++ HW DB 020 040 045 Do all FDs make sense in practice? Dan CSE Java 050 Elsa CSE DB 045 Frank EE Circuits 020 33 Answer Course Dept, Room Dept, Room Course Student, Dept Course, Room Student, Course Dept, Room Student, Room Dept, Course Do all FDs make sense in practice? 34 17

Keys A key is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n B A minimal key is a set of attributes which is a key and for which no subset is a key Note: book calls them superkey and key 35 Computing Keys Compute X + for all sets X If X + = all attributes, then X is a key List only the minimal keys Note: there can be many minimal keys! Example: R(A,B,C), AB C, BC A Minimal keys: AB and BC 36 18

Examples of Keys Product(name, price, category, color) name, category price category color Keys are: {name, category} and all supersets Enrollment(student, address, course, room, time) student address room, time course student, course room, time 37 Relational Schema Design (or Logical Schema Design) Main idea: Start with some relational schema Find out its FD s Use them to design a better relational schema 38 19

Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Update anomalies: need to change in several places Delete anomalies: may lose data when we don t want 39 Relational Schema Design Example: Persons with several phones Name Fred SSN 123-45-6789 PhoneNumber 206-555-1234 City Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield SSN Name, City but not SSN PhoneNumber Anomalies: Redundancy = repeat data Update anomalies = Fred moves to Bellevue Deletion anomalies = Joe deletes his phone number: what is his city? 40 20

Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name Fred Joe SSN 123-45-6789 987-65-4321 City Seattle Westfield SSN Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone number (how?) 123-45-6789 123-45-6789 987-65-4321 PhoneNumber 206-555-1234 206-555-6543 908-555-2121 41 Relational Schema Design Conceptual Model: name Product buys Person price name ssn Relational Model: plus FD s Normalization: Eliminates anomalies 42 21

Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p 43 Decomposition Sometimes it is correct: Name Gizmo OneClick Gizmo Price 19.99 24.99 19.99 Category Gadget Camera Camera Name Price Name Category Gizmo 19.99 Gizmo Gadget OneClick 24.99 OneClick Camera Gizmo 19.99 Gizmo Camera Lossless decomposition 44 22

Incorrect Decomposition Sometimes it is not: Name Price Category Gizmo OneClick Gizmo 19.99 24.99 19.99 Gadget Camera Camera What s incorrect?? Name Gizmo OneClick Gizmo Category Gadget Camera Camera Lossy decomposition Price 19.99 24.99 19.99 Category Gadget Camera Camera 45 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) If A 1,..., A n B 1,..., B m Then the decomposition is lossless Note: don t need necessarily A 1,..., A n C 1,..., C p Example: name price, hence the first decomposition is lossless 46 23

Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others... 47 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: A relation R is in BCNF if: If A 1,..., A n B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R. 48 24

BCNF Decomposition Algorithm Repeat choose A 1,, A m B 1,, B n that violates the BNCF condition split R into R 1 (A 1,, A m, B 1,, B n ) and R 2 (A 1,, A m, [others]) continue with both R 1 and R 2 Until no more violations B s A s Others Is there a 2-attribute relation that is not in BCNF? R 1 R 2 49 Example Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield What are the dependencies? SSN Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF? 50 25

Decompose it into BCNF Name Fred SSN 123-45-6789 City Seattle SSN Name, City Joe 987-65-4321 Westfield SSN 123-45-6789 123-45-6789 987-65-4321 987-65-4321 PhoneNumber 206-555-1234 206-555-6543 908-555-2121 908-555-1234 Let s check anomalies: Redundancy? Update? Delete? 51 Summary of BCNF Decomposition Find a dependency that violates the BCNF condition: Heuristics: choose B, B, B as large as possible 1 2 m Decompose: A 1, A 2,, A n B 1, B 2,, B m Others R1 A s B s R2 Continue until there are no BCNF violations left. 2-attribute relations are BCNF 52 26

Example Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN name, age age haircolor Decompose in BCNF (in class): Step 1: find all keys (How? Compute S +, for various sets S) Step 2: now decompose 53 Other Example R(A,B,C,D) A B, B C Key: AD Violations of BCNF: A B, A C, A BC Pick A BC: split into R1(A,BC) R2(A,D) What happens if we pick A B first? 54 27

Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) Decompose R1(A,B) R2(A,C) Recover R (A,B,C) should be the same as R(A,B,C) R is in general larger than R. Must ensure R = R 55 Lossless Decompositions Given R(A,B,C) s.t. A B, the decomposition into R1(A,B), R2(A,C) is lossless 56 28

3NF: A Problem with BCNF Unit Company Product FD s: Unit Company; Company, Product Unit So, there is a BCNF violation, and we decompose. Unit Company Unit Company Unit Product No FDs Notice: we loose the FD: Company, Product Unit 57 So What s the Problem? Unit Company Unit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD s are satisfied. Let s put all the data back into a single table again (anomalies?): Unit Company Product Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit! 58 29

Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies 59 30