Relational Design 1 / 34

Similar documents
Dr. Anis Koubaa. Advanced Databases SE487. Prince Sultan University

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Chapter 14 Outline. Normalization for Relational Databases: Outline. Chapter 14: Basics of Functional Dependencies and

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Relational Database design. Slides By: Shree Jaswal

Informal Design Guidelines for Relational Databases

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

UNIT 3 DATABASE DESIGN

COSC Dr. Ramon Lawrence. Emp Relation

Database Design Theory and Normalization. CS 377: Database Systems

In Chapters 3 through 6, we presented various aspects

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

FUNCTIONAL DEPENDENCIES CHAPTER , 15.5 (6/E) CHAPTER , 10.5 (5/E)

We shall represent a relation as a table with columns and rows. Each column of the table has a name, or attribute. Each row is called a tuple.

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

The strategy for achieving a good design is to decompose a badly designed relation appropriately.

CSE 562 Database Systems

MODULE: 3 FUNCTIONAL DEPENDENCIES

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Overview - detailed. Goal. Faloutsos & Pavlo CMU SCS /615

NORMALISATION (Relational Database Schema Design Revisited)

CMU SCS CMU SCS CMU SCS CMU SCS whole nothing but

Part II: Using FD Theory to do Database Design

TDDD12 Databasteknik Föreläsning 4: Normalisering

CS 2451 Database Systems: Database and Schema Design

Lecture 11 - Chapter 8 Relational Database Design Part 1

CS 338 Functional Dependencies

DC62 DATABASE MANAGEMENT SYSTEMS DEC 2013

The Relational Data Model

Mapping ER Diagrams to. Relations (Cont d) Mapping ER Diagrams to. Exercise. Relations. Mapping ER Diagrams to Relations (Cont d) Exercise

Functional Dependencies and Normalization

Lectures 12: Design Theory I. 1. Normal forms & functional dependencies 2/19/2018. Today s Lecture. What you will learn about in this section

FUNCTIONAL DEPENDENCIES

Normal Forms. Winter Lecture 19

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Database design III. Quiz time! Using FDs to detect anomalies. Decomposition. Decomposition. Boyce-Codd Normal Form 11/4/16

Schema Refinement: Dependencies and Normal Forms

Relational Database Design (II)

Lectures 5 & 6. Lectures 6: Design Theory Part II

CSIT5300: Advanced Database Systems

Schema Refinement: Dependencies and Normal Forms

Schema Refinement: Dependencies and Normal Forms

Lecture 4. Database design IV. INDs and 4NF Design wrapup

DATABASE DESIGN I - 1DL300

SVY227: Databases for GIS

Relational Design: Characteristics of Well-designed DB

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

CSCI 403: Databases 13 - Functional Dependencies and Normalization

CS411 Database Systems. 05: Relational Schema Design Ch , except and

V. Database Design CS448/ How to obtain a good relational database schema

DATABASTEKNIK - 1DL116

customer = (customer_id, _ customer_name, customer_street,

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, Normal Forms. University of Edinburgh - January 30 th, 2017

Schema Normalization. 30 th August Submitted By: Saurabh Singla Rahul Bhatnagar

Design Theory for Relational Databases

Unit IV. S_id S_Name S_Address Subject_opted

Concepts from

Steps in normalisation. Steps in normalisation 7/15/2014

In This Lecture. Normalisation to BCNF. Lossless decomposition. Normalisation so Far. Relational algebra reminder: product

Applied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016

Normalization Normalization

Chapter 8: Relational Database Design

Chapter 7: Relational Database Design

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

Chapter 6: Relational Database Design

CMP-3440 Database Systems

Database design process

Normalization. Anomalies Functional Dependencies Closures Key Computation Projecting Relations BCNF Reconstructing Information Other Normal Forms

Database Normalization. (Olav Dæhli 2018)

More Relational Algebra

QUESTION BANK. SUBJECT CODE / Name: CS2255 DATABASE MANAGEMENT SYSTEM UNIT III. PART -A (2 Marks)

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

Theory of Normal Forms Decomposition of Relations. Overview

Database Technology. Topic 2: Relational Databases and SQL. Olaf Hartig.

Fundamentals of Database Systems

Lecture 5 Design Theory and Normalization

More Normalization Algorithms. CS157A Chris Pollett Nov. 28, 2005.

COSC344 Database Theory and Applications. σ a= c (P) S. Lecture 4 Relational algebra. π A, P X Q. COSC344 Lecture 4 1

Normalisation Chapter2 Contents

Unit 3 : Relational Database Design

Guideline 1: Semantic of the relation attributes Do not mix attributes from distinct real world. example

Functional dependency theory

The Relational Model and Normalization

Database Management System

Relational Model. CS 377: Database Systems

Final Review. Zaki Malik November 20, 2008

Schema Refinement and Normal Forms

Databases The theory of relational database design Lectures for m

Some different database system architectures. (a) Shared nothing architecture.

IS 263 Database Concepts

ECE 650 Systems Programming & Engineering. Spring 2018

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

Combining schemas. Problems: redundancy, hard to update, possible NULLs

Transcription:

Relational Design 1 / 34

Relational Design Basic design approaches. What makes a good design better than a bad design? How do we tell we have a "good" design? How to we go about creating a good design? 2 / 34

Basic Design Approaches Bottom-up, a.k.a. synthethis Start with individual attributes and large set of binary relationships among attributes Unpopular Top-down, a.k.a. analysis Start with groupings of attributes, e.g., a schema derived from ER-relational mapping Decompose until design properties are met 3 / 34

Relational Design Desiderata 1. The sematics of relation schemas and their attributes should be clear 2. There should be little or no redundant information in tuples 3. The should be few or no NULL values in tuples 4. It should be impossible to generate spurious (invalid) tuples 5. It should be easy to join tables 4 / 34

Design Goal 1: Cohesive meaning in relational schemas Not Cohesive: EMP(Ename, Ssn, Bdate, Address, Dno, Dno, Dname, Dmgr_ssn) Attributes of employees and attributes of departments What does a single tuple represent? Cohesive: EMP(Ename, Ssn, Bdate, Address, Dno) DEPT(Dno, Dname, Dmgr_ssn) Each EMPT tuple represents a single employee Each DEPT tuple represents a single department 5 / 34

Design Goal 2: Minimize Redundancy Redundant information in schemas: wastes storage space, and leads to data manipulation anomalies. One way to think of schemas with redundancy: they are joined tables from well-designed schemas. Redundancy leads to data manipulation anomalies... 6 / 34

Redundancy leads to Insertion Anomalies Ssn Ename Bdate Addr Dmanaged Dno Dname 123 Alice 1990 ATL 1 1 Research 124 Bob 1991 BOS NULL 1 Research 125 Cheng 1992 CHS NULL 1 Research 126 Drhuv 1993 DET 2 2 Engineering 127 Earl 1994 EWR NULL 2 Engineering Every time we insert a new employee, we have to repeat the department information. 7 / 34

Redundancy leads to Deletion Anomalies Ssn Ename Bdate Addr Dmanaged Dno Dname 123 Alice 1990 ATL 1 1 Research 124 Bob 1991 BOS NULL 1 Research 125 Cheng 1992 CHS NULL 1 Research 126 Drhuv 1993 DET 2 2 Engineering 127 Earl 1994 EWR NULL 2 Engineering If we delete the last member of a department, we lose the information about the department itself. Does it cease to exist? 8 / 34

Redundancy leads to Update (Modification) Anomalies Ssn Ename Bdate Addr Dmanaged Dno Dname 123 Alice 1990 ATL 1 1 Research 124 Bob 1991 BOS NULL 1 Research 125 Cheng 1992 CHS NULL 1 Research 126 Drhuv 1993 DET 2 2 Engineering 127 Earl 1994 EWR NULL 2 Engineering If we change the name of the Research department to the "Playing with lasers" department, we have to change multiple tuples. 9 / 34

Design Goal 3: Minimize Nulls in Tuples Ssn Ename Bdate Addr Dmanaged Dno Dname 123 Alice 1990 ATL 1 1 Research 124 Bob 1991 BOS NULL 1 Research 125 Cheng 1992 CHS NULL 1 Research 126 Drhuv 1993 DET 2 2 Engineering 127 Earl 1994 EWR NULL 2 Engineering Bad design: Dmanaged has many nulls because most employees aren t managers. 10 / 34

Design Goal 3: Minimize the need for NULL values in tuples Nulls don t have definite meaning - could be absent, N/A, false Aren t used in joins Aren t counted in aggregate functions Waste space We reduce NULLS by normalization using functional dependency theory. 11 / 34

Design Goal 4: Avoid Spurious Tuples Say we have a relation state r(r) = student course instructor Narayan Database Mark Narayan Operating Systems Ammar Smith Database Navathe Smith Operating Systems Ammar Smith Theory Schulman Wallace Database Mark Wallace Operating Systems Ahamad Wong Database Omiecinski Zelaya Database Navathe 12 / 34

Bad Decomposition r(r1) = student Narayan Narayan Smith Smith Smith Wallace Wallace Wong Zelaya instructor Ammar Mark Ammar Navathe Schulman Ahamad Mark Omiecinski Navathe r(r2) = student Narayan Narayan Smith Smith Smith Wallace Wallace Wong Zelaya course Database Operating Systems Database Operating Systems Theory Database Operating Systems Database Database We would join on student and end up with... 13 / 34

Join with Spurious Tuples student course instructor Narayan Database Ammar Narayan Database Mark Narayan Operating Systems Ammar Narayan Operating Systems Mark Smith Database Ammar Smith Database Navathe... and 13 more tuples, which is way more tuples than the original relation due to spurious tuples, so the join is not non-additive. Lost the association between Instructor and Course. E.g., Mark does not teach Operating Systems. 14 / 34

Design Goal 5: Design relation schemas for natural joins Design relation schemas to be naturally joined on attributes that are related by foreign key-primary key relationships. EMP(Ename, Ssn, Bdate, Address, Dno) DEPT(Dno, Dname, Dmgr_ssn) Join on Dno tells us an employee s department Acheived by normalization based on functional dependency theory - foreign keys reference primary keys. 15 / 34

Functional Dependencies A generalization of superkeys. Given a relation schema R, and subsets of attributes X and Y, the functional dependency X Y Means that for any pair of tuples t 1 and t 2 in r(r) if t 1 [X ] = t 2 [X ] then t 1 [Y ] = t 2 [Y ] In other words, whenever the attributes on the left side of a functional dependency are the same for two tuples in the relation, the attributes on the right side of the functional dependency will also be equal. 16 / 34

Relations Satisfy FDs A B C D a 1 b 1 c 1 d 1 a 1 b 2 c 1 d 2 a 2 b 2 c 2 d 2 a 2 b 2 c 2 d 3 a 3 b 3 c 2 d 4 A C is satisfied because no two tuples with the same A value have different C values. C A is not satisfied because t 4 = (a 2, b 3, c 2, d 3 ) and t 5 = (a 3, b 3, c 2, d 4 ) 17 / 34

Satisfying vs. Holding We say that a functional dependency f holds on a relation if it is not legal to create a tuple that does not satisfy f. Alternately, we say that a relation schema (not just a particular state) satisfies a functional dependency. name street city Alice Elm Charlotte Bob Peachtree Atlanta Charlie Elm Charlotte Here street city is satisifed by this relation state. However, we would not say that the functional dependency holds, or that the relation schema satisfies the functional dependency because we know there can be different cities with the same street names. 18 / 34

Trivial Functional Dependencies A functional dependency is trivial if it is satisfied by all relations. Formally, a functional dependency X Y is trivial if Y X For example: A A AB A AB B are trivial. We don t write trivial functional dependencies when we enumerate a set of functional dependencies that hold on a schema for the purposes of normalization or normal form testing. 19 / 34

Normal Forms A normal form is a set of conditions based on functional dependencies that acts as tests for the "goodness" of the design of a relation schema. Normalization is the process of decomposing existing relation schemas into new relation schemas that satisfy normal forms for the purpose of: minimizing redundancy, and minimizing insertion, deletion, and update anomalies We cover first, second, third, and Boyce-Codd normal forms in this class. Each higher normal form subsumes the normal forms below it, e.g., a 3NF schema is also in 2NF and 1NF. The normal form of a relation schema is the highest normal form it satisfies. 20 / 34

First Normal Form (1NF) Every attribute value is atomic, which is effectively guaranteed by most RDBMS systems today. The following relation is not in 1NF: Dname Dnumber Dmgr_ssn Dlocations Research 5 333445555 {Bellaire, Sugarland, Houston} Admin 4 987654321 {Stafford} HQ 1 888665555 {Houston} Because Dlocations values are not atomic. 21 / 34

Fixing Non 1NF Schemas Many ways to fix (see book). Best way is to decompose into two schemas: Dname Dnumber Dmgr_ssn Research 5 333445555 Admin 4 987654321 HQ 1 888665555 Dnumber Dlocation 5 Bellaire 5 Sugarland 5 Houston 4 Stafford 1 Houston 22 / 34

General Definition of 2NF and 3NF Definitions in previous lecture based on primary key. General definitions based on all candidate keys. Remember: An attribute is prime if it is part of a candidate key, otherwise it is nonprime. General definition of 2NF: A relation schema R is in 2NF if every nonprime attribute A in R is fully (not partially) dependent on any key of R. 23 / 34

A Non-2NF Schema LOTS( Property_id, County_name, Lot#, Area, Price, Tax_rate) FD1: Property_id County_name, Lot#, Area, Price, Tax_rate FD2: County_name, Lot# Property_id, Area, Price, Tax_rate FD3: County_name Tax_rate FD4: Area Price Both Property_id and {County_name, Lot#} are candidate keys. So, by the general definition of 2NF LOTS is not in 2NF due to FD3, i.e., Tax_rate is partially dependent on a candidate key. 24 / 34

2NF Decomposition LOTS( Property_id, County_name, Lot#, Area, Price, Tax_rate) becomes LOTS1( Property_id, County_name, Lot#, Area, Price) FD1: Property_id County_name, Lot#, Area, Price, Tax_rate FD2: County_name, Lot# Property_id, Area, Price, Tax_rate FD4: Area Price LOTS2( County_name, Tax_rate) FD3: County_name Tax_rate 25 / 34

General Definition of 3NF A relation schema R is in 3NF if, whenever a nontrivial functional dependency X A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R. LOTS1( Property_id, County_name, Lot#, Area, Price) FD1: Property_id County_name, Lot#, Area, Price, Tax_rate FD2: County_name, Lot# Property_id, Area, Price, Tax_rate FD4: Area Price not in 3NF due to FD4. Area is not a superkey and Price is not a prime attribute. Note that Price is transitively dependent on each candidate key. 26 / 34

3NF Decomposition LOTS1( Property_id, County_name, Lot#, Area, Price) becomes LOTS1A( Property_id, County_name, Lot#, Area) FD1: Property_id County_name, Lot#, Area, Price, Tax_rate FD2: County_name, Lot# Property_id, Area, Price, Tax_rate and LOTS1B( Area, Price) FD4: Area Price 27 / 34

Straight to 3NF Though we present a progression through 2NF to 3NF for historical reasons, it s not necessary. Given our origial LOTS LOTS( Property_id, County_name, Lot#, Area, Price, Tax_rate) FD1: Property_id County_name, Lot#, Area, Price, Tax_rate FD2: County_name, Lot# Property_id, Area, Price, Tax_rate FD3: County_name Tax_rate FD4: Area Price We see that FD3 and FD4 are problem FDs because neither County_name nor Area is a superkey. 28 / 34

Decomposition Straight to 3NF So we can decompose LOTS( Property_id, County_name, Lot#, Area, Price, Tax_rate) directly into: LOTS1A( Property_id, County_name, Lot#, Area) FD1: Property_id County_name, Lot#, Area FD2: County_name, Lot# Property_id, Area LOTS1B( Area, Price) FD4: Area Price LOTS2( County_name, Tax_rate) FD3: County_name Tax_rate 29 / 34

Observations of General 3NF Tests Two types of problematic FDs: A nonprime attribute determines another nonprime attribute, giving rise to a transitive dependency on a key. Some subset of a key determines a nonprime attribute, giving rise to a partial dependencey on a key which violates 2NF. 30 / 34

Boyce-Codd Normal Form (BCNF) A relation schema R is in BCNF if whenever a nontrivial functional dependency X A holds in R, then X is a superkey of R Note that this is the same as 3NF except that it doesn t allow any attributes (even prime attributes) to be determined by non-keys. General non-bcnf pattern: given R(A, B, C) and FDs AB C C B R is in 3NF but not BCNF due to the FD C B. 31 / 34

BCNF Example 1 Say we add FD5 to LOTS1A( Property_id, County_name, Lot#, Area) FD1: Property_id County_name, Lot#, Area FD2: County_name, Lot# Property_id, Area FD5: Area County_name And say that Fulton county lots are restriced to 1.1, 1.2,..., 2.0 acres and DeKalb county lots are restricted to 0.5, 0.6,..., 1.0 acres. LOTS1A will have a great deal of redundancy. BCNF doesn t allow this schema because of FD5: Area is not a superkey. 32 / 34

BCNF Example 1 Decomposition LOTS1A( Property_id, County_name, Lot#, Area) becomes LOTS1AX( Property_id, Area, Lot#) FD1: Property_id County_name, Lot#, Area and LOTS1AY( Area, County_name) FD5: Area County_name Note that FD2 is lost because its attributes are no longer in the same relation schema. In general, FDs may not be preservable in BCNF decompositions. 33 / 34

BCNF Example 2 Given TEACH(Student, Course, Instructor) and FD1: {Student, Course} Instructor FD2: Instructor Course. FD2 violates BCNF. There are three possible BCNF decompositions: 1. R1( Student, Instructor) and R2( Student, Course ) 2. R1( Instructor, Course) and R2( Student, Course ) 3. R1( Instructor, Course) and R2( Instructor, Student ) All three decompositions preserve attributes and lose FD1, which is acceptable as along as the decomposition has the non-additive join property. Which of these decompositions are good? In the next lecture we ll learn how to answer that question. 34 / 34