Relational Design Theory

Similar documents
Relational Design Theory. Relational Design Theory. Example. Example. A badly designed schema can result in several anomalies.

Functional dependency theory

UNIT 3 DATABASE DESIGN

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Relational Database Design Theory. Introduction to Databases CompSci 316 Fall 2017

UNIT -III. Two Marks. The main goal of normalization is to reduce redundant data. Normalization is based on functional dependencies.

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

CS411 Database Systems. 05: Relational Schema Design Ch , except and

CSE 562 Database Systems

Informal Design Guidelines for Relational Databases

Database Design Theory and Normalization. CS 377: Database Systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

Relational Design: Characteristics of Well-designed DB

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

CS352 Lecture - Conceptual Relational Database Design

FUNCTIONAL DEPENDENCIES

Relational Database Design (II)

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

Functional Dependencies CS 1270

COSC Dr. Ramon Lawrence. Emp Relation

Database Systems. Basics of the Relational Data Model

Part II: Using FD Theory to do Database Design

Chapter 8: Relational Database Design

Unit 3 : Relational Database Design

CS352 Lecture - Conceptual Relational Database Design

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

CSCI 403: Databases 13 - Functional Dependencies and Normalization

Database Design Principles

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Lecture 6a Design Theory and Normalization 2/2

Relational Database design. Slides By: Shree Jaswal

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out end of week start early!

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

CMU SCS CMU SCS CMU SCS CMU SCS whole nothing but

The Relational Data Model

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Lectures 12: Design Theory I. 1. Normal forms & functional dependencies 2/19/2018. Today s Lecture. What you will learn about in this section

Lecture 4. Database design IV. INDs and 4NF Design wrapup

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Database design III. Quiz time! Using FDs to detect anomalies. Decomposition. Decomposition. Boyce-Codd Normal Form 11/4/16

Functional Dependencies and Finding a Minimal Cover

Mapping ER Diagrams to. Relations (Cont d) Mapping ER Diagrams to. Exercise. Relations. Mapping ER Diagrams to Relations (Cont d) Exercise

Schema Refinement and Normal Forms

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Overview - detailed. Goal. Faloutsos & Pavlo CMU SCS /615

Theory of Normal Forms Decomposition of Relations. Overview

Normalization. Anomalies Functional Dependencies Closures Key Computation Projecting Relations BCNF Reconstructing Information Other Normal Forms

Concepts from

Databases The theory of relational database design Lectures for m

Announcements (January 20) Relational Database Design. Database (schema) design. Entity-relationship (E/R) model. ODL (Object Definition Language)

Chapter 7: Relational Database Design

customer = (customer_id, _ customer_name, customer_street,

CS 2451 Database Systems: Database and Schema Design

Normalisation theory

Functional Dependencies and Normalization

Chapter 6: Relational Database Design

Steps in normalisation. Steps in normalisation 7/15/2014

Schema Refinement: Dependencies and Normal Forms

Desirable database characteristics Database design, revisited

Design Theory for Relational Databases

Normalization in DBMS

Part 7: Relational Normal Forms

DATABASE MANAGEMENT SYSTEMS

Informationslogistik Unit 5: Data Integrity & Functional Dependency

Combining schemas. Problems: redundancy, hard to update, possible NULLs

Schema Refinement: Dependencies and Normal Forms

MODULE: 3 FUNCTIONAL DEPENDENCIES

Normalisation Chapter2 Contents

Normalisation. Normalisation. Normalisation

CSIT5300: Advanced Database Systems

Draw A Relational Schema And Diagram The Functional Dependencies In The Relation >>>CLICK HERE<<<

Lecture 5 Design Theory and Normalization

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

Handout 3: Functional Dependencies and Normalization

Schema Refinement: Dependencies and Normal Forms

From Murach Chap. 9, second half. Schema Refinement and Normal Forms

Review: Attribute closure

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Relational Database Systems 1

Database Normalization. (Olav Dæhli 2018)

Database Management System

Introduction to Database Design, fall 2011 IT University of Copenhagen. Normalization. Rasmus Pagh

Lectures 5 & 6. Lectures 6: Design Theory Part II

Database Management

Functional Dependencies

Databases 1. Daniel POP

In This Lecture. Normalisation to BCNF. Lossless decomposition. Normalisation so Far. Relational algebra reminder: product

Edited by: Nada Alhirabi. Normalization

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

Normal Forms. Winter Lecture 19

Final Review. Zaki Malik November 20, 2008

Databases Tutorial. March,15,2012 Jing Chen Mcmaster University

Part 5: Relational Normal Forms

Functional Dependency: Design and Implementation of a Minimal Cover Algorithm

The strategy for achieving a good design is to decompose a badly designed relation appropriately.

Unit- III (Functional dependencies and Normalization, Relational Data Model and Relational Algebra)

The Relational Model and Normalization

Transcription:

Chapter 5 Relational Design Theory Database Systems p. 211/569

Relational Design Theory A badly designed schema can cause a lot of trouble Database designers talk of anomalies and distinguish three different types: Update anomaly Insertion anomaly Deletion anomaly Database Systems p. 212/569

Relational Design Theory(2) Update anomaly: Changing a single fact requires you to touch multiple tuples Insertion anomaly: When inserting a tuple not all the required information is available, resulting in lots of NULL values In the worst case, value for a key attribute is missing tuple cannot be inserted Deletion anomaly: Deleting a tuple removes more information than intended Database Systems p. 213/569

Example A financial consultant stores data in a relational DBMS There are (B)rokers, their (O)ffices, (I)nvestors, (S)hares, (D)ividends, (N)umber of shares an investor holds Everything is kept in one big relation: finance(b, O, I, S, D, N) Database Systems p. 214/569

Example(2) finance B O I S D N Gekko 103 Trump IBM 1.20 50 Gekko 103 Trump Fiat 1.00 30 Gekko 103 Trump Coca Cola 0.00 100 Fox 214 Smith BASF 0.80 80 Fox 214 Smith Fiat 1.00 140 Fox 214 Smith IBM 1.20 30.................. Database Systems p. 215/569

Example(3) What happens, when Gekko moves from 103 to 145? We have to change three (or even more) tuples What happens, when Lynch starts working as a broker, but doesn t have any clients yet? We insert [Lynch, 210, NULL, NULL, NULL, NULL] What happens, when Smith sells all his shares? He vanishes from the database... Database Systems p. 216/569

Taking a Closer Look The relationfinance seems to contain a lot of redundant data: 103 is Gekko s office IBM pays a dividend of 1.20 per share Fox is acting as broker for Smith Database Systems p. 217/569

Taking a Closer Look(2) Where does this redundancy come from? Certain attribute values determine other attribute values There are functional dependencies between attributes Database Systems p. 218/569

Functional Dependencies For example, there is a functional dependency (FD) betweenbroker andoffice: Broker Office Assuming every broker has one office, the value for B uniquely determines the value foro As far as we can see, looks good: Gekko always shows up with office 103, Fox with office 214 Database Systems p. 219/569

Functional Dependencies(2) Formal definition of an FD: α and β are attribute sets of a relational schema R There is an FD α β, if for all pairs of tuples r,t R: r.α = t.α r.β = t.β Let s say α = {A 1,...,A n } and β = {B 1,...,B m } r α β t if r and t agree here they also have to agree here Database Systems p. 220/569

Functional Dependencies(3) A functional dependency has to hold for all possible instances of R Using the current instance R, we can only disprove FDs: If you find this, it s not an FD FDs have to be derived from background knowledge of the application Can you find other FDs infinance? Database Systems p. 221/569

Functional Dependencies(4) F R = {B O,S D,I B,IS N} What do we do with these FDs? First of all, we can now compute a key for R Database Systems p. 222/569

Properties of a Key A key κ has the following properties: 1. κ R 2. κ R 3. There is no κ κ such that κ R Property 2 means the key is complete property 3 means the key is minimal Database Systems p. 223/569

Properties of a Key (2) If κ satisfies all the properties, it s called a candidate key A relation may have more than one candidate key Choose one of candidate keys as the primary key If κ only satisfies properties 1 and 2, it s called a superkey (candidate) key superkey Database Systems p. 224/569

Computing a Key IsIS a key offinance? Property 1: Checking properties 2 and 3 requires a few more concepts of relational design theory Database Systems p. 225/569

Reasoning about FDs We can infer additional FDs from a set of given F The closure F + is the set of all FDs that can be inferred from F There are rules, called Armstrong s Axioms, to do this Database Systems p. 226/569

Armstrong s Axioms Let α,β and γ be subsets of R We have three axioms: Reflexivity: β α α β Augmentation: If α β, then αγ βγ Transitivity: α β and β γ α γ Try this out on the relational schema forfinance: R = (B,O,I,S,D,N) F R = {B O,S D,I B,IS N} Which attributes are functionally determined byis? Database Systems p. 227/569

Closure of a Set of Attributes AC(α) is the closure of a set of attributes α AC(α) contains all the attributes functionally determined by α (given F R ) Figuring out AC(α) by using Armstrong s Axioms can be a bit tiresome We don t have to, there s a simple algorithm: AC := α while (AC still changes) do for each FD β γ in F R do if (β AC) then AC := AC γ Database Systems p. 228/569

Closure of a Set of Attributes(2) We can use this to check properties 2 and 3 of a key: AC(IS) = BOISDN IS is a superkey AC(I) = BOI AC(S) = SD IS is also a candidate key Database Systems p. 229/569

The Next Step Now that we know how to find keys, what do we do next? Design theory can be used to check the quality of a schema This is done via normal forms (NFs): 1NF, 2NF, 3NF, BCNF, 4NF (Usually) the higher, the better Database Systems p. 230/569

First Normal Form (1NF) Assume we want to store parent/child relationships in a relation: parents mother father child Mary John Dave Sue Mike Kate......... What do we do if a couple has more than one child? We have to get this information into the relational table structure Database Systems p. 231/569

First Normal Form(2) We could store the names of the children in a tuple or add columns: parents mother father children Mary John [Jen, Dave]......... parents mother father child1 child2 child3... Mary John Jen Dave NULL... Sue Mike Kate NULL NULL..................... Database Systems p. 232/569

First Normal Form(3) The attributechildren has a domain, e.g. char(30) First solution takes more than one value from that domain and assigns it to an attribute Breaks the relational table structure We could change the domain: have a list of char(30) Adding constraints becomes more difficult Formulating queries such as list all parents who have children with the same name also becomes harder Database Systems p. 233/569

First Normal Form(4) Second solution seems to follow relational table structure, but has problems as well How many columns do we choose? Every additional column adds lots of NULL values Querying for children with the same name needs to compare every column with every other column Database Systems p. 234/569

First Normal Form(5) A relational schema is in 1NF, if all attribute values are atomic: an attribute may take on only one value from its domain parents (in 1NF) mother father child Mary John Jen Mary John Dave Sue Mike Kate......... Database Systems p. 235/569

Second Normal Form (2NF) A relational schema is in 2NF, if it is in 1NF and every non-key attribute (NKA) is fully functional dependent on every key What s a full functional dependency? β is fully functional dependent on α (α β), if α β and there is no α α such that α β Database Systems p. 236/569

Second Normal Form(2) For example,finance is not in 2NF because D is a non-key attribute It is dependent on S not fully functional dependent on key IS If we violate 2NF, that means we combine different entity relationships in one table In the case offinance, the relationships between shares and dividends & investors and brokers Causes redundancy Database Systems p. 237/569

Third Normal Form (3NF) Even if a relational schema is in 2NF, there may still be redundancies in it These are caused by transitive dependencies For example, adding data about departments to the relationprofessor may result in one pers_no name office dept_no dept name is a non-key attribute There is a transitive dependency between the keypers no and another non-key attribute,dept no: dept_name pers no dept no dept name Database Systems p. 238/569

Third Normal Form(2) A schema is in 3NF, if for every FD α β at least one of the following conditions holds α β is trivial, i.e., β α α is a superkey Every attribute in β is part of a candidate key Every non-key attribute has to depend on the key, the whole key, and nothing but the key, so help me Codd Database Systems p. 239/569

Boyce-Codd Normalform (BCNF) Usually, designers are already quite happy with 3NF Only in rare cases does a schema in 3NF still contain redundancies For example, assume a restaurant offers a meal deal : There are starters, main dishes, and drinks For a meal you choose one of each type Database Systems p. 240/569

Boyce-Codd Normalform(2) The ordered meals are stored in a table: meals (N)umber (D)ish (T)ype 1 bruschetta starter 1 penne con salmone main 1 white wine drink 2 soup of the day starter 2 pizza funghi main 2 white wine drink......... We have the following FDs: ND T, NT D, D T That means, ND and NT are candidate keys Database Systems p. 241/569

Boyce-Codd Normalform(3) This schema is in 3NF: Every attribute in this schema is part of some candidate key Third condition of 3NF always holds However, there is still redundancy: we already know that white wine is a drink Database Systems p. 242/569

Boyce-Codd Normalform(4) We need to tighten the conditions A schema is in BCNF, if for every FD α β at least one of the following conditions holds α β is trivial, i.e., β α α is a superkey Every attribute (including key attributes) has to depend on the key, the whole key, and nothing but the key A schema in BCNF does not have any redundancies caused by functional dependencies Unfortunately, this is Database Systems p. 243/569

Multi-valued Dependencies Let s look at the following relation: skills pers_no lang prog_lang 3002 English C 3002 Italian C 3002 English Java 3002 Italian Java 3005 German C 3005 Italian C We have no FDs whatsoever, but there are still redundancies Database Systems p. 244/569

Multi-valued Dependencies(2) Someone speaking four languages and knowing five programming language needs twenty tuples in this relation! We are throwing together two completely independent aspects Can be stored more compactly in two relations: skills pers_no lang 3002 English 3002 Italian 3005 German 3005 Italian skills pers_no proglang 3002 C 3002 Java 3005 C Database Systems p. 245/569

Multi-valued Dependencies(3) We have something called multi-valued dependencies (MVDs) here: pers no lang andpers no proglang Because the languages and programming languages a person knows are completely independent every possible combination has to show up for a particular person Database Systems p. 246/569

Multi-valued Dependencies(4) There s also a formal definition: Let α,β,γ R (with α β γ = R) α β holds if for every instance R: for every pair of tuples t 1,t 2 in R with t 1.α = t 2.α exists a tuple t 3 R with t 3.α = t 1.α(= t 2.α) t 3.β = t 1.β t 3.γ = t 2.γ Formulated differently: for all tuples with the same value for α, all possible β,γ-combinations have to appear Database Systems p. 247/569

Multi-valued Dependencies(5) MVDs are a generalization of FDs, i.e., every FD is an MVD (but not necessarily the other way around) Similar to the Armstrong Axioms for FDs, there are also inference rules for MVDs We are not going to go into details here... One important property is the following, though: α β α γ for γ = R α β Database Systems p. 248/569

Fourth Normal Form (4NF) 4NF is a generalization of BCNF to MVDs A relational schema is in 4NF, if for every MVD α β at least one of the following conditions holds α β is trivial, i.e., β α or α β = R α is a superkey There are even higher normal forms than 4NF... Mainly of academic interest, not covered here Database Systems p. 249/569

Data Quality A higher normal form includes all the lower ones: 4NF BCNF 3NF 2NF 1NF The higher the normal form, the better (should be at least 3NF) What do you do if you re not satisfied with the normal form of your current schema? Database Systems p. 250/569

Decomposition of Relations A relational schema can be transformed into a higher normal form by taking it apart This can be done by decomposing a schema R into subschemas R 1,...,R n with R i R for 1 i n finance B O I S D N Gekko 103 Trump IBM 1.20 50 Gekko 103 Trump Fiat 1.00 30 Gekko 103 Trump Coca Cola 0.00 100 Fox 214 Smith BASF 0.80 80 Fox 214 Smith Fiat 1.00 140 Fox 214 Smith IBM 1.20 30.................. Database Systems p. 251/569

Decomposition of Relations(2) A decomposition needs to have two properties to be useful Recoverability of information: we are able to reconstruct the original instance R from the instances R 1,...R n (for all possible instances R) Preservation of dependencies: all FDs in F R are also found in F R1,...,F Rn Database Systems p. 252/569

Recoverability of Information Assume we have a decomposition of R into R 1 and R 2 with R = R 1 R 2 and R 1 = π R1 (R), R 2 = π R2 (R) This decomposition recovers all information, if for every possible instance of R the following holds R = R 1 R 2 How do we check this for every instance? Database Systems p. 253/569

Recoverability of Information(2) Luckily we don t have to! It s fine if at least one of the following conditions hold: (R 1 R 2 ) R 1 F + R (R 1 R 2 ) R 2 F + R Database Systems p. 254/569

What Happens If We Get It Wrong? F R = {B O,S D,I B,IS N} finance B O I S D N Gekko 103 Trump IBM 1.20 50 Gekko 103 Trump Fiat 1.00 30 Fox 214 Smith Fiat 1.00 140 Fox 214 Smith IBM 1.20 30 ւց finance1 B O S Gekko 103 IBM Gekko 103 Fiat Fox 214 Fiat Fox 214 IBM finance2 I S D N Trump IBM 1.20 50 Trump Fiat 1.00 30 Smith Fiat 1.00 140 Smith IBM 1.20 30 Database Systems p. 255/569

What Happens If We Get It Wrong?(2) finance1 finance2 finance (reconstructed) B O I S D N Gekko 103 Trump IBM 1.20 50 Gekko 103 Trump Fiat 1.00 30 Fox 214 Smith Fiat 1.00 140 Fox 214 Smith IBM 1.20 30 Gekko 103 Smith Fiat 1.00 140 Gekko 103 Smith IBM 1.20 30 Fox 214 Trump IBM 1.20 50 Fox 214 Trump Fiat 1.00 30 We end up with more tuples than before! Database Systems p. 256/569

Dependency Preservation Given a decomposition of R into R 1,...,R n Decomposition preserves dependencies, if F R (F R1 F Rn ) or F + R = (F R 1 F Rn ) +, respectively In the previous (non-recoverable) decomposition we also lose dependencies: I B is not included anymore Database Systems p. 257/569

Decomposition Algorithms Decomposing relations on a trial-and-error basis and then checking them would be very tedious There are algorithms to do this systematically Most important one: 3NF Synthesis Algorithm Decomposes a schema R into 3NF guaranteeing recoverability and dependency preservation Prerequisite: a minimal basis for the set of FDs F R Database Systems p. 258/569

Minimal Basis F c is a minimal basis or canonical cover of F if the following three properties hold: F c F, i.e., F + c = F + There are no FDs α β in F c in which α or β contain unnecessary attributes The left-hand side of every FD in F c is unique This can be achieved by applying the union rule: α β, α γ α βγ Database Systems p. 259/569

Minimal Basis(2) How do we check for unnecessary attributes? A α : (F c {α β} {(α A) β}) F c B β : (F c {α β} {α (β B)}) F c We do this in two steps: first looking at the left-hand side (LHS) and then at the right-hand side (RHS) Database Systems p. 260/569

Minimizing LHS For the LHS of every FD α β F check for all A α if A is unnecessary, that means: β AC(F,α A) If yes, replace α β by (α A) β. Database Systems p. 261/569

Minimizing RHS For the RHS of every (remaining) FD α β F check for all B β if B is unnecessary, that means: B AC(F {α β} {α (β B)},α) If yes, replace α β by α (β B) Database Systems p. 262/569

Only Two More Steps... If present, remove all FDs of the form α Merge all FDs that have the same LHS (union rule): α β 1,...,α β n become α β 1 β n That s it, you now have a minimal basis! Depending on the order you process the FDs, you may end up with a slightly different result However, does not matter, no solution will contain any redundancy Database Systems p. 263/569

3NF Synthesis Algorithm Now we are ready to apply the algorithm to decompose a schema into 3NF First step: for every FD α β in F c create a subschema R α = α β with F α := {α β F c α β R α } Second step: add a schema R κ that contains a candidate key Third step: eliminate redundant schemas, i.e., if you find R i R j, drop R i Database Systems p. 264/569

Example Let s apply this algorithm tofinance with F Rc = {B O,S D,I B,IS N} R B (B,O) R S (S,D) R I (I,B) R IS (I,S,N) (contains key) What happened to the anomalies we discussed at the beginning of the chapter? Database Systems p. 265/569

Decomposition into BCNF and 4NF There are algorithms for decomposing schemas into BCNF and 4NF, too However, these algorithms cannot guarantee the preservation of dependencies Usually, designers stop at 3NF Has another reason: the higher the normal form, the more biased a schema is towards update operations Database Systems p. 266/569

Decomposition into BCNF Start with Z = {R} If there is still an R i Z that is not in BCNF Find an FD (α β) F + with α β R i α β = α R i / F + Decompose R i into R i1 := α β and R i2 := R i β Remove R i from Z and add R i1 and R i2 : Z := (Z {R i }) {R i1 } {R i2 } Database Systems p. 267/569

Decomposition into 4NF Very similar to BCNF, just replace FD with MVD Start with Z = {R} If there is still an R i Z that is not in 4NF Find an MVD (α β) F + with α β R i α β = α R i / F + Decompose R i into R i1 := α β and R i2 := R i β Remove R i from Z and add R i1 and R i2 : Z := (Z {R i }) {R i1 } {R i2 } Database Systems p. 268/569

Summary Normal forms determine the quality of a relational schema If not good enough, a decomposition algorithm can be used to improve quality All algorithms guarantee recoverability Preservation of dependencies can be guaranteed up to 3NF High quality for relational schemas means Database Systems p. 269/569