Combining schemas. Problems: redundancy, hard to update, possible NULLs


Handout


Problems? Conclusion: Whether the join attribute is PK or not makes a great difference when combining schemas!

Splitting schemas, a.k.a. decomposition (revert arrows below)

Coincidence or not? (And why it matters)

Functional dependency: loan_number → amount

An even worse decomposition: lossy! Why do we say lossy when in fact we end up with more data?
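The question above can be explored with a small sketch. The relation, its attribute names, and its values below are hypothetical and not taken from the handout; the point is only that joining the two projections back together produces extra (spurious) tuples, so we can no longer tell which tuples were in the original relation.

```python
# Hypothetical relation employee(emp_id, name, street, salary); values are illustrative only.
employee = {
    (1, "Kim", "Main St", 50),
    (2, "Kim", "North St", 60),
}

# Decompose on `name`, which is not a key of either projection.
r1 = {(emp_id, name) for (emp_id, name, street, salary) in employee}
r2 = {(name, street, salary) for (emp_id, name, street, salary) in employee}

# Natural join of the two projections on the shared attribute `name`.
joined = {
    (emp_id, name, street, salary)
    for (emp_id, name) in r1
    for (name2, street, salary) in r2
    if name == name2
}

# Tuples that appear after the join but were never in the original relation.
spurious = joined - employee
print(len(employee), len(joined))  # 2 4
print(sorted(spurious))
```

The join has more tuples than the original, yet it carries less information: it no longer records which Kim lives on which street.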

7.3 Decomposition using FDs

FD algebra

Example: --------------------------------------------------------------------------------------------------

The most useful normal form:

loan = (loan_number, amount)
borrower = (customer_id, loan_number)

Find the set of all (non-trivial) FDs for the relation bor_loan

Another example: Is this schema in BCNF?

In bor_loan, the violating FD is loan_number → amount, so we set α = loan_number and β = amount. Why not simply say R?

Another example: It was found earlier that this schema is not in BCNF. The violating FD is B → C. Apply the BCNF decomposition algorithm!

If no, decompose it! Is this relation in BCNF?

To do for next time: Rework all the BCNF examples! ----------------------------------------------------------------------------------------------

BCNF and preservation of dependencies

E-R design from Ch.6: a customer can have at most 1 personal banker.
Now suppose instead: a customer can have more than 1 personal banker, but at most one at any given branch. (?)

A ternary relationship-set is needed:

Implementation:
R = cust_banker_branch = (customer_id, employee_id, branch_name, type)

FDs:
FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)

Is cust_banker_branch in BCNF? No. Apply the decomposition algorithm!

Decomposition:
R1 = (employee_id, branch_name)
R2 = (customer_id, employee_id, type)

Problem: FD2 is now spread across two relations!
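A minimal sketch of this step, assuming the usual textbook rule that a violating FD α → β splits R into (α ∪ β) and R − (β − α). The helper names `closure`, `violates_bcnf` and `split_on` are illustrative, not from the handout.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

R = frozenset({"customer_id", "employee_id", "branch_name", "type"})
FD1 = (frozenset({"employee_id"}), frozenset({"branch_name"}))
FD2 = (frozenset({"customer_id", "branch_name"}), frozenset({"employee_id", "type"}))
F = [FD1, FD2]

def violates_bcnf(fd, schema, fds):
    """An FD violates BCNF if it is non-trivial and its left side is not a superkey."""
    lhs, rhs = fd
    return not rhs <= lhs and not closure(lhs, fds) >= schema

def split_on(fd, schema):
    """Split `schema` on a violating FD: (lhs ∪ rhs) and schema − (rhs − lhs)."""
    lhs, rhs = fd
    return lhs | rhs, schema - (rhs - lhs)

print(violates_bcnf(FD1, R, F), violates_bcnf(FD2, R, F))  # True False
R1, R2 = split_on(FD1, R)
print(sorted(R1), sorted(R2))

# FD2 mentions all four attributes, so neither R1 nor R2 contains it entirely.
fd2_attrs = FD2[0] | FD2[1]
print(any(fd2_attrs <= ri for ri in (R1, R2)))  # False
```

The last check is the problem noted above: no single relation of the decomposition holds all the attributes of FD2.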

Conclusion: BCNF is not dependency preserving

R = cust_banker_branch = (customer_id, employee_id, branch_name, type)

FDs:
FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)

Extra-credit: What if we started BCNF decomposition with FD2 instead of FD1?
Time: 2

Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker NF, known as third normal form (3NF).

Show that cust_banker_branch is in 3NF

R = cust_banker_branch = (customer_id, employee_id, branch_name, type)

FDs:
FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
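One way to check a pencil-and-paper answer is the sketch below. It assumes the standard 3NF condition (each FD α → β in F is trivial, or α is a superkey, or every attribute of β − α belongs to some candidate key) and finds the candidate keys by brute force, which is only practical for small schemas. All helper names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Brute force: try subsets by increasing size, keep superkeys with no smaller key inside."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = frozenset(combo)
            if closure(attrs, fds) >= schema and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def is_3nf(schema, fds):
    """Check the 3NF condition for every FD in `fds`."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        if not (trivial or superkey or (rhs - lhs) <= prime):
            return False
    return True

R = frozenset({"customer_id", "employee_id", "branch_name", "type"})
F = [
    (frozenset({"employee_id"}), frozenset({"branch_name"})),
    (frozenset({"customer_id", "branch_name"}), frozenset({"employee_id", "type"})),
]
print([sorted(k) for k in candidate_keys(R, F)])
print(is_3nf(R, F))  # True
```

FD1 passes only through the third condition: employee_id is not a superkey, but branch_name is part of the candidate key (customer_id, branch_name).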

Whatever happened to 2NF? In a nutshell, it forbids non-key attributes from depending on parts of keys. See Second normal form - Wikipedia, the free encyclopedia for more details.

Another BCNF/3NF example: books (B-Name, Ed, A-Name, A-SSN, Nr-pag)
Is it in BCNF? 3NF?
A_Name A_SSN

To do for next time: Rework all the BCNF & 3NF examples!
-----------------------------------------------------------------------------------------------------

Higher NFs

Consider this relation: classes (course, teacher, book)
(c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c.

What are the FDs for this relation? Is it in BCNF? Is it in 3NF?

We still have redundancies and insertion anomalies, e.g., if Marilyn is a new teacher that can teach database, two tuples need to be inserted:
(database, Marilyn, DB Concepts)
(database, Marilyn, Ullman)


The big picture

7.4 FD Theory

7.4.1 The Closure of a set of FDs

Yes, this is a trivial FD!

Algorithm to compute F+

Although Armstrong's axioms are sufficient to obtain the closure, in practice we want more tools. How about these? Idempotency: X X X Commutativity: X Y Y X They are true, but it is customary to write all attributes as sets with no repeating values and sorted in alphabetical order.

Important lemma: if and only if Proof: Left as individual work for next time. Use the definition of a FD from p.271:

Practice exercise 7.4

Example: Quiz: Generate 4 more FDs that are in F+

7.4.2 The Closure of a set of attributes (under the set of FDs)

Compare to the inefficient algorithm, based on F+

For next time: Read and understand the example on p.281
--------------------------------------------------------------------------------------------------------
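A minimal sketch of the attribute-closure computation: start from the given attributes and keep adding the right-hand side of every FD whose left-hand side is already covered, until nothing changes. The FD set `F` below is a hypothetical example chosen for illustration, and writing each side of an FD as a string of one-letter attribute names is an assumption of this sketch.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of one-letter attributes.
    """
    result = set(attrs)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # lhs is covered, so rhs is implied
                changed = True
    return result

# Hypothetical example: R = (A, B, C, G, H, I)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(attribute_closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C', 'H']
```

Each pass over F either adds at least one attribute or stops the loop, so the number of passes is bounded by the number of attributes in R.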

Applications of attribute closure:
- Check if a set of attributes is a superkey
- Check if a set of attributes is a candidate key (i.e. superkey + minimal)
- Check if a functional dependency α → β holds (i.e. if it is in F+)
  o Find α+ and then check if β ⊆ α+
- Computing the closure F+ of F
  o For each set of attributes γ ⊆ R, find the closure γ+, and for each S ⊆ γ+ output a functional dependency γ → S

Attribute closure gives another algorithm to find the FD closure F+! Compare it with the first alg. from fig. 7.8. Which one do you think is more efficient? Explain!
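The applications listed above can be sketched on top of one closure routine. The schema `R` and FD set `F` are hypothetical, and the function names are illustrative. Note that `fd_closure` enumerates every subset of R, so it is exponential in the number of attributes and also emits trivial FDs.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; `fds` holds (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """Superkey such that dropping any single attribute breaks the superkey property."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

def fd_holds(lhs, rhs, fds):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

def subsets(attrs):
    items = sorted(attrs)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def fd_closure(schema, fds):
    """F+ as a set of (lhs, rhs) frozenset pairs, built from attribute closures."""
    return {(g, s) for g in subsets(schema) for s in subsets(closure(g, fds))}

R = "ABCGHI"  # hypothetical schema
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", R, F), is_candidate_key("AG", R, F))  # True True
print(fd_holds("A", "H", F), fd_holds("B", "C", F))           # True False
```

Checking single-attribute removals is enough for minimality here because every superset of a superkey is itself a superkey.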

Example:

In general, a FD is of the form α → β, with α and β sets of attributes, e.g. EFG → KL.

Food for thought:
Can α be the empty set? ("nothing")
Can α be the total set? ("everything")
Can β be the empty set? ("nothing")
Can β be the total set? ("everything")

Extraneous attributes

This part is trivial, so it doesn't need to be checked (it was included just for symmetry)

In English: If we remove the attribute, the closure F+ does not change

Why is this of practical importance?

Examples:

Given F = {A → C, AB → C}
B is extraneous in AB → C because AB → C can be derived from A → C (How?)
As seen in this example, sometimes removal of extraneous attributes makes an entire FD disappear (because it's a duplicate)

Given F = {A → C, AB → CD}
C is extraneous in AB → CD since AB → C can be derived even after deleting C (How?)
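Both examples can be checked with attribute closure. This is a sketch under the usual textbook tests: an attribute A on the left of α → β is extraneous if β ⊆ (α − A)+ under F, and an attribute A on the right is extraneous if A ∈ α+ under F with α → β replaced by α → (β − A). The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure; `fds` holds (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(attr, fd, fds):
    """Drop `attr` from the left side and see whether the right side is still implied by F."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {attr}, fds)

def extraneous_in_rhs(attr, fd, fds):
    """Drop `attr` from the right side of `fd` and see whether lhs still implies it."""
    lhs, rhs = fd
    reduced_rhs = "".join(sorted(set(rhs) - {attr}))
    modified = [f for f in fds if f != fd] + [(lhs, reduced_rhs)]
    return attr in closure(lhs, modified)

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", ("AB", "C"), F1))   # True

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", ("AB", "CD"), F2))  # True
print(extraneous_in_rhs("D", ("AB", "CD"), F2))  # False
```

The left-side test uses the original F, while the right-side test uses the modified set; mixing the two up is a common mistake.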

Algorithm: Add to the list of applications of attribute closure! Exercise

Answer: ------------------------------------------------------------------------------------- Exercise

Answer: A+ = {A, B, C, D}, so A+ contains C, so C is extraneous in A → CD

Exercise
Same scenario as above. Is D extraneous in A → CD?

Exercise
F = {A → B, B → C, A → C}. Is C extraneous in A → C? So what do we do about A → C?

For next time: solve all the exercises above, plus the one on p.283!

? Why is this of practical importance? Algorithm:

Example not from text: Solve for practice!

Solution:

Two things must be preserved when we perform decompositions: Data (tuples) FDs

Efficient algorithm (uses only attribute closure, not FD closure!) How much of Ri can we recover, based on the current result?

Example (not in text, but in text slides): Trivial, don't need algorithm! Apply the algorithm above to prove this!
----------------------------------------------------------------------------------

Solution: Prove that the decomposition R1 = (A, B), R2 = (A, C) is not dependency preserving.

The FD that needs to be recovered is B → C. Apply algorithm:
result = {B}
Consider R1: result ∩ R1 = {B}; {B}+ = {B, C}; {B, C} ∩ R1 = {B}; result ∪ {B} = {B}
Consider R2: result ∩ R2 = Ø; Ø+ = Ø; result = {B}
No progress, so algorithm stops. We could not obtain the RHS of B → C, so the FD cannot be recovered.
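The trace above can be reproduced with a short sketch of the test. The handout's transcription does not show the FD set, so F = {A → B, B → C} on R = (A, B, C) is an assumption here, chosen because it is consistent with {B}+ = {B, C} in the trace. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure; `fds` holds (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def fd_preserved(lhs, rhs, decomposition, fds):
    """Test whether lhs -> rhs can be recovered using only FDs local to each Ri."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # How much of Ri can we recover, based on the current result?
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

F = [("A", "B"), ("B", "C")]       # assumed FD set, see lead-in
R1, R2 = set("AB"), set("AC")
print(fd_preserved("B", "C", [R1, R2], F))  # False
print(fd_preserved("A", "B", [R1, R2], F))  # True
```

Only attribute closures under F are computed, never F+ itself, which is why this test is the efficient one.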

Week 12, Lect 1

7.5 Decomposition using FDs

Problem: The definitions of both BCNF and 3NF require F+, which is expensive to compute!

FYI there is a sketched proof for this on p.289 (not required for final)

Can you find super-keys? Intuitively, we can feel that AC → BDE, but how to prove it? Hint: Armstrong's axioms (and theorems)

So AC is a super-key. But is it a candidate key? (What's the difference?)

Do you think there are other candidate keys? Why or why not?

Are there any BCNF violations? Hint: To find BCNF violations, do we need to check F or F+? Why?

Which one do we choose to start the decomposition?

Now write down the two relations resulting from decomposition, including their FDs F1 and F2 and their candidate keys:

SKIP the remainder of Section 7.5, starting with 7.5.1.2. (p.289)
SKIP 7.6, 7.7
Read and take notes: Sections 7.8, 7.9
Homework for Ch.7: 1, 3, 5, 6, 7, 11