Database Design Principles

Similar documents
CS352 Lecture - Conceptual Relational Database Design

CS352 Lecture - Conceptual Relational Database Design

Chapter 8: Relational Database Design

Unit 3 : Relational Database Design

Chapter 7: Relational Database Design

Functional Dependencies CS 1270

Desirable database characteristics Database design, revisited

This lecture. Databases -Normalization I. Repeating Data. Redundancy. This lecture introduces normal forms, decomposition and normalization.

Relational Database Design Theory. Introduction to Databases CompSci 316 Fall 2017

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

Databases -Normalization I. (GF Royle, N Spadaccini ) Databases - Normalization I 1 / 24

UNIT 3 DATABASE DESIGN

Relational Design: Characteristics of Well-designed DB

Functional Dependencies

UNIT -III. Two Marks. The main goal of normalization is to reduce redundant data. Normalization is based on functional dependencies.

Relational Database Design (II)

Functional Dependencies and Finding a Minimal Cover

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

Combining schemas. Problems: redundancy, hard to update, possible NULLs

Database Management System 15

Query Processing Strategies and Optimization

Functional Dependency: Design and Implementation of a Minimal Cover Algorithm

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2009 Lecture 3 - Schema Normalization

CS411 Database Systems. 05: Relational Schema Design Ch , except and

Database Design Theory and Normalization. CS 377: Database Systems

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

Lecture 6a Design Theory and Normalization 2/2

Database Systems. Basics of the Relational Data Model

Review: Attribute closure

Functional dependency theory

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

FUNCTIONAL DEPENDENCIES

Informal Design Guidelines for Relational Databases

Schema Refinement: Dependencies and Normal Forms

QUESTION BANK. SUBJECT CODE / Name: CS2255 DATABASE MANAGEMENT SYSTEM UNIT III. PART -A (2 Marks)

CSE 562 Database Systems

customer = (customer_id, _ customer_name, customer_street,

QUIZ 1 REVIEW SESSION DATABASE MANAGEMENT SYSTEMS

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

V. Database Design CS448/ How to obtain a good relational database schema

Schema Refinement: Dependencies and Normal Forms

FINAL EXAM REVIEW. CS121: Introduction to Relational Database Systems Fall 2018 Lecture 27

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

Relational Design Theory. Relational Design Theory. Example. Example. A badly designed schema can result in several anomalies.

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Schema Refinement: Dependencies and Normal Forms

Design Theory for Relational Databases

Typical relationship between entities is ((a,b),(c,d) ) is best represented by one table RS (a,b,c,d)

The Entity Relationship Model

CS211 Lecture: Database Design

Announcements (January 20) Relational Database Design. Database (schema) design. Entity-relationship (E/R) model. ODL (Object Definition Language)

Instructor: Amol Deshpande

Fundamentals of Database Systems

BCNF. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong BCNF

MODULE: 3 FUNCTIONAL DEPENDENCIES

Schema Refinement and Normal Forms

Relational Database design. Slides By: Shree Jaswal

Part II: Using FD Theory to do Database Design

CSCI 403: Databases 13 - Functional Dependencies and Normalization

Schema Refinement & Normalization Theory 2. Week 15

Relational Design Theory

Chapter 6: Relational Database Design

The Relational Data Model

CSCI 127 Introduction to Database Systems

Homework 6: FDs, NFs and XML (due April 13 th, 2016, 4:00pm, hard-copy in-class please)

Normalization. VI. Normalization of Database Tables. Need for Normalization. Normalization Process. Review of Functional Dependence Concepts

Databases The theory of relational database design Lectures for m

Theory of Normal Forms Decomposition of Relations. Overview

Database design III. Quiz time! Using FDs to detect anomalies. Decomposition. Decomposition. Boyce-Codd Normal Form 11/4/16

Lecture 11 - Chapter 8 Relational Database Design Part 1

Relational Design Theory

Relational Model and Relational Algebra A Short Review Class Notes - CS582-01

5 Normalization:Quality of relational designs

Database Management System

A subquery is a nested query inserted inside a large query Generally occurs with select, from, where Also known as inner query or inner select,

CSIT5300: Advanced Database Systems

Redundancy:Dependencies between attributes within a relation cause redundancy.

Relational Database Systems 1

ADVANCED DATABASES ; Spring 2015 Prof. Sang-goo Lee (11:00pm: Mon & Wed: Room ) Advanced DB Copyright by S.-g.

Relational Database Design. Announcements. Database (schema) design. CPS 216 Advanced Database Systems. DB2 accounts have been set up

Design Theory for Relational Databases

COSC Dr. Ramon Lawrence. Emp Relation

The Relational Data Model. Functional Dependencies. Example. Functional Dependencies

Database Management Systems Paper Solution

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

CS 2451 Database Systems: Database and Schema Design

DATABASE MANAGEMENT SYSTEMS

Ryan Marcotte CS 475 (Advanced Topics in Databases) March 14, 2011

Relational Database Systems 1. Christoph Lofi Simon Barthel Institut für Informationssysteme Technische Universität Braunschweig

Functional Dependencies and Normalization

We shall represent a relation as a table with columns and rows. Each column of the table has a name, or attribute. Each row is called a tuple.

Normalization is based on the concept of functional dependency. A functional dependency is a type of relationship between attributes.

Database Normalization

Homework 6: FDs, NFs and XML (due April 15 th, 2015, 4:00pm, hard-copy in-class please)

Who is the borrower whose id is 12345? σ borrower borrower_id = select * from borrower where borrower_id = '12345'

Lecture 5 Design Theory and Normalization

CS352 Lecture - The Entity-Relationship Model

Transcription:

Database Design Principles CPS352: Database Systems Simon Miner Gordon College Last Revised: 2/11/15

Agenda Check-in Design Project ERD Presentations Database Design Principles Decomposition Functional Dependencies Closures Canonical Cover Homework 3

Check-in

The Entities and Relationships of the Psalms Psalm 30, 31, and 32 (NIV)

Some Assertions about Psalm Data Each psalm is identified by a number, has a text, and may have a type, recipient, and zero or more instruments An author is identified by a name and has a position. An author may write multiple psalms, but every author must be associated with at least one psalm. An occasion is identified by a name and has a location. A single psalm can be used for multiple occasions, but it doesn t make sense to have an occasion without a psalm. (How boring would that be!) A psalm can describe one or more acts of God, and multiple psalms can describe a single act of God.

Design Project ERD Presentations Milestone II

Database Design Principles

Introduction Terminology review Relation scheme set of attributes for some relation (R, R 1, R 2 ) Relation the actual data stored in some relation scheme (r, r 1, r 2 ) Tuple a single actual row in the relation (t, t 1, t 2 ) Changes to the library database schema We make the following updates for this discussion Add the following attributes to the book relation copy_number a library can have multiple copies of a book accession_number unique number (ID) assigned to a copy of a book when the library acquires it New book and checked_out relation scheme Book( call_number, copy_number, accession_number, title, author ) Checked_out( borrower_id, call_number, copy_number, date_due )

The Art of Database Design Designing a database is a balancing act On the one extreme, you can have a universal relation (in which all attributes reside within a single relation scheme) Everything( borrower_id, last_name, first_name, // from borrower call_number, copy_number, accession_number, title, author // from book date_due // from checked_out ) Leads to numerous anomalies with changing data in the database

Break Up Relations with Decomposition Decomposition is the process of breaking up an original scheme into two or more schemes Each attribute of the original scheme appears in at least one of the new schemes But this can be taken too far Borrower( borrower_id, last_name, first_name ) Book( call_number, copy_number, accession_number, title, author ) Checked_out( date_due ) Leads to lossy-join problems

We Want Lossless-Join Decompositions Part of the middle ground in the balancing act Allows decomposition of the Everything relation Preserves connections between the tuples of the participating relations So that the natural join of the new relations = the original Everything relation Formal definition For some relation scheme R decomposed into two or more schemes (R 1, R 2, R n ) Where R = R 1 R 2 R n A lossless-join decomposition means that for every legal instance r of R decomposed into r 1, r 2, r n of R 1, R 2, and R n r = r 1 X r 2 X X r n

Database Design Goal: Create Good Relations We want to be able to determine whether a particular relation R is in good form. We ll talk about how to do this shortly In the case that a relation R is not in good form, decompose it into a set of relations {R 1, R 2,..., R n } such that each relation is in good form the decomposition is a lossless-join decomposition Our theory is based on: functional dependencies multivalued dependencies

Functional Dependency (FD) When the value of a certain set of attributes uniquely determines the value for another set of attributes Generalization of the notion of a key A way to find good relations A B (read: A determines B) Formal definition For some relation scheme R and attribute sets A (A R) and B (B R) A B if for any legal relation on R If there are two tuples t 1 and t 2 such that t 1 (A) = t 2 (A) It must be the case that t 2 (B) = t 2 (B)

Finding Functional Dependencies From keys of an entity From relationships between entities Implied functional dependencies

FDs from Entity Keys A BC

FDs from One to Many / Many to One Relationships A BC W XY A BCMWXY

FDs from One to One Relationships A BC W XY A BCMWXY W XYMABC

FDs from Many to Many Relationships A BC W XY AW M

Implied Functional Dependencies Initial set of FDs logically implies other FDs If A B and B C, then A C Closure If F is the set of functional dependencies we develop from the logic of the underlying reality Then F+ (the transitive closure of F) is the set consisting of all the dependencies of F, plus all the dependencies they imply

Rules for Computing F+ We can find F +, the closure of F, by repeatedly applying Armstrong s Axioms: if, then (reflexivity) Trivial dependency if, then (augmentation) if, and, then (transitivity) Additional rules (inferred from Armstrong s Axioms) If and, then (union) If, then and (decomposition) If and, then (pseudotransitivity)

Applying the Axioms R = (A, B, C, G, H, I) F = { A B A C CG H CG I B H} some members of F + A H by transitivity from A B and B H AG I by augmenting A C with G, to get AG CG and then transitivity with CG I CG HI by augmenting CG I to infer CG CGI, and augmenting of CG H to infer CGI HI, and then transitivity or by the union rule

Algorithm to Compute F+ To compute the closure of a set of functional dependencies F: F + = F repeat for each functional dependency f in F + apply reflexivity and augmentation rules on f add the resulting functional dependencies to F + for each pair of functional dependencies f 1 and f 2 in F + if f 1 and f 2 can be combined using transitivity then add the resulting functional dependency to F + until F + does not change any further

Algorithm to Compute the Closure of Attribute Sets Given a set of attributes, define the closure of under F (denoted by + ) as the set of attributes that are functionally determined by under F Algorithm to compute +, the closure of under F result := ; while (changes to result) do for each in F do begin if result then result := result end

Example of Attribute Set R = (A, B, C, G, H, I) F = {A B A C CG H CG I B H} Closure (AG) + 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGH (CG H and CG AGBC) 4. result = ABCGHI (CG I and CG AGBCH) Is AG a candidate key? 1. Is AG a super key? 1. Does AG R? == Is (AG) + R 2. Is any subset of AG a superkey? 1. Does A R? == Is (A) + R 2. Does G R? == Is (G) + R

Canonical Cover Sets of functional dependencies may have redundant dependencies that can be inferred from the others For example: A C is redundant in: {A B, B C, A C} Parts of a functional dependency may be redundant E.g.: on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} E.g.: on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D} Intuitively, a canonical cover of F is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies

Definition of Canonical Cover A canonical cover for F is a set of dependencies F c such that F logically implies all dependencies in F c, and F c logically implies all dependencies in F, and No functional dependency in F c contains an extraneous attribute, and Each left side of functional dependency in F c is unique. To compute a canonical cover for F: repeat Use the union rule to replace any dependencies in F 1 1 and 1 2 with 1 1 2 Find a functional dependency with an extraneous attribute either in or in /* Note: test for extraneous attributes done using F c, not F*/ If an extraneous attribute is found, delete it from until F does not change Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied

How to Find a Canonical Cover Another algorithm Write F as a set of dependencies where each has a single attribute on the right hand side Eliminate trivial dependencies In which and (reflexivity) Eliminate redundant dependencies (implied by other dependencies) Combine dependencies with the same left hand side For any given set of FDs, the canonical cover is not necessarily unique

Uses of Functional Dependencies Testing for lossless-join decomposition Testing for dependency preserving decompositions Defining keys

Testing for Lossless-Join Decomposition The closure of a set of FDs can be used to test if a decomposition is lossless-join For the case of R = (R 1, R 2 ), we require that for all possible relations r on schema R r = R1 (r ) R2 (r ) A decomposition of R into R 1 and R 2 is lossless join if at least one of the following dependencies is in F + : R 1 R 2 R 1 R 1 R 2 R 2 Does the intersection of the decomposition satisfy at least one FD?

Testing for Dependency Preserving Decompositions The closure of a set of FDs allows us to test a new tuple being inserted into a table to see if it satisfies all relevant FDs without having to do a join This is desirable because joins are expensive Let F i be the set of dependencies F + that include only attributes in R i. A decomposition is dependency preserving, if (F 1 F 2 F n ) + = F + If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive. The closure of a dependency preserving decomposition equals the closure of the original set Can all FDs be tested (either directly or by implication) without doing a join?

Keys and Functional Dependencies Given a relation scheme R with attribute set K R K is a superkey if K R K is a candidate key if there is no subset L of K such that L R A superkey with one attribute is always a candidate key Primary key is the candidate key K chosen by the designer Every relation must have a superkey (possibly the entire set of attributes) Key attribute an attribute that is or is part of a candidate key

Homework 3