Ryan Marcotte CS 475 (Advanced Topics in Databases) March 14, 2011


Ryan Marcotte www.cs.uregina.ca/~marcottr CS 475 (Advanced Topics in Databases) March 14, 2011

Outline
- Introduction to XNF and motivation for its creation
- Analysis of XNF's link to BCNF
- Algorithm for converting a DTD to XNF
- Example


Introduction
- XML is used for data storage and exchange
- Data is stored in a hierarchical fashion
- Duplicates and inconsistencies may exist in the data store

Introduction
- Relational databases store data according to some schema
- XML also stores data according to some schema, such as a Document Type Definition (DTD)
- Some schemas are better than others
- A normal form is needed that eliminates redundancy, which reduces storage and keeps the data consistent

Introduction
- XNF was proposed by Marcelo Arenas and Leonid Libkin (University of Toronto) in a 2004 paper titled "A Normal Form for XML Documents"
- The authors recognized a need for good XML data design, as a lot of data is being put on the web
- "Once massive web databases are created, it is very hard to change their organization; thus, there is a risk of having large amounts of widely accessible, but at the same time poorly organized legacy data."

Introduction
- XNF provides a set of rules that describe well-designed DTDs
- Poorly designed DTDs can be transformed into well-designed ones (through normalization, just like relational databases!)
- Well-designed DTDs avoid redundancies and update anomalies


Review of Basic Terms
- Recall the definition of functional dependencies (FDs)
- Given a relation schema R, a set of attributes X is said to functionally determine another set of attributes Y (also in R), written X → Y, if and only if each value of X is associated with exactly one value of Y
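The definition above can be checked directly against data. The following is a minimal sketch, assuming a relation is represented as a list of Python dictionaries; the function name fd_holds and the sample rows are illustrative and do not come from the slides.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the FD lhs -> rhs holds in the given rows."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first time we see x we remember its y; afterwards y must match.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data: two students who share a first name.
students = [
    {"sid": 1, "first_name": "Paul", "age": 20},
    {"sid": 2, "first_name": "Paul", "age": 22},
]

sid_determines_rest = fd_holds(students, ["sid"], ["first_name", "age"])
name_determines_age = fd_holds(students, ["first_name"], ["age"])
print(sid_determines_rest, name_determines_age)
```

Note that a check like this can only refute an FD for a particular instance; whether an FD holds for the schema is a design decision.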

Review of Basic Terms
- F+ is the closure of a set of FDs F, derived using Armstrong's axioms:
  - reflexivity (if Y ⊆ X, then X → Y)
  - augmentation (if X → Y, then XZ → YZ)
  - transitivity (if X → Y and Y → Z, then X → Z)
- Every set of FDs has a canonical cover (a minimal set of FDs such that all other FDs can be derived using the above axioms)
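In practice F+ is rarely enumerated; instead, one computes the closure of an attribute set and uses it to test whether a given FD is implied. The sketch below uses the standard attribute-closure technique rather than anything specific to the slides; the names closure and implies and the FDs A → B, B → C are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left-hand side, we gain the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """Return True if lhs -> rhs is in the closure F+ of fds."""
    return set(rhs) <= closure(lhs, fds)


# Illustrative FDs: A -> B and B -> C.
fds = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
]
print(sorted(closure({"A"}, fds)))
print(implies(fds, {"A"}, {"C"}), implies(fds, {"C"}, {"A"}))
```

Here A → C follows by transitivity, while C → A is not implied.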

Review of Basic Terms
- An element represents a node in the XML tree and includes everything from its start tag to its end tag
- An attribute provides additional information about an element; in paths, attribute names are written with a leading @
- A path in an XML document is a sequence of element names separated by periods, ending with an element name or an attribute name

Review of Basic Terms
    <!DOCTYPE student_list [
      <!ELEMENT student_list (student)*>
      <!ELEMENT student (first_name, last_name)>
      <!ELEMENT first_name (#PCDATA)>
      <!ELEMENT last_name (#PCDATA)>
      <!ATTLIST student id CDATA #REQUIRED>
    ]>
For example:
    student_list.student.first_name.S
    student_list.student.@id
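To make the path notation concrete, here is a small sketch that lists the paths occurring in an XML document, using Python's standard xml.etree.ElementTree module. The function name document_paths and the sample document (including the last name Smith) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set


def document_paths(xml_text: str) -> Set[str]:
    """Collect dotted paths; attributes get a leading @, text content ends in .S."""
    found: Set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}.{elem.tag}" if prefix else elem.tag
        found.add(path)
        for name in elem.attrib:
            found.add(f"{path}.@{name}")
        # Only non-whitespace text counts as string content.
        if elem.text and elem.text.strip():
            found.add(f"{path}.S")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return found


# Hypothetical document conforming to the student_list DTD.
SAMPLE = """
<student_list>
  <student id="s1">
    <first_name>Paul</first_name>
    <last_name>Smith</last_name>
  </student>
</student_list>
"""

paths = document_paths(SAMPLE)
for p in sorted(paths):
    print(p)
```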

Review of Basic Terms
- The term S represents a string value (corresponding to the #PCDATA keyword in the DTD)
- For example, if the element name is first_name and the element is <first_name>Paul</first_name>, then S = Paul

Review of Basic Terms
- Redundancy occurs when data corresponding to a single element is stored more than once
- Update anomalies take two forms:
  - Because data for an element is stored multiple times, updating only one copy creates an inconsistency
  - Removing an element may remove the only copy of its data from the document entirely
- Examples of the above will be given later in the presentation

Boyce-Codd Normal Form
- A relation schema is in BCNF if and only if for every one of its nontrivial FDs X → Y, X is a superkey (X is either a candidate key or a superset thereof)
- Put simply, each value of X identifies exactly one tuple, so its Y value is stored only once (no redundancy)
- Note that the number of attributes in the key X should be minimized for ease of identification among individual tuples

Boyce-Codd Normal Form
Examples:
    sid, first_name, last_name → age (BAD: not minimum size)
    sid → first_name, last_name, age (GOOD: only one attribute)
    cid → course_name, semester_offered
    course_name → course_description
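The superkey test behind BCNF can be sketched with attribute closure. The example below assumes, purely for illustration, that the last two FDs above belong to a single relation course(cid, course_name, semester_offered, course_description); the slides do not state this, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes: Iterable[str], fds: List[FD]) -> List[FD]:
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if not all_attrs <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


# Assumed single relation holding both course FDs from the slide.
course_attrs = ["cid", "course_name", "semester_offered", "course_description"]
course_fds = [
    (frozenset({"cid"}), frozenset({"course_name", "semester_offered"})),
    (frozenset({"course_name"}), frozenset({"course_description"})),
]

violations = bcnf_violations(course_attrs, course_fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under that assumption, cid is a key, but course_name is not a superkey, so course_name → course_description is the FD a BCNF decomposition would split off.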

XNF Versus BCNF
- XNF generalizes Boyce-Codd Normal Form
- XNF disallows redundancy-causing FDs

XML Normal Form
- Let P1 and P2 be paths in an XML document
- A DTD D with its set of FDs F is in XNF if and only if, for every nontrivial FD in F+ of the form P1 → P2.@a (where @a is an attribute) or P1 → P2.S (where S is the string content of an element), the FD P1 → P2 is also in F+

XML Normal Form
- In layman's terms, each value of P1 determines exactly one node P2, so the data stored under P2 appears only once
- This is remarkably similar to our definition of BCNF!
- In fact, a relational database schema is in BCNF if and only if its XML schema equivalent is in XNF (this will not be proven here)

XML Normal Form
    <!DOCTYPE student_list [
      <!ELEMENT student_list (student)*>
      <!ELEMENT student (first_name, last_name)>
      <!ELEMENT first_name (#PCDATA)>
      <!ELEMENT last_name (#PCDATA)>
      <!ATTLIST student id CDATA #REQUIRED>
    ]>
FD:
    student_list.student.@id → student_list.student.first_name.S, student_list.student.last_name.S

Relational Schema to XML
- Let R be a relation over attributes A, B, C
- The schema R(A, B, C) with FD A → BC translates to:
    <!ELEMENT db (G*)>
    <!ELEMENT G EMPTY>
    <!ATTLIST G
      A CDATA #REQUIRED
      B CDATA #REQUIRED
      C CDATA #REQUIRED>
- ... with FD db.G.@A → db.G.@B, db.G.@C
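The translation above is mechanical, so it can be sketched in a few lines. The helper names relation_to_dtd and fd_to_paths are hypothetical, and the defaults db and G simply follow the element names used on the slide.

```python
from typing import List, Sequence, Tuple


def relation_to_dtd(attributes: Sequence[str], element: str = "G", root: str = "db") -> str:
    """Encode a relation as a root element holding empty tuple elements."""
    attlist = "\n".join(f"  {a} CDATA #REQUIRED" for a in attributes)
    return "\n".join([
        f"<!ELEMENT {root} ({element}*)>",
        f"<!ELEMENT {element} EMPTY>",
        f"<!ATTLIST {element}\n{attlist}>",
    ])


def fd_to_paths(lhs: Sequence[str], rhs: Sequence[str],
                element: str = "G", root: str = "db") -> Tuple[List[str], List[str]]:
    """Rewrite a relational FD over attribute names as an FD over XML paths."""
    def path(attr: str) -> str:
        return f"{root}.{element}.@{attr}"

    return [path(a) for a in lhs], [path(a) for a in rhs]


dtd = relation_to_dtd(["A", "B", "C"])
xml_lhs, xml_rhs = fd_to_paths(["A"], ["B", "C"])
print(dtd)
print(", ".join(xml_lhs), "->", ", ".join(xml_rhs))
```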


Usage
- The following algorithm should be applied in the design stage of XML database creation
- Once data exists in the XML database, modifying the schema can be very tedious and difficult, and modifications made by hand may introduce errors

Assumptions
- DTDs are assumed to be nonrecursive (recursive DTDs lead to an infinite number of paths)
- Note that recursion can still be allowed: FDs only mention a finite number of paths, so we can restrict our attention to a finite number of unfoldings of the recursive rules
- FDs are assumed to have at least one element path on the left-hand side of the rule (that is, FDs are of the form { p, p1.@a1, p1.@a2, ..., p1.@an } → q)

Basic Operations
- Move attributes / child elements from an existing element to another one
- Create a new element type

Algorithm
Given a DTD D and set of FDs F:
- If (D, F) is in XNF, return
- Otherwise, find an anomalous FD and use the two basic operations to modify D to eliminate the anomalous FD
- Repeat the above; the first step causes the algorithm to terminate once (D, F) is in XNF

Algorithm
Just like other normalization algorithms (for 1NF, 2NF, 3NF, and BCNF), the algorithm:
- Is simple
- Decomposes the schema into separate data structures (tables for relational databases, trees for XML)
- Preserves the FDs (it is lossless)
- Always terminates (this will not be proven here)


Example Schema
    <!DOCTYPE courses [
      <!ELEMENT courses (course*)>
      <!ELEMENT course (title, taken_by)>
      <!ATTLIST course cno CDATA #REQUIRED>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT taken_by (student*)>
      <!ELEMENT student (name, grade)>
      <!ATTLIST student sid CDATA #REQUIRED>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT grade (#PCDATA)>
    ]>
FDs:
    courses.course.@cno → courses.course
    { courses.course, courses.course.taken_by.student.@sid } → courses.course.taken_by.student
    courses.course.taken_by.student.@sid → courses.course.taken_by.student.name.S

Example Schema
The previous FDs enforce the following constraints:
- A course ID uniquely identifies a course
- Two distinct students of the same course cannot have the same student ID
- Two students with the same student ID must have the same name

Schema Problems
Or do they? Consider the third FD:
    courses.course.taken_by.student.@sid → courses.course.taken_by.student.name.S
By XNF, the following must hold:
    courses.course.taken_by.student.@sid → courses.course.taken_by.student.name
It does not. Why?

Schema Problems
A single @sid can identify two distinct name nodes in the document tree!

Schema Problems
- The third FD violates XNF under the current schema
- This is because a copy of the name element is stored for every course in which a given @sid appears; changing the value in one place therefore introduces an inconsistency
- Also, deleting student information from a course could remove that student from the database if only one copy of that student's information exists
- The above two points are examples of update anomalies
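The difference between the value-level FD (ending in .S) and the node-level FD that XNF demands can be observed on a small document. The sketch below uses Python's standard xml.etree.ElementTree; the document contents (course numbers, titles, the name Deere, grades) are made-up sample data, and name_nodes_by_sid is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up document: the same student (sid st1) is enrolled in two courses.
DOC = """
<courses>
  <course cno="c1">
    <title>Databases</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>A</grade></student>
    </taken_by>
  </course>
  <course cno="c2">
    <title>Algorithms</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>B</grade></student>
    </taken_by>
  </course>
</courses>
"""


def name_nodes_by_sid(root: ET.Element) -> Dict[str, List[ET.Element]]:
    """Group the name nodes in the tree by the @sid of their student parent."""
    groups: Dict[str, List[ET.Element]] = {}
    for student in root.iter("student"):
        groups.setdefault(student.get("sid"), []).append(student.find("name"))
    return groups


groups = name_nodes_by_sid(ET.fromstring(DOC))

# @sid -> name.S: every name node for a given sid carries the same string.
value_fd_holds = all(len({n.text for n in nodes}) == 1 for nodes in groups.values())
# @sid -> name: a given sid would have to identify a single name node.
node_fd_holds = all(len(nodes) == 1 for nodes in groups.values())
print(value_fd_holds, node_fd_holds)

# Update anomaly: editing one copy of the name breaks the value-level FD.
groups["st1"][0].text = "Deer"
consistent_after_update = all(
    len({n.text for n in nodes}) == 1 for nodes in groups.values()
)
print(consistent_after_update)
```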

Using the Algorithm
- Fix by creating a new element type student_info with @sid as its key
- Move the name element from the student element to the student_info element
- Though it is not part of the algorithm, we also rename the root element from courses to db (database) to better reflect the intended semantics

Using the Algorithm
    <!DOCTYPE db [
      <!ELEMENT db (course*, student_info*)>
      <!ELEMENT course (title, taken_by)>
      <!ATTLIST course cno CDATA #REQUIRED>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT taken_by (student*)>
      <!ELEMENT student (grade)>
      <!ATTLIST student sid CDATA #REQUIRED>
      <!ELEMENT grade (#PCDATA)>
      <!ELEMENT student_info (name)>
      <!ATTLIST student_info sid CDATA #REQUIRED>
      <!ELEMENT name (#PCDATA)>
    ]>
FDs:
    db.course.@cno → db.course
    { db.course, db.course.taken_by.student.@sid } → db.course.taken_by.student
    db.student_info.@sid → db.student_info.name.S
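The effect of the two basic operations on an actual document can be sketched as a one-off transformation. This is only an illustration for this particular example, not the general algorithm from the paper; the function name normalize and the sample data are assumptions, and the code relies on Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Made-up source document in the original (courses) schema.
SOURCE = """
<courses>
  <course cno="c1">
    <title>Databases</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>A</grade></student>
    </taken_by>
  </course>
  <course cno="c2">
    <title>Algorithms</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>B</grade></student>
    </taken_by>
  </course>
</courses>
"""


def normalize(courses_root: ET.Element) -> ET.Element:
    """Move each student's name into a single student_info element keyed by sid."""
    db = ET.Element("db")
    names: Dict[str, str] = {}
    for course in courses_root.findall("course"):
        new_course = ET.SubElement(db, "course", cno=course.get("cno"))
        ET.SubElement(new_course, "title").text = course.findtext("title")
        taken_by = ET.SubElement(new_course, "taken_by")
        for student in course.findall("taken_by/student"):
            sid = student.get("sid")
            # Keep the first name seen for each sid; the FD says all copies agree.
            names.setdefault(sid, student.findtext("name"))
            new_student = ET.SubElement(taken_by, "student", sid=sid)
            ET.SubElement(new_student, "grade").text = student.findtext("grade")
    # student_info elements follow the courses, as the content model requires.
    for sid, name in names.items():
        info = ET.SubElement(db, "student_info", sid=sid)
        ET.SubElement(info, "name").text = name
    return db


db = normalize(ET.fromstring(SOURCE))
print(ET.tostring(db, encoding="unicode"))
```

After the transformation each name is stored once, so renaming a student touches a single node, and removing a student from a course no longer deletes the name.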

Using the Algorithm
- No additional anomalous FDs exist; the schema is in XNF
- FDs have been preserved

Do you have any questions?

Resources
- Marcelo Arenas and Leonid Libkin (University of Toronto), "A Normal Form for XML Documents". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.7590&rep=rep1&type=pdf