Applied Databases. Sebastian Maneth. Lecture 5 ER Model, Normal Forms. University of Edinburgh - January 30 th, 2017

Similar documents
Applied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016

The data structures of the relational model Attributes and domains Relation schemas and database schemas

A database can be modeled as: + a collection of entities, + a set of relationships among entities.

Running Example Tables name location

Redundancy:Dependencies between attributes within a relation cause redundancy.

MIDTERM EXAMINATION Spring 2010 CS403- Database Management Systems (Session - 4) Ref No: Time: 60 min Marks: 38

Steps in normalisation. Steps in normalisation 7/15/2014

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Conceptual Data Modeling

8) A top-to-bottom relationship among the items in a database is established by a

Database Management System Prof. Partha Pratim Das Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

COMP Instructor: Dimitris Papadias WWW page:

Lecture 11 - Chapter 8 Relational Database Design Part 1

Normalization. Murali Mani. What and Why Normalization? To remove potential redundancy in design

DATABASE MANAGEMENT SYSTEM SHORT QUESTIONS. QUESTION 1: What is database?

Lecture 5 STRUCTURED ANALYSIS. PB007 So(ware Engineering I Faculty of Informa:cs, Masaryk University Fall Bühnová, Sochor, Ráček

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Objectives Definition iti of terms List five properties of relations State two properties of candidate keys Define first, second, and third normal for

MTAT Introduction to Databases

FINAL EXAM REVIEW. CS121: Introduction to Relational Database Systems Fall 2018 Lecture 27

Database Systems. Normalization Lecture# 7

The Entity-Relationship Model (ER Model) - Part 2

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

BIRKBECK (University of London)

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Database Design Theory and Normalization. CS 377: Database Systems

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

Functional Dependencies & Normalization for Relational DBs. Truong Tuan Anh CSE-HCMUT

Northern India Engineering College, New Delhi Question Bank Database Management System. B. Tech. Mechanical & Automation Engineering V Semester

More on the Chen Notation

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Database Management

Relational Design 1 / 34

Relational Model. Rab Nawaz Jadoon DCS. Assistant Professor. Department of Computer Science. COMSATS IIT, Abbottabad Pakistan

Entity-Relationship Modelling. Entities Attributes Relationships Mapping Cardinality Keys Reduction of an E-R Diagram to Tables

Supplier-Parts-DB SNUM SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens

Schema Refinement: Dependencies and Normal Forms

UNIT 3 DATABASE DESIGN

Chapter 14 Outline. Normalization for Relational Databases: Outline. Chapter 14: Basics of Functional Dependencies and

Relational Design: Characteristics of Well-designed DB

6 February 2014 CSE-3421M Test #1 w/ answers p. 1 of 14. CSE-3421M Test #1. Design

DC62 Database management system JUNE 2013

Entity-Relationship Model &

Informal Design Guidelines for Relational Databases

Distributed Database Systems By Syed Bakhtawar Shah Abid Lecturer in Computer Science

Functional Dependencies and. Databases. 1 Informal Design Guidelines for Relational Databases. 4 General Normal Form Definitions (For Multiple Keys)

Learning outcomes. On successful completion of this unit you will: 1. Understand data models and database technologies.

Schema Refinement: Dependencies and Normal Forms

COSC Dr. Ramon Lawrence. Emp Relation

CS 338 Functional Dependencies

Database Management

Informatics 1: Data & Analysis

V. Database Design CS448/ How to obtain a good relational database schema

Chapter 4. The Relational Model

Chapter 2 Introduction to Relational Models

Chapter 14. Database Design Theory: Introduction to Normalization Using Functional and Multivalued Dependencies

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

CS403- Database Management Systems Solved Objective Midterm Papers For Preparation of Midterm Exam

Normalisation Chapter2 Contents

DATABASE MANAGEMENT SYSTEMS

پوهنتون کابل پوهنحی كمپيوترساینس پوهنیار محمد شعیب "زرین خیل"

Chapter 2: Entity-Relationship Model. Entity Sets. Entity Sets customer and loan. Attributes. Relationship Sets. A database can be modeled as:

Schema Refinement: Dependencies and Normal Forms

Normalization in DBMS

Normalization Rule. First Normal Form (1NF) Normalization rule are divided into following normal form. 1. First Normal Form. 2. Second Normal Form

Chapter 2: Entity-Relationship Model

NORMAL FORMS. CS121: Relational Databases Fall 2017 Lecture 18

DATABASE TECHNOLOGY - 1DL124

Databases. Dr. Richard E. Turner March 5, 2013

Relational Database Systems Part 01. Karine Reis Ferreira

Chapter 6: Entity-Relationship Model. E-R Diagrams

GUJARAT TECHNOLOGICAL UNIVERSITY


Database Applications (15-415)

The Entity Relationship Model

CS403- Database Management Systems Solved MCQS From Midterm Papers. CS403- Database Management Systems MIDTERM EXAMINATION - Spring 2010

Databases The theory of relational database design Lectures for m

Bachelor in Information Technology (BIT) O Term-End Examination

Brief History of SQL. Relational Database Management System. Popular Databases

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 4 - Schema Normalization

ACS-3902 Fall Ron McFadyen 3D21 Slides are based on chapter 5 (7 th edition) (chapter 3 in 6 th edition)

Lecture 4. Database design IV. INDs and 4NF Design wrapup

3 February 2011 CSE-3421M Test #1 p. 1 of 14. CSE-3421M Test #1. Design

Case Study: Lufthansa Cargo Database

CS425 Fall 2016 Boris Glavic Chapter 2: Intro to Relational Model

The Relational Model

Normal Forms. Winter Lecture 19

Database Normalization. (Olav Dæhli 2018)

The Relational Data Model

Chapter 2: Intro to Relational Model

A Deeper Look at Data Modeling. Shan-Hung Wu & DataLab CS, NTHU

Chapter 6. Advanced Data Modeling. Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel

Lectures 12: Design Theory I. 1. Normal forms & functional dependencies 2/19/2018. Today s Lecture. What you will learn about in this section

CMSC 424 Database design Lecture 3: Entity-Relationship Model. Book: Chap. 1 and 6. Mihai Pop

Database Normalization

Applied Databases. Sebastian Maneth. Lecture 7 Simple SQL Queries. University of Edinburgh - February 6 st, 2017

Lecture 14 of 42. E-R Diagrams, UML Notes: PS3 Notes, E-R Design. Thursday, 15 Feb 2007

Professional Knowledge Complete Theory IT Ebook For IBPS RRB SCALE-2 SBI IT IBPS IT INSURANCE SPECIALIST EXAM

Mapping ER Diagrams to. Relations (Cont d) Mapping ER Diagrams to. Exercise. Relations. Mapping ER Diagrams to Relations (Cont d) Exercise

Transcription:

Applied Databases Lecture 5 ER Model, Normal Forms Sebastian Maneth University of Edinburgh - January 30 th, 2017

Outline 2 1. Entity Relationship Model 2. Normal Forms

From Last Lecture 3 the Lecturer participation in Teaches is optional, so there is a 0 as minimum cardinality on the link between Lecturer and Teaches. the Student participation is mandatory, so the minimum cardinality is 1.

Keys and Superkeys 4 Superkey = Set of attributes of an entity type so that for each entity e of that type, the set of values of the attributes uniquely identifies e. e.g. a Person may be uniquely identified by { Name, NI# } Key = is a superkey which is minimal (aka Candidate Key ) e.g., a Person is uniquely identified by { NI# }. Prime Attribute = attribute that appears in a key Non-Prime Attribute = attribute that appears in no key Simple Key consists of one attribute Composite Key consists of more than one attribute

Primary Keys 5 Primary Key = a key that has been chosen as such by the database designer primary key guarantees logical access to every entity (attributes of a primary key are underlined)

Cyclic Relationship Type with Roles 6

Relationship Type with Attributes 7 Each Supplier Supplies a Part at a certain Price

Weak Entity Types 8 Weak Entity Type = an entity type that does not have sufficient attributes to form a primary key (double rectangle) depends on the existence of an identifying (or owner ) entity type (they have an identifying (ID) relationship double diamond) must have a discriminator (dashed underline) for distinguishing its entities E.g. in an employee database, Child entities exist only if their corresponding Parent employee entity exists. The primary key of a weak entity type is the combination of the primary key of its owner type and its discriminator.

Weak Entity Types 9

ISA Relationship Types 10 If entities of a type have special properties not shared by all entities, then this suggests two entity types with an ISA relationship between them AKA generalization / specialization (supertype / subtype rel.) E.g. an Employee ISA Person and a Student ISA Person If Employee ISA Person, then Employee inherits all attributes of Person.

ISA Relationships 11 Attributes of Employee: Name, Number, and Salary.

Informal Methods for ERD Construction 12

Example 13

2. Normal Forms 14

2. Normal Forms 15 Relational database design What is a good database design? Data How many tables? What goes into which table?

2. Normal Forms 16 Bad database design causes problems, e.g., redundancy (facts are stored more than once) inconsistency through update anomalies complexity of queries and constraints

2. Normal Forms 17 Bad database design causes problems, e.g., redundancy (facts are stored more than once) inconsistency through update anomalies complexity of queries and constraints We study several rules of thumb for good database design Normal Forms First Normal Form (1NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF)

Normal Forms 18 Relational Model attributes & domains: attribute A takes values from Dom(A) relation schemas and database schemas Schema (or table header ) of a relation (name) R set schema(r) of attributes (or column headers ) Database Schema set of relation names together with their schemas Book ISBN Title Price 0-321-32132-1 Balloon $34.00 0-55-123456-9 Main Street $22.95 0-123-45678-0 Ulysses $34.00 S = { Book } schema(book) = { ISBN, Price, Title } Database Instance D of S = set { T1,, Tn } of tables Tk (finite relations) of type schema(rk) = { A1,.., Am } 1-22-233700-0 Visual Basic $25.00 Tk fixes an order A1,.., Am If (v1,..,vm) is row of Tk, then vk in Dom(Ak)

Warning on NULL values 19 If (v1,..,vm) is row of Tk, then vk in Dom(Ak) With SQL implementations, this is not entirely correct. By ANSI specification of SQL: every column is NULLable by default. we only know that vk in Dom(Ak) union { NULL } NULL is a condition. It means, a value is unknown, missing, or irrelevant. Using NULLs can cause a lot of problems (from implementation to logic) Check articles on the web! Do not use NULLs!!! None of the normal forms we discuss allows NULLs!

Example Warning on NULL values 20 Name HasDog NI#... Alice 0 Bob 1. 0.... Steve 1. OK. (but could be inefficient)

Example Warning on NULL values 21 Name HasDog NI#... Alice 0 Bob 1. 0.... Steve 1. OK. (but could be inefficient) Name DogName NI#... Alice NULL Bob Einstein. NULL.... Steve Rex. Not OK. create new table(s) e.g. DogName

Warning on Duplicate Rows 22 In relational algebra, duplicate rows are not permitted. In SQL, they are permitted. be careful about this do not design tables that contain duplicates (if you need them, administer them in a different way) do not design queries that return duplicates (unless that is really what you want! often it is better to return a histogram)

Duplicate Rows 23 In relational algebra, duplicate rows are not permitted. In SQL, they are permitted. be careful about this do not design tables that contain duplicates (if you need them, administer them in a different way) do not design queries that return duplicates (unless that is really what you want! often it is better to return a histogram) We would like that the set of all attributes is a trivial superkey! Then: every table has a PRIMARY KEY.

SQL 24 SQL queries have Multiset Semantics, i.e., answers contain duplicates. SELECT Height FROM population; 1.83 1.83 1.75 1.83 Unless you use Set Operators (UNION, DIFFERENCE, INTERSECT, etc) (or the DISTINCT operator) duplicates are removed!

Duplicate Rows 25 mysql> create table col (Number int, Color text); mysql> insert into col values(1,"red"); mysql> insert into col values(1,"red"); mysql> select * from col; +--------+-------+ Number Color +--------+-------+ 1 red 1 red +--------+-------+

Duplicate Rows 26 mysql> create table col (Number int, Color text, primary key (Number, Color)); ERROR 1170 (42000): BLOB/TEXT column 'Color' used in key specification without a key length mysql> create table col (Number int, Color varchar(1000), primary key (Number, Color)); ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes mysql> create table col (Number int, Color varchar(100), primary key (Number, Color)); mysql> insert into col values(1, red ); mysql> insert into col values(1, red ); ERROR 1062 (23000): Duplicate entry '1-red' for key 'PRIMARY'

Duplicate Rows 27 mysql> create table col (Number int, Color text, primary key (Number, Color)); ERROR 1170 (42000): BLOB/TEXT column 'Color' used in key specification without a key length mysql> create table col (Number int, Color varchar(1000), primary key (Number, Color)); ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes mysql> create table col (Number int, Color varchar(100), primary key (Number, Color)); mysql> insert into col values(1, red ); mysql> insert into col values(1, red ); ERROR 1062 (23000): Duplicate entry '1-red' for key 'PRIMARY' when your MySQL script loads csv-files via LOAD DATA, then you do not see these error messages!!!

Duplicate Rows 28 depending on your PRIMARY KEY, the LOADer of MySQL will silently eliminate duplicates for you this is bad practise if you do it, you loose points on Assignment 1! when your MySQL script loads csv-files via LOAD DATA, then you do not see these error messages!!!

Normal Forms 29 Normal Forms prevent modification anomalies prevent data inconsistency make tables less redundant while preserving information (and dependencies) Modification anomalies: same information present in multiple rows. Partial updates may result in inconsistent table (i.e., providing conflicting anwers) update anomaly certain facts cannot be recorded at all insertion anomaly deletion of data representing a fact may necessitate deletion of other completely different facts deletion anomaly

Normal Forms 30 1NF 2NF 3NF BCNF 4NF 5NF DKNF

Normal Forms 31 1NF 2NF 3NF BCNF 4NF = Choose Appropriate Data Types = Do Not Represent the Same Fact Twice = Do not Store Unrelated Information in the Same Relation

Normalization 32 Normalization = process of bringing a given database into a given normal form Typically, normalization is achieved by decomposing tables into smaller tables. Decomposition in turn is realized via projections.

First Normal Form (1NF) 33 for every attribute A, Dom(A) contains only atomic (indivisible) values value of each attribute contains only a single value from the domain [Codd,1971] most database systems do not allow to define tables inside tables you could insert long strings (comma separated) Never do that! ISBN Title AuName AuPhone 0-321-32132-1 Balloon Sleepy, Snoopy, Grumpy 321-321-1111, 232-234-1234, 665-235-6532 0-55-123456-9 Main Street Jones, Smith 123-333-3333, 654-223-3455 0-123-45678-0 Ulysses Joyce 666-666-6666 1-22-233700-0 Visual Basic Roman 444-444-4444

First Normal Form (1NF) 34 Bring a table into 1NF through decomposition: 1) place all items of a repeating group into new table 2) duplicate in new table the primary key of the original table ISBN Title AuName AuPhone ISBN AuName AuPhone 0-321-32132-1 Balloon Sleepy, Snoopy, Grumpy 0-55-123456-9 Main Street Jones, Smith 321-321-1111, 232-234-1234, 665-235-6532 123-333-3333, 654-223-3455 0-123-45678-0 Ulysses Joyce 666-666-6666 1-22-233700-0 Visual Basic Roman 444-444-4444 0-321-32132-1 Sleepy 321-321-1111 0-321-32132-1 Snoopy 232-234-1234 0-321-32132-1 Grumpy 665-235-6532 0-55-123456-9 Jones 123-333-3333 0-55-123456-9 Smith 654-223-3455 ISBN Title 0-123-45678-0 Joyce 666-666-6666 1-22-233700-0 Roman 444-444-4444 0-321-32132-1 Balloon 0-55-123456-9 Main Street 0-123-45678-0 Ulysses 1-22-233700-0 Visual Basic

First Normal Form (1NF) 35 for every attribute A, Dom(A) contains only atomic (indivisible) values value of each attribute contains only a single value from the domain [Codd,1971] atomic value = value that cannot be decomposed Problematic Character string? Fixed-point number? ISBN (includes language and publisher identifiers)? C. J. Date: The notion of atomicity has no absolute meaning

Table versus Relation 36 Chris Date, What First Normal Form Really Means (2000) A table is in 1NF if and only if it is isomorphic to some relation, specifically: 1. There is no top-to-bottom ordering on the rows. 2. There is no left-to-right ordering on the columns. 3. There are no duplicate rows. 4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else). 5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

Second Normal Form (2NF) 37 A table is in 2NF, if [Codd,1971] it is in 1NF every non-prime attribute depends on the whole of every candidate key Example (Not 2NF) Schema(R) = {City, Street, HouseNumber, HouseColor, CityPopulation} 1. {City, Street, HouseNumber} {HouseColor} 2. {City} {CityPopulation} functional dependency 3. CityPopulation is non prime 4. CityPopulation depends on { City } which is NOT the whole of the (unique) candidate key {City, Street, HouseNumber} Functional dependency D E: for every D-tuple, there is at most one E-tuple E (functionally) depends on D

Second Normal Form (2NF) 38 do you see the potential Redundancy in this table? Example (Not 2NF) Schema(R) = {City, Street, HouseNumber, HouseColor, CityPopulation} 1. {City, Street, HouseNumber} {HouseColor} 2. {City} {CityPopulation} functional dependency 3. CityPopulation is non prime 4. CityPopulation depends on { City } which is NOT the whole of the (unique) candidate key {City, Street, HouseNumber} Functional dependency D E: for every D-tuple, there is at most one E-tuple E (functionally) depends on D

Second Normal Form (2NF) 39 Bring a 1NF table into 2NF move an attribute depending on a strict subset of a candidate key into a new table, together with this strict subset the strict subset becomes the key of the new table Example (Convert to 2NF) Old Schema {City, Street, HouseNumber, HouseColor, CityPopulation} New Schema {City, Street, HouseNumber, HouseColor} New Schema {City, CityPopulation}

40 END Lecture 5