Lecture 4 The Entity-Relationship Model (ER Model) - Part 2 By Michael Hahsler Based on slides for CS145 Introduction to Databases (Stanford)
Lecture 4 > Section 2 What you will learn about in this section 1. Relationships: multiplicity, multi-way 2. Design considerations 3. Conversion to SQL 4. Advanced concepts 2
Lecture 4 > Section 2 > Relationships- Multiplicity & Multi-way Multiplicity of ER Relationships One-to-one: X 1 2 3 Y a b c d X 1 1 Y Using Chen s Notation Many-to-one: 1 2 3 a b c d X N 1 Y One-to-many: 1 2 3 a b c d X 1 N Y 1 2 3 a b c d Many-to-many: X Y N M 3
Lecture 4 > Section 1 > Relationships Multiplicity of ER Relationships name category name price Product N 1 Makes Company How to read a relationship in both directions: 1. A product is made by a one company 2. A company makes many product 4
Lecture 4 > Section 2 > Relationships- Multiplicity & Multi-way name category name price Product N 1 makes Company 1 stockprice No specified cardinality often means N:M, or we do not want to decide, yet. buys employs Person address name ssn N What does this say? 5
Lecture 4 > Section 2 > Relationships- Multiplicity & Multi-way Multi-way Relationships How do we model A person buys a product in a store? Product Purchase Store Person 6
Lecture 4 > Section 2 > Relationships- Multiplicity & Multi-way Multiplicity in Multiway Relationships Q: What do the 1s and the N mean? Product N Purchase 1 Store 1 Person 7
Lecture 4 > Section 2 > Relationships- Multiplicity & Multi-way Multiplicity in Multiway Relationships Better: many to many to many relationship Product N Purchase N Store N Person 8
Lecture 4 > Section 2 > Design decisions Conversion of Multi-way Relationship to New Entity + Binary Relationships? Multi-way Relationship Entity + Binary Product ID date N ProductOf 1 Product Purchase Store Purchase N StoreOf 1 Store Person Multiple purchases per (product, store, person) possible here! N BuyerOf 1 Person 9
Lecture 4 > Section 2 > Design decisions 3. Design Principles What s wrong with these examples? Product 1 Purchase N Person Country President Person 10
Lecture 4 > Section 2 > Design decisions Design Principles: What s Wrong? Product date Purchase Store personaddr personname 11
Lecture 4 > Section 2 > Design decisions Design Principles: What s Wrong? - Fixed Product N date M Purchase N 1 Store N 1 Person 12
Lecture 4 > Section 2 > Design decisions Examples: Entity vs. Attribute Should address be an attribute? Or an entity? Addr 1 Addr 2 Employee Street Addr Address AddrOf 1 N Employee ZIP 13
Lecture 4 > Section 2 > Design decisions Examples: Entity vs. Attribute Should address be an attribute? Addr 1 Addr 2 Employee How do we handle employees with more than two addresses? How do we handle addresses where internal structure of the address (e.g. zip code, state) is useful? 14
Lecture 4 > Section 2 > Design decisions Examples: Entity vs. Attribute Use an entity Street Addr ZIP Address N AddrOf 1 Employee In general, when we want to record several values, we choose a separate entity. 15
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema Key concept: Both Entity sets and Relationships become relations (tables in RDBMS) 16
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema An entity set becomes a relation (multiset of tuples / table) name price Product category Each tuple is one entity Each tuple is composed of the entity s attributes, and has the same primary key Product name price category Gizmo1 99.99 Camera Gizmo2 19.99 Edible 17
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema price category name CREATE TABLE Product( name CHAR(50) PRIMARY KEY, price DOUBLE, category VARCHAR(30) ) Product Product name price category Gizmo1 99.99 Camera Gizmo2 19.99 Edible 18
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema (N:M) A relation between entity sets A 1,, A N also becomes a multiset of tuples / a table price name category Product N date Purchased M firstname Person lastname Each row/tuple is one relation, i.e. one unique combination of entities (a 1,,a N ) Each row/tuple is composed of the union of the entity sets attributes has the entities primary keys as foreign keys has the union of the entity sets keys as primary key Purchased name firstname lastname date Gizmo1 Bob Joe 01/01/15 Gizmo2 Joe Bob 01/03/15 Gizmo1 JoeBob Smith 01/05/15 19
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema (N:M) CREATE TABLE Purchased( name CHAR(50), firstname CHAR(50), lastname CHAR(50), date DATE, PRIMARY KEY (name, firstname, lastname), FOREIGN KEY (name) REFERENCES Product, FOREIGN KEY (firstname, lastname) REFERENCES Person ) price name category Product Purchased N date Purchased M firstname lastname Person name firstname lastname date Gizmo1 Bob Joe 01/01/15 Gizmo2 Joe Bob 01/03/15 Gizmo1 JoeBob Smith 01/05/15 20
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema (1:N) A 1:N relationship can be implemented without an extra table. Add the primary key of the 1 side to the table for the N side entity. ID name Customer 1 N has ID Address Street Number ZIPCode Address ID Number Street ZIPCode CustID 1 123 Main St 75000 1 2 6660 Willow Dr 86123 1 3 1 Nowhere Pl 99999-1234 2 21
Lecture 4 > Section 2 > Conversion to SQL From ER Diagrams to Relational Schema (1:N) CREATE TABLE Address( ID CHAR(50), Number CHAR(50), Street CHAR(50), ZIPCode CHAR(10), PRIMARY KEY (ID), FOREIGN KEY (CustID) REFERENCES Customer, ) name ID Customer Address 1 N has ID Street Address Number ZIPCode ID Number Street ZIPCode CustID 1 123 Main St 75000 1 2 6660 Willow Dr 86123 1 3 1 Nowhere Pl 99999-1234 2 22
Lecture 4 > Section 2 > Conversion to SQL From ER Diagram to Relational Schema How do we represent this as a relational schema? name category date firstname lastname price Product Purchased Person name address Store 23
Alternative Notations
Lecture 4 > Section 2 > ACTIVITY Add Multiplicity to your ER diagram Also make sure to add (new concepts underlined): A player can only belong to one team, a play can only be in one game, a pass/run..? Multiple players Tackle a single person in a play Players can achieve a Personal Record linked to a specific Game and Play Players have a weight which changes in on vs. off-season 25
Lecture 4 > Section 3 3. Advanced ER Concepts 26
Lecture 4 > Section 3 What you will learn about in this section 1. Subclasses 2. Constraints 3. Weak entity sets 4. Normal Forms 27
Lecture 4 > Section 3 > Subclasses & OO Modeling Subclasses Some objects in a class may be special, i.e. worthy of their own class Define a new class? But what if we want to maintain connection to current class? Better: define a subclass Ex: Products Software products Educational products We can define subclasses in ER! 28
Lecture 4 > Section 3 > Subclasses & OO Modeling Subclasses name price Product isa Child subclasses contain all the attributes of all of their parent classes plus the new attributes shown attached to them in the ER diagram Software Product Educational Product platforms agegroup 29
Lecture 4 > Section 3 > Subclasses & OO Understanding Subclasses Think in terms of records; ex: Software Product price name Product isa Educational Product Product SoftwareProduct name price name price platforms Child subclasses contain all the attributes of all of their parent classes plus the new attributes shown attached to them in the ER diagram platforms agegroup EducationalProduct name price agegroup 30
Lecture 4 > Section 3 > Subclasses & OO Think like tables Product name price category Gizmo 99 gadget name Camera 49 photo price Toy 39 gadget Product Sw.Product name platforms isa Ed.Product Gizmo unix Software Product Educational Product name agegroup platforms agegroup Gizmo todler Toy retired 31
Lecture 4 > Section 3 > Subclasses & OO IsA Review If we declare A IsA B then every A is a B We use IsA to Add descriptive attributes to a subclass To identify entities that participate in a relationship No need for multiple inheritance 32
Lecture 4 > Section 3 > Subclasses & OO Modeling UnionTypes With Subclasses Person FurniturePiece Company Suppose each piece of furniture is owned either by a person, or by a company. How do we represent this? 33
Lecture 4 > Section 3 > Subclasses & OO Modeling Union Types with Subclasses Say: each piece of furniture is owned either by a person, or by a company Solution 1. Acceptable, but imperfect (What s wrong?) Person FurniturePiece Company 1 N N 1 ownedbyperson ownedbycomp 34
Lecture 4 > Section 3 > Subclasses & OO Modeling Union Types with Subclasses Solution 2: better (though more laborious) FurniturePiece N What is happening here? ownedby Person 1 Owner Company isa 35
Lecture 4 > Section 3 > Constraints Constraints in ER Diagrams Finding constraints is part of the ER modeling process. Commonly used constraints are: Keys: Implicit constraints on uniqueness of entities Ex: An SSN uniquely identifies a person Single-value constraints: Ex: a person can have only one father Referential integrity constraints: Referenced entities must exist Ex: if you work for a company, it must exist in the database Other constraints: Ex: peoples ages are between 0 and 150 Recall FOREIGN KEYs! 36
Lecture 4 > Section 3 > Constraints Participation Constraints: Partial v. Total Product makes Company Are there products made by no company? Companies that don t make a product? Product makes Company Double line indicates total participation (i.e. here: all products are made by a company) 37
Lecture 4 > Section 3 > Constraints Single Value Constraints N 1 makes vs. makes 38
Lecture 4 > Section 3 > Constraints Referential Integrity Express that each product is made by exactly one company: Total Participation + Multiplicity Product N 1 makes Company Each product made by at least one company. Each product made by at most one company. 39
Lecture 4 > Section 3 > Constraints Keys in ER Diagrams Underline keys: name category price Product Note: no formal way to specify multiple keys in ER diagrams Person address name ssn 40
Lecture 4 > Section 3 > Weak Entity Sets Weak Entity Sets Entity sets are weak when their key comes from other classes to which they are related. Team N 1 affiliation University sport number name Football team v. The Stanford Football team (E.g., Berkeley has a football team too, sort of) 41
Lecture 4 > Section 3 > Weak Entity Sets Weak Entity Sets Entity sets are weak when their key comes from other classes to which they are related. Team N 1 affiliation University sport number name number is a partial key. (denote with dashed underline). University is called the identifying owner. 42 Participation in affiliation must be total. Why?
Lecture 4 > Summary ER Summary ER diagrams are a visual syntax that allows technical and non-technical people to talk For conceptual design Basic constructs: entity, relationship, and attributes A good design is faithful to the constraints of the application, but not overzealous 43
Lecture 5 > Section 1 > Overview Design Theory Design theory is about how to represent your data to avoid anomalies. It is a mostly mechanical process Tools can carry out routine portions We have a notebook implementing all algorithms! We ll play with it in the activities!
Lecture 5 > Section 1 > Overview Normal Forms 1 st Normal Form (1NF) = All tables are flat 2 nd Normal Form = disused Boyce-Codd Normal Form (BCNF) 3 rd Normal Form (3NF) DB designs based on functional dependencies, intended to prevent data anomalies 4 th and 5 th Normal Forms
Lecture 5 > Section 1 > Overview 1 st Normal Form (1NF) Student Mary Joe Courses {CS145,CS229} {CS145,CS106} Student Mary Mary Joe Joe Courses CS145 CS229 CS145 CS106 In 1 st NF 1NF: Entries have to be single values not a list!
Lecture 5 > Section 1 > Data anomalies & constraints Issues with 1NF Student Course Room Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... If every course is in only one room, contains redundant information!
Lecture 5 > Section 1 > Data anomalies & constraints Issues with 1NF Student Course Room Mary CS145 B01 Joe CS145 C12 Sam CS145 B01...... If we update the room number for one tuple, we get inconsistent data = an update anomaly
Lecture 5 > Section 1 > Data anomalies & constraints Issues with 1NF Student Course Room...... If everyone drops the class, we lose what room the class is in! = a delete anomaly
Lecture 5 > Section 1 > Data anomalies & constraints Issues with 1NF Student Course Room CS229 C12 Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... Similarly, we can t reserve a room without students = an insert anomaly
Lecture 5 > Section 1 > Data anomalies & constraints Issues with 1NF: Split Table Student Course Mary CS145 Joe CS145 Sam CS145.... Course Room CS145 B01 CS229 C12 Is this form better? Redundancy? Update anomaly? Delete anomaly? Insert anomaly? How do we decide how to split the table?
Lecture 5 > Section 1 > Data anomalies & constraints ER Model (if done right) already creates 3NF Student Course CourseNr Room Enrollment N 1 Enrollment for Course Corresponding tables: Enrollment Student Course Mary CS145 Joe CS145 Sam CS145.... Course CourseNr CS145 CS229 Room B01 C12
Lecture 5 > Section 1 > Data anomalies & constraints 3NF Each attribute needs to rely on the complete primary key (= 2NF). All attributes only directly depend on the primary key (no transitive dependencies). Student Course Room Grade Mary CS145 B01 NULL Joe CS145 B01 A Sam CS145 B01 NULL Student Course Grade Mary CS145 NULL Joe CS145 A Sam CS145 NULL CourseNr CS145 CS229 Room B01 C12.......... In short: The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF]
Conclusion Databases should be in 3. NF to avoid anomalies. Some databases are poorly designed leading to bad data quality as a result of anomalies. Sometimes databases are on purpose not designed in 3. NF. E.g., for performance or other reasons. In this case, the application/user needs to prevent anomalies (which does not always work). For analytics purposes we often need a single table with all the data. Essentially this table is in 1. NF