MIS2502: Data Analytics Relational Data Modeling. Jing Gong

Similar documents
MIS2502: Data Analytics Relational Data Modeling. Jing Gong

MIS2502: Data Analytics Relational Data Modeling - 1. JaeHwuen Jung

MIS2502: Data Analytics Relational Data Modeling (2) Alvin Zuyin Zheng

MIS2502: Review for Exam 1. Jing Gong

Exam #1 Review. Zuyin (Alvin) Zheng

MIS2502: Data Analytics SQL Getting Information Out of a Database Part 1: Basic Queries

MIS2502: Data Analytics SQL Getting Information Out of a Database. Jing Gong

MIS2502: Review for Exam 2. JaeHwuen Jung

L12: ER modeling 5. CS3200 Database design (sp18 s2) 2/22/2018

Database Design and Administration for OnBase WorkView Solutions. Mike Martel Senior Project Manager

Entity Relationship Diagram (ERD): Basics

Database Systems. Overview - important points. Lecture 5. Some introductory information ERD diagrams Normalization Other stuff 08/03/2015

Entity Relationship Diagram (ERD) Dr. Moustafa Elazhary

L11: ER modeling 4. CS3200 Database design (sp18 s2) 2/15/2018

Relational Model (cont d) & Entity Relational Model. Lecture 2

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

More on the Chen Notation

Conceptual Database Design. COSC 304 Introduction to Database Systems. Entity-Relationship Modeling. Entity-Relationship Modeling

LAB 2 Notes. Conceptual Design ER. Logical DB Design (relational) Schema Refinement. Physical DD

Full file at

Represent entities and relations with diagrams

ER Modeling ER Diagram ID-Dependent and Weak Entities Pg 1

Entity-Relationship Modelling. Entities Attributes Relationships Mapping Cardinality Keys Reduction of an E-R Diagram to Tables

System Analysis And Design Methods ENTITY RELATIONSHIP DIAGRAM (ERD) Prof. Ali Khaleghi Eng. Hadi Haedar

MIS2502: Data Analytics The Information Architecture of an Organization. Jing Gong

Entity-Relationship Model &

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

LECTURE 3: ENTITY-RELATIONSHIP MODELING

Chapter 2 Conceptual Modeling. Objectives

The entity is an object of interest to the end user. entity correspond to the table not to a row- in the relational environment.

Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition. Chapter 7 Data Modeling with Entity Relationship Diagrams

course 3 Levels of Database Design CSCI 403 Database Management Mines Courses ERD Attributes Entities title 9/26/2018

Entity Relationship Modeling. From Rob and Coronel (2004), Database Systems: Design, Implementation, and Management

Related download: Instructor Manual for Modern Database Management 12th Edition by Hoffer Venkataraman Topi (Case studies included)

COSC 304 Introduction to Database Systems. Entity-Relationship Modeling

Database Logical Design

ERD Getting Started Guide

MIS Database Systems Entity-Relationship Model.

Data Analysis 1. Chapter 2.1 V3.1. Napier University Dr Gordon Russell

Chapter 1 SQL and Data

Entity Relationship Modelling

The Entity-Relationship Model (ER Model) - Part 2

Objectives Definition iti of terms List five properties of relations State two properties of candidate keys Define first, second, and third normal for

MIS2502: Review for Exam 2. Jing Gong

CONCEPTUAL DESIGN: ER TO RELATIONAL TO SQL

The Next Step: Designing DB Schema. Chapter 6: Entity-Relationship Model. The E-R Model. Identifying Entities and their Attributes.

Conceptual Design with ER Model

SAMPLE FINAL EXAM SPRING/2H SESSION 2017

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 4 Entity Relationship (ER) Modeling

Essentials of Database Management (Hoffer et al.) Chapter 2 Modeling Data in the Organization

Chapter 6: Entity-Relationship Model. The Next Step: Designing DB Schema. Identifying Entities and their Attributes. The E-R Model.

Database Logical Design

Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition. Chapter 7 Data Modeling with Entity Relationship Diagrams

Chapter 4 Entity Relationship Modeling In this chapter, you will learn:

Data Modeling Using the Entity-Relationship (ER) Model

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 4 Entity Relationship (ER) Modeling

2004 John Mylopoulos. The Entity-Relationship Model John Mylopoulos. The Entity-Relationship Model John Mylopoulos

The Relational Model

COMP Instructor: Dimitris Papadias WWW page:

ENTITY-RELATIONSHIP MODEL. CS 564- Spring 2018

Logical Database Design Normalization

Overview of Database Design Process Example Database Application (COMPANY) ER Model Concepts

David M. Kroenke and David J. Auer Database Processing Fundamentals, Design, and Implementation

Appendix A Database Design. Data Modeling and the Entity-Relationship Model

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

Entity Relationships and Databases

CSE 880:Database Systems. ER Model and Relation Schemas

A l Ain University Of Science and Technology

Consistency The DBMS must ensure the database will always be in a consistent state. Whenever data is modified, the database will change from one

A l Ain University Of Science and Technology

Conceptual Design. The Entity-Relationship (ER) Model

Database Applications (15-415)

IS 263 Database Concepts

Database Foundations. 2-3 Conceptual Data Modeling. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

A good example of entities and relationships can be seen below.

CMPT 354 Database Systems. Simon Fraser University Fall Instructor: Oliver Schulte

High Level Database Models

Lecture3: Data Modeling Using the Entity-Relationship Model.

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22

Overview of Database Design Process. Data Modeling Using the Entity- Relationship (ER) Model. Two main activities:

Introduction to Database Design

Copyright 2016 Ramez Elmasr and Shamkant B. Navathei

ER DIAGRAM ER) diagram, a graphical representation of entities and their relationships to each other, typically used in computing in regard to the

Data Warehousing. Overview

Review The Big Picture

Tutorial. Creating ERDs Using

Database Management Systems

CHAPTER 11. Data Normalization

1/24/2012. Chapter 7 Outline. Chapter 7 Outline (cont d.) CS 440: Database Management Systems

XV. The Entity-Relationship Model

MIS2502: Data Analytics MySQL and SQL Workbench. Jing Gong

Definition. 02. Data Modeling. Example ABC Company Database. Data Modeling Importance

MIS 0855 Fall 2015 Data Science

CS403- Database Management Systems Solved Objective Midterm Papers For Preparation of Midterm Exam

How to design a database

Translation of ER-diagram into Relational Schema. Dr. Sunnie S. Chung CIS430/530

Database Systems ER Model. A.R. Hurson 323 CS Building

Course on Database Design Carlo Batini University of Milano Bicocca Part 5 Logical Design

CS 405G: Introduction to Database Systems

Notes. These slides are based on a slide set provided by Prof. M. Tamer Öszu. CS 640 E-R Model Winter / 23. Notes

Transcription:

MIS2502: Data Analytics Relational Data Modeling Jing Gong gong@temple.edu http://community.mis.temple.edu/gong

Where we are Now we re here Data entry Transactional Database Data extraction Analytical Data Store Data analysis Stores real-time transactional data Stores historical transactional and summary data

What is a model? Representation of something in the real world

Modeling a database A representation of the information to be captured Describes the data contained in the database Explains how the data interrelates

Why bother modeling? Creates a blueprint before you start building the database Gets the story straight: easy for nontechnical people to understand Minimize having to go back and make changes in the implementation stage

Systems analysis and design in the context of database development Systems Analysis is the process of modeling the problem Requirements-oriented What should we do? This is where we define and understand the business scenario. Systems Design is the process of modeling a solution Functionality-oriented How should we do it? This is where we implement that scenario as a database.

Start with a problem statement Design a database to track orders for a store. A customer places an order for a product. People can place an order for multiple products. Record first name, last name, city, state, and zip code for customers. We also want to know the date an order was placed. Finally, we want to track the name and price of products and the quantity of each product for each order.

The Entity Relationship Diagram (ERD) The primary way of modeling a relational database Three main diagrammatic elements Entity A uniquely identifiable thing (i.e., person, order) Relationship Describes how two entities relate to one another (i.e., makes) Attribute A characteristic of an entity or relationship (i.e., first name, order number)

Begin with Identifying the Entities This is what your database is about. 1. List the nouns in the problem statement. 2. When nouns are synonyms for other nouns, choose the best one. 3. Make a note of nouns that describe other nouns. These will be your entities attributes. 4. Rule out the nouns that don t relate to the process to be captured. What s left are your entities!

So here are the nouns Design a database to track orders for a store. A customer places an order for a product. People can place an order for multiple products. Record first name, last name, city, state, and zip code for customers. We also want to know the date an order was placed. Finally, we want to track the name and price of products and the quantity of each product for each order. Which nouns are entities Which nouns are attributes? Which nouns are irrelevant?

Here s where it gets tricky store is not an entity because we are not tracking specific information about the store (i.e., store location) BUT if there were many stores and we wanted to track sales by store, then store would be an entity! In this case, store is the context But that isn t part of the problem statement.

The ERD Based on the Problem Statement

The primary key Entities need to be uniquely identifiable So you can tell them apart They may not be explicitly part of the problem statement, but you need them! Use a primary key One or more attributes that uniquely identifies an entity How about these as primary keys for Customer: Customer ID Order number Uniquely identifies a customer Uniquely identifies an order First name and/or last name? Social security number?

Last component: Cardinality Defines the rules of the association between entities Customer makes Order Minimum cardinality: Maximum cardinality: at least one at most - one at least zero (optional) at most - many This is a one-to-many (1:m) relationship: One customer can have many orders. A customer could have no orders. One order can only belong to one customer. An order has to belong to at least one customer.

Maximum and Minimum Cardinality Maximum cardinality (type of relationship) Describes the maximum number of entity instances that participate in a relationship One-to-one One-to-many Many-to-one Minimum cardinality Describes the minimum number of entity instances that must participate in a relationship

One-to-One Relationship One-to-One (1:1) A single instance of one entity is related to a single instance of another entity A state has (at most) one governor A governor governs (at most) one state

One-to-Many Relationship One-to-Many (1:n or 1:m) A single instance of one entity is related to multiple instances of another entity A publisher can publish many (multiple) books A book is published by (at most) one publisher

Many-to-Many Relationship Many-to-Many (n:n or m:m) Each instance of one entity is related to multiple instances of another entity, and vice versa A book can be written by many (multiple) authors An author can write many (multiple) books

Minimum Cardinality Minimums are generally stated as either zero or one: 0 (optional): participation in the relationship by the entity is optional. 1 (mandatory): participation in the relationship by the entity is mandatory. A programmer is mandatory for a certificate); or a certificate has to be issued to someone. A certificate is optional for a programmer; or a programmer may not have any certificates 1:m maximum cardinality: a programmer can have many certificates; a certificate is issued to only one programmer

Crows Feet Notation Customer makes Order So called because this looks something like this There are other ways of denoting cardinality, but this one is pretty standard. There are also variations of the crows feet notion!

Crow s Foot Notation Summary Symbol Minimum Cardinality + Maximum Cardinality Optional - One Mandatory - One Optional - Many Mandatory - Many

The Order-Production Example: A Many-to-Many (m:m) Relationship An order can be composed of many products. An order has to have at least one product. A product can be a part of many orders. A product has to be associated with at least one order. Does it make sense for the maximum cardinality to be 1 for either entity? at least one at most - many at least one at most - many Order number Order contains Product Order Date Quantity Product name Does it make sense for the minimum cardinality to be 0 (optional) for either entity? Price Product ID

Cardinality is defined by business rules What would the cardinality be in these situations??? Order contains Product?? Course has Section?? Employee has Office

Relationship Attributes Name TUID Student contains Course Course Title Grade Semester Course number The grade and semester describes the combination of student and course (i.e., Bob takes MIS2502 in Fall 2011 and receives a B; Sue takes MIS2502 in Fall 2012 and receives an A)

A scenario: The auto repair shop Each transaction is associated with a repair, a car, and a mechanic. Cars, repairs, and mechanics can all be part of multiple transactions. Many transactions can make up an invoice. A transaction can only belong to one invoice. A car is described by a VIN, make, and model. A mechanic is described by a name and SSN. A repair is described by a repair id and a price. A transaction occurs on a particular date and has a transaction ID. An invoice has an invoice number and a billing name, city, state, and zip code.

Solution

Normalization Organizing data to minimize redundancy (repeated data) This is good for several reasons The database takes up less space Fewer inconsistencies in your data Easier to search and navigate the data It s easier to make changes to the data The relationships take care of the rest

Normalizing your ER Model If an entity has multiple sets of related attributes, split them up into separate entities Don t do this Vendor Phone Product ID Vendor Name Product Vendor Address Product name Vendor Phone Vendor Name Vendor Vendor Address Vendor ID Price do this sells Then you won t have to repeat vendor information for each product. Product Price Product name Product ID

Normalizing your ER Model Each attribute should be atomic you can t (logically) break it up any further. Don t do this do this Phone Address Phone First Name Customer ID Customer First/Last Name Customer ID Customer Last Name Street City State Zip This way you can search or sort by last name OR first name, and by city, state, or zip code.

Key concepts Entity Relationship Cardinality Summary of ERD Minimum cardinality: 1:1, 1:m, m:m Maximum cardinality: optional or mandatory (i.e., 0 or 1) Crow s foot notation Attributes Entity attributes: primary key vs. non-key Relationship attributes Key skills Interpret simple ERDs Draw an ERD based on a scenario description

Implementing the ERD As a database schema A map of the tables and fields in the database This is what is implemented in the database management system Part of the design process A schema actually looks a lot like the ERD Entities become tables Attributes become fields Relationships can become additional tables

The Rules 1. Create a table for every entity 2. Create table fields for every entity s attributes 3. Implement relationships between the tables 1:many relationships many:many relationships 1:1 relationships Primary key field of 1 table put into many table as foreign key field Create new table 1:many relationships with original tables Primary key field of one table put into other table as foreign key field

The ERD Based on the Problem Statement

Our Order Database schema Original 1:n relationship Original n:n relationship Order-Product is a decomposed many-to-many relationship Order-Product has a 1:n relationship with Order and Product Now an order can have multiple products, and a product can be associated with multiple orders

Customer Table The Customer and Order Tables: The 1:n Relationship CustomerID FirstName LastName City State Zip 1001 Greg House Princeton NJ 09120 1002 Lisa Cuddy Plainsboro NJ 09123 1003 James Wilson Pittsgrove NJ 09121 1004 Eric Foreman Warminster PA 19111 Order Table Order Number OrderDate Customer ID 101 3-2-2011 1001 102 3-3-2011 1002 103 3-4-2011 1001 104 3-6-2011 1004 Customer ID is a foreign key in the Order table. We can associate multiple orders with a single customer! In the Order table, Order Number is unique; Customer ID is not!

Customer Table The Customer and Order Tables: Normalization CustomerID FirstName LastName City State Zip 1001 Greg House Princeton NJ 09120 1002 Lisa Cuddy Plainsboro NJ 09123 1003 James Wilson Pittsgrove NJ 09121 1004 Eric Foreman Warminster PA 19111 Order Table Order Number OrderDate Customer ID 101 3-2-2011 1001 102 3-3-2011 1002 103 3-4-2011 1001 104 3-6-2011 1004 No repeating orders or customers. Every customer is unique. Every order is unique. This is an example of normalization..

To figure out who ordered what Match the Customer IDs of the two tables, starting with the table with the foreign key (Order): Order Table Customer Table Order Number OrderDate Customer ID Customer ID FirstName LastName City State Zip 101 3-2-2011 1001 1001 Greg House Princeton NJ 09120 102 3-3-2011 1002 1002 Lisa Cuddy Plainsboro NJ 09123 103 3-4-2011 1001 1001 Greg House Princeton NJ 09120 104 3-6-2011 1004 1004 Eric Foreman Warminster PA 19111 We now know which order belonged to which customer This is called a join

Now the many:many relationship Order Table Order Number OrderDate 101 3-2-2011 1001 102 3-3-2011 1002 103 3-4-2011 1001 104 3-6-2011 1004 Customer ID Order ProductID Order-Product Table Order number Product ID 1 101 2251 2 2 101 2282 3 3 101 2505 1 4 102 2251 5 Quantity 5 102 2282 2 Product Table ProductID ProductName Price 6 103 2505 3 7 104 2505 8 2251 Cheerios 3.99 2282 Bananas 1.29 2505 Eggo Waffles 2.99 This table relates Order and Product to each other!

Order ProductID To figure out what each order contains Match the Product IDs and Order IDs of the tables, starting with the table with the foreign keys (Order-Product): Order-Product Table Order Table Product Table Order Number Product ID Quantity Order Number Order Date Customer ID Product ID Product Name 1 101 2251 2 101 3-2-2011 1001 2251 Cheerios 3.99 2 101 2282 3 101 3-2-2011 1001 2282 Bananas 1.29 3 101 2505 1 101 3-2-2011 1001 2505 Eggo Waffles 2.99 4 102 2251 5 102 3-3-2011 1002 2251 Cheerios 3.99 5 102 2282 2 102 3-3-2011 1002 2282 Bananas 1.29 6 103 2505 3 103 3-4-2011 1001 2505 Eggo Waffles 2.99 7 104 2505 8 104 3-6-2011 1004 2505 Eggo Waffles 2.99 Price So which customers ordered Eggo Waffles (by their Customer IDs)?

This is denormalized data necessary for querying but bad for storage tomer Customer ID Customer ID Product ID Product Name Price 1001 2251 Cheerios 3.99 1001 2282 Bananas 1.29 1001 2505 Eggo Waffles 2.99 1002 2251 Cheerios 3.99 1002 2282 Bananas 1.29 1001 2505 Eggo Waffles 2.99 1004 2505 Eggo Waffles 2.99 First Name Last Name City State Zip 1 1001 Greg House Princeton NJ 09120 2 1002 Lisa Cuddy Plainsboro NJ 09123 The redundant data seems harmless, but: What if the price of Eggo Waffles changes? And what if Greg House changes his address? And if there are 1,000,000 records? 1 1001 Greg House Princeton NJ 09120 4 1004 Eric Foreman Warminster PA 19111

Summary of Database Schema Draw the corresponding schema of an ERD Identify tables based on entities and relationships Implement primary key/foreign key relationships Decompose many-to-many relationships in an ERD into one-to-many relationships in the schema Best practices for normalization Be able to match up (join) multiple tables