Lecture 18. Business Intelligence and Data Warehousing. 1:M Normalization. M:M Normalization 11/1/2017. Topics Covered

Similar documents
Basics of Dimensional Modeling

TIM 50 - Business Information Systems

Penn State Student Chapter of the Association for Computing Machinery

Accounting Information Systems, 2e (Kay/Ovlia) Chapter 2 Accounting Databases. Objective 1

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

DATA MINING AND WAREHOUSING

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

COMP 465 Special Topics: Data Mining

by Prentice Hall

Chapter 6 VIDEO CASES

Question Bank. 4) It is the source of information later delivered to data marts.

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual

On-Line Application Processing

Managing Data Resources

Management Information Systems

Handout 12 Data Warehousing and Analytics.

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Data warehouse and Data Mining

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

The Data Organization

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Managing Data Resources

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Warehousing. Data Mining

1. Inroduction to Data Mininig

Chapter 4 Data Mining A Short Introduction

Full file at

Computers Are Your Future

Information Systems and Networks

Data Mining Course Overview

TIM 50 - Business Information Systems

CHAPTER 3 Implementation of Data warehouse in Data Mining

Data Warehousing. Seminar report. Submitted in partial fulfillment of the requirement for the award of degree Of Computer Science

OLAP Introduction and Overview

Data Warehouse and Data Mining

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Data Management Lecture Outline 2 Part 2. Instructor: Trevor Nadeau

Data Strategies for Efficiency and Growth

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

DATA MINING TRANSACTION

Foundations of Business Intelligence: Databases and Information Management

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

Data Warehousing. Overview

Data Warehouse and Data Mining

Introduction to MS Access: creating tables, keys, and relationships

A Star Schema Has One To Many Relationship Between A Dimension And Fact Table

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

Data warehouses Decision support The multidimensional model OLAP queries

Introduction to Data Mining and Data Analytics

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

~ Ian Hunneybell: DWDM Revision Notes (31/05/2006) ~

Meetings This class meets on Mondays from 6:20 PM to 9:05 PM in CIS Room 1034 (in class delivery of instruction).

Database Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz

5-1McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved.

Data Warehouse and Data Mining

MIT Database Management Systems Lesson 01: Introduction

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Jarek Szlichta

Knowledge Modelling and Management. Part B (9)

Banner ODS Overview. Data Snapshots Based on Dates and Events Data Sets Frozen for Point in Time Historic, Trend Reporting Analytics. Admin.

Association Rule Mining. Entscheidungsunterstützungssysteme

Step-by-step data transformation

Data Mining and Data Warehousing Introduction to Data Mining

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

Course Logistics & Chapter 1 Introduction

Data Warehouse and Mining

Day 1 Agenda. Brio 101 Training. Course Presentation and Reference Material

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Data Analysis. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 12/13/12

IBM InfoSphere Information Server Version 8 Release 7. Reporting Guide SC

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

A case study to introduce Microsoft Data Mining in the database course

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

Dta Mining and Data Warehousing

The strategic advantage of OLAP and multidimensional analysis

Chapter 3. Databases and Data Warehouses: Building Business Intelligence

The Data Organization

1 DATAWAREHOUSING QUESTIONS by Mausami Sawarkar

E(xtract) T(ransform) L(oad)

Considerations of Analysis of Healthcare Claims Data

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN

OLAP Theory-English version On-Line Analytical processing (Buisness Intzlligence)

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g

Chapter 18: Data Analysis and Mining

DATA QUALITY STRATEGY. Martin Rennhackkamp

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

IST722 Data Warehousing

Chapter 3: Data Mining:

Data Warehousing and OLAP

Business Intelligence and Decision Support Systems

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.

Data warehouse architecture consists of the following interconnected layers:

Transcription:

Lecture 18 Business Intelligence and Data Warehousing BDIS 6.2 BSAD 141 Dave Novak Topics Covered Test # Review What is Business Intelligence? How can an organization be data rich and information poor? Terms including Data Warehousing Benefits Multidimensional Analysis Data scrubbing Data mining concepts Rules for Normalizing Relationships 1-1: Choose primary key attribute from either entity (but not both) and place copy in the other entity as a Foreign Key. 1-M: Place copy of Primary Key attribute of the 1 entity in the M entity as a Foreign Key M-M: Create a joining or intersection table with a foreign key from each of the tables in the M:M relationship Many times the primary key of the M:M Intersection table will be a composite key (a key made up of multiple fields e.g. Student ID and CRN) 1:M Normalization Create a Foreign Key on the MANY side of the relationship with the SAME name as the primary key from the 1 side of the relationship Professor each professor can teach 1 M each section can be taught by Class Section M:M Normalization Convert all M:M to two 1:M relationships going to an intersection or joining table Student Student can register for M M each section can contain Class Section each student can be listed on each line M can list Student / Section 1 1 each line can list Line Item M each section can be listed on Class Section Integrities as Applied to Database Design Integrity - the state of being whole, entire, or undiminished. Data Integrity: Accuracy, Consistency and without Bias. Domain Constraints (only #s, currency, only 1 to 10 ) Field Datatypes on columns (currency, date/time, Text, Boolean..) Entity Integrity: Each Entity has a Unique Identifier which makes every row Unique Thus No Duplicate Records Each Unique Identifier a.k.a. Primary Key is not null and is in fact Unique All components of the Primary key are NOT null Referential Integrity: All Foreign Key Values EXIST in the Related Primary Key Set Also stated as NO orphaned foreign keys 1

Integrity in MS Access Data Integrity Input Masks and Data Types Zip Codes, Phone Numbers and Date Value Entity Integrity Identifying Field (or Fields) Can t be Null and auto iterates to a new UNIQUE value Access auto creates an ID column set to Auto- Number Referential Integrity Can t put a value in a Foreign Key column that DOESN T exist in a the Primary Key column to which it relates. Establishing Relationships in Access Relationships Referential Integrity requires that ALL values in a foreign key field MUST exist in the Primary Key field of which it is related other wise it would be an orphaned key related to a record that doesn t exist. Database are designed to Structure Data Well Designed databases consistently enforce Referential Integrity Summary Test #2 includes definitional concepts as well as practical application of certain concepts Read Chapter 6 and Appendix C Work through all examples Use Cengage online supplements BDIS 6.2 Main Points What is a Data Warehouse (compared to a Database? What is a Data mart (compared to a Data Warehouse? Cluster Analysis and Association Detection Support, Confidence and Lift Supporting Decisions with Business Intelligence Organizational data is difficult to access Organizational data contains structured data in database Organizational data contains unstructured data such as voice mail, phone calls, text messages, and video clips 2

The Problem: Data Rich, Information Poor Businesses face a data explosion as digital images, email in-boxes, and broadband connections doubles by 2010 The amount of data generated is doubling every year Some believe it will soon double monthly The Solution: Business Intelligence What is BI? Combination of technologies and analytics that facilitate the collection, integration, and presentation of INFORMATION Why? Improve decision making The Solution: Business Intelligence Retail and sales: Predicting sales; determining correct inventory levels and distribution schedules among outlets; and loss prevention Banking: Forecasting levels of bad loans and fraudulent credit card use, credit card spending by new customers, and which kinds of customers will best respond to (and qualify for) new loan offers The Solution: Business Intelligence Operations management: Predicting machinery failures; finding key factors that control optimization of manufacturing capacity Insurance: Forecasting claim amounts and medical coverage costs; classifying the most important elements that affect medical coverage; predicting which customers will buy new insurance policies The Solution: Business Intelligence Improving the quality of business decisions has a direct impact on costs and revenue BI enables business users to receive data for analysis that is: Reliable Consistent Understandable Easily manipulated Term Clarification Data Center: Physical space (like a building) to house a large group of servers, storage and network equipment Database: A structured set of digitally stored data managed by a computer to support OPERATIONAL needs relating to the creation, retrieval, updates and deletion of data. Data Warehouse: a type of database that accumulates and integrates copies of multiple database for ANALYTICAL purposes Data Mart: A subset of data from a database or data warehouse used for a single ad-hoc or specific analysis. 3

Term Clarification The definitions imply: Databases are most typically used to support lowerlevel operational (i.e. largely transactional) decision making Data warehouses integrate data from different DBs and add analytical capabilities needed to support tactical and strategic decision making Benefits of Data Warehousing Data warehouses extend the transformation of data into information In the 1990 s executives became less concerned with the day-to-day business operations and more concerned with overall business functions The data warehouse provided the ability to support decision making without disrupting the day-to-day operations Benefits of Data Warehousing Benefits of Data Warehousing Data warehouse A logical collection of information gathered from many different operational databases that supports business analysis activities and decisionmaking tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes Benefits of Data Warehousing Departmental Databases No Integration Across Departments 4

PERFORMING BUSINESS ANALYSIS Multidimensional Analysis Databases contain information in a series of two-dimensional tables In a data warehouse information is multidimensional, it contains layers of columns and rows Dimension A particular attribute of information Cube Common term for the representation of multidimensional information Multidimensional Analysis Cubes of Information Benefits of High Quality Information Levels, Formats, and Granularities of Information The Cost of Poor Quality Information Butterfly effect Seemingly minor or even insignificant events can have a major impact on a system Organizations depend heavily on data A rather simple issue such as misspelling a customer s name could lead to revenue loss, process problems, and even result in compliance / regulatory issues 5

The Cost of Poor Quality Information An organization must maintain highquality data in the data warehouse Information cleansing or scrubbing A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information Why is this important in the context of a data warehouse? The data warehouse can contain information from many different DBs even external DBs The data in these DBs are not consistent with one another Contact Information in an Operational System Standardizing Customer Name from Operational Systems Cost of Accurate and Complete Information 6

UVM Databases UVM Databases Banner Student Information Systems Courses, Students, Instructors, Grades PeopleSoft Salaries, Pay, Tax Information BlackBoard Frequency of Logins, Course grades, assignments, readings, materials Student Evaluations Each department different not associable to specific students Facilities Swipe Cards Cat Card Swipes for all buildings SGA Student Membership/Participation UVM Virtual Event Management System Resource Scheduling of all Meeting spaces at UVM More Alumni, Donors, Research, Meal Plan, Residence halls How would UVM use a data warehouse? For example, what types of questions could be addressed that cannot be addressed by using the individual DBs Student retention Trends and Patterns with Data Mining Common forms of data-mining analysis capabilities include Cluster analysis Association detection Statistical analysis Cluster Analysis Statistical approach used to classify data or objects (people, places, or things) into distinct predefined groups The objects within a particular group are very similar to one another and are very different from the objects in other groups Provides insight into association and pattern Cluster Analysis Association Detection = Classic Example being Market Basket Analysis Marketing group all customers into distinct groups based on their purchasing patterns Insurance identify automobile policy holders with a higher than average claim cost Finance identify certain categories of stocks that exceed some set of pre-determined performance thresholds Market-basket analysis is a data-mining technique for determining sales patterns Uses statistical methods to identify sales patterns in large volumes of data Shows which products customers tend to buy together Used to estimate probability of customer purchase Helps identify cross-selling opportunities "Customers who bought book X also bought book Y 7

Market Basket Analysis Information Systems for Large Retail Organizations Creating Mounds of Data Transactional Data about: What was bought What time of day What day of week What time of year AND what else was bought that Visit Incomplete demographic information about the customer Market Basket Analysis Categorize all products into a finite, manageable number of categories Beer Vegetables Deli Soda Frozen dinners Identify Relationships Between Product Categories Market Basket Analysis Results of the Data mining: Unexpected Yet Logical So what do you with this information? Market Basket Analysis Sample of 1,000 receipts 100 of them had BOTH diapers and beer.10 Support (100/1000) 125 had Diapers (but not necessarily beer).80 Confidence (100/125) 200 of them bought Beer (but not necessarily diapers).20 Base Probability (200/1000) Confidence of Diapers & Beer Lift = Base Probability of Beer Lift shows 4 times more likely to buy beer when they buy diapers (.80 /.20) http://www.uvm.edu/~tichitte/mba/mba.html Summary Test # Review What is Business Intelligence? How can an organization be data rich and information poor? Terms including Data Warehousing Benefits Multidimensional Analysis Data scrubbing Data mining concepts 8