Handout 12 Data Warehousing and Analytics.

Handout 12 CS-605 Spring 17

Operational (aka transactional) system: a system that is used to run a business in real time, based on current data; also called a system of record.

Informational (analytical) system: a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications.
   o Collect business operational data.
   o Reduce it to a form that can be used to analyze the behavior of the business.
   o Not limited to databases, but often using database technology.

Data warehouse (simple definition): an archival database for decision support.

Operational Databases
   - Support day-to-day business operations.
   - Read/writeable: records may be inserted, updated, deleted.
   - Not as big as ones used for decision support.

Decision Support Databases
   - Hold historical information integrated from multiple sources.
   - Primarily read-only. Updating limited to:
      o Load
      o Refresh
      o (i.e. inserts, some deletes, almost never updates)
   - Include a temporal component.
   - Tend to be very large (especially when storing transaction data).
   - Integrity not a big concern.
   - Usually designed in ad hoc manner.

Queries
   - Often involve complex logical expressions in WHERE.
   - Require access to many kinds of facts/business objects, i.e. may require many joins.
   - Functionally complex: may involve complex statistical computations.
   - Analytically complex: rarely answered in one query.

Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes.
   - Subject-oriented: e.g. customers, patients, students, products.
   - Integrated: consistent naming conventions, formats, encoding structures; from multiple and heterogeneous organizational data sources.
   - Time-variant: can study trends and changes.
   - Nonupdatable (nonvolatile): read-only, periodically refreshed.

Data Mart: a data warehouse that is limited in scope. Intended for use by a smaller, more specialized group of people.

Creating a Data Warehouse - ETL (Extract, Transform, Load)
   - Need to integrate uncoordinated and inconsistent multiple databases in organizations.
   - Need to separate operational and informational systems and data to improve performance of data management.

Extract
   - Static extract = capturing a snapshot of the source data at a point in time.
   - Incremental extract = capturing changes that have occurred since the last static extract.

Scrub/Cleanse
   - Uses pattern recognition and AI techniques to upgrade data quality.
   - Problems: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies.
   - [Figure 9-1 from MDM: Examples of heterogeneous data]
   - Establishing standard abbreviations and identifiers, replacing synonyms.

Transform and consolidate
   - Convert data from format of operational system to format of data warehouse.
   - Split/combine source records.
   - Synchronize time information, e.g.:
      o customer - revenue data stored by fiscal quarter
      o customer - salesperson data stored by calendar quarter
      o can't tell which salesperson is responsible for what part of the customer revenue
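The extract and scrub steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the row layout, the `modified` timestamp used to detect changes, and the `STATE_SYNONYMS` table are hypothetical and do not come from the handout.

```python
from datetime import datetime

# Hypothetical source rows; the field names are assumptions for illustration.
SOURCE_ROWS = [
    {"id": 1, "state": "Calif.", "modified": datetime(2017, 1, 5)},
    {"id": 2, "state": "CA", "modified": datetime(2017, 2, 10)},
    {"id": 3, "state": "new york", "modified": datetime(2017, 3, 1)},
    {"id": 3, "state": "new york", "modified": datetime(2017, 3, 1)},  # duplicate
]

# Assumed synonym table: maps source spellings to one standard abbreviation.
STATE_SYNONYMS = {"calif.": "CA", "ca": "CA", "new york": "NY", "ny": "NY"}


def static_extract(rows):
    """Capture a snapshot of all source rows at a point in time."""
    return list(rows)


def incremental_extract(rows, last_extract_time):
    """Capture only rows changed since the previous extract."""
    return [r for r in rows if r["modified"] > last_extract_time]


def cleanse(rows):
    """Replace synonyms with standard abbreviations and drop duplicates."""
    seen = set()
    cleaned = []
    for r in rows:
        state = STATE_SYNONYMS.get(r["state"].strip().lower(), r["state"])
        key = (r["id"], state)
        if key in seen:
            continue  # duplicate record after standardization
        seen.add(key)
        cleaned.append({**r, "state": state})
    return cleaned


snapshot = cleanse(static_extract(SOURCE_ROWS))
changes = cleanse(incremental_extract(SOURCE_ROWS, datetime(2017, 2, 1)))
```

Here `snapshot` holds the full cleansed source, while `changes` holds only the rows modified after the assumed previous extract date. A real pipeline would likely rely on change logs or database triggers rather than a timestamp column.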

Load/Index
   - Place transformed data into the warehouse and create indexes.
   - Move the data.
   - Initial / Refresh mode: bulk rewriting of target data at periodic intervals.
   - Check uniqueness constraints.
   - CPU-intensive process, especially if many indices are present; drop/reset indices could help.

Several Common Data Warehouse Architectures
   - Generic Two-Level Architecture
   - Independent Data Mart
   - Dependent Data Mart and Operational Data Store
   - Logical Data Mart and @ctive Warehouse

Generic Two-Level Architecture
   - Operational Databases / One company-wide Warehouse.
   - Benefit: single integrated view of organizational data.
   - Problem: periodic extraction; data is not completely current in warehouse.

Independent Data Mart
   - Multiple data marts: mini-warehouses, limited in scope.
   - No single consolidated warehouse.
   - Benefits: easier to create than one integrated warehouse.
   - Problems: redundancy, extra work in ETL for each data mart, potential lack of consistency, complex querying across multiple data marts.
      o Users of individual marts must themselves provide an integrated view.
      o This is difficult and does not add up to having a single warehouse with well-defined known structure.

Dependent Data Mart and Operational Data Store
   - Data loaded from Operational Data Store to single Data Warehouse, then from Data Warehouse to Data Marts.
   - Benefits: single ETL, no redundancy.

Logical Data Mart and @ctive Warehouse
   - Data marts are logical views of the warehouse.
   - Works well when data warehouse is not too large.
   - Used in e-commerce applications.
   - Problems: performance degrades with increasing size of the warehouse.
   - Benefits: data in marts always current, no redundancy in storage/ETL.
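The idea that a logical data mart is only a view of the warehouse can be sketched with Python's built-in `sqlite3` module. The table `warehouse_sales`, the view `east_mart`, and the sample rows are hypothetical names and data chosen for illustration, not taken from the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (
        region TEXT, product TEXT, amount REAL
    );
    INSERT INTO warehouse_sales VALUES
        ('East', 'widget', 100.0),
        ('West', 'widget', 250.0),
        ('East', 'gadget', 75.0);
    -- The "mart" is only a view: no data is copied out of the warehouse.
    CREATE VIEW east_mart AS
        SELECT product, amount FROM warehouse_sales WHERE region = 'East';
""")

before = conn.execute("SELECT COUNT(*) FROM east_mart").fetchone()[0]

# A new row loaded into the warehouse is visible in the mart immediately.
conn.execute("INSERT INTO warehouse_sales VALUES ('East', 'gizmo', 40.0)")
after = conn.execute("SELECT COUNT(*) FROM east_mart").fetchone()[0]
```

Because the view is evaluated against the warehouse on every query, the mart is always current and stores nothing redundantly. That same property is why query cost grows with the size of the warehouse.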

Data Warehouse Structure

Star schema:
   - Dimension tables (often de-normalized for performance reasons) describe major business subjects + Time Period.
   - Fact table: an associative entity of the dimensions. Contains factual and quantitative summary data.

Examples (from MDM)
   - Fact table provides statistics for sales broken down by product, period and store dimensions.
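A star schema along the lines of the sales example above might look like the following `sqlite3` sketch. The table names, column names, and sample rows are assumptions for illustration; the MDM figure itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per product, period, and store.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: an associative entity whose key combines the dimension keys.
    CREATE TABLE sales_fact (
        product_key  INTEGER REFERENCES product_dim(product_key),
        period_key   INTEGER REFERENCES period_dim(period_key),
        store_key    INTEGER REFERENCES store_dim(store_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (product_key, period_key, store_key)
    );

    INSERT INTO product_dim VALUES (1, 'turkey'), (2, 'ham');
    INSERT INTO period_dim  VALUES (1, 2016, 4), (2, 2017, 1);
    INSERT INTO store_dim   VALUES (1, 'Boston'), (2, 'Albany');
    INSERT INTO sales_fact  VALUES
        (1, 1, 1, 10, 200.0),
        (1, 1, 2,  5, 100.0),
        (1, 2, 1,  2,  40.0),
        (2, 1, 1,  3,  45.0);
""")

# Turkey sales broken down by store and period.
rows = conn.execute("""
    SELECT s.city, p.year, p.quarter, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN product_dim pr ON pr.product_key = f.product_key
    JOIN period_dim  p  ON p.period_key   = f.period_key
    JOIN store_dim   s  ON s.store_key    = f.store_key
    WHERE pr.description = 'turkey'
    GROUP BY s.city, p.year, p.quarter
    ORDER BY s.city, p.year, p.quarter
""").fetchall()
```

Each analytical query joins the central fact table to whichever dimensions it needs and aggregates the measures, which is the access pattern the star layout is designed for.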

Issues:

Dimension table keys must be surrogate (non-intelligent and non-business related) for the following reasons:
   - Object descriptions may change over time, e.g. decided to change size of product with business number 20.
   - Length/format consistency.
   - Across multiple organizational databases, the same product may have different identification numbers/primary keys.

Granularity of Fact Table: what level of detail do you want?
   - Transactional grain: finest level; enter every transaction into warehouse.
   - Aggregated grain: more summarized; enter just summary data.
   - Finer grain => better analysis capability.
   - More dimension tables => more rows in fact table.

Modeling dates:

Technologies

Data Mining: knowledge discovery using a blend of statistical, AI, and computer graphics techniques.
   - Explain observed events or conditions.
      o Why sudden increase in turkey sales?
   - Confirm hypotheses.
      o Do turkey sales increase in November?
      o Do more students take Literature courses as sophomores than juniors?
   - Explore data for new or unexpected relationships.
      o What else are the customers that buy turkeys in November likely to buy?
      o Which group of customers is likely to be interested in a product?

Data visualization: representing data in graphical/multimedia formats for analysis.
   - Often used in conjunction with data mining.
   - Helps identify trends and patterns.
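The reasons given above for surrogate dimension keys can be made concrete with a small Python sketch. It is one possible reading, not the handout's method: the `ProductDimension` class, the source-system names, and the choice to keep the old row when a description changes are all assumptions for illustration.

```python
from itertools import count


class ProductDimension:
    """Hypothetical product dimension keyed by meaningless integers."""

    def __init__(self):
        self._keys = count(1)
        self.rows = {}     # surrogate key -> product description
        self.current = {}  # (source system, business key) -> current surrogate key

    def new_version(self, description, aliases):
        """Store a description under a fresh surrogate key and point every
        source-system identifier for the product at that key."""
        key = next(self._keys)
        self.rows[key] = description
        for alias in aliases:
            self.current[alias] = key
        return key


dim = ProductDimension()

# The same product carries different identifiers in two source databases.
aliases = [("orders_db", 20), ("inventory_db", "P-0020")]

k1 = dim.new_version({"name": "widget", "size": "small"}, aliases)
fact_old = {"product_key": dim.current[("orders_db", 20)], "units": 5}

# The size of product number 20 changes: a new row gets a new surrogate key.
k2 = dim.new_version({"name": "widget", "size": "large"}, aliases)
fact_new = {"product_key": dim.current[("inventory_db", "P-0020")], "units": 7}
```

Facts recorded before the change still resolve to the old description, and both source identifiers resolve to a single warehouse key, which is what a business key alone could not guarantee.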

Big Data
   - Evolving term.
   - Usually refers to voluminous amount of structured, semi-structured and unstructured data.
   - Can be mined for information.

Analytics
   o Systematic analysis and interpretation of data, typically using mathematical, statistical, and computational tools, to improve our understanding of a real-world domain.

Big data characteristics

The Five Vs of Big Data
   - Volume: much larger quantity of data than typical for relational databases.
   - Variety: lots of different data types and formats.
   - Velocity: data comes at very fast rate (e.g. mobile sensors, web click stream).
   - Veracity: traditional data quality methods don't apply; how to judge the data's accuracy and relevance?
   - Value: big data is valuable to the bottom line, and for fostering good organizational actions and decisions.

Schema on Read, rather than Schema on Write
   - Schema on Write: preexisting data model; how traditional databases are designed (relational databases).
   - Schema on Read: data model determined later, depends on how you want to use it.
   - Capture and store the data, and worry about how you want to use it later.

Data Lake
   o A large integrated repository for internal and external data that does not follow a predefined schema.
   o Capture everything, dive in anywhere, flexible access.

NoSQL = Not Only SQL databases
   - A category of recently introduced data storage and retrieval technologies not based on the relational model.
   - Supports schema on read.
   - Largely open source.
   - BASE: basically available, soft state, eventually consistent.
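Schema on read, as described above, can be sketched in a few lines of Python. The raw JSON records, their field names, and the two reader functions are hypothetical; the point is only that the stored data follows no predefined schema and each consumer imposes its own structure when it reads.

```python
import json

# Raw records are captured as-is, with no predefined schema (assumed sample data).
raw_store = [
    '{"type": "click", "user": "u1", "url": "/home"}',
    '{"type": "sensor", "device": "d7", "temp_c": 21.5}',
    '{"type": "click", "user": "u2", "url": "/cart", "referrer": "/home"}',
]


def read_clicks(lines):
    """Apply a click-stream schema at read time, ignoring other records."""
    for line in lines:
        rec = json.loads(line)
        if rec.get("type") == "click":
            yield {"user": rec["user"], "url": rec["url"]}


def read_temperatures(lines):
    """Apply a different schema to the same store for sensor analysis."""
    for line in lines:
        rec = json.loads(line)
        if rec.get("type") == "sensor":
            yield (rec["device"], rec["temp_c"])


clicks = list(read_clicks(raw_store))
temps = list(read_temperatures(raw_store))
```

Under schema on write, the click and sensor records would need separate, predeclared tables before anything could be stored. Here the decision about structure is deferred to each reader.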