DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

Similar documents
Inductive Enterprise Architectures

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Metabase Metadata Management System Data Interoperability Need and Solution Characteristics

Data Models: The Center of the Business Information Systems Universe

CHAPTER 3 Implementation of Data warehouse in Data Mining

Oracle Database 11g: Data Warehousing Fundamentals

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

Data warehouse architecture consists of the following interconnected layers:

Business Event Management

Business Intelligence An Overview. Zahra Mansoori

Data Warehouse and Mining

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Resource Life Cycle Management The Enterprise Architecture Component Integrator

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

Business Intelligence and Decision Support Systems

Fundamentals of Information Systems, Seventh Edition

Data Warehouse and Data Mining

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

The Data Organization

Full file at

Meaning & Concepts of Databases

Section of Achieving Data Standardization

Data Warehousing & Mining Techniques

Data Warehousing. Overview

DATA MINING TRANSACTION

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

2. Summary. 2.1 Basic Architecture. 2. Architecture. 2.1 Staging Area. 2.1 Operational Data Store. Last week: Architecture and Data model

Data Warehousing & Mining Techniques

STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS. By: Dr. Tendani J. Lavhengwa

Building a Data Warehouse step by step

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Data Warehousing Introduction. Toon Calders

Data-Driven Driven Business Intelligence Systems: Parts I. Lecture Outline. Learning Objectives

Data Architecture Reference Model

: How does DSS data differ from operational data?

Data Warehouse and Data Mining


MIT Database Management Systems Lesson 01: Introduction

Data Mining Concepts & Techniques

SAS Data Integration Studio 3.3. User s Guide

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Evolution of Database Systems

AVOIDING SILOED DATA AND SILOED DATA MANAGEMENT

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

C_HANAIMP142

Modeling Data and Designing Databases

Whitemarsh Metabase Getting Started Users Guide

Pro Tech protechtraining.com

Decision Support, Data Warehousing, and OLAP

collection of data that is used primarily in organizational decision making.

Information Management course

Data Warehousing. Adopted from Dr. Sanjay Gunasekaran

Data Warehouse and Data Mining

Guide Users along Information Pathways and Surf through the Data

Data Warehouse and Data Mining

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City

Database Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Data Warehousing and OLAP Technologies for Decision-Making Process

DATA MINING AND WAREHOUSING

Introduction to DWML. Christian Thomsen, Aalborg University. Slides adapted from Torben Bach Pedersen and Man Lung Yiu

Class Organization. CSPP 53017: Data Warehousing Winter 2013" Lecture 1" Svetlozar Nestorov"

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

Enterprise Architectures

ETL and OLAP Systems

This module presents the star schema, an alternative to 3NF schemas intended for analytical databases.

Chapter 1: The Database Environment

Data Vault Brisbane User Group

Chapter 3: Data Warehousing

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

TIM 50 - Business Information Systems

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Proposal of a new Data Warehouse Architecture Reference Model

Data Warehousing Concepts

6+ years of experience in IT Industry, in analysis, design & development of data warehouses using traditional BI and self-service BI.

Course Contents: 1 Business Objects Online Training

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Question Bank. 4) It is the source of information later delivered to data marts.

SOME TYPES AND USES OF DATA MODELS

DKMS Brief No. Five: Is Data Staging Relational? A Comment

Course chapter 8: Data Warehousing - Introduction. F. Radulescu - Data warehousing - introduction 1

DSS based on Data Warehouse

INTRODUCTORY INFORMATION TECHNOLOGY ENTERPRISE DATABASES AND DATA WAREHOUSES. Faramarz Hendessi

The Data Organization

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

Whitepaper. Solving Complex Hierarchical Data Integration Issues. What is Complex Data? Types of Data

Q1) Describe business intelligence system development phases? (6 marks)

OLAP Introduction and Overview

Data Strategies for Efficiency and Growth

Chapter 11 Database Concepts

Cognos also provides you an option to export the report in XML or PDF format or you can view the reports in XML format.

REVENUE REPORTING DASHBOARD FOR A HOTEL GROUP

5. Technology Applications

Call: SAS BI Course Content:35-40hours

The InfoLibrarian Metadata Appliance Automated Cataloging System for your IT infrastructure.

Introduction. Example Databases

An Information Asset Hub. How to Effectively Share Your Data

Transcription:

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting April 14, 2009 Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie, Maryland 20716 Tele: 301-249-1142 Email: Whitemarsh@wiscorp.com Web: www.wiscorp.com

Topics 1. What is Business Intelligence?...1 2. Fundamental Data Management Terms...2 3. Data is Executed Policy (core principles)...5 4. The World of Data Architectures...7 5. Two Data Warehouse Strategies...15 5.1 ER-Model Based (Inmon)...16 5.2 Star Schema Based (Kimball)...17 5.3 Key Data Warehouse Architecture Differences...19 5.4 Key Data Modeling Differences...22 5.5 Key Build Strategy Differences...25 6. Conclusions and Recommendations...29 ii

1. What is Business Intelligence? 1

2. Fundamental Data Management Terms Data Database Term Database Management Database Management System (DBMS) Data Element Definition The persistent evidences of policy executions, that is, the execution of the procedures of the enterprise. A highly organized repository of enterprise-data that contains its own internal structure ensuring that data can be entered, updated, reported, and held secure. The strategies and procedures for managing enterprise data within databases in an orderly and well-reasoned manner. An IT software system that accomplishes the definition, instantiation, evolution, maintenance, evolution, and reporting of databases. Example: Oracle, DB2. A Data Element is a context independent business fact semantic template that can be employed as the meaning and rules basis for an attribute, column, field, screen element, etc. 2

Term Data Model Concepts Data Model Logical Data Model Data Warehouse Architectures for Business Intelligence Reporting Definition A model of the data that is to be contained in a database. A complete data model consists of: 1) data record structure (a.k.a., tables), 2) relationships among the tables, and 3) formally defined operations on the rows of data and the relationships. A type of data model that represents the data models (see above) of individual concepts. E.g., Person, Address, Invoice, and Customer. Commonly this type of data model is high-level and likely only contains entities, attributes, and relationships. Possible 100s of data models of concepts. Data structure pattern of a concept. Ether-land A database data model that consists of a schema, tables, columns, and relationships. These are typically constructed through the employment of data models of concepts within the database s data model. These models are DBMS independent. Commonly these models are in third normal form. Physical Data Model A database models tuned to the needs of a specific application and DBMS. Commonly these models are derived from one or more logical data models and may not be in third normal form. 3

Metadata Term Metadata Management System Business Intelligence Definition An imprecise and vague term to mean the specifications of real objects. Example: if a database s data is real, the database s schema is metadata. Simply put, metadata is the abstraction-level about the considered object. Book example: while the chapters are real, the table of contents and index are metadata. A software application, commonly with a database that is managed by a DBMS that stores, manages, and reports metadata. The sets of data that a business declares is needed to understand its past, and undertake tactical and strategic operations about its current, and future activities. 4

3. Data is Executed Policy (core principles)! The persistent evidences of policy executions, that is, the execution of the procedures of the enterprise.! Processes are the procedures through which policy is executed and data is created.! Policy and procedures run the business.! Policy executions produce data.! Data stored in databases are the persistent memory of the organization. 5

Data is Executed Policy (cont.)! "Data" specifications are policy definitions.! Process specifications are procedure definitions.! All data (i.e., policy) specifications are metadata.! All process (i.e., procedure) specifications are metadata.! A metadata database, i.e. a metabase, is a database for all Policy and Procedure Specifications.! Data Management manages both metadata and data. 6

4. The World of Data Architectures An Architecture: An engineered and recognized style that is replicated across many instances in support of a well-defined purpose. 7

Five Data Architecture Classes Seen in Enterprises 8

Data Architecture Persistent Data Classifications and Characteristics Persistent Data Classification Persistent Data Characteristics Process Characteristics User Considerations Technical Considerations Original Data Capture Detailed atomic data Accurate as of the last update Tuned for transaction capture, storage and update Application oriented Original data source entry personnel High availability Amount of data for processing is small Multiple vendor packages Well defined, long lasting database designs Normalized database designs Uses reference data No invalid data updates allowed Transaction driven Processing supported by well known data integrity and business processing rules Understands, creates, and maintains TDSA databases through Original Data Capture system extract and TDSA loading software. Supports day-to-day operations Package specific May or may not be controlled by SQL DBMS Source data for TDSA 9

Data Architecture Persistent Data Classifications and Characteristics Persistent Data Classification Persistent Data Characteristics Process Characteristics User Considerations Technical Considerations TDSA: Transaction Data Staging Area Transient data/short lived Foundation data source for all operational data systems Enterprise-wide standard semantics Package independent database designs Denormalized Full business transaction Accepts, stores, and then pushes forward function Only refreshed with changes from previous version Operations application data system daily update event driven Translation and transformation Users cannot access Multiple platforms Interface monitoring Applications insulation SQL DBMS controlled Does not use reference data 10

Data Architecture Persistent Data Classifications and Characteristics Persistent Data Classification Persistent Data Characteristics Process Characteristics User Considerations Technical Considerations ODS: Operational Data Store Subject Area Databases Detail level data May be lightly summarized Current or nearly current Updated daily via TDSA data transaction files Accepts and stores data from TDSA End-user detailed level analysis Used for up to the minute decisions Requires fast response time Large volume SQL DBMS controlled Rolling histories Broad subject area database scope Supports comprehensive reporting and generalized ad hoc query Used for detailed decision making Normalized database designs Redundant data from across enterprise May contain derived data from outside Uses reference data May receive and/or send data to databases within class Data source for all warehouse databases 11

Data Architecture Persistent Data Classifications and Characteristics Persistent Data Classification Persistent Data Characteristics Process Characteristics User Considerations Technical Considerations Warehouse: Wholesale Inmon data warehouse Detail and some summarized. Rolling Histories Load/replace, no end-user update Enterprise-wide standard semantics No end-user updating Regular, periodic updates Supports standardized, ondemand reports Supports general complex business data analyses such as trends and forecasting Supports power and analytical users community Used for broad direction and positioning Used to formulate and assess long term decisions Availability not on business critical path User workstation access Large data volumes per query High processing power required Address one or more subject areas Views data from multiple subject areas SQL DBMS controlled Redundant data from across enterprise Supports data mining May contain internal derived data Reference data fully embedded May receive and/or send data to databases within class Data source for all retail data warehouses 12

Data Architecture Persistent Data Classifications and Characteristics Persistent Data Classification Persistent Data Characteristics Process Characteristics User Considerations Technical Considerations Warehouse: Retail Kimball Data Mart Light to highly summarized and some detail Rolling histories Load/replace, no end-user update Availability not on business critical path Regular, periodic updates Highly designed, end-user on-demand reports Supports managerial community Cannot update Used for direction and positioning Relaxed availability User workstations Large volume High processing power Enterprise-wide standard semantics Supports self-service enduser reporting Supportive of long term decision making SQL DBMS controlled Denormalized and highly designed to specifically favor one or more reporting formats Redundant data from across enterprise Supports very specific simple to complex business data analyses Views data from multiple subject areas Specific reporting need May contain internal derived data Reference data fully embedded May receive and/or send data to databases within class 13

Data Architecture Persistent Data Classifications and Characteristics Persistent Data Classification Persistent Data Characteristics Process Characteristics User Considerations Technical Considerations Reference Data Durable codes and long value alternatives with policy definitions and full descriptions Enterprise-wide standard semantics Simple updates Update mappings required for reference data value migration Needed by all levels in the organization Used by all systems Enables understanding and conversion of historical data Supports the concept of single source Integration with all data store types Source of all valid and invalid values including alternatives for different countries and languages Multiple group data field constructors suitable for different countries and languages Definitive source for multiuse data in all other databases Changed data history supported by conversion mappings Long lasting, seldom updated 14

5. Two Data Warehouse Strategies Bill Inmon ER Model -based Ralph Kimball Star Schema-based 15

5.1 ER-Model Based (Inmon) Airline Airline Code Airline Name Air Craft Aircraft Tail Number Air Craft Type Airline Code First Class Seats Business Class Seats Economy Class Seats Airport Air Port Code Air Port Name Flights Flight Number Departing Airport Code Departing Airport Cost Arriving Airport Code Arriving Airport Cost Departure Time Arriving Time Departure Gate Arrival Gate Aircraft Tail Number Fuel Cost Food Cost Flight Departure Date Customer Customer Number Customer Name Customer Address Customer City Customer State Customer Zip Customer Birthdate Customer Phone Customer Airport Booking Air Flight Port Number Code Air Port Name Air Customer Port Code Number Air Seat Port Number Name Ticket Price Seat Class 16

5.2 Star Schema Based (Kimball) Airline Airline Code Airline Name Air Craft Aircraft Tail Number Air Craft Type Airline Code First Class Seats Business Class Seats Economy Class Seats Customer Booking Flight Number Customer Number Seat Airport Number Air Port Code Ticket Price Air Port Name Air Seat Port Class Code Air Aircraft Port Name Tail Number Date Airline Code Flights Flight Number Departing Airport Code Departing Airport Name Arriving Airport Code Arriving Airport Name Flight Number Departure Time Arriving Time Departure Gate Arrival Gate Aircraft Tail Number Customer Customer Number Customer Name Customer Address Customer City Customer State Customer Zip Customer Birthdate Customer Phone Date Date 17

Airline Airline Code Airline Name Air Craft Aircraft Tail Number Air Craft Type Airline Code First Class Seats Business Class Seats Economy Class Seats Flight Revenue Flight Number First Class Seat Revenue Business Class Seat Revenue Economy Airport Class Seat Revenue Fuel Cost Air Port Code Departing Air Port Airport Name Cost Arriving Airport Cost Food Cost Aircraft Tail Number Date Air Port Code Airline Code Flights Flight Number Departing Airport Code Arriving Airport Code Flight Number Departure Time Arriving Time Departure Gate Arrival Gate Aircraft Tail Number Airport Air Port Code Air Port Name Date Date 18

5.3 Key Data Warehouse Architecture Differences 1. Kimball approach has many Star Schemas... Sort of a have a need? Make a Start-Schema Data Mart! Multiple Star Schemas share dimensions. 2. Inmon is a top-down approach. Data Marts pop out of the bottom of a cohesive Enterprise-wide (or at least subject-wide) database architecture. 3. Kimball would say: The Corporate Data Warehouse is the collection of all Data Marts in which each corresponds to the past and current information about a discrete decision domain. 4. Inmon would say: The Corporation Data Warehouse is a unified, integrated, and non-redundant enterprise-level operational, tactical, and strategic database. A Data Mart is a materialized discrete data-based view of a small decision domain. 19

Quotes from Kimball and Inmon The data warehouse is nothing more than the union of all the data marts Ralph Kimball Dec. 29, 1997. You can catch all the minnows in the ocean and stack them together and they still do not make a whale. Bill Inmon Jan. 8, 1998. 20

In short Kimball--First Data Marts The Sum of which is the Enterprise Data Warehouse. Inmon---First the Enterprise Data Warehouse--Later the Data Marts. 21

5.4 Key Data Modeling Differences 1. Each dimension is a flat table of collapsed hierarchies. Hence very un-normalized. 2. Each dimension with history has a Current flag for the most recent row. 3. Time-sequence is managed by a special Time- Dimension 4. Fact tables record state changes or fact measurements. Facts are selectable or distinguishable by related dimension information. 5. Granularity, precision, and timeliness is set to the need of the decision domain of the specific star schema. 6. Dimensions can be shared between and among fact tables thus giving the ability to sort of report different measurements based on common dimensions. 22

ER-Schema Data Models 1. Every table is normalized to the maximum degree possible. 2. No special treatment of history other that what is naturally engineered into database tables. 3. Time- sequence is represent by time-stamped columns 3. No manufactured dimensions. No manufactured fact tables. 4. Granularity, precision, and timeliness is set to that of the enterprise, not of the data warehouse tables. 5. Star Schemas Data Marts are built from corporate data warehouse. 23

Key Data Modeling Differences Summary 1. Inmon s approach is NOT for directly building data marts. Rather it s to build Enterprise Data Warehouses from which data marts are generated. 2. Kimball s approach is to build collections of Star Schema data marts with shared dimensions. The Kimball EDW is THIS collection. 3. So really, arguing for a Kimball or Inmon approach is almost like arguing which is better, a car s engine or its transmission. An argument based on a false premise. But if you are going to build Data Marts, the Build Strategy Differences are worth noting. 24

5.5 Key Build Strategy Differences Star Schema Extract Transform Load (ETL) Strategy (worst case?) Source (Excel) Source (Cobol/ ISAM) Source (SQL) Source (Cobol/ ISAM) Source (Excel) Data Mart 01 Data Mart 04 Data Mart 02 Data Mart 03 Data Mart 05 25

ER-Model Extract Transform Load (ETL) Strategy Source (Excel) Source (Cobol/ ISAM) Source (SQL) Source (Cobol/ ISAM) Source (Excel) Data Mart 01 Data Mart 02 Data Mart 05 Data Mart 03 Data Mart 04 26

Kimball 15 sets of business rules for granularity, precision, and timeliness. Critical Statistics Inmon 10 sets of business rules for granularity, precision, and timeliness. 15 sets of value domain engineering. 5 sets of value domain engineering. 15 sets of ETL software processes. 10 sets of ETL software processes. Hi probability of conflicting Business rules for the 15 ETL Scripts. Low probability of conflicting Business rules for the 10 ETL Scripts. Counts drawn from strategies contained on Slides 25 and 26. 27

Issue Consistent Semantics Synchronized Granularity and Precision Agreed-upon Timeliness Quality engineered twoway relationships versus collapsed hierarchies Kimball Risk if each ETL and DM has its own. Risk if each ETL and DM has its own. Risk if each ETL and DM has its own. Risk because there s not highly engineered pair-related sets to draw from. Critical Comparison Inmon Resolved and able to be controlled. Resolved and able to be controlled. Resolved and able to be controlled. Resolved and able to be controlled. Comparisons drawn from strategies contained on Slides 25 and 26. 28

6. Conclusions and Recommendations 1. Kimball Data Marts have value and are needed within narrow, well-defined decision domains. 2. Kimball Data Marts have an intuitive way of perceiving and processing data by end users. 3. Kimball s Data Marts built directly from sources are accomplished faster (i.e., Source to Data Mart) than Source-to-Inmon-to-Kimball. 4. Direct population of Kimball Data Marts from Data Sources creates risks in the areas of semantics, granularity, precision, timeliness, and uncontrolled value domains. 5. Proliferation of Data Marts can cause more data and semantics stove-pipes if there s no enterprise-wide data-centric engineering and architecture. 29

Conclusions and Recommendations (cont.) 6. While a Inmon-Kimball strategy will take longer than one-off Kimball data marts, it has positive effects on synchronization of semantics, granularity, precision, timeliness, and controlled value domains across Kimball-based data marts. 7. Direct querying an Inmon Data warehouse requires more specialized and advanced database knowledge than from Kimball Data Marts. 8. Inmon data warehouse design is optimized for data loading and update. 9. An Inmon-Kimball architecture has greater stability, flexibility, and evolution capability over either architecture individually. 30

Bottom Line Recommendation 1. Build Inmon data warehouses at the Subject Area database scope. 2. Build Kimball Star-Schema data marts from Inmon data warehouses. 3. Inmon-Kimball is preferred over either Inmon or Kimball alone. 31