Designing Data Warehouses. Data Warehousing Design.

Designing Data Warehouses

To begin a data warehouse project, we need to find answers to questions such as:

Which user requirements are most important, and which data should be considered first?
Should the project be scaled down into something more manageable?
Should the infrastructure for a scaled-down project be capable of ultimately delivering a full-scale enterprise-wide data warehouse?

For many enterprises, the way to avoid the complexities associated with designing a data warehouse is to start by building one or more data marts. Data marts allow designers to build something that is far simpler and more achievable for a specific group of users.

Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time. Despite the interim solution of building data marts, the goal remains the same: the ultimate creation of a data warehouse that supports the requirements of the enterprise.

The requirements collection and analysis stage of a data warehouse project involves interviewing appropriate members of staff (such as marketing users, finance users, and sales users) to identify a prioritized set of requirements that the data warehouse must meet.

At the same time, interviews are conducted with the members of staff responsible for the operational systems to identify which data sources can provide clean, valid, and consistent data that will remain supported over the next few years.

The interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse. The database component of a data warehouse is described using a technique called dimensionality modeling.

Dimensionality Modeling

A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access. It uses the concepts of ER modeling with some important restrictions. Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

Each dimension table has a simple (noncomposite) primary key that corresponds exactly to one of the components of the composite key in the fact table. This forms a star-like structure, which is called a star schema or star join.

All natural keys are replaced with surrogate keys. This means that every join between fact and dimension tables is based on surrogate keys, not natural keys. Surrogate keys allow the data in the warehouse to have some independence from the data used and produced by the OLTP systems.

Figure: Star Schema for Property Sales of DreamHome

A star schema is a logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data, which can be denormalized. Facts are generated by events that occurred in the past and are unlikely to change, regardless of how they are analyzed.
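The star structure described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the DreamHome schema itself: the table names, column names, and sample rows are all assumptions made for the example.

```python
import sqlite3

# All table names, column names, and rows are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_dim (
    branch_key INTEGER PRIMARY KEY,   -- surrogate key used for joins
    branch_no  TEXT,                  -- natural key from the OLTP system
    city       TEXT
);
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,     -- surrogate key used for joins
    year     INTEGER,
    quarter  INTEGER
);
CREATE TABLE property_sale_fact (
    branch_key    INTEGER REFERENCES branch_dim(branch_key),
    time_key      INTEGER REFERENCES time_dim(time_key),
    selling_price REAL,               -- numeric, additive measure
    PRIMARY KEY (branch_key, time_key)  -- composite key of surrogate keys
);
""")
conn.executemany("INSERT INTO branch_dim VALUES (?, ?, ?)",
                 [(1, "B003", "Glasgow"), (2, "B005", "London")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, 2023, 1), (2, 2023, 2)])
conn.executemany("INSERT INTO property_sale_fact VALUES (?, ?, ?)",
                 [(1, 1, 100000.0), (1, 2, 150000.0), (2, 1, 300000.0)])


def sales_by_city(connection):
    """Join the fact table to a dimension on the surrogate key and total the measure."""
    rows = connection.execute("""
        SELECT b.city, SUM(f.selling_price)
        FROM property_sale_fact AS f
        JOIN branch_dim AS b ON b.branch_key = f.branch_key
        GROUP BY b.city
        ORDER BY b.city
    """).fetchall()
    return dict(rows)


totals = sales_by_city(conn)
print(totals)
```

The join never touches branch_no, so the natural key could change in the source system without affecting the fact rows.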

The bulk of the data in a data warehouse is in fact tables, which can be extremely large. It is important to treat fact data as read-only reference data that will not change over time. The most useful fact tables contain one or more numerical measures, or "facts", that occur for each record; the most useful facts are numeric and additive.

Dimension tables usually contain descriptive textual information. Dimension attributes are used as the constraints in data warehouse queries. Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table.

Figure: Property Sales with Normalized Version of Branch Dimension Table

A snowflake schema is a variant of the star schema where dimension tables do not contain denormalized data. A starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas. It allows dimensions to be present in both forms to cater for different query requirements.
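The difference between the star and snowflake forms of a dimension can be sketched as follows. This is an illustrative example under assumed names (branch_star, branch_snow, region_snow, sale_fact) and made-up rows; it shows only that the same question costs one extra join in the normalized form.

```python
import sqlite3

# Hypothetical tables and rows, chosen only to contrast the two layouts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: region is repeated on every branch row (denormalized).
CREATE TABLE branch_star (
    branch_key INTEGER PRIMARY KEY, city TEXT, region TEXT
);
-- Snowflake form: region is moved to its own table (normalized).
CREATE TABLE region_snow (
    region_key INTEGER PRIMARY KEY, region TEXT
);
CREATE TABLE branch_snow (
    branch_key INTEGER PRIMARY KEY, city TEXT,
    region_key INTEGER REFERENCES region_snow(region_key)
);
CREATE TABLE sale_fact (branch_key INTEGER, selling_price REAL);
""")
conn.executemany("INSERT INTO branch_star VALUES (?, ?, ?)",
                 [(1, "Glasgow", "Scotland"), (2, "Aberdeen", "Scotland"),
                  (3, "London", "England")])
conn.executemany("INSERT INTO region_snow VALUES (?, ?)",
                 [(1, "Scotland"), (2, "England")])
conn.executemany("INSERT INTO branch_snow VALUES (?, ?, ?)",
                 [(1, "Glasgow", 1), (2, "Aberdeen", 1), (3, "London", 2)])
conn.executemany("INSERT INTO sale_fact VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (3, 200.0)])

# Star: one join from the fact table to the denormalized dimension.
star_totals = dict(conn.execute("""
    SELECT b.region, SUM(f.selling_price)
    FROM sale_fact AS f
    JOIN branch_star AS b ON b.branch_key = f.branch_key
    GROUP BY b.region
""").fetchall())

# Snowflake: the same question needs a second join to reach the region.
snow_totals = dict(conn.execute("""
    SELECT r.region, SUM(f.selling_price)
    FROM sale_fact AS f
    JOIN branch_snow AS b ON b.branch_key = f.branch_key
    JOIN region_snow AS r ON r.region_key = b.region_key
    GROUP BY r.region
""").fetchall())

print(star_totals, snow_totals)
```

A starflake design would keep both branch_star and the branch_snow/region_snow pair available, so each query can use whichever form suits it.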

The predictable and standard form of the underlying dimensional model offers important advantages: efficiency; ability to handle changing requirements; extensibility; ability to model common business situations; predictable query processing.

Database Design Methodology for Data Warehouses

The Nine-Step Methodology includes the following steps:

1. Choosing the process
2. Choosing the grain
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing pre-calculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. Tracking slowly changing dimensions
9. Deciding the query priorities and the query modes

Step 1: Choosing The Process

The process (function) refers to the subject matter of a particular data mart. The first data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.

Figure: ER Model of an Extended Version of DreamHome

Figure: ER Model of Property Sales Business Process of DreamHome

Step 2: Choosing The Grain

Decide what a record of the fact table is to represent. Identify the dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table. Also include time as a core dimension, which is always present in star schemas.

Step 3: Identifying and Conforming the Dimensions

Dimensions set the context for asking questions about the facts in the fact table. If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other. A dimension used in more than one data mart is referred to as being conformed.

Figure: Star Schemas for Property Sales and Property Advertising
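The conformance rule in Step 3 can be expressed as a small check. The sketch below is one possible reading, in which "subset" is taken to mean a subset of dimension rows; the function name is_conformed and the sample branch data are assumptions made for the example.

```python
def is_conformed(dim_a, dim_b):
    """Return True if the two dimensions are identical or one is a subset of the other.

    Each dimension is modelled as a dict that maps a surrogate key to a
    tuple of attribute values. This row-level reading of "subset" is an
    assumption made for illustration.
    """
    rows_a = set(dim_a.items())
    rows_b = set(dim_b.items())
    return rows_a <= rows_b or rows_b <= rows_a


# Hypothetical branch dimension as used by two data marts.
sales_branch_dim = {1: ("B003", "Glasgow"), 2: ("B005", "London")}
advert_branch_dim = {1: ("B003", "Glasgow")}          # subset of the sales rows
drifted_branch_dim = {1: ("B003", "Edinburgh")}       # same key, different meaning

print(is_conformed(sales_branch_dim, advert_branch_dim))   # subset case
print(is_conformed(sales_branch_dim, drifted_branch_dim))  # conflicting rows
```

The third dimension fails the check because the same surrogate key describes a different branch, which is exactly the situation that prevents two data marts from being combined later.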

Step 4: Choosing The Facts

The grain of the fact table determines which facts can be used in the data mart. Facts should be numeric and additive. Unusable facts include non-numeric facts, non-additive facts, and facts at a different granularity from the other facts in the table.

Figure: Property Rentals With a Badly Structured Fact Table

Figure: Property Rentals With Fact Table Corrected

Step 5: Storing Pre-Calculations in the Fact Table

Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.
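Steps 4 and 5 can be illustrated with a few rows of plain Python. This is a sketch under assumptions: the rent and commission columns, the derived net column, and the sample values are invented for the example and are not taken from the DreamHome figures.

```python
# Hypothetical fact rows at the grain of one row per rental.
fact_rows = [
    {"branch": "B003", "rent": 400.0, "commission": 40.0},
    {"branch": "B003", "rent": 600.0, "commission": 30.0},
]

# Step 5 style pre-calculation (assumed example): derive the net amount once
# at load time and store it, so queries do not have to recompute it.
for row in fact_rows:
    row["net"] = row["rent"] - row["commission"]

# Additive facts can simply be summed across any set of rows.
total_rent = sum(row["rent"] for row in fact_rows)
total_commission = sum(row["commission"] for row in fact_rows)
total_net = sum(row["net"] for row in fact_rows)

# A ratio is a non-additive fact: summing per-row rates is meaningless.
per_row_rates = [row["commission"] / row["rent"] for row in fact_rows]
summed_rates = sum(per_row_rates)

# The correct overall rate is rebuilt from the additive components.
overall_rate = total_commission / total_rent

print(total_net, summed_rates, overall_rate)
```

Storing the additive components (rent and commission) and computing the ratio at query time keeps every stored fact safe to sum.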

Step 6: Rounding Out The Dimension Tables

Text descriptions are added to the dimension tables. Text descriptions should be as intuitive and understandable to the users as possible. The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

Step 7: Choosing The Duration Of The Database

Duration measures how far back in time the fact table goes. Very large fact tables raise at least two very significant data warehouse design issues. First, it is often difficult to source increasingly old data. Second, it is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the Slowly Changing Dimension problem.

Step 8: Tracking Slowly Changing Dimensions

The slowly changing dimension problem means that the proper description of the old dimension data must be used with the old fact data. Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.

There are three basic types of slowly changing dimensions:

Type 1, where a changed dimension attribute is overwritten.
Type 2, where a changed dimension attribute causes a new dimension record to be created.
Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record.
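The three types of slowly changing dimension can be contrasted with a short sketch. Dimension rows are modelled here as Python dictionaries; the function names, the surrogate_key and natural_key fields, and the "previous_" prefix for the alternate attribute are assumptions made for illustration.

```python
def apply_type1(dim_rows, natural_key, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row[attr] = new_value
    return dim_rows


def apply_type2(dim_rows, natural_key, attr, new_value):
    """Type 2: add a new row under a new surrogate key; the old row is kept."""
    current = [r for r in dim_rows if r["natural_key"] == natural_key][-1]
    new_row = dict(current)
    new_row["surrogate_key"] = max(r["surrogate_key"] for r in dim_rows) + 1
    new_row[attr] = new_value
    dim_rows.append(new_row)
    return dim_rows


def apply_type3(dim_rows, natural_key, attr, new_value):
    """Type 3: keep the old value in an alternate attribute on the same row."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return dim_rows


def fresh_branch_dim():
    """Hypothetical starting state: one branch located in Glasgow."""
    return [{"surrogate_key": 1, "natural_key": "B003", "city": "Glasgow"}]


type1 = apply_type1(fresh_branch_dim(), "B003", "city", "Edinburgh")
type2 = apply_type2(fresh_branch_dim(), "B003", "city", "Edinburgh")
type3 = apply_type3(fresh_branch_dim(), "B003", "city", "Edinburgh")

for label, rows in (("Type 1", type1), ("Type 2", type2), ("Type 3", type3)):
    print(label, rows)
```

In the Type 2 case, old fact rows keep pointing at surrogate key 1 and so retain the Glasgow description, while new fact rows use surrogate key 2.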

Step 9: Deciding The Query Priorities And The Query Modes

The most critical physical design issues affecting the end-user's perception include the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations. Additional physical design issues include administration, backup, indexing performance, and security.

Database Design Methodology for Data Warehouses

The methodology designs a data mart that supports the requirements of a particular business process and allows easy integration with other related data marts to form the enterprise-wide data warehouse. A dimensional model that contains more than one fact table sharing one or more conformed dimension tables is referred to as a fact constellation.

Figure: Fact and Dimension Tables for each Business Process of DreamHome

Figure: Dimensional Model (Fact Constellation) for the DreamHome Data Warehouse
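A fact constellation can be sketched as two fact tables that share one conformed dimension. The example below uses Python's built-in sqlite3 module; the table names, columns, and rows are assumptions for illustration and do not reproduce the DreamHome model.

```python
import sqlite3

# Hypothetical tables: two fact tables share the conformed branch dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE property_sale_fact (
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    selling_price REAL
);
CREATE TABLE property_advert_fact (
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    advert_cost REAL
);
INSERT INTO branch_dim VALUES (1, 'Glasgow'), (2, 'London');
INSERT INTO property_sale_fact VALUES (1, 100.0), (1, 150.0), (2, 300.0);
INSERT INTO property_advert_fact VALUES (1, 10.0), (2, 20.0), (2, 5.0);
""")

# Each fact table is summarized on its own and the summaries are then joined
# through the shared dimension. Joining the two fact tables directly would
# multiply rows and inflate the totals.
combined = conn.execute("""
    SELECT b.city, s.total_sales, a.total_adverts
    FROM branch_dim AS b
    JOIN (SELECT branch_key, SUM(selling_price) AS total_sales
          FROM property_sale_fact GROUP BY branch_key) AS s
      ON s.branch_key = b.branch_key
    JOIN (SELECT branch_key, SUM(advert_cost) AS total_adverts
          FROM property_advert_fact GROUP BY branch_key) AS a
      ON a.branch_key = b.branch_key
    ORDER BY b.city
""").fetchall()

print(combined)
```

Because both fact tables use the same branch_key values with the same meaning, results from the two business processes line up row for row.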

Criteria for Assessing the Dimensionality of a Data Warehouse

Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing. There are twenty criteria divided into three broad groups: architecture, administration, and expression.

Architectural criteria describe the way the entire system is organized. Administration criteria are considered to be essential to the smooth running of a dimensionally oriented data warehouse. Expression criteria are mostly analytic capabilities that are needed in real-life situations.