Previews of TDWI course books are provided as an opportunity to see the quality of our material and to help you select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended. This preview shows selected pages that are representative of the entire course book. The pages shown are not consecutive. The page numbers as they appear in the actual course material are shown at the bottom of each page. All table-of-contents pages are included to illustrate all of the topics covered by a course.

TDWI Dimensional Data Modeling Primer: From Requirements to Business Analytics
The Data Warehousing Institute

The Data Warehousing Institute takes pride in the educational soundness and technical accuracy of all of our courses. Please give us your comments; we'd like to hear from you. Address your feedback to: email: info@dw-institute.com

Publication Date: September 2004

Copyright 2004 by The Data Warehousing Institute. All rights reserved. No part of this document may be reproduced in any form, or by any means, without written permission from The Data Warehousing Institute.

TABLE OF CONTENTS
Module 1: Dimensional Modeling Concepts
Module 2: Requirements Gathering for Dimensional Modeling
Module 3: Logical Dimensional Modeling
Module 4: From Logical Model to Star Schema
Module 5: Dimensional Data and Business Analytics
Appendix A: Bibliography and References
Exercises: Exercise Instructions and Worksheets

Module 1: Dimensional Modeling Concepts
Topics:
Dimensional Modeling Basics
Comparing Relational and Dimensional Models
Concepts Summary

Dimensional Modeling Basics: Dimensional Data Models

[Figure: three levels of dimensional data models.
Conceptual data model: an analysis process; a model of business requirements; starting from vague and uncertain, evolving to specific and certain; represented by a business questions list and a fact/qualifier matrix.
Logical data model: a business (non-technical) design process; a model of the business solution; starting from specific business requirements, evolving to product specification; represented by a logical dimensional data model.
Physical data model: an implementation (technical) design process; a model of the technology solution; starting from product specification, evolving to database specification; represented by a star schema.]

LEVELS OF MODELING
Dimensional data design, like any other design process, involves a transition from the abstract to the highly specific. Abstract models are a means to understand requirements, while implementation models must specify the solution precisely. A three-level approach to data modeling works well, where:
Conceptual modeling describes the business needs.
Logical modeling describes the business solution.
Physical modeling specifies the technology solution.

CONCEPTUAL MODELS
Conceptual models are produced through analysis of business needs, and are intended to structure business requirements such that they can be verified and used as input to logical data modeling. When designing dimensional data, conceptual models include a representative list of business questions and an analysis of those questions to understand them as a collection of data facts and data qualifiers.

LOGICAL MODELS
Logical modeling is a point of transition from analysis to design. Logical models describe the design of a solution to the requirements expressed by conceptual models. A logical model is analogous to a product specification, describing what is to be produced but not detailing how it will be assembled. A logical dimensional model meets this requirement for the design of dimensional data.

PHYSICAL MODELS
Physical models are produced by technical design processes. They describe and specify technology solutions. The physical design process transforms a logical model into a specification for implementation. Physical modeling adds the details necessary to describe how a product is structured and assembled. When designing dimensional data, the most common and widely accepted physical model is a star schema.

Comparing Relational and Dimensional Models: A Quick Review of Relational Modeling

[Figure: a simple entity-relationship model.
EMPLOYEE: empl_number, empl_name, empl_gender, empl_hire_date, empl_birth_date
JOB: job_id, job_description, job_title, job_shift_code, job_location_code, job_status_code
DEPARTMENT: dept_code, dept_name
PERSONNEL ACTION: PA_id_number, PA_type_code, PA_effective_date, PA_comments; is one of three subtypes: hiring (approval_code, hire_authority_id), resignation (resign_reason, voluntary_code), termination (term_authority_code, for_cause_code)
Relationships: EMPLOYEE performs JOB (JOB is held by EMPLOYEE); DEPARTMENT owns JOB (JOB is owned by DEPARTMENT); PERSONNEL ACTION changes JOB (JOB is changed by PERSONNEL ACTION).]

ENTITY-RELATIONSHIP MODELS
This diagram illustrates a simple entity-relationship (ER) model. The primary components of an ER model are:

Entity: An entity is a subject about which the business has the need, will, and means to collect data: a person, place, thing, concept, or event that is of business interest. Entities are represented as labeled boxes in the ER model. Examples in the diagram include EMPLOYEE, JOB, and DEPARTMENT.

Attribute: An attribute is a property or characteristic of an entity that can be collected as data. Attributes are listed inside the box of the entity that they describe. Job_description and job_title, for example, are attributes of the entity JOB.

Relationship: A relationship is an important association among entities that is of business interest and may be collected as data. Examples of relationships in this diagram include EMPLOYEE PERFORMS JOB and DEPARTMENT OWNS JOB.

Other important ER concepts include:

Cardinality, which describes the number of occurrences of each entity type that may participate in an occurrence of a relationship. Cardinality options include zero or one, exactly one, one or more, and zero or more. In this diagram some of the relationship cardinalities are: a PERSONNEL ACTION changes exactly one JOB, a JOB is changed by one or more PERSONNEL ACTIONs, and a JOB is held by zero or one EMPLOYEE.

Specialization (also called subtyping), which creates a hierarchy or parent/child relationship between an entity super-type and its sub-types. Specialization makes sense when the entity sub-types have unique attributes or participate in relationships not common to all sub-types. In this diagram PERSONNEL ACTION is an entity super-type with three sub-types.

Comparing Relational and Dimensional Models: Introduction to Logical Dimensional Modeling

[Figure: logical dimensional model for employee satisfaction.
Meter EMPLOYEE SATISFACTION: satisfaction_score, job_turnover_rate, avg_length_of_employment, complaint_frequency, disciplinary_action_frequency, resignation_rate, termination_rate, promotion_rate, demotion_rate
Dimension EMPLOYEE, level EMPLOYEE: empl_number, empl_name, empl_gender, empl_length_of_svc, empl_age
Dimension TIME, level MONTH: date, calendar_year, calendar_month, fiscal_year, fiscal_month
Dimension JOB, level DEPARTMENT: dept_code, dept_name
Dimension JOB, level JOB: job_id, job_description, job_title, job_shift_code, job_location_code, job_status_code]

COMPONENTS AND STRUCTURE OF THE DIMENSIONAL MODEL
This diagram illustrates a logical dimensional model. The components of the model include:

A Meter that contains related measures of business interest. Each logical dimensional model has only one meter. In this example, the meter is EMPLOYEE SATISFACTION. Related measures include satisfaction_score, job_turnover_rate, avg_length_of_employment, complaint_frequency, etc. Visually, meters in a logical dimensional model look much like entities in the ER model, and measures look much like attributes in the ER model.

Dimensions that provide the means to select, sort, filter, and summarize business measures. In this diagram the dimensions are EMPLOYEE, TIME, and JOB.

Dimension Levels that describe hierarchies that exist within dimensions. This diagram contains one multi-level dimension, JOB, which contains the dimension levels DEPARTMENT and JOB. The EMPLOYEE dimension contains the dimension level EMPLOYEE, and the TIME dimension contains the dimension level MONTH. Visually, dimension levels in a logical dimensional model look much like entities in an ER model.

Dimension Attributes that include identifiers and descriptive data about dimension levels. Some of the dimension attributes of JOB are job_id, job_description, and job_title. Dimension attributes are diagrammed in the same way as attributes of entities in an ER model.

Associations within a dimension that describe the dimension hierarchy. They are represented as one-to-many relationships from one dimension level to another; for example, one DEPARTMENT contains many JOBs.

Dimension-to-meter associations, which are represented as one-to-many relationships from the lowest level of each dimension to the meter. In this diagram the examples are MONTH to EMPLOYEE SATISFACTION, JOB to EMPLOYEE SATISFACTION, and EMPLOYEE to EMPLOYEE SATISFACTION. What this means is that each unique set of EMPLOYEE SATISFACTION measures is for one MONTH, one JOB, and one EMPLOYEE.
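The components described above can be sketched as plain data structures. This is only an illustration, not part of the course material: the class names (DimensionLevel, Dimension, Meter) and the grain() helper are hypothetical, while the meter, dimension, level, and attribute names are taken from the diagram.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionLevel:
    name: str
    attributes: List[str]


@dataclass
class Dimension:
    name: str
    levels: List[DimensionLevel]  # ordered from highest to lowest level

    @property
    def lowest_level(self) -> DimensionLevel:
        # Only the lowest level of a dimension is associated with the meter.
        return self.levels[-1]


@dataclass
class Meter:
    name: str
    measures: List[str]
    dimensions: List[Dimension] = field(default_factory=list)

    def grain(self) -> List[str]:
        # Each set of measures is for one occurrence of each lowest level.
        return [d.lowest_level.name for d in self.dimensions]


employee = Dimension("EMPLOYEE", [
    DimensionLevel("EMPLOYEE", ["empl_number", "empl_name", "empl_gender"]),
])
time = Dimension("TIME", [
    DimensionLevel("MONTH", ["date", "calendar_year", "calendar_month"]),
])
job = Dimension("JOB", [
    DimensionLevel("DEPARTMENT", ["dept_code", "dept_name"]),
    DimensionLevel("JOB", ["job_id", "job_description", "job_title"]),
])

meter = Meter(
    "EMPLOYEE SATISFACTION",
    ["satisfaction_score", "job_turnover_rate", "complaint_frequency"],
    [employee, time, job],
)
print(meter.grain())
```

The grain() call returns the lowest level of each dimension, which mirrors the statement that each set of measures is for one EMPLOYEE, one MONTH, and one JOB.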

Comparing Relational and Dimensional Models: A Basis for Comparison

How would you represent employee satisfaction in an entity-relationship model?

Some business questions:
1. Which departments have the highest employee satisfaction? Which have the lowest?
2. What is the average length of employment by trade and department?
3. What is the job turnover rate by department, employee age, and gender?
4. What is the frequency of complaints by job location? By shift? By trade?
5. Is there a relationship between complaint frequency and union representation?
6. Do age and gender affect promotion and demotion rates in some departments?
7. Do rotating shift jobs have higher rates of complaints and disciplinary actions?
8. Can we profile the factors for jobs with high termination rates?

STARTING WITH BUSINESS QUESTIONS
Business questions are the starting place for dimensional modeling and the basis for business metrics. The questions listed here are examples of the kinds of business questions that may be of interest for business performance in the area of EMPLOYEE SATISFACTION. These questions are used as the foundation of examples throughout the course.

Module 2: Requirements Gathering for Dimensional Modeling
Topics:
Business Context for Data Modeling
Business Questions as Requirements Models
Fact/Qualifier Analysis
Requirements Gathering Summary

Business Questions as Requirements Models: Addressing People, Performance, and Processes

[Figure: people, performance, and process perspectives on business questions. Levels of metrics and their contexts: enterprise metrics / KPIs (executive context; vision and strategy), organizational metrics (senior manager / department head context), process metrics (line manager / process manager context), activity metrics (knowledge worker context). Classes of metrics: financial, process, customer, growth. Business process components: events / transactions, activities, sources, inputs, workforce, product, customers.]

Develop a robust, representative set of business questions that represents the perspectives of:
business performance metrics and measures (financial, process, customer, growth)
people who will use those metrics and measures (from executives to knowledge workers)
the business processes in which those people participate (all parts of the processes: customers, products, activities, workforce, events, transactions, inputs, and sources)

PUTTING IT ALL TOGETHER
Requirements modeling begins with business questions, the foundation for all subsequent dimensional modeling activities. The first context-level model is a robust and representative list of business questions that are within the scope of the project. Note the key words robust and representative. It is not possible to develop a complete and exhaustive list of all questions that might be asked. The objective is to create a list of questions that is fertile enough to drive development of a highly adaptable data structure that is capable of answering questions not yet known. The people, performance, and process perspectives help to ensure richness in the list of questions.

PEOPLE
Who are the stakeholders of the data marts or analytic applications being developed: Executives? Managers? Knowledge workers? What metrics do they use today? What unanswered questions do they have? What levels of metrics do they need?

PERFORMANCE
Which classes of metrics are within the scope of modeling: Financial? Process? Customer? Growth? How do stakeholder questions relate to these four categories of performance management? How do they relate to the KPIs for the business?

PROCESSES
Which business processes are within the scope of modeling? What can be measured about those processes? What metrics are needed to effectively manage the processes? What metrics are needed for knowledge workers to perform the process activities? Which process components make good metrics subjects?

Business Questions as Requirements Models: A Framework for Business Questions

[Figure: three-dimensional framework for business questions. Levels of metrics: activity, process, organization, enterprise. Management disciplines: CRM, BPM, SCM, BAM, etc. Subject areas: customer, product, supplier, workforce, etc. Metric class words: cost, value, duration, etc. Example intersections: business questions (BPM / Customer), business questions (CRM / Product), business questions (SCM / Supplier), business questions (BAM / Workforce).]

A DIMENSIONAL VIEW OF METRICS
The framework diagram illustrates a three-dimensional view that is useful in managing a comprehensive set of business metrics. The dimensions are:

The levels of metrics (enterprise, organization, process, and activity) that were previously discussed.

The major subject areas of your business, typically found in a subject area model and the basis for the subject-oriented data warehouse. Subjects vary depending on the industry and the things that make your business unique. They include such things as customer, supplier, product, service, and finance.

Performance management methods that are used by your business. These include disciplines such as business performance management (BPM), customer relationship management (CRM), and supply chain management (SCM).

INSIDE THE FRAMEWORK
Each intersection of the three dimensions is an opportunity to find business questions and to manage business metrics. At the intersection of enterprise, product, and CRM, for example, you may ask: What needs to be known about products at the executive level to make CRM effective? What are the KPIs (from the enterprise level) for products that are important to CRM? How are product measures for CRM related to those for BPM and SCM? How are product measures for CRM related to customer measures for CRM? And many more. In this way, the framework becomes both a tool to discover questions and a tool to manage cohesion and consistency across a collection of metrics.

METRIC CLASS WORDS
Similar to the class words of common data elements (date, code, etc.), metrics have class words (e.g., cost, value, duration, count) that describe the kind of quantity that they express. Classification is useful both in finding metrics (what can we measure?) and in achieving consistency among a set of related metrics.

Fact/Qualifier Analysis: From Business Requirements to Data Requirements

[Figure: mapping of business questions to facts, qualifiers, and the associations among facts and qualifiers.]
FACTS: Discrete items of business information.
QUALIFIERS: Criteria used to access, sort, group, summarize, and present information.

STRUCTURING REQUIREMENTS FOR DATA
Business questions are the basis of requirements analysis, but they are not the final result. Full understanding of requirements is achieved through analysis of those questions. Requirements are much more than simply knowing which data is needed. To provide a foundation for logical design you'll also need to know how the data will be accessed, navigated, and used.

Business questions begin the transition to more formal data models through a matrix modeling technique known as fact/qualifier analysis. Each business question is examined to determine the fact(s) to be contained in the answer and the qualifiers that give the fact(s) meaning. The terms fact and qualifier describe roles that a data item assumes in a specific instance of business usage. Facts are discrete items of business information. Qualifiers are criteria used to access, sort, group, summarize, and present information. A single data item may be a fact in one instance and a qualifier in another.

Consider, for example, the business question "What is the average value of inventory for each store at the end of each month?" The fact inherent in this question is average value of inventory, the answer that the business person wants to obtain. The qualifiers found in the question are store and month, the additional data needed to give the answer context and meaning.
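A fact/qualifier matrix can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not the course's worksheet format: the function name fact_qualifier_matrix and the dictionary layout are hypothetical, the question numbers are arbitrary labels, and the facts and qualifiers were tagged by hand from the wording of each question.

```python
from collections import defaultdict

# Business questions, tagged by hand with their facts and qualifiers.
# Question numbers are arbitrary labels for this illustration.
questions = {
    1: {
        "text": "What is the average value of inventory for each store "
                "at the end of each month?",
        "facts": ["average value of inventory"],
        "qualifiers": ["store", "month"],
    },
    2: {
        "text": "What is the average length of employment by trade "
                "and department?",
        "facts": ["average length of employment"],
        "qualifiers": ["trade", "department"],
    },
}


def fact_qualifier_matrix(questions):
    """Return {fact: {qualifier: [question numbers]}}."""
    matrix = defaultdict(lambda: defaultdict(list))
    for number, question in questions.items():
        for fact in question["facts"]:
            for qualifier in question["qualifiers"]:
                matrix[fact][qualifier].append(number)
    # Convert to plain dicts so that lookups of unknown keys fail loudly.
    return {fact: dict(cols) for fact, cols in matrix.items()}


matrix = fact_qualifier_matrix(questions)
for fact, cols in matrix.items():
    print(fact, "->", cols)
```

Each row of the matrix is a fact, each column a qualifier, and each cell lists the questions that associate the two.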

Module 3: Logical Dimensional Modeling
Topics:
Modeling Meters and Measures
Modeling Dimensions
More about Meters and Measures
Logical Modeling Summary

Modeling Dimensions: Adding Dimensions from the Qualifiers

[Figure: fact/qualifier matrix and the logical model derived from it.
Matrix facts: job turnover rate, avg. length of employment, number of complaints
Matrix qualifiers: department, employee gender, employee age, trade, year, shift, month, labor union, location
Model: meter employee satisfaction (job_turnover_rate, avg_length_of_employment, number_of_complaints) with dimension levels labor union, trade, year, month, employee (emp_age, emp_gender), location, department, and shift]

DIMENSIONS IN THE LOGICAL MODEL
Once dimension hierarchies are known, the logical dimensional model is extended by adding dimensions and associating them with the meter. For every measure contained in the meter, all associated qualifiers must be represented by a dimension. To add dimensions to the model:
1. Include each dimension hierarchy previously identified. Associate only the lowest level of each hierarchy with the meter as a one-to-many relationship.
2. Examine the remaining qualifiers to find those that may be dimension attributes instead of dimension levels. In this example, employee gender and employee age are attributes that describe employee. Thus employee becomes the dimension level, with age and gender modeled as attributes. The dimension level employee is associated with the meter as a one-to-many relationship.
3. Map all remaining qualifiers as flat (single-level, non-hierarchical) dimensions, and associate each with the meter using a one-to-many relationship.
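The three steps above can be sketched as a small function. This is a hypothetical illustration: the function build_dimensions and its data layout are not from the course, and the two hierarchies passed in (year over month, labor union over trade) are assumptions made for the example. The qualifier names come from the fact/qualifier matrix.

```python
def build_dimensions(qualifiers, hierarchies, attribute_of):
    """Classify qualifiers into dimensions keyed by their lowest level.

    hierarchies: lists of level names, ordered highest to lowest.
    attribute_of: maps a qualifier to the dimension level it describes.
    """
    dimensions = {}
    remaining = set(qualifiers)

    # Step 1: each known hierarchy becomes one dimension; only its lowest
    # level is associated with the meter.
    for levels in hierarchies:
        dimensions[levels[-1]] = {"levels": list(levels), "attributes": []}
        remaining -= set(levels)

    # Step 2: qualifiers that describe a level become attributes of it.
    for qualifier in sorted(remaining):
        level = attribute_of.get(qualifier)
        if level is not None:
            dimensions.setdefault(level, {"levels": [level], "attributes": []})
            dimensions[level]["attributes"].append(qualifier)
    remaining -= set(attribute_of)

    # Step 3: everything left is a flat, single-level dimension.
    for qualifier in sorted(remaining):
        dimensions[qualifier] = {"levels": [qualifier], "attributes": []}
    return dimensions


qualifiers = ["department", "employee gender", "employee age", "trade",
              "year", "shift", "month", "labor union", "location"]
dimensions = build_dimensions(
    qualifiers,
    hierarchies=[["year", "month"], ["labor union", "trade"]],  # assumed
    attribute_of={"employee gender": "employee", "employee age": "employee"},
)
for lowest_level, dim in dimensions.items():
    print(lowest_level, dim)
```

Every key of the result is a lowest dimension level, so each key corresponds to one one-to-many association with the meter.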

Modeling Dimensions: Completing the Dimensions

[Figure: logical dimensional model with completed dimension attributes.
Meter employee satisfaction: job_turnover_rate, avg_length_of_employment, number_of_complaints
location: location_code, location_name, location_address, location_city_name, location_state_abbr, location_zip_code
labor union: union_id_number, union_name, union_group_code
trade: trade_code, trade_soc_code, trade_name, contract_start_date, contract_end_date
year: date_yyyy, fiscal_yy
month: date_yyyymm, fiscal_yyyymm, fiscal_yyyyqq
employee: emp_id_number, emp_age, emp_gender, emp_name, emp_hire_date, emp_term_date, emp_status_code, emp_term_reason
department: dept_number, dept_name, dept_abbr
job: job_id_number, job_title, job_shift_code, job_shift_name]

IDENTIFY THE DIMENSION ATTRIBUTES
Logical modeling of the dimensions is completed by expanding the list of attributes for each dimension level. Recall the earlier statement that "The analysis and design processes of dimensional modeling extend, expand, and refine the collection of measures and dimensions such that the resulting data structure has the ability to answer many questions not found on the original list." This refinement activity is one of the key areas where that expansion occurs. Examine each dimension level with the following questions:
What operational system identifiers are used for the dimension level?
What is the natural identifier or key for the dimension level?
What attributes of the dimension level appear in operational systems reporting?
What attributes of the dimension level are collected as part of business transactions?
What dimension level attributes are available in the source data?
What dimension attributes are useful for business analysis?

More about Meters and Measures: Granularity and the Measures

[Figure: logical dimensional model with the grain declared and the measures refined.
Grain: each instance of employee satisfaction represents a unique combination of 1 month, 1 employee, 1 job, 1 location, and 1 trade.
Measures: job_turnover_rate is replaced by job_change_count; avg_length_of_employment is replaced by employment_length_months; number_of_complaints is replaced by complaint_count.
Dimension levels (attributes unchanged): location, labor union, trade, year, month, employee, department, job]

WHAT CAN BE MEASURED AT THE LEVEL OF DETAIL?
Once the grain has been declared, examine the measures to be certain that they make sense in the context of the grain. In this example the following refinements have been applied:

Job_turnover_rate is replaced with job_change_count. It is not possible to determine a turnover rate for a single employee and month. It is possible, however, to count the number of job changes for a single employee in one month. And it is practical, with a collection of data for many employees and multiple months and using an OLAP tool, to derive and display job_turnover_rate.

Avg_length_of_employment is replaced with the measure employment_length_months. An average for one employee has the same numeric value as that employee's individual length of employment. Calculating the average for a group of employees is an OLAP function.

Number_of_complaints is replaced with complaint_count. This is a syntactic and naming-convention preference, and is aligned with the concept of metric class words that was discussed earlier in the course. Consistency of naming between job_change_count and complaint_count makes the data more understandable and usable for the business. Apply similar standards for other classes of data values: date, average, minimum, maximum, mean, rate, duration, etc.
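The refinements above store counts and lengths at the grain and leave rates and averages to be derived at query time. A minimal sketch of that derivation follows, standing in for what an OLAP tool would do. The sample rows, the summarize_by function, and the definition of job_turnover_rate as job changes per employee-month row are all assumptions made for illustration; the course does not define the rate formula.

```python
from collections import defaultdict

# Hypothetical rows at the declared grain (one month, one employee), with
# the dept_name dimension attribute already joined on. Values are invented.
rows = [
    {"date_yyyymm": "200409", "dept_name": "Maintenance", "emp_id_number": 100,
     "job_change_count": 1, "employment_length_months": 24},
    {"date_yyyymm": "200409", "dept_name": "Maintenance", "emp_id_number": 101,
     "job_change_count": 0, "employment_length_months": 36},
    {"date_yyyymm": "200409", "dept_name": "Operations", "emp_id_number": 102,
     "job_change_count": 0, "employment_length_months": 12},
]


def summarize_by(rows, key):
    """Derive group-level rate and average from grain-level measures."""
    totals = defaultdict(
        lambda: {"row_count": 0, "job_changes": 0, "length_months": 0}
    )
    for row in rows:
        t = totals[row[key]]
        t["row_count"] += 1
        t["job_changes"] += row["job_change_count"]
        t["length_months"] += row["employment_length_months"]
    return {
        group: {
            # Assumed definition: job changes per employee-month row.
            "job_turnover_rate": t["job_changes"] / t["row_count"],
            "avg_length_of_employment": t["length_months"] / t["row_count"],
        }
        for group, t in totals.items()
    }


summary = summarize_by(rows, "dept_name")
for dept, metrics in summary.items():
    print(dept, metrics)
```

Because the additive measures are kept at the grain, the same rows can be summarized by any other dimension attribute without changing the stored data.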

Module 4: Physical Dimensional Modeling
From Logical Model to Star Schema

Topics:
  Star Schema Dimensions
  Star Schema Fact Tables
  Star Schema Design Challenges
  Modeling Process Summary


Star Schema Dimensions: Modeling Dimension Tables

[Figure: each dimension collapsed to a single table around the employee satisfaction measures]

employee satisfaction: job_change_count, employment_length_months, complaint_count, resignation_count, termination_count, promotion_count, demotion_count, disciplinary_action_count, satisfaction_score

Dimension tables:
  location: location_code, location_name, location_address, location_city_name, location_state_abbr, location_zip_code
  labor organization: union_id_number, union_name, union_group_code, trade_code, trade_soc_code, trade_name, contract_start_date, contract_end_date
  time: date_yyyymm, fiscal_yyyymm, fiscal_yyyyqq
  employee: emp_id_number, emp_age, emp_gender, emp_name, emp_hire_date, emp_term_date, emp_status_code, emp_term_reason
  employment organization: dept_number, dept_name, dept_abbr, job_id_number, job_title, job_shift_code, job_shift_name

ONE TABLE PER DIMENSION
Up to this point in the modeling process each dimension is normalized, with dimension hierarchies intact and data redundancy eliminated. Star-schema design de-normalizes the dimensions, collapsing each dimension to a single table regardless of the number of dimension levels. Dimension attributes at higher levels of a hierarchy are kept redundantly with each occurrence of the lowest level in the dimension. This design, while suboptimal for transaction processing, is highly effective for analytical processing, where access and navigation (not update integrity and performance) are the driving factors.

This is the point of transition from logical (business solution) design to physical (technical implementation) design. Although the star schema looks much like the logical model, the components are different:

Where the logical model has a meter (a business performance measurement concept), the physical model has a fact table (a database table). The fact table contains facts: the data elements that physically implement business measures.

Where the logical model has dimension levels and relationships within them and to the meter (the business context of measures), the physical model has dimension tables (again, database tables). The dimension tables contain dimension elements: the data elements that physically implement dimension attributes.
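As a rough sketch of what collapsing a dimension means, the Python below merges a normalized department level and job level into one employment organization table. It assumes each job belongs to exactly one department; that assumption, the sample values, and the function name are illustrative and do not come from the course material.

```python
# Hypothetical normalized levels: departments keyed by dept_number,
# and jobs that each point at one department (an assumed hierarchy).
departments = {
    10: {"dept_name": "Assembly", "dept_abbr": "ASM"},
    20: {"dept_name": "Shipping", "dept_abbr": "SHP"},
}
jobs = [
    {"job_id_number": 501, "job_title": "Welder", "job_shift_code": "D",
     "job_shift_name": "Day", "dept_number": 10},
    {"job_id_number": 502, "job_title": "Welder", "job_shift_code": "N",
     "job_shift_name": "Night", "dept_number": 10},
    {"job_id_number": 601, "job_title": "Loader", "job_shift_code": "D",
     "job_shift_name": "Day", "dept_number": 20},
]


def collapse_employment_organization(job_rows, dept_lookup):
    """Build one de-normalized row per lowest-level member (job).

    The higher-level department attributes are copied onto every job row,
    so they are stored redundantly, as the star schema intends.
    """
    return [{**dept_lookup[job["dept_number"]], **job} for job in job_rows]


employment_organization = collapse_employment_organization(jobs, departments)
```

The two Assembly jobs each carry their own copy of dept_name and dept_abbr. That redundancy is the price paid for navigating the whole dimension from a single table.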

Star Schema Fact Tables: Defining the Fact Table Key

[Figure: star schema with surrogate keys added to the dimension tables and the fact table]

Fact table: employee satisfaction
  Key columns: time_key, emp_key, location_key, employment_org_key, labor_org_key, emp_age, emp_gender
  Facts: job_change_count, employment_length_months, complaint_count, resignation_count, termination_count, promotion_count, demotion_count, disciplinary_action_count, satisfaction_score

Dimension tables:
  location: location_key, location_code, location_name, location_address, location_city_name, location_state_abbr, location_zip_code
  labor organization: labor_org_key, union_id_number, union_name, union_group_code, trade_code, trade_soc_code, trade_name, contract_start_date, contract_end_date
  time: time_key, date_yyyymm, fiscal_yyyymm, fiscal_yyyyqq
  employee: emp_key, emp_id_number, emp_age, emp_gender, emp_name, emp_hire_date, emp_term_date, emp_status_code, emp_term_reason
  employment organization: employment_org_key, dept_number, dept_name, dept_abbr, job_id_number, job_title, job_shift_code, job_shift_name

DIMENSION TO FACT NAVIGATION
The foundation of OLAP processing is navigation from a set of dimensions to the facts associated with those dimensions. For example, business question 4,

  What is the rate of job turnover by department, shift, and location? How does it change from month to month?

asks that a specific measure (rate of job turnover) be reported by a group of dimensions (department, shift, location, and month). Each unique value of job_turnover_rate is determined from a unique combination of key values for all of the participating dimensions.

Note that job_turnover_rate does not exist in the star schema on the facing page. It is a measure that must be calculated from the values of other data in the fact table. Derived measures are discussed on the next set of pages.

A COMPOSITE KEY
The fact table key is simply the composite of all dimension table keys: a concatenation of dimension surrogate keys. This works well because the keys are used only for navigation by the OLAP tool and are never exposed to a business user of the tool.

TABLELESS DIMENSIONS
Note that the dimension tables for employee age and gender have been removed. Once the keys emp_age and emp_gender are migrated into the fact table, there is no need to maintain them redundantly as single-column tables. It is also acceptable, but not essential, to remove those elements from the employee dimension table.
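A minimal sketch of the composite key and of dimension-to-fact navigation, using Python's built-in sqlite3 module. The table is trimmed to one dimension and one fact, and the rows are invented for illustration. The column names follow the star schema, but the DDL itself is an assumption, not the course's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_key  INTEGER PRIMARY KEY,
    location_name TEXT
);
-- The fact table key is the composite of the dimension surrogate keys.
CREATE TABLE employee_satisfaction (
    time_key           INTEGER,
    emp_key            INTEGER,
    location_key       INTEGER REFERENCES location (location_key),
    employment_org_key INTEGER,
    labor_org_key      INTEGER,
    job_change_count   INTEGER,
    PRIMARY KEY (time_key, emp_key, location_key,
                 employment_org_key, labor_org_key)
);
INSERT INTO location VALUES (10, 'retail store 1'), (20, 'retail store 2');
INSERT INTO employee_satisfaction VALUES
    (1, 100, 10, 7, 3, 1),
    (1, 101, 10, 7, 3, 0),
    (1, 102, 20, 8, 3, 2);
""")

# Navigate from a dimension (location) to the associated facts.
job_changes_by_location = dict(conn.execute("""
    SELECT l.location_name, SUM(f.job_change_count)
    FROM employee_satisfaction AS f
    JOIN location AS l ON l.location_key = f.location_key
    GROUP BY l.location_name
"""))


def duplicate_key_is_rejected(connection):
    """Return True if a second row with an existing composite key fails."""
    try:
        connection.execute(
            "INSERT INTO employee_satisfaction VALUES (1, 100, 10, 7, 3, 5)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

The join uses only the surrogate key, which is why a business user never needs to see it. The composite primary key guarantees at most one fact row per combination of dimension members.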

Star Schema Fact Tables: Supporting Calculated Measures

[Figure: star schema with employee_count added to the fact table]

Fact table: employee satisfaction
  Key columns: time_key, emp_key, location_key, employment_org_key, labor_org_key, emp_age, emp_gender
  Facts: job_change_count, employment_length_months, complaint_count, resignation_count, termination_count, promotion_count, demotion_count, disciplinary_action_count, satisfaction_score, employee_count

Dimension tables:
  location: location_key, location_code, location_name, location_address, location_city_name, location_state_abbr, location_zip_code
  labor organization: labor_org_key, union_id_number, union_name, union_group_code, trade_code, trade_soc_code, trade_name, contract_start_date, contract_end_date
  time: time_key, date_yyyymm, fiscal_yyyymm, fiscal_yyyyqq
  employee: emp_key, emp_id_number, emp_age, emp_gender, emp_name, emp_hire_date, emp_term_date, emp_status_code, emp_term_reason
  employment organization: employment_org_key, dept_number, dept_name, dept_abbr, job_id_number, job_title, job_shift_code, job_shift_name

FACTS NEEDED FOR CALCULATION
Continuing the example from the previous pages, recall that job_turnover_rate does not exist in the star schema. It is a measure that must be calculated from the values of other data in the fact table. It is essential, then, to ensure that all data needed to calculate the measure exists in the fact table.

In this example, job_turnover_rate is calculated as job_change_count divided by the number of employees in the dimension set. One of the needed facts, job_change_count, exists in the fact table. The other necessary item, the number of employees, does not. Depending on the capabilities of the OLAP tool(s) being used, it may be necessary to include employee_count as a fact.

The star schema design on the facing page includes employee_count. Given the grain of the fact table (each row represents a unique combination of one employee, one month, one job, one location, and one trade), the value of employee_count will always be exactly one. Thus the value of job_turnover_rate will always equal the value of job_change_count at the row level. However, when data values are accumulated over several rows for a dimension set, job_turnover_rate becomes a meaningful measure.
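The calculation described above can be sketched as follows. The fact rows and department names are hypothetical, and an OLAP tool would normally perform this aggregation itself. The Python only illustrates the arithmetic of summing both facts over a dimension set before dividing.

```python
from collections import defaultdict

# Hypothetical fact rows: (dept_name, date_yyyymm, job_change_count, employee_count).
# At the declared grain employee_count is always 1.
fact_rows = [
    ("Assembly", "200101", 1, 1),
    ("Assembly", "200101", 0, 1),
    ("Assembly", "200101", 0, 1),
    ("Assembly", "200101", 0, 1),
    ("Shipping", "200101", 1, 1),
    ("Shipping", "200101", 0, 1),
]


def job_turnover_rate_by_dept_month(rows):
    """Sum both facts per (department, month), then divide the totals."""
    totals = defaultdict(lambda: [0, 0])
    for dept_name, month, job_changes, employees in rows:
        totals[(dept_name, month)][0] += job_changes
        totals[(dept_name, month)][1] += employees
    return {
        group: job_changes / employees
        for group, (job_changes, employees) in totals.items()
    }


turnover = job_turnover_rate_by_dept_month(fact_rows)
```

Dividing the summed facts, rather than averaging row-level rates, is what makes the result meaningful: at the row level the rate would simply repeat job_change_count.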

Star Schema Design Challenges
Slowly Changing Dimensions: Type 2 Example

Location Table as of January 1

location_key  location_code  location_name   location_address  location_city  location_state
00010         NWR001         retail store 1  2317 SE Stark     Portland       OR
00020         NWR002         retail store 2  9927 S. Main      Roseburg       OR
00030         NWWHSE         warehouse       180 N. Broadw     Portland       OR

Location Table as of September 1

location_key  location_code  location_name   location_address  location_city  location_state
00010         NWR001         retail store 1  2317 SE Stark     Portland       OR
00020         NWR002         retail store 2  9927 S. Main      Roseburg       OR
00030         NWWHSE         warehouse       180 N. Broadw     Portland       OR
00040         NWD001         dist. center 1  180 N. Broadw     Portland       OR
00050         NWD002         dist. center 2  440 Coburg Rd     Eugene         OR

Keep Current & History Rows in Dimension Table

KEEPING ALL OF THE HISTORY
Type 2 dimensions insert a new row, with a new surrogate key value, into the dimension table whenever a change occurs to the value of a dimension element. This is the most common design choice for slowly changing dimensions.

The obvious advantage is that a type 2 dimension preserves all of the history of dimension data changes. For a structure that is also dimensioned by time, there is no need to time-stamp dimension records. Each row of the fact table is associated with only one row of each dimension table, thus the time dimension serves as the time stamp.

The most significant disadvantage of type 2 dimensions is rapid growth. Type 2 dimensions may result in very large dimension tables, contributing to both database size and query performance concerns.
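A small Python sketch of the type 2 mechanism follows. It reuses the January rows from the example table, trimmed to three columns plus the key. Two things are assumptions made for illustration: that the warehouse at key 00030 became "dist. center 1", and that new surrogate keys are assigned in steps of 10. The function name is hypothetical.

```python
# Location rows as of January 1 (addresses are kept exactly as they
# appear in the example table).
january = [
    {"location_key": "00010", "location_code": "NWR001",
     "location_name": "retail store 1", "location_address": "2317 SE Stark"},
    {"location_key": "00020", "location_code": "NWR002",
     "location_name": "retail store 2", "location_address": "9927 S. Main"},
    {"location_key": "00030", "location_code": "NWWHSE",
     "location_name": "warehouse", "location_address": "180 N. Broadw"},
]


def add_type2_row(rows, based_on_key, changed_values, key_step=10):
    """Append a changed copy of a row under a new surrogate key.

    The existing row is left untouched so that facts already recorded
    against it keep their historical context.
    """
    current = next(row for row in rows if row["location_key"] == based_on_key)
    highest_key = max(int(row["location_key"]) for row in rows)
    new_key = str(highest_key + key_step).zfill(5)
    new_row = {**current, **changed_values, "location_key": new_key}
    return rows + [new_row]


september = add_type2_row(
    january,
    based_on_key="00030",
    changed_values={"location_code": "NWD001", "location_name": "dist. center 1"},
)
```

Facts loaded before the change continue to reference key 00030, and facts loaded afterward reference 00040. This is how the time dimension ends up serving as the time stamp.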

Star Schema Design Challenges
Semi-Additive and Non-Additive Facts

[Figure: the employee satisfaction star schema, unchanged from the Supporting Calculated Measures figure. The fact table holds the key columns time_key, emp_key, location_key, employment_org_key, labor_org_key, emp_age, and emp_gender, and the facts job_change_count through satisfaction_score plus employee_count. The dimension tables are location, labor organization, time, employee, and employment organization.]

MEASURES AND SUMMARIZATION RULES
One of the powerful features of OLAP processing is the ability to summarize measures using hierarchical dimensions: to sum monthly sales measures, for example, to report year-to-date sales. Not all facts, however, should be summarized for all dimensions. Each measure in a fact table needs to be examined to determine whether it is:

Additive: producing useful and meaningful information when summarized along any set of dimensions.
Semi-Additive: useful to summarize using some dimensions but not all dimensions.
Non-Additive: impractical to sum along any set of dimensions.

AN EXAMPLE
The employee satisfaction star schema illustrates an example of a semi-additive fact: employee_count. To count the employees in a given trade for a specific month, you could sum employee_count using those two dimensions and produce a meaningful result. The same approach could be used to count employees in a specific department in any month.

Yet consider what happens if the goal is to count employees of a department during the first quarter of the year. Any employee who worked in that department throughout the quarter will have three rows: one for January, one for February, and one for March. Each row has an employee_count value of one. Each of those employees would be counted three times, resulting in an incorrect total. Employee_count is a semi-additive fact because it can be summed along most dimensions but may not be summarized up the hierarchy of the time dimension.

THE SOLUTION
The solution to semi-additive and non-additive facts depends on the specific OLAP tool(s) being used. Some tools have features to specify summarization constraints. When your tool has these features, it is important that the constraints be specified. Without these features, business user education and descriptions of the summarization constraints in the metadata are helpful.
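The first-quarter problem can be reproduced with a few hypothetical fact rows. The sketch below shows the overcount and, as one possible workaround that the course text does not prescribe, a distinct count of employee keys. All names and values are illustrative assumptions.

```python
# Hypothetical fact rows: (emp_key, dept_name, date_yyyymm, employee_count).
# Two employees work in the same department for all of the first quarter.
fact_rows = [
    (1, "Assembly", "200101", 1), (2, "Assembly", "200101", 1),
    (1, "Assembly", "200102", 1), (2, "Assembly", "200102", 1),
    (1, "Assembly", "200103", 1), (2, "Assembly", "200103", 1),
]
first_quarter = {"200101", "200102", "200103"}


def summed_employee_count(rows, months):
    """Naive roll-up: sum employee_count over every selected month."""
    return sum(count for _, _, month, count in rows if month in months)


def distinct_employee_count(rows, months):
    """Assumed workaround: count each employee key once across the months."""
    return len({emp_key for emp_key, _, month, _ in rows if month in months})


january_headcount = summed_employee_count(fact_rows, {"200101"})
quarter_sum = summed_employee_count(fact_rows, first_quarter)
quarter_distinct = distinct_employee_count(fact_rows, first_quarter)
```

Summing within a single month gives the correct headcount of two. Summing up the time hierarchy gives six, which is exactly the triple counting described in the example.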

Modeling Process Summary: From Business Requirements to Star Schema

[Figure: dimensional modeling process, from business requirements through physical design]

business requirements
  define the scope (metrics for business management)
  develop a list of business questions
  map questions into fact/qualifier matrix

logical dimensional modeling
  model the meter
    name the meter
    select measures from F/Q matrix
  model the dimensions
    select qualifiers from F/Q matrix
    find hierarchies in the qualifiers
    add dimensions & associate with meter
    name the dimensions
  refine the dimensions
    add dimension attributes
  refine the meter
    declare the grain
    refine the measures

physical dimensional modeling
  model the dimension tables
  define the dimension table keys
  define the fact table key

PROCESS OVERVIEW
The facing page illustrates dimensional data design activities from requirements through physical design. This diagram provides a simple, easy-to-reference summary of the dimensional modeling process from beginning to end.

Module 5: Dimensional Data and Business Analytics

Topics:
  Delivering Business Value
  An OLAP Demonstration


An OLAP Demonstration: Business Metrics in Action

[Figure: the employee satisfaction star schema used for the demonstration. The fact table holds the key columns time_key, emp_key, location_key, employment_org_key, labor_org_key, emp_age, and emp_gender, and the facts job_change_count through satisfaction_score plus employee_count. The dimension tables are location, labor organization, time, employee, and employment organization.]

OLAP DEMO
This OLAP demonstration completes the chain of activity from requirements to business analytics.