Data Vault Modeling & Methodology: Technical Side and Introduction. Dan Linstedt, 2010

1 Data Vault Modeling & Methodology: Technical Side and Introduction. Dan Linstedt, 2010

2 Technical Definition
The Data Vault is a detail-oriented, historical-tracking, and uniquely linked set of normalized tables that supports one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is architected specifically to meet the needs of today's enterprise data warehouses.

3 What Does One Look Like?
[Diagram labels: Customer, Product, and Order Hubs; Links; Satellites (Sat); F(x). Callout: Records a history of the interaction.]
Elements: Hub, Link, Satellite
Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellites = Descriptive Data

4 Excel As A Source
[Diagram labels: spreadsheet columns Level A, Level B, Level C, each with an Item (User Grouping Structures); Staging Table (Flattened Structure); Raw Source Data in DV: Hub Grouping, Hub Account, Link Acct To Group, Hierarchical Link of Groups, Sat Group Type]
Do you have a power executive who is technically inclined and who runs the business off a rogue spreadsheet?

5 Data Vault Basic Elements: CORE ARCHITECTURE

6 Data Vault Core Architecture
Hubs, Links, Satellites
Hubs = Unique List of Business Keys
Links = Unique List of Relationships across keys
Satellites = Descriptive Data
Satellites have one and only one parent table.
Satellites cannot be parents to other tables.
Hubs cannot be child tables.
Last Seen Dates, Load Dates, Record Sources, and Surrogate Keys are not part of the core architecture. They exist to help with modeling and key migration.

7 Hub Entity
A Hub is a list of unique business keys.

Hub Structure      Hub Product (example)
Primary Key        Product Sequence ID
<Business Key>     Product Number          (Unique Index / Primary Index)
Load DTS           Product Load DTS
Last Seen DTS      Product Last Seen DTS
Record Source      Prod Record Source

A Hub's business key is a unique index.
A Hub's load date represents the FIRST TIME the EDW saw the data.
A Hub's record source represents, first, the Master data source (on collisions); if that is not available, it holds the origination source of the actual key.

8 Link Entity
A Link is an intersection of two or more business keys. It can contain Hub keys and other Link keys.

Link Structure                      Link Line-Item (example)
Primary Key                         Link Line Item Sequence ID
{Hub/Lnk Surrogate Keys 2..N}       Hub Product Sequence ID, Hub Order Sequence ID, **Line Item Number   (Unique Index / Primary Index)
Load DTS                            Load DTS
Last Seen DTS                       Last Seen DTS
Record Source                       Record Source

A Link's business key is a composite unique index.
A Link may or may not have a **Item Numbering attribute.
A Link's load date represents the FIRST TIME the EDW saw the data.
A Link's record source represents, first, the Master data source (on collisions); if that is not available, it holds the origination source of the actual key.

9 Satellite Entity
A Satellite is a time-dimensional table housing detailed information about the Hub's or Link's business keys.

Satellite Structure      Customer Satellite (example)
Primary Key              Customer #
Load DTS                 Load DTS
Extract DTS              Extract DTS
**Load End Date          **Load End Date
Detail Business Data     Customer Name, Customer Addr1, Customer Addr2
{Update User}            {Update User}
{Update DTS}             {Update DTS}
Record Source            Record Source

Index: Unique Index (Primary Index)
Satellites are defined by TYPE of data and RATE OF CHANGE.
Mathematically, this reduces redundancy and decreases storage requirements over time (compared to a star schema).
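The Hub, Link, and Satellite structures on the three slides above can be sketched as DDL. This is a minimal illustration using Python's built-in sqlite3 module, not the author's reference schema: the table names, column names, and column types are assumptions modeled on the slide examples, and `create_schema` is a hypothetical helper. The Satellite is keyed here on the parent Hub's surrogate key plus the load date, which is one common reading of the slide.

```python
import sqlite3

# Hypothetical DDL: names follow the slide examples, types are assumptions.
# No FOREIGN KEY constraints are declared, in line with the slides' rule
# against enforcing relationships in the data model.
DDL = """
CREATE TABLE hub_product (
    product_seq_id   INTEGER PRIMARY KEY,   -- surrogate key
    product_number   TEXT NOT NULL UNIQUE,  -- business key (unique index)
    load_dts         TEXT NOT NULL,         -- first time the EDW saw the key
    last_seen_dts    TEXT,
    record_source    TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_seq_id     INTEGER PRIMARY KEY,
    order_number     TEXT NOT NULL UNIQUE,
    load_dts         TEXT NOT NULL,
    last_seen_dts    TEXT,
    record_source    TEXT NOT NULL
);
CREATE TABLE link_line_item (
    line_item_seq_id INTEGER PRIMARY KEY,
    product_seq_id   INTEGER NOT NULL,      -- Hub Product surrogate key
    order_seq_id     INTEGER NOT NULL,      -- Hub Order surrogate key
    line_item_number INTEGER NOT NULL,      -- optional on the slide
    load_dts         TEXT NOT NULL,
    last_seen_dts    TEXT,
    record_source    TEXT NOT NULL,
    UNIQUE (product_seq_id, order_seq_id, line_item_number)
);
-- The parent Hub Customer would follow the same pattern as the Hubs above.
CREATE TABLE sat_customer (
    customer_seq_id  INTEGER NOT NULL,      -- parent Hub key
    load_dts         TEXT NOT NULL,
    extract_dts      TEXT,
    load_end_date    TEXT,
    customer_name    TEXT,
    customer_addr1   TEXT,
    customer_addr2   TEXT,
    record_source    TEXT NOT NULL,
    PRIMARY KEY (customer_seq_id, load_dts)
);
"""


def create_schema(conn):
    """Create the sketch tables on an open sqlite3 connection."""
    conn.executescript(DDL)
```

Keeping the business key under a unique index means a second insert of the same key is rejected by the database, which is what lets the Hub stay a unique list.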

10 Rules and Standards GOVERN your deployment: THINKING OF BREAKING RULES

11 Some Rules For You
NO Foreign Keys in the Satellites!
NO Hub to Hub (Parent Child relationships)
NO Enforcement of relationships in the data model
NO Date Time attributes in HUB or LINK Primary Keys
Why?
It breaks flexibility.
It breaks auditability / accountability.
It breaks scalability.
It breaks performance.
It introduces decisions in the architecture, which breaks patterns!
Up Next: Links and the Unit Of Work

12 Business Key Definitions
The contracts system is responsible for creating customer account numbers. The EDW will never see other systems creating customer account numbers. (Requirement #101)
Sales is clearly creating customer numbers. How do we detect the issue and alert the business?
Point: Not all business keys are created EQUAL!

13 Link: Unit of Work
[Diagram labels: Hub Category, Hub Product, Hub Supplier; Link Prod-Cat with Sat Effectivity; Link Prod-Supp with Sat Effectivity; Link Line Item]
Unit Of Work Link: Product by Supplier by Category
Link Product by Category and Link Product by Supplier: these Links are optional, used for exploration only.

14 What Happens When: We Break the Unit of Work
Source System UOW: Product_ID, Category_ID, Supplier_ID
Model Normalization:
Link Product by Supplier: Product_ID, Supplier_ID
Link Product by Category: Product_ID, Category_ID
Question: After normalizing, how can you reconstruct the source image EXACTLY as it stands?

15 What Happens When: Trying to Rebuild from Two Links
Source System UOW: Product_ID, Category_ID, Supplier_ID
Model Normalization:
Link Product by Supplier: Product_ID, Supplier_ID
Link Product by Category: Product_ID, Category_ID
Re-joining the data creates a record that does not exist in the original source system. This is the same problem that BI engines will have when putting together Data Mart results.

16 Link: Unit of Work Kept Together
Source System Source Table UOW: Product_ID, Category_ID, Supplier_ID
Data Vault Link, Product by Category by Supplier: Product_ID, Category_ID, Supplier_ID
Commutative Property: enable reproduction of the source exactly as it stands.
The UOW is properly represented by a single Link in the Data Vault.
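The unit-of-work problem described on the preceding slides can be shown with a small worked example. The sketch below uses made-up key values (P1, C1, S1, and so on are hypothetical) and plain Python sets: it splits a three-key source table into the two narrower Links, re-joins them on Product_ID, and lists the rows that the re-join invents.

```python
# Hypothetical source rows: (Product_ID, Category_ID, Supplier_ID).
source_uow = {
    ("P1", "C1", "S1"),
    ("P1", "C2", "S2"),
}

# Breaking the unit of work into two Links.
link_product_by_supplier = {(p, s) for p, c, s in source_uow}
link_product_by_category = {(p, c) for p, c, s in source_uow}

# Re-joining the two Links on Product_ID.
rejoined = {
    (p, c, s)
    for p, c in link_product_by_category
    for p2, s in link_product_by_supplier
    if p == p2
}

# Rows produced by the join that never existed in the source.
phantom_rows = sorted(rejoined - source_uow)
print(phantom_rows)
```

With these two source rows the re-join yields four combinations, two of which are not in the source. A single Link holding all three keys keeps exactly the source rows, so no phantom rows can appear.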

17 What keeps you up at night? CURRENT LOADING PAIN

18 Problems with EDW Loads Today
Technical Issues:
2am wakeup calls because data won't fit the business rules
Emergency fixes to production
Speed, Speed, Speed (shrinking load window + more data)
Can't load real-time data (business rules in the way!!)
Business won't buy better, faster hardware!
Business Issues:
Maintenance cycles take too long
Maintenance costs continue to increase
Fixes to existing mappings break working logic
Complexity of existing systems becomes unsustainable to the business
IT isn't using 80%+ of the hardware resources given to them (their jobs are running at 40% utilization when they are "full-bore")

19 Solutions!
Technical Solutions:
All parallel job streams, as much as possible
1 Target Per Map, Per Action (reduces complexity)
Generate data flows based on patterns (then focus on the real work)
Get some SLEEP at night!! (no more production modifications)
Business Solutions:
Decrease turn-around time
Increase performance
Handle real-time data!!
Reduce Complexity = Reduce Costs, Reduce Time to Implement
Get the power back for decision making, discovering, and building your own marts

20 How?

21 Some standards to follow: BASIC LOADING CONCEPTS

22 Loading: A Golden Rule
100% of the Data Loaded to the EDW, 100% of the time!
It's all about Auditability.

23 Load Date / End Date Geology
[Diagram panels: Batch Load; Real-Time Loading]

24 Real Time Loading - DV
Stock Trade: ACCOUNT= TRADE="Buy" STOCK= DAN" SHARES=100.0 CURRENCY="USD" PRICE= DATE="Feb 20, 2002" Comment="Buy Order to Execute"
= Inserts Only, no Updates
[Diagram labels: Acct Hub, Trade Link, Stock Hub; Transactional Link carrying TRADE="Buy" SHARES=100.0 CURRENCY="USD" PRICE= DATE="Feb 20, 2002" Comment="Buy Order to Execute"]
[Chart: # of Inserts (10M, 25M, 50M, 75M) against Months in Production; annotations: First Data Set Loaded, New Systems Data Added]
As critical mass of current business keys is reached, the insert rates decrease rapidly. New systems add new keys quickly and efficiently to an existing Hub.

25 Batch Load Date Time Stamp
[Diagram labels: two Stage Loads into the Staging Area; control values CNTRL_DTE, LOAD_DTS; each STAGING TABLE with Sequence_ID, ..., Load_DTS, Record_Source; EDW Data Vault]
Load Date is exactly the same for all rows.

26 Parallel Load Architecture - Batch
[Diagram labels: Staging Loads (Sources, Stage); Data Vault Loads (Hubs, Links, Hub Satellites, Link Satellites); Data Mart Loads (Dimensions, Facts); Major Synchronization Points]
Processing:
All loads are done in parallel.
Sets of processes wait for the previous set to complete.
Processes are run as soon as data is ready.
No other waiting time is required.
Load dependencies are greatly reduced.
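The "sets of processes wait for the previous set" idea on this slide can be sketched with Python's standard `concurrent.futures` module. The wave ordering below (stage, then Hubs, then Links and Hub Satellites, then Link Satellites) and all job names are assumptions for illustration; `run_waves` is a hypothetical helper, not part of any ETL tool.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed wave ordering; the job names are illustrative only.
WAVES = [
    ["stage_customer", "stage_order"],
    ["hub_customer", "hub_order"],
    ["link_customer_order", "sat_customer", "sat_order"],
    ["sat_link_customer_order"],
]


def run_waves(waves, load):
    """Run every load in a wave in parallel and wait for the whole wave
    (the synchronization point) before starting the next one."""
    results = []
    with ThreadPoolExecutor() as pool:
        for wave in waves:
            # list() drains the map iterator, so all loads in this wave
            # have finished before the loop moves on.
            results.append(list(pool.map(load, wave)))
    return results
```

A call such as `run_waves(WAVES, my_load_function)` runs each set concurrently; the only waiting is at the boundary between sets.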

27 Mathematics of Batch Loading
It's all about SPEED, SPEED, SPEED
10 Million Incoming Rows:
60%-80% Inserts (Never Seen Before)
10%-20% Updates (Matched By KEY)
5% Deletes
EDW: 1 Billion Rows and growing
Inserts are the single fastest operation in the database!
Updates are the single slowest operation in the database!
Q: Why push 80% of your Insert data through the heaviest/slowest transformation logic?

28 Simple Loading Patterns
Rule: 1 Target Per Data Flow (map/graph) Per Action
[Diagram, lookup-based flow: Source, SQ, LKP Target, Filter If Exists, Target Insert]
[Diagram, view-based flow: Source (Stage), SQ, Target Insert, Target]
Insert View: Select ALL that do not exist by PK in target
Update View: Select ALL that exist by PK in target, ONLY those with DELTA
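The Insert View and Update View on this slide amount to two filters over the staged rows. A minimal sketch, assuming both the staging data and the target are held as dictionaries keyed by primary key (the function name `split_by_action` and the sample data are hypothetical):

```python
def split_by_action(staged, target):
    """Split staged rows into the two views from the slide.

    staged, target: dict mapping primary key -> tuple of attribute values.
    Returns (inserts, updates).
    """
    # Insert view: all staged rows whose PK does not exist in the target.
    inserts = {pk: row for pk, row in staged.items() if pk not in target}
    # Update view: rows whose PK exists in the target AND whose values differ.
    updates = {
        pk: row
        for pk, row in staged.items()
        if pk in target and target[pk] != row
    }
    return inserts, updates


target = {1: ("Ann", "Denver"), 2: ("Bob", "Boston")}
staged = {1: ("Ann", "Denver"), 2: ("Bob", "Austin"), 3: ("Cy", "Reno")}
inserts, updates = split_by_action(staged, target)
```

Each view then feeds its own single-target, single-action data flow, so the bulk of the rows (the inserts) never pass through the heavier update logic.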

29 Results of Pattern Tuning
FROM THIS (no parallelism):
5M rows at 600 RPS = 2.31 hrs, OR 7k RPS = 11.9 mins
This map must run at a minimum of 10k RPS to beat the parallel times: 10k RPS = 8.33 mins
TO THIS!
Pass 1: 33k RPS = 2.52 mins
Pass 2: 33k RPS = 2.52 mins; 25k RPS = 3.33 mins
Pass 3: 50k RPS = 1.66 mins; 33k RPS = 2.52 mins; 40k RPS = 2.03 mins; 23k RPS = 3.61 mins
Total Time = 9.46 mins
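The figures on this slide can be reproduced with simple arithmetic, under two assumptions that the slide does not state outright: every stream moves the full 5 million rows, and each pass lasts as long as its slowest stream. Under that reading, the total is the sum of the slowest time in each pass (2.52 + 3.33 + 3.61 = 9.46 on the slide). A short sketch:

```python
ROWS = 5_000_000  # "5M" on the slide, read here as the row count per stream


def minutes(rows, rows_per_second):
    """Elapsed minutes to move `rows` at a steady `rows_per_second`."""
    return rows / rows_per_second / 60


# Rates (rows per second) of the parallel streams in each pass, from the slide.
PASSES = [
    [33_000],
    [33_000, 25_000],
    [50_000, 33_000, 40_000, 23_000],
]

# A pass ends when its slowest stream ends; passes run one after another.
pass_minutes = [max(minutes(ROWS, rps) for rps in rates) for rates in PASSES]
total_minutes = sum(pass_minutes)

print(f"serial at 600 rps: {minutes(ROWS, 600) / 60:.2f} hrs")
print(f"serial at 7k rps:  {minutes(ROWS, 7_000):.1f} mins")
print(f"serial at 10k rps: {minutes(ROWS, 10_000):.2f} mins")
print(f"parallel passes:   {total_minutes:.2f} mins")
```

Computed from the rounded rates, the parallel total comes out near 9.48 minutes rather than the slide's 9.46, and the 40k stream near 2.08 minutes rather than 2.03; the slide's figures were presumably taken from unrounded measured rates.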

30 Patterns Take the Cake! LOADING THE DATA VAULT

31 Loading Templates: Hubs
[Flow: Staging Data; Distinct List BK Keys; Exists In Target? No: Insert Into Target (Gen Surrogate) Hub. Yes: Drop Row From Feed.]
Select a Master system, and a hierarchy of importance for sub-systems, to annotate the arrival location of data.
Purpose of the loading template: find out if the business key exists in the Hub; if not, insert it.
Use a distinct (unique) list of business keys coming from the staging area.
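The Hub template above can be sketched in a few lines. This is an in-memory illustration, not the author's implementation: the Hub is modeled as a dictionary keyed by business key, and `load_hub`, its parameters, and the row layout are all hypothetical.

```python
from itertools import count


def load_hub(hub, staged_keys, load_dts, record_source, next_id):
    """Insert-only Hub load.

    hub: dict mapping business key -> row dict (the target Hub).
    staged_keys: business keys from staging, possibly with duplicates.
    next_id: iterator that generates surrogate keys.
    Returns the number of rows inserted.
    """
    inserted = 0
    # dict.fromkeys gives a distinct list while preserving arrival order.
    for business_key in dict.fromkeys(staged_keys):
        if business_key in hub:
            continue  # exists in target: drop the row from the feed
        hub[business_key] = {
            "seq_id": next(next_id),         # generated surrogate key
            "load_dts": load_dts,            # first time the key was seen
            "record_source": record_source,
        }
        inserted += 1
    return inserted
```

Because existing keys are skipped rather than updated, the load date on a Hub row keeps recording the first time the key arrived. A surrogate-key source such as `count(1)` can be passed as `next_id`.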

32 Loading Templates: Links
[Flow: Staging Data; Distinct List Busn Keys; Lookup EACH Hub's Surrogate Keys; Exists In Target? No: Insert Into Target (Gen Surrogate) Link. Yes: Drop Row From Feed.]
Select a Master system, and a hierarchy of importance for sub-systems, to annotate the arrival location of data.
Purpose of the loading template: find all relationships between business keys, then check whether the relationship is already recorded in the Link; if not, insert it.
Use a distinct list of related business keys.
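A matching sketch for the Link template, under the same in-memory assumptions. Each Hub is simplified here to a dictionary from business key to surrogate key, and the Link is a dictionary keyed by the tuple of Hub surrogate keys; `load_link` and its parameters are hypothetical names.

```python
from itertools import count


def load_link(link, hubs, staged_rows, load_dts, record_source, next_id):
    """Insert-only Link load.

    link: dict mapping tuple of Hub surrogate keys -> row dict.
    hubs: one dict per key column, mapping business key -> surrogate key.
    staged_rows: tuples of related business keys, possibly with duplicates.
    Returns the number of relationships inserted.
    """
    inserted = 0
    for row in dict.fromkeys(staged_rows):  # distinct list of relationships
        # Look up each Hub's surrogate key; a KeyError here means the Hub
        # load has not run yet for that business key.
        key = tuple(hub[bk] for hub, bk in zip(hubs, row))
        if key in link:
            continue  # relationship already recorded: drop the row
        link[key] = {
            "seq_id": next(next_id),
            "load_dts": load_dts,
            "record_source": record_source,
        }
        inserted += 1
    return inserted
```

The surrogate-key lookup is the reason Hub loads have to complete before Link loads start.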

33 Loading Templates: Satellites
[Flow: Staging Data; Distinct List Sat Rows; Lookup EACH Hub's or Link's Surrogate Keys; Find Latest Sat Row; All Columns Match? No: Insert Into Target Satellite. Yes: Drop Row From Feed.]
Select a Master system, and a hierarchy of importance for sub-systems, to annotate the arrival location of data.
Purpose of the loading template: gather descriptive data and compare it to the most recent copy of the information in the Satellite; if there are any deltas, load; if not, don't load.
Use a distinct list of descriptive fields from the source systems.
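The Satellite template differs from the Hub and Link templates in that it compares against the latest existing row rather than checking for existence. A minimal in-memory sketch, assuming the parent surrogate keys have already been looked up; `load_satellite` and the row layout are hypothetical.

```python
def load_satellite(sat, staged, load_dts, record_source):
    """Delta-driven, insert-only Satellite load.

    sat: dict mapping parent surrogate key -> list of rows, oldest first.
    staged: dict mapping parent surrogate key -> dict of descriptive columns.
    Returns the number of rows inserted.
    """
    inserted = 0
    for parent_id, attrs in staged.items():
        history = sat.setdefault(parent_id, [])
        # Find the latest Satellite row for this parent, if any.
        latest = history[-1]["attrs"] if history else None
        if latest == attrs:
            continue  # all columns match: drop the row from the feed
        # A delta (or a first arrival) becomes a new row; nothing is updated.
        history.append(
            {
                "load_dts": load_dts,
                "record_source": record_source,
                "attrs": dict(attrs),
            }
        )
        inserted += 1
    return inserted
```

Unchanged rows are dropped from the feed, so the Satellite grows only when the descriptive data actually changes.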

34 How to build your Data Vault: GETTING STARTED HOW TO

35 Step 1: Establish Scope (Build Business Case Model)

36 Step 1: Define Business Keys
[Diagram: Hub Invoice, Hub Campaign, Hub Customer, Hub Products]

37 Step 2: Define Associations
[Diagram: Hub Invoice, Hub Campaign, Hub Customer, Hub Products; Link Campaign by Invoice by Customer, Link Product on Campaign, Link Invoice Line Items]

38 Step 3: Define Descriptive Data
[Diagram: Hub Invoice, Hub Campaign, Hub Customer, Hub Products; Link Campaign by Invoice by Customer, Link Product on Campaign, Link Invoice Line Items; Satellites: Effectiveness Ratings, Effectiveness Dates, Dates and Amounts, Address, Details, Contacts, Availability Dates, Defect Reasons, Descriptions, Stock Quantities, Amounts, Quantities]

39 Step 4: Build Source Model (PK/FK) (No Pictures, Sorry)
Ensure the source model (DDL only) has Primary and Foreign Keys defined.
Normalize the source model (if not normalized).
Capture and integrate all source systems involved (if not already captured).
Add comments to the DDL (tables and fields).

40 Step 5: Build Cross-Reference
The purpose of such an exercise is not to identify all the elements, but specifically to identify the target Hubs (i.e., the business keys), target Links, and at LEAST a single Satellite for at least 1 source column. The engine (SaaS) will automatically assign all other descriptive elements to the first Satellite identified.

SOURCE TABLE      SOURCE COLUMN     GROUP  TARGET TABLE          TARGET COLUMN
AHLTAT_DIAGNOSIS  DOC_REF           1      SAT_AHLTAT_DIAGNOSIS  DOC_REF
                  DATAID            1      HUB_DIAGNOSIS         DIAGNOSIS_DATAID
                  FACILITYNCID      1      HUB_FACILITY          FAC_ID
                  DIAGNOSISNCID     1      SAT_AHLTAT_DIAGNOSIS  DIAGNOSISNCID
                  ENCOUNTERNUMBER   1      HUB_EVENT             EVNT_ID
                  CLINICIANNCID     1      HUB_CLINICIAN         CLINICIAN_NCID
                  UNIT_NUMBER       1      HUB_UNIT              UNIT_ID
                  MEDCINID          1      HUB_MEDCIN            MEDCIN_ID
                  CREATETIME        1      SAT_AHLTAT_DIAGNOSIS  CREATETIME
                  CREATEUSERNCID    1      SAT_AHLTAT_DIAGNOSIS  CREATEUSERNCID
                  MODIFYUSERNCID    1      SAT_AHLTAT_DIAGNOSIS  MODIFYUSERNCID
                  MODIFYTIME        1      SAT_AHLTAT_DIAGNOSIS  MODIFYTIME
                  PRIORITY          1      SAT_AHLTAT_DIAGNOSIS  PRIORITY
                  DIAGNOSESCOMMENT  1      SAT_AHLTAT_DIAGNOSIS  DIAGNOSESCOMMENT
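A cross-reference like the one above is easy to turn into a per-target summary that a generator could work from. The sketch below is only an illustration of that idea, not the SaaS engine mentioned on the slide: it takes a few rows from the table, classifies each target by its HUB_ or SAT_ prefix (an assumed naming convention), and groups the source columns under each target table. `targets_by_type` is a hypothetical helper.

```python
from collections import defaultdict

# (source column, target table, target column): a few rows from the slide.
XREF = [
    ("DOC_REF", "SAT_AHLTAT_DIAGNOSIS", "DOC_REF"),
    ("DATAID", "HUB_DIAGNOSIS", "DIAGNOSIS_DATAID"),
    ("FACILITYNCID", "HUB_FACILITY", "FAC_ID"),
    ("DIAGNOSISNCID", "SAT_AHLTAT_DIAGNOSIS", "DIAGNOSISNCID"),
    ("ENCOUNTERNUMBER", "HUB_EVENT", "EVNT_ID"),
]


def targets_by_type(xref):
    """Group mappings as {type prefix: {target table: [(source, target)]}}."""
    grouped = defaultdict(dict)
    for source_column, target_table, target_column in xref:
        # Assumed convention: the text before the first underscore names
        # the entity type (HUB, SAT, ...).
        kind = target_table.split("_", 1)[0]
        grouped[kind].setdefault(target_table, []).append(
            (source_column, target_column)
        )
    return grouped


grouped = targets_by_type(XREF)
hub_tables = sorted(grouped["HUB"])
```

From a structure like this, each Hub entry names a business key to load, and each Satellite entry lists the descriptive columns to compare for deltas.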

41 Step 6: Generate Baseline ETL/ELT
[Diagram labels: inputs Source DDL, Cross-Ref Mapping XLS, Target DDL; Generate Code, Reports, Documentation; Data Flows (Mappings / Graphs)]

42 What did we learn? CONCLUSIONS / SUMMARY

43 Data Vault
Modeling is:
Made up of Hubs, Links, and Satellites
Easy to create and build
Hardest thing is to find/locate and define the Business Keys
Consistent, Scalable, Repeatable, Pattern Based
RULES BASED / STANDARDS DRIVEN
Loading is:
Scalable, Fault-Tolerant, Parallelizable, Pattern Based
Generatable
Performance Based
100% Restartable
Set Based
Devoid of Soft Business Rules!!

44 Still - Lots To Learn
We didn't cover: joins; point-in-time tables; building marts; business logic components; SQL extraction; bridge tables; what to do when dealing with bad data; architecting security, managing governance, handling metadata.
Contact me for Workshops (training) and Mentoring.

45 Questions?
Dan Linstedt, President, Empowered Holdings, LLC
Tel:
SERVICES: Consulting Assessments Product Selection Scorecards Architecture / Design Mentoring and Workshops (training)


More information

Information Value Chain

Information Value Chain Physical Value Chain Introduction When I was head of architecture at the newly global Dun and Bradstreet I needed to change my thinking from that of a software vendor, which I had recently been, to that

More information

SharePoint 2010 Enterprise Content Management for IT Pros. Mirjam van Olst Macaw

SharePoint 2010 Enterprise Content Management for IT Pros. Mirjam van Olst Macaw SharePoint 2010 Enterprise Content Management for IT Pros Mirjam van Olst Macaw About Mirjam Blog: http://sharepointchick.com Email: mirjam@macaw.nl Twitter: @mirjamvanolst Agenda Managed Metadata Service

More information

Data and Knowledge Management Dr. Rick Jerz

Data and Knowledge Management Dr. Rick Jerz Data and Knowledge Management Dr. Rick Jerz 1 Goals Define big data and discuss its basic characteristics Understand ways to store information Understand the value of a Database Management System Explain

More information

Entity Relationship Diagram (ERD) Dr. Moustafa Elazhary

Entity Relationship Diagram (ERD) Dr. Moustafa Elazhary Entity Relationship Diagram (ERD) Dr. Moustafa Elazhary Data Modeling Data modeling is a very vital as it is like creating a blueprint to build a house before the actual building takes place. It is built

More information

BI/DWH Test specifics

BI/DWH Test specifics BI/DWH Test specifics Jaroslav.Strharsky@s-itsolutions.at 26/05/2016 Page me => TestMoto: inadequate test scope definition? no problem problem cold be only bad test strategy more than 16 years in IT more

More information

Module 1.Introduction to Business Objects. Vasundhara Sector 14-A, Plot No , Near Vaishali Metro Station,Ghaziabad

Module 1.Introduction to Business Objects. Vasundhara Sector 14-A, Plot No , Near Vaishali Metro Station,Ghaziabad Module 1.Introduction to Business Objects New features in SAP BO BI 4.0. Data Warehousing Architecture. Business Objects Architecture. SAP BO Data Modelling SAP BO ER Modelling SAP BO Dimensional Modelling

More information

Efficiency Gains in Inbound Data Warehouse Feed Implementation

Efficiency Gains in Inbound Data Warehouse Feed Implementation Efficiency Gains in Inbound Data Warehouse Feed Implementation Simon Eligulashvili simon.e@gamma-sys.com Introduction The task of building a data warehouse with the objective of making it a long-term strategic

More information

Making EXCEL Work for YOU!

Making EXCEL Work for YOU! Tracking and analyzing numerical data is a large component of the daily activity in today s workplace. Microsoft Excel 2003 is a popular choice among individuals and companies for organizing, analyzing,

More information

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual IT1105 Information Systems and Technology BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing Student Manual Lesson 3: Organizing Data and Information (6 Hrs) Instructional Objectives Students

More information

1Z0-526

1Z0-526 1Z0-526 Passing Score: 800 Time Limit: 4 min Exam A QUESTION 1 ABC's Database administrator has divided its region table into several tables so that the west region is in one table and all the other regions

More information

Data and Knowledge Management. Goals. Big Data. Dr. Rick Jerz

Data and Knowledge Management. Goals. Big Data. Dr. Rick Jerz Data and Knowledge Management Dr. Rick Jerz 1 Goals Define big data and discuss its basic characteristics Understand ways to store information Understand the value of a Database Management System Explain

More information

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehouses Chapter 12. Class 10: Data Warehouses 1 Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is

More information

20767B: IMPLEMENTING A SQL DATA WAREHOUSE

20767B: IMPLEMENTING A SQL DATA WAREHOUSE ABOUT THIS COURSE This 5-day instructor led course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL Server

More information

Pro Tech protechtraining.com

Pro Tech protechtraining.com Course Summary Description This course provides students with the skills necessary to plan, design, build, and run the ETL processes which are needed to build and maintain a data warehouse. It is based

More information

Data Vault Modeling and its Evolution DECISION SCIENCES INSTITUTE. Conceptual Data Vault Modeling and its Opportunities for the Future

Data Vault Modeling and its Evolution DECISION SCIENCES INSTITUTE. Conceptual Data Vault Modeling and its Opportunities for the Future DECISION SCIENCES INSTITUTE Conceptual Data Vault Modeling and its Opportunities for the Future Aarthi Raman, Active Network, Dallas, TX, 75201 itz.aarthi@gmail.com Teuta Cata, Northern Kentucky University,

More information

Analytics in the Cloud Mandate or Option?

Analytics in the Cloud Mandate or Option? Analytics in the Cloud Mandate or Option? Rick Lower Sr. Director of Analytics Alliances Teradata 1 The SAS & Teradata Partnership Overview Partnership began in 2007 to improving analytic performance Teradata

More information

Oracle Database 12c: Performance Management and Tuning

Oracle Database 12c: Performance Management and Tuning Oracle University Contact Us: +43 (0)1 33 777 401 Oracle Database 12c: Performance Management and Tuning Duration: 5 Days What you will learn In the Oracle Database 12c: Performance Management and Tuning

More information

EPM Live 2.2 Configuration and Administration Guide v.os1

EPM Live 2.2 Configuration and Administration Guide v.os1 Installation Configuration Guide EPM Live v2.2 Version.01 April 30, 2009 EPM Live 2.2 Configuration and Administration Guide v.os1 Table of Contents 1 Getting Started... 5 1.1 Document Overview... 5 1.2

More information

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore Oracle Database 12c: Performance Management and Tuning NEW Duration: 5 Days What you will learn In the Oracle Database 12c: Performance Management and Tuning course, learn about the performance analysis

More information

SAMPLE. Preface xi 1 Introducting Microsoft Analysis Services 1

SAMPLE. Preface xi 1 Introducting Microsoft Analysis Services 1 contents Preface xi 1 Introducting Microsoft Analysis Services 1 1.1 What is Analysis Services 2005? 1 Introducing OLAP 2 Introducing Data Mining 4 Overview of SSAS 5 SSAS and Microsoft Business Intelligence

More information

Analytics: Server Architect (Siebel 7.7)

Analytics: Server Architect (Siebel 7.7) Analytics: Server Architect (Siebel 7.7) Student Guide June 2005 Part # 10PO2-ASAS-07710 D44608GC10 Edition 1.0 D44917 Copyright 2005, 2006, Oracle. All rights reserved. Disclaimer This document contains

More information

Lyras Shipping - CIO Forum

Lyras Shipping - CIO Forum Lyras Shipping - CIO Forum Data Relationships at the Core of Making Big Data Work Panteleimon Pantelis 2015 Ulysses Systems (UK) Ltd. www.ulysses-systems.com Lyras Shipping and Big or not so Big BUT very

More information

Application software office packets, databases and data warehouses.

Application software office packets, databases and data warehouses. Introduction to Computer Systems (9) Application software office packets, databases and data warehouses. Piotr Mielecki Ph. D. http://www.wssk.wroc.pl/~mielecki piotr.mielecki@pwr.edu.pl pmielecki@gmail.com

More information

Migrate from Netezza Workload Migration

Migrate from Netezza Workload Migration Migrate from Netezza Automated Big Data Open Netezza Source Workload Migration CASE SOLUTION STUDY BRIEF Automated Netezza Workload Migration To achieve greater scalability and tighter integration with

More information

Exam /Course 20767B: Implementing a SQL Data Warehouse

Exam /Course 20767B: Implementing a SQL Data Warehouse Exam 70-767/Course 20767B: Implementing a SQL Data Warehouse Course Outline Module 1: Introduction to Data Warehousing This module describes data warehouse concepts and architecture consideration. Overview

More information

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

CIS 330: Web-driven Web Applications. Lecture 2: Introduction to ER Modeling

CIS 330: Web-driven Web Applications. Lecture 2: Introduction to ER Modeling CIS 330: Web-driven Web Applications Lecture 2: Introduction to ER Modeling 1 Goals of This Lecture Understand ER modeling 2 Last Lecture Why Store Data in a DBMS? Transactions (concurrent data access,

More information

USERS CONFERENCE Copyright 2016 OSIsoft, LLC

USERS CONFERENCE Copyright 2016 OSIsoft, LLC Bridge IT and OT with a process data warehouse Presented by Matt Ziegler, OSIsoft Complexity Problem Complexity Drives the Need for Integrators Disparate assets or interacting one-by-one Monitoring Real-time

More information

Dan Vlamis Vlamis Software Solutions, Inc Copyright 2005, Vlamis Software Solutions, Inc.

Dan Vlamis Vlamis Software Solutions, Inc Copyright 2005, Vlamis Software Solutions, Inc. 2UDFOH2/$3 +RZ'RHVLW5HDOO\:RUN",28*/LYH 6HVVLRQ Dan Vlamis dvlamis@vlamis.com Vlamis Software Solutions, Inc. 816-781-2880 http://www.vlamis.com 9ODPLV6RIWZDUH6ROXWLRQV,QF Founded in 1992 in Kansas City,

More information

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Architectural challenges for building a low latency, scalable multi-tenant data warehouse Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics

More information

This module presents the star schema, an alternative to 3NF schemas intended for analytical databases.

This module presents the star schema, an alternative to 3NF schemas intended for analytical databases. Topic 3.3: Star Schema Design This module presents the star schema, an alternative to 3NF schemas intended for analytical databases. Star Schema Overview The star schema is a simple database architecture

More information

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015 Q.1 a. Briefly explain data granularity with the help of example Data Granularity: The single most important aspect and issue of the design of the data warehouse is the issue of granularity. It refers

More information

<Insert Picture Here> Looking at Performance - What s new in MySQL Workbench 6.2

<Insert Picture Here> Looking at Performance - What s new in MySQL Workbench 6.2 Looking at Performance - What s new in MySQL Workbench 6.2 Mario Beck MySQL Sales Consulting Manager EMEA The following is intended to outline our general product direction. It is

More information

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus Overview: Analysis Services enables you to analyze large quantities of data. With it, you can design, create, and manage multidimensional structures that contain detail and aggregated data from multiple

More information

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science. Data Analyst. Data Scientist. Data Architect Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

More information

collection of data that is used primarily in organizational decision making.

collection of data that is used primarily in organizational decision making. Data Warehousing A data warehouse is a special purpose database. Classic databases are generally used to model some enterprise. Most often they are used to support transactions, a process that is referred

More information

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data. Managing Data Data storage tool must provide the following features: Data definition (data structuring) Data entry (to add new data) Data editing (to change existing data) Querying (a means of extracting

More information

ETL Testing Concepts:

ETL Testing Concepts: Here are top 4 ETL Testing Tools: Most of the software companies today depend on data flow such as large amount of information made available for access and one can get everything which is needed. This

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information