Data Vault Brisbane User Group

Data Vault Brisbane User Group 26-02-2013

Agenda
- Introductions
- A brief introduction to Data Vault
- Creating a Data Vault based Data Warehouse
- Comparisons with 3NF/Kimball
- When is it good for you?
- Examples
- What's next?
2/28/2013

Introductions - About Analytics8
- Founded in 2002 in Australia
- Offices in Sydney, Melbourne, Brisbane, Chicago, Raleigh and Dallas
- 85+ Consultants
- Cross industry
- Technology and vendor agnostic
- 100% Services organisation
- Consulting, Training, Support, Software Procurement
- Business Intelligence and Data Warehousing Strategy, Enablement and Optimisation
- Leverage your data to hit your targets
- www.analytics8.com

Introductions - About Analytics8
Strategic Services and Implementation Services:
- DW/BI Strategy and Roadmaps
- DW, BI and ETL Architecture
- Data / Business Modeling
- Business Intelligence and Analytics
- Project Management & Governance
- Competency Centers
- DW, BI and ETL Assessments
- Data Integration
- Tool / Vendor Selection and procurement assistance
- Training
- Support

Introductions - Brisbane User Group

Data Vault
- "There are no facts, only interpretations" - Friedrich Nietzsche
- "Get the facts first, then distort them as you please" - Mark Twain
- "Everything we hear is an opinion, not a fact. Everything we see is a perspective, not the truth." - Marcus Aurelius

Data Vault
- Data is managed as an asset
- Business Rules are moved closer to the business
- The Truth is subjective and based on changing business rules

Data Vault
- "Data Vault is the optimal choice for modelling the EDW in the DW 2.0 framework." - Bill Inmon, about DW 2.0
- "The Data Vault is a detailed, historically oriented, uniquely linked set of normalized tables that support one or more functional areas of business." - Dan Linstedt

Data Vault: it's not necessarily about
- Oracle Database Vault
- An end-to-end solution; it complements existing approaches
- An ETL framework
- Creation of information

Data Vault principles
- The goal is to integrate (disparate) data from many source systems and link them together while maintaining source system context
- An Enterprise Data Warehouse is a collection of transactions, a single source of the facts as they were at the time (not the single source for the truth)
- The Truth is subjective: based on soft and changing business rules
- Data centric view of integration: everything is many-to-many, everything is time dependent
- Late binding of data: simplified load dependencies and resulting options for parallel processing (application and database level)
- Repeatable, consistent, scalable, auditable and fault-tolerant
- It's all about flexibility:
  - Handling changes in structure and data (expand)
  - Changing the Data Warehouse structure and performance (manage)
- Uses RDBMS basics

Data Vault architecture
[Diagram: two data flows]
- Source Systems -> Business Rules & IQ -> EDW -> Data Marts
- Source Systems -> Hard Business Rules -> EDW -> Business Rules & IQ -> Data Marts / Virtualisation

Data Vault architecture
The business rules are moved closer to the business, which:
- Improves IT reaction time
- Enables business users to direct Business Intelligence
- Reduces cost
- Minimises impact

Reference Architecture
Challenges:
- Dealing with complexities
- Dealing with dependencies
- Ability to respond to a changing environment
Principles:
- Flexibility in design and maintenance
- Change resilient
- Future proof
- (Near) Real Time ready
- Modular
- Scalable
- Durable and predictable
- Provide a bottom up architecture which can be applied incrementally with a top down approach
Results:
- Separation of Data Warehouse concepts
- Flexible error handling
- Hybrid modelling
- Parallelism
- Built-in audit trail

Reference Architecture
[Diagram labels: Staging Layer, Integration Layer / SOR, Presentation Layer, Exception Handling, Operational Meta Data]

Data Vault entity types
- Hub: Unique list of business keys
- Satellite: Historical descriptive data (about the hub or link)
- Link: Unique list of relationships between keys
- (Current table)
- (Point in time table)
- (Reference table)

Data Vault entities: Hubs
A Hub entity contains the unique list of business keys.
Contains:
- Surrogate key
- Business key
- Load date timestamp
- Last seen timestamp
- Record source indication for traceability
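The hub columns listed above can be illustrated with a small in-memory sketch. This is not the presenters' implementation: the class names HubRow and Hub, the sequential surrogate key, and the rule that the record source keeps the first source that delivered the key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count


@dataclass
class HubRow:
    surrogate_key: int
    business_key: str
    load_dts: datetime       # when the key was first loaded
    last_seen_dts: datetime  # when a source last delivered the key
    record_source: str       # assumption: first source that delivered the key


class Hub:
    """Hypothetical in-memory hub: one row per distinct business key."""

    def __init__(self):
        self.rows = {}              # business key -> HubRow
        self._next_key = count(1)   # assumption: simple sequence as surrogate key

    def load(self, business_key, record_source, now):
        row = self.rows.get(business_key)
        if row is None:
            # New business key: inserted exactly once.
            row = HubRow(next(self._next_key), business_key, now, now, record_source)
            self.rows[business_key] = row
        else:
            # Known business key: only the last seen timestamp moves forward.
            row.last_seen_dts = max(row.last_seen_dts, now)
        return row


hub_customer = Hub()
t1, t2 = datetime(2013, 2, 26), datetime(2013, 2, 28)
first = hub_customer.load("CUST-001", "CRM", t1)
again = hub_customer.load("CUST-001", "BILLING", t2)
```

Loading the same business key twice leaves a single hub row; only its last seen timestamp changes.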

Data Vault entities: Satellites
Satellite entities provide context for a Hub or Link. Much like a Type-2 dimension, their information is subject to change over time.
Contains:
- Hub or Link primary key
- Load date timestamp
- End date valid timestamp
- Record source indication
- All context attributes
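A minimal sketch of how a satellite could track change over time, assuming a new version is appended only when the context attributes differ from the current version. The names SatelliteRow and load_satellite are hypothetical, and using None as the end date of the current version is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SatelliteRow:
    hub_key: int
    load_dts: datetime
    end_dts: Optional[datetime]  # assumption: None marks the current version
    record_source: str
    attributes: dict


def load_satellite(rows, hub_key, attributes, record_source, now):
    """Append a new version only when the context has changed."""
    current = next(
        (r for r in rows if r.hub_key == hub_key and r.end_dts is None), None
    )
    if current is not None:
        if current.attributes == attributes:
            return current      # unchanged: nothing to insert
        current.end_dts = now   # close the previous version
    new_row = SatelliteRow(hub_key, now, None, record_source, dict(attributes))
    rows.append(new_row)
    return new_row


sat_customer = []
load_satellite(sat_customer, 1, {"city": "Brisbane"}, "CRM", datetime(2013, 1, 1))
load_satellite(sat_customer, 1, {"city": "Brisbane"}, "CRM", datetime(2013, 1, 2))
load_satellite(sat_customer, 1, {"city": "Sydney"}, "CRM", datetime(2013, 2, 1))
```

The second load is a no-op because nothing changed; the third closes the Brisbane version and opens a Sydney version, which is the Type-2-like behaviour described on the slide.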

Data Vault entities: Links
Link entities are many-to-many relationships. They:
- Determine the grain
- Lead to fact tables
- Are valid for a certain period of time
Contains:
- Surrogate key (optional), relation to Link-Satellite
- Hub key(s), determines the relationship
- Load date timestamp
- Last seen date timestamp
- Record source indication for traceability

Links: everything is many to many
[Diagram: Portfolio to Customer relationships drawn with differing cardinalities: one-to-many, many-to-many, one-to-many, many-to-one]
When the EDW is modelled for today it breaks down when loading history

Links: everything is many to many
[Diagram: the same Portfolio to Customer relationships, with a Link placed between Portfolio and Customer (Portfolio one-to-many Link, Link many-to-one Customer)]
Historical, present and future data can be loaded without re-engineering
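The point about cardinality can be sketched in a few lines, assuming the link is simply a set of (portfolio key, customer key) pairs. The names link_portfolio_customer, load_link and cardinality are hypothetical; the function reports the cardinality observed in the data rather than one fixed in the model.

```python
from collections import defaultdict

# Hypothetical link: a set of (portfolio_key, customer_key) pairs, so each
# relationship is stored once and no cardinality is baked into the structure.
link_portfolio_customer = set()


def load_link(portfolio_key, customer_key):
    link_portfolio_customer.add((portfolio_key, customer_key))


def cardinality(link):
    """Return the portfolio-to-customer cardinality observed in the data."""
    customers_per_portfolio = defaultdict(set)
    portfolios_per_customer = defaultdict(set)
    for portfolio, customer in link:
        customers_per_portfolio[portfolio].add(customer)
        portfolios_per_customer[customer].add(portfolio)
    many_customers = any(len(v) > 1 for v in customers_per_portfolio.values())
    many_portfolios = any(len(v) > 1 for v in portfolios_per_customer.values())
    left = "many" if many_portfolios else "one"
    right = "many" if many_customers else "one"
    return f"{left}-to-{right}"


# Today's data: one portfolio holds two customers.
load_link("P1", "C1")
load_link("P1", "C2")
today = cardinality(link_portfolio_customer)

# History shows customer C1 was also in portfolio P2; no structural change needed.
load_link("P2", "C1")
with_history = cardinality(link_portfolio_customer)
```

The same structure holds both states; only the data differs between the one-to-many view of today and the many-to-many view once history is loaded.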

Data Vault - Why isolate keys?
- Data Warehouse management is reduced because of decoupling
- Keys are distributed early and data can be traced by these keys throughout the system
[Diagram labels: Relation, Relation, History, History, Extra History]

Data Vault - Load strategy
- Hybrid modelling reduces dependencies and simplifies ETL
- Loading processes are self-dependent
- Capable of Near Real-Time loading
- Simple, Scalable, Parallel and Consistent
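One way to picture the reduced dependencies is a two-step load: hubs first, in parallel, then the link that needs only the hub keys. The slide does not spell out this ordering; it is an assumption here, as are the staged rows, the load_hub helper and the use of a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical staging rows delivered by a source system.
staged = [
    {"customer": "C1", "portfolio": "P1"},
    {"customer": "C2", "portfolio": "P1"},
]


def load_hub(column):
    # Each hub reads only its own business key column from staging.
    keys = sorted({row[column] for row in staged})
    return {key: surrogate for surrogate, key in enumerate(keys, start=1)}


with ThreadPoolExecutor() as pool:
    # Step 1: hubs do not depend on each other, so they can load in parallel.
    future_customer = pool.submit(load_hub, "customer")
    future_portfolio = pool.submit(load_hub, "portfolio")
    hub_customer = future_customer.result()
    hub_portfolio = future_portfolio.result()

# Step 2: the link needs only the surrogate keys assigned in step 1.
link = {
    (hub_portfolio[row["portfolio"]], hub_customer[row["customer"]])
    for row in staged
}
```

Each load process touches a single target, which is what keeps the pattern simple and repeatable.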

Data Vault - Flexibility
[Diagram: Products and Suppliers joined by a Product Supplier Link, surrounded by context labels: Shipment dates, Billed amounts, Availability schedule, Stocks, Address, Descriptions, Descriptions, Rating Score]

Data Vault architecture comparisons
Kimball or Inmon (CIF):
- Complex ETL
- Truth oriented
- Business Rules before EDW
Data Vault:
- 100% of the data (within scope) 100% of the time
- Source driven
- Auditable
- Transaction / data oriented
- Template/metadata driven
- No Business Rules
- No destructive loading
Kinstedt or Dinmon!

Compared to 3NF
3rd Normal Form: the corporate data model
- Long developing time (mainly due to business changes)
- Subject Area Database, modelled to current views
- Adaptation issues: changing the model can be hard
  - Definitions changing ("customer" means something else now)
  - Growth of new relationships
  - Duplicate data sources require a priority / trust layer
  - Cascading impact: changes ripple through to underlying tables
- Integration issues:
  - Load dependencies because of referential integrity
  - Data Quality != Referential Integrity
  - Time driven PK issues (new parent or child; key change)

Data Vault case study: late arriving requirement
[Diagram: Normalised core DWH model]

Data Vault case study: late arriving requirement
Late arriving requirement: introduction of a Cover Group Policy

Data Vault case study: late arriving requirement
[Diagram: Downstream impacts of normalisation, with the affected tables crossed out]

Data Vault case study: late arriving requirement
[Diagram: Downstream impacts of normalisation, with the impacts numbered 1 to 5]

Data Vault case study: late arriving requirement
Data Vault approach (before the introduction of the Cover Group Policy):
- HUB_POLICY: Policy Id, PMS_PLCY_NO
- HUB_POLICY STATUS: Policy Status Id, Policy Status Type Id
- HUB_POLICY INSURED: Policy Insured Id, Insured Id
- HUB_POLICY RISK: Policy Risk Id, PMS Risk Pt 1, PMS Risk Pt 2, PMS Risk Pt 3
- LNK_POL_ST_INS_RISK: Link Policy Status ID, Policy Id (FK), Policy Status Id (FK), Policy Insured Id (FK), Policy Risk Id (FK)
- POLICY OFFER: Derived on output

Data Vault case study: late arriving requirement
- HUB_POLICY: Policy Id, PMS_PLCY_NO
- HUB_POLICY STATUS: Policy Status Id, Policy Status Type Id
- HUB_POLICY INSURED: Policy Insured Id, Insured Id
- HUB_POLICY RISK: Policy Risk Id, PMS Risk Pt 1, PMS Risk Pt 2, PMS Risk Pt 3
- HUB_COVER DEVELOPMENT GROUP: Cover Development Group Id, Cover Development Group Cd
- LNK_POL_ST_INS_RISK: Link Policy Status ID, Policy Id (FK), Policy Status Id (FK), Policy Insured Id (FK), Policy Risk Id (FK)
- LNK_POL_ST_INS_RISK_CDG: Link Policy Status ID, Policy Id (FK), Policy Status Id (FK), Policy Insured Id (FK), Policy Risk Id (FK), Cover Development Group Id (F
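The additive nature of this change can be sketched with SQLite: the new hub and the new link arrive as CREATE statements only, and the definitions of the existing tables stay byte-for-byte the same. Table and column names follow the slide, lower-cased with underscores; the column types, the schema helper, and the choice to leave the old link in place are assumptions of this sketch, not something the slide confirms.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Model before the requirement arrives (column types are assumptions).
con.executescript("""
CREATE TABLE hub_policy (policy_id INTEGER PRIMARY KEY, pms_plcy_no TEXT);
CREATE TABLE hub_policy_status (
    policy_status_id INTEGER PRIMARY KEY, policy_status_type_id INTEGER);
CREATE TABLE hub_policy_insured (
    policy_insured_id INTEGER PRIMARY KEY, insured_id INTEGER);
CREATE TABLE hub_policy_risk (
    policy_risk_id INTEGER PRIMARY KEY,
    pms_risk_pt_1 TEXT, pms_risk_pt_2 TEXT, pms_risk_pt_3 TEXT);
CREATE TABLE lnk_pol_st_ins_risk (
    link_policy_status_id INTEGER PRIMARY KEY,
    policy_id         INTEGER REFERENCES hub_policy (policy_id),
    policy_status_id  INTEGER REFERENCES hub_policy_status (policy_status_id),
    policy_insured_id INTEGER REFERENCES hub_policy_insured (policy_insured_id),
    policy_risk_id    INTEGER REFERENCES hub_policy_risk (policy_risk_id));
""")


def schema(connection):
    """Hypothetical helper: map each table name to its CREATE statement."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return dict(rows.fetchall())


before = schema(con)

# Late arriving requirement: new hub and new link, no ALTER of existing tables.
con.executescript("""
CREATE TABLE hub_cover_development_group (
    cover_development_group_id INTEGER PRIMARY KEY,
    cover_development_group_cd TEXT);
CREATE TABLE lnk_pol_st_ins_risk_cdg (
    link_policy_status_id INTEGER PRIMARY KEY,
    policy_id         INTEGER REFERENCES hub_policy (policy_id),
    policy_status_id  INTEGER REFERENCES hub_policy_status (policy_status_id),
    policy_insured_id INTEGER REFERENCES hub_policy_insured (policy_insured_id),
    policy_risk_id    INTEGER REFERENCES hub_policy_risk (policy_risk_id),
    cover_development_group_id INTEGER
        REFERENCES hub_cover_development_group (cover_development_group_id));
""")

after = schema(con)
unchanged = all(after[name] == sql for name, sql in before.items())
added = sorted(set(after) - set(before))
```

Comparing the catalogue before and after shows two added tables and no modified ones, in contrast to the cascading changes shown for the normalised model.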

Data Vault - Disadvantages
- Scaling versus performance: lots of outer joins and tables in queries
- Not intended for ad hoc end user access
- Aging relationships
- Currently not an open platform
- Does not provide solutions for the data mart layer

Compared to Star Schema models
Star Schema / fact and dimensions issues:
- Expensive updates and deletes
- Dimensions over time (Type 1, 2 and 3)
- Architecture includes many kinds of tables (helper, bridge, junk, mini)
- Grain issues difficult to resolve
- Real-time loading impractical
- Issues with transactions appearing before dimension data
- Complex loading and changing of history
- Begins to fail under very heavy loads
- Inflexible mix of basic elements (history, structure, key distribution)

Data Vault - Advantages
- Completely auditable architecture
- DWH model is aligned with the business model
- Extremely adaptable to (business) changes
- Designed and optimised for the EDW
- Durable, consistent and predictable
- Consistency pays back over time
- Lends itself to real-time processing
- Simple and consistent
- Isolation from change
- Incrementally built
- Easy to load a Dimensional Model
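The last point can be sketched with SQLite: a Type-2 style dimension falls out of a join between a hub and its satellite. The table names, columns and sample rows are hypothetical, and defining the dimension as a view is only one option under these assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_customer (customer_key INTEGER PRIMARY KEY, customer_no TEXT);
CREATE TABLE sat_customer (
    customer_key INTEGER, load_dts TEXT, end_dts TEXT, city TEXT);

INSERT INTO hub_customer VALUES (1, 'CUST-001'), (2, 'CUST-002');
INSERT INTO sat_customer VALUES
    (1, '2013-01-01', '2013-02-01', 'Brisbane'),
    (1, '2013-02-01', NULL,         'Sydney'),
    (2, '2013-01-15', NULL,         'Melbourne');

-- Type 2 style dimension: one row per satellite version of each business key.
CREATE VIEW dim_customer AS
SELECT h.customer_key,
       h.customer_no,
       s.city,
       s.load_dts AS valid_from,
       s.end_dts  AS valid_to,
       (s.end_dts IS NULL) AS is_current
FROM hub_customer h
JOIN sat_customer s ON s.customer_key = h.customer_key;
""")

all_versions = con.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
current = con.execute(
    "SELECT customer_no, city FROM dim_customer "
    "WHERE is_current ORDER BY customer_no"
).fetchall()
```

The view keeps the full history for point-in-time reporting, while the is_current flag gives the Type-1 style "latest" picture from the same data.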

Atomicity
Data Warehouses try to do too much in a loading cycle, addressing all kinds of problems in a single load pattern

Is it good for you?
Is the introduction of Data Vault as the middle-tier (Integration / SOR / Core DWH layer) worth the additional effort in terms of (ETL) development and space?

Not a good match
- You're using a 2-tiered architecture / don't want (or think you need) the extra layer (i.e. not an EDW).
- You're unfamiliar with the approach. These concerns are often deeply rooted, and overriding them may not get the best result.
- There is a relatively low maturity regarding Data Modelling. Data Vault requires a relatively senior/firm Modeller. Data Vault leaves less room for deviations, requires adequate assignment of business keys (not 1 on 1 with source primary keys) and generally requires a firm adherence to the standards.
- There is not enough involvement / drive to pursue the program. Related to familiarity: working with Data Vault requires continuous selling of the approach, as to date it is still fairly uncommon.

A good match
- The outcomes and/or requirements are not clear or are likely to change.
- You are following an agile approach for Project Management or have specified very short delivery cycles.
- You want to incrementally expand your data model.
- You want to plan for / expect to require additional scalability.
- You want to leverage (ETL) automation / enforce standards through automation.
- You are stuck in a tactical (2-tiered / Dimensional Bus Architecture) solution and want to expand; Data Vault can be used to incrementally backfill the solution.

Demonstration
- Assemblies
- Use of BIML and C#
- Model Driven Design

What's next???
- Data Vault 2.0?
- Big Data?
- Model Driven Design?
- Case Studies?
- Software / ETL specific implementations?

Thank you!