What about Transactional Links?

Similar documents
DATA VAULT MODELING GUIDE

Modeling Pattern Characteristics

Introductory Guide to Data Vault Modeling GENESEE ACADEMY, LLC

DATA VAULT CDVDM. Certified Data Vault Data Modeler Course. Sydney Australia December In cooperation with GENESEE ACADEMY, LLC

Comparing Anchor Modeling with Data Vault Modeling

Modeling Pattern Awareness

Modeling the. Agile. with Data Vault. Data Warehouse. Hans Hultgren

Technology Note. Data Vault Modeling with ER/Studio Data Architect

Your Real Estate & Strata Management Customer Communication Requirements Issues Solved

This module presents the star schema, an alternative to 3NF schemas intended for analytical databases.

Data Vault Brisbane User Group

Up and Running Software The Development Process

Incremental development A.Y. 2018/2019

Applying Business Logic to a Data Vault

7 Trends driving the Industry to Software-Defined Servers

NINE MYTHS ABOUT. DDo S PROTECTION

turning data into dollars

Building a Waterfall Chart in Excel

BUILDING A WATERFALL CHART IN EXCEL

VIDEO 1: WHY IS THE USER EXPERIENCE CRITICAL TO CONTEXTUAL MARKETING?

One of the fundamental kinds of websites that SharePoint 2010 allows

Introduction to Programming

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

MARKETING VOL. 3

Use and Abuse of Anti-Spam White/Black Lists

Introduction to User Stories. CSCI 5828: Foundations of Software Engineering Lecture 05 09/09/2014

Single device test requirements for reliable CAN-Based multi-vendor networks

8 Web Design Principles to Use IN ELEARNING DESIGN

Backup vs. Business Continuity

SOME TYPES AND USES OF DATA MODELS

DOWNLOAD PDF SIXTY MILESTONES OF PROGRESS,

Decision Guidance. Data Vault in Data Warehousing

SQL Server 2008 Consolidation

Testing is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered.

Welcome to Design Patterns! For syllabus, course specifics, assignments, etc., please see Canvas

turning data into dollars

GOOGLE MOVES FROM ADVERTISING TO COMMISSION MODEL: What It Means For Your Business

The great primary-key debate

MTAT Software Engineering. Written Exam 17 January Start: 9:15 End: 11:45

Data Vault Modeling & Methodology. Technical Side and Introduction Dan Linstedt, 2010,

ESET Remote Administrator 6. Version 6.0 Product Details

Link Service Routines

Crash Course in Modernization. A whitepaper from mrc

Some doubts about the objectivity of logical determination of the uniqueness of the elementary process in the Function Point Analysis

THE DEFINITIVE GUIDE

USE CASE. Collect CLOSED CASE FEEDBACK. Salesforce Workflow. Clicktools Deployment TWO DEPLOYMENT APPROACHES. The basic activity flow goes like this:

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

Defining Done in User Stories

This tutorial also elaborates on other related methodologies like Agile, RAD and Prototyping.

Why you should design your data hub top-down vs. bottom-up

Availability and the Always-on Enterprise: Why Backup is Dead

Implementing ITIL v3 Service Lifecycle

Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data

Task Management. Version 6.0 B

A quick guide to... Split-Testing

The #1 Key to Removing the Chaos. in Modern Analytical Environments

Agile Tester Foundation E-learning Course Outline

CS487 Midterm Exam Summer 2005

Implementation Guide for Delivery Notification in Direct

WHITE PAPER BCDR: 4 CRITICAL QUESTIONS FOR YOUR COMMUNICATIONS PROVIDER

The notion of functions

A brief history of time for Data Vault

Object-Oriented Programming in Objective-C

Before the FEDERAL COMMUNICATIONS COMMISSION Washington, D.C

Virtualization. Q&A with an industry leader. Virtualization is rapidly becoming a fact of life for agency executives,

WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution

Multi-Vendor Key Management with KMIP

Software Testing part II (white box) Lecturer: Giuseppe Santucci

Advanced Security Tester Course Outline

QUICKBOOKS EXPORT FILE: Manual. avfuel QUICKBOOKS EXPORT FILE MANUAL

OBJECT ORIENTED SYSTEM DEVELOPMENT Software Development Dynamic System Development Information system solution Steps in System Development Analysis

GET TO KNOW MARKETING AUTOMATION

Shadows in the graphics pipeline

Redesigning a Website Using IA Principals

Intro. Scheme Basics. scm> 5 5. scm>

Read & Download (PDF Kindle) Data Structures And Other Objects Using C++ (4th Edition)

1 Shorten Your Sales Cycle - Copyright Roundpeg 2015 All rights Reserved

Q &A on Entity Relationship Diagrams. What is the Point? 1 Q&A

VMWARE MICRO-SEGMENTATION AND SECURITY DEPLOY SERVICE

Meeting the Move Update Standard Mailstream Matters November 16, 2009 article. TOPIC: Meeting the Move Update Standard

C++ Reference NYU Digital Electronics Lab Fall 2016

Chapter 10. Object-Oriented Analysis and Modeling Using the UML. McGraw-Hill/Irwin

Next Generation Backup: Better ways to deal with rapid data growth and aging tape infrastructures

Top of Minds Report series Data Warehouse The six levels of integration

MARKETING VOL. 1

If you re a Facebook marketer, you re likely always looking for ways to

ProServeIT Corporation Century Ave. Mississauga, ON L5N 6A4 T: TF: F: W: ProServeIT.

NXT Programming for Beginners Project 9: Automatic Sensor Calibration

Depiction of program declaring a variable and then assigning it a value

What Secret the Bisection Method Hides? by Namir Clement Shammas

NIRVANA. Setup Guide. David Allen Company

C++ Data Types. 1 Simple C++ Data Types 2. 3 Numeric Types Integers (whole numbers) Decimal Numbers... 5

Entity Relationships and Databases

Table of Contents. Customer Profiling and List Segmentation 4. Delivering Unique, Relevant Content 7. Managing Campaigns Across Multiple Time Zones 10

Special Report. What to test (and how) to increase your ROI today

Data Basics for Research.

Teaching Ruby on Rails Dr Bruce Scharlau Computing Science Department University of Aberdeen Aberdeen, AB24 3UE

ADDRESS DATA CLEANSING A BETTER APPROACH

Arithmetic-logic units

Disaster Recovery Is A Business Strategy

Transcription:

What about Transactional Links? Transactional Links By: Hans Hultgren Links in the Data Vault modeling pattern are used to model the relationships between entities in our model. Entities are the Persons, Places, Things, and Events that we seek to capture, model, store and analyze. In Data Vault these entities are Core Business Concepts (CBCs) and are modeled as Ensembles. Each Ensemble contains a Hub, Link(s) and Satellite(s). It is the sum of these parts that represents the Entity (CBC). Those of us who have been working with Data Vault over the past decade have heard the term Transactional Link and several have also applied it to some degree. Our understanding of the pattern is evolving as we collectively gain more and more experience. Many of us now understand that Transactional Links are at best an exception which would be best to avoid and at worst a black hole where we may lose data. A discussion (or debate) about this topic should of course begin with our definition of Transactional Links. This article will discuss two interpretations of this term: I) A hybrid table that includes an instance/key, relationships and possibly also context, and II) A standard Data Vault Link being used to model a Transaction 2017 Genesee Academy LLC January 16, 2017 Page 1

Transactional Link I (Hybrid Link table with an instance/key, etc.) This form of table looks somewhat like a Link however in addition to the FKs of the Hubs it relates it also has an identity (key). The identity is key that we would typically model as the business key in a Hub as it is the unique business identifier for this event or transaction. People have deployed these exception forms in their pattern in cases of extreme volumes and rapid burst loading rates. Important to note: this is not a standard Link as it contains not only relationships but also an identity (key). If you deploy these structures in your modeling pattern please clearly differentiate them from the standard Links in the Data Vault pattern. They will use a different table template, use a different loading pattern, react differently to changes, use different sourcing templates for downstream deliverables and have a lesser degree of agility. Recommendation: avoid using these forms if possible. Since they introduce a 4 th table form in your pattern (Hub, Link, Satellite and Transactional_Link). Even in cases of extreme volumes you can always try to use the standard pattern first and then go to the exception form if needed. Transactional Link II (Link used to model an Event or Transaction) This is the idea of using a standard pattern Data Vault Link to model an Event or Transaction. The idea is that the events are all intersections of business data and so should be modeled as Links. The problem is that the Transaction Concept (Entity for example a Sale) is lost in this case. There is no instance (key) to identify the event or transaction. Important to note: this pattern does not work. Because there is no identity afforded the event or transaction it will not be possible to load like (similar) transactions to the data warehouse. A Link only represents the first time the intersection of keys is seen (a core foundation of Data Vault modeling). If the same set of associated keys arrives again it will be ignored by the data warehouse. This means the data warehouse can literally lose transactions. Recommendation: don t use this pattern as it is not a viable alternative and you will risk losing data. 2017 Genesee Academy LLC January 16, 2017 Page 2

Transactional Link I A hybrid table that includes an instance/key, relationships and possibly also context. Guidance: Avoid using this form; stick with the core pattern instead. In any event carefully consider all pros and cons before diverging from the pattern. If you create a hybrid structure that associates an intersection of keys but also includes its own identifier/key then you will have created a structure that will also capture transactions (seals up the black hole). But this table is not a Link. It is a hybrid table that has enveloped at a minimum both a Link and a Hub (the key, the instance). From the Colors analysis we can see that a table form that includes both a key (unique instance) and relationships is not a Link as in a Data Vault pattern but rather some form of 3NF Entity. Concepts modeled as Ensembles or Entities If you use this type of table then the only remaining important factor is to clearly differentiate it from standard DV Links. As with the diagram above, the Transaction structure is clearly depicted as different from the other table forms used in the Data Vault Ensemble modeling pattern. Note that there are hybrid or exception structures that we sometimes deploy in Data Vault modeling. The use of these forms is up to the modeler/architect in charge of the model. From a Data Vault perspective these are typically no problem so long as each form that you use is clearly defined and communicated. I would always recommend that you first consider the standard Data Vault forms and carefully weigh the pros and cons of the hybrid form you are contemplating. In this case the loading performance of the data warehouse might be improved by using the Transaction entity hybrid form. However the loading and sourcing standards (and related templates or automation tooling if applicable) will need to expand to take into account this additional form. Likewise the extensibility of the 2017 Genesee Academy LLC January 16, 2017 Page 3

pattern (as new subject areas are introduced) will be to some degree negatively impacted by this hybrid form. Lastly you are more likely to run into reengineering work as new subject areas come in or as the business models and processes change over time. By way of example, here we consider this Sales case and use a hybrid Transactional Link structure. Transactional Link I (Hybrid Link table with an instance/key) Please note that the Link in this diagram is titled TL_Sale (for Transactional Link). Also the component that differs from standard Links in Data Vault is noted in ALL CAPS and is the key representing an instance (of a Sale in this case). So where a Link is defined as an intersection of keys, this TL is defined by and intersection of keys PLUS a key/instance (unique identifier) for the Sale. Now let s consider one of the very first things that can happen in the business model / business process. What happens if the Sale can have more than one Product? Since this TL is representing an instance of a Sale, it cannot be repeated for every Product on the Sale. There is a grain shift a change in cardinality. So we need to split out the Header from the Line Item. 2017 Genesee Academy LLC January 16, 2017 Page 4

Adapting Transactional Link I with Grain Shift The result is a Link-to-Link relationship between the TL_Sale and the Line Item level (Products on the Sale). Well since Link-to-Link (L:L) relationships have been effectively eliminated from the Data Vault pattern this simple grain shift has caused a problem. Either we stay with a L:L constellation (which would at a minimum require reengineering of the existing Link to remove the FK to H_Product) or we need to create a Hub for the Sale. Of course both options have rather profound reengineering efforts since the majority of the data the real volumes are in the Transactions. Even More. One of the great benefits of Data Vault (and all Ensemble Modeling patterns) is the ability to build incrementally. Future changes should not impact the existing design but should bolt on without reengineering as much as possible. Let s now consider three future iterations that are required as new subject areas are introduced: 1) Returns are now added to the scope of the data warehouse. A Customer can return a particular Product on a Sale (Line Item). 2) The Customer Call Center subject area is now added to the scope of the data warehouse. As part of this subject area Customers may complain about a particular Product on a Sale (Line Item). 3) The Logistics subject area is added to the scope of the data warehouse. As part of the Logistics subject area we now track the Deliveries of certain backordered Products on a Sale (Line Item). 2017 Genesee Academy LLC January 16, 2017 Page 5

Extending Transactional Link I with new Subject Areas Note that in this expanded model we now have three cases of Link-to-Link-to-Linkto-Link relationships (L:L:L:L). And in aggregate there is a L:L:L:L:L:L:L:L constellation. Now consider that you want to know if the Products Returned were also the Products that were Complained about and/or Backordered because they were not in stock? As you can imagine the usability of this resulting model is somewhat less than optimal. So again the Recommendation: avoid using these forms if possible. And certainly consider the cost to the pattern if you elect to try it. 2017 Genesee Academy LLC January 16, 2017 Page 6

Transactional Link II A standard Data Vault Link being used to model a Transaction. Guidance: Don t this form; it does not work. High risk of losing data and pattern will be compromised. To more fully understand the issues we can look to an example: Hans buys a Flat White from Steve at the Golden Starbucks. There is an event associated with the intersection of four concepts: Customer, Product, Employee and Store. The model below illustrates a Data Vault model including these four concepts and a Link entity type being used to model a transaction. Transactional Link II (Standard Link used to model a Transaction) Please take a moment to look at this model and assess whether each of the forms above is accurately modeled given the definitions of Hubs and Links. When the data warehouse is loaded there is a standard that only new instances of Hubs and Links are inserted (CDC). If a Hub record already exists we do not load it again. Likewise if a Link record already exists we do not load it again. The Date_Time_Stamp in each of the Hubs and Links represents the first time we saw this key or relationship (it is not used as part of a key to differentiate records). Step through the loading process for each of these tables: H_Customer = Hans exists? No. Insert Hans record. H_Store = Golden Starbucks exists? H_Employee = Steve exists? H_Product = Flat White exists? 2017 Genesee Academy LLC January 16, 2017 Page 7

L_Sale [Hans/Golden Starbucks/Steve/Flat White] exists? No, Insert L_Sale [Hans/Golden Starbucks/Steve/Flat White] record. So far everything seems to be working fine. The Twist. Now assume that Hans returns to the Golden Starbucks and buys another Flat White from Steve. Apparently Hans likes a good Flat White; and as a business Starbucks sees this whole repeating purchase thing as acceptable and encouraged Customer behavior. Later that evening we step through the loading process for each of these tables, now for the second transaction: H_Customer = Hans exists? H_Store = Golden Starbucks exists? H_Employee = Steve exists? H_Product = Flat White exists? L_Sale [Hans/Golden Starbucks/Steve/Flat White] exists? What happened to the second Sale? 2017 Genesee Academy LLC January 16, 2017 Page 8

Take a moment to walk through this again on your own. Draw it out on a piece of paper or on a white board. Look again at the table forms of the Hubs and Link. Walk through the loading pattern again. Did we miss something? Reality. In fact the data warehouse will never load this second Sale (regardless of date or the context associated with any subsequent Sales). This is because a Link (Link entity type, or whatever name) can never be used to model an event or transaction. What is missing is the actual Sale Event; the Transaction itself has not been identified or modeled. The only things modeled are the relationships that relate to the Sale. True: a Transaction has relationships. False: a Transaction is relationships. Consider 3NF modeling for a moment. How is a Transaction modeled in 3NF? The Sale Event is modeled as an Entity. A Sale_Entity is created that includes the instance/key of that transaction, plus the related Foreign Keys (FKs) and the context related to the event. Applying the Data Vault pattern of Ensemble Modeling the instance/key becomes a Hub, the FK relationships become Link(s) and the context is placed in Satellites. This is the pattern. All Transactions in Data Vault are modeled as complete Ensembles including Hub, Link(s) and Satellite(s). Concept depicted in 3NF versus Ensemble So the second Sale was lost because we did not model the Sale. It is somewhat ironic that perhaps the most important Entity in this subject area was overlooked. When you work with business users and explore their Core Business Concepts (person, place, thing, event) you would of course have arrived at a list that includes Customer, Store, Employee, Product and Sale. The Sale CBC (Main Entity) is at the heart of this entire subject area. It IS the reason the keys were correlated in the first place. The corrected model includes the Hub for the Sale: H_Sale (see below). Note that the Sale does have several relationships associated with it. It is true that Events, especially Transactions, will commonly have many FKs associated with each record. So having large Links and several Links associated with an Event such as a Transaction is very common. 2017 Genesee Academy LLC January 16, 2017 Page 9

Fixed Model: Transaction as DV Ensemble with Hub Take two with the corrected version: Later that evening we step through the loading process for each of these tables, now for the second transaction: H_Sale = 112002 exists? No. Insert 112002 record. H_Customer = Hans exists? H_Store = Golden Starbucks exists? H_Employee = Steve exists? H_Product = Flat White exists? L_Sale [112002/Hans/Golden Starbucks/Steve/Flat White] exists? No. Insert L_Sale [112002/Hans/Golden Starbucks/Steve/Flat White] record. Happily we have now inserted the second Sale! Notice that there is a new record for the Hub that defines this subject area the main Event, the Sale Transaction, the H_Sale. The new Sale CBC (Event) has relationships to several other CBCs in this 2017 Genesee Academy LLC January 16, 2017 Page 10

case Customer (Person), Store (Place), Employee (Person) and Product (Thing). These relationships are captured in the new Link record that was also loaded. Warning. The messaging that a Link is an entity type used to model transactions is dangerous. Because of the very real risk of creating a black hole in your data warehouse where records will fall, never to be seen again. This is especially risky because there will be no indication that a problem even exists. All the DV forms will be loading successfully, according to plan, with no technical exceptions. But all subsequent like transactions will be lost. Summary Transactional Links (I & II) Guidance: Stick with the core Data Vault Ensemble pattern and avoid hybrid forms. Model all events and transactions as full Ensembles. In the Data Vault pattern that means a Hub, Link(s) and Satellite(s). The fundamental Ensemble patterns work well for their intended purpose without the need for modification. The effects, and ripple effects, of diverging from the pattern are typically found to cause more issues than they address. For more information download the Data Vault Modeling Guide or purchase the Data Vault Book (Modeling the Agile Data Warehouse with Data Vault). Data Vault Certification Training (CDVDM) available through Genesee Academy, LLC GeneseeAcademy.com 2017 Genesee Academy LLC January 16, 2017 Page 11