Data Integration Best Practices


Healthy Habits for SAS Data Integration Studio Users

Abstract: Version 9 of the SAS System offers tools to help developers and business users manage and organise the wealth of data and processes that face SAS professionals today. SAS Data Integration Studio benefits from many features that support healthy habits for data integration, but they can only 'be of use' if they are 'being used'. DI Studio allows customisation of the custom tree, error monitoring, job status handling, data validation, conformed data model support, self-documentation, and role assignment. Identifying the benefits of these functions is often enough to motivate users into controlled and organised methods of working. This paper describes examples of best practice for developing data integration suites to ensure quality, efficiency and resilience are built into the heart of your enterprise's information estate.

Subjects: Data Integration Structure; Data Integration Organisation; Capture Control (CCT Tables); Error Monitoring; Data Validation; Data Protection (Scrambler); Conformed Modelling; SQL Optimisation; Self Documentation; Role Assignment; Rename Standard Transforms. Environment: SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3.

Data Integration Structure Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base? Solution: Employ a Data Integration flow structure: Source Systems -> Detailed Data Model -> Subject Specific Data Marts -> Subject Specific Business Intelligence.

Data Integration Organisation Challenge: How can you keep track of the thousands of jobs typically created in a data integration suite? Solution: Utilise the custom tree in SAS Data Integration Studio.

Data Integration Organisation Create folders for each integration layer. Sub-divide them by: Jobs, Libraries, Tables. Number the folders to preserve order. Stick to the methodology (e.g. don't transform in the capture layer).

Capture Control Challenge: How can I perform incremental extracts from several source systems? Solution: Define Capture Control Tables (CCTs) for each source table. Status: ensures smooth running of the DI suite (Started, Failed, or Success). From/To datetimes: used to extract against the last-updated column in the database, and also useful for determining processing times as data grows day by day.
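The CCT-driven incremental extract can be sketched outside SAS as follows. This is a minimal Python illustration of the windowing idea only (the table layout and column names here are hypothetical), not the DI Studio implementation:

```python
from datetime import datetime

def incremental_extract(source_rows, cct):
    """Return only rows whose last-updated timestamp falls inside
    the window (from_dt, to_dt] recorded in the capture control table."""
    return [r for r in source_rows
            if cct["from_dt"] < r["last_updated"] <= cct["to_dt"]]

# Hypothetical CCT entry for one source table.
cct = {"status": "Started",
       "from_dt": datetime(2024, 1, 1),
       "to_dt": datetime(2024, 1, 2)}

rows = [{"id": 1, "last_updated": datetime(2023, 12, 31)},   # before window
        {"id": 2, "last_updated": datetime(2024, 1, 1, 12)}]  # inside window

extracted = incremental_extract(rows, cct)  # only row 2 is extracted
```

Filtering on the source's last-updated column against the stored window is what keeps daily runs cheap as the source tables grow.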

Capture Control Send the job status to a dataset with the same name as the job.

Capture Control Only extract records which have been updated since the last run. Flow: Source Systems -> (Pre-process -> Capture Job -> Post-process) -> Conformed Model, with supporting CoreInfo Tables.

Capture Control Pre-Processing: Is this the first time the job has run successfully today? If no, warn that duplicate facts will occur. Did the previous run fail, or not finish? If yes, warn that this is a replacement run. Then update the dates in the CCT table for this source (&source_table._cct).

Capture Control Post-Processing: Did the job run successfully? If no, update the CCT table with Status = Failed. If yes, update the dates in the CCT table for this source (&source_table._cct).
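The pre- and post-processing decisions above amount to a small state machine on the CCT row. A Python sketch of that logic, under the assumption that the CCT holds a status and a last-success date (field names are hypothetical):

```python
def cct_preprocess(cct, today):
    """Warnings the capture job should raise before extracting."""
    warnings = []
    if cct.get("last_success_date") == today:
        warnings.append("duplicate facts will occur")   # already ran today
    if cct.get("status") in ("Started", "Failed"):
        warnings.append("this is a replacement run")    # previous run incomplete
    cct["status"] = "Started"
    return warnings

def cct_postprocess(cct, succeeded, today):
    """Record the outcome of the run in the capture control table."""
    if succeeded:
        cct["status"] = "Success"
        cct["last_success_date"] = today
    else:
        cct["status"] = "Failed"

# Previous run failed, so the re-run is flagged as a replacement.
cct = {"status": "Failed", "last_success_date": "2024-01-01"}
warns = cct_preprocess(cct, today="2024-01-02")
cct_postprocess(cct, succeeded=True, today="2024-01-02")
```

Keeping this bookkeeping in the CCT (rather than in the job code) is what lets every capture job share the same pre/post transforms.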

Error Monitoring Challenge: How can I keep my production support department informed of job failures/successes? Solution: Email job statistics to a designated mailbox. Create a user transform called Email_Stats and add it to each job.

Error Monitoring Add Email_Stats transform to Job.

Error Monitoring Drag the target table to one input and Email_Stats to the other (the Email_Stats table contains the email addresses of the recipients). Don't hard-code email addresses: what happens when people leave? Different recipients are needed for dev/prod.

Error Monitoring Email_Stats transform properties: only emails if the job has failed.
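The behaviour of the Email_Stats transform can be sketched as below. This is an illustrative Python sketch only (the function and parameter names are hypothetical); the key points from the slides are that recipients come from a table rather than being hard-coded, and that mail is only sent on failure:

```python
def email_stats(job_name, failed, recipients, only_on_failure=True, send=print):
    """Send job statistics to recipients held in a lookup table.

    recipients: addresses read from the Email_Stats table, so that
    dev and prod can hold different lists without code changes."""
    if only_on_failure and not failed:
        return False                     # job succeeded: stay quiet
    status = "FAILED" if failed else "OK"
    for addr in recipients:
        send(f"To {addr}: job {job_name} status={status}")
    return True

recipients = ["support@example.com"]     # hypothetical; from the Email_Stats table
sent = email_stats("capture_orders", failed=False, recipients=recipients)
# sent is False: nothing was emailed because the job succeeded
```

The last job in a flow would instead be configured to always send, so Admin and Support get a daily confirmation even when everything ran cleanly.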

Error Monitoring Last job in flow always sends email to Admin & Support. Set Last Job to Yes.

Data Validation Challenge: How can I ensure only clean data gets loaded into the warehouse? Solution: Use the Data Validation transformation. Use the standard Invalid, Missing, and Duplicate tabs. Employ custom validation and apply a severity rating: 1 = Exclusion, 2 = Correction, 3 = Improvement. Store exceptions in a permanent dataset for further analysis.

Data Validation e.g. Check for Truncation of Key columns

Data Validation 1) Create each condition. 2) Determine the validation. 3) Define a corrective action if required. 4) Exceptions get written to the temporary dataset ETLS_EXCEPTIONS. 5) Run the %Append_Data_Quality macro in post-processing. 6) Use BI tools to investigate data quality issues (e.g. a particular source system requires cleansing).
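The severity-rated validation pattern can be sketched as follows. A minimal Python illustration (check names and the row layout are hypothetical): each check records an exception with its severity, and only severity 1 (Exclusion) removes the row from the clean output:

```python
SEVERITY = {1: "Exclusion", 2: "Correction", 3: "Improvement"}

def validate(rows, checks):
    """Apply custom checks; collect exceptions with a severity rating.

    checks: list of (description, test_function, severity) tuples."""
    clean, exceptions = [], []
    for i, row in enumerate(rows, start=1):
        excluded = False
        for desc, test, severity in checks:
            if not test(row):
                exceptions.append({"row": i, "screen": desc,
                                   "severity": severity})
                if severity == 1:        # Exclusion: drop the row
                    excluded = True
        if not excluded:
            clean.append(row)
    return clean, exceptions

# Hypothetical check from the slides: a key column must not be truncated
# (here, must fit in 10 characters).
checks = [("key not truncated", lambda r: len(r["key"]) <= 10, 1)]
clean, exc = validate([{"key": "OK"}, {"key": "X" * 11}], checks)
```

The collected exceptions stand in for the temporary ETLS_EXCEPTIONS dataset, ready to be appended to a permanent error table for analysis.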

Data Validation %Append_Data_Quality macro logic: does ETLS_EXCEPTIONS exist? If no, halt the macro, as there are no errors to process. If yes, append the exceptions to the permanent table DQ_ERROR_EVENT.
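That append step is simple but worth making explicit. A Python sketch of the same logic (the actual %Append_Data_Quality macro is SAS; this only mirrors its exist-then-append behaviour):

```python
def append_data_quality(etls_exceptions, dq_error_event):
    """If the temporary exceptions dataset is absent or empty, do nothing;
    otherwise append its rows to the permanent DQ_ERROR_EVENT table
    and clear the temporary dataset."""
    if not etls_exceptions:
        return dq_error_event            # no errors to process: halt
    dq_error_event.extend(etls_exceptions)
    etls_exceptions.clear()              # temp dataset has been consumed
    return dq_error_event

dq_error_event = []
append_data_quality([], dq_error_event)          # nothing happens
tmp = [{"row": 1, "screen": "key not truncated", "severity": 1}]
append_data_quality(tmp, dq_error_event)         # one exception appended
```

Guarding on existence first means a clean run adds no rows and costs nothing.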

Data Validation Table properties for DQ_ERROR_EVENT:
Row_Extraction_Date (Num 8): date-timestamp when the row was exported or extracted from the source system.
Exception_Event_Date (Num 8): date-timestamp when the exception was identified by the data warehouse processes.
Job_Name (Char 64): name of the ETL job which identified the exception.
Table_Name (Char 41): library and table name which contains the row and column containing the exception.
Row_Number (Num 8): row number containing the exception.
Column_Name (Char 32): column name containing the datum of the exception.
Screen_Description (Char 256): the screen (data quality test) description.
Exception_Description (Char 256): standardised description of the exception.
Exception_Action (Char 256): automated data conform action (if any).
Exception_Severity (Num 8): severity level of the DQ error event (1 = Exclusion, 2 = Correction, 3 = Improvement).
Unconformed_ValueN (Num 8): original numeric value before conforming.
Conformed_ValueN (Num 8): conformed numeric value.
Unconformed_ValueC (Char 256): original character value before conforming.
Conformed_ValueC (Char 256): conformed character value.

Data Scrambling Challenge: How can I ensure I'm not holding sensitive production data on development/test systems? Solution: Use data scrambling routines in non-production environments. Development source systems are often created using production data, and warehouses can propagate the risk of breaching the Data Protection Act.

Data Scrambling Custom Transform The %data_scrambler macro allows for columns to be scrambled or passed through normally.

Data Scrambling Custom transform Edit Parameters: select Pass for key fields (don't scramble them!). Scramble methods: Ranuni function, MD5 function, Translate function.
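The MD5-style scramble with pass-through for key fields can be sketched as follows. This is an illustrative Python sketch of the idea behind the %data_scrambler transform (function names and the row layout are hypothetical), using a hash so the masked value is deterministic but unreadable:

```python
import hashlib

def scramble(value):
    """Mask a sensitive character value using an MD5-style digest,
    trimmed to the original length so column widths are preserved."""
    return hashlib.md5(value.encode()).hexdigest()[:len(value)]

def scramble_row(row, pass_cols):
    """Scramble every column except those marked Pass (key fields)."""
    return {col: (val if col in pass_cols else scramble(str(val)))
            for col, val in row.items()}

row = {"customer_id": "C001", "surname": "Smith"}
masked = scramble_row(row, pass_cols={"customer_id"})
# customer_id survives untouched so joins still work; surname is masked
```

Passing key fields through untouched is the crucial detail: scrambling them would break every join in the downstream model.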

Data Scrambling What about production?

%let liveenvironment = PROD;
%let thisenvironment = %sysfunc(substr(%sysfunc(upcase(%sysfunc(getoption(metaserver)))),1,4));

Don't perform the scramble routine if &thisenvironment equals &liveenvironment. When running in Dev, the METASERVER option should be different. Alternatively, the environment value could be held in a table.

Conformed Model Challenge: How can I track trends in my data when the source systems don't hold history? Solution: Use a conformed data model in the warehouse, using slowly changing dimensions where appropriate. The model comprises re-usable dimensions and fact tables.

Conformed Model In the Integrate layer, use the SCD Type II Loader transform to make use of effective-date processing.
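The effective-date processing that the SCD Type II Loader performs can be sketched as follows. A minimal Python illustration of the technique (record layout and field names are hypothetical, and a high date marks the current version):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)   # open-ended valid_to for the current version

def scd2_apply(dimension, key, new_attrs, today):
    """Type II change: close the current record and open a new version
    when the tracked attributes have changed."""
    for rec in dimension:
        if rec["key"] == key and rec["valid_to"] == HIGH_DATE:
            if rec["attrs"] == new_attrs:
                return dimension          # no change: nothing to do
            rec["valid_to"] = today       # close the current version
            break
    dimension.append({"key": key, "attrs": new_attrs,
                      "valid_from": today, "valid_to": HIGH_DATE})
    return dimension

dim = [{"key": "A1", "attrs": {"city": "Leeds"},
        "valid_from": date(2020, 1, 1), "valid_to": HIGH_DATE}]
scd2_apply(dim, "A1", {"city": "York"}, date(2024, 6, 1))
# dim now holds two versions: Leeds (closed) and York (current)
```

Because old versions are closed rather than overwritten, facts loaded in any period still join to the dimension attributes that were true at the time.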

Conformed Model In the Integrate layer, use the Surrogate Key Generator to determine keys for dimension tables.
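Surrogate key generation is, at its core, a counter plus a natural-key lookup. An illustrative Python sketch of what the Surrogate Key Generator transform does (names are hypothetical):

```python
def surrogate_key(lookup, natural_key, counter):
    """Return the surrogate key for a natural key, assigning the
    next counter value the first time the natural key is seen."""
    if natural_key not in lookup:
        counter += 1
        lookup[natural_key] = counter
    return lookup[natural_key], counter

lookup, counter = {}, 0
k1, counter = surrogate_key(lookup, "BROKER-001", counter)  # new key assigned
k2, counter = surrogate_key(lookup, "BROKER-002", counter)  # next key assigned
k3, counter = surrogate_key(lookup, "BROKER-001", counter)  # existing key reused
```

Using warehouse-owned surrogate keys insulates the dimensional model from source-system key changes and supports the multiple record versions that SCD Type II introduces.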

SQL Optimisation Challenge: How can I ensure the best possible SQL performance is achieved through my SQL Join transform? Solution: Use the undocumented _METHOD option on the SQL procedure to see how the join is being processed.

SQL Optimisation: _Method Option (SAS Note 33604)

Self Documentation Challenge: How can I ensure the executed warehouse code is documented to an acceptable standard? Solution: DI Studio self-documents the code, based on the descriptions in the job and transform properties.

Self Documentation Meaningful job names. Descriptions of why, not just what.

Self Documentation Use Notes and Document Attachments.

Self Documentation Descriptions & Notes are propagated through to the executable code, benefitting production support teams.

Role Assignment Challenge: How can I record who is responsible for which job or entity? Solution: Use role assignment in DI Studio.

Role Assignment Allocate names and roles where required.

Rename Standard Transforms Challenge: How can I keep track of processing in a job which has a lot of transformations? Solution: Don't use the default transform names; rename them to something meaningful, e.g. rename SQL Join to Merge Agent_Dim with Broker_Dim.

Contributors: Mick Collington, Jethro Day, Steve Morton, Nick Treadgold. Data Integration Developer Group (SAS Professionals): Julien Heijster, John Robertson. http://www.sasprofessionals.net/group/dataintegrationdeveloper/forum/topics/data-integration-best SAS.COM