Healthy Habits for SAS Data Integration Studio Users
Abstract: Version 9 of the SAS System offers tools to help developers and business users manage and organise the wealth of data and processes that face SAS professionals today. SAS Data Integration Studio provides many features that support healthy habits for data integration, but they can only 'be of use' if they are 'being used'. DI Studio allows customisation of the custom tree, error monitoring, job status handling, data validation, conformed data model support, self-documentation, and role assignment. Identifying the benefits behind these functions is often enough to motivate users into controlled and organised ways of working. This paper describes examples of best practice for developing data integration suites, ensuring quality, efficiency and resilience are built into the heart of your enterprise's information estate.
Subjects:
Data Integration Structure
Data Integration Organisation
Capture Control (CCT Tables)
Error Monitoring
Data Validation
Data Protection (Scrambler)
Conformed Modelling
SQL Optimisation
Self Documentation
Role Assignment
Rename Standard Transforms
SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3
Data Integration Structure
Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?
Solution: Employ a Data Integration flow structure:
Source Systems → Detailed Data Model → Subject Specific Data Marts → Subject Specific Business Intelligence
Data Integration Organisation Challenge: How can you keep track of the thousands of jobs typically created in a data integration suite? Solution: Utilise the custom tree in SAS Data Integration Studio.
Data Integration Organisation
Create folders for each integration layer. Subdivide them by: Jobs, Libraries, Tables.
Number the folders to preserve order.
Stick to the methodology (e.g. don't transform in the capture layer).
Capture Control
Challenge: How can I perform incremental extracts from several source systems?
Solution: Define Capture Control Tables (CCTs) for each source table.
Status: ensures smooth running of the DI suite (Started, Failed, or Success).
From/To datetimes: used to extract against the last-updated column in the database. Also useful for tracking processing times as data volumes grow day by day.
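A capture control table might be defined along these lines (a hedged sketch: the library, table, and column names are assumptions for illustration, not the paper's exact schema):

```sas
/* One CCT per source table; here, a hypothetical CUSTOMER source. */
data cctlib.customer_cct;
   length status $8;
   format from_dttm to_dttm datetime20.;
   status    = 'Success';               /* Started / Failed / Success    */
   from_dttm = '01JAN2008:00:00:00'dt;  /* lower bound of last extract   */
   to_dttm   = '02JAN2008:00:00:00'dt;  /* upper bound; next run's lower */
run;

/* Capture jobs then extract incrementally with something like:
   where last_upd_dttm > &from_dttm and last_upd_dttm <= &to_dttm; */
```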
Capture Control
Send the job status to a dataset with the same name as the job.
Capture Control
Only extract records which have been updated since the last run.
[Diagram: Pre-processing → Capture Job → Post-processing, flowing from the Source Systems to the Conformed Model, supported by CoreInfo tables]
Capture Control Pre-Processing
Is this the first time the job has run successfully today? If no, warn that duplicate facts will occur.
Did the previous run fail, or not finish? If yes, warn that this is a replacement run.
Update the dates in the CCT table for this source (&source_table._cct).
Capture Control Post-Processing
Did the job run successfully? If no, update the CCT table with Status = Failed. If yes, update the dates in the CCT table for this source (&source_table._cct).
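The pre- and post-processing decisions above might be sketched as a pair of macros (all table, column, and macro names here are assumptions based on the described flow, not the paper's actual code):

```sas
%macro cct_pre(source_table);
   %local last_status last_to;
   proc sql noprint;  /* read the previous run's status and window */
      select status, to_dttm format=best32.
         into :last_status, :last_to
         from cctlib.&source_table._cct;
   quit;
   %if %sysfunc(datepart(&last_to)) = %sysfunc(today()) and
       %sysfunc(strip(&last_status)) = Success %then
      %put WARNING: already ran successfully today - duplicate facts will occur.;
   %if %sysfunc(strip(&last_status)) ne Success %then
      %put WARNING: previous run failed or did not finish - replacement run.;
   proc sql;  /* mark the run as started and advance the extract window */
      update cctlib.&source_table._cct
         set status = 'Started', from_dttm = to_dttm, to_dttm = datetime();
   quit;
%mend cct_pre;

%macro cct_post(source_table);
   %local new_status;
   %if &syscc > 4 %then %let new_status = Failed;  /* a step errored */
   %else %let new_status = Success;
   proc sql;
      update cctlib.&source_table._cct set status = "&new_status";
   quit;
%mend cct_post;
```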
Error Monitoring
Challenge: How can I keep my production support department informed of job failures/successes?
Solution: Email job statistics to a designated mailbox. Create a user transform called Email_Stats and add it to each job.
Error Monitoring Add Email_Stats transform to Job.
Error Monitoring
Drag the target table to one input and the Email_Stats table to the other (the Email_Stats table contains the recipients' email addresses). Don't hard-code email addresses: what happens when people leave? Recipients also differ between dev and prod.
Error Monitoring Email_Stats transform properties. Only emails if job has failed.
Error Monitoring Last job in flow always sends email to Admin & Support. Set Last Job to Yes.
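An Email_Stats-style step could be sketched with the EMAIL access method along these lines (the recipient table, macro names, and parameters are assumptions; the EMAILSYS/EMAILHOST system options must already be configured):

```sas
%macro email_stats(job_name, job_rc);
   %if &job_rc > 4 %then %do;   /* only send when the job has failed */
      /* build the recipient list from the Email_Stats table,
         never from hard-coded addresses */
      proc sql noprint;
         select quote(strip(recipient)) into :to_list separated by ' '
         from admin.email_stats;
      quit;
      filename mailout email to=(&to_list)
               subject="Job &job_name failed (rc=&job_rc)";
      data _null_;
         file mailout;
         put "Job &job_name ended with return code &job_rc";
      run;
      filename mailout clear;
   %end;
%mend email_stats;
```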
Data Validation
Challenge: How can I ensure only clean data gets loaded into the warehouse?
Solution: Use the Data Validation transformation.
Use the standard Invalid, Missing, and Duplicate tabs.
Employ custom validation and apply a severity rating: 1 = Exclusion, 2 = Correction, 3 = Improvement.
Store exceptions in a permanent dataset for further analysis.
Data Validation e.g. Check for Truncation of Key columns
Data Validation
1) Create each condition.
2) Determine the validation.
3) Define corrective action if required.
4) The exception is written to the temporary dataset ETLS_EXCEPTIONS.
5) Run the %Append_Data_Quality macro in post-processing.
6) Use BI tools to investigate data quality issues (e.g. a particular source system requires cleansing).
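A custom condition like the key-truncation check might look as follows (an approximation of what the transform generates; table and column names, and the key length of 20, are assumptions):

```sas
/* Route rows that fill the full defined key length to the exceptions
   dataset as possible truncations, while still loading them (severity
   2 = Correction). */
data work.clean work.etls_exceptions;
   length exception_description $256;
   set work.staged;
   if length(policy_key) = 20 then do;   /* 20 = defined column length */
      exception_description = 'Possible truncation of POLICY_KEY';
      exception_severity    = 2;
      output work.etls_exceptions;
   end;
   output work.clean;
run;
```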
Data Validation
%Append_Data_Quality macro logic: if ETLS_EXCEPTIONS does not exist, halt the macro (no errors to process); otherwise, append the exceptions to the permanent table DQ_ERROR_EVENT.
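The macro's internals are not shown in the paper; a minimal sketch consistent with the described logic (library names assumed) would be:

```sas
%macro append_data_quality;
   /* only act when the validation step actually produced exceptions */
   %if %sysfunc(exist(work.etls_exceptions)) %then %do;
      proc append base=dqlib.dq_error_event
                  data=work.etls_exceptions force;
      run;
   %end;
   %else %put NOTE: ETLS_EXCEPTIONS not found - no errors to process.;
%mend append_data_quality;
```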
Data Validation
Table properties for DQ_ERROR_EVENT:

Column name            Type (Length)  Description
Row_Extraction_Date    Num (8)        Date-timestamp when the row was exported or extracted from the source system.
Exception_Event_Date   Num (8)        Date-timestamp when the exception was identified by the data warehouse processes.
Job_Name               Char (64)      The name of the ETL job which identified the exception.
Table_Name             Char (41)      The library and table name containing the row and column with the exception.
Row_Number             Num (8)        The row number containing the exception.
Column_Name            Char (32)      The column name containing the datum of the exception.
Screen_Description     Char (256)     The screen (data quality test) description.
Exception_Description  Char (256)     Standardised description of the exception.
Exception_Action       Char (256)     Automated data conform action (if any).
Exception_Severity     Num (8)        The severity level of the DQ error event (1 = Exclusion, 2 = Correction, 3 = Improvement).
Unconformed_ValueN     Num (8)        Original value (numeric) before conforming.
Conformed_ValueN       Num (8)        Conformed (numeric) value.
Unconformed_ValueC     Char (256)     Original value (character) before conforming.
Conformed_ValueC       Char (256)     Conformed (character) value.
Data Scrambling
Challenge: How can I ensure I'm not holding sensitive production data on development/test systems?
Solution: Use data scrambling routines in non-production environments. Development source systems are often created from production data, and warehouses can propagate the risk of breaching the Data Protection Act.
Data Scrambling Custom Transform The %data_scrambler macro allows for columns to be scrambled or passed through normally.
Data Scrambling Custom Transform
Edit Parameters: select Pass for key fields (don't scramble keys!). Scramble methods: Ranuni function, MD5 function, Translate function.
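The three scramble methods named above might be applied as follows (table and column names are assumptions for the sketch):

```sas
data work.scrambled;
   set work.customers;
   length surname_scr $32;
   /* MD5: one-way hash, rendered as 32 hexadecimal characters */
   surname_scr = put(md5(strip(surname)), $hex32.);
   /* Translate: swap digits consistently, so formats still validate */
   account_no  = translate(account_no, '9485760213', '0123456789');
   /* Ranuni: perturb numeric values with a seeded random factor */
   balance     = balance * ranuni(12345);
   drop surname;
run;
```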
Data Scrambling: What about Production?
%let liveenvironment = PROD;
%let thisenvironment = %sysfunc(substr(%sysfunc(upcase(%sysfunc(getoption(metaserver)))),1,4));
Don't perform the scramble routine if thisenvironment = liveenvironment. When running in Dev, the METASERVER option should be different. Alternatively, set up a table holding the environment value.
Conformed Model
Challenge: How can I track trends in my data when the source systems don't hold history?
Solution: Use a conformed data model in the warehouse, using slowly changing dimensions where appropriate: re-usable dimensions and fact tables.
Conformed Model In the Integrate layer use the SCD Type II Loader transform to make use of effective date processing.
Conformed Model In the Integrate Layer use the Surrogate Key Generator to determine keys for dimension tables.
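In effect, these two transforms close off superseded dimension rows and insert new versions under fresh surrogate keys. A hand-coded sketch of that idea (table, column, and high-date values are all assumptions, not what the generated code looks like):

```sas
proc sql noprint;
   /* close off the current rows for changed business keys */
   update dw.agent_dim
      set valid_to_dttm = datetime()
      where valid_to_dttm = '01JAN5999:00:00:00'dt
        and agent_id in (select agent_id from work.changed);
   /* find the highest surrogate key issued so far */
   select max(agent_key) into :max_key from dw.agent_dim;
quit;

data work.new_versions;   /* new versions with fresh surrogate keys */
   set work.changed;
   agent_key       = &max_key + _n_;
   valid_from_dttm = datetime();
   valid_to_dttm   = '01JAN5999:00:00:00'dt;
run;

proc append base=dw.agent_dim data=work.new_versions force;
run;
```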
SQL Optimisation
Challenge: How can I ensure the best possible SQL performance is achieved through my SQL Join transform?
Solution: Use the undocumented _method option on the SQL procedure to see how the join is processed.
SQL Optimisation: _Method Option (SAS Note 33604)
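Adding _method to a join (the table names below are illustrative) makes PROC SQL write its chosen execution plan to the log, with codes such as sqxjm (sort-merge join), sqxjhsh (hash join) and sqxjndx (index join):

```sas
proc sql _method;
   create table work.agent_broker as
   select a.*, b.broker_name
   from work.agent_dim  a
        inner join
        work.broker_dim b
     on a.broker_key = b.broker_key;
quit;
/* Inspect the "SQL execution methods chosen" tree in the log to see
   which join algorithm the optimiser selected. */
```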
Self Documentation
Challenge: How can I ensure the executed warehouse code is documented to an acceptable standard?
Solution: DI Studio self-documents the code, based on descriptions in the job and transform properties.
Self Documentation
Meaningful job names. Descriptions of why, not just what.
Self Documentation Use Notes and Document Attachments.
Self Documentation Descriptions & Notes are propagated through to the executable code, benefitting production support teams.
Role Assignment
Challenge: How can I address who is responsible for which job/entity?
Solution: Use Role Assignment in DI Studio.
Role Assignment Allocate names and roles where required.
Rename Standard Transforms
Challenge: How can I keep track of processing in a job which has many transformations?
Solution: Don't use the default transform names; rename each to something meaningful, e.g. rename "SQL Join" to "Merge Agent_Dim with Broker_Dim".
Contributors
Mick Collington, Jethro Day, Steve Morton, Nick Treadgold
Data Integration Developer Group (SAS Professionals): Julien Heijster, John Robertson
http://www.sasprofessionals.net/group/dataintegrationdeveloper/forum/topics/data-integration-best
SAS.COM