Luncheon Webinar Series
April 25th, 2014
Governance for ETL
Presented by Beate Porst
Sponsored By:

Governance for ETL
Questions and suggestions regarding presentation topics? Send to editor@dsxchange.com
Downloading the presentation: http://www.dsxchange.net/governanceforetl.html
The replay will be available within one day; an email with details will follow.
Pricing and configuration: send to editor@dsxchange.net, subject line: Pricing
For those who stay through the entire presentation, we have an extra giveaway!

Bonus Offer
Free premium membership for your DataStage management! Submit your manager's email address and we will offer them access on your behalf. Email Info@dsxchange.net, subject line: Managers special.

Join us all at LinkedIn: http://tinyurl.com/dsxmembers
Governance for ETL
Beate Porst
porst@us.ibm.com

Session Abstract
There should be no doubt that ETL developers and operators would benefit from an easy way to understand what type of information a data source contains, which rules are associated with that source and its content, and which processes currently depend on it. Yet the IT side, when executing data integration projects (whether in development or operations), rarely has access to such valuable information. Those projects are still handled as they were a decade ago: the business side specifies requirements at the beginning of the project and hands them over to IT, in many cases as Excel or Word documents. Anything missing from the specification, any discrepancy between the specification and the actual data source, or any ambiguous definition can result in incorrect transformations that are very difficult to detect. See how Information Server's built-in governance capabilities can help your data integration developers and operators gain better insight into data resources, their dependencies, data policies and rules, and how transformation jobs are linked to each other.
Agenda
What is Information Governance? Short intro into Information Governance
Which Information Server tools/interfaces are available to support Information Governance?
Demo

Acting on Insight Requires Confidence in Data
Make Informed Decisions
Take Bigger, Calculated Risks
Uncover competitive advantages
Identify new opportunities

Information Integration & Governance for Big Data
Automated Integration: Rapid, easy access to big data, wherever it resides
Visual Context: Easy categorization, indexing, discovery of big data to optimize its usage
Agile Governance: Definition and execution of governance appropriate to data value and intended use
What is Information Governance?
Lots of definitions, e.g.:
Information governance is a holistic approach to managing and leveraging information for business benefits and encompasses information quality, information protection and information life cycle management.
55% of the 400 people agreed with us; 26% said they had a narrower scope.

What are the drivers for Governance?
Better customer centricity
Security and risk management
Audit and regulatory compliance
Archival and storage management
Data quality management
Consistency of information across the enterprise
IBM Information Governance Unified Process
One Platform for Information Governance
Define Business Problem
Obtain Executive Sponsorship
Conduct Maturity Assessment
Build Roadmap
Establish Organization Blueprint
Build Data Dictionary
Understand Data
Create Metadata Repository
Define Metrics
Govern Data Quality
Govern Master Data
Govern Life Cycle of Information
Govern Security & Privacy
Govern Big Data & Analytics
Measure Results
Legend: Enable through Process; Enable through Technology

What is your organization's maturity around Information Governance?
Do you know what is contained in a specific data source?
Do you have specific rules assigned to a data source?
Does everyone understand the dependencies between transformations if you have multiple projects?
Do all your projects use the same schema definitions?
Is there a way for your business to tell if data marts are getting populated?
Does your operations team have a way to understand and report ETL runtime quality?
Metadata Business Benefits
A. Number of new applications per year: 100
B. Number of hours by business analysts per application per year: 200
C. Number of hours by data modelers per application per year: 200
D. Average hourly cost of a business analyst: $40
E. Average hourly cost of a data modeler: $25
F. Total application development cost for analysts and data modelers {(AxBxD)+(AxCxE)}: $1,300,000
G. Number of developer hours spent on data discovery per application: 100
H. Average hourly cost of a developer: $50
I. Total annual cost of data discovery (AxGxH): $500,000
J. Conservative savings from efficiency improvements based on a metadata platform: 25%

Metadata Business Benefits (cont.)
K. IT savings from a metadata platform in year 1 (F+I)xJ: $450,000
L. Number of users of business reports: Risk 1,000; Marketing 1,000; Finance 3,000; total 5,000
M. Total number of working hours per year: 2,000
N. Percentage of time spent on reviewing reports: 5%
O. Percentage of report-viewing time that can be saved based on access to common data definitions and data lineage: 3%
P. Cost per hour of business users: $50
Q. Business savings from a metadata platform in year 1 (LxMxNxOxP): $750,000
R. Total financial benefits to IT and the business from a metadata platform in year 1 (K+Q): $1,200,000
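The arithmetic behind these figures can be checked with a short script. This is a minimal sketch, not part of the original presentation: the function and variable names are illustrative, the inputs are the slide's own numbers, and it assumes that row K applies the 25% savings rate (row J) to F+I and that row O is 3%. Those are the readings under which the slide's totals of $450,000 and $750,000 come out.

```python
def metadata_benefits():
    """Recompute rows F, I, K, Q and R of the benefits table (whole dollars)."""
    apps_per_year = 100            # A
    analyst_hours_per_app = 200    # B
    modeler_hours_per_app = 200    # C
    analyst_rate = 40              # D, dollars per hour
    modeler_rate = 25              # E, dollars per hour
    discovery_hours_per_app = 100  # G
    developer_rate = 50            # H, dollars per hour
    it_savings_pct = 25            # J (assumed to be the factor used in row K)
    report_users = 1_000 + 1_000 + 3_000  # L: Risk + Marketing + Finance
    working_hours_per_year = 2_000        # M
    report_time_pct = 5            # N
    report_time_saved_pct = 3      # O (assumed; the slide's stray "3%")
    business_user_rate = 50        # P, dollars per hour

    # F: analyst plus data modeler cost across all new applications.
    dev_cost = (apps_per_year * analyst_hours_per_app * analyst_rate
                + apps_per_year * modeler_hours_per_app * modeler_rate)
    # I: annual cost of developer time spent on data discovery.
    discovery_cost = apps_per_year * discovery_hours_per_app * developer_rate
    # K: IT savings. Percentages stay as integers so the result is exact.
    it_savings = (dev_cost + discovery_cost) * it_savings_pct // 100
    # Q: business savings from shorter report reviews.
    business_savings = (report_users * working_hours_per_year * report_time_pct
                        * report_time_saved_pct * business_user_rate) // 10_000
    return {
        "F": dev_cost,
        "I": discovery_cost,
        "K": it_savings,
        "Q": business_savings,
        "R": it_savings + business_savings,
    }


if __name__ == "__main__":
    for row, dollars in metadata_benefits().items():
        print(f"{row}: ${dollars:,}")
```

Keeping the percentages as integers avoids floating-point rounding, so each row can be compared directly with the figure on the slide.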
Information Governance Tools in Information Server
Blueprint Director: Define project blueprints & milestones; assign IT & business assets
Governance Dashboard: Report and analyze IT asset & business data/metadata at various dimensions
Business Glossary: Define business policies, rules, terms and relationships and link these to IT assets
Metadata Workbench: Analyze data lineage and perform impact analysis
Data Quality Console: Monitor data quality issues within your data quality and transformation projects
Operational Console: Monitor your operational environment and understand system utilization

Example Metadata Relationships and Analysis
What can I find out about the data sources?
Whom can I contact if I have questions about the data?
What is the business definition for the data in my table?
What policies and rules exist for the data? E.g. what are the valid values?
What is impacted if I change this job?
DEMO

Scenario Metadata Workbench
Paul is a new ETL developer at JK Bank. As he has been with the company for only a few weeks, he does not know all the details yet, and in particular he knows nothing about previous ETL projects.
For an existing project, Paul is asked to integrate a new data source into JK's data mart.
A while ago, JK Bank started to embrace information governance with the goal of improving the understanding and consistency of its data and data sources.
In this context, JK Bank started to build a business glossary with all the JK-specific business definitions, policies and rules, and to link these to its data sources and other IT assets so that both sides (IT and the business) have access to the same information.
With this information, Paul is now able to familiarize himself with the transformations, data sources and other information that he needs to do his job without having to rely on verbal information.
Paul opens one of the jobs in DS Designer and wants to find out more about the term Customer. He knows that he has BG Anywhere installed on his desktop, which allows him to look up content simply by highlighting the text.

There is indeed a Customer business term defined, and Paul goes ahead and drills into it.
From the detailed Business Glossary view, Paul sees that the Customer table is associated with the Customer business term. He drills into the Customer table.

The Customer table in turn is linked to a project blueprint. As Paul assumes that the project blueprint will give him a good overview of the entire project flow, he clicks on it.
And indeed, this gives him a good understanding of the different phases and associated assets for this project. It even allows him to drill into the Metadata Workbench for further analysis.

Paul can see the same project folder in Metadata Workbench, including the current visual design. From here, he can drill into each stage, link or data source for further information. He can also run a data lineage or impact analysis to see how this job links to other assets.
Paul sees that quite a lot of jobs and data sources depend on JOB_3. So, depending on the change, he may have to propagate it to subsequent jobs.

One of the original sources for the whole transformation sequence is the BANK_ACCOUNTS table. Paul would like to know more about what this table and its columns contain and therefore drills into it.
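Conceptually, the impact analysis Paul runs here is a walk over a dependency graph: start at the asset being changed and collect everything downstream of it. The sketch below only illustrates that idea and is not the Metadata Workbench API. BANK_ACCOUNTS and JOB_3 come from the demo; every other asset name and all of the edges are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that consume its output.
LINEAGE = {
    "BANK_ACCOUNTS": ["JOB_1"],
    "JOB_1": ["JOB_3"],
    "JOB_3": ["JOB_4", "JOB_5"],
    "JOB_4": ["CUSTOMER_MART"],
    "JOB_5": ["CUSTOMER_MART"],
}


def impacted_assets(lineage, start):
    """Return every asset downstream of `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # visit each asset once, even if shared
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    print(impacted_assets(LINEAGE, "JOB_3"))
```

Running the same function from BANK_ACCOUNTS lists the whole transformation sequence, which corresponds to reading the lineage forward from the original source.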
With one click, Paul gets all the information for BANK_ACCOUNTS, such as the steward, the general description, etc. If he wanted to know more about the business context, he could drill into the business term.

Looking at the other details in the table overview, Paul can see that the table is referenced by a project blueprint. Paul would really like to know how the entire project was defined from a high-level perspective and selects the blueprint.
When Paul selects the blueprint, a separate window opens with a Blueprint viewer, which allows Paul to see the outline of the project and all its associated assets. This view is fully navigable.

Now that Paul has a better understanding of the overall project, he drills into the column details.
Paul drills into the Gender column. He sees that the Gender column also has a business definition. As he is asked to make changes here, he would like to know how his LOB defines Gender.

From the Gender definition, Paul can see that Gender has specific valid values and that this type of data has only a low sensitivity rating, which means it does not need to be masked. He also notices the rule that governs Gender and wants to have a look at it.
Looking at the governance rule, Paul notices that this rule is not implemented yet. He decides to contact Richard to ask whether this is already in progress or whether he should take care of it himself. Before that, Paul looks at the policy that governs this rule.

The policy clearly states that the sources associated with the rule need to be validated. This confirms that he needs to check with Richard on the progress of creating a data validation rule for Gender.
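A data validation rule for Gender could, in spirit, look like the sketch below. It is written in plain Python purely to illustrate a valid-values check and is not Information Server rule syntax. The set of valid values and the sample rows are hypothetical, since the slides do not list the actual values from the glossary.

```python
# Hypothetical valid values; the real list lives in the business glossary.
VALID_GENDER_VALUES = {"M", "F", "U"}


def invalid_gender_rows(rows, valid_values=VALID_GENDER_VALUES):
    """Return the rows whose GENDER value is missing or not a valid value."""
    return [row for row in rows if row.get("GENDER") not in valid_values]


if __name__ == "__main__":
    # Hypothetical sample of BANK_ACCOUNTS rows.
    sample = [
        {"ACCOUNT_ID": 1, "GENDER": "F"},
        {"ACCOUNT_ID": 2, "GENDER": "X"},
        {"ACCOUNT_ID": 3, "GENDER": None},
    ]
    for row in invalid_gender_rows(sample):
        print("Invalid GENDER:", row)
```

Returning the offending rows rather than a single pass/fail flag makes it easier to report how many records violate the rule and to follow up on each one.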
Governance Dashboard

Scenario
Chief Governance Officer Mary wants to understand the status of the governance program.
Do policies have stewards assigned?
Do policies have information governance rules assigned?
Are information governance rules implemented?
Starting from a policy of interest, Mary browses the policy hierarchy to an area that needs more detailed inspection.
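Mary's three questions amount to simple checks over the policy and rule metadata. The sketch below shows one way to express them. It does not reflect how the Governance Dashboard is implemented, and the record layout, policy names, rule names and steward name are all hypothetical.

```python
def governance_status(policies):
    """Answer the three status questions for a list of policy records."""
    all_rules = [rule for policy in policies for rule in policy["rules"]]
    return {
        "policies_without_steward": [
            p["name"] for p in policies if not p.get("steward")
        ],
        "policies_without_rules": [
            p["name"] for p in policies if not p["rules"]
        ],
        "rules_not_implemented": [
            r["name"] for r in all_rules if not r["implemented"]
        ],
    }


# Hypothetical policy records for illustration only.
POLICIES = [
    {
        "name": "Customer Data Validation",
        "steward": "Steward A",
        "rules": [{"name": "Gender valid values", "implemented": False}],
    },
    {"name": "Data Retention", "steward": None, "rules": []},
]

if __name__ == "__main__":
    for question, names in governance_status(POLICIES).items():
        print(question, names)
```

Each list names the assets that need attention, which matches the way Mary moves from a summary to the area of the policy hierarchy that needs closer inspection.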
Thank you