Adaptive Presentation at DAMA-NYC, October 19th, 2017
Enabling Data Governance Leveraging Critical Data Elements
Jeff Goins, President, Jeff.goins@adaptive.com
James Cerrato, Chief, Product Evangelist, james.cerrato@adaptive.com
Copyright 2017 Adaptive, Inc. All Rights Reserved.
Confidentiality & Disclosure Agreement
This is an unpublished work, the copyright in which vests in Adaptive, Inc. ("Adaptive"). All rights reserved. The information contained herein is confidential and the property of Adaptive, Inc. and is supplied without liability for errors or omissions. No part may be reproduced, disclosed or used, except as authorized by contract or other written permission. The copyright and the foregoing restriction on reproduction and use extend to all media in which the information may be embodied. All product names used herein are for identification purposes only and may be trademarks of their respective companies.
Agenda
• Why are Critical Data Elements needed to make Enterprise Data Modeling more effective?
• What are Critical Data Elements and how to identify them?
• How to use Critical Data Elements to validate data movements?
• How to use Critical Data Elements to guide Business Rule definition and implementation?
What are the limits to getting value from an Enterprise Data Model?
• Enterprise Data Models are BIG and complex
• The work to get full value from them at the logical and physical level can be extremely time consuming
• Traceability, Lineage and Governance are used to achieve value from an Enterprise Data Model
• However, this can't be done all at once; it takes many separate projects
Why are Critical Data Elements needed?
• The all-at-once enterprise approach doesn't deliver tangible value soon enough
• Addressing only one business area or one database misses important aspects
• Instead, focus on specific data items across the enterprise according to how critical they are
• By focusing on Critical Data Elements you address the important data needs that might span multiple organizations across the enterprise
• You can provide a rich model of business context (not practical to do for a whole enterprise data model)
What are Critical Data Elements (CDEs)?
• There is an enormous proliferation of data
• For data management and data governance activities to be successful, there must be a focus on data that is vital to the successful operation of an enterprise
• CDEs identify which data is most important and requires the most attention
• CDEs should be identified at the business level so that their business context and purpose are captured
• A CDE should be linked to the logical and physical data elements, analytics and reports that provide its design and implementation
How to identify CDEs?
• Financial reporting
  o e.g. Comprehensive Capital Analysis and Review (CCAR), Sarbanes-Oxley, Basel Committee on Banking Supervision (BCBS)
• Audit trail
  o Evidence in case of enquiry, personal or legal
• Analytics
  o Evidence of aggregate performance
• Master Data, Product
• Privacy / Sensitive
  o Personally identifiable information (PII), European Union's General Data Protection Regulation (GDPR)
  o Usable for identity theft
  o e.g. Health Insurance Portability and Accountability Act (HIPAA)
• Corporate Policy
  o Adherence to standards
Where to get CDEs from?
• CDEs are often common to an industry or jurisdiction, so they do not need to be developed from scratch
• Sources include:
  o Submissions required by regulators
  o Data subject to laws (e.g. GDPR defines the scope of Personally Identifiable Information)
  o Industry vocabularies/ontologies, e.g. FIBO for Finance
  o Vendor-supplied models or sets of CDEs for an industry
• These can be incorporated as an extra level of traceability in an enterprise
Business context for a CDE defines purpose
• Definition
  o Using business terminology: how is it used, what are examples?
• Business processes
  o Creating, reading, updating, deleting
• Business Applications
  o How are CDEs used by business applications that deliver valuable outcomes?
• Business Constraints
  o Quality Criteria, Rules, Policies, Standards and Regulations
• Goals
  o Objectives
• Measurements
  o Key Performance Indicators
  o Assessments
• Accountability
  o What are all the different accountable relationships?
  o e.g. Steward, Business Owner, Technical Owner, Consumer, Subject Matter Expert
[Figure: business context diagram. Labels: Strategic, Finance, Organization, IT, Process, Performance, HR, Strategy, External, Governance, Project]
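The business context listed on this slide can be captured as a simple record. The following is a minimal sketch only: the class name, field names, the example CDE "Customer Name" and the person "J. Doe" are illustrative assumptions, not part of the presentation or of any Adaptive product.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalDataElement:
    """Hypothetical record holding the business context of one CDE."""
    name: str
    definition: str
    business_processes: list = field(default_factory=list)     # create/read/update/delete
    business_applications: list = field(default_factory=list)
    constraints: list = field(default_factory=list)            # rules, policies, regulations
    accountability: dict = field(default_factory=dict)         # role -> person

    def missing_roles(self, required=("Steward", "Business Owner", "Technical Owner")):
        """Return the required accountability roles that nobody holds yet."""
        return [role for role in required if role not in self.accountability]


# Illustrative values only.
cde = CriticalDataElement(
    name="Customer Name",
    definition="Legal name of a customer as used on invoices.",
    constraints=["cannot be empty"],
    accountability={"Steward": "J. Doe"},
)
unassigned = cde.missing_roles()
```

A check such as `missing_roles` is one way to spot CDEs whose accountable relationships are incomplete.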
What is traceability and why is it an important aspect of data governance?
Traceability
• The relationship from a technical data element (tables, columns, BI reports) to the business element (business terms, business concepts) it implements
• Enables business users to navigate from a business element and discover where its data is stored or included in BI reports
• Enables technical users to navigate from a technical data element to discover its approved business purpose and the constraints on its usage
• Provides the link between business and technical metadata and eliminates labor-intensive and error-prone efforts to answer questions that would otherwise require contacting many different data owners and stewards
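The two navigation directions described above can be sketched as a bidirectional index. This is a minimal illustration, not the presenters' implementation: the class `TraceabilityIndex`, the business element "Customer Name" and the qualified physical names are assumptions made for the example.

```python
from collections import defaultdict


class TraceabilityIndex:
    """Hypothetical two-way index between business and physical elements."""

    def __init__(self):
        self._to_physical = defaultdict(set)
        self._to_business = defaultdict(set)

    def link(self, business_element, physical_element):
        # Record the link in both directions so either side can navigate.
        self._to_physical[business_element].add(physical_element)
        self._to_business[physical_element].add(business_element)

    def where_stored(self, business_element):
        """Business user view: which physical elements implement this term?"""
        return sorted(self._to_physical.get(business_element, ()))

    def business_purpose(self, physical_element):
        """Technical user view: which business terms does this element implement?"""
        return sorted(self._to_business.get(physical_element, ()))


index = TraceabilityIndex()
index.link("Customer Name", "ODS.Cstmr.Cstmr_Nm")
index.link("Customer Name", "EDW.Cust_Dim.Cust_Name")

stored_in = index.where_stored("Customer Name")
purpose = index.business_purpose("ODS.Cstmr.Cstmr_Nm")
```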
Data Governance using CDEs and Traceability
• Business/Semantic Layer: describes information in the business context
  o Business focused & business friendly
  o Conceptual/Semantic
  o Standardized
• Logical/Design Layer: describes the design level artifacts
  o Business as well as technical focus
  o Logical, design, structured
  o Includes details esp. about relationships
  o Standardized at Enterprise Level
  o Conforming to standards at application level
• Physical/Actual Layer: represents the actual structures holding & operating information
  o Technical focus
  o Actual
  o Detailed
[Figure: TRACEABILITY across the three layers. Business Concept Model: CDEs (Business Concepts). Operational System Model and Data Warehouse Model: Orders (Product Identifier, Quantity). Physical structures: Cstmr (Cstmr_Id, Cstmr_Nm); Cust_Id, Cust_Name; Cust_Dim (Cust_Key, Cust_Name); Top s Report. Locations: Enterprise Application, Operational Data Stores, Data Lake, Data Warehouse, Reports]
What is lineage and why is it an important aspect of data governance?
Lineage
• The path that data takes from original source elements to its consumption in reports and/or analytics
• Each step in data movement may also involve a transformation or calculation
• Enables business users to find where the data they are seeing in a BI or regulatory report was sourced from
• Enables technical users tasked to change a data element to do an impact analysis and quickly identify where its data is used
• This is vital for enterprises with regulatory compliance requirements to have full control of financial/private data end-to-end through their systems
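Both uses of lineage, finding sources and doing impact analysis, amount to walking a directed graph of data movements. The sketch below assumes lineage is available as a simple source-to-targets mapping; the element names and the edges between them are illustrative, not taken from a real scan.

```python
from collections import deque

# Hypothetical lineage edges: source element -> elements it feeds.
LINEAGE = {
    "Cstmr.Cstmr_Nm": ["DataLake.Cust_Name"],
    "DataLake.Cust_Name": ["Cust_Dim.Cust_Name"],
    "Cust_Dim.Cust_Name": ["Report.CustomerName"],
}


def downstream(edges, start):
    """Impact analysis: everything reachable from `start`, breadth-first."""
    found, visited, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


def invert(edges):
    """Reverse every edge so the same walk can go back towards the sources."""
    reversed_edges = {}
    for src, targets in edges.items():
        for tgt in targets:
            reversed_edges.setdefault(tgt, []).append(src)
    return reversed_edges


def upstream(edges, start):
    """Source discovery: everything that feeds `start`, nearest first."""
    return downstream(invert(edges), start)


impacted = downstream(LINEAGE, "Cstmr.Cstmr_Nm")
sources = upstream(LINEAGE, "Report.CustomerName")
```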
Where can Lineage be found?
• Lineage can take many forms aside from the traditional ETL tools
• Anything that moves or processes data to produce an output or conclusion:
  o Programs (e.g. COBOL, Java, Python) that create or update data
  o Bulk data loaders and copiers (e.g. FTP)
  o Messaging systems and buses
  o Calls to Web service/REST APIs
  o Consuming external data feeds (e.g. company info, stock prices)
  o Big data analytics
  o Machine learning
How and where is data stored?
• Likewise the data can take many forms
  o Relational databases
  o CSV
  o NoSQL
  o Graphs
  o RDF
  o XML, JSON, Parquet/Avro
  o Unstructured and semi-structured data, from documents to multimedia
• And stored in many ways and many locations/clouds
  o Databases
  o Data lakes
  o Files
  o File-based cloud storage (e.g. Amazon S3)
Data Governance using CDEs and Lineage
• Business/Semantic Layer: describes information in the business context
  o Business focused & business friendly
  o Conceptual/Semantic
  o Standardized
• Logical/Design Layer: describes the design level artifacts
  o Business as well as technical focus
  o Logical, design, structured
  o Includes details esp. about relationships
  o Standardized at Enterprise Level
  o Conforming to standards at application level
• Physical/Actual Layer: represents the actual structures holding & operating information
  o Technical focus
  o Actual
  o Detailed
[Figure: LINEAGE across the physical layer. Business Concept Model: CDEs (Business Concepts). Operational System Model and Data Warehouse Model: Orders (Product Identifier, Quantity). Physical flow: Cstmr (Cstmr_Id, Cstmr_Nm), JSON, Cust_Id / Cust_Name, ETL, Cust_Dim (Cust_Key, Cust_Name), BI, Top s Report. Locations: Enterprise Application, Operational Data Stores, Data Lake, Data Warehouse, Reports]
How to use CDEs to validate semantic integrity of data movement?
• Before implementation
  o As a physical design is being authored, the next step can be guided by the traceability from a business element to its physical elements
• After implementation
  o Validate that straight-through physical data movements have sources and targets that map to compatible business elements
  o Follow traceability from a CDE to its physical element, then lineage to where it came from or is used, then traceability back to a business element; compare that business element to the CDE to see if they are compatible
• How does this relate to Data Quality?
  o If data movements don't match the business intent, then the quality of data in the target data elements will be degraded
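The after-implementation check can be sketched as a comparison of the business elements at both ends of each data movement. This is a simplified illustration under stated assumptions: "compatible" is reduced to "the same business element", and the traceability map and the movements (including the deliberately wrong second one) are invented for the example.

```python
# Hypothetical traceability: physical element -> business element.
TRACE = {
    "YrHrdRev": "Annual Hardware Revenue",
    "YrSftRev": "Annual Software Revenue",
    "YrSoftRv": "Annual Software Revenue",
}

# Hypothetical straight-through movements taken from lineage: (source, target).
MOVES = [
    ("YrSftRev", "YrSoftRv"),   # software revenue copied to software revenue
    ("YrHrdRev", "YrSoftRv"),   # hardware revenue copied into software revenue
]


def incompatible_movements(trace, moves):
    """Return movements whose source and target trace to different
    (or unknown) business elements."""
    problems = []
    for source, target in moves:
        source_term = trace.get(source)
        target_term = trace.get(target)
        if source_term is None or target_term is None or source_term != target_term:
            problems.append((source, target, source_term, target_term))
    return problems


issues = incompatible_movements(TRACE, MOVES)
```

A real check would likely allow richer compatibility (for example, a term and its generalization), which equality does not capture.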
Using traceability and lineage to validate semantic integrity of data movements
[Figure: BUSINESS layer terms (Annual Hardware Revenue, Annual Software Revenue, Annual Product Revenue, Annual Cost of Product Sales, Annual Total Revenue, Annual Cost of Sales, and the CDE Annual Net Revenue) linked by Traceability to PHYSICAL elements (YrHrdRev, YrSftRev, YrSoftRv, YrProdRev, YrCostProd, YrTotRev, YrCostSales, YrNetRev). The physical elements are connected by Lineage ("used by") across Data Store, Data Lake, Data Warehouse and Reports]
CDEs guide the definition and implementation of Rules
• Rules are defined and associated to CDEs at Enterprise/Business Level
• Rules defined at business level will be applied and enforced at the physical implementation level
• Enables standardization, consistency and conformance
• More detailed and specific variations of rules can be applied at technical level
Business Rule Definition & Management
[Figure: ENFORCEMENT of business rules down the layers. Business Concept Model: two CDEs (Business Concepts), one with "Rule: cannot be empty" and one with "Rule: must be numerical". Operational System Model and Data Warehouse Model: Orders (Product Identifier, Quantity). Physical flow: Cstmr (Cstmr_Id, Cstmr_Nm), ETL, Cust_Id / Cust_Name, ETL, Cust_Dim (Cust_Key, Cust_Name). Locations: Operational Data Stores, Data Lake, Data Warehouse]
Rule implementation guided by Traceability
1. CDE & Rule is selected for execution
2. Physical data elements determined using traceability link
3. Query generated using physical metadata captured by scanner
4. Query executed against database tables and results shown
Rules shown on the CDEs:
• Desc: cannot be empty. Expr: If IS NULL Then count()
• Desc: must be numerical. Expr: If does not Contain Numbers Then count()
[Figure: Business Concept Model with two CDEs (Business Concepts); Operational System Model and Data Warehouse Model with Orders (Product Identifier, Quantity); physical flow Cstmr (Cstmr_Id, Cstmr_Nm), ETL, Cust_Id / Cust_Name, ETL, Cust_Dim (Cust_Key, Cust_Name). Scanned connections: Schema ODS, Connection URL 192.10.10.12:8021; Schema EDW, Connection URL 192.10.15.140:8088; Schema DTM, Connection URL 192.10.15.150:8988]
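The four steps can be sketched end to end with an in-memory SQLite database standing in for the scanned schemas. Everything here is an assumption made for illustration: the CDE names "Customer Name" and "Order Quantity", the traceability map, the SQL templates chosen for the two rule descriptions, and the sample rows. It is not the query generator of any particular tool.

```python
import sqlite3

# Step 2 input (hypothetical): CDE -> physical (table, column) pairs.
TRACEABILITY = {
    "Customer Name": [("Cstmr", "Cstmr_Nm"), ("Cust_Dim", "Cust_Name")],
    "Order Quantity": [("Orders", "Quantity")],
}

# Assumed SQL renderings of the two rule descriptions; each counts violations.
RULE_TEMPLATES = {
    "cannot be empty": 'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL',
    # GLOB '*[^0-9]*' matches any value containing a non-digit character.
    "must be numerical": 'SELECT COUNT(*) FROM "{table}" WHERE "{column}" GLOB \'*[^0-9]*\'',
}


def generate_queries(cde, rule):
    """Step 3: build one query per physical element traced from the CDE.
    Identifiers come from trusted metadata, not from user input."""
    template = RULE_TEMPLATES[rule]
    return [(t, c, template.format(table=t, column=c)) for t, c in TRACEABILITY[cde]]


def run_rule(conn, cde, rule):
    """Step 4: execute each query and report violation counts per element."""
    return {f"{t}.{c}": conn.execute(sql).fetchone()[0]
            for t, c, sql in generate_queries(cde, rule)}


# Sample data standing in for the real databases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cstmr (Cstmr_Id TEXT, Cstmr_Nm TEXT);
CREATE TABLE Cust_Dim (Cust_Key TEXT, Cust_Name TEXT);
CREATE TABLE Orders (Product_Identifier TEXT, Quantity TEXT);
INSERT INTO Cstmr VALUES ('1', 'Acme'), ('2', NULL);
INSERT INTO Cust_Dim VALUES ('1', 'Acme'), ('2', 'Globex');
INSERT INTO Orders VALUES ('P1', '5'), ('P2', 'abc');
""")

# Step 1: a CDE and a rule are selected for execution.
empty_names = run_rule(conn, "Customer Name", "cannot be empty")
bad_quantities = run_rule(conn, "Order Quantity", "must be numerical")
```

Defining the rule once against the CDE and expanding it per physical element is what lets the same business rule be enforced consistently in every store that holds the data.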
Different types of rules
• Constraints - such as the ones shown on the previous slide
• Calculations - e.g. profit = income - costs, within a context
• Quality - e.g. customer name is the same across different databases for the same customer id; name is the same as used in a company registry. There are typically 8 dimensions of quality
• Inferences - e.g. a creditor is any entity which the company has a net negative balance with across all business transactions
• Classifications - e.g. a good customer is one that earned the company > $100k in net revenue in the previous financial year
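Three of these rule types can be written as small functions. This is a sketch under assumptions: the function names, the transaction format and the sign convention (a negative amount means the company owes the entity) are invented for the example.

```python
def profit(income, costs):
    """Calculation rule: profit = income - costs."""
    return income - costs


def is_good_customer(net_revenue_previous_year, threshold=100_000):
    """Classification rule: earned the company more than $100k in net
    revenue in the previous financial year."""
    return net_revenue_previous_year > threshold


def creditors(transactions):
    """Inference rule: entities with a net negative balance across all
    transactions. Assumes (entity, amount) pairs where a negative amount
    means the company owes the entity."""
    balances = {}
    for entity, amount in transactions:
        balances[entity] = balances.get(entity, 0) + amount
    return sorted(entity for entity, balance in balances.items() if balance < 0)


# Illustrative data only.
transactions = [("Acme", -500), ("Acme", 200), ("Globex", 300), ("Initech", -50)]
owed = creditors(transactions)
```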
Conclusion
• Projects often fail due to having too large a scope
• Critical Data Elements provide a focus
  o Cut the volume
  o Allow in-depth management
  o Biggest business benefit in shortest time
  o Accelerates data and report rationalization
• Criteria will be organization-dependent
• CDEs can be used with Traceability and Lineage to validate semantic integrity of data movements
• CDEs provide business context for definition and implementation of rules
Thank you
Jeff Goins, President, Jeff.goins@adaptive.com
James Cerrato, Chief, Product Evangelist, james.cerrato@adaptive.com