www.informatik-aktuell.de
Wolfgang Epting: Test Data: Hidden Business Opportunity or Inherent Security Risk?
Test Data Management: Testing Matters
Testing is not noticed when it goes well
Testing is 30-40% of the Application Development Cycle
Challenges and Costs:
- Production Defects
- Testing Defects
- Missing Data for Testing
- Missed Deadlines
- Blown Budgets
- Data Breach Risk
- 8-12 Copies of Production
- Old, irrelevant Test Data
Agile Development Needs Test Data Management
The Majority of Application Development Lifecycles are Spent on Development Tasks
Agile Development Customer Challenges
Priority | Challenge | Cause
Delivery Schedules | Delays, unreliable schedules | Lengthy test data provisioning processes
Application Quality | Poor, high error rates | Poor quality test data
Data Security | Sensitive or private information is exposed to test teams, consultants and outsourcers | Production data is often used in test / development
Budget Overruns | Exceeding costs | Resource intensive manual test data processes
Test Data is Not Immune
[Figure: timeline covering 2010 to 2014]
Estimated Cost Examples for TDM
Based on interviews with Informatica partners and SMEs (1bil+ org)
Scenario | Projects | Testers | Iterations | Hours | Burdened Labor | Test Total | TDG Only
Manual | 20 | 50 | 6 | 16 | $150 | $14,400,000 | $1,440,000
Automated | 20 | 50 | 6 | 11.2 | $150 | $10,080,000 | $1,008,000
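The totals on this slide are consistent with multiplying the five inputs together (projects, testers, iterations, hours, hourly burdened labor rate). The following sketch reproduces them. The function name and the 10% test data generation (TDG) share are assumptions: the slide does not state the share, but TDG Only is exactly one tenth of Test Total in both rows.

```python
from decimal import Decimal

def tdm_cost(projects, testers, iterations, hours, rate, tdg_share=Decimal("0.10")):
    """Return (test_total, tdg_only) in dollars.

    tdg_share is an assumption inferred from the slide's figures,
    where TDG Only equals 10% of Test Total in both scenarios.
    """
    # Decimal keeps 11.2 hours exact instead of using binary floating point.
    total = Decimal(projects) * testers * iterations * Decimal(str(hours)) * rate
    return total, total * tdg_share

manual = tdm_cost(20, 50, 6, 16, 150)
automated = tdm_cost(20, 50, 6, "11.2", 150)

# Difference between the manual and automated scenarios.
total_savings = manual[0] - automated[0]
tdg_savings = manual[1] - automated[1]
```

Under these inputs the automated scenario comes out $4,320,000 lower in total testing cost, of which $432,000 falls in the TDG share; the only input that differs between the rows is hours (16 versus 11.2, a 30% reduction).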
Business Solution: Test Data Management (TDM)
IT organizations need a solution that speeds up testing cycles by automatically creating and provisioning test data with high precision, and that does so without introducing risk because sensitive and private information is protected.
Test Data Subset, Test Data Generation: Secure and speed application development times
- Production data is analyzed for sensitive data and then masked so that data privacy is not compromised in test systems.
- TDM eliminates the need for full production copies by allowing testers to create fully functional data subsets at lower cost.
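A subset is only "fully functional" if referential integrity survives the cut: every child row kept must still find its parent. The sketch below illustrates the idea on three small in-memory tables. The function name, table layout, and sample values are illustrative assumptions, not part of any Informatica interface.

```python
def subset_with_integrity(orders, order_lines, products, keep_order_ids):
    """Keep the chosen orders plus every row they depend on."""
    keep = set(keep_order_ids)
    sub_orders = [o for o in orders if o["order_id"] in keep]
    # Child rows: only lines that belong to a kept order.
    sub_lines = [l for l in order_lines if l["order_id"] in keep]
    # Parent rows: only products still referenced by a kept line.
    needed = {l["product_id"] for l in sub_lines}
    sub_products = [p for p in products if p["product_id"] in needed]
    return sub_orders, sub_lines, sub_products

# Illustrative sample data (hypothetical).
products = [
    {"product_id": "P1", "name": "Benz"},
    {"product_id": "P2", "name": "BMW"},
    {"product_id": "P3", "name": "Toyota"},
]
orders = [
    {"order_id": "O1", "status": "Shipped"},
    {"order_id": "O2", "status": "In-Process"},
    {"order_id": "O3", "status": "Shipped"},
    {"order_id": "O4", "status": "Open"},
]
order_lines = [
    {"line_id": "OL1", "order_id": "O1", "product_id": "P1"},
    {"line_id": "OL2", "order_id": "O2", "product_id": "P2"},
    {"line_id": "OL3", "order_id": "O3", "product_id": "P3"},
    {"line_id": "OL4", "order_id": "O4", "product_id": "P1"},
]

sub_orders, sub_lines, sub_products = subset_with_integrity(
    orders, order_lines, products, {"O1", "O4"}
)
```

Starting from the driving table (orders here) and walking the foreign keys in both directions is what keeps the subset small while leaving no dangling references.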
Informatica Secure Testing Solution Architecture
[Architecture diagram]
- Sources: Production, Custom Apps
- Discover: Relationships, Keys, Sensitive Data
- Informatica Test Data Management: Informatica Data Subset, Informatica Persistent Data Masking, Informatica Test Data Generation (Synthetic Test Data), Informatica Test Data Warehouse (Library of Test Data Sets)
- Self-Service Test Data Provisioning, Informatica Test Tool Integration
- Targets: Dev, Test, Train
Purpose Built Solution Maximizes Productivity
Role Specific Tools, Task Specific Interfaces
[Diagram: Data Governance cycle]
- Define: Compliance Officers, Business Analysts
- Discover: Data Analysts, Auditors
- Apply: Application Owners & Administrators
- Measure and Monitor: Compliance Officers, Application Administrators
Define Enterprise Masking Policies
Define Sensitive Data & Remediation Plan
[Diagram: Data Governance cycle (Define, Discover, Apply, Measure and Monitor)]
Roles: Compliance, Privacy and Security Officers; Business Analysts
- Standardize policies across the enterprise with predefined packs for PII, PCI, and PHI
- Accelerate deployments with standard data domains, element definitions and preferred masking rules
Discover Sensitive Data and Table Relationships
[Diagram: Data Governance cycle (Define, Discover, Apply, Measure and Monitor)]
Roles: Data Analysts, Architects, Auditors
- Assess exposure by thoroughly identifying all sensitive data
- Improve user productivity with automated discovery: predefined patterns, data domains, Natural Language Processing, etc.
- Auto-learned Data Relationships and Model
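Pattern-based discovery, one of the automated techniques named on this slide, can be illustrated with a small column classifier. This is a minimal sketch and not Informatica's implementation: the pattern set, the 80% threshold, and all function names are assumptions. Credit card candidates are additionally verified with the Luhn checksum to reduce false positives.

```python
import re

# Hypothetical data domains; real tools ship far larger pattern libraries.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "credit_card": re.compile(r"^(?:\d[ -]?){13,19}$"),
}

def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits of `number` (separators ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify_column(values, threshold=0.8):
    """Return the first domain matching at least `threshold` of the values, else None."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for domain, pattern in PATTERNS.items():
        hits = sum(
            1
            for v in non_empty
            if pattern.match(v) and (domain != "credit_card" or luhn_ok(v))
        )
        if hits / len(non_empty) >= threshold:
            return domain
    return None
```

Sampling a column and requiring a match ratio instead of a single hit keeps free-text columns that happen to contain one card-like number from being flagged wholesale.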
Informatica Persistent Data Masking
Protect Sensitive Information in Nonproduction
Permanently alter sensitive data such as credit cards, address information, or names
Variety of Techniques:
- Shuffle Employee IDs
- Substitute Names
- Constant for City
- Special Credit Card Technique
Example table (columns: ID, Name, City, Credit Card): 0964 John Smith Plano 4417 1234 9741 5678 1949 9112 9471 9388 2586 7310 Mark Jones Rob Davis Jeff Richards Modesto Hartford Tampa 4981 4078 1341 9149 0854 1491 0508 4298 0149 9341 0134 9544 0148 9114 4198 9148 9481 1499 9147 1341 0521
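The four techniques named on this slide can be sketched in a few lines. Everything here is an illustrative assumption: the function names, the name dictionary, the constant city, and in particular the credit card routine, which simply replaces all digits while preserving grouping and ending in a valid Luhn check digit. The slide does not say what the "special" technique actually does.

```python
import random

def luhn_check_digit(partial: str) -> str:
    """Digit that makes partial + digit pass the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:          # these positions are doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def mask_card(card: str, rng) -> str:
    """Replace every digit, keep separators and length, keep Luhn validity."""
    n = sum(c.isdigit() for c in card)
    body = "".join(str(rng.randrange(10)) for _ in range(n - 1))
    new_digits = iter(body + luhn_check_digit(body))
    return "".join(next(new_digits) if c.isdigit() else c for c in card)

def mask_rows(rows, name_pool, seed=0):
    rng = random.Random(seed)   # seeded so runs are repeatable
    ids = [r["id"] for r in rows]
    rng.shuffle(ids)            # shuffle: same set of IDs, reassigned across rows
    return [
        {
            "id": new_id,
            "name": rng.choice(name_pool),        # substitute from a dictionary
            "city": "Anytown",                    # constant
            "card": mask_card(r["card"], rng),    # format-preserving replacement
        }
        for r, new_id in zip(rows, ids)
    ]

# First row taken from the slide; the other rows are hypothetical.
rows = [
    {"id": "0964", "name": "John Smith", "city": "Plano", "card": "4417 1234 9741 5678"},
    {"id": "1001", "name": "Jane Roe", "city": "Austin", "card": "4111 1111 1111 1111"},
    {"id": "1002", "name": "Max Muster", "city": "Dallas", "card": "5500 0000 0000 0004"},
]
NAME_POOL = ["Mark Jones", "Rob Davis", "Jeff Richards"]
masked = mask_rows(rows, NAME_POOL, seed=42)
```

Note that a plain shuffle can leave an individual ID in its original row by chance, and random substitution can repeat names; whether that is acceptable depends on the masking policy.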
Data Masking Cascade
Masked values cascade to all related tables and fields
1. Process main table and create cross-reference
2. Cascade changes to child tables
3. Cascade changes to related tables
4. Cascade changes to cluster database tables (mainly HR)
Example tables:
PA0002 (personal data), columns PERNR, NACHN, VORNA: 1221 Smith Jeff; 222 Jones Mike; 3223 Washington Tina; 4224 Jenkins Janet
PA0003 (payroll status), columns PERNR, SUBTY, OBJPS: 1221-30; 222-31; 3223-32; 4224-33
BSEG (Accounting Line Item), columns BUKRS, BELNR, PERNR: RU 101 221 1; RU 102 222; RU 103 223 3; RU 104 224 4
PCL2 (HR Cluster 2), columns RELID, SRTFD, CLUSTD: RU 0000000500001 3611; RU 0000000500002 3245; RU 0000000500003 3176; RU 0000000500004 3594
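The cascade steps can be sketched with a cross-reference dictionary that is built once on the main table and then reused for every dependent table. The table and column names follow the slide; the personnel numbers, masked values, and function names are hypothetical.

```python
def build_xref(main_rows, key, new_keys):
    """Step 1: map each original key in the main table to its masked value."""
    return {row[key]: new for row, new in zip(main_rows, new_keys)}

def cascade(rows, key, xref):
    """Steps 2-3: rewrite the key column of a table using the cross-reference.

    Keys missing from the cross-reference are left unchanged.
    """
    return [{**row, key: xref.get(row[key], row[key])} for row in rows]

# Hypothetical sample data using the slide's table and column names.
pa0002 = [  # main table (personal data)
    {"PERNR": "221", "NACHN": "Smith"},
    {"PERNR": "222", "NACHN": "Jones"},
]
pa0003 = [  # child table (payroll status)
    {"PERNR": "221", "SUBTY": "30"},
    {"PERNR": "222", "SUBTY": "31"},
]
bseg = [  # related table (accounting line items)
    {"BUKRS": "RU", "BELNR": "101", "PERNR": "221"},
    {"BUKRS": "RU", "BELNR": "102", "PERNR": "222"},
]

xref = build_xref(pa0002, "PERNR", ["900001", "900002"])
masked_pa0002 = cascade(pa0002, "PERNR", xref)
masked_pa0003 = cascade(pa0003, "PERNR", xref)
masked_bseg = cascade(bseg, "PERNR", xref)
```

Because every table is rewritten from the same mapping, joins on PERNR still line up after masking. Cluster tables (step 4) typically embed the key inside a longer string field, so they would need a substring rewrite instead of a plain column lookup; that case is not covered in this sketch.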
Test Data Management: Execute Masking and Subset Jobs
Role: Application Administrator
[Diagram: Original Source to Masked and Subsetted Target]
- Create Environments as Needed
- Universal Connectivity
Audit Data Masking Results
- Set up independent masking validation rules
- Complete the audit process by proving that sensitive values have changed
- Ensure that formats are preserved
- Validate that data comes from a dictionary of values
- Validate that no original values exist in the masked database
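The validation rules on this slide translate naturally into independent checks over an original column and its masked counterpart. The sketch below is one possible shape for such a check; the function name, rule names, and parameters are assumptions.

```python
import re

def audit_masking(original, masked, dictionary=None, fmt=None):
    """Return {rule name: passed} for one column.

    original, masked: lists of values in the same row order.
    dictionary: optional set of allowed substitute values.
    fmt: optional regular expression every masked value must fully match.
    """
    results = {
        # Every row's value differs from its original.
        "values_changed": all(o != m for o, m in zip(original, masked)),
        # No original value appears anywhere in the masked column.
        "no_original_values_remain": set(original).isdisjoint(masked),
    }
    if fmt is not None:
        results["format_preserved"] = all(re.fullmatch(fmt, m) for m in masked)
    if dictionary is not None:
        results["from_dictionary"] = all(m in dictionary for m in masked)
    return results

card_report = audit_masking(
    ["4417 1234 9741 5678"],
    ["4981 4078 1341 9149"],
    fmt=r"\d{4} \d{4} \d{4} \d{4}",
)
name_report = audit_masking(
    ["John Smith", "Mark Jones"],
    ["Mark Jones", "Rob Davis"],
    dictionary={"Mark Jones", "Rob Davis"},
)
```

The second call shows why the rules are kept separate: every row changed and every value comes from the dictionary, yet an original name ("Mark Jones") still appears in the masked column, so the last rule fails.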
Test Data Generation: Getting Good Data to Test
- New Functionality: New tables related to the functionality have no data in production. Data needs to be generated and related to existing PROD data.
- No Access to PROD: Access to production is limited by IT policies.
- PROD Data not Representative: Existing capabilities rolled out to new markets. Data specific to new markets needs to be generated and related to existing PROD data.
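"Generated and related to existing PROD data" means the synthetic rows must carry foreign keys that already exist. A minimal sketch, assuming a hypothetical new table of loyalty accounts that references existing customer IDs; all names and value ranges are invented for illustration.

```python
import random

def generate_related_rows(parent_keys, n, seed=0):
    """Generate n synthetic rows whose foreign key points at an existing parent."""
    rng = random.Random(seed)   # seeded so the same data set can be regenerated
    tiers = ["bronze", "silver", "gold"]
    return [
        {
            "account_id": f"LA{i:04d}",                # new, unique primary key
            "customer_id": rng.choice(parent_keys),    # existing (masked) key
            "tier": rng.choice(tiers),
            "points": rng.randrange(0, 10_000),
        }
        for i in range(1, n + 1)
    ]

existing_customer_ids = ["C001", "C002", "C003"]
accounts = generate_related_rows(existing_customer_ids, n=5, seed=7)
```

Seeding the generator makes a data set reproducible, which matters when a failing test has to be rerun against identical data.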
Without Test Data Warehouse
Full Data Set (Masked)
Product_id | Product Name
P1 | Benz
P2 | BMW
P3 | Toyota
P4 | Ford
P5 | GM
P6 | VW
P7 | Audi
Order_ID | Order_Status
O1 | Shipped
O2 | In-Process
O3 | Shipped
O4 | Open
O5 | In-Process
O6 | Cancelled
O99 | Shipped
Order_Line_ID | Order_ID | Product_Id
OL1 | O1 | P1
OL2 | O2 | P2
OL3 | O3 | P3
OL4 | O4 | P1
OL5 | O5 | P7
Test Team One: Identify Data Set, Run Tests, Update/Insert Data, Record Results, Request Database Refresh
- Need to avoid collisions amongst teams
- Provision full database copies per team (virtual or physical)
- Refresh full databases to reset a small amount of data
- No ability to have metadata descriptions attached to test data sets
With Test Data Warehouse
Full Data Set (Masked)
Product_id | Product Name
P1 | Benz
P2 | BMW
P3 | Toyota
P4 | Ford
P5 | GM
P6 | VW
P7 | Audi
Order_ID | Order_Status
O1 | Shipped
O2 | In-Process
O3 | Shipped
O4 | Open
O5 | In-Process
O6 | Cancelled
O99 | Delivered
Order_Line_ID | Order_ID | Product_Id
OL1 | O1 | P1
OL2 | O2 | P2
OL3 | O3 | P3
OL4 | O4 | P1
OL5 | O5 | P7
Test Team One: Identify Data Sets, Run Tests, Update/Insert Data, Record Results, Test Data Reset - weekly
Test Team Two: Identify Version, Run Tests, No Updates, Record Results, Test Data Reset - quarterly
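The workflow on this slide (named data set versions, per-team resets, descriptions attached to data sets) can be modelled with a tiny in-memory store. This is a conceptual sketch only: the class, its methods, and the version name are hypothetical and do not represent the Informatica Test Data Warehouse interface.

```python
import copy

class DataSetStore:
    """Keeps named, described snapshots of test data and hands out private copies."""

    def __init__(self):
        self._versions = {}

    def save(self, name, tables, description=""):
        # Deep copy so later changes by the caller do not alter the snapshot.
        self._versions[name] = {
            "tables": copy.deepcopy(tables),
            "description": description,
        }

    def checkout(self, name):
        # Each team gets its own copy; checking out again acts as a reset.
        return copy.deepcopy(self._versions[name]["tables"])

    def describe(self, name):
        return self._versions[name]["description"]

store = DataSetStore()
store.save(
    "orders-baseline-v1",
    {"orders": {"O1": "Shipped", "O99": "Shipped"}},
    description="Masked order data for regression tests",
)

# Team One updates data during its tests.
team_one = store.checkout("orders-baseline-v1")
team_one["orders"]["O99"] = "Delivered"

# Team Two reads the same version and is unaffected by Team One's update.
team_two = store.checkout("orders-baseline-v1")

# Team One's reset restores only this data set, not a full database.
team_one_after_reset = store.checkout("orders-baseline-v1")
```

The point of the design is that a reset touches one named data set on one team's schedule, in contrast to the full database refresh required without such a store.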
Informatica Secure Testing Platform
[Platform diagram]
- Production: Dynamic Data Masking, Sensitive Data Discovery
- Non-Production (Dev, Test, UAT, Train, Cloud): Persistent Data Masking, Test Data Subset, Test Data Generation, Test Data Warehouse
- Test Data Management with Test Tool Integration (HP ALM)
- Users: Risk & Compliance Officers; Offshore, Outsourced; Testers; Developers; Trainers; DBA & Infrastructure Managers
TDM Factory Design: Importance of Repeatable Processes
Data Masking and Data Subset is Not a Once-and-Done Project
[Diagram: Raw Material to TDM Factory to Finished Goods (Holistic, Timely, Authoritative, Secure Application Data)]
- Create your process
- Define your masking rules
- Define your subset templates
- Test on a subset of the data
- Test to ensure that your processes work as you build them
- Continually improve the process based on the feedback
Data Masking On Hadoop v9.7 Use Cases
(1) Persistent masking during import process: a) For Structured Data b) For Semi-structured Data
(2) Persistent masking of sensitive data in Hadoop: a) For Analytics b) Data Provisioning c) Test Data
(3) Dynamic masking of sensitive data in Hadoop based on user role
(4) Persistent masking during export process: a) For Structured Data b) For Semi-structured Data
Data Masking and Subset for salesforce.com
Secure and Populate Sandbox Copies
- Masks existing SFDC sandbox environments
- Ensures data privacy
- Populate empty sandboxes
- Out of the box data masking rules
- Minimal options for speed of deployment
- Create test data sets for sandboxes
- Rationalize existing SFDC investment
Insurance company complies with GDV Code of Conduct mandate to protect insured sensitive data
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
- Business Imperative: Guarantee Security and Privacy are taken into account in the design and processing of products and services
- IT Initiative: Test Data Management for Secure Test Environments
THE CHALLENGE
- Compliance with German Code of Conduct for PII, PHI and banking information
- Compliance to be achieved by January 2016
- Multiple systems to be protected including SAP and Mainframe integrations
INFORMATICA ADVANTAGE
- Out-of-the-box data masking packs to be applied in both SAP and mainframe environment
- Consistently mask sensitive data across multiple applications
- Ability to handle complexity in data models (SAP ~200,000 tables)
- Connectivity to all required data sources including Oracle, DB2, VSAM and IMS
RESULTS/BENEFITS
- Comply with GDV Code of Conduct 12 months prior to deadline
- Consistently, reliably, and quickly mask sensitive data
- Create a consistent test environment with multiple systems (SAP and Mainframe)
- Go live with complete scope in 9 months
Global Team of 700 Testers Benefit from Instantaneous Quality Test Data
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
- Business Imperative: Enable testing teams across the globe to self-provision test data securely and on demand
- IT Initiative: Enterprise Test Data Management Platform and Center of Excellence
THE CHALLENGE
- Global team of 700+ testers including employees, contractors and consultants need test data faster
- Need to adhere to strict data privacy regulations
- Needed a platform that would work across integrated new and legacy applications
INFORMATICA ADVANTAGE
- Works across all applications
- Integrated data masking
- Ability to handle complexity
- Enabled consistent and repeatable test data sets
RESULTS/BENEFITS
- Test data that was available in 10 days is now available instantaneously
- Saved $2.2M with first application
- Went live in 5 months
Questions