Enterprise Challenges of Test Data Size, Change, Complexity, Disparity, and Privacy


Size, Change, Complexity, Disparity, and Privacy

For simple applications, representative test data can be relatively easy to create. What if you are testing enterprise-scale applications?

- In enterprise data centers, dozens or hundreds of applications co-exist
- These applications are of various sizes, complexities, and criticalities
- They operate on various data repositories, in some cases shared data repositories
- In some cases, disparate data repositories hold related data

Let's examine the test data options, and the challenges that arise, at the enterprise scale

Copyright (c) 2016 RBCS

Options for Test Data

- Hand generated: Tester creates all test data by hand
- Tool generated: Generic (e.g., Excel) or specific tools, open-source or commercial, used to generate test data according to tester-specified pattern
- Production: All or a subset of production data copied to test environment
- Anonymized production: Tool(s) used to perform irreversible encryption, substitution, or scrambling of values
- Pseudo-anonymized production: Reversible anonymization

In this presentation, I'll address factors to consider when choosing these options

- At the enterprise level, you'll probably need one or more tools
- More than one option might be useful when creating test data

Whatever options are chosen, be really careful about:

- Testing on actual production data in the production environment
- Copying test data into the production environment
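To make the "tool generated" option concrete, here is a minimal sketch of generating records from a tester-specified pattern. The field names, value pools, and the `generate_customers` function are all hypothetical, chosen only for illustration. A fixed seed is assumed so that the same data set can be regenerated for a later test run.

```python
import random
import string


def generate_customers(count, seed=0):
    """Generate `count` synthetic customer records from a simple pattern."""
    rng = random.Random(seed)  # fixed seed makes the data set reproducible
    # Hypothetical value pools; a real tool would use much larger ones.
    first_names = ["Alice", "Bob", "Carol", "Dave"]
    last_names = ["Smith", "Jones", "Lee", "Garcia"]
    records = []
    for i in range(count):
        records.append({
            "customer_id": f"C{i:06d}",  # sequential, unique key
            "name": f"{rng.choice(first_names)} {rng.choice(last_names)}",
            "postal_code": "".join(rng.choice(string.digits) for _ in range(5)),
            "balance": round(rng.uniform(0, 10_000), 2),
        })
    return records


for record in generate_customers(3):
    print(record)
```

Data produced this way is uniform by construction, so it will not reproduce the skewed value distributions or accumulated oddities of real production data.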

Test Data Is Simple When Life Is Like

- So, if you are testing a simple mobile app, you might be thinking, "Test data, what's the big deal?"
- If the production data is small enough, you can generate data to populate the database
- Or maybe there's no personal information in the production data, so you just use that
- In some cases, your testing situation might be simple, like the one shown here, but eventually it could evolve as your application's abilities grow

But the Enterprise Be Like

- Actually, it's usually more complicated than this

Production Data, the Testing Gold Standard

- For most large-scale systems, testers cannot create sufficient and sufficiently diverse test data by hand
- Data generation tools can create large datasets, but the data might lack the diversity, value distributions, or other attributes of production data
- Production data is ideal, especially when large data sets have accumulated over years with various current and past application versions
- Production data looks exactly like the data the application will encounter on release, since it is that data
- But using production data for testing poses many challenges

Privacy Concerns: Fly Meets Ointment

- Production data often contains personal data
- Trying to secure test data handling during testing can impose undesirable inefficiencies and constraints
- Such inefficiencies and constraints apply especially when using distributed and outsourced testing
- Enter the anonymization tool
- However, such tools are not magic
- Misuse can result in trivially easy reversal
  - For example, letter substitution could lead to names like Kpio Cspxo, obviously John Brown
  - In addition, Kpio Cspxo (and other nonsense) is not realistic
- For testing, you must preserve data meaning and meaningfulness
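The letter-substitution example can be reproduced in a few lines. The sketch below assumes the substitution is a shift of one place in the alphabet, which is what turns John Brown into Kpio Cspxo. The `shift_letters` function is a hypothetical helper, not part of any anonymization tool.

```python
def shift_letters(text, shift):
    """Replace each ASCII letter with the one `shift` places later in the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(result)


masked = shift_letters("John Brown", 1)
print(masked)                     # Kpio Cspxo
print(shift_letters(masked, -1))  # John Brown
```

Because there are only 25 possible non-trivial shifts, anyone with the masked data can try them all and read off the original names.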

Meaning and Meaningfulness

- Substitutions, good and bad
  - Good: John Brown to Lester Camden
  - Bad: John Brown to Charlotte Dostoyevsky (esp. if we have gender field)
- Queries, views, and joins of anonymized data must yield same results as on production data
  - For example, if a production query on John Brown returned 20 records, a test data query for Lester Camden must also return 20 records
- Problems here will affect functionality, performance, reliability, and load testing
- In addition, localization can be an issue with different languages and cultures, especially if multiple character sets can apply
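One way to satisfy both points above is a one-to-one substitution table that draws replacements from a pool matching the gender field. This is a minimal sketch under stated assumptions: the replacement pools, field names, and the functions `build_name_map`, `anonymize`, and `query_count` are hypothetical, and a real tool would need far larger pools and would have to guard the mapping table itself.

```python
import random

# Hypothetical replacement pools, keyed by the value of the gender field.
REPLACEMENTS = {
    "M": ["Lester Camden", "Victor Hale", "Owen Marsh"],
    "F": ["Nora Whitfield", "Elena Park", "Ruth Amsel"],
}


def build_name_map(records, seed=0):
    """Assign each distinct (name, gender) pair its own replacement name."""
    rng = random.Random(seed)
    # Shuffled copies of the pools; each replacement is handed out once,
    # so two different people never collapse into one test identity.
    pools = {g: rng.sample(names, len(names)) for g, names in REPLACEMENTS.items()}
    name_map = {}
    for rec in records:
        key = (rec["name"], rec["gender"])
        if key not in name_map:
            name_map[key] = pools[rec["gender"]].pop()  # IndexError if pool runs out
    return name_map


def anonymize(records, name_map):
    """Return copies of the records with the name field substituted."""
    return [{**rec, "name": name_map[(rec["name"], rec["gender"])]} for rec in records]


def query_count(records, name):
    """Stand-in for a production query that filters on name."""
    return sum(1 for rec in records if rec["name"] == name)


production = (
    [{"name": "John Brown", "gender": "M"}] * 3
    + [{"name": "Mary Jones", "gender": "F"}] * 2
)
name_map = build_name_map(production)
test_data = anonymize(production, name_map)
new_name = name_map[("John Brown", "M")]
print(query_count(production, "John Brown"), query_count(test_data, new_name))
```

Because the mapping is one-to-one, a query on the replacement name returns the same number of records as the production query on the original name.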

Oh, Those Long-Distance Relationships

- Data and relationships between data can span databases
- For example:
  - Three applications have years of data describing the same population, spread across three different databases
  - The applications interoperate and share data
  - Data-warehousing and analytical applications access the related data across databases
  - Logical records are created with de facto joins across databases via de facto foreign keys
- Anonymization can break these integrity constraints
- Meaningful end-to-end testing of functionality, performance, throughput, reliability, localization, and security becomes impossible
- For many of our clients, the usefulness of the test data for interoperability testing poses the hardest challenge
- This is true whether the test data is hand generated, tool generated, production, pseudo-anonymized, or anonymized
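A common way to keep de facto foreign keys intact is to mask the key deterministically, with the same function and the same secret applied in every database. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The key value, table contents, and the functions `pseudonymize_key`, `mask`, and `join_count` are hypothetical, and this is an illustration of the idea rather than a description of any particular tool.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept out of test environments.
SECRET_KEY = b"example-key-kept-outside-test-environments"


def pseudonymize_key(value):
    """Map a key to a stable token; the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; collisions remain very unlikely


def mask(rows):
    """Return copies of the rows with customer_id replaced by its token."""
    return [{**row, "customer_id": pseudonymize_key(row["customer_id"])} for row in rows]


def join_count(left, right):
    """Count row pairs that match on customer_id, like an inner join."""
    return sum(1 for l in left for r in right if l["customer_id"] == r["customer_id"])


# Two separate databases that describe the same population.
billing = [{"customer_id": "10042", "balance": 250.0},
           {"customer_id": "10043", "balance": 0.0}]
support = [{"customer_id": "10042", "ticket": "T-1"},
           {"customer_id": "10042", "ticket": "T-2"},
           {"customer_id": "10099", "ticket": "T-3"}]

print(join_count(billing, support))              # join on production keys
print(join_count(mask(billing), mask(support)))  # same join on masked keys
```

If each database were masked independently, with different keys or with random values, the two counts would no longer agree and cross-database tests would lose their meaning.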

Generic Test Data Requirements

- Keep existing data quality levels
  - Most production data contains a large number (~25%) of errors
  - Error rates must be the same for test data, but, if privacy is an issue, must not enable reverse engineering of the data
- Ensure maintainability
  - Be able to edit, add, update, and delete data
  - This includes individual fields and records, whole database, and across logical records spanning databases
- Maximize efficiency of the test data acquisition and update processes
  - If production data is to be copied and/or anonymized, ensure those operations occur on quiescent data
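The data quality requirement can be checked mechanically by measuring the error rate before and after masking. The following is a minimal sketch assuming a single, hypothetical validity rule (a five-digit postal code) and a hypothetical masking function, `mask_preserving_shape`, that swaps each character for a random one of the same class so that malformed values stay malformed.

```python
import random
import re
import string

POSTAL_CODE = re.compile(r"\d{5}")  # hypothetical validity rule


def error_rate(records):
    """Fraction of records whose postal code fails the validity rule."""
    if not records:
        return 0.0
    bad = sum(1 for rec in records if not POSTAL_CODE.fullmatch(rec["postal_code"]))
    return bad / len(records)


def mask_preserving_shape(value, rng):
    """Replace digits with random digits and letters with random letters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


production = [
    {"postal_code": "78163"},
    {"postal_code": "7816"},   # too short: a data entry error
    {"postal_code": "78I63"},  # letter typed instead of a digit
    {"postal_code": "10001"},
]
rng = random.Random(0)
test_data = [{"postal_code": mask_preserving_shape(r["postal_code"], rng)}
             for r in production]

print(error_rate(production), error_rate(test_data))
```

A check like this covers only the rules it encodes; errors that depend on meaning, such as a valid postal code in the wrong city, need their own rules.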

Test Database Server Costs

- Yet another challenge to production-scale test data is cost
- The test data must reside on test database servers, which can involve:
  - DBMS licensing costs
  - Server costs
  - Disk space costs
  - Ongoing hardware and software maintenance
  - A regular backup process (with attendant costs for backup media as well as labor)
- Using an open-source DBMS addresses only the first of these costs

Storage Virtualization

- This allows abstraction of physical storage space
- It can reduce some of the cost issues associated with test data storage
- It's probably under-utilized by testing groups, as I hear very little about it online, from clients, or at conferences
- It can help in terms of allocating/de-allocating production-like storage for testing
- The issue of anonymization of the production data still exists, though

Coping with Data Size

- Enterprise data sets have gotten huge, due to increases in disk capacity
- Unfortunately, disk throughput has not kept up
- This creates serious issues with copying, anonymizing, and managing test data that comes near to or achieves production size

Figure from tylermuth.wordpress.com
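The gap between capacity and throughput is easy to quantify: copy time is data size divided by sustained throughput. The sketch below uses made-up figures (10 TB at 200 MB/s), not measurements from the presentation, and the `copy_hours` function is a hypothetical helper.

```python
def copy_hours(size_tb, throughput_mb_per_s):
    """Hours needed to copy `size_tb` terabytes at a sustained throughput."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_per_s / 3600


# Assumed example figures only.
print(f"{copy_hours(10, 200):.1f} hours")
```

Doubling the data size doubles the copy time unless throughput grows with it, and anonymization adds a further pass over the same data.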

Specialized Application Support

- Many enterprises use tools such as SAP to help manage their businesses
- There are test data anonymization tools for the more common such enterprise applications
- If you are using a more obscure application, no such tool may be available
- Similar issues exist with NoSQL and other data formats
- Further, the application vendor might not be willing to share metadata information
- This should be a consideration during enterprise application selection
- However, testers are rarely included in such decisions, and once made, such decisions often don't change for years

Success with Test Data Tools

- Success with test data tools is achieved the same way as with any other test tool
  - Assemble a team
  - Elicit requirements from users and stakeholders (including those who understand privacy/security issues)
  - Identify risks and constraints
  - Determine tool options (open-source and commercial)
  - Evaluate tools to identify short-list options
  - Select the tool
  - Pilot the tool
  - Deploy the tool
- Short-cutting this process or imposing unrealistic budget targets can cause significant inefficiencies and possibly even inability to fulfill basic needs

Conclusions

- Testing enterprise applications poses challenges beyond what small-scale applications pose
- This is especially true for test data
- Challenges include sources of data, privacy, availability of tools, size, cost, and testing usefulness
- Like any complex situation, success requires understanding the challenges and planning to meet them
- This presentation should help you identify your specific challenges and build an appropriate plan

Contact RBCS

For over twenty years, RBCS has delivered software and hardware testing services, including consulting, outsourcing, and training. Employing the industry's most experienced and recognized consultants, RBCS conducts product testing, builds and improves testing groups, and hires testing staff for hundreds of clients worldwide. Ranging from Fortune 20 companies to start-ups, RBCS clients save time and money through improved product development, decreased tech support calls, improved corporate reputation and more. To learn more about RBCS, visit.

Address: RBCS, Inc. 31520 Beck Road Bulverde, TX 78163-3911 USA
Phone: +1 (830) 438-4830
E-mail: info@rbcs-us.com
Web:
Facebook: RBCS, Inc
Twitter: @RBCS, @LaikaTestDog