Practical Design of Experiments: Considerations for Iterative Developmental Testing

Best Practice
Authored by: Michael Harman
29 January 2018

The goal of the STAT COE is to assist in developing rigorous, defensible test strategies to more effectively quantify and characterize system performance and provide information that reduces risk. This and other COE products are available at www.afit.edu/stat.

STAT Center of Excellence
2950 Hobson Way
Wright-Patterson AFB, OH 45433

Table of Contents

Executive Summary
Introduction
Critical Analytical Goals
    Keep the Regression Model Current
    Routinely Check Model Efficacy
Considerations for Iterative Testing
    Questions for Iterative Re-Screening
    Screening Design Considerations
    Questions for Iterative Design Augmentation
    Design Augmentation Considerations
    A Comment on Violation of DOE Fundamentals
    Questions About Model Validation
    Use of Validation Points
Reporting Test Results
Conclusions
References

Executive Summary

Developmental testing (DT) in the Department of Defense (DOD) is an iterative process by which a system is evaluated and improved before operational testing (OT) and fielding. Employing design of experiments (DOE) in DT results in a unique iterative process. Additional considerations for test designs and regression modeling are needed for evolving configurations that purposefully change performance. Setting analytical goals helps determine how the team will deal with re-testing. DOE provides tremendous information that must be reported clearly and precisely to support the systems engineering process effectively. This paper provides practical considerations for these iterative processes.

Keywords: sequential experimentation, defense testing, regression modeling, iterative testing

Introduction

Design of experiments (DOE) is a tool to facilitate rigorous testing and provide high quality analysis. Developmental testing (DT) in DOD is an iterative process by which a system is evaluated and improved before operational testing (OT) and fielding. Employing DOE in DT requires a unique iterative process when the tested system configuration is changed. The considerations presented here primarily relate to configuration changes implemented to correct performance shortfalls. This paper provides practical details and offers several questions to ask in order to execute this iterative process effectively. Smaller changes (certainly a subjective term) may not require the rigor detailed below; some options for handling them are touched on but are not the focus. This paper assumes the reader already has an understanding of DOE and sequential testing. For detailed DOE texts, see Box, Hunter, and Hunter (2005) or Montgomery (2017).

Critical Analytical Goals

Setting analytical goals helps determine how the team will test successive configurations. Analytical goals may vary and should be specific to each test program. Several broadly applicable goals are discussed below.

Keep the Regression Model Current

A good regression model should be able to describe performance at every point inside the factor space; however, it is only relevant to the latest configuration that underwent testing. Keeping the model current with the latest configuration allows the team to use it later for engineering, mission planning, or tactical development needs. Therefore, executing only screening designs after a configuration change may not be sufficient to generate a full regression model (unless main effects are the only significant terms). Note that for small changes late in testing, it may suffice to run only validation points with a new configuration to determine if the previous configuration's model still applies. This should not be the primary method in early testing (especially when many changes are being incorporated) because it can introduce some subjectivity that the sequential screening/augmentation process strives to avoid.

Routinely Check Model Efficacy

Using validation points enables the team to assess the accuracy of the model at points inside the factor space not already tested. Risk areas, or other regions of interest (including a particular factor combination that is meaningful to subject matter experts), may not be adequately covered by the design points alone. The addition of validation points facilitates the evaluation of requirements, risk, and corrections at locations of the team's choosing and also bolsters confidence in the accuracy of the model.

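Both goals center on an empirical regression model that is fit from the designed test points and can then be exercised anywhere in the factor space. As a minimal sketch of that idea (not taken from the paper), the fragment below fits a main-effects-plus-interaction model to notional data from a two-level design in coded units and predicts the response at an untested factor combination; the factor names, data values, and model form are illustrative assumptions.

```python
import numpy as np

# Notional 2^3 full factorial in coded units (-1/+1) for factors A, B, C,
# with a notional measured response from the latest configuration.
X_factors = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])
y = np.array([12.1, 14.0, 11.8, 17.9, 12.4, 14.2, 11.5, 18.3])  # illustrative data

def model_matrix(X):
    """Intercept, main effects A, B, C, and the AB two-factor interaction."""
    A, B, C = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([np.ones(len(X)), A, B, C, A * B])

# Least-squares fit of the regression model for the current configuration.
beta, *_ = np.linalg.lstsq(model_matrix(X_factors), y, rcond=None)

# The fitted model can now be exercised at any point inside the factor space,
# e.g., an untested interior point in coded units.
x_new = np.array([[0.5, -0.25, 1.0]])
prediction = (model_matrix(x_new) @ beta)[0]
print("coefficients:", np.round(beta, 3))
print("predicted response at", x_new[0], "=", round(prediction, 2))
```
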
Considerations for Iterative Testing

DT is iterative by nature. A system configuration is evaluated and then modified to address deficiencies, and the process repeats until the requirements are met. Sequential DOE (screening, augmentation, and validation) still applies to testing each configuration, but evolving configurations that purposefully change performance require additional considerations. The first screening test is meant to identify significant factors. Augmenting that design to address higher order terms informs the evaluation against requirements. Finally, validation provides information about how well the regression model performs throughout the factor space. This process will need to be repeated with each new configuration, but what other considerations should be addressed?

Questions for Iterative Re-Screening

The initial screening will identify significant factors, but how are these to be viewed after the system has been changed?
- Should current configuration screening be limited to factors significant in the last test?
- What if the change alters significant factors and others become significant?
- Are we looking to confirm or refute expectations from the latest configuration updates?
- What is the resource cost to re-screen all factors versus a reduced list?
- What is the information (decision) risk of missing significant factors?

Screening Design Considerations

Factor significance will be evaluated after the initial screening design, and then decisions are made that impact the content of the subsequent augmented design. Because the configuration has changed, there is some probability the significant factors will change. However, complete re-testing consumes resources. While the team must address specifics for their system under test, here are some suggestions.

- Re-screen all factors every time. Since screening designs are very efficient and the data points become part of the augmented design, these are efficiently used resources. If a new significant factor emerges after a configuration change, then this screening will be well worth its cost. Different significant factors between configurations indicate system performance has changed. This new information can provide a rough indication of whether the desired changes were successfully realized.
- The choice of screening design type is important. DOE software can easily generate custom computer-generated screening designs built by optimizing only one criterion, but classical fractional factorial designs have many beneficial design properties as well. Fractional factorials can be used to screen all two-factor interactions (2FI) without the manual labor that custom designs require in some software suites. Be sure to choose at minimum a resolution IV design so that main effects (ME) are not aliased with 2FI. There will be some aliasing between 2FI and other 2FI but, if it is minimal, the results for ME significance should help discern which 2FI are truly present (for more on fractional factorial designs see Montgomery, Chapter 8). A small construction sketch follows this list.
- Include as many 2FI as possible with every screening iteration. As testing proceeds through many configurations, you may have more information that indicates certain 2FI will be present or are of particular interest. If so, be sure to include those with minimal aliasing.
- Evaluate the root mean square error (RMSE) and recalculate the signal-to-noise ratio (SNR) after each test. Designs created using an accurate SNR will be correctly sized for model term power and will correctly allocate resources. This idea also applies to the augmented designs described below.

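As a minimal sketch of the resolution IV idea above (illustrative, not from the paper), the fragment below builds a 2^(4-1) fractional factorial in coded units using the common generator D = ABC and then checks column correlations to confirm that main effects are clear of one another and of the 2FI; the factor labels and generator choice are assumptions for the example.

```python
import numpy as np
from itertools import product

# 2^(4-1) resolution IV fractional factorial in coded units:
# full factorial in A, B, C with the generator D = A*B*C (defining relation I = ABCD).
base = np.array(list(product((-1, 1), repeat=3)))                        # columns A, B, C
design = np.column_stack([base, base[:, 0] * base[:, 1] * base[:, 2]])   # add D = ABC
labels = ["A", "B", "C", "D"]

# Build main-effect and two-factor-interaction columns and inspect aliasing.
cols, names = [], []
for i, label in enumerate(labels):
    cols.append(design[:, i])
    names.append(label)
for i in range(4):
    for j in range(i + 1, 4):
        cols.append(design[:, i] * design[:, j])
        names.append(labels[i] + labels[j])

M = np.column_stack(cols)
corr = (M.T @ M) / len(design)   # +/-1 where columns are fully aliased, 0 where clear

print("runs:", len(design))
for a in range(len(names)):
    for b in range(a + 1, len(names)):
        if abs(corr[a, b]) == 1:
            print(f"aliased: {names[a]} = {names[b]}")
# Expected pattern for resolution IV: ME are clear of ME and 2FI;
# 2FI are aliased in pairs (AB = CD, AC = BD, AD = BC).
```
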
Questions for Iterative Design Augmentation

Similar to screening designs, questions are raised regarding the augmentation of the screening designs:
- Should previously significant factors be considered if they are not significant in the current screening?
- Should all potential 2FI be added in every iteration?
- Can some terms be objectively eliminated based on previous results? Including all 2FI may require a large number of test points.

Design Augmentation Considerations

Given what was learned through screening, you will augment the design to include only the significant factors and the desired higher order terms. Factor significance will likely differ for every response. If this is the case, there are two options:
- Augment the design using all the significant factors from all responses
  o Allows the use of a single design
  o Increases the probability that the design is significantly larger than necessary for any single response
- Create different augmented designs for each response
  o Results in multiple designs
  o Smaller, focused designs may be less resource costly than a single larger design

Augmentation also builds subsequent designs based on the information derived from screening. If a previously significant factor does not surface in the current round of screening, it simply means the impact of that factor was not observed in the data. Augment the design with and without this additional factor and determine the resource cost to include it. With a large number of factors, adding one more may not increase the design size enough to justify leaving it out if it is of particular interest. Compare both design sizes to make an objective resource determination.

Including all 2FI may drive the design size up significantly. However, few 2FI may be truly present in the system and/or few may be of interest.
- If 2FI were NOT screened originally, then include any associated with the significant main effects.
  o If resources do not allow this, then include all that are potentially relevant based on the physics of the system.
  o There is a risk that a significant 2FI might be missed if all are not included.
- If 2FI were screened, then (obviously) include the significant ones in the model.
  o If some are aliased with other 2FI, use the significant main effects to objectively weed out the insignificant 2FI, or include all that might be significant. One can also augment/fold the design strategically to break the alias chains; a small fold-over sketch follows this list.

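One way to break the alias chains mentioned in the last sub-bullet is a single-factor fold-over: re-run the fraction with the sign of one factor reversed, which yields the complementary half-fraction so the 2FI pairs become separately estimable. The sketch below is an illustration under the same assumed 2^(4-1) design with D = ABC (not an excerpt from the paper); it folds on factor A and verifies that AB and CD are no longer confounded.

```python
import numpy as np
from itertools import product

# Original 2^(4-1) half-fraction in coded units with generator D = ABC.
base = np.array(list(product((-1, 1), repeat=3)))
half = np.column_stack([base, base[:, 0] * base[:, 1] * base[:, 2]])

# Single-factor fold-over: repeat the runs with the sign of factor A reversed.
# This produces the complementary half-fraction; the 16 combined runs form the full 2^4.
fold = half.copy()
fold[:, 0] *= -1
combined = np.vstack([half, fold])

def correlation(design, cols_a, cols_b):
    """Correlation between the product columns defined by two interactions."""
    a = np.prod(design[:, cols_a], axis=1)
    b = np.prod(design[:, cols_b], axis=1)
    return float(a @ b) / len(design)

# In the half-fraction AB and CD are fully aliased (correlation +1);
# after the fold-over they are orthogonal (correlation 0) and separately estimable.
print("half-fraction   AB vs CD:", correlation(half, [0, 1], [2, 3]))
print("after fold-over AB vs CD:", correlation(combined, [0, 1], [2, 3]))
```
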
A Comment on Violation of DOE Fundamentals

The transition into actual testing typically uncovers execution details that were not previously well understood. Issues with randomization, factor level selection or setting, equipment or measurement limitations, noise or other nuisance factors, and myriad others are common. These issues may impact the ability of the test design to effectively isolate and estimate significant factors, and they need to be recorded for consultation during analysis. This information may also demonstrate a need to restructure the designs to address an inadequate SNR or to correct for lack of randomization so the analytical goals are met. The critical point is to analyze the data as the design was actually executed, not simply how it was planned.

Questions About Model Validation

- How much validation is required at each step in the iterative evaluation process?
- How will points be selected (quantity and location)?
- Should the points focus on the entire space or address identified risk areas?
- What are the resource costs and impacts?

Use of Validation Points

The number and location of validation points are not defined by any hard rules (see Kutner (2004) or NIST (2017) for discussion). The first time through, you may want to choose them randomly throughout the space to evaluate model accuracy in a broad sense. As testing proceeds, the validity of the model may be more important in areas where higher risk is identified or where prediction variance is higher, and less so in regions with better performance margin. Furthermore, risk regions addressed by the new configuration may lie between design points, so the only way to judge actual performance is to add points into that region. Lastly, validation points may uncover regions of interest where the model performs poorly. This may help the team focus efforts to add factors or model terms to improve the model, or to fit higher order terms if lack of fit remains significant.

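As a minimal sketch of this practice (the factor ranges, the fitted model, and the "observed" values below are notional assumptions), the fragment draws random validation points across the coded factor space, compares model predictions against the values measured at those points, and reports the validation RMSE along with the points showing the largest errors so the team can see where the model performs poorly.

```python
import numpy as np

rng = np.random.default_rng(7)

def predict(x):
    """Notional fitted regression model in coded units: intercept, ME, and one 2FI."""
    a, b, c = x[..., 0], x[..., 1], x[..., 2]
    return 15.0 + 2.1 * a - 0.8 * b + 1.4 * c + 0.9 * a * b

# Choose validation points randomly throughout the coded factor space [-1, 1]^3.
validation_points = rng.uniform(-1, 1, size=(6, 3))

# Stand-in for the measured responses at those points (model truth plus noise here).
observed = predict(validation_points) + rng.normal(0.0, 0.5, size=len(validation_points))

predicted = predict(validation_points)
errors = observed - predicted
rmse = float(np.sqrt(np.mean(errors ** 2)))

print(f"validation RMSE: {rmse:.2f}")
worst = np.argsort(-np.abs(errors))[:2]
for i in worst:
    print("largest error at", np.round(validation_points[i], 2),
          "residual", round(float(errors[i]), 2))
```
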
Reporting Test Results

For evaluation purposes, the main benefit of DOE is the statistically rigorous determination of an empirical regression equation that includes mathematically estimated interactions. This model facilitates estimation of the response throughout the factor space while accounting for the percent of variability captured by the model (R²) and the error (RMSE). This estimate of the response can be compared to the requirement.

Sufficient sequential testing should result in an accurate model that supports insightful information gathering. However, the determination of significant factors and the performance assessment can only facilitate a correction of deficiencies when they are reported clearly. The report must include regions of good margin, high risk (low margin), and failing performance along with the significant factors and interactions determined from testing. The contributing factors may help the developer determine root cause and effectively implement changes. The report can also provide the actual regression equation and the electronic analysis files (e.g., JMP, Design Expert, etc.). A straightforward format for reporting this information is given below.

- Response name
- Factors screened
  o List all original factors
- Significant factors at XX% confidence
  o List factors/terms, including any interactions
- Significant factors comments
  o Include any noteworthy details about changes in significance between iterations
- ZZ% of the factor space passes the requirement
  o This can be accomplished with Monte Carlo simulation on the regression equation throughout the factor space (JMP software includes this in the profiler platform); a small simulation sketch follows this listing
  o Provide any noteworthy comments regarding program risk in these areas
- Failing performance details
  o Driving factors (highest impact on negative performance)
  o Factors/levels where performance does not meet the requirement
  o Recommendations
- Regression equation
  o The model coefficients convey the impact of each term on the response
  o Including a coded model allows relative impacts to be more easily observed; all factor levels are relative, so actual units (e.g., meters, Hz, Watts) are not an issue
  o This may be more effectively included electronically in the native DOE software or in Excel format for direct use by engineers

Repeat this listing for each response.

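The "ZZ% of the factor space passes the requirement" entry above can be produced with a simple Monte Carlo simulation on the regression equation. The sketch below is illustrative only: the coded-model coefficients (the same notional model used earlier), the requirement threshold, and the uniform sampling of the factor space are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2018)

# Notional coded regression model (intercept, main effects A, B, C, and the AB interaction).
def predict(x):
    a, b, c = x[:, 0], x[:, 1], x[:, 2]
    return 15.0 + 2.1 * a - 0.8 * b + 1.4 * c + 0.9 * a * b

REQUIREMENT = 14.0        # assumed threshold the response must meet or exceed
N_SAMPLES = 100_000       # Monte Carlo sample size over the coded factor space [-1, 1]^3

# Sample the factor space uniformly, evaluate the regression equation, and report
# the fraction of the space where the predicted response meets the requirement.
samples = rng.uniform(-1, 1, size=(N_SAMPLES, 3))
passing = predict(samples) >= REQUIREMENT
print(f"{100 * passing.mean():.1f}% of the factor space passes the requirement")
```
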
Conclusions

Employing DOE in DT imparts a unique iterative design generation and analysis process and requires the team to consider the application of sequential testing in a new context. The team must set analytical goals so the designs are correctly formulated and executed through the course of repetitive configuration changes. The use of DOE also facilitates clear and precise reporting of analytical results, which effectively supports the systems engineering process.

References

Box, George E. P., et al. Statistics for Experimenters: Design, Innovation, and Discovery. 2nd ed., Wiley, 2005.

Kutner, Michael H., et al. Applied Linear Regression Models. 4th ed., McGraw-Hill, 2004.

Montgomery, Douglas C. Design and Analysis of Experiments. Wiley, 2017.

National Institute of Standards and Technology (NIST). NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/pri/section4/pri46.htm, accessed 13 December 2017.