DISTRIBUTION STATEMENT A Approved for public release: distribution unlimited.

AVIA Test Selection through Spatial Variance Bounding Method for Autonomy Under Test
By Miles Thompson, Senior Research Engineer
Aerospace, Transportation, and Advanced Systems Lab

Autonomy Validation, Introspection, and Assessment (AVIA)
- Rapidly conduct massive numbers of analytic assessments of the autonomy system for DARPA's Anti-Submarine Warfare Continuous Trail Unmanned Vessel (ACTUV) in complex, dynamic scenarios
- Impact
  » Assess capabilities beyond the operational envelope: 10x to 100x more scenarios
  » Address ACTUV operational concerns
  » Provide > 10x reduction in time, from 42 days to < 1 day

How does AVIA accomplish this?
- Sample entire test domain intelligently and without bias
  » Currently using Latin hypercube sampling and SVBM
  » Allow perturbations and emergent behaviors to arise
  » Upgradeable: can implement future sampling techniques
- Parallel scenario execution
  » Demonstrated 1000 1-hour scenarios in < 24 hours (16 hours)
- Automatic metrics evaluation
  » Metrics stored in MySQL database, easy to visualize with analysis graphical user interface (GUI)
  » Metrics are requirement dependent and include introspection to increase understanding of autonomy logic

What is Spatial Variance Bounding Method (SVBM)?
- Test selection method designed for discontinuous performance of complex systems across a large test domain
  » Perfect for autonomous systems and autonomy under test
- Iteratively selects tests to run based on expected variance
- Goal is to bound the expectations of performance throughout the test domain, including the probability of exceeding a requirement value

Requirements for SVBM
Assumptions about performance of the Autonomy Under Test (AUT):
- Performance is a random variable with a known or unknown distribution
  » May still be deterministic, but is assumed stochastic or probabilistic
- Performance is stationary (time-invariant)
  » A learning autonomy is not time-invariant
- Performance can be measured with scalar values for all metrics
- Performance at similar locations in the test domain has similar variances
  » There is a spatial relationship between test points

How Test Selection Works in SVBM
1. Initial Sampling
2. Iterative Sampling
3. Completion Criteria

Initial Sampling (Seeding)

Requirements for Initial Sampling
- Defined test domain
  » The set of valid test cases that arise from the set of parameter values
  » May or may not include the system under test (SUT) that the autonomy will inhabit
- AVIA leverages scenarios
  » Smaller sections of the test domain that contain interesting results
  » Scenarios hold certain parameters constant while the other parameters are varied
  » Gives the tester/developer more control over the test domain

Initial Sampling Method
- AVIA uses a random sampling technique to seed the test domain
- Uses a form of Latin hypercube sampling that is not orthogonal under the following conditions:
  » Sample size is greater than the number of categorical values in a parameter
  » Sample size is greater than the number of integer values in the range of an integer parameter
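The seeding idea can be sketched as follows. This is a minimal illustration, not AVIA's implementation: the function name `latin_hypercube`, the parameter names, and the parameter ranges are all assumptions made for the example. Each integer parameter is split into as many equal strata as there are runs, one value is drawn per stratum, and the columns are shuffled independently.

```python
import math
import random

def latin_hypercube(n_runs, int_params, rng):
    """Draw n_runs test points, stratifying each integer parameter into n_runs bins.

    int_params maps a parameter name to an inclusive (low, high) integer range.
    """
    columns = {}
    for name, (low, high) in int_params.items():
        width = (high - low + 1) / n_runs
        values = []
        for k in range(n_runs):
            # One uniform draw inside stratum k of [low, high + 1), floored to an integer.
            draw = low + (k + rng.random()) * width
            values.append(min(high, math.floor(draw)))
        rng.shuffle(values)  # decouple the stratum order across parameters
        columns[name] = values
    return [{name: col[i] for name, col in columns.items()} for i in range(n_runs)]

# Hypothetical parameter ranges, chosen only for illustration.
rng = random.Random(0)
sample = latin_hypercube(
    10,
    {"transients": (4, 8), "visibility_m": (1200, 3999), "sea_state": (0, 5)},
    rng,
)
for run_number, point in enumerate(sample, start=1):
    print(run_number, point)
```

With 10 runs and only 5 possible values for `transients`, every value has to appear twice, which is the situation the slide describes: the sample size exceeds the number of integer values, so the design cannot be orthogonal.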

Example Initial Sample

Run  # of Transients  Visibility (m)  Sea State  Seed
1    7                2323            0          1220734062
2    5                3951            5          533612633
3    7                1555            2          1987750129
4    5                1202            2          704368234
5    6                3725            1          1967037535
6    4                1907            4          1594332788
7    4                3085            3          1280899409
8    6                2681            1          1870924790
9    8                3489            0          420668153
10   8                2055            3          1065225544

Running Initial Sample
- Initial sample is passed to the Test Matrix Tool (TMT) Run Manager for simulation and results gathering
- As runs complete, their metrics are added to a SQL database and become available for viewing in the analysis graphical user interface (GUI)

Iterative Sampling

Why Sample Iteratively?
- Use results to drive selection of desired test locations
- Know when to stop testing prior to running the entire domain
  » Running the entire test domain is likely infeasible
  » A blind large sample may be costly and not provide desired results
- SVBM can use a variety of iterative sampling methods as long as those methods provide an estimate of variance

Iterative Sampling Methods
- Interpolation
  » Estimating the values of test locations that have not been run by combining weighted values from other test locations in the previous sample
- Genetic Algorithm
  » Using natural selection to guide sampling toward desired test locations based on crossover and mutation of the previous sample
- Univariate Search
  » Modifying the previous sample by stepping each parameter sequentially to produce a new sample

Background on Kriging
- Kriging provides a performance estimate at a point in the test domain based on the results of all completed points and their spatial relationship
- Unlike many interpolation methods, Kriging uses covariance between test locations to estimate metric values
  » Prevents closely located tests from biasing predictive estimates
  » Allows each metric to have its own weights for interpolation based on the spatial relationship between test locations

Applied Kriging Examples
Design Optimization of an Engine
Joseph, V. R., Hung, Y., & Sudjianto, A. (2008). Blind Kriging: A New Method for Developing Metamodels.

Applied Kriging Examples
System Tradespace Search
Fulcoly, D. O., Shaw, N. B., Ross, A. M., & Rhodes, D. H. (2012). Exploiting Multidimensional Design of Experiments and Kriging Methods: An Application to a Satellite Radar System Tradespace and Orbital Transfer Vehicle Tradespace.

Kriging in AVIA
- AVIA follows a path very similar to the tradespace search when it comes to testing
- Major difference:
  » Tradespace search looks for optimal or satisfactory conditions
  » AVIA currently looks for correct/comprehensive performance characterization with regard to requirements
- AVIA uses Kriging to identify the points in the test domain that the testers know the least about
  » These points have the highest uncertainty (highest variance)
  » Every iteration of the sample consists of the tests with the highest uncertainty
- AVIA does not bias on passing or failing requirements, which makes it an excellent test tool
- AVIA could be used as a design tool looking for optimal performance, but currently it is being used for assessing autonomy

Kriging in AVIA
- With the distance, known as lag, calculated between all existing runs, AVIA can calculate a covariance matrix for each of the metrics being analyzed (see backup slides)
- It is important to analyze each metric independently because each could have a different variogram: the spatial relationship of lag to variance for that metric
[Figure: Closest Point of Approach (CPA)]
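To make variance-driven selection concrete, the sketch below computes the ordinary Kriging variance at a few untested candidate points and picks the one with the highest value. It is an illustration under stated assumptions, not AVIA's code: the covariance function and its `sill`, `nugget`, and `length` values are hypothetical, the test points live in a made-up two-parameter normalized domain, and the small linear solver is included only to keep the example dependency-free.

```python
import math

def covariance(lag, sill=1.0, nugget=0.1, length=1.0):
    """Covariance implied by a Gaussian variogram (all parameter values are hypothetical)."""
    if lag == 0:
        return sill
    return (sill - nugget) * math.exp(-((lag / length) ** 2))

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def kriging_variance(tested, candidate):
    """Ordinary Kriging variance at `candidate` given the tested locations."""
    n = len(tested)
    # Covariance matrix bordered by ones: the extra row and column enforce sum(weights) = 1.
    a = [[covariance(math.dist(p, q)) for q in tested] + [1.0] for p in tested]
    a.append([1.0] * n + [0.0])
    c0 = [covariance(math.dist(p, candidate)) for p in tested]
    solution = solve(a, c0 + [1.0])
    weights, lam = solution[:n], solution[n]
    return covariance(0.0) - sum(w * c for w, c in zip(weights, c0)) - lam

# Made-up tested and candidate locations in a normalized two-parameter domain.
tested = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = [(0.1, 0.1), (0.5, 0.5), (1.0, 1.0)]
best = max(candidates, key=lambda c: kriging_variance(tested, c))
print(best)
```

The selection depends only on where tests have been run, not on whether they passed or failed, which matches the point that AVIA does not bias on requirement outcomes.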

Completion Criteria

Completion Criteria
- Completion criteria are required so that an iterative sampling mechanism knows when to stop
  » The criteria should also be meaningful to the tester
- In AVIA, the desire is to have a comprehensive performance characterization across the test domain
  » Accomplished by getting an accurate representation of the variance per metric
  » Accurate performance variance requires a good sample
  » AVIA assumes this condition is reached when the metric variances reach steady state
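One way to turn "metric variances reach steady state" into a stopping rule is sketched below. The slides do not say how AVIA defines steady state, so the window length, the relative tolerance, the function names, and the variance histories are all assumptions chosen for illustration.

```python
def is_steady(variance_history, window=3, rel_tol=0.05):
    """True when the last `window` variances all lie within rel_tol of their mean."""
    if len(variance_history) < window:
        return False
    recent = variance_history[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return all(v == 0 for v in recent)
    return all(abs(v - mean) <= rel_tol * abs(mean) for v in recent)

def all_metrics_steady(history_by_metric, window=3, rel_tol=0.05):
    """Each metric is assessed individually; testing stops only when all are steady."""
    return all(is_steady(h, window, rel_tol) for h in history_by_metric.values())

# Made-up per-iteration variances, for illustration only.
history = {
    "collisions_per_hour": [4.0e-3, 2.5e-3, 2.1e-3, 2.05e-3, 2.0e-3],
    "cpa": [900.0, 700.0, 650.0, 400.0],
}
print(is_steady(history["collisions_per_hour"]))
print(is_steady(history["cpa"]))
print(all_metrics_steady(history))
```

Because every metric must pass the check, a single slowly settling metric keeps the iteration going even when the others have already settled.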

Steady State Variance Example
[Figure: Iteration 1 and Iteration 2 variances for Collisions/Hour and CPA (axis scale x10^-3). Simulated data for the purpose of presentation.]

Convergence/Divergence/Steady
[Figure: Iteration 3 and Iteration 4 variances for Collisions/Hour and CPA (axis scale x10^-3). Simulated data for the purpose of presentation.]

Convergence/Divergence/Steady
[Figure: Iteration 5 and Iteration 6 variances for Collisions/Hour and CPA (axis scale x10^-3). Simulated data for the purpose of presentation.]

Convergence/Divergence/Steady
[Figure: Iteration 7 and Iteration 8 variances for Collisions/Hour and CPA (axis scale x10^-3). Simulated data for the purpose of presentation.]

Guaranteed Steady for Stationary
- Guaranteed to converge to steady state for stationary (time-invariant) performance:
  » Even if performance is truly random according to some distribution
  » Even if there is no spatial variance relationship (see CPA above)
- Finite simulation testing of large test domains with completion criteria in AVIA

Advantages/Disadvantages of Spatial Variance Bounding Method in AVIA
- Pros:
  » Can bound expected performance for any conditions that were not tested (excellent for acquisition evaluation)
  » Finite
  » Scales with the variability of the system
  » Each metric assessed individually for steady state
  » Iterative sampling criteria are flexible and can be made to locate min/max
- Cons:
  » If high variability in system, will require many runs
  » Requires all metrics to be scalar
  » Assumes all metric errors to be normally distributed to remain unbiased

Acknowledgement
Thanks to DARPA for funding this research and pushing assessment methods for autonomous systems.

QUESTIONS?

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

BACKUP SLIDES

Explanation of Kriging
From Bailey and Gatrell: Ordinary Kriging
- The weights are based on the covariance between the locations of the tests that have been completed and the location of the desired untested point

Explanation of Kriging
- Introduce a Lagrange multiplier λ to ensure that the sum of all the weights equals 1 while minimizing the variance between the actual and predicted response
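For reference, the constrained minimization is commonly written as the ordinary Kriging system below. The notation is an assumption, not taken from the slide: s_1, ..., s_n are the tested locations, s_0 is the untested point, C is the covariance function derived from the variogram, w_i are the weights, and Z is the metric value. The sign convention for λ differs between texts.

```latex
\hat{Z}(s_0) = \sum_{i=1}^{n} w_i \, Z(s_i),
\qquad \sum_{i=1}^{n} w_i = 1

\begin{pmatrix}
C(s_1,s_1) & \cdots & C(s_1,s_n) & 1 \\
\vdots     & \ddots & \vdots     & \vdots \\
C(s_n,s_1) & \cdots & C(s_n,s_n) & 1 \\
1          & \cdots & 1          & 0
\end{pmatrix}
\begin{pmatrix} w_1 \\ \vdots \\ w_n \\ \lambda \end{pmatrix}
=
\begin{pmatrix} C(s_1,s_0) \\ \vdots \\ C(s_n,s_0) \\ 1 \end{pmatrix}
```

The last row of the matrix is the constraint that the weights sum to 1; the remaining rows come from setting the derivative of the Lagrangian with respect to each weight to zero.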

Explanation of Kriging
- The interpolation weights provide the estimated response based on already captured responses
- Predicting the response at an untested location is good, but AVIA needs the estimated variance at that location for test selection

Explanation of Kriging
- Minimizing the variance of the prediction errors with the Lagrangian yields the following:
  [equation not preserved in the extracted text]
- Modifying the previous equations for calculating weights:
  [equation not preserved in the extracted text]
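A standard textbook form of the ordinary Kriging prediction variance is given below as a hedged reference; it may differ in notation from the equations on the original slide. The symbols are assumptions: σ² = C(0) is the variance of the metric, w_i and λ are the weights and Lagrange multiplier that satisfy Σ_j w_j C(s_i, s_j) + λ = C(s_i, s_0) with Σ_i w_i = 1, c_+ is the vector of covariances to the untested point with a 1 appended, and C_+ is the covariance matrix bordered by ones with a 0 in the corner.

```latex
\sigma_e^2(s_0)
  = \sigma^2 - \sum_{i=1}^{n} w_i \, C(s_i, s_0) - \lambda
  = \sigma^2 - \mathbf{c}_+^{\top} \mathbf{C}_+^{-1} \mathbf{c}_+ ,
\qquad
\begin{pmatrix} \mathbf{w} \\ \lambda \end{pmatrix}
  = \mathbf{C}_+^{-1} \mathbf{c}_+
```

The variance depends only on the locations and the covariance model, not on the measured responses, which is why it can be evaluated at untested points to rank candidate tests.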

Explanation of Kriging
Traditional Kriging Methods:
- Simple Kriging
  » The mean performance is known and it is zero (or a constant)
  » The sum of the weights does not have to equal 1
- Ordinary Kriging
  » The mean performance is not known, but it is constant
  » The sum of the weights equals 1
- Universal Kriging (Kriging with a trend)
  » The mean performance is dictated by an unknown trend-line (typically polynomial)
  » Requires sufficient sample size to estimate the polynomial
Modern Kriging Methods:
- Disjunctive Kriging
  » Nonlinear generalization of Kriging
- Blind Kriging
  » Special Universal Kriging using Bayesian inference for estimating trend-line

Kriging Lag Calculation in AVIA
Finding the distance in the AVIA test domain:
- Kriging relies on covariance to calculate its weights
  » The covariance requires a distance, commonly referred to as lag, between test locations
- AVIA's parameters are rarely in the same units
  » Requires normalization to ensure dimensions can be computed as lag

Run  # of Transients  Visibility (m)  Sea State  Target Present
1    7                2323            0          False
2    5                3951            5          False
3    7                1555            2          True

Kriging Lag Calculation in AVIA

Normalized parameters:

Run  # of Transients  Visibility  Sea State  Target Present
1    1                0.321       0          False
2    0                1           1          False
3    1                0           0.4        True

AVIA normalizes each parameter individually, then calculates the lag matrices for each parameter. Categorical parameters are treated with the simple matching coefficient.

# of Transients
Run  1  2  3
1    0  1  0
2    1  0  1
3    0  1  0

Visibility
Run  1      2      3
1    0      0.679  0.321
2    0.679  0      1
3    0.321  1      0

Sea State
Run  1    2    3
1    0    1    0.4
2    1    0    0.6
3    0.4  0.6  0

Target Present
Run  1  2  3
1    0  0  1
2    0  0  1
3    1  1  0

With lags calculated for each parameter, AVIA calculates an overall distance as the Euclidean distance across the per-parameter lags. Because each per-parameter lag lies in [0, 1], all distances will be between 0 and √n, where n is the number of parameters. In this case, n = 4, so all distances will be in [0, 2].

Kriging Lag Calculation in AVIA

Run  1      2      3
1    0      1.569  1.124
2    1.569  0      1.833
3    1.124  1.833  0

By the calculations, runs 1 and 3 are the closest to each other and runs 2 and 3 are farthest from each other. A lag of 0 indicates that two runs are identical in parameters.
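The lag calculation can be reproduced with a short sketch. The dictionary keys and function names are hypothetical, and min-max normalization over the values present in the sample is an assumption inferred from the normalized table (for example, transients of 5 and 7 map to 0 and 1).

```python
import math

# The three example runs; the key names are illustrative only.
runs = [
    {"transients": 7, "visibility_m": 2323, "sea_state": 0, "target_present": False},
    {"transients": 5, "visibility_m": 3951, "sea_state": 5, "target_present": False},
    {"transients": 7, "visibility_m": 1555, "sea_state": 2, "target_present": True},
]
numeric = ["transients", "visibility_m", "sea_state"]
categorical = ["target_present"]

def normalize(values):
    """Min-max normalize to [0, 1] (assumed to match the slide's normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def lag_matrix(runs, numeric, categorical):
    """Euclidean distance across per-parameter lags for every pair of runs."""
    n = len(runs)
    norm = {p: normalize([r[p] for r in runs]) for p in numeric}
    lags = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared = sum((norm[p][i] - norm[p][j]) ** 2 for p in numeric)
            # Simple matching: a categorical lag is 0 if the values match, else 1.
            squared += sum(1.0 for p in categorical if runs[i][p] != runs[j][p])
            lags[i][j] = math.sqrt(squared)
    return lags

lags = lag_matrix(runs, numeric, categorical)
for row in lags:
    print([round(value, 3) for value in row])
```

Rounded to three decimals, the off-diagonal entries are 1.569, 1.124, and 1.833, matching the lag matrix above.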

Kriging in AVIA
Variograms in AVIA
- Calculated by binning the lags and calculating the variance of all runs' metric values in each bin
  » Only bin up to 50 percent of max lag
  » Expect variance to approach the sill after halfway across the test domain
- Fit a theoretical variogram model to smooth the spatial variance
  » Currently use a Gaussian model because the data are likely to have a non-zero y-intercept (nugget variance)
  » A horizontal variogram indicates very little spatial relationship
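A minimal sketch of the binning step follows. It uses the classical semivariance estimator (half the mean squared difference of the metric over all run pairs in a bin), which is an assumption about what "variance in that bin" means; the function names, bin count, and data are illustrative, and the Gaussian model is shown in one common parameterization without a fitting step.

```python
import math

def empirical_variogram(lags, values, n_bins=5, max_fraction=0.5):
    """Return (bin_center, semivariance, pair_count) for each non-empty lag bin."""
    n = len(values)
    pairs = [(lags[i][j], values[i], values[j]) for i in range(n) for j in range(i + 1, n)]
    cutoff = max_fraction * max(lag for lag, _, _ in pairs)  # bin only up to 50% of max lag
    width = cutoff / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for lag, a, b in pairs:
        if lag > cutoff:
            continue
        k = min(n_bins - 1, int(lag / width))
        sums[k] += (a - b) ** 2
        counts[k] += 1
    return [((k + 0.5) * width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]

def gaussian_model(lag, nugget, sill, length):
    """Gaussian variogram model; its value at lag 0 is the nugget."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-((lag / length) ** 2)))

# Made-up one-parameter example: six runs with a linearly increasing metric.
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
metric = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
lags = [[abs(a - b) for b in x] for a in x]
bins = empirical_variogram(lags, metric, n_bins=2)
print(bins)
```

A metric with no spatial relationship would produce roughly equal semivariance in every bin, which is the horizontal variogram case mentioned above.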