INFORMATION DRIVEN OPTIMIZATION SEARCH FILTER: PREDICTING TABU REGIONS
Proceedings of the Systems and Information Engineering Design Symposium, Matthew H. Jones, Stephen D. Patek, and Barbara E. Tawney, eds.

Matthew H. Jones
Department of Systems and Information Engineering
University of Virginia

ABSTRACT

Many search techniques fail to account for the information obtained from previous objective function evaluations when determining a new set of control parameters. This paper presents an empirical study of a neural network pre-screener applied to random and grid searches, although the pre-screener may be combined with any search procedure. A single neural network model is extended through the use of hierarchical clustering, which organizes the search space into groups, each with a corresponding neural network model. Empirical tests indicate that a neural network pre-screener significantly reduces the number of probes at a minimal cost in accuracy, an acceptable trade-off given the high cost of executing complex objective functions. Specifically, a single neural network model is optimal given a random search. With a grid search, hierarchical clustering using m separate neural network models outperforms a single model in terms of both deviance and number of probes; the grid search provides broader coverage of the search space, yielding more information about it.

INTRODUCTION

Many techniques fail to account for the information obtained from previous objective function evaluations when determining a new set of control parameters. Searches may probe the feasible control space blindly, behave as a first-order Markov process, or merely keep a record of previously explored regions when selecting the next vector of control parameters. Historically, determining the next search area has been a computationally simple task, but this paradigm is slowly shifting.
Complex computer simulation models execute in times orders of magnitude larger than the time required to determine the next control parameters to search. Time spent intelligently selecting the next search region, to minimize the number of objective function evaluations, drastically reduces the overall time for optimization. Information discovery and data mining principles provide insight into the global behavior of the objective function response surface. Each new objective function evaluation yields further insight into the response surface, identifying areas of potential optima in light of the information obtained from all previous objective function evaluations. OptQuest, a commercially available optimization package, performs just such a task, using a neural network pre-screener to identify potentially inferior sets of control points prior to expending the resources to evaluate the objective function [1, 2]. OptQuest's neural net pre-screener is proprietary knowledge. This paper presents an empirical study of a neural net pre-screener using a random and a grid search; however, one may use the pre-screener in combination with any search procedure. A single neural net model is fit over the entire search space. A set of points must be calculated to initially populate the neural net, as must the metric for disqualifying a control point based on the neural net model, also called the risk level. Both quantities require sensitivity analysis to gain insight and to make general recommendations regarding usage of the pre-screener over a general class of problems. The mean squared error and mean absolute deviation error metrics act as the fit minimization criteria in the neural net prediction model. In all instances, the number of probes and the deviation from the best answer with and without the pre-screener act as metrics for comparison among implementations. Lastly, a single neural net model may not provide the best prediction over the entire search space.
A hierarchical clustering procedure organizes the evaluated points into groups, and a separate neural net model is fit over each group of clusters. A potential control vector is classified into one of the m clusters, and a prediction is generated from that cluster's neural net model. A preliminary study is presented.

SELECTION OF NEURAL NETWORKS

Neural networks contain simple elements, or neurons, operating in parallel, inspired by the biological nervous system. The structure of a neural net and the predictions it generates are determined by the connections between elements. The weights and connections of the neurons adapt to match a supervised learning pattern between the input and output. Feed-forward backpropagation neural nets were selected given their ability to provide reasonable predictions when
presented with non-training data. For more information regarding neural networks and the backpropagation method, see [3, 4]. MATLAB's built-in neural network toolbox and hierarchical clustering functions provide an excellent tool for coding the problem. All networks have two layers: the first layer has ten nodes with a log-sigmoid transfer function, and the second layer contains one node with a linear transfer function. The performance function is either mean squared error (MSE, an L2 error metric) or mean absolute deviation (MAD, an L1 error metric). Under the MSE performance measure a single outlier is heavily penalized and strongly influences the fit at the other points, a potentially undesirable property. The best minimum is defined as the best minimum value observed so far in the search. The predicted output is the value predicted by the neural net model for the potential next set of control parameters. This value is then normalized by the best minimum. If the result is less than some threshold, the point is accepted; if not, the point is thrown out and the loss function is not executed.

FITTING A SINGLE NEURAL NETWORK OVER THE SEARCH SPACE

The following function is used in fitting a single neural network model over the entire search space. Empirical analysis of this problem yields insight into determining an appropriate number of points with which to initially fit the network, the risk level, and the performance criterion (MSE or MAD). The test function L(x1, x2) is minimized over a bounded domain Θ with global optimum θ* = [0.9, 0.9]; it is multi-modal with four local minima, one of which is the global optimum.

Random Search

Determination of Risk Level
[Figure: Contour plot of the response surface, showing three local minima and the global minimum.]

The risk level is defined as follows: a candidate point is accepted when

    Pred_Output / Best_Min < τ

where Pred_Output is the neural net prediction for the candidate and Best_Min is the best minimum observed so far.

[Figure: Deviation, filter versus no-filter, as a function of the threshold value under a random search, with MSE and MAD averages and standard deviations over the replications.]

The above figure summarizes the deviation obtained by implementing the neural net pre-filter as a function of the threshold value. Each point represents the performance metric over the replications. The solid lines represent the averages over the replications for the MSE and MAD performance criteria, and the dashed lines denote the corresponding standard deviations. As the threshold value increases, a control point becomes more likely to be accepted, and thus the number of probes into the response surface grows. It is interesting to note that the mean and standard deviation are similar. In this empirical case, neither the MSE nor the MAD performance criterion dominates over all tested threshold values.
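Following the definition above, the acceptance rule can be sketched in Python (a minimal illustration assuming a minimization problem with positive objective values; the function name is ours):

```python
def accept_point(pred_output, best_min, tau):
    """Risk-level filter: accept a candidate control point when its
    predicted objective value, normalized by the best minimum observed
    so far, falls below the threshold tau."""
    return pred_output / best_min < tau

# A prediction close to the incumbent best passes a permissive threshold;
# a much worse prediction is rejected without an objective evaluation.
accept_point(1.05, 1.0, 1.2)   # -> True
accept_point(2.5, 1.0, 1.2)    # -> False
```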
Determination of Initial Sample Size

[Figure: Number of probes as a function of the threshold value under a random search, with MSE and MAD averages and standard deviations.]

The above figure displays the number of probes as a function of the threshold value. As the threshold value increases, the number of probes increases; the standard deviation over the samples is displayed as the two dashed lines. These results mirror the deviance chart displayed above: as the threshold value increases, the deviance goes down, but more surface probes are required. The slope of the averages for MSE and MAD indicates a large jump in the number of probes between adjacent threshold values, while the corresponding deviance is similar despite a substantial reduction in probes. A moderate threshold value in this instance appears optimal, reducing the number of probes while minimizing the deviance.

[Figure: Risk level random search summary — mean deviance and number of probes for the MSE and MAD cases.]

The summary plot of mean deviance and number of probes shows that, in this case, MSE and MAD exhibit similar performance.

[Figure: Initial sample size random search summary — deviance (dashed lines) and number of probes (solid lines) as a function of the initial sample size.]

The above plot displays the sensitivity analysis over the initial sample size, the dashed lines representing the deviance and the solid lines the number of probes. Decreasing the initial sample size decreases the number of probes but increases the deviance, and vice versa. In general, the MAD error criterion exhibits minimal deviance with fewer probes than the MSE criterion across initial sample sizes. Given these empirical results and the desire not to heavily penalize outliers, MAD is the preferred performance metric for minimization when fitting a neural network model.
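The pre-screened random search studied above can be sketched as follows (a Python illustration; the surrogate interface and all names are ours, with `predict` standing in for the paper's retrained neural network):

```python
import random

def prescreened_random_search(loss, predict, bounds, init_points, n_iter, tau):
    """Random search with a pre-screener: candidates whose predicted loss,
    normalized by the incumbent best minimum, exceed the risk threshold
    tau are discarded without an expensive objective evaluation."""
    history = [(list(p), loss(p)) for p in init_points]   # seed the surrogate
    best = min(v for _, v in history)
    probes = len(history)
    for _ in range(n_iter):
        cand = [random.uniform(lo, hi) for lo, hi in bounds]
        if predict(history, cand) / best < tau:   # risk-level filter
            val = loss(cand)                      # costly probe
            probes += 1
            history.append((cand, val))
            best = min(best, val)
    return best, probes
```

With a perfect surrogate (`predict` simply calling `loss`), the filter skips every candidate whose value would exceed tau times the incumbent best, trading a small risk of missing an improving point for far fewer probes.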
Grid Search

[Figure: Grid search summary — deviance and percent reduction in possible probes as a function of grid size (NxN).]

The above plot, taken over a series of grid searches with varying grid sizes, displays the deviance and the probe reduction as a function of the grid size. The deviance between running with the neural net pre-screener and without it decreases as the grid size increases, and the percent reduction in probes increases as the grid size increases. The
number of probes increases, but the percentage of probes out of the number of possible probes decreases. It is best to maximize the available probe budget in a grid search to minimize the deviance.

FITTING MULTIPLE NEURAL NETWORKS OVER THE SEARCH SPACE

A single network may not yield an accurate prediction over the entire response surface, particularly if the surface is multi-modal and large jumps in the response occur with small changes in the control parameters. Such behavior accompanies a steep gradient, may indicate a local optimum, and is difficult to predict. A second order polynomial fit over a cross section of the test function illustrates the point: a single second order fit is poor given the multi-modal nature of the data. This provides an incentive to investigate the impact of fitting multiple neural net models over the response surface. The difficulty lies in determining how to partition the search space and how to classify a new control parameter into one of m possible partitions for prediction. Hierarchical clustering provides a solution.

[Figure: Fitting one second order polynomial model over a cross section of the test function.]

Average-Link (AVELINK) Clustering using Euclidean Distances

Average-link clustering is employed with the Euclidean distance metric. In this empirical test, either three or four separate clusters are chosen. Figure 9 displays a sample dendrogram, showing the distances at which individual points or clusters join to form larger clusters. The number of intersections with the vertical lines, formed by drawing a horizontal line across the dendrogram, determines the number of clusters formed; thus, in Figure 9, the placement of the horizontal line indicates the formation of four clusters.

[Figure 9. Sample dendrogram: distance at which clusters/points join, by cluster number.]

Fitting two separate second order polynomial models matches the data far more closely, yielding much better predictive power.
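The clustering step described above — average-link hierarchical clustering on Euclidean distances, cut to four clusters — can be sketched in Python with SciPy (an illustration on synthetic data, not the paper's MATLAB code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Four loose groups of two-dimensional points along the diagonal.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(c, 0.1, size=(10, 2))
                    for c in (-1.0, -0.3, 0.4, 1.2)])

# Average-link clustering on the condensed Euclidean distance matrix,
# then cut the dendrogram so that at most four clusters remain.
Z = linkage(pdist(points, metric="euclidean"), method="average")
labels = fcluster(Z, t=4, criterion="maxclust")
sorted(set(labels.tolist()))   # -> [1, 2, 3, 4]
```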
Fitting Two Second Order Polynomial Models In the empirical test for a random search, points are probed to form the initial clusters. A neural net model is fit over each of either three of four groups of clusters. The generation of a new point by the random search requires the following steps:. Generate new point (randomly or next in algorithm) for potential objective function evaluation. Calculate the centroid of each cluster density by average the x and x s for each of the m clusters. Calculate the Euclidean distance between each of the m cluster centroids and the new point. Make the prediction based on the points assignment to the neural networks associated with the selected cluster. Accept the new point or reject based on the associated risk level. If the new point is accept, re-calculate the m clusters and update the m neural networks.
The plot below displays a sample set of clusters, with the larger shadowed points denoting the centroids.

[Figure: Example clusters for a random search, with centroids highlighted.]

[Figure: Random search results — deviance as a function of the number of probes for three and four clusters, with linear fits.]

The above plot shows the deviance as a function of the number of probes when using three or four clusters. Least squares regression lines suggest that for one cluster count the deviance and the number of probes are unrelated, while for the other a decrease in the number of probes accompanies an increase in the deviance. Comparisons with the use of a single neural net model appear in the Comparisons section.

Grid Search

Four separate clusters are fit over the search region under a grid search. The chart below displays the clusters with their respective centroids.

[Figure: Example clusters for a grid search, with centroids highlighted.]

[Figure: Deviance and probes used, as a fraction of the possible number of probes, as a function of grid size with clusters.]

The above plot displays the deviance as a function of grid size along with the number of probes used over the possible number of probes. The percentage of possible probes used decreases and the deviance decreases as the grid size increases, similar to the results from employing a single neural network model. Comparisons with using a single neural net model appear in the Comparisons section.
COMPARISONS

Random Search

[Figure: Random search comparison — deviation from the best optimum as a function of the number of probes, for random runs without a pre-screener, a single neural network, and neural networks with clustering.]

The above plot displays empirical trials of a random search with and without the neural network pre-screener, as well as the neural network model employing hierarchical clustering. Fitting a single neural network model outperforms half of the random runs that use no pre-screener and performs similarly to the other runs. In general, the neural network with clustering is outperformed by a single neural network model in terms of both the number of probes and accuracy.

Grid Search

[Figure: Grid search comparison — deviance and probe reduction as a function of grid size, for a single neural network and for neural networks with hierarchical clustering.]

The above figure summarizes the grid search with and without the usage of hierarchical clustering. The lines representing the deviance as a function of the grid size nearly coincide between the two methods, while the lines representing the probe reduction show that neural networks with hierarchical clustering require far fewer probes at the same deviance level. This empirically shows that creating m clusters and fitting m neural networks over each cluster is superior, for a grid search, to a single neural network model.

CONCLUSIONS

The empirical tests indicate that a neural network pre-screener is beneficial in significantly reducing the number of probes with a minimal cost to accuracy, an acceptable trade-off given the high cost of executing complex objective functions. Specifically, a single neural network model, in this empirical test, is appropriate given a random search. The grid search provides a broader coverage of the search space, yielding more information regarding the search space, and the number of points within each of the m clusters is more uniformly distributed than in the clusters created by a random search. Hierarchical clustering using m separate neural network models outperforms, in terms of deviance and number of probes, the usage of a single model when the underlying search process is a grid search. This provides empirical evidence that information driven search procedures may yield significant performance enhancements in optimization. Clearly, more research is required on the broader applicability of the tool; however, the widespread usage and acceptance of OptQuest, which employs a neural network pre-filter, is a promising sign.

REFERENCES

[1] M. Laguna, "Metaheuristic Optimization with Evolver, Genocop and OptQuest."
[2] F. Glover, J. P. Kelly, and M. Laguna, "The OptQuest Approach to Crystal Ball Simulation Optimization."
[3] R. E. Uhrig, "Introduction to artificial neural networks," presented at the 21st IEEE IECON International Conference on Industrial Electronics, Control, and Instrumentation, 1995.
[4] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: a tutorial," Computer, vol. 29, no. 3, pp. 31-44, 1996.

AUTHOR BIOGRAPHIES

MATTHEW H. JONES is a Ph.D. student in the Systems and Information Engineering Department under the direction of Professor William T. Scherer. He received his B.S. and M.S. in Systems Engineering from the University of Virginia. He begins employment with The Aerospace Corporation in Chantilly, VA, in the Systems Architecture and Engineering Department at the end of the summer session, and will remotely write his dissertation. He currently serves as the head teaching assistant for the Accelerated Masters Program in Systems Engineering. He is a member of INFORMS and INCOSE.

APPENDIX A: MATLAB CODE SNIPPETS

% NOTE: numeric constants and "end" keywords lost in transcription have
% been restored from context (four clusters, two control variables, a
% ten-node hidden layer); the original domain bounds are unknown and
% shown here as placeholders lo/hi.

% Define a neural network: ten log-sigmoid hidden nodes, one linear output
net = newff([lo, hi; lo, hi], [10 1], {'logsig', 'purelin'});
net.performFcn = 'mae';

% Perform initial clustering (average link, four clusters)
Y = pdist(Predictor');
Z = linkage(Y, 'average');
T = cluster(Z, 4);
for i = 1:length(Predictor')
    if (T(i) == 1)
        X = size(Cluster1);
        Cluster1(X(1,1) + 1, :) = Predictor_T(i, :);
        Cluster1_R(X(1,1) + 1) = Response(i);
    end
    if (T(i) == 2)
        X = size(Cluster2);
        Cluster2(X(1,1) + 1, :) = Predictor_T(i, :);
        Cluster2_R(X(1,1) + 1) = Response(i);
    end
    if (T(i) == 3)
        X = size(Cluster3);
        Cluster3(X(1,1) + 1, :) = Predictor_T(i, :);
        Cluster3_R(X(1,1) + 1) = Response(i);
    end
    if (T(i) == 4)
        X = size(Cluster4);
        Cluster4(X(1,1) + 1, :) = Predictor_T(i, :);
        Cluster4_R(X(1,1) + 1) = Response(i);
    end
end

% Compute the centroids
A = []; A(max(T), 2) = 0;
N = []; N(max(T), 1) = 0;
Centroids = []; Centroids(max(T), 2) = 0;
for i = 1:length(T)
    N(T(i)) = N(T(i)) + 1;
    for j = 1:2
        A(T(i), j) = A(T(i), j) + Predictor_T(i, j);
    end
end
for i = 1:max(T)
    for j = 1:2
        Centroids(i, j) = A(i, j) / N(i);
    end
end

% Determine the cluster closest to the candidate point (x1, x2)
min_d = Inf; min_cluster = 0;
for i = 1:max(T)
    Distance = [x1, x2; Centroids(i, 1), Centroids(i, 2)];
    d = pdist(Distance, DISTANCE_METRIC);
    if (d < min_d)
        min_cluster = i;
        min_d = d;
    end
end

% Re-train the neural net over each non-empty cluster
if (update_net == 1)
    if (length(Cluster1) > 0), net1 = train(net1, Cluster1', Cluster1_R); end
    if (length(Cluster2) > 0), net2 = train(net2, Cluster2', Cluster2_R); end
    if (length(Cluster3) > 0), net3 = train(net3, Cluster3', Cluster3_R); end
    if (length(Cluster4) > 0), net4 = train(net4, Cluster4', Cluster4_R); end
    update_net = 0;
end

% Predict with the network of the selected cluster
if (min_cluster == 1), predicted_output = sim(net1, [x1; x2]); end
if (min_cluster == 2), predicted_output = sim(net2, [x1; x2]); end
if (min_cluster == 3), predicted_output = sim(net3, [x1; x2]); end
if (min_cluster == 4), predicted_output = sim(net4, [x1; x2]); end
More informationLECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS
LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationLinear Separability. Linear Separability. Capabilities of Threshold Neurons. Capabilities of Threshold Neurons. Capabilities of Threshold Neurons
Linear Separability Input space in the two-dimensional case (n = ): - - - - - - w =, w =, = - - - - - - w = -, w =, = - - - - - - w = -, w =, = Linear Separability So by varying the weights and the threshold,
More informationLecture-17: Clustering with K-Means (Contd: DT + Random Forest)
Lecture-17: Clustering with K-Means (Contd: DT + Random Forest) Medha Vidyotma April 24, 2018 1 Contd. Random Forest For Example, if there are 50 scholars who take the measurement of the length of the
More informationMetaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini
Metaheuristic Development Methodology Fall 2009 Instructor: Dr. Masoud Yaghini Phases and Steps Phases and Steps Phase 1: Understanding Problem Step 1: State the Problem Step 2: Review of Existing Solution
More informationSupervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)
Supervised Learning (contd) Linear Separation Mausam (based on slides by UW-AI faculty) Images as Vectors Binary handwritten characters Treat an image as a highdimensional vector (e.g., by reading pixel
More informationLearning Common Features from fmri Data of Multiple Subjects John Ramish Advised by Prof. Tom Mitchell 8/10/04
Learning Common Features from fmri Data of Multiple Subjects John Ramish Advised by Prof. Tom Mitchell 8/10/04 Abstract Functional Magnetic Resonance Imaging (fmri), a brain imaging technique, has allowed
More informationEvolutionary Optimization of Neural Networks for Face Detection
Evolutionary Optimization of Neural Networks for Face Detection Stefan Wiegand Christian Igel Uwe Handmann Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany Viisage Technology
More informationOutlier detection using autoencoders
Outlier detection using autoencoders August 19, 2016 Author: Olga Lyudchik Supervisors: Dr. Jean-Roch Vlimant Dr. Maurizio Pierini CERN Non Member State Summer Student Report 2016 Abstract Outlier detection
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationA GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS
A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS Jim Gasvoda and Qin Ding Department of Computer Science, Pennsylvania State University at Harrisburg, Middletown, PA 17057, USA {jmg289, qding}@psu.edu
More informationDATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane
DATA MINING AND MACHINE LEARNING Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Data preprocessing Feature normalization Missing
More informationIdentification of Multisensor Conversion Characteristic Using Neural Networks
Sensors & Transducers 3 by IFSA http://www.sensorsportal.com Identification of Multisensor Conversion Characteristic Using Neural Networks Iryna TURCHENKO and Volodymyr KOCHAN Research Institute of Intelligent
More informationMIS2502: Data Analytics Clustering and Segmentation. Jing Gong
MIS2502: Data Analytics Clustering and Segmentation Jing Gong gong@temple.edu http://community.mis.temple.edu/gong What is Cluster Analysis? Grouping data so that elements in a group will be Similar (or
More informationAcoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing
Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationA Dynamical Systems Analysis of Collaboration Methods in Cooperative Co-evolution.
A Dynamical Systems Analysis of Collaboration Methods in Cooperative Co-evolution. Elena Popovici and Kenneth De Jong George Mason University Fairfax, VA epopovic@gmu.edu kdejong@gmu.edu Abstract Cooperative
More informationA Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis
A Weighted Majority Voting based on Normalized Mutual Information for Cluster Analysis Meshal Shutaywi and Nezamoddin N. Kachouie Department of Mathematical Sciences, Florida Institute of Technology Abstract
More informationDesign and Performance Analysis of and Gate using Synaptic Inputs for Neural Network Application
IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 12 May 2015 ISSN (online): 2349-6010 Design and Performance Analysis of and Gate using Synaptic Inputs for Neural
More informationINVESTIGATING DATA MINING BY ARTIFICIAL NEURAL NETWORK: A CASE OF REAL ESTATE PROPERTY EVALUATION
http:// INVESTIGATING DATA MINING BY ARTIFICIAL NEURAL NETWORK: A CASE OF REAL ESTATE PROPERTY EVALUATION 1 Rajat Pradhan, 2 Satish Kumar 1,2 Dept. of Electronics & Communication Engineering, A.S.E.T.,
More informationEmpirical Study on Impact of Developer Collaboration on Source Code
Empirical Study on Impact of Developer Collaboration on Source Code Akshay Chopra University of Waterloo Waterloo, Ontario a22chopr@uwaterloo.ca Parul Verma University of Waterloo Waterloo, Ontario p7verma@uwaterloo.ca
More informationBox-Cox Transformation for Simple Linear Regression
Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationFEATURE EXTRACTION USING FUZZY RULE BASED SYSTEM
International Journal of Computer Science and Applications, Vol. 5, No. 3, pp 1-8 Technomathematics Research Foundation FEATURE EXTRACTION USING FUZZY RULE BASED SYSTEM NARENDRA S. CHAUDHARI and AVISHEK
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationSeismic regionalization based on an artificial neural network
Seismic regionalization based on an artificial neural network *Jaime García-Pérez 1) and René Riaño 2) 1), 2) Instituto de Ingeniería, UNAM, CU, Coyoacán, México D.F., 014510, Mexico 1) jgap@pumas.ii.unam.mx
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationSupervised Learning in Neural Networks (Part 2)
Supervised Learning in Neural Networks (Part 2) Multilayer neural networks (back-propagation training algorithm) The input signals are propagated in a forward direction on a layer-bylayer basis. Learning
More informationModule 7 VIDEO CODING AND MOTION ESTIMATION
Module 7 VIDEO CODING AND MOTION ESTIMATION Version ECE IIT, Kharagpur Lesson Block based motion estimation algorithms Version ECE IIT, Kharagpur Lesson Objectives At the end of this less, the students
More informationBox-Cox Transformation
Chapter 190 Box-Cox Transformation Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a single batch of data. It is used to modify the distributional shape of a set
More informationUsing Excel for Graphical Analysis of Data
Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationUnderstanding Competitive Co-evolutionary Dynamics via Fitness Landscapes
Understanding Competitive Co-evolutionary Dynamics via Fitness Landscapes Elena Popovici and Kenneth De Jong George Mason University, Fairfax, VA 3 epopovic@gmu.edu, kdejong@gmu.edu Abstract. Co-evolutionary
More informationOptimization of Complex Systems with OptQuest
Optimization of Complex Systems with OptQuest MANUEL LAGUNA Graduate School of Business Administration University of Colorado, Boulder, CO 80309-0419 Manuel.Laguna@Colorado.EDU Latest revision: April 8,
More information3 Nonlinear Regression
CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic
More informationCS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek
CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function
More informationUsing neural nets to recognize hand-written digits. Srikumar Ramalingam School of Computing University of Utah
Using neural nets to recognize hand-written digits Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the first chapter of the online book by Michael
More informationOPTIMIZING A VIDEO PREPROCESSOR FOR OCR. MR IBM Systems Dev Rochester, elopment Division Minnesota
OPTIMIZING A VIDEO PREPROCESSOR FOR OCR MR IBM Systems Dev Rochester, elopment Division Minnesota Summary This paper describes how optimal video preprocessor performance can be achieved using a software
More informationKeywords: Extraction, Training, Classification 1. INTRODUCTION 2. EXISTING SYSTEMS
ISSN XXXX XXXX 2017 IJESC Research Article Volume 7 Issue No.5 Forex Detection using Neural Networks in Image Processing Aditya Shettigar 1, Priyank Singal 2 BE Student 1, 2 Department of Computer Engineering
More informationA novel firing rule for training Kohonen selforganising
A novel firing rule for training Kohonen selforganising maps D. T. Pham & A. B. Chan Manufacturing Engineering Centre, School of Engineering, University of Wales Cardiff, P.O. Box 688, Queen's Buildings,
More informationData Mining. Neural Networks
Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most
More informationInital Starting Point Analysis for K-Means Clustering: A Case Study
lemson University TigerPrints Publications School of omputing 3-26 Inital Starting Point Analysis for K-Means lustering: A ase Study Amy Apon lemson University, aapon@clemson.edu Frank Robinson Vanderbilt
More informationUnderstanding Competitive Co-evolutionary Dynamics via Fitness Landscapes
Understanding Competitive Co-evolutionary Dynamics via Fitness Landscapes Elena Popovici and Kenneth De Jong George Mason University, Fairfax, VA 3 epopovic@gmu.edu, kdejong@gmu.edu Abstract. Co-evolutionary
More informationIntroduction to ANSYS DesignXplorer
Lecture 5 Goal Driven Optimization 14. 5 Release Introduction to ANSYS DesignXplorer 1 2013 ANSYS, Inc. September 27, 2013 Goal Driven Optimization (GDO) Goal Driven Optimization (GDO) is a multi objective
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationCOMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 6: k-nn Cross-validation Regularization
COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 6: k-nn Cross-validation Regularization LEARNING METHODS Lazy vs eager learning Eager learning generalizes training data before
More informationEMO A Real-World Application of a Many-Objective Optimisation Complexity Reduction Process
EMO 2013 A Real-World Application of a Many-Objective Optimisation Complexity Reduction Process Robert J. Lygoe, Mark Cary, and Peter J. Fleming 22-March-2013 Contents Introduction Background Process Enhancements
More informationIntroduction to Deep Learning
ENEE698A : Machine Learning Seminar Introduction to Deep Learning Raviteja Vemulapalli Image credit: [LeCun 1998] Resources Unsupervised feature learning and deep learning (UFLDL) tutorial (http://ufldl.stanford.edu/wiki/index.php/ufldl_tutorial)
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationTracking Changing Extrema with Particle Swarm Optimizer
Tracking Changing Extrema with Particle Swarm Optimizer Anthony Carlisle Department of Mathematical and Computer Sciences, Huntingdon College antho@huntingdon.edu Abstract The modification of the Particle
More informationA Genetic Algorithm-Based Approach for Building Accurate Decision Trees
A Genetic Algorithm-Based Approach for Building Accurate Decision Trees by Z. Fu, Fannie Mae Bruce Golden, University of Maryland S. Lele,, University of Maryland S. Raghavan,, University of Maryland Edward
More informationIMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM
Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve
More information