ADAPTATION OF THE UOBYQA ALGORITHM FOR NOISY FUNCTIONS
Proceedings of the 2006 Winter Simulation Conference, L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds.

ADAPTATION OF THE UOBYQA ALGORITHM FOR NOISY FUNCTIONS

Geng Deng, Department of Mathematics, University of Wisconsin, 480 Lincoln Dr., Madison, WI 53706, U.S.A.
Michael C. Ferris, Computer Sciences Department, University of Wisconsin, 1210 W. Dayton Street, Madison, WI 53706, U.S.A.

ABSTRACT

In many real-world optimization problems, the objective function may come from a simulation evaluation, so that it is (a) subject to various levels of noise, (b) not differentiable, and (c) computationally hard to evaluate. In this paper, we modify Powell's UOBYQA algorithm to handle such real-world simulation problems. Our modifications apply Bayesian techniques to guide appropriate sampling strategies for estimating the objective function. We aim to make the underlying UOBYQA algorithm proceed efficiently while simultaneously controlling the amount of computational effort.

1 INTRODUCTION

Powell's UOBYQA (Unconstrained Optimization BY Quadratic Approximation) (Powell 2002) is a derivative-free algorithm designed for (small-scale) unconstrained optimization. The general structure of UOBYQA follows a model-based approach, which constructs a series of local quadratic models that approximate the objective function. The algorithm generates its iterates within a trust region framework (Nocedal and Wright 1999). It differs from a classical trust region method in that it creates quadratic models by interpolating a set of sample points instead of using the gradient and Hessian values of the objective function (thus making it a derivative-free tool). Besides UOBYQA, other model-based software includes WEDGE (Marazzi and Nocedal 2002) and NEWUOA (Powell 2004). We develop a variant of the original UOBYQA, called noisy UOBYQA, that is adapted for noisy optimization problems.
These are very common in real-world applications, in particular when the objective function is an associated measurement of an experimental simulation. In this situation, the objective value may be difficult to obtain, and the inaccuracy of the objective function often complicates the optimization process. For example, derivative values are typically unavailable, so many standard algorithms are not applicable. We consider the stochastic problem in the following parametric form:

min_{x ∈ R^n} F(x) = E[f(x, ω(x))]. (1)

The underlying function F(·) is unknown and must be estimated. The function f(·, ω(x)) is a sample response function, which is affected by a random factor ω(x) at x. We normally assume the noise term ω(x) has mean zero and is independently distributed. As a special case, we consider ω(x) as white noise in this paper and use the additive form f(x, ω(x)) = F(x) + ω(x). When noise is present, UOBYQA may behave precariously. For example, a subproblem that minimizes the quadratic model within a trust region may generate poor solutions. The idea of our modification is to control the random error by averaging multiple evaluations per point, helping the algorithm to proceed appropriately. We therefore face the primary issue of handling the tradeoff between two objectives: having an efficient algorithm versus minimizing computational effort. On the one hand, we sample more replications to increase the accuracy of the estimate of the underlying function F, and hence the accuracy of the algorithm; on the other hand, we want to use as little computation as possible. In our modifications, we apply Bayesian techniques to establish rules for evaluating the appropriateness of performing a step in UOBYQA. Only when such rules are satisfied do we proceed with that step of the algorithm. The noisy UOBYQA algorithm shares similarities with Response Surface Methodology (RSM) (Box and Draper 1986).
Both methods construct a series of models of the simulation response during the optimization process. However, many features implemented in UOBYQA, such as the quadratic model update and trust region control, have advantages over the classical RSM approach.
The remainder of the paper is arranged as follows. In Section 2, we give a brief outline of the deterministic UOBYQA algorithm. In Section 3, we present the noisy UOBYQA algorithm, including the following modifications: stabilizing the quadratic models, comparing two candidate points, and a new termination criterion. We also show how to optimally allocate computational resources in these procedures. In Section 4, we apply the new noisy UOBYQA algorithm to several numerical examples and compare it with other noisy optimization algorithms. Finally, we design and solve a real simulation model.

2 THE UOBYQA ALGORITHM

We first outline the structure of the standard UOBYQA in the noiseless case. In the following description, we denote the deterministic objective function as f(·). As mentioned above, UOBYQA follows the framework of a model-based approach. Starting the algorithm requires an initial trial point x_0 and an initial trust region radius Δ_0. At each iteration k, the derivative information about the objective function f is contained in a quadratic model

Q_k(x_k + s) = c_Q + g_Q^T s + (1/2) s^T G_Q s, (2)

which is constructed by interpolating a set of well-positioned points I = {y^1, y^2, ..., y^L},

Q_k(y^i) = f(y^i), i = 1, 2, ..., L.

The point x_k acts as the center of the trust region and is the best point in the set I. The interpolative model is expected to approximate f well around the base point x_k, so that the parameters c_Q, g_Q, and G_Q approximate the Taylor expansion coefficients of f around x_k. To ensure a unique quadratic interpolator, the number of interpolating points should satisfy

L = (1/2)(n + 1)(n + 2). (3)

As in a classical trust region method, a new promising point is found from a subproblem:

min_{s ∈ R^n} Q_k(x_k + s), subject to ||s||_2 ≤ Δ_k. (4)

The solution s* of (4) is accepted (or not) by evaluating the degree of agreement between f and Q_k:

γ = (f(x_k) − f(x_k + s*)) / (Q_k(x_k) − Q_k(x_k + s*)). (5)

If the ratio γ is large enough and the new point maintains the good geometry of the set I, then the point is accepted into I. Otherwise, we either improve the geometry of I or update the trust region radius. We always keep track of the best point in the set I. Whenever a new point x^+ enters, it is compared with x_k to determine the best point, which becomes the next iterate:

x_{k+1} = argmin { f(x_k), f(x^+) }.

The following outline of the UOBYQA algorithm contains only the key points regarding the modifications; details can be found in (Powell 2002).

The UOBYQA Algorithm
A starting point x_0, an initial trust region radius Δ_0, and a terminating trust region radius Δ_end are given.
1. Generate initial trial points in the interpolation set I. Let iterate x_1 be the best point in I.
2. For iterations k = 1, 2, ...
   (a) Construct a quadratic model Q_k(x) of the form (2) that interpolates points in I.
   (b) Solve the trust region subproblem (4).
   (c) Evaluate the function at the new point x_k + s* and compute the agreement ratio γ in (5).
   (d) Test whether the point is acceptable in the set I. If not, improve the quality of I or update the trust region radius Δ_k. If a point is added to the set I, another element in I should be removed.
   (e) When a new point x^+ is added, determine the best point in I. If Δ_k ≤ Δ_end, terminate the loop.
3. Evaluate and return the final solution point.

3 MODIFICATIONS

In the following subsections, we present our modifications to the UOBYQA algorithm, addressing the points at which the algorithm is problematic in the noisy case.

3.1 Reducing Quadratic Model Variance

When there is uncertainty in the objective function, the presence of noise can cause erroneous estimates of the coefficients c_Q, g_Q, G_Q of the quadratic model Q(x) and, as a result, generate an unstable solution x_k + s*.
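To make concrete the deterministic machinery that noise perturbs, the following minimal sketch (our illustration in Python, not from the paper) builds Q_k by interpolating f on L = 6 well-positioned points for n = 2, crudely approximates the subproblem (4) by random search over the trust region, and computes the agreement ratio γ of (5). The particular point set and the random-search solver are simplifying assumptions; UOBYQA itself uses Lagrange functions and an exact trust-region solver.

```python
import numpy as np

def basis(s):
    """Monomial basis of a bivariate quadratic: 1, s1, s2, s1^2, s1*s2, s2^2."""
    s1, s2 = s
    return np.array([1.0, s1, s2, s1 * s1, s1 * s2, s2 * s2])

def fit_quadratic(points, fvals):
    """Interpolate f on L = (n+1)(n+2)/2 = 6 points (n = 2): solve for the coefficients."""
    A = np.array([basis(p) for p in points])
    coef = np.linalg.solve(A, fvals)
    return lambda x: float(coef @ basis(x))

def solve_subproblem(Q, xk, radius, n_samples=20000, seed=0):
    """Crude random-search surrogate for the trust-region subproblem (4)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n_samples, 2))
    # Rescale to points uniformly distributed in the ball ||s|| <= radius.
    d *= radius * rng.uniform(size=(n_samples, 1)) ** 0.5 / np.linalg.norm(d, axis=1, keepdims=True)
    vals = [Q(xk + di) for di in d]
    return d[int(np.argmin(vals))]

f = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2  # Rosenbrock, n = 2
xk = np.array([-1.2, 1.0])
# A unisolvent set of 6 offsets from x_k (an illustrative choice).
pts = [xk + np.array(o) for o in [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]]
fv = np.array([f(p) for p in pts])
Q = fit_quadratic(pts, fv)
s_star = solve_subproblem(Q, xk, radius=2.0)
gamma = (f(xk) - f(xk + s_star)) / (Q(xk) - Q(xk + s_star))  # agreement ratio (5)
```

When f itself were quadratic, Q would reproduce it exactly and γ would equal 1; for the quartic Rosenbrock function, γ measures how trustworthy the model step is.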
To reduce the variance of the quadratic model, we consider generating multiple function values at the points y^j, j = 1, 2, ..., L, in the set I and using the averaged function values for the interpolation process. In UOBYQA, the quadratic function is constructed as a linear combination of Lagrange functions l_j(x),

Q_k(x) = sum_{j=1}^{L} f(y^j) l_j(x), x ∈ R^n, (6)

where L is the cardinality of the set I, as defined in (3). Each l_j(x) is a quadratic polynomial from R^n to R,

l_j(x_k + s) = c_j + g_j^T s + (1/2) s^T G_j s, j = 1, 2, ..., L,

with the property l_j(y^i) = δ_ij, i = 1, 2, ..., L, where δ_ij is 1 if i = j and 0 otherwise. It follows from (2) and (6) that the parameters of Q_k are given by

c_Q = sum_{j=1}^{L} f(y^j) c_j, g_Q = sum_{j=1}^{L} f(y^j) g_j, G_Q = sum_{j=1}^{L} f(y^j) G_j. (7)

Note that the parameters c_j, g_j, and G_j of each Lagrange function l_j are uniquely determined once the y^j are given, regardless of the objective function f(·). When we have r_j function evaluations at the point y^j, we can compute more accurate estimates of c_Q, g_Q, and G_Q by using mean values in (7) in place of f(y^j). We employ Bayesian tools to analytically quantify the distributions of the parameters in (7), which helps us determine the appropriate number of evaluations r_j. In the Bayesian framework, the unknown mean μ(y^j) and variance σ²(y^j) of f(y^j, ω) are treated as random variables whose distributions are inferred by Bayes' rule. Assuming a non-informative prior distribution, we can estimate the joint posterior distributions of μ(y^j) and 1/σ²(y^j) as

1/σ²(y^j) | X ~ Gamma((r_j − 1)/2, σ̂²(y^j)(r_j − 1)/2),
μ(y^j) | σ²(y^j), X ~ N(μ̄(y^j), σ²(y^j)/r_j). (8)

Here we denote the sample mean and sample variance of the data by μ̄(y^j) and σ̂²(y^j). The gamma distribution Gamma(α, β) has mean α/β and variance α/β². We use a non-informative prior since there is no prior information on the real distribution. The distribution of the mean value μ(y^j) is of most interest to us. When the sample size is large, we can replace the variance σ²(y^j) with the sample variance σ̂²(y^j) in (8) and asymptotically derive the posterior distribution of μ(y^j) | X as

μ(y^j) | X ~ N(μ̄(y^j), σ̂²(y^j)/r_j). (9)

According to (7), we treat c_Q, g_Q, and G_Q as random variables from a Bayesian perspective.
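The large-sample posterior (9) is simple to compute from replicated observations. The short sketch below (ours, not the paper's code) returns the posterior mean and posterior variance of μ(y^j) from r samples; the simulated response with F = 5 and noise standard deviation 0.1 is an arbitrary illustrative assumption.

```python
import numpy as np

def posterior_of_mean(samples):
    """Large-sample posterior (9): mu(y_j) | X ~ N(sample mean, sample variance / r_j)."""
    r = len(samples)
    mu_hat = float(np.mean(samples))
    s2_hat = float(np.var(samples, ddof=1))  # unbiased sample variance
    return mu_hat, s2_hat / r                # posterior mean and posterior variance

rng = np.random.default_rng(1)
noisy = lambda r: 5.0 + 0.1 * rng.normal(size=r)  # f(y_j, w) = F(y_j) + w, F(y_j) = 5 assumed

m3, v3 = posterior_of_mean(noisy(3))    # r_j = 3: wide posterior
m60, v60 = posterior_of_mean(noisy(60)) # r_j = 60: posterior variance shrinks like 1/r_j
```

Increasing r_j shrinks the posterior variance at rate 1/r_j, which is exactly the lever the allocation procedures below exploit.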
These coefficients all follow normal distributions, with means estimated using (9) as (all of these are posterior estimates)

E[c_Q] = E[sum_{j=1}^{L} f(y^j, ω) c_j] = E[sum_{j=1}^{L} μ(y^j) c_j] = sum_{j=1}^{L} μ̄(y^j) c_j,
E[g_Q] = sum_{j=1}^{L} μ̄(y^j) g_j,
E[G_Q] = sum_{j=1}^{L} μ̄(y^j) G_j, (10)

and with variances estimated as

var(c_Q) = var(sum_{j=1}^{L} μ(y^j) c_j) = sum_{j=1}^{L} c_j² σ̂²(y^j)/r_j,
var(g_Q(i′)) = sum_{j=1}^{L} g_j(i′)² σ̂²(y^j)/r_j,
var(G_Q(i′, j′)) = sum_{j=1}^{L} G_j(i′, j′)² σ̂²(y^j)/r_j, i′, j′ = 1, ..., n. (11)

As r_j increases to infinity, the variance decreases to zero. A more precise computation involving the Student's t-distribution was found to perform similarly.

To increase the stability of a quadratic model, we want to quantify how the randomness in the coefficients c_Q, g_Q, and G_Q affects the solution of the subproblem (4). Suppose we solve N_t trial subproblems whose quadratic model coefficients are drawn from their posterior distributions; we expect the solutions s*(i), i = 1, 2, ..., N_t, to have a small overall variance (see Figure 1). Therefore, we introduce a criterion that constrains the standard deviation of the solutions in each coordinate direction, requiring it to be smaller than a threshold value β times the trust region radius:

max_{j=1,...,n} std([s*(1)(j), s*(2)(j), ..., s*(N_t)(j)]) ≤ β Δ_k. (12)

Increasing r_j helps reduce the volatility of the coefficients, and thus the standard deviations of the solutions. Therefore, satisfying the above criterion necessitates a sufficiently large r_j at each point y^j.

Figure 1: The trial solutions within a trust region are projected onto each coordinate direction, and the standard deviations of the projected values are evaluated.

Sequentially allocating computational resources: A resource allocation question now arises (given r_j function evaluations at site y^j): to which site should new function replications be assigned in order to satisfy the constraint (12) with the minimum total number of function evaluations? From the definition of Q_k(x) in (2), we know that only g_Q and G_Q matter for determining the solution s* of the subproblem (4). Instead of satisfying the constraint directly, we aim to control the variance of the most volatile coefficient. We minimize the largest ratio of standard deviation to expected value,

max_{i′, j′ = 1,...,n} ( std(g_Q(i′)) / E[g_Q(i′)], std(G_Q(i′, j′)) / E[G_Q(i′, j′)] ). (13)

We propose a fast but sub-optimal strategy here, one that assigns additional computational resources sequentially. We assume that when an additional batch of Δr replications at site y^j is newly produced, the sample mean μ̄(y^j) and sample variance σ̂²(y^j) remain invariant. Under this assumption, the posterior variance of μ(y^j) (see (9)) changes from σ̂²(y^j)/r_j to σ̂²(y^j)/(r_j + Δr). First, let φ(r̄) (here r̄ = [r_1, r_2, ..., r_L]) denote the largest quantity in (13):

φ(r̄) = max_{i′, j′ = 1,...,n} ( sqrt(sum_{j=1}^{L} g_j(i′)² σ̂²(y^j)/r_j) / sum_{j=1}^{L} μ̄(y^j) g_j(i′),
                               sqrt(sum_{j=1}^{L} G_j(i′, j′)² σ̂²(y^j)/r_j) / sum_{j=1}^{L} μ̄(y^j) G_j(i′, j′) ). (14)

To achieve a good allocation scheme, we invest the new resources in the point y^j that most sharply decreases the quantity φ(r̄). After assigning Δr samples to the point y^j, we obtain a new vector r̄ + Δr e_j, where e_j is the standard unit vector in R^L with 1 in the jth component. We select the site y^j with the index

argmin_j φ(r̄ + Δr e_j). (15)

The best index is determined by comparing the L different possible options.

Procedure to stabilize the quadratic model: Given an initial sample size r_0, a batch size Δr, and the threshold value β:
1. Generate r_0 function evaluations at each point to pre-estimate the sample mean and sample variance. Set r_j ← r_0, j = 1, 2, ..., L.
2. Determine the largest quantity φ(r̄), which corresponds to the most volatile coefficient.
3. Select the site for further evaluations using (15).
4. Evaluate Δr function replications at the selected site and update the sample mean and sample variance.
5.
Repeat Steps 2-4 until constraint (12) is satisfied.

3.2 Selecting the Best Point

If x^+ is a new point entering the interpolation set I, we encounter the problem of selecting the best point in I. Since x_k is known to be the best point in the previous set, the question becomes how to determine the order of x_k and x^+. When there is noise in the function output, we need more precise estimates of the underlying function values to make the correct decision. Without loss of generality, the problem is equivalent to solving the discrete optimization problem

argmin_{j ∈ {1,2}} E[f(y^j, ω)]. (16)

An extension of the problem is to select the best point among K (K ≥ 2) points. This general problem is often called selecting the best system, which is considered one of the principal problems in discrete-event simulation studies. Kim and Nelson (2003) provide a good survey of IZ-R&S procedures. In our case, since there are only two points involved, we implement a simplified variant of the ranking and selection procedure OCBA (Chen, Chen, and Yucesan 1999). Suppose we have replicated r_j function values at y^j, and let μ(y^j) and σ²(y^j) be the unknown mean and variance of the output at y^j. The point evidenced by the smaller sample mean is selected:

choose point 1 if μ̄(y^1) ≤ μ̄(y^2); choose point 2 otherwise.

We quantify our decision-making by the so-called probability of correct selection (PCS):

PCS = Pr(select y^(1) | μ(y^(1)) ≤ μ(y^(2))). (17)

Here the notation μ(y^(1)) ≤ μ(y^(2)) reveals the underlying order of the means. In OCBA, Bayesian methods are applied to estimate the posterior PCS. The idea is to construct a posterior distribution for the underlying mean μ(y^j) using the observed data and to estimate the PCS via the joint distribution. We increase the number of samples until the desired accuracy is achieved,

PCS ≥ 1 − α, (18)

where α represents a significance level. We perform a similar analysis using Bayesian inference as in Section 3.1.
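For the two-point case, the posterior PCS reduces to a normal tail probability, and the sequential allocation rule gives the next batch to whichever point most reduces the joint posterior variance. The stdlib-only sketch below is our illustration of this machinery under the selection rule and accuracy target (17)-(18); the function names are ours.

```python
import math

def pcs(mu1, mu2, s2_1, s2_2, r1, r2):
    """Posterior PCS after selecting point 1: Pr(N(mu1 - mu2, s1^2/r1 + s2^2/r2) <= 0),
    where mu/s2 are sample means/variances and r1, r2 are the replication counts."""
    sd = math.sqrt(s2_1 / r1 + s2_2 / r2)
    z = (0.0 - (mu1 - mu2)) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

def next_point(s2_1, s2_2, r1, r2, batch):
    """Allocation rule: give the next batch to whichever point most shrinks the
    posterior variance psi = s1^2/r1 + s2^2/r2 (the sample moments are held fixed)."""
    psi_if_1 = s2_1 / (r1 + batch) + s2_2 / r2
    psi_if_2 = s2_1 / r1 + s2_2 / (r2 + batch)
    return 1 if psi_if_1 < psi_if_2 else 2
```

For example, with sample means 0 and 1, unit sample variances, and 10 replications each, the PCS is already above 0.95; if point 1 were four times as noisy, the rule would send the next batch there.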
Without loss of generality, suppose we observe μ̄(y^1) ≤ μ̄(y^2) and select the first point. The PCS corresponds to the probability of the event {μ(y^1) ≤ μ(y^2)}:

PCS = Pr(μ(y^1) ≤ μ(y^2)) ≈ Pr(μ(y^1) ≤ μ(y^2) | X) = Pr(μ(y^1) − μ(y^2) ≤ 0 | X). (19)

The approximate PCS in (19) is a tail probability of a normal distribution:

PCS ≈ Pr( N( μ̄(y^1) − μ̄(y^2), σ̂²(y^1)/r_1 + σ̂²(y^2)/r_2 ) ≤ 0 ). (20)

Sequentially allocating computational resources: We consider a sequential allocation strategy that assigns new computational resources to the points y^1 and y^2 so that the fewest function evaluations are used to satisfy the rule (18). The changes of the sample mean and sample variance follow the assumptions in Section 3.1. To determine the potentially better point, we compare the derivative values

max_{j ∈ {1,2}} ∂/∂r_j Pr( N( μ̄(y^1) − μ̄(y^2), σ̂²(y^1)/r_1 + σ̂²(y^2)/r_2 ) ≤ 0 ). (21)

It is not hard to see that, since the mean of the joint distribution μ(y^1) − μ(y^2) is unchanged, we desire the largest decrease in the variance ψ(r̄) := σ̂²(y^1)/r_1 + σ̂²(y^2)/r_2. Therefore, the problem (21) becomes

min_{j ∈ {1,2}} ψ(r̄ + Δr e_j). (22)

This determines the index of the potentially better point.

Procedure to select the best point in I: Given an initial sample size r_0, a batch size Δr, and a significance parameter α:
1. Generate r_0 function evaluations at each point to pre-estimate the sample mean and sample variance. Set r_j ← r_0, j = 1, 2.
2. Select the point to carry out further replications using (22).
3. Evaluate Δr further function replications at the selected point and update the sample mean and sample variance.
4. Repeat Steps 2 and 3 until the constraint (18) is satisfied.

3.3 New Termination Criterion

In UOBYQA and other model-based optimization algorithms, a test on the norm of the gradient ||g_k|| or on the trust region size Δ_k is typically used as the termination criterion; e.g., Δ_k ≤ Δ_end = 10^{-4}. However, these criteria are not suitable for noisy cases. As Δ_k gets smaller, the estimated value of G_Q, which gauges the curvature of the quadratic model, will approach zero.
This will inevitably require many more function evaluations in later iterations to reduce the variance of the quadratic model in order to retain accuracy. The new termination criterion is designed to relax the value of Δ_end so that the algorithm terminates much earlier. We specify a parameter N_max that controls the maximum number of replications per site in the algorithm; N_max represents the amount of computing we are willing to spend at any site. At each iteration k, we check whether points x on the edge of the subregion {x : ||x − x_k|| ≤ Δ_k} are separable from the center x_k, given the bound N_max on the number of replications per site. Here separable means that one point is better than the other with high accuracy (PCS ≥ 1 − α). The difficulty lies in estimating the sample mean μ̄(x) and sample variance σ̂²(x) of an edge point x, which demands additional function evaluations. We simplify this procedure by performing the separability test using Q_k(·) instead of the original F(·), because Q_k is a good surrogate for the underlying mean function F when Δ_k is small. We can (a) approximate μ̄(x) with Q_k(x) and (b) approximate σ̂²(x) with σ̂²(x_k). The second approximation is valid because the variances of the function output at two points are very close within a small trust region. In practice, we enumerate the 2n edge points of {x : ||x − x_k|| ≤ Δ_k} that lie in the standard coordinate directions from x_k and check their separability from x_k; these 2n points form a representative set of edge points. The stopping criterion is met when a portion of the representative point set, e.g., 80% of the 2n points, is separable from x_k. In practice, we first calculate a least separable distance d. Using the posterior distribution with the above approximations,

μ(x) − μ(x_k) | X ~ N( F(x) − F(x_k), (σ̂²(x) + σ̂²(x_k))/N_max ) ≈ N( Q_k(x) − Q_k(x_k), 2σ̂²(x_k)/N_max ),

we compute d satisfying

Pr( N(d, 2σ̂²(x_k)/N_max) ≥ 0 ) ≥ 1 − α

via an inverse cdf evaluation. Then we say x and x_k are separable if and only if the difference Q_k(x) − Q_k(x_k) ≥ d.
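The least separable distance d and the 80% edge-point test can be sketched as follows (a stdlib-only illustration under the approximations above; the bisection-based inverse normal CDF is our own helper, not part of the paper's method).

```python
import math

def inv_norm_cdf(p):
    """Inverse standard normal CDF via bisection on erf (stdlib only)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def least_separable_distance(s2_center, n_max, alpha):
    """Smallest gap d with Pr(N(d, 2*s2_center/N_max) >= 0) >= 1 - alpha."""
    return inv_norm_cdf(1.0 - alpha) * math.sqrt(2.0 * s2_center / n_max)

def should_terminate(q_edges, q_center, s2_center, n_max, alpha, frac=0.8):
    """Stop when at least `frac` of the 2n representative edge points have
    model value at least d above Q_k(x_k), i.e. are separable from x_k."""
    d = least_separable_distance(s2_center, n_max, alpha)
    separable = [q - q_center >= d for q in q_edges]
    return sum(separable) >= frac * len(separable)
```

With the defaults used later in the paper (α = 0.2, N_max = 60) and a sample variance of 0.01 at the center, d comes out to roughly 0.015, so edge points whose model values exceed Q_k(x_k) by that much count as separable.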
4 NUMERICAL RESULTS

4.1 Numerical Functions

We tested the noisy UOBYQA algorithm on several numerical examples and compared it with two other noisy optimization tools, NOMADm (Nonlinear Optimization for Mixed variables and Derivatives in Matlab) (Abramson 2006) and SPSA (Simultaneous Perturbation Stochastic Approximation) (Spall 2003). NOMADm is a pattern search algorithm that implements ranking and selection; SPSA is a line search algorithm that applies simultaneous perturbation for gradient estimation. The test function we employed was the well-known extended Rosenbrock function:

F(x) = sum_{i=1}^{n−1} [ 100(x_{i+1} − x_i²)² + (x_i − 1)² ].

The function has a global optimum at the point (1, 1, ..., 1) ∈ R^n, at which the optimal objective value is 0. A noisy optimization problem was formulated by adding a white noise term to the objective (Section 1). We also applied our algorithm to the ARGLINC function, with very similar results. We assume the random term ω is independent of x and follows a normal distribution with mean zero and variance σ², indicating the level of noise. The starting point x_0 was (−1.2, 1, −1.2, 1, ..., −1.2, 1) ∈ R^n, and the initial trust region radius Δ_0 was set to 2. Table 1 presents the details of a single run of noisy UOBYQA on the two-dimensional Rosenbrock function. The variance σ² was set to 0.01, which was moderately noisy among our tested noise levels. We used the following default setting of parameters: initial sampling number r_0 = 3, which is small but enough to generate a sample mean and sample variance; significance level α = 0.2; threshold value β = 0.4; number of trial solutions N_t = 20; and maximum number of evaluations per point N_max = 60. The new termination criterion terminated the algorithm earlier, at iteration 81. In fact, we did not observe noticeable improvement in objective value between iterations 80 and 120, because the presence of noise deteriorated the accuracy in these iterations.
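The extended Rosenbrock test function and its noisy version (the additive white-noise form of Section 1) can be written down directly; this short sketch is our own, with σ² = 0.01 as in the experiment above.

```python
import math
import numpy as np

def rosenbrock(x):
    """Extended Rosenbrock: F(x) = sum_i 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def noisy_rosenbrock(x, sigma2=0.01, rng=None):
    """Sample response f(x, w) = F(x) + w with white noise w ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    return rosenbrock(x) + math.sqrt(sigma2) * rng.standard_normal()
```

At the standard starting point (−1.2, 1) the noiseless value is 24.2, so in early iterations the noise (standard deviation 0.1) is small relative to the function values, consistent with the observation that few replications are needed there.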
In order to keep the algorithm running while maintaining correct decisions, a large number of function evaluations would have been necessary; it was therefore preferable to stop the algorithm early. In the earlier iterations, the algorithm used relatively few function replications, which implies that when noise effects are small compared to the large function values, the basic operation of the method is unchanged.

Table 1: The performance of noisy UOBYQA for the Rosenbrock function, with n = 2 and σ² = 0.01. The table reports, per iteration k, the cumulative number of function evaluations (FN), F(x_k), and Δ_k, marking where the new termination criterion stops the algorithm and where the classical criterion Δ_k ≤ 10^{-4} would stop it.

In Table 3, we further compare numerical results of noisy UOBYQA, NOMADm, and SPSA. We tested the Rosenbrock function in dimensions 2 and 10. Increasing the dimension significantly increased the computational burden; solving the 10-dimensional problem required many more function evaluations in total. The scales of the noise variance σ² were specified as 0.001, 0.01, 0.1, and 1, and we constrained the algorithms to various total numbers of function evaluations. The parameter setting was the same as before, except that we used a different N_max when the total number of function evaluations or the variance changed. A practically useful formula for N_max was

N_max = ( Max FN / Iteration #(n) ) · δ(σ²).

Here Iteration #(n) represents the estimated number of iterations that the algorithm should run for a given dimension n. This is a rough estimate, and the actual number of iterations varies across problems. δ(σ²) is an adjustment value associated with the variance σ²; the algorithm uses a relatively smaller N_max when the variance is small. We provide suggested values in Table 2.

Table 2: Suggested values to define N_max, listing Iteration #(n) for each dimension n and δ(σ²) for each noise level σ².

As shown in Table 3, in the 2-dimensional case our algorithm performed better than the other two algorithms: its mean value was the smallest among the peer algorithms. In the 10-dimensional case, when the variance σ² was large, our algorithm did not perform well. Note that making the variance 10 times larger, by our earlier theoretical analysis, potentially requires 10 times more function evaluations to achieve the same scale of accuracy. That is why the quality of the solutions
decreased sharply as σ² increased. Another aspect that influences performance may be the dimension of the problem. Powell has pointed out that in higher-dimensional problems UOBYQA may not be practically useful because the number of interpolation points L is huge. We think that noisy UOBYQA inherits the same limitation.

Table 3: Noisy UOBYQA applied to the Rosenbrock function (results are based on 10 replications of the algorithms). For each dimension n ∈ {2, 10}, noise level σ², and budget Max FN, the table reports the mean and error of the final objective values for noisy UOBYQA, NOMADm, and SPSA.

4.2 Simulation Problems

In this subsection, we applied the noisy UOBYQA algorithm to solve a pricing problem via simulation. The parameters we considered were the prices of M similar goods in a store, say p_1, p_2, ..., p_M. When a customer arrives at the store, he is sequentially exposed to the M goods by the store keeper, from the most expensive good, which is of the best quality, to the cheapest. He decides after viewing each item whether to purchase it, but he will buy at most one of them. For good i, we set the probability that the customer purchases the item to

Prob_i = exp(−p_i/η_i), i = 1, 2, ..., M.

The objective of the store keeper is to determine the combination of prices p = [p_1, p_2, ..., p_M] that yields the largest expected profit. Running the simulation for a period of time, if we denote the total number of customers by m and the number of customers who purchase good i by m_i, we formulate the stochastic problem as

max_p E[Profit], where Profit := sum_{i=1}^{M} (m_i/m) p_i.

The precise solution is obtained from

max_p sum_{i=1}^{M} [ prod_{j=1}^{i−1} (1 − Prob_j) ] Prob_i p_i. (23)

We used Arena 9.0 from Rockwell Software to model the pricing system. We considered the cases M = 2 and M = 10, where M also indicates the dimension of the variable p. In the computation of Prob_i, the parameter η_i reflects the quality of the merchandise. We set η_1 = 50 and η_2 = 20 in the two-dimensional case, and η_1 = 50, η_2 = 48, ..., η_10 = 32 in the ten-dimensional case. The noisy UOBYQA algorithm and OptQuest (the optimization add-on for Arena) were compared on this stochastic simulation problem (see results in Table 4). We used the same parameter settings as in Section 4.1, except that we assigned an initial trust region radius of 10 and an upper bound on replication usage of 200 and 2000 for the two cases, respectively. By varying the number of customers generated in each simulation run, we obtained several levels of noise in the profit value; the table shows the different levels of variance of the output. We compared against the real optimal solutions computed with the same η_i via (23). As the table shows, noisy UOBYQA did a uniformly better job than OptQuest, producing higher-quality solutions: for all tested cases, the gap to the optimal solution was reduced by around 50%.

Table 4: Optimization results of the pricing model (average values over 10 runs), comparing noisy UOBYQA and OptQuest for M = 2 and M = 10 at several estimated output variances and budgets Max FN, against the real optimal objective values computed via (23).
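The exact objective (23) is cheap to evaluate directly. The sketch below (our illustration, not the Arena model) computes the expected profit for a price vector and, as a sanity check, recovers the known single-good optimum: for M = 1, the profit p·exp(−p/η) is maximized at p = η, with value η/e.

```python
import numpy as np

def expected_profit(p, eta):
    """Exact expected profit in (23): the customer views goods 1..M in order and buys
    good i with probability exp(-p_i / eta_i), provided nothing was bought earlier."""
    reach = 1.0   # probability the customer is still shopping when good i is shown
    total = 0.0
    for p_i, eta_i in zip(p, eta):
        prob_i = np.exp(-p_i / eta_i)
        total += reach * prob_i * p_i
        reach *= 1.0 - prob_i
    return total

# Sanity check: with a single good and eta = 50, a grid search recovers p* = 50.
grid = np.linspace(0.0, 200.0, 4001)
vals = [expected_profit([p], [50.0]) for p in grid]
best = grid[int(np.argmax(vals))]
```

For M > 1 the terms are coupled through the "reach" probabilities, which is why the paper solves (23) numerically rather than coordinate by coordinate.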
5 CONCLUSIONS

In this paper we modify Powell's UOBYQA algorithm to solve noisy optimization problems. UOBYQA belongs to the class of model-based algorithms that do not require evaluations of the gradient of the objective; the algorithm is therefore particularly applicable to real-world simulation problems. In the noisy setting, performing more function evaluations can reduce the randomness in function responses. Our approach applies analytical Bayesian inference to determine the appropriate number of replications and to provide statistical accuracy for the algorithm, while at the same time attempting to control the total computational effort. Note that our method does not assume Common Random Numbers (CRN) (Law and Kelton 2000) are implemented in the simulations; therefore, we cannot generate a deterministic approximation (1/N) sum_{i=1}^{N} f(·, ω_i) to F(·) by fixing a sequence of samples ω_i, i = 1, 2, ..., N. The use of CRN is a topic for future research. The modifications can be generalized to other model-based algorithms, such as the WEDGE algorithm. They are not intended for linear-model-based algorithms, since linear models are more sensitive to noise than quadratic models; in a stochastic algorithm, quadratic models are robust against noise and preferable.

REFERENCES

Abramson, M. 2006. NOMADm: mesh adaptive direct search Matlab toolbox. <http:// MAbramson/NOMADm.html>.
Box, G. E. P., and N. R. Draper. 1986. Empirical model-building and response surfaces. New York: John Wiley & Sons.
Chen, H.-C., C.-H. Chen, and E. Yucesan. 1999. An asymptotic allocation for simultaneous simulation experiments. In Proceedings of the 1999 Winter Simulation Conference.
Kim, S.-H., and B. L. Nelson. 2003. Selecting the best system: theory and methods. In Proceedings of the 2003 Winter Simulation Conference.
Law, A., and W. Kelton. 2000. Simulation modeling and analysis. 3rd ed. New York: McGraw-Hill.
Marazzi, M., and J. Nocedal. 2002. Wedge trust region methods for derivative free optimization. Mathematical Programming 91.
Nocedal, J., and S. J. Wright. 1999. Numerical optimization. New York: Springer.
Powell, M. J. D. 2002. UOBYQA: Unconstrained optimization by quadratic approximation. Mathematical Programming 92.
Powell, M. J. D. 2004. The NEWUOA software for unconstrained optimization without derivatives. DAMTP Report 2004/NA05, University of Cambridge.
Spall, J. C. 2003. Introduction to stochastic search and optimization. New York: John Wiley & Sons.

ACKNOWLEDGMENTS

This research was partially supported by Air Force Office of Scientific Research Grant FA and National Science Foundation Grants DMS and IIS.

AUTHOR BIOGRAPHIES

GENG DENG is a Ph.D. student at the University of Wisconsin-Madison. He is interested in simulation-based optimization, dynamic programming, and applied statistics. He is a member of INFORMS, SIAM, and AMS. He can be reached via <geng@cs.wisc.edu>.

MICHAEL C. FERRIS is Professor of Computer Sciences and Industrial Engineering at the University of Wisconsin-Madison. His research interests are in computational algorithms and modeling, coupled with applications in economics, structural and mechanical engineering, and medicine. His work emphasizes practical solution of large-scale optimization and complementarity problems. He is a member of the Mathematical Programming Society, SIAM, and INFORMS. His email and web addresses are <ferris@cs.wisc.edu> and < ferris>.
More informationSurrogate Gradient Algorithm for Lagrangian Relaxation 1,2
Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationProceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds.
Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds. A QUANTILE-BASED NESTED PARTITION ALGORITHM FOR BLACK-BOX FUNCTIONS
More informationA New Statistical Procedure for Validation of Simulation and Stochastic Models
Syracuse University SURFACE Electrical Engineering and Computer Science L.C. Smith College of Engineering and Computer Science 11-18-2010 A New Statistical Procedure for Validation of Simulation and Stochastic
More informationSELECTION OF THE BEST WITH STOCHASTIC CONSTRAINTS. Engineering 3211 Providence Drive Black Engineering 3004
Proceedings of the 2009 Winter Simulation Conference M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin and R. G. Ingalls, eds. SELECTION OF THE BEST WITH STOCHASTIC CONSTRAINTS Alireza Kabirian Sigurdur
More informationMetaheuristic Optimization with Evolver, Genocop and OptQuest
Metaheuristic Optimization with Evolver, Genocop and OptQuest MANUEL LAGUNA Graduate School of Business Administration University of Colorado, Boulder, CO 80309-0419 Manuel.Laguna@Colorado.EDU Last revision:
More informationA Deterministic Dynamic Programming Approach for Optimization Problem with Quadratic Objective Function and Linear Constraints
A Deterministic Dynamic Programming Approach for Optimization Problem with Quadratic Objective Function and Linear Constraints S. Kavitha, Nirmala P. Ratchagar International Science Index, Mathematical
More informationComparison of Interior Point Filter Line Search Strategies for Constrained Optimization by Performance Profiles
INTERNATIONAL JOURNAL OF MATHEMATICS MODELS AND METHODS IN APPLIED SCIENCES Comparison of Interior Point Filter Line Search Strategies for Constrained Optimization by Performance Profiles M. Fernanda P.
More informationTheoretical Concepts of Machine Learning
Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5
More informationBayesian Sequential Sampling Policies and Sufficient Conditions for Convergence to a Global Optimum
Bayesian Sequential Sampling Policies and Sufficient Conditions for Convergence to a Global Optimum Peter Frazier Warren Powell 2 Operations Research & Information Engineering, Cornell University 2 Operations
More informationDETC APPROXIMATE MOTION SYNTHESIS OF SPHERICAL KINEMATIC CHAINS
Proceedings of the ASME 2007 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference IDETC/CIE 2007 September 4-7, 2007, Las Vegas, Nevada, USA DETC2007-34372
More informationMonte Carlo for Spatial Models
Monte Carlo for Spatial Models Murali Haran Department of Statistics Penn State University Penn State Computational Science Lectures April 2007 Spatial Models Lots of scientific questions involve analyzing
More informationMetaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini
Metaheuristic Development Methodology Fall 2009 Instructor: Dr. Masoud Yaghini Phases and Steps Phases and Steps Phase 1: Understanding Problem Step 1: State the Problem Step 2: Review of Existing Solution
More informationDigital Image Processing Laboratory: MAP Image Restoration
Purdue University: Digital Image Processing Laboratories 1 Digital Image Processing Laboratory: MAP Image Restoration October, 015 1 Introduction This laboratory explores the use of maximum a posteriori
More informationTruss structural configuration optimization using the linear extended interior penalty function method
ANZIAM J. 46 (E) pp.c1311 C1326, 2006 C1311 Truss structural configuration optimization using the linear extended interior penalty function method Wahyu Kuntjoro Jamaluddin Mahmud (Received 25 October
More informationME 555: Distributed Optimization
ME 555: Distributed Optimization Duke University Spring 2015 1 Administrative Course: ME 555: Distributed Optimization (Spring 2015) Instructor: Time: Location: Office hours: Website: Soomin Lee (email:
More informationAPPLIED OPTIMIZATION WITH MATLAB PROGRAMMING
APPLIED OPTIMIZATION WITH MATLAB PROGRAMMING Second Edition P. Venkataraman Rochester Institute of Technology WILEY JOHN WILEY & SONS, INC. CONTENTS PREFACE xiii 1 Introduction 1 1.1. Optimization Fundamentals
More informationIntroduction to Stochastic Combinatorial Optimization
Introduction to Stochastic Combinatorial Optimization Stefanie Kosuch PostDok at TCSLab www.kosuch.eu/stefanie/ Guest Lecture at the CUGS PhD course Heuristic Algorithms for Combinatorial Optimization
More informationLocally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling
Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling Moritz Baecher May 15, 29 1 Introduction Edge-preserving smoothing and super-resolution are classic and important
More informationA Brief Look at Optimization
A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest
More informationA noninformative Bayesian approach to small area estimation
A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported
More informationAM205: lecture 2. 1 These have been shifted to MD 323 for the rest of the semester.
AM205: lecture 2 Luna and Gary will hold a Python tutorial on Wednesday in 60 Oxford Street, Room 330 Assignment 1 will be posted this week Chris will hold office hours on Thursday (1:30pm 3:30pm, Pierce
More informationA simulation-based optimization framework for urban traffic control
A simulation-based optimization framework for urban traffic control Carolina Osorio Michel Bierlaire December 3, 2 Report TRANSP-OR 23 Transport and Mobility Laboratory School of Architecture, Civil and
More informationSimulation Calibration with Correlated Knowledge-Gradients
Simulation Calibration with Correlated Knowledge-Gradients Peter Frazier Warren Powell Hugo Simão Operations Research & Information Engineering, Cornell University Operations Research & Financial Engineering,
More informationModified Augmented Lagrangian Coordination and Alternating Direction Method of Multipliers with
Modified Augmented Lagrangian Coordination and Alternating Direction Method of Multipliers with Parallelization in Non-hierarchical Analytical Target Cascading Yongsu Jung Department of Mechanical Engineering,
More informationYou ve already read basics of simulation now I will be taking up method of simulation, that is Random Number Generation
Unit 5 SIMULATION THEORY Lesson 39 Learning objective: To learn random number generation. Methods of simulation. Monte Carlo method of simulation You ve already read basics of simulation now I will be
More informationApproximate Linear Programming for Average-Cost Dynamic Programming
Approximate Linear Programming for Average-Cost Dynamic Programming Daniela Pucci de Farias IBM Almaden Research Center 65 Harry Road, San Jose, CA 51 pucci@mitedu Benjamin Van Roy Department of Management
More informationQuickest Search Over Multiple Sequences with Mixed Observations
Quicest Search Over Multiple Sequences with Mixed Observations Jun Geng Worcester Polytechnic Institute Email: geng@wpi.edu Weiyu Xu Univ. of Iowa Email: weiyu-xu@uiowa.edu Lifeng Lai Worcester Polytechnic
More informationA Deterministic Global Optimization Method for Variational Inference
A Deterministic Global Optimization Method for Variational Inference Hachem Saddiki Mathematics and Statistics University of Massachusetts, Amherst saddiki@math.umass.edu Andrew C. Trapp Operations and
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationAlgebraic Iterative Methods for Computed Tomography
Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic
More informationAn R Package flare for High Dimensional Linear Regression and Precision Matrix Estimation
An R Package flare for High Dimensional Linear Regression and Precision Matrix Estimation Xingguo Li Tuo Zhao Xiaoming Yuan Han Liu Abstract This paper describes an R package named flare, which implements
More informationELEG Compressive Sensing and Sparse Signal Representations
ELEG 867 - Compressive Sensing and Sparse Signal Representations Gonzalo R. Arce Depart. of Electrical and Computer Engineering University of Delaware Fall 211 Compressive Sensing G. Arce Fall, 211 1 /
More informationLevel-set MCMC Curve Sampling and Geometric Conditional Simulation
Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve
More informationRecent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will
Lectures Recent advances in Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2010 Source: www.dynardo.de/en/library Recent advances in
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationModeling with Uncertainty Interval Computations Using Fuzzy Sets
Modeling with Uncertainty Interval Computations Using Fuzzy Sets J. Honda, R. Tankelevich Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, CO, U.S.A. Abstract A new method
More informationNumerical Method in Optimization as a Multi-stage Decision Control System
Numerical Method in Optimization as a Multi-stage Decision Control System B.S. GOH Institute of Mathematical Sciences University of Malaya 50603 Kuala Lumpur MLYSI gohoptimum@gmail.com bstract: - Numerical
More informationDealing with Categorical Data Types in a Designed Experiment
Dealing with Categorical Data Types in a Designed Experiment Part II: Sizing a Designed Experiment When Using a Binary Response Best Practice Authored by: Francisco Ortiz, PhD STAT T&E COE The goal of
More informationA new Electromagnetism-like Algorithm with a Population Shrinking Strategy
Proc. of the 9th WSEAS Int. Conf. on Mathematical and Computational Methods in Science and Engineering, Trinidad and Tobago, November -7, 7 A new Electromagnetism-like Algorithm with a Population Shrinking
More informationA PRACTICAL APPROACH TO SAMPLE-PATH SIMULATION OPTIMIZATION. Michael C. Ferris Todd S. Munson Krung Sinapiromsaran
A PRACTICAL APPROACH TO SAMPLE-PATH SIMULATION OPTIMIZATION Michael C. Ferris Todd S. Munson Krung Sinapiromsaran Computer Sciences Department, University of Wisconsin 1210 West Dayton Street, Madison,
More informationThe Mesh Adaptive Direct Search Algorithm for Discrete Blackbox Optimization
The Mesh Adaptive Direct Search Algorithm for Discrete Blackbox Optimization Sébastien Le Digabel Charles Audet Christophe Tribes GERAD and École Polytechnique de Montréal ICCOPT, Tokyo 2016 08 08 BBO:
More informationBayesian Estimation for Skew Normal Distributions Using Data Augmentation
The Korean Communications in Statistics Vol. 12 No. 2, 2005 pp. 323-333 Bayesian Estimation for Skew Normal Distributions Using Data Augmentation Hea-Jung Kim 1) Abstract In this paper, we develop a MCMC
More informationAlgorithms for convex optimization
Algorithms for convex optimization Michal Kočvara Institute of Information Theory and Automation Academy of Sciences of the Czech Republic and Czech Technical University kocvara@utia.cas.cz http://www.utia.cas.cz/kocvara
More informationStaffing and Scheduling in Multiskill and Blend Call Centers
Staffing and Scheduling in Multiskill and Blend Call Centers 1 Pierre L Ecuyer GERAD and DIRO, Université de Montréal, Canada (Joint work with Tolga Cezik, Éric Buist, and Thanos Avramidis) Staffing and
More informationLagrange Multipliers and Problem Formulation
Lagrange Multipliers and Problem Formulation Steven J. Miller Department of Mathematics and Statistics Williams College Williamstown, MA 01267 Abstract The method of Lagrange Multipliers (and its generalizations)
More informationIMAGE DE-NOISING IN WAVELET DOMAIN
IMAGE DE-NOISING IN WAVELET DOMAIN Aaditya Verma a, Shrey Agarwal a a Department of Civil Engineering, Indian Institute of Technology, Kanpur, India - (aaditya, ashrey)@iitk.ac.in KEY WORDS: Wavelets,
More informationNumerical Optimization
Numerical Optimization Quantitative Macroeconomics Raül Santaeulàlia-Llopis MOVE-UAB and Barcelona GSE Fall 2018 Raül Santaeulàlia-Llopis (MOVE-UAB,BGSE) QM: Numerical Optimization Fall 2018 1 / 46 1 Introduction
More informationModule 4 : Solving Linear Algebraic Equations Section 11 Appendix C: Steepest Descent / Gradient Search Method
Module 4 : Solving Linear Algebraic Equations Section 11 Appendix C: Steepest Descent / Gradient Search Method 11 Appendix C: Steepest Descent / Gradient Search Method In the module on Problem Discretization
More informationA new Electromagnetism-like Algorithm with a Population Shrinking Strategy
6th WSEAS International Conference on SYSTEM SCIENCE and SIMULATION in ENGINEERING, Venice, Italy, November, 7 7 A new Electromagnetism-like Algorithm with a Population Shrinking Strategy ANA MARIA A.
More informationA PACKAGE FOR DEVELOPMENT OF ALGORITHMS FOR GLOBAL OPTIMIZATION 1
Mathematical Modelling and Analysis 2005. Pages 185 190 Proceedings of the 10 th International Conference MMA2005&CMAM2, Trakai c 2005 Technika ISBN 9986-05-924-0 A PACKAGE FOR DEVELOPMENT OF ALGORITHMS
More informationCHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM
20 CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM 2.1 CLASSIFICATION OF CONVENTIONAL TECHNIQUES Classical optimization methods can be classified into two distinct groups:
More informationEFFICIENT COMPUTING BUDGET ALLOCATION BY USING REGRESSION WITH SEQUENTIAL SAMPLING CONSTRAINT HU XIANG
EFFICIENT COMPUTING BUDGET ALLOCATION BY USING REGRESSION WITH SEQUENTIAL SAMPLING CONSTRAINT HU XIANG NATIONAL UNIVERSITY OF SINGAPORE 2012 EFFICIENT COMPUTING BUDGET ALLOCATION BY USING REGRESSION WITH
More informationA robust optimization based approach to the general solution of mp-milp problems
21 st European Symposium on Computer Aided Process Engineering ESCAPE 21 E.N. Pistikopoulos, M.C. Georgiadis and A. Kokossis (Editors) 2011 Elsevier B.V. All rights reserved. A robust optimization based
More informationHellenic Complex Systems Laboratory
Hellenic Complex Systems Laboratory Technical Report No IV Calculation of the confidence bounds for the fraction nonconforming of normal populations of measurements in clinical laboratory medicine Aristides
More informationEfficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper
More informationChapter II. Linear Programming
1 Chapter II Linear Programming 1. Introduction 2. Simplex Method 3. Duality Theory 4. Optimality Conditions 5. Applications (QP & SLP) 6. Sensitivity Analysis 7. Interior Point Methods 1 INTRODUCTION
More informationA penalty based filters method in direct search optimization
A penalty based filters method in direct search optimization ALDINA CORREIA CIICESI/ESTG P.PORTO Felgueiras PORTUGAL aic@estg.ipp.pt JOÃO MATIAS CM-UTAD Vila Real PORTUGAL j matias@utad.pt PEDRO MESTRE
More informationINVERSE PROBLEMS IN GROUNDWATER MODELING
INVERSE PROBLEMS IN GROUNDWATER MODELING Theory and Applications of Transport in Porous Media Series Editor: Jacob Bear, Technion - Israel Institute of Technology, Haifa, Israel Volume 6 The titles published
More informationRanking and selection problem with 1,016,127 systems. requiring 95% probability of correct selection. solved in less than 40 minutes
Ranking and selection problem with 1,016,127 systems requiring 95% probability of correct selection solved in less than 40 minutes with 600 parallel cores with near-linear scaling.. Wallclock time (sec)
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationENABLING SIMHEURISTICS THROUGH DESIGNS FOR TENS OF VARIABLES: COSTING MODELS AND ONLINE AVAILABILITY
Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds. ENABLING SIMHEURISTICS THROUGH DESIGNS FOR TENS OF VARIABLES: COSTING
More informationA Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering
A Parallel Hardware Architecture for Information-Theoretic Adaptive Filtering HPRCTA 2010 Stefan Craciun Dr. Alan D. George Dr. Herman Lam Dr. Jose C. Principe November 14, 2010 NSF CHREC Center ECE Department,
More informationCover Page. The handle holds various files of this Leiden University dissertation.
Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:
More informationA REVISED SIMPLEX SEARCH PROCEDURE FOR STOCHASTIC SIMULATION RESPONSE-SURFACE OPTIMIZATION
Proceedings of the 1998 Winter Simulation Conference D.J. Medeiros, E.F. Watson, J.S. Carson and M.S. Manivannan, eds. A REVISED SIMPLEX SEARCH PROCEDURE FOR STOCHASTIC SIMULATION RESPONSE-SURFACE OPTIMIZATION
More informationLecture 25 Nonlinear Programming. November 9, 2009
Nonlinear Programming November 9, 2009 Outline Nonlinear Programming Another example of NLP problem What makes these problems complex Scalar Function Unconstrained Problem Local and global optima: definition,
More informationConstrained and Unconstrained Optimization
Constrained and Unconstrained Optimization Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt2@illinois.edu Oct 10th, 2017 C. Hurtado (UIUC - Economics) Numerical
More informationCONVERGENCE PROPERTIES OF DIRECT SEARCH METHODS FOR STOCHASTIC OPTIMIZATION. Sujin Kim Dali Zhang
Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds. CONVERGENCE PROPERTIES OF DIRECT SEARCH METHODS FOR STOCHASTIC OPTIMIZATION
More informationP257 Transform-domain Sparsity Regularization in Reconstruction of Channelized Facies
P257 Transform-domain Sparsity Regularization in Reconstruction of Channelized Facies. azemi* (University of Alberta) & H.R. Siahkoohi (University of Tehran) SUMMARY Petrophysical reservoir properties,
More informationGlobal Minimization via Piecewise-Linear Underestimation
Journal of Global Optimization,, 1 9 (2004) c 2004 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. Global Minimization via Piecewise-Linear Underestimation O. L. MANGASARIAN olvi@cs.wisc.edu
More informationThe Cross-Entropy Method for Mathematical Programming
The Cross-Entropy Method for Mathematical Programming Dirk P. Kroese Reuven Y. Rubinstein Department of Mathematics, The University of Queensland, Australia Faculty of Industrial Engineering and Management,
More informationONLY AVAILABLE IN ELECTRONIC FORM
MANAGEMENT SCIENCE doi 10.1287/mnsc.1070.0812ec pp. ec1 ec7 e-companion ONLY AVAILABLE IN ELECTRONIC FORM informs 2008 INFORMS Electronic Companion Customized Bundle Pricing for Information Goods: A Nonlinear
More informationBlackbox Optimization
Blackbox Optimization Algorithm and Applications Sébastien Le Digabel Charles Audet Christophe Tribes GERAD and École Polytechnique de Montréal Workshop on Nonlinear Optimization Algorithms and Industrial
More informationCS 450 Numerical Analysis. Chapter 7: Interpolation
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationAuto-Extraction of Modelica Code from Finite Element Analysis or Measurement Data
Auto-Extraction of Modelica Code from Finite Element Analysis or Measurement Data The-Quan Pham, Alfred Kamusella, Holger Neubert OptiY ek, Aschaffenburg Germany, e-mail: pham@optiyeu Technische Universität
More informationFast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation
Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation November 2010 Nelson Shaw njd50@uclive.ac.nz Department of Computer Science and Software Engineering University of Canterbury,
More informationModel Parameter Estimation
Model Parameter Estimation Shan He School for Computational Science University of Birmingham Module 06-23836: Computational Modelling with MATLAB Outline Outline of Topics Concepts about model parameter
More informationTruncation Errors. Applied Numerical Methods with MATLAB for Engineers and Scientists, 2nd ed., Steven C. Chapra, McGraw Hill, 2008, Ch. 4.
Chapter 4: Roundoff and Truncation Errors Applied Numerical Methods with MATLAB for Engineers and Scientists, 2nd ed., Steven C. Chapra, McGraw Hill, 2008, Ch. 4. 1 Outline Errors Accuracy and Precision
More informationA Geometric Analysis of Subspace Clustering with Outliers
A Geometric Analysis of Subspace Clustering with Outliers Mahdi Soltanolkotabi and Emmanuel Candés Stanford University Fundamental Tool in Data Mining : PCA Fundamental Tool in Data Mining : PCA Subspace
More informationTMA946/MAN280 APPLIED OPTIMIZATION. Exam instructions
Chalmers/GU Mathematics EXAM TMA946/MAN280 APPLIED OPTIMIZATION Date: 03 05 28 Time: House V, morning Aids: Text memory-less calculator Number of questions: 7; passed on one question requires 2 points
More informationFunction approximation using RBF network. 10 basis functions and 25 data points.
1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data
More informationPASS Sample Size Software. Randomization Lists
Chapter 880 Introduction This module is used to create a randomization list for assigning subjects to one of up to eight groups or treatments. Six randomization algorithms are available. Four of the algorithms
More informationOptimal Denoising of Natural Images and their Multiscale Geometry and Density
Optimal Denoising of Natural Images and their Multiscale Geometry and Density Department of Computer Science and Applied Mathematics Weizmann Institute of Science, Israel. Joint work with Anat Levin (WIS),
More informationTable : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = (
Floating Point Numbers in Java by Michael L. Overton Virtually all modern computers follow the IEEE 2 floating point standard in their representation of floating point numbers. The Java programming language
More informationDiscrete Optimization. Lecture Notes 2
Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The
More informationEVALUATION OF METHODS USED TO DETECT WARM-UP PERIOD IN STEADY STATE SIMULATION. Prasad S. Mahajan Ricki G. Ingalls
Proceedings of the 2004 Winter Simulation Conference R.G. Ingalls, M. D. Rossetti, J. S. Smith, and B. A. Peters, eds. EVALUATION OF METHODS USED TO DETECT WARM-UP PERIOD IN STEADY STATE SIMULATION Prasad
More informationOn the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes
On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes Mary Melekopoglou Anne Condon Computer Sciences Department University of Wisconsin - Madison 0 West Dayton Street Madison,
More informationHYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS
HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS NABEEL AL-MILLI Financial and Business Administration and Computer Science Department Zarqa University College Al-Balqa'
More informationAn Extension of the Multicut L-Shaped Method. INEN Large-Scale Stochastic Optimization Semester project. Svyatoslav Trukhanov
An Extension of the Multicut L-Shaped Method INEN 698 - Large-Scale Stochastic Optimization Semester project Svyatoslav Trukhanov December 13, 2005 1 Contents 1 Introduction and Literature Review 3 2 Formal
More information