Numerical experience with a derivative-free trust-funnel method for nonlinear optimization problems with general nonlinear constraints

by Ph. R. Sampaio and Ph. L. Toint

Report naxys-3-25, August 2015

(Cover figure: log2-scaled performance profile on a subset of CUTEst, comparing the subbasis, minimum Frobenius norm, minimum l2-norm and regression model variants.)

University of Namur, 61, rue de Bruxelles, B-5000 Namur (Belgium), http://www.unamur.be/sciences/naxys

Numerical experience with a derivative-free trust-funnel method for nonlinear optimization problems with general nonlinear constraints

Ph. R. Sampaio and Ph. L. Toint

August 2015

Abstract

A trust-funnel method is proposed for solving nonlinear optimization problems with general nonlinear constraints. It extends the method presented by Gould and Toint (Math. Prog., 122(1):155-196, 2010), originally proposed for equality-constrained optimization problems only, to problems with both equality and inequality constraints and in which simple bounds are also considered. Like the original method, ours makes use of neither filter nor penalty functions and treats the objective function and the constraints as independently as possible. To handle the bounds, an active-set approach is employed. We then exploit techniques developed for derivative-free optimization to obtain a method that can also be used to solve problems where the derivatives are unavailable or are available only at a prohibitive cost. The resulting approach extends the DEFT-FUNNEL algorithm presented by Sampaio and Toint (Comput. Optim. Appl., 61(1):25-49, 2015), which implements a derivative-free trust-funnel method for equality-constrained problems. Numerical experiments with the extended algorithm show that our approach compares favorably to other well-known model-based algorithms for derivative-free optimization.

Keywords: Constrained nonlinear optimization, trust-region method, trust funnel, derivative-free optimization.

1 Introduction

We consider the solution of the nonlinear optimization problem

    min_x f(x)
    s.t.: l_s <= c(x) <= u_s,                                          (1.1)
          l_x <= x <= u_x,

where we assume that f: R^n -> R and c: R^n -> R^m are twice continuously differentiable, and that f is bounded below on the feasible domain. The vectors l_s and u_s are, respectively, lower and upper bounds on the constraint values c(x), while l_x and u_x are bounds on the x variables, with l_s in (R U {-inf})^m, u_s in (R U {+inf})^m, l_x in (R U {-inf})^n and u_x in (R U {+inf})^n.

Namur Center for Complex Systems (naxys) and Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. Email: phillipece@gmail.com
Namur Center for Complex Systems (naxys) and Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. Email: philippe.toint@unamur.be

By defining f(x, s) = f(x) and c(x, s) = c(x) - s, the problem above may be rewritten as the following equality-constrained optimization problem with simple bounds

    min_{(x,s)} f(x, s)
    s.t.: c(x, s) = 0,
          l_s <= s <= u_s,                                              (1.2)
          l_x <= x <= u_x,

which is the one we address throughout this paper.

The trust-funnel method was first introduced by Gould and Toint [] (see also []) as an SQP (Sequential Quadratic Programming) algorithm for equality-constrained optimization problems whose convergence is driven by an adaptive bound imposed on the allowed infeasibility at each iteration. This bound is monotonically decreased as the algorithm progresses, ensuring global convergence while optimality is sought; hence the name trust funnel. The method belongs to the class of trust-region methods and makes use of a composite-step approach to calculate a new direction at each iteration: a normal step is computed first in the hope of reducing the infeasibility measure associated with the constraint function values, and a tangent step is subsequently calculated with the aim of improving optimality of the iterates with regard to the objective function. These computations are carried out using two different trust regions, one for each step component. The main idea is to treat the objective function and the constraints as independently as possible. The method stands out among methods for constrained problems as a parameter-free alternative: neither filter nor penalties are needed, which frees the user from common difficulties encountered when choosing the initial penalty parameter, for instance. An extension to problems with both equalities and inequalities was developed recently by Curtis et al. [7], who presented an interior-point trust-funnel algorithm for solving large-scale problems that may be characterized as a barrier-SQP method.

Although many methods have been designed for unconstrained DFO (Derivative-Free Optimization), the range of possibilities for the constrained case remains quite open. A derivative-free trust-funnel method called DEFT-FUNNEL (DErivative-Free Trust FUNNEL), proposed by Sampaio and Toint [22] for equality-constrained problems, has given good results in comparison with a filter-SQP method introduced by Colson [4] and a trust-region method by Powell named COBYLA (Constrained Optimization BY Linear Approximations) [9, 2], all centred on local polynomial interpolation models. Recently, another trust-region method by Powell [2], named LINCOA, has been developed for linearly constrained optimization without derivatives; it makes use of quadratic interpolation-based models of the objective function. Many other methods that use direct search instead of trust-region techniques combined with interpolation-based models have also been developed for the constrained case (e.g., Lewis and Torczon [4, 5, 6], Audet and Dennis [], Lucidi et al. [7], Yu and Li [24]).

The primary contribution of this paper is the extension of both the original trust-funnel method [] and its derivative-free adaptation (DEFT-FUNNEL) [22] to problems with general nonlinear constraints, as well as the presentation of numerical experience with the resulting algorithm. In particular, for the derivative-free algorithm, a subspace minimization approach proposed by Gratton et al. [2] that considers the poisedness of the interpolation set is employed in order to avoid its degeneration.
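To make the reformulation from (1.1) to (1.2) concrete, the following minimal sketch (Python with NumPy; the toy problem and all names are purely illustrative and not part of the paper) shows how slack variables s turn the general inequalities into the equality constraints c(x) - s = 0 plus simple bounds on s.

import numpy as np

# Toy instance of (1.1): minimize f(x) subject to l_s <= c(x) <= u_s, l_x <= x <= u_x.
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

def c(x):
    return np.array([x[0] - 2.0 * x[1],     # first bounded constraint
                     x[0] + x[1]])          # second bounded constraint

l_s, u_s = np.array([-1.0, 1.0]), np.array([3.0, np.inf])
l_x, u_x = np.array([0.0, 0.0]), np.array([np.inf, np.inf])

# Reformulation (1.2): variables z = (x, s); equality constraints c(x) - s = 0,
# with the bounds l_x <= x <= u_x and l_s <= s <= u_s now acting on z directly.
def f_xs(z, n=2):
    return f(z[:n])

def c_xs(z, n=2):
    x, s = z[:n], z[n:]
    return c(x) - s

# A starting point that satisfies the bounds: place x0 in its box and initialize
# the slacks as the projection of c(x0) onto [l_s, u_s].
x0 = np.array([2.0, 0.0])
s0 = np.clip(c(x0), l_s, u_s)
z0 = np.concatenate([x0, s0])
print(c_xs(z0))   # residual of the equality constraints at the starting point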
The final method described here features four main ingredients for solving problems with general nonlinear constraints, namely: a subspace minimization approach to handle the bounds on the x variables, which makes DEFT-FUNNEL an active-set method; a bounded linear least-squares solver to calculate the normal step; a projected gradient method to calculate the tangent step; and the control of the permitted infeasibility of the iterates through the funnel bound. The reason behind the choice of exploring subspaces defined by the active bounds is twofold. Besides the fact that we aim to avoid treating the bounds on the x variables as general inequality constraints, the reduction of the dimension of the problem after the active bounds have been identified helps to thwart a possible degeneration of the interpolation set when the sample points become close to each other and thus affinely dependent, which often happens as the optimal solution is approached.

The fact that the s variables play no role in the choice of the interpolation set vindicates the construction of the subspaces based upon the x variables only.

This paper is organized as follows. Section 2 introduces the proposed trust-funnel algorithm step by step. Subsection 2.1 explains the subspace minimization approach, while Subsections 2.2 and 2.3 give details on how the computation of the composite step components is conducted. Subsections 2.4 and 2.5 then address the whole mechanism of the algorithm itself. Section 3 concerns the application of the method to derivative-free optimization problems and Section 4 gives the description of the final method, which assembles the techniques elaborated in Sections 2 and 3. In Section 5, we discuss some initial numerical experiments and compare the performance of our method to that of COBYLA and LINCOA on a selected set of test problems. Finally, we draw some conclusions in Section 6.

Notation. Unless otherwise specified, our norm ||.|| is the standard Euclidean norm. Given any vector x, we denote its i-th component by [x]_i. We let B(z; Delta) denote the closed Euclidean ball centered at z, with radius Delta > 0. Given any set A, |A| denotes the cardinality of A. By P_n^d, we mean the space of all polynomials of degree at most d in R^n. Finally, given any subspace S, we denote its dimension by dim(S).

2 An active-set trust-funnel method

Our method generates a sequence of points {(x_k, s_k)} such that, at each iteration, the bound constraints below are satisfied:

    l_s <= s_k <= u_s,                                                  (2.3)
    l_x <= x_k <= u_x.                                                  (2.4)

By using a composite-step approach, each trial step d_k = (d_k^x, d_k^s)^T is decomposed as

    d_k = (d_k^x ; d_k^s) = (n_k^x ; n_k^s) + (t_k^x ; t_k^s) = n_k + t_k,

where the normal step component n_k = (n_k^x, n_k^s)^T aims to improve feasibility, and the tangent step component t_k = (t_k^x, t_k^s)^T reduces the value of the objective function model while preserving any gains in feasibility obtained through n_k. In the following subsections, we describe how each component is computed. Before that, we briefly explain how the subspace minimization is employed in our algorithm.

2.1 Subspace minimization

As in the method proposed by Gratton et al. [2] for bound-constrained optimization problems, our algorithm makes use of an active-set approach where the minimization is restricted to subspaces defined by the active x variables. At each iteration k, we define the subspace S_k as follows:

    S_k = {x in R^n : [x]_i = [l_x]_i for i in L_k and [x]_i = [u_x]_i for i in U_k},

where L_k = {i : [x_k]_i - [l_x]_i <= eps_b} and U_k = {i : [u_x]_i - [x_k]_i <= eps_b} define the index sets of (nearly) active variables at their bounds, for some small constant eps_b > 0 defined a priori.

Once S_k has been defined, the minimization at iteration k is restricted to the new subspace S_k. Once a direction d_k for (x_k, s_k) has been computed, we set (x_{k+1}, s_{k+1}) = (x_k, s_k) + d_k if k is a successful iteration; otherwise, we set (x_{k+1}, s_{k+1}) = (x_k, s_k). If a solution (x_bar, s_bar) of the subproblem defined by S_k satisfies the optimality conditions for the subproblem, we check whether it is also optimal for the original problem. If it is, the solution encountered is returned to the user and the algorithm halts; otherwise, the algorithm proceeds in the full space by computing a new direction for (x_bar, s_bar) and repeats the above process at iteration k+1 by defining S_{k+1}.

2.2 The normal step

For any point (x, s), we measure the constraint violation by

    v(x, s) = (1/2) ||c(x, s)||^2.                                      (2.5)

Analogous to the trust-funnel methods for equality-constrained problems proposed in [] and in [22], we also have a funnel bound v_k^max on v such that, for each iteration k, v_k <= v_k^max, where v_k = v(x_k, s_k). As this bound is monotonically decreased, the algorithm is driven towards feasibility, which guarantees its convergence. In order to ensure that the normal step is indeed normal, we add the following condition asking that it mostly lies in the space spanned by the columns of the matrix J(x_k, s_k)^T:

    ||n_k|| <= kappa_n ||c(x_k, s_k)||,                                 (2.6)

for some kappa_n > 0. We then perform the calculation of n_k by solving the trust-region bound-constrained linear least-squares problem

    min_{n=(n^x,n^s)} (1/2) ||c(x_k, s_k) + J(x_k, s_k) n||^2
    s.t.: l_s <= s_k + n^s <= u_s,                                      (2.7)
          l_x <= x_k + n^x <= u_x,
          x_k + n^x in S_k,
          n in N_k,

where J(x, s) = (J(x)  -I) represents the Jacobian of c(x, s) with respect to (x, s) and

    N_k = {z in R^{n+m} : ||z|| <= min[ Delta_k^c, kappa_n ||c(x_k, s_k)|| ]},   (2.8)

for some trust-region radius Delta_k^c > 0.

2.3 The tangent step

After attempting to reduce infeasibility through the normal step, a tangent step is calculated with the aim of improving optimality. The computation of the latter is conducted carefully so that the gains in feasibility obtained by the former are not abandoned without good reason.

The SQP model of the function f is defined as

    psi_k((x_k, s_k) + d) = f_k + <g_k, d> + (1/2) <d, B_k d>,          (2.9)

where f_k = f(x_k, s_k), g_k = grad_{(x,s)} f(x_k, s_k), and B_k is the approximate Hessian of the Lagrangian function

    L(x, s, mu, z_s, w_s, z_x, w_x) = f(x) + <mu, c(x, s)> + <w_s, s - u_s> + <z_s, l_s - s> + <w_x, x - u_x> + <z_x, l_x - x>

with respect to (x, s), given by

    B_k = [ H_k + sum_{i=1}^m [mu_hat_k]_i C_{i,k}   0 ]
          [ 0                                        0 ],               (2.10)

where z_s and w_s are the Lagrange multipliers associated with the lower and upper bounds, respectively, on the slack variables s, and z_x and w_x are the Lagrange multipliers associated with the lower and upper bounds on the x variables. In (2.10), H_k is a bounded symmetric approximation of the Hessian of f with respect to x, the matrices C_{i,k} are bounded symmetric approximations of the constraint Hessians with respect to x, and the vector mu_hat_k may be viewed as a local approximation of the Lagrange multipliers with respect to the equality constraints c(x, s). By using the decomposition d = n + t, we then have that

    psi_k((x_k, s_k) + n_k + t) = psi_k((x_k, s_k) + n_k) + <g_k^N, t> + (1/2) <t, B_k t>,   (2.11)

where

    g_k^N = g_k + B_k n_k.                                              (2.12)

In the interest of ensuring that (2.11) is a proper local approximation of the function f((x_k, s_k) + n_k + t), the complete step d = n_k + t must belong to

    T_k = {d in R^{n+m} : ||d|| <= Delta_k^f},                          (2.13)

for some radius Delta_k^f > 0. The minimization of (2.11) should then be restricted to the intersection of N_k and T_k, which imposes that the tangent step t results in a complete step d = n_k + t that satisfies the inclusion

    d in R_k = N_k intersect T_k = {d in R^{n+m} : ||d|| <= Delta_k},   (2.14)

where the radius of R_k is thus given by

    Delta_k = min[ Delta_k^c, Delta_k^f ].                              (2.15)

Similarly to the computation of the normal step, we intend to remain in the subspace S_k after walking along the tangent direction. We accomplish this by imposing the condition x_k + n_k^x + t^x in S_k. Additionally, the conditions (2.3) and (2.4) must be satisfied at the final point (x_k, s_k) + d, i.e. we must have

    l_s <= s_k + n_k^s + t^s <= u_s,
    l_x <= x_k + n_k^x + t^x <= u_x.

Because of (2.14), we ask n_k to belong to R_k before attempting the computation of t by requiring that

    ||n_k|| <= kappa_R Delta_k,                                         (2.16)

for some kappa_R in (0, 1).

If (2.16) holds, which means that there is enough space left to make another step without crossing the trust-region border, the tangent step is finally computed by solving the problem

    min_{t=(t^x,t^s)} <g_k^N, t> + (1/2) <t, B_k t>
    s.t.: J(x_k, s_k) t = 0,
          l_s <= s_k + n_k^s + t^s <= u_s,                              (2.17)
          l_x <= x_k + n_k^x + t^x <= u_x,
          x_k + n_k^x + t^x in S_k,
          n_k + t in R_k.

We define our f-criticality measure as

    pi_k^f = |<g_k^N, r_k>|,                                            (2.18)

where r_k is the projected Cauchy direction obtained by solving the linear optimization problem

    min_{r=(r^x,r^s)} <g_k^N, r>
    s.t.: J(x_k, s_k) r = 0,
          l_s <= s_k + n_k^s + r^s <= u_s,                              (2.19)
          l_x <= x_k + n_k^x + r^x <= u_x,
          x_k + n_k^x + r^x in S_k,
          ||r|| <= 1.

By definition, pi_k^f measures how much decrease could be obtained locally along the projection of the negative of the approximate gradient g_k^N onto the nullspace of J(x_k, s_k) intersected with the region delimited by the bounds. A new local estimate of the Lagrange multipliers (mu_k, z_k^s, w_k^s, z_k^x, w_k^x) is computed by solving the bound-constrained linear least-squares problem

    min_{(mu, z_hat^s, w_hat^s, z_hat^x, w_hat^x)} (1/2) ||M_k(mu, z_hat^s, w_hat^s, z_hat^x, w_hat^x)||^2
    s.t.: z_hat^s >= 0, w_hat^s >= 0, z_hat^x >= 0, w_hat^x >= 0,       (2.20)

where

    M_k(mu, z_hat^s, w_hat^s, z_hat^x, w_hat^x) = g_k^N + ( J(x_k)^T ; -I ) mu + ( I_w^x ; 0 ) w_hat^x - ( I_z^x ; 0 ) z_hat^x + ( 0 ; I_w^s ) w_hat^s - ( 0 ; I_z^s ) z_hat^s,

with the semicolon separating the x- and s-blocks. Here the matrix I is the m x m identity matrix, the matrices I_z^s and I_w^s are obtained from I by removing the columns whose indices are not associated with any active (lower and upper, respectively) bound at s_k + n_k^s, the matrices I_z^x and I_w^x are obtained from the n x n identity matrix by removing the columns whose indices are not associated with any active (lower and upper, respectively) bound at x_k + n_k^x, and the Lagrange multipliers (z_hat^s, w_hat^s, z_hat^x, w_hat^x) are those in (z^s, w^s, z^x, w^x) associated with active bounds at s_k + n_k^s and x_k + n_k^x. All the other Lagrange multipliers are set to zero.
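A multiplier estimate of this kind can be sketched with an off-the-shelf bounded least-squares solver. The snippet below (Python/SciPy) stacks the multipliers into one vector, imposes nonnegativity only on the bound multipliers, and calls scipy.optimize.lsq_linear. It is a minimal sketch under the sign conventions of the reconstruction of M_k above; the function name, argument layout and the use of SciPy (instead of the dedicated bound-constrained least-squares code used by the authors) are assumptions for illustration only.

import numpy as np
from scipy.optimize import lsq_linear

def estimate_multipliers(gN, Jx, act_lx, act_ux, act_ls, act_us):
    """Hypothetical sketch of the multiplier estimation (2.20).

    gN     : vector g_k^N of length n + m
    Jx     : Jacobian J(x_k) of c at x_k, shape (m, n)
    act_*  : boolean masks of active lower/upper bounds on x (length n)
             and on s (length m)
    Returns (mu, z_x, w_x, z_s, w_s), with inactive multipliers set to zero.
    """
    m, n = Jx.shape
    # Columns of the residual map M(mu, z, w) = gN + [J^T; -I] mu
    #   + [I_w^x; 0] w_x - [I_z^x; 0] z_x + [0; I_w^s] w_s - [0; I_z^s] z_s.
    J_xs_T = np.vstack([Jx.T, -np.eye(m)])                  # (n+m) x m block
    cols = [J_xs_T]
    groups = [(+1.0, act_ux, 0), (-1.0, act_lx, 0),         # w_x, z_x
              (+1.0, act_us, n), (-1.0, act_ls, n)]         # w_s, z_s
    sizes = []
    for sign, mask, offset in groups:
        idx = np.where(mask)[0]
        E = np.zeros((n + m, idx.size))
        E[offset + idx, np.arange(idx.size)] = sign
        cols.append(E)
        sizes.append(idx.size)
    A = np.hstack(cols)
    lb = np.concatenate([np.full(m, -np.inf), np.zeros(sum(sizes))])
    ub = np.full(A.shape[1], np.inf)
    sol = lsq_linear(A, -gN, bounds=(lb, ub)).x             # min ||gN + A*lambda||
    mu = sol[:m]
    # Scatter the active-bound multipliers back into full-length vectors.
    w_x, z_x = np.zeros(n), np.zeros(n)
    w_s, z_s = np.zeros(m), np.zeros(m)
    pos = m
    for vec, (sign, mask, offset) in zip((w_x, z_x, w_s, z_s), groups):
        idx = np.where(mask)[0]
        vec[idx] = sol[pos:pos + idx.size]
        pos += idx.size
    return mu, z_x, w_x, z_s, w_s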

2.4 Which steps to compute and retain

We now explain when the normal step n_k and the tangent step t_k should be computed at iteration k, given the constraint violation and the measure of optimality described in the previous subsection. The normal step is computed when k = 0 or when the current constraint violation is significant, which is formally expressed by the conditions

    ||c(x_k, s_k)|| > omega_n(pi_k^f)   or   v_k > kappa_vv v_k^max,    (2.21)

where omega_n is some bounding function and kappa_vv in (0, 1) is a constant. If (2.21) fails, the computation of the normal step is not required and we set n_k = 0. We define a v-criticality measure that indicates how much decrease could be obtained locally along the projection of the negative gradient of the Gauss-Newton model of v at (x_k, s_k) onto the region delimited by the bounds as

    pi_k^v = |<J(x_k, s_k)^T c(x_k, s_k), b_k>|,

where the projected Cauchy direction b_k is given by the solution of

    min_{b=(b^x,b^s)} <J(x_k, s_k)^T c(x_k, s_k), b>
    s.t.: l_s <= s_k + b^s <= u_s,
          l_x <= x_k + b^x <= u_x,                                      (2.22)
          x_k + b^x in S_k,
          ||b|| <= 1.

We say that (x_k, s_k) is an infeasible stationary point if c(x_k, s_k) != 0 and pi_k^v = 0. The procedure for the calculation of the normal step is given in the algorithm below.

Algorithm 2.1: NormalStep(x_k, s_k, pi_k^v, pi_k^f, v_k, v_k^max)

Step 1: If c(x_k, s_k) != 0 and pi_k^v = 0, STOP (infeasible stationary point).

Step 2: If k = 0 or if (2.21) holds, compute a normal step n_k by solving (2.7). Otherwise, set n_k = 0.

If the solution of (2.19) is r_k = 0, then by (2.18) we have pi_k^f = 0. In this case, the computation of the tangent step is skipped, and we simply set t_k = 0. If pi_k^f is unsubstantial compared to the current infeasibility, i.e. if, for a given monotonic bounding function omega_t, the condition

    pi_k^f > omega_t(||c(x_k, s_k)||)                                   (2.23)

fails, then the current iterate is still too far from feasibility to worry about optimality, and we again skip the tangent step computation by setting t_k = 0. While (2.23) and (2.21) together provide considerable flexibility in our algorithm, in that a normal or tangent step is only computed when relevant, our setting also allows both of these conditions to fail. In this case, d_k = n_k + t_k is identically zero, and the sole computation in the iteration is that of the new Lagrange multipliers (2.20). Finally, we may evaluate the usefulness of the tangent step t_k after (or during) its computation, in the sense that we would like a relatively large tangent step to cause a clear decrease in the model (2.11) of the objective function. We therefore check whether the conditions

    ||t_k|| > kappa_CS ||n_k||                                          (2.24)

and

    delta_k^f = delta_k^{f,t} + delta_k^{f,n} >= kappa_delta delta_k^{f,t},   (2.25)

where

    delta_k^{f,t} = psi_k((x_k, s_k) + n_k) - psi_k((x_k, s_k) + n_k + t_k)   (2.26)

and

    delta_k^{f,n} = psi_k(x_k, s_k) - psi_k((x_k, s_k) + n_k),                (2.27)

are satisfied for some kappa_CS > 0 and for kappa_delta in (0, 1). The inequality (2.25) indicates that the predicted improvement in the objective function obtained in the tangent step is not negligible compared to the predicted change in f resulting from the normal step. If (2.24) holds but (2.25) fails, the tangent step is not useful in the sense just discussed, and we choose to ignore it by resetting t_k = 0. The computation of the tangent step is described algorithmically as follows.

Algorithm 2.2: TangentStep(x_k, s_k, n_k)

Step 1: If (2.16) holds, then
  Step 1.1: select a vector mu_hat_k and define B_k as in (2.10);
  Step 1.2: compute mu_k by solving (2.20);
  Step 1.3: compute the modified Cauchy direction r_k by solving (2.19) and define pi_k^f as in (2.18);
  Step 1.4: if (2.23) holds, compute a tangent step t_k by solving (2.17).

Step 2: If (2.16) fails, set mu_k = mu_{k-1}. In this case, or if (2.23) fails, or if (2.24) holds but (2.25) fails, set t_k = 0 and d_k = n_k.

Step 3: Define (x_k^+, s_k^+) = (x_k, s_k) + d_k.

2.5 Iteration types

Once we have computed the step d_k and the trial point

    (x_k^+, s_k^+) = (x_k, s_k) + d_k,                                   (2.28)

we are left with the task of accepting or rejecting it. If n_k = t_k = 0, iteration k is said to be a mu-iteration, because the only computation potentially performed is that of a new vector of Lagrange multiplier estimates. We say that iteration k is an f-iteration if t_k != 0, (2.25) holds, and

    v(x_k^+, s_k^+) <= v_k^max.                                          (2.29)

Condition (2.29) ensures that the step maintains feasibility within reasonable bounds. Thus the iteration's expected major achievement is, in this case, a decrease in the value of the objective function f, hence its name. If iteration k is neither a mu-iteration nor an f-iteration, then it is said to be a c-iteration. If (2.25) fails, then the expected major achievement (or failure) of iteration k is, a contrario, to improve feasibility, which is also the case when the step only contains its normal component. The main idea behind the technique for accepting the trial point is to measure whether the major expected achievement of the iteration has been realized.
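The classification just described can be summarized in a few lines of code. The sketch below (Python; the function and its arguments are illustrative helpers, not the authors' implementation) assumes that the norms of the step components, the predicted decreases (2.26)-(2.27), the trial infeasibility and the funnel bound have already been computed.

from enum import Enum

class IterationType(Enum):
    MU = "mu-iteration"   # n_k = t_k = 0: only new multiplier estimates
    F = "f-iteration"     # tangent step present and expected to reduce f
    C = "c-iteration"     # everything else: feasibility is the main goal

def classify_iteration(norm_n, norm_t, delta_ft, delta_fn, kappa_delta,
                       v_trial, v_max):
    """Sketch of the mu/f/c classification of Section 2.5 (assumed helper).

    norm_n, norm_t : norms of the normal and tangent step components
    delta_ft, delta_fn : predicted decreases (2.26) and (2.27)
    kappa_delta    : constant in (0, 1) appearing in condition (2.25)
    v_trial        : infeasibility v(x_k^+, s_k^+) at the trial point
    v_max          : current funnel bound v_k^max
    """
    if norm_n == 0.0 and norm_t == 0.0:
        return IterationType.MU
    tangent_useful = (delta_ft + delta_fn) >= kappa_delta * delta_ft   # (2.25)
    if norm_t > 0.0 and tangent_useful and v_trial <= v_max:           # (2.29)
        return IterationType.F
    return IterationType.C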

If iteration k is a mu-iteration, we have no choice but to restart with (x_{k+1}, s_{k+1}) = (x_k, s_k) using the new multipliers. We then define

    Delta_{k+1}^f = Delta_k^f   and   Delta_{k+1}^c = Delta_k^c          (2.30)

and keep the current value of the maximal infeasibility, v_{k+1}^max = v_k^max. If iteration k is an f-iteration, we accept the trial point (i.e. (x_{k+1}, s_{k+1}) = (x_k^+, s_k^+)) if

    rho_k^f = [f(x_k, s_k) - f(x_k^+, s_k^+)] / delta_k^f >= eta_1,       (2.31)

and reject it (i.e. (x_{k+1}, s_{k+1}) = (x_k, s_k)) otherwise. The value of the maximal infeasibility measure is left unchanged, that is v_{k+1}^max = v_k^max. Note that delta_k^f > 0 (because of (2.26) and (2.25)) unless (x_k, s_k) is first-order critical, and hence condition (2.31) is well defined. If iteration k is a c-iteration, we accept the trial point if the improvement in feasibility is comparable to its predicted value

    delta_k^c = (1/2) ||c(x_k, s_k)||^2 - (1/2) ||c(x_k, s_k) + J(x_k, s_k) d_k||^2,

and the latter is itself comparable to its predicted decrease along the normal step, that is

    n_k != 0,   delta_k^c >= kappa_cn delta_k^{c,n}   and   rho_k^c = [v(x_k, s_k) - v(x_k^+, s_k^+)] / delta_k^c >= eta_1,   (2.32)

for some kappa_cn in (0, kappa_tg], and where

    delta_k^{c,n} = (1/2) ||c(x_k, s_k)||^2 - (1/2) ||c(x_k, s_k) + J(x_k, s_k) n_k||^2.   (2.33)

If (2.32) fails, the trial point is rejected. We update the value of the maximal infeasibility by

    v_{k+1}^max = max[ kappa_tx1 v_k^max, v(x_k^+, s_k^+) + kappa_tx2 (v(x_k, s_k) - v(x_k^+, s_k^+)) ]   if (2.32) holds,
    v_{k+1}^max = v_k^max                                                                                  otherwise,        (2.34)

for some kappa_tx1 in (0, 1) and kappa_tx2 in (0, 1). Note that we only check the third condition in (2.32) after the first two conditions have been verified. Assuming that n_k != 0, the Cauchy condition on the normal step and c(x_k, s_k) != 0 ensure that delta_k^{c,n} > 0 provided pi_k^v != 0. Thus the third condition is well defined, unless (x_k, s_k) is an infeasible stationary point, in which case the algorithm is terminated.

3 Application to derivative-free optimization

Our algorithm makes use of surrogate models m^f(x) and m^c(x) = (m_1^c(x), m_2^c(x), ..., m_m^c(x)), built from polynomial interpolation over a set of sample points Y_k = {y^0, y^1, ..., y^p}, that replace the functions f(x) and c(x) = (c_1(x), c_2(x), ..., c_m(x)). Therefore, at each iteration, the following interpolation conditions are satisfied:

    m^f(y^i) = f(y^i),   m_j^c(y^i) = c_j(y^i),                           (3.35)

for all y^i in Y_k.

Whenever Y_k changes, the models m_k^c and m_k^f are updated to satisfy the interpolation conditions (3.35) for the new set Y_{k+1}, which implies that new evaluations of f and c are carried out for the points added at iteration k.

The algorithm developed in this work employs the now standard idea of starting with incomplete interpolation models with linear accuracy and then enhancing them with curvature information, which provides an accuracy at least as good as that of linear models and, hopefully, better. We thus consider underdetermined quadratic interpolation, i.e. n + 1 <= |Y_k| <= (n + 1)(n + 2)/2, with initial sample sets that are poised for linear interpolation. As there are many possibilities for building underdetermined quadratic interpolation-based models, we consider three distinct ways whose complete descriptions are available in Conn, Scheinberg and Vicente [6], namely: the subbasis selection approach; minimum l_2-norm or Euclidean norm models (referred to as minimum-norm interpolating polynomials in Section 5.1 of [6]), whose coefficient vector associated with the natural basis is the minimum Euclidean-norm solution of the linear system resulting from the interpolation conditions; and minimum Frobenius norm models, where the coefficients related to the quadratic part of the natural basis are minimized (see the optimization problem (5.6) in [6]), which is equivalent to minimizing the Frobenius norm of the Hessian of the models. In addition, we also consider least-squares regression for the construction of quadratic models, in which case one has n + 1 <= |Y_k| <= (n + 1)(n + 2), giving the user a total of four possibilities for model building.

For any function f: R^n -> R and set Y = {y^0, ..., y^p} in R^n, there exists a unique polynomial m(x) that interpolates f(x) on Y such that m(x) = sum_{i=0}^p f(y^i) l_i(x), where {l_i(x)}_{i=0}^p is the basis of Lagrange polynomials, provided Y satisfies a nonsingularity property called poisedness (see Definition 3.1, page 37, in [6]). Before going further into the details of the algorithm, we first introduce a condition on the interpolation set that is used to measure the error between the original functions and the interpolation models, as well as between their derivatives.

Definition 3.1. Let Y = {y^0, y^1, ..., y^p} be a poised interpolation set and P_n^d be the space of polynomials of degree less than or equal to d on R^n. Let Lambda > 0 and {l_0(x), l_1(x), ..., l_p(x)} be the basis of Lagrange polynomials associated with Y. Then the set Y is said to be Lambda-poised in B for P_n^d (in the interpolation sense) if and only if

    max_{0 <= i <= p} max_{x in B} |l_i(x)| <= Lambda.

As shown in [6], the error bounds for at most fully quadratic models depend linearly on the constant Lambda; the smaller it is, the better the interpolation models approximate the original functions. We also note that the error bounds for underdetermined quadratic interpolation models are linear in the diameter of the smallest ball B(Y) containing Y for the first derivatives, and quadratic for the function values.

3.1 Recursive call in subspaces

As pointed out before, by reducing the dimension of the problem we attempt to diminish the chances of a possible degeneration of the interpolation set when the sample points become too close to each other and thus affinely dependent. Such a situation may happen, for instance, as the optimal solution is approached.
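As an aside on the model-building variants listed above, the following minimal sketch (Python/NumPy; all names are illustrative and the scaling of the natural basis is simplified) shows how a minimum l_2-norm underdetermined quadratic model can be obtained: the interpolation conditions form an underdetermined linear system in the coefficients, and the minimum Euclidean-norm solution is returned directly by a least-squares solve.

import numpy as np
from itertools import combinations_with_replacement

def natural_basis(x):
    """Monomial basis of P_n^2 at x: 1, x_i, and x_i*x_j with i <= j.
    (Illustrative helper; the usual scaling of the quadratic terms is omitted.)"""
    x = np.asarray(x, dtype=float)
    quad = [x[i] * x[j] for i, j in combinations_with_replacement(range(x.size), 2)]
    return np.concatenate(([1.0], x, quad))

def min_norm_quadratic_model(Y, fY):
    """Coefficients of an underdetermined quadratic interpolation model:
    the minimum Euclidean-norm solution of M alpha = fY, where
    M[i, :] = natural_basis(y_i).  Assumes n + 1 <= |Y| <= (n+1)(n+2)/2."""
    M = np.array([natural_basis(y) for y in Y])
    alpha, *_ = np.linalg.lstsq(M, np.asarray(fY, dtype=float), rcond=None)
    return alpha   # np.linalg.lstsq returns the minimum-norm solution here

def model_value(alpha, x):
    return float(natural_basis(x) @ alpha)

# Tiny usage example in R^2 with 4 sample points (underdetermined quadratic).
Y = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
     np.array([0.0, 1.0]), np.array([1.0, 1.0])]
fY = [np.sin(y[0]) + y[1] ** 2 for y in Y]
alpha = min_norm_quadratic_model(Y, fY)
print([abs(model_value(alpha, y) - f) for y, f in zip(Y, fY)])  # ~0: (3.35) holds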
In our algorithm, we apply the recursive approach found in the derivative-free method proposed by Gratton et al. [2] for bound-constrained optimization problems. Once the subspace S_k has been defined at iteration k, the algorithm calls itself recursively and the dimension of the problem is then reduced to n_hat = n - |L_k| - |U_k|, where n denotes the dimension of the full space R^n. A new well-poised interpolation set Z_k is then constructed from a suitable choice of points in X intersect S_k, where X is the set of all points obtained up to iteration k.

In order to save function evaluations when building the new interpolation set, all the points in X that are nearly, but not exactly, in S_k are projected onto S_k and used to build Z_k with their model values instead of their true function values. More formally, we define the set of points that are close to (but not on) the active bounds at x_k as

    A_k = { y in X : [y]_i - [l_x]_i <= eps_b for i in L_k and [u_x]_i - [y]_i <= eps_b for i in U_k },

where, for at least one i, the strict inequality

    0 < [y]_i - [l_x]_i, i in L_k,   or   0 < [u_x]_i - [y]_i, i in U_k,

must hold. We then project all the points y in A_k onto S_k, obtaining new dummy points y_s that are added to X with associated values m^f(y_s) and m^c(y_s) rather than the values of the original functions. As described in Section 3.3, these dummy points are progressively replaced, with high priority, by points with true function values during the minimization in S_k. We denote by Dum_k the set of dummy points at iteration k. Convergence in a subspace is only declared if the interpolation set contains no dummy points. If a solution has been found for a subspace and there are still dummy points in the interpolation set, evaluations of the original functions f(x) and c(x) at those points are carried out and the interpolating models are recomputed from the original function values.

Once convergence has been declared in a subspace S_k, the |L_k| + |U_k| fixed components x_i associated with the active bounds and the components of the approximate solution found in S_k, of dimension n_hat = n - |L_k| - |U_k|, are assembled to compose a full-dimensional vector x_k^S in R^n. The algorithm then checks whether (x_k^S, s_k^S) is optimal for the full-dimensional problem or not. Firstly, a full-space interpolation set of degree n + 1 is built in an eps-neighborhood around the point x_k^S. Subsequently, the corresponding interpolating models m_k^f and m_k^c are recomputed and the f-criticality measure pi_k^f is calculated anew using information from the updated models. Finally, the criticality step in the full space is entered. The following algorithm gives a full description of the ideas explained above.

Algorithm 3.1: SubspaceMinimization(X, Y_k, x_k, s_k, Delta_k^f, Delta_k^c, v_k^max)

Step 1: Check for (nearly) active bounds at x_k and define S_k. If there is no (nearly) active bound or if S_k has already been explored, go to Step 6. If all bounds are active, go to Step 5.

Step 2: Project the points in X which lie close to the (nearly) active bounds onto S_k and associate with them suitable function value estimates. Add the new dummy points to Dum_k, if any.

Step 3: Build a new interpolation set Z_k in S_k including the projected points, if any.

Step 4: Call recursively DEFT-FUNNEL(S_k, X, Z_k, x_k, s_k, Delta_k^f, Delta_k^c, v_k^max) and let (x_k^S, s_k^S) be the solution of the subspace problem after adding the fixed components.

Step 5: If dim(S_k) < n, return (x_k^S, s_k^S). Otherwise, reset (x_k, s_k) = (x_k^S, s_k^S), construct a new set Y_k around x_k, build m_k^f and m_k^c and recompute pi_k^f.

Step 6: If S_k has already been explored, set (x_{k+1}, s_{k+1}) = (x_k, s_k), reduce the trust-region radii by Delta_{k+1}^f = gamma_1 Delta_k^f and Delta_{k+1}^c = gamma_1 Delta_k^c, set Delta_{k+1} = min[Delta_{k+1}^f, Delta_{k+1}^c] and build a new poised set Y_{k+1} in B(x_{k+1}; Delta_{k+1}).

3.2 Criticality step

Convergence is declared when both feasibility and optimality have been achieved and the error between the true functions and the models is expected to be sufficiently small. As mentioned before, this error is directly linked to the Lambda-poisedness measure given in Definition 3.1. The criticality step in DEFT-FUNNEL is described in the next algorithm.

Algorithm 3.2: CriticalityStep(Y_k, pi_k^f, eps_i)

Step 1: Define m_hat_i^f = m_k^f, m_hat_i^c = m_k^c and pi_hat_i^f = pi_k^f.

Step 2: If ||c(x_k, s_k)|| <= eps_i and pi_hat_i^f <= eps_i, set eps_{i+1} = max[alpha ||c(x_k, s_k)||, alpha pi_hat_i^f, eps] and modify Y_k as needed to ensure it is Lambda-poised in B(x_k, eps_{i+1}). If Y_k was modified, compute new models m_hat_i^f and m_hat_i^c, calculate r_hat_i and pi_hat_i^f and increment i by one. If ||c(x_k, s_k)|| <= eps and pi_hat_i^f <= eps, return (x_k, s_k); otherwise, start Step 2 again.

Step 3: Set m_k^f = m_hat_i^f, m_k^c = m_hat_i^c, pi_k^f = pi_hat_i^f, Delta_k = beta max[||c(x_k, s_k)||, pi_k^f] and define theta_i = x_k if a new model has been computed.

3.3 Maintenance of the interpolation set and trust-region updating strategy

Many derivative-based trust-region methods decrease the trust-region radius at unsuccessful iterations in order to converge. In derivative-free optimization, however, unsuccessful iterations of trust-region methods may be due to the poor quality of the interpolating model rather than to an overly large trust-region size. In order to maintain the quality of the surrogate models without resorting to costly model-improvement steps at unsuccessful iterations, which might require another optimization problem to be solved globally, we apply the self-correcting geometry scheme proposed by Scheinberg and Toint [23] for the management of the geometry of the interpolation set. This scheme is based on the idea that unsuccessful trial points can still be useful, as they may improve the geometry of the interpolation set. At such iterations, the trust-region radii are reduced only if the algorithm fails to improve the geometry of the interpolation set by replacing one of its points by the trial point, thereby implying that there is no lack of poisedness in that case. First, an interpolation point that is far from the current point x_k is sought to be replaced by the trial point x_k^+. If there is no such far point, one tries to replace an interpolation point y_{k,j} whose associated Lagrange polynomial value at x_k^+ in absolute value, |l_{k,j}(x_k^+)|, is larger than a predefined threshold Lambda. In this way, one attempts to obtain a Lambda-poised set Y_{k+1} or, since at most one interpolation point is replaced per iteration, at least to improve its poisedness. If no interpolation point is replaced, the trust regions are shrunk.

A slight modification of this scheme is used in DEFT-FUNNEL. Following the idea proposed in [2], the trust-region radii may also be reduced in cases where it is possible to improve poisedness.

The number of reductions of Delta_k^f and Delta_k^c allowed in such cases, however, is limited by constants nu_f^max > 0 and nu_c^max > 0 predefined by the user, as a means to prevent the trust regions from becoming too small. The management of the interpolation set in DEFT-FUNNEL is described in the next algorithm. It depends on the criterion used to define successful iterations, which is passed to the algorithm through the parameter criterion.

Algorithm 3.3: UpdateInterpolationSet(Y_k, x_k, x_k^+, Delta_k, eps_i, criterion)

Step 1: Augment the interpolation set. If |Y_k| < p_max, then define Y_{k+1} = Y_k U {x_k^+}.

Step 2: Replace a dummy interpolation point. If |Y_k| = p_max and the set

    D_k = { y_{k,j} in Dum_k intersect Y_k }                              (3.36)

is non-empty, then define Y_{k+1} = Y_k \ {y_{k,r}} U {x_k^+}, where l_{k,r}(x_k^+) != 0 and r is the index of a point in D_k such that

    r = arg max_j ||y_{k,j} - x_k^+||^2 |l_{k,j}(x_k^+)|.                 (3.37)

Step 3: Successful iteration. If |Y_k| = p_max, D_k is empty, and criterion holds, then define Y_{k+1} = Y_k \ {y_{k,r}} U {x_k^+} for

    y_{k,r} = arg max_{y_{k,j} in Y_k} ||y_{k,j} - x_k^+||^2 |l_{k,j}(x_k^+)|.   (3.38)

Step 4: Replace a far interpolation point. If |Y_k| = p_max, D_k is empty, criterion fails, either x_k != theta_i or Delta_k <= eps_i, and the set

    F_k = { y_{k,j} in Y_k such that ||y_{k,j} - x_k|| > zeta Delta_k and l_{k,j}(x_k^+) != 0 }   (3.39)

is non-empty, then define Y_{k+1} = Y_k \ {y_{k,r}} U {x_k^+}, where

    y_{k,r} = arg max_{y_{k,j} in F_k} ||y_{k,j} - x_k^+||^2 |l_{k,j}(x_k^+)|.   (3.40)

Step 5: Replace a close interpolation point. If |Y_k| = p_max, criterion fails, either x_k != theta_i or Delta_k <= eps_i, the set D_k U F_k is empty, and the set

    C_k = { y_{k,j} in Y_k such that ||y_{k,j} - x_k|| <= zeta Delta_k and |l_{k,j}(x_k^+)| > Lambda }   (3.41)

is non-empty, then define Y_{k+1} = Y_k \ {y_{k,r}} U {x_k^+}, where

    y_{k,r} = arg max_{y_{k,j} in C_k} ||y_{k,j} - x_k^+||^2 |l_{k,j}(x_k^+)|.   (3.42)

Step 6: No replacements. If |Y_k| = p_max, criterion fails and either [x_k = theta_i and Delta_k > eps_i] or D_k U F_k U C_k is empty, then define Y_{k+1} = Y_k.
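The replacement priority of Algorithm 3.3 can be sketched compactly once the Lagrange polynomial values at the trial point are available. The snippet below (Python/NumPy; an illustrative helper, not the authors' code) covers Steps 2 to 6 only: Step 1 (simply augmenting the set while |Y_k| < p_max) and the theta_i / eps_i guards are assumed to be handled by the caller, and zeta_delta stands for the product zeta * Delta_k used in (3.39) and (3.41).

import numpy as np

def choose_replacement(Y, x_plus, x_k, lag_at_xplus, is_dummy,
                       success, zeta_delta, Lambda):
    """Return the index of the interpolation point to replace by x_k^+,
    following the priority dummy > successful > far > close, or None."""
    def score(j):
        # weight of (3.37), (3.38), (3.40), (3.42): distance^2 * |l_{k,j}(x_k^+)|
        return np.linalg.norm(Y[j] - x_plus) ** 2 * abs(lag_at_xplus[j])

    dummies = [j for j in range(len(Y))
               if is_dummy[j] and lag_at_xplus[j] != 0.0]
    if dummies:                                  # Step 2: dummy points first
        return max(dummies, key=score)
    if success:                                  # Step 3: successful iteration
        return max(range(len(Y)), key=score)
    far = [j for j in range(len(Y))
           if np.linalg.norm(Y[j] - x_k) > zeta_delta and lag_at_xplus[j] != 0.0]
    if far:                                      # Step 4: far points
        return max(far, key=score)
    close = [j for j in range(len(Y))
             if np.linalg.norm(Y[j] - x_k) <= zeta_delta
             and abs(lag_at_xplus[j]) > Lambda]
    if close:                                    # Step 5: badly poised close points
        return max(close, key=score)
    return None                                  # Step 6: no replacement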

We now give the details of the procedures related to f- and c-iterations, starting with the former. The operations involved in these procedures concern the acceptance of the trial point and the update of the trust-region radii. The relation between the trust-region updating strategy at unsuccessful iterations and the management of the interpolation set discussed above is now formally described.

Algorithm 3.4: f-iteration(x_k, s_k, x_k^+, s_k^+, Delta_k^f, Delta_k^c)

Step 1: Successful iteration. If rho_k^f >= eta_1, set (x_{k+1}, s_{k+1}) = (x_k^+, s_k^+) and nu_f = 0. If rho_k^f >= eta_2, set Delta_{k+1}^f = min[max[gamma_2 ||d_k||, Delta_k^f], Delta^max]; otherwise, set Delta_{k+1}^f = Delta_k^f. If v(x_k^+, s_k^+) < eta_3 v_k^max, set Delta_{k+1}^c = min[max[gamma_2 ||n_k||, Delta_k^c], Delta^max]; otherwise, set Delta_{k+1}^c = Delta_k^c.

Step 2: Unsuccessful iteration. If rho_k^f < eta_1, set (x_{k+1}, s_{k+1}) = (x_k, s_k) and Delta_{k+1}^c = Delta_k^c. If Y_{k+1} != Y_k, set Delta_{k+1}^f = gamma_1 ||d_k|| and nu_f = nu_f + 1 if nu_f <= nu_f^max, or Delta_{k+1}^f = Delta_k^f otherwise. If Y_{k+1} = Y_k, set Delta_{k+1}^f = gamma_1 ||d_k||.

The operations related to c-iterations follow below.

Algorithm 3.5: c-iteration(x_k, s_k, x_k^+, s_k^+, Delta_k^f, Delta_k^c)

Step 1: Successful iteration. If (2.32) holds, set (x_{k+1}, s_{k+1}) = (x_k^+, s_k^+), Delta_{k+1}^f = Delta_k^f and nu_c = 0. If rho_k^c >= eta_2, set Delta_{k+1}^c = min[max[gamma_2 ||n_k||, Delta_k^c], Delta^max]; otherwise, set Delta_{k+1}^c = Delta_k^c.

Step 2: Unsuccessful iteration. If (2.32) fails, set (x_{k+1}, s_{k+1}) = (x_k, s_k) and Delta_{k+1}^f = Delta_k^f. If Y_{k+1} != Y_k and nu_c <= nu_c^max, set Delta_{k+1}^c = gamma_1 ||n_k|| if n_k != 0, or Delta_{k+1}^c = gamma_1 Delta_k^c otherwise (n_k = 0). If nu_c > nu_c^max, set Delta_{k+1}^c = Delta_k^c. If nu_c <= nu_c^max, update nu_c = nu_c + 1. If Y_{k+1} = Y_k, set Delta_{k+1}^c = gamma_1 ||n_k|| if n_k != 0, or Delta_{k+1}^c = gamma_1 Delta_k^c otherwise (n_k = 0).

4 The algorithm

We now provide a formal description of our complete algorithm for solving nonlinear optimization problems with general nonlinear constraints without derivatives.

Algorithm 4.1: DEFT-FUNNEL(S_k, X, Y_k, x_k, s_k, Delta_k^f, Delta_k^c, v_k^max)

Step 0: Initialization. Choose an initial vector of Lagrange multipliers mu_0 and parameters eps > 0, eps_0 > 0, Delta_0^f > 0, Delta_0^c > 0, alpha in (0, 1), 0 < gamma_1 < 1 < gamma_2, zeta >= 1, 0 < eta_1 < eta_2 < 1, Lambda > 0, beta > 0, eta_3 > 0. Define Delta_0 = min[Delta_0^f, Delta_0^c] <= Delta^max. Initialize Y_0, with x_0 in Y_0, Y_0 contained in B(x_0, Delta_0) and

|Y_0| >= n + 1, as well as the maximum number of interpolation points p_max >= |Y_0|. Compute the associated models m_0^f and m_0^c around x_0 and the Lagrange polynomials {l_{0,j}}_{j=0}^p. Define v_0^max = max[kappa_ca, kappa_cr v(x_0, s_0)], where kappa_ca > 0 and kappa_cr > 1. Compute r_0 by solving (2.19) with normal step n_0 = 0 and define pi_0^f as in (2.18). Define nu_f^max > 0 and nu_c^max > 0 and set nu_f = nu_c = 0. Set k = 0 and i = 0.

Step 1: SubspaceMinimization(X, Y_k, x_k, s_k, Delta_k^f, Delta_k^c, v_k^max).

Step 2: CriticalityStep(Y_k, pi_k^f, eps_i).

Step 3: NormalStep(x_k, s_k, pi_k^v, pi_k^f, v_k, v_k^max).

Step 4: TangentStep(x_k, s_k, n_k).

Step 5: Conclude a mu-iteration. If n_k = t_k = 0, then
  Step 5.1: set (x_{k+1}, s_{k+1}) = (x_k, s_k), Delta_{k+1}^f = Delta_k^f and Delta_{k+1}^c = Delta_k^c;
  Step 5.2: set Delta_{k+1} = min[Delta_{k+1}^f, Delta_{k+1}^c], v_{k+1}^max = v_k^max and Y_{k+1} = Y_k.

Step 6: Conclude an f-iteration. If t_k != 0 and (2.25) and (2.29) hold,
  Step 6.1: UpdateInterpolationSet(Y_k, x_k, x_k^+, Delta_k, eps_i, rho_k^f >= eta_1);
  Step 6.2: f-iteration(x_k, s_k, x_k^+, s_k^+, Delta_k^f, Delta_k^c);
  Step 6.3: set Delta_{k+1} = min[Delta_{k+1}^f, Delta_{k+1}^c] and v_{k+1}^max = v_k^max.

Step 7: Conclude a c-iteration. If either n_k != 0 and t_k = 0, or either one of (2.25) or (2.29) fails,
  Step 7.1: UpdateInterpolationSet(Y_k, x_k, x_k^+, Delta_k, eps_i, (2.32));
  Step 7.2: c-iteration(x_k, s_k, x_k^+, s_k^+, Delta_k^f, Delta_k^c);
  Step 7.3: set Delta_{k+1} = min[Delta_{k+1}^f, Delta_{k+1}^c] and update v_{k+1}^max using (2.34).

Step 8: Update the models and the Lagrange polynomials. If Y_{k+1} != Y_k, compute the interpolation models m_{k+1}^f and m_{k+1}^c around x_{k+1} using Y_{k+1} and the associated Lagrange polynomials {l_{k+1,j}}_{j=0}^p. Increment k by one and go to Step 1.

5 Numerical experiments

We implemented DEFT-FUNNEL in Matlab and tested it first on a set of 80 small-scale constrained problems from the CUTEst [9] collection. The problems contain at least one equality or inequality constraint, many of them containing both types and some containing simple bounds as well. 63 of the problems in our experiments are in the range of the first 3 test examples from the Hock-Schittkowski collection [3], while the remaining ones are nonlinearly constrained optimization problems found in [3]. For the calculation of the normal step and the approximate Lagrange multipliers, we used a Matlab code named BLLS, developed in collaboration with Anke Tröltzsch, for solving bound-constrained linear least-squares problems. This method is intended for small-dimensional problems and is an active-set algorithm where the unconstrained problem is solved at each iteration in the subspace defined by the currently active bounds, which are determined by a projected Cauchy step.

As for the computation of the tangent step, we used a nonmonotone spectral projected gradient method [2] to solve the (possibly) indefinite quadratic subproblems (2.17). Finally, we define the bounding functions omega_n(t) and omega_t(t) as

    omega_n(t) = . min[1, t]   and   omega_t(t) = . min[1, t^2].         (5.43)

We compare the results of the four variants of our method, which differ in the way the interpolating models are built (subbasis selection, minimum Frobenius norm, minimum l_2-norm and regression), with those obtained with the popular Fortran code COBYLA [9], a trust-region method for constrained problems that models the objective and constraint functions by linear interpolation. The only criterion for comparison is the number of calls to the routine that evaluates the objective function and the constraints simultaneously at the required points, which is motivated by the fact that these costs very often dominate those of all other algorithmic computations. Therefore, we do not count evaluations of the objective function and the constraints separately, but rather the number of calls to the single routine that calculates both simultaneously. Comparisons of the computational results obtained by DEFT-FUNNEL on equality-constrained problems only have also been made with another method named CDFO, a derivative-free filter-SQP algorithm proposed in [4]. The results are reported in [22] and show that DEFT-FUNNEL was superior to CDFO in our tests.

The threshold eps for declaring convergence in the criticality step of DEFT-FUNNEL is set to eps = 10^-4. The stopping criterion used in COBYLA is based on the trust-region radius rho taken from the interval [rho_end, rho_beg], where rho_beg and rho_end are constants predefined by the user. The parameter rho is decreased by a constant factor during the execution of the algorithm and is never increased. The algorithm stops when rho = rho_end. Therefore, rho_end should have the magnitude of the required accuracy in the final values of the variables. In our experiments, we set rho_end = 10^-4. In DEFT-FUNNEL, we fixed the trust-region parameters to eta_2 = 0.9, eta_3 = 0.5, gamma_1 = 0.5 and gamma_2 = 2.5, the remaining constants Delta_0, eta_1 and Delta^max being fixed a priori as well. The parameter zeta used in the definition of the sets F_k and C_k of far points and close points, respectively, and the limit numbers nu_f^max = nu_c^max of trust-region reductions allowed when a far or close interpolation point is replaced at unsuccessful iterations, are likewise fixed constants. We set p_max = (n + 1)(n + 2)/2 for the subbasis, minimum Frobenius norm and minimum l_2-norm approaches, and p_max = (n + 1)(n + 2) for the regression case. Finally, we set eps_0 as the initial value for the loop in the criticality step, together with the constants alpha and beta.

The performance of the methods is first compared by means of performance profiles [8]. Denote the set of solvers by S and the set of problems by P. We compare the performance on problem p in P by solver s in S with the best performance by any solver on this problem; that is, we use the performance ratio

    r_{p,s} = t_{p,s} / min{ t_{p,s'} : s' in S },

where t_{p,s} is the number of function evaluations required to solve problem p by solver s. If we define

    rho_s(tau) = (1/|P|) |{ p in P : r_{p,s} <= tau }|,

then rho_s(tau) is an approximation to the probability, for solver s in S, that the performance ratio r_{p,s} is within a factor tau in R of the best possible ratio. In Figure 5.1, we provide the performance profiles of the four variants of DEFT-FUNNEL, whereas, in Figure 5.2, we compare the performance of each variant of our method to COBYLA's individually.
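The performance-profile computation just described is straightforward to reproduce; the following minimal sketch (Python/NumPy, illustrative names, not the scripts used for the figures) takes a table of evaluation counts and returns the profile values rho_s(tau) for a grid of thresholds.

import numpy as np

def performance_profile(T, taus):
    """T[p, s] = number of function evaluations solver s needs on problem p
    (np.inf if it fails); taus = 1-D array of ratio thresholds.
    Returns rho[s, i] = fraction of problems with ratio r_{p,s} <= taus[i]."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)      # best solver on each problem
    ratios = T / best                        # r_{p,s}
    n_solvers = T.shape[1]
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(n_solvers)])

# Toy usage: 3 problems, 2 solvers; the second solver fails on problem 2.
T = np.array([[20.0, 35.0],
              [50.0, np.inf],
              [40.0, 30.0]])
print(performance_profile(T, taus=np.array([1.0, 2.0, 4.0])))

For the log2-scaled profiles shown in the figures, the same quantities are simply plotted against log2(tau) instead of tau.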
As can be seen, DEFT-FUNNEL shows superior results on this set of test problems when the interpolating models are built with the subbasis selection, minimum Frobenius norm and minimum l_2-norm approaches. For the regression variant, Figure 5.2d reveals that COBYLA was faster than DEFT-FUNNEL on most of the problems, although the latter was able to solve more problems than the former for large values of tau.

Figure 5.1: Log2-scaled performance profiles of DEFT-FUNNEL with different approaches to build the models (subbasis, minimum Frobenius norm, minimum l_2-norm, regression) on a set of 80 problems from CUTEst.

We also compare the methods by using data profiles. Such profiles were proposed by Moré and Wild [8] as a means of giving the user an accurate view of the performance of derivative-free solvers when the computational budget is limited. Let t_{p,s} be the number of function evaluations required to solve problem p by solver s, as before, and denote by n_p the dimension of the vector x in problem p. The data profile of solver s in S is defined by

    d_s(kappa) = (1/|P|) |{ p in P : t_{p,s} / (n_p + 1) <= kappa }|.

Because of the scaling choice of dividing by n_p + 1, we may interpret d_s(kappa) as the percentage of problems that can be solved with kappa simplex gradient estimates. In Figure 5.3, the data profiles show that the subbasis, minimum Frobenius norm and minimum l_2-norm variants have performance similar to COBYLA's when the computational budget is small. In particular, when only 2 simplex gradient estimates are allowed, they solve roughly 8% of the problems. However, as the budget limit increases, the difference in performance between these three variants and COBYLA becomes progressively larger, revealing the superiority of DEFT-FUNNEL. For the regression variant, it is difficult to tell from Figure 5.3d which method wins. For a very small budget (up to 7 simplex gradient estimates), both solve approximately the same number of problems. For a budget of kappa simplex gradients with kappa in (7, 35], COBYLA solves more problems than DEFT-FUNNEL, while for larger values of kappa DEFT-FUNNEL solves more problems.

We also tested DEFT-FUNNEL on a set of 2 small-scale linearly constrained optimization problems from CUTEst and compared its results with those of the software LINCOA, a newly developed trust-region method by Powell [2] for linearly constrained optimization without derivatives. This software combines active-set methods with truncated conjugate gradients and uses quadratic interpolation models of the objective function. The user is required to provide the Jacobian matrix and the vector of constants on the right-hand side of the constraints as inputs, as well as a feasible starting point. Since many of the CUTEst problems provide an infeasible starting point, we chose a feasible one instead for the sake of comparison. The stopping criterion used in LINCOA is the same as that of COBYLA, i.e. the algorithm stops when the trust-region radius rho in [rho_end, rho_beg] reaches the smallest value allowed, rho_end.
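For completeness, the Moré-Wild data profile defined earlier in this section can be computed analogously to the performance profile; the short sketch below (Python/NumPy, illustrative helper) only adds the division by n_p + 1.

import numpy as np

def data_profile(T, dims, kappas):
    """T[p, s] = evaluations needed by solver s on problem p (np.inf on failure),
    dims[p] = n_p, kappas = budgets expressed in simplex gradient estimates.
    Returns d[s, i] = fraction of problems solved within budget kappas[i]."""
    T = np.asarray(T, dtype=float)
    cost = T / (np.asarray(dims, dtype=float)[:, None] + 1.0)   # t_{p,s}/(n_p+1)
    return np.array([[np.mean(cost[:, s] <= kappa) for kappa in kappas]
                     for s in range(T.shape[1])])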

(a) Subbasis  (b) Min. Frobenius norm  (c) Min. l_2-norm  (d) Regression

Figure 5.2: Log2-scaled performance profiles of the methods DEFT-FUNNEL (with different approaches to build the models) and COBYLA on a set of 80 problems from CUTEst.

In our experiments, we set rho_end = 10^-4. Regarding the parameter settings in DEFT-FUNNEL, the same values were kept for the comparison with LINCOA. In Figure 5.4, we provide the performance profiles of the four variants of DEFT-FUNNEL on the 2 linearly constrained problems, whereas, in Figure 5.5, we compare the performance of each variant of our method to LINCOA's individually. All the variants of DEFT-FUNNEL except the minimum l_2-norm approach solved all of the 2 linearly constrained problems. The results reported in Figures 5.5a and 5.5b reveal a superior performance of LINCOA over DEFT-FUNNEL/Subbasis and DEFT-FUNNEL/Frobenius. On the other hand, Figure 5.5c shows that DEFT-FUNNEL/Min. l_2-norm was faster than LINCOA and, in Figure 5.5d, it can be seen that DEFT-FUNNEL/Regression was also slightly faster, although LINCOA presented superior performance for large values of tau, which indicates more robustness. Data profiles of DEFT-FUNNEL and LINCOA are presented in Figure 5.6 and show that, for very small budgets (less than simplex gradient estimates), DEFT-FUNNEL was able to solve more problems than LINCOA, while the latter performed better than the former for larger budgets.

(a) Subbasis  (b) Min. Frobenius norm  (c) Min. l_2-norm  (d) Regression

Figure 5.3: Data profiles of the methods DEFT-FUNNEL (with different approaches to build the models) and COBYLA on a set of 80 problems from CUTEst.

We conclude this section by noting that the performance profiles in Figures 5.2 and 5.5 and the data profiles in Figures 5.3 and 5.6 give a clear indication that DEFT-FUNNEL provides encouraging results for small-scale nonlinear optimization problems with general nonlinear constraints. In the case where the user knows that some constraint functions are linear and is able to provide their gradients, it might be interesting to handle those constraints separately rather than lumping them together with the general nonlinear constraints in c(x). The numbers of function evaluations required by each method to converge in our numerical experiments are reported in the appendix.

Figure 5.4: Log2-scaled performance profiles of DEFT-FUNNEL with different approaches to build the models on a set of 2 linearly constrained problems from CUTEst.

(a) Subbasis  (b) Min. Frobenius norm  (c) Min. l_2-norm  (d) Regression

Figure 5.5: Log2-scaled performance profiles of the methods DEFT-FUNNEL (with different approaches to build the models) and LINCOA on a set of 2 linearly constrained problems from CUTEst.