Introduction to unconstrained optimization: derivative-free methods
Jussi Hakanen, post-doctoral researcher
Office: AgC426.3, jussi.hakanen@jyu.fi
Learning outcomes
- To understand the basic principles of derivative-free methods in optimization
- To understand the differences between gradient-based and derivative-free methods
- To be able to decide when these methods could or should be used
Why derivative-free methods?
- They don't utilize derivative information
- If derivatives are available, they should be used!
- For solving problems where derivatives are not available, are not reliable, or are time-consuming to calculate
Reminder: Structure of optimization methods
Typically:
- Constraint handling converts the problem to (a series of) unconstrained problems
- In unconstrained optimization, a search direction is determined at each iteration
- The best solution in the search direction is found with a line search
[Diagram: Constraint handling method → Unconstrained optimization → Line search]
Reminder: Optimality conditions for the unconstrained problem min f(x) s.t. x ∈ ℝⁿ
- Necessary conditions: Let f be twice differentiable at x*. If x* is a local minimizer, then ∇f(x*) = 0 (that is, x* is a critical point of f) and H(x*) is positive semidefinite.
- Sufficient conditions: Let f be twice differentiable at x*. If ∇f(x*) = 0 and H(x*) is positive definite, then x* is a strict local minimizer.
- Result: Let f: ℝⁿ → ℝ be twice differentiable at x*. If ∇f(x*) = 0 and H(x*) is indefinite, then x* is a saddle point.
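As a worked check of these conditions, the sketch below (not from the slides; the convex quadratic is chosen here purely for illustration) verifies numerically that a candidate point is a critical point and that the Hessian is positive definite:

```python
# Checking the optimality conditions on f(x) = 2x1^2 + 2x1x2 + x2^2 + x1 - x2,
# an illustrative convex quadratic (not from the slides).
import numpy as np

def grad(x):
    # Gradient of f, derived by hand.
    return np.array([4*x[0] + 2*x[1] + 1, 2*x[0] + 2*x[1] - 1])

H = np.array([[4.0, 2.0],   # Hessian of f (constant, since f is quadratic)
              [2.0, 2.0]])

x_star = np.array([-1.0, 1.5])             # candidate critical point
print(np.allclose(grad(x_star), 0.0))      # True: x* is a critical point
print(np.all(np.linalg.eigvalsh(H) > 0))   # True: H positive definite,
                                           # so x* is a strict local minimizer
```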
Reminder: Model algorithm for unconstrained minimization
Let x_h be the current estimate for x*.
1) [Test for convergence.] If the conditions are satisfied, stop. The solution is x_h.
2) [Compute a search direction.] Compute a nonzero vector d_h ∈ ℝⁿ, the search direction.
3) [Compute a step length.] Compute α_h > 0, the step length, for which f(x_h + α_h d_h) < f(x_h).
4) [Update the estimate for the minimum.] Set x_h+1 = x_h + α_h d_h, h = h + 1, and go to step 1.
(From Gill et al., Practical Optimization, 1981, Academic Press)
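A minimal Python sketch of this model algorithm with a derivative-free flavor: the search direction cycles through the coordinate axes and the step length comes from simple backtracking on function values only. The names, tolerances, and the crude stopping rule are illustrative choices, not from Gill et al.:

```python
import numpy as np

def backtracking(f, x, d, alpha=1.0, shrink=0.5, tol=1e-10):
    # Step 3: halve alpha until f decreases; give up below tol.
    while alpha > tol:
        if f(x + alpha * d) < f(x):
            return alpha
        alpha *= shrink
    return None

def minimize_model(f, x0, direction, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for h in range(max_iter):
        d = direction(f, x, h)             # step 2: search direction
        alpha = backtracking(f, x, d)      # step 3: step length
        if alpha is None:                  # step 1: crude convergence test
            break                          # (a real test would check all directions)
        x = x + alpha * d                  # step 4: update the estimate
    return x

# Illustrative direction rule: cycle through signed coordinate axes.
def coord_direction(f, x, h):
    d = np.zeros(len(x)); d[h % len(x)] = 1.0
    return d if f(x + 1e-6 * d) < f(x) else -d

print(minimize_model(lambda x: (x[0] - 2)**4 + (x[0] - 2*x[1])**2,
                     [0.0, 3.0], coord_direction))
```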
Question
List derivative-free methods for unconstrained optimization. What are their general ideas?
Discuss in pairs (5-10 minutes). You can freely utilize e.g. the Internet.
Summary of discussion
1. Classification: direct search, simplicial direct search, line search and trust-region based methods
2. Limitations: small number of variables, best suited for fairly smooth functions, less accuracy
3. Examples: pattern search, particle swarm (evolutionary methods in general), downhill simplex
Derivative-free methods
- Don't use derivative information
- Either build models of the functions based on sample function values (model-based methods) or utilize the sample function values directly (direct search methods)
Pros:
- Applicable to a wider class of problems
- For highly multimodal problems they can perform better than gradient-based methods
Cons:
- Typically slower convergence
- Less accurate results
- Can only solve problems with a moderate number of variables (up to tens of variables)
Literature:
- Some information in many books on nonlinear optimization
- Conn, Scheinberg & Vicente: Introduction to Derivative-Free Optimization, SIAM, 2009
Multiple local optima
[Figure: a function with multiple local optima; an example model-instance, see HB of GO, Vol 2, Ch 15. (C) Janos D. Pinter, PCS Inc.]
Examples
- Univariate search, coordinate descent, cyclic coordinate search
- Hooke and Jeeves
- Powell's method
- Nelder-Mead simplex
Coordinate descent
Example: f(x) = 2x_1^2 + 2x_1 x_2 + x_2^2 + x_1 − x_2
(From Miettinen: Nonlinear optimization, 2007, in Finnish)
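A sketch of cyclic coordinate descent on this example (assuming the formula above is reconstructed correctly from the slide; SciPy's one-dimensional minimizer plays the role of the exact line search along each axis):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + x[0] - x[1]

def coordinate_descent(f, x0, sweeps=50, tol=1e-10):
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        x_old = x.copy()
        for i in range(len(x)):            # minimize along each axis in turn
            d = np.zeros(len(x)); d[i] = 1.0
            res = minimize_scalar(lambda a: f(x + a * d))
            x = x + res.x * d
        if np.linalg.norm(x - x_old) < tol:  # stop when a full sweep stalls
            break
    return x

print(coordinate_descent(f, [0.0, 0.0]))   # approx. (-1.0, 1.5) for this f
```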
Idea of pattern search
(From Miettinen: Nonlinear optimization, 2007, in Finnish)
Hooke and Jeeves
Example: f(x) = (x_1 − 2)^4 + (x_1 − 2x_2)^2
(From Miettinen: Nonlinear optimization, 2007, in Finnish)
Hooke and Jeeves with fixed step length
Example: f(x) = (x_1 − 2)^4 + (x_1 − 2x_2)^2
(From Miettinen: Nonlinear optimization, 2007, in Finnish)
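A minimal sketch of Hooke and Jeeves with exploratory and pattern moves, applied to the example function above; the step-halving rule and parameter values are common textbook choices, not taken verbatim from Miettinen's notes:

```python
import numpy as np

def explore(f, x, step):
    # Exploratory move: probe +/- step along each coordinate axis,
    # keeping any trial point that improves f.
    x = x.copy()
    for i in range(len(x)):
        for s in (step, -step):
            trial = x.copy(); trial[i] += s
            if f(trial) < f(x):
                x = trial
                break
    return x

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6):
    base = np.asarray(x0, dtype=float)
    while step > tol:
        trial = explore(f, base, step)
        if f(trial) < f(base):
            # Pattern move: extrapolate along base -> trial, then
            # explore around the extrapolated point.
            while True:
                pattern = explore(f, trial + (trial - base), step)
                base = trial
                if f(pattern) < f(trial):
                    trial = pattern
                else:
                    break
        else:
            step *= shrink   # exploration failed: halve the step length
    return base

print(hooke_jeeves(lambda x: (x[0] - 2)**4 + (x[0] - 2*x[1])**2,
                   [0.0, 3.0]))   # should approach (2, 1)
```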
Powell's method
- Regarded as the most efficient pattern search method
- Differs from Hooke and Jeeves in that, after each pattern search step, one of the coordinate directions is replaced with the previous pattern search direction
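To experiment with Powell's method in practice, SciPy ships a variant as method='Powell' in scipy.optimize.minimize (this is standard SciPy usage, not code from the slides); here it is applied to the Hooke and Jeeves example function:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**4 + (x[0] - 2*x[1])**2
res = minimize(f, x0=np.array([0.0, 3.0]), method='Powell')
print(res.x, res.fun)   # approx. [2, 1] with f near 0
```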
Nelder-Mead simplex
- Nelder and Mead: A Simplex Method for Function Minimization, Computer Journal, 7, 308-313, 1965
- The most widely used and cited direct search method
- The original method sometimes performs poorly, but different modifications have been made to improve it
- A derivative-free method; don't confuse it with the more famous simplex algorithm for linear optimization
- Operates with n + 1 points x_1, …, x_n+1 in ℝⁿ
- Also known as the polytope algorithm, because the points can be seen as vertices of a polytope in ℝⁿ
Nelder-Mead simplex (cont.)
Initialize: Select n + 1 points in ℝⁿ and evaluate the objective function at these points. Order the points so that f_1 ≤ f_2 ≤ … ≤ f_n ≤ f_n+1 (note: f_j = f(x_j)).
New trial point: x_r = c + α(c − x_n+1), where c = (1/n) Σ_{j=1}^{n} x_j is the centroid of the n best vertices and α > 0 is the reflection coefficient. Evaluate f_r.
- f_1 ≤ f_r ≤ f_n: x_r is neither a new best point nor the worst point; x_r replaces x_n+1 and we move to the next iteration.
- f_r < f_1: x_r is the new best point; try to improve f further in that direction with x_e = c + β(x_r − c), where β > 1 is the expansion coefficient. If f_e < f_r, replace x_n+1 by x_e; otherwise, replace x_n+1 by x_r.
- f_r > f_n: The polytope is considered too large and should be contracted. Contraction: x_c = c + γ(x_n+1 − c) if f_r ≥ f_n+1, or x_c = c + γ(x_r − c) if f_r < f_n+1, where 0 < γ < 1 is the contraction coefficient. If f_c < min{f_r, f_n+1}, replace x_n+1 by x_c; otherwise, further contraction is carried out.
(From Gill et al., Practical Optimization, 1981)
[Figure: reflection and expansion of a polytope with vertices f_1, f_2, f_3, showing x_r and x_e]
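The iteration above translates fairly directly into code. Below is a compact sketch using the common coefficient choices α = 1, β = 2, γ = 1/2; these values, the flatness-based stopping test, and the shrink step used for "further contraction" are standard conventions, not specified on the slide:

```python
import numpy as np

def nelder_mead(f, x0, alpha=1.0, beta=2.0, gamma=0.5, tol=1e-8, max_iter=1000):
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    # Initial polytope: x0 plus a unit step along each coordinate axis.
    X = np.vstack([x0] + [x0 + np.eye(n)[i] for i in range(n)])
    F = np.array([f(x) for x in X])
    for _ in range(max_iter):
        order = np.argsort(F)
        X, F = X[order], F[order]            # now f_1 <= ... <= f_{n+1}
        if F[-1] - F[0] < tol:               # polytope is flat: stop
            break
        c = X[:-1].mean(axis=0)              # centroid of the n best vertices
        xr = c + alpha * (c - X[-1])         # reflection
        fr = f(xr)
        if fr < F[0]:                        # new best point: try expansion
            xe = c + beta * (xr - c)
            fe = f(xe)
            X[-1], F[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr <= F[-2]:                    # neither best nor worst: accept
            X[-1], F[-1] = xr, fr
        else:                                # polytope too large: contract
            xc = c + gamma * ((X[-1] if fr >= F[-1] else xr) - c)
            fc = f(xc)
            if fc < min(fr, F[-1]):
                X[-1], F[-1] = xc, fc
            else:                            # "further contraction": shrink
                X = X[0] + 0.5 * (X - X[0])  # all vertices toward the best
                F = np.array([f(x) for x in X])
    return X[np.argmin(F)]

print(nelder_mead(lambda x: (x[0] - 2)**4 + (x[0] - 2*x[1])**2,
                  [0.0, 3.0]))   # should approach (2, 1)
```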
Rosenbrock function
- Have you heard of the Rosenbrock function?
- A non-convex function f(x) = (1 − x_1)^2 + 100(x_2 − x_1^2)^2
- Has its global minimum at x* = (1, 1)^T, f(x*) = 0, located in a narrow, banana-shaped valley
- The coefficient of the second term can be adjusted, but this does not affect the position of the global minimum
- Used to test optimization algorithms
[Figure from Wikipedia]
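SciPy even ships the Rosenbrock function as scipy.optimize.rosen, so it takes only a couple of lines to try the Nelder-Mead method on it (standard SciPy usage, shown here as an illustration):

```python
from scipy.optimize import minimize, rosen

# Start in the valley on the far side of the bend, a classic test start.
res = minimize(rosen, x0=[-1.2, 1.0], method='Nelder-Mead')
print(res.x)   # approx. [1, 1], the global minimum in the banana valley
```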
Nelder-Mead simplex example
(From Gill et al., Practical Optimization, 1981)
Compare with steepest descent
(From Gill et al., Practical Optimization, 1981)
Next lecture (Wed 28.1.)
- Matlab: Optimization Toolbox
- Demonstration of the toolbox for unconstrained optimization