Recent Developments in Model-based Derivative-free Optimization

Seppo Pulkkinen, April 23, 2010

Introduction: Problem Definition. The problem we consider is a constrained nonlinear optimization problem:

$$\min_{x \in \mathbb{R}^n} f(x), \quad f : \mathbb{R}^n \to \mathbb{R}, \qquad \text{subject to} \quad l_i \le x_i \le u_i, \ i = 1, \dots, n, \quad Ax \le b.$$

We also assume that the objective function is nonconvex, not necessarily differentiable, and expensive to evaluate.

A Motivating Example: The Image Matching Problem. Given two consecutive images and a region in the first image, find the matching region in the second image. Practical considerations:
- It is difficult to find invariant measures between the images.
- The transformations between the images can be large.
- The images may be contaminated with noise.

Image Matching Problem: A Simple Mathematical Formulation. The aim is to find the transformation parameters $p$ giving the best fit between the matched regions by solving

$$\min_p \sum_{x \in \Omega} \| I_1(x) - I_2(x + T(x, p)) \|^2.$$

This is a nonconvex nonlinear optimization problem with
- a large number of local minima,
- a nondifferentiable objective function,
- possibly constraints enforcing the smoothness of the solution.

Implication: local gradient-based methods are not really usable for such problems.
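To make the formulation concrete, here is a minimal numpy/scipy sketch of the matching objective for the simplest possible transformation, a pure translation $T(x, p) = p$; the function name `ssd_cost`, its signature, and the example data are illustrative choices, not part of the original formulation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def ssd_cost(p, I1, I2, rows, cols):
    """Sum-of-squared-differences objective for a pure translation
    T(x, p) = p = (dy, dx); rows/cols are the pixel indices of the region Omega."""
    # Bilinear sampling of I2 at the translated coordinates.
    shifted = map_coordinates(I2, [rows + p[0], cols + p[1]], order=1, mode='nearest')
    return np.sum((I1[rows, cols] - shifted) ** 2)

# Hypothetical usage: cost of shifting a 20x20 region by (1.5, -2.0).
I1 = np.random.default_rng(0).random((64, 64))
I2 = np.roll(I1, (2, -2), axis=(0, 1))
rows, cols = np.meshgrid(np.arange(20, 40), np.arange(20, 40), indexing='ij')
print(ssd_cost(np.array([1.5, -2.0]), I1, I2, rows, cols))
```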

Problems With Noisy Data. [Figure: a smooth objective function and a noisy version of it.] Consider a difference approximation of the form

$$\frac{\partial f}{\partial x_i} \approx \frac{f(x + h e_i) - f(x)}{h}.$$

Clearly, any local gradient approximation becomes unusable in the presence of noise.
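A small numpy sketch of the effect described above, using a made-up quadratic test function and an artificial noise level of $10^{-4}$; with $h = 10^{-6}$ the noise divided by $h$ dominates the difference quotient.

```python
import numpy as np

def forward_diff_grad(f, x, h=1e-6):
    """Forward-difference gradient: (f(x + h*e_i) - f(x)) / h for each coordinate."""
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

rng = np.random.default_rng(0)
smooth = lambda x: np.sum(x**2)                                # exact gradient is 2*x
noisy = lambda x: np.sum(x**2) + 1e-4 * rng.standard_normal()  # additive evaluation noise

x = np.array([1.0, -2.0])
print(forward_diff_grad(smooth, x))   # close to [ 2., -4.]
print(forward_diff_grad(noisy, x))    # noise of size 1e-4 divided by h = 1e-6 swamps the signal
```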

The Traditional Approach: Taylor Series Approximations. Most gradient-based algorithms employ the quadratic Taylor series approximation

$$m(x + s) = f(x) + \nabla f(x)^T s + \tfrac{1}{2} s^T H(x) s.$$

This can be expressed in the more generic form

$$m(x + s) = c + b^T s + \tfrac{1}{2} s^T A s,$$

where $c \in \mathbb{R}$, $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$. Problem: can the model parameters $c$, $b$ and $A$ be estimated without evaluating derivatives?

Interpolation-based Methods. An alternative approach: determine the model parameters $c$, $b$ and $A$ from the interpolation equations

$$m(x + y_i) = f(x + y_i), \quad i = 1, \dots, |Y|,$$

where $Y = \{y_1, \dots, y_m\}$ is the set of interpolation points. A model defined by the above equations
- only requires that $f$ can be evaluated at the given points: no derivatives are needed,
- is not restricted to a small neighbourhood of $x$.

Limitations of Quadratic Models. Quadratic interpolation models have several limitations:
- $O(n^2)$ parameters need to be determined from the interpolation equations.
- Updating the model parameters has complexity $O(n^4)$.
- The interpolation set must be well-poised, which leads to complex geometric conditions.
- Quadratic models are essentially local: they cannot model the multimodal behaviour of a nonconvex function.

Problem: is there a model function that
- requires only $O(n)$ parameters,
- requires only mild conditions for well-poisedness,
- can approximate functions with multiple local minima?

Improvement: Underdetermined Quadratic Models. An alternative approach is to determine only the diagonal elements of the matrix $A$ from the interpolation equations. The off-diagonal elements of $A$ are approximated by the minimum Frobenius norm method (Powell, 2004; Wild, 2008).
- Requires only $2n + 1$ model parameters.
- The amount of work per iteration is $O(n^3)$ (Powell, 2004).
- Analogous to quasi-Newton methods.

However, this approach still has the limitations of quadratic models: it gives only local approximations.

A Novel Approach: Radial Basis Function Models. A typical radial basis function (RBF) model is of the form

$$m(x + s) = \sum_{i=1}^{|Y|} \lambda_i \, \phi(\| s - y_i \|) + p(s),$$

where the $\lambda_i$ are weighting coefficients and $p$ is a low-order polynomial. Such a model function addresses the questions posed above:
- The minimum number of interpolation points is $n + 2$.
- An arbitrary number of interpolation points can be used.
- It is ideal for approximating functions with multiple minima.

Radial Basis Functions: Overview. The choice of the radial basis function $\phi$ is crucial for the accuracy and numerical stability of the approximation. Commonly used radial basis functions ($r \ge 0$):
- $\phi(r) = r$ (linear)
- $\phi(r) = r^3$ (cubic)
- $\phi(r) = r^2 \log r$ (thin plate)
- $\phi(r) = (\gamma r^2 + 1)^{3/2}$ (multiquadric)
- $\phi(r) = \exp(-\gamma r^2)$ (Gaussian)

Other important applications of radial basis functions are the solution of partial differential equations and neural networks.
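For reference, the radial functions listed above can be collected into a small dictionary of numpy callables; the uniform `(r, gamma)` signature (with $\gamma$ ignored by the kernels that have no shape parameter) is just a convenience for experimenting, not something prescribed by the talk.

```python
import numpy as np

# Commonly used radial basis functions phi(r), r >= 0, as numpy callables.
# gamma is the shape parameter; kernels without one simply ignore it.
rbf_kernels = {
    "linear":       lambda r, gamma=1.0: r,
    "cubic":        lambda r, gamma=1.0: r**3,
    "thin_plate":   lambda r, gamma=1.0: r**2 * np.log(np.maximum(r, np.finfo(float).tiny)),
    "multiquadric": lambda r, gamma=1.0: (gamma * r**2 + 1.0) ** 1.5,
    "gaussian":     lambda r, gamma=1.0: np.exp(-gamma * r**2),
}
```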

An Illustrative Example. [Figure: contour plots of the Rastrigin function and of an RBF model interpolating it at 30 randomly chosen interpolation points.]

An Illustrative Example. [Figure: surface plots of the objective function and of the RBF model.] Radial basis function models yield
- global approximations of the objective function,
- increasingly accurate approximations as the number of interpolation points increases.

Limiting Functions of Flat RBF Models (1). Examples of RBF models with an adjustable shape parameter $\gamma$:
- $\phi(r) = (\gamma r^2 + 1)^{3/2}$ (multiquadric)
- $\phi(r) = e^{-\gamma r^2}$ (Gaussian)

The limit $\gamma \to 0$ (Fornberg et al., 2004): when $|Y| = \frac{(n+1)(n+2)}{2}$, the limit $\gamma \to 0$ yields under certain conditions a quadratic polynomial, i.e.

$$\lim_{\gamma \to 0} \sum_{i=1}^{|Y|} \lambda_i \, \phi(\| s - y_i \|, \gamma) + p(s) = \tfrac{1}{2} s^T A s + b^T s + c.$$

Implication: RBF models yield accurate local approximations by letting $\gamma \to 0$ near a minimum.

Limiting Functions of Flat RBF Models (2). [Figure: contour plots of the objective function, a multiquadric RBF model with γ = 5, a multiquadric RBF model with γ = 0.05, and a quadratic model.]

Geometric Conditions for RBF Interpolation (1). We are particularly interested in multiquadric RBF models

$$m(x + s) = \sum_{i=1}^{|Y|} \lambda_i \, (\gamma \| s - y_i \|^2 + 1)^{3/2} + g^T s + c,$$

where the linear polynomial tail
- guarantees a unique interpolant (Powell, 1992),
- provides an estimate for the gradient of the function.

The interpolation equations uniquely determining the model parameters are

$$m(x + y_i) = f(x + y_i), \quad i = 1, \dots, |Y|, \qquad \sum_{i=1}^{|Y|} \lambda_i \, p_j(y_i) = 0, \quad j = 1, \dots, n + 1.$$

Geometric Conditions for RBF Interpolation (2). We write the linear tail as $p(s) = \sum_{i=1}^{n+1} c_i p_i(s)$, where $\{p_1, \dots, p_{n+1}\}$ spans the space of linear polynomials. The interpolation equations in matrix form are

$$\begin{bmatrix} \Phi & \Pi \\ \Pi^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ c \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix},$$

where $\lambda = (\lambda_1, \dots, \lambda_{|Y|})^T$, $c = (c_1, \dots, c_{n+1})^T$, $F = (f(x + y_1), \dots, f(x + y_{|Y|}))^T$, and $\Phi_{ij} = \phi(\| y_i - y_j \|)$, $\Pi_{ij} = p_j(y_i)$.
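A minimal numpy sketch of assembling and solving this saddle-point system, assuming the monomial basis $\{1, s_1, \dots, s_n\}$ for the linear tail; `fit_rbf_model` and its argument names are hypothetical, and no safeguards against ill-conditioning or poor poisedness are included.

```python
import numpy as np

def fit_rbf_model(Y, F, phi):
    """Fit an RBF model with a linear tail by solving the saddle-point system
    [[Phi, Pi], [Pi^T, 0]] [lambda; c] = [F; 0].

    Y   : (m, n) array of interpolation offsets y_i from the current point x
    F   : (m,)   array of function values f(x + y_i)
    phi : radial function applied elementwise to distances
    """
    m, n = Y.shape
    dists = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)   # ||y_i - y_j||
    Phi = phi(dists)
    Pi = np.hstack([np.ones((m, 1)), Y])   # Pi[i, j] = p_j(y_i), basis {1, s_1, ..., s_n}
    K = np.block([[Phi, Pi], [Pi.T, np.zeros((n + 1, n + 1))]])
    sol = np.linalg.solve(K, np.concatenate([F, np.zeros(n + 1)]))
    lam, c = sol[:m], sol[m:]

    def model(s):
        """Evaluate m(x + s) for an offset s."""
        r = np.linalg.norm(s - Y, axis=1)
        return lam @ phi(r) + c[0] + c[1:] @ s

    return model
```

For instance, `fit_rbf_model(Y, F, lambda r: (r**2 + 1.0)**1.5)` would build a multiquadric model with $\gamma = 1$ from offsets `Y` and function values `F`.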

Geometric Conditions for RBF Interpolation (3). A necessary condition for a unique solution of the interpolation equations is that $\mathrm{rank}(\Pi) = n + 1$, or equivalently, that at least one subset of $n + 1$ interpolation points is linearly independent. Two approaches for ensuring this poisedness condition can be found in the literature:
1. Apply correction steps to improve the quality of the model (Powell, 2004; Scheinberg et al., 2009).
2. Avoid inserting any bad interpolation points (Marazzi and Nocedal, 2002).
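The rank condition translates directly into code, reusing the $\Pi$ matrix from the sketch above; the tolerance passed to `numpy.linalg.matrix_rank` is an arbitrary illustrative choice.

```python
import numpy as np

def satisfies_rank_condition(Y, tol=1e-10):
    """Necessary poisedness condition rank(Pi) = n + 1 for the linear tail,
    using the same monomial basis {1, s_1, ..., s_n} as fit_rbf_model above."""
    m, n = Y.shape
    Pi = np.hstack([np.ones((m, 1)), Y])
    return np.linalg.matrix_rank(Pi, tol=tol) == n + 1
```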

The Trust Region Framework (1). Idea: define a region in which the model can be considered reliable. [Figure: contour lines of f.] An iterative algorithm: each new iterate $x_{k+1}$ is defined by $x_{k+1} = x_k + s_k$, where $s_k$ minimizes the model within the current trust region.

The Trust Region Framework (2). Mathematical formulation: solve the minimization problem

$$s_k = \arg\min_s \{\, m_k(x_k + s) \mid x_k + s \in B_k \,\},$$

where the spherical trust region $B_k$ is defined as

$$B_k = \{ x \in F : \| x - x_k \| < \Delta_k \},$$

and $F$ is the set of feasible points. Also adjust the trust region radius $\Delta_k$, if necessary:
- If the step $s_k$ leads to a sufficiently smaller function value, increase the radius: set $\Delta_{k+1} > \Delta_k$.
- Otherwise, shrink the trust region: set $\Delta_{k+1} < \Delta_k$.
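The following is a schematic numpy sketch of such a trust-region loop, not the algorithm from the talk: the subproblem is solved by crude random sampling inside the ball, the feasible set is ignored, and the growth/shrink factors and acceptance threshold are arbitrary illustrative values.

```python
import numpy as np

def sample_step_in_ball(model, delta, dim, n_samples=500, rng=None):
    """Crude subproblem solver: sample steps uniformly in the ball of radius
    delta and keep the one with the smallest model value."""
    rng = np.random.default_rng(0) if rng is None else rng
    s = rng.standard_normal((n_samples, dim))
    s /= np.linalg.norm(s, axis=1, keepdims=True)
    s *= delta * rng.random((n_samples, 1)) ** (1.0 / dim)
    values = np.array([model(step) for step in s])
    return s[np.argmin(values)]

def trust_region_dfo(f, x0, build_model, delta0=1.0, max_iter=50, eta=0.1):
    """Schematic trust-region loop: minimize the surrogate m_k over
    B_k = {x_k + s : ||s|| <= delta_k} and adjust delta_k according to the
    ratio of actual to predicted reduction."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        m = build_model(x)                               # surrogate of s -> f(x + s)
        s = sample_step_in_ball(m, delta, x.size)
        predicted = m(np.zeros_like(x)) - m(s)
        actual = f(x) - f(x + s)
        if predicted > 0 and actual >= eta * predicted:  # sufficient decrease: accept, enlarge
            x, delta = x + s, 2.0 * delta
        else:                                            # otherwise reject and shrink
            delta = 0.5 * delta
    return x
```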

Updating the Model Under Geometric Constraints. The constraint condition: let $S = \mathrm{span}(\{y_1, \dots, y_{n+1}\} \setminus \{y^-\})$ and compute a unit vector $\hat{n}$ orthogonal to $S$. The region containing sufficiently linearly independent points is then defined by

$$F = \{ x \in B_k : x^T \hat{n} > \gamma \| x \| \}.$$

The idea of the algorithm: replace some interpolation point $y^-$ with a better point $y^+$, for example $y^+ = s_k$.
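A small numpy sketch of one way to compute such a vector $\hat{n}$ via the SVD, assuming the remaining points span a proper subspace of $\mathbb{R}^n$ (otherwise the direction of the smallest singular value is returned); the function name is hypothetical.

```python
import numpy as np

def orthogonal_direction(points):
    """Unit vector orthogonal to span(points), computed as the left singular
    vector of the point matrix associated with its smallest singular value."""
    A = np.column_stack(points)          # columns are the remaining interpolation points
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, -1]
```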

The Special Structure of RBF Models. Motivation: RBF models are linear combinations of convex and concave functions. Hence, it seems natural to express the model function in the decomposed form

$$m(x) = g(x) - h(x),$$

where $g$ and $h$ are convex. Implication: this special structure allows the development of efficient d.c. (diff-convex) algorithms for minimizing the RBF model function.

Diff-convex Decompositions of RBF Models. The following decompositions of RBF models have been proposed in the literature (Hoai An; Vaz and Vicente, 2009):

Separation of convex and concave terms:

$$g(x) = \sum_{\lambda_i \ge 0} \lambda_i \, \phi(\| x - y_i \|) + p(x), \qquad h(x) = \sum_{\lambda_i < 0} (-\lambda_i) \, \phi(\| x - y_i \|).$$

Regularization approach:

$$g(x) = \tfrac{\rho}{2} \| x \|^2 + p(x), \qquad h(x) = \tfrac{\rho}{2} \| x \|^2 - \sum_{i=1}^{|Y|} \lambda_i \, \phi(\| x - y_i \|).$$
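A minimal numpy sketch of the first (sign-based) decomposition; it assumes that $\phi(\| x - y_i \|)$ is convex in $x$, as it is for the multiquadric or cubic choices above, and the names `dc_split`, `phi` and `poly` are illustrative.

```python
import numpy as np

def dc_split(lam, Y, phi, poly):
    """Sign-based d.c. split of m(x) = sum_i lam_i * phi(||x - y_i||) + p(x):
    terms with lam_i >= 0 form the convex part g, the rest (negated) form h."""
    pos, neg = lam >= 0, lam < 0

    def g(x):
        r = np.linalg.norm(x - Y[pos], axis=1)
        return lam[pos] @ phi(r) + poly(x)

    def h(x):
        r = np.linalg.norm(x - Y[neg], axis=1)
        return (-lam[neg]) @ phi(r)

    return g, h
```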

The d.c. Algorithm: Preliminaries. [Figure: one-dimensional sketch of the curves g(x), -h(x) and g(x) - h(x).] The idea of the d.c. algorithm: linearize the concave part $-h(x)$ of $f(x) = g(x) - h(x)$ around $x_0$, i.e.

$$f(x) \approx g(x) - \big( h(x_0) + \nabla h(x_0)^T (x - x_0) \big).$$

The d.c. Algorithm: Mathematical Formulation. Statement of the algorithm: iteratively solve the problem

$$x_{k+1} = \arg\min_{x \in F} \{\, g(x) - ( h(x_k) + (x - x_k)^T y_k ) \,\}, \quad \text{where } y_k = \nabla h(x_k).$$

Using this formulation can be beneficial if the new problem is easier to solve or can be solved more efficiently than the original problem.
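A generic sketch of this iteration in Python; `g_min_linearized` stands for a problem-specific solver of $\arg\min_{x \in F} \{ g(x) - y^T x \}$ (the constant terms $h(x_k)$ and $x_k^T y_k$ do not affect the minimizer), and the stopping rule is an arbitrary illustrative choice.

```python
import numpy as np

def dca(g_min_linearized, grad_h, x0, max_iter=50, tol=1e-8):
    """Generic d.c. iteration: x_{k+1} = argmin_{x in F} g(x) - y_k^T x with
    y_k = grad h(x_k); the constants of the linearization are dropped because
    they do not change the minimizer."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = grad_h(x)
        x_new = g_min_linearized(y)      # problem-specific subproblem solver
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```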

Convexification: An Illustrative Example. Idea: convexify the function by adding to it a convex term $\tfrac{\rho}{2} \| x \|^2$ with a large enough parameter $\rho$.

The d.c. Algorithm: Regularization Approach. With the regularized d.c. decomposition, solving the linearized minimization problem

$$x_{k+1} = \arg\min_{x \in F} \{\, g(x) - ( h(x_k) + (x - x_k)^T y_k ) \,\}$$

is equivalent to solving

$$x_{k+1} = \arg\min_{x \in F} \left\| x - \left( x_0 + \frac{y_k - g}{\rho} \right) \right\|,$$

which is the projection of the point $x_0 + \frac{y_k - g}{\rho}$ onto the set $F$ (here $g$ denotes the gradient of the model's linear tail, as in the multiquadric model above). We obtain
- a gradient descent method requiring no line search,
- a convenient way to handle constraints.
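A sketch of one such projection step under the assumption that $F$ is just the box $l \le x \le u$ from the problem definition; handling the general linear constraints $Ax \le b$ would require solving a small projection QP instead of `np.clip`. The function names are illustrative.

```python
import numpy as np

def project_to_box(z, lower, upper):
    """Euclidean projection onto the box {x : lower <= x <= upper}."""
    return np.clip(z, lower, upper)

def regularized_dc_step(x0, y_k, g_lin, rho, lower, upper):
    """One regularized d.c. step: project x_0 + (y_k - g)/rho onto F, where
    g_lin is the gradient of the model's linear tail and F is taken to be a box."""
    return project_to_box(x0 + (y_k - g_lin) / rho, lower, upper)
```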

How to Determine the Regularization Parameter ρ? A sufficient condition for the convexity of $h$: the convexity of $h$ within the trust region $B$ is guaranteed if

$$\rho \ge \max_{x \in B} \| \nabla^2 \tilde{h}(x) \|, \quad \text{where } \tilde{h}(x) = \sum_{i=1}^{|Y|} \lambda_i \, \phi(\| x - y_i \|).$$

It is possible to derive an upper bound for the minimum $\rho$ that ensures convexity. When $\rho$ gives an accurate estimate, the algorithm converges rapidly.
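The analytic bound is not derived here, but a crude Monte-Carlo stand-in can be sketched: sample points in the trust region and take the largest spectral norm of a finite-difference Hessian of $\tilde{h}$. This is only a heuristic illustration, not the bound referred to above, and the sample count and step size are arbitrary.

```python
import numpy as np

def estimate_rho(h_tilde, center, radius, n_samples=200, eps=1e-4, rng=None):
    """Heuristic Monte-Carlo estimate of max_{x in B} ||hess h_tilde(x)||_2:
    sample points in the ball B and take the largest spectral norm of a
    forward-difference Hessian."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = center.size
    best = 0.0
    for _ in range(n_samples):
        d = rng.standard_normal(n)
        x = center + radius * rng.random() * d / np.linalg.norm(d)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
                H[i, j] = (h_tilde(x + ei + ej) - h_tilde(x + ei)
                           - h_tilde(x + ej) + h_tilde(x)) / eps**2
        best = max(best, np.linalg.norm(H, 2))
    return best
```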

Thank you! Questions?