Modern Methods of Data Analysis - WS 07/08

Modern Methods of Data Analysis, Lecture XV (04.02.08)

Contents: Function Minimization (see E. Lohrmann & V. Blobel)

Optimization Problem

A set of n independent variables, sometimes with additional constraints, and a single measure of goodness => the objective function.

In physics data analysis the objective function is
- the negative logarithm of a likelihood function in the maximum likelihood method, or
- the sum of squares in a (nonlinear) least-squares problem.

Constraints:
- equality constraints, expressing relations between parameters
- inequality constraints, limiting certain parameters to a restricted range (e.g. m > 0)

Aim of Optimization

Find the global minimum of the objective function within the allowed range of parameter values
- in a short time, even for a large number of parameters and a complicated objective function
- even if there are local minima.

Most methods will converge to the nearest minimum, which may be the global minimum or a local minimum, by going immediately downhill as far as possible. The search for the global minimum requires a special effort.

One-dimensional Minimization

Search for the minimum of a function f(x) of a (scalar) argument x. Important application in multidimensional minimization: robust minimization along a line (line search). Aim: robust, efficient and as fast as possible, because each function evaluation may require a large CPU time.

Standard method: iterations x_{k+1} = Φ(x_k), starting from a value x_0, with convergence to the fixed point x* satisfying x* = Φ(x*).

Newton Iteration Method

Method for the determination of zeros of a function f(x), based on derivatives:
x_{k+1} = x_k - f(x_k) / f'(x_k)

The same method is used for min/max determination, applied to the zeros of F'(x) (derived from the Taylor expansion). It follows:
x_{k+1} = x_k - F'(x_k) / F''(x_k)
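As an illustration, here is a minimal Python sketch of the Newton iteration for a one-dimensional minimum; the example function, tolerance and iteration limit are assumptions chosen for the sketch, not part of the lecture.

```python
def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=50):
    """Newton iteration x_{k+1} = x_k - F'(x_k)/F''(x_k) for a 1-d minimum."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)      # Newton step towards a zero of F'
        x -= step
        if abs(step) < tol:        # stop when the step becomes tiny
            break
    return x

# example: F(x) = x**4 - 3*x**2 + x, so F'(x) = 4x^3 - 6x + 1, F''(x) = 12x^2 - 6
xmin = newton_minimize(lambda x: 4*x**3 - 6*x + 1,
                       lambda x: 12*x**2 - 6, x0=1.0)
```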

Convergence Behaviour (I)

An iterative method is called locally convergent of at least order p if, for all start values x_0 sufficiently close to the fixed point x*,
|x_{k+1} - x*| <= c |x_k - x*|^p
is valid for all k (with c < 1 in the linear case p = 1).

Condition for order p: if Φ'(x*) = ... = Φ^(p-1)(x*) = 0 and Φ^(p)(x*) ≠ 0, then the iterative method is convergent of order p.

Convergence Behaviour (II)

The linear case (p = 1): |c| < 1 is required. The sequence converges monotonically to x* for a positive value of c, and alternates around x* for a negative value.

Linear convergence can be very slow: the constant c is often very close to 1, and many hundreds of iterations may be necessary with small progress per iteration - not recommended.

Quadratic convergence: usually only a few iterations are required, very fast in the final phase - recommended, at least for the end game.

Convergence for the Newton Method

... for the determination of a minimum (or maximum): x_{k+1} = x_k - F'(x_k) / F''(x_k).

In general, Newton's method is
- quadratically convergent (locally)
- requires first and second derivatives
- may diverge for a bad start value

Search Without Derivatives

Required: a robust, convergent method for minimum determination without the need to calculate derivatives (which may be complicated or impossible to obtain).

Aim: determine a very short x-interval which contains the minimum of the function f(x).

Strategy of the search method, in two steps:
- find an initial interval which includes the (unimodal) minimum
- reduce the size of the interval until it is sufficiently small

Golden Section Strategy

Define a new point by a golden section of the current interval, with ratio τ = (√5 - 1)/2 ≈ 0.618; the new interval is chosen depending on the function values at the interior points.

The length of the interval is reduced by the factor τ ≈ 0.618 per iteration, at the cost of one function evaluation per iteration (linear convergence). For 10 iterations the reduction factor is τ^10 ≈ 0.008, independent of the behaviour of the function.
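A minimal sketch of golden-section interval reduction; the example function, interval and iteration count are illustrative assumptions.

```python
import math

def golden_section(f, a, b, n_iter=30):
    """Reduce the bracketing interval [a, b] by the factor tau ~ 0.618 per iteration."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0           # golden section ratio, ~0.618
    x1 = b - tau * (b - a)                       # lower interior point
    x2 = a + tau * (b - a)                       # upper interior point
    f1, f2 = f(x1), f(x2)
    for _ in range(n_iter):
        if f1 < f2:                              # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1               # reuse old x1 as new upper point
            x1 = b - tau * (b - a)
            f1 = f(x1)
        else:                                    # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2               # reuse old x2 as new lower point
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# example: minimum of (x - 2)^2 inside [0, 5]
xmin = golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0)
```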

Parabola Method

More efficient for normal behaviour of functions: fit a parabola to the last three points and use the minimum of the parabola as the next point.

Note: many functions to be minimized are parabolic to a good approximation near the minimum, so the minimum of the parabola is close to the function minimum (Fig. 8.6, Blobel/Lohrmann).

But: the method can get stuck with an unbalanced section of the interval (the parabolic interpolation becomes unstable). Combined method: use a mixture of the parabola and golden section methods to avoid an unbalanced section of the interval.
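The abscissa of the parabola minimum through the last three points can be written in closed form; a small sketch (argument names are placeholders), using the standard three-point interpolation formula:

```python
def parabola_minimum(x1, f1, x2, f2, x3, f3):
    """Abscissa of the minimum of the parabola through (x1,f1), (x2,f2), (x3,f3).
    The denominator vanishes for collinear points, so a guard is needed in practice."""
    num = (x2 - x1)**2 * (f2 - f3) - (x2 - x3)**2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return x2 - 0.5 * num / den
```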

Search Methods in n Dimensions

Search methods in n dimensions do not require any derivatives, only function values. Examples:
- Line search in one variable, applied sequentially in all dimensions (usually rather inefficient)
- Simplex method by Nelder and Mead: simple, but makes use of earlier function evaluations in an efficient way ("learning")
- Monte Carlo search: random search in n dimensions, using the result as starting value for more efficient methods; meaningful if several local minima may exist

In general, search methods are acceptable initially (far from the optimum), but are inefficient and slow in the end game.

Simplex Method

A simplex is formed by n+1 points in n-dimensional space (for n = 2 a triangle), sorted such that the function values are in the order F(x_1) <= F(x_2) <= ... <= F(x_{n+1}). In addition: the mean of the best n points is the center of gravity c.

Method: a sequence of cycles, with a new point in each cycle replacing the worst point, i.e. a new (updated) simplex in each cycle. At the start of each cycle a new test point x_r = c + α (c - x_{n+1}) is obtained by reflection of the worst point at the center of gravity.

Figure: a few steps of the simplex method, starting from the simplex with the center of gravity c; the reflected and expanded points are test points.

A Cycle of the Simplex Method

Depending on the value F(x_r) of the test point:
- F(x_1) <= F(x_r) <= F(x_n): the test point is a middle point; x_r is added and the previous worst point is removed.
- F(x_r) < F(x_1): the test point is the best point, the search direction seems to be effective. A new point x_e = c + β (x_r - c) (with β > 1) is determined and its function value is evaluated. If F(x_e) < F(x_r) the extra step is successful and x_{n+1} is replaced by x_e, otherwise by x_r.
- F(x_r) > F(x_n): the simplex is too big and has to be reduced. For F(x_r) < F(x_{n+1}) the test point replaces the worst point. A new test point is defined by x_c = c + γ (x_{n+1} - c) with 0 < γ < 1. If this point, with F(x_c) < F(x_{n+1}), is an improvement, then x_{n+1} is replaced by it. Otherwise a new simplex is defined by replacing all points but x_1 according to x_j → x_1 + δ (x_j - x_1) for j = 2,...,n+1 with 0 < δ < 1, which requires n function evaluations.

Typical values are α = 1, β = 2, γ = 0.5 and δ = 0.5.
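In practice one rarely codes the simplex cycle by hand; a usage sketch with SciPy's Nelder-Mead implementation (SciPy, the example objective and the start point are assumptions for illustration, not part of the lecture):

```python
import numpy as np
from scipy.optimize import minimize

# example objective: Rosenbrock function in two variables
def f(x):
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

result = minimize(f, x0=np.array([-1.2, 1.0]), method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-8})
print(result.x, result.fun)
```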

Monte Carlo Search in n Dimensions

Search in a box: lower and upper boundaries a_i and b_i define a test point x_i = a_i + u_i (b_i - a_i), with u_i uniformly distributed in [0, 1]. Keep the point with the smallest function value among several test points.

Search in a sphere: define a step size vector σ and search with x_new,i = x_i + σ_i z_i, where the z_i are drawn from the standard normal distribution. If the new point has a smaller function value, use it as the next starting point.

Meaningful in higher dimensions, especially if the existence of many local minima is expected, as a method to get a good starting value.
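A minimal sketch of the box search; the boundaries, number of trials and example objective are illustrative assumptions.

```python
import numpy as np

def mc_box_search(f, lower, upper, n_trials=1000, rng=None):
    """Draw uniform points x_i = a_i + u_i * (b_i - a_i) and keep the best one."""
    rng = rng or np.random.default_rng()
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    best_x, best_f = None, np.inf
    for _ in range(n_trials):
        x = lower + rng.uniform(size=lower.shape) * (upper - lower)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# example: starting value for a two-dimensional objective in the box [-5, 5]^2
x0, f0 = mc_box_search(lambda x: np.sum(x**2), [-5, -5], [5, 5])
```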

n-Dimensional Minimization with Derivatives

Minimize F(x) for x in R^n. Taylor expansion around the current point x:
F(x + Δx) ≈ F(x) + g^T Δx + 1/2 Δx^T H Δx

The function value, the gradient g = ∇F(x) and the Hesse matrix H of second derivatives are evaluated at the point x.

Covariance Matrix

Note: if the objective function is
- a sum of squares of deviations, defined by the method of least squares, or
- a negative log-likelihood function, defined according to the maximum likelihood method,

then the inverse of the Hessian matrix H at the minimum is a good estimate of the covariance matrix of the parameters x.

The second derivatives have to be computed anyhow most of the time, at least at the last iteration step.

The Newton Step

The step Δx is determined from H Δx = -g. For a quadratic function the Newton step is, in length and direction, a step to the minimum of the function.

Sometimes there is a large angle (close to 90°) between the Newton direction and -g (the direction of steepest descent).

If the Hessian is positive definite, the quantity d = g^T H^{-1} g allows a calculation of the distance to the minimum (called EDM in MINUIT): for a quadratic function the distance to the minimum is d/2.
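A small NumPy sketch of the Newton step and the estimated distance to minimum; the gradient and Hessian values here are made-up placeholders.

```python
import numpy as np

g = np.array([0.4, -1.2])                      # gradient at the current point
H = np.array([[2.0, 0.3], [0.3, 1.5]])          # Hesse matrix (positive definite)

dx = np.linalg.solve(H, -g)                     # Newton step from H dx = -g
d = g @ np.linalg.solve(H, g)                   # d = g^T H^{-1} g
edm = 0.5 * d                                   # estimated distance to minimum (EDM)
```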

General Iteration Scheme

1. Test for convergence: if the conditions for convergence are satisfied, the algorithm terminates with x_k as the solution. The difference F(x_{k-1}) - F(x_k) and d are used in the test.
2. Compute a search vector: a vector Δx_k is computed as the new search vector. The Newton search vector is determined from H Δx_k = -g_k.
3. Line search: a one-dimensional minimization is done for the function f(z) = F(x_k + z Δx_k) and the minimum position z_min is determined. (This step is essential to get a stable method!)
4. Update: the new point is defined by x_{k+1} = x_k + z_min Δx_k and k is increased by 1.
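A compact sketch of the scheme with a Newton search vector and a line search; using SciPy's minimize_scalar for the one-dimensional minimization is an illustrative choice, and the tolerance and iteration limit are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def newton_with_line_search(F, grad, hess, x0, tol=1e-8, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = np.linalg.solve(H, -g)                     # search vector from H dx = -g
        if g @ np.linalg.solve(H, g) < tol:             # convergence test on d = g^T H^-1 g
            break
        z = minimize_scalar(lambda z: F(x + z * dx)).x  # line search along dx
        x = x + z * dx                                  # update the current point
    return x
```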

Method of Steepest Descent

The search vector is equal to the negative gradient: Δx = -g.
- This step seems to be the natural choice; only the gradient is required (no Hesse matrix) - good.
- No step size is defined (in contrast to the Newton step) - bad.
- The rate of convergence is only linear, with c = (λ_max - λ_min)/(λ_max + λ_min) = (κ - 1)/(κ + 1), where λ_max and λ_min are the largest and smallest eigenvalues and κ = λ_max/λ_min is the condition number of the Hesse matrix H. For a large value of κ, c is close to one and the convergence is slow - very bad.

Optimal step size, if the Hessian is known: z = (g^T g)/(g^T H g).
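For a quadratic F(x) = 1/2 x^T H x - b^T x these statements can be checked directly; a minimal sketch where H, b, the start point and the iteration count are made-up values.

```python
import numpy as np

H = np.array([[10.0, 0.0], [0.0, 1.0]])        # condition number kappa = 10
b = np.zeros(2)
x = np.array([1.0, 1.0])

for _ in range(50):
    g = H @ x - b                               # gradient of the quadratic
    z = (g @ g) / (g @ H @ g)                   # optimal step size for known Hessian
    x = x - z * g                               # steepest-descent update

# convergence factor c = (kappa - 1)/(kappa + 1) = 9/11, i.e. slow linear convergence
```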

Derivative Calculation

The (optimal) Newton method requires
- the first derivatives of F(x): n computations
- the second derivatives of F(x): n(n+1)/2 computations

Analytical derivatives may be impossible or difficult to obtain. Numerical derivatives require a good step size δ for the difference quotient, e.g. the numerical derivative of f(x) in one dimension:
f'(x) ≈ (f(x + δ) - f(x - δ)) / (2δ)

Can the Newton (or quasi-Newton) method be used without explicit calculation of the complete Hessian?
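A central-difference sketch for the gradient; the default step size δ is an assumed value that would have to be tuned for the problem at hand.

```python
import numpy as np

def numerical_gradient(F, x, delta=1e-5):
    """Central differences: dF/dx_i ~ (F(x + delta*e_i) - F(x - delta*e_i)) / (2*delta)."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        g[i] = (F(x + e) - F(x - e)) / (2.0 * delta)
    return g
```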

Minimization of the Objective Function for Least Squares

For an objective function F(x) = 1/2 Σ_i r_i(x)^2, a sum of squared least-squares contributions, the gradient is g = Σ_i r_i ∇r_i and the Hessian is H = Σ_i ∇r_i ∇r_i^T + Σ_i r_i ∇²r_i; the Newton step follows from H Δx = -g.

Ignoring the second derivatives of the residuals (the second sum in H) improves the Newton step!
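A sketch of the step with the second derivatives of the residuals ignored (the Gauss-Newton approximation); the residual and Jacobian functions are placeholders to be supplied by the user.

```python
import numpy as np

def gauss_newton_step(residuals, jacobian, x):
    """Step from g = J^T r and H ~ J^T J, i.e. second derivatives of r ignored."""
    r = residuals(x)                  # vector of residuals r_i(x)
    J = jacobian(x)                   # matrix of first derivatives dr_i/dx_j
    g = J.T @ r                       # gradient of F = 1/2 * sum r_i^2
    H = J.T @ J                       # approximate Hessian, without sum r_i * d2r_i
    return np.linalg.solve(H, -g)
```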

Figure: Newton steps in a fit of an exponential. Colour contours of the objective function; steps correspond to Δχ² ≈ 50. One panel shows the steps with second derivatives ignored, the other with second derivatives included. Ignoring the second derivatives improves the Newton step.

Variable Metric Method (I)

Calculation of the Hessian (with n(n+1)/2 different elements) from a sequence of first derivatives (gradients), by updating the estimate from the change of the gradient.

The step is calculated from H_k Δx_k = -g_k. After a line search with minimum at x_{k+1} = x_k + z_k Δx_k with gradient g_{k+1}, the updated matrix H_{k+1} should satisfy H_{k+1} (x_{k+1} - x_k) = g_{k+1} - g_k; the update matrix is not completely defined by these equations.

Note: an accurate line search is essential for the success.

Variable Metric Method (II)

The most effective update formula is due to Broyden/Fletcher/Goldfarb/Shanno (BFGS); with s = x_{k+1} - x_k and y = g_{k+1} - g_k:
H_{k+1} = H_k + (y y^T)/(y^T s) - (H_k s s^T H_k)/(s^T H_k s)

The initial matrix may be the unit matrix.

Properties: the method generates n independent search directions for a quadratic function, and the estimated Hessian converges to the true Hessian.

Potential problems: no real convergence even for a good starting point; the estimate may be destroyed by small, inaccurate steps (round-off errors).
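A minimal sketch of the BFGS update of the Hessian estimate, using the notation above; the dimension and initial matrix are illustrative.

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update of the Hessian estimate H from step s = x_{k+1}-x_k and y = g_{k+1}-g_k."""
    Hs = H @ s
    return H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)

n = 3
H = np.eye(n)   # the initial matrix may be the unit matrix
```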

Minimization with MINUIT

Several options can be selected:
- Option MIGRAD: minimizes the objective function, calculates the first derivatives numerically and uses the BFGS update formula for the Hessian - fast.
- Option HESSE: calculates the Hesse matrix numerically - recommended after the minimization.
- Option MINIMIZE: minimization by MIGRAD, followed by the HESSE calculation, with checks.
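A hedged usage sketch with the iminuit Python interface to MINUIT; the straight-line model, data values and starting parameters are placeholders invented for the example.

```python
import numpy as np
from iminuit import Minuit

x_data = np.array([1.0, 2.0, 3.0, 4.0])
y_data = np.array([2.1, 3.9, 6.2, 7.8])
sigma = 0.2

def chi2(a, b):                       # objective: sum of squared, normalized residuals
    return np.sum(((y_data - (a * x_data + b)) / sigma) ** 2)

m = Minuit(chi2, a=1.0, b=0.0)
m.errordef = 1.0                      # chi-square (least-squares) error convention
m.migrad()                            # MIGRAD: minimization with numerical derivatives
m.hesse()                             # HESSE: numerical Hesse matrix after minimization
print(m.values, m.errors)
```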