Chapter 09.04 Multidimensional Gradient Method

After reading this chapter, you should be able to:
1. understand how multidimensional gradient methods are different from direct search methods,
2. understand the use of first and second derivatives in multiple dimensions,
3. understand how the steepest ascent/descent method works, and
4. solve multidimensional optimization problems using the steepest ascent/descent method.

How do gradient methods differ from direct search methods in multidimensional optimization?
The difference between gradient and direct search methods in multidimensional optimization is similar to the difference between these approaches in one-dimensional optimization. Direct search methods are useful when the derivative of the optimization function is not available to effectively guide the search for the optimum. While direct search methods explore the parameter space in a systematic manner, they are not computationally very efficient. Gradient methods, on the other hand, use information from the derivatives of the optimization function to guide the search more effectively and find optimal solutions much more quickly.

Newton's Method
When Newton's Method (http://numericalmethods.eng.usf.edu/topics/opt_newtons_method.html) was introduced as a one-dimensional optimization method, we discussed the use of the first and second derivatives of the optimization function as sources of information to determine whether we have reached an optimal point (where the value of the first derivative is zero). If that optimal point is a maximum, the second derivative is negative; if it is a minimum, the second derivative is positive.
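To make the one-dimensional idea concrete before moving to multiple dimensions, the following Python sketch applies the Newton iteration $x_{k+1} = x_k - f'(x_k)/f''(x_k)$ to drive the first derivative to zero. This is only an illustration; the test function and starting point are choices made here, not taken from these notes.

```python
# Minimal sketch of one-dimensional Newton's method for optimization.
# The test function f(x) = x^2 - 4x + 5 and the starting point are
# illustrative choices, not taken from these notes.

def newton_1d(df, d2f, x0, tol=1e-8, max_iter=50):
    """Find a stationary point of f by driving f'(x) to zero."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)      # Newton step: f'(x) / f''(x)
        x -= step
        if abs(step) < tol:
            break
    return x

df = lambda x: 2 * x - 4           # f'(x) for f(x) = x^2 - 4x + 5
d2f = lambda x: 2.0                # f''(x) > 0, so the stationary point is a minimum

print(newton_1d(df, d2f, x0=0.0))  # approximately 2.0
```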

What are Gradients and Hessians and how are Gradients and Hessians used in multidimensional optimization?
Gradients and Hessians describe the first and second derivatives of functions, respectively, in multiple dimensions and are used frequently in various gradient methods for multidimensional optimization. We describe these two concepts next.

Gradient:
The gradient is a vector operator denoted by $\nabla$ (referred to as "del") which, when applied to a function $f$, represents its directional derivatives. For example, consider a two-dimensional function $f(x, y)$ which gives the elevation above sea level at the point $(x, y)$. If you wanted to move in the direction that would gain you the most elevation, this direction could be defined along an axis $h$ which forms an angle $\theta$ with the $x$-axis. For an illustration of this, see Figure 1. The elevation along this new axis can be described by a new function $g(h)$, where your current location is the origin of the new coordinate axis, that is, $h = 0$. The slope in this direction can be calculated by taking the derivative of the new function $g(h)$ at this point, namely $g'(0)$. The slope is then calculated by
$$g'(0) = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta$$

Figure 1. Determining elevation along a new axis $h$ through the point $(a, b)$, at an angle $\theta$ with the $x$-axis.

The gradient is a special case where the direction of the vector gains the most elevation, or has the steepest ascent. If the goal were to decrease elevation, this would be termed the steepest descent. The gradient of $f(x, y)$, denoted $\nabla f$, is the vector pointing in the direction of the steepest slope at that point. The gradient is calculated by
$$\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} \qquad (1)$$
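As a quick numerical check of the directional-derivative formula above, the Python sketch below compares the slope $g'(0) = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta$ against a direct finite-difference slope of $g(h)$. The elevation function, the point $(a, b)$, and the angle are illustrative choices for this sketch.

```python
import math

# Sketch: the slope along a direction at angle theta equals
# g'(0) = (df/dx) cos(theta) + (df/dy) sin(theta).
# The function f, the point (a, b), and the angle below are illustrative choices.

def f(x, y):
    return x**2 * y**2                     # same form as the example function used below

def partials(x, y, eps=1e-6):
    """Central-difference approximations of df/dx and df/dy."""
    dfdx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    dfdy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return dfdx, dfdy

a, b, theta = 2.0, 1.0, math.radians(30)
dfdx, dfdy = partials(a, b)

# Slope from the gradient formula ...
slope_formula = dfdx * math.cos(theta) + dfdy * math.sin(theta)

# ... agrees with the forward-difference slope of g(h) = f(a + h cos(theta), b + h sin(theta)).
h = 1e-6
slope_direct = (f(a + h * math.cos(theta), b + h * math.sin(theta)) - f(a, b)) / h

print(slope_formula, slope_direct)         # both approximately 7.46
```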

Example 1
Calculate the gradient to determine the direction of the steepest slope at the point $(2, 1)$ for the function $f(x, y) = x^2 y^2$.

Solution
To calculate the gradient, the partial derivatives must be evaluated as
$$\frac{\partial f}{\partial x} = 2xy^2 = 2(2)(1)^2 = 4$$
$$\frac{\partial f}{\partial y} = 2x^2 y = 2(2)^2(1) = 8$$
which are used to determine the gradient at the point $(2, 1)$ as
$$\nabla f = 4\mathbf{i} + 8\mathbf{j}$$
Traveling along this direction, we would gain elevation equal to the magnitude of the gradient, which is $\|\nabla f\| = \sqrt{4^2 + 8^2} = 8.94$. Note that there is no other direction along which we can move to obtain a larger slope.
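As a check on Example 1, the short Python snippet below evaluates the analytical partial derivatives at $(2, 1)$ and the magnitude of the resulting gradient.

```python
import math

# Check of Example 1: gradient of f(x, y) = x^2 y^2 at the point (2, 1),
# using the analytical partial derivatives from the text.

def grad_f(x, y):
    dfdx = 2 * x * y**2        # df/dx = 2xy^2
    dfdy = 2 * x**2 * y        # df/dy = 2x^2 y
    return dfdx, dfdy

dfdx, dfdy = grad_f(2.0, 1.0)
print(dfdx, dfdy, math.hypot(dfdx, dfdy))   # 4.0  8.0  8.944...
```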

Hessians:
The Hessian matrix, or just the Hessian, is the Jacobian matrix of the second-order partial derivatives of a function. For example, for a two-dimensional function the Hessian matrix is simply
$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2mm] \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \qquad (2)$$
The determinant of the Hessian matrix is also referred to as the Hessian.

Example 2
Calculate the Hessian at the point $(2, 1)$ for the function $f(x, y) = x^2 y^2$.

Solution
To calculate the Hessian, we need the second partial derivatives
$$\frac{\partial^2 f}{\partial x^2} = 2y^2 = 2(1)^2 = 2$$
$$\frac{\partial^2 f}{\partial y^2} = 2x^2 = 2(2)^2 = 8$$
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = 4xy = 4(2)(1) = 8$$
and the resulting Hessian matrix is
$$H = \begin{bmatrix} 2 & 8 \\ 8 & 8 \end{bmatrix}$$

Based on your knowledge of how second derivatives are used in one-dimensional optimization, you may guess that the values of the second derivatives in multidimensional optimization will tell us whether we are at a maximum or a minimum. In multidimensional optimization, the Hessian of the optimization function contains the information of the second derivatives of the function and is used to make such a determination. The determinant of the Hessian matrix, denoted by $|H|$, distinguishes three cases:
1. If $|H| > 0$ and $\partial^2 f / \partial x^2 > 0$, then the point is a local minimum of $f(x, y)$.
2. If $|H| > 0$ and $\partial^2 f / \partial x^2 < 0$, then the point is a local maximum of $f(x, y)$.
3. If $|H| < 0$, then the point is a saddle point of $f(x, y)$.
Referring to Example 2, since $|H| = (2)(8) - (8)(8) = -48 < 0$, the point $(2, 1)$ is a saddle point. Hessians are also used to determine search trajectories in more advanced multidimensional gradient search techniques such as the Marquardt method, which is beyond the scope of this module.
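The Python sketch below checks Example 2 and the determinant test above. The second partial derivatives are the analytical expressions from the text; the classify() helper is a convenience added for this sketch.

```python
# Check of Example 2 and the determinant test: Hessian of f(x, y) = x^2 y^2 at (2, 1).

def hessian(x, y):
    fxx = 2 * y**2             # d2f/dx2
    fyy = 2 * x**2             # d2f/dy2
    fxy = 4 * x * y            # d2f/dxdy = d2f/dydx
    return fxx, fxy, fyy

def classify(fxx, fxy, fyy):
    det = fxx * fyy - fxy**2   # determinant |H| of the 2x2 Hessian
    if det > 0 and fxx > 0:
        return det, "local minimum"
    if det > 0 and fxx < 0:
        return det, "local maximum"
    if det < 0:
        return det, "saddle point"
    return det, "inconclusive"

print(classify(*hessian(2.0, 1.0)))   # (-48.0, 'saddle point')
```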

What is the Steepest Ascent (or Descent) method and how does it work?
In any multidimensional optimization algorithm, there are two key questions to be asked when searching for an optimal solution. The first one concerns the direction of travel. The concept of gradients provides the answer to this question, at least in the short term. The second question is how long one should pursue a solution in this direction before re-evaluating an alternative direction. If a re-evaluation strategy is executed too often, it increases computational costs; if it is executed too infrequently, one may end up pursuing a direction that takes the search away from the optimal solution. The steepest ascent algorithm proposes a simple answer to the second question by arbitrarily choosing a step size $h$. The special case where $h = h^*$ is referred to as the optimal steepest ascent/descent, where $h^*$ brings us to the local optimum along the direction of the gradient. Consider the following example, which illustrates the application of the optimal steepest ascent/descent method to a multidimensional optimization problem.

Example 3
Determine the minimum of the function $f(x, y) = x^2 + y^2 + 2x + 4$. Use the point $(2, 1)$ as the initial estimate of the optimal solution.

Solution
Iteration 1: To calculate the gradient, the partial derivatives must be evaluated as
$$\frac{\partial f}{\partial x} = 2x + 2 = 2(2) + 2 = 6$$
$$\frac{\partial f}{\partial y} = 2y = 2(1) = 2$$
which are used to determine the gradient at the point $(2, 1)$ as
$$\nabla f = 6\mathbf{i} + 2\mathbf{j}$$
Now the function $f(x, y)$ can be expressed along the direction of the gradient as
$$f\!\left(x_0 + \frac{\partial f}{\partial x}h,\; y_0 + \frac{\partial f}{\partial y}h\right) = f(2 + 6h,\, 1 + 2h) = (2 + 6h)^2 + (1 + 2h)^2 + 2(2 + 6h) + 4$$
Multiplying out the terms, we obtain the one-dimensional function along the gradient as
$$g(h) = 40h^2 + 40h + 13$$
This is a simple function, and it is easy to determine $h^* = -0.5$ by taking the first derivative and solving for its root. This means that traveling a step size of $h = -0.5$ along the gradient reaches a minimum value for the function in this direction. These values are substituted back to calculate new values for $x$ and $y$ as follows:
$$x = 2 + 6(-0.5) = -1$$
$$y = 1 + 2(-0.5) = 0$$
Calculating the new values of $x$ and $y$ concludes the first iteration. Note that $f(-1, 0) = 3$ is less than $f(2, 1) = 13$, which indicates a move in the right direction.

Iteration 2: The new starting point is $(-1, 0)$. We calculate the gradient at this point as
$$\frac{\partial f}{\partial x} = 2x + 2 = 2(-1) + 2 = 0$$
$$\frac{\partial f}{\partial y} = 2y = 2(0) = 0$$
which are used to determine the gradient at the point $(-1, 0)$ as
$$\nabla f = 0\mathbf{i} + 0\mathbf{j}$$
This indicates that the current location is a local optimum and no improvement can be gained by moving in any direction.
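The Python sketch below reproduces the optimal steepest descent iterations of Example 3. It assumes a golden-section line search over the bracket $[-2, 2]$ for the step size; the line-search choice and the bracket are implementation details of this sketch, not part of the original notes.

```python
# Sketch of the optimal steepest descent iteration from Example 3,
# applied to f(x, y) = x^2 + y^2 + 2x + 4 starting from (2, 1).

def f(x, y):
    return x**2 + y**2 + 2 * x + 4

def grad_f(x, y):
    return 2 * x + 2, 2 * y                  # (df/dx, df/dy)

def golden_section(g, a, b, tol=1e-8):
    """Minimize a one-dimensional function g on the interval [a, b]."""
    phi = (5 ** 0.5 - 1) / 2
    c, d = b - phi * (b - a), a + phi * (b - a)
    while abs(b - a) > tol:
        if g(c) < g(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

x, y = 2.0, 1.0                              # initial estimate from Example 3
for k in range(20):
    dfdx, dfdy = grad_f(x, y)
    if dfdx**2 + dfdy**2 < 1e-12:            # zero gradient: stationary point reached
        break
    # Minimize g(h) = f(x + dfdx*h, y + dfdy*h) along the gradient direction.
    h_star = golden_section(lambda h: f(x + dfdx * h, y + dfdy * h), -2.0, 2.0)
    x, y = x + dfdx * h_star, y + dfdy * h_star
    print(k + 1, round(x, 4), round(y, 4), round(f(x, y), 4))
# Iteration 1 already reaches (-1, 0) with f = 3, matching the worked example.
```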

To ensure that we have reached a minimum, we can calculate the Hessian of the function. To determine the Hessian, the second partial derivatives are determined and evaluated as follows:
$$\frac{\partial^2 f}{\partial x^2} = 2, \qquad \frac{\partial^2 f}{\partial y^2} = 2, \qquad \frac{\partial^2 f}{\partial x \partial y} = 0$$
The resulting Hessian matrix and its determinant are
$$H = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad |H| = (2)(2) - (0)(0) = 4$$
Since $|H| > 0$ and $\partial^2 f / \partial x^2 > 0$, the point $(-1, 0)$ is a local minimum.

OPTIMIZATION
Topic: Multidimensional Gradient Method
Summary: Textbook notes for the multidimensional gradient method
Major: All engineering majors
Authors: Ali Yalcin
Date: December 2012
Web Site: http://numericalmethods.eng.usf.edu