Introduction to Optimization
Amy Langville, SAMSI Undergraduate Workshop, N.C. State University, 6/1/05
GOAL: minimize f(x1, x2, x3, x4, x5) = x1^2 − 1.5 x2 x3 + x4/x5
PRIZE: $1 million
- # of independent variables = ?
- z = f(x1, x2, x3, x4, x5) lives in R^?
- Suppose you know little to nothing about Calculus or Optimization. Could you win the prize? How?
- Trial and error: repeated function evaluations (see the sketch below)
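A minimal Matlab sketch of the trial-and-error idea: evaluate f at many random points and keep the best. The reading of the slide's formula, the box of points sampled, and the sample count are all assumptions, not part of the original problem statement.

% Trial-and-error minimization by repeated function evaluations.
f = @(x) x(1)^2 - 1.5*x(2)*x(3) + x(4)/x(5);   % assumed reading of the slide's formula

best_val = Inf;
for k = 1:100000
    x = -10 + 20*rand(5,1);       % random point in the box [-10,10]^5 (assumed bounds)
    x(5) = 1 + 9*rand;            % keep x5 in [1,10] so x4/x5 stays bounded
    v = f(x);
    if v < best_val               % keep the best point seen so far
        best_val = v;
        best_x = x;
    end
end
fprintf('best value found: %g at x = [%g %g %g %g %g]\n', best_val, best_x);

No calculus needed: just many function evaluations. The catch is that you can never be sure the best point found is the true global min.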
Calculus III Review
- local min vs. global min vs. saddle point
- critical points (CPs) and horizontal tangent planes
- local mins and the Second Derivative Test
- global mins, CPs, and boundary points (BPs)
- Gradient = direction of ?
Constrained vs. Unconstrained Opt.
Unconstrained: min f(x, y) = x^2 + y^2
Constrained:
- min f(x, y) = x^2 + y^2 s.t. x ≥ 0, y ≥ 0
- min f(x, y) = x^2 + y^2 s.t. x > 0, y > 0
- min f(x, y) = x^2 + y^2 s.t. 1 ≤ x ≤ 2, 0 ≤ y ≤ 3 (EVT: a continuous function on a closed, bounded region attains its min)
- min f(x, y) = x^2 + y^2 s.t. y = x + 2
Gradient Descent Methods
- Hillclimbers on a cloudy day: max f(x, y) = − min −f(x, y)
- Initializations
- 1st-order and 2nd-order info. from the partials: gradient ∇f + Hessian
- Matlab function: gd(α, x0) (a sketch follows)
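The workshop's gd.m is not reproduced here; the following is a minimal fixed-step sketch of what such a function might look like, with f(x, y) = x^2 + y^2 as an assumed example objective.

function x = gd(alpha, x0)
% Minimal fixed-step gradient descent on f(x,y) = x^2 + y^2 (assumed example).
% alpha: step size; x0: starting point (2-vector).
grad = @(x) [2*x(1); 2*x(2)];          % gradient of x^2 + y^2
x = x0(:);
for k = 1:1000
    g = grad(x);
    if norm(g) < 1e-8                  % convergence test: gradient nearly zero
        break
    end
    x = x - alpha*g;                   % step in the downhill direction
end

For example, gd(0.4, [3; -4]) converges to the minimizer (0, 0); for α < 0 the iterates move uphill and diverge on this f.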
Iterative Methods Issues
- Convergence test: what is it for gd.m?
- Convergence proof: is gd.m guaranteed to converge to a local min? For α > 0? For α < 0?
- Rate of convergence: how many iterations? How do starting points x0 affect the number of iterations? Worst starting point for α = 4? Best?
Convergence of Optimization Methods
- global min vs. local min vs. stationary point vs. none
- Most optimization algorithms cannot guarantee convergence to a global min, much less a local min.
- However, some classes of optim. problems are particularly nice. Convex objective, EX: z = .5(αx^2 + y^2), α > 0. Every local min is a global min!
- Even for particularly tough optim. problems, the most popular, successful algorithms sometimes perform well on many problems, despite the lack of convergence theory. Must qualify statements: "I found the best global min to date."
Your Least Squares Problem
- how many variables/unknowns? n = ?
- z = f(x1, x2, ..., xn) lives in R^?
- can we graph z?
Nonsmooth, Nondifferentiable Surfaces
- Can't compute gradient ∇f ⇒ can't use GD methods
- Line search methods
- Method of Alternating Variables (coordinate descent): solve a series of 1-D problems, one variable at a time (see the sketch below)
- what would these steps look like on a contour map?
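A minimal sketch of the alternating-variables idea, using Matlab's 1-D solver fminbnd for each coordinate; the objective (nonsmooth, so ∇f is unavailable) and the search interval are illustrative assumptions.

% Coordinate descent: minimize over one coordinate at a time, cycling
% until the point stops moving.
f = @(x) (x(1) - 1)^2 + 3*abs(x(2) + 2);   % assumed objective, nonsmooth in x(2)
x = [0; 0];                                % starting point
for sweep = 1:50
    x_old = x;
    for i = 1:numel(x)
        % 1-D slice: vary coordinate i, hold the others fixed
        f1d = @(t) f([x(1:i-1); t; x(i+1:end)]);
        x(i) = fminbnd(f1d, x(i) - 5, x(i) + 5);
    end
    if norm(x - x_old) < 1e-8              % stop when a full sweep moves nothing
        break
    end
end

On a contour map, each step is a horizontal or vertical move to the lowest point along that line.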
fminsearch and Nelder-Mead
- maintain a set of n + 1 points, where n = # variables
- form a simplex using these points (convex hull)
- idea: move in a direction away from the worst of these points
- EX: n = 2, so maintain 3 points living in the xy-plane; the simplex is a triangle
- create a new simplex by moving away from the worst point: reflect, expand, contract, shrink steps
[Reproduced figures from a paper on convergence properties of the Nelder-Mead method: Fig. 1, Nelder-Mead simplices after a reflection and an expansion step; Fig. 2, Nelder-Mead simplices after an outside contraction, an inside contraction, and a shrink. In both figures the original simplex is shown with a dashed line.]
N-M Algorithm
[Slide figures, repeated on two slides: the steps of the Nelder-Mead algorithm. A simplified sketch of one iteration follows.]
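For concreteness, here is a simplified sketch of one Nelder-Mead iteration using the standard reflect/expand/contract/shrink moves named above. It folds outside and inside contraction into a single case and is not Matlab's actual fminsearch implementation.

function S = nm_step(f, S)
% One simplified Nelder-Mead iteration.
% S: n x (n+1) matrix whose columns are the simplex vertices.
fv = zeros(1, size(S,2));
for j = 1:size(S,2), fv(j) = f(S(:,j)); end
[fv, order] = sort(fv);                 % order vertices: best first, worst last
S = S(:, order);
n    = size(S,1);
xbar = mean(S(:,1:n), 2);               % centroid of all but the worst point
xw   = S(:, n+1);                       % worst point
xr   = xbar + (xbar - xw);              % reflection
if f(xr) < fv(1)
    xe = xbar + 2*(xbar - xw);          % expansion: reflection was the new best
    if f(xe) < f(xr), S(:,n+1) = xe; else S(:,n+1) = xr; end
elseif f(xr) < fv(n)
    S(:,n+1) = xr;                      % accept the reflection
else
    xc = xbar + 0.5*(xw - xbar);        % contraction toward the centroid
    if f(xc) < fv(n+1)
        S(:,n+1) = xc;
    else                                % shrink all vertices toward the best
        for j = 2:n+1, S(:,j) = S(:,1) + 0.5*(S(:,j) - S(:,1)); end
    end
end

Each call performs one iteration; in practice you loop until the simplex diameter or the spread of the function values is small.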
[Slide figures, repeated over several slides: successive Nelder-Mead iterations on a least-squares fit to supernova SN1939A data. Left panel: SN1939A luminosity vs. days (log scale). Right panel: residual norm as a function of λ1 and λ2, showing the shrinking simplex between the starting guess λ0 and the optimizer λ*.]
N-M Algorithm
- not proven to converge in general, but widely used
- easy to implement
- inexpensive: usually only 1-2 function evaluations per iteration
- no derivatives needed
- makes good progress at the beginning of the iteration history
Assignments:
- Display the N-M steps using options = optimset('Display','iter'); fminsearch(fun, x0, options);
- Write nested for loops in Matlab to generate a grid of starting points (and later random starting points) for fminsearch to find the best global min (a sketch follows)
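One way the grid-of-starting-points assignment might look; the objective (Matlab's built-in peaks surface, which has several local minima) and the grid range are placeholders.

% Run fminsearch from every point on a coarse grid; keep the best min found.
fun  = @(x) peaks(x(1), x(2));           % placeholder objective with multiple local mins
opts = optimset('Display', 'off');
best_val = Inf;
for a = -3:0.5:3
    for b = -3:0.5:3
        [xmin, fval] = fminsearch(fun, [a; b], opts);
        if fval < best_val
            best_val = fval;
            best_x = xmin;
        end
    end
end
fprintf('best min found to date: f(%g, %g) = %g\n', best_x(1), best_x(2), best_val);

Replacing the grid with random starting points means drawing [a; b] from rand inside a single loop. Note the qualified claim in the printout: "best min found to date," not "global min."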
Genetic/Evolutionary Algorithms
- at each iteration, either mate or mutate candidate solution vectors, based on their fitness as measured by the objective function
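A minimal sketch of the mate/mutate loop described above: fitter vectors (lower objective value) are kept as parents, children are crossovers of two parents plus random mutation. The objective, population size, and mutation rate are all illustrative assumptions.

f = @(x) sum(x.^2);                      % assumed objective: minimize ||x||^2
n = 5;  pop = 40;  gens = 200;
P = randn(n, pop);                       % initial population of candidate vectors
for g = 1:gens
    fit = zeros(1, pop);
    for j = 1:pop, fit(j) = f(P(:,j)); end
    [~, order] = sort(fit);              % rank by fitness (smaller is fitter)
    parents = P(:, order(1:pop/2));      % the fitter half survives
    children = zeros(n, pop/2);
    for j = 1:pop/2
        pair = parents(:, randi(pop/2, 1, 2));
        mask = rand(n,1) < 0.5;          % mate: uniform crossover of two parents
        child = pair(:,1).*mask + pair(:,2).*(~mask);
        child = child + 0.1*randn(n,1).*(rand(n,1) < 0.2);  % mutate a few entries
        children(:, j) = child;
    end
    P = [parents, children];             % next generation
end

Like Nelder-Mead, this needs only function evaluations, no derivatives, and carries no general convergence guarantee.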