Introduction to Optimization


Slide 1: Introduction to Optimization: Constrained Optimization. Marc Toussaint, U Stuttgart

Slide 2: Constrained Optimization
General constrained optimization problem: let x in R^n, f : R^n -> R, g : R^n -> R^m, h : R^n -> R^l, and find
  min_x f(x)   s.t.   g(x) <= 0,  h(x) = 0
In this lecture I'll focus (mostly) on inequality constraints g.
Applications:
- Find an optimal, non-colliding trajectory in robotics
- Optimize the shape of a turbine blade, s.t. it must not break
- Optimize the train schedule, s.t. consistency/possibility

Slide 3: Try to somehow transform the constrained problem into
- a series of unconstrained problems
- a single but larger unconstrained problem
- another constrained problem, hopefully simpler (dual, convex)

Slide 4: General approaches
Penalty & barriers:
- Associate an (adaptive) penalty cost with violation of the constraint
- Associate an additional force compensating the gradient into the constraint (augmented Lagrangian)
- Associate a log barrier with a constraint, becoming ∞ for violation (interior point method)
Gradient projection methods (mostly for linear constraints):
- For active constraints, project the step direction to become tangential
- When checking a step, always pull it back to the feasible region
Lagrangian & dual methods:
- Rewrite the constrained problem into an unconstrained one
- Or rewrite it as a (convex) dual problem
Simple methods (linear constraints):
- Walk along the constraint boundaries

Slide 5: Penalties & Barriers
Convention:
- A barrier is really ∞ for g(x) > 0
- A penalty is zero for g(x) <= 0 and increases with g(x) > 0

Slide 6: Log barrier method (or: interior point method)

Slide 7: Log barrier method
Instead of
  min_x f(x)   s.t.   g(x) <= 0
we address
  min_x f(x) - µ Σ_i log(-g_i(x))

Slide 8: Log barrier
For µ -> 0, the term -µ log(-g) converges to ∞·[g > 0]. (Notation: [boolean expression] is in {0, 1}.)
The barrier's gradient ∇ log(-g) = ∇g/g pushes away from the constraint.
Eventually we want to have a very small µ, but choosing a small µ makes the barrier very non-smooth, which is bad for gradient and 2nd-order methods.

Slide 9: Central Path
Every µ defines a different optimal x*(µ):
  x*(µ) = argmin_x f(x) - µ Σ_i log(-g_i(x))
Each point on the path can be understood as the optimal compromise between minimizing f(x) and a repelling force of the constraints. (Which corresponds to dual variables λ*(µ).)

Slide 10: Log barrier method
Input: initial x in R^n, functions f(x), g(x), ∇f(x), ∇g(x), tolerances θ, ɛ
Output: x
1: initialize µ = 1
2: repeat
3:   find x <- argmin_x f(x) - µ Σ_i log(-g_i(x)) with tolerance 10θ
4:   decrease µ <- µ/10
5: until |Δx| < θ and for all i: g_i(x) < ɛ
Note: see Boyd & Vandenberghe for stopping criteria based on f precision (duality gap) and a better choice of initial µ (which is called t there).
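A minimal sketch of this loop in Python with NumPy, on a hypothetical toy problem of my own choosing (f(x) = x·x with a single constraint g(x) = 1 - x_1 - x_2 <= 0; the names f, g, grad_f, grad_g are assumptions, not from the slides). The inner minimization here is plain gradient descent with backtracking, standing in for the 2nd-order methods the lecture actually assumes.

import numpy as np

# Hypothetical toy problem: f(x) = x.x, one constraint g(x) = 1 - x1 - x2 <= 0
f      = lambda x: x @ x
grad_f = lambda x: 2 * x
g      = lambda x: np.array([1.0 - x[0] - x[1]])
grad_g = lambda x: np.array([[-1.0, -1.0]])           # rows are the gradients of the g_i

def barrier(x, mu):
    """Value and gradient of F(x) = f(x) - mu * sum_i log(-g_i(x)); +inf outside the feasible set."""
    gx = g(x)
    if np.any(gx >= 0):
        return np.inf, None
    val  = f(x) - mu * np.sum(np.log(-gx))
    grad = grad_f(x) - mu * (grad_g(x) / gx[:, None]).sum(axis=0)
    return val, grad

def log_barrier_method(x, mu=1.0, theta=1e-3, eps=1e-3):
    while True:
        x_old = x
        for _ in range(10000):                        # inner loop: gradient descent with backtracking
            val, grad = barrier(x, mu)
            if np.linalg.norm(grad) < 10 * theta:
                break
            alpha = 1.0
            while barrier(x - alpha * grad, mu)[0] > val - 0.5 * alpha * grad @ grad:
                alpha *= 0.5                          # infeasible trial points return inf and are rejected
            x = x - alpha * grad
        if np.linalg.norm(x - x_old) < theta and np.all(g(x) < eps):
            return x
        mu /= 10.0                                    # decrease mu

print(log_barrier_method(np.array([2.0, 2.0])))       # approaches [0.5, 0.5] from inside the feasible set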

Slide 11: We will revisit the log barrier method later, once we have introduced the Lagrangian...

Slide 12: Squared Penalty Method

Slide 13: Squared Penalty Method
This is perhaps the simplest approach. Instead of
  min_x f(x)   s.t.   g(x) <= 0
we address
  min_x f(x) + µ Σ_{i=1}^m [g_i(x) > 0] g_i(x)^2
Input: initial x in R^n, functions f(x), g(x), ∇f(x), ∇g(x), tolerances θ, ɛ
Output: x
1: initialize µ = 1
2: repeat
3:   find x <- argmin_x f(x) + µ Σ_i [g_i(x) > 0] g_i(x)^2 with tolerance 10θ
4:   increase µ <- 10µ
5: until |Δx| < θ and for all i: g_i(x) < ɛ
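A corresponding sketch of the squared penalty loop, on the same hypothetical toy problem as in the barrier sketch above; for brevity the inner unconstrained minimization here uses scipy.optimize.minimize (BFGS) instead of a hand-written solver. As the next slide notes, the returned point typically violates the constraint slightly.

import numpy as np
from scipy.optimize import minimize

# Same hypothetical toy problem: f(x) = x.x, g(x) = 1 - x1 - x2 <= 0
f, grad_f = lambda x: x @ x, lambda x: 2 * x
g      = lambda x: np.array([1.0 - x[0] - x[1]])
grad_g = lambda x: np.array([[-1.0, -1.0]])            # rows are the gradients of the g_i

def penalized(x, mu):
    """F(x) = f(x) + mu * sum_i [g_i(x) > 0] g_i(x)^2, returned together with its gradient."""
    gx = g(x)
    active = gx > 0
    val  = f(x) + mu * np.sum(gx[active] ** 2)
    grad = grad_f(x) + 2 * mu * (gx[active, None] * grad_g(x)[active]).sum(axis=0)
    return val, grad

def squared_penalty(x, mu=1.0, theta=1e-4, eps=1e-4):
    while True:
        x_old = x
        # inner unconstrained minimization (BFGS stands in for the lecture's 2nd-order methods)
        x = minimize(penalized, x, args=(mu,), jac=True, method='BFGS',
                     options={'gtol': 10 * theta}).x
        if np.linalg.norm(x - x_old) < theta and np.all(g(x) < eps):
            return x
        mu *= 10.0                                     # increase the penalty weight

print(squared_penalty(np.array([2.0, 2.0])))           # close to [0.5, 0.5], but slightly infeasible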

Slide 14: Squared Penalty Method
The method is OK, but will always lead to some violation of constraints.

Slide 15: Squared Penalty Method
The method is OK, but will always lead to some violation of constraints.
A better idea would be to add an out-pushing gradient/force ∇g_i(x) for every constraint g_i(x) > 0 that is violated.
Ideally, the out-pushing gradient mixes with ∇f(x) exactly such that the result becomes tangential to the constraint!
This idea leads to the augmented Lagrangian approach.

Slide 16: Augmented Lagrangian
(We can introduce this in a self-contained manner, without yet defining the Lagrangian.)

Slide 17: Augmented Lagrangian (equality constraint)
We first consider an equality constraint before addressing inequalities. Instead of
  min_x f(x)   s.t.   h(x) = 0
we address
  min_x f(x) + µ Σ_{i=1}^m h_i(x)^2 + Σ_{i=1}^m λ_i h_i(x)   (1)
Note:
- The gradient ∇h_i(x) is always orthogonal to the constraint
- By tuning λ_i we can induce a virtual gradient λ_i ∇h_i(x)
- The term µ Σ_{i=1}^m h_i(x)^2 penalizes as before
Here is the trick:
- First minimize (1) for some µ and λ_i
- This will in general lead to a (slight) penalty µ Σ_{i=1}^m h_i(x)^2
- For the next iteration, choose λ_i to generate exactly the gradient that was previously generated by the penalty

Slide 18: Optimality condition after an iteration:
  x' = argmin_x f(x) + µ Σ_{i=1}^m h_i(x)^2 + Σ_{i=1}^m λ_i h_i(x)
  0 = ∇f(x') + µ Σ_{i=1}^m 2 h_i(x') ∇h_i(x') + Σ_{i=1}^m λ_i ∇h_i(x')
Update the λ's for the next iteration:
  Σ_{i=1}^m λ_i^new ∇h_i(x') = µ Σ_{i=1}^m 2 h_i(x') ∇h_i(x') + Σ_{i=1}^m λ_i^old ∇h_i(x')
  λ_i^new = λ_i^old + 2µ h_i(x')
Input: initial x in R^n, functions f(x), h(x), ∇f(x), ∇h(x), tolerances θ, ɛ
Output: x
1: initialize µ = 1, λ_i = 0
2: repeat
3:   find x <- argmin_x f(x) + µ Σ_i h_i(x)^2 + Σ_i λ_i h_i(x)
4:   for all i: λ_i <- λ_i + 2µ h_i(x)
5: until |Δx| < θ and for all i: |h_i(x)| < ɛ
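A sketch of this loop on a hypothetical equality-constrained toy problem of my own (f(x) = x·x, h(x) = x_1 + x_2 - 1 = 0), with scipy.optimize.minimize (BFGS) as the inner solver. Note that µ stays fixed at 1 throughout; only the λ update drives the constraint violation to zero.

import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem: f(x) = x.x, equality constraint h(x) = x1 + x2 - 1 = 0
f, grad_f = lambda x: x @ x, lambda x: 2 * x
h      = lambda x: np.array([x[0] + x[1] - 1.0])
grad_h = lambda x: np.array([[1.0, 1.0]])              # rows are the gradients of the h_i

def augmented(x, mu, lam):
    """f(x) + mu * sum_i h_i(x)^2 + sum_i lam_i h_i(x), returned together with its gradient."""
    hx = h(x)
    val  = f(x) + mu * np.sum(hx ** 2) + lam @ hx
    grad = grad_f(x) + (2 * mu * hx + lam) @ grad_h(x)
    return val, grad

def augmented_lagrangian_eq(x, mu=1.0, theta=1e-6, eps=1e-6):
    lam = np.zeros(h(x).shape)
    while True:
        x_old = x
        x = minimize(augmented, x, args=(mu, lam), jac=True, method='BFGS',
                     options={'gtol': 10 * theta}).x
        lam = lam + 2 * mu * h(x)                      # lambda update: no need to drive mu to infinity
        if np.linalg.norm(x - x_old) < theta and np.all(np.abs(h(x)) < eps):
            return x, lam

x_opt, lam_opt = augmented_lagrangian_eq(np.array([2.0, 2.0]))
print(x_opt, lam_opt)   # about [0.5, 0.5] and [-1.]: the multiplier measures the constraint force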

Slide 19: This adaptation of λ_i is really elegant:
- We do not have to take the penalty limit µ -> ∞ but can still have exact constraints
- If f and h were linear (∇f and ∇h_i constant), the updated λ_i would be exactly right: in the next iteration we would exactly hit the constraint (by construction)
- The penalty term is like a measuring device for the necessary virtual gradient, which is generated by the augmentation term in the next iteration
- The λ_i are very meaningful: they give the force/gradient that a constraint exerts on the solution

Slide 20: Augmented Lagrangian (inequality constraint)
Instead of
  min_x f(x)   s.t.   g(x) <= 0
we address
  min_x f(x) + µ Σ_{i=1}^m [g_i(x) >= 0 or λ_i > 0] g_i(x)^2 + Σ_{i=1}^m λ_i g_i(x)
A constraint is either active or inactive:
- When active (g_i(x) >= 0 or λ_i > 0) we aim for equality g_i(x) = 0
- When inactive (g_i(x) < 0 and λ_i = 0) we don't penalize/augment
- The λ_i are zero or positive, but never negative
Input: initial x in R^n, functions f(x), g(x), ∇f(x), ∇g(x), tolerances θ, ɛ
Output: x
1: initialize µ = 1, λ_i = 0
2: repeat
3:   find x <- argmin_x f(x) + µ Σ_i [g_i(x) >= 0 or λ_i > 0] g_i(x)^2 + Σ_i λ_i g_i(x)
4:   for all i: λ_i <- max(λ_i + 2µ g_i(x), 0)
5: until |Δx| < θ and for all i: g_i(x) < ɛ
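The inequality version, sketched on the same hypothetical toy problem as in the earlier sketches; the only changes relative to the equality version are the activity test inside the augmented objective and the clipping of λ at zero.

import numpy as np
from scipy.optimize import minimize

# Same hypothetical toy problem: f(x) = x.x, g(x) = 1 - x1 - x2 <= 0
f, grad_f = lambda x: x @ x, lambda x: 2 * x
g      = lambda x: np.array([1.0 - x[0] - x[1]])
grad_g = lambda x: np.array([[-1.0, -1.0]])

def augmented(x, mu, lam):
    """f(x) + mu * sum_i [g_i >= 0 or lam_i > 0] g_i(x)^2 + sum_i lam_i g_i(x), with gradient."""
    gx = g(x)
    active = (gx >= 0) | (lam > 0)
    val  = f(x) + mu * np.sum(gx[active] ** 2) + lam @ gx
    grad = grad_f(x) + (2 * mu * active * gx + lam) @ grad_g(x)
    return val, grad

def augmented_lagrangian_ineq(x, mu=1.0, theta=1e-6, eps=1e-6):
    lam = np.zeros(g(x).shape)
    while True:
        x_old = x
        x = minimize(augmented, x, args=(mu, lam), jac=True, method='BFGS',
                     options={'gtol': 10 * theta}).x
        lam = np.maximum(lam + 2 * mu * g(x), 0)       # clip at zero: multipliers stay >= 0
        if np.linalg.norm(x - x_old) < theta and np.all(g(x) < eps):
            return x, lam

x_opt, lam_opt = augmented_lagrangian_ineq(np.array([2.0, 2.0]))
print(x_opt, lam_opt)   # about [0.5, 0.5] with multiplier [1.], the force this constraint exerts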

Slide 21: General approaches
Penalty & barriers:
- Associate an (adaptive) penalty cost with violation of the constraint
- Associate an additional force compensating the gradient into the constraint (augmented Lagrangian)
- Associate a log barrier with a constraint, becoming ∞ for violation (interior point method)
Gradient projection methods (mostly for linear constraints):
- For active constraints, project the step direction to become tangential
- When checking a step, always pull it back to the feasible region
Lagrangian & dual methods:
- Rewrite the constrained problem into an unconstrained one
- Or rewrite it as a (convex) dual problem
Simple methods (linear constraints):
- Walk along the constraint boundaries

Slide 22: The Lagrangian

Slide 23: The Lagrangian
Given a constrained problem
  min_x f(x)   s.t.   g(x) <= 0
we define the Lagrangian as
  L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x)
The λ_i >= 0 are called dual variables or Lagrange multipliers.

Slide 24: What's the point of this definition?
- The Lagrangian is useful to compute optima analytically, on paper; that's why physicists learn it early on
- The Lagrangian implies the KKT conditions of optimality
- Optima are necessarily at saddle points of the Lagrangian
- The Lagrangian implies a dual problem, which is sometimes easier to solve than the primal

Slide 25: Example: Some calculus using the Lagrangian
For x in R^2, what is
  min_x x^2   s.t.   x_1 + x_2 = 1 ?
Solution:
  L(x, λ) = x^2 + λ(x_1 + x_2 - 1)
  0 = ∇_x L(x, λ) = 2x + λ(1, 1)^T   =>   x_1 = x_2 = -λ/2
  0 = ∂_λ L(x, λ) = x_1 + x_2 - 1 = -λ/2 - λ/2 - 1   =>   λ = -1
  =>   x_1 = x_2 = 1/2
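The same result can be checked numerically: since the Lagrangian of this problem is quadratic, its stationarity conditions form a small linear system (a sketch using NumPy).

import numpy as np

# Stationarity of L(x, lam) = x.x + lam*(x1 + x2 - 1) gives a linear system in (x1, x2, lam):
#   2*x1 + lam = 0,   2*x2 + lam = 0,   x1 + x2 = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # 0.5 0.5 -1.0, as derived on the slide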

Slide 26: The force & KKT view on the Lagrangian
At the optimum there must be a balance between the cost gradient ∇f(x) and the gradients of the active constraints ∇g_i(x).

Slide 27: The force & KKT view on the Lagrangian
At the optimum there must be a balance between the cost gradient ∇f(x) and the gradients of the active constraints ∇g_i(x).
Formally: for optimal x, ∇f(x) is in span{∇g_i(x)}.
Or: for optimal x there must exist λ_i such that ∇f(x) = Σ_i (-λ_i ∇g_i(x)).

Slide 28: The force & KKT view on the Lagrangian
At the optimum there must be a balance between the cost gradient ∇f(x) and the gradients of the active constraints ∇g_i(x).
Formally: for optimal x, ∇f(x) is in span{∇g_i(x)}.
Or: for optimal x there must exist λ_i such that ∇f(x) = Σ_i (-λ_i ∇g_i(x)).
For optimal x it must hold (necessary condition): there exists λ such that
  ∇f(x) + Σ_{i=1}^m λ_i ∇g_i(x) = 0   ("force balance")
  for all i: g_i(x) <= 0               (primal feasibility)
  for all i: λ_i >= 0                  (dual feasibility)
  for all i: λ_i g_i(x) = 0            (complementarity)
The last condition says that λ_i > 0 only for active constraints. These are the Karush-Kuhn-Tucker (KKT) conditions (neglecting equality constraints).
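The four conditions translate directly into a small numeric test; the sketch below checks them for a candidate point of the inequality variant of the earlier hypothetical example (min x·x s.t. 1 - x_1 - x_2 <= 0, where x* = (0.5, 0.5) with λ* = 1).

import numpy as np

def kkt_check(x, lam, grad_f, g, grad_g, tol=1e-8):
    """Check the four KKT conditions for min f(x) s.t. g(x) <= 0 at a candidate (x, lam)."""
    force_balance = np.linalg.norm(grad_f(x) + lam @ grad_g(x)) < tol
    primal_feas   = np.all(g(x) <= tol)
    dual_feas     = np.all(lam >= 0)
    complementary = np.all(np.abs(lam * g(x)) < tol)
    return force_balance and primal_feas and dual_feas and complementary

# Hypothetical example: min x.x  s.t.  1 - x1 - x2 <= 0; candidate x* = (0.5, 0.5), lam* = 1
grad_f = lambda x: 2 * x
g      = lambda x: np.array([1.0 - x[0] - x[1]])
grad_g = lambda x: np.array([[-1.0, -1.0]])
print(kkt_check(np.array([0.5, 0.5]), np.array([1.0]), grad_f, g, grad_g))   # True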

Slide 29: The force & KKT view on the Lagrangian
The first condition ("force balance"), that there exists λ with ∇f(x) + Σ_{i=1}^m λ_i ∇g_i(x) = 0, can be equivalently expressed as: there exists λ such that
  ∇_x L(x, λ) = 0
In that sense, the Lagrangian can be viewed as the energy function that generates (for a good choice of λ) the right balance between cost and constraint gradients.
This is exactly as in the augmented Lagrangian approach, where however we have an additional ("augmented") squared penalty that is used to tune the λ_i.

Slide 30: Saddle point view on the Lagrangian
Let's briefly consider the equality case again:
  min_x f(x)   s.t.   h(x) = 0
with the Lagrangian
  L(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x)
Note:
  min_x L(x, λ)   =>   0 = ∇_x L(x, λ)              (force balance)
  max_λ L(x, λ)   =>   0 = ∂_λ L(x, λ) = h_i(x)     (constraint)
Optima (x*, λ*) are saddle points, where ∇_x L = 0 ensures force balance and ∇_λ L = 0 ensures the constraint.

Slide 31: Saddle point view on the Lagrangian
In the inequality case:
  max_{λ>=0} L(x, λ) = f(x) if g(x) <= 0, and ∞ otherwise
  max_{λ_i>=0} L(x, λ)   =>   λ_i = 0 if g_i(x) < 0, and 0 = ∂_{λ_i} L(x, λ) = g_i(x) otherwise
This implies either (λ_i = 0 and g_i(x) < 0) or g_i(x) = 0, which is exactly equivalent to the KKT conditions.
Again, optima (x*, λ*) are saddle points where min_x L enforces force balance and max_λ L enforces the KKT conditions.

Slide 32: The Lagrange dual problem
We define the Lagrange dual function as
  l(λ) = min_x L(x, λ)
This implies two problems:
  min_x f(x)  s.t.  g(x) <= 0      (primal problem)
  max_λ l(λ)  s.t.  λ >= 0         (dual problem)
The dual problem is convex, even if the primal is non-convex!
Written more symmetrically:
  min_x max_{λ>=0} L(x, λ)          (primal problem)
  max_{λ>=0} min_x L(x, λ)          (dual problem)
because max_{λ>=0} L(x, λ) ensures the constraints (previous slide).

Slide 33: The Lagrange dual problem
The dual function is always a lower bound (for any λ_i >= 0):
  l(λ) = min_x L(x, λ)  <=  [ min_x f(x)  s.t.  g(x) <= 0 ]
And consequently
  max_{λ>=0} min_x L(x, λ)  <=  min_x max_{λ>=0} L(x, λ)
We say strong duality holds iff
  max_{λ>=0} min_x L(x, λ)  =  min_x max_{λ>=0} L(x, λ)
If the primal is convex, and there exists an interior point x (for all i: g_i(x) < 0, which is called the Slater condition), then we have strong duality.
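To make these definitions concrete, here is a tiny hypothetical 1-D example (of my own choosing, not from the slides) where the dual function can be written down in closed form and weak and strong duality can be checked numerically.

import numpy as np

# Hypothetical example: min x^2  s.t.  g(x) = 1 - x <= 0 (i.e. x >= 1); primal optimum f* = 1 at x = 1.
# L(x, lam) = x^2 + lam*(1 - x); argmin_x L is x = lam/2, so the dual function is
#   l(lam) = lam - lam^2 / 4
l = lambda lam: lam - lam ** 2 / 4

lams = np.linspace(0.0, 4.0, 401)
assert np.all(l(lams) <= 1.0 + 1e-12)    # weak duality: l(lam) is a lower bound on f* = 1 for all lam >= 0
best = lams[np.argmax(l(lams))]
print(best, l(best))                     # lam* = 2, l(lam*) = 1: here the primal is convex and Slater
                                         # holds (e.g. x = 2), so strong duality closes the gap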

Slide 34: And what about algorithms?
So far we've only introduced a whole lot of formalism, and seen that the Lagrangian sort of represents the constrained problem:
- min_x L or ∇_x L = 0 is related to the force balance
- max_λ L or ∇_λ L = 0 is related to the constraints or KKT conditions
- This implies two dual problems, min_x max_λ L and max_λ min_x L; the second (dual) is a lower bound of the first (primal)
But what are the algorithms we can get out of this?

Slide 35: Algorithmic implications of the Lagrangian view
If min_x L(x, λ) can be solved analytically, we can alternatively solve the (convex) dual problem.
But more generally: optimization problem -> solve the KKT conditions -> apply standard algorithms for solving an equation system r(x, λ) = 0, e.g. the Newton method: solve ∇r (Δx, Δλ) = -r.
This leads to primal-dual algorithms that adapt x and λ concurrently. Roughly, they use the curvature ∇^2 f to estimate the right λ to push out of the constraint.
We will discuss this after we've learnt about 2nd-order methods.
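As a minimal illustration of the Newton-on-KKT idea, consider the equality-constrained case, where the KKT system has no complementarity conditions. The hypothetical quadratic program below (my own choice of Q, c, A, b) has a KKT residual that is linear in (x, λ), so a single Newton step lands on the solution.

import numpy as np

# Hypothetical equality-constrained QP: min 1/2 x'Qx + c'x  s.t.  Ax = b
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([0.0, 0.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def kkt_residual(x, lam):
    """r(x, lam) = (grad f + A' lam, Ax - b); a root of r is a KKT point."""
    return np.concatenate([Q @ x + c + A.T @ lam, A @ x - b])

# Newton's method on r(x, lam) = 0: solve  grad_r (dx, dlam) = -r.
x, lam = np.zeros(2), np.zeros(1)
grad_r = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
step = np.linalg.solve(grad_r, -kkt_residual(x, lam))
x, lam = x + step[:2], lam + step[2:]
print(x, lam)   # [0.5 0.5] [-1.], the same solution and multiplier as the example on slide 25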

Slide 36: Log barrier method revisited

Slide 37: Log barrier method revisited
Log barrier method: instead of
  min_x f(x)   s.t.   g(x) <= 0
we address
  min_x f(x) - µ Σ_i log(-g_i(x))
For given µ the optimality condition is
  ∇f(x) - Σ_i (µ / g_i(x)) ∇g_i(x) = 0
or equivalently
  ∇f(x) + Σ_i λ_i ∇g_i(x) = 0,   λ_i g_i(x) = -µ
These are called modified (= approximate) KKT conditions.

Slide 38: Log barrier method revisited
Centering (the unconstrained minimization) in the log barrier method is equivalent to solving the modified KKT conditions.
Note also: on the central path, the duality gap is mµ:
  l(λ*(µ)) = f(x*(µ)) + Σ_i λ*_i g_i(x*(µ)) = f(x*(µ)) - mµ

Slide 39: Phase I: Finding a feasible initialization

Slide 40: Phase I: Finding a feasible initialization
An elegant method for finding a feasible point x:
  min_{(x,s) in R^{n+1}} s   s.t.   for all i: g_i(x) <= s,  s >= 0
or
  min_{(x,s) in R^{n+m}} Σ_{i=1}^m s_i   s.t.   for all i: g_i(x) <= s_i,  s_i >= 0
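One simple way to actually run something like this, sketched under my own assumptions rather than as the slide's exact formulation: at the optimum of the second formulation the slacks are s_i = max(g_i(x), 0), so eliminating them leaves the unconstrained problem min_x Σ_i max(g_i(x), 0). The sketch below minimizes a smooth squared version of that (with a small margin, so the result lands strictly inside the feasible region, e.g. as a start for the log barrier method) by plain gradient descent, and stops as soon as every constraint holds. The two example constraints are hypothetical.

import numpy as np

# Hypothetical constraints: g1 = 1 - x1 - x2 <= 0 and g2 = x1 - 3 <= 0
g      = lambda x: np.array([1.0 - x[0] - x[1], x[0] - 3.0])
grad_g = lambda x: np.array([[-1.0, -1.0], [1.0, 0.0]])

def phase_one(x, margin=0.1, alpha=0.1, iters=10000):
    for _ in range(iters):
        if np.all(g(x) < 0):                      # strictly feasible: good enough to start Phase II
            return x
        viol = np.maximum(g(x) + margin, 0.0)     # shifted constraint violations
        x = x - alpha * 2 * (viol @ grad_g(x))    # gradient step on sum_i max(g_i(x) + margin, 0)^2
    raise RuntimeError("no feasible point found")

print(phase_one(np.array([4.0, -4.0])))           # a point satisfying both constraints strictly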

Slide 41: General approaches
Penalty & barriers:
- Associate an (adaptive) penalty cost with violation of the constraint
- Associate an additional force compensating the gradient into the constraint (augmented Lagrangian)
- Associate a log barrier with a constraint, becoming ∞ for violation (interior point method)
Gradient projection methods (mostly for linear constraints):
- For active constraints, project the step direction to become tangential
- When checking a step, always pull it back to the feasible region
Lagrangian & dual methods:
- Rewrite the constrained problem into an unconstrained one
- Or rewrite it as a (convex) dual problem
Simple methods (linear constraints):
- Walk along the constraint boundaries
