Lecture 12: Feasible direction methods

Kin Cheong Sou, December 2, 2013. TMA947.

Feasible-direction methods, I (Intro)

Consider the problem to find

    f* = infimum f(x),      (1a)
    subject to x ∈ X,       (1b)

where X ⊆ R^n is nonempty, closed and convex, and f : R^n → R is C^1 on X.

A natural idea is to mimic the line search methods for unconstrained problems. However, most methods for (1) manipulate (that is, relax) the constraints defining X; in some cases even such that the sequence {x_k} is infeasible until convergence. Why?

Feasible-direction methods, I (Intro)

Consider a constraint g_i(x) ≤ b_i, where g_i is nonlinear. Checking whether p is a feasible direction at x, or what the maximum feasible step from x in the direction of p is, is very difficult: for which step length α > 0 is g_i(x + αp) = b_i? This is a nonlinear equation in α!

Assuming that X is polyhedral, these problems are not present.

Note: the KKT conditions are always necessary for a local minimum over polyhedral sets; the methods here will find such points.

Feasible-direction descent methods (Intro)

Step 0. Determine a starting point x_0 ∈ R^n such that x_0 ∈ X. Set k := 0.
Step 1. Determine a search direction p_k ∈ R^n such that p_k is a feasible descent direction.
Step 2. Determine a step length α_k > 0 such that f(x_k + α_k p_k) < f(x_k) and x_k + α_k p_k ∈ X.
Step 3. Let x_{k+1} := x_k + α_k p_k.
Step 4. If a termination criterion is fulfilled, then stop! Otherwise, let k := k + 1 and go to Step 1.
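
A minimal Python sketch of this generic loop, assuming that the direction-finding step, the line search and the termination test are supplied as callables (the function names and defaults are illustrative, not part of the course material):

    import numpy as np

    def feasible_direction_descent(x0, find_direction, line_search,
                                   tol=1e-6, max_iter=100):
        # find_direction(x): returns a feasible descent direction p at x,
        #                    or a (near-)zero vector when x is stationary.
        # line_search(x, p): returns a step alpha > 0 with x + alpha*p in X.
        x = np.asarray(x0, dtype=float)           # Step 0
        for _ in range(max_iter):
            p = find_direction(x)                 # Step 1
            if np.linalg.norm(p) <= tol:          # Step 4 (termination)
                break
            alpha = line_search(x, p)             # Step 2
            x = x + alpha * p                     # Step 3
        return x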

Notes (Intro)

Similar form as the general method for unconstrained optimization; just as local as methods for unconstrained optimization.

Search directions are typically based on an approximation of f, that is, a relaxation. The search direction is often of the form p_k = y_k − x_k, where y_k ∈ X solves an approximate problem.

Line searches are similar; note the maximum step.

Termination criteria and descent are based on first-order optimality and/or fixed-point theory (p_k ≈ 0^n).

LP-based algorithm, I: The Frank Wolfe method

The Frank Wolfe method is based on a first-order approximation of f around the iterate x_k. This means that the relaxed problems are LPs, which can then be solved by using the Simplex method.

Remember the first-order optimality condition: if x* ∈ X is a local minimum of f on X, then

    ∇f(x*)^T (x − x*) ≥ 0,   for all x ∈ X.

Remember also the following equivalent statement:

    minimum over x ∈ X of ∇f(x*)^T (x − x*) = 0.

LP-based algorithm, I: The Frank Wolfe method

It follows that if, given an iterate x_k ∈ X,

    minimum over y ∈ X of ∇f(x_k)^T (y − x_k) < 0,

and y_k is an optimal solution to this LP problem, then the direction p_k := y_k − x_k is a feasible descent direction with respect to f at x_k.

The search direction points towards an extreme point of X [one that is optimal in the LP over X with costs c = ∇f(x_k)].

This is the basis of the Frank Wolfe algorithm.

LP-based algorithm, I: The Frank Wolfe method

We assume that X is bounded in order to ensure that the LP always has a finite optimal solution. The algorithm can be extended to work for unbounded polyhedra: the search directions are then either towards an extreme point of X (finite optimal solution to the LP) or in the direction of an extreme ray of X (unbounded LP). Both cases are identified by the Simplex method.

The search-direction problem (Frank Wolfe)

[Figure: the polyhedral feasible set X with the iterate x_k, the gradient ∇f(x_k), the LP solution y_k (an extreme point of X), and the search direction p_k = y_k − x_k.]

Algorithm description, Frank Wolfe

Step 0. Find x_0 ∈ X (for example any extreme point of X). Set k := 0.
Step 1. Find an optimal solution y_k to the problem to

    minimize over y ∈ X:  z_k(y) := ∇f(x_k)^T (y − x_k).    (2)

Let p_k := y_k − x_k be the search direction.
Step 2. Approximately solve the problem to minimize f(x_k + αp_k) over α ∈ [0, 1]. Let α_k be the step length.
Step 3. Let x_{k+1} := x_k + α_k p_k.
Step 4. If, for example, z_k(y_k) or α_k is close to zero, then terminate! Otherwise, let k := k + 1 and go to Step 1.
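
As a concrete illustration, here is a minimal Python sketch of the Frank Wolfe iteration for a polyhedral set X = {y : A_ub y ≤ b_ub}, using scipy's LP solver for Step 1 and a bounded scalar minimization for the line search in Step 2 (the problem data f, grad, A_ub, b_ub and the tolerances are assumptions supplied by the user, not part of the lecture):

    import numpy as np
    from scipy.optimize import linprog, minimize_scalar

    def frank_wolfe(f, grad, A_ub, b_ub, x0, max_iter=100, tol=1e-6):
        # min f(x) s.t. A_ub @ x <= b_ub, with X assumed bounded
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            # Step 1: LP subproblem  min_y grad(x_k)^T (y - x_k)  over X
            y = linprog(c=g, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).x
            z = g @ (y - x)                     # z_k(y_k) <= 0
            if z > -tol:                        # Step 4: near-stationary
                break
            p = y - x                           # feasible descent direction
            # Step 2: line search over alpha in [0, 1]
            ls = minimize_scalar(lambda a: f(x + a * p),
                                 bounds=(0.0, 1.0), method='bounded')
            x = x + ls.x * p                    # Step 3
        return x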

Convergence (Frank Wolfe)

Suppose that X ⊆ R^n is a nonempty polytope, that f is C^1 on X, and that in Step 2 of the Frank Wolfe algorithm we either use an exact line search or the Armijo step length rule.

Then: the sequence {x_k} is bounded and every limit point (at least one exists) is stationary; {f(x_k)} is descending, and therefore has a limit; z_k(y_k) → 0 (that is, ∇f(x_k)^T p_k → 0).

If f is convex on X, then every limit point is globally optimal.

Frank Wolfe convergence

[Figure: convergence behaviour of the Frank Wolfe iterates on an example problem.]

The convex case: Lower bounds (Frank Wolfe)

Remember the following characterization of convex functions in C^1 on X:

    f is convex on X  ⟺  f(y) ≥ f(x) + ∇f(x)^T (y − x),  for all x, y ∈ X.

Suppose that f is convex on X. Then

    f(x_k) + z_k(y_k) ≤ f*   (a lower bound, LBD),

and f(x_k) + z_k(y_k) = f(x_k) if and only if x_k is globally optimal. This is a relaxation, cf. the Relaxation Theorem!

Utilize the lower bound as follows: we know that f* ∈ [f(x_k) + z_k(y_k), f(x_k)]. Store the best LBD, and check in Step 4 whether [f(x_k) − LBD]/|LBD| is small, and if so terminate.
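
A sketch of how this lower-bound bookkeeping can be added to Step 4 of the Frank Wolfe loop above (the variable names and the relative-gap tolerance are illustrative assumptions):

    def fw_relative_gap(f_xk, z_k, best_lbd, tol=1e-4):
        # f_xk : f(x_k), an upper bound on f* (f convex on X)
        # z_k  : z_k(y_k) = grad f(x_k)^T (y_k - x_k), so f(x_k) + z_k <= f*
        # Returns (terminate?, updated best lower bound).
        best_lbd = max(best_lbd, f_xk + z_k)
        gap = (f_xk - best_lbd) / max(abs(best_lbd), 1e-12)
        return gap <= tol, best_lbd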

Notes (Frank Wolfe)

Frank Wolfe uses linear approximations, so it works best for almost linear problems. For highly nonlinear problems the approximation is poor, and the optimal solution may be far from an extreme point; finding a near-optimal solution then requires many iterations, so the algorithm is slow.

Another reason is that the information generated (the extreme points) is forgotten. If we keep the linear subproblem, we can do much better by storing and utilizing this information.

LP-based algorithm, II: Simplicial decomposition (SD)

Remember the Representation Theorem (special case for polytopes): let

    P = {x ∈ R^n | Ax = b; x ≥ 0^n}

be nonempty and bounded, and let V = {v^1, ..., v^K} be the set of extreme points of P. Every x ∈ P can be represented as a convex combination of the points in V, that is,

    x = Σ_{i=1}^{K} α_i v^i,

for some α_1, ..., α_K ≥ 0 such that Σ_{i=1}^{K} α_i = 1.

LP-based algorithm, II: Simplicial decomposition (SD)

The idea behind the Simplicial decomposition method is to generate those extreme points v^i which can be used to describe an optimal solution x*, that is, the vectors v^i with positive weights α_i in

    x* = Σ_{i=1}^{K} α_i v^i.

The process is still iterative: we generate a working set P_k of indices i, optimize the function f over the convex hull of the known points, and check for stationarity and/or generate a new extreme point.

Algorithm description, Simplicial decomposition

Step 0. Find x_0 ∈ X, for example any extreme point of X. Set k := 0. Let P_0 := ∅.
Step 1. Let y_k be an optimal solution to the LP problem to

    minimize over y ∈ X:  z_k(y) := ∇f(x_k)^T (y − x_k).

Let P_{k+1} := P_k ∪ {k}.

Algorithm description, Simplicial decomposition

Step 2. Let (μ_{k+1}, ν_{k+1}) be an approximate solution to the restricted master problem (RMP) to

    minimize over (μ, ν):  f(μ x_k + Σ_{i ∈ P_{k+1}} ν_i y^i),    (3a)
    subject to             μ + Σ_{i ∈ P_{k+1}} ν_i = 1,           (3b)
                           μ, ν_i ≥ 0,  i ∈ P_{k+1}.              (3c)

Step 3. Let x_{k+1} := μ_{k+1} x_k + Σ_{i ∈ P_{k+1}} (ν_{k+1})_i y^i.
Step 4. If, for example, z_k(y_k) is close to zero, or if P_{k+1} = P_k, then terminate! Otherwise, let k := k + 1 and go to Step 1.
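
A minimal sketch of one SD iteration in Python, again for X = {y : A_ub y ≤ b_ub}: Step 1 uses scipy's LP solver, and Step 2 solves the restricted master problem over the weight simplex with a general NLP solver (SLSQP). The function names and the choice of solver are illustrative assumptions, not part of the lecture:

    import numpy as np
    from scipy.optimize import linprog, minimize

    def sd_iteration(f, grad, A_ub, b_ub, x, columns):
        # x       : current iterate x_k
        # columns : list of previously generated extreme points y^i
        # Returns the new iterate, the updated column list and z_k(y_k).
        g = grad(x)
        # Step 1: column generation -- a new extreme point of X
        y = linprog(c=g, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).x
        z = g @ (y - x)
        columns = columns + [y]

        # Step 2: restricted master problem over conv({x_k} U columns)
        V = np.vstack([x] + columns)          # rows: x_k, y^1, ..., y^m
        m = V.shape[0]
        cons = [{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}]
        res = minimize(lambda w: f(w @ V), np.full(m, 1.0 / m),
                       method='SLSQP', bounds=[(0.0, 1.0)] * m,
                       constraints=cons)
        # Step 3: the new iterate is the resulting convex combination
        return res.x @ V, columns, z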

Algorithm description, Simplicial decomposition

This basic algorithm keeps all information generated, and adds one new extreme point in every iteration. An alternative is to drop columns (vectors y^i) that have received a zero (or low) weight, or to keep only a maximum number of vectors.

Special case: if the maximum number of vectors kept is 1, we recover the Frank Wolfe algorithm!

We obviously improve on the Frank Wolfe algorithm by utilizing more information. Unfortunately, we can no longer use a simple line search: the RMP is a multidimensional problem over the weights.

Practical simplicial decomposition

In theory, SD will converge after a finite number of iterations, as there are finitely many extreme points. However, the restricted master problem becomes harder to solve when the set P_k is large. Extreme cases: |P_k| = 1 gives Frank Wolfe with a line search, which is easy; if P_k contains all extreme points, the restricted master problem is just the original problem in disguise.

We fix this by, in each iteration, also removing some extreme points from P_k. Practical rules: drop y^i if ν_i = 0; limit the size |P_k| ≤ r. (Again, r = 1 is Frank Wolfe.)

Simplicial decomposition illustration

[Figure, shown over several slides: an example run of SD, starting at x_0 = (1, 1)^T, with P_0 containing the extreme point (2, 0)^T and |P_k| ≤ 2.]

Convergence (SD)

SD does at least as well as the Frank Wolfe algorithm: the line segment [x_k, y_k] is feasible in the RMP.

If x* is unique, then convergence is finite, provided that the RMPs are solved exactly and the maximum number of vectors kept is at least the number needed to span x*.

SD is much more efficient than the Frank Wolfe algorithm in practice (consider the above FW example!). We can solve the RMPs efficiently, since the constraints are simple.

The gradient projection algorithm

The gradient projection algorithm is based on the projection characterization of a stationary point: x* ∈ X is a stationary point if and only if, for any α > 0,

    x* = Proj_X [x* − α ∇f(x*)].

[Figure: the normal cone N_X(x*) at x*, the gradient ∇f(x*), and the projection of x* − α∇f(x*) back onto X.]

Gradient projection algorithms

Let p := Proj_X [x − α ∇f(x)] − x, for any α > 0. Then p is a feasible descent direction of f at x if and only if x is non-stationary.

The gradient projection algorithm is normally stated such that the line search is done over the projection arc, that is, we find a step length α_k for which

    x_{k+1} := Proj_X [x_k − α_k ∇f(x_k)],  k = 1, ...    (4)

has a good objective value. Use the Armijo rule to determine α_k.

Note: gradient projection becomes steepest descent with Armijo line search when X = R^n!
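
A minimal sketch of iteration (4) with Armijo-type backtracking along the projection arc, assuming a projection routine proj(x) onto X is available (the routine name, the initial step and the backtracking constants are assumptions):

    import numpy as np

    def gradient_projection(f, grad, proj, x0, alpha0=1.0, beta=0.5,
                            sigma=1e-4, tol=1e-6, max_iter=200):
        # Backtracking along the projection arc x(alpha) = proj(x - alpha*grad(x))
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            alpha = alpha0
            while True:
                x_new = proj(x - alpha * g)   # point on the projection arc
                # Armijo decrease test; shrink alpha until it holds
                if f(x_new) <= f(x) + sigma * g @ (x_new - x):
                    break
                alpha *= beta
                if alpha < 1e-12:             # safeguard against stalling
                    break
            if np.linalg.norm(x_new - x) <= tol:   # x = Proj_X[x - a*grad(x)]: stationary
                return x_new
            x = x_new
        return x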

Gradient projection algorithms

[Figure: the projection arc from x_k, showing the projections of x_k − ᾱ∇f(x_k), x_k − (ᾱ/2)∇f(x_k) and x_k − (ᾱ/4)∇f(x_k) onto X, as generated by the Armijo rule.]

Gradient projection algorithms

Bottleneck: how can we compute projections? In general, we study the KKT conditions of the system and apply a simplex-like method. If we have a specially structured feasible polyhedron, projections may be easier to compute. Particular case: the unit simplex (the feasible set of the SD subproblems).

Easy projections (Gradient projection)

Example: the feasible set is S = {x ∈ R^n | 0 ≤ x_i ≤ 1, i = 1, ..., n}. Then Proj_S(x) = z, where

    z_i = 0    if x_i < 0,
    z_i = x_i  if 0 ≤ x_i ≤ 1,
    z_i = 1    if 1 < x_i,

for i = 1, ..., n.

Exercise: prove this by applying the variational inequality (or the KKT conditions) to the problem

    minimize over z ∈ S:  (1/2) ‖x − z‖².
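
In code this projection is a single clipping operation; a short sketch (the helper name is illustrative, and the routine can be plugged in as proj in the gradient projection sketch above):

    import numpy as np

    def project_onto_unit_box(x):
        # Projection onto S = {x : 0 <= x_i <= 1}: clip each coordinate.
        return np.clip(x, 0.0, 1.0)

    # Example: project_onto_unit_box(np.array([-0.3, 0.4, 1.7])) -> [0., 0.4, 1.]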

Convergence, I (Gradient projection)

Assume: X ⊆ R^n is nonempty, closed and convex; f is C^1 on X; and for the starting point x_0 ∈ X the level set lev_f(f(x_0)) intersected with X is bounded. In the iteration (4), the step length α_k is given by the Armijo step length rule along the projection arc.

Then: the sequence {x_k} is bounded; every limit point of {x_k} is stationary; {f(x_k)} is descending and lower bounded, hence convergent.

The convergence arguments are similar to those for steepest descent.

Convergence, II (Gradient projection)

Assume: X ⊆ R^n is nonempty, closed and convex; f is C^1 on X and convex; and an optimal solution x* exists. In the iteration (4), the step length α_k is given by the Armijo step length rule along the projection arc.

Then: the sequence {x_k} converges to an optimal solution.

Note: with X = R^n this gives convergence of steepest descent for convex problems that possess optimal solutions!

An illustration of FW vs. SD, I

A large-scale nonlinear network flow problem which is used to estimate traffic flows in cities: a model over the small city of Sioux Falls in South Dakota, USA, with 24 nodes, 76 links, and 528 origin-destination pairs.

Three algorithms for the RMPs were tested: a Newton method and two gradient projection methods (MATLAB implementation). The difference is remarkable.

The Frank Wolfe method suffers from very small steps being taken. Why? Many extreme points are active, that is, many routes are used.

An illustration of FW vs. SD, I

[Figure: maximum relative objective function error vs. CPU time (s, 0 to 100) on the Sioux Falls network, with curves for SD/Grad. proj. 1, SD/Grad. proj. 2, SD/Newton, and Frank Wolfe.]

Figure: The performance of SD vs. FW on the Sioux Falls network.