Algorithms for convex optimization


Michal Kočvara
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, and Czech Technical University
kocvara@utia.cas.cz
http://www.utia.cas.cz/kocvara

Convex programs

A general problem of mathematical programming:

$$\min f(x) \quad \text{subject to} \quad x \in X \subseteq \mathbb{R}^n \tag{MP}$$

where $n$ is the design dimension of the problem, $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function, and $X \subseteq \mathbb{R}^n$ is the feasible domain of the problem.

Assume that the function $f$ and the feasible domain $X$ are convex, and that $X$ is defined by convex functions $g_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$:

$$X := \{ x \in \mathbb{R}^n \mid g_i(x) \le 0,\ i = 1, \dots, m \}.$$

Convex programs

A mathematical program satisfying the above assumptions is called a convex mathematical program:

$$\min f(x) \quad \text{subject to} \quad g_i(x) \le 0,\ i = 1, \dots, m. \tag{CP}$$

Why are we interested in this class of optimization problems? Because:

(i) convex programs are computationally tractable: there exist numerical methods which efficiently solve every convex program satisfying a mild additional assumption;

(ii) in contrast, no efficient universal methods for nonconvex mathematical programs are known.

Convex functions

A function $h : \mathbb{R}^n \to \mathbb{R}$ is convex when

$$\forall x, x', \ \forall \lambda \in [0,1] : \quad h(\lambda x + (1-\lambda) x') \le \lambda h(x) + (1-\lambda) h(x').$$

[Figure: a convex function (a) and a nonconvex function (b) in $\mathbb{R}^1$.]
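As a quick sanity check of the definition (a worked example, not on the slide): for $h(x) = x^2$ on $\mathbb{R}$,

$$\lambda x^2 + (1-\lambda) x'^2 - \big(\lambda x + (1-\lambda) x'\big)^2 = \lambda (1-\lambda) (x - x')^2 \ge 0,$$

so the defining inequality holds for every $x, x'$ and $\lambda \in [0,1]$, and $h$ is convex.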

Convex feasible region

[Figure: a convex feasible region described by convex functions $g_i$.]

Method of Newton

The simplest convex program: the problem without constraints and with a strongly convex objective,

$$\min f(x). \tag{UCP}$$

Strongly convex means that the Hessian matrix $\nabla^2 f(x)$ is positive definite at every point $x$ and that $f(x) \to \infty$ as $\|x\|_2 \to \infty$.

Method of choice: Newton's method. The idea: at every iterate, we approximate the function $f$ by a quadratic function, its second-order Taylor expansion

$$f(x) + (y - x)^T \nabla f(x) + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x).$$

The next iterate is obtained by minimizing this quadratic function.
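Filling in the step the slide leaves implicit: setting the gradient of the quadratic model with respect to $y$ to zero gives

$$\nabla f(x) + \nabla^2 f(x)(y - x) = 0 \quad \Longrightarrow \quad y = x - \big[\nabla^2 f(x)\big]^{-1} \nabla f(x),$$

which is exactly the Newton iterate described on the next slide.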

Method of Newton

Given a current iterate $x$, compute the gradient $\nabla f(x)$ and Hessian $H(x)$ of $f$ at $x$. Compute the direction vector

$$d = -H(x)^{-1} \nabla f(x).$$

Compute the new iterate

$$x_{\text{new}} = x + d.$$

The method is

extremely fast whenever we are close enough to the solution $x^*$. Theory: the method is locally quadratically convergent, i.e.,
$$\|x_{\text{new}} - x^*\|_2 \le c\, \|x - x^*\|_2^2,$$
provided that $\|x - x^*\|_2 \le r$, where $r$ is small enough;

rather slow when we are not close enough to $x^*$.
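A minimal sketch of this iteration in code (not from the slides; the helper name, the stopping rule based on the step length, and the toy quadratic are illustrative choices):

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-8, max_iter=50):
    """Plain Newton's method for unconstrained minimization.

    grad, hess: callables returning the gradient vector and Hessian matrix of f.
    No line search or globalization, so, as the slide warns, it is only
    reliable close enough to the minimizer."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -np.linalg.solve(hess(x), grad(x))  # d = -H(x)^{-1} grad f(x)
        x = x + d                               # x_new = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

# Toy example: f(x) = (x1 - 1)^2 + 10 (x2 + 2)^2, minimized at (1, -2).
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess = lambda x: np.diag([2.0, 20.0])
print(newton(grad, hess, x0=[5.0, 5.0]))        # prints approximately [ 1. -2.]
```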

Interior-point methods

Transform the difficult constrained problem into an easy unconstrained problem, or into a sequence of unconstrained problems. Once we have an unconstrained problem, we can solve it by Newton's method.

The idea is to use a barrier function that sets a barrier against leaving the feasible region. If the optimal solution occurs at the boundary of the feasible region, the procedure approaches the boundary from the interior, hence the name interior-point methods.

The barrier function approach was first proposed in the early sixties and later popularised and thoroughly investigated by Fiacco and McCormick.

Classic approach

$$\min f(x) \quad \text{subject to} \quad g_i(x) \le 0,\ i = 1, \dots, m. \tag{CP}$$

We introduce a barrier function $B$ that is nonnegative and continuous over the region $\{x \mid g_i(x) < 0\}$ and approaches infinity as the boundary of the region $\{x \mid g_i(x) \le 0\}$ is approached from the interior.

(CP) $\Longrightarrow$ a one-parametric family of functions generated by the objective and the barrier,

$$\Phi(\mu; x) := f(x) + \mu B(x),$$

and the corresponding unconstrained convex programs

$$\min_x \Phi(\mu; x);$$

here the penalty parameter $\mu$ is assumed to be nonnegative.

Classic approach

The idea behind barrier methods is now obvious: we start with some $\mu$ (say $\mu = 1$) and solve the unconstrained auxiliary problem. Then we decrease $\mu$ by some factor, solve the auxiliary problem again, and so on.

The auxiliary problem has a unique solution $x(\mu)$ for any $\mu > 0$. The central path, defined by the solutions $x(\mu)$, $\mu > 0$, is a smooth curve, and its limit points (for $\mu \to 0$) belong to the set of optimal solutions of (CP).
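A minimal sketch of this outer loop (not from the slides: the function name is made up, and SciPy's derivative-free Nelder-Mead solver stands in for Newton's method purely to keep the sketch short; the barrier callable is expected to return np.inf outside the strictly feasible region):

```python
import numpy as np
from scipy.optimize import minimize

def classic_barrier_method(f, barrier, x0, mu0=1.0, shrink=0.5, n_outer=20):
    """Classic barrier method: repeatedly minimize Phi(mu; x) = f(x) + mu*B(x),
    decreasing mu by a fixed factor and warm-starting each solve at the previous
    solution, i.e. following the points x(mu) of the central path."""
    x, mu = np.asarray(x0, dtype=float), mu0
    for _ in range(n_outer):
        phi = lambda y, mu=mu: f(y) + mu * barrier(y)  # auxiliary problem
        x = minimize(phi, x, method="Nelder-Mead").x   # stand-in for Newton's method
        mu *= shrink                                   # next target point x(mu)
    return x
```

The one-dimensional example on the next slide plugs in directly, with f = lambda x: x[0] and a barrier returning -np.log(x[0]) for x[0] > 0 and np.inf otherwise.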

Example

One of the most popular barrier functions is (Frisch's) logarithmic barrier function,

$$B(x) = -\sum_{i=1}^{m} \log(-g_i(x)).$$

Consider a one-dimensional (CP):

$$\min x \quad \text{subject to} \quad x \ge 0.$$

Auxiliary problem:

$$\Phi(\mu; x) := x + \mu B(x).$$
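Working this example out (not done on the slide): the single constraint is $g_1(x) = -x \le 0$, so $B(x) = -\log x$ and

$$\Phi(\mu; x) = x - \mu \log x, \qquad \frac{\partial \Phi}{\partial x} = 1 - \frac{\mu}{x} = 0 \quad \Longrightarrow \quad x(\mu) = \mu.$$

The central path is therefore $x(\mu) = \mu$, which tends to the optimal solution $x^* = 0$ as $\mu \to 0$.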

Example

[Figure: the barrier function (a) and the function $\Phi$ (b) for $\mu = 1$ and $\mu = 0.3$.]

Example

The auxiliary function for

$$\min\ (x_1 + x_2) \quad \text{s.t.} \quad x_1 + 2x_2 - 2 \ge 0,\ \ x_1 \ge 0,\ \ x_2 \ge 0$$

is

$$\Phi = x_1 + x_2 - \mu \big[ \log(x_1 + 2x_2 - 2) + \log(x_1) + \log(x_2) \big].$$

Example

[Figure: level lines of the function $\Phi$ for $\mu = 1, 0.5, 0.25, 0.125$.]

Central path

[Figure: the central path through the points $x(\mu)$ for $\mu = 1, 0.5, 0.25, 0.125$.]

The algorithm

At the $i$-th step we are at a point $x(\mu_i)$ of the central path. We

decrease $\mu_i$ a bit, thus getting a new target point $x(\mu_{i+1})$ on the path;

approach the new target point $x(\mu_{i+1})$ by running Newton's method started at our current iterate $x_i$.

Hope: $x(\mu_i)$ lies in the region of quadratic convergence of Newton's method applied to the new target $x(\mu_{i+1})$. Hence, following the central path, Newton's method is always efficient (always in the region of quadratic convergence).

Theory and practice

THEORY: Under some mild technical assumptions, the method converges to the solution of (CP): $x(\mu) \to x^*$ as $\mu \to 0$.

PRACTICE: Disappointing; the method may run into serious numerical difficulties.

The idea of staying on the central path, and thus solving the auxiliary problems exactly, is too restrictive.

The idea of staying all the time in the region of quadratic convergence of Newton's method may lead to extremely short steps. How do we guarantee in practice that we are close enough? The theory does not give any quantitative results.

If we take longer steps (i.e., decrease $\mu$ more rapidly), then Newton's method may become inefficient and we may even leave the feasible region.

Modern approach

Brief history

The classic barrier methods date from the 60s and 70s. Due to their disadvantages, practitioners soon lost interest in them.

Linear programming: before 1984, linear programs were solved exclusively by the simplex method (Dantzig, 1947):
good practical performance;
bad theoretical behaviour (there are examples with exponential behaviour).
Hence the search for a polynomial-time method.

1979, Khachiyan: ellipsoid method; polynomial-time (theoretically), but with bad practical performance.

1984, Karmarkar: polynomial-time method for LP, reported to be 50 times faster than the simplex method.

1986, Gill et al.: Karmarkar's method = a classic barrier method.

Asymptotic vs. Complexity analysis

Asymptotic analysis

The classic analysis of the Newton and barrier methods: it uses terms like "sufficiently close", "sufficiently small", "close enough", "asymptotic quadratic convergence".

It does not give any quantitative estimates such as:
how close is "sufficiently close" in terms of the problem data?
how much time (how many operations) do we need to reach our goal?

Asymptotic vs. Complexity analysis

Complexity analysis

Answers the question: given an instance of a generic problem and a desired accuracy, how many arithmetic operations do we need to get a solution?

The classic theory for barrier methods does not give a single piece of information in this respect. Moreover, from the complexity viewpoint, Newton's method has no advantage over first-order algorithms (there is no such phenomenon as "local quadratic convergence"), and constrained problems have the same complexity as unconstrained ones.

Modern approach (Nesterov-Nemirovski)

It appears that all the problems of the classic barrier methods come from the fact that we have too much freedom in the choice of the barrier function $B$. Nesterov and Nemirovski (SIAM, 1994):

1. There is a class of good (self-concordant) barrier functions. Every barrier function $B$ of this type is associated with a real parameter $\theta(B) > 1$.

2. If $B$ is self-concordant, one can specify the notion of closeness to the central path and the policy of updating the penalty parameter $\mu$ in the following way: if an iterate $x_i$ is close (in the above sense) to $x(\mu_i)$ and we update the parameter $\mu_i$ to $\mu_{i+1}$, then in a single Newton step we get a new iterate $x_{i+1}$ which is close to $x(\mu_{i+1})$. In other words, after every update of $\mu$ we need to perform only one Newton step to stay close to the central path. Moreover, points close to the central path belong to the interior of the feasible region.
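For reference, the standard definition behind these statements (it is not spelled out on the slide, so take it as background): a convex, three times differentiable function $B$ on the interior of the feasible region, tending to infinity at its boundary, is a self-concordant barrier with parameter $\theta$ if for all $x$ in its domain and all directions $h$

$$\big| D^3 B(x)[h,h,h] \big| \le 2 \big( D^2 B(x)[h,h] \big)^{3/2}, \qquad \big| D B(x)[h] \big| \le \sqrt{\theta}\, \big( D^2 B(x)[h,h] \big)^{1/2}.$$

In particular, the logarithmic barrier for $m$ linear inequalities satisfies these conditions with $\theta(B) = m$.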

Modern approach (Nesterov-Nemirovski)

3. The penalty updating policy can be defined in terms of the problem data:

$$\frac{1}{\mu_{i+1}} = \left( 1 + \frac{0.1}{\sqrt{\theta(B)}} \right) \frac{1}{\mu_i};$$

this shows, in particular, that the reduction factor is independent of the size of $\mu$: the penalty parameter decreases linearly.

4. Assume that we are close to the central path (this can be achieved by a certain initialization step). Then every $O(\sqrt{\theta(B)})$ steps of the algorithm improve the quality of the generated approximate solutions by an absolute constant factor. In particular, we need at most

$$O(1)\, \sqrt{\theta(B)}\, \log\!\left( 1 + \frac{\mu_0\, \theta(B)}{\varepsilon} \right)$$

Newton steps to generate a strictly feasible $\varepsilon$-solution to (CP).
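As a concrete instance (not on the slide; it uses the standard fact that the logarithmic barrier for $m$ linear inequalities has $\theta(B) = m$): for a linear program with $m$ inequality constraints the update rule reads

$$\frac{1}{\mu_{i+1}} = \left( 1 + \frac{0.1}{\sqrt{m}} \right) \frac{1}{\mu_i},$$

so roughly $10 \sqrt{m}\, \ln 2 \approx 7\sqrt{m}$ updates (one Newton step each) halve $\mu$, and the overall bound becomes $O\big(\sqrt{m}\, \log(1 + \mu_0 m / \varepsilon)\big)$ Newton steps.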

Modern approach (Nesterov-Nemirovski)

Staying close to the central path in a single Newton step:

[Figure: one Newton step takes the iterate $x_i$, close to $x(\mu_i)$, to $x_{i+1}$, close to $x(\mu_{i+1})$, along the central path.]

The class of self-concordant functions is sufficiently large and contains many popular barriers, in particular the logarithmic barrier function.
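To tie the pieces together, here is a minimal sketch of the short-step scheme for a linear program with the logarithmic barrier (this is not the authors' code: the function name, the step-halving safeguard, the iteration count, and the starting point are illustrative choices; the $\mu$-update is the one given above, with $\theta(B) = m$):

```python
import numpy as np

def short_step_lp(c, A, b, x0, mu0=1.0, n_steps=300):
    """Short-step path following for the LP  min c^T x  s.t.  A x >= b  (m rows),
    with the logarithmic barrier B(x) = -sum_i log(a_i^T x - b_i), theta(B) = m.
    After every update of mu exactly one Newton step on Phi(mu; x) = c^T x + mu*B(x)
    is taken; the step is halved if it would leave the strictly feasible region."""
    x, mu = np.asarray(x0, dtype=float), mu0
    m = A.shape[0]
    shrink = 1.0 / (1.0 + 0.1 / np.sqrt(m))         # 1/mu_{i+1} = (1 + 0.1/sqrt(m)) / mu_i
    for _ in range(n_steps):
        mu *= shrink
        s = A @ x - b                               # positive slacks
        g = c - mu * (A.T @ (1.0 / s))              # gradient of Phi(mu; .)
        H = mu * (A.T @ np.diag(1.0 / s**2) @ A)    # Hessian of Phi(mu; .)
        d = -np.linalg.solve(H, g)                  # the single Newton step
        t = 1.0
        while np.any(A @ (x + t * d) - b <= 0):     # safeguard: stay strictly feasible
            t *= 0.5
        x = x + t * d
    return x, mu

# The linear program from the earlier example: min x1 + x2, x1 + 2x2 >= 2, x1, x2 >= 0.
c = np.array([1.0, 1.0])
A = np.array([[1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 0.0, 0.0])
x, mu = short_step_lp(c, A, b, x0=[2.0, 2.0])
print(x, mu)   # x ends up near the optimal vertex (0, 1) once mu is small
```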