Shiqian Ma, MAT-258A: Numerical Optimization. Chapter 2: Convex Optimization


2.1. Convex Optimization

General optimization problem:

    min  f_0(x)
    s.t. f_i(x) ≤ b_i,  i = 1, ..., m

- x = (x_1, ..., x_n): optimization variables
- f_0 : R^n → R: objective function
- f_i : R^n → R, i = 1, ..., m: constraint functions

The general optimization problem is very difficult to solve. Methods involve some compromise: long computation time, or no guarantee of finding the global solution.

Exceptions: certain problem classes can be solved efficiently and reliably:
- linear programming problems
- convex optimization problems

We focus on convex optimization problems in this course:
- many tricks exist for transforming problems into convex form
- surprisingly many problems can be solved via convex optimization

Convex optimization

    min  f_0(x)
    s.t. f_i(x) ≤ b_i,  i = 1, ..., m

Objective and constraint functions are convex:

    f_i(λx + (1 − λ)y) ≤ λf_i(x) + (1 − λ)f_i(y)

for all x, y ∈ R^n and λ ∈ [0, 1]. I.e., the objective function is a convex function and the feasible region is a convex set.

Examples: LP, QP, SDP
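The defining inequality can be sanity-checked numerically. A minimal sketch (the helper name, sample count, and tolerance are my own choices, not from the notes): sample random chords and test the inequality along each one. A failed test disproves convexity; passing is only evidence, not a proof.

```python
import numpy as np

def is_convex_on_samples(f, n, trials=1000, seed=0):
    """Test f(λx + (1-λ)y) <= λf(x) + (1-λ)f(y) on random pairs in R^n."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=n), rng.normal(size=n)
        lam = rng.uniform()
        if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + 1e-9:
            return False  # found a chord that lies below the function
    return True

f_convex = lambda x: float(x @ x)               # ||x||_2^2, convex
f_nonconvex = lambda x: float(np.sin(x).sum())  # not convex

print(is_convex_on_samples(f_convex, 3))        # True
print(is_convex_on_samples(f_nonconvex, 3))     # False, with high probability
```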

2.1.1. Convex sets

Affine set
Line through x_1, x_2: all points x = θx_1 + (1 − θ)x_2, θ ∈ R
Affine set: contains the line through any two distinct points in the set
Example: solution set of linear equations {x | Ax = b}

Convex set
Line segment between x_1 and x_2: all points x = θx_1 + (1 − θ)x_2, 0 ≤ θ ≤ 1
Convex set: contains the line segment between any two points in the set:

    x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1 − θ)x_2 ∈ C

Convex hull conv(S): set of all convex combinations of points in S:

    x = θ_1 x_1 + θ_2 x_2 + ... + θ_k x_k,  Σ_{i=1}^k θ_i = 1,  θ_i ≥ 0

Convex cone: set that contains all conic combinations of points in the set.
Conic combination of x_1 and x_2: any point of the form

    x = θ_1 x_1 + θ_2 x_2,  θ_1 ≥ 0, θ_2 ≥ 0

Hyperplane: set of the form {x | a^T x = b} (a ≠ 0)
Halfspace: set of the form {x | a^T x ≤ b} (a ≠ 0); a is the normal vector
Hyperplanes are affine and convex; halfspaces are convex.

Euclidean ball with center x_c and radius r:

    B(x_c, r) = {x | ||x − x_c||_2 ≤ r} = {x_c + ru | ||u||_2 ≤ 1}

Ellipsoid: set of the form {x | (x − x_c)^T P^{-1} (x − x_c) ≤ 1} with P ∈ S^n_{++} (i.e., P symmetric positive definite)
Norm ball with center x_c and radius r: {x | ||x − x_c|| ≤ r}
Norm cone: {(x, t) | ||x|| ≤ t}; the Euclidean norm cone is called the second-order cone
Polyhedron: solution set of finitely many linear inequalities and equalities: Ax ⪯ b, Cx = d
Positive semidefinite cone: S^n denotes the set of symmetric n × n matrices

S^n_+ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n × n matrices

    X ∈ S^n_+  ⟺  z^T X z ≥ 0 for all z

S^n_+ is a convex cone.
S^n_{++} = {X ∈ S^n | X ≻ 0}: positive definite n × n matrices

Operations that preserve convexity
Practical methods for establishing convexity of a set C:
1. apply the definition: x_1, x_2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx_1 + (1 − θ)x_2 ∈ C
2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity:
- intersection
- affine functions
- perspective function
- linear-fractional functions

Affine function
Suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m).
- the image of a convex set under f is convex:

    S ⊆ R^n convex ⟹ f(S) = {f(x) | x ∈ S} convex

- the inverse image f^{-1}(C) of a convex set under f is convex:

    C ⊆ R^m convex ⟹ f^{-1}(C) = {x ∈ R^n | f(x) ∈ C} convex

Examples: scaling, translation, projection; solution set of a linear matrix inequality {x | x_1 A_1 + ... + x_m A_m ⪯ B} (with A_i, B ∈ S^p)
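The reason images of convex sets stay convex is that an affine map commutes with convex combinations. A small numeric sketch of that identity (the specific map below is an arbitrary example of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))       # an arbitrary affine map f(x) = Ax + b
b = rng.normal(size=2)
f = lambda x: A @ x + b

x, y = rng.normal(size=3), rng.normal(size=3)
theta = 0.3

# Affine maps carry segments to segments: f(θx + (1-θ)y) = θf(x) + (1-θ)f(y),
# which is exactly why convex sets map to convex sets.
lhs = f(theta * x + (1 - theta) * y)
rhs = theta * f(x) + (1 - theta) * f(y)
print(np.allclose(lhs, rhs))  # True
```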

Perspective and linear-fractional functions
Perspective function P : R^{n+1} → R^n:

    P(x, t) = x/t,  dom P = {(x, t) | t > 0}

Images and inverse images of convex sets under the perspective function are convex.

Linear-fractional function f : R^n → R^m:

    f(x) = (Ax + b) / (c^T x + d),  dom f = {x | c^T x + d > 0}

Images and inverse images of convex sets under a linear-fractional function are convex.

Basic properties of convexity
Separating hyperplane theorem: if C and D are disjoint convex sets, then there exist a ≠ 0 and b such that

    a^T x ≤ b for x ∈ C,  a^T x ≥ b for x ∈ D

The hyperplane {x | a^T x = b} separates C and D.

Supporting hyperplane to a set C at a boundary point x_0:

    {x | a^T x = a^T x_0}

where a ≠ 0 and a^T x ≤ a^T x_0 for all x ∈ C.

Supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C.

2.1.2. Convex functions

Definition
f : R^n → R is convex if dom f is a convex set and

    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1.
f is concave if −f is convex.
f is strictly convex if dom f is a convex set and

    f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, x ≠ y, 0 < θ < 1.

Examples on R
Convex:
- affine: ax + b on R, for any a, b ∈ R
- exponential: e^{ax}, for any a ∈ R
- powers: x^α on R_{++}, for α ≥ 1 or α ≤ 0
- powers of absolute value: |x|^p on R, for p ≥ 1
- negative entropy: x log x on R_{++}

Concave:
- affine: ax + b on R, for any a, b ∈ R
- powers: x^α on R_{++}, for 0 ≤ α ≤ 1
- logarithm: log x on R_{++}

Examples on R^n and R^{m×n}
Affine functions are convex and concave; all norms are convex.

Examples on R^n:
- affine function f(x) = a^T x + b
- norms: ||x||_p = (Σ_{i=1}^n |x_i|^p)^{1/p} for p ≥ 1; ||x||_∞ = max_k |x_k|

Examples on R^{m×n}:
- affine function f(X) = Tr(A^T X) + b = Σ_{i=1}^m Σ_{j=1}^n A_{ij} X_{ij} + b
- spectral norm f(X) = ||X||_2 = σ_max(X) = (λ_max(X^T X))^{1/2}

Restriction of a convex function to a line
f : R^n → R is convex if and only if the function g : R → R,

    g(t) = f(x + tv),  dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ R^n.
So we can check convexity of f by checking convexity of functions of one variable.

Example: f : S^n → R with f(X) = log det X, dom f = S^n_{++}. Then

    g(t) = log det(X + tV)
         = log det X + log det(I + t X^{-1/2} V X^{-1/2})
         = log det X + Σ_{i=1}^n log(1 + t λ_i)

where λ_i are the eigenvalues of X^{-1/2} V X^{-1/2}.
g is concave in t (for any choice of X ≻ 0 and V); hence f is concave.
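The line-restriction trick is easy to try numerically. A sketch (the matrices, step sizes, and helper below are my own illustrative choices): sample a positive definite X and a symmetric direction V, and check midpoint concavity of g(t) = log det(X + tV) on an interval where X + tV stays in S^n_{++}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.normal(size=(n, n))
X = M @ M.T + n * np.eye(n)        # symmetric positive definite
V = rng.normal(size=(n, n))
V = (V + V.T) / 2                  # symmetric direction

def g(t):
    """log det(X + tV); valid while X + tV stays positive definite."""
    sign, logdet = np.linalg.slogdet(X + t * V)
    assert sign > 0, "left the domain S^n_++"
    return logdet

# Midpoint concavity along the line: g((s+t)/2) >= (g(s) + g(t))/2.
s, t = -0.2, 0.3                   # small steps keep X + tV positive definite
print(g((s + t) / 2) >= (g(s) + g(t)) / 2)  # True
```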

Extended-value extension
The extended-value extension f̃ of f is

    f̃(x) = f(x), x ∈ dom f;  f̃(x) = ∞, x ∉ dom f

This often simplifies notation; for example, the condition

    0 ≤ θ ≤ 1  ⟹  f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)

means the same as the two conditions:
- dom f is convex
- for x, y ∈ dom f, 0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

First-order condition
f is differentiable if dom f is open and the gradient

    ∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_n)

exists at each x ∈ dom f.

First-order condition: a differentiable f with convex domain is convex if and only if

    f(y) ≥ f(x) + ∇f(x)^T (y − x) for all x, y ∈ dom f

i.e., the first-order approximation of f is a global underestimator.
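A quick numeric sketch of the global-underestimator property, using the least-squares objective from later in the chapter (the data and tolerance are my own illustrative choices): the tangent plane at any x should lie below f everywhere.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

f = lambda x: float(np.sum((A @ x - b) ** 2))   # least-squares objective
grad = lambda x: 2 * A.T @ (A @ x - b)          # its gradient

# First-order condition: f(y) >= f(x) + grad(x)'(y - x) for all x, y.
ok = True
for _ in range(100):
    x, y = rng.normal(size=3), rng.normal(size=3)
    ok &= f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print(ok)  # True
```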

Second-order conditions
f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

    ∇²f(x)_{ij} = ∂²f(x) / ∂x_i ∂x_j,  i, j = 1, ..., n,

exists at each x ∈ dom f.

Second-order conditions: for twice differentiable f with convex domain:
- f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
- if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex

Examples
Quadratic function: f(x) = (1/2)x^T P x + q^T x + r (with P ∈ S^n)

    ∇f(x) = Px + q,  ∇²f(x) = P

convex if P ⪰ 0.

Least-squares objective: f(x) = ||Ax − b||_2^2

    ∇f(x) = 2A^T(Ax − b),  ∇²f(x) = 2A^T A

convex (for any A).

Quadratic-over-linear: f(x, y) = x²/y

    ∇²f(x, y) = (2/y³) (y, −x)(y, −x)^T ⪰ 0

convex for y > 0.

Log-sum-exp: f(x) = log Σ_{k=1}^n exp x_k is convex:

    ∇²f(x) = (1/(1^T z)) diag(z) − (1/(1^T z)²) zz^T    (z_k = exp x_k)

To show ∇²f(x) ⪰ 0, we must verify that v^T ∇²f(x) v ≥ 0 for all v:

    v^T ∇²f(x) v = [ (Σ_k z_k v_k²)(Σ_k z_k) − (Σ_k v_k z_k)² ] / (Σ_k z_k)² ≥ 0

since (Σ_k v_k z_k)² ≤ (Σ_k z_k v_k²)(Σ_k z_k) (from the Cauchy-Schwarz inequality).

Geometric mean: f(x) = (Π_{k=1}^n x_k)^{1/n} on R^n_{++} is concave (similar proof as for log-sum-exp).
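The Hessian formula above can be checked numerically: build it at a random point and confirm its smallest eigenvalue is nonnegative (the helper name and test point are my own; note the Hessian has a zero eigenvalue along the all-ones direction, so a small tolerance is needed).

```python
import numpy as np

def lse_hessian(x):
    """Hessian of log-sum-exp at x: diag(z)/1'z - zz'/(1'z)^2, z = exp(x)."""
    z = np.exp(x)
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s**2

rng = np.random.default_rng(3)
x = rng.normal(size=5)
eigs = np.linalg.eigvalsh(lse_hessian(x))
print(eigs.min() >= -1e-12)  # True: the Hessian is positive semidefinite
```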

Epigraph and sublevel set
α-sublevel set of f : R^n → R:

    C_α = {x ∈ dom f | f(x) ≤ α}

Sublevel sets of convex functions are convex (the converse is false).

Epigraph of f : R^n → R:

    epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

f is convex if and only if epi f is a convex set.

Jensen's inequality
If f is convex, then for any random variable z,

    f(E z) ≤ E f(z)
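A Monte Carlo sketch of Jensen's inequality (the distribution and sample size are my own illustrative choices): compare sample estimates of both sides for the convex function f = exp.

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(loc=1.0, scale=2.0, size=100_000)  # samples of a random variable

f = np.exp  # convex on R

# Jensen: f(E z) <= E f(z); here the gap is large (e^1 vs roughly e^3).
lhs = f(z.mean())
rhs = f(z).mean()
print(lhs <= rhs)  # True
```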

Operations that preserve convexity
Practical methods for establishing convexity of a function:
1. verify the definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇²f(x) ⪰ 0
3. show that f is obtained from simple convex functions by operations that preserve convexity:
- nonnegative weighted sum
- composition with affine function
- pointwise maximum and supremum
- composition
- minimization
- perspective

Composition with affine function
f(Ax + b) is convex if f is convex.

Examples:
- log barrier for linear inequalities:

    f(x) = −Σ_{i=1}^m log(b_i − a_i^T x),  dom f = {x | a_i^T x < b_i, i = 1, ..., m}

- (any) norm of an affine function: f(x) = ||Ax + b||

Pointwise maximum
If f_1, ..., f_m are convex, then f(x) = max{f_1(x), ..., f_m(x)} is convex.

Examples:
- piecewise-linear function: f(x) = max_{i=1,...,m} (a_i^T x + b_i) is convex
- sum of the r largest components of x ∈ R^n:

    f(x) = x_[1] + x_[2] + ... + x_[r]

is convex (x_[i] is the i-th largest component of x).
Proof: f(x) = max{x_{i_1} + x_{i_2} + ... + x_{i_r} | 1 ≤ i_1 < i_2 < ... < i_r ≤ n}
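The sum-of-r-largest example is easy to implement and test against the convexity definition (the random-chord test below is my own sketch, with r = 3 in R^6):

```python
import numpy as np

def sum_r_largest(x, r):
    """Sum of the r largest components: a pointwise max of linear functions."""
    return np.sort(x)[-r:].sum()

rng = np.random.default_rng(5)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=6), rng.normal(size=6)
    th = rng.uniform()
    ok &= (sum_r_largest(th * x + (1 - th) * y, 3)
           <= th * sum_r_largest(x, 3) + (1 - th) * sum_r_largest(y, 3) + 1e-9)
print(ok)  # True
```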

Pointwise supremum
If f(x, y) is convex in x for each y ∈ A, then

    g(x) = sup_{y ∈ A} f(x, y)

is convex.

Examples:
- support function of a set C: S_C(x) = sup_{y ∈ C} y^T x is convex
- distance to the farthest point in a set C: f(x) = sup_{y ∈ C} ||x − y||
- maximum eigenvalue of a symmetric matrix: for X ∈ S^n,

    λ_max(X) = sup_{||y||_2 = 1} y^T X y

Composition with scalar functions
Composition of g : R^n → R and h : R → R: f(x) = h(g(x))
- f is convex if g is convex, h is convex, and h is nondecreasing
- f is convex if g is concave, h is convex, and h is nonincreasing

Proof (for n = 1, differentiable g, h):

    f''(x) = h''(g(x)) g'(x)² + h'(g(x)) g''(x)

Examples:
- exp g(x) is convex if g is convex
- 1/g(x) is convex if g is concave and positive

Vector composition
Composition of g : R^n → R^k and h : R^k → R:

    f(x) = h(g(x)) = h(g_1(x), g_2(x), ..., g_k(x))

- f is convex if the g_i are convex, h is convex, and h is nondecreasing in each argument
- f is convex if the g_i are concave, h is convex, and h is nonincreasing in each argument

Proof (for n = 1, differentiable g, h):

    f''(x) = g'(x)^T ∇²h(g(x)) g'(x) + ∇h(g(x))^T g''(x)

Examples:
- Σ_{i=1}^m log g_i(x) is concave if the g_i are concave and positive
- log Σ_{i=1}^m exp g_i(x) is convex if the g_i are convex

Minimization
If f(x, y) is convex in (x, y) and C is a convex set, then

    g(x) = inf_{y ∈ C} f(x, y)

is convex.

Examples:
- f(x, y) = x^T A x + 2x^T B y + y^T C y with

    ( A    B )
    ( B^T  C ) ⪰ 0,  C ≻ 0

Minimizing over y gives g(x) = inf_y f(x, y) = x^T (A − B C^{-1} B^T) x.
g is convex because the Schur complement satisfies A − B C^{-1} B^T ⪰ 0.
- distance to a set: dist(x, S) = inf_{y ∈ S} ||x − y|| is convex if S is convex
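The Schur-complement example can be verified directly: minimize over y in closed form (y* = −C^{-1}B^T x) and compare with x^T(A − BC^{-1}B^T)x. The block sizes and random construction below are my own illustrative choices; building the block matrix as RR^T + 0.1I guarantees the positivity assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
R = rng.normal(size=(5, 5))
M = R @ R.T + 0.1 * np.eye(5)      # positive definite block matrix [A B; B^T C]
A, B, C = M[:3, :3], M[:3, 3:], M[3:, 3:]

f = lambda x, y: x @ A @ x + 2 * x @ B @ y + y @ C @ y

x = rng.normal(size=3)
y_star = -np.linalg.solve(C, B.T @ x)   # minimizer over y (set gradient to 0)
g_min = f(x, y_star)

# Partial minimization leaves the Schur complement quadratic.
S = A - B @ np.linalg.solve(C, B.T)
print(np.isclose(g_min, x @ S @ x))     # True
print(np.linalg.eigvalsh(S).min() > 0)  # True: S ⪰ 0 here, so g is convex
```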

Perspective
The perspective of a function f : R^n → R is the function g : R^n × R → R,

    g(x, t) = t f(x/t),  dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex.

Examples:
- f(x) = x^T x is convex; hence g(x, t) = x^T x / t is convex for t > 0
- the negative logarithm f(x) = −log x is convex; hence the relative entropy g(x, t) = t log t − t log x is convex on R²_{++}
- if f is convex, then

    g(x) = (c^T x + d) f((Ax + b)/(c^T x + d))

is convex on {x | c^T x + d > 0, (Ax + b)/(c^T x + d) ∈ dom f}

The conjugate function
The conjugate of a function f is

    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

f* is convex (even if f is not).

Examples:
- negative logarithm f(x) = −log x:

    f*(y) = sup_{x>0} (xy + log x) = −1 − log(−y) if y < 0;  ∞ otherwise

- strictly convex quadratic f(x) = (1/2)x^T Q x with Q ∈ S^n_{++}:

    f*(y) = sup_x (y^T x − (1/2)x^T Q x) = (1/2) y^T Q^{-1} y
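The quadratic example can be checked numerically (the matrix and test point below are my own): the sup is attained where the gradient y − Qx vanishes, i.e. at x* = Q^{-1}y, and plugging in should reproduce the closed form (1/2)y^T Q^{-1} y.

```python
import numpy as np

rng = np.random.default_rng(7)
W = rng.normal(size=(3, 3))
Q = W @ W.T + np.eye(3)             # Q in S^n_++

f = lambda x: 0.5 * x @ Q @ x
y = rng.normal(size=3)

# The sup in f*(y) = sup_x (y'x - f(x)) is attained where y - Qx = 0.
x_star = np.linalg.solve(Q, y)
conj_by_maximization = y @ x_star - f(x_star)
conj_closed_form = 0.5 * y @ np.linalg.solve(Q, y)
print(np.isclose(conj_by_maximization, conj_closed_form))  # True
```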

2.1.3. Convex optimization problem
Standard form convex optimization problem:

    min  f_0(x)
    s.t. f_i(x) ≤ 0,  i = 1, ..., m
         a_i^T x = b_i,  i = 1, ..., p

f_0, f_1, ..., f_m are convex; the equality constraints are affine.
Often written as

    min  f_0(x)
    s.t. f_i(x) ≤ 0,  i = 1, ..., m
         Ax = b

The feasible set of a convex optimization problem is convex.

Local and global optima
Any locally optimal point of a convex problem is (globally) optimal.

Proof: suppose x is locally optimal and y is optimal with f_0(y) < f_0(x).
x locally optimal means there is an R > 0 such that

    z feasible, ||z − x||_2 ≤ R  ⟹  f_0(z) ≥ f_0(x)

Consider z = θy + (1 − θ)x with θ = R / (2||y − x||_2):
- ||y − x||_2 > R (since f_0(y) < f_0(x), y must lie outside the ball), so 0 < θ < 1/2
- z is a convex combination of two feasible points, hence also feasible
- ||z − x||_2 = R/2, and

    f_0(z) ≤ θf_0(y) + (1 − θ)f_0(x) < f_0(x)

which contradicts our assumption that x is locally optimal.

Optimality conditions for differentiable f_0
x is optimal if and only if it is feasible and

    ∇f_0(x)^T (y − x) ≥ 0 for all feasible y

If nonzero, ∇f_0(x) defines a supporting hyperplane to the feasible set X at x.

Unconstrained problem: x is optimal if and only if

    x ∈ dom f_0,  ∇f_0(x) = 0

Lagrangian
Standard form problem (not necessarily convex):

    min  f_0(x)
    s.t. f_i(x) ≤ 0,  i = 1, ..., m
         h_i(x) = 0,  i = 1, ..., p

variable x ∈ R^n, domain D, optimal value p*.

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p:

    L(x, λ, ν) = f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x)

Lagrange dual function: g : R^m × R^p → R:

    g(λ, ν) = inf_{x ∈ D} L(x, λ, ν) = inf_{x ∈ D} ( f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) )

g is concave, and can be −∞ for some λ, ν.

Lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*.
Proof: if x̃ is feasible and λ ⪰ 0, then

    f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν)

Minimizing over all feasible x̃ gives p* ≥ g(λ, ν).

Lagrange dual and conjugate function
For the problem

    min  f_0(x)
    s.t. Ax ⪯ b,  Cx = d

the dual function is

    g(λ, ν) = inf_{x ∈ dom f_0} ( f_0(x) + (A^T λ + C^T ν)^T x − b^T λ − d^T ν )
            = −f_0*(−A^T λ − C^T ν) − b^T λ − d^T ν

The dual problem
The Lagrange dual problem

    max  g(λ, ν)
    s.t. λ ⪰ 0

finds the best lower bound on p*. It is a convex optimization problem; its optimal value is denoted d*.

Weak and strong duality
Weak duality: d* ≤ p*
- always holds (for convex and nonconvex problems)
- can be used to find nontrivial lower bounds for difficult problems

Strong duality: d* = p*
- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater's constraint qualification
Strong duality holds for a convex problem

    min  f_0(x)
    s.t. f_i(x) ≤ 0,  i = 1, ..., m
         Ax = b

if it is strictly feasible, i.e.,

    ∃ x ∈ int D : f_i(x) < 0, i = 1, ..., m,  Ax = b

Slater's condition also guarantees that the dual optimum is attained (if p* > −∞). There exist many other types of constraint qualifications.

Karush-Kuhn-Tucker (KKT) conditions
The following four conditions are called the KKT conditions (for a problem with differentiable f_i, h_i):
1. primal constraints: f_i(x) ≤ 0, i = 1, ..., m; h_i(x) = 0, i = 1, ..., p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1, ..., m
4. the gradient of the Lagrangian with respect to x vanishes:

    ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + Σ_{i=1}^p ν_i ∇h_i(x) = 0

KKT conditions for convex problems
If x̃, λ̃, ν̃ satisfy the KKT conditions for a convex problem, then they are optimal:
- from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
- from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)
- hence f_0(x̃) = g(λ̃, ν̃)

If Slater's condition is satisfied: x is optimal if and only if there exist λ, ν that satisfy the KKT conditions:
- recall that Slater implies strong duality, and the dual optimum is attained
- this generalizes the optimality condition ∇f_0(x) = 0 for an unconstrained problem
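For an equality-constrained convex quadratic program the KKT conditions reduce to one linear system, which makes them easy to exercise in code. A sketch (problem data is randomly generated for illustration; with P positive definite and A full row rank, the KKT matrix is nonsingular):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 4, 2
W = rng.normal(size=(n, n))
P = W @ W.T + np.eye(n)            # P in S^n_++
q = rng.normal(size=n)
A = rng.normal(size=(p, n))
b = rng.normal(size=p)

# For min (1/2)x'Px + q'x s.t. Ax = b, the KKT conditions are the linear
# system: Px + q + A'ν = 0 (stationarity) and Ax = b (primal feasibility).
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x, nu = sol[:n], sol[n:]

print(np.allclose(A @ x, b))                 # primal feasibility holds
print(np.allclose(P @ x + q + A.T @ nu, 0))  # stationarity holds
```

With no inequality constraints there is no λ, so dual feasibility and complementary slackness are vacuous; the two checks above are the entire KKT system.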