Shiqian Ma, MAT-258A: Numerical Optimization

Chapter 2: Convex Optimization
2.1. Convex Optimization

General optimization problem:

    min  f_0(x)
    s.t. f_i(x) ≤ b_i,  i = 1,...,m

- x = (x_1,...,x_n): optimization variables
- f_0 : R^n → R: objective function
- f_i : R^n → R, i = 1,...,m: constraint functions
General optimization problem
- very difficult to solve in general
- methods involve some compromise: long computation times, or no guarantee of finding the global solution
- exceptions: certain problem classes can be solved efficiently and reliably
  - linear programming problems
  - convex optimization problems
- we focus on convex optimization problems in this course
  - many tricks for transforming problems into convex form
  - surprisingly many problems can be solved via convex optimization
Convex optimization

    min  f_0(x)
    s.t. f_i(x) ≤ b_i,  i = 1,...,m

- objective and constraint functions are convex:
      f_i(θx + (1-θ)y) ≤ θ f_i(x) + (1-θ) f_i(y)
  for all x, y ∈ R^n and θ ∈ [0, 1]
- i.e., the objective is a convex function and the feasible region is a convex set
- examples: LP, QP, SDP
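As an illustration (not part of the slides), the defining inequality can be checked numerically by sampling points and values of θ; a minimal sketch, with a hypothetical helper `satisfies_convexity_ineq`, applied to the convex f(x) = x² and the nonconvex f(x) = sin x:

```python
import math
import random

def satisfies_convexity_ineq(f, x, y, theta, tol=1e-12):
    """Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)."""
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return lhs <= rhs + tol

random.seed(0)
trials = [(random.uniform(-5, 5), random.uniform(-5, 5), random.random())
          for _ in range(1000)]

convex_f = lambda u: u * u          # convex on R: inequality holds everywhere
assert all(satisfies_convexity_ineq(convex_f, x, y, t) for x, y, t in trials)

nonconvex_f = math.sin              # not convex: inequality fails for some triple
assert not all(satisfies_convexity_ineq(nonconvex_f, x, y, t) for x, y, t in trials)
```

Such sampling can only refute convexity, never prove it; the slides' analytic conditions below are the real tool.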
2.1.1. Convex sets

Affine set
- line through x_1, x_2: all points x = θx_1 + (1-θ)x_2 (θ ∈ R)
- affine set: contains the line through any two distinct points in the set
- example: solution set of linear equations {x | Ax = b}
Convex set
- line segment between x_1 and x_2: all points x = θx_1 + (1-θ)x_2, 0 ≤ θ ≤ 1
- convex set: contains the line segment between any two points in the set:
      x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1-θ)x_2 ∈ C
- convex hull conv(S): set of all convex combinations of points in S:
      x = θ_1 x_1 + θ_2 x_2 + ... + θ_k x_k,  with ∑_{i=1}^k θ_i = 1, θ_i ≥ 0
- convex cone: set that contains all conic combinations of points in the set
  - conic combination of x_1 and x_2: any point of the form
        x = θ_1 x_1 + θ_2 x_2,  θ_1 ≥ 0, θ_2 ≥ 0
- hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)
- halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0)
  - a is the normal vector
  - hyperplanes are affine and convex; halfspaces are convex
- Euclidean ball with center x_c and radius r:
      B(x_c, r) = {x | ‖x - x_c‖_2 ≤ r} = {x_c + ru | ‖u‖_2 ≤ 1}
- ellipsoid: set of the form {x | (x - x_c)ᵀ P⁻¹ (x - x_c) ≤ 1} with P ∈ S^n_++ (i.e., P symmetric positive definite)
- norm ball with center x_c and radius r: {x | ‖x - x_c‖ ≤ r}
- norm cone: {(x, t) | ‖x‖ ≤ t}
  - the Euclidean norm cone is called the second-order cone
- polyhedron: solution set of finitely many linear inequalities and equalities
      Ax ⪯ b,  Cx = d
- positive semidefinite cone: S^n denotes the set of symmetric n×n matrices
- S^n_+ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n×n matrices
      X ∈ S^n_+  ⟺  zᵀXz ≥ 0 for all z ∈ R^n
- S^n_+ is a convex cone
- S^n_++ = {X ∈ S^n | X ≻ 0}: positive definite n×n matrices
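To make the PSD-cone definition concrete, here is a small NumPy sketch (assuming NumPy is available; the helper name `is_psd` is illustrative) that tests membership in S^n_+ via eigenvalues and checks the cone property on an example:

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """X in S^n_+  iff  X is symmetric and all eigenvalues are >= 0."""
    if not np.allclose(X, X.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(X) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3 -> PSD
B = np.array([[1.0, 2.0], [2.0, 1.0]])     # eigenvalues -1 and 3 -> not PSD
assert is_psd(A)
assert not is_psd(B)

# cone property: nonnegative combinations of PSD matrices stay PSD
assert is_psd(2.0 * A + 0.7 * np.eye(2))
```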
Operations that preserve convexity

practical methods for establishing convexity of a set C:
1. apply the definition
       x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1-θ)x_2 ∈ C
2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity:
   - intersection
   - affine functions
   - perspective function
   - linear-fractional functions
Affine function
- suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)
- the image of a convex set under f is convex:
      S ⊆ R^n convex  ⟹  f(S) = {f(x) | x ∈ S} convex
- the inverse image f⁻¹(C) of a convex set under f is convex:
      C ⊆ R^m convex  ⟹  f⁻¹(C) = {x ∈ R^n | f(x) ∈ C} convex
- examples:
  - scaling, translation, projection
  - solution set of a linear matrix inequality {x | x_1 A_1 + ... + x_m A_m ⪯ B} (with A_i, B ∈ S^p)
Perspective and linear-fractional functions
- perspective function P : R^{n+1} → R^n:
      P(x, t) = x/t,  dom P = {(x, t) | t > 0}
  images and inverse images of convex sets under the perspective function are convex
- linear-fractional function f : R^n → R^m:
      f(x) = (Ax + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}
  images and inverse images of convex sets under linear-fractional functions are convex
Basic properties of convexity
- separating hyperplane theorem: if C and D are disjoint convex sets, then there exist a ≠ 0 and b such that
      aᵀx ≤ b for x ∈ C,  aᵀx ≥ b for x ∈ D
  the hyperplane {x | aᵀx = b} separates C and D
- supporting hyperplane to a set C at a boundary point x_0:
      {x | aᵀx = aᵀx_0}
  where a ≠ 0 and aᵀx ≤ aᵀx_0 for all x ∈ C
- supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C
2.1.2. Convex functions
Definition
- f : R^n → R is convex if dom f is a convex set and
      f(θx + (1-θ)y) ≤ θ f(x) + (1-θ) f(y)
  for all x, y ∈ dom f, 0 ≤ θ ≤ 1
- f is concave if -f is convex
- f is strictly convex if dom f is a convex set and
      f(θx + (1-θ)y) < θ f(x) + (1-θ) f(y)
  for all x, y ∈ dom f, x ≠ y, 0 < θ < 1
Examples on R

Convex:
- affine: ax + b on R, for any a, b ∈ R
- exponential: e^{ax}, for any a ∈ R
- powers: x^α on R_++, for α ≥ 1 or α ≤ 0
- powers of absolute value: |x|^p on R, for p ≥ 1
- negative entropy: x log x on R_++

Concave:
- affine: ax + b on R, for any a, b ∈ R
- powers: x^α on R_++, for 0 ≤ α ≤ 1
- logarithm: log x on R_++
Examples on R^n and R^{m×n}

affine functions are convex and concave; all norms are convex
- examples on R^n:
  - affine function f(x) = aᵀx + b
  - norms: ‖x‖_p = (∑_{i=1}^n |x_i|^p)^{1/p} for p ≥ 1; ‖x‖_∞ = max_k |x_k|
- examples on R^{m×n}:
  - affine function
        f(X) = Tr(AᵀX) + b = ∑_{i=1}^m ∑_{j=1}^n A_ij X_ij + b
  - spectral norm
        f(X) = ‖X‖_2 = σ_max(X) = (λ_max(XᵀX))^{1/2}
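As a quick numerical check of the norm examples above (a sketch assuming NumPy), including the identity ‖X‖_2 = σ_max(X) = (λ_max(XᵀX))^{1/2}:

```python
import numpy as np

x = np.array([3.0, -4.0])
assert np.isclose(np.linalg.norm(x, 1), 7.0)        # sum of |x_i|
assert np.isclose(np.linalg.norm(x, 2), 5.0)        # Euclidean norm
assert np.isclose(np.linalg.norm(x, np.inf), 4.0)   # max_k |x_k|

X = np.array([[3.0, 0.0], [0.0, -2.0]])
# spectral norm: largest singular value = sqrt(lambda_max(X^T X))
sigma_max = np.linalg.svd(X, compute_uv=False)[0]
assert np.isclose(sigma_max, 3.0)
assert np.isclose(np.sqrt(np.max(np.linalg.eigvalsh(X.T @ X))), sigma_max)
```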
Restriction of a convex function to a line
- f : R^n → R is convex if and only if the function g : R → R,
      g(t) = f(x + tv),  dom g = {t | x + tv ∈ dom f}
  is convex (in t) for any x ∈ dom f, v ∈ R^n
- can check convexity of f by checking convexity of functions of one variable
- example: f : S^n → R with f(X) = log det X, dom f = S^n_++
      g(t) = log det(X + tV)
           = log det X + log det(I + t X^{-1/2} V X^{-1/2})
           = log det X + ∑_{i=1}^n log(1 + t λ_i)
  where λ_i are the eigenvalues of X^{-1/2} V X^{-1/2}
- g is concave in t (for any choice of X ≻ 0, V); hence f is concave
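The restriction-to-a-line trick can be exercised numerically (a NumPy sketch with an arbitrary positive definite X and symmetric V; t is kept small enough that X + tV stays positive definite): g(t) = log det(X + tV) should satisfy the midpoint concavity inequality, and the eigenvalue formula above can be checked directly, since X^{-1/2} V X^{-1/2} has the same eigenvalues as X⁻¹V.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = M @ M.T + 4.0 * np.eye(4)          # positive definite
W = rng.standard_normal((4, 4))
V = (W + W.T) / 2.0                    # symmetric

def g(t):
    return np.log(np.linalg.det(X + t * V))

# midpoint concavity of g on small intervals where X + tV > 0
for a, b in [(-0.1, 0.1), (0.0, 0.2), (-0.05, 0.15)]:
    assert g((a + b) / 2) >= (g(a) + g(b)) / 2 - 1e-12

# g(t) = log det X + sum_i log(1 + t*lam_i), lam_i eigvals of X^{-1/2} V X^{-1/2}
lam = np.linalg.eigvals(np.linalg.solve(X, V)).real
t = 0.1
assert np.isclose(g(t), np.log(np.linalg.det(X)) + np.sum(np.log(1 + t * lam)))
```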
Extended-value extension
- the extended-value extension f̃ of f is
      f̃(x) = f(x) for x ∈ dom f,  f̃(x) = ∞ for x ∉ dom f
- often simplifies notation; for example, the condition
      0 ≤ θ ≤ 1  ⟹  f̃(θx + (1-θ)y) ≤ θ f̃(x) + (1-θ) f̃(y)
  (as an inequality in R ∪ {∞}) is equivalent to the two conditions:
  - dom f is convex
  - for x, y ∈ dom f and 0 ≤ θ ≤ 1,
        f(θx + (1-θ)y) ≤ θ f(x) + (1-θ) f(y)
First-order condition
- f is differentiable if dom f is open and the gradient
      ∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_n)
  exists at each x ∈ dom f
- first-order condition: a differentiable f with convex domain is convex iff
      f(y) ≥ f(x) + ∇f(x)ᵀ(y - x)  for all x, y ∈ dom f
- the first-order approximation of f is a global underestimator
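The global-underestimator property is easy to exercise numerically; a sketch (assuming NumPy) for f(x) = ‖x‖², whose gradient is 2x, checking the first-order inequality at random point pairs:

```python
import numpy as np

f = lambda x: float(x @ x)     # convex: f(x) = ||x||^2
grad = lambda x: 2.0 * x       # its gradient

rng = np.random.default_rng(1)
for _ in range(200):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # tangent plane at x never overestimates f
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-10
```

For this f the inequality is exactly ‖y - x‖² ≥ 0, which is why it holds with equality only at y = x.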
Second-order conditions
- f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,
      ∇²f(x)_ij = ∂²f(x)/∂x_i∂x_j,  i, j = 1,...,n,
  exists at each x ∈ dom f
- second-order conditions: for twice differentiable f with convex domain
  - f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
  - if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples
- quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ S^n)
      ∇f(x) = Px + q,  ∇²f(x) = P
  convex if P ⪰ 0
- least-squares objective: f(x) = ‖Ax - b‖_2²
      ∇f(x) = 2Aᵀ(Ax - b),  ∇²f(x) = 2AᵀA
  convex (for any A)
- quadratic-over-linear: f(x, y) = x²/y
      ∇²f(x, y) = (2/y³) vvᵀ ⪰ 0,  where v = (y, -x)
  convex for y > 0
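These Hessians can be verified in a few lines (a NumPy sketch; `qol_hessian` is an illustrative helper name): 2AᵀA is PSD for any A, and the quadratic-over-linear Hessian (2/y³)vvᵀ with v = (y, -x) is a nonnegative multiple of an outer product, hence PSD for y > 0.

```python
import numpy as np

# least-squares objective f(x) = ||Ax - b||_2^2: Hessian 2*A^T A is PSD for any A
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
H = 2.0 * A.T @ A
assert np.all(np.linalg.eigvalsh(H) >= -1e-10)

# quadratic-over-linear f(x, y) = x^2/y: Hessian (2/y^3) v v^T with v = (y, -x)
def qol_hessian(x, y):
    v = np.array([y, -x])
    return (2.0 / y**3) * np.outer(v, v)

assert np.all(np.linalg.eigvalsh(qol_hessian(3.0, 2.0)) >= -1e-10)
```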
- log-sum-exp: f(x) = log ∑_{k=1}^n exp x_k is convex
      ∇²f(x) = (1/(1ᵀz)) diag(z) - (1/(1ᵀz)²) zzᵀ  (z_k = exp x_k)
  to show ∇²f(x) ⪰ 0, we verify that vᵀ∇²f(x)v ≥ 0 for all v:
      vᵀ∇²f(x)v = [(∑_k z_k v_k²)(∑_k z_k) - (∑_k v_k z_k)²] / (∑_k z_k)² ≥ 0
  since (∑_k v_k z_k)² ≤ (∑_k z_k v_k²)(∑_k z_k) (from the Cauchy-Schwarz inequality)
- geometric mean: f(x) = (∏_{k=1}^n x_k)^{1/n} on R^n_++ is concave (similar proof as for log-sum-exp)
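The log-sum-exp Hessian formula above translates directly into code; a NumPy sketch (`lse_hessian` is an illustrative name) that builds it and confirms it is PSD at random points:

```python
import numpy as np

def lse_hessian(x):
    """Hessian of log-sum-exp: diag(z)/(1^T z) - z z^T/(1^T z)^2, z = exp(x)."""
    z = np.exp(x)
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s**2

rng = np.random.default_rng(3)
for _ in range(100):
    H = lse_hessian(rng.standard_normal(4))
    # all eigenvalues nonnegative -> Hessian PSD -> f convex at this point
    assert np.all(np.linalg.eigvalsh(H) >= -1e-10)
```

Note the Hessian is only PSD, never PD: ∇²f(x)·1 = 0, reflecting that log-sum-exp is affine along the direction (1,...,1).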
Epigraph and sublevel set
- α-sublevel set of f : R^n → R:
      C_α = {x ∈ dom f | f(x) ≤ α}
  sublevel sets of convex functions are convex (the converse is false)
- epigraph of f : R^n → R:
      epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}
  f is convex if and only if epi f is a convex set
Jensen's inequality
- if f is convex, then for any random variable z
      f(E z) ≤ E f(z)
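A Monte Carlo illustration of Jensen's inequality (stdlib only; the Gaussian parameters are arbitrary choices for the sketch): for the convex f(t) = e^t, the sample average of f should dominate f of the sample average.

```python
import math
import random

random.seed(0)
samples = [random.gauss(1.0, 2.0) for _ in range(100_000)]  # z ~ N(1, 4)
f = math.exp                                                # convex

mean = sum(samples) / len(samples)
# Jensen: f(E z) <= E f(z); here the gap is large (e^1 vs roughly e^3)
assert f(mean) <= sum(f(t) for t in samples) / len(samples)
```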
Operations that preserve convexity

practical methods for establishing convexity of a function:
1. verify the definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇²f(x) ⪰ 0
3. show that f is obtained from simple convex functions by operations that preserve convexity:
   - nonnegative weighted sum
   - composition with affine function
   - pointwise maximum and supremum
   - composition
   - minimization
   - perspective
Composition with affine function
- composition with an affine function: f(Ax + b) is convex if f is convex
- examples:
  - log barrier for linear inequalities:
        f(x) = -∑_{i=1}^m log(b_i - a_iᵀx),  dom f = {x | a_iᵀx < b_i, i = 1,...,m}
  - (any) norm of an affine function: f(x) = ‖Ax + b‖
Pointwise maximum
- if f_1,...,f_m are convex, then f(x) = max{f_1(x),...,f_m(x)} is convex
- examples:
  - piecewise-linear function: f(x) = max_{i=1,...,m} (a_iᵀx + b_i) is convex
  - sum of the r largest components of x ∈ R^n:
        f(x) = x_[1] + x_[2] + ... + x_[r]
    is convex (x_[i] is the i-th largest component of x)
    proof: f(x) = max{x_{i_1} + x_{i_2} + ... + x_{i_r} | 1 ≤ i_1 < i_2 < ... < i_r ≤ n}
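The proof idea for the sum-of-r-largest example can be checked computationally (stdlib sketch; helper names are illustrative): sorting and the maximum over all (n choose r) index subsets give the same value, exhibiting f as a pointwise maximum of linear functions.

```python
from itertools import combinations

def sum_r_largest(x, r):
    """f(x) = x_[1] + ... + x_[r] via sorting."""
    return sum(sorted(x, reverse=True)[:r])

def max_over_subsets(x, r):
    """Same f as the pointwise max of the (n choose r) linear functions."""
    return max(sum(x[i] for i in idx) for idx in combinations(range(len(x)), r))

x = [4.0, -1.0, 7.0, 2.0]
assert sum_r_largest(x, 2) == max_over_subsets(x, 2) == 11.0
```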
Pointwise supremum
- if f(x, y) is convex in x for each y ∈ A, then
      g(x) = sup_{y∈A} f(x, y)
  is convex
- examples:
  - support function of a set C: S_C(x) = sup_{y∈C} yᵀx is convex
  - distance to the farthest point in a set C: f(x) = sup_{y∈C} ‖x - y‖
  - maximum eigenvalue of a symmetric matrix: for X ∈ S^n,
        λ_max(X) = sup_{‖y‖_2=1} yᵀXy
Composition with scalar functions
- composition of g : R^n → R and h : R → R: f(x) = h(g(x))
  - f is convex if g convex, h convex, h nondecreasing
  - f is convex if g concave, h convex, h nonincreasing
- proof (for n = 1, differentiable g, h):
      f''(x) = h''(g(x)) g'(x)² + h'(g(x)) g''(x)
- examples:
  - exp g(x) is convex if g is convex
  - 1/g(x) is convex if g is concave and positive
Vector composition
- composition of g : R^n → R^k and h : R^k → R:
      f(x) = h(g(x)) = h(g_1(x), g_2(x), ..., g_k(x))
  - f is convex if g_i convex, h convex, h nondecreasing in each argument
  - f is convex if g_i concave, h convex, h nonincreasing in each argument
- proof (for n = 1, differentiable g, h):
      f''(x) = g'(x)ᵀ ∇²h(g(x)) g'(x) + ∇h(g(x))ᵀ g''(x)
- examples:
  - ∑_{i=1}^m log g_i(x) is concave if g_i are concave and positive
  - log ∑_{i=1}^m exp g_i(x) is convex if g_i are convex
Minimization
- if f(x, y) is convex in (x, y) and C is a convex set, then
      g(x) = inf_{y∈C} f(x, y)
  is convex
- examples:
  - f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with
        [ A   B ]
        [ Bᵀ  C ]  ⪰ 0,  C ≻ 0
    minimizing over y gives g(x) = inf_y f(x, y) = xᵀ(A - BC⁻¹Bᵀ)x;
    g is convex because the Schur complement A - BC⁻¹Bᵀ ⪰ 0
  - distance to a set: dist(x, S) = inf_{y∈S} ‖x - y‖ is convex if S is convex
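The Schur-complement example can be checked numerically (a NumPy sketch with a randomly generated positive definite block matrix): the closed-form minimizer over y is y* = -C⁻¹Bᵀx, the minimum value equals xᵀ(A - BC⁻¹Bᵀ)x, and the Schur complement is PSD.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
P = M @ M.T + 0.1 * np.eye(5)          # positive definite [[A, B], [B^T, C]]
A, B, C = P[:2, :2], P[:2, 2:], P[2:, 2:]

f = lambda x, y: x @ A @ x + 2 * x @ B @ y + y @ C @ y

S = A - B @ np.linalg.solve(C, B.T)    # Schur complement
x = rng.standard_normal(2)
y_star = -np.linalg.solve(C, B.T @ x)  # minimizer of f(x, .) over y

# partial minimization recovers g(x) = x^T S x, and S is PSD so g is convex
assert np.isclose(f(x, y_star), x @ S @ x)
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)
```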
Perspective
- the perspective of a function f : R^n → R is the function g : R^n × R → R,
      g(x, t) = t f(x/t),  dom g = {(x, t) | x/t ∈ dom f, t > 0}
- g is convex if f is convex
- examples:
  - f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
  - negative logarithm f(x) = -log x is convex; hence relative entropy g(x, t) = t log t - t log x is convex on R²_++
  - if f is convex, then
        g(x) = (cᵀx + d) f((Ax + b)/(cᵀx + d))
    is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}
The conjugate function
- the conjugate of a function f is
      f*(y) = sup_{x∈dom f} (yᵀx - f(x))
- f* is convex (even if f is not)
- examples:
  - negative logarithm f(x) = -log x:
        f*(y) = sup_{x>0} (xy + log x)
              = -1 - log(-y)  if y < 0,  ∞ otherwise
  - strictly convex quadratic f(x) = (1/2)xᵀQx with Q ∈ S^n_++:
        f*(y) = sup_x (yᵀx - (1/2)xᵀQx) = (1/2)yᵀQ⁻¹y
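The quadratic conjugate can be verified numerically (a NumPy sketch with a random Q ≻ 0): the supremum of yᵀx - f(x) is attained at x* = Q⁻¹y, its value matches (1/2)yᵀQ⁻¹y, and no sampled x does better.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 3))
Q = M @ M.T + np.eye(3)                            # Q in S^n_++

f = lambda x: 0.5 * x @ Q @ x
conj = lambda y: 0.5 * y @ np.linalg.solve(Q, y)   # claimed f*(y) = (1/2) y^T Q^{-1} y

y = rng.standard_normal(3)
x_star = np.linalg.solve(Q, y)                     # maximizer of y^T x - f(x)
assert np.isclose(y @ x_star - f(x_star), conj(y))

# no sampled x exceeds the claimed supremum
for _ in range(500):
    x = 3.0 * rng.standard_normal(3)
    assert y @ x - f(x) <= conj(y) + 1e-9
```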
2.1.3. Convex optimization problem
- standard form convex optimization problem:
      min  f_0(x)
      s.t. f_i(x) ≤ 0,  i = 1,...,m
           a_iᵀx = b_i,  i = 1,...,p
  f_0, f_1,...,f_m are convex; equality constraints are affine
- often written as
      min  f_0(x)
      s.t. f_i(x) ≤ 0,  i = 1,...,m
           Ax = b
- the feasible set of a convex optimization problem is convex
Local and global optima
- any locally optimal point of a convex problem is (globally) optimal
- proof: suppose x is locally optimal and y is optimal with f_0(y) < f_0(x)
  - x locally optimal means there is an R > 0 such that
        z feasible, ‖z - x‖_2 ≤ R  ⟹  f_0(z) ≥ f_0(x)
  - consider z = θy + (1-θ)x with θ = R/(2‖y - x‖_2)
  - ‖y - x‖_2 > R, so 0 < θ < 1/2
  - z is a convex combination of two feasible points, hence also feasible
  - ‖z - x‖_2 = R/2 and
        f_0(z) ≤ θ f_0(y) + (1-θ) f_0(x) < f_0(x)
    which contradicts our assumption that x is locally optimal
Optimality conditions for differentiable f_0
- x is optimal if and only if it is feasible and
      ∇f_0(x)ᵀ(y - x) ≥ 0  for all feasible y
- if nonzero, ∇f_0(x) defines a supporting hyperplane to the feasible set X at x
- unconstrained problem: x is optimal if and only if
      x ∈ dom f_0,  ∇f_0(x) = 0
Lagrangian
- standard form problem (not necessarily convex):
      min  f_0(x)
      s.t. f_i(x) ≤ 0,  i = 1,...,m
           h_i(x) = 0,  i = 1,...,p
  variable x ∈ R^n, domain D, optimal value p*
- Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p:
      L(x, λ, ν) = f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x)
- Lagrange dual function: g : R^m × R^p → R,
      g(λ, ν) = inf_{x∈D} L(x, λ, ν)
              = inf_{x∈D} (f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x))
- g is concave, and can be -∞ for some λ, ν
- lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*
- proof: if x̃ is feasible and λ ⪰ 0, then
      f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)
  minimizing over all feasible x̃ gives p* ≥ g(λ, ν)
Lagrange dual and conjugate function
- problem:
      min  f_0(x)
      s.t. Ax ⪯ b,  Cx = d
- dual function:
      g(λ, ν) = inf_{x∈dom f_0} (f_0(x) + (Aᵀλ + Cᵀν)ᵀx - bᵀλ - dᵀν)
              = -f_0*(-Aᵀλ - Cᵀν) - bᵀλ - dᵀν
The dual problem
- Lagrange dual problem:
      max  g(λ, ν)
      s.t. λ ⪰ 0
- finds the best lower bound on p*
- a convex optimization problem; optimal value denoted by d*
Weak and strong duality
- weak duality: d* ≤ p*
  - always holds (for convex and nonconvex problems)
  - can be used to find nontrivial lower bounds for difficult problems
- strong duality: d* = p*
  - does not hold in general
  - (usually) holds for convex problems
  - conditions that guarantee strong duality in convex problems are called constraint qualifications
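A one-dimensional toy problem (not from the slides) makes weak and strong duality concrete: min x² s.t. 1 - x ≤ 0 has p* = 1 at x = 1; the Lagrangian x² + λ(1 - x) is minimized at x = λ/2, giving the dual function g(λ) = λ - λ²/4, whose maximum over λ ≥ 0 is d* = 1 at λ = 2.

```python
# toy problem: min x^2  s.t.  1 - x <= 0   (p* = 1, attained at x = 1)
# L(x, lam) = x^2 + lam*(1 - x); inf over x is attained at x = lam/2
g = lambda lam: lam - lam**2 / 4.0         # dual function

p_star = 1.0
lams = [k / 100.0 for k in range(501)]     # grid of lam in [0, 5]

# weak duality: every dual value is a lower bound on p*
assert all(g(l) <= p_star + 1e-12 for l in lams)

# strong duality holds here (Slater: x = 2 is strictly feasible)
d_star = max(g(l) for l in lams)           # best bound on the grid, at lam = 2
assert abs(d_star - p_star) < 1e-12
```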
Slater's constraint qualification
- strong duality holds for a convex problem
      min  f_0(x)
      s.t. f_i(x) ≤ 0,  i = 1,...,m
           Ax = b
  if it is strictly feasible, i.e.,
      ∃ x ∈ int D :  f_i(x) < 0, i = 1,...,m,  Ax = b
- also guarantees that the dual optimum is attained (if p* > -∞)
- there exist many other types of constraint qualifications
Karush-Kuhn-Tucker (KKT) conditions
the following 4 conditions are called the KKT conditions (for a problem with differentiable f_i, h_i):
1. primal constraints: f_i(x) ≤ 0, i = 1,...,m, and h_i(x) = 0, i = 1,...,p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1,...,m
4. gradient of the Lagrangian with respect to x vanishes:
      ∇f_0(x) + ∑_{i=1}^m λ_i ∇f_i(x) + ∑_{i=1}^p ν_i ∇h_i(x) = 0
KKT conditions for convex problems
- if x̃, λ̃, ν̃ satisfy the KKT conditions for a convex problem, then they are optimal:
  - from complementary slackness: f_0(x̃) = L(x̃, λ̃, ν̃)
  - from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)
  - hence f_0(x̃) = g(λ̃, ν̃)
- if Slater's condition is satisfied: x is optimal iff there exist λ, ν that satisfy the KKT conditions
  - recall that Slater implies strong duality, and the dual optimum is attained
  - generalizes the optimality condition ∇f_0(x) = 0 for unconstrained problems
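To close the loop, the four KKT conditions can be checked mechanically on a toy problem (not from the slides): for min x² s.t. f_1(x) = 1 - x ≤ 0, the candidate pair x = 1, λ = 2 satisfies all four, so by the argument above it is optimal.

```python
# KKT check for: min x^2  s.t.  f_1(x) = 1 - x <= 0; candidate x = 1, lam = 2
x, lam = 1.0, 2.0
f1 = lambda t: 1.0 - t

assert f1(x) <= 0.0                    # 1. primal feasibility
assert lam >= 0.0                      # 2. dual feasibility
assert lam * f1(x) == 0.0              # 3. complementary slackness
assert 2.0 * x + lam * (-1.0) == 0.0   # 4. gradient of Lagrangian vanishes
```

Since the problem is convex and differentiable, these four checks certify that x = 1 is globally optimal, matching p* = 1 from the duality example.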