Convex Non-Smooth Optimization
April 2, 2007
Lecture 19 Outline
- Convex non-smooth problems
- Examples
- Subgradients and subdifferentials
- Subgradient properties
- Operations with subgradients and subdifferentials
- Optimality conditions
- Subgradient algorithms
Convex-Constrained Non-Smooth Minimization

minimize $f(x)$ subject to $x \in C$

Characteristics:
- The function $f : \mathbb{R}^n \to \mathbb{R}$ is convex and possibly non-differentiable
- The set $C \subseteq \mathbb{R}^n$ is nonempty and convex
- The optimal value $f^*$ is finite

Our focus here is non-differentiability. Renewed interest comes from large-scale problems and the need for distributed computation.

Main questions:
- Where do such problems arise?
- How do we deal with non-differentiability?
- How can we solve these problems?
Where They Arise

Naturally in some applications (communication networks, data fitting, neural networks):

Least-squares problems:
minimize $\sum_{j=1}^m \|h(w, x_j) - y_j\|^2$ subject to $w \ge 0$
where $(x_j, y_j)$, $j = 1, \ldots, m$, are the input-output pairs, $w$ are the weights (decision variables) to be optimized, and $h$ is convex and possibly nonsmooth

In Lagrangian duality:
maximize $q(\mu, \lambda)$ subject to $\mu \ge 0$
- A systematic approach for generating bounds on the primal optimal value
- A part of some primal-dual schemes

In (sharp) penalty approaches:
$\min_{x \in C} \{f(x) + t\, P(g(x))\}$
where $t > 0$ is a penalty parameter and the penalty function is
$P(u) = \sum_{j=1}^m \max\{u_j, 0\}$ or $P(u) = \max\{u_1, \ldots, u_m, 0\}$
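Both sharp penalties are convex but nonsmooth wherever some $u_j$ crosses zero. A minimal Python sketch (not part of the lecture; the vector u is invented) of the two penalty functions:

```python
import numpy as np

def penalty_sum(u):
    """P(u) = sum_j max{u_j, 0}: penalizes every violated constraint."""
    return np.sum(np.maximum(u, 0.0))

def penalty_max(u):
    """P(u) = max{u_1, ..., u_m, 0}: penalizes only the worst violation."""
    return max(np.max(u), 0.0)

u = np.array([-1.0, 0.5, 2.0])  # constraint values g(x); positives violate
print(penalty_sum(u), penalty_max(u))  # 2.5 2.0
```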
Example: Optimization in Network Coding

Linear cost model:
minimize $\sum_{(i,j) \in L} a_{ij} \max_{s \in S} x_{ij}^s$
subject to $0 \le \max_{s \in S} x_{ij}^s \le c_{ij}$ for all $(i,j) \in L$
$\sum_{\{j \mid (i,j) \in L\}} x_{ij}^s - \sum_{\{j \mid (j,i) \in L\}} x_{ji}^s = b_i^s$ for all $i \in N$, $s \in S$

- $N$ is the set of nodes in the communication network
- $S$ is the set of sessions [a session is a pair of nodes that communicate]
- $L$ is the set of directed links; $(i,j)$ denotes a link originating at node $i$ and ending at node $j$
- $c_{ij}$ is the communication rate capacity of the link $(i,j)$
- $a_{ij}$ is the cost for the link $(i,j)$
- $x_{ij}^s$ is the communication rate for session $s$ on link $(i,j)$

Problem: assign the rates $x_{ij}^s$ so as to minimize the cost subject to the link capacities and the network balance equations. Full knowledge of the network is not available at any single node, which motivates distributed methods.
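The objective and capacity constraints involve $\max_s x_{ij}^s$, which is convex but nonsmooth. A sketch of the model on an invented 3-node, 2-session network, assuming the cvxpy modeling package is available:

```python
import cvxpy as cp

# Toy data (invented): links with costs a_ij and capacities c_ij;
# both sessions route one unit of flow from node 0 to node 2.
nodes = [0, 1, 2]
links = [(0, 1), (1, 2), (0, 2)]
a = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 3.0}
c = {(0, 1): 2.0, (1, 2): 2.0, (0, 2): 2.0}
b = [{0: 1.0, 1: 0.0, 2: -1.0},   # session 0: source 0, sink 2
     {0: 1.0, 1: 0.0, 2: -1.0}]   # session 1: source 0, sink 2
S = range(len(b))

x = {(s, l): cp.Variable(nonneg=True) for s in S for l in links}

constraints = []
for l in links:  # capacity: max over sessions on each link
    constraints.append(cp.maximum(*[x[s, l] for s in S]) <= c[l])
for s in S:      # flow balance at every node, per session
    for i in nodes:
        outflow = sum(x[s, l] for l in links if l[0] == i)
        inflow = sum(x[s, l] for l in links if l[1] == i)
        constraints.append(outflow - inflow == b[s][i])

cost = sum(a[l] * cp.maximum(*[x[s, l] for s in S]) for l in links)
problem = cp.Problem(cp.Minimize(cost), constraints)
problem.solve()
print(problem.value)  # sessions share links, so only the max rate is paid
```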
Non-Differentiability Issue

Any convex function is differentiable almost everywhere on its domain [the set of points of non-differentiability has measure zero]. However, the points of non-differentiability cannot be ignored. Consider

minimize $|x|$ subject to $x \in \mathbb{R}$

and a gradient descent method $d_k = -\nabla f(x_k)$ starting with $x_0 = 2$, with a backtracking line search with $\alpha = 1$ and $\sigma = 0.2$.

The point $x = 1$ will be accepted, since $f(2) = 2$, $f(1) = 1$, $\nabla f(2) = 1$, and
$f(1) = 1 \le f(2) - \sigma \alpha \|\nabla f(2)\|^2 = 2 - 0.2 = 1.8$

Thus $x_1 = 1$ with $f(1) = 1$, and by the same test $x = 0$ will be accepted: $x_2 = 0$. To proceed, we need $\nabla f(0)$, which is not defined!

Is there a direction playing the role of a gradient?
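A short Python sketch (assuming the sufficient-decrease test in the form above) reproducing this breakdown:

```python
def f(x):
    return abs(x)

def grad(x):
    if x == 0:
        raise ValueError("f(x) = |x| is not differentiable at x = 0")
    return 1.0 if x > 0 else -1.0

x, alpha, sigma = 2.0, 1.0, 0.2
for k in range(3):
    g = grad(x)  # raises on the third iteration, where x = 0
    while f(x - alpha * g) > f(x) - sigma * alpha * g * g:
        alpha *= 0.5  # backtrack (never triggered in this run)
    x -= alpha * g
    print(k, x)   # prints 0 1.0 and 1 0.0, then the error above
```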
Subgradient

What is a subgradient? For a convex and differentiable $f$, the linearization of $f$ at a vector $\hat{x} \in \operatorname{dom} f$ underestimates $f$ at all points of $\operatorname{dom} f$:

$f(x) \ge f(\hat{x}) + \nabla f(\hat{x})^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

For a differentiable function this linearization is unique at any given $\hat{x} \in \operatorname{dom} f$. A convex non-differentiable $f$ may have multiple such linearizations at some points in $\operatorname{dom} f$. For such functions, a subgradient provides a linearization of $f$ that underestimates $f$ globally (at all points of the domain of $f$).
Definition of Subgradient and Subdifferential

Def. A vector $s \in \mathbb{R}^n$ is a subgradient of $f$ at $\hat{x} \in \operatorname{dom} f$ when
$f(x) \ge f(\hat{x}) + s^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

Def. The subdifferential of $f$ at $\hat{x} \in \operatorname{dom} f$ is the set of all subgradients $s$ of $f$ at $\hat{x}$. The subdifferential of $f$ at $\hat{x}$ is denoted by $\partial f(\hat{x})$.

When $f$ is differentiable at $\hat{x}$, we have $\partial f(\hat{x}) = \{\nabla f(\hat{x})\}$ (the subdifferential is a singleton).

Examples:
- $f(x) = |x|$: $\partial f(x) = \{\operatorname{sign}(x)\}$ for $x \ne 0$, and $\partial f(0) = [-1, 1]$
- $f(x) = x^2 + 2|x| - 3$ for $|x| > 1$ and $f(x) = 0$ for $|x| \le 1$: nonsmooth at $x = \pm 1$ (its subdifferential is computed on a later slide)
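A quick numerical sanity check (a sketch, not part of the lecture) that the subgradient inequality holds at $\hat{x} = 0$ for $f(x) = |x|$ exactly when $s \in [-1, 1]$:

```python
import numpy as np

f, xhat = abs, 0.0
xs = np.linspace(-5.0, 5.0, 1001)

# Every s in [-1, 1] satisfies |x| >= |0| + s * (x - 0) for all x.
for s in np.linspace(-1.0, 1.0, 21):
    assert all(f(x) >= f(xhat) + s * (x - xhat) for x in xs)

# An s outside [-1, 1] fails the inequality at some x.
s = 1.5
print(any(f(x) < f(xhat) + s * (x - xhat) for x in xs))  # True
```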
Subgradients and the Epigraph

Let $s$ be a subgradient of $f$ at $\hat{x}$:
$f(x) \ge f(\hat{x}) + s^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

The subgradient inequality is equivalent to
$s^T \hat{x} - f(\hat{x}) \ge s^T x - f(x)$ for all $x \in \operatorname{dom} f$

Let $f(x) > -\infty$ for all $x \in \mathbb{R}^n$. Then $\operatorname{epi} f = \{(x, w) \mid f(x) \le w,\ x \in \mathbb{R}^n\}$. Thus,
$s^T \hat{x} - f(\hat{x}) \ge s^T x - w$ for all $(x, w) \in \operatorname{epi} f$,
equivalently,
$\begin{bmatrix} s \\ -1 \end{bmatrix}^T \begin{bmatrix} \hat{x} \\ f(\hat{x}) \end{bmatrix} \ge \begin{bmatrix} s \\ -1 \end{bmatrix}^T \begin{bmatrix} x \\ w \end{bmatrix}$ for all $(x, w) \in \operatorname{epi} f$

Therefore, the hyperplane
$H = \left\{ (x, \gamma) \in \mathbb{R}^{n+1} \mid (s, -1)^T (x, \gamma) = (s, -1)^T (\hat{x}, f(\hat{x})) \right\}$
supports $\operatorname{epi} f$ at the vector $(\hat{x}, f(\hat{x}))$.
Subdifferential Set Properties

When nonempty, the subdifferential $\partial f(\hat{x})$ is a convex and closed set.

Existence Theorem: Let $f$ be convex with $f(x) > -\infty$ for all $x$ and with $\operatorname{int}(\operatorname{dom} f)$ nonempty. Then the subdifferential $\partial f(\hat{x})$ is a nonempty compact convex set for every $\hat{x}$ in the interior of $\operatorname{dom} f$.

Proof [$\partial f(\hat{x})$ nonempty]: Let $\hat{x}$ be in the interior of $\operatorname{dom} f$. The vector $(\hat{x}, f(\hat{x}))$ does not belong to the interior of $\operatorname{epi} f$. The epigraph $\operatorname{epi} f$ is convex, so by the Supporting Hyperplane Theorem there is a vector $(d, \beta) \in \mathbb{R}^{n+1}$, $(d, \beta) \ne 0$, such that
$d^T \hat{x} + \beta f(\hat{x}) \le d^T x + \beta w$ for all $(x, w) \in \operatorname{epi} f$

By $f(x) > -\infty$ for all $x$, we have $\operatorname{epi} f = \{(x, w) \mid f(x) \le w,\ x \in \mathbb{R}^n\}$. Hence,
$d^T \hat{x} + \beta f(\hat{x}) \le d^T x + \beta w$ for all $x \in \mathbb{R}^n$ with $f(x) \le w$

Letting $w \to \infty$ shows that we must have $\beta \ge 0$. We cannot have $\beta = 0$ (since $\hat{x}$ is an interior point, $\beta = 0$ would imply $d = 0$). Dividing by $\beta$ and setting $w = f(x)$, we see that $-d/\beta$ is a subgradient of $f$ at $\hat{x}$.
Proof [$\partial f(\hat{x})$ bounded]: By the subgradient inequality, we have
$f(x) \ge f(\hat{x}) + s^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

Suppose that the subdifferential $\partial f(\hat{x})$ is unbounded. Then there is a sequence of subgradients $s_k \in \partial f(\hat{x})$ with $\|s_k\| \to \infty$. Since $\hat{x}$ lies in the interior of the domain, there exists a $\delta > 0$ such that $\hat{x} + \delta y \in \operatorname{dom} f$ for every $y$ with $\|y\| \le 1$. Letting $x = \hat{x} + \delta \frac{s_k}{\|s_k\|}$ for each $k$, we have
$f\left(\hat{x} + \delta \frac{s_k}{\|s_k\|}\right) \ge f(\hat{x}) + \delta \|s_k\|$ for all $k$

As $k \to \infty$, the right-hand side tends to $+\infty$, so $f$ is unbounded on the ball of radius $\delta$ around $\hat{x}$. This contradicts the continuity of $f$ at $\hat{x}$. [Recall that a convex function is continuous over the interior of its domain.]

Example: Consider $f(x) = -\sqrt{x}$ with $\operatorname{dom} f = \{x \mid x \ge 0\}$. We have $\partial f(0) = \emptyset$. Note that $0$ is not in the interior of the domain of $f$.

NOTE: When $f$ is closed convex and finite at some point, then $f(x) > -\infty$ for all $x \in \mathbb{R}^n$.
Operations with Subdifferentials

Let $f$, $f_1$, and $f_2$ be convex functions with nonempty domains.

Scaling: For $\lambda > 0$, the function $\lambda f$ is convex and
$\partial (\lambda f)(x) = \lambda\, \partial f(x)$ for all $x \in \operatorname{int}(\operatorname{dom} f)$

Sum: The function $f_1 + f_2$ is convex, and
$\partial (f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x)$ for all $x \in \operatorname{int}(\operatorname{dom} f_1) \cap \operatorname{int}(\operatorname{dom} f_2)$

Composition with an affine mapping: Let $\varphi(x) = f(Ax + b)$. Then the function $\varphi$ is convex and
$\partial \varphi(x) = A^T \partial f(Ax + b)$ for all $x \in \operatorname{int}(\operatorname{dom} \varphi)$
where $\operatorname{dom} \varphi = \{x \mid Ax + b \in \operatorname{dom} f\}$.
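The sum and affine-composition rules combine nicely for $\varphi(x) = \|Ax + b\|_1$: a subgradient is $A^T s$, where $s$ is any subgradient of the $\ell_1$ norm at $Ax + b$. A Python sketch with invented data (choosing $\operatorname{sign}(0) = 0$ at kinks, a valid element of $[-1, 1]$):

```python
import numpy as np

def subgrad_l1(u):
    """One element of the l1-norm subdifferential at u."""
    return np.sign(u)  # sign(0) = 0 lies in [-1, 1]

A = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([-1.0, 0.5])
x = np.array([1.0, 0.0])

g = A.T @ subgrad_l1(A @ x + b)  # subgradient of phi(x) = ||Ax + b||_1

# Check the subgradient inequality phi(y) >= phi(x) + g^T (y - x)
phi = lambda v: np.abs(A @ v + b).sum()
rng = np.random.default_rng(0)
for y in rng.normal(size=(1000, 2)):
    assert phi(y) >= phi(x) + g @ (y - x) - 1e-12
print("subgradient of ||Ax + b||_1 at x:", g)
```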
Max-Type Functions

Let $f_i(x)$, $i = 1, \ldots, m$, be convex functions with nonempty domains.

Max-function: The function $f(x) = \max_{1 \le i \le m} f_i(x)$ is convex and
$\partial f(x) = \operatorname{conv}\left(\{\partial f_i(x) \mid i \in I(x)\}\right)$ for all $x \in \operatorname{int}(\operatorname{dom} f)$
where $I(x) = \{i \mid f_i(x) = f(x)\}$ and $\operatorname{dom} f = \cap_{i=1}^m \operatorname{dom} f_i$

Example: $f(x) = \max\{1 - x, 1 + x\}$. We have
$f(x) = f_1(x) = 1 - x$ for $x < 0$, $f(x) = f_2(x) = 1 + x$ for $x > 0$, and $f(0) = f_1(0) = f_2(0) = 1$

$I(x) = \{1\}$ for $x < 0$, $I(x) = \{2\}$ for $x > 0$, $I(0) = \{1, 2\}$

$\partial f(x) = \{-1\}$ for $x < 0$, $\partial f(x) = \{1\}$ for $x > 0$, and $\partial f(0) = [-1, 1]$
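For a pointwise maximum of affine functions, the max-rule makes the subdifferential an interval of active slopes. A sketch for the example above:

```python
import numpy as np

# f(x) = max{1 - x, 1 + x} as a max of affine pieces slope*x + intercept
slopes = np.array([-1.0, 1.0])
intercepts = np.array([1.0, 1.0])

def subdiff(x, tol=1e-12):
    """Return partial f(x) as the interval [min, max] of active slopes."""
    vals = slopes * x + intercepts
    active = slopes[vals >= vals.max() - tol]  # I(x): maximizing pieces
    return active.min(), active.max()          # conv of active gradients

print(subdiff(-2.0))  # (-1.0, -1.0): only 1 - x is active
print(subdiff(0.0))   # (-1.0, 1.0): both active, so [-1, 1]
print(subdiff(3.0))   # (1.0, 1.0): only 1 + x is active
```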
Examples

$f(x) = \sum_{i=1}^m |a_i^T x - b_i|$. Let
$I_-(x) = \{i \mid a_i^T x - b_i < 0\}$, $I_+(x) = \{i \mid a_i^T x - b_i > 0\}$, $I_0(x) = \{i \mid a_i^T x - b_i = 0\}$

Then
$\partial f(x) = \sum_{i \in I_+(x)} a_i - \sum_{i \in I_-(x)} a_i + \sum_{i \in I_0(x)} \operatorname{conv}(\{-a_i, a_i\})$

Euclidean norm: $f(x) = \|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$. Then
$\partial f(x) = \left\{ \frac{x}{\|x\|} \right\}$ for $x \ne 0$, and $\partial f(0) = \{s \mid \|s\| \le 1\} = B_2(0, 1)$
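A sketch of the Euclidean-norm rule, with a numerical check of the subgradient inequality at an invented point:

```python
import numpy as np

def subgrad_norm2(x):
    """A subgradient of ||x||_2: x/||x|| if x != 0, else any s, ||s|| <= 1."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)  # 0 is in the unit ball

x = np.array([3.0, -4.0])
g = subgrad_norm2(x)
print(g)  # [0.6 -0.8], a unit vector

rng = np.random.default_rng(1)
for y in rng.normal(size=(100, 2)):
    assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12
```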
Sup-Type Functions

Let $\varphi(x, z)$ be a function convex in $x$ for every $z \in Z$, with $Z \ne \emptyset$.

Sup-function: The function $f(x) = \sup_{z \in Z} \varphi(x, z)$ is convex and
$\operatorname{conv}\left(\{\partial_x \varphi(x, z) \mid z \in Z^*(x)\}\right) \subseteq \partial f(x)$ for all $x \in \operatorname{dom} f$
where $Z^*(x) = \{z \mid \varphi(x, z) = f(x)\}$ and $\operatorname{dom} f = \left\{x \mid \sup_{z \in Z} \varphi(x, z) < \infty\right\}$

Proof: Let $z \in Z^*(\hat{x})$ and $s \in \partial_x \varphi(\hat{x}, z)$. Then, for any $x \in \operatorname{dom} f$:
$f(x) \ge \varphi(x, z) \ge \varphi(\hat{x}, z) + s^T (x - \hat{x}) = f(\hat{x}) + s^T (x - \hat{x})$

When $\varphi(x, z)$ is differentiable with respect to $x$ for every $z \in Z$:
$\operatorname{conv}\left(\{\nabla_x \varphi(x, z) \mid z \in Z^*(x)\}\right) \subseteq \partial f(x)$ for all $x \in \operatorname{dom} f$
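A sketch connecting this rule to the norm example: with $\varphi(x, z) = z^T x$ and $Z$ the unit ball, $f(x) = \sup_{z \in Z} z^T x = \|x\|_2$, and any maximizer $z^* \in Z^*(x)$ is a subgradient since $\nabla_x \varphi(x, z) = z$:

```python
import numpy as np

def maximizer(x):
    """Some z in Z*(x) for phi(x, z) = z^T x over the unit ball Z."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)

x = np.array([1.0, 2.0, -2.0])
z_star = maximizer(x)                  # a subgradient of f(x) = ||x||_2
print(z_star @ x, np.linalg.norm(x))   # both 3.0: z* attains the sup
```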
Optimality Conditions: Unconstrained Case

Unconstrained optimization: minimize $f(x)$

Assumption: The function $f$ is convex (possibly non-differentiable) and proper [$f$ proper means $f(x) > -\infty$ for all $x$ and $\operatorname{dom} f \ne \emptyset$]

Theorem: Under this assumption, a vector $x^*$ minimizes $f$ over $\mathbb{R}^n$ if and only if $0 \in \partial f(x^*)$.

The result is a generalization of $\nabla f(x^*) = 0$.

Proof: $x^*$ is optimal if and only if $f(x) \ge f(x^*)$ for all $x$, or equivalently
$f(x) \ge f(x^*) + 0^T (x - x^*)$ for all $x \in \mathbb{R}^n$
Thus, $x^*$ is optimal if and only if $0 \in \partial f(x^*)$.
Examples

The function $f(x) = |x|$:
$\partial f(x) = \{\operatorname{sign}(x)\}$ for $x \ne 0$, $\partial f(0) = [-1, 1]$
The minimum is at $x^* = 0$, and evidently $0 \in \partial f(0)$.

The function $f(x) = \|x\|$:
$\partial f(x) = \left\{ \frac{x}{\|x\|} \right\}$ for $x \ne 0$, $\partial f(0) = \{s \mid \|s\| \le 1\}$
Again, the minimum is at $x^* = 0$, and $0 \in \partial f(0)$.
The function $f(x) = \max\{x^2 + 2x - 3,\ x^2 - 2x - 3,\ 0\}$:
$f(x) = x^2 - 2x - 3$ for $x < -1$, $f(x) = 0$ for $x \in [-1, 1]$, $f(x) = x^2 + 2x - 3$ for $x > 1$

$\partial f(x) = \{2x - 2\}$ for $x < -1$
$\partial f(-1) = [-4, 0]$
$\partial f(x) = \{0\}$ for $x \in (-1, 1)$
$\partial f(1) = [0, 4]$
$\partial f(x) = \{2x + 2\}$ for $x > 1$

The optimal set is $X^* = [-1, 1]$. For every $x^* \in X^*$, we have $0 \in \partial f(x^*)$.
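A sketch verifying $0 \in \partial f(x^*)$ across the optimal set, computing $\partial f$ by the max-rule as the convex hull of the active pieces' derivatives:

```python
import numpy as np

# f(x) = max{x^2 + 2x - 3, x^2 - 2x - 3, 0} as (value, derivative) pairs
pieces = [(lambda x: x**2 + 2*x - 3, lambda x: 2*x + 2),
          (lambda x: x**2 - 2*x - 3, lambda x: 2*x - 2),
          (lambda x: 0.0,            lambda x: 0.0)]

def subdiff(x, tol=1e-9):
    """partial f(x) as the interval of derivatives of active pieces."""
    vals = [p(x) for p, _ in pieces]
    fmax = max(vals)
    ders = [d(x) for (p, d), v in zip(pieces, vals) if v >= fmax - tol]
    return min(ders), max(ders)

for x in np.linspace(-1.0, 1.0, 5):
    lo, hi = subdiff(x)
    print(x, (lo, hi), lo <= 0.0 <= hi)  # 0 in partial f(x): True for all
```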
Optimality Conditions: Constrained Case

Constrained optimization: minimize $f(x)$ subject to $x \in C$

Assumption:
- The function $f$ is convex (possibly non-differentiable) and proper
- The set $C$ is nonempty and convex

Theorem: Under this assumption, a vector $x^* \in C$ minimizes $f$ over the set $C$ if and only if there exists a subgradient $d \in \partial f(x^*)$ such that
$d^T (x - x^*) \ge 0$ for all $x \in C$

The result is a generalization of $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in C$.
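A small numerical illustration (an invented example): for $f(x) = |x|$ over $C = [1, 2]$, the point $x^* = 1$ is optimal, certified by the subgradient $d = 1 \in \partial f(1)$:

```python
import numpy as np

x_star, d = 1.0, 1.0            # d = sign(1), a subgradient of |x| at 1
C = np.linspace(1.0, 2.0, 101)  # sample points of the feasible set

print(all(d * (x - x_star) >= 0 for x in C))  # True: condition holds
print(all(abs(x) >= abs(x_star) for x in C))  # True: x* minimizes |x| on C
```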