Lecture 19: Convex Non-Smooth Optimization. April 2, 2007


Outline

- Convex non-smooth problems
- Examples
- Subgradients and subdifferentials
- Subgradient properties
- Operations with subgradients and subdifferentials
- Optimality conditions
- Subgradient algorithms

Convex-Constrained Non-Smooth Minimization

minimize $f(x)$ subject to $x \in C$

Characteristics:

- The function $f : \mathbb{R}^n \to \mathbb{R}$ is convex and possibly non-differentiable
- The set $C \subseteq \mathbb{R}^n$ is nonempty and convex
- The optimal value $f^*$ is finite

Our focus here is non-differentiability. Renewed interest comes from large-scale problems and the need for distributed computation.

Main questions:

- Where do such problems arise?
- How do we deal with non-differentiability?
- How can we solve such problems?

Where They Arise

Such problems arise naturally in some applications (communication networks, data fitting, neural networks):

- Least-squares problems: minimize $\sum_{j=1}^m \|h(w, x_j) - y_j\|^2$ subject to $w \ge 0$, where $(x_j, y_j)$, $j = 1, \ldots, m$, are the input-output pairs, $w$ are the weights (decision variables) to be optimized, and $h$ is convex and possibly non-smooth
- In Lagrangian duality: maximize $q(\mu, \lambda)$ subject to $\mu \ge 0$, where the dual function $q$ is concave but typically non-differentiable; a systematic approach for generating bounds on the primal optimal value, and a part of some primal-dual schemes
- In (sharp) penalty approaches: $\min_{x \in C} \{ f(x) + t\,P(g(x)) \}$, where $t > 0$ is a penalty parameter and the penalty function is $P(u) = \sum_{j=1}^m \max\{u_j, 0\}$ or $P(u) = \max\{u_1, \ldots, u_m, 0\}$
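To make the sharp penalty concrete, here is a minimal sketch; the callables `f` and `g` and the example constraint are illustrative assumptions, not from the slides:

```python
import numpy as np

def sharp_penalty_objective(f, g, x, t):
    """Exact (sharp) penalty objective f(x) + t * sum_j max(g_j(x), 0).

    f: callable returning a scalar; g: callable returning the vector of
    constraint values for constraints g_j(x) <= 0; t > 0 is the penalty
    parameter. The max(., 0) terms make the objective non-smooth.
    """
    return f(x) + t * np.maximum(g(x), 0.0).sum()

# Hypothetical example: penalize violation of x1 + x2 <= 1 while
# minimizing ||x||^2.
f = lambda x: x @ x
g = lambda x: np.array([x[0] + x[1] - 1.0])
print(sharp_penalty_objective(f, g, np.array([2.0, 0.0]), t=10.0))  # 4 + 10*1 = 14
```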

Example: Optimization in Network Coding (Linear Cost Model)

minimize $\sum_{(i,j) \in L} a_{ij} \max_{s \in S} x_{ij}^s$

subject to $0 \le \max_{s \in S} x_{ij}^s \le c_{ij}$ for all $(i,j) \in L$

$\sum_{\{j \mid (i,j) \in L\}} x_{ij}^s - \sum_{\{j \mid (j,i) \in L\}} x_{ji}^s = b_i^s$ for all $i \in N$, $s \in S$

- $N$ is the set of nodes in the communication network
- $S$ is the set of sessions [a session is a pair of nodes that communicate]
- $L$ is the set of directed links; $(i,j)$ denotes a link originating at node $i$ and ending at node $j$
- $c_{ij}$ is the communication rate capacity of the link $(i,j)$
- $a_{ij}$ is the cost for the link $(i,j)$
- $x_{ij}^s$ is the communication rate for session $s$ on link $(i,j)$

Problem: assign the rates $x_{ij}^s$ so as to minimize the cost subject to the link capacities and the network balance equations. The per-link max over sessions makes the objective non-smooth, and full knowledge of the network is not centralized.
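A small numeric sketch of this non-smooth objective and of how a subgradient with respect to the rates can still be formed; the 3-link, 2-session instance below is hypothetical:

```python
import numpy as np

# Hypothetical instance: 3 links, 2 sessions; rows = links, cols = sessions.
a = np.array([1.0, 2.0, 0.5])          # per-link costs a_ij
x = np.array([[3.0, 1.0],              # rates x_ij^s
              [0.5, 2.0],
              [1.0, 1.0]])

def cost(a, x):
    # Objective: sum over links of a_ij * max_s x_ij^s (non-smooth in x).
    return a @ x.max(axis=1)

def cost_subgradient(a, x):
    # One subgradient: for each link, put weight a_ij on one maximizing
    # session (ties, as in the last row, may be broken arbitrarily).
    g = np.zeros_like(x)
    rows = np.arange(x.shape[0])
    g[rows, x.argmax(axis=1)] = a
    return g

print(cost(a, x))              # 1*3 + 2*2 + 0.5*1 = 7.5
print(cost_subgradient(a, x))
```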

Non-Differentiability Issue

Any convex function is differentiable almost everywhere on the interior of its domain [the set of points of non-differentiability has measure zero]. However, the points of non-differentiability cannot be ignored. Consider

minimize $|x|$ subject to $x \in \mathbb{R}$

and a gradient descent method $d_k = -\nabla f(x_k)$ starting with $x_0 = 2$, with a backtracking line search with $\alpha = 1$ and $\sigma = 0.2$. The point $x = 1$ will be accepted, since $f(2) = 2$, $f(1) = 1$, and $\|\nabla f(2)\| = 1$, so that

$f(2) - f(1) = 1 \ge \sigma \alpha \|\nabla f(2)\|^2 = 0.2$

Thus $x_1 = 1$ with $f(1) = 1$, and by the same test $x = 0$ will be accepted: $x_2 = 0$. To proceed, we need $\nabla f(0)$, which is not defined!

Is there a direction playing the role of a gradient?
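The failing run above can be reproduced in a few lines; this is a minimal sketch in which the backtracking shrink factor ($0.5$) is an assumed detail the slide does not specify:

```python
# Gradient descent with Armijo backtracking on f(x) = |x|: a sketch of the
# run above (x0 = 2, sigma = 0.2, initial step alpha = 1).

def f(x):
    return abs(x)

def grad(x):
    if x == 0:
        raise ValueError("f(x) = |x| is not differentiable at x = 0")
    return 1.0 if x > 0 else -1.0

x, sigma, beta = 2.0, 0.2, 0.5
try:
    for k in range(5):
        g = grad(x)                                   # fails at x = 0
        alpha = 1.0
        while f(x - alpha * g) > f(x) - sigma * alpha * g * g:
            alpha *= beta                             # backtrack
        x -= alpha * g
        print(f"iter {k}: x = {x}")                   # 1.0, then 0.0
except ValueError as e:
    print("stopped:", e)                              # hits the kink at 0
```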

Subgradient

What is a subgradient? For a convex and differentiable $f$, the linearization of $f$ at a vector $\hat{x} \in \operatorname{dom} f$ underestimates $f$ at all points in $\operatorname{dom} f$:

$f(x) \ge f(\hat{x}) + \nabla f(\hat{x})^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

For a differentiable function this linearization is unique at any given $\hat{x} \in \operatorname{dom} f$. A convex non-differentiable $f$ may have multiple such linearizations at some points in $\operatorname{dom} f$. For such functions, a subgradient provides a linearization of $f$ that underestimates $f$ globally (at all points of the domain of $f$).

Definition of Subgradient and Subdifferential

Def. A vector $s \in \mathbb{R}^n$ is a subgradient of $f$ at $\hat{x} \in \operatorname{dom} f$ when

$f(x) \ge f(\hat{x}) + s^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

Def. The subdifferential of $f$ at $\hat{x} \in \operatorname{dom} f$ is the set of all subgradients $s$ of $f$ at $\hat{x}$. The subdifferential of $f$ at $\hat{x}$ is denoted by $\partial f(\hat{x})$.

When $f$ is differentiable at $\hat{x}$, we have $\partial f(\hat{x}) = \{\nabla f(\hat{x})\}$ (the subdifferential is a singleton).

Examples: For $f(x) = |x|$,

$\partial f(x) = \{\operatorname{sign}(x)\}$ for $x \ne 0$, and $\partial f(0) = [-1, 1]$

A second example is the piecewise function $f(x) = x^2 + 2|x| - 3$ for $|x| > 1$ and $f(x) = 0$ for $|x| \le 1$, which has kinks at $x = \pm 1$.
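A subgradient oracle for $f(x) = |x|$ is a one-liner; this minimal sketch returns one element of $\partial f(x)$ and checks the subgradient inequality numerically:

```python
def subgrad_abs(x):
    """Return one element of the subdifferential of f(x) = |x|.

    Any value in [-1, 1] is a valid subgradient at x = 0; we return 0,
    which is also the choice that certifies optimality there.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

# The subgradient inequality f(y) >= f(x) + s*(y - x) holds for all y:
x, s = 0.0, subgrad_abs(0.0)
assert all(abs(y) >= abs(x) + s * (y - x) for y in [-2.0, -0.5, 0.0, 1.5])
```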

Subgradients and Epigraph

Let $s$ be a subgradient of $f$ at $\hat{x}$: $f(x) \ge f(\hat{x}) + s^T(x - \hat{x})$ for all $x \in \operatorname{dom} f$. The subgradient inequality is equivalent to

$s^T \hat{x} - f(\hat{x}) \ge s^T x - f(x)$ for all $x \in \operatorname{dom} f$

Let $f(x) > -\infty$ for all $x \in \mathbb{R}^n$. Then $\operatorname{epi} f = \{(x, w) \mid f(x) \le w,\ x \in \mathbb{R}^n\}$. Thus

$s^T \hat{x} - f(\hat{x}) \ge s^T x - w$ for all $(x, w) \in \operatorname{epi} f$,

which is equivalent to

$\begin{bmatrix} s \\ -1 \end{bmatrix}^T \begin{bmatrix} \hat{x} \\ f(\hat{x}) \end{bmatrix} \ge \begin{bmatrix} s \\ -1 \end{bmatrix}^T \begin{bmatrix} x \\ w \end{bmatrix}$ for all $(x, w) \in \operatorname{epi} f$

Therefore, the hyperplane $H = \{ (x, \gamma) \in \mathbb{R}^{n+1} \mid (s, -1)^T (x, \gamma) = (s, -1)^T (\hat{x}, f(\hat{x})) \}$ supports $\operatorname{epi} f$ at the vector $(\hat{x}, f(\hat{x}))$.

Subdifferential Set Properties

When nonempty, the subdifferential $\partial f(\hat{x})$ is a convex and closed set.

Existence Theorem: Let $f$ be convex with $f(x) > -\infty$ for all $x$ and with $\operatorname{int}(\operatorname{dom} f)$ nonempty. Then the subdifferential $\partial f(\hat{x})$ is a nonempty, compact, convex set for every $\hat{x}$ in the interior of $\operatorname{dom} f$.

Proof ($\partial f(\hat{x})$ nonempty): Let $\hat{x}$ be in the interior of $\operatorname{dom} f$. The vector $(\hat{x}, f(\hat{x}))$ does not belong to the interior of $\operatorname{epi} f$. The epigraph $\operatorname{epi} f$ is convex, so by the Supporting Hyperplane Theorem there is a vector $(d, \beta) \in \mathbb{R}^{n+1}$ with $(d, \beta) \ne 0$ such that

$d^T \hat{x} + \beta f(\hat{x}) \le d^T x + \beta w$ for all $(x, w) \in \operatorname{epi} f$

Since $f(x) > -\infty$ for all $x$, we have $\operatorname{epi} f = \{(x, w) \mid f(x) \le w,\ x \in \mathbb{R}^n\}$. Hence

$d^T \hat{x} + \beta f(\hat{x}) \le d^T x + \beta w$ for all $x$ with $f(x) \le w$

Letting $w \to \infty$ shows that $\beta \ge 0$. We cannot have $\beta = 0$: this would give $d^T \hat{x} \le d^T x$ for all $x \in \operatorname{dom} f$, and since $\hat{x}$ is an interior point of $\operatorname{dom} f$, it would imply $d = 0$. Dividing by $\beta > 0$ and taking $w = f(x)$, we see that $-d/\beta$ is a subgradient of $f$ at $\hat{x}$.

Proof ($\partial f(\hat{x})$ bounded): By the subgradient inequality, we have

$f(x) \ge f(\hat{x}) + s^T (x - \hat{x})$ for all $x \in \operatorname{dom} f$

Suppose the subdifferential $\partial f(\hat{x})$ is unbounded. Then there is a sequence of subgradients $s_k \in \partial f(\hat{x})$ with $\|s_k\| \to \infty$. Since $\hat{x}$ lies in the interior of the domain, there exists a $\delta > 0$ such that $\hat{x} + \delta y \in \operatorname{int}(\operatorname{dom} f)$ for every $y$ with $\|y\| \le 1$. Letting $x = \hat{x} + \delta\, s_k / \|s_k\|$ for each $k$, we have

$f\left(\hat{x} + \delta \frac{s_k}{\|s_k\|}\right) \ge f(\hat{x}) + \delta \|s_k\|$ for all $k$

As $k \to \infty$, the right-hand side tends to $+\infty$, while the left-hand side stays bounded; this contradicts the continuity of $f$ at $\hat{x}$. [Recall that a convex function is continuous over the interior of its domain, hence bounded on compact subsets of it.]

Example: Consider $f(x) = -\sqrt{x}$ with $\operatorname{dom} f = \{x \mid x \ge 0\}$. We have $\partial f(0) = \emptyset$: any subgradient $s$ at $0$ would need $-\sqrt{x} \ge s\,x$ for all $x > 0$, which fails as $x \downarrow 0$. Note that $0$ is not in the interior of the domain of $f$.

NOTE: When $f$ is closed and convex with nonempty $\operatorname{dom} f$ (and finite at some point), then $f(x) > -\infty$ for all $x \in \mathbb{R}^n$.

Operations with Subdifferentials

Let $f$, $f_1$, and $f_2$ be convex functions with nonempty domains.

Scaling: For $\lambda > 0$, the function $\lambda f$ is convex and

$\partial(\lambda f)(x) = \lambda\, \partial f(x)$ for all $x \in \operatorname{int}(\operatorname{dom} f)$

Sum: The function $f_1 + f_2$ is convex, and

$\partial(f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x)$ for all $x \in \operatorname{int}(\operatorname{dom} f) = \operatorname{int}(\operatorname{dom} f_1) \cap \operatorname{int}(\operatorname{dom} f_2)$

Composition with Affine Mapping: Let $\varphi(x) = f(Ax + b)$. Then the function $\varphi$ is convex and

$\partial \varphi(x) = A^T \partial f(Ax + b)$ for all $x \in \operatorname{int}(\operatorname{dom} \varphi)$

where $\operatorname{dom} \varphi = \{x \mid Ax + b \in \operatorname{dom} f\}$.
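A minimal sketch of the affine-composition rule, taking $f$ to be the $\ell_1$ norm (an illustrative choice, not fixed by the slide); `np.sign` returns one valid element of the $\ell_1$ subdifferential:

```python
import numpy as np

def subgrad_l1(u):
    # One element of the subdifferential of ||u||_1 (np.sign picks 0 at a
    # zero coordinate, a valid choice from the interval [-1, 1]).
    return np.sign(u)

def subgrad_l1_affine(A, b, x):
    # Affine composition rule: if phi(x) = f(Ax + b), then A^T s is a
    # subgradient of phi at x for any s in the subdifferential of f at Ax + b.
    return A.T @ subgrad_l1(A @ x + b)

A = np.array([[1.0, 2.0], [3.0, -1.0]])
b = np.array([-1.0, 0.0])
x = np.array([1.0, 0.0])
print(subgrad_l1_affine(A, b, x))   # a subgradient of ||Ax + b||_1 at x
```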

Max-Type Functions

Let $f_i(x)$, $i = 1, \ldots, m$, be convex functions with nonempty domains.

Max-Function: The function $f(x) = \max_{1 \le i \le m} f_i(x)$ is convex and

$\partial f(x) = \operatorname{conv}\left(\{\partial f_i(x) \mid i \in I(x)\}\right)$ for all $x \in \operatorname{int}(\operatorname{dom} f)$

where $I(x) = \{i \mid f_i(x) = f(x)\}$ is the active index set and $\operatorname{dom} f = \cap_{i=1}^m \operatorname{dom} f_i$.

Example: For $f(x) = \max\{1 - x, 1 + x\}$ with $f_1(x) = 1 - x$ and $f_2(x) = 1 + x$, we have

$f(x) = f_1(x)$ for $x < 0$, $f(x) = f_2(x)$ for $x > 0$, and $f(x) = f_1(x) = f_2(x)$ for $x = 0$

$I(x) = \{1\}$ for $x < 0$, $I(x) = \{2\}$ for $x > 0$, $I(0) = \{1, 2\}$

$\partial f(x) = \{-1\}$ for $x < 0$, $\partial f(x) = \{1\}$ for $x > 0$, and $\partial f(0) = [-1, 1]$
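The max-rule gives a simple recipe for a subgradient oracle: evaluate all pieces, pick any active one, and return its (sub)gradient. A minimal sketch for differentiable pieces:

```python
import numpy as np

# One subgradient of f(x) = max_i f_i(x), assuming differentiable pieces:
# pick an active index i (one with f_i(x) = f(x)) and return its gradient.
# Any convex combination over active pieces would also be valid.

def subgrad_max(fs, grads, x):
    vals = np.array([fi(x) for fi in fs])
    i = int(vals.argmax())        # an active index
    return grads[i](x)

# The slide's example: f(x) = max{1 - x, 1 + x}.
fs = [lambda x: 1 - x, lambda x: 1 + x]
grads = [lambda x: -1.0, lambda x: 1.0]
print(subgrad_max(fs, grads, -2.0))  # -1.0 (the piece 1 - x is active)
print(subgrad_max(fs, grads, 0.0))   # -1.0; any value in [-1, 1] works at 0
```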

Examples

$f(x) = \sum_{i=1}^m |a_i^T x - b_i|$. Let

$I_-(x) = \{i \mid a_i^T x - b_i < 0\}$, $I_+(x) = \{i \mid a_i^T x - b_i > 0\}$, $I_0(x) = \{i \mid a_i^T x - b_i = 0\}$

Then

$\partial f(x) = \sum_{i \in I_+(x)} a_i - \sum_{i \in I_-(x)} a_i + \sum_{i \in I_0(x)} \operatorname{conv}(\{-a_i, a_i\})$

Euclidean Norm: $f(x) = \|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$. Then

$\partial f(x) = \left\{\dfrac{x}{\|x\|}\right\}$ for $x \ne 0$, and $\partial f(0) = \{s \mid \|s\| \le 1\} = B_2(0, 1)$
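For the Euclidean norm, the formula above translates directly into code; at $0$ any vector in the unit ball works, and this sketch returns the zero vector:

```python
import numpy as np

def subgrad_euclidean_norm(x):
    # One element of the subdifferential of f(x) = ||x||_2:
    # {x / ||x||} for x != 0; at x = 0 any s with ||s|| <= 1 is valid,
    # and we return 0 (the center of the unit ball).
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)

print(subgrad_euclidean_norm(np.array([3.0, 4.0])))  # [0.6, 0.8]
print(subgrad_euclidean_norm(np.zeros(2)))           # [0.0, 0.0]
```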

Sup-Type Functions

Let $\varphi(x, z)$ be a function convex in $x$ for every $z \in Z$, with $Z \ne \emptyset$.

Sup-Function: The function $f(x) = \sup_{z \in Z} \varphi(x, z)$ is convex and

$\operatorname{conv}\left(\{\partial_x \varphi(x, z) \mid z \in Z^*(x)\}\right) \subseteq \partial f(x)$ for all $x \in \operatorname{dom} f$

where $Z^*(x) = \{z \mid \varphi(x, z) = f(x)\}$ and $\operatorname{dom} f = \left\{x \mid \sup_{z \in Z} \varphi(x, z) < \infty\right\}$.

Proof: Let $z \in Z^*(\hat{x})$ and $s \in \partial_x \varphi(\hat{x}, z)$. Then, for any $x \in \operatorname{dom} f$:

$f(x) \ge \varphi(x, z) \ge \varphi(\hat{x}, z) + s^T (x - \hat{x}) = f(\hat{x}) + s^T (x - \hat{x})$

When $\varphi(x, z)$ is differentiable with respect to $x$ for every $z \in Z$:

$\operatorname{conv}\left(\{\nabla_x \varphi(x, z) \mid z \in Z^*(x)\}\right) \subseteq \partial f(x)$ for all $x \in \operatorname{dom} f$
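As an illustration of the sup-rule (an added example, not from the slides): the maximum eigenvalue of a symmetric matrix is the sup of the linear functions $z^T X z$ over unit vectors $z$, so $zz^T$ for a top unit eigenvector $z$ is a subgradient:

```python
import numpy as np

# lambda_max(X) = sup_{||z||=1} z^T X z, where each phi(X, z) = z^T X z is
# linear in X. For a maximizing z (a unit top eigenvector), the gradient of
# phi in X is z z^T, so z z^T lies in the subdifferential of lambda_max.

def subgrad_lambda_max(X):
    w, V = np.linalg.eigh(X)   # eigenvalues of symmetric X, ascending
    z = V[:, -1]               # unit eigenvector for the top eigenvalue
    return np.outer(z, z)      # an element of the subdifferential

X = np.array([[2.0, 1.0], [1.0, 2.0]])
print(np.trace(subgrad_lambda_max(X) @ X))  # equals lambda_max(X) = 3
```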

Optimality Conditions: Unconstrained Case

Unconstrained optimization: minimize $f(x)$

Assumption: The function $f$ is convex (non-differentiable) and proper [$f$ proper means $f(x) > -\infty$ for all $x$ and $\operatorname{dom} f \ne \emptyset$]

Theorem: Under this assumption, a vector $x^*$ minimizes $f$ over $\mathbb{R}^n$ if and only if $0 \in \partial f(x^*)$.

The result is a generalization of $\nabla f(x^*) = 0$.

Proof: $x^*$ is optimal if and only if $f(x) \ge f(x^*)$ for all $x$, or equivalently

$f(x) \ge f(x^*) + 0^T (x - x^*)$ for all $x \in \mathbb{R}^n$

Thus, $x^*$ is optimal if and only if $0 \in \partial f(x^*)$.
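A classical use of the condition $0 \in \partial f(x^*)$ (an added illustration, not from the slide): the scalar problem of minimizing $\frac{1}{2}(x - y)^2 + \lambda |x|$ is solved in closed form by soft-thresholding:

```python
# Applying 0 in df(x*) to  minimize (1/2)(x - y)^2 + lam * |x|:
# the condition 0 in (x - y) + lam * d|x|(x) yields soft-thresholding.

def soft_threshold(y, lam):
    if y > lam:       # optimal x* > 0 solves x* - y + lam = 0
        return y - lam
    if y < -lam:      # optimal x* < 0 solves x* - y - lam = 0
        return y + lam
    return 0.0        # |y| <= lam: 0 in (0 - y) + lam * [-1, 1]

print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(0.5, 1.0))   # 0.0 (the non-smooth kink is optimal)
```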

Examples

The function $f(x) = |x|$:

$\partial f(x) = \{\operatorname{sign}(x)\}$ for $x \ne 0$, and $\partial f(0) = [-1, 1]$

The minimum is at $x^* = 0$, and evidently $0 \in \partial f(0)$.

The function $f(x) = \|x\|$:

$\partial f(x) = \left\{\dfrac{x}{\|x\|}\right\}$ for $x \ne 0$, and $\partial f(0) = \{s \mid \|s\| \le 1\}$

Again, the minimum is at $x^* = 0$ and $0 \in \partial f(0)$.

The function $f(x) = \max\{x^2 + 2x - 3,\ x^2 - 2x - 3,\ 0\}$:

$f(x) = x^2 - 2x - 3$ for $x < -1$, $f(x) = 0$ for $x \in [-1, 1]$, $f(x) = x^2 + 2x - 3$ for $x > 1$

$\partial f(x) = \{2x - 2\}$ for $x < -1$, $\partial f(-1) = [-4, 0]$, $\partial f(x) = \{0\}$ for $x \in (-1, 1)$, $\partial f(1) = [0, 4]$, $\partial f(x) = \{2x + 2\}$ for $x > 1$

The optimal set is $X^* = [-1, 1]$. For every $x^* \in X^*$, we have $0 \in \partial f(x^*)$.

Optimality Conditions: Constrained Case

Constrained optimization: minimize $f(x)$ subject to $x \in C$

Assumptions:

- The function $f$ is convex (non-differentiable) and proper
- The set $C$ is nonempty and convex

Theorem: Under these assumptions, a vector $x^* \in C$ minimizes $f$ over the set $C$ if and only if there exists a subgradient $d \in \partial f(x^*)$ such that

$d^T (x - x^*) \ge 0$ for all $x \in C$

The result is a generalization of the condition $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in C$.
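The outline's last item, subgradient algorithms, builds directly on these conditions. Below is a minimal projected-subgradient sketch; the objective $\|x - y\|_1$, the box constraint set, and the diminishing step rule are all illustrative assumptions, not from the slides:

```python
import numpy as np

# Projected subgradient method: x_{k+1} = P_C(x_k - alpha_k * s_k) with
# s_k in df(x_k) and a diminishing step alpha_k = a / (k + 1). Example
# problem: minimize ||x - y||_1 over the box C = [0, 1]^n.

def project_box(x, lo=0.0, hi=1.0):
    return np.clip(x, lo, hi)          # projection P_C onto the box C

def subgrad_f(x, y):
    return np.sign(x - y)              # a subgradient of ||x - y||_1

y = np.array([2.0, -1.0, 0.3])
x = np.zeros(3)
for k in range(200):
    s = subgrad_f(x, y)
    x = project_box(x - (0.5 / (k + 1)) * s)

print(x)   # approaches the constrained minimizer P_C(y) = [1, 0, 0.3]
```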