Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey. Chapter 4: Optimization for Machine Learning

1 Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey. Chapter 4: Optimization for Machine Learning

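2 Summary of Chapter 2. Chapter 2: Convex Optimization with Sparsity-Inducing Norms. This chapter covers convex optimization problems of the form $\min_{w} f(w) + \lambda \Omega(w)$, where $f$ is a convex differentiable function, $\Omega$ is a sparsity-inducing non-smooth norm, and $\lambda \ge 0$ is the regularization weight. Examples of $\Omega$: the $\ell_1$ norm, $\ell_1 + \ell_1/\ell_q$ norms, and hierarchical $\ell_1/\ell_q$ norms. Algorithms covered: subgradient methods, block coordinate descent, reweighted-$\ell_2$ algorithms, etc.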

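3 Summary of Chapter 3. This chapter covers cone linear and quadratic programming, with problems of the form: minimize $c^\top x$ subject to $Gx \preceq_C h$, $Ax = b$ (the quadratic case adds a term $\frac{1}{2} x^\top P x$ to the objective). Here $\preceq_C$ is a generalized inequality, $u \preceq_C v \iff v - u \in C$, where $C$ is a closed pointed cone. Examples of cones: 1) the non-negative orthant $\{x : x_i \ge 0 \text{ for all } i\}$; 2) the second-order cone $\{(t, x) : \|x\|_2 \le t\}$. The Python package CVXOPT solves conic problems.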

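4 Introduction. This chapter considers optimization problems with cost functions of the form $f(x) = \sum_{i=1}^m f_i(x)$, minimized over $x \in X$, where $m$ is very large. It is therefore natural to use incremental methods, which operate on a single component $f_i$ at each iteration rather than on the entire cost function.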

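5 Least Squares and Related Inference Problems. Classical regression: minimize $\frac{1}{2} \sum_{i=1}^m (c_i^\top x - d_i)^2$. $\ell_1$-regularized problem: minimize $\gamma \|x\|_1 + \frac{1}{2} \sum_{i=1}^m (c_i^\top x - d_i)^2$. Other possibilities include using non-quadratic convex loss functions.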

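6 Dual Optimization in Separable Problems. Problems of the form: maximize $\sum_{i=1}^m c_i(y_i)$ subject to $\sum_{i=1}^m g_i(y_i) \ge 0$, $y_i \in Y_i$, where the sets $Y_i$ may be non-convex, have a dual of the form: minimize $\sum_{i=1}^m f_i(x)$ subject to $x \ge 0$, where $f_i(x) = \sup_{y_i \in Y_i} \{ c_i(y_i) + x^\top g_i(y_i) \}$.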

7 Weber Problem in Location Theory. Find a point $x$ that minimizes the sum of weighted distances from a given set of points $y_1, y_2, \ldots, y_m$: minimize $\sum_{i=1}^m w_i \|x - y_i\|$, where $w_i > 0$ are given weights.

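8 Incremental Gradient Methods: Differentiable Problems. When the component functions are differentiable, we may use incremental gradient methods of the form $x_{k+1} = P_X(x_k - \alpha_k \nabla f_{i_k}(x_k))$, where $P_X$ denotes projection onto $X$, $\alpha_k > 0$ is the step size, and $i_k$ is the index of the cost component iterated on at step $k$. Such methods make fast progress when far from convergence but are slow when close to convergence. Step size choices: keep $\alpha_k$ constant, or reduce it gradually to a small positive value. A constant step size generally yields convergence only to a neighborhood of the solution.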
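
To make the iteration concrete, here is a minimal sketch of a cyclic incremental gradient method for least-squares components $f_i(x) = \frac{1}{2}(c_i^\top x - d_i)^2$. It assumes the unconstrained case $X = \mathbb{R}^n$ (so no projection is needed) and a constant step size; the function name and the data are hypothetical. The data are chosen to be consistent (all equations hold at $(1, 2)$), which is why a constant step converges exactly here.

```python
def incremental_gradient(C, d, x0, alpha, num_cycles):
    """Cyclic incremental gradient for f(x) = sum_i 0.5 * (c_i . x - d_i)**2.

    Assumes X = R^n (no projection) and a constant step size alpha.
    """
    x = list(x0)
    m = len(C)
    for k in range(num_cycles * m):
        i = k % m  # index i_k of the component processed at iteration k
        residual = sum(cj * xj for cj, xj in zip(C[i], x)) - d[i]
        # The gradient of the i-th component is residual * c_i.
        x = [xj - alpha * residual * cj for xj, cj in zip(x, C[i])]
    return x


# Hypothetical data: three consistent equations with solution (1, 2).
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d = [1.0, 2.0, 3.0]
x_hat = incremental_gradient(C, d, x0=[0.0, 0.0], alpha=0.5, num_cycles=100)
```

Each iteration touches one row of the data only, which is the point of the method when $m$ is very large.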

9 Variants of the Incremental Gradient Method. 1) Gradient method with momentum. 2) Aggregated component gradient method. Incremental gradient methods are also closely related to stochastic gradient methods.
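
For reference, the two variants are usually written as follows in the unconstrained case. This is a sketch in the notation of the incremental gradient iteration ($\alpha_k$ the step size, $i_k$ the component index); the momentum parameter $\beta \in [0, 1)$ is a symbol introduced here for illustration.

```latex
% Gradient method with momentum: add a multiple of the previous displacement.
x_{k+1} = x_k - \alpha_k \nabla f_{i_k}(x_k) + \beta (x_k - x_{k-1})

% Aggregated component gradient: use the m most recent component gradients,
% each evaluated at the iterate where it was last computed.
x_{k+1} = x_k - \alpha_k \sum_{\ell=0}^{m-1} \nabla f_{i_{k-\ell}}(x_{k-\ell})
```

The aggregated form approximates a full gradient step at the cost of storing past component gradients.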

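10 Incremental Subgradient Methods. These apply when the component functions are convex and nondifferentiable. In place of the gradient, an arbitrary subgradient $\tilde\nabla f_{i_k}(x_k)$ is used: $x_{k+1} = P_X(x_k - \alpha_k \tilde\nabla f_{i_k}(x_k))$. Convexity of $f_i(x)$ is essential. Even non-incremental subgradient methods have only a sublinear rate of convergence, so little is lost by going incremental, and incremental methods are favored.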
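
As an illustration, the sketch below applies a cyclic incremental subgradient method to a Weber-type problem, minimizing $\sum_i w_i \|x - y_i\|$ with no constraint set. A subgradient of the component $w_i \|x - y_i\|$ is $w_i (x - y_i)/\|x - y_i\|$ when $x \ne y_i$, and $0$ is a valid choice at $x = y_i$. The diminishing step size $1/(\text{cycle}+1)$, the function name, and the data (corners of a square, whose optimum is the center by symmetry) are assumptions made for this example.

```python
import math


def incremental_subgradient_weber(points, weights, x0, num_cycles):
    """Cyclic incremental subgradient method for sum_i w_i * ||x - y_i||.

    Assumes X = R^n and a diminishing step that is constant within a cycle.
    """
    x = list(x0)
    for cycle in range(num_cycles):
        alpha = 1.0 / (cycle + 1)
        for y, w in zip(points, weights):
            diff = [xj - yj for xj, yj in zip(x, y)]
            dist = math.sqrt(sum(dj * dj for dj in diff))
            if dist == 0.0:
                continue  # 0 is a valid subgradient at the kink x = y_i
            # Subgradient of w * ||x - y|| at x is w * (x - y) / ||x - y||.
            x = [xj - alpha * w * dj / dist for xj, dj in zip(x, diff)]
    return x


# Hypothetical data: equal weights on the corners of a square.
corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
x_hat = incremental_subgradient_weber(corners, [1.0] * 4, x0=[0.3, 0.2],
                                      num_cycles=2000)
```

With a diminishing step the iterate settles near the center $(1, 1)$; the residual error is of the order of the final step size.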

11 Incremental Proximal Methods. These are iterations of the form $x_{k+1} = \arg\min_{x \in X} \{ f_{i_k}(x) + \frac{1}{2\alpha_k} \|x - x_k\|^2 \}$. This form is attractive because, for some components, the proximal iteration can be obtained in closed form. Proximal iterations are generally considered more stable than gradient or subgradient iterations.
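
One case where the proximal step has a closed form is the quadratic component $f_i(x) = \frac{1}{2}(c_i^\top x - d_i)^2$ with $X = \mathbb{R}^n$. Setting the gradient of $f_i(z) + \frac{1}{2\alpha}\|z - x\|^2$ to zero gives $z = x - \alpha\, c_i (c_i^\top x - d_i) / (1 + \alpha \|c_i\|^2)$. The sketch below uses that formula; the function names and data are hypothetical.

```python
def prox_quadratic(x, c, d, alpha):
    """Minimizer over z of 0.5 * (c . z - d)**2 + ||z - x||**2 / (2 * alpha)."""
    residual = sum(cj * xj for cj, xj in zip(c, x)) - d
    scale = alpha * residual / (1.0 + alpha * sum(cj * cj for cj in c))
    return [xj - scale * cj for xj, cj in zip(x, c)]


def incremental_proximal(C, d, x0, alpha, num_cycles):
    """Cyclic incremental proximal method for sum_i 0.5 * (c_i . x - d_i)**2."""
    x = list(x0)
    for _ in range(num_cycles):
        for c_i, d_i in zip(C, d):
            x = prox_quadratic(x, c_i, d_i, alpha)
    return x


# Hypothetical data: three consistent equations with solution (1, 2).
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d = [1.0, 2.0, 3.0]
x_hat = incremental_proximal(C, d, x0=[0.0, 0.0], alpha=10.0, num_cycles=50)
```

The step size 10.0 is deliberately large: on this data a plain gradient step with the same value would overshoot and diverge, while the proximal step remains well behaved, which is one way to see the stability remark.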

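12 Incremental Subgradient-Proximal Methods. These methods combine proximal and subgradient iterations in one incremental algorithm. For problems of the form: minimize $\sum_{i=1}^m F_i(x)$ subject to $x \in X$, with $F_i(x) = f_i(x) + h_i(x)$, each iteration takes a proximal step on $f_{i_k}$ followed by a subgradient step on $h_{i_k}$: $z_k = \arg\min_{x \in X} \{ f_{i_k}(x) + \frac{1}{2\alpha_k} \|x - x_k\|^2 \}$, $x_{k+1} = P_X(z_k - \alpha_k \tilde\nabla h_{i_k}(z_k))$.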

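13 Both $z_k$ and $x_k$ lie in the constraint set $X$. The constraint can be relaxed in either the proximal or the subgradient iteration, which leads to easier computation. The iterations on the previous slide can then be rewritten as: $z_k = \arg\min_{x \in \mathbb{R}^n} \{ f_{i_k}(x) + \frac{1}{2\alpha_k} \|x - x_k\|^2 \}$, $x_{k+1} = P_X(z_k - \alpha_k \tilde\nabla h_{i_k}(z_k))$; or: $z_k = x_k - \alpha_k \tilde\nabla h_{i_k}(x_k)$, $x_{k+1} = \arg\min_{x \in X} \{ f_{i_k}(x) + \frac{1}{2\alpha_k} \|x - z_k\|^2 \}$. Incremental proximal iterations are closely related to subgradient iterations, so the two steps of the second form can be rewritten as one: $x_{k+1} = P_X(x_k - \alpha_k \tilde\nabla h_{i_k}(x_k) - \alpha_k \tilde\nabla f_{i_k}(x_{k+1}))$, where $\tilde\nabla f_{i_k}(x_{k+1})$ is a particular subgradient of $f_{i_k}$ at $x_{k+1}$.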

14 Order of Components. The effectiveness of the incremental subgradient-proximal method depends on the order in which the pairs $\{f_i, h_i\}$ are chosen. 1) Cyclic order: the pairs $\{f_i, h_i\}$ are taken in a fixed deterministic order. 2) Randomized order based on uniform sampling: at each iteration a pair $\{f_i, h_i\}$ is chosen at random. Both orders converge; however, the randomized order is superior to the cyclic order.
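
The two ordering rules differ only in how the index $i_k$ is generated, as the following sketch shows. The function name, the `mode` strings, and the seeded generator are choices made for this example.

```python
import random


def component_order(m, num_iters, mode="cyclic", seed=0):
    """Yield component indices i_k in {0, ..., m-1} for num_iters iterations."""
    rng = random.Random(seed)  # seeded so the randomized order is reproducible
    for k in range(num_iters):
        if mode == "cyclic":
            yield k % m  # fixed deterministic order 0, 1, ..., m-1, 0, 1, ...
        else:
            yield rng.randrange(m)  # uniform sampling, independent across k


cyclic = list(component_order(3, 6, mode="cyclic"))
randomized = list(component_order(3, 50, mode="randomized"))
```

Any of the incremental loops can consume such a sequence in place of a fixed sweep over the components.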

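15 Applications: Regularized Least Squares. Consider the problem: minimize $\gamma R(x) + \frac{1}{2} \sum_{i=1}^m (c_i^\top x - d_i)^2$, where $R(x) = \|x\|_1$ is the $\ell_1$ norm. The proximal iteration on the regularization term then becomes $z_k = \arg\min_{x \in \mathbb{R}^n} \{ \gamma \|x\|_1 + \frac{1}{2\alpha_k} \|x - x_k\|^2 \}$.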

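16 Applications: Regularized Least Squares. The proximal iteration decomposes into $n$ scalar minimizations, one per coordinate $j$: $z_k^j = \arg\min_{x^j} \{ \gamma |x^j| + \frac{1}{2\alpha_k} |x^j - x_k^j|^2 \}$. Incremental algorithms are well suited to such problems because the proximal update can be done in closed form (the shrinkage operation): $z_k^j = x_k^j - \gamma\alpha_k$ if $x_k^j \ge \gamma\alpha_k$; $z_k^j = 0$ if $|x_k^j| < \gamma\alpha_k$; $z_k^j = x_k^j + \gamma\alpha_k$ if $x_k^j \le -\gamma\alpha_k$. It is followed by a gradient iteration on a least-squares component: $x_{k+1} = z_k - \alpha_k c_{i_k} (c_{i_k}^\top z_k - d_{i_k})$.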
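
The closed-form proximal update for the $\ell_1$ term is coordinatewise shrinkage (soft thresholding) with threshold $\gamma\alpha_k$, followed by a gradient step on one least-squares component. A minimal sketch of one such iteration is given below, assuming the unconstrained case; the function names and numbers are hypothetical, and how the regularization weight is shared across the $m$ components is left to the caller.

```python
def soft_threshold(v, t):
    """Minimizer over z of t * |z| + 0.5 * (z - v)**2, for t >= 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_gradient_step(x, c, d, gamma, alpha):
    """One proximal step on gamma * ||x||_1, then one gradient step on
    the component 0.5 * (c . x - d)**2."""
    # Proximal step: shrink every coordinate by gamma * alpha.
    z = [soft_threshold(xj, gamma * alpha) for xj in x]
    # Gradient step on the selected least-squares component, evaluated at z.
    residual = sum(cj * zj for cj, zj in zip(c, z)) - d
    return [zj - alpha * residual * cj for zj, cj in zip(z, c)]


x_next = prox_gradient_step([3.0, -0.5], c=[1.0, 1.0], d=2.0,
                            gamma=1.0, alpha=0.5)
```

In this example the shrinkage maps $(3, -0.5)$ to $(2.5, 0)$, and the gradient step with residual $0.5$ then gives $(2.25, -0.25)$.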

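17 Iterated Projection Algorithms for the Feasibility Problem. The feasibility problem has the form: minimize $f(x)$ subject to $x \in \bigcap_{i=1}^m X_i$, where the $X_i$ are closed convex sets. For Lipschitz continuous $f$ and sufficiently large $\gamma$, it can be rewritten as: minimize $f(x) + \gamma \sum_{i=1}^m \mathrm{dist}(x; X_i)$, to which incremental algorithms apply.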
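
In the pure feasibility case ($f = 0$), processing one set at a time amounts to projecting onto the sets in turn. The sketch below does this for halfspaces $\{z : a^\top z \le b\}$, whose projection has the closed form $x - \frac{\max(0,\, a^\top x - b)}{\|a\|^2} a$. The choice of halfspaces, the function names, and the data are assumptions made for this example.

```python
def project_halfspace(x, a, b):
    """Euclidean projection of x onto the halfspace {z : a . z <= b}."""
    violation = sum(aj * xj for aj, xj in zip(a, x)) - b
    if violation <= 0.0:
        return list(x)  # already inside the halfspace
    scale = violation / sum(aj * aj for aj in a)
    return [xj - scale * aj for xj, aj in zip(x, a)]


def cyclic_projections(halfspaces, x0, num_cycles):
    """Project onto each halfspace in turn, for a fixed number of cycles."""
    x = list(x0)
    for _ in range(num_cycles):
        for a, b in halfspaces:
            x = project_halfspace(x, a, b)
    return x


# Hypothetical feasible region: x1 >= 1, x2 >= 1, x1 + x2 <= 3.
halfspaces = [([-1.0, 0.0], -1.0), ([0.0, -1.0], -1.0), ([1.0, 1.0], 3.0)]
x_feas = cyclic_projections(halfspaces, x0=[-4.0, 6.0], num_cycles=100)
```

Each step needs only one set $X_i$, so the cost per iteration does not grow with $m$.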
