Composite Self-concordant Minimization
1 Composite Self-concordant Minimization. Volkan Cevher, Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL). Paris 6, Dec 11, 2013. Joint work with Quoc Tran-Dinh, Anastasios Kyrillidis, Yen-Huan Li.
2-3 Composite minimization (P): f convex and smooth, g convex and possibly nonsmooth. Motivation: problem (P) covers many practical LARGE-SCALE problems:
- Unconstrained basic LASSO / logistic regression
- Graphical model selection / latent variable graphical model selection
- Poisson imaging reconstruction / LASSO problem with unknown variance
- Low-rank recovery / clustering
- Atomic norm regularization / off-the-grid array processing
We need scalable algorithms.
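As a concrete instance of (P), a minimal numerical sketch (illustrative names and data, plain NumPy): the LASSO objective splits into a smooth quadratic f and a nonsmooth, convex l1 term g.

```python
import numpy as np

def f(x, A, b):
    """Smooth part: least-squares data fit, f(x) = 0.5 * ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * r @ r

def grad_f(x, A, b):
    """Gradient of the smooth part: A^T (Ax - b)."""
    return A.T @ (A @ x - b)

def g(x, lam):
    """Nonsmooth part: g(x) = lam * ||x||_1 (convex, prox-tractable)."""
    return lam * np.abs(x).sum()

# Sanity check: the analytic gradient matches central finite differences.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
x = rng.standard_normal(3)
eps = 1e-6
fd = np.array([(f(x + eps * e, A, b) - f(x - eps * e, A, b)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, grad_f(x, A, b), atol=1e-5)
```

Any of the listed applications fits the same f + g split, only with different choices of smooth term and regularizer.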
4-8 Composite minimization (P): modus operandi. f convex and smooth; g convex and possibly nonsmooth, with a tractable prox computation (example on the slides). This class of smooth functions (f) is well understood: fast gradient schemes (Nesterov's methods) and Newton/quasi-Newton schemes apply.
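For the canonical nonsmooth term g(x) = lam * ||x||_1, the tractable prox is soft-thresholding; a short sketch of the standard closed form (plain NumPy):

```python
import numpy as np

def prox_l1(z, step):
    """prox_{step * ||.||_1}(z): elementwise soft-thresholding.

    Solves argmin_x  step * ||x||_1 + 0.5 * ||x - z||^2 in closed form.
    """
    return np.sign(z) * np.maximum(np.abs(z) - step, 0.0)

# Components below the threshold are zeroed; the rest shrink toward 0.
out = prox_l1(np.array([3.0, -0.2, 1.0]), 0.5)
assert np.allclose(out, [2.5, 0.0, 0.5])
```

Similar closed forms or cheap subroutines exist for many structured norms, which is what "tractable prox" refers to here.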
9 Composite minimization (P): an uncharted region. f convex and smooth; g convex and possibly nonsmooth with tractable prox. Scalability is NOT great for fast gradient schemes (Nesterov's methods) or Newton/quasi-Newton schemes here.
10 Composite self-concordant minimization (P): f convex and self-concordant; g convex and possibly nonsmooth with tractable prox. Self-concordance is the key structure behind the interior point method.
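For reference, the defining inequality of this class (standard, following Nesterov): a convex, three-times differentiable f is self-concordant when, along every line,

```latex
\left| \varphi'''(t) \right| \;\le\; 2\,\varphi''(t)^{3/2},
\qquad \varphi(t) := f(x + t\,d), \quad x \in \operatorname{dom} f,\; d \in \mathbb{R}^n .
```

The log-barrier for linear inequalities and the log-determinant on the positive definite cone, used in the examples below, are canonical members of this class.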
11 Example: log-determinant for LMIs. Application: graphical model selection. Optimization problem.
12 Log-barrier for linear/quadratic inequalities: Poisson imaging reconstruction via TV regularization; basis pursuit denoising problem (BPDN) in barrier formulation; LASSO problem with unknown variance.
13 Composite self-concordant minimization (P): f convex and self-concordant; g convex and possibly nonsmooth with tractable prox. Our contributions:
i) a variable metric (path-following) forward-backward framework;
ii) a convergence theory without the Lipschitz-gradient assumption;
iii) novel variants and extensions for several applications, and the SCOPT package.
14 Basic algorithmic framework
15-20 A basic composite minimization framework. Main properties: a lower surrogate, an upper surrogate, and Hessian surrogates. The resulting scheme is ISTA; acceleration is possible (FISTA).
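The ISTA iteration mentioned above, on a tiny LASSO instance (illustrative sketch, plain NumPy; the step 1/L uses the global Lipschitz constant of the gradient, which is the point of contention on the next slides):

```python
import numpy as np

def ista(A, b, lam, iters=200):
    """ISTA for 0.5*||Ax-b||^2 + lam*||x||_1: gradient step, then prox."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - (A.T @ (A @ x - b)) / L    # forward (gradient) step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # backward (prox) step
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
obj = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + 0.1 * np.abs(x).sum()
x = ista(A, b, lam=0.1)
# The composite objective never exceeds its value at the zero start.
assert obj(x) <= obj(np.zeros(5))
```

FISTA replaces the plain gradient step with an extrapolated one but keeps the same prox and the same constant L.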
21-23 To adapt or not to adapt? L is a global worst-case constant.
24-25 To adapt or not to adapt? The variable metric proximal point operator.
26 A basic variable metric minimization framework: a proximal point scheme with variable metric [Bonnans, 1993], built on the variable metric proximal point operator. This brings additional accuracy vs. computation trade-offs.
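The variable metric proximal operator referred to here is the usual prox computed in the norm induced by a positive definite matrix H (e.g., a Hessian or Hessian approximation):

```latex
\operatorname{prox}_{g}^{H}(x) \;:=\; \arg\min_{y}\Big\{\, g(y) + \tfrac{1}{2}\,\|y - x\|_{H}^{2} \,\Big\},
\qquad \|u\|_{H}^{2} := u^{\top} H u, \quad H \succ 0 .
```

With H = I this reduces to the standard prox (ISTA); with H = \nabla^2 f(x_k) it yields proximal Newton steps, which is the accuracy/computation trade-off the slide refers to.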
27-30 Self-concordance vs. Lipschitz gradient + SC. Main properties: lower and upper surrogates, and Hessian surrogates, which are global in the Lipschitz case but local in the self-concordant case (stated via the local norm and the utility functions).
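The local norm and the standard self-concordant utility functions behind these surrogates (Nesterov's omega functions) read:

```latex
\|u\|_{x} := \big(u^{\top}\nabla^{2} f(x)\,u\big)^{1/2}, \qquad
\omega(t) := t - \ln(1+t), \qquad \omega_{*}(t) := -t - \ln(1-t);
\\[4pt]
\omega\big(\|y-x\|_{x}\big) \;\le\; f(y) - f(x) - \nabla f(x)^{\top}(y-x)
\;\le\; \omega_{*}\big(\|y-x\|_{x}\big),
```

where the upper bound holds for \|y-x\|_x < 1. These local bounds replace the global quadratic bounds of the Lipschitz-gradient setting.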
31 Self-concordance: a mathematical tool. Main properties: lower surrogate, upper surrogate, Hessian surrogates. A new variable metric framework with rigorous convergence guarantees, including several algorithms: Newton, quasi-Newton, and gradient methods...
32-35 Our composite self-concordant minimization framework: a proximal Newton scheme. Key contribution: the step-size selection procedure. How to compute the proximal Newton direction? Via fast gradient schemes (Nesterov's methods) or Newton/quasi-Newton schemes.
36-38 How do we compute the step size? From the upper surrogate of f, the convexity of g, and the optimality condition of the subproblem.
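Combining the upper surrogate with the subproblem optimality condition gives an explicit damped step of the standard damped-Newton form (this matches the companion arXiv paper; treat the exact constants as the paper's):

```latex
\alpha_{k} \;=\; \frac{1}{1+\lambda_{k}},
\qquad
\lambda_{k} \;:=\; \|d_{k}\|_{x_{k}}
\;=\; \big(d_{k}^{\top}\,\nabla^{2} f(x_{k})\, d_{k}\big)^{1/2},
```

where d_k is the proximal Newton direction. Once lambda_k is small enough, the full step alpha_k = 1 is accepted and quadratic convergence sets in.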
39-41 Analytic complexity: the worst-case complexity to obtain an epsilon-approximate solution, covering both global and local convergence. The quadratic convergence region can be calculated explicitly, and a line-search enhancement can accelerate the convergence.
42-47 Graphical model selection. Objective; gradient and Hessian (large-scale, with special structure). A dual approach for solving the subproblem (SP): the primal subproblem is dualized into an unconstrained LASSO problem (SPGL), requiring no Cholesky decomposition and no matrix inversion. How to compute the proximal Newton decrement?
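The gradient structure exploited here: for the graph-selection smooth term f(X) = -log det(X) + tr(SX), one has the well-known gradient S - X^{-1}. A small sketch checking this (illustrative only; the dense inverse below is exactly what the actual solver avoids forming):

```python
import numpy as np

def f(X, S):
    """Graph-selection smooth part: -log det(X) + trace(S X), X > 0."""
    sign, logdet = np.linalg.slogdet(X)
    assert sign > 0, "X must be positive definite"
    return -logdet + np.trace(S @ X)

def grad_f(X, S):
    """Analytic gradient: S - X^{-1} (dense inverse here, for checking only)."""
    return S - np.linalg.inv(X)

# Finite-difference check of a directional derivative along a symmetric E.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
S = B @ B.T / 4
X = np.eye(4)
E, eps = np.zeros((4, 4)), 1e-6
E[1, 2] = E[2, 1] = 1.0
fd = (f(X + eps * E, S) - f(X - eps * E, S)) / (2 * eps)
assert np.isclose(fd, (grad_f(X, S) * E).sum(), atol=1e-5)
```

The Hessian acts as H[E] = X^{-1} E X^{-1}, so Hessian-vector products need only multiplications with X^{-1}, which the dual/SPGL route supplies without explicit inversion.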
48-49 Graphical model selection: numerical examples. Our method vs. QUIC [Hsieh2011]:
- QUIC subproblem solver: a special block-coordinate descent
- Our subproblem solver: general proximal algorithms
On average a 5x acceleration (up to 15x) over Matlab QUIC. Convergence behaviour [rho = 0.5]: Lymph [p = 587] (left), Leukemia [p = 1255] (right). Step-size selection strategies: Arabidopsis [p = 834], Leukemia [p = 1255], Hereditary [p = 1869].
*Details: A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions, ICML 2013, and lions.epfl.ch/publications.
50 Composite minimization (P): alternatives? f convex and smooth; g convex and possibly nonsmooth. Existing numerical approaches:
- Splitting methods
- Forward-backward: applicable if f has a Lipschitz gradient
- Douglas-Rachford decomposition: f and g have tractable proximity operators
- Augmented Lagrangian methods (e.g., D-R again)
Prox operators of self-concordant functions are costly!
51 Our cheaper variable metric strategies: a proximal gradient scheme* (how to compute the search direction? no Lipschitz assumption), a new predictor-corrector scheme (with local linear convergence*), and a proximal quasi-Newton scheme (BFGS updates)*.
*Details: Composite Self-Concordant Minimization, arXiv and lions.epfl.ch/publications.
52-60 New theory: local linear convergence of the PG method. Graph learning example: Lymph [p = 587] (including a "lucky" BB step size). The theory is based on the notion of restricted strong convexity: the convergence depends on the restricted condition number rather than the full condition number. Further example: heteroscedastic LASSO [rho decreases from left to right].
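Schematically, restricted strong convexity yields a contraction of the form below (this is a generic template of such rates, not the paper's exact constants, which live in the arXiv reference above):

```latex
F(x_{k+1}) - F^{\star} \;\le\; \Big(1 - c\,\frac{\mu_{r}}{L_{r}}\Big)\big(F(x_{k}) - F^{\star}\big),
\qquad c \in (0, 1],
```

where mu_r and L_r are restricted curvature bounds over the relevant set, so mu_r / L_r is the restricted condition number driving the rate.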
61-65 Proximal gradient scheme: new engineering. A greedy enhancement of the prox can slow down convergence; a simple decision rule avoids this, at practically no cost if implemented carefully.
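One way to realize such a "simple decision" (an illustrative guess at the idea, not the paper's exact rule): compute both the plain step and the greedy candidate, and keep whichever does not increase the composite objective.

```python
import numpy as np

def pick_step(F, x_plain, x_greedy):
    """Accept the greedy candidate only when it does not worsen the objective.

    F is the composite objective F(x) = f(x) + g(x); the overhead is one
    extra function evaluation per iteration if F(x_plain) is cached.
    """
    return x_greedy if F(x_greedy) <= F(x_plain) else x_plain

F = lambda x: np.sum((x - 1.0) ** 2)          # toy objective with minimum at 1
assert np.allclose(pick_step(F, np.array([0.5]), np.array([0.9])), [0.9])
assert np.allclose(pick_step(F, np.array([0.9]), np.array([0.0])), [0.9])
```

Guarding the enhancement this way keeps the monotone decrease of the base scheme, which is why the greedy step can no longer slow convergence.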
66-67 Poisson imaging reconstruction via TV regularization. Our method vs. SPIRAL-TAP [Harmany2012]: on average a 10x acceleration (up to 250x) over SPIRAL-TAP, with better accuracy.
68 Barrier extensions
69-71 Constrained convex problems, with examples. Main idea: solve a sequence of composite self-concordant problems.
72 How does it work? Main idea: solve a sequence of composite self-concordant problems (as opposed to DCO). Each iteration k requires two simultaneous updates:
- Update the penalty parameter
- Update the iterate (the subproblem can be solved approximately)
*Details: An Inexact Proximal Path-Following Algorithm for Constrained Convex Minimization, optimization-online and lions.epfl.ch/publications.
73 How does it converge? The algorithm consists of two phases:
- Phase I: find a suitable starting point
- Phase II: perform the path-following iterations
Tracking properties in Phase II: the penalty parameter decreases by at least a fixed factor at each iteration k, and the tracking error of the iterate sequence is controlled. Worst-case complexity: Phase I requires at most a bounded number of iterations to produce a starting point for Phase II, and the worst-case complexity of Phase II to reach an epsilon-solution is the same as in standard path-following methods [see Nesterov2004].
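For orientation, the standard path-following complexity being matched here [Nesterov 2004]: with a nu-self-concordant barrier, reaching an epsilon-solution takes on the order of

```latex
O\big(\sqrt{\nu}\,\log(1/\varepsilon)\big)
```

path-following iterations, where nu is the barrier parameter of the constraint set.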
74 Proximal path-following. Upshot: no heavy lifting! Proximal path-following for conic programming with rigorous guarantees. Example: low-rank SDP matrix approximation, with a regularization parameter in (0, 1) and M the given input matrix (#variables, #constraints).
75 Proximal path-following. Upshot: the desired scaling! Proximal path-following for conic programming with rigorous guarantees. Example: max-norm clustering (DCO requires splitting).
76 Proximal path-following. Upshot: automatic regularization! Proximal path-following for conic programming with rigorous guarantees. Main idea: solve a sequence of composite self-concordant problems (as opposed to DCO). Examples: max-norm clustering; graph selection.
77 Conclusions
78 Conclusions: a new variable metric proximal-point framework for composite self-concordant minimization, plus extensions.
79-80 Conclusions. Highlights:
- Globalization: a new strategy for finding the step size explicitly; the strategy motivates a forward-looking line search
- Search direction: efficient (a strongly convex program)
- Local convergence: quadratic convergence without boundedness of the Hessian; an analytic quadratic convergence region
Practical contributions (this talk): the SCOPT package has quasi-Newton / first- and second-order variants, leverages fast proximal solvers for g(x) (structured norms etc.), and is robust to subproblem solver accuracy. SCOPT FTW
Mathematical Programming and Research Methods (Part II) 4. Convexity and Optimization Massimiliano Pontil (based on previous lecture by Andreas Argyriou) 1 Today s Plan Convex sets and functions Types
More informationConditional gradient algorithms for machine learning
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationInverse KKT Motion Optimization: A Newton Method to Efficiently Extract Task Spaces and Cost Parameters from Demonstrations
Inverse KKT Motion Optimization: A Newton Method to Efficiently Extract Task Spaces and Cost Parameters from Demonstrations Peter Englert Machine Learning and Robotics Lab Universität Stuttgart Germany
More informationA Truncated Newton Method in an Augmented Lagrangian Framework for Nonlinear Programming
A Truncated Newton Method in an Augmented Lagrangian Framework for Nonlinear Programming Gianni Di Pillo (dipillo@dis.uniroma1.it) Giampaolo Liuzzi (liuzzi@iasi.cnr.it) Stefano Lucidi (lucidi@dis.uniroma1.it)
More informationME 555: Distributed Optimization
ME 555: Distributed Optimization Duke University Spring 2015 1 Administrative Course: ME 555: Distributed Optimization (Spring 2015) Instructor: Time: Location: Office hours: Website: Soomin Lee (email:
More informationOn a first-order primal-dual algorithm
On a first-order primal-dual algorithm Thomas Pock 1 and Antonin Chambolle 2 1 Institute for Computer Graphics and Vision, Graz University of Technology, 8010 Graz, Austria 2 Centre de Mathématiques Appliquées,
More informationA Multilevel Proximal Gradient Algorithm for a Class of Composite Optimization Problems
A Multilevel Proximal Gradient Algorithm for a Class of Composite Optimization Problems Panos Parpas May 9, 2017 Abstract Composite optimization models consist of the minimization of the sum of a smooth
More informationDistributed Optimization for Machine Learning
Distributed Optimization for Machine Learning Martin Jaggi EPFL Machine Learning and Optimization Laboratory mlo.epfl.ch AI Summer School - MSR Cambridge - July 5 th Machine Learning Methods to Analyze
More informationModified Iterative Method for Recovery of Sparse Multiple Measurement Problems
Journal of Electrical Engineering 6 (2018) 124-128 doi: 10.17265/2328-2223/2018.02.009 D DAVID PUBLISHING Modified Iterative Method for Recovery of Sparse Multiple Measurement Problems Sina Mortazavi and
More informationLearning with infinitely many features
Learning with infinitely many features R. Flamary, Joint work with A. Rakotomamonjy F. Yger, M. Volpi, M. Dalla Mura, D. Tuia Laboratoire Lagrange, Université de Nice Sophia Antipolis December 2012 Example
More information2. Linear Regression and Gradient Descent
Pattern Recognition And Machine Learning - EPFL - Fall 2015 Emtiyaz Khan, Timur Bagautdinov, Carlos Becker, Ilija Bogunovic & Ksenia Konyushkova 2. Linear Regression and Gradient Descent 2.1 Goals The
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationTime-varying Linear Regression with Total Variation Regularization
ime-varying Linear Regression with otal Variation Regularization Matt Wytock Machine Learning Department Carnegie Mellon University mwytock@cs.cmu.edu Abstract We consider modeling time series data with
More informationOne-Pass Ranking Models for Low-Latency Product Recommendations
One-Pass Ranking Models for Low-Latency Product Recommendations Martin Saveski @msaveski MIT (Amazon Berlin) One-Pass Ranking Models for Low-Latency Product Recommendations Amazon Machine Learning Team,
More informationStructured Learning. Jun Zhu
Structured Learning Jun Zhu Supervised learning Given a set of I.I.D. training samples Learn a prediction function b r a c e Supervised learning (cont d) Many different choices Logistic Regression Maximum
More informationAdvanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude
Advanced phase retrieval: maximum likelihood technique with sparse regularization of phase and amplitude A. Migukin *, V. atkovnik and J. Astola Department of Signal Processing, Tampere University of Technology,
More informationSolving basis pursuit: infeasible-point subgradient algorithm, computational comparison, and improvements
Solving basis pursuit: infeasible-point subgradient algorithm, computational comparison, and improvements Andreas Tillmann ( joint work with D. Lorenz and M. Pfetsch ) Technische Universität Braunschweig
More informationProjection-Based Methods in Optimization
Projection-Based Methods in Optimization Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences University of Massachusetts Lowell Lowell, MA
More informationIterative Shrinkage/Thresholding g Algorithms: Some History and Recent Development
Iterative Shrinkage/Thresholding g Algorithms: Some History and Recent Development Mário A. T. Figueiredo Instituto de Telecomunicações and Instituto Superior Técnico, Technical University of Lisbon PORTUGAL
More informationA new mini-max, constrained optimization method for solving worst case problems
Carnegie Mellon University Research Showcase @ CMU Department of Electrical and Computer Engineering Carnegie Institute of Technology 1979 A new mini-max, constrained optimization method for solving worst
More informationFast Newton methods for the group fused lasso
Fast Newton methods for the group fused lasso Matt Wytock Machine Learning Dept. Carnegie Mellon University Pittsburgh, PA Suvrit Sra Machine Learning Dept. Carnegie Mellon University Pittsburgh, PA J.
More informationPRIMAL-DUAL INTERIOR POINT METHOD FOR LINEAR PROGRAMMING. 1. Introduction
PRIMAL-DUAL INTERIOR POINT METHOD FOR LINEAR PROGRAMMING KELLER VANDEBOGERT AND CHARLES LANNING 1. Introduction Interior point methods are, put simply, a technique of optimization where, given a problem
More informationA fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009]
A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009] Yongjia Song University of Wisconsin-Madison April 22, 2010 Yongjia Song
More informationEllipsoid Algorithm :Algorithms in the Real World. Ellipsoid Algorithm. Reduction from general case
Ellipsoid Algorithm 15-853:Algorithms in the Real World Linear and Integer Programming II Ellipsoid algorithm Interior point methods First polynomial-time algorithm for linear programming (Khachian 79)
More informationA fast weighted stochastic gradient descent algorithm for image reconstruction in 3D computed tomography
A fast weighted stochastic gradient descent algorithm for image reconstruction in 3D computed tomography Davood Karimi, Rabab Ward Department of Electrical and Computer Engineering University of British
More informationConvex and Distributed Optimization. Thomas Ropars
>>> Presentation of this master2 course Convex and Distributed Optimization Franck Iutzeler Jérôme Malick Thomas Ropars Dmitry Grishchenko from LJK, the applied maths and computer science laboratory and
More informationA Comparative Study & Analysis of Image Restoration by Non Blind Technique
A Comparative Study & Analysis of Image Restoration by Non Blind Technique Saurav Rawat 1, S.N.Tazi 2 M.Tech Student, Assistant Professor, CSE Department, Government Engineering College, Ajmer Abstract:
More information3 Interior Point Method
3 Interior Point Method Linear programming (LP) is one of the most useful mathematical techniques. Recent advances in computer technology and algorithms have improved computational speed by several orders
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationSnapVX: A Network-Based Convex Optimization Solver
Journal of Machine Learning Research 18 (2017) 1-5 Submitted 9/15; Revised 10/16; Published 2/17 SnapVX: A Network-Based Convex Optimization Solver David Hallac Christopher Wong Steven Diamond Abhijit
More informationCS 395T Lecture 12: Feature Matching and Bundle Adjustment. Qixing Huang October 10 st 2018
CS 395T Lecture 12: Feature Matching and Bundle Adjustment Qixing Huang October 10 st 2018 Lecture Overview Dense Feature Correspondences Bundle Adjustment in Structure-from-Motion Image Matching Algorithm
More informationCrash-Starting the Simplex Method
Crash-Starting the Simplex Method Ivet Galabova Julian Hall School of Mathematics, University of Edinburgh Optimization Methods and Software December 2017 Ivet Galabova, Julian Hall Crash-Starting Simplex
More information