
Convex Optimization CMU-10725
Conjugate Direction Methods
Barnabás Póczos & Ryan Tibshirani

Conjugate Direction Methods

Books to Read
David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming
Nesterov: Introductory Lectures on Convex Optimization
Bazaraa, Sherali, Shetty: Nonlinear Programming
Dimitri P. Bertsekas: Nonlinear Programming

Motivation
Conjugate direction methods can be regarded as lying between the method of steepest descent and Newton's method.
Motivation: steepest descent is slow. Goal: accelerate it!
Newton's method is fast, BUT we need to calculate the inverse of the Hessian matrix.
Is there something between steepest descent and Newton's method?

Conjugate Direction Methods
Goal: accelerate the convergence rate of steepest descent while avoiding the high computational cost of Newton's method.
Originally developed for solving the quadratic problem
    minimize f(x) = (1/2) x^T Q x - b^T x,   with Q symmetric positive definite.
Equivalently, our goal is to solve the linear system Q x = b.
Conjugate direction methods can solve this problem in at most n iterations (for large n, usually fewer are enough).

Conjugate Direction Methods
An algorithm for the numerical solution of linear systems whose matrix Q is symmetric and positive definite.
An iterative method, so it can be applied to systems that are too large to be handled by direct methods (such as the Cholesky decomposition).
Also an algorithm for seeking minima of nonlinear functions.

Conjugate Directions
Definition [Q-conjugate directions]: given a symmetric matrix Q, two vectors d_1 and d_2 are Q-conjugate if d_1^T Q d_2 = 0; a finite set d_0, ..., d_k is Q-conjugate if d_i^T Q d_j = 0 for all i ≠ j.
In the applications that we consider, the matrix Q will be positive definite, but this is not inherent in the basic definition.
If Q = 0, any two vectors are conjugate; if Q = I, conjugacy is equivalent to the usual notion of orthogonality.
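
As a small numerical illustration (not part of the slides), here is a Python sketch that builds a Q-conjugate pair by Gram-Schmidt-style conjugation and checks the definition; the 2x2 matrix Q below is made up for the example.

    import numpy as np

    def conjugate_basis(Q, vectors):
        """Turn linearly independent vectors into a Q-conjugate set
        by Gram-Schmidt-style conjugation (illustrative sketch)."""
        dirs = []
        for v in vectors:
            d = np.asarray(v, dtype=float).copy()
            for prev in dirs:
                # remove the Q-component of d along each previous direction
                d = d - (prev @ Q @ d) / (prev @ Q @ prev) * prev
            dirs.append(d)
        return dirs

    Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (illustrative)
    d0, d1 = conjugate_basis(Q, np.eye(2))
    print(d0 @ Q @ d1)    # ~0.0: d0 and d1 are Q-conjugate
    print(d0 @ d1)        # generally nonzero: not orthogonal in the usual sense

Note that for Q = I the conjugation above reduces to ordinary Gram-Schmidt orthogonalization, matching the remark on the slide.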

Linear Independence Lemma
Lemma [Linear Independence]: if Q is positive definite and the nonzero vectors d_0, ..., d_k are mutually Q-conjugate, then they are linearly independent.
Proof: [Proof by contradiction] suppose some nontrivial combination Σ_i c_i d_i = 0; multiplying by d_j^T Q gives c_j d_j^T Q d_j = 0, and positive definiteness forces c_j = 0 for every j, a contradiction.

Why is Q-conjugacy useful?

Quadratic Problem
Goal: minimize f(x) = (1/2) x^T Q x - b^T x, where Q is symmetric positive definite.
The unique solution to this problem is also the unique solution to the linear equation Q x = b (set the gradient ∇f(x) = Q x - b to zero).

Importance of Q-conjugacy
Expand the solution in a basis: x* = α_0 d_0 + ... + α_{n-1} d_{n-1}. With an ordinary orthogonal basis the coefficients would require x* itself, which we do not know; we only know that Qx* = b.
Therefore, Qx* = b is the step where standard orthogonality is not enough anymore; we need to use Q-conjugacy.

Importance of Q-conjugacy
No need to do matrix inversion! We only need to calculate inner products.
This can be generalized further in such a way that the starting point of the iteration can be an arbitrary x_0.
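
The step behind "only inner products" can be written out explicitly; this is the standard derivation in the notation above (it is not part of the transcribed slide). In LaTeX:

    x^* = \sum_{i=0}^{n-1} \alpha_i d_i, \qquad
    d_k^\top Q x^* = d_k^\top b
    \;\Longrightarrow\;
    \sum_{i=0}^{n-1} \alpha_i \, d_k^\top Q d_i = \alpha_k \, d_k^\top Q d_k = d_k^\top b
    \;\Longrightarrow\;
    \alpha_k = \frac{d_k^\top b}{d_k^\top Q d_k},

so each coefficient comes from two inner products, with no matrix inversion.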

Conjugate Direction Theorem
In the previous slides we expanded x* in the Q-conjugate directions and obtained α_k = d_k^T b / (d_k^T Q d_k) when starting from x_0 = 0.
Theorem [Conjugate Direction Theorem]: let d_0, ..., d_{n-1} be nonzero Q-conjugate vectors. For any starting point x_0, the sequence generated by
    x_{k+1} = x_k + α_k d_k      [update rule]
    α_k = - g_k^T d_k / (d_k^T Q d_k),   where g_k = Q x_k - b   [gradient of f]
reaches the unique solution x* of Qx = b in n steps, i.e. x_n = x*.
No need to do matrix inversion! We only need to calculate inner products.
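
A minimal Python sketch of this update rule, assuming the Q-conjugate directions are already available (the matrix and the two directions below are the illustrative ones from the earlier conjugation sketch):

    import numpy as np

    def conjugate_direction_solve(Q, b, dirs, x0):
        """One pass of the conjugate direction update over the given directions."""
        x = np.asarray(x0, dtype=float)
        for d in dirs:
            g = Q @ x - b                    # g_k = Q x_k - b, the gradient of f
            alpha = -(g @ d) / (d @ Q @ d)   # alpha_k from the theorem
            x = x + alpha * d                # x_{k+1} = x_k + alpha_k d_k
        return x

    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    dirs = [np.array([1.0, 0.0]), np.array([-0.25, 1.0])]   # Q-conjugate pair
    x_star = conjugate_direction_solve(Q, b, dirs, np.zeros(2))
    print(np.allclose(Q @ x_star, b))        # True: x_n = x* after n = 2 steps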

Proof
Since the directions d_0, ..., d_{n-1} are linearly independent, we can write x* - x_0 = α'_0 d_0 + ... + α'_{n-1} d_{n-1} with α'_k = d_k^T Q (x* - x_0) / (d_k^T Q d_k). Therefore, it is enough to prove that with these α_k values we have α_k = α'_k.
We already know that x_k - x_0 lies in span{d_0, ..., d_{k-1}}, so d_k^T Q (x_k - x_0) = 0 by conjugacy. Therefore, α'_k = d_k^T Q (x* - x_k) / (d_k^T Q d_k) = -g_k^T d_k / (d_k^T Q d_k) = α_k, using g_k = Q x_k - b = Q (x_k - x*). Q.E.D.

Another Motivation for Q-conjugacy
Goal: minimize f(x) = (1/2) x^T Q x - b^T x over points of the form x = α_0 d_0 + ... + α_{n-1} d_{n-1}.
By Q-conjugacy the cross terms vanish; therefore, the problem splits into n separate 1-dimensional optimization problems!
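
The decoupling can be written out explicitly (a standard computation, not visible in the transcript). In LaTeX:

    f\Big(\sum_i \alpha_i d_i\Big)
    = \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, d_i^\top Q d_j - \sum_i \alpha_i \, b^\top d_i
    = \sum_i \Big( \tfrac{1}{2} \alpha_i^2 \, d_i^\top Q d_i - \alpha_i \, b^\top d_i \Big),

since d_i^\top Q d_j = 0 for i ≠ j; each α_i can then be found by a separate one-dimensional minimization.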

Expanding Subspace Theorem
Let B_k = span{d_0, ..., d_{k-1}} denote the subspace spanned by the first k directions; these subspaces expand as k grows.
Theorem [Expanding Subspace Theorem]: let {d_i} be a sequence of nonzero Q-conjugate vectors. For any x_0, the sequence generated by x_{k+1} = x_k + α_k d_k (with the α_k of the conjugate direction method) has the property that x_k minimizes f(x) = (1/2) x^T Q x - b^T x not only on the line x = x_{k-1} + α d_{k-1}, but on the whole affine set x_0 + B_k.

Expanding Subspace Theorem: Proof
[Source: D. Luenberger, Yinyu Ye: Linear and Nonlinear Programming]

Proof
It suffices to show that the gradient g_k is orthogonal to B_k, i.e. g_k^T d_i = 0 for i < k; we prove this by induction on k.
By definition, g_{k+1} = Q x_{k+1} - b = g_k + α_k Q d_k. Therefore, g_{k+1}^T d_k = g_k^T d_k + α_k d_k^T Q d_k = 0 by the choice of α_k.
For i < k we have g_{k+1}^T d_i = g_k^T d_i + α_k d_k^T Q d_i. Since d_k^T Q d_i = 0 by Q-conjugacy and g_k^T d_i = 0 [by the induction assumption], therefore g_{k+1}^T d_i = 0 for all i ≤ k [the case i = k we have proved above]. Q.E.D.

Corollary
Corollary of the Expanding Subspace Theorem: in the conjugate direction algorithm the gradients satisfy g_k^T d_i = 0 for all i < k, i.e. the current gradient is orthogonal to all previous search directions.

Expanding Subspace Theorem
[Illustration from D. Luenberger, Yinyu Ye: Linear and Nonlinear Programming]

THE CONJUGATE GRADIENT METHOD

THE CONJUGATE GRADIENT METHOD
Given d_0, ..., d_{n-1}, we already have an update rule for α_k. How should we choose the vectors d_0, ..., d_{n-1}?
The conjugate gradient method is a conjugate direction method that selects the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses. The conjugate directions are not specified beforehand; rather, they are determined sequentially at each step of the iteration.

THE CONJUGATE GRADIENT METHOD
Advantages: a simple update rule, and because the directions are based on the gradients, the process makes good uniform progress toward the solution at every step. For arbitrary sequences of conjugate directions the progress may be slight until the final few steps.

Conjugate Gradient Algorithm
Starting from an arbitrary x_0, set g_0 = Q x_0 - b and d_0 = -g_0, then repeat for k = 0, 1, ..., n-1:
    α_k = - g_k^T d_k / (d_k^T Q d_k)
    x_{k+1} = x_k + α_k d_k
    g_{k+1} = Q x_{k+1} - b
    β_k = g_{k+1}^T Q d_k / (d_k^T Q d_k)
    d_{k+1} = -g_{k+1} + β_k d_k
The CGA is only slightly more complicated to implement than the method of steepest descent, but it converges in a finite number of steps on quadratic problems. In contrast to Newton's method, there is no need for matrix inversion.
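
The recurrences above translate into a short Python sketch for the quadratic case (a minimal illustration, using the same made-up test matrix as before):

    import numpy as np

    def conjugate_gradient(Q, b, x0, n_steps=None):
        """CG for minimizing (1/2) x^T Q x - b^T x, i.e. solving Q x = b."""
        x = np.asarray(x0, dtype=float)
        g = Q @ x - b                  # g_0
        d = -g                         # d_0: start with steepest descent
        for _ in range(n_steps or len(b)):
            Qd = Q @ d
            alpha = -(g @ d) / (d @ Qd)
            x = x + alpha * d          # x_{k+1} = x_k + alpha_k d_k
            g = Q @ x - b              # g_{k+1}
            beta = (g @ Qd) / (d @ Qd)
            d = -g + beta * d          # d_{k+1}, Q-conjugate to the previous directions
        return x

    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    x = conjugate_gradient(Q, b, np.zeros(2))
    print(np.allclose(Q @ x, b))       # True after n = 2 steps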

Conjugate Gradient Theorem
To verify that the algorithm is a conjugate direction algorithm, all we need is to verify that the vectors d_0, ..., d_k are Q-orthogonal.
Theorem [Conjugate Gradient Theorem]: the conjugate gradient algorithm is a conjugate direction method, and moreover
a) span{g_0, ..., g_k} = span{g_0, Q g_0, ..., Q^k g_0}
b) span{d_0, ..., d_k} = span{g_0, Q g_0, ..., Q^k g_0}
c) d_k^T Q d_i = 0 for i < k
d) α_k = g_k^T g_k / (d_k^T Q d_k)   [only α_k needs the matrix Q in the algorithm!]
e) β_k = g_{k+1}^T g_{k+1} / (g_k^T g_k)

Proofs
It would be nice to discuss them, but there is no time.

EXTENSION TO NONQUADRATIC PROBLEMS

EXTENSION TO NONQUADRATIC PROBLEMS
Goal: minimize a general (nonquadratic) function f. Do a quadratic approximation, replacing Q by the Hessian ∇²f(x_k); this is similar to Newton's method [f is approximated by a quadratic function].
When applied to nonquadratic problems, conjugate gradient methods will not usually terminate within n steps. After n steps, we can restart the process from the current point and run the algorithm for another n steps.

Conjugate Gradient Algorithm for Nonquadratic Functions
Step 1: starting at x_0, compute g_0 = ∇f(x_0) and set d_0 = -g_0.
Step 2: for k = 0, ..., n-1, with H_k = ∇²f(x_k):
    α_k = - g_k^T d_k / (d_k^T H_k d_k),   x_{k+1} = x_k + α_k d_k,   g_{k+1} = ∇f(x_{k+1}),
    β_k = g_{k+1}^T H_k d_k / (d_k^T H_k d_k),   d_{k+1} = -g_{k+1} + β_k d_k   (unless k = n-1).
Step 3: replace x_0 by x_n and return to Step 1.
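
A minimal Python sketch of this Hessian-based scheme with restarts, following the reconstruction above; the test function and all names are illustrative:

    import numpy as np

    def cg_nonquadratic(grad, hess, x0, n_cycles=20, tol=1e-8):
        """CG for a nonquadratic f: closed-form alpha via the Hessian, restart every n steps."""
        x = np.asarray(x0, dtype=float)
        n = len(x)
        for _ in range(n_cycles):             # Step 3: restart after n inner steps
            g = grad(x)
            d = -g                            # Step 1
            for _ in range(n):                # Step 2
                if np.linalg.norm(g) < tol:
                    return x
                Hd = hess(x) @ d              # Hessian evaluated at the current point
                alpha = -(g @ d) / (d @ Hd)   # no line search needed
                x = x + alpha * d
                g_new = grad(x)
                beta = (g_new @ Hd) / (d @ Hd)
                d = -g_new + beta * d
                g = g_new
        return x

    # Illustrative test function: f(x) = x_1^4 + x_2^2
    grad = lambda x: np.array([4.0 * x[0] ** 3, 2.0 * x[1]])
    hess = lambda x: np.array([[12.0 * x[0] ** 2, 0.0], [0.0, 2.0]])
    print(cg_nonquadratic(grad, hess, np.array([1.0, 1.0])))   # approx. [0, 0]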

Properties of CGA
An attractive feature of the algorithm is that, just as in the pure form of Newton's method, no line searching is required at any stage. Also, the algorithm converges in a finite number of steps for a quadratic problem.
The undesirable features are that the Hessian matrix must be evaluated at each point, and that the algorithm is not, in this form, globally convergent.

LINE SEARCH METHODS

Fletcher-Reeves Method
Step 1: given x_0, compute g_0 = ∇f(x_0) and set d_0 = -g_0.
Step 2: for k = 0, ..., n-1: find α_k by a line search that minimizes f(x_k + α d_k); set x_{k+1} = x_k + α_k d_k and g_{k+1} = ∇f(x_{k+1}); unless k = n-1, set
    β_k = g_{k+1}^T g_{k+1} / (g_k^T g_k),   d_{k+1} = -g_{k+1} + β_k d_k.
Step 3: replace x_0 by x_n and return to Step 1.

Fletcher-Reeves Method
A line search method: the Hessian is not used in the algorithm.
In the quadratic case it is identical to the original conjugate direction algorithm.

Polak-Ribière Method
Same as the Fletcher-Reeves method, BUT with
    β_k = g_{k+1}^T (g_{k+1} - g_k) / (g_k^T g_k).
Again this leads to a value identical to the standard formula in the quadratic case. Experimental evidence seems to favor the Polak-Ribière method over other methods of this general type.
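
A minimal Python sketch of the two line-search variants, with a simple backtracking search standing in for the exact one-dimensional minimization; the test function and all names are illustrative:

    import numpy as np

    def backtracking(f, x, d, g, t=1.0, rho=0.5, c=1e-4):
        """Backtracking (Armijo) line search; d is assumed to be a descent direction."""
        while f(x + t * d) > f(x) + c * t * (g @ d):
            t *= rho
        return t

    def nonlinear_cg(f, grad, x0, variant="PR", n_cycles=50, tol=1e-8):
        """Fletcher-Reeves ('FR') or Polak-Ribiere ('PR') CG with restart every n steps."""
        x = np.asarray(x0, dtype=float)
        n = len(x)
        for _ in range(n_cycles):
            g = grad(x)
            d = -g
            for _ in range(n):
                if np.linalg.norm(g) < tol:
                    return x
                if g @ d >= 0:                 # safeguard: reset to steepest descent
                    d = -g
                alpha = backtracking(f, x, d, g)
                x = x + alpha * d
                g_new = grad(x)
                if variant == "FR":
                    beta = (g_new @ g_new) / (g @ g)
                else:                          # Polak-Ribiere
                    beta = (g_new @ (g_new - g)) / (g @ g)
                d = -g_new + beta * d
                g = g_new
        return x

    # Illustrative test: a simple ill-scaled quadratic-like function
    f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
    grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
    print(nonlinear_cg(f, grad, np.zeros(2)))  # approx. [1, -2]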

Convergence Rate
Under some conditions the line search method is globally convergent.
Under some conditions the rate per restart cycle is quadratic, ||x_{k+n} - x*|| ≤ c ||x_k - x*||^2 [since one complete n-step cycle solves a quadratic problem similarly to the Newton method].

Numerical Experiments
[Figure] A comparison of gradient descent with optimal step size (in green) and the conjugate vector method (in red) for minimizing a quadratic function. Conjugate gradient converges in at most n steps (here n = 2).

Summary
Conjugate direction methods: conjugate directions, minimizing quadratic functions.
Conjugate gradient methods for nonquadratic functions: line search methods (Fletcher-Reeves, Polak-Ribière).