A Derivative-Free Approximate Gradient Sampling Algorithm for Finite Minimax Problems

1 / 33 A Derivative-Free Approximate Gradient Sampling Algorithm for Finite Minimax Problems. Speaker: Julie Nutini. Joint work with Warren Hare, University of British Columbia (Okanagan). III Latin American Workshop on Optimization and Control, January 10-13, 2012.

2 / 33 Table of Contents 1 Introduction 2 AGS Algorithm 3 Robust AGS Algorithm 4 Numerical Results 5 Conclusion

3 / 33 Setting We assume that our problem is of the form min_x f(x), where f(x) = max{f_i(x) : i = 1, ..., N}, where each f_i is continuously differentiable, i.e., f_i ∈ C^1, BUT we cannot compute ∇f_i.

4 / 33 Active Set / Gradients Definition We define the active set of f at a point x̄ to be the set of indices A(x̄) = {i : f(x̄) = f_i(x̄)}. The set of active gradients of f at x̄ is denoted by {∇f_i(x̄)}_{i ∈ A(x̄)}.
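
As a small illustration (not from the slides), the active set can be computed numerically once the values f_i(x) are available; the tolerance tol below is an assumption, since exact equality f(x̄) = f_i(x̄) is rarely attained in floating point.

```python
import numpy as np

def active_set(fs, x, tol=1e-6):
    """Indices i with f_i(x) within tol of f(x) = max_i f_i(x).

    fs is a list of callables f_i; tol is a hypothetical numerical
    activity tolerance (exact equality replaced by a threshold).
    """
    vals = np.array([fi(x) for fi in fs])
    return np.flatnonzero(vals >= vals.max() - tol)
```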

5 / 33 Clarke Subdifferential for Finite Max Function Proposition (Proposition 2.3.12, Clarke '90) Let f = max{f_i : i = 1, ..., N}. If f_i ∈ C^1 for each i ∈ A(x̄), then ∂f(x̄) = conv{∇f_i(x̄)}_{i ∈ A(x̄)}.

6 / 33 Method of Steepest Descent
1. Direction: Set d^k = −Proj(0 | ∂f(x^k)).
2. Step Length: Find t_k by solving min_{t_k > 0} f(x^k + t_k d^k).
3. Update: Set x^{k+1} = x^k + t_k d^k. Set k = k + 1. Loop.
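
Here Proj(0 | ·) denotes the projection of 0 onto a convex set, i.e., its minimum-norm point; for the convex hull of finitely many gradients it can be found by solving a small quadratic program over the unit simplex. The sketch below is one possible Python/SciPy implementation, not the authors' code; the rows of G are the (sub)gradients being combined.

```python
import numpy as np
from scipy.optimize import minimize

def proj_zero_onto_hull(G):
    """Minimum-norm point of conv{rows of G}: solves
    min_a ||G^T a||^2  subject to  a >= 0, sum(a) = 1.
    The negative of the returned point is a descent direction."""
    m = G.shape[0]
    Q = G @ G.T
    res = minimize(lambda a: a @ Q @ a,
                   x0=np.full(m, 1.0 / m),
                   jac=lambda a: 2.0 * Q @ a,
                   bounds=[(0.0, 1.0)] * m,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
                   method="SLSQP")
    return G.T @ res.x
```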

7 / 33 AGS Algorithm - General Idea: Replace ∂f with an approximate subdifferential. Find an approximate direction of steepest descent. Minimize along nondifferentiable ridges.

8 / 33 Approximate Gradient Sampling Algorithm (AGS Algorithm)

9 / 33 AGS Algorithm
1. Initialize.
2. Generate Approximate Subdifferential: Sample Y = [x^k, y^1, ..., y^m] from around x^k such that max_{i=1,...,m} ‖y^i − x^k‖ ≤ Δ^k. Calculate an approximate gradient ∇_A f_i for each i ∈ A(x^k) and set G^k = conv{∇_A f_i(x^k)}_{i ∈ A(x^k)}.
3. Direction: Set d^k = −Proj(0 | G^k).
4. Check Stopping Conditions.
5. (Armijo) Line Search.
6. Update and Loop.
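
A minimal sketch of one AGS iteration, reusing the active_set and proj_zero_onto_hull helpers sketched above; approx_grad stands in for one of the approximate gradients discussed later (e.g. a simplex gradient), and the sampling scheme and Armijo parameters eta, mu are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def ags_step(fs, x, delta, approx_grad, eta=0.1, mu=0.5, tol=1e-6):
    """One iteration of the AGS scheme sketched on this slide.

    fs          : list of the smooth pieces f_i (function values only)
    approx_grad : callable (fi, x, Y) -> approximate gradient of fi at x
    eta, mu, tol: hypothetical Armijo/activity parameters
    """
    f = lambda z: max(fi(z) for fi in fs)
    n = x.size
    # Step 2: sample points around x within (infinity-norm) radius delta;
    # together with x itself these form the sample set Y of the slide.
    Y = x + delta * (2.0 * np.random.rand(n, n) - 1.0)
    G = np.array([approx_grad(fs[i], x, Y) for i in active_set(fs, x, tol)])
    # Step 3: approximate direction of steepest descent.
    d = -proj_zero_onto_hull(G)
    # Step 5: backtracking (Armijo-style) line search on f.
    t, fx = 1.0, f(x)
    while f(x + t * d) > fx - eta * t * np.dot(d, d) and t > 1e-12:
        t *= mu
    # Step 6: update.
    return x + t * d, d
```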

10 / 33 Convergence AGS Algorithm

11 / 33 Remark We define the approximate subdifferential of f at x̄ as G(x̄) = conv{∇_A f_i(x̄)}_{i ∈ A(x̄)}, where ∇_A f_i(x̄) is the approximate gradient of f_i at x̄.

12 / 33 Lemma Suppose there exists an ε > 0 such that ‖∇_A f_i(x̄) − ∇f_i(x̄)‖ ≤ ε for all i ∈ A(x̄). Then
1. for all w ∈ G(x̄) there exists v ∈ ∂f(x̄) such that ‖w − v‖ ≤ ε, and
2. for all v ∈ ∂f(x̄) there exists w ∈ G(x̄) such that ‖w − v‖ ≤ ε.

13 / 33 Proof. 1. By definition, for all w ∈ G(x̄) there exists a set of α_i such that w = Σ_{i ∈ A(x̄)} α_i ∇_A f_i(x̄), where α_i ≥ 0 and Σ_{i ∈ A(x̄)} α_i = 1. Using the same α_i as above, we see that v = Σ_{i ∈ A(x̄)} α_i ∇f_i(x̄) ∈ ∂f(x̄). Then ‖w − v‖ = ‖Σ_{i ∈ A(x̄)} α_i ∇_A f_i(x̄) − Σ_{i ∈ A(x̄)} α_i ∇f_i(x̄)‖ ≤ Σ_{i ∈ A(x̄)} α_i ‖∇_A f_i(x̄) − ∇f_i(x̄)‖ ≤ ε. Hence, for all w ∈ G(x̄), there exists a v ∈ ∂f(x̄) such that ‖w − v‖ ≤ ε.

14 / 33 Convergence Theorem Let {x^k}_{k=0}^∞ be generated by the AGS algorithm. Suppose there exists a K > 0 such that, given any set Y generated in the sampling step of the AGS algorithm, ∇_A f_i(x^k) satisfies ‖∇_A f_i(x^k) − ∇f_i(x^k)‖ ≤ K Δ^k, where Δ^k = max_{y^i ∈ Y} ‖y^i − x^k‖. Suppose t_k is bounded away from 0. Then either
1. f(x^k) ↓ −∞, or
2. d^k → 0, Δ^k → 0, and every cluster point x̄ of the sequence {x^k}_{k=0}^∞ satisfies 0 ∈ ∂f(x̄).

15 / 33 Convergence - Proof Outline. 1. The direction of steepest descent is −Proj(0 | ∂f(x^k)). 2. By the previous lemma, G(x^k) is a good approximation of ∂f(x^k). 3. So we can show that −Proj(0 | G(x^k)) is still a descent direction (an approximate direction of steepest descent). 4. Thus, we can show that convergence holds.

16 / 33 Robust Approximate Gradient Sampling Algorithm (Robust AGS Algorithm)

17 / 33 Visual Consider the function f(x) = max{f_1(x), f_2(x), f_3(x)}, where f_1(x) = x_1^2 + x_2^2, f_2(x) = (2 − x_1)^2 + (2 − x_2)^2, f_3(x) = 2 exp(x_2 − x_1).
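
For concreteness, this example can be coded directly from the three smooth pieces above (a sketch, matching the formulas as reconstructed here):

```python
import numpy as np

# Three smooth pieces of the visual example and their pointwise maximum.
f1 = lambda x: x[0]**2 + x[1]**2
f2 = lambda x: (2.0 - x[0])**2 + (2.0 - x[1])**2
f3 = lambda x: 2.0 * np.exp(x[1] - x[0])
fs = [f1, f2, f3]
f  = lambda x: max(fi(x) for fi in fs)
```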

18 / 33 Visual Representation Contour plot - nondifferentiable ridges

19 / 33 Visual Representation Regular AGS algorithm:

20 / 33 Visual Representation Robust AGS algorithm:

21 / 33 Robust Approximate Gradient Sampling Algorithm Definition Let Y = [x^k, y^1, y^2, ..., y^m] be a set of sampled points. The robust active set of f on Y is A(Y) = ∪_{y^i ∈ Y} A(y^i).
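
A direct translation of this definition, reusing the active_set sketch from earlier; the rows of Y are the sampled points (an illustration, not the authors' code):

```python
def robust_active_set(fs, Y, tol=1e-6):
    """Union of the active sets A(y) over all sampled points y (rows of Y)."""
    idx = set()
    for y in Y:
        idx.update(int(i) for i in active_set(fs, y, tol))
    return sorted(idx)
```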

22 / 33 Robust Approximate Gradient Sampling Algorithm
1. Initialize.
2. Generate Approximate Subdifferential: Calculate an approximate gradient ∇_A f_i for each i ∈ A(Y) and set G^k = conv{∇_A f_i(x^k)}_{i ∈ A(x^k)} and G_Y^k = conv{∇_A f_i(x^k)}_{i ∈ A(Y)}.
3. Direction: Set d^k = −Proj(0 | G^k) and d_Y^k = −Proj(0 | G_Y^k).
4. Check Stopping Conditions: Use d^k to check stopping conditions.
5. (Armijo) Line Search: Use d_Y^k for the search direction.
6. Update and Loop.
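
Building on the earlier sketches (active_set, robust_active_set, proj_zero_onto_hull, and an approx_grad callable), the two directions of Step 3 might be computed as follows; this is a sketch, not the authors' implementation:

```python
import numpy as np

def robust_directions(fs, x, Y, approx_grad, tol=1e-6):
    """d^k from the active set at x^k and d_Y^k from the robust active set
    over Y; the stopping test uses d^k, the line search uses d_Y^k."""
    G  = np.array([approx_grad(fs[i], x, Y) for i in active_set(fs, x, tol)])
    GY = np.array([approx_grad(fs[i], x, Y) for i in robust_active_set(fs, Y, tol)])
    return -proj_zero_onto_hull(G), -proj_zero_onto_hull(GY)
```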

23 / 33 Convergence Robust AGS Algorithm

24 / 33 Robust Convergence Theorem Let {x^k}_{k=0}^∞ be generated by the robust AGS algorithm. Suppose there exists a K > 0 such that, given any set Y generated in the sampling step of the robust AGS algorithm, ∇_A f_i(x^k) satisfies ‖∇_A f_i(x^k) − ∇f_i(x^k)‖ ≤ K Δ^k, where Δ^k = max_{y^i ∈ Y} ‖y^i − x^k‖. Suppose t_k is bounded away from 0. Then either
1. f(x^k) ↓ −∞, or
2. d^k → 0, Δ^k → 0, and every cluster point x̄ of the sequence {x^k}_{k=0}^∞ satisfies 0 ∈ ∂f(x̄).

25 / 33 Numerical Results

26 / 33 Approximate Gradients Requirement: In order for convergence to be guaranteed in the AGS algorithm, ∇_A f_i(x^k) must satisfy an error bound for each of the active f_i: ‖∇_A f_i(x^k) − ∇f_i(x^k)‖ ≤ K Δ^k, where K > 0 is bounded above. We used the following 3 approximate gradients:
1. Simplex Gradient (see Lemma 6.2.1, Kelley '99)
2. Centered Simplex Gradient (see Lemma 6.2.5, Kelley '99)
3. Gupal Estimate (see Theorem 3.8, Hare and Nutini '11)
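
Of the three, the simplex gradient is the simplest to sketch: it is the (least-squares) solution of a linear system built from function-value differences over the sample set. The routine below illustrates that standard definition (see Kelley '99); it is not the authors' MATLAB code.

```python
import numpy as np

def simplex_gradient(fi, x0, Y):
    """Simplex gradient of fi at x0 over the sample points (rows of Y):
    solve S g ≈ df in the least-squares sense, where the j-th row of S is
    y^j - x0 and df_j = fi(y^j) - fi(x0)."""
    S = Y - x0
    df = np.array([fi(y) for y in Y]) - fi(x0)
    g, *_ = np.linalg.lstsq(S, df, rcond=None)
    return g
```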

27 / 33 Numerical Results - Overview Implementation was done in MATLAB. 24 nonsmooth test problems (Lukšan-Vlček, '00). 25 trials for each problem. Quality (improvement in digits of accuracy) measured by −log((F_min − F*)/(F^0 − F*)).
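
For reference, the quality measure can be computed directly; the base-10 logarithm below is an assumption, consistent with counting decimal digits of accuracy:

```python
import numpy as np

def digits_of_accuracy(F_min, F0, F_star):
    """Improvement in digits of accuracy: -log10((F_min - F*)/(F0 - F*)),
    where F0 is the starting value, F_min the best value found, and F* the
    true optimal value; larger is better."""
    return -np.log10((F_min - F_star) / (F0 - F_star))
```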

28 / 33 Numerical Results - Goal Determine any notable numerical differences between:
1. Version of the AGS algorithm: Regular or Robust.
2. Approximate gradient: Simplex Gradient, Centered Simplex Gradient, or Gupal Estimate.
Thus, 6 different versions of our algorithm were compared.

29 / 33 Performance Profile on the AGS Algorithm (successful improvement of 1 digit of accuracy): ρ(τ) versus τ on a logarithmic scale, comparing the Simplex, Robust Simplex, Centered Simplex, Robust Centered Simplex, Gupal, and Robust Gupal variants.

30 / 33 Performance Profile on the AGS Algorithm (successful improvement of 3 digits of accuracy): ρ(τ) versus τ on a logarithmic scale, comparing the Simplex, Robust Simplex, Centered Simplex, Robust Centered Simplex, Gupal, and Robust Gupal variants.

31 / 33 Conclusion Approximate gradient sampling algorithm for finite minimax problems. Robust version. Convergence (0 ∈ ∂f(x̄)). Numerical tests. Notes: We have numerical results for a robust stopping condition. There is future work in developing the theoretical analysis. We are currently working on an application in seismic retrofitting.

32 / 33 Thank you!

33 / 33 References
1. J. V. Burke, A. S. Lewis, and M. L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim., 15 (2005), pp. 751-779.
2. W. Hare and J. Nutini. A derivative-free approximate gradient sampling algorithm for finite minimax problems. Submitted to SIAM J. Optim., 2011.
3. K. C. Kiwiel. A nonderivative version of the gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM J. Optim., 20 (2010), pp. 1983-1994.
4. C. T. Kelley. Iterative Methods for Optimization. SIAM, Philadelphia, PA, 1999.
5. L. Lukšan and J. Vlček. Test Problems for Nonsmooth Unconstrained and Linearly Constrained Optimization. Technical Report V-798, ICS AS CR, February 2000.