DC Programming: A brief tutorial. Andres Munoz Medina, Courant Institute of Mathematical Sciences, munoz@cims.nyu.edu

Difference of Convex Functions. Definition: A function f is said to be DC if there exist convex functions g, h such that f = g − h. Definition: A function f is locally DC if for every x there exists a neighborhood U of x such that the restriction of f to U is DC.

Contents: Motivation. DC functions and properties. DC programming and DCA. CCCP and convergence results. Global optimization algorithms.

Motivation. Not all machine learning problems are convex anymore: transductive SVMs [Wang et al. 2003], kernel learning [Argyriou et al. 2004], structured prediction [Narasimhan 2012], auction mechanism design [MM and Muñoz].

Notation. Let g be a convex function. The conjugate of g is defined as
g*(y) = sup_x ⟨y, x⟩ − g(x).
For ε > 0, ∂_ε g(x_0) denotes the ε-subdifferential of g at x_0, i.e.
∂_ε g(x_0) = { v ∈ R^n : g(x) ≥ g(x_0) + ⟨x − x_0, v⟩ − ε for all x }.
∂g(x_0) will denote the exact subdifferential.
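A quick sanity check of the conjugate definition (an illustrative example, not from the slides): for g(x) = ½‖x‖², the supremum in g*(y) = sup_x ⟨y, x⟩ − ½‖x‖² is attained at x = y, so g*(y) = ½‖y‖² and g is its own conjugate.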

DC functions. A function f : R → R is DC iff it is the integral of a function of bounded variation. [Hartman 59] A locally DC function is globally DC. [Hartman 59] All twice continuously differentiable functions are DC. The class is closed under sums, negation, suprema and products.
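To see why twice continuously differentiable functions are DC (a standard construction, added here for concreteness): if ∇²f ⪰ −λI on a convex set for some λ ≥ 0, then f = (f + (λ/2)‖x‖²) − (λ/2)‖x‖² exhibits f as a difference of convex functions, because the added quadratic makes the first term convex.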

DC programming. DC programming refers to optimization problems of the form
min_x g(x) − h(x).
More generally, for DC functions f_i,
min_x g(x) − h(x) subject to f_i(x) ≤ 0.
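A minimal one-dimensional instance (a running example of mine, not from the slides): f(x) = x⁴ − x² is a DC program with g(x) = x⁴ and h(x) = x²; its global minimizers are x = ±1/√2 with value −1/4. It reappears in the DCA sketch below.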

Global optimality conditions. A point x* is a global solution if and only if ∂_ε h(x*) ⊆ ∂_ε g(x*) for all ε > 0. Let w* = inf_x g(x) − h(x); then a point x* is a global solution if and only if
0 = inf_{x,t} { −h(x) + t : g(x) ≤ t + w* }.

Local optimality conditions. If x* verifies ∂h(x*) ⊂ int ∂g(x*), then x* is a strict local minimizer of g − h.

DC duality. By definition of the conjugate function (and using h = h** for convex lower semicontinuous h),
inf_x g(x) − h(x) = inf_x { g(x) − sup_y (⟨x, y⟩ − h*(y)) }
= inf_x inf_y { h*(y) + g(x) − ⟨x, y⟩ }
= inf_y { h*(y) − g*(y) }.
This is the dual of the original problem.

DC algorithm. We want to find a sequence x_k that decreases the function at every step. Use duality: if y ∈ ∂h(x_0), then h*(y) = ⟨x_0, y⟩ − h(x_0), and so
h*(y) − g*(y) = ⟨x_0, y⟩ − h(x_0) − sup_x (⟨x, y⟩ − g(x))
= inf_x { g(x) − h(x_0) + ⟨x_0 − x, y⟩ }
≤ g(x_0) − h(x_0).

DC algorithm. Solve the partial problems
S(x*) = inf { h*(y) − g*(y) : y ∈ ∂h(x*) }
T(y*) = inf { g(x) − h(x) : x ∈ ∂g*(y*) }
Choose y_k ∈ S(x_k) and x_{k+1} ∈ T(y_k). These are concave minimization problems. Simplified DCA: y_k ∈ ∂h(x_k), x_{k+1} ∈ ∂g*(y_k).

DCA as CCCP. If the function h is differentiable, the simplified DCA becomes y_k = ∇h(x_k) and x_{k+1} ∈ ∂g*(y_k), i.e.
x_{k+1} ∈ argmin_x g(x) − ⟨x, ∇h(x_k)⟩.
Equivalent to
x_{k+1} ∈ argmin_x g(x) − h(x_k) − ⟨x − x_k, ∇h(x_k)⟩,
since the two objectives differ only by terms constant in x.
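To make the update concrete, here is a minimal Python sketch of the simplified DCA/CCCP iteration on the toy problem f(x) = x⁴ − x² introduced above; the closed-form inner argmin is specific to g(x) = x⁴ and is my assumption for the sketch, not part of the slides.

    import numpy as np

    # Toy DC decomposition: f(x) = g(x) - h(x) with g(x) = x**4, h(x) = x**2.
    g = lambda x: x**4
    h = lambda x: x**2
    grad_h = lambda x: 2.0 * x

    def cccp(x0, iters=50):
        """Simplified DCA / CCCP: x_{k+1} = argmin_x g(x) - <x, grad_h(x_k)>.
        For g(x) = x**4 the inner argmin solves 4 x^3 = y with y = grad_h(x_k),
        so x_{k+1} = sign(y) * (|y| / 4)**(1/3)."""
        x = x0
        for _ in range(iters):
            y = grad_h(x)                                   # y_k = h'(x_k)
            x = np.sign(y) * (abs(y) / 4.0) ** (1.0 / 3.0)  # closed-form inner argmin
        return x

    x_star = cccp(x0=1.0)
    print(x_star, g(x_star) - h(x_star))  # ~0.70711, ~-0.25: a global minimizer here

Note that DCA only guarantees descent: started at the stationary point x0 = 0.0 the iteration never moves, which is one reason only local convergence results are available (next slide).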

CCCP as a majorization minimization algorithm. To minimize f, MM algorithms build a majorizing function F such that f(x) ≤ F(x, y) for all x, y and f(x) = F(x, x) for all x. Do coordinate descent on F. In our scenario
F(x, y) = g(x) − h(y) − ⟨x − y, ∇h(y)⟩,
which majorizes f = g − h because convexity of h gives h(x) ≥ h(y) + ⟨x − y, ∇h(y)⟩.
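The descent property of this coordinate descent then follows in two steps (a standard MM argument, spelled out for completeness):
f(x_{k+1}) ≤ F(x_{k+1}, x_k) ≤ F(x_k, x_k) = f(x_k),
where the first inequality is majorization and the second holds because x_{k+1} minimizes F(·, x_k).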

Convergence results. Unconstrained DC functions: convergence to a local minimum (no rate of convergence); the bound depends on the moduli of convexity. [PD Tao, LT Hoai An 97] Unconstrained smooth optimization: linear or almost quadratic convergence depending on curvature. [Roweis et al. 03] Constrained smooth optimization: convergence without rate, using Zangwill's theory. [Lanckriet, Sriperumbudur 09]

Global convergence. NP-hard in general: minimizing a quadratic function with one negative eigenvalue subject to linear constraints is already NP-hard. [Pardalos 91] Mostly branch and bound and cutting plane methods. [H. Tuy 03, Horst and Thoai 99] Some successful results with low-rank functions.

Cutting plane algorithm [H. Tuy 03]. All limit points of this algorithm are global minimizers. Finite convergence for piecewise linear functions (conjecture). Keeps track of exponentially many vertices.
[Figure: the original slides animate the successive iterates (x_0, t_0), (x_1, t_1), (x_2, t_2), (x_3, t_3) produced by the cuts; only these labels survive the transcription.]

Branch and bound.[hoarst and Thoai 99] Main idea is to keep upper and lower bounds of g h on simplices. S k Upper bound: Evaluate g(x k ) h(x k ) for x k S k. Use DCA as subroutine for better bounds. Lower bound: If (v i ) n+1 n+1 and. Solve x = i=1 α i v i are the vertices of i=1 S k min α g i α i v i α i h(v i )

Low rank optimization and polynomial guarantees [Goyal and Ravi 08]. A function f : R^n → R has rank k ≤ n if there exist g : R^k → R and α_1, ..., α_k ∈ R^n such that f(x) = g(α_1·x, ..., α_k·x). Most examples come from the economics literature. For a quasi-concave function f we want to solve min_{x ∈ C} f(x). One can always transform DC programs into problems of this type.

Algorithm. Let g satisfy the following conditions: the gradient ∇g(y) ≥ 0; g(λy) ≤ λ^c g(y) for all λ > 1 and some c; and α_i·x > 0 for all x ∈ P. Then there is an algorithm (an FPTAS for fixed k and c) that finds x̂ with f(x̂) ≤ (1 + ε) f(x*), in time polynomial in the input size and 1/ε.

Further reading. Farkas-type results and duality for DC programs with convex constraints. [Dinh et al. 13] On DC functions and mappings. [Duda et al. 01]

Open problems. Local rate of convergence for constrained DC programs. Is there a condition under which DCA finds global optima? For instance, g − h might not be convex but h − g might be. Finite convergence of cutting plane methods.

References
Dinh, Nguyen, Guy Vallet, and T. T. A. Nghia. "Farkas-type results and duality for DC programs with convex constraints." (2006).
Goyal, Vineet, and R. Ravi. "An FPTAS for minimizing a class of low-rank quasi-concave functions over a convex set." Operations Research Letters 41.2 (2013): 191-196.
Tao, Pham Dinh, and Le Thi Hoai An. "Convex analysis approach to DC programming: theory, algorithms and applications." Acta Mathematica Vietnamica 22.1 (1997): 289-355.
Horst, R., and Ng. V. Thoai. "DC programming: overview." Journal of Optimization Theory and Applications 103.1 (1999): 1-43.
Tuy, H. "On global optimality conditions and cutting plane algorithms." Journal of Optimization Theory and Applications 118.1 (2003): 201-216.
Yuille, Alan L., and Anand Rangarajan. "The concave-convex procedure." Neural Computation 15.4 (2003): 915-936.

References
Salakhutdinov, Ruslan, Sam Roweis, and Zoubin Ghahramani. "On the convergence of bound optimization algorithms." Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002.
Hartman, Philip. "On functions representable as a difference of convex functions." Pacific Journal of Mathematics 9.3 (1959): 707-713.
Hiriart-Urruty, J.-B. "Generalized differentiability/duality and optimization for problems dealing with differences of convex functions." Convexity and Duality in Optimization. Springer Berlin Heidelberg, 1985. 37-70.