Convergence of Nonconvex Douglas-Rachford Splitting and Nonconvex ADMM


Convergence of Nonconvex Douglas-Rachford Splitting and Nonconvex ADMM
Shuvomoy Das Gupta, Thales Canada
56th Allerton Conference on Communication, Control, and Computing, 2018

What is this talk about?
this talk is about ADMM and Douglas-Rachford splitting for nonconvex problems
the alternating direction method of multipliers (ADMM) was originally designed to solve convex optimization problems
ADMM is a special case of the Douglas-Rachford splitting algorithm in a convex setup
both are guaranteed to converge for convex problems

Motivation
nonconvex ADMM (NC-ADMM) has become a popular heuristic for tackling nonconvex problems
recently, the NC-ADMM heuristic has been applied to the optimal power flow problem [Erseghe, 2014], mixed-integer quadratic optimization [Takapoui et al., 2017], submodular minimization with nonconvex constraints [Iyer et al., 2014], ...
the Python package NCVX (an extension of CVXPY) implements the NC-ADMM heuristic [Diamond et al., 2018]; it often produces lower objective values than exact solvers within a time limit
nonconvex Douglas-Rachford splitting (NC-DRS) is the analogous nonconvex heuristic based on Douglas-Rachford splitting
not much has been done to improve the theoretical understanding of such heuristics

Summary of the results
NC-DRS attacks the original problem directly:
optimal solutions can be characterized via the NC-DRS operator
if the deviation from a convex setup is bounded, it will converge or oscillate in a compact connected set
NC-ADMM works on a modified dual problem, not the original nonconvex problem:
it is not equivalent to NC-DRS, but there is a relationship between them
it is likely to produce a lower objective value

Problem in consideration
minimize a convex cost function over a nonconvex constraint set:

minimize_x f(x)
subject to x ∈ C        (OPT)

f is closed, proper, and convex
C is compact, but not necessarily convex

Reformulation through the indicator function
indicator function of the set C:

δ_C(x) = 0 if x ∈ C, +∞ if x ∉ C

δ_C is closed and proper, but not necessarily convex
we can write

( minimize_x f(x) subject to x ∈ C ) = minimize_x f(x) + δ_C(x)

Proximal operator of f and projection onto C
both NC-DRS and NC-ADMM use the same subroutines: first prox_{γf}, then Π_C (followed by an affine update)
proximal operator of f evaluated at a point x with parameter γ > 0:

prox_{γf}(x) = argmin_y ( f(y) + (1/(2γ)) ‖y - x‖² )

prox_{γf} is single-valued and continuous
projection onto C:

prox_{γδ_C}(x) = Π_C(x) = argmin_{y ∈ C} ‖y - x‖²

there can be multiple projections; one such projection is denoted by Π_C(·)
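To make the subroutines concrete, here is a minimal sketch on a toy instance of my own choosing (not from the talk): f(x) = 0.5 ‖x - a‖² has a closed-form prox, and C = {-1, +1}^n has a componentwise-sign projection. All names are illustrative.

```python
import numpy as np

a = np.array([0.3, -1.7, 0.9])  # data defining the toy f; any vector works

def prox_f(x, gamma):
    # argmin_y 0.5*||y - a||^2 + (1/(2*gamma))*||y - x||^2
    # has the closed form (x + gamma*a) / (1 + gamma)
    return (x + gamma * a) / (1.0 + gamma)

def proj_C(x):
    # one selection Pi_C from the (possibly multivalued) projection onto
    # the nonconvex set C = {-1, +1}^n; ties at 0 are broken toward +1
    return np.where(x >= 0, 1.0, -1.0)
```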

NC-ADMM and NC-DRS
NC-ADMM:

x_{n+1} = prox_{γf}(y_n - z_n)
y_{n+1} = Π_C(x_{n+1} + z_n)
z_{n+1} = z_n - y_{n+1} + x_{n+1}

NC-DRS:

x_{n+1} = prox_{γf}(z_n)
y_{n+1} = Π_C(2x_{n+1} - z_n)
z_{n+1} = z_n + y_{n+1} - x_{n+1}

both have the same subroutines, but different inputs
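Side by side in code, the two iterations differ only in what is fed to prox_{γf} and Π_C and in the sign of the z-update. A minimal sketch, reusing the hypothetical prox_f and proj_C from the previous block:

```python
def nc_admm(y, z, gamma, iters=100):
    # NC-ADMM: the prox sees y_n - z_n, the projection sees x_{n+1} + z_n
    for _ in range(iters):
        x = prox_f(y - z, gamma)
        y = proj_C(x + z)
        z = z - y + x
    return x, y, z

def nc_drs(z, gamma, iters=100):
    # NC-DRS: the prox sees z_n, the projection sees the reflection 2x_{n+1} - z_n
    for _ in range(iters):
        x = prox_f(z, gamma)
        y = proj_C(2 * x - z)
        z = z + y - x
    return x, y, z
```

For example, nc_drs(np.zeros(3), gamma=1.0) runs NC-DRS on the toy instance from a zero initial point.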

Pretend C is convex
f is closed, proper, and convex; pretend C is convex
then x_n, y_n converge to an optimal solution for any initial condition
but C is not necessarily convex in our setup, and the convergence conditions become messy

Why are convergence conditions messy?
the convergence conditions are messy because:
the subdifferential operator of δ_C is monotone, but not maximally monotone
Π_C is expansive, i.e., not nonexpansive
the underlying reflection operator is expansive
a little bit of review...

Monotone and maximally monotone operators
T is monotone if for every (x,u), (y,v) ∈ gra T:

⟨x - y, u - v⟩ ≥ 0

T is maximally monotone if gra T is not properly contained in the graph of any other monotone operator
(figure: one operator that is monotone but not maximally monotone, and one that is maximally monotone)

Subdifferential operator for a nonconvex function
g: closed, proper, but not necessarily convex
the subdifferential of g is monotone, but not maximally monotone:

∂g(x) = { u ∈ Rⁿ : (∀ y ∈ Rⁿ) g(y) ≥ g(x) + ⟨u, y - x⟩ }

(figure: a nonconvex function with a point at which no subgradient exists)

Why are convergence conditions messy?
our problem: minimize_x f(x) + δ_C(x)
∂f: maximally monotone
∂δ_C: monotone, but not maximally monotone
Π_C is expansive (not nonexpansive)
what is a nonexpansive operator?

What is a nonexpansive operator?
T: a single-valued operator on Rⁿ
T is nonexpansive on Rⁿ if for every x, y we have

‖T(x) - T(y)‖ ≤ ‖x - y‖

prox_{γf} is nonexpansive
(2 prox_{γf} - I_n) is nonexpansive

Operators that are expansive
T: a single-valued operator on Rⁿ
T is expansive if there exist x, y such that

‖T(x) - T(y)‖ > ‖x - y‖

Π_C is expansive
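A two-point check makes the expansiveness tangible on the toy set C = {-1, +1} from the sketches above: two nearby points straddling zero project to points at distance 2.

```python
x, y = 0.1, -0.1            # |x - y| = 0.2
Px, Py = 1.0, -1.0          # their projections onto C = {-1, +1}
assert abs(Px - Py) > abs(x - y)   # 2.0 > 0.2, so Pi_C is expansive
```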

Characterization of minimizers: NC-DRS operator and its reflection
T: the NC-DRS operator

T = Π_C ∘ (2 prox_{γf} - I_n) + I_n - prox_{γf}

R: the reflection operator of T

R = 2T - I_n

NC-DRS in compact form:

z_{n+1} = T z_n = (1/2)(R + I_n) z_n

Π_C is expansive, so R is expansive: the root of all convergence issues
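In code, T and R compose the earlier sketches directly (same toy f and C; a sketch, not the paper's implementation):

```python
def T_op(z, gamma):
    # NC-DRS operator: T = Pi_C(2*prox_f - I) + I - prox_f
    x = prox_f(z, gamma)
    return proj_C(2 * x - z) + z - x

def R_op(z, gamma):
    # reflection of T: R = 2T - I
    return 2 * T_op(z, gamma) - z
```

One NC-DRS step is then simply z = T_op(z, gamma), matching the compact form above.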

Characterization of minimizers
argmin(f + δ_C) is the set of minimizers of minimize_x f(x) + δ_C(x)

prox_{γf}( fix T ) ⊆ argmin(f + δ_C)

underlying assumptions:
1. zer( ∂f + ∂δ_C ) is nonempty
2. fix T is nonempty
3. fix { (2Π_C - I_n)(2 prox_{γf} - I_n) } is nonempty

Convergence of NC-DRS: setup
ε^R_{xy}: expansiveness of R at x, y

ε^R_{xy} = ‖R(x) - R(y)‖ - ‖x - y‖ if ‖x - y‖ < ‖R(x) - R(y)‖, and 0 otherwise

σ^R_{xy} = ε^R_{xy} / ( ‖R(x) - R(y)‖ + ‖x - y‖ )
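A small helper can evaluate σ^R along the iterates (reading σ^R_{xy} as the normalized ratio above; a sketch reusing R_op from the previous block):

```python
def sigma_R(x, y, gamma):
    # normalized deviation of R from nonexpansiveness at the pair (x, y)
    Rdist = np.linalg.norm(R_op(x, gamma) - R_op(y, gamma))
    dist = np.linalg.norm(x - y)
    eps = max(Rdist - dist, 0.0)                    # epsilon^R_{xy}
    return eps / (Rdist + dist) if Rdist + dist > 0 else 0.0
```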

Convergence of NC-DRS: conditions
(z_n)_{n∈N}: the sequence of vectors generated for some chosen initial point z_0
if the following holds:
there exists a z̄ ∈ fix T such that ‖z_0 - z̄‖² + Σ_{n=0}^∞ ( σ^R_{z_n z̄} )² is finite
then define

r := sqrt( ‖z_0 - z̄‖² + Σ_{n=0}^∞ ( σ^R_{z_n z̄} )² )

B(z̄; r): the compact ball with center z̄ and radius r
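One can probe this condition numerically on the toy instance, though only heuristically: below, z̄ is approximated by iterating T, which may itself oscillate when C is nonconvex, so this is an illustrative sanity check rather than a verification.

```python
gamma = 1.0
zbar = np.array([0.2, -0.5, 1.4])
for _ in range(1000):               # crude attempt at a fixed point of T
    zbar = T_op(zbar, gamma)

z = np.array([2.0, 0.1, -0.7])      # chosen initial point z_0
total = np.linalg.norm(z - zbar) ** 2
for _ in range(200):                # accumulate ||z_0 - zbar||^2 + sum of sigma^2
    total += sigma_R(z, zbar, gamma) ** 2
    z = T_op(z, gamma)
print("r =", np.sqrt(total))        # radius of the ball B(zbar; r) in the theorem
```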

Convergence of NC-DRS
then one of the following will happen:
1. convergence to a point: the sequence (z_n)_{n∈N} converges to a point in B(z̄; r)
2. cluster points form a continuum: the set of cluster points of (z_n)_{n∈N} forms a nonempty compact connected set in B(z̄; r)
if situation 1 occurs and lim_{n→∞} ( σ^R_{z_n z̄} )² = 0, then x_n = prox_{γf}(z_{n-1}) converges to an optimal solution

Some comments on convergence
for convergence, the total deviation of R from being a nonexpansive operator over the sequence {(z_n, z̄)}_{n∈N} must be bounded
this depends on the initial point
in our case C is not necessarily convex
sanity check: pretend C is convex
then the total deviation of R from being a nonexpansive operator over the sequence {(z_n, z̄)}_{n∈N} is zero
and our convergence result coincides with the known convergence results for the convex setup

Constructing NC-ADMM
original problem: minimize_{x ∈ C} f(x)
take the dual and apply NC-DRS to the dual
the resulting algorithm is relaxed NC-ADMM:

x_{n+1} = prox_{γf}(y_n - z_n)
y_{n+1} = Π_{conv C}(z_n + x_{n+1})
z_{n+1} = z_n - y_{n+1} + x_{n+1}

relaxed NC-ADMM solves minimize_{x ∈ conv C} f(x)

Constructing NC-ADMM (continued)
relaxed NC-ADMM solves minimize_{x ∈ conv C} f(x)
replace Π_{conv C} with Π_C to attack minimize_{x ∈ C} f(x)
the resulting algorithm is NC-ADMM:

x_{n+1} = prox_{γf}(y_n - z_n)
y_{n+1} = Π_C(x_{n+1} + z_n)
z_{n+1} = z_n - y_{n+1} + x_{n+1}
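For the toy set, conv C = [-1, 1]^n and its projection is clipping, so relaxed NC-ADMM is one substitution away from the nc_admm sketch above (again illustrative, with hypothetical names):

```python
def proj_convC(x):
    # projection onto conv C = [-1, 1]^n, the convex hull of {-1, +1}^n
    return np.clip(x, -1.0, 1.0)

def relaxed_nc_admm(y, z, gamma, iters=100):
    # identical to nc_admm except that Pi_C is replaced by Pi_convC
    for _ in range(iters):
        x = prox_f(y - z, gamma)
        y = proj_convC(z + x)
        z = z - y + x
    return x, y, z
```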


Pretend C is convex
pretend C is convex; then:
strong duality holds
Π_{conv C} = Π_C
Moreau's decomposition can be applied to δ_C: prox_{δ_C} + prox_{δ_C*} = I_n
then there is no information loss in constructing NC-ADMM from NC-DRS
NC-ADMM and NC-DRS are equivalent to each other
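Moreau's decomposition is easy to verify numerically on the convex hull of the toy set: the prox of the indicator of [-1, 1]^n is clipping, the prox of its conjugate (the support function, here the l1 norm) is soft-thresholding at 1, and the two sum to the identity. An illustrative check, not from the talk:

```python
v = np.array([2.5, -0.4, 0.7, -3.0])
clip = np.clip(v, -1.0, 1.0)                           # prox of the indicator
soft = np.sign(v) * np.maximum(np.abs(v) - 1.0, 0.0)   # prox of its conjugate ||.||_1
assert np.allclose(clip + soft, v)                     # Moreau: they sum to v
```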

However...
C is nonconvex, so NC-ADMM and NC-DRS are not equivalent
reasons for the loss of equivalence:
1. a strict duality gap
2. Π_{conv C} ≠ Π_C
3. Moreau's decomposition does not hold
NC-ADMM in NCVX-CVXPY works on a modified dual, hence produces a lower objective value
the relationship between minimizers of the original problem and the NC-ADMM operator breaks down

What if?
is it possible to establish convergence of NC-ADMM ignoring NC-DRS?
our convergence analysis is established for nonempty, compact, but not necessarily convex constraint sets
it is equally applicable to the smaller subclass of problems with convex constraint sets
in this smaller subclass of convex problems, NC-ADMM and NC-DRS have the same convergence properties

Summary
NC-DRS and NC-ADMM are very similar looking, but very different heuristics
the theoretical rationale for using NC-DRS could be stronger than that for NC-ADMM...

Limitations
strong assumptions in the minimizer characterization
the step size is 1; adaptive step sizes are not considered
no numerical experiments in the paper

End of talk
if you are working on nonconvex problems using ADMM/DRS, please talk to me!
Thank you! Questions?

References
Diamond, S., Takapoui, R., & Boyd, S. 2018. A general system for heuristic minimization of convex functions over non-convex sets. Optimization Methods and Software, 33(1), 165-193.
Erseghe, T. 2014. Distributed optimal power flow using ADMM. IEEE Transactions on Power Systems, 29(5), 2370-2380.
Iyer, R., Jegelka, S., & Bilmes, J. 2014. Monotone closure of relaxed constraints in submodular optimization: connections between minimization and maximization. Pages 360-369 of: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI '14). Arlington, Virginia: AUAI Press.
Takapoui, R., Moehle, N., Boyd, S., & Bemporad, A. 2017. A simple effective heuristic for embedded mixed-integer quadratic programming. International Journal of Control, 1-11.