The convex geometry of inverse problems

The convex geometry of inverse problems. Benjamin Recht, Department of Computer Sciences, University of Wisconsin-Madison. Joint work with Venkat Chandrasekaran, Pablo Parrilo, and Alan Willsky.

Linear Inverse Problems. Find a solution of y = Φx, where Φ is m × n with m < n. Of the infinite collection of solutions, which one should we pick? Leverage structure as a prior: sparsity, rank, smoothness, symmetry. How do we design algorithms to solve underdetermined systems with such priors?

Sparsity. Atoms: the 1-sparse vectors of Euclidean norm 1, i.e., ±e_i. Their convex hull is the unit ball of the ℓ1 norm (the cross-polytope with vertices at ±e_i).

[Figure: the ℓ1 ball in the (x1, x2) plane touching the affine set Ax = b at a sparse point.] Compressed Sensing: Candès, Romberg, Tao, Donoho, Tanner, etc.
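
As a concrete illustration of the ℓ1 heuristic on this slide, here is a minimal sketch in Python. It poses min ||x||_1 subject to Φx = y as a linear program via the standard split x = u − v with u, v ≥ 0; the dimensions and the use of scipy here are illustrative choices, not part of the talk.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 100, 40, 5                      # ambient dimension, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
Phi = rng.standard_normal((m, n))         # random Gaussian measurement matrix
y = Phi @ x_true
# minimize sum(u + v)  subject to  Phi @ (u - v) = y,  u >= 0,  v >= 0
res = linprog(np.ones(2 * n), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
              bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))  # near zero w.h.p.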

Rank. 2 × 2 symmetric matrices [[x, y], [y, z]] plotted in 3D: the unit-norm rank-1 matrices trace out the surface x² + z² + 2y² = 1. Their convex hull is the unit ball of the nuclear norm ||X||_* = Σᵢ σᵢ(X).

Nuclear Norm Heuristic: minimize ||X||_* = Σᵢ σᵢ(X) as a convex surrogate for rank in rank minimization / matrix completion. Fazel 2002; Recht, Fazel, and Parrilo 2007.

Integer Programming. Integer solutions: all components of x are ±1. The convex hull of these sign vectors is the unit ball of the ℓ∞ norm, the hypercube with vertices (±1, ±1).

[Figure: the hypercube touching the affine set Ax = b at a vertex in the (x1, x2) plane.] Donoho and Tanner 2008; Mangasarian and Recht 2009.

Parsimonious Models. Model: x = Σᵢ cᵢ aᵢ, with weights cᵢ and atoms aᵢ. Search for the best linear combination of the fewest atoms; "rank" = the fewest atoms needed to describe the model.

Permutation Matrices. X is a sum of a few permutation matrices. Examples: multiobject tracking (Huang et al.), ranked elections (Jagabathula, Shah). The convex hull of the permutation matrices is the Birkhoff polytope of doubly stochastic matrices (see the sketch below). Related: the permutahedron, the convex hull of all permutations of a fixed vector such as [1, 2, 3, 4].
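
The Birkhoff-von Neumann theorem behind this slide says every doubly stochastic matrix is a convex combination of permutation matrices. Below is a minimal sketch of the classical peeling construction, using scipy's assignment solver to find a permutation supported on the positive entries at each step; the function name and example are illustrative, not from the talk.

import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decompose(D, tol=1e-9):
    """Decompose a doubly stochastic D into [(weight, permutation matrix), ...]."""
    D = D.copy()
    parts = []
    while D.max() > tol:
        # Perfect matching on the positive entries (exists by Birkhoff's theorem).
        rows, cols = linear_sum_assignment(-(D > tol).astype(float))
        w = D[rows, cols].min()            # largest weight we can peel off
        P = np.zeros_like(D)
        P[rows, cols] = 1.0
        parts.append((w, P))
        D = D - w * P
    return parts

# A mixture of two permutations decomposes back into them.
P1, P2 = np.eye(3), np.roll(np.eye(3), 1, axis=1)
print([w for w, _ in birkhoff_decompose(0.3 * P1 + 0.7 * P2)])  # 0.3 and 0.7, in some order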

Moment Curve. Atoms: points on the moment curve [1, t, t², t³, t⁴, ...]. Applications: system identification, image processing, numerical integration, statistical inference. The convex hull is characterized by linear matrix inequalities.

Cut Matrices. Sums of rank-one sign matrices: X = Σᵢ pᵢ X⁽ⁱ⁾, where X⁽ⁱ⁾ = xᵢxᵢᵀ has entries ±1. Applications: collaborative filtering (Srebro et al.), clustering in genetic networks (Tanay et al.), combinatorial approximation algorithms (Frieze and Kannan). The convex hull is the cut polytope. Membership is NP-hard to test, but there are semidefinite approximations of this hull that are tight to within constant factors.

Tensors. X is a low-rank tensor (multi-index array). Examples: polynomial equations, computer vision, differential equations, statistics, chemometrics. The convex hull of the rank-1 tensors leads to a tensor nuclear norm ball. Everything involving tensors is intractable to compute (in theory...), but heuristics work unreasonably well: why?

Atomic Norms. Given a basic set of atoms A, define the gauge ||x||_A = inf{ t > 0 : x ∈ t·conv(A) }. When A is centrosymmetric, we get a norm: ||x||_A = inf{ Σ_{a∈A} c_a : x = Σ_{a∈A} c_a a, c_a ≥ 0 }. IDEA: minimize ||z||_A subject to Φz = y. When does this work? How do we solve the optimization problem?
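
For a finite atomic set, the gauge above is itself a small linear program. Here is a minimal sketch (the function name and test case are illustrative assumptions); with atoms {±e_i} it reproduces the ℓ1 norm, matching the sparsity slide.

import numpy as np
from scipy.optimize import linprog

def atomic_norm(x, atoms):
    """||x||_A for a finite atomic set. `atoms` has one atom per column and
    must contain -a for every atom a (centrosymmetry)."""
    k = atoms.shape[1]
    res = linprog(np.ones(k), A_eq=atoms, b_eq=x, bounds=[(0, None)] * k)
    return res.fun

n = 6
atoms = np.hstack([np.eye(n), -np.eye(n)])      # atoms {+e_i, -e_i}
x = np.array([1.0, -2.0, 0.0, 0.5, 0.0, 0.0])
print(atomic_norm(x, atoms), np.abs(x).sum())   # both 3.5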

Atomic norms in sparse approximation. Greedy approximations: ||f − f_n||_{L2} ≤ c₀ ||f||_A / √n, where f_n is the best n-term approximation to a function f in the convex hull of A in a Banach space. Maurey, Jones, and Barron (1980s-90s); DeVore and Temlyakov (1996).

Tangent Cones. The set of directions that decrease the norm at x forms a cone: T_A(x) = { d : ||x + αd||_A ≤ ||x||_A for some α > 0 }. For the program "minimize ||z||_A subject to Φz = y", x is the unique minimizer if the intersection of this cone with the null space of Φ equals {0}. [Figure: the sublevel set {z : ||z||_A ≤ ||x||_A} and the feasible set y = Φz meeting only at x.]

Gaussian Widths. When does a random subspace U intersect a convex cone C only at the origin? Gordon '88: with high probability, whenever codim(U) ≥ w(C)², where w(C) = E[ max_{x ∈ C ∩ S^{n−1}} ⟨x, g⟩ ] is the Gaussian width (g a standard Gaussian vector). Corollary for inverse problems: if Φ is a random Gaussian matrix with m rows, we need m ≳ w(T_A(x))² for recovery of x.

Robust Recovery. Suppose we observe y = Φx + w with ||w||₂ ≤ δ, and solve: minimize ||z||_A subject to ||Φz − y|| ≤ δ. If x̂ is an optimal solution, then ||x − x̂|| ≤ 2δ/ε provided that m ≥ c₀ w(T_A(x))² / (1 − ε)².

What can we do with Gaussian widths? They were used by Rudelson and Vershynin to derive sharp bounds on the RIP in the special case of sparse-vector recovery via ℓ1. For a k-dimensional subspace S, w(S)² = k. Computing the width of a general cone C is not easy; the main properties we exploit are symmetry and duality (inspired by Stojnic '09).
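
A quick Monte Carlo check of the subspace fact (dimensions and sample count below are arbitrary choices): for a k-dimensional subspace S, w(S) = E ||P_S g||, and its square comes out close to k.

import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 100, 20, 5000
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal basis for S
g = rng.standard_normal((trials, n))
w = np.mean(np.linalg.norm(g @ Q, axis=1))        # ||P_S g|| = ||Q^T g||
print(w**2, k)                                    # w^2 is close to k (about k - 1/2)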

Duality. w(C) = E[ max_{v∈C, ||v||=1} ⟨v, g⟩ ] ≤ E[ max_{v∈C, ||v||≤1} ⟨v, g⟩ ] = E[ min_{u∈C°} ||g − u|| ], where C° = { w : ⟨w, z⟩ ≤ 0 for all z ∈ C } is the polar cone. Moreover, T_A(x)° = N_A(x), the normal cone, which equals the cone induced by the subdifferential of the atomic norm at x.

Symmetry I: self-duality. Self-dual cones: the nonnegative orthant, the positive semidefinite cone, the second-order cone. For these, the squared Gaussian width is half the dimension of the cone. By the Moreau decomposition, x = Π_C(x) + Π_{C°}(x) with ⟨Π_C(x), Π_{C°}(x)⟩ = 0, so E[ inf_{u∈C°} ||g − u||₂² ] = E[ ||Π_C(g)||₂² ] = E[ ||Π_{C°}(g)||₂² ] = n/2.
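
A minimal numeric sanity check of this identity for the nonnegative orthant, where the projection is the entrywise positive part (dimension and sample count are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20000
g = rng.standard_normal((trials, n))
proj = np.maximum(g, 0.0)                        # projection onto the orthant
print(np.mean(np.sum(proj**2, axis=1)), n / 2)   # both approximately 25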

Spectral Norm Ball. How many measurements to recover a unitary matrix U? The tangent cone T_A(U) is the set of skew-symmetric matrices minus the positive semidefinite cone. These two sets are orthogonal, thus w(T_A(U))² ≤ n(n−1)/2 + (1/2)·n(n+1)/2 = (3n² − n)/4.

Re-derivations. Hypercube: m ≥ n/2 (the orthant is self-dual; alternatively, direct integration). Sparse vectors (n-vector, sparsity s): m ≥ (2s + 1) log(n/s). Low-rank matrices (n1 × n2, n1 < n2, rank r): m ≥ 3r(n1 + n2 − r) + 2n1.

General Cones. Theorem: let C be a nonempty cone with polar cone C°, and suppose C° subtends normalized solid angle µ. Then w(C) ≤ 3√(log(4/µ)). Proof idea: the expected distance to C° can be bounded by the expected distance to a spherical cap. Isoperimetry: of all subsets of the sphere with the same measure, the one with the smallest neighborhood is the spherical cap. The rest is just integrals...

Symmetry II: Polytopes. Corollary: for a vertex-transitive (i.e., "symmetric") polytope with p vertices, O(log p) Gaussian measurements suffice to recover a vertex via convex optimization. For an n × n permutation matrix: m = O(n log n). For an n × n cut matrix: m = O(n). (The semidefinite relaxation also gives m = O(n).)

Algorithms. minimize_z ||Φz − y||₂² + µ||z||_A. This is naturally amenable to a projected gradient algorithm: z_{k+1} = Π_{ηµ}(z_k − η Φᵀ r_k), with residual r_k = Φz_k − y and shrinkage Π_τ(z) = argmin_u (1/2)||z − u||² + τ||u||_A. A similar algorithm handles an atomic-norm constraint, and the same basic ingredients drive ALM, ADM, Bregman, Mirror Prox, etc. The question is: how do we compute the shrinkage?
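
Instantiated for the ℓ1 atomic norm, the shrinkage is entrywise soft thresholding and the iteration above becomes the familiar ISTA. A minimal sketch (step size and iteration count are illustrative; the quadratic is scaled by 1/2 as is conventional):

import numpy as np

def soft_threshold(z, tau):
    """Shrinkage Pi_tau for the l1 norm: prox of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def atomic_prox_gradient(Phi, y, mu, iters=500):
    """Proximal gradient for (1/2)||Phi z - y||^2 + mu ||z||_1."""
    eta = 1.0 / np.linalg.norm(Phi, 2) ** 2       # step size 1/L
    z = np.zeros(Phi.shape[1])
    for _ in range(iters):
        r = Phi @ z - y                           # residual r_k
        z = soft_threshold(z - eta * Phi.T @ r, eta * mu)
    return z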

Shrinkage. Π_τ(z) = argmin_u (1/2)||z − u||² + τ||u||_A, and Λ_τ(z) = argmin_{v : ||v||_A* ≤ τ} (1/2)||z − v||². These satisfy the Moreau decomposition z = Π_τ(z) + Λ_τ(z), where ||v||_A* = max_{a∈A} ⟨v, a⟩ is the dual norm.
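
For the nuclear-norm atoms of the rank slide, the same shrinkage is singular value soft thresholding; the complementary piece Λ_τ(Z) = Z − Π_τ(Z) is then the projection onto the spectral-norm ball of radius τ. A minimal sketch:

import numpy as np

def svt(Z, tau):
    """Prox of tau * ||.||_* : soft-threshold the singular values of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt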

Relaxations. The dual norm ||v||_A* = max_{a∈A} ⟨v, a⟩ is efficiently computable when the set of atoms is polyhedral or semidefinite representable. If A₁ ⊆ A₂, then ||x||_{A₂} ≤ ||x||_{A₁}, so convex relaxations of the atomic set yield approximations to the norm from below. NB: the tangent cone gets wider. Hierarchies of polyhedral (Sherali-Adams) or semidefinite (Positivstellensatz) approximations to the atomic set yield progressively tighter bounds on the atomic norm.

Maxnorm Algorithms. Atomic set: the rank-one sign matrices, giving ||X||_A = inf{ ||σ||₁ : X = Σⱼ σⱼ uⱼvⱼᵀ, uⱼ and vⱼ sign vectors with ||uⱼ||_∞ = ||vⱼ||_∞ = 1 }. Semidefinite relaxation via the max-norm: ||X||_max := inf{ ||U||_{2,∞} ||V||_{2,∞} : X = UVᵀ }, with ||X||_max ≤ ||X||_A ≤ 1.8 ||X||_max by Grothendieck's inequality. Fast algorithms based on projection/shrinkage: Lee, Recht, Salakhutdinov, Srebro, Tropp (NIPS 2010). Key ingredients: semidefinite programming, low-rank embedding, projected gradient, stochastic approximation.

Maxnorm vs. Tracenorm. Better generalization in theory (Srebro '05, and width arguments). More stable and better prediction in practice: for example, significantly better performance on collaborative filtering data sets. Extensions to spectral clustering, graph approximation algorithms, etc.

Scaling Up. Exploiting geometric structure in multicore data analysis: clever parallelization of incremental gradient algorithms, cache alignment, etc. Netflix data set: 100M examples, 17,770 rows, 480,189 columns. In preparation for SIGMOD '11 with Christopher Ré.

Atomic Norm Decompositions. We propose a natural convex heuristic for enforcing prior information in inverse problems. Bounds for the linear case: the heuristic succeeds for most sufficiently large sets of measurements. Stability without restricted isometries. A standard program for computing these bounds: distance to the normal cone. Approximation schemes for computationally difficult priors.

Extensions: width calculations for more general structures; recovery bounds for structured, application-specific measurement matrices; understanding the loss due to convex relaxation and norm approximation; scaling generalized shrinkage algorithms to massive data sets.