From Minimum Cut to Submodular Minimization: Leveraging the Decomposable Structure. Alina Ene, Boston University.

1 From Minimum Cut to Submodular Minimization: Leveraging the Decomposable Structure. Alina Ene (Boston University). Joint work with Huy Nguyen (Northeastern University) and Laszlo Vegh (London School of Economics).

2 Minimum s-t Cut (image segmentation) [figure: pixel graph with terminals s, t and variables x_1, ..., x_5]. Submodular Minimization: min_{S ⊆ V} f(S). Applications: structured sparsity [Bach 10], MAP inference.

3 Minimum s-t Cut (image segmentation) is an instance of Submodular Minimization: min_{S ⊆ V} f(S). Further applications: structured sparsity [Bach 10], MAP inference.

4 Submodular Minimization, min_{S ⊆ V} f(S), is solvable in polynomial time: combinatorial algorithms, ellipsoid, cutting plane, ...

5 Submodular Minimization, min_{S ⊆ V} f(S), is solvable in polynomial time (combinatorial, ellipsoid, cutting plane, ...), but the running times are very high (≈ |V|³). Leverage structure to obtain faster algorithms.

6 Decomposable Functions: min_{S ⊆ V} Σ_i f_i(S). Simple f_i: there is a fast subroutine for minimizing f_i(S) + w(S) for any linear function w. [Stobbe, Krause 10; Kolmogorov 12; Jegelka, Bach, Sra 13; Nishihara, Jegelka, Jordan 14]
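(Not part of the slides.) A minimal Python sketch of the decomposable structure, assuming each f_i is the cut function of a single weighted edge and f is their sum; the edge list and function names are made up for the illustration.

```python
# Minimal sketch: a decomposable submodular function f(S) = sum_i f_i(S),
# where each f_i is the cut function of a single edge (u, v) with weight w:
#   f_i(S) = w if exactly one endpoint of the edge lies in S, else 0.
# Edge list and names are illustrative, not from the talk.

def edge_cut(edge, weight, S):
    """Cut function of a single edge: submodular, supported on {u, v}."""
    u, v = edge
    return weight if (u in S) != (v in S) else 0.0

def decomposable_cut(edges, weights, S):
    """f(S) = sum_i f_i(S); a sum of submodular functions is submodular."""
    return sum(edge_cut(e, w, S) for e, w in zip(edges, weights))

# Example: f(S) is the total weight of edges crossing S.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
weights = [1.0, 2.0, 1.0, 3.0]
print(decomposable_cut(edges, weights, {0, 1}))  # edges (1,2) and (0,3) are cut -> 5.0
```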

7 Decomposable Structure [figure: a graph written as the sum of its edges]: f(S) = Σ_{e ∈ E} f_e(S), where f_e is the cut function of a single edge.

8 Decomposable Structure, more generally [figure: the same decomposition with hyperedges, local functions, ... as the summands].

9 This Talk. Sum-of-Submodular / Decomposable SFM: min_{S ⊆ V} Σ_i f_i(S). Continuous algorithms based on gradient descent; combinatorial algorithms based on max-flow approaches; the interplay between continuous optimization and combinatorial structure.

10 Main Combinatorial Player, the key concept underpinning submodular algorithms: the Base Polytope B(f) = { y : ⟨y, 1_S⟩ ≤ f(S) for all S ⊆ V, ⟨y, 1_V⟩ = f(V) }. Think of y as a linear function on V.
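(Not on the slide, but a standard companion fact.) The vertices of B(f) are exactly the points produced by Edmonds' greedy algorithm from an ordering of V. A minimal Python sketch, with illustrative names:

```python
# Minimal sketch (illustrative, not from the talk): Edmonds' greedy algorithm.
# For a submodular f with f(emptyset) = 0 and an ordering v_1, ..., v_n of V,
# the point y with y(v_k) = f({v_1,...,v_k}) - f({v_1,...,v_{k-1}}) lies in B(f)
# (it is in fact a vertex of the base polytope).

def greedy_base_vertex(f, order):
    """Return {v: y(v)} on the base polytope B(f) for the given ordering."""
    y, prefix, prev = {}, set(), 0.0
    for v in order:
        prefix.add(v)
        val = f(prefix)
        y[v] = val - prev       # marginal value of v w.r.t. the current prefix
        prev = val
    return y                    # satisfies y(V) = f(V) and y(S) <= f(S) for all S

# Example with the cut function of the single edge {0, 1}:
f = lambda S: float(len(S & {0, 1}) == 1)
print(greedy_base_vertex(f, [0, 1]))  # {0: 1.0, 1: -1.0}
```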

11 Continuous Algorithms

12 From Submodular to Convex Optimization [Edmonds 70, Lovasz 83, Fujishige 84]: Submodular Minimization can be reduced to finding the point of minimum norm in the base polytope. Discrete problem: min_{S ⊆ V} f(S). Exact convex formulation: min_{y ∈ B(f)} ||y||.
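(Implicit in the slides.) Fujishige's min-norm-point theorem is what turns the optimum of the convex formulation back into a discrete minimizer; a LaTeX statement for reference:

```latex
% Fujishige's min-norm-point theorem: recovering the discrete minimizer.
\[
  y^* = \arg\min_{y \in B(f)} \|y\|_2, \qquad
  S^- = \{ v \in V : y^*(v) < 0 \}, \qquad
  S^0 = \{ v \in V : y^*(v) \le 0 \}.
\]
% S^- and S^0 are the unique minimal and maximal minimizers of f, and
\[
  \min_{S \subseteq V} f(S) \;=\; \sum_{v \in V} \min\{ y^*(v),\, 0 \}.
\]
```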

13 Discrete problem: min_{S ⊆ V} f(S). Exact convex formulation: min_{y ∈ B(f)} ||y||. The convex objective is very nice; the constraint is very complicated.

14 Discrete problem: min_{S ⊆ V} f(S). Exact convex formulation: min_{y ∈ B(f)} ||y||. Algorithms based on the ellipsoid method / cutting planes [GLS 84, LSW 15]: provable polynomial running times, but the running times are prohibitively large in applications.

15 Discrete problem: min_{S ⊆ V} f(S). Exact convex formulation: min_{y ∈ B(f)} ||y||. Fujishige-Wolfe and Frank-Wolfe algorithms: suitable for some applications, but only pseudo-polynomial running times and known to require a large number of iterations.

16 In the decomposable setting, we can leverage more of the convex optimization toolkit. Lemma: if f = Σ_i f_i then B(f) = Σ_i B(f_i) (Minkowski sum). Discrete problem: min_{S ⊆ V} Σ_i f_i(S). Exact convex formulation: min_{y_i ∈ B(f_i)} ||Σ_i y_i||. f_i simple ⇒ quadratic minimization over B(f_i) is easy.
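(Not spelled out on the slide.) One inclusion of the lemma is immediate from the definitions; the reverse inclusion is the classical sum theorem for base polyhedra (Edmonds). The easy direction, using the convention f_i(∅) = 0:

```latex
% Easy containment \sum_i B(f_i) \subseteq B(f) for f = \sum_i f_i
% (the reverse inclusion is Edmonds' sum theorem for base polyhedra):
\[
  y_i \in B(f_i)\ \forall i
  \;\Longrightarrow\;
  \Big(\sum_i y_i\Big)(S) = \sum_i y_i(S) \le \sum_i f_i(S) = f(S)
  \quad \forall S \subseteq V,
\]
\[
  \Big(\sum_i y_i\Big)(V) = \sum_i f_i(V) = f(V),
  \qquad \text{hence } \sum_i y_i \in B(f).
\]
```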

17 Discrete problem: min_{S ⊆ V} Σ_i f_i(S). Exact convex formulation: min_{y_i ∈ B(f_i)} ||Σ_i y_i||. The decomposition allows for fast and practical algorithms based on gradient descent [Stobbe, Krause 10; Jegelka, Bach, Sra 13; Nishihara, Jegelka, Jordan 14; E., Nguyen 15].

18 Discrete problem: min_{S ⊆ V} Σ_i f_i(S). Exact convex formulation: min_{y_i ∈ B(f_i)} ||Σ_i y_i||. [E., Nguyen ICML 15]: minimize the convex formulation using Random Coordinate Gradient Descent; there is also an accelerated algorithm; these give the fastest running times currently known.

19 Exact convex formulation: min { g(y) := (1/2)||Σ_i y_i||² : y_i ∈ B(f_i) }. Random Coordinate Descent Algorithm: start with y = (y_1, ..., y_r), where y_i ∈ B(f_i); in each iteration, pick an index i ∈ {1, 2, ..., r} uniformly at random and update block i: y'_i = argmin_{z_i ∈ B(f_i)} [ g(y) + ⟨∇_i g(y), z_i − y_i⟩ + (1/2)||z_i − y_i||² ].
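(Not the authors' code.) A minimal Python sketch of this coordinate descent loop for g(y) = (1/2)||Σ_i y_i||²: the block gradient is ∇_i g(y) = Σ_j y_j, so the block update is exactly a Euclidean projection onto B(f_i). The projection is delegated to an assumed oracle project_onto_base; this is the step that is cheap when f_i is simple.

```python
import random

# Sketch of the random coordinate descent scheme for
#     g(y) = 0.5 * || sum_i y_i ||^2   with   y_i in B(f_i).
# For this g the block gradient is grad_i g(y) = sum_j y_j, so the block update
#     y_i <- argmin_{z in B(f_i)} <grad_i g(y), z - y_i> + 0.5 ||z - y_i||^2
# is the Euclidean projection of y_i - sum_j y_j onto B(f_i).
# `project_onto_base(i, vector)` is an assumed oracle (easy for simple f_i).

def rcd_decomposable_sfm(y_blocks, project_onto_base, iters=1000):
    """y_blocks: list of r lists of length n, with y_blocks[i] in B(f_i)."""
    r, n = len(y_blocks), len(y_blocks[0])
    s = [sum(y_blocks[i][v] for i in range(r)) for v in range(n)]   # s = sum_i y_i
    for _ in range(iters):
        i = random.randrange(r)                        # uniformly random block
        target = [y_blocks[i][v] - s[v] for v in range(n)]
        new_yi = project_onto_base(i, target)          # projection onto B(f_i)
        for v in range(n):                             # maintain s = sum_i y_i
            s[v] += new_yi[v] - y_blocks[i][v]
        y_blocks[i] = new_yi
    return s   # approximate min-norm point of B(f) = sum_i B(f_i)
```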

20 Smoothness of g: ||∇_i g(y) − ∇_i g(z)|| ≤ ||y_i − z_i|| whenever y and z differ only in block i. (Random Coordinate Descent Algorithm box as on slide 19.)

21 Smooth functions have quadratic upper bounds [figure: g and a quadratic upper bound touching it at y]. (Algorithm box as on slide 19.)

22 The quadratic upper bound at y: h(z) = g(y) + ⟨∇_i g(y), z_i − y_i⟩ + (1/2)||z_i − y_i||², for z and y differing only in block i [figure: g and the upper bound at y]. (Algorithm box as on slide 19.)

23 The block update minimizes this upper bound, h(z) = g(y) + ⟨∇_i g(y), z_i − y_i⟩ + (1/2)||z_i − y_i||², over z_i ∈ B(f_i) [figure: moving from y to the minimizer of the upper bound]. (Algorithm box as on slide 19.)

24 Exact convex formulation: min { g(y) := (1/2)||Σ_i y_i||² : y_i ∈ B(f_i) }. (Algorithm box as on slide 19.) The block update is efficient for simple functions f_i.

25 Exact convex formulation: min { g(y) := (1/2)||Σ_i y_i||² : y_i ∈ B(f_i) }. (Algorithm box as on slide 19.) There is also an accelerated version, building on [Fercoq-Richtárik 13].

26 Discrete problem: min_{S ⊆ V} Σ_i f_i(S). Exact convex formulation: min_{y_i ∈ B(f_i)} ||Σ_i y_i||. [E., Nguyen ICML 15]: minimize the convex formulation using Random Coordinate Gradient Descent.

27 Discrete problem: min_{S ⊆ V} Σ_i f_i(S). Exact convex formulation: min_{y_i ∈ B(f_i)} ||Σ_i y_i||. [E., Nguyen ICML 15]: minimize the convex formulation using Random Coordinate Gradient Descent. Running time (with acceleration) [ENV 17]: O(nr log F_max · Q), where Q is the time for one update via the oracle for f_i.

28 Discrete problem: min_{S ⊆ V} Σ_i f_i(S). Exact convex formulation: min_{y_i ∈ B(f_i)} ||Σ_i y_i||. [E., Nguyen ICML 15]: minimize the convex formulation using Random Coordinate Gradient Descent. Running time (with acceleration) [ENV 17]: O(nr log(n F_max) · Q), where Q is the time for one update via the oracle for f_i.

29 Discrete Algorithms

30 The quest for combinatorial algorithms for submodular minimization. [Cunningham 85]: "It is an outstanding open problem to find a practical combinatorial algorithm to minimize a general submodular function, which also runs in polynomial time."

31 The quest for combinatorial algorithms for submodular minimization (Cunningham quote as on slide 30). [Bixby et al. 85, Cunningham 85]: a framework for designing combinatorial algorithms for submodular minimization.

32 The Cunningham framework, the basic framework for all combinatorial algorithms: maintain a point y in the base polytope and iteratively update y using flow-style updates. How do we represent a point in the base polytope?

33 The Cunningham framework (as on slide 32). How do we represent a point in the base polytope? As a convex combination of vertices. Pro: very general (works for any submodular function). Con: difficult and expensive to maintain and update.

34 For decomposable submodular functions, a simpler version of Cunningham's approach works! Recall: for f = Σ_i f_i we have B(f) = Σ_i B(f_i). We can represent y ∈ B(f) as y = y_1 + ... + y_r where y_i ∈ B(f_i), and verify y_i ∈ B(f_i) using the oracle for f_i.
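(Illustration only, not the authors' oracle.) Membership y_i ∈ B(f_i) means that y_i is tight on the ground set and that y_i(S) ≤ f_i(S) for every S; the brute-force check below is feasible only because a simple f_i typically has small support. Names are hypothetical.

```python
from itertools import combinations

# Brute-force membership test for the base polytope of one summand f_i:
#   y_i in B(f_i)  <=>  y_i(support) = f_i(support)  and  y_i(S) <= f_i(S) for all S,
# i.e. min_S ( f_i(S) - y_i(S) ) >= 0. Only sensible for small supports.

def in_base_polytope(f_i, support, y_i, tol=1e-9):
    """f_i: set function on subsets of `support`; y_i: dict vertex -> value."""
    if abs(sum(y_i[v] for v in support) - f_i(set(support))) > tol:
        return False                                   # not tight at the ground set
    for k in range(len(support) + 1):
        for S in combinations(support, k):
            S = set(S)
            if sum(y_i[v] for v in S) > f_i(S) + tol:  # violated constraint
                return False
    return True

# Example: cut function of the single edge {0, 1}.
f_edge = lambda S: float(len(S & {0, 1}) == 1)
print(in_base_polytope(f_edge, [0, 1], {0: 1.0, 1: -1.0}))   # True (a vertex)
print(in_base_polytope(f_edge, [0, 1], {0: 2.0, 1: -2.0}))   # False
```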

35 From decomposable functions to graphs. Maintain a point y = y_1 + ... + y_r ∈ B(f) together with its decomposition into points y_i ∈ B(f_i). Update y using an auxiliary graph G = (V, ∪_i E_i): (u, v) ∈ E_i if y_i + α(1_u − 1_v) ∈ B(f_i) for some α > 0, with capacity c(u, v) = max { α : y_i + α(1_u − 1_v) ∈ B(f_i) }. For a graph cut instance, the auxiliary graph is the usual residual graph for network flows.
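(Not spelled out on the slide.) The arc capacities above are exchange capacities, and a standard identity expresses them through the oracle for f_i:

```latex
% Exchange capacity of the arc (u, v) contributed by block i:
\[
  c_i(u, v)
  \;=\; \max\{\, \alpha \ge 0 \;:\; y_i + \alpha(\mathbf{1}_u - \mathbf{1}_v) \in B(f_i) \,\}
  \;=\; \min\{\, f_i(S) - y_i(S) \;:\; u \in S,\; v \notin S \,\}.
\]
```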

36 Translate the classical max-flow toolkit to the submodular setting: Edmonds-Karp-Dinitz, preflow-push, ...

37 Translate the classical max-flow toolkit to the submodular setting: Edmonds-Karp-Dinitz, preflow-push, ... Sample result: if all f_i have small support, preflow-push runs in O(n²r) oracle calls. First shown by [Fujishige-Zhang 94] for r = 2; [Fix et al. 13] adapt the IBFS algorithm of [Goldberg et al. 11].

38 Translate the classical max-flow toolkit to the submodular setting: Edmonds-Karp-Dinitz, preflow-push, ... [E., Nguyen, Vegh NIPS 17]. Discrete to continuous: tight running time analyses for the gradient descent algorithms. Discrete vs continuous: experimental comparison on computer vision tasks.

39 Recall our problem: min { g(y) = (1/2)||y_1 + ... + y_r||² : y_i ∈ B(f_i) } [figure: the set of optimal solutions]. Minimize the objective using random coordinate descent. Linear convergence rate despite the lack of strong convexity: use combinatorial structure instead of strong convexity.

40 min { f(x) : x ∈ X }. µ-strong convexity: f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (µ/2) d(x, y)² for all x, y ∈ X.

41 min { f(x) : x ∈ X }, µ-strong convexity (as on slide 40). What we really need: f(x) − f(x*) ≥ (µ/2) (min_{x*} d(x, x*))², where x* ranges over the optimal points.

42 Our problem: min { g(y) = (1/2)||y_1 + ... + y_r||² : y_i ∈ B(f_i) } [figure: an arbitrary point and an optimal point]. Restricted µ-strong convexity: g(y) − g(y*) ≥ (µ/2) (min_{y*} d(y, y*))², where y is an arbitrary point and y* ranges over the optimal points.

43 Our problem: min { g(y) = (1/2)||y_1 + ... + y_r||² : y_i ∈ B(f_i) }. Restricted µ-strong convexity: g(y) − g(y*) ≥ (µ/2) (min_{y*} d(y, y*))². Can show µ ≥ max{ 1/(n²r), 1/n² }.

44 Our problem: min { g(y) = (1/2)||y_1 + ... + y_r||² : y_i ∈ B(f_i) }. Want: g(y) − g(y*) ≥ (1/n²) d(y, y*)².

45 Our problem: min { g(y) = (1/2)||y_1 + ... + y_r||² : y_i ∈ B(f_i) } [figure: B(f) as the sum of the blocks B(f_i)]. Want: g(y) − g(y*) ≥ (1/n²) d(y, y*)².

46 Our problem [figure: B(f) as the sum of the blocks, with the min-norm point marked]. Want: g(y) − g(y*) ≥ (1/n²) d(y, y*)².

47 First step: g(y) − g(y*) is at least (a constant times) the squared distance between the sums, d(Σ_i y_i, Σ_i y_i*)², because y* minimizes the norm of the sum.

48 Second step: show d(Σ_i y_i, Σ_i y_i*)² ≥ (1/n²) d(y, y*)² for a suitably chosen optimal y*.

49 The second step is proved via a flow decomposition argument [figure: flow decomposition between the decomposition of y and an optimal decomposition].

50 [figure: flow decomposition example, with a demand of 1/n at each terminal] Comparing the ℓ-norm of the demand with the ℓ-norm of the flow: the ℓ-norm of the demand is at least (1/n) times the ℓ-norm of the flow.

51 Summary. From Graph Min Cut to Decomposable Submodular Minimization: algorithms that leverage both convex optimization and the combinatorial structure (via graph max flow).
