Dynamic Tree Block Coordinate Ascent

Size: px

Start display at page:

Download "Dynamic Tree Block Coordinate Ascent"

Denis Todd
5 years ago
Views:

1 Dynamic Tree Block Coordinate Ascent Daniel Tarlow 1, Dhruv Batra 2 Pushmeet Kohli 3, Vladimir Kolmogorov 4 1: University of Toronto 3: Microso3 Research Cambridge 2: TTI Chicago 4: University College London InternaAonal Conference on Machine Learning (ICML), 2011

2 MAP in Large Discrete Models Many important problems can be expressed as a discrete Random Field (MRF, CRF) MAP inference is a fundamental problem min x X E(x) = min x X i V θ i (x i )+ (i,j) E θ ij (x i,x j ) InpainAng Stereo Object Class Labeling Protein Design / Side Chain PredicAon

3 Primal and Dual min x Primal A V E θ A (x A ) A V E Dual min x A θa (x A )= A V E h A Dual is a lower bound: less constrained version of primal is a reparameteriza)on, determined by messages h A* is height of unary or pairwise potenaal DefiniAon of reparameterizaaon: θ A (x A )= A V E A V E θ A (x A ) {x A } LP based message passing: find reparameterizaaon to maximize dual

4 Standard Linear Program based Message Passing Max Product Linear Programming (MPLP) Update edges in fixed order SequenAal Tree Reweighted Max Product (TRW S) SequenAally iterate over variables in fixed order Tree Block Coordinate Ascent (TBCA) [Sontag & Jaakkola, 2009] Update trees in fixed order Key: these are all energy oblivious Can we do be^er by being energy aware?

5 Example TBCA with StaJc Schedule: 630 messages needed TBCA with Dynamic Schedule: 276 messages needed

Benefit of Energy Awareness StaJc semngs Not all graph regions are equally difficult RepeaAng computaaon on easy parts is wasteful Easy region Dynamic semngs

6 Benefit of Energy Awareness StaJc semngs Not all graph regions are equally difficult RepeaAng computaaon on easy parts is wasteful Easy region Dynamic semngs (e.g., learning, search) Harder region Small region of graph changes. ComputaAon on unchanged part is wasteful Unchanged Image Previous OpAmum Change Mask Changed

7 References and Related Work [Elidan et al., 2006], [Su^on & McCallum, 2007] Residual Belief PropagaAon. Pass most different messages first. [Chandrasekaran et al., 2007] Works only on conanuous variables. Very different formulaaon. [Batra et al., 2011] Local Primal Dual Gap for Tightening LP relaxaaons. [Kolmogorov, 2006] Weak Tree Agreement in relaaon to TRW S. [Sontag et al., 2009] Tree Block Coordinate Descent.

8 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: (can assume "good" has cost 0, otherwise cost 1) G G G, B B G G G, B B G G B, B B B x 1 x 2 x 3 x 4 R G B

9 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: R G B G G G, B B G G G, B B G G B, B B B "Don't be R or B" "Don't be R or G" x 1 x 2 x 3 x 4 "Don't be R or B" "Don't be R or B" "Don't be R or B" "Don't be R" HypotheJcal messages that e.g. residual max product would send.

10 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: G G G, B B G G G, B B G G B, B B B x 1 x 2 x 3 x 4 R G B But we don't need to send any messages. We are at the global opjmum. Our scores (see later slides) are 0, so we wouldn't send any messages here.

11 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: B G G, B B B G G, B B B G B, B B B x 1 x 2 x 3 x 4 R G B Change unary potenjals (e.g., during learning or search)

12 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: B G G, B B B G G, B B B G B, B B B x 1 x 2 x 3 x 4 R G B Locally, best assignment for some variables change.

13 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: R G B B G G, B B B G G, B B B G B, B B B "Don't be R or G" "Don't be R or G" x 1 x 2 x 3 x 4 "Don't be R or G" "Don't be R or G" "Don't be R or G" "Don't be R" HypotheJcal messages that e.g. residual max product would send.

14 VisualizaJon of reparameterized energy States for each variable: red (R), green (G), or blue (B) "Good" local settings: B G G, B B B G G, B B B G B, B B B x 1 x 2 x 3 x 4 R G B But we don't need to send any messages. We are at the global opjmum. Our scores (see later slides) are 0, so we wouldn't send any messages here.

15 VisualizaJon of reparameterized energy "Good" local settings: B G G, B B B G G, B B B G B, B B B x 1 x 2 x 3 x 4 Possible fix: look at how much sending messages on edge would improve dual. Would work in above case, but incorrectly ignores e.g. the subgraph below: "Good" local settings: B B B G, B G G R, G R R R R G B x 1 x 2 x 3 x 4

16 Key Slide B B B G, B B B R,G R R R x 1 x 2 x 3 x 4 Locally, everything looks opamal

17 Key Slide B B B G, B B B R,G R R R x 1 x 2 x 3 x 4 Try assigning a value to each variable

18 Key Slide Our main contribuaon Use primal (and dual) informajon to choose regions on which to pass messages B B B G, B B B R,G R R R x 1 x 2 x 3 x 4 SubopAmal SubopAmal Try assigning a value to each variable

19 Our FormulaJon Measure primal dual local agreement at edges and variables Local Primal Dual Gap (LPDG). Weak Tree Agreement (WTA). Choose forest with maximum disagreement Kruskal's algorithm, possibly terminated early Apply TBCA update on maximal trees Important! Minimize overhead. Use quanaaes that are already computed during inference, and carefully cache computaaons

(x p A ) A V E min x A θa (x A ) = A V E primal dual ( ) θ A (x p A ) min θa (x A ) x A = A V E

20 Local Primal Dual Gap (LPDG) Score Difference between primal and dual objecaves Primal dual gap Given primal assignment x p and dual variables (messages) defining, primal dual gap is A V E θ A (x p A ) A V E min x A θa (x A ) = A V E primal dual ( ) θ A (x p A ) min θa (x A ) x A = A V E LPDG (A) Primal cost of node/edge Dual bound at node/edge e: local disagreement measure: e A = LPDG (A)

21 Shortcoming of LPDG Score: Loose RelaxaJons LPDG > 0, but dual opamal Filled circle means, black edge means

22 Weak Tree Agreement (WTA) [Kolmogorov 2006] Reparameterized potenaals are said to saasfy WTA if there exist non empty subsets for each node i D i X i such that θ i (x i )=h i min θij (x i,x j )=h ij x j D j x i D i x i D i, (i, j) E Filled circle means Black edge means labels labels At Weak Tree Agreement Not at Weak Tree Agreement

23 Weak Tree Agreement (WTA) [Kolmogorov 2006] Reparameterized potenaals are said to saasfy WTA if there exist non empty subsets for each node i D i X i such that θ i (x i )=h i min θij (x i,x j )=h ij x j D j x i D i x i D i, (i, j) E Filled circle means Black edge means labels labels At Weak Tree Agreement D 1 ={0} D 2 ={0,2} D 2 ={0,2} D 3 ={0} Not at Weak Tree Agreement D 1 ={0} D 2 ={2} D 2 ={0,2} D 3 ={0}

24 Weak Tree Agreement (WTA) [Kolmogorov 2006] Reparameterized potenaals are said to saasfy WTA if there exist non empty subsets for each node i D i X i such that θ i (x i )=h i min θij (x i,x j )=h ij x j D j x i D i x i D i, (i, j) E Filled circle means Black edge means labels labels At Weak Tree Agreement Not at Weak Tree Agreement D 1 ={0} D 2 ={2} D 2 ={0,2} D 3 ={0}

25 WTA Score e: local disagreement measure Costs: solid low do^ed medium else high D 2 ={0,2} D 3 ={0,2} e 23 = max( min(a,high), min(b,c) ) c Filled circle means, black edge means

26 WTA Score e: local disagreement measure Costs: solid low do^ed medium else high D 2 ={0,2} D 3 ={0,2} e 23 = max( min(a,high), min(b,c) ) c Filled circle means, black edge means

27 WTA Score e: local disagreement measure Costs: solid low do^ed medium else high D 2 ={0,2} D 3 ={0,2} e 23 = max a, c ( ) c = a c Filled circle means, black edge means

28 WTA Score e: local disagreement measure Costs: solid low do^ed medium else high D 2 ={0,2} D 3 ={0,2} e 23 = max a, c ( ) c = a c Filled circle means, black edge means

29 WTA Score e: local disagreement measure: node measure e i = max θ i (x i ) min θ i (x i ) x i D i x i

30 Single FormulaJon of LPDG and WTA Set a max history size parameter R. Store most recent R labelings of variable i in label set D i R=1: LPDG score. R>1: WTA score. Combine scores into undirected edge score:

31 ProperJes of LPDG/WTA Scores LPDG measure gives upper bound on possible dual improvement from passing messages on forest LPDG may overesamate "usefulness" of an edge e.g., on non Aght relaxaaons. LPDG > 0 WTA = 0 WTA measure addresses overesamate problem: is zero shortly a3er normal message passing would converge. Both only change when messages are passed on nearby region of graph.

32 Experiments Computer Vision: Stereo Image SegmentaAon Dynamic Image SegmentaAon Protein Design: StaAc problem CorrelaAon between measure and dual improvement Dynamic search applicaaon Algorithms TBCA: StaAc Schedule, LPDG Schedule, WTA Schedule MPLP [Sontag and Globerson implementaaon] TRW S [Kolmogorov ImplementaAon]

33 Experiments: Stereo 383x434 pixels, 16 labels. Po^s potenaals.

34 Experiments: Image SegmentaJon +,!" (, '(%& + (!&!* %*%, 5678,9:;03<.=0!&!)(!&!)!&!'(!&!', ->?@!AB5C ->?@!D-@ ->?@!?E/:2 -FD!G H@B!"" #"" $"" %"" -./0,12034.;2<(=>?541/@5 %*%$+ %*%$ %*%)+ %*%) ( 6ABC!DE.F 6ABC!G6C 6ABC!B9:>0 6HG!I JCE! " # $ %& %! -(./012345( :0 '(%& ) 375x500 pixels, 21 labels. General potenaals based on label co occurence.

Experiments: Dynamic Image SegmentaJon Sheep!&'# +,!"*!&'*!&') 6>?!@,.A2B 6>?

35 Experiments: Dynamic Image SegmentaJon Sheep!&'# +,!"*!&'*!&') 6DEF!?6F,.A2B 6DEF!?6F,C7/:2B, Previous Opt Modify White Unaries ;357<=!&''!&'( Sheep!&'! New Opt Heatmap of Messages!&',!" #!" $!" % -,./012345, :0 Warm started DTBCA vs Warm started TRW S 375x500 pixels, 21 labels. Po^s potenaals.

36 Experiments: Dynamic Image SegmentaJon Airplane Previous Opt Modify White Unaries New Opt Heatmap of Messages Warm started DTBCA vs Warm started TRW S 375x500 pixels, 21 labels. Po^s potenaals.

37 Experiments: Protein Design Dual Improvement vs. Measure on Forest Dual Improvement 10 1 Random Blocks Max WTA Block Max LPDG Block 10 1 LPDG/WTA Measure Dual Improvement 10!3 10!5 Random Blocks Max LPDG Block 10!5 10!3 10!1 LPDG Measure Dual Improvement 10!3 10!5 Random Blocks Max WTA Block 10!5 10!3 10!1 WTA Measure Other protein experiments: (see paper) DTBCA vs. staac "stars" on small protein DTBCA converges to opamum in.39s vs TBCA in.86s SimulaAng node expansion in A* search on larger protein Similar dual for DTBCA in 5s as Warm started TRW S in 50s. Protein Design from Yanover et al.

38 Discussion Energy oblivious schedules can be wasteful. For LP based message passing, primal informaaon is useful for scheduling. We give two low overhead ways of including it Biggest win comes from dynamic applicaaons ExciAng future dynamic applicaaons: search, learning,...

39 Discussion Energy oblivious schedules can be wasteful. For LP based message passing, primal informaaon is useful for scheduling. We give two low overhead ways of including it Biggest win comes from dynamic applicaaons ExciAng future dynamic applicaaons: search, learning,... Thank You!

40 Unused slides

41 Schlesinger's Linear Program (LP) Real valued exact Marginal polytope approx LOCAL polytope (see next slide)

42 Primal Dual

44 WTA Score e: local disagreement measure D 2 ={0,2} D 3 ={0} Filled circle means, black edge means

An Introduction to LP Relaxations for MAP Inference

An Introduction to LP Relaxations for MAP Inference Adrian Weller MLSALT4 Lecture Feb 27, 2017 With thanks to David Sontag (NYU) for use of some of his slides and illustrations For more information, see