Parallel and Distributed Graph Cuts by Dual Decomposition
Presented by Varun Gulshan, 06 July 2010
What is Dual Decomposition?
- It is a technique for optimization, usually used for approximate optimization.
- Dual: there is a Lagrangian dual involved somewhere.
- Decomposition: some kind of decomposition of the problem is involved.
Optimization Problem
Consider an optimization problem of the form:

    min_y  f_1(y) + f_2(y)

- Usually f(y) = f_1(y) + f_2(y) is hard to optimize.
- Each of f_1(y) and f_2(y) can be easily optimized separately.
- Dual decomposition idea: decompose the original problem into optimizable subproblems and combine their solutions in a principled way.
- Variable duplication and duality are used to achieve this.
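As a sanity check on the variable-duplication idea, here is a toy illustration of my own (the subproblems f_1(y) = (y - 1)^2 and f_2(y) = (y - 3)^2 are hypothetical, not from the slides): a brute-force search over a grid confirms that duplicating the variable and enforcing the coupling constraint leaves the optimum unchanged.

```python
# Hypothetical easy subproblems; their sum is minimized at y = 2, value 2.
f1 = lambda y: (y - 1.0) ** 2
f2 = lambda y: (y - 3.0) ** 2

# Discretize [0, 4] into 401 grid points.
ys = [i * 0.01 for i in range(401)]

# Original problem: minimize over a single variable y.
orig = min(f1(y) + f2(y) for y in ys)

# Duplicated problem: minimize over (y1, y2) subject to y1 == y2.
# Restricting to the diagonal recovers exactly the original problem.
dup = min(f1(y1) + f2(y2) for y1 in ys for y2 in ys if y1 == y2)
```

Dropping the constraint `y1 == y2` is what makes the two terms separable; the rest of the slides are about accounting for that dropped constraint via the dual.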
Optimization Problem
Original problem:

    min_y  f_1(y) + f_2(y)

Duplicate the variables:

    min_{y_1, y_2}  f_1(y_1) + f_2(y_2)    s.t.  y_1 - y_2 = 0

Write the Lagrangian dual:

    g(λ) = min_{y_1, y_2}  f_1(y_1) + f_2(y_2) + λ^T (y_1 - y_2)
Optimization Problem
The Lagrangian dual:

    g(λ) = min_{y_1, y_2}  f_1(y_1) + f_2(y_2) + λ^T (y_1 - y_2)

Decompose the Lagrangian dual:

    g(λ) = [ min_{y_1} f_1(y_1) + λ^T y_1 ] + [ min_{y_2} f_2(y_2) - λ^T y_2 ]
         = g_1(λ) + g_2(λ)

where:

    g_1(λ) = min_{y_1}  f_1(y_1) + λ^T y_1
    g_2(λ) = min_{y_2}  f_2(y_2) - λ^T y_2
Maximizing the Dual
- For every λ, g(λ) is a lower bound on the optimal value of the primal. So maximize the lower bound:

    max_λ  g(λ) = max_λ  g_1(λ) + g_2(λ)

- The Lagrangian dual g(λ) is always a concave function of λ (it is a pointwise minimum of functions affine in λ), and hence is easily maximized.
- Subgradient ascent is used to optimize with respect to λ.
- A subgradient of g_1(λ) = min_{y_1} f_1(y_1) + λ^T y_1 is given by ȳ_1 = argmin_{y_1} f_1(y_1) + λ^T y_1. So computing the subgradient essentially involves solving this minimization.
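The claim that the minimizer ȳ_1 itself acts as a subgradient can be checked directly. The following one-line derivation (my addition, using only the definition of g_1 and the fact that ȳ_1 is the minimizer at λ) establishes it:

```latex
g_1(\lambda') = \min_{y_1} \left[ f_1(y_1) + \lambda'^T y_1 \right]
            \le f_1(\bar{y}_1) + \lambda'^T \bar{y}_1
            = \underbrace{f_1(\bar{y}_1) + \lambda^T \bar{y}_1}_{=\, g_1(\lambda)}
              \; + \; \bar{y}_1^T (\lambda' - \lambda)
```

So g_1(λ') ≤ g_1(λ) + ȳ_1^T (λ' - λ) for every λ', which is exactly the supergradient inequality for the concave function g_1 at λ (for concave functions maximized by ascent, "subgradient" on the slide means this supergradient).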
Maximizing the Dual: Algorithm

    max_λ  g_1(λ) + g_2(λ) = max_λ [ min_{y_1} f_1(y_1) + λ^T y_1  +  min_{y_2} f_2(y_2) - λ^T y_2 ]

1. Initialize λ (it can be arbitrary).
2. Compute subgradients of g_1(λ) and g_2(λ). This involves solving min_{y_1} f_1(y_1) + λ^T y_1 (and the analogous problem for f_2), which is doable because f_1(y) and f_2(y) are easy to minimize individually.
3. Use the subgradients to update λ (the usual gradient ascent update rule): λ^{t+1} = λ^t + α_t (ȳ_1 - ȳ_2).
4. Go to step 2; repeat until g(λ) converges.
5. Now that we have the optimal λ, we need to recover the primal variables y_1 and y_2.
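The steps above can be sketched in a few lines. This is my own minimal illustration on a toy one-dimensional problem (assumed subproblems f_1(y) = (y - 1)^2 and f_2(y) = (y - 3)^2, whose λ-augmented argmins have closed forms); a real application would replace the two argmin routines with the actual subproblem solvers, e.g. graph cuts.

```python
def argmin_g1(lam):
    # Minimizer of (y - 1)^2 + lam*y: set derivative 2(y - 1) + lam = 0.
    return 1.0 - lam / 2.0

def argmin_g2(lam):
    # Minimizer of (y - 3)^2 - lam*y: set derivative 2(y - 3) - lam = 0.
    return 3.0 + lam / 2.0

def maximize_dual(num_iters=100, step=0.5):
    lam = 0.0                       # step 1: initialize lambda (arbitrary)
    for t in range(num_iters):
        y1 = argmin_g1(lam)         # step 2: solve both subproblems;
        y2 = argmin_g2(lam)         #         (y1 - y2) is the subgradient
        lam += step * (y1 - y2)     # step 3: ascent step on the dual
    # step 5: recover primal candidates from the final lambda
    return lam, argmin_g1(lam), argmin_g2(lam)
```

A constant step size suffices on this smooth toy dual (λ converges to -2 and both argmins agree at y = 2, the true minimizer of f_1 + f_2); in general, subgradient methods need a diminishing step-size schedule α_t.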
Recovering the Primal Variables
- By solving the dual we get the globally optimal dual variable λ* = argmax_λ g(λ). But we need the primal variables y_1, y_2, as that is what we were originally interested in optimizing.
- Note that while optimizing the dual, we computed subgradients, which involved minimizations over the primal variables. We use those solutions as our primal variables, i.e.:

    y_1* = argmin_{y_1}  f_1(y_1) + λ*^T y_1
    y_2* = argmin_{y_2}  f_2(y_2) - λ*^T y_2
Infeasible Primal Solution
- The obtained primal solutions y_1* and y_2* will in general not satisfy y_1* = y_2*; the constraint is only satisfied when the duality gap is 0.
- One way out is to choose whichever of y_1* or y_2* has the lower primal value f(y).
- Currently I haven't figured out how people find a good feasible primal solution (Victor, help!).
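The first heuristic above is a one-liner. This is my own sketch (the quadratic f_1, f_2 and the candidate values are hypothetical, chosen only to exercise it): when the two primal candidates disagree, keep whichever one gives the lower value of the original objective f = f_1 + f_2.

```python
def best_feasible_primal(f1, f2, y1, y2):
    # Both candidates are feasible for the original (single-variable)
    # problem; return the one with the lower primal objective value.
    return min((y1, y2), key=lambda y: f1(y) + f2(y))

# Hypothetical example: f1 + f2 is minimized at y = 2, so the candidate
# closer to 2 should win.
f1 = lambda y: (y - 1.0) ** 2
f2 = lambda y: (y - 3.0) ** 2
chosen = best_feasible_primal(f1, f2, 1.0, 2.0)  # f(1.0) = 4, f(2.0) = 2
```

Note the dual value g(λ*) still certifies a lower bound, so f(chosen) - g(λ*) bounds how far this heuristic is from the true optimum.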
Examples
N. Komodakis, N. Paragios and G. Tziritas. MRF Optimization via Dual Decomposition: Message-Passing Revisited. ICCV 2007.
- They use dual decomposition to find the MAP solution of an MRF with loops:

    f(y) = θ_ac(y_a, y_c) + θ_ad(y_a, y_d) + θ_ab(y_a, y_b) + θ_bc(y_b, y_c) + θ_bd(y_b, y_d),    y_i ∈ {1, 2, ..., L}

- The loopy energy is decomposed into tractable (tree-structured) parts, f(y) = f_1(y) + f_2(y), each of which can be optimized exactly.
Other Applications
- Optimization of higher-order potentials, i.e. energies involving more than just pairwise terms.
- Parallelization of optimization (next).
The Missing Parts
- It is easy to extend the discussion to a decomposition into more than 2 parts.
- There can be many choices of decomposition; which is a good one?
- Theoretical properties: what guarantees can we get on the primal solution?