Parallel and Distributed Graph Cuts by Dual Decomposition
Presented by Varun Gulshan, 06 July 2010
What is Dual Decomposition?
- It is a technique for optimization, usually used for approximate optimization.
- Dual: there is a Lagrangian dual involved somewhere.
- Decomposition: some kind of decomposition of the problem is involved.
Optimization Problem
Consider an optimization problem of the form:

    min_y  f_1(y) + f_2(y)

- Usually f(y) = f_1(y) + f_2(y) is hard to optimize.
- Each of f_1(y) and f_2(y) can be easily optimized separately.
- Dual decomposition idea: decompose the original problem into optimizable subproblems and combine their solutions in a principled way.
- Variable duplication and duality are used to achieve this.
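As a sanity check on the variable-duplication idea, here is a toy illustration of my own (the subproblems f_1(y) = (y - 1)^2 and f_2(y) = (y - 3)^2 are hypothetical, not from the slides): a brute-force search over a grid confirms that duplicating the variable and enforcing the coupling constraint leaves the optimum unchanged.

```python
# Hypothetical easy subproblems; their sum is minimized at y = 2, value 2.
f1 = lambda y: (y - 1.0) ** 2
f2 = lambda y: (y - 3.0) ** 2

# Discretize [0, 4] into 401 grid points.
ys = [i * 0.01 for i in range(401)]

# Original problem: minimize over a single variable y.
orig = min(f1(y) + f2(y) for y in ys)

# Duplicated problem: minimize over (y1, y2) subject to y1 == y2.
# Restricting to the diagonal recovers exactly the original problem.
dup = min(f1(y1) + f2(y2) for y1 in ys for y2 in ys if y1 == y2)
```

Dropping the constraint `y1 == y2` is what makes the two terms separable; the rest of the slides are about accounting for that dropped constraint via the dual.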
Optimization Problem
Original problem:

    min_y  f_1(y) + f_2(y)

Duplicate the variables:

    min_{y_1, y_2}  f_1(y_1) + f_2(y_2)    s.t.  y_1 - y_2 = 0

Write the Lagrangian dual:

    g(λ) = min_{y_1, y_2}  f_1(y_1) + f_2(y_2) + λ^T (y_1 - y_2)
Optimization Problem
The Lagrangian dual:

    g(λ) = min_{y_1, y_2}  f_1(y_1) + f_2(y_2) + λ^T (y_1 - y_2)

Decompose the Lagrangian dual:

    g(λ) = [ min_{y_1} f_1(y_1) + λ^T y_1 ] + [ min_{y_2} f_2(y_2) - λ^T y_2 ]
         = g_1(λ) + g_2(λ)

where:

    g_1(λ) = min_{y_1}  f_1(y_1) + λ^T y_1
    g_2(λ) = min_{y_2}  f_2(y_2) - λ^T y_2
Maximizing the Dual
- For every λ, g(λ) is a lower bound on the optimal value of the primal. So maximize the lower bound:

    max_λ  g(λ) = max_λ  g_1(λ) + g_2(λ)

- The Lagrangian dual g(λ) is always a concave function of λ (it is a pointwise minimum of functions affine in λ), and hence is easily maximized.
- Subgradient ascent is used to optimize with respect to λ.
- A subgradient of g_1(λ) = min_{y_1} f_1(y_1) + λ^T y_1 is given by ȳ_1 = argmin_{y_1} f_1(y_1) + λ^T y_1. So computing the subgradient essentially involves solving this minimization.
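The claim that the minimizer ȳ_1 itself acts as a subgradient can be checked directly. The following one-line derivation (my addition, using only the definition of g_1 and the fact that ȳ_1 is the minimizer at λ) establishes it:

```latex
g_1(\lambda') = \min_{y_1} \left[ f_1(y_1) + \lambda'^T y_1 \right]
            \le f_1(\bar{y}_1) + \lambda'^T \bar{y}_1
            = \underbrace{f_1(\bar{y}_1) + \lambda^T \bar{y}_1}_{=\, g_1(\lambda)}
              \; + \; \bar{y}_1^T (\lambda' - \lambda)
```

So g_1(λ') ≤ g_1(λ) + ȳ_1^T (λ' - λ) for every λ', which is exactly the supergradient inequality for the concave function g_1 at λ (for concave functions maximized by ascent, "subgradient" on the slide means this supergradient).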
Maximizing the Dual: Algorithm

    max_λ  g_1(λ) + g_2(λ) = max_λ [ min_{y_1} f_1(y_1) + λ^T y_1  +  min_{y_2} f_2(y_2) - λ^T y_2 ]

1. Initialize λ (it can be arbitrary).
2. Compute subgradients of g_1(λ) and g_2(λ). This involves solving min_{y_1} f_1(y_1) + λ^T y_1 (and the analogous problem for f_2), which is doable because f_1(y) and f_2(y) are easy to minimize individually.
3. Use the subgradients to update λ (the usual gradient ascent update rule): λ^{t+1} = λ^t + α_t (ȳ_1 - ȳ_2).
4. Go to step 2; repeat until g(λ) converges.
5. Now that we have the optimal λ, we need to recover the primal variables y_1 and y_2.
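The steps above can be sketched in a few lines. This is my own minimal illustration on a toy one-dimensional problem (assumed subproblems f_1(y) = (y - 1)^2 and f_2(y) = (y - 3)^2, whose λ-augmented argmins have closed forms); a real application would replace the two argmin routines with the actual subproblem solvers, e.g. graph cuts.

```python
def argmin_g1(lam):
    # Minimizer of (y - 1)^2 + lam*y: set derivative 2(y - 1) + lam = 0.
    return 1.0 - lam / 2.0

def argmin_g2(lam):
    # Minimizer of (y - 3)^2 - lam*y: set derivative 2(y - 3) - lam = 0.
    return 3.0 + lam / 2.0

def maximize_dual(num_iters=100, step=0.5):
    lam = 0.0                       # step 1: initialize lambda (arbitrary)
    for t in range(num_iters):
        y1 = argmin_g1(lam)         # step 2: solve both subproblems;
        y2 = argmin_g2(lam)         #         (y1 - y2) is the subgradient
        lam += step * (y1 - y2)     # step 3: ascent step on the dual
    # step 5: recover primal candidates from the final lambda
    return lam, argmin_g1(lam), argmin_g2(lam)
```

A constant step size suffices on this smooth toy dual (λ converges to -2 and both argmins agree at y = 2, the true minimizer of f_1 + f_2); in general, subgradient methods need a diminishing step-size schedule α_t.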
Recovering the Primal Variables
- By solving the dual we get the globally optimal dual variable λ* = argmax_λ g(λ). But we need the primal variables y_1, y_2, as that is what we were originally interested in optimizing.
- Note that while optimizing the dual, we computed subgradients, which involved minimizations over the primal variables. We use those solutions as our primal variables, i.e.:

    y_1* = argmin_{y_1}  f_1(y_1) + λ*^T y_1
    y_2* = argmin_{y_2}  f_2(y_2) - λ*^T y_2
Infeasible Primal Solution
- The obtained primal solutions y_1* and y_2* will in general not satisfy y_1* = y_2*; the constraint is only satisfied when the duality gap is 0.
- One way out is to choose whichever of y_1* or y_2* has the lower primal value f(y).
- Currently I haven't figured out how people find a good feasible primal solution (Victor, help!).
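The first heuristic above is a one-liner. This is my own sketch (the quadratic f_1, f_2 and the candidate values are hypothetical, chosen only to exercise it): when the two primal candidates disagree, keep whichever one gives the lower value of the original objective f = f_1 + f_2.

```python
def best_feasible_primal(f1, f2, y1, y2):
    # Both candidates are feasible for the original (single-variable)
    # problem; return the one with the lower primal objective value.
    return min((y1, y2), key=lambda y: f1(y) + f2(y))

# Hypothetical example: f1 + f2 is minimized at y = 2, so the candidate
# closer to 2 should win.
f1 = lambda y: (y - 1.0) ** 2
f2 = lambda y: (y - 3.0) ** 2
chosen = best_feasible_primal(f1, f2, 1.0, 2.0)  # f(1.0) = 4, f(2.0) = 2
```

Note the dual value g(λ*) still certifies a lower bound, so f(chosen) - g(λ*) bounds how far this heuristic is from the true optimum.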
Examples
N. Komodakis, N. Paragios and G. Tziritas. MRF Optimization via Dual Decomposition: Message-Passing Revisited. ICCV 2007.
- They use dual decomposition to find the MAP solution of an MRF with loops:

    f(y) = θ_ac(y_a, y_c) + θ_ad(y_a, y_d) + θ_ab(y_a, y_b) + θ_bc(y_b, y_c) + θ_bd(y_b, y_d),    y_i ∈ {1, 2, ..., L}

- The loopy energy is decomposed into tractable (tree-structured) parts, f(y) = f_1(y) + f_2(y), each of which can be optimized exactly.
Other Applications
- Optimization of higher-order potentials, i.e. energies involving more than just pairwise terms.
- Parallelization of optimization (next).
The Missing Parts
- It is easy to extend the discussion to a decomposition into more than 2 parts.
- There can be many choices of decomposition; which is a good one?
- Theoretical properties: what guarantees can we get on the primal solution?