Parallel and Distributed Graph Cuts by Dual Decomposition

Parallel and Distributed Graph Cuts by Dual Decomposition. Presented by Varun Gulshan, 06 July 2010.

What is Dual Decomposition? It is a technique for optimization, usually used for approximate optimization. Dual: there is a Lagrangian dual involved somewhere. Decomposition: some kind of decomposition of the problem is involved.

Optimization Problem. Consider an optimization problem of the form: min_y f_1(y) + f_2(y). Usually f(y) = f_1(y) + f_2(y) is hard to optimize, but each of f_1(y) and f_2(y) can be optimized easily on its own. Dual decomposition idea: decompose the original problem into easily optimizable subproblems and combine their solutions in a principled way. Variable duplication and duality are used to achieve this.

Optimization Problem. Original problem: min_y f_1(y) + f_2(y). Duplicate variables: min_{y_1, y_2} f_1(y_1) + f_2(y_2) s.t. y_1 = y_2 (equivalently, y_1 - y_2 = 0). Write the Lagrangian dual: g(λ) = min_{y_1, y_2} f_1(y_1) + f_2(y_2) + λ^T (y_1 - y_2).

Optimization Problem. The Lagrangian dual: g(λ) = min_{y_1, y_2} f_1(y_1) + f_2(y_2) + λ^T (y_1 - y_2). Decompose the Lagrangian dual: g(λ) = min_{y_1} [f_1(y_1) + λ^T y_1] + min_{y_2} [f_2(y_2) - λ^T y_2] = g_1(λ) + g_2(λ), where g_1(λ) = min_{y_1} f_1(y_1) + λ^T y_1 and g_2(λ) = min_{y_2} f_2(y_2) - λ^T y_2.
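Why is g(λ) a lower bound on the primal optimum (the fact used on the next slide)? A short argument, not spelled out in the slides: for any y that is feasible for the duplicated problem, i.e. y_1 = y_2 = y,

g(λ) ≤ f_1(y) + f_2(y) + λ^T (y - y) = f(y),

since g(λ) minimizes over all pairs (y_1, y_2). Hence g(λ) ≤ min_y f(y) for every λ (weak duality).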

Maximizing Dual. For any λ, g(λ) is a lower bound on the optimal value of the primal, so maximize the lower bound: max_λ g(λ) = max_λ g_1(λ) + g_2(λ). The Lagrangian dual g(λ) is always a concave function of λ (equivalently, -g(λ) is convex), so maximizing it is a convex problem and can be solved reliably. A subgradient method is used to optimize over λ. A subgradient of g_1(λ) = min_{y_1} f_1(y_1) + λ^T y_1 is given by ȳ_1 = argmin_{y_1} f_1(y_1) + λ^T y_1, so computing the subgradient essentially amounts to solving this minimization.
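Why does ȳ_1 serve as a (super)gradient of the concave function g_1? A one-line check, added here for completeness: for any λ',

g_1(λ') = min_{y_1} f_1(y_1) + λ'^T y_1 ≤ f_1(ȳ_1) + λ'^T ȳ_1 = g_1(λ) + (λ' - λ)^T ȳ_1,

which is exactly the supergradient inequality at λ. The same argument gives -ȳ_2 as a supergradient of g_2, so ȳ_1 - ȳ_2 is the ascent direction used in the algorithm below.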

Maximizing Dual: Algorithm
max_λ g_1(λ) + g_2(λ) = max_λ [ min_{y_1} f_1(y_1) + λ^T y_1 + min_{y_2} f_2(y_2) - λ^T y_2 ]
1. Initialize λ (it can be arbitrary).
2. Compute subgradients of g_1(λ) and g_2(λ). Computing a subgradient involves solving min_{y_1} f_1(y_1) + λ^T y_1 (and the analogous problem for f_2), which is doable because f_1(y) and f_2(y) are easy to minimize on their own.
3. Use the subgradients to update λ with the usual gradient-style update rule: λ^{t+1} = λ^t + α_t (ȳ_1 - ȳ_2).
4. Go to step 2; repeat until g(λ) converges.
5. Now that we have the optimal λ, we need to recover the primal variables y_1 and y_2.
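Below is a minimal Python sketch (not from the slides) of this loop on a toy problem where f_1(y) = 0.5·||y - a||^2 and f_2(y) = 0.5·||y - b||^2, so each subproblem has a closed-form minimizer; the function names and the diminishing step size are illustrative choices, not part of the original talk.

```python
import numpy as np

# Toy subproblems: f1(y) = 0.5*||y - a||^2 and f2(y) = 0.5*||y - b||^2.
# Both are trivially minimizable, and the coupled problem min_y f1(y) + f2(y)
# has the known solution (a + b) / 2, which lets us check the loop.
a = np.array([1.0, 4.0])
b = np.array([3.0, 0.0])

def argmin_f1_plus(lam):
    # argmin_y 0.5*||y - a||^2 + lam^T y  =>  y = a - lam
    return a - lam

def argmin_f2_minus(lam):
    # argmin_y 0.5*||y - b||^2 - lam^T y  =>  y = b + lam
    return b + lam

lam = np.zeros_like(a)                # step 1: arbitrary initialization
for t in range(1, 201):
    y1 = argmin_f1_plus(lam)          # step 2: solve the two easy subproblems
    y2 = argmin_f2_minus(lam)
    step = 1.0 / t                    # diminishing step size alpha_t
    lam = lam + step * (y1 - y2)      # step 3: subgradient update on lambda

print("y1 =", y1, "y2 =", y2)         # both approach (a + b) / 2 = [2. 2.]
```

With these quadratic subproblems the two copies agree at convergence (zero duality gap); for hard, non-convex subproblems such as the graph-cut energies in the talk, the copies may disagree, which is exactly the situation discussed on the "Infeasible primal solution" slide below.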

Recovering primal variables. By solving the dual we get the globally optimal dual variable λ* = argmax_λ g(λ). But we need the primal variables y_1 and y_2, since those are what we were originally interested in optimizing. Note that while optimizing the dual we computed subgradients, which involved minimizations over the primal variables. We use those solutions as our primal variables, i.e. ȳ_1 = argmin_{y_1} f_1(y_1) + λ*^T y_1 and ȳ_2 = argmin_{y_2} f_2(y_2) - λ*^T y_2.

Infeasible primal solution. The obtained primal solutions ȳ_1 and ȳ_2 will in general not satisfy ȳ_1 = ȳ_2; equality holds only when the duality gap is zero. One way out is to choose whichever of ȳ_1 or ȳ_2 has the lower primal value. Currently I haven't figured out how people find a good feasible primal solution (Victor, help!).
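A tiny sketch of the heuristic just mentioned, assuming f1 and f2 are callables returning scalar energies (the names are illustrative):

```python
def pick_primal(f1, f2, y1, y2):
    """Of the two (generally unequal) copies y1 and y2, return the one with
    the lower total primal energy f1(y) + f2(y), together with that energy."""
    e1 = f1(y1) + f2(y1)
    e2 = f1(y2) + f2(y2)
    return (y1, e1) if e1 <= e2 else (y2, e2)
```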

Examples. N. Komodakis, N. Paragios and G. Tziritas. MRF Optimization via Dual Decomposition: Message-Passing Revisited. ICCV 2007. They use dual decomposition to find the MAP solution of an MRF with loops: f(y) = θ_ac(y_a, y_c) + θ_ad(y_a, y_d) + θ_ab(y_a, y_b) + θ_bc(y_b, y_c) + θ_bd(y_b, y_d), with each y_i ∈ {1, 2, ..., L}.

Examples. The energy is decomposed as f(y) = f_1(y) + f_2(y).
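The transcript does not preserve the figure showing the split, but one possible decomposition of the loopy energy above into two cycle-free (hence exactly MAP-solvable) parts would be, purely for illustration:

f_1(y) = θ_ab(y_a, y_b) + θ_ac(y_a, y_c) + θ_ad(y_a, y_d),   f_2(y) = θ_bc(y_b, y_c) + θ_bd(y_b, y_d).

Each part is a tree, so its MAP assignment can be computed exactly (e.g. by dynamic programming), and the dual decomposition machinery above coordinates the two copies; Komodakis et al. decompose into trees covering the graph.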

Other applications. Optimization of higher-order potentials, i.e. energies involving more than just pairwise terms. Parallelization of optimization (next).

The missing parts. It is easy to extend the discussion to a decomposition into more than two parts. There can be many choices of decomposition; which one is a good choice? Theoretical properties: what guarantees can we get on the primal solution?