Regularization and Markov Random Fields (MRF) CS 664 Spring 2008


Regularization in Low-Level Vision
- Low-level vision problems are concerned with estimating some quantity at each pixel
  - Visual motion (u(x,y), v(x,y))
  - Stereo disparity d(x,y)
  - Restoration of true intensity b(x,y)
- Problem is under-constrained
  - Only able to observe noisy values at each pixel
  - Sometimes a single pixel is not enough to estimate the value
  - Need to apply other constraints

Smooth but with discontinuities

Small discontinuities important

Smoothness Constraints
- Estimated values should change slowly as a function of (x,y)
  - Except at boundaries, which are relatively rare
- Minimize an error function E(r(x,y)) = V(r(x,y)) + λ D_I(r(x,y))
  - For r being estimated at each (x,y) location
  - V penalizes change in r in a local neighborhood
  - D_I penalizes r disagreeing with the image data
  - λ controls the tradeoff between the smoothness and data terms; can itself be parameterized by (x,y)

Regularization for Visual Motion
- Use a quadratic error function (see the sketch below)
- Smoothness term: V(u(x,y), v(x,y)) = u_x² + u_y² + v_x² + v_y²
  - Where subscripts denote partials, u_x = ∂u(x,y)/∂x, etc.
- Data term: D_I(u(x,y), v(x,y)) = (I_x u + I_y v + I_t)²
- Only for smoothly changing motion fields
  - No discontinuity boundaries
  - Does not work well in practice
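
A minimal sketch of this quadratic energy, assuming the partials are approximated by finite differences on the pixel grid and that the image derivatives I_x, I_y, I_t are precomputed arrays; the function name and NumPy usage are illustrative, not from the course code.

```python
import numpy as np

def quadratic_motion_energy(u, v, Ix, Iy, It, lam=1.0):
    """Return V + lam * D_I for flow fields u, v on a regular grid."""
    # Smoothness term: squared finite differences of u and v in x and y
    ux, uy = np.diff(u, axis=1), np.diff(u, axis=0)
    vx, vy = np.diff(v, axis=1), np.diff(v, axis=0)
    V = (ux**2).sum() + (uy**2).sum() + (vx**2).sum() + (vy**2).sum()
    # Data term: squared brightness-constancy residual at every pixel
    D = ((Ix * u + Iy * v + It) ** 2).sum()
    return V + lam * D
```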

Problems With Regularization
- Computational difficulty
  - Extremely high-dimensional minimization problem
  - 2mn-dimensional space for an m×n image and motion estimation
  - If k motion values, k^(2mn) possible solutions
  - Can solve with gradient descent methods
- Smoothness is too strong a model
  - Can in principle estimate a variable smoothness penalty λ_I(x,y)
  - More difficult computation; need to relate λ_I to V, D_I

Regularization With Discontinuities
- Line process: estimate a binary value representing whether there is a discontinuity between neighboring pixels
- Pixels as sites s ∈ S (vertices in a graph)
  - Neighborhood N_s: sites connected to s by edges
  - Grid graph, 4-connected or 8-connected
- Write the smoothness term analogously as Σ_{s∈S} Σ_{n∈N_s} [(u_s - u_n)² + (v_s - v_n)²]

Line Process
- Variable smoothness penalty depending on a binary estimate of discontinuity l_{s,n}
  Σ_{s∈S} Σ_{n∈N_s} [α_s (1 - l_{s,n})((u_s - u_n)² + (v_s - v_n)²) + β_s l_{s,n}]
  - With α_s, β_s constants controlling smoothness
- Minimization problem no longer as simple
  - Graduated non-convexity (GNC)

Robust Regularization
- Both smoothness and data constraints can be violated
  - Result not smooth at certain locations: addressed by the line process
  - Data values bad at certain locations (e.g., specularities, occlusions): not addressed by the line process
- Unified view: model both smoothness and data terms using robust error measures
  - Replace the quadratic error, which is sensitive to outliers

Robust Formulation
- Simply replace the quadratic terms with a robust error function ρ
  Σ_{s∈S} [ρ_1(I_x u_s + I_y v_s + I_t) + λ Σ_{n∈N_s} [ρ_2(u_s - u_n) + ρ_2(v_s - v_n)]]
  - In practice often estimate the first term over a small region around s
- Some robust error functions (see the sketch below)
  - Truncated linear: ρ_τ(x) = min(τ, |x|)
  - Truncated quadratic: ρ_τ(x) = min(τ, x²)
  - Lorentzian: ρ_σ(x) = log(1 + ½(x/σ)²)
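
A minimal sketch of these robust error functions as plain Python functions of a scalar residual x; the function names are illustrative.

```python
import math

def truncated_linear(x, tau):
    # Penalty grows linearly with |x| but is capped at tau for outliers
    return min(tau, abs(x))

def truncated_quadratic(x, tau):
    # Quadratic for small residuals, capped at tau for outliers
    return min(tau, x * x)

def lorentzian(x, sigma):
    # Smooth robust penalty whose influence falls off for large residuals
    return math.log(1.0 + 0.5 * (x / sigma) ** 2)
```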

Influence Functions
- Useful to think of error functions in terms of the degree to which a given value affects the result

Relation to Line Process
- Can think of a robust error function as performing outlier rejection
  - Influence (near) zero for outliers but non-zero for inliers
- Line process makes a binary inlier/outlier decision
  - Based on an external process or on the degree of difference between estimated values
- Both the robust estimation and line process formulations are local characterizations

Relationship to MRF Models
- Markov random field (MRF)
  - Collection of random variables
  - Graph structure models spatial relations with local neighborhoods (Markov property)
  - Explicit dependencies among pixels
- Widely used in low-level vision problems
  - Stereo, motion, segmentation
  - Seek the best label for each pixel
- Bayesian model, e.g., MAP estimation
  - Common to consider the corresponding energy minimization problems

Markov Random Fields in Vision
- Graph G = (V, E); assume vertices indexed 1, ..., n
- Observable variables y = {y_1, ..., y_n}
- Unobservable variables x = {x_1, ..., x_n}
- Edges connect each x_i to certain neighbors N_{x_i}
- Edges connect each x_i to y_i
- Consider cliques of size 2
  - Recall a clique is a fully connected sub-graph
  - 4-connected grid or 2-connected chain

MRF Models in Vision
- Prior P(x) factors into a product of functions over cliques
  - Due to the Hammersley-Clifford theorem
  P(x) = Π_C Ψ_C(x_C)
  - Ψ_C is termed the clique potential, of the form exp(-V_C)
- For clique size 2 (cliques correspond to edges)
  P(x) = Π_{i,j} Ψ_ij(x_i, x_j)
- Probability of hidden and observed values
  P(x,y) = Π_{i,j} Ψ_ij(x_i, x_j) · Π_i Ψ_ii(x_i, y_i)
- Given a particular clique energy V_ij and observed y, seek the values of x maximizing P(x,y)

Markov Property
- Neighborhoods completely characterize conditional distributions
  - Solving a global problem with local relationships
- Probability of values over a subset S given the remainder is the same as for that subset given its neighborhood
  - Given S ⊆ V and S^c = V - S: P(x_S | x_{S^c}) = P(x_S | x_{N_S})
- Conceptually and computationally useful

MRF Estimation
- Various ways of maximizing the probability
- Common to use the MAP estimate
  argmax_x P(x | y) = argmax_x Π_{i,j} Ψ_ij(x_i, x_j) Π_i Φ_i(x_i, y_i)
- Probabilities are hard to compute with
  - Use logs (or, often, the negative log)
  argmin_x Σ_{i,j} V_ij(x_i, x_j) + Σ_i D_i(x_i, y_i)
- In the energy function formulation, often think of assigning the best label f_i ∈ L to each node v_i given data y_i (see the sketch below)
  argmin_f [Σ_i D(y_i, f_i) + Σ_{i,j} V(f_i, f_j)]
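
A minimal sketch of evaluating this energy for a given labeling on a 4-connected grid; D and V are passed in as callables, and all names are illustrative rather than from the course code.

```python
# E(f) = sum_i D(y_i, f_i) + sum_{(i,j)} V(f_i, f_j) over a 4-connected grid.
def grid_energy(labels, data, D, V):
    """labels, data: 2D lists of the same size (rows x cols)."""
    rows, cols = len(labels), len(labels[0])
    E = 0.0
    for r in range(rows):
        for c in range(cols):
            E += D(data[r][c], labels[r][c])           # data term at each site
            if c + 1 < cols:                            # right-neighbor clique
                E += V(labels[r][c], labels[r][c + 1])
            if r + 1 < rows:                            # down-neighbor clique
                E += V(labels[r][c], labels[r + 1][c])
    return E
```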

Similar to Regularization
- Summation of data and smoothness terms
  argmin_f [Σ_i D(y_i, f_i) + Σ_{i,j} V(f_i, f_j)]
  argmin_f Σ_{s∈S} [ρ_1(d_s, f_s) + λ Σ_{n∈N_s} ρ_2(f_s - f_n)]
- Data term D vs. robust data function ρ_1
- Clique term V vs. robust smoothness function ρ_2
  - Over cliques rather than neighbors of each site
  - Nearly the same definitions on a 4-connected grid
- Probabilistic formulation particularly helpful for learning problems
  - Parameters of D, V, or even the form of D, V

Common Clique Energies
- Enforce smoothness, robust to outliers (see the sketch below)
- Potts model: same or outlier (based on label identity)
  V_τ(f_i, f_j) = 0 when f_i = f_j, τ otherwise
- Truncated linear model: small linear change or outlier (label difference)
  V_{σ,τ}(f_i, f_j) = min(τ, σ|f_i - f_j|)
- Truncated quadratic model: small quadratic change or outlier (label difference)
  V_{σ,τ}(f_i, f_j) = min(τ, σ|f_i - f_j|²)
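
A minimal sketch of these three clique energies for scalar labels, with tau the truncation value and sigma the rate; the function names are illustrative.

```python
def potts(fi, fj, tau):
    # Zero cost for identical labels, fixed cost tau otherwise
    return 0.0 if fi == fj else tau

def truncated_linear_V(fi, fj, sigma, tau):
    # Linear in the label difference, capped at tau
    return min(tau, sigma * abs(fi - fj))

def truncated_quadratic_V(fi, fj, sigma, tau):
    # Quadratic in the label difference, capped at tau
    return min(tau, sigma * (fi - fj) ** 2)
```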

1D Graphs (Chains)
- Simpler than 2D for illustration purposes
  - Fast polynomial-time algorithms
- Problem definition
  - Sequence of nodes V = (1, ..., n)
  - Edges between adjacent pairs (i, i+1)
  - Observed value y_i at each node
  - Seek a labeling f = (f_1, ..., f_n), f_i ∈ L, minimizing Σ_i [D(y_i, f_i) + V(f_i, f_{i+1})] (note V(f_n, f_{n+1}) = 0)
- Contrast with smoothing by convolution; example:
  d_i: 1  3  2  1  3  12 10 11 10 12
  f_i: 2  2  2  2  2  11 11 11 11 11

Viterbi Recurrence
- Don't need an explicit min over f = (f_1, ..., f_n)
- Instead recursively compute
  s_i(f_i) = D(y_i, f_i) + min_{f_{i-1}} (s_{i-1}(f_{i-1}) + V(f_{i-1}, f_i))
- Note s_i(f_i) for a given i encodes a lowest-cost label sequence ending in state f_i at that node
  [Trellis diagram: possible labels (values of f_i), annotated with s_{i-1}(f_{i-1}), V(f_{i-1}, f_i), D(y_i, f_i), s_i(f_i)]

Viterbi Algorithm
- Find a lowest-cost assignment f_1, ..., f_n (see the sketch below)
- Initialize: s_1(f_1) = D(y_1, f_1) + π(f_1), with π the cost of starting in f_1 if not uniform
- Recurse:
  s_i(f_i) = D(y_i, f_i) + min_{f_{i-1}} (s_{i-1}(f_{i-1}) + V(f_{i-1}, f_i))
  b_i(f_i) = argmin_{f_{i-1}} (s_{i-1}(f_{i-1}) + V(f_{i-1}, f_i))
- Terminate: min_{f_n} s_n(f_n), the cost of the cheapest path (negative log probability)
  f_n* = argmin_{f_n} s_n(f_n)
- Backtrack: f_{n-1}* = b_n(f_n*), and so on back to f_1*
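
A minimal sketch of the Viterbi recurrence with backpointers, assuming a uniform prior π; the names (viterbi, labels, D, V) are illustrative, not from the course code.

```python
def viterbi(y, labels, D, V):
    n = len(y)
    s = [{f: D(y[0], f) for f in labels}]        # s_1(f_1), uniform prior
    b = [dict()]                                  # backpointers b_i(f_i)
    for i in range(1, n):
        s_i, b_i = {}, {}
        for f in labels:
            # best predecessor label for f at node i
            prev = min(labels, key=lambda fp: s[i - 1][fp] + V(fp, f))
            s_i[f] = D(y[i], f) + s[i - 1][prev] + V(prev, f)
            b_i[f] = prev
        s.append(s_i)
        b.append(b_i)
    # terminate at the cheapest final state, then backtrack
    f_best = [min(labels, key=lambda f: s[n - 1][f])]
    for i in range(n - 1, 0, -1):
        f_best.append(b[i][f_best[-1]])
    return list(reversed(f_best)), min(s[n - 1].values())
```

With, for example, D(y, f) = (y - f)² and a truncated linear V, this tends to produce piecewise-constant labelings like the d_i/f_i example on the earlier slide.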

Viterbi Algorithm
- For a sequence of n data elements, with m possible labels per element
  - Compute s_i(f_i) for each element using the recurrence: O(nm²) time
  - For the final node compute the f_n minimizing s_n(f_n)
  - Trace back from the final node to the first node
    - Minimizers already computed when computing costs on the forward pass
- First step dominates the running time
- Avoids searching exponentially many paths

Large Label Sets Problematic
- Viterbi slow with a large number of labels
  - O(m²) term in calculating s_i(f_i)
- For our problems V usually has a special form, so can compute in linear time
- Consider a linear clique energy
  s_i(f_i) = D(y_i, f_i) + min_{f_{i-1}} (s_{i-1}(f_{i-1}) + |f_{i-1} - f_i|)
  - The minimization term is precisely the distance transform DT_{s_{i-1}} of a function, considered earlier
  - Which can be computed in linear time
- But the linear model is not robust
  - Can extend to truncated linear

Truncated Distance Cost
- Avoid the explicit min_{f_{i-1}} for each f_i (see the sketch below)
- Truncated linear model
  min_{f_{i-1}} (s_{i-1}(f_{i-1}) + min(τ, |f_{i-1} - f_i|))
- Factor f_i out of the minimizations over f_{i-1}
  = min( min_{f_{i-1}} (s_{i-1}(f_{i-1}) + τ), min_{f_{i-1}} (s_{i-1}(f_{i-1}) + |f_{i-1} - f_i|) )
  = min( min_{f_{i-1}} (s_{i-1}(f_{i-1}) + τ), DT_{s_{i-1}}(f_i) )
- Analogous for the truncated quadratic model
- Similar for the Potts model, except no need for a distance transform
- O(mn) algorithm for the best label sequence
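
A minimal sketch of the linear-time message computation for the truncated linear model, assuming labels 0, ..., m-1 with unit spacing and unit weight on the linear term; this uses the standard two-pass 1D distance transform, and the names are illustrative.

```python
def dt_linear(s_prev):
    """Lower envelope DT(f) = min_fp (s_prev[fp] + |fp - f|), computed in O(m)."""
    d = list(s_prev)
    for f in range(1, len(d)):             # forward pass
        d[f] = min(d[f], d[f - 1] + 1)
    for f in range(len(d) - 2, -1, -1):    # backward pass
        d[f] = min(d[f], d[f + 1] + 1)
    return d

def truncated_linear_message(s_prev, tau):
    """min_fp (s_prev[fp] + min(tau, |fp - f|)) for every label f, in O(m)."""
    cap = min(s_prev) + tau                # the truncation branch, independent of f
    dt = dt_linear(s_prev)
    return [min(cap, dt[f]) for f in range(len(s_prev))]
```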

Belief Propagation
- Local message passing scheme in a graph
  - Every node in parallel computes messages to send to its neighbors
  - Iterate over time-steps t until convergence
- Various message updating schemes
  - Here consider max-product for an undirected graph
  - Becomes min-sum using costs (negative log probabilities)
- Message m_{i,j,t} sent from node i to node j at time t
  m_{i,j,t}(f_j) = min_{f_i} [V(f_i, f_j) + D(y_i, f_i) + Σ_{k∈N_i\j} m_{k,i,t-1}(f_i)]

Belief Propagation
- After message passing converges at iteration T
- Each node computes a final value based on its neighbors
  b_i(f_i) = D(y_i, f_i) + Σ_{k∈N_i} m_{k,i,T}(f_i)
- Select the label f_i minimizing b_i for each node
  - Corresponds to maximizing the belief (probability)
- For a singly-connected chain a node generally has two neighbors, i-1 and i+1
  m_{i,i-1,t}(f_{i-1}) = min_{f_i} [V(f_i, f_{i-1}) + D(y_i, f_i) + m_{i+1,i,t-1}(f_i)]
  - Analogous for the i+1 neighbor

Belief Propagation on a Chain
- Message passed from i to i+1
  m_{i,i+1,t}(f_{i+1}) = min_{f_i} [V(f_i, f_{i+1}) + D(y_i, f_i) + m_{i-1,i,t-1}(f_i)]
- Note the relation to the Viterbi recursion
  - Can show BP converges to the same minimum as Viterbi for a chain (if the minimum is unique)
  [Trellis diagram: possible labels (values of f_i), annotated with m_{i-1,i,t-1}(f_i), V(f_i, f_{i+1}), D(y_i, f_i), m_{i,i+1,t}(f_{i+1})]

Min-Sum Belief Prop Algorithm
- For a chain, two messages per node (see the sketch below)
  - Node i sends message m_{i,l} to its left neighbor and m_{i,r} to its right neighbor
- Initialize: m_{i,l,0} = m_{i,r,0} = (0, ..., 0) for all nodes i
- Update messages, for t from 1 to T
  m_{i,l,t}(f_l) = min_{f_i} [V(f_i, f_l) + D(y_i, f_i) + m_{r,i,t-1}(f_i)]
  m_{i,r,t}(f_r) = min_{f_i} [V(f_i, f_r) + D(y_i, f_i) + m_{l,i,t-1}(f_i)]
  (where m_{l,i} and m_{r,i} denote the messages arriving at i from its left and right neighbors)
- Compute the belief at each node
  b_i(f_i) = D(y_i, f_i) + m_{r,i,T}(f_i) + m_{l,i,T}(f_i)
- Select the best label at each node (global optimum)
  argmin_{f_i} b_i(f_i)
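
A minimal sketch of min-sum BP on a chain with synchronous message updates; labels, D, and V are illustrative callables/collections, and on a chain roughly n sweeps suffice for the messages to stabilize.

```python
def min_sum_bp_chain(y, labels, D, V, T=None):
    n = len(y)
    T = T if T is not None else n                         # n sweeps suffice on a chain
    right = [{f: 0.0 for f in labels} for _ in range(n)]  # message into i from node i-1
    left = [{f: 0.0 for f in labels} for _ in range(n)]   # message into i from node i+1
    for _ in range(T):
        new_right = [dict(right[0])] + [None] * (n - 1)
        new_left = [None] * (n - 1) + [dict(left[n - 1])]
        for i in range(1, n):              # message from node i-1 to node i
            new_right[i] = {f: min(D(y[i - 1], fp) + right[i - 1][fp] + V(fp, f)
                                   for fp in labels) for f in labels}
        for i in range(n - 2, -1, -1):     # message from node i+1 to node i
            new_left[i] = {f: min(D(y[i + 1], fp) + left[i + 1][fp] + V(fp, f)
                                  for fp in labels) for f in labels}
        right, left = new_right, new_left
    # belief b_i(f) = D + incoming messages; pick the minimizing label per node
    return [min(labels, key=lambda f: D(y[i], f) + right[i][f] + left[i][f])
            for i in range(n)]
```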

Relation to HMM
- Hidden Markov model
  - Set of unobservable (hidden) states
  - Sequence of observed values, y_i
- Transitions between states are Markov
  - Depend only on the previous state (or a fixed number of previous states)
  - State transition matrix (costs or probabilities)
- Distribution of possible observed values for each state
- Given the y_i, determine the best state sequence
- Widely used in speech recognition and temporal modeling

Hidden Markov Models
- Two different but equivalent views
- Sequence of unobservable random variables and observable values
  - 1D MRF with a label set
  - Penalties V(f_i, f_j), data costs D(y_i, f_i)
- Hidden non-deterministic state machine
  - Distribution over observable values for each state

Using HMMs
- Three classical problems for an HMM
- Given observation sequence Y = y_1, ..., y_n and HMM λ = (D, V, π)
  1. Compute P(Y | λ), the probability of observing Y given the model
     - Alternatively a cost (negative log probability)
  2. Determine the best state sequence x_1, ..., x_n given Y
     - Various definitions of best; one is the MAP estimate argmax_X P(X | Y, λ), or min cost
  3. Adjust the model λ = (D, V, π) to maximize P(Y | λ)
     - Learning problem, often solved by EM

HMM Inference or Decoding
- Determine the best state sequence X given observation sequence Y
- MAP (maximum a posteriori) estimate argmax_X P(X | Y, λ)
  - Equivalently minimize the cost (negative log probability)
  - Computed using Viterbi or max-product (min-sum) belief propagation
- Most likely state at each time, P(X_t | Y_1, ..., Y_n, λ)
  - Maximize the probability of states individually
  - Computed using the forward-backward procedure or sum-product belief propagation

1D HMM Example
- Estimate the bias of a changing coin from a sequence of observed {H, T} values
- Use the MAP formulation
  - Find the lowest-cost state sequence
  - States correspond to possible bias values, e.g., .10, ..., .90 (large state space)
  - Data costs: -log P(H | x_i), -log P(T | x_i) (see the sketch below)
- Used to analyze the time-varying popularity of item downloads at the Internet Archive
  - Each visit results in a download or not (H/T)
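
A minimal sketch of the data and smoothness costs for this coin-bias example, reusing the viterbi() sketch given earlier; the bias grid, penalty parameters, and observation string are illustrative assumptions, not values from the lecture.

```python
import math

biases = [b / 100.0 for b in range(10, 91, 5)]     # candidate bias states .10 ... .90

def coin_data_cost(obs, bias):
    # -log P(obs | bias), where obs is 'H' or 'T'
    p = bias if obs == 'H' else 1.0 - bias
    return -math.log(p)

def coin_smoothness(b1, b2, sigma=20.0, tau=2.0):
    # truncated linear penalty on changes in bias between adjacent observations
    return min(tau, sigma * abs(b1 - b2))

observations = list("HHHTHHHHTTTTHTTTTT")
states, cost = viterbi(observations, biases, coin_data_cost, coin_smoothness)
```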

1D HMM Example
- Truncated linear penalty term V(f_i, f_j)
- Contrast with smoothing
  - A particularly hard task for 0-1 valued data

Algorithms for Grids (2D)
- Polynomial time for a binary label set or for a convex cost function V(f_i, f_j)
  - Compute a minimum cut in a certain graph
- NP-hard in general (reduction from multi-way cut)
- Approximation methods (not the global minimum)
  - Graph cuts and expansion moves
  - Loopy belief propagation
  - Many other local minimization techniques: Monte Carlo sampling methods, annealing, etc.
- Consider graph cuts and belief propagation
  - Reasonably fast
  - Can characterize the local minimum