Computer Vision Group Prof. Daniel Cremers. 4a. Inference in Graphical Models

Inference on a Chain (Rep.) The first values of $\mu_\alpha$ and $\mu_\beta$ are $\mu_\alpha(x_2) = \sum_{x_1} \psi_{1,2}(x_1, x_2)$ and $\mu_\beta(x_{N-1}) = \sum_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)$. The partition function can be computed at any node: $Z = \sum_{x_n} \mu_\alpha(x_n)\,\mu_\beta(x_n)$. Overall, we need $O(NK^2)$ operations to compute the marginal.
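As a concrete illustration, here is a minimal Python sketch (not from the slides; the potential tables and shapes are made up) of the $\mu_\alpha$/$\mu_\beta$ recursion on a discrete chain, returning all node marginals and the partition function:

```python
import numpy as np

def chain_marginals(psi):
    """psi: list of N-1 arrays of shape (K, K); psi[n][i, j] = psi_{n,n+1}(x_n = i, x_{n+1} = j)."""
    N, K = len(psi) + 1, psi[0].shape[0]

    # Forward messages mu_alpha[n](x_n), starting from a vector of ones at the first node.
    mu_alpha = [np.ones(K) for _ in range(N)]
    for n in range(1, N):
        mu_alpha[n] = psi[n - 1].T @ mu_alpha[n - 1]   # sum over x_{n-1}

    # Backward messages mu_beta[n](x_n), starting from ones at the last node.
    mu_beta = [np.ones(K) for _ in range(N)]
    for n in range(N - 2, -1, -1):
        mu_beta[n] = psi[n] @ mu_beta[n + 1]           # sum over x_{n+1}

    # The partition function can be read off at any node.
    Z = np.sum(mu_alpha[0] * mu_beta[0])
    return [mu_alpha[n] * mu_beta[n] / Z for n in range(N)], Z

# Usage: a 4-node chain with K = 3 states and random positive potentials.
rng = np.random.default_rng(0)
marginals, Z = chain_marginals([rng.random((3, 3)) + 0.1 for _ in range(3)])
print(Z, marginals[2])   # each marginal sums to 1
```

Each forward and backward step is a single $K \times K$ matrix-vector product, which is exactly where the $O(NK^2)$ cost quoted above comes from.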

More General Graphs The message-passing algorithm can be extended to more general graphs: undirected trees, directed trees, and polytrees. It is then known as the sum-product algorithm. A special case of this is belief propagation.

More General Graphs Undirected tree: an undirected tree is defined as a graph in which there is exactly one path between any two nodes.

More General Graphs Directed tree: a directed tree has exactly one node without parents (the root), and all other nodes have exactly one parent. Converting a directed tree to an undirected tree is unproblematic, because no links are inserted; the same is true for the conversion back to a directed tree.

More General Graphs Polytree: polytrees can contain nodes with several parents; therefore moralization can remove independence relations when converting to an undirected graph.

Factor Graphs The sum-product algorithm can be used to do inference on undirected and directed graphs. A representation that generalizes directed and undirected models is the factor graph, e.g. $f(x_1, x_2, x_3) = p(x_1)\,p(x_2)\,p(x_3 \mid x_1, x_2)$. (Figure: a directed graph and its corresponding factor graph.)
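As a small aside (the probability tables below are invented), the directed model above can be stored as one factor table per conditional; multiplying them reproduces the single factor $f(x_1, x_2, x_3)$ from the slide:

```python
import numpy as np

# CPD tables for three binary variables (illustrative values).
p_x1 = np.array([0.6, 0.4])
p_x2 = np.array([0.7, 0.3])
p_x3 = np.zeros((2, 2, 2))                 # p(x3 | x1, x2), indexed [x1, x2, x3]
p_x3[..., 1] = [[0.1, 0.5], [0.6, 0.8]]
p_x3[..., 0] = 1.0 - p_x3[..., 1]

# Single factor f(x1, x2, x3) = p(x1) p(x2) p(x3 | x1, x2), as on the slide.
f = p_x1[:, None, None] * p_x2[None, :, None] * p_x3

# A finer factor graph keeps one factor per conditional instead, each with its scope.
factors = {"f1": (("x1",), p_x1),
           "f2": (("x2",), p_x2),
           "f3": (("x1", "x2", "x3"), p_x3)}

print(f.sum())   # the combined table sums to 1, since it is a valid joint distribution
```

Whether one keeps a single joint factor or one factor per conditional is a modelling choice; both are valid factor graphs for the same distribution.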

Factor Graphs The sum-product algorithm can be used to do inference on undirected and directed graphs. A representation that generalizes directed and undirected models is the factor graph. (Figure: an undirected graph and its corresponding factor graph.)

Factor Graphs Factor graphs can contain multiple factors for the same set of nodes; they are more general than undirected graphs; and they are bipartite, i.e. they consist of two kinds of nodes (variable nodes and factor nodes), and all edges connect nodes of different kinds.

Factor Graphs Directed trees convert to tree-structured factor graphs; the same holds for undirected trees. Also, directed polytrees convert to tree-structured factor graphs. And local cycles in a directed graph can be removed by converting to a factor graph.

The Sum-Product Algorithm Assumptions: all variables are discrete, and the factor graph has a tree structure. The factor graph represents the joint distribution as a product over factor nodes: $p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$. The marginal distribution at a given node $x$ is $p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$.

The Sum-Product Algorithm For a given node $x$ the joint can be written as $p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$, where $F_s(x, X_s)$ is the product of all factors associated with $f_s$. Thus, we have $p(x) = \sum_{\mathbf{x} \setminus x} \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$. Key insight: sum and product can be exchanged! $p(x) = \prod_{s \in \mathrm{ne}(x)} \sum_{X_s} F_s(x, X_s) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$, where the $\mu_{f_s \to x}(x)$ are messages from the factors to the node $x$.

The Sum-Product Algorithm The factors in the messages can be factorized further: $F_s(x, X_s) = f_s(x, x_1, \dots, x_M)\,G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$. The messages can then be computed as $\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \dots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \sum_{X_{sm}} G_m(x_m, X_{sm}) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \dots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$, where the $\mu_{x_m \to f_s}(x_m)$ are messages from nodes to factors.

The Sum-Product Algorithm The factors $G$ of the neighboring nodes can again be factorized further: $G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$. This results in the exact same situation as above! We can now recursively apply the derived rules: $\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \sum_{X_{ml}} F_l(x_m, X_{ml}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$.

The Sum-Product Algorithm Summary marginalization:
1. Consider the node x as the root node.
2. Initialize the recursion at the leaf nodes: $\mu_{x \to f}(x) = 1$ for a variable leaf node (var), or $\mu_{f \to x}(x) = f(x)$ for a factor leaf node (fac).
3. Propagate the messages from the leaves to the root x.
4. Propagate the messages back from the root to the leaves.
5. We obtain the marginals at every node in the graph by multiplying all incoming messages.
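The following Python sketch (a simplified structure assumed for illustration, not the lecture's code) implements this recursion for tree-structured factor graphs over discrete variables; marginal(v) treats v as the root and pulls in the messages from the leaves:

```python
import numpy as np

class FactorGraph:
    def __init__(self, cards, factors):
        self.cards = cards                      # {variable: number of states}
        self.factors = factors                  # {factor name: (scope tuple, table)}
        self.var_nbrs = {v: [] for v in cards}  # variable -> adjacent factor names
        for fname, (scope, _) in factors.items():
            for v in scope:
                self.var_nbrs[v].append(fname)

    def msg_var_to_factor(self, v, f):
        # Product of messages from all other neighboring factors (equals 1 at a leaf variable).
        m = np.ones(self.cards[v])
        for g in self.var_nbrs[v]:
            if g != f:
                m *= self.msg_factor_to_var(g, v)
        return m

    def msg_factor_to_var(self, f, v):
        # Multiply incoming variable messages onto the factor table, then sum out all variables except v.
        scope, table = self.factors[f]
        t = table.copy()
        for axis, u in enumerate(scope):
            if u != v:
                m = self.msg_var_to_factor(u, f)
                shape = [1] * t.ndim
                shape[axis] = self.cards[u]
                t = t * m.reshape(shape)
        sum_axes = tuple(a for a, u in enumerate(scope) if u != v)
        return t.sum(axis=sum_axes)

    def marginal(self, v):
        # Treat v as the root: multiply all incoming messages and normalize.
        p = np.ones(self.cards[v])
        for f in self.var_nbrs[v]:
            p *= self.msg_factor_to_var(f, v)
        return p / p.sum()

# Usage on the small tree x1 - f12 - x2 - f23 - x3 with random positive factor tables.
rng = np.random.default_rng(1)
fg = FactorGraph(
    cards={"x1": 2, "x2": 3, "x3": 2},
    factors={"f12": (("x1", "x2"), rng.random((2, 3)) + 0.1),
             "f23": (("x2", "x3"), rng.random((3, 2)) + 0.1)},
)
print(fg.marginal("x2"))
```

To obtain all marginals at once, one would cache the messages and add the root-to-leaves pass of step 4 instead of restarting the recursion for every query.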

The Max-Sum Algorithm Sum-product is used to find the marginal distributions at every node, but: how can we find the setting of all variables that maximizes the joint probability, and what is the value of that maximal probability? Idea: use sum-product to find all marginals and then report, for each node x, the value that maximizes the marginal p(x). However, this does not in general give the overall maximum of the joint probability.
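A tiny numerical example (the table is made up) illustrates the problem: both marginals peak at state 0, yet (0, 0) is not the most probable joint configuration:

```python
import numpy as np

p = np.array([[0.3, 0.4],
              [0.3, 0.0]])          # p(x, y), rows = x, columns = y

px, py = p.sum(axis=1), p.sum(axis=0)
print(px.argmax(), py.argmax())     # 0 0  -> marginal-wise guess (x, y) = (0, 0)
print(p[0, 0], p.max())             # 0.3 vs 0.4: the joint maximum is at (0, 1)
```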

The Max-Sum Algorithm Observation: the max operator distributes over products, just as multiplication distributes over sums in sum-product: $\max(ab, ac) = a \max(b, c)$ if $a \geq 0$. Idea: use max instead of sum as above and exchange it with the product. Chain example: $\max_{\mathbf{x}} p(\mathbf{x}) = \frac{1}{Z} \max_{x_1} \cdots \max_{x_N} \left[\psi_{1,2}(x_1, x_2) \cdots \psi_{N-1,N}(x_{N-1}, x_N)\right] = \frac{1}{Z} \max_{x_1} \left[\psi_{1,2}(x_1, x_2) \left[\cdots \max_{x_N} \psi_{N-1,N}(x_{N-1}, x_N)\right]\right]$. Message passing can be used as above!

The Max-Sum Algorithm To find the maximum value of p(x), we start again at the leaves and propagate to the root. Two problems: first, there is no summation but many multiplications, which leads to numerical instability (very small values); second, when propagating back, multiple configurations of x can maximize p(x), leading to wrong assignments of the overall maximum. Solution to the first problem: transform everything into log-space and use sums.

The Max-Sum Algorithm Solution to the second problem: keep track of the arg max in the forward step, i.e. store at each node which value was responsible for the maximum: $\phi(x_n) = \arg\max_{x_{n-1}} \left[\ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1})\right]$. Then, in the back-tracking step, we can recover the arg max by recursive substitution: $x_{n-1}^{\max} = \phi(x_n^{\max})$.
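On a chain this is the familiar Viterbi-style recursion. The sketch below (an assumed interface; pairwise log-potentials are taken as given and unary terms are ignored) performs the forward max-sum pass in log-space and the back-tracking step:

```python
import numpy as np

def max_sum_chain(log_psi):
    """log_psi: list of N-1 arrays; log_psi[n][i, j] = ln psi_{n,n+1}(x_n = i, x_{n+1} = j)."""
    N, K = len(log_psi) + 1, log_psi[0].shape[0]

    # Forward pass: mu[n][j] is the best partial log-score ending in x_n = j,
    # phi[n][j] stores the arg max over x_{n-1} (the back-pointer).
    mu = np.zeros((N, K))
    phi = np.zeros((N, K), dtype=int)
    for n in range(1, N):
        scores = mu[n - 1][:, None] + log_psi[n - 1]   # shape (K, K): previous state x new state
        mu[n] = scores.max(axis=0)
        phi[n] = scores.argmax(axis=0)

    # Back-tracking: recover the maximizing configuration by recursive substitution.
    x_max = np.zeros(N, dtype=int)
    x_max[-1] = mu[-1].argmax()
    for n in range(N - 1, 0, -1):
        x_max[n - 1] = phi[n, x_max[n]]
    return x_max, mu[-1].max()   # maximizing states and un-normalized max log-score (ln p_max + ln Z)

# Usage: 5 binary variables with random pairwise log-potentials.
rng = np.random.default_rng(2)
x_star, score = max_sum_chain([np.log(rng.random((2, 2)) + 0.1) for _ in range(4)])
print(x_star, score)
```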

Other Inference Algorithms Junction Tree Algorithm: provides exact inference on general graphs. It works by turning the initial graph into a junction tree and then running a sum-product-like algorithm. A junction tree is obtained from an undirected model by triangulation, mapping cliques to nodes and connections between cliques to edges; it is the maximal spanning tree of cliques. Problem: intractable on graphs with large cliques, since the cost grows exponentially with the number of variables in the largest clique (the "tree width").

Other Inference Algorithms Loopy Belief Propagation: performs sum-product on general graphs, in particular graphs with loops. Propagation has to be done several times, until a convergence criterion is met; there is no guarantee of convergence and no global optimum. Messages have to be scheduled; initially, unit messages are passed across all edges. The result is approximate, but tractable for large graphs.
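The rough sketch below (the flooding schedule and pairwise parameterization are assumptions for illustration, not the lecture's implementation) passes unit messages initially and repeats the updates until the messages stop changing:

```python
import numpy as np

def loopy_bp(unary, edges, pairwise, iters=50, tol=1e-6):
    """unary: {v: array(K_v)}, edges: list of (u, v), pairwise: {(u, v): array(K_u, K_v)}."""
    msgs = {}
    for (u, v) in edges:                       # unit messages in both directions
        msgs[(u, v)] = np.ones(len(unary[v]))
        msgs[(v, u)] = np.ones(len(unary[u]))

    for _ in range(iters):
        delta = 0.0
        for (s, t) in list(msgs):
            psi = pairwise[(s, t)] if (s, t) in pairwise else pairwise[(t, s)].T
            # Product of the unary term at s and all incoming messages except the one from t.
            b = unary[s].copy()
            for (a, bnode) in msgs:
                if bnode == s and a != t:
                    b *= msgs[(a, s)]
            new = psi.T @ b                    # sum over the states of s
            new /= new.sum()                   # normalize for numerical stability
            delta = max(delta, np.abs(new - msgs[(s, t)]).max())
            msgs[(s, t)] = new
        if delta < tol:                        # convergence is not guaranteed in general
            break

    beliefs = {}
    for v in unary:
        b = unary[v].copy()
        for (a, bnode) in msgs:
            if bnode == v:
                b *= msgs[(a, v)]
        beliefs[v] = b / b.sum()
    return beliefs

# Usage: a 3-cycle (the smallest loopy graph) with binary variables.
rng = np.random.default_rng(3)
unary = {v: rng.random(2) + 0.1 for v in "abc"}
edges = [("a", "b"), ("b", "c"), ("c", "a")]
pairwise = {e: rng.random((2, 2)) + 0.1 for e in edges}
print(loopy_bp(unary, edges, pairwise))
```

Normalizing each message keeps the numbers in a stable range; on loopy graphs the fixed point, if one is reached, only approximates the true marginals.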

Conditional Random Fields Another kind of undirected graphical model is known as the Conditional Random Field (CRF). CRFs are used for classification, where labels are represented as discrete random variables y and features as continuous random variables x. A CRF represents the conditional probability $p(\mathbf{y} \mid \mathbf{x}, \mathbf{w})$, where w are parameters learned from training data. CRFs are discriminative, whereas MRFs are generative.

Conditional Random Fields Derivation of the formula for CRFs: in the training phase, we compute parameters $\mathbf{w}$ that maximize the posterior $p(\mathbf{w} \mid \mathbf{x}^*, \mathbf{y}^*) \propto p(\mathbf{y}^* \mid \mathbf{x}^*, \mathbf{w})\,p(\mathbf{w})$, where $(\mathbf{x}^*, \mathbf{y}^*)$ is the training data and $p(\mathbf{w})$ is a Gaussian prior. In the inference phase we maximize $p(\mathbf{y} \mid \mathbf{x}, \mathbf{w})$ with respect to the labels $\mathbf{y}$.

Conditional Random Fields Typical example: the observed variables $x_{i,j}$ are intensity values of pixels in an image, and the hidden variables $y_{i,j}$ are object labels. Note: the definition of $x_{i,j}$ and $y_{i,j}$ is different from the one in C. M. Bishop (p. 389)!

CRF Training We minimize the negative log-posterior $-\ln p(\mathbf{y}^* \mid \mathbf{x}^*, \mathbf{w}) - \ln p(\mathbf{w})$. Computing the likelihood is intractable, as we would have to compute the partition function for each $\mathbf{w}$. We can approximate the likelihood using the pseudo-likelihood $p(\mathbf{y} \mid \mathbf{x}, \mathbf{w}) \approx \prod_i p(y_i \mid y_{\mathrm{MB}(i)}, \mathbf{x}, \mathbf{w})$, where the Markov blanket of $y_i$ is determined by the cliques $C_i$, i.e. all cliques containing $y_i$.

Pseudo Likelihood The pseudo-likelihood is computed only on the Markov blanket of $y_i$ and its corresponding feature nodes.

Potential Functions The only requirement for the potential functions is that they are positive. We achieve that with $\psi_C(\mathbf{y}_C, \mathbf{x}_C; \mathbf{w}) = \exp\left(\mathbf{w}^\top f(\mathbf{y}_C, \mathbf{x}_C)\right)$, where $f$ is a compatibility function that is large if the labels $\mathbf{y}_C$ fit well to the features $\mathbf{x}_C$. This is called the log-linear model. The function $f$ can be, e.g., a local classifier.

CRF Training and Inference Training: using pseudo-likelihood, training is efficient. We have to minimize the sum of the negative log-pseudo-likelihood and the Gaussian prior term, $L(\mathbf{w}) = -\sum_i \ln p(y_i^* \mid y_{\mathrm{MB}(i)}^*, \mathbf{x}^*, \mathbf{w}) + \frac{1}{2\sigma^2}\lVert\mathbf{w}\rVert^2$. This is a convex function that can be minimized using gradient descent. Inference: only approximately, e.g. using loopy belief propagation.
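As a hedged illustration of this training recipe, the sketch below fits a small chain-structured CRF with binary labels by minimizing the negative log-pseudo-likelihood plus a Gaussian prior term; the feature design (one unary weight vector and a single coupling weight) and the toy data are invented, and a generic quasi-Newton optimizer stands in for plain gradient descent:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_pseudo_likelihood(params, X, y, sigma2=10.0):
    d = X.shape[1]
    w_u, w_p = params[:d], params[d]           # unary weights and pairwise coupling weight
    obj = 0.0
    for i in range(len(y)):
        # Local scores of y_i = 0 and y_i = 1 given the fixed neighbours of i (its Markov blanket).
        scores = np.zeros(2)
        for label in (0, 1):
            scores[label] = label * (X[i] @ w_u)               # log-linear unary term
            for j in (i - 1, i + 1):                           # chain neighbours
                if 0 <= j < len(y):
                    scores[label] += w_p * (label == y[j])     # pairwise agreement term
        # Negative log of the local conditional p(y_i | y_MB(i), x).
        obj += np.logaddexp(scores[0], scores[1]) - scores[y[i]]
    return obj + params @ params / (2.0 * sigma2)              # Gaussian prior on w

# Toy data: 1-d features that roughly follow the labels.
rng = np.random.default_rng(4)
y = (rng.random(40) < 0.5).astype(int)
X = y[:, None] + 0.5 * rng.standard_normal((40, 1))

res = minimize(neg_log_pseudo_likelihood, x0=np.zeros(2), args=(X, y))
print(res.x)   # learned unary weight and coupling weight
```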

Summary Undirected graphical models represent conditional independence more intuitively, using graph separation. Their factorization is based on potential functions. The normalizer is called the partition function, which in general is intractable to compute. Inference in tree-structured graphical models can be done efficiently using the sum-product algorithm (message passing). Another inference algorithm is loopy belief propagation, which is approximate but tractable. Conditional Random Fields are a special kind of MRF and can be used for classification.