Chapter 8 of Bishop's Book: Graphical Models

Review of Probability. A probability density is defined over the possible values of x and is used to find the probability of x falling in some range. For continuous variables, the probability of any single value is technically 0! For discrete variables, the analogous quantity is called a probability mass function.

Review of Probability. The density function must satisfy two conditions: it must be non-negative everywhere, and it must integrate (for densities) or sum (for mass functions) to 1.

Two Important Rules. Sum rule: p(x) is the marginal probability, obtained by marginalizing out y. Product rule: the joint distribution factors into a conditional distribution times a marginal.
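In standard notation these two rules are:

    p(x) = \sum_y p(x, y)                (sum rule)
    p(x, y) = p(y \mid x) \, p(x)        (product rule)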

Graphical Models. Consider a joint distribution over several variables. By the product rule it can be rewritten as a product of conditional distributions, and this rewriting can be repeated until each variable is conditioned only on the variables introduced before it.
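For example (a standard three-variable case, not necessarily the one shown on the slide), repeated use of the product rule gives

    p(a, b, c) = p(c \mid a, b) \, p(b \mid a) \, p(a)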

Making a Graphical Model. Introduce one node per random variable. Then, following the factored form of the distribution, add one directed edge per conditioning relationship: an edge into each node from every variable its conditional distribution depends on.

What's the distribution for this graph?

Big Advantage: Many Fewer Parameters. Assume each variable can take K states: how many numbers do we need to specify the factored distribution? How many would we need if we just expressed it as one giant joint distribution?
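As a concrete (illustrative) count: a completely general joint over N variables with K states each needs K^N - 1 numbers, while a simple chain x_1 -> x_2 -> ... -> x_N needs only (K - 1) + (N - 1) K (K - 1). For K = 10 and N = 7 that is 9,999,999 numbers versus 9 + 6 * 10 * 9 = 549.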

What do you do with distributions? Draw samples. Infer marginal distributions over variables.

Ancestral Sampling. 1. Start at the low-numbered nodes and draw samples. 2. Work your way through the graph, sampling each node from its conditional distribution given the values already drawn for its parents.
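A minimal sketch of ancestral sampling for a toy three-node chain x1 -> x2 -> x3 (the network and conditional tables here are made up for illustration, not taken from the slides):

    import numpy as np

    K = 3  # states per variable
    rng = np.random.default_rng(0)

    # Toy conditional probability tables for the chain x1 -> x2 -> x3.
    p_x1 = np.array([0.5, 0.3, 0.2])                    # p(x1)
    p_x2_given_x1 = rng.dirichlet(np.ones(K), size=K)   # row i: p(x2 | x1 = i)
    p_x3_given_x2 = rng.dirichlet(np.ones(K), size=K)   # row j: p(x3 | x2 = j)

    def ancestral_sample():
        # Sample nodes in ancestor-first order, conditioning each draw
        # on the values already drawn for its parents.
        x1 = rng.choice(K, p=p_x1)
        x2 = rng.choice(K, p=p_x2_given_x1[x1])
        x3 = rng.choice(K, p=p_x3_given_x2[x2])
        return x1, x2, x3

    samples = [ancestral_sample() for _ in range(5)]
    print(samples)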

Generative Models. The graph, together with the ancestral sampling method, is a model of how the data is generated: it expresses a causal process for producing the data. Such models are often called generative models because they show how to generate the data.

Example of a Generative Model of Images (of Objects) (From Sudderth, Torralba, Freeman, and Willsky - NIPS05)

Undirected Models. Directed models are useful, but they have some complicating limitations: determining conditional independence can be hard, and not all relationships can be expressed causally. We can drop the arrows to make an undirected model, also called a Markov network or Markov random field (MRF).

Undirected Graphical Models. To understand undirected models, we need the notion of a clique: a subset of nodes with links between all pairs of nodes in the subset. A maximal clique is one to which no further node can be added without it ceasing to be a clique.

Conditional Independence

Conditional Independence in an MRF. Conditioned on its neighbors, a node is conditionally independent of the rest of the nodes in the graph.
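In symbols (the standard Markov-blanket statement for an undirected graph):

    p(x_i \mid x_{\setminus i}) = p(x_i \mid x_{ne(i)})

where x_{\setminus i} denotes all variables except x_i and ne(i) are the neighbors of node i.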

Representing the Distribution in an Undirected Graph. The distribution is formed from potential functions defined on the maximal cliques of the graph; these represent the compatibility between the states of different variables. Z is a normalization constant, also known as the partition function.
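The form referred to on the slide is the standard one:

    p(x) = \frac{1}{Z} \prod_C \psi_C(x_C), \qquad Z = \sum_x \prod_C \psi_C(x_C)

where C ranges over the maximal cliques and \psi_C \ge 0 are the clique potentials.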

Converting a Directed Model to an Undirected Model. You moralize the graph: "marry" the parents of each node by linking them, then drop the directions on all edges. You can then use inference techniques for undirected graphs.
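A small sketch of moralization (an illustrative helper, not from the slides), taking a directed graph given as a parent list and returning undirected adjacency sets:

    from itertools import combinations

    def moralize(parents):
        """parents: dict mapping each node to the list of its parents."""
        undirected = {v: set() for v in parents}
        for child, pa in parents.items():
            for p in pa:                      # keep each parent-child edge, now undirected
                undirected[child].add(p)
                undirected[p].add(child)
            for a, b in combinations(pa, 2):  # "marry" every pair of parents
                undirected[a].add(b)
                undirected[b].add(a)
        return undirected

    # Example: c has parents a and b, so moralization adds the edge a - b.
    print(moralize({'a': [], 'b': [], 'c': ['a', 'b']}))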

Inference in Graphical Models. Let's say I've analyzed the problem and designed a graphical model that relates the observations to quantities I would like to estimate, and I then get my observations. How do I estimate the hidden, or latent, quantities?

Inference and Conditional Independence. Conditional independence is important for MRF models because it makes inference much easier. Consider a chain x_1, ..., x_N. Task: find the marginal distribution of some variable x_n.

Naïve, Slow Way. Use the sum rule and do the sums directly. Implication: if you have K states per node and N nodes, this will take on the order of K^N operations.

Taking Advantage of the Structure. The distribution of the chain x_1, ..., x_N factors into a product of pairwise potentials, one for each link in the chain.
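Explicitly, for an undirected chain the standard factorization is

    p(x) = \frac{1}{Z} \, \psi_{1,2}(x_1, x_2) \, \psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)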

4-Node Example. Start with the full sum over the joint, then begin re-arranging the sums.

Re-Arranging Sums. Make a substitution for the result of the innermost sum, which leads to a smaller expression of the same form.

To get the next term, do it again: repeat the same substitution on the new innermost sum.

How many operations did this require?

For a general chain, the same idea applies: the marginal of any node can be computed by passing messages along the chain from both ends.
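A minimal sketch of the chain computation (toy random potentials and illustrative names, not from the slides); each message update costs O(K^2) work, so the whole marginal costs O(N K^2) instead of O(K^N):

    import numpy as np

    K, N = 4, 6                                   # states per node, chain length
    rng = np.random.default_rng(1)
    # psi[i][a, b] is the potential between x_i = a and x_{i+1} = b.
    psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]

    def chain_marginal(n):
        """Marginal p(x_n) for node n (0-indexed) of the chain."""
        mu_alpha = np.ones(K)                     # forward message arriving at node n
        for i in range(n):
            mu_alpha = psi[i].T @ mu_alpha        # sum out x_i
        mu_beta = np.ones(K)                      # backward message arriving at node n
        for i in range(N - 2, n - 1, -1):
            mu_beta = psi[i] @ mu_beta            # sum out x_{i+1}
        p = mu_alpha * mu_beta
        return p / p.sum()                        # normalizing handles 1/Z

    print(chain_marginal(2))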

One More Type of Graphical Model. Now we'll write the distribution as a product of terms, each of which is called a factor.
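In the usual notation,

    p(x) = \prod_s f_s(x_s)

where x_s is the subset of variables that factor f_s depends on (a normalizer 1/Z can be absorbed into one of the factors if needed).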

The Graph. Factor graphs are bipartite: one type of node represents the variables, the other represents the factor functions, and each factor is linked to the variables it depends on. Advantage: the graph expresses the factorization of a distribution more precisely.

Example: A factor graph can represent any directed or undirected graphical model.

Deriving the Sum-Product Algorithm. Group the factors by sub-tree: all of the factors associated with one sub-tree form one term, and all of the factors associated with the other sub-tree form another.

Goal: calculate the marginal probability of some variable x. This means summing over all variables but x; bold font represents the collection of all variables.

Combining the two (and interchanging the order of the sums and the products).

We can also factor the F terms themselves (each F is itself a product of factors further down its sub-tree).

This is called the sum-product algorithm (or belief propagation). It has two kinds of steps: variables pass messages to factors, and factors pass messages to variables.
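For reference, the two message updates take the standard form

    \mu_{f \to x}(x) = \sum_{x_1, \ldots, x_M} f(x, x_1, \ldots, x_M) \prod_{m \in ne(f) \setminus x} \mu_{x_m \to f}(x_m)

    \mu_{x \to f}(x) = \prod_{l \in ne(x) \setminus f} \mu_{f_l \to x}(x)

where ne(.) denotes the neighbors of a node in the factor graph, and the marginal of x is proportional to the product of all incoming factor-to-variable messages.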

A similar message-passing scheme can be used with undirected models, passing messages directly between nodes. The factor graph representation is nice because it is much easier to deal with non-pairwise interactions: if your clique potentials involve more than two nodes, the node-to-node messages get complicated.

What if you want the labeling with the highest probability? Use max-product, or max-sum in the log domain. Basically, replace the sums in sum-product with max operations.
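On a chain, for example, the forward recursion becomes

    \mu_\alpha(x_n) = \max_{x_{n-1}} \big[ \ln \psi_{n-1,n}(x_{n-1}, x_n) + \mu_\alpha(x_{n-1}) \big]

and a backtracking pass over the stored argmax values recovers the most probable configuration (on a chain this is the Viterbi algorithm).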

IMPORTANT: These algorithms are only exact if the graph has a tree structure. If the graph has loops, exact inference is NP-hard in general (see Boykov, Veksler, and Zabih, PAMI 2001). Problem: many interesting models in vision have loops. One solution: ignore the @^$&^$ loops and run it anyway (loopy belief propagation).

Surprisingly, this works well. A recent comparison of BP with other algorithms on several vision problems: R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, "A Comparative Study of Energy Minimization Methods for Markov Random Fields," in Ninth European Conference on Computer Vision (ECCV 2006), volume 2, pages 19-26, Graz, Austria, May 2006. This paper also has references to a lot of work that relies on MRF models.

To Add: definition of the Boltzmann distribution, the Hammersley-Clifford theorem, and the relationship of undirected to directed models.