Part II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Converting Directed to Undirected Graphs (1)

Converting Directed to Undirected Graphs (2) Add extra links between all pairs of parents of each node: this step is called moralization ("marrying the parents"). The resulting undirected graph, after dropping the arrows, is called the moral graph.
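As a concrete illustration, here is a minimal Python sketch of the moralization step (not from Bishop's book; representing the DAG as a `parents` dictionary is an assumption of this sketch): marry all pairs of parents of each node, then drop the arrow directions.

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG given as {node: parents_of_node}.

    Marry all pairs of parents of each node, then drop the arrows,
    returning an undirected edge set.
    """
    edges = set()
    for v, pa in parents.items():
        for u in pa:
            edges.add(frozenset((u, v)))          # parent-child link, arrow dropped
        for u, w in combinations(sorted(pa), 2):  # marry the parents
            edges.add(frozenset((u, w)))
    return edges

# Example: the head-to-head graph x -> z <- y gains the moralizing edge {x, y}
moral = moralize({"x": (), "y": (), "z": ("x", "y")})
```

The extra edge {x, y} is exactly what ensures the undirected graph can represent the factor p(z | x, y), whose variables must all lie in one clique.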

Directed vs. Undirected Graphs

Inference in Graphical Models: a graphical representation of Bayes' theorem.

Inference on a Chain For N variables, each with K states, there are K^N values for x; thus evaluation and storage of the joint distribution, as well as marginalization to obtain p(x_n), all involve storage and computation that scale exponentially with the length N of the chain.

Efficient Computation (1) Rearranging the order of the summations and multiplications allows the required marginal to be evaluated much more efficiently, using the identity ab + ac = a(b + c): each sum is pushed as far inward as possible, past the factors that don't depend on the variable being summed out (e.g. on x_N).

Efficient Computation (2) Evaluating the marginal costs O(NK^2). This is linear in the length of the chain, in contrast to the exponential cost of a naive approach.

Inference on a Chain

Inference on a Chain

Inference on a Chain

Inference on a Chain To compute local marginals: Compute and store all forward messages μ_α(x_n). Compute and store all backward messages μ_β(x_n). Compute Z at any node x_m. Compute p(x_n) = μ_α(x_n)μ_β(x_n)/Z for all variables required.
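The chain recursions above can be sketched in a few lines of NumPy (a hypothetical implementation, not Bishop's code; it assumes the N−1 pairwise potentials are stored as K×K arrays, `psi[i]` linking the i-th and (i+1)-th variables):

```python
import numpy as np

def chain_marginal(psi, n):
    """Marginal p(x_n) on a chain x_0 - x_1 - ... - x_{N-1} with pairwise
    potentials psi[i] (K x K arrays) linking x_i and x_{i+1}.

    Cost is O(N K^2), rather than the O(K^N) of summing the joint naively.
    """
    K = psi[0].shape[0]
    # forward messages mu_alpha, accumulated from x_0 up to x_n
    alpha = np.ones(K)
    for i in range(n):
        alpha = psi[i].T @ alpha
    # backward messages mu_beta, accumulated from x_{N-1} down to x_n
    beta = np.ones(K)
    for i in range(len(psi) - 1, n - 1, -1):
        beta = psi[i] @ beta
    p = alpha * beta
    Z = p.sum()                 # normalization constant, available at any node
    return p / Z
```

Note that once the forward and backward sweeps are stored for every node, all N marginals come at the cost of roughly two passes over the chain.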

Trees: undirected tree, directed tree, polytree.

Factor Graphs Factors in directed graphs are local conditional distributions. Factors in undirected graphs are potential functions over the maximal cliques (the normalizing coefficient 1/Z can be viewed as a factor defined over the empty set of variables).

Factor Graphs from Undirected Graphs Create variable nodes corresponding to the nodes in the original undirected graph. Create additional factor nodes corresponding to the maximal cliques x_s. The factors f_s(x_s) are then set equal to the clique potentials.

Factor Graphs from Directed Graphs Create variable nodes in the factor graph corresponding to the nodes of the directed graph. Create factor nodes corresponding to the conditional distributions. Add the appropriate links.
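The directed-graph conversion can be sketched as a tiny helper (an illustrative sketch, not from the book; the `parents` dictionary and the `f_<name>` factor-naming convention are assumptions): one factor per conditional distribution p(x_i | pa_i), linked to x_i and its parents.

```python
def factor_graph_from_dag(parents):
    """Build the bipartite factor-graph edge list for a directed model.

    parents: dict mapping each variable to a tuple of its parents.
    Each variable v contributes one factor node f_v, representing
    p(v | parents(v)), linked to v and to every parent of v.
    """
    edges = []
    for v, pa in parents.items():
        f = f"f_{v}"                    # factor node for p(v | pa)
        for u in (v,) + tuple(pa):
            edges.append((f, u))        # link factor to variable
    return edges

# Example: x -> z <- y gives factors f_x(x), f_y(y), f_z(z, x, y)
edges = factor_graph_from_dag({"x": (), "y": (), "z": ("x", "y")})
```

Unlike moralization, nothing is lost here: the factor graph keeps the three-way factor f_z(z, x, y) explicit instead of absorbing it into a clique potential.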

The Sum-Product Algorithm (1) Objective: i. to obtain an efficient, exact inference algorithm for finding marginals; ii. in situations where several marginals are required, to allow computations to be shared efficiently. Key idea: Distributive Law

The Sum-Product Algorithm (2)

The Sum-Product Algorithm (3)

The Sum-Product Algorithm (4)

The Sum-Product Algorithm (5)

The Sum-Product Algorithm (6)

The Sum-Product Algorithm (7) Initialization

The Sum-Product Algorithm (8) To compute local marginals: Pick an arbitrary node as root Compute and propagate messages from the leaf nodes to the root, storing received messages at every node. Compute and propagate messages from the root to the leaf nodes, storing received messages at every node. Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.
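For a tree-structured factor graph, the two message recursions can be written recursively; the sketch below is an illustrative implementation, not Bishop's code, and it assumes factors are stored as (variable-tuple, table) pairs with every variable sharing K states. Messages are recomputed rather than cached, so it is exact but only practical for small trees.

```python
import numpy as np
from itertools import product

def var_to_factor(v, f, factors, K):
    """Message from variable v to factor f: product of messages from
    all other factors attached to v (ones if v is a leaf)."""
    msg = np.ones(K)
    for g, (vs, _) in factors.items():
        if g != f and v in vs:
            msg *= factor_to_var(g, v, factors, K)
    return msg

def factor_to_var(f, v, factors, K):
    """Message from factor f to variable v: sum the factor table against
    incoming variable messages over all other attached variables."""
    vs, table = factors[f]
    others = [u for u in vs if u != v]
    msg = np.zeros(K)
    for assign in product(range(K), repeat=len(others)):
        env = dict(zip(others, assign))
        for xv in range(K):
            env[v] = xv
            w = table[tuple(env[u] for u in vs)]
            for u in others:
                w *= var_to_factor(u, f, factors, K)[env[u]]
            msg[xv] += w
    return msg

def marginal(v, factors, K):
    """p(x_v): product of all incoming factor messages, normalized."""
    p = np.ones(K)
    for f, (vs, _) in factors.items():
        if v in vs:
            p *= factor_to_var(f, v, factors, K)
    return p / p.sum()
```

The root/leaf scheduling described above is what turns this recursion into the efficient two-pass version: storing each message once gives all marginals for the cost of two sweeps.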

Sum-Product: Example (1)

Sum-Product: Example (2)

Sum-Product: Example (3)

Sum-Product: Example (4)

The Max-Sum Algorithm (1) Objective: an efficient algorithm for finding i. the value x_max that maximises p(x); ii. the value of p(x_max). In general, maximum of the marginals ≠ joint maximum.

The Max-Sum Algorithm (2) Maximizing over a chain (max-product)

The Max-Sum Algorithm (3) Generalizes to tree-structured factor graphs, maximizing as close to the leaf nodes as possible.

The Max-Sum Algorithm (4) Max-Product → Max-Sum: for numerical reasons, work with the logarithm ln p(x), so products of small probabilities become sums. Again, use the distributive law: max(a + b, a + c) = a + max(b, c).

The Max-Sum Algorithm (5) Initialization (leaf nodes) Recursion

The Max-Sum Algorithm (6) Termination (root node), then back-track using the stored back-pointers, for all nodes i with l factor nodes to the root (l = 0, 1, ...).

The Max-Sum Algorithm (7) Example: Markov chain
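For the Markov-chain example, the max-sum recursion with back-tracking reduces to the familiar Viterbi pattern. A sketch (an assumed implementation, with the log-potentials between consecutive variables given as K×K arrays `log_psi[i]`):

```python
import numpy as np

def max_sum_chain(log_psi):
    """Most probable joint configuration on a chain, working in log space
    (max-sum) for numerical stability.

    log_psi[i]: K x K matrix of log-potentials linking x_i and x_{i+1}.
    Returns (x_max, best log-score), with back-tracking from the root.
    """
    K = log_psi[0].shape[0]
    msg = np.zeros(K)                 # max-sum message into the current node
    back = []                         # back-pointers phi(x_n) for back-tracking
    for M in log_psi:
        scores = msg[:, None] + M     # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        msg = scores.max(axis=0)
    # termination at the root (last node), then back-track
    x = [int(msg.argmax())]
    for bp in reversed(back):
        x.append(int(bp[x[-1]]))
    x.reverse()
    return x, float(msg.max())
```

Storing the argmax at each step is what avoids the maximum-of-marginals pitfall noted earlier: the back-pointers recover one globally consistent maximizing configuration.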

The Junction Tree Algorithm Exact inference on general graphs. Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm. Intractable on graphs with large cliques.

Loopy Belief Propagation Sum-Product on general graphs. Initial unit messages are passed across all links, after which messages are passed around until convergence (not guaranteed!). Approximate but tractable for large graphs. Sometimes works well, sometimes not at all.