ECE521 W17 Tutorial 10


ECE521 W17 Tutorial 10 Shenlong Wang and Renjie Liao *Some of the materials are credited to Jimmy Ba, Eric Sudderth, Chris Bishop

Introduction to A4 1, Graphical Models 2, Message Passing 3, HMM

Introduction to A4 1, Graphical Models Draw Bayes-net:

Introduction to A4 Factor graph:

Introduction to A4 1, Graphical Models Draw factor-graph:


Introduction to A4 1, Graphical Models Clique: a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. Maximal clique: a clique that cannot be extended by including one more adjacent vertex. In the figure, the 11 light blue 3-cliques (triangles) form maximal cliques, and the 2 dark blue 4-cliques form maximal cliques.
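As a quick illustration (not the A4 graph), the maximal cliques of a small undirected graph can be enumerated with networkx; the graph, node names, and printed output below are made up for the sketch.

import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),   # triangle a-b-c
                  ("c", "d"), ("d", "e"), ("c", "e"),   # triangle c-d-e
                  ("d", "f")])                          # pendant edge

# find_cliques enumerates the maximal cliques of an undirected graph.
print([sorted(c) for c in nx.find_cliques(G)])
# -> [['a', 'b', 'c'], ['c', 'd', 'e'], ['d', 'f']] (order may vary)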

Introduction to A4 1, Graphical Models Conversion: a FG expresses a pair-wise factorization. Converting the FG on the left to an MRF and then converting the MRF back to a FG, we lose the pair-wise factorization property.

Introduction to A4 Converting a Markov random field (MRF) to a factor graph takes the following steps: consider all the maximal cliques of the MRF; create a factor node for each of the maximal cliques; connect all the nodes of each maximal clique to its new factor node.

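A minimal sketch of these three steps, assuming networkx is available; the MRF structure and the factor-node labels below are made up for illustration.

import networkx as nx

mrf = nx.Graph()
mrf.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Step 1: consider all the maximal cliques of the MRF.
cliques = [tuple(sorted(c)) for c in nx.find_cliques(mrf)]

# Steps 2-3: create a factor node per maximal clique and connect it to every
# variable node in that clique, giving a bipartite variable/factor graph.
fg = nx.Graph()
for i, clique in enumerate(cliques):
    factor = "f{}_{}".format(i, "".join(clique))   # e.g. "f0_abc"
    for var in clique:
        fg.add_edge(factor, var)

print(sorted(fg.edges()))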

Introduction to A4 1, Graphical Models Conditional Independence:

Introduction to A4 2, Message Passing 3, HMM Sum-Product Algorithm, a.k.a. Belief Propagation (the name comes from early expert systems built on directed graphical models, which needed to infer their beliefs about the probabilities of various events). Forward-Backward Algorithm.

Sum Product Algorithm The sum-product algorithm is used to compute probabilities for a subset of the variables of a joint distribution, e.g. P(a, b, c, d): marginal distributions, e.g. P(b); joint distributions of a subset of variables, e.g. P(a, b); conditional distributions (often posterior distributions), e.g. P(a, c | d) = P(a, c, d) / P(d).
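As a brute-force reference point for what the sum-product algorithm computes more cheaply, here is a small numpy sketch with a made-up joint table over four binary variables; the variable names follow the slide.

import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()                      # P(a, b, c, d), axes = (a, b, c, d)

p_b = joint.sum(axis=(0, 2, 3))           # marginal P(b)
p_ab = joint.sum(axis=(2, 3))             # joint of a subset, P(a, b)
p_acd = joint.sum(axis=1)                 # P(a, c, d), axes = (a, c, d)
p_d = joint.sum(axis=(0, 1, 2))           # P(d)
p_ac_given_d = p_acd / p_d                # P(a, c | d) = P(a, c, d) / P(d)

print(p_b, p_ab.shape, p_ac_given_d.shape)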

Message notations We use the function μ to denote messages. On each edge of the factor graph there are two messages traveling in opposite directions. We use subscripts to denote the origin and the destination of these messages, e.g. μ_{f→x}(x) for a factor-to-variable message and μ_{x→f}(x) for a variable-to-factor message.

The sum product algorithm on factor graph Two rules in the sum-product algorithm. Factor-to-variable messages: multiply the factor by the incoming messages from all of its other neighbouring variables and sum those variables out, μ_{f→x}(x) = Σ_{x_1,…,x_M} f(x, x_1, …, x_M) Π_{m ∈ ne(f)\x} μ_{x_m→f}(x_m). Variable-to-factor messages: take the product of the incoming messages from all of the variable's other neighbouring factors, μ_{x→f}(x) = Π_{l ∈ ne(x)\f} μ_{f_l→x}(x).
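A minimal sketch of the two update rules for binary variables, with factors stored as numpy arrays; the factor table, variable names, and helper function names below are illustrative assumptions, not part of A4.

import numpy as np


def factor_to_variable(factor_table, factor_vars, target, incoming):
    """mu_{f->x}(x): multiply the factor by the incoming variable-to-factor
    messages from all variables except `target`, then sum those variables out."""
    table = factor_table.astype(float).copy()
    for axis, var in enumerate(factor_vars):
        if var != target:
            # broadcast the incoming message mu_{x_m -> f} along its axis
            shape = [1] * table.ndim
            shape[axis] = table.shape[axis]
            table = table * incoming[var].reshape(shape)
    sum_axes = tuple(i for i, var in enumerate(factor_vars) if var != target)
    return table.sum(axis=sum_axes)


def variable_to_factor(incoming_factor_messages):
    """mu_{x->f}(x): the product of the messages from all neighbouring factors
    except f (pass the list with that one message already excluded)."""
    out = np.ones_like(incoming_factor_messages[0], dtype=float)
    for msg in incoming_factor_messages:
        out = out * msg
    return out


# Tiny check on a single factor f(a, b) with a uniform message coming from b:
f_ab = np.array([[0.9, 0.1], [0.2, 0.8]])
msg = factor_to_variable(f_ab, ["a", "b"], "a", {"b": np.ones(2)})
print(msg)                                            # -> [1.0, 1.0]
print(variable_to_factor([msg, np.array([2.0, 0.5])]))  # -> [2.0, 0.5]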

The sum product algorithm on factor graph How to start the sum-product algorithm: choose a node in the factor graph as the root node; compute all the leaf-to-root messages; then compute all the root-to-leaf messages. Initial conditions: starting from a factor leaf/root node, the initial factor-to-variable message is the factor itself; starting from a variable leaf/root node, the initial variable-to-factor message is a vector of ones.
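A minimal sketch of this two-pass schedule on a small tree-structured factor graph, assuming networkx; the node names below are made up.

import networkx as nx

fg = nx.Graph()                       # bipartite: variables a, b, c; factors f1, f2
fg.add_edges_from([("a", "f1"), ("f1", "b"), ("b", "f2"), ("f2", "c")])

root = "b"                            # choose any node as the root
order = list(nx.dfs_postorder_nodes(fg, source=root))
parents = dict(nx.dfs_predecessors(fg, source=root))

# Leaf-to-root pass: each node sends to its parent in post-order, so all of a
# node's children have already sent to it before it sends upward.
upward = [(node, parents[node]) for node in order if node != root]
# Root-to-leaf pass: the same edges, reversed and in reverse order.
downward = [(dst, src) for (src, dst) in reversed(upward)]
print(upward)
print(downward)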

Compute probabilities with messages How to convert messages to actual probabilities. Compute unnormalized probabilities using the messages from the sum-product algorithm: unnormalized marginal probabilities p̃(x) = Π_{f ∈ ne(x)} μ_{f→x}(x); normalization constant Z = Σ_x p̃(x); marginal probabilities p(x) = p̃(x) / Z.

Compute probabilities with messages How to convert messages to actual probabilities. Let x_s be the subset of the variables we wish to compute a joint distribution over. Compute unnormalized probabilities using the messages from the sum-product algorithm: when all the variables in x_s are connected to the same factor f_s, the computation is efficient, with unnormalized joint probabilities p̃(x_s) = f_s(x_s) Π_{i ∈ ne(f_s)} μ_{x_i→f_s}(x_i); normalization constant Z = Σ_{x_s} p̃(x_s); joint probabilities p(x_s) = p̃(x_s) / Z.
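A minimal sketch of turning messages into probabilities at a single variable node; the incoming message values below are made up.

import numpy as np

# incoming mu_{f->x}(x) from each neighbouring factor of a binary variable x
incoming = [np.array([0.6, 1.4]), np.array([2.0, 0.5])]

p_tilde = np.prod(incoming, axis=0)   # unnormalized marginal: product of messages
Z = p_tilde.sum()                     # normalization constant
p_x = p_tilde / Z                     # marginal P(x)
print(p_tilde, Z, p_x)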

Equivalence The sum-product algorithm is equivalent to the forward-backward algorithm in the context of hidden Markov models. Factor graph of an HMM:

Forward-Backward Algorithm For the purpose of inference, we can simplify the factor graph as below. The factors are h(z_1) = p(z_1) p(x_1 | z_1) and f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n), which absorb the observed emissions.

Forward-Backward Algorithm For the purpose of inference, we can simplify the factor graph as below. The factors are h(z_1) = p(z_1) p(x_1 | z_1) and f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n). The messages satisfy μ_{z_{n-1}→f_n}(z_{n-1}) = μ_{f_{n-1}→z_{n-1}}(z_{n-1}). Quiz: why?

Forward-Backward Algorithm For the purpose of inference, we can simplify the factor graph as below. Rewrite the messages as α(z_n) ≡ μ_{f_n→z_n}(z_n) and β(z_n) ≡ μ_{f_{n+1}→z_n}(z_n). Replacing the factor f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n) inside the sum-product rule gives the forward recursion α(z_n) = p(x_n | z_n) Σ_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1}) and the backward recursion β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n).

Forward-Backward Algorithm For the purpose of inference, we can simplify the factor graph as below. Obtain the posterior: γ(z_n) ≡ p(z_n | x_1, …, x_N) = α(z_n) β(z_n) / p(X), where p(X) = Σ_{z_N} α(z_N).
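A minimal sketch of the resulting forward-backward recursions for a discrete HMM; the transition, emission, and initial parameters below are made up, and the variable names (pi, A, B) are illustrative.

import numpy as np

pi = np.array([0.6, 0.4])                       # p(z_1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])          # p(z_n = j | z_{n-1} = i)
B = np.array([[0.9, 0.1], [0.3, 0.7]])          # p(x_n = k | z_n = i)
x = [0, 1, 1, 0]                                # observed sequence

N, K = len(x), len(pi)
alpha = np.zeros((N, K))
beta = np.ones((N, K))

alpha[0] = pi * B[:, x[0]]                      # alpha(z_1) = p(z_1) p(x_1 | z_1)
for n in range(1, N):                           # forward recursion
    alpha[n] = B[:, x[n]] * (alpha[n - 1] @ A)
for n in range(N - 2, -1, -1):                  # backward recursion
    beta[n] = A @ (B[:, x[n + 1]] * beta[n + 1])

p_X = alpha[-1].sum()                           # likelihood p(x_1, ..., x_N)
gamma = alpha * beta / p_X                      # posterior p(z_n | x_1, ..., x_N)
print(gamma)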

A set of binary random variables:

The sum product algorithm on factor graph How to start the sum-product algorithm: Choose a node in the factor graph as the root node Compute all the leaf-to-root messages Compute all the root-to-leaf messages Initial conditions: Starting from a factor leaf/root node, the initial factor-to-variable message is the factor itself Starting from a variable leaf/root node, the initial variable-to-factor message is a vector of ones


How do we know we did it right?

We can verify the correctness by computing the normalization constant Z


Naive computation: O(2^5). Message-passing: O(2^3), dominated by the largest factor.
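A minimal sketch of this check on a made-up chain a - b - c - d - e of binary variables with pairwise factors; the A4 graph and its largest factor differ, but the comparison is the same.

import itertools
import numpy as np

rng = np.random.default_rng(0)
f_ab, f_bc, f_cd, f_de = (rng.random((2, 2)) for _ in range(4))

# Naive computation: enumerate all 2^5 joint assignments.
Z_naive = sum(f_ab[a, b] * f_bc[b, c] * f_cd[c, d] * f_de[d, e]
              for a, b, c, d, e in itertools.product((0, 1), repeat=5))

# Message-passing style: sum out one variable at a time, so each step only
# touches one small factor (the cost is dominated by the largest factor).
m_b = f_ab.sum(axis=0)            # sum out a: m(b) = sum_a f(a, b)
m_c = m_b @ f_bc                  # sum out b
m_d = m_c @ f_cd                  # sum out c
Z_mp = m_d @ f_de.sum(axis=1)     # sum out d, then e

print(Z_naive, Z_mp)              # the two values agree up to float error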

How do we compute P(e, b)?

How do we compute P(e, b)? We can skip the summation over b or e during message-passing, so that the variable is carried along in the messages.

How do we compute P(e, b)? We can skip the summation over b or e during message-passing. Suppose we choose e as the anchor and skip the summation over b; or we can choose b as the anchor and skip the summation over e:

How do we compute P(e, b)? Suppose we choose e as the anchor and skip the summation over b: we need to re-compute the messages so that they carry b along.


How do we compute P(e, b)? New messages carrying b:

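A minimal sketch of carrying b along on a made-up three-variable chain a - b - e (not the A4 graph): the message that would normally sum over b keeps b as an argument.

import numpy as np

rng = np.random.default_rng(1)
f_ab, f_be = rng.random((2, 2)), rng.random((2, 2))

m_b = f_ab.sum(axis=0)                 # ordinary message: sum out a, a function of b
p_be_tilde = m_b[:, None] * f_be       # do NOT sum over b: the message carries b along
Z = p_be_tilde.sum()
p_be = p_be_tilde / Z                  # joint marginal P(b, e)
print(p_be)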

What if we observed c as c =[0,1]?

What if we observed c as c =[0,1]? It changes the outgoing messages from c

What if we observed c as c = [0, 1]? It changes the outgoing messages from c, and their derivative messages.
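A minimal sketch of clamping the observed variable by multiplying the one-hot indicator [0, 1] into its outgoing messages; the message values and the helper name below are made up.

import numpy as np

evidence_c = np.array([0.0, 1.0])          # c observed in its second state

def clamp(outgoing_message, indicator):
    """Zero out the entries of an outgoing message that disagree with the
    observation; downstream messages are then recomputed from these."""
    return outgoing_message * indicator

mu_c_to_f = np.array([0.4, 0.6])           # an illustrative outgoing message from c
print(clamp(mu_c_to_f, evidence_c))        # -> [0.0, 0.6]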

Message-passing algorithms Why message-passing algorithms? Message-passing rules are local computations that usually have much lower computational complexity than global summation. The computational complexity of message passing is dominated by the factor with the largest number of neighbouring nodes.