OSU CS 536 Probabilistic Graphical Models. Loopy Belief Propagation and Clique Trees / Join Trees


OSU CS 536 Probabilistic Graphical Models. Loopy Belief Propagation and Clique Trees / Join Trees. Slides from Kevin Murphy's Graphical Model Tutorial (with minor changes). Reading: Koller and Friedman, Ch. 10.

Part I: Sum Product Algorithm and (Loopy) Belief Propagation. (All you need to know is slide 17, which is covered in the excerpt from MacKay's book posted on the course webpage.)

What's wrong with VarElim? Often we want to query all hidden nodes. VarElim takes O(N^2 K^(w+1)) time to compute P(X_i | x_e) for all (hidden) nodes i. There exist message-passing algorithms that can do this in O(N K^(w+1)) time. Later, we will use these to do approximate inference in O(N K^2) time, independent of w. [Figure: an HMM with hidden nodes X_1, X_2, X_3 and observations Y_1, Y_2, Y_3.] SP2-3

Repeated variable elimination leads to redundant calculations: O(N^2 K^2) time to compute all N marginals. [Figure: the HMM chain X_1..X_3 with observations Y_1..Y_3.] SP2-4

Forwards-backwards algorithm (Rabiner89, etc.). The posterior marginal factorizes as P(X_t | y_1:n) ∝ P(X_t | y_1:t-1) × P(y_t | X_t) × P(y_t+1:n | X_t), i.e. forwards prediction × local evidence × backwards prediction. (Use dynamic programming to compute these.) SP2-5

Forwards algorithm (filtering): recursively compute the filtered belief P(X_t | y_1:t) ∝ P(y_t | X_t) Σ_{x_t-1} P(X_t | x_t-1) P(x_t-1 | y_1:t-1). SP2-6

Backwards algorithm: recursively compute β_t(x_t) = P(y_t+1:n | x_t) = Σ_{x_t+1} P(x_t+1 | x_t) P(y_t+1 | x_t+1) β_t+1(x_t+1). SP2-7

Forwards-backwards algorithm. [Figure: a chain X_1 ... X_12 ... X_24 with forwards messages α_t and backwards messages b_t.] Backwards messages are independent of forwards messages; combining them gives all N marginals in O(N K^2) time, not O(N^2 K^2). SP2-8
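
The forwards and backwards recursions above fit in a few lines of numpy. The sketch below is illustrative, not the slides' code: the names init, trans, emit, and obs are assumptions about how the HMM is represented. The product of the forwards and backwards messages, renormalized, gives all N posterior marginals in O(N K^2) total time.

    import numpy as np

    def forwards_backwards(init, trans, emit, obs):
        """Posterior marginals P(X_t | y_1:N) for a discrete HMM.

        init[k]     = P(X_1 = k)
        trans[j, k] = P(X_t+1 = k | X_t = j)
        emit[k, y]  = P(Y_t = y | X_t = k)
        obs         = observed symbols y_1..y_N (integer indices)
        """
        N, K = len(obs), len(init)
        alpha = np.zeros((N, K))   # forwards (filtering) messages
        beta = np.ones((N, K))     # backwards messages

        # Forwards pass: alpha_t(k) proportional to P(X_t = k, y_1:t)
        alpha[0] = init * emit[:, obs[0]]
        alpha[0] /= alpha[0].sum()
        for t in range(1, N):
            alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
            alpha[t] /= alpha[t].sum()          # normalize for numerical stability

        # Backwards pass: beta_t(k) proportional to P(y_t+1:N | X_t = k)
        for t in range(N - 2, -1, -1):
            beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
            beta[t] /= beta[t].sum()

        # Combine: P(X_t | y_1:N) proportional to alpha_t * beta_t
        post = alpha * beta
        return post / post.sum(axis=1, keepdims=True)

Normalizing each message only changes constants of proportionality, so it does not affect the final normalized marginals but keeps long chains numerically stable.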

Belief propagation (Pearl88, Shafer90, Yedidia01, etc.). The forwards-backwards algorithm can be generalized to apply to any tree-like graph (one with no loops). For now, we assume pairwise potentials. SP2-9

Absorbing messages. The belief at X_t is proportional to the product of the messages arriving from its neighbors X_t-1, X_t+1, and Y_t (the local evidence). SP2-10

Sending messages. The message X_t sends to a neighbor is the product of the messages from all its other neighbors (including the local evidence from Y_t), multiplied by the pairwise potential and summed over X_t. SP2-11

Centralized protocol: collect to root (post-order), then distribute from root (pre-order). [Figure: a tree rooted at R with the messages numbered 1..5 in collect order and 1..5 in distribute order.] Computes all N marginals in 2 passes over the graph. SP2-12
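
The collect/distribute schedule is easy to generate for any rooted tree. A small sketch under assumed data structures (the adjacency dict and the name two_pass_schedule are not from the slides): the collect list visits edges child-to-parent in post-order, and reversing it with every edge flipped is a valid distribute (pre-order) pass, so each edge carries exactly one message in each direction.

    def two_pass_schedule(adj, root):
        """Return (collect, distribute): directed message edges for a tree.

        adj  : dict mapping each node to a list of its neighbors
        root : node chosen as the root of the schedule
        """
        collect = []

        def visit(node, parent):
            # Post-order: a node sends to its parent only after hearing
            # from all of its children.
            for child in adj[node]:
                if child != parent:
                    visit(child, node)
                    collect.append((child, node))

        visit(root, None)
        # Reversing the collect pass and flipping each edge gives a valid
        # distribute pass: the root sends first, leaves receive last.
        distribute = [(dst, src) for (src, dst) in reversed(collect)]
        return collect, distribute

    # Example: a small tree rooted at 'R'
    adj = {'R': [1, 2], 1: ['R', 3, 4], 2: ['R'], 3: [1], 4: [1]}
    up, down = two_pass_schedule(adj, 'R')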

Distributed protocol: computes all N marginals in O(N) parallel updates. SP2-13

Loopy belief propagation. Applying BP to graphs with loops (cycles) can give the wrong answer, because it overcounts evidence. [Figure: the Cloudy / Sprinkler / Rain / WetGrass network, which contains an undirected loop.] In practice it often works well (e.g., error-correcting codes). SP2-14

Why Loopy BP? We can compute exact answers by converting a loopy graph to a junction tree and running BP (see later). However, the resulting Jtree has nodes with O(K^(w+1)) states, so inference takes O(N K^(w+1)) time [w+1 = clique size of the triangulated graph, i.e. w is the induced width]. We can apply BP to the original graph in O(N K^C) time [C = clique size of the original graph]. To apply BP to a graph with non-pairwise potentials, it is simpler to use factor graphs. SP2-15

Factor graphs (Kschischang01). [Figure: a Bayes net, a Markov net, and a pairwise Markov net over X1..X5, each shown with its corresponding factor graph; the factor graph is a bipartite graph of variable nodes and factor nodes.] SP2-16

Loopy BP (see MacKay PDF). [Figure: a factor graph with variables x, y, z and factors f_1..f_4. Dashed messages are products of the same-color solid messages (and the factor). Variable-to-factor messages are written q_{x→f}, factor-to-variable messages r_{f→x}; a leaf variable sends the empty product, e.g. q_{z→f3} = 1, and a leaf factor sends its own table, e.g. r_{f1→x} = f1.] SP2-17
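
The two update rules in the figure can be written compactly in numpy for discrete variables with K states each. This is a hedged sketch: the argument names (var_nbrs for a variable's factor list, scope and table for a factor, and message dicts q and r keyed by (sender, receiver)) are assumptions about the representation, not notation from MacKay or the slides.

    import numpy as np

    def q_update(x, f, var_nbrs, r, K):
        """Variable-to-factor message q_{x->f}: the product of the incoming
        factor-to-variable messages r_{g->x} from x's other factors g."""
        msg = np.ones(K)                      # a leaf variable sends the empty product
        for g in var_nbrs[x]:
            if g != f:
                msg *= r[(g, x)]
        return msg / msg.sum()                # normalize to avoid underflow

    def r_update(f, x, scope, table, q, K):
        """Factor-to-variable message r_{f->x}: multiply the factor table by the
        incoming q messages from the factor's other variables, then sum them out."""
        t = table.copy()                      # one tensor axis per variable in scope
        for axis, v in enumerate(scope):
            if v != x:
                shape = [1] * len(scope)
                shape[axis] = K
                t = t * q[(v, f)].reshape(shape)
        keep = scope.index(x)
        other = tuple(a for a in range(len(scope)) if a != keep)
        return t.sum(axis=other)

On a tree, sweeping these updates inward to a root and back out reproduces the exact marginals (the belief at x is the product of all incoming r messages). On a loopy graph the same updates are simply iterated from uniform initial messages until they stop changing, with no general guarantee of convergence or correctness.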

Sum-product vs max-product. Sum-product computes marginals by summing out the other variables in each factor-to-variable message; max-product computes max-marginals by maximizing over them instead. Same algorithm on different semirings: (+, ×) and (max, ×). (Shafer90, Bistarelli97, Goodman99, Aji00.) SP2-18
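
To make the semiring point concrete, here is a tiny sketch for a single message along a pairwise edge; the potential psi and incoming product h are made-up numbers, and the only difference between the two algorithms is the reduction operator.

    import numpy as np

    def pairwise_message(psi, h, reduce_op=np.sum):
        """Message over x_j: reduce_op over x_i of psi(x_i, x_j) * h(x_i).

        reduce_op=np.sum -> sum-product (marginals)
        reduce_op=np.max -> max-product (max-marginals)
        """
        return reduce_op(psi * h[:, None], axis=0)

    psi = np.array([[0.9, 0.1],
                    [0.2, 0.8]])     # illustrative pairwise potential
    h = np.array([0.3, 0.7])         # product of the other incoming messages
    sum_msg = pairwise_message(psi, h, np.sum)
    max_msg = pairwise_message(psi, h, np.max)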

Viterbi decoding: compute the most probable explanation (MPE) of the observed data. [Figure: a Hidden Markov Model (HMM) with hidden states X_1, X_2, X_3 and observed nodes Y_1, Y_2, Y_3, e.g. for the word 'Tomato'.] SP2-19

Viterbi algorithm for HMMs: run the max-product forwards algorithm, keeping track of the most probable predecessor of each state, then do a pointer traceback. Can produce an N-best list (the N most probable configurations) in O(N T K^2) time (Forney73, Nilsson01). SP2-20
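
A minimal numpy sketch of the max-forwards pass with pointer traceback, using the same illustrative init/trans/emit/obs representation as the forwards-backwards sketch above; it works in log space to avoid underflow.

    import numpy as np

    def viterbi(init, trans, emit, obs):
        """Most probable hidden path (MPE) for a discrete HMM."""
        N, K = len(obs), len(init)
        log_t, log_e = np.log(trans), np.log(emit)
        delta = np.zeros((N, K))            # best log-score of any path ending in state k at time t
        ptr = np.zeros((N, K), dtype=int)   # most probable predecessor of each state

        delta[0] = np.log(init) + log_e[:, obs[0]]
        for t in range(1, N):
            scores = delta[t - 1][:, None] + log_t    # scores[j, k]: arrive in k from j
            ptr[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_e[:, obs[t]]

        # Pointer traceback from the best final state
        path = [int(delta[-1].argmax())]
        for t in range(N - 1, 0, -1):
            path.append(int(ptr[t][path[-1]]))
        return path[::-1]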

Loopy Viterbi: use max-product to compute (or approximate) the max-marginals. If there are no ties and the max-marginals are exact, then the MPE assignment can be read off by maximizing each node's max-marginal separately. This method does not use traceback, so it can be used with distributed / loopy BP. We can break ties, and produce the N most probable configurations, by asserting that certain assignments are disallowed and rerunning (Yanover04). SP2-21

BP speedup tricks. Sometimes we can reduce the time to compute a message from O(K^2) to O(K). If ψ(x_i, x_j) = exp(−‖f(x_i) − f(x_j)‖^2), then sum-product can be done in O(K log K) time [exact, FFT] or O(K) time [approximate], and max-product in O(K) time [distance transform] (Felzenszwalb03/04, Movellan04, deFreitas04). For general (discrete) potentials, we can dynamically add/delete states to reduce K (Coughlan04). Sometimes we can speed up convergence by using a better message-passing schedule (e.g., along embedded spanning trees, Wainwright01) or by using a multiscale method (Felzenszwalb04). SP2-22
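
The FFT trick is easy to see in code: when the pairwise potential depends only on the difference x_i - x_j, the sum-product message is a discrete convolution of the incoming message with the potential's kernel, so FFT-based convolution brings the cost from O(K^2) to O(K log K). A sketch with made-up values of K and sigma:

    import numpy as np
    from scipy.signal import fftconvolve

    K, sigma = 256, 5.0
    h = np.random.rand(K)                            # product of the other incoming messages
    offsets = np.arange(-(K - 1), K)                 # all possible differences x_i - x_j
    kernel = np.exp(-offsets**2 / (2 * sigma**2))    # psi depends only on x_i - x_j

    # Direct O(K^2) message: m[j] = sum_i psi(x_i, x_j) * h[i]
    direct = np.array([(np.exp(-(np.arange(K) - j)**2 / (2 * sigma**2)) * h).sum()
                       for j in range(K)])

    # The same message as one convolution, O(K log K) with FFTs
    fast = fftconvolve(h, kernel)[K - 1:2 * K - 1]

    assert np.allclose(direct, fast)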

Part II: Clique Trees / Join Trees (not tested material)

Junction / join / clique trees. To perform exact inference in an arbitrary graph, convert it to a junction tree, and then perform belief propagation. A jtree is a tree whose nodes are sets of variables, and which has the Jtree property: all sets which contain any given variable form a connected subgraph (a variable cannot appear in 2 disjoint places). [Figure: the Cloudy / Sprinkler / Rain / WetGrass network is moralized and then converted into a jtree.] Maximal cliques = { {C,S,R}, {S,R,W} }; separators = { {C,S,R} ∩ {S,R,W} = {S,R} }. SP2-24
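
For this small example the whole construction fits in a short networkx sketch (assuming networkx is available; the moral graph here happens to be chordal already, so the triangulation step is skipped):

    import networkx as nx

    # Moral graph of the Cloudy/Sprinkler/Rain/WetGrass network:
    # the original edges plus the marrying edge S-R (S and R share the child W).
    moral = nx.Graph([('C', 'S'), ('C', 'R'), ('S', 'R'), ('S', 'W'), ('R', 'W')])

    # This moral graph is already triangulated, so go straight to the maximal cliques.
    cliques = [frozenset(c) for c in nx.find_cliques(moral)]     # {C,S,R}, {S,R,W}

    # Junction graph: one node per maximal clique, edges weighted by separator size.
    jgraph = nx.Graph()
    jgraph.add_nodes_from(cliques)
    for i, ci in enumerate(cliques):
        for cj in cliques[i + 1:]:
            sep = ci & cj
            if sep:
                jgraph.add_edge(ci, cj, weight=len(sep), separator=sep)

    # A maximum-weight spanning tree of the junction graph is a junction tree:
    # here the cliques {C,S,R} and {S,R,W} joined by the separator {S,R}.
    jtree = nx.maximum_spanning_tree(jgraph)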

Making a junction tree (Jensen94). [Figure: starting from a graphical model GM over nodes A..F, (1) moralize, (2) triangulate with elimination order f, d, e, c, b, a to get G_T, (3) find the maximal cliques, (4) build the junction graph with edge weights W_ij = |C_i ∩ C_j|, and (5) take a maximum spanning tree, giving a jtree with cliques {a,b,c}, {b,c,e}, {b,e,f}, {b,d}.] SP2-25

Clique potentials. Each model clique potential gets assigned to one Jtree clique potential. Each observed variable assigns a delta function to one Jtree clique potential: if we observe W = w*, set E(w) = δ(w, w*), else E(w) = 1. [Figure: the C,S,R,W Jtree with square factor nodes attached to the cliques.] Square nodes are factors. SP2-26

Separator potentials. Separator potentials enforce consistency between neighboring cliques on their common variables. [Figure: the same Jtree with square separator factors between the cliques.] Square nodes are factors. SP2-27

BP on a Jtree. A Jtree is an MRF with pairwise potentials. Each (clique) node potential contains CPDs and local evidence. Each edge potential acts like a projection function. [Figure: the two-clique Jtree with the four messages numbered 1-4.] We do a forwards (collect) pass, then a backwards (distribute) pass. The result is the Hugin / Shafer-Shenoy algorithm. SP2-28

BP on a Jtree (collect). Initial clique potentials contain CPDs and evidence. SP2-29

BP on a Jtree (collect). Message from clique to separator marginalizes the belief (projects onto the intersection) [remove C]. SP2-30

BP on a Jtree (collect). Separator potentials get the marginal belief from their parent clique. SP2-31

BP on a Jtree (collect). Message from separator to clique expands the marginal [add W]. SP2-32

BP on a Jtree (collect). The root clique has now seen all the evidence. SP2-33

BP on a Jtree (distribute). SP2-34

BP on a Jtree (distribute). Marginalize out W and exclude the old evidence (e_C, e_R). SP2-35

BP on a Jtree (distribute). Combine upstream and downstream evidence. SP2-36

BP on a Jtree (distribute). Add C and exclude the old evidence (e_C, e_R). SP2-37

BP on a Jtree (distribute). Combine upstream and downstream evidence. SP2-38

Partial beliefs. [Figure: evidence on R is now added at this clique.] The beliefs / messages at intermediate stages (before finishing both passes) may not be meaningful, because any given clique may not have seen all the model potentials / evidence (and hence may not be normalizable). This can cause problems when messages may fail (e.g. sensor networks). One must reparameterize using the decomposable model to ensure meaningful partial beliefs (Paskin04). SP2-39

Hugin algorithm. Hugin = BP applied to a Jtree using a serial protocol. [Figure: collect and distribute passes between cliques C_i and C_j through separator S_ij; square nodes are separators.] SP2-40
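
To tie the pictures together, here is a hedged numpy sketch of one Hugin collect/distribute exchange on the two-clique Jtree from the earlier example. The random tables are placeholders for the CPDs and evidence, and the axis conventions are assumptions made for this sketch only.

    import numpy as np

    # Illustrative clique potentials, axes ordered (C, S, R) and (S, R, W), binary variables.
    # In practice these would be initialized from the model CPDs and the evidence.
    phi_csr = np.random.rand(2, 2, 2)
    phi_srw = np.random.rand(2, 2, 2)
    sep = np.ones((2, 2))                      # separator potential over (S, R)

    # Collect to the root {S,R,W}: project {C,S,R} onto the separator (sum out C),
    # then rescale the root clique by new_sep / old_sep.
    new_sep = phi_csr.sum(axis=0)
    phi_srw *= (new_sep / sep)[:, :, None]     # (S, R) are the leading axes of phi_srw
    sep = new_sep

    # Distribute from the root: project {S,R,W} onto the separator (sum out W),
    # then rescale {C,S,R} the same way.
    new_sep = phi_srw.sum(axis=2)
    phi_csr *= (new_sep / sep)[None, :, :]     # (S, R) are the trailing axes of phi_csr
    sep = new_sep

    # Both cliques are now calibrated: each potential is proportional to the joint over
    # its variables, e.g. phi_csr.sum(axis=0) and phi_srw.sum(axis=2) agree on (S, R).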