CS242: Probabilistic Graphical Models Lecture 2B: Loopy Belief Propagation & Junction Trees

CS242: Probabilistic Graphical Models, Lecture 2B: Loopy Belief Propagation & Junction Trees. Professor Erik Sudderth, Brown University Computer Science, September 22, 2016. Some figures and materials courtesy of David Barber, Bayesian Reasoning and Machine Learning, http://www.cs.ucl.ac.uk/staff/d.barber/brml/

CS242: Lecture 2B Outline: Elimination Algorithms & Clique Trees; Junction Tree Algorithm; Loopy Belief Propagation. [Figure: graphs, clique trees, and junction trees for the six-node running example.]

Undirected Graphical Models. Notation: x_s is the random variable associated with node s; V is the set of N nodes or vertices, {1, 2, ..., N}; E is the set of undirected edges (s, t) linking pairs of nodes; C is the set of cliques (fully connected subsets) of nodes. Undirected Markov random field: p(x) = (1/Z) \prod_{c \in C} \psi_c(x_c), where \psi_c(x_c) \geq 0 is an arbitrary non-negative potential function and Z = \sum_x \prod_{c \in C} \psi_c(x_c) > 0 is the normalization constant, or partition function, defined as a sum over all possible joint states.
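
To make the definitions above concrete, here is a minimal brute-force sketch (the toy model and potential values are illustrative assumptions, not taken from the lecture): it builds an unnormalized product of clique potentials over three binary variables and computes the partition function Z by enumerating all joint states.

```python
# Brute-force partition function for a tiny MRF p(x) = (1/Z) * prod_c psi_c(x_c).
from itertools import product

cliques = [((1, 2), lambda a, b: 2.0 if a == b else 1.0),   # psi_{12}, illustrative values
           ((2, 3), lambda b, c: 3.0 if b == c else 1.0)]   # psi_{23}, illustrative values

def unnormalized_p(x):
    """Product of clique potentials evaluated at the joint state x (dict: node -> value)."""
    p = 1.0
    for variables, psi in cliques:
        p *= psi(*(x[v] for v in variables))
    return p

# Z sums the unnormalized product over all 2^3 joint states of the binary variables.
states = [dict(zip((1, 2, 3), vals)) for vals in product((0, 1), repeat=3)]
Z = sum(unnormalized_p(x) for x in states)
print(Z, unnormalized_p({1: 0, 2: 0, 3: 0}) / Z)   # partition function, one normalized probability
```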

Clique-Based Inference Algorithms. Start from p(x) = (1/Z) \prod_{c \in C} \psi_c(x_c). For each clique c, define a variable z_c = {x_s | s \in c} which enumerates the joint configurations of that clique's variables, and set p(z) \propto \prod_{c \in C} \psi_c(z_c). [Figure: five-node example with clique variables z_{123}, z_{34}, z_{35}.] Does this define an equivalent joint distribution? PROBLEM: We have defined multiple copies of the variables in the true model, but not enforced any relationships among them.

Clique-Based Inference Algorithms. For each clique c, define a variable z_c = {x_s | s \in c} which enumerates the joint configurations of that clique's variables. Add potentials enforcing consistency between all pairs of clique variables which share one of the original variables: p(z) \propto \prod_{c \in C} \psi_c(z_c) \prod_{d \neq c} \psi_{cd}(z_c, z_d), where \psi_{cd}(z_c, z_d) = 1 if z_c(x_s) = z_d(x_s) for all s \in c \cap d, and 0 otherwise. PROBLEM: The graph may have a large number of pairwise consistency constraints, and inference will be difficult.

Clique-Based Inference Algorithms. For each clique c, define a variable z_c = {x_s | s \in c} which enumerates the joint configurations of that clique's variables. Add potentials enforcing consistency between some subset of pairs of cliques, p(z) \propto \prod_{c \in C} \psi_c(z_c) \prod_{(c,d) \in E(C)} \psi_{cd}(z_c, z_d), taking advantage of the transitivity of equality: x_a = x_b and x_b = x_c imply x_a = x_c. Question: How many edges are needed for global consistency? When can we build a tree-structured clique graph?
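
The consistency potentials \psi_{cd} used above are simple indicator functions. A minimal sketch (clique configurations represented as dicts, a representation assumed here rather than taken from the slides):

```python
# psi_cd(z_c, z_d) = 1 if the two clique configurations agree on all shared variables.
def consistency_potential(z_c, z_d):
    """Return 1.0 if z_c and z_d assign equal values to every variable in c intersect d."""
    shared = set(z_c) & set(z_d)
    return 1.0 if all(z_c[s] == z_d[s] for s in shared) else 0.0

# Example: cliques {1, 2, 3} and {3, 4} share variable 3.
z_123 = {1: 0, 2: 1, 3: 1}
print(consistency_potential(z_123, {3: 1, 4: 0}))  # 1.0: the copies of x_3 agree
print(consistency_potential(z_123, {3: 0, 4: 0}))  # 0.0: inconsistent copies are excluded
```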

An Undirected View of Graph Elimination. Elimination order: (6, 5, 4, 3, 2, 1). For the node being eliminated, marginalize over the values of the associated variable. Compute a new potential function involving all other variables which are neighbors of the just-marginalized variable, and are thus related to it by some potential. If necessary, add edges to create a clique for the newly created potential. [Figure: the six-node example graph before and after eliminating X_6.]

An Undirected View of Graph Elimination. Elimination order: (6, 5, 4, 3, 2, 1). [Figure: the graphs remaining after eliminating X_6 and then X_5, with fill edges added among the neighbors of each eliminated node.]

An Undirected View of Graph Elimination. Elimination order: (6, 5, 4, 3, 2, 1). [Figure: the graphs remaining as X_4, X_3, X_2, and X_1 are eliminated in turn.]
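
The elimination procedure described above is easy to sketch in code. The following is an illustrative implementation; the six-node graph's edges are an assumption chosen only so the example runs, and are not guaranteed to match the figure.

```python
# Undirected graph elimination: eliminating a node connects its remaining
# neighbors (fill edges) and records the resulting elimination clique.
def eliminate(adjacency, order):
    """adjacency: dict node -> set of neighbors. Returns one clique per eliminated node."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    cliques = []
    for v in order:
        nbrs = adj[v]
        cliques.append(frozenset(nbrs | {v}))        # clique for the new potential
        for a in nbrs:
            adj[a] |= nbrs - {a}                     # add fill edges among neighbors
            adj[a].discard(v)                        # remove the eliminated node
        del adj[v]
    return cliques

# Hypothetical six-node graph, elimination order (6, 5, 4, 3, 2, 1).
graph = {1: {2, 3}, 2: {1, 4, 6}, 3: {1, 5}, 4: {2}, 5: {3, 6}, 6: {2, 5}}
for clique in eliminate(graph, [6, 5, 4, 3, 2, 1]):
    print(sorted(clique))
```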

Elimination Clique Tree. The clique tree contains the cliques (fully connected subsets) which are generated as elimination executes. The separator sets contain the variables which are shared between each linked pair of cliques. [Figure: the elimination clique tree and its separator sets for the six-node example.]

CS242: Lecture 2B Outline: Elimination Algorithms & Clique Trees; Junction Tree Algorithm; Loopy Belief Propagation. [Figure: a six-node graph over a-f and its junction tree with cliques {a, b, c}, {b, c, e}, {b, d}, {b, e, f}, showing COLLECT and DISTRIBUTE message schedules. Some figures from Paskin (2003).]

Marginal Inference Algorithms. For a tree-structured graph: one marginal via elimination applied to the leaves of the tree; all marginals via belief propagation (the sum-product algorithm). For a general graph: one marginal via the elimination algorithm; all marginals via the junction tree algorithm, i.e., belief propagation on a junction tree (a special clique tree).

Clique Trees and Junction Trees. This clique tree has the junction tree property: the clique nodes containing any given variable from the original model form a connected subtree. We can then exactly represent the distribution while ignoring redundant constraints. Note that not all clique trees are junction trees. [Figure: a clique tree for the six-node example that satisfies the junction tree property, and a second clique tree over the same cliques that does not.]

Finding a Junction Tree. The junction tree property: a clique tree is a junction tree if, for every pair of cliques V and W, all cliques on the (unique) path between V and W contain the variables shared by V and W. [Figure: a four-node cycle whose clique tree over {A, B}, {A, C}, {B, D}, {C, D} is not a junction tree.] Given a set of cliques, how can we efficiently find a clique tree with the junction tree (running intersection) property? How can we be sure that at least one junction tree exists? Strategy: augment the graph with additional edges. Cliques of the original graph are always subsets of cliques of the augmented graph, so the original distribution still factorizes appropriately. As cliques grow, we will eventually be able to construct a junction tree. Question: Which undirected graphs have junction trees?
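
The running intersection property can be checked directly. A minimal sketch (representation assumed: cliques as Python sets, the clique tree as index pairs), applied to the four-cycle example above:

```python
# Check the junction tree (running intersection) property of a clique tree.
from itertools import combinations

def is_junction_tree(cliques, edges):
    """cliques: list of sets; edges: (i, j) index pairs forming a tree over the cliques."""
    n = len(cliques)
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    def path(i, j, seen=None):
        """Unique path between clique nodes i and j in the tree (simple DFS)."""
        seen = seen or {i}
        if i == j:
            return [i]
        for k in adj[i]:
            if k not in seen:
                rest = path(k, j, seen | {k})
                if rest:
                    return [i] + rest
        return None

    # Every clique on the path between i and j must contain their intersection.
    for i, j in combinations(range(n), 2):
        shared = cliques[i] & cliques[j]
        if not all(shared <= cliques[k] for k in path(i, j)):
            return False
    return True

# Four-cycle example: A appears in cliques 0 and 1 but not on the path through 2 and 3.
cliques = [{'A', 'B'}, {'A', 'C'}, {'B', 'D'}, {'C', 'D'}]
print(is_junction_tree(cliques, [(0, 2), (2, 3), (3, 1)]))  # False: not a junction tree
```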

Junction Trees and Triangulation. The junction tree property (as above): a clique tree is a junction tree if, for every pair of cliques V and W, all cliques on the (unique) path between V and W contain the variables shared by V and W. [Figure, after Barber: (a) a Markov network with p(a, b, c, d) = \psi(a, b, c) \psi(b, c, d) / Z; (b) its clique graph representation, with clique potentials \psi(a, b, c) and \psi(b, c, d) and separator {b, c}, whose potential may be set to the normalisation constant.] A chord is an edge connecting two non-adjacent nodes in some cycle. A cycle is chordless if it contains no chords. A graph is triangulated (chordal) if it contains no chordless cycles of length 4 or more. Theorem: The maximal cliques of a graph have a corresponding junction tree if and only if that undirected graph is triangulated. Lemma: For a non-complete triangulated graph with at least 3 nodes, there is a decomposition of the nodes into disjoint sets A, B, S such that S separates A from B, and S is complete. This lemma is the key induction argument in constructing a junction tree from a triangulation, and it implies the existence of an elimination ordering which introduces no new edges.
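
The link between triangulation and elimination can also be checked computationally: a graph is chordal exactly when some elimination ordering introduces no fill edges. A small brute-force sketch (suitable only for tiny graphs; the example edges are chosen for illustration):

```python
# Count fill edges for one elimination ordering; a graph is triangulated (chordal)
# iff some ordering produces zero fill edges.
from itertools import permutations

def fill_edges(adjacency, order):
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    added = 0
    for v in order:
        nbrs = adj[v]
        for a in nbrs:
            added += len(nbrs - {a} - adj[a])   # neighbors of v not yet adjacent to a
            adj[a] |= nbrs - {a}
            adj[a].discard(v)
        del adj[v]
    return added // 2                           # each fill edge counted from both endpoints

def is_triangulated(adjacency):
    return any(fill_edges(adjacency, order) == 0 for order in permutations(adjacency))

# A 4-cycle a-b-d-c-a is not chordal; adding the chord (b, c) makes it chordal.
cycle = {'a': {'b', 'c'}, 'b': {'a', 'd'}, 'c': {'a', 'd'}, 'd': {'b', 'c'}}
print(is_triangulated(cycle))                   # False
cycle['b'].add('c'); cycle['c'].add('b')
print(is_triangulated(cycle))                   # True
```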

Constructing a Junction Tree. [Figure: a triangulated graph over A-E with maximal cliques {A, B, D}, {B, C, D}, {C, D, E}, arranged in a clique tree with separators {B, D} and {C, D}.] Theorem: A clique tree is a junction tree if and only if it is a maximum-weight spanning tree of the clique graph defined by: graph nodes, one for each maximal clique of the triangulation; graph edges, linking all pairs of cliques that share at least one variable; edge weights, the cardinality of the separator set (intersection) of the two cliques. Computational complexity: at worst quadratic in the number of maximal cliques. Junction tree algorithms for general-purpose inference: 1. Triangulate the target undirected graphical model; any elimination ordering generates a valid triangulation (linear in graph size), but finding the optimal triangulation is NP-hard. 2. Arrange the triangulated cliques into a junction tree (at worst quadratic in graph size). 3. Execute the sum-product algorithm on the junction tree (exponential in clique size).
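
The maximum-weight spanning tree construction in the theorem above is straightforward to sketch with Kruskal's algorithm, weighting each candidate edge by the size of the clique intersection (the cliques below are those of the slide's A-E example):

```python
# Build a junction tree as a maximum-weight spanning tree of the clique graph.
from itertools import combinations

def junction_tree(cliques):
    """cliques: list of sets (maximal cliques of a triangulation). Returns tree edges (i, j)."""
    candidates = sorted(
        ((len(ci & cj), i, j)
         for (i, ci), (j, cj) in combinations(enumerate(cliques), 2)
         if ci & cj),                      # only cliques sharing at least one variable
        reverse=True)                      # heaviest separators first
    parent = list(range(len(cliques)))

    def find(i):                           # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for _, i, j in candidates:             # Kruskal: add edge if it joins two components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges

cliques = [{'A', 'B', 'D'}, {'B', 'C', 'D'}, {'C', 'D', 'E'}]
print(junction_tree(cliques))              # [(0, 1), (1, 2)], separators {B, D} and {C, D}
```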

Example: Junction Tree Formation (Paskin 2003). 1. Compute the elimination cliques (the order here is f, d, e, c, b, a). 2. Form the complete cluster graph over the maximal elimination cliques and find a maximum-weight spanning tree. [Figure: elimination of a graph over a-f, then the weighted cluster graph over the maximal cliques {a, b, c}, {b, c, e}, {b, d}, {b, e, f} and a resulting junction tree with separators {b, c}, {b}, {b, e}.] NOTE: For a given set of cliques, there may be multiple valid junction trees.

Example: Junction Tree Formation a b c d e a b c d e f g h i f g h i j k j k l l a b c d e f g h i j k l

Example: Junction Tree Formation a b c d e a b c d e f g h i f g h i j k j k l l a b c d e f g h i j k l

Example: Junction Tree Formation abf bf bcfg cdhi di dei a b c d e cfg chi f g h i cfgj cj ck chik j k cjk jk l jkl

Reminder: Pairwise Sum-Product BP. Set of neighbors of node t: \Gamma(t) = {s \in V | (s, t) \in E}. Marginal distributions: p_t(x_t) \propto \prod_{s \in \Gamma(t)} m_{st}(x_t). Messages: m_{ts}(x_s) = \sum_{x_t} \psi_{st}(x_s, x_t) \prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t). Notation: K_t is the number of discrete states for random variable x_t; p_t(x_t) is the marginal distribution over the K_t discrete states of x_t; m_{st}(x_t) is the message from node s to node t, a vector of K_t non-negative numbers; m_{ts}(x_s) is the message from node t to node s, a vector of K_s non-negative numbers.
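
A minimal numpy sketch of these updates (illustrative only, not the course's code); message implements m_{ts} and marginal implements the node marginal:

```python
import numpy as np

def message(psi_st, incoming_to_t):
    """m_ts(x_s) = sum_{x_t} psi_st(x_s, x_t) * product of messages into t from neighbors other than s.
    psi_st: (K_s, K_t) pairwise potential; incoming_to_t: list of length-K_t message vectors m_ut."""
    prod = np.ones(psi_st.shape[1])
    for m_ut in incoming_to_t:
        prod *= m_ut
    m_ts = psi_st @ prod
    return m_ts / m_ts.sum()               # normalize for numerical stability

def marginal(incoming_messages):
    """p_t(x_t) proportional to the product of all incoming messages m_st(x_t)."""
    p = np.ones_like(incoming_messages[0])
    for m in incoming_messages:
        p *= m
    return p / p.sum()
```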

Sum-Product for Junction Trees. The junction tree defines p(z) \propto \prod_{c \in C} \psi_c(z_c) \prod_{(c,d) \in E(C)} \psi_{cd}(z_c, z_d), where z_c = {x_s | s \in c} and \psi_{cd}(z_c, z_d) = 1 if z_c(x_s) = z_d(x_s) for all s \in c \cap d, and 0 otherwise. Original undirected graphical model: nodes are discrete random variables x_s; cliques are groups of variables over which the joint distribution factorizes into potentials. Junction tree: nodes are the maximal cliques in some triangulation; node variables are all joint configurations z_c of the variables in the corresponding clique; node potentials are the potentials from the original model; edge potentials are indicator functions forcing consistency of the shared variables (separator sets). [Figure: the graph over a-f and its junction tree with cliques {a, b, c}, {b, c, e}, {b, d}, {b, e, f}.]

Sum-Product for Junction Trees (Shafer-Shenoy). Express the algorithm via the original variables x_s; messages depend on the clique intersections (separators); efficient schedules compute each message once. With separators S_{ij} = S_{ji} = C_i \cap C_j and remainders R_{ji} = C_j \setminus S_{ji}, the clique marginals are p_j(x_{C_j}) \propto \psi_{C_j}(x_{C_j}) \prod_{i \in \Gamma(j)} \mu_{ij}(x_{S_{ij}}), and the messages are \mu_{ji}(x_{S_{ji}}) \propto \sum_{x_{R_{ji}}} \psi_{C_j}(x_{C_j}) \prod_{k \in \Gamma(j) \setminus i} \mu_{kj}(x_{S_{kj}}). [Figure: neighboring cliques C_i and C_j exchanging messages \mu_{ij} and \mu_{ji} through separator S_{ij}; COLLECT and DISTRIBUTE schedules on the example junction tree.]

Sum-Product for Junction Trees (Shafer-Shenoy). Express the algorithm via the original variables x_s; messages depend on the clique intersections (separators); efficient schedules compute each message once. Storage and computational cost: O(\sum_j \prod_{s \in C_j} K_s), where x_s \in {1, ..., K_s}, i.e., exponential in the sizes of the maximal cliques. The clique marginal and message updates are as on the previous slide: p_j(x_{C_j}) \propto \psi_{C_j}(x_{C_j}) \prod_{i \in \Gamma(j)} \mu_{ij}(x_{S_{ij}}) and \mu_{ji}(x_{S_{ji}}) \propto \sum_{x_{R_{ji}}} \psi_{C_j}(x_{C_j}) \prod_{k \in \Gamma(j) \setminus i} \mu_{kj}(x_{S_{kj}}).
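
As a sketch of a single Shafer-Shenoy update (the representation is assumed here, not taken from the course: potentials and messages stored as dicts keyed by tuples of binary values, ordered by a fixed variable list):

```python
# One Shafer-Shenoy message mu_ji(x_S) from clique C_j toward neighbor C_i:
# multiply the clique potential by incoming messages from the other neighbors,
# then sum out the remainder variables R_ji = C_j \ S_ji.
from itertools import product

def shafer_shenoy_message(clique_vars, clique_pot, sep_vars, incoming):
    """clique_vars: tuple of variables in C_j; clique_pot: dict assignment -> value;
    sep_vars: tuple S_ji = C_i & C_j; incoming: list of (vars_k, message_dict) for k != i."""
    msg = {}
    for assign in product((0, 1), repeat=len(clique_vars)):    # binary variables for brevity
        value = clique_pot[assign]
        lookup = dict(zip(clique_vars, assign))
        for vars_k, m_k in incoming:
            value *= m_k[tuple(lookup[v] for v in vars_k)]      # message on separator S_kj
        key = tuple(lookup[v] for v in sep_vars)
        msg[key] = msg.get(key, 0.0) + value                    # marginalize out R_ji
    total = sum(msg.values())
    return {k: v / total for k, v in msg.items()}
```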

Summary: Junction Tree Algorithm. [Figure: the example graph over a-f, its junction tree with cliques {a, b, c}, {b, c, e}, {b, d}, {b, e, f}, the COLLECT and DISTRIBUTE schedules, and the clique marginals p_j(x_{C_j}) \propto \psi_{C_j}(x_{C_j}) \prod_{i \in \Gamma(j)} \mu_{ij}(x_{S_{ij}}) with S_{ij} = S_{ji} = C_i \cap C_j.] Junction tree algorithms for general-purpose inference: 1. If necessary, convert the graphical model to undirected form (linear in graph size). 2. Triangulate the target undirected graph; any elimination ordering generates a valid triangulation (linear in graph size), but finding an optimal triangulation, with minimal cliques, is NP-hard. 3. Arrange the triangulated cliques into a junction tree (at worst quadratic in graph size). 4. Execute the sum-product algorithm on the junction tree (exponential in clique size).

CS242: Lecture 2B Outline: Elimination Algorithms & Clique Trees; Junction Tree Algorithm; Loopy Belief Propagation.

Inference for Graphs with Cycles. For graphs with cycles, the dynamic programming derivation of BP breaks down. Junction tree algorithm: cluster nodes to break cycles and run BP on the tree of clusters; exact, but may be intractable. Loopy belief propagation: iterate local BP message updates on the cyclic graph and hope the beliefs converge; empirically, often very effective. Coming later: justification as a variational method.

Sum-Product Belief Propagation (factor graphs). Set of neighbors of variable node s: \Gamma(s) = {f \in F | s \in f}, where each factor f links a clique of variables. Variable-to-factor messages: m_{sf}(x_s) = \prod_{g \in \Gamma(s) \setminus f} m_{gs}(x_s) \propto p_s(x_s) / m_{fs}(x_s). Factor-to-variable messages: m_{fs}(x_s) = \sum_{x_{f \setminus s}} \psi_f(x_f) \prod_{t \in f \setminus s} m_{tf}(x_t). Marginal distribution of each variable: p_s(x_s) \propto \prod_{f \in \Gamma(s)} m_{fs}(x_s). Marginal distribution of each factor (the clique of variables linked by that factor): p_f(x_f) \propto \psi_f(x_f) \prod_{s \in f} m_{sf}(x_s). Loopy BP: these message updates can be iteratively computed on graphs with cycles, but the resulting marginals are in general only approximate.
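
The full loopy iteration simply cycles these two update equations until the messages (hopefully) converge. A compact numpy sketch under illustrative assumptions (flooding schedule, fixed iteration count, factors given as dense tables; not the course's implementation):

```python
import numpy as np

def loopy_bp(num_states, factors, iters=50):
    """num_states: dict variable -> K; factors: list of (vars_tuple, table) with
    table.shape matching the listed variables. Returns approximate marginals."""
    msg = {}
    for f, (vs, _) in enumerate(factors):
        for s in vs:
            msg[('f', f, s)] = np.ones(num_states[s]) / num_states[s]   # factor -> variable
            msg[('v', s, f)] = np.ones(num_states[s]) / num_states[s]   # variable -> factor

    for _ in range(iters):
        for f, (vs, table) in enumerate(factors):
            for s in vs:
                # Factor-to-variable: multiply in the other variables' messages, sum them out.
                t = table.astype(float)
                for axis, u in enumerate(vs):
                    if u != s:
                        shape = [1] * table.ndim
                        shape[axis] = -1
                        t = t * msg[('v', u, f)].reshape(shape)
                m = t.sum(axis=tuple(a for a, u in enumerate(vs) if u != s))
                msg[('f', f, s)] = m / m.sum()
        for f, (vs, _) in enumerate(factors):
            for s in vs:
                # Variable-to-factor: product of incoming messages from the other factors.
                m = np.ones(num_states[s])
                for g, (ws, _) in enumerate(factors):
                    if g != f and s in ws:
                        m = m * msg[('f', g, s)]
                msg[('v', s, f)] = m / m.sum()

    marginals = {}
    for s, k in num_states.items():
        p = np.ones(k)
        for f, (vs, _) in enumerate(factors):
            if s in vs:
                p = p * msg[('f', f, s)]
        marginals[s] = p / p.sum()
    return marginals

# Toy cyclic model: a 3-cycle of attractive pairwise factors plus one unary factor.
attract = np.array([[2.0, 1.0], [1.0, 2.0]])
factors = [((0, 1), attract), ((1, 2), attract), ((0, 2), attract), ((0,), np.array([3.0, 1.0]))]
print(loopy_bp({0: 2, 1: 2, 2: 2}, factors))
```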

Low Density Parity Check (LDPC) Codes. [Figure: factor graph with parity check factors and evidence (observation) factors linking data bits and parity bits.] Each variable node is binary, so x_s \in {0, 1}. Parity check factors equal 1 if the sum of the connected bits is even and 0 if the sum is odd, so invalid codewords are excluded. Unary evidence factors equal the probability that each bit is a 0 or 1 given the observed data, assuming independent noise on each bit. [Figure: loopy BP decoding of a noisy codeword.]
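
A minimal sketch of the two factor types just described (the bit-flip probability is an illustrative assumption, not a value from the lecture):

```python
# Parity check factor: 1 if the connected bits have even parity, else 0.
def parity_check(bits):
    return 1.0 if sum(bits) % 2 == 0 else 0.0

# Unary evidence factor: [P(x=0 | observation), P(x=1 | observation)] under
# independent bit-flip noise with an assumed flip probability.
def evidence_factor(observed_bit, flip_prob=0.1):
    if observed_bit == 0:
        return [1.0 - flip_prob, flip_prob]
    return [flip_prob, 1.0 - flip_prob]

print(parity_check([1, 0, 1, 0]))   # 1.0: even parity, valid check
print(parity_check([1, 1, 1, 0]))   # 0.0: odd parity, codeword excluded
```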

Loopy Belief Propagation. Frey & MacKay, "A Revolution: Belief Propagation in Graphs with Cycles," NIPS 1997: https://papers.nips.cc/paper/1467-a-revolution-belief-propagation-in-graphs-with-cycles. Brendan J. Frey (http://www.cs.utoronto.ca/~frey), Department of Computer Science, University of Toronto; David J. C. MacKay (http://wol.ra.phy.cam.ac.uk/mackay), Department of Physics, Cavendish Laboratory, Cambridge University. UAI 1999: https://arxiv.org/abs/1301.6725