Mean Field and Variational Methods finishing off


Readings: K&F: 10.1, 10.5

Mean Field and Variational Methods (finishing off)
Graphical Models 10-708
Carlos Guestrin, Carnegie Mellon University
November 5th, 2008

What you need to know so far
- Goal: find an efficient distribution $Q$ that is close to the posterior $P$.
- Distance: measured in terms of KL divergence.
- Asymmetry of KL: $D(p \| q) \neq D(q \| p)$.
- Computing the "right" KL, $D(p \| q)$, is intractable, so we use the reverse KL, $D(q \| p)$.

Reverse KL & The Partition Function
- Back to the general case: consider again the definition of $D(q \| p)$, where $p$ is a Markov net $P_F$ with factors $\phi_j$ and partition function $Z$.
- Theorem: $\ln Z = F[P_F, Q] + D(Q \| P_F)$, where the energy functional is
  $F[P_F, Q] = \sum_j E_Q[\ln \phi_j(U_j)] + H_Q(X)$.
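As a concrete illustration of the asymmetry, here is a minimal numeric sketch (not from the lecture; the two distributions are made up) comparing $D(p \| q)$ and $D(q \| p)$:

```python
import numpy as np

def kl(p, q):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two arbitrary distributions over three states (values made up).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.4, 0.3])

print(kl(p, q))  # ~0.345
print(kl(q, p))  # ~0.353 -- in general D(p||q) != D(q||p)
```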

Understanding Reverse KL, Energy Functional & The Partition Function
- Maximizing the energy functional $\Leftrightarrow$ minimizing the reverse KL.
- Theorem: the energy functional is a lower bound on the log partition function: $F[P_F, Q] \leq \ln Z$.
- Maximizing the energy functional corresponds to searching for a tight lower bound on the partition function.

Structured Variational Approximate Inference
- Pick a family of distributions $\mathcal{Q}$ that allows exact inference, e.g., fully factorized (mean field): $Q(X) = \prod_i Q_i(X_i)$.
- Find $Q \in \mathcal{Q}$ that maximizes $F[P_F, Q]$.
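A hedged sketch (not from the lecture; the potentials and $Q$ are made up) that checks the lower-bound theorem by brute force on a tiny pairwise Markov net:

```python
import itertools
import numpy as np

# A tiny pairwise Markov net over three binary variables X0 - X1 - X2,
# with made-up edge potentials (illustrative values only).
phi01 = np.array([[2.0, 1.0], [1.0, 3.0]])  # phi(x0, x1)
phi12 = np.array([[1.5, 0.5], [0.5, 2.5]])  # phi(x1, x2)

def log_ptilde(x):
    x0, x1, x2 = x
    return np.log(phi01[x0, x1]) + np.log(phi12[x1, x2])

# Exact log partition function by brute-force enumeration.
logZ = np.log(sum(np.exp(log_ptilde(x))
                  for x in itertools.product([0, 1], repeat=3)))

# A fully factorized (mean field) Q, chosen arbitrarily.
Q = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]

def q_prob(x):
    return np.prod([Q[i][xi] for i, xi in enumerate(x)])

# Energy functional F[P_F, Q] = E_Q[ln P~_F(X)] + H_Q(X).
F = sum(q_prob(x) * (log_ptilde(x) - np.log(q_prob(x)))
        for x in itertools.product([0, 1], repeat=3))

print(F, "<=", logZ)  # F lower-bounds ln Z; the gap equals D(Q || P_F)
```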

Optimization for mean field
- Constrained optimization (each $Q_i$ must sum to one), solved via Lagrange multipliers: $\exists \lambda_i$ such that the optimization is equivalent to maximizing $F[P_F, Q] + \sum_i \lambda_i \left( \sum_{x_i} Q_i(x_i) - 1 \right)$.
- Take the derivative, set it to zero.
- Theorem: $Q$ is a stationary point of the mean field approximation iff for each $i$:
  $Q_i(x_i) = \frac{1}{Z_i} \exp\left\{ E_{X_{-i} \sim Q}[\ln \tilde{P}_F(x_i, X_{-i})] \right\}$

Understanding the fixed point equation

Simplifying the fixed point equation
- $Q_i$ only needs to consider the factors that intersect $X_i$.
- Theorem: the fixed point
  $Q_i(x_i) = \frac{1}{Z_i} \exp\left\{ E_{X_{-i} \sim Q}[\ln \tilde{P}_F(x_i, X_{-i})] \right\}$
  is equivalent to
  $Q_i(x_i) = \frac{1}{Z_i} \exp\left\{ \sum_{j : X_i \in Scope[\phi_j]} E_{U_j \sim Q}[\ln \phi_j(U_j, x_i)] \right\}$,
  where $Scope[\phi_j] = U_j \cup \{X_i\}$.
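The following sketch (illustrative model with made-up potentials) checks the simplification numerically: factors that don't intersect $X_i$ only add an $x_i$-independent constant inside the exponent, which cancels in the normalizer $Z_i$:

```python
import numpy as np

# Chain X0 - X1 - X2 with made-up pairwise factors.
phi01 = np.array([[2.0, 1.0], [1.0, 3.0]])
phi12 = np.array([[1.5, 0.5], [0.5, 2.5]])

# Current mean field marginals for X1 and X2.
Q1 = np.array([0.5, 0.5])
Q2 = np.array([0.3, 0.7])

# Update Q0 using ALL factors: E_Q[ln P~(x0, X1, X2)].
full = np.zeros(2)
for x0 in range(2):
    for x1 in range(2):
        for x2 in range(2):
            full[x0] += Q1[x1] * Q2[x2] * (np.log(phi01[x0, x1])
                                           + np.log(phi12[x1, x2]))

# Update Q0 using only the factor whose scope intersects X0 (phi01).
local = np.zeros(2)
for x0 in range(2):
    for x1 in range(2):
        local[x0] += Q1[x1] * np.log(phi01[x0, x1])

# After exponentiating and normalizing, both give the same Q0.
q_full = np.exp(full) / np.exp(full).sum()
q_local = np.exp(local) / np.exp(local).sum()
print(np.allclose(q_full, q_local))  # True
```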

There are many stationary points!

A very simple approach for finding one stationary point (sketched in code below):
- Initialize $Q$ (e.g., randomly or smartly).
- Set all variables to unprocessed.
- Pick an unprocessed variable $X_i$:
  - update $Q_i$
  - set variable $i$ as processed
  - if $Q_i$ changed, set the neighbors of $X_i$ to unprocessed.
- Guaranteed to converge.
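A minimal sketch of this schedule, assuming a small pairwise Markov net; the graph, potentials, and all names here are illustrative, not from the lecture:

```python
import numpy as np

# Pairwise MN as an edge -> potential dict; a 3-cycle with made-up tables.
edges = {(0, 1): np.array([[2.0, 1.0], [1.0, 3.0]]),
         (1, 2): np.array([[1.5, 0.5], [0.5, 2.5]]),
         (0, 2): np.array([[1.0, 2.0], [2.0, 1.0]])}
n = 3

def neighbors(i):
    return [j for e in edges if i in e for j in e if j != i]

def update(i, Q):
    """Mean field fixed point: Q_i(x_i) is proportional to
    exp{ sum over factors intersecting X_i of E_Q[ln phi | x_i] }."""
    logq = np.zeros(2)
    for (a, b), phi in edges.items():
        if i == a:
            logq += np.log(phi) @ Q[b]   # sum_xb Q_b(xb) ln phi(x_i, xb)
        elif i == b:
            logq += Q[a] @ np.log(phi)   # sum_xa Q_a(xa) ln phi(xa, x_i)
    q = np.exp(logq - logq.max())        # shift for numerical stability
    return q / q.sum()

Q = [np.full(2, 0.5) for _ in range(n)]  # initialize (here: uniformly)
unprocessed = set(range(n))
while unprocessed:
    i = unprocessed.pop()                # pick an unprocessed variable
    new_qi = update(i, Q)
    if not np.allclose(new_qi, Q[i]):    # if Q_i changed,
        unprocessed.update(neighbors(i)) # reprocess its neighbors
    Q[i] = new_qi

print([q.round(3) for q in Q])
```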

More general structured approximations
- Mean field is a very naive approximation.
- Consider a more general form for $Q$; assumption: exact inference is doable over $Q$.
- Theorem: the stationary points of the energy functional satisfy a very similar update rule for the potentials $\psi_j$ of $Q$.

Computing the update rule for the general case
- Consider one potential $\phi$ at a time.

Structured variational update requires inference
- Compute the marginals with respect to $Q$ of the cliques in the original graph and the cliques in the new graph, for all cliques.
- What is a good way of computing all these marginals?
- Potential updates:
  - sequential: compute marginals, update one $\psi_j$, recompute marginals
  - parallel: compute marginals, update all $\psi$'s, recompute marginals

What you need to know about variational methods
- Structured variational method: select a form for the approximate distribution, then minimize the reverse KL.
- Equivalent to maximizing the energy functional, i.e., searching for a tight lower bound on the partition function.
- Many possible models for $Q$: independent (mean field), structured as a Markov net, cluster variational.
- Several subtleties are outlined in the book.

Readings: K&F: 10.2, 10.3

Loopy Belief Propagation
Graphical Models 10-708
Carlos Guestrin, Carnegie Mellon University
November 5th, 2008

Recall message passing over junction trees
- Exact inference: generate a junction tree, then pass messages between neighbors.
- Inference is exponential in the size of the largest clique.
- (Figure: a junction tree with cliques CD, DIG, GSI, GJSL, HGJ.)

Belief Propagation on Tree Pairwise Markov Nets
- A tree pairwise Markov net is a tree!!! No need to create a junction tree.
- Message passing (see the sketch below):
  $m_{i \to j}(x_j) = \sum_{x_i} \phi(x_i)\, \phi(x_i, x_j) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i)$,
  where $N(i)$ are the neighbors of $i$ in the pairwise MN.
- Theorem: converges to the true probabilities:
  $P(x_i) \propto \phi(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)$.

Loopy Belief Propagation on Pairwise Markov Nets
- What if we apply BP in a graph with loops? Send messages between pairs of nodes in the graph, and hope for the best.
- What happens?
  - evidence goes around the loops multiple times
  - may not converge
  - if it converges, it is usually overconfident about probability values
- But it often gives you reasonable, or at least useful, answers, especially if you just care about the MPE rather than the actual probabilities.
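Picking up the tree case above: a minimal sketch of the message update and belief computation on a 3-node chain (a tree), with made-up potentials, verified against brute-force marginals:

```python
import itertools
import numpy as np

# Chain X0 - X1 - X2 (a tree), with made-up node and edge potentials.
node = [np.array([1.0, 2.0]), np.array([1.0, 1.0]), np.array([3.0, 1.0])]
edge = {(0, 1): np.array([[2.0, 1.0], [1.0, 3.0]]),
        (1, 2): np.array([[1.5, 0.5], [0.5, 2.5]])}

def pot(i, j):  # edge potential oriented as phi(x_i, x_j)
    return edge[(i, j)] if (i, j) in edge else edge[(j, i)].T

nbrs = {0: [1], 1: [0, 2], 2: [1]}
msg = {}  # msg[(i, j)][x_j] = m_{i->j}(x_j)

# Pass messages leaves-to-root, then root-to-leaves.
for (i, j) in [(0, 1), (2, 1), (1, 0), (1, 2)]:
    m = np.zeros(2)
    for xj in range(2):
        for xi in range(2):
            inc = np.prod([msg[(k, i)][xi] for k in nbrs[i] if k != j])
            m[xj] += node[i][xi] * pot(i, j)[xi, xj] * inc
    msg[(i, j)] = m

# Beliefs: P(x_i) proportional to phi(x_i) * product of incoming messages.
for i in range(3):
    b = node[i] * np.prod([msg[(k, i)] for k in nbrs[i]], axis=0)
    print(i, (b / b.sum()).round(4))

# Brute-force check of one marginal.
joint = np.zeros((2, 2, 2))
for x in itertools.product([0, 1], repeat=3):
    joint[x] = node[0][x[0]] * node[1][x[1]] * node[2][x[2]] \
               * edge[(0, 1)][x[0], x[1]] * edge[(1, 2)][x[1], x[2]]
joint /= joint.sum()
print(joint.sum(axis=(1, 2)).round(4))  # marginal of X0, matches belief 0
```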

More details on Loopy BP
- Numerical problem: messages $< 1$ get multiplied together as we go around the loops, so the numbers can go to zero.
- Normalize messages to sum to one:
  $m_{i \to j}(x_j) = \frac{1}{Z_{i \to j}} \sum_{x_i} \phi(x_i)\, \phi(x_i, x_j) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i)$.
  $Z_{i \to j}$ doesn't depend on $X_j$, so it doesn't change the answer.
- Computing node beliefs (estimates of the probabilities):
  $b_i(x_i) \propto \phi(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)$.

An example of running loopy BP
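A hedged sketch of loopy BP with normalized messages on a 3-cycle, the smallest loopy graph; the potentials are made up for illustration, and the brute-force comparison shows the overconfidence mentioned above:

```python
import itertools
import numpy as np

# A 3-cycle (loopy) pairwise MN: attractive edges plus a bias on X0.
# All potential values are made up for illustration.
node = [np.array([2.0, 1.0]), np.array([1.0, 1.0]), np.array([1.0, 1.0])]
edge = {(0, 1): np.array([[2.0, 1.0], [1.0, 2.0]]),
        (1, 2): np.array([[2.0, 1.0], [1.0, 2.0]]),
        (0, 2): np.array([[2.0, 1.0], [1.0, 2.0]])}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def pot(i, j):
    return edge[(i, j)] if (i, j) in edge else edge[(j, i)].T

msg = {(i, j): np.full(2, 0.5) for i in nbrs for j in nbrs[i]}
for _ in range(50):  # evidence circulates around the loop multiple times
    new = {}
    for (i, j) in msg:
        m = np.zeros(2)
        for xj in range(2):
            for xi in range(2):
                inc = np.prod([msg[(k, i)][xi] for k in nbrs[i] if k != j])
                m[xj] += node[i][xi] * pot(i, j)[xi, xj] * inc
        new[(i, j)] = m / m.sum()  # normalize; Z_{i->j} doesn't depend on x_j
    msg = new

# Loopy beliefs vs. exact marginals (loopy beliefs tend to be overconfident).
for i in nbrs:
    b = node[i] * np.prod([msg[(k, i)] for k in nbrs[i]], axis=0)
    print("belief", i, (b / b.sum()).round(4))
joint = np.zeros((2, 2, 2))
for x in itertools.product([0, 1], repeat=3):
    joint[x] = np.prod([node[i][x[i]] for i in range(3)]) * \
               np.prod([edge[e][x[e[0]], x[e[1]]] for e in edge])
joint /= joint.sum()
for i in range(3):
    print("exact ", i,
          joint.sum(axis=tuple(a for a in range(3) if a != i)).round(4))
```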

Convergence
- If you have tried to send all messages and the beliefs haven't changed (by much), you have converged.

(Non-)Convergence of Loopy BP
- Loopy BP can oscillate!!! The oscillations can be small, or they can be really bad!
- Typically: if the factors are closer to uniform, loopy BP does well (converges); if the factors are closer to deterministic, loopy BP doesn't behave well.
- One approach to help: damping messages, where the new message is an average of the old message and the newly computed one (see the sketch below).
  - often gives better convergence
  - but when damping is required to get convergence, the result is often bad
- (Graphs from Murphy et al. '99.)
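A tiny sketch of the damped update; $\alpha = 0.5$ reproduces the "average old and new" rule from the slide, and the oscillating incoming messages below are contrived purely for illustration:

```python
import numpy as np

def damped(m_old, m_new, alpha=0.5):
    """Damped update: convex combination of old and newly computed message.
    alpha = 0.5 is the 'average old and new' rule from the slide."""
    m = (1 - alpha) * np.asarray(m_old) + alpha * np.asarray(m_new)
    return m / m.sum()

# Contrived example: the newly computed message flips back and forth
# (an oscillation); damping shrinks the swing between iterations.
m = np.array([0.9, 0.1])
for t in range(6):
    m_raw = np.array([0.1, 0.9]) if t % 2 == 0 else np.array([0.9, 0.1])
    m = damped(m, m_raw)
    print(m.round(3))
```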

Loopy BP in Factor Graphs
- What if we don't have pairwise Markov nets? Two options:
  1. Transform to a pairwise MN.
  2. Use loopy BP on a factor graph.
- (Figure: a factor graph with variable nodes A, B, C, D, E and factor nodes ABC, ABD, BDE, CDE.)
- From node $i$ to factor $j$:
  $m_{i \to j}(x_i) = \prod_{k \in F(i) \setminus \{j\}} m_{k \to i}(x_i)$,
  where $F(i)$ are the factors whose scope includes $X_i$.
- From factor $j$ to node $i$:
  $m_{j \to i}(x_i) = \sum_{\mathbf{y}} \phi_j(\mathbf{y}, x_i) \prod_{X_k \in \mathbf{Y}} m_{k \to j}(x_k)$,
  where $Scope[\phi_j] = \mathbf{Y} \cup \{X_i\}$.
- Beliefs (a runnable sketch follows):
  - Node: $b_i(x_i) \propto \prod_{j \in F(i)} m_{j \to i}(x_i)$
  - Factor: $b_j(\mathbf{y}, x_i) \propto \phi_j(\mathbf{y}, x_i) \prod_{X_k \in Scope[\phi_j]} m_{k \to j}(x_k)$
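A minimal sketch of both message types on a tiny factor graph with made-up tables; the variable indices and factor names are illustrative (the graph here is a tree, so the messages converge to the exact marginals):

```python
import itertools
import numpy as np

# Tiny factor graph: binary variables 0, 1, 2; factors with made-up tables.
factors = {"f01": ((0, 1), np.array([[2.0, 1.0], [1.0, 3.0]])),
           "f12": ((1, 2), np.array([[1.5, 0.5], [0.5, 2.5]]))}
F = {i: [f for f, (scope, _) in factors.items() if i in scope]
     for i in range(3)}

# Messages, all initialized to uniform.
n2f = {(i, f): np.full(2, 0.5) for i in F for f in F[i]}        # node -> factor
f2n = {(f, i): np.full(2, 0.5)
       for f, (s, _) in factors.items() for i in s}             # factor -> node

for _ in range(10):
    # Node i -> factor j: product of messages from i's OTHER factors.
    for (i, f) in n2f:
        m = np.prod([f2n[(g, i)] for g in F[i] if g != f], axis=0) \
            if len(F[i]) > 1 else np.ones(2)
        n2f[(i, f)] = m / m.sum()
    # Factor j -> node i: sum out the factor's other variables.
    for (f, i) in f2n:
        scope, table = factors[f]
        m = np.zeros(2)
        for assign in itertools.product([0, 1], repeat=len(scope)):
            xi = assign[scope.index(i)]
            others = np.prod([n2f[(k, f)][assign[scope.index(k)]]
                              for k in scope if k != i])
            m[xi] += table[assign] * others
        f2n[(f, i)] = m / m.sum()

# Node beliefs: product of incoming factor messages.
for i in range(3):
    b = np.prod([f2n[(f, i)] for f in F[i]], axis=0)
    print(i, (b / b.sum()).round(4))
```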

What you need to know about loopy BP
- It is an application of belief propagation in loopy graphs.
- It doesn't always converge:
  - damping can help
  - good message schedules can help (see the book)
- If it converges, it often converges to incorrect, but useful, results.
- It generalizes beyond pairwise Markov networks by using factor graphs.