Expectation Propagation

Expectation Propagation Erik Sudderth 6.975 Week 11 Presentation November 20, 2002

Introduction
Goal: Efficiently approximate intractable distributions.
Features of Expectation Propagation (EP):
- Deterministic, iterative method for computing approximate posterior distributions
- Approximating distribution may be selected from any exponential family
- Framework for extending loopy Belief Propagation (BP):
- Structured approximations for greater accuracy
- Inference for continuous non-Gaussian models

Outline
- Background: graphical models, exponential families
- Expectation Propagation (EP): assumed density filtering; EP for unstructured exponential families
- Connections to Belief Propagation: BP as a fully factorized EP approximation; free energy interpretations
- Continuous non-Gaussian models
- Structured EP approximations

Clutter Problem
(Figure: graphical model with hidden mean x and observations y_1, y_2, ..., y_n.) There are n independent observations from a Gaussian distribution of unknown mean x, embedded in a sea of clutter; the posterior is a mixture of 2^n Gaussians.
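
A minimal Python sketch of this model may make it concrete. The specific parameter values below (clutter weight w = 0.5, clutter variance 10, prior variance 100) are assumptions in the spirit of Minka's formulation, not values given in this talk; for small n the exact posterior can be built by enumerating all 2^n signal/clutter assignments.

```python
# Exact posterior for the 1-D clutter problem on a grid, by enumerating
# every signal/clutter assignment of the observations. Parameters are
# assumed (w = 0.5, clutter N(0, 10), prior N(0, 100)), not from the talk.
import itertools
import numpy as np
from scipy.stats import norm

w, clutter_var, prior_var = 0.5, 10.0, 100.0

def exact_posterior(y, xs):
    """p(x | y) on the grid xs: a mixture of 2**len(y) Gaussians."""
    post = np.zeros_like(xs)
    for labels in itertools.product([0, 1], repeat=len(y)):
        term = norm.pdf(xs, 0.0, np.sqrt(prior_var))   # prior over x
        for yi, li in zip(y, labels):
            if li == 0:   # observation explained by the signal mean x
                term = term * (1 - w) * norm.pdf(yi, xs, 1.0)
            else:         # observation explained by the clutter component
                term = term * w * norm.pdf(yi, 0.0, np.sqrt(clutter_var))
        post += term
    return post / np.trapz(post, xs)

y = np.array([1.8, 2.1, -5.0])          # n = 3 -> 8 mixture components
xs = np.linspace(-15.0, 15.0, 3001)
posterior = exact_posterior(y, xs)
```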

Graphical Models
An undirected graph is defined by a set of nodes and a set of edges connecting those nodes. Nodes are associated with random variables, and graph separation corresponds to conditional independence between the associated variables.

Markov Properties & Factorizations
Question: Which probability distributions satisfy the conditional independencies implied by a graph?
Hammersley-Clifford Theorem: assuming p(x) > 0, p is Markov with respect to the graph G if and only if it factorizes as
p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)
where Z is a normalization constant, each \psi_C is an arbitrary positive clique potential function, and \mathcal{C} is the set of all cliques of G. Cliques are fully connected subsets of the nodes of G.

Exponential Families
p(x; \theta) = \exp\{ \theta^T \phi(x) - \Phi(\theta) \}
where \theta is the exponential (canonical) parameter vector, \phi(x) is the potential (sufficient statistic) function, and \Phi(\theta) is the log partition function (normalization). Examples: Gaussian, Poisson, discrete multinomial, and factorized versions of these models.
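
To make the notation concrete, here is a small sketch of a 1-D Gaussian written in this canonical form, with \phi(x) = (x, x^2); the helper names are illustrative, not from the talk.

```python
# 1-D Gaussian as an exponential family:
# p(x; theta) = exp(theta . phi(x) - Phi(theta)), phi(x) = (x, x^2),
# theta = (mean / var, -1 / (2 var)).
import numpy as np

def gaussian_to_natural(mean, var):
    return np.array([mean / var, -0.5 / var])

def natural_to_gaussian(theta):
    var = -0.5 / theta[1]
    return theta[0] * var, var

def log_partition(theta):
    """Phi(theta), chosen so exp(theta . phi(x) - Phi) integrates to 1."""
    mean, var = natural_to_gaussian(theta)
    return 0.5 * mean ** 2 / var + 0.5 * np.log(2.0 * np.pi * var)
```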

Manipulation of Exponential Families
Products: p(x; \theta_1)\, p(x; \theta_2) \propto p(x; \theta_1 + \theta_2).
Quotients: p(x; \theta_1) / p(x; \theta_2) \propto p(x; \theta_1 - \theta_2), which may not preserve normalizability.
Projections: \theta^* = \arg\min_\theta KL( p(x) \,\|\, p(x; \theta) ). The optimal solution is found via moment matching: E_{\theta^*}[\phi(x)] = E_p[\phi(x)].
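
The product and quotient rules are just addition and subtraction of natural parameters, as in the helpers above. The projection step merits a sketch of its own: for a Gaussian family, KL(p || q) projection reduces to matching the mean and variance of p, shown here for a 1-D Gaussian mixture (an assumed example, not from the talk).

```python
# KL(p || q) projection of a 1-D Gaussian mixture onto a single Gaussian:
# moment matching E_q[phi(x)] = E_p[phi(x)] means matching mean and variance.
import numpy as np

def project_mixture_to_gaussian(weights, means, variances):
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    means, variances = np.asarray(means), np.asarray(variances)
    mean = np.sum(w * means)
    second_moment = np.sum(w * (variances + means ** 2))
    return mean, second_moment - mean ** 2

m, v = project_mixture_to_gaussian([0.7, 0.3], [0.0, 4.0], [1.0, 2.0])
```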

Assumed Density Filtering (ADF)
Choose an approximating exponential family. For a target p(x) \propto \prod_i m_i(x), initialize by approximating the first compatibility function: q(x) \approx m_1(x). Sequentially incorporate all other compatibilities by projecting back into the family: q_{new}(x) = \text{Proj}[\, q(x)\, m_i(x)\,]. The current best estimate of the product distribution is used to guide the incorporation of each m_i(x); this is superior to approximating each compatibility individually.

ADF for the Clutter Problem ADF is sensitive to the order in which compatibility functions are incorporated into the posterior
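
A grid-based sketch of Gaussian ADF for the clutter problem above (same assumed parameters as the earlier sketch) shows this directly: running it on y and on y[::-1] gives different final approximations.

```python
# Gaussian ADF for the clutter problem: fold in one observation at a time,
# then project back to a Gaussian by (numerical) moment matching. The grid
# stands in for closed-form updates purely for clarity.
import numpy as np
from scipy.stats import norm

def adf_clutter(y, w=0.5, clutter_var=10.0, prior_var=100.0):
    xs = np.linspace(-30.0, 30.0, 4001)
    mean, var = 0.0, prior_var                  # start from the prior
    for yi in y:
        q = norm.pdf(xs, mean, np.sqrt(var))
        tilted = q * ((1 - w) * norm.pdf(yi, xs, 1.0)
                      + w * norm.pdf(yi, 0.0, np.sqrt(clutter_var)))
        tilted /= np.trapz(tilted, xs)
        mean = np.trapz(xs * tilted, xs)        # moment matching
        var = np.trapz((xs - mean) ** 2 * tilted, xs)
    return mean, var

y = np.array([1.8, 2.1, -5.0])
print(adf_clutter(y), adf_clutter(y[::-1]))     # order changes the answer
```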

ADF as Compatibility Approximation
Standard view: sequential approximation of the posterior.
Alternate view: sequential approximation of the compatibilities themselves, \tilde{m}_i(x) \propto q_{new}(x) / q_{old}(x). The ratio of two members of the exponential family gives an exponential-form approximation to each m_i(x).

Expectation Propagation
Idea: Iterate the ADF compatibility function approximations, always using the best estimates for all but one function to improve the exponential approximation to the remaining term.
Initialization: Choose starting values for the compatibility approximations \tilde{m}_i(x), and initialize the corresponding posterior approximation q(x) \propto \prod_i \tilde{m}_i(x).

EP Iteration
1. Choose some \tilde{m}_i(x) to refine.
2. Remove the effects of \tilde{m}_i(x) from the current estimate: q^{\setminus i}(x) \propto q(x; \theta) / \tilde{m}_i(x).
3. Update the posterior approximation to q(x; \theta^*), where \theta^* = \arg\min_\theta KL( q^{\setminus i}(x)\, m_i(x) \,\|\, q(x; \theta) ).
4. Refine the exponential approximation to m_i(x) as \tilde{m}_i(x) \propto q(x; \theta^*) / q^{\setminus i}(x).

EP for the Clutter Problem
(Results shown for n = 20 and n = 200 observations.) EP generally shows quite good performance, but is not guaranteed to converge.
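
A sketch of this loop for the clutter problem, keeping each site approximation \tilde{m}_i in natural (precision, precision-times-mean) form; the model parameters are the same assumptions as before, moments are matched numerically on a grid, and no convergence safeguards are included.

```python
# EP for the clutter problem: iteratively refine Gaussian site terms.
# Natural parameters: r[i] = site precision, b[i] = site precision * mean.
import numpy as np
from scipy.stats import norm

def ep_clutter(y, w=0.5, clutter_var=10.0, prior_var=100.0, n_iters=10):
    xs = np.linspace(-30.0, 30.0, 4001)
    r, b = np.zeros(len(y)), np.zeros(len(y))
    r0, b0 = 1.0 / prior_var, 0.0               # prior natural parameters
    for _ in range(n_iters):
        for i, yi in enumerate(y):
            # Cavity: remove site i from the current posterior estimate.
            r_cav = r0 + r.sum() - r[i]
            b_cav = b0 + b.sum() - b[i]
            cavity = norm.pdf(xs, b_cav / r_cav, np.sqrt(1.0 / r_cav))
            tilted = cavity * ((1 - w) * norm.pdf(yi, xs, 1.0)
                               + w * norm.pdf(yi, 0.0, np.sqrt(clutter_var)))
            tilted /= np.trapz(tilted, xs)
            mean = np.trapz(xs * tilted, xs)
            var = np.trapz((xs - mean) ** 2 * tilted, xs)
            # New site = moment-matched posterior / cavity (natural params).
            # Site precisions may go negative; EP need not converge.
            r[i] = 1.0 / var - r_cav
            b[i] = mean / var - b_cav
    r_post, b_post = r0 + r.sum(), b0 + b.sum()
    return b_post / r_post, 1.0 / r_post        # posterior mean, variance

print(ep_clutter(np.array([1.8, 2.1, -5.0])))
```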

Relationship to Belief Propagation
BP is a special case of EP, so many results characterizing BP can be extended to EP. EP provides a mechanism for constructing improved approximations for models where BP performs poorly, and it extends local propagation methods to many models where BP is not possible (continuous non-Gaussian models). We explore the relationship for the special case of pairwise MRFs:
p(x) \propto \prod_{(s,t) \in E} \psi_{s,t}(x_s, x_t) \prod_{s \in V} \psi_s(x_s)

Belief Propagation
Combine the information from all nodes in the graph through a series of local message-passing operations. Notation: \Gamma(s) is the neighborhood of node s (adjacent nodes); m_{ts}(x_s) is the message sent from node t to node s (a sufficient statistic of t's knowledge about s).

BP Message Updates
1. Combine the incoming messages, excluding that from node s, with the local observation potential to form a distribution over x_t.
2. Propagate this distribution from node t to node s using the pairwise interaction potential \psi_{s,t}(x_s, x_t).
3. Integrate out the effects of x_t:
m_{ts}(x_s) \propto \int \psi_{s,t}(x_s, x_t)\, \psi_t(x_t) \prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t)\, dx_t
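
For a discrete pairwise MRF this update is a single matrix-vector product; the sketch below uses illustrative names, with a K x K edge potential psi_st[x_s, x_t].

```python
# Discrete BP message update m_{t->s}: combine node t's local potential with
# the messages from its other neighbors, multiply by the edge potential, and
# sum out x_t. Normalization keeps the recursion numerically stable.
import numpy as np

def bp_message(psi_st, psi_t, msgs_into_t_except_s):
    belief_t = psi_t.astype(float).copy()
    for m in msgs_into_t_except_s:     # messages m_{u->t}, u in Gamma(t)\{s}
        belief_t *= m
    msg = psi_st @ belief_t            # sum over x_t ("integrate out" x_t)
    return msg / msg.sum()
```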

Fully Factorized EP Approximations
q(x) = \prod_s q_s(x_s), where each q_s(x_s) can be a general discrete multinomial distribution (no restrictions other than factorization). The compatibility approximations \tilde{m}_{s,t}(x_s, x_t) lie in the same fully factorized exponential family.
Initialization: Initialize the compatibility approximations \tilde{m}_{s,t}(x_s, x_t) and each term q_s(x_s) of the factorized posterior approximation.

Factorized EP Iteration I
1. Choose some \tilde{m}_{s,t}(x_s, x_t) to refine. Because \tilde{m}_{s,t}(x_s, x_t) involves only x_s and x_t, the approximations q_u(x_u) for all other nodes are unaffected by the EP update.
2. Remove the effects of \tilde{m}_{s,t}(x_s, x_t) from the current estimates of q_s(x_s) and q_t(x_t).

Factorized EP Iteration II
3. Update the posterior approximation by determining the appropriate marginal distributions.
4. Refine the exponential approximation to m_{s,t}(x_s, x_t).
Written out, steps 3 and 4 reproduce exactly the standard BP message updates.

Bethe Free Energy
Both algorithms can be viewed as minimizing the Bethe free energy
F_{Bethe}(q) = \sum_{(s,t)} E_{q_{st}}\left[ \ln \frac{q_{st}(x_s, x_t)}{\psi_{s,t}(x_s, x_t)\, \psi_s(x_s)\, \psi_t(x_t)} \right] - \sum_s (d_s - 1)\, E_{q_s}\left[ \ln \frac{q_s(x_s)}{\psi_s(x_s)} \right]
where d_s is the degree of node s.
BP: minimize F_{Bethe} subject to marginalization constraints, \sum_{x_t} q_{st}(x_s, x_t) = q_s(x_s).
EP: minimize F_{Bethe} subject to the weaker expectation constraints, E_{q_{st}}[\phi_s(x_s)] = E_{q_s}[\phi_s(x_s)].

Implications of the Free Energy Interpretation
Fixed points: EP has a fixed point for every product distribution p(x). Stable EP fixed points must be local minima of the Bethe free energy (the converse does not hold).
Double-loop algorithms: guaranteed convergence to a local minimum of the Bethe free energy. Separate the Bethe free energy into a sum of convex and concave parts. Outer loop: bound the concave part linearly. Inner loop: solve the constrained convex minimization.

Are Double Loop Algorithms Worthwhile?

Non-Gaussian Message Passing
Choose an approximating exponential family, and modify the BP marginalization step to perform moment matching: construct the best local exponential approximation to each message.
Example: switching linear dynamical systems, with a discrete system mode and conditionally Gaussian dynamics and observations. Exact posterior: a mixture of exponentially many Gaussians. EP approximation: a single Gaussian for each discrete state.
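
A sketch of the modified marginalization step: the exact outgoing message is formed on a grid and then collapsed to a Gaussian by moment matching. The function and argument names here are illustrative assumptions, not from the talk.

```python
# Non-Gaussian message passing: compute the exact message numerically, then
# project it onto a Gaussian (the "best local exponential approximation").
import numpy as np

def moment_matched_message(xs, belief_t, pairwise):
    """pairwise[i, j] = psi(x_s = xs[i], x_t = xs[j]); belief_t is node t's
    combined potential on the grid xs. Returns (mean, var) of the Gaussian
    approximation to the exact outgoing message m_{t->s}."""
    msg = pairwise @ belief_t                  # integrate out x_t on the grid
    msg /= np.trapz(msg, xs)
    mean = np.trapz(xs * msg, xs)
    var = np.trapz((xs - mean) ** 2 * msg, xs)
    return mean, var
```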

Structured EP Approximations
(Figure: the original graph, the fully factorized EP approximation (belief propagation), and a structured EP approximation.)
Wainwright: structured EP approximations must use triangulated graphs. This unifies structured EP-style approximations and region-based Kikuchi-style approximations in a common framework. Which higher-order approximation is more effective?

Open Research Directions
- For a given computational cost, what combination of substructures and/or region-based clustering produces the most accurate estimates?
- How robust and effective are EP iterations for continuous, non-Gaussian models?
- Are the posterior distributions arising in practice well modeled by exponential families?

References
T. Minka, A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, MIT, Jan. 2001.
T. Minka, Expectation Propagation for Approximate Bayesian Inference. UAI 17, pp. 362-369, 2001.
T. Heskes and O. Zoeter, Expectation Propagation for Approximate Inference in Dynamic Bayesian Networks. UAI 18, pp. 216-223, 2002.
M. Wainwright, Stochastic Processes on Graphs with Cycles: Geometric and Variational Approaches. PhD thesis, MIT, Jan. 2002.