PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Bayesian Networks Directed Acyclic Graph (DAG)

Bayesian Networks General Factorization: the joint distribution defined by the graph is p(x) = ∏_{k=1}^{K} p(x_k | pa_k), where pa_k denotes the set of parents of node x_k.
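Since the factorization is just a product of local conditionals, evaluating p(x) for a full assignment takes a few lines of code. A minimal sketch, assuming a toy table representation (dicts keyed by tuples of parent values); all names here are illustrative, not from the slides:

```python
# Evaluate p(x) = prod_k p(x_k | pa_k) for one full assignment.
def joint(assignment, cpts, parents):
    """assignment: dict var -> value; cpts: dict var -> {parent-values: table};
    parents: dict var -> list of parent names."""
    p = 1.0
    for var, cpt in cpts.items():
        pa_vals = tuple(assignment[q] for q in parents.get(var, []))
        p *= cpt[pa_vals][assignment[var]]
    return p

# Toy network x -> y with binary variables.
cpts = {"x": {(): [0.4, 0.6]},
        "y": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]}}
print(joint({"x": 1, "y": 1}, cpts, {"y": ["x"]}))   # 0.6 * 0.8 = 0.48
```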

Bayesian Curve Fitting (1) Polynomial

Bayesian Curve Fitting (2) Plate

Bayesian Curve Fitting (3) Input variables and explicit hyperparameters

Bayesian Curve Fitting Learning: condition on the observed data.

Bayesian Curve Fitting Prediction. Predictive distribution: p(t̂ | x̂, x, t) = ∫ p(t̂ | x̂, w) p(w | x, t) dw, where x, t are the training inputs and targets and x̂ is the new input.
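For concreteness, a compact sketch of this predictive distribution under the standard PRML assumptions (polynomial basis, Gaussian prior w ~ N(0, alpha^-1 I), Gaussian noise with precision beta); the hyperparameter values below are illustrative:

```python
import numpy as np

def posterior(x, t, M=3, alpha=2.0, beta=25.0):
    Phi = np.vander(x, M + 1, increasing=True)        # polynomial design matrix
    S = np.linalg.inv(alpha * np.eye(M + 1) + beta * Phi.T @ Phi)
    m = beta * S @ Phi.T @ t                          # posterior mean of w
    return m, S

def predict(x_new, m, S, beta=25.0):
    phi = np.vander(np.atleast_1d(x_new), len(m), increasing=True)
    mean = phi @ m                                    # predictive mean
    var = 1 / beta + np.sum(phi @ S * phi, axis=1)    # predictive variance
    return mean, var

x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).standard_normal(10)
m, S = posterior(x, t)
print(predict(0.5, m, S))
```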

Generative Models Causal process for generating images

Discrete Variables (1) General joint distribution: K^2 - 1 parameters. Independent joint distribution: 2(K - 1) parameters.

Discrete Variables (2) General joint distribution over M variables: K^M - 1 parameters. M-node Markov chain: K - 1 + (M - 1)K(K - 1) parameters.
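A quick numerical check of these counts (the values of K and M are chosen arbitrarily):

```python
# Parameter counts for M discrete K-state variables.
K, M = 3, 5
print(K**M - 1)                           # general joint: 242
print((K - 1) + (M - 1) * K * (K - 1))    # M-node Markov chain: 26
print(M * (K - 1))                        # fully independent model: 10
```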

Discrete Variables: Bayesian Parameters (1)

Discrete Variables: Bayesian Parameters (2) Shared prior

Parameterized Conditional Distributions. If x_1, ..., x_M are discrete, K-state variables, then in general p(y = 1 | x_1, ..., x_M) has O(K^M) parameters. The parameterized form p(y = 1 | x_1, ..., x_M) = σ(w_0 + Σ_i w_i x_i) requires only M + 1 parameters.
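A sketch of such a parameterized conditional, assuming the logistic-sigmoid form over binary parents; the weights are illustrative:

```python
import numpy as np

def p_y_given_x(x, w, w0):
    """p(y=1 | x) = sigma(w0 + w.x): M + 1 parameters instead of O(K^M)."""
    return 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, x))))

x = np.array([1.0, 0.0, 1.0])    # M = 3 binary parent values
print(p_y_given_x(x, np.array([2.0, -1.0, 0.5]), -1.0))
```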

Linear-Gaussian Models. Directed graph in which each node is Gaussian, with mean a linear function of its parents: p(x_i | pa_i) = N(x_i | Σ_{j ∈ pa_i} w_{ij} x_j + b_i, v_i). The joint over all nodes is then a multivariate Gaussian; the same construction extends to vector-valued Gaussian nodes.

Conditional Independence. a is independent of b given c: p(a | b, c) = p(a | c). Equivalently, p(a, b | c) = p(a | c) p(b | c). Notation: a ⊥⊥ b | c.

Conditional Independence: Example 1

Conditional Independence: Example 1

Conditional Independence: Example 2

Conditional Independence: Example 2

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c unobserved.

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c observed.

Am I out of fuel? B = Battery (0 = flat, 1 = fully charged), F = Fuel Tank (0 = empty, 1 = full), G = Fuel Gauge Reading (0 = empty, 1 = full), and hence the joint factorizes as p(B, F, G) = p(B) p(F) p(G | B, F).

Am I out of fuel? The probability of an empty tank is increased by observing G = 0.

Am I out of fuel? The probability of an empty tank is reduced by additionally observing B = 0. This is referred to as explaining away.
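The effect can be verified by brute-force enumeration. The sketch below assumes conditional probability tables matching Bishop's fuel example (p(B=1) = p(F=1) = 0.9, and a gauge that usually reads full only when both battery and tank are OK); with different tables the qualitative behaviour is the same:

```python
import itertools

pB = {0: 0.1, 1: 0.9}
pF = {0: 0.1, 1: 0.9}
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}   # p(G=1 | B, F)

def p(**ev):
    """Probability that B, F, G take the values fixed in ev (others summed out)."""
    total = 0.0
    for B, F, G in itertools.product([0, 1], repeat=3):
        vals = {"B": B, "F": F, "G": G}
        if any(vals[k] != v for k, v in ev.items()):
            continue
        pg = pG1[(B, F)] if G == 1 else 1 - pG1[(B, F)]
        total += pB[B] * pF[F] * pg
    return total

print(p(F=0))                           # prior p(F=0) = 0.100
print(p(F=0, G=0) / p(G=0))             # p(F=0 | G=0) ~ 0.257: raised by G = 0
print(p(F=0, G=0, B=0) / p(G=0, B=0))   # p(F=0 | G=0, B=0) ~ 0.111: explained away
```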

D-separation. Let A, B, and C be non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked if it contains a node such that either (a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or (b) the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C. If all paths from A to B are blocked, A is said to be d-separated from B by C. If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥⊥ B | C.
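D-separation can also be tested algorithmically. One standard equivalent criterion (not stated on the slide) is: A is d-separated from B by C iff A and B are disconnected, once C is removed, in the moralized ancestral graph of A ∪ B ∪ C. A minimal sketch of that criterion:

```python
from collections import deque

def d_separated(parents, A, B, C):
    """parents: dict node -> list of parents; A, B, C: disjoint sets of nodes."""
    # 1. Restrict to the ancestral subgraph of A, B, C.
    relevant, stack = set(), list(A | B | C)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, []))
    # 2. Moralize: link co-parents, then drop edge directions.
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = [q for q in parents.get(n, []) if q in relevant]
        for q in ps:
            adj[n].add(q); adj[q].add(n)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. Remove the conditioning set and test reachability from A to B.
    frontier, seen = deque(A - C), set(A - C)
    while frontier:
        n = frontier.popleft()
        if n in B:
            return False
        for m in adj[n] - C:
            if m not in seen:
                seen.add(m); frontier.append(m)
    return True

# Fuel example, B -> G <- F: B and F are marginally independent,
# but become dependent once G is observed (explaining away).
par = {"G": ["B", "F"]}
print(d_separated(par, {"B"}, {"F"}, set()))   # True
print(d_separated(par, {"B"}, {"F"}, {"G"}))   # False
```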

D-separation: Example

D-separation: I.I.D. Data

Directed Graphs as Distribution Filters

The Markov Blanket p(x_i | x_{j≠i}) = p(x) / Σ_{x_i} p(x). Factors independent of x_i cancel between numerator and denominator, leaving only the conditional for x_i itself and the conditionals of its children; the Markov blanket of x_i therefore consists of its parents, children, and co-parents.

Cliques and Maximal Cliques. A clique is a subset of nodes that are all connected to one another; a maximal clique is a clique that cannot be extended by adding any further node.

Joint Distribution p(x) = (1/Z) ∏_C ψ_C(x_C), where ψ_C(x_C) is the potential over clique C and Z = Σ_x ∏_C ψ_C(x_C) is the normalization coefficient; note: for M K-state variables, Z contains K^M terms. Energies and the Boltzmann distribution: ψ_C(x_C) = exp{-E(x_C)}.
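A brute-force illustration of why Z is the expensive part: enumerating all K^M joint states of a small pairwise model (the potentials and graph below are illustrative):

```python
import itertools
import numpy as np

agree = np.array([[2.0, 1.0], [1.0, 2.0]])    # psi(x_i, x_j) = exp(-E(x_i, x_j))
edges = [(0, 1), (1, 2)]                      # 3-node chain, K = 2, M = 3
Z = sum(np.prod([agree[s[i], s[j]] for i, j in edges])
        for s in itertools.product(range(2), repeat=3))
print(Z)   # 18.0; the sum ranges over 2^3 = 8 states here, K^M in general
```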

Illustration: Image De-Noising (1) Original Image Noisy Image

Illustration: Image De-Noising (2) Markov random field over binary pixel labels x_i ∈ {-1, +1} and noisy observations y_i, with energy E(x, y) = h Σ_i x_i - β Σ_{i,j} x_i x_j - η Σ_i x_i y_i.

Illustration: Image De-Noising (3) Noisy Image Restored Image (ICM)

Illustration: Image De-Noising (4) Restored Image (ICM) Restored Image (Graph cuts)
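A minimal ICM sketch for the energy in (2); the coefficients β, η, h and the toy image are illustrative, and ICM only finds a local minimum of E (graph cuts, as on the slide, can do better):

```python
import numpy as np

def icm_denoise(y, beta=2.0, eta=1.5, h=0.0, sweeps=10):
    """y: noisy image in {-1, +1}. Coordinate-wise minimization of the Ising-style
    energy E(x, y) = h*sum_i x_i - beta*sum_{i~j} x_i x_j - eta*sum_i x_i y_i."""
    x = y.copy()
    H, W = y.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # Set x_ij to whichever of {-1, +1} gives the lower local energy.
                x[i, j] = 1 if beta * nb + eta * y[i, j] - h > 0 else -1
    return x

# Toy usage: a square on a background, with 10% of pixels flipped.
rng = np.random.default_rng(0)
clean = np.ones((16, 16), dtype=int)
clean[4:12, 4:12] = -1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)
print((icm_denoise(noisy) != clean).mean())   # error rate, typically < 0.1
```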

Converting Directed to Undirected Graphs (1)

Converting Directed to Undirected Graphs (2) When a node has more than one parent, additional links must be added between all pairs of its parents ("marrying the parents"); dropping the arrows then yields the moral graph.

Directed vs. Undirected Graphs (1)

Directed vs. Undirected Graphs (2)

Inference in Graphical Models

Inference on a Chain

For a chain of N discrete K-state variables with joint p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ... ψ_{N-1,N}(x_{N-1}, x_N), the marginal at node n factorizes into forward and backward messages, p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n), computed by the recursions μ_α(x_n) = Σ_{x_{n-1}} ψ_{n-1,n}(x_{n-1}, x_n) μ_α(x_{n-1}) and μ_β(x_n) = Σ_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) μ_β(x_{n+1}). Each link costs O(K^2), so all marginals cost O(NK^2) rather than the O(K^N) of naive summation.

Inference on a Chain To compute local marginals: compute and store all forward messages μ_α(x_n); compute and store all backward messages μ_β(x_n); compute Z at any convenient node x_m, Z = Σ_{x_m} μ_α(x_m) μ_β(x_m); compute p(x_n) = μ_α(x_n) μ_β(x_n) / Z for all variables required.
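The same recursions in code, assuming each pairwise potential is given as a K x K numpy array (a representation chosen here for illustration):

```python
import numpy as np

def chain_marginals(psis):
    """psis: list of N-1 arrays, psis[n][i, j] = psi(x_n = i, x_{n+1} = j)."""
    N, K = len(psis) + 1, psis[0].shape[0]
    alpha = [np.ones(K)]                  # forward messages mu_alpha
    for psi in psis:
        alpha.append(alpha[-1] @ psi)
    beta = [np.ones(K)]                   # backward messages mu_beta
    for psi in reversed(psis):
        beta.append(psi @ beta[-1])
    beta = beta[::-1]
    marginals = []
    for n in range(N):
        unnorm = alpha[n] * beta[n]
        marginals.append(unnorm / unnorm.sum())   # Z = unnorm.sum(), same at every node
    return marginals

# Example: 4-node chain of binary variables favouring agreement.
psi = np.array([[2.0, 1.0], [1.0, 2.0]])
for n, m in enumerate(chain_marginals([psi] * 3)):
    print(n, m)
```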

Trees. Undirected tree: exactly one path between any pair of nodes. Directed tree: a single root, with every other node having exactly one parent. Polytree: nodes may have more than one parent, but there is still only one path between any pair of nodes when arrow directions are ignored.

Factor Graphs p(x) = ∏_s f_s(x_s): a bipartite graph with a node for each variable and a node for each factor f_s, linked to the variables x_s on which that factor depends.

Factor Graphs from Directed Graphs

Factor Graphs from Undirected Graphs

The Sum-Product Algorithm (1) Objective: i. to obtain an efficient, exact inference algorithm for finding marginals; ii. in situations where several marginals are required, to allow computations to be shared efficiently. Key idea: the distributive law, ab + ac = a(b + c), which replaces three operations with two.

The Sum-Product Algorithm (2)

The Sum-Product Algorithm (3)

The Sum-Product Algorithm (4)

The Sum-Product Algorithm (5)

The Sum-Product Algorithm (6)

The Sum-Product Algorithm (7) Initialization: a leaf variable node sends μ_{x→f}(x) = 1; a leaf factor node sends μ_{f→x}(x) = f(x).

The Sum-Product Algorithm (8) To compute local marginals: pick an arbitrary node as root; compute and propagate messages from the leaf nodes to the root, storing received messages at every node; compute and propagate messages from the root to the leaf nodes, storing received messages at every node; compute the product of received messages at each node for which the marginal is required, and normalize if necessary.
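A sketch of sum-product on a tree-structured factor graph. The representation (factors as variable lists plus numpy tables, with distinct names for variables and factors) is an illustrative choice; memoized recursion reproduces the leaf-to-root, root-to-leaf schedule above:

```python
import numpy as np
from collections import defaultdict

def sum_product(factors, card):
    """factors: dict fname -> (vars, table); card: dict var -> cardinality.
    Variable and factor names must be distinct. Returns all variable marginals."""
    ne = defaultdict(list)                       # bipartite adjacency
    for f, (vs, _) in factors.items():
        for v in vs:
            ne[f].append(v); ne[v].append(f)
    msgs = {}
    def msg(src, dst):
        if (src, dst) in msgs:
            return msgs[(src, dst)]
        if src in factors:                       # factor -> variable message
            vs, table = factors[src]
            out = table.copy()
            for i, v in enumerate(vs):
                if v != dst:                     # multiply in incoming messages
                    m = msg(v, src)
                    out = out * m.reshape([-1 if j == i else 1 for j in range(len(vs))])
            axes = tuple(i for i, v in enumerate(vs) if v != dst)
            out = out.sum(axis=axes)             # sum out all variables except dst
        else:                                    # variable -> factor message
            out = np.ones(card[src])
            for f in ne[src]:
                if f != dst:
                    out = out * msg(f, src)
        msgs[(src, dst)] = out
        return out
    marginals = {}
    for v in card:                               # product of received messages
        p = np.ones(card[v])
        for f in ne[v]:
            p = p * msg(f, v)
        marginals[v] = p / p.sum()
    return marginals

# Chain x1 - f12 - x2 - f23 - x3, plus a unary factor on x1.
agree = np.array([[2.0, 1.0], [1.0, 2.0]])
factors = {"f12": (["x1", "x2"], agree),
           "f23": (["x2", "x3"], agree),
           "f1":  (["x1"], np.array([1.0, 3.0]))}
print(sum_product(factors, {"x1": 2, "x2": 2, "x3": 2}))
```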

Sum-Product: Example (1)

Sum-Product: Example (2)

Sum-Product: Example (3)

Sum-Product: Example (4)

The Max-Sum Algorithm (1) Objective: an efficient algorithm for finding i. the value x_max that maximises p(x); ii. the value of p(x_max). In general, the set of individually most probable values (the maxima of the marginals) ≠ the jointly most probable configuration.

The Max-Sum Algorithm (2) Maximizing over a chain (max-product)

The Max-Sum Algorithm (3) Generalizes to tree-structured factor graphs by exchanging maximizations with products, carrying out each maximization as close to the leaf nodes as possible.

The Max-Sum Algorithm (4) Max-Product → Max-Sum: for numerical reasons, use the logarithm; ln is monotonic, so max_x ln p(x) = ln max_x p(x), and products become sums. Again, use the distributive law: max(a + b, a + c) = a + max(b, c).

The Max-Sum Algorithm (5) Initialization (leaf nodes) Recursion

The Max-Sum Algorithm (6) Termination (root node): the maximum over the summed messages at the root gives ln p(x_max). Back-tracking: for all nodes i with l factor nodes to the root (starting at l = 0), recover the maximizing value from the argmax stored at each maximization step.

The Max-Sum Algorithm (7) Example: Markov chain
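On a chain with unary and pairwise potentials, max-sum is the Viterbi algorithm. A log-space sketch with back-tracking; the potentials below are illustrative:

```python
import numpy as np

def max_sum_chain(log_unary, log_pair):
    """log_unary: (N, K) log potentials; log_pair: (K, K) shared log pairwise."""
    N, K = log_unary.shape
    msg = np.zeros((N, K))               # msg[n, k]: best log score ending in state k
    back = np.zeros((N, K), dtype=int)   # stored argmax for back-tracking
    msg[0] = log_unary[0]
    for n in range(1, N):
        scores = msg[n - 1][:, None] + log_pair      # (prev state, next state)
        back[n] = scores.argmax(axis=0)
        msg[n] = scores.max(axis=0) + log_unary[n]
    x = np.zeros(N, dtype=int)           # termination at the root, then back-track
    x[-1] = msg[-1].argmax()
    for n in range(N - 1, 0, -1):
        x[n - 1] = back[n, x[n]]
    return x, msg[-1].max()              # maximizing configuration, max log score

states, logp = max_sum_chain(np.log([[1, 3], [2, 1], [1, 2]]),
                             np.log([[2, 1], [1, 2]]))
print(states, logp)
```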

The Junction Tree Algorithm Exact inference on general graphs. Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm. Intractable on graphs with large cliques.

Loopy Belief Propagation Sum-product applied to general graphs. Initial unit messages are passed across all links, after which messages are passed around until convergence (not guaranteed!). Approximate but tractable for large graphs. Sometimes works well, sometimes not at all.
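A minimal flooding-schedule sketch on a pairwise MRF, with the unit-message initialization described above (normalized here for numerical stability); convergence is checked but, as noted, not guaranteed. The representation and potentials are illustrative:

```python
import numpy as np

def loopy_bp(nodes, pair, iters=50, tol=1e-6):
    """pair: dict (i, j) -> K x K potential on undirected edge {i, j}."""
    K = next(iter(pair.values())).shape[0]
    edges = [(i, j) for (i, j) in pair] + [(j, i) for (i, j) in pair]
    msg = {e: np.ones(K) / K for e in edges}         # unit messages, normalized
    for _ in range(iters):
        new = {}
        for (i, j) in edges:                         # update every message in parallel
            psi = pair[(i, j)] if (i, j) in pair else pair[(j, i)].T
            prod = np.ones(K)
            for (k, i2) in edges:                    # incoming messages to i, except from j
                if i2 == i and k != j:
                    prod = prod * msg[(k, i)]
            m = psi.T @ prod                         # sum over x_i
            new[(i, j)] = m / m.sum()
        delta = max(np.abs(new[e] - msg[e]).max() for e in edges)
        msg = new
        if delta < tol:
            break
    beliefs = {}
    for i in nodes:                                  # belief = product of incoming messages
        b = np.ones(K)
        for (k, i2) in edges:
            if i2 == i:
                b = b * msg[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs

# Three-node loop with agreement potentials: BP runs despite the cycle.
agree = np.array([[2.0, 1.0], [1.0, 2.0]])
print(loopy_bp([0, 1, 2], {(0, 1): agree, (1, 2): agree, (0, 2): agree}))
```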