Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models


Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models

The Bayes Filter (Rep.) Bel(x_t) = η p(z_t | x_t) ∫ p(x_t | u_t, x_{t-1}) Bel(x_{t-1}) dx_{t-1}. The derivation applies Bayes' rule, the Markov assumption, and the theorem of total probability.

Graphical Representation (Rep.) We can describe the overall process using a Dynamic Bayes Network. This incorporates the following Markov assumptions: p(z_t | x_{0:t}, z_{1:t-1}, u_{1:t}) = p(z_t | x_t) (measurement) and p(x_t | x_{0:t-1}, z_{1:t-1}, u_{1:t}) = p(x_t | x_{t-1}, u_t) (state).

Definition: A Probabilistic Graphical Model is a diagrammatic representation of a probability distribution. In a graphical model, random variables are represented as nodes, and statistical dependencies are represented using edges between the nodes. The resulting graph can be cyclic or acyclic, directed or undirected. The simplest graphs are Directed Acyclic Graphs (DAGs).

Simple Example Given: 3 random variables a, b, and c. Joint probability: p(a, b, c) = p(c | a, b) p(b | a) p(a). Random variables can be discrete or continuous. A graphical model based on a DAG is called a Bayesian Network.

Simple Example In general, for K random variables x_1, ..., x_K, the joint probability is p(x_1, ..., x_K) = p(x_K | x_1, ..., x_{K-1}) ... p(x_2 | x_1) p(x_1). This leads to a fully connected graph. Note: The ordering of the nodes in such a fully connected graph is arbitrary; all orderings represent the same joint probability distribution p(x_1, ..., x_K).

Bayesian Networks Statistical independence can be represented by the absence of edges. This makes computation efficient. Intuitively: only the parents of a node have a direct influence on it.

Bayesian Networks We can now define a one-to-one mapping from graphical models to probabilistic formulations. General factorization: p(x) = ∏_{k=1}^{K} p(x_k | pa_k), where pa_k denotes the parents of x_k and x = (x_1, ..., x_K).
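To make the factorization concrete, here is a minimal sketch in Python; the three binary variables and their conditional probability tables are hypothetical, chosen only to illustrate how the joint probability is assembled as a product of per-node conditionals p(x_k | pa_k).

```python
from itertools import product

# Hypothetical DAG a -> b, (a, b) -> c with hand-picked CPTs.
# Each entry maps a node to (its parents, a table of p(node = 1 | parents)).
cpts = {
    "a": (tuple(), {(): 0.6}),
    "b": (("a",), {(0,): 0.3, (1,): 0.7}),
    "c": (("a", "b"), {(0, 0): 0.1, (0, 1): 0.5,
                       (1, 0): 0.4, (1, 1): 0.9}),
}

def joint(assignment):
    """p(x) as the product of each node's conditional given its parents."""
    p = 1.0
    for var, (pa, table) in cpts.items():
        p1 = table[tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

# The factors define a proper distribution over all 2^3 assignments:
total = sum(joint(dict(zip("abc", bits))) for bits in product((0, 1), repeat=3))
print(total)  # 1.0 (up to floating point)
```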

Elements of Graphical Models In the case of a series of random variables with equal dependencies, we can subsume them using a plate.

Elements of Graphical Models (2) We distinguish between input variables and explicit hyper-parameters.

Elements of Graphical Models (3) We distinguish between observed variables and hidden variables (deterministic parameters omitted).

Regression as a Graphical Model Regression: prediction of a new target value t̂ for a new input x̂. Here: conditioning on all deterministic parameters. Using this, we can obtain the predictive distribution p(t̂ | x̂, x, t) ∝ ∫ p(t̂, t, w | x̂, x, α, σ²) dw.

Two Special Cases We consider two special cases: (1) all random variables are discrete, i.e. each x_i is represented by K values with p(x | µ) = ∏_{k=1}^{K} µ_k^{x_k}; (2) all random variables are Gaussian.

Discrete Variables: Example Two dependent variables with K states each: K² − 1 parameters. Here K = 2:

p(x_1):  p(x_1 = 1) = 0.2,  p(x_1 = 2) = 0.8
p(x_2 | x_1):  p(x_2 = 1 | x_1 = 1) = 0.25,  p(x_2 = 2 | x_1 = 1) = 0.75,  p(x_2 = 1 | x_1 = 2) = 0.1,  p(x_2 = 2 | x_1 = 2) = 0.9

Independent joint distribution: 2(K − 1) parameters.
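These tables can be checked numerically; the small sketch below (probabilities taken from the example above) builds the joint distribution from p(x_1) and p(x_2 | x_1) and verifies that it normalizes.

```python
# p(x1) and p(x2 | x1) for the K = 2 example above.
p_x1 = {1: 0.2, 2: 0.8}
p_x2_given_x1 = {(1, 1): 0.25, (1, 2): 0.75, (2, 1): 0.1, (2, 2): 0.9}

# Joint of the dependent pair: K^2 - 1 = 3 free parameters in total.
p_joint = {(i, j): p_x1[i] * p_x2_given_x1[(i, j)] for i in (1, 2) for j in (1, 2)}
print(p_joint)                 # the K^2 joint probabilities (0.05, 0.15, 0.08, 0.72)
print(sum(p_joint.values()))   # 1.0 (up to floating point)
```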

Discrete Variables: General Case In a general joint distribution with M variables we need to store K^M − 1 parameters. If the distribution can be described by a chain-structured graph, then we have only K − 1 + (M − 1) K (K − 1) parameters. This graph is called a Markov chain with M nodes. The number of parameters grows only linearly with the number of variables.
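A quick back-of-the-envelope comparison, using illustrative values K = 3 and M = 10 (not from the slides), shows how large the gap is:

```python
# Parameter counts for a general joint vs. a chain-structured factorization.
K, M = 3, 10
full_joint = K**M - 1                            # 59048 parameters
markov_chain = (K - 1) + (M - 1) * K * (K - 1)   # 56 parameters
print(full_joint, markov_chain)
```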

Gaussian Variables Assume all random variables are Gaussian and we define x_i = ∑_{j ∈ pa_i} w_ij x_j + b_i + √(v_i) ε_i with ε_i ~ N(0, 1). Then one can show that the joint probability p(x) is a multivariate Gaussian. Furthermore, E[x_i] = ∑_{j ∈ pa_i} w_ij E[x_j] + b_i, i.e., we can compute the mean values recursively.

Gaussian Variables With the same definition, the analogous result can be shown for the covariance. Thus: mean and covariance can be calculated recursively. Furthermore, it can be shown that the fully connected graph corresponds to a Gaussian with a general symmetric covariance matrix, and the graph without edges corresponds to a diagonal covariance matrix.
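As a sanity check of the mean recursion, here is a small sketch on a hypothetical three-node chain x_0 → x_1 → x_2; the weights, offsets and noise variances are made up for illustration, and the recursive means are compared with ancestral samples.

```python
import numpy as np

# Linear-Gaussian model x_i = sum_j w_ij x_j + b_i + sqrt(v_i) * eps_i on a toy chain.
rng = np.random.default_rng(1)
w = {1: {0: 2.0}, 2: {1: -0.5}}   # w[i][j]: weight from parent j to node i
b = [1.0, 0.0, 3.0]               # offsets b_i
v = [1.0, 0.5, 2.0]               # noise variances v_i

# Recursive means: E[x_i] = sum_j w_ij E[x_j] + b_i
mean = [0.0, 0.0, 0.0]
for i in range(3):
    mean[i] = sum(w_ij * mean[j] for j, w_ij in w.get(i, {}).items()) + b[i]
print(mean)  # [1.0, 2.0, 2.0]

# Monte Carlo check via ancestral sampling
samples = np.zeros((100_000, 3))
for i in range(3):
    parent_sum = sum(w_ij * samples[:, j] for j, w_ij in w.get(i, {}).items())
    samples[:, i] = parent_sum + b[i] + np.sqrt(v[i]) * rng.standard_normal(100_000)
print(samples.mean(axis=0))  # approximately [1.0, 2.0, 2.0]
```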

Independence (Rep.) Definition 1.4: Two random variables x and y are independent iff p(x, y) = p(x) p(y). For independent random variables x and y we have p(x | y) = p(x). Notation: x ⊥ y. Independence does not imply conditional independence, and conditional independence does not imply independence.

Conditional Independence (Rep.) Definition 1.5: Two random variables x and y are conditionally independent given a third random variable z iff p(x, y | z) = p(x | z) p(y | z). This is equivalent to p(x | y, z) = p(x | z) and p(y | x, z) = p(y | z). Notation: x ⊥ y | z.

Conditional Independence: Example 1 This graph represents the probability distribution p(a, b, c) = p(a | c) p(b | c) p(c). Marginalizing out c on both sides gives p(a, b) = ∑_c p(a | c) p(b | c) p(c). This is in general not equal to p(a) p(b). Thus: a and b are not independent.

Conditional Independence: Example 1 Now we condition on c (i.e. it is assumed to be known): p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c). Thus: a and b are conditionally independent given c: a ⊥ b | c. We say that the node c is a tail-to-tail node on the path between a and b.

Conditional Independence: Example 2 This graph represents the distribution p(a, b, c) = p(a) p(c | a) p(b | c). Again, we marginalize over c: p(a, b) = p(a) ∑_c p(c | a) p(b | c) = p(a) p(b | a). In general, a and b are therefore not independent.

Conditional Independence: Example 2 As before, we now condition on c: p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c), i.e. a and b are conditionally independent given c. We say that the node c is a head-to-tail node on the path between a and b.

Conditional Independence: Example 3 Now consider this graph: p(a, b, c) = p(a) p(b) p(c | a, b). Using ∑_c p(a, b, c) = p(a) p(b) ∑_c p(c | a, b) = p(a) p(b), we obtain: a and b are independent (as long as c is not observed).

Conditional Independence: Example 3 Again, we condition on c. This results in p(a, b | c) = p(a) p(b) p(c | a, b) / p(c), which in general does not factorize into p(a | c) p(b | c); a and b are not conditionally independent given c. We say that the node c is a head-to-head node on the path between a and b.

To Summarize When does the graph represent (conditional) independence? Tail-to-tail case: if we condition on the tail-to-tail node. Head-to-tail case: if we condition on the head-to-tail node. Head-to-head case: if we do not condition on the head-to-head node (and not on any of its descendants). In general, this leads to the notion of D-separation for directed graphical models.

D-Separation Say A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or b) the arrows meet head-to-head at the node, and neither the node nor any of its descendants are in the set C. If all paths from A to B are blocked, A is said to be d-separated from B by C. Notation: A ⊥ B | C.

D-Separation Note: D-separation is a property of graphs, not of probability distributions.
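The definition above can be turned into a small path-based checker. The sketch below works only for small graphs and uses the hypothetical toy DAG a → c ← b, c → d, but it reproduces the three canonical cases.

```python
# A minimal d-separation checker for small DAGs, given as {child: set of parents}.
parents = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}

def children(node):
    return {v for v, ps in parents.items() if node in ps}

def descendants(node):
    out, stack = set(), [node]
    while stack:
        for ch in children(stack.pop()):
            if ch not in out:
                out.add(ch)
                stack.append(ch)
    return out

def all_paths(a, b):
    """Enumerate undirected paths (node lists) between a and b."""
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            paths.append(path)
            continue
        for n in parents[path[-1]] | children(path[-1]):
            if n not in path:
                stack.append(path + [n])
    return paths

def blocked(path, C):
    """A path is blocked by C if some intermediate node satisfies rule a) or b)."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents[node] and nxt in parents[node]:
            # rule b): head-to-head node, and neither it nor a descendant is in C
            if node not in C and not (descendants(node) & C):
                return True
        elif node in C:
            # rule a): head-to-tail or tail-to-tail node that is in C
            return True
    return False

def d_separated(A, B, C):
    return all(blocked(p, set(C)) for a in A for b in B for p in all_paths(a, b))

print(d_separated({"a"}, {"b"}, set()))    # True: the head-to-head node c blocks
print(d_separated({"a"}, {"b"}, {"c"}))    # False: conditioning on c unblocks the path
print(d_separated({"a"}, {"b"}, {"d"}))    # False: d is a descendant of c
```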

D-Separation: Example In the first case we condition on a descendant of e, so it does not block the path from a to b. In the second case we condition on a tail-to-tail node on the only path from a to b, i.e. f blocks the path.

I-Map Definition 4.1: A graph G is called an I-map for a distribution p if every D-separation in G corresponds to a conditional independence relation satisfied by p: A d-separated from B by C in G ⇒ A ⊥ B | C in p. Example: The fully connected graph is an I-map for any distribution, as there are no D-separations in that graph.

D-Map Definition 4.2: A graph G is called a D-map for a distribution p if for every conditional independence relation satisfied by p there is a D-separation in G: A ⊥ B | C in p ⇒ A d-separated from B by C in G. Example: The graph without any edges is a D-map for any distribution, as all pairs of disjoint subsets of nodes are D-separated in that graph.

Perfect Map Definition 4.3: A graph G is called a perfect map for a distribution p if it is both a D-map and an I-map of p. A perfect map uniquely defines the set of conditional independence relations of a distribution.

The Markov Blanket Consider the distribution of a node x_i conditioned on all other nodes, p(x_i | x_{\i}) = p(x) / ∫ p(x) dx_i. Factors independent of x_i cancel between numerator and denominator. What remains is the Markov blanket of x_i: all parents, children and co-parents (other parents of the children) of x_i.
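Reading the Markov blanket off a DAG is mechanical; here is a minimal sketch on a hypothetical four-node graph (a → c, c → d, b → d), where the blanket of c consists of its parent a, its child d and the co-parent b.

```python
# Markov blanket = parents + children + co-parents, for a DAG given as {child: parents}.
parents = {"a": set(), "b": set(), "c": {"a"}, "d": {"c", "b"}}

def markov_blanket(x):
    kids = {v for v, ps in parents.items() if x in ps}
    co_parents = set().union(*(parents[k] for k in kids)) if kids else set()
    return (parents[x] | kids | co_parents) - {x}

print(markov_blanket("c"))  # {'a', 'b', 'd'}
```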

Summary Graphical models represent joint probability distributions using nodes for the random variables and edges to express (conditional) (in)dependence. A probability distribution can always be represented using a fully connected graph, but this is inefficient. In a directed acyclic graph, conditional independence is determined using D-separation. A perfect map implies a one-to-one mapping between conditional independence relations and D-separations. The Markov blanket is the minimal set of observed nodes needed to obtain conditional independence.

Prof. Daniel Cremers 4. Probabilistic Graphical Models Undirected Models

Repetition: Bayesian Networks Directed graphical models can be used to represent probability distributions. This is useful for doing inference and for generating samples from the distribution efficiently.

Repetition: D-Separation D-separation is a property of graphs that can be easily determined. An I-map assigns a conditional independence relation to every d-separation; a D-map assigns a d-separation to every conditional independence relation. Every Bayes net determines a unique probability distribution.

In-depth: The Head-to-Head Node Example: a: battery charged (0 or 1), b: fuel tank full (0 or 1), c: fuel gauge reads full (0 or 1), with priors p(a = 1) = 0.9 and p(b = 1) = 0.9 and sensor model

p(c = 1 | a = 1, b = 1) = 0.8
p(c = 1 | a = 1, b = 0) = 0.2
p(c = 1 | a = 0, b = 1) = 0.2
p(c = 1 | a = 0, b = 0) = 0.1

We can compute p(c = 0) = 0.315 and p(c = 0 | b = 0) = 0.81, and similarly p(b = 0 | c = 0) ≈ 0.257 and p(b = 0 | c = 0, a = 0) ≈ 0.111: observing that the battery is flat "explains away" the empty gauge reading, pulling the probability of an empty tank back towards its prior.
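These numbers can be reproduced by brute-force marginalization of the joint; a small sketch using the table values above:

```python
# a = battery charged, b = tank full, c = gauge reads full (all binary).
p_a = {1: 0.9, 0: 0.1}
p_b = {1: 0.9, 0: 0.1}
p_c1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # p(c = 1 | a, b)

def joint(a, b, c):
    return p_a[a] * p_b[b] * (p_c1[(a, b)] if c == 1 else 1 - p_c1[(a, b)])

p_not_c = sum(joint(a, b, 0) for a in (0, 1) for b in (0, 1))
p_not_c_given_not_b = sum(joint(a, 0, 0) for a in (0, 1)) / p_b[0]
p_not_b_given_not_c = sum(joint(a, 0, 0) for a in (0, 1)) / p_not_c
p_not_b_given_not_c_not_a = joint(0, 0, 0) / sum(joint(0, b, 0) for b in (0, 1))

print(round(p_not_c, 3))                    # 0.315
print(round(p_not_c_given_not_b, 3))        # 0.81
print(round(p_not_b_given_not_c, 3))        # 0.257
print(round(p_not_b_given_not_c_not_a, 3))  # 0.111
```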

Repetition: D-Separation

Directed vs. Undirected Graphs Using D-separation we can identify conditional independencies in directed graphical models, but: Is there a simpler, more intuitive way to express conditional independence in a graph? Can we find a representation for cases where an ordering of the random variables is inappropriate (e.g. the pixels in a camera image)? Yes, we can: by removing the directions of the edges we obtain an Undirected Graphical Model, also known as a Markov Random Field.

Example: Camera Image Edge directions are counter-intuitive for images, and the Markov blanket of a pixel x_i is not just its direct neighbors when using a directed model.

Markov Random Fields In an undirected graph, blocking is simple graph separation: if all paths from A to B go through C, then C blocks all paths and A ⊥ B | C. Markov blanket: we only need to condition on the direct neighbors of x to get conditional independence, because these already block every path from x to any other node.

Factorization of MRFs Any two nodes x_i and x_j that are not connected in an MRF are conditionally independent given all other nodes: p(x_i, x_j | x_{\{i,j\}}) = p(x_i | x_{\{i,j\}}) p(x_j | x_{\{i,j\}}). In turn, each factor of the joint distribution must contain only nodes that are connected. This motivates the consideration of cliques in the graph: a clique is a fully connected subgraph; a maximal clique cannot be extended with another node without losing the property of full connectivity.

Factorization of MRFs In general, a Markov Random Field is factorized as p(x) = (1/Z) ∏_C Φ_C(x_C)   (4.1), where C runs over the set of all (maximal) cliques, Φ_C is a positive function of the clique variables x_C called the clique potential, and Z = ∑_x ∏_C Φ_C(x_C) is called the partition function. Theorem (Hammersley/Clifford): Any undirected model with associated clique potentials Φ_C is a perfect map for the probability distribution defined by Equation (4.1). As a conclusion, all probability distributions that can be factorized as in (4.1) can be represented as an MRF.
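A minimal sketch of Equation (4.1) for a hypothetical binary MRF with two maximal cliques; the potential values are arbitrary positive numbers, chosen only to show the role of the partition function Z.

```python
import numpy as np
from itertools import product

# Two maximal cliques over binary nodes x0..x3: {x0, x1, x2} and {x2, x3}.
rng = np.random.default_rng(0)
cliques = [((0, 1, 2), rng.random((2, 2, 2)) + 0.1),   # positive potential over (x0, x1, x2)
           ((2, 3),    rng.random((2, 2)) + 0.1)]      # positive potential over (x2, x3)

def unnormalized(x):
    """Product of clique potentials evaluated at the assignment x."""
    return float(np.prod([phi[tuple(x[i] for i in idx)] for idx, phi in cliques]))

Z = sum(unnormalized(x) for x in product((0, 1), repeat=4))       # partition function
p = {x: unnormalized(x) / Z for x in product((0, 1), repeat=4)}   # normalized joint
print(sum(p.values()))  # 1.0 (up to floating point)
```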

Converting Directed to Undirected Graphs (1) For a chain, each conditional distribution can be used directly as a pairwise clique potential; in this case Z = 1.

Converting Directed to Undirected Graphs (2) Example: the directed graph with edges x_1 → x_4, x_2 → x_4, x_3 → x_4 represents p(x) = p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3). In general, conditional distributions in the directed graph are mapped to cliques in the undirected graph. However, the parents x_1, x_2, x_3 are not conditionally independent given the head-to-head node x_4. Therefore: connect all parents of head-to-head nodes with each other (moralization).

Converting Directed to Undirected Graphs (2) After moralization, the example graph becomes a single clique over all four variables: p(x) = Φ(x_1, x_2, x_3, x_4). Problem: this process can remove conditional independence relations (it is inefficient). Generally: there is no one-to-one mapping between the distributions represented by directed and by undirected graphs.
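The moralization step itself is easy to write down; here is a minimal sketch for the example graph above, where the parents x_1, x_2, x_3 of the head-to-head node x_4 get "married".

```python
# Moralization: marry all parents of each node, then drop edge directions.
parents = {"x1": set(), "x2": set(), "x3": set(), "x4": {"x1", "x2", "x3"}}

undirected_edges = set()
for child, pa in parents.items():
    undirected_edges |= {frozenset((p, child)) for p in pa}                   # original edges, undirected
    undirected_edges |= {frozenset((p, q)) for p in pa for q in pa if p != q}  # moral edges

print(sorted(tuple(sorted(e)) for e in undirected_edges))
# x1-x4, x2-x4, x3-x4 plus the moral edges x1-x2, x1-x3, x2-x3: a single 4-node clique
```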

Representability As for DAGs, we can define an I-map, a D-map and a perfect map for MRFs. The set of distributions for which a DAG is a perfect map differs from the set for which an MRF is a perfect map; both are strict subsets of the set of all distributions, and neither contains the other.

Directed vs. Undirected Graphs Neither of the two example distributions can be represented in the other framework (directed/undirected) with all of its conditional independence relations.

Using Graphical Models We can use a graphical model to do inference: some nodes in the graph are observed, and for the others we want to find the posterior distribution. Also, computing the local marginal distribution p(x_n) at any node x_n can be done using inference. Question: How can inference be done with a graphical model? We will see that by exploiting conditional independence we can do efficient inference.

Inference on a Chain The joint probability of a chain of N nodes is given by p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ... ψ_{N−1,N}(x_{N−1}, x_N). The marginal at x_3 is p(x_3) = ∑_{x_1} ∑_{x_2} ∑_{x_4} ... ∑_{x_N} p(x), and in the general case p(x_n) = ∑_{x_1} ... ∑_{x_{n−1}} ∑_{x_{n+1}} ... ∑_{x_N} p(x).

Inference on a Chain Evaluating these sums directly would mean on the order of K^N computations! A more efficient way is obtained by rearranging: p(x_n) = (1/Z) [∑_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) ... [∑_{x_1} ψ_{1,2}(x_1, x_2)] ...] · [∑_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) ... [∑_{x_N} ψ_{N−1,N}(x_{N−1}, x_N)] ...]. Each term in brackets is a vector of size K.

Inference on a Chain In general, we have p(x_n) = (1/Z) µ_α(x_n) µ_β(x_n), where µ_α(x_n) is the message passed forward along the chain and µ_β(x_n) is the message passed backward.

Inference on a Chain The messages µ_α and µ_β can be computed recursively: µ_α(x_n) = ∑_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) µ_α(x_{n−1}) and µ_β(x_n) = ∑_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) µ_β(x_{n+1}). Computation of µ_α starts at the first node and computation of µ_β starts at the last node.

Inference on a Chain The first values of µ_α and µ_β are µ_α(x_2) = ∑_{x_1} ψ_{1,2}(x_1, x_2) and µ_β(x_{N−1}) = ∑_{x_N} ψ_{N−1,N}(x_{N−1}, x_N). The partition function can be computed at any node: Z = ∑_{x_n} µ_α(x_n) µ_β(x_n). Overall, we need O(N K²) operations to compute a marginal.

Inference on a Chain To compute all local marginals: compute and store all forward messages µ_α(x_n); compute and store all backward messages µ_β(x_n); compute Z once at a node x_m: Z = ∑_{x_m} µ_α(x_m) µ_β(x_m); then compute p(x_n) = (1/Z) µ_α(x_n) µ_β(x_n) for all variables required.
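Putting the procedure together, here is a minimal NumPy sketch of message passing on a chain; the chain length, number of states and pairwise potentials are made up, and the result is checked against brute-force enumeration.

```python
import numpy as np
from itertools import product

K, N = 3, 5                                               # states per node, chain length
rng = np.random.default_rng(0)
psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]    # positive pairwise potentials

# Forward messages mu_alpha (summing out all nodes before x_n)
mu_alpha = [np.ones(K)]
for n in range(N - 1):
    mu_alpha.append(psi[n].T @ mu_alpha[-1])

# Backward messages mu_beta (summing out all nodes after x_n)
mu_beta = [np.ones(K)]
for n in reversed(range(N - 1)):
    mu_beta.append(psi[n] @ mu_beta[-1])
mu_beta = mu_beta[::-1]

# Partition function (same value at every node) and local marginals
Z = float(np.sum(mu_alpha[0] * mu_beta[0]))
marginals = [mu_alpha[n] * mu_beta[n] / Z for n in range(N)]

# Sanity check against brute-force enumeration of all K^N states
joint = np.zeros([K] * N)
for x in product(range(K), repeat=N):
    joint[x] = np.prod([psi[n][x[n], x[n + 1]] for n in range(N - 1)])
joint /= joint.sum()
assert np.allclose(marginals[2], joint.sum(axis=(0, 1, 3, 4)))
print(marginals[2])
```

The message computation costs O(N K²), compared with O(K^N) for the brute-force sum.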

Summary Undirected models (also known as Markov random fields) provide a simpler way to check for conditional independence. An MRF is defined as a factorization over clique potentials and normalized globally by the partition function. Directed models can be converted into undirected ones, but there are distributions that can be represented in only one of the two kinds of model. For undirected Markov chains there is a very efficient inference method based on message passing.