Graphical Models Part 1-2 (Reading Notes)

Graphical Models Part 1-2 (Reading Notes)
Wednesday, August 3 2011, 2:35 PM

Notes on Chapter 8 "Graphical Models" of the book Pattern Recognition and Machine Learning (PRML) by Chris Bishop. This part (1-2) covers Sections 8.1 and 8.2 of the book: Bayesian networks (directed graphs) and conditional independence.

8.1 Bayesian Networks

A directed acyclic graph describes a joint distribution over variables (how it decomposes!). In general (the key equation, PRML 8.5):

p(x1, ..., xK) = ∏_k p(xk | pa_k)

where pa_k denotes the set of parents of node xk.

Generative model: an observed variable is generated by latent (hidden) variables.

Discrete variables: e.g. a 1st-order chain of M discrete nodes. The discrete model is turned into a Bayesian model by introducing Dirichlet priors over the parameters, and further by sharing a single parameter set μ among all the conditional distributions p(xi | xi-1).
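As a concrete illustration (my own toy example, not from the book), the following sketch evaluates the factorization for a 1st-order chain x1 -> x2 -> x3 of binary variables; all variable names and table values are made up.

import numpy as np

# Made-up conditional probability tables for the chain x1 -> x2 -> x3.
p_x1 = np.array([0.6, 0.4])                       # p(x1)
p_x2_given_x1 = np.array([[0.7, 0.3],             # rows: x1, cols: x2
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],             # rows: x2, cols: x3
                          [0.5, 0.5]])

def joint(x1, x2, x3):
    """p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2), i.e. prod_k p(xk | pa_k)."""
    return p_x1[x1] * p_x2_given_x1[x1, x2] * p_x3_given_x2[x2, x3]

# Sanity check: the factorized joint sums to 1 over all configurations.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))  # 1.0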

8.2 Conditional Independence

E.g. a three-variable case: p(a | b, c) = p(a | c), or equivalently p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c). We then say that a is conditionally independent of b given c, denoted a ⊥⊥ b | c.

Three example graphs:

1. Tail-to-tail

Figure 8.15: graph over the 3 variables a, b and c, with c tail-to-tail on the path between a and b. The joint distribution of the 3 variables is p(a, b, c) = p(a | c) p(b | c) p(c). To see whether a and b are independent, we sum the distribution over c: p(a, b) = Σ_c p(a | c) p(b | c) p(c). In general this does not lead to the product p(a) p(b); thus a and b are NOT independent (they are dependent) when nothing is observed.

Figure 8.16: the same graph as 8.15 but conditioned on c. Now p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c), which shows that a is conditionally independent of b given c, i.e. a ⊥⊥ b | c. (A small numerical check of this case appears after the head-to-tail example below; the head-to-tail case can be verified the same way.)

2. Head-to-tail

The joint distribution of a, b and c is p(a, b, c) = p(a) p(c | a) p(b | c). To get the joint distribution of a and b, we sum over c: p(a, b) = p(a) Σ_c p(c | a) p(b | c) = p(a) p(b | a), which in general does not factorize into p(a) p(b), so a and b are not independent. If we condition on node c, then Bayes' theorem together with the joint distribution of the three variables gives p(a, b | c) = p(a) p(c | a) p(b | c) / p(c) = p(a | c) p(b | c), so a and b are conditionally independent given c, i.e. a ⊥⊥ b | c.
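The check below (my own toy numbers, not from the book) verifies the tail-to-tail behavior numerically: a and b are dependent when c is unobserved, and independent once we condition on c. Swapping the joint construction to p(a) p(c | a) p(b | c) gives the analogous check for the head-to-tail case.

import numpy as np

# Made-up tables for the tail-to-tail graph  a <- c -> b  (all variables binary).
p_c = np.array([0.3, 0.7])                        # p(c)
p_a_given_c = np.array([[0.9, 0.1],               # rows: c, cols: a
                        [0.2, 0.8]])
p_b_given_c = np.array([[0.6, 0.4],               # rows: c, cols: b
                        [0.1, 0.9]])

# Joint p(a, b, c) = p(a | c) p(b | c) p(c), indexed [a, b, c].
joint = np.einsum('ca,cb,c->abc', p_a_given_c, p_b_given_c, p_c)

# Marginal test: p(a, b) vs p(a) p(b)  -> in general NOT equal.
p_ab = joint.sum(axis=2)
print(np.allclose(p_ab, np.outer(p_ab.sum(1), p_ab.sum(0))))            # False

# Conditional test: p(a, b | c) vs p(a | c) p(b | c)  -> equal for every c.
for c in (0, 1):
    p_ab_c = joint[:, :, c] / joint[:, :, c].sum()                      # p(a, b | c)
    print(np.allclose(p_ab_c, np.outer(p_ab_c.sum(1), p_ab_c.sum(0))))  # True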

3. Head-to-head

This is a trickier case; its behavior is the opposite of the previous two. The joint distribution of a, b and c is p(a, b, c) = p(a) p(b) p(c | a, b). When c is not observed, we marginalize both sides over c to get the joint distribution of a and b: p(a, b) = p(a) p(b), i.e. a and b are independent when no variables are observed. Suppose instead we condition on c; the conditional distribution of a and b is p(a, b | c) = p(a) p(b) p(c | a, b) / p(c), which in general does not factorize into the product p(a | c) p(b | c), so a and b are not conditionally independent given c. (A numerical illustration follows after the summary.)

Note the third example (head-to-head) has the opposite behavior from the first two. When c is unobserved it blocks the path, and a and b are independent. However, conditioning on c "unblocks" the path and makes a and b dependent (the "explaining away" effect). A more subtle point with this case: a head-to-head node becomes unblocked if either the node itself, or any of its descendants, is observed.

Summary: a tail-to-tail node or a head-to-tail node leaves a path unblocked unless it is observed, in which case it blocks the path. By contrast, a head-to-head node blocks a path if it is unobserved, but once the node, and/or at least one of its descendants, is observed, the path becomes unblocked.
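The sketch below (again my own made-up tables, not from the book) shows the explaining-away behavior numerically: a and b are independent marginally, but become dependent once c is observed.

import numpy as np

# Made-up tables for the head-to-head graph  a -> c <- b  (all variables binary).
p_a = np.array([0.4, 0.6])
p_b = np.array([0.7, 0.3])
p_c_given_ab = np.array([[[0.9, 0.1], [0.3, 0.7]],    # indexed [a, b, c]
                         [[0.4, 0.6], [0.1, 0.9]]])

# Joint p(a, b, c) = p(a) p(b) p(c | a, b), indexed [a, b, c].
joint = np.einsum('a,b,abc->abc', p_a, p_b, p_c_given_ab)

# Marginal: p(a, b) = p(a) p(b)  -> independent while c is unobserved.
p_ab = joint.sum(axis=2)
print(np.allclose(p_ab, np.outer(p_a, p_b)))                             # True

# Conditioned on c: p(a, b | c) no longer factorizes ("explaining away").
for c in (0, 1):
    p_ab_c = joint[:, :, c] / joint[:, :, c].sum()
    print(np.allclose(p_ab_c, np.outer(p_ab_c.sum(1), p_ab_c.sum(0))))   # False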

D-separation

The d-separation property of a directed graph: consider a general directed graph in which A, B and C are arbitrary non-intersecting sets of nodes. We wish to ascertain whether a particular conditional independence statement A ⊥⊥ B | C is implied by a given directed acyclic graph. To do so, we consider all possible paths from any node in A to any node in B. Any such path is said to be blocked if it includes a node such that either:

(a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
(b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set C.

If all paths are blocked, A is said to be d-separated from B by C, and the joint distribution over all the variables in the graph will satisfy A ⊥⊥ B | C. (A rough implementation of this path-blocking test is sketched below.)

Illustration of d-separation (Figure 8.22): in (a) the path from a to b is blocked by neither f nor e, so conditioning on c does not make a and b independent; in (b) the path from a to b IS blocked by both f and e, so a ⊥⊥ b | f.

What a directed graph expresses: (a) a particular directed graph represents a specific decomposition of a joint probability distribution into a product of conditional probabilities; (b) the graph also expresses a set of conditional independence statements obtained through the d-separation criterion. The d-separation theorem is really an expression of the equivalence of these two properties. To make this clear, it is helpful to think of a directed graph as a filter.
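As a rough sketch (my own implementation, not from the book, and written for small graphs only), the path-blocking test can be coded directly from the two rules above. The graph is given as a dict mapping each node to the list of its parents; all names below are made up.

from itertools import product

def descendants(node, parents):
    """All nodes reachable from `node` by following child edges."""
    children = {n: [m for m in parents if n in parents[m]] for n in parents}
    out, stack = set(), [node]
    while stack:
        for ch in children[stack.pop()]:
            if ch not in out:
                out.add(ch)
                stack.append(ch)
    return out

def paths(a, b, parents, path=None):
    """All undirected simple paths from a to b."""
    path = (path or []) + [a]
    if a == b:
        yield path
        return
    children = [n for n in parents if a in parents[n]]
    for nxt in set(parents[a]) | set(children):
        if nxt not in path:
            yield from paths(nxt, b, parents, path)

def blocked(path, parents, C):
    """Is this path blocked by the conditioning set C?"""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        head_to_head = prev in parents[node] and nxt in parents[node]
        if head_to_head:
            if node not in C and not (descendants(node, parents) & C):
                return True            # rule (b): collider, nothing observed
        elif node in C:
            return True                # rule (a): chain/fork node observed
    return False

def d_separated(A, B, C, parents):
    return all(blocked(p, parents, C)
               for a, b in product(A, B) for p in paths(a, b, parents))

# A Figure 8.22-style graph (my labels): a -> e <- f -> b, with e -> c.
parents = {'a': [], 'f': [], 'e': ['a', 'f'], 'b': ['f'], 'c': ['e']}
print(d_separated({'a'}, {'b'}, {'c'}, parents))   # False: the path is unblocked
print(d_separated({'a'}, {'b'}, {'f'}, parents))   # True:  f blocks the path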

Markov Blanket

Consider a joint distribution p(x1, ..., xD) represented by a directed graph having D nodes, and consider the conditional distribution of a particular variable xi conditioned on all of the remaining variables x_{j≠i}. Using the factorization property (8.5), we can express this conditional distribution in the form

p(xi | x_{j≠i}) = p(x1, ..., xD) / ∫ p(x1, ..., xD) dxi = ∏_k p(xk | pa_k) / ∫ ∏_k p(xk | pa_k) dxi

Any factor p(xk | pa_k) that does not have any functional dependence on xi can be taken outside the integral over xi, and will therefore cancel between numerator and denominator. The only factors that remain are the conditional distribution p(xi | pa_i) for node xi itself, together with the conditional distributions for any nodes xk such that xi is in the conditioning set of p(xk | pa_k), i.e. such that xi is a parent of xk.

The Markov blanket of a node xi comprises its parents, its children, and its co-parents (the other parents of its children). We can think of the Markov blanket of a node as the minimal set of nodes that isolates it from the rest of the graph.
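A small helper (my own sketch, not from the book; the graph and names are made up) that reads the Markov blanket of a node directly off the graph structure:

def markov_blanket(node, parents):
    """Parents, children and co-parents of `node`.

    `parents` maps each node of the DAG to the list of its parents.
    """
    children = [n for n in parents if node in parents[n]]
    co_parents = {p for ch in children for p in parents[ch]} - {node}
    return set(parents[node]) | set(children) | co_parents

# Example: a -> c <- b, c -> d.  The blanket of a is its child c plus co-parent b.
parents = {'a': [], 'b': [], 'c': ['a', 'b'], 'd': ['c']}
print(markov_blanket('a', parents))   # {'b', 'c'}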