Bayesian Machine Learning - Lecture 6

Guido Sanguinetti
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh
gsanguin@inf.ed.ac.uk
March 2, 2015

Today's lecture
1. Structure and conditional independence
2.
3.

Product rule - revisited
Consider a joint probability distribution over random variables $x_1, \dots, x_N$: $p(x_1, \dots, x_N)$.
As probability distributions are symmetric in their arguments, picking an ordering of the variables is arbitrary.
Given an arbitrary ordering, we can use the product rule to rewrite the joint as
$$p(x_1, \dots, x_N) = p(x_N \mid x_{N-1}, \dots, x_1)\, p(x_{N-1} \mid x_{N-2}, \dots, x_1) \cdots p(x_1) \quad (1)$$
Think of linking all the random variables with segments and then stretching the resulting graph.
This factorisation is useful for drawing samples, but it does not encode any deeper structure.
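As a quick sanity check (my own toy illustration, not from the slides), the sketch below builds an arbitrary joint over three binary variables and verifies that the chain-rule factorisation $p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2)$ reproduces it exactly:

```python
import numpy as np

# Arbitrary joint p(x1, x2, x3) over three binary variables
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# Conditionals for the ordering x1, x2, x3
p_x1 = joint.sum(axis=(1, 2))                              # p(x1)
p_x2_given_x1 = joint.sum(axis=2) / p_x1[:, None]          # p(x2 | x1)
p_x3_given_x12 = joint / joint.sum(axis=2, keepdims=True)  # p(x3 | x1, x2)

# The product rule recovers the joint for this (arbitrary) ordering
reconstructed = (p_x1[:, None, None]
                 * p_x2_given_x1[:, :, None]
                 * p_x3_given_x12)
assert np.allclose(reconstructed, joint)
```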

Factorisations and graphs
The product rule gives us a generic but rather useless factorisation of joint probabilities.
Interesting things arise when special factorisations are present, i.e. when you can remove some variables from the conditioning sets in (1).
Today we will formalise what "interesting" means by mapping distributions onto graphs.
Graphs are also a very useful tool for modelling systems, as they abstract the dependence structure between variables without bothering with the details of the specific distributions.

Conditional independence
Two variables $x_1$ and $x_2$ are conditionally independent given $x_3$ (written $x_1 \perp\!\!\!\perp x_2 \mid x_3$) if
$$p(x_1 \mid x_2, x_3) = p(x_1 \mid x_3).$$
Equivalently,
$$p(x_1, x_2 \mid x_3) = p(x_1 \mid x_3)\, p(x_2 \mid x_3).$$
Two non-overlapping sets of nodes A and B are d-separated by a set C if every node in A is conditionally independent of every node in B given the nodes in C.
Given a variable (node) $x_i$, its Markov blanket is the set of nodes M that d-separates $x_i$ from the rest of the network.
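To make the definition concrete, here is a small numerical sketch (my own toy construction): a joint built as $p(x_3)\, p(x_1 \mid x_3)\, p(x_2 \mid x_3)$ satisfies $x_1 \perp\!\!\!\perp x_2 \mid x_3$ by construction, and the defining identity can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(1)
p_x3 = np.array([0.3, 0.7])
p_x1_given_x3 = rng.dirichlet(np.ones(2), size=2)  # rows: x3, cols: x1
p_x2_given_x3 = rng.dirichlet(np.ones(2), size=2)  # rows: x3, cols: x2

# joint[x1, x2, x3] = p(x1 | x3) p(x2 | x3) p(x3)
joint = np.einsum('ka,kb,k->abk', p_x1_given_x3, p_x2_given_x3, p_x3)

# Recover the conditionals from the joint and verify the identity
p_x12_given_x3 = joint / joint.sum(axis=(0, 1), keepdims=True)
p_x1_check = joint.sum(axis=1) / joint.sum(axis=(0, 1))  # p(x1 | x3)
p_x2_check = joint.sum(axis=0) / joint.sum(axis=(0, 1))  # p(x2 | x3)

# p(x1, x2 | x3) = p(x1 | x3) p(x2 | x3)
assert np.allclose(p_x12_given_x3,
                   np.einsum('ak,bk->abk', p_x1_check, p_x2_check))
```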

DAGs and distributions
Let us assume we have a factorisation of a joint probability distribution in terms of conditionals.
A factorisation defines a Directed Acyclic Graph (DAG) in the following way: variables are associated with nodes, and each variable to the right of the conditioning sign (a parent) is linked by an arrow to the variable to the left (its child).
Warning: the graph is not unique (Markov equivalence)!!!
The cool thing: we can also start with a DAG and obtain a distribution with structure.
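A minimal sketch of this construction (variable names are hypothetical): a factorisation is fully specified by the parent set of each variable, and the DAG edges follow immediately:

```python
# p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
factorisation = {
    'x1': [],            # p(x1)
    'x2': ['x1'],        # p(x2 | x1)
    'x3': ['x1', 'x2'],  # p(x3 | x1, x2)
}

# One arrow from each parent (right of the conditioning sign) to its child
edges = [(parent, child)
         for child, parents in factorisation.items()
         for parent in parents]
print(edges)  # [('x1', 'x2'), ('x1', 'x3'), ('x2', 'x3')]
```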

Graphical notation
Random variables are denoted by circles with the variable name inside.
Observed random variables are often shaded.
I.i.d. replicated variables are denoted using plates.
Sometimes, additional parameters over which we are not doing posterior inference are denoted as squares.
Let's draw the graphical model for the Gaussian Mixture Model.
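For instance, ancestral sampling from the model the plate notation describes, a latent component $z_n$ followed by $x_n \mid z_n$ for each of the $N$ plate copies, might look like the sketch below (all parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
pi = np.array([0.5, 0.3, 0.2])     # mixing proportions (hypothetical values)
mu = np.array([-2.0, 0.0, 3.0])    # component means (hypothetical values)
sigma = np.array([0.5, 1.0, 0.8])  # component std devs (hypothetical values)
N = 1000                           # number of i.i.d. plate copies

z = rng.choice(len(pi), size=N, p=pi)  # latent component assignments
x = rng.normal(mu[z], sigma[z])        # observed data (the shaded node)
```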

Francesco's example
Francesco works on modelling users from Twitter data. Let's draw a graphical model for him!

Working out conditional independences from graphs
Graphically, two variables a and b are d-separated by C if every path between a and b is blocked.
A path is blocked if the arrows meet head-to-tail or tail-to-tail at a node in C.
A path is also blocked if there exists a node x on the path where the arrows meet head-to-head, and neither x nor any of its descendants are in C (these rules are sketched in code below).
The Markov blanket of a node consists of its parents, children and co-parents (the other parents of its children).
Notice that not all conditional independence structures can be represented by a DAG (the counterexamples are rather pathological).
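These rules can be checked mechanically. Below is a brute-force sketch (my own illustration, fine for toy graphs; practical implementations use smarter algorithms such as Bayes ball) that enumerates every path between a and b and tests whether it is blocked:

```python
def d_separated(a, b, C, edges):
    """Brute-force d-separation on a small DAG given as a list of (u, v) edges."""
    children, neighbours = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def descendants(node):
        out, stack = set(), [node]
        while stack:
            for c in children.get(stack.pop(), []):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(node, visited):
        # All simple paths from node to b, ignoring edge directions
        if node == b:
            yield [node]
            return
        for nxt in neighbours.get(node, ()):
            if nxt not in visited:
                for rest in paths(nxt, visited | {nxt}):
                    yield [node] + rest

    def blocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            if (prev, mid) in edges and (nxt, mid) in edges:
                # head-to-head: blocked unless mid or a descendant is in C
                if mid not in C and not (descendants(mid) & set(C)):
                    return True
            elif mid in C:
                return True  # head-to-tail or tail-to-tail at a node in C
        return False

    return all(blocked(p) for p in paths(a, {a}))

# Classic v-structure a -> c <- b: a and b are marginally independent,
# but conditioning on the collider c couples them.
edges = [('a', 'c'), ('b', 'c')]
print(d_separated('a', 'b', set(), edges))  # True  (d-separated)
print(d_separated('a', 'b', {'c'}, edges))  # False (path unblocked)
```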

Dependencies with no direction
Sometimes we wish to encode dependencies which do not have an obvious precedence.
Discuss some examples.
In these cases, we just draw lines between nodes without arrows.
Hybrid examples (some directed edges, some undirected) are also possible.
One particular example from my work (Sanguinetti et al., Bioinformatics 2008).

Conditional independence and Markov Random Fields
Formally, we can associate an undirected graphical model (Markov Random Field) with a general factorisation of a joint distribution (not necessarily in terms of conditional distributions).
Nodes are connected if they appear together in a factor.
Maximal cliques in the graph correspond to factors.
Conditional independence is equivalent to separation in the graph (easy!).
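As a concrete sketch (my own toy example), take the chain MRF x1 - x2 - x3 with potentials on its two maximal cliques; separation of x1 and x3 by x2 in the graph then shows up as conditional independence in the distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
psi_12 = rng.random((2, 2))  # potential on clique {x1, x2}
psi_23 = rng.random((2, 2))  # potential on clique {x2, x3}

unnormalised = np.einsum('ab,bc->abc', psi_12, psi_23)
Z = unnormalised.sum()       # partition function
joint = unnormalised / Z     # p(x1, x2, x3)

# Separation in the chain x1 - x2 - x3 implies x1 _||_ x3 | x2
p_x2 = joint.sum(axis=(0, 2))
p_13_given_2 = joint / p_x2[None, :, None]      # p(x1, x3 | x2)
p_1_given_2 = joint.sum(axis=2) / p_x2[None, :]  # p(x1 | x2)
p_3_given_2 = joint.sum(axis=0) / p_x2[:, None]  # p(x3 | x2)
assert np.allclose(p_13_given_2,
                   np.einsum('ab,bc->abc', p_1_given_2, p_3_given_2))
```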

Simple example
Consider the linear dynamical system
$$x_t = a x_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2), \qquad x_0 \sim \mathcal{N}(0, 1) \quad (2)$$
Write the joint distribution.
Draw the corresponding directed and undirected graphical models.
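The joint factorises as $p(x_0) \prod_{t=1}^{T} p(x_t \mid x_{t-1})$, a Markov chain. A minimal sampling sketch (the parameter values are arbitrary) makes the generative structure explicit:

```python
import numpy as np

rng = np.random.default_rng(4)
a, sigma, T = 0.9, 0.5, 100  # hypothetical parameter values

x = np.empty(T + 1)
x[0] = rng.normal(0.0, 1.0)  # x_0 ~ N(0, 1)
for t in range(1, T + 1):
    # x_t | x_{t-1} ~ N(a * x_{t-1}, sigma^2)
    x[t] = a * x[t - 1] + rng.normal(0.0, sigma)
```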

From directed to undirected graphs
In general, it is possible to go from a directed to an undirected graphical model (why?):
1. Remove all arrows.
2. Join all nodes that share a child (moralisation).
Are conditional independences preserved?
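A minimal sketch of moralisation (my own illustration), following exactly the two steps above:

```python
from itertools import combinations

def moralise(directed_edges):
    """Moralise a DAG: drop directions, then marry all parents of each node."""
    undirected = {frozenset(e) for e in directed_edges}  # 1. remove arrows
    parents = {}
    for u, v in directed_edges:
        parents.setdefault(v, set()).add(u)
    for ps in parents.values():                          # 2. join co-parents
        for p1, p2 in combinations(sorted(ps), 2):
            undirected.add(frozenset((p1, p2)))
    return undirected

# v-structure a -> c <- b: moralisation adds the marrying edge a - b
print(moralise([('a', 'c'), ('b', 'c')]))
# -> {frozenset({'a', 'c'}), frozenset({'b', 'c'}), frozenset({'a', 'b'})}
```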