Lecture 4: Undirected Graphical Models


Lecture 4: Undirected Graphical Models
Department of Biostatistics, University of Michigan
zhenkewu@umich.edu, http://zhenkewu.com/teaching/graphical_model
15 September, 2016
Zhenke Wu, BIOSTAT830 Graphical Models (Module 1: Representation)

Lecture 3 Main Points Again: Representation of Directed Acyclic Graphs (DAGs)

Motivation: we need a system that can
- Clearly represent human knowledge about informational relevance
- Afford qualitative and robust reasoning

Representation:
- Connect d-separation (a graphical concept) to conditional independence (a probability concept)
- Directed edges (arrows) encode local dependencies
- Not every joint probability distribution has a DAG with exactly the same set of conditional independencies (those represented by the d-separation triplets of the DAG)

Reading (optional): Pearl and Verma (1987). The logic of representing dependencies by directed acyclic graphs.

Undirected Graphical Models

DAGs use directed edges to guide the specification of components of the joint probability distribution:

[X_1, ..., X_p] = ∏_j [X_j | Pa_G(X_j)]  (local Markov condition)

Undirected graphical (UG) models provide another system for qualitatively representing dependencies among vertices, especially when the directionality of the interactions is unclear; they capture correlations rather than directed influences.

Also known as: Markov random field (MRF), or Markov network.

Rich applications in spatial statistics (spatial interactions), natural language processing (word dependencies), network discovery (e.g., neuron activation patterns, protein interaction networks), ...

UG Examples (Protein Networks and the Game of Go)

Stern et al. (2004), Proceedings of the 23rd ICML

Undirected Graphical Models

- Pairwise, non-causal relationships
- One can readily write down the model, but it is not obvious how to generate samples from it

Markov Properties on UGs

A probability distribution P for a random vector X = (X_1, ..., X_d) can satisfy a range of different Markov properties with respect to a graph G = (V, E), where V is the set of vertices, each corresponding to one of {X_1, ..., X_d}, and E is the set of edges.

- Global Markov property: P satisfies the global Markov property with respect to G if, for any disjoint vertex subsets A, B, and C such that C separates A and B, the random variables X_A are conditionally independent of X_B given X_C. Here, we say C separates A and B if every path from a node in A to a node in B passes through a node in C.
- Local Markov property: P satisfies the local Markov property with respect to G if the conditional distribution of a variable given its neighbors does not depend on the remaining nodes.
- Pairwise Markov property: P satisfies the pairwise Markov property with respect to G if for any pair of non-adjacent nodes s, t ∈ V, we have X_s ⊥ X_t | X_{V \ {s, t}}.

Separation
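The separation criterion can be checked mechanically: C separates A from B exactly when deleting the nodes in C disconnects A from B. A minimal Python sketch (the graph, the edge list, and the function name are illustrative, not from the slides):

```python
from collections import deque

def separates(edges, A, B, C):
    """Check whether vertex set C separates A from B in an undirected
    graph: after removing C, no path may connect A to B."""
    # Build an adjacency map from the undirected edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    blocked = set(C)
    # BFS from A, never entering a blocked (conditioned-on) node.
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    # Separation holds iff the search never reaches B.
    return seen.isdisjoint(B)

# Chain graph 1 - 2 - 3 - 4: node 2 separates {1} from {3, 4}.
edges = [(1, 2), (2, 3), (3, 4)]
print(separates(edges, {1}, {3, 4}, {2}))  # True: removing 2 disconnects them
print(separates(edges, {1}, {4}, set()))   # False: the chain connects 1 to 4
```

Under the global Markov property, a `True` result for (A, B, C) licenses the conditional independence X_A ⊥ X_B | X_C.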

Relationships of the Different Markov Properties

A distribution that satisfies the global Markov property is said to be a Markov random field or Markov network with respect to the graph.

Proposition 1: For any undirected graph G and any distribution P, we have:
global Markov property ⇒ local Markov property ⇒ pairwise Markov property.

Proposition 2: If the joint density p(x) of the distribution P is positive and continuous with respect to a product measure, then the pairwise Markov property implies the global Markov property. Therefore, for distributions with positive continuous densities, the global, local, and pairwise Markov properties are equivalent.

We usually say a distribution P is Markov to G if P satisfies the global Markov property with respect to the graph G.

Clique Decomposition

Unlike a DAG, which encodes the factorization via conditional probability distributions, a UG encodes it in terms of clique potentials, where a clique in a graph is a fully connected subset of vertices. A clique is a maximal clique if it is not contained in any larger clique.
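Maximal cliques can be enumerated with the classic Bron–Kerbosch recursion. The sketch below (the adjacency dictionary and function name are illustrative, not from the slides) finds the maximal cliques of a small graph:

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques via the Bron-Kerbosch recursion.
    adj maps each vertex to its set of neighbors."""
    cliques = []

    def bk(R, P, X):
        # R: current clique; P: candidates that extend R;
        # X: vertices already covered (prevents non-maximal output).
        if not P and not X:
            cliques.append(frozenset(R))
            return
        for v in list(P):
            bk(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    bk(set(), set(adj), set())
    return cliques

# Triangle 1-2-3 plus pendant edge 3-4: maximal cliques {1,2,3} and {3,4}.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = maximal_cliques(adj)
```

These maximal cliques are exactly the index set 𝒞 over which the clique-potential factorization on the next slide runs.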

Factorization by Clique Decompositions

Let 𝒞 be the set of all maximal cliques in a graph. A probability distribution factorizes with respect to the graph G if it can be written as a product of factors, one for each of the maximal cliques in the graph:

p(x_1, ..., x_d) = ∏_{C ∈ 𝒞} ψ_C(x_C).

Similarly, a set of clique potentials {ψ_C(x_C) ≥ 0}_{C ∈ 𝒞} determines a probability distribution that factorizes with respect to the graph G after normalization:

p(x_1, ..., x_d) = (1/Z) ∏_{C ∈ 𝒞} ψ_C(x_C).

The normalizing constant, or partition function, Z sums or integrates the product of potentials over all settings of the random variables. Note that Z may depend on parameters of the potential functions.
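The role of the partition function Z can be made concrete by brute-force enumeration on a toy model. The three-node chain and the pairwise potentials ψ(x_i, x_j) = exp(x_i · x_j) below are an assumed example, not taken from the slides:

```python
from itertools import product
import math

def unnormalized(x, potentials):
    """Product of clique potentials psi_C(x_C) for one configuration x."""
    val = 1.0
    for clique, psi in potentials:
        val *= psi(tuple(x[i] for i in clique))
    return val

# Chain MRF on binary variables x_0 - x_1 - x_2, with attractive
# pairwise potentials psi(x_i, x_j) = exp(x_i * x_j) on each edge.
potentials = [
    ((0, 1), lambda xc: math.exp(xc[0] * xc[1])),
    ((1, 2), lambda xc: math.exp(xc[0] * xc[1])),
]

# Partition function Z: sum the unnormalized measure over all 2^3
# configurations x in {-1, +1}^3.
Z = sum(unnormalized(x, potentials) for x in product([-1, 1], repeat=3))

# Normalized probability of one configuration.
p = unnormalized((1, 1, 1), potentials) / Z
```

Even for this tiny model, Z couples all the variables, which is why computing it is the hard part of inference in general UGs; here it reduces analytically to 2(e + 1/e)^2 because the chain factorizes over x_1.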

Factorization and Markov Property

Theorem 1: For any undirected graph G = (V, E), a distribution P that factorizes with respect to the graph also satisfies the global Markov property on the graph.

Next question: under what conditions do the Markov properties imply factorization with respect to a graph?

Theorem (Hammersley-Clifford-Besag; discrete version): Suppose that G = (V, E) is a graph and X_i, i ∈ V, are random variables that take on a finite number of values. If P(x) > 0 is strictly positive and satisfies the local Markov property with respect to G, then P factorizes with respect to G.

Factorization and Markov Property (continued)

For positive distributions:
global Markov property ⇔ local Markov property ⇔ factorization.

Comment

Next lecture: the relationships between DAGs and UGs; when can we convert a DAG to a UG, and how can we do it? (Hint: moralization; important for posterior inference.)

Reading: Section 4.5, Koller and Friedman (2009).