Lecture 4: Wrap-up of Bayes net representation. Markov networks

Outline: Markov blanket, moral graph. Independence maps and perfect maps. Undirected graphical models (Markov networks).

January 10, 2006, COMP-526 Lecture 4

Recall from last time
A Bayes net can be viewed as an independence map (I-map) for some distribution. The I-map property means that the distribution factorizes according to the graph structure of the net. But the graph can have more arcs than necessary! Directed separation (d-separation) is a sound and complete way to characterize the conditional independencies implied by a given graph structure, i.e., those that hold in every distribution that factorizes according to it.

Isolating a node
Suppose we want the smallest set of nodes $U$ such that $X$ is independent of all other nodes in the network given $U$:
$X \perp (\{X_1, \dots, X_n\} - \{X\} - U) \mid U$.
What should $U$ be?

Markov blanket
Clearly, at least $X$'s parents and children should be in $U$. But this is not enough if there are v-structures: $U$ will also have to include $X$'s spouses, i.e., the other parents of $X$'s children. The set $U$ consisting of $X$'s parents, its children, and the other parents of its children is called the Markov blanket of $X$.
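Here is a minimal sketch of that definition. It assumes a DAG stored as a Python dict that maps each node to the list of its parents; this representation is hypothetical and chosen only for illustration. The function collects the three ingredients directly: the parents, the children, and the children's other parents.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], x: str) -> Set[str]:
    """Return x's parents, its children, and the other parents of its children."""
    blanket: Set[str] = set(parents[x])  # x's parents
    children = [c for c, ps in parents.items() if x in ps]
    for c in children:
        blanket.add(c)  # each child of x
        blanket.update(p for p in parents[c] if p != x)  # spouses: the child's other parents
    return blanket


# Hypothetical DAG, given as child -> list of parents:
# A -> X <- B,  X -> C <- D  (a v-structure at C),  C -> E
dag = {"A": [], "B": [], "D": [], "X": ["A", "B"], "C": ["X", "D"], "E": ["C"]}
print(sorted(markov_blanket(dag, "X")))  # ['A', 'B', 'C', 'D']
```

In this made-up graph, D belongs to the blanket of X only because of the v-structure X -> C <- D. The grandchild E is excluded because conditioning on C already separates it from X.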

Moral graphs
Given a DAG $G$, we define the moral graph of $G$ to be an undirected graph $U$ over the same set of vertices, such that the edge $(X, Y)$ is in $U$ if $X$ is in $Y$'s Markov blanket. If $G$ is an I-map of $p$, then $U$ will also be an I-map of $p$. But many independencies are lost when going to a moral graph. Moral graphs will prove to be useful when we talk about inference.

Perfect maps
A DAG $G$ is a perfect map of a distribution $p$ if it satisfies the following property, for all disjoint sets of nodes $X, Y, Z$:
$Z \text{ d-separates } X \text{ and } Y \text{ in } G \iff X \perp Y \mid Z \text{ in } p$.
A perfect map captures all the independencies of a distribution, and no others. Perfect maps are unique, up to DAG equivalence. How can we construct a perfect map for a distribution?

Example
Consider a distribution over 4 random variables $X, Y, Z, W$ such that:
$X \perp Y \mid \{Z, W\}$ and $Z \perp W \mid \{X, Y\}$.
Can you find an I-map for this distribution? Can you find a perfect map? Some distributions do not have perfect maps!
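$X$ is in $Y$'s Markov blanket exactly when $X$ is a parent, a child, or a co-parent of $Y$. The moral graph can therefore be built equivalently by dropping the direction of every arc and connecting ("marrying") every pair of parents that share a child. The sketch below does this for a hypothetical DAG stored as a dict from each node to its parent list. The function name `moral_graph` and the representation are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def moral_graph(parents: Dict[str, List[str]]) -> Set[FrozenSet[str]]:
    """Undirected edges linking each node to every member of its Markov blanket."""
    edges: Set[FrozenSet[str]] = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))  # keep each arc, dropping its direction
        for p, q in combinations(ps, 2):
            edges.add(frozenset((p, q)))  # "marry" every pair of co-parents
    return edges


# A small hypothetical DAG as child -> parents: A -> X <- B, X -> C <- D, C -> E
dag = {"A": [], "B": [], "D": [], "X": ["A", "B"], "C": ["X", "D"], "E": ["C"]}
print(sorted(tuple(sorted(e)) for e in moral_graph(dag)))
# [('A', 'B'), ('A', 'X'), ('B', 'X'), ('C', 'D'), ('C', 'E'), ('C', 'X'), ('D', 'X')]
```

The edges A-B and X-D are the added "moral" edges. In the DAG, A and B are marginally independent because their only connecting path runs through the collider X. In the moral graph they are joined directly, so that independence can no longer be read off by separation. This is one of the independencies lost in moralization.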

Example: Pathfinder (Heckerman, 1991)
Medical diagnostic system for lymph node diseases. Large net! 60 diseases, 100 symptoms and test results, 14,000 probabilities. The network was built by medical experts: 8 hours to determine the variables, 35 hours for the network topology, and 40 hours for the probability table values. Experts found it easy to invent causal links and probabilities. Pathfinder is now outperforming world experts in diagnosis. Commercialized by Intellipath and Chapman Hall Publishing; extended to other medical domains.

Typical applications for Bayes nets
Medical diagnosis; bioinformatics (data integration); risk assessment; environmental science (e.g., wildlife habitat viability, risk of foreign species invasion); analysis of demographic data; in general, diagnosis and causal reasoning tasks. Many commercial packages are available (e.g., Netica, Hugin, WinMine, ...). Sometimes Bayes net technology is incorporated in business software.

Undirected graphical models
So far we have used directed graphs as the underlying structure of a Bayes net. Why not use undirected graphs as well? For example, variables might not be in a causal relation but can still be correlated, like the pixels in a neighborhood of an image. An undirected graph over a set of random variables $\{X_1, \dots, X_n\}$ is called an undirected graphical model, a Markov random field, or a Markov network.

Conditional independence
We need to be able to specify, for a given graph, whether $X \perp Z \mid Y$ for any disjoint subsets of nodes $X, Y, Z$. In directed graphs, we did this using the Bayes Ball algorithm. In undirected graphs, independence can be established simply by graph separation: if every path from a node in $X$ to a node in $Z$ goes through a node in $Y$, we conclude that $X \perp Z \mid Y$. Hence, independence can be established by removing the nodes in the conditioning set and then doing reachability analysis on the remaining graph. What is the Markov blanket of a node in an undirected model?
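The recipe above (delete the conditioning set, then test reachability) can be sketched as a breadth-first search. This minimal sketch assumes the undirected graph is given as a dict of adjacency sets, and `separated` is a hypothetical helper name. In the slide's notation, `xs`, `given` and `zs` play the roles of $X$, $Y$ and $Z$, so a result of `True` means the graph implies $X \perp Z \mid Y$.

```python
from collections import deque
from typing import Dict, Iterable, Set


def separated(adj: Dict[str, Set[str]], xs: Iterable[str], zs: Iterable[str],
              given: Iterable[str]) -> bool:
    """Graph separation: True if every path from xs to zs passes through `given`.

    Assumes xs, zs and given are disjoint sets of nodes, as in the definition.
    """
    blocked = set(given)
    targets = set(zs)
    # Remove the conditioning nodes, then do a breadth-first reachability search from xs.
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in targets:
            return False  # reached zs along a path that avoids `given`
        for nb in adj[node]:
            if nb not in blocked and nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return True


# Hypothetical 4-cycle A - B - C - D - A, stored as adjacency sets.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
print(separated(cycle, {"A"}, {"C"}, {"B", "D"}))  # True
print(separated(cycle, {"A"}, {"C"}, {"B"}))       # False: path A - D - C avoids B
```

The search visits each node and edge at most once, so the test runs in time linear in the size of the graph.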

How expressive are undirected models?
Are undirected models more expressive than directed models? I.e., for any directed model, can we find an undirected model that satisfies exactly the same conditional independence relations? Are undirected models less expressive? I.e., for any undirected model, can we find a directed model that satisfies exactly the same conditional independencies?

Example: An undirected graph
[Figure: undirected graph over X, Y, Z, W]
Can we find a directed graph that satisfies the same independence relations?

Example: A directed graph
[Figure: directed graph over X, Y, Z]
Can we find an undirected graph that satisfies the same independence relations?

Local parameterization
In directed models, we had local probability models (CPDs) attached to every node, giving the conditional probability of the corresponding random variable given its parents. We want a similar property in undirected models: the joint probability distribution should factorize over the graph. This means that the joint can be written as a product of local factors, each of which depends on a subset of the variables. What should the local factors be?
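For contrast with the undirected case, here is a minimal sketch of the directed factorization. It assumes a hypothetical chain X -> Y -> Z over binary variables with made-up CPD values. Every CPD is itself normalized, so the product of the local factors sums to 1 with no further work. This property makes CPDs convenient local factors, and the question above asks what should replace them in an undirected model.

```python
from itertools import product

# Hypothetical CPDs for a directed chain X -> Y -> Z over binary variables (made-up numbers).
p_x = {0: 0.6, 1: 0.4}                                      # p(X)
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # p_y_given_x[x][y] = p(Y=y | X=x)
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}    # p_z_given_y[y][z] = p(Z=z | Y=y)


def joint(x: int, y: int, z: int) -> float:
    """Directed factorization: one local factor (a CPD) per node, multiplied together."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]


# Each CPD row sums to 1, so the product is already a normalized joint distribution.
total = sum(joint(x, y, z) for x, y, z in product((0, 1), repeat=3))
print(round(total, 10))  # 1.0
```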

Local parameterizations: Try 2
What about local marginal parameterizations? Suppose we express the joint as:
$p(X_1, \dots, X_n) = \prod_i p(X_i, \mathrm{Neighbors}(X_i))$
It is local and has a nice interpretation. So consider using it for an example:
[Figure: undirected graph over X, Y, Z]

Consider a pair of nodes $X$ and $Y$ that are not directly connected through an arc. According to the conditional independence interpretation, $X$ and $Y$ are independent given all the other nodes in the graph, $\{X_1, \dots, X_n\} - \{X, Y\}$. Hence, there must be a factorization in which they do not appear in the same factor. This suggests that we should define factors on cliques. Recall that a clique is a fully connected subset of nodes (i.e., there is an arc between every pair of nodes).

Example: what are the cliques?
[Figure: undirected graph over A, B, C, D]
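One way to see that the "Try 2" parameterization is not enough on its own is to check numerically whether the product of local marginals is even a distribution. The sketch below assumes a hypothetical undirected chain X - Y - Z, which is not necessarily the graph in the slide's figure, and a made-up joint distribution that satisfies $X \perp Z \mid Y$. For this chain, the factor for $Y$ is the full joint $p(X, Y, Z)$. The factors $p(X, Y)$ and $p(Y, Z)$ are all strictly below 1 here, so the product must sum to less than 1 and cannot equal the joint.

```python
from itertools import product
from typing import Dict, Tuple

VARS = ("X", "Y", "Z")
# Hypothetical undirected chain X - Y - Z.
NEIGHBORS = {"X": ("Y",), "Y": ("X", "Z"), "Z": ("Y",)}

# Hypothetical joint over binary variables (made-up numbers), built so that
# X is independent of Z given Y, as the chain requires: p(x) p(y | x) p(z | y).
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
joint: Dict[Tuple[int, int, int], float] = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}


def marginal(assignment: Dict[str, int], keep: Tuple[str, ...]) -> float:
    """Probability that the variables in `keep` take their values from `assignment`."""
    return sum(
        p
        for values, p in joint.items()
        if all(values[VARS.index(v)] == assignment[v] for v in keep)
    )


def local_marginal_product(assignment: Dict[str, int]) -> float:
    """'Try 2': one factor p(X_i, Neighbors(X_i)) per node, multiplied together."""
    result = 1.0
    for v in VARS:
        result *= marginal(assignment, (v,) + NEIGHBORS[v])
    return result


total = sum(
    local_marginal_product(dict(zip(VARS, values)))
    for values in product((0, 1), repeat=3)
)
print(round(total, 4))  # 0.1086, far from 1
```

A total of roughly 0.109 shows that the product is not normalized, let alone equal to $p(X, Y, Z)$.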