A note on the pairwise Markov condition in directed Markov fields

TECHNICAL REPORT R-392, April 2012

A note on the pairwise Markov condition in directed Markov fields

Judea Pearl
University of California, Los Angeles
Computer Science Department
Los Angeles, CA 90095-1596, USA
(310) 825-3243
judea@cs.ucla.edu

April 9, 2012

Abstract

It is well known that, in directed Markov fields, the pairwise Markov condition does not imply the global Markov condition unless the distribution is strictly positive. We introduce a stronger version of the pairwise condition, which requires that every nonadjacent pair be independent conditional on every set that separates the pair in the graph. We show that this stronger condition is equivalent to the global Markov condition for all probability distributions. We generalize this result to abstract dependency models, and show that a weaker condition holds for compositional graphoids.

1 Introduction

A probability distribution P is said to be Markov relative to a directed acyclic graph (DAG) G if every d-separation condition in G is confirmed by a corresponding conditional independence condition in P. This condition is also called globally Markov, since it applies to all subsets of variables; when it holds, we say that P and G are compatible (Pearl, 1988; Lauritzen, 1996). Several local conditions have been devised that apply to singleton variables, among them the local Markov condition and the pairwise Markov condition. The local Markov condition requires that every variable be conditionally independent of its nondescendants, given its parents. This condition is often taken as a definition of Bayesian networks, and it can be shown to be equivalent to the global Markov condition for all probability distributions (Pearl and Verma, 1987; Geiger et al., 1990; Lauritzen, 1996).
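In symbols, the global and local conditions just described read as follows (the ⊥ notation is mine; pa(x) and nd(x) denote the parents and nondescendants of x, as defined below):

```latex
% Global Markov condition: every d-separation in G implies independence in P.
(X \perp_G Z \mid Y) \;\Longrightarrow\; (X \perp_P Z \mid Y)
    \qquad \text{for all disjoint } X, Y, Z \subseteq V.

% Local Markov condition: each variable is independent of its
% nondescendants given its parents.
x \perp_P \bigl( nd(x) \setminus pa(x) \bigr) \mid pa(x)
    \qquad \text{for every } x \in V.
```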
The pairwise Markov condition requires that every pair (x, y) of nonadjacent variables, with y a nondescendant of x, be independent conditional on all other nondescendants of x (Lauritzen, 1996, p. 50). This condition, however, while entailed by the global Markov condition, is too weak to ensure global compatibility unless the distribution is strictly positive (a simple counterexample is given below). This idiosyncratic feature of pairwise independencies sets directed Markov fields apart from their undirected counterparts, in which the global, local, and pairwise conditions coincide (Pearl and Paz, 1986; Pearl, 1988, p. 98; Lauritzen, 1996). (Footnote: the global-local equivalence holds, in fact, in all dependency models that obey the semi-graphoid axioms, not necessarily probabilistic ones.)

This note introduces a stronger version of the pairwise condition, named pairwise compatibility, which requires that every nonadjacent pair be independent conditional on every one of its separating sets in the DAG. We show that this stronger condition is equivalent to the global Markov condition for all probability distributions, including those that impose logical or equality constraints.

Notation

Let P be a joint distribution function on a set V of variables, and let G be a directed acyclic graph (DAG) whose vertices correspond to the variables in V. Let X, Y, and Z stand for sets of variables in V, and x, y, z for singleton variables in V. Let (X, Y, Z)_G stand for the assertion "Y d-separates X from Z in G," and let (X, Y, Z)_P stand for the assertion "X is independent of Z given Y in P."

Definitions

A DAG G is said to be set-compatible (SC) with a probability function P iff, for every three sets X, Y, and Z in V,

    (X, Y, Z)_G  ⇒  (X, Y, Z)_P.                              (1)

A DAG G is said to be pairwise-compatible (PC) with P iff

    (x, Y, z)_G  ⇒  (x, Y, z)_P                               (2)

for every set Y and every pair of singleton variables x and z. Clearly, set-compatibility implies pairwise compatibility. We will show that the converse is also true.

Theorem 1 Pairwise compatibility implies set-compatibility.

Proof: It is enough to prove that pairwise compatibility implies (x, pa(x), nd(x))_P, where pa(x) is the set of parents of x and nd(x) is the set of nondescendants of x.
The reason is that the condition (x, pa(x), nd(x))_P for all x implies set-compatibility (see Causality, Pearl 2009, p. 19, Theorem 1.2.7).

Proof by induction: Assuming PC, we prove that (x, pa(x), nd(x))_P holds for every variable x, using induction on the cardinality of sets of nondescendants of x. Let I_k stand for the hypothesis:

I_k: if (x, Y, z)_G ⇒ (x, Y, z)_P for every set Y and all singletons x and z, then for every x we have (x, pa(x), S_k)_P, where S_k is any set of nondescendants of x such that card(S_k) ≤ k.

Base: For k = 1, I_k is trivially true, because I_1 reduces to an instance of the pairwise condition itself. Indeed, I_1 says: if (x, Y, z)_G ⇒ (x, Y, z)_P for every set Y and all singletons x and z, then for every x we have (x, pa(x), S_1)_P, where S_1 is any set of nondescendants of x such that card(S_1) ≤ 1. But the sets S_1 are all singletons, hence whatever is true for a singleton z is true for S_1 as well.

The induction step is I_k ⇒ I_{k+1}. Let S_{k+1} be any set of nondescendants of x such that card(S_{k+1}) ≤ k + 1, let z be any singleton variable in S_{k+1}, and let S = S_{k+1} \ {z}. Let T stand for the claim of pairwise compatibility, i.e., (x, Y, z)_G ⇒ (x, Y, z)_P for every set Y and all singletons x and z.

From I_k we have T ⇒ (x, pa(x), S)_P, because the cardinality of S does not exceed k. We also have (x, pa(x), S ∪ {z})_G, because S and z are both in nd(x) of G. Since d-separation is a graphoid, and every graphoid obeys the weak union axiom, this implies (x, pa(x) ∪ S, z)_G; and because z is a singleton, pairwise compatibility yields T ⇒ (x, pa(x) ∪ S, z)_P.

In summary, we now have

    T ⇒ (x, pa(x), S)_P,
    T ⇒ (x, pa(x) ∪ S, z)_P,

from which we conclude, using the contraction axiom for probabilistic independence (Causality, Pearl 2009, p. 11),

    T ⇒ (x, pa(x), S ∪ {z})_P,  that is,  T ⇒ (x, pa(x), S_{k+1})_P.

This completes the induction step. QED
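The d-separation assertions (x, Y, z)_G appearing throughout can be tested mechanically. Below is a minimal sketch, assuming the standard moralization criterion for d-separation (the function and variable names are mine, not from the note):

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All ancestors of `nodes` in the DAG (including the nodes themselves)."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """True iff ys d-separates xs from zs in the DAG given by the `parents` map.

    Uses the moralization criterion: restrict to the ancestral subgraph of
    xs | ys | zs, moralize (marry co-parents, drop directions), delete ys,
    and test whether xs can still reach zs."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}            # undirected moral graph
    for v in keep:
        ps = list(parents.get(v, ()))         # parents of kept nodes are ancestors, hence kept
        for p in ps:                          # keep parent-child edges
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):      # "marry" co-parents of v
            adj[a].add(b); adj[b].add(a)
    blocked = set(ys)
    seen = set(xs) - blocked
    stack = list(seen)
    while stack:                              # search for a path avoiding ys
        v = stack.pop()
        if v in zs:
            return False
        for w in adj[v] - blocked - seen:
            seen.add(w)
            stack.append(w)
    return True
```

With such a test in hand, checking pairwise compatibility of G with P amounts to verifying (x, Y, z)_P for every nonadjacent pair (x, z) and every set Y that the test certifies as separating.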

Example 1 Let x = y = z, and let w be independent of x, with

    P(x = 1) = P(x = 0) = P(w = 1) = P(w = 0) = 1/2.

This distribution is pairwise Markov with respect to the graph G_1 in Fig. 1(a), because every nonadjacent pair of variables x, y with y a nondescendant of x is independent conditional on all other nondescendants of x (in our case {z, w}). However, this distribution is not set-compatible with the graph, because (x, w, yz)_G holds in G_1 while (x, w, yz)_P does not hold in P.

Now examine our strong pairwise condition, PC: (x, y) are nonadjacent in G_1, with two separating sets, {w} and {w, z}. PC requires not only (x, {w, z}, y)_P but also (x, {w}, y)_P. While the first requirement holds in P, the second does not, which renders G_1 and P pairwise incompatible. In contrast, the graph G_2 depicted in Fig. 1(b) is pairwise compatible with P. Here, the only nonadjacent pair is (x, z), and it is separated by one set, {w, y}. Accordingly, pairwise compatibility requires that x and z be independent conditional on {w, y}, a requirement satisfied in P.

[Figure 1: (a) the graph G_1; (b) the graph G_2, each over the variables X, Y, Z, W.]

2 Conclusions

We showed that a strong version of the pairwise condition is sufficient to guarantee the global Markov condition in directed Markov fields. The strong version requires that every pair of nonadjacent nodes be independent conditional on every set that d-separates the pair. Clearly, the ability to replace set-compatibility with pairwise compatibility is not unique to graphs and probabilities, but extends to dependency models for which the semi-graphoid axioms hold.

Acknowledgments

This research was supported in parts by grants from NIH #1R01 LM009961-01, NSF #IIS-0914211 and #IIS-1018922, and ONR #N000-14-09-1-0665 and #N00014-10-1-0933.
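The independence claims of Example 1 can be confirmed by brute-force enumeration of the four-variable joint distribution; a small sketch (variable ordering and helper names are mine):

```python
from itertools import product

VARS = ('x', 'y', 'z', 'w')

def joint():
    """Joint of Example 1: x = y = z uniform over {0,1}; w independent, uniform."""
    dist = {}
    for v, w in product((0, 1), repeat=2):
        dist[(v, v, v, w)] = 0.25        # point (x, y, z, w) with x = y = z = v
    return dist

def cond_indep(dist, a, b, cond):
    """Check a _||_ b | cond by testing P(a,b|c) = P(a|c) P(b|c)
    for every value c of the conditioning set with P(c) > 0."""
    idx = {name: i for i, name in enumerate(VARS)}

    def marg(assign):                    # probability of a partial assignment
        return sum(p for point, p in dist.items()
                   if all(point[idx[n]] == v for n, v in assign.items()))

    for cvals in product((0, 1), repeat=len(cond)):
        c = dict(zip(cond, cvals))
        pc = marg(c)
        if pc == 0:
            continue
        for av, bv in product((0, 1), repeat=2):
            pab = marg({a: av, b: bv, **c})
            pa = marg({a: av, **c})
            pb = marg({b: bv, **c})
            if abs(pab * pc - pa * pb) > 1e-12:   # P(a,b|c) != P(a|c) P(b|c)
                return False
    return True
```

Running the checks reproduces the discussion: (x, {w, z}, y)_P holds while (x, {w}, y)_P fails, so G_1 and P are pairwise incompatible; the single requirement of G_2, (x, {w, y}, z)_P, holds.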

References

Geiger, D., Paz, A., and Pearl, J. (1990). Learning simple causal structures. Journal of Intelligent Systems, 50, 510-530.

Lauritzen, S. (1996). Graphical Models. Clarendon Press, Oxford.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press, New York.

Pearl, J. and Paz, A. (1986). On the logic of representing dependencies by graphs. In Proceedings of the 1986 Canadian AI Conference, Montreal, Ontario, Canada.

Pearl, J. and Verma, T. (1987). The logic of representing dependencies by directed acyclic graphs. In Proceedings of the Sixth National Conference on AI (AAAI-87), Seattle, WA.