SCORE EQUIVALENCE & POLYHEDRAL APPROACHES TO LEARNING BAYESIAN NETWORKS

SCORE EQUIVALENCE & POLYHEDRAL APPROACHES TO LEARNING BAYESIAN NETWORKS
David Haws*, James Cussens, Milan Studeny
IBM Watson, dchaws@gmail.com
University of York, Deramore Lane, York, YO10 5GE, UK
The Institute of Information Theory and Automation of the CAS, Pod Vodárenskou věží 4, Prague, 182 08, Czech Republic

Definition. A Bayesian network is a graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).
Nodes → random variables
Edges → conditional dependencies
A node (RV) is conditionally independent of its non-descendants, given the state of all its parents; equivalently, a node (RV) is conditionally independent of all other nodes, given its Markov blanket.
Variables X, Y are conditionally independent (CI) given Z if Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z).
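The factorization defining conditional independence can be checked numerically. Below is a minimal Python sketch (with made-up probability values) that builds a joint distribution in which X and Y are independent given Z, and verifies the identity Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z):

```python
from itertools import product

# Hypothetical conditional tables: Pr(Z=1), Pr(X=1|Z=z), Pr(Y=1|Z=z).
pz = {0: 0.4, 1: 0.6}
px_z = {0: 0.7, 1: 0.2}
py_z = {0: 0.1, 1: 0.9}

def bern(p, v):
    """Probability of value v under Bernoulli(p)."""
    return p if v == 1 else 1 - p

# Joint distribution built so that X and Y are CI given Z:
# Pr(x, y, z) = Pr(z) * Pr(x|z) * Pr(y|z).
p = {(x, y, z): pz[z] * bern(px_z[z], x) * bern(py_z[z], y)
     for x, y, z in product((0, 1), repeat=3)}

def pr(pred):
    """Total probability of the event described by pred(x, y, z)."""
    return sum(q for xyz, q in p.items() if pred(*xyz))

# Check Pr(X=1, Y=1 | Z=0) == Pr(X=1 | Z=0) * Pr(Y=1 | Z=0).
z0 = pr(lambda x, y, z: z == 0)
lhs = pr(lambda x, y, z: x == 1 and y == 1 and z == 0) / z0
rhs = (pr(lambda x, y, z: x == 1 and z == 0) / z0) * \
      (pr(lambda x, y, z: y == 1 and z == 0) / z0)
assert abs(lhs - rhs) < 1e-12
```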

Examples

Learning Bayesian Networks
Given observations of the variables, learn the best-fitting Bayesian network. This is NP-hard. How do we find the right DAG? Scoring criteria!

Scoring Criteria
A scoring function Q(G, D) evaluates how well a DAG G explains the data D. We consider only score-equivalent and decomposable scoring functions, e.g. the Bayesian Information Criterion (BIC) and Bayesian Dirichlet equivalent (BDe). Roughly: likelihood of the graph given the data, plus a penalty for complex graphs.
WARNING: Two different DAGs may represent the same probability model! If so, they are called Markov equivalent.
Score equivalent: the scores of two Markov-equivalent graphs are the same.

Score Decomposable
A scoring function Q is score decomposable if there exist functions (local scores) q_{i|B} : DATA({i} ∪ B, d) → ℝ such that

Q(G, D) = Σ_{i ∈ N} q_{i | pa_G(i)} ( D_{{i} ∪ pa_G(i)} ),

where pa_G(i) denotes the parents of node i in the graph G and the sum runs over the random variables (nodes). Score DAGs by summing the local score of each node and its parents!
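As a concrete illustration, here is a minimal Python sketch of score decomposition; the local-score values below are hypothetical stand-ins for, e.g., per-family BIC terms computed from data:

```python
# Hypothetical local scores q_{i|B}, keyed by (node, parent set).
local_score = {
    ('a', frozenset()):           -10.0,
    ('b', frozenset({'a'})):       -4.0,
    ('b', frozenset()):            -7.0,
    ('c', frozenset({'a', 'b'})):  -3.5,
}

def score(dag):
    """Decomposable score: dag maps each node to its parent set;
    Q(G, D) is the sum of the local scores of the families of G."""
    return sum(local_score[(i, frozenset(pa))] for i, pa in dag.items())

g = {'a': set(), 'b': {'a'}, 'c': {'a', 'b'}}
assert score(g) == -17.5   # -10.0 + -4.0 + -3.5
```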

Family Variable Representation
Given a DAG G over variables N, record each node's parent set: the family-variable vector η_G has a 0/1 component η_G(i, B) for every node i ∈ N and every candidate parent set B ⊆ N \ {i}, with η_G(i, B) = 1 iff pa_G(i) = B. Two Markov-equivalent graphs may still be different DAGs, and hence have different family-variable representations!
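A small Python sketch (hypothetical two-node example) of the family-variable encoding, illustrating that two Markov-equivalent DAGs still get different vectors:

```python
from itertools import combinations

def family_vector(dag, nodes):
    """0/1 family-variable vector: eta[(i, B)] = 1 iff pa_G(i) = B."""
    eta = {}
    for i in nodes:
        others = [j for j in nodes if j != i]
        for r in range(len(others) + 1):
            for B in combinations(others, r):
                eta[(i, frozenset(B))] = 1 if set(B) == dag[i] else 0
    return eta

nodes = ('x', 'y')
g1 = {'x': set(), 'y': {'x'}}   # x -> y
g2 = {'y': set(), 'x': {'y'}}   # y -> x  (Markov equivalent to g1)
# Markov-equivalent DAGs get *different* family vectors:
assert family_vector(g1, nodes) != family_vector(g2, nodes)
# Each node has exactly one parent set (the convexity constraints):
assert sum(family_vector(g1, nodes).values()) == len(nodes)
```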

Family Variable Polytope (FVP)
Vertex ↔ DAG
Dimension = N(2^(N-1) - 1)
Facet description known for N = 1, 2, 3, 4
No complete facet description for N > 4; some facets known for N > 4
Simple IP relaxation; an IP solution gives a DAG

Characteristic Imset Representation (Studeny)
Goal: a unique vector representation of Markov equivalence classes of DAGs.
Notation: suppose N random variables. We index the components of the characteristic imset c using subsets S ⊆ N with |S| ≥ 2.

Characteristic Imset Representation
Given a DAG G over variables N, one has c_G(S) ∈ {0, 1} for any S ⊆ N with |S| ≥ 2, and c_G(S) = 1 iff there exists i ∈ S with S \ {i} ⊆ pa_G(i), where pa_G(i) is the set of parents of i in G.
Theorem (Studeny, Hemmecke, Lindner 2011): Characteristic imsets ↔ Markov equivalence classes.
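The defining condition translates directly into code. A minimal Python sketch (toy three-node DAGs) computing the characteristic imset, and checking that two Markov-equivalent chains share an imset while the collider does not:

```python
from itertools import combinations

def characteristic_imset(dag, nodes):
    """c_G(S) = 1 iff some i in S has S \\ {i} contained in pa_G(i),
    for every S with |S| >= 2; dag maps node -> parent set."""
    c = {}
    for r in range(2, len(nodes) + 1):
        for S in combinations(sorted(nodes), r):
            c[S] = int(any(set(S) - {i} <= dag[i] for i in S))
    return c

nodes = ['x', 'y', 'z']
chain    = {'x': set(), 'y': {'x'}, 'z': {'y'}}       # x -> y -> z
rchain   = {'z': set(), 'y': {'z'}, 'x': {'y'}}       # x <- y <- z
collider = {'x': set(), 'z': set(), 'y': {'x', 'z'}}  # x -> y <- z
# The two chains are Markov equivalent and share one imset...
assert characteristic_imset(chain, nodes) == characteristic_imset(rchain, nodes)
# ...while the collider (a different equivalence class) gets another:
assert characteristic_imset(chain, nodes) != characteristic_imset(collider, nodes)
```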

Characteristic Imset Polytope (CIP)
Vertex ↔ Markov equivalence class
Dimension = 2^N - N - 1
Facet description known for N = 1, 2, 3, 4
No complete facet description for N > 4; some facets known for N > 4
Complex IP relaxation; an IP solution gives an equivalence class

Geometric Approach to Learning BN
Every reasonable scoring function (BIC, BDe, ...) is an affine function of the family-variable vector or of the characteristic imset, so integer and linear programming techniques can be applied (linear relaxations combined with row generation and branch-and-cut). Practical ILP methods and software exist based on both the FVP and the CIP.

FVP: Some Known Faces
- Order faces (Cussens et al.): DAGs consistent with a total order
- Sink faces (Cussens et al.): DAGs with sink j
- Non-negativity inequalities on family variables (H., Cussens, Studeny)
- Modified convexity constraints (H., Cussens, Studeny): node i has at most one parent set
- Generalized cluster inequalities (H., Cussens, Studeny), (Cussens et al.) (coming up)
- Connected matroids on C ⊆ N, |C| ≥ 2 (Studeny)
- Extreme supermodular set functions (H., Cussens, Studeny) (coming up)

Supermodular Set Functions
The set of supermodular vectors is a polyhedral cone. The cone is pointed, so it has finitely many extreme rays, and these extreme rays correspond to faces of the FVP.
Remark: The rank functions of connected matroids are extreme rays of the cone of non-decreasing submodular functions (the mirror image of the supermodular ones).

Cluster Inequalities
For every cluster C ⊆ N with |C| ≥ 2, the cluster inequality is

Σ_{i ∈ C} Σ_{B : B ∩ C = ∅} η(i, B) ≥ 1.

Why? Not all nodes in the cluster C can have a parent which is also in the cluster C.

Generalized Cluster Inequalities
For every cluster C and every 1 ≤ k ≤ |C|, the generalized cluster inequality is

Σ_{i ∈ C} Σ_{B : |B ∩ C| ≤ k-1} η(i, B) ≥ k.

Why? For any DAG G the induced subgraph G_C is acyclic, and each of the first k nodes of C in a total order consistent with G_C has at most k-1 parents in C.
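The counting argument above can be verified directly: for an acyclic G, every cluster C and every k must satisfy the inequality. A minimal Python sketch (toy DAG with hypothetical node names) that checks this by counting, for each cluster, how many of its nodes have at most k-1 parents inside it:

```python
from itertools import combinations

def check_generalized_cluster(dag):
    """dag maps node -> parent set (assumed acyclic).
    For every cluster C and every 1 <= k <= |C|, at least k nodes
    of C must have at most k-1 parents inside C."""
    nodes = sorted(dag)
    for r in range(2, len(nodes) + 1):
        for C in combinations(nodes, r):
            for k in range(1, r + 1):
                # Left-hand side of the inequality for this (C, k).
                lhs = sum(1 for i in C if len(dag[i] & set(C)) <= k - 1)
                assert lhs >= k, (C, k)
    return True

# Hypothetical 4-node DAG: a -> b, {a,b} -> c, {b,c} -> d.
g = {'a': set(), 'b': {'a'}, 'c': {'a', 'b'}, 'd': {'b', 'c'}}
assert check_generalized_cluster(g)
```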

Connecting FVP and CIP
There is a linear map between the family-variable and characteristic-imset representations (Studeny, Haws):

c(S) = Σ_{i ∈ S} Σ_{B ⊇ S \ {i}} η(i, B).

An objective o is score equivalent (SE) if it assigns the same value to the family-variable vectors of any two Markov-equivalent DAGs; BIC and BDe always give SE objectives! A face F is score equivalent if there exists an SE objective defining F. A face is weakly equivalent (WE) if the set of DAGs (vertices) it contains is closed under Markov equivalence.
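A minimal Python sketch of this linear map (toy three-node collider; the helper names are mine): it recomputes the characteristic imset of a DAG from its family-variable vector via c(S) = Σ_{i ∈ S} Σ_{B ⊇ S \ {i}} η(i, B):

```python
from itertools import combinations

def family_vector(dag, nodes):
    """0/1 family-variable vector: eta[(i, B)] = 1 iff pa_G(i) = B."""
    eta = {}
    for i in nodes:
        others = [j for j in nodes if j != i]
        for r in range(len(others) + 1):
            for B in combinations(others, r):
                eta[(i, frozenset(B))] = 1 if set(B) == dag[i] else 0
    return eta

def imset_from_eta(eta, nodes):
    """Linear map FVP -> CIP: c(S) sums eta(i, B) over i in S
    and parent sets B containing S \\ {i}."""
    c = {}
    for r in range(2, len(nodes) + 1):
        for S in combinations(sorted(nodes), r):
            c[S] = sum(v for (i, B), v in eta.items()
                       if i in S and set(S) - {i} <= B)
    return c

nodes = ['x', 'y', 'z']
g = {'x': set(), 'z': set(), 'y': {'x', 'z'}}   # collider x -> y <- z
c = imset_from_eta(family_vector(g, nodes), nodes)
assert c == {('x', 'y'): 1, ('x', 'z'): 0, ('y', 'z'): 1, ('x', 'y', 'z'): 1}
```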

Score Equivalence, FVP, & CIP
Theorem [H., Cussens, Studeny]. The following conditions are equivalent for a facet of the FVP with vertex set S:
a) S is closed under Markov equivalence;
b) S contains the whole equivalence class of full graphs;
c) the facet is SE.
Corollary [H., Cussens, Studeny]. There is a one-to-one, inclusion-preserving correspondence between SE faces of the FVP and faces of the CIP.
Corollary [H., Cussens, Studeny]. SE facets of the FVP correspond to those facets of the CIP that contain the 1-imset; none of those facets of the CIP contains the 0-imset. Moreover, these facets correspond to extreme supermodular functions.

Sufficiency of Score Equivalent Faces
Learning BN structure = optimizing an SE objective over the FVP. Are the SE faces of the FVP sufficient? Yes!
Theorem [H., Cussens, Studeny]. Let o be an SE objective. Then the LP problem of maximizing o over the FVP has the same optimal value as the LP problem of maximizing o over the polyhedron defined by (i) the SE faces of the FVP corresponding to facets of the CIP not containing the 0-imset, together with (ii) the non-negativity and modified convexity constraints.
This is not true for the SE facets alone: one must use all SE faces.

Open Conjecture: something to think on
Theorem (H., Cussens, Studeny). Every weakly equivalent facet of the family-variable polytope is a score equivalent facet.
Conjecture (Haws, Cussens, Studeny). Every weakly equivalent face of the family-variable polytope is a score equivalent face.
We believe the conjecture is false, but any counterexample must have N ≥ 4; we have already performed extensive searches for N = 4, 5, so far without success.

THANK YOU!
David Haws*, Milan Studeny, James Cussens
IBM Watson, dchaws@gmail.com
Reference: David Haws, James Cussens, Milan Studeny, "Polyhedral Approaches to Learning Bayesian Networks," to appear in a book based on the Special Session on Algebraic and Geometric Methods in Applied Discrete Mathematics, 2015.