Dependency detection with Bayesian Networks


M V Vikhreva
Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Leninskie Gory, Moscow, 119991
Supervisor: A G Dyakonov
E-mail: vkhvmr@gmail.com

1. Introduction
Most methods aimed at detecting a dependency between two variables do not take into account interactions with numerous other variables and often make unfounded assumptions about the form of the relation. Bayesian networks help to avoid these drawbacks: they allow one to learn about causal relationships, they can incorporate prior knowledge, and they can readily handle incomplete data sets.

A Bayesian network is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For a long time Bayesian networks were used as a representation for encoding uncertain expert knowledge in expert systems. More recently, methods have been developed to learn Bayesian networks from data, and learning the structure of a Bayesian network has become a common problem in data analysis. This work is dedicated to determining the range of data dependencies that modern applied algorithms can detect with Bayesian networks. Specifically, experiments aimed at detecting a dependency between two variables were carried out using the pebl library [2].

Section 2 says a few more words about what a Bayesian network is. Section 3 presents the Python package pebl and discusses the tools it implements for structure learning and for learning probabilities in a fixed Bayesian-network structure. Section 4 presents tests of the pebl tools and experiments on how well pebl detects dependencies.

2. Bayesian network graphical model
Formally, Bayesian networks are DAGs whose nodes represent random variables. Edges represent conditional dependencies; nodes that are not connected represent variables that are conditionally independent of each other. Note, however, that the absence of an edge between two nodes does not imply actual independence of the corresponding variables, because they can still influence each other through other variables. Each node is associated with a probability function that takes as input a particular set of values of the node's parent variables and gives the probability distribution of the variable represented by the node. For example, if m parent nodes represent m Boolean variables, then the probability function can be represented by a table of 2^m entries, one for each of the 2^m possible combinations of the parents being true or false (see the example in Figure 1).

Figure 1. A simple Bayesian network and its conditional probability tables. The joint probability is P(G, S, R) = P(G | S, R) P(S | R) P(R), where G corresponds to the Grass Wet variable, S to Sprinkler and R to Rain.

In order to fully specify a Bayesian network, it is necessary to define the joint probability distribution of the statistical model, i.e. to specify for each node X the probability distribution of X conditional upon X's parents. This distribution may have any form; in this work discrete distributions are used, since they simplify the calculations.
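
As a concrete illustration of this factorization, the following Python sketch encodes conditional probability tables for the Rain/Sprinkler/Grass Wet network of Figure 1 and computes the marginal probability of wet grass by summing the joint P(G, S, R) = P(G | S, R) P(S | R) P(R) over S and R. The CPT numbers are illustrative placeholders, not the values shown in Figure 1.

    from itertools import product

    # Illustrative CPTs (placeholder numbers, not the values from Figure 1).
    P_R = {True: 0.2, False: 0.8}                     # P(R)
    P_S_given_R = {True: {True: 0.01, False: 0.99},   # P(S | R), outer key is R
                   False: {True: 0.4, False: 0.6}}
    P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,   # P(G = True | S, R)
                    (False, True): 0.8, (False, False): 0.0}

    def joint(g, s, r):
        # P(G = g, S = s, R = r) = P(G | S, R) * P(S | R) * P(R)
        p_g = P_G_given_SR[(s, r)] if g else 1.0 - P_G_given_SR[(s, r)]
        return p_g * P_S_given_R[r][s] * P_R[r]

    # Marginal probability that the grass is wet, summed over Sprinkler and Rain.
    p_wet = sum(joint(True, s, r) for s, r in product([True, False], repeat=2))
    print("P(Grass Wet) =", round(p_wet, 4))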

When only constraints on a distribution are known, the principle of maximum entropy can help to determine it: the result is the distribution that maximizes entropy while satisfying the constraints. Often these conditional distributions include parameters which are unknown and must be estimated from data, sometimes using the maximum likelihood approach. Direct maximization of the likelihood (or of the posterior probability) is often complex when there are unobserved variables. A classical approach to this problem is the expectation-maximization algorithm, which alternates between computing expected values of the unobserved variables conditional on the observed data and maximizing the complete likelihood (or posterior) under the assumption that the previously computed expected values are correct.

In the simplest case, a Bayesian network is specified by an expert and is then used to perform inference. Commonly, however, the task of defining the network is too complex for humans, and the network structure and the parameters of the local distributions must be learned from data. The common approach to structural learning is to introduce a statistically motivated scoring function that evaluates each network with respect to the training data and to search for the optimal network according to this score. The posterior probability of the structure given the training data can be used as the scoring function. Exhaustive maximization of the score is known to be an NP-hard problem, so heuristic search is often used.

3. Pebl library
In this paper, we introduce pebl, a Python library and application for learning Bayesian network structure from data. Although learning the structure of BNs from data is now common, there are still few high-quality open-source packages that can meet the needs of various users. End users require software that is easy to use; supports learning with different data types; can accommodate missing values and hidden variables; and can take advantage of computational clusters and grids. pebl, the Python Environment for Bayesian Learning, was designed to meet these needs.

3.1. Functionality
pebl:
- can work with continuous, discrete and categorical variables (and with variables of different types simultaneously);
- can perform maximum entropy discretization, in which the values of a single variable are compressed into a fixed number of bins according to the maximum entropy criterion (a minimal illustration of this binning is sketched after this list);
- uses the BDe metric for scoring networks and handles interventional data using the method described in [4];
- can handle missing values and hidden variables using exact marginalization and Gibbs sampling [1];
- supports structural priors over edges, specified as hard constraints or soft energy matrices (see [5] for details; this option was not used in our experiments), as well as arbitrary constraints specified as Python functions or lambda expressions.
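
The maximum entropy criterion amounts to placing roughly the same number of observations in each bin (equal-frequency binning), which maximizes the entropy of the bin labels. The sketch below is a minimal stand-alone illustration of the idea, not pebl's implementation.

    import numpy as np

    def max_entropy_discretize(values, num_bins):
        # Equal-frequency binning: quantile cut points put (almost) the same
        # number of observations into each bin, maximizing bin-label entropy.
        values = np.asarray(values, dtype=float)
        cut_points = np.quantile(values, np.linspace(0, 1, num_bins + 1)[1:-1])
        return np.searchsorted(cut_points, values, side="right")

    rng = np.random.default_rng(0)
    x = rng.normal(2, 10, size=100)
    bins = max_entropy_discretize(x, num_bins=5)
    print(np.bincount(bins))  # roughly 20 observations per bin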

Table 1. Popular Bayesian network structure learning software.

                      BANJO     BNT      Causal Explorer  Deal   LibB      PEBL
License               Academic  GPL      Academic         GPL    Academic  MIT
Scripting language    Matlab    Matlab   Matlab           R      N/A       Python
Interventional data   No        Yes      No               No     No        Yes
Dynamic BN            Yes       Yes      No               No     No        No
Structural priors     Yes       No       No               No     No        Yes
Missing data          No        Yes      No               No     Yes       Yes
Parallel execution    No        No       No               No     No        Yes

3.2. Using the pebl interface
- The types of the variables should be specified in the input data file.
- Apply pebl.data.discretize(num_bins) to compress the unique values of continuous or discrete variables into num_bins bins (so pebl treats continuous variables as discrete).
- For parallel calculations, use pebl.multiprocess.
- pebl.result.post[:10] gives access to the top-scoring network configurations, and pebl.result.cons to the consensus matrix C (where C_ij is a p-value referring to the edge from the i-th to the j-th node).
A short sketch of this workflow is given at the end of this section.

3.3. Convenience and Scalability
While many tasks related to Bayesian learning are embarrassingly parallel in theory, few software packages take advantage of this. pebl can execute learning tasks in parallel over multiple processors or CPU cores. With appropriate configuration settings and the use of parallel execution, pebl can be applied to large learning tasks. Although pebl has been tested successfully on datasets with 10000 variables and samples, BN structure learning is a known NP-hard problem [6], and analysis of datasets with more than a few hundred variables is likely to give poor results due to poor coverage of the search space.

3.4. Related Software Comparison
While there are many software tools for working with BNs, most focus on parameter learning and inference rather than structure learning. As shown in Table 1, the combination of handling interventional data, modelling with missing values, using soft and arbitrary priors and exploiting parallel platforms is unique to pebl. pebl, however, does not currently provide any features for learning dynamic Bayesian networks (Bayesian networks that relate variables to each other over adjacent time steps). pebl can also be slower than software written in C/C++ or Java because it uses Python libraries, but it benefits from parallel learning and its edge-prior implementation, which make it applicable to a wider range of problems than the other packages.
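
To make the steps of section 3.2 concrete, here is a minimal sketch of a pebl session. The module paths, argument names (num_bins, poolsize) and the result attributes post and cons follow the names used in the text above and typical pebl examples; they are assumptions about the installed version rather than a verified API, so treat the sketch as a template for the workflow.

    from pebl import data
    from pebl.learner import greedy
    from pebl.taskcontroller import multiprocess

    # Variable names and types are declared in the header of the data file.
    dataset = data.fromfile("experiment.txt")
    dataset.discretize(num_bins=65)            # name as given in section 3.2

    learner = greedy.GreedyLearner(dataset)
    controller = multiprocess.MultiProcessController(poolsize=4)
    result = controller.run([learner])[0]      # parallel execution over CPU cores

    top_networks = result.post[:10]            # ten highest-scoring structures
    consensus = result.cons                    # C[i][j]: edge "p-value" (frequency)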

4. Experiments
This section presents several tests of pebl's dependency detection abilities. To check whether the program detects a dependency between two variables, we use as input the features listed in Table 2.

Table 2. Features used to detect the dependency x → f(x).

Feature              Name   Values
main feature         x      N(2, 10)
dependency           f(x)   f(x)
independent feature  y      N(2, 10)

Since the output of a pebl learner consists of the top-scoring networks and a consensus matrix, we use the minimum p-value over the edges we expect to be present and the maximum p-value over all edges connecting independent features to verify the algorithm's answer. The p-value of an edge is the fraction of top-scoring networks that contain the edge. The best p-value an edge can have is 1 and the lowest is 0, so the p-value represents the probability that the edge is present in the optimal BN structure for the variables.

A Bayesian network contains no cycles, so a dependency such as a linear relation can be represented by two structures (see Figure 2).

Figure 2. For most functions f(x) these are equivalent structures.

Maximum entropy discretization plays an important role in dependency detection. Given x, y and f(x + n_1) + n_2, where n_1 ~ N(0, 0.2) and n_2 ~ N(0, 0.1), we vary num_bins and check how the p-values of the true edges (those connecting dependent features) change. Our dataset consists of 100 objects. In Figure 3, the p-values of the true edges exceed those of the false edges (those connecting independent features) in the num_bins intervals (35, 38) and (50, 75); that is where the dependency x → f(x) is detected. The value num_bins = 65 seems the most reliable.

How can the optimal num_bins value be chosen without knowing the true edges? Figure 3 shows the p-values of both false and true edges. The regions where the false-edge p-values are nonzero are where the network fails to recognise independence. It is therefore best to select a num_bins value for which the network is more confident in the true edges than in the false ones (for example, num_bins in [55, 75]).

Finally, we add normally distributed noise to the data to probe the limits of pebl's linear dependency detection. The dependency is recognised quite well (Figure 4); an example of the noised data is shown in Figure 5. Polynomial dependency detection is shown in Figures 6 and 7.

In Figures 3 and 6 it is worth noticing that 0.5 is both a frequent value and an upper limit for the true-edge p-values. The configurations x → f(x) and f(x) → x appear in the top-scoring networks equally often, so each of them gets a p-value of 0.5. This supports our assumption about the equivalence of the two configurations.
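
As an illustration, the experimental dataset described above could be generated as follows (100 objects; x and the independent feature y drawn from N(2, 10); the dependent feature f(x + n_1) + n_2 with n_1 ~ N(0, 0.2) and n_2 ~ N(0, 0.1)). The linear choice f(x) = 2x + 1 and the reading of N(2, 10) as mean 2, standard deviation 10 are assumptions for illustration; the text does not fix them.

    import numpy as np

    rng = np.random.default_rng(42)
    n_objects = 100

    # N(2, 10) is read here as mean 2, standard deviation 10 (an assumption).
    x = rng.normal(2, 10, n_objects)      # main feature
    y = rng.normal(2, 10, n_objects)      # independent feature

    f = lambda v: 2 * v + 1               # assumed linear dependency
    n1 = rng.normal(0, 0.2, n_objects)    # noise inside f
    n2 = rng.normal(0, 0.1, n_objects)    # additive noise
    fx = f(x + n1) + n2                   # dependent feature

    # Columns x, f(x), y; rows are the 100 objects fed to the structure learner.
    dataset = np.column_stack([x, fx, y])
    print(dataset.shape)                  # (100, 3)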

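Since the p-value of an edge is defined above as the fraction of top-scoring networks containing that edge, it can be computed directly from a list of learned structures. The sketch below is a toy illustration of this definition, not pebl's consensus-matrix code; each network is represented simply as a set of (parent, child) pairs, with hypothetical contents.

    from itertools import permutations

    # Hypothetical top-scoring structures over the three features of Table 2.
    top_networks = [
        {("x", "f(x)")},
        {("f(x)", "x")},
        {("x", "f(x)"), ("y", "f(x)")},
    ]

    nodes = ["x", "f(x)", "y"]
    consensus = {
        edge: sum(edge in net for net in top_networks) / len(top_networks)
        for edge in permutations(nodes, 2)
    }
    print(consensus[("x", "f(x)")])   # 2/3: fraction of networks with this edge
    print(consensus[("y", "x")])      # 0.0: this edge never appears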
Figure 3. Edge p-values (varying num_bins) for the linear dependency dataset.
Figure 4. Edge p-values (varying noise N(0, sigma)) for the linear dependency dataset.
Figure 5. Linear dataset with noise sigma = 10.
Figure 6. Edge p-values (varying noise N(0, sigma)) for the polynomial dependency dataset.
Figure 7. Polynomial dataset with noise sigma = 5.

5. Conclusion
We introduced the Bayesian network library pebl, reviewed the tools it implements and made several suggestions on how to work with it. The experiments show that, for detecting a dependency between two variables, it works much better than the human eye. With the use of parallel computation, Bayesian network structure learners can be a powerful dependency detection tool.

References
[1] Heckerman D 1998 A tutorial on learning with Bayesian networks The MIT Press
[2] Shah A and Woolf P 2009 Python Environment for Bayesian Learning: Inferring the Structure of Bayesian Networks from Knowledge and Data JMLR
[3] Pe'er D, Regev A, Elidan G and Friedman N 2001 Inferring subnetworks from perturbed expression profiles Bioinformatics
[4] Yoo C, Thorsson V and Cooper G F 2002 Discovery of causal relationships in a gene-regulation pathway from a mixture of experimental and observational DNA microarray data Pac Symp Biocomput
[5] Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S and Miyano S 2003 Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks Proc. 2003 IEEE Bioinformatics Conf. pp 104-113
[6] Chickering D M, Geiger D and Heckerman D 1994 Learning Bayesian networks is NP-hard Technical Report MSR-TR-94-17 (Microsoft Research)