The qp Package. December 21, 2006

Type Package
Title q-order partial correlation graph search algorithm
Version 0.2-1
Date 2006-12-18
Author Robert Castelo <robert.castelo@upf.edu>, Alberto Roverato <alberto.roverato@unibo.it>
Maintainer Robert Castelo <robert.castelo@upf.edu>
Depends R (>= 2.2.1)
Description The q-order partial correlation graph search algorithm, q-partial, or qp, algorithm for short, is a robust procedure for structure learning of undirected Gaussian graphical Markov models from "small n, large p" data, that is, multivariate normal data coming from a number of random variables p larger than the number of multidimensional data points n as in the case of, e.g., microarray data.
License GPL version 2 or newer

R topics documented:

  jmlr06data
  qp
  qp.analyse
  qp.ci.test
  qp.clique
  qp.edge.prob
  qp.get.cliques
  qp.graph
  qp.hist
  qp.matrix.image
  qp.search

jmlr06data    Synthetic data from the article by Castelo and Roverato (2006)

Format

  Synthetic data generated from two graphs with 150 vertices, G1 and G2. In G1 the boundary of every vertex contains at most 5 vertices, while in G2 the boundary of every vertex contains at most 20 vertices.

  IC.bd5               inverse correlation matrix encoding the independence structure of G1
  IC.bd20              inverse correlation matrix encoding the independence structure of G2
  S.bd5.N20            sample covariance matrix from a sample of size 20 drawn from a normal distribution with mean 0 and inverse correlation matrix IC.bd5
  S.bd5.N50            sample covariance matrix from a sample of size 50 drawn from a normal distribution with mean 0 and inverse correlation matrix IC.bd5
  S.bd5.N150           sample covariance matrix from a sample of size 150 drawn from a normal distribution with mean 0 and inverse correlation matrix IC.bd5
  S.bd20.N20           sample covariance matrix from a sample of size 20 drawn from a normal distribution with mean 0 and inverse correlation matrix IC.bd20
  S.bd20.N50           sample covariance matrix from a sample of size 50 drawn from a normal distribution with mean 0 and inverse correlation matrix IC.bd20
  S.bd20.N150          sample covariance matrix from a sample of size 150 drawn from a normal distribution with mean 0 and inverse correlation matrix IC.bd20
  qp.out.bd5.n20.q10   output from qp.search applied to S.bd5.N20 with q=10 and T=500
  qp.out.bd20.n20.q10  output from qp.search applied to S.bd20.N20 with q=10 and T=500


qp    The package qp: summary information

Description

  This package provides functions for implementing the q-order partial-correlation graph search algorithm, q-partial, or qp, algorithm for short. The qp algorithm is a robust procedure for structure learning of undirected Gaussian graphical Markov models (UGGMMs) from "small n, large p" data, that is, multivariate normal data coming from a number of random variables p larger than the number of multidimensional data points n as in the case of, e.g., microarray data.

Data

  jmlr06data       synthetic data used in the referenced article

Functions

  qp.search        calculates the estimates of the non-rejection rates for every pair of variables
  qp.edge.prob     calculates the estimate of the non-rejection rate for a particular pair of variables; this function is also called by qp.search
  qp.ci.test       performs a test for conditional independence
  qp.analyse       provides some exploratory analyses on the output of qp.search
  qp.clique        calculates the maximum clique size as a function of the minimum threshold on the non-rejection rate for removing an edge
  qp.hist          shows a histogram of the estimated non-rejection rates obtained through qp.search
  qp.graph         returns the qp-graph, in the form of an incidence matrix, resulting from thresholding the non-rejection rates in the output of qp.search
  qp.matrix.image  makes an image plot of the absolute value of an inverse correlation matrix
  qp.get.cliques   finds the set of cliques of an undirected graph

  The package provides an implementation of the procedures described by Castelo and Roverato (2006) and is a contribution to the gR-project described by Lauritzen (2002).

Authors

  Robert Castelo, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain.
  Alberto Roverato, Dipartimento di Scienze Statistiche, Università di Bologna, Italy.

References

  Lauritzen, S. L. (2002). gRaphical Models in R. R News, 3(2):39.

qp.analyse    Performs some exploratory analyses on the q-partial graph

Description

  Using the output of qp.search this function provides some exploratory analyses on the resulting q-partial graph.

Usage

  qp.analyse(qp.output, threshold, largest.clique=TRUE, plot.image=TRUE,
             exact.calculation=FALSE, approximation.iterations=100)

Arguments

  qp.output                 output of qp.search
  threshold                 threshold on the minimum non-rejection rate required for edge removal
  largest.clique            when this flag is set to TRUE it calculates the size of the largest clique
  plot.image                when this flag is set to TRUE it plots the incidence matrix resulting from thresholding the non-rejection rates
  exact.calculation         when this flag is set to TRUE, the exact maximum clique size is calculated and when set to FALSE a lower bound is calculated instead. It applies only when largest.clique=TRUE
  approximation.iterations  number of iterations performed to calculate the lower bound on the clique number of each graph. It applies only when largest.clique=TRUE and exact.calculation=FALSE

Details

  Returns an object of the class matrix showing the number of selected edges, the number of edges of the complete graph and the percentage of selected edges. When largest.clique=TRUE it also gives the size of the largest clique, and when plot.image=TRUE it plots the incidence matrix resulting from thresholding the non-rejection rates.

  Beware that setting largest.clique=TRUE and exact.calculation=TRUE with a threshold between 0.95 and 1.0 (which may result in very dense graphs) can lead to a very long computation time. Calculating the size of the largest clique is an NP-hard problem (its decision version is NP-complete), so the running time can grow exponentially as a function of the graph density (cf. Pardalos and Xue, 1994).

  The lower bound on the maximum clique size is calculated by ranking the vertices by their connectivity degree, putting the first vertex in a set, and going through the rest of the ranking, adding to the set those vertices that form a clique with the vertices currently in the set. Once the entire ranking has been examined, a large clique should have been built and hopefully the largest one. This process is repeated a number of times (approximation.iterations), and in each repetition the ranking is altered with increasing levels of randomness, cyclically (altering from 1 to p vertices, then starting again). Larger values of approximation.iterations should provide tighter lower bounds and eventually the exact maximum clique size (the clique number).

References

  Pardalos, P.M. and Xue, J. (1994). The maximum clique problem, J. Global Optim., 4:301-328.

See Also

  qp.search, qp.clique

Examples

  qp.analyse(qp.out.bd5.n20.q10, threshold=0.9, largest.clique=TRUE)


qp.ci.test    Conditional independence test

Description

  Performs a test for conditional independence between the variables indexed by i and j given the conditioning set Q.

Usage

  qp.ci.test(S, N, i=1, j=2, Q=c(), binary=TRUE)

Arguments

  S       sample variance-covariance matrix
  N       sample size
  i       index of one variable
  j       index of another variable
  Q       conditioning set
  binary  flag to switch to the compiled C code
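A test of this kind can be sketched in base R. The block below is only an illustration under stated assumptions: it uses the standard t-test for a zero partial correlation with N - 2 - |Q| degrees of freedom, which may differ in detail from what the package computes, and the helper name ci_test_sketch is hypothetical (it is not part of qp).

```r
# Hypothetical helper, not part of the qp package.
# Assumption: the test is the usual t-test for zero partial correlation
# between variables i and j given the variables in Q.
ci_test_sketch <- function(S, N, i, j, Q = integer(0)) {
  df <- N - 2 - length(Q)
  stopifnot(df > 0)
  idx <- c(i, j, Q)
  # Inverse of the marginal covariance of (i, j, Q)
  K <- solve(S[idx, idx, drop = FALSE])
  # Partial correlation of i and j given Q
  r <- -K[1, 2] / sqrt(K[1, 1] * K[2, 2])
  t.value <- r * sqrt(df / (1 - r^2))
  p.value <- 2 * pt(-abs(t.value), df)
  list(t.value = t.value, p.value = p.value)
}

# Usage on simulated independent variables
set.seed(1)
X <- matrix(rnorm(50 * 6), nrow = 50)
res <- ci_test_sketch(cov(X), N = 50, i = 1, j = 2, Q = c(3, 4))
```

The returned list mirrors the t.value and p.value components documented for qp.ci.test, so the two can be compared side by side on the same inputs.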

Details

  By default binary=TRUE and the compiled and faster C code corresponding to this function will be executed. If binary=FALSE is set, then the R code will be executed.

Value

  t.value  the t-statistic value
  p.value  the p-value for the null hypothesis of conditional independence

See Also

  qp.edge.prob

Examples

  S <- S.bd5.N20
  N <- 20
  qp.ci.test(S, N, i=3, j=4, Q=c(5,6,7))


qp.clique    Relationship between non-rejection rate and maximum clique size

Description

  Using the output of qp.search this function calculates the maximum clique size as a function of the minimum threshold on the non-rejection rate for removing an edge.

Usage

  qp.clique(qp.output, N, threshold.lim=c(0,1), breaks=5, plot.image=TRUE,
            exact.calculation=FALSE, approximation.iterations=100)

Arguments

  qp.output                 output of qp.search
  N                         sample size
  threshold.lim             range of the non-rejection rate threshold on which to calculate the function
  breaks                    one of: a vector giving the breakpoints along the range defined by threshold.lim; a single number giving the number of equidistant breakpoints that divide the range defined by threshold.lim
  plot.image                when this flag is set to TRUE, the qp.clique plot is produced
  exact.calculation         when this flag is set to TRUE, the exact maximum clique size is calculated and when set to FALSE a lower bound is calculated instead
  approximation.iterations  number of iterations performed to calculate the lower bound on the clique number of each graph. It applies only when exact.calculation=FALSE

Details

  The qp.clique plot provides information on the graphs potentially selected by specifying different values of the threshold. Every circle in the plot corresponds to a graph and has three values associated with it: the threshold value used to construct the graph (horizontal axis); the number of vertices of the largest clique of the graph (vertical axis); the percentage of present edges in the graph (number inside the plot, beside the circle). Furthermore, adjacent circles are joined by a line and the dotted horizontal line corresponds to the sample size N.

  Beware that setting exact.calculation=TRUE and giving breakpoints between 0.95 and 1.0 may result in very dense graphs, which can lead to a very long computation time. Calculating the size of the largest clique is an NP-hard problem (its decision version is NP-complete), so the running time can grow exponentially as a function of the graph density (cf. Pardalos and Xue, 1994).

  The lower bound on the maximum clique size is calculated by ranking the vertices by their connectivity degree, putting the first vertex in a set, and going through the rest of the ranking, adding to the set those vertices that form a clique with the vertices currently in the set. Once the entire ranking has been examined, a large clique should have been built and hopefully the largest one. This process is repeated a number of times (approximation.iterations), and in each repetition the ranking is altered with increasing levels of randomness, cyclically (altering from 1 to p vertices, then starting again). Larger values of approximation.iterations should provide tighter lower bounds and eventually the exact maximum clique size (the clique number).

Value

  threshold  threshold on the non-rejection rate that provides the maximum clique size that is strictly smaller than the sample size N
  size       maximum clique size strictly smaller than the sample size N
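The greedy lower bound described in the Details can be illustrated with a short base R sketch. This is not the package's code: the function name clique_lower_bound_sketch is hypothetical, and the way the ranking is perturbed (shuffling k randomly chosen positions, with k cycling through 0, 1, ..., p) is an assumption made only to have something concrete to run. The input A is assumed to be a symmetric 0/1 matrix with a zero diagonal.

```r
# Hypothetical sketch, not the qp implementation.
clique_lower_bound_sketch <- function(A, iterations = 100) {
  p <- nrow(A)
  # Rank the vertices by decreasing connectivity degree
  ranking <- order(rowSums(A != 0), decreasing = TRUE)
  best <- 0
  for (it in seq_len(iterations)) {
    ord <- ranking
    # Assumed perturbation scheme: k cycles through 0, 1, ..., p;
    # k < 2 leaves the ranking unchanged
    k <- (it - 1) %% (p + 1)
    if (k >= 2) {
      pos <- sample.int(p, k)
      ord[pos] <- ord[pos[sample.int(k)]]
    }
    # Greedy pass: keep a vertex if it is adjacent to all vertices kept so far
    clique <- integer(0)
    for (v in ord) {
      if (all(A[v, clique] != 0)) clique <- c(clique, v)
    }
    best <- max(best, length(clique))
  }
  best
}

# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3
A <- matrix(0, 4, 4)
A[rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))] <- 1
A <- A + t(A)
lb <- clique_lower_bound_sketch(A, iterations = 20)
```

Because every greedy pass builds a valid clique, the returned value can never exceed the true clique number; more iterations can only raise it.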

References

  Pardalos, P.M. and Xue, J. (1994). The maximum clique problem, J. Global Optim., 4:301-328.

See Also

  qp.search

Examples

  qp.clique(qp.out.bd5.n20.q10, 20)


qp.edge.prob    Estimate of the non-rejection rate

Description

  Calculates the estimate of the non-rejection rate for a pair of variables, that is, the proportion of conditional independence tests that do not reject the null hypothesis of zero partial correlation given the q-order conditionals.

Usage

  qp.edge.prob(S, N, i=1, j=2, q=0, T=500, significance=0.05, binary=TRUE)

Arguments

  S             sample variance-covariance matrix
  N             sample size
  i             index of one variable
  j             index of another variable
  q             partial-correlation order
  T             number of tests per adjacency
  significance  significance level of each test
  binary        flag to switch to the compiled C code

Details

  By default binary=TRUE and the compiled and faster C code corresponding to this function will be executed. If binary=FALSE is set, then the R code will be executed.
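The non-rejection rate can be estimated by Monte Carlo sampling of conditioning sets, as in the base R sketch below. Several things here are assumptions rather than documented behaviour: the conditioning sets are drawn uniformly at random among the remaining variables, each test is the standard t-test for zero partial correlation, and a test counts as a non-rejection when its p-value is at least the significance level. The names pcor_pvalue, nrr_sketch and n.tests (standing in for T) are hypothetical.

```r
# Hypothetical helpers for illustration; not the qp package's implementation.
pcor_pvalue <- function(S, N, i, j, Q) {
  idx <- c(i, j, Q)
  K <- solve(S[idx, idx, drop = FALSE])
  r <- -K[1, 2] / sqrt(K[1, 1] * K[2, 2])   # partial correlation given Q
  df <- N - 2 - length(Q)
  2 * pt(-abs(r * sqrt(df / (1 - r^2))), df)
}

nrr_sketch <- function(S, N, i, j, q, n.tests = 500, significance = 0.05) {
  others <- setdiff(seq_len(ncol(S)), c(i, j))
  stopifnot(q <= length(others), N - 2 - q > 0)
  not.rejected <- 0
  for (k in seq_len(n.tests)) {
    # Assumption: conditioning sets of size q are sampled uniformly at random
    Q <- others[sample.int(length(others), q)]
    if (pcor_pvalue(S, N, i, j, Q) >= significance) {
      not.rejected <- not.rejected + 1
    }
  }
  not.rejected / n.tests
}

# Usage on simulated independent variables
set.seed(42)
X <- matrix(rnorm(30 * 8), nrow = 30)
rate <- nrr_sketch(cov(X), N = 30, i = 1, j = 2, q = 3, n.tests = 100)
```

A rate close to 1 means that most tests failed to reject conditional independence, which is the situation in which the corresponding edge would be removed.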

See Also

  qp.search, qp.ci.test

Examples

  S <- S.bd5.N20
  N <- 20
  q <- 6
  T <- 100
  qp.edge.prob(S, N, i=3, j=4, q, T)


qp.get.cliques    Cliques of an undirected graph

Description

  Finds the set of cliques, i.e. maximal complete subsets of vertices, of an undirected graph given as an incidence matrix.

Usage

  qp.get.cliques(I, binary=TRUE)

Arguments

  I       incidence matrix
  binary  flag to switch to the compiled C code

Details

  It uses the algorithm described in Bron and Kerbosch (1973) and returns a list where each member is a vector of vertices forming a clique in the given graph. Beware that finding the set of cliques is an NP-hard problem and that the computation time of this algorithm grows exponentially with the graph density (number of actual edges over the total number of adjacencies).
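For readers unfamiliar with the Bron and Kerbosch procedure, a minimal base R sketch of its basic form (without pivoting) follows. It is an illustration only and makes no claim about which variant the compiled code uses; the function name bron_kerbosch_sketch is hypothetical, and the input is assumed to be a symmetric 0/1 matrix.

```r
# Hypothetical sketch of the basic Bron-Kerbosch recursion (no pivoting).
bron_kerbosch_sketch <- function(A) {
  cliques <- list()
  expand <- function(R, P, X) {
    # R: current clique, P: candidate vertices, X: already processed vertices
    if (length(P) == 0 && length(X) == 0) {
      cliques[[length(cliques) + 1]] <<- sort(R)
      return(invisible(NULL))
    }
    for (v in P) {
      nb <- setdiff(which(A[v, ] != 0), v)   # neighbours of v
      expand(c(R, v), intersect(P, nb), intersect(X, nb))
      P <- setdiff(P, v)
      X <- c(X, v)
    }
    invisible(NULL)
  }
  expand(integer(0), seq_len(nrow(A)), integer(0))
  cliques
}

# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3
A <- matrix(0, 4, 4)
A[rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))] <- 1
A <- A + t(A)
found <- bron_kerbosch_sketch(A)
```

Each element of the returned list is a vector of vertex indices, the same shape of result that the Details describe for qp.get.cliques.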

References

  Bron, C. and Kerbosch, J. (1973). Finding all cliques of an undirected graph, Commun. ACM, 16:575-577.

See Also

  qp.graph, qp.clique

Examples

  I <- qp.graph(qp.out.bd5.n20.q10, threshold=0.9)
  cliquelist <- qp.get.cliques(I)
  sprintf("the graph has %d cliques\n", length(cliquelist))


qp.graph    Incidence matrix of the qp-graph

Description

  Using the output of qp.search this function returns the qp-graph, in the form of an incidence matrix, resulting from thresholding the non-rejection rates in the output of qp.search.

Usage

  qp.graph(qp.output, threshold)

Arguments

  qp.output  output of qp.search
  threshold  threshold on the non-rejection rate
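The thresholding step can be pictured with the following base R sketch. It assumes the qp.search output carries a matrix of acceptance counts and the number of tests per adjacency, and that an edge is kept when its non-rejection rate is strictly below the threshold; the treatment of rates exactly equal to the threshold is a guess, and the function name qp_graph_sketch and its arguments are hypothetical.

```r
# Hypothetical sketch, not the qp implementation.
# counts:  symmetric matrix of acceptance (non-rejection) counts
# n.tests: number of tests performed per adjacency
qp_graph_sketch <- function(counts, n.tests, threshold) {
  nrr <- counts / n.tests          # estimated non-rejection rates
  # Assumption: an edge is removed when its rate reaches the threshold
  I <- (nrr < threshold) * 1
  diag(I) <- 0                     # no self-loops
  I
}

counts <- matrix(c( 0, 10, 2,
                   10,  0, 8,
                    2,  8, 0), nrow = 3, byrow = TRUE)
I <- qp_graph_sketch(counts, n.tests = 10, threshold = 0.9)
n.edges <- sum(I) / 2
```

Counting edges as sum(I)/2 relies on the matrix being symmetric with a zero diagonal, which is the same convention used in the examples of this manual.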

See Also

  qp.search

Examples

  I <- qp.graph(qp.out.bd5.n20.q10, threshold=0.9)
  sprintf("the graph has %.0f edges\n", sum(I)/2)


qp.hist    Histogram of the non-rejection rates

Description

  Using the output of qp.search this function plots the histogram of the estimated non-rejection rates. When the inverse correlation matrix from the generative graph is given, it provides additional plots of information.

Usage

  qp.hist(qp.output, IC=NULL, prob=FALSE)

Arguments

  qp.output  output of qp.search
  IC         inverse correlation matrix from the generative graph
  prob       when this flag is set to TRUE the histograms show densities, otherwise they show absolute frequencies

See Also

  qp.search

Examples

  # if we are working with synthetic data and have the IC matrix
  qp.hist(qp.out.bd5.n20.q10, IC.bd5, prob=TRUE)

  # otherwise, we just look at the non-rejection rate distribution
  qp.hist(qp.out.bd5.n20.q10, NULL, prob=TRUE)


qp.matrix.image    Image of an inverse correlation matrix

Description

  Makes an image plot of the absolute value of an inverse correlation matrix and reports the number of edges of the corresponding independence graph, the total number of adjacencies of the graph and the percentage of edges with respect to this total number of adjacencies.

Usage

  qp.matrix.image(M, col=NULL, plot=TRUE)

Arguments

  M     the matrix to make the image plot
  col   flag that when set to NULL the gray scale is used in the plot
  plot  when this flag is set to TRUE the image is plotted

Details

  Returns an object of the class matrix containing the number of edges of the corresponding independence graph, the total number of adjacencies of the graph and the percentage of edges with respect to this total number of adjacencies. When plot=TRUE it plots the partial correlation coefficients as a matrix.
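The three reported quantities can be computed by hand as in the base R sketch below. It assumes that an edge corresponds to a non-zero off-diagonal entry of the inverse correlation matrix; the numerical tolerance tol and the function name matrix_summary_sketch are hypothetical, and the sketch returns a named vector rather than the matrix object described above.

```r
# Hypothetical sketch, not the qp implementation.
matrix_summary_sketch <- function(M, tol = 1e-8) {
  p <- nrow(M)
  # Assumption: an edge is an off-diagonal entry that is non-zero up to tol
  edges <- sum(abs(M[upper.tri(M)]) > tol)
  adjacencies <- p * (p - 1) / 2
  c(edges = edges,
    adjacencies = adjacencies,
    percentage = 100 * edges / adjacencies)
}

# Three variables with a single non-zero partial correlation (1, 2)
M <- diag(3)
M[1, 2] <- M[2, 1] <- -0.4
out <- matrix_summary_sketch(M)
```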

Examples

  qp.matrix.image(IC.bd5)


qp.search    Matrix of non-rejection rates

Description

  Calculates the estimates of the non-rejection rates for every pair of variables.

Usage

  qp.search(S, N, q=0, T=500, significance=0.05, binary=TRUE)

Arguments

  S             sample variance-covariance matrix
  N             sample size
  T             number of tests per adjacency
  q             partial-correlation order
  significance  significance level of each test
  binary        flag to switch to the compiled C code

Details

  By default binary=TRUE and the compiled and faster C code corresponding to this function will be executed. If binary=FALSE is set, then the R code will be executed.

Value

  A  matrix with the acceptance test counts
  T  number of tests per adjacency (copied from the input parameter)

See Also

  qp.edge.prob, qp.analyse, qp.hist

Examples

  S <- S.bd5.N20
  N <- 20
  q <- 6
  T <- 100
  qp.out <- qp.search(S, N, q, T)
