How to Use pkgDepTools

Seth Falcon

February 10, 2018

1 Introduction

The pkgDepTools package provides tools for computing and analyzing dependency relationships among R packages. With it, you can build a graph-based representation of the dependencies among all packages in a list of CRAN-style package repositories. There are utilities for computing the installation order of a given package and, if the RCurl package is available, for estimating the download size required to install a given package and its dependencies.

This vignette demonstrates the basic features of the package.

2 Graph Basics

A graph consists of a set of nodes and a set of edges representing relationships between pairs of nodes. The relationships among the nodes of a graph are binary; either there is an edge between a pair of nodes or there is not. To model package dependencies using a graph, let the set of packages be the nodes of the graph, with directed edges originating from a given package to each of its dependencies. Figure 1 shows a part of the Bioconductor dependency graph for the Category package. Since circular dependencies are not allowed, the resulting dependency graph will be a directed acyclic graph (DAG).

3 Building a Dependency Graph

> library("pkgDepTools")
> library("Biobase")
> library("Rgraphviz")

The makeDepGraph function retrieves the metadata for all packages of a specified type (source, win.binary, or mac.binary) from each repository in a list of repository URLs and builds a graphNEL instance (see help("graphNEL-class")) representing the packages and their dependency relationships.

The function takes four arguments:

1) repList, a character vector of CRAN-style package repository URLs;
2) suggests.only, a logical value indicating whether the resulting graph should represent relations from the Depends field (FALSE, the default) or the Suggests field (TRUE);
3) type, a string indicating the type of packages to search for; the default is getOption("pkgType");
4) keep.builtin, which keeps packages that come with a standard R install in the dependency graph (the default is FALSE).

Here we use makeDepGraph to build dependency graphs of the BioC and CRAN packages. Each dependency graph is a graphNEL instance. The out-edges of a given node list its direct dependencies (as shown for package annotate). The node attribute size gives the size of the package in megabytes when the dosize argument is TRUE (this is the default). Obtaining the size of packages requires the RCurl package and can be time consuming for large repositories, since a separate HTTP request must be made for each package. In the examples below, we set dosize=FALSE to speed up the computations.

> library(BiocInstaller)
> biocUrl <- biocinstallRepos()["BioCsoft"]
> biocDeps <- makeDepGraph(biocUrl, type="source", dosize=FALSE)
> biocDeps
A graphNEL graph with directed edges
Number of Nodes = 2312
Number of Edges = 9890
> edges(biocDeps)["annotate"]
$annotate
[1] "AnnotationDbi" "XML"           "Biobase"       "DBI"
[5] "xtable"        "BiocGenerics"  "RCurl"
> ## if dosize=TRUE, size in MB is stored
> ## as a node attribute:
> ## nodeData(biocDeps, n="annotate", attr="size")
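To make the graph model of Sections 2 and 3 concrete, here is a minimal base R sketch that turns Depends fields into an adjacency list, with one entry per package listing its direct dependencies. It is not how makeDepGraph is implemented. The package names, the pkg_meta data frame, and the parse_depends helper are all hypothetical and exist only for illustration.

```r
# Hypothetical, hand-written package metadata standing in for the
# fields that a repository index would supply.
pkg_meta <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Depends = c("R (>= 2.10), pkgB, pkgC (>= 1.0)", "pkgC", NA, "pkgA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one Depends string into a character vector
# of package names, dropping version constraints and the "R" entry.
parse_depends <- function(field) {
  if (is.na(field) || !nzchar(field)) return(character(0))
  parts <- strsplit(field, ",", fixed = TRUE)[[1]]
  names_only <- trimws(sub("\\(.*\\)", "", parts))
  setdiff(names_only[nzchar(names_only)], "R")
}

# Adjacency list: each package maps to its direct dependencies,
# i.e. the out-edges of that node in the dependency graph.
dep_edges <- lapply(pkg_meta$Depends, parse_depends)
names(dep_edges) <- pkg_meta$Package

dep_edges[["pkgA"]]
```

Looking up dep_edges[["pkgA"]] plays the same role here as edges(biocDeps)["annotate"] does for the graphNEL instance above.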

4 Using the Dependency Graph

The dependencies of a given package can be visualized using the graph generated by makeDepGraph and the Rgraphviz package. The graph shown in Figure 1 was produced using the code shown below. The acc method from the graph package returns a vector of all nodes that are accessible from the given node. Here, it has been used to obtain the complete list of Category's dependencies.

> categoryNodes <- c("Category",
+                    names(acc(biocDeps, "Category")[[1]]))
> categoryGraph <- subGraph(categoryNodes, biocDeps)
> nn <- makeNodeAttrs(categoryGraph, shape="ellipse")
> plot(categoryGraph, nodeAttrs=nn)

In R, there is no easy way to preview a given package's dependencies and estimate the amount of data that needs to be downloaded, even though the install.packages function will search for and install package dependencies if you ask it to by specifying dependencies=TRUE. The getInstallOrder function provides such a preview.

For computing installation order, it is useful to have a single graph representing the relationships among all packages in all available repositories. Below, we create such a graph combining all CRAN and Bioconductor packages.

> allDeps <- makeDepGraph(biocinstallRepos(), type="source",
+                         keep.builtin=TRUE, dosize=FALSE)

Calling getInstallOrder for package GOstats, we see a listing of only those packages that need to be installed. Your results will differ depending on which packages you have installed.

> getInstallOrder("GOstats", allDeps)
$packages
character(0)

$total.size
numeric(0)
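One way to obtain an installation order from a dependency DAG is a depth-first, post-order traversal, in which a package is listed only after everything it depends on. The sketch below illustrates the idea in base R on a hypothetical adjacency list. It is not the getInstallOrder implementation, and dep_edges, install_order, and installed are illustrative names.

```r
# Hypothetical adjacency list: package -> direct dependencies.
dep_edges <- list(
  pkgA = c("pkgB", "pkgC"),
  pkgB = "pkgC",
  pkgC = character(0),
  pkgD = "pkgA"
)

# Depth-first post-order traversal: a package is appended only after
# all of its dependencies, so the result is a valid installation order.
# Assumes the graph is acyclic, as stated in Section 2.
install_order <- function(pkg, edges) {
  ordered <- character(0)
  visit <- function(p) {
    if (p %in% ordered) return(invisible(NULL))
    for (d in edges[[p]]) visit(d)
    ordered <<- c(ordered, p)
  }
  visit(pkg)
  ordered
}

full_order <- install_order("pkgA", dep_edges)

# Keep only what is missing; `installed` is a made-up stand-in for
# the set of packages already present on the system.
installed <- c("pkgC")
needed <- setdiff(full_order, installed)
```

Here full_order corresponds in spirit to a call with needed.only=FALSE, while needed corresponds to the default behaviour of reporting only packages that are not yet installed.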

When needed.only=FALSE, the complete dependency list is returned regardless of what packages are currently installed.

> getInstallOrder("GOstats", allDeps, needed.only=FALSE)
$packages
 [1] "methods"         "utils"           "graphics"
 [4] "stats"           "parallel"        "BiocGenerics"
 [7] "Biobase"         "stats4"          "S4Vectors"
[10] "IRanges"         "DBI"             "bit"
[13] "bit64"           "tools"           "assertthat"
[16] "grDevices"       "crayon"          "cli"
[19] "rlang"           "utf8"            "pillar"
[22] "tibble"          "blob"            "digest"
[25] "memoise"         "pkgconfig"       "Rcpp"
[28] "BH"              "plogr"           "RSQLite"
[31] "AnnotationDbi"   "grid"            "lattice"
[34] "Matrix"          "graph"           "RBGL"
[37] "XML"             "xtable"          "bitops"
[40] "RCurl"           "annotate"        "GSEABase"
[43] "splines"         "survival"        "genefilter"
[46] "Category"        "GO.db"           "AnnotationForge"
[49] "Rgraphviz"       "GOstats"

$total.size
[1] NA

The edge directions of the dependency graph can be reversed, and the resulting graph used to determine the set of packages that make use of a given package, even indirectly. For example, one might like to know which packages make use of the methods package. Here is one way to do that:

> allDepsOnMe <- reverseEdgeDirections(allDeps)
> usesMethods <- dijkstra.sp(allDepsOnMe, start="methods")$distance
> usesMethods <- usesMethods[is.finite(usesMethods)]
> length(usesMethods) - 1 ## don't count methods itself
[1] 10915
> table(usesMethods)
usesMethods
   0    1    2    3    4    5
   1 3624 6378  860   51    2

> toLatex(sessionInfo())

- R Under development (unstable) (2018-01-28 r74177), x86_64-pc-linux-gnu
- Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C
- Running under: Ubuntu 16.04.3 LTS
- Matrix products: default
- BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
- LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
- Base packages: base, datasets, grDevices, graphics, grid, methods, parallel, stats, utils
- Other packages: Biobase 2.39.2, BiocGenerics 0.25.3, BiocInstaller 1.29.4, RBGL 1.55.0, RCurl 1.95-4.10, Rgraphviz 2.23.0, bitops 1.0-6, graph 1.57.1, pkgDepTools 1.45.1
- Loaded via a namespace (and not attached): compiler 3.5.0, stats4 3.5.0, tools 3.5.0
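The reverse-dependency query shown earlier can also be sketched in base R on a small, hypothetical adjacency list. The sketch reverses every edge and then measures how many steps separate each dependent package from the starting package. It uses a breadth-first search in place of dijkstra.sp; when every edge has weight 1, the two give the same distances. The names dep_edges, reverse_edges, and bfs_distance are illustrative and not part of pkgDepTools, graph, or RBGL.

```r
# Hypothetical adjacency list: package -> direct dependencies.
dep_edges <- list(
  pkgA = c("pkgB", "pkgC"),
  pkgB = "pkgC",
  pkgC = character(0),
  pkgD = "pkgA"
)

# Reverse every edge: map each package to the packages that depend on it.
reverse_edges <- function(edges) {
  from <- rep(names(edges), lengths(edges))
  to <- unlist(edges, use.names = FALSE)
  split(from, factor(to, levels = names(edges)))
}

# Breadth-first search giving the number of edges on the shortest path
# from `start` to every reachable node (unit edge weights assumed).
bfs_distance <- function(start, edges) {
  dist <- c(0)
  names(dist) <- start
  frontier <- start
  while (length(frontier) > 0) {
    next_nodes <- unique(unlist(edges[frontier], use.names = FALSE))
    next_nodes <- setdiff(next_nodes, names(dist))
    if (length(next_nodes) > 0) {
      dist[next_nodes] <- dist[[frontier[1]]] + 1
    }
    frontier <- next_nodes
  }
  dist
}

used_by <- bfs_distance("pkgC", reverse_edges(dep_edges))
length(used_by) - 1  # don't count pkgC itself
table(used_by)
```

A distance of 1 marks a package that depends on pkgC directly; larger distances mark packages that reach it only through intermediate dependencies, which is how the table of usesMethods above can be read.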

Figure 1: The dependency graph for the Category package. Nodes shown: Category, genefilter, GSEABase, RBGL, annotate, graph, AnnotationDbi, xtable, RCurl, XML, RSQLite, IRanges, Biobase, DBI, S4Vectors, BiocGenerics.