Package GSVA. R topics documented: January 22, Version

Similar documents
Package GSRI. March 31, 2019

Package POST. R topics documented: November 8, Type Package

Package splinetimer. December 22, 2016

Package EMDomics. November 11, 2018

Package SIMLR. December 1, 2017

Package dualks. April 3, 2019

Package ssizerna. January 9, 2017

Package BgeeDB. January 5, 2019

Package Linnorm. October 12, 2016

Package STROMA4. October 22, 2018

ROTS: Reproducibility Optimized Test Statistic

Package varistran. July 25, 2016

Package SCAN.UPC. October 9, Type Package. Title Single-channel array normalization (SCAN) and University Probability of expression Codes (UPC)

Package geecc. October 9, 2015

Package SC3. November 27, 2017

Package SC3. September 29, 2018

Package RNAseqNet. May 16, 2017

Package RNASeqR. January 8, 2019

Package CONOR. August 29, 2013

Package igc. February 10, 2018

Package ibbig. R topics documented: December 24, 2018

Package PGSEA. R topics documented: May 4, Type Package Title Parametric Gene Set Enrichment Analysis Version 1.54.

Package RcisTarget. July 17, 2018

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode

Package RTCGAToolbox

Package ffpe. October 1, 2018

Package phantasus. March 8, 2019

Package PCGSE. March 31, 2017

Package twilight. August 3, 2013

Package EGSEA. July 7, 2018

Package PCADSC. April 19, 2017

Package HTSFilter. November 30, 2017

Package scmap. March 29, 2019

How to store and visualize RNA-seq data

Package RUVSeq. September 16, 2018

Package RSNPset. December 14, 2017

Package DPBBM. September 29, 2016

Package INCATome. October 5, 2017

Package GSCA. October 12, 2016

Package GSCA. April 12, 2018

Package tximport. November 25, 2017

Package semisup. March 10, Version Title Semi-Supervised Mixture Model

Package LncPath. May 16, 2016

Drug versus Disease (DrugVsDisease) package

Package tximport. May 13, 2018

Package HTSFilter. August 2, 2013

Package anota2seq. January 30, 2018

Package ClusterSignificance

Package MixSim. April 29, 2017

Package geecc. R topics documented: December 7, Type Package

mirnet Tutorial Starting with expression data

Package sigqc. June 13, 2018

Package canvasxpress

Package OrderedList. December 31, 2017

genbart package Vignette Jacob Cardenas, Jacob Turner, and Derek Blankenship

Package cycle. March 30, 2019

Package sglr. February 20, 2015

Package dotcall64. January 11, 2018

Package survclip. November 22, 2017

Package Modeler. May 18, 2018

Package weco. May 4, 2018

Package DFP. February 2, 2018

Package enrichplot. September 29, 2018

Package PSEA. R topics documented: November 17, Version Date Title Population-Specific Expression Analysis.

Package Director. October 15, 2018

SEEK User Manual. Introduction

Introduction to GE Microarray data analysis Practical Course MolBio 2012

Package transcriptogramer

Package polyphemus. February 15, 2013

Package XMRF. June 25, 2015

Package Tnseq. April 13, 2017

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

Package rknn. June 9, 2015

Supervised vs unsupervised clustering

Package TMixClust. July 13, 2018

Course on Microarray Gene Expression Analysis

Package M3Drop. August 23, 2018

Package decomptumor2sig

Package EBglmnet. January 30, 2016

Package clusterprofiler

Package OutlierDC. R topics documented: February 19, 2015

Package LOLA. December 24, 2018

Package vbmp. March 3, 2018

Package clustvarsel. April 9, 2018

Package DAVIDQuery. R topics documented: October 4, Type Package. Title Retrieval from the DAVID bioinformatics data resource into R

Package bnbc. R topics documented: April 1, Version 1.4.0

How to use the rbsurv Package

Package cgh. R topics documented: February 19, 2015

Package spade. R topics documented:

Package sfc. August 29, 2016

Package macorrplot. R topics documented: October 2, Title Visualize artificial correlation in microarray data. Version 1.50.

Package omicplotr. March 15, 2019

Package projector. February 27, 2018

Package mgsa. January 13, 2019

Package ClusterBootstrap

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

Package KEGGprofile. February 10, 2018

CQN (Conditional Quantile Normalization)

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University

Package exporkit. R topics documented: May 5, Type Package Title Expokit in R Version Date

Transcription:

Version 1.26.0 Package GSVA January 22, 2018 Title Gene Set Variation Analysis for microarray and RNA-seq data Depends R (>= 3.0.0) Imports methods, BiocGenerics, Biobase, GSEABase (>= 1.17.4), geneplotter, shiny, shinythemes Suggests limma, RColorBrewer, genefilter, mclust, edger, snow, parallel, GSVAdata Description Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of a expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNVpathway analysis or cross-tissue pathway analysis, in a pathway-centric manner. License GPL (>= 2) LazyLoad yes URL https://github.com/rcastelo/gsva BugReports https://github.com/rcastelo/gsva/issues Encoding latin1 biocviews Microarray, Pathways, GeneSetEnrichment NeedsCompilation yes Author Justin Guinney [aut, cre], Robert Castelo [aut], Joan Fernandez [ctb] Maintainer Justin Guinney <justin.guinney@sagebase.org> R topics documented: computegenesetsoverlap.................................. 2 filtergenesets........................................ 3 gsva............................................. 4 igsva............................................. 8 Index 10 1

2 computegenesetsoverlap computegenesetsoverlap Compute gene-sets overlap Description Usage Calculates the overlap among every pair of gene-sets given as input. ## S4 method for signature 'list,character' computegenesetsoverlap(gsets, uniqgenes, min.sz=1, max.sz=inf) ## S4 method for signature 'list,expressionset' computegenesetsoverlap(gsets, uniqgenes, min.sz=1, max.sz=inf) ## S4 method for signature 'GeneSetCollection,character' computegenesetsoverlap(gsets, uniqgenes, min.sz=1, max.sz=inf) ## S4 method for signature 'GeneSetCollection,ExpressionSet' computegenesetsoverlap(gsets, uniqgenes, min.sz=1, max.sz=inf) Arguments gsets uniqgenes min.sz max.sz Gene sets given either as a list or a GeneSetCollection object. Vector of unique genes to be considered when calculating the overlaps. Minimum size. Maximum size. Details Value This function calculates the overlap between every pair of gene sets of the input argument gsets. Before this calculation takes place, the gene sets in gsets are firstly filtered to discard genes that do not match to the identifiers in uniqgenes. Secondly, they are further filtered to meet the minimum and/or maximum size specified with the arguments min.sz and max.sz. The overlap between two gene sets is calculated as the number of common genes between the two gene sets divided by the smallest size of the two gene sets. A gene-set by gene-set matrix of the overlap among every pair of gene sets. Author(s) J. Guinney References Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. See Also filtergenesets

filtergenesets 3 Examples genesets <- list(set1=as.character(1:4), set2=as.character(4:10)) computegenesetsoverlap(genesets, unique(unlist(genesets))) filtergenesets Filter gene sets Description Filters gene sets through a given minimum and maximum set size. Usage ## S4 method for signature 'list' filtergenesets(gsets, min.sz=1, max.sz=inf) ## S4 method for signature 'GeneSetCollection' filtergenesets(gsets, min.sz=1, max.sz=inf) Arguments gsets min.sz max.sz Gene sets given either as a list or a GeneSetCollection object. Minimum size. Maximum size. Details This function filters the input gene sets according to a given minimum and maximum set size. Value A collection of gene sets that meet the given minimum and maximum set size. Author(s) J. Guinney References Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. See Also computegenesetsoverlap

4 gsva Examples genesets <- list(set1=as.character(1:4), set2=as.character(4:10)) filtergenesets(genesets, min.sz=5) gsva Gene Set Variation Analysis Description Usage Estimates GSVA enrichment scores. ## S4 method for signature 'ExpressionSet,list' gsva(expr, gset.idx.list, annotation, method=c("gsva", "ssgsea", "zscore", "plage"), kcdf=c("gaussian", "Poisson", "none"), rnaseq=false, abs.ranking=false, min.sz=1, max.sz=inf, no.bootstraps=0, bootstrap.percent =.632, parallel.sz=0, parallel.type="sock", mx.diff=true, tau=switch(method, gsva=1, ssgsea=0.25, NA), kernel=true, ssgsea.norm=true, verbose=true, return.old.value=false) ## S4 method for signature 'ExpressionSet,GeneSetCollection' gsva(expr, gset.idx.list, annotation, method=c("gsva", "ssgsea", "zscore", "plage"), kcdf=c("gaussian", "Poisson", "none"), rnaseq=false, abs.ranking=false, min.sz=1, max.sz=inf, no.bootstraps=0, bootstrap.percent =.632, parallel.sz=0, parallel.type="sock", mx.diff=true, tau=switch(method, gsva=1, ssgsea=0.25, NA), kernel=true, ssgsea.norm=true, verbose=true,

gsva 5 return.old.value=false) ## S4 method for signature 'matrix,genesetcollection' gsva(expr, gset.idx.list, annotation, method=c("gsva", "ssgsea", "zscore", "plage"), kcdf=c("gaussian", "Poisson", "none"), rnaseq=false, abs.ranking=false, min.sz=1, max.sz=inf, no.bootstraps=0, bootstrap.percent =.632, parallel.sz=0, parallel.type="sock", mx.diff=true, tau=switch(method, gsva=1, ssgsea=0.25, NA), kernel=true, ssgsea.norm=true, verbose=true, return.old.value=false) ## S4 method for signature 'matrix,list' gsva(expr, gset.idx.list, annotation, method=c("gsva", "ssgsea", "zscore", "plage"), kcdf=c("gaussian", "Poisson", "none"), rnaseq=false, abs.ranking=false, min.sz=1, max.sz=inf, no.bootstraps=0, bootstrap.percent =.632, parallel.sz=0, parallel.type="sock", mx.diff=true, tau=switch(method, gsva=1, ssgsea=0.25, NA), kernel=true, ssgsea.norm=true, verbose=true, return.old.value=false) Arguments expr gset.idx.list annotation method Gene expression data which can be given either as an ExpressionSet object or as a matrix of expression values where rows correspond to genes and columns correspond to samples. Gene sets provided either as a list object or as a GeneSetCollection object. In the case of calling gsva() with expression data in a matrix and gene sets as a GeneSetCollection object, the annotation argument can be used to supply the name of the Bioconductor package that contains annotations for the class of gene identifiers occurring in the row names of the expression data matrix. By default gsva() will try to match the identifiers in expr to the identifiers in gset.idx.list just as they are, unless the annotation argument is set. Method to employ in the estimation of gene-set enrichment scores per sample. By default this is set to gsva (Hänzelmann et al, 2013) and other options

6 gsva kcdf rnaseq abs.ranking min.sz max.sz are ssgsea (Barbie et al, 2009), zscore (Lee et al, 2008) or plage (Tomfohr et al, 2005). The latter two standardize first expression profiles into z-scores over the samples and, in the case of zscore, it combines them together as their sum divided by the square-root of the size of the gene set, while in the case of plage they are used to calculate the singular value decomposition (SVD) over the genes in the gene set and use the coefficients of the first right-singular vector as pathway activity profile. Character string denoting the kernel to use during the non-parametric estimation of the cumulative distribution function of expression levels across samples when method="gsva". By default, kcdf="gaussian" which is suitable when input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-cpms, log-rpkms or log-tpms. When input expression values are integer counts, such as those derived from RNA-seq experiments, then this argument should be set to kcdf="poisson". This argument supersedes arguments rnaseq and kernel, which are deprecated and will be removed in the next release. This argument has been deprecated and will be removed in the next release. Please use the argument kcdf instead. Flag used only when mx.diff=true. When abs.ranking=false (default) a modified Kuiper statistic is used to calculate enrichment scores, taking the magnitude difference between the largest positive and negative random walk deviations. When abs.ranking=true the original Kuiper statistic that sums the largest positive and negative random walk deviations, is used. In this latter case, gene sets with genes enriched on either extreme (high or low) will be regarded as highly activated. Minimum size of the resulting gene sets. Maximum size of the resulting gene sets. no.bootstraps Number of bootstrap iterations to perform. This argument has been deprecated and will be removed in the next release. bootstrap.percent.632 is the ideal percent samples bootstrapped. This argument has been deprecated and will be removed in the next release. parallel.sz parallel.type mx.diff tau Number of processors to use when doing the calculations in parallel. This requires to previously load either the parallel or the snow library. If parallel is loaded and this argument is left with its default value (parallel.sz=0) then it will use all available core processors unless we set this argument with a smaller number. If snow is loaded then we must set this argument to a positive integer number that specifies the number of processors to employ in the parallel calculation. Type of cluster architecture when using snow. Offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic. mx.diff=false: ES is calculated as the maximum distance of the random walk from 0. mx.diff=true (default): ES is calculated as the magnitude difference between the largest positive and negative random walk deviations. Exponent defining the weight of the tail in the random walk performed by both the gsva (Hänzelmann et al., 2013) and the ssgsea (Barbie et al., 2009) methods. By default, this tau=1 when method="gsva" and tau=0.25 when method="ssgsea" just as specified by Barbie et al. (2009) where this parameter is called alpha.

gsva 7 kernel ssgsea.norm verbose This argument has been deprecated and will be removed in the next release. Please use the argument kcdf instead. Logical, set to TRUE (default) with method="ssgsea" runs the SSGSEA method from Barbie et al. (2009) normalizing the scores by the absolute difference between the minimum and the maximum, as described in their paper. When ssgsea.norm=false this last normalization step is skipped. Gives information about each calculation step. Default: FALSE. return.old.value Logical, set to FALSE (default) has no effect but when return.old.value=true, then the return value takes form of a list object as in previous versions of GSVA. This argument will be present only in this release for backward compability purposes during the deprecation of the arguments no.bootstraps and bootstrap.percent and will dissappear in the next release. Details GSVA assesses the relative enrichment of gene sets across samples using a non-parametric approach. Conceptually, GSVA transforms a p-gene by n-sample gene expression matrix into a g- geneset by n-sample pathway enrichment matrix. This facilitates many forms of statistical analysis in the space of pathways rather than genes, providing a higher level of interpretability. The gsva() function first maps the identifiers in the gene sets to the identifiers in the input expression data leading to a filtered collection of gene sets. This collection can be further filtered to require a minimun and/or maximum size of the gene sets for which we want to calculate GSVA enrichment scores, by using the arguments min.sz and max.sz. Value A gene-set by sample matrix of GSVA enrichment scores. Author(s) J. Guinney and R. Castelo References Barbie, D.A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(5):108-112, 2009. Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. Lee, E. et al. Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217, 2008. Tomfohr, J. et al. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6:225, 2005. See Also filtergenesets computegenesetsoverlap

8 igsva Examples library(limma) p <- 10 ## number of genes n <- 30 ## number of samples ngrp1 <- 15 ## number of samples in group 1 ngrp2 <- n - ngrp1 ## number of samples in group 2 ## consider three disjoint gene sets genesets <- list(set1=paste("g", 1:3, sep=""), set2=paste("g", 4:6, sep=""), set3=paste("g", 7:10, sep="")) ## sample data from a normal distribution with mean 0 and st.dev. 1 y <- matrix(rnorm(n*p), nrow=p, ncol=n, dimnames=list(paste("g", 1:p, sep=""), paste("s", 1:n, sep=""))) ## genes in set1 are expressed at higher levels in the last 'ngrp1+1' to 'n' samples y[genesets$set1, (ngrp1+1):n] <- y[genesets$set1, (ngrp1+1):n] + 2 ## build design matrix design <- cbind(samplegroup1=1, samplegroup2vs1=c(rep(0, ngrp1), rep(1, ngrp2))) ## fit linear model fit <- lmfit(y, design) ## estimate moderated t-statistics fit <- ebayes(fit) ## genes in set1 are differentially expressed toptable(fit, coef="samplegroup2vs1") ## estimate GSVA enrichment scores for the three sets gsva_es <- gsva(y, genesets, mx.diff=1) ## fit the same linear model now to the GSVA enrichment scores fit <- lmfit(gsva_es, design) ## estimate moderated t-statistics fit <- ebayes(fit) ## set1 is differentially expressed toptable(fit, coef="samplegroup2vs1") igsva Gene Set Variation Analysis Description Starts an interactive GSVA shiny web app.

igsva 9 Usage Details Value igsva() GSVA assesses the relative enrichment of gene sets across samples using a non-parametric approach. Conceptually, GSVA transforms a p-gene by n-sample gene expression matrix into a g- geneset by n-sample pathway enrichment matrix. This facilitates many forms of statistical analysis in the space of pathways rather than genes, providing a higher level of interpretability. The igsva() function starts an interactive shiny web app that allows the user to configure the arguments of the gsva() function and runs it on the computer. Please see the manual page of the gsva() function for a description of the arguments and their default and alternative values. The input data may be loaded from the users workspace or by selecting a CSV file for the expression data, and a GMT file for the gene sets data. A gene-set by sample matrix of GSVA enrichment scores after pressing the button Save & Close. This result can be also downloaded as a CSV file with the Download button. Author(s) J. Fernández and R. Castelo References Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. See Also gsva Examples ## Not run: res <- igsva() ## this will open your browser with the GSVA shiny web app ## End(Not run)

Index Topic GSVA igsva, 8 Topic Gene set computegenesetsoverlap, 2 filtergenesets, 3 Topic Pathway variation gsva, 4 Topic shiny igsva, 8 computegenesetsoverlap, 2, 3, 7 computegenesetsoverlap,genesetcollection,character-method (computegenesetsoverlap), 2 computegenesetsoverlap,genesetcollection,expressionset-method (computegenesetsoverlap), 2 computegenesetsoverlap,list,character-method (computegenesetsoverlap), 2 computegenesetsoverlap,list,expressionset-method (computegenesetsoverlap), 2 filtergenesets, 2, 3, 7 filtergenesets,genesetcollection-method (filtergenesets), 3 filtergenesets,list-method (filtergenesets), 3 gsva, 4, 9 gsva,expressionset,genesetcollection-method (gsva), 4 gsva,expressionset,list-method (gsva), 4 gsva,matrix,genesetcollection-method (gsva), 4 gsva,matrix,list-method (gsva), 4 igsva, 8 10