Package cycle. March 30, 2019

Similar documents
Introduction to Mfuzz package and its graphical user interface

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode

Package Mfuzz. R topics documented: July 4, Version Date

Package OrderedList. December 31, 2017

Package GSRI. March 31, 2019

Package Mfuzz. R topics documented: March 26, Version Date Title Soft clustering of time series gene expression data

Package POST. R topics documented: November 8, Type Package

Package STROMA4. October 22, 2018

Package LiquidAssociation

Package macorrplot. R topics documented: October 2, Title Visualize artificial correlation in microarray data. Version 1.50.

Package TMixClust. July 13, 2018

Package twilight. August 3, 2013

Package dualks. April 3, 2019

Package diggit. R topics documented: January 11, Version Date

Package tspair. July 18, 2013

Package ConsensusClusterPlus

Package ffpe. October 1, 2018

Package mfa. R topics documented: July 11, 2018

Package LMGene. R topics documented: December 23, Version Date

Package OLIN. September 30, 2018

Package snm. July 20, 2018

Comparisons and validation of statistical clustering techniques for microarray gene expression data. Outline. Microarrays.

ROTS: Reproducibility Optimized Test Statistic

Package generxcluster

Package Modeler. May 18, 2018

The supclust Package

Package semisup. March 10, Version Title Semi-Supervised Mixture Model

Package pcagopromoter

Package dmrseq. September 14, 2018

Package simulatorz. March 7, 2019

Package scmap. March 29, 2019

Package MEAL. August 16, 2018

Package methyvim. April 5, 2019

Package SCAN.UPC. July 17, 2018

Package nlnet. April 8, 2018

Package RobustRankAggreg

Package EMDomics. November 11, 2018

Package svapls. February 20, 2015

Package MiRaGE. October 16, 2018

Package stattarget. December 5, 2017

Package CNTools. November 6, 2018

Package stattarget. December 23, 2018

Package sscore. R topics documented: June 27, Version Date

Package vbmp. March 3, 2018

Package bdots. March 12, 2018

Package ssizerna. January 9, 2017

Package G1DBN. R topics documented: February 19, Version Date

safe September 23, 2010

Package PSEA. R topics documented: November 17, Version Date Title Population-Specific Expression Analysis.

Package splinetimer. December 22, 2016

Package BDMMAcorrect

Package macat. October 16, 2018

Package PCGSE. March 31, 2017

Package M3Drop. August 23, 2018

Package sparsereg. R topics documented: March 10, Type Package

Package DFP. February 2, 2018

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005

AB1700 Microarray Data Analysis

Medical Image Analysis

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Package TANOVA. R topics documented: February 19, Version Date

Bioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness

Package TANDEM. R topics documented: June 15, Type Package

Chapter 6: Linear Model Selection and Regularization

Package subtype. February 20, 2015

Package sigqc. June 13, 2018

Package clusterrepro

Package MODA. January 8, 2019

Package GateFinder. March 17, 2019

Package biosigner. March 6, 2019

Package RUVSeq. September 16, 2018

Package ebdbnet. February 15, 2013

Package cgh. R topics documented: February 19, 2015

Package ibbig. R topics documented: December 24, 2018

Package intccr. September 12, 2017

EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments

Package st. July 8, 2015

Package tatest. July 18, 2018

Package orqa. R topics documented: February 20, Type Package

Package Director. October 15, 2018

Package AffyExpress. October 3, 2013

Package makecdfenv. March 13, 2019

Package QUBIC. September 1, 2018

How to use the rbsurv Package

Package RSNPset. December 14, 2017

Package BgeeDB. January 5, 2019

Package BrainStars. July 19, 2018

Package mimager. March 7, 2019

Package BAC. R topics documented: November 30, 2018

STEM. Short Time-series Expression Miner (v1.1) User Manual

Package MaxContrastProjection

Package rain. March 4, 2018

Package affypdnn. January 10, 2019

Package affylmgui. July 21, 2018

Package multihiccompare

Package virtualarray

Package EBglmnet. January 30, 2016

Package sparsenetgls

Drug versus Disease (DrugVsDisease) package

Package EDDA. April 14, 2019

Transcription:

Package cycle March 30, 2019 Version 1.36.0 Date 2016-02-18 Title Significance of periodic expression pattern in time-series data Author Matthias Futschik <mfutschik@ualg.pt> Maintainer Matthias Futschik <mfutschik@ualg.pt> Depends R (>= 2.10.0), Mfuzz Imports Biobase, stats Description Package for assessing the statistical significance of periodic expression based on Fourier analysis and comparison with data generated by different background models License GPL-2 URL http://cycle.sysbiolab.eu biocviews Microarray, TimeCourse git_url https://git.bioconductor.org/packages/cycle git_branch RELEASE_3_8 git_last_commit c0d50b6 git_last_commit_date 2018-10-30 Date/Publication 2019-03-30 R topics documented: ar1analysis......................................... 2 backgrounddata....................................... 3 fdrfourier.......................................... 4 fourierscore......................................... 6 Index 8 1

2 ar1analysis ar1analysis Performs AR1 fitting Description Usage Calculation of the autocorrelation coefficients genes and variance of corresponding random variables to fit gene expression time series by AR1 processes ar1analysis(eset) Arguments eset object of the class ExpressionSet Value Note List of fitted autocorrelation coefficients (alpha) for ExpressionSet features and variance (sigma2) of corresponding random variables obtained using the ar function of the stats package. Note that this function evaluates soley the exprs matrix and no information is used from the phenodata. In particular, the ordering of samples (arrays) is the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent i.e. they should be averagered prior to analysis or placed into different distinct ExpressionSet objects. See Also ar Examples data(yeast) # loading the reduced CDC28 yeast set (from the Mfuzz package) # Data preprocessing if (interactive()){ data(yeast) yeast <- filter.na(yeast) # filters genes with more than 25% of the expression values missing yeast <- fill.na(yeast) # for illustration only; rather use knn method for replacing missing values tmp <- ar1analysis(yeast) # fits AR1 process autocorrelation coefficients plot(density(tmp$alpha),main="autocorrelation") }

backgrounddata 3 backgrounddata Generation of background expression set Description Usage The function generates background expression sets using different methods (permutation within rows, Gaussian distribution, auto-regressive models) backgrounddata(eset,model=c("rr", "gauss", "ar1")) Arguments eset model object of the class ExpressionSet model for generation of background data: rr - random permutation, gauss - Gaussian and ar1 - AR1 models Details Microarray data comprise the measurements of transcript levels for many thousands of genes. Due to the large number of genes, it can be expected that some genes show periodicity simply by chance. To assess therefore the significance of periodic signals, it is necessary first to define what distribution of signals can be expected if the studied process exhibits no true periodicity. In statistical terms this is equivalent with the definition of a null hypothesis of non-periodic expression. The most simple model for non-periodic expression is based on randomization of the observed times series. A background distribution can then be constructed by (repeated) random permutation of the sequentially ordered measurements in the experiment. This background model is used here if model="rr" is chosen. Alternatively, non-periodic expression can be derived using a statistical model. A conventional approach is based on the assumption of data normality and to use the normal distribution. This background model is chosen if model="gauss". However, these two approaches neglect the fact that time series data exhibit generally a considerable autocorrelation i.e. correlation between successive measurements. Therefore, neither the assumptions of data normality nor for randomizations may hold. As demonstrated for yeast cell cycle data (Bioinformatics 2008), this failure can substantially interfere with the significance testing, and that neglecting autocorrelation can potentially lead to a considerable overestimation of the number of periodically expressed genes. A more suitable model is based on autoregressive processes of order one (AR(1)), for which the value of the time-dependent variable X depends on its previous value up to a normally distributed random variable Z. Such model is used here for the setting of model="ar1". The autocorrelation of X and variance of Z is estimated for each feature of the ExpressionSet object separately. Mathematical details can be found in the given reference. It is important to note in this context, that AR(1) processes cannot capture periodic patterns except for alternations with period two. Since Z is a random variable, we can readily generate a collection of time series with the same autocorrelation as in the original data set. Therefore, although AR(1) processes constitute random processes, they allow us to construct a background distribution that captures the autocorrelation structure of original gene expression time series without fitting the potentially included periodic pattern.

4 fdrfourier Value Note ExpressionSet object with expression data generated by the chosen background model Note that this function evaluates soley the exprs matrix and no information is used from the phenodata. In particular, the ordering of samples (arrays) is the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent i.e. they should be averagered prior to analysis or placed into different distinct ExpressionSet objects. Author(s) Matthias E. Futschik (http://www.cbme.ualg.pt/mfutschik_cbme.html) References Matthias E. Futschik and Hanspeter Herzel (2008) Are we overestimating the number of cell-cycling genes? The impact of background models on time series analysis, Bioinformatics, 24(8):1063-1069 fdrfourier Calculation of the false discovery rates (FDR) for periodic expression Description Usage The function calculates the empirical FDR based on derived Fourier scores derived by fourierscore for the observed expression and the comparison with scores derived for different background model generated by backgrounddata. fdrfourier(eset,t,times,background.model="rr",n=100,progress=false) Arguments eset T object of the class ExpressionSet cycle period times time of measurements background.model model for generation of background data: rr - permutation within rows, gauss - Gaussian background, ar1 - AR1 models N Details progress number of generated data sets for the background distribution if set to TRUE, a progress of calculations is reported To assess the significance of the Fourier score obtained for the original gene expression time series, the probability has to be calculated of how often such a score would be observed by chance based on the chosen background distribution. The statistical significance is given by the calculated false discovery rate. It is defined here as the expected proportion of false positives among all genes detected as periodically expressed. Mathematical details can be found in the given reference.

fdrfourier 5 Value Note List with FDR for the features of the eset object (fdr), and Fourier scores for ExpressionSet object (F) and the background data (F.b). This is the main function of the cycle package. Note that the calculation of FDR employing empirical background distributions can require considerable time (up to several days for large gene expression data sets). Importantly, this function evaluates soley the exprs matrix and no information is used from the phenodata. In particular, the ordering of samples (arrays) is the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent i.e. they should be averagered prior to analysis or placed into different distinct ExpressionSet objects. Author(s) Matthias E. Futschik (http://www.cbme.ualg.pt/mfutschik_cbme.html) References Matthias E. Futschik and Hanspeter Herzel (2008) Are we overestimating the number of cell-cycling genes? The impact of background models on time series analysis, Bioinformatics, 24(8):1063-1069 Examples if (interactive()){ set.seed(1) data(yeast) # loading the reduced CDC28 yeast set (from the Mfuzz package) # Data preprocessing yeast <- filter.na(yeast) # filters genes with more than 25% of the expression values missing yeast <- fill.na(yeast) # for illustration only; rather use knn method for yeast <- standardise(yeast) # T.yeast <- 85 # cell cycle period (t=85min) times.yeast <- pdata(yeast)$time # time of measurements # yeast.test <- yeast[1:600,] # To speed up the example # NN <- 50 # number of generated background models # Here, a small number was chosen for demonstration purpose. # For the actual analysis, rather set N = 1000 # Calculation of FDRs # i) based on random permutation as background model fdr.rr <- fdrfourier(eset=yeast.test,t=t.yeast, times=times.yeast,background.model="rr",n=nn,progress=true) # ii) based on Gaussian distribution fdr.g <- fdrfourier(eset=yeast.test,t=t.yeast, times=times.yeast,background.model="gauss",n=nn,progress=true) # iii) based on AR(1) models as background fdr.ar1 <- fdrfourier(eset=yeast.test,t=t.yeast, times=times.yeast,background.model="ar1",n=nn,progress=true)

6 fourierscore # Number of significant genes based on diff. background models sum(fdr.rr$fdr < 0.1) sum(fdr.g$fdr < 0.1) sum(fdr.ar1$fdr < 0.1) # Plot top scoring gene plot(times.yeast,exprs(yeast.test)[order(fdr.ar1$fdr)[1],],type="o", xlab="time",ylab="expression", main=paste(featurenames(yeast.test)[order(fdr.ar1$fdr)[1]],"-- FDR:", fdr.ar1$fdr[order(fdr.ar1$fdr)[1]])) # List significant genes fdr.ar1$fdr[which(fdr.ar1$fdr < 0.1)] } fourierscore Calculation of the Fourier score Description The function calculates fourier score for a chosen periodicity. Usage fourierscore(eset,t,times=seq_len(ncol(eset))) Arguments eset T times object of the class ExpressionSet length of chosen period measurement times (with the index of the column as default) Details The Fourier score can be used to detect periodic signals. The closer a gene s expression follows a (possibly shifted) cosine curve of period T, the larger is the Fourier score. Mathematical details can be found in the given reference. The function fourierscore calculates the Fourier scores for all features of an ExpressionSet object. Value Fourier scores for all features (genes) of eset Note Note that this function evaluates soley the exprs matrix and no information is used from the phenodata. In particular, the ordering of samples (arrays) is the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent i.e. they should be averagered prior to analysis or placed into different distinct ExpressionSet objects.

fourierscore 7 Author(s) Matthias E. Futschik (http://www.cbme.ualg.pt/mfutschik_cbme.html) References Matthias E. Futschik and Hanspeter Herzel (2008) Are we overestimating the number of cell-cycling genes? The impact of background models on time series analysis, Bioinformatics, 24(8):1063-1069 Examples # Data preprocessing if (interactive()){ set.seed(1) data(yeast) # loading the reduced CDC28 yeast set (from the Mfuzz package) yeast <- yeast[1:600,] # for illustration yeast <- filter.na(yeast) # filters genes with more than 25% of the expression values missing yeast <- fill.na(yeast) # for illustration only; rather use knn method for yeast <- standardise(yeast) T.yeast <- 85 # cell cycle period (t=85min) times.yeast <- pdata(yeast)$time # time of measurements F <- fourierscore(yeast,t = T.yeast, times= times.yeast) # calculates the Fourier scores for yeast genes # Plot highest scoring gene plot(times.yeast,exprs(yeast)[order(-f)[1],],type="o", xlab="time",ylab="expression", main=featurenames(yeast)[order(-f)[1]]) # Plot lowest scoring gene plot(times.yeast,exprs(yeast)[order(f)[1],],type="o", xlab="time",ylab="expression", main=featurenames(yeast)[order(f)[1]]) }

Index Topic datagen backgrounddata, 3 Topic distribution backgrounddata, 3 Topic htest fdrfourier, 4 Topic ts ar1analysis, 2 fdrfourier, 4 fourierscore, 6 ar, 2 ar1analysis, 2 backgrounddata, 3 fdrfourier, 4 fourierscore, 6 8