Package svapls. February 20, 2015

Similar documents
Package ICsurv. February 19, 2015

Package EBglmnet. January 30, 2016

Package BiDimRegression

Package subtype. February 20, 2015

Package mrm. December 27, 2016

Package mixphm. July 23, 2015

Package comphclust. May 4, 2017

Package comphclust. February 15, 2013

Package parcor. February 20, 2015

Package orqa. R topics documented: February 20, Type Package

Package cwm. R topics documented: February 19, 2015

Package psda. September 5, 2017

srap: Simplified RNA-Seq Analysis Pipeline

Package mnlogit. November 8, 2016

Package StVAR. February 11, 2017

Package freeknotsplines

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package Tnseq. April 13, 2017

Package CorporaCoCo. R topics documented: November 23, 2017

Package TANOVA. R topics documented: February 19, Version Date

Package LMGene. R topics documented: December 23, Version Date

Package mtsdi. January 23, 2018

Package influence.sem

Package PedCNV. February 19, 2015

Package nlnet. April 8, 2018

Package TANDEM. R topics documented: June 15, Type Package

ROTS: Reproducibility Optimized Test Statistic

Package r.jive. R topics documented: April 12, Type Package

Package GWRM. R topics documented: July 31, Type Package

Package ssizerna. January 9, 2017

Package TestDataImputation

Package SSLASSO. August 28, 2018

Package MultiRR. October 21, 2015

Package merror. November 3, 2015

Package GFD. January 4, 2018

Generalized least squares (GLS) estimates of the level-2 coefficients,

Package RankAggreg. May 15, 2018

Package biglars. February 19, 2015

Package gpdtest. February 19, 2015

Package RCA. R topics documented: February 29, 2016

Package twilight. August 3, 2013

Package missforest. February 15, 2013

Package XMRF. June 25, 2015

Package msgps. February 20, 2015

Package LncPath. May 16, 2016

Package obliquerf. February 20, Index 8. Extract variable importance measure

Package longclust. March 18, 2018

Package GWAF. March 12, 2015

Package marqlevalg. February 20, 2015

Package CONOR. August 29, 2013

Package Rambo. February 19, 2015

Package EMDomics. November 11, 2018

Package CuCubes. December 9, 2016

Package snm. July 20, 2018

Package RYoudaoTranslate

Package inca. February 13, 2018

Package cycle. March 30, 2019

Package vinereg. August 10, 2018

Package ArrayBin. February 19, 2015

Package genelistpie. February 19, 2015

Package tseriesentropy

Package capushe. R topics documented: April 19, Type Package

Package cgh. R topics documented: February 19, 2015

Chapter 6: Linear Model Selection and Regularization

Package clusterrepro

Package SeleMix. R topics documented: November 22, 2016

Package rknn. June 9, 2015

Package Combine. R topics documented: September 4, Type Package Title Game-Theoretic Probability Combination Version 1.

Package st. July 8, 2015

Package sparsenetgls

Package assortnet. January 18, 2016

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Package pcr. November 20, 2017

Package gpr. February 20, 2015

Package rfsa. R topics documented: January 10, Type Package

Package intccr. September 12, 2017

Package FisherEM. February 19, 2015

Package MultiMeta. February 19, 2015

Package mpm. February 20, 2015

Package FWDselect. December 19, 2015

Course on Microarray Gene Expression Analysis

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

Package DSBayes. February 19, 2015

Package sigqc. June 13, 2018

Design of Experiments

Package geneslope. October 26, 2016

Package GLDreg. February 28, 2017

Package SoftClustering

Package IntNMF. R topics documented: July 19, 2018

Package ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2.

Package LVGP. November 14, 2018

Package ETC. February 19, 2015

Package orthogonalsplinebasis

The supclust Package

Package pampe. R topics documented: November 7, 2015

Package RobustRankAggreg

Package esabcv. May 29, 2015

Package listdtr. November 29, 2016

Package optimus. March 24, 2017

2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy

Transcription:

Package svapls February 20, 2015 Type Package Title Surrogate variable analysis using partial least squares in a gene expression study. Version 1.4 Date 2013-09-19 Author Sutirtha Chakraborty, Somnath Datta and Susmita Datta Maintainer Sutirtha Chakraborty <statistuta@gmail.com> Depends R (>= 2.0), class, stats, pls Accurate identification of genes that are truly differentially expressed over two sample varieties, after adjusting for hidden subject-specific effects of residual heterogeneity. License GPL-3 Collate fitmodel.r svpls.r hfp.r NeedsCompilation no Repository CRAN Date/Publication 2013-09-20 08:13:19 R topics documented: svapls-package....................................... 2 fitmodel........................................... 3 hfp.............................................. 4 hidden_fac.dat........................................ 5 svpls............................................. 6 Index 9 1

2 svapls-package svapls-package Surrogate variable analysis using Partial Least Squares in a gene expression data Details The package svapls contains functions that are intended for the identification, correction and visualization of the hidden variability owing to a variety of unknown subject/sample specific effects of residual heterogeneity in a gene expression data. Package: svapls Type: Package Version: 1.4 Date: 2013-09-19 License: GPL-3 The package can be used to find the genes that are truly differentially expressed between two types of samples (tissue types, biological conditions like Cancer/Non- Cancer samples, etc.), after adjusting for the hidden factors of residual heterogeneity in the data. The function svpls detects the truly positive genes after correcting for the hidden variation and also provides a modified gene expression matrix which is free from the spurious effects of the residual expression heterogeneity. Another important function hfp produces a heat- map representing the intensity of latent variability due to the unknown sample- specific factors, for any specified set of genes and subjects. fitmodel, svpls and hfp Author(s) Sutirtha Chakraborty, Somnath Datta and Susmita Datta. Maintainer: Sutirtha Chakraborty <statistuta@gmail.com> References Sutirtha Chakraborty, Somnath Datta and Susmita Datta. (2012) Surrogate Variable Analysis Using Partial Least Squares in Gene Expression Studies. Bioinformatics. fit <- svpls(10,10,hidden_fac.dat,pmax = 5) fit$genes Y.corrected <- fit$y.corr

fitmodel 3 gen <- paste("g",c(1:15,50:65),sep="") sub <- paste("s",c(1:5,11:17),sep="") hfp(fit,gen,sub,hidden_fac.dat) fitmodel Function to fit an ANCOVA model to the log transformed gene expression data, with a certain specified number of surrogate variables. Usage This function begins its operation by fitting a standard ANOVA model to the gene expression data, with the gene, variety main effects and their mutual interaction. The residuals from the fit of this model and the original gene expression values are then respectively organized into two matrices E and Y, where each column corresponds to a certain gene. Now E is regressed on Y by Partial Least Squares (PLS) and a specified number of scores are extracted as the estimates of the latent components from their respective column spaces. The scores in the Y-space are used as surrogate variables along with the gene and variety interaction effects with the first score and the usual effects from the standard ANOVA model, in order to fit an ANCOVA model to the data. The function returns the results from this fit. fitmodel(k1, k2, Y, n.surr) Arguments k1 Number of subjects/samples under variety 1. k2 Number of subjects/samples under variety 2. Y Value n.surr The log transformed gene expression data, with genes along the rows and subjects/samples along the columns. The specified number of surrogate variables. mu.hat G.hat V.hat GV.hat sc beta.hat GZ1.hat VZ1.hat Intercept (general mean effect). Main effects of the genes. Main effects of the varieties. Gene-Variety interaction effects. Values of the Surrogate variables (computed only when n.surr>0). Coefficients of the surrogate variables (computed only when n.surr>0). Interaction effects of the genes with the first surrogate variable (computed only when n.surr>0). Interaction effects of the varieties with the fist surrogate variable (computed only when n.surr>0).

4 hfp vhat.gvh MSE AIC Variances of the estimators for the gene-variety interaction effects. Mean Squarred Error for the fitted model. Value of the Akaike s Information Criterion (AIC) for the fitted model. Author(s) Sutirtha Chakraborty, Somnath Datta and Susmita Datta. References Sutirtha Chakraborty, Somnath Datta and Susmita Datta. (2012) Surrogate Variable Analysis Using Partial Least Squares in Gene Expression Studies. Bioinformatics. Martens, H., Naes, T. (1989) Multivariate Calibration. Chicestor:Wiley. See Also svpls, hfp ## Fitting an ANCOVA model with 5 surrogate variables fit <- fitmodel(10,10,hidden_fac.dat,n.surr = 5) print(fit) hfp Function to construct a heatmap of the hidden variation in the gene expression data. Usage The function hfp produces a plot of the PLS imputed estimate of the hidden variability in the data, derived from the optimal model, corresponding to an user-specified set of genes and subjects/samples. hfp(obj, gen, ind, Y) Arguments obj gen ind Y An svpls object. An user-specified set of genes. An user-specified set of subjects. A log transformed gene expression matrix with genes along the rows and subjects/samples along the columns.

hidden_fac.dat 5 Value A heatmap of the hidden variability corresponding to the specified set of genes and subjects, attributable to the unknown subject-specific factors in the gene expression data. Author(s) Sutirtha Chakraborty, Somnath Datta and Susmita Datta. References Sutirtha Chakraborty, Somnath Datta and Susmita Datta. (2012) Surrogate Variable Analysis Using Partial Least Squares in Gene Expression Studies. Bioinformatics. See Also heatmap, fitmodel, svpls ## Fitting the optimal ANCOVA model to the data gives: fit <- svpls(10,10,hidden_fac.dat,pmax = 5) ## Specifying the sets of genes and subjects gen <- paste("g",c(1:15,50:65),sep="") sub <- paste("s",c(1:5,11:17),sep="") hfp(fit,gen,sub,hidden_fac.dat) hidden_fac.dat A gene expression data affected by a hidden variable. The dataset contains the log transformed expression levels of 500 genes over 20 subjects distributed equally between two varieties 1 and 2. The data is affected by the unknown effects from a hidden confounder whose effect changes over the two sample varieties. Usage

6 svpls Format A data frame with 500 observations on the following 20 variables. S1 a numeric vector S2 a numeric vector S3 a numeric vector S4 a numeric vector S5 a numeric vector S6 a numeric vector S7 a numeric vector S8 a numeric vector S9 a numeric vector S10 a numeric vector S11 a numeric vector S12 a numeric vector S13 a numeric vector S14 a numeric vector S15 a numeric vector S16 a numeric vector S17 a numeric vector S18 a numeric vector S19 a numeric vector S20 a numeric vector ## maybe str(hidden_fac.dat) ; plot(hidden_fac.dat)... svpls Function for identfying the optimal ANCOVA model and detecting the genes that are truly differentially expressed between the two types of samples. This function calls fitmodel repeatedly to fit a series of ANCOVA models along with the standard ANOVA model, to the log transformed gene expression data. The model with the minimum AIC is selected as the optimal one and its corresponding estimated effects are then used to perform a multiple testing of differential expression, over all the genes, using the Benjamini-Hochberg correction.

svpls 7 Usage svpls(k1, k2, Y, pmax = 3, fdr = 0.05) Arguments k1 Number of subjects/samples under variety 1. k2 Number of subjects/samples under variety 2. Y pmax fdr The log transformed gene expression data, with genes along the rows and subjects/samples along the columns. Maximum number of surrogate variables to be incorporated in the ANCOVA model (means pmax ANCOVA models are fitted to the data). By default, it is taken as 3. The specified False Discovery Rate (FDR) for multiple testing of differential expression, using the Benjamini-Hochberg correction. By Default it is taken as 0.05. Value opt.model PLS.imp Y.corr pvalues pvalues.adj genes AIC.opt The optimal model. 1 denotes the standard ANOVA model. PlS imputed estimate of the hidden expression heterogeneity, evaluated from the optimal model (applicable only when opt.model>1). Corrected gene expression matrix after adjusting for the hidden effects (applicable only when opt.model>1). p-values from the tests with the effects estimated from the standard ANOVA model (returned only when opt.model=1. Adjusted p-values after correcting for the hidden effects (applicable only when opt.model>1). Genes that are deemed to be differentially expressed from the multiple hypotheses testing with effects estimated from the optimal model. AIC value for the optimal model. Author(s) Sutirtha Chakraborty, Somnath Datta and Susmita Datta. References Hirotsugu, A. (1980) Likelihood and the Bayes Procedure. The Institute of Statistical Mathematics, Tokyo., Benjamini, Y and Hochberg, Y (1995) Controlling the false discovery rate : a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. See Also fitmodel, hfp

8 svpls ## Loading the first dataset ## Fitting the optimal ANCOVA model to the data gives: fit <- svpls(10,10,hidden_fac.dat,pmax = 5) ## The optimal ANCOVA model, its AIC value and the positive genes detected from it are given by: fit$opt.model fit$aic.opt fit$genes ## The corrected gene expression matrix obtained after removing the effects of ## the hidden variability is given by: Y.corrected <- fit$y.corr

Index Topic classes fitmodel, 3 svpls, 6 Topic datasets hidden_fac.dat, 5 Topic methods fitmodel, 3 svpls, 6 Topic models fitmodel, 3 svapls-package, 2 Topic print fitmodel, 3 hfp, 4 svpls, 6 fitmodel, 3, 5, 7 heatmap, 5 hfp, 4, 4, 7 hidden_fac.dat, 5 svapls (svapls-package), 2 svapls-package, 2 svpls, 4, 5, 6 9