Package TANDEM. R topics documented: June 15, Type Package

Similar documents
Package glmnetutils. August 1, 2017

Package milr. June 8, 2017

Package glinternet. June 15, 2018

Package ordinalnet. December 5, 2017

Package PUlasso. April 7, 2018

Package EBglmnet. January 30, 2016

Package hiernet. March 18, 2018

Package msda. February 20, 2015

Package svalues. July 15, 2018

Package bisect. April 16, 2018

Package flam. April 6, 2018

Package TVsMiss. April 5, 2018

Package intccr. September 12, 2017

Package ECctmc. May 1, 2018

Package GauPro. September 11, 2017

Package PDN. November 4, 2017

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Glmnet Vignette. Introduction. Trevor Hastie and Junyang Qian

Package subsemble. February 20, 2015

Package rpst. June 6, 2017

Package BiDimRegression

Package hsiccca. February 20, 2015

Package enpls. May 14, 2018

Package RAMP. May 25, 2017

Lasso. November 14, 2017

Package RNAseqNet. May 16, 2017

Package svmpath. R topics documented: August 30, Title The SVM Path Algorithm Date Version Author Trevor Hastie

Package vinereg. August 10, 2018

Package CVR. March 22, 2017

Package fastdummies. January 8, 2018

Package MTLR. March 9, 2019

Package sparsenetgls

Package GFD. January 4, 2018

Package SEMrushR. November 3, 2018

Package BigTSP. February 19, 2015

Package psda. September 5, 2017

Package NNLM. June 18, 2018

Package catenary. May 4, 2018

Package colf. October 9, 2017

Package msgps. February 20, 2015

Package gppm. July 5, 2018

Package vennlasso. January 25, 2019

Package gbts. February 27, 2017

Package KRMM. R topics documented: June 3, Type Package Title Kernel Ridge Mixed Model Version 1.0 Author Laval Jacquin [aut, cre]

Package datasets.load

Package DTRlearn. April 6, 2018

14. League: A factor with levels A and N indicating player s league at the end of 1986

Package weco. May 4, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package optimus. March 24, 2017

Package IntNMF. R topics documented: July 19, 2018

Package bife. May 7, 2017

Package WordR. September 7, 2017

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package polywog. April 20, 2018

Package sprinter. February 20, 2015

Package freeknotsplines

Package LVGP. November 14, 2018

Package sigqc. June 13, 2018

Package nmslibr. April 14, 2018

Package sfc. August 29, 2016

Package nonlinearicp

Package simr. April 30, 2018

Package validara. October 19, 2017

Package epitab. July 4, 2018

Package splines2. June 14, 2018

Package CausalImpact

Package pksea. December 22, 2017

Package XMRF. June 25, 2015

Package diffusr. May 17, 2018

Package quickreg. R topics documented:

Package elmnnrcpp. July 21, 2018

Package widyr. August 14, 2017

Package Cubist. December 2, 2017

Package listdtr. November 29, 2016

Package DRaWR. February 5, 2016

Package fcr. March 13, 2018

Package scoop. September 16, 2011

Package shinyfeedback

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package ArCo. November 5, 2017

Package pcr. November 20, 2017

Package apastyle. March 29, 2017

Package clusternomics

The supclust Package

Package ssa. July 24, 2016

Package cosinor. February 19, 2015

Package rnn. R topics documented: June 21, Title Recurrent Neural Network Version 0.8.1

Package gwrr. February 20, 2015

Package automl. September 13, 2018

Package CatEncoders. March 8, 2017

Package CompGLM. April 29, 2018

Package phrasemachine

Package mgc. April 13, 2018

Package simsurv. May 18, 2018

Package balance. October 12, 2018

Package dyncorr. R topics documented: December 10, Title Dynamic Correlation Package Version Date

Package BASiNET. October 2, 2018

Package MonteCarlo. April 23, 2017

Transcription:

Type Package Package TANDEM June 15, 2017 Title A Two-Stage Approach to Maximize Interpretability of Drug Response Models Based on Multiple Molecular Data Types Version 1.0.2 Date 2017-04-07 Author Nanne Aben Maintainer Nanne Aben <nanne.aben@gmail.com> A two-stage regression method that can be used when various input data types are correlated, for example gene expression and methylation in drug response prediction. In the first stage it uses the upstream features (such as methylation) to predict the response variable (such as drug response), and in the second stage it uses the downstream features (such as gene expression) to predict the residuals of the first stage. In our manuscript (Aben et al., 2016, <doi:10.1093/bioinformatics/btw449>), we show that using TANDEM prevents the model from being dominated by gene expression and that the features selected by TANDEM are more interpretable. Depends R (>= 2.10) Imports glmnet, Matrix License GPL-2 LazyData TRUE RoxygenNote 6.0.1 Suggests knitr, rmarkdown VignetteBuilder knitr NeedsCompilation no Repository CRAN Date/Publication 2017-06-15 13:40:02 UTC R topics documented: coef.tandem......................................... 2 example_data........................................ 3 1

2 coef.tandem nested.cv.......................................... 3 predict.tandem....................................... 4 relative.contributions.................................... 5 tandem............................................ 7 Index 9 coef.tandem Returns the regression coefficients from a TANDEM fit Returns the regression coefficients from a TANDEM fit. ## S3 method for class 'tandem' coef(object,...) object A tandem-object, as returned by tandem()... Not used. Other arguments for coef(). The regression coefficients. # fit a tandem model, determine the coefficients and create a prediction beta = coef(fit) y_hat = predict(fit, newx=x)

example_data 3 example_data A small artificial data set A small artificial data set example_data Format A small artificial data set, containing 200 samples, 20 upstream features and 20 downstream features x A 200 x 40 feature matrix y A 200 x 1 response vector upstream A 40 x 1 boolean vector indicating which features are upstream features data_types A 40 x 1 vector indicating for each feature to which data type they belong nested.cv Estimating predictive performance via nested cross-validation Performs a nested cross-validation to assess the predictive performance. The inner loop is used to determine the optimal lambda (as in cv.glmnet) and the outer loop is used to asses the predictive performance in an unbiased way. nested.cv(x, y, upstream, method = "tandem", family = "gaussian", nfolds = 10, nfolds_inner = 10, foldid = NULL, lambda_upstream = "lambda.1se", lambda_downstream = "lambda.1se", lambda_glmnet = "lambda.1se",...) x y upstream A feature matrix, where the rows correspond to samples and the columns to features. A vector containing the response. A logical index vector that indicates for each feature whether it s upstream (TRUE) or downstream (FALSE).

4 predict.tandem method family nfolds nfolds_inner Indicates whether the nested cross-validation is performed on TANDEM or on the classic approach (glmnet). Should be either "tandem" or "glmnet". The family parameter that s passed to cv.glmnet(). Currently, only family= gaussian is supported. Number of cross-validation folds (default is 10) used in the outer cross-validation loop. Number of cross-validation folds (default is 10) used to determine the optimal lambda in the inner cross-validation loop. foldid An optional vector indicating in which cross-validation fold each sample should be in the outer cross-validation loop. Overrides nfolds when used. lambda_upstream Only used when method= tandem. For the first stage (using the upstream features), should glmnet use lambda.min or lambda.1se? Default is lambda.1se. lambda_downstream Only used when method= tandem. For the second stage (using the downstream features), should glmnet use lambda.min or lambda.1se? Default is lambda.1se. lambda_glmnet Only used when method= glmnet. Should glmnet use lambda.min or lambda.1se? Default is lambda.1se.... Other parameters that are passed to cv.glmnet(). The predicted response vector y_hat and the mean-squared error (MSE). # assess the prediction error in a nested cv-loop # fix the seed to have the same foldids between the two methods set.seed(1) cv_tandem = nested.cv(x, y, upstream, method="tandem", alpha=0.5) set.seed(1) cv_glmnet = nested.cv(x, y, upstream, method="glmnet", alpha=0.5) barplot(c(cv_tandem$mse, cv_glmnet$mse), ylab="mse", names=c("tandem", "Classic approach")) predict.tandem Creates a prediction using a tandem-object Creates a prediction using a tandem-object.

relative.contributions 5 ## S3 method for class 'tandem' predict(object, newx,...) object newx A tandem-object, as returned by tandem() A feature matrix, where the rows correspond to samples and the columns to features.... Not used. Other arguments for predict(). The predicted response vector. # fit a tandem model, determine the coefficients and create a prediction beta = coef(fit) y_hat = predict(fit, newx=x) relative.contributions Determine the relative contribution per data type For each data type, determine its relative contribution to the overall prediction. relative.contributions(fit, x, data_types, lambda_glmnet = "lambda.1se") fit x Either a tandem-object or a cv.glmnet-object The feature matrix used to train the fit, where the rows correspond to samples and the columns to features.

6 relative.contributions data_types lambda_glmnet A vector of the same length as the number of features, that indicates for each feature to which data type it belongs. This vector doesn t need to correspond to the upstream vector used in tandem(). For example, the upstream features be spread across various data types (such as mutation, CNA, methylation and cancer type) and the downstream features could be gene expression. Only used when fit is a cv.glmnet object. Should glmnet use lambda.min or lambda.1se? Default is lambda.1se. Note that for TANDEM objects, the lambda_upstream and lambda_downstream parameters should be specified during the tandem() call, as they are used while fitting the model. A vector that indicates the relative contribution per data type. These numbers sum up to one. ## simple example data_types = example_data$data_types # fit TANDEM model # assess the relative contribution of upstream and downstream features contr = relative.contributions(fit, x, data_types) barplot(contr, ylab="relative contribution", ylim=0:1) ## comparing TANDEM and classic model (glmnet) data_types = example_data$data_types # fix the cv folds, to facilitate a comparison between models set.seed(1) n = nrow(x) nfolds = 10 foldid = ceiling(sample(1:n)/n * nfolds) # fit both a TANDEM and a classic model (glmnet) library(glmnet) fit2 = cv.glmnet(x, y, alpha=0.5, foldid=foldid) # assess the relative contribution of upstream and downstream features # using both methods contr_tandem = relative.contributions(fit, x, data_types) contr_glmnet = relative.contributions(fit2, x, data_types)

tandem 7 par(mfrow=c(1,2)) barplot(contr_tandem, ylab="relative contribution", main="tandem", ylim=0:1) barplot(contr_glmnet, ylab="relative contribution", main="classic approach", ylim=0:1) par(mfrow=c(1,1)) tandem Fits a TANDEM model by performing a two-stage regression Fits a TANDEM model by performing a two-stage regression. In the first stage, all upstream features (x[,upstream]) are regressed on the output y. In the second stage, the downstream features (x[,!upstream]) are regressed on the residuals of the first stage. In both stages Elastic Net regression (as implemented in cv.glmnet() from the glmnet package) is used to perform the regression. tandem(x, y, upstream, family = "gaussian", nfolds = 10, foldid = NULL, lambda_upstream = "lambda.1se", lambda_downstream = "lambda.1se",...) x y upstream family nfolds A feature matrix, where the rows correspond to samples and the columns to features. A vector containing the response. A boolean vector that indicates for each feature whether it s upstream (TRUE) or downstream (FALSE). The family parameter that s passed to cv.glmnet(). Currently, only family= gaussian is supported. Number of cross-validation folds (default is 10) used to determine the optimal lambda in cv.glmnet(). foldid An optional vector indicating in which cross-validation fold each sample should be. Overrides nfolds when used. lambda_upstream For the first stage (using the upstream features), should glmnet use lambda.min or lambda.1se? Default is lambda.1se. lambda_downstream For the second stage (using the downstream features), should glmnet use lambda.min or lambda.1se? Default is lambda.1se.... Other parameters that are passed to cv.glmnet(). A tandem-object.

8 tandem # fit a tandem model, determine the coefficients and create a prediction beta = coef(fit) y_hat = predict(fit, newx=x)

Index Topic datasets example_data, 3 coef.tandem, 2 example_data, 3 nested.cv, 3 predict.tandem, 4 relative.contributions, 5 tandem, 7 9