Package abe. October 30, 2017

Similar documents
Package FWDselect. December 19, 2015

Package palmtree. January 16, 2018

Package rfsa. R topics documented: January 10, Type Package

Package freeknotsplines

Package MIICD. May 27, 2017

Package DTRreg. August 30, 2017

Package vinereg. August 10, 2018

Package quickreg. R topics documented:

Package robets. March 6, Type Package

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package intccr. September 12, 2017

Package PTE. October 10, 2017

Package basictrendline

Package rpst. June 6, 2017

Package simr. April 30, 2018

Package TVsMiss. April 5, 2018

Package datasets.load

Package clustvarsel. April 9, 2018

Package citools. October 20, 2018

Package clusterpower

Package lcc. November 16, 2018

Package sprinter. February 20, 2015

Package GWRM. R topics documented: July 31, Type Package

Package GLDreg. February 28, 2017

Package nonnest2. January 23, 2018

Package complmrob. October 18, 2015

Package optimus. March 24, 2017

Package fbroc. August 29, 2016

Package vip. June 15, 2018

Package cwm. R topics documented: February 19, 2015

Package sparsereg. R topics documented: March 10, Type Package

Package fastdummies. January 8, 2018

Package MOST. November 9, 2017

Package missforest. February 15, 2013

Package SSLASSO. August 28, 2018

Package nricens. R topics documented: May 30, Type Package

Package rereg. May 30, 2018

Package EFS. R topics documented:

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0

Package CombMSC. R topics documented: February 19, 2015

Package svalues. July 15, 2018

Package edrgraphicaltools

Package caic4. May 22, 2018

Package epitab. July 4, 2018

Package SSRA. August 22, 2016

Package CMatching. R topics documented:

Package bisect. April 16, 2018

Package addhaz. September 26, 2018

Package glmnetutils. August 1, 2017

Package longclust. March 18, 2018

Package manet. September 19, 2017

Package ordinalnet. December 5, 2017

Package quadprogxt. February 4, 2018

Package PrivateLR. March 20, 2018

Package MCMC4Extremes

Package sizemat. October 19, 2018

Package qlearn. R topics documented: February 20, Type Package Title Estimation and inference for Q-learning Version 1.

Package FisherEM. February 19, 2015

Package bacon. October 31, 2018

Package endogenous. October 29, 2016

Package misclassglm. September 3, 2016

Package CALIBERrfimpute

Package oaxaca. January 3, Index 11

Package ArCo. November 5, 2017

Package nonlinearicp

Package sspse. August 26, 2018

Package apastyle. March 29, 2017

Package StVAR. February 11, 2017

Package BigVAR. R topics documented: April 3, Type Package

Package uclaboot. June 18, 2003

Package gamm4. July 25, Index 10

Package ICsurv. February 19, 2015

Package OLScurve. August 29, 2016

Package BiDimRegression

Package reghelper. April 8, 2017

Package bayesdp. July 10, 2018

Package flexcwm. May 20, 2018

Package QCAtools. January 3, 2017

Package MSwM. R topics documented: February 19, Type Package Title Fitting Markov Switching Models Version 1.

The simpleboot Package

Package robustgam. February 20, 2015

Package interplot. R topics documented: June 30, 2018

Package coloc. February 24, 2018

Package kirby21.base

Package CausalImpact

Package MatchIt. April 18, 2017

Package sure. September 19, 2017

Package C50. December 1, 2017

Package mlvar. August 26, 2018

Package gppm. July 5, 2018

Package colf. October 9, 2017

Package IATScore. January 10, 2018

Package ClusterBootstrap

Package CausalGAM. October 19, 2017

Package OrthoPanels. November 11, 2016

Package PCADSC. April 19, 2017

Package sigqc. June 13, 2018

Package EBglmnet. January 30, 2016

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package msgps. February 20, 2015

Transcription:

Package abe October 30, 2017 Type Package Title Augmented Backward Elimination Version 3.0.1 Date 2017-10-25 Author Rok Blagus [aut, cre], Sladana Babic [ctb] Maintainer Rok Blagus <rok.blagus@mf.uni-lj.si> Description Performs augmented backward elimination and checks the stability of the obtained model. Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>. License GPL (>= 2) RoxygenNote 6.0.1 NeedsCompilation no Repository CRAN Date/Publication 2017-10-30 09:14:07 UTC R topics documented: abe.............................................. 2 abe.boot........................................... 5 plot.abe........................................... 7 summary.abe........................................ 9 Index 11 1

2 abe abe Augmented Backward Elimination Description Usage Function "abe" performs Augmented backward elimination where variable selection is based on the change-in-estimate and significance or information criteria. It can also make a backward-selection based on significance or information criteria only by turning off the change-in-estimate criterion. abe(fit, data = NULL, include = NULL, active = NULL, tau = 0.05, exp.beta = TRUE, exact = FALSE, criterion = "alpha", alpha = 0.2, type.test = "Chisq", type.factor = NULL, verbose = T) Arguments fit data include active tau exp.beta exact criterion An object of a class "lm", "glm" or "coxph" representing the fit. Note, the functions should be fitted with argument x=true and y=true. data frame used when fitting the object fit. a vector containing the names of variables that will be included in the final model. These variables are used as only passive variables during modeling. These variables might be exposure variables of interest or known confounders. They will never be dropped from the working model in the selection process, but they will be used passively in evaluating change-in-estimate criteria of other variables. Note, variables which are not specified as include or active in the model fit are assumed to be active and passive variables. a vector containing the names of active variables. These less important explanatory variables will only be used as active, but not as passive variables when evaluating the change-in-estimate criterion. Value that specifies the threshold of the relative change-in-estimate criterion. Default is set to 0.05. Logical specifying if exponent is used in formula to standardize the criterion. Default is set to TRUE. Logical, specifies if the method will use exact change-in-estimate or its approximation. Default is set to FALSE, which means that the method will use approximation proposed by Dunkler et al. Note, setting to TRUE can severely slow down the algorithm, but setting to FALSE can in some cases lead to a poor approximation of the change-in-estimate criterion. String that specifies the strategy to select variables for the black list. Currently supported options are significance level 'alpha', Akaike information criterion 'AIC' and Bayesian information criterion 'BIC'. If you are using significance level, in that case you have to specify the value of alpha (see parameter alpha) and type of the test statistic (see parameter type.test). Default is set to "alpha".

abe 3 alpha type.test type.factor verbose Value that specifies the level of significance as explained above. Default is set to 0.2. String that specifies which test should be performed in case the criterion = "alpha". Possible values are "F" and "Chisq" (default) for class "lm", "Rao", "LRT", "Chisq" (default), "F" for class "glm" and "Chisq" for class "coxph". See also drop1. String that specifies how to treat factors, see details, possible values are "factor" and "individual". Logical that specifies if the variable selection process should be printed. Note: this can severely slow down the algorithm. Details Using the default settings ABE will perform augmented backward elimination based on significance. The level of significance will be set to 0.2. All variables will be treated as "passive or active". Approximated change-in-estimate will be used. Threshold of the relative change-in-estimate criterion will be 0.05. Setting tau to a very large number (e.g. Inf) turns off the change-in-estimate criterion, and ABE will only perform backward elimination. Specifying "alpha" = 0 will include variables only because of the change-in-estimate criterion, as then variables are not safe from exclusion because of their p-values. Specifying "alpha" = 1 will always include all variables. When using type.factor="individual" each dummy variable of a factor is treated as an individual explanatory variable, hence only this dummy variable can be removed from the model (warning: use sensible coding for the reference group). Using type.factor="factor" will look at the significance of removing all dummy variables of the factor and can drop the entire variable from the model. Value An object of class "lm", "glm" or "coxph" representing the model chosen by abe method. Author(s) Rok Blagus, <rok.blagus@mf.uni-lj.si> Sladana Babic References Daniela Dunkler, Max Plischke, Karen Lefondre, and Georg Heinze. Augmented backward elimination: a pragmatic and purposeful way to develop statistical models. PloS one, 9(11):e113677, 2014. See Also abe.boot, lm, glm and coxph

4 abe Examples # simulate some data: set.seed(1) n=100 x1<-runif(n) x2<-runif(n) x3<-runif(n) y<--5+5*x1+5*x2+ rnorm(n,sd=5) dd<-data.frame(y,x1,x2,x3) # fit a simple model containing only numeric covariates fit<-lm(y~x1+x2+x3,x=true,y=true,data=dd) # perform ABE with "x1" as only passive and "x2" as only active # using the exact change in the estimate of 5% and significance # using 0.2 as a threshold abe.fit<-abe(fit,data=dd,include="x1",active="x2", tau=0.05,exp.beta=false,exact=true,criterion="alpha",alpha=0.2, type.test="chisq",verbose=true) summary(abe.fit) # similar example, but turn off the change-in-estimate and perform # only backward elimination abe.fit<-abe(fit,data=dd,include="x1",active="x2", tau=inf,exp.beta=false,exact=true,criterion="alpha",alpha=0.2, type.test="chisq",verbose=true) summary(abe.fit) # an example with the model containing categorical covariates: dd$x3<-rbinom(n,size=3,prob=1/3) dd$y1<--5+5*x1+5*x2+ rnorm(n,sd=5) fit<-lm(y1~x1+x2+factor(x3),x=true,y=true,data=dd) # treat "x3" as a single covariate: abe.fit.fact<-abe(fit,data=dd,include="x1",active="x2", tau=0.05,exp.beta=false,exact=true,criterion="alpha",alpha=0.2, type.test="chisq",verbose=true,type.factor="factor") summary(abe.fit.fact) # treat each dummy of "x3" as a separate covariate: abe.fit.ind<-abe(fit,data=dd,include="x1",active="x2", tau=0.05,exp.beta=false,exact=true,criterion="alpha",alpha=0.2, type.test="chisq",verbose=true,type.factor="individual") summary(abe.fit.ind)

abe.boot 5 abe.boot Bootstrapped Augmented Backward Elimination Description Usage Performs Augmented backward elimination on re-sampled datasets using different bootstrap and re-sampling techniques. abe.boot(fit, data = NULL, include = NULL, active = NULL, tau = 0.05, exp.beta = TRUE, exact = FALSE, criterion = "alpha", alpha = 0.2, type.test = "Chisq", type.factor = NULL, num.boot = 100, type.boot = c("bootstrap", "mn.bootstrap", "subsampling"), prop.sampling = 0.632) Arguments fit data include active tau exp.beta exact criterion An object of a class "lm", "glm" or "coxph" representing the fit. Note, the functions should be fitted with argument x=true and y=true. data frame used when fitting the object fit. a vector containing the names of variables that will be included in the final model. These variables are used as passive variables during modeling. These variables might be exposure variables of interest or known confounders. They will never be dropped from the working model in the selection process, but they will be used passively in evaluating change-in-estimate criteria of other variables. Note, variables which are not specified as include or active in the model fit are assumed to be active and passive variables. a vector containing the names of active variables. These less important explanatory variables will only be used as active, but not as passive variables when evaluating the change-in-estimate criterion. Value that specifies the threshold of the relative change-in-estimate criterion. Default is set to 0.05. Logical specifying if exponent is used in formula to standardize the criterion. Default is set to TRUE. Logical, specifies if the method will use exact change-in-estimate or approximated. Default is set to FALSE, which means that the method will use approximation proposed by Dunkler et al. Note, setting to TRUE can severely slow down the algorithm, but setting to FALSE can in some cases lead to a poor approximation of the change-in-estimate criterion. String that specifies the strategy to select variables for the blacklist. Currently supported options are significance level 'alpha', Akaike information criterion 'AIC' and Bayesian information criterion 'BIC'. If you are using significance level, in that case you have to specify the value of alpha (see parameter alpha). Default is set to "alpha".

6 abe.boot Details alpha type.test type.factor num.boot type.boot Value that specifies the level of significance as explained above. Default is set to 0.2. String that specifies which test should be performed in case the criterion = "alpha". Possible values are "F" and "Chisq" (default) for class "lm", "Rao", "LRT", "Chisq" (default), "F" for class "glm" and "Chisq" for class "coxph". See also drop1. String that specifies how to treat factors, see details, possible values are "factor" and "individual". number of bootstrap re-samples String that specifies the type of bootstrap. Possible values are "bootstrap", "mn.bootstrap", "subsampling", see details prop.sampling Sampling proportion. Only applicable for type.boot="mn.bootstrap" and type.boot="subsampling", defaults to 0.632. See details. type.boot can be bootstrap (n observations drawn from the original data with replacement), mn.bootstrap (m out of n observations drawn from the original data with replacement), subsampling (m out of n observations drawn from the original data without replacement), where m is [prop.sampling*n]. Value an object of class abe for which summary and plot functions are available. A list with the following elements: models the final models obtained after performing ABE on re-sampled datasets, each object in the list is of the same class as fit alpha the vector of significance levels used tau the vector of threshold values for the change-in-estimate num.boot number of re-sampled datasets criterion criterion used when constructing the black-list all.vars a list of variables used when estimating fit fit.or the initial model Author(s) Rok Blagus, <rok.blagus@mf.uni-lj.si> Sladana Babic References Daniela Dunkler, Max Plischke, Karen Lefondre, and Georg Heinze. Augmented backward elimination: a pragmatic and purposeful way to develop statistical models. PloS one, 9(11):e113677, 2014. Riccardo De Bin, Silke Janitza, Willi Sauerbrei and Anne-Laure Boulesteix. Subsampling versus Bootstrapping in Resampling-Based Model Selection for Multivariable Regression. Biometrics 72, 272-280, 2016.

plot.abe 7 See Also abe, summary.abe, plot.abe Examples # simulate some data and fit a model set.seed(1) n=100 x1<-runif(n) x2<-runif(n) x3<-runif(n) y<--5+5*x1+5*x2+ rnorm(n,sd=5) dd<-data.frame(y=y,x1=x1,x2=x2,x3=x3) fit<-lm(y~x1+x2+x3,x=true,y=true,data=dd) # use ABE on 50 bootstrap re-samples considering different # change-in-estimate thresholds and significance levels fit.boot<-abe.boot(fit,data=dd,include="x1",active="x2", tau=c(0.05,0.1),exp.beta=false,exact=true, criterion="alpha",alpha=c(0.2,0.05),type.test="chisq", num.boot=50,type.boot="bootstrap") summary(fit.boot) # use ABE on 50 subsamples randomly selecting 50% of subjects # considering different change-in-estimate thresholds and # significance levels fit.boot<-abe.boot(fit,data=dd,include="x1",active="x2", tau=c(0.05,0.1),exp.beta=false,exact=true, criterion="alpha",alpha=c(0.2,0.05),type.test="chisq", num.boot=50,type.boot="subsampling",prop.sampling=0.5) summary(fit.boot) plot.abe Plot Function Description Usage Plot function for the bootstrapped version of ABE. ## S3 method for class 'abe' plot(x, type.plot = c("coefficients", "models", "variables"), alpha = NULL, tau = NULL, variable = NULL,...)

8 plot.abe Arguments x Details type.plot an object of class "abe", an object returned by a call to abe.boot string which specifies the type of the plot. See details. alpha values of alpha for which the plot is to be made (can be a vector of length >1) tau values of tau for which the plot is to be made (can be a vector of length >1) variable variables for which the plot is to be made (can be a vector of length >1)... Arguments to be passed to methods, such as graphical parameters (see barplot, hist). when using type.plot="coefficients" the function plots a histogram of the estimated regression coefficients for the specified variables, alpha(s) and tau(s) obtained from different re-sampled datasets. When the variable is not included in the final model, its regression coefficient is set to zero. When using type.plot="variables" the function plots a barplot of the relative inclusion frequencies of the specified variables, for the specified values of alpha and tau. When using type.plot="models" the function plots a barplot of the relative frequencies of the final models for specified alpha(s) and tau(s). Author(s) See Also Rok Blagus, <rok.blagus@mf.uni-lj.si> Sladana Babic abe.boot, summary.abe Examples set.seed(1) n=100 x1<-runif(n) x2<-runif(n) x3<-runif(n) y<--5+5*x1+5*x2+ rnorm(n,sd=5) dd<-data.frame(y=y,x1=x1,x2=x2,x3=x3) fit<-lm(y~x1+x2+x3,x=true,y=true,data=dd) fit.boot<-abe.boot(fit,data=dd,include="x1",active="x2", tau=c(0.05,0.1),exp.beta=false,exact=true, criterion="alpha",alpha=c(0.2,0.05),type.test="chisq", num.boot=50,type.boot="bootstrap") plot(fit.boot,type.plot="coefficients", alpha=0.2,tau=0.1,variable=c("x1","x3"), col="light blue") plot(fit.boot,type.plot="variables",

summary.abe 9 alpha=0.2,tau=0.1,variable=c("x1","x2","x3"), col="light blue",horiz=true,las=1) par(mar=c(4,6,4,2)) plot(fit.boot,type.plot="models", alpha=0.2,tau=0.1,col="light blue",horiz=true,las=1) summary.abe Summary Function Description makes a summary of a bootstrapped version of ABE Usage ## S3 method for class 'abe' summary(object, conf.level = 0.95,...) Arguments object an object of class "abe", an object returned by a call to abe.boot conf.level the confidence level, defaults to 0.95... additional arguments affecting the summary produced. Value a list with the following elements: var.rel.frequencies: inclusion relative frequencies for all variables from the initial model model.rel.frequencies: relative frequencies of the final models var.coefs: bootstrap medians and percentiles for the estimates of the regression coefficients for each variable from the initial model Author(s) Rok Blagus, <rok.blagus@mf.uni-lj.si> Sladana Babic See Also abe.boot, plot.abe

10 summary.abe Examples set.seed(1) n=100 x1<-runif(n) x2<-runif(n) x3<-runif(n) y<--5+5*x1+5*x2+ rnorm(n,sd=5) dd<-data.frame(y=y,x1=x1,x2=x2,x3=x3) fit<-lm(y~x1+x2+x3,x=true,y=true,data=dd) fit.boot<-abe.boot(fit,data=dd,include="x1",active="x2", tau=c(0.05,0.1),exp.beta=false,exact=true, criterion="alpha",alpha=c(0.2,0.05),type.test="chisq", num.boot=50,type.boot="bootstrap") summary(fit.boot)$var.rel.frequencies

Index abe, 2, 7 abe.boot, 3, 5, 8, 9 barplot, 8 coxph, 3 drop1, 3, 6 glm, 3 hist, 8 lm, 3 plot.abe, 7, 7, 9 summary.abe, 7, 8, 9 11