Package vip. June 15, 2018

Similar documents
Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package embed. November 19, 2018

Package sure. September 19, 2017

Package fastdummies. January 8, 2018

Package vinereg. August 10, 2018

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package docxtools. July 6, 2018

Package fastqcr. April 11, 2017

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package messaging. May 27, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package bisect. April 16, 2018

Package canvasxpress

Package dotwhisker. R topics documented: June 28, Type Package

Package preprosim. July 26, 2016

Package qualmap. R topics documented: September 12, Type Package

Package wrswor. R topics documented: February 2, Type Package

Package spark. July 21, 2017

Package datasets.load

Package ECctmc. May 1, 2018

Package validara. October 19, 2017

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package interplot. R topics documented: June 30, 2018

Package rgho. R topics documented: January 18, 2017

Package ecoseries. R topics documented: September 27, 2017

Package sfc. August 29, 2016

Package WordR. September 7, 2017

Package quickreg. R topics documented:

Package anomalydetection

Package splithalf. March 17, 2018

Package TrafficBDE. March 1, 2018

Package widyr. August 14, 2017

Package omu. August 2, 2018

Package milr. June 8, 2017

Package rsppfp. November 20, 2018

Package deductive. June 2, 2017

Package glmnetutils. August 1, 2017

Package SEMrushR. November 3, 2018

Package DALEX. August 6, 2018

Package gtrendsr. October 19, 2017

Package gtrendsr. August 4, 2018

Package caretensemble

Package comparedf. February 11, 2019

Package superheat. February 4, 2017

Package fingertipsr. May 25, Type Package Version Title Fingertips Data for Public Health

Package breakdown. June 14, 2018

Package TVsMiss. April 5, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Package statsdk. September 30, 2017

Package tidytransit. March 4, 2019

Package pdfsearch. July 10, 2018

Package facerec. May 14, 2018

Package modmarg. R topics documented:

Package zoomgrid. January 3, 2019

Package crossword.r. January 19, 2018

Package jpmesh. December 4, 2017

Package FSelectorRcpp

Package reghelper. April 8, 2017

Package opencage. January 16, 2018

Package tidytree. June 13, 2018

Package NFP. November 21, 2016

Package sankey. R topics documented: October 22, 2017

Package preprocomb. June 26, 2016

Package auctestr. November 13, 2017

Package balance. October 12, 2018

Package ezsummary. August 29, 2016

Package kdtools. April 26, 2018

Package editdata. October 7, 2017

Package robotstxt. November 12, 2017

Package optimus. March 24, 2017

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package d3plus. September 25, 2017

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0

Package EFS. R topics documented:

Package guardianapi. February 3, 2019

Package keyholder. May 19, 2018

Package effectr. January 17, 2018

Package calpassapi. August 25, 2018

Package autoshiny. June 25, 2018

Package semver. January 6, 2017

Package knitrprogressbar

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package anomalize. April 17, 2018

Package smoothr. April 4, 2018

Package mason. July 5, 2018

Package barcoder. October 26, 2018

Package queuecomputer

Package lumberjack. R topics documented: July 20, 2018

Package sigmanet. April 23, 2018

Package catenary. May 4, 2018

Package linkspotter. July 22, Type Package

Package epitab. July 4, 2018

Package basictrendline

Package sfdct. August 29, 2017

Package textrecipes. December 18, 2018

Package pmatch. October 19, 2018

Package ggmosaic. February 9, 2017

Package reval. May 26, 2015

Package condusco. November 8, 2017

Transcription:

Type Package Title Variable Importance Plots Version 0.1.0 Package vip June 15, 2018 A general framework for constructing variable importance plots from various types machine learning models in R. Aside from some standard modelbased variable importance measures, this package also provides a novel approach based on partial dependence plots (PDPs) and individual conditional expectation (ICE) curves as described in Greenwell et al. (2018) <arxiv:1805.04755>. License GPL (>= 2) URL https://github.com/koalaverse/vip BugReports https://github.com/koalaverse/vip/issues Encoding UTF-8 LazyData true Imports dplyr, ggplot2 (>= 0.9.0), gridextra, magrittr, pdp, plyr, stats, tibble, tidyr, utils Suggests C50, caret, earth, gbm, h2o, knitr, party, partykit, ranger, rpart, randomforest, rmarkdown, xgboost, glmnet, testthat RoxygenNote 6.0.1 NeedsCompilation no Author Brandon Greenwell [aut, cre] (<https://orcid.org/0000-0002-8120-0084>), Brad Boehmke [aut] Maintainer Brandon Greenwell <greenwell.brandon@gmail.com> Repository CRAN Date/Publication 2018-06-15 09:10:57 UTC R topics documented: vi.............................................. 2 vint............................................. 3 1

2 vi vip.............................................. 4 vi_ice............................................ 5 vi_model.......................................... 6 vi_pdp............................................ 7 Index 9 vi Variable Importance Compute variable importance scores for the predictors in a model. vi(, method = c("model", "pdp", "ice", "perm"), feature_names, FUN = NULL, truncate_feature_names = NULL, sort = TRUE, decreasing = TRUE, scale = FALSE,...) method feature_names A fitted model (e.g., a "randomforest" ). Character string specifying the type of variable importance (VI) to compute. Current options are "model" (for model-based VI scores), "pdp" (for PDPbased VI scores), "ice" (for ICE-based VI scores), and "perm" (for permutationbased VI scores). The default is "model". For details on the PDP/ICE-based method, see the reference below. Also, method = "perm" is currently ignored, but will be available in a future release. Character string giving the names of the predictor variables (i.e., features) of interest. FUN List with two componenets, "cat" and "con", containing the functions to use for categorical and continuous features, respectively. If NULL, the standard deviation is used for continuous features. For categorical features, the range statistic is used (i.e., (max - min) / 4). truncate_feature_names Integer specifying the length at which to truncate feature names. Default is NULL which results in no truncation (i.e., the full name of each feature will be printed). sort decreasing scale Logical indicating whether or not to order the sort the variable importance scores. Default is TRUE. Logical indicating whether or not the variable importance scores should be sorted in descending (TRUE) or ascending (FALSE) order of importance. Default is TRUE. Logical indicating whether or not to scale the variable importance scores so that the largest is 100. Default is FALSE.... Additional optional arguments.

vint 3 Value A tidy data frame (i.e., a "tibble" ) with two columns: Variable and Importance. For "glm"-like, an additional column, called Sign, is also included which includes the sign (i.e., POS/NEG) of the original coefficient. References Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. A Simple and Effective Model-Based Variable Importance Measure. arxiv preprint arxiv:1805.04755 (2018). Examples # # A projection pursuit regression example # # Load the sample data data(mtcars) # Fit a projection pursuit regression model mtcars.ppr <- ppr(mpg ~., data = mtcars, nterms = 1) # Compute variable importance scores vi(mtcars.ppr, method = "ice") # Plot variable importance scores vip(mtcars.ppr, method = "ice") vint Interaction Effects Compute the strength of two-way interaction effects. For details, see the reference below. vint(, feature_names, progress = "none", parallel = FALSE, paropts = NULL,...) feature_names progress parallel A fitted model (e.g., a "randomforest" ). Character string giving the names of the two features of interest. Character string giving the name of the progress bar to use while constructing the interaction statistics. See create_progress_bar for details. Default is "none". Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.

4 vip Details paropts List containing additional options to be passed onto foreach when parallel = TRUE.... Additional optional arguments to be passed onto partial. Coming soon! References Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J.: A Simple and Effective Model-Based Variable Importance Measure. arxiv preprint arxiv:1805.04755 (2018). vip Variable Importance Plots Plot variable importance scores for the predictors in a model. vip(,...) ## Default S3 method: vip(, num_features = 10L, bar = TRUE, width = 0.75, horizontal = TRUE, alpha = 1, color = "grey35", fill = "grey35", size = 1, shape = 19,...) A fitted model (e.g., a "randomforest" ).... Additional optional arguments to be passed onto vi. num_features Integer specifying the number of variable importance scores to plot. Default is 10. bar Logical indicating whether or not to produce a barplot. Default is TRUE. If bar = FALSE, then a dotchart is displayed instead. width horizontal alpha color fill Numeric value specifying the width of the bars when bar = TRUE. Default is 0.75. Logical indicating whether or not to plot the importance scores on the x-axis (TRUE). Default is TRUE. Numeric value between 0 and 1 giving the trasparency of the bars. Character string specifying the color to use for the borders of the bars. Could also be a function, such as heat.colors. Default is "grey35". Character string specifying the color to use to fill the bars. Could also be a function, such as heat.colors. Default is "grey35".

vi_ice 5 size shape Numeric value indicating the size to use for the points whenever bar = FALSE. Default is 1. Numeric value indicating the shape to use for the points whenever bar = FALSE. Default is 1. Examples # # A projection pursuit regression example # # Load the sample data data(mtcars) # Fit a projection pursuit regression model mtcars.ppr <- ppr(mpg ~., data = mtcars, nterms = 1) # Construct variable importance plot vip(mtcars.ppr, method = "ice") vi_ice ICE-Based Variable Importance Compute ICE-based variable importance scores for the predictors in a model. (This function is meant for internal use only.) vi_ice(,...) ## Default S3 method: vi_ice(, feature_names, FUN = NULL,...) A fitted model (e.g., a "randomforest" ).... Additional optional arguments to be passed onto partial. feature_names FUN Character string giving the names of the predictor variables (i.e., features) of interest. List with two componenets, "cat" and "con", containing the functions to use for categorical and continuous features, respectively. If NULL, the standard deviation is used for continuous features. For categorical features, the range statistic is used (i.e., (max - min) / 4).

6 vi_model Details Value Coming soon! A tidy data frame (i.e., a "tibble" ) with two columns, Variable and Importance, containing the variable name and its associated importance score, respectively. vi_model Model-Based Variable Importance Compute model-based variable importance scores for the predictors in a model. (This function is meant for internal use only.) ## Default S3 method: ## S3 method for class 'C5.0' ## S3 method for class 'constparty' ## S3 method for class 'earth' ## S3 method for class 'gbm' ## S3 method for class 'H2OBinomialModel' ## S3 method for class 'H2OMultinomialModel' ## S3 method for class 'H2ORegressionModel' ## S3 method for class 'lm'

vi_pdp 7 ## S3 method for class 'randomforest' ## S3 method for class 'RandomForest' vi_model(, auc = FALSE,...) ## S3 method for class 'ranger' ## S3 method for class 'rpart' ## S3 method for class 'train' ## S3 method for class 'xgb.booster' A fitted model (e.g., a "randomforest" ).... Additional optional arguments. auc Logical indicating whether or not to compute the AUC-based variable scores described in Janitza et al. (2012). Only available for cforest s. See varimpauc for details. Default is FALSE. Details Coming soon! Value A tidy data frame (i.e., a "tibble" ) with two columns: Variable and Importance. For "glm"-like, an additional column, called Sign, is also included which includes the sign (i.e., POS/NEG) of the original coefficient. vi_pdp PDP-Based Variable Importance Compute PDP-based variable importance scores for the predictors in a model. (This function is meant for internal use only.)

8 vi_pdp vi_pdp(,...) ## Default S3 method: vi_pdp(, feature_names, FUN = NULL,...) A fitted model (e.g., a "randomforest" ).... Additional optional arguments to be passed onto partial. feature_names FUN Details Value Coming soon! Character string giving the names of the predictor variables (i.e., features) of interest. List with two componenets, "cat" and "con", containing the functions to use for categorical and continuous features, respectively. If NULL, the standard deviation is used for continuous features. For categorical features, the range statistic is used (i.e., (max - min) / 4). A tidy data frame (i.e., a "tibble" ) with two columns, Variable and Importance, containing the variable name and its associated importance score, respectively.

Index cforest, 7 create_progress_bar, 3 foreach, 4 heat.colors, 4 partial, 4, 5, 8 varimpauc, 7 vi, 2, 4 vi_ice, 5 vi_model, 6 vi_pdp, 7 vint, 3 vip, 4 9