Package 'IPMRF'
August 9, 2017


Type Package
Title Intervention in Prediction Measure (IPM) for Random Forests
Version 1.2
Date 2017-08-09
Author Irene Epifanio, Stefano Nembrini
Maintainer Irene Epifanio <epifanio@uji.es>
Imports party, randomForest, gbm
Suggests mlbench, randomForestSRC, ranger
Description Computes IPM for assessing variable importance for random forests. See details at I. Epifanio (2017) <DOI:10.1186/s12859-017-1650-8>.
License GPL-3
NeedsCompilation no
Repository CRAN
Date/Publication 2017-08-09 12:14:05 UTC

R topics documented:
    IPMRF-package
    ipmgbmnew
    ipmparty
    ipmpartynew
    ipmranger
    ipmrangernew
    ipmrf
    ipmrfnew
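The package is on CRAN; a standard installation and load (the usual install.packages call, not part of the manual itself) is:

    install.packages("IPMRF")  # standard CRAN install
    library(IPMRF)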

IPMRF-package           Intervention in Prediction Measure (IPM) for Random Forests

Description

    Computes IPM for assessing variable importance for random forests. See I. Epifanio (2017), Intervention in prediction measure: a new approach to assessing variable importance for random forests, BMC Bioinformatics.

Details

    Package: IPMRF
    Type: Package
    Version: 1.2
    Date: 2017-08-09

    Main functions:

    ipmparty      IPM casewise with CIT-RF by party for OOB samples
    ipmpartynew   IPM casewise with CIT-RF by party for new samples
    ipmrf         IPM casewise with CART-RF by randomForest for OOB samples
    ipmrfnew      IPM casewise with CART-RF by randomForest for new samples
    ipmranger     IPM casewise with RF by ranger for OOB samples
    ipmrangernew  IPM casewise with RF by ranger for new samples
    ipmgbmnew     IPM casewise with GBM by gbm for new samples

Author(s)

    Irene Epifanio, Stefano Nembrini

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8
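As a quick orientation before the per-function pages, a minimal sketch of the typical workflow, using the readingskills data shipped with party as in the examples below (casewise IPM first, then a global IPM by averaging over cases):

    library(party)
    library(IPMRF)

    ntree <- 50
    set.seed(290875)
    # CIT random forest fitted with cforest (party)
    rs.cf <- cforest(score ~ ., data = readingskills,
                     control = cforest_unbiased(mtry = 2, ntree = ntree))

    # Casewise IPM for the training cases, estimated on their OOB trees:
    # one row per case, one column per predictor
    ipm.case <- ipmparty(rs.cf, readingskills[, 1:3], ntree)

    # Global IPM: average the casewise percentages over all cases
    apply(ipm.case, 2, mean)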

ipmgbmnew               IPM casewise with a gbm object by gbm for new cases, whose responses do not need to be known

Description

    The IPM of a new case, i.e. one not used to grow the forest and whose true response does not need to be known, is computed as follows. The new case is put down each of the ntree trees in the forest. For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this new case is obtained by averaging those percentages over the ntree trees.

Usage

    ipmgbmnew(marbolr, da, ntree)

Arguments

    marbolr    Generalized Boosted Regression object obtained with gbm.

    da         Data frame with the predictors only, not responses, for the new cases. Each row corresponds to an observation and each column corresponds to a predictor, which must be the same variables used as predictors in the training set.

    ntree      Number of trees.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the new cases: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da.

Note

    See Epifanio (2017) about the parameters of RFs to be used, the advantages and limitations of IPM, and in particular when CART is considered with predictors of different types.

Author(s)

    Stefano Nembrini

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmparty, ipmrf, ipmranger, ipmpartynew, ipmrfnew

Examples

    ## Not run:
    library(party)
    library(gbm)

    gbm.fit <- gbm(score ~ ., data = readingskills, n.trees = 50,
                   shrinkage = 0.05, interaction.depth = 5,
                   bag.fraction = 0.5, train.fraction = 0.5,
                   n.minobsinnode = 1, cv.folds = 0,
                   keep.data = FALSE, verbose = FALSE)

    gbm_ipm <- apply(ipmgbmnew(gbm.fit, readingskills[, -4], 50), 2, FUN = mean)
    gbm_ipm
    ## End(Not run)

ipmparty                IPM casewise with CIT-RF by party for OOB samples

Description

    The IPM for a case in the training set is calculated by considering and averaging over only the trees where the case belongs to the OOB set. The case is put down each of those trees. For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this case is obtained by averaging those percentages over only the trees where the case belongs to the OOB set. The random forest is based on CIT (Conditional Inference Trees).

Usage

    ipmparty(marbol, da, ntree)

Arguments

    marbol    Random forest obtained with cforest. Responses can be of any type supported by cforest: not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses.

    da        Data frame with the predictors only, not responses, of the training set used for computing marbol. Each row corresponds to an observation and each column corresponds to a predictor. Predictors can be numeric, nominal or ordered factors.

    ntree     Number of trees in the random forest.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the cases in the training set, estimated when they are OOB observations: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da. IPM can be estimated for any kind of RF computed by cforest, including multivariate RF.

Note

    See Epifanio (2017) about the advantages and limitations of IPM, and about the parameters to be used in cforest.

Author(s)

    Irene Epifanio

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmpartynew, ipmrf, ipmranger, ipmrfnew, ipmrangernew, ipmgbmnew

Examples

    # Note: more examples can be found at
    # https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8

    ## -------------------------------------------------------
    ## Example from varimp in the party package
    ## Classification RF

    ## -------------------------------------------------------
    ## Not run:
    library(party)

    # from the help page of varimp in the party package
    set.seed(290875)
    readingskills.cf <- cforest(score ~ ., data = readingskills,
                                control = cforest_unbiased(mtry = 2, ntree = 50))

    # standard importance
    varimp(readingskills.cf)

    # the same, modulo random variation
    varimp(readingskills.cf, pre1.0_0 = TRUE)

    # conditional importance, may take a while...
    varimp(readingskills.cf, conditional = TRUE)
    ## End(Not run)

    # IPM based on CIT-RF (party package)
    library(party)
    ntree <- 50

    # readingskills: data from the party package
    da <- readingskills[, 1:3]
    set.seed(290875)
    readingskills.cf3 <- cforest(score ~ ., data = readingskills,
                                 control = cforest_unbiased(mtry = 3, ntree = 50))

    # IPM casewise, computed with OOB samples by party
    pupf <- ipmparty(readingskills.cf3, da, ntree)

    # global IPM
    pua <- apply(pupf, 2, mean)
    pua

    ## -------------------------------------------------------
    ## Example from var.select in the randomForestSRC package
    ## Multivariate mixed forests
    ## -------------------------------------------------------
    ## Not run:
    library(randomForestSRC)

    # from the help page of var.select in the randomForestSRC package
    mtcars.new <- mtcars
    mtcars.new$cyl <- factor(mtcars.new$cyl)
    mtcars.new$carb <- factor(mtcars.new$carb, ordered = TRUE)
    mv.obj <- rfsrc(cbind(carb, mpg, cyl) ~ ., data = mtcars.new, importance = TRUE)
    var.select(mv.obj, method = "vh.vimp", nrep = 10)

    # different variables are selected if var.select is repeated
    ## End(Not run)

    # IPM based on CIT-RF (party package)
    library(randomForestSRC)
    mtcars.new <- mtcars
    ntree <- 500
    da <- mtcars.new[, 3:10]
    mc.cf <- cforest(carb + mpg + cyl ~ ., data = mtcars.new,
                     control = cforest_unbiased(mtry = 8, ntree = 500))

    # IPM casewise, computed with OOB samples by party
    pupf <- ipmparty(mc.cf, da, ntree)

    # global IPM
    pua <- apply(pupf, 2, mean)
    pua
    # disp and hp are consistently selected as more important if repeated

ipmpartynew             IPM casewise with CIT-RF by party for new cases, whose responses do not need to be known

Description

    The IPM of a new case, i.e. one not used to grow the forest and whose true response does not need to be known, is computed as follows. The new case is put down each of the ntree trees in the forest. For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this new case is obtained by averaging those percentages over the ntree trees. The random forest is based on CIT (Conditional Inference Trees).

Usage

    ipmpartynew(marbol, da, ntree)

Arguments

    marbol    Random forest obtained with cforest. Responses in the training set can be of any type supported by cforest: not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses.

    da        Data frame with the predictors only, not responses, for the new cases. Each row corresponds to an observation and each column corresponds to a predictor, which must be the same variables used as predictors in the training set. Predictors can be numeric, nominal or ordered factors.

    ntree     Number of trees in the random forest.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the new cases: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da. IPM can be estimated for any kind of RF computed by cforest, including multivariate RF.

Note

    See Epifanio (2017) about the advantages and limitations of IPM, and about the parameters to be used in cforest.

Author(s)

    Irene Epifanio

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmparty, ipmrf, ipmranger, ipmrfnew, ipmrangernew, ipmgbmnew

Examples

    # Note: more examples can be found at
    # https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8

    ## -------------------------------------------------------
    ## Example from varimp in the party package
    ## Classification RF
    ## -------------------------------------------------------
    library(party)

    # IPM based on CIT-RF (party package)
    ntree <- 50

    # readingskills: data from the party package
    da <- readingskills[, 1:3]
    set.seed(290875)
    readingskills.cf3 <- cforest(score ~ ., data = readingskills,
                                 control = cforest_unbiased(mtry = 3, ntree = 50))

    # new case
    nativespeaker <- 'yes'
    age <- 8
    shoesize <- 28
    da1 <- data.frame(nativespeaker, age, shoesize)

    # IPM casewise, computed for new cases with party
    pupfn <- ipmpartynew(readingskills.cf3, da1, ntree)
    pupfn

ipmranger               IPM casewise with RF by ranger for OOB samples

Description

    The IPM for a case in the training set is calculated by considering and averaging over only the trees where the case belongs to the OOB set. The case is put down each of those trees. For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this case is obtained by averaging those percentages over only the trees where the case belongs to the OOB set. The random forest is based on a fast implementation of CART-RF.

Usage

    ipmranger(marbolr, da, ntree)

Arguments

    marbolr    Random forest obtained with ranger. Responses can be of any type supported by ranger. Note that not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses can be considered with ipmparty.

    da         Data frame with the predictors only, not responses, of the training set used for computing marbolr. Each row corresponds to an observation and each column corresponds to a predictor. Predictors can be numeric, nominal or ordered factors.

    ntree      Number of trees in the random forest.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the cases in the training set, estimated when they are OOB observations: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da.

Note

    See Epifanio (2017) about the parameters of RFs to be used, the advantages and limitations of IPM, and in particular when CART is considered with predictors of different types.

Author(s)

    Stefano Nembrini, Irene Epifanio

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmparty, ipmrf, ipmpartynew, ipmrfnew, ipmrangernew, ipmgbmnew

Examples

    # Note: more examples can be found at
    # https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8

    ## Not run:
    library(ranger)
    num.trees <- 500

    # keep.inbag = TRUE is required so that OOB membership can be recovered
    rf <- ranger(Species ~ ., data = iris, keep.inbag = TRUE, num.trees = num.trees)
    IPM <- apply(ipmranger(rf, iris[, -5], num.trees), 2, FUN = mean)
    ## End(Not run)
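As a conceptual aside, and not a function of this package: given the sequence of splitting variables that one case meets on its way from root to leaf in each tree, the averaging described in the Description sections above can be written in a few lines of base R. The helper ipm_toy and the toy paths below are purely hypothetical, for illustration only.

    # Toy illustration of the IPM idea (hypothetical helper, not part of IPMRF).
    # 'paths' lists, per tree, the splitting variables met by one case from
    # root to leaf; 'predictors' gives the candidate variable names.
    ipm_toy <- function(paths, predictors) {
      per.tree <- sapply(paths, function(p) {
        # percentage of the path's splits that used each predictor
        table(factor(p, levels = predictors)) / length(p)
      })
      # average those percentages over the trees; for OOB-based IPM
      # (ipmparty, ipmrf, ipmranger), 'paths' would contain only the
      # trees where this case is out-of-bag
      rowMeans(per.tree)
    }

    paths <- list(c("age", "shoesize"), c("age", "age", "nativespeaker"))
    ipm_toy(paths, c("nativespeaker", "age", "shoesize"))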

ipmrangernew            IPM casewise with RF by ranger for new cases, whose responses do not need to be known

Description

    The IPM of a new case, i.e. one not used to grow the forest and whose true response does not need to be known, is computed as follows. The new case is put down each of the ntree trees in the forest. For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this new case is obtained by averaging those percentages over the ntree trees. The random forest is based on a fast implementation of CART.

Usage

    ipmrangernew(marbolr, da, ntree)

Arguments

    marbolr    Random forest obtained with ranger. Responses can be of any type supported by ranger. Note that not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses can be considered with ipmparty.

    da         Data frame with the predictors only, not responses, for the new cases. Each row corresponds to an observation and each column corresponds to a predictor, which must be the same variables used as predictors in the training set.

    ntree      Number of trees in the random forest.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the new cases: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da.

Note

    See Epifanio (2017) about the parameters of RFs to be used, the advantages and limitations of IPM, and in particular when CART is considered with predictors of different types.

Author(s)

    Stefano Nembrini, Irene Epifanio

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmparty, ipmrf, ipmranger, ipmpartynew, ipmrfnew, ipmgbmnew

Examples

    ## Not run:
    library(ranger)
    num.trees <- 500
    rf <- ranger(Species ~ ., data = iris, keep.inbag = TRUE, num.trees = num.trees)
    IPM_complete <- apply(ipmrangernew(rf, iris[, -5], num.trees), 2, FUN = mean)
    ## End(Not run)

ipmrf                   IPM casewise with CART-RF by randomForest for OOB samples

Description

    The IPM for a case in the training set is calculated by considering and averaging over only the trees where the case belongs to the OOB set. The case is put down each of those trees. For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this case is obtained by averaging those percentages over only the trees where the case belongs to the OOB set. The random forest is based on CART.

Usage

    ipmrf(marbolr, da, ntree)

Arguments

    marbolr    Random forest obtained with randomForest. Responses can be of any type supported by randomForest. Note that not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses can be considered with ipmparty.

    da         Data frame with the predictors only, not responses, of the training set used for computing marbolr. Each row corresponds to an observation and each column corresponds to a predictor. Predictors can be numeric, nominal or ordered factors.

    ntree      Number of trees in the random forest.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the cases in the training set, estimated when they are OOB observations: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da.

Note

    See Epifanio (2017) about the parameters of RFs to be used, the advantages and limitations of IPM, and in particular when CART is considered with predictors of different types.

Author(s)

    Irene Epifanio

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmparty, ipmranger, ipmpartynew, ipmrfnew, ipmrangernew, ipmgbmnew

Examples

    # Note: more examples can be found at
    # https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8

    ## Not run:
    library(mlbench)

    # data used by Breiman, L.: Random forests. Machine Learning 45(1), 5--32 (2001)
    data(PimaIndiansDiabetes2)
    Diabetes <- na.omit(PimaIndiansDiabetes2)
    set.seed(2016)

    require(randomForest)
    ri <- randomForest(diabetes ~ ., data = Diabetes, ntree = 500,
                       importance = TRUE, keep.inbag = TRUE, replace = FALSE)

    # GVIM and PVIM (CART-RF)
    im <- importance(ri)
    im

    # rank
    ii <- apply(im, 2, rank)
    ii

    # IPM based on CART-RF (randomForest package)
    da <- Diabetes[, 1:8]
    ntree <- 500

    # IPM casewise, computed with OOB samples
    pupf <- ipmrf(ri, da, ntree)

    # global IPM
    pua <- apply(pupf, 2, mean)
    pua

    # IPM by classes
    attach(Diabetes)
    puac <- matrix(0, nrow = 2, ncol = dim(da)[2])
    puac[1, ] <- apply(pupf[diabetes == 'neg', ], 2, mean)
    puac[2, ] <- apply(pupf[diabetes == 'pos', ], 2, mean)
    colnames(puac) <- colnames(da)
    rownames(puac) <- c('neg', 'pos')
    puac

    # rank IPM: global rank
    rank(pua)

    # rank by class
    apply(puac, 1, rank)
    ## End(Not run)

ipmrfnew                IPM casewise with CART-RF by randomForest for new cases, whose responses do not need to be known

Description

    The IPM of a new case, i.e. one not used to grow the forest and whose true response does not need to be known, is computed as follows. The new case is put down each of the ntree trees in the forest.

    For each tree, the case goes from the root node to a leaf through a series of nodes. The splitting variable at each of these nodes is recorded. The percentage of times a variable is selected along the case's way from the root to the terminal node is calculated for each tree. Note that we do not count the percentage of times a split occurred on variable k in tree t, but only the variables that intervened in the prediction of the case. The IPM for this new case is obtained by averaging those percentages over the ntree trees. The random forest is based on CART.

Usage

    ipmrfnew(marbolr, da, ntree)

Arguments

    marbolr    Random forest obtained with randomForest. Responses can be of any type supported by randomForest. Note that not only numerical or nominal, but also ordered responses, censored response variables and multivariate responses can be considered with ipmparty.

    da         Data frame with the predictors only, not responses, for the new cases. Each row corresponds to an observation and each column corresponds to a predictor, which must be the same variables used as predictors in the training set.

    ntree      Number of trees in the random forest.

Details

    All details are given in Epifanio (2017).

Value

    It returns the IPM for the new cases: a matrix with as many rows as there are cases in da, and as many columns as there are predictors in da.

Note

    See Epifanio (2017) about the parameters of RFs to be used, the advantages and limitations of IPM, and in particular when CART is considered with predictors of different types.

Author(s)

    Irene Epifanio

References

    Pierola, A., Epifanio, I. and Alemany, S. (2016) An ensemble of ordered logistic regression and random forest for child garment size matching. Computers & Industrial Engineering, 101, 455-465.

    Epifanio, I. (2017) Intervention in prediction measure: a new approach to assessing variable importance for random forests. BMC Bioinformatics, 18, 230.

See Also

    ipmparty, ipmrf, ipmranger, ipmpartynew, ipmrangernew, ipmgbmnew

Examples

    # Note: more examples can be found at
    # https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1650-8

    library(mlbench)

    # data used by Breiman, L.: Random forests. Machine Learning 45(1), 5--32 (2001)
    data(PimaIndiansDiabetes2)
    Diabetes <- na.omit(PimaIndiansDiabetes2)
    set.seed(2016)

    require(randomForest)
    ri <- randomForest(diabetes ~ ., data = Diabetes, ntree = 500,
                       importance = TRUE, keep.inbag = TRUE, replace = FALSE)

    # new cases: the mean predictor profile of each class
    da1 <- rbind(apply(Diabetes[Diabetes[, 9] == 'pos', 1:8], 2, mean),
                 apply(Diabetes[Diabetes[, 9] == 'neg', 1:8], 2, mean))

    # IPM casewise, computed for new cases with randomForest
    ntree <- 500
    pupfn <- ipmrfnew(ri, as.data.frame(da1), ntree)
    pupfn
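A possible follow-up, assuming the pupfn matrix from the example above: a global IPM over the new cases and per-case predictor ranks, using base R only.

    # Global IPM over the new cases (illustrative follow-up)
    apply(pupfn, 2, mean)

    # Rank the predictors within each new case by IPM
    t(apply(pupfn, 1, rank))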

Index

    Topic Generalized Boosted Regression
        IPMRF-package
    Topic Random forest
        IPMRF-package
    Topic Variable importance measure
        IPMRF-package
    Topic multivariate
        ipmgbmnew, ipmparty, ipmpartynew, ipmranger, ipmrangernew, ipmrf, ipmrfnew
    Topic tree
        ipmgbmnew, ipmparty, ipmpartynew, ipmranger, ipmrangernew, ipmrf, ipmrfnew

    cforest
    gbm
    ipmgbmnew
    ipmparty
    ipmpartynew
    ipmranger
    ipmrangernew
    IPMRF (IPMRF-package)
    ipmrf
    IPMRF-package
    ipmrfnew
    randomForest
    ranger