Package 'rFerns'                                                November 15, 2017

Type Package
Title Random Ferns Classifier
Version 2.0.3
Author Miron B. Kursa
Maintainer Miron B. Kursa <M.Kursa@icm.edu.pl>
Description An R implementation of the random ferns classifier by Ozuysal, Calonder, Lepetit and Fua (2009) <doi:10.1109/TPAMI.2009.23>, modified for generic and multi-label classification and featuring OOB error approximation and an importance measure as introduced in Kursa (2014) <doi:10.18637/jss.v061.i10>.
BugReports https://github.com/mbq/rFerns/issues
License GPL-3
RoxygenNote 6.0.1
NeedsCompilation yes
Repository CRAN
Date/Publication 2017-11-15 14:50:02 UTC

R topics documented:

    merge.rFerns
    naiveWrapper
    predict.rFerns
    rFerns

merge.rFerns            Merge two random ferns models

Description

This function combines two compatible (same decision, same training data structure and same depth) models into a single ensemble. It can be used to distribute model training, perform it on batches of data, save checkpoints or precisely investigate its course.

Usage

## S3 method for class 'rFerns'
merge(x, y, dropModel = FALSE, ignoreObjectConsistency = FALSE,
  trueY = NULL, ...)

Arguments

x       Object of a class rFerns; the first model to be merged.

y       Object of a class rFerns; the second model to be merged. Can also be NULL, in which case x is returned immediately. Has to be built on the same kind of training data as x, with the same depth.

dropModel
        If TRUE, the model structure will be dropped to save size. This disallows prediction with the merged model, but retains importance and OOB approximations.

ignoreObjectConsistency
        If TRUE, the merge will be done even if both models were built on different sets of objects. This drops OOB approximations.

trueY   Copy of the training decision, used to reconstruct the OOB error and confusion matrix. Can be omitted, in which case the OOB error and confusion matrix are dropped; ignored when ignoreObjectConsistency is TRUE.

...     Ignored, for S3 generic/method consistency.

Value

An object of class rFerns, which is a list with the following components:

model   The merged model, in case both x and y had model structures included and dropModel was FALSE; otherwise NULL.

oobErr  OOB approximation of accuracy, if it can be computed; namely, when oobScores could be computed and trueY is provided.

importance
        The merged importance scores, in case both x and y had importance calculated. Shadow importance appears only if both models had it enabled.

oobScores
        OOB scores, if they can be computed; namely, when both models had them calculated and ignoreObjectConsistency was not used.

oobPreds
        A vector of OOB class predictions for each object in the training set, if it can be computed.

oobConfusionMatrix
        OOB confusion matrix, if it can be computed; namely, when oobScores could be computed and trueY is provided.

timeTaken
        Time used to train the model, calculated as a sum of the training times of x and y.

parameters
        Numerical vector of three elements: classes, depth and ferns.

classLabels
        Copy of levels(y) after purging unused levels.

isStruct
        Copy of the training set structure.

merged  Set to TRUE to mark that merging was done.

Note

If different sets of training objects were used to build the merged models, the merged importance is still calculated, but mileage may vary; for substantially different sets it may become biased. You have been warned.

Shadow importance is only merged when both models have shadow importance and the same consistentSeed value; otherwise the merged shadow importance would be biased down.

The order of x and y is not important; the only exception is merging with NULL, in which case x must be an rFerns object for R to dispatch the proper merge method.

Author(s)

Miron B. Kursa

Examples

set.seed(77)
#Fetch Iris data
data(iris)
#Build models
rFerns(Species~.,data=iris)->modelA
rFerns(Species~.,data=iris)->modelB
modelAB<-merge(modelA,modelB)
print(modelA)
print(modelAB)
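As the Description notes, merge can also combine models trained on separate batches of data. The sketch below is illustrative rather than official documentation: the batch split and the object names are ad hoc, and it assumes the two subsets share the training data structure (same columns and factor levels) so the models stay compatible; because the object sets differ, ignoreObjectConsistency must be TRUE, which drops the OOB approximations as documented above.

set.seed(77)
data(iris)
#Split the training data into two batches with identical structure
idx<-sample(rep(1:2,length.out=nrow(iris)))
batch1<-iris[idx==1,]
batch2<-iris[idx==2,]
#Train one model per batch
m1<-rFerns(Species~.,data=batch1)
m2<-rFerns(Species~.,data=batch2)
#Combine into a single ensemble; different object sets, so consistency check is skipped
m12<-merge(m1,m2,ignoreObjectConsistency=TRUE)
print(m12)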

naiveWrapper            Naive feature selection method utilising the rFerns shadow importance

Description

Proof-of-concept ensemble of rFerns models, built to stabilise and improve selection based on shadow importance. It employs a super-ensemble of iterations small rFerns forests, each built on a random subspace of size attributes, selected with a higher selection probability for attributes claimed important by previous sub-models. The final selection is the group of attributes which hold a substantial weight at the end of the procedure.

Usage

naiveWrapper(x, y, iterations = 1000, depth = 5, ferns = 100, size = 30)

Arguments

x       Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns. Factors must have less than 31 levels. No NA values are permitted.

y       A decision vector. Must be a factor of the same length as nrow(x) for ordinary many-label classification, or a logical matrix with each column corresponding to a class for multi-label classification.

iterations
        Number of iterations, i.e., the number of sub-models built.

depth   The depth of the ferns; must be in the 1-16 range. Note that time and memory requirements scale with 2^depth.

ferns   Number of ferns to be built in each sub-model. This should be a small number, around 3-5 times size.

size    Number of attributes considered by each sub-model.

Value

An object of class naiveWrapper, which is a list with the following components:

found   Names of all selected attributes.

weights Vector of weights indicating the confidence that a certain feature is relevant.

timeTaken
        Time of computation.

params  Copies of the algorithm parameters, iterations, depth, ferns and size, as a named vector.

Author(s)

Miron B. Kursa

References

Kursa MB (2017). Efficient All Relevant Feature Selection with Random Ferns. In: Kryszkiewicz M., Appice A., Slezak D., Rybinski H., Skowron A., Ras Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham.

Examples

set.seed(77)
#Fetch Iris data
data(iris)
#Extend with random noise
noisyIris<-cbind(iris[,-5],apply(iris[,-5],2,sample))
names(noisyIris)[5:8]<-sprintf("Nonsense%d",1:4)
#Execute selection
naiveWrapper(noisyIris,iris$Species,iterations=50,ferns=20,size=8)
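A short follow-up sketch, continuing the noisyIris object from the Examples: since the returned object exposes the found and weights components documented under Value, they can be used to inspect the selection and refit a plain rFerns model on the chosen attributes only. This is merely one possible workflow, not a prescribed one.

set.seed(77)
#Keep the selection result instead of just printing it
sel<-naiveWrapper(noisyIris,iris$Species,iterations=50,ferns=20,size=8)
#Selected attribute names and their confidence weights
print(sel$found)
print(sel$weights)
#Refit a model restricted to the selected attributes
rFerns(noisyIris[,sel$found,drop=FALSE],iris$Species,importance="simple")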

predict.rFerns          Prediction with random ferns model

Description

This function predicts the classes of new objects with a given rFerns object.

Usage

## S3 method for class 'rFerns'
predict(object, x, scores = FALSE, ...)

Arguments

object  Object of a class rFerns; the model that will be used for prediction.

x       Data frame containing attributes; must have names corresponding to the training set (although the order is not important) and must not introduce new factor levels. If this argument is not given, OOB predictions on the training set will be returned.

scores  If TRUE, the result will contain the score matrix instead of simple predictions.

...     Additional parameters.

Value

Predictions. If scores is TRUE, a data.frame with class scores; otherwise a factor vector (for many-label classification) or a logical data.frame (for multi-label classification) with predictions.

Author(s)

Miron B. Kursa

Examples

set.seed(77)
#Fetch Iris data
data(iris)
#Split into train and test set
iris[c(TRUE,FALSE),]->irisR
iris[c(FALSE,TRUE),]->irisE
#Build model
rFerns(Species~.,data=irisR)->model
print(model)
#Test
predict(model,irisE)->p
print(table(
 Predictions=p,
 True=irisE[["Species"]]))
err<-mean(p!=irisE[["Species"]])
print(paste("Test error",err,sep=" "))
#Show first OOB scores
head(predict(model,scores=TRUE))
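A brief sketch expanding on the x argument above: when x is omitted, predict returns OOB predictions on the training set, which gives a quick accuracy estimate without a separate test split. It continues the objects built in the Examples and is illustrative only.

#With no new data, predict() returns OOB predictions on the training set
oob<-predict(model)
#Never-OOB-tested objects come back as NA, hence na.rm
mean(oob==irisR[["Species"]],na.rm=TRUE)
#Score matrix for the test set instead of hard labels
head(predict(model,irisE,scores=TRUE))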

rFerns                  Classification with random ferns

Description

This function builds a random ferns model on the given training data.

Usage

rFerns(x, ...)

## S3 method for class 'formula'
rFerns(formula, data = .GlobalEnv, ...)

## S3 method for class 'matrix'
rFerns(x, y, ...)

## Default S3 method:
rFerns(x, y, depth = 5, ferns = 1000, importance = "none",
  reportErrorEvery = 0, saveErrorPropagation = FALSE,
  saveForest = TRUE, consistentSeed = NULL, ...)

Arguments

x       Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns. Factors must have less than 31 levels. No NA values are permitted.

...     For the formula and matrix methods, a place to state parameters to be passed to the default method. For the print method, arguments to be passed to print.

formula Alternatively, a formula describing the model to be analysed.

data    Data in which to interpret the formula.

y       A decision vector. Must be a factor of the same length as nrow(x) for ordinary many-label classification, or a logical matrix with each column corresponding to a class for multi-label classification.

depth   The depth of the ferns; must be in the 1-16 range. Note that time and memory requirements scale with 2^depth.

ferns   Number of ferns to be built.

importance
        Set to calculate an attribute importance measure (VIM); "simple" will calculate the default mean decrease of true class score (MDTS, something similar to Random Forest's MDA/MeanDecreaseAccuracy), "shadow" will calculate MDTS and additionally the MDTS of this attribute's shadow, an implicit feature built by shuffling values within it, thus stripping it of information (which is slightly slower). Shadow importance is useful as a reference for judging the significance of the regular importance. "none" turns importance calculation off, for a slightly faster execution. For compatibility with pre-1.2 rFerns, TRUE will resolve to "simple" and FALSE to "none". An abbreviation can be used instead of a full value.

reportErrorEvery
        If set to a number larger than 0, the current OOB error approximation will be printed every time a chunk of reportErrorEvery ferns is finished.

saveErrorPropagation
        Should the OOB error approximation be calculated after each fern is created, and saved? Setting it to FALSE may improve performance.

saveForest
        Should the model be saved? It must be TRUE if you want to use the model for prediction; however, if you are only interested in importance or the OOB error, setting it to FALSE significantly reduces memory requirements, especially for large depth and ferns.

consistentSeed
        PRNG seed used for shadow importance only. Must be either a 2-element integer vector or NULL, which corresponds to seeding from the default PRNG. It should be set to the same value in two merged models to make shadow importance meaningful.

Value

An object of class rFerns, which is a list with the following components:

model   The built model; NULL if saveForest was FALSE.

oobErr  OOB approximation of accuracy. Ignores never-OOB-tested objects (see the oobScores element).

importance
        The importance scores, or NULL if importance was set to "none". In the first case it is a data.frame with two or three columns: MeanScoreLoss, the mean decrease of the score of the correct class when a certain attribute is permuted; Tries, the number of ferns which utilised a certain attribute; and, only when importance was set to "shadow", Shadow, the mean decrease of accuracy of the correct class for a permuted copy of an attribute (useful as a baseline for the regular importance). The rownames are set and equal to names(x).

oobScores
        A matrix of OOB scores of each class for each object in the training set. Rows correspond to classes in the same order as in levels(y). If the number of ferns is too small, some columns may contain NAs, which means that certain objects were never in a test set.

oobPreds
        A vector of OOB class predictions for each object in the training set. Never-OOB-tested objects (see above) have predictions equal to NA.

oobConfusionMatrix
        Confusion matrix built from oobPreds and y.

timeTaken
        Time used to train the model (smaller than wall time, because data preparation and model final touches are excluded; however, it includes the time needed to compute importance, if it applies). An object of class difftime.

parameters
        Numerical vector of three elements: classes, depth and ferns, containing respectively the number of classes in the decision and copies of the depth and ferns parameters.

classLabels
        Copy of levels(y) after purging unused levels.

consistentSeed
        Consistent seed used; only present for importance="shadow". Can be used to seed a new model via the consistentSeed argument.

isStruct
        Copy of the training set structure, required internally by the predict method.

Note

The unused levels of the decision will be removed; on the other hand, unused levels of categorical attributes will be preserved, so that they can be present in data later predicted with the model. The levels of ordered factors in the training and predicted data must be identical.

Do not use the formula interface for data with a large number of attributes; the overhead from handling the formula may be significant.

Author(s)

Miron B. Kursa

References

Ozuysal M, Calonder M, Lepetit V & Fua P. (2009). Fast Keypoint Recognition Using Random Ferns, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448-461.

Kursa MB (2014). rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning, Journal of Statistical Software, 61(10), 1-13.

Examples

set.seed(77)
#Fetch Iris data
data(iris)
#Build model
rFerns(Species~.,data=iris)
##Importance
rFerns(Species~.,data=iris,importance="shadow")->model
print(model$imp)
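To make the shadow-importance baseline concrete, the sketch below compares the MeanScoreLoss and Shadow columns of the importance data.frame documented under Value. Treating attributes whose MeanScoreLoss exceeds their own shadow as candidates is an illustrative heuristic only, not a decision rule defined by the package, and the noise attributes are ad hoc additions for the demonstration.

set.seed(77)
data(iris)
#Add pure-noise attributes so there is something to reject
noisyIris<-cbind(iris[,-5],apply(iris[,-5],2,sample))
names(noisyIris)[5:8]<-sprintf("Nonsense%d",1:4)
#Shadow importance gives each attribute a noise baseline of its own
model<-rFerns(noisyIris,iris$Species,importance="shadow")
imp<-model$importance
print(imp)
#Illustrative heuristic: keep attributes scoring above their own shadow
rownames(imp)[imp$MeanScoreLoss>imp$Shadow]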

Index

merge.rFerns
naiveWrapper
predict.rFerns
rFerns