The TWIX Package. April 24, 2007


Version
Title        Trees WIth extra splits
Author       Sergej Potapov
Maintainer   Martin Theus
Depends      R (>= 1.8.0), rpart, iplots
Description  Trees with extra splits
Date
License      Standard GNU public license
URL

R topics documented:

    Dev.leaf
    Devplot
    bagg.default
    bootTWIX
    deviance.TWIX
    export
    fullrks
    get.splitvar
    get.tree
    idevplot
    nlogn
    olives
    plot.TWIX
    pred.value
    predict.TWIX
    print.id.tree
    print.single.tree
    print.TWIX
    scree.plot
    sp.slave
    splitt
    summary.TWIX
    tuneTWIX
    TWIX
    Index


Dev.leaf                Internal functions of TWIX.

Description

    Internal functions of package TWIX.

Usage

    Dev.leaf(x)

Details

    These are not to be called by the user.


Devplot                 Deviance plot

Description

    Deviance plot.

Usage

    Devplot(rsp, x, interactiv=FALSE, sample=c(FALSE,0), col=1,
            classes=FALSE, pch=16, ...)

Arguments

    rsp           response variable.
    x             a dataframe of predictor variables.
    interactiv    see idevplot.
    sample        a dataframe of predictor variables.
    col           the color for points.
    classes       Scatterplot of Classes.
    pch           a vector of plotting characters or symbols.
    ...           other parameters to be passed through to plotting
                  functions.

See Also

    idevplot, plot.TWIX, TWIX

Examples

    #Devplot(olives$Region,olives[,1:8])


bagg.default            Predictions from TWIX's or Bagging Trees

Description

    Prediction of a new observation based on multiple trees.

Usage

    ## Default S3 method:
    bagg(object,data=NULL,sq=1:10,...)
    ## S3 method for class 'TWIX':
    bagg(object,...)
    ## S3 method for class 'bootTWIX':
    bagg(object,...)

Arguments

    object        object of classes TWIX or bootTWIX.
    data          a data frame of new observations.
    sq            Integer vector indicating for which trees predictions
                  are required.

See Also

    TWIX, predict.TWIX, bootTWIX

Examples

    #Tree <- TWIX(Region~.,data=olives,topN=c(5,3),method="local")
    #Tree1 <- bootTWIX(Region~.,data=olives,topN=c(3,1),N=10)
    #pred <- bagg(Tree,olives,sq=1:10)
    #pred1 <- bagg(Tree1,olives,sq=1:10)
    #
    #CCR's
    #sum(pred==olives$Region)/nrow(olives)
    #sum(pred1==olives$Region)/nrow(olives)
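The manual does not say how bagg combines the predictions of several trees. As a minimal base-R sketch, assuming plain majority voting (a common choice for bagged classifiers, not confirmed for TWIX), the aggregation and the CCR formula from the examples above could look like this. The prediction table and class labels are made up for illustration.

```r
# Hypothetical predictions from three trees for five observations
# (rows = observations, columns = trees); not output of TWIX itself.
preds <- data.frame(
  tree1 = c("North", "South", "South", "Sardinia", "North"),
  tree2 = c("North", "South", "North", "Sardinia", "South"),
  tree3 = c("North", "North", "South", "Sardinia", "North"),
  stringsAsFactors = FALSE
)
truth <- c("North", "South", "South", "Sardinia", "South")

# Majority vote for one observation; ties go to the label that sorts first
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]
}
voted <- apply(preds, 1, majority_vote)

# Correct classification rate (CCR): share of observations predicted correctly
ccr <- sum(voted == truth) / length(truth)
```

Here the vote is wrong only for the last observation, so four of the five cases are classified correctly.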

bootTWIX                Bootstrap of the TWIX trees

Description

    Bootstrap samples of the Greedy-TWIX-trees.

Usage

    bootTWIX(formula, data=NULL, test.data=0, N=1, topN=1, subset=NULL,
             method="deviance", topn.method="complete", cluster=NULL,
             minsplit=30, minbucket=round(minsplit/3), Devmin=0.1,
             level=20, score=1, tol=0.15)

Arguments

    formula       formula of the form y ~ x1 + x2 + ..., where y must be
                  a factor and x1, x2, ... are numeric.
    data          an optional data frame containing the variables in the
                  model (training data).
    test.data     a data frame containing new data.
    N             an integer giving the number of bootstrap replications.
    topN          integer vector. How many splits will be selected and at
                  which level? If length 1, the same number of splits will
                  be selected at each level. If length > 1, for example
                  topN=c(3,2), 3 splits will be chosen at the first level,
                  2 splits at the second level and 1 split at all further
                  levels.
    subset        an optional vector specifying a subset of observations
                  to be used.
    method        Which split points will be used? This can be "deviance"
                  (default), "grid" or "local". If the method is set to:
                  "local" - the program uses the local maxima of the split
                  function (entropy),
                  "deviance" - all values of the entropy,
                  "grid" - grid points.
    topn.method   one of "complete" (default) or "single". A specification
                  of the consideration of the split points. If set to
                  "complete" it uses split points from all variables, else
                  it uses split points per variable.
    cluster       name of the cluster, if parallel computing will be used.
    minsplit      the minimum number of observations that must exist in a
                  node.
    minbucket     the minimum number of observations in any terminal
                  <leaf> node.
    Devmin        the minimum improvement on entropy by splitting.
    level         maximum depth of the trees. If level is set to 1, trees
                  consist of the root node only.
    score         a parameter, which can be 1 (default) or 2. If it is 2,
                  the sort-function will be used; if it is set to 1, the
                  weight-function will be used:
                  score = 0.25*scale(dev.tr)+0.6*scale(fit.tr)+0.15*(tree.structure)
    tol           parameter, which will be used if topn.method is set to
                  "single".

Value

    a list with the following components:

    call          the call generating the object.
    trees         a list of all constructed trees, which include ID,
                  Dev... for each tree.

See Also

    get.tree, predict.TWIX, deviance.TWIX, bagg.TWIX

Examples

    #Tree <- bootTWIX(Region~.,data=olives,N=5)
    #Tree$trees


deviance.TWIX           Deviance of the TWIX-trees

Description

    Returns the deviance of a fitted model object.

Usage

    ## S3 method for class 'TWIX':
    deviance(object,type="training",...)

Arguments

    object        an object of class TWIX or bootTWIX for which the
                  deviance is desired.
    type          a deviance from test, training or from both data.
    ...           additional optional argument.

Value

    The value of the deviance extracted from the object object.

See Also

    TWIX, predict.TWIX

Examples

    #Tree <- TWIX(Region~.,data=olives,topN=c(5,3),method="local")
    #deviance(Tree)
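The manual does not spell out the deviance formula. The sketch below assumes the usual multinomial deviance of a classification tree, D = -2 * sum over leaves and classes of n_ik * log(n_ik / n_i), with the convention 0 * log(0) = 0. The helper names xlogx, node_deviance and tree_deviance are hypothetical and not part of TWIX.

```r
# x * log(x) with the convention 0 * log(0) = 0
xlogx <- function(x) ifelse(x > 0, x * log(x), 0)

# Deviance of one leaf from its class counts:
# -2 * sum_k n_k * log(n_k / n) = -2 * (sum_k n_k * log(n_k) - n * log(n))
node_deviance <- function(counts) {
  n <- sum(counts)
  -2 * (sum(xlogx(counts)) - xlogx(n))
}

# Deviance of a tree: sum of the leaf deviances
tree_deviance <- function(leaves) sum(sapply(leaves, node_deviance))

# Hypothetical tree with one pure leaf and one evenly mixed leaf
leaves <- list(c(10, 0), c(5, 5))
d <- tree_deviance(leaves)
```

A pure leaf contributes nothing, so the total here comes entirely from the mixed leaf.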

export                  Export TWIX-trees for KLIMT

Description

    Export TWIX-trees for KLIMT.

Usage

    export(x, sq = 1, directory = "ForKlimt")

Arguments

    x             an object of class TWIX.
    sq            a vector of tree IDs.
    directory     the name of the directory, which will be created.

See Also

    predict.TWIX, print.TWIX

Examples

    #Tree<-TWIX(Region~.,data=olives,topN=c(3,2),method="local",level=5)
    #export(Tree,c(1,3))


fullrks                 Internal functions of TWIX.

Description

    Internal functions of TWIX.

Usage

    fullrks(m)

Arguments

    m             a numeric or character vector

Examples

    fullrks(c(3,1,2,4,5))
    a<-as.factor(c("yes","yes","no","no"))
    fullrks(a)

get.splitvar            Methods for getting split variables from a single
                        tree (single.tree).

Description

    This function extracts the split variables from a single.tree object.

Usage

    get.splitvar(x, sq = 1:length(x$trees), parm = "Splitvar")

Arguments

    x             an object of class TWIX generated by TWIX
    sq            Integer vector indicating for which trees the split
                  variables are required.
    parm          Which information must be returned? This can be
                  "Splitvar" or "Dev"

See Also

    TWIX, get.tree, predict.TWIX

Examples

    #TreeFR<-TWIX(Area~.,data=olives,topN=c(2,1),method="local",level=5)
    #TreeSG<-get.splitvar(TreeFR,sq=1)
    #TreeSG
    #get.tree(TreeFR)


get.tree                Methods for getting appointed trees from a
                        multitree (TWIX).

Description

    This function extracts a single tree from a TWIX object.

Usage

    get.tree(m.tr, n = 1, id = NULL)

Arguments

    m.tr          an object of class TWIX generated by TWIX
    n             integer. Which tree must be extracted?
    id            a vector of integers. ID of the tree.

See Also

    TWIX, predict.TWIX

Examples

    #TreeFR<-TWIX(Area~.,data=olives,topN=c(2,1),method="local")
    #TreeSG<-get.tree(TreeFR,n=1)
    #TreeSG


idevplot                Interactive Deviance plot

Description

    Interactive Deviance plot.

Usage

    idevplot(rsp, data, col=1, ...)

Arguments

    rsp           response variable.
    data          a dataframe of numeric predictor variables.
    col           the color for points.
    ...           other parameters to be passed through to plotting
                  functions.

See Also

    Devplot, plot.TWIX, TWIX

Examples

    #idevplot(olives$Region,olives[,c(1:8)])

nlogn                   Internal functions of TWIX.

Description

    This function computes 0*log(0).

Usage

    nlogn(x)

Arguments

    x             a numeric value

Examples

    log(0)
    nlogn(0)


olives                  Classification of olive oils from their fatty acid
                        composition.

Description

    The olives data frame has 572 rows and 10 columns.

Usage

    olives

Format

    This data frame contains the following columns:

    Area          a factor with 3 levels.
    Region        a factor with 9 levels.

    and 8 variables of fatty acid measurements.

Source

    Forina, M. and Armanino, C. and Lanteri, S. and Tiscornia, E. (1983)
    Food Research and Data Analysis, Applied Science Publishers, CA 1983.

plot.TWIX               Plotting method for TWIX Objects

Description

    Plot a TWIX or bootTWIX object generated by the TWIX or bootTWIX
    function(s).

Usage

    ## S3 method for class 'TWIX':
    plot(x, sq = 1:length(x$trees), type = "deviance", i.plot = FALSE,
         size = 3, freq = TRUE, breaks = "Sturges", pch = par("pch"), ...)

Arguments

    x             an object of class TWIX.
    sq            Integer vector giving the number of trees to be plotted.
    type          one of "deviance", "ccr", "d&c".
    i.plot        logical. If TRUE, iplot will be used.
    size          value for largest circle (cex).
    freq          logical. Should the frequency or density be plotted?
    breaks        see histogram.
    pch           a vector of plotting characters or symbols.
    ...           graphical parameters can be given as arguments to plot.

Details

    If type = "deviance": the training deviance vs. test deviance will be
    plotted.
    If type = "ccr": the correct classification rate (CCR) for training
    data vs. the CCR for test data.
    If type = "d&c": the deviance vs. CCR for test data.

See Also

    TWIX, get.tree

Examples

    i <- sample(572,150)
    ic <- setdiff(1:572,i)
    training <- olives[ic,]
    valid <- olives[i,]
    #
    #Tree<-TWIX(Region~.,training,test.data=valid,topN=c(10,2),method="local")
    #plot(Tree)
    #plot(Tree,type="ccr")
    #plot(Tree,type="d&c")
    #plot(Tree,i.plot=TRUE)


pred.value              Prediction for one case

Description

    Internal prediction functions.

Usage

    pred.value(x, tree)

Arguments

    x             the case from data.frame
    tree          an object of class single.tree

See Also

    predict.TWIX


predict.TWIX            Predictions from a TWIX Object

Description

    The result is a data frame, whose rows are prediction values from
    appointed tree(s).

Usage

    ## S3 method for class 'TWIX':
    predict(object,newdata,sq=1,ccr=FALSE,...)

Arguments

    object        an object returned from the TWIX function.
    newdata       data frame containing the new data (test data).
    sq            Integer vector indicating for which trees predictions
                  are required.
    ccr           logical. If TRUE the result is a list of two components:
                  a data frame with prediction values and the correct
                  classification rate of the trees. This parameter can be
                  ignored if the function TWIX has been called with test
                  data (test.data=test).
    ...           additional arguments affecting the predictions produced

See Also

    TWIX, plot.TWIX

Examples

    i <- sample(572,150)
    ic <- setdiff(1:572,i)
    training <- olives[ic,]
    test <- olives[i,]
    #
    #Tree<-TWIX(Region~.,training,topN=c(5,2),method="local")
    #pred<-predict(Tree,newdata=test,sq=1:2)
    #
    #predict(Tree,newdata=test,sq=1:5,ccr=TRUE)$ccr


print.id.tree           Internal functions of TWIX.

Description

    Print names from id.tree Object.

Usage

    ## S3 method for class 'id.tree':
    print(x, sq = 1:5, ...)

Arguments

    x             an object of class id.tree.
    sq            a numeric vector.
    ...           further arguments passed to or from other methods.

See Also

    TWIX

print.single.tree       Print tree from single.tree Object.

Description

    This is a method for the generic print() function for objects
    generated by the function get.tree.

Usage

    ## S3 method for class 'single.tree':
    print(x, klimt=FALSE, Data=NULL, file="fromr.tree", ...)

Arguments

    x             an object of class single.tree.
    klimt         logical. If TRUE, Klimt will be started with the tree
                  and dataset Data.
    Data          a data frame. It can be test data or training data.
                  This parameter is ignored if klimt == "FALSE".
    file          a character string naming a file.
    ...           further arguments passed to or from other methods.

See Also

    get.tree, TWIX

Examples

    #Tree<-TWIX(Area~.,data=olives,topN=c(2,2),method="local")
    #Tree1<-get.tree(Tree,n=1)
    #Tree1
    #for Klimt
    #print(Tree1,klimt=TRUE,Data=olives)


print.TWIX              Print Method for TWIX or BootTWIXtree Object

Description

    Print object of class TWIX or bootTWIX.

Usage

    ## S3 method for class 'TWIX':
    print(x,...)
    ## S3 method for class 'bootTWIX':
    print(x,...)

Arguments

    x             object of class TWIX or bootTWIX.
    ...           additional arguments.

Details

    An object of class TWIX or bootTWIX is printed. Information about
    names of the intermediate variables is given.

See Also

    TWIX, print.single.tree


scree.plot              Scree-plot

Description

    A scree plot shows the sorted maximum decrease in impurity for each
    variable's value.

Usage

    scree.plot(formula, data = NULL, bars = TRUE, col = "grey",
               type = "b", pch = 16, ylim = c(0, 1), ...)

Arguments

    formula       formula of the form y ~ x1 + x2 + ..., where y must be
                  a factor and x1, x2, ... are numeric or factor.
    data          an optional data frame containing the variables in the
                  model.
    bars          the type of plot: barplot or lines.
    col           The colors for lines or bars.
    type          the color for points.
    pch           see par(pch = ...).
    ylim          the y limits of the plot.
    ...           other parameters to be passed through to plotting
                  functions.

See Also

    idevplot, plot.TWIX, TWIX

Examples

    scree.plot(Region~.,data=olives,bars=FALSE,col=2)
    scree.plot(Region~.,data=olives,bars=TRUE)
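To make "sorted maximum decrease in impurity" concrete, here is a minimal base-R sketch that computes, for each numeric predictor, the largest entropy decrease a single binary split can achieve and then sorts those values. It is an illustration under assumptions: entropy is used as the impurity measure, the built-in iris data stands in for olives, and the helpers entropy and max_decrease are hypothetical, not TWIX functions. The default ylim = c(0, 1) suggests scree.plot may rescale its values, which this sketch does not attempt.

```r
# Shannon entropy of a vector of class labels (natural log)
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Largest entropy decrease achievable by one binary split of the form x <= s
max_decrease <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- cuts[-length(cuts)]   # a cut at the maximum would leave one side empty
  gains <- sapply(cuts, function(s) {
    left <- x <= s
    entropy(y) - mean(left) * entropy(y[left]) - mean(!left) * entropy(y[!left])
  })
  max(gains)
}

# One value per predictor, sorted from most to least informative
decrease <- sapply(iris[, 1:4], max_decrease, y = iris$Species)
scree <- sort(decrease, decreasing = TRUE)
# barplot(scree) or plot(scree, type = "b") would draw the scree plot
```

The sorted vector plays the role of the bar heights or line points in a scree plot.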

sp.slave                Internal functions of TWIX.

Description

    Trees with extra splits

Usage

    sp.slave(rsp, m, test.data, Dmin = 0.01, minsplit = 20,
             minbucket = round(minsplit/3), topN = 1, method = "deviance",
             topn.method = "complete", level = 3, lev = 0, st = 1,
             tol = 0.1, K = 0, oldspvar = 0)

Arguments

    rsp           a response variable.
    m             a dataframe (training data).
    test.data     a data frame containing new data.
    method        Which split points will be used? This can be "deviance"
                  (default), "grid" or "local". If the method is set to:
                  "local" - the program uses the local maxima of the split
                  function (entropy),
                  "deviance" - all values of the entropy,
                  "grid" - grid points.
    topn.method   one of "complete" (default) or "single". A specification
                  of the consideration of the split points. If set to
                  "complete" it uses split points from all variables, else
                  it uses split points per variable.
    minsplit      the minimum number of observations that must exist in a
                  node.
    minbucket     the minimum number of observations in any terminal
                  <leaf> node.
    Dmin          the minimum improvement on entropy by splitting.
    topN          an integer vector. How many splits will be selected and
                  at which level? If length 1, the same number of splits
                  will be selected at each level. If length > 1, for
                  example topN=c(3,2), 3 splits will be chosen at the
                  first level, 2 splits at the second level and 1 split at
                  all further levels.
    level         maximum depth of the trees. If level is set to 1, trees
                  consist of the root node only.
    st            step parameter for method "grid".
    tol           parameter, which will be used if topn.method is set to
                  "single".
    lev           Internal parameter.
    K             k-fold cross-validation.
    oldspvar      internal parameter.

See Also

    TWIX, predict.TWIX
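The difference between using all split points, only the local maxima of the split function, and keeping the best few splits can be shown on a toy example. The sketch below is base R only and rests on assumptions: entropy decrease is taken as the split function, candidate cuts are the observed values of the predictor, and the data are made up. It is not the package's internal algorithm.

```r
# Shannon entropy of a vector of class labels (natural log)
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Toy data: one numeric predictor and a two-class response
x <- 1:8
y <- c("a", "a", "a", "b", "a", "b", "b", "b")

# Entropy decrease for every candidate cut x <= s ("all values")
cuts <- sort(unique(x))
cuts <- cuts[-length(cuts)]
gains <- sapply(cuts, function(s) {
  left <- x <= s
  entropy(y) - mean(left) * entropy(y[left]) - mean(!left) * entropy(y[!left])
})

# Local maxima: cuts whose gain exceeds that of both neighbours
padded <- c(-Inf, gains, -Inf)
is_peak <- gains > head(padded, -2) & gains > tail(padded, -2)
local_cuts <- cuts[is_peak]

# Keep the two best cuts, in the spirit of selecting several splits per level
n_keep <- 2
top_cuts <- cuts[head(order(gains, decreasing = TRUE), n_keep)]
```

In this toy case the gain curve has two peaks, at the cuts 3 and 5, and the same two cuts are also the two best overall. Restricting the candidates to local maxima avoids keeping several nearly identical neighbouring splits.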

splitt                  Internal functions of TWIX.

Description

    This function computes deviance and split-points.

Usage

    splitt(sv, rsp, svrks = fullrks(sv), meth = "deviance", topN = 1,
           topn.meth = "complete", lstep = 1, test = FALSE, K = 0,
           level = 0)

Arguments

    sv            a numeric vector of the predictor variable.
    rsp           response variable.
    svrks         an index vector of sv by sorting.
    meth          Which split points will be used? This can be "deviance"
                  (default), "grid" or "local".
    topN          a numeric vector. How many splits will be selected and
                  at which level?
    topn.meth     one of "complete" (default) or "single".
    lstep         step parameter for method "grid".
    test          parameter for Devplot.
    K             k-fold cross-validation.
    level         Set the maximum depth of the TWIX trees.


summary.TWIX            Summarising TWIX

Description

    summary method for objects returned by TWIX or bootTWIX.

Usage

    ## S3 method for class 'TWIX':
    summary(object,...)
    ## S3 method for class 'bootTWIX':
    summary(object,...)

Arguments

    object        object of class TWIX or bootTWIX.
    ...           further arguments to be passed to or from methods.

See Also

    TWIX


tuneTWIX                Parameter Tuning.

Description

    This function tunes the hyperparameters minbuck and maxdepth.

Usage

    tuneTWIX(formula, data = NULL, minbuck = seq(5, 30, by = 5),
             xval = 10, runs = 10, trace.plot = TRUE)

Arguments

    formula       formula of the form y ~ x1 + x2 + ..., where y must be
                  a factor and x1, x2, ... are numeric or factor.
    data          an optional data frame containing the variables in the
                  model.
    minbuck       the sampling space for parameter minbuck.
    xval          number of cross-validations.
    runs          number of runs.
    trace.plot    Should the trace plot be plotted?
    ...           other parameters to be passed through to plotting
                  functions.

See Also

    plot.TWIX, TWIX

Examples

    tuneTWIX(Region~.,data=olives[,1:9],minbuck=c(1,5,10,15,20,25),runs=2)
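The tuning function above searches a grid of minimum bucket sizes by cross-validation. As a rough sketch of that idea, and explicitly not of the TWIX implementation, the following uses rpart (listed under Depends) on the built-in iris data. The fold count, the grid values and the choice of minsplit = 3 * minbucket are assumptions made for illustration.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)
k <- 5                                  # number of cross-validation folds
minbuck_grid <- c(5, 10, 20)            # candidate minimum leaf sizes
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Mean misclassification rate over the k folds for each grid value
cv_error <- sapply(minbuck_grid, function(mb) {
  fold_err <- sapply(seq_len(k), function(f) {
    train <- iris[folds != f, ]
    test  <- iris[folds == f, ]
    fit <- rpart(Species ~ ., data = train, method = "class",
                 control = rpart.control(minbucket = mb, minsplit = 3 * mb))
    mean(predict(fit, newdata = test, type = "class") != test$Species)
  })
  mean(fold_err)
})
names(cv_error) <- minbuck_grid

# Grid value with the smallest cross-validated error
best_minbuck <- minbuck_grid[which.min(cv_error)]
```

Repeating the whole procedure with different fold assignments, as the runs argument suggests, would give a more stable estimate than a single pass.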

TWIX                    Trees with extra splits

Description

    Trees with extra splits

Usage

    TWIX(formula, data = NULL, test.data = 0, subset = NULL,
         method = "deviance", topn.method = "complete", cluster = NULL,
         minsplit = 30, minbucket = round(minsplit/3), Devmin = 0.05,
         topN = 1, level = 30, st = 1, cl.level = 2, tol = 0.15,
         score = 1, k = 0, trace.plot = FALSE, ...)

Arguments

    formula       formula of the form y ~ x1 + x2 + ..., where y must be
                  a factor and x1, x2, ... are numeric or factor.
    data          an optional data frame containing the variables in the
                  model (training data).
    test.data     This can be a data frame containing new data, 0
                  (default), or "NULL". If set to "NULL" the bad
                  observations will be specified.
    subset        an optional vector specifying a subset of observations
                  to be used.
    method        Which split points will be used? This can be "deviance"
                  (default), "grid" or "local". If the method is set to:
                  "local" - the program uses the local maxima of the split
                  function (entropy),
                  "deviance" - all values of the entropy,
                  "grid" - grid points.
    topn.method   one of "complete" (default) or "single". A specification
                  of the consideration of the split points. If set to
                  "complete" it uses split points from all variables, else
                  it uses split points per variable.
    cluster       name of the cluster, if parallel computing will be used.
    minsplit      the minimum number of observations that must exist in a
                  node.
    minbucket     the minimum number of observations in any terminal
                  <leaf> node.
    Devmin        the minimum improvement on entropy by splitting.
    topN          integer vector. How many splits will be selected and at
                  which level? If length 1, the same number of splits will
                  be selected at each level. If length > 1, for example
                  topN=c(3,2), 3 splits will be chosen at the first level,
                  2 splits at the second level and 1 split at all further
                  levels.
    level         maximum depth of the trees. If level is set to 1, trees
                  consist of the root node only.
    st            step parameter for method "grid".
    cl.level      parameter for parallel computing.
    tol           parameter, which will be used if topn.method is set to
                  "single".

    score         a parameter, which can be 1 (default) or 2. If it is 2,
                  the sort-function will be used; if it is set to 1, the
                  weight-function will be used:
                  score = 0.25*scale(dev.tr)+0.6*scale(fit.tr)+0.15*(tree.structure)
    k             k-fold cross-validation of the split-function. k
                  specifies the part of the observations which will be
                  taken into the hold-out sample (k can be in (0,0.5)).
    trace.plot    Should the trace plot be plotted?
    ...           further arguments to be passed to or from methods.

Value

    a list with the following components:

    call          the call generating the object.
    trees         a list of all constructed trees, which include ID,
                  Dev... for each tree.
    greedy.tree   greedy tree
    multitree     database
    agg.id        vector specifying trees for aggregation.
    Bad.id        ID-vector of bad observations from train data.

See Also

    get.tree, predict.TWIX, print.single.tree, plot.TWIX, deviance.TWIX

Examples

    i <- sample(572,150)
    ic <- setdiff(1:572,i)
    training <- olives[ic,]
    test <- olives[i,]
    #
    #Tree1<-TWIX(Region~.,data=training[,1:9],topN=c(9,2),method="local")
    #Tree1$trees
    #
    #pred<-predict(Tree1,newdata=test,sq=1:2)
    #
    #predict(Tree1,newdata=test,sq=1:2,ccr=TRUE)$ccr
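The weight-function given for the score argument combines three per-tree quantities after standardising two of them with scale(). The sketch below only shows the arithmetic of that formula in base R. All three input vectors are invented, and the manual does not define how dev.tr, fit.tr and tree.structure are measured or whether a higher score is better, so no ranking is implied.

```r
# Hypothetical per-tree quantities for four trees (made-up numbers)
dev.tr         <- c(120, 95, 150, 110)      # stand-in for a deviance term
fit.tr         <- c(0.90, 0.94, 0.85, 0.91) # stand-in for a fit term
tree.structure <- c(0.5, 0.7, 0.2, 0.6)     # stand-in for a structure term

# scale() centres to mean 0 and divides by the standard deviation;
# it returns a one-column matrix, so convert back to a plain vector
z_dev <- as.numeric(scale(dev.tr))
z_fit <- as.numeric(scale(fit.tr))

# Weighted combination exactly as written in the manual
score <- 0.25 * z_dev + 0.6 * z_fit + 0.15 * tree.structure
```

Because the first two terms are standardised, their weights 0.25 and 0.6 act on comparable scales, while tree.structure enters unscaled.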

Index

Topic datasets
    olives

Topic internal
    Dev.leaf

Topic tree
    bagg.default, bootTWIX, deviance.TWIX, Devplot, export, fullrks,
    get.splitvar, get.tree, idevplot, nlogn, plot.TWIX, pred.value,
    predict.TWIX, print.id.tree, print.single.tree, print.TWIX,
    scree.plot, sp.slave, splitt, summary.TWIX, tuneTWIX, TWIX

Further index entries
    bagg (bagg.default)
    bagg.TWIX
    deviance.bootTWIX (deviance.TWIX)
    plot.bootTWIX (plot.TWIX)
    predict.bootTWIX (predict.TWIX)
    print.bootTWIX (print.TWIX)
    summary.bootTWIX (summary.TWIX)

