Introduction to Classification & Regression Trees
ISLR Chapter 8
November 8, 2017
Classification and Regression Trees

- Carseat data from the ISLR package
- Binary outcome High: 1 if Sales > 8, otherwise 0
- Fit a classification tree model to Price and Income
- Pick a predictor $X_j$ and a cutpoint $s$ to split the data into $X_j \le s$ and $X_j > s$ so as to minimize the deviance (or SSE for regression); this gives the root node of the tree (a toy sketch of this split search follows below)
- Continue splitting/partitioning the data until a stopping criterion is reached (e.g., number of observations in a node > 10 and within-node deviance > 0.01 times the deviance of the root node)
- Prediction is the mean (or the proportion of successes) of the data in each terminal node
- Output is a decision tree
- The regression or classification function is nonlinear in the predictors
- Captures interactions
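Not part of the slides: a toy sketch in R of the greedy split search described above, for a single numeric predictor and a 0/1 outcome. The helper names node_deviance and best_split are made up for illustration; tree() performs this search over all predictors and cutpoints.

# Binomial deviance of a node: -2 * sum_c n_c * log(p_c), with empty classes ignored
node_deviance = function(y) {
  p = mean(y)
  dev = 0
  if (p > 0) dev = dev - 2 * sum(y == 1) * log(p)
  if (p < 1) dev = dev - 2 * sum(y == 0) * log(1 - p)
  dev
}

# Scan candidate cutpoints s (midpoints between observed values) and keep the one
# minimizing the total deviance of the two children {x <= s} and {x > s}
best_split = function(x, y) {
  v = sort(unique(x))
  cuts = (head(v, -1) + tail(v, -1)) / 2
  devs = sapply(cuts, function(s) node_deviance(y[x <= s]) + node_deviance(y[x > s]))
  list(cut = cuts[which.min(devs)], deviance = min(devs))
}

# Example with the Carseats data defined on the next slide:
# best_split(Carseats$Price, as.numeric(Carseats$High == "Yes"))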
Carseat Example

library(tree)
library(ISLR)
library(dplyr)
data(Carseats)
Carseats = mutate(Carseats,
                  High = factor(ifelse(Carseats$Sales <= 8, "No", "Yes")))
tree.carseats = tree(High ~ Price + Income, data = Carseats)
Carseat Example

plot(tree.carseats)
text(tree.carseats)

[Figure: plotted tree with splits Price < 92.5, Price < 142, Income < 60.5, and Income < 62.5; the left-most leaf is labeled Yes]
Partition

partition.tree(tree.carseats)
points(Carseats$Price, Carseats$Income, col = Carseats$High)

[Figure: partition of the (Price, Income) plane induced by the tree, with the data points overlaid]
Splits

tree.carseats

## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
##
##  1) root 400 541.50 No ( 0.5900 0.4100 )
##    2) Price < 92.5 62 66.24 Yes ( 0.2258 0.7742 ) *
##    3) Price > 92.5 338 434.80 No ( 0.6568 0.3432 )
##      6) Price < 142 287 382.10 No ( 0.6167 0.3833 )
##       12) Income < 60.5 113 128.70 No ( 0.7434 0.2566 ) *
##       13) Income > 60.5 174 240.40 No ( 0.5345 0.4655 ) *
##      7) Price > 142 51 36.95 No ( 0.8824 0.1176 )
##       14) Income < 62.5 19 0.00 No ( 1.0000 0.0000 ) *
##       15) Income > 62.5 32 30.88 No ( 0.8125 0.1875 ) *
Summary

summary(tree.carseats)

##
## Classification tree:
## tree(formula = High ~ Price + Income, data = Carseats)
## Number of terminal nodes:  5
## Residual mean deviance:  1.18 = 466.2 / 395
## Misclassification error rate: 0.325 = 130 / 400
All Variables

tree.carseats = tree(High ~ . - Sales, data = Carseats)
summary(tree.carseats)

##
## Classification tree:
## tree(formula = High ~ . - Sales, data = Carseats)
## Variables actually used in tree construction:
## [1] "ShelveLoc"   "Price"       "Income"      "CompPrice"   "Population"
## [6] "Advertising" "Age"         "US"
## Number of terminal nodes:  27
## Residual mean deviance:  0.4575 = 170.7 / 373
## Misclassification error rate: 0.09 = 36 / 400

Overfitting?
Tree

[Figure: the full unpruned tree with 27 terminal nodes, splitting on ShelveLoc, Price, Income, CompPrice, Population, Advertising, Age, and US]
Classification Error

set.seed(2)
train = sample(1:nrow(Carseats), 200)
Carseats.test = Carseats[-train, ]
tree.carseats = tree(High ~ . - Sales, data = Carseats, subset = train)
tree.pred = predict(tree.carseats, Carseats.test, type = "class")
table(tree.pred, Carseats.test$High)

##
## tree.pred  No Yes
##       No   86  27
##       Yes  30  57

(30 + 27) / 200  # classification error

## [1] 0.285
Cost-Complexity Pruning

1. Grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
2. The prediction for region $m$ is the class $c$ that maximizes $\hat{\pi}_{mc}$.
3. Snip off the least important splits via cost-complexity pruning to obtain a sequence of best subtrees indexed by a cost parameter $k$,
   $$\frac{N_{\text{miss}}}{N} + k\,|T|,$$
   the misclassification error penalized by the number of terminal nodes $|T|$ (a small numeric sketch of this criterion follows below).
4. Using K-fold cross-validation, compute the average cost-complexity for each $k$.
5. Pick the subtree with the smallest penalized error.
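Not part of the slides: a minimal numeric sketch of the penalized criterion above, using a hypothetical helper cost_complexity (this is not how prune.misclass is implemented internally).

# Cost of a candidate subtree: misclassification rate plus k times the number of terminal nodes |T|
cost_complexity = function(n_miss, n, n_leaves, k) {
  n_miss / n + k * n_leaves
}

# Illustrative comparison at penalty k = 0.01: the 27-leaf tree above made 36 training
# errors; suppose a 9-leaf subtree makes 60 (a made-up count for illustration).
cost_complexity(36, 400, 27, k = 0.01)   # 0.09 + 0.27 = 0.36
cost_complexity(60, 400,  9, k = 0.01)   # 0.15 + 0.09 = 0.24, the smaller tree wins at this k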
Pruning via Cross Validation

set.seed(2)
cv.carseats = cv.tree(tree.carseats, FUN = prune.misclass)

[Figure: cv.carseats$dev (cross-validated misclassifications, roughly 55-80) plotted against cv.carseats$size and against cv.carseats$k]
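Not on the original slide: a short follow-up, assuming the cv.carseats object above, that pulls the best tree size out of the cross-validation results instead of reading it off the plot.

# With FUN = prune.misclass, cv.carseats$dev holds the cross-validated
# misclassification counts; pick the tree size that minimizes it
best.size = cv.carseats$size[which.min(cv.carseats$dev)]
best.size
# The next slide prunes with best = 9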
Pruned

prune.carseats = prune.misclass(tree.carseats, best = 9)

[Figure: pruned classification tree with 9 terminal nodes, splitting on ShelveLoc, Price, Advertising, Age, and CompPrice]
Misclassification after Selection

tree.pred = predict(prune.carseats, Carseats.test, type = "class")
table(tree.pred, Carseats.test$High)

##
## tree.pred  No Yes
##       No   94  24
##       Yes  22  60

(94 + 60) / 200  # fraction classified correctly

## [1] 0.77
Tree with Random Split of Data

[Figure: tree grown on one random half of the data, splitting on Price, CompPrice, ShelveLoc, Population, Age, Income, and Advertising]
Tree with another Random Split of Data

[Figure: tree grown on a different random half of the data; the splits and structure differ noticeably from the previous tree]
Bagging: Bootstrap Aggregation

- Splitting the data into random partitions and fitting a tree model on each half may lead to very different predictions (high variability).
- Reduce variability by averaging over multiple training sets!
- We do not have access to multiple training sets, so create them via bootstrap samples (samples of size n drawn with replacement).
- Generate B bootstrap samples of observations from the single training data set.
- Calculate predictions for the bth sample, $\hat{f}^{*b}(x)$.
- The bagging (bootstrap aggregation) estimate is
  $$\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
  (a hand-rolled sketch follows below).
- Trees are grown deep, so there is little bias (although one could prune).
- Variance is reduced by averaging many trees across the bootstrap samples.
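Not from the slides: a hand-rolled bagging sketch with the tree package, assuming the train indices and Carseats.test from the earlier slides. It grows one deep tree per bootstrap sample and averages the predicted class probabilities; in practice the randomForest package (with mtry set to the number of predictors) does this more efficiently.

set.seed(1)
B = 100
# One column per bootstrap tree: predicted P(High = "Yes") on the test set
prob.yes = matrix(NA, nrow = nrow(Carseats.test), ncol = B)

for (b in 1:B) {
  boot = sample(train, size = length(train), replace = TRUE)  # bootstrap the training rows
  fit  = tree(High ~ . - Sales, data = Carseats, subset = boot)
  prob.yes[, b] = predict(fit, Carseats.test)[, "Yes"]        # per-tree class probabilities
}

# Bagged prediction: average the B probability estimates, then classify at 0.5
bag.pred = ifelse(rowMeans(prob.yes) > 0.5, "Yes", "No")
table(bag.pred, Carseats.test$High)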
Next Class

- Trees are simple to understand, but not as competitive with other supervised learning approaches for prediction/classification.
- Ways to improve trees by combining multiple trees in an ensemble:
  - Bagging
  - Random Forests
  - Boosting
  - BART (Bayesian Additive Regression Trees)
- Combining trees yields improved prediction accuracy, but with a loss of interpretability.