Classification and Regression Trees
Patrick Breheny
November 10
BST 764: Applied Statistical Modeling

Introduction
Our discussion of tree-based methods in the previous section assumed that the outcome was continuous. Tree-based methods for categorical outcomes are referred to as classification trees. The main idea is the same: we recursively partition the sample space, fitting a very simple model in each partition. In the case of classification, our model is simply to use the observed class proportions.

Classification tree model
The classification model is simply to estimate $\Pr(G = k \mid x)$ by
$$\sum_m \hat{\pi}_{mk} \, I(x \in R_m),$$
where $\hat{\pi}_{mk}$ is the proportion of observations in partition $m$ that belong to class $k$. Regression trees are based on the least squares criterion: we used this criterion both to find optimal splits and to measure node impurity for the purpose of pruning. This criterion is not appropriate for classification. The primary change we need to make in the algorithm in order to obtain classification trees is to establish a new criterion to determine goodness of fit.
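A minimal sketch of this idea in R (assuming the heart disease data frame, heart, with factor outcome chd, used in the example later in these notes): the fitted probability in each terminal node is just the observed class proportion there.

library(rpart)

fit <- rpart(chd ~ ., data = heart, method = "class")

## fit$where gives, for each observation, the terminal node it falls into;
## tabulating the outcome within each node recovers the proportions pi_hat_mk
pi_hat <- prop.table(table(node = fit$where, class = heart$chd), margin = 1)
pi_hat

## these proportions match the probabilities rpart itself reports
predict(fit, type = "prob")[1:5, ]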

Node impurity measures
There are three commonly used measures of node impurity $Q_m(T)$ for classification trees:

Misclassification error: $\frac{1}{N_m} \sum_{i \in R_m} I\left(y_i \neq \arg\max_k \hat{\pi}_{mk}\right)$

Gini index: $\sum_k \hat{\pi}_{mk}(1 - \hat{\pi}_{mk})$

Deviance: $-\sum_k \hat{\pi}_{mk} \log(\hat{\pi}_{mk})$
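A small sketch (not from the notes) computing the three impurity measures for a single node, given the vector of class labels y falling into it:

node_impurity <- function(y) {
  p <- prop.table(table(y))            # pi_hat_mk: class proportions in the node
  c(misclass = 1 - max(p),             # misclassification error
    gini     = sum(p * (1 - p)),       # Gini index
    deviance = -sum(p * log(p)))       # deviance / cross-entropy
}

## e.g. a node with 8 "healthy" and 2 "chd" observations
node_impurity(rep(c("healthy", "chd"), c(8, 2)))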

Node impurity measures (cont'd)
For K = 2:

[Figure: the three impurity measures $Q_m(T)$ plotted against the class proportion $p$: misclassification error, Gini index, and (rescaled) deviance.]

Note that all three are similar, but that the Gini index and deviance are differentiable, and thus easier to optimize numerically.
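A short sketch reproducing the comparison in the figure for a two-class node; the deviance is simply halved here as one possible rescaling to put the curves on a comparable scale.

p        <- seq(0.001, 0.999, length = 200)
misclass <- pmin(p, 1 - p)                          # 1 - max(p, 1 - p)
gini     <- 2 * p * (1 - p)
dev      <- -(p * log(p) + (1 - p) * log(1 - p))

matplot(p, cbind(misclass, gini, dev / 2), type = "l", lty = 1,
        ylab = expression(Q[m](T)),
        main = "Misclassification, Gini, and (rescaled) deviance")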

Growing and pruning your classification tree
As with regression trees, the classification tree algorithm determines, for each variable $j$, the split point $s_j$ that minimizes
$$N_1 Q_1(s_j) + N_2 Q_2(s_j),$$
where $N_i$ is the number of observations in each new node and $Q_i(s_j)$ is the impurity measure of each new node. Again, we search over all $(j, s)$ pairs and choose the one that provides the lowest total impurity $\sum_m N_m Q_m(T)$. Cost-complexity pruning works the same way as for regression trees, although misclassification loss is typically used for pruning regardless of the impurity measure used to grow the tree, as it is the easiest to implement via cross-validation.
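A hedged sketch of growing and then cost-complexity pruning a classification tree with rpart (again assuming the heart data with factor outcome chd used in the example below):

library(rpart)

fit <- rpart(chd ~ ., data = heart, method = "class",
             control = rpart.control(cp = 0.001))  # grow a deliberately large tree

printcp(fit)   # cross-validated error for each value of the complexity parameter
plotcp(fit)    # plot of CV error vs. cp, as in the pruning figure below

## prune back to the cp value with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)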

Example
As an example of classification trees, we will consider the coronary heart disease data set that we have examined a few times already. To obtain classification trees, one can either specify method="class" or convert the outcome to a factor, in which case method="class" is inferred:

heart$chd <- factor(heart$chd, labels = c("healthy", "chd"))
fit <- rpart(chd ~ ., data = heart)

This approach of inferring the type of tree from the class of the outcome is also used by party, which lacks an option to directly specify the model. By default, rpart uses the Gini index to grow classification trees, although there is an option to use the deviance instead; party is again based on hypothesis testing.
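A short follow-up sketch (not part of the original slides): in-sample predictions from the rpart fit above, and the analogous party fit.

library(rpart)
library(party)

pred <- predict(fit, type = "class")
table(observed = heart$chd, predicted = pred)   # confusion matrix

## party infers a classification tree from the factor outcome as well
fit_party <- ctree(chd ~ ., data = heart)
plot(fit_party)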

Cost-complexity pruning
[Figure: relative error (in-sample and cross-validated) as a function of the complexity parameter $\alpha$, with the corresponding tree sizes shown along the top.]

Results: rpart
[Figure: fitted rpart classification tree for the heart disease data; splits involve age, famhist, typea, tobacco, ldl, and adiposity, with terminal nodes showing the number of observations and the proportion of CHD cases.]

Results: party
[Figure: fitted party (ctree) classification tree for the heart disease data; splits involve age, famhist, tobacco, typea, and ldl, each annotated with a p-value, with terminal nodes showing the number of observations and the proportion of CHD cases.]

Assumption-free
Perhaps the primary advantage of tree-based methods is that they are virtually assumption-free. Consequently, they are very simple to fit and interpret, since no time or effort has to go into making, checking, or explaining assumptions. Furthermore, they are capable of discovering associations, such as higher-order interactions, that would otherwise go utterly unsuspected.

Easily extended
Another advantage is that, since very simple models are being fit within each node, trees are fairly easy to extend to other problems. For instance, the rpart and party packages have implemented survival trees, Poisson regression trees, and ordinal regression trees (some are only available in one package or the other, however). Indeed, rpart lets you pass your own method to the function, although admittedly this is somewhat easier said than done.
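A hedged illustration of one such extension, a survival tree; the lung data from the survival package is used purely for illustration and does not appear in the original notes.

library(rpart)
library(survival)

## when the response is a Surv object, rpart automatically uses its
## exponential survival method to grow the tree
sfit <- rpart(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
plot(sfit); text(sfit, use.n = TRUE)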

Missing data
Finally, trees have a rather elegant option for handling missing data, besides the usual options of discarding or imputing observations. For a given split, we can find surrogate splits, which best mimic the behavior of the original split. Then, when sending an observation down the tree, if the splitting variable is missing, we simply use the surrogate split instead.
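A hedged sketch of how rpart exposes surrogate splits, continuing the heart disease example above; the surrogate-related settings shown are rpart's defaults and are spelled out only to make them explicit.

library(rpart)

fit <- rpart(chd ~ ., data = heart, method = "class",
             control = rpart.control(maxsurrogate = 5,   # surrogates kept per split
                                     usesurrogate = 2))  # use surrogates when x is missing

## summary() lists, for each node, the primary split followed by its
## surrogate splits and how well each surrogate agrees with the primary
summary(fit)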

Instability of trees
The primary disadvantage of trees is that they are rather unstable (i.e., have high variance). In other words, a small change in the data often results in a completely different tree, something to keep in mind while interpreting trees. One major reason for this instability is that if a split changes, all the splits under it are changed as well, thereby propagating the variability. A related methodology, random forests, grows a large number of trees and then averages across them in order to stabilize the tree-based approach, although obviously there is a cost as far as interpretation is concerned.
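A hedged sketch of the random-forest idea mentioned above, using the randomForest package (which is not used in the original notes) on the same data.

library(randomForest)

set.seed(1)
rf <- randomForest(chd ~ ., data = heart, ntree = 500)
rf               # out-of-bag error estimate
varImpPlot(rf)   # variable importance, a partial substitute for interpretability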

Difficulty in capturing additive structure
In addition, trees have a difficult time capturing simple additive structure:

[Figure: scatterplot of y against x with a tree-based fit, illustrating the coarse step-function approximation a tree gives to a simple additive trend.]
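A hedged simulation (not in the original slides) illustrating the point: on purely additive data, a linear model easily beats a regression tree.

library(rpart)

set.seed(1)
n   <- 200
x1  <- runif(n); x2 <- runif(n)
y   <- x1 + x2 + rnorm(n, sd = 0.1)     # simple additive structure
dat <- data.frame(y, x1, x2)

tree_fit <- rpart(y ~ x1 + x2, data = dat)
lm_fit   <- lm(y ~ x1 + x2, data = dat)

## in-sample mean squared error for each fit
c(tree = mean((dat$y - predict(tree_fit))^2),
  lm   = mean((dat$y - predict(lm_fit))^2))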

Concluding remarks
In other words, the weaknesses of tree-based methods are precisely the strengths of linear models, and vice versa. For this reason, I have always personally found the two methods to be useful complements to each other. For example, when embarking on an extensive analysis using a linear or generalized linear model, it doesn't hurt to check your results against a regression tree. If you find that the regression tree chooses a very different set of important variables, you may want to reconsider the linear model (especially if the regression tree achieves a much better $R^2$).
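A hedged sketch of the cross-check suggested above: fit a GLM and a classification tree to the same data and compare which variables each considers important.

library(rpart)

glm_fit  <- glm(chd ~ ., data = heart, family = binomial)
tree_fit <- rpart(chd ~ ., data = heart, method = "class")

summary(glm_fit)               # which variables the linear model deems important
tree_fit$variable.importance   # which variables the tree actually splits on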