Decision Tree Learning

Decision Tree Learning
Debapriyo Majumdar
Data Mining, Fall 2014
Indian Statistical Institute, Kolkata
August 25, 2014

Example: Age, Income and Owning a flat

[Scatter plot of the training set: Age (0-70 years) on the x-axis, Monthly income (0-250 thousand rupees) on the y-axis; each point is marked "Owns a house" or "Does not own a house", and a horizontal line L1 and a vertical line L2 roughly separate the two classes.]

If the training data were as above, could we define some simple rules by observation?
- Any point above the line L1 → Owns a house
- Any point to the right of L2 → Owns a house
- Any other point → Does not own a house

Example: Age, Income and Owning a flat (continued)

[The same scatter plot, now partitioned by the splits of the tree below.]

Root node: split at Income = 101
- Income ≥ 101: Label = Yes
- Income < 101: split at Age = 54
  - Age ≥ 54: Label = Yes
  - Age < 54: Label = No

In general, the data won't be as cleanly separable as this.
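The tree above can be read as a small set of nested rules. A minimal sketch in Python, using the thresholds shown on the slide (101 thousand rupees and 54 years); the function name and the example calls are illustrative, not part of the lecture:

def owns_house(age, monthly_income):
    """Classify a person with the splits of the example tree."""
    if monthly_income >= 101:   # root split on income
        return "Yes"
    if age >= 54:               # second split on age
        return "Yes"
    return "No"

print(owns_house(age=30, monthly_income=150))  # Yes
print(owns_house(age=60, monthly_income=80))   # Yes
print(owns_house(age=25, monthly_income=40))   # No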

Example: Age, Income and Owning a flat (continued)

[The same scatter plot as before.]

Approach: recursively split the data into partitions so that each partition becomes purer, until a stopping condition is met. This raises three questions: How to decide the split? How to measure purity? When to stop?

Approach for splitting

What are the possible lines for splitting? For each variable, the midpoints between pairs of consecutive values of that variable. How many are there? If N is the number of points in the training set and m is the number of variables, about O(Nm). How do we choose which line to use for splitting? The line that reduces impurity (the heterogeneity of composition) the most. How do we measure impurity?
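A small sketch of this enumeration, assuming a toy training set stored as (feature vector, label) pairs; the function name candidate_splits and the sample data are illustrative:

def candidate_splits(points):
    """Candidate (feature_index, threshold) pairs: midpoints between
    consecutive distinct values of each feature."""
    n_features = len(points[0][0])
    splits = []
    for j in range(n_features):
        values = sorted({x[j] for x, _ in points})
        for a, b in zip(values, values[1:]):
            splits.append((j, (a + b) / 2.0))
    return splits

# Toy training set: (age, income in thousand rupees) -> owns a house?
data = [((25, 40), "No"), ((30, 150), "Yes"), ((60, 80), "Yes"), ((45, 55), "No")]
print(candidate_splits(data))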

Gini Index for Measuring Impurity

Suppose there are C classes. Let p(i|t) be the fraction of observations belonging to class i in rectangle (node) t. Gini index:

Gini(t) = 1 − Σ_{i=1}^{C} p(i|t)²

If all observations in t belong to one single class, Gini(t) = 0. When is Gini(t) maximum? (When all classes are equally frequent, giving Gini(t) = 1 − 1/C.)
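A direct translation of the formula into code (a sketch; it works from per-class counts, so converting counts to fractions is the only extra step):

def gini(class_counts):
    """Gini impurity of a node, given the count of each class in it."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

print(gini([10, 0]))  # 0.0 -> pure node
print(gini([5, 5]))   # 0.5 -> maximum for C = 2 classes (1 - 1/C)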

Entropy

The average amount of information contained; from another point of view, the average amount of information expected, hence the amount of uncertainty. We will study this in more detail later. Entropy:

Entropy(t) = − Σ_{i=1}^{C} p(i|t) log₂ p(i|t)

where 0 log₂ 0 is defined to be 0.
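The same node-level computation for entropy (a sketch; the c > 0 guard implements the convention that 0 log₂ 0 = 0):

from math import log2

def entropy(class_counts):
    """Entropy of a node, given the count of each class in it."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

print(entropy([10, 0]))  # 0.0 -> pure node
print(entropy([5, 5]))   # 1.0 -> maximum for 2 classes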

Classification Error

What if we stop the tree building at a node? That is, we do not create any further branches for that node; we make that node a leaf and classify it with the most frequent class present in the node. This rectangle (node) is still impure. Classification error as a measure of impurity:

ClassificationError(t) = 1 − max_i p(i|t)

Intuitively, this is the fraction of the rectangle (node) that does not belong to its most frequent class.
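The corresponding sketch for classification error (standalone, with the same counts-based convention as the Gini and entropy functions above):

def classification_error(class_counts):
    """Classification error of a node: 1 minus the fraction of its most frequent class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - max(class_counts) / total

print(classification_error([10, 0]))  # 0.0 -> pure node
print(classification_error([7, 3]))   # 0.3
print(classification_error([5, 5]))   # 0.5 -> maximum for 2 classes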

The Full Blown Tree

Recursive splitting: suppose we don't stop until all nodes are pure. The result is a large decision tree whose leaf nodes have very few data points. Such leaves are statistically not significant, do not represent the classes well, and the tree overfits the training data. Solution: stop earlier, or prune back the tree.

[Tree diagram: the root holds 1000 points, its children 400 and 600, the next level 200, 200, 160 and 240, and the deepest leaves as few as 2, 1 or 5 points.]
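In practice, "stop earlier" is usually expressed as growth limits on the tree. A hedged sketch using scikit-learn's DecisionTreeClassifier (the library, the Iris dataset, and the particular limits are assumptions for illustration, not part of the lecture):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: keeps splitting until every leaf is pure (risk of overfitting).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Early stopping: limit the depth and require a minimum number of points per leaf.
small_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                    random_state=0).fit(X, y)

print(full_tree.get_n_leaves(), small_tree.get_n_leaves())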

Prune back

Pruning step: collapse leaf nodes and make the immediate parent a leaf node. Effect of pruning: we lose purity of nodes, but were they really pure, or was that just noise? Too many nodes tend to fit noise. There is a trade-off between the loss of purity and the gain in simplicity.

[Diagram: a decision node (Freq = 7) with two leaves, one with label Y (Freq = 5) and one with label B (Freq = 2), is pruned into a single leaf node with label Y (Freq = 7).]

Prune back: cost complexity

Cost complexity of a (sub)tree combines its classification error (based on training data) with a penalty for the size of the tree:

CostComplexity(T) = Err(T) + α L(T)

where Err(T) is the classification error, L(T) is the number of leaves in T, and the penalty factor α is between 0 and 1. If α = 0, there is no penalty for a bigger tree.

[Same pruning diagram as on the previous slide: the subtree with leaves Y (Freq = 5) and B (Freq = 2) is collapsed into a single leaf Y (Freq = 7).]
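A small sketch of how this quantity can drive a single prune decision. The numbers follow the slide's picture (two pure leaves with 5 Y's and 2 B's under a parent with 7 points); the error fractions and the value of α are illustrative assumptions:

def cost_complexity(err, n_leaves, alpha):
    """CostComplexity(T) = Err(T) + alpha * L(T)."""
    return err + alpha * n_leaves

# The subtree's two leaves are pure, so its training error is 0.
# Collapsing into one leaf labelled Y misclassifies the 2 B points out of 7.
alpha = 0.3   # illustrative value, chosen so the penalty outweighs the lost purity
subtree   = cost_complexity(err=0.0,     n_leaves=2, alpha=alpha)  # 0.60
collapsed = cost_complexity(err=2 / 7.0, n_leaves=1, alpha=alpha)  # ~0.586
print("prune" if collapsed <= subtree else "keep subtree")         # prune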

Different Decision Tree Algorithms

- Chi-square Automatic Interaction Detector (CHAID), Gordon Kass (1980): stops subtree creation if the split is not statistically significant by a chi-square test.
- Classification and Regression Trees (CART), Breiman et al.: builds the tree using the Gini index.
- Iterative Dichotomizer 3 (ID3), Ross Quinlan (1986): splits by information gain (difference in entropy).
- C4.5: Quinlan's next algorithm, improved over ID3; bottom-up pruning, handles both categorical and continuous variables, and handles incomplete data points.
- C5.0: Ross Quinlan's commercial version.

Properties of Decision Trees

- Non-parametric approach: does not require any prior assumptions regarding the probability distribution of the class and attributes.
- Finding an optimal decision tree is an NP-complete problem; the heuristics used are greedy, recursive partitioning, top-down construction and bottom-up pruning.
- Fast to generate, fast to classify.
- Easy to interpret or visualize.
- Error propagation: an error at the top of the tree propagates all the way down.

References

Introduction to Data Mining, by Tan, Steinbach, and Kumar. Chapter 4 is available online: http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf