Algorithms: Decision Trees

A small dataset: Miles Per Gallon. Suppose we want to predict MPG from a small dataset of car records drawn from the UCI repository.

A decision stump: a one-level tree that splits the records on a single attribute and predicts an output at each branch.

Recursion step: take the original dataset and partition it according to the value of the attribute we split on (records in which cylinders = 4, records in which cylinders = 5, cylinders = 6, cylinders = 8), then build a tree from each of these subsets of records.
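As a concrete illustration of this partition step, here is a minimal Python sketch; representing records as dicts keyed by attribute name is an assumption made for illustration, not something the slides prescribe.

    from collections import defaultdict

    def partition(records, attribute):
        """Group records (dicts keyed by attribute name) by the value of one attribute."""
        groups = defaultdict(list)
        for record in records:
            groups[record[attribute]].append(record)
        return dict(groups)

    # Toy example: three records split on "cylinders".
    records = [{"cylinders": 4, "mpg": "good"},
               {"cylinders": 8, "mpg": "bad"},
               {"cylinders": 4, "mpg": "good"}]
    print(partition(records, "cylinders"))  # two groups: cylinders = 4 and cylinders = 8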

Second level of the tree: recursively build a tree from the seven records in which cylinders = 4 and the maker was based in Asia (similar recursion in the other cases).

The final tree

Classification of a new example: to classify a test example, traverse the tree from the root according to the example's attribute values and report the label at the leaf reached.
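A minimal sketch of this traversal, assuming (purely for illustration) that trees are nested tuples: ("leaf", label) for leaves and ("node", attribute, children) for internal nodes.

    def classify(tree, example):
        """Traverse the tree from the root and report the leaf label."""
        while tree[0] == "node":
            _, attribute, children = tree
            tree = children[example[attribute]]  # follow the branch matching this attribute value
        return tree[1]

    # A hypothetical stump splitting on cylinders.
    stump = ("node", "cylinders", {4: ("leaf", "good"), 8: ("leaf", "bad")})
    print(classify(stump, {"cylinders": 4, "maker": "asia"}))  # -> good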

Learning decision trees is hard! Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest, 1976], so we resort to a greedy heuristic: start from an empty decision tree, split on the next best attribute (feature), and recurse. How do we choose the best attribute and the value for a split?

Entropy: entropy characterizes our uncertainty about our source of information; more uncertainty, more entropy! Formally, H(Y) = -sum_i P(Y = y_i) log2 P(Y = y_i). The information-theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y (under the most efficient code).
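A direct Python translation of this definition (a sketch; it assumes the labels arrive as a plain list):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """H(Y) = -sum_i P(Y = y_i) * log2 P(Y = y_i), measured in bits."""
        n = len(labels)
        return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())

    print(entropy(["good", "bad", "good", "bad"]))    # 1.0: maximum uncertainty for two classes
    print(entropy(["good", "good", "good", "good"]))  # 0.0: no uncertainty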

Information gain: the advantage of splitting on an attribute is the decrease in uncertainty. Take the entropy of Y before you split and the entropy after the split, weighting each branch by the probability of following it (i.e., the normalized number of records reaching it); information gain is the difference: IG(X) = H(Y) - sum_x P(X = x) H(Y | X = x).
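The same quantity in code, reusing the entropy function above; the dict-based records and the attribute/target names are illustrative assumptions:

    from collections import Counter, defaultdict
    from math import log2

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(records, attribute, target):
        """IG(X) = H(Y) - sum_x P(X = x) * H(Y | X = x)."""
        branches = defaultdict(list)
        for r in records:
            branches[r[attribute]].append(r[target])  # labels reaching each branch
        h_after = sum(len(labels) / len(records) * entropy(labels)
                      for labels in branches.values())
        return entropy([r[target] for r in records]) - h_after

    records = [{"cylinders": 4, "mpg": "good"}, {"cylinders": 4, "mpg": "good"},
               {"cylinders": 8, "mpg": "bad"}, {"cylinders": 8, "mpg": "good"}]
    print(information_gain(records, "cylinders", "mpg"))  # ~0.311 bits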

Learning decision trees: start from an empty decision tree, split on the next best attribute (feature), using, for example, information gain to select the attribute to split on, and recurse.

A decision stump: the first split, chosen to maximize information gain.

Base cases. Base case one: if all records in the current data subset have the same output, then don't recurse. Base case two: if all records have exactly the same set of input attributes, then don't recurse.

Base case 1: don't split a node if all matching records have the same output value.

Base case 2: don't split a node if all records have exactly the same set of input attributes.

Basic decision tree building, summarized: BuildTree(DataSet, Output). If all output values are the same in DataSet, return a leaf node that says "predict this unique output". If all input values are the same, return a leaf node that says "predict the majority output". Otherwise, find the attribute X with the highest information gain. Suppose X has n_X distinct values (i.e., X has arity n_X); create and return a non-leaf node with n_X children, where the i-th child is built by calling BuildTree(DS_i, Output) and DS_i consists of all those records in DataSet for which X takes its i-th distinct value.
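Putting the pieces together, a compact Python sketch of BuildTree under the same illustrative representation (records as dicts, trees as tuples); the max_depth argument is an optional addition implementing the fixed-depth idea discussed below, not part of the basic algorithm:

    from collections import Counter, defaultdict
    from math import log2

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(records, attribute, target):
        branches = defaultdict(list)
        for r in records:
            branches[r[attribute]].append(r[target])
        h_after = sum(len(ls) / len(records) * entropy(ls) for ls in branches.values())
        return entropy([r[target] for r in records]) - h_after

    def build_tree(records, attributes, target, max_depth=None):
        labels = [r[target] for r in records]
        majority = Counter(labels).most_common(1)[0][0]
        # Base case 1: every record has the same output -> leaf predicting it.
        # Base case 2: no attributes left to split on -> leaf predicting the majority.
        # (max_depth == 0 is an optional early-stopping cap; see the overfitting notes.)
        if len(set(labels)) == 1 or not attributes or max_depth == 0:
            return ("leaf", majority)
        best = max(attributes, key=lambda a: information_gain(records, a, target))
        groups = defaultdict(list)
        for r in records:
            groups[r[best]].append(r)  # DS_i: records with the i-th distinct value of X
        rest = [a for a in attributes if a != best]
        depth = None if max_depth is None else max_depth - 1
        return ("node", best,
                {v: build_tree(subset, rest, target, depth) for v, subset in groups.items()})

    data = [{"cylinders": 4, "maker": "asia", "mpg": "good"},
            {"cylinders": 4, "maker": "america", "mpg": "bad"},
            {"cylinders": 8, "maker": "america", "mpg": "bad"}]
    print(build_tree(data, ["cylinders", "maker"], "mpg"))
    # ('node', 'maker', {'asia': ('leaf', 'good'), 'america': ('leaf', 'bad')})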

Decision trees will overfit. Standard decision trees have no learning bias: training set error is always zero (if there is no label noise), so they have lots of variance and will definitely overfit! We must bias towards simpler trees. There are many strategies for picking simpler trees: a fixed depth, a fixed number of leaves, or something smarter.

Consider this split

A statistical test: suppose that mpg was completely uncorrelated with maker. What is the chance we'd have seen data with at least this apparent level of association anyway?
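One standard way to compute this chance is a chi-squared test on the contingency table of attribute values versus class counts. A sketch using scipy; the counts below are invented for illustration:

    from scipy.stats import chi2_contingency

    # Rows: values of maker; columns: counts of each mpg class (illustrative numbers).
    observed = [[6, 2],   # america: 6 bad, 2 good
                [3, 4],   # asia
                [1, 4]]   # europe
    chi2, pchance, dof, _ = chi2_contingency(observed)
    print(pchance)  # chance of association at least this strong arising by chance alone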

Using the chi-squared test to avoid overfitting: build the full decision tree as before, but when you can grow it no more, start to prune. Beginning at the bottom of the tree, delete splits whose apparent association could easily have arisen by chance (pchance > MaxPchance), and continue working your way up until there are no more prunable nodes.
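A sketch of this bottom-up pruning for the tuple-based trees built above; computing pchance with the chi-squared test follows the slides, while the 0.05 default for MaxPchance is an arbitrary illustrative value:

    from collections import Counter
    from scipy.stats import chi2_contingency

    def pchance_of_split(records, attribute, target):
        """Chance of association at least this strong if attribute and target were independent."""
        values = sorted({r[attribute] for r in records})
        classes = sorted({r[target] for r in records})
        table = [[sum(1 for r in records if r[attribute] == v and r[target] == c)
                  for c in classes] for v in values]
        _, p, _, _ = chi2_contingency(table)
        return p

    def prune(tree, records, target, max_pchance=0.05):
        if tree[0] == "leaf":
            return tree
        _, attribute, children = tree
        # Work bottom-up: prune each subtree using only the records that reach it.
        children = {v: prune(sub, [r for r in records if r[attribute] == v], target, max_pchance)
                    for v, sub in children.items()}
        # If all children are now leaves and the split could easily be chance, collapse it.
        if (all(sub[0] == "leaf" for sub in children.values())
                and pchance_of_split(records, attribute, target) > max_pchance):
            majority = Counter(r[target] for r in records).most_common(1)[0][0]
            return ("leaf", majority)
        return ("node", attribute, children)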

What you need to know about decision trees: decision trees are one of the most popular data mining tools; they are easy to understand, easy to implement, easy to use, and computationally cheap (to solve heuristically). Information gain is used to select attributes (ID3, C4.5, ...). Presented here for classification, but they can be used for regression and density estimation too. Decision trees will overfit: a zero-bias classifier has lots of variance, so you must use tricks to find simple trees, e.g., fixed depth / early stopping, or pruning.

Decision trees in Matlab: use the classregtree class. Create a new tree: t = classregtree(X, y), where X is a matrix of predictor values and y is a vector of n response values. Prune the tree: tt = prune(t, alpha, pchance), where alpha defines the level of the pruning. Predict a value: y = eval(tt, X).
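(Note: in current MATLAB releases the classregtree class has been superseded by fitctree, for classification trees, and fitrtree, for regression trees; the calls above use the older interface.)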