Induction of Decision Trees


Induction of Decision Trees
Blaž Zupan, Ivan Bratko
magix.fri.uni-lj.si/predavanja/uisp

An Example Data Set and Decision Tree

  #   Outlook  Company  Sailboat  Sail?
  1   sunny    big      small
  2   sunny    med      small
  3   sunny    med      big
  4   sunny    no       small
  5   sunny    big      big
  6   rainy    no       small
  7   rainy    med      small
  8   rainy    big      big
  9   rainy    no       big
 10   rainy    med      big

[The Sail? labels (yes/no) and the induced decision tree appear in the slide graphic: Outlook at the root with branches sunny and rainy; the rainy branch tests Company, and one of the Company branches tests Sailboat (small, big).]

Classification

  #   Outlook  Company  Sailboat  Sail?
  1   sunny    no       big       ?
  2   rainy    big      small     ?

[The slide graphic shows the induced tree used to classify these new examples: Outlook at the root (sunny, rainy); the rainy branch tests Company (no, med, big); one of the Company branches tests Sailboat (small, big).]

Induction of Decision Trees

Data Set (Learning Set): each example = Attributes + Class
Induced description = Decision tree
TDIDT: Top Down Induction of Decision Trees (Recursive Partitioning)

Some TDIDT Systems

ID3 (Quinlan 79)
CART (Breiman et al. 84)
Assistant (Cestnik et al. 87)
C4.5 (Quinlan 93)
See5 (Quinlan 97)
Orange (Demšar, Zupan 98-03)

Analysis of Severe Trauma Patients Data

[Figure: decision tree. Root: PH_ICU, the worst pH value at ICU, with branches < 7.2: Death 0.0 (0/15); 7.2-7.33: APPT_WORST, the worst active partial thromboplastin time, which splits at 78.7 into Well 0.82 (9/11) and Death 0.0 (0/7); > 7.33: Well 0.88 (14/16).]

PH_ICU and APPT_WORST are exactly the two factors (theoretically) advocated to be the most important ones in the study by Rotondo et al., 1997.

Breast Cancer Recurrence

[Figure: tree induced by Assistant Professional. Root: Degree of Malig (< 3, >= 3). The < 3 branch tests Tumor Size (< 15, >= 15); the >= 3 branch tests Involved Nodes (< 3, >= 3), one of whose branches tests Age. Leaves carry class counts such as no_rec 125 / recurr 39, no_rec 30 / recurr 18, recurr 27 / no_rec 10, no_rec 4 / recurr 1, and no_rec 32 / recurr 0.]

Interesting: the accuracy of this tree compared to medical specialists.

Prostate cancer recurrence

[Figure: decision tree. Root: Secondary Gleason Grade with branches 1,2 (No), 3 (PSA Level, split at 14.9), 4 (Stage: T1c,T2a,T2b,T2c vs T1ab,T3) and 5 (Yes); the PSA Level branch leads further to Primary Gleason Grade (2,3 / 4); leaves predict recurrence Yes or No.]

TDIDT Algorithm

Also known as ID3 (Quinlan).

To construct decision tree T from learning set S:
- If all examples in S belong to some class C, then make a leaf labeled C.
- Otherwise:
  - select the most informative attribute A,
  - partition S according to A's values,
  - recursively construct subtrees T1, T2, ..., Tn for the subsets of S.

TDIDT Algorithm

The resulting tree T has attribute A at the root, one branch for each of A's values v1, v2, ..., vn, and the subtrees T1, T2, ..., Tn below them.
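As a minimal sketch of the recursion just described (not the deck's own code; the tuple-based tree representation, the helper names and the pluggable score function are choices made here), in Python:

from collections import Counter

def tdidt(examples, attributes, target, score):
    """Top-down induction of a decision tree.

    examples   -- list of dicts mapping attribute names to values
    attributes -- attribute names still available for splitting
    target     -- name of the class attribute
    score      -- function (examples, attribute, target) -> number;
                  the attribute with the highest score is selected
    """
    labels = [e[target] for e in examples]
    # All examples in one class: make a leaf labeled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: make a leaf labeled with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the most informative attribute ...
    best = max(attributes, key=lambda a: score(examples, a, target))
    # ... partition S according to its values and recurse on each subset.
    branches = {}
    for value in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = tdidt(subset, remaining, target, score)
    return (best, branches)

The score function is where an attribute selection measure such as the information gain defined on the following slides would plug in.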

Another Example

  #   Outlook   Temperature  Humidity  Windy  Play
  1   sunny     hot          high      no     N
  2   sunny     hot          high      yes    N
  3   overcast  hot          high      no     P
  4   rainy     moderate     high      no     P
  5   rainy     cold         normal    no     P
  6   rainy     cold         normal    yes    N
  7   overcast  cold         normal    yes    P
  8   sunny     moderate     high      no     N
  9   sunny     cold         normal    no     P
 10   rainy     moderate     normal    no     P
 11   sunny     moderate     normal    yes    P
 12   overcast  moderate     high      yes    P
 13   overcast  hot          normal    no     P
 14   rainy     moderate     high      yes    N

Simple Tree

Outlook
  sunny -> Humidity
      high -> N
      normal -> P
  overcast -> P
  rainy -> Windy
      yes -> N
      no -> P

Complicated Tree

[Figure: a much larger tree for the same data, obtained by splitting on Temperature first (branches cold, moderate, hot), with Outlook, Windy and Humidity tested further down; the leaves repeat N and P, and one branch ends in a null (empty) leaf.]

Attribute Selection Criteria

Main principle: select the attribute which partitions the learning set into subsets that are as pure as possible.

Various measures of purity:
- information-theoretic
- Gini index
- chi-square
- ReliefF

Various improvements:
- probability estimates
- normalization
- binarization, subsetting

Information-Theoretic Approach

To classify an object, a certain amount of information is needed: I, the information.
After we have learned the value of attribute A, we only need some remaining amount of information to classify the object: Ires, the residual information.
Gain: Gain(A) = I - Ires(A)
The most informative attribute is the one that minimizes Ires, i.e., maximizes Gain.

Entropy

The average amount of information I needed to classify an object is given by the entropy measure:

    I = - Σ_c p(c) log2 p(c)

For a two-class problem:

    I = - p(c1) log2 p(c1) - (1 - p(c1)) log2 (1 - p(c1))

[Figure: plot of the two-class entropy as a function of p(c1), peaking at 1 bit for p(c1) = 0.5.]
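A minimal sketch of the entropy computation in Python (the function name and the counts-based interface are choices made here, not taken from the slides):

from math import log2

def entropy(counts):
    """Average information, in bits, needed to classify an object,
    given the number of examples in each class."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

# Two-class example: the triangles-and-squares data used later in the deck
# has 5 examples of one class and 9 of the other.
print(round(entropy([5, 9]), 3))   # 0.94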

Residual Information

After applying attribute A, S is partitioned into subsets according to the values v of A.
Ires is equal to the weighted sum of the amounts of information for the subsets:

    Ires(A) = - Σ_v p(v) Σ_c p(c|v) log2 p(c|v)

Triangles and Squares

  #   Color   Outline  Dot  Shape
  1   green   dashed        triange
  2   green   dashed        triange
  3   yellow  dashed        square
  4   red     dashed        square
  5   red     solid         square
  6   red     solid         triange
  7   green   solid         square
  8   green   dashed        triange
  9   yellow  solid         square
 10   red     solid         square
 11   green   solid         square
 12   yellow  dashed        square
 13   yellow  solid         square
 14   red     dashed        triange

[The values of the Dot attribute are not legible in the source.]

Triangles and Squares

Data Set: a set of classified objects (the same 14 examples as above).

Entropy

5 triangles, 9 squares.
Class probabilities: p(triangle) = 5/14, p(square) = 9/14.
Entropy: I = -(5/14) log2(5/14) - (9/14) log2(9/14) = 0.940 bits

Entropy Reduction by Data Set Partitioning

[Figure: the Color? attribute partitions the data set into the red, yellow and green subsets, and the entropy of each subset is computed.]

Entropy of the Values of the Attribute Color

[Figure: the same partition by Color? (red, yellow, green).]

Information Gain

[Figure: Color? splits the data set into the red, yellow and green subsets; the information gain is the entropy of the whole set minus the weighted entropy of the subsets.]

Information Gain of the Attribute

Attributes:
Gain(Color) = 0.246
Gain(Outline) = 0.151
Gain(Dot) = 0.048

Heuristic: the attribute with the highest gain is chosen.
This heuristic is local (local minimization of impurity).
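As a cross-check, the gains can be reproduced from per-value class counts; a small continuation of the earlier sketch, where the counts for Color (green: 3 triangles and 2 squares, yellow: 0 and 4, red: 2 and 3) are read off the contingency vectors printed later in the deck:

def gain(subsets):
    """Information gain of an attribute: entropy of the whole set minus the
    weighted entropy of the subsets its values induce. Reuses entropy() above."""
    total = [sum(col) for col in zip(*subsets)]   # class counts of the whole set
    n = sum(total)
    i_res = sum(sum(s) / n * entropy(s) for s in subsets)
    return entropy(total) - i_res

color = [[3, 2], [0, 4], [2, 3]]    # (triangles, squares) for green, yellow, red
print(round(gain(color), 3))        # ~0.247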

[Figure: the partially built tree, Color? at the root with branches red, green and yellow. For the green subset:]

Gain(Outline) = 0.971 - 0 = 0.971 bits
Gain(Dot) = 0.971 - 0.951 = 0.020 bits

[Figure: the tree with the green branch expanded into Outline? (dashed, solid). For the red subset:]

Gain(Outline) = 0.971 - 0.951 = 0.020 bits
Gain(Dot) = 0.971 - 0 = 0.971 bits

[Figure: the fully grown tree: Color? at the root; the red branch ends in Dot?, the green branch in Outline? (dashed, solid), the yellow branch in a leaf.]

Decision Tree

Color
  red -> Dot
      one value -> square, the other -> triangle
  yellow -> square
  green -> Outline
      dashed -> triangle
      solid -> square
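The finished tree can also be written down directly in the (attribute, branches) form used by the earlier tdidt sketch and used to classify new shapes; the 'yes'/'no' labels on the Dot branches are an assumption made here, since they are not legible in the deck:

# Hand-written version of the induced tree; leaves are plain class labels.
# The 'yes'/'no' values of Dot are assumed for illustration only.
shape_tree = ("Color", {
    "green":  ("Outline", {"dashed": "triange", "solid": "square"}),
    "yellow": "square",
    "red":    ("Dot",     {"yes": "triange", "no": "square"}),
})

def classify(tree, example):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

print(classify(shape_tree, {"Color": "green", "Outline": "dashed", "Dot": "no"}))  # triange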

A Defect of Ires

Ires favors attributes with many values.
Such an attribute splits S into many subsets, and if these are small, they will tend to be pure anyway.
One way to rectify this is through a corrected measure, the information gain ratio.

Information Gain Ratio

I(A) is the amount of information needed to determine the value of attribute A:

    I(A) = - Σ_v p(v) log2 p(v)

Information gain ratio:

    GainRatio(A) = Gain(A) / I(A)
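Continuing the sketch, I(A) is just the entropy of the subset sizes, so the gain ratio for Color (5 green, 4 yellow and 5 red examples, from the data table) can be checked against the table on the next slide; the helpers entropy() and gain() are the illustrative ones defined above:

def gain_ratio(subsets):
    """Gain ratio: information gain divided by the entropy of the attribute's
    own value distribution (the subset sizes)."""
    sizes = [sum(s) for s in subsets]
    return gain(subsets) / entropy(sizes)

color = [[3, 2], [0, 4], [2, 3]]    # green, yellow, red: (triangles, squares)
print(round(gain_ratio(color), 3))  # ~0.156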

Information Gain Ratio

[Figure: Color? splits the data set into the red, yellow and green subsets; I(Color) is computed from the sizes of these subsets.]

Information Gain and Information Gain Ratio

  A        v(A)  Gain(A)  GainRatio(A)
  Color    3     0.247    0.156
  Outline  2     0.152    0.152
  Dot      2     0.048    0.049

Gini Index

Another sensible measure of impurity (i and j are classes):

    Gini(S) = Σ_{i≠j} p(i) p(j)

After applying attribute A, the resulting Gini index is

    Gini(A) = Σ_v p(v) Gini(S_v)

Gini can be interpreted as the expected error rate.

Gini Index

[Figure: the Gini index computed for the whole triangles-and-squares data set.]

Gini Index for Color

[Figure: the Gini index computed for each of the subsets that Color? induces (red, yellow, green).]

Gain of Gini Index

[Figure: the drop in Gini index obtained by splitting on Color: Gini(S) minus the weighted Gini of the subsets.]
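The deck's GiniGain values (e.g. 0.058 for Color on the next slide) are reproduced if each pair of distinct classes is counted once, i.e. gini = Σ_{i<j} p(i) p(j), which for two classes equals (1 - Σ_c p(c)²)/2; that normalization is inferred from the reported numbers rather than stated in the deck. A sketch under that assumption:

def gini(counts):
    """Gini index over unordered pairs of distinct classes: sum_{i<j} p_i * p_j."""
    n = sum(counts)
    p = [c / n for c in counts]
    return sum(p[i] * p[j] for i in range(len(p)) for j in range(i + 1, len(p)))

def gini_gain(subsets):
    """Drop in Gini index achieved by splitting into the given subsets."""
    total = [sum(col) for col in zip(*subsets)]
    n = sum(total)
    return gini(total) - sum(sum(s) / n * gini(s) for s in subsets)

color = [[3, 2], [0, 4], [2, 3]]    # green, yellow, red: (triangles, squares)
print(round(gini_gain(color), 3))   # ~0.058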

Three Impurity Measures

  A        Gain(A)  GainRatio(A)  GiniGain(A)
  Color    0.247    0.156         0.058
  Outline  0.152    0.152         0.046
  Dot      0.048    0.049         0.015

These impurity measures assess the effect of a single attribute.
The "most informative" criterion they define is local (and "myopic"): it does not reliably predict the effect of several attributes applied jointly.

Orange: Shapes Data Set

shape.tab:

  Color    Outline  Dot  Shape
  d        d        d    d
                         class
  (followed by the 14 examples listed in the Triangles and Squares table above)

Orange: Impurity Measures

import orange
data = orange.ExampleTable('shape')
gain = orange.MeasureAttribute_info
gainRatio = orange.MeasureAttribute_gainRatio
gini = orange.MeasureAttribute_gini

print
print "%15s %-8s %-8s %-8s" % ("name", "gain", "g ratio", "gini")
for attr in data.domain.attributes:
    print "%15s %4.3f %4.3f %4.3f" % \
        (attr.name, gain(attr, data), gainRatio(attr, data), gini(attr, data))

Output:

  name     gain   g ratio  gini
  Color    0.247  0.156    0.058
  Outline  0.152  0.152    0.046
  Dot      0.048  0.049    0.015

Orange: orngTree

import orange, orngTree
data = orange.ExampleTable('shape')
tree = orngTree.TreeLearner(data)
orngTree.printTxt(tree)

print '\nWith contingency vector:'
orngTree.printTxt(tree, internalNodeFields=['contingency'],
                  leafFields=['contingency'])

Output:

Color green:
    Outline dashed: triange (100.0%)
    Outline solid: square (100.0%)
Color yellow: square (100.0%)
Color red:
    Dot : square (100.0%)
    Dot : triange (100.0%)

With contingency vector:
Color (<5, 9>) green:
    Outline (<3, 2>) dashed: triange (<3, 0>)
    Outline (<3, 2>) solid: square (<0, 2>)
Color (<5, 9>) yellow: square (<0, 4>)
Color (<5, 9>) red:
    Dot (<2, 3>) : square (<0, 3>)
    Dot (<2, 3>) : triange (<2, 0>)

Orange: Saving to DOT

import orange, orngTree
data = orange.ExampleTable('shape')
tree = orngTree.TreeLearner(data)
orngTree.printDot(tree, 'shape.dot', leafShape='box', internalNodeShape='ellipse')

Then, from the command line:

> dot -Tgif shape.dot > shape.gif

DB Miner: visualization

[Figure: screenshot of a decision tree visualized in DB Miner.]

SGI MineSet: visualization

[Figure: screenshot of a decision tree visualized in SGI MineSet.]