Data Mining Practical Machine Learning Tools and Techniques

Decision trees
Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank

Extending the previous approach:
- to permit numeric attributes: straightforward
- to deal sensibly with missing values: trickier
- stability for noisy data: requires pruning mechanism
- to handle regression
End result: C4.5
- Best-known and (probably) most widely-used learning algorithm
- Commercial successor: C5.0

Review of Basic Strategy

Strategy: top down, in recursive divide-and-conquer fashion
- First: select an attribute for the root node; create branches depending on the values of the attribute at the root
- Then: split instances into subsets, one for each branch extending from the node
- Finally: repeat recursively for each branch, using only instances that reach the branch
- Stop if all instances have the same class

Review: Splitting Attribute

Entropy function:
entropy(p1, p2, ..., pn) = -p1 log p1 - p2 log p2 - ... - pn log pn

Example of information calculation:
info([2,3]) = entropy(2/5, 3/5) = -2/5 log(2/5) - 3/5 log(3/5) = 0.971 bits

information gain = info(v) - info(children of v), where info(children of v) is the average of the children's info values, weighted by the fraction of instances reaching each child
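In code, the entropy and information-gain computations above look like this (a minimal Python sketch; the helper names are ours, not from any particular library):

from math import log2

def entropy(*probs):
    """entropy(p1, ..., pn) = -p1 log p1 - ... - pn log pn (0 log 0 taken as 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def info(counts):
    """info([c1, ..., cn]): entropy of the class distribution given as raw counts."""
    n = sum(counts)
    return entropy(*(c / n for c in counts))

def information_gain(parent_counts, children_counts):
    """Info at a node minus the weighted average info of its children."""
    n = sum(parent_counts)
    return info(parent_counts) - sum(sum(c) / n * info(c) for c in children_counts)

print(info([2, 3]))                                  # 0.971 bits, the slide's example
print(information_gain([9, 5], [[4, 2], [5, 3]]))    # gain of the 71.5 split shown below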

Numeric attributes

Standard method: binary splits (e.g. temp < 45)
Unlike nominal attributes, every numeric attribute has many possible split points
Solution is a straightforward extension:
- Evaluate info gain for every possible split point of the attribute
- Choose the best split point
- Info gain for the best split point is the info gain for the attribute
Computationally more demanding

Weather data (again!)

Nominal version:

Outlook    Temperature  Humidity  Windy  Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
...        ...          ...       ...    ...

If outlook = sunny and humidity = high then play = no
If humidity = normal then play = yes

Numeric version:

Outlook    Temperature  Humidity  Windy  Play
Sunny      85           85        False  No
Sunny      80           90        True   No
Overcast   83           86        False  Yes
Rainy      70           96        False  Yes
Rainy      68           80        False  Yes
...        ...          ...       ...    ...

If outlook = sunny and humidity > 83 then play = no
If humidity < 85 then play = yes

Example

Split on the temperature attribute:

64   65   68   69   70   71   72   72   75   75   80   81   83   85
Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No

E.g. temperature < 71.5: yes/4, no/2
     temperature >= 71.5: yes/5, no/3

Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits

Place split points halfway between values
Can evaluate all split points in one pass! (Sketched below.)
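Here is a self-contained Python sketch of that one-pass search on the temperature values above: sort once, then walk the sorted instances while maintaining running class counts on the left of each candidate split point. (Variable names are ours; on this data the lowest weighted info, i.e. highest gain, happens to fall at the 84 split point rather than 71.5, which the slide uses only to illustrate the computation.)

from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
plays = ['yes', 'no', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes',
         'no', 'yes', 'yes', 'no']

pairs = sorted(zip(temps, plays))
n = len(pairs)
parent = [plays.count('yes'), plays.count('no')]    # [9, 5]

# One pass: move instances from the right side to the left in sorted order;
# a candidate split point lies halfway between two distinct adjacent values.
left, right = [0, 0], parent[:]
best = None
for i in range(1, n):
    k = pairs[i - 1][1] == 'no'                     # index 0 = yes, 1 = no
    left[k] += 1
    right[k] -= 1
    if pairs[i - 1][0] == pairs[i][0]:
        continue                                    # no split between equal values
    split = (pairs[i - 1][0] + pairs[i][0]) / 2
    weighted = (i / n) * info(left) + ((n - i) / n) * info(right)
    if best is None or weighted < best[1]:
        best = (split, weighted)

print(best)   # -> (84.0, ~0.827) on this data
# The 71.5 candidate reproduces the slide's figure:
# 6/14 * info([4,2]) + 8/14 * info([5,3]) = 0.939 bits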

Can avoid repeated sorting

Sort instances by the values of the numeric attribute
Does this have to be repeated at each node of the tree? No!
- Sort order for children can be derived from sort order for parent
- Drawback: need to create and store an array of sorted indices for each numeric attribute

Binary vs. multiway splits

Splitting (multi-way) on a nominal attribute exhausts all information in that attribute
- A nominal attribute is tested (at most) once on any path in the tree
Not so for binary splits on numeric attributes!
- A numeric attribute may be tested several times along a path in the tree
- Disadvantage: tree is hard to read
- Remedy: pre-discretize numeric attributes, or use multi-way splits instead of binary ones

Missing values

Simplest strategy: send instances down the most popular branch
More sophisticated: split instances with missing values into pieces (a sketch follows the pruning overview below)
- A piece going down a branch receives a weight proportional to the popularity of the branch
- Weights sum to 1
- During classification, split the instance into pieces in the same way

Pruning

Prevent overfitting to noise in the data: prune the decision tree
Two strategies:
- Postpruning: take a fully-grown decision tree and discard unreliable parts
- Prepruning: stop growing a branch when information becomes unreliable
Postpruning is preferred in practice; prepruning can stop too early
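Returning to missing values, the weighted-pieces idea is a one-liner in code. A minimal sketch (the function name is ours, not WEKA's API):

def split_missing(instance_weight, branch_counts):
    """Divide an instance whose tested attribute is missing into pieces, one
    per branch, weighted by branch popularity; the pieces sum to the incoming
    instance weight."""
    total = sum(branch_counts.values())
    return {branch: instance_weight * count / total
            for branch, count in branch_counts.items()}

# With the weather data's outlook test (5 sunny, 4 overcast, 5 rainy):
print(split_missing(1.0, {'sunny': 5, 'overcast': 4, 'rainy': 5}))
# {'sunny': 0.357..., 'overcast': 0.285..., 'rainy': 0.357...}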

Prepruning

Based on a statistical significance test
- Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node
- Most popular test: chi-squared test (used in ID3)

Postpruning

First, build the full tree; then, prune it
- Fully-grown tree shows all attribute interactions
How? Determine whether some subtrees might be due to chance effects
Two pruning operations:
- Subtree replacement
- Subtree raising
Use error estimation or statistical techniques

Subtree replacement

Bottom-up
Consider replacing a tree only after considering all its subtrees (see the sketch below)

Subtree raising

Delete node
Redistribute instances
Slower than subtree replacement (Worthwhile?)
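The bottom-up discipline is the essential point of subtree replacement. Below is a minimal, self-contained Python sketch, with hold-out-set class counts standing in for whichever error estimate is used (the Node structure and helpers are illustrative, not C4.5's actual code):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    counts: List[int]                     # hold-out class counts at this node
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children

def leaf_error(counts):
    """Errors a majority-class leaf would make on the hold-out instances."""
    return sum(counts) - max(counts)

def subtree_error(node):
    if node.is_leaf:
        return leaf_error(node.counts)
    return sum(subtree_error(c) for c in node.children)

def prune(node):
    """Bottom-up: consider replacing a tree only after considering all its
    subtrees; replace when that does not increase the estimated error."""
    node.children = [prune(c) for c in node.children]
    if not node.is_leaf and leaf_error(node.counts) <= subtree_error(node):
        node.children = []                # subtree replacement
    return node

# The two children make 1 hold-out error between them; a single majority
# leaf at the parent would also make 1 (5 instances, 4 in the majority
# class), so replacement does not increase the error and the tree collapses.
root = Node([4, 1], [Node([3, 1]), Node([1, 0])])
prune(root)
print(root.is_leaf)                       # True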

Estimating error rates

Prune only if pruning does not increase the estimated error
Error on the training data is NOT a useful estimator (would result in almost no pruning)
Use a hold-out set for pruning ("reduced-error pruning")
C4.5's method:
- Derive a confidence interval from the training data
- Use a heuristic limit, derived from this, for pruning
- Standard Bernoulli-process-based method (made concrete in the sketch below)

Regression trees

Similar to decision trees, but a leaf = average value of the instances reaching the leaf
Differences:
- Splitting criterion: minimize intra-subset variation (can use standard deviation)
- Termination criterion: std dev becomes small
- Prediction: leaf predicts the average class value of the instances
More sophisticated version: model trees, where each leaf represents a linear regression function
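The confidence-interval heuristic can be stated precisely: C4.5 uses the upper confidence limit of a Bernoulli process as a pessimistic error estimate, computed from the observed error rate f on the n training instances reaching a node. A sketch of the formula as described in the Witten and Frank book (not C4.5's source code); z = 0.69 corresponds to the default 25% confidence level:

from math import sqrt

def pessimistic_error(f, n, z=0.69):
    """Upper confidence limit on a Bernoulli error rate: f = observed error
    rate at the node, n = number of training instances reaching it, z = 0.69
    for C4.5's default 25% confidence level."""
    return (f + z*z/(2*n) + z * sqrt(f/n - f*f/n + z*z/(4*n*n))) / (1 + z*z/n)

# A leaf with 2 errors out of 6 training instances: the observed rate is 0.33,
# but the pessimistic estimate used for pruning decisions is about 0.47.
print(pessimistic_error(2/6, 6))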