Decision Tree Learning
1 Decision Tree Learning
2 Simple example of object classification
Instance | Size  | Color | Shape    | C(x)
x1       | small | red   | circle   | positive
x2       | large | red   | circle   | positive
x3       | small | red   | triangle | negative
x4       | large | blue  | circle   | negative
3 Decision Trees
Tree-based classifiers for instances represented as feature vectors. Nodes test features, there is one branch for each value of the feature, and leaves specify the category.
[Figure: two example trees, each rooted at a color test with branches red/blue/green. In the first, red leads to a shape test (circle: pos, square: neg, triangle: neg), blue to neg, green to pos. In the second, red leads to a shape test (circle: A, square: B, triangle: C), blue to B, green to C.]
Decision trees can represent arbitrary conjunctions and disjunctions (any Boolean function), and thus any classification function over discrete feature vectors.
A tree can be rewritten as a set of rules in disjunctive normal form (DNF), e.g. red AND circle -> pos for the first tree; red AND circle -> A, blue -> B, red AND square -> B, green -> C, red AND triangle -> C for the second.
4 General shape of a decision tree
Every non-leaf node is a test on the value of one feature. For each possible outcome of the test, an arrow is created that links to a subsequent test node or to a leaf node. Leaf nodes are DECISIONS concerning the value of the classification.
5 Another simple example: risk of giving a loan
Here we have 3 features: income, credit history, and debt. The classification function is Risk(x) with 3 values: high, moderate, low.
6 More complex example: application of the decision-tree model to character recognition based on visual features
[Figure: a large tree that can't be visualized without zooming; boxes (decisions) are alphabet letters, circles are tests on the values of several visual features.]
7 How does it work: Top-Down Decision Tree Induction
Recursively build a tree top-down by divide and conquer.
Initial learning set D: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -.
Begin by considering the feature color. The subset of examples in D in which color=red is {<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -}; the subset in which color=blue is {<big, blue, circle>: -}.
At each step, we aim to find the best split of our data. What is a good split? One which reduces the uncertainty of classification in the resulting subsets.
8 Top-Down Decision Tree Induction
Recursively build a tree top-down by divide and conquer.
[Figure: the full set {<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -} is split on color; the red branch {<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -} is split further on shape (circle -> pos, square -> neg, triangle -> pos), while the blue branch ({<big, blue, circle>: -}) and the empty green branch lead to neg leaves.]
How do we decide the order in which we test attributes? How do we decide the class of nodes for which we have no examples? Let's ignore these 2 issues for now and describe the algorithm first.
9 Remember the house classification problem in L1
[Figure: scatter of homes by elevation, mostly SF above the 73 m line.] The data suggests homes higher than 73 meters are mostly in San Francisco.
10 Splitting data according to features
Split on height: in the branch height > 75, all instances are green (= SF), so we can output a decision; in the branch height < 75, we need to analyze more features, since no decision on the class is possible yet.
11 Decision Tree Induction Pseudocode
DTree(examples, features) returns a tree:
a) If all examples are in one category, return a leaf node with that category label.
b) Else if the set of features is empty, return a leaf node with the category label that is the most common in examples.
Else pick a feature f and create a node R for it (e.g. f = color).
For each possible value v_i of f (e.g. color = red):
  Let examples_i be the subset of examples that have value v_i for f (e.g. {<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -}).
  Add an outgoing edge E to node R labeled with the value v_i.
  If examples_i is empty, then attach a leaf node to edge E labeled with the category that is the most common in examples;
  else call DTree(examples_i, features - {f}) and attach the resulting tree as the subtree under edge E.
Return the subtree rooted at R.
a) and b) are the termination conditions.
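A minimal Python sketch of this recursion, assuming examples are (feature-dict, label) pairs and that the set of possible values per feature is known; the names dtree and majority_class are illustrative, not from the slides:

```python
from collections import Counter

def majority_class(examples):
    # Most common label among the given examples.
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtree(examples, features, feature_values):
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # a) all examples in one category
        return labels[0]
    if not features:                          # b) no features left
        return majority_class(examples)
    f = features[0]                           # placeholder choice; information gain comes later
    branches = {}                             # maps feature value -> subtree
    for v in feature_values[f]:
        subset = [(x, y) for x, y in examples if x[f] == v]
        if not subset:                        # empty split: majority label of the parent
            branches[v] = majority_class(examples)
        else:
            branches[v] = dtree(subset, [g for g in features if g != f], feature_values)
    return (f, branches)
```

With the feature chosen by information gain instead of the placeholder first-feature choice, this becomes the ID3-style learner described on the following slides.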
12 Example (1)
Examples: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -.
Pick a feature F and create a node R for it, e.g. color.
For each possible value v_i of F: let examples_i be the subset of examples that have value v_i for F, and add an outgoing edge E to node R labeled with the value v_i.
For color=red, examples_red = {<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -}, which is neither empty nor uniform, so we call DTree(examples_red, <dimension, shape>) and attach the resulting tree as the subtree under that edge.
13 Example (2)
DTree({<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -}, <dimension, shape>): pick a feature F and create a node R for it, e.g. shape.
For shape=circle the subset {<big, red, circle>: +, <small, red, circle>: +} is all positive, so we return a leaf labeled positive (if all examples are in one category, return a leaf node with that category label).
For shape=square the subset {<small, red, square>: -} is all negative, so we return a leaf labeled negative.
For shape=triangle the subset is empty, so we attach a leaf labeled with the category that is the most common in examples, i.e. positive (if examples_i is empty, attach a leaf node to edge E labeled with the most common category in examples).
14 Example (3)
Adding <big, blue, circle>: - back in, the finished tree is: color at the root; red -> shape, blue -> neg, green -> neg; under shape: circle -> pos, square -> neg, triangle -> pos.
Now we know how to decide the class when we have no examples, but how do we decide the order in which we create nodes?
15 Picking a Good Split Feature
The goal is to have the resulting tree be as small as possible, per Occam's razor. Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem. The top-down divide-and-conquer method does a greedy search for a simple tree but is not guaranteed to find the smallest. General lesson in ML: greed is good.
We want to pick a feature that creates subsets of examples that are relatively pure in a single class, so they are closer to being leaf nodes. There are a variety of heuristics for picking a good test; a popular one is based on information gain and originated with the ID3 system of Quinlan (1979).
16 Entropy (recap)
The entropy (disorder, impurity) of a set of examples D, relative to a binary classification, is
$Entropy(D) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)$
where $p_1$ is the fraction of positive examples in D and $p_0$ is the fraction of negatives. (Notice that if S is a SAMPLE of a population, Entropy(S) is an estimate of the population entropy.)
If all examples are in one category, entropy is zero (we define $0 \log 0 = 0$). If examples are equally mixed ($p_1 = p_0 = 0.5$), entropy is a maximum of 1.
Entropy can be viewed as the number of bits required on average to encode the class of an example in D. It is also an estimate of the initial disorder or uncertainty about a classification, given the set D.
For multi-class problems with c categories, entropy generalizes to
$Entropy(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)$
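A small sketch of this definition in Python (the helper name entropy is an illustrative choice):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels, in bits (0*log2(0) treated as 0)."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy(['+', '+', '+', '+']))   # 0.0  (pure set)
print(entropy(['+', '+', '-', '-']))   # 1.0  (maximally mixed)
print(entropy(['+', '+', '-']))        # ~0.918
```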
17 Entropy Plot for Binary Classification
[Figure: entropy as a function of the fraction of positive examples, 0 at the extremes and a maximum of 1 at 0.5.]
18 Information Gain
The information gain of a feature f is the expected reduction in entropy resulting from splitting on this feature:
$Gain(D, f) = Entropy(D) - \sum_{v \in Values(f)} \frac{|D_v|}{|D|} Entropy(D_v)$
where $D_v$ is the subset of D having value v for feature f (e.g., if f = color and v = red). The entropy of each resulting subset is weighted by its relative size.
Example: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -. Initial entropy is 1 (2+, 2-: E = 1).
Split on size: big 1+,1- (E_big = 1), small 1+,1- (E_small = 1); Gain = 1 - (2/4*1 + 2/4*1) = 0.
Split on color: red 2+,1- (E_red = 0.918), blue 0+,1- (E_blue = 0); Gain = 1 - (3/4*0.918 + 1/4*0) = 0.311.
Split on shape: circle 2+,1- (E_circle = 0.918), square 0+,1- (E_square = 0); Gain = 1 - (3/4*0.918 + 1/4*0) = 0.311.
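A Python sketch that reproduces these numbers; the helper names (entropy, info_gain) and the dict representation of examples are illustrative choices, not from the slides:

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(examples, feature):
    """Expected reduction in entropy from splitting `examples` on `feature`."""
    total = entropy([y for _, y in examples])
    n = len(examples)
    remainder = 0.0
    for v in set(x[feature] for x, _ in examples):
        subset = [y for x, y in examples if x[feature] == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

D = [({'size': 'big',   'color': 'red',  'shape': 'circle'}, '+'),
     ({'size': 'small', 'color': 'red',  'shape': 'circle'}, '+'),
     ({'size': 'small', 'color': 'red',  'shape': 'square'}, '-'),
     ({'size': 'big',   'color': 'blue', 'shape': 'circle'}, '-')]

for f in ('size', 'color', 'shape'):
    print(f, round(info_gain(D, f), 3))   # size 0.0, color 0.311, shape 0.311
```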
19 New pseudo-code
DTree(examples, features) returns a tree:
a) If all examples are in one category, return a leaf node with that category label.
b) Else if the set of features is empty, return a leaf node with the category label that is the most common in examples.
Else pick the best feature f according to information gain and create a node R for it.
For each possible value v_i of f:
  Let examples_i be the subset of examples that have value v_i for f.
  Add an outgoing edge E to node R labeled with the value v_i.
  If examples_i is empty, then attach a leaf node to edge E labeled with the category that is the most common in examples;
  else call DTree(examples_i, features - {f}) and attach the resulting tree as the subtree under edge E.
Return the subtree rooted at R.
20 A small animated example
21 A complete example
22 A Decision Tree for the PlayTennis? data
The PlayTennis? data example:
Instance | Outlook  | Temperature | Humidity | Windy | Play
x1       | Sunny    | Hot         | High     | False | No
x2       | Sunny    | Hot         | High     | True  | No
x3       | Overcast | Hot         | High     | False | Yes
x4       | Rainy    | Mild        | High     | False | Yes
x5       | Rainy    | Cool        | Normal   | False | Yes
x6       | Rainy    | Cool        | Normal   | True  | No
x7       | Overcast | Cool        | Normal   | True  | Yes
x8       | Sunny    | Mild        | High     | False | No
x9       | Sunny    | Cool        | Normal   | False | Yes
x10      | Rainy    | Mild        | Normal   | False | Yes
x11      | Sunny    | Mild        | Normal   | True  | Yes
x12      | Overcast | Mild        | High     | True  | Yes
x13      | Overcast | Hot         | Normal   | False | Yes
x14      | Rainy    | Mild        | High     | True  | No
23 The Process of Constructing a Decision Tree
Select an attribute to place at the root of the decision tree and make one branch for every possible value. Repeat the process recursively for each branch.
24 Information Gained by Knowing the Result of a Decision
In the PlayTennis example, there are 9 instances for which the decision to play is yes and 5 instances for which the decision to play is no. The initial data entropy is therefore:
$-\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$
The information initially required to correctly separate the data is 0.940 bits.
25 Information Further Required If Outlook Is Placed at the Root
[Figure: splitting on Outlook gives sunny = {yes, yes, no, no, no}, overcast = {yes, yes, yes, yes}, rainy = {yes, yes, yes, no, no}.]
0.971 is the entropy of a 5-instance data set where 3 are positive and 2 negative (or vice versa); 0 is the entropy of a data set where all instances are positive (or negative). Weighting each branch by its probability (e.g. 5/14 for Outlook = sunny):
Information further required $= \frac{5}{14} \cdot 0.971 + \frac{4}{14} \cdot 0 + \frac{5}{14} \cdot 0.971 = 0.693$ bits.
26 Information Gained by Placing Each of the 4 Attributes
Gain(outlook) = 0.940 bits - 0.693 bits = 0.247 bits.
Gain(temperature) = 0.029 bits.
Gain(humidity) = 0.152 bits.
Gain(windy) = 0.048 bits.
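As a sanity check, a self-contained Python sketch (the data layout and helper names are illustrative) that recomputes these four gains from the table on slide 22:

```python
import math

data = [  # (Outlook, Temperature, Humidity, Windy) -> Play
    (('Sunny', 'Hot', 'High', 'False'), 'No'),      (('Sunny', 'Hot', 'High', 'True'), 'No'),
    (('Overcast', 'Hot', 'High', 'False'), 'Yes'),  (('Rainy', 'Mild', 'High', 'False'), 'Yes'),
    (('Rainy', 'Cool', 'Normal', 'False'), 'Yes'),  (('Rainy', 'Cool', 'Normal', 'True'), 'No'),
    (('Overcast', 'Cool', 'Normal', 'True'), 'Yes'),(('Sunny', 'Mild', 'High', 'False'), 'No'),
    (('Sunny', 'Cool', 'Normal', 'False'), 'Yes'),  (('Rainy', 'Mild', 'Normal', 'False'), 'Yes'),
    (('Sunny', 'Mild', 'Normal', 'True'), 'Yes'),   (('Overcast', 'Mild', 'High', 'True'), 'Yes'),
    (('Overcast', 'Hot', 'Normal', 'False'), 'Yes'),(('Rainy', 'Mild', 'High', 'True'), 'No'),
]
attributes = ['Outlook', 'Temperature', 'Humidity', 'Windy']

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def gain(rows, attr_index):
    rest = 0.0
    for v in set(x[attr_index] for x, _ in rows):
        subset = [y for x, y in rows if x[attr_index] == v]
        rest += len(subset) / len(rows) * entropy(subset)
    return entropy([y for _, y in rows]) - rest

for i, a in enumerate(attributes):
    print(a, round(gain(data, i), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```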
27 The Strategy for Selecting an Attribute to Place at a Node
Select the attribute that gives us the largest information gain. In this example, it is the attribute Outlook.
[Figure: the Outlook split: sunny -> 2 yes, 3 no; overcast -> 4 yes; rainy -> 3 yes, 2 no.]
28 The Recursive Procedure for Constructing a Decision Tree
Apply the same procedure to each branch recursively to construct the decision tree. For example, for the branch Outlook = sunny, we evaluate the information gained by applying each of the remaining 3 attributes:
Gain(Outlook=sunny; Temperature) = 0.971 - 0.400 = 0.571
Gain(Outlook=sunny; Humidity) = 0.971 - 0 = 0.971
Gain(Outlook=sunny; Windy) = 0.971 - 0.951 = 0.020
29 Recursive selection
Similarly, we evaluate the information gained by applying each of the remaining 3 attributes for the branch Outlook = rainy:
Gain(Outlook=rainy; Temperature) = 0.971 - 0.951 = 0.020
Gain(Outlook=rainy; Humidity) = 0.971 - 0.951 = 0.020
Gain(Outlook=rainy; Windy) = 0.971 - 0 = 0.971
30 The resulting tree
Outlook at the root: sunny -> Humidity (high -> no, normal -> yes); overcast -> yes; rainy -> Windy (false -> yes, true -> no).
31 A DT can be represented as a set of rules
[Figure: the same tree as on the previous slide.]
IF Outlook=sunny AND humidity=high -> no
IF Outlook=sunny AND humidity=normal -> yes
IF Outlook=overcast -> yes
IF Outlook=rainy AND windy=false -> yes
IF Outlook=rainy AND windy=true -> no
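The same five rules can be written directly as a small Python function; the function name and the boolean windy argument are illustrative choices:

```python
def play_tennis(outlook, humidity, windy):
    # Direct transcription of the five rules read off the tree.
    if outlook == 'sunny':
        return 'yes' if humidity == 'normal' else 'no'
    if outlook == 'overcast':
        return 'yes'
    if outlook == 'rainy':
        return 'no' if windy else 'yes'
    raise ValueError('unknown outlook value')

print(play_tennis('sunny', 'high', False))   # no
print(play_tennis('rainy', 'normal', True))  # no
```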
32 Support and Confidence
Each rule has a support |D_v|/|D|, represented by the set of examples that fit the rule. Each rule also has a confidence, which might or might not be equal to the support: the confidence is the fraction of D_v which is correctly classified by the rule.
Remember one of the 2 exit conditions in the algorithm: else if the set of features is empty, return a leaf node with the category label that is the most common in examples. In that case the set of examples D_v does NOT have a uniform classification but contains, say, D_v+ positive and D_v- negative examples; if |D_v+| > |D_v-| the class is positive, the support is |D_v+|/|D| and the confidence is |D_v+|/|D_v|.
For example, if we consume all features in a branch of the tree and remain with 5 examples, of which 3 are positive and 2 negative, we append the decision positive to the tree branch (and its associated rule), with support 5 (or 5/|D|) and confidence 3/5.
33 Example (e.g. if we have just one feature)
[Figure: the Outlook split: sunny -> 2 yes, 3 no; overcast -> 4 yes; rainy -> 3 yes, 2 no; the sunny branch becomes a negative leaf.]
RULE: IF Outlook=sunny THEN neg, with Support = 3/14 and Confidence = 3/5.
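A tiny sketch reproducing the two numbers on this slide; the flat list of (outlook, play) pairs is an invented stand-in for the full table, keeping only the counts that matter:

```python
# Support and confidence of the rule "IF Outlook=sunny THEN No"
# on a 14-example set with 5 sunny instances (3 No, 2 Yes).
examples = [('Sunny', 'No')] * 3 + [('Sunny', 'Yes')] * 2 + [('Other', '?')] * 9

covered = [(o, p) for o, p in examples if o == 'Sunny']
correct = [(o, p) for o, p in covered if p == 'No']

support = len(correct) / len(examples)     # 3/14 ~= 0.214
confidence = len(correct) / len(covered)   # 3/5 = 0.6
print(support, confidence)
```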
34 Properties of Decision Tree Learning
Continuous (real-valued) features can be handled by allowing nodes to split a real-valued feature into two ranges based on a threshold (e.g. length < 3 and length >= 3).
Classification trees have discrete class labels at the leaves; regression trees allow real-valued outputs at the leaves.
Algorithms for finding consistent trees are efficient for processing large amounts of training data for data mining tasks.
Methods have been developed for handling noisy training data (both class and feature noise): the consistency constraint can be relaxed.
Methods have been developed for handling missing feature values.
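A sketch of the threshold idea for a single continuous feature, assuming the common approach of trying midpoints between consecutive sorted values (the names and toy data are illustrative):

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between consecutive sorted values; return (threshold, gain)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        g = base - (len(left) / len(pairs) * entropy(left)
                    + len(right) / len(pairs) * entropy(right))
        if g > best[1]:
            best = (t, g)
    return best

lengths = [1.0, 2.0, 2.5, 4.0, 5.0, 6.0]
classes = ['-', '-', '-', '+', '+', '+']
print(best_threshold(lengths, classes))   # (3.25, 1.0): a perfect binary split
```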
35 Hypothesis Space Search in Decision Tree Learning
Performs batch learning that processes all training instances at once, rather than incremental learning (e.g. like VS) that updates a hypothesis after each example.
Performs hill-climbing (greedy search) that may only find a locally optimal solution; it is greedy because it locally determines the best split node to be generated.
Guaranteed to find a tree consistent with any conflict-free training set (i.e. identical feature vectors are always assigned the same class), but not necessarily the simplest such tree.
Finds a single discrete hypothesis, so there is no way to provide confidences or create useful queries.
36 Bias in Decision-Tree Induction
Information gain gives a bias for trees with minimal depth. It implements a search (preference) bias instead of a language (restriction) bias (a language bias would be, e.g., conjunctive hypotheses only, as for VS).
37 History of Decision-Tree Research
Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning in the 1960s.
In the late 70s, Quinlan developed ID3 with the information gain heuristic to learn expert systems from examples. Simultaneously, Breiman, Friedman and colleagues developed CART (Classification and Regression Trees), similar to ID3.
In the 1980s a variety of improvements were introduced to handle noise, continuous features, missing features, and improved splitting criteria; various expert-system development tools resulted.
Quinlan's updated decision-tree package (C4.5) was released in 1993. Weka includes a Java version of C4.5 called J48.
38 Summary
Decision tree algorithm; ordering nodes (Information Gain); testing and pruning.
39 Computational Complexity
Worst case builds a complete tree where every path tests every feature. Assume |D| = n examples and m features.
[Figure: a complete tree with features f_1 ... f_m tested at successive levels; a maximum of n examples is spread across all nodes at each of the m levels.]
At each level i in the tree, we must examine the remaining m - i features for each instance at that level to calculate information gains:
$\sum_{i=1}^{m} i \cdot n = O(nm^2)$
However, the learned tree is rarely complete (the number of leaves is usually << n). In practice, complexity is linear in both the number of features (m) and the number of training examples (n).
40 Overfitting
Learning a tree that classifies the training data perfectly may not lead to the tree with the best generalization to unseen data. There may be noise in the training data that the tree is erroneously fitting, and the algorithm may be making poor decisions towards the leaves of the tree that are based on very little data and may not reflect reliable trends (reliable rules should be supported by many examples, not just a handful).
A hypothesis h is said to overfit the training data if there exists another hypothesis h' such that h has less error than h' on the training data but greater error than h' on independent test data.
[Figure: accuracy vs. hypothesis complexity; accuracy on training data keeps rising while accuracy on test data peaks and then falls.]
41 Overfitting example (1)
42 Overfitting Example (2)
Testing Ohm's Law: V = IR (i.e. I = (1/R)V). Experimentally measure 10 points and fit a curve to the resulting data.
[Figure: current (I) vs. voltage (V) with a wiggly curve passing through all 10 points.]
Perfect fit to the training data with a 9th-degree polynomial (we can fit n points exactly with a degree n-1 polynomial). Ohm was wrong, we have found a more accurate function!
43 Overfitting Example (3)
Testing Ohm's Law: V = IR (I = (1/R)V).
[Figure: current (I) vs. voltage (V) with a straight-line fit.]
Better generalization with a linear function that fits the training data less accurately.
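A small illustration of the same point with NumPy, assuming noisy measurements generated from a true linear law (the resistance, noise level and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
R = 2.0
V = np.linspace(1, 10, 10)
I = V / R + rng.normal(scale=0.05, size=V.size)   # 10 noisy "measurements"

p9 = np.polyfit(V, I, 9)   # 9th-degree polynomial: near-zero training error
p1 = np.polyfit(V, I, 1)   # linear fit: small but nonzero training error

V_test = np.linspace(1, 10, 100)
I_test = V_test / R                                # noise-free ground truth
print("degree-9 test RMSE:", np.sqrt(np.mean((np.polyval(p9, V_test) - I_test) ** 2)))
print("linear   test RMSE:", np.sqrt(np.mean((np.polyval(p1, V_test) - I_test) ** 2)))
# The linear model typically generalizes better, even though the 9th-degree
# polynomial fits the training points exactly.
```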
44 Overfitting Noise in Decision Trees
Category or feature noise can easily cause overfitting. Add the noisy instance <medium, blue, circle>: pos (but really neg).
[Figure: the original tree — color: red -> shape (circle -> pos, square -> neg, triangle -> pos), blue -> neg, green -> neg.]
45 Overfitting Noise in Decision Trees
Category or feature noise can easily cause overfitting. Add the noisy instance <medium, blue, circle>: pos (but really neg).
[Figure: to accommodate the noisy instance, the blue branch is grown further over {<big, blue, circle>: -, <medium, blue, circle>: +}, ending in a size test with leaves small -> neg, med -> pos, big -> neg.]
Conflicting examples can also arise if the features are incomplete and inadequate to determine the class, or if the target concept is non-deterministic (hidden features not discovered during the initial phase or not available). For example, it is believed that psoriasis also depends on stress, but patients' stress condition is mostly not available in patient records.
46 Example (inadequate features)
X(has-4-legs, has-tail, has-long-nose, has-big-ears, can-be-ridden); classes: horse, camel, elephant.
The missing feature has-hump does not allow us to correctly classify a camel (a camel and a horse would be classified the same according to the above features).
47 Overfitting Prevention (Pruning) Methods
Two basic approaches for decision trees:
Prepruning: stop growing the tree at some point during top-down construction when there is no longer sufficient data to make reliable decisions (e.g. |D_v| < k).
Postpruning: grow the full tree, then remove sub-trees that do not have sufficient evidence. Label the leaf resulting from pruning with the majority class of the remaining data, or with a class probability distribution.
Methods for determining which sub-trees to prune:
Cross-validation: reserve some training data as a hold-out set (validation set, tuning set) to evaluate the utility of subtrees. Note: the tuning set is different from the test set! A test set is always needed at the end.
Statistical test: use a statistical test on the training data to determine if any observed regularity (e.g. a rule) can be dismissed as likely due to random chance.
Minimum description length (MDL): determine whether the additional complexity of the hypothesis is less complex than just explicitly remembering any exceptions resulting from pruning (e.g. for mammals: wings=false, except for bats).
48 Reduced Error Pruning
A post-pruning, cross-validation approach.
Partition the training data D into a learning set L and a validation set V. Build a complete tree from the L data.
Until accuracy on the validation set V decreases, do:
For each non-leaf node n in the tree: temporarily prune the subtree below n and replace it with a leaf labeled with the current majority class at that node; measure and record the accuracy of the pruned tree on the validation set.
Permanently prune the node that results in the greatest increase in accuracy on the validation set.
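A self-contained sketch of this loop in Python, reusing the tuple/dict tree representation from the earlier dtree sketch; for brevity the pruned leaf is labeled with a single majority label supplied by the caller rather than the per-node majority class described above, and the tiny validation set is invented for illustration:

```python
# A leaf is a class label; an internal node is (feature, {value: subtree}).
# A small tree overfit to the noisy blue examples (in the spirit of slides 45/49).
tree = ('color', {
    'red':   ('shape', {'circle': '+', 'square': '-', 'triangle': '+'}),
    'blue':  ('size',  {'small': '-', 'med': '+', 'big': '-'}),
    'green': '-',
})

def classify(tree, x):
    if not isinstance(tree, tuple):           # leaf
        return tree
    feature, branches = tree
    return classify(branches[x[feature]], x)

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of branch values) of every internal node."""
    if isinstance(tree, tuple):
        yield path
        for v, child in tree[1].items():
            yield from internal_nodes(child, path + (v,))

def prune_at(tree, path, label):
    """Copy of the tree with the node at `path` replaced by the leaf `label`."""
    if not path:
        return label
    feature, branches = tree
    branches = dict(branches)
    branches[path[0]] = prune_at(branches[path[0]], path[1:], label)
    return (feature, branches)

def reduced_error_prune(tree, validation, majority_label):
    best = tree
    while isinstance(best, tuple):
        base = accuracy(best, validation)
        candidates = [prune_at(best, p, majority_label) for p in internal_nodes(best)]
        pruned = max(candidates, key=lambda t: accuracy(t, validation))
        if accuracy(pruned, validation) < base:   # stop when accuracy would decrease
            break
        best = pruned
    return best

validation = [({'color': 'blue', 'shape': 'circle', 'size': 'small'}, '-'),
              ({'color': 'blue', 'shape': 'circle', 'size': 'med'},   '-'),
              ({'color': 'red',  'shape': 'circle', 'size': 'big'},   '+')]
print(reduced_error_prune(tree, validation, '-'))
# The noisy subtree grown under color=blue is pruned back to a single '-' leaf.
```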
49 Example
[Figure: the overfitted tree from slide 45, with the validation examples <big, blue, circle>: -, <medium, blue, circle>: +, <small, blue, circle>: - shown reaching the subtree grown under blue; that subtree is pruned back to a leaf.]
The majority of the examples under the pruned sub-tree is neg.
50 Issues with Reduced Error Pruning
The problem with this approach is that it potentially wastes training data on the validation set. The severity of this problem depends on where we are on the learning curve:
[Figure: a learning curve — test accuracy vs. number of training examples.]
A learning curve shows the accuracy (i.e. the learning rate) as the number of training examples grows.
51 Cross-Validating without Losing Training Data
If the algorithm is modified to grow trees breadth-first rather than depth-first (i.e. test all values of a feature first and create all nodes at level i), we can stop growing after reaching any specified tree complexity.
First, run several trials of reduced-error pruning using different random splits of learning and validation sets, and record the complexity of the pruned tree learned in each trial. Let C be the average pruned-tree complexity (C = number of nodes + edges of the tree). Then grow a final tree breadth-first from all the training data, but stop when the complexity reaches C.
A similar cross-validation approach can be used to set arbitrary algorithm parameters in general.
52 Summary
Decision tree algorithm; ordering nodes (Information Gain); testing and pruning.