Decision Tree Learning


1 Decision Tree Learning 1

2 Simple example of object classification

Instance  Size   Color  Shape     C(x)
x1        small  red    circle    positive
x2        large  red    circle    positive
x3        small  red    triangle  negative
x4        large  blue   circle    negative

3 Decision Trees. Tree-based classifiers for instances represented as feature vectors. Non-leaf nodes test features, there is one branch for each value of the feature, and leaves specify the category. For example, a tree may test color at the root (branches red, blue, green) and shape under the red branch (circle, square, triangle); the same structure can classify into positive/negative or into categories A, B, C. Decision trees can represent arbitrary conjunctions and disjunctions (any Boolean function), and therefore any classification function over discrete feature vectors. A tree can be rewritten as a set of rules, i.e. in disjunctive normal form (DNF):
red AND circle -> pos
red AND circle -> A
blue -> B; red AND square -> B
green -> C; red AND triangle -> C

4 General shape of a decision tree. Every non-leaf node is a test on the value of one feature. For each possible outcome of the test, an arrow is created that links to a subsequent test node or to a leaf node. Leaf nodes are DECISIONS concerning the value of the classification.

5 Another simple example: risk of giving a loan Here, we have 3 features: income, credit history and debt. The classification function is Risk(x) with 3 values: high, moderate, low 5

6 More complex example: application of a decision-tree model to character recognition based on visual features. The tree can't be visualized without zooming, but the boxes (decisions) are alphabet letters and the circles are tests on the values of several visual features.

7 How does it work: Top-Down Decision Tree Induction. Recursively build a tree top-down by divide and conquer. The initial learning set D is:
<big, red, circle>: +   <small, red, circle>: +   <small, red, square>: -   <big, blue, circle>: -
Begin by considering the feature color. The subset of examples in D in which color=red is <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -; the blue branch receives <big, blue, circle>: -. At each step, we aim to find the best split of our data. What is a good split? One which reduces the uncertainty about the classification in the resulting subsets.

8 Top-Down Decision Tree Induction. Recursively build a tree top-down by divide and conquer. After splitting on color, the red branch holds <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: - and the blue branch holds <big, blue, circle>: -; splitting the red subset on shape then separates the circles (positive) from the square (negative). Two issues remain: How do we decide the order in which we test attributes? How do we decide the class of nodes for which we have no examples? Let's ignore these 2 issues for now and describe the algorithm first.

9 Remember the house classification problem in L1. The data suggest that homes higher than 73 meters are mostly in San Francisco (SF).

10 Splitting data according to features. Split on height against a threshold: all instances in the above-threshold split are green (=SF), so we can output a decision; in the other split we need to analyze more features, since no decision on the class is possible yet.

11 Decision Tree Induction Pseudocode

DTree(examples, features) returns a tree
  a) If all examples are in one category, return a leaf node with that category label.
  b) Else if the set of features is empty, return a leaf node with the category label that is the most common in examples.
  Else pick a feature f (e.g. color) and create a node R for it.
    For each possible value v_i of f (e.g. color=red):
      Let examples_i be the subset of examples that have value v_i for f.
      Add an out-going edge E to node R labeled with the value v_i.
      If examples_i is empty, then attach a leaf node to edge E labeled with the category that is the most common in examples.
      Else call DTree(examples_i, features - {f}) and attach the resulting tree as the subtree under edge E.
    Return the subtree rooted at R.

a) and b) are the termination conditions.
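
To make the pseudocode concrete, here is a minimal Python sketch of the same recursion. This is illustrative code, not the slides' implementation: the (feature-dict, label) data layout and the function names are assumptions, feature selection is naive (information gain comes later), and it branches only on feature values that actually occur in the data.

```python
from collections import Counter

def most_common_label(examples):
    """Majority class among (feature-dict, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtree(examples, features):
    labels = {label for _, label in examples}
    if len(labels) == 1:             # a) all examples are in one category
        return labels.pop()
    if not features:                 # b) no features left: majority-class leaf
        return most_common_label(examples)
    f = features[0]                  # no information gain yet: just take the first feature
    node = {f: {}}
    # NOTE: the slide's pseudocode loops over all possible values of f and attaches
    # a majority-class leaf when a value has no examples; here we only branch on
    # the values actually seen in the data.
    for v in {ex[f] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[f] == v]
        node[f][v] = dtree(subset, [g for g in features if g != f])
    return node

# Toy data from the slides
data = [
    ({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "square"}, "-"),
    ({"size": "big",   "color": "blue", "shape": "circle"}, "-"),
]
print(dtree(data, ["color", "shape", "size"]))
# e.g. {'color': {'red': {'shape': {'circle': '+', 'square': '-'}}, 'blue': '-'}}
```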

12 Example (1). Examples: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -. Pick a feature F and create a node R for it, e.g. color. For each possible value v_i of F, let examples_i be the subset of examples that have value v_i for F and add an out-going edge E to node R labeled with the value v_i. For color=red, examples_i is not empty, so we call
DTree({<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -}, {dimension, shape})
and attach the resulting tree as the subtree under the red edge.

13 Example (2). DTree({<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -}, {dimension, shape}): pick a feature F and create a node R for it, e.g. shape. Under shape=circle, the examples <big, red, circle>: + and <small, red, circle>: + are all in one category, so we return a leaf node labeled positive. Under shape=square, <small, red, square>: - gives a leaf labeled negative. Under shape=triangle, examples_i is empty, so we attach a leaf node to that edge labeled with the category that is the most common in examples (here positive).

14 Example (3). Back at the root, the blue branch receives only <big, blue, circle>: -, so it becomes a leaf labeled negative, and the green branch has no examples, so it is labeled with the most common category (negative in the figure). The resulting tree is: color=red -> shape (circle: pos, square: neg, triangle: pos); color=blue -> neg; color=green -> neg. Now we know how to decide the class when we have no examples, but how do we decide the order in which we create nodes?

15 Picking a Good Split Feature. The goal is to have the resulting tree be as small as possible, per Occam's razor. Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem. The top-down divide-and-conquer method does a greedy search for a simple tree but does not guarantee finding the smallest one. General lesson in ML: greed is good. We want to pick a feature that creates subsets of examples that are relatively pure in a single class, so they are closer to being leaf nodes. There are a variety of heuristics for picking a good test; a popular one is based on information gain and originated with the ID3 system of Quinlan (1979).

16 Entropy (recap). Entropy (disorder, impurity) of a set of examples D, relative to a binary classification, is:

Entropy(D) = -p_1 log_2(p_1) - p_0 log_2(p_0)

where p_1 is the fraction of positive examples in D and p_0 is the fraction of negatives. (Notice that if S is a SAMPLE of a population, Entropy(S) is an estimate of the population entropy.) If all examples are in one category, entropy is zero (we define 0 log(0) = 0). If examples are equally mixed (p_1 = p_0 = 0.5), entropy is at its maximum of 1. Entropy can be viewed as the number of bits required on average to encode the class of an example in D. It is also an estimate of the initial disorder or uncertainty about a classification, given the set D. For multi-class problems with c categories, entropy generalizes to:

Entropy(D) = - Σ_{i=1..c} p_i log_2(p_i)
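
As a sanity check of these formulas, here is a small Python sketch (not part of the slides) that computes entropy from a list of class counts, implementing the convention 0 log(0) = 0 by skipping empty classes:

```python
import math

def entropy(class_counts):
    """Entropy of a distribution given as a list of class counts."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]   # skip zero counts: 0*log(0) = 0
    return -sum(p * math.log2(p) for p in probs)

print(entropy([4, 0]))   # 0.0    (pure set)
print(entropy([2, 2]))   # 1.0    (maximally mixed)
print(entropy([9, 5]))   # ~0.940 (the initial PlayTennis entropy used later)
```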

17 Entropy Plot for Binary Classification 17

18 Information Gain. The information gain of a feature f is the expected reduction in entropy resulting from splitting on this feature:

Gain(D, f) = Entropy(D) - Σ_{v ∈ Values(f)} (|D_v| / |D|) Entropy(D_v)

where D_v is the subset of D having value v for feature f (e.g., if f=color and v=red): the entropy of each resulting subset is weighted by its relative size.

Example: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -. The initial entropy (2+, 2-) is E = 1.
size:  big -> 1+, 1- (E_big = 1); small -> 1+, 1- (E_small = 1); Gain = 1 - (0.5*1 + 0.5*1) = 0
color: red -> 2+, 1- (E_red = 0.918); blue -> 0+, 1- (E_blue = 0); Gain = 1 - (0.75*0.918 + 0.25*0) = 0.311
shape: circle -> 2+, 1- (E_circle = 0.918); square -> 0+, 1- (E_square = 0); Gain = 1 - (0.75*0.918 + 0.25*0) = 0.311
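
The worked example can be reproduced with the short Python sketch below (illustrative, not the slides' code; the entropy helper and the (feature-dict, label) layout are assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    """Gain(D, f) = Entropy(D) - sum over v of |D_v|/|D| * Entropy(D_v)."""
    all_labels = [label for _, label in examples]
    remainder = 0.0
    for v in {ex[feature] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[feature] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(all_labels) - remainder

data = [
    ({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
    ({"size": "small", "color": "red",  "shape": "square"}, "-"),
    ({"size": "big",   "color": "blue", "shape": "circle"}, "-"),
]
for f in ("size", "color", "shape"):
    print(f, round(information_gain(data, f), 3))   # size 0.0, color 0.311, shape 0.311
```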

19 New pseudocode

DTree(examples, features) returns a tree
  a) If all examples are in one category, return a leaf node with that category label.
  b) Else if the set of features is empty, return a leaf node with the category label that is the most common in examples.
  Else pick the best feature f according to information gain and create a node R for it.
    For each possible value v_i of f:
      Let examples_i be the subset of examples that have value v_i for f.
      Add an out-going edge E to node R labeled with the value v_i.
      If examples_i is empty, then attach a leaf node to edge E labeled with the category that is the most common in examples.
      Else call DTree(examples_i, features - {f}) and attach the resulting tree as the subtree under edge E.
    Return the subtree rooted at R.

20 A small animated example 20

21 A complete example

22 A Decision Tree for the PlayTennis? data

The PlayTennis? data:

instance  Outlook   Temperature  Humidity  Windy  Play
x1        Sunny     Hot          High      False  No
x2        Sunny     Hot          High      True   No
x3        Overcast  Hot          High      False  Yes
x4        Rainy     Mild         High      False  Yes
x5        Rainy     Cool         Normal    False  Yes
x6        Rainy     Cool         Normal    True   No
x7        Overcast  Cool         Normal    True   Yes
x8        Sunny     Mild         High      False  No
x9        Sunny     Cool         Normal    False  Yes
x10       Rainy     Mild         Normal    False  Yes
x11       Sunny     Mild         Normal    True   Yes
x12       Overcast  Mild         High      True   Yes
x13       Overcast  Hot          Normal    False  Yes
x14       Rainy     Mild         High      True   No

23 The Process of Constructing a Decision Tree Select an attribute to place at the root of the decision tree and make one branch for every possible value. Repeat the process recursively for each branch.

24 Information Gained by Knowing the Result of a Decision. In the PlayTennis example, there are 9 instances for which the decision to play is yes and 5 instances for which the decision to play is no. The initial data entropy is therefore:

-(9/14) log_2(9/14) - (5/14) log_2(5/14) = 0.940

The information initially required to correctly separate the data is 0.940 bits.

25 Information Further Required If Outlook Is Placed at the Root. Splitting on Outlook gives: sunny -> {yes, yes, no, no, no}; overcast -> {yes, yes, yes, yes}; rainy -> {yes, yes, yes, no, no}. Here 0.971 is the entropy of a 5-instance data set where 3 are positive and 2 negative (or vice versa), and 0 is the entropy of a data set where all instances are positive (or negative). Weighting each branch by its probability (e.g. 5/14 for Outlook=sunny), the information further required is:

(5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693 bits

26 Information Gained by Placing Each of the 4 Attributes
Gain(outlook) = 0.940 bits - 0.693 bits = 0.247 bits.
Gain(temperature) = 0.029 bits.
Gain(humidity) = 0.152 bits.
Gain(windy) = 0.048 bits.
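
These gains can be verified with a small Python sketch (illustrative, not part of the slides); the lists below simply transcribe the columns of the PlayTennis table:

```python
import math
from collections import Counter

outlook  = "Sunny Sunny Overcast Rainy Rainy Rainy Overcast Sunny Sunny Rainy Sunny Overcast Overcast Rainy".split()
temp     = "Hot Hot Hot Mild Cool Cool Cool Mild Cool Mild Mild Mild Hot Mild".split()
humidity = "High High High High Normal Normal Normal High Normal Normal Normal High Normal High".split()
windy    = "False True False False False True True False False False True True False True".split()
play     = "No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No".split()

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(attribute, labels):
    n = len(labels)
    remainder = 0.0
    for v in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

for name, attr in [("outlook", outlook), ("temperature", temp),
                   ("humidity", humidity), ("windy", windy)]:
    print(name, round(gain(attr, play), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```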

27 The Strategy for Selecting an Attribute to Place at a Node. Select the attribute that gives us the largest information gain. In this example, it is the attribute Outlook, which splits the data into: sunny -> 2 yes, 3 no; overcast -> 4 yes; rainy -> 3 yes, 2 no.

28 The Recursive Procedure for Constructing a Decision Tree. Apply the procedure to each branch recursively to construct the decision tree. For example, for the branch Outlook = sunny, we evaluate the information gained by applying each of the remaining 3 attributes:
Gain(Outlook=sunny; Temperature) = 0.971 - 0.400 = 0.571
Gain(Outlook=sunny; Humidity) = 0.971 - 0 = 0.971
Gain(Outlook=sunny; Windy) = 0.971 - 0.951 = 0.020

29 Recursive selection. Similarly, we also evaluate the information gained by applying each of the remaining 3 attributes for the branch Outlook = rainy:
Gain(Outlook=rainy; Temperature) = 0.971 - 0.951 = 0.020
Gain(Outlook=rainy; Humidity) = 0.971 - 0.951 = 0.020
Gain(Outlook=rainy; Windy) = 0.971 - 0 = 0.971

30 The resulting tree
Outlook = sunny -> test humidity: high -> no; normal -> yes
Outlook = overcast -> yes
Outlook = rainy -> test windy: false -> yes; true -> no

31 A DT can be represented as a set of rules
IF Outlook=sunny AND humidity=high THEN no
IF Outlook=sunny AND humidity=normal THEN yes
IF Outlook=overcast THEN yes
IF Outlook=rainy AND windy=false THEN yes
IF Outlook=rainy AND windy=true THEN no
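
Extracting the rules from a learned tree is a simple depth-first walk: every root-to-leaf path becomes one IF-THEN rule. The sketch below is illustrative; the nested-dict tree encoding is an assumption, not the slides' representation.

```python
# Tree encoded as nested dicts: {feature: {value: subtree_or_leaf}}; leaves are class labels.
tennis_tree = {
    "Outlook": {
        "sunny":    {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy":    {"windy": {"false": "yes", "true": "no"}},
    }
}

def extract_rules(tree, conditions=()):
    """Collect one IF-THEN rule per leaf by walking the tree depth-first."""
    if not isinstance(tree, dict):          # a leaf, i.e. a decision
        body = " AND ".join(f"{f}={v}" for f, v in conditions) or "TRUE"
        return [f"IF {body} THEN {tree}"]
    feature, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((feature, value),))
    return rules

for rule in extract_rules(tennis_tree):
    print(rule)     # prints the five rules listed above
```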

32 Support and Confidence. Each rule has a support, |D_v| / |D|, represented by the set of examples that fit the rule. Each rule also has a confidence, which might or might not be equal to the support: the confidence is the fraction of D_v which is correctly classified by the rule. Remember one of the 2 exit conditions in the algorithm: "Else if the set of features is empty, return a leaf node with the category label that is the most common in examples." In that case the set of examples D_v does NOT have a uniform classification but, say, D_v+ positive and D_v- negative; if |D_v+| > |D_v-| the class is positive, the support is |D_v+| / |D| and the confidence is |D_v+| / |D_v|. For example, if we consume all features in a branch of the tree and we remain with 5 examples, of which 3 positive and 2 negative, we append the decision "positive" to the tree branch (and its associated rule), with support 5 (or 5/|D|) and confidence 3/5.

33 Example (e.g. if we have just one feature). Splitting only on Outlook gives: sunny -> 2 yes, 3 no; overcast -> 4 yes; rainy -> 3 yes, 2 no. The sunny branch is labeled with its majority class, negative, yielding:
RULE: IF Outlook=sunny THEN neg, with Support = 3/14 and Confidence = 3/5

34 Properties of Decision Tree Learning. Continuous (real-valued) features can be handled by allowing nodes to split a real-valued feature into two ranges based on a threshold (e.g. length < 3 and length >= 3; a sketch follows below). Classification trees have discrete class labels at the leaves; regression trees allow real-valued outputs at the leaves. Algorithms for finding consistent trees are efficient at processing large amounts of training data for data mining tasks. Methods have been developed for handling noisy training data (both class and feature noise), relaxing the consistency constraint, and for handling missing feature values.
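
For the continuous case, one common scheme (sketched below as an illustration, not as the slides' algorithm) is to sort the values, try the midpoints between consecutive values as candidate thresholds, and keep the threshold with the highest information gain; the "length" data here is hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split 'value < threshold'."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2      # midpoint between consecutive values
        left = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        if not left or not right:                    # degenerate split (duplicate values)
            continue
        g = base - len(left) / len(pairs) * entropy(left) \
                 - len(right) / len(pairs) * entropy(right)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain

lengths = [1.2, 2.5, 2.9, 3.4, 4.1, 5.0]             # hypothetical real-valued feature
classes = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(best_threshold(lengths, classes))              # ~ (3.15, 1.0): "length < 3.15" separates the classes
```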

35 Hypothesis Space Search in DT learning. Performs batch learning that processes all training instances at once, rather than incremental learning (e.g. version spaces, VS) that updates a hypothesis after each example. Performs hill-climbing (greedy search) that may only find a locally optimal solution; it is greedy because it locally determines the best split node to be generated. Guaranteed to find a tree consistent with any conflict-free training set (i.e. identical feature vectors always assigned the same class), but not necessarily the simplest such tree. Finds a single discrete hypothesis, so there is no way to provide confidences or create useful queries.

36 Bias in Decision-Tree Induction. Information gain gives a bias for trees with minimal depth. This implements a search (preference) bias instead of a language (restriction) bias (a language bias would be, e.g., allowing conjunctive hypotheses only, as in VS).

37 History of Decision-Tree Research. Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning in the 1960's. In the late 70's, Quinlan developed ID3 with the information-gain heuristic to learn expert systems from examples. Simultaneously, Breiman, Friedman and colleagues developed CART (Classification and Regression Trees), similar to ID3. In the 1980's a variety of improvements were introduced to handle noise, continuous features, missing features, and improved splitting criteria, and various expert-system development tools resulted. Quinlan's updated decision-tree package (C4.5) was released in 1993. Weka includes a Java version of C4.5 called J48.

38 Summary Decision tree algorithm Ordering nodes (Information Gain) Testing and pruning 38

39 Computational Complexity. Worst case: we build a complete tree in which every path tests every feature. Assume |D| = n examples and m features (f_1 ... f_m). A maximum of n examples is spread across all the nodes at each of the m levels. At each level i of the tree, we must examine the remaining m - i features for each instance at that level to calculate the information gains, so the total work is:

Σ_{i=1..m} i * n = O(n m^2)

However, the learned tree is rarely complete (the number of leaves is usually << n). In practice, complexity is linear in both the number of features (m) and the number of training examples (n).

40 Overfitting. Learning a tree that classifies the training data perfectly may not lead to the tree with the best generalization to unseen data. There may be noise in the training data that the tree is erroneously fitting, and the algorithm may be making poor decisions towards the leaves of the tree that are based on very little data and may not reflect reliable trends (reliable rules should be supported by many examples, not just a handful). A hypothesis h is said to overfit the training data if there exists another hypothesis h' such that h has less error than h' on the training data but greater error on independent test data. (Figure: as hypothesis complexity grows, accuracy on the training data keeps rising while accuracy on the test data eventually falls.)

41 Overfitting example (1) 41

42 Overfitting Example (2). Testing Ohm's Law: V = IR (i.e. I = (1/R)V). Experimentally measure 10 points and fit a curve to the resulting data (current I vs. voltage V). A 9th-degree polynomial gives a perfect fit to the training data (one can fit n points exactly with a degree n-1 polynomial). Ohm was wrong, we have found a more accurate function!

43 Overfitting Example (3). Testing Ohm's Law: V = IR (I = (1/R)V). Better generalization is obtained with a linear function that fits the training data less accurately.
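
The Ohm's-law illustration can be reproduced numerically. The sketch below uses synthetic data (an assumed "true" resistance R = 2 and Gaussian measurement noise): a degree-9 polynomial fits the 10 training points almost exactly, while a straight line fits them less accurately but typically predicts unseen voltages better.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
R = 2.0                                                  # assumed true resistance: I = V / R
V_train = np.linspace(1.0, 10.0, 10)
I_train = V_train / R + rng.normal(0.0, 0.05, size=10)   # 10 noisy current measurements

poly9 = Polynomial.fit(V_train, I_train, deg=9)   # interpolates the 10 points (n points, degree n-1)
line = Polynomial.fit(V_train, I_train, deg=1)    # the model Ohm's law actually suggests

V_test = np.linspace(1.5, 9.5, 50)                # unseen voltages between the measurements
I_test = V_test / R                               # noise-free ground truth
print("degree-9 train MSE:", np.mean((poly9(V_train) - I_train) ** 2))
print("degree-9 test  MSE:", np.mean((poly9(V_test) - I_test) ** 2))
print("linear   test  MSE:", np.mean((line(V_test) - I_test) ** 2))
# the degree-9 fit has (near-)zero training error but typically larger test error
```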

44 Overfitting Noise in Decision Trees. Category or feature noise can easily cause overfitting. Suppose we add the noisy instance <medium, blue, circle>: pos (but really neg). In the tree learned so far (color=red -> shape: circle pos, square neg, triangle pos; color=blue -> neg; color=green -> neg), the blue branch is a negative leaf, so this instance is misclassified.

45 Overfitting Noise in Decision Trees. To fit the noisy instance <medium, blue, circle>: pos together with <big, blue, circle>: -, the blue branch must now be split further (e.g. on size, with medium -> pos and big -> neg), producing a subtree driven entirely by one mislabeled example. Conflicting examples can also arise if the features are incomplete and inadequate to determine the class, or if the target concept is non-deterministic (hidden features not discovered during the initial phase, or simply not available). For example, it is believed that psoriasis also depends on stress, but the patients' stress condition is mostly not available in patient records.

46 Example (inadequate features). X(has-4-legs, has-tail, has-long-nose, has-big-ears, can-be-ridden). Classes: horse, camel, elephant. The missing feature has-hump does not allow us to correctly classify a camel (a camel and a horse would be classified the same according to the above features).

47 Overfitting Prevention (Pruning) Methods. Two basic approaches for decision trees:
Prepruning: stop growing the tree at some point during top-down construction when there is no longer sufficient data to make reliable decisions (e.g. |D_v| < k).
Postpruning: grow the full tree, then remove sub-trees that do not have sufficient evidence; label the leaf resulting from pruning with the majority class of the remaining data, or with a class probability distribution.
Methods for determining which sub-trees to prune:
Cross-validation: reserve some training data as a hold-out set (validation set, tuning set) to evaluate the utility of subtrees. Note: the tuning set is different from the test set! A test set is always needed at the end.
Statistical test: use a statistical test on the training data to determine whether any observed regularity (e.g. a rule) can be dismissed as likely due to random chance.
Minimum description length (MDL): determine whether the additional complexity of the hypothesis costs less than just explicitly remembering any exceptions resulting from pruning (e.g. for mammals: wings=false, except for bats).

48 Reduced Error Pruning: a post-pruning, cross-validation approach.
Partition the training data D into a learning set L and a validation set V.
Build a complete tree from the L data.
Until accuracy on the validation set V decreases, do:
  For each non-leaf node n in the tree:
    Temporarily prune the subtree below n and replace it with a leaf labeled with the current majority class at that node.
    Measure and record the accuracy of the pruned tree on the validation set.
  Permanently prune the node whose removal results in the greatest increase in accuracy on the validation set.
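
A compact sketch of this procedure on a nested-dict tree follows. It is an illustration under assumptions (the tree encoding with a stored majority class per node, and the toy tree and validation set), not the slides' implementation; it keeps pruning as long as validation accuracy does not decrease.

```python
import copy

# A node is either a class label (leaf) or
# {"feature": f, "majority": label, "branches": {value: subtree}}.
def classify(tree, example):
    while isinstance(tree, dict):
        value = example.get(tree["feature"])
        tree = tree["branches"].get(value, tree["majority"])
    return tree

def accuracy(tree, data):
    return sum(classify(tree, ex) == label for ex, label in data) / len(data)

def internal_paths(tree, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["branches"].items():
            yield from internal_paths(subtree, path + (value,))

def pruned_copy(tree, path):
    """Copy of the tree with the node at `path` replaced by its majority-class leaf."""
    new_tree = copy.deepcopy(tree)
    parent, node = None, new_tree
    for value in path:
        parent, node = node, node["branches"][value]
    if parent is None:                      # pruning the root collapses the whole tree
        return node["majority"]
    parent["branches"][path[-1]] = node["majority"]
    return new_tree

def reduced_error_prune(tree, validation):
    while isinstance(tree, dict):
        current = accuracy(tree, validation)
        candidates = [pruned_copy(tree, p) for p in internal_paths(tree)]
        best = max(candidates, key=lambda t: accuracy(t, validation))
        if accuracy(best, validation) < current:   # stop once any pruning would hurt
            return tree
        tree = best                                # keep the most helpful pruning
    return tree

# Toy overfitted tree and a hypothetical validation set
tree = {"feature": "color", "majority": "neg", "branches": {
    "red": {"feature": "shape", "majority": "pos",
            "branches": {"circle": "pos", "square": "neg", "triangle": "pos"}},
    "blue": "neg", "green": "neg"}}
validation = [({"color": "red", "shape": "circle"}, "pos"),
              ({"color": "red", "shape": "square"}, "pos"),
              ({"color": "blue", "shape": "circle"}, "neg"),
              ({"color": "green", "shape": "square"}, "neg")]
print(reduced_error_prune(tree, validation))   # the shape subtree under red is collapsed to a 'pos' leaf
```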

49 Example. The blue branch had grown a size subtree (small/med/big) to fit <big, blue, circle>: -, <medium, blue, circle>: + and <small, blue, circle>: -. Reduced error pruning replaces that subtree with a single leaf labeled neg, since the majority of the removed examples under the pruned sub-tree is neg.

50 Issues with Reduced Error Pruning. The problem with this approach is that it potentially wastes training data on the validation set. The severity of this problem depends on where we are on the learning curve: a learning curve shows the test accuracy, i.e. the learning rate, as the number of training examples grows.

51 Cross-Validating without Losing Training Data. If the algorithm is modified to grow trees breadth-first rather than depth-first (e.g. test all values of a feature first and create all nodes at level i), we can stop growing after reaching any specified tree complexity. First, run several trials of reduced error pruning using different random splits into learning and validation sets, and record the complexity of the pruned tree learned in each trial. Let C be the average pruned-tree complexity (C = number of nodes + edges of the tree). Then grow a final tree breadth-first from all the training data, but stop when the complexity reaches C. A similar cross-validation approach can be used to set arbitrary algorithm parameters in general.

52 Summary Decision tree algorithm Ordering nodes (Information Gain) Testing and pruning 52
