Decision Trees. Query Selection

Transcription of lecture slides by J. Corso (SUNY at Buffalo).

CART: Query Selection

Key Question: Given a partial tree down to node N, which feature and split value s should we choose for the property test T?

The obvious heuristic is to choose the query that yields as big a decrease in the impurity as possible. The impurity gradient is

    Δi(N) = i(N) - P_L i(N_L) - (1 - P_L) i(N_R),    (5)

where N_L and N_R are the left and right descendants, respectively, and P_L is the fraction of the data at N that will go to the left sub-tree when property test T is used. The strategy is then to choose the query that maximizes Δi(N).
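
To make the heuristic concrete, here is a minimal Python sketch (an illustration of Eq. (5), not code from the lecture) that scores candidate thresholds for one real-valued feature using the entropy impurity and keeps the threshold with the largest impurity drop. The toy data and all function names are my own assumptions.

```python
import numpy as np

def entropy_impurity(y):
    """i(N): entropy impurity of the labels y reaching a node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def impurity_drop(y, x, threshold):
    """Delta-i(N) of Eq. (5) for the binary split x <= threshold."""
    left = x <= threshold
    p_left = left.mean()
    return (entropy_impurity(y)
            - p_left * entropy_impurity(y[left])
            - (1 - p_left) * entropy_impurity(y[~left]))

# One-dimensional search: try midpoints between consecutive (sorted, distinct)
# feature values and keep the threshold that maximizes the impurity drop.
x = np.array([1.0, 2.0, 2.5, 4.0, 5.0, 5.5, 7.0])
y = np.array([0, 0, 0, 1, 1, 1, 1])
candidates = (x[:-1] + x[1:]) / 2
best = max(candidates, key=lambda t: impurity_drop(y, x, t))
print(best, impurity_drop(y, x, best))
```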

CART: What can we say about this strategy?

For the binary case, it yields a one-dimensional optimization problem (which may have non-unique optima). In the higher-branching-factor case, it yields a higher-dimensional optimization problem.

In multi-class binary tree creation, we would want to use the twoing criterion. The goal is to find the split that best separates groups of the c categories: a candidate supercategory C_1 consists of all patterns in some subset of the categories, and C_2 has the remainder. When searching for the split s, we therefore also need to search over the possible category groupings.

This is a local, greedy optimization strategy. Hence, there is no guarantee that we obtain either the global optimum (in classification accuracy) or the smallest tree.

In practice, it has been observed that the particular choice of impurity function rarely affects the final classifier and its accuracy.

CART: A Note About Multiway Splits

When selecting a multiway split with branching factor B, the direct generalization of the impurity gradient is

    Δi(s) = i(N) - Σ_{k=1}^{B} P_k i(N_k).    (6)

This direct generalization is biased toward higher branching factors; to see this, consider the case of uniform splitting. So, we need to normalize each candidate split by its split entropy:

    Δi_B(s) = Δi(s) / ( - Σ_{k=1}^{B} P_k log2 P_k ).    (7)

We then again choose the split that maximizes this normalized criterion.
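
The sketch below (my own illustration of Eqs. (6) and (7), with assumed names and toy data) computes the raw multiway impurity drop and its normalized version for a candidate 3-way split; the normalization is what penalizes splits that simply fan out into many small branches.

```python
import numpy as np

def entropy_impurity(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def multiway_gain(y, branch):
    """Delta-i(s) = i(N) - sum_k P_k i(N_k); branch[k] says which child sample k goes to."""
    total = entropy_impurity(y)
    for b in np.unique(branch):
        mask = branch == b
        total -= mask.mean() * entropy_impurity(y[mask])
    return total

def normalized_gain(y, branch):
    """Delta-i_B(s): divide by the split entropy -sum_k P_k log2 P_k (Eq. 7)."""
    _, counts = np.unique(branch, return_counts=True)
    p = counts / counts.sum()
    split_entropy = -np.sum(p * np.log2(p))
    return multiway_gain(y, branch) / split_entropy

# A 3-way candidate split assigning each sample to child 0, 1, or 2.
y = np.array([0, 0, 1, 1, 1, 0, 1, 1])
branch = np.array([0, 0, 0, 1, 1, 1, 2, 2])
print(multiway_gain(y, branch), normalized_gain(y, branch))
```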

CART: When to Stop Splitting?

If we continue to grow the tree until each leaf node has its lowest impurity (just one sample datum), then we will likely have overfit the training data; such a tree will almost certainly not generalize well. Conversely, if we stop growing the tree too early, the error on the training data will not be sufficiently low and performance will again suffer. So, how do we decide when to stop splitting?

CART: Stopping by Thresholding the Impurity Gradient

Splitting is stopped if the best candidate split at a node reduces the impurity by less than a preset amount β:

    max_s Δi(s) ≤ β.    (8)

Benefit 1: Unlike cross-validation, the tree is trained on the complete training data set.
Benefit 2: Leaf nodes can lie at different levels of the tree, which is desirable whenever the complexity of the data varies throughout the range of values.
Drawback: How do we set the value of the threshold β?
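
A compact sketch of this stopping rule follows: recurse only while the best candidate split improves the impurity by more than β. The helpers best_split (for example, the threshold search sketched after Eq. (5)), make_leaf, and split.partition are hypothetical and stand in for whatever split search and node representation are actually used.

```python
def grow(samples, beta, best_split, make_leaf):
    split, drop = best_split(samples)            # search over candidate splits s
    if split is None or drop <= beta:            # max_s Delta-i(s) <= beta: stop here
        return make_leaf(samples)
    left, right = split.partition(samples)
    return {'test': split,
            'L': grow(left, beta, best_split, make_leaf),
            'R': grow(right, beta, best_split, make_leaf)}
```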

CART: Stopping with a Complexity Term

Define a new global criterion function

    α · size + Σ_{leaf nodes} i(N),    (9)

which trades complexity for accuracy. Here, size could represent the number of nodes or links, and α is some positive constant. The strategy is then to split until a minimum of this global criterion function has been reached.

With the entropy impurity, this global measure is related to the minimum description length principle: the sum of the impurities at the leaf nodes measures the uncertainty in the training data given the model represented by the tree, while the α · size term penalizes the complexity of that model.
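
Equation (9) can be written as a small recursive helper. The nested-dict tree layout ('leaf', 'samples', 'L', 'R' keys) and the per-node cost α are assumptions made only for this illustration; splitting would continue only while it lowers this value.

```python
def global_criterion(tree, alpha, impurity):
    """alpha * (number of nodes) + sum of leaf impurities, as in Eq. (9)."""
    if tree.get('leaf', False):
        return alpha + impurity(tree['samples'])         # leaf node plus its impurity
    return (alpha                                        # cost of the internal node
            + global_criterion(tree['L'], alpha, impurity)
            + global_criterion(tree['R'], alpha, impurity))
```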

CART: Stopping by Testing Statistical Significance

During construction, estimate the distribution of the impurity gradients Δi for the current collection of nodes. For any candidate split, test whether its Δi is statistically different from zero; one possibility is the chi-squared test.

More generally, we can take a hypothesis-testing approach to stopping: we seek to determine whether a candidate split differs significantly from a random split. Suppose we have n samples at node N. A particular split s sends Pn patterns to the left branch and (1 - P)n patterns to the right branch. A random split (with the same P) would place P n_1 of the ω_1 samples to the left, P n_2 of the ω_2 samples to the left, and the corresponding amounts to the right.

The chi-squared statistic measures the deviation of a particular split s from this random split:

    χ² = Σ_{i=1}^{2} (n_iL - n_ie)² / n_ie,    (10)

where n_iL is the number of ω_i patterns sent to the left under s, and n_ie = P n_i is the number expected under the random rule. The larger the chi-squared statistic, the more the candidate split deviates from a random one. When it exceeds a critical value (based on the desired significance level), we reject the null hypothesis (the random split) and proceed with s.
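
Here is a hedged sketch of Eq. (10) for a two-class node. The slides name the chi-squared test but not the significance level or degrees of freedom, so the 0.95 level and the single degree of freedom below are my assumptions, as are the function and variable names.

```python
from scipy.stats import chi2

def chi_squared_split_statistic(n_left, n_total, p_left):
    """n_left[i], n_total[i]: class-i counts sent left / present at the node."""
    stat = 0.0
    for n_il, n_i in zip(n_left, n_total):
        n_ie = p_left * n_i                      # count expected under a random split
        stat += (n_il - n_ie) ** 2 / n_ie
    return stat

# Example: 30 class-1 and 20 class-2 samples; the split sends 60% of the data left.
stat = chi_squared_split_statistic(n_left=[25, 5], n_total=[30, 20], p_left=0.6)
critical = chi2.ppf(0.95, df=1)                  # assumed: 1 degree of freedom, 95% level
print(stat, critical, stat > critical)           # True: reject the "random split" null, keep s
```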

CART: Pruning

Tree construction based on deciding when to stop splitting biases the learning algorithm toward trees in which the greatest impurity reduction occurs near the root; it makes no attempt to look ahead at what splits may occur at the leaves and beyond.

Pruning is the principal alternative strategy for tree construction. In pruning, we first build the tree exhaustively. Then, all pairs of neighboring leaf nodes are considered for elimination: any pair whose elimination yields only a small (acceptable) increase in impurity is eliminated, and the common ancestor node is declared a leaf. Unbalanced trees often result from this style of pruning/merging.

Pruning avoids the local nature of the earlier methods and uses all of the training data, but it does so at added computational cost during tree construction.
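
A sketch of this merge step follows: after the full tree is grown, collapse a pair of sibling leaves whenever doing so raises the impurity by at most a tolerance. The nested-dict tree layout and the helper names are assumptions for illustration only, not the lecture's code.

```python
import numpy as np

def prune(node, impurity, tol):
    """Bottom-up pass; returns the node, possibly collapsed into a leaf."""
    if node.get('leaf', False):
        return node
    node['L'] = prune(node['L'], impurity, tol)
    node['R'] = prune(node['R'], impurity, tol)
    if node['L'].get('leaf', False) and node['R'].get('leaf', False):
        merged = np.concatenate([node['L']['samples'], node['R']['samples']])
        p_l = len(node['L']['samples']) / len(merged)
        increase = impurity(merged) - (p_l * impurity(node['L']['samples'])
                                       + (1 - p_l) * impurity(node['R']['samples']))
        if increase <= tol:                      # only a small impurity increase: merge
            return {'leaf': True, 'samples': merged}
    return node
```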

ID3: Method

ID3 is another tree-growing method. It assumes nominal inputs. Every split has a branching factor B_j, where B_j is the number of discrete attribute bins of the variable j chosen for splitting; splits are therefore seldom binary. The number of levels in the tree equals the number of input variables. The algorithm continues until all nodes are pure or there are no more variables on which to split. One can follow this with pruning.

C4.5: Method (in brief)

C4.5 is a successor to the ID3 method. It handles real-valued variables as CART does and uses ID3-style multiway splits for nominal data.

Example: PlayTennis (from T. Mitchell's book)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Example: Which attribute is the best classifier?

The full set S contains [9+, 5-] examples, so E(S) = 0.940.

Splitting on Humidity: High gives [3+, 4-] with E = 0.985; Normal gives [6+, 1-] with E = 0.592.
Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151.

Splitting on Wind: Weak gives [6+, 2-] with E = 0.811; Strong gives [3+, 3-] with E = 1.00.
Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048.
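
The two gains can be checked with a few lines of Python (illustrative code, not from the lecture); each split is described by the (positive, negative) counts before and after it.

```python
import math

def entropy(pos, neg):
    e, total = 0.0, pos + neg
    for c in (pos, neg):
        if c:
            e -= (c / total) * math.log2(c / total)
    return e

def gain(pos, neg, branches):
    """Information gain; branches is a list of (pos_k, neg_k) counts after the split."""
    total = pos + neg
    return entropy(pos, neg) - sum((p + n) / total * entropy(p, n) for p, n in branches)

print(gain(9, 5, [(3, 4), (6, 1)]))   # Gain(S, Humidity), about 0.151
print(gain(9, 5, [(6, 2), (3, 3)]))   # Gain(S, Wind), about 0.048
```

The same routine applied to the S_sunny subset on the next slide reproduces the 0.970, 0.570, and 0.019 values shown there.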

Example: Growing the tree below the root

Splitting the full set {D1, ..., D14} [9+, 5-] on Outlook gives:
Sunny: {D1, D2, D8, D9, D11}, [2+, 3-], further test needed;
Overcast: {D3, D7, D12, D13}, [4+, 0-], leaf labeled Yes;
Rain: {D4, D5, D6, D10, D14}, [3+, 2-], further test needed.

Which attribute should be tested at the Sunny branch? With S_sunny = {D1, D2, D8, D9, D11}:
Gain(S_sunny, Humidity) = 0.970 - (3/5)(0.0) - (2/5)(0.0) = 0.970
Gain(S_sunny, Temperature) = 0.970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570
Gain(S_sunny, Wind) = 0.970 - (2/5)(1.0) - (3/5)(0.918) = 0.019

Example: Learned Tree

Outlook = Sunny: test Humidity; High gives No, Normal gives Yes.
Outlook = Overcast: Yes.
Outlook = Rain: test Wind; Strong gives No, Weak gives Yes.
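
The learned tree above can be written as a small nested-dict lookup (an illustrative representation, not code from the lecture), with a classify helper applied to the noisy example #15 discussed on the next slide.

```python
TREE = {
    'attribute': 'Outlook',
    'Sunny':    {'attribute': 'Humidity', 'High': 'No',   'Normal': 'Yes'},
    'Overcast': 'Yes',
    'Rain':     {'attribute': 'Wind',     'Strong': 'No', 'Weak': 'Yes'},
}

def classify(tree, example):
    if isinstance(tree, str):            # reached a leaf label
        return tree
    return classify(tree[example[tree['attribute']]], example)

# Example #15 from the next slide: the tree predicts Yes, but the noisy label is No.
print(classify(TREE, {'Outlook': 'Sunny', 'Temperature': 'Hot',
                      'Humidity': 'Normal', 'Wind': 'Strong'}))
```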

Example: Overfitting

Consider adding a new, noisy training example #15: Sunny, Hot, Normal, Strong, PlayTennis = No. What effect would it have on the tree learned above?
