Rutgers University PhD in Management Program
Expert Systems, Prof. Glenn Shafer, Fall 2001

Report on Brodley and Utgoff (1992), "Multivariate Versus Univariate Decision Trees"

Fatima Alali
December 13, 2001

Introduction

Classification trees are used to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. A decision tree is either a leaf node containing the name of a class or a decision node containing a test; for each possible outcome of the test there is a branch to another decision tree. To classify a new instance, one starts at the root of the tree and follows the branch indicated by the outcome of each test until a leaf node is reached. The classification is then the class name at the leaf. Univariate decision trees test one feature at each node and thus result in larger trees than if multiple features are tested at a node. In a multivariate decision tree, each test can be based on one or more input features.

The paper describes and evaluates different multivariate tree construction methods and presents experiments that support the theoretical analysis. Section 1 provides an overview of the paper. Section 2 provides an overview of the C4.5 and LMDT software used in the empirical applications described in the paper. Section 3 provides an overview of the results obtained in the paper. Section 4 assesses both C4.5 and LMDT on a different data set. Section 5 provides conclusions and future implications.

1. Paper Overview

The paper provides an empirical evaluation of the LMDT algorithm and demonstrates the need for multivariate tests and for LMDT's ability to uncover linear structure in the data.

1.1 The theory underlying tree models (classification models):

The basic strategy for a classification tree is to recursively split the cells of the space of input variables. A given cell is split by first searching over all variables and all possible thresholds to find the split that leads to the best improvement in a specified score function. The score is assessed on the training data and then cross-validated using the test data.

One disadvantage of classification trees is that they are hierarchical (sequential) in nature: the tests in a decision tree are performed sequentially by following the branches of the tree, so only those features that are needed to reach a decision are evaluated. Decision trees use this sequential decision procedure to determine the classification of a new data point.

Decision trees have many attractive attributes:
o They are easy to construct and to understand.
o They can handle mixed variables (discrete, continuous, numeric and symbolic), as described below.
o They can easily partition the space using binary tests (thresholds on real variables).
o They predict the class value for a new case quickly.

However, as noted above, decision trees are sequential in nature, which may result in overgrown trees that are difficult to understand and use and that may represent suboptimal partitions of the space of input variables.
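To make the distinction between the two kinds of node tests concrete, the following small Python sketch (my own illustration, not taken from the paper) contrasts a univariate test, which thresholds a single feature, with a multivariate linear test, which thresholds a weighted combination of features. The feature names, weights and cut-points are hypothetical.

# Minimal illustration of univariate vs. multivariate node tests
# (hypothetical feature names, weights and thresholds).

def univariate_test(x):
    """Univariate node: branch on a single feature against a threshold."""
    return "left" if x["income"] <= 40_000 else "right"

def multivariate_test(x):
    """Multivariate (linear) node: branch on a weighted sum of features."""
    score = 0.7 * x["income"] + 1200.0 * x["age"] - 55_000.0
    return "left" if score <= 0 else "right"

example = {"income": 38_000, "age": 41}
print(univariate_test(example), multivariate_test(example))

A univariate tree can only approximate the oblique boundary implied by the second test with a staircase of axis-parallel splits, which is the source of the larger trees mentioned above.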

1.2 Tree construction issues:

There are two stages in any univariate or multivariate tree construction. The first is building the tree from a set of training data, each instance described by its features and labeled with a class name. A top-down algorithm chooses the best test to split the training set according to some score function [1]. The chosen test is then used to partition the training instances, and a branch is created for each outcome of the test. The algorithm is applied recursively to each resulting partition. If the data in a partition all belong to a single class, a leaf node is created and assigned that class label. Different forms of partition merit criteria can be used to judge the goodness of a split; these criteria are commonly expressed in terms of impurity and entropy measures [2].

One concern in tree construction is to avoid overfitting the decision tree to the training data in domains that contain noisy instances [3]. Overfitting occurs when the training data contain noisy instances and the decision tree algorithm induces a classifier that classifies every instance in the training set correctly; when classifying a new instance, such a tree may perform poorly. To avoid overfitting, the second stage of building a tree is to prune it back to eliminate branches that are not statistically valid. At the pruning stage, a leaf node replaces a whole subtree. The replacement takes place if a decision rule establishes that the expected error rate in the subtree is greater than in the single leaf [4].

Numeric vs. symbolic features

When instances contain features that are symbolic (unordered) as well as numeric (ordered), it is important to map each unordered feature to a numeric feature without imposing any order on the unordered values of the feature. For some instances, not all feature values may be available. Missing values can be filled in by the estimate of the sample mean after normalization [5].

Characteristics of Linear Machines and Linear Machine Decision Trees

Breiman et al. (1984) conducted extensive research on univariate trees. Univariate tests are based on one input variable and are thus restricted to splits of the instance space that are orthogonal to that variable's axis. This restriction can bias the classification, especially when the input variables are numerically related. LMDT tries to overcome this problem: it constructs each test in a decision tree by training a linear machine and then eliminating irrelevant and noisy variables in a controlled manner. LMDT achieves this through two properties. First, the method by which LMDT finds and eliminates noisy and irrelevant variables is computationally efficient. Second, the linear machine learning approach enables LMDT to find a good partition of the instance space, whether or not the instance space is linearly separable.

[1] The score function for a predictive model is a function of the difference between the predictions obtained from the model for each individual input and the targets (the response variables); for example, it can be the sum of squared errors between model predictions and actual target measurements. See Principles of Data Mining, Chapter 7, Score Functions for Data Mining Algorithms, and the chapter on Predictive Modeling for Classification.
[2] Brodley and Utgoff (1990), Multivariate decision trees.
[3] A noisy instance is one for which the class label is incorrect, some of the attribute values are incorrect, or both. Brodley and Utgoff (1990).
[4] Multivariate decision trees. Brodley and Utgoff (1990).
[5] Normalization: at each node in the tree, each encoded symbolic and numeric feature is normalized by mapping it to standard normal form N(0, 1).
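As a rough illustration of the preprocessing just described, the sketch below (my own, not the paper's code) encodes a symbolic feature numerically without imposing an order, standardizes each encoded column, and fills missing values with zero, the post-normalization sample mean. One-hot encoding is only one possible order-free mapping and is an assumption here, not necessarily the exact encoding LMDT uses.

import numpy as np

def encode_and_normalize(column, is_symbolic):
    """Hypothetical helper: order-free encoding, standardization, and
    zero-filling of missing values (zero = sample mean after normalization)."""
    if is_symbolic:
        values = sorted({v for v in column if v is not None})
        encoded = np.array([[1.0 if v == u else 0.0 for u in values]
                            if v is not None else [np.nan] * len(values)
                            for v in column])
    else:
        encoded = np.array([[float(v)] if v is not None else [np.nan]
                            for v in column])
    mean = np.nanmean(encoded, axis=0)
    std = np.nanstd(encoded, axis=0)
    std[std == 0] = 1.0
    normalized = (encoded - mean) / std          # approximately N(0, 1) per column
    return np.nan_to_num(normalized, nan=0.0)    # missing value -> sample mean (0)

print(encode_and_normalize(["red", None, "blue", "red"], is_symbolic=True))
print(encode_and_normalize([3.5, None, 7.0, 1.0], is_symbolic=False))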

LMDT can handle instances described by numeric and symbolic variables in which some of the values may be missing. LMDT encodes symbolic variables as numeric variables while ensuring that no order is placed on their values. The encoded variables are then normalized at each node, so that the relative importance of the encoded variables can be gauged from the magnitudes of the corresponding weights. LMDT sets missing values to zero, which corresponds to the sample mean of the corresponding encoded, normalized variable. Encoding information is computed at each node and is retained in the tree for later use during classification.

The authors describe how the elimination of noisy and irrelevant variables is done. When LMDT detects that a linear machine is near its final set of boundaries, it eliminates the variables that contribute least to discriminating the set of instances at that node and then continues training the linear machine. During this elimination process the most accurate linear machine found so far is saved, and when elimination ends, the saved linear machine becomes the test for the decision node. A linear machine based on fewer variables is preferred in two cases. First, if its accuracy is higher than the best accuracy observed thus far, or if the drop in accuracy is not significantly different from the best accuracy as measured by a t-test at the .01 level of significance, the smaller linear machine is saved; if its accuracy is higher, the system also updates its value for the best accuracy observed thus far. Second, to avoid underfitting the data, the algorithm will eliminate variables until the number of instances is greater than the capacity of a hyperplane: if the number of unique instances is not twice the dimensionality of each instance, then the linear machine with fewer variables is preferred.

The authors measure a variable's contribution to the ability to discriminate using a measure of the dispersion of its weights over the set of classes. Variables whose weights are widely dispersed serve two purposes. First, a large-magnitude weight causes the corresponding variable to make a large contribution to the value of the discriminant function (discriminability). Second, a variable whose weights are widely spaced makes different contributions to the value of the discriminant function of each class. LMDT computes the dispersion of each variable as the average squared distance between its weights for each pair of classes and then eliminates the variable with the smallest dispersion.

As for the variable selection criterion, LMDT performs a sequential backward selection (SBS) search for a good combination of features. SBS starts with all of the initial features and tries to remove the feature that will cause the smallest decrease in accuracy as measured by a criterion function [6]; in LMDT, this criterion removes the feature with the lowest weight dispersion. LMDT recalculates the weights after each elimination.

[6] A criterion function is a figure of merit reflecting the amount of classification information conveyed by a feature.
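The dispersion measure described above can be written down directly. The following Python sketch (my own illustration; the weight values are hypothetical) computes, for each variable, the average squared distance between its weights over every pair of classes and identifies the variable that would be the candidate for elimination.

import numpy as np
from itertools import combinations

def dispersion_per_variable(W):
    """W has shape (num_classes, num_variables): one weight vector per class.
    The dispersion of a variable is the average squared distance between the
    weights assigned to it by each pair of classes, as described above."""
    pairs = list(combinations(range(W.shape[0]), 2))
    return np.array([np.mean([(W[i, v] - W[j, v]) ** 2 for i, j in pairs])
                     for v in range(W.shape[1])])

# Hypothetical 3-class linear machine over 4 encoded variables.
W = np.array([[ 0.9, 0.1, -0.3, 0.02],
              [-0.7, 0.1,  0.4, 0.01],
              [ 0.1, 0.1, -0.1, 0.03]])
disp = dispersion_per_variable(W)
print("dispersion per variable:", disp)
print("variable to eliminate (smallest dispersion):", int(np.argmin(disp)))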

LMDT is able to overcome the "L" problem demonstrated in the paper. As the linear machine at the root begins to move toward one of the segments, misclassified instances from the other segment have a decreasing effect, allowing the linear machine to find one of the segments without being misled by the distant points. The L problem illustrates situations in which permitting multivariate splits enables a decision tree algorithm to induce a better generalization than using only univariate splits, so that the multivariate bias is the appropriate one. A decision tree that permits only univariate splits would require a large number of tests to classify the training instances correctly. With univariate algorithms, an increase in the number of instances clustered near a separating hyperplane increases the number of splits necessary to classify the data correctly; however, the increase in the number of splits does not ensure an increase in accuracy on previously unseen points.

2. C4.5 and LMDT descriptions

In this section I first describe the C4.5 software developed by Quinlan, which originated from the ID3 algorithm for inducing classification models from data. I then describe the LMDT algorithm for building multivariate decision trees.

2.1 C4.5 and the ID3 algorithm:

The basic idea behind ID3 is that, in a decision tree, each node corresponds to a non-categorical attribute, namely the most informative attribute among those not yet considered on the path from the root. The informativeness of an attribute is measured using entropy. Below is a brief description of the ID3 algorithm [7], which builds a decision tree given a set of non-categorical attributes C1, C2, ..., Cn, the categorical attribute C, and a training set T of records.

function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
   If S is empty, return a single node with value Failure;
   If S consists of records all with the same value for the categorical
      attribute, return a single node with that value;
   If R is empty, then return a single node with as value the most frequent
      of the values of the categorical attribute found in records of S
      [note that then there will be errors, that is, records that will be
      improperly classified];
   Let D be the attribute with largest Gain(D, S) among attributes in R;
   Let {dj | j = 1, 2, ..., m} be the values of attribute D;
   Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting respectively
      of records with value dj for attribute D;
   Return a tree with root labeled D and arcs labeled d1, d2, ..., dm going
      respectively to the trees
      ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), ..., ID3(R-{D}, C, Sm);
end ID3;
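The Gain(D, S) used in the pseudocode is the information gain of attribute D on training set S: the entropy of S minus the weighted entropy of the subsets induced by the values of D. A minimal Python sketch (my own illustration, not Quinlan's code; records are represented as dictionaries here) is:

from collections import Counter
from math import log2

def entropy(records, class_attr):
    """Entropy of the class distribution in a set of records."""
    counts = Counter(r[class_attr] for r in records)
    total = len(records)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def gain(attribute, records, class_attr):
    """Information gain of `attribute`: entropy of S minus the weighted
    entropy of the subsets induced by each value of the attribute."""
    total = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, class_attr)
    return entropy(records, class_attr) - remainder

# Tiny hypothetical training set with one attribute and a class label.
S = [{"outlook": "sunny", "play": "no"}, {"outlook": "sunny", "play": "no"},
     {"outlook": "rain", "play": "yes"}, {"outlook": "overcast", "play": "yes"}]
print(gain("outlook", S, "play"))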

C4.5 is an extension of ID3 that handles attributes with missing values and continuous attributes and adds pruning procedures and rule derivation.

2.2 Linear Machine Decision Trees (as described in the paper):

The LMDT algorithm builds a multiclass, multivariate decision tree using the top-down approach described above. For each decision node, LMDT trains a linear machine, based on a subset of the input variables, which then serves as the multivariate test for that node. A linear machine (LM) is a multiclass linear discriminant that itself classifies an instance: it is a set of R linear discriminant functions used together to assign an instance to one of R classes. Let Y be an instance description (a pattern vector) consisting of a constant threshold value 1 followed by the numerically encoded features. Each discriminant function has the form g_i(Y) = W_i . Y, where W_i is a vector of adjustable coefficients known as weights. A linear machine infers that instance Y belongs to class i if and only if g_i(Y) > g_j(Y) for all j not equal to i.

One way to train a linear machine is the absolute error correction rule, which adjusts W_i and W_j, where i is the class to which the instance belongs and j is the class to which the linear machine incorrectly assigns the instance. The correction is W_i <- W_i + cY and W_j <- W_j - cY, where c is the smallest integer greater than (W_j - W_i) . Y / (2 Y . Y) such that the updated linear machine classifies the instance correctly. When the instances are linearly separable, cycling through the instances allows the linear machine to partition them into separate convex regions. When the instances are not linearly separable, error correction may never cease and the classification accuracy becomes unpredictable. To overcome this problem, the authors use a thermal perceptron [8], which they call a thermal linear machine. This approach handles the case where one large error comes from an instance far from the decision boundary by using c = beta / (beta + k), where k = (W_j - W_i) . Y / (2 Y . Y), and annealing beta during training. In addition, it handles the case where a misclassified instance lies very close to the decision boundary by also annealing c by beta, giving the correction coefficient c = beta^2 / (beta + k). The algorithm reduces beta geometrically by a rate a and arithmetically by a constant b; this lets the algorithm spend more time training with small values of beta, when it is refining the location of the decision boundary. Beta is reduced only when the magnitude of the linear machine [9] decreased for the current weight adjustment after having increased during the previous adjustment.

[7] Building Classification Models: ID3 and C4.5, at:
[8] The thermal perceptron, developed by Frean (1990), provides stable behavior when instances are not linearly separable; Frean's method addresses both of the problems above.
[9] The magnitude of the linear machine is defined as the sum of the magnitudes of its constituent weight vectors.
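A minimal Python sketch of this correction rule follows (my own illustration, not the LMDT source). It applies the thermal correction c = beta^2 / (beta + k) for a single misclassified instance; for brevity it anneals beta unconditionally between updates with hypothetical values of a and b, whereas the full algorithm reduces beta only under the magnitude condition described above.

import numpy as np

def thermal_update(W, y, true_class, beta):
    """One thermal-linear-machine correction for instance y, where y includes
    the constant threshold component 1, as described above."""
    scores = W @ y
    predicted = int(np.argmax(scores))
    if predicted == true_class:
        return W, False                        # correctly classified: no change
    i, j = true_class, predicted
    k = (W[j] - W[i]) @ y / (2.0 * (y @ y))    # size of the error
    c = beta * beta / (beta + k)               # thermal correction coefficient
    W[i] = W[i] + c * y
    W[j] = W[j] - c * y
    return W, True

# Hypothetical 3-class machine over two features plus the threshold term.
W = np.zeros((3, 3))
y = np.array([1.0, 0.5, -1.2])                 # [threshold, x1, x2]
beta, a, b = 2.0, 0.99, 0.0005                 # hypothetical annealing schedule
W, corrected = thermal_update(W, y, true_class=1, beta=beta)
beta = a * beta - b                            # anneal beta (simplified here)
print(W, corrected, beta)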

3. Overview of results obtained in Brodley and Utgoff's paper (1992):

The paper applies LMDT to a variety of data sets (symbolic, numeric and logical; binary-class and multiclass tasks) and compares the LMDT approach to the univariate C4.5 approach across these tasks in order to investigate the circumstances under which the bias of a multivariate tree (and thus LMDT's search bias for finding such a tree) is more appropriate.

3.1 Data description:

Six data sets are used to represent a mix of symbolic and/or numeric attributes, missing values, binary-class tasks and multiclass tasks [10]. The Cleveland data set consists of 303 patient diagnoses (presence or absence of heart disease). The Glass data set contains different glass samples taken from the scene of an accident. The Iris data set contains both a linearly separable and a non-linearly separable task. The Letter Recognition data set classifies black-and-white rectangular pixel displays as one of the 26 capital letters of the English alphabet. The Pixel Segmentation data set segments an image into seven classes. The Votes data set is used to classify each member of the U.S. Congress in 1984 as Republican or Democrat based on their votes on key issues.

[10] For more details on the data sets (domains), refer to Brodley and Utgoff (1992).

3.2 Terminologies:

Number of classes: the number of output classification classes.

Performance measures are reported for each of the tasks. Each reported measure is the average of ten runs. To estimate the true error rate for five of the domains, the authors performed a ten-fold cross-validation for each run. The data were split randomly for each run, with the same split used for both algorithms. (A small sketch of this evaluation protocol appears after the list of measures below.) The measures reported are:

o Unique attributes: the number of the original input attributes that ever need to be evaluated somewhere in the tree.
o Nodes: the number of test nodes in the tree (for LMDT, the number of linear machines).
o Leaves: the total number of leaves in the tree.
o Average variables per LM: the average number of encoded variables per linear machine.
o Epochs: the number of epochs needed to converge to a tree that classifies the training instances correctly. An epoch equals the number of instances in the training set.
o Bits: the number of bits needed to represent the classifier.
o Accuracy: the percentage of the test instances classified correctly. If the difference in test-set accuracy between the two algorithms is statistically significant, the paper highlights the higher accuracy in bold-face type. The test for significance is a t-test at the .01 level of significance.
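To illustrate this evaluation protocol, the following Python sketch (my own illustration) runs a ten-fold cross-validation with the same splits for both classifiers and applies a paired t-test at the .01 level. The two classifiers are convenient stand-ins from scikit-learn, not C4.5 or LMDT, and the Iris data set here is only an example.

import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
folds = KFold(n_splits=10, shuffle=True, random_state=0)   # same splits for both
acc_tree, acc_linear = [], []
for train_idx, test_idx in folds.split(X):
    for model, scores in ((DecisionTreeClassifier(random_state=0), acc_tree),
                          (LogisticRegression(max_iter=1000), acc_linear)):
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))

t_stat, p_value = ttest_rel(acc_tree, acc_linear)           # paired t-test
print("mean accuracies:", np.mean(acc_tree), np.mean(acc_linear))
print("difference significant at the .01 level:", p_value < 0.01)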

3.3 Results:

It is expected that the time required by LMDT is longer than the time required by C4.5, because the hypothesis space of multivariate decision trees is larger than the hypothesis space of univariate decision trees. To compare the difference, the authors report both the number of instances used to update the linear machine and the number of instances observed. For both algorithms, the authors count the number of times each training instance is examined. All training instances are examined at the root of the tree; at each subtree, however, the algorithm examines only a portion of the instances. The number reported in the paper is the sum of the number of instances observed at each node in the tree divided by the size of the training set. This count is fair because, although C4.5 may only examine part of each instance while searching for a test at a subtree, the same is true of LMDT.

It is not meaningful to compare the sizes of the resulting trees using measures such as the number of nodes or the number of leaves, because an LMDT node can be much more complex than a C4.5 node. To compare tree sizes, the authors use the Minimum Description Length Principle (MDLP), which states that the best hypothesis to induce from a data set is the one that minimizes the length of the hypothesis plus the length of the data when coded using the hypothesis to predict the data. Here the hypothesis is the decision tree and the data is the training set; the best hypothesis is the one that can represent the data with the fewest bits. To represent the data, both the tree and the error vector must be coded [11].

The results show that LMDT finds trees for the Cleveland and Letter Recognition tasks that are statistically significantly more accurate than those C4.5 finds, whereas C4.5 finds more accurate trees for the Glass and Votes tasks. The differences in accuracy for the Iris and Pixel Segmentation tasks are not significant. The sizes of the trees, as measured by the number of bits required to code them, are not consistent with the MDLP; the authors explain this by noting that their data codings are not provably optimal.

4. Empirical assessment of univariate and multivariate decision trees using C4.5 and LMDT on different data sets

This part of the report uses the C4.5 and LMDT software to provide empirical results on the performance of these programs, and to show how decision trees based on univariate tests and on multivariate tests differ.

Data limitations and software restrictions: In this part of the report, I use different data sets to compare the two algorithms. The original version of C4.5 has restricted access; therefore, I used the See5 demo [12]. The See5 demo is limited to small data sets (up to 400 cases for See5/C5.0), but it incorporates all the features of C4.5 and of C5.0, the updated version of C4.5. The data set used for LMDT is the one distributed with the software, so I have used two different data sets, one for each program. Because the training sets are of comparable size but have limited attributes, the results below are meaningful only with this limitation in mind. In addition, the LMDT software has been changed in three revisions [13]: the weight training algorithm was changed to run thermal training 10 times and pick the set of weights that maximizes the selection criterion (information gain or accuracy); a capability was added to have LMDT not discard features (the user can choose this option with the -k parameter); and a capability was added for the user to choose accuracy as the selection criterion (with the -y parameter).

[11] Appendix A of the paper provides the coding procedures. See Brodley and Utgoff (1992).
[12] See5 is downloadable at
[13] LMDT documentation provided by Carla E. Brodley, Version 2, 9/4/94.

Below are the outputs of See5 and LMDT, respectively.

See5:

See5 [Release 1.15]  Wed Dec 12 15:50:

** This demonstration version cannot process **
** more than 400 training or test cases.     **

Read 400 cases (35 attributes) from soybean.data

Decision tree:

int-discolor = brown: brown-stem-rot (31.4/1.4)
int-discolor = black: charcoal-rot (10.5/0.5)
int-discolor = none:
:...plant-growth = norm:
:...leafspot-size = N/A:
: :...canker-lesion = N/A: powdery-mildew (16.3/2.3)
: : canker-lesion = brown: anthracnose (4.2/0.2)
: : canker-lesion = dk-brown-blk: anthracnose (13.8/0.8)
: : canker-lesion = tan: purple-seed-stain (6.4/0.4)
: leafspot-size = lt-1/8:
: :...canker-lesion in brown,dk-brown-blk: bacterial-blight (0)
: : canker-lesion = tan: purple-seed-stain (7)
: : canker-lesion = N/A:
: : :...leafspots-marg = no-w-s-marg: bacterial-pustule (8.4/0.4)
: : leafspots-marg = w-s-marg:
: : :...seed-size = norm: bacterial-blight (11)
: : seed-size = lt-norm: bacterial-pustule (2.6/0.6)
: leafspot-size = gt-1/8:
: :...mold-growth = present:
: :...leaves = norm: diaporthe-pod-&-stem-blight (5.7)
: : leaves = abnorm: downy-mildew (11)
: mold-growth = absent:
: :...fruit-pods = few-present: brown-spot (0)
: fruit-pods = diseased: frog-eye-leaf-spot (28/1)
: fruit-pods = norm:
: :...fruiting-bodies = present: brown-spot (27)
: fruiting-bodies = absent:
: :...date = april: brown-spot (1)
: date = may: brown-spot (14/1)
: date = october: alternarialeaf-spot (18/1)
: date = june:
: :...precip = lt-norm: phyllosticta-leaf-spot (2)
: : precip = norm: phyllosticta-leaf-spot (1)
: : precip = gt-norm: brown-spot (11)
: date = july: [S1]

: date = august:
: :...severity = severe: alternarialeaf-spot (0)
: : severity = minor: frog-eye-leaf-spot (3)
: : severity = pot-severe: alternarialeaf-spot (11/3)
: date = september:
: :...stem = norm: alternarialeaf-spot (26/2)
: stem = abnorm: frog-eye-leaf-spot (2)
plant-growth = abnorm:
:...leaves = norm: rhizoctonia-root-rot (13)
leaves = abnorm:
:...stem = abnorm:
:...plant-stand = normal:
: :...seed = norm: diaporthe-stem-canker (13.1/0.1)
: : seed = abnorm: anthracnose (4)
: plant-stand = lt-normal:
: :...fruiting-bodies = absent:
: :...area-damaged = scattered: anthracnose (1.9/0.9)
: : area-damaged = low-areas: phytophthora-rot (47.5/0.1)
: : area-damaged = upper-areas: 2-4-d-injury (0.1)
: : area-damaged = whole-field: herbicide-injury (2.6/1.2)
: fruiting-bodies = present:
: :...roots = galls-cysts: phytophthora-rot (0)
: roots = norm: anthracnose (4)
: roots = rotted: phytophthora-rot (8.2/0.6)
stem = norm:
:...seed = abnorm: cyst-nematode (11/1.1)
seed = norm:
:...leafspot-size = N/A: 2-4-d-injury (0.1)
leafspot-size = lt-1/8: bacterial-blight (2)
leafspot-size = gt-1/8:
:...leaf-shread = present: phyllosticta-leaf-spot (3)
leaf-shread = absent:
:...date in june,september,
:   october: brown-spot (0)
date = april: brown-spot (2)
date = may: brown-spot (2)
date = july: frog-eye-leaf-spot (3/1)
date = august: frog-eye-leaf-spot (1)

SubTree [S1]

area-damaged = scattered: frog-eye-leaf-spot (3/1)
area-damaged = low-areas: brown-spot (2/1)
area-damaged = upper-areas: phyllosticta-leaf-spot (3)
area-damaged = whole-field: brown-spot (1)

Evaluation on training data (400 cases):

Decision Tree
  Size      Errors
    45      19( 4.8%)   <<

(a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n) (o) (p) (q) (r) (s)   <-classified as

        (a): class diaporthe-stem-canker
10      (b): class charcoal-rot
13      (c): class rhizoctonia-root-rot
55  1   (d): class phytophthora-rot
30      (e): class brown-stem-rot
14      (f): class powdery-mildew
11      (g): class downy-mildew
58  1   (h): class brown-spot
13      (i): class bacterial-blight
10  1   (j): class bacterial-pustule
13      (k): class purple-seed-stain
26      (l): class anthracnose
        (m): class phyllosticta-leaf-spot
        (n): class alternarialeaf-spot
6  37   (o): class frog-eye-leaf-spot
8       (p): class diaporthe-pod-&-stem-blight
11      (q): class cyst-nematode
4       (r): class 2-4-d-injury
2  1    (s): class herbicide-injury

Evaluation on test data (233 cases):

Decision Tree
  Size      Errors
    45      33(14.2%)   <<

(a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n) (o) (p) (q) (r) (s)   <-classified as

        (a): class diaporthe-stem-canker

9       (b): class charcoal-rot
1  5    (c): class rhizoctonia-root-rot
28      (d): class phytophthora-rot
14      (e): class brown-stem-rot
4       (f): class powdery-mildew
8       (g): class downy-mildew
        (h): class brown-spot
5       (i): class bacterial-blight
        (j): class bacterial-pustule
6       (k): class purple-seed-stain
11      (l): class anthracnose
4  3    (m): class phyllosticta-leaf-spot
27  5   (n): class alternarialeaf-spot
        (o): class frog-eye-leaf-spot
7       (p): class diaporthe-pod-&-stem-blight
2       (q): class cyst-nematode
4  8    (r): class 2-4-d-injury
1  3    (s): class herbicide-injury

Time: 0.1 secs

LMDT Output:

LM
LM
LM
LEAF
LEAF 2

LEAF
LM
LEAF
LEAF 2

Output Statistics:

Number Epochs :
Num Insts seen:
Num Insts trnd:
Number nodes  : 4
Number of LVs : 5
Unique vars   : 6
Ave. vars/lm  : 5.75
Train accuracy:
Test accuracy :
Train errors  : 66.0
Test errors   : 12.0
Time          : 1

It is noticeable that the multivariate decision tree generated by LMDT is much smaller (simpler) than the decision tree generated by See5, although the See5 demo does not provide detailed statistics on its output. Multivariate decision trees have more complex nodes than univariate decision tree nodes. In any case, the results may not be directly comparable because the two programs were run on different data sets of different sizes.

5. Conclusion

The objective of creating a multivariate decision tree algorithm is to overcome the limitation of univariate trees that branch splits must be orthogonal to the variables' axes. Nevertheless, the results demonstrate that for some data sets the bias of a univariate decision tree is more appropriate. This is because LMDT's bias toward finding a multivariate tree may be inappropriate for some tasks: it may fail to find a univariate test when it should. LMDT's variable elimination method is a greedy search procedure, which can get stuck on local maxima. Therefore, although the hypothesis space LMDT searches includes univariate decision trees, the heuristic nature of LMDT's search may result in selecting a test from an inappropriate part of the hypothesis space.

A solution suggested by the authors would be to determine the appropriate bias dynamically for each test in the tree. The perceptron tree algorithm is one example of a system that tries to determine the appropriate representational bias for the instances automatically. Specifically, the algorithm first tries to fit a linear threshold unit (LTU) to the space of the instances. If the space is not linearly separable, then the bias of an LTU [14] is inappropriate and the system searches for the best univariate test.

However, for some instance spaces, the best test may be based on a subset of the variables. A multivariate decision tree algorithm should therefore employ a dynamic control strategy for finding the appropriate representational bias for each test in the decision tree. Specifically, rather than searching the space of multivariate tests using a fixed bias (as LMDT does), such a system would have the capability to focus its search using heuristic measures of the learning process.

Future directions suggested by the authors: Whether a chosen algorithm can induce a good generalization depends on how well the hypothesis space underlying the learning algorithm, and the bias for searching that space, fit the given task. Since different algorithms search different hypothesis spaces, one algorithm can find a better hypothesis than another for some tasks but not for all tasks. For a task about which there is no a priori knowledge of what the appropriate hypothesis space should be, a learning algorithm should itself determine the appropriate bias.

[14] See Utgoff, P. E. and Brodley, C. E., Linear Machine Decision Trees. COINS Technical Report 91-10, January 1991, Department of Computer Science, University of Massachusetts, Amherst, MA.

References

Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, CA.

Brodley, C. E. and Utgoff, P. E. (1992). Multivariate Versus Univariate Decision Trees. COINS Technical Report 92-8, January 1992, Department of Computer Science, University of Massachusetts, Amherst, MA.

Building Classification Models: ID3 and C4.5, at:

Hand, D. J., Mannila, H., and Smyth, P. Principles of Data Mining. MIT Press. Chapters 7 and 10, Score Functions for Data Mining Algorithms and Predictive Modeling for Classification.

See5 demo, at:

Utgoff, P. E. and Brodley, C. E. (1991). Linear Machine Decision Trees. COINS Technical Report 91-10, January 1991, Department of Computer Science, University of Massachusetts, Amherst, MA.


More information

Empirical Evaluation of Feature Subset Selection based on a Real-World Data Set

Empirical Evaluation of Feature Subset Selection based on a Real-World Data Set P. Perner and C. Apte, Empirical Evaluation of Feature Subset Selection Based on a Real World Data Set, In: D.A. Zighed, J. Komorowski, and J. Zytkow, Principles of Data Mining and Knowledge Discovery,

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Supervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...

Supervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning y=f(x): true function (usually not known) D: training

More information

Data Mining Lecture 8: Decision Trees

Data Mining Lecture 8: Decision Trees Data Mining Lecture 8: Decision Trees Jo Houghton ECS Southampton March 8, 2019 1 / 30 Decision Trees - Introduction A decision tree is like a flow chart. E. g. I need to buy a new car Can I afford it?

More information

A Two Stage Zone Regression Method for Global Characterization of a Project Database

A Two Stage Zone Regression Method for Global Characterization of a Project Database A Two Stage Zone Regression Method for Global Characterization 1 Chapter I A Two Stage Zone Regression Method for Global Characterization of a Project Database J. J. Dolado, University of the Basque Country,

More information

Topics in Machine Learning

Topics in Machine Learning Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Lecture 5: Decision Trees (Part II)

Lecture 5: Decision Trees (Part II) Lecture 5: Decision Trees (Part II) Dealing with noise in the data Overfitting Pruning Dealing with missing attribute values Dealing with attributes with multiple values Integrating costs into node choice

More information

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods + CS78: Machine Learning and Data Mining Complexity & Nearest Neighbor Methods Prof. Erik Sudderth Some materials courtesy Alex Ihler & Sameer Singh Machine Learning Complexity and Overfitting Nearest

More information

Logical Rhythm - Class 3. August 27, 2018

Logical Rhythm - Class 3. August 27, 2018 Logical Rhythm - Class 3 August 27, 2018 In this Class Neural Networks (Intro To Deep Learning) Decision Trees Ensemble Methods(Random Forest) Hyperparameter Optimisation and Bias Variance Tradeoff Biological

More information

University of Ghana Department of Computer Engineering School of Engineering Sciences College of Basic and Applied Sciences

University of Ghana Department of Computer Engineering School of Engineering Sciences College of Basic and Applied Sciences University of Ghana Department of Computer Engineering School of Engineering Sciences College of Basic and Applied Sciences CPEN 405: Artificial Intelligence Lab 7 November 15, 2017 Unsupervised Learning

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

Lazy Decision Trees Ronny Kohavi

Lazy Decision Trees Ronny Kohavi Lazy Decision Trees Ronny Kohavi Data Mining and Visualization Group Silicon Graphics, Inc. Joint work with Jerry Friedman and Yeogirl Yun Stanford University Motivation: Average Impurity = / interesting

More information

Algorithms: Decision Trees

Algorithms: Decision Trees Algorithms: Decision Trees A small dataset: Miles Per Gallon Suppose we want to predict MPG From the UCI repository A Decision Stump Recursion Step Records in which cylinders = 4 Records in which cylinders

More information