Rutgers University PhD in Management Program. Expert Systems Prof. Glenn Shafer Fall 2001
Report on Brodley and Utgoff (1992), Multivariate Versus Univariate Decision Trees
Fatima Alali
December 13, 2001
Introduction

Classification trees are used to predict the membership of cases or objects in the classes of a categorical dependent variable from their measurements on one or more predictor variables. A decision tree is either a leaf node containing the name of a class or a decision node containing a test. For each possible outcome of the test there is a branch to a decision tree. To classify a new instance, one starts at the root of the tree and follows the branch indicated by the outcome of each test until a leaf node is reached. The classification is then the name of the class at the leaf. Univariate decision trees test one feature at each node and thus produce larger trees than trees that test multiple features at a node. In a multivariate decision tree, each test can be based on one or more input features. The paper describes and evaluates different multivariate tree construction methods, and presents experiments that support the theoretical analysis. Section 1 provides an overview of the paper. Section 2 provides an overview of the C4.5 and LMDT software used in the empirical applications described in the paper. Section 3 provides an overview of the results obtained in the paper. Section 4 assesses both C4.5 and LMDT on a different data set. Section 5 provides conclusions and future implications.

1. Paper Overview

The paper provides an empirical evaluation of the LMDT algorithm and demonstrates both the need for multivariate tests and the ability of LMDT to uncover linear structure in the data.

1.1 The theory underlying tree models (classification models)

The basic strategy of a classification tree is to recursively split the cells of the space of input variables. A given cell is split by first searching over all variables and all possible thresholds to find the split that leads to the best improvement in a specified score function. The score is assessed on the training data set and then cross-validated using the test data set.
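The split search described above can be sketched as follows, using class entropy as the score function. This is a minimal illustration, not the paper's code; the function names are mine.

```python
# Search all thresholds on one feature for the split that minimizes the
# weighted class entropy of the two resulting cells (lower is purer).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_univariate_split(X, y, feature):
    """Return (threshold, weighted entropy) of the best split on `feature`."""
    best = (None, float("inf"))
    for t in sorted(set(row[feature] for row in X)):
        left = [y[i] for i, row in enumerate(X) if row[feature] <= t]
        right = [y[i] for i, row in enumerate(X) if row[feature] > t]
        if not left or not right:       # degenerate split: skip
            continue
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best

X = [[1.0], [2.0], [3.0], [4.0]]
y = ["a", "a", "b", "b"]
print(best_univariate_split(X, y, 0))   # threshold 2.0 separates the classes
```

A full tree builder would apply this search recursively to each resulting cell, as described above.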
One disadvantage of classification trees is that they are hierarchical (sequential) in nature: the tests in a decision tree are performed sequentially by following the branches of the tree, so only those features that are needed to reach a decision are evaluated. Decision trees use this sequential decision procedure to determine the classification of a new data point. Decision trees have many attractive attributes. They are easy to construct and to understand. They can handle mixed variables (discrete, continuous, numeric, and symbolic), as described below. They can easily partition the space using binary tests (thresholds on real variables). They predict the class value for a new case quickly. However, because decision trees are sequential in nature, they may become overgrown, and an overgrown tree is difficult to understand and use, since it may represent a suboptimal partition of the space of input variables.

1.2 Tree construction issues
There are two stages to univariate and multivariate tree construction alike. First, the tree is built using a set of training instances, each described by features and labeled with a class name. A top-down algorithm chooses the best test to split the training set using some score function.[1] The chosen test is then used to partition the training instances, and a branch is created for each outcome of the test. The algorithm is applied recursively to each resulting partition. If the data in a partition all belong to a single class, a leaf node is created and assigned that class label. Different forms of partition merit criteria can be used to judge the goodness of a split; these commonly appear as impurity and entropy measures.[2]

One concern in tree construction is to avoid overfitting the decision tree to the training data in domains that contain noisy instances.[3] Overfitting occurs when the training data contain noisy instances and the decision tree algorithm induces a classifier that forces every instance in the training set to be classified correctly; when classifying a new instance, such a tree may perform poorly. To avoid overfitting, the second stage of building a tree is to prune it back to eliminate branches that are not statistically valid. At the pruning stage, a leaf node replaces a whole subtree. The replacement takes place if a decision rule establishes that the expected error rate in the subtree is greater than in the single leaf.[4]

Numeric vs. symbolic features

When instances contain features that are symbolic (unordered) as well as numeric (ordered), it is important to map each unordered feature to a numeric feature without imposing any order on the unordered values. For some instances, not all feature values may be available; missing values can be filled in with an estimate of the sample mean after normalization.[5]

Characteristics of Linear Machines and Linear Machine Decision Trees

Breiman et al.
(1984) conducted extensive research on univariate trees. Univariate trees base each test on one input variable and are thus restricted to splits of the instance space that are orthogonal to that variable's axis. This can bias classification, especially when the input variables are numerically related. LMDT tries to overcome this problem: it constructs each test in a decision tree by training a linear machine and then eliminating irrelevant and noisy variables in a controlled manner. LMDT achieves this through two properties. First, the method by which LMDT finds and eliminates noisy and irrelevant variables is computationally efficient. Second, the linear

[1] The score function for a predictive model is a function of the difference between the predictions obtained from the model for each individual input and the targets, which are the response variables; it can be represented by the sum of squared errors between model predictions and actual target measurements. See Principles of Data Mining, Chapter 7, Score Functions for Data Mining Algorithms.
[2] Brodley and Utgoff (1990), Multivariate decision trees.
[3] A noisy instance is one for which the class label is incorrect, some of the attribute values are incorrect, or both. Brodley and Utgoff (1990).
[4] Multivariate decision trees. Brodley and Utgoff (1990).
[5] Normalization: at each node in the tree, each encoded symbolic and numeric feature is normalized by mapping it to standard normal form ~ N(0, 1).
machine learning approach enables LMDT to find a good partition of the instance space whether or not the instance space is linearly separable. LMDT can handle instances described by numeric and symbolic variables in which some of the values may be missing. LMDT encodes symbolic variables as numeric variables, ensuring that no order is placed on them. The encoded variables are then normalized at each node, so that the relative importance of the encoded variables can be gauged by the magnitudes of the corresponding weights. LMDT sets missing variables to zero, which corresponds to the sample mean of the corresponding encoded, normalized variable. Encoding information is computed at each node and retained in the tree for later use in classification.

The authors describe how the elimination of noisy and irrelevant variables is done. When LMDT detects that a linear machine is near its final set of boundaries, it eliminates the variables that contribute least to discriminating the set of instances at that node and then continues training the linear machine. During the process of eliminating variables, the most accurate linear machine found is saved; the test for the decision node is this saved linear machine. A linear machine based on fewer variables is preferred in two cases. First, when the accuracy of a linear machine based on fewer variables is higher than the best accuracy observed thus far, or when the drop in accuracy is not significantly different from the best accuracy as measured by a t-test at the .01 level of significance, the linear machine based on fewer variables is saved; if its accuracy is higher, the system updates its value for the best accuracy observed thus far. Second, to avoid underfitting the data, the algorithm eliminates variables until the number of instances is greater than the capacity of the hyperplane.
In this case, if the number of unique instances is not twice the dimensionality of each instance, then the linear machine with fewer variables is preferred. The authors measure a variable's contribution to the ability to discriminate by using a measure of the dispersion of its weights over the set of classes. Variables whose weights are widely dispersed achieve two objectives. First, a large-magnitude weight causes the corresponding variable to make a large contribution to the value of the discriminant function (discriminability). Second, a variable whose weights are widely spaced makes different contributions to the value of the discriminant function of each class. LMDT computes the dispersion of each variable as the average squared distance between its weights for each pair of classes, and then eliminates the variable with the smallest dispersion.

As for the variable selection criterion, LMDT performs a sequential backward selection (SBS) search for a good combination of features. SBS starts with all of the initial features and tries to remove the feature that will cause the smallest decrease in accuracy as measured by a criterion function.[6] In LMDT, this criterion function removes the feature with the lowest weight dispersion. LMDT recalculates the weights after each elimination.

[6] A criterion function is a figure of merit reflecting the amount of classification information conveyed by a feature.
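The dispersion measure described above can be sketched as follows. This is an illustrative computation with made-up weight values, not LMDT's implementation.

```python
# For each variable, compute the average squared distance between its
# weights across every pair of classes; SBS-style elimination then drops
# the variable with the smallest dispersion.
from itertools import combinations

def dispersion(W, var):
    """W[c][var] is class c's weight for the given variable."""
    pairs = list(combinations(range(len(W)), 2))
    return sum((W[i][var] - W[j][var]) ** 2 for i, j in pairs) / len(pairs)

def least_discriminating(W):
    """Index of the variable that would be eliminated next."""
    n_vars = len(W[0])
    return min(range(n_vars), key=lambda v: dispersion(W, v))

# Three classes, two variables: variable 0 has nearly identical weights
# across classes, so it discriminates least and is eliminated first.
W = [[0.1, 2.0],
     [0.1, -1.5],
     [0.2, 0.5]]
print(least_discriminating(W))  # -> 0
```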
LMDT is able to overcome the "L" problem demonstrated in the paper. As the linear machine at the root begins to move toward one of the segments, misclassified instances from the other segment have a decreasing effect, allowing the linear machine to find one of the segments without being misled by the distant points. The L problem illustrates situations where permitting multivariate splits enables a decision tree algorithm to induce a better generalization than using only univariate splits, so that the multivariate bias is the appropriate one. A decision tree that permits only univariate splits would require a large number of tests to classify the training instances correctly. In univariate algorithms, an increase in the number of instances clustered near a separating hyperplane increases the number of splits necessary to classify the data correctly; however, the increase in the number of splits does not ensure an increase in accuracy on previously unseen points.

2. C4.5 and LMDT descriptions

In this section I first describe the C4.5 software, developed by Quinlan as a descendant of the ID3 algorithm for inducing classification models from data. I then describe the LMDT algorithm for building multivariate decision trees.

2.1 C4.5 and the ID3 algorithm

The basic idea behind ID3 is that each node in the decision tree corresponds to the non-categorical attribute that is most informative among the attributes not yet considered on the path from the root. Entropy is used to measure how informative an attribute is. Below is a brief description of the ID3 algorithm, which builds a decision tree given a set of non-categorical attributes C1, C2, ..., Cn, the categorical attribute C, and a training set T of records.
    function ID3 (R: a set of non-categorical attributes,
                  C: the categorical attribute,
                  S: a training set) returns a decision tree;
    begin
        If S is empty, return a single node with value Failure;
        If S consists of records all with the same value for the
            categorical attribute, return a single node with that value;
        If R is empty, then return a single node whose value is the most
            frequent of the values of the categorical attribute found in
            records of S [note that there will then be errors, that is,
            records that will be improperly classified];
        Let D be the attribute with largest Gain(D, S) among attributes in R;
        Let {dj | j = 1, 2, ..., m} be the values of attribute D;
        Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting
            respectively of records with value dj for attribute D;
        Return a tree with root labeled D and arcs labeled d1, d2, ..., dm
            going respectively to the trees
            ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), ..., ID3(R-{D}, C, Sm);
    end ID3;

C4.5 is an extension of ID3 that handles attributes with missing values and continuous attributes, and adds pruning procedures and rule derivation.[7]

2.2 Linear Machine Decision Trees (as described in the paper)

The LMDT algorithm builds a multiclass, multivariate decision tree using the top-down approach described above. For each decision node, LMDT trains a linear machine, based on a subset of the input variables, which then serves as the multivariate test for the decision node. A linear machine (LM) is a multiclass linear discriminant that itself classifies an instance: it is a set of R linear discriminant functions that are used together to assign an instance to one of the R classes. Let Y be an instance description (a pattern vector) consisting of a constant threshold value 1 and the numerically encoded features, and let g_i(Y) be a discriminant function of the form W_i . Y, where W_i is a vector of adjustable coefficients known as weights. A linear machine infers that instance Y belongs to class i if and only if g_i(Y) > g_j(Y) for all j /= i.

One way to train a linear machine is the absolute error correction rule, which adjusts W_i, where i is the class to which the instance belongs, and W_j, where j is the class to which the linear machine incorrectly assigns the instance. The correction is W_i <- W_i + cY and W_j <- W_j - cY, where c is the smallest integer greater than (W_j - W_i) . Y / (2 Y . Y), so that the updated linear machine classifies the instance correctly. When the instances are linearly separable, cycling through the instances allows the linear machine to partition them into separate convex regions. When the instances are not linearly separable, error correction may not cease, and the classification accuracy becomes unpredictable. To overcome this problem, the authors use a thermal perceptron,[8] which they call a thermal linear machine.
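The linear machine and the absolute error correction rule described above can be sketched as follows. This is a minimal illustration of the update rule, not the LMDT implementation; the data and weight layout are mine.

```python
# A linear machine: one weight vector per class; predict the class whose
# discriminant W_i . Y is largest. On a mistake, move the true class's
# weights toward Y and the predicted class's away, with c the smallest
# integer that makes the corrected machine classify Y correctly.
import math

def predict(W, Y):
    scores = [sum(w * y for w, y in zip(Wi, Y)) for Wi in W]
    return scores.index(max(scores))

def train(W, data, epochs=50):
    for _ in range(epochs):
        for Y, true in data:
            pred = predict(W, Y)
            if pred != true:
                # k = (W_pred - W_true) . Y  (non-negative on a mistake)
                k = sum((W[pred][d] - W[true][d]) * Y[d] for d in range(len(Y)))
                yy = sum(y * y for y in Y)
                c = math.floor(k / (2 * yy)) + 1  # smallest correcting integer
                for d in range(len(Y)):
                    W[true][d] += c * Y[d]
                    W[pred][d] -= c * Y[d]
    return W

# Two classes in 2-D; the first component is the constant threshold input 1.
data = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
W = train([[0.0, 0.0], [0.0, 0.0]], data)
print([predict(W, Y) for Y, _ in data])  # -> [0, 0, 1, 1]
```

Because this toy data set is linearly separable, cycling through the instances converges; on non-separable data this rule may never settle, which is what motivates the thermal variant discussed next in the report.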
This approach solves the problem of a single large error far from the decision boundary by using c = B / (B + k), where k = (W_j - W_i) . Y / (2 Y . Y), and annealing B during training. In addition, it solves the problem of a misclassified instance lying very close to the decision boundary by also annealing c by B, giving the correction coefficient c = B^2 / (B + k). The model reduces B geometrically by a rate a and arithmetically by a constant b; this lets the algorithm spend more time training with small values of B when it is refining the location of the decision boundary. B is reduced only when the magnitude of the linear machine[9] decreased during the current weight adjustment and increased during the previous adjustment.

[7] Building Classification Models: ID3 and C4.5, at:
[8] The thermal perceptron, developed by Frean (1990), provides stable behavior when instances are not linearly separable; Frean addresses both of the problems above.
[9] The magnitude of a linear machine is defined as the sum of the magnitudes of its constituent weight vectors.
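The thermal correction coefficient and annealing schedule described above can be sketched as follows. The values of B, a, and b here are assumed for illustration; the report does not give the settings used.

```python
# Thermal correction coefficient c = B^2 / (B + k): large k (a far-away
# error) and small B (late in training) both shrink the correction, so
# distant outliers and near-boundary noise stop dominating the updates.
def thermal_c(beta, k):
    return beta * beta / (beta + k)

beta, a, b = 2.0, 0.99, 0.0005   # initial temperature; assumed anneal rates
for step in range(3):
    c = thermal_c(beta, k=1.0)
    print(round(c, 4))
    beta = a * beta - b           # reduce geometrically by a, arithmetically by b
```

Note that c shrinks toward zero as beta cools, so the linear machine's weights eventually stop moving even on non-separable data.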
3. Overview of results obtained in Brodley and Utgoff's paper (1992)

The paper applies LMDT across a variety of large data sets, covering symbolic, numeric, and logical attributes and both binary and multiclass tasks. It compares the LMDT approach to the univariate C4.5 approach across these tasks to investigate the circumstances under which the bias of multivariate trees (and thus LMDT's search bias for finding such a tree) is more appropriate.

3.1 Data description

Six data sets are used to represent a mix of symbolic and/or numeric attributes, missing values, binary class tasks, and multiclass tasks. The Cleveland data set consists of 303 patient diagnoses (presence or absence of heart disease). The Glass data set contains different glass samples taken from the scene of an accident. The Iris data set contains both linearly separable and non-linearly separable tasks. The Letter Recognition data set asks whether a rectangular black-and-white pixel display represents one of the 26 capital letters of the English alphabet. The Pixel Segmentation data set segments an image into seven classes. The Votes data set is used to classify each member of Congress in 1984 as Republican or Democrat using their votes on key issues.[10]

3.2 Terminology

Number of classes: the number of output classification classes. Performance measures for each task are reported; each reported measure is the average of ten runs. To estimate the true error rate, for five of the domains the authors performed a ten-fold cross validation for each run. The data were split randomly for each run, with the same split used for both algorithms. The measures reported are:

o Unique attributes: the number of the original input attributes that ever need to be evaluated somewhere in the tree.
o Nodes: the number of test nodes in the tree (for LMDT, the number of linear machines).
o Leaves: the total number of leaves in the tree.
o Average variables per LM: the average number of encoded variables per linear machine.
o Epochs: the number of epochs needed to converge to a tree that classifies the training instances correctly. An epoch equals the number of instances in the training set.
o Bits: the number of bits needed to represent the classifier.
o Accuracy: the percentage of the test instances classified correctly. If the difference in test-set accuracy between the two algorithms is statistically significant, the paper highlights the higher accuracy in boldface type. The test for significance is a t-test at the .01 level of significance.

[10] For more details on the data sets (domains), refer to Brodley and Utgoff (1992).
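The evaluation protocol above (ten-fold cross validation with the same random split reused for both algorithms) can be sketched as follows; the seed and fold construction are illustrative assumptions.

```python
# Build ten disjoint test folds covering all n instances; fixing the seed
# lets both algorithms be evaluated on identical splits, as in the paper.
import random

def ten_fold_indices(n, seed=0):
    """Return 10 disjoint index lists that together cover range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

folds = ten_fold_indices(303)   # e.g. the 303-case Cleveland data set
# For each fold: train each algorithm on the other nine folds, test on the
# held-out fold, and average the reported measures over the ten runs.
print(len(folds), sum(len(f) for f in folds))
```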
3.3 Results

The time required by LMDT is expected to be longer than that required by C4.5 because the hypothesis space for multivariate decision trees is larger than the hypothesis space for univariate decision trees. To compare the difference, the authors report both the number of instances used to update the linear machine and the number of instances observed. For both algorithms, the authors count the number of times each training instance is examined. All training instances are examined at the root of the tree, but at each subtree the algorithm examines only a portion of the instances. The number reported in the paper is the sum of the number of instances observed at each node in the tree, divided by the size of the training set. This count is fair because, although C4.5 may examine only part of each instance while searching for a test at a subtree, the same is true for LMDT.

It is not meaningful to compare the sizes of the resulting trees using measures such as the number of nodes or the number of leaves, because an LMDT node can be far more complex than a C4.5 node. To compare tree sizes, the authors use the Minimum Description Length Principle (MDLP), which states that the best hypothesis to induce from a data set is the one that minimizes the length of the hypothesis plus the length of the data when coded using the hypothesis to predict the data. Here the hypothesis is the decision tree and the data is the training set; the best hypothesis is the one that can represent the data with the fewest bits. To represent the data, both the tree and the error vector must be coded.[11]

The results show that LMDT finds trees for the Cleveland and Letter Recognition tasks that are statistically significantly more accurate than those C4.5 finds, whereas C4.5 finds more accurate trees for the Glass and Votes tasks. The differences in accuracy for the Iris and Pixel Segmentation tasks are not significant.
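The MDLP comparison described above can be sketched as follows. The encodings here are simplified stand-ins, not the paper's Appendix A codings, and the bit counts are made-up examples.

```python
# Description length = bits(tree) + bits(errors). A crude error coding:
# announce the error count e (log2(n+1) bits), then identify which e of
# the n training instances are misclassified (log2 C(n, e) bits).
from math import log2, comb

def error_bits(n, e):
    return log2(n + 1) + (log2(comb(n, e)) if e else 0.0)

def description_length(tree_bits, n, errors):
    return tree_bits + error_bits(n, errors)

# A big, very accurate tree vs. a small, sloppier one over 400 instances:
print(round(description_length(tree_bits=900, n=400, errors=5), 1))
print(round(description_length(tree_bits=250, n=400, errors=19), 1))
```

Under this coding the smaller tree wins despite its extra errors, which is the trade-off the MDLP formalizes.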
The sizes of the trees, as measured by the number of bits required to code them, are not consistent with the MDLP; the authors explain that their data codings are not provably optimal.

4. Empirical assessment of univariate and multivariate decision trees using C4.5 and LMDT on different data sets

This section uses the C4.5 and LMDT software to provide empirical results on the performance of the software, and to show how decision trees based on univariate tests differ from multivariate trees.

Data limitations and software restrictions: In this part of the report, I use different data sets to compare the two algorithms. The original version of C4.5 has restricted access, so I instead ran the See5[12] demo, which is limited to small datasets (up to 400 cases for See5/C5.0) but incorporates all the features of C4.5 and of C5.0, the updated version of C4.5. The data set used for LMDT is the one distributed with the software, so I have used two different data sets, one with each program. Because the training sets are only roughly comparable, owing to their limited attributes, the results below are meaningful only with this limitation in mind. Changes have also been made in three

[11] Appendix A of the paper provides the coding procedures. See Brodley and Utgoff (1992).
[12] See5 is downloadable at
revisions: (1) the weight training algorithm was changed to run thermal training 10 times, picking the set of weights that allows the LM to maximize the selection criterion (information gain or accuracy); (2) the capability was added for LMDT not to discard features (the user chooses this option with the -k parameter); and (3) the capability was added for the user to choose accuracy as the selection criterion (chosen with the -y parameter).[13]

Below are the outputs of See5 and LMDT, respectively.

See5:

See5 [Release 1.15] Wed Dec 12 15:50:
** This demonstration version cannot process **
** more than 400 training or test cases. **

Read 400 cases (35 attributes) from soybean.data

Decision tree:

int-discolor = brown: brown-stem-rot (31.4/1.4)
int-discolor = black: charcoal-rot (10.5/0.5)
int-discolor = none:
:...plant-growth = norm:
:...leafspot-size = N/A:
: :...canker-lesion = N/A: powdery-mildew (16.3/2.3)
: : canker-lesion = brown: anthracnose (4.2/0.2)
: : canker-lesion = dk-brown-blk: anthracnose (13.8/0.8)
: : canker-lesion = tan: purple-seed-stain (6.4/0.4)
: leafspot-size = lt-1/8:
: :...canker-lesion in brown,dk-brown-blk: bacterial-blight (0)
: : canker-lesion = tan: purple-seed-stain (7)
: : canker-lesion = N/A:
: : :...leafspots-marg = no-w-s-marg: bacterial-pustule (8.4/0.4)
: : leafspots-marg = w-s-marg:
: : :...seed-size = norm: bacterial-blight (11)
: : seed-size = lt-norm: bacterial-pustule (2.6/0.6)
: leafspot-size = gt-1/8:
: :...mold-growth = present:
: :...leaves = norm: diaporthe-pod-&-stem-blight (5.7)
: : leaves = abnorm: downy-mildew (11)
: mold-growth = absent:
: :...fruit-pods = few-present: brown-spot (0)
: fruit-pods = diseased: frog-eye-leaf-spot (28/1)
: fruit-pods = norm:
: :...fruiting-bodies = present: brown-spot (27)
: fruiting-bodies = absent:
: :...date = april: brown-spot (1)
: date = may: brown-spot (14/1)
: date = october: alternarialeaf-spot (18/1)
: date = june:
: :...precip = lt-norm: phyllosticta-leaf-spot (2)
: : precip = norm:
phyllosticta-leaf-spot (1)
: : precip = gt-norm: brown-spot (11)
: date = july: [S1]

[13] LMDT documentation provided by Carla E. Brodley, Version 2, 9/4/94.
: date = august:
: :...severity = severe: alternarialeaf-spot (0)
: : severity = minor: frog-eye-leaf-spot (3)
: : severity = pot-severe: alternarialeaf-spot (11/3)
: date = september:
: :...stem = norm: alternarialeaf-spot (26/2)
: stem = abnorm: frog-eye-leaf-spot (2)
plant-growth = abnorm:
:...leaves = norm: rhizoctonia-root-rot (13)
leaves = abnorm:
:...stem = abnorm:
:...plant-stand = normal:
: :...seed = norm: diaporthe-stem-canker (13.1/0.1)
: : seed = abnorm: anthracnose (4)
: plant-stand = lt-normal:
: :...fruiting-bodies = absent:
: :...area-damaged = scattered: anthracnose (1.9/0.9)
: : area-damaged = low-areas: phytophthora-rot (47.5/0.1)
: : area-damaged = upper-areas: 2-4-d-injury (0.1)
: : area-damaged = whole-field: herbicide-injury (2.6/1.2)
: fruiting-bodies = present:
: :...roots = galls-cysts: phytophthora-rot (0)
: roots = norm: anthracnose (4)
: roots = rotted: phytophthora-rot (8.2/0.6)
stem = norm:
:...seed = abnorm: cyst-nematode (11/1.1)
seed = norm:
:...leafspot-size = N/A: 2-4-d-injury (0.1)
leafspot-size = lt-1/8: bacterial-blight (2)
leafspot-size = gt-1/8:
:...leaf-shread = present: phyllosticta-leaf-spot (3)
leaf-shread = absent:
:...date in june,september,october: brown-spot (0)
date = april: brown-spot (2)
date = may: brown-spot (2)
date = july: frog-eye-leaf-spot (3/1)
date = august: frog-eye-leaf-spot (1)

SubTree [S1]

area-damaged = scattered: frog-eye-leaf-spot (3/1)
area-damaged = low-areas: brown-spot (2/1)
area-damaged = upper-areas: phyllosticta-leaf-spot (3)
area-damaged = whole-field: brown-spot (1)

Evaluation on training data (400 cases): decision tree size 45, 19 errors (4.8%).
Classes: (a) diaporthe-stem-canker, (b) charcoal-rot, (c) rhizoctonia-root-rot, (d) phytophthora-rot, (e) brown-stem-rot, (f) powdery-mildew, (g) downy-mildew, (h) brown-spot, (i) bacterial-blight, (j) bacterial-pustule, (k) purple-seed-stain, (l) anthracnose, (m) phyllosticta-leaf-spot, (n) alternarialeaf-spot, (o) frog-eye-leaf-spot, (p) diaporthe-pod-&-stem-blight, (q) cyst-nematode, (r) 2-4-d-injury, (s) herbicide-injury.

[The per-class confusion matrices for the training and test data appeared here; their column alignment was lost in extraction, and only the class key above is recoverable.]

Evaluation on test data (233 cases): decision tree size 45, 33 errors (14.2%).

Time: 0.1 secs

LMDT Output:

LM
LM
LM
LEAF
LEAF 2
LEAF
LM
LEAF
LEAF 2

Output Statistics:

Number Epochs :
Num Insts seen:
Num Insts trnd:
Number nodes : 4
Number of LVs : 5
Unique vars : 6
Ave. vars/lm : 5.75
Train accuracy:
Test accuracy :
Train errors : 66.0
Test errors : 12.0
Time : 1

It is noticeable that the multivariate decision tree generated by LMDT is much smaller (simpler) than the tree generated by See5, although multivariate nodes are more complex than univariate nodes. The See5 demo does not provide detailed statistics on its output. The results may not be directly comparable, however, because the two programs were run on different data sets of different sizes.

5. Conclusion

The objective of a multivariate decision tree algorithm is to overcome the limitation of univariate tests, whose splits are orthogonal to the variables' axes. Nevertheless, the results demonstrate that for some data sets the bias of the univariate decision tree is more appropriate. LMDT's bias for finding a multivariate tree may be inappropriate for some tasks because it may not find a univariate test when it should: LMDT's variable elimination method is a greedy search procedure, which can get stuck on local maxima. Therefore, although the hypothesis space LMDT searches includes univariate decision trees, the heuristic nature of LMDT's search may result in selecting a test from an inappropriate part of the hypothesis space. A solution suggested by the authors would be to determine the appropriate bias dynamically for each test in the tree. The perceptron tree algorithm is one example of a system that tries to determine the appropriate representational bias for the instances automatically. Specifically, the algorithm first tries to fit a linear threshold unit (LTU) to the space of the instances. If the
space is not linearly separable, then the bias of an LTU[14] is inappropriate and the system searches for the best univariate test. However, for some instance spaces the best test may be based on a subset of the variables. A multivariate decision tree algorithm should therefore employ a dynamic control strategy for finding the appropriate representational bias for each test in the decision tree. Specifically, rather than searching the space of multivariate tests with a fixed bias (as LMDT does), such a system would have the capability to focus its search using heuristic measures of the learning process.

Future directions suggested by the authors: Whether a chosen algorithm can induce a good generalization depends on how well the hypothesis space underlying the learning algorithm, and the bias for searching that space, fit the given task. Because different algorithms search different hypothesis spaces, one algorithm can find a better hypothesis than another for some tasks but not for all. Given a task for which there is no a priori knowledge of the appropriate hypothesis space, a learning algorithm should itself determine the appropriate bias.

[14] See Utgoff, P. E. and Brodley, C. E., Linear Machine Decision Trees. COINS Technical Report 91-10, January 1991, Department of Computer Science, University of Massachusetts, Amherst, MA.
References

Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, CA.

Brodley, C. E. and Utgoff, P. E. (1992). Multivariate Versus Univariate Decision Trees. COINS Technical Report 92-8, January 1992, Department of Computer Science, University of Massachusetts, Amherst, MA.

Building Classification Models: ID3 and C4.5. At:

Hand, D. J., Smyth, P., and Mannila, H. Principles of Data Mining. MIT Press. Chapters 7 and 10, Score Functions for Data Mining Algorithms and Predictive Modeling for Classification.

See5 Demo. At:

Utgoff, P. E. and Brodley, C. E. (1991). Linear Machine Decision Trees. COINS Technical Report 91-10, January 1991, Department of Computer Science, University of Massachusetts, Amherst, MA.
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees colour=green? size>20cm? colour=red? watermelon size>5cm? size>5cm? colour=yellow? apple
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationA Systematic Overview of Data Mining Algorithms
A Systematic Overview of Data Mining Algorithms 1 Data Mining Algorithm A well-defined procedure that takes data as input and produces output as models or patterns well-defined: precisely encoded as a
More informationDecision Tree CE-717 : Machine Learning Sharif University of Technology
Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees colour=green? size>20cm? colour=red? watermelon size>5cm? size>5cm? colour=yellow? apple
More informationCS Machine Learning
CS 60050 Machine Learning Decision Tree Classifier Slides taken from course materials of Tan, Steinbach, Kumar 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K
More informationFuzzy Partitioning with FID3.1
Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationDecision trees. Decision trees are useful to a large degree because of their simplicity and interpretability
Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful
More informationData Mining. Decision Tree. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Decision Tree Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 24 Table of contents 1 Introduction 2 Decision tree
More informationThe digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand).
http://waikato.researchgateway.ac.nz/ Research Commons at the University of Waikato Copyright Statement: The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). The thesis
More informationRandom Forest A. Fornaser
Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More informationData Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3
Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More informationCOMP 465: Data Mining Classification Basics
Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised
More information5 Learning hypothesis classes (16 points)
5 Learning hypothesis classes (16 points) Consider a classification problem with two real valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationMidterm Examination CS540-2: Introduction to Artificial Intelligence
Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search
More informationMIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA
Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationClassification with Decision Tree Induction
Classification with Decision Tree Induction This algorithm makes Classification Decision for a test sample with the help of tree like structure (Similar to Binary Tree OR k-ary tree) Nodes in the tree
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationNominal Data. May not have a numerical representation Distance measures might not make sense. PR and ANN
NonMetric Data Nominal Data So far we consider patterns to be represented by feature vectors of real or integer values Easy to come up with a distance (similarity) measure by using a variety of mathematical
More informationA fuzzy k-modes algorithm for clustering categorical data. Citation IEEE Transactions on Fuzzy Systems, 1999, v. 7 n. 4, p.
Title A fuzzy k-modes algorithm for clustering categorical data Author(s) Huang, Z; Ng, MKP Citation IEEE Transactions on Fuzzy Systems, 1999, v. 7 n. 4, p. 446-452 Issued Date 1999 URL http://hdl.handle.net/10722/42992
More informationWhat is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.
What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem
More informationEnsemble Methods, Decision Trees
CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm
More informationExtra readings beyond the lecture slides are important:
1 Notes To preview next lecture: Check the lecture notes, if slides are not available: http://web.cse.ohio-state.edu/~sun.397/courses/au2017/cse5243-new.html Check UIUC course on the same topic. All their
More informationData Mining Classification - Part 1 -
Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model
More informationUSING CONVEX PSEUDO-DATA TO INCREASE PREDICTION ACCURACY
1 USING CONVEX PSEUDO-DATA TO INCREASE PREDICTION ACCURACY Leo Breiman Statistics Department University of California Berkeley, CA 94720 leo@stat.berkeley.edu ABSTRACT A prediction algorithm is consistent
More informationData Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification
Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationCS229 Lecture notes. Raphael John Lamarre Townshend
CS229 Lecture notes Raphael John Lamarre Townshend Decision Trees We now turn our attention to decision trees, a simple yet flexible class of algorithms. We will first consider the non-linear, region-based
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationInternational Journal of Software and Web Sciences (IJSWS)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
More informationOblique Linear Tree. 1. Introduction
Oblique Linear Tree João Gama LIACC, FEP - University of Porto Rua Campo Alegre, 823 4150 Porto, Portugal Phone: (+351) 2 6001672 Fax: (+351) 2 6003654 Email: jgama@ncc.up.pt WWW: http//www.up.pt/liacc/ml
More informationLogistic Model Tree With Modified AIC
Logistic Model Tree With Modified AIC Mitesh J. Thakkar Neha J. Thakkar Dr. J.S.Shah Student of M.E.I.T. Asst.Prof.Computer Dept. Prof.&Head Computer Dept. S.S.Engineering College, Indus Engineering College
More informationSupervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.
Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce
More informationDecision Trees Oct
Decision Trees Oct - 7-2009 Previously We learned two different classifiers Perceptron: LTU KNN: complex decision boundary If you are a novice in this field, given a classification application, are these
More informationUsing Pairs of Data-Points to Define Splits for Decision Trees
Using Pairs of Data-Points to Define Splits for Decision Trees Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Ontario, M5S la4, Canada hinton@cs.toronto.edu Michael Revow
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationCombined Weak Classifiers
Combined Weak Classifiers Chuanyi Ji and Sheng Ma Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute, Troy, NY 12180 chuanyi@ecse.rpi.edu, shengm@ecse.rpi.edu Abstract
More informationMachine Learning. Decision Trees. Manfred Huber
Machine Learning Decision Trees Manfred Huber 2015 1 Decision Trees Classifiers covered so far have been Non-parametric (KNN) Probabilistic with independence (Naïve Bayes) Linear in features (Logistic
More information8. Tree-based approaches
Foundations of Machine Learning École Centrale Paris Fall 2015 8. Tree-based approaches Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationDecision tree learning
Decision tree learning Andrea Passerini passerini@disi.unitn.it Machine Learning Learning the concept Go to lesson OUTLOOK Rain Overcast Sunny TRANSPORTATION LESSON NO Uncovered Covered Theoretical Practical
More informationMachine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.7. Decision Trees Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 /
More informationClassification: Linear Discriminant Functions
Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions
More informationarxiv: v1 [stat.ml] 25 Jan 2018
arxiv:1801.08310v1 [stat.ml] 25 Jan 2018 Information gain ratio correction: Improving prediction with more balanced decision tree splits Antonin Leroux 1, Matthieu Boussard 1, and Remi Dès 1 1 craft ai
More informationNotes based on: Data Mining for Business Intelligence
Chapter 9 Classification and Regression Trees Roger Bohn April 2017 Notes based on: Data Mining for Business Intelligence 1 Shmueli, Patel & Bruce 2 3 II. Results and Interpretation There are 1183 auction
More informationHIMIC : A Hierarchical Mixed Type Data Clustering Algorithm
HIMIC : A Hierarchical Mixed Type Data Clustering Algorithm R. A. Ahmed B. Borah D. K. Bhattacharyya Department of Computer Science and Information Technology, Tezpur University, Napam, Tezpur-784028,
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More informationBest First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis
Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction
More informationMLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms
MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms 1 Introduction In supervised Machine Learning (ML) we have a set of data points
More informationNominal Data. May not have a numerical representation Distance measures might not make sense PR, ANN, & ML
Decision Trees Nominal Data So far we consider patterns to be represented by feature vectors of real or integer values Easy to come up with a distance (similarity) measure by using a variety of mathematical
More informationThe Basics of Decision Trees
Tree-based Methods Here we describe tree-based methods for regression and classification. These involve stratifying or segmenting the predictor space into a number of simple regions. Since the set of splitting
More informationLecture 2 :: Decision Trees Learning
Lecture 2 :: Decision Trees Learning 1 / 62 Designing a learning system What to learn? Learning setting. Learning mechanism. Evaluation. 2 / 62 Prediction task Figure 1: Prediction task :: Supervised learning
More informationCHAPTER 4 DETECTION OF DISEASES IN PLANT LEAF USING IMAGE SEGMENTATION
CHAPTER 4 DETECTION OF DISEASES IN PLANT LEAF USING IMAGE SEGMENTATION 4.1. Introduction Indian economy is highly dependent of agricultural productivity. Therefore, in field of agriculture, detection of
More informationData Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank Implementation: Real machine learning schemes Decision trees Classification
More informationLecture 10 September 19, 2007
CS 6604: Data Mining Fall 2007 Lecture 10 September 19, 2007 Lecture: Naren Ramakrishnan Scribe: Seungwon Yang 1 Overview In the previous lecture we examined the decision tree classifier and choices for
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationMachine Learning Models for Pattern Classification. Comp 473/6731
Machine Learning Models for Pattern Classification Comp 473/6731 November 24th 2016 Prof. Neamat El Gayar Neural Networks Neural Networks Low level computational algorithms Learn by example (no required
More informationLecture outline. Decision-tree classification
Lecture outline Decision-tree classification Decision Trees Decision tree A flow-chart-like tree structure Internal node denotes a test on an attribute Branch represents an outcome of the test Leaf nodes
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationImproving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique
Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,
More informationSegmentation of Images
Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a
More informationClassification with PAM and Random Forest
5/7/2007 Classification with PAM and Random Forest Markus Ruschhaupt Practical Microarray Analysis 2007 - Regensburg Two roads to classification Given: patient profiles already diagnosed by an expert.
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More informationEmpirical Evaluation of Feature Subset Selection based on a Real-World Data Set
P. Perner and C. Apte, Empirical Evaluation of Feature Subset Selection Based on a Real World Data Set, In: D.A. Zighed, J. Komorowski, and J. Zytkow, Principles of Data Mining and Knowledge Discovery,
More informationEquation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.
Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationSupervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
Supervised Learning Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning y=f(x): true function (usually not known) D: training
More informationData Mining Lecture 8: Decision Trees
Data Mining Lecture 8: Decision Trees Jo Houghton ECS Southampton March 8, 2019 1 / 30 Decision Trees - Introduction A decision tree is like a flow chart. E. g. I need to buy a new car Can I afford it?
More informationA Two Stage Zone Regression Method for Global Characterization of a Project Database
A Two Stage Zone Regression Method for Global Characterization 1 Chapter I A Two Stage Zone Regression Method for Global Characterization of a Project Database J. J. Dolado, University of the Basque Country,
More informationTopics in Machine Learning
Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationLecture 5: Decision Trees (Part II)
Lecture 5: Decision Trees (Part II) Dealing with noise in the data Overfitting Pruning Dealing with missing attribute values Dealing with attributes with multiple values Integrating costs into node choice
More informationCS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods
+ CS78: Machine Learning and Data Mining Complexity & Nearest Neighbor Methods Prof. Erik Sudderth Some materials courtesy Alex Ihler & Sameer Singh Machine Learning Complexity and Overfitting Nearest
More informationLogical Rhythm - Class 3. August 27, 2018
Logical Rhythm - Class 3 August 27, 2018 In this Class Neural Networks (Intro To Deep Learning) Decision Trees Ensemble Methods(Random Forest) Hyperparameter Optimisation and Bias Variance Tradeoff Biological
More informationUniversity of Ghana Department of Computer Engineering School of Engineering Sciences College of Basic and Applied Sciences
University of Ghana Department of Computer Engineering School of Engineering Sciences College of Basic and Applied Sciences CPEN 405: Artificial Intelligence Lab 7 November 15, 2017 Unsupervised Learning
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationLazy Decision Trees Ronny Kohavi
Lazy Decision Trees Ronny Kohavi Data Mining and Visualization Group Silicon Graphics, Inc. Joint work with Jerry Friedman and Yeogirl Yun Stanford University Motivation: Average Impurity = / interesting
More informationAlgorithms: Decision Trees
Algorithms: Decision Trees A small dataset: Miles Per Gallon Suppose we want to predict MPG From the UCI repository A Decision Stump Recursion Step Records in which cylinders = 4 Records in which cylinders
More information