LIACC
Machine Learning group

Internal Report n.

Sequence-based Methods for Pruning Regression Trees

by Luís Torgo

20/4/98

Sequence-based Methods for Pruning Regression Trees

Luís Torgo
LIACC - University of Porto
R. Campo Alegre, Porto, Portugal
ltorgo@ncc.up.pt
URL :  Phone : (+351)  Fax : (+351)

ABSTRACT: Pruning has been considered the key issue in obtaining reliable tree-based models. Current approaches to this task have taken two distinct pathways. On the one hand, there are methods based on selection from a sequence of candidate pruned trees. On the other hand, there are algorithms that follow a bottom-up approach, pruning unreliable lower branches of the unpruned tree and thus producing a single tree model. These two approaches entail different trade-offs between accuracy and interpretability. This paper presents a comparative study between representatives of both approaches. The study is carried out in the context of regression trees, with a focus on sequence-based methods. We describe some existing algorithms and present some new variants. We claim that sequence-based methods provide an additional benefit by allowing the user to easily select a model with the intended degree of interpretability. Moreover, our study reveals that this is achieved without sacrificing accuracy.

KEYWORDS: Tree pruning methods, regression trees.

1 Introduction

Recursive partitioning is the algorithm behind most decision tree induction systems. It provides computational efficiency at the cost of statistical unreliability in the lower branches of the trees. Moreover, noisy domains often lead to the well-known problem of over-specialization.

Post-pruning of these trees has always been considered a major step in tree induction (Breiman et al., 1984). Breiman and his colleagues described a pruning methodology based on the notion of tree selection using reliable error estimates. A side effect of this approach, which is often overlooked, is that it provides a set of alternative pruned trees entailing different compromises between accuracy and simplicity. The availability of this set of alternative models can be considered a key advantage for the users of these systems, as they may easily choose an alternative tree that better fits their domain-specific needs. Interpretability has always been considered a key advantage of Machine Learning (ML) approaches, and these methods provide an additional degree of flexibility in this respect.

Niblett and Bratko (1986) described an alternative approach to tree post-pruning. Their algorithm proceeds by pruning unreliable lower branches of the trees in a bottom-up fashion. Compared to Breiman et al.'s approach this method is computationally more efficient, but on the other hand it produces a single tree as its result. The only way the user can obtain alternative tree models is by running the algorithm again with different settings for the parameters of the error estimation method, which entails additional computational cost. Moreover, these parameters often lack intuitive meaning.

Our study compares these two approaches to pruning in terms of accuracy and interpretability on twelve regression domains. The comparison is carried out using a common regression tree learning system. Our system (RT) implements several sequence-based pruning methods as well as Niblett & Bratko's method. Existing comparative work (Mingers, 1989; Esposito et al., 1993, 1995) has not addressed the issue of the flexible model choice offered by sequence-based pruning methods. These works compared full-featured pruning algorithms, whereas ours emphasizes sequence-based pruning methods. We introduce several new variants for exploring the large search space of pruned trees. We evaluate these methods and stress their differences compared to a well-known single-tree pruning algorithm (Niblett & Bratko's method).

The next section describes the pruning methods used in our comparisons. Section 3 presents the experiments carried out, while Section 4 contains the main conclusions of this work.

2 Pruning Regression Trees

Post-pruning of tree models can be seen as a search through the space of all possible sub-trees of the original, overly large learned tree (Esposito et al., 1993). Esposito and her colleagues (1993) have described a series of pruning algorithms from this search perspective.

Their work presents the approaches followed by different pruning methods with respect to how the search space is explored, the operators used for moving between search states, and the state evaluation functions used. Our study is focused on the question of how the next state is generated and how far the search goes. Sequence-based methods use a hill-climbing procedure that selects the next node to prune. In this paper we present several alternative functions to guide this selection. Moreover, these methods keep exploring the search space until a tree consisting of a single leaf node is reached. Selection of the final tree model is left to a second stage of the pruning process, which is considered distinct from tree generation (Breiman et al., 1984). As a result of this procedure, systems using these approaches can output both the selected tree and the other candidates, together with the evaluation of these alternatives used in the selection phase. An example of such a sequence is given in Table 1, which results from applying our RT system to the Boston Housing domain from the UCI repository (Merz and Murphy, 1996). The information provided by this type of table can be of extreme relevance, as it allows the user to explore different alternative models without additional learning effort. We claim that this is an important advantage in real world applications. Furthermore, it is known that it is quite difficult to find algorithm settings that work well over all domains.

Other pruning methods exist that do not provide such a sequence of alternative trees. Apart from the CART system (Breiman et al., 1984), most systems follow such strategies. Most of the existing alternatives use a bottom-up approach (Esposito et al., 1993). They start with the unpruned tree and visit the nodes in some pre-defined order; each time certain evaluation conditions are met, the node is pruned. These methods produce a single tree as the result of the pruning stage. However, it should be mentioned that by casting these methods into a search-based framework, as proposed by Esposito et al. (1993), one can see that they in fact explore alternative tree models. This means that these methods could in principle be adapted to produce a set of trees (although potentially smaller than the ones generated by sequence-based methods). Still, this research line has not been explored.

Table 1. An example of a sequence of trees for the Housing data set.

[Table 1 lists the sequence of pruned trees T0, T1, ..., with columns TREE, N.LEAVES, ERR, ERR', SE(ERR'), MDL and PRUNED AT NODES; the numeric entries are not recoverable in this copy. In the sequence, T7 is marked with '*' and T12 with '$'.]

TABLE LEGEND:
ERR - training set error
ERR' - estimated true error
SE(ERR') - standard error of the estimate
MDL - code description length of the tree (in bits)
* - the lowest estimated error tree
+ - the 1-SE tree
$ - the Minimum Description Length tree
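
As a concrete illustration of this selection stage, the following sketch (hypothetical, not the actual RT code) shows how the lowest estimated error tree and the 1-SE tree of a sequence such as Table 1 could be identified from the estimated errors and their standard errors.

    def select_trees(sequence):
        # `sequence` is a list of (n_leaves, estimated_error, standard_error)
        # tuples, one per pruned tree, ordered as they were generated.
        best = min(sequence, key=lambda s: s[1])          # the '*' tree
        # 1-SE rule ('+'): the smallest tree whose estimated error is within
        # one standard error of the best estimate.
        threshold = best[1] + best[2]
        one_se = min((s for s in sequence if s[1] <= threshold),
                     key=lambda s: s[0])
        return best, one_se

The Minimum Description Length tree ('$') would simply be the element of the sequence with the lowest MDL value.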

2.1 Generating Sequences of Pruned Trees

There are two main approaches to sequence-based pruning. Nested methods generate a sequence of trees in which each element is a sub-tree of the previous one. The key issue in these methods is the selection of the node to prune at each iteration; we describe several alternative approaches to this issue below. Optimal pruning algorithms generate sequences of pruned trees whose size decreases by one, where each element is the most accurate of all possible trees of that size. The trees in these sequences are not nested. This approach was already mentioned in the CART book (Breiman et al., 1984), although an algorithm was not provided. Bohanec and Bratko (1994) proposed a dynamic programming algorithm (OPT) to solve this search problem, and Almuallim (1996) has recently proposed an improvement of this algorithm (OPT-2). In our study we use this algorithm as the representative of this type of methods.

We now briefly describe the main ideas of the methods used in our experimental comparison. All of them were implemented in the RT system.

2.1.1 Minimal Cost Complexity

This is a nested sequence method originally presented in Breiman et al. (1984). It is based on a measure of the cost complexity of a tree, given by the following equation:

    R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|        (1)

where \alpha is a complexity parameter, R(T) is the error of the tree T on the training set, and |\tilde{T}| is the number of leaves of the tree. Based on this notion of \alpha-based cost complexity, Breiman and his colleagues describe an iterative algorithm that generates a sequence of nested pruned trees. At each step the node chosen to be pruned is the node t that minimizes the function

    f(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}        (2)

where R(t) is the training error at node t, R(T_t) is the training error of the sub-tree rooted at t, and |\tilde{T}_t| is the number of leaves of this sub-tree.

2.1.2 The Minimal Error Decrease

This nested method uses a simple heuristic to choose the next node to prune. Based on the error estimates obtained during the learning phase, at each step we select the splitting node with the lowest impact on the tree error. This can be described as selecting the node t that minimizes the function

    f(t) = R(t) - R(T_t)        (3)
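
A minimal sketch of these two node-selection criteria, assuming a hypothetical node record that stores the quantities appearing in equations (2) and (3):

    from dataclasses import dataclass

    @dataclass
    class NodeStats:
        err: float          # R(t): training error of node t if turned into a leaf
        subtree_err: float  # R(T_t): training error of the sub-tree rooted at t
        n_leaves: int       # number of leaves of that sub-tree

    def cost_complexity_criterion(t: NodeStats) -> float:
        # Equation (2): prune the node minimising this value.
        return (t.err - t.subtree_err) / (t.n_leaves - 1)

    def error_decrease_criterion(t: NodeStats) -> float:
        # Equation (3): prune the node with the smallest increase in training error.
        return t.err - t.subtree_err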

2.1.3 The Minimum Statistical Support method

The idea behind this method is to prune the node that is potentially the least reliable. We follow a heuristic approach to measuring statistical reliability: we make the strong assumption that nodes with a lower number of training samples lead to less reliable estimates of the error. Based on this assumption, we select the node with the minimal number of cases as the next node to prune.

2.1.4 The MDL-based method

Minimum description length (Rissanen, 1983) has been used as a means to characterize models within ML (Quinlan and Rivest, 1989; Wallace and Patrick, 1993). This methodology, based on a sound theoretical background, provides a way of incorporating in a single measure both the model complexity and its errors. In a way this is the idea behind the Minimal Cost Complexity measure presented in Section 2.1.1, but the latter lacks theoretical justification. Recently, Robnik-Sikonja and Kononenko (1998) presented a coding schema for regression tree models. Based on this coding we have developed a method for selecting a node to prune in a nested sequence generation algorithm. The method consists of choosing the node with the lowest variation in description length. Notice that we do not say lowest decrease, as in the previous training error-based measures: training error always decreases as the size of the tree grows, whereas MDL coding follows a U-shaped curve (this effect can be observed in Table 1). The function guiding the choice of a node can be described by

    \min_t \; \left[ MDL(t) - MDL(T_t) \right]        (4)

where MDL(t) is the coding length of node t if it were a leaf, and MDL(T_t) is the coding length of the sub-tree rooted at t. The coding length of a leaf node amounts to the sum of the coding lengths of the errors committed by the model at that node plus the coding of the model itself. This turns out to be the coding of a set of real numbers, which can be done using the method proposed by Rissanen (1983). The coding length of a tree is the sum of the coding of its split nodes plus the coding of its leaf nodes. We follow the same strategy outlined in Robnik-Sikonja and Kononenko (1998) to code each split node.
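
The nested methods of Sections 2.1.1 to 2.1.4 differ only in the criterion used to pick the node to prune. Below is a sketch of the common generation loop, under the assumption of a simple binary-tree representation (hypothetical; the actual RT data structures differ):

    import copy
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Tree:
        err: float                  # training error if this node is made a leaf
        n_cases: int                # number of training cases reaching this node
        left: Optional['Tree'] = None
        right: Optional['Tree'] = None

        def is_leaf(self):
            return self.left is None and self.right is None

    def internal_nodes(t):
        return [] if t.is_leaf() else ([t] + internal_nodes(t.left)
                                       + internal_nodes(t.right))

    def minimum_support_criterion(t):
        # Section 2.1.3: prune the node covering the fewest training cases.
        return t.n_cases

    def nested_sequence(tree, criterion=minimum_support_criterion):
        # Generate T0, T1, ... down to a single-leaf tree by repeatedly pruning
        # the internal node that minimises `criterion`; equations (2), (3) or
        # (4) would play the role of `criterion` for the other nested methods.
        current = copy.deepcopy(tree)
        sequence = [copy.deepcopy(current)]
        while not current.is_leaf():
            node = min(internal_nodes(current), key=criterion)
            node.left = node.right = None       # prune: turn the node into a leaf
            sequence.append(copy.deepcopy(current))
        return sequence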

2.1.5 The OPT-2 method

Breiman et al. (1984) defined the notion of an optimal sequence of pruned trees as a sequence of trees whose size increases by one, such that each tree is the most accurate tree of that size. Based on these ideas, Bohanec and Bratko (1994) presented a dynamic programming algorithm (OPT) that solves the problem of finding these trees in an efficient way. Recently, Almuallim (1996) described an improvement of this algorithm, named OPT-2. This algorithm improves the computational efficiency of OPT and is able to generate the trees sequentially, as opposed to OPT, which generates the whole sequence at once. We have implemented the OPT-2 algorithm within our RT system.

2.1.6 The Depth-first method

As a kind of baseline method we have used an algorithm that runs through the tree in a depth-first fashion. Each time this method reaches a node whose descendants are leaves, it generates a new pruned tree by pruning at that node.
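
A sketch of this baseline, reusing the hypothetical Tree class and copy import of the previous sketch; the exact traversal details are our reading of the description above:

    def depth_first_sequence(tree):
        # Traverse the tree depth-first (post-order); whenever a node whose
        # children are both leaves is reached, prune at that node and emit the
        # resulting tree, ending with a single-leaf tree.
        current = copy.deepcopy(tree)
        sequence = [copy.deepcopy(current)]

        def visit(node):
            if node.is_leaf():
                return
            visit(node.left)
            visit(node.right)
            if node.left.is_leaf() and node.right.is_leaf():
                node.left = node.right = None
                sequence.append(copy.deepcopy(current))

        visit(current)
        return sequence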

2.2 Single-tree Methods

These methods produce a single tree as their outcome, as opposed to the methods described in the previous section. Several such methods exist, exploring the search space with different heuristics. Niblett and Bratko's algorithm (1986) is one of the best known. It has been applied in the context of regression trees in the RETIS system (Karalic and Cestnik, 1991; Karalic, 1992). The algorithm runs through every node of the tree calculating its error (called the static error in the authors' nomenclature). In the RETIS system this calculation is based on m-estimates (Cestnik, 1990; Karalic and Cestnik, 1991) of the variance. These estimates provide more reliable values of the node errors than the ones obtained with the training set. The m-estimate of the error in a node is based on the estimate of the average Y value and is given by

    Est(\mu_Y) = \frac{1}{n+m}\left( \sum_{i=1}^{n} y_i + \frac{m}{N}\sum_{i=1}^{N} y_i \right)

    m\text{-}Est(\sigma^2_Y) = \frac{1}{n+m}\left( \sum_{i=1}^{n} y_i^2 + \frac{m}{N}\sum_{i=1}^{N} y_i^2 \right) - \left( Est(\mu_Y) \right)^2        (5)

where N is the number of training cases, n is the number of cases in the node, and m is a parameter of these estimates. For each split node a dynamic error is also calculated. This is equal to the weighted sum of the errors of its children, with weights obtained from the number of cases going to each branch. If the dynamic error is higher than the static error, the node is pruned. We have implemented this algorithm in our RT system.

M-estimates can also be used to calculate the error of a whole tree. This enables the use of these estimates as a means of selecting a tree in a sequence-based method. By proceeding this way we are able to compare sequence-based methods with Niblett and Bratko's algorithm.
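
A rough sketch of this pruning decision, with the m-estimate of equation (5) written out explicitly (hypothetical names and data structures; the RETIS/RT implementations differ in detail, and `node` is assumed to carry the target values `ys` of the training cases that reach it):

    def m_estimate_error(node_ys, all_ys, m):
        # Equation (5): m-estimate of the variance (error) at a node, blending
        # the n cases in the node with the prior given by all N training cases.
        n, N = len(node_ys), len(all_ys)
        mean = (sum(node_ys) + (m / N) * sum(all_ys)) / (n + m)
        second_moment = (sum(y * y for y in node_ys)
                         + (m / N) * sum(y * y for y in all_ys)) / (n + m)
        return second_moment - mean ** 2

    def nb_prune(node, all_ys, m):
        # Niblett & Bratko style bottom-up pruning: compare the static error of
        # a node with the weighted (dynamic) error of its children and prune
        # when the dynamic error is higher.
        static = m_estimate_error(node.ys, all_ys, m)
        if node.is_leaf():
            return static
        left_err = nb_prune(node.left, all_ys, m)
        right_err = nb_prune(node.right, all_ys, m)
        n_left, n_right = len(node.left.ys), len(node.right.ys)
        dynamic = (n_left * left_err + n_right * right_err) / (n_left + n_right)
        if dynamic > static:                    # dynamic error worse: prune here
            node.left = node.right = None
            return static
        return dynamic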

3 Experimental Comparison

In the first set of experiments we tried to ascertain whether there was a clear winner among the sequence-based methods. To compare different sequences of trees we used the following methodology, aimed at finding out the potential of each sequence. The goal of producing a sequence is to provide a set of alternative models to be used in a subsequent evaluation phase for the final tree selection. If we had a perfect tree evaluation method we could compare the best selections of all sequences. We can simulate this set-up by using a sufficiently large separate set of cases: with these cases we estimate the error of each tree in each sequence and compare the best trees of each sequence. These best trees represent the potential of each sequence generation method. The goal of the comparisons was to observe the performance of each method for different training sample sizes. We used the data sets presented in Table 2.

Table 2. The data sets used and the available number of cases (Training Pool;Test Set).
Abalone       3133;1044      Kinematics    4500;3692
Pole          5000;4065      Fried         ;5000
Elevators     8752;7847      Census16H     17000;5784
Ailerons      7154;6596      Census8L      17000;5784
CompActiv.    4500;3692      2Dplanes      20000;5000
CompActiv(s)  4500;3692      Mv            ;5000

For each data set we randomly obtained samples of 300, 600, 1000 and 2000 cases from the training pool. Using the RT learning system we obtained a large regression tree for each sample. The sequence-based methods were then used to obtain a sequence of pruned trees, and the large separate test set was used to select the best tree of each sequence. We compared the size and accuracy of these trees. The results we present are averages of 20 random repetitions for each training size. They are given as percentage losses over the result of the best method (thus the best result for each data set has a score of 0%); a sketch of this protocol is given after Figure 1.

Figure 1 presents the results for tree size (measured as the number of leaves) for all data sets and training sizes. The results in this figure reveal that the methods behave similarly for the different training sizes. In effect, with few exceptions, the differences in tree size are not large. The apparent absence of a method that consistently outperforms the others can be explained by the fact that there is usually a large number of trees with similar performance but quite different sizes, as was observed by Breiman et al. (1984).

[Figure 1: four panels (samples of 300, 600, 1000 and 2000 cases), each showing for every data set the percentage loss in tree size of DepthFrst, MinError, MCC, MDL, OPT2 and MinSS relative to the best method.]
Figure 1 - The comparative results for tree size.
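
For concreteness, a small sketch of the protocol behind Figures 1 and 2 (hypothetical helper names; the score is the Mean Squared Error, or the number of leaves, of the best tree of each sequence on the large separate test set):

    def best_of_sequence(sequence, score_on_test):
        # Potential of a sequence: the tree with the lowest score on the
        # separate test set.
        return min(sequence, key=score_on_test)

    def percentage_losses(results):
        # `results` maps each method name to the score of its best tree.
        # Scores are reported as percentage losses over the best method,
        # so the best method gets 0%.
        best = min(results.values())
        return {method: 100.0 * (score - best) / best
                for method, score in results.items()}

For example, percentage_losses({'OPT2': 10.2, 'MinSS': 10.0, 'MCC': 11.5}) yields 0% for MinSS, 2% for OPT2 and 15% for MCC.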

Figure 2 shows similar graphs for the Mean Squared Error of the best trees of each sequence. The results are again given as percentages and were trimmed at 20% losses. The differences between the methods decrease as the training samples grow in size. It is interesting to observe that there are extremely large differences in the potential of the sequences for samples with a few hundred cases. Under these conditions the OPT-2 method is more stable over all domains (only once did it achieve a result worse than 5%). On the other hand, methods like Depth-first, Minimal Cost Complexity and Minimal Error have a very bad score on the CompAct data set. Our results seem to indicate that only with small training samples are there large differences in the potential of the sequences obtained by the methods we have tried.

[Figure 2: four panels (samples of 300, 600, 1000 and 2000 cases), each showing for every data set the percentage loss in error of DepthFrst, MinError, MCC, MDL, OPT2 and MinSS relative to the best method, trimmed at 20%.]
Figure 2 - The results of the error of the best trees for different training sizes.

We have conducted a set of paired comparisons in order to quantify the statistical significance of the differences between the methods on all data sets. We used a 4x10-fold Cross Validation evaluation procedure. The use of Cross Validation characterizes performance differences due to variations in the test sets, while the repetition with different random permutations of the data set removes eventual biases due to training set order. For each of the 40 folds all methods were tried under the same conditions, apart from the sequence generation algorithm.
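
A compact sketch of such a 4x10-fold cross-validation loop (hypothetical; a paired significance test, e.g. scipy.stats.ttest_rel over the 40 paired error estimates, would then be applied to each pair of methods):

    import random

    def cross_val_splits(n_cases, n_folds=10, n_repeats=4, seed=0):
        # 4x10-fold CV: four different random permutations of the data set,
        # each split into ten folds, giving 40 paired train/test splits.
        rng = random.Random(seed)
        for _ in range(n_repeats):
            idx = list(range(n_cases))
            rng.shuffle(idx)
            folds = [idx[i::n_folds] for i in range(n_folds)]
            for i in range(n_folds):
                test = folds[i]
                train = [j for k, fold in enumerate(folds) if k != i
                         for j in fold]
                yield train, test

On every split all pruning methods are trained and evaluated on exactly the same cases, so the resulting 40 error estimates can be compared with a paired test.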

Table 3 presents the results in terms of the Mean Squared Error of the best tree in each sequence. The best result for each data set is presented in italics, and the results that are significantly worse than this best score are marked with asterisks (one for 95% confidence and two for 99%).

Table 3. Mean Squared Error for each sequence-based method.
[Columns: DepthFrst, MinErr, MDL, MCC, OPT-2, MinStatS; rows: the twelve data sets. Most numeric entries are not recoverable in this copy; the recoverable rows are:]
            DepthFrst    MinErr       MDL          MCC          OPT-2        MinStatS
Census16H   1.9E+09 **   1.97E+09 **  1.72E+09 **  1.99E+09 **  1.92E+09 **  1.63E+09
Census8L    1.55E+09 **  1.54E+09 **  1.26E+09 **  1.45E+09 **  1.43E+09 **  1.21E+09

We can observe that our proposed Minimum Statistical Support (MS) method is clearly the method with the best results. MS is the only method that never loses significantly to the others, while achieving several significant wins. This result is even more remarkable if we take into account that MS is a particularly simple method when compared to others such as OPT-2. Our other proposed method (MDL) can be considered the second best in terms of accuracy. Surprisingly, there are methods that even achieve worse results than the blind Depth-first approach on some domains. This table also shows that the choice of the sequence generation method may lead to significant differences in the accuracy of the final model.

Table 4 presents the results of this comparison in terms of the size of the trees. This table uses the same notation as the previous one regarding the significance of differences.

Table 4. Tree size for each sequence-based method.
[Columns: DepthFrst, MinErr, MDL, MCC, OPT-2, MinStatS; rows: the twelve data sets. The numeric entries are largely not recoverable in this copy.]

In terms of tree size, both MDL and MS achieve significantly better results than the others. The difference can be extremely large for some domains (Census16H, Census8L, Ailerons and Abalone). These results make the accuracy scores of these two methods, given in Table 3, even more remarkable.

We now address the issue of how the sequence-based methods compare to single-tree algorithms. We have compared one representative of each approach: the Minimum Statistical Support method (MS) and Niblett and Bratko's algorithm (NB), respectively. Our implementation of the latter algorithm uses m-estimates to obtain the error of the nodes, as in the RETIS system (Karalic and Cestnik, 1991). To ensure a fair comparison, we have also used these estimates to select one of the trees of the sequence produced by MS. We carried out several 4x10-fold Cross Validation paired experiments using different values of the m parameter (0.5, 1.5, 2, 3, 5 and 10). For each fold a sequence was built using the MS method and m-estimates were used to select one of its trees. This tree was then compared with the tree chosen by the NB method using m-estimates with the same value of the m parameter.
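
A sketch of this selection step, under the assumption that the error of a whole tree is the case-weighted average of the m-estimated errors of its leaves (hypothetical helpers; m_estimate_error is the function sketched in Section 2.2):

    def all_leaves(tree):
        if tree.is_leaf():
            return [tree]
        return all_leaves(tree.left) + all_leaves(tree.right)

    def m_estimate_tree_error(tree, all_ys, m):
        # Tree error as the weighted average of the m-estimated leaf errors,
        # with weights proportional to the number of cases in each leaf.
        leaves = all_leaves(tree)
        total = sum(len(leaf.ys) for leaf in leaves)
        return sum(len(leaf.ys) / total * m_estimate_error(leaf.ys, all_ys, m)
                   for leaf in leaves)

    def select_with_m_estimates(sequence, all_ys, m):
        # Pick from the MS sequence the tree with the lowest m-estimated error,
        # so that MS and NB are compared with the same error estimator.
        return min(sequence, key=lambda t: m_estimate_tree_error(t, all_ys, m))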

Table 5 shows the Mean Squared Error of this comparison for the different values of m over all domains. Whenever the result of MS is significantly better (97.5% confidence) than the corresponding result of NB, the value is shown with a dark gray background. The inverse is shown with a light gray background. The other differences lack statistical significance at this level of confidence.

Table 5. Minimum Statistical Support versus Niblett and Bratko's algorithm in terms of Mean Squared Error.
[Rows NB(m) and MS(m) for m in {0.5, 1.5, 2, 3, 5, 10}; columns: Abal., 2Pl., Pole, Elev., Ail., Mv1, Fried, Comp, Comp(s), C16H, C8L, Kin. The numeric entries are not recoverable in this copy.]

This table provides a clear indication that the representative of the sequence-based methods is significantly superior to Niblett and Bratko's algorithm on several domains. The result is even clearer if we compare the best results of each method over all tried m values (1). In effect, within this scenario we have verified that the NB method never outperformed the MS result (with or without statistical significance). On the contrary, MS was significantly (97.5% confidence) better than NB on 6 out of the 12 domains.

We now analyze the results of the same comparison in terms of the size of the trees. Table 6 presents the number of leaves of the trees obtained with both methods. This table follows the same notation regarding the statistical significance of the differences.

(1) This best setting could be automatically obtained using an internal Cross Validation tuning process.

Table 6. Minimum Statistical Support versus Niblett and Bratko's algorithm in terms of tree size (number of leaves).
[Rows NB(m) and MS(m) for m in {0.5, 1.5, 2, 3, 5, 10}; columns: Abal., 2Pl., Pole, Elev., Ail., Mv1, Fried, Comp, Comp(s), C16H, C8L, Kin. The numeric entries are not recoverable in this copy.]

This table reveals a slight superiority of the MS method, with few exceptions on some domains. If we again consider the optimal value of m for each method, we observe that each method significantly outperforms the other in 3 domains. This indicates that the results in terms of tree size are more balanced. Still, we should recall that in terms of interpretability the sequence-based methods allow the user to select other trees with the desired degree of complexity without any additional computational cost.

4 Conclusions

We have presented a set of pruning methods based on the notion of selection from a sequence of trees. These methods provide a set of alternative regression trees with different trade-offs between accuracy and interpretability. These approaches use reliable error estimates to choose one of the models, but they also allow the user to inspect and select other trees without additional learning effort. The trees in these sequences are characterized with measures that provide guidance regarding the costs of other selections. We claim that this feature of sequence-based pruning methods is of extreme importance in real applications of tree-based learners.

We have compared several sequence-based algorithms on twelve data sets under the same learning framework. In this paper we have presented two new methods for generating sequences of pruned trees (MDL and MS).

We have observed that these methods significantly outperformed other existing methods in terms of accuracy on several domains. Moreover, this score is obtained with simpler trees. Both methods could easily be adapted to classification trees. We have also compared our sequence-based MS method with an algorithm that follows a single-tree approach. Our comparison has revealed that the sequence-based method leads to trees that are significantly more accurate on several domains.

References

Almuallim, H. (1996): An efficient algorithm for optimal pruning of decision trees. Artificial Intelligence, 82 (2). Elsevier.
Bohanec, M., Bratko, I. (1994): Trading Accuracy for Simplicity in Decision Trees. Machine Learning, 15 (3). Kluwer Academic Publishers.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984): Classification and Regression Trees. Wadsworth Int. Group, Belmont, California, USA.
Cestnik, B. (1990): Estimating probabilities: a crucial task in Machine Learning. In Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90). Pitman Publishers.
Esposito, F., Malerba, D., Semerano, G. (1993): Decision Tree Pruning as a Search in the State Space. In Proceedings of the European Conference on Machine Learning (ECML-93), Brazdil, P. (ed.). LNAI-667, Springer Verlag.
Esposito, F., Malerba, D., Semerano, G. (1995): Simplifying Decision Trees by Pruning and Grafting: New Results. In Proceedings of the European Conference on Machine Learning (ECML-95), Lavrac, N. and Wrobel, S. (eds.). LNAI-912, Springer Verlag.
Karalic, A. (1992): Employing Linear Regression in Regression Tree Leaves. In Proceedings of ECAI-92. Wiley & Sons.
Karalic, A., Cestnik, B. (1991): The Bayesian approach to tree-structured regression. In Proceedings of ITI-91.
Merz, C.J., Murphy, P.M. (1996): UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.
Mingers, J. (1989): An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4. Kluwer Academic Publishers.
Niblett, T., Bratko, I. (1986): Learning decision rules in noisy domains. In Developments in Expert Systems, Bramer, M. (ed.). Cambridge University Press.
Quinlan, J., Rivest, R. (1989): Inferring decision trees using the minimum description length principle. Information and Computation, 80.
Rissanen, J. (1983): A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 11 (2).
Robnik-Sikonja, M., Kononenko, I. (1998): Pruning regression trees with MDL. Working paper.
Wallace, C., Patrick, J. (1993): Coding decision trees. Machine Learning, 11 (1). Kluwer Academic Publishers.


Random Forests and Boosting Random Forests and Boosting Tree-based methods are simple and useful for interpretation. However they typically are not competitive with the best supervised learning approaches in terms of prediction accuracy.

More information

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio

More information

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series

More information

Ensemble Learning. Another approach is to leverage the algorithms we have via ensemble methods

Ensemble Learning. Another approach is to leverage the algorithms we have via ensemble methods Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning

More information

Univariate Margin Tree

Univariate Margin Tree Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,

More information

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999 Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification

More information

Calibrating Random Forests

Calibrating Random Forests Calibrating Random Forests Henrik Boström Informatics Research Centre University of Skövde 541 28 Skövde, Sweden henrik.bostrom@his.se Abstract When using the output of classifiers to calculate the expected

More information

Speeding Up Logistic Model Tree Induction

Speeding Up Logistic Model Tree Induction Speeding Up Logistic Model Tree Induction Marc Sumner 1,2,EibeFrank 2,andMarkHall 2 1 Institute for Computer Science, University of Freiburg, Freiburg, Germany sumner@informatik.uni-freiburg.de 2 Department

More information

Decision Trees. This week: Next week: constructing DT. Pruning DT. Ensemble methods. Greedy Algorithm Potential Function.

Decision Trees. This week: Next week: constructing DT. Pruning DT. Ensemble methods. Greedy Algorithm Potential Function. Decision Trees This week: constructing DT Greedy Algorithm Potential Function upper bounds the error Pruning DT Next week: Ensemble methods Random Forest 2 Decision Trees - Boolean x 0 + x 6 0 + - 3 Decision

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information