LIACC. Machine Learning group. Internal Report n. Sequence-based Methods for Pruning Regression Trees, by Luís Torgo. 20/4/98
Sequence-based Methods for Pruning Regression Trees

Luís Torgo
LIACC - University of Porto
R. Campo Alegre, Porto, Portugal
ltorgo@ncc.up.pt

ABSTRACT: Pruning has been considered the key issue for obtaining reliable tree-based models. Current approaches to this task have taken two distinct pathways. On one hand we have methods that are based on selection from a sequence of candidate pruned trees. On the other hand we have algorithms that follow a bottom-up approach, pruning unreliable lower branches of the unpruned tree and thus producing a single tree model. These two approaches entail different trade-offs between accuracy and interpretability. This paper presents a comparative study between representatives of both approaches, carried out in the context of regression trees with a focus on sequence-based methods. We describe some existing algorithms and present some new variants. We claim that sequence-based methods provide an additional benefit by allowing the user to easily select a model with the intended degree of interpretability. Moreover, our study reveals that this is achieved without sacrificing accuracy.

KEYWORDS: Tree pruning methods, regression trees.

1 Introduction

Recursive partitioning is the algorithm behind most decision tree systems. It provides computational efficiency at the cost of statistical unreliability in the lower branches of the trees. Moreover, noisy domains often lead to the well-known
problem of over-specialization. Post-pruning of these trees has always been considered a major step in tree induction (Breiman et al., 1984). Breiman and his colleagues have described a pruning methodology based on the notion of tree selection using reliable error estimates. A side effect of this approach, which is often overlooked, is that it provides a set of alternative pruned trees entailing different compromises between accuracy and simplicity. The availability of this set of alternative models can be considered a key advantage to the user of these systems, who may easily choose an alternative tree that matches his domain-specific needs. Interpretability has always been considered a key advantage of Machine Learning (ML) approaches, and these methods provide an additional degree of flexibility in this respect.

Niblett and Bratko (1986) have described an alternative approach to tree post-pruning. Their algorithm proceeds by pruning unreliable lower branches of the trees in a bottom-up fashion. Compared to Breiman et al.'s approach this method is computationally more efficient, but on the other hand it produces a single tree as its result. The only way the user can obtain alternative tree models is by running the algorithm again with different settings for the parameters of the error estimation method, which entails additional computational cost. Moreover, these parameters often lack intuitive meaning.

Our study compares these two approaches to pruning in terms of accuracy and interpretability in twelve regression domains. The comparison is carried out using a common regression tree learning system. Our system (RT) implements several sequence-based pruning methods as well as Niblett and Bratko's method. Existing comparative work (Mingers, 1989; Esposito et al., 1993, 1995) has not addressed this issue of the flexible model choice of sequence-based pruning methods. These works have compared full-featured pruning algorithms; ours emphasizes sequence-based pruning methods.
We introduce several new variants for exploring the large search space of pruned trees. We evaluate these methods and stress their differences compared to a well-known single-tree pruning algorithm (Niblett and Bratko's method). The next section describes the pruning methods that we use in our comparisons. Section 3 presents the experiments carried out, while Section 4 contains the main conclusions of this work.

2 Pruning Regression Trees

Post-pruning of tree models can be seen as a search through the space of all possible sub-trees of the original overly large learned tree (Esposito et al., 1993). Esposito and her colleagues (1993) have described a series of pruning algorithms
from this search perspective. Their work characterizes different pruning methods with respect to how they explore the search space, the operators used for moving between search states, and the state evaluation functions employed. Our study focuses on the question of how the next state is generated and how far the search goes. Sequence-based methods use a hill-climbing strategy that selects the next node to prune; in this paper we present several alternative functions to guide this selection. Moreover, these methods keep exploring the search space until a tree consisting of a single leaf node is reached. Selection of the final tree model is left to a second stage of the pruning process, which is considered distinct from tree generation (Breiman et al., 1984). As a result of this procedure, systems following these approaches can output both the selected tree and the other candidates, together with the evaluations used in the selection phase. An example of such a sequence is given in Table 1, which results from applying our RT system to the Boston Housing domain from the UCI repository (Merz and Murphy, 1996). The information provided by tables of this type can be of extreme relevance, as it allows the user to explore different alternative models without additional learning effort. We claim that this is an important advantage in real-world applications. Furthermore, it is known that it is quite difficult to find algorithm settings that work well over all domains.

Other pruning methods exist that do not provide such a sequence of alternative trees. Apart from the CART system (Breiman et al., 1984) most systems follow such strategies. Most of the existing alternatives use a bottom-up approach (Esposito et al., 1993). They start with the unpruned tree and visit its nodes in some pre-defined order; each time certain evaluation conditions are met, a node is pruned.
These methods produce a single tree as the result of the pruning stage. However, it should be mentioned that by casting these methods into a search-based framework as proposed by Esposito et al. (1993), one can see that they in fact explore alternative tree models. This means that these methods could theoretically be adapted in order to produce a set of trees (although potentially smaller than the ones generated by sequence-based methods). Still, this research line has not been explored.
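The nested sequence generation described above amounts to a simple hill-climbing loop: starting from the unpruned tree, repeatedly prune the internal node ranked best by a selection heuristic until only the root leaf remains. The following is a minimal sketch using a toy `Node` representation of our own; it illustrates the idea, not the RT system's actual code:

```python
import copy

class Node:
    """A minimal regression-tree node: a node with no children is a leaf."""
    def __init__(self, error, n_cases, children=None):
        self.error = error        # training error at this node if it were a leaf
        self.n_cases = n_cases    # number of training cases reaching this node
        self.children = children or []

    def is_leaf(self):
        return not self.children

    def internal_nodes(self):
        """All split nodes of the sub-tree rooted here (candidate prune points)."""
        if self.is_leaf():
            return []
        nodes = [self]
        for child in self.children:
            nodes.extend(child.internal_nodes())
        return nodes

def subtree_error(t):
    """Training error of the sub-tree rooted at t (summed over its leaves)."""
    return t.error if t.is_leaf() else sum(subtree_error(c) for c in t.children)

def n_leaves(t):
    return 1 if t.is_leaf() else sum(n_leaves(c) for c in t.children)

def prune_sequence(tree, score):
    """Generate a nested sequence of pruned trees, ending in a single leaf.

    At each step the internal node with the lowest `score` is turned into
    a leaf; a snapshot of the tree is kept after every pruning action.
    """
    sequence = [copy.deepcopy(tree)]
    while not tree.is_leaf():
        victim = min(tree.internal_nodes(), key=score)
        victim.children = []          # pruning: the split node becomes a leaf
        sequence.append(copy.deepcopy(tree))
    return sequence

# Example: a 3-leaf tree pruned with the minimal error decrease heuristic.
tree = Node(9.0, 20, [Node(4.0, 10, [Node(1.0, 5), Node(2.0, 5)]), Node(3.0, 10)])
seq = prune_sequence(tree, lambda t: t.error - subtree_error(t))
sizes = [n_leaves(t) for t in seq]    # strictly shrinking tree sizes
```

Any of the node-selection heuristics of Section 2.1 can be plugged in as the `score` function; only the ranking of candidate prune points changes.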
Table 1. An example of a sequence of trees for the Housing data set.

[The table lists trees T0 through T18, one per row, with columns N.LEAVES, ERR, ERR', SE(ERR'), MDL and PRUNED AT NODES; the numeric entries could not be recovered from the source. Tree T7 is marked * and tree T12 is marked $.]

TABLE LEGEND:
ERR - Training Set Error
ERR' - Estimated true error
SE(ERR') - Standard Error of the estimate
* - The lowest estimated error tree
+ - The 1-SE tree
MDL - Code description length of the tree (in bits)
$ - The Minimum Description Length tree

2.1 Generating Sequences of Pruned Trees

There are two main approaches to sequence-based pruning. Nested methods generate a sequence of trees where each element is a sub-tree of the previous one. The key issue in these methods is the selection of the next node to prune at each iteration; we will describe several alternative approaches to this issue. Optimal pruning algorithms generate sequences of pruned trees whose size decreases by one, where each element is the most accurate of all possible trees of that size. The trees in these sequences are not nested. This approach was already mentioned in the CART book (Breiman et al., 1984), although an algorithm was not provided. Bohanec and Bratko (1994) proposed a dynamic programming algorithm (OPT) to solve this search problem. Almuallim (1996) has recently
proposed an improvement of this algorithm (OPT-2). In our study we use this algorithm as the representative of this type of methods. We now briefly describe the main ideas of the methods we use in our experimental comparison. All of them were implemented in the RT system.

2.1.1 Minimal Cost Complexity

This is a nested sequence method originally presented in Breiman et al. (1984). It is based on a measure of the cost complexity of a tree, given by:

R_\alpha(T) = R(T) + \alpha \tilde{T}    (1)

where \alpha is a complexity parameter, R(T) is the error of the tree T on the training set, and \tilde{T} is the number of leaves of the tree.

Based on this notion of \alpha-based cost complexity, Breiman and his colleagues describe an iterative algorithm that generates a sequence of nested pruned trees. At each step the node chosen to be pruned is the node t that minimizes the function:

f(t) = \frac{R(t) - R(T_t)}{\tilde{T}_t - 1}    (2)

where R(t) is the training error at node t, R(T_t) is the training error of the sub-tree rooted at t, and \tilde{T}_t is the number of leaves of this sub-tree.

2.1.2 The Minimal Error Decrease

This nested method uses a simple heuristic to choose the next node to prune. Based on the error estimates obtained during the learning phase, we select at each step the splitting node with the lowest impact on the tree error. This can be described as selecting the node t that minimizes the function:

f(t) = R(t) - R(T_t)    (3)
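The difference between the two criteria is that Eq. (2) normalizes the error increase by the number of leaves removed, while Eq. (3) uses the raw increase. A small illustrative sketch (the function and argument names are our own):

```python
def mcc_score(r_node, r_subtree, leaves_subtree):
    """Eq. (2): training error increase per leaf removed if we prune at t."""
    return (r_node - r_subtree) / (leaves_subtree - 1)

def med_score(r_node, r_subtree):
    """Eq. (3): absolute training error increase if we prune at t."""
    return r_node - r_subtree

# Candidate A loses 0.9 error but frees 9 leaves; candidate B loses 0.5 for 2.
a_mcc = mcc_score(r_node=2.0, r_subtree=1.1, leaves_subtree=10)  # 0.10 per leaf
b_mcc = mcc_score(r_node=1.5, r_subtree=1.0, leaves_subtree=3)   # 0.25 per leaf
# Minimal Cost Complexity prunes A first (cheaper per removed leaf), while the
# Minimal Error Decrease heuristic prunes B first (smaller absolute loss).
```

The two heuristics can therefore generate quite different sequences from the same unpruned tree.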
2.1.3 The Minimum Statistical Support method

The idea behind this method is to prune the node that is potentially least reliable. We follow a heuristic approach to measuring statistical reliability: we make the strong assumption that nodes with a lower number of training samples lead to less reliable estimates of the error. Based on this assumption we select the node with the minimal number of cases as the next node to prune.

2.1.4 The MDL-based method

Minimum description length (Rissanen, 1983) has been used as a means to characterize models within ML (Quinlan and Rivest, 1989; Wallace and Patrick, 1993). This methodology, based on a sound theoretical background, provides a means to incorporate in a single measure both the model complexity and its errors. In a way this is the idea behind the Minimal Cost Complexity measure presented in Section 2.1.1, but the latter lacks theoretical justification. Recently, Robnik-Sikonja and Kononenko (1998) presented a coding schema for regression tree models. Based on this coding we have developed a method for selecting a node to prune in a nested sequence generation algorithm. It consists of choosing the node with the lowest variation in description length. Notice that we do not say lowest decrease, as in the previous training error-based measures: training error always decreases as the size of the tree grows, whereas MDL coding follows a U-shaped curve (we can observe this effect in Table 1). The function guiding the choice of a node can be described by:

\min_t \; MDL(t) - MDL(T_t)    (4)

where MDL(t) is the coding length of node t if it were a leaf, and MDL(T_t) is the coding length of the sub-tree rooted at t.

The coding length of a leaf node amounts to the sum of the coding lengths of the errors committed by the model at that node plus the coding of the model itself. This turns out to be the coding of a set of real numbers, which can be done using the method proposed by Rissanen (1983).
The coding length of a tree is the sum of the coding of its split nodes plus the coding of its leaf nodes. We follow the same strategy outlined in Robnik-Sikonja and Kononenko (1998) to code each split node.
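The optimal-sequence idea introduced at the start of Section 2.1 can be sketched as a dynamic program: for every node, compute the lowest training error attainable by a pruned sub-tree with exactly k leaves, merging the children's tables bottom-up. This is only an illustration of the idea behind OPT/OPT-2, using a toy tuple encoding of trees of our own, not Almuallim's algorithm itself:

```python
def optimal_errors(tree):
    """tree = (error_if_collapsed_to_leaf, [child_trees]).

    Returns a dict mapping k -> lowest training error of any pruned
    version of `tree` with exactly k leaves (the core idea of OPT).
    """
    error, children = tree
    table = {1: error}                 # option: collapse this node to a leaf
    if children:
        combined = {0: 0.0}            # best (leaf count, error) over children so far
        for child in children:
            child_table = optimal_errors(child)
            merged = {}
            for k1, e1 in combined.items():
                for k2, e2 in child_table.items():
                    k, e = k1 + k2, e1 + e2
                    if k not in merged or e < merged[k]:
                        merged[k] = e
            combined = merged
        for k, e in combined.items():  # keep the better of collapsing vs. keeping the split
            if k not in table or e < table[k]:
                table[k] = e
    return table

# A root with one split child and one leaf child: the best 3-leaf tree has error 6.0.
tree = (10.0, [(4.0, [(1.0, []), (2.0, [])]), (3.0, [])])
```

Reading the table in decreasing order of k yields an optimal sequence; note that, as stated above, consecutive optimal trees need not be nested.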
2.1.5 The OPT-2 method

Breiman et al. (1984) defined the notion of an optimal sequence of pruned trees as a sequence of trees with size increasing by one, such that each tree is the most accurate tree of its size. Based on these ideas, Bohanec and Bratko (1994) presented a dynamic programming algorithm (OPT) that solves the problem of finding these trees in an efficient way. Recently, Almuallim (1996) described an improvement of this algorithm, named OPT-2. This algorithm improves the computational efficiency of OPT and is able to generate the trees sequentially, as opposed to OPT, which generates the whole sequence at once. We have implemented the OPT-2 algorithm within our RT system.

2.1.6 The Depth-first method

As a kind of baseline method we have used an algorithm that runs through the tree in a depth-first fashion. Each time this method reaches a node whose descendants are leaves, it generates a new pruned tree by pruning at that node.

2.2 Single-tree Methods

These methods produce a single tree as their outcome, as opposed to the methods described in the previous section. Several methods exist that explore the search space using different heuristics. Niblett and Bratko's algorithm (1986) is one of the best known. It has been applied in the context of regression trees in the RETIS system (Karalic and Cestnik, 1991; Karalic, 1992). This algorithm runs through every node of the tree calculating its error (called static error in the authors' nomenclature). In the RETIS system this calculation is based on m-estimates (Cestnik, 1990; Karalic and Cestnik, 1991) of the variance. These estimates provide more reliable values of the node errors than the ones obtained with the training set. The m-estimate of the error in a node is based on the m-estimate of the average Y value, and is given by:

m\text{-}Est(\mu_Y) = \frac{1}{n+m}\left(\sum_{i=1}^{n} y_i + \frac{m}{N}\sum_{i=1}^{N} y_i\right)

m\text{-}Est(\sigma_Y^2) = \frac{1}{n+m}\left(\sum_{i=1}^{n} \left(y_i - Est(\mu_Y)\right)^2 + \frac{m}{N}\sum_{i=1}^{N} \left(y_i - Est(\mu_Y)\right)^2\right)    (5)
where N is the number of training cases, n is the number of cases in the node, and m is a parameter of these estimates.

For each split node a dynamic error is also calculated. This is equal to the weighted sum of the errors of its children, with weights obtained from the number of cases going to each branch. If the dynamic error is larger than the static error the node is pruned. We have implemented this algorithm in our RT system.

M-estimates can also be used to calculate the error of a tree. This enables the use of these estimates as a means to select a tree in a sequence-based method, which in turn allows us to compare sequence-based methods to Niblett and Bratko's algorithm.

3 Experimental Comparison

In the first set of experiments we tried to ascertain whether there was a clear winner among the sequence-based methods. To compare different sequences of trees we used the following methodology, designed to find out the potential of each sequence. The goal of producing a sequence is to provide a set of alternative models that will be used in a subsequent evaluation phase for the final tree selection. If we had a perfect tree evaluation method we could compare the best selections in all sequences. We can simulate this set-up by using a sufficiently large separate set of cases: using these cases we estimate the error of each tree in each sequence and compare the best trees of each sequence. These best trees represent the potential of each sequence generation method. The goal of the comparisons was to observe the performance of each method for different training sample sizes. We used the data sets presented in Table 2.

Table 2. The used data sets and the available number of cases (Training Pool; Test Set).

Abalone       3133;1044
Pole          5000;4065
Elevators     8752;7847
Ailerons      7154;6596
CompActiv.    4500;3692
CompActiv(s)  4500;3692
Kinematics    4500;3692
Fried         ;5000
Census16H     17000;5784
Census8L      17000;5784
2Dplanes      20000;5000
Mv            ;5000
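A minimal sketch of the m-estimate of a node's error (Eq. 5) and of the static/dynamic comparison behind Niblett and Bratko's pruning decision described in Section 2.2. The function names are our own and this is not RETIS code; in particular, the exact form of Eq. (5) is our reconstruction from the surrounding definitions:

```python
def m_estimate_error(node_ys, all_ys, m):
    """m-estimate of the variance of Y in a node (Eq. 5).

    The node's statistics are shrunk towards a prior computed from all N
    training cases, with the shrinkage strength controlled by the parameter m.
    """
    n, N = len(node_ys), len(all_ys)
    mu = (sum(node_ys) + (m / N) * sum(all_ys)) / (n + m)
    return (sum((y - mu) ** 2 for y in node_ys)
            + (m / N) * sum((y - mu) ** 2 for y in all_ys)) / (n + m)

def should_prune(static_error, children):
    """Niblett-Bratko decision at one split node.

    `children` holds (n_cases, error) pairs. The dynamic error is the
    case-weighted average of the children's errors; the node is pruned
    when the dynamic error is larger than the node's own static error.
    """
    total = sum(n for n, _ in children)
    dynamic_error = sum(n * e for n, e in children) / total
    return dynamic_error > static_error
```

With m = 0 the estimate reduces to the plain training-set variance of the node; larger m values pull small nodes more strongly towards the global prior, making pruning of poorly supported splits more likely.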
For each data set we randomly obtained samples of 300, 600, 1000 and 2000 cases from the training pool. Using the RT learning system we obtained a large regression tree for each sample. The sequence-based methods were then used to obtain a sequence of pruned trees, and the large separate test set was used to select the best tree of each sequence. We compared the size and accuracy of these trees. The results we present are averages of 20 random repetitions for each training size, given as percentage losses over the result of the best method (thus the best result for each data set has a score of 0%).

Figure 1 presents the results for the tree sizes (measured as the number of leaves) for all data sets and training sizes. The results reveal that the methods behave similarly for the different training sizes. In effect, with few exceptions the differences in tree size are not large. The apparent absence of a method consistently outperforming the others can be explained by the fact that there is usually a large number of trees with similar performance but quite different sizes, as observed by Breiman et al. (1984).

[Figure 1 - The comparative results for tree size. Four panels (samples of 300, 600, 1000 and 2000 cases) show, for each data set, the percentage loss in tree size of the methods DepthFrst, MinError, MCC, MDL, OPT2 and MinSS.]
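The selection of each sequence's best tree on the large held-out set can be sketched as follows, with each tree stood in for by its vector of holdout predictions (an illustrative simplification with names of our own):

```python
def mse(preds, targets):
    """Mean Squared Error of a prediction vector against the targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def best_in_sequence(sequence_preds, holdout_y):
    """Return (index, error) of the sequence tree with lowest holdout MSE.

    This lowest error is what we call the potential of the sequence.
    """
    errors = [mse(p, holdout_y) for p in sequence_preds]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]

# Three trees of one sequence, judged on a 3-case holdout set.
holdout_y = [1.0, 2.0, 3.0]
seq_preds = [[1.1, 2.1, 2.9], [1.0, 2.5, 3.5], [2.0, 2.0, 2.0]]
idx, err = best_in_sequence(seq_preds, holdout_y)
```

Comparing these per-sequence minima across methods, rather than the trees an imperfect selection rule would pick, isolates the quality of the sequence generation itself.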
Figure 2 shows similar graphs for the Mean Squared Error of the best trees of each sequence. The results are again given as percentage losses, trimmed at 20%.
The differences between the methods decrease as the training samples grow in size. It is interesting to observe that there are extremely large differences in the potential of the sequences for samples of a few hundred cases. Under these conditions the OPT-2 method is the most stable over all domains (only once achieving a result worse than 5%). On the other hand, methods like Depth-first, Minimal Cost Complexity and Minimal Error have a very bad score on the CompAct data set. Our results seem to indicate that large differences in the potential of the sequences obtained by the methods we tried arise only with small training samples.

[Figure 2 - The results of the error of the best trees for different training sizes. Four panels (samples of 300, 600, 1000 and 2000 cases) show, for each data set, the percentage loss in Mean Squared Error of the methods DepthFrst, MinError, MCC, MDL, OPT2 and MinSS.]

We have conducted a set of paired comparisons in order to quantify the statistical significance of the differences between the methods on all data sets, using a 4 x 10-fold Cross Validation evaluation procedure. The use of Cross Validation ensures the characterization of performance differences due to variations in the test sets, while the repetition with different random permutations of the data set removes eventual biases due to training set order.
For each of the 40 folds all methods were tried under the same conditions, apart from the sequence generation algorithm.
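The significance marks in the tables that follow can be computed with a paired t statistic over the 40 per-fold error differences, compared against the two-sided Student-t critical value. A minimal sketch of the statistic, reflecting our reading of the procedure rather than the paper's exact test code:

```python
from statistics import mean, stdev

def paired_t(errors_a, errors_b):
    """Paired t statistic over per-fold error differences (a - b).

    Positive values mean method A had the higher error on average;
    magnitudes beyond the critical value for n-1 degrees of freedom
    indicate a statistically significant difference.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)

# Four folds where method A is consistently worse than method B.
t = paired_t([2.0, 3.1, 4.2, 4.9], [1.0, 2.0, 3.0, 4.0])
```

Pairing the folds removes the between-fold variance, so even small but consistent differences between two methods can reach significance.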
Table 3 presents the results in terms of the Mean Squared Error of the best tree in each sequence. The best result for each data set is presented in italics, and results that are significantly worse than this best score are marked with asterisks (one for 95% confidence and two for 99%).

Table 3. Mean Squared Error for each sequence-based method.

            DepthFrst    MinErr       MDL          MCC          OPT-2        MinStatS
Census16H   1.9E+09 **   1.97E+09 **  1.72E+09 **  1.99E+09 **  1.92E+09 **  1.63E+09
Census8L    1.55E+09 **  1.54E+09 **  1.26E+09 **  1.45E+09 **  1.43E+09 **  1.21E+09
[The entries of the remaining rows (Abalone, 2Dplanes, Pole, Elevators, Ailerons, Mv1, Fried1, CompAct, CompAct(s), Kinematics) could not be recovered from the source.]

We can observe that our proposed Minimum Statistical Support (MS) method is clearly the method with the best results. MS is the only method that never significantly loses to the others, while achieving several significant wins. This result is even more remarkable if we take into account that MS is a particularly simple method when compared to others such as OPT-2. Our other proposed method (MDL) can be considered the second best in terms of accuracy. Surprisingly, there are methods that achieve worse results than the blind Depth-first approach on some domains. This table also shows that the choice of the sequence generation method may lead to significant differences in the accuracy of the final model.

Table 4 presents the results of this comparison in terms of the size of the trees. This table uses the same notation as the previous one regarding significance of differences.
Table 4. Tree size for each sequence-based method (columns DepthFrst, MinErr, MDL, MCC, OPT-2, MinStatS; rows as in Table 3). [Most numeric entries could not be recovered from the source.]

In terms of the size of the trees, both MDL and MS achieve significantly better results than the others. The difference can be extremely large for some domains (Census16H, Census8L, Ailerons and Abalone). These results make the accuracy scores of these two methods, given in Table 3, even more remarkable.

We now address the issue of how the sequence-based methods compare to single-tree algorithms. We compared one representative of each approach: the Minimum Statistical Support (MS) method and Niblett and Bratko's algorithm (NB), respectively. Our implementation of the latter algorithm uses m-estimates to obtain the error of the nodes, as in the RETIS system (Karalic and Cestnik, 1991). To ensure a fair comparison with the MS method, we also used these estimates to select one of the trees of its sequence. We carried out several 4 x 10-fold Cross Validation paired experiments using different values for the m parameter (0.5, 1.5, 2, 3, 5 and 10). For each fold a sequence was built using the MS method and m-estimates were used to select one of its trees. This tree was then compared with the tree chosen by the NB method using m-estimates with the same value of the m parameter. Table 5 shows the Mean Squared Error of this comparison for different values of m over all domains. Whenever the result of MS is significantly better (97.5% confidence) than the corresponding result of NB, the value is shown with a dark gray background; the inverse is shown with a light gray background. The other differences lack statistical significance at this level of confidence.
Table 5. Minimum Statistical Support versus Niblett and Bratko's algorithm in terms of Mean Squared Error (rows NB(m) and MS(m) for m in {0.5, 1.5, 2, 3, 5, 10}; columns Abal., 2Pl., Pole, Elev., Ail., Mv1, Fried, Comp, Comp(s), C16H, C8L, Kin.). [The numeric entries could not be recovered from the source.]

This table provides a clear indication that the representative of the sequence-based methods is significantly superior to Niblett and Bratko's algorithm on several domains. This result is even clearer if we compare the best results of each method over all tried m values (1). Within this scenario we verified that the NB method never outperformed the MS result (with or without statistical significance). On the contrary, MS was significantly (97.5% confidence) better than NB on 6 of the 12 domains.

We now analyze the results of the same comparison in terms of the size of the trees. Table 6 presents the number of leaves of the trees obtained with both methods, using the same notation regarding the statistical significance of the differences.

1 - This best setting could be automatically obtained using an internal Cross Validation tuning process.
Table 6. Minimum Statistical Support versus Niblett and Bratko's algorithm in terms of tree size (number of leaves); rows and columns as in Table 5. [The numeric entries could not be recovered from the source.]

This table reveals a slight superiority of the MS method, with a few exceptions on some domains. If we again consider the optimal value of m for each method, we observe that each method significantly outperforms the other in 3 domains. This indicates that the results in terms of tree size are somewhat more level. Still, we recall that in terms of interpretability the sequence-based methods allow the user to select other trees with the desired degree of complexity without any additional computational cost.

4 Conclusions

We have presented a set of pruning methods based on the notion of selection from a sequence of trees. These methods provide a set of alternative regression trees with different trade-offs between accuracy and interpretability. These approaches use reliable error estimates to choose one of these models, but they also allow the user to inspect and select other trees without additional learning effort. The trees in these sequences are characterized with measures that provide guidance regarding the costs of other selections. We claim that this feature of sequence-based pruning methods is of extreme importance in real applications of tree-based learners.

We have compared several sequence-based algorithms on twelve data sets under the same learning framework. In this paper we have presented two new methods for generating sequences of pruned trees (MDL and MS). We have observed that
these methods significantly outperformed other existing methods in terms of accuracy on several domains. Moreover, this score is obtained with simpler trees. Both methods could easily be adapted to classification trees. We have also compared our sequence-based MS method with an algorithm that follows a single-tree approach. Our comparison revealed that the sequence-based method leads to trees that are significantly more accurate on several domains.

References

Almuallim, H. (1996): An efficient algorithm for optimal pruning of decision trees. Artificial Intelligence, 82 (2). Elsevier.

Bohanec, M., Bratko, I. (1994): Trading Accuracy for Simplicity in Decision Trees. Machine Learning, 15 (3). Kluwer Academic Publishers.

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984): Classification and Regression Trees. Wadsworth Int. Group, Belmont, California, USA.

Cestnik, B. (1990): Estimating probabilities: A crucial task in Machine Learning. In Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90). Pitman Publishers.

Esposito, F., Malerba, D., Semeraro, G. (1993): Decision Tree Pruning as a Search in the State Space. In Proceedings of the European Conference on Machine Learning (ECML-93), Brazdil, P. (ed.). LNAI-667, Springer Verlag.

Esposito, F., Malerba, D., Semeraro, G. (1995): Simplifying Decision Trees by Pruning and Grafting: New Results. In Proceedings of the European Conference on Machine Learning (ECML-95), Lavrac, N. and Wrobel, S. (eds.). LNAI-912, Springer Verlag.

Karalic, A. (1992): Employing Linear Regression in Regression Tree Leaves. In Proceedings of ECAI-92. Wiley & Sons.

Karalic, A., Cestnik, B. (1991): The Bayesian approach to tree-structured regression. In Proceedings of ITI-91.

Merz, C.J., Murphy, P.M. (1996): UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.

Mingers, J. (1989): An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4. Kluwer Academic Publishers.

Niblett, T., Bratko, I. (1986): Learning decision rules in noisy domains. In Developments in Expert Systems, Bramer, M. (ed.). Cambridge University Press.

Quinlan, J. and Rivest, R. (1989): Inferring decision trees using the minimum description length principle. Information and Computation, 80.

Rissanen, J. (1983): A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 11 (2).

Robnik-Sikonja, M., Kononenko, I. (1998): Pruning regression trees with MDL. Working paper.

Wallace, C. and Patrick, J. (1993): Coding decision trees. Machine Learning, 11 (1). Kluwer Academic Publishers.
Induction of Multivariate Decision Trees by Using Dipolar Criteria Leon Bobrowski 1,2 and Marek Krȩtowski 1 1 Institute of Computer Science, Technical University of Bia lystok, Poland 2 Institute of Biocybernetics
More informationFLEXIBLE AND OPTIMAL M5 MODEL TREES WITH APPLICATIONS TO FLOW PREDICTIONS
6 th International Conference on Hydroinformatics - Liong, Phoon & Babovic (eds) 2004 World Scientific Publishing Company, ISBN 981-238-787-0 FLEXIBLE AND OPTIMAL M5 MODEL TREES WITH APPLICATIONS TO FLOW
More informationImproving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets
Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationTree-based methods for classification and regression
Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationRank Measures for Ordering
Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many
More informationAn Information-Theoretic Approach to the Prepruning of Classification Rules
An Information-Theoretic Approach to the Prepruning of Classification Rules Max Bramer University of Portsmouth, Portsmouth, UK Abstract: Keywords: The automatic induction of classification rules from
More informationOblique Linear Tree. 1. Introduction
Oblique Linear Tree João Gama LIACC, FEP - University of Porto Rua Campo Alegre, 823 4150 Porto, Portugal Phone: (+351) 2 6001672 Fax: (+351) 2 6003654 Email: jgama@ncc.up.pt WWW: http//www.up.pt/liacc/ml
More information10601 Machine Learning. Model and feature selection
10601 Machine Learning Model and feature selection Model selection issues We have seen some of this before Selecting features (or basis functions) Logistic regression SVMs Selecting parameter value Prior
More informationFuzzy Partitioning with FID3.1
Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing
More informationEfficient Pruning Method for Ensemble Self-Generating Neural Networks
Efficient Pruning Method for Ensemble Self-Generating Neural Networks Hirotaka INOUE Department of Electrical Engineering & Information Science, Kure National College of Technology -- Agaminami, Kure-shi,
More informationCyber attack detection using decision tree approach
Cyber attack detection using decision tree approach Amit Shinde Department of Industrial Engineering, Arizona State University,Tempe, AZ, USA {amit.shinde@asu.edu} In this information age, information
More informationThe Projected Dip-means Clustering Algorithm
Theofilos Chamalis Department of Computer Science & Engineering University of Ioannina GR 45110, Ioannina, Greece thchama@cs.uoi.gr ABSTRACT One of the major research issues in data clustering concerns
More informationUnivariate and Multivariate Decision Trees
Univariate and Multivariate Decision Trees Olcay Taner Yıldız and Ethem Alpaydın Department of Computer Engineering Boğaziçi University İstanbul 80815 Turkey Abstract. Univariate decision trees at each
More informationLogical Decision Rules: Teaching C4.5 to Speak Prolog
Logical Decision Rules: Teaching C4.5 to Speak Prolog Kamran Karimi and Howard J. Hamilton Department of Computer Science University of Regina Regina, Saskatchewan Canada S4S 0A2 {karimi,hamilton}@cs.uregina.ca
More informationUSING REGRESSION TREES IN PREDICTIVE MODELLING
Production Systems and Information Engineering Volume 4 (2006), pp. 115-124 115 USING REGRESSION TREES IN PREDICTIVE MODELLING TAMÁS FEHÉR University of Miskolc, Hungary Department of Information Engineering
More information7. Decision or classification trees
7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Decision Tree Example Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short} Class: Country = {Gromland, Polvia} CS4375 --- Fall 2018 a
More informationAllstate Insurance Claims Severity: A Machine Learning Approach
Allstate Insurance Claims Severity: A Machine Learning Approach Rajeeva Gaur SUNet ID: rajeevag Jeff Pickelman SUNet ID: pattern Hongyi Wang SUNet ID: hongyiw I. INTRODUCTION The insurance industry has
More informationThe Basics of Decision Trees
Tree-based Methods Here we describe tree-based methods for regression and classification. These involve stratifying or segmenting the predictor space into a number of simple regions. Since the set of splitting
More informationApplication of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set
Application of Additive Groves Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set Daria Sorokina Carnegie Mellon University Pittsburgh PA 15213
More informationBITS F464: MACHINE LEARNING
BITS F464: MACHINE LEARNING Lecture-16: Decision Tree (contd.) + Random Forest Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems Engineering, BITS Pilani, Rajasthan-333031
More informationClassification Using Unstructured Rules and Ant Colony Optimization
Classification Using Unstructured Rules and Ant Colony Optimization Negar Zakeri Nejad, Amir H. Bakhtiary, and Morteza Analoui Abstract In this paper a new method based on the algorithm is proposed to
More informationAPHID: Asynchronous Parallel Game-Tree Search
APHID: Asynchronous Parallel Game-Tree Search Mark G. Brockington and Jonathan Schaeffer Department of Computing Science University of Alberta Edmonton, Alberta T6G 2H1 Canada February 12, 1999 1 Running
More informationImproving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique
Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,
More informationConcept Tree Based Clustering Visualization with Shaded Similarity Matrices
Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices
More informationEvaluation of Decision Tree Pruning Algorithms for Complexity and Classification Accuracy
Evaluation of Decision Tree Pruning Algorithms for Complexity and Classification Accuracy Dipti D. Patil Assistant Professor, MITCOE, Pune, INDIA V.M. Wadhai Professor and Dean of Research, MITSOT, MAE,
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationBRACE: A Paradigm For the Discretization of Continuously Valued Data
Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 7-2, 994 BRACE: A Paradigm For the Discretization of Continuously Valued Data Dan Ventura Tony R. Martinez Computer Science
More informationBoosting Simple Model Selection Cross Validation Regularization. October 3 rd, 2007 Carlos Guestrin [Schapire, 1989]
Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 3 rd, 2007 1 Boosting [Schapire, 1989] Idea: given a weak
More informationStochastic propositionalization of relational data using aggregates
Stochastic propositionalization of relational data using aggregates Valentin Gjorgjioski and Sašo Dzeroski Jožef Stefan Institute Abstract. The fact that data is already stored in relational databases
More informationSearch and Optimization
Search and Optimization Search, Optimization and Game-Playing The goal is to find one or more optimal or sub-optimal solutions in a given search space. We can either be interested in finding any one solution
More informationStudy on Classifiers using Genetic Algorithm and Class based Rules Generation
2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules
More informationClassification. Instructor: Wei Ding
Classification Part II Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/004 1 Practical Issues of Classification Underfitting and Overfitting Missing Values Costs of Classification
More informationLogistic Model Tree With Modified AIC
Logistic Model Tree With Modified AIC Mitesh J. Thakkar Neha J. Thakkar Dr. J.S.Shah Student of M.E.I.T. Asst.Prof.Computer Dept. Prof.&Head Computer Dept. S.S.Engineering College, Indus Engineering College
More informationA CSP Search Algorithm with Reduced Branching Factor
A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il
More informationDecision Tree Learning
Decision Tree Learning 1 Simple example of object classification Instances Size Color Shape C(x) x1 small red circle positive x2 large red circle positive x3 small red triangle negative x4 large blue circle
More informationImproving Tree-Based Classification Rules Using a Particle Swarm Optimization
Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Chi-Hyuck Jun *, Yun-Ju Cho, and Hyeseon Lee Department of Industrial and Management Engineering Pohang University of Science
More informationCpk: What is its Capability? By: Rick Haynes, Master Black Belt Smarter Solutions, Inc.
C: What is its Capability? By: Rick Haynes, Master Black Belt Smarter Solutions, Inc. C is one of many capability metrics that are available. When capability metrics are used, organizations typically provide
More informationGenetic Programming for Data Classification: Partitioning the Search Space
Genetic Programming for Data Classification: Partitioning the Search Space Jeroen Eggermont jeggermo@liacs.nl Joost N. Kok joost@liacs.nl Walter A. Kosters kosters@liacs.nl ABSTRACT When Genetic Programming
More informationBoosting Simple Model Selection Cross Validation Regularization
Boosting: (Linked from class website) Schapire 01 Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 8 th,
More informationClassification/Regression Trees and Random Forests
Classification/Regression Trees and Random Forests Fabio G. Cozman - fgcozman@usp.br November 6, 2018 Classification tree Consider binary class variable Y and features X 1,..., X n. Decide Ŷ after a series
More informationComparing Univariate and Multivariate Decision Trees *
Comparing Univariate and Multivariate Decision Trees * Olcay Taner Yıldız, Ethem Alpaydın Department of Computer Engineering Boğaziçi University, 80815 İstanbul Turkey yildizol@cmpe.boun.edu.tr, alpaydin@boun.edu.tr
More informationRegression Error Characteristic Surfaces
Regression Characteristic Surfaces Luís Torgo LIACC/FEP, University of Porto Rua de Ceuta, 118, 6. 4050-190 Porto, Portugal ltorgo@liacc.up.pt ABSTRACT This paper presents a generalization of Regression
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationBuilding Classifiers using Bayesian Networks
Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance
More informationPUBLIC: A Decision Tree Classifier that Integrates Building and Pruning
Data Mining and Knowledge Discovery, 4, 315 344, 2000 c 2000 Kluwer Academic Publishers. Manufactured in The Netherlands. PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning RAJEEV
More informationUsing Pairs of Data-Points to Define Splits for Decision Trees
Using Pairs of Data-Points to Define Splits for Decision Trees Geoffrey E. Hinton Department of Computer Science University of Toronto Toronto, Ontario, M5S la4, Canada hinton@cs.toronto.edu Michael Revow
More informationEvaluating the Replicability of Significance Tests for Comparing Learning Algorithms
Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms Remco R. Bouckaert 1,2 and Eibe Frank 2 1 Xtal Mountain Information Technology 215 Three Oaks Drive, Dairy Flat, Auckland,
More informationInstance-Based Prediction of Continuous Values
From: AAAI Technical Report WS-94-01. Compilation copyright 1994, AAAI (www.aaai.org). All rights reserved. Instance-Based Prediction of Continuous Values Tony Townsend-Weber and Dennis Kibler Information
More informationClassification and Regression Trees
Classification and Regression Trees Matthew S. Shotwell, Ph.D. Department of Biostatistics Vanderbilt University School of Medicine Nashville, TN, USA March 16, 2018 Introduction trees partition feature
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationOptimal Extension of Error Correcting Output Codes
Book Title Book Editors IOS Press, 2003 1 Optimal Extension of Error Correcting Output Codes Sergio Escalera a, Oriol Pujol b, and Petia Radeva a a Centre de Visió per Computador, Campus UAB, 08193 Bellaterra
More informationLook-Ahead Based Fuzzy Decision Tree Induction
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 9, NO. 3, JUNE 2001 461 Look-Ahead Based Fuzzy Decision Tree Induction Ming Dong, Student Member, IEEE, and Ravi Kothari, Senior Member, IEEE Abstract Decision
More informationSupervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
Supervised Learning Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning y=f(x): true function (usually not known) D: training
More informationEducated brute-force to get h(4)
Educated brute-force to get h(4) Rogério Reis Nelma Moreira João Pedro Pedroso Technical Report Series: DCC-04-04 rev.3 Departamento de Ciência de Computadores Faculdade de Ciências & Laboratório de Inteligência
More informationUnsupervised Discretization using Tree-based Density Estimation
Unsupervised Discretization using Tree-based Density Estimation Gabi Schmidberger and Eibe Frank Department of Computer Science University of Waikato Hamilton, New Zealand {gabi, eibe}@cs.waikato.ac.nz
More informationEfficient Pairwise Classification
Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany Abstract. Pairwise classification is a class binarization
More informationA Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995)
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995) Department of Information, Operations and Management Sciences Stern School of Business, NYU padamopo@stern.nyu.edu
More informationCSC411/2515 Tutorial: K-NN and Decision Tree
CSC411/2515 Tutorial: K-NN and Decision Tree Mengye Ren csc{411,2515}ta@cs.toronto.edu September 25, 2016 Cross-validation K-nearest-neighbours Decision Trees Review: Motivation for Validation Framework:
More informationData Mining and Knowledge Discovery Practice notes 2
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationA Fast Decision Tree Learning Algorithm
A Fast Decision Tree Learning Algorithm Jiang Su and Harry Zhang Faculty of Computer Science University of New Brunswick, NB, Canada, E3B 5A3 {jiang.su, hzhang}@unb.ca Abstract There is growing interest
More informationImproving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique
www.ijcsi.org 29 Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn
More informationAn Empirical Investigation of the Trade-Off Between Consistency and Coverage in Rule Learning Heuristics
An Empirical Investigation of the Trade-Off Between Consistency and Coverage in Rule Learning Heuristics Frederik Janssen and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group Hochschulstraße
More informationSECRET: A Scalable Linear Regression Tree Algorithm
SECRET: A Scalable Linear Regression Tree Algorithm Alin Dobra Department of Computer Science Cornell University Ithaca, NY 4853 dobra@cs.cornell.edu Johannes Gehrke Department of Computer Science Cornell
More informationUsing Turning Point Detection to Obtain Better Regression Trees
Using Turning Point Detection to Obtain Better Regression Trees Paul K. Amalaman, Christoph F. Eick and Nouhad Rizk pkamalam@uh.edu, ceick@uh.edu, nrizk@uh.edu Department of Computer Science, University
More informationSwarm Based Fuzzy Clustering with Partition Validity
Swarm Based Fuzzy Clustering with Partition Validity Lawrence O. Hall and Parag M. Kanade Computer Science & Engineering Dept University of South Florida, Tampa FL 33620 @csee.usf.edu Abstract
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationLecture 5 of 42. Decision Trees, Occam s Razor, and Overfitting
Lecture 5 of 42 Decision Trees, Occam s Razor, and Overfitting Friday, 01 February 2008 William H. Hsu, KSU http://www.cis.ksu.edu/~bhsu Readings: Chapter 3.6-3.8, Mitchell Lecture Outline Read Sections
More informationDevelopment of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1
Development of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1 Dongkyu Jeon and Wooju Kim Dept. of Information and Industrial Engineering, Yonsei University, Seoul, Korea
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationRandom Forests and Boosting
Random Forests and Boosting Tree-based methods are simple and useful for interpretation. However they typically are not competitive with the best supervised learning approaches in terms of prediction accuracy.
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationCHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY
23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series
More informationEnsemble Learning. Another approach is to leverage the algorithms we have via ensemble methods
Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning
More informationUnivariate Margin Tree
Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,
More informationText Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999
Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification
More informationCalibrating Random Forests
Calibrating Random Forests Henrik Boström Informatics Research Centre University of Skövde 541 28 Skövde, Sweden henrik.bostrom@his.se Abstract When using the output of classifiers to calculate the expected
More informationSpeeding Up Logistic Model Tree Induction
Speeding Up Logistic Model Tree Induction Marc Sumner 1,2,EibeFrank 2,andMarkHall 2 1 Institute for Computer Science, University of Freiburg, Freiburg, Germany sumner@informatik.uni-freiburg.de 2 Department
More informationDecision Trees. This week: Next week: constructing DT. Pruning DT. Ensemble methods. Greedy Algorithm Potential Function.
Decision Trees This week: constructing DT Greedy Algorithm Potential Function upper bounds the error Pruning DT Next week: Ensemble methods Random Forest 2 Decision Trees - Boolean x 0 + x 6 0 + - 3 Decision
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationFUTURE communication networks are expected to support
1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,
More information