LIACC
Machine Learning group

Internal Report n.

Sequence-based Methods for Pruning Regression Trees

by Luís Torgo

20/4/98

Sequence-based Methods for Pruning Regression Trees

Luís Torgo
LIACC - University of Porto
R. Campo Alegre, Porto, Portugal
ltorgo@ncc.up.pt
URL :  Phone : (+351)  Fax : (+351)

ABSTRACT: Pruning has been considered the key issue in obtaining reliable tree-based models. Current approaches to this task have taken two distinct pathways. On the one hand, there are methods based on selection from a sequence of candidate pruned trees. On the other hand, there are algorithms that follow a bottom-up approach, pruning unreliable lower branches of the unpruned tree and thus producing a single tree model. These two approaches entail different trade-offs between accuracy and interpretability. This paper presents a comparative study between representatives of both approaches. The study is carried out in the context of regression trees, with a focus on sequence-based methods. We describe some existing algorithms and present some new variants. We claim that sequence-based methods provide an additional benefit by allowing the user to easily select a model with the intended degree of interpretability. Moreover, our study reveals that this is achieved without sacrificing accuracy.

KEYWORDS: Tree pruning methods, regression trees.

1 Introduction

Recursive partitioning is the algorithm behind most decision tree induction systems. It provides computational efficiency at the cost of statistical unreliability in the lower branches of the trees. Moreover, noisy domains often lead to the well-known problem of over-specialization.

Post-pruning of these trees has always been considered a major step in tree induction (Breiman et al., 1984). Breiman and his colleagues described a pruning methodology based on the notion of tree selection using reliable error estimates. A side effect of this approach, which is often overlooked, is that it provides a set of alternative pruned trees entailing different compromises between accuracy and simplicity. The availability of this set of alternative models can be considered a key advantage for the users of these systems, as they may easily choose an alternative tree that better fits their domain-specific needs. Interpretability has always been considered a key advantage of Machine Learning (ML) approaches, and these methods provide an additional degree of flexibility in this respect.

Niblett and Bratko (1986) described an alternative approach to tree post-pruning. Their algorithm proceeds by pruning unreliable lower branches of the trees in a bottom-up fashion. Compared to Breiman et al.'s approach this method is computationally more efficient, but on the other hand it produces a single tree as its result. The only way the user can obtain alternative tree models is by running the algorithm again with different settings for the parameters of the error estimation method, which entails additional computational cost. Moreover, these parameters often lack intuitive meaning.

Our study compares these two approaches to pruning in terms of accuracy and interpretability on twelve regression domains. The comparison is carried out using a common regression tree learning system. Our system (RT) implements several sequence-based pruning methods as well as Niblett & Bratko's method. Existing comparative work (Mingers, 1989; Esposito et al., 1993, 1995) has not addressed the issue of the flexible model choice offered by sequence-based pruning methods. These works compared full-featured pruning algorithms, whereas ours emphasizes sequence-based pruning methods. We introduce several new variants for exploring the large search space of pruned trees. We evaluate these methods and stress their differences compared to a well-known single-tree pruning algorithm (Niblett & Bratko's method).

The next section describes the pruning methods used in our comparisons. Section 3 presents the experiments carried out, while Section 4 contains the main conclusions of this work.

2 Pruning Regression Trees

Post-pruning of tree models can be seen as a search through the space of all possible sub-trees of the original, overly large learned tree (Esposito et al., 1993). Esposito and her colleagues (1993) have described a series of pruning algorithms from this search perspective.

Their work presents the approaches followed by different pruning methods with respect to how the search space is explored, the operators used for moving between search states, and the state evaluation functions used. Our study is focused on the question of how the next state is generated and how far the search goes. Sequence-based methods use a hill-climbing procedure that selects the next node to prune. In this paper we present several alternative functions to guide this selection. Moreover, these methods keep exploring the search space until a tree consisting of a single leaf node is reached. Selection of the final tree model is left to a second stage of the pruning process, which is considered distinct from tree generation (Breiman et al., 1984). As a result of this procedure, systems using these approaches can output both the selected tree and the other candidates, together with the evaluation of these alternatives used in the selection phase. An example of such a sequence is given in Table 1, which results from applying our RT system to the Boston Housing domain from the UCI repository (Merz and Murphy, 1996). The information provided by this type of table can be of extreme relevance, as it allows the user to explore different alternative models without additional learning effort. We claim that this is an important advantage in real world applications. Furthermore, it is known that it is quite difficult to find algorithm settings that work well over all domains.

Other pruning methods exist that do not provide such a sequence of alternative trees. Apart from the CART system (Breiman et al., 1984), most systems follow such strategies. Most of the existing alternatives use a bottom-up approach (Esposito et al., 1993). They start with the unpruned tree and visit the nodes in some pre-defined order; each time certain evaluation conditions are met, the node is pruned. These methods produce a single tree as the result of the pruning stage. However, it should be mentioned that by casting these methods into a search-based framework, as proposed by Esposito et al. (1993), one can see that they in fact explore alternative tree models. This means that these methods could in principle be adapted to produce a set of trees (although potentially smaller than the ones generated by sequence-based methods). Still, this research line has not been explored.

Table 1. An example of a sequence of trees for the Housing data set.

[Table 1 lists the sequence of pruned trees T0, T1, ..., with columns TREE, N.LEAVES, ERR, ERR', SE(ERR'), MDL and PRUNED AT NODES; the numeric entries are not recoverable in this copy. In the sequence, T7 is marked with '*' and T12 with '$'.]

TABLE LEGEND:
ERR - training set error
ERR' - estimated true error
SE(ERR') - standard error of the estimate
MDL - code description length of the tree (in bits)
* - the lowest estimated error tree
+ - the 1-SE tree
$ - the Minimum Description Length tree
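
As a concrete illustration of this selection stage, the following sketch (hypothetical, not the actual RT code) shows how the lowest estimated error tree and the 1-SE tree of a sequence such as Table 1 could be identified from the estimated errors and their standard errors.

    def select_trees(sequence):
        # `sequence` is a list of (n_leaves, estimated_error, standard_error)
        # tuples, one per pruned tree, ordered as they were generated.
        best = min(sequence, key=lambda s: s[1])          # the '*' tree
        # 1-SE rule ('+'): the smallest tree whose estimated error is within
        # one standard error of the best estimate.
        threshold = best[1] + best[2]
        one_se = min((s for s in sequence if s[1] <= threshold),
                     key=lambda s: s[0])
        return best, one_se

The Minimum Description Length tree ('$') would simply be the element of the sequence with the lowest MDL value.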

2.1 Generating Sequences of Pruned Trees

There are two main approaches to sequence-based pruning. Nested methods generate a sequence of trees in which each element is a sub-tree of the previous one. The key issue in these methods is the selection of the node to prune at each iteration; we describe several alternative approaches to this issue below. Optimal pruning algorithms generate sequences of pruned trees whose size decreases by one, where each element is the most accurate of all possible trees of that size. The trees in these sequences are not nested. This approach was already mentioned in the CART book (Breiman et al., 1984), although an algorithm was not provided. Bohanec and Bratko (1994) proposed a dynamic programming algorithm (OPT) to solve this search problem, and Almuallim (1996) has recently proposed an improvement of this algorithm (OPT-2). In our study we use this algorithm as the representative of this type of methods.

We now briefly describe the main ideas of the methods used in our experimental comparison. All of them were implemented in the RT system.

2.1.1 Minimal Cost Complexity

This is a nested sequence method originally presented in Breiman et al. (1984). It is based on a measure of the cost complexity of a tree, given by the following equation:

    R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|        (1)

where \alpha is a complexity parameter, R(T) is the error of the tree T on the training set, and |\tilde{T}| is the number of leaves of the tree. Based on this notion of \alpha-based cost complexity, Breiman and his colleagues describe an iterative algorithm that generates a sequence of nested pruned trees. At each step the node chosen to be pruned is the node t that minimizes the function

    f(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}        (2)

where R(t) is the training error at node t, R(T_t) is the training error of the sub-tree rooted at t, and |\tilde{T}_t| is the number of leaves of this sub-tree.

2.1.2 The Minimal Error Decrease

This nested method uses a simple heuristic to choose the next node to prune. Based on the error estimates obtained during the learning phase, at each step we select the splitting node with the lowest impact on the tree error. This can be described as selecting the node t that minimizes the function

    f(t) = R(t) - R(T_t)        (3)
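
A minimal sketch of these two node-selection criteria, assuming a hypothetical node record that stores the quantities appearing in equations (2) and (3):

    from dataclasses import dataclass

    @dataclass
    class NodeStats:
        err: float          # R(t): training error of node t if turned into a leaf
        subtree_err: float  # R(T_t): training error of the sub-tree rooted at t
        n_leaves: int       # number of leaves of that sub-tree

    def cost_complexity_criterion(t: NodeStats) -> float:
        # Equation (2): prune the node minimising this value.
        return (t.err - t.subtree_err) / (t.n_leaves - 1)

    def error_decrease_criterion(t: NodeStats) -> float:
        # Equation (3): prune the node with the smallest increase in training error.
        return t.err - t.subtree_err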

2.1.3 The Minimum Statistical Support method

The idea behind this method is to prune the node that is potentially the least reliable. We follow a heuristic approach to measuring statistical reliability: we make the strong assumption that nodes with a lower number of training samples lead to less reliable estimates of the error. Based on this assumption, we select the node with the minimal number of cases as the next node to prune.

2.1.4 The MDL-based method

Minimum description length (Rissanen, 1983) has been used as a means to characterize models within ML (Quinlan and Rivest, 1989; Wallace and Patrick, 1993). This methodology, based on a sound theoretical background, provides a way of incorporating in a single measure both the model complexity and its errors. In a way this is the idea behind the Minimal Cost Complexity measure presented in Section 2.1.1, but the latter lacks theoretical justification. Recently, Robnik-Sikonja and Kononenko (1998) presented a coding schema for regression tree models. Based on this coding we have developed a method for selecting a node to prune in a nested sequence generation algorithm. The method consists of choosing the node with the lowest variation in description length. Notice that we do not say lowest decrease, as in the previous training error-based measures: training error always decreases as the size of the tree grows, whereas MDL coding follows a U-shaped curve (this effect can be observed in Table 1). The function guiding the choice of a node can be described by

    \min_t \; \left[ MDL(t) - MDL(T_t) \right]        (4)

where MDL(t) is the coding length of node t if it were a leaf, and MDL(T_t) is the coding length of the sub-tree rooted at t. The coding length of a leaf node amounts to the sum of the coding lengths of the errors committed by the model at that node plus the coding of the model itself. This turns out to be the coding of a set of real numbers, which can be done using the method proposed by Rissanen (1983). The coding length of a tree is the sum of the coding of its split nodes plus the coding of its leaf nodes. We follow the same strategy outlined in Robnik-Sikonja and Kononenko (1998) to code each split node.
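
The nested methods of Sections 2.1.1 to 2.1.4 differ only in the criterion used to pick the node to prune. Below is a sketch of the common generation loop, under the assumption of a simple binary-tree representation (hypothetical; the actual RT data structures differ):

    import copy
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Tree:
        err: float                  # training error if this node is made a leaf
        n_cases: int                # number of training cases reaching this node
        left: Optional['Tree'] = None
        right: Optional['Tree'] = None

        def is_leaf(self):
            return self.left is None and self.right is None

    def internal_nodes(t):
        return [] if t.is_leaf() else ([t] + internal_nodes(t.left)
                                       + internal_nodes(t.right))

    def minimum_support_criterion(t):
        # Section 2.1.3: prune the node covering the fewest training cases.
        return t.n_cases

    def nested_sequence(tree, criterion=minimum_support_criterion):
        # Generate T0, T1, ... down to a single-leaf tree by repeatedly pruning
        # the internal node that minimises `criterion`; equations (2), (3) or
        # (4) would play the role of `criterion` for the other nested methods.
        current = copy.deepcopy(tree)
        sequence = [copy.deepcopy(current)]
        while not current.is_leaf():
            node = min(internal_nodes(current), key=criterion)
            node.left = node.right = None       # prune: turn the node into a leaf
            sequence.append(copy.deepcopy(current))
        return sequence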

2.1.5 The OPT-2 method

Breiman et al. (1984) defined the notion of an optimal sequence of pruned trees as a sequence of trees whose size increases by one, such that each tree is the most accurate tree of that size. Based on these ideas, Bohanec and Bratko (1994) presented a dynamic programming algorithm (OPT) that solves the problem of finding these trees in an efficient way. Recently, Almuallim (1996) described an improvement of this algorithm, named OPT-2. This algorithm improves the computational efficiency of OPT and is able to generate the trees sequentially, as opposed to OPT, which generates the whole sequence at once. We have implemented the OPT-2 algorithm within our RT system.

2.1.6 The Depth-first method

As a kind of baseline method we have used an algorithm that runs through the tree in a depth-first fashion. Each time this method reaches a node whose descendants are leaves, it generates a new pruned tree by pruning at that node.
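
A sketch of this baseline, reusing the hypothetical Tree class and copy import of the previous sketch; the exact traversal details are our reading of the description above:

    def depth_first_sequence(tree):
        # Traverse the tree depth-first (post-order); whenever a node whose
        # children are both leaves is reached, prune at that node and emit the
        # resulting tree, ending with a single-leaf tree.
        current = copy.deepcopy(tree)
        sequence = [copy.deepcopy(current)]

        def visit(node):
            if node.is_leaf():
                return
            visit(node.left)
            visit(node.right)
            if node.left.is_leaf() and node.right.is_leaf():
                node.left = node.right = None
                sequence.append(copy.deepcopy(current))

        visit(current)
        return sequence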

2.2 Single-tree Methods

These methods produce a single tree as their outcome, as opposed to the methods described in the previous section. Several such methods exist, exploring the search space with different heuristics. Niblett and Bratko's algorithm (1986) is one of the best known. It has been applied in the context of regression trees in the RETIS system (Karalic and Cestnik, 1991; Karalic, 1992). The algorithm runs through every node of the tree calculating its error (called the static error in the authors' nomenclature). In the RETIS system this calculation is based on m-estimates (Cestnik, 1990; Karalic and Cestnik, 1991) of the variance. These estimates provide more reliable values of the node errors than the ones obtained with the training set. The m-estimate of the error in a node is based on the estimate of the average Y value and is given by

    Est(\mu_Y) = \frac{1}{n+m}\left( \sum_{i=1}^{n} y_i + \frac{m}{N}\sum_{i=1}^{N} y_i \right)

    m\text{-}Est(\sigma^2_Y) = \frac{1}{n+m}\left( \sum_{i=1}^{n} y_i^2 + \frac{m}{N}\sum_{i=1}^{N} y_i^2 \right) - \left( Est(\mu_Y) \right)^2        (5)

where N is the number of training cases, n is the number of cases in the node, and m is a parameter of these estimates. For each split node a dynamic error is also calculated. This is equal to the weighted sum of the errors of its children, with weights obtained from the number of cases going to each branch. If the dynamic error is higher than the static error, the node is pruned. We have implemented this algorithm in our RT system.

M-estimates can also be used to calculate the error of a whole tree. This enables the use of these estimates as a means of selecting a tree in a sequence-based method. By proceeding this way we are able to compare sequence-based methods with Niblett and Bratko's algorithm.
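
A rough sketch of this pruning decision, with the m-estimate of equation (5) written out explicitly (hypothetical names and data structures; the RETIS/RT implementations differ in detail, and `node` is assumed to carry the target values `ys` of the training cases that reach it):

    def m_estimate_error(node_ys, all_ys, m):
        # Equation (5): m-estimate of the variance (error) at a node, blending
        # the n cases in the node with the prior given by all N training cases.
        n, N = len(node_ys), len(all_ys)
        mean = (sum(node_ys) + (m / N) * sum(all_ys)) / (n + m)
        second_moment = (sum(y * y for y in node_ys)
                         + (m / N) * sum(y * y for y in all_ys)) / (n + m)
        return second_moment - mean ** 2

    def nb_prune(node, all_ys, m):
        # Niblett & Bratko style bottom-up pruning: compare the static error of
        # a node with the weighted (dynamic) error of its children and prune
        # when the dynamic error is higher.
        static = m_estimate_error(node.ys, all_ys, m)
        if node.is_leaf():
            return static
        left_err = nb_prune(node.left, all_ys, m)
        right_err = nb_prune(node.right, all_ys, m)
        n_left, n_right = len(node.left.ys), len(node.right.ys)
        dynamic = (n_left * left_err + n_right * right_err) / (n_left + n_right)
        if dynamic > static:                    # dynamic error worse: prune here
            node.left = node.right = None
            return static
        return dynamic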

3 Experimental Comparison

In the first set of experiments we tried to ascertain whether there was a clear winner among the sequence-based methods. To compare different sequences of trees we used the following methodology, aimed at finding out the potential of each sequence. The goal of producing a sequence is to provide a set of alternative models to be used in a subsequent evaluation phase for the final tree selection. If we had a perfect tree evaluation method we could compare the best selections of all sequences. We can simulate this set-up by using a sufficiently large separate set of cases: with these cases we estimate the error of each tree in each sequence and compare the best trees of each sequence. These best trees represent the potential of each sequence generation method. The goal of the comparisons was to observe the performance of each method for different training sample sizes. We used the data sets presented in Table 2.

Table 2. The data sets used and the available number of cases (Training Pool;Test Set).
Abalone       3133;1044      Kinematics    4500;3692
Pole          5000;4065      Fried         ;5000
Elevators     8752;7847      Census16H     17000;5784
Ailerons      7154;6596      Census8L      17000;5784
CompActiv.    4500;3692      2Dplanes      20000;5000
CompActiv(s)  4500;3692      Mv            ;5000

For each data set we randomly obtained samples of 300, 600, 1000 and 2000 cases from the training pool. Using the RT learning system we obtained a large regression tree for each sample. The sequence-based methods were then used to obtain a sequence of pruned trees, and the large separate test set was used to select the best tree of each sequence. We compared the size and accuracy of these trees. The results we present are averages of 20 random repetitions for each training size. They are given as percentage losses over the result of the best method (thus the best result for each data set has a score of 0%); a sketch of this protocol is given after Figure 1.

Figure 1 presents the results for tree size (measured as the number of leaves) for all data sets and training sizes. The results in this figure reveal that the methods behave similarly for the different training sizes. In effect, with few exceptions, the differences in tree size are not large. The apparent absence of a method that consistently outperforms the others can be explained by the fact that there is usually a large number of trees with similar performance but quite different sizes, as was observed by Breiman et al. (1984).

[Figure 1: four panels (samples of 300, 600, 1000 and 2000 cases), each showing for every data set the percentage loss in tree size of DepthFrst, MinError, MCC, MDL, OPT2 and MinSS relative to the best method.]
Figure 1 - The comparative results for tree size.
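
For concreteness, a small sketch of the protocol behind Figures 1 and 2 (hypothetical helper names; the score is the Mean Squared Error, or the number of leaves, of the best tree of each sequence on the large separate test set):

    def best_of_sequence(sequence, score_on_test):
        # Potential of a sequence: the tree with the lowest score on the
        # separate test set.
        return min(sequence, key=score_on_test)

    def percentage_losses(results):
        # `results` maps each method name to the score of its best tree.
        # Scores are reported as percentage losses over the best method,
        # so the best method gets 0%.
        best = min(results.values())
        return {method: 100.0 * (score - best) / best
                for method, score in results.items()}

For example, percentage_losses({'OPT2': 10.2, 'MinSS': 10.0, 'MCC': 11.5}) yields 0% for MinSS, 2% for OPT2 and 15% for MCC.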

Figure 2 shows similar graphs for the Mean Squared Error of the best trees of each sequence. The results are again given as percentages and were trimmed at 20% losses. The differences between the methods decrease as the training samples grow in size. It is interesting to observe that there are extremely large differences in the potential of the sequences for samples with a few hundred cases. Under these conditions the OPT-2 method is more stable over all domains (only once did it achieve a result worse than 5%). On the other hand, methods like Depth-first, Minimal Cost Complexity and Minimal Error have a very bad score on the CompAct data set. Our results seem to indicate that only with small training samples are there large differences in the potential of the sequences obtained by the methods we have tried.

[Figure 2: four panels (samples of 300, 600, 1000 and 2000 cases), each showing for every data set the percentage loss in error of DepthFrst, MinError, MCC, MDL, OPT2 and MinSS relative to the best method, trimmed at 20%.]
Figure 2 - The results of the error of the best trees for different training sizes.

We have conducted a set of paired comparisons in order to quantify the statistical significance of the differences between the methods on all data sets. We used a 4x10-fold Cross Validation evaluation procedure. The use of Cross Validation characterizes performance differences due to variations in the test sets, while the repetition with different random permutations of the data set removes eventual biases due to training set order. For each of the 40 folds all methods were tried under the same conditions, apart from the sequence generation algorithm.
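
A compact sketch of such a 4x10-fold cross-validation loop (hypothetical; a paired significance test, e.g. scipy.stats.ttest_rel over the 40 paired error estimates, would then be applied to each pair of methods):

    import random

    def cross_val_splits(n_cases, n_folds=10, n_repeats=4, seed=0):
        # 4x10-fold CV: four different random permutations of the data set,
        # each split into ten folds, giving 40 paired train/test splits.
        rng = random.Random(seed)
        for _ in range(n_repeats):
            idx = list(range(n_cases))
            rng.shuffle(idx)
            folds = [idx[i::n_folds] for i in range(n_folds)]
            for i in range(n_folds):
                test = folds[i]
                train = [j for k, fold in enumerate(folds) if k != i
                         for j in fold]
                yield train, test

On every split all pruning methods are trained and evaluated on exactly the same cases, so the resulting 40 error estimates can be compared with a paired test.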

Table 3 presents the results in terms of the Mean Squared Error of the best tree in each sequence. The best result for each data set is presented in italics, and the results that are significantly worse than this best score are marked with asterisks (one for 95% confidence and two for 99%).

Table 3. Mean Squared Error for each sequence-based method.
[Columns: DepthFrst, MinErr, MDL, MCC, OPT-2, MinStatS; rows: the twelve data sets. Most numeric entries are not recoverable in this copy; the recoverable rows are:]
            DepthFrst    MinErr       MDL          MCC          OPT-2        MinStatS
Census16H   1.9E+09 **   1.97E+09 **  1.72E+09 **  1.99E+09 **  1.92E+09 **  1.63E+09
Census8L    1.55E+09 **  1.54E+09 **  1.26E+09 **  1.45E+09 **  1.43E+09 **  1.21E+09

We can observe that our proposed Minimum Statistical Support (MS) method is clearly the method with the best results. MS is the only method that never loses significantly to the others, while achieving several significant wins. This result is even more remarkable if we take into account that MS is a particularly simple method when compared to others such as OPT-2. Our other proposed method (MDL) can be considered the second best in terms of accuracy. Surprisingly, there are methods that even achieve worse results than the blind Depth-first approach on some domains. This table also shows that the choice of the sequence generation method may lead to significant differences in the accuracy of the final model.

Table 4 presents the results of this comparison in terms of the size of the trees. This table uses the same notation as the previous one regarding the significance of differences.

Table 4. Tree size for each sequence-based method.
[Columns: DepthFrst, MinErr, MDL, MCC, OPT-2, MinStatS; rows: the twelve data sets. The numeric entries are largely not recoverable in this copy.]

In terms of tree size, both MDL and MS achieve significantly better results than the others. The difference can be extremely large for some domains (Census16H, Census8L, Ailerons and Abalone). These results make the accuracy scores of these two methods, given in Table 3, even more remarkable.

We now address the issue of how the sequence-based methods compare to single-tree algorithms. We have compared one representative of each approach: the Minimum Statistical Support method (MS) and Niblett and Bratko's algorithm (NB), respectively. Our implementation of the latter algorithm uses m-estimates to obtain the error of the nodes, as in the RETIS system (Karalic and Cestnik, 1991). To ensure a fair comparison, we have also used these estimates to select one of the trees of the sequence produced by MS. We carried out several 4x10-fold Cross Validation paired experiments using different values of the m parameter (0.5, 1.5, 2, 3, 5 and 10). For each fold a sequence was built using the MS method and m-estimates were used to select one of its trees. This tree was then compared with the tree chosen by the NB method using m-estimates with the same value of the m parameter.
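
A sketch of this selection step, under the assumption that the error of a whole tree is the case-weighted average of the m-estimated errors of its leaves (hypothetical helpers; m_estimate_error is the function sketched in Section 2.2):

    def all_leaves(tree):
        if tree.is_leaf():
            return [tree]
        return all_leaves(tree.left) + all_leaves(tree.right)

    def m_estimate_tree_error(tree, all_ys, m):
        # Tree error as the weighted average of the m-estimated leaf errors,
        # with weights proportional to the number of cases in each leaf.
        leaves = all_leaves(tree)
        total = sum(len(leaf.ys) for leaf in leaves)
        return sum(len(leaf.ys) / total * m_estimate_error(leaf.ys, all_ys, m)
                   for leaf in leaves)

    def select_with_m_estimates(sequence, all_ys, m):
        # Pick from the MS sequence the tree with the lowest m-estimated error,
        # so that MS and NB are compared with the same error estimator.
        return min(sequence, key=lambda t: m_estimate_tree_error(t, all_ys, m))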

Table 5 shows the Mean Squared Error of this comparison for the different values of m over all domains. Whenever the result of MS is significantly better (97.5% confidence) than the corresponding result of NB, the value is shown with a dark gray background. The inverse is shown with a light gray background. The other differences lack statistical significance at this level of confidence.

Table 5. Minimum Statistical Support versus Niblett and Bratko's algorithm in terms of Mean Squared Error.
[Rows NB(m) and MS(m) for m in {0.5, 1.5, 2, 3, 5, 10}; columns: Abal., 2Pl., Pole, Elev., Ail., Mv1, Fried, Comp, Comp(s), C16H, C8L, Kin. The numeric entries are not recoverable in this copy.]

This table provides a clear indication that the representative of the sequence-based methods is significantly superior to Niblett and Bratko's algorithm on several domains. The result is even clearer if we compare the best results of each method over all tried m values (1). In effect, within this scenario we have verified that the NB method never outperformed the MS result (with or without statistical significance). On the contrary, MS was significantly (97.5% confidence) better than NB on 6 out of the 12 domains.

We now analyze the results of the same comparison in terms of the size of the trees. Table 6 presents the number of leaves of the trees obtained with both methods. This table follows the same notation regarding the statistical significance of the differences.

(1) This best setting could be automatically obtained using an internal Cross Validation tuning process.

Table 6. Minimum Statistical Support versus Niblett and Bratko's algorithm in terms of tree size (number of leaves).
[Rows NB(m) and MS(m) for m in {0.5, 1.5, 2, 3, 5, 10}; columns: Abal., 2Pl., Pole, Elev., Ail., Mv1, Fried, Comp, Comp(s), C16H, C8L, Kin. The numeric entries are not recoverable in this copy.]

This table reveals a slight superiority of the MS method, with few exceptions on some domains. If we again consider the optimal value of m for each method, we observe that each method significantly outperforms the other in 3 domains. This indicates that the results in terms of tree size are more balanced. Still, we should recall that in terms of interpretability the sequence-based methods allow the user to select other trees with the desired degree of complexity without any additional computational cost.

4 Conclusions

We have presented a set of pruning methods based on the notion of selection from a sequence of trees. These methods provide a set of alternative regression trees with different trade-offs between accuracy and interpretability. These approaches use reliable error estimates to choose one of the models, but they also allow the user to inspect and select other trees without additional learning effort. The trees in these sequences are characterized with measures that provide guidance regarding the costs of other selections. We claim that this feature of sequence-based pruning methods is of extreme importance in real applications of tree-based learners.

We have compared several sequence-based algorithms on twelve data sets under the same learning framework. In this paper we have presented two new methods for generating sequences of pruned trees (MDL and MS).

We have observed that these methods significantly outperformed other existing methods in terms of accuracy on several domains. Moreover, this score is obtained with simpler trees. Both methods could easily be adapted to classification trees. We have also compared our sequence-based MS method with an algorithm that follows a single-tree approach. Our comparison has revealed that the sequence-based method leads to trees that are significantly more accurate on several domains.

References

Almuallim, H. (1996): An efficient algorithm for optimal pruning of decision trees. Artificial Intelligence, 82 (2). Elsevier.
Bohanec, M., Bratko, I. (1994): Trading Accuracy for Simplicity in Decision Trees. Machine Learning, 15 (3). Kluwer Academic Publishers.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984): Classification and Regression Trees. Wadsworth Int. Group, Belmont, California, USA.
Cestnik, B. (1990): Estimating probabilities: a crucial task in Machine Learning. In Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90). Pitman Publishers.
Esposito, F., Malerba, D., Semerano, G. (1993): Decision Tree Pruning as a Search in the State Space. In Proceedings of the European Conference on Machine Learning (ECML-93), Brazdil, P. (ed.). LNAI-667, Springer Verlag.
Esposito, F., Malerba, D., Semerano, G. (1995): Simplifying Decision Trees by Pruning and Grafting: New Results. In Proceedings of the European Conference on Machine Learning (ECML-95), Lavrac, N. and Wrobel, S. (eds.). LNAI-912, Springer Verlag.
Karalic, A. (1992): Employing Linear Regression in Regression Tree Leaves. In Proceedings of ECAI-92. Wiley & Sons.
Karalic, A., Cestnik, B. (1991): The Bayesian approach to tree-structured regression. In Proceedings of ITI-91.
Merz, C.J., Murphy, P.M. (1996): UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.
Mingers, J. (1989): An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4. Kluwer Academic Publishers.
Niblett, T., Bratko, I. (1986): Learning decision rules in noisy domains. In Developments in Expert Systems, Bramer, M. (ed.). Cambridge University Press.
Quinlan, J., Rivest, R. (1989): Inferring decision trees using the minimum description length principle. Information and Computation, 80.
Rissanen, J. (1983): A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 11 (2).
Robnik-Sikonja, M., Kononenko, I. (1998): Pruning regression trees with MDL. Working paper.
Wallace, C., Patrick, J. (1993): Coding decision trees. Machine Learning, 11 (1). Kluwer Academic Publishers.


Random Forests and Boosting Random Forests and Boosting Tree-based methods are simple and useful for interpretation. However they typically are not competitive with the best supervised learning approaches in terms of prediction accuracy.

More information

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier

Data Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio

More information

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series

More information

Ensemble Learning. Another approach is to leverage the algorithms we have via ensemble methods

Ensemble Learning. Another approach is to leverage the algorithms we have via ensemble methods Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning

More information

Univariate Margin Tree

Univariate Margin Tree Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,

More information

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999 Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification

More information

Calibrating Random Forests

Calibrating Random Forests Calibrating Random Forests Henrik Boström Informatics Research Centre University of Skövde 541 28 Skövde, Sweden henrik.bostrom@his.se Abstract When using the output of classifiers to calculate the expected

More information

Speeding Up Logistic Model Tree Induction

Speeding Up Logistic Model Tree Induction Speeding Up Logistic Model Tree Induction Marc Sumner 1,2,EibeFrank 2,andMarkHall 2 1 Institute for Computer Science, University of Freiburg, Freiburg, Germany sumner@informatik.uni-freiburg.de 2 Department

More information

Decision Trees. This week: Next week: constructing DT. Pruning DT. Ensemble methods. Greedy Algorithm Potential Function.

Decision Trees. This week: Next week: constructing DT. Pruning DT. Ensemble methods. Greedy Algorithm Potential Function. Decision Trees This week: constructing DT Greedy Algorithm Potential Function upper bounds the error Pruning DT Next week: Ensemble methods Random Forest 2 Decision Trees - Boolean x 0 + x 6 0 + - 3 Decision

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information