Data Mining with Ensembles of Fuzzy Decision Trees

Christophe Marsala
Université Pierre et Marie Curie Paris 6, CNRS UMR 7606, LIP6, 104 avenue du Président Kennedy, Paris, F-75016, France (Christophe.Marsala@lip6.fr)

Abstract: In this paper, a study is presented to explore ensembles of fuzzy decision trees. First, the state of the art related to ensembles of (fuzzy) decision trees in Machine Learning is briefly recalled. Then, a new approach to construct a forest of fuzzy decision trees is proposed. Two experiments are described, one with forests of fuzzy decision trees and the other with bagging of fuzzy decision trees. The results highlight the interest of using fuzzy set theory in this kind of approach.

I. INTRODUCTION

Ensembles of fuzzy decision trees are a relatively new topic, introduced at the end of the 1990s with the first forests of fuzzy decision trees. In the Machine Learning and Data Mining communities, ensemble approaches have been developed to enhance the capabilities of classical learning algorithms, and they have produced many promising results. At the beginning, the bootstrap method from statistics [1] was introduced to ensure a good generalization power of the models induced by Machine Learning algorithms [2]; this led to the introduction of bagging approaches [3]. Other approaches to build ensembles of classifiers are based on boosting methods [4], [5], [6]. These approaches offer a good way to greatly enhance the capabilities of a learning algorithm. Afterwards, forests of decision trees were introduced to reduce the error rate when classifying new cases [7], [8], [9], [10]. Randomness was introduced in the construction of the decision trees: it offers a good way to cover the whole example space during learning and thus to obtain a better classifier in the end. More recently, completely random trees were introduced, leading to the construction of completely random forests [11] that efficiently build a good classifier from a training set.

Recently, ensemble approaches have been coupled with fuzzy-set-based learning algorithms [12], [13], [14], [15], [16]. The main aim of such approaches is that the combination of fuzzy learning tools with ensemble methods takes advantage of the smooth decision offered by a fuzzy classifier when classifying new cases. One difficulty here is to offer a fast algorithm to construct the ensemble. As a consequence, the size of the ensemble (for instance, the number of fuzzy decision trees required to build a sufficiently efficient forest) should be evaluated in order to combine accuracy and speed.

In this paper, we study a new approach to construct a forest of fuzzy decision trees. First, in Section II, a quick state of the art of ensemble approaches is presented, focusing on the main ones. In Section III, we propose a new approach to construct a forest of fuzzy decision trees; a method to use such a forest to classify new examples is also presented. In Section IV, two experiments are presented. They highlight the interest of using fuzzy-set-theory-based approaches in such a process and offer a way to choose a minimal number of fuzzy decision trees to build the forest. Finally, we conclude and present some future works.

II. ENSEMBLE OF (FUZZY) DECISION TREES

Studies on the combination of classifiers have been conducted for a long time.
For instance, in the statistical domain, many approaches to estimate the parameters of a probability distribution have been proposed. Methods such as leave-one-out, cross-validation, or the bootstrap are usual tools in this domain. More recently, since the beginning of the 1990s, these approaches have been introduced into the Machine Learning domain. This introduction is not very surprising, in the sense that the main aim of statistical inference is very close to the Machine Learning one: to predict a probability distribution from a sample of data. As stated by Efron: "Statistical inference concerns learning from experience: we observe a random sample x = (x_1, x_2, ..., x_n) and wish to infer properties of the complete population X = (X_1, X_2, ..., X_N) that yielded the sample. Probability theory goes in the opposite direction: from the composition of a population X we deduce the properties of a random sample x, and of statistics calculated from x. Statistical inference as a mathematical science has been developed almost exclusively in terms of probability theory" [1].

Many current approaches, such as bagging, random forests, or extremely randomized trees, are derived from the seminal work on the bootstrap to construct ensembles of classifiers. Moreover, there is another kind of approach, boosting. Boosting a learning algorithm leads to an enhancement of its performance. As stated previously: "Voting algorithms can be divided into two types: those that adaptively change the distribution of the training set based on the performance of previous classifiers (as in boosting methods) and those that do not (as in Bagging)" [4].

There exist other ways to construct ensembles of classifiers. For instance, in [17], an interesting approach has been introduced in which classifiers that output a ranking when classifying an example are combined. A threshold is set in these rankings to select the examples that are perfectly classified. From the results of several classifiers, an aggregation enables the optimization of the final ranking of the set of examples.

In the following, we focus on the main (classical) ensemble approaches, with a particular focus on fuzzy ensemble approaches based on forests of fuzzy decision trees.

A. Machine learning and bootstrap

In Machine Learning, a lot of ensemble approaches are derived from the bootstrap. The bootstrap method is explained hereafter. Let X = (x_1, x_2, ..., x_n) be a training set of examples. Each example x_i is composed of two parts: a description t_i (a vector of characteristics) and a class y_i. From X, in the statistical domain, a prediction system r_X(t) can be constructed to associate with each description t the corresponding class y. After the construction of r_X, it is necessary to estimate the error rate of the prediction of the class for any forthcoming description. This estimation can be done by means of cross-validation, but Efron and Tibshirani have shown that this approach can lead to a high variability [2]. To alleviate this drawback, they introduced the bootstrap method, which reduces the variability in the estimation of this error rate. In this method, a high number of samples (called bootstrap samples) are drawn randomly, with replacement, from X. This yields a high number of subsets of X that can be used to reduce the variance of the estimation of the error rate.

B. Bagging and boosting

The bagging method has been proposed by [3]. It is based on the bootstrap method: bagging is the acronym of Bootstrap aggregating. Bagging is very close to the bootstrap but differs in its final aim. Bagging is a machine learning method: it relies on a learning algorithm, and its aim is to construct an ensemble of predictors in order to obtain a highly confident predictor. A different method, boosting, was introduced by [4], [5], [6] at the beginning of the 1990s. The aim of this method is to combine a set of weak learners to obtain a very accurate prediction rule [18]. The best-known boosting algorithm is AdaBoost, proposed by [19].
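To make the resampling step concrete, here is a minimal Python sketch of bootstrap sampling and of the majority-vote aggregation used by bagging. The base learner train_tree, the data representation, and the voting rule are illustrative placeholders rather than the specific algorithms discussed in this paper.

import random

def bootstrap_sample(examples, rng):
    # A bootstrap sample has the same size as the original set and is drawn with replacement.
    return [rng.choice(examples) for _ in examples]

def bagging(examples, n_estimators, train_tree, rng=None):
    # Train one classifier per bootstrap sample (the "bootstrap aggregating" scheme).
    rng = rng or random.Random(0)
    return [train_tree(bootstrap_sample(examples, rng)) for _ in range(n_estimators)]

def predict_by_vote(classifiers, x):
    # Aggregate the ensemble by a majority vote, as in bagging and random forests.
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)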
C. Random forests

Random-forest-based methods have been introduced since the middle of the nineties, and there are several approaches to construct a random forest. One of them was proposed to reduce the error rate of decision trees by increasing the number of trees constructed from the training set [7], [8]. The authors proposed the construction of trees from random subspaces of the space of characteristics: the description of the examples is randomly modified in order to construct several decision trees. A random sampling of a subset of attributes is done; this subset of attributes is used to describe the training examples and to construct a decision tree. The random sampling is repeated several times in order to obtain a set of decision trees (a forest). In the first work, the oblique decision tree algorithm was used to construct the trees [7]; later, the well-known C4.5 was used in this approach [8]. To classify examples by means of such a forest, the decisions of the trees are aggregated by a vote.

Another random forest approach has been proposed by [9]. In this approach, during the construction of the decision tree, the tests in internal nodes are chosen randomly from the 20 best tests according to the information gain. Continuous attributes are considered in the same way: the discretization threshold for such an attribute is also chosen randomly. In his work, the author highlights the fact that this random forest approach leads to results similar to the ones obtained by means of the bagging method. However, it is shown that it can lead to an improvement of the results when considering data with little noise [9].

The well-known random forests have been introduced by Breiman [10]. A random forest is composed of an ensemble of decision trees. Each decision tree is constructed from a random sample of the examples of the training set. As in the bagging approach, the sampling is done with replacement. When classifying forthcoming examples, a vote of all the decision trees of the forest is conducted to predict the class of the examples.

D. Extremely random ensembles of trees

Recent approaches are based on the construction of ensembles of extremely random trees. Extremely random trees are constructed by means of a completely random process: both the choice of the test nodes in the tree and the choice of the discretization thresholds for continuous attributes are random choices from the set of all possibilities [11]. Here, the main difference with classical ensemble approaches lies in the fact that it is not a random choice on the set of examples (to build subsamples) that leads to various decision trees, but the use of a completely random process during the construction of the trees.

E. Forests of fuzzy decision trees

Forests of fuzzy decision trees have been proposed more recently in order to handle continuous or fuzzy attributes. The use of fuzzy set theory brings more robustness and more interpretability when handling such kinds of data. A first approach to construct a forest of fuzzy decision trees has been proposed in [12]. To handle training problems where there are more than two classes to predict, a forest of fuzzy decision trees is constructed by considering an n-class problem as n two-class problems. Each fuzzy decision tree of the forest is constructed to predict a chosen class against all the other classes, which are merged. When classifying forthcoming examples, several aggregation methods (normalized vote, unnormalized vote, possibilistic aggregation, ...) have been proposed to aggregate the results of the classification of each fuzzy decision tree of the forest.

Other methods to construct forests of fuzzy decision trees have been proposed. They can be divided into three main kinds of methods:

- methods that modify the labels of the examples and use the classical algorithm to construct fuzzy decision trees, for instance the method of [12];
- methods that use samples of the training set to construct several fuzzy decision trees: some recent work by [16] presents a preliminary approach based on the random forests of Breiman. In this family of methods, the classical algorithm to construct fuzzy decision trees is used;
- methods that use the same training set, without any sampling, and introduce some alteration of the process of selection of the test nodes, either by eliminating some attributes from the list of attributes (as in [14]) or by selecting a set of attributes instead of a unique one (as in [13], [15]).

In [14], the trees are first constructed classically (i.e. non-fuzzily) and the fuzzification of the test nodes is done at the end of the construction of the trees. In [13], [15], the trees are constructed by considering at each node not only the best attribute, but all the best ones if several attributes are convenient to split the training set.

By now, descriptions of real-world applications of forests of fuzzy decision trees are rather rare, but this branch of research is growing. An instance of a real-world application of forests of fuzzy decision trees can be found in [20]. In this application, forests of fuzzy decision trees have been used both to enhance the prediction accuracy of fuzzy decision trees and to output a ranking of the test examples in classification.

III. CONSTRUCTING AND USING A FOREST OF FUZZY DECISION TREES

In this part, a method to construct and to use a forest of fuzzy decision trees (FFDT) is presented. First of all, the algorithm to construct a fuzzy decision tree (FDT) is recalled, and the use of a fuzzy decision tree in classification is presented too.

A. Growing fuzzy decision trees

Classical decision tree algorithms [21], [22] are among the inductive learning algorithms most often used in data mining. Unfortunately, they sometimes encounter technical problems when dealing with numerical attributes. This led to the introduction of fuzzy decision tree construction algorithms enabling the use of fuzzy values in the decision tree [23], [24]. Fuzziness allows the decisions to be smoother, avoiding sharp thresholds; it also makes it possible to have degrees of decision and of membership to a given class.

Inductive learning rises from the particular to the general. A tree is built from its root to its leaves, by successively partitioning the training set into subsets. Each partition is done by means of a test on an attribute, which leads to the definition of a node of the tree. Let us assume that a set of classes C = {c_1, ..., c_K}, representing a physical or a conceptual phenomenon, is considered. This phenomenon is described by means of a set of attributes A = {A_1, ..., A_N}. In that case, a description is an N-tuple of attribute-value pairs (A_j, v_jl). Each description is linked with a particular class c_k from C to make up an instance (or example, or case) e_i of the phenomenon. Finally, inductive learning is the process that generalizes from a training set E = {e_1, ..., e_n} of examples to a general law that brings out relations between descriptions and classes in C. In our case, each attribute A_j can take a fuzzy, numerical, or symbolic value v_jl in the set {v_j1, ..., v_jmj} of all its possible values. We suppose that v_jl is associated with a membership function µ_vjl.
Similarly, each class c_k is supposed to be associated with a membership function µ_ck.

1) Attribute selection: Most algorithms designed for constructing decision trees proceed in the same way, the so-called Top Down Induction of Decision Trees (TDIDT) method. They build a tree from the root to the leaves, by successively partitioning the training set into subsets. Each partition is done by means of a test on one attribute and leads to the definition of a node of the tree. The attribute is selected by means of a measure of discrimination H. Such a measure enables us to rank the attributes according to their discrimination power when splitting the training set: the discrimination power of each attribute in A is valued with regard to the classes, and the attribute with the highest discrimination power is selected to construct a node. In the classical ID3 algorithm [22], the measure of discrimination used is the Shannon entropy. To construct fuzzy decision trees, a fuzzy measure of discrimination should be used; such a measure must handle the knowledge of a fuzzy partition for each fuzzy attribute. Well-known fuzzy measures are the fuzzy entropy (an extension of the Shannon entropy to fuzzy events) [25] and the measure of ambiguity [26].
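The exact form of the fuzzy measures of discrimination of [25], [26] is not recalled in this paper. The following Python sketch only illustrates one common way of extending the Shannon entropy to fuzzy events, using membership-weighted frequencies (Zadeh's probability of a fuzzy event) and the minimum as t-norm; the data layout and function names are assumptions of this sketch, not the measure actually implemented by the authors.

import math

def fuzzy_entropy(value_memberships, class_memberships):
    # Fuzzy conditional entropy H*(C|A) of the classes given one attribute A.
    # value_memberships[l][i]: membership of example i to the l-th fuzzy value of A.
    # class_memberships[k][i]: membership of example i to class c_k.
    # Probabilities of fuzzy events are membership-weighted frequencies (Zadeh).
    total = sum(sum(mu_v) for mu_v in value_memberships)
    h = 0.0
    for mu_v in value_memberships:
        card_v = sum(mu_v)  # fuzzy cardinality of the value v_l
        if card_v == 0:
            continue
        h_v = 0.0
        for mu_c in class_memberships:
            # P*(c_k | v_l): joint membership (minimum as t-norm), normalised by card_v
            p = sum(min(a, b) for a, b in zip(mu_v, mu_c)) / card_v
            if p > 0.0:
                h_v -= p * math.log2(p)
        h += (card_v / total) * h_v
    return h

def best_attribute(candidate_attributes, class_memberships):
    # Select the attribute with the highest discrimination power, i.e. the lowest fuzzy entropy.
    return min(candidate_attributes,
               key=lambda a: fuzzy_entropy(candidate_attributes[a], class_memberships))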

2) Construction of fuzzy partitions: The construction of a fuzzy decision tree is based on the use of a fuzzy partition for each numerical attribute. However, such a fuzzy partition is rarely known in advance. Thus, an automatic method that constructs such a partition from a set of precise values was implemented; in this way we obtain a set of fuzzy values for each numerical attribute. The algorithm [27] is based on the use of mathematical morphology: kernels of concordant values of a numerical attribute, related to the values of the class, can be found. The fuzzy values induced from a set of numerical values of an attribute are linked with the distribution of the values of the class related to that numerical attribute. Thus, a contextual partitioning of the attribute is performed, enabling us to obtain the best partition of that attribute with respect to the class.

3) Classification with a fuzzy decision tree: It is well known that a path from the root to a leaf of a decision tree is equivalent to a production rule. The premises of such a rule r are composed of tests on attribute values, and the conclusion is the value of the class that labels the leaf of the path:

if A_l1 = v_l1 and ... and A_lp = v_lp then C = c_k

In a fuzzy decision tree, a leaf can be labelled by a set of values {c_1, ..., c_K} of the class, each value c_j being associated with a weight computed during the learning phase. Thus, a path of a fuzzy decision tree is equivalent to the following rule:

if A_l1 = v_l1 and ... and A_lp = v_lp then C = c_1 with degree P(c_1 | (v_l1, ..., v_lp)) and ... and C = c_K with degree P(c_K | (v_l1, ..., v_lp))

where P is the probability measure of a fuzzy event introduced by [28]. In a fuzzy decision tree, each value v_i can be either precise or fuzzy, and is described by means of a membership function µ_vi. Now, when an example e, described by means of a set of values {A_1 = w_1; ...; A_n = w_n}, must be classified, its description is compared with the premises of the rule r by looking at the degree to which the observed value w is close to the edge value v. This proximity is valued as a degree Deg(w, v). In our case, the value w is a precise value and we have Deg(w, v) = µ_v(w). For each premise, the degree Deg(w_li, v_li) is valued for the corresponding value w_li. Finally, given the rule r, the example e is associated with the class c_j with a final degree Fdeg_r(c_j). This final degree aggregates all the degrees Deg(w_li, v_li) by means of the minimum:

Fdeg_r(c_j) = [ min_{i=1..p} Deg(w_li, v_li) ] . P(c_j | (v_l1, ..., v_lp))

For each class c_j, the example e is associated with a membership degree Fdeg(c_j), in [0, 1], computed from the whole set of rules. If n_ρ is the number of rules given by the fuzzy decision tree:

Fdeg(c_j) = max_{r=1..n_ρ} Fdeg_r(c_j)

The predicted class c_k associated with e can be chosen as the class with the highest membership degree Fdeg(c_k).
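As an illustration of this inference scheme, the following Python sketch represents a fuzzy decision tree by its set of leaf rules and applies the min / max aggregation above. The Rule representation and the attribute naming are assumptions made for the sketch, not the data structures of the authors' software.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    # One rule per root-to-leaf path: each premise maps an attribute name to the
    # membership function mu_v of the edge value tested along that path.
    premises: Dict[str, Callable[[float], float]]
    # Leaf weights P(c_k | (v_l1, ..., v_lp)) for each class label.
    leaf_probabilities: Dict[str, float]

def classify_with_fdt(rules: List[Rule], example: Dict[str, float]) -> Dict[str, float]:
    # Returns Fdeg(c_k) for every class, following the min / max aggregation above.
    degrees = {c: 0.0 for rule in rules for c in rule.leaf_probabilities}
    for rule in rules:
        # Satisfaction of the premises: minimum of the degrees Deg(w, v) = mu_v(w).
        sat = min(mu_v(example[attr]) for attr, mu_v in rule.premises.items())
        for c, p in rule.leaf_probabilities.items():
            # Fdeg_r(c) = sat * P(c | ...); Fdeg(c) = max over the rules.
            degrees[c] = max(degrees[c], sat * p)
    return degrees

The predicted class is then the one with the highest returned degree, for instance max(degrees, key=degrees.get).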
B. Building a forest

A forest is composed of a given number n of fuzzy decision trees. Each fuzzy decision tree F_i of the forest is constructed from a training set T_i, and each training set T_i is a random sample of the whole original training set, as described hereafter. In the classical random forest approach [10], the random samples T_i are obtained by selecting examples from the original training set with replacement. On the contrary, in the growth of a forest of fuzzy decision trees, the random samples are drawn without replacement. In presence of an unbalanced training set, the sample sets T_i are constructed so that they contain an equal number of examples from each class.

Moreover, although a fuzzy decision tree can handle several classes simultaneously, in domains where there is a great number of classes the problem is decomposed by constructing one forest for each single class; the number of forests is thus the number of classes. For instance, if the aim is to associate each case e with one of the three classes c_1, c_2, and c_3, a forest is constructed to recognize whether e can be associated with c_1 or not, another forest to recognize whether e can be associated with c_2 or not, and a last one to recognize whether e can be associated with c_3 or not. Thus, a forest is dedicated to the recognition of a single class and is composed of binary fuzzy decision trees that classify into a binary class. The algorithm to build a forest of fuzzy decision trees is summarized in Algorithm 1.

Algorithm 1 Building a forest of fuzzy decision trees
Input: E, the training set; n > 0, the number of trees in the forest; p in ]0, 1], the ratio of examples from the minority class to be kept; 0 < M < |E|, the maximum number of examples with the same class; L_p, the set of parameters used to construct a fuzzy decision tree (according to the learning algorithm used).
Output: F, a forest of n fuzzy decision trees.
Begin
  F ← {}
  c ← number of examples with the minority class in E
  m ← min(p · c, M)
  i ← 1
  while i ≤ n do
    E_i ← random choice of m examples from each class of E
    A_i ← construction of a fuzzy decision tree from E_i with L_p
    F ← F ∪ {A_i}
    i ← i + 1
  endwhile
End

C. Using a forest

Each fuzzy decision tree of a forest is used to classify examples from the test set as explained previously (Section III-A.3). By means of a fuzzy decision tree, an example can be associated with a membership degree Fdeg(c) to a predicted class c. With a forest of n fuzzy decision trees corresponding to a single class to be recognized, the classification of an example e is performed in two steps:
1) classification of e by means of the n fuzzy decision trees of the forest: e is classified by each fuzzy decision tree F_i in order to obtain a degree Fdeg(c) of e belonging to the class c. We denote by d_i(e) = Fdeg(c) the degree given by F_i for e;
2) sum (or average) of the degrees d_i(e), i = 1, ..., n, in order to obtain a single value d(e) = sum_{i=1..n} d_i(e), which corresponds to the degree to which the forest believes that e belongs to the class (that is, contains the corresponding feature). The higher d(e), the more strongly it is believed that e contains the corresponding feature.
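The following Python sketch summarises the class-balanced sampling of Algorithm 1 and the degree-summing aggregation of this section. The fuzzy-decision-tree learner build_fdt and the classify function (for instance the classify_with_fdt sketch above) are placeholders, and the bookkeeping of examples and labels is an assumption of the sketch.

import random

def build_forest(examples, labels, n_trees, p, max_per_class, build_fdt, rng=None):
    # Each tree is trained on m examples per class, drawn without replacement (class-balanced).
    rng = rng or random.Random(0)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    minority = min(len(xs) for xs in by_class.values())
    m = min(int(p * minority), max_per_class)
    forest = []
    for _ in range(n_trees):
        sample = []
        for y, xs in by_class.items():
            sample.extend((x, y) for x in rng.sample(xs, m))  # sampling without replacement
        forest.append(build_fdt(sample))
    return forest

def forest_degree(forest, example, positive_class, classify):
    # d(e) = sum_i d_i(e): sum of the degrees given by each tree for the class the forest recognises.
    return sum(classify(tree, example).get(positive_class, 0.0) for tree in forest)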

The algorithm to use a forest of fuzzy decision trees is summarized in Algorithm 2.

Algorithm 2 Classification by means of a forest
Input: F, a forest of fuzzy decision trees; e, an example to be classified; L_W, a set of weights associated with each tree of the forest.
Output: L_F, the list of classes c_k with a membership degree µ_k(e) in [0, 1].
Begin
  F_s ← {}   /* set of trees already seen */
  L_A ← {}   /* list of individual classification results for e */
  while F_s ≠ F do
    A ← choice of a tree from F that is not yet in F_s
    R ← classifyWithFuzzyTree(A, e)
    L_A ← L_A ∪ {R}
    F_s ← F_s ∪ {A}
  endwhile
  L_F ← aggregationOfTheResults(L_A, L_W)
End

IV. EXPERIMENTS

In order to study the behavior of forests of fuzzy decision trees as a convenient tool for data mining, some experiments were conducted. Moreover, a similar study has been conducted with the classical bagging approach, in order to highlight the interest of using fuzzy decision trees rather than classical crisp trees. To study the influence of the number of fuzzy decision trees in the forest, we chose the Waveform dataset [21] from the UCI repository [29]. We focus on the error rate (i.e. the rate of misclassified examples).

A. Waveform dataset

The well-known Waveform dataset [21] is often used in the machine learning community, and a lot of algorithms have been evaluated on it. For instance, in [10] or in [11], results with this dataset can be found for algorithms combining decision trees (AdaBoost, random forests, ...). This dataset has the following interesting properties: there are 3 (symbolic) classes to recognize and 21 real-valued attributes, and the data can be noisy (as in real-world problems). The dataset is composed of a total of 5000 instances. In our experiments, the dataset has been split into two subsets: the training set is composed of 3500 examples and the test set of 1500 examples.

B. Forests of fuzzy decision trees

In this first experiment, the forests are constructed following this protocol:
step 1: a class c is chosen from the set of classes;
step 2: a sampling of the training set is done by taking all the examples associated with the class c and, in addition, a random sampling of examples having one of the other classes, the idea being to have the same number of examples with the class c as examples with another class;
step 3: from this sampling, a fuzzy decision tree is constructed.
This process is repeated for each of the three classes in order to obtain three fuzzy decision trees, each one enabling the classification of an example with regard to a given class. Thus, the process described in Section III-B can be applied. During the classification step, an example from the test set is classified by each of the fuzzy decision trees and the individual classification results are aggregated in order to determine the final class of the example. The classification is done similarly to the one described in Section III-C.

Fig. 1. Influence of the size of the forest.

In Figure 1, we present the error rate variations when classifying the test set with forests of various sizes. We recall that the error rate is the ratio between the number of misclassified examples and the total number of classified examples; it ranges from 0 (no misclassification) to 1 (all the examples were incorrectly classified). In Figure 1, the average error rate is around 0.15, which means that around 15% of the examples are incorrectly classified.
Forests with up to 500 fuzzy decision trees have been constructed, and each of these forests has been used to classify all the examples from the test set. From Figure 1, it can be seen that the error rate decreases with the size of the forest of fuzzy decision trees (in number of trees). In this figure, the Classic graph corresponds to the error rate when using a fuzzy decision tree classically, that is, by considering that each fuzzy node of the fuzzy decision tree outputs a non-fuzzy result. Here, the decision is made thanks to the alpha-cut of degree 0.5 of the fuzzy values; in this case, a fuzzy decision tree outputs a single class with a full membership (each example either has the class or does not). The two other graphs show the results for the completely fuzzy use of the fuzzy decision trees (in this case an FDT outputs a degree of membership, ranging from 0 to 1, of the example to each class). The Zadeh graph corresponds to the error rate obtained when using the Zadeh t-norms (the minimum and maximum operators) when classifying examples, and the Lukasiewicz graph to the error rate obtained when using the Lukasiewicz t-norms.
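For reference, the two families of operators compared in the figures can be written as in the short Python sketch below; how exactly they replace the minimum and maximum in the inference of Section III-A.3 is an assumption of this sketch, as the paper does not detail it.

# Zadeh operators: the t-norm is the minimum and the t-conorm the maximum.
zadeh_tnorm = min
zadeh_tconorm = max

# Lukasiewicz operators.
def lukasiewicz_tnorm(a, b):
    return max(0.0, a + b - 1.0)

def lukasiewicz_tconorm(a, b):
    return min(1.0, a + b)

# With a t-norm T and a t-conorm S, the inference of Section III-A.3 presumably becomes,
# for one rule r and one class c:
#   Fdeg_r(c) = T(Deg(w_l1, v_l1), ..., Deg(w_lp, v_lp)) * P(c | (v_l1, ..., v_lp))
#   Fdeg(c)   = S(Fdeg_1(c), ..., Fdeg_nrho(c))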

From Figure 1, it can be seen that the use of fuzzy set theory reduces the error rate for this problem: the error rate is always better with the fuzzy uses of the fuzzy decision trees, whatever the size of the forest. Moreover, it can also be seen that the number of fuzzy decision trees used to build the forest does not need to be very high. In Figure 1, it is clear that the variability of the error rate is very high for forests with fewer than 50 fuzzy decision trees, and that this error rate does not vary a lot for forests with more than 100 fuzzy decision trees. As a rule of thumb, a forest with 100 fuzzy decision trees should be sufficient to obtain a good balance between the efficiency of the classification model and the speed of the construction method.

C. Bagging fuzzy decision trees

In this second experiment, a bagging of fuzzy decision trees is done. Random choices of examples from the training set are made with replacement in order to construct the sample subsets used to construct the fuzzy decision trees. During the classification step, an example from the test set is classified by each fuzzy decision tree and the individual classification results are aggregated in order to determine the final class of the example. The classification is done similarly to the one described previously.

Fig. 2. Influence of the size of the forest with bagging.

In Figure 2, we present the error rate variations when classifying the test set with bagging ensembles of various sizes. Baggings of up to 500 fuzzy decision trees have been studied, and each of these ensembles has been used to classify all the examples from the test set. From Figure 2, in the same way as in the previous experiment, it can be seen that the error rate decreases with the size of the ensemble of fuzzy decision trees (in number of trees). In this figure too, the Classic graph corresponds to the error rate when using a fuzzy decision tree classically, the Zadeh graph to the error rate obtained when using the Zadeh t-norms, and the Lukasiewicz graph to the error rate obtained when using the Lukasiewicz t-norms. From Figure 2 also, it can be seen that the use of fuzzy set theory reduces the error rate for this problem.

D. Remarks

A final remark concerns the complexity and the runtime of the whole process. The complexity of the construction of a forest is related to the number of fuzzy decision trees it contains and is thus relatively low (taking into account the fact that a small number of fuzzy decision trees is needed to obtain a conveniently small error rate). For instance, the runtime of the experiments described here, composed of the construction of 500 fuzzy decision trees and the classification of the test set by each of these fuzzy decision trees with each of the presented operators (Classic, Zadeh, and Lukasiewicz), is around 7350 seconds on a multiprocessor computer (10 cores at 2.93 GHz, 64 GB of RAM, running GNU/Linux 2.6). The construction and the use of the fuzzy decision trees were done by means of the Salammbô software. This software was developed to build fuzzy decision trees efficiently; it makes it possible to test several kinds of parameters both during the construction of a fuzzy decision tree (fuzzy partitioning, measures of discrimination, stopping criterion, ...) and during the classification of new examples with the obtained tree (choice of the t-norms, ...) [24].
V. CONCLUSIONS

In this paper, a study has been presented on ensembles of fuzzy decision trees. A new approach to construct a forest of fuzzy decision trees has been proposed, and two experiments have been described that show the interest of using fuzziness in such kinds of ensemble methods. In future works, the experiments will be extended to other classical datasets in order to deepen the conclusions drawn. Moreover, new applications in data mining, and in particular in video mining, will be conducted to highlight the interest of such fuzzy approaches.

REFERENCES

[1] B. Efron and R. Tibshirani, An Introduction to the Bootstrap. Chapman and Hall, CRC Press, 1993.
[2] B. Efron and R. Tibshirani, "Improvements on cross-validation: The .632+ bootstrap method," Journal of the American Statistical Association, vol. 92, no. 438, 1997.
[3] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, 1996.
[4] E. Bauer and R. Kohavi, "An empirical comparison of voting classification algorithms: Bagging, boosting, and variants," Machine Learning, vol. 36, 1999.
[5] R. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, 1990.

[6] Y. Freund, R. Iyer, R. Schapire, and Y. Singer, "An efficient boosting algorithm for combining preferences," Journal of Machine Learning Research, vol. 4, 2003.
[7] T. K. Ho, "Random decision forests," in Proc. of the Third Int. Conf. on Document Analysis and Recognition, vol. 1, 1995.
[8] T. K. Ho, "The random subspace method for constructing decision forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, August 1998.
[9] T. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization," Machine Learning, vol. 40, 2000.
[10] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[11] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees," Machine Learning, vol. 63, no. 1, pp. 3-42, April 2006.
[12] C. Marsala and B. Bouchon-Meunier, "Forest of fuzzy decision trees," in Proceedings of the Seventh International Fuzzy Systems Association World Congress, M. Mareš, R. Mesiar, V. Novák, J. Ramík, and A. Stupňanová, Eds., vol. 1, Prague, Czech Republic, June 1997.
[13] C. Z. Janikow and M. Faifer, "Fuzzy decision forest," in Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS 00), July 2000.
[14] K. Crockett, Z. Bandar, and D. Mclean, "Growing a fuzzy decision forest," in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, December 2001.
[15] C. Z. Janikow, "Fuzzy decision forest," in Proceedings of the 22nd International Conference of the North American Fuzzy Information Processing Society (NAFIPS 03), July 2003.
[16] P. P. Bonissone, J. M. Cadenas, M. C. Garrido, and R. A. Díaz-Valladares, "A fuzzy random forest: Fundamental for design and construction," in Proceedings of the 12th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 08), Malaga, Spain, July 2008.
[17] T. K. Ho, "A theory of multiple classifier systems and its application to visual word recognition," Ph.D. dissertation, Faculty of the Graduate School of State University of New York at Buffalo, May 1992.
[18] Y. Freund and R. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, September 1999.
[19] Y. Freund and R. Schapire, "Experiments with a new boosting algorithm," in International Conference on Machine Learning, 1996.
[20] C. Marsala and M. Detyniecki, "University of Paris 6 at TRECVID 2006: Forests of fuzzy decision trees for high-level feature extraction," in TREC Video Retrieval Evaluation Online Proceedings, November 2006.
[21] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification And Regression Trees. New York: Chapman and Hall, 1984.
[22] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, 1986.
[23] C. Z. Janikow, "Fuzzy decision trees: Issues and methods," IEEE Transactions on Systems, Man and Cybernetics, vol. 28, no. 1, pp. 1-14, 1998.
[24] C. Marsala and B. Bouchon-Meunier, "An adaptable system to construct fuzzy decision trees," in Proc. of the NAFIPS 99 (North American Fuzzy Information Processing Society), New York, USA, 1999.
[25] B. Bouchon-Meunier, C. Marsala, and M. Ramdani, "Learning from imperfect data," in Fuzzy Information Engineering: a Guided Tour of Applications, D. Dubois, H. Prade, and R. R. Yager, Eds. John Wiley and Sons, 1997.
[26] Y. Yuan and M. Shaw, "Induction of fuzzy decision trees," Fuzzy Sets and Systems, vol. 69, 1995.
[27] C. Marsala and B. Bouchon-Meunier, "Fuzzy partitioning using mathematical morphology in a learning scheme," in Proceedings of the 5th IEEE Int. Conf. on Fuzzy Systems, vol. 2, New Orleans, USA, September 1996.
[28] L. Zadeh, "Fuzzy sets," Information and Control, vol. 8, 1965.
[29] A. Asuncion and D. Newman, "UCI machine learning repository," University of California, Irvine, School of Information and Computer Sciences, mlearn/mlrepository.html, 2007.


Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Boosting Algorithms for Parallel and Distributed Learning

Boosting Algorithms for Parallel and Distributed Learning Distributed and Parallel Databases, 11, 203 229, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Boosting Algorithms for Parallel and Distributed Learning ALEKSANDAR LAZAREVIC

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information

EFFICIENT CLUSTERING WITH FUZZY ANTS

EFFICIENT CLUSTERING WITH FUZZY ANTS EFFICIENT CLUSTERING WITH FUZZY ANTS S. SCHOCKAERT, M. DE COCK, C. CORNELIS AND E. E. KERRE Fuzziness and Uncertainty Modelling Research Unit, Department of Applied Mathematics and Computer Science, Ghent

More information

Ensemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar

Ensemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar Ensemble Learning: An Introduction Adapted from Slides by Tan, Steinbach, Kumar 1 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers

More information

Univariate and Multivariate Decision Trees

Univariate and Multivariate Decision Trees Univariate and Multivariate Decision Trees Olcay Taner Yıldız and Ethem Alpaydın Department of Computer Engineering Boğaziçi University İstanbul 80815 Turkey Abstract. Univariate decision trees at each

More information

Active Sampling for Constrained Clustering

Active Sampling for Constrained Clustering Paper: Active Sampling for Constrained Clustering Masayuki Okabe and Seiji Yamada Information and Media Center, Toyohashi University of Technology 1-1 Tempaku, Toyohashi, Aichi 441-8580, Japan E-mail:

More information

Globally Induced Forest: A Prepruning Compression Scheme

Globally Induced Forest: A Prepruning Compression Scheme Globally Induced Forest: A Prepruning Compression Scheme Jean-Michel Begon, Arnaud Joly, Pierre Geurts Systems and Modeling, Dept. of EE and CS, University of Liege, Belgium ICML 2017 Goal and motivation

More information

Generating the Reduced Set by Systematic Sampling

Generating the Reduced Set by Systematic Sampling Generating the Reduced Set by Systematic Sampling Chien-Chung Chang and Yuh-Jye Lee Email: {D9115009, yuh-jye}@mail.ntust.edu.tw Department of Computer Science and Information Engineering National Taiwan

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining

More information

Comparing Univariate and Multivariate Decision Trees *

Comparing Univariate and Multivariate Decision Trees * Comparing Univariate and Multivariate Decision Trees * Olcay Taner Yıldız, Ethem Alpaydın Department of Computer Engineering Boğaziçi University, 80815 İstanbul Turkey yildizol@cmpe.boun.edu.tr, alpaydin@boun.edu.tr

More information

Ensemble Methods, Decision Trees

Ensemble Methods, Decision Trees CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm

More information

An introduction to random forests

An introduction to random forests An introduction to random forests Eric Debreuve / Team Morpheme Institutions: University Nice Sophia Antipolis / CNRS / Inria Labs: I3S / Inria CRI SA-M / ibv Outline Machine learning Decision tree Random

More information

Approximate Discrete Probability Distribution Representation using a Multi-Resolution Binary Tree

Approximate Discrete Probability Distribution Representation using a Multi-Resolution Binary Tree Approximate Discrete Probability Distribution Representation using a Multi-Resolution Binary Tree David Bellot and Pierre Bessière GravirIMAG CNRS and INRIA Rhône-Alpes Zirst - 6 avenue de l Europe - Montbonnot

More information

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique Research Paper Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique C. Sudarsana Reddy 1 S. Aquter Babu 2 Dr. V. Vasu 3 Department

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Mondrian Forests: Efficient Online Random Forests

Mondrian Forests: Efficient Online Random Forests Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan (Gatsby Unit, UCL) Daniel M. Roy (Cambridge Toronto) Yee Whye Teh (Oxford) September 4, 2014 1 Outline Background and Motivation

More information

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand).

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). http://researchcommons.waikato.ac.nz/ Research Commons at the University of Waikato Copyright Statement: The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). The thesis

More information

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand).

The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). http://waikato.researchgateway.ac.nz/ Research Commons at the University of Waikato Copyright Statement: The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). The thesis

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL SEARCH ALGORITHM

LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL SEARCH ALGORITHM International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 4, April 2013 pp. 1593 1601 LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL

More information

Feature Selection with Adjustable Criteria

Feature Selection with Adjustable Criteria Feature Selection with Adjustable Criteria J.T. Yao M. Zhang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: jtyao@cs.uregina.ca Abstract. We present a

More information

ORT EP R RCH A ESE R P A IDI! " #$$% &' (# $!"

ORT EP R RCH A ESE R P A IDI!  #$$% &' (# $! R E S E A R C H R E P O R T IDIAP A Parallel Mixture of SVMs for Very Large Scale Problems Ronan Collobert a b Yoshua Bengio b IDIAP RR 01-12 April 26, 2002 Samy Bengio a published in Neural Computation,

More information

Logical Rhythm - Class 3. August 27, 2018

Logical Rhythm - Class 3. August 27, 2018 Logical Rhythm - Class 3 August 27, 2018 In this Class Neural Networks (Intro To Deep Learning) Decision Trees Ensemble Methods(Random Forest) Hyperparameter Optimisation and Bias Variance Tradeoff Biological

More information

CART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology

CART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART Classification and Regression Trees Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART CART stands for Classification And Regression Trees.

More information

APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES

APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES A. Likas, K. Blekas and A. Stafylopatis National Technical University of Athens Department

More information

A General Greedy Approximation Algorithm with Applications

A General Greedy Approximation Algorithm with Applications A General Greedy Approximation Algorithm with Applications Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, NY 10598 tzhang@watson.ibm.com Abstract Greedy approximation algorithms have been

More information

Cyber attack detection using decision tree approach

Cyber attack detection using decision tree approach Cyber attack detection using decision tree approach Amit Shinde Department of Industrial Engineering, Arizona State University,Tempe, AZ, USA {amit.shinde@asu.edu} In this information age, information

More information

Notes on Fuzzy Set Ordination

Notes on Fuzzy Set Ordination Notes on Fuzzy Set Ordination Umer Zeeshan Ijaz School of Engineering, University of Glasgow, UK Umer.Ijaz@glasgow.ac.uk http://userweb.eng.gla.ac.uk/umer.ijaz May 3, 014 1 Introduction The membership

More information

Cost-complexity pruning of random forests

Cost-complexity pruning of random forests Cost-complexity pruning of random forests B Ravi Kiran and Jean Serra 2 CRIStAL Lab, UMR 989, Université Charles de Gaulle, Lille 3 kiran.ravi@univ-lille3.fr 2 Université Paris-Est, A3SI-ESIEE LIGM jean.serra@esiee.fr

More information

SOME OPERATIONS ON INTUITIONISTIC FUZZY SETS

SOME OPERATIONS ON INTUITIONISTIC FUZZY SETS IJMMS, Vol. 8, No. 1, (June 2012) : 103-107 Serials Publications ISSN: 0973-3329 SOME OPERTIONS ON INTUITIONISTIC FUZZY SETS Hakimuddin Khan bstract In This paper, uthor Discuss about some operations on

More information