Contribution of Data Complexity Features on Dynamic Classifier Selection

André L. Brun, Alceu S. Britto Jr., Luiz S. Oliveira, Fabricio Enembreck, and Robert Sabourin
Postgraduate Program in Informatics, Pontifical Catholic University of Parana (PUCPR), Curitiba, Parana, Brazil {abrun, alceu,
Postgraduate Program in Informatics, Federal University of Parana (UFPR), Curitiba, Parana, Brazil
École de Technologie Supérieure (ÉTS), University of Quebec, Montreal, Quebec, Canada

Abstract

Different dynamic classifier selection techniques have been proposed in the literature to determine, among the diverse classifiers available in a pool, which one should be used to classify a given test instance. The individual competence of each classifier in the pool is usually evaluated taking into account its accuracy in the neighborhood of the test instance in a validation dataset. In this work, we investigate the possible contribution of features related to problem complexity to the evaluation of classifier competence. Since the pool generation technique does not usually assure diversity, the idea is to take diversity into account during selection. Basically, we select a classifier trained on a subset of data whose complexity is similar to that observed in the neighborhood of the test instance. We expect that this similarity in terms of complexity allows us to select a more competent classifier. Experiments on 30 classification problems representing different levels of difficulty have shown that the proposed selection method is comparable to well-known dynamic selection strategies; when compared with other DS approaches, it won 123 of 150 experiments. These promising results indicate that further investigation should be done to increase diversity in terms of data complexity during pool generation.

I. INTRODUCTION

Multiple Classifier Systems (MCS) have been proposed in the literature as an alternative to the hard task of training a monolithic approach, in which a single classifier must be capable of learning the wide variability usually found in a pattern recognition problem [1]. The rationale behind this is that the classifiers in an MCS are diverse in the sense that they make different errors and, consequently, show some complementarity [2]. Many researchers have focused on Multiple Classifier Systems and, consequently, new solutions have been dedicated to each of the three possible MCS phases: a) generation, b) selection, and c) integration. In the first phase, a pool of classifiers is generated; in the second, a subset of these classifiers is selected; in the last phase, a final decision is made based on the prediction(s) of the selected classifier(s).

In this context, the selection phase has received special attention from the research community. We can find approaches based on the selection of a single classifier or of an ensemble of classifiers from an initial pool created using learning algorithms oriented by diversity and accuracy [3][4][5]. Such a selection can be performed in a static or dynamic fashion. In the former, the selection is done during the training stage of the MCS, and the same selected classifiers are used for all testing samples. In the latter, the selection is executed during the testing stage of the MCS, with a specific selection performed for each testing sample. The success of a dynamic selection method depends on the adoption of a good criterion to evaluate the competence of the classifiers in recognizing the test pattern to be labeled.
The authors in [6] present a taxonomy that categorizes the methods according to the criterion used for the fitness evaluation of the classifiers. According to them, the methods can be separated into two major groups: those based on individual competence and those that consider the relationship between the classifiers that compose the pool. Despite the large number of different strategies and aspects used to evaluate the performance of the classifiers in the pool, it is possible to observe the common use of accuracy evaluation, usually combined with other sources of information.

In this paper, we evaluate the contribution of features related to the level of difficulty of a classification problem, extracted from data complexity analysis, to the process of evaluating the competence of each classifier given a test instance.

It is worth noticing that the complexity, or difficulty, considered here is not restricted to the number of instances, classes, and features. In fact, it involves important aspects inherent to a classification problem that are estimated from complexity measures computed on the problem data. Among other aspects, the complexity measures usually attempt to describe and quantify how much two classes overlap, how the border region between classes behaves, or how each class is spatially distributed.

The study presented here is inspired by works that try to find the most promising learning inducer for a specific classification problem taking into account its difficulty [7]. However, the idea here is to investigate whether the level of difficulty estimated from the neighborhood of the test pattern in a validation dataset can contribute to computing the competence of the classifiers in the pool of an MCS. In summary, the research questions to be answered are: a) Can information derived from data complexity analysis contribute to estimating the competence of classifiers in an MCS based on dynamic selection? and b) Is there some relation between the distribution of the training datasets in the complexity space and the observed performance of the proposed selection strategy?

To answer those questions, we have used an experimental protocol based on 30 classification problems representing different levels of difficulty. A pool of classifiers is constructed for each problem, and for each training subset a complexity signature is created from a set of complexity measures used to describe the problem difficulty. During the operational phase, a similar signature is obtained for the neighborhood of the test instance using a validation dataset. The similarity between these complexity signatures, plus an additional feature used to describe the test instance complexity, is combined with accuracy information to evaluate the competence of each classifier in the pool. In addition, we compared the obtained results with 5 different dynamic selection methods available in the literature.

The experiments have shown that selecting classifiers based on complexity analysis is a promising strategy. The proposed DS approach was able to beat the single best (SB) classifier in 28 of 30 experiments and the combination of all classifiers in 23 of 30 experiments. When compared with other DS approaches, it won 123 of 150 experiments.

This work is organized into 6 sections. In Section 2, we present the complexity measures used to estimate the problem difficulty. Section 3 describes the proposed DS method, while Sections 4 and 5 present the experimental results and a further analysis, respectively. Finally, in Section 6 one may find the conclusion and future work.

II. COMPLEXITY MEASURES

The level of difficulty of a classification problem can be estimated using complexity measures applied to the data. The complexity measures proposed in the literature are usually classified into three categories [7], [8]: a) class overlapping; b) class separability; and c) class geometry, topology, and density. In our work, we have considered three measures, one from each category. The rationale is to use measures based on different concepts, showing low correlation among themselves. From the first category, we have used F1 (Fisher's Discriminant Ratio). This measure expresses how separable two classes are according to a specific feature.
The F1 metric can be interpreted as the distance between the centers of two classes, so that the larger its value, the larger the separation between the classes. Its calculation compares the means and standard deviations of each feature for the two classes in order to evaluate their discrepancy. Equation 1 shows how it is computed for a single feature, where μ1, μ2, σ1, and σ2 correspond to the means and standard deviations of classes 1 and 2, respectively, for that feature. The value adopted for F1 is the highest value obtained among all features.

F1 = (μ1 - μ2)² / (σ1² + σ2²)    (1)

From the second category, we have used the N2 measure (the ratio of intra/inter-class nearest neighbor distance). By applying N2 we expect to determine how separable two classes are by analyzing the existence and the shape of the border between them. The idea is to compute the Euclidean distance from each element of the set to its nearest neighbor within the same class and to its nearest neighbor outside the class. Then, the distances between elements of the same class are summed and divided by the sum of the distances between instances of different classes, as shown in Equation 2, where δ(NN_same(x_i), x_i) is the distance between instance x_i and its nearest neighbor from the same class, δ(NN_diff(x_i), x_i) is the distance between x_i and the nearest element belonging to a different class, and n is the number of instances.

N2 = [ Σ_{i=1..n} δ(NN_same(x_i), x_i) ] / [ Σ_{i=1..n} δ(NN_diff(x_i), x_i) ]    (2)

Finally, from the third category, we have selected N4 (the nonlinearity of the one-nearest-neighbor classifier), which is based on the same idea as another measure named L3 (the nonlinearity of a linear classifier). According to Ho & Basu [7] and Ho, Basu & Law [8], given a training set, L3 first creates a test set by linearly interpolating, with random coefficients, between randomly chosen pairs of points of the same class in the training set. L3 then corresponds to the error rate of a linear classifier trained on the original data and evaluated on this interpolated test set. The measure N4 follows the same principle; however, instead of a linear classifier, the 1NN classifier is used to compute the error rate on the interpolated test set.

N4 = 1NN error rate on the interpolated test set    (3)
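To make the three descriptors concrete, the following is a minimal Python sketch of how F1, N2, and N4 could be computed for a two-class numeric dataset (labels 0/1). It is only an illustration of Eqs. 1-3, not the DCoL implementation used in the paper; the function names, the epsilon guard, and the number of interpolated points are our own choices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def f1_fisher(X, y):
    """F1: maximum Fisher's discriminant ratio over all features (Eq. 1)."""
    c1, c2 = X[y == 0], X[y == 1]
    num = (c1.mean(axis=0) - c2.mean(axis=0)) ** 2
    den = c1.var(axis=0) + c2.var(axis=0) + 1e-12
    return float(np.max(num / den))

def n2_intra_inter(X, y):
    """N2: sum of intra-class NN distances over sum of inter-class NN distances (Eq. 2)."""
    intra = inter = 0.0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        same, diff = d[y == y[i]], d[y != y[i]]
        intra += np.sort(same)[1]   # skip the zero distance to the point itself
        inter += diff.min()
    return intra / inter

def n4_nonlinearity(X, y, n_points=200, seed=0):
    """N4: error of a 1NN classifier on points interpolated within each class (Eq. 3)."""
    rng = np.random.default_rng(seed)
    Xt, yt = [], []
    for _ in range(n_points):
        c = int(rng.integers(0, 2))                    # pick one of the two classes
        a, b = X[rng.choice(np.where(y == c)[0], 2)]   # random pair from that class
        w = rng.random()                               # random interpolation weight
        Xt.append(w * a + (1 - w) * b)
        yt.append(c)
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    return 1.0 - knn.score(np.array(Xt), np.array(yt))

def complexity_signature(X, y):
    """Complexity signature as used later in the paper: [F1, N2, N4]."""
    return np.array([f1_fisher(X, y), n2_intra_inter(X, y), n4_nonlinearity(X, y)])
```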

III. PROPOSED DS METHOD

Different approaches can be found in the literature for the dynamic selection of classifiers. The main difference among them is the criterion used to estimate the competence of the classifiers for a given test instance during the selection scheme. One may find measures of competence based on pure accuracy (overall local accuracy or local class accuracy) [9], [10], ranking of classifiers [11], probabilistic information [10], [12], classifier behavior computed on output profiles [13], and oracle information [14][15]. Moreover, one may find measures that consider interactions among classifiers, such as diversity [16][17], ambiguity [18], or other grouping approaches [19].

In this work, our assumption is that the complexity of the neighborhood of the test instance, computed on a validation dataset, can be combined with accuracy information to estimate the competence of the classifiers in the pool. For this purpose, the competence of a classifier is estimated considering its accuracy on a local region of the feature space for which we also compute the level of difficulty. Figures 1 and 2 present an overview of the proposed method, while its steps are described in Algorithm 1.

Fig. 1. Training Stage - Pool Generation.
Fig. 2. Operational Stage - Classifier Selection.

In the training stage, the training dataset of a given classification problem is used to generate an initial pool of classifiers using an ensemble learning technique (Bagging or Boosting) to provide diversity and accuracy (Fig. 1 - A). Afterwards, for each generated subset of data, a vector composed of M complexity measures is computed (Fig. 1 - B). This feature set is used as an M-complexity signature for each data subset (DS_i).

During the operational phase (Fig. 2), the dynamic selection is done by estimating the competence of each classifier based on three features. To describe them, let DS_i be the dataset used to train classifier C_i and sig_DSi the M-complexity signature (F1, N2, and N4 values) computed from DS_i. In addition, let γ_t be the K-neighborhood of the test instance t and sig_γt the M-complexity signature computed from γ_t.

f1i - Similarity in terms of complexity: given a testing sample t, the first step is to define its neighborhood γ_t in the validation dataset (Fig. 2 - A). Afterwards, sig_γt is computed from γ_t (Fig. 2 - B). The similarity between the complexity signature sig_γt and each training dataset complexity signature sig_DSi is measured by the Euclidean distance, as denoted in Eq. 4. With this, we can determine which classifier was trained on a dataset whose complexity is similar to that observed in the neighborhood of the test instance.

f1i = δ(sig_γt, sig_DSi)    (4)

f2i - Distance from the predicted class: let y_j be the class predicted by classifier C_i for the test instance t, DS_i the dataset used to train C_i, and α_ij the centroid of the predicted class y_j in the training dataset DS_i. We compute the distance from the test instance t to the centroid α_ij, as shown in Eq. 5. The idea is to better describe the complexity space, since the complexity measures may assign similar difficulty values to two classes even when they are differently distributed in the feature space.

f2i = δ(t, α_ij)    (5)

f3i - Local class accuracy: consists of the local class accuracy of each classifier C_i for the class y_j it predicts for the test instance t. This local accuracy is estimated on the neighborhood γ_t, as denoted by Eq. 6.

f3i = Accuracy(C_i, y_j, γ_t)    (6)

The three features are computed for each classifier, and the final competence value of classifier C_i is obtained by combining them. We evaluated combining them by product and by sum; both strategies showed similar results. The final combination using the sum is presented in Eq. 7,

Comp_Ci = (1 - f1i') + (1 - f2i') + f3i    (7)

where f1i' and f2i' correspond to the normalized values of f1i and f2i, respectively. They are normalized using Min-Max scaling, as denoted in Eq. 8 for feature f1i.

f1i' = (f1i - f1i_min) / (f1i_max - f1i_min)    (8)

The most promising classifier C* is obtained (Fig. 2 - D) as described in Eq. 9.

C* = argmax_i(Comp_Ci)    (9)

Algorithm 1: DSOC - Dynamic Selection on Complexity
Input: the pool C of M classifiers; the training, validation, and testing sets Tr, Va, and Te; the neighborhood size K
Output: C*, the most promising classifier for each testing sample t in Te
  for each classifier C_i in the pool do
      compute the complexity signature sig_DSi from the data subset DS_i
  end
  for each test instance t_i in Te do
      find γ_t, the K nearest neighbors of t_i in Va
      compute the complexity signature sig_γt of γ_t
      for each classifier C_i in the pool do
          compute the features f1i, f2i, and f3i
          normalize f1i and f2i
          Comp_Ci = (1 - f1i') + (1 - f2i') + f3i
      end
      C* = argmax_i(Comp_Ci)
      use the classifier C* to classify t_i
  end
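The following is a minimal sketch of the selection loop in Algorithm 1, under our own assumptions: a pool of fitted scikit-learn classifiers, the (X_i, y_i) bags used to train them, and a complexity_signature(X, y) callable such as the one sketched in Section II. It illustrates Eqs. 4-9 and is not the authors' implementation; in particular, the local-class-accuracy variant used for f3 is one common interpretation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dsoc_predict(pool, subsets, X_val, y_val, x_test, signature, k=30):
    """Select one classifier per test instance following Algorithm 1 (DSOC).

    pool     : list of fitted classifiers C_i
    subsets  : list of (X_i, y_i) bags used to train each C_i
    signature: callable (X, y) -> array of complexity measures (e.g. F1, N2, N4)
    """
    # Complexity signature of each training subset (training stage, Fig. 1 - B)
    sig_ds = np.array([signature(Xi, yi) for Xi, yi in subsets])

    # Neighborhood of the test instance in the validation set (Fig. 2 - A/B)
    nn = NearestNeighbors(n_neighbors=k).fit(X_val)
    idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]
    X_nb, y_nb = X_val[idx], y_val[idx]
    sig_nb = signature(X_nb, y_nb)

    f1 = np.linalg.norm(sig_ds - sig_nb, axis=1)              # Eq. 4
    f2, f3 = np.zeros(len(pool)), np.zeros(len(pool))
    for i, (clf, (Xi, yi)) in enumerate(zip(pool, subsets)):
        y_pred = clf.predict(x_test.reshape(1, -1))[0]
        centroid = Xi[yi == y_pred].mean(axis=0)              # centroid of the predicted class
        f2[i] = np.linalg.norm(x_test - centroid)             # Eq. 5
        mask = y_nb == y_pred                                 # neighbors of the predicted class
        f3[i] = clf.score(X_nb[mask], y_nb[mask]) if mask.any() else 0.0  # Eq. 6 (one LCA variant)

    def minmax(v):                                            # Eq. 8
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    comp = (1 - minmax(f1)) + (1 - minmax(f2)) + f3           # Eq. 7
    best = int(np.argmax(comp))                               # Eq. 9
    return pool[best].predict(x_test.reshape(1, -1))[0]
```

Note that, as in Algorithm 1, the Min-Max normalization of f1 and f2 in this sketch is computed across the pool separately for each test instance.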
IV. EXPERIMENTS

This section presents the experiments undertaken to evaluate the proposed DS method. A set of 30 datasets was used in these comparative experiments: sixteen from the UCI machine learning repository [20], four from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository [21], four from the Ludmila Kuncheva Collection (LKC) of real medical data [22], four from the STATLOG project [23], and two artificial datasets generated with the Matlab PRTools toolbox. These datasets present only numeric features with no missing values and have been frequently used to evaluate DS methods in the literature. Table I presents the details of each dataset.

The experiments were conducted using 20 replications. For each replication, each dataset was randomly divided into 50% of its elements for training, 25% for validation, and 25% for testing. For each problem, a pool of 100 perceptrons was created using the Bagging technique [3]. Bags containing 10% or 20% of the training samples were used, depending on the size of the training set (20% was used for the smaller datasets). The perceptron was used as base classifier since it is an unstable and weak inducer; in addition, weak classifiers may emphasize the differences in performance among the DS schemes [15]. The size of the neighborhood used to compute the complexity descriptors for each test instance was set to 30. For the datasets used, this value ensures the presence of elements of at least two classes in the neighborhood, so that the complexity metrics can be computed. To determine this neighborhood size, we carried out experiments on thirteen UCI problems in which the number of neighbors was varied from 20 to 50.

As described before, we used three complexity metrics, F1, N2, and N4, i.e., one descriptor from each category described in Section 2. To drive this choice, we carried out a study on thirteen UCI databases in which we analyzed the correlation among all 14 complexity metrics available in the DCoL library [24]. These three measures were found to present low Pearson correlation among themselves, indicating that together they may capture different phenomena.

To evaluate the contribution of data complexity measures to the selection process, we compared the performance of the proposed method with 5 DS techniques already established in the literature, as well as with the Single Best (SB) classifier and the combination of all classifiers (ALL). With respect to the DS methods, we implemented methods based on single classifier selection (LCA [9], OLA [9], and A Priori [25]) and methods based on ensemble selection (KNORA-E and KNORA-U [15]). For these methods, a neighborhood size K = 7 was used, a value shown to be the most appropriate in previous studies [15], [26].

The average performance of each approach for each classification problem is shown in Table II, where the boldfaced values represent the highest accuracy for each problem. The last column of Table II reports the oracle performance for each problem given the pool of classifiers. This upper limit on performance is estimated under the assumption that if at least one classifier can correctly recognize a given test pattern, then the pool is also able to recognize it.
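As an illustration of this setup, the sketch below builds a pool of 100 bagged perceptrons trained on small bootstrap bags and computes the oracle accuracy of the pool. The bag stratification, parameter values, and the use of scikit-learn are our assumptions rather than the authors' code.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.utils import resample

def build_pool(X_train, y_train, n_classifiers=100, bag_fraction=0.1, seed=0):
    """Bagging of weak perceptrons: each classifier sees a small bootstrap bag."""
    rng = np.random.RandomState(seed)
    pool, subsets = [], []
    n_bag = max(2, int(bag_fraction * len(X_train)))
    for _ in range(n_classifiers):
        # Stratified resampling keeps every class represented in the small bag.
        Xb, yb = resample(X_train, y_train, n_samples=n_bag,
                          random_state=rng, stratify=y_train)
        clf = Perceptron(max_iter=100).fit(Xb, yb)
        pool.append(clf)
        subsets.append((Xb, yb))
    return pool, subsets

def oracle_accuracy(pool, X_test, y_test):
    """Oracle: a test sample counts as correct if at least one classifier labels it correctly."""
    preds = np.array([clf.predict(X_test) for clf in pool])   # shape (n_classifiers, n_samples)
    return (preds == y_test).any(axis=0).mean()
```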

TABLE I. MAIN FEATURES OF THE DATASETS USED IN THE EXPERIMENTS (COLUMNS: DATASET, INSTANCES, TRAINING, TEST, VALIDATION, FEATURES, CLASSES, % BAG, SOURCE).
Datasets and sources: Adult (UCI), Banana (PRTools), Blood (UCI), CTG (UCI), Diabetes (UCI), Ecoli (UCI), Faults (UCI), German (STATLOG), Glass (UCI), Haberman (UCI), Heart (STATLOG), ILPD (UCI), Segmentation (UCI), Ionosphere (UCI), Laryngeal (LKC), Laryngeal (LKC), Lithuanian (PRTools), Liver (UCI), Magic (KEEL), Mammo (KEEL), Monk (KEEL), Phoneme (ELENA), Sonar (UCI), Thyroid (LKC), Vehicle (STATLOG), Vertebral (UCI), WBC (UCI), WDVG (UCI), Weaning (LKC), Wine (UCI).

TABLE II. COMPARISON OF THE PROPOSED METHOD OF DYNAMIC SELECTION BASED ON COMPLEXITY (DSOC) WITH THE SINGLE BEST (SB) CLASSIFIER IN THE POOL, THE COMBINATION OF ALL CLASSIFIERS (ALL), THE DYNAMIC SELECTION METHODS OLA, LCA, A PRIORI, KNORA-U (KU), AND KNORA-E (KE), AND THE ORACLE PERFORMANCE. THE RESULTS ARE THE AVERAGE AND CORRESPONDING STANDARD DEVIATION OF 20 REPLICATIONS. THE BEST RESULTS ARE IN BOLDFACE.

Dataset | SB | ALL | OLA | LCA | A Priori | KU | KE | DSOC | Oracle
Adult | 83.6 (2.3) | 86.7 (2.4) | 82.4 (2.8) | 82.3 (2.5) | 80.6 (4.8) | 76.6 (2.3) | 71.0 (3.2) | 85.6 (2.5) | 99.7 (0.4)
Banana | 85.3 (1.4) | 84.1 (1.4) | 89.2 (1.9) | 89.5 (1.9) | 86.1 (2.5) | 89.2 (1.4) | 84.4 (1.9) | 87.4 (2.4) | 89.8 (1.9)
Blood | 76.4 (0.3) | 76.4 (0.2) | 74.2 (3.0) | 74.2 (3.2) | 69.0 (16.4) | 76.4 (0.2) | 76.4 (0.2) | 72.7 (2.7) | 100 (-)
CTG | 69.8 (11.1) | 86.6 (1.7) | 87.9 (1.1) | 88.4 (1.2) | 84.1 (1.6) | 85.3 (0.9) | 81.3 (1.0) | 88.8 (1.1) | 99.9 (0.1)
Diabetes | 66.0 (1.3) | 64.5 (1.4) | 69.9 (2.9) | 70.0 (2.4) | 58.6 (7.8) | 65.5 (0.4) | 65.1 (-) | 69.4 (3.5) | 92.3 (7.2)
Ecoli | 63.7 (3.9) | 42.1 (0.6) | 77.9 (3.8) | 79.9 (2.9) | 55.1 (9.2) | 64.0 (4.2) | 42.1 (0.6) | 80.5 (3.7) | 97.1 (1.7)
Faults | 31.2 (14.2) | 63.5 (2.8) | 64.9 (2.5) | 66.4 (1.6) | 51.4 (2.9) | 53.6 (2.0) | 36.7 (2.3) | 67.6 (1.5) | 99.2 (0.4)
German | 59.5 (5.0) | 75.7 (2.5) | 68.7 (2.9) | 70.0 (2.9) | 66.7 (3.4) | 70.1 (0.3) | 70.0 (-) | 72.8 (2.4) | 100 (-)
Glass | 56.6 (6.6) | 58.0 (5.2) | 59.9 (6.9) | 60.7 (8.6) | 46.4 (9.4) | 49.3 (5.8) | 33.6 (1.7) | 63.1 (6.2) | 99.8 (0.6)
Haberman | 75.3 (2.9) | 73.7 (-) | 75.3 (3.9) | 74.9 (3.8) | 73.9 (1.4) | 73.8 (0.3) | 73.7 (-) | 76.4 (3.5) | 88.8 (5.9)
Heart | 79.1 (4.9) | 83.8 (3.2) | 76.9 (3.3) | 75.7 (4.3) | 75.8 (6.3) | 70.8 (4.1) | 68.2 (3.6) | 82.1 (3.4) | 100 (-)
ILPD | 68.1 (3.5) | 70.6 (3.5) | 66.9 (3.0) | 67.7 (3.2) | 64.6 (6.0) | 71.7 (-) | 71.7 (-) | 66.6 (2.9) | 100 (0.1)
Image | 16.1 (5.6) | 36.3 (1.1) | 68.6 (3.0) | 70.9 (2.6) | 47.9 (3.8) | 49.9 (1.9) | 27.8 (1.2) | 70.3 (2.4) | 77.8 (2.7)
Ionosphere | 78.3 (2.8) | 72.0 (2.6) | 80.3 (2.6) | 86.1 (3.2) | 72.1 (4.9) | 79.5 (6.2) | 56.3 (15.3) | 86.9 (3.2) | 98.2 (1.9)
Laryngeal | (3.8) | 78.6 (4.7) | 79.4 (5.0) | 79.8 (4.9) | 76.2 (4.3) | 69.2 (3.9) | 66.9 (3.3) | 82.4 (5.2) | 99.9 (0.4)
Laryngeal | (5.1) | 66.5 (3.3) | 65.4 (5.3) | 66.2 (4.9) | 61.5 (5.5) | 57.1 (4.0) | 50.1 (3.6) | 67.7 (4.0) | 99.6 (0.7)
Lithuanian | 67.9 (6.5) | 50.8 (0.5) | 95.9 (1.1) | 95.8 (1.2) | 85.9 (2.7) | 72.3 (3.2) | 50.0 (-) | 95.7 (2.5) | 99.9 (0.5)
Liver | 65.6 (3.4) | 59.5 (2.7) | 64.5 (4.5) | 66.7 (3.8) | 54.1 (6.5) | 49.9 (4.6) | 41.9 (-) | 66.0 (3.2) | 100 (-)
Magic | 60.2 (9.5) | 78.3 (0.6) | 80.7 (0.6) | 80.6 (1.7) | 77.4 (0.6) | 77.9 (0.5) | 77.3 (0.5) | 80.9 (0.8) | 90.0 (0.5)
Mammo | 64.2 (14.4) | 81.0 (2.6) | 78.9 (2.1) | 78.8 (1.6) | 77.5 (3.5) | 75.9 (2.4) | 72.6 (2.4) | 80.6 (2.4) | 98.3 (1.0)
Monk | 78.4 (4.2) | 80.5 (2.6) | 86.5 (3.3) | 86.5 (3.2) | 77.5 (3.5) | 63.8 (3.9) | 55.1 (3.2) | 89.3 (3.2) | 100 (-)
Phoneme | 62.2 (6.6) | 76.3 (0.8) | 81.6 (1.1) | 82.0 (0.9) | 76.1 (1.4) | 75.0 (0.6) | 72.9 (0.8) | 80.6 (1.1) | 96.5 (0.5)
Sonar | 61.4 (9.0) | 54.6 (1.9) | 68.9 (6.8) | 70.3 (6.8) | 53.6 (5.9) | 53.6 (1.3) | 53.2 (0.9) | 71.0 (7.7) | 100 (-)
Thyroid | 93.3 (1.9) | 94.4 (1.3) | 94.3 (1.9) | 95.6 (1.2) | 90.1 (20.8) | 72.0 (3.5) | 21.4 (4.8) | 94.5 (1.4) | 100 (-)
Vehicle | 26.4 (3.8) | 36.0 (5.7) | 59.1 (3.7) | 59.4 (3.2) | 35.3 (5.9) | 46.5 (3.1) | 25.7 (0.2) | 58.2 (7.5) | (-)
Vertebral | 80.9 (2.9) | 81.3 (3.6) | 81.5 (4.9) | 81.8 (4.3) | 76.5 (5.3) | 75.7 (2.9) | 68.7 (1.3) | 81.8 (3.7) | 100 (-)
WBC | 85.3 (0.3) | 53.6 (0.2) | 92.7 (3.0) | 93.2 (3.2) | 81.7 (16.4) | 88.3 (0.2) | 62.9 (0.2) | 92.5 (2.5) | 100 (-)
WDVG | 44.6 (0.3) | 83.4 (1.1) | 80.1 (1.1) | 80.4 (1.0) | 77.2 (1.7) | 65.2 (1.2) | 61.5 (1.1) | 82.4 (1.3) | 99.9 (0.1)
Weaning | 76.9 (5.6) | 79.3 (5.0) | 77.0 (3.3) | 76.9 (4.1) | 71.7 (4.9) | 58.4 (1.7) | 53.5 (2.2) | 82.9 (3.6) | 100 (-)
Wine | 59.2 (8.7) | 32.8 (1.1) | 70.0 (5.2) | 70.2 (4.9) | 61.5 (6.0) | 66.9 (4.3) | 32.8 (1.2) | 69.4 (6.4) | 100 (-)

It is possible to observe that the proposed DS method surpassed the performance of the single best classifier in 28 of 30 experiments and the combination of all classifiers in 23 of 30 experiments. When compared only with the other DS approaches, it won 123 of 150 experiments. Figure 3 shows the pairwise comparison with all tested methods: the blue bars represent the number of cases where there is a contribution in using the complexity measures, while the red bars represent the number of problems where the proposed method loses. In one situation (the Vertebral dataset) there is a tie between LCA and our approach.

Fig. 3. Pairwise comparison of DSOC with all methods. The blue bars represent the number of problems where the adoption of complexity outperformed the competitor method, while the red bars refer to the number of losses of the proposed approach.

In order to compare the behavior of the approaches, Friedman's test was performed with a confidence of 95% and 7 degrees of freedom (since we are comparing 8 methods). For all thirty problems the null hypothesis was rejected, indicating that there is a significant difference in accuracy among the approaches. A Nemenyi post-hoc test was then performed to delineate the ranking of the algorithms over all problems. The results are shown in Figure 4. Our approach achieved the best ranking position in general; however, its distance to OLA and LCA is smaller than the critical distance.

Fig. 4. Graphical representation of the Nemenyi test comparing all methods. The values presented next to the method names correspond to their average rank over the 30 classification problems.

Based on those results, we can answer the first research question presented in this paper, which asks whether information derived from data complexity analysis can contribute to estimating the competence of classifiers in an MCS based on dynamic selection. In fact, we can see some interesting contribution; however, we should understand why it could not be more significant. In the next section we discuss that, while we try to answer our second research question.
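A minimal sketch of such a statistical comparison, assuming an accuracy matrix with one row per dataset and one column per method: it runs SciPy's Friedman test, computes the average ranks plotted in Figure 4, and derives the Nemenyi critical distance (q = 3.031 is the standard tabulated value for 8 methods at the 0.05 level). It is an illustration of the analysis, not the authors' evaluation script.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def compare_methods(acc, q_alpha=3.031):
    """acc: array of shape (n_datasets, n_methods) with mean accuracies."""
    n_datasets, n_methods = acc.shape
    stat, p_value = friedmanchisquare(*[acc[:, j] for j in range(n_methods)])

    # Rank methods per dataset (rank 1 = best accuracy), then average the ranks.
    ranks = np.array([rankdata(-row) for row in acc])
    avg_ranks = ranks.mean(axis=0)

    # Nemenyi critical distance: two methods differ significantly when their
    # average ranks differ by more than CD.
    cd = q_alpha * np.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))
    return stat, p_value, avg_ranks, cd
```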

V. FURTHER ANALYSIS

Despite the interesting results observed in the last section, it is important to notice that the proposed method was not always the best. Thus, it is necessary to understand why in some cases there is a gain in performance while in other cases a loss is observed. With this in mind, we analyzed when each situation occurs. Considering that the proposed approach takes into account the similarity between the complexity of the data subset on which a classifier is trained and that of the neighborhood of the test instance, we analyzed the behavior of the F1, N2, and N4 complexity descriptors. To this end, we quantized each complexity descriptor into one hundred bins, which made it possible to compare two distributions: the one related to the subsets used for training the classifiers and the one estimated from the neighborhoods of the test instances in the validation dataset.

Figure 5 shows the behavior related to one replication selected from the 20 performed with the Monk (left side) and Sonar (right side) datasets. For the Monk dataset, the proposed DS method showed an improvement of 5.5 percentage points (a significant gain) over the second position in the rank (the LCA method). On the other hand, for the Sonar dataset we observed a loss of 3.4 percentage points (a significant loss) with respect to the second position in the rank (again the LCA method).

Fig. 5. Overlapping between complexity distributions, in red the distribution estimated from the neighborhood of each test instance, and in blue the distribution estimated from the training subsets: (a), (c), and (e) are related to the measures F1, N2, and N4 for the Monk dataset; similarly, (b), (d), and (f) are related to the Sonar dataset.

One can observe the overlap between the complexity distributions: in red, the distribution estimated from the neighborhood of each test instance, and in blue, the distribution estimated from the training subsets. The distributions on the left side, Figures 5(a), 5(c), and 5(e), are related to the measures F1, N2, and N4 for the Monk dataset, for which the proposed method has shown promising results. Similarly, Figures 5(b), 5(d), and 5(f) are related to the Sonar dataset, for which the proposed method is not indicated. As one may see, when the overlap between the distributions is more evident, the contribution of the proposed DS approach is more significant. This may justify efforts to investigate strategies that modify the ensemble learning algorithm by using complexity features to drive the generation of the training subsets used to create the pool of classifiers. The rationale is to better cover the complexity space of the problem at hand.

VI. CONCLUSION

We have evaluated the contribution of data complexity information to measuring the competence of the classifiers in a dynamic selection method. For this purpose, local accuracy was combined with information related to the similarity between the data complexity estimated from the neighborhood of the test instance and that of the subset of data used to train each classifier in the pool. The experiments on 30 datasets, considering 20 replications, have shown that dynamic selection using complexity descriptors is a promising strategy when compared with the combination of all classifiers in the pool and with 5 different DS methods available in the literature.

Although promising results were achieved, there is still a need for further study on the influence of data complexity on the selection process. Additional investigation is necessary to verify whether the results can be improved by adopting, during pool generation, some strategy that better exploits the complexity space of the problem. An alternative would be to generate the pool of classifiers taking into account data subsets with different complexities.

ACKNOWLEDGMENT

This work was partially supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), Brazil, and by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), Brazil.

REFERENCES

[1] V. Gunes, M. Manard, P. Loonis, and S. Petit-Renaud, Combination, cooperation and selection of classifiers: A state of the art, International Journal of Pattern Recognition and Artificial Intelligence, vol. 17, no. 08.
[2] L. I. Kuncheva and C. J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Machine Learning, vol. 51, no. 2.
[3] L. Breiman, Bagging predictors, Machine Learning, vol. 24, no. 2.
[4] Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, in Proceedings of the 13th International Conference on Machine Learning, 1996.
[5] T. K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, Aug.
[6] A. S. Britto Jr., R. Sabourin, and L. E. S. Oliveira, Dynamic selection of classifiers - a comprehensive review, Pattern Recognition, vol. 47, no. 11.
[7] T. K. Ho and M. Basu, Complexity measures of supervised classification problems, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, Mar.
[8] T. Ho, M. Basu, and M. Law, Measures of geometrical complexity in classification problems, in Data Complexity in Pattern Recognition, ser. Advanced Information and Knowledge Processing, M. Basu and T. Ho, Eds. Springer London, 2006.
[9] K. Woods, W. P. Kegelmeyer, Jr., and K. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 4, Apr.
[10] L. Didaci, G. Giacinto, F. Roli, and G. L. Marcialis, A study on the performances of dynamic classifier selection based on local accuracy estimation, Pattern Recognition, vol. 38, no. 11.
[11] M. Sabourin, A. Mitiche, D. Thomas, and G. Nagy, Classifier combination for hand-printed digit recognition, in Proceedings of the Second International Conference on Document Analysis and Recognition, 1993.
[12] G. Giacinto and F. Roli, Methods for dynamic classifier selection, in Proceedings of the 10th International Conference on Image Analysis and Processing (ICIAP 99). Washington, DC, USA: IEEE Computer Society, 1999.
[13] G. Giacinto, F. Roli, and G. Fumera, Selection of classifiers based on multiple classifier behaviour, in Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition. London, UK: Springer-Verlag, 2000.
[14] L. Kuncheva and J. Rodriguez, Classifier ensembles with a random linear oracle, IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 4.
[15] A. Ko, R. Sabourin, and A. Britto Jr., From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition, vol. 41, no. 5.
[16] A. Santana, R. Soares, A. Canuto, and M. C. P. de Souto, A dynamic classifier selection method to build ensembles using accuracy and diversity, in Ninth Brazilian Symposium on Neural Networks (SBRN 06), Oct 2006.
[17] Y. Yan, X.-C. Yin, Z.-B. Wang, X. Yin, C. Yang, and H.-W. Hao, Sorting-based dynamic classifier ensemble selection, in International Conference on Document Analysis and Recognition (ICDAR), Aug 2013.
[18] E. dos Santos, R. Sabourin, and P. Maupin, Ambiguity-guided dynamic selection of ensemble of classifiers, in International Conference on Information Fusion, July 2007.
[19] J. Xiao and C. He, Dynamic classifier ensemble selection based on GMDH, in International Joint Conference on Computational Sciences and Optimization (CSO), vol. 1, April 2009.
[20] K. Bache and M. Lichman, UCI machine learning repository. [Online].
[21] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera, KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Journal of Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, 2011.
[22] L. Kuncheva, Statlog: Comparison of classification algorithms on large real-world problems. [Online]. Available: mas00a/activities/real_data.htm
[23] R. D. King, C. Feng, and A. Sutherland, Statlog: Comparison of classification algorithms on large real-world problems.
[24] A. Orriols-Puig, N. Macià, and T. K. Ho, Documentation for the data complexity library in C++, Barcelona, Spain, Tech. Rep.
[25] G. Giacinto and F. Roli, Adaptive selection of image classifiers, in Image Analysis and Processing, ser. Lecture Notes in Computer Science, A. Bimbo, Ed. Springer Berlin Heidelberg, 1997.
[26] R. M. Cruz, R. Sabourin, G. D. Cavalcanti, and T. Ing Ren, META-DES: A dynamic ensemble selection framework using meta-learning, Pattern Recognition, vol. 48, no. 5, May 2015.


More information

The Interpersonal and Intrapersonal Variability Influences on Off- Line Signature Verification Using HMM

The Interpersonal and Intrapersonal Variability Influences on Off- Line Signature Verification Using HMM The Interpersonal and Intrapersonal Variability Influences on Off- Line Signature Verification Using HMM EDSON J. R. JUSTINO 1 FLÁVIO BORTOLOZZI 1 ROBERT SABOURIN 2 1 PUCPR - Pontifícia Universidade Católica

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

On Error Correlation and Accuracy of Nearest Neighbor Ensemble Classifiers

On Error Correlation and Accuracy of Nearest Neighbor Ensemble Classifiers On Error Correlation and Accuracy of Nearest Neighbor Ensemble Classifiers Carlotta Domeniconi and Bojun Yan Information and Software Engineering Department George Mason University carlotta@ise.gmu.edu

More information

Classification with Class Overlapping: A Systematic Study

Classification with Class Overlapping: A Systematic Study Classification with Class Overlapping: A Systematic Study Haitao Xiong 1 Junjie Wu 1 Lu Liu 1 1 School of Economics and Management, Beihang University, Beijing 100191, China Abstract Class overlapping has

More information

Effect of the PSO Topologies on the Performance of the PSO-ELM

Effect of the PSO Topologies on the Performance of the PSO-ELM 2012 Brazilian Symposium on Neural Networks Effect of the PSO Topologies on the Performance of the PSO-ELM Elliackin M. N. Figueiredo and Teresa B. Ludermir Center of Informatics Federal University of

More information

A Neural Network for Real-Time Signal Processing

A Neural Network for Real-Time Signal Processing 248 MalkofT A Neural Network for Real-Time Signal Processing Donald B. Malkoff General Electric / Advanced Technology Laboratories Moorestown Corporate Center Building 145-2, Route 38 Moorestown, NJ 08057

More information

Toward Part-based Document Image Decoding

Toward Part-based Document Image Decoding 2012 10th IAPR International Workshop on Document Analysis Systems Toward Part-based Document Image Decoding Wang Song, Seiichi Uchida Kyushu University, Fukuoka, Japan wangsong@human.ait.kyushu-u.ac.jp,

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

Keywords: clustering algorithms, unsupervised learning, cluster validity

Keywords: clustering algorithms, unsupervised learning, cluster validity Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based

More information

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine

More information

Model Based Sampling Fitting an ensemble of models into a single model

Model Based Sampling Fitting an ensemble of models into a single model 2015 International Conference on Computational Science and Computational Intelligence Fitting an ensemble of models into a single model Tony Lindgren Department of Computer and Systems Sciences Stockholm

More information

Comparison Study of Different Pattern Classifiers

Comparison Study of Different Pattern Classifiers Comparison Study of Different Pattern s Comparison Study of Different Pattern s Ameet Joshi, Shweta Bapna, Sravanya Chunduri Abstract This paper presents a comparison study of the different parametric

More information

Weighting and selection of features.

Weighting and selection of features. Intelligent Information Systems VIII Proceedings of the Workshop held in Ustroń, Poland, June 14-18, 1999 Weighting and selection of features. Włodzisław Duch and Karol Grudziński Department of Computer

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran

More information

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,

More information

Induction of Multivariate Decision Trees by Using Dipolar Criteria

Induction of Multivariate Decision Trees by Using Dipolar Criteria Induction of Multivariate Decision Trees by Using Dipolar Criteria Leon Bobrowski 1,2 and Marek Krȩtowski 1 1 Institute of Computer Science, Technical University of Bia lystok, Poland 2 Institute of Biocybernetics

More information

Image Classification Using Wavelet Coefficients in Low-pass Bands

Image Classification Using Wavelet Coefficients in Low-pass Bands Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan

More information

SYMBOLIC FEATURES IN NEURAL NETWORKS

SYMBOLIC FEATURES IN NEURAL NETWORKS SYMBOLIC FEATURES IN NEURAL NETWORKS Włodzisław Duch, Karol Grudziński and Grzegorz Stawski 1 Department of Computer Methods, Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract:

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

Color-Based Classification of Natural Rock Images Using Classifier Combinations

Color-Based Classification of Natural Rock Images Using Classifier Combinations Color-Based Classification of Natural Rock Images Using Classifier Combinations Leena Lepistö, Iivari Kunttu, and Ari Visa Tampere University of Technology, Institute of Signal Processing, P.O. Box 553,

More information

Creating ensembles of classifiers via fuzzy clustering and deflection

Creating ensembles of classifiers via fuzzy clustering and deflection Fuzzy Sets and Systems ( ) www.elsevier.com/locate/fss Creating ensembles of classifiers via fuzzy clustering and deflection Huaxiang Zhang a,, Jing Lu b a Department of Computer Science, Shandong Normal

More information

Calibrating Random Forests

Calibrating Random Forests Calibrating Random Forests Henrik Boström Informatics Research Centre University of Skövde 541 28 Skövde, Sweden henrik.bostrom@his.se Abstract When using the output of classifiers to calculate the expected

More information

A Bagging Method using Decision Trees in the Role of Base Classifiers

A Bagging Method using Decision Trees in the Role of Base Classifiers A Bagging Method using Decision Trees in the Role of Base Classifiers Kristína Machová 1, František Barčák 2, Peter Bednár 3 1 Department of Cybernetics and Artificial Intelligence, Technical University,

More information

Feature-weighted k-nearest Neighbor Classifier

Feature-weighted k-nearest Neighbor Classifier Proceedings of the 27 IEEE Symposium on Foundations of Computational Intelligence (FOCI 27) Feature-weighted k-nearest Neighbor Classifier Diego P. Vivencio vivencio@comp.uf scar.br Estevam R. Hruschka

More information

Slant normalization of handwritten numeral strings

Slant normalization of handwritten numeral strings Slant normalization of handwritten numeral strings Alceu de S. Britto Jr 1,4, Robert Sabourin 2, Edouard Lethelier 1, Flávio Bortolozzi 1, Ching Y. Suen 3 adesouza, sabourin@livia.etsmtl.ca suen@cenparmi.concordia.ca

More information

Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks

Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks Series Prediction as a Problem of Missing Values: Application to ESTSP7 and NN3 Competition Benchmarks Antti Sorjamaa and Amaury Lendasse Abstract In this paper, time series prediction is considered as

More information