Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm
C. De Stefano, F. Fontanella and A. Scotto di Freca
Dipartimento di Ingegneria Elettrica e dell'Informazione (DIEI)
Università di Cassino e del Lazio meridionale
Via G. Di Biasio, Cassino (FR), Italy
{destefano,fontanella,a.scotto}@unicas.it

Abstract. In classification and clustering problems, feature selection techniques can be used to reduce the dimensionality of the data and increase performance. However, feature selection is a challenging task, especially when hundreds or thousands of features are involved. In this framework, we present a new approach for improving the performance of a filter-based genetic algorithm. The proposed approach consists of two steps: first, the available features are ranked according to a univariate evaluation function; then, the search space represented by the first M features in the ranking is explored by a filter-based genetic algorithm in order to find feature subsets with a high discriminative power. Experimental results demonstrate the effectiveness of our approach in dealing with high dimensional data, both in terms of recognition rate and feature number reduction.

1 Introduction

Recent years have seen a strong growth of applications dealing with huge amounts of data, such as data mining and medical data processing [23]. These applications often imply classification or clustering problems in which the objects to be classified or clustered are represented as feature vectors. The feature selection problem consists in selecting, from the whole set of available features, the subset providing the most discriminative power. The choice of a good feature subset is crucial: if the selected features do not contain enough information to discriminate patterns belonging to different classes, performance may be unsatisfactory, regardless of the effectiveness of the classification system employed.
Moreover, irrelevant and noisy features unnecessarily enlarge the search space, increasing both the time and the complexity of the learning process. Feature selection algorithms usually imply the definition of an evaluation function and of a search procedure. Evaluation functions can be divided into two broad classes: univariate and multivariate measures. Univariate measures evaluate the effectiveness of each single feature in discriminating samples belonging to different classes and are used to rank the available features. Once the features have been evaluated, the subset search procedure is straightforward: the features are ranked according to their merit and the best M features are
selected. The parameter M is specified by the user. This kind of approach is very fast and can cope with problems involving even hundreds of thousands of features. The main drawback of these measures is that they cannot take into account interactions that may occur between two or more features. For this reason, features which perform well in conjunction with other features are discarded if they perform poorly when used alone. Additionally, the features with the highest scores (merits) are usually similar to each other; therefore, these measures tend to select redundant features [6]. Multivariate measures, instead, evaluate feature subsets by measuring how well patterns belonging to different classes are discriminated when projected into the subspace represented by the subset to be evaluated. These measures are generally classified into two categories: filter and wrapper [8]. Wrapper approaches use classification algorithms to evaluate the goodness of the subsets. This leads to high computational costs when a large number of evaluations is required, especially when large datasets are involved. Filter approaches, instead, are independent of any classification algorithm and, in most cases, are computationally less expensive and more general than wrapper algorithms. As concerns search strategies, given a measure, the optimal subset can be found by exhaustively evaluating all possible solutions. Unfortunately, exhaustive search is impracticable when the cardinality N of the whole set of features Y is high (N > 50), because the search space, made of all the 2^N possible subsets of Y, grows exponentially with N. For this reason, many heuristic algorithms have been proposed for finding near-optimal solutions [4, 8, 21]. Among these, greedy strategies that incrementally generate feature subsets have been proposed.
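The rank-and-cutoff scheme described above can be sketched as follows (a minimal illustration in Python; the `merit` callable is a hypothetical stand-in for any univariate measure, here exemplified by absolute correlation with the class):

```python
import numpy as np

def rank_and_select(X, y, merit, M):
    """Rank features by a univariate merit function and keep the top M (best first)."""
    scores = np.array([merit(X[:, j], y) for j in range(X.shape[1])])
    ranking = np.argsort(scores)[::-1]   # feature indices sorted by decreasing merit
    return ranking[:M]

# Toy usage: feature 3 drives the class label, so it should rank at the top
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = (X[:, 3] + 0.1 * rng.normal(size=100) > 0).astype(int)
top = rank_and_select(X, y, lambda f, c: abs(np.corrcoef(f, c)[0, 1]), M=3)
```

As the surrounding text notes, this selection needs no search procedure: the cost is one merit evaluation per feature plus a sort.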
Since these algorithms do not take into account complex interactions among the features, in most cases they lead to sub-optimal solutions. Evolutionary computation (EC) based techniques have been widely used to cope with the feature selection problem [21]. Among the EC-based approaches, Genetic Algorithms (GAs) have been widely used. GA binary vectors provide a natural and straightforward representation for feature subsets: the value 1 or 0 of the i-th chromosome element indicates whether the i-th feature is included or not. Most of the GA approaches use wrapper evaluation functions [21], for which different classification algorithms have been adopted, among them Support Vector Machines (SVMs) [16], K-Nearest Neighbor (KNN) [13] and Artificial Neural Networks (ANNs) [22]. As mentioned above, wrapper evaluation functions lead to high computational costs, since their computational complexity depends on the number of samples actually used for training the classifier. As a consequence, such approaches are not well suited to problems involving a huge number of instances and features. Filter fitness functions have also been used: the approach presented in [19] uses an information-theoretic evaluation function, while in [11] the authors adopt a consistency measure. Moreover, in [3, 5] the authors present a filter fitness function that extends Fisher's linear discriminant. Recently, in order to reduce the search space size for high-dimensional datasets,
different strategies have been adopted [18, 20, 21] for GA-based algorithms. In [18], the search space reduction for the GA is performed by using different filter approaches. The information provided by these approaches is used to build a part of the individuals making up the initial population. Then, individuals are evaluated by means of a neural-network-based wrapper function. The approach has been tested on a credit risk assessment problem involving just 33 features. In [20] the authors present a new GA-based approach for feature selection which uses three different ranking algorithms for reducing the search space of the GA, which in turn uses an SVM-based wrapper as fitness function. However, in this case the GA is used in a very limited way, because the search space is reduced to only 12 features. Moreover, in [10] and [12] two different GA-based hybrid approaches that use wrapper fitness functions are proposed and tested on data with no more than 100 features. Finally, in [2] a two-step procedure is used to deal with data involving up to thousands of features. In the first step, the whole set of features is ranked according to a univariate measure; in the second step, the final subset is built by incrementally adding the i-th ranked feature. The process continues as long as the added feature improves the performance of the classifier used for the subset evaluation. In this paper we present a new GA-based algorithm for feature selection that exploits the advantages of both feature ranking and GAs. The goal is to build a high-performance feature selection system that selects a small number of features with respect to the total number of available features. For this purpose, we built a two-module system that combines a feature ranking algorithm with a GA. The proposed system allows us to greatly reduce the number of features to be used in the classification phase.
More specifically, the first module uses a feature ranking algorithm to greatly reduce the number of features to be taken into account by the second module: it keeps only a given number M (fixed a priori) of promising features, according to the univariate measure used for ranking the whole feature set given in input to the system. The second, GA-based module seeks the best feature subset in the search space consisting of the subsets of the features provided by the first module, by using a filter fitness function to evaluate feature subsets. The layout of the proposed system is shown in Figure 1. Because of the reduction performed by the feature ranking, the search space provided to the GA module is much smaller than the one made of all the possible subsets of the whole feature set. The proposed system is based on the hypothesis that this reduced search space still contains most of the promising areas, i.e. those containing good and near-optimal solutions (subsets). In practice, the filtering performed by the ranking module does not discard those features that perform well only when used in combination with other ones; this allows the second, GA-based module to focus its search on these more promising areas. As concerns the univariate measure for the feature ranking, we used the Chi-square measure introduced in [15]. As evaluation function for the GA module we used the one introduced in [9], namely the Correlation-based Feature Selection (CFS) function. This function evaluates the merit of a subset by considering
Fig. 1. The layout of the proposed system (N input features → feature ranking → M features → GA → selected features).

both the correlation between the class labels and the single features, and the inter-correlation among the selected features. The computation of the CFS function is made of two steps: (i) the class-feature correlation vector and the feature-feature correlation matrix are computed a priori for all the features and properly stored; (ii) the subsequent evaluations of the CFS function are performed by accessing the precomputed vector and matrix. It is worth noting that these computations are independent of the training set size. The effectiveness of the proposed approach has been tested on four different publicly available datasets, whose total number of features ranges from 500 to 10,000. Two kinds of comparison were performed: in the former, the results of our approach were compared with those achieved by using different feature selection strategies; in the latter, our results were compared with those obtained by a wrapper-based approach. The remainder of the paper is organized as follows: in Section 2 the feature evaluation functions are described; Section 3 illustrates the GA used to implement the second module; in Section 4 the experimental results are detailed. Finally, Section 5 is devoted to the conclusions.

2 Feature Evaluation Function

As mentioned in the Introduction, feature evaluation functions can be broadly divided into two classes, namely univariate measures and multivariate measures. In the following, the univariate measure adopted for the ranking module and the subset evaluation criterion used as fitness function of the GA module are detailed.

2.1 Univariate Measures

Univariate measures evaluate the effectiveness of a single feature in discriminating samples belonging to different classes and can be used to sort the whole set of available features.
The feature selection approaches which use this kind of measure do not need a search procedure. In fact, once the features have been sorted,
the best subset of M features consists of the first M features of the ranking. Note that the value of M must be chosen by the user. For our approach, we used the Chi-square univariate measure [15]. This measure estimates feature merit by using a discretization algorithm: if a feature can be discretized to a single value, it has no discriminative power and can safely be discarded. The discretization algorithm adopts a supervised heuristic method based on the χ² statistic. The range of values of each feature is initially discretized into a certain number of intervals (heuristically determined). Then, the χ² statistic is used to determine whether the relative frequencies of the classes in adjacent intervals are similar enough to justify merging those intervals. The χ² value for two adjacent intervals is computed as:

\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{C} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}   (1)

where C is the number of classes, A_{ij} is the number of instances of the j-th class in the i-th interval and E_{ij} is the expected frequency of A_{ij}, given by E_{ij} = R_i C_j / N_T, where R_i is the number of instances in the i-th interval, and C_j and N_T are the number of instances of the j-th class and the total number of instances, respectively, in both intervals. The extent of the merging process is controlled by a threshold, whose value represents the maximum admissible difference among the occurrence frequencies of the samples in adjacent intervals. The value of this threshold has been heuristically set during preliminary experiments.

2.2 Subset Evaluation Functions

Multivariate methods for feature subset evaluation can in turn be divided into two classes: filter and wrapper. The former are based on statistical measures and their outcomes are independent of the classifier actually used. The latter, instead, are based on the classification results achieved by a given classifier, trained on the subset to be evaluated.
Wrapper methods are usually computationally more expensive than filter ones, as each evaluation requires training the classifier, making them unsuitable for big data tasks, where huge datasets must be processed. Moreover, while filter-based evaluations are more general, as they provide statistical information on the data, wrapper-based evaluations may give rise to a loss of generality, because they depend on the specific classifier used. In order to introduce the adopted subset evaluation function, let us briefly recall the well-known information-theoretic concept of entropy. Given a discrete variable X, which can assume the values {x_1, x_2, ..., x_n}, its entropy H(X) is defined as:

H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)   (2)
where p(x_i) is the probability mass function of the value x_i. The quantity H(X) represents an estimate of the uncertainty of the random variable X. The concept of entropy can be used to define the conditional entropy of two random variables X and Y, taking the values x_i and y_j respectively, as:

H(X|Y) = \sum_{i,j} p(x_i, y_j) \log \frac{p(y_j)}{p(x_i, y_j)}   (3)

where p(x_i, y_j) is the joint probability that X = x_i and Y = y_j at the same time. The quantity in (3) represents the amount of randomness in the random variable X when the value of Y is known. Given two features X and Y, their correlation r_{XY} is computed as follows¹:

r_{XY} = 2.0 \cdot \frac{H(X) + H(Y) - H(X,Y)}{H(X) + H(Y)}   (4)

As fitness function for the GA module we chose a filter called CFS (Correlation-based Feature Selection) [9], which uses a correlation-based heuristic to evaluate feature subset quality. This function takes into account the usefulness of the single features for predicting class labels, along with the level of inter-correlation among them. The idea behind this approach is that good subsets contain features highly correlated with the class and uncorrelated with each other. Given a feature selection problem in which the patterns are represented by means of a set Y of N features, the CFS function computes the merit of a generic subset X ⊆ Y, made of k features, as follows:

f_{CFS}(X) = \frac{k \overline{r}_{cf}}{\sqrt{k + k(k-1) \overline{r}_{ff}}}   (5)

where \overline{r}_{cf} is the average feature-class correlation and \overline{r}_{ff} is the average feature-feature correlation. Note that the numerator estimates the discriminative power of the features in X, whereas the denominator assesses their redundancy. The CFS function allows the GA to discard both irrelevant and redundant features: the former because they are poor in discriminating the different classes at hand, the latter because they are highly correlated with one or more of the other features.
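For discrete features, the correlation of (4) can be computed directly from empirical counts. The following is a minimal sketch (the function names are illustrative, not from the paper), with the entropies of (2) estimated from observed frequencies:

```python
import math
from collections import Counter

def entropy(values):
    """Empirical entropy H(X) of Eq. (2), in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """Correlation r_XY of Eq. (4): 2 [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)]."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))       # joint entropy H(X,Y) from paired values
    return 2.0 * (hx + hy - hxy) / (hx + hy)

a = [0, 0, 1, 1]
print(symmetric_uncertainty(a, a))             # identical variables -> 1.0
print(symmetric_uncertainty(a, [0, 1, 0, 1]))  # independent variables -> 0.0
```

The measure is symmetric and normalized to [0, 1], which is what allows CFS to compare feature-feature and feature-class correlations on the same scale.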
In contrast to previously presented approaches [10, 18], this fitness function is able to automatically find the number of features and does not require the setting of any parameter. Finally, given a dataset D for estimating the quantities in (4) and a feature subset X to be evaluated, the computation of f_{CFS}(X) (X ⊆ Y) can be made very fast. In fact, before starting the search procedure (the GA in our case), the correlation vector V_cf, containing N elements, and the N × N symmetric correlation matrix M_ff can be precomputed. The i-th element of V_cf contains the value of the correlation between the i-th feature and the class, whereas the element M_ff[i, j] represents the correlation between the i-th and the j-th feature. Once the values of V_cf and M_ff have been computed, given a subset X containing k features, the computation of f_{CFS}(X) only requires 2k memory accesses.

¹ Note that the same holds also for the feature-class correlation.
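Given the precomputed structures V_cf and M_ff described above, the merit of (5) reduces to a few array lookups. A sketch (assuming, as an illustration, that M_ff has a unit diagonal, i.e. each feature is fully correlated with itself):

```python
import numpy as np

def cfs_merit(subset, V_cf, M_ff):
    """CFS merit of Eq. (5), computed from precomputed correlation structures.

    subset: list of the k selected feature indices
    V_cf:   length-N feature-class correlation vector
    M_ff:   N x N symmetric feature-feature correlation matrix (unit diagonal)
    """
    k = len(subset)
    r_cf = V_cf[subset].mean()                    # average feature-class correlation
    if k == 1:
        r_ff = 0.0
    else:
        block = M_ff[np.ix_(subset, subset)]
        r_ff = (block.sum() - k) / (k * (k - 1))  # mean over off-diagonal pairs
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

# Toy example: two informative, mutually uncorrelated features
V_cf = np.array([0.8, 0.6, 0.1])
M_ff = np.eye(3)
print(cfs_merit([0, 1], V_cf, M_ff))   # ~0.99
```

This mirrors the point made in the text: once V_cf and M_ff exist, each evaluation touches only the k vector entries and the k x k sub-block, independently of the training set size.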
3 Genetic Algorithms for Feature Selection

In the last decades, Evolutionary Computation techniques have proven very effective for solving optimization problems whose search spaces are discontinuous and very complex. In this field, GAs represent a subset of these optimization techniques and have been applied to a wide variety of both numerical and combinatorial optimization problems [12]. In a GA, the solutions are represented as binary vectors, and operators such as crossover and mutation are applied to explore the search space made of all possible solutions. GAs can be easily applied to the feature selection problem: given a set Y of cardinality N, a subset X of Y (X ⊆ Y) can be represented by a binary vector of N elements whose i-th element is set to 1 if the i-th feature is included in X, and to 0 otherwise. Besides the simplicity of the solution encoding, GAs are well suited to this class of problems because the search in this exponential space is very hard, since interactions among features can be highly complex and strongly nonlinear. Some studies on the effectiveness of GAs in solving feature selection problems can be found in [12, 21]. The second module of the system presented here has been implemented by using a generational GA. In order to reduce the computational complexity of the fitness function (see Subsection 2.2), the class-feature correlation vector V_cf and the feature-feature correlation matrix M_ff are precomputed. Then the GA starts by randomly generating a population of P individuals. Afterwards, the fitness of the generated individuals is evaluated according to the formula in (5). After this preliminary evaluation phase, a new population is generated by selecting P/2 couples of individuals using the roulette-wheel method. The one-point crossover operator is then applied to each of the selected couples, according to a given probability factor p_c. Afterwards, the mutation operator is applied with probability p_m.
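The generational scheme just described (roulette-wheel selection, one-point crossover, bit-flip mutation) can be sketched as follows; this is a simplified illustration, where `fitness` stands for any non-negative subset score (the CFS function in the actual system):

```python
import random

def ga_select(fitness, N, P=50, p_c=0.4, N_g=40, seed=7):
    """Generational GA sketch over binary feature-subset encodings.

    fitness must return non-negative values (roulette-wheel assumption).
    """
    random.seed(seed)
    p_m = 1.0 / N                                  # on average, one flipped gene
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(P)]

    def roulette(pop, fits, total):
        r = random.uniform(0, total)
        acc = 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if acc >= r:
                return ind
        return pop[-1]

    for _ in range(N_g):
        fits = [fitness(ind) for ind in pop]
        total = sum(fits)
        new_pop = []
        for _ in range(P // 2):                    # P/2 couples per generation
            a = roulette(pop, fits, total)[:]
            b = roulette(pop, fits, total)[:]
            if random.random() < p_c:              # one-point crossover
                cut = random.randint(1, N - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            a = [g ^ (random.random() < p_m) for g in a]   # bit-flip mutation
            b = [g ^ (random.random() < p_m) for g in b]
            new_pop += [a, b]
        pop = new_pop
    return max(pop, key=fitness)
```

For example, `ga_select(sum, 12)` drives the population toward all-ones chromosomes, since the fitness simply counts selected features.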
The value of p_m has been set to 1/N, where N is the chromosome length, i.e. the total number of available features for the problem at hand. This probability value allows, on average, the modification of only one chromosome element. This value has been suggested in [17] as the optimal mutation rate below the error threshold of replication. Finally, these individuals are added to the new population. The process just described is repeated for N_g generations.

4 Experimental Results

We tested the proposed approach on high dimensional data (from 500 up to 10,000 features). For each dataset, a set of values for the parameter M (see Figure 1) has been tested. For each value of M, 30 runs have been performed for the GA module. At the end of every run, the feature subset encoded by the individual with the best fitness has been used to build a Multilayer Perceptron classifier (MLP in the following), trained by using the back-propagation algorithm. The classification performances of the built classifiers have been obtained by using the 10-fold cross-validation approach. The results reported in the following have
Table 1. The values of the GA module parameters used in the experiments. Note that p_m depends on the chromosome length, i.e. the total number of available features N_F.

Parameter                Symbol  Value
Population size          P       100
Crossover probability    p_c     0.4
Mutation probability     p_m     1/N_F
Number of generations    N_g     500

been obtained by averaging the performances of the 30 MLPs built. Some preliminary trials have been performed to set the parameters of the GA and of the MLP, reported in Tables 1 and 2, respectively. These two sets of parameters have been used for all the experiments reported below.

4.1 The Datasets

The proposed approach has been tested on the following publicly available datasets: Arcene, Gisette, Madelon [1] and Ucihar [14]. The characteristics of the datasets are summarized in Table 3. They differ as regards the number of attributes, the number of classes (two-class or multi-class problems) and the number of samples. In particular, Arcene contains mass-spectrometric data from medical tests for cancer diagnosis (ovarian or prostate cancer); it is a two-class classification problem with continuous input variables. Gisette contains images of confusable handwritten digits: the four and the nine. The dataset was constructed from the MNIST data made available by Yann LeCun of the NEC Research Institute; it is a two-class classification problem with sparse continuous input variables. Madelon contains two-class synthetic data with sparse binary attributes; each class contains a certain number of independently generated Gaussian clusters. It also contains some redundant and useless features. Finally, the Ucihar dataset contains data representing signals from smartphone sensors (accelerometer and gyroscope), recorded from 30 persons wearing a smartphone on the waist. Each person performed six activities, each representing a class of the problem: walking, walking upstairs, walking downstairs, sitting, standing and laying. Table 2.
The values of the parameters used for training the MLPs. Note that the number of hidden neurons depends on both the number of input attributes N_a and the number of output classes N_c.

Parameter        Value
Learning rate    0.3
Momentum         0.2
Hidden neurons   (N_a + N_c)/2
Epochs           500
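As one possible (hypothetical, not the authors') implementation, the configuration of Table 2 maps onto scikit-learn's MLPClassifier as follows:

```python
from sklearn.neural_network import MLPClassifier

def build_mlp(n_attributes, n_classes):
    """MLP configured with the Table 2 parameters (a scikit-learn sketch)."""
    hidden = (n_attributes + n_classes) // 2   # (N_a + N_c)/2 hidden neurons
    return MLPClassifier(hidden_layer_sizes=(hidden,),
                         solver="sgd",          # plain gradient-descent training
                         learning_rate_init=0.3,
                         momentum=0.2,
                         max_iter=500)          # epochs

clf = build_mlp(561, 6)   # e.g. the Ucihar dataset: 561 attributes, 6 classes
```

Any backpropagation-trained MLP with the same learning rate, momentum, hidden-layer size and epoch budget would serve equally well here.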
Table 3. The datasets used in the experiments.

Dataset   Attributes  Samples  Classes
Arcene    10000       200      2
Gisette   5000        7000     2
Madelon   500         2600     2
Ucihar    561         10299    6

4.2 Comparison Findings

In order to test the effectiveness of our system, we performed two sets of comparisons. In the first set, we compared the results of the proposed system with those obtained by three different feature selection approaches:

- The feature ranking represented by the first module of our system (Figure 1): given the whole set of N features, it outputs the best M features according to the univariate measure adopted. It will be denoted as RNK in the following.
- The GA used in the second module of the proposed system (Figure 1): given the whole set of N features, it searches for the best solution (subset) by using the GA detailed in Section 3. It will be denoted as GA in the following.
- The third approach taken into account for the comparison is quite similar to ours, but uses sequential forward floating selection as the search strategy of the second module. This strategy searches the solution space by using a greedy hill-climbing technique: it starts with the empty set of features and, at each step, adds the best feature according to the evaluation function; the algorithm also verifies whether the criterion can be improved by excluding a feature, in which case the worst feature, according to the evaluation function, is removed from the set. We used an improved version of this algorithm, presented in [7]. It will be denoted as RNK-SFS in the following.

The purpose of the first comparison was to test the effectiveness of the proposed approach in improving the performance obtained by using only the feature ranking. The aim of the second comparison was to validate our hypothesis that a feature ranking algorithm can be used to improve the performance of a standard GA by locating the promising areas of the whole search space consisting of all the available features.
Finally, the goal of the third comparison was to assess the ability of the GA to find good solutions in the search space provided by the feature ranking module. For all the comparisons, performance has been evaluated in terms of recognition rate and feature reduction. As concerns the second set of comparisons, we compared the results of our system with those presented in [2]. The approach taken into account for this comparison, called IWSS, is wrapper-based and, as mentioned in the Introduction, is able to deal with problems involving thousands of features.
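The floating forward search used by RNK-SFS can be sketched as follows (a simplified version for illustration; the system actually uses the improved algorithm of [7], and `evaluate` stands for the subset criterion, e.g. CFS):

```python
def sffs(evaluate, N, max_k):
    """Sequential forward floating selection (simplified sketch).

    evaluate: callable scoring a list of feature indices (higher is better)
    N:        number of candidate features
    max_k:    upper bound on the subset size
    """
    selected = []
    while len(selected) < max_k:
        candidates = [f for f in range(N) if f not in selected]
        if not candidates:
            break
        # Forward step: add the feature that most improves the criterion
        f_best = max(candidates, key=lambda f: evaluate(selected + [f]))
        if selected and evaluate(selected + [f_best]) <= evaluate(selected):
            break                                  # no improving feature left
        selected.append(f_best)
        # Floating backward step: drop features while that improves the score
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for f in list(selected):
                reduced = [g for g in selected if g != f]
                if evaluate(reduced) > evaluate(selected):
                    selected = reduced
                    improved = True
                    break
    return selected

# Toy criterion: features 1 and 3 are useful, every extra feature costs 0.1
score = lambda s: len(set(s) & {1, 3}) - 0.1 * len(s)
print(sorted(sffs(score, 5, 4)))   # -> [1, 3]
```

The greedy, one-feature-at-a-time nature of this search is what can leave it stuck in suboptimal areas, as the results below illustrate.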
With the purpose of investigating how the value of the parameter M affects the performance of the presented system, we tested several values of M. Moreover, since the number of attributes of the considered datasets differs widely, we used two sets of values. The set {100, 200, 500, 1000, 2000} has been considered for the datasets Arcene and Gisette, whose samples are described by 10,000 and 5000 attributes, respectively. The set {20, 50, 100, 200, 300} has been used for the datasets Madelon and Ucihar, having 500 and 561 attributes, respectively.

First Set of Comparisons

Since the approaches RNK and RNK-SFS are deterministic, for each value of M they generated a single feature subset. However, in order to perform a fair comparison with the proposed approach, for each generated subset 30 MLPs have been trained with different, randomly generated initial weights. The trained MLPs have been evaluated by using the 10-fold cross-validation approach. The results reported in the following have been obtained by averaging the performances of the 30 MLPs learned. Also in this case we used the parameters reported in Table 2. The results of the GA approach have been obtained by using the same methodology adopted for our approach, as described at the beginning of the present section. Note that, to statistically validate the comparison results, we performed the nonparametric Wilcoxon rank-sum test (α = 0.05) over the 30 runs. The comparison results have been grouped according to the different values of M used, and are reported in Tables 4 (Arcene and Gisette) and 5 (Madelon and Ucihar). In both tables, the second column shows the values of the parameter M, while the recognition rate (RR) and the number of selected features (NF) are reported for each method. It is worth noting that for the RNK method the number of selected features has not been reported, because it coincides with the value of M actually used.
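The statistical validation step can be illustrated with SciPy's rank-sum test on synthetic, hypothetical recognition rates (the values below are invented for the example; `scipy` is assumed available):

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
rr_a = rng.normal(92.0, 0.5, 30)   # hypothetical method A: 30 run-level rates
rr_b = rng.normal(90.0, 0.5, 30)   # hypothetical method B: 30 run-level rates
stat, p = ranksums(rr_a, rr_b)
print(p < 0.05)                    # True: a 2-point gap at sigma = 0.5 is significant
```

The rank-sum test is a sensible choice here because it compares the two samples of 30 run results without assuming that recognition rates are normally distributed.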
In each table, the recognition rates in bold highlight the results which are significantly better than the second best results (starred in the table), according to the Wilcoxon test. Where results do not present a statistically significant difference, the best two results are both starred. Moreover, for each method, when two or more results do not present a statistically significant difference, the result achieved with the minimum number of features has been considered. Finally, note that for our approach we used the abbreviation RNK-GA. The comparison results for the Arcene and Gisette datasets are shown in Table 4. From the table it can be seen that the proposed approach achieves better performance on both datasets. In more detail, for the Arcene dataset a recognition rate of 92.3% has been obtained by using only 465 out of the 10,000 features provided in input to the system. This result has been achieved with a value of M equal to 2000 and is significantly better than those obtained with smaller values. This seems to suggest that for these smaller values the ranking module discards features that, although they score poorly according to the χ²
Table 4. Comparison results for the Arcene and Gisette datasets (columns: M; RR and NF for RNK-GA, GA and RNK-SFS; RR for RNK). Bold values represent the best statistically significant results.

measure, are relevant for the classification task when used in conjunction with other features. Nonetheless, the search space reduction performed by the ranking module with M = 2000 (from 10,000 down to 2000 features) allowed a strong improvement of the performance with respect to the GA, which searched the whole search space. As concerns the second best result, the RNK approach reached a recognition rate of 87.4% by using 500 features, which coincides with the value of M. Note that for M = 2000 the RNK approach performs poorly, indicating that only some of the first 2000 ranked features are actually relevant, while most of them are irrelevant or redundant, and training the MLP with all of them leads to poor classification performance. As for the RNK-SFS method, which got its best result with M = 2000, it performs significantly worse than our approach, showing that the GA has a better searching ability than RNK-SFS. For the Gisette dataset, the best two results of our system (M = 500 and M = 1000) were not significantly different and, according to the criterion mentioned above, we considered the one which used 41.9 features (on average), achieving a recognition rate of 96.7% (M = 500). The results of the other methods were not significantly different from each other. In this case, the GA approach got good results, but selected many more features than RNK-GA. This result shows that, even if the GA selected most of the relevant features, it was not able to discard the redundant and irrelevant ones. This is due to the fact that the GA searched the whole space of solutions and could not benefit from the filtering action performed by the ranking module.
As regards the RNK-SFS, it got its best performance with three different results (M = 500, M = 100 and M = 2000), which did not exhibit any statistically significant difference. The chosen result (M = 500) reached a recognition rate of 95.6% by selecting 73 features. These results exhibit only slight differences in terms of number of features as well. This seems to suggest that the SFS algorithm, starting from the 500-feature case, got stuck in suboptimal areas of the search space and was not able to locate new areas containing solutions consisting of a greater number of features.
Table 5. Comparison results for the Madelon and Ucihar datasets (columns: M; RR and NF for RNK-GA, GA and RNK-SFS; RR for RNK). Bold values represent the best statistically significant results.

The results just described seem to confirm that, as the search space (exponentially) grows with M, the GA module of the proposed approach is able to locate new areas of the search space containing better solutions, which include the progressively added features. In particular, in the case of the Arcene dataset, our system was able to find solutions whose cardinality strongly grows as M increases, obtaining a strong increase of the performance in terms of recognition rate. For the Gisette dataset, instead, the fitness increment of the newly found solutions did not lead to a significant improvement in terms of recognition rate. The comparison results for the Madelon and Ucihar datasets are shown in Table 5. From the table it can be observed that for both datasets the proposed system did not significantly outperform the compared systems. As concerns the Madelon dataset, the RNK approach reached its best performance by using the first 20 ranked features. This suggests that the whole Madelon feature set contains a small set of features that, even when taken separately, have a high discriminative power. This set can be easily identified, either by RNK, which selected the best 20 features according to the χ² measure (equation (1)), or by the GA, even though it searched the whole search space. As for the Ucihar results, RNK-GA, RNK-SFS and GA outperformed RNK, but achieved performances that did not exhibit any statistically significant difference. RNK-GA and RNK-SFS obtained their best results with M = 200, using a comparable number of features. The GA selected many more features than RNK-GA and RNK-SFS (about three times as many), confirming that it was not able to discard redundant and irrelevant features.
Nonetheless, in this case, these features did not affect the MLP training process. These results seem to suggest that, when the number of features to be dealt with is not too large: (i) even less effective search algorithms, such as SFS, can find good solutions in the reduced search space provided by the first module; (ii) the GA alone is still able to locate search space areas containing good solutions, but these may contain redundant or irrelevant features.
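The ranking step performed by RNK (the χ2 measure of equation (1)) can be sketched as follows. This is an illustrative reimplementation over discrete feature values, not the paper's code; real-valued features would first need discretization, e.g. in the spirit of Chi2 [15]:

```python
from collections import Counter

def chi2_score(feature, labels):
    """Chi-squared statistic between a discrete feature and the class labels,
    computed from the observed vs. expected contingency-table counts."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    f_marg = Counter(feature)
    l_marg = Counter(labels)
    stat = 0.0
    for fv in f_marg:
        for lv in l_marg:
            expected = f_marg[fv] * l_marg[lv] / n
            observed = joint.get((fv, lv), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def top_m_features(X, y, m):
    """Rank the columns of X by chi2 against y and keep the first m of them
    (the preselection performed by the first module)."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:m]
```

The reduced search space handed to the second module is then the set of all subsets of `top_m_features(X, y, M)`, of size 2^M instead of 2^N for N original features.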
Table 6. Comparison results with the IWSS approach for (a) the C4.5 algorithm, (b) the K-Nearest-Neighbor classifier and (c) the Naive Bayes classifier. (Columns: Dataset, M, RNK-GA RR/NF, IWSS RR/NF; rows: Arcene, Gisette, Madelon. The numeric entries were not preserved in this extraction.)

Second Set of Comparisons

In [2], the proposed approach (IWSS) was tested on three classifiers: Naive Bayes (NB), KNN (K = 1) and C4.5. Moreover, among the others, [2] also tested three of the four datasets reported in Table 3: Arcene, Gisette and Madelon. The comparison results are shown in Table 6. Note that the second column shows the values of the parameter M that obtained the best result among those tested (the same values used for the first set of comparisons). The recognition rate (RR) and the number of selected features (NF) are also reported. Since the IWSS approach is deterministic, the results reported in [2] refer to a single feature subset and were obtained using the 10-fold cross-validation technique. For this reason, in order to statistically validate the comparison we performed the one-sample Wilcoxon test (α = 0.05), comparing the single result of IWSS with the results of RNK-GA over the 30 runs. The values in bold highlight the best result according to the Wilcoxon test. From the table it can be seen that, for the Arcene dataset (10000 features), the proposed system greatly outperforms the IWSS approach on the three classifiers considered. As concerns the Gisette dataset (5000 features), for the C4.5 and KNN classifiers the performance of our approach is better than that of IWSS. Finally, as for the Madelon dataset (500 features), IWSS achieves better performance for the C4.5 and KNN classifiers, whereas for the NB classifier our system performs slightly better. The above results confirm the effectiveness of the proposed approach in dealing with high dimensional data. In fact, on the Arcene dataset our system largely
outperforms the IWSS approach, in spite of the fact that IWSS uses a wrapper evaluation function. This is confirmed by the results on the Gisette dataset, where our system obtained better performances on two of the three classifiers. In this case, the differences in terms of recognition rate are much smaller than those observed on the Arcene dataset. However, these results are similar to those reported in Table 4, where the recognition rate differences for the MLP classifier are not greater than 1%. Only for the NB classifier does IWSS significantly outperform our approach; nonetheless, this performance is worse than that obtained by our approach with the MLP. Finally, as mentioned above, the Madelon dataset is characterized by a small set of discriminative features that can be easily identified by univariate measures. In this case, therefore, the IWSS approach is favored, because it first ranks the features and then incrementally builds the feature subset by using a greedy strategy.

4.3 Discussion

From the results shown above it can be seen that the univariate measure used in the first module is able to identify most of the relevant features, even though it evaluates the relevance of each feature without taking into account any feature interaction. In practice, interacting features, which are individually of little relevance but become useful when combined with other features, do not end up at the bottom of the ranking. Thus, such features can always be included by suitably incrementing the value of M. It is worth noting that, because of the exponential growth of the search space, even high values of M allow a strong reduction of the search space. An interesting property of our system is that, once the value of M has been correctly set, the GA of the second module is able to discard redundant features.
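The redundancy-discarding behaviour of the GA module stems from its correlation-based subset evaluation, in the spirit of Hall's CFS [9]: a subset's merit grows with the average feature-class correlation and shrinks with the average feature-feature correlation. A minimal sketch, assuming Pearson correlation on numerically encoded data (the paper's actual correlation measure and implementation details may differ):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, X, y):
    """CFS-style merit k*rcf / sqrt(k + k(k-1)*rff): rcf is the mean absolute
    feature-class correlation, rff the mean absolute pairwise feature-feature
    correlation over the k features in `subset` (column indices of X)."""
    k = len(subset)
    cols = [[row[j] for row in X] for j in subset]
    rcf = sum(abs(pearson(c, y)) for c in cols) / k
    if k == 1:
        rff = 0.0
    else:
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        rff = sum(abs(pearson(cols[i], cols[j])) for i, j in pairs) / len(pairs)
    return k * rcf / math.sqrt(k + k * (k - 1) * rff)
```

With this merit, adding an exact duplicate of an already selected feature does not increase the score, so the GA has no incentive to keep redundant features in an individual.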
In fact, our approach selected far fewer features than those given as input, and fewer than those selected by the GA used for comparison. We remark that the above results confirm the assumptions underlying our method: (i) a feature ranking algorithm can be used to preselect an a priori fixed number of features from the whole set of available features; (ii) the search space consisting of the subsets made of these preselected features contains most of the good and near-optimal solutions (subsets). In practice, the filtering performed by the feature ranking module makes the task of searching for good solutions easier, and this filtering is crucial in improving the performance of the GA when thousands of features are involved.

5 Conclusions

We presented a novel GA-based approach for feature selection which is able to deal with thousands of features. The approach consists of two modules. The first uses a feature-ranking-based approach that reduces the search space made up of the whole set of available features. This reduction is performed by discarding
the features that, according to the univariate measure employed, are less useful for discriminating among the different classes at hand. The second module uses a genetic algorithm to search the solution space provided by the first module. This module employs a correlation-based heuristic function to evaluate the worth of the feature subsets encoded by the individuals. Since we used only filter evaluation functions, the proposed approach exhibits the following interesting properties: (i) it is independent of the classification system used; (ii) once the correlation data have been computed in the initialization step of the GA, the computational cost of the fitness function does not depend on the training set size. The second property makes our system particularly suitable for problems involving a huge number of instances. The effectiveness of the proposed system has been tested on data represented in high dimensional spaces. The achieved results have been compared with those obtained by different feature selection strategies, both wrapper and filter. For the datasets containing thousands of features, our method obtains better results than the other methods, both in terms of accuracy and number of selected features. Future work will investigate different feature evaluation functions, both for the ranking (univariate) and for the GA (multivariate). Moreover, system performance will also be evaluated with different classification schemes.

References

1. NIPS 2003 workshop on feature extraction and feature selection challenge (2003)
2. Bermejo, P., Gámez, J.A., Puerta, J.M.: Improving incremental wrapper-based subset selection via replacement and early stopping. IJPRAI 25(5) (2011)
3. Cordella, L.P., De Stefano, C., Fontanella, F., Marrocco, C., Scotto di Freca, A.: Combining single class features for improving performance of a two stage classifier. In: 20th International Conference on Pattern Recognition (ICPR 2010). IEEE Computer Society (2010)
4.
Dash, M., Liu, H.: Feature selection for classification. Intelligent Data Analysis 1(1-4) (1997)
5. De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. Springer Berlin Heidelberg, Berlin, Heidelberg (2008)
6. De Stefano, C., Fontanella, F., Maniaci, M., Scotto di Freca, A.: A method for scribe distinction in medieval manuscripts using page layout features. In: Maino, G., Foresti, G. (eds.) Image Analysis and Processing - ICIAP 2011, Lecture Notes in Computer Science, vol. 6978. Springer Berlin Heidelberg (2011)
7. Gütlein, M., Frank, E., Hall, M., Karwath, A.: Large scale attribute selection using wrappers. In: Proc. of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009) (2009)
8. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3 (2003)
9. Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2000)
10. Huang, J., Cai, Y., Xu, X.: A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28(13) (Oct 2007)
11. Lanzi, P.: Fast feature selection with genetic algorithms: a filter approach. In: IEEE International Conference on Evolutionary Computation (Apr 1997)
12. Lee, J.S., Oh, I.S., Moon, B.R.: Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26(11) (2004)
13. Li, R., Lu, J., Zhang, Y., Zhao, T.: Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowledge-Based Systems 23(3) (2010)
14. Lichman, M.: UCI machine learning repository (2013)
15. Liu, H., Setiono, R.: Chi2: Feature selection and discretization of numeric attributes. In: ICTAI. IEEE Computer Society, Washington, DC, USA (1995)
16. Manimala, K., Selvi, K., Ahila, R.: Hybrid soft computing techniques for feature selection and parameter optimization in power quality data mining. Applied Soft Computing 11(8) (2011)
17. Ochoa, G.: Error thresholds in genetic algorithms. Evolutionary Computation 14(2) (2006)
18. Oreski, S., Oreski, G.: Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Systems with Applications 41(4, Part 2) (2014)
19. Spolaor, N., Lorena, A., Lee, H.: Multi-objective genetic algorithm evaluation in feature selection. In: Takahashi, R., Deb, K., Wanner, E., Greco, S. (eds.) Evolutionary Multi-Criterion Optimization, Lecture Notes in Computer Science, vol. 6576. Springer Berlin Heidelberg (2011)
20. Tan, F., Fu, X., Zhang, Y., Bourgeois, A.G.: A genetic algorithm-based method for feature subset selection. Soft Computing 12(2) (Sep 2007)
21. Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation 20(4) (Aug 2016)
22.
Yusta, S.C.: Different metaheuristic strategies to solve the feature selection problem. Pattern Recognition Letters 30(5) (2009)
23. Zhai, Y., Ong, Y.S., Tsang, I.: The emerging big dimensionality. IEEE Computational Intelligence Magazine 9(3) (Aug 2014)
Optimization of Association Rule Mining through Genetic Algorithm RUPALI HALDULAKAR School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal, Madhya Pradesh India Prof. JITENDRA
More informationREMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationImproved PSO for Feature Selection on High-Dimensional Datasets
Improved PSO for Feature Selection on High-Dimensional Datasets Binh Tran, Bing Xue, and Mengjie Zhang Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand {binh.tran,bing.xue,mengjie.zhang}@ecs.vuw.ac.nz
More informationSelf-Organizing Maps for cyclic and unbounded graphs
Self-Organizing Maps for cyclic and unbounded graphs M. Hagenbuchner 1, A. Sperduti 2, A.C. Tsoi 3 1- University of Wollongong, Wollongong, Australia. 2- University of Padova, Padova, Italy. 3- Hong Kong
More informationImproving Tree-Based Classification Rules Using a Particle Swarm Optimization
Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Chi-Hyuck Jun *, Yun-Ju Cho, and Hyeseon Lee Department of Industrial and Management Engineering Pohang University of Science
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationCluster homogeneity as a semi-supervised principle for feature selection using mutual information
Cluster homogeneity as a semi-supervised principle for feature selection using mutual information Frederico Coelho 1 and Antonio Padua Braga 1 andmichelverleysen 2 1- Universidade Federal de Minas Gerais
More informationFEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION
FEATURE EXTRACTION TECHNIQUES USING SUPPORT VECTOR MACHINES IN DISEASE PREDICTION Sandeep Kaur 1, Dr. Sheetal Kalra 2 1,2 Computer Science Department, Guru Nanak Dev University RC, Jalandhar(India) ABSTRACT
More informationAn Empirical Study on feature selection for Data Classification
An Empirical Study on feature selection for Data Classification S.Rajarajeswari 1, K.Somasundaram 2 Department of Computer Science, M.S.Ramaiah Institute of Technology, Bangalore, India 1 Department of
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationPrognosis of Lung Cancer Using Data Mining Techniques
Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,
More informationFeature Selection for Supervised Classification: A Kolmogorov- Smirnov Class Correlation-Based Filter
Feature Selection for Supervised Classification: A Kolmogorov- Smirnov Class Correlation-Based Filter Marcin Blachnik 1), Włodzisław Duch 2), Adam Kachel 1), Jacek Biesiada 1,3) 1) Silesian University
More informationVariable Selection 6.783, Biomedical Decision Support
6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based
More informationRank Measures for Ordering
Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many
More informationFeatures: representation, normalization, selection. Chapter e-9
Features: representation, normalization, selection Chapter e-9 1 Features Distinguish between instances (e.g. an image that you need to classify), and the features you create for an instance. Features
More information