Feature Selection for Classification of Remote Sensed Hyperspectral Images: A Filter Approach Using Genetic Algorithm and Cluster Validity

A. B. Santos¹, C. S. F. de S. Celes¹, A. de A. Araújo¹, and D. Menotti²
¹ Computer Science Department, UFMG - Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
² Computing Department, UFOP - Federal University of Ouro Preto, Ouro Preto, MG, Brazil

Abstract

In this paper, we investigate the advantages of using feature selection approaches for the classification of remote sensed hyperspectral images. We propose a new filter feature selection approach based on genetic algorithms (GA) and cluster validity measures for finding the best subset of features, i.e., the one that maximizes inter-cluster distances and minimizes intra-cluster distances. Using such an optimal, or sub-optimal, subset of features, classifiers can build accurate decision boundaries. Dunn's index is used to estimate, given a subset of features, how good the built clusters are. Experiments were carried out with two well-known datasets: AVIRIS - Indian Pines and ROSIS - Pavia University. Three different classifiers were used to evaluate our proposal: Support Vector Machines (SVM), Multi-Layer Perceptron neural networks (MLP), and K-Nearest Neighbors (KNN). Moreover, we compare the performance of our proposal, in terms of accuracy, to that of other approaches: the traditional Pixelwise one, without feature selection/extraction, and the widely used Singular Value Decomposition Band Subset Selection (SVDSS). Experiments show that classification methods using our feature selection approach produce a small subset of features that easily achieves enough discriminative power, with results similar to the ones obtained using SVDSS.

Keywords: filter feature selection, hyperspectral image, pattern classification, genetic algorithm, cluster validity

1. Introduction

One of the many tasks in remote sensing is land cover classification, which is concerned with the identification of areas with vegetation, hydrography, artificial cover (plantations, urban areas, reforestation areas, etc.), and all the other coverages of the earth's surface [1], [2]. Hyperspectral images carry information about materials on the earth's surface expressed in hundreds of bands/wavelengths [1]. These data allow us to identify and discriminate materials with more accuracy [1], [2]. With such a data representation, classifiers can improve their performance in terms of accuracy and precision. For instance, classification methods using Support Vector Machines (SVM) have recently shown greater accuracy when dealing with hyperspectral data than methods using Maximum Likelihood (ML), K-Nearest Neighbors (KNN), and other classifiers [3], [4], [2].

Although the high dimensionality of hyperspectral images provides great discriminative power, their classification is still a challenging task due to the large amount of spectral information and the small set of referenced data [1], [2], [5], [6]. This is also known as the Hughes phenomenon [7], or the curse of dimensionality. Another constraint mentioned in the literature for data in high-dimensional spaces is density estimation [2], which is more difficult to compute than in a lower dimensional space, since the space is quite sparse [2]. In order to surmount such difficulties, some approaches apply feature extraction/selection/representation techniques [3], [8], [9].
Thus, feature dimension reduction approaches are still required in order to improve the generalization power of classifiers and reduce their overhead [3]. In [8], a wrapper approach using genetic algorithms (GA) and an SVM classifier tackles this issue. Wrapper approaches, however, have high computational costs [10]. For this reason, in this paper we propose a filter approach for feature selection. The search for a smaller subset of features is based on a GA as well, where cluster analysis measures are evolved, trying to achieve a minimal number of features without loss of discriminative power. Experiments were carried out using two well-known datasets, Indian Pines and Pavia University, obtained by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [11] and the Reflective Optics System Imaging Spectrometer (ROSIS) [12] sensors, respectively. Three different classification algorithms (i.e., KNN, Multi-Layer Perceptron neural networks (MLP), and SVM) were used to compare the accuracies obtained by our approach to the Pixelwise one, which does not use feature selection, and to SVDSS, which is widely used for feature selection in the remote sensing community [13], [14], [15].

The remainder of this paper is organized as follows. Section 2 describes the classification process and presents some well-known algorithms for this task. Section 3 introduces the problem of feature selection and the SVDSS approach. In Section 4, our proposed approach for feature selection is presented. Finally, the experiments and conclusions are presented in Sections 5 and 6, respectively.

2. Classification Algorithms

First, let us mathematically define the problem of classification of hyperspectral images. Let δ = {1, ..., n} be an integer set indexing the n pixels of a hyperspectral image. Let ψ = {1, ..., K_c} be the set of K_c available classes, and let X = (x_1, ..., x_n) ∈ R^(d×n) be the feature vectors of the pixels in a d-dimensional space. Finally, let y = (y_1, ..., y_n) ∈ ψ^n represent a labeled image. The classification goal is, for every pixel l ∈ δ, to infer a label y_l ∈ ψ using its feature vector x_l ∈ R^d. The so-called Pixelwise approach uses all d responses/bands of the feature vector, without any spatial information, to assign a label to the pixel. In this traditional approach, any feature extraction/selection technique may be applied. Some classification algorithms are briefly described below.

2.1 K-Nearest Neighbors

The K-Nearest Neighbors (KNN) classification algorithm labels a sample according to its K closest samples. Since samples lie in some feature space, KNN uses a distance measure to define the closeness of one sample to another [16]. For example, let X = (X_1, ..., X_n) be a training set, where each sample X_i is a tuple (x_1, x_2, ..., x_d, c), with c the class to which X_i belongs and x_i, 1 ≤ i ≤ d, its features or attributes. Given a testing set T = (T_1, ..., T_n), where each sample T_i is unlabeled and has the same d features as the training samples, KNN assigns a label to T_i by computing its distance to every sample in the training set X and then choosing the most frequent label among the K closest training samples, as sketched below. Despite its simple operation, this algorithm has some drawbacks:
- high overhead to compute the distance between the sample to be labeled and all samples in the training set;
- low accuracy in high-dimensional spaces;
- the need to set the parameter K.
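As a concrete illustration of this rule, here is a minimal NumPy sketch (ours, not the Matlab implementation used later in the paper); the function name and arguments are hypothetical:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, K=8):
    """Label x_test by majority vote among its K nearest training samples.

    X_train: (n, d) array of training feature vectors
    y_train: (n,) array of integer class labels
    x_test:  (d,) feature vector of the sample to label
    """
    # Euclidean distance from the test sample to every training sample
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the K closest training samples
    nearest = np.argsort(dists)[:K]
    # Most frequent label among the K neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Note that the full scan of the training set in the first step is exactly the overhead listed as the first drawback above.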
2.2 Multi-Layer Perceptron Neural Network

An Artificial Neural Network (ANN) of the Multi-Layer Perceptron type is an extension of the common Perceptron. The MLP is composed of a set of input units, or neurons, that form the input layer, at least one hidden layer, and an output layer [16]. In pattern classification, an MLP separates the feature space using hyperplanes, by means of a supervised learning process. Regions of the feature space are thus associated with classes, and a new sample can be labeled according to the region in which it falls. As MLPs can have several layers, they are able to perform multiple separations of the feature space; hence, an MLP can build arbitrary shapes in feature space that represent different and complex classes [16]. The construction of an MLP involves some issues, such as the number of hidden layers and the number of neurons in each layer, which should be set according to the problem.

2.3 Support Vector Machines

The Support Vector Machines (SVM) methodology is based on class separation through margins, in which samples are mapped to a feature space where they can be linearly separated [16]. The data is transformed to a new feature space, generally larger than the original, by a kernel function. Some popular kernels are the Linear, Polynomial, and Radial Basis Function (RBF) ones. The ability to separate data with nonlinear distributions is related to this function, which should be chosen according to the problem domain [16]. Using an appropriate nonlinear mapping, samples of two classes can be linearly separated by a hyperplane [16], [17] in this new, transformed, and higher-dimensional feature space [16], [4]. Thus, SVM training consists of finding an optimal hyperplane that maximizes the separating distance between the margins of each class [16], [17]. Samples located on the margins are called support vectors and are the most informative ones for building the classification decision boundary [16].
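The paper runs SVM via LIBSVM [23] from Matlab (Section 5). Purely as an illustration, an equivalent RBF-kernel SVM can be set up in Python with scikit-learn, whose SVC class also wraps LIBSVM; the toy data and the C and gamma values below are placeholders, since the paper only states that its parameters were adjusted manually:

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for hyperspectral training/testing pixels (n samples, d bands)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 50))
y_train = rng.integers(0, 3, size=100)
X_test = rng.normal(size=(10, 50))

# RBF-kernel SVM; C and gamma would normally be tuned for the dataset
clf = SVC(kernel="rbf", C=100.0, gamma=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```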

3. Feature Selection

There are many reasons to reduce the dimensionality of data. As said previously, one of them is the Hughes effect. Another obvious reason is to reduce the computational complexity [17]: for example, more features imply more synaptic weights to be optimized in a neural network [17]. Feature selection is a preprocessing step that aims to minimize the number of features while keeping as much discriminant information as possible [17].

The selection of B optimal bands (features) is a combinatorial optimization problem in a very large search space: to select B bands from a set of N, the total number of possible combinations is

    C(N, B) = N! / (B! (N - B)!).

For instance, selecting B = 20 bands out of N = 220 (the band count commonly quoted for AVIRIS Indian Pines) already yields on the order of 10^28 candidate subsets. For each combination of selected features, a separability criterion should be used to find the best subset. The computational load required to test all possible combinations is intractable; thus, a GA is a suitable tool to lead a search that optimizes a certain separability criterion.

In general, selection processes are divided into two different approaches: filter and wrapper [18]. The wrapper approach leads a search using information provided by a learning algorithm (classifier) [18]: at each step of the selection process, the chosen subset is evaluated by the learning algorithm, and the best subset is the one with the best evaluation [18]. Filter approaches, in contrast, are completely independent of information from the classifier, because feature selection is performed as a preprocessing step, before the classification process. When the number of features is high, the filter model is usually chosen due to its computational efficiency [10]. Figure 1 illustrates the wrapper approach and Figure 2 the filter model.

[Fig. 1: Wrapper model. Modified from [18].]
[Fig. 2: Filter model. Modified from [18].]

3.1 Singular Value Decomposition Band Subset Selection - SVDSS

The Singular Value Decomposition Band Subset Selection (SVDSS) is a heuristic based on the Singular Value Decomposition (SVD) and rank-revealing QR matrix factorization. The SVD of the hyperspectral image is computed, and then B bands are selected as the first B rows of the rank revealed by the QR factorization [14], [15]. This unsupervised approach selects the most independent bands, looking for a subset of B bands whose total variability approximates that of the first B principal components of a Principal Component Analysis (PCA) [14], [19], [13]. This filter approach has been widely used in the field of remote sensed hyperspectral images [14], [15] and represents a great technique for feature selection. Its advantage over combinatorial optimization approaches is that it can be performed in polynomial time [19]; however, it does not guarantee the optimal subset of features. More details on SVDSS can be seen in [14], [19], [13].
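A minimal sketch of the SVD-plus-pivoted-QR idea behind SVDSS follows, assuming a pixels-by-bands data matrix. This is our reading of [14], [19]; the exact steps of the HIAT implementation used in the experiments may differ:

```python
import numpy as np
from scipy.linalg import qr

def svd_qr_band_subset(X, B):
    """Sketch of SVD + rank-revealing-QR band subset selection.

    X: (n_pixels, n_bands) matrix of hyperspectral samples
    B: number of bands to select
    Returns the indices of B selected bands.
    """
    # Right singular vectors span the dominant directions in band space
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    VB = Vt[:B, :]                 # (B, n_bands): top-B right singular vectors
    # QR with column pivoting picks the B most independent band columns
    _, _, piv = qr(VB, pivoting=True)
    return np.sort(piv[:B])
```

Both factorizations above run in polynomial time in the matrix dimensions, which is the efficiency advantage over combinatorial search noted in the text.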
4. Proposal

Genetic Algorithms (GAs) are the most widespread techniques among evolutionary algorithms. They allow us to find potential solutions to optimization problems in an acceptable time, especially when the search space is very large [20]. A GA is a heuristic based on a population of individuals (e.g., chromosomes), in which each individual encodes a candidate solution for a problem [20] and can be represented as a bit string. Each individual is evaluated by a function called fitness, which establishes the quality of an individual as a solution. In a GA, the population starts in a random way, or by some strategy suited to the problem at hand, and undergoes a predetermined number of evolutions. During this process, the individuals of the population are evolved and reproduced using genetic operations such as crossover, mutation, and the selection process. The main goal is to find the individual with the best fitness [20].

We model the problem of feature selection as follows: each individual has B bands/genes, as shown in Figure 3, and each gene represents the presence (bit 1) or absence (bit 0) of a band; the subset of selected bands (feature subset) is then composed of the present bands.

[Fig. 3: Chromosome representation and mapping of the hyperspectral bands on the hypercube. Adapted from [6].]

As previously stated, the quality of each candidate solution is evaluated according to a fitness function. The fitness function evaluates the selected features (the set of bands whose bits are 1 in the individual's chromosome) according to a metric of interest. Here, we are interested in a metric that captures both the homogeneity between samples in the same partition (class) and the dissimilarity between samples in different partitions. For this task, a cluster validity metric is used. One of the most popular metrics for cluster validity is Dunn's index [21]: the higher the index, the denser and more distant from each other the clusters are. Let U = {C_1, ..., C_c} be a partition system composed from a given subset of features. Dunn's index can then be calculated as

    v_D(U) = min_{1 ≤ i ≤ c} min_{1 ≤ j ≤ c, j ≠ i} { δ(C_i, C_j) / max_{1 ≤ k ≤ c} Φ(C_k) },

where Φ(C_k) is the diameter of cluster C_k and c is the number of clusters. In our work, we adopted δ(C_i, C_j) as the distance between the centroids of clusters C_i and C_j; however, any kind of inter-cluster distance measurement can be applied [21]. The similarity, or distance, metric used is the Euclidean distance, i.e.,

    δ(x, y) = sqrt((x_1 - y_1)^2 + ... + (x_n - y_n)^2).

Thereby, the fitness function evaluates a partition system composed from a subset of features. The aim is to minimize the distances between samples belonging to the same cluster (intra-cluster) and to maximize the distances between samples of different clusters (inter-cluster). Thus, we expect to build clusters, with a subset of features, that maximize Dunn's index, such that classifiers can build decision boundaries in an accurate way.
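To make the fitness concrete, here is a minimal NumPy sketch (our illustration, not the authors' code) of a centroid-based Dunn's index restricted to the bands selected by a chromosome. Following the text, the inter-cluster distance δ is the centroid-to-centroid distance; the diameter Φ is taken here as the maximum pairwise intra-cluster distance, one common choice [21]:

```python
import numpy as np
from itertools import combinations

def dunn_fitness(X, y, mask):
    """Centroid-based Dunn's index over the bands kept by a chromosome.

    X: (n_samples, n_bands) labeled training pixels
    y: (n_samples,) class labels
    mask: boolean array of length n_bands (bit 1 = band selected)
    """
    Xs = X[:, mask]                                    # keep selected bands only
    clusters = [Xs[y == c] for c in np.unique(y)]
    centroids = [c.mean(axis=0) for c in clusters]

    def diameter(c):                                   # Phi: max intra-cluster distance
        if len(c) < 2:
            return 0.0
        return max(np.linalg.norm(p - q) for p, q in combinations(c, 2))

    max_diam = max(diameter(c) for c in clusters)
    if max_diam == 0.0:                                # degenerate chromosome
        return 0.0
    min_sep = min(np.linalg.norm(u - v)                # delta: centroid distance
                  for u, v in combinations(centroids, 2))
    return min_sep / max_diam                          # higher = better clusters
```

Under the same assumptions, the GA search itself can be sketched with the operator rates reported later in Table 1 (tournament of size 2, one-point crossover at 80%, bit-flip mutation at 0.9%, elitism of 2); the function and variable names are ours:

```python
def tournament(pop, fit, k):
    """Pick the fittest of k randomly drawn individuals."""
    idx = rng.integers(0, len(pop), size=k)
    return pop[idx[np.argmax(fit[idx])]].copy()

rng = np.random.default_rng(42)

def ga_select_bands(X, y, n_bands, pop_size=100, n_ages=500,
                    p_cross=0.8, p_mut=0.009, n_elite=2, k_tour=2):
    """Evolve bit-string chromosomes to maximize dunn_fitness (sketch)."""
    pop = rng.random((pop_size, n_bands)) < 0.5        # random initial population
    for _ in range(n_ages):                            # "ages" = generations
        fit = np.array([dunn_fitness(X, y, ind) for ind in pop])
        order = np.argsort(fit)[::-1]
        new_pop = [pop[i].copy() for i in order[:n_elite]]   # elitism
        while len(new_pop) < pop_size:
            a = tournament(pop, fit, k_tour)
            b = tournament(pop, fit, k_tour)
            if rng.random() < p_cross:                 # one-point crossover
                cut = rng.integers(1, n_bands)
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for child in (a, b):
                flips = rng.random(n_bands) < p_mut    # bit-flip mutation
                child[flips] = ~child[flips]
                new_pop.append(child)
        pop = np.array(new_pop[:pop_size])
    fit = np.array([dunn_fitness(X, y, ind) for ind in pop])
    return pop[np.argmax(fit)]                         # best band mask found
```

Each generation evaluates the fitness of every individual on the labeled samples, which is why the number of samples per class directly drives the running time, as discussed in Section 5.1.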

5. Experiments

In this section, we describe the experiments carried out on two well-known datasets to validate our proposal and compare it with other feature selection/representation approaches (SVDSS and Pixelwise) using three classification algorithms (MLP, KNN, and SVM). After finding the most suitable features using the proposed scheme, in order to perform a fair comparison, we use this same number of features as the parameter of the SVDSS approach. We use the SVDSS implementation from the Hyperspectral Image Analysis Toolbox (HIAT) for Matlab [22]. The MLP and KNN classifiers are native Matlab implementations, and the SVM is run from Matlab using the LIBSVM [23] implementation. First, we present details of the GA and classifier setups used in our proposal.

5.1 GA and Classifiers Setups

An individual's chromosome, i.e., the features present in the individual, is initialized in a random way, and the parameters were set according to the results of preliminary experiments. Table 1 presents all parameters used in the GA; notice that they are the same for both datasets. The samples are randomly chosen; however, the total number of samples has an important impact on the performance: the higher the number of samples, the higher the time to calculate the fitness of each individual. In order to ensure high reliability of the results, ten runs of the GA were performed for each dataset. Then, we selected the features that appeared most frequently. Note that we do not define the number of features to be found; we expect that, at the end of all runs, the numbers of features found at each run are close.

Table 1: GA parameters for the Indian Pines and Pavia University datasets.
    Population (num. of individuals)   100
    Number of ages (generations)       500
    Crossover probability              80%
    Mutation probability               0.9%
    K-tournament                       2
    K-elitism                          2
    Samples per class                  50

Three different classification algorithms were used: Support Vector Machines (SVM), Multi-Layer Perceptron neural networks (MLP), and K-Nearest Neighbors (KNN). The kernel used in the SVM is the Radial Basis Function (RBF), and its parameters were manually adjusted. The MLP configuration was: a single hidden layer whose number of neurons equals the square root of the number of input patterns times the number of output patterns, sigmoidal transfer functions, and the backpropagation training algorithm. In KNN, the parameter K was set to 8.

5.2 Selected Features by the GA Scheme

It is well known that each channel of a hyperspectral image contains a response/amplitude at a specific wavelength. Thus, since the band indexes of these images are correlated to their wavelengths, finding some closeness between responses with near indexes was expected. It is interesting to note that our method always finds subsets of features in the same tracks; that is, for each GA execution new subsets are generated, but with indexes that are equal (same features) or very close (same tracks). Since we decided to let the GA find the optimal number of features, it is also important to note that each run produced almost the same number of features. Figure 5 shows the average number of features of the best individuals of each age, over 10 runs. This indicates that there are tracks of spectral bands that are more discriminative than others, according to Dunn's index. Thereby, features that repeat in at least 50% of all 10 runs were selected to compose the final subset of features, as sketched below.

[Fig. 5: Average number of features of the best individuals of each age. Pavia University and Indian Pines datasets in blue and green, respectively.]
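For illustration, the ≥50% voting rule can be written as a small helper (the names are ours), assuming the best chromosome of each of the 10 runs has been collected:

```python
import numpy as np

def vote_features(masks, min_fraction=0.5):
    """Keep the bands selected in at least min_fraction of the GA runs.

    masks: (n_runs, n_bands) boolean array, the best chromosome of each run.
    """
    freq = np.asarray(masks, dtype=float).mean(axis=0)    # per-band frequency
    return np.flatnonzero(freq >= min_fraction)           # retained band indexes
```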
5.3 Classification Results

As can be seen in Table 2, the results of our proposal were very close to those of the widely used SVDSS for all classifiers, but no improvement in terms of OA and AA over the Pixelwise approach was obtained by either SVDSS or our proposal.
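The paper does not define OA and AA; we read them as the standard overall accuracy (fraction of correctly labeled test samples) and average accuracy (mean of the per-class accuracies), which the sketch below computes from a confusion matrix:

```python
import numpy as np

def oa_aa(conf):
    """Overall and average accuracy (in %) from a confusion matrix.

    conf[i, j] = number of samples of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    oa = np.trace(conf) / conf.sum()                 # fraction correct overall
    per_class = np.diag(conf) / conf.sum(axis=1)     # accuracy of each class
    return 100 * oa, 100 * per_class.mean()
```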

[Fig. 4: Results for the Indian Pines dataset. (a) Ground-truth. (b) Classification map, Proposal+SVM.]

Table 2: Results for the Indian Pines dataset (accuracies in %; Pw = Pixelwise, SVD = SVDSS, Prop = Proposal).

                          Number of  Train        KNN                 MLP                 SVM
Class                     samples    samples (%)  Pw    SVD   Prop    Pw    SVD   Prop    Pw    SVD   Prop
OA (%)                                            66.8  65.2  65.4    79.9  79.5  74.9    81.6  77.9  77.1
AA (%)                                            67.3  66.8  65.6    82.3  80.8  78.7    81.8  77.8  78.1
Alfalfa                   54         50           88.9  88.9  87      98.1  88.9  94.4    94.4  92.6  90.7
Corn-notill               1434       5            51.5  49.7  47      74.4  72    68.3    78    71.8  63.3
Corn-min                  834        5            43.8  36.9  42.5    68.6  62.2  60.7    61.5  57    53.2
Corn                      234        5            31.6  32.5  28.6    52.6  24.8  40.6    26    18.3  23.5
Grass/pasture             497        5            64.6  63.4  59.6    64.8  63.6  78.5    88.5  87.9  83.1
Grass/trees               747        5            93    93    86.7    95.5  96.9  93      92.6  92.6  95.2
Grass/pasture-mowed       26         50           92    92.3  88.5    100   100   100     96.1  96.2  88.5
Hay-windrowed             489        5            86.3  87.1  84.7    90.4  93.8  88.8    92.8  92.8  93.5
Oats                      20         50           65    70    65      100   100   100     100   90    100
Soybeans-notill           968        5            72.8  68.8  66.7    68.1  78.8  61.9    75.9  62.4  60.5
Soybeans-min              2468       5            72.9  71.7  75.3    83.7  82.7  75.9    86.3  86.4  84.9
Soybean-clean             614        5            30.6  28.8  30      83.9  77.8  64.7    80.9  68.1  83
Wheat                     212        5            95.2  94.8  96.7    98    100   93.7    93.4  90.6  93.4
Woods                     1294       5            94.1  94.1  96      95.9  96.4  96.4    97.3  97.3  98.5
Bldg-grass-trees-drives   380        5            6.3   5.8   5.2     44.5  54.4  43.3    44.7  40    39.5
Stone-steel towers        95         50           88.4  91.6  89.5    99    100   98.9    100   100   98.9

However, the results are still close to the Pixelwise ones, with the advantage of reducing the computational overhead of the classification process, since the feature space was reduced. Figure 4b shows the classification map for the Indian Pines dataset obtained using the subset of features found by our method and the SVM classifier, while Figure 4a shows the respective ground-truth.

In order to achieve consistency of results, a procedure similar to the one applied to the Indian Pines dataset was applied to the Pavia University dataset. Again, features that repeat in at least 50% of all executions were selected to compose the final subset of features. At the end of 500 ages, over 10 runs, the GA always found a mean of 38 out of 103 features. The blue curve in Figure 5 shows the average number of features of the best individuals of each age, over 10 runs.

[Fig. 6: Results for the Pavia University dataset. (a) Ground-truth. (b) Classification map, Proposal+SVM.]

Table 3: Results for the Pavia University dataset (accuracies in %; Pw = Pixelwise, SVD = SVDSS, Prop = Proposal).

              Number of  Train        KNN                 MLP                 SVM
Class         samples    samples (%)  Pw    SVD   Prop    Pw    SVD   Prop    Pw    SVD   Prop
OA (%)                                85.3  84.4  84.4    92.3  88.9  91.7    93.4  92.6  91.1
AA (%)                                81.1  79.6  80.2    89.3  76.5  88.5    90.4  88.9  88.4
Asphalt       6631       5            87.3  86.6  86.5    93.0  90.9  92.2    94.2  93.2  93.4
Meadow        18649      5            98.4  98.3  98.0    97.5  97.2  97.0    97.9  97.9  97.5
Gravel        2099       5            59.7  53.4  58.1    74.4  70.3  71.2    74    73.9  72.3
Trees         3064       5            77.7  76.1  73.8    86.7  85.2  91.0    90.2  86.8  86.8
Metal Sheets  1345       5            98.7  98.8  98.6    99.3  97.6  98.7    99    97.2  98
Bare Soil     5029       5            46.9  44.5  45.1    88.0  87.2  87.9    89.7  89.6  88.1
Bitumen       1330       5            78.8  77.1  81.4    79.5  76.2  75.6    79.8  75    75
Bricks        3682       5            82.3  82.5  80.5    85.8  83.8  83.0    88.8  87.5  85.5
Shadow        947        5            99.7  99.2  99.7    99.9  0.0   99.6    99.7  99    99.3

Table 3 shows that, for the KNN and MLP classifiers, our approach achieved greater or similar results in terms of OA compared to SVDSS. However, with more robust classifiers, such as SVM, the Pixelwise approach holds the highest OA values; nevertheless, the SVM performance with SVDSS and with our approach is still close to it. Figure 6b shows the classification map obtained using the subset of features found by our method and the SVM classifier, while Figure 6a shows the respective ground-truth.

A very relevant constraint of Dunn's index is its sensitivity to a few outliers [21]. Due to that, it is possible that this cluster validity measure sometimes provides an inappropriate fitness for our purpose. In order to overcome this problem, some generalizations of Dunn's index have been proposed [21] and may provide a better estimate for our problem.

6. Conclusions

Due to the high dimensionality of remote sensed hyperspectral images, their rich content, and some drawbacks for data discrimination, we have investigated the benefits of using feature selection approaches for the problem of classifying this data type. A new filter feature selection approach was proposed. The main idea of this proposal is that smaller subsets of features that generate clusters with high values of Dunn's index may provide enough discriminant information for the classification task, so that from them it is easier to build decision boundaries with good generalization power. We used the advantages of genetic algorithms to lead a search for better clusters built from a subset of features. The use of genetic algorithms and a cluster validity measure (Dunn's index) has proven suitable for the problem of feature selection. However, it is noticeable that Dunn's index may be affected by a few outliers, and for this reason other cluster validity measures should be explored. Nevertheless, our proposal shows that a smaller number of features with enough discriminative power is easily achievable.

Acknowledgements

The authors would like to thank FAPEMIG, CAPES and CNPq for the financial support.

References

[1] C. Chang, Hyperspectral Data Exploitation: Theory and Applications. Wiley-Blackwell, 2007.
[2] A. Plaza et al., "Recent advances in techniques for hyperspectral image processing," Remote Sensing of Environment, vol. 113, no. 1, pp. 110-122, 2009.
[3] B.-C. Kuo, C.-H. Li, and J.-M. Yang, "Kernel nonparametric weighted feature extraction for hyperspectral image classification," IEEE Trans. on Geoscience and Remote Sensing (TGARS), vol. 47, no. 4, pp. 1139-1155, 2009.
[4] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. on Geoscience and Remote Sensing (TGARS), vol. 43, no. 6, pp. 1351-1362, 2005.
[5] J. Benediktsson, J. Palmason, and J. Sveinsson, "Classification of hyperspectral data from urban areas based on extended morphological profiles," IEEE Trans. on Geoscience and Remote Sensing (TGARS), vol. 43, no. 3, pp. 480-491, 2005.
[6] Y. Tarabalka, "Classification of hyperspectral data using spectral-spatial approaches," Ph.D. dissertation, University of Iceland and Grenoble Institute of Technology, 2010.
[7] G. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Transactions on Information Theory, vol. 14, no. 1, pp. 55-63, 1968.
[8] Y. Bazi and F. Melgani, "Toward an optimal SVM classification system for hyperspectral remote sensing images," IEEE Trans. on Geoscience and Remote Sensing (TGARS), vol. 44, no. 11, pp. 3374-3385, 2006.
[9] S. Serpico and L. Bruzzone, "A new search algorithm for feature selection in hyperspectral remote sensing images," IEEE Trans. on Geoscience and Remote Sensing (TGARS), vol. 39, no. 7, pp. 1360-1367, 2001.
[10] L. Yu and H. Liu, "Feature selection for high-dimensional data: A fast correlation-based filter solution," in International Workshop on Machine Learning, 2003, p. 856.
[11] R. Green et al., "Imaging spectroscopy and the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," Remote Sensing of Environment, vol. 65, no. 3, pp. 227-248, 1998.
[12] P. Gege et al., "System analysis and performance of the new version of the imaging spectrometer ROSIS," in EARSeL Workshop on Imaging Spectroscopy, 1998, pp. 29-35.
[13] M. Velez-Reyes and L. Jimenez, "Subset selection analysis for the reduction of hyperspectral imagery," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), vol. 3, 1998, pp. 1577-1581.
[14] L. Jimenez-Rodriguez, E. Arzuaga-Cruz, and M. Vélez-Reyes, "Unsupervised linear feature-extraction methods and their effects in the classification of high-dimensional data," IEEE Trans. on Geoscience and Remote Sensing (TGARS), vol. 45, no. 2, pp. 469-483, 2007.
[15] G. Bilgin, S. Ertürk, and T. Yıldırım, "Segmentation of hyperspectral images via subtractive clustering and cluster validation using one-class support vector machines," IEEE Trans. on Geoscience and Remote Sensing (TGARS), no. 99, pp. 1-9, 2011.
[16] R. Duda, P. Hart, and D. Stork, Pattern Classification and Scene Analysis, 2nd ed. John Wiley & Sons, 1995.
[17] C. Bishop, Pattern Recognition and Machine Learning. Springer New York, 2006, vol. 4.
[18] R. Kohavi and G. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.
[19] M. Velez-Reyes, L. Jimenez, D. Linares, and H. Velazquez, "Comparison of matrix factorization algorithms for band selection in hyperspectral imagery," in SPIE Proceedings Series, 2000, pp. 288-297.
[20] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[21] J. Bezdek and N. Pal, "Cluster validation with generalized Dunn's indices," in International Two-Stream Conference on Artificial Neural Networks and Expert Systems, 1995, pp. 190-193.
[22] E. Arzuaga-Cruz, L. Jimenez-Rodriguez, M. Velez-Reyes, D. Kaeli, E. Rodriguez-Diaz, H. Velazquez-Santana, A. Castrodad-Carrau, L. Santos-Campis, and C. Santiago, "A MATLAB toolbox for hyperspectral image analysis," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), vol. 7, 2004, pp. 4839-4842.
[23] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.