Spatial Coherence-Based Batch-Mode Active Learning for Remote Sensing Image Classification

Qian Shi, Bo Du, Member, IEEE, and Liangpei Zhang, Senior Member, IEEE

Abstract: Batch-mode active learning (AL) approaches are dedicated to training sample set selection for classification, regression, and retrieval problems, where a batch of unlabeled samples is queried at each iteration by considering both the uncertainty and diversity criteria. However, for remote sensing applications, the conventional methods do not consider the spatial coherence between the training samples, which leads to unnecessary labeling cost. Based on these two points, this paper proposes a spatial coherence-based batch-mode AL method. First, mean shift clustering is used for the diversity criterion, so that the number of new queries can vary between iterations. Second, the spatial coherence is represented by a two-level segmentation map, which is used to automatically label part of the new queries. To obtain a stable and correct second-level segmentation map, a new merging strategy is proposed for the mean shift segmentation. Experimental results with two real remote sensing image data sets confirm the effectiveness of the proposed techniques compared with other state-of-the-art methods.

Index Terms: Active learning, hyperspectral images, image classification.

Manuscript received February 22, 2013; revised August 10, 2014 and November 9, 2014; accepted February 9, 2015; date of publication February 24, 2015; date of current version March 31, 2015. This work was supported in part by the National Basic Research Program (973 Program) of China and in part by the National Natural Science Foundation of China. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Edwin A. Marengo. (Corresponding author: B. Du.) Q. Shi and L. Zhang are with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China (e-mail: shiqian@whu.edu.cn; zlp62@whu.edu.cn). B. Du is with the School of Computer Science, Wuhan University, Wuhan, China (e-mail: remoteking@whu.edu.com).

I. INTRODUCTION

In recent years, many different supervised classification methods have been proposed to increase the classification accuracy for remotely sensed images [1]-[4]. For hyperspectral image classification, the geometric properties of different objects may not always be visibly distinguishable, and they usually cannot be recognized with high reliability by a human being. In such cases, a ground survey is necessary for the sample labeling. Thus, under the limitation of labeling costs, the available training samples are often not enough for adequate learning of the classifier. Moreover, in practical applications, manually selecting regions of interest in the scene as the training set is the common approach, and this procedure often introduces much redundancy into the training sample set. As a result, the labeling costs are increased and the performance of the trained classifier is considerably reduced.

In order to reduce the labeling costs and optimize the classification performance, the training set should be as small as possible, to avoid redundancy, and should include the most relevant samples, to improve the performance of the classifier. In this context, active learning (AL) approaches have been introduced into the classification field [5]-[8]; they enrich the information in the training sample set and improve the statistics of the classes. AL approaches repeatedly ask the user to attribute labels to the most informative unlabeled samples, according to a function of their class membership uncertainty, and then add them to the training set [9]-[12]. Through this sampling procedure, the performance of the classification model can be iteratively improved. AL methods have proved to be an effective tool for constructing efficient training sample sets.

The manual labeling involves not only the labeling costs but also the traveling costs between the queried samples in the experimental field. For hyperspectral image classification, the newly selected informative samples that need to be manually labeled in each iteration are geographically distributed over the experimental field, and the surveyor has to traverse every point to be labeled. As a result, each additional iteration means that the surveyor has to redesign the shortest route connecting all the points, and there is a high probability of revisiting regions that have already been visited, which increases the traveling costs. Thus, fewer iterations are preferable in practical applications. On the basis of the aforementioned considerations, a batch-mode active learning method is expected to be more suitable for hyperspectral image classification: a batch of unlabeled samples is queried at each iteration, which increases the speed of the sample selection and reduces the number of iterations [13].

The key issue for batch-mode AL methods is to select a batch of samples with as little redundancy as possible, so that they provide the maximum possible information to the classifier. In detail, two points should be taken into account in the query function adopted for selecting the batch of the most informative samples: 1) the uncertainty; and 2) the diversity of the samples [14]-[16]. The uncertainty criterion reflects how confident the supervised classifier is in classifying the test samples. The diversity criterion is used to reduce the redundancy among the newly queried samples by selecting unlabeled samples that are as distant from one another as possible [13]. Before the batch-mode selection methods were proposed, the conventional AL methods used only the uncertainty criterion to query new samples according to the degree of their uncertainty.

Thus, there has been a large amount of research into the uncertainty criterion, which can be grouped into three main areas: 1) query by committee [17]-[22], where the uncertainty is evaluated by measuring the disagreement of a committee of classifiers; 2) the posterior probability [23], [24], in which the posterior probability is used to measure the uncertainty of the candidates; and 3) the large margin heuristic, where the distance to the margin is calculated to measure the uncertainty, which is suitable for margin-based classifiers such as the support vector machine (SVM) [25], [26].

Recently, many active learning methods have been used in remote sensing, showing great potential to improve the quality of the training sample set [27]-[31]. When active learning algorithms are applied to remote sensing image classification, the labeling costs are associated not only with the number of new samples to be labeled, but also with the geometric distance between the samples. Liu et al. [32], [33] combined the traveling distance costs and the spectral uncertainty in the query function used to select the new queried samples: the spectral uncertainty should be maximized, while the distance traveled to the new point should be minimized. On the basis of this work, Demir et al. [34] further took into account the sample accessibility on the ground, which fits well with real applications. Another way to reduce the labeling costs is to regard segment patches as the samples to be selected for the training sample set [35], which accords much better with the real situation of manual labeling, since each ground object exists in the form of an object (segment patch) in the remote sensing image [36]. The key to this method is that each segment should be homogeneous enough to belong to a single class [35], which means that it can be assigned a single label. However, this method uses a single spectral curve to represent all the pixels in a segment patch, which reduces the spectral variability of each class in the image. In particular, the pixels on a real class boundary cannot be selected by the active learning algorithm, since these pixels are averaged with other samples.

In order to solve this problem, the proposed method still selects pixels, rather than segment patches, as the samples to be labeled in the active learning process. It is assumed that, by labeling one pixel in a segment patch, all the pixels in this segment patch can be automatically marked with this label. Thus, in the following iterations, these samples can be included in the training sample set if they are selected by the uncertainty criterion. In this way, with the same number of queried samples as the method proposed in [35], more pixels can be used to accurately describe the class boundary. The selected pixels marked by the segment patches are referred to as automatically labeled pixels. However, this strategy depends on the homogeneity of each segment patch. This paper therefore proposes a two-level segmentation method based on a mean shift segmentation strategy to obtain more accurate segmentation results. The first level is the over-segmentation map produced by the original mean shift segmentation algorithm with a small scale. However, the automatic labeling capability may be limited if the segment patches are too small.

With the aim of alleviating this problem, a merging step is performed on the over-segmentation map, in which the segment patches are merged into larger objects. A new merging mechanism is further proposed, in which the class distribution information of each segment patch is used to control the merging process; this avoids incorrect merging between different land-cover types when a large scale is used in the second merging process. The intrinsic sensitivity to scale of the mean shift segmentation can therefore be alleviated.

Crawford et al. [37] noted that spatial information can be considered as a metric to minimize the selection of spatially collocated samples. Muñoz-Marí et al. [38] proposed a semisupervised classification with active queries, which builds a hierarchical clustering tree and exploits the most coherent pixels with respect to the available class information. Inspired by this idea, the over-segmentation map is used in our method to avoid selecting samples that are geographically very close to the current samples in the training sample set. Thus, more than one uncertain sample located in the same segment patch indicates redundancy in the spatial domain, and these samples should be reduced to a single sample, according to the diversity criterion. In the proposed method, although the small segments merged on the over-segmentation map may reduce the capability of removing redundancy in the spatial domain, most of the selected samples can be automatically labeled on the second-level map; thus, there is no obvious increase in the labeling costs, while the spectral variability in the sample set is preserved.

This paper proposes a spatial coherence based batch-mode AL framework for remote sensing image classification. The main contributions of this paper are: 1) a novel diversity criterion is proposed to adaptively determine the number of new queries at each iteration, which is expected to reduce the number of iterations in the whole active learning process; 2) a segmentation map is used to incorporate the spatial coherence and reduce the labeling costs: pixels, rather than segment patches, are regarded as the samples selected in the AL process, and the segment patches are utilized to provide the references for automatically labeling the selected pixels; and 3) a new merging strategy is proposed for the mean shift segmentation, to ensure the accuracy of the automatic labeling, which avoids incorrect merging when a large scale is used.

The rest of the paper is organized as follows. Section II describes the framework of batch-mode active learning and proposes a spatial coherence based batch-mode active learning method by defining a segmentation-based diversity criterion. The experimental results and analysis are presented in Section III. Section IV concludes the paper.

II. THE SPATIAL COHERENCE BASED BATCH-MODE ACTIVE LEARNING FRAMEWORK

A. The Framework of the Batch-Mode AL Procedure

We first describe the overall framework of the batch-mode AL process. Let us model it as a sextuplet (G, F, D, S, T, U) [39]. G is a supervised classifier, which is trained by the training set T. F is the uncertainty criterion used to select the unlabeled samples lying close to the decision function from a pool U of unlabeled samples, on the basis of the current classification results [24].

Fig. 1. The batch-mode AL procedure.

Fig. 2. The proposed batch-mode framework.

D is the diversity criterion used to reduce the redundancy among the selected samples. S is a supervisor that can give the true class label to any unlabeled sample in U. As shown in Fig. 1, the AL method carries out an iterative process in which the supervisor S interacts with the system by labeling the most uncertain samples selected by the uncertainty criterion F and the diversity criterion D at each iteration. At the first stage, an initial training set T(0) with a few labeled samples is used for training the classifier G. After initialization, the uncertainty criterion F is used to select a set of samples lying on the decision boundary from the pool U. Then, based on the diversity criterion D, the most representative samples are selected from these samples, and the supervisor S assigns the true class labels. These newly labeled samples are added to T(iter) (where iter refers to the iteration number), and the classifier G is retrained using the updated training set. This closed loop of querying and retraining continues until a stopping criterion is satisfied. Fig. 1 shows a sketch of the batch-mode AL procedure. The proposed method is based on this framework, with several further contributions introduced into it, which are detailed in the following sections.

B. Proposed Batch-Mode AL Framework

The proposed method is based on the batch-mode AL framework. In the uncertainty step, we use two classical criteria [13], [23] to select a batch of samples lying close to the decision function. In the diversity step, we use mean shift clustering to remove the spectral redundancy among these samples, and an over-segmentation map is used to remove the spatial redundancy. Based on the refined segmentation map, the newly selected informative samples are then tested to determine whether they can be labeled automatically: if a sample is located in the same homogeneous patch as samples that have already been assigned a label, it can be labeled for free. The general workflow is shown in Fig. 2, and a sketch of the overall loop is given below. Each of these steps is described in more detail in the following sections.

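To make the (G, F, D, S, T, U) loop concrete, the following Python sketch shows a generic batch-mode AL iteration built around a scikit-learn style classifier. It is a minimal illustration of the framework in Fig. 1, not the authors' implementation: the uncertainty and diversity functions are placeholders that are filled in by the criteria of Sections II-C and II-D, and the oracle array stands in for the supervisor S. All names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def batch_mode_al(X_pool, y_pool_oracle, X_train, y_train,
                  uncertainty_fn, diversity_fn, n_iterations=10):
    """Generic batch-mode active learning loop (sketch).

    uncertainty_fn(clf, X) -> indices of uncertain candidates in X
    diversity_fn(X, idx)   -> subset of idx with redundancy removed
    y_pool_oracle          -> stands in for the human supervisor S
    """
    pool_idx = np.arange(len(X_pool))
    for it in range(n_iterations):
        clf = SVC(kernel="rbf", decision_function_shape="ovr")
        clf.fit(X_train, y_train)                        # retrain G on T(iter)
        cand = uncertainty_fn(clf, X_pool[pool_idx])     # criterion F
        batch = diversity_fn(X_pool[pool_idx], cand)     # criterion D
        new_idx = pool_idx[batch]
        # the supervisor S labels the selected batch
        X_train = np.vstack([X_train, X_pool[new_idx]])
        y_train = np.concatenate([y_train, y_pool_oracle[new_idx]])
        pool_idx = np.setdiff1d(pool_idx, new_idx)       # remove from U
    return clf, X_train, y_train
```
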

Fig. 3. The strategy used to remove the redundancy in the uncertain sample set. The spatial redundancy is removed by choosing one sample from each segment patch; then, based on the distribution of these uncertain samples in the spectral space, mean shift clustering is used to remove the redundancy in the spectral domain.

C. Uncertainty Criterion

The uncertainty criterion aims at constructing a unified measure to evaluate the uncertainty of the samples in the unlabeled sample pool U. Since the most uncertain samples have the lowest probability of being correctly classified by the current classification model, they are the most useful additions to the training set. In this paper, we investigate two popular techniques based on different classifiers: 1) multiclass-level uncertainty (MCLU), which is based on the SVM classifier [13]; and 2) breaking ties (BT), which is based on a classifier that can output the posterior probability [23]. We now briefly summarize the mechanisms of these two common uncertainty criteria.

1) Multiclass-Level Uncertainty (MCLU): The MCLU technique selects the most uncertain samples according to their functional distance to the hyperplanes of the binary SVM classifiers in the OAA (one-against-all) architecture [40], [41]. For a sample $x_i$, the distance values to the $n$ hyperplanes can be represented by $\{f_1(x_i), \ldots, f_n(x_i)\}$. The difference between the first- and second-largest distance values is

$$c_{\text{diff}}(x_i) = f_{\max}^{r_1}(x_i) - f_{\max}^{r_2}(x_i).$$

If $c_{\text{diff}}(x_i)$ is small, the decision for class $r_1^{\max}$ is not reliable, since there is a possible conflict with class $r_2^{\max}$; therefore, the uncertainty of $x_i$ is high.

2) Breaking Ties (BT): The BT criterion focuses on analyzing the boundary regions between two classes, with the goal of obtaining more diverse sample sets [23]. Suppose $p_{ia}$ is the largest and $p_{ib}$ the second-largest class posterior probability for a sample $x_i$, where $a$ and $b$ index the corresponding classes. The decision criterion is

$$d_i = p_{ia} - p_{ib}.$$

Intuitively, if the value of $d_i$ is small, the tie between $p_{ia}$ and $p_{ib}$ is strong, which indicates that the classification confidence is low. We sort the $d_i$ in ascending order and select the $n_{\text{spectral}}$ samples at the front of the ranking as the new candidate samples, denoted $S_{\text{spectral}}$.

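Both criteria reduce to simple score computations on the classifier outputs. The sketch below assumes a fitted scikit-learn one-against-all SVM for MCLU and any probabilistic classifier for BT; smaller scores mean higher uncertainty. The function names and the threshold-based selection are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mclu_scores(svm_ovr, X):
    """MCLU: difference between the two largest OAA decision values.

    svm_ovr is assumed to be a fitted sklearn SVC with
    decision_function_shape='ovr' (one decision value per class).
    """
    f = svm_ovr.decision_function(X)          # shape (n_samples, n_classes)
    f_sorted = np.sort(f, axis=1)
    return f_sorted[:, -1] - f_sorted[:, -2]  # c_diff: small => uncertain

def bt_scores(prob_clf, X):
    """Breaking ties: gap between the two largest posterior probabilities."""
    p = prob_clf.predict_proba(X)             # shape (n_samples, n_classes)
    p_sorted = np.sort(p, axis=1)
    return p_sorted[:, -1] - p_sorted[:, -2]  # d_i: small => uncertain

def select_uncertain(scores, threshold):
    """Keep candidates whose score falls below a threshold
    (e.g. 0.01 for MCLU and 0.03 for BT in the experiments of Section III)."""
    return np.where(scores <= threshold)[0]
```
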
D. Diversity Criterion

In the diversity criterion, an over-segmentation map is used to remove the spatial redundancy among the uncertain samples, and mean shift clustering is then used to remove the spectral redundancy. Each sample corresponds to one segment patch. Let us assume that the uncertain sample set $U_{UC}$ belongs to $m$ segmented patches on the over-segmentation map. We choose one sample from each segment patch, so that the uncertain sample set $U_{UC}$ is reduced to $U_{UC\_spa}$. The mean shift clustering method is then used to remove the spectral redundancy in $U_{UC\_spa}$, yielding $U_{UC\_spa\_spe}$. The process is shown in Fig. 3. In contrast to the classic k-means clustering approach, there are no embedded assumptions on the distribution or the number of clusters; given a fixed bandwidth, the number of clusters is determined automatically. The details of the principles of mean shift clustering and segmentation are presented in Section II-F.

Fig. 4. The automatic labeling strategy. The shaded areas represent segment patches that have already been labeled. Points 2, 3, and 6 can be labeled automatically, while points 1, 4, and 5 must be queried from the user.

E. Automatic Labeling Strategy With the New Merging Strategy

In remote sensing images, adjacent pixels have a high probability of belonging to the same class. Thus, after we obtain a batch of informative samples from the uncertainty and diversity criteria, we test whether these samples can be automatically labeled through the homogeneous segment patches: if a sample is located in the same homogeneous patch as samples that have already been assigned a label, it can be labeled for free. As shown in Fig. 4, the shaded segment areas represent the segment patches that have already been labeled; points 2, 3, and 6 can therefore be labeled automatically, while points 1, 4, and 5 must be queried from the user. A sketch of the diversity reduction and automatic labeling steps is given below.

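The following sketch, assuming a precomputed over-segmentation map (one integer segment id per pixel) and the indices and labels of the already-labeled training pixels, illustrates the diversity reduction of Section II-D (one uncertain pixel per segment, then mean shift clustering in the spectral domain, keeping the most uncertain member of each cluster as in [13]) and the automatic labeling test of Section II-E. The function and variable names are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.cluster import MeanShift

def reduce_redundancy(uncertain_idx, scores, spectra, segment_ids, bandwidth):
    """Diversity criterion: keep one uncertain pixel per segment patch,
    then keep the most uncertain pixel per mean shift cluster."""
    # spatial redundancy: the most uncertain pixel of each segment patch
    spa_idx = []
    for seg in np.unique(segment_ids[uncertain_idx]):
        members = uncertain_idx[segment_ids[uncertain_idx] == seg]
        spa_idx.append(members[np.argmin(scores[members])])
    spa_idx = np.asarray(spa_idx)
    # spectral redundancy: mean shift clustering with a fixed bandwidth
    ms = MeanShift(bandwidth=bandwidth).fit(spectra[spa_idx])
    reps = []
    for c in np.unique(ms.labels_):
        cluster = spa_idx[ms.labels_ == c]
        reps.append(cluster[np.argmin(scores[cluster])])
    return np.asarray(reps)

def split_auto_labeled(query_idx, segment_ids, labeled_idx, labeled_y):
    """Automatic labeling: queries that fall in an already-labeled segment
    inherit that segment's label; the rest must be labeled manually."""
    seg_label = {segment_ids[i]: y for i, y in zip(labeled_idx, labeled_y)}
    auto = {i: seg_label[segment_ids[i]]
            for i in query_idx if segment_ids[i] in seg_label}
    manual = [i for i in query_idx if segment_ids[i] not in seg_label]
    return auto, manual
```
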

Fig. 5. The principle of mean shift analysis.

Fig. 6. The new merging strategy.

F. The Mean Shift Technique and the New Merging Strategy for Mean Shift Segmentation

As described above, the way the spatial information is utilized is to automatically label the queried samples that are located in segment patches that have already been labeled. This method is therefore highly dependent on the homogeneity of the segment patches, so a small scale is preferred to meet this requirement. However, when the segment patches are small, the number of queried samples that can be automatically labeled is obviously reduced. We should therefore merge the segment patches as much as possible, on the premise that incorrect merging does not occur. In this section, we propose a new merging strategy based on the mean shift technique: a large scale is used, and, to avoid incorrect merging, the discriminative information in the segment patches is used to control the merging of the patches. Before introducing the new merging strategy, we briefly introduce the principle of the mean shift technique.

1) Mean Shift Clustering and Segmentation: The mean shift technique is a general non-parametric mode clustering procedure. The main idea behind mean shift is to treat the points in the d-dimensional feature space as samples from an empirical probability density function, where dense regions in the feature space correspond to the local maxima, or modes, of the underlying distribution [42]. For each data point in the feature space, a gradient ascent procedure is performed on the locally estimated density until convergence. The stationary points of this procedure represent the modes of the distribution, and the data points associated (at least approximately) with the same stationary point are considered members of the same cluster. The principle of the mean shift segmentation method is shown in Fig. 5: to find the cluster center for a point $P_1$, the centroid of the points inside a sphere (initially centered at $P_1$) is repeatedly computed and the sphere is re-centered on the centroid, until the sphere is stationary.

Next, some key technical details behind mean shift are reviewed (for further details, see [43]-[45]). Given the data points in the d-dimensional space, the kernel density estimator at point $x$ can be written as

$$\hat{f}_{h,K}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\left(\left\|\frac{x - x_i}{h}\right\|^2\right)$$

where $k(x)$ is a monotonically decreasing function referred to as the profile of the kernel. Given the function $g(x) = -k'(x)$ for the profile, the kernel $G(x)$ is defined as $G(x) = c_{g,d}\, g(\|x\|^2)$. For $n$ data points $x_i$, $i = 1, \ldots, n$, in the d-dimensional space $R^d$, the mean shift vector is defined as

$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x \qquad (1)$$

where $x$ is the center of the kernel (window) and $h$ is a bandwidth parameter. The mean shift vector is therefore the difference between the weighted mean, using the kernel $G$ as the weights, and the center of the kernel (window).

As noted in [46], remote sensing imagery is typically represented in a spatial-range joint feature space: the spatial domain denotes the coordinates and locations of the pixels, and the range domain represents the spectral signals of the different channels. The multivariate kernel for the joint density estimation is defined as

$$K_{h_s,h_r}(x) = \frac{C}{h_s^2 h_r^p}\, k\left(\left\|\frac{x^s}{h_s}\right\|^2\right) k\left(\left\|\frac{x^r}{h_r}\right\|^2\right)$$

where $x^s$ is the spatial part and $x^r$ is the range part of a feature vector, $h_s$ and $h_r$ are the employed kernel bandwidths, which correspond to the search window, and $C$ is the corresponding normalization constant. A remote sensing image is typically represented as a 2-D lattice of d-dimensional vectored pixels, so the search can be limited to a rectangular window of size $2 \times 2$ in the normalized space, which corresponds to $(2[h_s]+1)^2$ image pixels, where $[\cdot]$ denotes rounding down to the nearest integer. In practice, the bandwidth controls the size of the kernel and determines the resolution of the mode detection. As a result, the number of queried samples varies with the bandwidth of the segmentation.

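As a concrete illustration of Eq. (1), the following sketch implements the basic mean shift mode-seeking iteration with a Gaussian profile in pure NumPy. It is a didactic example of the procedure described above, not the segmentation code used in the paper (which operates in the joint spatial-range domain with bandwidths $h_s$ and $h_r$); all function names are assumed for illustration.

```python
import numpy as np

def mean_shift_mode(x0, points, h, tol=1e-4, max_iter=100):
    """Follow the mean shift vector m_{h,G}(x) from x0 until it converges
    to a mode of the kernel density estimate (Gaussian profile)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2.0 * h ** 2))
        shift = w @ points / w.sum() - x      # Eq. (1): weighted mean minus x
        x = x + shift
        if np.linalg.norm(shift) < tol:
            break
    return x

def mean_shift_cluster(points, h, merge_tol=1e-2):
    """Points that converge to (approximately) the same mode form one cluster."""
    modes = np.array([mean_shift_mode(p, points, h) for p in points])
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol * h:
                labels[i] = j
                break
        if labels[i] < 0:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```
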
2) The New Merging Strategy and Multi-Scale Segmentation: After we obtain the over-segmentation map by using a small scale, the label and confidence of each segment patch can be deduced from the classification map. There are two basic rules to avoid incorrect merging: 1) segment patches with different labels cannot be merged; and 2) segment patches with a low confidence of belonging to their label should not be merged with the other segment patches. Fig. 6 presents this idea. The detailed steps are as follows.

(i) Obtain the Label and Confidence of Each Segmented Patch: Each pixel in a specific homogeneous region can be related to a specific class label on the classification map generated from the few training samples. Thus, the label of a segmented patch can be obtained by counting the occurrence of each class label, i.e., the class label with the highest occurrence is taken as the label of the segmented patch.

The confidence of the segmented patch belonging to a specific class $c$ is influenced by two factors: 1) whether the class label accounts for a large percentage of the total pixels in the patch; and 2) the average posterior probability of all the pixels belonging to this class label. Assume that the class labels are represented by $[1, \ldots, C]$. The first factor can then be represented by $\mathrm{ratio1}_i = n_c / n$, where $n$ is the total number of pixels in the segmented patch and $n_c$ is the occurrence of class $c$ in this patch. The second factor can be represented by $\mathrm{ratio2}_i = \frac{1}{n_c} \sum_{j} pro_{jc}$, where $pro_{jc}$ is the posterior probability of the $j$th pixel belonging to class $c$ and the sum runs over the $n_c$ pixels of class $c$ in the patch. The total confidence of the $i$th patch is defined as $p_i = \mathrm{ratio1}_i \cdot \mathrm{ratio2}_i$.

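A small sketch of step (i), assuming a per-pixel classification map and the posterior probabilities produced by the current classifier for the pixels of one patch; it computes the majority label and the confidence $p_i = \mathrm{ratio1}_i \cdot \mathrm{ratio2}_i$. The function and argument names are illustrative.

```python
import numpy as np

def patch_label_and_confidence(pixel_labels, pixel_probs):
    """Step (i): majority label of a segment patch and its confidence.

    pixel_labels: (n,) predicted class of each pixel in the patch
    pixel_probs:  (n, C) posterior probabilities of each pixel
                  (classes assumed to be 0..C-1, indexing the columns)
    """
    classes, counts = np.unique(pixel_labels, return_counts=True)
    c = classes[np.argmax(counts)]                     # majority class label
    n, n_c = len(pixel_labels), counts.max()
    ratio1 = n_c / n                                   # share of the majority class
    ratio2 = pixel_probs[pixel_labels == c, c].mean()  # mean posterior of class c
    return c, ratio1 * ratio2                          # p_i = ratio1_i * ratio2_i
```
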

(ii) Filtering of the Over-Segmentation Map: The main difference between the first and the second merging procedure is that, in the first merging of the image, the pixels are arranged on a lattice, so the search scope can be controlled by setting the values of $h_s$ and $h_r$; in the second merging, which is performed on the small patches, the lattice structure no longer applies and the parameter $h_s$ is therefore not applicable. Thus, we define an adaptive way to replace the effect of $h_s$, which is illustrated in Fig. 7.

Fig. 7. Finding the neighborhood patches.

Assume that the center patch belongs to class $c$. The neighborhood patches are included in the local window only if they share the same label $c$ as the center patch. The neighborhood structure is obtained by continuously looking for adjacent patches with the same label until no more can be found. In this way, adaptive neighborhood patches can be determined for each local patch. Because the segmented patches belonging to the other class labels are removed, the window shifts directly toward the center of class $c$, which further reduces the computation time and speeds up the convergence of the shifting process; above all, the foremost benefit is that incorrect merging is avoided. For the data points $x_i$, $i = 1, \ldots, n$, in the d-dimensional space $R^d$, the new mean shift vector is defined as

$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i \exp\left(-\frac{\|x - x_i\|^2}{2 h_r^2}\right) p_i}{\sum_{i=1}^{n} \exp\left(-\frac{\|x - x_i\|^2}{2 h_r^2}\right) p_i} - x$$

The term $p_i$ controls the weight of each point in the local window, based on the confidence of each patch. The new merging strategy reduces the probability of a patch's label spreading to the other classes by reducing its contribution to the calculation of the window center; as a result, such patches cannot easily be merged with the other patches. In fact, the strategy increases the weights of the patches with a higher confidence in the calculation of the window center, so the shifting direction is more likely to be toward the patches with correct labels. As a result, the labels of the patches derived from the training samples can spread to the adjacent spatial patches with uncertain labels. When a larger scale is used, not only the patches with a high confidence but also those with a rich textural structure can be merged. Furthermore, incorrect merging is reduced significantly, owing to the effective avoidance of low-confidence patches being merged with the other patches.

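A sketch of this confidence-weighted update, operating on the mean feature vector of each patch in the adaptive same-label neighborhood. It assumes patch-level feature vectors, the confidences $p_i$ from step (i), and the range bandwidth $h_r$; it illustrates the formula above rather than the full second-level segmentation, and the names are assumptions.

```python
import numpy as np

def weighted_mean_shift_step(x, neighbor_feats, p, h_r):
    """One confidence-weighted mean shift step for the second-level merging.

    x              : (d,) current window center in the range (spectral) domain
    neighbor_feats : (m, d) mean spectra of the same-label neighborhood patches
    p              : (m,) confidences p_i = ratio1_i * ratio2_i of those patches
    h_r            : range bandwidth of the Gaussian kernel
    """
    w = np.exp(-np.sum((neighbor_feats - x) ** 2, axis=1) / (2.0 * h_r ** 2)) * p
    return w @ neighbor_feats / w.sum() - x   # shift toward high-confidence patches

def merge_center(x0, neighbor_feats, p, h_r, tol=1e-3, max_iter=50):
    """Iterate the weighted step until the window center is stationary."""
    x = x0.astype(float)
    for _ in range(max_iter):
        shift = weighted_mean_shift_step(x, neighbor_feats, p, h_r)
        x = x + shift
        if np.linalg.norm(shift) < tol:
            break
    return x
```
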
Algorithm 1. The spatial coherence based batch-mode active learning method.

The proposed algorithm is summarized in Algorithm 1. At each iteration iter, the proposed heuristic evaluates the sampling strategy on the uncertain set $U_{UC}$, which is composed of the samples satisfying $h(x_i) \leq \theta$, $x_i \in U^{iter}$. $U_{UC\_spa}$ is then obtained by querying the over-segmentation map $M_{s1}$, and the final clustering result $C_{spa\_spe}$ is obtained by mean shift clustering. At the stage of assigning labels, some of the samples will already have been assigned labels through the patches in which they are located; the remaining samples are manually labeled. The heuristic then returns a batch of $p$ points $X_{new}^{iter} = \{x_k, y_k\}_{k=1}^{p}$. Finally, $X_{new}^{iter}$ is added to the training set, $X^{iter+1} = X^{iter} \cup X_{new}^{iter}$, and removed from the pool, $U^{iter+1} = U^{iter} \setminus X_{new}^{iter}$, and the process iterates until a stopping criterion is satisfied.

To sum up, when MCLU is used as the uncertainty criterion, the proposed active learning methods are referred to as MCLU-MS-OS and MCLU-MS-RS, where MS denotes the mean shift clustering used for the diversity criterion, and OS and RS denote automatic labeling by the over-segmentation map and by the refined segmentation map, respectively. When breaking ties is used as the uncertainty criterion, the proposed methods are correspondingly referred to as BT-MS-OS and BT-MS-RS.

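Putting the pieces together, the per-iteration procedure can be sketched as follows. This is an illustrative reconstruction from the description above, reusing the hypothetical helpers from the earlier sketches (mclu_scores, reduce_redundancy, split_auto_labeled); it is not the authors' Algorithm 1.

```python
import numpy as np

def al_iteration(clf, X, pool_idx, train_idx, y_train,
                 segment_ids, theta, bandwidth, oracle):
    """One iteration of the spatial coherence based batch-mode AL heuristic.

    clf         : classifier fitted on the current training set
    X           : (n_pixels, n_bands) spectra of all pixels
    segment_ids : per-pixel segment id of the current segmentation map
    theta       : uncertainty threshold (e.g. 0.01 for MCLU)
    oracle      : callable pixel index -> true label (the human supervisor)
    """
    # uncertainty criterion F on the unlabeled pool
    scores = np.full(len(X), np.inf)
    scores[pool_idx] = mclu_scores(clf, X[pool_idx])
    u_uc = pool_idx[scores[pool_idx] <= theta]

    # diversity criterion D: one pixel per segment, then mean shift clusters
    batch = reduce_redundancy(u_uc, scores, X, segment_ids, bandwidth)

    # spatial coherence: label for free where the segment is already labeled
    auto, manual = split_auto_labeled(batch, segment_ids, train_idx, y_train)

    # the supervisor S labels only the remaining pixels
    new_idx = np.concatenate([np.array(list(auto), dtype=int),
                              np.array(manual, dtype=int)])
    new_y = np.concatenate([np.array(list(auto.values())),
                            np.array([oracle(i) for i in manual])])

    train_idx = np.concatenate([train_idx, new_idx])
    y_train = np.concatenate([y_train, new_y])
    pool_idx = np.setdiff1d(pool_idx, new_idx)
    return train_idx, y_train, pool_idx
```
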

III. EXPERIMENTAL RESULTS AND ANALYSIS

This section includes two parts: 1) a description of the remote sensing images used for validating the proposed approaches and the corresponding experimental setup; and 2) the experimental results and analysis.

A. Dataset Description and Experiment Settings

Two widely used datasets are used in our experiments. The first dataset is the Indian Pines hyperspectral test set, which was acquired by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [46]. Each pixel has 220 spectral bands covering the visible and infrared range, and the corresponding spatial resolution is approximately 20 m. The initial training sets T are composed of about seven samples per class (112 samples in all), and the rest of the samples are considered as unlabeled samples stored in the unlabeled pool U. The false color composite and the ground-truth map are shown in Fig. 8.

Fig. 8. (a) False color composite of the AVIRIS Indian Pines scene (bands 60, 27, and 17). (b) Ground-truth map containing 16 mutually exclusive land-cover classes.

The second hyperspectral image dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) and covers an urban area of Pavia, northern Italy. The image scene is centered at the University of Pavia. After removing 12 noisy or water absorption bands, it comprises 103 spectral channels. Nine ground-truth classes are considered in the experiments. The initial training sets X are composed of 10 samples per class (90 samples in all) randomly drawn from U. The false color composite and the ground-truth map are shown in Fig. 9.

Fig. 9. (a) False color composite of the ROSIS University of Pavia scene (bands 60, 27, and 17). (b) Reference map containing nine mutually exclusive land-cover classes.

TABLE I. Methods with different combinations.

To test the effectiveness of the proposed diversity criterion, a kernel k-means based clustering method, which has been shown to be better than the traditional k-means clustering, is used as the comparison method. This comparison is made under the different uncertainty criteria mentioned above, and the different combinations are shown in Table I. When multiclass-level uncertainty (MCLU) is used as the uncertainty criterion and SVM is used as the classifier, MCLU (no diversity criterion), MCLU combined with kernel k-means (MCLU-Kkmeans), and MCLU combined with the proposed diversity criterion (MCLU-MS for short) are compared. When breaking ties is used as the uncertainty criterion and sparse multinomial logistic regression (SMLR) is used as the classifier, BT, BT-Kkmeans, and BT-MS are similarly compared.


In the uncertainty procedure, we set the threshold on $c_{\text{diff}}(x_i)$ to 0.01 for MCLU and on $d_i$ to 0.03 for breaking ties. For the traditional AL methods using a fixed number of new queries, 20 samples are sampled per iteration for the Indian Pines dataset and 19 samples for the Pavia dataset. Demir et al. [13] showed that, in the diversity criterion, selecting the most uncertain sample in each cluster is more effective than selecting the sample closest to the cluster center; this enhanced strategy is therefore used in all the methods with a diversity criterion. As an active learning method using spatial information, we also compared the proposed method with the active learning method of [38], which utilizes both the spatial and spectral information by building a hierarchical clustering tree and finding the most coherent clustering between the queried labels and the classification solution. For this method, the node selection strategy is the s2 mode and the subnode selection strategy is the d1 mode, which has been shown to be an optimal combination, and the same number of initial training samples is used as the input.

To demonstrate the effectiveness of the automatic labeling based on the segmentation map, we also study the active learning results when automatic labeling is used. Furthermore, to investigate the superiority of the refined segmentation map over the over-segmentation map, the classification results of the automatic labeling by both maps are presented, denoted BT-MS-OS and MCLU-MS-OS (automatic labeling by the over-segmentation map) and BT-MS-RS and MCLU-MS-RS (automatic labeling by the refined segmentation map). As the baseline, only the uncertainty criterion is used, and the samples with the highest uncertainty are added to the training sample set.

One-against-all SVM (OAA SVM), implemented with the LIBSVM library [47], is applied with a Gaussian RBF kernel $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ in all the experiments. As noted in [48], in each independent run, a 5-fold cross validation is carried out to choose the hyper-parameters in the ranges $\log_2 C = \{-5, \ldots, 15\}$ and $\sigma = [0.1\sigma_p, 0.2\sigma_p, \ldots, 1.5\sigma_p]$, where $\sigma_p$ is the average Euclidean distance between randomly chosen pixels in U. The two parameters are evaluated in the first few iterations; when the optimal parameters remain unchanged, the evaluation of the parameters is stopped. For the other classifier, SMLR, the sparsity parameter is fixed and the smoothness parameter is set to 2. As noted in [49]-[52], although these parameter settings might be suboptimal, they empirically lead to very good results. All the results are generated as the average accuracies of ten individual experiments, according to ten randomly selected initial training sets. The results are represented by learning rate curves, which show the average classification accuracy versus the number of training samples that need to be labeled.

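The SVM hyper-parameter selection described above can be reproduced with a standard grid search. A minimal sketch with scikit-learn is shown below, assuming the pool U is available to estimate $\sigma_p$; LIBSVM's C corresponds to SVC's C, and the RBF width $\sigma$ maps to gamma = 1/(2$\sigma^2$). The function name and the number of random pairs are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def select_svm_parameters(X_train, y_train, X_pool, n_pairs=1000, seed=0):
    """5-fold CV over log2(C) in {-5,...,15} and sigma in 0.1..1.5 sigma_p."""
    rng = np.random.default_rng(seed)
    # sigma_p: average Euclidean distance between randomly chosen pool pixels
    a = X_pool[rng.integers(0, len(X_pool), n_pairs)]
    b = X_pool[rng.integers(0, len(X_pool), n_pairs)]
    sigma_p = np.mean(np.linalg.norm(a - b, axis=1))
    sigmas = np.arange(0.1, 1.6, 0.1) * sigma_p
    param_grid = {
        "C": 2.0 ** np.arange(-5, 16),
        "gamma": 1.0 / (2.0 * sigmas ** 2),  # k(x, x') = exp(-||x-x'||^2 / 2 sigma^2)
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```
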
It is important to note that the stability of the proposed method cannot be evaluated directly, because the number of newly queried samples in each iteration is not the same. Thus, we average the queried numbers at the corresponding iterations over the ten runs, which can be represented by $\{\sum_{i=1}^{10} n_{iter1\_i}/10, \sum_{i=1}^{10} n_{iter2\_i}/10, \ldots, \sum_{i=1}^{10} n_{iterH\_i}/10\}$, where $h$ is the index of the iteration and $n_{iterh\_i}$ represents the number of new queries at the $h$th iteration in the $i$th initialization. Then, based on the average queried number at each iteration, we find the interpolated value on the learning curve for each of the ten initializations. After this procedure, the overall accuracy and standard deviation can be calculated. As a lower bound on the accuracy, passive learning is evaluated by random sampling in the pool of candidates.

In the proposed method, a Gaussian kernel is used in the mean shift segmentation. For both datasets, we set $h_s = 7$ and $h_r = 700$ for the over-segmentation maps; on the refined segmentation map, we set the range bandwidth to $h_r = 4000$ (the scale used for the refined maps in Fig. 12). For the mean shift clustering, we set $h = 800$ for the Pavia dataset and $h = 500$ for the Indian Pines dataset. For the bandwidth of the mean shift clustering, a smaller value is preferable, since it not only reduces the redundancy of the selected samples in the spatial domain but also keeps the variability in the spectral domain. The influence of the scale on the final results is discussed in the following sections.

Three evaluation criteria are used to compare the effectiveness of the AL methods: the number of samples that need to be labeled to reach a given accuracy, the corresponding time cost, and the smallest number of iterations required. In the experiments, the pre-defined accuracy is that obtained by using all the test samples to train the classifier.

B. Results and Discussion

This section presents the results for the two images and the different settings presented earlier.

1) Experiments With the AVIRIS Indian Pines Dataset:

(i) The Performance of the Different Active Learning Algorithms: The full SMLR achieves an overall accuracy of 86.84% on the test set, and the full SVM achieves an overall accuracy of 84.67%. The dotted lines in Fig. 10 refer to the overall accuracies of the full SMLR and the full SVM, respectively. Fig. 10 shows the overall average accuracy and standard deviation over the ten runs: Fig. 10(a) shows the learning curves based on SMLR using BT as the uncertainty criterion, and Fig. 10(b) shows the results based on SVM using MCLU as the uncertainty criterion. Table II shows the average computation times of the AL methods over the ten runs.

In Fig. 10(a), it can be observed that BT-MS and MCLU-MS obtain accuracies comparable to BT-Kkmeans and MCLU-Kkmeans, but need only 10 iterations, which is much less than the 19 iterations of the conventional methods. From the curves of BT-MS and MCLU-MS, the number of uncertain samples selected from the unlabeled pool decreases gradually as the iterations increase, which shows that the number of uncertain samples adapts to the performance of the classifier at each iteration.

Fig. 10. The overall classification accuracy based on (a) SMLR and (b) SVM. Each curve shows the overall accuracy and standard deviation versus the increasing number of samples that need to be labeled, over several runs of the algorithm starting from different initial sets.

TABLE II. The computation times of the AL methods with the Indian Pines dataset.

In fact, in the last few iterations of the traditional methods (BT, BT-Kkmeans), although the same number of new samples is added to the training sample set, there is no obvious improvement in the accuracy, which further supports the observation above. When the automatic labeling procedure is added to the whole AL process, the curves of BT-MS-RS, BT-MS-OS, MCLU-MS-RS, and MCLU-MS-OS show a steeper learning rate than the other methods. The analysis of the behavior of the standard deviation also shows that, with both datasets, the proposed technique has a smaller standard deviation than the kernel k-means based algorithm, which confirms the better stability of the proposed method.

BT-MS-OS reaches the full SMLR accuracy with a training set of 241 pixels, and BT-MS-RS needs only 207 pixels. The other methods converge with about 507 samples, nearly two times more than the methods utilizing automatic labeling, which indicates that many newly queried samples were located in patches that had already been labeled. MCLU-MS-OS needs 254 samples to reach the full SVM accuracy, and MCLU-MS-RS needs only 227 samples; about 253 and 280 samples are automatically labeled by MCLU-MS-OS and MCLU-MS-RS, respectively. It is worth noting that when the proposed method only utilizes the over-segmentation map, BT-MS-OS and MCLU-MS-OS obtain performances comparable to BT-MS-RS and MCLU-MS-RS, but their performances are very sensitive to the scale of the segmentation; this sensitivity is investigated in the following section. From the learning curves of BT-MS-RS and MCLU-MS-RS, as the iterations increase, the number of new queries that need to be labeled does not increase noticeably, while the overall accuracy still increases significantly; especially in the last few iterations, the number of newly queried samples that need to be manually labeled is clearly reduced.

Fig. 11. The overall classification accuracy of MCLU-MS-OS and the method in [38].

Fig. 11 shows the learning curves of the proposed MCLU-MS-OS and the active learning method proposed in [38]; the proposed MCLU-MS-OS obtains the better performance. The proposed method can find accurate segment patches, and more free samples are added to help the classification, so its performance is better than that of the compared method. Because of the sharp decrease in the number of iterations, the computational time cost of the proposed method is much less than that of the traditional AL methods, as can be seen in Table II. Additionally, the times of BT-MS-RS and MCLU-MS-RS include the time spent on generating the multi-level segmentation map, which is in fact insignificant.

(ii) The Segmentation Results of the Indian Pines Dataset: Fig. 12 shows the intermediate results of the Indian Pines dataset. It is worth noting that the segmentation algorithm only needs to operate on the unlabeled regions, which reduces its workload.

Fig. 12. The intermediate results of the Indian Pines experiment: (a) the over-segmentation map when $h_s = 7$ and $h_r = 700$; (b) the mean shift segmentation map when $h_s = 7$ and $h_r = 3500$; (c), (e), (g) the classification maps based on SMLR at the 2nd, 4th, and 9th iterations; (d), (f), (h) the refined segmentation maps based on the over-segmentation map using the scale of $h = 4000$ at the 2nd, 4th, and 9th iterations; and (l) the ground-truth map.

Fig. 13. The overall classification accuracy and standard deviation based on (a) SMLR and (b) SVM. Each curve shows the overall accuracy versus the increasing number of samples that need to be labeled, over several runs of the algorithm starting from different initial sets.

In Fig. 12, the over-segmentation map was generated with $h_s = 7$ and $h_r = 700$, and all the refined segmentation maps were generated with $h = 4000$. Not much merging occurs in the first iteration, because most of the regions are uncertain, which prevents them from merging. However, as the classification map improves when newly queried samples are added, more regions can be merged. At the sixth iteration, the overall accuracy reaches the given accuracy requirement, and the final refined segmentation map gives a stratified result. If the scale of $h_r = 4000$ is used directly for a single segmentation, the segmentation map shown in Fig. 12(b) exhibits seriously incorrect merging.

2) Experiments With the Pavia Dataset:

(i) The Performance of the Different Active Learning Algorithms: The full SMLR achieves an overall accuracy of 92.76% on the test set, and the full SVM achieves an overall accuracy of 92.81%. Fig. 13 shows the average overall accuracy and standard deviation over the ten runs, and Table III shows the average computation times of the AL methods over the ten runs. When SMLR is used as the classifier and breaking ties is the uncertainty criterion, BT-MS-OS obtains the given accuracy by labeling 418 samples, and BT-MS-RS needs 434 samples; all the other methods, which do not use the spatial information, need about 520 samples. Based on the MCLU criterion, MCLU-MS-OS needs 389 samples to reach the full SVM accuracy, and MCLU-MS-RS needs only 368 samples, which is obviously fewer than the other methods.

Fig. 14. The overall classification accuracy of the proposed MCLU-MS-OS and the method in [38].

TABLE III. The computation times of all the AL methods with the Pavia dataset.

Fig. 15. The intermediate results of the Pavia experiment: (a) the over-segmentation map when $h_s = 7$ and $h_r = 700$; (b) the mean shift segmentation map when $h_s = 7$ and $h_r = 4000$; (c) the refined segmentation map when $h = 4000$ in the last iteration; (d) the classification map in the last iteration; and (e) the ground-truth map.

When the automatic labeling procedure is removed from the whole AL process, it can be seen that BT-MS and MCLU-MS still obtain accuracies comparable to the BT-Kkmeans and MCLU-Kkmeans methods. However, BT-MS and MCLU-MS need only 10 iterations, which is much less than BT-Kkmeans and MCLU-Kkmeans need to reach a satisfactory accuracy. With the decrease in the number of iterations, the computation time is also reduced compared to the other methods, as can be seen in Table III. Fig. 14 confirms that the proposed method still performs better than the method proposed in [38]; however, under the limitation of the small segment patches in this scene, neither of these two spatial methods shows an obvious superiority over the methods that use only the spectral information. The proposed method can find accurate segment patches, and more free samples are added to help the classification, so its performance is better than that of the compared method. Nevertheless, compared with the Indian Pines dataset, the superiority of the proposed method is not as obvious. A possible reason is that the land-cover distribution in the Pavia dataset is not as concentrated as in the Indian Pines dataset, which limits the automatic labeling capability.

(ii) The Segmentation Results of the Pavia Dataset: The result for part of the Pavia dataset is shown in Fig. 15. At the bottom of Fig. 15(b), it can be seen that the meadows and bare_soil classes are incorrectly merged, and the meadows and trees classes are also incorrectly merged. In Fig. 15(c), this incorrect merging is effectively avoided. In Fig. 15(d), because a small part of the grass region cannot be accurately classified, the final refined segmentation cannot merge this small part with the rest of the grassland. Meanwhile, although a large scale is used in the refined segmentation map, the segment patches are still small, owing to the intrinsically complex structure of this image.

3) The Effect of the Automatic Labeling: Fig. 16 shows the actual rate of automatic labeling for each class in the different iterations for the two datasets. When no newly queried samples are located in a certain class, the rate of automatic labeling cannot be calculated; thus, some lines in these figures are not connected. We can see that the automatic labeling rate of most classes on the Indian Pines dataset is very high; especially in the last iterations, the rate is close to 1, which means that most of the samples can be automatically labeled. For the Pavia dataset, the automatic labeling rate is not as high as for the Indian Pines dataset, but some samples can still be automatically labeled. Thus, this strategy is effective in reducing the number of samples that need to be manually labeled on both datasets, and the automatic labeling strategy is particularly effective on data whose land-cover distribution forms large homogeneous patches.

4) The Sensitivity of the Segmentation Scale: In this section, we assess the sensitivity to scale of the mean shift segmentation and the refined segmentation with the two datasets. As shown in Fig. 17(a) and Fig. 18(a), because of the limitations of the automatic labeling procedure, when the scale gets larger the number of new queries is not reduced accordingly; meanwhile, the overall accuracy decreases when the range scale is larger.

Fig. 16. The rate of automatic labeling for each class in the different iterations. (a) Indian Pines dataset. (b) Pavia dataset.

TABLE IV. The error rate of the automatic labeling when only one segmentation is used under the different scales.

TABLE V. The error rate of the automatic labeling when refined segmentation is used under the different scales.

Fig. 17. The active learning curves of the proposed method with the Indian Pines dataset: (a) when only one segmentation is used under the different scales; and (b) when refined segmentation is used under the different scales.

Fig. 18. The active learning curves of the proposed method with the Pavia dataset: (a) when only one segmentation is used under the different scales; and (b) when refined segmentation is used under the different scales.

The error rate of the automatic labeling under the different scales also shows that when the range scale is larger, the error rate is notably increased, which leads to the decrease in the overall accuracy. Compared to the Indian Pines dataset, the active learning curves of the Pavia dataset do not rise as sharply, because the automatic labeling capability on this dataset is not as pronounced as on the Indian Pines dataset. Fig. 17(b) and Fig. 18(b) show the active learning curves of the proposed method when the refined segmentation is used under the different scales: there is no obvious decrease in performance as the scale increases. The error rates of the automatic labeling are shown in Table IV and Table V. We can see that, because the error rate of the automatic labeling procedure is kept at the same level when the refined segmentation is used, the performance of the active learning strategy gets better and better as the scale increases.

IV. CONCLUSION

In this paper, we have developed a spatial coherence based batch-mode active learning method for remote sensing image classification. By effectively utilizing the spatial correlation, the proposed AL method provides good performance compared with the other traditional methods. First, the proposed method does not fix the number of newly queried samples, which has been shown to reduce the number of iterations. Second, our method utilizes a mean shift segmentation to describe the spatial correlations, which can further remove the spatial redundancy after removing the spectral redundancy. Meanwhile, by adding an automatic labeling procedure based on the segment patches, the proposed method reduces the number of samples that need to be queried, and thus greatly reduces the cost of the manual labeling.

REFERENCES

[1] T. Yamazaki and D. Gingras, Image classification using spectral and spatial information based on MRF models, IEEE Trans. Image Process., vol. 4, no. 9, pp , Sep
[2] K. Bernard, Y. Tarabalka, J. Angulo, J. Chanussot, and J. A. Benediktsson, Spectral spatial classification of hyperspectral data based on a stochastic minimum spanning forest approach, IEEE Trans. Image Process., vol. 21, no. 4, pp , Apr
[3] J. E. Fowler and Q. Du, Anomaly detection and reconstruction from random projections, IEEE Trans. Image Process., vol. 21, no. 1, pp , Jan
[4] J. Li, P. Reddy Marpu, A. Plaza, J. M. Bioucas-Dias, and J. Atli Benediktsson, Generalized composite kernel framework for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens., vol. 51, no. 9, pp , Sep
[5] B. Settles, Active Learning Literature Survey. Madison, WI, USA: Univ. Wisconsin-Madison.
[6] J. Tang, Z.-J. Zha, D. Tao, and T.-S. Chua, Semantic-gap-oriented active learning for multilabel image annotation, IEEE Trans. Image Process., vol. 21, no. 4, pp , Apr
[7] J.-G. Wang, E. Sung, and W.-Y. Yau, Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion, IEEE Trans. Image Process., vol. 20, no. 7, pp , Jul
[8] D. P. Williams, Bayesian data fusion of multiview synthetic aperture sonar imagery for seabed classification, IEEE Trans. Image Process., vol. 18, no. 6, pp , Jun
[9] D. MacKay, Information-based objective functions for active data selection, Neural Comput., vol. 4, no. 4, pp , Jul
[10] D. D. Lewis and J. Catlett, Heterogeneous uncertainty sampling for supervised learning, in Proc. 11th Int. Conf. Mach. Learn., 1994.
[11] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. (1996). Active learning with statistical models. [Online].
[12] A. Vlachos, A stopping criterion for active learning, Comput. Speech Lang., vol. 22, no. 3, pp , Jul
[13] B. Demir, C. Persello, and L. Bruzzone, Batch-mode active-learning methods for the interactive classification of remote sensing images, IEEE Trans. Geosci. Remote Sens., vol. 49, no. 3, pp , Mar
[14] K. Brinker, Incorporating diversity in active learning with support vector machines, in Proc. Int. Conf. Mach. Learn. Workshop, 2003, p. 59.
[15] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang, Representative sampling for text classification using support vector machines, Adv. Inf. Retr., vol. 2633, pp , Apr
[16] H. T. Nguyen and A. Smeulders, Active learning using pre-clustering, in Proc. 21st Int. Conf. Mach. Learn., 2004, p. 79.
[17] H. S. Seung, M. Opper, and H. Sompolinsky, Query by committee, in Proc. 5th Annu. Workshop Comput. Learn. Theory, 1992.
[18] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby, Selective sampling using the query by committee algorithm, Mach. Learn., vol. 28, nos. 2-3, pp , Aug./Sep
[19] S. Argamon-Engelson and I. Dagan. (2011). Committee-based sample selection for probabilistic classifiers. [Online].
[20] N. Abe and H. Mamitsuka, Query learning strategies using boosting and bagging, in Proc. 15th Int. Conf. Mach. Learn. (ICML), 1998.
[21] P. Melville and R. J. Mooney, Diverse ensembles for active learning, in Proc. Int. Conf. Mach. Learn. Workshop, 2004.
[22] K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, Text classification from labeled and unlabeled documents using EM, Mach. Learn., vol. 39, nos. 2-3, pp , May
[23] T. Luo et al., Active learning to recognize multiple types of plankton, in Proc. 17th Int. Conf. Pattern Recognit. (ICPR), Aug. 2004.
[24] N. Roy and A. McCallum, Toward optimal active learning through Monte Carlo estimation of error reduction, in Proc. ICML, Williamstown, MA, USA, 2001.
[25] C. Campbell, N. Cristianini, and A. Smola, Query learning with large margin classifiers, in Proc. Int. Conf. Mach. Learn. Workshop, 2000.
[26] G. Schohn and D. Cohn, Less is more: Active learning with support vector machines, in Proc. Int. Conf. Mach. Learn. Workshop, 2000.
[27] B. Demir, C. Persello, and L. Bruzzone, Batch-mode active-learning methods for the interactive classification of remote sensing images, IEEE Trans. Geosci. Remote Sens., vol. 49, no. 3, pp , Mar
[28] M. Ferecatu and N. Boujemaa, Interactive remote-sensing image retrieval using active relevance feedback, IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp , Apr
[29] M. Volpi, D. Tuia, and M. Kanevski, Memory-based cluster sampling for remote sensing image classification, IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp , Aug
[30] S. Patra and L. Bruzzone, A fast cluster-assumption based active-learning technique for classification of remote sensing images, IEEE Trans. Geosci. Remote Sens., vol. 49, no. 5, pp , May
[31] Z. Li, J. Liu, Y. Yang, X. Zhou, and H. Lu, Clustering-guided sparse structural learning for unsupervised feature selection, IEEE Trans. Knowl. Data Eng., vol. 26, no. 9, pp , Sep
[32] G. Jun, R. R. Vatsavai, and J. Ghosh, Spatially adaptive classification and active learning of multispectral data with Gaussian processes, in Proc. IEEE Int. Conf. Data Mining Workshops (ICDMW), 2009.
[33] A. Liu, G. Jun, and J. Ghosh, Active learning of hyperspectral data with spatially dependent label acquisition costs, in Proc. IEEE Int. Geosci. Remote Sens. Symp., vol. 5, Jul. 2009, pp. V-256-V-259.
[34] B. Demir, L. Minello, and L. Bruzzone, A cost-sensitive active learning technique for the definition of effective training sets for supervised classifiers, in Proc. IEEE Int. Geosci. Remote Sens. Symp., Munich, Germany, Jul. 2012.
[35] D. Tuia, J. Muñoz-Marí, and G. Camps-Valls, Remote sensing image segmentation by active queries, Pattern Recognit., vol. 45, no. 6, pp , Jun
[36] V. Walter, Object-based classification of remote sensing data for change detection, ISPRS J. Photogramm. Remote Sens., vol. 58, nos. 3-4, pp , Jan
[37] M. M. Crawford, D. Tuia, and H. L. Yang, Active learning: Any value for classification of remotely sensed data? Proc. IEEE, vol. 101, no. 3, pp , Mar
[38] J. Muñoz-Marí, D. Tuia, and G. Camps-Valls, Semisupervised classification of remote sensing images with active queries, IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp , Oct
Camps-Valls, Semisupervised classification of remote sensing images with active queries, IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp , Oct [39] M. Li and I. K. Sethi, Confidence-based active learning, IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp , Aug [40] D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery, Active learning methods for remote sensing image classification, IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp , Jul [41] A. Vlachos, A stopping criterion for active learning, Comput. Speech Lang., vol. 22, no. 3, pp , Jul [42] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 18, pp , Aug

Qian Shi received the B.S. degree in sciences and techniques of remote sensing from Wuhan University, Wuhan, China, in 2010, where she is currently pursuing the Ph.D. degree with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing. Her research interests include hyperspectral data dimension reduction, active learning for remote sensing images, and pattern recognition in remote sensing images.

Bo Du (M'11) received the B.S. degree from Wuhan University, Wuhan, China, in 2005, and the Ph.D. degree in photogrammetry and remote sensing from the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, in 2010, where he is currently an Associate Professor with the School of Computer Science. His major research interests include pattern recognition, hyperspectral image processing, and signal processing.

Liangpei Zhang (M'06-SM'08) received the B.S. degree in physics from Hunan Normal University, Changsha, China, in 1982, the M.S. degree in optics from the Chinese Academy of Sciences, Xi'an, China, in 1988, and the Ph.D.
degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China. He is currently the Head of the Remote Sensing Division with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. He is also a Chang Jiang Scholar Chair Professor appointed by the Ministry of Education, China, and a Principal Scientist for the China State Key Basic Research Project appointed by the Ministry of National Science and Technology of China to lead the Remote Sensing Program in China. He has authored over 410 research papers and holds 15 patents. His research interests include hyperspectral remote sensing, high-resolution remote sensing, image processing, and artificial intelligence.

Dr. Zhang is also a fellow of the Institution of Engineering and Technology, an Executive Member (Board of Governors) of the China National Committee of the International Geosphere-Biosphere Program, and an Executive Member of the China Society of Image and Graphics. He regularly serves as a Co-Chair of the series SPIE Conferences on Multispectral Image Processing and Pattern Recognition, the Conference on Asia Remote Sensing, and many other conferences. He has edited several conference proceedings, issues, and Geoinformatics symposiums. He also serves as an Associate Editor of the International Journal of Ambient Computing and Intelligence, the International Journal of Image and Graphics, the International Journal of Digital Multimedia Broadcasting, the Journal of Geo-spatial Information Science, the Journal of Remote Sensing, and the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING.
