Spectral Active Clustering of Remote Sensing Images

Similar documents
Comparison of spatial indexes

BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

A Practical Approach for 3D Model Indexing by combining Local and Global Invariants

Tacked Link List - An Improved Linked List for Advance Resource Reservation

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

Geometric Consistency Checking for Local-Descriptor Based Document Retrieval

Prototype Selection Methods for On-line HWR

Lossless and Lossy Minimal Redundancy Pyramidal Decomposition for Scalable Image Compression Technique

How to simulate a volume-controlled flooding with mathematical morphology operators?

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Selection of Scale-Invariant Parts for Object Class Recognition

Setup of epiphytic assistance systems with SEPIA

Regularization parameter estimation for non-negative hyperspectral image deconvolution:supplementary material

Service Reconfiguration in the DANAH Assistive System

DANCer: Dynamic Attributed Network with Community Structure Generator

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

DSM GENERATION FROM STEREOSCOPIC IMAGERY FOR DAMAGE MAPPING, APPLICATION ON THE TOHOKU TSUNAMI

KeyGlasses : Semi-transparent keys to optimize text input on virtual keyboard

Traffic Grooming in Bidirectional WDM Ring Networks

Fuzzy sensor for the perception of colour

A Voronoi-Based Hybrid Meshing Method

RETIN AL: An Active Learning Strategy for Image Category Retrieval

Relabeling nodes according to the structure of the graph

Natural Language Based User Interface for On-Demand Service Composition

SDLS: a Matlab package for solving conic least-squares problems

Application of RMAN Backup Technology in the Agricultural Products Wholesale Market System

Linked data from your pocket: The Android RDFContentProvider

The optimal routing of augmented cubes.

HySCaS: Hybrid Stereoscopic Calibration Software

MIXMOD : a software for model-based classification with continuous and categorical data

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

Automatic indexing of comic page images for query by example based focused content retrieval

Structuring the First Steps of Requirements Elicitation

Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification

Comparator: A Tool for Quantifying Behavioural Compatibility

THE COVERING OF ANCHORED RECTANGLES UP TO FIVE POINTS

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

Taking Benefit from the User Density in Large Cities for Delivering SMS

Syrtis: New Perspectives for Semantic Web Adoption

QuickRanking: Fast Algorithm For Sorting And Ranking Data

Kernel perfect and critical kernel imperfect digraphs structure

YAM++ : A multi-strategy based approach for Ontology matching task

Using a Medical Thesaurus to Predict Query Difficulty

Every 3-connected, essentially 11-connected line graph is hamiltonian

A case-based reasoning approach for unknown class invoice processing

LIG and LIRIS at TRECVID 2008: High Level Feature Extraction and Collaborative Annotation

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

Assisted Policy Management for SPARQL Endpoints Access Control

A Practical Evaluation Method of Network Traffic Load for Capacity Planning

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

lambda-min Decoding Algorithm of Regular and Irregular LDPC Codes

Zigbee Wireless Sensor Network Nodes Deployment Strategy for Digital Agricultural Data Acquisition

NP versus PSPACE. Frank Vega. To cite this version: HAL Id: hal

Very Tight Coupling between LTE and WiFi: a Practical Analysis

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

An Efficient Numerical Inverse Scattering Algorithm for Generalized Zakharov-Shabat Equations with Two Potential Functions

Change Detection System for the Maintenance of Automated Testing

A case-based reasoning approach for invoice structure extraction

Catalogue of architectural patterns characterized by constraint components, Version 1.0

Sliding HyperLogLog: Estimating cardinality in a data stream

Learning Object Representations for Visual Object Class Recognition

A Fuzzy Approach for Background Subtraction

Moveability and Collision Analysis for Fully-Parallel Manipulators

Representation of Finite Games as Network Congestion Games

Kernel-Based Laplacian Smoothing Method for 3D Mesh Denoising

LaHC at CLEF 2015 SBS Lab

Full Text Search Engine as Scalable k-nearest Neighbor Recommendation System

Clustering Document Images using a Bag of Symbols Representation

An Experimental Assessment of the 2D Visibility Complex

A Multi-purpose Objective Quality Metric for Image Watermarking

Workspace and joint space analysis of the 3-RPS parallel robot

Non-negative dictionary learning for paper watermark similarity

[Demo] A webtool for analyzing land-use planning documents

Movie synchronization by audio landmark matching

Multi-atlas labeling with population-specific template and non-local patch-based label fusion

Scale Invariant Detection and Tracking of Elongated Structures

Fast and precise kinematic skeleton extraction of 3D dynamic meshes

Multimedia CTI Services for Telecommunication Systems

Recommendation-Based Trust Model in P2P Network Environment

Mokka, main guidelines and future

SLMRACE: A noise-free new RACE implementation with reduced computational time

ROBUST MOTION SEGMENTATION FOR HIGH DEFINITION VIDEO SEQUENCES USING A FAST MULTI-RESOLUTION MOTION ESTIMATION BASED ON SPATIO-TEMPORAL TUBES

The Connectivity Order of Links

End-to-End Learning of Polygons for Remote Sensing Image Classification

Caching strategies based on popularity prediction in content delivery networks

Simulations of VANET Scenarios with OPNET and SUMO

Motion-based obstacle detection and tracking for car driving assistance

A 64-Kbytes ITTAGE indirect branch predictor

The New Territory of Lightweight Security in a Cloud Computing Environment

Approximate RBF Kernel SVM and Its Applications in Pedestrian Classification

Branch-and-price algorithms for the Bi-Objective Vehicle Routing Problem with Time Windows

Real-Time Collision Detection for Dynamic Virtual Environments

QAKiS: an Open Domain QA System based on Relational Patterns

Cloud My Task - A Peer-to-Peer Distributed Python Script Execution Service

TEXTURE SIMILARITY METRICS APPLIED TO HEVC INTRA PREDICTION

Detecting Anomalies in Netflow Record Time Series by Using a Kernel Function

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications

Partition Detection in Mobile Ad-hoc Networks

Linux: Understanding Process-Level Power Consumption

Efficient Gradient Method for Locally Optimizing the Periodic/Aperiodic Ambiguity Function

Transcription:

Spectral Active Clustering of Remote Sensing Images Zifeng Wang, Gui-Song Xia, Caiming Xiong, Liangpei Zhang To cite this version: Zifeng Wang, Gui-Song Xia, Caiming Xiong, Liangpei Zhang. Spectral Active Clustering of Remote Sensing Images. 2014. <hal-00931926> HAL Id: hal-00931926 https://hal.archives-ouvertes.fr/hal-00931926 Submitted on 16 Jan 2014 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

SPECTRAL ACTIVE CLUSTERING OF REMOTE SENSING IMAGES Zifeng Wang 1, Gui-Song Xia 1, Caiming Xiong 2, Liangpei Zhang 1 1 Key State Laboratory LIESMARS, Wuhan University, Wuhan 430072, China 2 Department of Computer Science, State University of New York at Buffalo, NY, USA ABSTRACT Mining useful information from remote sensing images is a longstanding and challenging problem in earth observation, among which images clustering is used to discover meaningful scene information, by grouping similar image pixels into clusters. The main difficulty of image clustering, however, lies in the fact that imperfect similarity measure between images usually leads to bad clustering results. Supervised classification with labeled training samples can partially solve this problem, but the collection of such labeled data is usually time-consuming and sometimes impossible in many real problems. This paper presents an active remote sensing image clustering algorithm by integrating simple human queries into the clustering process. More precisely, we propose a spectral active clustering method that can actively query the oracle (such as human) to improve the image clustering performance. We first construct a k-nearest neighbor (k-nn) graph of the remote sensing images. We then iteratively select the most informative pairwise constraints and purify the k-nn graph, by removing the edges between images from different classes. The final clustering on the purified k-nn graph leads to more accurate result. The proposed method has been evaluated on three high-resolution remote sensing image datasets. It achieves the state-of-the-art performance and demonstrates high potentials in practical remote sensing applications. Index Terms Information mining, remote sensing image clustering, active clustering 1. INTRODUCTION Nowadays, remote sensing images capture in details wide surfaces and result in extremely large volumes of data with high spectral and spatial resolution. However, the large amount of remote sensing images are at present weakly exploited due to their large sizes and time-consuming visual analysis [1]. This demands to develop more efficient methods for mining information from these large scale remote sensing images. In this paper, we are interested in remote sensing image classification. As we know, most of the classification methods require labeled training data from experts, while the annotation of which is usually expensive and sometimes impossible to acquire in many real problems. Alternatively, clustering (or unsupervised classification) approaches are more suitable when no labeled data is available [2]. Survey of different clustering methods has been presented in [3]. One main difficulty of these methods, however, lies in the fact that their performances heavily depend on the similarity matrices between images, which are usually far from perfect on real problem. One possible solution is to integrate acceptable and weak priori knowledge, that are easy to achieve without expertise, into the clustering process, and to build semi-supervised [4] or active clustering algorithms [5, 6]. This paper introduces a new active clustering algorithm for remote sensing images with weak human queries. More precisely, we propose a spectral active clustering method that can actively query the oracle (such as human) to purify the similarity matrix and finally improve the images clustering performance. In particular, given remote sensing images, we first construct a k-nearest neighbor (k-nn) graph, on which the clustering will perform. To get near-perfect similarity matrices, we propose to interactively purify the k-nn graph, by actively selecting the most informative pairwise constraints. (In our work, the pairwise image constraints are similar or dissimilar given by an oracle, which is much easier than that of collecting labeled data.) The purified similarity matrices lead to a updated k-nn graph and are finally used to achieve the clustering of images. Unlike previous active clustering method [5, 7], we use vertex uncertainty instead of pairwise uncertainty for selecting informative vertexes on the k-nn graph. To our knowledge, our work is the first time to develop an active clustering method for remote sensing images, which has high potentials in real problem of mining information from earth observation images. The proposed method demonstrates the state-of-the-art classification and annotation results on three high-resolution satellite image datasets. The rest of this paper is organized as follows. Section 2 recalls the basics of spectral image clustering. Section 3 introduces the proposed spectral active clustering framework for remote sensing images. Section 4 presents experimental results and Section 5 gives some conclusion remarks. 2. SPECTRAL IMAGE CLUSTERING Spectral clustering, which makes use of the spectrum of the similarity matrix of the data to perform clustering, is one of

the most popular clustering algorithms, due to its efficiency and the simplicity of implementation [8]. Constructing a k-nn graph of images : Given a set of images (or regions of an image) I = {I 1, I 2,, I n }, described by some visual features f i, denote the similarity matrix as S, with each elements s ij representing a measure of the similarity between f i, f j of I i, I j. The similarity graph is thus defined by G = (V, E), where each vertex v i V represents an image I i and two vertices I i, I j are connected if s ij is larger than a certain threshold, and the edge e ij E is weighted by s ij. In our case, we refer the graph G as a k-nn graph, which implies that for a vertex v i only the vertices with the first k largest similarity measure are connected to it. Spectral clustering algorithm : Spectral image clustering [8] then works with the eigenvectors of the normalized Laplacian matrix L of G, using k-means algorithm. The normalized Laplacian matrix L is computed as L = I D 1/2 SD 1/2, where I is the identity matrix and D is the diagonal matrix as D ii = n j=1 s ij. In order to deal with large scale remote sensing images, some large scale spectral clustering algorithm can be used to compute L more efficiently. 3. SPECTRAL ACTIVE CLUSTERING OF REMOTE SENSING IMAGES This section presents the proposed spectral active clustering approach for remote sensing images. The flowchart of the algorithm is depicted in Fig. 1. and spectral clustering performs again on the updated k-nn graph. The algorithm iterates this process until the oracle is satisfied with the result or the k-nn graph is fully purified. In what follows, we precise the criterions for active constraints selection and the k-nn graph purification. Active Constraints Selection Ideally, in a perfect k-nn graph, each image should belong to the same class as all its neighbors do. Consequently, vertices connected in the k-nn graph will be assigned to the same cluster. In order to look for new constraints to purify the k-nn graph, we try to find out the image whose neighbors contain the worst edge (edge linked images belonging to different classes). For the class label c of vertices, we use cluster labels instead of the real labels which are still unavailable. In order to measure the disordered level of an image I i on the graph G, we use an entropy criterion defined as H(I i ) = l P Ii (l) log P Ii (l), (1) where P Ii (l) = #{cv=l,v N I i } k is the probability of the neighbors that are assigned to the cluster l, with N Ii as the neighborhoods of I i on G. We thus choose the most disordered image I i, i.e. I i = arg max Ij G H(I j ), as the vertex on the graph for querying. Consequently, the active edges are thus {(I i, I j ); e i j = 1, I j G}, which link the image I i and its neighbors. We query the oracle along these edges to obtain new constraints. k-nn Graph Purification As mentioned before, there are two kind of pairwise constraints: similar and dissimilar, called must-link and cannot-link on the graph which respectively link images to the same and different classes. A simple constraint augmentation process is described in Fig. 2. It enables us to obtain more constraints from known ones. By purifying the k-nn graph, all the cannot-link edges in the graph will be removed, while the similarity values of must-link edges are set to 1. Fig. 1. Flowchart for spectral active clustering. Refer to the text for more details. Given a set of image (or image regions) as inputs, we first construct the k-nn graph followed by a spectral clustering algorithm, as described in Section 2. We iteratively obtain new constraints by actively selecting the most informative image and querying the oracle. Active learning helps us to find out the most informative image, which is also the most uncertain one, according to the current clustering result and the k- NN graph. With new constraints, the k-nn graph is purified Fig. 2. Our constraint augmentation process. 4. EXPERIMENTAL RESULTS We evaluates the proposed spectral active clustering method on remote sensing image classification and annotation. The three remote sensing image sets we used are: (1) A GeoEye-1 image in Beijing as in Figure 4: the image is of size 4000 4000 and is partitioned into 1600

Fig. 3. The evaluation of the similarity matrix with the iterative purification of the k-nn graph. From left to right: the similarity matrices at 1, 75, 150 and 225 iterations. From top to bottom: the similarity matrices of the GeoEye-1 image in Beijing, the WhuSPL datset [9], and the UCM datset [10]. Observe that with the iterations the similarity matrices become more and more discriminative. tiles with size of 100 100 pixels. We try to annotate the tiled images as in [11], who also provides its human-labeled groundtruth. (2) WhuSPL dataset [9], containing 20-class high-resolution satellite images with about 50 samples per each categories. (3) UCM dataset [10], which contains 21class high-resolution satellite images with 100 samples per each categories. In our experiments, to characterize each image or image tile, we calculate the bag of dense SIFT descriptors [12] and bag of color descriptors in each image. Thus, the image Ii is described by a histogram-like vector hi of length M. The pairwise similarities matrix PSM is calculated by histogram intersection kernel as sij = m min(hi [m], hj [m]). We compare our algorithm with three currently proposed methods: spectral clustering (SC) algorithm [8], semi-supervised spectral clustering (S3 C) algorithm [4] and the M3 DA-RF algorithm [11], a recently proposed fullsupervised method on our task. Remark that, instead of collecting pairwise constraints actively, S3 C [4] performs spectral clustering by using pairwise constraints provided before clustering. The M3 DA-RF [11] is full-supervised and requires labeled data from experts as training sample. In current experiments, our method exits the active clustering loop when the oracle is satisfied with the results. For a fair comparison, we use the same number of pairwise constraints for the S3 C [4] algorithm. In order to evaluate the performance of clustering algorithms (as that for supervised algorithms, e.g. M3 DA-RF), the clustering accuracy is computed by assigning the label of each cluster as its closest class in the groundtruth. Fig. 3 illustrates the evaluation of the similarity matrices with the iterative purification of the k-nn graph on each data set. Note that, on all the three dataset, with more active iterations (i.e. more queries from oracle) are used the similarity matrices become more and more discriminative. This verifies the efficiency and the necessariness of the active selection procedure of our proposed method. We compare the clustering performances of SC, S3 C and our method on WhuSPL dataset [9] and UCM dataset [10]. The accuracies are respectively 39.2%, 58.5% and 98.8% on WhuSPL dataset and 35.7%, 47.9% and 96.3% on UCM dataset of the three algorithms. Fig. 4 displays the comparison of our method with the three approaches mentioned above, SC [8], S3 C [4] and M3 DA-RF [11], on the annotation of a big high-resolution satellite image, a GeoEye-1 image in Beijing. The M3 DARF [11] takes half of the labeled data as training samples and uses multi-level max-margin discriminative random field [11] for annotation. Notice that M3 DA-RF uses actually spatial constraints via conditional random field to improve its performance. From Fig. 4, one can see that S3 C gives much better result (61.8%) than that of SC (35.8%), which implies that pairwise constraints is very helpful for this task. The M3 DARF produces better result (91.6%) than S3 C does. This is mainly because it uses fully labeled training samples, the prior information of which is much stronger that that of weak pairwise constraints. Our method outperforms all the three methods and shows near-perfect annotation result (99.2%). This finally verifies our intuition on the proposed method, that the weak prior knowledge from the actively selected pairwise constraints can help the clustering algorithm a lot.

(a) original image (b) human-labeled groundtruth [11] (c) by spectral clustering [8] (d) by S3 C [4] (semi-supervised) (e) by M3 DA-RF [11], (full-supervised) (f) by our method Fig. 4. Comparison of different methods on the annotation of a GeoEye-1 image in Beijing. Refer to the text for more details. 5. CONCLUSION This paper addresses the problem of remote sensing image clustering with weak priors. Our method actively queries the oracle to get weak pairwise constraints. Instead of pair uncertainty, the proposed method uses node uncertainty as active select criterion, which can more accurately select useful queries. With these acceptable a few pairwise constraints, the clustering result shows notable improvements. It is very powerful to classify remote sensing images, when there is no available labeled training data or it is hard to acquire such training data. From experiment results, we can see that the proposed method outperforms three state-of-the-art methods and shows high potentials in mining meaningful information from remote sensing images. 6. REFERENCES [1] D. Tuia and G. Camps-Valls, Recent advances in remote sensing image processing, in Proc. of ICIP, 2009, pp. 3705 3708. [2] R. T. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, in Proc. of VLDB, 1994, pp. 144 155. [3] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1988. [4] Z. Li and J. Liu, Constrained clustering by spectral kernel learning, in Proc. of ICCV, 2009, pp. 421 427. [5] B. Eriksson, G. Dasarathy, A. Singh, and R. Nowak, Active clustering: Robust and efficient hierarchical clustering using adaptively selected similarities, CoRR, vol. 1102.3887, 2011. [6] C. Xiong, D. Johnson, and J. J. Corso, Spectral active clustering via purification of the k-nearest neighbor graph, in Proc. of European Conference on Data Mining, 2012. [7] X. Wang and I. Davidson, Active spectral clustering, in Proc. of ICDM, 2010, pp. 561 568. [8] A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Proc. of NIPS, 2001, pp. 849 856. [9] G.-S. Xia, W. Yang, J. Delon, and Y. Gousseau, Structural high-resolution satellite image indexing, in ISPRS TC VII Symposium-100 Years ISPRS, 2010, vol. 38, pp. 298 303. [10] Y. Yang and S. Newsam, Bag-of-visual-words and spatial extensions for land-use classification, in Proc.of SIGSPATIAL on Advances in GIS. ACM, 2010, pp. 270 279. [11] F. Hu, W. Yang, J. Chen, and H. Sun, Tile-level annotation of satellite images using multi-level max-margin discriminative random field, Remote Sensing, vol. 5, no. 5, pp. 2275 2291, 2013. [12] D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision, vol. 60, no. 2, pp. 91 110, 2004.