Schroedinger Eigenmaps with Nondiagonal Potentials for Spatial-Spectral Clustering of Hyperspectral Imagery


Nathan D. Cahill (a), Wojciech Czaja (b), and David W. Messinger (c)

(a) Center for Applied and Computational Mathematics, School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
(b) Department of Mathematics, University of Maryland, College Park, MD 20742, USA
(c) Digital Imaging and Remote Sensing Laboratory, Center for Imaging Science, Rochester Institute of Technology, Rochester, NY 14623, USA

Send correspondence to Nathan D. Cahill: nathan.cahill@rit.edu

ABSTRACT

Schroedinger Eigenmaps (SE) has recently emerged as a powerful graph-based technique for semi-supervised manifold learning and recovery. By extending the Laplacian of a graph constructed from hyperspectral imagery to incorporate barrier or cluster potentials, SE enables machine learning techniques that employ expert/labeled information provided at a subset of pixels. In this paper, we show how different types of nondiagonal potentials can be used within the SE framework in a way that allows for the integration of spatial and spectral information in unsupervised manifold learning and recovery. The nondiagonal potentials encode spatial proximity, which, when combined with the spectral proximity information in the original graph, yields a framework that is competitive with state-of-the-art spectral/spatial fusion approaches for clustering and subsequent classification of hyperspectral image data.

Keywords: Schroedinger eigenmaps, Laplacian eigenmaps, spatial-spectral fusion, dimensionality reduction

1. INTRODUCTION

In hyperspectral imagery, each image pixel typically comprises hundreds of spectral bands [1]. Hence, an $m \times n$ hyperspectral image with $d$ spectral bands can be thought of as a data set containing $mn$ points in a $d$-dimensional space. Because $d$ can be quite large, it can be difficult for analysts to effectively search the imagery to identify targets or anomalies. Furthermore, automated algorithms for classification, segmentation, and target/anomaly detection can require a massive amount of computation. To combat these issues, a variety of approaches have recently been proposed for performing dimensionality reduction on hyperspectral imagery. Since hyperspectral data cannot be assumed to lie on a linear manifold [2], many nonlinear approaches to dimensionality reduction have been investigated, including Local Linear Embedding (LLE) [3], Isometric Feature Mapping (ISOMAP) [4], Kernel Principal Components Analysis (KPCA) [5], and Laplacian Eigenmaps (LE) [6].

In this article, we focus on the LE algorithm, which involves constructing a graph representing the high-dimensional data and then using generalized eigenvectors of the graph Laplacian matrix as the basis for a lower-dimensional space in which local properties of the data are preserved. Recent research [7-9] has shown that, due to spatial correlations in hyperspectral imagery (especially high-resolution hyperspectral imagery), spatial information should be included, or fused, with the spectral information in order to more adequately represent the properties of the image data in the lower-dimensional space. Incorporating spatial information has been approached from multiple fronts: modifying the structure of the graph [7,8], modifying the edge weights [9], or fusing spatial and spectral Laplacian matrices and/or their generalized eigenvectors [8].
We propose a different generalization of the LE algorithm for dimensionality reduction of hyperspectral imagery in a manner that fuses spatial and spectral information. Our generalization, which we refer to as the Spatial-Spectral Schroedinger Eigenmaps (SSSE) algorithm, is based on adding nondiagonal potentials encoding spatial proximity to the Laplacian matrix of the original graph (which contains spectral proximity information). Adding these potentials changes the Laplacian operator into a Schroedinger operator, making our proposed algorithm an instance of the Schroedinger Eigenmaps (SE) algorithm [10]. (Originally, SE was proposed for semi-supervised dimensionality reduction and learning; in SSSE, the semi-supervision refers to knowledge of spatial proximity between pixels rather than knowledge of particular class labels.)

To illustrate the practicality of the SSSE algorithm, we performed experiments on publicly available hyperspectral images (Pavia University and Indian Pines). We used a subset of the ground-truth labels from these images to learn classifiers for predicting class labels from the SSSE reduced-dimension data. When comparing SSSE with eight other dimensionality reduction algorithms, the subsequent classification performance is competitive or superior in nearly all cases.

The remainder of this article is organized as follows. Section 2 provides mathematical preliminaries that describe the LE and SE algorithms, as well as prior-art approaches for spatial-spectral fusion in LE-based dimensionality reduction. Section 3 presents the proposed SSSE algorithm. Section 4 describes, carries out, and analyzes the results of classification experiments that illustrate the efficacy of the SSSE algorithm with respect to several prior-art algorithms. Finally, Section 5 provides some concluding remarks.

2. MATHEMATICAL PRELIMINARIES

In many areas of imaging analysis and computer vision, high-dimensional data intrinsically resides on a low-dimensional manifold in the high-dimensional space. The goal of dimensionality reduction algorithms is to reduce the number of dimensions in the data in a way that preserves properties of the low-dimensional manifold. Mathematically, if $X = \{x_1, \ldots, x_k\}$ is a set of points on a manifold $\mathcal{M} \subset \mathbb{R}^n$, dimensionality reduction algorithms aim to identify a set of corresponding points $Y = \{y_1, \ldots, y_k\}$ in $\mathbb{R}^m$, where $m \ll n$, so that the structure of $Y$ is somehow similar to that of $X$.

2.1 Laplacian Eigenmaps

The Laplacian Eigenmaps (LE) algorithm of Belkin and Niyogi [11] is a geometrically motivated nonlinear dimensionality reduction algorithm that is popular due to its computational efficiency, its locality-preserving properties, and its natural relationship to clustering algorithms. It involves the following three steps:

1. Construct an undirected graph $G = (X, E)$ whose vertices are the points in $X$ and whose edges $E$ are defined based on proximity between vertices. Proximity can be found either by $\epsilon$-neighborhoods or by (mutual) $k$-nearest neighbor search.

2. Define weights for the edges in $E$. One common method is to define weights according to the heat kernel; i.e., define the weight $W_{i,j} = \exp\!\left(-\|x_i - x_j\|^2/\sigma\right)$ if an edge exists between $x_i$ and $x_j$, or $W_{i,j} = 0$ otherwise.

3. Compute the smallest $m+1$ eigenvalues and eigenvectors of the generalized eigenvector problem $Lf = \lambda Df$, where $D$ is the diagonal weighted degree matrix defined by $D_{i,i} = \sum_j W_{i,j}$, and $L = D - W$ is the Laplacian matrix. If the resulting eigenvectors $f_0, \ldots, f_m$ are ordered so that $0 = \lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_m$, then the points $y_1^T, y_2^T, \ldots, y_k^T$ are defined to be the rows of $F = [f_1 \; f_2 \; \cdots \; f_m]$.

As noted by Belkin and Niyogi [11], the generalized eigenvector problem solved in the LE algorithm is identical to the one that emerges in the normalized cuts (NCut) algorithm [12,13] for clustering vertices of a graph into different classes. In fact, clustering can proceed directly on the points in $Y$, using a standard algorithm such as $k$-means clustering.
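To make the three steps above concrete, the following NumPy/SciPy sketch implements them on a small synthetic data set. The function name, the $k$-nearest-neighbor construction, the heat-kernel scale, and the dense generalized eigensolver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmaps(X, m=2, k=10, sigma=1.0):
    """Embed the rows of X into R^m following steps 1-3 of Section 2.1."""
    N = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')              # pairwise squared distances
    # Step 1: k-nearest-neighbor graph (symmetrized so it is undirected).
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    A = np.zeros((N, N), dtype=bool)
    A[np.repeat(np.arange(N), k), nn.ravel()] = True
    A = A | A.T
    # Step 2: heat-kernel weights on the edges.
    W = np.where(A, np.exp(-D2 / sigma), 0.0)
    # Step 3: generalized eigenproblem L f = lambda D f (dense solver for small N).
    D = np.diag(W.sum(axis=1))
    L = D - W
    lam, F = eigh(L, D)                          # eigenvalues returned in ascending order
    return F[:, 1:m + 1]                         # drop the trivial eigenvector f_0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))               # stand-in for mn pixels with d bands
    print(laplacian_eigenmaps(X, m=2).shape)     # (200, 2)
```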

2.2 Schroedinger Eigenmaps

Czaja and Ehler [10] proposed the Schroedinger Eigenmaps (SE) algorithm by generalizing the LE algorithm to incorporate a potential matrix $V$. The SE algorithm proceeds with the same steps as the LE algorithm, except that the generalized eigenvector problem in step (3) is replaced by the problem $(L + \alpha V)f = \lambda Df$, where $\alpha$ is a parameter chosen to relatively weight the contributions of the Laplacian matrix and the potential matrix. Two types of potentials have been explored for use in hyperspectral imaging analysis [14]: barriers and clusters. Barrier potentials are created by defining $V$ to be a nonnegative diagonal matrix; the positive entries in $V$ effectively pull the corresponding points in $Y$ towards the origin. Cluster potentials are created by defining $V$ to be the sum of nondiagonal matrices $V^{(i,j)}$ defined by:

$$V^{(i,j)}_{k,l} = \begin{cases} 1, & (k,l) \in \{(i,i),(j,j)\} \\ -1, & (k,l) \in \{(i,j),(j,i)\} \\ 0, & \text{otherwise.} \end{cases} \quad (1)$$

The inclusion of $V^{(i,j)}$ in $V$ effectively pulls, or clusters, $y_i$ and $y_j$ together.

A key benefit of SE is that the potential matrix $V$ enables semi-supervised clustering. If a subset of points in $X$ has a known label, defining $V$ to be a cluster potential will pull the corresponding points in $Y$ towards each other. This same behavior extends to multiple labels. Following dimensionality reduction via SE, a standard clustering algorithm (like $k$-means clustering) can be employed as in the previous section.

2.3 Spatial-Spectral Fusion

When the manifold under investigation describes image data, it is not only the spectral (intensity) information at each pixel in the image that influences the structure of the manifold, but also the spatial relationships between the spectra of neighboring pixels. To handle both spectral and spatial information mathematically, a manifold point $x_i$ is represented by concatenating a pixel's spectral information $x_i^f$ and its spatial location $x_i^p$; i.e., $x_i^T = \left[ x_i^{f\,T} \; x_i^{p\,T} \right]$. There are multiple ways of proceeding with LE-based dimensionality reduction (and clustering) that have been explored in the literature.

2.3.1 Shi-Malik

Shi and Malik [12,13] describe how to handle graph construction and edge weight definition in a manner that incorporates both spectral and spatial information. Applied in an LE-based dimensionality reduction algorithm, this technique can be described by the following steps:

1. Construct $G$ so that the set of edges $E$ is defined based on $\epsilon$-neighborhoods of the spatial locations; i.e., define an edge between $x_i$ and $x_j$ if $\|x_i^p - x_j^p\|_2 < \epsilon$.

2. Define edge weights by:

$$W_{i,j} = \begin{cases} \exp\!\left( -\dfrac{\|x_i^f - x_j^f\|^2}{\sigma_f^2} - \dfrac{\|x_i^p - x_j^p\|^2}{\sigma_p^2} \right), & (x_i, x_j) \in E \\ 0, & \text{otherwise.} \end{cases} \quad (2)$$

3. Proceed with step (3) of the LE algorithm defined in Section 2.1.

2.3.2 Gilles-Bowles

Gilles and Bowles [9] modify the approach of Shi and Malik to penalize differences in the direction of the spectral information rather than the norm of their differences, and they illustrate how this modification is useful in segmenting hyperspectral images. The difference between the Gilles-Bowles and Shi-Malik approaches is that the edge weights in (2) are replaced by:

$$W_{i,j} = \begin{cases} \exp\!\left( -\cos^{-1}\!\left( \dfrac{\langle x_i^f, x_j^f \rangle}{\|x_i^f\| \, \|x_j^f\|} \right) - \dfrac{\|x_i^p - x_j^p\|^2}{\sigma_p^2} \right), & (x_i, x_j) \in E \\ 0, & \text{otherwise.} \end{cases} \quad (3)$$
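To make the cluster-potential mechanism of Eq. (1) concrete, the sketch below accumulates the $V^{(i,j)}$ matrices for a set of labeled pairs and solves $(L + \alpha V)f = \lambda Df$. The toy weight matrix, function names, and parameter values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def cluster_potential(N, pairs):
    """Sum of the V^(i,j) matrices of Eq. (1): +1 at (i,i) and (j,j), -1 at (i,j) and (j,i)."""
    V = np.zeros((N, N))
    for i, j in pairs:
        V[i, i] += 1.0
        V[j, j] += 1.0
        V[i, j] -= 1.0
        V[j, i] -= 1.0
    return V

def schroedinger_eigenmaps(L, D, V, alpha, m=2):
    """Solve (L + alpha*V) f = lambda D f and return the m nontrivial eigenvectors."""
    lam, F = eigh(L + alpha * V, D)              # eigenvalues returned in ascending order
    return F[:, 1:m + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.random((30, 30)); W = 0.5 * (W + W.T); np.fill_diagonal(W, 0.0)  # toy weights
    D = np.diag(W.sum(axis=1))
    L = D - W
    V = cluster_potential(30, pairs=[(0, 1), (1, 2)])    # pull points 0, 1, 2 together
    print(schroedinger_eigenmaps(L, D, V, alpha=10.0).shape)   # (30, 2)
```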

2.3.3 Hou-Zhang-Ye-Zheng

Hou et al. [7] propose a slightly different approach to fusing spectral and spatial information in an LE-based algorithm within a system for classifying regions of hyperspectral imagery. Instead of the Shi-Malik and Gilles-Bowles strategy of defining graph edges based solely on spatial information and weights based on fused spectral-spatial information, Hou et al. use the fused spectral-spatial information in the step of defining the graph edges and then use binary weights; i.e.:

1. Construct $G$ so that the set of edges $E$ is defined based on $k$-nearest neighbors according to a fused spectral-spatial metric; i.e., define an edge between $x_i$ and $x_j$ if $x_i$ and $x_j$ are mutually in the $k$-nearest neighbors of each other according to the measure:

$$d(x_i, x_j) = \left( 1 - \exp\!\left( -\frac{\|x_i^f - x_j^f\|^2}{2\sigma_f^2} \right) \right) \left( 1 - \exp\!\left( -\frac{\|x_i^p - x_j^p\|^2}{2\sigma_p^2} \right) \right). \quad (4)$$

2. Define binary edge weights:

$$W_{i,j} = \begin{cases} 1, & (x_i, x_j) \in E \\ 0, & \text{otherwise.} \end{cases} \quad (5)$$

3. Proceed with step (3) of the LE algorithm defined in Section 2.1.

2.3.4 Benedetto et al.

Benedetto et al. [8] propose a variety of ways to fuse spectral and spatial information into an LE-based algorithm that is used in conjunction with linear discriminant analysis (LDA) to classify hyperspectral imagery. To unify their various proposed techniques, we introduce the metric:

$$d_\beta(x_i, x_j) = \left( \beta \, \frac{\|x_i^f - x_j^f\|_2^2}{\sigma_f^2} + (1 - \beta) \, \frac{\|x_i^p - x_j^p\|_2^2}{\sigma_p^2} \right)^{1/2}, \quad (6)$$

where $0 \leq \beta \leq 1$. Note that $d_0$ measures scaled Euclidean distance based purely on spatial components, and $d_1$ measures scaled Euclidean distance based purely on spectral components. Furthermore, we define $G_\beta$ to be the graph constructed so that the set of edges $E_\beta$ is defined based on mutual $k$-nearest neighbors according to the metric $d_\beta(x_i, x_j)$. We also define the weight matrix $W_\beta$ componentwise by:

$$W^{(\beta)}_{i,j} = \begin{cases} \exp\!\left( -d_\beta(x_i, x_j)^2 \right), & (x_i, x_j) \in E_\beta \\ 0, & \text{otherwise,} \end{cases} \quad (7)$$

and we define the corresponding Laplacian matrix $L_\beta = D_\beta - W_\beta$.

With this notation, we can describe the following three flavors of LE-based manifold recovery proposed by Benedetto et al. [8]

Benedetto-E: Fused Eigenvectors. Perform the following steps:

1. Construct graphs $G_0$ and $G_1$ so that the sets of edges $E_0$ and $E_1$ are defined based on mutual $k$-nearest neighbors according to the metrics $d_0$ and $d_1$, respectively.

2. Define edge weights for $G_0$ and $G_1$ according to (7) with $\beta = 0$ and $1$, respectively.

3. Let $m = m_0 + m_1$. Compute the smallest $m_0 + 1$ eigenvalues and eigenvectors of $L_0 f^{(0)} = \lambda D_0 f^{(0)}$, and compute the smallest $m_1 + 1$ eigenvalues and eigenvectors of $L_1 f^{(1)} = \lambda D_1 f^{(1)}$. Assuming each set of eigenvectors is sorted so that the eigenvalues are increasing, the points $y_1^T, y_2^T, \ldots, y_k^T$ are defined to be the rows of $F = \left[ f_1^{(0)} \cdots f_{m_0}^{(0)} \; f_1^{(1)} \cdots f_{m_1}^{(1)} \right]$.
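As a concrete illustration of the fused metric (6) and weights (7), the sketch below computes $W_\beta$ on a mutual $k$-nearest-neighbor graph. The array names (`xf` for spectra, `xp` for pixel coordinates), default parameters, and dense implementation are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def fused_weights(xf, xp, beta, sigma_f=1.0, sigma_p=1.0, k=20):
    """Weight matrix W_beta of Eq. (7) on a mutual k-NN graph built with d_beta of Eq. (6)."""
    d2 = (beta * cdist(xf, xf, 'sqeuclidean') / sigma_f**2
          + (1.0 - beta) * cdist(xp, xp, 'sqeuclidean') / sigma_p**2)    # d_beta squared
    N = xf.shape[0]
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]          # k nearest neighbors (self excluded)
    A = np.zeros((N, N), dtype=bool)
    A[np.repeat(np.arange(N), k), nn.ravel()] = True
    A = A & A.T                                      # mutual k-NN: keep edges present both ways
    return np.where(A, np.exp(-d2), 0.0)

# beta = 0 yields the purely spatial graph G_0, beta = 1 the purely spectral graph G_1,
# and an intermediate beta the single fused graph used by Benedetto-M.
```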

Benedetto-L: Fused Laplacians. Perform steps (1) and (2) of Benedetto-E, and then:

3. Define a fused Laplacian matrix $L$ using one of three methods: (a) element-wise multiplication of $L_0$ and $L_1$, (b) the sum of $L_0$ and $L_1$, or (c) matrix multiplication of $L_1$ by $L_0$, followed by zeroing any components corresponding to edges not in $E_1$. (In fusion methods (a) and (c), the diagonals of the resulting matrices should be recomputed in order to ensure that they are valid Laplacian matrices; i.e., that the row sums are all zero.)

4. Proceed with step (3) of the LE algorithm defined in Section 2.1, using the fused Laplacian matrix $L$.

Benedetto-M: Fused Metric. Perform the standard LE algorithm using the graph $G_\beta$ with corresponding weight matrix $W_\beta$.

3. SPATIAL-SPECTRAL SCHROEDINGER EIGENMAPS FOR DIMENSIONALITY REDUCTION AND CLUSTERING

All of the prior-art approaches described in Section 2.3 for performing dimensionality reduction and clustering with fused spatial and spectral information are based on the LE algorithm. We propose a different approach for spatial-spectral dimensionality reduction and clustering: computing Schroedinger Eigenmaps on graphs defined with spectral information, using cluster potentials that encode spatial proximity. The proposed algorithm, which we denote SSSE (Spatial-Spectral Schroedinger Eigenmaps), proceeds as follows:

1. Construct an undirected graph $G = (X, E)$ whose vertices are the points in $X$ and whose edges $E$ are defined based on proximity between the spectral components of the vertices.

2. Define weights for the edges in $E$ based on spectral information. For example, define the weight $W_{i,j} = \exp\!\left(-\|x_i^f - x_j^f\|^2/\sigma_f^2\right)$ if an edge exists between $x_i$ and $x_j$, or $W_{i,j} = 0$ otherwise.

3. Define a cluster potential matrix $V$ that encodes proximity between the spatial components of the vertices:

$$V = \sum_{i=1}^{k} \sum_{x_j \in N_\epsilon^p(x_i)} \gamma_{i,j} \, \exp\!\left( -\frac{\|x_i^p - x_j^p\|^2}{\sigma_p^2} \right) V^{(i,j)}, \quad (8)$$

where $N_\epsilon^p(x_i)$ is the set of points in $X$ whose spatial components are in an $\epsilon$-neighborhood of the spatial components of $x_i$; i.e.,

$$N_\epsilon^p(x_i) = \left\{ x \in X \setminus x_i \;\; \text{s.t.} \;\; \|x_i^p - x^p\| \leq \epsilon \right\}, \quad (9)$$

$V^{(i,j)}$ is defined as in (1), and $\gamma_{i,j}$ can be chosen in a manner that provides greater influence for spatial neighbors having nearby spectral components.

4. Compute the smallest $m+1$ eigenvalues and eigenvectors of $(L + \alpha V)f = \lambda Df$, where $D$ is the diagonal weighted degree matrix defined by $D_{i,i} = \sum_j W_{i,j}$, and $L = D - W$ is the Laplacian matrix. If the resulting eigenvectors $f_0, \ldots, f_m$ are ordered so that $0 = \lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_m$, then the points $y_1^T, y_2^T, \ldots, y_k^T$ are defined to be the rows of $F = [f_1 \; f_2 \; \cdots \; f_m]$.

Following dimensionality reduction, a standard clustering algorithm (like $k$-means clustering) can be employed as in Sections 2.1-2.2.

Note the similarities between the SSSE algorithm and the Shi-Malik and Gilles-Bowles approaches described in Sections 2.3.1-2.3.2. If we choose $\gamma_{i,j} = \exp\!\left(-\|x_i^f - x_j^f\|^2/\sigma_f^2\right)$ or $\gamma_{i,j} = \exp\!\left(-\cos^{-1}\!\left( \langle x_i^f, x_j^f \rangle / (\|x_i^f\| \, \|x_j^f\|) \right)\right)$, then the coefficients of each $V^{(i,j)}$ in (8) are equivalent to the edge weights in (2) or (3), respectively. The benefit of SSSE is that, since these coefficients are applied to the cluster potentials (and not applied as edge weights on the graph $G$), the spatial neighborhood $N_\epsilon^p$ can be chosen to be quite small (even $\epsilon =$ one pixel) while still allowing $G$ to contain edges corresponding to spectrally similar points that may be spatially distant. Another advantage of SSSE over some of the other algorithms (specifically, the Hou-Zhang-Ye-Zheng and Benedetto-M algorithms) is that the impact of changing the relative magnitudes of the spatial and spectral scale parameters ($\sigma_f$ and $\sigma_p$) can be explored without having to repeat the graph construction step. Once a graph is constructed, any change with respect to $\sigma_p/\sigma_f$ can be achieved solely by modifying the cluster potential matrix.
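A minimal sketch of step 3 of the SSSE algorithm, for the choice $\gamma_{i,j} = \exp(-\|x_i^f - x_j^f\|^2/\sigma_f^2)$ and a 1-pixel spatial neighborhood, is given below. The array names (`xf` for spectra, `xp` for integer pixel coordinates) and the brute-force loop are assumptions made for illustration, not the authors' released implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def ssse_potential(xf, xp, sigma_f=1.0, sigma_p=1.0, eps=1.0):
    """Cluster potential V of Eq. (8) with gamma_ij = exp(-||xf_i - xf_j||^2 / sigma_f^2)."""
    N = xf.shape[0]
    V = np.zeros((N, N))
    tree = cKDTree(xp)                         # fast epsilon-neighborhood queries on pixel coords
    for i in range(N):
        for j in tree.query_ball_point(xp[i], eps):
            if j == i:
                continue                       # Eq. (9) excludes the point itself
            gamma = np.exp(-np.sum((xf[i] - xf[j]) ** 2) / sigma_f ** 2)
            spatial = np.exp(-np.sum((xp[i] - xp[j]) ** 2) / sigma_p ** 2)
            w = gamma * spatial
            V[i, i] += w; V[j, j] += w         # accumulate w * V^(i,j) from Eq. (1)
            V[i, j] -= w; V[j, i] -= w
    return V

# The SSSE embedding is then obtained from (L + alpha*V) f = lambda D f, where L and D
# come from the purely spectral graph of steps 1-2 (see the Laplacian Eigenmaps sketch).
```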

4. CLASSIFICATION EXPERIMENTS

In order to determine the efficacy of the proposed algorithm for spatial-spectral dimensionality reduction and to compare its performance with the prior-art algorithms described in Section 2.3, we perform classification experiments (after dimensionality reduction) using publicly available hyperspectral image data sets with manually labeled ground truth. The data sets, experiments, and results are described in this section.

4.1 Data

We use two publicly available datasets: Indian Pines and Pavia. The Indian Pines image, shown in Figs. 1a-1b, was captured by an AVIRIS spectrometer over the rural Indian Pines test site in northwestern Indiana, USA. The image contains $145 \times 145$ pixels with a spatial resolution of approximately 20 meters per pixel and 224 spectral bands, 4 of which we have discarded due to noise and water absorption. The image has been partially labeled, yielding 10249 ground-truth pixels associated with 16 classes. The Pavia image, a portion of which is shown in Figs. 1c-1d, was captured by a ROSIS sensor over the University of Pavia, Italy. The original image contains $610 \times 340$ pixels with a spatial resolution of approximately 1.3 meters per pixel and 115 spectral bands. A partial set of labels yields 42776 ground-truth pixels associated with 9 classes. We use a cropped subset ($610 \times 175$ pixels) of the original image in which the ground-truth labels are particularly spatially diverse.

Figure 1: Original images and ground truth: (a) Indian Pines, bands [29, 15, 12], (b) Indian Pines ground truth, (c) Pavia, bands [68, 30, 2], (d) Pavia ground truth.
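For reference, the sketch below shows one way to arrange an $m \times n \times d$ image cube into the spectral components $x^f$ and spatial components $x^p$ used throughout. The random stand-in cube and the function name are assumptions; loading of the actual Indian Pines or Pavia files is not shown.

```python
import numpy as np

def cube_to_points(cube):
    """Flatten an m x n x d hyperspectral cube into spectra xf (mn x d) and pixel coords xp (mn x 2)."""
    m, n, d = cube.shape
    xf = cube.reshape(m * n, d).astype(float)                  # one spectrum per pixel, row-major order
    rows, cols = np.meshgrid(np.arange(m), np.arange(n), indexing='ij')
    xp = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)   # pixel coordinates
    return xf, xp

# Example with the Indian Pines dimensions (220 bands remain after discarding the 4 noisy ones):
# xf, xp = cube_to_points(np.random.rand(145, 145, 220))
```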

4.2 Experimental Setup

To compare our proposed SSSE algorithm with prior-art algorithms, we use each algorithm to perform dimensionality reduction, and then we subsequently perform classification using the lower-dimensional embeddings in a manner similar to the protocol described in Benedetto et al. [8]. The classification step is performed using linear discriminant analysis (LDA) as implemented in MATLAB, with 10% and 1% of each class selected from the ground-truth pixels of the Indian Pines and Pavia images, respectively. We repeated classification 10 times and computed the mode of the results at each pixel to yield the final classification result. We used the resulting confusion matrices to compute per-class accuracy as well as overall accuracy (OA), average accuracy (AA), average precision (AP), average sensitivity (ASe), average specificity (ASp), and the Kappa coefficient ($\kappa$). Finally, we compared algorithms by determining whether differences in their Kappa coefficients were statistically significant using Z scores [15].

For the dimensionality reduction step, we use two versions of our proposed SSSE algorithm: SSSE1, which is SSSE with $\gamma_{i,j} = \exp\!\left(-\|x_i^f - x_j^f\|^2/\sigma_f^2\right)$, and SSSE2, which is SSSE with $\gamma_{i,j} = \exp\!\left(-\cos^{-1}\!\left( \langle x_i^f, x_j^f \rangle / (\|x_i^f\| \, \|x_j^f\|) \right)\right)$. We also use our own implementations of the following algorithms: SM (Shi-Malik), GB (Gilles-Bowles), HZYZ (Hou-Zhang-Ye-Zheng), BE (Benedetto-E), BL1 (Benedetto-L with element-wise multiplication of Laplacians), BL2 (Benedetto-L with addition of Laplacians), BL3 (Benedetto-L with matrix multiplication of Laplacians followed by zeroing of edges not in $E_1$), and BM (Benedetto-M).

A few notes about data treatment and parameter choices: Prior to dimensionality reduction, the spectral components of the data in $X$ are normalized so that $(1/k) \sum_{i=1}^{k} \|x_i^f\|_2 = 1$. We also assume that the components of $x^p$ are in units of pixels. We make the initial choice of $\sigma_f = \sigma_p = 1$ for each algorithm, but we adjust these parameters when necessary to improve performance. For all algorithms, we choose the reduced dimension to be $n = 50$ for Indian Pines and $n = 25$ for Pavia. For algorithms requiring graph construction via $k$-nearest neighbors (SSSE, HZYZ, BE, BL1, BL2, BL3, BM), we select $k = 20$. For the SSSE algorithm, we choose $\epsilon = 1$ pixel for defining the neighborhood $N_\epsilon^p(x_i)$. In addition, we introduce a parameter $\hat{\alpha}$ defined by $\alpha = \hat{\alpha} \, \mathrm{tr}(L)/\mathrm{tr}(V)$, in order to trade off the impact of $L$ and $V$ in a way that can be directly compared across images.
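Two of the bookkeeping steps above can be written down directly: the spectral normalization and the trace-based reparameterization $\alpha = \hat{\alpha}\,\mathrm{tr}(L)/\mathrm{tr}(V)$. A short sketch follows; the use of the Euclidean norm in the normalization is an assumption.

```python
import numpy as np

def normalize_spectra(xf):
    """Scale spectra so that (1/k) * sum_i ||x_i^f|| = 1, as described in Section 4.2."""
    return xf / np.mean(np.linalg.norm(xf, axis=1))

def alpha_from_alpha_hat(alpha_hat, L, V):
    """alpha = alpha_hat * tr(L) / tr(V), making alpha_hat comparable across images."""
    return alpha_hat * np.trace(L) / np.trace(V)

# The 17 logarithmically spaced trial values of alpha_hat used in Section 4.3 (1 to 100)
# correspond to np.logspace(0.0, 2.0, 17).
```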
4.3 Results

In the SSSE algorithm, fixing $\sigma_f = \sigma_p = 1$ leaves $\hat{\alpha}$ as the only free parameter. We tested classification after dimensionality reduction via SSSE1 and SSSE2 by selecting 17 logarithmically spaced values of $\hat{\alpha}$ ranging from 1 to 100. The resulting overall accuracy and average accuracy, precision, sensitivity, and specificity are shown as functions of $\hat{\alpha}$ in Fig. 2. Figures 3-4 show the resulting classification maps for a subset of these choices of $\hat{\alpha}$, as well as for the choice $\hat{\alpha} = 0$ (corresponding to the use of solely spectral information). For both sets of images, we selected the best value of $\hat{\alpha}$ to be the value that appears to best maximize all of the reported quantities (OA, AA, AP, ASe, ASp). For the Indian Pines image (for both SSSE1 and SSSE2), this value is $\hat{\alpha} = 17.78$, whereas for the Pavia image (again for both SSSE1 and SSSE2), it is $\hat{\alpha} = 23.71$. Numerical values of OA, AA, AP, ASe, ASp, and $\kappa$, as well as classification accuracy for each class, are reported in Tables 1-3.

Figure 2: Classification performance measures for SSSE1 (top) and SSSE2 (bottom) as functions of $\hat{\alpha}$, for the Indian Pines and Pavia images: overall accuracy (blue circles), average accuracy (green x's), average precision (red squares), average sensitivity (black +'s), average specificity (magenta triangles), and Kappa coefficient (yellow triangles). Dashed vertical lines indicate the best choice of $\hat{\alpha}$.

Figure 3: Classification results for the Indian Pines image after dimensionality reduction via SSSE1 (top row) and SSSE2 (bottom row) for various values of $\hat{\alpha}$ ($\hat{\alpha} = 0, 1.33, 3.16, 7.50, 17.78, 42.17, 100$).

Figure 4: Classification results for the Pavia image after dimensionality reduction via SSSE1 (top row) and SSSE2 (bottom row) for various values of $\hat{\alpha}$ ($\hat{\alpha} = 0, 1.33, 3.16, 7.50, 17.78, 42.17, 100$).

Also shown in Tables 1-4 are the results of classification after using (our implementations of) the prior-art algorithms for dimensionality reduction and determining the best choice of parameters for those algorithms. For the Indian Pines image, these best parameter choices are: SM: $\epsilon = 5$, $\sigma_f = 0.1$, $\sigma_p = 100$; GB: $\epsilon = 5$, $\sigma_f = 0.2$, $\sigma_p = 100$; HZYZ: $\sigma_f = 1$, $\sigma_p = 10$; BE: 8 spatial / 42 spectral eigenvectors, $\sigma_f = 1$, $\sigma_p = 10$; BM: $\sigma_f = 1$, $\sigma_p = 10$, $\beta = 0.98$. For the Pavia image, the best parameter choices are: SM: $\epsilon = 7$, $\sigma_f = 0.45$, $\sigma_p = 100$; GB: $\epsilon = 7$, $\sigma_f = 0.2$, $\sigma_p = 100$; HZYZ: $\sigma_f = 1$, $\sigma_p = 10$; BE: 5 spatial / 20 spectral eigenvectors, $\sigma_f = 1$, $\sigma_p = 10$; BM: $\sigma_f = 1$, $\sigma_p = 10$, $\beta = 0.95$. Note that we did not include results corresponding to BL1; performing element-wise multiplication of weights caused some rows of the resulting weight matrix to be numerically zero, leading to a graph that was not connected, so that the eigenvalue zero had multiplicity greater than one.

As can be seen in Table 1, for the Indian Pines image, the SSSE2 algorithm exhibits the best performance in terms of all of the global measures (OA, AA, AP, ASe, and ASp), and the SSSE1 algorithm exhibits the second-best performance. Other algorithms that perform fairly well on the Indian Pines image include SM, GB, HZYZ, and BE. To determine whether the differences in classification results from different algorithms may be statistically significant, we compute the standard normal deviate, Z, from the Kappa coefficients and their variance estimates; Z scores above 1.96 indicate statistically significant differences in the Kappa coefficient at the 95% confidence level. Table 2 shows when the resulting Z scores indicate statistically significant differences in classification performance for each pair of algorithms. From this table, we see that for the Indian Pines data, while the differences in performance between SSSE1 and SSSE2 are not statistically significant, both SSSE1 and SSSE2 exhibit statistically significant improvements over all other algorithms (with the exception of SSSE1 versus HZYZ, where the difference in performance is not statistically significant).

In Table 3, we see that for the Pavia image, the BE algorithm exhibits the best performance in terms of all of the global measures. However, SSSE2 and SSSE1 come in second and third place, respectively, in terms of most of the global measures (HZYZ outperforms SSSE1 in average precision). HZYZ also performs quite well on the Pavia image, and GB performs fairly well. Table 4 confirms this interpretation: the BE algorithm performs significantly better than the other algorithms. Excluding BE, the SSSE1 and SSSE2 algorithms perform significantly better than all remaining algorithms.
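The pairwise comparisons reported in Tables 2 and 4 follow the Z test described above. A minimal sketch, using Cohen's $\kappa$ from a confusion matrix and a simple large-sample variance approximation (the exact variance estimator of [15] may differ), is given below.

```python
import numpy as np

def kappa_and_variance(C):
    """Cohen's kappa and an approximate variance from a confusion matrix C (rows = reference)."""
    C = np.asarray(C, dtype=float)
    N = C.sum()
    po = np.trace(C) / N                              # observed agreement
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / N ** 2     # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    var = po * (1.0 - po) / (N * (1.0 - pe) ** 2)     # first-order approximation
    return kappa, var

def z_score(kappa1, var1, kappa2, var2):
    """Z > 1.96 indicates a significant difference at the 95% confidence level."""
    return abs(kappa1 - kappa2) / np.sqrt(var1 + var2)
```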

|          | No. of Samp. | SSSE1 | SSSE2 |  SM   |  GB   | HZYZ  |  BE   |  BL2  |  BL3  |  BM   |
| OA       |              | 95.45 | 95.64 | 92.10 | 93.32 | 94.87 | 92.97 | 65.29 | 62.49 | 87.70 |
| AA       |              | 99.43 | 99.45 | 99.01 | 99.16 | 99.36 | 99.12 | 95.66 | 95.31 | 98.46 |
| AP       |              | 96.63 | 96.77 | 92.81 | 94.51 | 96.15 | 94.02 | 69.82 | 63.44 | 90.43 |
| ASe      |              | 91.19 | 91.32 | 86.87 | 88.66 | 90.73 | 87.77 | 61.78 | 62.18 | 81.98 |
| ASp      |              | 99.68 | 99.69 | 99.45 | 99.53 | 99.64 | 99.51 | 97.53 | 97.33 | 99.14 |
| κ        |              | 94.81 | 95.02 | 90.99 | 92.37 | 94.15 | 91.98 | 60.59 | 57.02 | 86.03 |
| Class 1  |      46      | 99.99 | 99.99 | 100.0 | 100.0 | 99.99 | 99.99 | 99.37 | 99.18 | 99.99 |
| Class 2  |     1428     | 98.10 | 98.21 | 95.80 | 95.65 | 97.56 | 96.03 | 86.39 | 86.74 | 93.51 |
| Class 3  |      830     | 98.68 | 98.66 | 98.19 | 98.64 | 98.74 | 98.07 | 92.11 | 90.74 | 98.19 |
| Class 4  |      237     | 99.40 | 99.35 | 99.22 | 99.69 | 99.62 | 99.26 | 96.36 | 97.39 | 96.82 |
| Class 5  |      483     | 99.63 | 99.69 | 99.51 | 99.65 | 99.60 | 99.40 | 98.47 | 98.47 | 99.08 |
| Class 6  |      730     | 99.92 | 99.91 | 98.90 | 99.63 | 99.97 | 99.56 | 98.01 | 97.36 | 98.48 |
| Class 7  |      28      | 99.87 | 99.91 | 99.73 | 99.86 | 99.81 | 99.67 | 99.45 | 99.94 | 99.42 |
| Class 8  |      478     | 99.99 | 99.99 | 100.0 | 100.0 | 100.0 | 100.0 | 99.40 | 99.17 | 100.0 |
| Class 9  |      20      | 99.80 | 99.75 | 98.99 | 99.65 | 99.80 | 99.75 | 98.78 | 98.86 | 98.86 |
| Class 10 |      972     | 98.30 | 98.40 | 97.49 | 97.80 | 98.03 | 97.27 | 92.34 | 89.30 | 96.89 |
| Class 11 |     2455     | 98.16 | 98.24 | 98.62 | 98.54 | 98.26 | 98.63 | 83.70 | 81.47 | 97.75 |
| Class 12 |      593     | 99.39 | 99.53 | 98.74 | 98.72 | 98.88 | 98.75 | 92.83 | 93.25 | 97.51 |
| Class 13 |      205     | 99.95 | 99.96 | 99.98 | 99.98 | 99.99 | 99.98 | 99.40 | 99.27 | 99.98 |
| Class 14 |     1265     | 99.88 | 99.85 | 99.91 | 99.96 | 99.84 | 99.99 | 97.77 | 97.79 | 99.83 |
| Class 15 |      386     | 99.93 | 99.92 | 99.13 | 99.13 | 99.65 | 99.75 | 96.39 | 96.21 | 99.09 |
| Class 16 |      93      | 99.91 | 99.90 | 99.98 | 99.73 | 99.98 | 99.84 | 99.84 | 99.84 | 99.98 |

Table 1: Indian Pines classification results using various dimensionality reduction algorithms. OA = Overall Accuracy, AA = Average Accuracy, AP = Average Precision, ASe = Average Sensitivity, ASp = Average Specificity, κ = Kappa coefficient. Class rows report per-class accuracy. Classes: 1 = Alfalfa, 2 = Corn-notill, 3 = Corn-mintill, 4 = Corn, 5 = Grass-pasture, 6 = Grass-trees, 7 = Grass-pasture-mowed, 8 = Hay-windrowed, 9 = Oats, 10 = Soybean-notill, 11 = Soybean-mintill, 12 = Soybean-clean, 13 = Wheat, 14 = Woods, 15 = Buildings-Grass-Trees-Drives, 16 = Stone-Steel-Towers. All quantities (except number of samples) are percentages.

|       | SSSE1 | SSSE2 | SM | GB | HZYZ | BE | BL2 | BL3 | BM |
| SSSE1 |       |   o   | +  | +  |  o   | +  |  +  |  +  | +  |
| SSSE2 |   o   |       | +  | +  |  +   | +  |  +  |  +  | +  |
| SM    |   -   |   -   |    | -  |  -   | -  |  +  |  +  | +  |
| GB    |   -   |   -   | +  |    |  -   | o  |  +  |  +  | +  |
| HZYZ  |   o   |   -   | +  | +  |      | +  |  +  |  +  | +  |
| BE    |   -   |   -   | +  | o  |  -   |    |  +  |  +  | +  |
| BL2   |   -   |   -   | -  | -  |  -   | -  |     |  +  | -  |
| BL3   |   -   |   -   | -  | -  |  -   | -  |  -  |     | -  |
| BM    |   -   |   -   | -  | -  |  -   | -  |  +  |  +  |    |

Table 2: Statistical significance between κ values of classification algorithms on Indian Pines data. Each entry is + if κ is significantly larger in the row method versus the column method, - if κ is significantly smaller, and o if there is no significant difference. Significance is measured at the 95% confidence level.

|          | No. of Samp. | SSSE1 | SSSE2 |  SM   |  GB   | HZYZ  |  BE   |  BL2  |  BL3  |  BM   |
| OA       |              | 95.33 | 95.64 | 81.14 | 91.50 | 91.98 | 97.14 | 83.56 | 79.77 | 86.70 |
| AA       |              | 98.96 | 99.03 | 95.81 | 98.11 | 98.22 | 99.37 | 96.35 | 95.50 | 97.04 |
| AP       |              | 92.72 | 93.46 | 71.09 | 84.38 | 92.74 | 97.25 | 78.46 | 73.60 | 77.48 |
| ASe      |              | 96.12 | 96.46 | 70.31 | 87.57 | 92.78 | 97.75 | 82.21 | 75.57 | 75.14 |
| ASp      |              | 99.43 | 99.47 | 97.55 | 98.94 | 98.86 | 99.61 | 97.91 | 97.41 | 98.35 |
| κ        |              | 94.05 | 94.45 | 75.85 | 89.15 | 89.91 | 96.37 | 78.99 | 74.30 | 82.99 |
| Class 1  |     6631     | 99.24 | 99.20 | 91.86 | 96.97 | 98.13 | 99.02 | 94.96 | 92.57 | 93.97 |
| Class 2  |    18649     | 99.44 | 99.69 | 93.28 | 97.45 | 96.67 | 98.13 | 96.31 | 95.72 | 96.42 |
| Class 3  |     2099     | 97.13 | 97.31 | 98.29 | 99.18 | 97.12 | 99.96 | 89.79 | 91.43 | 99.03 |
| Class 4  |     3064     | 99.28 | 99.29 | 93.65 | 96.27 | 96.76 | 98.37 | 98.04 | 95.01 | 93.91 |
| Class 5  |     1345     | 99.91 | 99.79 | 98.93 | 99.53 | 99.96 | 99.87 | 99.91 | 98.42 | 99.19 |
| Class 6  |     5029     | 99.55 | 99.80 | 97.91 | 99.13 | 99.81 | 99.98 | 98.40 | 98.73 | 98.54 |
| Class 7  |     1330     | 99.95 | 99.98 | 97.58 | 99.90 | 99.47 | 99.96 | 97.40 | 95.50 | 98.45 |
| Class 8  |     3682     | 96.36 | 96.52 | 95.24 | 97.48 | 96.03 | 99.01 | 92.32 | 92.51 | 97.11 |
| Class 9  |      947     | 99.80 | 99.70 | 95.53 | 97.08 | 99.99 | 99.99 | 99.98 | 99.64 | 96.77 |

Table 3: Pavia classification results using various dimensionality reduction algorithms. OA = Overall Accuracy, AA = Average Accuracy, AP = Average Precision, ASe = Average Sensitivity, ASp = Average Specificity, κ = Kappa coefficient. Class rows report per-class accuracy. Classes: 1 = Asphalt, 2 = Meadows, 3 = Gravel, 4 = Trees, 5 = Painted metal sheets, 6 = Bare soil, 7 = Bitumen, 8 = Self-Blocking Bricks, 9 = Shadows. All quantities (except number of samples) are percentages.

|       | SSSE1 | SSSE2 | SM | GB | HZYZ | BE | BL2 | BL3 | BM |
| SSSE1 |       |   o   | +  | +  |  +   | -  |  +  |  +  | +  |
| SSSE2 |   o   |       | +  | +  |  +   | -  |  +  |  +  | +  |
| SM    |   -   |   -   |    | -  |  -   | -  |  -  |  +  | -  |
| GB    |   -   |   -   | +  |    |  -   | -  |  +  |  +  | +  |
| HZYZ  |   -   |   -   | +  | +  |      | -  |  +  |  +  | +  |
| BE    |   +   |   +   | +  | +  |  +   |    |  +  |  +  | +  |
| BL2   |   -   |   -   | +  | -  |  -   | -  |     |  +  | -  |
| BL3   |   -   |   -   | -  | -  |  -   | -  |  -  |     | -  |
| BM    |   -   |   -   | +  | -  |  -   | -  |  +  |  +  |    |

Table 4: Statistical significance between κ values of classification algorithms on Pavia data. Each entry is + if κ is significantly larger in the row method versus the column method, - if κ is significantly smaller, and o if there is no significant difference. Significance is measured at the 95% confidence level.

5. CONCLUSION

In this article, we proposed a new algorithm for dimensionality reduction using both the spatial and spectral information present in a hyperspectral image. The algorithm is based on Schroedinger Eigenmaps, which has traditionally been used for semi-supervised learning. By constructing a graph based solely on spectral information and then defining a cluster potential matrix that encodes spatial relationships between pixels, our proposed algorithm provides a natural way to trade off the relative impact of the spatial versus spectral information in the dimensionality reduction process. Classification experiments on publicly available hyperspectral images with manually labeled ground truth show that the proposed algorithm exhibits superior or competitive performance compared to a variety of prior-art algorithms for reducing the dimension of the data provided to a standard classification algorithm.

APPENDIX

Prototype implementations of the Spatial-Spectral Schroedinger Eigenmaps algorithms (SSSE1 and SSSE2) are available for download at MATLAB Central (http://www.mathworks.com/matlabcentral/) under File ID #45908.

ACKNOWLEDGEMENTS

The authors would like to thank Prof. Landgrebe (Purdue University, USA) for providing the Indian Pines data and Prof. Paolo Gamba (Pavia University, Italy) for providing the Pavia University data.

REFERENCES

[1] Schott, J. R., [Remote Sensing: The Image Chain Approach], Oxford University Press, 2nd ed. (2007).
[2] Prasad, S. and Bruce, L., "Limitations of principal components analysis for hyperspectral target recognition," IEEE Geoscience and Remote Sensing Letters 5, 625-629 (Oct 2008).
[3] Kim, D. and Finkel, L., "Hyperspectral image processing using locally linear embedding," in [Neural Engineering, 2003. Conference Proceedings. First International IEEE EMBS Conference on], 316-319 (March 2003).
[4] Bachmann, C., Ainsworth, T., and Fusina, R., "Exploiting manifold geometry in hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing 43, 441-454 (March 2005).
[5] Fauvel, M., Chanussot, J., and Benediktsson, J., "Kernel principal component analysis for the classification of hyperspectral remote sensing data of urban areas," EURASIP Journal on Advances in Signal Processing 2009(783194), 1-14 (2009).
[6] Halevy, A., Extensions of Laplacian Eigenmaps for Manifold Learning, PhD thesis, University of Maryland, College Park (2011).
[7] Hou, B., Zhang, X., Ye, Q., and Zheng, Y., "A novel method for hyperspectral image classification based on Laplacian eigenmap pixels distribution-flow," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6(3), 1602-1618 (2013).
[8] Benedetto, J., Czaja, W., Dobrosotskaya, J., Doster, T., Duke, K., and Gillis, D., "Integration of heterogeneous data for classification in hyperspectral satellite imagery," in [Proc. of SPIE Vol. 8390], 839027-1-839027-12 (June 2012).
[9] Gillis, D. B. and Bowles, J. H., "Hyperspectral image segmentation using spatial-spectral graphs," Proc. SPIE Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVIII 8390, 83901Q-1-83901Q-11 (2012).
[10] Czaja, W. and Ehler, M., "Schroedinger eigenmaps for the analysis of biomedical data," IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1274-1280 (May 2013).
[11] Belkin, M. and Niyogi, P., "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation 15, 1373-1396 (June 2003).
[12] Shi, J. and Malik, J., "Normalized cuts and image segmentation," in [Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on], 731-737 (1997).

[13] Shi, J. and Malik, J., "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888-905 (2000).
[14] Benedetto, J., Czaja, W., Dobrosotskaya, J., Doster, T., Duke, K., and Gillis, D., "Semi-supervised learning of heterogeneous data in remote sensing imagery," in [Proc. of SPIE Vol. 8401], 840104-1-840104-12 (June 2012).
[15] Senseman, G. M., Bagley, C. F., and Tweddale, S. A., "Accuracy assessment of the discrete classification of remotely-sensed digital data for landcover mapping," USACERL Technical Report EN-95/04, 1-27 (April 1995).