Schroedinger Eigenmaps with Nondiagonal Potentials for Spatial-Spectral Clustering of Hyperspectral Imagery


Nathan D. Cahill (a), Wojciech Czaja (b), and David W. Messinger (c)

(a) Center for Applied and Computational Mathematics, School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA
(b) Department of Mathematics, University of Maryland, College Park, MD 20742, USA
(c) Digital Imaging and Remote Sensing Laboratory, Center for Imaging Science, Rochester Institute of Technology, Rochester, NY 14623, USA

Send correspondence to Nathan D. Cahill: nathan.cahill@rit.edu

ABSTRACT

Schroedinger Eigenmaps (SE) has recently emerged as a powerful graph-based technique for semi-supervised manifold learning and recovery. By extending the Laplacian of a graph constructed from hyperspectral imagery to incorporate barrier or cluster potentials, SE enables machine learning techniques that employ expert/labeled information provided at a subset of pixels. In this paper, we show how different types of nondiagonal potentials can be used within the SE framework in a way that allows for the integration of spatial and spectral information in unsupervised manifold learning and recovery. The nondiagonal potentials encode spatial proximity, which, when combined with the spectral proximity information in the original graph, yields a framework that is competitive with state-of-the-art spectral/spatial fusion approaches for clustering and subsequent classification of hyperspectral image data.

Keywords: Schroedinger eigenmaps, Laplacian eigenmaps, spatial-spectral fusion, dimensionality reduction

1. INTRODUCTION

In hyperspectral imagery, each image pixel typically comprises hundreds of spectral bands [1]. Hence, an $m \times n$ hyperspectral image with $d$ spectral bands can be thought of as a data set containing $mn$ points in a $d$-dimensional space. Because $d$ can be quite large, it can be difficult for analysts to effectively search the imagery to identify targets or anomalies. Furthermore, automated algorithms for classification, segmentation, and target/anomaly detection can require a massive amount of computation. To combat these issues, a variety of approaches have recently been proposed for performing dimensionality reduction on hyperspectral imagery. Since hyperspectral data cannot be assumed to lie on a linear manifold [2], many nonlinear approaches to dimensionality reduction have been investigated, including Local Linear Embedding (LLE) [3], Isometric Feature Mapping (ISOMAP) [4], Kernel Principal Components Analysis (KPCA) [5], and Laplacian Eigenmaps (LE) [6].

In this article, we focus on the LE algorithm, which involves constructing a graph representing the high-dimensional data and then using generalized eigenvectors of the graph Laplacian matrix as the basis for a lower-dimensional space in which local properties of the data are preserved. Recent research [7-9] has shown that, due to spatial correlations in hyperspectral imagery (especially high-resolution hyperspectral imagery), spatial information should be included, or fused, with the spectral information in order to more adequately represent the properties of the image data in the lower-dimensional space. Incorporating spatial information has been approached from multiple fronts: modifying the structure of the graph [7,8], modifying the edge weights [9], or fusing spatial and spectral Laplacian matrices and/or their generalized eigenvectors [8].
We propose a different generalization of the LE algorithm for dimensionality reduction of hyperspectral imagery in a manner that fuses spatial and spectral information. Our generalization, which we refer to as the Spatial-Spectral Schroedinger Eigenmaps (SSSE) algorithm, is based on adding nondiagonal potentials encoding spatial proximity to the Laplacian matrix of the original graph (which contains spectral proximity information). Adding these potentials changes the Laplacian operator into a Schroedinger operator, making our proposed algorithm an instance of the Schroedinger Eigenmaps (SE) algorithm [10]. (Originally, SE was proposed for semi-supervised dimensionality reduction and learning; in SSSE, the semi-supervision refers to knowledge of spatial proximity between pixels rather than knowledge of particular class labels.)

To illustrate the practicality of the SSSE algorithm, we performed experiments on publicly available hyperspectral images (Pavia University and Indian Pines). We used a subset of the ground-truth labels from these images to learn classifiers for predicting class labels from the SSSE reduced-dimension data. When comparing SSSE with eight other dimensionality reduction algorithms, the subsequent classification performance is competitive or superior in nearly all cases.

The remainder of this article is organized as follows. Section 2 provides mathematical preliminaries that describe the LE and SE algorithms, as well as prior-art approaches for spatial-spectral fusion in LE-based dimensionality reduction. Section 3 presents the proposed SSSE algorithm. Section 4 describes, carries out, and analyzes the results of classification experiments that illustrate the efficacy of the SSSE algorithm with respect to several prior-art algorithms. Finally, Section 5 provides some concluding remarks.

2. MATHEMATICAL PRELIMINARIES

In many areas of imaging analysis and computer vision, high-dimensional data intrinsically resides on a low-dimensional manifold in the high-dimensional space. The goal of dimensionality reduction algorithms is to reduce the number of dimensions in the data in a way that preserves properties of the low-dimensional manifold. Mathematically, if $X = \{x_1, \ldots, x_k\}$ is a set of points on a manifold $\mathcal{M} \subset \mathbb{R}^n$, dimensionality reduction algorithms aim to identify a set of corresponding points $Y = \{y_1, \ldots, y_k\}$ in $\mathbb{R}^m$, where $m \ll n$, so that the structure of $Y$ is somehow similar to that of $X$.

2.1 Laplacian Eigenmaps

The Laplacian Eigenmaps (LE) algorithm of Belkin and Niyogi [11] is a geometrically motivated nonlinear dimensionality reduction algorithm that is popular due to its computational efficiency, its locality-preserving properties, and its natural relationship to clustering algorithms. It involves the following three steps:

1. Construct an undirected graph $G = (X, E)$ whose vertices are the points in $X$ and whose edges $E$ are defined based on proximity between vertices. Proximity can be found either by $\epsilon$-neighborhoods or by (mutual) $k$-nearest neighbor search.

2. Define weights for the edges in $E$. One common method is to define weights according to the heat kernel; i.e., define the weight $W_{i,j} = \exp\!\left(-\|x_i - x_j\|^2/\sigma\right)$ if an edge exists between $x_i$ and $x_j$, or $W_{i,j} = 0$ otherwise.

3. Compute the smallest $m+1$ eigenvalues and eigenvectors of the generalized eigenvector problem $Lf = \lambda Df$, where $D$ is the diagonal weighted degree matrix defined by $D_{i,i} = \sum_j W_{i,j}$, and $L = D - W$ is the Laplacian matrix. If the resulting eigenvectors $f_0, \ldots, f_m$ are ordered so that $0 = \lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_m$, then the points $y_1^T, y_2^T, \ldots, y_k^T$ are defined to be the rows of $F = [f_1 \; f_2 \; \cdots \; f_m]$.

As noted by Belkin and Niyogi [11], the generalized eigenvector problem solved in the LE algorithm is identical to the one that emerges in the normalized cuts (NCut) algorithm [12,13] for clustering vertices of a graph into different classes. In fact, clustering can proceed directly on the points in $Y$, using a standard algorithm such as $k$-means clustering.
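To make the three steps above concrete, the following NumPy/SciPy sketch implements them on a small synthetic data set. The function name, the $k$-nearest-neighbor construction, the heat-kernel scale, and the dense generalized eigensolver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmaps(X, m=2, k=10, sigma=1.0):
    """Embed the rows of X into R^m following steps 1-3 of Section 2.1."""
    N = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')              # pairwise squared distances
    # Step 1: k-nearest-neighbor graph (symmetrized so it is undirected).
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    A = np.zeros((N, N), dtype=bool)
    A[np.repeat(np.arange(N), k), nn.ravel()] = True
    A = A | A.T
    # Step 2: heat-kernel weights on the edges.
    W = np.where(A, np.exp(-D2 / sigma), 0.0)
    # Step 3: generalized eigenproblem L f = lambda D f (dense solver for small N).
    D = np.diag(W.sum(axis=1))
    L = D - W
    lam, F = eigh(L, D)                          # eigenvalues returned in ascending order
    return F[:, 1:m + 1]                         # drop the trivial eigenvector f_0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))               # stand-in for mn pixels with d bands
    print(laplacian_eigenmaps(X, m=2).shape)     # (200, 2)
```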

2.2 Schroedinger Eigenmaps

Czaja and Ehler [10] proposed the Schroedinger Eigenmaps (SE) algorithm by generalizing the LE algorithm to incorporate a potential matrix $V$. The SE algorithm proceeds with the same steps as the LE algorithm, except that the generalized eigenvector problem in step (3) is replaced by the problem $(L + \alpha V)f = \lambda Df$, where $\alpha$ is a parameter chosen to relatively weight the contributions of the Laplacian matrix and the potential matrix. Two types of potentials have been explored for use in hyperspectral imaging analysis [14]: barriers and clusters. Barrier potentials are created by defining $V$ to be a nonnegative diagonal matrix; the positive entries in $V$ effectively pull the corresponding points in $Y$ towards the origin. Cluster potentials are created by defining $V$ to be the sum of nondiagonal matrices $V^{(i,j)}$ defined by:

$$V^{(i,j)}_{k,l} = \begin{cases} 1, & (k,l) \in \{(i,i),(j,j)\} \\ -1, & (k,l) \in \{(i,j),(j,i)\} \\ 0, & \text{otherwise.} \end{cases} \quad (1)$$

The inclusion of $V^{(i,j)}$ in $V$ effectively pulls, or clusters, $y_i$ and $y_j$ together.

A key benefit of SE is that the potential matrix $V$ enables semi-supervised clustering. If a subset of points in $X$ has a known label, defining $V$ to be a cluster potential will pull the corresponding points in $Y$ towards each other. This same behavior extends to multiple labels. Following dimensionality reduction via SE, a standard clustering algorithm (like $k$-means clustering) can be employed as in the previous section.

2.3 Spatial-Spectral Fusion

When the manifold under investigation describes image data, it is not only the spectral (intensity) information at each pixel in the image that influences the structure of the manifold, but also the spatial relationships between the spectra of neighboring pixels. To handle both spectral and spatial information mathematically, a manifold point $x_i$ is represented by concatenating a pixel's spectral information $x_i^f$ and its spatial location $x_i^p$; i.e., $x_i^T = \left[ x_i^{f\,T} \; x_i^{p\,T} \right]$. There are multiple ways of proceeding with LE-based dimensionality reduction (and clustering) that have been explored in the literature.

2.3.1 Shi-Malik

Shi and Malik [12,13] describe how to handle graph construction and edge weight definition in a manner that incorporates both spectral and spatial information. Applied in an LE-based dimensionality reduction algorithm, this technique can be described by the following steps:

1. Construct $G$ so that the set of edges $E$ is defined based on $\epsilon$-neighborhoods of the spatial locations; i.e., define an edge between $x_i$ and $x_j$ if $\|x_i^p - x_j^p\|_2 < \epsilon$.

2. Define edge weights by:

$$W_{i,j} = \begin{cases} \exp\!\left( -\dfrac{\|x_i^f - x_j^f\|^2}{\sigma_f^2} - \dfrac{\|x_i^p - x_j^p\|^2}{\sigma_p^2} \right), & (x_i, x_j) \in E \\ 0, & \text{otherwise.} \end{cases} \quad (2)$$

3. Proceed with step (3) of the LE algorithm defined in Section 2.1.

2.3.2 Gilles-Bowles

Gilles and Bowles [9] modify the approach of Shi and Malik to penalize differences in the direction of the spectral information rather than the norm of their differences, and they illustrate how this modification is useful in segmenting hyperspectral images. The difference between the Gilles-Bowles and Shi-Malik approaches is that the edge weights in (2) are replaced by:

$$W_{i,j} = \begin{cases} \exp\!\left( -\cos^{-1}\!\left( \dfrac{\langle x_i^f, x_j^f \rangle}{\|x_i^f\| \, \|x_j^f\|} \right) - \dfrac{\|x_i^p - x_j^p\|^2}{\sigma_p^2} \right), & (x_i, x_j) \in E \\ 0, & \text{otherwise.} \end{cases} \quad (3)$$
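To make the cluster-potential mechanism of Eq. (1) concrete, the sketch below accumulates the $V^{(i,j)}$ matrices for a set of labeled pairs and solves $(L + \alpha V)f = \lambda Df$. The toy weight matrix, function names, and parameter values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def cluster_potential(N, pairs):
    """Sum of the V^(i,j) matrices of Eq. (1): +1 at (i,i) and (j,j), -1 at (i,j) and (j,i)."""
    V = np.zeros((N, N))
    for i, j in pairs:
        V[i, i] += 1.0
        V[j, j] += 1.0
        V[i, j] -= 1.0
        V[j, i] -= 1.0
    return V

def schroedinger_eigenmaps(L, D, V, alpha, m=2):
    """Solve (L + alpha*V) f = lambda D f and return the m nontrivial eigenvectors."""
    lam, F = eigh(L + alpha * V, D)              # eigenvalues returned in ascending order
    return F[:, 1:m + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.random((30, 30)); W = 0.5 * (W + W.T); np.fill_diagonal(W, 0.0)  # toy weights
    D = np.diag(W.sum(axis=1))
    L = D - W
    V = cluster_potential(30, pairs=[(0, 1), (1, 2)])    # pull points 0, 1, 2 together
    print(schroedinger_eigenmaps(L, D, V, alpha=10.0).shape)   # (30, 2)
```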

2.3.3 Hou-Zhang-Ye-Zheng

Hou et al. [7] propose a slightly different approach to fusing spectral and spatial information in an LE-based algorithm within a system for classifying regions of hyperspectral imagery. Instead of the Shi-Malik and Gilles-Bowles strategy of defining graph edges based solely on spatial information and weights based on fused spectral-spatial information, Hou et al. use the fused spectral-spatial information in the step of defining the graph edges and then use binary weights; i.e.:

1. Construct $G$ so that the set of edges $E$ is defined based on $k$-nearest neighbors according to a fused spectral-spatial metric; i.e., define an edge between $x_i$ and $x_j$ if $x_i$ and $x_j$ are mutually in the $k$-nearest neighbors of each other according to the measure:

$$d(x_i, x_j) = \left( 1 - \exp\!\left( -\frac{\|x_i^f - x_j^f\|^2}{2\sigma_f^2} \right) \right) \left( 1 - \exp\!\left( -\frac{\|x_i^p - x_j^p\|^2}{2\sigma_p^2} \right) \right). \quad (4)$$

2. Define binary edge weights:

$$W_{i,j} = \begin{cases} 1, & (x_i, x_j) \in E \\ 0, & \text{otherwise.} \end{cases} \quad (5)$$

3. Proceed with step (3) of the LE algorithm defined in Section 2.1.

2.3.4 Benedetto et al.

Benedetto et al. [8] propose a variety of ways to fuse spectral and spatial information into an LE-based algorithm that is used in conjunction with linear discriminant analysis (LDA) to classify hyperspectral imagery. To unify their various proposed techniques, we introduce the metric:

$$d_\beta(x_i, x_j) = \left( \beta \, \frac{\|x_i^f - x_j^f\|_2^2}{\sigma_f^2} + (1 - \beta) \, \frac{\|x_i^p - x_j^p\|_2^2}{\sigma_p^2} \right)^{1/2}, \quad (6)$$

where $0 \leq \beta \leq 1$. Note that $d_0$ measures scaled Euclidean distance based purely on spatial components, and $d_1$ measures scaled Euclidean distance based purely on spectral components. Furthermore, we define $G_\beta$ to be the graph constructed so that the set of edges $E_\beta$ is defined based on mutual $k$-nearest neighbors according to the metric $d_\beta(x_i, x_j)$. We also define the weight matrix $W_\beta$ componentwise by:

$$W^{(\beta)}_{i,j} = \begin{cases} \exp\!\left( -d_\beta(x_i, x_j)^2 \right), & (x_i, x_j) \in E_\beta \\ 0, & \text{otherwise,} \end{cases} \quad (7)$$

and we define the corresponding Laplacian matrix $L_\beta = D_\beta - W_\beta$.

With this notation, we can describe the following three flavors of LE-based manifold recovery proposed by Benedetto et al. [8]

Benedetto-E: Fused Eigenvectors. Perform the following steps:

1. Construct graphs $G_0$ and $G_1$ so that the sets of edges $E_0$ and $E_1$ are defined based on mutual $k$-nearest neighbors according to the metrics $d_0$ and $d_1$, respectively.

2. Define edge weights for $G_0$ and $G_1$ according to (7) with $\beta = 0$ and $1$, respectively.

3. Let $m = m_0 + m_1$. Compute the smallest $m_0 + 1$ eigenvalues and eigenvectors of $L_0 f^{(0)} = \lambda D_0 f^{(0)}$, and compute the smallest $m_1 + 1$ eigenvalues and eigenvectors of $L_1 f^{(1)} = \lambda D_1 f^{(1)}$. Assuming each set of eigenvectors is sorted so that the eigenvalues are increasing, the points $y_1^T, y_2^T, \ldots, y_k^T$ are defined to be the rows of $F = \left[ f_1^{(0)} \cdots f_{m_0}^{(0)} \; f_1^{(1)} \cdots f_{m_1}^{(1)} \right]$.
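As a concrete illustration of the fused metric (6) and weights (7), the sketch below computes $W_\beta$ on a mutual $k$-nearest-neighbor graph. The array names (`xf` for spectra, `xp` for pixel coordinates), default parameters, and dense implementation are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def fused_weights(xf, xp, beta, sigma_f=1.0, sigma_p=1.0, k=20):
    """Weight matrix W_beta of Eq. (7) on a mutual k-NN graph built with d_beta of Eq. (6)."""
    d2 = (beta * cdist(xf, xf, 'sqeuclidean') / sigma_f**2
          + (1.0 - beta) * cdist(xp, xp, 'sqeuclidean') / sigma_p**2)    # d_beta squared
    N = xf.shape[0]
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]          # k nearest neighbors (self excluded)
    A = np.zeros((N, N), dtype=bool)
    A[np.repeat(np.arange(N), k), nn.ravel()] = True
    A = A & A.T                                      # mutual k-NN: keep edges present both ways
    return np.where(A, np.exp(-d2), 0.0)

# beta = 0 yields the purely spatial graph G_0, beta = 1 the purely spectral graph G_1,
# and an intermediate beta the single fused graph used by Benedetto-M.
```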

Benedetto-L: Fused Laplacians. Perform steps (1) and (2) of Benedetto-E, and then:

3. Define a fused Laplacian matrix $L$ using one of three methods: (a) element-wise multiplication of $L_0$ and $L_1$, (b) the sum of $L_0$ and $L_1$, or (c) matrix multiplication of $L_1$ by $L_0$, followed by zeroing any components corresponding to edges not in $E_1$. (In fusion methods (a) and (c), the diagonals of the resulting matrices should be recomputed in order to ensure that they are valid Laplacian matrices; i.e., that the row sums are all zero.)

4. Proceed with step (3) of the LE algorithm defined in Section 2.1, using the fused Laplacian matrix $L$.

Benedetto-M: Fused Metric. Perform the standard LE algorithm using the graph $G_\beta$ with corresponding weight matrix $W_\beta$.

3. SPATIAL-SPECTRAL SCHROEDINGER EIGENMAPS FOR DIMENSIONALITY REDUCTION AND CLUSTERING

All of the prior-art approaches described in Section 2.3 for performing dimensionality reduction and clustering with fused spatial and spectral information are based on the LE algorithm. We propose a different approach for spatial-spectral dimensionality reduction and clustering: computing Schroedinger Eigenmaps on graphs defined with spectral information, using cluster potentials that encode spatial proximity. The proposed algorithm, which we denote SSSE (Spatial-Spectral Schroedinger Eigenmaps), proceeds as follows:

1. Construct an undirected graph $G = (X, E)$ whose vertices are the points in $X$ and whose edges $E$ are defined based on proximity between the spectral components of the vertices.

2. Define weights for the edges in $E$ based on spectral information. For example, define the weight $W_{i,j} = \exp\!\left(-\|x_i^f - x_j^f\|^2/\sigma_f^2\right)$ if an edge exists between $x_i$ and $x_j$, or $W_{i,j} = 0$ otherwise.

3. Define a cluster potential matrix $V$ that encodes proximity between the spatial components of the vertices:

$$V = \sum_{i=1}^{k} \sum_{x_j \in N_\epsilon^p(x_i)} \gamma_{i,j} \, \exp\!\left( -\frac{\|x_i^p - x_j^p\|^2}{\sigma_p^2} \right) V^{(i,j)}, \quad (8)$$

where $N_\epsilon^p(x_i)$ is the set of points in $X$ whose spatial components are in an $\epsilon$-neighborhood of the spatial components of $x_i$; i.e.,

$$N_\epsilon^p(x_i) = \left\{ x \in X \setminus x_i \;\; \text{s.t.} \;\; \|x_i^p - x^p\| \leq \epsilon \right\}, \quad (9)$$

$V^{(i,j)}$ is defined as in (1), and $\gamma_{i,j}$ can be chosen in a manner that provides greater influence for spatial neighbors having nearby spectral components.

4. Compute the smallest $m+1$ eigenvalues and eigenvectors of $(L + \alpha V)f = \lambda Df$, where $D$ is the diagonal weighted degree matrix defined by $D_{i,i} = \sum_j W_{i,j}$, and $L = D - W$ is the Laplacian matrix. If the resulting eigenvectors $f_0, \ldots, f_m$ are ordered so that $0 = \lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_m$, then the points $y_1^T, y_2^T, \ldots, y_k^T$ are defined to be the rows of $F = [f_1 \; f_2 \; \cdots \; f_m]$.

Following dimensionality reduction, a standard clustering algorithm (like $k$-means clustering) can be employed as in Sections 2.1-2.2.

Note the similarities between the SSSE algorithm and the Shi-Malik and Gilles-Bowles approaches described in Sections 2.3.1-2.3.2. If we choose $\gamma_{i,j} = \exp\!\left(-\|x_i^f - x_j^f\|^2/\sigma_f^2\right)$ or $\gamma_{i,j} = \exp\!\left(-\cos^{-1}\!\left( \langle x_i^f, x_j^f \rangle / (\|x_i^f\| \, \|x_j^f\|) \right)\right)$, then the coefficients of each $V^{(i,j)}$ in (8) are equivalent to the edge weights in (2) or (3), respectively. The benefit of SSSE is that, since these coefficients are applied to the cluster potentials (and not applied as edge weights on the graph $G$), the spatial neighborhood $N_\epsilon^p$ can be chosen to be quite small (even $\epsilon =$ one pixel) while still allowing $G$ to contain edges corresponding to spectrally similar points that may be spatially distant. Another advantage of SSSE over some of the other algorithms (specifically, the Hou-Zhang-Ye-Zheng and Benedetto-M algorithms) is that the impact of changing the relative magnitudes of the spatial and spectral scale parameters ($\sigma_f$ and $\sigma_p$) can be explored without having to repeat the graph construction step. Once a graph is constructed, any change with respect to $\sigma_p/\sigma_f$ can be achieved solely by modifying the cluster potential matrix.
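A minimal sketch of step 3 of the SSSE algorithm, for the choice $\gamma_{i,j} = \exp(-\|x_i^f - x_j^f\|^2/\sigma_f^2)$ and a 1-pixel spatial neighborhood, is given below. The array names (`xf` for spectra, `xp` for integer pixel coordinates) and the brute-force loop are assumptions made for illustration, not the authors' released implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def ssse_potential(xf, xp, sigma_f=1.0, sigma_p=1.0, eps=1.0):
    """Cluster potential V of Eq. (8) with gamma_ij = exp(-||xf_i - xf_j||^2 / sigma_f^2)."""
    N = xf.shape[0]
    V = np.zeros((N, N))
    tree = cKDTree(xp)                         # fast epsilon-neighborhood queries on pixel coords
    for i in range(N):
        for j in tree.query_ball_point(xp[i], eps):
            if j == i:
                continue                       # Eq. (9) excludes the point itself
            gamma = np.exp(-np.sum((xf[i] - xf[j]) ** 2) / sigma_f ** 2)
            spatial = np.exp(-np.sum((xp[i] - xp[j]) ** 2) / sigma_p ** 2)
            w = gamma * spatial
            V[i, i] += w; V[j, j] += w         # accumulate w * V^(i,j) from Eq. (1)
            V[i, j] -= w; V[j, i] -= w
    return V

# The SSSE embedding is then obtained from (L + alpha*V) f = lambda D f, where L and D
# come from the purely spectral graph of steps 1-2 (see the Laplacian Eigenmaps sketch).
```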

4. CLASSIFICATION EXPERIMENTS

In order to determine the efficacy of the proposed algorithm for spatial-spectral dimensionality reduction and to compare its performance with the prior-art algorithms described in Section 2.3, we perform classification experiments (after dimensionality reduction) using publicly available hyperspectral image data sets with manually labeled ground truth. The data sets, experiments, and results are described in this section.

4.1 Data

We use two publicly available datasets: Indian Pines and Pavia. The Indian Pines image, shown in Figs. 1a-1b, was captured by an AVIRIS spectrometer over the rural Indian Pines test site in northwestern Indiana, USA. The image contains $145 \times 145$ pixels with a spatial resolution of approximately 20 meters per pixel and 224 spectral bands, 4 of which we have discarded due to noise and water absorption. The image has been partially labeled, yielding 10249 ground-truth pixels associated with 16 classes. The Pavia image, a portion of which is shown in Figs. 1c-1d, was captured by a ROSIS sensor over the University of Pavia, Italy. The original image contains $610 \times 340$ pixels with a spatial resolution of approximately 1.3 meters per pixel and 115 spectral bands. A partial set of labels yields 42776 ground-truth pixels associated with 9 classes. We use a cropped subset ($610 \times 175$ pixels) of the original image in which the ground-truth labels are particularly spatially diverse.

Figure 1: Original images and ground truth: (a) Indian Pines, bands [29, 15, 12], (b) Indian Pines ground truth, (c) Pavia, bands [68, 30, 2], (d) Pavia ground truth.
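For reference, the sketch below shows one way to arrange an $m \times n \times d$ image cube into the spectral components $x^f$ and spatial components $x^p$ used throughout. The random stand-in cube and the function name are assumptions; loading of the actual Indian Pines or Pavia files is not shown.

```python
import numpy as np

def cube_to_points(cube):
    """Flatten an m x n x d hyperspectral cube into spectra xf (mn x d) and pixel coords xp (mn x 2)."""
    m, n, d = cube.shape
    xf = cube.reshape(m * n, d).astype(float)                  # one spectrum per pixel, row-major order
    rows, cols = np.meshgrid(np.arange(m), np.arange(n), indexing='ij')
    xp = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)   # pixel coordinates
    return xf, xp

# Example with the Indian Pines dimensions (220 bands remain after discarding the 4 noisy ones):
# xf, xp = cube_to_points(np.random.rand(145, 145, 220))
```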

4.2 Experimental Setup

To compare our proposed SSSE algorithm with prior-art algorithms, we use each algorithm to perform dimensionality reduction, and then we subsequently perform classification using the lower-dimensional embeddings in a manner similar to the protocol described in Benedetto et al. [8]. The classification step is performed using linear discriminant analysis (LDA) as implemented in MATLAB, with 10% and 1% of each class selected from the ground-truth pixels of the Indian Pines and Pavia images, respectively. We repeated classification 10 times and computed the mode of the results at each pixel to yield the final classification result. We used the resulting confusion matrices to compute per-class accuracy as well as overall accuracy (OA), average accuracy (AA), average precision (AP), average sensitivity (ASe), average specificity (ASp), and the Kappa coefficient ($\kappa$). Finally, we compared algorithms by determining whether differences in their Kappa coefficients were statistically significant using Z scores [15].

For the dimensionality reduction step, we use two versions of our proposed SSSE algorithm: SSSE1, which is SSSE with $\gamma_{i,j} = \exp\!\left(-\|x_i^f - x_j^f\|^2/\sigma_f^2\right)$, and SSSE2, which is SSSE with $\gamma_{i,j} = \exp\!\left(-\cos^{-1}\!\left( \langle x_i^f, x_j^f \rangle / (\|x_i^f\| \, \|x_j^f\|) \right)\right)$. We also use our own implementations of the following algorithms: SM (Shi-Malik), GB (Gilles-Bowles), HZYZ (Hou-Zhang-Ye-Zheng), BE (Benedetto-E), BL1 (Benedetto-L with element-wise multiplication of Laplacians), BL2 (Benedetto-L with addition of Laplacians), BL3 (Benedetto-L with matrix multiplication of Laplacians followed by zeroing of edges not in $E_1$), and BM (Benedetto-M).

A few notes about data treatment and parameter choices: Prior to dimensionality reduction, the spectral components of the data in $X$ are normalized so that $(1/k) \sum_{i=1}^{k} \|x_i^f\|_2 = 1$. We also assume that the components of $x^p$ are in units of pixels. We make the initial choice of $\sigma_f = \sigma_p = 1$ for each algorithm, but we adjust these parameters when necessary to improve performance. For all algorithms, we choose the reduced dimension to be $n = 50$ for Indian Pines and $n = 25$ for Pavia. For algorithms requiring graph construction via $k$-nearest neighbors (SSSE, HZYZ, BE, BL1, BL2, BL3, BM), we select $k = 20$. For the SSSE algorithm, we choose $\epsilon = 1$ pixel for defining the neighborhood $N_\epsilon^p(x_i)$. In addition, we introduce a parameter $\hat{\alpha}$ defined by $\alpha = \hat{\alpha} \, \mathrm{tr}(L)/\mathrm{tr}(V)$, in order to trade off the impact of $L$ and $V$ in a way that can be directly compared across images.
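Two of the bookkeeping steps above can be written down directly: the spectral normalization and the trace-based reparameterization $\alpha = \hat{\alpha}\,\mathrm{tr}(L)/\mathrm{tr}(V)$. A short sketch follows; the use of the Euclidean norm in the normalization is an assumption.

```python
import numpy as np

def normalize_spectra(xf):
    """Scale spectra so that (1/k) * sum_i ||x_i^f|| = 1, as described in Section 4.2."""
    return xf / np.mean(np.linalg.norm(xf, axis=1))

def alpha_from_alpha_hat(alpha_hat, L, V):
    """alpha = alpha_hat * tr(L) / tr(V), making alpha_hat comparable across images."""
    return alpha_hat * np.trace(L) / np.trace(V)

# The 17 logarithmically spaced trial values of alpha_hat used in Section 4.3 (1 to 100)
# correspond to np.logspace(0.0, 2.0, 17).
```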
4.3 Results

In the SSSE algorithm, fixing $\sigma_f = \sigma_p = 1$ leaves $\hat{\alpha}$ as the only free parameter. We tested classification after dimensionality reduction via SSSE1 and SSSE2 by selecting 17 logarithmically spaced values of $\hat{\alpha}$ ranging from 1 to 100. The resulting overall accuracy and average accuracy, precision, sensitivity, and specificity are shown as functions of $\hat{\alpha}$ in Fig. 2. Figures 3-4 show the resulting classification maps for a subset of these choices of $\hat{\alpha}$, as well as for the choice $\hat{\alpha} = 0$ (corresponding to the use of solely spectral information). For both sets of images, we selected the best value of $\hat{\alpha}$ to be the value that appears to best maximize all of the reported quantities (OA, AA, AP, ASe, ASp). For the Indian Pines image (for both SSSE1 and SSSE2), this value is $\hat{\alpha} = 17.78$, whereas for the Pavia image (again for both SSSE1 and SSSE2), it is $\hat{\alpha} = 23.71$. Numerical values of OA, AA, AP, ASe, ASp, and $\kappa$, as well as classification accuracy for each class, are reported in Tables 1-3.

Figure 2: Classification performance measures for SSSE1 (top) and SSSE2 (bottom) as functions of $\hat{\alpha}$, for the Indian Pines and Pavia images: overall accuracy (blue circles), average accuracy (green x's), average precision (red squares), average sensitivity (black +'s), average specificity (magenta triangles), and Kappa coefficient (yellow triangles). Dashed vertical lines indicate the best choice of $\hat{\alpha}$.

Figure 3: Classification results for the Indian Pines image after dimensionality reduction via SSSE1 (top row) and SSSE2 (bottom row) for various values of $\hat{\alpha}$ ($\hat{\alpha} = 0, 1.33, 3.16, 7.50, 17.78, 42.17, 100$).

Figure 4: Classification results for the Pavia image after dimensionality reduction via SSSE1 (top row) and SSSE2 (bottom row) for various values of $\hat{\alpha}$ ($\hat{\alpha} = 0, 1.33, 3.16, 7.50, 17.78, 42.17, 100$).

Also shown in Tables 1-4 are the results of classification after using (our implementations of) the prior-art algorithms for dimensionality reduction and determining the best choice of parameters for those algorithms. For the Indian Pines image, these best parameter choices are: SM: $\epsilon = 5$, $\sigma_f = 0.1$, $\sigma_p = 100$; GB: $\epsilon = 5$, $\sigma_f = 0.2$, $\sigma_p = 100$; HZYZ: $\sigma_f = 1$, $\sigma_p = 10$; BE: 8 spatial / 42 spectral eigenvectors, $\sigma_f = 1$, $\sigma_p = 10$; BM: $\sigma_f = 1$, $\sigma_p = 10$, $\beta = 0.98$. For the Pavia image, the best parameter choices are: SM: $\epsilon = 7$, $\sigma_f = 0.45$, $\sigma_p = 100$; GB: $\epsilon = 7$, $\sigma_f = 0.2$, $\sigma_p = 100$; HZYZ: $\sigma_f = 1$, $\sigma_p = 10$; BE: 5 spatial / 20 spectral eigenvectors, $\sigma_f = 1$, $\sigma_p = 10$; BM: $\sigma_f = 1$, $\sigma_p = 10$, $\beta = 0.95$. Note that we did not include results corresponding to BL1; performing element-wise multiplication of weights caused some rows of the resulting weight matrix to be numerically zero, leading to a graph that was not connected, so that the eigenvalue zero had multiplicity greater than one.

As can be seen in Table 1, for the Indian Pines image, the SSSE2 algorithm exhibits the best performance in terms of all of the global measures (OA, AA, AP, ASe, and ASp), and the SSSE1 algorithm exhibits the second-best performance. Other algorithms that perform fairly well on the Indian Pines image include SM, GB, HZYZ, and BE. To determine whether the differences in classification results from different algorithms may be statistically significant, we compute the standard normal deviate, Z, from the Kappa coefficients and their variance estimates; Z scores above 1.96 indicate statistically significant differences in the Kappa coefficient at the 95% confidence level. Table 2 shows when the resulting Z scores indicate statistically significant differences in classification performance for each pair of algorithms. From this table, we see that for the Indian Pines data, while the differences in performance between SSSE1 and SSSE2 are not statistically significant, both SSSE1 and SSSE2 exhibit statistically significant improvements over all other algorithms (with the exception of SSSE1 versus HZYZ, where the difference in performance is not statistically significant).

In Table 3, we see that for the Pavia image, the BE algorithm exhibits the best performance in terms of all of the global measures. However, SSSE2 and SSSE1 come in second and third place, respectively, in terms of most of the global measures (HZYZ outperforms SSSE1 in average precision). HZYZ also performs quite well on the Pavia image, and GB performs fairly well. Table 4 confirms this interpretation: the BE algorithm performs significantly better than the other algorithms. Excluding BE, the SSSE1 and SSSE2 algorithms perform significantly better than all remaining algorithms.
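The pairwise comparisons reported in Tables 2 and 4 follow the Z test described above. A minimal sketch, using Cohen's $\kappa$ from a confusion matrix and a simple large-sample variance approximation (the exact variance estimator of [15] may differ), is given below.

```python
import numpy as np

def kappa_and_variance(C):
    """Cohen's kappa and an approximate variance from a confusion matrix C (rows = reference)."""
    C = np.asarray(C, dtype=float)
    N = C.sum()
    po = np.trace(C) / N                              # observed agreement
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / N ** 2     # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    var = po * (1.0 - po) / (N * (1.0 - pe) ** 2)     # first-order approximation
    return kappa, var

def z_score(kappa1, var1, kappa2, var2):
    """Z > 1.96 indicates a significant difference at the 95% confidence level."""
    return abs(kappa1 - kappa2) / np.sqrt(var1 + var2)
```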

|          | No. of Samp. | SSSE1 | SSSE2 |  SM   |  GB   | HZYZ  |  BE   |  BL2  |  BL3  |  BM   |
| OA       |              | 95.45 | 95.64 | 92.10 | 93.32 | 94.87 | 92.97 | 65.29 | 62.49 | 87.70 |
| AA       |              | 99.43 | 99.45 | 99.01 | 99.16 | 99.36 | 99.12 | 95.66 | 95.31 | 98.46 |
| AP       |              | 96.63 | 96.77 | 92.81 | 94.51 | 96.15 | 94.02 | 69.82 | 63.44 | 90.43 |
| ASe      |              | 91.19 | 91.32 | 86.87 | 88.66 | 90.73 | 87.77 | 61.78 | 62.18 | 81.98 |
| ASp      |              | 99.68 | 99.69 | 99.45 | 99.53 | 99.64 | 99.51 | 97.53 | 97.33 | 99.14 |
| κ        |              | 94.81 | 95.02 | 90.99 | 92.37 | 94.15 | 91.98 | 60.59 | 57.02 | 86.03 |
| Class 1  |      46      | 99.99 | 99.99 | 100.0 | 100.0 | 99.99 | 99.99 | 99.37 | 99.18 | 99.99 |
| Class 2  |     1428     | 98.10 | 98.21 | 95.80 | 95.65 | 97.56 | 96.03 | 86.39 | 86.74 | 93.51 |
| Class 3  |      830     | 98.68 | 98.66 | 98.19 | 98.64 | 98.74 | 98.07 | 92.11 | 90.74 | 98.19 |
| Class 4  |      237     | 99.40 | 99.35 | 99.22 | 99.69 | 99.62 | 99.26 | 96.36 | 97.39 | 96.82 |
| Class 5  |      483     | 99.63 | 99.69 | 99.51 | 99.65 | 99.60 | 99.40 | 98.47 | 98.47 | 99.08 |
| Class 6  |      730     | 99.92 | 99.91 | 98.90 | 99.63 | 99.97 | 99.56 | 98.01 | 97.36 | 98.48 |
| Class 7  |      28      | 99.87 | 99.91 | 99.73 | 99.86 | 99.81 | 99.67 | 99.45 | 99.94 | 99.42 |
| Class 8  |      478     | 99.99 | 99.99 | 100.0 | 100.0 | 100.0 | 100.0 | 99.40 | 99.17 | 100.0 |
| Class 9  |      20      | 99.80 | 99.75 | 98.99 | 99.65 | 99.80 | 99.75 | 98.78 | 98.86 | 98.86 |
| Class 10 |      972     | 98.30 | 98.40 | 97.49 | 97.80 | 98.03 | 97.27 | 92.34 | 89.30 | 96.89 |
| Class 11 |     2455     | 98.16 | 98.24 | 98.62 | 98.54 | 98.26 | 98.63 | 83.70 | 81.47 | 97.75 |
| Class 12 |      593     | 99.39 | 99.53 | 98.74 | 98.72 | 98.88 | 98.75 | 92.83 | 93.25 | 97.51 |
| Class 13 |      205     | 99.95 | 99.96 | 99.98 | 99.98 | 99.99 | 99.98 | 99.40 | 99.27 | 99.98 |
| Class 14 |     1265     | 99.88 | 99.85 | 99.91 | 99.96 | 99.84 | 99.99 | 97.77 | 97.79 | 99.83 |
| Class 15 |      386     | 99.93 | 99.92 | 99.13 | 99.13 | 99.65 | 99.75 | 96.39 | 96.21 | 99.09 |
| Class 16 |      93      | 99.91 | 99.90 | 99.98 | 99.73 | 99.98 | 99.84 | 99.84 | 99.84 | 99.98 |

Table 1: Indian Pines classification results using various dimensionality reduction algorithms. OA = Overall Accuracy, AA = Average Accuracy, AP = Average Precision, ASe = Average Sensitivity, ASp = Average Specificity, κ = Kappa coefficient. Class rows report per-class accuracy. Classes: 1 = Alfalfa, 2 = Corn-notill, 3 = Corn-mintill, 4 = Corn, 5 = Grass-pasture, 6 = Grass-trees, 7 = Grass-pasture-mowed, 8 = Hay-windrowed, 9 = Oats, 10 = Soybean-notill, 11 = Soybean-mintill, 12 = Soybean-clean, 13 = Wheat, 14 = Woods, 15 = Buildings-Grass-Trees-Drives, 16 = Stone-Steel-Towers. All quantities (except number of samples) are percentages.

|       | SSSE1 | SSSE2 | SM | GB | HZYZ | BE | BL2 | BL3 | BM |
| SSSE1 |       |   o   | +  | +  |  o   | +  |  +  |  +  | +  |
| SSSE2 |   o   |       | +  | +  |  +   | +  |  +  |  +  | +  |
| SM    |   -   |   -   |    | -  |  -   | -  |  +  |  +  | +  |
| GB    |   -   |   -   | +  |    |  -   | o  |  +  |  +  | +  |
| HZYZ  |   o   |   -   | +  | +  |      | +  |  +  |  +  | +  |
| BE    |   -   |   -   | +  | o  |  -   |    |  +  |  +  | +  |
| BL2   |   -   |   -   | -  | -  |  -   | -  |     |  +  | -  |
| BL3   |   -   |   -   | -  | -  |  -   | -  |  -  |     | -  |
| BM    |   -   |   -   | -  | -  |  -   | -  |  +  |  +  |    |

Table 2: Statistical significance between κ values of classification algorithms on Indian Pines data. Each entry is + if κ is significantly larger in the row method versus the column method, - if κ is significantly smaller, and o if there is no significant difference. Significance is measured at the 95% confidence level.

|          | No. of Samp. | SSSE1 | SSSE2 |  SM   |  GB   | HZYZ  |  BE   |  BL2  |  BL3  |  BM   |
| OA       |              | 95.33 | 95.64 | 81.14 | 91.50 | 91.98 | 97.14 | 83.56 | 79.77 | 86.70 |
| AA       |              | 98.96 | 99.03 | 95.81 | 98.11 | 98.22 | 99.37 | 96.35 | 95.50 | 97.04 |
| AP       |              | 92.72 | 93.46 | 71.09 | 84.38 | 92.74 | 97.25 | 78.46 | 73.60 | 77.48 |
| ASe      |              | 96.12 | 96.46 | 70.31 | 87.57 | 92.78 | 97.75 | 82.21 | 75.57 | 75.14 |
| ASp      |              | 99.43 | 99.47 | 97.55 | 98.94 | 98.86 | 99.61 | 97.91 | 97.41 | 98.35 |
| κ        |              | 94.05 | 94.45 | 75.85 | 89.15 | 89.91 | 96.37 | 78.99 | 74.30 | 82.99 |
| Class 1  |     6631     | 99.24 | 99.20 | 91.86 | 96.97 | 98.13 | 99.02 | 94.96 | 92.57 | 93.97 |
| Class 2  |    18649     | 99.44 | 99.69 | 93.28 | 97.45 | 96.67 | 98.13 | 96.31 | 95.72 | 96.42 |
| Class 3  |     2099     | 97.13 | 97.31 | 98.29 | 99.18 | 97.12 | 99.96 | 89.79 | 91.43 | 99.03 |
| Class 4  |     3064     | 99.28 | 99.29 | 93.65 | 96.27 | 96.76 | 98.37 | 98.04 | 95.01 | 93.91 |
| Class 5  |     1345     | 99.91 | 99.79 | 98.93 | 99.53 | 99.96 | 99.87 | 99.91 | 98.42 | 99.19 |
| Class 6  |     5029     | 99.55 | 99.80 | 97.91 | 99.13 | 99.81 | 99.98 | 98.40 | 98.73 | 98.54 |
| Class 7  |     1330     | 99.95 | 99.98 | 97.58 | 99.90 | 99.47 | 99.96 | 97.40 | 95.50 | 98.45 |
| Class 8  |     3682     | 96.36 | 96.52 | 95.24 | 97.48 | 96.03 | 99.01 | 92.32 | 92.51 | 97.11 |
| Class 9  |      947     | 99.80 | 99.70 | 95.53 | 97.08 | 99.99 | 99.99 | 99.98 | 99.64 | 96.77 |

Table 3: Pavia classification results using various dimensionality reduction algorithms. OA = Overall Accuracy, AA = Average Accuracy, AP = Average Precision, ASe = Average Sensitivity, ASp = Average Specificity, κ = Kappa coefficient. Class rows report per-class accuracy. Classes: 1 = Asphalt, 2 = Meadows, 3 = Gravel, 4 = Trees, 5 = Painted metal sheets, 6 = Bare soil, 7 = Bitumen, 8 = Self-Blocking Bricks, 9 = Shadows. All quantities (except number of samples) are percentages.

|       | SSSE1 | SSSE2 | SM | GB | HZYZ | BE | BL2 | BL3 | BM |
| SSSE1 |       |   o   | +  | +  |  +   | -  |  +  |  +  | +  |
| SSSE2 |   o   |       | +  | +  |  +   | -  |  +  |  +  | +  |
| SM    |   -   |   -   |    | -  |  -   | -  |  -  |  +  | -  |
| GB    |   -   |   -   | +  |    |  -   | -  |  +  |  +  | +  |
| HZYZ  |   -   |   -   | +  | +  |      | -  |  +  |  +  | +  |
| BE    |   +   |   +   | +  | +  |  +   |    |  +  |  +  | +  |
| BL2   |   -   |   -   | +  | -  |  -   | -  |     |  +  | -  |
| BL3   |   -   |   -   | -  | -  |  -   | -  |  -  |     | -  |
| BM    |   -   |   -   | +  | -  |  -   | -  |  +  |  +  |    |

Table 4: Statistical significance between κ values of classification algorithms on Pavia data. Each entry is + if κ is significantly larger in the row method versus the column method, - if κ is significantly smaller, and o if there is no significant difference. Significance is measured at the 95% confidence level.

5. CONCLUSION

In this article, we proposed a new algorithm for dimensionality reduction using both the spatial and spectral information present in a hyperspectral image. The algorithm is based on Schroedinger Eigenmaps, which has traditionally been used for semi-supervised learning. By constructing a graph based solely on spectral information and then defining a cluster potential matrix that encodes spatial relationships between pixels, our proposed algorithm provides a natural way to trade off the relative impact of the spatial versus spectral information in the dimensionality reduction process. Classification experiments on publicly available hyperspectral images with manually labeled ground truth show that the proposed algorithm exhibits superior or competitive performance compared to a variety of prior-art algorithms for reducing the dimension of the data provided to a standard classification algorithm.

APPENDIX

Prototype implementations of the Spatial-Spectral Schroedinger Eigenmaps algorithms (SSSE1 and SSSE2) are available for download at MATLAB Central (http://www.mathworks.com/matlabcentral/) under File ID #45908.

ACKNOWLEDGEMENTS

The authors would like to thank Prof. Landgrebe (Purdue University, USA) for providing the Indian Pines data and Prof. Paolo Gamba (Pavia University, Italy) for providing the Pavia University data.

REFERENCES

[1] Schott, J. R., [Remote Sensing: The Image Chain Approach], Oxford University Press, 2nd ed. (2007).
[2] Prasad, S. and Bruce, L., "Limitations of principal components analysis for hyperspectral target recognition," IEEE Geoscience and Remote Sensing Letters 5, 625-629 (Oct 2008).
[3] Kim, D. and Finkel, L., "Hyperspectral image processing using locally linear embedding," in [Neural Engineering, 2003. Conference Proceedings. First International IEEE EMBS Conference on], 316-319 (March 2003).
[4] Bachmann, C., Ainsworth, T., and Fusina, R., "Exploiting manifold geometry in hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing 43, 441-454 (March 2005).
[5] Fauvel, M., Chanussot, J., and Benediktsson, J., "Kernel principal component analysis for the classification of hyperspectral remote sensing data of urban areas," EURASIP Journal on Advances in Signal Processing 2009(783194), 1-14 (2009).
[6] Halevy, A., Extensions of Laplacian Eigenmaps for Manifold Learning, PhD thesis, University of Maryland, College Park (2011).
[7] Hou, B., Zhang, X., Ye, Q., and Zheng, Y., "A novel method for hyperspectral image classification based on Laplacian eigenmap pixels distribution-flow," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6(3), 1602-1618 (2013).
[8] Benedetto, J., Czaja, W., Dobrosotskaya, J., Doster, T., Duke, K., and Gillis, D., "Integration of heterogeneous data for classification in hyperspectral satellite imagery," in [Proc. of SPIE Vol. 8390], 839027-1-839027-12 (June 2012).
[9] Gillis, D. B. and Bowles, J. H., "Hyperspectral image segmentation using spatial-spectral graphs," Proc. SPIE Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVIII 8390, 83901Q-1-83901Q-11 (2012).
[10] Czaja, W. and Ehler, M., "Schroedinger eigenmaps for the analysis of biomedical data," IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1274-1280 (May 2013).
[11] Belkin, M. and Niyogi, P., "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation 15, 1373-1396 (June 2003).
[12] Shi, J. and Malik, J., "Normalized cuts and image segmentation," in [Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on], 731-737 (1997).

[13] Shi, J. and Malik, J., "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888-905 (2000).
[14] Benedetto, J., Czaja, W., Dobrosotskaya, J., Doster, T., Duke, K., and Gillis, D., "Semi-supervised learning of heterogeneous data in remote sensing imagery," in [Proc. of SPIE Vol. 8401], 840104-1-840104-12 (June 2012).
[15] Senseman, G. M., Bagley, C. F., and Tweddale, S. A., "Accuracy assessment of the discrete classification of remotely-sensed digital data for landcover mapping," USACERL Technical Report EN-95/04, 1-27 (April 1995).