1408 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 45, NO. 5, MAY 2007

Transform Coding Techniques for Lossy Hyperspectral Data Compression

Barbara Penna, Student Member, IEEE, Tammam Tillo, Member, IEEE, Enrico Magli, Member, IEEE, and Gabriella Olmo, Senior Member, IEEE

Abstract: Transform-based lossy compression has a huge potential for hyperspectral data reduction. Hyperspectral data are 3-D, and the nature of their correlation is different in each dimension. This calls for a careful design of the 3-D transform to be used for compression. In this paper, we investigate the transform design and rate allocation stage for lossy compression of hyperspectral data. First, we select a set of 3-D transforms, obtained by combining wavelets, wavelet packets, the discrete cosine transform, and the Karhunen-Loève transform (KLT) in various ways, and evaluate the coding efficiency of these combinations. Second, we propose a low-complexity version of the KLT, in which complexity and performance can be balanced in a scalable way, allowing one to design the transform that best matches a specific application. Third, we integrate this, as well as other existing transforms, in the framework of Part 2 of the Joint Photographic Experts Group (JPEG) 2000 standard, taking advantage of the high coding efficiency of JPEG 2000 and exploiting the interoperability of an international standard. We introduce an evaluation framework based on both reconstruction fidelity and impact on image exploitation, and evaluate the proposed algorithm by applying this framework to AVIRIS scenes. It is shown that the scheme based on the proposed low-complexity KLT significantly outperforms previous schemes in rate-distortion performance. As for impact on exploitation, we consider multiclass hard classification, spectral unmixing, binary classification, and anomaly detection as benchmark applications.
Index Terms: Anomaly detection, classification, discrete cosine transform (DCT), hyperspectral data, Joint Photographic Experts Group (JPEG) 2000, Karhunen-Loève transform (KLT), lossy compression, spectral unmixing, wavelets.

I. INTRODUCTION

Hyperspectral imaging amounts to collecting the energy reflected or emitted by ground targets at a typically very large number of wavelengths, resulting in a data cube containing tens to hundreds of bands. These data have become increasingly popular, since they enable plenty of new applications, including detection and identification of surface and atmospheric constituents, analysis of soil type, agriculture and forest monitoring, environmental studies, and military surveillance. The data are usually acquired by a remote platform (a satellite or an aircraft), and then downlinked to a ground station. Due to the huge size of the datasets, compression is necessary to match the available transmission bandwidth.

Manuscript received December 16, 2005; revised November 7, 2006. The authors are with the Center for Multimedia Radio Communications (CERCOM), Dipartimento di Elettronica, Politecnico di Torino, 10129 Torino, Italy (e-mail: barbara.penna@polito.it; tammam.tillo@polito.it; enrico.magli@polito.it; gabriella.olmo@polito.it). Digital Object Identifier 10.1109/TGRS.2007.894565

In the past, scientific data have been almost exclusively compressed by means of lossless methods, in order to preserve their full quality. More recently, however, there has been increasing interest in their lossy compression. In fact, two of the most recent satellites, SPOT 4 and IKONOS, employ on-board lossy compression prior to downlinking the data to ground stations. As lossy compression allows for higher scene acquisition rates, several lossy algorithms have been designed for multispectral and hyperspectral images.
Many of these techniques are based on decorrelating transforms that exploit spatial and interband (i.e., spectral) correlation, followed by a quantization stage and an entropy coder. Examples include the Joint Photographic Experts Group (JPEG) 2000 standard [1], and set partitioning methods such as set partitioning in hierarchical trees (SPIHT) and its variations (SPIHT-2D, SPIHT-3D, and set partitioned embedded block coding, SPECK). Some authors have also proposed to employ the 3-D discrete wavelet transform (DWT) [2]-[4] and the 3-D discrete cosine transform (DCT) [5], [6]. Moreover, several methods that treat spectral and spatial redundancy differently have been investigated. A popular approach combines a 1-D spectral decorrelator such as the Karhunen-Loève transform (KLT), the DWT, or the DCT with JPEG 2000 employed as spatial decorrelator, rate allocator, and entropy coder (see, e.g., [7]); SPIHT has also been used for the same purpose [8]. In [9] and [10], a 3-D version of SPIHT using a separable spectral wavelet transform coupled with an asymmetric tree structure, as well as a low-complexity image encoder with similar features (SPECK), are proposed for hyperspectral image compression. Although there has been a large amount of work on 3-D coders, the proposed techniques have often been tested under different conditions and on different datasets; this makes it difficult to evaluate the best combination of spatial and spectral transforms for a given application. On a related note, despite the fact that the KLT is the optimal transform in the coding-gain sense, its practical application has been somewhat limited because of its complexity and because the transform is signal-adaptive; however, a few recent works have rediscovered the KLT and attempted to exploit its superior decorrelation capabilities [11].
In [12], vector quantization and spectral KLT are employed to exploit the correlation between multispectral bands. In [13], an efficient adaptive KLT algorithm for multispectral image compression is presented. The proposed technique exploits an adaptive algorithm to continuously adjust eigenvalues and eigenvectors when input image data are received sequentially. In [14], an integer implementation of the KLT followed by JPEG 2000 applied to each transformed component is presented. In [15], a hybrid 3-D wavelet transform is used, employing JPEG 2000 as spatial decorrelator. This technique significantly outperforms state-of-the-art schemes. Part of the performance gain is achieved through full 3-D postcompression rate-distortion optimization, which is a powerful feature of JPEG 2000 Part 2, and is also used in [16] and [17]. A hybrid 3-D DWT has also been used in [16] and [17], and tarp coding is proposed in [18]. The highest performing existing schemes take advantage of the high coding efficiency of the KLT as spectral decorrelator and of JPEG 2000 as spatial decorrelator and entropy coder. However, a few issues are still open. First, although the KLT is the optimal decorrelating transform, its complexity is very high, due to the need to estimate covariance matrices, solve eigenvector problems, and compute matrix-vector products. Second, most schemes employ JPEG 2000 separately on each band, allocating the same rate to each of them; this approach is obviously suboptimal, since the spectral transform unbalances the energy in different bands, and this can be exploited by differentiating the rate allocation. Although lossy compression allows for much higher compression ratios than lossless compression, it introduces degradation in the data. Therefore, as lossy compression has become more popular, researchers have started to investigate the quality issues associated with such information losses. Two important questions are 1) whether metrics based on the mean-squared error (MSE) can adequately capture the effects of this degradation in typical remote sensing applications and 2) whether other simple metrics exist that are better than MSE at capturing these effects.
Recently, a comprehensive investigation of this problem has been reported in [19], where the authors consider a set of quality metrics and a set of image degradations, including lossy compression. The sensitivity of each metric to each degradation is evaluated on hyperspectral data, and some general conclusions are drawn. It is shown that, taking, e.g., spectral angle mapper (SAM) classification [20] as reference application, MSE turns out to be a reasonably good metric; however, more than one metric (or, alternatively, a more thorough evaluation procedure) should be used if an accurate characterization of the degradation is required. This paper attempts to solve the transform coding problems outlined above, and builds on the state-of-the-art by providing the following contributions. First of all, we report the results of a comprehensive experiment aimed at comparing the coding efficiency of several combinations of spatial and spectral transforms. The experiment is carried out in the framework of lossy compression of hyperspectral data, since these data have become very popular, and are more amenable to spectral decorrelation than multispectral data; in particular, a few Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) scenes have been used to evaluate the selected transforms. We consider different transforms such as DCT, rectangular, square and hybrid wavelet and wavelet packet transforms, KLT, and various spatial/spectral combinations of these transforms; the evaluation procedure is designed so as to simulate global 3-D rate allocation, as opposed to assigning the same rate to each transformed band. Second, we propose a low-complexity version of the KLT, which provides performance similar to the full-featured KLT, with significantly lower computational complexity. Third, we integrate the low-complexity KLT, as well as a few of the best existing transforms, into a practical compression scheme based on the JPEG 2000 standard. 
This choice allows us to define a compression scheme combining the flexibility and interoperability of an international standard with the high coding efficiency of JPEG 2000 and the KLT. In particular, rather than selecting a fixed number of spectral components, the 3-D postcompression rate-distortion optimization allows all components to be kept and represented with different accuracy, so as to achieve maximum reconstruction fidelity. Finally, we introduce a performance evaluation framework based on both rate-distortion analysis and impact on information extraction. The resulting scheme is compliant with the multicomponent transformation extension defined in Part 2 of JPEG 2000 [21], and significantly outperforms the best existing lossy compression techniques. This paper is organized as follows. In Section II, we analyze various decorrelating transforms. In Section III, we outline a transform evaluation procedure and provide evaluation results on hyperspectral data. In Section IV, we define the proposed KLT-based algorithm, whereas in Section V, we report its performance evaluation as to complexity and fidelity, and in Section VI information extraction results. Finally, in Section VII, we draw some conclusions.

II. THREE-DIMENSIONAL TRANSFORM CODING STUDY

A. Introduction

The transforms employed in the development of the proposed compression schemes are briefly described in the following. The KLT is the optimal block-based transform (in a statistical sense) for data compression. Defining the covariance matrix C_X of a random column vector X with mean value µ_X as

C_X = E[(X - µ_X)(X - µ_X)^T]    (1)

the KLT transform matrix V is obtained by aligning the eigenvectors of C_X columnwise. It can be shown that the transformed random vector Y = V^T (X - µ_X) has uncorrelated components, i.e., C_Y = V^T C_X V is a diagonal matrix. It should be noted that the transform defined above coincides with a principal component analysis (PCA).
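As a quick numerical illustration of the decorrelation property C_Y = V^T C_X V (a synthetic sketch, not the paper's implementation), the following builds the KLT of correlated random vectors and checks that the transformed covariance is diagonal:

```python
import numpy as np

# Illustrative sketch of Eq. (1) and the KLT on synthetic data: mix white
# noise through a random matrix to obtain correlated 5-D vectors.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
X = A @ rng.standard_normal((5, 10000))          # columns are realizations

mu = X.mean(axis=1, keepdims=True)
C_X = (X - mu) @ (X - mu).T / X.shape[1]          # sample covariance, Eq. (1)

# KLT matrix V: eigenvectors of C_X, columns sorted by descending eigenvalue.
eigval, eigvec = np.linalg.eigh(C_X)
order = np.argsort(eigval)[::-1]
V = eigvec[:, order]

Y = V.T @ (X - mu)                                # transformed vectors
C_Y = Y @ Y.T / Y.shape[1]

# Off-diagonal entries of C_Y vanish: the components are decorrelated,
# and the diagonal holds the eigenvalues in descending order.
off_diag = C_Y - np.diag(np.diag(C_Y))
assert np.max(np.abs(off_diag)) < 1e-8
```

Since V is orthogonal, the transform is invertible by V itself, which is what makes it possible to keep all components and merely represent them with different accuracy.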
It is known [22] that the KLT and PCA use the same transform matrix, and the PCA also removes the expected value µ_X from the signal prior to the transformation. However, with a slight abuse of notation, in this paper we will denote the transform above as a KLT, because our use of the transformed components is significantly different from what has been done before. While all previous papers employing the PCA for compression (e.g., [23] and [24]) use it to reduce the number of components, by zeroing out a given number of least significant ones, in this paper we keep all components, but represent them with different accuracy. This concept will be further clarified in Sections III-A and IV-D when we discuss 3-D rate-distortion optimization.

The DCT (see, e.g., [25]) represents the input signal as a linear combination of weighted basis functions that are related to its frequency components. It is known to be a good approximation to the KLT for Gauss-Markov processes, and its computation is much simpler. The DWT [25] is based on the principle that efficient decorrelation can be achieved by splitting the data into two half-rate subsequences, carrying information, respectively, on the approximation and detail of the original signal, or equivalently on the low- and high-frequency half-bands of its spectrum. Since most of the energy of real-world signals is typically concentrated at lowpass frequencies, this process splits the signal into a highly significant and a less significant part, leading to good energy compaction. The procedure can be iterated on the lowpass subsequence by means of a filter bank scheme. The discrete wavelet packet transform (DWPT) is a generalization of the DWT that offers a richer range of options for signal analysis. The DWPT is implemented by a filter bank in which the highpass outputs may also be further split into approximation and detail. If both the lowpass and highpass sequences are always split, the system is said to be a complete wavelet packet transform. However, the transform need not be complete; for any given input signal, there exists an optimal choice of highpass and lowpass iterations that captures most of the input signal correlation, known as the best basis wavelet packet transform. Given an appropriate cost function, a search algorithm adaptively selects the best basis for a given signal.
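The lowpass/highpass split described above can be sketched in a few lines. This is an illustrative example using the orthonormal Haar pair, not the (9,7) biorthogonal filters used later in the paper:

```python
import numpy as np

# One DWT analysis level: split a smooth signal into half-rate
# approximation (lowpass) and detail (highpass) subsequences.
x = np.cos(np.linspace(0, np.pi, 256))            # smooth test signal

approx = (x[0::2] + x[1::2]) / np.sqrt(2)         # lowpass half-band
detail = (x[0::2] - x[1::2]) / np.sqrt(2)         # highpass half-band

# The orthonormal split preserves energy ...
assert np.isclose(np.sum(x**2), np.sum(approx**2) + np.sum(detail**2))
# ... and for a smooth signal nearly all of it lands in the approximation,
# which is the energy compaction the text describes.
assert np.sum(approx**2) / np.sum(x**2) > 0.99
```

Iterating the same split on `approx` gives the dyadic DWT; also allowing splits of `detail` gives the wavelet packet family.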
Different cost functions can be employed, e.g., entropy, minimum distortion, minimum number of coefficients above a certain threshold [26], or rate-distortion optimization [27]. Our purpose is to select the best 3-D transforms in terms of energy compaction; hence, the cost function in [27] is not suitable, because it explicitly takes quantization into account, while our analysis aims at being independent of the specific quantization scheme employed. We have found that the coding gain, which is a performance measure of transform efficiency [25], is a very good and theoretically sound objective function for seeking the best decomposition tree. Assuming a DWPT with l decomposition levels, the coding gain G_{(l-1)→l} is defined as

G_{(l-1)→l} = σ²_{l-1} / (∏_{i=1}^{N} σ²_{l,i})^{1/N}

where σ_{l-1} is the standard deviation of the transformed coefficients in a subband at level l-1, and σ_{l,i} are the standard deviations of the i = 1, ..., N subbands stemming from its further decomposition; note that the denominator is the geometric mean of the subband variances. If G_{(l-1)→l} > 1, the decomposition at level l is retained; otherwise, the current subband at level l-1 is kept. When the DWT and DWPT have to be applied to a 3-D data set, multidimensional extensions are required. In the following, we will consider three possible extensions, which are referred to as square, rectangular, and hybrid transforms. Our description refers to the DWT; the generalization to the DWPT is straightforward. A square 2-D transform is such that first one decomposition level is computed in all dimensions. Then, the (multidimensional) approximation subband is considered, and a new iteration is applied to it. This transform is also referred to in the literature as a dyadic transform, because it replicates, in multiple dimensions, the iterative structure on the lowpass branch of the 1-D transform.
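The coding-gain test above can be sketched as follows (an illustrative example on a synthetic signal, with a Haar step standing in for the actual filter bank):

```python
import numpy as np

# Best-basis decision: split a parent subband and keep the split only if
# the coding gain G exceeds 1.
def coding_gain(parent, children):
    # Ratio of the parent variance to the geometric mean of child variances.
    child_vars = np.array([np.var(c) for c in children])
    return np.var(parent) / np.prod(child_vars) ** (1.0 / len(children))

rng = np.random.default_rng(1)
t = np.arange(512)
parent = np.sin(0.05 * t) + 0.01 * rng.standard_normal(512)  # correlated band

lo = (parent[0::2] + parent[1::2]) / np.sqrt(2)   # approximation
hi = (parent[0::2] - parent[1::2]) / np.sqrt(2)   # detail

G = coding_gain(parent, [lo, hi])
assert G > 1.0        # splitting this strongly correlated subband pays off
```

For a white (uncorrelated) parent, the child variances match the parent's, G ≈ 1, and the split would be rejected.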
A rectangular 2-D transform is such that first the complete 1-D wavelet transform (i.e., all decomposition levels) is computed in one dimension, and then the complete transform is applied to the second dimension. In 3-D, a square transform is obtained by first computing one decomposition level in all dimensions, and then iterating on the LLL cube. Conversely, the rectangular transform is obtained by first applying the complete transform along the first dimension, then along the second one, and finally along the third one. In 3-D, hybrid transforms can also be obtained by first applying the complete transform in one dimension, and then taking a 2-D square transform in the other two dimensions. The obtained transform is referred to as the 3-D hybrid rectangular/square DWT, and has been found to be a better match to hyperspectral data than the square transform [15]-[18], [28]. Note that this hybrid transform is often referred to in the literature as a wavelet packet transform, although it is not connected with the use of a best basis DWPT [26].
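The hybrid rectangular/square construction can be sketched on a toy cube. This is an assumption-laden illustration: an orthonormal Haar step replaces the (9,7) filters and the cube is tiny, so only the ordering of the dimensions and the energy-preservation property are shown:

```python
import numpy as np

# Hybrid 3-D transform on a toy bands x lines x samples cube: complete 1-D
# transform along the spectral axis, then a square 2-D transform spatially.
def haar_level(a, axis, length):
    """One analysis level along `axis`, applied to the first `length`
    entries (the current approximation); detail halves follow it."""
    a = np.moveaxis(a.copy(), axis, 0)
    lo = (a[0:length:2] + a[1:length:2]) / np.sqrt(2)
    hi = (a[0:length:2] - a[1:length:2]) / np.sqrt(2)
    a[:length] = np.concatenate([lo, hi], axis=0)
    return np.moveaxis(a, 0, axis)

rng = np.random.default_rng(3)
cube = rng.standard_normal((8, 16, 16))

# Step 1 (rectangular part): complete transform along the spectral axis.
out, n = cube, 8
while n > 1:
    out = haar_level(out, axis=0, length=n)
    n //= 2

# Step 2 (square part): one joint level on both spatial axes per iteration,
# recursing on the spatial LL subband.
n = 16
while n > 1:
    out = haar_level(out, axis=1, length=n)
    out = haar_level(out, axis=2, length=n)
    n //= 2

# Every step is orthonormal, so the cube's energy is preserved.
assert np.isclose(np.sum(cube**2), np.sum(out**2))
```

Swapping the two steps' structure (one joint level in all three dimensions, iterated on the LLL cube) would give the square 3-D transform instead.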
This procedure is repeated iteratively on each obtained cube for a given number of decomposition levels. As hyperspectral data carry a lot of information in the spectral dimension, it is interesting to think of transforms that operate differently in the spectral and spatial directions, in order to match the different nature of those correlations. The third transform we consider is hence the 3-D hybrid rectangular/square wavelet transform, labeled as DWT1D2D. Fig. 1 depicts a hybrid 3-D wavelet transform obtained with three decomposition levels as described. As can be seen, the subbands generated by this transform are parallelepipeds in 3-D. The number of subbands is higher than that of the classical square transform; the frequency partitioning is such that high horizontal and low vertical frequencies lie in subbands where the basis functions are short and long in the horizontal and vertical dimensions, respectively. The obtained frequency tessellation is finer, and has more radial symmetry than the square transform.

Fig. 1. Hybrid rectangular/square 3-D DWT subband decomposition of a data cube.

The fourth transform is the hybrid spectral wavelet packet and spatial square 2-D wavelet transform, labeled as DWP1D-DWT2D. The cost function for the DWPT is minimized in the third dimension considering the whole cube obtained by the 1-D packet decomposition, and not the single 1-D vectors. In fact, separate optimization of each single vector could yield different bases. In this case, the transformed data cube would present discontinuities, which would penalize the performance of the next spatial 2-D DWT stage. Another advantage is the reduced overhead, since a single spectral decomposition tree has to be transmitted, instead of a separate tree for each spectral vector. Best basis selection using the coding gain yields the decomposition represented in Fig. 2.

Fig. 2. Best basis for the spectral DWPT.

Consistent with the notion that spectral vectors have a significant information content, the obtained decomposition is finer than the classical dyadic wavelet tree in the high-frequency portion of the spectrum, and almost resembles a Fourier transform. The fifth transform is the hybrid spectral wavelet packet transform and spatial wavelet packet transform, labeled as DWP1D-DWP2D. The cost function is minimized in the third dimension considering the cube obtained by the 1-D DWPT; then, a 2-D DWPT is evaluated on each single band. The sixth transform is the hybrid spectral DWT and spatial wavelet packet transform, labeled as DWT1D-DWP2D. The seventh transform is the spectral DCT and spatial DWT, labeled as DCT1D-DWT2D. The eighth transform is the spectral KLT and spatial DWT, labeled as KLT1D-DWT2D.
In order to evaluate the transform matrix that optimally decorrelates the spectral dimension, we estimate the covariance matrix of the hyperspectral data cube assuming that each spectral vector, containing the radiance of a pixel at a given spatial location in all the bands, is a realization of the random process that has to be decorrelated. In particular, given the hyperspectral data cube, i.e., B bands containing M lines and N samples, we form the M·N column vectors X_ij = [x^1_ij, x^2_ij, ..., x^B_ij]^T, for i = 1, ..., M and j = 1, ..., N, where x^k_ij is the pixel with spatial coordinates (i, j) in band k. We employ the sample mean vector M_x = [m_1, m_2, ..., m_B]^T, with m_k = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} x^k_ij, as an estimate of the ensemble averages of each band. For each spectral vector, we estimate its covariance matrix using one single realization, i.e., C_{X,i,j} = (X_ij - M_x)(X_ij - M_x)^T. Finally, we compute the average covariance matrix C_X = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} C_{X,i,j}. We solve the eigenvector problem for the symmetric matrix C_X, obtaining the eigenvalues λ_i and eigenvectors u_i that satisfy C_X u_i = λ_i u_i. The KLT kernel is a unitary matrix V, whose columns are the eigenvectors u_i arranged in descending order of eigenvalue magnitude. This matrix is used to transform each spectral vector, after subtracting its mean value, as Y_ij = V^T (X_ij - M_x). The complexity of the decorrelation transform is the sum of three contributions. The first one is the evaluation of the covariance matrix, O(B²MN); the second one is the solution of the eigenvector problem, O(B³) [29]; the third one is the computation of the transform coefficients, O(B²MN). It can be observed that M, N, and B are of the same order of magnitude; hence, the second term is negligible with respect to the first and third. The overall computational complexity is very high, and has so far limited the use of the KLT in practical applications.

III. RESULTS OF TRANSFORM EVALUATION
A. Evaluation Procedure

All the transforms described in Section II-B have been compared in terms of their energy compaction capability, which is a measure of the fraction of signal energy contained in a given number of transform coefficients. It is a very important property in image compression, since it provides an estimate of the effect of quantization in a practical coding scheme. The energy compaction property is evaluated for each transform by performing the following steps: 1) computing the 3-D transform on a few hyperspectral scenes; 2) zeroing out a given percentage of transform coefficients, taken as those with the smallest magnitude; 3) computing the inverse 3-D transform; and 4) computing the signal-to-noise ratio (SNR) with respect to the original image. As for the SNR, we use the following definition:

SNR = 10 log_10 (E_X / E_N)

where E_X and E_N are the energies of the original signal and of the reconstruction error, respectively. In particular, we denote by x_{i,j,k} the original image and by x̂_{i,j,k} the reconstructed image, where the indexes i, j, and k scan the lines, pixels, and bands.

Then, we have E_X = Σ_{i,j,k} x²_{i,j,k} and E_N = Σ_{i,j,k} (x_{i,j,k} - x̂_{i,j,k})². Note that the energy E_X depends on the mean value of the signal; some authors prefer using the variance instead of the energy. Hyperspectral images usually have a nonzero mean value; in the transform domain, the mean value is often captured by a single transform coefficient or a small set of coefficients (e.g., the dc coefficient for the DCT, the LL subband for the wavelet transform). We use the term energy compaction to denote the ability of a transform to concentrate the energy of the image (including its mean value) in the smallest number of high-valued coefficients in the transform domain, and more specifically to provide the highest coding gain. As a consequence, it makes sense to consider energy instead of variance, because we are using energy-preserving transforms, or approximations thereof, which also represent the image mean value. Of particular importance is the fact that the coefficients to be zeroed out are taken in arbitrary order within the complete 3-D set of transform coefficients, and not on a band-by-band basis. This mimics a 3-D rate-distortion optimization, which is known to provide significantly better results than band-by-band optimization [15]. Also for the KLT, we zero out the coefficients of smallest magnitude out of all 224 transformed components; i.e., we do not reduce the dataset dimensionality, but rather represent all components with different precision, which is a more rigorous approach from the rate-distortion standpoint. Note that, for a unitary transform like the KLT, zeroing the coefficients with smallest magnitude yields the optimal reconstructed signal in the MSE sense. Moreover, in the procedure above, we do not perform any entropy coding or quantization of the significant coefficients.
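Steps 1)-4) of the evaluation procedure can be mimicked on a 1-D stand-in. In the sketch below, a small orthonormal DCT replaces the 3-D transforms and the test signal is synthetic; both are assumptions made only to keep the example short:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix, a stand-in for the paper's 3-D transforms.
    k = np.arange(n).reshape(-1, 1)
    C = np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def snr_after_zeroing(x, T, keep_fraction):
    # Steps 1)-4): transform, zero the smallest-magnitude coefficients,
    # invert, and measure SNR = 10 log10(E_X / E_N).
    y = T @ x
    n_zero = int((1 - keep_fraction) * len(y))
    idx = np.argsort(np.abs(y))[:n_zero]
    y[idx] = 0.0
    x_rec = T.T @ y                         # inverse of an orthonormal T
    e_x = np.sum(x ** 2)
    e_n = np.sum((x - x_rec) ** 2)
    return 10 * np.log10(e_x / e_n)

# Smooth signal with a nonzero mean: its energy lands in the dc coefficient
# and one frequency bin, so a tiny fraction of coefficients suffices.
x = np.cos(np.pi * (2 * np.arange(256) + 1) * 4 / 512) + 5.0
T = dct_matrix(256)
assert np.allclose(T @ T.T, np.eye(256))    # sanity check: orthonormal
snr = snr_after_zeroing(x, T, keep_fraction=0.05)
assert snr > 40.0       # 5% of the coefficients already give high fidelity
```

As the text notes, the mapping from "percentage zeroed" to an actual coded rate is nonlinear and scheme-dependent; the sketch only reproduces the fidelity side of the evaluation.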
While a higher percentage of zeroed coefficients mimics encoding at a lower rate, the relation between that percentage and the actual rate obtained is not linear, and depends on the adopted quantization and entropy coding schemes.

The performance evaluation is carried out on 16-bit radiance AVIRIS data cubes. AVIRIS scenes have 224 bands and 614 × 512 pixels resolution, but each scene has been cropped to 256 × 256 × 224 pixels in order to speed up the computations; larger scenes have been used to test the complete algorithms. Scene 4 of Cuprite and scene 3 of Jasper Ridge have been employed; for brevity, we only report the set of results for Cuprite. The results presented in Section III-B do not include the rate overhead for transmitting the KLT matrix coefficients; however, even if not compressed, representing these coefficients as 32-bit floating-point numbers generates an overhead of about 0.11 bpppb for the considered scene size. For the wavelet transform, we employ three transform levels in each direction.

B. Energy Compaction Results

We anticipate that, not surprisingly, we have found that the spectral correlation plays a crucial role for compression, since the transforms that are better able to capture this correlation are those that rank best for compression. It is already known for lossless compression (see, e.g., [30]) that large bit-rate reductions can be achieved by employing an efficient model of the spectral correlation. As will be seen, exploiting this correlation in the lossy case calls for the use of separate spectral and spatial transforms.

Fig. 3. Performance comparison of different transforms: DWT3D versus DWP3D.

In Fig. 3, we compare the DWT3D and DWP3D transforms. Neither transform is computed separately in the spectral dimension. As can be seen, the performance of the DWP3D transform is significantly better than that of the DWT3D. This is due to the fact that the 3-D square wavelet transform is isotropic in all three dimensions.
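For reference, the isotropic 3-D square wavelet transform (DWT3D) is a one-liner with the PyWavelets package. This is a hedged toy sketch on a random cube, using the bior4.4 wavelet as a stand-in filter; it merely shows the isotropic decomposition and checks invertibility.

```python
import numpy as np
import pywt

cube = np.random.default_rng(0).normal(size=(32, 32, 32))  # toy (bands, rows, cols)

# DWT3D: the same dyadic decomposition applied isotropically in all three
# dimensions, hence a rather coarse subband structure along wavelengths.
coeffs = pywt.wavedecn(cube, "bior4.4", level=2)
recon = pywt.waverecn(coeffs, "bior4.4")
```

The transform is perfectly invertible, so any SNR loss in the evaluation procedure comes from the zeroed coefficients alone.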
This may not be the most appropriate correlation model for a hyperspectral dataset, since the subband decomposition in the spectral dimension is not as fine as it could be. Hence, because of the rather rough tessellation of the 3-D frequency space, the DWT3D turns out to have poor energy compaction performance for spectral vectors. The DWP3D transform performs much better, since its ability to adaptively select the frequency tessellation allows it to refine the signal description along the spectral dimension, and hence to exploit the spectral correlation much more effectively.

Fig. 4 compares the performance of wavelet and wavelet packet transforms computed separately in the spectral direction. Namely, we first compute a full 1-D DWT or DWPT in the spectral direction, followed by a square 2-D DWT or DWPT. The following remarks can be made. Spatial decorrelation is performed more effectively by the DWT than by the DWPT. This is somewhat counterintuitive, since one would expect the optimized decomposition to provide better results. As a matter of fact, in our evaluation procedure we zero out the least significant coefficients taken from the complete 3-D set of transform coefficients, whereas the DWPT transform is optimized separately in the spectral and spatial directions. Thus, our procedure closely simulates a 3-D rate allocation, which would work better with a three-dimensionally optimized transform. As an example, the 2-D DWPT does not decompose the bands in the water absorption region, because they contain little information; however, those bands contain many high-valued coefficients that have to be retained, at the expense of other coefficients that are discarded. Therefore, the mismatch between the 3-D coefficient selection and the separate optimization of the DWPT transform makes it useless, and even disadvantageous, to perform best basis selection.
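The spectrally separable construction just described (a full 1-D DWT along wavelengths, followed by a square 2-D DWT on the spatial planes) can likewise be sketched with PyWavelets. Again a hedged toy example on a random cube, not the authors' implementation:

```python
import numpy as np
import pywt

cube = np.random.default_rng(1).normal(size=(32, 32, 32))  # (bands, rows, cols)

# Step 1: full 1-D DWT along the spectral axis (axis 0).
spectral = pywt.wavedec(cube, "bior4.4", axis=0, level=2)
# Step 2: square 2-D DWT on the spatial axes of every spectral subband,
# yielding the hybrid rectangular/square ("DWT1D2D"-style) decomposition.
hybrid = [pywt.wavedec2(sb, "bior4.4", axes=(1, 2), level=2) for sb in spectral]

# Inverting the two stages in reverse order recovers the cube exactly.
spatial_rec = [pywt.waverec2(c, "bior4.4", axes=(1, 2)) for c in hybrid]
recon = pywt.waverec(spatial_rec, "bior4.4", axis=0)
```

The point of the construction is that the spectral axis gets its own full dyadic decomposition, instead of the coarse isotropic one produced by a square 3-D transform.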

PENNA et al.: TRANSFORM CODING TECHNIQUES 1413

Fig. 4. Performance comparison of different transforms, with a separable transform in the spectral direction.

Fig. 5. Performance comparison of different transforms, with a separable transform in the spectral direction: KLT, DWT, and DCT.

Comparing Figs. 4 and 3, it can be seen that the performance of the DWT1D2D transform is very similar to that of the DWP3D transform, but with a significantly reduced computational effort, since it is not necessary to compute the optimal basis. In fact, this transform has been selected in [15]-[18] for its favorable tradeoff between performance and complexity.

Continuing the study of spectrally separable transforms, Fig. 5 compares the DWT1D2D, DCT1D-DWT2D, and KLT1D-DWT2D transforms. Following the results described above, these transforms have been selected in order to compare different spectral decorrelators, using the 2-D DWT for spatial decorrelation because of its effectiveness. Not surprisingly, the KLT turns out to be the best transform; since we are employing the same transform for all spectral vectors, the overhead of describing the transform matrix in the compressed file is negligible. The performance gain of the KLT1D-DWT2D with respect to the DWT1D2D is about 2 dB at high quality levels, and significantly more at low bit-rates. However, the KLT1D-DWT2D requires the estimation (and averaging) of as many covariance matrices as samples per band, followed by the solution of the eigenvector problem. As expected, the DCT performs almost always worse than the DWT, except for a range in the very low bit-rate region.

In summary, this analysis shows that the schemes with the highest performance are based on hybrid rectangular/square transforms. Among these schemes, the 2-D DWT should be preferred for spatial decorrelation. As far as spectral decorrelation is concerned, the 1-D DWT provides good performance with limited complexity; as a consequence, the DWT1D2D will be used as the benchmark transform for the proposed compression algorithm. The KLT achieves significantly better performance at all bit-rates, and especially in the low bit-rate region. It should be noticed that this KLT employs a single transform matrix for all spectral vectors. This approach is effective because, along the spectral dimension, the signal depends almost exclusively on pixel land cover; since only a few land covers are typically present in an image, the KLT works better than the other transforms. In the spatial dimension, the signal depends on the scene geometry, which is less predictable, with many discontinuities near region boundaries. In this case, due to the high degree of nonstationarity, the single KLT becomes far from optimal, and the DWT works better; the use of the optimal KLT would require the solution of multiple eigenvector problems, which is not realistic in a practical scenario.

IV. PROPOSED LOW-COMPLEXITY KLT

As has been seen, the spectral KLT using a single average covariance matrix can provide performance gains in excess of 2 dB with respect to the wavelet transform; however, its complexity is rather high for real-time applications. In the following, we propose a low-complexity version of the KLT that alleviates this problem with virtually no performance loss with respect to the full-complexity transform.

A. One-Dimensional Transform

The KLT evaluates the average correlation matrix over all spectral vectors. For an AVIRIS scene, this amounts to computing and averaging over 300 000 such matrices. To simplify this process, we note that convergence of the estimation process may be achieved using fewer matrices.
Using the notation defined in Section II, in the proposed low-complexity transform the processing is not carried out on the complete set of spectral vectors, but rather on a subset of vectors selected at random. Hence, the sample mean vector is defined as M_x = [m_1, m_2, ..., m_B]^T, where

m_k = (1/(M′N′)) Σ_{i∈I} Σ_{j∈J} x^k_{ij}

and I and J are sets containing, respectively, M′ and N′ different indexes picked at random in the intervals [1, M] and [1, N], with M′ ≤ M and N′ ≤ N. The covariance matrix is obtained as

C_X = (1/(M′N′)) Σ_{i∈I} Σ_{j∈J} (X_ij − M_x)(X_ij − M_x)^T.

It is used to form the eigenvector set C_X u_i = λ_i u_i, where the u_i are the eigenvectors associated with the eigenvalues λ_i. Aligning the eigenvectors columnwise, we obtain the low-complexity KLT matrix V′. The transformed vector is computed as Y_ij = (V′)^T (X_ij − M_x). We also denote as ρ = M′N′/(MN) the percentage of spectral vectors employed to evaluate the

covariance matrix. Obviously, the smaller ρ, the lower the complexity of this KLT. The complexity of the first stage of the new transform, i.e., the evaluation of the covariance matrix, becomes O(ρB^2 MN); i.e., it is reduced by a factor ρ, as the number of covariance matrices to be computed decreases linearly with ρ; the other two terms remain unchanged. Some numerical results on the complexity of the complete JPEG 2000 based algorithm are given in Section V-A.

Note that, in principle, a very regular image would require a smaller value of ρ for the convergence of the covariance matrix estimation process than a highly nonstationary image. However, taking this issue into account would require a preanalysis of the image, which would significantly increase complexity. Some work in this direction has been reported in [23] and [31], where a preclassification is carried out, and different KLTs are used within each class. In the context of this paper, we have found that the results obtained using reasonable values of ρ depend little on the random drawing, provided the samples are taken uniformly over the image; for example, with ρ = 0.01, C_X is obtained as the average of more than 1000 matrices.

B. Three-Dimensional Transform Evaluation: Spectral Low-Complexity KLT and Spatial 2-D DWT

In order to obtain a 3-D transform that can be applied to a hyperspectral data cube, we employ the proposed low-complexity KLT as a spectral decorrelator, followed by the 2-D DWT for spatial decorrelation. In other terms, this transform is equivalent to the KLT1D-DWT2D, which turned out to be the highest-performing transform, but employs the low-complexity version of the KLT.
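A compact sketch of the low-complexity spectral KLT follows. The helper name is hypothetical, and for simplicity it draws a random subset of spectral vectors directly, rather than random row/column index sets I and J as in the text:

```python
import numpy as np

def low_complexity_klt(cube, rho, seed=0):
    """Estimate the spectral KLT from a random fraction rho of the
    spectral vectors, then transform the whole cube (illustrative)."""
    B, M, N = cube.shape                        # bands, lines, pixels
    X = cube.reshape(B, M * N).astype(np.float64)
    rng = np.random.default_rng(seed)
    n_sub = max(B + 1, int(rho * M * N))        # subset size M'N' (>= B+1
    idx = rng.choice(M * N, size=n_sub, replace=False)  # for full-rank C_X)
    mx = X[:, idx].mean(axis=1, keepdims=True)  # sample mean vector M_x
    C = np.cov(X[:, idx])                       # B-by-B covariance C_X
    lam, V = np.linalg.eigh(C)                  # C_X u_i = lambda_i u_i
    V = V[:, ::-1]                              # largest eigenvalue first
    Y = V.T @ (X - mx)                          # Y_ij = V'^T (X_ij - M_x)
    return Y.reshape(B, M, N), V, mx

cube = np.random.default_rng(1).normal(1000.0, 50.0, size=(16, 32, 32))
Y, V, mx = low_complexity_klt(cube, rho=0.1)
# V is orthonormal, so the transform is exactly invertible:
recon = (V @ Y.reshape(16, -1) + mx).reshape(cube.shape)
```

Note that the subsampling only affects how the transform matrix is estimated; the transform itself is still applied to every spectral vector and remains perfectly invertible.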
We have evaluated the performance of this transform, following the procedure outlined in Section III-A, for several values of ρ, in order to understand how many spectral vectors are actually needed to obtain convergence in the estimate of the covariance matrix, and hence optimal performance. The results are reported in Fig. 6. As can be seen, taking ρ = 0.1 we obtain a negligible performance loss with respect to the full-complexity KLT; in particular, setting to zero 50% and 96% of the transform coefficients yields an SNR loss of 0.21 and 0.16 dB, respectively. Taking ρ = 0.01, the loss is still very small, with an even larger computational saving; in this case, the SNR decreases by 0.89 and 1.12 dB when 50% and 96% of the transform coefficients are, respectively, set to zero.

Fig. 6. Performance of the 3-D transform employing the low-complexity KLT, for several values of ρ.

C. Overview of JPEG 2000

The architecture of JPEG 2000 is based on transform coding. A biorthogonal DWT is first applied to each image tile, producing a set of subbands at different resolution levels. Each quantized subband of the wavelet decomposition is divided into rectangular blocks (code-blocks), which are independently encoded with the embedded block coding with optimized truncation (EBCOT) entropy coding engine; EBCOT is based on a bit-plane approach, context modeling, and arithmetic coding. The bit stream output by EBCOT is organized by the rate allocator into a sequence of layers, each layer containing contributions from each code-block; the block truncation points associated with each layer are optimized in the rate-distortion sense. The final JPEG 2000 codestream consists of a main header, followed by a layered representation of the included code-blocks for each tile, organized into packets. Part 2 of the standard provides specific tools that can be applied to hyperspectral images.
In particular, the multicomponent transformation feature allows for spectral decorrelation by means of an external transform, followed by the application of JPEG 2000 to a whole block of decorrelated bands; the bands are separately decorrelated in the spatial directions by means of the 2-D wavelet transform, whereas the rate allocation is optimized across the whole block. Since JPEG 2000 standardizes the decoder, Part 2 provides the syntax (i.e., the MCC, MCT, and MCO marker segments) to embed into the codestream the inverse spectral transform that must be carried out after performing JPEG 2000 decoding of each component.

D. Integration of Low-Complexity KLT Within JPEG 2000

The proposed technique employs a hybrid 3-D transform; it first applies the low-complexity KLT as a multicomponent extension to JPEG 2000, and then applies the JPEG 2000 2-D DWT, rate allocation, and entropy coding to the spectrally transformed bands. Three decomposition levels are performed for the 2-D spatial transform, employing the (9, 7) filter. The inverse KLT transform matrix is written in an MCT marker segment in the compressed file. Notably, the postcompression rate-distortion optimization is operated on the complete 3-D set of transformed coefficients, ensuring optimal performance. In particular, rather than canceling some least significant spectral components, rate-distortion optimization selects an optimal number of bit-planes and coding passes to be retained in each code-block of all 224 transformed components, so as to obtain maximum reconstruction fidelity.

On a related note, a very desirable feature of a compression system for remote sensing images is the ability to generate quicklook images without having to fully decode the compressed file. In a typical scenario, a user would download a low spatial resolution false-color quicklook of the scene. To do so,

full spectral decorrelation is necessary in order to extract the three false-color bands, and then reduced-resolution decoding of each band has to be carried out. This procedure is impractical, because it requires performing the full spectral decorrelation to extract just a few channels. Moreover, it is not compliant with the JPEG 2000 standard, which requires that the spatial inverse transforms be performed before the spectral one. On the other hand, JPEG 2000 Part 2 provides an interesting feature, in that, through suitable marker segments, it is possible to specify different transformations for selected groups of bands [21]. For example, the three bands to be used to generate false-color quicklooks can be skipped by the spectral decorrelator and compressed in intraband mode; this yields a slight performance loss, but allows increased flexibility in the access to selected portions of the data. This procedure can be extended to the proposed scheme, where the bands to be used to generate the quicklooks could be removed from the spectral vectors in the computation of the covariance matrix, and then of the transform coefficients. However, this goes beyond the scope of the present paper, and is left for further work.

V. EXPERIMENTAL RESULTS: COMPLEXITY AND RECONSTRUCTION FIDELITY

As has been said, although MSE is a popular degradation metric, it does not necessarily describe how closely a reconstructed image would match the original in terms of information extraction. For this reason, the quality assessment of the proposed technique has been divided into two parts. In the present section, we deal with reconstruction fidelity as well as complexity, whereas in Section VI, we provide the results of an application-based quality assessment.
The rate-distortion performance of the proposed scheme, based on the low-complexity KLT and JPEG 2000, has been compared with that of other state-of-the-art lossy compression schemes. First, in Section V-A, we evaluate the complexity of the low- and full-complexity KLT; this allows us to assess the actual computational advantage in a realistic compression setting. Then, in Section V-B, we compare the compression performance of various algorithms. In the results described in this section, we have employed a set of AVIRIS radiance scenes, i.e., the Cuprite, Jasper Ridge, and Moffett Field scenes, using 256 lines with 614 pixels and all bands, unless otherwise noted. The Purdue Indian Pines scene has also been used; this scene contains 145 × 145 pixels and 220 bands, and provides a higher amount of detail than the previous scenes. The AVIRIS sensor is a representative hyperspectral one, and the data are publicly available on the Internet at aviris.jpl.nasa.gov; since these data are widely used in the literature, comparisons with other techniques are facilitated. JPEG 2000 has been run without error resilience options, and no quality layers have been formed. The KLT matrix coefficients are embedded in an MCT marker segment in the JPEG 2000 codestream. For the wavelet transform, we employ five transform levels in each direction, in order to accommodate the 614 pixels of each line. We have noted that using only three levels in the spectral direction does not significantly decrease performance.

Fig. 7. (Top) Performance of the low-complexity KLT as a function of ρ. The curve refers to an encoding rate of 1 bpppb. (Bottom) Computation time for the evaluation of the covariance matrix, as a function of ρ. The results refer to the Cuprite scene.

TABLE I. COMPLEXITY COMPARISON OF END-TO-END COMPRESSION ALGORITHMS. RUNNING TIMES ARE EXPRESSED IN SECONDS.

A. Complexity

Fig. 7 shows the compression performance and the computation time of the proposed algorithm as a function of ρ, on the Cuprite scene. The running times have been measured on a Pentium IV PC at 3 GHz, and refer only to the evaluation of the covariance matrix; the computations have been carried out using Matlab, which is known to be very efficient for matrix computations. As can be seen, the performance degrades very gracefully as ρ decreases, allowing one to select the best performance-complexity tradeoff for a given application. As had been noted in the previous experiment, the values ρ = 0.1 and ρ = 0.01 yield a very small loss, and can be used as a starting point for a fine optimization. These values provide a complexity reduction of about 20 and 100 times with respect to the full-complexity transform.

Clearly, the covariance matrix evaluation is only one source of complexity, since the solution of the eigenvector problem, the computation of transform coefficients, as well as the quantization, entropy coding, and rate allocation, have to be taken into account. In our implementation, all the steps up to and including the generation of the KLT matrix are implemented in Matlab, whereas the actual image encoding is done using optimized C software. Table I compares the end-to-end computation time for the full-complexity KLT, the low-complexity KLT with ρ equal to 0.1 and 0.01, and the technique proposed in [15], which employs the DWT1D2D transform, using JPEG 2000 for the spatial wavelet transform, quantization, entropy coding, and rate allocation; the time spent in the covariance matrix evaluation has also been reported. As can be seen, using ρ = 0.1 and

ρ = 0.01 yields an end-to-end computational saving of 2.57 and 3.01 times, respectively, with a minor performance loss. As ρ decreases, the computation time tends to settle on an asymptotic value which is larger than that of the DWT1D2D. This is due to the fact that the solution of the eigenvector problem, and especially the computation of the transform coefficients, are more demanding than the spectral DWT. This is somewhat obvious, because the transform coefficients are computed as a full matrix-vector product, since the KLT matrix does not exhibit any structure that can be exploited to reduce the number of operations. However, the low-complexity KLT with ρ = 0.01 is only about 40% more complex than the DWT1D2D, and provides a significant performance gain, as will be seen in the following.

Fig. 8. Performance evaluation of the proposed JPEG 2000 based technique: Rate-distortion curve for the Cuprite scene. Dashed: Full-complexity KLT. Solid: Low-complexity KLT, ρ = 0.01. Dashed+star: DWT1D2D as proposed in [15].

Fig. 9. Performance evaluation of the proposed JPEG 2000 based technique: Rate-distortion curve for the Jasper Ridge scene. Dashed: Full-complexity KLT. Solid: Low-complexity KLT, ρ = 0.01. Dashed+star: DWT1D2D as proposed in [15].

B. Compression Performance

The compression performance of the proposed scheme has been compared with that of other state-of-the-art schemes. The results are shown in Fig. 8 for the Cuprite scene. The following algorithms are compared: 1) the proposed scheme with the low-complexity KLT (ρ = 0.01); 2) the scheme with the full-complexity KLT; 3) the DWT1D2D scheme employing JPEG 2000 and 3-D rate-distortion optimization [15]. As expected, the performance of the low-complexity KLT is very close to that of the full-complexity transform, with a maximum loss of 0.27 dB at high bit-rates.
It should be noted that, with respect to the technique in [15], the proposed KLT-based scheme achieves a significant SNR gain, ranging between 2.5 and 6.7 dB. Similar results have been achieved for other scenes. In particular, in Fig. 9, we report performance results for the Jasper Ridge scene. The gain with respect to the technique in [15] is between 5 and 8.1 dB.

Fig. 10. Performance evaluation of the proposed JPEG 2000 based technique: Rate-distortion curve for the Purdue Indian Pines scene. Dashed: Full-complexity KLT. Solid: Low-complexity KLT, ρ = 0.1. Dashed+star: DWT1D2D as proposed in [15].

Fig. 10 reports the rate-distortion curves obtained using the Purdue Indian Pines image. This image contains a large amount of details, and is rather nonstationary. As a consequence, on this image the KLT still outperforms the other techniques, but the gap is smaller, i.e., the maximum SNR gain is about 0.65 dB with respect to [15]; as in the previous case, the low-complexity KLT has a small performance loss with respect to the full KLT, which amounts to 0.2 dB on average.

VI. EXPERIMENTAL RESULTS: INFORMATION EXTRACTION

Quality metrics such as SNR, which are based on the MSE, measure the fidelity of the reconstructed image with respect to the original image; however, a higher SNR does not necessarily yield higher quality of a remote sensing lossy-compressed image for a given application. In fact, some artifacts, e.g.,

tiling or compression artifacts, which may have little effect on SNR, might heavily bias the analysis results of the reconstructed images. Therefore, it is necessary to validate the compression results also from the remote sensing application standpoint. Although this is an important research topic, there is no widely accepted protocol to evaluate remote sensing image quality in a general way; this is partly caused by the conspicuous number of existing remote sensing applications, which makes it difficult to work out a reasonable set of quality metrics. Recently, an investigation of various quality metrics has been reported in [19]. It is shown that MSE is reasonably good at capturing the effect of lossy compression on SAM classification; on the other hand, it is also recognized that more than one metric is needed to accurately analyze the quality degradation. SAM [10], [32], iterative self-organizing data analysis (ISODATA), and k-means [18] classification are popular benchmark applications for validating lossy compression techniques from the quality assessment standpoint.

TABLE II. COMPARISON OF CLASSIFICATION RESULTS ON THE PURDUE INDIAN PINES IMAGE, USING LINEAR DISCRIMINANT ANALYSIS, AND THE ORIGINAL GROUND TRUTH AS REFERENCE.

In this paper, we employ a more thorough quality assessment approach, considering a more general classification perspective. In particular, we propose an evaluation framework based on the analysis of the following remote sensing applications, which have been selected in order to cover the largest possible range of image exploitation scenarios.

1) Multiclass hard classification. In high-resolution imagery, the size of a single resolution cell should be small enough that a single material is contained in it. As a consequence, when one performs image classification, it is reasonable to assign each pixel to one and only one class.
To evaluate the performance of lossy compression techniques for hard classification, we employ linear discriminant analysis [33].

2) Classification of mixed pixels. When the size of a resolution cell is large, as occurs with low-resolution images, it is likely that a pixel contains contributions from elements belonging to more than one class. Therefore, rather than assigning a pixel to one class, it is more suitable to estimate the percentage of contribution of each class to the current pixel. This can be carried out by means of spectral unmixing techniques (see [34] for a survey).

3) Binary hard classification. Multichannel images are often analyzed using detectors and classifiers, in order to assign each pixel to one out of a given total number of classes. We conduct an experiment to study the effect of compression on binary detection; we define two classes, namely background and foreground, and with the aid of ground truth we compute error probabilities for the detector run on the decoded images. To this end, we employ support vector machines as in [35].

4) Anomaly detection. Another very typical use of hyperspectral images is the detection of isolated anomalous targets. Detectors that do not require target signature models have been developed [36], so as to overcome the difficulty of developing such signatures; such algorithms are usually referred to as anomaly detectors. The Reed-Xiaoli (RX) algorithm [37] is probably the most popular anomaly detector.
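Since item 4 is later evaluated with the RX detector, a minimal global-RX sketch may help fix ideas: each pixel's score is its squared Mahalanobis distance from the scene-wide mean and covariance. This is a simplified, illustrative variant on synthetic data; the detector actually used in the experiments is the one defined in [37]/[39].

```python
import numpy as np

def rx_scores(cube):
    """Global RX anomaly scores: squared Mahalanobis distance of each
    spectral vector from the scene-wide background statistics."""
    B = cube.shape[0]
    X = cube.reshape(B, -1).astype(np.float64)
    mu = X.mean(axis=1, keepdims=True)
    C_inv = np.linalg.inv(np.cov(X) + 1e-9 * np.eye(B))  # regularized inverse
    D = X - mu
    return np.sum(D * (C_inv @ D), axis=0).reshape(cube.shape[1:])

cube = np.random.default_rng(0).normal(size=(10, 16, 16))  # toy background
cube[:, 5, 7] += 50.0                    # plant one anomalous pixel
scores = rx_scores(cube)                  # the planted pixel gets the top score
```

Pixels whose score exceeds a threshold are declared anomalous; lowering the threshold corresponds to declaring an increasingly high percentage of anomalous pixels, as in the experiments of Section VI-D.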

A. Multiclass Hard Classification

Multiclass classification amounts to assigning each pixel to one out of a given number of classes; the assignment is based on the comparison of the spectral vector with reference signatures. We have applied the linear discriminant analysis method [33] to the Purdue Indian Pines image, for which ground truth is available, making it possible to assess the classification results with very good accuracy. In developing the signatures and carrying out the classification, we have closely followed the approach outlined in [35]. In particular, only nine out of the sixteen land-cover classes available in the original ground truth have been retained, i.e., those for which a sufficiently high number of training samples could be used. Out of the samples available for each class, about one half has been used to generate the signatures, and the other half to evaluate the classification performance. The signatures have been developed from the original image data, and then used to classify the coded and decoded scenes at different bit-rates. In Table II, we present the results in a very similar way to [35], i.e., showing the producer's and user's accuracies for each class, as well as the overall classification accuracy (percentage of pixels that have been classified correctly), and the κ statistic, which also weights the classification error for each single class. Looking at the overall accuracy and the κ statistic, it can be observed that, at the bit-rates of interest, all algorithms provide good classification accuracy, and their performance is roughly equivalent. For each algorithm, a higher bit-rate (and hence smaller MSE) always leads to a smaller classification error.
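This evaluation protocol can be mimicked on synthetic data with scikit-learn. A hedged sketch (fabricated three-class Gaussian "spectra" stand in for the Indian Pines ground truth) showing the half/half split and the two summary statistics used above:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
# Stand-in labeled "spectral vectors": 3 classes, 8 bands, 60 samples each.
centers = np.array([[0.0] * 8, [3.0] * 8, [-3.0] * 8])
X = np.vstack([rng.normal(c, 1.0, size=(60, 8)) for c in centers])
y = np.repeat([0, 1, 2], 60)

# Half of the samples develop the signatures, the other half evaluate them.
train, test = np.arange(0, 180, 2), np.arange(1, 180, 2)
lda = LinearDiscriminantAnalysis().fit(X[train], y[train])
pred = lda.predict(X[test])
overall = accuracy_score(y[test], pred)   # overall classification accuracy
kappa = cohen_kappa_score(y[test], pred)  # kappa statistic
```

In the paper's experiment, the fit would use the original image and the predict step would be repeated on each decoded scene, so that any drop in overall accuracy or κ is attributable to compression.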
As for the performance of different compression techniques, it can be seen that, for medium and low bit-rates, e.g., 0.5 bpppb, the best algorithms in terms of MSE also provide more accurate classification, whereas the differences tend to vanish at higher bit-rates. The behavior is not the same for all classes; e.g., the error for classes 5 and 9 (wood and hay) is almost independent of the bit-rate, whereas for classes 2 and 8 (corn min. till and soybean clean till) compression has a more significant impact.

B. Classification of Mixed Pixels

For mixed-pixel classification, we have employed linear spectral unmixing. The linear mixing model describes each spectral vector x as a mixture of M endmembers

x = Σ_{i=1}^{M} a_i s_i + w

where w is an additive noise vector, and the s_i are the M endmembers, which represent the different classes contributing to the spectral vector with different abundance fractions. The objective of spectral unmixing is to estimate the abundance fractions a_i given the observed data and the endmembers. In our experiment, low-resolution AVIRIS radiance images have been used, namely Cuprite and Jasper Ridge. In particular, the constrained version of the linear spectral unmixing algorithm in [38] has been used. Ten classes have been considered; this choice has been made by looking at the significant breaks in the histogram of clusters, which represent major changes in their generality.

TABLE III. COMPARISON OF LINEAR SPECTRAL UNMIXING RESULTS FOR THE CUPRITE AND JASPER RIDGE IMAGES IN TERMS OF THE AVERAGE ERROR MAGNITUDE.

For each pixel, the spectral unmixing algorithm provides an n-tuple of numbers between 0 and 1, which represent the fraction of each class contributing to the pixel (n is equal to 10 in our experiment). We have derived a performance metric for linear spectral unmixing of the original and decoded image as follows.
We interpret the n-tuple of values for each pixel as the components of a feature vector in an n-dimensional Euclidean feature space. We compute the magnitude of the error vector, i.e., the difference between the vectors associated with the original and decoded pixels, respectively, and we use the average magnitude of the error vector as the overall error metric. The error metric values are reported in Table III for the Cuprite and Jasper Ridge images. As can be seen, the KLT-based techniques consistently achieve better performance than the algorithm in [15] based on the DWT1D2D, at all bit-rates. As for SNR, it can be observed that large SNR gains are typically reflected by better classification performance, although occasionally the algorithm with smaller SNR may slightly outperform that with higher SNR. This highlights the need for a performance assessment that considers both SNR and application-related metrics.

C. Binary Hard Classification

Besides multiclass classification, in some cases it is interesting to discriminate between two classes, namely background and foreground pixels; in the following, we focus on this binary detection problem. For binary hard classification, we have adopted the two-class support vector machine formulation in [35]. Given a labeled training data set {(x_1, y_1), ..., (x_n, y_n)}, where the x_i are spectral vectors and y_i ∈ {+1, −1} are labels, and a nonlinear mapping φ(·), the support vector machine method solves

min_{w, ξ_i, b} { (1/2) ||w||^2 + C Σ_i ξ_i }

subject to y_i (φ^T(x_i) w + b) ≥ 1 − ξ_i for all i and ξ_i ≥ 0 for all i, where w and b define a linear regressor in the feature space; ξ_i and C are, respectively, a positive slack variable and the penalization applied to errors, which can also be regarded as a regularization parameter. The support vector machine constructs a hyperplane

φ^T(x_i) w + b that maximizes the margin of separation. We employ a linear kernel, i.e., φ(x_i) · φ(x_j) = x_i · x_j. We have applied support vector machines to the Purdue Indian Pines image. Since classification ground truth is available for this image, it is possible to assess the probability of detection error with very good accuracy. Out of the available samples, about one half has been used as training samples, and the other half to carry out classification. The signatures have been developed from the original image data, and then used to classify the coded and decoded scenes at different bit-rates. Two classes are considered, namely the set of sixteen land cover classes as one class (foreground), and the background. As a consequence, this classification procedure detects whether a pixel belongs to the background or to the foreground.

TABLE IV. BINARY CLASSIFICATION RESULTS ON THE PURDUE INDIAN PINES IMAGE, USING SUPPORT VECTOR MACHINES, AND EMPLOYING THE ORIGINAL GROUND TRUTH AS REFERENCE.

Fig. 11. Error rate for detection of anomalous pixels on the Cuprite image, as a function of the percentage of background (nonanomalous) pixels.

The results are shown in Table IV; for each bit-rate we show the overall classification accuracy (percentage of pixels that have been classified correctly), and the κ statistic. The 2 × 2 confusion matrix is also reported. The first row reports the percentage of background pixels that have been, respectively, correctly and incorrectly classified with respect to the ground truth; the second row reports the percentage of foreground pixels that have been, respectively, incorrectly and correctly classified with respect to the ground truth. As can be seen, binary detection is only marginally affected by compression.
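A minimal two-class, linear-kernel sketch with scikit-learn's SVC, which solves the same soft-margin problem described above; the background/foreground data here are synthetic stand-ins, not the Indian Pines scene:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                    # toy "spectral vectors"
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)  # linearly separable labels

# Linear kernel: phi(x_i) . phi(x_j) = x_i . x_j; C penalizes the slacks.
clf = SVC(kernel="linear", C=1.0).fit(X[:100], y[:100])  # half for training
pred = clf.predict(X[100:])                              # half for evaluation
acc = accuracy_score(y[100:], pred)
kappa = cohen_kappa_score(y[100:], pred)
```

As in the multiclass case, the paper's protocol trains on the original image and evaluates on the decoded scenes, reporting accuracy, κ, and the confusion matrix.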
The performance appears to be almost independent of the bit-rate, even at low bit-rates, and all the algorithms achieve similar performance, very close to the binary detection results on the original image. In this application, MSE turns out to be poorly correlated with the information extraction performance.

D. Anomaly Detection

As for anomaly detection, we have applied the RX detector by Reed and Yu, as defined in (1) of [39]. Since our goal is to assess the performance impairment with respect to the no-compression case, we have employed the anomaly detection results obtained on the uncompressed image as reference results, and we have computed the error probability of the algorithm run on the decoded image; both missed detections and false alarms have been considered as error sources. It should be noted that the anomaly detection results obtained on the original image, which we employ to compute the error rates, are not necessarily exact; unfortunately, anomaly detection ground truth is not available for this image. As a consequence, the error rates simply show the percentage difference between the detection results obtained on the decoded image and those obtained on the original image. Since the exact percentage of anomalous pixels is not known a priori, and is to some extent application-dependent, the results presented in this paper have been worked out considering increasingly high percentages of anomalous pixels, i.e., increasingly low thresholds for the likelihood test. Fig. 11 shows the error rates for Cuprite as a function of the percentage of background pixels (the anomalous pixels being the complementary set). The encoding for this experiment has been done at 0.5 and 1.5 bpppb. Somewhat surprisingly, it turns out that, for this application, the results show the opposite trend with respect to the classification results. In particular, the DWT-based compression algorithm achieves the best performance, followed by the low-complexity KLT, and then by the full-complexity KLT.
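The RX detector scores each pixel by the Mahalanobis distance of its spectral vector from the global background statistics, and sweeping a percentile threshold over the scores yields the varying background percentages of Fig. 11. A minimal global-RX sketch in NumPy follows; the function names and the diagonal regularization term are our own assumptions for illustration, not details taken from [39]:

```python
import numpy as np

def rx_scores(cube):
    """Global RX anomaly scores for a hyperspectral cube of shape
    (rows, cols, bands): Mahalanobis distance of each spectral vector
    from the global mean under the global covariance."""
    rows, cols, bands = cube.shape
    x = cube.reshape(-1, bands).astype(float)
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # small diagonal loading to keep the covariance invertible (assumption)
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(bands))
    d = x - mu
    # quadratic form d_i^T cov_inv d_i for every pixel i
    scores = np.einsum('ij,jk,ik->i', d, cov_inv, d)
    return scores.reshape(rows, cols)

def anomaly_mask(scores, background_pct):
    """Flag as anomalous the pixels whose score exceeds the given
    background percentile, mirroring the sweep over background
    percentages used in the experiment."""
    threshold = np.percentile(scores, background_pct)
    return scores > threshold

np.random.seed(0)
cube = np.random.rand(8, 8, 5)   # toy cube: 8x8 pixels, 5 bands
scores = rx_scores(cube)
mask = anomaly_mask(scores, 95.0)  # flag roughly the top 5% of pixels
```

Lowering `background_pct` lowers the threshold and declares more pixels anomalous, which is exactly the sweep along the horizontal axis of Fig. 11.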
In other words, the algorithms with the best rate-distortion performance are those that perform worst for anomaly detection. This behavior can be explained by considering that transform coding, and lossy compression in general, can be seen as an approximation problem in which one wants to describe a signal using the smallest number of coefficients. To do so, a signal basis is usually sought that captures the most typical features of the signal. That is, compression is achieved by

1420 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 45, NO. 5, MAY 2007

representing all the typical components, and discarding, or representing more coarsely, the anomalous components. The DWT does this by using a fixed set of basis functions that capture both long and short features of the signal. The KLT does even better, in that it computes the optimal basis functions that match the principal components of a specific signal. As a consequence, although the overall MSE is significantly lower for the KLT-based algorithm, for some spectral vectors, and specifically for a few anomalous ones, the KLT can produce a higher MSE, and hence impair the detection of these anomalous pixels. For example, in the experiment above at 0.5 bpppb, we have computed the reconstruction MSE for the two most anomalous spectral vectors, i.e., those with the largest likelihood. It turns out that, for those vectors, the DWT achieves an MSE of 178.5 and 192.61, while values of 260.25 and 216.40 are obtained by the KLT. This highlights that anomalous pixels can exhibit large errors, thus impairing detection performance.

VII. CONCLUSION

In this paper, we have carried out an extensive study of 3-D transforms for lossy compression of hyperspectral data. It has been found that, among wavelet-based transforms, a hybrid rectangular/square transform is highly suitable, and achieves performance similar to wavelet packets. The best spectral transform has turned out to be the KLT. In order to make this transform computationally feasible, we have proposed a low-complexity version with comparable performance. The degree of computational saving and the related performance loss can be tuned to the specific needs of each application. The low-complexity KLT, along with a hybrid wavelet-based scheme, has been integrated into a JPEG 2000 Part 2 compliant scheme. The proposed KLT-based scheme achieves significant performance gains with respect to wavelet-based methods.
On highly nonstationary scenes, the gain is still present, although to a smaller extent, since the KLT is less effective in capturing the nonstationary behavior. An end-to-end complexity reduction of about three times can be achieved using the low-complexity KLT, with a minor performance loss (about 0.5 dB). This transform is only about 40% more complex than 3-D wavelets, but has significantly better performance. A quality assessment of compressed images has also been carried out by evaluating the effects of several lossy compression schemes on different remote sensing applications, namely multiclass classification, classification of mixed pixels via spectral unmixing, binary hard classification, and anomaly detection. It turns out that, for multiclass classification and spectral unmixing, SNR is a reasonable indicator of classification performance, so that the proposed scheme is still the highest-performing one by a large margin. For binary hard classification, the performance turns out to be largely independent of the MSE over a wide range of bit-rates. On the other hand, it has been found that the best algorithms from the rate-distortion viewpoint are not the best choice for anomaly detection, since they provide a very good representation of the most typical image features, but exhibit larger errors on the anomalous pixels. As a consequence, the use of MSE as an error metric does not always reflect the information extraction performance. In general, MSE-optimized algorithms, such as those using the KLT, provide a very good representation of the most typical image features, but may exhibit larger errors on the nontypical pixels. Therefore, applications where these nontypical pixels are important may be affected by compression to a larger extent.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful comments, which helped improve the quality of this paper.

REFERENCES

[1] D. S. Taubman and M. W.
Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice. Norwell, MA: Kluwer, 2001.
[2] S. Lim, K. Sohn, and C. Lee, "Compression for hyperspectral images using three dimensional wavelet transform," in Proc. IGARSS, Sydney, Australia, 2001, pp. 109–111.
[3] Y. Tseng, H. Shih, and P. Hsu, "Hyperspectral image compression using three-dimensional wavelet transformation," in Proc. 21st ACRS, Taipei, Taiwan, 2000.
[4] A. Kaarna and J. Parkkinen, "Comparison of compression methods for multispectral images," in Proc. NORSIG Nordic Signal Process. Symp., Kolmarden, Sweden, 2000, vol. 2, pp. 251–254.
[5] G. P. Abousleman, M. W. Marcellin, and B. R. Hunt, "Compression of hyperspectral imagery using the 3-D DCT and hybrid DPCM-DCT," IEEE Trans. Geosci. Remote Sens., vol. 33, no. 1, pp. 26–34, Jan. 1995.
[6] D. Markman and D. Malah, "Hyperspectral image coding using 3D transforms," in Proc. IEEE ICIP, Thessaloniki, Greece, 2001, pp. 114–117.
[7] M. D. Pal, C. M. Brislawn, and S. R. Brumby, "Feature extraction from hyperspectral images compressed using the JPEG-2000 standard," in Proc. IEEE SSIAI, Santa Fe, NM, 2002, pp. 168–172.
[8] S. Lim, K. H. Sohn, and C. Lee, "Principal component analysis for compression of hyperspectral images," in Proc. IGARSS, Sydney, Australia, 2001, pp. 97–99.
[9] X. Tang, C. Sungdae, and W. A. Pearlman, "3D set partitioning coding methods in hyperspectral image compression," in Proc. IEEE ICIP, Barcelona, Spain, 2003, pp. II-239–II-242.
[10] X. Tang and W. A. Pearlman, "Three-dimensional wavelet-based compression of hyperspectral images," in Hyperspectral Data Compression. Norwell, MA: Kluwer, 2005.
[11] J. A. Saghri, A. G. Tescher, and J. T. Reagan, "Practical transform coding of multispectral imagery," IEEE Signal Process. Mag., vol. 12, no. 1, pp. 32–43, Jan. 1995.
[12] P. L. Dragotti, G. Poggi, and A. R. P. Ragozini, "Compression of multispectral images by three-dimensional SPIHT algorithm," IEEE Trans. Geosci. Remote Sens., vol. 38, no. 1, pp. 416–428, Jan. 2000.
[13] L. Chang, C. Cheng, and T. Chen, "An efficient adaptive KLT for multispectral image compression," in Proc. 4th IEEE Southwest Symp. Image Anal. and Interpretation, Austin, TX, 2000, pp. 252–255.
[14] P. Hao and Q. Shi, "Reversible integer KLT for progressive-to-lossless compression of multiple component images," in Proc. IEEE Int. Conf. Image Process., Barcelona, Spain, 2003, pp. I-633–I-636.
[15] B. Penna, T. Tillo, E. Magli, and G. Olmo, "Progressive 3-D coding of hyperspectral images based on JPEG 2000," IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 125–129, Jan. 2006.
[16] J. T. Rucker, J. E. Fowler, and N. H. Younan, "JPEG2000 coding strategies for hyperspectral data," in Proc. IGARSS, 2005.
[17] B. Penna, T. Tillo, E. Magli, and G. Olmo, "Embedded lossy to lossless compression of hyperspectral images using JPEG 2000," in Proc. IGARSS, 2005, pp. 140–143.
[18] Y. Wang, J. T. Rucker, and J. E. Fowler, "Three-dimensional tarp coding for the compression of hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 1, no. 2, pp. 136–140, Apr. 2004.
[19] E. Christophe, D. Léger, and C. Mailhes, "Quality criteria benchmark for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 9, pp. 2103–2114, Sep. 2005.
[20] V. Guralnik and G. Karypis, "A scalable algorithm for clustering protein sequences," in Proc. Workshop Data Mining Bioinformatics, 2001.

[21] M. Boliek, Ed., JPEG 2000 Part 2 Extensions, Document ISO/IEC 15444-2.
[22] J. J. Gerbrands, "On the relationships between SVD, KLT and PCA," Pattern Recogn., vol. 14, no. 1–6, pp. 375–381, 1981.
[23] A. Kaarna, P. Zemcik, H. Kalviainen, and J. Parkkinen, "Compression of multispectral remote sensing images using clustering and spectral reduction," IEEE Trans. Geosci. Remote Sens., vol. 38, no. 2, pp. 1073–1082, Mar. 2000.
[24] S. Lim, K. H. Sohn, and C. Lee, "Principal component analysis for compression of hyperspectral images," in Proc. IGARSS, 2001, pp. 97–99.
[25] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[26] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 713–718, Mar. 1992.
[27] K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. Image Process., vol. 2, no. 2, pp. 160–175, Apr. 1993.
[28] B.-J. Kim, Z. Xiong, and W. A. Pearlman, "Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT)," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 8, pp. 1374–1387, Dec. 2000.
[29] I. S. Dhillon, "A new O(N²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem," Ph.D. dissertation, Univ. California, Berkeley, CA, 1997.
[30] X. Wu and N. Memon, "Context-based lossless interband compression: Extending CALIC," IEEE Trans. Image Process., vol. 9, no. 6, pp. 994–1001, Jun. 2000.
[31] G. Gelli and G. Poggi, "Compression of multispectral images by spectral classification and transform coding," IEEE Trans. Image Process., vol. 8, no. 4, pp. 476–489, Apr. 1999.
[32] X. Tang and W. A. Pearlman, "Lossy-to-lossless block-based compression of hyperspectral volumetric data," in Proc. IEEE Int. Conf. Image Process., 2004, pp. 1133–1136.
[33] W. J. Krzanowski, Principles of Multivariate Analysis. London, U.K.: Oxford Univ. Press, 1988.
[34] N. Keshava and J. F. Mustard, "Spectral unmixing," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, Jan. 2002.
[35] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.
[36] D. W. J. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaum, and A. D. Stocker, "Anomaly detection from hyperspectral imagery," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 58–69, Jan. 2002.
[37] I. S. Reed and X. Yu, "Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution," IEEE Trans. Acoust., Speech Signal Process., vol. 38, no. 10, pp. 1760–1770, Oct. 1990.
[38] Y. E. Shimabukuro and J. A. Smith, "The least-squares mixing models to generate fraction images derived from remote sensing multispectral data," IEEE Trans. Geosci. Remote Sens., vol. 29, no. 1, pp. 16–20, Jan. 1991.
[39] C.-I Chang and S.-S. Chiang, "Anomaly detection and classification for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 6, pp. 1314–1325, Jun. 2002.

Barbara Penna (S'02) was born in Castellamonte (Torino), Italy, on May 8, 1976. She received the degree in electronics engineering from the Politecnico di Torino, Turin, Italy, in July 2001, where she has been working toward the Ph.D. degree in electronics and communication engineering at the Department of Electronics since January 2003. From September 2001 to December 2002, she worked as a Researcher under grant in the Image Processing Lab of the Department of Electronics, Politecnico di Torino. Her research interests are in the field of image compression for remote sensing applications.

Tammam Tillo (S'02-M'06) was born in Damascus, Syria, in 1971. He received the degree in electrical engineering from the University of Damascus, Damascus, Syria, in 1994, and the Ph.D.
degree in electronics and communication engineering from the Politecnico di Torino, Turin, Italy, in 2005. From 1999 to 2002, he was with Souccar For Electronic Industries, Damascus. In 2004, he was a Visiting Researcher at the Swiss Federal Institute of Technology, Lausanne, Switzerland. He is currently a Postdoctoral Researcher at the Dipartimento di Elettronica, Politecnico di Torino. His research interests are in the areas of robust transmission, image and video compression, and hyperspectral image compression.

Enrico Magli (S'98-M'01) received the degree in electronics engineering and the Ph.D. degree in electrical engineering from the Politecnico di Torino, Turin, Italy, in 1997 and 2001, respectively. He is currently an Assistant Professor at the Politecnico di Torino. His research interests are in the fields of error-resilient image and video coding for wireless applications, compression of remote sensing images, image security and digital watermarking, and distributed source coding. From March to August 2000, he was a Visiting Researcher at the Signal Processing Laboratory of the Swiss Federal Institute of Technology, Lausanne, Switzerland. He has coauthored more than 100 scientific papers in international journals and conferences. Dr. Magli is currently a member of the Data Archiving and Distribution Technical Committee of the IEEE Geoscience and Remote Sensing Society and of the Multimedia Systems and Applications Technical Committee of the IEEE Circuits and Systems Society, and a contributor to the ISO activities on JPEG 2000 (Part 11, wireless applications).
He has been a member of the Technical Program Committee and a session chair for several international conferences, including the IEEE International Conference on Acoustics, Speech, and Signal Processing, the IEEE International Conference on Multimedia and Expo, the IEEE International Conference on Image Processing, the IEEE International Workshop on Multimedia Signal Processing, the IEEE International Symposium on Circuits and Systems, and conferences of the IEEE Geoscience and Remote Sensing Society.

Gabriella Olmo (S'89-M'91-SM'06) received the M.S. (cum laude) and Ph.D. degrees in electronic engineering from the Politecnico di Torino, Turin, Italy. She is currently an Associate Professor at the Politecnico di Torino. Her main recent interests are in the fields of image and video coding, resilient multimedia coding for wireless applications, compression of remote sensing images, joint source-channel coding, and distributed source coding. She has coordinated several national and international research programs in the fields of wireless multimedia communications, under contracts with the European Community and the Italian Ministry of Education. She has coauthored more than 130 papers in international technical journals and conference proceedings. Dr. Olmo has been a member of the Technical Program Committee and a session chair for several international conferences. She is a member of the IEEE Communications Society and the IEEE Signal Processing Society.