On Clustering and Embedding Manifolds using a Low Rank Neighborhood Approach


Arun M. Saranathan, Student Member, IEEE, and Mario Parente, Member, IEEE

Abstract

In the manifold learning community there has been an emphasis on the simultaneous clustering and embedding of multiple manifolds. Manifold clustering and embedding algorithms perform especially poorly when embedding highly nonlinear manifolds. In this paper we propose a novel algorithm for improved manifold clustering and embedding. Since a majority of these algorithms are graph based, they use different strategies to ensure that only data-points belonging to the same manifold are chosen as neighbors. The new algorithm adds a low-rank criterion on the neighborhood of each data-point to ensure that only data-points belonging to the same manifold are prioritized for neighbor selection. Following this, a reconstruction matrix is calculated to express each data-point as an affine combination of its neighbors. If the low-rank neighborhood criterion succeeds in prioritizing data-points belonging to the same manifold as neighbors, the reconstruction matrix is (near) block-diagonal. This reconstruction matrix can then be used for clustering and embedding. Over a variety of simulated and real data-sets the algorithm improves on state-of-the-art manifold clustering and embedding algorithms in terms of both clustering and embedding performance.

Index Terms: Manifold Clustering, Manifold Embedding, low rank neighborhood selection.

I. INTRODUCTION

In the era of big data we live under a constant deluge of high-dimensional data in the form of image/video streams, hyperspectral images and audio recordings, to cite a few. Generally, these data-streams are not intrinsically high dimensional; rather, they lie on or near manifolds that exhibit lower intrinsic dimensionality. Identifying and unraveling these few degrees of freedom leads to significant gains in data processing and storage. Points lying on/near smooth manifolds can be modeled as data that are originally drawn from a low-dimensional parameter space and then mapped by smooth (i.e. diffeomorphic and invertible) linear or nonlinear functions into a high-dimensional ambient space. Manifold learning algorithms attempt to learn a low-dimensional representation of the data, or better, to embed the data into a lower-dimensional coordinate system such that some of the local geometric structure of the high-dimensional data is preserved. In simpler terms, such techniques attempt to eliminate the (nonlinear) effects of the mapping and learn the lower-dimensional parameter space representation. A large number of algorithms have been proposed for manifold learning [1], [2]. Such approaches attempt to find a mapping that preserves appropriate global properties (e.g. ISOMAP [3]) or local properties (e.g. Locally Linear Embedding (LLE) [4], Laplacian Eigenmaps [5] and Local Tangent Space Alignment (LTSA) [6]). A variety of such techniques have been created and used for a wide range of applications (see [7], [8] and references therein for more examples).

The assumption that the data are drawn from a single manifold is seldom satisfied. In practice we are better served by modeling the data as lying on or near a mixture of manifolds.

A. M. Saranathan and M. Parente are with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, USA (asaranat@umass.edu).
If these manifolds are well separated, simply adding a clustering technique such as N-Cuts [9] prior to manifold learning is sufficient to identify the different manifolds, so that accurate low-dimensional parameter space representations (embeddings) for each of the manifolds can be computed. On the other hand, if the different manifolds overlap, identifying the different manifolds and generating quality embeddings cannot be easily accomplished. Data-sets that can be modeled as overlapping manifolds have been observed in hyperspectral imaging (whenever there are multiple mixtures with shared endmembers [10]), in images of hand-written numbers [11], natural images for object recognition [12], human face images [11], [13] and human motion capture tasks [14].

A variety of approaches have been designed for clustering data belonging to multiple manifolds. The simplest approaches model the data as lying on or near linear manifolds (affine spaces) and leverage this expectation of linearity to aid manifold identification (see [15] and references therein). Other algorithms, such as Local Structural Consistency (LSC) [12] and Spectral Multi-Manifold Clustering (SMMC), create a structure-dependent similarity metric to generate suitable affinity matrices for spectral clustering. Another approach, Robust Multiple Manifolds Structure Learning (RMMSL) [16], generates affinity matrices based on tangent space alignment. Unfortunately, these algorithms do not generate low-dimensional embeddings for the data. Other algorithms like k-Manifolds [14] are based on the assumption that there is no embedding in a lower-dimensional space that preserves all the properties captured in the high-dimensional data, but this has been shown to be false in the case of nonlinear manifolds that only share a boundary, such as multiple mixtures with shared endmembers in hyperspectral data [10].

Other popular algorithms draw on the notion of reconstruction coefficients/matrices first introduced in Locally Linear Embedding (LLE) [4]. These algorithms follow a general scheme wherein each data-point is expressed as an affine (or sometimes linear) combination of other points in the data-set. In addition, the algorithms place some penalty on the reconstruction coefficients to make sure that data-points on the same manifold are chosen (assigned non-zero reconstruction coefficients) to reconstruct the given data-point. This yields a reconstruction matrix that is approximately block-diagonal, and the application of a spectral clustering algorithm is then sufficient to identify the different manifolds present in the data.

Reconstruction-based methods differ mainly in the type of constraints imposed on the reconstruction coefficients to generate a block-diagonal reconstruction matrix. These techniques then generate embeddings from the reconstruction matrix using schemes similar to the one used by LLE.

One such algorithm is Low Rank Embedding (LRE) [17], which adds a rank-based penalty on the reconstruction matrix. The algorithm is quite similar to the Low Rank Representation (LRR) [18] approach for subspace clustering. In effect, LRE assumes that the reconstruction coefficients of data-points on the same manifold have a similar underlying structure, i.e. they can be reconstructed accurately by using the same set of points. The LRE algorithm generates an embedding of the data into a low-dimensional space using the reconstruction matrix in the same fashion as LLE, and then performs k-means [19] on the embedding to learn manifold memberships. While this is a reasonable assumption for data which can be modeled as linear subspaces with some distortions, for highly nonlinear manifolds the neighborhood structures of data-points in different parts of a manifold are significantly different; thus different sets of points should be used to reconstruct target points in different linear patches on the manifold. Furthermore, the LLE embedding scheme, which LRE uses, only captures the geometric properties of a neighborhood when the reconstruction coefficients are unaffected by translation, rotation and scaling [4]. The LLE algorithm ensures such invariance by enforcing a sum-to-one (affineness) constraint on the reconstruction coefficients in the neighborhood. Since LRE does not add this constraint, the reconstruction coefficients are no longer invariant to rigid linear transformations, which leads to distortions in the embedding.

Another example of reconstruction-matrix-based algorithms that perform both manifold clustering and embedding is Sparse Manifold Clustering and Embedding (SMCE) [11]. SMCE attempts to find a reconstruction matrix where each data-point is expressed as an affine combination of its k nearest neighbors, and adds a penalty on the distance-based sparsity of the reconstruction coefficient vector of the point. The authors show that the effect of minimizing both the reconstruction error and the sparsity penalty as much as possible is that only data-points on the same manifold are assigned non-zero weights. The reconstruction matrix created in this fashion also fulfills the conditions mentioned in LLE for accurate embeddings, i.e. the reconstruction coefficients are invariant to translations, rotations and scaling. While creating sparse neighborhoods may aid the clustering objective, it raises some issues in the embedding. In particular, the spectral embedding technique introduced in LLE only preserves local relationships, and there is no penalty if the global geometric information is distorted. Namely, if different neighborhoods do not share points, there is no penalty if they are embedded with different scalings or rotations: the global shape is only preserved if there is significant overlap between adjacent neighborhoods. Since SMCE creates very sparse neighborhoods with little or no overlap, there may be significant distortions in the global shape.
A more recent technique, Joint Manifold Clustering and Embedding (JMCE) [20], expresses each data-point as a convex combination of its k nearest neighbors and at the same time adds a penalty on the magnitude of the non-zero weights assigned to neighbors on other manifolds. While the technique has shown some promise in clustering hyperspectral data, due to the restriction to convex reconstructions the embedding suffers from distortions at the boundary of the manifolds.

In this paper we propose a novel approach, the Low Rank Neighborhood Embedding (LRNE), which expresses every data-point as an affine combination of its k nearest neighbors. The novelty with respect to the other reconstruction-based approaches is that, in order to ensure that only neighbors on the same manifold are prioritized in reconstructing a target point, we add a penalty on the dimension of the neighborhood of the point, rather than relying on sparsity or penalizing the rank of the whole reconstruction matrix. More specifically, the penalty encourages selecting from the neighborhood a set of points for reconstruction that belongs to an affine patch of dimension as low as possible. Consider a point at/near an intersection: the nearest neighbors are drawn from different linear patches, each belonging to a different manifold overlapping at that intersection. In this scenario, choosing the set of points which minimizes the reconstruction error and also lies on a patch of the lowest possible dimension will make sure that points from the same manifold as the original point are chosen for the reconstruction. Since the reconstruction scheme is local and affine, the LRNE reconstruction matrix can be embedded by using a spectral embedding stage similar to the one described in LLE. The reconstruction coefficient vectors generated by the LRNE are invariant to translations, rotations and scaling. Also, unlike SMCE, which due to its requirement of learning a sparse neighborhood assigns large (non-zero) reconstruction coefficients to as few neighbors as possible, the LRNE assigns large (non-zero) reconstruction coefficients to all points in the neighborhood that lie on the same manifold. This ensures sufficient overlap between adjacent neighborhoods, which is necessary to help preserve the global geometry as much as possible in the embedding. A preliminary version of this algorithm with limited results for hyperspectral data alone can be found in [21].

The paper is arranged as follows: in Section II we describe the different multi-manifold structures our algorithm targets. In Section III we describe the new Low Rank Neighborhood Embedding (LRNE) and provide some intuition to show that choosing a low-dimensional neighborhood ensures that only data-points from the same manifold are chosen. Following this we describe an optimization scheme that ensures the choice of such a low-dimensional neighborhood, and the steps required to generate the clustering and embedding from the reconstruction coefficients. In Section IV we describe the experiments used to compare the various manifold clustering and embedding algorithms and analyze the results. We offer concluding remarks and avenues for further research in Section V.

Fig. 1. Types of manifold mixtures: (a) Adjoining manifolds. (b) Intersecting manifolds. Inset figures show the neighborhood for a target point at/near the intersection.

II. MANIFOLD MIXTURE TYPES

We will apply our method to two different types of manifold mixtures. In one case the different manifolds have only a boundary in common; we will refer to such manifolds as adjoining manifolds. An example of such manifolds is shown in Fig. 1 (a). Another case is when the manifolds appear to pass through each other; we will refer to such manifolds as intersecting manifolds. An example of such manifolds is shown in Fig. 1 (b). For each set of manifolds, we will refer to the sub-manifold on each side of the intersection as an arm of the manifold.

III. THE LOW-RANK NEIGHBORHOOD EMBEDDING ALGORITHM

Given a set of points in $\mathbb{R}^D$, drawn from $p$ different smooth and sufficiently well sampled manifolds, the Low Rank Neighborhood Embedding (LRNE) algorithm attempts to find a representation such that each point is reconstructed as an affine combination of the spatially nearest neighbors that lie on the same manifold as the target point. The resulting reconstruction matrix is therefore approximately block-diagonal, where each block should contain reconstruction coefficients for data-points drawn from a single manifold arm. A symmetrized version of the reconstruction matrix is then used as a similarity matrix in a spectral clustering algorithm to identify the different clusters (manifold arms) in the data. For intersecting manifolds, an additional procedure pairs the different arms of each manifold. The reconstruction matrix is also used for embedding, in a procedure similar to the LLE embedding.

A. Generating the reconstruction matrix

The Low Rank Neighborhood Embedding (LRNE) follows a scheme similar to Locally Linear Embedding (LLE), wherein it attempts to express every data-point as an affine combination of its nearest neighbors, with additional penalties to ensure that only neighbors on the same manifold as a given data-point are used in the reconstruction. The LLE objective function to find the appropriate reconstruction coefficients, as defined in [4], is:

$$\min_{\alpha} \Big\| x - \sum_{i=1}^{k} \alpha_i n_i \Big\|^2 \quad \text{subject to} \quad \mathbf{1}^T \alpha = 1, \tag{1}$$

where $x$ is the target point for which we are finding the reconstruction coefficients, $\|\cdot\|$ is the $\ell_2$ norm, $N(x) = \{n_1, n_2, \dots, n_k\}$ is the set of the $k$ nearest neighbors of the data-point $x$, and $\alpha_i \in \mathbb{R}$ is the reconstruction coefficient assigned to the neighbor $n_i$.

In scenarios where the data lie on multiple manifolds with some overlap, the neighborhood $N(x)$ of a target point at/near an intersection will contain points from all the different manifolds that overlap at the intersection. Since each manifold is smooth and well sampled, the neighbors drawn from each manifold will appear to lie on a linear patch. The inset pictures in Fig. 1 show the neighborhoods for a data-point at/near the intersection for the example manifolds. The target point in each case is marked in black, neighbors from the same manifold as the target point are marked in blue, and neighbors from the other manifold are marked in red. The dimensionality of $N(x)$ is the dimensionality of the union of all the linear patches, each of which belongs to a lower-dimensional affine space. The LRNE attempts to select from the neighborhood $N(x)$ only those neighbors that lie on the same manifold as the target point as the ones to be used in the reconstruction.
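For reference, the constrained problem in Eqn. (1) has the standard LLE closed-form solution via the local Gram matrix. The following Python sketch (a minimal illustration, not the authors' code; the small regularization term is our addition for numerical stability) computes the affine weights for a single target point:

```python
import numpy as np

def lle_affine_weights(x, neighbors, reg=1e-3):
    """Solve Eqn. (1): min_a ||x - sum_i a_i n_i||^2  s.t.  1^T a = 1."""
    k = neighbors.shape[0]
    diff = neighbors - x                    # rows are n_i - x
    G = diff @ diff.T                       # Gram matrix G_ij = <n_i - x, n_j - x>
    G += reg * np.trace(G) / k * np.eye(k)  # stabilizer (our assumption, not in the paper)
    alpha = np.linalg.solve(G, np.ones(k))  # Lagrange condition G a = c 1
    return alpha / alpha.sum()              # rescale to satisfy 1^T a = 1
```

Under the sum-to-one constraint the residual can be rewritten as $\sum_i \alpha_i (x - n_i)$, which is what reduces the problem to the small Gram system above.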
This problem can be cast as the task of finding a subset of the neighborhood that both successfully reconstructs the target point (according to Eqn. (1)) and spans an affine space of the lowest possible dimension. Consider a weighted neighborhood matrix $M = [\alpha_1 n_1 \ \alpha_2 n_2 \ \dots \ \alpha_k n_k]$. If we add a penalty to the objective function defined in Eqn. (1) based on the dimension of the space spanned by the columns of $M$ (which is given by $\operatorname{rank}(M)$), the only way to lower the dimensionality of that space (i.e. $\operatorname{rank}(M)$) is by zeroing out the coefficients $\alpha_i$ corresponding to all the points that lie on some of the manifolds. We informally refer to the neighbors with non-zero coefficients as the points chosen for the reconstruction. The penalty on the dimensionality of the chosen set can be increased so that, possibly, neighbors lying on only one manifold are selected. Additionally, if the chosen points do not lie on the same manifold as the target, the reconstruction error will not be small. Thus the need to simultaneously minimize the reconstruction error and the dimension of the neighborhood ensures that only points on the same manifold as the target point are chosen for the reconstruction (in the absence of noise). The parameter $\lambda$ regulates the trade-off between the two terms, allowing reconstruction using some points from the wrong manifold in order to obtain a better linear fit to the data when noise or local density changes are present.

As a result of the above discussion, we propose the addition of a dimension-based penalty to the LLE reconstruction objective:

$$\min_{\alpha} \frac{1}{2} \Big\| x - \sum_{i=1}^{k} \alpha_i n_i \Big\|^2 + \lambda \operatorname{rank}(M) \quad \text{subject to} \quad \mathbf{1}^T \alpha = 1. \tag{2}$$

It is straightforward to note that for target points not close to the intersection, Eqn. (2) can be applied as well, with the effect that no points from $N(x)$ are excluded. The one hurdle in the solution of the problem defined in Eqn. (2) is that the rank function, and hence the objective function defined above, is not convex.
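The effect of the rank penalty can be seen on a toy neighborhood drawn from two 2-D patches in $\mathbb{R}^3$ (a hypothetical example of ours, with linear patches through the origin so that the span/rank bookkeeping stays simple): zeroing the coefficients of one patch lowers $\operatorname{rank}(M)$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Five neighbors on each of two 2-D linear patches in R^3
patch_a = np.vstack([rng.uniform(-1, 1, (2, 5)), np.zeros((1, 5))])  # spans the x-y plane
patch_b = np.vstack([rng.uniform(-1, 1, (1, 5)),
                     np.zeros((1, 5)),
                     rng.uniform(-1, 1, (1, 5))])                    # spans the x-z plane
N = np.hstack([patch_a, patch_b])              # neighborhood, one point per column

alpha = rng.uniform(0.1, 1.0, 10)              # generic weights: both patches active
print(np.linalg.matrix_rank(N * alpha))        # 3: the union of the two patches

alpha[5:] = 0.0                                # choose only the first patch
print(np.linalg.matrix_rank(N * alpha))        # 2: a single low-dimensional patch
```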

Fig. 2. Reconstruction matrix for (a) adjoining manifolds and (b) intersecting manifolds. The points are arranged according to arm-membership to highlight the block structure.

To mitigate this problem the rank function is replaced by the nuclear norm ($\|\cdot\|_*$), which has been shown to be a convex approximation of the rank function [22]. The modified objective function can be written as:

$$\min_{\alpha} \frac{1}{2} \Big\| x - \sum_{i=1}^{k} \alpha_i n_i \Big\|^2 + \lambda \| \hat{M} \|_* \quad \text{subject to} \quad \mathbf{1}^T \alpha = 1, \tag{3}$$

where $\hat{M} = [\alpha_1 \hat{n}_1 \ \alpha_2 \hat{n}_2 \ \dots \ \alpha_k \hat{n}_k]$ and $\hat{n}_i = n_i / \|n_i\|$. The normalization is necessary because, unlike the rank, the nuclear norm is affected by the scaling of the columns of $M$. Since the objective function described in Eqn. (3) is convex, the optimization problem can be solved using the CVX toolbox for MATLAB [23], [24]. In this paper we also present an ADMM-based [25] first-order solver for the minimization of the problem described in Eqn. (3); we show the details of the technique in Appendix A.

The effect of the approximation in Eqn. (3) is that, rather than forcing to zero the reconstruction coefficients of neighbors on manifolds other than the one containing the target point, neighbors on the same manifold as the target point are assigned much larger weights as compared to points on the other manifolds. The result of applying Eqn. (3) to the $i$-th target point in the data-set is a reconstruction vector $\alpha$ that fills the $i$-th row of a reconstruction matrix $R$, so that $R_{ij} = 0$ if $x_j \notin N(x_i)$, and otherwise $R_{ij} = \alpha_j$.

If the manifolds are adjoining manifolds, the reconstruction matrix will exhibit an approximately block-diagonal structure with the number of blocks equal to the number of manifolds. Consider the example of adjoining manifolds shown in Fig. 1 (a): in this data-set each linear patch at the intersection belongs to a different manifold, as shown in the inset figure. Since Eqn. (3) penalizes assigning large reconstruction weights to data-points on a different linear patch (or manifold), only data-points on the same manifold as the target point are assigned large reconstruction coefficients. In such a scenario the reconstruction matrix will appear approximately block-diagonal with two blocks. It is important to notice that the sparsity structure within each block is only due to the neighborhood sparsity, and no further subdivision into sub-blocks is observed. Such an occurrence would imply either that the neighborhoods of some points do not lie on linear patches or that there is not enough overlap between the different neighborhoods, which is contrary to the assumptions of local linearity and of a sufficiently well sampled manifold. The reconstruction matrix for the data-set shown in Fig. 1 (a) is shown in Fig. 2 (a).

If the data-points lie on intersecting manifolds, the different blocks will identify with the different arms of the overlapping manifolds. Again, we can illustrate the issue with the example in Fig. 1 (b), in which there are two manifolds, each with an arm on each side of the intersection. The algorithm will assign much larger reconstruction weights to points on the same manifold as the target point. Consider a point that is on the horizontal manifold and is a small distance from the intersection (marked in black), as shown in Fig. 3 (a).
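To make the per-point problem in Eqn. (3) concrete, the sketch below solves it with CVXPY (a Python stand-in we use for illustration in place of the MATLAB CVX setup cited above; the value of λ and the data layout are placeholders):

```python
import numpy as np
import cvxpy as cp

def lrne_weights(x, N, lam=0.1):
    """Solve Eqn. (3) for one target point x with neighbors in the columns of N."""
    k = N.shape[1]
    N_hat = N / np.linalg.norm(N, axis=0)       # column-normalized neighbors
    alpha = cp.Variable(k)
    M_hat = N_hat @ cp.diag(alpha)              # [a_1 n^_1 ... a_k n^_k]
    objective = 0.5 * cp.sum_squares(x - N @ alpha) + lam * cp.normNuc(M_hat)
    cp.Problem(cp.Minimize(objective), [cp.sum(alpha) == 1]).solve()
    return alpha.value
```

The returned vector fills the corresponding row of $R$ at the positions of the k nearest neighbors, exactly as described above.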
The neighborhood of the target point is composed of points approximately lying on the four patches shown in orange (neighbors on the same arm as the target point), purple (neighbors on the opposite arm of the same manifold), and blue and green (neighbors on the other manifold). The number of neighbors in each patch is proportional to the area of the patch if the manifolds are uniformly sampled. Clearly the purple region is much smaller than the orange region, and consequently a point has more neighbors on the same arm and far fewer neighbors on the other arm of the same manifold. Thus, when we consider neighborhoods made up of the nearest neighbors (either k-nearest or ε) at/near the intersection of intersecting manifolds, there is a significantly smaller number of neighbors on the opposing arm of the same manifold as compared to neighbors on the same arm. This effect, which is an intrinsic property of neighborhood graphs, together with the fact that the LRNE prioritizes neighbors on the same manifold as each target point, results in a reconstruction matrix wherein the block corresponding to each arm appears disconnected from the blocks corresponding to the other arms. Therefore the reconstruction matrix exhibits twice as many blocks as the number of manifolds (in the case of the specific example in Fig. 1 (b) the reconstruction matrix has four blocks). The reconstruction matrix for the data-set shown in Fig. 1 (b) is shown in Fig. 2 (b).

B. Clustering from the Reconstruction Matrix

Based on the discussion in the previous section, the reconstruction matrices will be approximately block-diagonal, with the blocks corresponding to points on each manifold arm. In order to perform manifold clustering, we model the set of data-points as the nodes of a graph such that the similarity between the nodes $x_i$ and $x_j$ is based on the reconstruction coefficient $r_{ij}$ in $R$ and the distance between $x_i$ and $x_j$, and we define the similarity between the two points as:

$$w_{ij} = \frac{ r_{ij} / \| x_j - x_i \|_2 }{ \sum_{t \neq i} r_{it} / \| x_t - x_i \|_2 },$$

a scheme similar to the one defined in section 2.2 of [11]. The final step is to make this similarity matrix symmetric, which can be achieved by setting $W_{sym} = \max(W, W^T)$. This matrix, which exhibits the same block structure as the reconstruction matrix, is then provided as an input to a spectral clustering algorithm [9], [26].
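A sketch of the similarity construction described above (assuming the normalization runs over each point's neighborhood, and taking magnitudes of the coefficients, which Eqn. (3) does not constrain to be non-negative):

```python
import numpy as np

def similarity_from_reconstruction(R, X):
    """Build W from the reconstruction matrix R (n x n) and data X (n x D)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.flatnonzero(R[i])                              # neighbors of x_i
        w = np.abs(R[i, idx]) / np.linalg.norm(X[idx] - X[i], axis=1)
        W[i, idx] = w / w.sum()                                 # w_ij as defined above
    return np.maximum(W, W.T)                                   # W_sym = max(W, W^T)
```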

Fig. 3. (a) Neighborhood structure for a point at the intersection. (b) Reconstruction coefficients for a point at the intersection based on arm membership [blue: same arm, red: other arm of the same manifold, green & black: different arms on the other manifold].

In this case we use the unnormalized spectral clustering algorithm described in section 4 of [27]; additionally, we also use the recursive two-way cut scheme described in section 3.2 of [9]. In practice, however, any spectral clustering technique will provide reasonable results: if the weight matrix is (nearly) block-diagonal with k blocks, so is the Laplacian, and the spectral clustering algorithm will identify the different blocks as separate connected components in the corresponding graph (see Proposition 2 of [27]). In the case of adjoining manifolds, such clustering directly identifies the different manifolds present in the data, as the number of blocks in the reconstruction matrix is equal to the number of manifolds in the data. For intersecting manifolds the identified clusters will correspond to the arms of every manifold.

C. Pairing the opposing arms of intersecting manifolds

It is important to identify two opposing arms as part of a single manifold, as these are seen as members of the same perceptual class. We pair the two arms of a manifold by leveraging the fact that, for each target point, the neighborhood-rank-based penalty of the LRNE forces the neighbors on the arms of the same manifold as the target point to have, on average, higher reconstruction coefficients than neighbors on other manifolds. This effect of the optimization in Eqn. (3) is routinely observed for points at the intersection. After this merging step we are able to achieve accurate clustering performance in the case of intersecting manifolds. As an example, Fig. 3 (b) displays, for a target point at the intersection of the manifolds in Fig. 1 (b), the reconstruction coefficients grouped according to arm membership. Reconstruction coefficients for neighbors on the same arm are shown in blue, the reconstruction coefficients for neighbors on the opposite arm of the same manifold are shown in red, and the reconstruction coefficients for neighbors on the other manifold are shown in green and black. The average reconstruction coefficients for points on the other manifold (represented by the green and black lines in Fig. 3 (b)) exhibit lower values than the ones for neighbors on the same manifold as the target point (represented by the red and blue lines in Fig. 3 (b)).

The algorithm for arm pairing is as follows (a sketch of this voting scheme appears after the list):

1) Target points near the intersection are identified as the ones which have neighbors in each of the arms identified by the spectral clustering algorithm.
2) For every target point at the intersection we compute the difference between the average reconstruction coefficient of neighbors lying on each of the other arms and the average reconstruction coefficient of neighbors on the arm with the largest weights.
3) The point then votes to merge the arm with the largest weights with the arm with the smallest difference in average reconstruction coefficient.

Each arm is merged with the arm that the majority of boundary points vote to merge with.
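The following sketch implements the voting scheme in steps 1)-3) under the assumption that `labels` holds the arm indices returned by the spectral clustering step:

```python
import numpy as np

def pair_arms(R, labels):
    """Majority-vote arm pairing from the LRNE reconstruction matrix R."""
    arms = np.unique(labels)
    votes = {a: {} for a in arms}
    for i in range(R.shape[0]):
        idx = np.flatnonzero(R[i])
        if len(np.unique(labels[idx])) < len(arms):   # step 1: intersection points only
            continue
        means = {a: R[i, idx[labels[idx] == a]].mean() for a in arms}
        best = max(means, key=means.get)              # arm with the largest avg weight
        diffs = {a: means[best] - means[a] for a in arms if a != best}
        vote = min(diffs, key=diffs.get)              # steps 2-3: smallest difference
        votes[best][vote] = votes[best].get(vote, 0) + 1
    # each arm is merged with the arm receiving the majority of its votes
    return {a: max(v, key=v.get) for a, v in votes.items() if v}
```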
In its present version, the algorithm requires user input on the presence of intersecting vs. adjoining manifolds. On the other hand, a simple observation suggests a way to automatically detect whether a merging step is required. In the case of adjoining manifolds there is only one arm associated with each manifold at the intersection, so that for a data-point at/near the intersection the low-rank criterion of the LRNE ensures that only coefficients on the same arm as the target point are, on average, assigned large reconstruction coefficients; in the case of intersecting manifolds the large coefficients are assigned on average to neighbors on the two arms of the same manifold as the target point. A simple analysis of the distribution of the metrics in step 2) of the pairing algorithm could therefore help identify the presence of intersecting manifolds.

D. Embedding from the Reconstruction Matrix

The LRNE reconstruction coefficients are generated with the same affine constraints as in LLE. As a result, they are unaffected by translations, rotations or scalings of the data-points in each neighborhood, which ensures that these coefficients capture the intrinsic geometric structure of the neighborhood [4]. We can use the reconstruction coefficients to compute a low-dimensional embedding in the same fashion as LLE. Namely, we find a low-dimensional representation $Y$ by solving the following problem:

$$\min_{Y} \sum_{i} \Big\| y_i - \sum_{j} R_{ij} y_j \Big\|^2 \quad \text{subject to} \quad Y Y^T = I. \tag{4}$$

Solving the above optimization problem generates the coordinates of the low-dimensional embedding $Y$ in an orthogonal space which is centered at the origin [4]. Following this, the embeddings corresponding to each of the different classes are separated based on the classification, to generate an embedding of each class.

E. A note on Time-Complexity

The CVX solver uses interior-point techniques which, while very accurate, are quite complicated and slow, with a worst-case time complexity of the order of $O(n^6)$ [28]. Each iteration of the ADMM algorithm has a time complexity of $O(mn^2)$, since it requires the Singular Value Decomposition (SVD) of an $m \times n$ matrix at every step [29]. In comparison, SMCE has a time complexity of $O(k)$ (where $k$ is the number of iterations of the optimization). The LRE has a time complexity of $O(N^3)$, where $N$ is the number of points, but is in general faster than the LRNE as it solves for the reconstruction coefficients of all the data-points at once. In the future we will look to adapt techniques such as the one described in [28], which have previously been used to further improve the time complexity of rank-based techniques.
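For completeness, the embedding step of Eqn. (4) reduces, as in LLE [4], to the bottom eigenvectors of $(I - R)^T (I - R)$; a minimal sketch:

```python
import numpy as np
from scipy.linalg import eigh

def lrne_embedding(R, d):
    """Embed from the reconstruction matrix R (n x n) into d dimensions (Eqn. (4))."""
    n = R.shape[0]
    M = (np.eye(n) - R).T @ (np.eye(n) - R)
    # skip eigenvector 0 (the constant vector); keep the next d
    _, vecs = eigh(M, subset_by_index=[1, d])
    return vecs                                  # (n, d) embedding coordinates
```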

TABLE I: EFFECT OF PARAMETERS ON CLUSTERING PERFORMANCE (misclassification of SMCE and LRNE for k = 20, 30, 40, 50, 60 across a range of λ values, with LRE for comparison).

Fig. 4. Simulated data-set containing adjoining manifolds: (a) original labels (b) SMCE class labels (c) LRE class labels (d) LRNE class labels (e) LLE embedding (f) SMCE embedding (g) LRE embedding (h) LRNE embedding.

Fig. 5. Intersecting manifolds simulated data-set: (a) original labels (b) SMCE class labels (c) LRE class labels (d) LRNE class labels (e) LLE embedding (f) SMCE embedding (g) LRE embedding (h) LRNE embedding.

IV. EXPERIMENTS AND RESULTS

We tested the new algorithm on simulated data-sets as well as real (benchmark) data-sets for manifold clustering and embedding. We compared the performance of the LRNE with that exhibited by the LRE and SMCE algorithms. The LRNE outperformed its competitors in terms of both classification and embedding.

A. Simulated Data

The performance of the LRNE was first assessed on simulated manifolds. The simulated data-sets comprise a pair of rectangular sheets warped by smooth nonlinear mappings (using sinc and other trigonometric functions) and corrupted by Gaussian white noise ($\mathcal{N}(0, 0.01)$). In the first data-set the pair of sheets share a boundary, as shown in Fig. 1 (a) (simulating adjoining manifolds), while in the other they pass through each other, as shown in Fig. 1 (b) (modeling intersecting manifolds). The parameter space representation of each manifold in the adjoining manifolds data-set is a 3 × 3 patch sampled uniformly; each manifold contains 630 points. On the other hand, each of the intersecting manifolds is made up of 1000 data-points uniformly sampled on a 4 × 4 patch. For both these experiments we assume that we have knowledge of the number and type (whether adjoining or intersecting) of manifolds.

The three algorithms under test feature a parameter (λ in Eqn. (3) for the LRNE) that trades off the reconstruction objective with a penalty on using points on the wrong manifold. Additionally, LRNE and SMCE share with LLE a parameter k, the number of neighbors in the k-NN graph. The various algorithms were tested on the different data-sets at different parameter values; the results for the adjoining manifolds are shown in Table I. The parameters were incrementally varied until the performance began to deteriorate. The LRE shows near constant performance and does not change significantly with the change in the parameters. The LRNE and the SMCE, on the other hand, are more sensitive to the parameter λ. Both algorithms are quite stable with respect to classification performance, but in general, over a wide range of parameters, the LRNE shows improved performance over the SMCE. Due to the unsupervised nature of the approaches under test, the notion of the set of parameter values that yields the best overall performance is dependent on the structure of the data-set. One can decide, for example, to put more emphasis on lowering the misclassification rate at the expense of the embedding performance, and vice versa.
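For concreteness, one way to generate a warped sheet of the kind described above (the exact trigonometric warpings used in the paper are not specified, so the mapping below is only a stand-in with the same flavor; $\mathcal{N}(0, 0.01)$ is read as a variance, i.e. a standard deviation of 0.1):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0, 3, 630)                          # uniform 3 x 3 parameter patch
t = rng.uniform(0, 3, 630)
sheet = np.column_stack([s, t, np.sinc(s - 1.5)])   # smooth sinc-type lift into R^3
sheet += rng.normal(0.0, 0.1, sheet.shape)          # additive Gaussian white noise
```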
Table II records the best performance in terms of misclassification rate of the different algorithms, over the above parameter ranges, on the simulated data-sets described above and on the simulated hyperspectral mixture data-set described in Section IV-B. In order to have a pictorial representation of the results, Figs. 4 and 5 illustrate the corresponding clustering and embedding results on the two warped-sheet data-sets. In particular, Fig. 4 concerns the simulated data-set with adjoining manifolds, while Fig. 5 refers to the simulated data-set with intersecting manifolds. In both figures the classification results are featured in the first row against the true manifold labels, and the embedding results are shown in the second row. The optimal embedding performance to compare the three competitors against can be considered the one achieved by applying LLE separately on each manifold.

As expected, the LRE does not fare well when the manifolds exhibit non-trivial curvature. Since nonlinear manifolds are only locally linear, reconstruction coefficients for data-points from different parts of the manifold have significantly differing structures. This does not conform to the low-rank penalty on the reconstruction matrix, which leads to highly flawed classification performances, as shown in Figs. 4 (c) and 5 (c).

TABLE II: MISCLASSIFICATION RATES OF THE DIFFERENT ALGORITHMS (LRE, SMCE and LRNE, with the corresponding k, on the Simulated Intersecting, Simulated Adjoining and Hapke with Shared EM data-sets).

The embeddings generated by the LRE are also invalid, mainly due to the poor clustering performance. The SMCE, on the other hand, performs well in terms of the classification of the adjoining manifolds, as shown in Fig. 4 (b), but the embeddings are significantly distorted, as shown in Fig. 4 (f). The poor embedding performance is caused by the fact that the penalty on the sparsity of the neighborhood of each target point results in adjacent neighborhoods that do not share significant overlap (i.e. neighboring points have vastly differing neighborhood structures). Since the embedding technique is local, i.e. it only penalizes distortions in the shape of local neighborhoods, the different neighborhoods may be embedded with slightly different orientations or scalings, leading to distortions in the global shape of the parameter space representation. The SMCE clustering performance in the case of intersecting manifolds is flawed, as shown in Fig. 5 (b): the clustering identifies one of the arms as a cluster, as opposed to a manifold. This is because the reconstruction matrix in the case of intersecting manifolds with a nearest neighbor structure has twice as many blocks as manifolds, as described in Section III-A. The SMCE uses distance-based sparsity constraints which ensure that points on the other manifold and on the opposing arm are assigned zero/very low reconstruction coefficients. Since the SMCE assigns zero/very low reconstruction coefficients to points that are far away from the target point, there is no viable way of identifying the clusters/arms that belong to the same manifold.

The LRNE performs well in terms of classification for both adjoining and intersecting manifolds, as shown in Figs. 4 (d) and 5 (d). The embedding from the LRNE compares quite favorably to the LLE embedding (Figs. 4 (e) and 5 (e)) and is quite successful in identifying the general shape of the parameter space (it identifies approximately rectangular shapes), as shown in Figs. 4 (h) and 5 (h).

B. Hyperspectral Mixture Data

Hyperspectral imagers (HSIs) or imaging spectrometers measure electromagnetic energy scattered in their field of view in the Visible to Near InfraRed (VNIR) wavelength range ( nm). HSI data-sets are organized into planes that form a data cube: each plane corresponds to solar electromagnetic energy reflected off the surface of materials, acquired over a narrow wavelength range (a spectral channel) for all pixels, and each pixel represents a vector of measurements acquired at a given location for all spectral channels (a reflectance spectrum) [30]. A spectrum can also be interpreted as a point in a high-dimensional space of dimension equal to the number of spectral channels. This experiment simulates the scenario in which a hyperspectral imager observes several pixels of a terrain composed of mechanical (or intimate) mixtures of different materials, called endmembers, as in a sand beach made up of grains of different minerals. The spectra of such intimate mixtures are described by physical models, such as the one introduced by Hapke [31].

Fig. 6. Simulated Hapke data-set: (a) original labels (b) SMCE class labels (c) LRE class labels (d) LRNE class labels (e) LLE embedding (f) SMCE embedding (g) LRNE embedding.
In [10] it has been shown that the point cloud representing intimately mixed spectra of known materials (endmembers), if modeled using Hapke's model, can be considered as lying near a manifold obtained by sampling an abundance simplex. Each mixed pixel can be modeled by linearly combining the endmember spectra, weighted according to the sampled abundance coefficients, in a D-dimensional space (where D is the number of spectral bands), and then applying a nonlinear mapping in that space. Even if the abundance simplex is uniformly sampled, the nonlinear map produces a point cloud that exhibits a density gradient, with higher density (of samples) near the dark endmembers and lower density around the brighter endmembers (as explained in [10]). The exact nature and amount of the density gradient depends upon the endmembers chosen in the mixture. The data-set was chosen because of the high dimensionality of the ambient space and the non-uniform sampling of the manifolds. The density gradient affects neighborhood structures at points with low density, as even the nearest neighbors are quite far away, making this a harder data-set for manifold learning algorithms.

We modeled a data-set which contains two ternary mixtures (mixtures with 3 endmembers) with two common endmembers. The resulting data cloud exhibits points lying near 2 manifolds adjoining at the boundary occupied by mixed spectra that are combinations of the shared endmembers. We chose four mineral endmember spectra from the RELAB spectral database¹: olivine, ripidolite, illite and nontronite samples. We generated the data by sampling each 2-D abundance simplex uniformly, and the mixed spectra were generated according to the Hapke model. (¹ RELAB Spectral Database: Copyright 2008, Brown University, Providence, RI; All Rights Reserved.)

In Fig. 6 we show the clustering and embedding performance of the three competing algorithms on this data-set. The set-up is the same as in Figs. 4 and 5, except that the high-dimensional data are projected onto the first three principal components for visualization.
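A sketch of the mixture-generation recipe described above, with hypothetical endmember spectra in place of the RELAB minerals and a generic nonlinearity standing in for the Hapke mapping (which is not reproduced here); a Dirichlet distribution with unit concentration samples the abundance simplex uniformly:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_bands = 1000, 200
endmembers = rng.uniform(0.05, 0.9, (3, n_bands))       # placeholder spectra
abundances = rng.dirichlet(np.ones(3), size=n_points)   # uniform on the 2-D simplex
linear_mix = abundances @ endmembers                    # abundance-weighted combination
nonlinear_map = lambda r: r / (1.0 + r)                 # stand-in for the Hapke nonlinearity
spectra = nonlinear_map(linear_mix)                     # points near a curved 2-D manifold
```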

TABLE IV: AVG. ERROR IN EMBEDDING PERFORMANCE (LLE vs. LRNE for the two manifolds in each of the adjoining manifolds, intersecting manifolds and ternary mixtures data-sets).

It is important to notice that the particular choice of endmembers creates an interesting density pattern. For the blue mixture, the point-density decreases from the intersection between the two mixtures towards the corner represented by the non-shared endmember. For the red mixture, a (more intense) decrease is observed towards one of the shared endmembers as well.

The LRE clustering performance for this data-set shows that this algorithm suffers significantly in the presence of the density gradient, as depicted in Fig. 6 (c). The result affected the embedding so significantly that we opted not to show the LRE embeddings. The SMCE shows reasonable clustering results, as shown in Fig. 6 (b); the performance is, however, significantly degraded as compared to the one on the uniformly-sampled adjoining manifolds. In particular, the performance is affected significantly near the corners of the red mixture, as seen in Fig. 6 (b). In the region of the intersection towards one of the shared endmembers, the neighborhoods of points on the red manifold might contain mostly points from the blue manifold. Since the SMCE prioritizes the closest neighbors and selects very few neighbors, it creates scenarios where the data-points in this region are completely disconnected from the correct manifold, so that many of those points are classified as blue. The SMCE embeddings show significant distortions and missing pieces, due to the clustering mistakes, as shown in Fig. 6 (f).

The LRNE performs slightly better than the SMCE in terms of clustering in the presence of such density effects, as shown in Fig. 6 (d). This is due to the fact that the algorithm does not have a sparsity penalty and can rely on more points for reconstruction in low-density neighborhoods. The LRNE embeddings (in Fig. 6 (g)) are significantly closer to the optimal LLE embeddings (Fig. 6 (e)) as compared to the SMCE embeddings. The missing parts correspond to the incorrectly classified parts of the simplex. [Note: embeddings of incorrectly classified points are not shown for clarity.] We have observed a similar trend of the LRNE outperforming the LRE and SMCE in terms of both clustering and embedding for several other endmember configurations. A similar example was presented in [21], in addition to results on a real hyperspectral data-set acquired by the authors.

C. Analyzing the Embedding Performance

In this section we attempt a quantitative comparison of the embedding performance of the LRNE to the optimal embeddings. In the best scenario, the embeddings generated by the optimal LLE are an approximate representation of the parameter space up to some affine transformation [32]. In the examples analyzed so far, the manifolds are nonlinear mappings of convex sets in some low-dimensional space, i.e. the intersecting and adjoining manifolds are nonlinear mappings of rectangular sheets, while the mixture manifolds are nonlinear mappings of the 2-D abundance simplex (a triangle). Since the data-sets are convex in the parameter space, each point in the parameter space can be expressed as a convex combination of the cloud vertices. If we ignore the performance loss due to the linear global transformation between the embedded cloud and the original parametrization, we can consider the optimal LLE embedding performance as the best achievable by a local affine reconstruction.
The embeddings will also be convex sets, and the points in the embedded spaces can also be expressed as convex combinations of the vertices. It is then interesting to compare the coefficients of the convex combination of vertices in the original parameter space with those of the convex combination of vertices in the embedding, as a way to quantitatively evaluate the embedding performance. This is important in some applications in which the coefficients carry semantic value. For example, in hyperspectral unmixing the coefficients represent the fractional contributions (abundances) of the different endmembers to the mixed pixels.

We devised a simple method to perform the comparison. We express each point $x_i$ as a convex combination of some vertices $V$ with weights $W_i$. From the embeddings we also find the estimated weights $W_i^{est}$ for the embedded points $y_i$ with respect to the embedded versions $V_y$ of the same vertices. We define the average error in the embedding of a manifold as $\frac{1}{n} \sum_{i=1}^{n} \| W_i^{est} - W_i \|$, where $n$ is the number of correctly classified points in the manifold. [Note: we do not consider incorrectly classified points, as the embedding error in those cases is affected by the classification.]

The effect of the parameter λ and of the number of neighbors k on the embedding error of the LRNE is shown for one of the manifolds in the adjoining manifolds data-set in Table III. In general we note that the embedding error is slightly higher than the corresponding LLE error. For very small values of λ, incorrect points are given high priority in the reconstruction, leading to distortions. For high values of λ, the algorithm prioritizes low-rank neighborhoods over reconstruction error, leading to large embedding errors. The best embedding error for the different simulated manifolds is shown in Table IV. In general the best embedding error from the LRNE is very close to the LLE embedding error.
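A sketch of this comparison (the paper does not specify a solver for the convex weights; here they are obtained by non-negative least squares with a heavily weighted sum-to-one row, a standard trick):

```python
import numpy as np
from scipy.optimize import nnls

def convex_vertex_weights(points, vertices, penalty=1e3):
    """Express each row of `points` as a convex combination of the rows of `vertices`."""
    A = np.vstack([vertices.T, penalty * np.ones((1, vertices.shape[0]))])
    W = np.empty((points.shape[0], vertices.shape[0]))
    for i, p in enumerate(points):
        W[i], _ = nnls(A, np.concatenate([p, [penalty]]))   # soft sum-to-one constraint
    return W

def avg_embedding_error(W_true, W_est):
    """(1/n) sum_i || W_i^est - W_i || over the correctly classified points."""
    return np.mean(np.linalg.norm(W_est - W_true, axis=1))
```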

D. Experiments with Real Data

In this section we evaluate the LRNE algorithm on some well-known real data-sets used as benchmarks for manifold clustering. Real data-sets offer specific challenges, which vary according to the nature and type of noise in the data and the number of manifold intersections. Additionally, we seldom have information on the sample distribution on the manifold or on whether the data are best modeled as adjoining or intersecting manifolds. To analyze the robustness of the various algorithms to issues encountered in real data-sets, we consider their performance on three data-sets: (i) a subset of the digits from the MNIST database [33], (ii) the Extended Yale Face Database B [34], [35], and (iii) images of objects from the well-known COIL-20 database [36].

TABLE III: EFFECT OF PARAMETERS ON LRNE EMBEDDING OF MANIFOLD-1 (embedding error over a grid of k and λ values, with the LLE error as reference).

TABLE VII: CLASSIFICATION PERFORMANCE ON THE YALE FACE-B DATA-SET (misclassification rate for SMCE, LRE and LRNE).

TABLE VIII: CLASSIFICATION PERFORMANCE FOR THE SUBSET OF THE COIL-20 DATA-SET (% misclassification for SMCE, LRE and LRNE).

Fig. 7. Top row: embeddings generated for the MNIST data-set by the LRNE for clusters corresponding to (a) digit 3, (b) digit 4, (c) digit 6. Bottom row: embeddings generated for the MNIST data-set by the SMCE for clusters corresponding to (d) digit 3, (e) digit 4, (f) digit 6.

1) MNIST Digit data-set: Each image in the MNIST digit data-set is a 28 × 28 grayscale image of a handwritten number. Following the SMCE paper [11], we only consider 5 of the classes, namely the ones corresponding to the digits {0, 3, 4, 6, 7}. In particular, we draw at random 200 samples from each of these classes to form the test data-set. To reduce the variance of the estimation we perform multiple (four) trials with these settings and report the average performance across these trials. Similarly to the experiments on the simulated data-sets, we report the classification error across a range of values of the penalty parameters for the three competing algorithms in Table VI. The results show that the LRNE outperforms the SMCE and the LRE. The best clustering performance of the various algorithms over the different classes is shown in Table V. The SMCE particularly struggles with the classes 4 and 6. The embeddings generated by the LRNE and the SMCE for specific classes are shown in Fig. 7. The embeddings of the digits show that the stylized digits occur as outliers, separated from the other data. Note that images with similar shapes are spatial neighbors in the LRNE embeddings, while spatially nearest neighbors have different shapes in the SMCE embeddings (especially in the case of the digits 3 and 6). [Note: to facilitate better discrimination, incorrectly classified points are not shown in the embeddings for the real data-sets.]

2) Extended Yale Face Database: In this experiment we consider the problem of clustering and embedding face images of two subjects (specifically the images of the second and the fifth people in the data-set) from the Extended Yale B database, as proposed in [11]. In the original database there are 64 images corresponding to each subject, captured under fixed pose and with changes in illumination [34]. For this experiment we use a resized version of the images; these data are described in [37]². The three competitors were tested with the parameter λ in the range [0.01, 100] (in multiples of 10), whereas the parameter k was varied between 5 and 10 in increments of 1. The best classification performance is shown in Table VII. In terms of clustering results, the LRNE matches the performance of the SMCE. The embeddings generated by the LRNE and the SMCE for the different classes are shown in Fig. 8. The LRNE embeddings resolve both light-direction and image brightness and appear smooth and not disconnected. (² This data-set is available for download at dengcai/data/facedata.html)
3) The Columbia Object Image Library (COIL-20) data-set: The COIL-20 image database consists of images of 20 different objects; each object was placed on a turntable and 72 images were taken of each object, with a 5° rotation between successive images. In this scenario it is expected that the nearest neighbors of each image are the pictures with the smallest change in angle (i.e. the images with a 5° rotation either way). This ensures that, even when we choose the sparsest (smallest) neighborhoods, there is overlap between the neighborhoods, as the nearest neighbors on each side are approximately the same distance away. Along with the one-dimensional nature of the manifold, the overlap between very sparse neighborhoods makes this an ideal test case for the SMCE. We will compare the different algorithms in terms of both classification and embedding performance. As with the Yale data-set, in this experiment we use a reduced-size version of this database, described in [38]³. (³ This data-set is available for download at dengcai/data/mldata.html)

TABLE V: AVERAGE CLASSIFICATION PERFORMANCE ACROSS THE DIFFERENT CLASSES IN THE MNIST DIGITS DATA-SET WITH LRNE (λ = 0.01 & k = 50), SMCE (λ = 1 & kmax = 50) AND LRE (λ = 10) OVER MULTIPLE TRIALS (confusion matrix of assigned vs. true labels for digits 0, 3, 4, 6, 7, with per-class misclassification).

TABLE VI: AVERAGE MISCLASSIFICATION RATES ON THE MNIST DATA-SET FOR THE DIFFERENT ALGORITHMS OVER MULTIPLE TRIALS (SMCE with kmax = 50 for λ = 0.1, 1, 5; LRNE with k = 50 for λ = 1e-5, 0.01, 0.05, 1, 10; LRE).

Fig. 8. The embeddings generated on the Extended Yale Face data-set for the clusters corresponding to (a) the first person by LRNE, (b) the second person by LRNE, (c) the first person by SMCE, (d) the second person by SMCE.

Fig. 9. (a) Classes chosen from the COIL-20 object database (high resolution images used for display). (b) and (c) Embeddings of the classes corresponding to the cups when classification accuracy is prioritized: (b) LRNE and (c) SMCE.

In this experiment we concentrate on a subset of the COIL-20 database made up of 6 different objects, shown in Fig. 9 (a). The parameter λ was varied in the range [0.01, 100] in multiples of 10, while the number of neighbors was varied between [5, 25] with a step-size of 5. The clustering performance of the different algorithms is shown in Table VIII. We note that the LRNE slightly improves on the performance of the SMCE.

We evaluate the embedding performance in terms of the two cup-like objects (the first two objects on the bottom row) shown in Fig. 9 (a). First, let us look at the embeddings corresponding to the best classification performance. In terms of these embeddings, while both algorithms are successful in learning the directions of variation, the SMCE generates embeddings that are essentially 1-D, whereas the 1-D manifold is obtained only partially by the LRNE. As discussed previously, for a 1-D manifold, preserving fewer of the high-dimensional distances serves the SMCE well, whereas trying to preserve larger neighborhoods costs the LRNE, as those neighborhoods include points that are farther away, whose distances should not be preserved.

If we look at the best embedding performance achievable by the algorithms: for the LRNE we can change the parameters to select small neighborhoods. For example, if we set the parameters to k = 2 and λ = 0.5, the LRNE generates near perfect embeddings, as shown in Fig. 10 (a), in that it clearly identifies that the manifold is 1-D. This is intuitive, as there is only one degree of freedom in the various images (the angle of rotation of the turntable), and progressing along the manifold clearly shows the rotation (which can be appreciated by following the symbols in the figure on the left and the cup handle in the figures on the right). Further, the points appear approximately equidistant from each other, which is also expected, as the change in angle is the same between each pair of pictures. The improved embeddings, though, come at the cost of the classification performance, and in this case the misclassification rate jumps to 7.41%. For the SMCE the best embedding performance occurs if we set kmax = 3 and λ = 0.5, especially in the case of the cup with the handle; for the other manifold the embedding does not change much. The improvements in the embedding also carry a penalty on the classification performance, and the misclassification rate is 8.10%. While the embedding is 1-D, the circular pattern is not as concise as the one obtained by the LRNE, as the circular patterns in Fig. 10 (b) can be obtained as a nonlinear mapping of the patterns in Fig. 10 (a).

Fig. 10. Embeddings generated by the different algorithms when embeddings are prioritized over the classification: (a) LRNE and (b) SMCE.

V. CONCLUSION & FUTURE WORK

The Low Rank Neighborhood Embedding algorithm successfully generates a reconstruction matrix that can be used for both manifold clustering and embedding. The LRNE outperforms existing state-of-the-art algorithms in terms of both clustering and embedding over a variety of simulated and real data-sets. The LRNE shows improved clustering especially in scenarios where there are local variations in density. Additionally, since the LRNE allows the user to choose the size of the neighborhood k, we can ensure that there is enough overlap between different neighborhood patches, which in turn ensures that the LRNE is better able to retrieve the global shape of the parameter space. The embeddings generated by the LRNE compare favorably to the ones generated by dedicated embedding algorithms run on each manifold separately.

Future work will focus on automating the decision on whether the manifolds are best modeled as adjoining or intersecting manifolds, i.e. whether any of the classes generated by the spectral clustering should be further merged. Another avenue of research is the use of techniques such as the one described in [39] to make the algorithm parameterless. Attempts are also ongoing to evaluate the embedding quality in a more principled manner.

APPENDIX A
ADMM BASED OPTIMIZATION SCHEME FOR THE LRNE

Recall the optimization problem for the LRNE described in Eqn. (3):

$$\min_{\alpha} \frac{1}{2} \Big\| x - \sum_{i=1}^{k} \alpha_i n_i \Big\|^2 + \lambda \|\hat{M}\|_* \quad \text{subject to} \quad \mathbf{1}^T \alpha = 1. \tag{5}$$

This problem can be simplified and rewritten as:

$$\min_{\alpha} \frac{1}{2} \| x - N\alpha \|^2 + \lambda \Big\| \hat{N} \sum_{i=1}^{k} \alpha_i e_i e_i^T \Big\|_* \quad \text{subject to} \quad \mathbf{1}^T \alpha = 1, \tag{6}$$

where the vector $e_i$ is such that its $i$-th element is 1 and all other elements are 0, and $\hat{N}$ is the normalized neighborhood matrix described in Section III. All the terms in the objective function described above are convex, and this optimization problem can be solved using the Alternating Direction Method of Multipliers (ADMM) [25] to find an optimal solution.
A dummy variable is introduced, and the variable-augmented optimization problem can be written as:

$$\min_{\alpha, V} \frac{1}{2} \| x - N\alpha \|^2 + \lambda \|V\|_* \quad \text{subject to} \quad \mathbf{1}^T \alpha = 1, \quad V = \hat{N} \sum_{i=1}^{k} \alpha_i e_i e_i^T.$$

The augmented Lagrangian for the ADMM can be written as:

$$L(V, \alpha, \Lambda_1, \Lambda_2) = \frac{1}{2} \| x - N\alpha \|^2 + \lambda \|V\|_* + \frac{\beta}{2} \Big\| \mathbf{1}^T \alpha - 1 + \frac{1}{\beta} \Lambda_1 \Big\|_F^2 + \frac{\beta}{2} \Big\| V - \hat{N} \sum_{i=1}^{k} \alpha_i e_i e_i^T + \frac{1}{\beta} \Lambda_2 \Big\|_F^2. \tag{7}$$

The ADMM decomposes the minimization into two separate optimization problems.

Update equation for V: the update iterations for the variable V with respect to the augmented Lagrangian are:

$$V^{k+1} = \operatorname*{argmin}_{V} L(V, \alpha^k, \Lambda_1^k, \Lambda_2^k) = \operatorname*{argmin}_{V} \; \lambda \|V\|_* + \frac{\beta}{2} \Big\| V - T + \frac{1}{\beta} \Lambda_2^k \Big\|_F^2, \tag{8}$$

where $T = \hat{N} \sum_{i=1}^{k} \alpha_i^k e_i e_i^T$. This equation is quite similar to Eqn. (6) in Lin et al. [40], and can similarly be solved by a proximal update using singular value thresholding [41].

Update equation for α: the next step is the optimization with respect to α, which can be written as:

$$\alpha^{k+1} = \operatorname*{argmin}_{\alpha} L(V^{k+1}, \alpha, \Lambda_1^k, \Lambda_2^k) = \operatorname*{argmin}_{\alpha} \; \underbrace{\frac{1}{2} \| x - N\alpha \|^2}_{J_1} + \underbrace{\frac{\beta}{2} \Big\| \mathbf{1}^T \alpha - 1 + \frac{1}{\beta} \Lambda_1^k \Big\|_F^2}_{J_2} + \underbrace{\frac{\beta}{2} \Big\| V^{k+1} - \hat{N} \sum_{i=1}^{k} \alpha_i e_i e_i^T + \frac{1}{\beta} \Lambda_2^k \Big\|_F^2}_{J_3}. \tag{9}$$
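Putting the two updates together, here is a compact numpy sketch of the whole loop for one target point (β fixed and no stopping test, both simplifications of ours; the closed-form α-update anticipates the stationary point derived below):

```python
import numpy as np

def svt(T, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_* [41]."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def lrne_admm(x, N, lam=0.1, beta=1.0, n_iter=100):
    """ADMM for Eqn. (6): V-update by SVT, closed-form alpha-update, dual ascent."""
    D, k = N.shape
    N_hat = N / np.linalg.norm(N, axis=0)
    alpha = np.full(k, 1.0 / k)
    L1, L2 = 0.0, np.zeros((D, k))
    ones = np.ones(k)
    for _ in range(n_iter):
        V = svt(N_hat * alpha - L2 / beta, lam / beta)      # Eqn. (8)
        F1 = 1.0 - L1 / beta
        F2 = V + L2 / beta
        P = np.einsum('di,di->i', N_hat, F2)                # P_i = Tr(e_i e_i^T N^T F2)
        Q = np.sum(N_hat ** 2, axis=0)                      # diagonal of Q (all ones here)
        A = N.T @ N + beta * np.outer(ones, ones) + beta * np.diag(Q)
        b = N.T @ x + beta * ones * F1 + beta * P
        alpha = np.linalg.solve(A, b)                       # stationary point, Eqn. (13)
        L1 += beta * (ones @ alpha - 1.0)                   # multiplier updates
        L2 += beta * (V - N_hat * alpha)
    return alpha
```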

Fig. 10. Embeddings generated by the different algorithms when embedding quality is prioritized over classification: (a) LRNE and (b) SMCE.

First we find the gradients of these terms with respect to $\alpha$. The gradients of the first two terms are straightforward:

$$\frac{\partial J_1}{\partial \alpha} = -N^T x + N^T N\alpha \tag{10}$$

$$\frac{\partial J_2}{\partial \alpha} = \beta\,\mathbf{1}\mathbf{1}^T\alpha - \beta\,\mathbf{1}F_1, \tag{11}$$

where $F_1 = 1 - \frac{1}{\beta}\Lambda_1^k$. The differentiation of the third term is slightly more complex. We begin by defining $F_2 = V^{k+1} + \frac{1}{\beta}\Lambda_2^k$, so that $J_3 = \frac{\beta}{2}\big\|F_2 - \hat{N}\sum_i \alpha_i e_i e_i^T\big\|_F^2$. Expanding the squared Frobenius norm,

$$\Big\|F_2 - \hat{N}\sum_i \alpha_i e_i e_i^T\Big\|_F^2 = \operatorname{Tr}(F_2^T F_2) - 2\operatorname{Tr}\Big(\sum_i \alpha_i e_i e_i^T \hat{N}^T F_2\Big) + \operatorname{Tr}\Big(\sum_i \alpha_i e_i e_i^T \hat{N}^T \hat{N} \sum_j \alpha_j e_j e_j^T\Big).$$

Using the linearity and cyclic-permutation properties of the trace ($\operatorname{Tr}$), this can be rewritten as

$$\operatorname{Tr}(F_2^T F_2) - 2\sum_i \alpha_i \operatorname{Tr}(e_i e_i^T \hat{N}^T F_2) + \sum_i \sum_j \alpha_i \alpha_j \operatorname{Tr}(e_j e_j^T e_i e_i^T \hat{N}^T \hat{N}).$$

By definition $e_j^T e_i = 0$ if $j \neq i$ and $e_i^T e_i = 1$, so the cross terms vanish, and differentiating with respect to $\alpha_i$ gives

$$\frac{\partial J_3}{\partial \alpha_i} = \beta\Big(-\operatorname{Tr}(e_i e_i^T \hat{N}^T F_2) + \alpha_i \operatorname{Tr}(e_i e_i^T \hat{N}^T \hat{N})\Big).$$

The gradient with respect to all the entries of $\alpha$ can therefore be written as

$$\frac{\partial J_3}{\partial \alpha} = -\beta P + \beta Q\alpha, \tag{12}$$

where the $i$-th entry of the vector $P$ is given by $P_i = \operatorname{Tr}(e_i e_i^T \hat{N}^T F_2)$ and $Q$ is the diagonal matrix with entries $Q_{ii} = \operatorname{Tr}(e_i e_i^T \hat{N}^T \hat{N})$. Using the results in Eqns. (10), (11) and (12), setting the total gradient to zero gives

$$-N^T x + N^T N\alpha - \beta\,\mathbf{1}F_1 + \beta\,\mathbf{1}\mathbf{1}^T\alpha - \beta P + \beta Q\alpha = 0. \tag{13}$$

Solving this equation, the optimal value is given by the stationary point

$$\hat{\alpha} = \left(N^T N + \beta\,\mathbf{1}\mathbf{1}^T + \beta Q\right)^{-1}\left(N^T x + \beta\,\mathbf{1}F_1 + \beta P\right).$$

The second derivative with respect to $\alpha$ at this stationary point is

$$\frac{\partial^2 L}{\partial \alpha^2} = N^T N + \beta\,\mathbf{1}\mathbf{1}^T + \beta Q.$$

The matrices $N^T N$, $\mathbf{1}\mathbf{1}^T$ and $Q$ are all positive semi-definite, so the Hessian is positive semi-definite (and generically positive definite), making the stationary point a minimum.

Update for the Lagrangian multipliers: The Lagrangian multipliers can then be updated as

$$\Lambda_1^{k+1} = \Lambda_1^k + \beta\left(\mathbf{1}^T\alpha - 1\right), \qquad \Lambda_2^{k+1} = \Lambda_2^k + \beta\Big(V - \hat{N}\sum_i \alpha_i e_i e_i^T\Big).$$
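Putting the pieces together, one ADMM pass alternates the SVT step of Eqn. (8), the closed-form solve of Eqn. (13) and the multiplier updates above. The following sketch (reusing the `svt` helper from earlier) is a minimal illustration under assumed names, a fixed penalty `beta` and a simple stopping rule; it is not the authors' reference implementation.

```python
import numpy as np

def lrne_admm(x, N, Nhat, lam, beta=1.0, n_iter=200, tol=1e-6):
    """Sketch of the ADMM iteration for one data point x with neighbors N."""
    k = N.shape[1]
    alpha = np.full(k, 1.0 / k)               # feasible start: 1^T alpha = 1
    L1 = 0.0                                   # scalar multiplier for 1^T alpha = 1
    L2 = np.zeros_like(Nhat)                   # matrix multiplier for the V-constraint
    ones = np.ones(k)
    Q = np.diag(np.sum(Nhat * Nhat, axis=0))   # Q_ii = squared norm of i-th column of Nhat
    # system matrix of Eqn. (13); constant across iterations
    A = N.T @ N + beta * np.outer(ones, ones) + beta * Q
    for _ in range(n_iter):
        # V-update, Eqn. (8): proximal step on the nuclear norm
        T = Nhat * alpha                       # Nhat @ diag(alpha), columnwise scaling
        V = svt(T - L2 / beta, lam / beta)
        # alpha-update, Eqn. (13): closed-form linear solve
        F1 = 1.0 - L1 / beta
        F2 = V + L2 / beta
        P = np.sum(Nhat * F2, axis=0)          # P_i = tr(e_i e_i^T Nhat^T F2)
        alpha_new = np.linalg.solve(A, N.T @ x + beta * F1 * ones + beta * P)
        # multiplier updates
        L1 += beta * (ones @ alpha_new - 1.0)
        L2 += beta * (V - Nhat * alpha_new)
        if np.linalg.norm(alpha_new - alpha) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha
```

Running such a routine for every data point against its selected neighborhood yields the per-point coefficient vectors that are assembled into the (near block diagonal) reconstruction matrix used for spectral clustering and embedding.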

REFERENCES

[1] X. Huo, X. S. Ni, and A. K. Smith, "A survey of manifold-based learning methods," Recent Advances in Data Mining of Enterprise Data.
[2] J. Zhang, H. Huang, and J. Wang, "Manifold learning for visualizing and analyzing high-dimensional data," IEEE Intelligent Systems, vol. 25, no. 4.
[3] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, p. 2319.
[4] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290.
[5] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Advances in Neural Information Processing Systems 14. MIT Press, 2001.
[6] Z.-Y. Zhang and H.-Y. Zha, "Principal manifolds and nonlinear dimensionality reduction via tangent space alignment," Journal of Shanghai University (English Edition), vol. 8, no. 4.
[7] L. van der Maaten, E. Postma, and J. van den Herik, "Dimensionality reduction: A comparative review," Journal of Machine Learning Research, vol. 10.
[8] Y. Ma and Y. Fu, Manifold Learning Theory and Applications. CRC Press, 2012.
[9] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8.
[10] A. Saranathan and M. Parente, "Manifold clustering based unmixing for the multiple intimate mixture scenario," in Proc. 6th IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS).
[11] E. Elhamifar and R. Vidal, "Sparse manifold clustering and embedding," in Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, Eds. Curran Associates, Inc., 2011.
[12] Y. Wang, Y. Jiang, Y. Wu, and Z.-H. Zhou, "Local and structural consistency for multi-manifold clustering," in Proc. International Joint Conference on Artificial Intelligence (IJCAI), vol. 22, no. 1, 2011.
[13] ——, "Multi-manifold clustering," in PRICAI 2010: Trends in Artificial Intelligence, 11th Pacific Rim International Conference on Artificial Intelligence, Daegu, Korea, August 30–September 2, 2010. Berlin, Heidelberg: Springer, 2010.
[14] R. Souvenir and R. Pless, "Manifold clustering," in Proc. IEEE International Conference on Computer Vision (ICCV), 2005.
[15] R. Vidal, "Subspace clustering," IEEE Signal Processing Magazine, vol. 28, no. 2.
[16] D. Gong, X. Zhao, and G. Medioni, "Robust multiple manifolds structure learning," arXiv preprint.
[17] R. Liu, R. Hao, and Z. Su, "Mixture of manifolds clustering via low rank embedding," Journal of Information and Computational Science, vol. 8, no. 5.
[18] G. Liu, Z. Lin, and Y. Yu, "Robust subspace segmentation by low-rank representation," in Proc. 27th International Conference on Machine Learning (ICML), 2010.
[19] J. MacQueen et al., "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, no. 14. Oakland, CA, USA, 1967.
[20] A. M. Saranathan and M. Parente, "Simultaneous clustering and embedding for multiple intimate mixtures," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), July 2015.
[21] A. Saranathan and M. Parente, "Unmixing multiple intimate mixtures via a locally low-rank representation," in Proc. 8th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS).
[22] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6.
[23] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.1."
[24] ——, "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds. Springer-Verlag Limited, 2008.
[25] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1.
[26] Z. Yang, T. Hao, O. Dikmen, X. Chen, and E. Oja, "Clustering by nonnegative matrix factorization using graph random walk," in Advances in Neural Information Processing Systems, 2012.
[27] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4.
[28] Z. Lin, R. Liu, and Z. Su, "Linearized alternating direction method with adaptive penalty for low-rank representation," in Advances in Neural Information Processing Systems, 2011.
[29] G. H. Golub and C. F. Van Loan, Matrix Computations. JHU Press, 2012, vol. 3.
[30] M. T. Eismann, Hyperspectral Remote Sensing. Bellingham, WA: SPIE.
[31] B. Hapke, Theory of Reflectance and Emittance Spectroscopy, 2nd ed. Cambridge University Press, 2012.
[32] Y. Goldberg, A. Zakai, D. Kushnir, and Y. Ritov, "Manifold learning: The price of normalization," Journal of Machine Learning Research, vol. 9.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 1998.
[34] A. Georghiades, P. Belhumeur, and D. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6.
[35] K.-C. Lee, J. Ho, and D. J. Kriegman, "Acquiring linear subspaces for face recognition under variable lighting," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5.
[36] S. A. Nene, S. K. Nayar, H. Murase et al., "Columbia Object Image Library (COIL-20)," Tech. Rep.
[37] D. Cai, X. He, Y. Hu, J. Han, and T. Huang, "Learning a spatially smooth subspace for face recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2007.
[38] D. Cai, X. He, J. Han, and T. S. Huang, "Graph regularized nonnegative matrix factorization for data representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8.
[39] L. Zelnik-Manor and P. Perona, "Self-tuning spectral clustering," in Advances in Neural Information Processing Systems 17, 2004.
[40] Z. Lin, M. Chen, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," arXiv preprint.
[41] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM Journal on Optimization, vol. 20, no. 4.

Arun M. Saranathan received the B.E. degree from Visvesvaraya Technological University, Belgaum, India, and the M.S. degree in Electrical Engineering from the University of Massachusetts, Amherst. He is currently a Ph.D. student in the Department of Electrical & Computer Engineering at the University of Massachusetts, Amherst. His interests include the use and extension of image segmentation techniques for hyperspectral (HSI) images and the use of manifold techniques to model the mixing seen in HSI. He is a Student Member of the IEEE.

Mario Parente (M'05–SM'13) received the B.S. and M.S. (summa cum laude) degrees in telecommunication engineering from the University of Naples Federico II, Italy, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA. He was a Post-Doctoral Associate in the Department of Geosciences at Brown University. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Massachusetts Amherst. His research involves combining physical models and statistical techniques to address issues in remote sensing of Earth and planetary surfaces. Prof. Parente's professional interests include identification of ground composition, geomorphological feature detection, and imaging spectrometer data modeling, reduction and calibration for NASA missions. He has developed machine learning algorithms for the representation and processing of hyperspectral data based on statistical, geometrical and topological models. Dr. Parente's research also involves the study of physical models of light scattering in particulate media.
Furthermore, he has developed solutions for the integration of color and hyperspectral imaging and robotics to identify scientifically significant targets for rover- and orbiter-based reconnaissance. Dr. Parente has supported several scientific teams in NASA missions, including the Compact Reconnaissance Imaging Spectrometer for Mars (CRISM), the Moon Mineralogy Mapper (M3) and the Mars Science Laboratory ChemCam science teams. Dr. Parente is a principal investigator at the SETI Institute, Carl Sagan Center for the Search for Life in the Universe, and a member of the NASA Astrobiology Institute. Prof. Parente serves as an Associate Editor for the IEEE Geoscience and Remote Sensing Letters.
