Semi-Coupled Basis and Distance Metric Learning for Cross-Domain Matching: Application to Low-Resolution Face Recognition

Panagiotis Moutafis and Ioannis A. Kakadiaris
Computational Biomedicine Lab, Department of Computer Science, University of Houston, 4800 Calhoun Rd., Houston, TX

Abstract

In this paper, we propose a method for matching biometric data from disparate domains. Specifically, we focus on the problem of comparing a low-resolution (LR) image with a high-resolution (HR) one. Existing coupled mapping methods do not fully exploit the HR information, or they do not simultaneously use samples from both domains during training. To this end, we propose a method that learns coupled distance metrics in two steps. In addition, we propose to jointly learn two semi-coupled bases that yield optimal representations. In particular, the HR images are used to learn a basis and a distance metric that result in increased class-separation. The LR images are then used to learn a basis and a distance metric that map the LR data to their class-discriminated HR pairs. Finally, the two distance metrics are refined to simultaneously enhance the class-separation of both the HR class-discriminated and the LR projected images. We illustrate that different distance metric learning approaches can be employed in conjunction with our framework. Experimental results on Multi-PIE and SCface, along with the relevant hypothesis tests, provide evidence of the effectiveness of the proposed approach.

1. Introduction

Data from disparate domains may be exploited in different ways depending on the task at hand. For instance, transfer learning methods are broadly used to learn a mapping from one domain to another. In this paper, we focus on cross-domain matching via coupled mappings. Such techniques seek a unified subspace for both domains where the classification accuracy is improved.
Specifically, we focus on the problem of low-resolution face recognition (i.e., matching an LR facial image to an HR one, or vice versa). A review of this problem is offered by Wang et al. [15]. The main challenges are: (i) the HR and LR images typically yield a different number of features; and (ii) the LR images contain less discriminative information compared to the HR ones. Hence, the accuracy of face recognition systems degrades significantly. Empirical results suggest that face recognition algorithms require images with a minimum resolution ranging between and pixels [8].

The most common way to address this problem is to enhance the LR images using super-resolution techniques. However, estimating the HR latent image is a difficult and computationally expensive task. In addition, computing an HR reconstruction of the LR image that is visually pleasing does not always imply increased discriminative information. To avoid the super-resolution step, coupled mapping techniques learn coupled projections to a common subspace where the classification accuracy is improved. One of the first methods employed for this problem was Canonical Correlation Analysis (CCA) [2], which seeks to learn a projection that maximizes the correlation without any supervision. However, more sophisticated methods have been developed that exploit label information. Some of them rely on an objective function that uses: (i) intrinsic and penalty adjacency matrices to define the constraints; and (ii) information obtained from the samples of the two domains. To allow for generalized eigenvalue decomposition solutions, the model parameters and variables are treated as concatenated matrices. Methods in this category include Coupled Locality Preserving Mappings [7], Piecewise Regularized Canonical Correlation Discrimination [9], Simultaneous Discriminant Analysis (SDA) [18], Coupled Marginal Fisher Analysis (CMFA) [10], and Supervised Locality Preserving Projection Coupled Metric Learning [19].
The differences among these approaches lie in the way they define the components of the objective function. The high-dimensional information is usually used to define the affinity constraints. Multidimensional Scaling for Matching Low-Resolution Facial Images [1] uses concatenated matrices as well, but the objective function is optimized using an iterative majorization algorithm. To exploit the information from the high-dimensional data, the distances in the projected space are imposed to approximate the ones that would be obtained by using only the HR images. Other methods jointly learn coupled projections, but without concatenating the corresponding matrices. Coupled Metric Learning (CML) [6] learns a projection for the high-dimensional data that maximizes class-separation and a projection for the low-dimensional data that maps them to their high-dimensional projected pairs. Maximum-Margin Coupled Mappings (MMCM) [11] relies on the Large Margin Nearest Neighbor (LMNN) Classification [16] objective function. It defines the constraints and error terms using pairs of samples, where the first element of each pair is a sample in the original space from one domain and the other element is a sample from the other domain in the projected space. Methods in the first category simultaneously use samples from both domains, while approaches in the second category exploit the high-dimensional information more effectively.

Table 1. Overview of the notation used in this paper. Vectors are denoted by bold lower-case letters, matrices by bold upper-case letters, the number of samples by m, and the number of basis vectors for the high- and low-dimensional input by ψ and χ, respectively.

Symbol                                          Description
H = [h_i : h_i ∈ R^η, i = 1, ..., m]            High-Dimensional Input
L = [l_i : l_i ∈ R^λ, i = 1, ..., m]            Low-Dimensional Input
B_η = [b_j^η : b_j^η ∈ R^η, j = 1, ..., ψ]      Basis for the High-Dimensional Input
B_λ = [b_j^λ : b_j^λ ∈ R^λ, j = 1, ..., χ]      Basis for the Low-Dimensional Input
Y = [y_i : y_i ∈ R^ψ, i = 1, ..., m]            Representation for the High-Dimensional Input
X = [x_i : x_i ∈ R^χ, i = 1, ..., m]            Representation for the Low-Dimensional Input
f_D : R^ψ → R^α                                 Discriminative Projection
f_C : R^χ → R^α                                 Coupling Projection
f_R : R^α → R^β                                 Refining Projection
In this paper, we propose a method that combines both capabilities by learning the coupled projections in two steps. Moreover, it jointly learns one basis for each domain that results in optimal data representations for the task at hand. Specifically, the proposed model comprises four parameters: (i) a basis B_η for the high-dimensional input; (ii) a distance metric f_D (formally, a pseudo-metric or, equivalently, a projection) for the corresponding representation; (iii) a basis B_λ for the low-dimensional input; and (iv) a projection f_C for the corresponding representation. An overview of the notation is provided in Table 1 and an overview of the proposed method is depicted in Fig. 1. Specifically, B_η is learned with reconstructive properties in the original space R^η and discriminative properties for the f_D-projected data in R^α. The discriminative projection f_D is learned with the goal of mapping the high-dimensional representations from R^ψ to a subspace R^α to maximize the classification accuracy. Note that f_D completely ignores the low-dimensional data. Consequently, B_η and f_D fully exploit the high-dimensional information. The basis B_λ is learned with reconstructive properties in the original space R^λ and coupling properties for the f_C-projected data in R^α. The coupling projection f_C is learned with the goal of mapping the low-dimensional representations from R^χ to the corresponding f_D-projected high-dimensional pairs in R^α. Hence, it exploits the increased class-separation of the high-dimensional data in R^α. Finally, to simultaneously exploit samples from both domains and enhance the clustering of the data, a refining projection f_R is learned that maps both the projected high- and low-dimensional data from R^α to R^β. The learned f_D and f_C are thus updated through composition with f_R.
Our contribution is a method that: (i) fully exploits the high-dimensional information; (ii) simultaneously exploits samples from both domains; (iii) jointly learns two bases that yield optimal representations; and (iv) advances low-resolution face recognition. Our method relies on closed-form formulas, which makes it efficient. In the spirit of reproducible research, a MATLAB implementation is provided on our website. The rest of the paper is organized as follows: In Sec. 2, we offer the intuition behind the proposed model; in Sec. 3, we discuss important implementation details; in Sec. 4, we present the experimental evaluation; and in Sec. 5, we conclude with a discussion of our findings.

2. Model

In this section, we present the proposed method, Coupled Basis & Distance (CBD). Our model is parametrized by B_η, B_λ, f_D, and f_C. Even though the optimization problem is not jointly convex, it becomes convex if we fix three of the parameters and solve for the remaining one.
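This fix-all-but-one strategy is ordinary block-coordinate minimization. As a toy illustration only (not the paper's actual updates; all names here are ours), the following sketch cycles closed-form block minimizers of a small convex objective:

```python
def alternating_minimization(params, solvers, n_iters=5):
    # Cycle over the blocks; each solver returns the minimizer of the
    # objective in its block with every other block held fixed.
    for _ in range(n_iters):
        for name in params:
            params[name] = solvers[name](params)
    return params

# Toy objective (a - 1)^2 + (a - b)^2: each subproblem is convex with a
# closed-form block minimizer, so the cycle converges to a = b = 1.
params = alternating_minimization(
    {"a": 0.0, "b": 0.0},
    {"a": lambda p: (1 + p["b"]) / 2,   # argmin over a with b fixed
     "b": lambda p: p["a"]},            # argmin over b with a fixed
    n_iters=60)
```

In CBD the four blocks are B_η, B_λ, f_D, and f_C, and the per-block solvers are the closed-form updates of Eqs. (4)-(10).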

Figure 1. Overview of the proposed method. The indices indicate the sample identifier.

Learning B_η: The basis for the high-dimensional input should yield representations with: (i) good class-separation in R^α, and (ii) good reconstruction properties in R^η. Our strategy is to first seek optimal high-dimensional representations that favor k-nearest neighbor (kNN) classification in R^α. Inspired by LMNN, we learn the optimal position for a given high-dimensional sample y_i using the notions of target neighbors (i.e., the kNN of the same class) and impostors (i.e., samples with a different label that violate a predefined margin). Given these representations, B_η is updated by minimizing the reconstruction error in R^η. Two important advantages of the proposed basis update are: (i) the representations are defined by exploiting local relationships in the data; and (ii) by changing the location of the points in the projected space, the relationship between the original and the learned representations becomes non-linear. Hence, by finding a basis that connects the two, our approach incorporates discriminative properties that conventional methods cannot capture. In particular, the optimal position in terms of target neighbors is defined as:

  y_i^T = ( ∑_{j=1}^m y_j δ_T(i, j) ) / ( ∑_{j=1}^m δ_T(i, j) ),   (1)

where δ_T(i, j) = 1 if j is a target neighbor of i and δ_T(i, j) = 0 otherwise. The function δ_T(i, j) is not commutative, that is, δ_T(i, j) ≠ δ_T(j, i). The margin for each sample is defined as a function of the distance to its farthest target neighbor plus a constant:

  m_i = 1 + max_j ‖f_D(y_i) - f_D(y_j)‖_2 δ_T(i, j).   (2)

As a result, we define the optimal position of a given sample y_i with respect to its impostors by:

  y_i^I = ( ∑_{j=1}^m ỹ_j δ_I(i, j) ) / ( ∑_{j=1}^m δ_I(i, j) ),   (3)

where δ_I(i, j) = 1 if j is an impostor for i and δ_I(i, j) = 0 otherwise.

Figure 2. Illustration of learning an optimal representation y_i^{T+I} by optimizing a variation of the LMNN loss function.
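A minimal NumPy sketch of Eqs. (1)-(3), under simplifying assumptions of ours: every same-class sample is treated as a target neighbor (the paper uses only the k nearest ones), each class contains at least two samples, and f_D is a plain matrix.

```python
import numpy as np

def optimal_positions(Y, fD, labels, c=1.0):
    """Compute y_i^T (Eq. 1) and y_i^I (Eq. 3) for every column of Y."""
    m = Y.shape[1]
    P = fD @ Y                                  # f_D-projected samples
    yT = Y.copy()
    yI = np.zeros_like(Y)
    has_imp = np.zeros(m, dtype=bool)
    for i in range(m):
        tgt = labels == labels[i]
        tgt[i] = False                          # a sample is not its own neighbor
        yT[:, i] = Y[:, tgt].mean(axis=1)       # Eq. (1): target-neighbor mean
        d = np.linalg.norm(P - P[:, [i]], axis=0)
        mi = c + d[tgt].max()                   # Eq. (2): margin of sample i
        imp = (labels != labels[i]) & (d < mi)  # margin violators (impostors)
        if imp.any():
            has_imp[i] = True
            # "Opposite" points y~_j, at projected distance m_i from y_j
            scale = (d[imp] - mi) / d[imp]
            yI[:, i] = (Y[:, [i]] + (Y[:, imp] - Y[:, [i]]) * scale).mean(axis=1)
    return yT, yI, has_imp
```

When a sample has no impostors, its y_i^I column is left at zero and, per Eq. (4), ζ_Y = 1 should be used for it.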
The representation ỹ_j is defined as the geometrically opposite point of y_j with respect to y_i, such that ‖f_D(ỹ_j) - f_D(y_j)‖_2 = m_i. That is,

  ỹ_j = y_i + (y_j - y_i) (‖f_D(y_i) - f_D(y_j)‖_2 - m_i) / ‖f_D(y_i) - f_D(y_j)‖_2.

An illustration of the optimal position in terms of target neighbors and impostors is provided in Fig. 2. Finally, the optimal representation in terms of the LMNN loss function (i.e., both target neighbors and impostors) is provided by:

  y_i^{T+I} = ζ_Y y_i^T + (1 - ζ_Y) y_i^I,   s.t.   max_{j=1,...,m} ‖f_D(y_j)‖_2 = 1,   (4)

where ζ_Y ∈ [0, 1] determines the trade-off between target neighbors and impostors. If there are no impostors, ζ_Y is set to one. The regularization constraint ensures that the projected points lie within a hypersphere of unit radius. Since the margin is also defined to be a unit, this strategy favors a better clustering of the data. To impose this constraint, we scale the data by applying Y = Y (max_{l=1,...,m} ‖f_D(y_l)‖_2)^{-1} before we update their positions (i.e., Eqs. (1), (3) & (4)). Once the representations that favor kNN classification in the projected space R^α have been computed, the basis B_η can be updated with the goal of minimizing the reconstruction error in R^η by:

  B_η = H [γ_Y Y^{T+I} + (1 - γ_Y) Y]^{-1},   (5)

where Y^{T+I} collects the optimal representations y_i^{T+I} and γ_Y controls the learning rate. We introduced this learning rate to alleviate abrupt changes.

Algorithm 1 CBD: Training
Input: H, L, ψ, χ, α, β, ζ_Y, ζ_X, γ_Y, γ_X, µ, and k
Output: B_η, B_λ, f_D, and f_C
 1: Initialize B_η, B_λ, Y, X, f_D, and f_C
 2: while the convergence criterion has not been met do
 3:   for i = 1, ..., m do
 4:     Compute y_i^{T+I} according to Eq. (4).
 5:   end for
 6:   Compute X̂ according to Eq. (6).
 7:   Update B_η and B_λ according to Eqs. (5) & (7).
 8:   Update Y and X according to Eqs. (11) & (12).
 9:   Update f_D and f_C according to Eqs. (9) & (10).
10:   Update δ_T, δ_I, and m_i ∀i.
11: end while

Learning B_λ: The basis for the low-dimensional input should yield representations that: (i) couple the projected high-dimensional representations in R^α, and (ii) can reconstruct the low-dimensional data in R^λ. We first define optimal representations in terms of coupling. The optimal low-dimensional representations in R^α should be identical to the high-dimensional ones. Therefore, we define:

  X̂ = f_C^{-1} ∘ f_D [γ_Y Y^{T+I} + (1 - γ_Y) Y].   (6)

Using X̂ directly to update B_λ results in abrupt changes. Hence, we use a learning rate γ_X:

  B_λ = L [γ_X X̂ + (1 - γ_X) X]^{-1}.   (7)

The optimal representations X̂ incorporate the discriminative information of Y. Hence, B_λ implicitly incorporates class-separation properties. Existing methods that learn bases with reconstructive and discriminative (or coupling) properties have two drawbacks. First, they use the label information to define global constraints. For example, Discriminative K-SVD [5] simultaneously learns a dictionary, a linear projection, and the corresponding sparse coefficients. Its cost function comprises a reconstruction term and a class-separation term; the latter penalizes large discrepancies between the linear projection of the sparse coefficients and a matrix defined using the label information (i.e., global constraints). Second, determining an optimal balance between the reconstruction and discrimination terms depends on the data and has to be configured manually. Our approach to learning the bases relies on constraints defined using local relationships, and it is robust to different scalings across datasets.
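The basis updates of Eqs. (5) & (7) reduce to least-squares refits against the blended representations. A sketch under our own naming, with the Moore-Penrose pseudo-inverse standing in for the inverse when the representation matrices are not square (as the paper's pseudo-inverse fallback suggests):

```python
import numpy as np

def update_bases(H, L, Y, X, Y_opt, X_opt, gY=0.5, gX=0.5):
    # Blend the current and the optimal representations; the learning
    # rates gY (gamma_Y) and gX (gamma_X) damp abrupt changes.
    Yb = gY * Y_opt + (1 - gY) * Y
    Xb = gX * X_opt + (1 - gX) * X
    B_eta = H @ np.linalg.pinv(Yb)    # Eq. (5): refit high-dimensional basis
    B_lam = L @ np.linalg.pinv(Xb)    # Eq. (7): refit low-dimensional basis
    return B_eta, B_lam
```

With the pseudo-inverse, each update is the minimum-norm least-squares solution of B · (blended representations) ≈ input, i.e., it minimizes the reconstruction error in the original space.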
Learning f_D: The projection f_D is learned with the goal of improving the kNN accuracy of Y in R^α, ignoring X. Our rationale is that the high-dimensional input contains more discriminative information [20]. In particular, f_D is learned by employing Local Fisher Discriminant Analysis (LFDA) [12]. Our selection is motivated by three advantages of LFDA: (i) it defines constraints using local relationships in the data, (ii) it relies on closed-form solutions, and (iii) it naturally performs dimensionality reduction.

Learning f_C: The projection f_C seeks a mapping of X to their corresponding high-dimensional pairs in R^α (i.e., f_D(Y)). Our decision to learn a coupling projection is motivated by the following: (i) an analytical solution can be easily computed, (ii) it has been found to generalize well in similar applications [14], and (iii) it implicitly incorporates the discriminative information exploited by f_D. We set:

  f_C = argmin_{f_C} ‖f_D(Y) - f_C(X)‖²_F + µ ‖f_C‖²_F,   (8)

where ‖·‖_F is the Frobenius norm, applied to the projection matrix of f_C, and µ determines the weight of the regularization term that is used to avoid over-fitting. This is a ridge regression problem, and the solution is obtained by:

  f_C = f_D(Y) Xᵀ (X Xᵀ + µI)^{-1},   (9)

where I is a χ × χ identity matrix.

Learning f_R: To exploit information from both domains and enhance the class-separation, an additional mapping f_R is learned that updates f_D and f_C. In particular, f_R is learned by employing LFDA using f_D(Y) and f_C(X) as input. The pair-information of the data is not taken into consideration. Instead, only the label information is exploited.

Algorithm 2 CBD: Testing
Input: x_t, H, B_η, B_λ, f_D, f_C, ζ_X, and k
Output: Class Label
1: Use Eq. (13) to obtain Y.
2: Project Y using f_D.
3: Use Eq. (14) to obtain x_t.
4: Project x_t using f_C.
5: Employ kNN for f_C(x_t), using f_D(Y) as the training set.
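Eq. (9) is the standard ridge-regression closed form; a sketch (function and variable names are ours):

```python
import numpy as np

def coupling_projection(FDY, X, mu=0.0):
    """Eq. (9): the ridge regressor mapping the low-dimensional
    representations X onto their f_D-projected high-dimensional pairs
    FDY = f_D(Y). mu is the regularization weight of Eq. (8)."""
    chi = X.shape[0]
    return FDY @ X.T @ np.linalg.inv(X @ X.T + mu * np.eye(chi))
```

With µ = 0 and X of full row rank, this recovers any exact linear coupling; µ > 0 shrinks the projection matrix and guards against over-fitting.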
The two projections f_D and f_C are refined as:

  f_D = (f_R ∘ f_D)  and  f_C = (f_R ∘ f_C).   (10)

3. Implementation

An overview of the training and testing procedures is provided by Algs. 1 and 2. First, we focus on Alg. 1. Line 1: B_η, B_λ, f_D, and f_C are initialized using identity matrices. If a matrix is not square, the minimum dimension (i.e., the number of rows or columns) is used to define the desired identity matrix; additional rows or columns are then added by tiling copies of it. Then, Y and X are initialized according to Eqs. (13) & (14). In particular, Y minimizes the reconstruction error, while X simultaneously minimizes the reconstruction and coupling errors:

  Y = (B_η)^{-1} H.   (11)

  X = ([ζ_X B_λ; (1 - ζ_X) f_C])^{-1} [ζ_X L; (1 - ζ_X) f_D(X)],   (12)

where [ζ_X B_λ; (1 - ζ_X) f_C] and [ζ_X L; (1 - ζ_X) f_D(X)] denote vertically concatenated matrices, and ζ_X determines the trade-off between reconstruction and coupling. Line 3: The indices i are randomly permuted in each iteration. As a result, the bias due to the ordering of the updates is eliminated. Line 8: The representations Y and X are updated according to Eqs. (11) & (12) using the updated bases. This step avoids over-fitting, as the representations obtained are similar to the ones computed during testing. Line 10: Once the iteration has been completed, the target neighbors and impostors are re-defined according to the updated representations and distance metrics. The margins for the impostors are also re-computed. Any test sample x_t can be classified using Alg. 2. If a matrix is not invertible, the Moore-Penrose pseudo-inverse is used.

4. Experimental Evaluation

Parameters: There are many ways to perform dimensionality reduction using the proposed framework: (i) decrease ψ and/or χ, (ii) decrease α, and (iii) decrease β. For CBD, no parameter search was performed, and the same values were used for all databases in Experiments 1 & 2. In particular, χ, α, and β were set to the output dimension specified by Siena et al. [11] to obtain comparable results. The parameter ψ was arbitrarily set equal to λ. Following the suggestion of Weinberger et al. [16], ζ_Y was set to 0.5, and the same value was used for ζ_X, γ_Y, and γ_X. The regularization parameter µ was set to zero because no over-fitting was observed due to the dimensionality reduction. Alg. 1 was terminated after s = 5 iterations.

Table 2. Overview of the parameters for Experiments 1 & 2 (values in parentheses refer to Experiment 3).

Dataset     η         λ   ψ   χ          α          β          ζ_Y   ζ_X   γ_Y   γ_X   µ   s   k
SCface      – (133)   –   –   129 (62)   129 (62)   129 (62)   0.5   0.5   0.5   0.5   0   5   3
Multi-PIE   – (103)   –   –   143 (23)   143 (23)   143 (23)   0.5   0.5   0.5   0.5   0   5   3
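The initializations of Eqs. (11) & (12) can be sketched as a pseudo-inverse solve and a stacked least-squares solve. Names are ours, and `targets` stands in for the f_D-projected term on the right-hand side of Eq. (12):

```python
import numpy as np

def init_representations(H, L, B_eta, B_lam, fC, targets, zX=0.5):
    """Y from Eq. (11); X from the stacked system of Eq. (12), which
    trades reconstruction of L against coupling, weighted by zeta_X."""
    Y = np.linalg.pinv(B_eta) @ H                  # Eq. (11)
    A = np.vstack([zX * B_lam, (1 - zX) * fC])     # stacked left-hand side
    b = np.vstack([zX * L, (1 - zX) * targets])    # stacked right-hand side
    X = np.linalg.pinv(A) @ b                      # Eq. (12), least squares
    return Y, X
```

Because the stacked matrix is generally not square, the pseudo-inverse returns the least-squares solution, i.e., X balances the reconstruction and coupling residuals exactly as the concatenation in Eq. (12) prescribes.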
Finally, k was set to three, which is the maximum number of gallery samples with the same identity. An overview of the parameters used for CBD is provided in Table 2.

Databases: The experimental evaluation was performed using the Surveillance Cameras Face (SCface) [3] and CMU Multi-PIE Face (Multi-PIE) [4] databases. The SCface database was selected because it includes real images of different resolutions, and Multi-PIE because it offers variety in terms of pose, expression, and illumination. Siena et al. [11] provided us with their code to crop, align, and downsample the images, and also to generate the LR images for Multi-PIE. SCface [3] consists of 4,160 images from 130 subjects and includes manual annotations. The subset used contains images from: (i) surveillance cameras cam1-cam5, (ii) a distance of 2.6 m (i.e., LR), and (iii) a distance of 1.0 m (i.e., HR). Multi-PIE [4] comprises over 750,000 images from 337 subjects. A subset of images was selected using: (i) session 01, (ii) a subset of the subjects, (iii) expressions 01 and 02 (i.e., neutral and smiling), (iv) cameras 14_0, 05_1, and 05_0 (i.e., yaw = -15°, yaw = 0°, and yaw = 15°), and (v) all of the 20 illumination conditions. Siena et al. [11] could not release the manual annotations that they used. Therefore, the Viola-Jones [13] algorithm was employed to detect the eyes. For each subject and each expression, the illumination condition denoted by 08 was used. When the eye detector failed to detect any eyes, manual annotation was performed. However, when the eyes were detected in the wrong part of the face (e.g., the mouth), no corrective action was taken. Hence, some of the processed images used for CBD have incorrect metadata.

Experiments: For the first experiment, the protocol of [11] was duplicated.
For the second and third experiments, disjoint sets of subjects were used for training and for defining the gallery and probes. Experiment 1: In this experiment, CBD is compared with MMCM, CMFA, SDA, and CCA. For both databases, four HR images per subject were randomly selected to define the gallery, which was also used for training. Hence, the gallery size is 520 images for SCface and 632 for Multi-PIE. The rest of the images were used as probes; that is, 130 images for SCface and 18,328 images for Multi-PIE. This procedure was repeated 10 times. Siena et al. [11] provided us with the exact indices that they used. Therefore, the results are directly comparable for SCface. However, the Multi-PIE processed data used for CBD were of a lower quality. Hence, its performance is underestimated. The results are presented in Table 3. Experiment 2: In this experiment, four images from each of 100 subjects were randomly selected for training. Using fewer subjects resulted in numerical errors that prevented us from completing the training process. The rest of the subjects were used to define the gallery and probes for the task of biometric verification. Specifically, the gallery was defined

by randomly selecting four images, while the rest of the images were used as probes. The performance evaluation is divided into three categories: (i) LR vs. LR, (ii) HR vs. HR, and (iii) HR vs. LR. The HR vs. LR notation denotes that the gallery comprises HR images, while the probes are LR images. This procedure was repeated 100 times; the average performance is presented in Table 4 and the corresponding ROC curves in Figs. 3 & 4. As baselines, we used our own implementations of SDA and CML. To verify the correctness of our code, we evaluated their performance on the SCface database under the setting of Experiment 1. Due to data irregularities, the following settings were used for LFDA: (i) the plain metric, (ii) LR in the eigs function (using MATLAB 2012b), and (iii) if the output was complex, only the real part was kept. Due to inconsistencies in the dimensionality of the projections, CML was trained using the gallery samples, and the HR vs. HR results could not be reported. The projection matrices for SDA were normalized to unit norm. The statistical significance of the obtained results was tested in terms of median values. Specifically, multiple one-sided non-parametric Wilcoxon signed-rank tests were performed [17].

Table 3. Overview of the results for Experiment 1. The values are in the format: improvement over CCA (absolute Rank-1 identification rate %). The performance of CBD for Multi-PIE is underestimated due to annotations of lower quality. Bold font is used to indicate the best performance.

Method       SCface         Multi-PIE
CCA          1.00 (19.77)   1.00 (63.67)
SDA          1.77 (35.08)   1.20 (76.41)
CMFA         1.71 (33.77)   1.21 (76.93)
MMCM CCA     1.51 (29.77)   1.15 (73.23)
MMCM SDA     2.02 (39.92)   1.26 (79.94)
MMCM CMFA    1.91 (37.77)   1.26 (80.05)
CBD          2.27 (44.92)   1.28 (81.26)

Figure 3. Depiction of the ROC curves for the SCface database for Experiment 2 (color figure, best viewed in electronic format).
The null hypothesis was set to H_0: the best baseline and CBD median AUC/Rank-1 values are equal, and the alternative hypothesis was defined as H_a: the CBD AUC/Rank-1 median has a higher value than that of the best baseline. The Bonferroni adjustment was used to ensure that the overall statistical significance remains 5% despite the multiple tests performed. That is, the significance level of each individual test was set to 5%/12 ≈ 0.41%. The statistically significant results are indicated by an asterisk in Table 4. As shown, the median performance of CBD was found to be statistically significantly better in all cases. The poor performance of the baseline methods is attributed to the smaller training sets and the discrepancies between the disjoint training and evaluation sets. The increased variance observed for CBD is attributed to numerical instabilities induced by LFDA (i.e., a high condition number). Nevertheless, CBD is more stable across tasks, unlike the other approaches. Experiment 3: In this experiment, we demonstrate that CBD can be employed with different distance metric learning approaches. To this end, CBD is implemented in conjunction with LFDA and LMNN using the same protocol as Experiment 2. However, due to the increased time complexity of LMNN, the dimensionality of the two datasets was reduced using Principal Component Analysis (PCA) so that 95% of the data variance is explained. The parameters ψ, χ, α, and β were set to the number of vectors retained by PCA (see Table 2). When LMNN was used as part of CBD, the maximum number of iterations was set to 100, while the LMNN baseline was trained using the default settings. The results are presented in Table 5. As illustrated, both approaches can effectively be used in conjunction with CBD.
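The hypothesis-testing procedure (one-sided Wilcoxon signed-rank tests at a Bonferroni-corrected level) might be sketched with SciPy as follows; names are ours:

```python
import numpy as np
from scipy.stats import wilcoxon

def significant_improvement(cbd, baseline, n_tests=12, alpha=0.05):
    """One-sided Wilcoxon signed-rank test on paired per-repetition
    scores (H_a: the CBD median is higher), at the Bonferroni-corrected
    per-test level alpha / n_tests."""
    _, p = wilcoxon(cbd, baseline, alternative="greater")
    return p < alpha / n_tests, p
```

Dividing the level by the number of tests keeps the family-wise error rate at α even though twelve comparisons are performed.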
Despite the reduced number of features, both CBD (LMNN) and CBD (LFDA) produce better results for the cross-domain task compared to the baseline methods used in Experiment 2. Moreover, the dimensionality reduction yields more stable solutions, and thus the standard deviations obtained are significantly reduced compared to Experiment 2. For the HR vs. HR setting, the CBD performance is underestimated, as the projected space is of dimensionality β, which is smaller than the ψ used for LFDA and LMNN. Finally, even though CBD addresses the HR vs. LR problem, in some cases it also outperforms LFDA and LMNN for the HR vs. HR and LR vs. LR tasks.

Figure 4. Depiction of the ROC curves for the Multi-PIE database for Experiment 2 (color figure, best viewed in electronic format).

Table 4. Overview of the results for Experiment 2. The values are in the format: average value (standard deviation). Bold font is used to indicate the best performance, while the asterisk (*) indicates that statistically significantly better performance was obtained when CBD was employed. The reported time refers to the time needed to initialize the model parameters, train, test, and evaluate the performance of each method. For more details, consult the experimental protocol described in Sec. 4.

                      SCface                                     Multi-PIE
Setting      Method   AUC           Rank-1 %        Time (s)    AUC         Rank-1 %        Time (s)
LR vs. LR    CBD      0.69* (0.04)  46.40* (13.30)  –           –* (0.06)   78.84* (16.24)  –
             SDA      0.64 (0.02)   – (6.85)        –           – (0.01)    – (2.01)        –
             CML      0.53 (0.00)   – (7.40)        –           – (0.00)    – (2.08)        7.20
             CCA      0.58 (0.01)   – (7.55)        –           – (0.00)    – (2.35)        –
HR vs. HR    CBD      0.69* (0.03)  51.43* (10.16)  –           –* (0.06)   79.71* (15.96)  –
             SDA      0.56 (0.01)   9.07 (4.95)     –           – (0.00)    – (1.78)        –
             CCA      0.58 (0.02)   – (7.06)        –           – (0.00)    – (2.25)        –
HR vs. LR    CBD      0.66* (0.06)  40.63* (15.17)  –           –* (0.07)   79.57* (16.10)  –
             SDA      0.50 (0.00)   3.33 (0.00)     –           – (0.00)    1.72 (0.00)     –
             CML      0.52 (0.00)   – (7.29)        –           – (0.00)    – (2.05)        7.20
             CCA      0.50 (0.00)   4.13 (3.64)     –           – (0.00)    – (1.95)        –

Table 5. Overview of the results for Experiment 3. The values are in the format: average value (standard deviation). Bold font is used to indicate the best performance. The reported time refers to the time needed to initialize the model parameters, train, test, and evaluate the performance of each method. For more details, consult the experimental protocol described in Sec. 4.

                           SCface                                Multi-PIE
Setting      Method        AUC          Rank-1 %    Time (s)    AUC        Rank-1 %    Time (s)
LR vs. LR    CBD (LMNN)    0.78 (0.03)  – (9.68)    –           – (0.01)   – (2.89)    –
             LMNN          0.75 (0.03)  – (8.10)    –           – (0.01)   – (3.59)    –
             CBD (LFDA)    0.72 (0.02)  – (2.40)    –           – (0.01)   – (1.86)    –
             LFDA          0.70 (0.03)  – (9.01)    –           – (0.01)   – (2.69)    6.86
HR vs. HR    CBD (LMNN)    0.77 (0.03)  – (9.76)    –           – (0.01)   – (2.87)    –
             LMNN          0.78 (0.02)  – (9.70)    –           – (0.01)   – (1.83)    –
             CBD (LFDA)    0.70 (0.02)  – (9.50)    –           – (0.01)   – (1.88)    –
             LFDA          0.68 (0.02)  – (8.68)    –           – (0.01)   – (1.94)    6.86
HR vs. LR    CBD (LMNN)    0.77 (0.03)  – (9.59)    –           – (0.01)   – (2.92)    –
             CBD (LFDA)    0.71 (0.02)  – (9.90)    –           – (0.01)   – (1.99)    –

5. Discussion

The proposed approach has three limitations: (i) increased time complexity, (ii) an increased number of model parameters, and (iii) the need to tune additional parameters. The additional computational burden is due to learning the bases and iteratively updating the projections. However, the proposed approach requires only up to 16.81

seconds for training when used in conjunction with LFDA. In addition, unlike LMNN and LFDA, CBD needs to be trained only once for both the HR and LR data. Moreover, it was demonstrated that, even after dimensionality reduction, our approach still outperforms state-of-the-art cross-domain methods. For each projection, an additional basis is learned that yields optimal representations. However, while the model parameters are treated independently during training, at deployment they can be viewed as a single projection, in a manner similar to the way f_D and f_C are updated via composition with f_R. Finally, for all experiments, we obtained improvements using the same parameter configuration. Hence, the proposed approach appears to be robust, and it can be employed using intuitive parameter values.

6. Conclusion

In this paper, we introduced a method that addresses the problem of cross-domain matching. As demonstrated, the proposed approach fully exploits the high-dimensional data and simultaneously uses samples from both domains. In addition, it jointly learns two bases that yield optimal representations for the task at hand. Despite its increased time complexity, our approach remains suitable for real-world applications, as it relies on closed-form formulas and generalizes well even after dimensionality reduction. Statistical hypothesis tests on empirical data demonstrate that statistically significant improvements can be obtained when our method is used over state-of-the-art approaches. In addition, it was illustrated that the proposed method can be used in conjunction with different projection and distance metric learning techniques.

Acknowledgments

The authors would like to thank Mrs. Siyang Wang for duplicating the SDA and CML codes, and Mr. Stephen Siena for providing information related to his paper [11]. This research was funded in part by the US Army Research Lab (W911NF ) and the UH Hugh Roy and Lillie Cranz Cullen Endowment Fund.
All statements of fact, opinion, or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of the sponsors.

References

[1] S. Biswas, K. Bowyer, and P. Flynn. Multidimensional scaling for matching low-resolution face images. Trans. on PAMI, (99):1-1.
[2] M. Borga, O. Friman, P. Lundberg, and H. Knutsson. A canonical correlation approach to exploratory data analysis in fMRI. In Proc. Int. SMRM, Honolulu, Hawaii.
[3] M. Grgic, K. Delac, and S. Grgic. SCface - surveillance cameras face database. MTAJ, 51(3).
[4] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-PIE. In Proc. FG, Amsterdam, The Netherlands.
[5] Z. Jiang, Z. Lin, and L. Davis. Learning a discriminative dictionary for sparse coding via label consistent K-SVD. In Proc. CVPR, San Francisco, CA.
[6] B. Li, H. Chang, S. Shan, and X. Chen. Coupled metric learning for face recognition with degraded images. In Proc. AML, Nanjing, China.
[7] B. Li, H. Chang, S. Shan, and X. Chen. Low-resolution face recognition via coupled locality preserving mappings. SPL, 17(1):20-23.
[8] Y. Lui, D. Bolme, B. Draper, J. Beveridge, G. Givens, and P. Phillips. A meta-analysis of face recognition covariates. In Proc. BTAS, Washington, D.C.
[9] C. Ren and D. Dai. Piecewise regularized canonical correlation discrimination for low-resolution face recognition. In Proc. CCPR, pages 1-5, Chongqing, China.
[10] S. Siena, V. Boddeti, and B. Kumar. Coupled marginal Fisher analysis for low-resolution face recognition. In Proc. ECCV, Firenze, Italy.
[11] S. Siena, V. Boddeti, and B. Kumar. Maximum-margin coupled mappings for cross-domain matching. In Proc. BTAS, Washington, D.C.
[12] M. Sugiyama, T. Idé, S. Nakajima, and J. Sese. Semi-supervised local Fisher discriminant analysis for dimensionality reduction. In Proc. PAKDD, Osaka, Japan.
[13] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. CVPR, Kauai, HI.
[14] S. Wang, L. Zhang, Y. Liang, and Q. Pan. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In Proc. CVPR, Providence, RI.
[15] Z. Wang, Z. Miao, Q. Wu, Y. Wan, and Z. Tang. Low-resolution face recognition: A review. The Visual Computer, 30(4).
[16] K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 10.
[17] D. Wolfe and M. Hollander. Nonparametric Statistical Methods. Wiley Series in Probability and Statistics.
[18] C. Zhou, Z. Zhang, D. Yi, Z. Lei, and S. Li. Low-resolution face recognition via simultaneous discriminant analysis. In Proc. IJCB, pages 1-6, Washington, D.C.
[19] G. Zou, S. Jiang, Y. Zhang, K. Fu, and G. Wang. A novel coupled metric learning method and its application in degraded face recognition. In Proc. CCBR, Jinan, China.
[20] W. W. Zou and P. Yuen. Very low resolution face recognition problem. Trans. on IP, 21(1), 2012.

More information

Automatic Gait Recognition. - Karthik Sridharan

Automatic Gait Recognition. - Karthik Sridharan Automatic Gait Recognition - Karthik Sridharan Gait as a Biometric Gait A person s manner of walking Webster Definition It is a non-contact, unobtrusive, perceivable at a distance and hard to disguise

More information

Facial expression recognition using shape and texture information

Facial expression recognition using shape and texture information 1 Facial expression recognition using shape and texture information I. Kotsia 1 and I. Pitas 1 Aristotle University of Thessaloniki pitas@aiia.csd.auth.gr Department of Informatics Box 451 54124 Thessaloniki,

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Combine the PA Algorithm with a Proximal Classifier

Combine the PA Algorithm with a Proximal Classifier Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Facial Feature Points Tracking Based on AAM with Optical Flow Constrained Initialization

Facial Feature Points Tracking Based on AAM with Optical Flow Constrained Initialization Journal of Pattern Recognition Research 7 (2012) 72-79 Received Oct 24, 2011. Revised Jan 16, 2012. Accepted Mar 2, 2012. Facial Feature Points Tracking Based on AAM with Optical Flow Constrained Initialization

More information

Manifold Learning for Video-to-Video Face Recognition

Manifold Learning for Video-to-Video Face Recognition Manifold Learning for Video-to-Video Face Recognition Abstract. We look in this work at the problem of video-based face recognition in which both training and test sets are video sequences, and propose

More information