Clustering Lightened Deep Representation for Large Scale Face Identification


Shilun Lin (linshilun@bupt.edu.cn), Zhicheng Zhao (zhaozc@bupt.edu.cn), Fei Su (sufei@bupt.edu.cn)

ABSTRACT
On specific face datasets, such as the LFW benchmark, recent face recognition methods have achieved near-perfect accuracy. However, face identification remains a challenging task on super large scale datasets, where real applications are urgently needed; thus the Microsoft challenge of recognizing one million celebrities (MS-Celeb-1M) has attracted increasing attention. In this paper, we propose a three-step strategy to address this problem. Firstly, based on a cross-domain face dataset, i.e., the CASIA-WebFace dataset, an efficient and deliberate face representation model with a Max-Feature-Map (MFM) activation function is trained to map raw images into the feature space quickly. Secondly, face representations with the same MID in MS-Celeb-1M are clustered into three subsets: a pure set, a hard set, and a mess set. The cluster centers are used as gallery representations of the corresponding MID; this scheme reduces the impact of noisy images and the number of comparisons during face matching. Finally, the locality-sensitive hashing (LSH) algorithm is applied to speed up the search for the nearest centroid. Experimental results show that our face CNN model can extract stable and discriminative face representations, and the proposed three-step strategy achieves promising performance on the MS-Celeb-1M dataset without any manual selection. Furthermore, we find that via clustering a relatively pure set is kept by many MIDs in MS-Celeb-1M, which indicates this scheme is effective for cleaning a huge but messy dataset.

Keywords: Large Scale Face Identification, Lightened Deep Representation, Clustering, Convolutional Neural Network

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. ICC '17, March 22, 2017, Cambridge, United Kingdom. © 2017 ACM. ISBN 978-1-4503-4774-7/17/03...$15.00. DOI: http://dx.doi.org/10.1145/3018896.3025149

1. INTRODUCTION
In the past few years, face recognition under uncontrolled environments has become one of the most extensively researched fields in computer vision [19, 20, 15, 17, 10, 8, 13, 2, 6, 14, 16], due to the publication of LFW [4], an extensively reported dataset for the evaluation of face recognition algorithms. However, surpassing human recognition accuracy on LFW does not mean that face recognition has been solved, for the number of images in LFW is relatively small. Face identification is still a challenging task on a super large scale dataset. Many real-world applications need accurate identification at planetary scale: in suspect searching, the face identification algorithm should find the suspect in a large scale gallery image set; similarly, large scale robustness is necessary in mobile payment, to ensure that other people cannot use your account with their faces [5]. Recently, Microsoft released MS-Celeb-1M [3], a large scale real-world face image dataset, to the public, encouraging researchers to develop the best face recognition techniques to recognize one million celebrities identified from Freebase. Its V1.0 version contains 10M celebrity face images for the top 100K celebrities, which can be used to train and evaluate both face identification and verification algorithms at a relatively large scale. In this paper, we propose a three-step strategy to address the MS-Celeb-1M (V1.0) challenge of recognizing 100K celebrities in the real world; the flowchart is shown in Fig. 1.

Figure 1: The basic idea of our proposed three-step strategy for the MS-Celeb-1M challenge, including face representation extractor training, clustering in the feature space, and closest centroid finding.

To achieve ultimate accuracy, existing CNN-based models tend to be deeper or to use ensembles of multiple local facial patches. A very deep model leads to a long computation time for representation extraction on CPU or GPU, and the use of multiple facial patches requires much time and introduces uncertainty caused by automatic facial landmark detection. For a large scale face recognition task, in addition to the time consumption and demand for computing resources in offline training, the time consumption of representation extraction and the dimension of the representation should be carefully considered. To obtain a small face representation extractor with fast feature extraction and a low-dimensional representation, the widely used ReLU is replaced with the Max-Feature-Map (MFM) activation, which has been proved effective [19]. Only one kind of aligned face patch extracted from CASIA-WebFace [20] is used as training data. There are two reasons for not using the MS-Celeb-1M dataset for training. On the one hand, this dataset is still relatively dirty at this stage and takes time to be cleaned. On the other hand, in practical applications it is difficult to cover all query identities in the training set; our aim is to obtain a generalized model with good performance.

MS-Celeb-1M (V1.0) contains 8,456,240 images of 99,892 MIDs. At such a scale, it is not economical to compare a query image with the entire dataset. Visualization of the face representations obtained by the extractor provides direct visual insight: through layer-by-layer abstraction, face representations of the same person tend to be similar, while there is a clear difference between representations of different persons. Based on this observation, K-means [7] is applied to the face representations with the same MID in MS-Celeb-1M, and three cluster centers (or fewer) are obtained as the gallery representations of the corresponding MID. The clustering result shows that the face images of many MIDs are divided into three parts (a pure set, a hard set, and a mess set; the latter two do not necessarily exist).
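The per-MID clustering step can be sketched in a few lines. The following is a minimal NumPy k-means, not the authors' code; the function name `cluster_mid` and its defaults are our own illustration:

```python
import numpy as np

def cluster_mid(features, k=3, iters=20, seed=0):
    """Cluster the (n, d) feature matrix of one MID into at most k
    subsets and return the cluster centers, which serve as the
    gallery representations of that MID."""
    rng = np.random.default_rng(seed)
    k = min(k, len(features))              # an MID may hold fewer than k images
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each representation to its nearest center
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centers; keep the old center if a cluster went empty
        centers = np.stack([features[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels
```

With k = 3, the three resulting clusters correspond to the pure/hard/mess partition described above; when an MID holds fewer images than k, fewer centers are returned, matching the "three cluster centers (or fewer)" in the text.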
The pure set contains normal images of the person who appears most frequently in an MID folder. Faces in the hard set have variations in pose, illumination, or expression, and a small number of them do not belong to the person who appears in the pure set. The mess set consists of the noisy images. Clustering in the feature space not only reduces the amount of computation required in the query phase, but also provides an effective preprocessing step for further data cleaning.

Although clustering reduces the number of comparisons at the query phase and the face features are compact thanks to the MFM activation function, searching for the nearest one among hundreds of thousands of gallery representations still limits the speed of our system. Locality-Sensitive Hashing (LSH) [12] is an effective way to deal with this issue: through hashing, similar gallery representations are mapped to the same bucket with high probability. During evaluation, the LSH algorithm maps the query item to the bucket that contains similar gallery representations and takes approximately O(1) time to find the nearest one.

2. RELATED WORK
2.1 Data Set
The history of face recognition can be seen as the history of the evolution of face databases. Early face datasets, such as PIE [11] and FERET [9], were mostly collected under controlled environments. Very high performance was achieved on these ideal datasets through the efforts of many researchers. However, models learned from these datasets are difficult to generalize to practical applications, especially under uncontrolled environments. The interest of the community therefore gradually shifted from controlled to uncontrolled environments, and the publication of a milestone dataset, LFW [4], including 13,233 images of 5,749 identities, promoted the study of unconstrained face recognition.
Compared to previous datasets, the biggest difference of LFW is that its images are crawled from the Internet rather than acquired under several pre-defined conditions. LFW therefore has more variations in pose, illumination, expression, resolution, and imaging device, and these factors are combined in a random way. YTF [18] is based on the name list of LFW (1,595 identities) but was created for video-based face recognition; all 3,425 videos in YTF were downloaded from YouTube. Because videos on YouTube are highly compressed, the quality of the face snapshots is lower than in LFW. Along with the development of deep learning, the scale of face databases has also been increasing. Among the large scale public datasets, CASIA-WebFace [20], which includes about 500K photos of 10K celebrities crawled from the web, is a great resource for training. Although the scale of CelebFaces [15] is relatively smaller than CASIA-WebFace, it contains rich face attribute labels for face parsing. As for large scale private datasets, Facebook's SFC [17] contains more than 4,000 subjects, each with an average of 1,000 images; using SFC, [17] successfully learned an effective face representation robust to face variations in the wild. Google has access to massive photo collections: they trained FaceNet [10] on 200 million photos of 8 million people (and more recently on 500M of 10M) and achieved outstanding performance in many face recognition tasks. The details of recent representative large scale face datasets are shown in Table 1.

Table 1: Current large scale face datasets.
Dataset         Identities   Images    Access
WDRef [1]       2,995        99,773    Private
CelebFace [15]  10,177       202,599   Public
WebFace [20]    10,575       494,414   Public
VGG Face [8]    2,622        2.6M      Public
SFC [17]        4,030        4.4M      Private
Google [10]     10M          500M      Private

2.2 Deep Face Representation Learning
Data and algorithms are two essential components of face recognition, especially for deep face representation learning.
The publication of large scale web-crawled datasets promoted the study of unconstrained face recognition. Several excellent face features have been learned by different deep networks and have achieved high performance on the LFW face verification task. Taigman et al. [17] proposed DeepFace, a multi-class network trained to perform face identification on 4.4 million faces of over 4,000 identities; through an ensemble of three networks using different alignments and color channels, they obtained high accuracy approaching the human level. Sun et al. [13] proposed to combine identification and verification losses to reduce intra-personal variations while enlarging inter-personal differences during the training of DeepID2; they concatenated the features from 25 such networks, each operating on a different face patch, and then applied PCA to obtain the compact face representation. Schroff et al. [10] presented FaceNet, a system that directly learns a mapping from face images to a compact Euclidean space, trained on 100-200 million face thumbnails of about 8 million identities. All the groups mentioned above surpassed human performance and achieved near-perfect results on the LFW benchmark.

However, face recognition is far from solved. Many applications require accurate identification at planetary scale, such as finding the best matching face in a database of billions of people, while LFW includes only 13K photos of 5K people. MS-Celeb-1M (V1.0) contains 8,456,240 images of 99,892 MIDs. Using the entire MS-Celeb-1M dataset as the gallery set may not be appropriate: on the one hand, the computation and storage overhead is relatively large; on the other hand, there are many noisy images in an MID folder, and calculating the distance between them and the query image is unnecessary. In fact, through layer-by-layer abstraction (Fig. 3), the face representation has high distinguishability (intuitive visualization results are given in Section 4), which provides a good basis for clustering. So the K-means algorithm is applied to the face representations with the same MID in MS-Celeb-1M, and three cluster centers (or fewer) are obtained as the gallery representations of the corresponding MID. During querying, only the distances between the query image and the cluster centers need to be calculated. Interestingly, in the absence of any supervisory information, the pictures in an MID folder are automatically divided into different categories (pure, hard, and mess; the latter two do not necessarily exist), which provides an effective preprocessing step for follow-up database cleaning.

3. PROPOSED METHOD
Since face recognition involves small inter-personal variations and large intra-personal variations, how to learn a discriminative face representation that narrows the intra-personal distance while enlarging the inter-personal gap has always been a key topic. Deep face representations have made remarkable breakthroughs in this field and are widely used. In large scale face recognition, the time consumption of feature extraction and the overhead of feature storage should not be overlooked. The Max-Feature-Map activation function proposed in [19] is advantageous for learning a CNN model with small size, fast feature extraction, and a compact representation. As shown in Fig. 2, the output of the MFM activation is the maximum of two candidate convolution feature map nodes. This operation selects the more notable and discriminative nodes in both the convolutional and fully connected layers and makes the model lightened. Details of the network architecture are given in Fig. 3. MFM is used instead of ReLU after the convolutional layers and the fully connected layer, and the face representation is extracted from the FC1 layer (after MFM activation).

Figure 2: Principle of the Max-Feature-Map activation function.

Figure 3: The flowchart of Transfer FaceNet.

To further speed up our system, Locality-Sensitive Hashing (LSH) is used in the final query phase. Through hashing, similar cluster centers are mapped to the same bucket with high probability. During evaluation, the LSH algorithm maps the query item to the bucket containing similar gallery representations and takes approximately O(1) time to find the nearest one.

4. EXPERIMENTAL RESULTS
4.1 Face Representation Extractor
In our experiments, model training is based on the open source deep learning framework Caffe. Training samples for the face representation extractor are 144×144 gray-scale face images aligned using five facial landmarks, randomly cropped to 128×128 and mirrored for data augmentation.
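The MFM activation described in Section 3 (the element-wise maximum of two candidate feature maps, halving the channel count) can be sketched as follows; the function name is our own, and NCHW layout is assumed:

```python
import numpy as np

def mfm(x):
    """Max-Feature-Map activation: split the channel axis into two
    halves and take the element-wise maximum, so the output has half
    as many channels as the input. `x` has shape (N, C, H, W), C even."""
    c = x.shape[1]
    assert c % 2 == 0, "MFM needs an even number of channels"
    return np.maximum(x[:, : c // 2], x[:, c // 2 :])
```

Because each output node is the max of two competing nodes, MFM acts as a feature selector while shrinking the model, which is why it yields the small, fast extractor with compact representations described above.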
60% dropout is used on the fully connected layers to avoid overfitting. The learning rate is initially set to 1e-3 and reduced to 1e-5 during training. Fig. 4 shows feature maps from the Conv4b layer of the trained face representation extractor.

Figure 4: Visualization of the first 36 feature maps produced by the Conv4b layer of the face representation extractor on images of Zhiling Lin (a, b, left) and Xiang Chen (c, d, right). Best viewed in color.

We can see that the responses to face concepts like mouth and eyes are visible, which can provide discriminative information for face recognition; in daily life, we intuitively use concepts like big eyes and a high nose to distinguish different people. Comparing the feature maps of the same person in Fig. 4 (a vs. b, feature maps in the red boxes for instance), we find that the intra-personal variation of feature maps produced by the higher layers of the extractor is small, while the feature maps of two different persons differ significantly (a vs. c, feature maps in the red and green boxes). This characteristic of the face representation provides a good foundation for the subsequent clustering.

4.2 Clustering in the Feature Space
With the purpose of reducing the number of comparisons during evaluation and excluding the impact of noisy images on identification performance, the K-means algorithm is applied to the face representations with the same MID in MS-Celeb-1M, and three cluster centers (or fewer) are obtained as the gallery representations of the corresponding MID. Some clustering results are shown in Fig. 5.

Figure 5: Clustering results of MID m.0b3q8k (top) and m.0cz96px (bottom) in MS-Celeb-1M (V1).

Images of MID m.0b3q8k (top) are divided into the three sets described before: a pure set, a hard set, and a mess set. The hard set or mess set does not always exist, because the images of some MIDs are already relatively pure. MID m.0cz96px (bottom) includes face images of many identities, and it is difficult to confirm which one is the main identity; we believe that images corresponding to such MIDs need to be re-collected. We find that via clustering a relatively pure set is kept by many MIDs in MS-Celeb-1M, which implies this scheme is effective for cleaning huge but messy data. Our clustering results in the feature space provide guidance for further data cleaning, which helps to obtain a relatively clean, large scale dataset for training a more powerful face representation extractor.

4.3 Performance
During querying, the MID of the nearest centroid is assigned to the query image. Introducing Locality-Sensitive Hashing (LSH) effectively speeds up finding the closest centroid, from O(n) to approximately O(1) time complexity, where n is the number of centroids. Because we emphasize the generalization ability of our system on the large scale web-crawled face dataset, in building and testing our system, MS-Celeb-1M (V1) is neither used for training nor manually cleaned. Finally, top-1 identification rates of 58.20% and 38.4% are achieved on the MS-Celeb-1M (V1) random and hard dev sets, respectively. Note that the 100K list in MS-Celeb-1M (V1) only covers about 75% of the celebrities in the measurement set, so without any expansion of the database the highest achievable identification rate is 75%.

5. CONCLUSION AND FUTURE WORK
This paper proposes a three-step strategy to construct a promising system for large scale automatic face identification by clustering lightened deep representations. We offer an intuitive visual interpretation of the discriminability of the lightened deep representation. Based on this observation, face representations are clustered in the feature space, which reduces both the impact of noisy images mixed into each MID and the number of comparisons during matching. Benefiting from the discriminative face representation, the images of each MID are divided into three sets (or fewer), and most MIDs contain a pure set of the main identity. Such automated processing will reduce the labor cost of further data cleaning.
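As a concrete illustration of the query phase in Section 4.3, here is a minimal random-hyperplane LSH sketch: centroids that hash to the same bit pattern share a bucket, and a query is compared only against the centroids in its own bucket. The class name, the full-scan fallback for empty buckets, and the parameter choices are our own, not the authors' implementation:

```python
import numpy as np

class CentroidLSH:
    """Random-hyperplane LSH over gallery centroids: similar centroids
    land in the same bucket with high probability, and a query only
    compares against the centroids in its bucket."""
    def __init__(self, centroids, mids, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(centroids.shape[1], n_bits))
        self.centroids, self.mids = centroids, mids
        self.buckets = {}
        for i, key in enumerate(self._keys(centroids)):
            self.buckets.setdefault(key, []).append(i)

    def _keys(self, x):
        # one sign bit per random hyperplane
        bits = (x @ self.planes) > 0
        return [tuple(row) for row in bits]

    def query(self, q):
        # fall back to a full scan only if the query's bucket is empty
        idx = self.buckets.get(self._keys(q[None])[0],
                               range(len(self.centroids)))
        idx = np.fromiter(idx, dtype=int)
        dists = np.linalg.norm(self.centroids[idx] - q, axis=1)
        return self.mids[idx[dists.argmin()]]
```

Since a bucket typically holds a small, roughly constant number of centroids, the per-query cost is approximately O(1) rather than O(n) over all centroids, at the price of occasionally missing the true nearest centroid when it hashes to a different bucket.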
Moreover, a relatively clean dataset is essential for training a deep model with strong generalization ability. Exploring the appropriate number of cluster centers for each MID is one of the key points of our future work, and achieving a good tradeoff between processing speed and accuracy remains an open issue for large scale face recognition systems.

6. ACKNOWLEDGMENT
This work is supported by the Chinese National Natural Science Foundation (61532018, 61372169, 61471049), Special Funds of the Beijing Municipal Co-construction Project, and the Beijing Key Lab of Network System and Network Culture.

7. REFERENCES
[1] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint formulation. In Computer Vision - ECCV 2012, pages 566-579. Springer, 2012.
[2] D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. In CVPR 2013, pages 3025-3032. IEEE, 2013.
[3] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. MS-Celeb-1M: Challenge of recognizing one million celebrities in the real world.
[4] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, 2007.
[5] I. Kemelmacher-Shlizerman, S. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. arXiv preprint arXiv:1512.00596, 2015.
[6] C. Lu and X. Tang. Surpassing human-level face verification performance on LFW with GaussianFace. arXiv preprint arXiv:1404.3840, 2014.
[7] J. MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281-297. Oakland, CA, USA, 1967.
[8] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In British Machine Vision Conference, volume 1, page 6, 2015.
[9] P. J. Phillips, H. Moon, S. Rizvi, P. J. Rauss, et al. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090-1104, 2000.
[10] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR 2015, pages 815-823, 2015.
[11] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression (PIE) database. In Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pages 46-51. IEEE, 2002.
[12] M. Slaney and M. Casey. Locality-sensitive hashing for finding nearest neighbors [lecture notes]. IEEE Signal Processing Magazine, 25(2):128-131, 2008.
[13] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS 2014, pages 1988-1996, 2014.
[14] Y. Sun, X. Wang, and X. Tang. Hybrid deep learning for face verification. In ICCV 2013, pages 1489-1496. IEEE, 2013.
[15] Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR 2014, pages 1891-1898. IEEE, 2014.
[16] Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. In CVPR 2015, pages 2892-2900, 2015.
[17] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR 2014, pages 1701-1708. IEEE, 2014.
[18] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR 2011, pages 529-534. IEEE, 2011.
[19] X. Wu, R. He, and Z. Sun. A lightened CNN for deep face representation. arXiv preprint arXiv:1511.02683, 2015.
[20] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint arXiv:1411.7923, 2014.