Manifold Learning for Video-to-Video Face Recognition


Abstract. In this work, we look at the problem of video-based face recognition in which both training and test sets are video sequences, and we propose a novel approach based on manifold learning. The idea consists of first learning the intrinsic personal characteristics of each subject from the training video sequences by discovering the hidden low-dimensional nonlinear manifold of each individual. A target face video sequence is then projected onto and compared to the manifold of each subject. The closest manifold, in terms of a recently introduced manifold distance measure, determines the identity of the person in the sequence. Experiments on a large set of talking faces under different image resolutions show very promising results (a recognition rate of 99.8%), outperforming many traditional approaches.

1 Introduction

Recently, there has been an increasing interest in video-based face recognition (e.g. [1-3]). This is partially due to the limitations of still image-based methods in handling illumination changes, pose variations and other factors. The most studied scenario in video-based face recognition uses a set of still images as the gallery (enrollment set) and video sequences as the probe (test set). However, in some real-world applications, such as human-computer interaction and content-based video retrieval, both training and test sets can be video sequences. In such settings, performing video-to-video matching may be crucial for robust face recognition, but this task is far from trivial.

There are several ways of approaching the problem of face recognition in which both training and test sets are video sequences. One could build an appearance-based system by selecting a few exemplars from the training sequences as gallery models and then performing still image-based recognition, fusing the results over the target video sequence [4].
Obviously, such an approach is not optimal, as some important information in the video sequences may be left out. Another direction consists of using spatiotemporal representations to encode the information in both the training and test video sequences [1-3]. Perhaps the most popular approach in this category is based on hidden Markov models (HMMs), which have been successfully applied to face recognition from videos [2]. The idea is quite simple: in the training phase, an HMM is created to learn both the statistics and the temporal dynamics of each individual. During the recognition process, the temporal characteristics of the face sequence are analyzed over time by the HMM corresponding to each subject. The likelihood scores provided by the HMMs are compared, and the highest score determines the identity of the face in the video sequence. Unfortunately, most methods which use spatiotemporal representations for face recognition have not yet shown their full potential, as they suffer from different drawbacks: they use only global features, while local information has been shown to also be important for facial image analysis [5], and they do not discriminate between the facial dynamics which are useful for recognition and those which can hinder the recognition process [6].

Very recently, inspired by studies in neuroscience emphasizing the manifold ways of visual perception, we introduced in [7] a novel method for gender classification from videos using manifold learning. The idea consists of clustering the face sequences in the low-dimensional space based on the intrinsic characteristics of men and women. A target face sequence is then projected into both the men's and women's manifolds for classification. The proposed approach achieved excellent results not only in the gender recognition problem but also in age and ethnicity classification from face video sequences.

In this work, we extend the approach proposed in [7] to the problem of video-to-video face recognition. We propose to first learn and discover the hidden low-dimensional nonlinear manifold of each individual. A target face sequence can then be projected into each manifold for classification. The closest manifold determines the identity of the person in the target face video sequence. The experiments presented in Section 4 show that such a manifold-based approach yields excellent results, outperforming many traditional methods for video-based face recognition.

The rest of this paper is organized as follows. Section 2 explains the notion of a face manifold and discusses some learning methods. We then describe our proposed approach to the problem of video-to-video face recognition and the experimental analysis in Sections 3 and 4, respectively. Finally, we draw conclusions in Section 5.
2 Face Manifold

Let I(P, s) denote a face image of a person P at configuration s. The variable s describes a combination of factors such as facial expression, pose, illumination, etc. Let

    ξ_P = { I(P, s) | s ∈ S }    (1)

be the collection of face images of the person P under all possible configurations S. The set ξ_P thus defined is called the face manifold of person P. Additionally, if we consider the face images of all individuals, then we obtain the face manifold ξ:

    ξ = ∪_P ξ_P    (2)

Such a manifold ξ resides only in a small subspace of the high-dimensional image space. Consider the example of Fig. 1, showing face images of a person moving his face from left to right. The only obvious degree of freedom in this case is the rotation angle of the face. Therefore, the intrinsic dimensionality of the faces is very small (close to 1). However, these faces are embedded in a 1600-dimensional image space (since the face images have 40 × 40 = 1600 pixels), which is highly redundant. If one could discover the hidden low-dimensional structure of these faces (the rotation angle of the face) from the input observations, this would greatly facilitate the further analysis of the face images, such as visualization, classification, retrieval, etc. Our proposed approach to the problem of video-to-video face recognition, described in Section 3, exploits the properties of face manifolds.

Fig. 1. An example showing a face manifold of a given subject embedded in the high-dimensional image space

Neuroscience studies have also pointed out the manifold ways of visual perception [8]. Indeed, facial images are not isolated patterns in the image space but lie on a nonlinear low-dimensional manifold. The key issue in manifold learning is to discover the low-dimensional manifold embedded in the high-dimensional space. This can be done by projecting the face images into low-dimensional coordinates. For that purpose, several methods exist. The traditional ones are Principal Component Analysis (PCA) and Multidimensional Scaling (MDS). These methods are simple to implement and efficient in discovering the structure of data lying on or near linear subspaces of the high-dimensional input space. However, face images do not satisfy this constraint, as they lie on a complex nonlinear and nonconvex manifold in the high-dimensional space. Therefore, such linear methods generally fail to discover the real structure of the face images in the low-dimensional space. As an alternative to PCA and MDS, one can consider nonlinear dimensionality reduction methods such as Self-Organizing Maps (SOM) [9], Generative Topographic Mapping (GTM) [10], Sammon's Mapping (SM) [11], etc.
Though these methods can also handle nonlinear manifolds, most of them involve several free parameters, such as learning rates and convergence criteria. In addition, most of these methods do not have an obvious guarantee of convergence to the global optimum. Fortunately, in recent years, a set of new manifold learning algorithms has emerged. These methods are based on an eigendecomposition and combine the major algorithmic features of PCA and MDS (computational efficiency, global optimality, and asymptotic convergence guarantees) with the flexibility to learn a broad class of nonlinear manifolds. Among these algorithms are Locally Linear Embedding (LLE) [12], ISOmetric feature MAPping (ISOMAP) [13] and Laplacian Eigenmaps [14].

3 Proposed Approach to Video-to-Video Face Recognition

We approach the problem of video-to-video face recognition from a manifold learning perspective. We adopt the LLE algorithm for manifold learning due to its demonstrated simplicity and efficiency in recovering meaningful low-dimensional structures hidden in complex, high-dimensional data such as face images. LLE is an unsupervised learning algorithm which maps high-dimensional data onto a low-dimensional, neighbor-preserving embedding space. In brief, given a set of N face images organized into a matrix X (where each column vector X_i represents a face), the LLE algorithm involves the following three steps:

1. Find the k nearest neighbors of each point X_i.

2. Compute the weights W_ij that best reconstruct each data point from its neighbors, minimizing the cost in Equation (3):

    ε(W) = Σ_{i=1}^{N} || X_i − Σ_{j ∈ neighbors(i)} W_ij X_j ||²    (3)

while enforcing the constraints W_ij = 0 if X_j is not a neighbor of X_i, and Σ_{j=1}^{N} W_ij = 1 for every i (to ensure that W is translation-invariant).

3. Compute the embedding Y (of lower dimensionality d << D, where D is the dimension of the input data) best reconstructed by the weights W_ij, minimizing the quadratic form in Equation (4):

    Φ(Y) = Σ_{i=1}^{N} || Y_i − Σ_{j ∈ neighbors(i)} W_ij Y_j ||²    (4)

under the constraints Σ_{i=1}^{N} Y_i = 0 (to ensure a translation-invariant embedding) and (1/N) Σ_{i=1}^{N} Y_i Y_iᵀ = I (unit covariance).
The aim of the first two steps of the algorithm is to preserve the local geometry of the data in the low-dimensional space, while the last step discovers the global structure by integrating information from overlapping local neighborhoods. LLE is an efficient approach for computing the low-dimensional embeddings of high-dimensional data assumed to lie on a nonlinear manifold. Its ability to handle large amounts of high-dimensional data and its non-iterative way of finding the embeddings make it attractive.

We are given a set of training face video sequences, with one or more sequences per person. For each person, we first apply the LLE algorithm to all of his/her face images in the training set. We then obtain coordinates in the low-dimensional space, defining a face manifold of the person. Let us denote the obtained embedding for a given person P by ξ_P. Note that the calculation of ξ_P involves only two free parameters: the number of neighbors (k) and the dimension of the embedding space (d). A discussion on the values of these two parameters can be found in [7].

To determine the identity of an unknown person in a given face sequence {Face_frame(1), Face_frame(2), ..., Face_frame(L)}, we first project every face instance Face_frame(i) into the face manifold of each subject in the low-dimensional space. The closest manifold then determines the identity of the person in the sequence. Fig. 2 shows an example of the embedding results of three video sequences of the subjects shown in Fig. 3. The projection of the target face sequence into the manifold of person P is done using the following steps:

a. Let X_i be the column vector representing the face image Face_frame(i) from the new sequence.
b. Find the k nearest neighbors of each point X_i among the training face samples of person P.
c. Compute the weights W_ij that best reconstruct each data point X_i from its neighbors using Equation (3).
d. Use the obtained weights W_ij to compute the embedding Y_i^P of each point X_i (i.e. Face_frame(i)) as:

    Y_i^P = Σ_{j ∈ neighbors(X_i)} W_ij ξ_j^P    (5)

where ξ_j^P refers to the embedding point of the j-th neighbor of the point X_i in the face manifold of person P. As a result, we obtain the embedding Y^P of the new face sequence in every face manifold ξ_P.
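Steps a-d and the manifold distance of Eq. (6) can be sketched as follows. This is a hypothetical helper of our own naming, assuming `X_train_P` holds person P's training face vectors as rows and `Xi_P` their LLE embedding coordinates:

```python
import numpy as np

def score_sequence(X_new, X_train_P, Xi_P, k=8, reg=1e-3):
    """Project a probe face sequence into person P's manifold (steps a-d,
    Eq. (5)) and return the mean distance D_P of Eq. (6). A sketch only."""
    total = 0.0
    for x in X_new:                               # step a: one frame per row
        # step b: k nearest training faces of person P
        nbrs = np.argsort(np.linalg.norm(X_train_P - x, axis=1))[:k]
        # step c: reconstruction weights, as in Eq. (3)
        Z = X_train_P[nbrs] - x
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)        # regularize the local Gram matrix
        w = np.linalg.solve(C, np.ones(k))
        w /= w.sum()
        # step d: embedding of the frame, Eq. (5)
        y = w @ Xi_P[nbrs]
        # distance to the closest manifold point, accumulated for Eq. (6)
        total += np.min(np.linalg.norm(Xi_P - y, axis=1))
    return total / len(X_new)
```

The identity of the probe sequence is then the person P with the smallest returned score.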
Then, we compute how close the embedding Y^P is to the face manifold ξ_P using:

    D_P = (1/L) Σ_{i=1}^{L} || Y_i^P − ξ_j^{P(i)} ||    (6)

where L is the length of the target face sequence, Y_i^P is the embedding of the point X_i in the low-dimensional space, and ξ_j^{P(i)} is the closest point (in terms of Euclidean distance) from the manifold ξ_P to Y_i^P. Finally, the identity of the person in the target face sequence is given by argmin_P D_P.

4 Experimental Analysis

For experimental analysis, we considered the VidTIMIT [15] face video database, containing 43 talking subjects (19 female and 24 male) reciting ten short sentences in three sessions, with an average delay of a week between sessions, allowing for appearance and mood changes. In total, there are ten face sequences per person. From each sequence, we automatically detected the eye positions in the first frame. The detected eye positions were then used to crop the facial area in the whole sequence, yielding face images that are not well aligned. Finally, we scaled the resulting images to four different resolutions: 20 × 20, 40 × 40, 60 × 60 and 80 × 80 pixels. Examples of face images from some sequences are shown in Fig. 3.

Fig. 2. Examples of embedding results of 3 sequences of the subjects shown in Fig. 3

Fig. 3. Examples of facial images extracted from videos of three different subjects

For evaluation, we randomly selected one face sequence per person for training, while the rest were used for testing. In all our experiments, we report the average recognition rates over 100 random permutations. For a comparative study, we also implemented some state-of-the-art methods, including three still image-based methods (PCA, LDA and LBP [16]) and two spatiotemporal approaches (HMM [2] and ARMA [1]). For the still image-based analysis, we adopted the scheme proposed in [4] to perform appearance-based face recognition from videos. The approach consists of performing unsupervised learning to extract a set of the K most representative samples (or exemplars) from the raw gallery videos (K = 3 in our experiments). Once these exemplars are extracted, we build a view-based system and use a probabilistic voting strategy to recognize the individuals in the probe video sequences.

The performance of our proposed approach, as well as that of the considered methods, under the four different resolutions is plotted in Fig. 4. From the results, we notice that all the methods perform quite well, but the proposed manifold-based approach significantly outperforms all other methods in all image resolution configurations. For instance, at an image resolution of 60 × 60, our approach yielded a recognition rate of 99.8%, while PCA, LDA, LBP, HMM and ARMA yielded recognition rates of 94.2%, 94.0%, 97.6%, 92.9% and 95.8%, respectively. It is worth noting that, in addition to its efficiency, our approach involves only two free parameters, which are quite easy to determine [7]. From the results, we also notice that the spatiotemporal methods (HMM and ARMA) do not always perform better than the PCA, LDA and LBP based methods. This supports the conclusions of other researchers indicating that using spatiotemporal representations does not systematically enhance the recognition performance. Our results also show that low image resolutions affect all methods, and the best results using the proposed manifold-based approach are obtained at an image resolution of 60 × 60 pixels.

Table 1. The performance of different methods using an image resolution of 60 × 60 pixels

Method              Recognition rate
PCA                 94.2 %
LDA                 94.0 %
LBP [16]            97.6 %
HMM [2]             92.9 %
ARMA [1]            95.8 %
Manifold Learning   99.8 %

Fig. 4. Performance of the considered methods under four different resolutions

5 Conclusion

To overcome the limitations of traditional video-based face recognition methods, we introduced a novel video-to-video matching approach based on manifold learning. Our approach consists of first learning the hidden low-dimensional manifold of each individual. A target face sequence is then projected into each manifold for classification. The closest manifold determines the identity of the person in the target face video sequence. Experiments on a large set of talking faces under different resolutions showed excellent results, outperforming state-of-the-art approaches. Our future work consists of extending our approach to multi-view face recognition from videos and experimenting with much larger databases.

References

1. Aggarwal, G., Chowdhury, A.R., Chellappa, R.: A system identification approach for video-based face recognition. In: 17th ICPR. Volume 4. (2004) 175-178
2. Liu, X., Chen, T.: Video-based face recognition using adaptive hidden Markov models. In: IEEE Int. Conf. on CVPR. (2003) 340-345
3. Lee, K.C., Ho, J., Yang, M.H., Kriegman, D.: Video-based face recognition using probabilistic appearance manifolds. In: IEEE Int. Conf. on CVPR. (2003) 313-320
4. Hadid, A., Pietikäinen, M.: Selecting models from videos for appearance-based face recognition. In: 17th ICPR. (2004) 304-308
5. Heisele, B., Ho, P., Wu, J., Poggio, T.: Face recognition: Component-based versus global approaches. CVIU 91(1-2) (2003) 6-21
6. Hadid, A., Pietikäinen, M.: An experimental investigation about the integration of facial dynamics in video-based face recognition. ELCVIA 5(1) (2005) 1-13
7. Anonymous
8. Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290(5500) (2000) 2268-2269
9. Kohonen, T.: Self-Organizing Maps. Springer-Verlag, Berlin (1997)
10. Bishop, C.M., Svensen, M., Williams, C.K.I.: GTM: The generative topographic mapping. Neural Computation 10(1) (1998) 215-234
11. Sammon, J.: A nonlinear mapping for data structure analysis. IEEE Transactions on Computers 18(5) (1969) 401-409
12. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500) (2000) 2323-2326
13. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500) (2000) 2319-2323
14. Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in NIPS 14. (2002) 585-591
15. Sanderson, C.: Biometric Person Recognition: Face, Speech and Fusion. VDM-Verlag (2008)
16. Ahonen, T., Hadid, A., Pietikäinen, M.: Face description with local binary patterns: Application to face recognition. IEEE TPAMI 28(12) (2006) 2037-2041