Face Recognition in Low-Resolution Videos Using Learning-Based Likelihood Measurement Model

Soma Biswas, Gaurav Aggarwal and Patrick J. Flynn
Department of Computer Science and Engineering, University of Notre Dame
{sbiswas, gaggarwa, flynn}@nd.edu

*This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory (ARL). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of IARPA, the ODNI, the Army Research Laboratory, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Abstract

Low-resolution surveillance videos with uncontrolled pose and illumination present a significant challenge to both face tracking and face recognition algorithms. The considerable appearance difference between the probe videos and the high-resolution, controlled gallery images acquired during enrollment makes the problem even harder. In this paper, we extend the simultaneous tracking and recognition framework [22] to address the problem of matching high-resolution gallery images with surveillance-quality probe videos. We propose a learning-based likelihood measurement model to handle the large appearance and resolution difference between the gallery images and the probe videos. The measurement model consists of a mapping which transforms the gallery and probe features to a space in which their inter-Euclidean distances approximate the distances that would have been obtained had all the descriptors been computed from good-quality frontal images. Experimental results on real surveillance-quality videos and comparisons with related approaches show the effectiveness of the proposed framework.

1. Introduction

The wide range of applications in law enforcement and security has made face recognition (FR) a very important area of research in computer vision and pattern recognition. The ubiquitous use of surveillance cameras for improved security has shifted the focus of face recognition from controlled scenarios to the uncontrolled environments typical of surveillance settings [17]. Typically, the images or videos captured by surveillance systems have non-frontal pose and uncontrolled illumination, in addition to low resolution due to the distance of the subjects from the cameras. On the other hand, good high-resolution images of the subjects may be present in the gallery from enrollment. This presents the challenge of matching gallery and probe images or videos which differ significantly in resolution, pose and illumination. In this paper, we consider the scenario in which the gallery consists of one or more high-resolution frontal images, while the probe consists of low-resolution videos with uncontrolled pose and illumination, as is typical of surveillance systems.

Most of the research in video-based face recognition has focused on dealing with one or more challenges such as uncontrolled pose and illumination [23], but there are very few approaches which deal with all the challenges simultaneously. Some of the recent approaches which handle the resolution difference between the gallery and probe are either restricted to frontal images [6] or require videos for enrollment [2].
For video-based FR, a tracking-then-recognition paradigm is typically followed, in which the faces are first tracked and then used for recognition. But both tracking and recognition are very challenging for low-quality videos with low resolution and significant variations in pose and illumination. In this paper, we extend the simultaneous tracking and recognition framework [22], which performs the two tasks in a single unified framework, to address these challenges. We propose using distance-learning techniques to better model the appearance changes between the frames of the low-resolution probe videos and the high-resolution gallery images, improving both recognition and tracking accuracy. Multidimensional Scaling [4] is used to learn, from training images, a mapping which transforms the gallery and probe features to a space in which their inter-Euclidean distances approximate the distances that would have been obtained had all the descriptors been computed from high-resolution frontal images.

We evaluate the effectiveness of the proposed approach on surveillance-quality videos from the MBGC data [16]. We observe that the proposed approach performs significantly better, in terms of both tracking and recognition accuracy, than standard appearance modeling approaches.

The rest of the paper is organized as follows. An overview of the related approaches is given in Section 2. The details of the proposed approach are provided in Section 3. The results of the experimental evaluation are presented in Section 4. The paper concludes with a brief summary and discussion.

2. Previous Work

In this section, we discuss the related work in the literature. For brevity, we will refer to high-resolution as HR and low-resolution as LR.

There has been a considerable amount of work on general video-based FR addressing two kinds of scenarios: (1) both the gallery and probe are video sequences [11], [13], [10], [18], and (2) the probe videos are compared with one or multiple still images in the gallery [22]. For tracking and recognizing faces in real-world, noisy videos, Kim et al. [10] propose a tracker that adaptively builds a target model reflecting changes in appearance typical of a video setting. In the subsequent recognition phase, the identity of the tracked subject is established by fusing pose-discriminant and person-discriminant features over the duration of a video sequence. Stallkamp et al. [18] classify faces using a local appearance-based FR algorithm for real-time video-based face identification. The confidence scores obtained from each classification are progressively combined to provide an identity estimate for the entire sequence. Many researchers have also addressed the problem of video-based FR by treating the videos as image sets [20].

Most of the current approaches which address the problem of LR still-face recognition follow a super-resolution approach. Given an LR face image, Jia and Gong [8] propose directly computing a maximum-likelihood identity parameter vector in the HR tensor space, which can be used for recognition and for reconstruction of HR face images. Liu et al. [12] propose a two-step statistical modeling approach for hallucinating an HR face image from an LR input. The relationship between the HR images and their corresponding LR images is learned using a global linear model, and the residual high-frequency content is modeled by a patch-based non-parametric Markov network. Several other super-resolution techniques have also been proposed [5], [9]. The main aim of these techniques is to produce a high-resolution image from the low-resolution input using assumptions about the image content, and they are usually not designed from a matching perspective. A Multidimensional Scaling (MDS)-based approach has recently been proposed to improve matching performance for still LR images, but it does not deal with matching an HR gallery image against an LR probe video [3]. Recently, Hennings-Yeomans et al. [6] proposed an approach to perform super-resolution and recognition simultaneously. Using features from the face and super-resolution priors, they extract an HR template that simultaneously fits the super-resolution as well as the face-feature constraints. The formulation was extended to use multiple frames, and the authors showed that it can also be generalized to use multiple image formation processes, modeling different cameras [7].
But this approach assumes that the probe and gallery images are in the same pose, making it not directly applicable to more general scenarios. Arandjelovic and Cipolla [2] propose a generative model for separating the illumination and down-sampling effects for the problem of matching a face in an LR query video sequence against a set of HR gallery sequences. It is an extension of the Generic Shape-Illumination Manifold framework [1], which was used to describe the appearance variations due to the combined effects of facial shape and illumination. As noted in [7], a limitation of this approach is that it requires a video sequence at enrollment.

3. Proposed Approach

For matching LR probe videos having significant pose and illumination variations against HR frontal gallery images, we propose to use learning-based appearance modeling in a simultaneous tracking and recognition framework.

3.1. Simultaneous Tracking and Recognition

First, we briefly describe the tracking and recognition framework [22], which uses a modified version of the CONDENSATION algorithm for tracking the facial features across the frames of the poor-quality probe video and for recognition. The filtering framework consists of a motion model which characterizes the motion of the subject in the video. The overall state vector of this unified tracking and recognition framework consists of an identity variable in addition to the usual motion parameters. The observation model determines the measurement likelihood, i.e., the likelihood of observing the particular measurement given the current state consisting of the motion and identity variables.

Motion Model: The motion model is given by the first-order Markov chain

$$\theta_t = \theta_{t-1} + u_t, \quad t \geq 1 \tag{1}$$

Here affine motion parameters are used, so $\theta = (a_1, a_2, a_3, a_4, t_x, t_y)$, where $\{a_1, a_2, a_3, a_4\}$ are deformation parameters and $\{t_x, t_y\}$ are 2D translation parameters. $u_t$ is the noise in the motion model.
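As a concrete illustration of Eq. (1), the sketch below propagates a set of affine particle states with Gaussian noise; the noise scales are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed noise scales for (a1, a2, a3, a4, tx, ty): small perturbations
# for the deformation terms, larger ones for the translation terms.
MOTION_STD = np.array([0.02, 0.02, 0.02, 0.02, 1.0, 1.0])

def predict(theta_prev):
    """Eq. (1): propagate (J, 6) affine particle states from t-1 to t."""
    u_t = rng.normal(scale=MOTION_STD, size=theta_prev.shape)
    return theta_prev + u_t
```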

Identity Equation: Assuming that the identity does not change as time proceeds, the identity equation is given by

$$n_t = n_{t-1}, \quad t \geq 1 \tag{2}$$

Observation Model: Assuming that the transformed observation is a noise-corrupted version of some still template in the gallery, the observation equation can be written as

$$\mathcal{T}_{\theta_t}\{z_t\} = I_{n_t} + v_t, \quad t \geq 1 \tag{3}$$

where $v_t$ is the observation noise at time $t$ and $\mathcal{T}_{\theta_t}\{z_t\}$ is a transformed version of the observation $z_t$. Here $\mathcal{T}_{\theta_t}\{z_t\}$ is composed of (1) an affine transform of $z_t$ using $\{a_1, a_2, a_3, a_4\}$, (2) cropping the region of interest at position $\{t_x, t_y\}$ with the same size as the still templates, and (3) performing zero-mean-unit-variance normalization.

In this modified version of the CONDENSATION algorithm, random samples are propagated on the motion vector while the samples on the identity variable are kept fixed. Although only the marginal distribution is propagated for motion tracking, the joint distribution is propagated for recognition purposes. For large databases, this results in a considerable improvement in computation over propagating random samples on both the motion vector and the identity variable. The different steps of the simultaneous tracking and recognition framework are given in Figure 1. The mean of the Gaussian-distributed prior comes from the initial detector, while its covariance matrix is manually specified. Please refer to [22] for more details of the algorithm.

Initialize a sample set $S_0 = \{\theta_0^{(j)}\}_{j=1}^{J}$ according to the prior distribution $p(\theta_0 \mid z_0)$, which is assumed to be Gaussian. The particle weights for each subject, $\{w_{0,n}^{(j)}\}_{j=1}^{J}$, $n = 1, \ldots, N$, are initialized to 1. $J$ and $N$ denote the number of particles and subjects, respectively.

1. Predict: sample by drawing $\theta_t^{(j)}$ from the motion state transition probability $p(\theta_t \mid \theta_{t-1}^{(j)})$ and compute the transformed image $\mathcal{T}_{\theta_t^{(j)}}\{z_t\}$ corresponding to the predicted sample.

2. Update: the weights using $\alpha_{t,n}^{(j)} = w_{t-1,n}^{(j)} \, p(z_t \mid n, \theta_t^{(j)})$ (measurement likelihood) for each subject in the gallery. The normalized weights are given by $w_{t,n}^{(j)} = \alpha_{t,n}^{(j)} / \sum_{n=1}^{N} \sum_{j=1}^{J} \alpha_{t,n}^{(j)}$. The measurement likelihood is learned from a set of HR training images (Section 3.3).

3. Resample: particles for all subjects are reweighted to obtain samples with new weights $\tilde{w}_{t,n}^{(j)} = w_{t,n}^{(j)} / w_t^{(j)}$, where the denominator is given by $w_t^{(j)} = \sum_{n=1}^{N} w_{t,n}^{(j)}$. Marginalize over $\theta_t$ to obtain the weights for $n_t$, which yield the probe identity.

Figure 1. Simultaneous tracking and recognition framework [22].
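The following is a minimal sketch of the Figure 1 loop, assuming numpy and OpenCV and reusing the `predict` helper and `MOTION_STD` from the sketch above. Here `likelihood` stands in for whichever measurement model is used (Section 3.2 or 3.3), and the resampling bookkeeping is a loose reading of step 3 rather than the authors' exact implementation.

```python
import cv2
import numpy as np

def transform_observation(frame, theta, tmpl_shape):
    """One plausible reading of the three-step transform T_theta{z} in
    Eq. (3): affine deformation with {a1..a4}, a template-sized crop at
    (tx, ty), and zero-mean, unit-variance normalization."""
    a1, a2, a3, a4, tx, ty = theta
    M = np.float32([[a1, a2, 0.0], [a3, a4, 0.0]])
    warped = cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
    h, w = tmpl_shape
    patch = warped[int(ty):int(ty) + h, int(tx):int(tx) + w].astype(np.float32)
    return (patch - patch.mean()) / (patch.std() + 1e-8)

def track_and_recognize(frames, gallery, theta0, likelihood, J=200):
    """Joint particle weights w[j, n] over motion particles and gallery
    identities, following steps 1-3 of Figure 1."""
    rng = np.random.default_rng(0)
    N = len(gallery)
    tmpl_shape = gallery[0].shape
    theta = theta0 + rng.normal(scale=MOTION_STD, size=(J, 6))  # Gaussian prior
    w = np.ones((J, N)) / (J * N)
    for z in frames:
        theta = predict(theta)                      # step 1: Eq. (1)
        for j in range(J):                          # step 2: update weights
            patch = transform_observation(z, theta[j], tmpl_shape)
            for n in range(N):
                w[j, n] *= likelihood(patch, gallery[n])
        w /= w.sum()                                # joint normalization
        marginal = w.sum(axis=1)                    # step 3: resample motion only
        idx = rng.choice(J, size=J, p=marginal)
        theta = theta[idx]
        w = w[idx] / w[idx].sum(axis=1, keepdims=True) / J
    return int(np.argmax(w.sum(axis=0)))            # marginalize over theta
```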
3.2. Traditional Likelihood Measurement

If there is no significant facial appearance difference between the probe frames and the gallery templates, a simple likelihood measurement like a truncated Laplacian is sufficient [22]. More sophisticated likelihood measurement models, like the probabilistic subspace density approach, are required to handle greater appearance differences between the probe and the gallery [22]. In that approach, the intra-personal variations are learned using the available gallery and one frame of each video sequence. Surveillance videos usually have very poor resolution, in addition to large variations in pose and illumination, which degrades both tracking and recognition performance. Here we propose a multidimensional scaling (MDS)-based approach for computing the measurement likelihood, which models the appearance difference between the gallery and probe better and thereby improves both tracking and recognition.

3.3. Learning-Based Likelihood Measurement

In this work, we use local SIFT features [14] at seven fiducial locations for representing a face (Figure 2). SIFT descriptors are fairly robust to modest variations in pose and resolution, and this kind of representation has been shown to be useful for matching facial images in uncontrolled scenarios. But the large variations in pose, illumination and resolution observed in surveillance-quality videos result in a significant decrease in recognition performance using SIFT descriptors. The MDS-based approach transforms the SIFT descriptors extracted from gallery/probe images to a space in which their inter-Euclidean distances approximate the distances had all the descriptors been computed from HR frontal images. The transformation is learned from a set of HR and corresponding LR training images.

Figure 2. SIFT features at fiducial locations used for representing the face.
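A sketch of this representation, assuming OpenCV's SIFT implementation; concatenating the seven per-landmark descriptors into a single vector is our assumption about how they are combined.

```python
import cv2
import numpy as np

def fiducial_sift(gray, fiducials, patch_size=16.0):
    """128-D SIFT descriptors at fixed fiducial locations, concatenated
    into one face descriptor (7 points -> 896-D). The keypoint diameter
    `patch_size` is an assumed value."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), patch_size) for (x, y) in fiducials]
    _, desc = sift.compute(gray, kps)   # descriptors follow keypoint order
    return desc.reshape(-1)
```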

Let the HR frontal images be denoted by $I^{(h,f)}$ and the LR non-frontal images by $I^{(l,p)}$. The corresponding SIFT-based feature descriptors are denoted by $x^{(h,f)}$ and $x^{(l,p)}$. Let $f : \mathbb{R}^d \rightarrow \mathbb{R}^m$ denote the mapping from the input feature space $\mathbb{R}^d$ to the embedded Euclidean space $\mathbb{R}^m$:

$$f(x; W) = W^T \phi(x) \tag{4}$$

Here $\phi(x)$ can be a linear or non-linear function of the feature vectors, and $W$ is the matrix of weights to be determined. The goal is to simultaneously transform the feature vectors from $I_i^{(h,f)}$ and $I_j^{(l,p)}$ such that the Euclidean distance between the transformed feature vectors approximates $d_{i,j}^{(h,f)}$, the distance that would be obtained if both images were frontal and high-resolution. Thus the objective function to be minimized contains a distance-preserving term $J_{DP}$ which ensures that the distance between the transformed feature vectors approximates $d_{i,j}^{(h,f)}$:

$$J_{DP}(W) = \sum_{i} \sum_{j} \left( q_{ij}(W) - d_{i,j}^{(h,f)} \right)^2 \tag{5}$$

Here $q_{ij}(W)$ is the distance between the transformed feature vectors of the images $I_i^{(h,f)}$ and $I_j^{(l,p)}$. An optional class separability term $J_{CS}$ can also be incorporated in the objective function to further facilitate discriminability:

$$J_{CS}(W) = \sum_{i} \sum_{j} \delta(\omega_i, \omega_j) \, q_{ij}^2(W) \tag{6}$$

This term tries to minimize the distance between feature vectors belonging to the same class [21]. Here $\delta(\omega_i, \omega_j) = 1$ when $\omega_i = \omega_j$ and $0$ otherwise ($\omega_i$ denotes the class label of the $i$-th image). Combining the above two terms, the transformation is obtained by minimizing the objective function

$$J(W) = \lambda J_{DP}(W) + (1 - \lambda) J_{CS}(W) \tag{7}$$

The relative effect of the two terms in the objective function is controlled by the parameter $\lambda$. The iterative majorization algorithm [21] is used to minimize the objective function (7) and solve for the transformation matrix $W$.

To compute the measurement likelihood, the SIFT descriptors of the gallery image and the affine-transformed probe frame are mapped using the learned transformation $W$, followed by computation of the Euclidean distance between the transformed features:

$$p(z_t \mid n_t, \theta_t) = \left\| W^T \left[ \phi(\mathcal{T}_{\theta_t}\{z_t\}) - \phi(x_{n_t}) \right] \right\| \tag{8}$$

Figure 3 shows a flow chart of the proposed learning-based simultaneous tracking and recognition framework.

Figure 3. Flow chart showing the steps of the proposed algorithm.
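A minimal numpy sketch of the objective in Eqs. (5)-(7) and the learned distance of Eq. (8), with $\phi$ set to identity as in our experiments (Section 4.2). SciPy's gradient-free L-BFGS-B is used purely as a stand-in for the iterative majorization solver [21], so this sketch is only practical for small training sets.

```python
import numpy as np
from scipy.optimize import minimize

def objective(W, X_hf, X_lp, D_hf, same_class, lam=0.8):
    """J(W), Eq. (7), with phi = identity. X_hf, X_lp: (n, d) HR-frontal /
    LR-nonfrontal training features; D_hf: (n, n) target distances
    d_ij^(h,f); same_class: (n, n) 0/1 matrix for delta(w_i, w_j)."""
    A, B = X_hf @ W, X_lp @ W                                  # f(x; W)
    Q = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # q_ij(W)
    J_dp = ((Q - D_hf) ** 2).sum()                             # Eq. (5)
    J_cs = (same_class * Q ** 2).sum()                         # Eq. (6)
    return lam * J_dp + (1.0 - lam) * J_cs

def learn_W(X_hf, X_lp, D_hf, same_class, m=100, lam=0.8):
    """Minimize J(W) for a (d, m) transformation matrix."""
    d = X_hf.shape[1]
    W0 = np.random.default_rng(0).normal(scale=0.01, size=(d, m))
    res = minimize(lambda v: objective(v.reshape(d, m), X_hf, X_lp,
                                       D_hf, same_class, lam),
                   W0.ravel(), method="L-BFGS-B")
    return res.x.reshape(d, m)

def learned_distance(W, probe_feat, gallery_feat):
    """Eq. (8) with phi = identity: distance between transformed features."""
    return np.linalg.norm(W.T @ (probe_feat - gallery_feat))
```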

4. Experimental Evaluation

In this section, we discuss the experimental evaluation of the proposed approach in detail.

4.1. Dataset Used

For our experiments, we use 50 surveillance-quality videos (each 40-100 frames) of 50 subjects from the Multiple Biometric Grand Challenge (MBGC) video challenge data [16] as the probe videos. Figure 4 shows some sample frames from a video sequence. Since the MBGC video challenge data does not contain the high-resolution frontal still images needed to form the HR gallery set, we select images of the same subjects from the FRGC data, which has considerable subject overlap with the MBGC data. Figure 5 (top row) shows some sample gallery images from the dataset used, and the bottom row shows cropped face regions from the corresponding probe videos. We see that there is a considerable difference in pose, illumination and resolution between the gallery images and the probe videos.

Figure 4. Example frames from the MBGC video challenge [16].

Figure 5. (Top) Example high-resolution gallery images; (Bottom) cropped facial regions from the corresponding low-resolution probe videos.

4.2. Recognition and Tracking Accuracy

Here we report both the tracking and the recognition performance of the proposed approach. The proposed learning-based likelihood measurement model is compared with the following two approaches for computing the likelihood measurement [22]:

1. Truncated Laplacian likelihood: Here the likelihood measurement model is given by [22]

$$p(z_t \mid n_t, \theta_t) = \mathrm{LAP}\left( \left\| \mathcal{T}_{\theta_t}\{z_t\} - I_{n_t} \right\|; \sigma_1, \tau_1 \right) \tag{9}$$

where $\|\cdot\|$ is the absolute distance and

$$\mathrm{LAP}(x; \sigma, \tau) = \begin{cases} \sigma^{-1} \exp(-x/\sigma) & \text{if } x \leq \tau\sigma \\ \sigma^{-1} \exp(-\tau) & \text{otherwise} \end{cases}$$

2. Probabilistic subspace density based likelihood: To handle significant appearance differences between the facial images in the gallery and probe, Zhou et al. [22] proposed using the probabilistic subspace density approach of Moghaddam [15], due to its computational efficiency and high recognition accuracy. The available gallery and one video frame are used for constructing the intra-personal space (IPS). Using this approach, the measurement likelihood can be written as

$$p(z_t \mid n_t, \theta_t) = \mathrm{PS}\left( \mathcal{T}_{\theta_t}\{z_t\} - I_{n_t} \right) \tag{10}$$

where

$$\mathrm{PS}(x) = \frac{\exp\left( -\frac{1}{2} \sum_{i=1}^{s} y_i^2 / \lambda_i \right)}{(2\pi)^{s/2} \prod_{i=1}^{s} \lambda_i^{1/2}}$$

Here $\{\lambda_i, e_i\}_{i=1}^{s}$ are the top $s$ eigenvalues and corresponding eigenvectors obtained by performing regular Principal Component Analysis [19] on the IPS, and $y_i = e_i^T x$ is the $i$-th principal component of $x$.
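For reference, a direct numpy transcription of the two baseline likelihood models in Eqs. (9) and (10); the parameters $\sigma_1$, $\tau_1$ and the subspace dimension $s$ are left as arguments since their values are not specified here.

```python
import numpy as np

def truncated_laplacian(x, sigma, tau):
    """LAP(x; sigma, tau) of Eq. (9): exponential decay in the distance x,
    floored to a constant once x exceeds tau * sigma."""
    return np.where(x <= tau * sigma,
                    np.exp(-x / sigma) / sigma,
                    np.exp(-tau) / sigma)

def subspace_density(x, eigvals, eigvecs):
    """PS(x) of Eq. (10): Gaussian density in the top-s principal subspace
    of the intra-personal space. eigvals: (s,), eigvecs: (d, s)."""
    y = eigvecs.T @ x                       # y_i = e_i^T x
    num = np.exp(-0.5 * np.sum(y ** 2 / eigvals))
    den = (2.0 * np.pi) ** (len(eigvals) / 2) * np.prod(np.sqrt(eigvals))
    return num / den
```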

We build upon the code provided on the authors' website (http://www.cfar.umd.edu/shaohua/sourcecodes.html). For all experiments, the kernel mapping $\phi$ is set to identity (i.e., $\phi(x) = x$) to highlight just the performance improvement due to the proposed learning approach. Training is done using images from a separate set of 50 subjects. When computing the transformation matrix using the iterative majorization algorithm, we observe that the objective function decreases until around 20 iterations and then stabilizes. The value of the parameter $\lambda$ is set to 0.8 and the output dimension $m$ is set to 100. The number of particles in the particle-filtering framework is taken to be 200.

The recognition performance of the proposed approach is shown in Table 1, together with comparisons against the two other likelihood models. Each of the three approaches labels each video as belonging to one of the subjects in the gallery, and the recognition rate is calculated as the percentage of correctly labeled videos. We see that the recognition performance of the proposed learning-based simultaneous tracking and recognition framework is considerably better than that of the other approaches, due to better modeling of the appearance difference between the gallery and the probe images.

Method                                       Rank-1 Recog. Accuracy   Tracking Accuracy
Truncated Laplacian likelihood               24%                      4.8
Probabilistic subspace density likelihood    40%                      5.8
Proposed approach                            68%                      2.8

Table 1. Rank-1 recognition accuracy and tracking accuracy (pixels/frame) using the proposed approach. Comparisons with the other approaches are also provided.

To compute the tracking error, we manually marked three fiducial locations (the centers of the two eyes and the bottom of the nose) in every fifth frame of each video. For each probe video, we measured the difference between the manually marked ground-truth locations and the locations given by the tracker; the tracking error is the average difference in the fiducial locations over all annotated frames (see the sketch after this subsection). Figure 6 shows the tracking results for a few frames of a probe video using the proposed approach. Figure 7 shows the tracking error for the proposed approach, the truncated Laplacian-based likelihood and the probabilistic subspace density-based likelihood model. For 49 out of 50 videos, the proposed approach achieves a lower tracking error than the other approaches. The mean tracking error (in pixels) over all the probe videos for each approach is shown in Table 1.

Figure 6. A few frames showing the tracking results obtained using the proposed approach. Only the region of the frames containing the person is shown, for better visualization.

Figure 7. Average tracking accuracy of the proposed learning-based approach. Comparisons with the other approaches are also provided.
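A small sketch of this error metric; the array layout is an assumption.

```python
import numpy as np

def tracking_error(tracked, ground_truth):
    """Mean Euclidean distance (pixels) between tracked and manually marked
    fiducials. Assumed layout: (num_annotated_frames, 3, 2) arrays of
    (x, y) for the two eye centers and the bottom of the nose."""
    per_fiducial = np.linalg.norm(tracked - ground_truth, axis=2)
    return float(per_fiducial.mean())
```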
5. Summary and Discussion

In this paper, we consider the problem of matching faces in low-resolution surveillance videos against good high-resolution images in the gallery. Tracking and recognizing faces in low-resolution videos with considerable variations in pose, illumination, expression, etc. is a very challenging problem. Performing tracking and recognition simultaneously in a unified framework, as opposed to first performing tracking and then recognition, has been shown to improve both tracking and recognition performance. But simple likelihood measurement models like the truncated Laplacian, IPS, etc. fail to give satisfactory performance when there is a significant difference between the appearance of the gallery images and the faces in the probe videos. In this paper, we propose using a learning-based likelihood measurement model to improve both the recognition and the tracking accuracy for surveillance-quality videos. In the training stage, a transformation is learned to simultaneously transform the features from the poor-quality probe images and the high-quality gallery images in such a manner that the distances between them approximate the distances that would be obtained had the probe videos been captured under the same conditions as the gallery images. In the testing stage, the learned transformation matrix is used to transform the features from the gallery images and the different particles, to compute the likelihood of each particle in the modified particle-filtering framework. Experiments on surveillance-quality videos show the usefulness of the proposed approach.

References

[1] O. Arandjelovic and R. Cipolla. Face recognition from video using the generic shape-illumination manifold. In European Conf. on Computer Vision, pages 27-40, 2006.
[2] O. Arandjelovic and R. Cipolla. A manifold approach to face recognition from low quality video across illumination and pose using implicit super-resolution. In IEEE International Conf. on Computer Vision, 2007.
[3] S. Biswas, K. W. Bowyer, and P. J. Flynn. Multidimensional scaling for matching low-resolution facial images. In IEEE International Conf. on Biometrics: Theory, Applications and Systems, 2010.
[4] I. Borg and P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, second edition, New York, NY, 2005.
[5] B. Gunturk, A. Batur, Y. Altunbasak, M. Hayes, and R. Mersereau. Eigenface-domain super-resolution for face recognition. IEEE Trans. on Image Processing, 12(5):597-606, May 2003.
[6] P. Hennings-Yeomans, S. Baker, and B. Kumar. Simultaneous super-resolution and feature extraction for recognition of low-resolution faces. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 1-8, 2008.

[7] P. Hennings-Yeomans, B. Kumar, and S. Baker. Recognition of low-resolution faces using multiple still images and multiple cameras. In IEEE International Conf. on Biometrics: Theory, Applications and Systems, pages 1-6, 2008.
[8] K. Jia and S. Gong. Multi-modal tensor face for simultaneous super-resolution and recognition. In IEEE International Conf. on Computer Vision, pages 1683-1690, 2005.
[9] K. Jia and S. Gong. Generalized face super-resolution. IEEE Trans. on Image Processing, 17(6):873-886, June 2008.
[10] M. Kim, S. Kumar, V. Pavlovic, and H. Rowley. Face tracking and recognition with visual constraints in real-world videos. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 1-8, 2008.
[11] K. C. Lee, J. Ho, M. H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 313-320, 2003.
[12] C. Liu, H. Y. Shum, and W. T. Freeman. Face hallucination: Theory and practice. International Journal of Computer Vision, 75(1):115-134, 2007.
[13] X. Liu and T. Chen. Video-based face recognition using adaptive hidden Markov models. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 340-345, 2003.
[14] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[15] B. Moghaddam. Principal manifolds and probabilistic subspaces for visual recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(6):780-788, June 2002.
[16] P. J. Phillips, P. J. Flynn, J. R. Beveridge, W. T. Scruggs, A. J. O'Toole, D. S. Bolme, K. W. Bowyer, B. A. Draper, G. H. Givens, Y. M. Lui, H. Sahibzada, J. A. Scallan, and S. Weimer. Overview of the multiple biometrics grand challenge. In International Conference on Biometrics, pages 705-714, 2009.
[17] P. J. Phillips, W. T. Scruggs, A. J. O'Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, and M. Sharpe. FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(5):831-846, 2010.
[18] J. Stallkamp, H. K. Ekenel, and R. Stiefelhagen. Video-based face recognition on real-world data. In IEEE International Conf. on Computer Vision, 2007.
[19] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
[20] R. Wang, S. Shan, X. Chen, and W. Gao. Manifold-manifold distance with application to face recognition based on image set. In IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
[21] A. Webb. Multidimensional scaling by iterative majorization using radial basis functions. Pattern Recognition, 28(5):753-759, May 1995.
[22] S. K. Zhou, V. Krueger, and R. Chellappa. Probabilistic recognition of human faces from video. Computer Vision and Image Understanding, 91:214-245, 2003.
[23] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399-458, 2003.