Facial Recognition Using Active Shape Models, Local Patches and Support Vector Machines


Utsav Prabhu, ECE Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, uprabhu@andrew.cmu.edu
Keshav Seshadri, ECE Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, kseshadr@andrew.cmu.edu

Abstract

In this paper we propose an improved method for facial recognition of frontal faces using local patches around well-defined facial landmarks. Our method aims to rectify the problems of illumination variation and in-plane rotation of faces by using only specific discriminative areas of a face, thus making it more robust. 79 landmarks are automatically fitted onto all faces in our training and test sets using a pre-trained Active Shape Model. Local patches of fixed dimension are built around the most discriminative and accurately fitted landmarks and then used to obtain features. These features are used to differentiate one class from another using a Support Vector Machine as the classifier in a one-against-the-rest form. We evaluate our scheme on random training and test sets drawn from two different databases (NIST Multiple Biometric Grand Challenge-2008 (MBGC-2008) and CMU Multi-PIE) and show that our method is capable of good recognition rates.

1 Introduction

Facial recognition schemes are becoming increasingly accurate; however, the combined effect of illumination changes, pose variations and in-plane rotations of subjects is known to degrade the accuracy of several schemes. Our focus is on illumination and in-plane effects; we do not address the problem posed by pose variations in this paper. Several solutions have been proposed to deal with the problem of illumination. Such schemes include de-illumination and re-illumination of faces in the image domain as described in [1], illumination normalization using histogram equalization [2] and the use of near-infrared images [3].
All of the above schemes achieve good results but focus mainly on compensating for illumination effects rather than using an approach that is inherently robust to them. It has been shown that a local approach to face recognition is more robust to illumination effects than a global approach [4], [5]. It is for this reason that we focus on features extracted from small two-dimensional (2D) regions around selected facial landmarks for our recognition algorithm. A modified Active Shape Model (ASM) [6] is used to determine the locations of 79 landmark points across all faces in our training and test databases. Local patches are isolated around each landmark and used to build features unique to each class using a combination of Gabor filter banks and Principal Component Analysis (PCA). These features are used to train a Support Vector Machine (SVM), which serves as our classifier. Such a scheme harnesses far more information from every facial image than global approaches, which utilize pixel intensities of the image as a whole and can thus suffer from background noise, in-plane rotations, etc. Similar local approaches have been followed in [4] and [5]. [4] uses Active Shape Models to find landmarks of interest on a face and then compares the facial shape of a test image with those in a training database to classify the test image. [5] extracts facial feature regions and uses an SVM (in a one-against-the-rest form) for classification using these extracted features.

To evaluate our algorithm, we report the identification rates it achieves on images drawn from the NIST Multiple Biometric Grand Challenge-2008 (MBGC-2008) database [7] as well as images drawn from the CMU Multi-PIE database [8]. Both databases are quite challenging and contain images with illumination and in-plane rotation effects.

The rest of this paper is organized as follows. Section 2 describes the algorithms we use in our implementation. Section 3 describes the results of our experiments, while Section 4 presents our conclusions and a description of future work.

2 Component tools used by our method

This section describes the existing tools that serve as components of our overall facial recognition method and explains why they are suitable for our use.

2.1 Active shape models

Active Shape Models (ASMs) aim to automatically locate landmark points that define the shape of any statistically modeled object in an image. When modeling faces, the landmark points of interest lie along the shape boundaries of facial features such as the eyes, lips, nose, mouth and eyebrows. The training stage of an ASM involves building a statistical facial model from a training set of images with manually annotated landmarks. The landmarking scheme we use consists of 79 facial points, as shown in Figure 1. Our training set comprised 500 images of 115 subjects from the query set of the still face challenge problem of the MBGC-2008 database. The shapes in the training set are aligned with each other using Generalized Procrustes Analysis (GPA) [9] and then used to generate a mean shape of a typical face. Subsequently, statistical models of the grey level intensities of the region around each landmark are built using 2D profiles, which are generated by sampling the image in a square region around each landmark.
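The GPA alignment step used to build the mean shape can be sketched as follows. This is a minimal numpy illustration under our own assumptions (function and parameter names are ours, not the authors'): each shape is an (L, 2) array of landmark coordinates, and alignment removes translation, scale and rotation before averaging.

```python
import numpy as np

def align_shapes_gpa(shapes, n_iters=10):
    """Illustrative Generalized Procrustes Analysis: iteratively aligns a
    list of (L, 2) landmark shapes to a running mean and returns the mean
    shape (centered at the origin, unit Frobenius norm)."""
    # Start from the first shape, centered and scale-normalized.
    mean = shapes[0] - shapes[0].mean(axis=0)
    mean /= np.linalg.norm(mean)
    for _ in range(n_iters):
        aligned = []
        for s in shapes:
            s = s - s.mean(axis=0)       # remove translation
            s = s / np.linalg.norm(s)    # remove scale
            # Optimal rotation onto the mean (orthogonal Procrustes via SVD).
            u, _, vt = np.linalg.svd(s.T @ mean)
            r = u @ vt
            if np.linalg.det(r) < 0:     # ensure a proper rotation, no reflection
                u[:, -1] *= -1
                r = u @ vt
            aligned.append(s @ r)
        new_mean = np.mean(aligned, axis=0)
        new_mean /= np.linalg.norm(new_mean)
        if np.allclose(new_mean, mean, atol=1e-8):
            break
        mean = new_mean
    return mean
```

In practice the iteration converges in a handful of passes; the resulting mean shape is what gets scaled, rotated and translated onto a detected face at test time.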
Such profiles are generated for each landmark point in each image and for four different levels in an image pyramid.

At the testing stage, the OpenCV implementation of the Viola-Jones face detector [10] is used to locate the face in an image. Once the face has been detected, the mean face is scaled, rotated and translated using a similarity transform to roughly fit on top of the face in the test image. Multi-level profiles are constructed for the image in the same way as they were at the training stage. Landmarks are repeatedly moved to the locations whose profiles best match the mean profile for that landmark until there is no significant change in their positions between two successive iterations. This process continues until convergence is declared at the finest level of the pyramid, at which stage the final landmark coordinates are obtained. Figure 2 illustrates the process of ASM fitting on an unseen test image.

Figure 1: Landmarking scheme used in our ASM implementation

Figure 2: Steps involved in ASM at the test stage (detect the face in the test image and align the mean face over it, then perform multi-level profiling over pyramid levels 3 down to 0 to determine the best location for each landmark, yielding the final landmark coordinates)

2.2 Gabor filters for texture analysis

The texture around particular areas of the face image provides sufficient information to construct a robust face recognition engine. This places considerable emphasis on the formalization and evaluation of the texture of the image patches, a task carried out using Gabor filter banks. Gabor filters are band-pass filters that can be tuned in frequency, orientation and bandwidth. The filter takes the form:

g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²) / (2σ²)) cos(2πx′/λ + ψ)   (1)

where

x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ

Hence, a Gabor filter is simply the product of a Gaussian kernel and a cosine wave. In (1), λ and ψ represent the wavelength and phase of the underlying cosine wave, σ and γ represent the standard deviation and spatial aspect ratio of the Gaussian kernel, and θ represents the orientation of the normal to the function. Gabor filters have been found to be both efficient and versatile in implementation. Consequently, they are widely used in computer vision to identify and differentiate textures in images, usually

in the form of a Gabor filter bank consisting of many Gabor filters tuned in different ways. For our experiments, we generate a Gabor filter bank consisting of 384 different Gabor filters, using 8 orientations, 4 frequencies, 3 scales and 2 spatial aspect ratios. Each combination of these values yields 2 Gabor filters: one even-symmetric and one odd-symmetric. A subset of the Gabor filters used in our experiments is shown in Figure 3. Each local patch extracted from the image is then filtered with these 384 filters, producing a 384-dimensional feature vector for each patch, as shown in Figure 3. To reduce the length of the feature vector describing each image, we use a PCA-based approach similar to the one proposed in [11].

Figure 3: (a) Some of the Gabor filters used in our filter bank (b) Filtering operation on a patch around a landmark (landmark 20)

2.3 SVMs for multi-class problems

SVMs are predominantly used for binary class problems; however, their use can be extended to multi-class problems by two approaches. The first is the one-against-the-rest approach, in which M SVMs are built for M classes (one for each class) by treating images from one class as positive samples and images from all remaining classes as negative samples. The second is the pairwise approach, in which M(M−1)/2 SVMs are built, one to separate each pair of classes. Both approaches have been found to produce approximately similar results when dealing with person recognition [12], so we prefer the one-against-the-rest method as it requires the training of far fewer classifiers than the pairwise approach. In the one-against-the-rest approach, the class label y of a test sample x is assigned as follows:

y = n if d_n(x) > 0   (2)

Figure 4: ASM fitting on some images from the MBGC dataset

where d_n(x) = max{d_i(x), i = 1, ..., N} and d_i(x) is the distance of x from the i-th hyperplane (built for the i-th class). The larger the value of d_i(x), the more reliable the classification result; hence we choose the final label of a test sample as the class whose SVM model maximizes this distance.

3 Experimental results

Our first experiment benchmarked our implementation against a global PCA scheme when trained and tested on a subset of images from the still query set (consisting of 10,687 frontal images of 570 subjects) and the still target set (consisting of 24,042 frontal images of 466 subjects) of the MBGC-2008 database. Our training set consisted of 129 classes with 20 images per class, while the test set consisted of 94 classes with 5 images per class. Images in this dataset were of size 407 × 527, while the faces in the images were typically of size 300 × 400. The ASM was run on all these images to obtain the required 79 facial landmarks. Sample ASM fitting results on the MBGC dataset are shown in Figure 4.

The first method for facial recognition was a global PCA scheme in which the facial region was cropped in all images and resized to 300 × 300. For the training set, the entire facial region was used as a feature vector and PCA was used to reduce its dimensionality by projecting onto the eigenvectors corresponding to eigenvalues that modeled 97% of the feature variance. Each training image was then represented by 273 PCA coefficients. By projecting onto the eigenvectors built during training, each test image was also represented by the same number of coefficients.

Our implementation, on the same training and test set, first isolated 25 × 25 local patches around 64 landmarks (numbered 16 to 79 in Figure 1).
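The patch-based feature extraction used here can be sketched as follows, assuming the Gabor kernel of equation (1). The parameter grid is hypothetical; the paper does not list its exact wavelengths, scales or aspect ratios, so the values below are chosen only to reproduce the 8 × 4 × 3 × 2 × 2 = 384 filter count.

```python
import numpy as np

def gabor_kernel(lam, theta, psi, sigma, gamma, size=25):
    """Gabor filter of equation (1): a Gaussian envelope times a cosine wave,
    sampled on a size x size grid centered at the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) + y * np.sin(theta)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xp / lam + psi))

def build_bank():
    """Hypothetical 384-filter bank: 8 orientations x 4 wavelengths x
    3 scales x 2 aspect ratios, each as an even- and odd-symmetric pair
    (psi = 0 and pi/2)."""
    bank = []
    for theta in np.arange(0, np.pi, np.pi / 8):      # 8 orientations
        for lam in (4.0, 6.0, 8.0, 12.0):             # 4 wavelengths (frequencies)
            for sigma in (2.0, 4.0, 6.0):             # 3 scales
                for gamma in (0.5, 1.0):              # 2 spatial aspect ratios
                    for psi in (0.0, np.pi / 2):      # even / odd symmetric
                        bank.append(gabor_kernel(lam, theta, psi, sigma, gamma))
    return bank                                        # 8*4*3*2*2 = 384 filters

def patch_features(patch, bank):
    """One response per filter (here simply the magnitude of the inner
    product of the 25x25 patch with each 25x25 kernel), giving a
    384-dimensional feature vector per patch."""
    return np.array([np.abs(np.sum(patch * k)) for k in bank])
```

Concatenating such vectors over all 64 patches gives the 384 × 64-length per-image descriptor that PCA then compresses.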
We neglected landmarks along the facial edge, as patches isolated around such landmarks would contain regions outside the face. The Gabor filter bank of 384 filters was then applied to each patch, so that each image was represented by a vector of length 384 × 64. PCA (modeling 97% of the variance) was again used for dimensionality reduction as well as for classification, after which each image was represented by 179 PCA coefficients. Identification rates and ROC curves were obtained for both methods after computing a similarity matrix (based on the cosine distance between feature vectors) for all the training and test images. Both these schemes thus used PCA for dimensionality reduction as well as for classification.

In order to improve the identification rates, an SVM was next used as the classifier. For our implementation we used the SVM Multi-Class library [13]. An SVM model (using a linear kernel) was built in a one-against-the-rest form, first for the global PCA coefficients and next for the PCA coefficients obtained from the filtered local patches. The ID rates obtained using these two methods on

the same test set were computed to compare against the earlier two methods, which did not involve the use of an SVM.

Figure 5: Session 1 of the CMU Multi-PIE database (15 views, from left profile through frontal to right profile; 20 illuminations; 2 expressions; 249 subjects)

Figure 6: ROC curves (a) MBGC dataset (b) MPIE dataset

The same schemes were compared in a second experiment carried out on a training set consisting of 249 classes with 30 images per class and a test set consisting of 249 classes with 10 images per class, drawn from the frontal view (view 8 in Figure 5) set of session 1 of the CMU Multi-PIE database. This set contains 149,400 images, each of size 640 × 480 (with the face in the image approximately of size 250 × 300), of 249 different subjects across 15 views, 20 illuminations and 2 expressions (neutral and smiling), as shown in Figure 5. For this dataset, PCA (when used alone) reduced each image to a representation consisting of 238 coefficients, while when used with the Gabor responses for each patch it produced 211 coefficients per image.

Table 1 shows the identification rates obtained for the global PCA scheme, the method combining Gabor responses to local patches with PCA, an SVM as classifier for the global PCA coefficients and an SVM as classifier for the local patch PCA coefficients, on both the MBGC and MPIE datasets, while Figure 6 compares the ROC curves obtained for the first two methods in each experiment.
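The one-against-the-rest labeling rule of equation (2), as used by the SVM-based schemes above, can be sketched as follows. The weight matrix and biases stand in for trained per-class linear SVMs and are illustrative only.

```python
import numpy as np

def one_vs_rest_label(x, W, b):
    """Equation (2): assign the class whose hyperplane gives the largest
    signed distance d_i(x) = (w_i . x + b_i) / ||w_i||.
    W is an (N_classes, D) matrix of weight vectors, b the (N_classes,) biases.
    Returns the winning class index and its distance."""
    d = (W @ x + b) / np.linalg.norm(W, axis=1)
    return int(np.argmax(d)), float(np.max(d))
```

A larger winning distance indicates a more reliable classification, which is why the maximizing class is chosen as the final label.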

Table 1: Identification rates (in %) obtained by the four methods on the MBGC and MPIE datasets

Method used                             MBGC dataset    MPIE dataset
Global PCA                              21.70           33.82
PCA + filtered local patches            55.32           53.49
SVM + global PCA                        71.70           74.45
SVM + PCA + filtered local patches      79.36           98.19

It is clear that a global PCA scheme obtains extremely poor results on both datasets. Performance improves with the use of Gabor responses to local patches together with PCA as the classifier. However, the best results were obtained when an SVM was used as the classifier. This is expected, since PCA seldom functions as a good classifier and its role should be dimensionality reduction. As Table 1 shows, the use of an SVM significantly improves the performance of both the global PCA scheme and the local patches scheme. What is key, though, is that our local approach outperforms the global approach both with and without the use of an SVM. This indicates that the idea of using local patches and selecting features by applying Gabor filters is sound, and that with the correct choice of Gabor filters and SVM parameters our method can do extremely well on challenging databases.

4 Conclusions and future work

We have proposed a method of facial recognition that deals with illumination changes and in-plane rotations by using a local approach as opposed to a global one. Features are obtained by applying a Gabor filter bank to 2D patches around specific facial landmarks fitted using an accurate Active Shape Model. The subsequent use of PCA (for dimensionality reduction) and SVMs (for classification) has been shown to perform quite well on two challenging datasets. Our implementation has been benchmarked against a global PCA-based scheme and obtains far superior results. The approach underlying our implementation has been shown to be quite sound; however, there is still scope for improved performance.
We have not yet looked into optimizing the Gabor filters we use for extracting features, nor have we completed a study of the best SVM parameters for building the best classifier. Future work will address the aforementioned areas as well as the possibility of using Gabor jets for feature extraction instead of a Gabor filter bank. Another area worth investigating is the performance enhancement that can be gained by weighting the features obtained from certain landmarks over others. For example, the ASM we use tends to fit eye and nose coordinates better than others, and hence the features obtained from patches around these landmarks could be given more weight.

References

[1] Brendan Moore, Marshall Tappen and Hassan Foroosh, Learning Face Appearance Under Different Lighting Conditions, Proceedings of the 3rd IEEE International Conference on Biometrics: Theory, Applications and Systems, September 2008.

[2] Saleh Aly, Alaa Sagheer, Naoyuki Tsuruta and Rin-ichiro Taniguchi, Face Recognition across Illumination, The 12th International Symposium on Artificial Life and Robotics, January 2007.

[3] Stan Z. Li, RuFeng Chu, ShengCai Liao and Lun Zhang, Illumination Invariant Face Recognition Using Near-Infrared Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 627-639, April 2007.

[4] A. Faro, D. Giordano and C. Spampinato, An Automated Tool for Face Recognition using Visual Attention and Active Shape Models Analysis, Proceedings of the 28th IEEE EMBS Annual International Conference, September 2006.

[5] Bernd Heisele, Purdy Ho and Tomaso Poggio, Face Recognition: Component-based versus Global Approaches, Computer Vision and Image Understanding, vol. 91, pp. 6-21, August 2003.

[6] Keshav Seshadri and Marios Savvides, Robust Modified Active Shape Model for Automatic Facial Landmark Annotation of Frontal Faces, Proceedings of the 3rd IEEE International Conference on Biometrics: Theory, Applications and Systems, September 2009.

[7] P. Jonathon Phillips, Patrick J. Flynn, J. Ross Beveridge, W. Todd Scruggs, Alice J. O'Toole, David Bolme, Kevin W. Bowyer, Bruce A. Draper, Geof H. Givens, Yui Man Lui, Hassan Sahibzada, Joseph A. Scallan III and Samuel Weimer, Overview of the Multiple Biometrics Grand Challenge, Proceedings of the 3rd IAPR/IEEE International Conference on Biometrics, June 2009.

[8] R. Gross, I. Matthews, J. Cohn, T. Kanade and S. Baker, Multi-PIE, Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition, September 2008.

[9] J. C. Gower, Generalized Procrustes Analysis, Psychometrika, vol. 40, no. 1, pp. 33-51, March 1975.

[10] Intel: Open Source Computer Vision Library, Intel, 2007.

[11] C. Liu and H. Wechsler, Gabor Feature Based Classification Using the Enhanced Fisher Linear Discriminant Model for Face Recognition, IEEE Transactions on Image Processing, vol. 11, no. 4, pp. 467-476, April 2002.

[12] C. Nakajima, M. Pontil and T. Poggio, People Recognition and Pose Estimation in Image Sequences, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, vol. 4, pp. 4189-4195, July 2000.

[13] SVM-multiclass: Multi-Class Support Vector Machine, http://svmlight.joachims.org/svm_multiclass.html.