Head Frontal-View Identification Using Extended LLE

Similar documents
Non-linear dimension reduction

Robust Pose Estimation using the SwissRanger SR-3000 Camera

SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM. Olga Kouropteva, Oleg Okun and Matti Pietikäinen

Selecting Models from Videos for Appearance-Based Face Recognition

Combining Gabor Features: Summing vs.voting in Human Face Recognition *

Manifold Learning for Video-to-Video Face Recognition

Learning based face hallucination techniques: A survey

Appearance Manifold of Facial Expression

Technical Report. Title: Manifold learning and Random Projections for multi-view object recognition

Learning a Manifold as an Atlas Supplementary Material

The Analysis of Parameters t and k of LPP on Several Famous Face Databases

Dimension Reduction CS534

Dimension Reduction of Image Manifolds

School of Computer and Communication, Lanzhou University of Technology, Gansu, Lanzhou,730050,P.R. China

Large-Scale Face Manifold Learning

Recognizing Handwritten Digits Using the LLE Algorithm with Back Propagation

Enhanced Active Shape Models with Global Texture Constraints for Image Analysis

Evgeny Maksakov Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages:

Face Hallucination Based on Eigentransformation Learning

Face Recognition using Tensor Analysis. Prahlad R. Enuganti

Locality Preserving Projections (LPP) Abstract

FACE RECOGNITION FROM A SINGLE SAMPLE USING RLOG FILTER AND MANIFOLD ANALYSIS

Locality Preserving Projections (LPP) Abstract

Recognition: Face Recognition. Linda Shapiro EE/CSE 576

Visual Learning and Recognition of 3D Objects from Appearance

Data fusion and multi-cue data matching using diffusion maps

Remote Sensing Data Classification Using Combined Spectral and Spatial Local Linear Embedding (CSSLE)

Sparse Manifold Clustering and Embedding

DISTANCE MAPS: A ROBUST ILLUMINATION PREPROCESSING FOR ACTIVE APPEARANCE MODELS

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Challenges motivating deep learning. Sargur N. Srihari

The Isometric Self-Organizing Map for 3D Hand Pose Estimation

Image Similarities for Learning Video Manifolds. Selen Atasoy MICCAI 2011 Tutorial

Abstract. 1. Introduction. 2. From Distances to Affine Transformation

Towards Multi-scale Heat Kernel Signatures for Point Cloud Models of Engineering Artifacts

Transferring Colours to Grayscale Images by Locally Linear Embedding

Stacked Denoising Autoencoders for Face Pose Normalization

Classification of Face Images for Gender, Age, Facial Expression, and Identity 1

Hyperspectral image segmentation using spatial-spectral graphs

Dynamic Facial Expression Recognition Using A Bayesian Temporal Manifold Model

Globally and Locally Consistent Unsupervised Projection

ESTIMATING HEAD POSE WITH AN RGBD SENSOR: A COMPARISON OF APPEARANCE-BASED AND POSE-BASED LOCAL SUBSPACE METHODS

Extended Isomap for Pattern Classification

Face Recognition with Weighted Locally Linear Embedding

Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations

A new Graph constructor for Semi-supervised Discriminant Analysis via Group Sparsity

CPSC 695. Geometric Algorithms in Biometrics. Dr. Marina L. Gavrilova

RDRToolbox A package for nonlinear dimension reduction with Isomap and LLE.

The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method

Face Detection and Recognition in an Image Sequence using Eigenedginess

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

AAM Based Facial Feature Tracking with Kinect

Proximal Manifold Learning via Descriptive Neighbourhood Selection

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Generalized Principal Component Analysis CVPR 2007

Separating Style and Content on a Nonlinear Manifold

Robot Manifolds for Direct and Inverse Kinematics Solutions

Announcements. Recognition I. Gradient Space (p,q) What is the reflectance map?

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Static Gesture Recognition with Restricted Boltzmann Machines

Face Recognition using SURF Features and SVM Classifier

Nonlinear Generative Models for Dynamic Shape and Dynamic Appearance

Global versus local methods in nonlinear dimensionality reduction

Linear local tangent space alignment and application to face recognition

Robust and Accurate Detection of Object Orientation and ID without Color Segmentation

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

Face Recognition Based on LDA and Improved Pairwise-Constrained Multiple Metric Learning Method

A Discriminative Non-Linear Manifold Learning Technique for Face Recognition

Differential Structure in non-linear Image Embedding Functions

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Human Identification at a Distance Using Body Shape Information

Local Linear Embedding. Katelyn Stringer ASTR 689 December 1, 2015

A NOVEL APPROACH TO ACCESS CONTROL BASED ON FACE RECOGNITION

Unconstrained Face Recognition using MRF Priors and Manifold Traversing

Symmetry-based Face Pose Estimation from a Single Uncalibrated View

Semi-supervised Data Representation via Affinity Graph Learning

Compressive Sensing for High-Dimensional Data

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

Evaluation of texture features for image segmentation

CS4442/9542b Artificial Intelligence II prof. Olga Veksler

An Automatic Face Recognition System in the Near Infrared Spectrum

Face Alignment Under Various Poses and Expressions

Applications Video Surveillance (On-line or off-line)

Dr. Enrique Cabello Pardos July

Component-based Face Recognition with 3D Morphable Models

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

3 Nonlinear Regression

Gabors Help Supervised Manifold Learning

Segmentation and Grouping

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Misalignment-Robust Face Recognition

Haresh D. Chande #, Zankhana H. Shah *

Local Feature Detectors

Supervised LLE in ICA Space for Facial Expression Recognition

Recognition of Non-symmetric Faces Using Principal Component Analysis

A Supervised Non-linear Dimensionality Reduction Approach for Manifold Learning

Sparsity Preserving Canonical Correlation Analysis

Cross-pose Facial Expression Recognition

Adaptive osculatory rational interpolation for image processing

Transcription:

Head Frontal-View Identification Using Extended LLE Chao Wang Center for Spoken Language Understanding, Oregon Health and Science University Abstract Automatic head frontal-view identification is challenging due to appearance variations caused by pose changes, especially without any training samples. In this paper, we present an unsupervised algorithm for identifying frontal view among multiple facial images under various yaw poses (derived from the same person). Our approach is based on Locally Linear Embedding (LLE), with the assumption that with yaw pose being the only variable, the facial images should lie in a smooth and low dimensional manifold. We horizontally flip the facial images and present two K-nearest neighbor protocols for the original images and the flipped images, respectively. In the proposed extended LLE, for any facial image (original or flipped one), we search (1) the Ko nearest neighbors among the original facial images and (2) the Kf nearest neighbors among the flipped facial images to construct the same neighborhood graph. The extended LLE eliminates the differences (because of background, face position and scale in the whole image and some asymmetry of left-right face) between the original facial image and the flipped facial image at the same yaw pose so that the flipped facial images can be used effectively. Our approach does not need any training samples as prior information. The experimental results show that the frontal view of head can be identified reliably around the lowest point of the pose manifold for multiple facial images, especially the cropped facial images (little background and centered face). 1. Introduction Most literatures [1-2] indicate that frontal-face based recognition performs better than those based on non-frontal faces. Therefore, frontal faces from facial video sequence should be identified before subsequent recognition. Recently, some researchers [3-6] exploited the pose manifold, represented as the underlying geometry structure information of the pose data space, to estimate the head poses including frontal view. BenAbdelkader [3] proposed a taxonomy of methods that correspond to different ways of incorporating the pose angle information at the different stages of supervised manifold learning, such as neighborhood graph construction. Ptucha and A. Savakis [4] used facial feature points rather than the whole face image to construct the manifold for pose estimation. Hu et al. [5] noticed that the data points related to different yaw poses of the same person can construct an ellipse-like circle, which differs in rotation degree, size and center position for different persons. Balasubramanian and Ye [6] presented a biased manifold embedding framework in which the head pose information is used to compute a biased neighborhood of each training sample in the feature space before manifold learning. However, there are three limitations for recent manifold based head pose estimation algorithms. (1) They are all supervised methods, which need a large number of face images under different known poses as training samples to construct a precise pose parameter map. These training samples are often difficult or expensive to collect. (2) Due to the difference of face identity between training samples and testing samples, these algorithms require various preprocessing methods to eliminate the identity information of face image before manifold learning. (3) In order to precisely estimate the head pose, some algorithms use the manually cropped facial images (with very little background and centered face) [3-5]. In this paper, we present an extended Locally Linear Embedding (LLE) [7] for head frontal-view identification in an unsupervised fashion. Many studies [5, 6, 8] indicate that when the pose distributions of the facial images are symmetric (e.g., yaw pose angles varying from -90 o to +90 o in increments of 2 o ), the gradient of the frontal-view position in the geometry of pose manifold should be zero (e.g., lowest point of the pose manifold [8]). In order to construct the symmetric distribution of pose, we horizontally flip the facial images. However, due to the background difference, different face position and scale in the whole image and some asymmetry of left-right face, for the same yaw pose, there is still a big distance between the original facial image and the flipped facial image (e.g., 60 o yaw pose original facial image vs. 60 o yaw pose flipped face image derived from -60 o yaw pose original facial image). In order to address this problem, we present two K-nearest neighbor protocols for the original facial images and the flipped facial images, respectively. For any facial image (original or flipped), we search (1) Ko nearest neighbors among the original facial images and (2) Kf 1

nearest neighbors among the flipped facial images to construct the neighborhood graph of the extended LLE. Different reference facial image corresponds to different variable Ko and Kf, which depends on the pose distributions of the original facial images. 2. The original LLE algorithm As a dimension reduction algorithm, the original LLE constructs a neighborhood preserving mapping while couples across all data points. Therefore, LLE uses overlapping local information to discover global structure. There are three steps of the original LLE [7], described as follows Step 1. Construct Neighborhood Graph : For each data point, if point is one of the nearest neighbors measured as the Euclidean distance,, then connect them with as the edge weight. Step 2. Compute Weight Matrix : For each data point, minimize the cost function (1) by constrained linear fits. Step 3. Construct Low Dimensional Embedding: Compute the vectors best reconstructed by the weights, minimizing the embedding cost (2) by its bottom nonzero eigenvectors. 3. Extended LLE for head frontal-view identification In this section, we first describe why and how to use LLE algorithm for head frontal-view identification. Then, we present the extended LLE with two K-nearest neighbor protocols for original and flipped image sets, respectively. Lastly, we illustrate the algorithm steps of extended LLE. In this paper, we only consider the yaw pose variation. 3.1. Characteristics of manifold geometry using LLE As described in some recent studies [6, 8], when the pose distribution of facial images is symmetric, the position corresponding to frontal view is located in the center of the manifold geometry using LLE. Due to the yaw pose being the single variable, the manifold geometry in a 2D embedding space should be smooth and one-dimensional. As shown in Figure 1 (a), for a symmetric pose distribution of facial images, the one-dimensional manifold geometry is like a parabola shape and the position corresponding to frontal view is in the vertex. Therefore, we can easily identify the frontal view with the manifold geometry in this situation. When the pose distribution of facial images is asymmetric, the one-dimensional manifold is also like a parabola shape but the position corresponding to frontal view deviates from the vertex (see Figure 1 (b)), which is difficult for head frontal-view identification. From the local view, each data point only covers the K nearest neighbors; however, from the global view, all of the data points are coupled across in the manifold learning procedure, which is influenced by the global distribution of the poses. Isomap [9] has the similar characteristics [5, 6], but the manifold geometry using Isomap is not smoother than that using LLE. (a) Figure1: 2D LLE embedding of (a) the facial images with yaw poses varying from -90 o to 90 o in increments of 1 o and (b) the facial images with yaw pose varying from -90 o to 30 o in increments of 1 o ( represents the position corresponding to frontal-view in manifold.) Generally, the facial images are asymmetric in pose distribution. In order to identify the head frontal view, we can first convert the asymmetric distribution into a symmetric distribution. The simplest way is to horizontally flip the original facial images. In Figure 2, for example, we take some original and corresponding flipped facial images Figure2: 2D LLE embedding of original and flipped facial images to learn manifold using LLE. We can see that there is a longer distance between the original facial images and the flipped facial images than the distance among themselves. This is because there are some differences, such as different background, various face (b) 2

position and scale in the whole image and some asymmetry of left-right face, between the original facial image and the flipped facial image at the same yaw pose, which are larger than the pose difference among the K nearest neighbors in the original facial images or the flipped facial images. 3.2. Two K-nearest neighbor protocols In order to eliminate the unnecessary differences (noise) between the original images and the flipped facial images, we present two K-nearest neighbor protocols for the original or flipped set of the facial images, respectively. For any data point (facial image), we search Ko nearest neighbors in original set and Kf nearest neighbors in flipped set. In this paper, the total number of Ko and Kf equals to Kt, which is a constant. However, the variables Ko and Kf can vary according to different reference facial image (data point). For example, there are some original facial images with yaw pose varying from -90 o to 30 o in increments of 20 o ; therefore, the yaw poses of the corresponding flipped facial images vary from -30 o to 90 o in increments of 20 o. As shown in Figure 3, we illustrate the novel neighborhood graph based on the image distance (e.g., the image distance between the flipped facial image with 30 o yaw pose and the original facial image with 10 o yaw pose is 22). If we search the Ko and Ko nearest neighbors of the original facial image with -90 o yaw pose, we find that all of the flipped facial images are far away in image distance. Therefore, for this data point, the variable Kf should be less than Ko (some neighbors in flipped set are abandoned; the same number of neighbors in original set are reconfirmed, which keep Kt constant). Lastly, we describe the rule of abandoning and reconfirming the neighbors in the neighborhood graph. For different data points in the same set (original or flipped), if the same neighbors are searched in the other filed, we only keep the neighbors with the shortest distance (related to the same data point). For those data points which have abandoned neighbors, we need reconfirm the same number of neighbors in the set of the data points. As shown in Figure 3, for the data points with yaw pose -90 o, -70 o, -50 o and -30 o in the original set, they have the same neighbors (-30 o and -10 o ) in the flipped set. We only keep the neighbors related to the data point with yaw pose -30 o (23+22 is the shortest distance among them). In order to Keep Kt (Ko + Kf) constant, we add new neighbors (-30 o and -10 o ) related to the other data points in the original set. 3.3. The extended LLE Incorporating the two K-nearest neighbor protocols into LLE, we propose an extended LLE for head frontal-view identification. We convert the asymmetric distribution of the original facial images into a symmetric distribution by horizontally flipping the original facial images. Then, we Original Flipped -90 o -70 o -50 o -30 o -10 o 10 o 30 o -30 o -10 o -10 o 30 o 50 o 70 o 90 o Original Flipped 90 o 70 o 50 o 30 o 10 o 10 o 30 o 30 o 10 o 10 o 30 o 50 o 70 o 90 o 0 12 17 28 37 43 47 29 37 40 41 49 50 46 12 0 13 27 38 44 49 29 36 40 43 52 53 50 17 13 0 24 36 43 48 26 34 38 42 51 52 49 28 27 24 0 25 34 37 23 22 27 32 42 43 41 37 38 36 25 0 25 32 30 24 14 27 38 40 40 43 44 43 34 25 0 23 35 32 24 22 34 36 37 47 49 48 37 32 23 0 40 35 30 23 26 29 29 29 29 26 23 30 35 40 0 23 32 37 48 49 47 37 36 34 22 24 32 35 23 0 25 34 43 44 43 40 40 38 27 14 24 30 32 25 0 25 36 38 37 41 43 42 32 27 22 23 37 34 25 0 24 27 28 49 52 51 42 38 34 26 48 43 36 24 0 13 17 50 53 52 43 40 36 29 49 44 38 27 13 0 12 46 50 49 41 40 37 29 47 43 37 28 17 12 0 Reconfirmed neighbors Abandoned neighbors Reconfirmed (re-added) neighbors if exist abandoned neighbors Figure 3: Example of the proposed neighborhood graph. The original facial images with yaw pose varying from -90 o to 30 o in increments of 20 o ; the corresponding flipped facial images with yaw pose varying from -30 o to 90 o in increments of 20 o. For any facial image, search the Ko (initially, Ko=2) and Kf (initially, Kf=2) nearest neighbors in original and flipped set, respectively (in the corresponding column). If exist reconfirmed neighbors, the variable Ko and Kf should vary. 3

use the two K-nearest neighbor protocols to eliminate the unnecessary differences between the original facial images and the flipped facial images. Besides, the proposed algorithm does not need any training samples. Similar to original LLE [7], there are also three steps of the proposed extended LLE, described as follows Step 1. Construct Neighborhood Graph : (i). Flip the original facial images horizontally. (ii). For each data point in the original and flipped facial images, perform the two K-nearest neighbor protocols, if point is one of searched neighbors measured as the Euclidean distance,, then connect them with as the edge weight. Step 2. Compute Weight Matrix : For each data point, minimize the cost function Eq. (1) by constrained linear fits. Step 3. Construct Low Dimensional Embedding: Compute the vectors best reconstructed by the weights, minimizing the embedding cost Eq. (2) by its bottom nonzero eigenvectors. Different from original LLE, the proposed extended LLE considers not only the original facial images but also the flipped facial images. We search K neighbors in the original set and the flipped set, respectively. However, they construct the same neighborhood graph of the extended LLE. 4. Experiments In this paper, we compared the original LLE and the extended LLE on the some facial images with the asymmetric distribution of yaw pose for head frontal-view identification. Besides, we also perform the extended LLE on the corresponding cropped facial images (with little background and centered face). 4.1. Facial database We used FacePix face database [10] to form various videos for experiments. FacePix face database contains 30 individuals with yaw pose angles varying from -90 o to 90 o in increments of 1 o. So for each individual, there are 181 face images, and some examples are shown in Figure 4 (a). The resolution of the face images are 128 128, and the grayscale pixel intensity feature space of the face images are used for experiments. For example, we can construct an image sequence in which head rotates from -90 o to 38 o in increments of 2 o, described as -90 o :2 o :38 o. We also make some cropped facial images based on FacePix face database (see Figure 4 (b)). 4.2. Comparisons and Results Based on two image sequences in asymmetric pose distribution (-90 o :1 o :60 o and -90 o :1 o :30 o ), we compare (1) Case I: the original LLE on the original images, (2) Case II: the extended LLE on the original images and the corresponding flipped images and (3) Case III: the extended LLE on the cropped original and flipped images. For head frontal-view identification, the actual degree of frontal view (0 o in each video sequence) is denoted as ; the identified degree of frontal view (the lowest point of manifold geometry) is denoted as. The expression can represent the identification error. Therefore, we can obtain the identification accuracy (mean and standard deviation ) of head frontal view on FacePix face database (30 individuals), listed in Table 1. Then, where represents expectation and. Table 1 Estimating accuracy for head frontal view on FacePix face database (30 individuals) Case I Case II Case III (a) (b) Figure 4: Examples of three persons in (a) FacePix face database and (b) cropped FacePix face database. The angles of yaw pose from left to right column are: -90 o, -60 o, -30 o, 0 o, 30 o and 60 o, respectively. Different rows correspond to different persons. -90 o :1 o :60 o 10.1 o 5.7 o 5.8 o 3.6 o 4.2 o 3.1 o -90 o :1 o :30 o 22.8 5.1 o 5.2 o 2.9 o 4.5 o 3.4 o We can see that the extended LLE has high effectiveness for head frontal-view identification (Case II and Case III), no matter how asymmetric the pose distribution is. In addition, for the cropped facial images (Case III), the 4

-90 o :1 o :60 o -90 o :1 o :30 o Case I: Case II: Case III: (a) (b) Figure 5: Examples of the comparisons on (a) -90 o :1 o :60 o and (b) -90 o :1 o :30 o. Different rows correspond to different cases. In (a) or (b), the same column corresponds to the same person. (The mark * represents the position corresponding to frontal view in manifold.) identifying accuracy using the extended LLE is improved further, although the improvement is not very obvious. This illustrates that even though the facial images are not cropped, the extended LLE can also have a high performance. As shown in Figure 5, we list the experimental results of three persons (in Figure 4) in the comparison. 5. Conclusion and future work We proposed an extended LLE for head frontal-view identification in an unsupervised fashion. For the facial images with the symmetric distribution of yaw poses, the position corresponding to frontal view can be identified from the manifold geometry using LLE. Based on the characteristic property, we can convert any asymmetric pose distribution into a symmetric pose distribution by horizontally flipping the original facial images. However, there are some unnecessary differences (different background, varying facial position and scale in the whole image and some asymmetry of left-right face) between the original facial images and the flipped facial images, which influence the precision of head frontal-view identification. To address this problem, we extend LLE with two K-nearest neighbor protocols: search the K nearest neighbors in the original facial images and the flipped facial images, respectively. Besides, the proposed algorithm does not need any training samples for head fontal-view identification. However, there is still room for improvement to do for the extended LLE. It takes much time to manually crop the facial images. We can use the facial appearance automatically obtained by View-based Active Shape Models [11] or Active Appearance Models [2]. References [1] L. Wiskott, J.M. Fellous, N. Krüger, and C. Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, 1997. [2] T. Cootes, K. Walker, and C. Taylor, View-based active appearance models, in Proc. IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 227-232. [3] R. Ptucha and A. Savakis, Pose estimation using facial feature points and manifold learning, in Proc. IEEE International Conference on Image Processing, 2010, pp. 3261-3264. [4] C. BenAbdelkader, Robust head pose estimation using supervised manifold learning, in Proc. European Conference on Computer vision, 2010, pp. 518 531. [5] N. Hu, W. Huang, and S. Ranganath, Head pose estimation by non-linear embedding and mapping, in Proc. IEEE International Conference on Image Processing, 2005, pp. II-342-5. [6] V.N. Balasubramanian and J. Ye, Biased manifold embedding: a framework for person-independent head pose estimation, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1-7. 5

[7] S. Roweis and L. Saul, Non-Linear Dimensionality Reduction by Locally Linear Embedding, Science, vol. 290, no. 5500, pp. 2323-2326, 2000. [8] M.N. Teli and J.R. Beveridge, Pose manifold curvature is typically less near frontal face views, in Proc. IEEE International Conference on Biometrics: Theory, Applications, and Systems, 2009, pp. 1-6. [9] J.B. Tenenbaum, V. Silva, and J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, vol. 290, no. 5500, pp. 2319-2323, 2000. [10] G. Little, S. Krishna, J. Black, and S. Panchanathan, "A methodology for evaluating robustness of face recognition algorithms with respect to variations in pose angle and illumination angle," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2005, pp. 89-92. [11] C. Wang and X. Song, Tracking facial feature points with prediction-assisted view-based active shape model, in Proc. IEEE International Conference on Automatic Face and Gesture Recognition, 2011. 6