A NOVEL APPROACH TO ACCESS CONTROL BASED ON FACE RECOGNITION


A. Hadid, M. Heikkilä, T. Ahonen, and M. Pietikäinen
Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering
P.O. Box 4500, FIN-90014 University of Oulu, Finland

ABSTRACT

In this paper, we introduce an approach to access control based on detecting and recognizing faces with novel methods. The goal is to build an automatic face recognition system which authorizes access to the laboratory room for certain persons while denying it to others. For this purpose, a camera is set at the door looking at the corridor and the captured frames are analyzed. The analysis starts by selecting the regions of interest in the images using either texture-based background subtraction or skin locus based color segmentation. Once the regions of interest are defined, a fast face detection scheme is launched. Then, the extracted faces are compared to those in a predefined database to determine whether the individuals are allowed to enter the room or not. Among the novelties of our approach is the use of Local Binary Pattern (LBP) texture features for face recognition and background subtraction.

1. INTRODUCTION

Proactive computing aims to design and develop smart environments which can adapt and adjust to the user's movements and actions without requiring any conscious control. In other words, the system should be able to identify the users, interpret their actions, and react appropriately. Thus, one of the first and most important building blocks of such environments is a person identification system. Face recognition has emerged as an adequate technology for person identification in such systems, as it does not require any cooperation from the user. Indeed, face technology has several advantages over other biometric systems. First of all, a face recognition based system is passive: there is no need to use our fingers or to say some words in order to be recognized by the system. Also, no expensive or specialized equipment is needed, as a simple video camera connected to a personal computer is largely enough to build the system.

Despite the significant progress made in face recognition research [1], the available commercial systems [2, 1] perform well only under relatively controlled environments. For instance, most applications of face recognition to access control suffer either from low recognition speed, due to the complex analysis of facial images, or from low recognition accuracy. The main methods employed by facial recognition vendors to identify and verify faces include eigenfaces [3], Elastic Bunch Graph Matching [4], and Local Feature Analysis (LFA) [5], which is claimed to be used in the Visionics commercial system FaceIt. A list of some commercial systems is given in [1] and their performances are reported in the Face Recognition Vendor Test (FRVT) [2].

Recently, we have proposed a new approach to face recognition based on local binary patterns (LBP) [6]. Our method clearly outperformed state-of-the-art algorithms (PCA, EBGM and the Bayesian intra/extrapersonal classifier) on four standard FERET probe sets [7]. In this work, we describe an approach to access control based on detecting and recognizing faces with our newly proposed methods, allowing fast processing and accurate recognition. In the system, a camera is set at the door looking at the corridor and the captured frames are analyzed. Figure 1 shows the considered environment.
The analysis starts by selecting the regions of interest in the image. For this purpose, two approaches were considered. The first one consists of extracting the moving objects from the video frames using a texture-based approach to background subtraction [8, 9]. The alternative is the use of skin segmentation, which is based on a robust modeling of skin color with the so-called skin locus used to extract the skin-like regions [10]. By adopting the latter approach, a set of skin-like regions, which are considered face candidates, is extracted from the video frames. After orientation normalization, and based on verifying a set of criteria (face symmetry, presence of some facial features, variance of pixel intensities and connected component arrangement), only facial regions are selected [11]. To identify the faces, a new approach to face recognition is considered [6]. The face area is first divided into several blocks; then the LBP feature histograms [8] are extracted from each block and concatenated into a single global feature histogram which efficiently represents the face image. The recognition is performed by simple histogram matching. The different phases are outlined in Section 2.
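To make this block-based representation concrete, the following is a minimal sketch in Python (numpy and scikit-image assumed), using the 59-bin uniform LBP(8,1) operator and the 24-block layout reported later in Section 3.4 on a face already normalized to 144x112 pixels; the function name and block sizes are illustrative, not the paper's implementation.

    # A minimal sketch of the block-based LBP face descriptor (assumptions noted above).
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_face_descriptor(face, block_h=24, block_w=28, P=8, R=1, n_bins=59):
        """Compute uniform LBP codes, split the face into blocks, build one
        59-bin histogram per block and concatenate them into a single vector."""
        # 'nri_uniform' gives the 59 non-rotation-invariant uniform labels for P=8.
        codes = local_binary_pattern(face, P, R, method="nri_uniform")
        hists = []
        for y in range(0, face.shape[0] - block_h + 1, block_h):
            for x in range(0, face.shape[1] - block_w + 1, block_w):
                block = codes[y:y + block_h, x:x + block_w]
                h, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
                hists.append(h / max(h.sum(), 1))  # normalize each block histogram
        return np.concatenate(hists)  # 24 blocks x 59 bins = 1416 values for a 144x112 face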

Fig. 1. The environment: a camera is set at the door looking at the corridor, and the captured images are processed in order to allow or deny the individuals access to the laboratory room.

2. FACE RECOGNITION-BASED ACCESS CONTROL

Figure 2 outlines the different parts of our system. Two different schemes are shown: the first solution uses a novel texture-based approach to background subtraction in order to extract the regions of interest, i.e. the moving objects, from the video frames. The alternative is based on defining the regions of interest in the input image as those containing skin areas.

Fig. 2. Two schemes are considered: the first solution uses background subtraction in order to extract the regions of interest, while the second is based on defining the regions of interest in the input image as those containing skin areas.

2.1. Selecting the Regions of Interest

Before launching the face detection, our approach starts by reducing the search space in order to speed up processing and also to avoid additional false alarms. We describe below the two approaches used to reduce the search space and define the regions of interest.

2.1.1. Background Subtraction

The idea consists of searching for faces only among the moving objects. Therefore, we developed a novel block-based approach to background subtraction in order to detect the moving objects [9]. In our approach, the background image is divided into equally sized blocks using a partially overlapping grid structure, and each block is modeled as a group of weighted adaptive LBP histograms [8]. The histograms are sorted in decreasing order according to the likelihood that they model the background, and the first histograms are considered to be the background model. Foreground detection is achieved by comparing the histogram calculated for the new block against the existing background histograms. If a match is found, the block is considered to belong to the background. Otherwise, the block is marked as foreground. The algorithm can adapt to inherent changes in the scene background (e.g. illumination changes and multi-modality) and manages situations where new stationary objects are introduced to, or old ones removed from, the background area. Moreover, it operates in real time. Figure 3 shows an example of a background subtraction result.

Fig. 3. An example of moving object extraction using texture-based background subtraction.
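The exact update and matching rules of this block-wise model are given in [9]; the sketch below only illustrates, under assumed choices (histogram intersection as the similarity measure, a fixed match threshold, a fixed number of background histograms), how a single block could be classified.

    import numpy as np

    def block_is_background(block_hist, model_hists, model_weights,
                            n_background=3, match_threshold=0.65):
        """Compare a block's LBP histogram against its weighted model histograms.
        The histograms with the largest weights are treated as the background
        model; the block is background if it matches any of them."""
        order = np.argsort(model_weights)[::-1]          # most likely background first
        for k in order[:n_background]:
            # histogram intersection (assumed similarity measure, not from the paper)
            similarity = np.minimum(block_hist, model_hists[k]).sum()
            if similarity > match_threshold:
                return True                               # matched -> background
        return False                                      # no match -> foreground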

2.1.2. Skin Segmentation

Instead of considering moving objects, the faces can be searched for among the skin-like regions. Therefore, our second approach consists of segmenting the input image by defining the skin areas. Although different people have different skin colors, several studies have shown that the major difference lies largely in intensity rather than in chrominance. Several skin color distribution models have been compared in different color spaces (RGB, HSV, YCrCb, etc.). In our case, we use the skin locus, which has performed well with images taken under widely varying conditions [10, 11]. The skin locus is the range of skin chromaticities under varying illumination and camera calibration conditions in the NCC (normalized color coordinate) space. In the NCC space, the intensity is defined as I = R + G + B and the chromaticities are r = R/I, g = G/I and b = B/I. Because r + g + b = 1, the intensity and two chromaticity coordinates are enough to specify any color uniquely. We considered the r-b coordinates to obtain both robustness against intensity variations and a good overlap of the chromaticities of different skin colors.
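The actual skin locus boundaries are camera-dependent and come from the analysis in [10]; the sketch below only shows the NCC conversion and leaves the locus itself as a user-supplied membership test (a hypothetical predicate, e.g. a polygon or a pair of quadratic bounds in the r-b plane).

    import numpy as np

    def ncc_chromaticities(img_rgb):
        """Convert an RGB image to NCC chromaticities: I = R+G+B, r = R/I, b = B/I."""
        rgb = img_rgb.astype(np.float64)
        intensity = rgb.sum(axis=2) + 1e-10   # avoid division by zero
        r = rgb[..., 0] / intensity
        b = rgb[..., 2] / intensity
        return r, b

    def skin_mask(img_rgb, in_skin_locus):
        """Boolean mask of pixels whose (r, b) chromaticities fall inside the skin
        locus; `in_skin_locus(r, b)` is a placeholder for the locus model of [10]."""
        r, b = ncc_chromaticities(img_rgb)
        return in_skin_locus(r, b)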
2.2. Face Detection

After selecting the regions of interest in the input images, the next step is the detection of faces. We describe in the following our approach to face detection in the case of skin segmentation, which is the present implementation of the system. The skin segmentation phase results in a set of skin-like regions, which are considered face candidates. Using morphological operations (a majority operator and dilations followed by erosions until the image no longer changes), we reduce the number of these regions. For every candidate, we verify whether or not it corresponds to a facial region. The verification scheme is summarized in Figure 4. To increase the speed and robustness of the detector, we organized some operations into a cascade structure. To deal with faces of different orientations, we first calculate the best ellipse fitting the face candidate. Based on the fact that the pixel value variations of other skin-like regions (such as hands) are smaller than those of face regions, because of the presence of features with different brightness, we remove all face region candidates with pixel value variations smaller than a threshold. In order to improve the detection speed and achieve high robustness, we check the symmetry of the face and remove the candidates for which the symmetry is verified but no facial features are detected. Since it is not always possible to detect the facial features (due to different orientations, illuminations, etc.), we build a model of the spatial arrangement of connected component features. The main steps of our face detector are described in Figure 4 and the details can be found in [11].

Fig. 4. Our face detection scheme.

2.3. Face Normalization and Recognition

Once the face has been extracted, we compare the image to those stored in the database to determine the identity of the face. For this purpose, we first normalize the face image to the same scale (144x112 pixels) as that of the faces in the database. To recognize the face, we adopted the new approach to face representation that we recently introduced [6]. The face area is first divided into several blocks, and then the Local Binary Pattern (LBP) features are extracted from each block and concatenated into a single feature histogram which efficiently represents the face image. Figure 5 describes the facial representation extraction.

Fig. 5. Face image representation using LBP: the face area is first divided into 24 blocks, and then the Local Binary Pattern (LBP) features are extracted from each block and concatenated into a single feature histogram.

The recognition is performed by simple histogram matching. We adopted the χ² (chi-square) measure as the dissimilarity metric for comparing a target face histogram S to a model histogram M:

    \chi^2(S, M) = \sum_{i=0}^{l} \frac{(S_i - M_i)^2}{S_i + M_i},    (1)

where l is the length of the feature vector used to represent the face image (in our experiments, l = 1416).
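A minimal sketch of this matching step (Eq. 1) follows; the small epsilon guarding against empty bins and the optional reject threshold are assumptions, not values from the paper.

    import numpy as np

    def chi_square(s, m, eps=1e-10):
        """Chi-square dissimilarity between two concatenated LBP histograms (Eq. 1)."""
        return float(np.sum((s - m) ** 2 / (s + m + eps)))

    def identify(target_hist, model_hists, model_labels, reject_threshold=None):
        """Return the label of the closest face model, or None if the best match
        is worse than an (assumed) reject threshold."""
        distances = [chi_square(target_hist, m) for m in model_hists]
        best = int(np.argmin(distances))
        if reject_threshold is not None and distances[best] > reject_threshold:
            return None                      # unknown person -> deny access
        return model_labels[best]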

3. IMPLEMENTATION AND EXPERIMENTAL RESULTS

3.1. Face Database

To build the system, we collected 20 video sequences of 10 persons who are allowed to enter the laboratory room (two videos per person). Each video sequence contains 351 frames, and the resolution of the images is 640x480 pixels. The database includes frontal and near-frontal views with different facial expressions. Some face images from one subject are shown in Figure 6. The first video of each person is used for training and the second for testing.

3.2. Selecting Models for View-Based Face Recognition

In our approach, we adopted view-based face recognition. Therefore, we automatically selected a set of face models from each training video sequence of each subject. The extraction of the face models is performed using an unsupervised learning scheme that we recently proposed in [12]. The approach is based on applying the locally linear embedding (LLE) algorithm to the raw feature data and then performing K-means clustering in the obtained low-dimensional space. We extracted N_b = 6 face models from each training sequence and used them in the appearance-based face recognition. Details about the model extraction process can be found in [12]; a small illustrative sketch of this selection step is given at the end of this section.

3.3. Face Detection Results

We applied our face detection approach (see Section 2.2) to extract the faces in both the training and testing videos. 96.14% of the faces were successfully detected, and only a few false positives were signaled. All of these false detections were rejected during the recognition phase. See Figures 8 and 9 for some face detection examples. Note that the detection scheme is based on single frames and does not use motion information.

3.4. Face Recognition Results

First, we conducted a set of experiments in order to determine the parameters (LBP operator and window size) for the facial representation. By choosing a window size of 24x28 pixels and the LBP_{8,1}^{u2} operator, each face image is represented by a feature vector of 59 x 24 = 1416 elements. The adopted facial representation is shown in Figure 5. After selecting N_b = 6 face models for each subject in the database, we considered an appearance-based face recognition scheme. We used the χ² (chi-square) measure as the dissimilarity metric for comparing a target face histogram S to a model histogram M. Figure 7 shows the mean recognition rates (rank curves) obtained when testing the system with the second video sequence of each subject. For comparison, we also included the recognition results of the PCA [3] and LDA [13] based approaches.

Fig. 7. Rank curves for the LBP, LDA and PCA methods.

As shown in Figure 7, the LBP-based approach resulted in a recognition rate of 81.4%, versus 64.2% and 76.8% for the PCA and LDA based methods, when classifying the test images into one of the 10 classes without considering the reject class (i.e. Rank(0)). By defining a threshold for the reject class, the recognition rates dropped to 66.9%, 58.1% and 49.0% for the LBP, LDA and PCA methods, respectively. Using a K-nearest neighbor classifier (with K = 3), the recognition rates were 71.6% and 57.4% for the LBP and PCA methods, respectively. Some examples of recognition using the LBP-based approach are shown in Figures 8 and 9.
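The model selection of Section 3.2 is detailed in [12]; the sketch below shows one plausible realization with scikit-learn, where the training frames are embedded with LLE, clustered with K-means, and the frame nearest each cluster centre is kept as a face model (picking the nearest frame is an assumption, not necessarily the procedure of [12]).

    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.cluster import KMeans

    def select_face_models(frames, n_models=6, n_neighbors=10, n_components=2):
        """frames: (n_frames, n_pixels) array of vectorized face images from one
        training video. Returns the indices of the selected model frames."""
        embedded = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                          n_components=n_components).fit_transform(frames)
        km = KMeans(n_clusters=n_models, n_init=10, random_state=0).fit(embedded)
        model_indices = []
        for c in range(n_models):
            members = np.where(km.labels_ == c)[0]
            # keep the frame closest to the cluster centre as the face model
            d = np.linalg.norm(embedded[members] - km.cluster_centers_[c], axis=1)
            model_indices.append(int(members[np.argmin(d)]))
        return model_indices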

Fig. 6. Examples of face images from the face database considered in our experiments.

4. DISCUSSION AND CONCLUSION

We presented a system that uses face recognition for access control. We introduced two schemes: the first one is based on using background subtraction to detect the moving objects, followed by face detection and then by LBP-based face recognition. The second scheme performs skin segmentation instead of background subtraction. The former approach is more adequate for outdoor scenarios, where the system might be faced with dramatic illumination changes and complex backgrounds. We reported the results using the latter scheme, which is used for an indoor access control application.

The system starts by finding the skin-like regions in the input images. This skin segmentation is based on a robust modeling of skin color with the so-called skin locus. Choosing the skin locus model is motivated by a previous extensive analysis, which showed its efficiency against the state of the art [10]. Once the skin-like regions are segmented, a fast face detector is launched in order to verify whether or not the skin regions are faces [11]. After scale normalization, the faces are efficiently represented using LBP feature histograms [6, 8].

We collected 20 video sequences of 10 different persons (two videos per person). From the first video sequence of each person, we automatically selected six face models to build a view-based face recognition system. The model selection is based on an unsupervised learning scheme using locally linear embedding for dimensionality reduction followed by K-means clustering [12]. We used the second video sequence of each person to test the system. The preliminary results showed a recognition rate of 71.6% using the LBP-based face representation, a K-nearest neighbor classifier and χ² as the dissimilarity metric.

Although the results are better than those obtained with PCA and LDA, they are still preliminary and many improvements can be achieved. Indeed, some parameters of the system have been set by default and are thus not optimal. For instance, when dividing the facial images into several regions, we gave an equal weight to the contribution of each region. However, one may use different weights, depending on the role of the given regions in detection/recognition. For example, since the eye regions are important for recognition, a high weight can be attributed to the corresponding regions. Such a procedure enhanced the facial representation in [6]. Also, metrics other than χ² could be tested and adopted.
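As a small illustration of the weighting idea mentioned above, a weighted chi-square over the per-block histograms could look as follows; the weights themselves are illustrative, since the paper does not specify any.

    import numpy as np

    def weighted_chi_square(s, m, block_weights, n_bins=59, eps=1e-10):
        """Weighted chi-square: the contribution of each block (e.g. the eye
        regions) is scaled by its weight before summing."""
        s_blocks = s.reshape(-1, n_bins)
        m_blocks = m.reshape(-1, n_bins)
        per_block = ((s_blocks - m_blocks) ** 2 / (s_blocks + m_blocks + eps)).sum(axis=1)
        return float(np.dot(np.asarray(block_weights), per_block))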
It is worth noting that the different parts of the system have been tested earlier in extensive experiments. Our methods for background subtraction, skin detection and face recognition have outperformed state-of-the-art methods, while the face detection scheme [11] has shown good results. In addition to the promising results obtained, an interesting characteristic of our system lies in its speed: the skin and face detections can be run in real time, while the LBP feature histograms can be easily computed in a single scan through the images.

All our analysis is based on processing single frames. Improvements could be obtained by exploiting information redundancy, for example by choosing good frames for recognition. An alternative consists of recognizing a set of consecutive frames and then performing a voting strategy to find the identity of the face, as sketched below. The system recognizes some faces much more easily than others, and we reported the mean recognition rates. The lack of registration is likely to be the main reason for some recognition failures. A potential solution might be the extraction and tracking of facial features in order to align the faces before recognition.
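A simple majority vote over the per-frame decisions, as suggested above, could be sketched as follows; the minimum support fraction is an assumed parameter.

    from collections import Counter

    def vote_identity(frame_labels, min_support=0.5):
        """Majority vote over per-frame identities; returns None (reject) when no
        identity is supported by enough frames."""
        votes = [lbl for lbl in frame_labels if lbl is not None]
        if not votes:
            return None
        label, count = Counter(votes).most_common(1)[0]
        return label if count / len(frame_labels) >= min_support else None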

Fig. 8. Two examples of successful recognition.

During the data acquisition, the subjects were asked to read some text. This resulted in different facial expressions and also in non-rigid motion of their facial features. It is of interest to study the incorporation of this dynamic information in recognition [14] and to experiment with much larger databases.

When using the background subtraction method to limit the search areas, a scheme different from that described in this paper should be adopted for detecting the faces. For this purpose, our recently introduced approach to face detection using LBP features [15] is a suitable choice. The detection method is based on encoding both local and global facial characteristics into a compact feature histogram using LBP, and then scanning the search areas at different scales and positions to detect the faces. This approach has shown excellent results and outperformed state-of-the-art algorithms [15].

In some complex environments in which several parts of the background can be skin-like regions, one may combine background subtraction and skin color segmentation in order to select the regions of interest. First, background subtraction can be used to extract the moving objects, and then skin detection can be used to find the skin-like regions among only the moving objects. In this way, false detection alarms will be avoided.

Acknowledgment

This research was sponsored by the Academy of Finland and the Finnish Graduate School in Electronics, Telecommunications and Automation (GETA).

5. REFERENCES

[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys, vol. 34, no. 4, pp. 399-458, Dec. 2003.

[2] P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi, and J. M. Bone, "Face recognition vendor test 2002 results," Technical report, 2003.

[3] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.

[4] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 775-779, 1997.

[5] P. Penev and J. Atick, "Local feature analysis: a general statistical theory for object representation," Network: Computation in Neural Systems, vol. 7, pp. 477-500, 1996.

[6] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face recognition with local binary patterns," in Proc. 8th European Conference on Computer Vision, May 2004, vol. 1, pp. 469-481.

[7] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1090-1104, 2000.

Fig. 9. Examples where a stranger to the system is rejected (left) while an authorized person is recognized (right).

[8] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 971-987, 2002.

[9] M. Heikkilä, M. Pietikäinen, and J. Heikkilä, "A texture-based method for detecting moving objects," in British Machine Vision Conference (submitted), September 2004.

[10] B. Martinkauppi, M. Soriano, and M. Pietikäinen, "Detection of skin color under changing illumination: a comparative study," in Proc. 12th International Conference on Image Analysis and Processing (ICIAP 2003), September 2003, pp. 652-657.

[11] A. Hadid, M. Pietikäinen, and B. Martinkauppi, "Color-based face detection using skin locus model and hierarchical filtering," in Proc. 16th International Conference on Pattern Recognition, Quebec, 2002, vol. 4, pp. 196-200.

[12] A. Hadid and M. Pietikäinen, "Selecting models from videos for appearance-based face recognition," in Proc. 17th International Conference on Pattern Recognition, August 2004, in press.

[13] K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images," Journal of the Optical Society of America, vol. 14, pp. 1724-1733, 1997.

[14] A. Hadid and M. Pietikäinen, "From still-image to video-based face recognition: An experimental analysis," in Proc. 6th International Conference on Face and Gesture Recognition, 2004, pp. 813-818.

[15] A. Hadid, M. Pietikäinen, and T. Ahonen, "A discriminative feature space for detecting and recognizing faces," in Proc. Computer Vision and Pattern Recognition, June 2004, in press.