Facial Ethnicity Classification based on Boosted Local Texture and Shape Descriptions


Huaxiong Ding, Di Huang, IEEE Member, Yunhong Wang, IEEE Member, Liming Chen, IEEE Member

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61202237 and No. 61061130560, the French Research Agency (Agence Nationale de Recherche, ANR) under Grant ANR-2010-INTB-0301-01, the joint project funded by the LIA2MCSI lab between Écoles Centrales and Beihang University, and the Fundamental Research Funds for the Central Universities. H. Ding and L. Chen are with the Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Ecole Centrale de Lyon, CNRS, UMR 5205, Lyon, 69134, France (huaxiong.ding.ecl@gmail.com; liming.chen@ec-lyon.fr). D. Huang and Y. Wang are with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China (dhuang@buaa.edu.cn; yhwang@buaa.edu.cn).

Abstract: Ethnicity is a key demographic attribute of human beings, and it plays an important role in automatic machine-based face analysis; therefore, face-based ethnicity classification has attracted increasing attention in recent years. In this paper, we propose a novel method for this task that combines boosted local texture and shape features extracted from 3D face models, in contrast to existing methods that depend only on 2D facial images. The proposed method makes use of Oriented Gradient Maps (OGMs) to highlight local geometry as well as texture variations across entire faces, and further learns a compact set of features highly related to the ethnicity attribute for classification. Experiments are comprehensively carried out on the FRGC v2.0 dataset, and the accuracy reaches 98.3% in distinguishing Asians from non-Asians when 80% of the samples are used for training, demonstrating the effectiveness of the proposed method.

I. INTRODUCTION

Facial image analysis is one of the most popular research topics in pattern recognition and computer vision. As a critical branch of this topic, automatic face recognition has been intensively studied over the past several decades. However, human faces not only provide identity clues but also convey a variety of demographic information such as gender, ethnicity, and age, which contributes importantly to this domain. Among the demographic attributes, ethnicity generally remains invariant throughout life and greatly supports face recognition systems; therefore, automatic facial ethnicity classification has received increasing attention in recent years. Furthermore, it also has promising potential for improving human-computer interaction (HCI), surveillance, video/image retrieval, database indexing, etc.

In the literature, several methods have been proposed for facial-image-based ethnicity classification. For instance, Lu et al. [1] proposed a two-class (Asian vs. non-Asian) ethnicity classification approach in which face images were analyzed at multiple scales, an LDA [9] classifier was learnt for each scale, and the final similarity measurement was obtained by combining the scores of the different scales under the product rule. The reported classification performance is 96.3% on a database composed of 2,630 samples belonging to 263 subjects. Yang and Ai [2] compared the accuracies of compact sets of Haar-like [3] and LBP [4][5] features selected by AdaBoost [3]; an average error rate of 3.38% was achieved on the two-class (Asian vs. non-Asian) ethnicity classification task using the LBP features. Hosoi et al. [6] made use of Gabor wavelet features with an SVM classifier and conducted experiments on the HOIP dataset for a three-class problem, i.e.
Asian, European, and African; the achieved accuracies for the three classes are 96%, 93%, and 94%, respectively. Guo and Mu [7] used Gabor features and reported a benchmark result for the classification of five ethnicities on the large-scale MORPH-II dataset containing more than 55,000 facial images. The prediction accuracy for the Black and White races is 98.3% and 97.1%, while, because of insufficient training data, the accuracies for the other three races, Hispanic, Asian, and Indian, degrade dramatically to 74.2%, 59.5%, and 6.9%, respectively.

Up till now, the great majority of efforts have been made in the 2D domain, using only texture information. However, according to anatomical studies, the geometric information of human faces is essential for ethnicity classification as well. For example, Caucasian brow bones are generally deeper, with eyes more sunken than Asian ones; Asian noses tend to possess lower bridges, while Caucasian noses extend slightly upward; Asians commonly have wider foreheads, whereas Caucasian foreheads are usually slightly narrower; and Caucasian faces tend to have longer and sharper chins, while Asian ones are generally shorter and flatter. Unfortunately, there has been very little research on 3D-face-based ethnicity recognition. A probable reason is that the distribution of 3D face samples with ethnicity labels in current public databases is generally unbalanced. Specifically, BU-3DFE [8] includes 6 races, but around 75% of the samples are Asians and Whites, leaving insufficient data for the remaining groups in the training phase, and this ratio increases to about 92% in the FRGC v2.0 database [9]. Previous works therefore usually treat this topic as a binary or ternary classification problem. Lu et al. [10] can be regarded as the pioneers of this attempt; they integrated similarity measurements of texture and shape (i.e.
intensity and depth values of the central part of human faces), showing that the joint use of multiple modalities leads to an improvement in accuracy. Zhang and Wang [11] randomly selected Asian and White subjects from the FRGC v2.0 database; above 99.5% classification accuracy was obtained using MM-LBP features extracted from both the texture and shape modalities together with an AdaBoost classifier. Toderici et al. [12] adopted a high-level demographic feature estimated from 3D meshes of the human face; an accuracy of 99.6% was achieved for a two-class (Asian vs. White) classification on FRGC v2.0. Considering the fact that each facial organ, such as the eyes, nose, and forehead, provides a different cue for facial ethnicity classification, local-feature-based approaches tend to be more reasonable and thereby more efficient than holistic ones such as [13].

In this study, we present a novel approach for ethnicity classification using 3D face models that consist of both texture and shape modalities. Specifically, we introduce the Oriented Gradient Maps (OGMs), originally proposed for textured 3D face recognition [14], to extract local details at pre-defined quantized orientations and highlight the distinctiveness of both modalities. In order to emphasize the importance of the different organs that are highly related to the ethnicity attribute, we apply AdaBoost to select a set of the most representative facial features on the OGMs and assign individual weights to them. Finally, the decision is made by combining the similarity scores of all the OGMs of the texture and shape clues. The proposed approach is evaluated on the FRGC v2.0 database, and the accuracy reaches 98.3% in distinguishing Asian from non-Asian persons when 80% of the samples are used for training, outperforming most methods in the literature. Our main contributions are briefly summarized as follows:
* Present an ethnicity classification method that uses a novel feature to describe ethnicity-related local texture and shape characteristics;
* Evaluate the contribution of each facial organ to the accuracy of ethnicity classification;
* Analyze the impact of the ratio between training and testing samples on the final performance.
The remainder of the paper is organized as follows. Section II introduces OGMs in detail, and Section III presents the feature selection and classification steps. Experimental results are shown and discussed in Section IV. Finally, Section V concludes the paper.

II. ORIENTED GRADIENT MAPS (OGMS) BASED FACIAL REPRESENTATION

Given a textured 3D face model, through a preprocessing pipeline that removes spikes and fills holes, we can extract a facial range image and its texture counterpart for the following steps. In this study, to highlight the discriminative power of the local texture and geometry clues of human faces for ethnicity analysis, we introduce OGMs, a biological-vision-based representation method originally proposed for 3D face recognition [14]. It achieved state-of-the-art performance and proved insensitive to affine illumination changes and geometric transformations. Specifically, given a range or texture image I, for each predefined quantized orientation o, a certain number of gradient maps G_1, G_2, ..., G_O, which describe the gradient norm of the input image in direction o, are first computed as in (1), keeping the positive part of the directional derivative:

G_o = (∂I / ∂o)^+   (1)

We then convolve the gradient maps with a Gaussian kernel G_R to avoid abrupt changes. The standard deviation of the Gaussian kernel G_R is proportional to the radius of the given neighborhood area R, as in (2):

ρ_o^R = G_R * G_o   (2)

Thirdly, according to (3), the response vector ρ^R(x, y) at pixel location (x, y) is built by collecting the values of all the convolved gradient maps at that location, and is further normalized to a unit-norm vector:

ρ^R(x, y) = [ρ_1^R(x, y), ..., ρ_O^R(x, y)]^t   (3)

Finally, an OGM J_o is generated as the normalized ρ_o^R at orientation o, as shown in Fig. 1.

Fig. 1. Illustration of the oriented gradient maps; each is for a quantized orientation o [14].

Fig. 2. Example of oriented gradient maps (at 4 quantized orientations) of a facial range image and its corresponding texture.
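The pipeline of Eqs. (1)-(3) can be sketched as below; this is a minimal illustration assuming a NumPy image, finite-difference directional derivatives, and a hand-rolled separable Gaussian standing in for G_R (the function names and the sigma value are our own choices, not from the paper):

```python
import numpy as np

def _gaussian_smooth(img, sigma):
    """Separable Gaussian blur, a stand-in for the kernel G_R in Eq. (2)."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, k, mode="same")

def oriented_gradient_maps(image, num_orientations=4, sigma=2.0):
    """Sketch of the OGM computation, Eqs. (1)-(3)."""
    img = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(img)                      # image gradients
    maps = []
    for k in range(num_orientations):
        theta = k * np.pi / num_orientations       # 0, pi/4, pi/2, 3pi/4
        # Eq. (1): positive part of the directional derivative along theta
        g_o = np.maximum(np.cos(theta) * gx + np.sin(theta) * gy, 0.0)
        # Eq. (2): smooth with a Gaussian tied to the neighborhood radius R
        maps.append(_gaussian_smooth(g_o, sigma))
    rho = np.stack(maps, axis=-1)                  # response vectors, Eq. (3)
    rho /= np.maximum(np.linalg.norm(rho, axis=-1, keepdims=True), 1e-12)
    return [rho[..., k] for k in range(num_orientations)]  # the OGMs J_o
```

Applied to a facial range image and its texture counterpart, this yields the four maps per modality used in the rest of the paper.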
Figure 1 illustrates this process, and from Fig. 2 we can see that local shape and texture changes of human faces are both highlighted by OGMs. Face recognition requires highly discriminative features to tolerate the much smaller inter-class changes caused by similar facial appearances; in the given task, by contrast, inter-class variations tend to be larger, and we thus generate four OGMs for each facial range or texture image, instead of the eight used in [14], to increase system efficiency. Specifically, for each input image, we consider its gradient maps at the following orientations: 0, π/4, π/2, and 3π/4, which conserve both positive and negative polarity.

III. FEATURE SELECTION AND CLASSIFICATION

Recall that each facial organ provides a different clue for ethnicity analysis; to emphasize the importance of the organs that are highly related to the ethnicity attribute, we make use of a Boosting algorithm for feature selection. To make the structure and organization of this study complete, in this section we briefly introduce the well-known AdaBoost technique and then present the classification step in detail.

AdaBoost was formulated by Freund and Schapire [15]; it is a typical algorithm of the Boosting family that constructs a strong classifier as a linear combination of a certain number of simple weak classifiers. As illustrated in Alg. 1, at each iteration t, a new weak classifier h_t is generated, and the distribution of weights D_t(i), indicating the importance of the examples in the data set, is updated: the weights of the examples misclassified by h_t are increased, and the weights of the correctly classified examples are decreased, so that new classifiers always focus on the examples that are still misclassified.

Algorithm 1: Framework of AdaBoost learning for feature selection in our system.
1: Given labelled samples (x_1, y_1), (x_2, y_2), ..., (x_m, y_m), where x_i ∈ X and y_i ∈ Y = {-1, +1}. In our case, X are the OGM features extracted from all the face models in FRGC v2.0; y = +1 denotes Asian and y = -1 denotes non-Asian.
2: Initialize weights D_1(i) = 1/m.
3: for t = 1, ..., T do
4:   Train a weak classifier using distribution D_t.
5:   Obtain a weak hypothesis h_t : X → {-1, +1} with error ε_t = Pr_{i~D_t}[h_t(x_i) ≠ y_i].
6:   Choose α_t = (1/2) ln((1 - ε_t) / ε_t).
7:   Update D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, i.e. multiply D_t(i) by e^{α_t} if h_t(x_i) ≠ y_i and by e^{-α_t} otherwise, where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.
8: end for
9: Output the final hypothesis H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) ).

Based on the OGMs of facial texture and range images, we first make use of decision trees to build the weak classifiers: given a gallery set {x_i | i = 1, 2, ..., N} and the related k-th OGMs {G_{i,k} | i = 1, 2, ..., N; k = 1, ..., M} (as mentioned in the last section, M is set to 4 in our case), we build a decision tree T_k(x, y) considering only the local values of pixel (x, y) in G_{i,k}. In total, x × y decision trees are computed, and the one with the minimum error is selected as the weak classifier at the current iteration. AdaBoost finally combines all the weak classifiers with their related weights to construct a strong one for the final prediction.

Specifically, for a facial range or texture image, the OGMs at the four pre-defined quantized orientations, i.e. 0, π/4, π/2, and 3π/4, are first computed. Then, for each oriented gradient map, a strong classifier is used to classify samples into the two given ethnic classes, i.e. Asian vs. non-Asian. Finally, using a sum rule, we integrate the similarity scores at each orientation to reach the final decision on the 2D or 3D modality, as in (4) and (5):

h̄(x) = (1/M) Σ_{i=1}^{M} h_{ith orie.}(x)   (4)

h(x) = Σ_{t=1}^{T} α_t h_t(x)   (5)

where h(x) is the strong classifier generated by AdaBoost; if h̄(x) ≥ 0, the sample is classified as Asian, otherwise as non-Asian. Meanwhile, in order to further improve the overall classification accuracy, we integrate the similarity measurements of both modalities (2D and 3D) using a sum rule, as in (6):

h_overall(x) = (h_2D(x) + h_3D(x)) / 2   (6)

IV. EXPERIMENTAL RESULTS

To evaluate the proposed method, the experiments were mainly carried out on FRGC v2.0 [9], one of the most comprehensive and popular datasets, made up of 4,007 textured 3D face models from 466 subjects. It contains 99 Asians (1,121 facial images) and 367 non-Asians (2,886 facial images); a detailed ethnicity distribution is shown in Table I. We cropped the face area of each model according to the indicator matrix showing whether there is a valid point at that position. A median filter was adopted to remove spikes, and cubic interpolation was employed to fill holes. The face samples used in the experiments possess expression, illumination, and slight pose variations. Several examples of facial range and texture images are shown in Fig. 3. All the images are resized to 96 × 120 pixels.

Fig. 3. An example of range and intensity images in FRGC v2.0.
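The boosting procedure of Algorithm 1 can be sketched as follows; this is a minimal illustration that uses one-feature threshold stumps over candidate quantile thresholds as the weak learners, a simplification of the per-pixel decision trees T_k(x, y) described above (the function names and the threshold grid are our own choices):

```python
import numpy as np

def adaboost_train(X, y, rounds=20):
    """Algorithm 1 with threshold-stump weak learners.
    X: (m, n) feature matrix, y: labels in {-1, +1}.
    Returns a list of (feature, threshold, sign, alpha)."""
    m, n = X.shape
    D = np.full(m, 1.0 / m)                       # step 2: uniform weights
    model = []
    for _ in range(rounds):                       # step 3
        best = None
        for f in range(n):                        # steps 4-5: best stump
            for thresh in np.quantile(X[:, f], np.linspace(0, 1, 21)):
                for s in (1, -1):
                    pred = np.where(X[:, f] > thresh, s, -s)
                    err = D[pred != y].sum()      # weighted error
                    if best is None or err < best[0]:
                        best = (err, f, thresh, s, pred)
        err, f, thresh, s, pred = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)     # step 6
        D *= np.exp(-alpha * y * pred)            # step 7: reweight examples
        D /= D.sum()                              # normalize by Z_t
        model.append((f, thresh, s, alpha))
    return model

def adaboost_score(model, x):
    """Eq. (5): weighted sum of weak hypotheses; sign(.) gives H(x)."""
    return sum(a * (s if x[f] > t else -s) for f, t, s, a in model)
```

Averaging such scores over the four OGMs of a modality, and then over the 2D and 3D modalities, corresponds to the sum rules of Eqs. (4) and (6).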

TABLE I
THE ETHNIC DISTRIBUTION IN THE COMPLETE FRGC V2.0 DATABASE.

          Asian  White  Hispanic  Asian-Middle-Eastern  Asian-Southern  Black-or-African-American  Unknown  Total
Scans     1121   2554   113       16                    78              28                         97       4007
Subjects  99     319    13        1                     12              6                          16       466

TABLE II
THE ETHNIC DISTRIBUTION IN THE ENTIRE BU-3DFE DATASET.

          White  East-Asian  Black  Middle-East Asian  Indian  Hispanic-Latino  Total
Scans     1275   600         225    50                 150     200              2500
Subjects  51     24          9      2                  6       8                100

As in most works in the literature, due to the lack of labels for the minority groups, we consider ethnicity classification as a binary classification problem, classifying subjects into two classes: Asian and non-Asian. We designed four experiments as follows: the first tests the performance of the proposed method for texture based facial ethnicity classification, while the second evaluates its accuracy for shape based ethnicity classification. The third combines the 2D and 3D modalities at the matching score level for improved results. In the last one, we investigate the importance of certain face regions, such as the eyes and nose, in ethnicity classification. In each experiment, we set the size of the gallery set as a percentage of the whole dataset, varying from 30% to 80% with a step of 10%, and the remaining samples were treated as probes. The 3D facial scans of the same subject are grouped into the same set, to ensure that the results are not biased by the identity similarity between the testing and training data. Additionally, in order to examine the applicability of the proposed approach, we also validated our method on the BU-3DFE database, which contains 100 subjects: 51 Whites, 24 East-Asians, and others (more details are shown in Table II). Because of the imbalance of the ethnic groups, we employed the samples of 24 randomly selected Whites and all the East-Asians to form a binary classification task. The settings of the probe and gallery sets are the same as those for FRGC v2.0.
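The subject-disjoint gallery/probe protocol described above can be sketched as follows; a pure-Python illustration in which the split ratio is applied at the subject level as an approximation, and the function name and subject IDs are our own:

```python
import random

def subject_disjoint_split(scan_subjects, train_ratio, seed=0):
    """Split scan indices so that all scans of a subject fall on the same
    side, avoiding identity overlap between gallery and probe sets.
    scan_subjects: subject ID per scan; returns (gallery, probe) indices."""
    subjects = sorted(set(scan_subjects))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_train = int(round(train_ratio * len(subjects)))
    train_subjects = set(subjects[:n_train])
    gallery = [i for i, s in enumerate(scan_subjects) if s in train_subjects]
    probe = [i for i, s in enumerate(scan_subjects) if s not in train_subjects]
    return gallery, probe
```

Repeating such a split with different seeds and averaging the resulting accuracies mirrors the 10-repetition protocol used below.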
Each experiment was repeated 10 times, and the average performance is reported.

A. Texture based Ethnicity Classification (2D Modality)

In this experiment, we evaluated the performance of the proposed ethnicity classification approach on facial texture images. In order to highlight its effectiveness, we implemented several techniques from the literature for comparison, including Grid+SVM [10], Haar+Adaboost [2], LBP+Adaboost [11], and LBP+SVM [13]. From Fig. 4, we can see that even though in a few settings (50% and 80% of the samples exploited in the training phase) the results of the proposed approach are slightly inferior to the state-of-the-art ones (but still comparable), the performance generally remains stable as the size of the gallery set changes. The average classification rate is 96.5%, which outperforms the other methods, demonstrating the effectiveness of OGMs in describing texture information.

Fig. 4. Performances of different methods for 2D facial ethnicity classification on the FRGC v2.0 dataset: OGM+Adaboost (average 96.53%), Haar+Adaboost (95.85%), LBP+Adaboost (91.92%), LBP+SVM (93.95%), Grid+SVM (94.89%).

B. Shape based Ethnicity Classification (3D Modality)

At this step, in order to be consistent with the experiment using facial texture images, we adopted the same experimental configuration. Figure 5 clearly illustrates that in facial range image based ethnicity classification, our approach achieves better results than any of the others. Even when the percentage of training samples is only 30%, its classification accuracy is still above 96%. These results demonstrate that OGMs outperform other well-known local features in discriminative power for shape representation.

C. Multi-modal Score Fusion

Table III displays the classification results when combining both the 2D and 3D modalities.
We can see that the joint use of the texture and geometry clues leads to better performance than either single modality. From Fig. 6, we observe a phenomenon similar to the previous experiment using only shape information: the proposed approach surpasses the other methods and remains robust as the gallery size varies.
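The score-level fusion used here is simply the sum rule of Eqs. (4) and (6); a minimal sketch (the function name is ours):

```python
def fuse_scores(scores_2d, scores_3d):
    """Eqs. (4) and (6): average the per-orientation strong-classifier
    scores within each modality, then average the two modality scores."""
    h2d = sum(scores_2d) / len(scores_2d)   # Eq. (4), 2D modality
    h3d = sum(scores_3d) / len(scores_3d)   # Eq. (4), 3D modality
    return (h2d + h3d) / 2.0                # Eq. (6); >= 0 -> Asian
```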

Fig. 5. Performances of different methods for 3D facial ethnicity classification on the FRGC v2.0 dataset: OGM+Adaboost (average 96.98%), Haar+Adaboost (96.27%), LBP+Adaboost (93.53%), LBP+SVM (93.07%), Grid+SVM (95.05%).

TABLE III
PERFORMANCES OF THE PROPOSED APPROACH BASED ON 2D, 3D AND BOTH MODALITIES FOR ETHNICITY CLASSIFICATION ON FRGC V2.0.

Percentage  2D Modality  3D Modality  Both Modalities
30%         95.56%       96.37%       97.22%
40%         95.89%       96.80%       97.62%
50%         96.34%       96.88%       97.63%
60%         97.00%       97.60%       98.06%
70%         97.07%       97.50%       98.17%
80%         97.27%       97.84%       98.26%

Fig. 6. Integrated performances of different approaches for facial ethnicity classification on FRGC v2.0: OGM+Adaboost (average 97.82%), Haar+Adaboost (97.44%), LBP+Adaboost (95.49%), LBP+SVM (95.38%), Grid+SVM (97%).

Fig. 7. Four regions of the face intercepted to estimate their contributions to ethnicity classification.

D. Importance Analysis of Facial Regions

Recall that different ethnic groups usually show local differences in certain facial regions. To investigate the importance of face areas, in this experiment, since all images are resized to a pre-defined scale, we roughly divide the facial images into four areas using fixed rectangular boxes placed according to positions in the facial images, each possessing a distinctive characteristic: eyes, nose, forehead, and mouth, as shown in Fig. 7. Then, we exploit the proposed approach to estimate the contribution of each region. Figure 8 demonstrates that among the four selected areas, the eyes and nose are more discriminative than the mouth and forehead in ethnicity classification. When the four facial regions are combined, performance improves, suggesting that each area has its own impact.
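Since all images share a fixed resolution, the four regions can be cut out with fixed rectangular boxes; a sketch in which the box coordinates are illustrative guesses, not the rectangles used in the paper, and images are assumed stored as 120 rows x 96 columns:

```python
import numpy as np

# Hypothetical fixed boxes (top, bottom, left, right) on 120x96 images;
# the actual rectangles used in the paper are not specified here.
REGIONS = {
    "forehead": (0, 30, 10, 86),
    "eyes":     (30, 55, 5, 91),
    "nose":     (45, 85, 30, 66),
    "mouth":    (85, 115, 25, 71),
}

def crop_regions(face, regions=REGIONS):
    """Cut the fixed rectangular boxes out of a resized face image."""
    return {name: face[t:b, l:r] for name, (t, b, l, r) in regions.items()}
```

Each cropped region is then fed through the same OGM + AdaBoost pipeline to measure its individual contribution.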
On the other hand, we also analyze the importance of each facial organ from a machine learning viewpoint by observing the set of OGM features selected by AdaBoost. As depicted in Fig. 9, the first five chosen features of each OGM of both modalities are mainly located in the eye and nose regions, highlighting the importance of these two areas. This conclusion accords with the previous analysis.

Fig. 8. Performances of different face regions (entire face, forehead, eyes, nose, mouth) using OGM+Adaboost for 3D facial ethnicity classification.

Fig. 9. The first five selected features for each OGM of both modalities.

E. Performances on the BU-3DFE Database

Table IV shows the results of the proposed method for the binary classification (Whites vs. Asians) on the BU-3DFE database. Since the samples employed from BU-3DFE in
the experiment are fewer than those of FRGC v2.0, the accuracy is slightly lower under the same experimental settings. However, the proposed method still achieves high accuracy with each single modality, i.e. texture and geometry (above 92% in all settings), improving as the number of training samples increases. When we finally combine the similarity measurements of the two modalities, the result is further improved, indicating that both clues contribute to the classification result.

TABLE IV
PERFORMANCES OF THE PROPOSED APPROACH BASED ON 2D, 3D AND BOTH MODALITIES FOR ETHNICITY CLASSIFICATION ON BU-3DFE.

Percentage  2D Modality  3D Modality  Both Modalities
30%         92.92%       92.92%       94.84%
40%         95.24%       94.44%       96.63%
50%         95.00%       96.38%       97.21%
60%         94.78%       97.48%       97.56%
70%         95.35%       97.50%       97.70%
80%         96.72%       97.32%       97.88%

V. CONCLUSION

This study proposed a novel approach to ethnicity classification that fuses boosted local texture and shape features extracted from 3D face models, in contrast to existing methods that depend only on 2D facial images. The proposed method makes use of Oriented Gradient Maps (OGMs) to highlight local geometry and texture variations of human faces, and further adopts AdaBoost to learn a compact set of discriminative features highly related to the ethnicity attribute for classification. Experiments are carried out on the FRGC v2.0 database, and the accuracy reaches 98.3% in distinguishing Asians from non-Asians when 80% of the face samples are exploited in the training stage, demonstrating the effectiveness of the proposed method.

REFERENCES

[1] X. Lu and A. Jain, "Ethnicity identification from face images," in Proc. SPIE Defense and Security Symposium, vol. 5404, 2004, pp. 114-123.
[2] Z. Yang and H. Ai, "Demographic classification with local binary patterns," in Proc. International Conference on Biometrics, 2007, pp. 464-473.
[3] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2001, pp. 511-518.
[4] T. Ojala, M. Pietikainen, and D. Harwood, "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions," in Proc. IAPR International Conference on Pattern Recognition, vol. 1, 1994, pp. 582-585.
[5] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.
[6] S. Hosoi, E. Takikawa, and M. Kawade, "Ethnicity estimation with facial images," in Proc. IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 195-200.
[7] G. Guo and G. Mu, "A study of large-scale ethnicity estimation with gender and age variations," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2010, pp. 79-86.
[8] L. Yin, X. Wei, Y. Sun, J. Wang, and M. Rosato, "A 3D facial expression database for facial behavior research," in Proc. IEEE International Conference on Automatic Face and Gesture Recognition, 2006, pp. 211-216.
[9] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 947-954.
[10] X. Lu, H. Chen, and A. Jain, "Multimodal facial gender and ethnicity identification," in Proc. IEEE International Conference on Biometrics, 2006, pp. 554-561.
[11] G. Zhang and Y. Wang, "Multimodal 2D and 3D facial ethnicity classification," in Proc. IEEE International Conference on Image and Graphics, 2009, pp. 928-932.
[12] G. Toderici, S. O'Malley, G. Passalis, T. Theoharis, and I. Kakadiaris, "Ethnicity- and gender-based subject retrieval using 3-D face-recognition techniques," International Journal of Computer Vision, vol. 89, no. 2, pp. 382-391, 2010.
[13] J. Lyle, P. Miller, S. Pundlik, and D. Woodard, "Soft biometric classification using periocular region features," 2010, pp. 1-7.
[14] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, "Oriented gradient maps based automatic asymmetric 3D-2D face recognition," in Proc. IEEE/IAPR International Conference on Biometrics, 2012, pp. 125-131.
[15] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in Proc. Computational Learning Theory, 1995, pp. 23-37.