Face Detection for Skintone Images Using Wavelet and Texture Features
1 H.C. Vijay Lakshmi, 2 S. Patil Kulkarni
S.J. College of Engineering, Mysore, India
1 vijisjce@yahoo.co.in, 2 pk.sudarshan@gmail.com

Abstract
Images with skin-tone regions pose new challenges in the face detection problem. This paper presents a novel detection algorithm that uses a combination of colour spaces and edge detection for segmentation, followed by texture feature analysis. Experimental results indicate improved false acceptance rates.

I. INTRODUCTION
Face detection and localization is the task of checking whether a given input image contains any human face and, if so, returning the location of each face in the image. A large number of factors govern the problem of face detection [9]. The long list of these factors includes pose, orientation, facial expression, the facial sizes found in the image, luminance conditions, occlusion, structural components, gender and ethnicity of the subject, the scene, and the complexity of the image's background. Faces appear totally different under different lighting conditions. A thorough survey of face detection research is available in [9]. In terms of applications, face detection and good localization form an important pre-processing step, and the first step, in online face recognition. Typical face detection is shown in Fig. 1a and Fig. 1b.

This paper is organized as follows. Section II briefly highlights the various challenges encountered while detecting and localizing faces in skin tone regions. The importance of preprocessing is highlighted in Section III. Texture features and their extraction are discussed in Section IV. The proposed algorithm is given in Section V along with results, followed by the conclusion in Section VI.

Fig. 1a: Input Image. Fig. 1b: Image with Detected Faces.

II. CHALLENGES
For the problem of face detection in colour images with complex scenes [4], the use of skin pixel properties for segmentation reduces the search space to a great extent. Most detection algorithms have considered images with non-skin-tone backgrounds, people wearing non-skin-tone dresses, and so on. If an image contains a skin tone background, the entire region is identified as skin region (Fig. 2a and Fig. 2b). When segmenting faces of people wearing skin-tone dresses (Fig. 2c and Fig. 2d) using skin pixel segmentation as a pre-processing technique, the entire body of the person in the skin-tone dress is detected as face region, and a further face localization step is therefore required. Overlapping face regions add further constraints while segmenting faces. Due to variation in illumination, skin regions may not be properly identified as skin during skin segmentation. Locating faces in these circumstances is more complex than localizing faces against a uniform, non-skin-tone background. The OpenCV face detection software [14], based on Viola-Jones [13], correctly identifies the face region in most images containing a single face, but it failed to detect multiple faces when tested on our dataset of images containing multiple faces with skin-tone backgrounds and dresses (Fig. 3).
Fig. 2a: Input Image with Skin Tone Background. Fig. 2b: Segmented Image Using HSI and YCbCr Combination. Fig. 2c: Input Image with Skin Tone Dress. Fig. 2d: Segmented Image Using HSI and YCbCr Colour Space. Fig. 3: Face Detection Using OpenCV.

III. PREPROCESSING
Preprocessing is the stage where the input image is processed to extract the regions of interest. The main task of this stage is to avoid processing regions that are not candidate regions for the algorithm and to eliminate them from further processing. Segmentation and feature extraction are the two important pre-processing steps that play a vital role in face detection and localisation.

Segmentation is a process that partitions an image into regions [5] of uniform texture and intensity. In the problem of face detection, segmentation based on skin plays a major role in identifying the candidate face regions [7]. Though there are different segmentation methods, segmentation based on colour is considered here. For a detailed survey of colour spaces refer to [8]. Segmentation of the input image is the step that contributes most to efficient detection and localization of multiple faces in skin tone colour images. Skin segmentation based on the skin pixel cue, combined with edges, helps to demarcate region boundaries and segment the image components efficiently [2]. Segmenting an image based on human skin chromaticity using different colour spaces identifies even pseudo-skin-like regions as skin; hence these pseudo skin regions need to be eliminated in a further step. Segmentation using a combination of colour spaces, combined with Canny and Prewitt edge detection to obtain the region boundaries, segments better [2]. For a detailed study and comparison of edge detection techniques refer to [6].

Feature extraction is an important step in pattern classification.
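As a minimal sketch of the colour-based skin segmentation discussed in this section, the following NumPy snippet classifies pixels by their YCbCr chromaticity. The Cb/Cr bounds (77–127 and 133–173) are commonly quoted illustrative values, not necessarily the exact thresholds of the proposed HSI + YCbCr segmenter, and the HSI cue and edge combination are omitted for brevity:

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Mark pixels whose YCbCr chromaticity falls in a typical skin range.

    The Cb/Cr bounds below are illustrative values from the literature,
    not necessarily the thresholds used by the paper's segmenter.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    # ITU-R BT.601 RGB -> CbCr conversion (full-range approximation)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

# Tiny example: one skin-like pixel and one saturated blue pixel
img = np.array([[[220, 170, 140], [0, 0, 255]]], dtype=np.uint8)
mask = skin_mask_ycbcr(img)
```

In the paper, such a skin mask is further intersected with Canny and Prewitt edge maps to demarcate region boundaries before candidate regions are extracted.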
Face detection is a two-class pattern classification problem: deciding whether a given test pattern belongs to the face class or the non-face class. The feature extraction stage is responsible for finding a set of prominent and robust attributes that help to correctly decide the object classification boundary.

IV. TEXTURE FEATURES
Texture is the local intensity pattern and one of the most important defining characteristics of an image. It is characterized by the spatial distribution of gray levels in a neighborhood. Texture features can be extracted using statistical, geometrical, model-based and signal processing methods. Texture analysis [11], classification and segmentation are of significance to researchers; refer to [11, 12] for detailed treatments. The first step of any texture analysis task is to extract texture features that most completely embody information about the spatial distribution of intensity variations in the textured image.

A Gray Level Co-occurrence Matrix (GLCM) is a square matrix whose elements correspond to the relative frequency of occurrence of pairs of pixel gray levels separated by a certain distance in a given direction. In other words, the GLCM tabulates how often different combinations of pixel brightness values (gray levels) occur in an image; the distribution in the matrix depends on the angular and distance relationship between pixels. After creating the GLCM, the matrix must be normalised before texture features are calculated. Once the GLCM has been created and normalised, various features can be computed from it. Haralick [10] proposed a number of useful texture features computable from the co-occurrence matrix; Table I lists some of them. Contrast is a measure of the intensity contrast between a pixel and its neighbourhood; it is 0 for a constant image. Energy measures uniformity and is also known as the angular second moment (ASM). Entropy is a measure of the randomness of the intensity image.
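To make the construction concrete, here is a minimal NumPy sketch of a normalised GLCM at distance 1 and angle 0°, together with a few of the Haralick features above (8 grey-level bins are used here for brevity; the experiments in this paper use 32):

```python
import numpy as np

def glcm(image, levels=8, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for pixel offset (dy, dx)."""
    # Quantise intensities into `levels` bins
    q = (image.astype(float) * levels / (image.max() + 1)).astype(int)
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()  # normalise counts to joint probabilities

def haralick(p):
    """Contrast, energy (ASM), entropy and homogeneity of a normalised GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)
    energy = np.sum(p ** 2)                          # angular second moment
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return contrast, energy, entropy, homogeneity

# Sanity check: a constant image has contrast 0 and energy 1
const = np.full((4, 4), 7, dtype=np.uint8)
contrast, energy, entropy, homogeneity = haralick(glcm(const))
```

The remaining features (correlation, dissimilarity) follow the same pattern over the index grids, and the proposed method averages each feature over four orientations.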
Correlation measures the joint probability of occurrence of the specified pixel pairs. Homogeneity measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal. An image with more occurrences of a particular colour configuration results in a higher value of entropy. Local homogeneity measures the similarity of pixels. Generally, the GLCM is computed over a small square window of size N centred at a pixel (x, y); the window is then moved one pixel at a time, in the same manner as a convolution kernel. Fine texture description requires small values of the distance d and/or a small window size, whereas coarse texture requires large values of d and/or a large window size. An average over all orientations is taken so that the features are rotation invariant. The number of gray levels in the image determines the size of the GLCM. By default, MATLAB's graycomatrix function scales the image to eight intensity levels, but in our experiments we have used 32 intensity bins.

V. PROPOSED ALGORITHM
In this paper, a combination of colour spaces (HSI and YCbCr) to identify the skin regions, combined with the Canny and Prewitt edge detection algorithms for better boundary separation, is used. Since not all skin-segmented regions are face regions, each segmented region has to be checked for the presence of a face; the following algorithm is proposed for this. The algorithm is implemented in two phases, a feature extraction phase and a testing phase.

A. Feature Extraction Phase
For feature extraction, images containing only the dominant facial features (eyes, nose and mouth), with variations in pose and expression, that fit into a window of size 88 x 80 are considered. We have used 35 such images for feature extraction.

Step-1: Wavelet decompose each histogram-equalized face image containing prominent facial features with slight variation in pose and expression; retain only the approximation image and discard the horizontal, vertical and diagonal details.
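The one-level decomposition of Step-1 can be sketched as follows. This is a minimal NumPy stand-in assuming the Haar basis (the paper does not state which wavelet is used); the approximation (LL) band of the orthonormal 2-D Haar transform is the sum of each 2x2 block divided by 2, and the three detail bands are simply never formed:

```python
import numpy as np

def haar_approximation(img):
    """One-level 2-D Haar DWT approximation (LL) band.

    Each output coefficient is the sum of a 2x2 input block divided by 2,
    i.e. the orthonormal Haar low-pass output applied along both axes;
    the horizontal, vertical and diagonal detail bands are discarded.
    """
    img = img.astype(float)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 2.0

# Example: a 4x4 ramp reduces to a 2x2 approximation
a = np.arange(16, dtype=float).reshape(4, 4)
ll = haar_approximation(a)
```

In the algorithm this is applied to each histogram-equalised 88 x 80 image, and only the LL band is carried forward to GLCM feature extraction.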
Step-2: Generate a 32x32 gray level co-occurrence matrix, with 32 gray level bins, at 4 different angles (0°, 45°, 90°, -45°) for the first-level approximation image of each training image considered for feature extraction.

Step-3: Calculate the Haralick features listed in Table I at all 4 angles, at a distance of 1, thus computing 4 values of each feature for each training image.

Step-4: For each texture feature, compute the average over the different angles.

Step-5: Compute the overall mean of each feature over the entire set of images considered for feature extraction.

B. Testing Phase
Step-1: Segment the input colour image using the algorithm proposed by the authors [4]. Select as probable face regions the segmented regions satisfying constraints such as aspect ratio, area greater than 800 pixels, and having holes due to the presence of eyes and mouth, and omit the other segmented regions.

Step-2: If the selected segmented region is larger than the size of a face with prominent facial features, use a sliding-window technique to locate the face within the region. We have considered a window of size 88 x 80. Place the fixed-size window on the selected segmented region. If the length of the region is larger than the window, slide the window horizontally; if the width is larger, slide it vertically; if both are larger, slide both horizontally and vertically.

Step-3: Wavelet decompose each histogram-equalized window content as the window slides across the selected segmented region. Retain only the approximation image and discard the details.
Step-4: Generate a 32x32 gray level co-occurrence matrix, with 32 gray level bins, at 4 different angles (0°, 45°, 90°, -45°) for the first-level approximation of each window content.

Step-5: Calculate the Haralick features listed in Table I at all 4 angles, at a distance of 1, thus computing 4 values of each texture feature; then follow Step-4 of the feature extraction phase.

Step-6: Using the city block distance measure, compute the distance from the mean value of each feature. To fix the similarity distance thresholds, run the data set used for feature extraction as a test set and fix the minimum and maximum thresholds T1 and T2. For window contents containing face images the distance lies between T1 and T2; the best match is found when the distance is close to the average of T1 and T2.

C. Results
In our experiments, 88 x 80 face images were used for extracting approximation-image texture features. Though six texture features (Energy, Entropy, Homogeneity, Dissimilarity, Contrast and Correlation) were computed, the Energy, Homogeneity and Entropy values were found to be consistent when classifying window contents as face or non-face in skin tone regions. In most cases all three measures were consistent; in a few cases either Energy and Entropy or Energy and Homogeneity matched the cutoff thresholds while correctly classifying the window contents as face. In [1], approximation coefficients were used without extracting texture features; the number of false acceptances was higher in skin tone regions, and there were false rejections in skin tone images with complex backgrounds. The numbers of false acceptances and false rejections were
reduced considerably when the edge strength of the second-level approximation was used as the feature set. In [3], there are few false rejections and the number of false acceptances is minimized; false acceptances are noticed only when approximation images produce similar edges at identical positions. In [1], due to the presence of structural components, varying pose and bright sunlight, the number of false rejections was higher when the similarity measure was computed using the city block distance. False rejections were reduced when the Bhattacharyya distance was used instead, but the Bhattacharyya distance is computationally expensive.

The proposed algorithm was tried on a number of group images with variations in illumination, pose and structural components. Cumulative false rejection and false acceptance counts, along with the time taken by the algorithm to classify the window content as face or non-face, are tabulated in Table II for various sliding window contents. If the faces in the segmented regions do not fit into the window completely, the input image is scanned at different resolutions and the segmented regions are then checked for probable face regions. The computation time required to test the given window contents with the proposed method is less than that required by the algorithm in [3].

The input image with skin tone background shown in Fig. 2a contains, after segmentation, two candidate face regions (Fig. 4a and Fig. 4b). When these segmented regions were tested with the face localization algorithm using edges [3], the edges produced at identical locations caused the window contents to be classified as faces (Fig. 4c, Fig. 4d, Fig. 4e and Fig. 5). The same window contents, when tested with the proposed algorithm, are correctly classified as non-faces.

Fig. 4a, Fig. 4b: Candidate face regions segmented from Fig. 2a. Fig. 4c, Fig. 4d, Fig. 4e, Fig. 5: Window contents misclassified as faces by the edge-based algorithm [3].

TABLE I: HARALICK FEATURES
Property (formulas as defined in [10]): Contrast, Correlation, Angular Second Moment (ASM)/Energy, Dissimilarity, Entropy, Homogeneity.

TABLE II
Method                               | Distance Measure | Windows Tested | Time Taken | False Positive | False Rejection
with Texture Features (proposed)     | Cityblock        | 1700           | 0.0226     | 21             | 20
with Edge Features [3]               | Cityblock        | 1700           | 0.0383     | 30             | 3
with Approximation Coefficients [1]  | Bhattacharyya    | 1500           | 0.0096     | 128            | 20
with Approximation Coefficients [1]  | Cityblock        | 1500           | 0.0094     | 44             | 38

VI. CONCLUSION
By using a combination of colour spaces along with Canny and Prewitt edge detection at the segmentation stage, and texture elements as features, improved false acceptance rates can be achieved. Compared to other approaches, a small improvement in false rejection can also be attained when detecting faces in images with skin-tone regions.

REFERENCES
[1] H.C. Vijaylakshmi, S. PatilKulakarni, "Face Localization and Detection Algorithm for Colour Images using Wavelet Approximations," International Conference ACVIT-09, 16th-19th December 2009, Aurangabad, India, pp. 305-311.
[2] H.C. Vijaylakshmi, S. PatilKulakarni, "Segmentation Algorithm for Multiple Face Detection for Color Images with Skin Tone Regions," 2nd International Conference on Signal Acquisition and Processing, Bangalore, February 9th-10th 2010, pp. 162-166.
[3] H.C. Vijaylakshmi, S. PatilKulakarni, "Face Detection and Localization in Skin Toned Color Images Using Wavelet and Edge Detection Techniques," accepted at International Conference on Advances in Recent Technologies and Computing, ARTCom 2010.
[4] M.A. Berbar, H.M. Kelash, and A.A. Kandeel, "Faces and Facial Features Detection in Color Images," Proceedings of Geometric Modeling and Imaging — New Trends (GMAI'06), 0-7695-2604-7/06.
[5] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Second Edition, Prentice Hall India.
[6] R. Maini and H. Aggarwal, "Study and Comparison of Various Image Edge Detection Techniques," International Journal of Image Processing (IJIP), Jan-Feb 2009, Vol. 3, Issue 1, pp. 1-11.
[7] S.L. Phung, A. Bouzerdoum, and D. Chai, "Skin Segmentation Using Colour and Edge Information," Proc. Int. Symposium on Signal Processing and its Applications, 1-4 July 2003, Paris, France, 0-7803-7946-2/03.
[8] V. Vezhnevets, V. Sazonov, and A. Andreeva, "A Survey on Pixel-Based Skin Color Detection Techniques," in Proceedings Graphicon-2003, pp. 85-92.
[9] M.-H. Yang, D.J. Kriegman, and N. Ahuja, "Detecting Faces in Images: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, pp. 34-58, 2002.
[10] R.M. Haralick, K. Shanmugam, I. Dinstein, "Textural Features for Image Classification," IEEE Transactions on Systems, Man and Cybernetics, Vol. 3, No. 6, 1973, pp. 610-621.
[11] M. Tuceryan and A.K. Jain, "Texture Analysis," in [12].
[12] The Handbook of Pattern Recognition and Computer Vision (2nd Edition), C.H. Chen, L.F. Pau, P.S.P. Wang (eds.), pp. 207-248, World Scientific Publishing Co., 1998.
[13] P. Viola and M. Jones, "Robust Real-Time Object Detection," International Journal of Computer Vision, 2001.
[14] S. Krishna, OpenCV face detection system based on Viola-Jones method, http://www.mathworks.com/matlabcentral/fileexchange/19912-open-cv-viola-jones-face-detection-in-matlab