ROBUST FACE DETECTION UNDER CHALLENGES OF ROTATION, POSE AND OCCLUSION

Phuong-Trinh Pham-Ngoc, Quang-Linh Huynh
Department of Biomedical Engineering, Faculty of Applied Science, Hochiminh University of Technology, Vietnam
huynhqlinh@hcmut.edu.vn

Abstract: Face detection has been an active research domain for decades because it can be applied in many fields such as human-machine interaction, surveillance, commercial applications and health care. In this paper, we propose an automatic face detection system based on human skin detection, natural properties of faces, and the classification strength of Local Binary Patterns (LBPs) and embedded Hidden Markov Models (eHMMs). We develop a skin color model that effectively reduces the skin-like colors that cause noise, giving better skin detection for different human races. Within the detected skin regions, natural properties of faces are used to discard non-face objects and retain the most reasonable face candidates. A flexible classification combining LBP histogram matching and eHMMs then determines whether each detected candidate is a face. The advantage of this classification is that it effectively reduces the impact of face rotation, pose and occlusion. Experiments show that our system robustly detects human faces in both video sequences and still images, with 93% correct detection over a variety of facial test databases drawn from different sources.

Keywords: Face detection, skin segmentation, Local Binary Patterns (LBPs), embedded Hidden Markov Models (eHMMs), face pose, face rotation, occlusion.

1 INTRODUCTION

With the rapid aging and development of society, the cost and importance of health care are increasing year by year. Many researchers have concentrated on creating robot-aided health care systems for elderly people, for child care, etc. These systems always require human-robot interaction (HRI), whose principal step is face detection.
The better the face detection system, the better the HRI. Face detection has also been investigated because of its wide applications such as surveillance, commercial applications, security, etc. Many face detection methods for images [1] have been published and have achieved encouraging results. However, face detection is still a challenging task due to variability in rotation, pose and occlusion. The face detection method implemented in OpenCV by Rainer Lienhart [2] is very similar to the state-of-the-art one published and patented by Paul Viola and Michael Jones [3]. Although this is one of the most popular and useful algorithms today, it still has difficulty detecting rotated, profile and occluded faces. This paper presents an improved face detection system to address these problems.

This paper is organized as follows. Human skin detection by the developed skin-color model is introduced in section 2. Section 3 describes face candidate localization based on natural properties of human faces. Section 4 presents a hybrid classification method to verify which face candidates are real faces. To reduce the influence of occlusion, the human face is divided into two parts, each of which gives one LBP histogram. The mixed histogram formed from these two histograms is regarded as the facial representation. A hybrid classification combining template matching and an appearance-based method is used to decide whether a face candidate is a face. It combines LBP histogram matching and eHMMs in a hierarchical classifier. The experiments are shown and discussed in section 5. Finally, we conclude in section 6.

2 HUMAN SKIN DETECTION

Skin color is important and powerful information for human faces. The use of color information can simplify the task of localization in complex environments. It allows fast processing and is highly
robust to geometric variations of the face pattern. For the skin detection task, many color spaces with different properties have been applied. Many researchers have achieved encouraging results with RGB, normalized rgb, HSI, YCrCb and RGB-space ratios. A survey of skin color detection can be found in [4]. However, there are many challenges in this task, such as different illumination conditions, human races, and skin-like colors.

We create an improved skin model in RGB space to obtain better skin detection for various human races. This skin model effectively reduces the skin-like colors that cause noise in the skin segmentation process. These colors can be yellow, white, orange, pink, red, the brown color of wood, and the yellow color of sand. Other skin models, such as those of Peer [5] and Lin [6], give good skin detection but still tend to retain these non-skin colors. The model of Lin often retains skin-like white, yellow, orange, sand and grey colors, as shown in Fig. 1(b). As shown in Fig. 1(c), the model of Peer tends to keep red and pink colors. This weakness sometimes becomes a serious problem, because too many non-skin regions are retained to determine the correct face regions.

Our approach is to build a skin classifier that explicitly defines the boundaries of the skin cluster in RGB space. This method is simple and leads to a fast classifier. The decision rule of our skin model is

\[ \delta(P(x,y)) = \begin{cases} 1, & \text{if a set of conditions is satisfied} \\ 0, & \text{otherwise} \end{cases} \tag{1} \]

where $P(x,y)$ is a pixel of the color image and the set of conditions is listed in Table 1.

Table 1: A set of conditions defining skin pixels. These conditions should be satisfied simultaneously.
R            (R-G)       (G-B)       B            G
[70,85]      [30,55]     [-5,35]     [20,255]     [30,255]
[86,100]     [30,60]     [-5,40]     [30,255]     [40,255]

R            (R-G)       (G-B)       (R-B)        (R+B-2G)     G
[101,150]    [0,30]      [-10,45]    [15,75]      [-15,285]    -
             [31,75]     [-5,90]     [-255,120]   [-20,285]    [50,255]

R            (R-G)       (G-B)       (R-B)        (R+B-2G)     B
[151,200]    [15,20]     [-5,40]     [20,255]     [-20,285]    -
             [31,85]     [-15,70]    [20,255]     [0,285]      [40,255]

R            (R-G)       (G-B)       (R+B-2G)
[201,255]    [5,25]      [40,70]     [-30,285]
             [26,100]    [0,70]      [-15,285]

In fact, the strongest component among R, G and B decides the color. For skin color, the R component is generally the strongest one because human skin expresses the color of blood. A color is not a skin color if the differences between R, G and B are too large or too small, or if the R value is smaller than 70. The level of red therefore affects the decision rule of our skin model. We divide the R component into five main ranges, all with values of at least 70, and adjust the admissible differences between R, G and B according to these ranges. The proposed skin model gives better skin detection for different human races in various environments. Several skin detection results shown in Fig. 1 illustrate the advantage of our skin model compared to the others.

Figure 1: Comparison of skin detection results: (a) original color image; (b), (c) and (d) skin detection results from the skin color models of Lin, Peer and the proposed skin model, respectively.

3 FACE CANDIDATE LOCALIZATION

An overview of the proposed face detection system is given in Fig. 2.

Color Image (Frame) -> Human Skin Detection -> Face Candidate Localization based on Natural Properties of Faces -> Face Candidates -> Hybrid Classification based on LBP Histogram Matching and eHMMs -> Detected Faces

Figure 2: Face detection system.
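As an illustration of the skin decision rule of Eq. (1) in section 2, the following minimal Python sketch encodes the ranges of Table 1 as they are read here. Several points are assumptions rather than statements from the paper: each table row is treated as one conjunction of inclusive ranges, a pixel is accepted if any row matches, and "-" means no constraint. The names `RULES` and `is_skin` are hypothetical.

```python
# Sketch of the rule-based skin classifier of Eq. (1) / Table 1.
# Assumptions: ranges are inclusive, each row is a conjunction of conditions,
# and a pixel is skin if at least one row is satisfied.

# Each entry: (R range, {derived quantity: (low, high)})
RULES = [
    ((70, 85),   {"rg": (30, 55), "gb": (-5, 35), "b": (20, 255), "g": (30, 255)}),
    ((86, 100),  {"rg": (30, 60), "gb": (-5, 40), "b": (30, 255), "g": (40, 255)}),
    ((101, 150), {"rg": (0, 30), "gb": (-10, 45), "rb": (15, 75), "rb2g": (-15, 285)}),
    ((101, 150), {"rg": (31, 75), "gb": (-5, 90), "rb": (-255, 120),
                  "rb2g": (-20, 285), "g": (50, 255)}),
    ((151, 200), {"rg": (15, 20), "gb": (-5, 40), "rb": (20, 255), "rb2g": (-20, 285)}),
    ((151, 200), {"rg": (31, 85), "gb": (-15, 70), "rb": (20, 255),
                  "rb2g": (0, 285), "b": (40, 255)}),
    ((201, 255), {"rg": (5, 25), "gb": (40, 70), "rb2g": (-30, 285)}),
    ((201, 255), {"rg": (26, 100), "gb": (0, 70), "rb2g": (-15, 285)}),
]


def is_skin(r, g, b):
    """Return 1 if the RGB pixel satisfies one row of Table 1, else 0."""
    features = {
        "rg": r - g,              # (R-G)
        "gb": g - b,              # (G-B)
        "rb": r - b,              # (R-B)
        "rb2g": r + b - 2 * g,    # (R+B-2G)
        "g": g,
        "b": b,
    }
    for (r_low, r_high), conditions in RULES:
        if not (r_low <= r <= r_high):
            continue
        if all(low <= features[name] <= high
               for name, (low, high) in conditions.items()):
            return 1
    return 0
```

Any pixel with R below 70 is rejected by construction, which matches the statement that a color with R smaller than 70 is not a skin color.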
After skin segmentation by our skin-color model, as shown in Fig. 3(a), we label connected skin regions and erase regions whose areas are smaller than a threshold. In our experiments this threshold is 108 pixels, taken as half of the smallest face to be detected. We call this step reducing small noise, as shown in Fig. 3(b). It removes unreasonable candidates and improves processing speed. Moreover, skin segmentation is generally affected by illumination conditions, so some face regions may be lost. We recover those regions by labeling the connected non-skin regions inside each skin region and changing them to skin. This process ignores non-skin regions that connect directly to the boundary of their skin region and those whose areas are greater than selected thresholds. An example of this step is shown in Fig. 3(c). This step also makes our system more robust under different illumination conditions.

Figure 3: Several examples of steps in face candidate localization, panels (a) to (f).

Skin regions that have the properties of human faces are preserved by covering their areas with skin ellipses. Because human faces are nearly elliptic, we use an area condition on the special ellipse region defined in Fig. 4 to check this property. For each skin region, the area condition for ellipse region E is

\[ \sum_{i} P_E(i) \ge T \cdot H \cdot W \tag{2} \]

where $P_E(i)$ is a binary value which equals 1 if skin pixel $i$ belongs to the inner region of ellipse E, $T$ is a selected constant, and $H$ and $W$ are the height and width of the skin region rectangle.

Figure 4: Skin region rectangle with special ellipse E.

These steps are important for our face candidate localization because correct faces could otherwise be lost under the strong separation scheme introduced next. In some cases, human faces in images are connected to each other or to other things such as a hand or an arm. This is one of the challenges of the face detection task.
Some researchers have tried various ways to locate faces, such as using the Hough transform to find elliptic skin regions that are considered face candidates [7]. However, these methods are prone to fail because the real shapes of human faces change when such connections occur. We use histograms of skin pixel intensity to separate connected objects. For each skin region rectangle, we calculate the horizontal and vertical histograms of skin pixel intensity. In these histograms, concave regions appear at the junctions between different parts. We set all histogram values in those concave regions to zero to divide the parts, according to Eq. (3),

\[ h(i) = \begin{cases} h(i), & h(i) \ge t \\ 0, & \text{otherwise} \end{cases} \tag{3} \]

where $i$ is the $i$-th bin of the horizontal or vertical histogram, $h(i)$ is the histogram value of the $i$-th bin and $t$ is a selected coefficient.

After dividing connected objects, we reject non-face skin regions by several geometric conditions, as shown in Eq. (4),

\[ \delta(H, W, S) = \begin{cases} 1, & S \ge \text{threshold and } \{(H \le W \le 3H) \lor (W \le H \le 4W)\} \\ 0, & \text{otherwise} \end{cases} \tag{4} \]

where $H$ and $W$ are the height and width of the skin region rectangle and $S$ is the number of skin pixels belonging to the skin region rectangle. An example of skin ellipses and the separation process is shown in Fig. 3(d), and one of reducing non-face regions is displayed in Fig. 3(e). Finally, as shown in Fig. 3(f), we obtain the most promising face candidates, which are used in the recognition step. The next section presents how our system decides which face candidates are faces and which are not.
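The separation rule of Eq. (3) and the geometric filter of Eq. (4) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the histogram is taken to be the count of skin pixels per column of a binary mask, the comparisons are assumed to be non-strict, and the function names and the `min_pixels` parameter are hypothetical.

```python
def column_histogram(mask):
    """Count skin pixels (1s) in each column of a binary mask (list of rows)."""
    return [sum(column) for column in zip(*mask)]


def split_by_histogram(hist, t):
    """Eq. (3): zero every bin below t, then return the thresholded histogram
    and the (start, end) index runs of the remaining non-zero bins."""
    cut = [h if h >= t else 0 for h in hist]
    runs, start = [], None
    for i, h in enumerate(cut):
        if h > 0 and start is None:
            start = i                      # a new part begins
        elif h == 0 and start is not None:
            runs.append((start, i - 1))    # the current part ends
            start = None
    if start is not None:
        runs.append((start, len(cut) - 1))
    return cut, runs


def is_face_like(height, width, skin_pixels, min_pixels):
    """Eq. (4): keep a region if it has enough skin pixels and a plausible
    aspect ratio (at most 3x wider than tall, or at most 4x taller than wide)."""
    shape_ok = (height <= width <= 3 * height) or (width <= height <= 4 * width)
    return 1 if skin_pixels >= min_pixels and shape_ok else 0
```

For example, a histogram `[5, 9, 8, 2, 1, 7, 9, 6]` with `t = 3` has its two low bins zeroed, which splits the region into the bin runs `(0, 2)` and `(5, 7)`. Applying the same function to row counts gives the vertical split.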
4 A HYBRID CLASSIFICATION

We use a hybrid method for recognizing objects. The hybrid method of our system combines template matching and appearance-based methods. It is a hierarchical classifier, shown in Fig. 5, applied to each face candidate to determine whether it is a human face.

Figure 5: Hierarchical classifier scheme.

4.1 Face and Non-face Class Selection

To use template matching and appearance-based methods, we first have to create the face database for training. Face detection would be an easy problem if we had clear models of the face and non-face classes. However, it is difficult to model the non-face class because anything that is not a face belongs to it. In our method, we collect 200 frontal and profile face images to create the face samples. To create the non-face class, we choose three main non-face objects: arm (50 samples), hand (50 samples) and noise (100 samples). All samples are color images of size 72x93. In our experiments, those samples are enough to represent the face and non-face classes.

4.2 A Modified Local Binary Patterns for Face Representation

The human face is a near-regular texture pattern generated by facial components and their configurations. Considering facial components such as eyebrow, eye, pupil, nose and face boundary, we select eight main spatial templates, shown in Fig. 6, to preserve the shape information of facial components.

Figure 6: Eight main spatial templates, labeled (0) to (7).

With only those spatial templates, we can describe all facial components; for example, a union of templates d, b and c can describe an eyebrow. However, we combine this spatial information with local texture information to improve the capacity for describing faces. Instead of comparing the central pixel $P_C$ with each single neighborhood pixel, as the original LBP operator does [8], our method compares the central pixel $P_C$ with each pair of neighborhood pixels $(P_{i1}, P_{i2})$ given by the spatial templates. Eight spatial templates form the eight binary digits of the mLBP number.
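The pairwise comparison described above can be sketched in Python as follows. This is a hedged illustration only. The actual pixel pairs of the eight templates in Fig. 6 are not recoverable from the text, so `TEMPLATES` below is a placeholder that pairs adjacent neighbors of the 3x3 neighborhood. Requiring the central pixel to be strictly greater than both pixels of a pair is also an assumption, and all names are hypothetical.

```python
# Offsets (dy, dx) of the 8 neighbors of a pixel, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

# PLACEHOLDER templates: each template is a pair of adjacent neighbors.
# The real pairs are defined by the spatial templates of Fig. 6.
TEMPLATES = [(OFFSETS[k], OFFSETS[(k + 1) % 8]) for k in range(8)]


def mlbp_code(img, y, x):
    """Return the 8-bit mLBP code of interior pixel (y, x) of a grayscale image.

    Bit i is set when the central pixel is greater than both pixels of
    template i (assumed conjunction of the two comparisons)."""
    center = img[y][x]
    code = 0
    for i, ((dy1, dx1), (dy2, dx2)) in enumerate(TEMPLATES):
        if center > img[y + dy1][x + dx1] and center > img[y + dy2][x + dx2]:
            code |= 1 << i   # weight 2^i for the i-th binary digit
    return code


def mlbp_image(img):
    """Compute mLBP codes for all interior pixels (the 1-pixel border is skipped)."""
    height, width = len(img), len(img[0])
    return [[mlbp_code(img, y, x) for x in range(1, width - 1)]
            for y in range(1, height - 1)]
```

With eight templates each contributing one bit, the code always lies in the range 0 to 255, whatever pixel pairs are substituted for the placeholder.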
With eight binary digits, the mLBP operator therefore produces 256 different values. Equation (5) gives the computation of the mLBP number,

\[ \mathrm{mLBP} = \sum_{i=0}^{7} S_i(x)\, 2^{i} \tag{5} \]

where $S_i$ is the $i$-th binary digit of the mLBP number,

\[ S_i(x) = \begin{cases} 1, & (P_C > P_{i1}) \land (P_C > P_{i2}) \\ 0, & \text{otherwise} \end{cases} \tag{6} \]

In fact, mLBP gives us information about both local shapes, through the eight spatial templates, and local textures. We thus retrieve more information to represent face patterns effectively.

4.3 Mixed mLBP Histogram Matching

We use the histogram of mLBP coefficients to represent a face. If we used a single mLBP histogram for the whole face candidate image, occlusion would seriously affect the template-matching algorithm. To reduce the impact of occlusion, we divide the human face into two parts: the upper part from the nose up to the forehead and the lower part from the nose down to the neck. We calculate an individual histogram for each part and concatenate them to create one mixed 255x2-bin histogram representing the face candidate image. In this way, we effectively reduce the influence of occlusion.

Given an image $I$, its mixed mLBP histogram is denoted by $H^{mLBPmix}(I)$. We adopt an error measurement because it is simple and fast to compute. The distance measurement is defined as

\[ D\big(H^{mLBPmix}(I_1), H^{mLBPmix}(I_2)\big) = \sum_{i=1}^{n} \left| H_i(I_1) - H_i(I_2) \right| \tag{7} \]

where $H^{mLBPmix}(I_1)$ and $H^{mLBPmix}(I_2)$ are two mixed mLBP 255x2-bin histograms, and $n$ is the number of bins. Given a face database with $m$ samples, for any sample $P$, we convert it from a color image to a grayscale one and define its histogram-matching feature as the average distance to the face training samples as follows:
\[ \xi_{face}(P) = \frac{1}{m} \sum_{i=1}^{m} D\big(H(P), H(X_i)\big) \tag{8} \]

where $X_i$ is a face training sample. This histogram-matching feature can discriminate between face and non-face patterns. Figure 7(a), which shows the distribution of the distance measure over 156 face samples (positive) and 121 non-face samples (negative), demonstrates this property.

Figure 7: Distribution of distance measurement: (a) distribution of $\xi_{face}$, (b) distribution of $D$.

With the feature $\xi_{face}$, we use thresholds called $T_{face}$ to classify face and non-face objects. In our experiments, if $\xi_{face}$ is smaller than 1800, the face candidate can be considered a face with 99% correct detection. If $\xi_{face}$ is larger than 3500, the face candidate is almost certainly not a human face. Only when $\xi_{face}$ lies in the range [1800, 3500] is it still hard to say whether the face candidate is a face. We improve face detection in this range with the following feature. With the non-face database, for any sample $P$, we also define its histogram-matching feature as the minimum of the three average distances to the training samples of the three non-face objects, given by

\[ \xi_{nonface}(P) = \min\big(\xi_{arm}(P), \xi_{hand}(P), \xi_{noise}(P)\big) \tag{9} \]

where $\xi_{arm}$, $\xi_{hand}$ and $\xi_{noise}$ are calculated following (8). The difference between $\xi_{nonface}$ and $\xi_{face}$, shown in Fig. 7(b), also discriminates between face and non-face patterns. We call this difference $D$, given by

\[ D(P) = \xi_{nonface}(P) - \xi_{face}(P) \tag{10} \]

We use $D$ to improve the face detection rate when $\xi_{face}$ is in [1800, 3500]. We define explicit thresholds $T_D$ for $D$ to distinguish face and non-face patterns. We specify matching conditions for both $\xi_{face}$ and $D$ and use them jointly as the first two matching stages of the hierarchical classifier. After those two stages, the face detection rate can reach over 80%. To increase the performance of our system, eHMMs are used as the last step to check the face candidates that do not satisfy the first two template-matching stages and to give the final decision.

4.4 Embedded Hidden Markov Models

In our algorithm, we define the non-face class as three different sub-classes: arm, hand, and noise. Face detection thus becomes a four-class pattern classification problem. eHMMs [9] perform pattern recognition for this four-class problem by finding the class with the maximum likelihood for the candidate object. Given training sets of positive and negative samples, we obtain four eHMM models corresponding to the four classes: face, arm, hand and noise. A face candidate that was left undecided by the first two stages of the face detection system is checked by the eHMMs. The candidate is finally rejected if the eHMM stage assigns it to a non-face class.

Figure 8: Scheme of the eHMMs algorithm used in face detection.

If faces appear in the image, the result of our face detection system is the extracted human faces, as shown in Fig. 9.

Figure 9: An example result of the face detection system.

5 EXPERIMENTS

To evaluate our system, we built both static and video sequence facial databases. The static database includes 500 color images in total from different sources: the Caltech face database, the Smugmug image library, family photos and the World Wide Web. The video sequence database was captured by different cameras and extracted from several movies such as Harry
Potter, etc. Both databases contain multi-face images with frontal and profile faces under variations in rotation, position, size, and facial expression.

With the proposed skin model, our system effectively detects multiple faces with various skin tones, facial expressions, and sizes. It can also detect profile faces at angles of about or even more than 90 degrees under complex backgrounds. It also successfully detects occluded faces when the occlusion covers less than 50% of the face region. Our system can detect rotated faces as well, but it may fail to detect horizontal faces because the mixed LBP histogram matching assumes an upper and a lower face part. Several examples are shown in Fig. 10. The summary results for those problems are given in Table 2. Overall, the correct face detection rate of our proposed system is 93%, which indicates that the hybrid method is more effective than mLBP histogram matching or eHMMs used alone, as shown in Table 3. The speed of our system is about 5 fps for images of size 320x240, so this face detection method can be applied in real HRI applications.

Figure 10: Several face detection results under multi-appearance, rotation, pose and occlusion.

Table 2: Face detection results under rotation, occlusion, connection and pose.

                          Rotation   Occlusion   Pose
Face detection rate (%)   93         75          92

Table 3: Comparison of face detection results between different methods.

                     Proposed algorithm   mLBP histogram matching   eHMMs
Detection rate (%)   93                   80                        64

6 CONCLUSIONS

In this paper, we proposed an improved color-based face detection method. The contributions of our paper are a boosted skin-color model that reduces more noise to obtain more reasonable face candidates, improved LBP histogram matching to overcome the problem of rotation, and a hybrid classifier to reduce the effect of occlusion and pose. In particular, our system can detect profile faces with a pose angle of about 90 degrees. Our hybrid method shows better face detection capacity than eHMMs or LBP histogram matching used separately. In future work, we plan to apply and extend this system to the face recognition task.

ACKNOWLEDGEMENTS

The authors would like to thank the Department of Engineering Physics, Hochiminh University of Technology, for supporting this work.

REFERENCES

1. M. H. Yang, D. J. Kriegman and N. Ahuja: Detecting Faces in Images: A Survey. IEEE Trans. on PAMI, vol. 24, no. 1, pp. 34-58, 2002.
2. R. Lienhart and J. Maydt: An Extended Set of Haar-like Features for Rapid Object Detection. Proc. of the IEEE Conf. on Image Processing (ICIP 02), 2002.
3. P. Viola and M. Jones: Robust Real-time Object Detection. International Journal of Computer Vision, 2002.
4. V. Vezhnevets, V. Sazonov and A. Andreeva: A Survey on Pixel-based Skin Color Detection Techniques. Proc. Graphicon, pp. 85-92, 2003.
5. P. Peer, J. Kovac and F. Solina: Human Skin Colour Clustering for Face Detection. EUROCON, 2003.
6. C. Lin and K. C. Fan: A Color-Triangle-Based Approach to the Detection of Human Face. BMCV, vol. 1811, pp. 359-368, May 2000.
7. R. Seguier: A Very Fast Adaptive Face Detection System. International Conference on Visualization, Imaging and Image Processing, 2004.
8. L. G. Shapiro and G. C. Stockman: Computer Vision. Prentice Hall, New Jersey, 2001.
9. A. Nefian and M. Hayes: Face Recognition using an Embedded HMM. Proc. Audio and Video-based Biometric Person Authentication, pp. 19-44, 1999.