Fast Feature Value Searching for Face Detection


Computer and Information Science, Vol. 1, No. 2, May 2008

Yunyang Yan
Department of Computer Engineering, Huaiyin Institute of Technology, Huai'an 223001, China
E-mail: areyyyke@163.com

Zhibo Guo
School of Information Engineering, Yangzhou University, Yangzhou 225009, China
E-mail: zhibo_guo@163.com

Jingyu Yang
School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
E-mail: jingyuyang@mail.njust.edu.cn

This research is supported in part by the NSFC under grant 60632050, the High School Technology Fund of Jiangsu Province under grant 06KJD520024, and the Technology Fund of Huai'an under grants HAG05053 and HAG07063.

Abstract

Training a face detector with the AdaBoost algorithm costs a great deal of time. Two improved face detection algorithms, Rank-AdaBoost, based on feature-value division, and Dual-AdaBoost, based on a dual threshold, are proposed to accelerate training and improve detection performance. In the improved AdaBoost, the feature values of each Haar-like feature are rearranged into a fixed number of ranks. Because the number of ranks is much smaller than the number of training samples, the time spent testing each training sample is greatly reduced compared with the original AdaBoost algorithm. An inheriting cascade framework is also proposed. Experimental results on the MIT-CBCL face/non-face training set show that the improved algorithm makes the training process converge quickly, with a training time only about 1/50 of the original. Experimental results on the MIT+CMU face set also show that both detection speed and accuracy are better than those of the original method.

Keywords: Rank-AdaBoost, Feature value division, Dual-AdaBoost, Face detection, Inheriting cascade

1. Introduction

Because of its many interesting applications, automatic face detection has received considerable attention from researchers in many fields, such as content-based image retrieval, video coding, video conferencing, crowd surveillance, and intelligent human-computer interfaces.
Many methods have been proposed to detect faces in a gray or color image, such as template matching, mosaic images, geometrical face models, difference pictures, snakes, deformable templates, and statistical skin color models. More recently, methods based on statistical learning algorithms have attracted increasing attention, including PCA, artificial neural networks, support vector machines, and Bayesian discriminant features. These methods show good detection precision, but their detection speed needs to be improved (Liang Luhong, 2002, pp. 449-458).

The first real-time face detector was proposed by Viola and Jones (Viola and Jones, 2001; 2004, pp. 137-154). They described a face detection framework capable of processing images extremely rapidly while achieving high detection rates. The framework makes three key contributions. The first is a new image representation called the integral image, which allows the features used by the detector to be computed very quickly. The second is a simple and efficient classifier, built with the AdaBoost learning algorithm, which selects a small number of critical visual features from a very large set of potential features. The third is a method for combining classifiers in a cascade, which allows background regions of the image to be discarded quickly so that computation focuses on promising face-like regions. Simple Haar-like features are extracted and used as weak classifiers. The Viola and Jones frontal face detector runs at about 15 frames per second on 320 by 240 images. As in the work of Rowley (Rowley et al., 1998, pp. 23-38) and Schneiderman (Schneiderman, 2000, pp. 746-751), Viola and Jones (Viola and Jones, 2003) built a multi-view face detector with AdaBoost to handle profile views and rotated faces.

However, one problem with these approaches is that there are too many Haar-like features in a single face image. Another difficulty is that a great many non-face training samples are needed to reach good performance. The large training set not only slows down training but also greatly increases the number of weak classifiers in the cascaded detector, so on every round AdaBoost must search a large pool of candidate weak classifiers and the computation is very complex. To obtain a more efficient detector, improved AdaBoost algorithms, called Rank-AdaBoost and Dual-AdaBoost and based on feature-value division, are proposed to speed up training and improve detection performance. First, for each Haar-like feature, the distribution of its values over all face samples is divided into a fixed number of ranks, and this small set of values, instead of all feature values of the face samples, is used as the candidate thresholds during training. Second, a fast dual-threshold search is developed for the enhanced AdaBoost algorithm, which makes the training process faster and the detection accuracy higher.
In training the cascaded detector, the earlier classifiers are transferred to the later ones, so that more non-faces are rejected. Both computational speed and performance are clearly improved by this approach. Experimental results on the MIT-CBCL face set and the MIT+CMU face set show that our method yields higher classification performance than Viola and Jones' in both training speed and detection accuracy.

2. AdaBoost using Haar-like features

2.1 Haar-like features and the integral image

Haar-like features are simple rectangle features proposed by Viola et al., as shown in Figure 1, where the squares represent a face image. The value of each Haar-like feature is computed as the difference between the pixel sums over the white and black regions, and it describes a local gray-level property of the image. Using a parity and a threshold on this value, a class is predicted. Viola et al. use three kinds of features, which differ in their division into two, three, or four rectangular areas; rotating these three types easily generates other kinds of features. Every feature is characterized by its position in the face frame, its pre-specified size, and its type. Given that the base resolution of the classifier is 24 by 24 pixels, the exhaustive set of rectangle filters is quite large, over 100,000, which is roughly O(N^4) where N = 24 (i.e., the number of possible locations times the number of possible sizes). The actual number is smaller, since filters must fit within the classification window. Computation of rectangle filters can be accelerated using an intermediate image representation called the integral image. With this representation, any rectangle filter, at any scale or location, can be evaluated in constant time. The integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y).
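The integral image construction and constant-time rectangle sums described above can be sketched in Python. This is a minimal illustration, not the paper's implementation; the function names and the zero-padded border layout are our own choices.

```python
def integral_image(img):
    """Build the integral image II, where II[y][x] holds the sum of all
    pixels above and to the left of (x, y).  A padding row and column of
    zeros stands in for the boundary conditions S(x, -1) = II(-1, y) = 0."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # cumulative row sum S(x, y)
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y),
    using exactly four array references."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Value of a horizontal two-rectangle Haar-like feature:
    left (white) half minus right (black) half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```

On any image, `rect_sum` reproduces the direct pixel sum of a sub-rectangle while touching only four entries of the integral image, which is what makes feature evaluation at every scale and location cheap.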
Given an original gray image I, its integral image II is defined as:

    II(x, y) = Σ_{i ≤ x, j ≤ y} I(i, j)

The integral image can be computed in one pass over the original image using the following pair of recurrences:

    S(x, y) = S(x, y - 1) + I(x, y)
    II(x, y) = II(x - 1, y) + S(x, y)

where S(x, y) is the cumulative row sum, S(x, -1) = 0, and II(-1, y) = 0. Using the integral image, the value of a Haar-like feature can be computed with a few additions and subtractions. Any rectangular sum can be calculated in four array references, so the difference between two rectangular sums can be calculated in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums, they can be computed in six array references, and three- and four-rectangle features in eight and nine references respectively. In Figure 2, (d) is the integral image corresponding to the image (a) and the feature (b); (c) shows the feature on the image. The simple expression p4 + p1 - p2 - p3 - (p6 + p3 - p4 - p5) gives the value of the feature, saving much computing time.

2.2 AdaBoost algorithm

In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple learning

algorithm (e.g., it might be used to boost the performance of a simple perceptron). It does this by combining a collection of weak classification functions to form a stronger classifier. In the language of boosting, the simple learning algorithm is called a weak learner. So, for example, the perceptron learning algorithm searches over the set of possible perceptrons and returns the perceptron with the lowest classification error. The learner is called weak because we do not expect even the best classification function to classify the training data well (i.e., for a given problem the best perceptron may only classify the training data correctly 51% of the time). In order for the weak learner to be boosted, it is called upon to solve a sequence of learning problems. After the first round of learning, the examples are re-weighted to emphasize those that were incorrectly classified by the previous weak classifier. The final strong classifier takes the form of a perceptron: a weighted combination of weak classifiers followed by a threshold,

    f(x) = Σ_{t=1}^{T} α_t h_t(x)

where f(x) is the final strong classification function and (h_1(x), h_2(x), ..., h_T(x)) is the series of weak classifiers. Each h_t(x) can be thought of as one feature with a threshold. The original form of AdaBoost, denoted Init-AdaBoost, is as follows:

1) Given example images (x_1, y_1), ..., (x_L, y_L), where y_i ∈ {1, 0} indicates positive or negative examples, and g_j(x_i) is the j-th Haar-like feature of the i-th example x_i.

2) Initialize the weights: w_{1,i} = 0.5/m if i ≤ m, and 0.5/n otherwise, where m and n are the numbers of positive and negative examples respectively, and L = m + n.

3) For t = 1, ..., T:
   a. Normalize the weights: w_{t,i} = w_{t,i} / Σ_{j=1}^{L} w_{t,j}
   b. For each feature j, train a weak classifier h_j and evaluate its error ε_j with respect to w_t:
          ε_j = Σ_i w_{t,i} |h_j(x_i) - y_i|
      where h_j(x) = 1 if p_j g_j(x) < p_j θ_j and 0 otherwise, with p_j ∈ {1, -1} a parity bit and θ_j a threshold.
   c. Choose the classifier h_t with the lowest error ε_t.
   d. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1-e_i}, where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t).

4) The final classifier is:

    H(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ 0.5 Σ_{t=1}^{T} α_t, and 0 otherwise

where α_t = log(1/β_t).

2.3 Cascaded detector

In an image, most sub-images are non-face instances. To greatly improve computational efficiency and also reduce the false positive rate, a sequence of gradually more complex classifiers, called a cascade, is built. An input window is evaluated on the first classifier of the cascade; if that classifier returns false, computation on that window ends and the detector returns false. If the classifier returns true, the window is passed to the next classifier in the cascade, which evaluates it in the same way. If the window passes through every classifier with all of them returning true, the detector returns true for that window. The more a window looks like a face, the more classifiers are evaluated on it and the longer it takes to classify. Since most windows in an image do not look like faces, most are quickly discarded as non-faces. Figure 3 depicts the cascade. A cascaded detector makes it possible to use small, efficient classifiers to reject many negative examples at an early stage while detecting almost all positive instances; the classifiers used at successive stages to examine the difficult cases become more and more complex.
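The window-evaluation logic of the cascade described above can be sketched as follows. This is an illustrative sketch only: the stage representation, a list of (weight, weak classifier) pairs plus a stage threshold, is our own simplification of the detector.

```python
def cascade_detect(window, cascade):
    """Evaluate one sub-window against a cascade of stage classifiers.
    Each stage is a (weak_classifiers, stage_threshold) pair, where
    weak_classifiers is a list of (alpha, h) pairs and h maps a window
    to 0 or 1.  A window is rejected as non-face the moment any stage
    returns false, so most non-face windows are discarded early."""
    for weak_classifiers, stage_threshold in cascade:
        score = sum(alpha * h(window) for alpha, h in weak_classifiers)
        if score < stage_threshold:
            return False   # rejected: stop evaluating further stages
    return True            # passed every stage: report a face
```

A typical non-face window fails within the first stage or two, which is what makes the cascade fast on average even though face-like windows pass through many stages.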

Since easily recognizable non-face images are rejected in the early stages, subsequent classifiers are trained only on examples that pass through all the previous classifiers. That is, classifiers in the later stages of the cascaded detector can be trained rapidly on the harder, but smaller, part of the non-face training set. This detection approach therefore saves computational cost while maintaining performance. Stages in the cascade are constructed by training classifiers with AdaBoost. The cascaded detector is trained as follows:

1) Input: allowed false positive rate f and detection rate d per layer; target overall false positive rate F_target. P denotes the set of positive examples, N the set of negative examples, and n_i the number of weak classifiers in the i-th layer classifier.

2) F_0 = 1, D_0 = 1, i = 0.

3) While F_i > F_target:
   a. i = i + 1, n_i = 0, F_i = F_{i-1}
   b. While F_i > f · F_{i-1}: increase n_i by 1; use P and N to train an i-th layer classifier with n_i weak features using AdaBoost; evaluate the current cascaded classifier on a validation set to determine F_i and D_i; decrease the threshold of the i-th classifier until the current cascaded classifier has a detection rate D_i of at least d · D_{i-1}, then evaluate F_i.
   c. If F_i > F_target, evaluate the current cascaded classifier on the set of non-face images and put any false detections into N.

3. Proposed improved AdaBoost

3.1 The problem of time cost

In the AdaBoost algorithm, step 3)b is very expensive, because every simple classifier h_j (j = 1, ..., k) must be computed, where k, the number of Haar-like features, is very large. Moreover, each h_j is obtained by exhaustively searching every sample, so it takes much time to obtain even one weak classifier h_t(x). If Onetime is the time needed to obtain one simple classifier, then the training time to get one weak classifier is Traintime = k · Onetime. This step is repeated until the face detection rate is satisfied. If the number of weak classifiers is T, then obtaining the final strong classifier costs Alltime:
    Alltime = Traintime · T = k · Onetime · T

If 600 weak classifiers are needed, the mean search time for each h_j is 0.1 s, and only 24,000 features are used in training, then the time consumed would be 600 × 24,000 × 0.1 = 1,440,000 s (i.e., 400 hours, about 16 days). This is far too long, so it is necessary to reduce the time spent computing h_j.

3.2 Division of the feature values

In Init-AdaBoost, for each Haar-like feature, the feature value of every face sample is used as a candidate threshold. Under a given parity, each candidate threshold is compared with the feature values of all training samples and the false detection rate is calculated; the same is then done under the other parity. The threshold and parity for the Haar-like feature are determined by the face sample that yields the minimum false detection rate, and that sample's feature value is used as the threshold for the feature. The time cost is O(m·n), where m and n are the numbers of face and non-face samples. In face detection training, m and n are both very large, generally thousands or even millions, so the O(m·n) cost is very expensive.

From experimental results, we find that false detections occur frequently when there is some noise in the image, and also that the feature values of some face samples differ very little from each other, so their ability to discriminate face samples from non-face samples is similar. To reduce the number of candidate thresholds, we can therefore take the maximum and minimum feature values over all training face samples for one Haar-like feature and divide the range from minimum to maximum into r ranks. That is:

    Δ_j = (max_i(g_j(x_i)) - min_i(g_j(x_i))) / r,  i = 1, ..., m

where g_j(x_i) is the j-th Haar-like feature of the i-th example x_i. Each Th_k is then used as a candidate threshold to find the weak classifier, together with its threshold and parity, where

    Th_k = min_i(g_j(x_i)) + k · Δ_j  (k = 1, ..., r).
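The rank-based threshold search can be sketched as below: instead of trying every face sample's feature value as a threshold, only the r rank values Th_k are tried. This is a hypothetical sketch of the idea; the function names and the tie-breaking order are our own choices, not the paper's code.

```python
def rank_thresholds(face_values, r=100):
    """Candidate thresholds Th_k = min + k * delta for k = 1..r, with
    delta = (max - min) / r taken over the *face* samples' values of
    one Haar-like feature: r candidates instead of one per sample."""
    lo, hi = min(face_values), max(face_values)
    delta = (hi - lo) / r
    return [lo + k * delta for k in range(1, r + 1)]

def train_weak_on_ranks(values, labels, weights, face_values, r=100):
    """Pick the (parity, threshold) pair with the lowest weighted error,
    trying only the r rank thresholds.  The weak classifier predicts
    h(x) = 1 iff p * g(x) < p * Th.  Returns (error, parity, threshold)."""
    best = (float("inf"), 1, 0.0)
    for th in rank_thresholds(face_values, r):
        for p in (1, -1):
            err = sum(w for g, y, w in zip(values, labels, weights)
                      if (1 if p * g < p * th else 0) != y)
            if err < best[0]:
                best = (err, p, th)
    return best
```

With r fixed (e.g. r = 100), the per-feature search cost no longer grows with the number of face samples, which is the source of the speed-up claimed for Rank-AdaBoost.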

Now the time cost is O(r·(m+n)) instead of O(m·n). After many experiments, we find that the detection performance matches Init-AdaBoost when r is no more than 100. With r = 100, r·(m+n) is far smaller than m·n, because m and n are usually in the thousands or even millions, so the training time falls rapidly and can be denoted O(m+n).

Furthermore, let W_k be the sum of the weights of all samples below threshold Th_k, and G_k the sum of the weights of the samples whose feature value is greater than Th_k but not greater than Th_{k+1}. Then the sum of the weights of all samples below threshold Th_{k+1} is

    W_{k+1} = W_k + G_k.

That is, only the samples whose feature values exceed Th_k need to be tested; once W_k is known, it is not necessary to test all samples to calculate W_{k+1}, which saves further training time. The improved AdaBoost based on this division of the feature values is called Rank-AdaBoost.

3.3 Finding the dual threshold

According to experimental results, the local feature corresponding to a weak classifier has a regular distribution. Typical local feature distributions for faces and non-faces are shown in Figure 4, where the vertical axis y is the ratio of face or non-face samples to the total, and the horizontal axis x is the feature value. A threshold indicating face or non-face can be obtained rapidly. As Figure 4 shows, the face curve is higher than the non-face curve from θ^(1) to θ^(2), so θ^(1) and θ^(2) are used as a dual threshold. The specific method for finding the thresholds is as follows:

1) For each x, compute face(x) - nonface(x).
2) Choose x' with the maximum difference: x' = argmax(face(x) - nonface(x)).
3) From x', search left and right for the crossing points where face(x) - nonface(x) < 0. If no crossing point is found in a direction, the boundary point is used instead. The two crossing points obtained are used as the dual threshold.

AdaBoost based on this dual threshold is called Dual-AdaBoost.
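The three-step dual-threshold search above can be sketched on histogram bins. This is a sketch under our own assumptions: the face and non-face distributions are given as aligned histograms over the same feature-value bins, and bin indices are returned rather than raw feature values.

```python
def find_dual_threshold(face_hist, nonface_hist):
    """Find the dual threshold (theta1, theta2) as bin indices: start at
    the bin where face(x) - nonface(x) is largest, then walk left and
    right until the difference turns negative; if no crossing is found
    in a direction, the boundary bin is used instead."""
    diff = [f - n for f, n in zip(face_hist, nonface_hist)]
    peak = max(range(len(diff)), key=lambda i: diff[i])
    lo = peak
    while lo > 0 and diff[lo - 1] >= 0:              # walk left to the crossing
        lo -= 1
    hi = peak
    while hi < len(diff) - 1 and diff[hi + 1] >= 0:  # walk right to the crossing
        hi += 1
    return lo, hi
```

A window's feature value is then classified as face-like when it falls between the two returned thresholds, which is the interval where faces outnumber non-faces in Figure 4.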
When deciding whether a feature becomes the current weak classifier, its dual threshold may be adjusted so that the weak classifier h_t meets the detection requirements. This method greatly accelerates each weak classifier search: with 24,000 features, the search time is about 1/50 of that of exhaustive search.

4. Proposed inheriting cascaded detector

In a cascaded detector, fewer and fewer sub-window images need to be examined by the layer classifiers at later stages, so even a small error can noticeably degrade overall detection performance. A sequence of gradually more complex and more powerful classifiers is therefore trained, on the examples that have passed through all the previous classifiers, to increase classification performance. During cascade training, each predecessor is used as part of its successor; that is, each layer is considered not only an independent node of the cascaded classifier but also a component of its successor, so the later classifiers include more classification features:

    f_k(x) = f_{k-1}(x) + Σ_{t=1}^{T} α_t h_t(x)

where f_k(x) and f_{k-1}(x) are the strong classification functions on the k-th and (k-1)-th layers respectively. This is called the inheriting cascaded detector. With this algorithm, the overall performance of the cascaded detector is enhanced. Moreover, the thresholds of the classifiers on different layers are adjusted to separate the face and non-face training samples as far as possible, so that more non-faces are rejected. Both computational speed and performance are clearly improved by this detection approach.

As an example, suppose we need to train the strong classifier on the k-th layer, where f_k(x) and H_k(x) are the strong classification function and strong classifier respectively. A total of L samples consists of m positive examples (faces) followed in sequence by n negative examples (non-faces).

1) Given example images (x_1, y_1), ..., (x_L, y_L), where y_i ∈ {1, -1} represents positive or negative examples, g_j(x_i) is the j-th local feature of the i-th example x_i, and L = m + n.
2) Use the final weights from training the strong classifier on the (k-1)-th layer as w_{1,i}. If k = 1, initialize the weights as w_{1,i} = 0.5/m if i ≤ m, and 0.5/n otherwise.

3) Search for θ_j^(1) and θ_j^(2) for each local feature, based on its distribution over all face and non-face examples. Use θ_j^(1) and θ_j^(2)

as the dual threshold.

4) For t = 1, ..., T:
   a. Normalize the weights: w_{t,i} = w_{t,i} / Σ_{j=1}^{L} w_{t,j}
   b. For each feature j, get a weak classifier h_j with θ_j^(1) and θ_j^(2) by the method discussed above, and evaluate its error
          ε_j = Σ_{i=1}^{L} w_{t,i} [h_j(x_i) ≠ y_i].
      Choose the weak classifier h_t with the lowest error ε_t among all these weak classifiers, then calculate the coefficient α_t^(0) = (ln((1 - ε_t)/ε_t))/2.
   c. Form the strong classification function
          f_k(x) = f_{k-1}(x) + Σ_{τ=1}^{t} α_τ h_τ(x).
      The corresponding strong classifier is H_k(x) = 1 if f_k(x) ≥ 0, and -1 otherwise.
   d. Adjust the threshold on the positive samples so that f_k(x) achieves the default requirement D_k ≥ d · D_{k-1}.
   e. Test f_k(x) on the negative samples; if F_k ≤ f · F_{k-1}, exit the iteration.
   f. Update the weights: w_{t+1,i} = w_{t,i} e^{-α_t h_t(x_i) y_i}, where α_t = (ln((1 - ε_t)/ε_t))/2.

5) Obtain the final strong classification function f_k(x) = f_{k-1}(x) + Σ_{t=1}^{T} α_t h_t(x), with threshold β_k = min_{i=1,...,m}(f_k(x_i)).

5. Experimental results

5.1 Experiments on the MIT-CBCL data set

The publicly available MIT-CBCL face database is used to evaluate the performance of the proposed face detection system. The original MIT-CBCL training set contains 2,429 face images and 4,548 non-face images in 19 × 19 pixel grayscale PGM format. The training faces are only roughly aligned, i.e., they were cropped manually around each face, just above the eyebrows and about half-way between the mouth and the chin. This data set serves our purpose of comparing our detection system with the original system, which we train on the same training set. Some samples are shown in Figure 5. The training set we used is a subset of MIT-CBCL consisting of 1,429 face images and 3,548 non-face images; the test set consists of the remaining 1,000 face images and 1,000 non-face images. The computer used has a P4 2.4 GHz CPU and 1 GB of memory. The results are shown in Table 1.
According to Table 1, the detection rate and false positive count under Rank-AdaBoost and Dual-AdaBoost are similar to those of Init-AdaBoost, but the training time is clearly different: the training time of Dual-AdaBoost is only about 1/50 of that of Init-AdaBoost. The detection time also falls by about half, because fewer classifiers are used. Moreover, robustness and generalization ability improve as well.

5.2 Experiments on a real-world test set

5.2.1 Selection of training data sets

Face samples must be selected carefully, with variability in face orientation (upright, rotated), pose (frontal, profile), facial expression, occlusion, and lighting conditions. Some unimportant features of the face, such as hair, should be removed. Non-face samples can be selected randomly: any sub-window of an image containing no face can be used as a non-face sample, so an almost arbitrarily large training set can easily be constructed from such samples.

In our experiment, the training data were collected from various sources. Face images were taken from the MIT-CBCL face training set, the FERET face data set, NJUST603, and the web. The data set contains face images of variable quality, with different facial expressions, taken under a wide range of lighting conditions; it contained 1,845 face samples, including rotated versions of some faces. Non-face images were collected from the web and the MIT-CBCL non-face data set. Images of diverse scenes were included, such as animals, plants, countryside, and man-made objects. Some non-face samples were selected by randomly picking sub-windows from hundreds of images that contained no face. More than 2,180 thousand non-face samples were used, hundreds of which are very similar to faces. Every sample image was cropped to 19 × 19 pixels. Figure 6 shows some face samples and Figure 7 shows some non-face samples.

In the experiment, 19,954 Haar-like features in total were selected and used as weak classifiers. The smallest rectangle filter was defined as 4 × 4 and the biggest as 16 × 16 on the 19 × 19 image window.

5.2.2 Experimental results

Requiring that more than 40% of non-face samples be discarded at each layer while the face detection rate stayed above 99.99%, we obtained a cascaded detector using the proposed improved AdaBoost and the inheriting cascade model. The face cascaded detector using Rank-AdaBoost has 35 layers of classifiers.
The face cascaded detector using Dual-AdaBoost has 21 layers of classifiers. We tested our system on the MIT+CMU frontal face test set (Rowley et al., 1998, pp. 23-38), which consists of 130 images with 507 frontal faces in total. Every image was resized by a factor of 0.85 at each iteration during testing. The performance of our detection system is shown in Table 2. The experimental results on the MIT+CMU face set show that our method provides higher classification performance than Viola and Jones' method in both training speed and detection accuracy.

6. Conclusions and future work

In this paper, we presented a speed-up technique for training a face detector with AdaBoost, based on an improved threshold searching method and a new inheriting cascade framework. Our proposed face detection system incorporating this technique reduces the number of sub-windows that need preprocessing and verification. The proposed system is much faster than the original AdaBoost-based detection systems in training speed and also achieves higher testing accuracy, making it suitable for real-time applications. Furthermore, the system performs well for frontal faces in gray-scale images with variation in scale and position.

A larger training set would be essential for the detector to be of practical use. In particular, the number of non-face images would have to be increased drastically in order to decrease false positives. Moreover, as mentioned earlier, using a larger number of Haar-like features would also improve accuracy. Implementing and improving the cascade is required in order to achieve the ultimate aim of our work, i.e., to improve the accuracy of the detector while maintaining real-time detection speed.

References

H. Rowley, S. Baluja and T. Kanade. (1998). Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 23-38.

H. Rowley, S. Baluja and T. Kanade. (1998). Rotation invariant neural network-based face detection. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Australia, 38-44.

Liang Luhong, Ai Hai-zhou and Xu Guang-you. (2002). A survey of human face detection.
Chinese Journal of Computers, 25, 449-458.

P. Viola and M. Jones. (2001). Rapid object detection using a boosted cascade of simple features. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, USA, 511-518.

P. Viola and M. Jones. (2004). Robust real-time face detection. International Journal of Computer Vision, 57, 137-154.

P. Viola and M. Jones. (2003). Fast multi-view face detection. Shown as a demo at the IEEE Conference on Computer

Vision and Pattern Recognition, USA.

Schneiderman, H. and Kanade, T. (2000). A statistical method for 3D object detection applied to faces and cars. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, USA, 746-751.

Table 1. Comparison of training and detection

                                       Init-AdaBoost   Rank-AdaBoost   Dual-AdaBoost
Time to get a simple classifier (s)    0.92            0.007           0.0023
Number of weak classifiers             96              98              56
Detection rate (%)                     97              96.4            98.6
False positives                        2               1               0
Detection time (s)                     1.766           1.78            0.953

Table 2. Results on the MIT+CMU test set

                     Viola's detector    Rank-AdaBoost      Dual-AdaBoost
False positives      50    78    167     48    82    169    51    69    138
Detection rate (%)   91.4  92.1  93.9    91.2  92.3  94.0   91.8  92.6  94.2

Figure 1. Examples of the Viola and Jones features

Figure 2. Feature extraction using the integral image: (a) original image, (b) feature, (c) feature on the image, (d) integral image

Figure 3. Schematic depiction of a cascaded detector

Figure 4. Distribution of typical local features (vertical axis: P(f(x)); horizontal axis: feature value f(x); the face curve exceeds the non-face curve between θ^(1) and θ^(2))

Figure 5. Some training examples: (a) face examples, (b) non-face examples

Figure 6. Some faces from the training set

Figure 7. Some non-faces from the training set