Human Face Recognition Using Generalized Kernel Fisher Discriminant

Bing-Yu Sun 1,2, De-Shuang Huang 1, Lin Guo 1
1. Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui, China
2. Department of Automation, University of Science and Technology of China
Emails: besun@sohu.com, dshuang@iim.ac.cn, lguo@iim.ac.cn

Abstract: In this paper the generalized kernel Fisher discriminant (GKFD) method is used to perform pattern feature extraction and recognition for human face images. First, we extend the KFD, originally used for pattern classification problems, to the generalized KFD (GKFD), which is used for feature extraction problems. Compared with several commonly used feature extraction methods, the GKFD can not only reduce the dimension of the input pattern but also provide useful information for pattern classification. Further, the GKFD performs well for linearly nonseparable pattern classification problems, for it possesses a nonlinear transformation capability. Finally, the experimental results on human face recognition problems demonstrate the effectiveness and efficiency of our approach.

1 Introduction

In classification and other data-analytic tasks it is often necessary to perform pre-processing on the data before applying the algorithm at hand. The most common pre-processing method is to extract features from the problems involved so that the tasks are more easily resolved. Feature extraction for classification differs significantly from feature extraction for describing data. For example, principal component analysis (PCA) [6][7] finds the directions that have minimal reconstruction error by describing as much variance of the data as possible with m orthogonal directions, while the Fisher linear discriminant (FLD) [4] chooses projections along which different classes of patterns are well separated. So the FLD can provide useful information for classification, while the PCA is mostly used to perform dimension reduction.
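To make this contrast concrete, the following toy NumPy sketch (not from the paper; the data and variable names are illustrative) builds two elongated, vertically separated classes: the PCA direction follows the large horizontal variance, while the two-class Fisher direction w ∝ S_W^{-1}(μ_1 − μ_2) points along the vertical axis that actually separates the classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two elongated Gaussian classes: the direction of maximal variance
# (PCA) is nearly orthogonal to the direction that separates them (FLD).
X1 = rng.normal([0.0, 0.0], [3.0, 0.3], size=(200, 2))
X2 = rng.normal([0.0, 1.5], [3.0, 0.3], size=(200, 2))
X = np.vstack([X1, X2])

# PCA: leading eigenvector of the total covariance matrix.
cov = np.cov((X - X.mean(axis=0)).T)
pca_dir = np.linalg.eigh(cov)[1][:, -1]   # eigenvector of largest eigenvalue

# FLD: w proportional to S_W^{-1} (mu_1 - mu_2) in the two-class case.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = np.cov((X1 - m1).T) * (len(X1) - 1) + np.cov((X2 - m2).T) * (len(X2) - 1)
fld_dir = np.linalg.solve(Sw, m1 - m2)
fld_dir /= np.linalg.norm(fld_dir)

print("PCA direction:", np.round(pca_dir, 2))   # follows the variance axis
print("FLD direction:", np.round(fld_dir, 2))   # follows the separating axis
```

Here the PCA direction carries almost no class information, which is exactly why the paper uses a discriminant criterion rather than a variance criterion for feature extraction.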
It should be noted that the FLD is only a linear transformation, which maximizes the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples. As a result, the FLD achieves globally optimal performance only for linearly separable problems. Hence, to overcome this drawback of the FLD, the kernel idea, originally applied in Support Vector Machines (SVMs) [1], can be used to construct a new kernel-based FLD. This method is referred to as the kernel Fisher discriminant (KFD). The KFD has been successfully used in pattern recognition problems, but it can only solve two-class problems; for more details, please refer to [2][3]. In this paper we extend the KFD to perform feature extraction for problems with multiple classes, so the method is named the generalized KFD (GKFD). The features obtained by the GKFD are directly classified by the nearest neighbor method. In this paper, we take human face recognition data as the example with which to conduct the
related experiments. The experimental results verify the effectiveness and efficiency of our proposed GKFD.

2 A Novel Method for Feature Extraction Based on the Generalized Kernel Fisher Discriminant

Fig 1 The sketch of the main idea of the KFD. By using a kernel function, the original nonlinearly separable data in the input space become linearly separable in the feature space.

As we have pointed out in Section 1, the FLD is only a linear transformation, and it achieves globally optimal performance only for linearly separable data. But most real-world data are linearly nonseparable, or nonlinearly separable. To overcome this limitation, the KFD method was proposed [2]. The main idea of the KFD, which has proved powerful in pattern recognition problems, is to address the FLD problem in a kernel feature space, thereby yielding a nonlinear discriminant in the input space (as shown in Fig 1). For a set of originally nonlinearly separable data, the kernel method [1] can make the data linearly separable by mapping them into a feature space through a kernel function. With this idea the KFD resolves a binary classification problem by first projecting all the data onto the optimal projection vector found by the KFD, and then classifying the data according to their projection values.

But for a multiclass classification problem, one projection vector is not enough, so in this section we extend the KFD to the generalized KFD (GKFD) for performing feature extraction in multiclass pattern recognition problems. Consider a set of n sample vectors {x_1, x_2, ..., x_n}, and assume that each vector belongs to one of c classes {X_1, X_2, ..., X_c}. To derive the GKFD, we first map the data {x_1, x_2, ..., x_n} by a nonlinear mapping Φ into a feature space F. Then, in F, the optimal subspace W_opt that we seek is determined as follows:

W_opt = arg max_W ( |W^T S_B^Φ W| / |W^T S_W^Φ W| ) = [w_1, w_2, ..., w_m]   (1)

where S_B^Φ and S_W^Φ are the corresponding between-class and within-class scatter matrices in F, i.e.,

S_B^Φ = Σ_{i=1}^{c} N_i (μ_i^Φ − μ^Φ)(μ_i^Φ − μ^Φ)^T   (2)

S_W^Φ = Σ_{i=1}^{c} Σ_{x_k ∈ X_i} (Φ(x_k) − μ_i^Φ)(Φ(x_k) − μ_i^Φ)^T   (3)

with μ_i^Φ = (1/N_i) Σ_{x_k ∈ X_i} Φ(x_k) and μ^Φ = (1/n) Σ_{k=1}^{n} Φ(x_k), where N_i is the number of samples in class X_i. Each w_i in eqn. (1) can be computed by solving the following generalized eigenvalue problem:

S_B^Φ w_i = λ_i S_W^Φ w_i   (4)

From the theory of reproducing kernels we know that any solution w ∈ F must lie in the span of the mapped data, i.e., w ∈ span{Φ(x_1), Φ(x_2), ..., Φ(x_n)}, and can therefore be written as

w = Σ_{i=1}^{n} α_i Φ(x_i)   (5)

Using this expansion, the numerator of eqn. (1) can be rewritten as

w^T S_B^Φ w = α^T M α   (6)
where

M = Σ_{i=1}^{c} N_i (M_i − M_*)(M_i − M_*)^T   (7)

Φ(x_i)^T Φ(x_j) = k(x_i, x_j)   (8)

α = [α_1, α_2, ..., α_n]^T   (9)

In eqn. (7) we have defined M_i and M_* as

(M_i)_j = (1/N_i) Σ_{x_k ∈ X_i} k(x_j, x_k)   (10)

(M_*)_j = (1/n) Σ_{k=1}^{n} k(x_j, x_k)   (11)

Now, considering the denominator of eqn. (1) and using a similar transformation, we have

w^T S_W^Φ w = α^T L α   (12)

where L = Σ_{i=1}^{c} K_i (I − 1_{N_i}) K_i^T; here K_i is an n × N_i matrix with (K_i)_{nm} = k(x_n, x_m), x_m ∈ X_i; I is the N_i × N_i identity matrix, and 1_{N_i} is the N_i × N_i matrix with all entries equal to 1/N_i. Combining eqn. (6) and eqn. (12), the optimal subspace W_opt can be determined by the following formula:

α_opt = arg max_α ( α^T M α / α^T L α ) = [α_1, α_2, ..., α_m]   (13)

together with the following eqn. (14). To extract the features of a new pattern x with the GKFD, we simply project the mapped pattern Φ(x) onto this subspace; the result is described in eqn. (15). From the above analyses we can obtain several conclusions, stated in the following remarks:

Remarks: 1) From the expression for S_B^Φ we can see that, if the dimension of the mapped space is r, then rank(S_B^Φ) ≤ min(r, c − 1). As the literature [3] has pointed out, the dimension of the feature space is equal to or higher than the number n of training samples, which makes regularization necessary. Accordingly, there are at most c − 1 generalized eigenvectors corresponding to nonzero eigenvalues. In other words, the GKFD transforms a new pattern x into a vector of dimension c − 1.

2) It can also be found that the relation rank(S_W^Φ) ≤ min(r, n − c) holds. Since r > n, we have rank(L) = rank(S_W^Φ) ≤ n − c < n, so the matrix L is singular. To overcome this problem, we can use the same method as in [2][3] and add a multiple of the identity matrix to L:

L_μ = L + μI   (16)

3) A very important problem for the GKFD is how to select the kernel function and its parameters, since different kernel functions have different effects on the performance of the problems involved. So far, however, how to select the kernel function remains open. Usually the candidates for the optimal kernel function are determined by some heuristic rule, where the one that minimizes a given criterion is chosen.
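The construction of M and L, the regularization of eqn. (16), and the solution of the generalized eigenproblem can be sketched in plain NumPy as follows. This is a minimal sketch, assuming a Gaussian kernel; the function names (`rbf_kernel`, `gkfd_fit`, `gkfd_transform`) and the default values of `sigma` and `mu` are our own, not from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel k(x, y) = exp(-||x - y||^2 / sigma^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def gkfd_fit(X, y, sigma=1.0, mu=1e-3):
    """Return the n x (c-1) matrix of coefficient vectors alpha of eqn (13)."""
    n = len(X)
    classes = np.unique(y)
    K = rbf_kernel(X, X, sigma)               # (K)_ij = k(x_i, x_j), eqn (8)
    M_star = K.mean(axis=1)                   # eqn (11)
    M = np.zeros((n, n))
    L = np.zeros((n, n))
    for c in classes:
        idx = np.where(y == c)[0]
        Ni = len(idx)
        Mi = K[:, idx].mean(axis=1)           # eqn (10)
        d = (Mi - M_star)[:, None]
        M += Ni * (d @ d.T)                   # eqn (7)
        Ki = K[:, idx]                        # n x Ni slice of the kernel matrix
        L += Ki @ (np.eye(Ni) - np.full((Ni, Ni), 1.0 / Ni)) @ Ki.T  # eqn (12)
    L += mu * np.eye(n)                       # regularization L_mu, eqn (16)
    # Generalized eigenproblem M alpha = lambda L alpha via Cholesky whitening:
    # with L = C C^T, solve the standard problem for C^{-1} M C^{-T}.
    C = np.linalg.cholesky(L)
    Ci = np.linalg.inv(C)
    lam, V = np.linalg.eigh(Ci @ M @ Ci.T)    # eigenvalues in ascending order
    A = Ci.T @ V                              # back-transform the eigenvectors
    return A[:, ::-1][:, : len(classes) - 1]  # keep the top c-1 eigenvectors

def gkfd_transform(X_train, alpha, X_new, sigma=1.0):
    # Feature extraction for new patterns, eqn (15)
    return rbf_kernel(X_new, X_train, sigma) @ alpha

# Toy usage: three tight, well-separated clusters in 2-D (c - 1 = 2 features).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(3.0 * i, 0.1, size=(5, 2)) for i in range(3)])
y = np.repeat([0, 1, 2], 5)
alpha = gkfd_fit(X, y, sigma=2.0)
Z = gkfd_transform(X, alpha, X, sigma=2.0)    # 15 x 2 extracted features
```

The Cholesky whitening step is one standard way to reduce a symmetric-definite generalized eigenproblem to an ordinary one; it is valid here because the regularized L_μ is positive definite.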
A most commonly used method is k-fold cross-validation, i.e., the training samples are divided into k subsets, each with the same number of samples, and the performance of each candidate is evaluated k times. In the i-th (i = 1, 2, ..., k) iteration, all the subsets except the i-th are used for training, while the i-th subset is used for testing. At last, the candidate that achieves the best performance is selected. The extreme case in which k equals the number of samples is called leave-one-out cross-validation.

W_opt = [w_1, w_2, ..., w_m] = [Σ_{i=1}^{n} α_{1i} Φ(x_i), Σ_{i=1}^{n} α_{2i} Φ(x_i), ..., Σ_{i=1}^{n} α_{mi} Φ(x_i)]   (14)

Φ(x)^T W_opt = [Σ_{i=1}^{n} α_{1i} k(x_i, x), Σ_{i=1}^{n} α_{2i} k(x_i, x), ..., Σ_{i=1}^{n} α_{mi} k(x_i, x)]   (15)

3 Human Face Image Recognition Based on the Nearest Neighbor Method

Perhaps the simplest classification scheme in image classification is the nearest neighbor classifier (NNC). Under this scheme, an image in the test set is recognized by assigning to it the label of the closest point in the training set, where the distance is measured in the image space. If all the images are normalized to have zero mean and unit variance, this procedure is equivalent to choosing the image in the training set that best correlates with the test image. In fact, because of the normalization process, the normalized images are independent of the light source intensity. But this NNC procedure has two well-known disadvantages. First, if the images in the training set and the test set are gathered under varying lighting conditions, the corresponding points in the image space may not be tightly clustered. Second, the NNC procedure is computationally expensive. Our proposed GKFD, however, can overcome these drawbacks, so the nearest neighbor method can work efficiently as a classifier after feature extraction by the GKFD. The reason is that, on the one hand, images from the same class gather tightly after the GKFD transform; on the other hand, the dimension of the transformed sample points is significantly reduced. So in this paper the nearest neighbor method is selected as the efficient classifier for human face images.
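The nearest neighbor rule described above reduces to a distance computation in the low-dimensional extracted feature space. A minimal sketch (the helper name and toy data are our own):

```python
import numpy as np

def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    # Assign each test point the label of its closest training point
    # (Euclidean distance in the extracted feature space).
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    return train_labels[np.argmin(d2, axis=1)]

# Toy usage: two well-separated 1-D feature clusters.
train_feats = np.array([[0.0], [0.1], [5.0], [5.1]])
train_labels = np.array([0, 0, 1, 1])
pred = nearest_neighbor_classify(train_feats, train_labels,
                                 np.array([[0.05], [4.9]]))
```

Because the GKFD reduces each image to a (c − 1)-dimensional vector, this distance computation is far cheaper than nearest neighbor matching in the raw image space.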
4 Experiment Results

To verify the effectiveness and efficiency of our approach, in the following we present the related experimental results on human face image recognition.

Fig 2 Example images from one subject in the ORL database

Our experiments were performed using the ORL database. This database includes 400 different images of 40 distinct subjects (ten images per person). For some of the subjects, the images were taken at different times. Moreover, there are variations in facial expression (e.g., open/closed eyes, smiling/non-smiling) and in facial details (e.g., glasses/no glasses). Fig 2 shows the ten images of one subject. The original face images were all sized 92 × 112 with a 256-level gray scale. First the face images are preprocessed by a twice-applied discrete wavelet transform, which reduces the size of each image to 23 × 28; as a result, the computational complexity further decreases. The experiments were performed with five training images and five test images per person, so a total of 200 training images and 200 test images are formed, with no overlap between the training and test sets. Since the recognition performance is affected by the selection of the training images, the reported results were obtained by training 20 recognizers with different training sets, each formed by randomly selecting five of the ten images
per subject. The reported classification error (by classification error we mean the percentage of samples erroneously labeled) has been averaged over all 20 experimental results. In our experiments the radial basis function of Gaussian form

k(x_1, x_2) = exp( −‖x_1 − x_2‖² / σ² )   (17)

is used as the kernel function, and the corresponding parameter σ is selected through the leave-one-out method.

In order to visualize the distribution of the human face image samples, we take the 3 components corresponding to the first 3 largest eigenvalues of the PCA and of the GKFD to draw the 3-dimensional topology. From Fig 3 we can find that the originally overlapped data can be classified much more easily after feature extraction by the GKFD, while the 3 components of the PCA remain somewhat overlapped.

Fig 3 The data distribution of 3 components for 3 subjects after (a) PCA and (b) GKFD

In order to further test the performance of our proposed approach, here we use the nearest neighbor method to perform classification on the features extracted from the ORL database by the GKFD. Fig 4 shows the plot of the error rates vs. the number of reduced dimensions for the four feature extraction methods PCA, GKFD, PCA+FLD and PCA+GKFD. From this figure we can find that the error rate of the GKFD is the smallest among the four methods. Note that the FLD cannot be applied to the original data directly, for it can only process data whose dimension is lower than n − c, so the PCA has to be used before the FLD. Table 1 also compares the smallest error rate and the corresponding reduced dimension for the four feature extraction methods. From this table it can be found that the GKFD achieves better performance for linearly nonseparable pattern recognition problems.

Table 1 Performance comparison of four feature extraction methods
Methods     Reduced dimension   Error rate (%)
PCA         99                  8.5
PCA+FLD     20                  8.6
PCA+GKFD    39                  4.35
GKFD        39                  2.6
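The repeated-random-split evaluation used for the averaged error rates above can be sketched as follows. The split helper and its names are our own illustration of the protocol (40 subjects, ten images each, five randomly chosen for training), not code from the paper.

```python
import numpy as np

def error_rate(pred, truth):
    # Percentage of erroneously labeled samples.
    return 100.0 * np.mean(np.asarray(pred) != np.asarray(truth))

def random_split(n_subjects=40, per_subject=10, n_train=5, seed=0):
    # One random train/test split: n_train of the per_subject images of
    # each subject go to the training set, the rest to the test set.
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for s in range(n_subjects):
        perm = rng.permutation(per_subject) + s * per_subject
        train_idx.extend(perm[:n_train])
        test_idx.extend(perm[n_train:])
    return np.array(train_idx), np.array(test_idx)

# With the ORL settings, each split yields 200 training indices and
# 200 disjoint test indices; averaging error_rate over many seeds
# reproduces the kind of averaged figure reported in Table 1.
tr, te = random_split()
```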
Fig 4 The error rate curves vs. the number of reduced dimensions for the four methods PCA, PCA+FLD, GKFD and PCA+GKFD

Note that in this test the error rate of the PCA+GKFD is higher than that of the GKFD. The reason is possibly that the data after the PCA
transformation have lost some features of the original data. In addition, Table 2 gives a summary of the performance comparison of five classification systems recognizing the data from the ORL database. The corresponding error rates are averages over 20 simulations; individual GKFD simulations, however, sometimes show an error rate as low as 0.5%. In conclusion, the above experimental results show that the GKFD plus the nearest neighbor method can efficiently solve nonlinearly separable pattern recognition problems with multiple classes, compared with other traditional methods.

Table 2 Performance comparison of five classification systems
Systems                                           Error rate (%)
Eigenfaces [8]                                    10.0
Pseudo-2D HMM [8]                                 5.0
Probabilistic decision-based neural network [9]   4.0
Linear SVMs [10]                                  3.0
GKFD+NN                                           2.6

5 Conclusions

In this paper we extended the KFD, originally used as a classifier, to the field of feature extraction. The corresponding performance was verified on the human face image data from the ORL database. From the obtained experimental results we can draw the following conclusions: 1) Unlike the KFD, which aims at finding one optimal projection direction to perform binary classification, the GKFD aims at finding an optimal subspace in the feature space in which to perform feature extraction for multiclass problems. 2) The GKFD performs well for linearly nonseparable pattern recognition problems, for it is a nonlinear transformation, so it can obtain nearly optimal results for nonlinearly separable data. 3) Unlike the FLD, the GKFD can be applied to the original data directly, whatever the dimension of the data. 4) In this paper the nearest neighbor method is used as the classifier because of its simplicity; if the GKFD were combined with more advanced classification methods such as RBFNs or SVMs, the error rate could be further reduced. Future work will include extending this method to problems with large-scale databases.

References:
[1] V. Vapnik, The Nature of Statistical Learning Theory, New York: Wiley, 1998.
[2] S. Mika, G. Rätsch, J. Weston, B.
Schölkopf, and K.-R. Müller, "Fisher Discriminant Analysis with Kernels," Neural Networks for Signal Processing IX, pp. 41-48, 1999.
[3] K.-R. Müller and S. Mika, "An Introduction to Kernel-Based Learning Algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.
[4] R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Ann. Eugenics, vol. 7, pp. 179-188, 1936.
[5] P. N. Belhumeur and J. P. Hespanha, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[6] R. Lotlikar and R. Kothari, "Fractional-Step Dimensionality Reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 623-627, 2000.
[7] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.
[8] F. S. Samaria, "Face Recognition Using Hidden Markov Models," Ph.D. dissertation, Univ. of Cambridge, Cambridge, U.K., 1994.
[9] S.-H. Lin, S.-Y. Kung, and L.-J. Lin, "Face Recognition/Detection by Probabilistic Decision-Based Neural Network," IEEE Transactions on Neural Networks, vol. 8, pp. 114-132, Jan. 1997.
[10] G. D. Guo, S. Z. Li, and K. L. Chan, "Face Recognition by Support Vector Machines," in Proc. Int. Conf. Automatic Face and Gesture Recognition, pp. 196-201, 2000.