IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 319

Face Recognition Based on Neuro-Fuzzy System

Nina Taher Makhsoos, Reza Ebrahimpour and Alireza Hajiany
Department of Electrical Engineering, Shahid Rajaee University, Tehran, Iran

Summary
This paper investigates the benefits of combining neural networks and fuzzy logic into a neuro-fuzzy system, especially for application to the face recognition task. The proposed model is the result of combining fuzzy logic with a mixture of experts, ME, built from multilayer perceptron networks. In the description of ME, some proposals to improve the performance of the model are also described. An analysis comparing the model enhancements proposed in this paper with the corresponding original ME model in face recognition is also presented. Experimental results confirm the importance of combining these two technologies (neural networks and fuzzy logic) in face recognition.
Key words: Fuzzy logic, Neuro-fuzzy system, Mixture of Experts.

1. Introduction

Combining the techniques of fuzzy logic and neural networks suggests the novel idea of transferring the burden of designing fuzzy logic systems to the training and learning of a connectionist structure, while the fuzzy logic systems provide the neural networks with a structural framework with high-level fuzzy IF-THEN rule thinking and reasoning. These benefits can be witnessed by the success in applying neuro-fuzzy systems in areas like pattern recognition and control. Fuzzy logic [1,2,3] and artificial neural networks [4,5] are complementary technologies in the design of intelligent systems. The combination of these two technologies into an integrated system appears to be a promising path toward the development of intelligent systems capable of capturing qualities characterizing the human brain. However, fuzzy logic and neural networks generally approach the design of intelligent systems from quite different angles. Neural networks are essentially low-level computational algorithms that sometimes offer good performance in pattern recognition tasks.
On the other hand, fuzzy logic provides a structural framework that uses and exploits those low-level capabilities of neural networks. Both neural networks and fuzzy logic are powerful design techniques that have their strengths and weaknesses. Neural networks can learn from data sets, while fuzzy logic solutions are easy to verify and optimize. Table 1 shows a comparison of the properties of these two technologies. In analyzing this table, it becomes obvious that a clever combination of the two technologies delivers the best of both worlds. Evolutionary artificial neural networks have been widely studied in the last few decades. The main power of artificial neural networks lies in their ability to correctly learn the underlying function or distribution in a data set from a number of samples. This ability can be expressed in terms of minimizing the estimation error of the neural network on previously unseen data. As discussed in the comprehensive review of [6], a variety of methods has been applied to different aspects of ANNs, such as the architecture and the connection weights, to improve their performance. Several published works in the literature have shown that an ensemble of neural networks demonstrates improved generalization ability in comparison with individual networks [7,8,9,10,11,12]. Most real-world problems are too complicated for a single individual network to solve. The divide-and-conquer approach, which tries to solve a complex problem by dividing it into simple problems, has proved to be efficient in many such complex situations. Jacobs et al. [13,14] have proposed an ensemble method called Mixture of Experts, ME, based on the divide-and-conquer principle.
Manuscript received April 15, 2009
Manuscript revised April 20, 2009
The learning algorithm of the mixture structure is described in [15].

Table 1: Properties of neural networks and fuzzy logic

                          Neural Networks                            Fuzzy Logic
Knowledge Representation  Implicit; the system cannot be easily      Explicit; verification and optimization
                          interpreted or modified                    are very easy and efficient
Trainability              Trains itself by learning from data sets   None; everything must be defined explicitly

However, in our models, in order to improve the performance of the expert networks, and consequently the whole network performance, we use a modified version of ME in which MLPs are used as experts instead of linear networks; this model is hereafter referred to as the mixture of multilayer perceptron experts (MME). To evaluate the performance of our proposed model, we use the ORL dataset in our experiments, and we use Principal Component Analysis, PCA, in the feature extraction phase. PCA is one of the most common methods of dimensionality reduction and is widely used in the face recognition literature for feature extraction [16].

2. A Modified Mixture of Experts

From a computational point of view, according to the principle of divide and conquer, a complex computational task is solved by dividing it into a number of computationally simple tasks and then combining the solutions to those tasks. In supervised learning, computational simplicity is achieved by distributing the learning task among a number of experts, which in turn divides the input space into a set of subspaces. The combination of experts is said to constitute a combination of classifiers. Mixture of experts is one of the most famous methods in the category of dynamic structures of combining classifiers, in which the input signal is directly involved in actuating the mechanism that integrates the outputs of the individual experts into an overall output [4]. Consider a modular neural network in which the learning process proceeds by fusing self-organized and supervised forms of learning, as shown in Fig. 1.
The experts are technically performing supervised learning, in that their individual outputs are combined to model the desired response. There is, however, a sense in which the experts are also performing self-organized learning: they self-organize to find a good partitioning of the input space, so that each expert does well at modeling its own subspace and, as a whole group, they model the input space well. The learning algorithm of the mixture structure is described in [15]. However, in our model, in order to improve the performance of the expert networks, and consequently the whole network performance, we devise a modified version of MoE in which each expert is an MLP instead of a linear network [17]; furthermore, we make use of a parameter obtained from fuzzy logic in the weight update phase. Using this fuzzy parameter, the degree of ambiguity of an input pattern during the learning phase of the MLP experts is obtained.

Fig. 1. Block diagram of a Committee Machine based on Mixture of Experts.

The rest of this paper is organized as follows. In Section 2, we provide a brief description of the Mixture of Experts structure with linear and Multi-Layer Perceptron, MLP, experts. Section 3 describes the proposed model, in which fuzzy rules are employed in training an MLP, which is then used in the ME structure. In Section 4 a comparison with previously published methods is provided, along with a discussion of the obtained results. Finally, Section 5 concludes and summarizes the paper.

3. Proposed Fuzzy Mixture of Experts

Our proposed model is designed to achieve robust face recognition with PCA in the feature extraction stage and a mixture of fuzzy MLP experts in the classification stage (Fig. 2). In the conventional MLP, the number of nodes in the output layer corresponds to the number of classes in the task to be performed. The winner-takes-all scheme is applied during the learning process in order to define the desired output vector.
In the desired output vector, the class to which the input pattern belongs is assigned the value 1, and the other classes are assigned the value 0; this is called a crisp desired output. In real-world problems, however, the data are generally ill-defined, with overlapping or fuzzy class boundaries. That is, there might
be some patterns with non-zero similarity to two or more pattern classes. In the conventional MLP, this multiple similarity (or membership value) is not considered in the crisp desired output used during training. In order to take the membership values of each class into account, it is promising to incorporate fuzzy concepts in forming the desired output [18]. A common way of forming the fuzzy desired output is proposed in [19], which we use to develop our method, based on the two parameters of class prototype and variability. The prototype of a class describes the distribution of values in that class and is determined on a pixel-by-pixel basis, according to Eq. (1):

P_{ik} = (1/L) \sum_{j=1}^{L} t_{ij},   i = 1, ..., M    (1)

where P_{ik} is the prototype of the i-th pixel of class k, k = 1, ..., C (C is the number of classes), t_{ij} is the i-th pixel of training pattern j, M is the number of pixels of a pattern and L is the number of training patterns of the class. The variability of a class has a fundamental role in the process of calculating the fuzzy desired output and is concerned with the variations of the prototype values over the pixels of the training patterns. The variability of each class is calculated from the prototype of that class, and is defined according to Eq. (2):

V_k = (4/M) \sum_{i=1}^{M} (0.5 - P_{ik})^2    (2)

The prototype and variability parameters are obtained from the whole training set before the training process starts. During the training phase, for each input pattern, a comparison is made between the input and the parameters of its corresponding class in order to derive its fuzzy desired output. The weighted distance between an input and each class is calculated from the squared error E_k between the input pattern and the prototype of class k, given in Eq. (4), by Eq. (3):

D_k = (E_k / \sum_{l=1}^{C} E_l)^f    (3)

where X_i is the i-th pixel of the input pattern and f is the fuzziness parameter, which determines the rate of decrease of the weighted distance of the input pattern according to its error.
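As a concrete illustration, the class statistics of Eqs. (1)-(4) can be computed as below. This is only a sketch of our reading of the (partly garbled) equations: the function names and toy data are our own, and the exact constant in the variability formula is a reconstruction.

```python
import numpy as np

def prototype(train, labels, k):
    """Eq. (1): per-pixel mean of the training patterns of class k."""
    Xk = train[labels == k]          # L patterns of class k, M pixels each
    return Xk.mean(axis=0)           # P_k, one prototype value per pixel

def variability(P_k):
    """Eq. (2), as reconstructed: V_k = (4/M) * sum_i (0.5 - P_ik)^2."""
    M = P_k.size
    return (4.0 / M) * np.sum((0.5 - P_k) ** 2)

def weighted_distance(x, prototypes, f=0.9):
    """Eqs. (3)-(4): D_k = (E_k / sum_l E_l)^f, E_k the squared error."""
    E = np.array([np.sum((x - P) ** 2) for P in prototypes])
    return (E / E.sum()) ** f

# Toy example: two classes of 4-pixel patterns with values in [0, 1]
train = np.array([[0.9, 0.8, 0.1, 0.2],
                  [1.0, 0.7, 0.0, 0.3],
                  [0.1, 0.2, 0.9, 0.8],
                  [0.0, 0.3, 1.0, 0.7]])
labels = np.array([0, 0, 1, 1])
P0 = prototype(train, labels, 0)
P1 = prototype(train, labels, 1)
D = weighted_distance(np.array([0.95, 0.75, 0.05, 0.25]), [P0, P1])
print(D[0] < D[1])   # the input is closer to class 0
```

Note that the smaller the weighted distance D_k, the closer the input is to class k, which is what Eq. (5) then converts into a membership value.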
For each input pattern, the class membership value is calculated from the weighted distance of the closest class, the other weighted distances and the corresponding variability, Eq. (5):

Z_k = D_k V_k,   \mu_k = (Z_min / Z_k)^{exp}    (5)

where exp is a fuzzy parameter which defines the width of the membership function, and Z_min is the value Z_k of the closest class, calculated for the input pattern. In the MLP weight update process, all training patterns have the same impact on adjusting the weights. However, there are patterns that lie in overlapping areas, and these are mainly responsible for erratic behavior. One way to improve the performance of the weight update algorithm is to modulate the amount of correction in the weight vector produced by the input pattern. Here, the amount of correction is governed by the degree of ambiguity of a pattern: the more ambiguous an input pattern is, the less the weights are corrected. The degree of ambiguity is an additional parameter used in the proposed fuzzy MLP which was not employed in the model proposed in [18]. The degree of ambiguity is defined according to Eq. (6):

A = (\mu_i(x) - \mu_j(x))^m    (6)

where \mu_i(x) is the membership value of the top class, \mu_j(x) is the membership value of the second highest one, and m is an enhancement/reduction fuzzy parameter (m < 1 enhances, m = 1 maintains and m > 1 reduces the influence of the ambiguity of a training pattern). The degree of ambiguity is then used as a parameter in the experts' weight updates. The new weight update equations for the experts are given in Section 3.2. The squared error used in Eq. (3) is

E_k = \sum_{i=1}^{M} (X_i - P_{ik})^2    (4)
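Continuing the sketch, the membership of Eq. (5) and the ambiguity of Eq. (6) might be computed as follows. The direction of Eq. (5) — the smallest weighted distance giving the largest membership — is our reconstruction, and all names and numbers are illustrative.

```python
import numpy as np

def membership(D, V, exp_p=0.5):
    """Eq. (5), as reconstructed: Z_k = D_k * V_k, mu_k = (Z_min / Z_k)^exp."""
    Z = D * V
    return (Z.min() / Z) ** exp_p

def ambiguity(mu, m=0.9):
    """Eq. (6): A = (mu_top - mu_second)^m over the two highest memberships."""
    top2 = np.sort(mu)[::-1][:2]
    return (top2[0] - top2[1]) ** m

# Hypothetical weighted distances and variabilities for 4 classes
D = np.array([0.1, 0.4, 0.3, 0.2])
V = np.array([0.5, 0.5, 0.5, 0.5])
mu = membership(D, V)     # closest class (index 0) gets membership 1
A = ambiguity(mu)         # small when the two top classes are close
print(mu.argmax() == 0, 0.0 <= A <= 1.0)
```

An ambiguous pattern (two classes with similar membership) yields a small A, which in turn shrinks the weight correction in the update rules below.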
Fig. 2. Sketch of the proposed model.

Each expert is a one-hidden-layer fuzzy MLP, with a momentum constant, that computes an output vector O_i as a function of the input stimulus vector x and a set of parameters, namely the weights of its hidden and output layers, with a sigmoid activation function. It is assumed that each expert specializes in a different area of the face space. The gating network assigns a weight g_i to each expert's output O_i. The gating network determines the g_i as a function of the input vector x and a set of parameters, namely the weights of its hidden and output layers, with a sigmoid activation function. The g_i can be interpreted as estimates of the prior probability that expert i can generate the desired output y. The gating network is composed of two layers: the first layer is an MLP network, and the second layer is a softmax nonlinear operator forming the gating network's output. The gating network computes O_{g_i}, the output of its MLP layer, and then applies the softmax function to get:

g_i = exp(O_{g_i}) / \sum_{j=1}^{N} exp(O_{g_j})    (7)

so the g_i are nonnegative and sum to 1. The final mixed output of the entire network is:

O = \sum_{i=1}^{4} O_i g_i    (8)

The normalized exponential transformation of Eq. (7) may be viewed as a multi-input generalization of the logistic function. It preserves the rank order of its input values and is a differentiable generalization of the winner-takes-all operation of picking the maximum value, and is therefore referred to as softmax. The weights of the MLPs are learned using the backpropagation, BP, algorithm, in order to maximize the log likelihood of the training data given the parameters. Assuming that the probability density associated with each expert is Gaussian with an identity covariance matrix, the MLPs obtain the following online learning rules:

\Delta w_{y_i} = \mu_e h_i (y - O_i)(O_i(1 - O_i)) Oh_i    (9)

Our method of increasing the learning rate, while avoiding the danger of instability, is to modify the learning rule of Eq.
(9) by including a momentum term [17] and the degree of ambiguity of Eq. (6):

\Delta w_{y_i}(n) = \alpha \Delta w_{y_i}(n-1) + A \mu_e h_i(n)(y - O_i(n))(O_i(n)(1 - O_i(n))) Oh_i(n)    (10)

\Delta w_{h_i}(n) = \alpha \Delta w_{h_i}(n-1) + A \mu_e h_i(n) w_{y_i}(n)(y - O_i(n))(O_i(n)(1 - O_i(n))) Oh_i(n)(1 - Oh_i(n)) x    (11)

\Delta w_{yg_i}(n) = \alpha \Delta w_{yg_i}(n-1) + \mu_g (h_i(n) - g_i(n))(O_{g_i}(n)(1 - O_{g_i}(n))) Oh_{g_i}(n)    (12)

\Delta w_{hg_i}(n) = \alpha \Delta w_{hg_i}(n-1) + \mu_g w_{yg_i}(n)(h_i(n) - g_i(n))(O_{g_i}(n)(1 - O_{g_i}(n))) Oh_{g_i}(n)(1 - Oh_{g_i}(n)) x    (13)

where \mu_e and \mu_g are the learning rates for the experts and the gating network, respectively, \alpha is the momentum constant, Oh_i is the output of expert i's hidden layer, Oh_{g_i} is the output of the gating network's hidden layer, and h_i is an estimate of the posterior probability that expert i can generate the desired output y:
h_i = g_i exp(-(1/2)(y - O_i)^T (y - O_i)) / \sum_j g_j exp(-(1/2)(y - O_j)^T (y - O_j))    (14)

This can be thought of as a softmax function computed on the inverse of the sum squared error of each expert's output, smoothed by the gating network's current estimate of the prior probability that the input pattern was drawn from expert i's area of specialization. As the network's learning process progresses, the expert networks compete for each input pattern, while the gating network rewards the winner of each competition with stronger error feedback signals. Thus, over time, the gate partitions the face space in response to the experts' performance.

Three experiments were carried out to evaluate the performance of our proposed face recognition method. In order to select the best candidates for constructing the Mixture of Experts, we first performed two preliminary experiments on three committee machines and two individual classifiers. The third experiment is designed to test our combining classifier model. The investigated aspects of the experiments are, first, finding the optimum number of principal components for the feature extraction phase; second, selecting a suitable neural network topology to form the experts; third, finding the optimum architectural parameters of each individual neural classifier; and fourth, selecting the optimal training parameters, such as the number of training epochs, and testing the individual and the ensemble structures on an unseen set of 200 test faces.

Fig. 3. Samples of facial variations of the ORL dataset.

Fig. 4. Samples of ORL images resized to 48×48 pixels.

The inclusion of a momentum term in the backpropagation algorithm tends to accelerate descent in steady downhill directions and has a stabilizing effect in directions that oscillate in sign. The incorporation of the momentum term represents a minor modification to the weight update process, yet it may have some beneficial effects on the learning behavior of the algorithm.
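One combined training step of the mixture — the gate priors of Eq. (7), the mixed output of Eq. (8), the posteriors of Eq. (14), and an ambiguity- and momentum-weighted output-layer step in the spirit of Eq. (10) — can be sketched as follows. All names, sizes and constants here are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(a):
    """Eq. (7): normalized exponential over the gate's raw outputs."""
    e = np.exp(a - a.max())              # shift for numerical stability
    return e / e.sum()

def posteriors(y, O, g):
    """Eq. (14): h_i proportional to g_i * exp(-0.5 ||y - O_i||^2)."""
    err = ((O - y) ** 2).sum(axis=1)
    num = g * np.exp(-0.5 * err)
    return num / num.sum()

rng = np.random.default_rng(0)
y = np.array([1.0, 0.0])                 # desired output (2 classes)
O = rng.random((4, 2))                   # outputs of 4 experts
Og = rng.standard_normal(4)              # raw gate outputs
g = softmax(Og)                          # Eq. (7): gate priors
mix = g @ O                              # Eq. (8): O = sum_i g_i O_i
h = posteriors(y, O, g)                  # Eq. (14): expert posteriors

# Eq. (10) sketch for one scalar output weight of expert i = 0:
alpha, mu_e, A = 0.5, 0.1, 0.7           # momentum, learning rate, ambiguity
Oh0 = 0.9                                # hidden-layer output feeding that weight
dw_prev = 0.0                            # previous weight step
grad = h[0] * (y[0] - O[0, 0]) * O[0, 0] * (1 - O[0, 0]) * Oh0
dw = alpha * dw_prev + A * mu_e * grad
print(np.isclose(g.sum(), 1.0), np.isclose(h.sum(), 1.0))
```

The ambiguity factor A scales only the experts' updates (Eqs. 10-11); the gate updates (Eqs. 12-13) use the plain momentum form.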
The momentum term may also have the benefit of preventing the learning process from terminating in a shallow local minimum on the error surface [17].

4. Experimental Results

In this section, we describe the results of the three experiments outlined above. As mentioned before, we test our model using the ORL dataset. The images are grey scale with a resolution of 92×112 (Fig. 3). In our experiments, 200 images are chosen for training and the remaining 200 for testing. For implementation convenience, all images were first resized to 48×48 pixels (Fig. 4). PCA is a well-known statistical technique for feature extraction. Each M×N image in the training set is row-concatenated to form an MN×1 vector x_k. Given a set of N training images {x_k}, k = 1, ..., N, the mean vector of the training set is obtained as:

\bar{x} = (1/N) \sum_{k=1}^{N} x_k    (15)
An MN×N training set matrix X = [x_1 - \bar{x}, ..., x_N - \bar{x}] can now be built. The basis vectors are obtained by solving the eigenvalue problem:

\Lambda = V^T \Sigma_x V    (16)

where \Sigma_x = XX^T is the covariance matrix, V is the eigenvector matrix of \Sigma_x and \Lambda is the corresponding diagonal matrix of eigenvalues. As PCA has the property of packing the greatest energy into the least number of principal components, the eigenvectors corresponding to the largest eigenvalues are selected to form a lower-dimensional subspace. It is proven that the residual reconstruction error generated by discarding the trailing components is low even for small m. Figure 5 illustrates the eigenfaces corresponding to the largest and smallest principal components generated by PCA on the ORL training set used in our experiment.

Fig. 5. Two examples of eigenfaces, corresponding to the largest and smallest principal components used in our experiment.

The average performance with respect to the number of principal components for the above-mentioned algorithms is plotted in Fig. 6; the first 40 principal components yield the best average performance.

Fig. 6. Average performance with respect to the number of principal components for different learning algorithms on the ORL dataset.

Setting the number of principal components to 40, we evaluate and compare the performance of the fuzzy Mixture of Experts with other networks. The single MLP has one hidden layer and is trained using the backpropagation algorithm. To have a reasonable evaluation, we compare the single MLP with the mixture model, consisting of four simple MLPs. Experimental results support our claim that four simple MLPs in the mixture architecture perform the recognition task much better than a single MLP. One purpose of the experiments conducted in this paper is to evaluate the effect of different learning algorithms on the number of required hidden neurons. We also investigate the training speed and, in general, the recognition rate. Finally, the Mixture of FMLP Experts method is compared with other face recognition systems.
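The PCA feature extraction of Eqs. (15)-(16) amounts to centering the training images, eigendecomposing the covariance, and projecting onto the leading eigenvectors. A minimal sketch, with hypothetical tiny "images" in place of the 48×48 ORL faces, is:

```python
import numpy as np

def pca_basis(train, m):
    """Eqs. (15)-(16) sketch: center the training images, eigendecompose
    the covariance, keep the m eigenvectors with the largest eigenvalues."""
    mean = train.mean(axis=0)                # Eq. (15): mean image
    X = (train - mean).T                     # MN x N matrix of centered images
    cov = X @ X.T                            # covariance (up to a 1/N factor)
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:m]       # indices of the m largest
    return mean, vecs[:, order]              # top-m "eigenfaces"

def project(img, mean, basis):
    """PCA feature vector of one image: coordinates in the eigenface basis."""
    return basis.T @ (img - mean)

rng = np.random.default_rng(0)
train = rng.random((20, 16))                 # 20 tiny 4x4 "images", flattened
mean, basis = pca_basis(train, m=5)
feat = project(train[0], mean, basis)        # 5-dimensional feature vector
print(feat.shape)
```

In the paper's setting, the feature dimension m is the number of principal components (40 in the best configuration), and the resulting feature vectors are what the experts and the gating network receive as input.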
We performed the above-mentioned experiment with the MLP, fuzzy MLP, Mixture of MLP Experts and Fuzzy Mixture of Experts. The experiment was repeated 20 times by randomly choosing different training and testing sets from the ORL dataset. The number of principal components representing the eigenface was set to the values 10, 15, 20, ..., 80. A total of 20 runs were executed for each learning algorithm.

Fig. 7. Performance rate versus number of neurons in the hidden layer for the experts of our model.

All the networks were trained and tested on the same training and test sets. From the results, it is evident that the performance of the Fuzzy Mixture of Experts network is superior to that of the others. Table 2 lists the details of the training parameters and the best structures, in terms of highest recognition rate, for the experts and the gating network. To find a sufficient number of neurons in the hidden layer of the experts, we experimented with different numbers of neurons. As shown in Fig. 7, experts with 20 neurons in their hidden layer reveal the best performance. To find the required number of epochs for reaching the highest recognition rate in both the Mixture of MLP Experts and the Fuzzy Mixture of Experts, we repeated the same
experiment with different numbers of epochs in training the networks and observed their performance.

Table 2: Details of the training and fuzzy parameters, as well as the recognition rates of the MLP, Fuzzy MLP, Mixture of MLP Experts and Fuzzy Mixture of Experts networks on the test set.

Network Model              Topology                Learning rate & fuzzy parameters              Max percentage
MLP                        40:40:40                0.1                                           91
Fuzzy MLP                  40:20:40                0.1; Momentum: 0.5; f: 0.9; exp: 0.5; m: 0.9  93.5
Mixture of MLP Experts     Experts: 40:20:40       Experts: 0.1; Gating: 0.5                     96
                           Gating: 40:10:4
Fuzzy Mixture of Experts   Experts: 40:20:40       Experts: 0.1; Gating: 0.5; Momentum: 0.5;     98.9
                           Gating: 40:10:4         f: 0.9; exp: 0.5; m: 0.9

As shown in Fig. 8, the Fuzzy Mixture of Experts needs fewer epochs to reach its best result; in other words, it converges faster than the same network trained without the fuzzy MLPs.

Fig. 8. Performance rate versus number of epochs for the Mixture of MLP Experts and the Mixture of FMLP Experts.

5. Conclusion

This paper presented the use of a modified ME structure to improve human face recognition. Our ME is composed of four modified experts (fuzzy MLPs) and a gating network (an MLP). In the fuzzy MLP, the training rules are modified such that the weights are updated considering the degree of ambiguity of each training sample. That is, in the case of a highly ambiguous training sample (a sample that is likely to be misclassified), little weight update is applied. Our proposed ME was trained and tested on the ORL dataset. The recognition rates achieved by the modified ME turned out to be higher than those of a single MLP, a Fuzzy MLP, and the ME with MLP experts trained without the fuzzy system.

References
[1] Zadeh, L., Fuzzy sets, Information and Control, No.8, pp.338-353, 1965.
[2] Ruspini, E., Bonissone, P., and Pedrycz, W., Handbook of Fuzzy Computation, IOP Publishing/Institute of Physics, 1998.
[3] Cox, E., The Fuzzy Systems Handbook, AP Professional, New York, 1994.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, USA, 1999.
[5] Mehrotra, K., Mohan, C. K., and Ranka, S., Elements of Artificial Neural Networks, The MIT Press, 1997.
[6] X. Yao, Evolving artificial neural networks, Proceedings of the IEEE, 87, 1999, pp.1423-1447.
[7] R. Avnimelech and N. Intrator, Boosted mixture of experts: An ensemble learning scheme, Neural Computation, 11(2), 1999, pp.483-497.
[8] D. Jimenez and N. Walsh, Dynamically weighted ensemble neural networks for classification, International Joint Conference on Neural Networks, 1998, pp.753-756.
[9] Y. Liu and X. Yao, Evolving modular neural networks which generalize well, IEEE International Conference on Evolutionary Computation, USA, 1997, pp.605-610.
[10] Y. Liu and X. Yao, Ensemble learning via negative correlation, Neural Networks, 12, 1999, pp.1399-1404.
[11] Y. Liu, X. Yao, and T. Higuchi, Evolutionary ensembles with negative correlation learning, IEEE Transactions on Evolutionary Computation, 4(4), 2000, pp.380-387.
[12] A. Sharkey, Combining Neural Nets: Ensemble and Modular Multi-Net Systems, Springer, New York, 1998.
[13] R.A. Jacobs, M.I. Jordan, and A.G. Barto, Task decomposition through competition in a modular connectionist architecture: the what and where vision tasks, Technical report, University of Massachusetts, 1991.
[14] R.A. Jacobs and M.I. Jordan, Adaptive mixtures of local experts, Neural Computation 3, 1991, pp.79-87.
[15] Jacobs, R., Jordan, M., Nowlan, S., and Hinton, G., Adaptive mixtures of local experts, Neural Computation 3, 1991, pp.79-87.
[16] M. Turk and A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience, Vol.3(1), 1991, pp.71-86.
[17] R. Ebrahimpour, E. Kabir and M.R. Yousefi, Face detection using mixture of MLP experts, Neural Processing Letters, 2007, pp.69-82.
[18] Pal, S. K., Mitra, S., Multilayer perceptron, fuzzy sets and classification, IEEE Transactions on Neural Networks, 3(5), 1992, pp.683-697.
[19] Canuto, A., Howells, G., and Fairhurst, M., An investigation of the effects of variable vigilance within the RePART neuro-fuzzy network, Journal of Intelligent and Robotic Systems, Special Issue on Neural-Fuzzy Systems, 29(4), 2000, pp.317-334.
[20] N. Taher M., A. Hajiany, R. Ebrahimpour and G. Sepidnam, A modified mixture of MLP experts for face recognition, IPCV 2008, Las Vegas, Nevada, USA, pp.740-745.

Nina Taher Makhsoos received the B.S. degree in Computer Engineering from Najaf Abad Azad University of Esfahan, where she studied from 2000 to 2004, and the M.S. degree in Software Engineering from Ferdowsi University of Mashhad, where she studied from 2005 to 2008. Her research interests include image processing, pattern recognition, fuzzy logic, neural networks and their application to face recognition.

Reza Ebrahimpour was born in Mahallat, Iran, in July 1977. He received the BS degree in electronics engineering from Mazandaran University, Mazandaran, Iran and the MS degree in biomedical engineering from Tarbiat Modarres University, Tehran, Iran, in 1999 and 2001, respectively. He received his PhD degree in July 2007 from the School of Cognitive Science, Institute for Studies in Theoretical Physics and Mathematics, where he worked on view-independent face recognition with Mixture of Experts. His research interests include human and machine vision, neural networks, and pattern recognition.

Alireza Hajiany has been a BS student in the Electrical Engineering Department of Shahid Rajaee University, Tehran, Iran, since 2004. His research interests include signal processing, ensemble neural networks, and face recognition.