Shen, Linlin (2005) Recognizing Faces -- An Approach Based on Gabor Wavelets. PhD thesis, University of Nottingham.


Access from the University of Nottingham repository: http://eprints.nottingham.ac.uk/10177/1/recognzng_faces_-- _An_Approach_Based_on_Gabor_Wavelet.

Copyright and reuse: The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions. This article is made available under the University of Nottingham End User licence and may be reused according to the conditions of the licence. For more information, please contact eprints@nottingham.ac.uk

Recognizing Faces -- An Approach Based on Gabor Wavelets

By LinLin Shen, BSc, MSc

Thesis submitted to the University of Nottingham for the degree of Doctor of Philosophy

July 2005

Abstract

As a hot research topic over the last 25 years, face recognition still seems to be a difficult and largely unsolved problem. Distortions caused by variations in illumination, expression and pose are the main challenges to be dealt with by researchers in this field. Efficient recognition algorithms, robust against such distortions, are the main motivation of this research. Based on a detailed review of the background and wide applications of the Gabor wavelet, this powerful and biologically driven mathematical tool is adopted to extract features for face recognition. The features contain important local frequency information and have been proven to be robust against commonly encountered distortions. To reduce the computation and memory cost caused by the large feature dimension, a novel boosting based algorithm is proposed and successfully applied to eliminate redundant features. The selected features are further enhanced by kernel subspace methods to handle the nonlinear face variations. The efficiency and robustness of the proposed algorithm are extensively tested using the ORL, FERET and BANCA databases.

To normalize the scale and orientation of face images, a generalized symmetry measure based algorithm is proposed for automatic eye location. Without the requirement of a training process, the method is simple and fast, and is fully tested using thousands of images from the BioID and BANCA databases.

An automatic user identification system, consisting of detection, recognition and user management modules, has been developed. The system can effectively detect faces from real video streams, identify them and retrieve corresponding user information from the application database. Different detection and recognition algorithms can also be easily integrated into the framework.

List of Publications

Some parts of the work presented in the thesis have been published in the following articles:

1. LinLin Shen and Li Bai. Information theory for Gabor feature selection for face recognition. Eurasip Journal on Applied Signal Processing, in press.
2. LinLin Shen and Li Bai. A review on Gabor wavelets for face recognition. Revision submitted, Pattern Analysis and Application.
3. LinLin Shen and Li Bai. A fast and robust Gabor feature based method for face recognition. In revision (invited submission), Special Issue on Crime Detection and Prevention, Pattern Recognition Letters.
4. LinLin Shen, Li Bai, Daniel Bradsley and YangSheng Wang. Gabor feature selection using improved AdaBoost learning. The International Workshop on Biometric Recognition Systems, in conjunction with ICCV 05, Beijing.
5. LinLin Shen and Li Bai. Kernel enhanced informative Gabor features for face recognition. The 16th British Machine Vision Conference (BMVC), Oxford.
6. Li Bai and LinLin Shen. A fast and robust Gabor feature based method for face recognition. The IEE International Symposium on Imaging for Crime Detection and Prevention, The IEE Savoy Place, London.
7. LinLin Shen and Li Bai. AdaBoost Gabor feature selection for classification. Proc. of Image and Vision Computing New Zealand, New Zealand.
8. LinLin Shen, Li Bai and P. Picton. Facial recognition/verification using Gabor wavelets and kernel methods. Proc. of the IEEE International Conference on Image Processing (ICIP), Singapore.
9. LinLin Shen and Li Bai. Combining Gabor feature and Kernel Direct Discriminant Analysis for face recognition. Proc. of the 17th International Conference on Pattern Recognition (ICPR), Cambridge, UK, Aug., 2004.
10. Kieron Messer, Josef Kittler, Mohammad Sadeghi, Miroslav Hamouz, Alexey Kostin, Fabien Cardinaux, Sebastien Marcel, Samy Bengio, Conrad Sanderson, Norman Poh, Yann Rodriguez, Jacek Czyz, L. Vandendorpe, Chris McCool, Scott Lowther, Sridha Sridharan, Vinod Chandran, Roberto Parades Palacios, Enrique Vidal, Li Bai, LinLin Shen, Yan Wang, Chiang Yueh-Hsuan, Liu Hsien-Chang, Hung Yi-Ping, Alexander Heinrichs, Marco Müller, Andreas Tewes, Christoph von der Malsburg, Rolf Würtz, Zhenger Wang, Feng Xue, Yong Ma, Qiong Yang, Chi Fang, Xiaoqing Ding, Simon Lucey, Ralph Goss, and Henry Schneiderman. Face authentication test on the BANCA database. Proc. of the 17th International Conference on Pattern Recognition (ICPR), Cambridge, UK.
11. LinLin Shen and Li Bai. Gabor feature based face recognition using Kernel methods. Proc. of the IEEE 6th International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, May.
12. Li Bai and LinLin Shen. Combining wavelets with HMM for face recognition. Proc. of the 23rd SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge, UK, Dec.
13. LinLin Shen and Li Bai. A comparison of eye and non-eye classifiers. Proc. of the 1st Annual Academic Conference of CSSA-Nottingham on Science and Engineering, Nottingham, UK, Jul., 2003.
14. Li Bai and LinLin Shen. Face detection by orientation map matching. Proc. of International Conference on Computational Intelligence for Modelling Control and Automation, Austria, Feb., 2003.
15. LinLin Shen and Li Bai. Effects of different Gabor filter parameters on face recognition. Technical Report, School of CS & IT, University of Nottingham.
16. LinLin Shen and Li Bai. PCA versus LDA for face verification using small size training data. Technical Report, School of CS & IT, University of Nottingham.
17. LinLin Shen and Li Bai. A video based real time face recognition system. Technical Report, School of CS & IT, University of Nottingham, 2003.

Acknowledgement

I would like to thank my supervisor Dr. Bai Li for her valuable suggestions, encouragement and help throughout the whole course of my research. I would like to thank Professor Peter Ford for his consistent encouragement and support of our research. Thanks also to Mr. Daniel for checking the grammar of the thesis, and to the technical staff member, Viktor, for his friendly and generous help during the three years.

I would like to express my deepest gratitude to my parents, sister and brothers. Without their support and love, I would not have studied anything.

This thesis is dedicated to my wife, Shandy, ZhengXiang. Without her support and encouragement, there would never have been any chance for this thesis to happen.

Table of Contents

CHAPTER 1 INTRODUCTION
  1.1 AUTOMATIC PERSON IDENTIFICATION
  1.2 BIOMETRICS
  1.3 FACE IDENTIFICATION AND VERIFICATION
  1.4 PERFORMANCE EVALUATION
    1.4.1 Identification System
    1.4.2 Verification System
  1.5 MOTIVATION AND SOLUTIONS
  1.6 MAJOR CONTRIBUTIONS OF THE THESIS
  1.7 ORGANIZATION OF THE THESIS

CHAPTER 2 MATHEMATICAL TECHNIQUES USED IN THIS THESIS
  2.1 JOINT TIME FREQUENCY ANALYSIS AND GABOR WAVELETS
    2.1.1 Joint Time Frequency Analysis and Gabor Function
    2.1.2 2D Gabor Wavelets
  2.2 LINEAR SUBSPACE ANALYSIS
    2.2.1 Principal Component Analysis (PCA)
    2.2.2 Linear Discriminant Analysis (LDA)
  2.3 NON-LINEAR KERNEL SUBSPACE ANALYSIS
    2.3.1 The Kernel Feature Space
    2.3.2 Kernel Principal Component Analysis (KPCA)
    2.3.3 Generalized Discriminant Analysis
    2.3.4 Non-centred Data
  2.4 ADABOOST LEARNING ALGORITHM
    2.4.1 The Algorithm
    2.4.2 Training Error
    2.4.3 Choosing α_t and h_t
  2.5 SUPPORT VECTOR MACHINE
  2.6 ENTROPY AND MUTUAL INFORMATION
  2.7 NOTATION DEFINITIONS
    2.7.1 Gabor Jet and Similarity Function
    2.7.2 Eigenfaces and Fisherfaces
    2.7.3 The Difference Space
  2.8 SUMMARY

CHAPTER 3 LITERATURE REVIEW
  3.1 2D FACE RECOGNITION METHODS
    3.1.1 Analytic Methods
    3.1.2 Holistic Methods
    3.1.3 Hybrid Methods
  3.2 GABOR WAVELET BASED 2D METHODS
    3.2.1 Analytic Methods
    3.2.2 Holistic Methods
    3.2.3 Gabor Wavelet Network
    3.2.4 Performance Evaluation
    3.2.5 Complexity of Gabor Feature Based Methods
    3.2.6 Optimization of Gabor Wavelets for Feature Extraction
  3.3 3D FACE RECOGNITION METHODS
  3.4 SUMMARY

CHAPTER 4 GABOR FEATURES AND KERNEL SUBSPACE ANALYSIS FOR FACE IDENTIFICATION
  4.1 THE METHODOLOGY
    4.1.1 System Architecture
    4.1.2 Gabor Feature Extraction
    4.1.3 DownSampling and Kernel Subspace Analysis
    4.1.4 Distance Measure and Classification
  4.2 EXPERIMENTAL RESULTS
    4.2.1 The Datasets
    4.2.2 Performance Evaluation Using the FERET Database
    4.2.3 Performance Evaluation Using the ORL Database
  4.3 CONCLUSIONS

CHAPTER 5 GENERALIZED DISCRIMINANT ANALYSIS OF GABOR FEATURES FOR FACE VERIFICATION
  5.1 FACE VERIFICATION COMPETITION 2004 AND THE BANCA DATABASE
    5.1.1 The Competition
    5.1.2 The Database
    5.1.3 Test Protocols
  5.2 THE METHODOLOGY
    5.2.1 System Architecture
    5.2.2 Similarity Measure and Threshold Determination
  5.3 EXPERIMENTAL RESULTS
    5.3.1 The Dataset
    5.3.2 Results on the Development Set
    5.3.3 Results on the Evaluation Set
    5.3.4 Comparison with Other Methods
  5.4 CONCLUSIONS

CHAPTER 6 OPTIMISING GABOR FEATURES FOR OBJECT DETECTION AND RECOGNITION
  6.1 ADABOOST FEATURE SELECTION AND CLASSIFIER LEARNING
  6.2 THE PROPOSED MUTUALBOOST ALGORITHM
  6.3 APPLICATION TO OBJECT DETECTION
  6.4 APPLICATION TO FACE RECOGNITION
  6.5 EXPERIMENTAL RESULTS
    6.5.1 Gabor Feature Based Classifier for Object Detection
    6.5.2 Selecting Gabor Features for Face Recognition
  6.6 CONCLUSIONS

CHAPTER 7 RADIAL SYMMETRY TRANSFORM BASED EYE LOCATION
  7.1 BACKGROUND
  7.2 THE METHODOLOGY
    7.2.1 The Generalized Symmetry Transform
    7.2.2 The Radial Symmetry Measure
    7.2.3 Eye Location by the Radial Symmetry
  7.3 EXPERIMENTAL RESULTS
    7.3.1 The Results on BioID Database
    7.3.2 The Results on BANCA Database
    7.3.3 Integration with the Face Verification System
  7.4 CONCLUSIONS

CHAPTER 8 THE DEVELOPED USER IDENTIFICATION SYSTEM
  8.1 SYSTEM ARCHITECTURE
    8.1.1 Registration
    8.1.2 Identification
  8.2 SYSTEM MODULES
    8.2.1 Face Detection
    8.2.2 Recognition
    8.2.3 User Management
  8.3 CONCLUSIONS

CHAPTER 9 CONCLUSIONS AND FUTURE WORKS
  9.1 SUMMARY OF WORKS
    9.1.1 An Overview of Gabor Wavelets: Background and Applications
    9.1.2 Gabor Wavelets and Kernel Subspace Methods for Face Identification and Verification
    9.1.3 Learning the Most Important Gabor Features for Object Detection and Recognition
    9.1.4 Automatic Eye Location
    9.1.5 The User Identification System
  9.2 FUTURE WORKS
    9.2.1 Extensions of the Present Works
    9.2.2 Gabor Feature Selection with Larger Search Space
    9.2.3 Pose Invariant Face Recognition

APPENDIX A EIGENVALUE SOLUTIONS OF GDA
APPENDIX B OPTIMISING α_t AND h_t IN ADABOOST ALGORITHM
APPENDIX C SKIN BLOB ELLIPSE FITTING
BIBLIOGRAPHY

List of Figures

FIGURE 2-1 GABOR ELEMENTARY FUNCTIONS WITH FIXED SHAPE (A); WITH VARIED SHAPES (B)
FIGURE 2-2 TIME DURATION AND FREQUENCY BANDWIDTH OF GABOR FUNCTIONS (KYRKI ET AL., 2004)
FIGURE 2-3 EXAMPLE 2D GABOR WAVELETS IN THE SPATIAL AND THE FREQUENCY DOMAIN (A) f = 0.4, θ = 0, γ = 4, η = 2, (B) f = 0.2, θ = π/4, γ = 2, η =
FIGURE 2-4 A SIMPLE EXAMPLE (2D->3D) (MULLER, MIKA, RATSCH, TSUDA, & SCHOLKOPF, 2001)
FIGURE 2-5 DETAILS OF ADABOOST ALGORITHM (FREUND ET AL., 1999)
FIGURE 2-6 A HYPERPLANE CLASSIFIER IN 2-DIMENSION FEATURE SPACE
FIGURE 2-7 MAP DATA INTO A FEATURE SPACE WHERE THEY ARE LINEARLY SEPARABLE
FIGURE 3-1 GEOMETRIC FEATURES USED FOR FACE RECOGNITION (BRUNELLI ET AL., 1993)
FIGURE 3-2 FACE IMAGES REPRESENTED BY GRAPHS (LADES ET AL., 1993)
FIGURE 3-3 2D EMBEDDED HMM STRUCTURE (NEFIAN ET AL., 1999)
FIGURE 3-4 DIFFERENT BASES OF LINEAR PROJECTIONS: LDA, PCA + LDA AND PCA BASES ARE SHOWN ON THE FIRST, SECOND AND THIRD ROW RESPECTIVELY (ZHAO ET AL., 1998)
FIGURE 3-5 THE DIAGRAM FOR A RBF BASED FACE RECOGNITION SYSTEM (ER, WU, LU, & TOH, 2002)
FIGURE 3-6 BINARY SVM TREE (GUO, LI, & CHAN, 2001)
FIGURE 3-7 LANDMARKS OF ASM (A); VARIANCE OF THE FACIAL SHAPE (B); AND APPEARANCE (C) (LANITIS ET AL., 1997)
FIGURE 3-8 FACE ADAPTED GRAPHS FOR DIFFERENT POSES (A) AND AN EXAMPLE FACE BUNCH GRAPH (B) (WISKOTT ET AL., 1997)
FIGURE 3-9 THE GROUP SHIFTING/DEFORMATION ALGORITHM (MU ET AL., 2003)
FIGURE 3-10 FACIAL FEATURE POINTS AND THE RESULTS OF GRAPH ADJUSTING (LIAO ET AL., 2000)
FIGURE 3-11 FLOWCHART OF VARIABLE FEATURE POINTS LOCATION (KEPENEKCI, 2001)
FIGURE 3-12 CONVOLUTION RESULTS OF A FACE IMAGE WITH 40 GABOR WAVELETS
FIGURE 3-13 ORIGINAL IMAGE AND THE RECONSTRUCTED IMAGE WITH DIFFERENT NUMBER OF WAVELETS (KRUGER ET AL., 2002A)
FIGURE 3-14 SIGNIFICANT LOCATIONS SELECTED BY DIFFERENT ALGORITHMS: (A) A LOCAL DISCRIMINATION CRITERION RANKED JETS LOCATION, SIGNIFICANCES ARE PROPORTIONAL TO THE RADII OF THE CIRCLES; (B) THE 15 MOST IMPORTANT LOCATIONS SELECTED BY GA; (C) 2 × 2 SAMPLING FOR KEY POINTS WHILE 4 × 4 SAMPLING FOR ASSISTANT POINTS
FIGURE 3-15 EXAMPLE 2D INTENSITY IMAGE, 3D RANGE IMAGE AND SAMPLE HOLE IN SENSED 3D DATA (BOWYER ET AL., 2004)
FIGURE 4-1 SYSTEM ARCHITECTURE
FIGURE 4-2 THE 40 GABOR WAVELETS IN THE SPATIAL AND FREQUENCY DOMAIN
FIGURE 4-3 CONVOLUTION RESULT (MAGNITUDE AND REAL PART) OF AN IMAGE WITH 40 GABOR WAVELETS
FIGURE 4-4 SAMPLE IMAGES FROM THE UMIST DATABASE
FIGURE 4-5 DISTRIBUTION OF FACE SAMPLES IN PCA, LDA, KPCA AND GDA SUBSPACES
FIGURE 4-6 ENERGY OF THE EIGENVALUES IN PCA, LDA, KPCA AND GDA SUBSPACES
FIGURE 4-7 EXAMPLE TRAINING IMAGES (TOP 2 ROWS) AND TEST IMAGES (BOTTOM ROW) OF THE FERET DATABASE
FIGURE 4-8 PERFORMANCE OF GABOR + GDA USING DIFFERENT DISTANCE MEASURES
FIGURE 4-9 PERFORMANCE OF GABOR + KPCA WITH DIFFERENT DISTANCE MEASURES
FIGURE 4-10 EXPERIMENTAL RESULTS OF PCA, LDA, KPCA AND GDA USING GABOR FEATURES
FIGURE 4-11 PERFORMANCE IMPROVEMENT OF PCA AND GDA USING GABOR FEATURES
FIGURE 4-12 EXAMPLE TRAINING (A), (C) AND TEST IMAGES (B), (D) IN THE ORL DATABASE
FIGURE 5-1 EXAMPLE IMAGES IN THE BANCA DATABASE
FIGURE 5-2 SYSTEM ARCHITECTURE
FIGURE 5-3 NORMALIZED FACE IMAGES
FIGURE 5-4 ROC CURVES ON THE DEVELOPMENT SET
FIGURE 6-1 PROTOTYPES OF SIMPLE HAAR-LIKE FEATURES (LIENHART ET AL., 2002)
FIGURE 6-2 THE PROPOSED MUTUALBOOST ALGORITHM
FIGURE 6-3 EXTRA-PERSONAL DIFFERENCE SAMPLES GENERATION
FIGURE 6-4 IMAGES FROM FACE IMAGE SET
FIGURE 6-5 IMAGES FROM CAR IMAGE SET
FIGURE 6-6 SCALE AND ORIENTATION DISTRIBUTION OF FILTERS SELECTED FOR THE FACE IMAGE SET
FIGURE 6-7 FIRST EIGHT SELECTED GABOR WAVELETS FOR THE CAR
FIGURE 6-8 FAR AND FRR ON THE TRAINING FACE IMAGE SET (A) AND THE TRAINING CAR IMAGE SET (B)
FIGURE 6-9 FAR AND FRR ON THE TEST FACE IMAGE SET (A) AND THE TEST CAR IMAGE SET (B)
FIGURE 6-10 FIRST SIX GABOR FEATURES (A)-(F); AND THE 200 FEATURE POINTS (G) SELECTED BY ADABOOST
FIGURE 6-11 FIRST SIX GABOR FEATURES (A)-(F); AND THE 200 FEATURE POINTS (G) SELECTED BY MUTUALBOOST
FIGURE 6-12 DISTRIBUTION OF MUTUALGABOR FEATURES IN SCALE AND ORIENTATION
FIGURE 6-13 MI OF FEATURES SELECTED BY ADABOOST (A); MUTUALBOOST (B)
FIGURE 6-14 RECOGNITION PERFORMANCE OF ADAGABOR AND MUTUALGABOR
FIGURE 6-15 RECOGNITION PERFORMANCE OF ENHANCED MUTUALGABOR
FIGURE 6-16 EXAMPLES OF DIFFERENT PROBE IMAGES
FIGURE 7-1 THE CONTRIBUTION OF POINTS p_i AND p_j TO THE SYMMETRY MEASURE (REISFELD ET AL., 1995)
FIGURE 7-2 THE SYSTEM OUTPUT AT DIFFERENT STAGES. (A) THE INPUT IMAGE; (B) THE RADIAL SYMMETRY MAP; (C) THE FILTERED SYMMETRY MAP; (D) THE THRESHOLDED BINARY SYMMETRY MAP
FIGURE 7-3 A SAMPLE FACE IMAGE AND THE LOCATED EYE CENTRE
FIGURE 7-4 SOME SAMPLE TEST RESULTS
FIGURE 7-5 ERROR DISTRIBUTION FOR TEST SET WITH GLASSES AND WITHOUT GLASSES
FIGURE 7-6 THE LOCATION ACCURACY VARYING WITH PARAMETER A
FIGURE 7-7 AUTOMATICALLY NORMALIZED FACE IMAGES
FIGURE 7-8 WRONG LOCATIONS CAUSED BY FACE DETECTION MODULE (A); EYE LOCATION MODULE (B)
FIGURE 8-1 REGISTRATION FLOW CHART
FIGURE 8-2 IDENTIFICATION FLOW CHART
FIGURE 8-3 A SNAPSHOT OF THE USER IDENTIFICATION SYSTEM
FIGURE 8-4 A SAMPLE IMAGE WITH DETECTED FACE
FIGURE 8-5 DISTRIBUTION OF SKIN COLORS IN CB, CR DOMAIN
FIGURE 8-6 DETECTED FACE IMAGE (A); SKIN PROBABILITY IMAGE (B) AND MASKED FACE IMAGE (C)
FIGURE 8-7 ELLIPSE FITTING FOR FACES WITH DIFFERENT ORIENTATIONS
FIGURE 8-8 RECOGNITION MODULE DIAGRAM
FIGURE 8-9 THE HMM FACE RECOGNITION ALGORITHM
FIGURE 8-10 USER MANAGEMENT MODULE DIAGRAM
FIGURE 8-11 A SNAPSHOT OF THE VIDEO BASED IDENTIFICATION SYSTEM
FIGURE 9-1 A CLASSIFICATION BASED FACE DETECTION SYSTEM
FIGURE 9-2 DIAGRAM OF THE DETECTION CASCADE (VIOLA ET AL., 2001)

List of Tables

TABLE 3-1 LIST OF GABOR WAVELET BASED FACE RECOGNITION ALGORITHMS AND ACCURACY
TABLE 4-1 COMPARATIVE RESULTS OF GABOR + GDA WITH OTHER METHODS ON PART OF THE FERET DATABASE
TABLE 4-2 EXPERIMENTAL RESULTS OF PCA, LDA, KPCA AND GDA ON THE ORL DATABASE
TABLE 4-3 PERFORMANCE IMPROVEMENTS USING GABOR FEATURES ON THE ORL DATABASE
TABLE 4-4 RESULTS OF OTHER METHODS ON THE ORL DATABASE
TABLE 5-1 VERIFICATION PERFORMANCE ON THE DEVELOPMENT SET
TABLE 5-2 VERIFICATION PERFORMANCE ON THE EVALUATION SET
TABLE 5-3 VERIFICATION RESULTS FOR PARTIALLY AUTOMATIC SYSTEMS
TABLE 6-1 COMPARATIVE CLASSIFICATION RESULTS ON THE FACE IMAGE SET
TABLE 6-2 SVM CLASSIFICATION RESULTS ON THE FACE IMAGE SET
TABLE 6-3 ANL FOR DIFFERENT TMI
TABLE 6-4 COMPARATIVE COMPUTATION AND MEMORY COST OF GABOR + GDA AND MUTUALGABOR + GDA
TABLE 6-5 LIST OF DIFFERENT PROBE SETS
TABLE 6-6 FERET EVALUATION RESULTS FOR VARIOUS FACE RECOGNITION ALGORITHMS
TABLE 7-1 STATISTICAL RESULTS ON THE BANCA DATABASE
TABLE 7-2 VERIFICATION RESULTS FOR FULLY AUTOMATIC SYSTEMS
TABLE 7-3 COMPARATIVE RESULTS FOR FULLY AND PARTIALLY AUTOMATIC FACE VERIFICATION SYSTEMS

Chapter 1 Introduction

The major concern of this thesis is to develop an automatic face recognition system that is robust against variations in illumination, expression and pose. At the same time, the system has to take computation and memory cost into consideration for real-time applications. This chapter gives a brief introduction to the background of this research and a summary of some potential applications. Following a description of how to evaluate the performance of different systems, the motivations behind the research and the organization of the thesis are introduced.

1.1 Automatic Person Identification

With the advent of electronic banking, e-commerce and smartcards, and an increased emphasis on the privacy and security of information stored in various databases, automatic personal identification has become a very important topic. Accurate automatic personal identification is now needed in a wide range of civilian applications involving the use of passports, cellular phones, automatic teller machines and driver licences. Traditional knowledge-based (password or Personal Identification Number (PIN)) and token-based (passport, driver licence and ID card) identifications are prone to fraud because PINs may be forgotten or guessed by an impostor and tokens may be lost or stolen. Therefore, approaches that rely only on knowledge or tokens are unable to satisfy the security requirements of our electronically interconnected information society. A perfect identity authentication system will need a biometric component.

1.2 Biometrics

A biometric is a representation of a unique part or characteristic of an individual which has the potential capability to distinguish between an authorised person and an impostor. Since biometric characteristics are distinctive and cannot be forgotten or lost, and since the person to be authenticated needs to be physically present at the point of identification, biometrics are inherently more reliable and more capable than traditional knowledge-based and token-based techniques. Currently there are many biometric technologies used for personal authentication: face, fingerprint, hand geometry, iris, retina, signature, voice, etc. Although other methods of identification (such as fingerprint or iris scans) can be more accurate, face recognition has always remained a major focus of research because of its non-invasive nature and because it is the primary method of identification used by humans. The technology of face recognition can be widely applied in security surveillance, authentication, access control and human computer interfaces. Since the late eighties there has been an explosive growth in research on face recognition because of the practical importance of the topic and theoretical interest from cognitive scientists as well as computer vision and pattern recognition researchers.

1.3 Face Identification and Verification

A biometric system can be operated in two modes: verification mode and identification mode. In verification mode, a biometric system either accepts or rejects a user's claimed identity, while a biometric system operating in identification mode establishes the identity of the user without a claimed identity. Face identification is a more difficult problem than face verification because a huge number of comparisons need to be performed in order to complete identification. There are a number of potential civilian applications for a biometric system working in verification mode. For example, an ATM system that verified a user's face upon each transaction would need only to match the current face image (acquired at the point of transaction) with a single template stored on the ATM card.

A typical face verification system can be divided into two modules: enrolment and verification. The enrolment module scans the face of a person through a sensing device and then stores a representation (template) of the face in the database. The verification module is invoked during the operation phase. The same representation used in the enrolment phase is extracted from the input face and matched against the template of the claimed identity to give a yes/no answer. An identification system, on the other hand, matches the input face with a large number of faces in the database, and as a result algorithm efficiency is a critical issue in an identification system.
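The difference between the two operating modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system developed in this thesis: the toy feature vectors, the cosine similarity used as the matcher, the threshold value and the function names `verify` and `identify` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed matcher)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, template, threshold=0.9):
    """Verification: a single comparison against the claimed identity's template."""
    return cosine_similarity(probe, template) >= threshold


def identify(probe, gallery):
    """Identification: compare the probe with every enrolled template."""
    best_id, best_score = None, -1.0
    for user_id, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id, best_score


# Hypothetical enrolled templates (toy 3-dimensional features).
gallery = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.0]}
probe = [0.9, 0.1, 0.2]

print(verify(probe, gallery["alice"]))  # one comparison, yes/no answer
print(identify(probe, gallery)[0])      # len(gallery) comparisons, best match
```

Verification costs one comparison regardless of the database size, whereas the cost of identification grows with the number of enrolled users, which is why efficiency matters most in identification mode.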

1.4 Performance Evaluation

1.4.1 Identification System

The performance of a face identification system is usually evaluated by its recognition rate, which is calculated by matching a set of test face images with those in the database. Different algorithms can be evaluated by matching each test face image. The matching attempts performed for each test usually consist of correct matches and incorrect matches. A match is considered correct if the two face images being matched are from the same person, and incorrect otherwise. The recognition rate is defined as the ratio between the number of correct matches and the number of test images.

1.4.2 Verification System

In a face verification system, system-level performance evaluations are usually performed by cross matching the face images in the database. Different algorithms can be evaluated by matching each face image in the database with the rest of the images in the database. A threshold value is normally used such that a matching attempt is considered authentic when the matching score is equal to or above the threshold value. Two metrics, FAR and FRR, are used to measure the performance of the whole system. The false acceptance rate (FAR) is the measure of the likelihood that the biometric security system will incorrectly accept an access attempt by an unauthorized user. A system's FAR is typically stated as the ratio of the number of false acceptances to the number of impostor attempts. The false rejection rate (FRR) is the measure of the likelihood that the biometric security system will incorrectly reject an access attempt by an authorized user. Analysis of the FAR shows how well the system can distinguish a correct match from an incorrect match and is usually related to the uniqueness of the features. FRR analysis, on the other hand, focuses on the repeatability of the features between different faces of the same person and is related to the reliability of the features.

A system can be tuned for a particular application by varying the value of these two metrics. A low value for both metrics is often desirable. Unfortunately, the two are coupled: reducing one of them increases the other, so a trade-off between the metrics is required. The Receiver Operating Characteristic (ROC) curve plots FAR versus FRR (Jonsson, Kittler, Li, & Matas, 2002) for a system and can be used as a guide for the selection of an operating point for the system. The FAR is usually plotted on the horizontal axis as the independent variable and the FRR on the vertical axis as the dependent variable. The closer the ROC curve is to the x and y axes, the lower the verification error and thus the more reliable the system. In reporting the performance, the values of FAR and FRR for the ROC curve are computed by varying the threshold value and using:

$$\mathrm{FAR} = \frac{n_{ac}}{n_u}; \qquad \mathrm{FRR} = \frac{n_{re}}{n_a} \tag{1.1}$$

In Equation (1.1), $n_a$ is the number of access attempts by authorized users and $n_u$ is the number of access attempts by unauthorized users. For a given threshold value, $n_{ac}$ is the number of acceptances of unauthorized users (false acceptances) and $n_{re}$ is the number of rejections of authorized users (false rejections). From the ROC curve, the Equal Error Rate (EER) is defined as the point where the value of FAR equals the value of FRR. The EER can then be used to summarize the performance of the system: the lower the EER, the more reliable the system.

1.5 Motivation and Solutions

Face recognition has been a hot research topic over the last 25 years, and a large number of face recognition algorithms have been proposed in the literature. Chapter 3 contains a detailed survey of this research. With a number of different databases available, it is always very difficult to compare different face recognition algorithms. Even when the same database is used, researchers may use different protocols for testing. Whilst many of the algorithms perform well on a certain database, they do not achieve good results on other databases. To make a fair comparison, the FERET evaluation (Phillips, Moon, Rizvi, & Rauss, 2000) and the Face Authentication Test (Messer et al., 2004) have been designed to evaluate different face identification and verification algorithms. However, these tests are not concerned with the speed of the algorithms. Since only accuracy is accounted for, the applicability of the algorithms to real-time applications is not considered, even though the trade-off between accuracy and speed is very important. In summary, a face recognition system should not only be able to cope with variations in illumination, expression and pose, but also recognize a face in real time.

With in-plane face rotation, normalisation can be carried out using prominent facial features, e.g. the eyes, as a reference. However, out-of-plane rotation seems only to be solvable using 3D technologies. While the transformation of 3D data between different poses is trivial, 2D frontal view images can also be synthesized using a 3D model. The literature survey of 3D face model techniques in Chapter 3 shows, however, that the process of synthesizing a frontal view image from an arbitrary pose using a 3D model is very slow. A number of approaches have also been proposed to use 3D data directly for recognition when such data is available. However, 3D scanners are still relatively expensive and there are still some significant limitations to be solved, e.g. the capture process is illumination sensitive and the 3D depth resolution needs to be improved. As a result, a 2D frontal view face recognition system is the main focus of this research.

Though quite a tough task for a computer, face recognition seems to be much easier for human beings. The ability to recognize faces and understand the emotions they convey is one of the most important human abilities. It is very common that one can instantly recognize thousands of people. Even a baby is able to identify its mother's face within half an hour of birth. As with many perceptual abilities, the ease with which humans can recognize faces disguises the complexity of the task, even when considering the many potential variations in such a dynamic real-world object. Research on artificial vision systems has shown that more than half of the cortex becomes more active during visual processing (Hallinan, Gordon, Yuille, Giblin, & Mumford, 1999). The visual cortex thus plays a very important role in face recognition. Simple cells in the visual cortex are known to be selective for four coordinates, each cell having an x, y location in visual space, a preferred orientation and a preferred spatial frequency (Daugman, 1985). Based on this observation, a number of studies have shown that the various 2D receptive-field profiles encountered in populations of simple cells are well described by a family of 2D Gabor wavelets, which were first proposed by Gabor (1946) for simultaneous time and frequency analysis.

In addition to this biological motivation, it is also widely believed that local texture features in face images, extracted by a spatial-frequency wavelet analysis, are more robust against distortions caused by variations in illumination, expression and pose (Zhao, Chellappa, Rosenfeld, & Phillips, 2000). In particular, among various wavelet bases with good characteristics of space-frequency localization, the Gabor function provides the optimal resolution in both the spatial and frequency domains (Gabor, 1946; Daugman, 1985). As a result, this research applies 2D Gabor wavelets to extract features for face recognition. Since the simple cells of the human visual cortex are well modelled and the local features in the space and frequency domains are simultaneously extracted with optimal resolution, the system thus developed might be able to mimic a human's recognition ability and be more robust against variations in illumination, expression and limited out-of-plane face rotation.

22 Introducton The motvaton of ths research s to develop both an accurate and a fast frontal vew face recognton algorthm, whch should be robust aganst varatons n llumnaton, expresson and lmted out of plane face rotaton. At the same tme, the system wll be effcent and applcable to real-tme applcatons. When the recognton algorthm has been comprehensvely tested aganst a number of dfferent databases and ts performance maxmsed, t wll be mplemented as a component of a fully automatc face recognton system, complete wth face detecton module. 1.6 Major Contrbutons of The Thess The major contrbutons of the thess can be summarzed as below: An overvew on the background and applcatons of Gabor wavelet has been presented, whch shows that ths bologcally drven mathematcal tool can acheve the optmal resoluton when performng jont tme frequency analyss on the sgnal. The survey of applcatons of such wavelet to face recognton also provdes some gudance for researchers n ths area. A face recognton algorthm robust aganst varatons of llumnaton, expresson and lmted out of plane rotatons has been developed. Once Gabor features are extracted usng a set of Gabor wavelets, kernel subspace methods are then appled to enhance classfcaton accuracy. The algorthm s successfully appled to dentfcaton tasks and tested usng publc databases and protocols. The results verfed the robustness of the extracted features aganst the nonlnear dstortons caused by facal varatons. Based on the successful applcaton of Gabor features and kernel subspace methods to face dentfcaton, the method combnng Generalzed Dscrmnant Analyss (GDA) and Gabor features has also been successfully appled to verfcaton. The 8

23 Introducton expermental results show that the algorthm s among the top performers n the Face Verfcaton Competton A novel feature selecton scheme, MutualBoost, has been proposed to learn the most mportant Gabor features for face recognton. The requrement of Gabor feature based methods for computaton and memory can be substantally reduced when the selected features are used. The results show that MutualBoost selected Gabor features are more dscrmnatve than those learned by the AdaBoost algorthm. The selected nformatve Gabor features are further combned wth GDA (MutualGabor + GDA) for recognton and the method has been fully tested usng the FERET database accordng to the evaluaton protocol. The results show that MutualGabor + GDA acheves better performance than the top performer n the FERET evaluaton, but wth much hgher effcency. A novel symmetry based eye locaton method s presented n ths research. By ntegratng the robust Gabor + GDA algorthm wth the eye locaton method, a fully automatc verfcaton system has been developed. When competng wth 12 partcpants from around the world, the system ranked the 3rd n the Face Verfcaton Competton (FVC2004). We have developed an automatc user dentfcaton system, whch can effectvely detect faces from a real tme vdeo stream, dentfy them and retreve ther regstered personal nformaton such as name etc. The system s expandable and fully ntegratable wth other face detecton and recognton algorthms. 1.7 Organzaton of the Thess The remanng chapters of ths thess are organzed n the followng way: Chapter 2 ntroduces n detal the mathematcal technologes used n the thess. Whle Gabor wavelets are used for robust feature extracton, subspace analyss and support 9

vector machines are used for feature enhancement and classification. The AdaBoost algorithm and information theory are also described.

Chapter 3 reviews state of the art face recognition algorithms; both 2D based and 3D based approaches are included. In particular, the major concern of the thesis, i.e., Gabor wavelet based methods, is explored in detail.

Chapters 4 and 5 present the proposed Gabor + GDA method for identification and verification, respectively. Both methodology and experimental results are given.

Chapter 6 describes a novel feature selection scheme and its application to selecting Gabor features for face recognition. The results show that the system using the selected Gabor features can significantly increase efficiency without deteriorating performance. On the contrary, the face recognition system using the selected Gabor features has been shown to be more robust against changes in illumination, pose and expression.

Chapter 7 proposes a generalized symmetry transform based eye location algorithm, which is tested using thousands of face images. The eye location module has also been integrated into an automatic verification algorithm, and top performance on accuracy is observed when compared with other algorithms.

Chapter 8 presents an automatic user identification system developed in the research. Both system design and function modules are explained.

Finally, Chapter 9 gives conclusions and some comments on future research work on face recognition.

Chapter 2 Mathematical Techniques Used in This Thesis

This chapter is concerned with the main mathematical techniques used in this thesis, which are listed below:

Gabor wavelets
Linear Subspace Analysis
Non-linear Kernel Subspace Analysis
AdaBoost Learning Algorithm
Support Vector Machine
Entropy and Mutual Information

2.1 Joint Time Frequency Analysis and Gabor Wavelets

Joint Time Frequency Analysis and Gabor Function

For the past few decades the Fourier transform has been the most commonly used tool for signal frequency analysis (Ronald, 1978). It is, however, hard to tell where within a signal certain frequencies occur, i.e., the information about the time domain is lost. Given that the frequency content of the majority of signals in the real world changes with time, it is far more useful to be able to characterize a signal in both the time and frequency domains simultaneously.

Instead of comparing the signal to complex sinusoidal functions, a natural way of representing a signal in time and frequency simultaneously is to compare the signal with elementary functions that are concentrated in both the time and frequency domains (Qian & Chen, 1996). Let $s(\tau)$ and $\varphi(\tau)$ be the signal and an elementary function with centre frequency $f$; the joint time and frequency representation of the signal can thus be written as $\int s(\tau)\,\varphi(\tau - t)\,d\tau$, which is an inner product between the signal $s(\tau)$ and the shifted elementary function $\varphi(\tau)$. By moving the short time duration window function $\varphi(\tau)$, one can obtain information on how the signal's frequency content evolves over time. Suppose that the time duration and frequency bandwidth of $\varphi(\tau)$ are $\Delta t$ and $\Delta f$ respectively; then $\int s(\tau)\,\varphi(\tau - t)\,d\tau$ denotes signal information in the range $[t - \Delta t,\, t + \Delta t] \times [f - \Delta f,\, f + \Delta f]$. To achieve an exact measure of a signal at a particular time and frequency, $\Delta t$ and $\Delta f$ should be as narrow as possible. Unfortunately, the values of $\Delta t$ and $\Delta f$ depend on each other; they are related via the Fourier transform. It is well known that when the time duration increases, the frequency bandwidth must decrease and vice versa (Ronald, 1978); thus there is always an inherent uncertainty in the time and frequency

resolution of $\varphi(t)$. Several different methods are available to calculate the time duration and frequency bandwidth of a signal. The most common is the standard deviation, or root mean square (r.m.s.), a concept used in statistical theory (Qian et al., 1996; Daugman, 1985). The time duration $\Delta t$ is defined as:

$$(\Delta t)^2 = \frac{\int (t - \mu_t)^2\,\varphi(t)\,\varphi^*(t)\,dt}{\int \varphi(t)\,\varphi^*(t)\,dt}, \qquad \mu_t = \frac{\int t\,\varphi(t)\,\varphi^*(t)\,dt}{\int \varphi(t)\,\varphi^*(t)\,dt} \tag{2.1.1}$$

By calculating the frequency uncertainty $\Delta f$ using a similar definition, it has been shown that there is a connection between the two uncertainties:

$$\Delta t\,\Delta f \ge \frac{1}{2} \tag{2.1.2}$$

Gabor (1946) derived the function that minimizes this uncertainty, i.e., turns the inequality into an equality such that $\Delta t\,\Delta f = \frac{1}{2}$. He found that the function is a Gaussian modulated by a sinusoidal signal:

$$\varphi(t) = \exp(-\alpha^2 t^2)\,\exp(j 2\pi f_0 t) \tag{2.1.3}$$

where $\alpha$ is the sharpness of the Gaussian, and $f_0$ is the centre frequency of the sinusoidal signal. See Figure 2-1 for the Gabor elementary function with different frequencies. The function has the Fourier transform:

$$\Phi(f) = \sqrt{\frac{\pi}{\alpha^2}}\,\exp\!\left(-\frac{\pi^2}{\alpha^2}(f - f_0)^2\right) \tag{2.1.4}$$

As shown in Figure 2-1 (a), the shape of the Gabor functions is decided by the Gaussian sharpness, which does not vary with the frequency. To make the time duration of the function $\varphi(t)$ dependent on the central frequency $f_0$ (Daubechies, 1990; Kyrki, Kamarainen, & Kalviainen, 2004), a constant ratio $\gamma = f_0 / \alpha$ is defined such that the function, when applied to different frequencies, behaves as a scaled version of each

other. Figure 2-1 (b) shows the Gabor functions with varied shape ($\gamma = 2$). Both the time duration and frequency bandwidth of the Gabor function are now related to the central frequency: the higher the frequency, the smaller the time duration. This makes sense since high frequency signals change faster. The variations of time duration and frequency bandwidth in both domains are shown in Figure 2-2, which demonstrates the similarities between Gabor functions and other wavelets.

Figure 2-1 Gabor elementary functions with fixed shape (a); with varied shapes (b)

Figure 2-2 Time duration and frequency bandwidth of Gabor functions (Kyrki et al., 2004)
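The scaling behaviour described above can be checked numerically. The following is a minimal sketch, not part of the thesis: it evaluates the Gabor elementary function $\exp(-\alpha^2 t^2)\exp(j2\pi f_0 t)$ with $\alpha = f_0/\gamma$ and estimates its r.m.s. time duration following (2.1.1). The function names `gabor_1d` and `rms_duration`, the sampling grid, and the default $\gamma = 2$ are assumptions chosen for illustration.

```python
import cmath
import math


def gabor_1d(t, f0, gamma=2.0):
    """Gabor elementary function with Gaussian sharpness alpha = f0 / gamma."""
    alpha = f0 / gamma
    return math.exp(-(alpha * t) ** 2) * cmath.exp(2j * math.pi * f0 * t)


def rms_duration(f0, gamma=2.0, half_width=20.0, n=40001):
    """Approximate the r.m.s. time duration of (2.1.1) on a uniform grid.

    The integrals are replaced by sums; the grid step cancels in the ratio.
    half_width and n are illustrative choices, not values from the thesis.
    """
    step = 2.0 * half_width / (n - 1)
    ts = [-half_width + i * step for i in range(n)]
    energy = [abs(gabor_1d(t, f0, gamma)) ** 2 for t in ts]
    total = sum(energy)
    mean = sum(t * e for t, e in zip(ts, energy)) / total
    var = sum((t - mean) ** 2 * e for t, e in zip(ts, energy)) / total
    return math.sqrt(var)


d_low = rms_duration(0.5)   # lower centre frequency
d_high = rms_duration(1.0)  # doubled centre frequency
print(d_low, d_high)
```

Because $|\varphi(t)|^2 = \exp(-2\alpha^2 t^2)$, the duration measured this way should be close to $\gamma / (2 f_0)$, so doubling the centre frequency roughly halves the time duration. The same scaling shows up pointwise: `gabor_1d(2 * t, f0)` and `gabor_1d(t, 2 * f0)` agree for a fixed ratio.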

The maximum response of the function in the frequency domain can also be normalized to one by multiplying by its inverse, $\sqrt{\alpha^2 / \pi}$. Consequently the normalized Gabor function is now defined as:

$$\varphi(t) = \frac{f_0}{\gamma\sqrt{\pi}}\,\exp\!\left(-\frac{f_0^2}{\gamma^2}\,t^2\right)\exp(j 2\pi f_0 t) \tag{2.1.5}$$

2D Gabor Wavelets

The 2D counterpart of a Gabor elementary function was first introduced by Granlund (1978). It can be derived directly from (2.1.5) by replacing $t$ with spatial coordinates $(x, y)$. Daugman (1985) showed a surprising equivalence between the 2D Gabor function and the organization and characteristics of the mammalian visual system. By generalizing the time frequency resolution uncertainty to the 2D domain, i.e., $\Delta x\,\Delta y\,\Delta u\,\Delta v \ge \frac{1}{4}$, he also showed that the joint 2D resolution of Gabor wavelets actually achieves the theoretical limit regardless of the values of any of the parameters. From an information theoretic viewpoint, Okajima (1998) derived the Gabor functions as solutions for a certain mutual-information maximization problem. The work shows that the Gabor-type receptive field can extract the maximum information from local image regions.

Setting the sharpness of the Gaussian in the y axis as $\beta$ and the ratio with the central frequency as $\eta = f / \beta$, the 2D Gabor wavelet can now be defined as (Kyrki et al., 2004):

$$\varphi(x, y) = \frac{f^2}{\pi\gamma\eta}\,\exp\!\left(-(\alpha^2 x_r^2 + \beta^2 y_r^2)\right)\exp(j 2\pi f x_r), \qquad x_r = x\cos\theta + y\sin\theta, \quad y_r = -x\sin\theta + y\cos\theta \tag{2.1.6}$$

where $f$ is the frequency of the modulating sinusoidal plane wave and $\theta$ is the orientation of the major axis of the elliptical Gaussian. The 2D Gabor wavelet as defined in (2.1.6) has the Fourier transform:


$$\Phi(u, v) = \exp\!\left(-\pi^2\left(\frac{(u_r - f)^2}{\alpha^2} + \frac{v_r^2}{\beta^2}\right)\right), \qquad u_r = u\cos\theta + v\sin\theta, \quad v_r = -u\sin\theta + v\cos\theta \tag{2.1.7}$$

The plots for two Gabor wavelets in the spatial and frequency domains are shown in Figure 2-3.

Figure 2-3 Example 2D Gabor wavelets in the spatial and the frequency domain: (a) f = 0.4, θ = 0, γ = 4, η = 2; (b) f = 0.2, θ = π/4, γ = 2, η = 2

Note that the equation defined in (2.1.6) is different from the one normally used for face recognition (Lades et al., 1993; Wiskott, Fellous, Kruger, & von der Malsburg, 1997; Liu & Wechsler, 2002); however, this definition is more general. To find the relationship between the different Gabor wavelet definitions, we first define a wave vector $\vec{k} = 2\pi f \exp(j\theta)$ to represent the central frequency components in the frequency domain. Note the assumption here is that the orientation of the wave vector is the same

as that of the major axis of the elliptical Gaussian, which is fully supported by the models of receptive fields found in simple cells of the cat and macaque striate cortices (Daugman, 1985; Jones & Palmer, 1987). Setting $\gamma = \eta = \frac{\sigma}{\sqrt{2}\pi}$, i.e. $\alpha = \beta = \frac{\sqrt{2}\pi f}{\sigma}$, the Gabor wavelet located at position $\vec{z} = (x, y)$ can now be defined as:

$$\varphi(\vec{z}) = \frac{\|\vec{k}\|^2}{2\pi\sigma^2}\,\exp\!\left(-\frac{\|\vec{k}\|^2\,\|\vec{z}\|^2}{2\sigma^2}\right)\exp(j\,\vec{k}\cdot\vec{z}) \tag{2.1.8}$$

The wavelet function used in (Lades et al., 1993; Wiskott et al., 1997; Liu et al., 2002) has thus been derived from equation (2.1.6), and can be seen as a special case with $\alpha = \beta$. Similarly, the relationship between equation (2.1.6) and the definitions in (Fasel, Barlett, & Movellan, 2002; Weldon, Higgins, & Dunn, 1996) could also be established. The DC term can be subtracted to make the wavelet DC free (Lades et al., 1993; Wiskott et al., 1997; Liu et al., 2002); a similar effect can also be achieved by normalizing the image to be zero mean (Kruger & Sommer, 2000; Kruger & Sommer, 2002a).

2.2 Linear Subspace Analysis

Principal Component Analysis (PCA)

The aim of PCA is to identify a subspace spanned by the training images $\{x_1, x_2, \ldots, x_M\}$, which could decorrelate the variance of pixel values. This can be achieved by eigen analysis of the covariance matrix $\Sigma = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})(x_i - \bar{x})^T$:

$$\Sigma E = \Lambda E \tag{2.2.1}$$

where $E$, $\Lambda$ are the resultant eigenvectors, also referred to as eigenfaces, and eigenvalues respectively. The representation of a face image in the PCA subspace is then

obtained by projecting it onto the coordinate system defined by the eigenfaces (Turk & Pentland, 1991).

Linear Discriminant Analysis (LDA)

While the projection of face images into the PCA subspace achieves decorrelation and dimensionality reduction, LDA aims to find a projection matrix $W$ which maximizes the quotient of the determinants of $S_b$ and $S_w$ (Zhao, Krishnaswamy, Chellapa, Swets, & Weng, 1998),

$$W = \arg\max_{W} \frac{\left|W^T S_b W\right|}{\left|W^T S_w W\right|} \tag{2.2.2}$$

where $S_b$ and $S_w$ are the between-class scatter and within-class scatter respectively. Consider a $C$ class problem and let $N_c$ be the number of samples in class $c$; a set of $M$ training patterns from the $C$ classes can be defined as $\{x_{ck},\ c = 1, 2, \ldots, C;\ k = 1, 2, \ldots, N_c\}$, $M = \sum_{c=1}^{C} N_c$. The $S_w$ and $S_b$ of a training set can be computed as:

$$S_w = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{k=1}^{N_c}(x_{ck} - \mu_c)(x_{ck} - \mu_c)^T \tag{2.2.3}$$

$$S_b = \frac{1}{C}\sum_{c=1}^{C}(\mu_c - \mu)(\mu_c - \mu)^T \tag{2.2.4}$$

where $\mu$ is the mean of the whole training set, and $\mu_c$ is the mean of class $c$. It was shown in (Fukunnaga, 1991) that the projection matrix $W$ can be computed from the eigenvectors of $S_w^{-1} S_b$. However, due to the high dimensionality of the feature vector, especially in face recognition applications, $S_w$ is usually singular, i.e. the inverse of $S_w$ does not exist. As a result, a two-stage dimensionality reduction technique, named the Most Discriminant Features (MFD), was proposed by Swets & Weng (1996). The original face vectors are first projected to a lower dimensional space by PCA, which is then subjected to LDA. Let $W_{pca}$ be the projection matrix from the original

image space to the PCA subspace; the LDA projection matrix $W_{lda}$ is thus composed of the eigenvectors of $(W_{pca}^T S_w W_{pca})^{-1}(W_{pca}^T S_b W_{pca})$. The final projection matrix $W_{mfd}$ can thus be obtained by:

$$W_{mfd} = W_{pca} W_{lda} \tag{2.2.5}$$

Note that the rank of $S_b$ is at most $C - 1$, while the rank of $S_w$ is at most $M - C$. As a result, it is suggested that the dimension of the PCA subspace should be $M - C$ (Swets et al., 1996).

2.3 Non-linear Kernel Subspace Analysis

As seen in the last section, both PCA and LDA are linear methods. Since facial variations are mostly nonlinear, PCA and LDA projections can only provide suboptimal solutions for face recognition tasks (Gupta & Agrawal, 2002). Recently, kernel methods have been successfully applied to pattern recognition problems because of their capacity to handle nonlinear data. Support Vector Machines (SVMs) are typical kernel methods and have been successfully applied to face detection (Osuna, Freund, & Girosi, 1997), face recognition (Phillips, 1999) and gender classification (Moghaddam & Yang, 2000). By mapping sample data to a higher dimensional feature space, a nonlinear problem defined in the original image space is effectively turned into a linear problem in the feature space (Scholkopf et al., 1999). PCA or LDA can subsequently be performed in the feature space, which leads to Kernel Principal Component Analysis (KPCA) (Scholkopf, Smola, & Muller, 1998) and Generalized Discriminant Analysis (GDA) (Baudat & Anouar, 2000). Experiments show that KPCA and GDA are able to extract nonlinear features and thus provide better recognition rates in applications such as character recognition (Scholkopf et al., 1998) and face recognition (Kim, Jung, & Kim, 2002; Yang, 2002).

The Kernel Feature Space

Algorithms in feature spaces make use of the following idea: via a nonlinear mapping

$$\phi: \mathbb{R}^N \to F, \qquad x \mapsto \phi(x) \tag{2.3.1}$$

the data $\{x_k \in \mathbb{R}^N,\ k = 1, \ldots, M\}$ is mapped into a potentially much higher dimensional feature space $F$. Classification may be much easier in this feature space, since a simple linear classifier will be adequate. Intuitively, the idea can be understood from the simple example in Figure 2-4. While a complicated nonlinear decision surface is needed in the two dimensional space, a simple hyperplane is enough in the mapped feature space to separate the classes:

$$\phi: \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x_1, x_2) \mapsto (z_1, z_2, z_3) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2) \tag{2.3.2}$$

Figure 2-4 A simple example (2D->3D) (Muller, Mika, Ratsch, Tsuda, & Scholkopf, 2001)

In this example, the complexity of algorithms can be easily controlled due to the low dimension of the feature space. However, when the dimension of the feature space is huge, e.g. in image related classification problems, it would be intractable to execute an algorithm in this space. Fortunately, there is a highly effective trick for computing dot products in feature spaces for certain mappings $\phi$ and feature spaces $F$: kernel functions (Scholkopf et al., 1999). In the simple example, the dot product between two feature space vectors can be easily computed with a kernel function $k$ as below:

$$\begin{aligned}\big(\phi(x)\cdot\phi(y)\big) &= (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)\cdot(y_1^2,\ \sqrt{2}\,y_1 y_2,\ y_2^2) \\ &= \big((x_1, x_2)\cdot(y_1, y_2)\big)^2 \\ &= (x\cdot y)^2 = k(x, y)\end{aligned} \tag{2.3.3}$$

There exists a feature space $F$ and a mapping $\phi$ such that $k(x, y) = (\phi(x)\cdot\phi(y))$ if the function $k(x, y)$ satisfies Mercer's condition (Scholkopf et al., 1999). The most widely used kernel functions are the polynomial kernel $k(x, y) = (x\cdot y)^d$ and the RBF kernel $k(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{r}\right)$.

Kernel Principal Component Analysis (KPCA)

Suppose the training patterns in the input space $\mathbb{R}^N$ are $\{x_k,\ k = 1, \ldots, M\}$, and $\phi$ is the nonlinear map defined from the input space to a high dimensional feature space: $\phi: \mathbb{R}^N \to F$. Each vector $x_k$ is now mapped to a higher dimensional vector $\phi(x_k)$ in the feature space. Here, we assume all the data mapped into the feature space are centred, i.e.

$$\sum_{k=1}^{M}\phi(x_k) = 0 \tag{2.3.4}$$

The covariance matrix of the training samples in the feature space is now:

$$C = \frac{1}{M}\sum_{k=1}^{M}\phi(x_k)\,\phi(x_k)^T \tag{2.3.5}$$

Kernel PCA aims to find the eigenvalues $\lambda \ge 0$ and eigenvectors $v \in F \setminus \{0\}$ satisfying

$$\lambda v = C v \tag{2.3.6}$$

All solutions $v$ lie in the span of $\phi(x_1), \ldots, \phi(x_M)$, and there exist coefficients $\alpha_k$ such that

$$v = \sum_{k=1}^{M}\alpha_k\,\phi(x_k) \tag{2.3.7}$$

Take the inner product with the vector $\phi(x_k)$ $(k = 1, \ldots, M)$ on both sides of (2.3.6):

$$\lambda\,\big(v\cdot\phi(x_k)\big) = (C v)\cdot\phi(x_k) \tag{2.3.8}$$
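The kernel identity (2.3.3) and the two kernels named earlier on this page can be checked numerically. The sketch below is illustrative only: the helper names (`feature_map`, `poly_kernel`, `rbf_kernel`) and the sample points are assumptions, and the RBF width `r` follows the form $\exp(-\|x - y\|^2 / r)$ used in the text.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def feature_map(x):
    """Explicit 2D -> 3D mapping of (2.3.2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, y, d=2):
    """Polynomial kernel (x . y)^d; d = 2 corresponds to feature_map."""
    return dot(x, y) ** d


def rbf_kernel(x, y, r=1.0):
    """RBF kernel exp(-||x - y||^2 / r)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / r)


# Two arbitrary sample points (hypothetical values).
x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(feature_map(x), feature_map(y))  # dot product computed in F
implicit = poly_kernel(x, y)                    # same value without visiting F
print(explicit, implicit, rbf_kernel(x, y))
```

The point of the comparison is that `implicit` needs only a dot product in the input space, which is what keeps algorithms tractable when the feature space is very high dimensional.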

By substituting (2.3.5) and (2.3.7) into (2.3.8) and defining an $M \times M$ matrix $K$ with:

$$K_{ij} = k(x_i, x_j) = \phi(x_i)\cdot\phi(x_j) \tag{2.3.9}$$

the following can be obtained:

$$M\lambda K\alpha = K^2\alpha \;\Rightarrow\; M\lambda\alpha = K\alpha \tag{2.3.10}$$

where $\alpha$ denotes a column vector with entries $\alpha_1, \ldots, \alpha_M$. The above derivation assumes that all the mapped data $\phi(x_k)$ is centred in the feature space $F$. See the section on non-centred data for an approach to centre the data $\phi(x_k)$ in $F$.

For a new pattern $x$, the projection of its image $\phi(x)$ in the feature space onto the eigenvector $v$ can now be computed as:

$$v\cdot\phi(x) = \sum_{k=1}^{M}\alpha_k\,\big(\phi(x_k)\cdot\phi(x)\big) = \sum_{k=1}^{M}\alpha_k\,k(x_k, x) \tag{2.3.11}$$

If the first $L$ $(1 \le L \le M)$ significant eigenvectors are extracted to construct the eigen matrix:

$$W = [\alpha_1\ \alpha_2\ \ldots\ \alpha_L] \tag{2.3.12}$$

the projection of $x$ in the $L$-dimensional Kernel PCA space is given by:

$$y = k_x W \tag{2.3.13}$$

where

$$k_x = [k(x, x_1)\ \ k(x, x_2)\ \ \ldots\ \ k(x, x_M)] \tag{2.3.14}$$

Generalized Discriminant Analysis

As a generalized version of Linear Discriminant Analysis (LDA), Generalized Discriminant Analysis (GDA) performs LDA on sample data in the high dimensional feature space. Consider a $C$ class problem and let $N_c$ be the number of samples in class $c$; the set of training patterns can then be defined as $\{x_{ck},\ c = 1, 2, \ldots, C;\ k = 1, 2, \ldots, N_c\}$. The total number of input vectors can be denoted as:

$M = \sum_{c=1}^{C} N_c$. For a centred data set in feature space, the between-class scatter matrix $S_b$ and within-class scatter matrix $S_w$ can be defined as:

$$S_w = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{k=1}^{N_c}\phi(x_{ck})\,\phi(x_{ck})^T \tag{2.3.15}$$

$$S_b = \frac{1}{C}\sum_{c=1}^{C}\mu_c\,\mu_c^T \tag{2.3.16}$$

where $\mu_c$ is the mean vector of class $c$:

$$\mu_c = \frac{1}{N_c}\sum_{k=1}^{N_c}\phi(x_{ck}) \tag{2.3.17}$$

Similar to LDA, the purpose of GDA is to maximize the quotient between the inter-class inertia and the intra-class inertia. This maximization is equivalent to finding eigenvalues $\lambda \ge 0$ and eigenvectors $v \in F \setminus \{0\}$ satisfying

$$\lambda S_w v = S_b v \tag{2.3.18}$$

All solutions $v$ lie in the span of $\phi(x_{11}), \ldots, \phi(x_{ck}), \ldots$, and there exist coefficients $\alpha_{ck}$ such that

$$v = \sum_{c=1}^{C}\sum_{k=1}^{N_c}\alpha_{ck}\,\phi(x_{ck}) \tag{2.3.19}$$

Substitute (2.3.15), (2.3.16), (2.3.17) and (2.3.19) into (2.3.18):

$$\frac{\lambda}{C}\sum_{q=1}^{C}\sum_{j=1}^{N_q}\alpha_{qj}\sum_{p=1}^{C}\frac{1}{N_p}\sum_{i=1}^{N_p}\phi(x_{pi})\,\phi(x_{pi})^T\phi(x_{qj}) = \frac{1}{C}\sum_{q=1}^{C}\sum_{j=1}^{N_q}\alpha_{qj}\sum_{p=1}^{C}\left[\frac{1}{N_p}\sum_{i=1}^{N_p}\phi(x_{pi})\right]\left[\frac{1}{N_p}\sum_{i=1}^{N_p}\phi(x_{pi})\right]^T\phi(x_{qj}) \tag{2.3.20}$$

Now take the inner product with the vector $\phi(x_{ck})$ on both sides:

$$\frac{\lambda}{C}\sum_{q=1}^{C}\sum_{j=1}^{N_q}\alpha_{qj}\sum_{p=1}^{C}\frac{1}{N_p}\sum_{i=1}^{N_p}\big(\phi(x_{pi})\cdot\phi(x_{ck})\big)\big(\phi(x_{pi})\cdot\phi(x_{qj})\big) = \frac{1}{C}\sum_{q=1}^{C}\sum_{j=1}^{N_q}\alpha_{qj}\sum_{p=1}^{C}\left[\frac{1}{N_p}\sum_{i=1}^{N_p}\phi(x_{pi})\cdot\phi(x_{ck})\right]\left[\frac{1}{N_p}\sum_{i=1}^{N_p}\phi(x_{pi})\cdot\phi(x_{qj})\right] \tag{2.3.21}$$

The dot product of a sample $i$ from class $p$ and another sample $j$ from class $q$ in the feature space, denoted as $(k_{ij})_{pq}$, can be calculated by a kernel function, e.g., radial basis kernels, as below:

$$(k_{ij})_{pq} = \phi(x_{pi})\cdot\phi(x_{qj}) = k(x_{pi}, x_{qj}) = \exp\!\left(-\frac{\|x_{pi} - x_{qj}\|^2}{r}\right) \tag{2.3.22}$$

Let $K$ be an $M \times M$ matrix defined on the class elements by $(K_{pq})_{p = 1, \ldots, C;\ q = 1, \ldots, C}$, where $K_{pq}$ is a matrix composed of dot products between vectors from class $p$ and $q$ in feature space:

$$K_{pq} = (k_{ij})_{i = 1, \ldots, N_p;\ j = 1, \ldots, N_q} \tag{2.3.23}$$

Also define an $M \times M$ block diagonal matrix:

$$U = (U_c)_{c = 1, \ldots, C} \tag{2.3.24}$$

where $U_c$ is an $N_c \times N_c$ matrix with all terms equal to $\frac{1}{N_c}$. The equation in (2.3.21) can now be represented as:

$$\lambda K K \alpha = K U K \alpha \tag{2.3.25}$$

where $\alpha$ denotes a column vector with $M$ entries $\alpha_{ck}$, $c = 1, \ldots, C$, $k = 1, \ldots, N_c$. Different techniques can be used to solve the eigenproblem given in (2.3.25); the algorithm proposed by Baudat et al. (2000) was adopted in this thesis, which finds the eigenvector $v$ by first diagonalizing the matrix $K$. Once $\alpha$ and the eigenvectors $v$ are determined, the projection of a new sample $x$ in the GDA space can be easily calculated using equations (2.3.12)-(2.3.14). Details on eigenvalue resolution of GDA can be found in Appendix A.

Non-centred Data

In the general case, the data $\{\phi(x_i)\}$, $i = 1, 2, \ldots, M$ is not centred in the feature space. The following method can be used to make the data zero-mean:

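$$\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{M}\sum_{m=1}^{M}\phi(x_m) \tag{2.3.26}$$

The $M \times M$ kernel matrix $\tilde{K}$ for the centred data can now be calculated as:

$$\begin{aligned}\tilde{K}_{ij} &= \tilde{\phi}(x_i)\cdot\tilde{\phi}(x_j) \\ &= \Big(\phi(x_i) - \frac{1}{M}\sum_{m=1}^{M}\phi(x_m)\Big)\cdot\Big(\phi(x_j) - \frac{1}{M}\sum_{n=1}^{M}\phi(x_n)\Big) \\ &= \phi(x_i)\cdot\phi(x_j) - \frac{1}{M}\sum_{m=1}^{M}\phi(x_m)\cdot\phi(x_j) - \frac{1}{M}\sum_{n=1}^{M}\phi(x_i)\cdot\phi(x_n) + \frac{1}{M^2}\sum_{m=1}^{M}\sum_{n=1}^{M}\phi(x_m)\cdot\phi(x_n) \\ &= K_{ij} - \frac{1}{M}\sum_{m=1}^{M}K_{mj} - \frac{1}{M}\sum_{n=1}^{M}K_{in} + \frac{1}{M^2}\sum_{m=1}^{M}\sum_{n=1}^{M}K_{mn}\end{aligned} \tag{2.3.27}$$

which, represented in matrix form, is as follows:

$$\tilde{K} = K - \mathbf{1}_M K - K \mathbf{1}_M + \mathbf{1}_M K \mathbf{1}_M \tag{2.3.28}$$

where $\mathbf{1}_M$ denotes the $M \times M$ matrix with all entries equal to $\frac{1}{M}$. Once the kernel matrix $\tilde{K}$ for the centred data is calculated, the same procedures as used in previous sections can be used to compute the projection matrix $W$ for the KPCA or GDA subspace. As in Equation (2.3.13), the projection of a new pattern $x$ into the learned subspace can now be computed as:

$$\tilde{y} = \tilde{k}_x W \tag{2.3.29}$$

where the $i$-th entry of $\tilde{k}_x$ is

$$\begin{aligned}\tilde{\phi}(x)\cdot\tilde{\phi}(x_i) &= \Big(\phi(x) - \frac{1}{M}\sum_{m=1}^{M}\phi(x_m)\Big)\cdot\Big(\phi(x_i) - \frac{1}{M}\sum_{n=1}^{M}\phi(x_n)\Big) \\ &= k(x, x_i) - \frac{1}{M}\sum_{m=1}^{M}k(x_m, x_i) - \frac{1}{M}\sum_{n=1}^{M}k(x, x_n) + \frac{1}{M^2}\sum_{m=1}^{M}\sum_{n=1}^{M}k(x_m, x_n)\end{aligned} \tag{2.3.30}$$

Next we define a $1 \times M$ row vector $\mathbf{1}'_M$ with all entries equal to $\frac{1}{M}$; the equation can then be represented in matrix form:

$$\tilde{k}_x = k_x - \mathbf{1}'_M K - k_x \mathbf{1}_M + \mathbf{1}'_M K \mathbf{1}_M \tag{2.3.31}$$

2.4 AdaBoost Learning Algorithm

Introduced by Freund and Schapire (1999), AdaBoost has been successfully applied to object detection (Viola & Jones, 2001; Lienhart & Maydt, 2002) and face recognition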

(Michael & Viola, 2003). The essence of AdaBoost is to learn a number of very simple weak classifiers, which are then linearly combined into a single strong classifier. Whilst the performance of the weak classifiers could be just slightly better than random guessing, AdaBoost learning minimizes the upper bound on both training and generalization errors (Freund et al., 1999). Additionally, AdaBoost has been applied to select Haar-like features (Lienhart et al., 2002) for face detection and recognition (Michael et al., 2003), and to Gabor feature selection (Shen & Bai, 2004a) for classification.

Given $m$ training samples $(x_i, y_i)$, $i = 1, 2, \ldots, m$, $x_i \in \mathbb{R}^n$, $y_i \in \{-1, 1\}$
Initialization: weights $w_1(i) = 1/m$
For $t = 1, \ldots, T$
1) Train weak learners using distribution $w_t$
2) Choose a weak hypothesis $h_t: \mathbb{R}^n \to \{-1, 1\}$
3) Choose $\alpha_t \in \mathbb{R}$
4) Update weights: $w_{t+1}(i) = \dfrac{w_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
Final strong classifier: $H(x) = \mathrm{sgn}\!\left(\sum_{t=1}^{T}\alpha_t h_t(x)\right)$

Figure 2-5 Details of AdaBoost algorithm (Freund et al., 1999)

The Algorithm

For two class problems, a set of $m$ labelled training samples is given as $(x_i, y_i)$, $i = 1, 2, \ldots, m$, where $y_i \in \{-1, 1\}$ is the class label associated with sample $x_i \in \mathbb{R}^n$. A large number of weak classifiers $h: \mathbb{R}^n \to \{-1, 1\}$ can be generated to form the classifier pool for learning. A weak classifier can be very simple, e.g., a threshold function on the $k$-th coordinate of $x$ in the $n$-dimensional space. The algorithm focuses on the difficult training patterns, increasing their representation in successive training sets. Over $T$ rounds, $T$ weak classifiers are selected to form the final strong classifier. In each of the iterations, the space of all possible weak classifiers is searched exhaustively to find the one with the lowest weighted classification error. The error is then used to update the weights such that the wrongly classified samples get their weights increased.
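The procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the thesis: the weak classifiers are threshold functions on one coordinate, the pool is built from the observed sample values, and the weight $\alpha_t = \frac{1}{2}\ln\frac{1 - \varepsilon_t}{\varepsilon_t}$ is the common choice for the weighted error $\varepsilon_t$. The function names, the clamping of $\varepsilon_t$ and the toy data are assumptions.

```python
import math


def stump_predict(x, k, thresh, sign):
    """Weak classifier: threshold on the k-th coordinate, output in {-1, +1}."""
    return sign if x[k] > thresh else -sign


def train_adaboost(X, y, rounds):
    """Select `rounds` stumps by exhaustive search over a candidate pool."""
    m, n = len(X), len(X[0])
    w = [1.0 / m] * m  # initial distribution w_1(i) = 1/m
    # Assumed pool: every coordinate, every observed value, both polarities.
    pool = [(k, X[i][k], s) for k in range(n) for i in range(m) for s in (1, -1)]
    model = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for k, th, s in pool:
            err = sum(w[i] for i in range(m)
                      if stump_predict(X[i], k, th, s) != y[i])
            if err < best_err:
                best, best_err = (k, th, s), err
        # Clamp the error to keep the logarithm finite (an implementation guard).
        eps = min(max(best_err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        k, th, s = best
        # Misclassified samples get their weights increased, then renormalise by Z_t.
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(X[i], k, th, s))
             for i in range(m)]
        z = sum(w)
        w = [wi / z for wi in w]
        model.append((alpha, k, th, s))
    return model


def predict(model, x):
    """Strong classifier: sign of the weighted sum of the selected stumps."""
    score = sum(a * stump_predict(x, k, th, s) for a, k, th, s in model)
    return 1 if score >= 0 else -1


# Toy 1D problem that no single threshold can solve (hypothetical data).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
predictions = [predict(model, x) for x in X]
print(predictions)
```

On this toy set each individual stump misclassifies at least two samples, while the weighted combination of three stumps reproduces all six labels, which illustrates how weak classifiers are combined into a stronger one.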

The resulting strong classifier is a weighted linear combination of all $T$ selected weak classifiers. Figure 2-5 contains the listing of the AdaBoost algorithm.

Training Error

Letting $f(x) = \sum_{t=1}^{T}\alpha_t h_t(x)$ and unravelling the weight update rule:

$$w_{T+1}(i) = \frac{w_T(i)\exp(-\alpha_T y_i h_T(x_i))}{Z_T} = \frac{\exp\!\left(-\sum_{t}\alpha_t y_i h_t(x_i)\right)}{m\prod_{t} Z_t} = \frac{\exp(-y_i f(x_i))}{m\prod_{t} Z_t} \tag{2.4.1}$$

Also let $[\pi]$ be an indicator variable which is 1 if the predicate $\pi$ is true and 0 otherwise. Moreover, if $H(x_i) \ne y_i$ then $y_i f(x_i) \le 0$, implying that $\exp(-y_i f(x_i)) \ge 1$. Thus,

$$[H(x_i) \ne y_i] \le \exp(-y_i f(x_i)) \tag{2.4.2}$$

Since the training error of $H$, $\varepsilon_{tr}(H)$, is simply the number of wrongly classified samples divided by $m$, the bound of the error can be easily found as below:

$$\varepsilon_{tr}(H) = \frac{1}{m}\sum_{i}[H(x_i) \ne y_i] \le \frac{1}{m}\sum_{i}\exp(-y_i f(x_i)) = \frac{1}{m}\sum_{i} m\,w_{T+1}(i)\prod_{t} Z_t = \prod_{t} Z_t \tag{2.4.3}$$

Choosing $\alpha_t$ and $h_t$

To make $w_{t+1}$ a distribution, the value of $Z_t$ must be the sum of the unnormalised updated weights:

$$Z_t = \sum_{i} w_t(i)\exp(-\alpha_t y_i h_t(x_i)) \tag{2.4.4}$$

To minimize the upper bound of the training error, $\prod_t Z_t$, a greedy algorithm chooses $\alpha_t$ and $h_t$ such that $Z_t$ is minimized on each round of training. By using a linear upper

bound function $Z(\alpha_t)$ of $Z_t$ and setting the derivative $dZ / d\alpha_t$ to zero, the value of $\alpha_t$ that minimizes $Z$ is found to be (see Appendix B for details):

$$\alpha_t = \frac{1}{2}\ln\frac{1 + r_t}{1 - r_t} \tag{2.4.5}$$

where $r_t = \sum_{i} w_t(i)\,y_i h_t(x_i)$. Since $Z_t$ is now bounded by $Z_t \le \sqrt{1 - r_t^2}$, the training error of $H$ is now at most $\prod_t \sqrt{1 - r_t^2}$. The training error can be further minimized if $h_t$ is chosen such that $r_t$ is maximized on each round of boosting. Since $r_t$ is closely related to the prediction error $\varepsilon_t$ of $h_t$ as below (see Appendix B for details):

$$\varepsilon_t = \sum_{i} w_t(i)\,[h_t(x_i) \ne y_i] = \frac{1 - r_t}{2} \tag{2.4.6}$$

maximizing $r_t$ is equivalent to minimizing the error $\varepsilon_t$. In sum, the $h_t$ with minimum prediction error $\varepsilon_t$ should be chosen on each round of boosting and $\alpha_t$ should be set as:

$$\alpha_t = \frac{1}{2}\ln\frac{1 - \varepsilon_t}{\varepsilon_t} \tag{2.4.7}$$

2.5 Support Vector Machine

Originating from the hyperplane classifier proposed by Boser, Guyon, & Vapnik (1992), the support vector machine (SVM) has been greatly developed and widely applied in machine learning, classification and pattern recognition ever since (Scholkopf et al., 1997; Cristianini Nello & Shawe-Taylor John, 2000; Moghaddam et al., 2000; Osuna et al., 1997). The SVM is basically a linear hyperplane classifier $f(x) = \langle w, x\rangle + b$ aimed at solving the two class problem. As shown in Figure 2-6, the classifier can separate the data from two classes very well. Since there might be a number of such linear classifiers available, the SVM chooses the one with the maximal margin, which is defined as the width by which the boundary could be increased before hitting a data point. The distance between the

two thin lines (boundary) in the figure thus defines the margin of the linear SVM, with data points on the boundary known as support vectors. The linear classifier $f(x)$ with maximized margin can be found using quadratic programming (QP) optimisation techniques as below:

$$f(x) = \mathrm{sgn}\!\left(\sum_{k}\alpha_k y_k \langle x_k, x\rangle + b\right) \tag{2.5.1}$$

where $x_k \in \mathbb{R}^N$ are the support vectors learned by the SVM.

Figure 2-6 A hyperplane classifier in 2-dimension feature space

Figure 2-7 Map data into a feature space where they are linearly separable

When the data is non-separable, the linear SVM can still be solved using QP techniques by relaxing the constraints and introducing an extra error term into the objective function. For nonlinearly separable data, a nonlinear mapping function $\phi: \mathbb{R}^N \to F$, $x \mapsto \phi(x)$ is used to map the data into a higher dimensional feature space where the linear classifier can be applied. Figure 2-7 shows an example using the kernel method to learn a non-linear SVM, which

is similar to the one shown in Figure 2-4. Using the same kernel trick as described in Section 2.3, the non-linear SVM is now found to be:

$$f(x) = \mathrm{sgn}\!\left(\sum_{k}\alpha_k y_k\,k(x_k, x) + b\right) \tag{2.5.2}$$

where $k(x_k, x)$ is a kernel function, e.g., a polynomial kernel or a Gaussian kernel.

Given a set of training samples $(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_l, y_l)$, $x_i \in \mathbb{R}^N$, $y_i \in \{-1, +1\}$, the SVM not only achieves a small error on the training set, it also minimizes the upper bound of the error on a test set, i.e. the generalization error (Burges, 1998). It has been shown that, with probability $1 - \eta$, $0 \le \eta \le 1$, the following bound on the expected generalization error of the SVM holds:

$$R < R_{emp} + \sqrt{\frac{h\,(\log(2l / h) + 1) - \log(\eta / 4)}{l}} \tag{2.5.3}$$

where $R_{emp} = \frac{1}{l}\sum_{i=1}^{l}\left|f(x_i) - y_i\right|$ is the empirical risk as measured on the training set and $h$ is the Vapnik Chervonenkis (VC) dimension. The second term on the right hand side is called the VC confidence. The SVM minimises the upper bound by fixing the empirical risk to a small value and minimising the VC confidence.

2.6 Entropy and Mutual Information

As a basic concept in information theory, entropy $H(X)$ is used to measure the uncertainty of a random variable (r.v.) $X$. If $X$ is a discrete r.v., $H(X)$ can be defined as below:

$$H(X) = -\sum_{x} p(X = x)\,\lg\big(p(X = x)\big) \tag{2.6.1}$$

Mutual information $I(Y; X)$ is a measure of general interdependence between two random variables $X$ and $Y$:

$$I(Y; X) = H(X) + H(Y) - H(X, Y) \tag{2.6.2}$$

Using Bayes' rule on conditional probabilities, Eq. (2.6.2) can be rewritten as:

$$I(Y; X) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \tag{2.6.3}$$

Since $H(Y)$ measures the prior uncertainty of $Y$ and $H(Y \mid X)$ measures the conditional posterior uncertainty of $Y$ after $X$ is observed, the mutual information $I(Y; X)$ measures how much the uncertainty of $Y$ is reduced once $X$ has been observed. It can be easily shown that if $X$ and $Y$ are independent, $H(X, Y) = H(X) + H(Y)$; consequently their mutual information is zero.

2.7 Notation Definitions

Gabor Jet and Similarity Function

The convolution of an image $I$ and a 2D Gabor wavelet $\varphi$ can be defined as follows:

$$G(\vec{x}) = (I * \varphi)(\vec{x}), \qquad \vec{x} = (x, y) \tag{2.7.1}$$

where $G(\vec{x})$ denotes the convolution result with a wavelet at a position $\vec{x}$. Since the local frequency and orientation information is not available, a number of wavelets, e.g. 40, $\varphi_j$, $j = 0, 1, \ldots, 39$, tuned to different frequencies and orientations are normally used for feature extraction. The convolution results at a position $\vec{x}$ thus consist of important local information, and can be concatenated to form a discriminative local feature, i.e. a jet. A jet $J(\vec{x})$ is defined as the set of 40 complex coefficients $\{J_j(\vec{x}),\ j = 0, 1, \ldots, 39\}$ obtained at one image point $\vec{x}$, where $J_j(\vec{x}) = (I * \varphi_j)(\vec{x})$. The complex coefficient $J_j$ can also be written as $J_j = a_j \exp(i\phi_j)$ with magnitude $a_j$ and phase $\phi_j$, which contains very important local texture information.

Two functions $S_m(J, J')$ and $S_p(J, J')$ are defined to measure the similarity between two jets $J$ and $J'$. While the first function $S_m$ uses the magnitude information only, the other function $S_p$ takes phase information into consideration as well. The two similarity functions are defined as below:

$$S_m(J, J') = \frac{\sum_{j} a_j a'_j}{\sqrt{\sum_{j} a_j^2 \sum_{j} a'^2_j}} \tag{2.7.2}$$

$$S_p(J, J') = \frac{\sum_{j} a_j a'_j \cos(\phi_j - \phi'_j - \vec{d}\cdot\vec{k}_j)}{\sqrt{\sum_{j} a_j^2 \sum_{j} a'^2_j}} \tag{2.7.3}$$

where $\vec{k}_j$ is the wave vector of the respective Gabor wavelet $\varphi_j$ and $\vec{d}$ is an estimated displacement that compensates for the rapid phase shifts (Wiskott et al., 1997).

Eigenfaces and Fisherfaces

When applying the linear subspace techniques, i.e. PCA and LDA, to face recognition, 2D face images are usually converted to a 1D feature vector by concatenating their rows or columns. Once the projection bases are learned from a set of training faces, they can be converted back to 2D images. These base images are thus called Eigenfaces and Fisherfaces for PCA and LDA, respectively.

The Difference Space

Canonical algorithms treat face recognition as a multi-class problem, i.e. each individual is a class. Some researchers have also proposed the difference space to simplify face recognition to a two class problem. Such a representation models the dissimilarities between faces. Let $T = \{I_1, \ldots, I_M\}$ be a training set of faces of $K$ individuals, with several images of each subject. Two classes can be generated from $T$. The first is the intra-personal difference set, which contains the dissimilarities between facial images of the same person:

$$C_I = \{I_p - I_q \mid I_p \sim I_q\}$$

The second is the extra-personal difference set, which contains the dissimilarities among images of different persons in the training set:

$$C_E = \{I_p - I_q \mid I_p \nsim I_q\}$$

Here $I_p \sim I_q$ indicates that the two images belong to the same person.
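The construction of the two difference sets can be sketched as follows. This is an illustrative example, not code from the thesis: images are represented as flat lists of pixel values, the function name `difference_sets` is hypothetical, and each unordered pair of images is used once, which is an assumption (the definitions above do not say whether both $I_p - I_q$ and $I_q - I_p$ are kept).

```python
from itertools import combinations


def difference_sets(images, identities):
    """Split pairwise image differences into intra- and extra-personal sets.

    images: list of equal-length pixel vectors.
    identities: list of person labels, one per image.
    """
    intra, extra = [], []
    for p, q in combinations(range(len(images)), 2):
        diff = [a - b for a, b in zip(images[p], images[q])]
        if identities[p] == identities[q]:
            intra.append(diff)   # same person: intra-personal difference
        else:
            extra.append(diff)   # different persons: extra-personal difference
    return intra, extra


# Toy data: two persons with two tiny "images" each (hypothetical values).
images = [[1.0, 2.0], [1.5, 2.5], [4.0, 0.0], [4.5, 0.5]]
identities = ["a", "a", "b", "b"]
intra, extra = difference_sets(images, identities)
print(len(intra), len(extra))
```

A two class classifier can then be trained on `intra` versus `extra`. Note that the extra-personal set typically grows much faster than the intra-personal one as the number of subjects increases, so some form of subsampling is often needed in practice.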

The two sets thus define the difference space, where face recognition can be represented as a two class problem.

2.8 Summary

A number of mathematical techniques have been introduced in this chapter, which will be applied in the following processes in the thesis:

Feature extraction: the mathematical origins of Gabor wavelets show that they are very powerful tools when applied to measure local spatial frequency and image structure. As a special wavelet, the Gabor wavelet analyzes images with the optimal spatial and frequency resolution. Motivated by the similarity of the 2D Gabor wavelet and the receptive field of the simple cells of the mammalian visual system, the wavelet family will be applied to extract local features from face images for recognition. Such local features will be robust against distortions caused by various expression, pose and illumination changes.

Feature enhancement and classification: once the robust feature set has been extracted by Gabor wavelets, a number of enhancement tools and classifiers can be further applied. While linear subspace techniques such as PCA and LDA have been shown to be able to enhance class separability, this chapter also gives theoretical evidence of further advantages of kernel methods. A number of techniques based on such methods (e.g. KPCA, GDA and SVM) have been introduced in this chapter and will be applied to enhance extracted Gabor features for recognition or classification in the following chapters.

Feature selection: both the mutual information and AdaBoost algorithms introduced in this chapter will be applied to select the most discriminant Gabor features for face recognition.

Chapter 3 Literature Review

This chapter gives a literature survey of state of the art face recognition algorithms, including both 2D based and 3D based approaches. In particular, the major concern of the thesis, i.e. Gabor wavelet based methods, is explored in detail.

3.1 2D Face Recognition Methods

Various approaches to 2D face recognition have been proposed in the literature, which can be classified into three categories: analytic (feature based), holistic (global) and hybrid methods. While analytic approaches compare the salient facial features or components detected from the face, holistic approaches make use of the information derived from the whole face pattern. By combining both local and global features, hybrid methods attempt to produce a more complete representation of facial images. Literature surveys on face recognition approaches can be found in (Chellappa, Wilson, & Sirohey, 1995) and (Zhao et al., 2000).

Analytic Methods

For analytic approaches, distances and angles between feature points on the face, shapes of facial features, or local features, e.g. intensity values extracted from facial features or components, are usually applied for face recognition. The main advantage of analytic approaches is that they allow a flexible deformation at the key feature points so that pose changes can be compensated. In (Brunelli & Poggio, 1993), both template and geometrical feature based analytic methods are implemented and compared. For the template based method, facial regions are matched with templates of the eyes, nose and mouth respectively, and the similarity scores of each facial feature are simply added into a global score for face recognition. For the geometrical feature based method, the eyes, mouth and nose are first detected. The nose width and length, mouth position and chin shape features are then input to a Bayes classifier for identification. Figure 3-1 shows how these geometric features are measured, e.g. the chin shape is represented by the distance between the edge of the chin and the centre of the mouth. However, the experimental results favour the template matching approach.

Figure 3-1 Geometric features used for face recognition (Brunelli et al., 1993)

A graph structure, called Dynamic Link Architecture (DLA), is proposed by Lades et al. (1993) to represent face images. In this system, an elastic graph matching process is used to learn the representing graph of face images. Once faces are represented by appropriate graphs, Gabor features extracted from graph nodes, named Gabor jets, are then used for face recognition. Figure 3-2 shows two example face images overlaid with the representative graph (Lades et al., 1993). Later on, Wiskott et al. (1997) extended DLA to Elastic Bunch Graph Matching (EBGM), where graph nodes are located at a number of selected facial landmarks. EBGM has shown very competitive performance and was ranked as the top method in the FERET evaluation (Phillips et al., 2000). Details of Gabor wavelet based methods will be presented in Section 3.2.

Figure 3-2 Face images represented by graphs (Lades et al., 1993)

The Hidden Markov Model (HMM), widely used to learn the state and transitional probabilities between a number of hidden states, has also been applied to face recognition. HMMs are normally trained from examples that are represented by a sequence of observations. The parameters of the HMM are first initialized and then adjusted to maximize the probability of the observations of the given training samples. The observations of test samples can then be input to the trained HMMs and classified according to the output probabilities given by the different HMMs. Samaria and Young (1994) first proposed an HMM architecture for face recognition. A face pattern is divided into several regions such as forehead, eyes, nose, mouth and chin. These regions occur in the natural order from top to bottom and they are used to form the hidden states of 1D or pseudo 2D HMMs. To train an HMM, each face image is represented by a sequence of observation vectors, which are constructed from the pixels of a sub window. Nefian and Hayes (1999) proposed the embedded 2D HMM, which consists of a set of super states, with each super state being associated with a set of embedded states. Super states represent primary facial regions whilst embedded states within each super state describe the facial regions in more detail. As shown in Figure 3-3, transitions between embedded states in different super states are not allowed. Instead of using pixel intensities directly, the Discrete Cosine Transform (DCT) coefficients are used to form the observation vectors. Compared to the 1D and pseudo 2D HMM, the system performs more efficiently. Based on this work, Bai and Shen replaced the DCT with the Discrete Wavelet Transform (DWT) for observation vector extraction (Bai & Shen, 2003a), and the results show an improvement in performance. However, HMM based systems require a large number of images for training, and are only capable of operating on small databases. The performance drops dramatically as the size of the database is scaled up. As observed in our experiments, the accuracy of

Nefian and Hayes's method drops from 97.5% to 32.5% when the number of subjects rises from 40 to 200.

Figure 3-3 2D embedded HMM structure (Nefian et al., 1999)

As a hyperplane classifier, the Support Vector Machine (SVM) has also been successfully applied to face recognition. A set of SVM classifiers is applied to extract different facial components and the grey values of each component are then combined into a single feature vector (Heisele, Ho, & Poggio, 2001). The component based method has been compared with an SVM classification based global method and the results show its robustness against variance of pose and illumination. However, the database consists of images from 5 subjects only and a large number of images are required to train those SVMs. Their later work (Huang & Heisele, 2003) used a 3D morphable model to generate synthesized images with different illumination and pose for training. As a result, only 3 training images of each person are required. However, the results are based on a database of 6 persons only. How the performance scales with the number of subjects in the database remains unknown.

Holistic Methods

Based on principal components analysis (PCA), Kirby and Sirovich (1990) first developed the well known Eigenface method for both face representation and

recognition. In this method, the whole face pattern is transformed to a feature vector and a set of training samples is used to compute Eigenfaces (Turk et al., 1991). PCA can achieve the optimal representation in the sense of maximizing the overall data variance. However, the difference between faces from the same person due to illumination and pose (within-class scatter) seems to be larger than that due to facial identity (between-class scatter). Based on this observation, Linear Discriminant Analysis (LDA) is applied in the Fisherface method (Belhumeur, Hespanha, & Kriegman, 1997). LDA defines a projection that makes the within-class scatter small and the between-class scatter large. This projection has been shown to improve classification performance over PCA. However, it requires a large training sample set for good generalization, which is usually not available for face recognition applications. To address such Small Sample Size (SSS) problems, Zhao et al. (1998) perform PCA to reduce the feature dimension before the LDA projection; see Figure 3-4 for the different bases of the LDA, PCA + LDA, and PCA projections. Using higher order statistical analysis, Independent Component Analysis (ICA) was first adopted for face recognition by (Bartlett, Movellan, & Sejnowski, 2002), whose work showed that ICA outperformed PCA. However, other researchers (Draper, Baek, Bartlett, & Beveridge, 2003) observed that when the right distance metric is used, PCA significantly outperforms ICA on the FERET database.

Recently, kernel methods have been successfully applied to solve pattern recognition problems because of their capacity to handle nonlinear data. By mapping sample data to a higher dimensional feature space, a nonlinear problem defined in the original image space is effectively turned into a linear problem in the feature space (Scholkopf et al., 1999). PCA or LDA can subsequently be performed in the feature space and are thus called Kernel Principal Component Analysis (KPCA) and Generalized Discriminant Analysis (GDA) (Baudat et al., 2000). Experiments show that

KPCA and GDA are able to extract nonlinear features and thus provide better recognition rates in applications such as face recognition (Kim et al., 2002; Yang, Frangi, & Yang, 2004; Shen & Bai, 2004b).

Figure 3-4 Different bases of linear projections: LDA, PCA + LDA and PCA bases are shown on the first, second and third row respectively (Zhao et al., 1998)

Figure 3-5 The diagram for an RBF based face recognition system (Er, Wu, Lu, & Toh, 2002)

Neural networks (Fleming & Cottrell, 1990; Er et al., 2002; Liu, 2004b) have also been used to classify global facial features. While face images were treated as 1D signals and wavelet analysis was used for feature extraction in (Liu, 2004b), the Radial Basis Function (RBF) network was applied to the projection of face images onto Fisherfaces for classification in (Er et al., 2002). The diagram for Er's method is plotted in

Figure 3-5. While PCA + LDA were first used to decrease the feature dimension of face patterns, sample information was adopted to determine the structure and initial parameters of the RBF network.

Figure 3-6 Binary SVM tree (Guo, Li, & Chan, 2001)

Since the SVM is a binary classifier, (Phillips, 1999) turned the face recognition problem into a two-class problem by introducing the difference space. Two classes, the dissimilarities between faces of the same person and the dissimilarities between faces of different people, are defined in the difference space. A single SVM is trained to classify the intra-person and inter-person difference classes. The results on a difficult image set from the FERET database showed that SVMs outperformed the Eigenface method significantly. A binary tree system was adopted by (Guo et al., 2001) to use SVMs for the multi-class face recognition problem. The results on the ORL database and a larger face collection from several databases showed that SVMs achieve higher accuracy than the Eigenface approach. In (Jonsson et al., 2002) each person is associated with an SVM that was trained to discriminate the face images of that person from those of others. Both PCA and LDA were used for feature extraction and tested on a verification application. With different illumination normalization techniques applied, the results show that SVMs are robust and relatively insensitive to the feature space and pre-processing methods. However, when the representation feature already captures and emphasises the discriminatory information, e.g., features extracted using LDA, SVMs

lose their superiority in comparison with the simplest Euclidean distance + nearest neighbour classifier.

Global techniques work well for frontal view face images, but they are sensitive to translation, rotation and pose changes (Heisele et al., 2001). Normalization is therefore usually an important and unavoidable process for these methods. A small number of prominent points in the face such as the eyes, nostrils or centre of the mouth are required to resize and rotate the input face image. After normalization, the input face image can be aligned with the model face and recognition can be performed thereafter.

Hybrid Methods

Hybrid methods utilize both local and global features for recognition. One of the early works is Pentland's modular Eigenfaces (Pentland, Moghaddam, & Starner, 1994). In this work, the eigenface technique is extended to the description and encoding of facial features, yielding eigenfeatures such as eigeneyes, eigennoses and eigenmouths. The experimental results show that the eigenfeatures outperform the eigenface method, and that the performance was further improved by using the combined representation of eigenfeatures and eigenfaces. Another famous work is the Active Shape Model (ASM) and Active Appearance Model (AAM) proposed by (Lanitis, Taylor, & Cootes, 1997). In this work, Cootes' group use ASM and AAM to model the variance of shape and appearance respectively. Both ASM and AAM are learned from a large number of training images, and are then used to model test images. To recognize a face image, both ASM and AAM are adjusted to fit the new image, which generates a number of shape and texture parameters. Those parameters, together with the local profiles at model points, are used for face recognition. When 300 images (10 images per individual) are used as training images, the method achieves 92% accuracy for 300 test images. Figure 3-7 shows the landmarks

used to train the ASM, and the effects of varying the first two parameters of the shape and appearance models.

Figure 3-7 Landmarks of ASM (a); variance of the facial shape (b); and appearance (c) (Lanitis et al., 1997)

3.2 Gabor Wavelet Based 2D Methods

Despite remarkable progress so far, the general task of face recognition remains a challenging problem due to complex distortions caused by variations in illumination, facial expression and pose. It is widely believed that local features in face images are more robust against such distortions and that a spatial-frequency analysis is often desirable to extract such features (Zhao et al., 2000; Scholkopf et al., 1997). With good characteristics of space-frequency localization, wavelet analysis seems to be the right choice for this purpose (Qian et al., 1996; Daubechies, 1990). In particular, among various wavelet bases Gabor functions provide the optimized resolution in both the spatial and frequency domains (Gabor, 1946; Daugman, 1985).

The Gabor wavelet was originally contributed by Gabor (1946) when he proposed to represent signals as a combination of elementary functions. The 2D counterpart of the Gabor elementary function was then introduced by (Granlund, 1978). Daugman (1985) reviewed the 2D Gabor wavelet family and presented evidence that the family can model well the 2D receptive-field profiles of simple cells in the mammalian visual cortex, and thus that such visual neurons could optimize the general uncertainty relations for resolution in space, spatial frequency and orientation. From an information theoretic

viewpoint, (Okajima, 1998) derived the Gabor function as the solution to a certain mutual-information maximization problem. The work shows that the Gabor-type receptive field can extract the maximum information from local image regions. Due to the useful characteristics of Gabor functions, they have been widely and successfully applied to texture segmentation (Jain & Farrokhnia, 1991; Weldon et al., 1996), handwritten numeral recognition (Hamamoto et al., 1998), fingerprint recognition (Lee & Wang, 1999) and face recognition (Lades et al., 1993; Shen et al., 2004b; Wiskott et al., 1997; Liu et al., 2002).

The wide application of Gabor functions has also resulted in different terminologies, which may be quite confusing for researchers. Some examples are Gabor wavelet, Gabor filter, Gabor expansion, Gabor transform and Gabor function. Because this study starts from the joint time frequency analysis of signals, the terminology of Gabor wavelet is used in this thesis. While the term Gabor features is used for the features extracted by a set of Gabor wavelets, they are usually called jets when the wavelet family is applied at a certain facial feature point. A detailed survey of Gabor wavelet based face recognition methods, both analytic and holistic, follows in the next section.

Analytic Methods

Analytic methods utilize the Gabor features, named Gabor jets, extracted from pre-defined feature points on the face images for recognition. Different approaches mainly vary in the way they locate feature points for Gabor jet extraction, and can be classified into two categories: elastic graph matching based methods and non-graph matching based methods. For elastic graph based analytic methods, a graph is first placed at an initial location and deformed using jets to optimize its similarity with a model graph. Non-graph based methods locate feature points manually or by colour or edge etc.

information. Once the location process is completed, recognition can then be performed using Gabor jets extracted from those feature points.

Elastic Graph Matching Based Feature Points Location

Dynamic Link Architecture (DLA) (Lades et al., 1993) and Elastic Bunch Graph Matching (EBGM) (Wiskott et al., 1997) are two famous Gabor jet based methods using elastic graph matching for face representation. Graph matching based methods normally require two stages to build the representing graph $g^I$ for a face image $I$. During the 1st stage, a model graph $g^M$ is shifted within the input image while keeping its form rigid. The rigid graph is initialized at an arbitrary position in the input image. A cost function $S(g^I, g^M)$ is defined (see Eq. 3.2) and the position is updated until a minimum value of the function is reached. The global move procedure is then followed by the diffusion of individual vertices during the 2nd stage. The vertices of the model graph are visited in a random order and are shifted by a random vector $\vec{d}$ within a topological constraint $\vec{T}$ to encode the local distortions due to rotations in depth or expression variations. It is the deformation of the vertices that makes the graph matching process elastic.

Figure 3-8 Face adapted graphs for different poses (a) and an example face bunch graph (b) (Wiskott et al., 1997)

In DLA (Lades et al., 1993), a model graph is built for each individual face in the gallery and the graph matching process is required to learn the representing graph for a new face image. The model graph in DLA is a rectangular graph, with each node

labeled by Gabor jets. Two sample face images with overlaid representation graphs are shown in Figure 3-2. The graph shown in (b) is built by applying the 2 stage graph matching process using (a) as the model graph.

Based on DLA, Wiskott et al. (1997) further developed a more appropriate graph structure, called EBGM, to represent faces. Compared with the rectangular graph used in (Lades et al., 1993), the new method employs object adapted graphs and each node refers to a specific facial landmark. Figure 3-8a shows the adapted graph grids for faces with different poses; one can observe that such a structure is more suitable for face images. Since matching with each individual model graph is computationally very expensive for large galleries, they also developed a technique called the Face Bunch Graph (FBG, shown in Figure 3-8b) to avoid such a process. A bunch is a set of jets taken from the same node in different model graphs. This requires a set of aligned model graphs, such that a given node always refers to the same facial feature. 80 manually built model graphs are used in (Wiskott et al., 1997) to build the FBG, which is then used as the only model graph to build the representing graph for an input face image using the 2 stage graph matching process.

Since the representing graph of a face image is normally associated with a set of corresponding Gabor jets, jet similarity plays a very important role in the definition of the cost function $S(g^I, g^M)$ used to match two graphs. Two different functions can be used to compare jets (Wiskott et al., 1997). The first one, $S_m(J, J')$, uses magnitude information only and generates smoother output when a fixed jet $J'(\vec{z})$ is compared with jets $J(\vec{z} + \vec{d})$ located at varied positions with displacement $\vec{d}$. The other one, $S_p(J, J')$, takes phase into consideration; it is more sensitive to displacements and potentially more discriminative, since jets with the same magnitudes but different phases can be distinguished. For a labeled graph with nodes $\{\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_N\}$ and edges $\vec{z}_e = \vec{z}_i - \vec{z}_j$,

$e = 1, 2, \ldots, E$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$, the similarity of a model graph $g^M$ and a variable graph $g^I$ is evaluated by a cost function in DLA as (Lades et al., 1993):

$$S(g^I, g^M) = \lambda \sum_e (\vec{x}_e^I - \vec{x}_e^M)^2 - \sum_n S_m(J_n^I, J_n^M) \qquad (3.1)$$

where $\lambda$ determines the relative importance of the jet similarities and the topography term, $J_n$ is the jet at node $n$ and $\vec{x}_e$ is the distance vector used to label edge $e$. This function does not take the phase of the jets into consideration. Similarly, the quality of matching between an image graph $I$ and the FBG $B$ is evaluated by (Wiskott et al., 1997):

$$S(g^I, g^M) = \frac{\lambda}{E} \sum_e \frac{(\vec{x}_e^I - \vec{x}_e^B)^2}{(\vec{x}_e^B)^2} - \frac{1}{N} \sum_n \max_m S_p(J_n^I, J_n^{B_m}) \qquad (3.2)$$

where $B_m$ denotes the $m$th model graph of the bunch graph $B$. The cost functions thus defined take the similarity of both the jets and the graph geometry into consideration. Other definitions of the cost function can also be found in (Rong, Su, & Lin, 2002).

In the 2nd stage of graph matching, the graph nodes are also shifted within a topographical constraint $\vec{T}$ to model the local face distortions. Wiskott (1999) used a simple rectangular graph model to investigate the role of topographical constraints for face recognition. Primitive graph models with different strengths of topographical constraints are compared with a more sophisticated system using bunch graphs. The results show that the constraints are quite useful when the variations in illumination, scale and background are small. His work also compared different jet similarity functions, and the results suggest that the function with phase yields better matching results than the one without phase when no drastic changes in illumination are present.

Based on the elastic graph matching framework, a number of variations have been proposed in the literature. Mu and Hassoun (2003) proposed a group shift/deformation algorithm. The algorithm clustered rectangular graph nodes into groups (eyes, mouth

and nose etc.) according to their locations. All the graph nodes in the same group move together in the rigid matching stage, while local deformation is allowed in the 2nd step; see Figure 3-9 for details. The results on two databases show that the proposed group shift algorithm achieved better performance than the standard elastic graph matching algorithm.

Elastic graph matching has also been applied to face authentication by Duc et al. (Duc, Fischer, & Bigun, 1999). The importance of the rectangular graph nodes is measured by a criterion specially designed for acceptance and rejection of the candidate. The criterion is small when the candidate is the right person, and large in the case of an impostor. The Fisher discrimination criterion turns out to be the right one. They show that a feature consisting of only the Gabor jets extracted from those important nodes not only reduces the feature dimension, but also improves the recognition performance significantly. Since the elastic graph matching process is computationally very expensive, they also tested the significance of the elastic steps by simply dropping them, which is equivalent to setting $\lambda = \infty$ in the graph similarity function. The comparison of the performance obtained with and without the deformation step shows that the elastic matching slightly increases the performance, but has less influence than the weighting of the graph nodes.

Figure 3-9 The group shifting/deformation algorithm (Mu et al., 2003)

Liao and Li (2000) reduced the nodes of the bunch graph to only 17 facial feature points, all of which have clear meanings and exact positions. A collection of 70 face images with manual marks at the correct facial feature points is used to construct the bunch graph.
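The role of $\lambda$ in the DLA graph cost function (a weighted edge distortion term minus the summed jet similarities) can be illustrated with a minimal sketch. The function name `dla_cost`, the toy three-node graph, and the assumption that the per-node jet similarities have already been computed are all illustrative choices rather than details from Lades et al. (1993).

```python
def dla_cost(image_nodes, model_nodes, edges, jet_sims, lam):
    """lam * (squared edge distortion) - (sum of per-node jet similarities).

    image_nodes, model_nodes: lists of (x, y) node positions
    edges: list of (i, j) node index pairs
    jet_sims: jet similarity for each node, assumed precomputed
    lam: weight of the topography term
    """
    distortion = 0.0
    for i, j in edges:
        # Edge vectors in the image graph and in the model graph.
        ix = image_nodes[j][0] - image_nodes[i][0]
        iy = image_nodes[j][1] - image_nodes[i][1]
        mx = model_nodes[j][0] - model_nodes[i][0]
        my = model_nodes[j][1] - model_nodes[i][1]
        distortion += (ix - mx) ** 2 + (iy - my) ** 2
    return lam * distortion - sum(jet_sims)


model = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
edges = [(0, 1), (0, 2), (1, 2)]
sims = [1.0, 1.0, 1.0]  # made-up similarity values

# A rigid shift leaves every edge vector unchanged, so no distortion is paid.
shifted = [(x + 3.0, y + 4.0) for x, y in model]
rigid_cost = dla_cost(shifted, model, edges, sims, lam=0.5)

# Moving a single node distorts the two edges attached to it.
deformed = [(0.0, 0.0), (11.0, 0.0), (0.0, 10.0)]
deformed_cost = dla_cost(deformed, model, edges, sims, lam=0.5)
```

As `lam` grows, any deformation is penalized more heavily, so in the limit only rigid placements of the graph remain competitive.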

Once the FBG is determined, the facial feature points can be detected automatically for a new input image by the elastic graph matching process. Since some feature points may be located at the wrong places, a graph adjusting stage is proposed to correct the wrongly positioned points. Figure 3-10 shows the results of automatic facial feature point detection. The three misplaced feature points marked by black circles (Figure 3-10a) are corrected by the graph adjusting process (Figure 3-10b). Instead of using the rigid matching step, Jiao et al. (Jiao, Gao, Chen, Cui, & Shan, 2002) used face structure knowledge and grey intensity information to locate the facial features, e.g. eyes and mouth. Once the features are located, the position of the bunch graph is initialized and the elastic deformation step is then used to refine and adjust the feature positions.

Figure 3-10 Facial feature points (a) and the results of graph adjusting (b) (Liao et al., 2000)

Non-graph Matching Based Feature Points Location

Due to the computational complexity of the elastic graph matching process, a number of works have proposed other techniques for feature point location. Some works locate the feature points manually (Escobar & Ruiz-del-Solar, 2002; Gokberk, Irfanoglu, Akarun, & Alpaydin, 2003; Wang & Qi, 2002; Chung, Kee, & Kim, 1999), and the Gabor jets extracted at those points are then passed to a sophisticated classification system for recognition. Escobar et al. (2002) propose to use Log-Polar images for Gabor feature extraction. The face image is Log-Polar transformed before it is convolved with Gabor wavelets. This technique is supposed to be more robust against the variance of

scale and rotation. In this system, facial feature points are located manually and the coordinates are Log-Polar transformed as well.

Wu et al. (Wu, Yoshida Y., & Shioyama, 2002) used both colour and edge information to extract facial organ regions; feature points are then detected by applying the SUSAN corner detector. 12 Gabor wavelets with tuned parameters are designed and used for both feature point location and feature extraction. Face structure and intensity were also used by Jiao et al. (2002) to locate facial features, e.g. eyes and mouth.

Figure 3-11 Flowchart of variable feature points location (Kepenekci, 2001)

Instead of using pre-defined facial features such as the eyes, nose and mouth, some researchers have proposed to locate feature points in the face images which contain interesting information (Kepenekci, 2001; Hjelmas, 2000). These points are not necessarily specific feature points, but they are usually positioned around facial features. Hjelmas applied a family of 24 Gabor wavelets to the face image, and the magnitudes of the convolution results at each location in the image are summed to produce the filtered image. The centre area of the face is emphasized by Gaussian weighting and a maxima

selecting algorithm is used to locate the feature points with useful information. Similar to the method in (Hjelmas, 2000), points with high-energy Gabor wavelet responses are found by searching the pixels in a sliding window (Kepenekci, 2001). 40 Gabor wavelets are convolved with the face image and the searching process is applied to each of the 40 resultant images. The number of feature points and their locations vary for different face images.

Face Similarity Measures and Recognition

Once a face has been represented by a set of Gabor jets extracted from located feature points, face recognition is a straightforward step. For graph matching based methods, the identity of a test image is determined in (Lades et al., 1993) by the statistics of the graph similarity values between the test graph and all model graphs. In (Wiskott et al., 1997) and (Liao et al., 2000), the similarity function of two facial images is simply an average over the similarities between pairs of corresponding jets. After comparing two strategies for combining local jet similarities, (Mu et al., 2003) suggest that a voting strategy should be used.

The set of Gabor jets extracted at different feature points can also be combined into a long feature vector, to which a simple distance measure can be applied for classification (Duc et al., 1999; Wang et al., 2002; Jiao et al., 2002). Three different distance measures are tested in (Jiao et al., 2002) and the results suggest that the city block distance metric achieves better performance than cosine methods. More sophisticated classifiers have also been applied to the combined feature vector for recognition, e.g., a Bayesian classifier is adopted in (Wang & Tang, 2003), and improvements have been achieved over the system using direct correlation of Gabor features for classification. Chung et al. (1999) applied PCA to the extracted Gabor responses at predefined facial feature points such that local variations can be included to overcome the shortcoming of PCA.
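The jet comparison and averaging described above can be sketched as follows. This is a minimal illustration that assumes each jet is stored as a list of magnitudes (plus a list of phases for the phase-sensitive variant); the function names `s_m`, `s_p` and `face_similarity`, and all numeric values, are hypothetical.

```python
import math


def s_m(a, b):
    """Magnitude-only jet similarity: normalised dot product of magnitudes."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


def s_p(a, phi, b, phi_b, k, d):
    """Phase-sensitive jet similarity with displacement compensation.

    k: one wave vector (kx, ky) per jet coefficient
    d: estimated displacement (dx, dy)
    """
    num = sum(
        x * y * math.cos(p - q - (d[0] * kv[0] + d[1] * kv[1]))
        for x, y, p, q, kv in zip(a, b, phi, phi_b, k)
    )
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


def face_similarity(jets_a, jets_b):
    """Average magnitude similarity over pairs of corresponding jets."""
    sims = [s_m(a, b) for a, b in zip(jets_a, jets_b)]
    return sum(sims) / len(sims)


# Two toy faces with two jets each (made-up magnitudes).
score = face_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]])

# Jets with equal magnitudes but opposite phase: s_m cannot separate them,
# whereas s_p can.
k = [(1.0, 0.0), (0.0, 1.0)]
same_mag = s_m([1.0, 1.0], [1.0, 1.0])
opposite = s_p([1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [math.pi, math.pi], k, (0.0, 0.0))
```

The last two values show why the phase-sensitive function is potentially more discriminative: the two jets have identical magnitudes, so only the phase term distinguishes them.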

Both methods, whether the feature points are located by an edge detector (Wu et al., 2002) or manually (Escobar et al., 2002), use the average of the jet similarities as the similarity measure of face graphs. The jet similarity function that does not take phase into account is used. Since the correspondences of jets between two facial images are unknown, only jets with similarity above a preset threshold are taken into consideration in (Kepenekci, 2001). The image similarity of two facial images is calculated as the mean of the similarities of the selected jets. To include information about topological similarity, the number of similar jets can also be included in the similarity function. In this case, the overall similarity of a test image and a reference image is a weighted sum of the image similarity and the number of similar jets.

Holistic Methods

While analytic methods utilize the Gabor jets extracted from prominent feature points for recognition, holistic methods normally extract features from the whole face image. An augmented Gabor feature vector (Liu et al., 2002) can be derived by concatenating the Gabor jets at all pixel locations. Since the feature vector contains the information extracted at different frequencies, orientations and locations, this representation can produce discriminant features for recognition. As with typical holistic face recognition methods, faces need to be detected and normalized in size and orientation prior to recognition. Various works have shown that such Gabor features are much more robust than grey-level intensity values against the misalignment caused by the normalization procedure (Shan, Gao, Chang, Cao, & Yang, 2004).

A number of researchers have developed different recognition systems based on this feature vector. In Liu's early work (Liu et al., 2002), he applied the Enhanced Fisher linear discriminant Model (EFM) to the Gabor feature vector for face recognition; the results show that the novel Gabor-Fisher Classifier outperformed both PCA and LDA.
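To make the construction of an augmented Gabor feature vector concrete, the following sketch filters an image with 40 Gabor kernels, downsamples each magnitude response and concatenates the results. It rests on several assumptions that are not taken from the cited works: the kernel form in `gabor_kernel`, the split of the 40 wavelets into 5 scales and 8 orientations, the frequency and width schedule, the use of FFT based circular convolution, and the `step` downsampling parameter.

```python
import numpy as np


def gabor_kernel(size, freq, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * xr)


def augmented_gabor_vector(image, n_scales=5, n_orients=8, step=4):
    """Concatenate downsampled magnitude responses of all Gabor kernels."""
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    parts = []
    for s in range(n_scales):
        freq = 0.25 / (np.sqrt(2.0) ** s)  # assumed frequency schedule
        sigma = 0.5 / freq                 # assumed envelope width
        for o in range(n_orients):
            kernel = gabor_kernel(15, freq, o * np.pi / n_orients, sigma)
            # Circular convolution via the FFT (zero-padded kernel).
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=(h, w)))
            parts.append(np.abs(response)[::step, ::step].ravel())
    return np.concatenate(parts)


rng = np.random.default_rng(0)
image = rng.random((32, 32))  # stand-in for a normalized face image
vec = augmented_gabor_vector(image, step=4)
full = augmented_gabor_vector(image, step=1)
```

Without downsampling the vector holds one value per pixel per kernel, which is why some form of dimension reduction is normally applied before subspace analysis.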

Since the 40 Gabor filtered images are concatenated together to form a feature vector (see Figure 3-12), the dimension is huge, e.g., 163,840, which corresponds to 4,096 pixels in each of the 40 filtered images. As a result, downsampling is first used to reduce the dimension to a manageable size. He also applied Independent Component Analysis (ICA) (Liu & Wechsler, 2003) to the augmented feature vector and developed a so-called Independent Gabor Feature (IGF) for recognition. The results show that ICA performs significantly better than eigenfaces. One of his recent works (Liu, 2004a) utilized Kernel PCA with a fractional power polynomial kernel to reduce the dimension of the extracted Gabor feature vector and enhance the discriminative power at the same time. However, no direct comparison among those proposed approaches is presented.

Shen and Bai (Shen et al., 2004b; Shen & Bai, 2004c) mapped the augmented Gabor features to kernel space, i.e., the extracted Gabor feature is analyzed by Generalized Discriminant Analysis (GDA) or Kernel Direct Discriminant Analysis (KDDA) for further feature enhancement. Experimental results show that kernel methods achieve much better results than linear methods such as PCA and LDA. The works of both Liu (Liu et al., 2002) and Shen (Shen et al., 2004b) have shown that Gabor feature based methods can achieve significant improvement over those using raw pixels, which demonstrates the discrimination ability of Gabor features. Similar work can also be found in (Fan, Wang, Liu, & Tan, 2004), which applies Null LDA (NLDA) to the augmented Gabor feature vector for recognition.

Once the dimension of the extracted feature vector has been reduced and its discrimination ability enhanced by a certain subspace analysis, a simple nearest neighbour classifier and the Euclidean distance measure can be applied for classification. While the simple Euclidean distance measure seems to be sufficient, research results do suggest that different distance measures may affect the performance of the system and that an appropriate distance measure has to be chosen for each subspace analysis approach (Liu et al.,


2002; Shen et al., 2004b). More complex classifiers, e.g. Support Vector Machine (Chi, Dai, & Zhang, 2004) and Nearest Feature Space (Zhu, Vai, & Mak, 2004), could also be applied to the enhanced features for possible improvement of accuracy. However, such kinds of system are more complex and the improvement is not guaranteed.

Figure 3-12 Convolution results of a face image with 40 Gabor wavelets

A quite different method proposed by Ayinde and Yang uses rank correlation of Gabor-filtered images for face recognition (Ayinde & Yang, 2002). Instead of concatenating all of the filtered images together, their method compares the filtered images separately. Three Gabor filtered images with selected orientation and kernels, together with the original face image and the neighborhood averaging of two filtered images, are used to represent the faces. Rank correlation values derived from the six representing images are then weighted together to yield the overall matching score of two face images. A face is matched to the subject that produces the highest similarity score computed from the six rank correlation values. Since the weighting parameters need to be decided from the training images, the optimisation process is very lengthy. It is reported in this paper that the process takes 35 minutes to complete a run for parameter determination using 200 training images.
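A weighted rank-correlation score of the kind described above might look as follows. This is a hedged sketch, not the procedure of Ayinde and Yang: it assumes Spearman's rank correlation (Pearson correlation of the ranks, with tied values given their average rank), treats each representing image as a flat list of pixel values, and uses the hypothetical names `ranks`, `rank_correlation` and `matching_score`.

```python
def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_correlation(a, b):
    """Spearman rank correlation of two equally sized pixel lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0


def matching_score(images_a, images_b, weights):
    """Weighted sum of the rank correlations of corresponding images."""
    return sum(w * rank_correlation(x, y)
               for w, x, y in zip(weights, images_a, images_b))
```

The weights would have to be tuned on training images, which is the lengthy optimisation step mentioned above; here they are simply passed in.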

Gabor Wavelet Network

Whilst most of the works in the literature use Gabor wavelets for feature extraction, the characteristics and compression ability of wavelets have not been fully explored. Reconstruction of the signal from compressed wavelet coefficients is actually one of the main reasons for the wide application of wavelets in the real world (Strang & Nguyen, 1996; Mallet & Zhong, 1992). Due to the nonorthogonality of Gabor wavelets, their application in signal reconstruction is very limited. Credit must be given to Krueger, who proposed the use of Gabor Wavelet Networks (GWN) for object representation and face processing (Kruger et al., 2000; Kruger & Sommer, 2002b; Kruger et al., 2002a). Originating from the idea of wavelet networks (Zhang & Benveniste, 1992) and the fact that Gabor functions have been widely applied to feature extraction, Krueger proposed to use a set of Gabor wavelets $\Psi = (\varphi_1, \varphi_2, \ldots, \varphi_N)^T$ with associated weights $W = (w_1, w_2, \ldots, w_N)^T$ to represent a face image. The set of Gabor wavelets and weights are obtained by optimizing the objective functional of reconstruction error

\[ E = \min \Big\| I - \sum_i w_i \varphi_i \Big\|_2^2 . \]

The two vectors $\Psi = (\varphi_1, \varphi_2, \ldots, \varphi_N)^T$ and $W = (w_1, w_2, \ldots, w_N)^T$ now define the GWN for representing image $I$. Given the optimal GWN of an image $I$, it can be reconstructed by a linear combination of the weighted wavelets:

\[ \hat{I} = \sum_i w_i \varphi_i = W^T \Psi . \]

The quality of the reconstruction of course depends on the number of wavelets used and can be varied to reach the desired precision. Figure 3-13 shows the images reconstructed with 16, 52, 116 and 216 Gabor wavelets (left to right).

Figure 3-13 Original image and the reconstructed image with different number of wavelets (Kruger et al., 2002a)
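For a fixed set of wavelets, minimising the reconstruction error over the weights alone is an ordinary linear least-squares problem. The sketch below illustrates only that part, under clear assumptions: the image and the wavelets are flattened real-valued lists, the wavelets are linearly independent, and the wavelet parameters themselves (which a real GWN also optimises) are not touched. The helper names `solve`, `gwn_weights` and `reconstruct` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally sized lists."""
    return sum(p * q for p, q in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def gwn_weights(image, wavelets):
    """Least-squares weights via the normal equations (Gram matrix)."""
    gram = [[dot(p, q) for q in wavelets] for p in wavelets]
    projections = [dot(p, image) for p in wavelets]
    return solve(gram, projections)


def reconstruct(weights, wavelets):
    """Weighted sum of the wavelets, i.e. the reconstructed image."""
    return [sum(w * p[i] for w, p in zip(weights, wavelets))
            for i in range(len(wavelets[0]))]
```

Note that the plain projections `dot(p, image)` are not the weights when the wavelets overlap; the Gram matrix has to be inverted first, which is the role the dual wavelets play in the text.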

Since Gabor wavelets are nonorthogonal bases, linear projections of a new pattern onto them do not produce the correct weights. As a result, dual Gabor wavelets $\tilde{\Psi} = (\tilde{\varphi}_1, \tilde{\varphi}_2, \ldots, \tilde{\varphi}_N)^T$ have to be found to compute the weights:

\[ W = I\,\tilde{\Psi} \quad \text{with} \quad \tilde{\Psi} = (\Psi^T \Psi)^{-1} \Psi^T \tag{3.3} \]

Once the GWN is learned to represent a face image, the representations can be used for recognition. Since the number of wavelets and weights may vary for different images, a special distance measure has been designed in (Kruger et al., 2002b) for similarity measurement. Recently, Zhang et al. (Zhang, Zhang, Huang, & Tan, 2005) proposed the concept of the Subject Dependent Gabor Wavelet Network (SDGWN), which is learned from all of the training images of the same subject. Instead of representing each image of a subject with a different GWN, their method uses the same GWN model to represent all images from the same individual. The SDGWN was then further combined with a recently proposed neural network model, named Kernel Associative Memory (KAM), for face recognition. The results on the FERET, ORL and AR face databases show that this method achieved better performance than other popular approaches.

Performance Evaluation

With a lot of face databases available, evaluation of different face recognition algorithms is always one of the most difficult tasks. Even when the same database is used, different papers may use different parts of the database for experiments. Moreover, the partitioning of training images, gallery images and test images may also vary. For example, the results of (Wiskott et al., 1997) were reported using 250 fa and 250 fb images from FERET, while those from (Liu et al., 2002) were reported using 600 frontal FERET images. In (Lades et al., 1993), the database consists of images captured from 87 people. Subjects were asked to keep a standard pose, look 15° to the right and make a random expression. The standard images are used as the model and the images

with different poses and expressions are used as two probe sets for testing. Accuracy of 98% was achieved for elastic bunch graph methods when frontal view faces were used for testing (Wiskott et al., 1997), where neutral frontal view faces (fa) were used as the model gallery and frontal view faces with different facial expressions (fb) were used as probe images. When half profile faces or profile faces were matched with the frontal faces, the accuracy dropped significantly. A number of databases are tested in (Zhang, Yan, & Lades, 1997) and the results of their algorithm are compared with state of the art algorithms such as Eigenface, elastic graph matching and neural networks. It is claimed that their algorithm is competitive with the popular methods and achieves higher performance than most other algorithms on the FERET database. Liu et al. (2002) use 600 FERET frontal face images with different illumination and facial expressions from 200 subjects for performance evaluation. The eyes in the face images were manually detected and used to normalize the scale and rotation. Two images of each person were randomly chosen as training images while the remaining image was used for testing. 100% accuracy was achieved when the feature dimension was set to 65. A pose estimation module is also developed in (Liu, 2004a) and the algorithm is tested using the CMU PIE database, where faces with different poses are available. The accuracy of 96% reported in (Ayinde et al., 2002) was achieved when 9 images of each person in the ORL database were used for training. Table 3-1 summarizes the databases and recognition rates of different Gabor feature based algorithms. All of the recognition rates listed in the table are for frontal view faces only; results for half profile and profile faces from the FERET database can be found in (Wiskott et al., 1997) and (Hjelmas, 2000). Since the performance of the group shifting/deformation (Mu et al., 2003) and weighted EBGM (Duc et al., 1999) methods is

reported with False Acceptance Rate (FAR) and False Rejection Rate (FRR), their results are not included in the table.

Algorithms Test Database Recognition Rate (%) Local Methods Global Methods Gabor Wavelet Networks Elastic Graph Matching Based Methods Non-Graph Matching Based Methods DLA (Lades et al., 1993) Own 88 EBGM (Wiskott et al., 1997) FERET 98 Liao and Li's method (Liao et al., 2000) Yale 96.4 Gabor + Bayesian (Wang et al., 2003) XM2VTS 97.1 AR 93.3 Face structure based facial feature Own 93.3 detection (Jiao et al., 2002) ORL 94.5 Stirling Variable facial feature points AR (Kepenekci, 2001) ORL 95.3 FERET 96.3 Edge/color based facial feature points detection (Wu et al., 2002) Own 92.8 Log-Polar + Gabor (Escobar et al., 2002) Yale 88.9 Gabor-Fisher (Liu et al., 2002) FERET Gabor ICA (Liu et al., 2003) FERET 98.5 ORL 100 Gabor Kernel PCA (Liu, 2004a) CMU PIE 95.3 FERET 99.5 Gabor-Kernel (Shen et al., 2004b) Gabor + rank correlation (Ayinde et al., 2002) GWN (Kruger et al., 2002b) SDGWN + KAM (Zhang et al., 2005) FERET 97.5 ORL 100 ORL 96.0 UMIST 97.5 Yale 97.8 Manchester 93.3 FERET 99.6 ORL 100 AR 96.5

Table 3-1 List of Gabor wavelet based face recognition algorithms and accuracy

A few works in the literature have also compared Gabor feature based methods with other popular face recognition algorithms. A Dynamic Link Architecture based algorithm is evaluated as more robust than eigenface methods and neural network approaches (Zhang et al., 1997). A combination of four databases (MIT, ORL, Weizmann and Bern) was used to evaluate different algorithms. While the performance of the eigenface

method deteriorates significantly as lighting variation increases, the elastic matching algorithm, on the other hand, is insensitive to lighting, face pose and expression variations and is therefore more versatile. An accuracy of 93% was reported for the DLA algorithm, which is much higher than that of Eigenface methods (66%). Kalocsai et al. (Kalocsai, Zhao, & Elagin, 1998) attempt to compare the performance of machine face recognition systems with that of humans: 64 volunteers performed a sequential face matching task and their error rates and reaction times were recorded as the psychophysical data. Two face recognition models, the DLA and PCA-LDA models, were also applied to the same image test set and the results were compared quantitatively and qualitatively. The analysis shows that both models are correlated with human performance; however, the DLA model seems to capture human performance better than the PCA-LDA model. Several large databases and evaluation protocols have also become available in the literature so that different algorithms can be compared in the same framework. In 1996 and 1997, the FERET evaluation methodology and benchmark were designed to evaluate state of the art face recognition algorithms (Phillips et al., 2000). Different test sets were designed in the evaluation to test the robustness of face recognition methods against variance caused by various expressions, illuminations and capture times. A number of systems such as PCA, PCA + LDA, neural network and Bayesian methods were evaluated and the results show that EBGM achieved the top performance. To make the testing as close as possible to real authentication applications, the BANCA database (Bailliere & Bengio, 2003; Messer et al., 2004) has also been released recently to replace the XM2VTS database (Messer, Matas, Kittler, Luettin, & Maitre, 1999) for evaluation of face verification algorithms. Organized by the University of Surrey (UK), more than 10 research institutes participated in the face verification

competition (FVC2004), which was based on the BANCA database. Several protocols were designed in this competition to test the robustness of algorithms against variance in image quality, face pose and illumination. The results show that two methods using Gabor wavelets for feature extraction demonstrated the top performance (Messer et al., 2004). Based on the comparison of the Gabor feature based methods with state of the art algorithms and the results of the FERET evaluation and FVC2004, we believe that Gabor wavelets might be the best choice to extract features for face recognition. The features could be extracted either locally or globally, and then different classification approaches can be applied.

Complexity of Gabor Feature Based Methods

Despite the advantages of Gabor wavelet based algorithms in recognizing face images with different illumination, pose and expression, they require high computational effort. Even when a parallel computer system was used, it was reported in (Lades et al., 1993) that the convolution of an image with 40 Gabor wavelets took about 7 seconds. When 23 transputers were used, the comparison of an image to a stored face model took 2 to 5 seconds, while the identification of a probe face in a database of 87 people took about 25 seconds. For the elastic bunch graph matching algorithm, the location of the face, detection of facial feature points and matching with the FGB together take less than 30 seconds on a SPARC station (Wiskott et al., 1997). Since fewer graph nodes are used and the similarity of graphs is simply an average over the similarities between pairs of corresponding jets, the comparison of an input face against a database of 250 people took less than 1 second. As a result, the main computational loads for graph matching based analytic methods come from the convolution of the image with the family of Gabor wavelets and from the elastic graph matching step. Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT)

can be used to speed up the convolution process, i.e. both the Gabor wavelets and the image are transformed to the frequency domain using FFT and the product is then transformed back to the spatial domain using IFFT. The whole convolution process can thus be completed within 2 seconds on a Pentium 4 1.8 GHz PC. However, the two-stage elastic graph matching process remains a time consuming step. A natural approach is to replace part of, or the whole of, the graph matching process with a faster method implementing a similar function. Jiao et al. (2002) replaced the rigid matching step with a facial feature location process based on structure knowledge and grey intensity information. Once the features are located, the position of the bunch graph is initialized and the elastic deformation step is then used for refining and adjusting the feature positions. However, the time saved compared to the standard elastic graph matching process is not reported. Duc et al. (1999) proposed a coarse-to-fine rigid graph matching method, based on a Gaussian pyramid structure, to speed up the first stage of the process. They also tested the significance of the elastic step by simply dropping it, which results in a 3% increase in classification error. The performance drop due to the elimination of the deformation step is not significant and can be compensated for by other enhancements, e.g. weighting of the graph nodes. The whole elastic graph matching process could also be replaced by a robust facial feature location process. In (Wu et al., 2002), once the image has been preprocessed using Gabor wavelets, facial feature points are detected using color, edge, Gabor features and corner information, which only takes about 0.15 seconds. When 12 purposely designed Gabor wavelets are applied for facial feature point extraction, it is reported that the Gabor transformation takes about 3 seconds on a 533 MHz Celeron processor. The processing time could have been further

reduced with a more powerful PC. However, the feature point location algorithm itself has to be robust against variations in illumination, pose and expression. Similarly, the computationally intensive convolution processes for Gabor feature based holistic methods could also be sped up by using FFT and IFFT. However, the dimension of the extracted Gabor feature is extremely large, e.g. 655,360 when 40 wavelets are used. Although downsampling could be used to reduce the feature dimension to a certain magnitude, the dimension after downsampling is still very high, e.g. 16,384 with a downsampling rate of 40 (Shen et al., 2004b). As a result, high memory capacity is required to save the features of face templates. In addition, both the training and application of classifiers using such high dimensional features would be very time consuming. The Gabor feature representation of a face image is substantially compressed when a GWN is used. 52 wavelets have been shown to be sufficient for real time pose estimation and face tracking (Kruger et al., 2002b). However, the GWN optimizing process for a given image requires a high computational cost. It was reported in (Kruger et al., 2002b) that it takes about 30 seconds on a 750 MHz processor to optimize a GWN with 16 wavelets, even when a coarse-to-fine strategy has been adopted.

Optimization of Gabor Wavelets for Feature Extraction

As described in the last section, a number of methods have been proposed to reduce the computational complexity of Gabor feature extraction, e.g. using FFT or alternative facial feature location approaches. Some researchers have also tried to optimize the Gabor representations by using a feature selection scheme. The dimension of Gabor features could thus be reduced and the features made more robust against the influence of noise. These optimization methods can be mainly classified into the following categories:

Optimization of Locations

A local linear discrimination criterion has been developed in (Duc et al., 1999) to measure the importance of different nodes on the rectangular graph representing face images. By using only the Gabor jets located at significant nodes, not only is the feature dimension reduced, but the classification performance is also improved. The discrimination criterion is similar to the Fisher measure (Fisher, 1936) such that the variance between samples of the same individual is minimized. Another interesting work models the feature location optimization objective as a subset selection problem (Gokberk et al., 2003). They tested three different Gabor jet representation schemes: a) a rectangular graph with sparse nodes; b) a face adapted graph with nodes located at prominent facial features only, e.g. eye corners, mouth corners, etc.; c) the whole convolution result including all pixels in the image. Different feature selection methods such as best individual feature (BIF), sequential forward selection (SFS), sequential floating forward search (SFFS) and genetic algorithm (GA) were tested and the results show that GA with representation scheme c) achieved the best performance. One can observe that most of the significant jets are located at the periphery of facial features. However, the results do suggest that the best locations to represent face images using Gabor jets may not necessarily be exactly at the facial features. PCA is performed on the augmented feature vectors in (Liu, Lam, & Shen, 2004). They argue that the summation of the eigenvectors at a particular position represents the corresponding variations among training images and thus reflects the corresponding importance in distinguishing human faces. Each pixel in the image is then classified as either a key point or an assistant point based on this criterion. Different sampling intervals are adopted for the key and assistant points and a Gabor feature vector of lower dimension can thus be generated. LDA is finally applied to the resultant feature vector for face


recognition. See Figure 3-14 for the feature locations selected by different algorithms. As can be seen from the figure, most of the significant locations are around facial features, e.g. eyes, nose and mouth.

Figure 3-14 Significant locations selected by different algorithms: (a) jet locations ranked by a local discrimination criterion, with significance proportional to the radii of the circles; (b) the 15 most important locations selected by GA; (c) 2 × 2 sampling for key points and 4 × 4 sampling for assistant points

Optimization of Gabor wavelets

Instead of optimizing the locations at which jets are extracted for face representation, a few works have tried to optimize the Gabor wavelet basis used for feature extraction. Wang and Qi (Wang et al., 2002) applied GA to select the optimized Gabor wavelet basis for feature extraction. 34 easily identifiable landmarks, located manually on each image, are selected to represent faces. A set of Gabor wavelets with 4 scales and 6 orientations is then designed as candidates and the aim of the GA is to select the optimal subset as a basis for face representation. To reduce the computational burden on the GA, they also proposed to use information complexity as a fitness measure of the chromosome. Face recognition is then performed based on the 4 optimal bases selected by GA and substantial improvements over the eigenface method have been observed. In summary, most of the works available in the literature either select locations where a fixed set of Gabor wavelets is applied, or optimize the wavelet basis to be convolved at a fixed set of feature points. Since different parts of natural objects usually display

various local characteristics, an improved method should apply the optimal wavelets at the most appropriate locations for feature extraction.

3D Face Recognition Methods

With most of the 2D recognition methods focusing on frontal view face images only, 3D models have been adopted to recognize faces with any pose. One of the representative works using a 3D model is described in (Romdhani, Blanz, & Vetter, 2002). This work performs face recognition in an analysis-by-synthesis fashion. The algorithm uses linear equations to recover the shape and texture parameters irrespective of the pose and lighting conditions of the face image. Those parameters are then used for recognition. However, the model fitting process takes quite a long time, e.g., 8 minutes on a Pentium III 800 MHz PC. Similar work can also be found in (Zhao & Chellapa, 2000; Lee & Ranganath, 2003). In these works, a 3D face model was usually used to synthesize images with different illumination and poses from a frontal face image; 2D techniques are then applied to the synthesized images for recognition. With the development of 3D capture systems, face recognition using 3D facial data is also attracting much attention. Beumier and Acheroy (2000) developed both surface matching and central/lateral profiles for recognition; the results show that the two methods give the same level of performance. Other techniques used for 3D face recognition are Extended Gaussian Image (EGI) (Tanaka, Ikeda, & Chiaki, 1998) and point signature (Chua, Han, & Ho, 2000). Some works also applied 2D techniques to 3D range data for recognition, e.g., 3D Eigenfaces (Hesher & Erlebacher, 2002). In addition to using 3D data only, multi-modal 3D+2D face recognition has also been proposed (Wang, Chua, & Ho, 2002). In this work, Gabor wavelet responses in 2D and point signatures in 3D are integrated into an augmented vector for feature representation. Classification is done by SVMs.


Despite the overall optimism about 3D face data relative to 2D face images, it is pointed out by (Bowyer, Chang, & Flynn, 2004) that there are still significant limitations in current 3D sensor technology and most current 3D face recognition algorithms do not handle expression variations well. While 3D shape is defined independently of illumination, its sensing is dependent on illumination. Holes may occur in areas where data is missing, even under ideal illumination; see Figure 3-15 for an example. 3D depth resolution also needs to be improved to benefit the recognition algorithms. All of these limitations suggest that the optimism sometimes expressed for 3D face recognition is still somewhat premature (Bowyer et al., 2004). Thus the appropriate issue may not be 3D versus 2D, but instead the best method to combine 3D and 2D.

Figure 3-15 Example 2D intensity image, 3D range image and sample hole in sensed 3D data (Bowyer et al., 2004)

3.4 Summary

A detailed survey of 2D face recognition algorithms and, particularly, Gabor feature based methods has been given in this chapter. 3D face recognition approaches are also briefly described. The short survey on 3D approaches shows that 3D technologies are still at the initial stages due to a number of limitations. 2D information such as texture, appearance and geometrical features will continue to play important roles in face recognition. 2D methods can basically be classified into three categories: analytic, holistic and hybrid. While the analytic methods extract features from prominent facial feature points, the holistic methods extract features from the whole face pattern. Due to the robustness against complex distortions caused by various variations in illumination,

facial expressions and poses, Gabor wavelets seem to be a promising basis for extracting local features for face recognition, for several reasons: biological motivation: the shapes of Gabor wavelets are similar to the receptive fields of simple cells in the primary visual cortex (Daugman, 1985); mathematical motivation: the Gabor wavelets are optimal for measuring local spatial frequencies (Kruger et al., 2002a; Kruger et al., 2002b); and empirical motivation: they have been found to yield significantly better performance than other methods in some performance tests (Zhang et al., 1997; Kalocsai et al., 1998), the FERET evaluation (Phillips et al., 2000) and the FVC2004 competition (Messer et al., 2004). Similar to general 2D face recognition algorithms, Gabor wavelet based approaches are also categorized as analytic and holistic methods. While elastic graph matching based analytic methods represent face images with different graph structures, the elastic matching process to locate graph nodes for a face image is, however, very time consuming. To replace such a complex process, some researchers locate facial features by edges, colours etc. such that Gabor features can be extracted from those fiducial points for recognition. The location algorithm itself has to be robust against distortions caused by illumination, pose and expression. The success of Gabor feature based holistic methods relies on an augmented vector extracted from the whole face image, which usually has a huge dimension, e.g. 655,360 when 40 wavelets are used. The feature thus requires high memory cost and could add high computation cost to the classifier as well. As a result, the research presented in this thesis will focus on the application of Gabor wavelets to face recognition, and on developing methods to optimize the Gabor feature extraction process for performance improvement and computation/memory cost reduction.

Chapter 4 Gabor Features and Kernel Subspace Analysis for Face Identification

The detailed review of the background of Gabor wavelets has suggested the robustness of such mathematical tools for feature extraction. Once robust features are extracted, subspace analysis could be applied for further class separability enhancement and feature dimension reduction. Owing to the adoption of kernel methods, non-linear kernel subspace analysis, e.g. Kernel Principal Component Analysis (KPCA) and Generalized Discriminant Analysis (GDA), might have substantial advantages over linear subspace techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). This chapter presents work that utilises Gabor features and kernel subspace analysis for face identification. A set of 40 Gabor wavelets is used to extract robust features, which are then subjected to KPCA or GDA to handle non-linear variations. Thereafter, different distance measures are evaluated and the nearest neighbour classifier is used for recognition.
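The final matching step mentioned above, a nearest neighbour classifier with an interchangeable distance measure, can be illustrated as follows. This is a minimal sketch under assumptions: features are already projected into the learned subspace and stored as plain lists, the gallery is a list of `(label, feature)` pairs, and the names `identify`, `euclidean` and `cosine_distance` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def identify(probe, gallery, distance=euclidean):
    """Return the label of the gallery feature closest to the probe."""
    label, _ = min(gallery, key=lambda item: distance(probe, item[1]))
    return label
```

Passing `distance=cosine_distance` swaps the measure without touching the classifier, which is the kind of comparison between distance measures the chapter refers to.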

4.1 The Methodology

System Architecture

Figure 4-1 shows a flow chart demonstrating the use of Gabor features and kernel subspace analysis for face recognition. Initially a set of Gabor wavelets is used to extract appropriate features; this process is detailed in the next section. The Gabor features extracted from a set of training images are then used to learn the kernel subspace, which is represented by the projection matrix W. To identify a person, Gabor features of the face image are extracted, concatenated into a vector, projected to the learned kernel subspace and finally compared with the projections of training (gallery) images in the database. After comparison using a distance measure (such as Euclidean distance), the person is identified as the one whose image produces the smallest distance.

Figure 4-1 System architecture (blocks: Gabor Feature Extraction, DownSample, Kernel Subspace Projection using the Projection Matrix, KNN Classifier, ID)

Gabor Feature Extraction

As described in chapter 3, a Gabor wavelet is determined by the following parameters: the central frequency f, the orientation θ and the ratios γ, η between the frequency and the sharpness of the Gaussian axes. While the values of γ and η are normally fixed, a set of Gabor wavelets with different frequencies and orientations should be designed to extract discriminant Gabor features. Most of the works in face recognition follow the strategies proposed in (Lades et al., 1993; Wiskott et al., 1997), i.e.,

\[ \gamma = \eta = \sqrt{2}, \quad f_u = \frac{F_{\max}}{(\sqrt{2})^{u}}, \quad \theta_v = \frac{v\pi}{8}, \quad \text{where } u = 0, \ldots, 4,\ v = 0, \ldots, 7 \]

Once the ratio is fixed, the size of the Gaussian envelope decreases monotonically as the central frequency increases. The higher the central frequency of the Gabor sinusoidal carrier, the smaller the area the Gaussian envelope will cover in the spatial domain. This is reasonable since a high frequency signal changes faster. According to the Nyquist sampling theory, a signal containing frequencies higher than half of the sampling frequency cannot be reconstructed completely. Therefore, the upper limit frequency for a 2D image is 0.5 cycles/pixel, while the lower limit is 0. However, for face images the actually useful band is much narrower, and a considerably lower $F_{\max}$ (in cycles/pixel) has been proven to be a reasonable choice (Lades et al., 1993). A Gabor wavelet with parameters $(f_u, \theta_v, \gamma, \eta)$ can now be defined as:

\[ \varphi_{u,v}(x, y) = \frac{f_u^2}{\pi\gamma\eta} \exp\!\left(-\left(\frac{f_u^2}{\gamma^2} x_r^2 + \frac{f_u^2}{\eta^2} y_r^2\right)\right) \exp\left(j 2\pi f_u x_r\right), \qquad x_r = x\cos\theta_v + y\sin\theta_v, \quad y_r = -x\sin\theta_v + y\cos\theta_v \tag{4.1} \]

Given a bank of 40 Gabor wavelets $\{\varphi_{u,v}(x, y),\ u = 0, \ldots, 4,\ v = 0, \ldots, 7\}$, image features at different locations, frequencies and orientations can be extracted by convolving the image $I(x, y)$ with the wavelets:

\[ O^{I}_{u,v}(x, y) = I(x, y) * \varphi_{u,v}(x, y) \tag{4.2} \]

Figure 4-2 shows the 40 Gabor wavelets and their representation in the frequency domain. As can be seen, the set of wavelets is tuned to a wide range of scales (frequencies) and orientations. The orientations of the Gabor wavelets shown in the figure vary along the horizontal axis, while their scales vary along the vertical axis. The image in the 2nd row shows the spectrum of the 40 wavelets in the frequency domain, with each blob representing the energy of a wavelet. To extract features at these different scale and


orientation levels, the resultant Gabor feature set thus consists of the convolution results of an input image $I(x, y)$ with all of the 40 Gabor wavelets:

\[ S = \{ O^{I}_{u,v}(x, y) : u \in \{0, \ldots, 4\},\ v \in \{0, \ldots, 7\} \} \tag{4.3} \]

Figure 4-2 The 40 Gabor wavelets in the spatial and frequency domain

Figure 4-3 Convolution result (magnitude and real part) of an image with 40 Gabor wavelets
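The bank of 40 wavelets can be sampled directly from the definition in Equation 4.1. The following is a sketch under assumptions rather than the implementation used in this work: it takes γ = η = √2, f_u = F_max/(√2)^u and θ_v = vπ/8 from the text, leaves F_max as a parameter, and samples each wavelet on a small square grid centred at the origin. The names `gabor`, `gabor_bank` and `half_size`, and any particular F_max passed in, are illustrative choices.

```python
import cmath
import math

GAMMA = ETA = math.sqrt(2.0)  # ratios between frequency and Gaussian sharpness


def gabor(x, y, u, v, f_max):
    """Complex value of the wavelet phi_{u,v} at pixel offset (x, y)."""
    f = f_max / math.sqrt(2.0) ** u          # central frequency f_u
    theta = v * math.pi / 8.0                # orientation theta_v
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-((f / GAMMA) ** 2 * xr ** 2
                          + (f / ETA) ** 2 * yr ** 2))
    carrier = cmath.exp(2j * math.pi * f * xr)
    return f ** 2 / (math.pi * GAMMA * ETA) * envelope * carrier


def gabor_bank(f_max, half_size):
    """Sample all 5 x 8 wavelets on a (2 * half_size + 1)^2 grid."""
    coords = range(-half_size, half_size + 1)
    return {(u, v): [[gabor(x, y, u, v, f_max) for x in coords]
                     for y in coords]
            for u in range(5) for v in range(8)}
```

For example, `gabor_bank(0.25, 8)` (with 0.25 cycles/pixel chosen purely for illustration) returns a dictionary of 40 kernels of size 17 × 17, keyed by `(u, v)`; convolving an image with each kernel gives the responses of Equation 4.2.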

Figure 4-3 shows the magnitude and real parts of the Gabor representations of a face image at 5 scales and 8 orientations. A series of row vectors $\mathbf{O}^{I}_{u,v}$ can be obtained from $O^{I}_{u,v}(x, y)$ by concatenating its rows or columns; these are then concatenated together to generate a discriminative Gabor feature vector:

\[ G(I) = \mathbf{O} = (\mathbf{O}^{I}_{0,0}\ \mathbf{O}^{I}_{0,1}\ \cdots\ \mathbf{O}^{I}_{4,7}) \tag{4.4} \]

As an example, the convolution results for a single image can amount to 655,360 features.

DownSampling and Kernel Subspace Analysis

Due to the extremely high dimension of the extracted Gabor features, the computational cost associated with learning the subspace projection matrix is very high. Though the feature dimension does not affect the size of the kernel matrix, it does increase the computational cost of the dot product of two data samples. As suggested in (Liu et al., 2002), Gaussian pyramid downsampling is used here for feature dimension reduction. Experiments with varying downsampling rates show that the recognition rate drops drastically when the rate is larger than 64. However, the performance is actually very similar when the downsampling rate is less than 64. Considering both computation cost and system performance, the downsampling rate is set to 16 throughout this work. For the same example, the dimension of the Gabor features can now be reduced to 655,360/16 = 40,960. Details of kernel subspace analysis have been discussed in chapter 3, where PCA and LDA are performed in the high dimensional feature space. By using the kernel technique, the dot product of two data vectors in the mapped feature space can be easily computed from the kernel function. The KPCA and GDA subspaces can thus be learned without knowledge of the mapping function. Due to its wide application in the radial


basis neural networks and Support Vector Machines, the Gaussian kernel is used in this work:

$$k(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{r}\right) \quad (4.5)$$

Figure 4-4 Sample images from the UMIST database

To give some initial idea of the performance of kernel and linear subspace techniques, the ability of PCA, LDA, KPCA and GDA to separate data from different classes is considered first. In order to include non-linear variation within the sample set of face images, the UMIST database (Graham & Allinson, 1998) is used in this test. 128 face samples from 4 people (32 face images per person) are randomly selected. The database covers a range of poses from half profile to frontal views; see Figure 4-4 for samples. Although the number of subjects is small in this example, the variations of the face images, even for the same person, are quite large. Due to the substantial pose variation, the difference between face images of the same person might be larger than that due to subject identity, and thus the classification problem presented here is not a trivial one. The pixel values of the 128 training samples are directly used as features

and analyzed by PCA, LDA, KPCA and GDA respectively. The corresponding subspaces are constructed using the resultant eigenvectors. The samples are then projected onto the first two eigenvectors extracted by each of the four methods. Figure 4-5 shows the distribution of the face samples in these subspaces after projection. In this example, the samples projected by LDA and GDA are well separated. The faces from the same person are projected to the same point by the GDA method. This figure provides an example of the better performance of LDA and GDA over PCA and KPCA. The discrimination ability of GDA is also confirmed in the experiments: GDA performs better than PCA, KPCA and LDA when the FERET and ORL databases are used for testing. The experimental results are presented in detail in the following section.

Figure 4-5 Distribution of face samples in PCA, LDA, KPCA and GDA subspaces
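The projection onto the first two eigenvectors can be illustrated for the linear PCA case as follows. This is only a sketch on synthetic data (4 classes of 32 samples, standing in for the UMIST subset); the function name `pca_project` and the data are assumptions, and the kernel variants and GDA are not covered.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal directions."""
    mean = X.mean(axis=0)
    centred = X - mean
    # Rows of Vt are eigenvectors of the covariance matrix, ordered by
    # decreasing singular value (and hence decreasing eigenvalue).
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    W = Vt[:n_components].T
    return centred @ W, W

rng = np.random.default_rng(0)
# Synthetic stand-in: 4 "people", 32 samples each, 50-dimensional features.
centres = rng.standard_normal((4, 50)) * 5.0
X = np.vstack([c + rng.standard_normal((32, 50)) for c in centres])

Y, W = pca_project(X, n_components=2)
```

Plotting the two columns of `Y` against each other, coloured by class, gives a scatter of the kind shown in the PCA panel of such a figure.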

Figure 4-6 Energy of the eigenvalues in PCA, LDA, KPCA and GDA subspaces

Since the eigenvalues associated with the learned projections (eigenvectors) might give important information on the discriminative ability of the subspace, the energy of the eigenvalues for PCA, LDA, KPCA and GDA is also shown in Figure 4-6. Given a set of eigenvalues $\{\lambda_i\}$, $i = 1, 2, \ldots, m$, the energy for $\lambda_i$ is defined as $e_i = \lambda_i / \sum_{j=1}^{m} \lambda_j$. The maximum dimension of LDA and GDA is C − 1, where C is the number of individuals in the training set. As a result, while the energy of 10 eigenvalues is shown for PCA and KPCA, the energy of only 3 eigenvalues is shown for LDA and GDA. As shown in this figure, the variations of the eigenvalues of PCA and KPCA are quite similar in this example, which explains their similar classification performance. The eigenvalues shown for GDA are defined in equation (A.4) in the appendix; interestingly, the first 3 eigenvalues are exactly the same. It seems that the 3 projections of GDA are equally important in this example.
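A small numerical sketch of the eigenvalue energy may help here. It assumes the energy is each eigenvalue divided by the sum of all eigenvalues (the usual normalisation); the function names are hypothetical and the eigenvalues are made-up numbers, not values from Figure 4-6.

```python
import numpy as np

def eigenvalue_energy(eigenvalues):
    """Energy of each eigenvalue, assumed here to be its share of the total."""
    lam = np.asarray(eigenvalues, dtype=float)
    return lam / lam.sum()

def max_discriminant_dim(n_classes):
    """LDA and GDA yield at most C - 1 projections for C classes."""
    return n_classes - 1

# Made-up spectrum for illustration only.
energy = eigenvalue_energy([4.0, 2.0, 1.0, 1.0])

# Three identical eigenvalues share the energy equally.
equal_energy = eigenvalue_energy([1.0, 1.0, 1.0])

dims_for_four_people = max_discriminant_dim(4)
```

With 4 people in the training set, `max_discriminant_dim` returns 3, which matches the 3 eigenvalues shown for LDA and GDA.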

Distance Measure and Classification

Given a set of training samples $\{x_i,\ i = 1, \ldots, M\}$, a kernel function $k(x, y)$ and a subspace projection matrix $W$ with dimension $M \times L$, $L \ll M$, an $L$ dimensional feature $y$ can be derived from the Gabor feature vector $x$ extracted from a test face image by

$$y = k_x W, \qquad k_x = [\,k(x, x_1)\ \ k(x, x_2)\ \ \cdots\ \ k(x, x_M)\,].$$

As described below, three different distance measures $d_E$, $d_M$, $d_C$ are used in our experiments to calculate the distance between two sample projections $y_1$ and $y_2$:

Euclidean Distance (Eu):

$$d_E(y_1, y_2) = (y_1 - y_2)^{T} (y_1 - y_2) \quad (4.6)$$

Mahalanobis Distance (Ma):

$$d_M(y_1, y_2) = (y_1 - y_2)^{T} \Sigma^{-1} (y_1 - y_2) \quad (4.7)$$

Normalized Correlation (Nc):

$$d_C(y_1, y_2) = -\frac{y_1^{T} y_2}{\|y_1\| \, \|y_2\|} \quad (4.8)$$

where $\Sigma$ is the covariance matrix calculated from the projected training samples, and $\|\cdot\|$ denotes the norm operator. The simple nearest neighbour classifier is used in our experiments for classification, i.e., the person is identified as the closest class to the input image:

$$i^{*} = \arg\min_{1 \le i \le M} d(y, y_i) \quad (4.9)$$

where M is the number of sample projections in the database.

4.2 Experimental Results

The Datasets

The performance of the Gabor feature and kernel subspace based methods is now analyzed using two databases: the Face Recognition Technology (FERET) database

(Phillips et al., 2000) and the Olivetti Research Laboratory (ORL) database (The AT&T Lab Cambridge, 2002). The FERET database is associated with a testing procedure that is intended to evaluate face recognition systems. The facial images were collected in 15 sessions between August 1993 and July 1996. There are 14,126 images from 1,199 individuals included in the FERET database, which is divided into development and sequestered portions for evaluation. Due to the complexity of the Gabor feature based method, only a subset of the FERET database is used for testing in this chapter. However, with the improvements proposed in chapter 7, experimental results on the full FERET database according to the associated evaluation protocol will be given there. The ORL database contains face images taken between April 1992 and April 1994 at the University of Cambridge, UK. There are 400 images from 40 individuals.

The proposed method will first be tested using a subset of the FERET database, where variations in illumination and facial expression are available. Different distance measures for the Gabor + KPCA and Gabor + GDA methods will be evaluated and compared with the linear subspace techniques, i.e. Gabor + PCA and Gabor + LDA. The approach will also be compared with those using raw pixel values as features and with state of the art algorithms in the literature. Following the test on the FERET database, the proposed method will be further evaluated using the ORL database, where face images are captured with varied poses and scales. The performance will also be compared with that of state of the art techniques.

Performance Evaluation Using the FERET Database

600 frontal face images corresponding to 200 subjects are extracted from the FERET database for the experiments. All the subjects are in an upright, frontal position, with tolerance for some tilting and rotation of up to 10 degrees. The 600 face images were acquired under varying illumination conditions and facial expressions. Each subject has

three images with 256 gray levels. The following procedures are applied to normalize the face images prior to the experiments:

1) The centres of the eyes of each image are manually marked.
2) Each image is rotated and scaled to align the centres of the eyes.
3) Each face image is cropped to extract the facial region, and normalized to zero mean and unit variance.

To test the algorithms, two images of each subject are randomly chosen for training, while the remaining one is used for testing. Figure 4-7 shows sample images from the database. The first two rows are example training images while the third row shows example test images. One can see from the figure that all test images contain variations in illumination and expression.

Figure 4-7 Example training images (top 2 rows) and test images (bottom row) of the FERET database

Comparison of Different Distance Measures

Kernel subspace analysis, i.e. KPCA and GDA, is performed on the Gabor feature vector extracted from the original face images for face identification. A Gaussian kernel is used for KPCA and GDA with r = 8e4, which is determined empirically, i.e. the value of r was chosen to maximize the recognition rate. We observe in our experiments that GDA is less sensitive to the value of r than KPCA. Three similarity measures, Eu, Ma and Nc, are tested and compared. As shown in Figure 4-8, normalized correlation achieved the best performance for GDA among the three

distance measures, while the difference between Euclidean distance and Mahalanobis distance is not large. However, Mahalanobis becomes the best distance measure for KPCA, achieving significantly higher recognition rates than the other two measures (see Figure 4-9 for details). Similar results are also observed for the linear subspace projection methods, PCA and LDA. It seems that for expressive features derived in PCA and KPCA space, the Mahalanobis distance measure is more suitable than the others, while for discriminating features extracted by LDA and GDA, the correlation distance measure seems to be the best choice.

Figure 4-8 Performance of Gabor + GDA using different distance measures

Figure 4-9 Performance of Gabor + KPCA with different distance measures
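The three distance measures compared above, together with nearest neighbour classification, can be sketched as below. The sign of the normalized correlation is an assumption: it is negated here so that a smaller value always means a closer match, which lets one arg-min rule serve all three measures. The gallery vectors, labels and function names are made up for illustration.

```python
import numpy as np

def d_euclidean(y1, y2):
    diff = y1 - y2
    return float(diff @ diff)

def d_mahalanobis(y1, y2, cov_inv):
    diff = y1 - y2
    return float(diff @ cov_inv @ diff)

def d_correlation(y1, y2):
    # Negated cosine so that smaller means more similar (assumed convention).
    return float(-(y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2)))

def nearest_neighbour(y, gallery, labels, dist):
    """Return the label of the gallery projection closest to y."""
    distances = [dist(y, g) for g in gallery]
    return labels[int(np.argmin(distances))]

# Toy projected training samples (illustrative values only).
gallery = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
labels = ["a", "b", "c"]
cov_inv = np.linalg.inv(np.cov(gallery, rowvar=False))
probe = np.array([0.1, 1.5])

by_eu = nearest_neighbour(probe, gallery, labels, d_euclidean)
by_ma = nearest_neighbour(probe, gallery, labels,
                          lambda p, q: d_mahalanobis(p, q, cov_inv))
by_nc = nearest_neighbour(probe, gallery, labels, d_correlation)
```

In practice the covariance matrix would be estimated from all projected training samples rather than from three toy vectors.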

Comparison with Linear Subspace Methods

The comparative results of PCA, LDA, KPCA and GDA on the Gabor feature vector, each with its optimized distance measure, are shown in Figure 4-10. One can see from the figure that the nonlinear subspace methods generally perform better than their corresponding linear approaches, i.e., KPCA performs better than PCA and GDA performs better than LDA. GDA performs the best among these four algorithms, followed by LDA, which performs better than KPCA and PCA. The results match well with the data separation test presented earlier in this chapter. A recognition rate as high as 97.5% is achieved for the novel Gabor + GDA approach when the number of components is set to 35. When the number of components exceeds 90, the accuracy of PCA and KPCA converges to around 80%, and there is no overlap between GDA and PCA or KPCA.

Figure 4-10 Experimental results of PCA, LDA, KPCA and GDA using Gabor features

Comparison with Raw Pixel Features

To emphasize the discriminating power of the extracted Gabor feature vector, the comparative performance of PCA, Gabor + PCA, GDA and Gabor + GDA is also shown in Figure 4-11. When the Gabor feature vector is not used, the pixel values of

face images are simply concatenated into a feature vector. For example, the length of a raw pixel feature vector will be 128 × 128 = 16,384 for an image of size 128 × 128. One can see that the adoption of the Gabor feature vector improves the performance of PCA and GDA by a large margin. The Gabor + PCA method achieves 20% higher accuracy than PCA, while a 6% improvement is observed for GDA when Gabor wavelets are applied. An improvement for Gabor + LDA and Gabor + KPCA has also been observed in the experiments. Please note that the performance of GDA does not always improve as the dimension increases. As the small (trailing) eigenvalues tend to capture noise, GDA achieves its maximum performance at dimension 35.

Figure 4-11 Performance improvement of PCA and GDA using Gabor features

Comparison with Other Methods

For further comparison of Gabor feature and GDA based methods with other approaches, the results on the same database for Radial Basis Function (RBF) neural network and HMM (Nefian et al., 1999; Bai et al., 2003a) based methods are shown in Table 4-1. Raw pixel features are used for the RBF based methods, i.e. the normalized pixel values of the image are input directly to the network for personal identity determination. The two layers of the RBF network and the HMMs are trained using the same training set,

with parameters optimized for best performance. The form of the neural network input layer is actually the Gaussian basis function, which is the same as the Gaussian kernel function. To give the RBF network the same structure as kernel subspace analysis, which takes the inner product of the input data with all of the training samples, the network is designed with 400 nodes for the input layer and 200 nodes for the output layer. While DCT-HMM uses DCT (Discrete Cosine Transform) coefficients for observation vector extraction, DWT-HMM adopts DWT (Discrete Wavelet Transform) for more robust feature extraction. As shown in Table 4-1, Gabor + GDA performs significantly better than the other methods.

Method         Recognition Rate
RBF Network    75%
DCT-HMM        32.5%
DWT-HMM        44.5%
GDA            90%
Gabor + GDA    97.5%

Table 4-1 Comparative results of Gabor + GDA with other methods on part of the FERET database

Performance Evaluation Using the ORL Database

The ORL database contains 400 images from 40 subjects. All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to 20 degrees. The variation in scale is up to about 10%. Figure 4-12 shows example training images and test images for 2 people. Each image is resized and normalized to zero mean and unit variance. Both hair and forehead are included in the face images, and the poses vary from left to right and from up to down. To evaluate the algorithms, 5 images of each person are randomly chosen for training while the remaining 5 are used for testing. PCA, LDA, KPCA and GDA are first performed on the original images for identification. As shown in the last section, the Mahalanobis distance measure is used for

PCA and KPCA, while the correlation distance measure is adopted for LDA and GDA. The results are tabulated in Table 4-2. As shown in this table, the performance of LDA deteriorates when the variation in pose increases the intra person variance significantly, since it then becomes very difficult to find a projection space in which the within class variance is minimized. However, once the data is projected to the high dimensional feature space, GDA is still able to find the desired projection matrix. As a result, both PCA and KPCA achieve better performance than LDA, while GDA is still the best method for recognition.

Figure 4-12 Example training (a), (c) and test images (b), (d) in the ORL database

Method    Recognition Rate
PCA       92.0%
LDA       85.0%
KPCA      91.5%
GDA       96.5%

Table 4-2 Experimental results of PCA, LDA, KPCA and GDA on the ORL database
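The evaluation setup behind these ORL results (zero mean, unit variance normalisation of each image, then a random split of 5 training and 5 test images per person) can be sketched as follows. The function names are hypothetical and random arrays stand in for the face images, whose size is chosen arbitrarily here.

```python
import numpy as np

def normalize_image(image):
    """Shift and scale pixel values to zero mean and unit variance."""
    image = np.asarray(image, dtype=float)
    std = image.std()
    if std == 0:
        raise ValueError("constant image cannot be scaled to unit variance")
    return (image - image.mean()) / std

def split_per_subject(labels, n_train, rng):
    """Randomly pick n_train samples per subject for training, the rest for testing."""
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for subject in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == subject))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(40), 10)      # 40 subjects, 10 images each
images = rng.random((400, 16, 16))         # placeholder images (size is arbitrary)
normalized = np.array([normalize_image(img) for img in images])
train_idx, test_idx = split_per_subject(labels, n_train=5, rng=rng)
```

Repeating the split with different seeds would give an idea of how much the reported rates depend on the particular random partition.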

In the next series of experiments, PCA, LDA, KPCA and GDA are applied to the Gabor features extracted from the images, and the results are shown in Table 4-3. The performance of LDA is greatly improved and it now achieves better performance than PCA and KPCA, which shows the robustness of Gabor features against variation of pose. The novel Gabor + GDA method achieves 100% accuracy when only 35 components are used, which is so far the best reported result in the literature on the ORL database; see Table 4-4 for the results of other methods. Since the number of subjects in the ORL database is much smaller than in the FERET database, all algorithms achieve much better performance. The results in Table 4-4 are taken directly from the original papers, where the same testing strategy is used, i.e. half of the images are used for training and the remaining images are used for testing.

Method          Recognition Rate
Gabor + PCA     98.5%
Gabor + LDA     99.0%
Gabor + KPCA    98.5%
Gabor + GDA     100.0%

Table 4-3 Performance improvements using Gabor features on the ORL database

Method                          Recognition Rate
RBF (Er et al., 2002)           98.08%
DCT-HMM (Bai et al., 2003a)     97.50%
DWT-HMM (Bai et al., 2003a)     98.50%

Table 4-4 Results of other methods on the ORL database

4.3 Conclusions

A Gabor feature and kernel subspace analysis based face identification method has been presented in this chapter. Gabor wavelets are used to extract features from the face images, which are then further analyzed by kernel subspace methods such as KPCA and GDA in order to achieve a highly discriminative feature for recognition. Two databases, FERET and ORL, have been used to test the proposed algorithms. While the face images extracted from the FERET database were acquired under variable

illumination and expressions, the samples in the ORL database represent variations in pose and scale. The results show that kernel methods achieve better performance than their corresponding linear methods. By testing PCA, LDA, KPCA and GDA using the pixel features and the extracted Gabor feature vector respectively, the results show that the Gabor feature vector extracted from the filtered images yields a significantly more discriminative representation of the face than the original image. Comparison among different state of the art techniques shows that the Gabor + GDA method performs considerably better on both the FERET and ORL databases. Accuracies as high as 97.5% and 100% have been observed on the two databases. By mapping the input features to a high dimensional nonlinear feature space, GDA can not only greatly reduce the feature dimension, but also increase the discrimination power of the extracted features. Encompassing different scale, locality and orientation information, the proposed Gabor + GDA method has been shown to be very robust against variations of illumination, expression, pose and scale.
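The kernel projection step at the heart of this method can be sketched in a few lines. The sketch assumes the Gaussian kernel has the form exp(−‖x − y‖²/r) and that a feature is obtained as y = k_x W, where k_x holds the kernel values between the input and every training sample. The projection matrix below is a random placeholder; a real W would be learned by KPCA or GDA, which is not shown. All names and sizes are illustrative.

```python
import numpy as np

def gaussian_kernel(x, y, r):
    """Gaussian kernel, assumed form exp(-||x - y||^2 / r)."""
    diff = x - y
    return float(np.exp(-(diff @ diff) / r))

def kernel_vector(x, train, r):
    """k_x = [k(x, x_1), ..., k(x, x_M)] for the M training samples."""
    return np.array([gaussian_kernel(x, xi, r) for xi in train])

def project(x, train, W, r):
    """L-dimensional feature y = k_x W."""
    return kernel_vector(x, train, r) @ W

rng = np.random.default_rng(0)
M, D, L = 6, 10, 2                       # toy sizes: samples, input dim, output dim
train = rng.standard_normal((M, D))
W = rng.standard_normal((M, L))          # placeholder; learned by KPCA/GDA in practice
r = 8.0                                  # toy kernel width, not the value used in the thesis

k_first = kernel_vector(train[0], train, r)
y = project(train[0], train, W, r)
```

Because only kernel values enter the projection, the mapping into the high dimensional feature space never has to be computed explicitly.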

Chapter 5 Generalized Discriminant Analysis of Gabor Features for Face Verification

Whilst face identification aims to identify the personal ID of an input image, verification attempts to verify a claimed ID associated with a facial image. As a result, while an identification system needs to compare its input with each person in the database, a verification system attempts to match an input image with the claimed identity only. Based on the matching result, the system either accepts or rejects the claimed ID. Applications of face verification can be found in passport control, E-business, personal authentication and many additional areas. Following the successful application of Gabor features and GDA to face identification, this chapter presents a face verification system using the same technology. Robust Gabor features are first extracted from different face images, projected to the trained GDA subspace, and matched using the normalized correlation distance measure. The system is fully tested using the BANCA database according to the evaluation protocols of the recent Face Verification Competition 2004. The results are therefore directly comparable with those of other participants.

5.1 Face Verification Competition 2004 and the BANCA Database

The Competition

With a large number of face recognition algorithms available in the literature, direct comparison between them is very difficult since tests are normally performed on different data sets. When images are captured with varying sensors, viewing conditions, illumination and backgrounds, it is unclear which method is the best. A standard test set with evaluation protocols could help alleviate this problem. In August 2004, a face verification competition was organized by the University of Surrey, UK. The contest was held in conjunction with the 17th International Conference on Pattern Recognition. 13 verification algorithms from 11 academic and commercial institutions around the world participated in the competition, and the results are reported in (Messer et al., 2004). Different verification systems are first tested using face images normalized with manually located eye centres, and then assessed using their own automatic normalization methods. To make this work directly comparable with other participants, the verification methods presented in this chapter are fully tested using exactly the same database and protocol as required by the contest.

The Database

Several data sets have been made available in the literature over the past few years. While the FERET database (Phillips et al., 2000) defines a protocol for face identification evaluation, the XM2VTS database (Messer et al., 1999) can be used to test different face verification systems. The XM2VTS database, together with the Lausanne protocol, contains 295 subjects captured over 4 sessions. The data was recorded in a controlled environment, which makes it unrealistic compared to real world situations, such as when one makes a transaction at home through a consumer web cam or through an ATM in a potentially very wide variety of surroundings. As a result, the BANCA database with

associated protocols (Bailliere et al., 2003) has been proposed to make the evaluation as realistic as possible when real world factors are taken into consideration. The BANCA database consists of images from 52 subjects captured in 12 sessions. 10 face images are captured for each person in each session. The 12 sessions are composed of 3 different scenarios: 1) Controlled scenario for sessions 1-4, 2) Degraded scenario for sessions 5-8, 3) Adverse scenario for sessions 9-12. A web cam was used in the degraded scenario and a high quality camera was used in the controlled and adverse scenarios. Images are captured with normal pose in the controlled and degraded scenarios, whilst a head down pose is required in the adverse scenario. Figure 5-1 shows some sample images captured in different scenarios from this database. All of the images are colour images. Images captured in the controlled, degraded and adverse scenarios are shown on the first, second and third rows respectively.

Figure 5-1 Example images in the BANCA database

Test Protocols

Seven test protocols, which identify different training and test images, are defined in (Bailliere et al., 2003) to evaluate verification algorithms. Of these protocols, protocol P is the most difficult and challenging one. The protocol specifies the partitioning of the database into two disjoint sets: a development set (26 subjects) and an evaluation set (26 subjects). For each set, 5 images from each person captured in the 1st session (controlled scenario) are used as training images, while 2730 selected images captured in all three scenarios are used for testing. There is no overlap between the training images and test images. Of the test images, 1170 images are claimed with the true identity (client access) to test false rejection (FR), while the other 1560 images are claimed with a false identity (impostor access) to test false acceptance (FA). Each set thus consists of 130 training images, with the test data consisting of 1170 client accesses and 1560 impostor accesses (Bailliere et al., 2003).

The performance of verification systems is normally assessed by the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). These two measures are directly related, i.e. decreasing the number of false rejections will increase the false acceptance rate. The point at which FAR = FRR is known as the Equal Error Rate (EER). The lower the value of the EER, the more reliable the system. While the EER measures system performance when FAR and FRR are equally important, the Weighted Error Rate (WER) is defined for weighted FAR and FRR as below:

$$WER = \frac{FRR + R \cdot FAR}{1 + R} \quad (5.1)$$

where $R = C_{FA} / C_{FR}$ defines the cost ratio between FAR and FRR. 3 distinct cases can be defined to assess verification systems:

R = 0.1, FA is an order of magnitude less costly than FR

R = 1, FA and FR are equally costly
R = 10, FA is an order of magnitude more harmful than FR

Obviously, EER is a special case where FA and FR are equally harmful. In order to meet the requirements of the contest, the results of Gabor + GDA for the 3 cases are reported in this chapter.

5.2 The Methodology

System Architecture

Figure 5-2 shows the flow chart of the described approach using Gabor features and GDA analysis for face verification. The GDA subspace, represented by the projection matrix W, is first learned from the Gabor features extracted from a set of training images. The registered facial images of each person are then projected to the GDA subspace and the projection coefficients are saved as templates in the database. To verify a claimed personal ID, the same process is applied to a given input image and the projection is compared with the stored projections of the person to be verified (the claimed ID). A decision can be made by a simple thresholding strategy, i.e., if the similarity is above or equal to the given threshold, the claim is accepted; otherwise it is rejected.

Figure 5-2 System architecture (blocks: Gabor feature extraction, downsampling, GDA subspace projection using the projection matrix, retrieval of stored features from the database by claimed ID, decision rule accepting or rejecting the claim)
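The error rates and the thresholding rule described above can be made concrete with a short sketch. It assumes that scores are similarities (a claim is accepted when the score is at or above the threshold) and that R = C_FA / C_FR weights the false acceptance term of the WER, as in the BANCA protocol. The scores and function names are made up for illustration.

```python
import numpy as np

def far_frr(client_scores, impostor_scores, threshold):
    """False acceptance and false rejection rates at a given threshold."""
    client = np.asarray(client_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    frr = float(np.mean(client < threshold))       # true claims that are rejected
    far = float(np.mean(impostor >= threshold))    # false claims that are accepted
    return far, frr

def wer(far, frr, R):
    """Weighted error rate; R > 1 penalises false acceptances more heavily."""
    return (frr + R * far) / (1.0 + R)

# Made-up similarity scores for client and impostor accesses.
client_scores = [0.9, 0.8, 0.4, 0.7]
impostor_scores = [0.1, 0.5, 0.3, 0.2, 0.6]

far, frr = far_frr(client_scores, impostor_scores, threshold=0.5)
wer_by_ratio = {R: wer(far, frr, R) for R in (0.1, 1.0, 10.0)}
```

With R = 1 the WER reduces to the plain average of FAR and FRR, which is why the equal cost case is closely related to the EER.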

Similarity Measure and Threshold Determination

Based on the evidence from the extensive experiments on face identification, the robustness of the Gabor feature and GDA based methods was fully demonstrated in the previous chapter. That work also shows that the Mahalanobis distance measure should be used for expressive features such as those of PCA and KPCA, while the correlation distance measure is more appropriate for discriminative features derived by LDA and GDA. The work presented here follows this strategy. Since there might be a number of projections registered for a person, the matching score, or confidence $C_i$ that an input image projection $y$ belongs to subject $i$, is defined as below:

$$C_i = \frac{1}{N_i} \sum_{j=1}^{N_i} d(y, y_{ij}) \quad (5.2)$$

where $N_i$ is the number of projections $y_{ij}$ registered for person $i$, and $d(y, y_{ij})$ is a distance measure such as the Mahalanobis or correlation measure (see chapter 4 for details).

To decide whether the claim is accepted or rejected, a simple thresholding scheme can be used. While varying thresholds can be set for different people, a simpler approach is to use a global threshold for all of the subjects. A separate training set, or development set, can be used to determine the value of the threshold(s). Thereafter, the performance of the system can be tested using a different test set. Whilst subject specific thresholds can achieve smaller error rates on the training or development sets, they might easily be over tuned to the training set; as a result, the simpler global threshold scheme is used throughout this work.
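A sketch of the matching score in (5.2) and of choosing one global threshold on a development set is given below. Two assumptions are made: the measure is treated as a similarity (normalized correlation, larger means closer), and the threshold is picked where FAR and FRR are closest, which is only one plausible selection criterion. Names and numbers are illustrative.

```python
import numpy as np

def correlation(y1, y2):
    """Normalized correlation used as a similarity (larger means closer)."""
    return float((y1 @ y2) / (np.linalg.norm(y1) * np.linalg.norm(y2)))

def confidence(y, templates, measure):
    """Average of the measure between y and all templates registered for one person."""
    return float(np.mean([measure(y, t) for t in templates]))

def pick_global_threshold(client_scores, impostor_scores):
    """Threshold at which FAR and FRR are closest on the given scores."""
    client = np.asarray(client_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_threshold, best_gap = None, np.inf
    for t in np.unique(np.concatenate([client, impostor])):
        far = np.mean(impostor >= t)
        frr = np.mean(client < t)
        gap = abs(far - frr)
        if gap < best_gap:
            best_threshold, best_gap = float(t), gap
    return best_threshold

templates = np.array([[1.0, 0.0], [1.0, 1.0]])   # toy registered projections
score = confidence(np.array([1.0, 0.0]), templates, correlation)

# Toy development-set scores used to fix a single threshold for all subjects.
threshold = pick_global_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.5])
accepted = score >= threshold
```

A subject specific scheme would call `pick_global_threshold` once per person on that person's scores, at the cost of far fewer scores per estimate.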

5.3 Experimental Results

The Dataset

To make the results of our method directly comparable with other methods in the competition, the BANCA database is used for testing in the experiments. Similar to the procedure used in chapter 4, all of the images used in the experiments are normalized semi-automatically. To achieve spatial normalization, face images are rotated, translated and scaled according to the position of the eyes. The images are cropped to a standard size and rotated so that the eyes are placed at fixed points. To reduce illumination variations, all of the images are first histogram equalized and then shifted and scaled such that the mean of all pixels equals zero and the standard deviation equals one. While the results in this chapter are reported on the manually normalized images, results for the fully automatic verification system will be given in chapter 7. Figure 5-3 shows some normalized face images of three subjects acquired in different sessions: the controlled, degraded and adverse scenarios are shown on the first, second and third rows respectively.

Figure 5-3 Normalized face images

Results on the Development Set

As defined in the protocol, a development set with 130 training images and 2730 test images from 26 subjects is first used to test the system. All the parameters of the system, e.g., the subspace dimension, the RBF kernel parameter and the decision threshold, are optimized to

maximize its performance on the development set. The results for Gabor + GDA are listed in Table 5-1, together with the baseline approach, Gabor + PCA. PCA is chosen as the baseline because LDA does not perform well when the training images are not representative, which is the case here since most of the test images are captured under distinct scenarios. Whilst PCA uses the Ma distance measure, the Nc distance measure is adopted for the GDA method. An RBF kernel with r = 9e4 is found to achieve the best results. The ROC curves for the two methods using the development set are also shown in Figure 5-4. It can be seen from this figure that the Gabor + GDA method performs the best, with a 5.96% EER (see Table 5-1). As described before, a global threshold is used for the acceptance or rejection decision.

Method         Kernel          Threshold    FAR    FRR    EER
Gabor + PCA    N/A
Gabor + GDA    RBF (r=9e4)

Table 5-1 Verification performance on the development set

Figure 5-4 ROC curves on the development set

Results on the Evaluation Set

An independent evaluation set was designed in protocol P to test the generalization ability of the verification algorithms. The evaluation set consists of the same number of

subjects and images as that of the development set. However, the subjects of the evaluation set are distinct from those in the development set. With parameters adjusted and performance optimized using the development set, the generalization ability of the algorithms can be further analyzed using the evaluation set. The EERs of the Gabor + PCA and Gabor + GDA methods on the evaluation set are tabulated in Table 5-2. All of these results use parameters tuned on the development set in the first series of experiments, i.e., the decision threshold was fixed during the development phase. Again, the Gabor + GDA method achieves a lower EER than Gabor + PCA. However, the advantage of GDA over PCA is not large in this test, which might be caused by the small size of the training set and the significant difference between the face images in the training set and the test set.

Method         Threshold    FAR    FRR    EER
Gabor + PCA
Gabor + GDA

Table 5-2 Verification performance on the evaluation set

Comparison with Other Methods

Having analyzed the performance of the Gabor + GDA approach using the EER, it is now compared with all of the participants in FVC2004. Table 5-3 lists these results. Using the definition given in the last section, the performance is now assessed using the WER with 3 different values of R. Please note that the entry for Univ Nottingham is the other method developed by us, which uses Gabor wavelets for feature extraction, PCA for feature dimension reduction and a Support Vector Machine (SVM) for classification. Since an executable file was required, this second method was developed simply because there was insufficient time to convert the Gabor + GDA method into a C implementation. Subject specific SVMs and thresholds are learned for each person. Once the parameters are optimized using the development set to achieve

the lowest possible WER, the same parameters can then be used when reporting the WER on the evaluation face image set. A number of different technologies were involved in this competition, e.g., PCA and LDA for feature extraction and dimension reduction, Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM) for probability based classification, and Nearest Neighbour (NN) and SVM for distance based classification. The IDIAP Fusion system is composed of three classification subsystems, i.e. DCT + HMM, DCT + GMM and LDA + Multi-layer Perceptron (MLP). The matching score of the UCL-Fusion system is a weighted score of LDA + correlation distance and SVM. A more detailed description of the different approaches can be found in (Messer et al., 2004).

                                        R=0.1 (WER)     R=1 (WER)       R=10 (WER)
Method                                  Dev.    Eval.   Dev.    Eval.   Dev.    Eval.   Avg
IDIAP HMM
IDIAP Fusion
QUT
UPV
Univ Nottingham (Gabor + PCA + SVM + subject specific thresholds)
National Taiwan Univ
UniS
UCL-LDA
UCL-Fusion
NeuroInformatik
Tsinghua Univ
CMU
Gabor + GDA + global threshold

Table 5-3 Verification results for partially automatic systems

The results for the proposed Gabor + GDA method have been appended to the bottom of the table. The comparison shows that the two methods developed by us are among the top three approaches. The performance of our methods has been shown to be significantly better than that of the other participants, except for the Tsinghua University system,

which combines several classifiers for additional performance enhancement. Please note that the entry NeuroInformatik is based on the well known Elastic Bunch Graph Matching method (Wiskott et al., 1997), which extracts Gabor jets at manually defined feature points for recognition. Whilst their method achieves the top performance in the FERET evaluation (Phillips et al., 2000), it has been shown not to reach a high level of accuracy within the context of the FVC2004 competition. As specified in their description, their method may be more suitable for large and high quality images. Due to the adoption of subject specific thresholds, the method using Gabor + PCA + SVM achieves lower error rates than the Gabor + GDA approach presented in this chapter. However, methods using subject specific thresholds are more sensitive to the overfitting problem.

5.4 Conclusions

Following the successful application of Gabor + GDA methods to face identification, the same approach has also been used to solve the face verification problem. With very minor modifications, the system has proved to work well for verification applications. The system is fully tested using the BANCA database, which consists of images taken under uncontrolled environmental conditions. As a result, the test mirrors conditions found in real world application environments. By using the same database and protocol as FVC2004, the results presented here are directly comparable with those of participants from all over the world. The comparison with other state of the art technologies shows that the work presented here is one of the most accurate and robust systems currently under development. With the exception of one institute, the method developed by us performs significantly better than the other approaches. The results demonstrate the robustness of the proposed Gabor feature and GDA subspace: the extracted features have been shown to be robust against variance of pose, illumination

and camera. Whilst the Tsinghua Univ system combines several classifiers for performance enhancement, the second method developed by us uses subject specific thresholds. The performance of the Gabor + GDA method could therefore be further improved by fusing additional features and replacing global thresholds with subject specific ones. However, subject specific thresholds may cause the system to be over-tuned to the available data, so that a different data set may cause the system performance to drop dramatically, whereas more generalised methods should handle the change in data more gracefully.

Chapter 6 Optimising Gabor Features for Object Detection and Recognition

As shown in the previous chapters, the Gabor + GDA method has been successfully applied to both face identification and verification problems. The proposed approach was fully tested using the FERET and BANCA databases and excellent performance has been observed. However, since a set of 40 Gabor wavelets is used to extract features, both computation and memory costs for this method are very high. The costs are mainly caused by the following processes: 1) the convolution of the image with 40 wavelets. Though FFT and IFFT can be used to speed up the process, the 40 convolution operations for a 128 × 128 image using a P4 1.8GHz PC still take about 2 seconds; 2) the huge dimension of the extracted features, i.e. 128 × 128 × 40 = 655,360 for a 128 × 128 image, brings a large memory and computation burden to the classification algorithm. A feature selection method, capable of reducing both the number of convolutions and the feature dimension, is required to solve such problems.

In this chapter, feature selection schemes such as the AdaBoost algorithm will be applied for Gabor feature selection. The approach presented here aims to apply the optimal Gabor wavelets at the most appropriate locations for feature extraction. To reduce the redundancy among AdaBoost selected features, a novel boosting based feature selection algorithm, MutualBoost, is also proposed.
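The cost figures quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original implementation: the function name `gabor_feature_cost` is hypothetical, and the storage size of 4 bytes per feature value is an assumption made only to give a feel for the memory burden.

```python
def gabor_feature_cost(width, height, n_wavelets=40, bytes_per_value=4):
    """Return (feature dimension, bytes per image) for a full Gabor feature set.

    One feature is produced per pixel and per wavelet, so the dimension is
    width * height * n_wavelets. bytes_per_value=4 is an assumed storage size.
    """
    n_features = width * height * n_wavelets
    return n_features, n_features * bytes_per_value


n_features, n_bytes = gabor_feature_cost(128, 128)
print(n_features)        # 655360
print(n_bytes / 2**20)   # 2.5 (MiB per image under the 4-byte assumption)
```

Under these assumptions a single image already occupies 2.5 MiB of features, which illustrates why a subspace method trained on hundreds of images becomes expensive.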

6.1 AdaBoost Feature Selection and Classifier Learning

The AdaBoost algorithm is based on the idea that a strong classifier can be created by linearly combining a number of weak classifiers (Freund et al., 1999). For image related problems, a weak classifier could be a very simple threshold function $h_j$ consisting of only one simple feature $f_j(I)$ extracted from the image $I$:

$$ h_j = \begin{cases} 1 & \text{if } p_j f_j(I) < p_j \lambda_j \\ -1 & \text{otherwise} \end{cases} \tag{6.1} $$

where $\lambda_j$ is a threshold and $p_j$ is a parity to indicate the direction of the inequality. The feature could be one of the simple Haar-like features described in (Lienhart et al., 2002), i.e., the linear combination of the sums of pixel values of neighbouring rectangles. Various features thus differ in any of the following rectangle parameters: location, width, height, and orientation $\alpha \in \{0^\circ, 45^\circ\}$. According to the structure of the neighbouring rectangles, the features can be classified into 14 prototypes, i.e., four edge features, eight line features, and two centre-surround features. As shown in Figure 6-1, if one denotes the black and white rectangles as $r_1, r_2$ and the sum of pixels of a rectangle as $S(r)$, a Haar-like feature given any rectangle structure in an image $I$ can be denoted as $f_j(I) = w_1 S(r_1) + w_2 S(r_2)$, where the weights $w_1, w_2 \in \mathbb{R}$.

Figure 6-1 Prototypes of simple Haar-like features (Lienhart et al., 2002)
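The threshold function of Eq. (6.1) and the two-rectangle feature can be sketched in a few lines. This is an illustrative sketch only: the names `Stump` and `haar_feature` are hypothetical, and the rectangle sums are assumed to have been computed beforehand (for example from an integral image).

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Threshold weak classifier h_j of Eq. (6.1)."""
    parity: int       # p_j, either +1 or -1
    threshold: float  # lambda_j

    def predict(self, feature_value: float) -> int:
        # h_j = 1 if p_j * f_j(I) < p_j * lambda_j, otherwise -1
        if self.parity * feature_value < self.parity * self.threshold:
            return 1
        return -1


def haar_feature(sum_r1: float, sum_r2: float, w1: float, w2: float) -> float:
    """f_j(I) = w1 * S(r1) + w2 * S(r2), given precomputed rectangle sums."""
    return w1 * sum_r1 + w2 * sum_r2
```

With parity +1 the stump fires for feature values below the threshold; with parity -1 the inequality is reversed, so one feature yields two candidate weak classifiers.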

Details of the algorithm (also see chapter 2) are as follows: T weak classifiers are selected to form the final strong classifier over T rounds. In each iteration, the space of all possible classifiers is searched exhaustively to find the weak classifier with the lowest weighted classification error. The error is then used to update the weights such that the wrongly classified samples get their weights increased. The resulting strong classifier is a weighted linear combination of all T selected weak classifiers. Since each weak classifier uses a different feature, the T most important features have also been selected. Note that the AdaBoost algorithm is used here to address two class problems and weak classifiers with discrete output only. See AdaBoost.M1 and AdaBoost.MH (Freund et al., 1999) for solutions to the multi-class problem and RealBoost (Schapire & Singer, 1999) for boosting weak classifiers with real valued output.

6.2 The Proposed MutualBoost Algorithm

As described in the previous section, the AdaBoost algorithm selects weak classifiers and adjusts sample weights based on the classification error. The motivation behind the weight adjustments is to change the distribution of samples such that the weak classifier selected at the current round T is uncorrelated with the class label in the next round T+1 (Freund et al., 1999; Aslam, 2000). Intuitively, the learner is thus forced to learn something new in the next round T+1. However, a correlation between the class label and a certain weak classifier selected at round t, 0 < t < T, might still exist. In this case, the weak classifier selected at round T+1 could be similar to the one selected at round t. As a result, many features selected by the AdaBoost algorithm might be similar (Li & Zhang, 2004). The proposed boosting algorithm incorporates the idea of Mutual Information (MI) to eliminate those non-effective weak classifiers. Before a new weak classifier is added,

the MI between the new classifier and each of the selected ones is examined to make sure that the information carried by the new classifier has not been captured before. Given stage T+1, where T weak classifiers $\{h_{v(1)}, h_{v(2)}, \ldots, h_{v(T)}\}$ have been selected, the function to measure the maximum MI $R(h_j)$ between a candidate classifier $h_j$ and the selected classifiers can be defined as follows:

$$ R(h_j) = \max_{t} I(h_j, h_{v(t)}), \quad t = 1, 2, \ldots, T \tag{6.2} $$

Each weak classifier $h_j : \mathbb{R}^N \to \{-1, 1\}$ is now considered as a random variable (r.v.). The estimation of MI between two r.v., e.g. $h_j$ and $h_{v(t)}$, requires information about the marginal distributions $p(h_j)$, $p(h_{v(t)})$ and the joint probability distribution $p(h_j, h_{v(t)})$, which could be approximated by histogram estimation. However, it is very difficult to determine the ideal number of histogram bins. Though a Gaussian distribution could be applied as well, many of the features might not show Gaussianity. To reduce the complexity and computation cost of the feature selection process, we hereby focus on random variables with binary values only, i.e., $h_j \in \{-1, 1\}$, $h_{v(t)} \in \{-1, 1\}$. For binary r.v., the probability can be estimated by simply counting the number of occurrences of each possible case and dividing that number by the total number of training samples. For example, the possible cases will be $\{(-1,-1), (-1,1), (1,-1), (1,1)\}$ for the joint probability $p(h_j, h_{v(t)})$ of two binary r.v.

The value of $R(h_j)$ can be directly used to decide whether the new classifier is redundant or not. The value is compared with a pre-defined Threshold Mutual Information (TMI) value; if it is bigger than the TMI, we can deduce that the information carried by the classifier has already been captured. Besides MI, the classification error of the weak classifier is also taken into consideration, i.e., only those classifiers with small errors

are selected. The classifiers (features) thus selected might be both accurate and non-redundant. Details of the algorithm are listed in Figure 6-2 below.

Given M training samples $(x_i, y_i)$, $i = 1, 2, \ldots, M$
Initialization: weights $w_1(i) = 1/M$
For t = 1, ..., T
  1) Train weak learners using distribution $w_t$
  2) For each candidate weak classifier $h_j$, calculate the classification error $\varepsilon_j = \sum_{i:\, h_j(x_i) \neq y_i} w_t(i)$
     For (;;)
       Choose $h_u$ with the lowest error $\varepsilon_u$ from the candidate classifiers
       Calculate the max MI $R(h_u)$ according to Eq. (6.2)
       If $R(h_u) < TMI$
         The classifier is found: $h_t = h_u$, $\varepsilon_t = \varepsilon_u$; go to 3)
       Else
         Remove $h_u$ from the candidate list
       End If
     End Loop
  3) Calculate $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$
  4) Update weights: $w_{t+1}(i) = \frac{w_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
Final strong classifier: $H(x) = \mathrm{sgn}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$

Figure 6-2 The proposed MutualBoost Algorithm

6.3 Application to Object Detection

Classification based object detection methods normally scan the image with a small window and use a trained classifier to decide whether the window being processed contains the object or not. As described in Section 6.1, the AdaBoost algorithm has been successfully applied to select and learn Haar-like feature based classifiers for object detection. In such a system, each weak classifier is designed to make a prediction using a single Haar feature extracted from image $I$, i.e. $f_j(I) = w_1 S(r_1) + w_2 S(r_2)$. In the context of Gabor feature selection, $f_j(I)$ is simply the convolution result of the input image with a

certain Gabor wavelet at location $(x, y)$. Given an image of size $W \times H$ and a bank of $U \times V$ Gabor wavelets $\{\varphi_{u,v}(x, y),\ u = 0, \ldots, U-1,\ v = 0, \ldots, V-1\}$, a set of $N = W \times H \times U \times V$ Gabor features at different locations, frequencies and orientations can be extracted as below:

$$ f_j(I) = \left( G(I) \right)_j, \quad j = 1, 2, \ldots, N \tag{6.3} $$

where $G(I)$ is the Gabor feature vector extracted from image $I$ using the set of Gabor wavelets, i.e. $G(I) = (O^I_{0,0}\ O^I_{0,1}\ \cdots\ O^I_{u,v}\ \cdots\ O^I_{U-1,V-1})$. The row vector $O^I_{u,v}$ is generated by concatenating the convolution result of image $I$ with the wavelet $\varphi_{u,v}(x, y)$; see chapter 4 for details.

Each weak classifier is now trained to use a single feature from the complete Gabor feature set for classification. When these classifiers are combined, a much better performance can be achieved than that of a single classifier. Ranked by their importance to classification accuracy, essential Gabor features with appropriate frequencies and orientations are selected at different image locations by the AdaBoost algorithm. Once those discriminative Gabor features are selected, they can also be input to more complex classifiers, e.g. a Support Vector Machine (SVM), for classification. The method will be applied to classify face/non-face and car/non-car images in the experiments and compared with Haar-like feature based approaches. The Gabor feature based classifier can be further developed into a fast object detection system using a cascade structure as described in (Lienhart et al., 2002; Viola et al., 2001).

6.4 Application to Face Recognition

Since both AdaBoost and the proposed MutualBoost address two class problems only, the multi-class face recognition problem has to be reformulated to make the algorithms applicable. The Gabor feature difference space is adopted in this

work such that a set of training samples can be generated in the two class space. Once the set of samples and weak classifiers are available, AdaBoost and MutualBoost can be applied directly for Gabor feature selection.

The Gabor Feature Difference Space

Since the feature selection presented here focuses on two class problems only, face recognition is formulated as a problem in the difference space (Phillips, 1999), which models dissimilarities between two facial images. Two classes are defined: dissimilarities between faces of the same person (intra-personal space) and dissimilarities between faces of different people (extra-personal space). The set CI (intra-personal difference) contains the within class differences, while the set CE (extra-personal difference) gives the dissimilarities among images of different individuals in the training set:

$$ CI = \{ G(I_p) - G(I_q),\ I_p \sim I_q \}, \qquad CE = \{ G(I_p) - G(I_q),\ I_p \nsim I_q \} \tag{6.4} $$

where $I_p$ and $I_q$ are the facial images from people p and q respectively, and $G(\cdot)$ is the Gabor feature extraction operation as defined in the last section. Each of the M samples in the difference space can now be described as $x_i = [g_1\ g_2\ \cdots\ g_j\ \cdots\ g_N]$, $i = 1, 2, \ldots, M$, where N is the dimension of the extracted Gabor features and $g_j = \left( G(I_p) - G(I_q) \right)_j$.

Training Samples Generation

For a training set with L facial images captured for each of the D persons, $D\binom{L}{2}$ samples could be generated for the intra-personal difference class, while $\binom{DL}{2} - D\binom{L}{2}$ samples are available for the extra-personal difference class. There are always many more extra-personal samples than intra-personal samples for face recognition problems. Take

a database with 400 images from 200 subjects for example: 200 intra-personal image pairs and $\binom{400}{2} - 200 = 79{,}600$ extra-personal image pairs are available. To achieve a balance between the numbers of training samples from the two classes, a random subset of the extra-personal samples could be produced. However, the generated subset should also be representative of the whole set. To achieve this trade off, the procedure shown in Figure 6-3 is proposed to generate m extra-personal difference samples using $U \times V$ Gabor wavelets: instead of using only m pairs, the method randomly generates m samples from $m \times U \times V$ extra-personal image pairs. As a result, the training samples thus generated are more representative, without increasing the number of extra-personal samples and thereby biasing the feature selection process.

For i = 1, 2, ..., m
  For u = 0, 1, ..., U-1
    For v = 0, 1, ..., V-1
      Randomly generate an image pair $(I_p, I_q)$ from different persons
      Calculate the Gabor feature difference $Z_{u,v}$ corresponding to filter $\varphi_{u,v}(x, y)$ using the image pair: $Z_{u,v} = O^{I_p}_{u,v} - O^{I_q}_{u,v}$
    End
  End
  Concatenate the $U \times V$ feature differences into an extra-personal sample, $x_i = [Z_{0,0}\ Z_{0,1}\ \cdots\ Z_{u,v}\ \cdots\ Z_{U-1,V-1}]$
End
Output the m extra-personal Gabor feature difference samples $\{(x_1, y_1), \ldots, (x_m, y_m)\}$, $y_1 = y_2 = \cdots = y_m = -1$.

Figure 6-3 Extra-personal difference samples generation

Including the $l = D\binom{L}{2}$ intra-personal difference samples, the training sample generation process finally outputs a set of $M = m + l$ Gabor feature difference samples: $\{(x_1, y_1), \ldots, (x_M, y_M)\}$. Each sample $x_i = [g_1\ g_2\ \cdots\ g_j\ \cdots\ g_N]$ in the difference space is

associated with a binary label: $y_i = 1$ for an intra-personal difference, while $y_i = -1$ for an extra-personal difference.

Weak Classifiers

Once a set of training samples with class labels (intra-personal or extra-personal) $\{(x_1, y_1), \ldots, (x_M, y_M)\}$ is given, a large number of candidate weak classifiers $h_j$ need to be designed for selection. Given a sample $x_i = [g_1\ g_2\ \cdots\ g_j\ \cdots\ g_N]$ in the Gabor feature difference space, each weak classifier is now designed to be a simple threshold function using a single feature, i.e., if the difference is less than a threshold, the prediction is set as -1, otherwise it is set as 1:

$$ h_j = \begin{cases} -1 & \text{if } g_j < \lambda_j \\ 1 & \text{if } g_j \geq \lambda_j \end{cases} \tag{6.5} $$

Since we are only interested in the selection of features in this application, the threshold $\lambda_j$ is simply determined by the centre of the intra-personal sample mean and the extra-personal sample mean:

$$ \lambda_j = \frac{1}{2} \left( \frac{1}{m} \sum_{p=1}^{m} (x_p)_j \Big|_{y_p = -1} + \frac{1}{l} \sum_{q=1}^{l} (x_q)_j \Big|_{y_q = 1} \right) \tag{6.6} $$

where m and l are the numbers of extra-personal and intra-personal difference samples, respectively.

The set of candidate weak classifiers is now represented by N random variables with binary values, so the MI between a candidate classifier and the selected classifiers can be easily calculated and the iterative process of MutualBoost described in Figure 6-2 can be applied thereafter. Alternatively, the AdaBoost algorithm can be applied directly to the learned weak classifiers for selection.

The Gabor features thus selected by AdaBoost or MutualBoost carry important information for predicting whether a sample is an intra-personal difference or an extra-personal difference. Based on the fact that face recognition is actually to find the

most similar match with the least difference, the selected features might be very important for recognition as well.

Kernel Enhancement

Once the most discriminative Gabor features are selected, they could be either used directly, or input to some classification system for face recognition. Different classification schemes could be used here, e.g., after Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) is applied for feature enhancement, the nearest neighbour (NN) classifier can be used for classification. In previous chapters, kernel subspace methods have been successfully applied to face identification and verification, and the comparative identification results with linear subspace methods have clearly shown their advantage in handling nonlinear data. By mapping sample data to a higher dimensional feature space, a nonlinear problem defined in the original image space is effectively turned into a linear problem in the feature space (Scholkopf et al., 1999). The Support Vector Machine (SVM) is another successful example of using kernel methods for classification. However, SVMs are basically designed for the two class problem. Based on the successful application of Generalized Discriminant Analysis (GDA) for face identification and verification in previous chapters, GDA is adopted here for further feature enhancement and KNN classification for recognition. The GDA subspace is first constructed from the selected Gabor features of the training images, and each image in the gallery set is then projected onto the subspace. To classify an input image, the selected Gabor features are extracted and then projected to the GDA subspace. The similarity between any two facial images can then be determined by the normalized correlation distance of the projected vectors. Details of applying GDA for face recognition can be found in chapter 4.
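The selection procedure used throughout this section (binary weak classifiers, the MI check of Eq. (6.2) and the loop of Figure 6-2) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the original implementation: the function names are hypothetical, MI is measured in bits (base-2 logarithm, an assumption that bounds the MI of binary variables by 1), the weak classifier outputs are assumed to be precomputed, already selected candidates are excluded explicitly, and the clamping of the error before the logarithm is an added safeguard.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """MI in bits between two equal-length sequences of -1/+1 outputs.

    Probabilities are estimated by counting cases over the samples.
    """
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def mutualboost(outputs, labels, rounds, tmi):
    """Select up to `rounds` weak classifiers; returns (indices, alphas).

    outputs[j][i] is the -1/+1 output of candidate j on sample i and
    labels[i] is the -1/+1 class label of sample i.
    """
    m = len(labels)
    weights = [1.0 / m] * m
    selected, alphas = [], []
    for _ in range(rounds):
        # Step 2: weighted error of every candidate not yet selected.
        errors = {
            j: sum(w for w, h, y in zip(weights, out, labels) if h != y)
            for j, out in enumerate(outputs)
            if j not in selected
        }
        chosen = None
        # Inner loop: try candidates from lowest error, skip redundant ones.
        for j in sorted(errors, key=errors.get):
            max_mi = max(
                (mutual_information(outputs[j], outputs[s]) for s in selected),
                default=0.0,
            )
            if max_mi < tmi:
                chosen = j
                break
        if chosen is None:  # every remaining candidate is redundant
            break
        # Step 3: classifier weight; the clamp (an addition) avoids log(0).
        eps = min(max(errors[chosen], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)
        # Step 4: re-weight the samples and normalise by Z_t.
        weights = [
            w * math.exp(-alpha * y * h)
            for w, y, h in zip(weights, labels, outputs[chosen])
        ]
        z = sum(weights)
        weights = [w / z for w in weights]
        selected.append(chosen)
        alphas.append(alpha)
    return selected, alphas


labels = [1, 1, 1, -1, -1, -1]
outputs = [
    [1, 1, 1, -1, -1, 1],    # one mistake
    [1, 1, 1, -1, -1, 1],    # exact copy of candidate 0 (redundant)
    [1, -1, 1, -1, -1, -1],  # one mistake, on a different sample
]
selected, alphas = mutualboost(outputs, labels, rounds=3, tmi=0.5)
print(selected)  # [0, 2]
```

Because the set of selected classifiers only grows, the maximum MI of a rejected candidate can never decrease, so re-testing candidates in every round gives the same result as removing them from the candidate list permanently.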

6.5 Experimental Results

Gabor Feature Based Classifier for Object Detection

The experiments presented here apply the AdaBoost algorithm to learn a Gabor feature based classifier for object detection, which classifies an image of standard size (e.g. 24 × 24 pixels) as either face (car) or non-face (non-car). Classification based methods (Rowley, Baluja, & Kanade, 1998; Osuna et al., 1997), which treat detection as a two class problem, have been one of the main approaches for object detection. Recent works (Lienhart et al., 2002; Viola et al., 2001) successfully built a face detection system with both high accuracy and fast speed. The system used the AdaBoost algorithm to select and learn Haar-like feature based classifiers for face detection. Following their framework, the experiments will perform two tasks: feature selection and classifier learning.

Data Sets

Two image sets, a face image set and a car image set, are used to test the Gabor feature based object detection algorithm. The face image set is provided by Carbonetto (Carbonetto, 2001) and contains 4916 images with faces and 7872 images without faces. Figure 6-4 shows some example face and non-face images. All of the face images are of size 24 × 24, and are randomly split into a training set and a test set containing 2458 positive samples (faces) and 3936 negative samples (non-faces) each. The second image set used in the experiments contains 550 images with at least one car in them and 500 images that do not contain a car (Agarwal, Awan, & Roth, 2002). The car image set is also randomly split into a training set and a test set. The training set contains 440 car images and 400 non-car images, whilst the remaining 110 car images and 100 non-car images are included in the test set. Figure 6-5 shows sample images from the car image set, which are of size

Figure 6-4 Images from face image set

Figure 6-5 Images from car image set

Selected Gabor Features

Given the set of the two classes of training samples with a class label, each sample can be represented with 24 × 24 × 40 = 23,040 Gabor features obtained by convolving 40 Gabor wavelets at each pixel location. Each Gabor feature obtained is thus associated with an image location and a Gabor wavelet. Once the most significant Gabor features to discriminate the two classes are selected by AdaBoost, their associated Gabor wavelets can be traced to gain information about the scale and orientation distribution of the wavelets. Figure 6-6 shows the distribution for the face image set. The scale with index $u$, $u = 1, 2, \ldots, 5$, represents the wavelet with central frequency $f_u = F_{\max} / (\sqrt{2})^{u}$. It is clear from the bar charts that the high frequency wavelets are chosen much more often than low frequency ones, and Gabor wavelets with orientation π/2 are preferred for this classification task. The orientation preference shows that horizontal features occur more frequently in face images, e.g. eyebrows, eyes, and mouth. The first eight Gabor wavelets selected by the AdaBoost algorithm for the car image set are also shown in Figure 6-7, which interestingly indicates that tyres are very important features for car detection.
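Tracing a selected feature back to its wavelet and image location amounts to decoding its position in the concatenated feature vector. The sketch below is illustrative only: the function name is hypothetical, indices are zero-based, the filter ordering follows the concatenation $O_{0,0}, O_{0,1}, \ldots, O_{U-1,V-1}$ of Eq. (6.3), and the row-major pixel order inside each $O_{u,v}$ as well as the default of 8 orientations are assumptions.

```python
def decode_feature_index(j, width, height, n_orient=8):
    """Map a zero-based feature index j to (u, v, x, y).

    Assumes G(I) is the concatenation O_{0,0}, O_{0,1}, ..., O_{U-1,V-1}
    and that each O_{u,v} stores its width * height responses row by row.
    """
    per_filter = width * height
    filt, pixel = divmod(j, per_filter)   # which wavelet, which pixel
    u, v = divmod(filt, n_orient)         # scale index, orientation index
    y, x = divmod(pixel, width)           # image row, image column
    return u, v, x, y


print(decode_feature_index(0, 24, 24))      # (0, 0, 0, 0)
print(decode_feature_index(23039, 24, 24))  # (4, 7, 23, 23)
```

Counting the decoded (u, v) pairs over all selected features gives the kind of scale and orientation histogram shown in Figure 6-6.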


Figure 6-6 Scale and orientation distribution of filters selected for the face image set

Figure 6-7 First eight selected Gabor wavelets for the car image set

Classification Performance Evaluation

The AdaBoost algorithm not only selects the most discriminative Gabor features, but also learns a classifier using the selected features. The False Accept Rates (FAR) and False Reject Rates (FRR) for the AdaBoost trained classifier, GaborBoost, on the training image sets are shown in Figure 6-8. One can observe from the figure that 100 features are enough for GaborBoost to achieve zero FAR and FRR on the face image set, while only 20 features are required for the car image set. The results on the test face image set and the test car image set are shown in Figure 6-9. The best face/non-face classifier achieves a 99.39% classification rate and 1.75% FRR with 150 selected Gabor features, while the best car/non-car classifier achieves a 100% classification rate and 1.82% FRR with only 80 features.

To compare GaborBoost with other methods, the results of two other methods, named ExBoost and EABoost, on the same face image set are also listed in Table 6-1. ExBoost uses the Haar feature set and the AdaBoost algorithm to select features and learn classifiers, which is identical to the algorithm proposed in (Viola et al., 2001). They also proposed

to use a Genetic algorithm to reduce the search space during the boosting procedure, and named the algorithm EABoost. As shown in the table, the GaborBoost algorithm outperforms ExBoost and EABoost in terms of both FAR and FRR, while using fewer features. The results clearly show the advantages of Gabor features over Haar-like features in the context of object detection.

(a) (b)
Figure 6-8 FAR and FRR on the training face image set (a) and the training car image set (b)

Columns: Algorithm; Feature Numbers; FAR; FRR
Training Set: ExBoost; EABoost; GaborBoost
Test Set: ExBoost, FRR 3.5%; EABoost, FRR 3.2%; GaborBoost, FRR 1.75%
Table 6-1 Comparative classification results on the face image set
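The two error rates reported in these comparisons can be computed from classifier decisions as sketched below. This is a minimal illustration with a hypothetical function name; it assumes labels and predictions are coded as +1 for the object class (accept) and -1 for the non-object class (reject).

```python
def far_frr(predictions, labels):
    """Return (FAR, FRR) for -1/+1 predictions against -1/+1 labels.

    FAR: fraction of negative samples that are wrongly accepted.
    FRR: fraction of positive samples that are wrongly rejected.
    """
    negatives = [p for p, y in zip(predictions, labels) if y == -1]
    positives = [p for p, y in zip(predictions, labels) if y == 1]
    far = sum(1 for p in negatives if p == 1) / len(negatives)
    frr = sum(1 for p in positives if p == -1) / len(positives)
    return far, frr


far, frr = far_frr([1, 1, -1, 1, -1, -1], [1, 1, 1, -1, -1, -1])
print(far, frr)  # one of three negatives accepted, one of three positives rejected
```

As a rough reading of the reported figures, an FRR of 1.75% on the 2458 test faces corresponds to about 43 rejected faces.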

(a) (b)
Figure 6-9 FAR and FRR on the test face image set (a) and the test car image set (b)

SVM for Classification

In the following experiments, an SVM is applied to the AdaBoost selected Gabor features for classification. The classifier, named GaborBoostSVM, is trained using the Gabor features selected by the AdaBoost algorithm. Face images with the same partition into training set and test set are used for training and testing. 150 boosted Gabor features are extracted from each sample in the training set, which are then passed to the SVM for training. The results are shown in Table 6-2 and compared with an SVM trained using the whole set of Gabor features with dimension 23,040 (GaborSVM), an SVM trained using the raw pixels (RawSVM) and GaborBoost as described above. For RawSVM, the pixel values

of each sample are concatenated into a feature vector to train an SVM. A Pentium GHz PC and the SVM-Light package (Joachims, 2004) were used in our experiments.

Columns: GaborBoost; GaborBoostSVM (Linear, RBF); GaborSVM (Linear, RBF); RawSVM (Linear, RBF)
Feature Dimension: GaborSVM 23,040 (Linear), 23,040 (RBF)
Number of SVs: N/A, N/A
SVM Training Time: GaborBoost N/A; GaborBoostSVM 38 s (Linear), 75 s (RBF); GaborSVM 10 h (Linear), >74 h (RBF); RawSVM 180 s (Linear), 270 s (RBF)
FRR (%): N/A
FAR (%): N/A
Table 6-2 SVM classification results on the face image set

Compared with classifiers utilizing Gabor features, RawSVM yields the highest FAR and FRR, which suggests that Gabor wavelets are a good choice for extracting features for classification. However, due to the huge dimension of the Gabor features, we did not succeed in training GaborSVM using the RBF kernel: the program crashed after running for 74 hours, which may have been caused by high memory usage and computation cost. It also takes about 10 hours to train the GaborSVM with a linear kernel. The training time appears to increase exponentially with the number of training samples. In addition, the computational cost of convolving an image with 40 Gabor wavelets is very high, which makes GaborSVM unsuitable for real time applications. Since the SVM is specially suited for binary classification, GaborBoostSVM achieves lower FAR and FRR than GaborBoost. Both methods use the same 150 Gabor features selected by the AdaBoost algorithm. The training of GaborBoostSVM with an RBF kernel takes less than 2 minutes. Only 150 convolution operations, each using one variable wavelet, are necessary to extract the selected Gabor features, which makes GaborBoostSVM highly efficient in terms of memory and computation.

Selecting Gabor Features for Face Recognition

Based on the discriminative power of Gabor features for pattern classification, the experiments presented in this section aim to learn the most significant Gabor features for face recognition. By reducing the feature dimension, not only are memory and computation costs greatly reduced, the system may also be more robust against the interference of noise. As a standard test bed, the FERET database (Phillips et al., 2000) is used here to evaluate the performance of selected Gabor features for face recognition. The same subset (600 frontal face images corresponding to 200 subjects) used in chapter 4 is first used here to compare the performance of different feature selection schemes, i.e. AdaBoost and MutualBoost. All of the images are normalized in both size (64 × 64) and orientation according to the eye coordinates. Both the difference between the selected features and the recognition performance will be analyzed. The recognition performance using the selected Gabor features will also be compared with the method shown in chapter 4, where the whole set of Gabor features before selection is used for identification. Once an improved feature selection approach for face recognition is identified, it will be applied to the whole FERET database according to the specified evaluation protocol for identification. Finally, the performance will be compared with other state of the art algorithms.

Selected Gabor Features

The randomly selected 400 face images (2 images for each subject) are first used to learn the most important Gabor features for intra-personal and extra-personal face space discrimination. As a result, 200 intra-personal face difference samples and 1,600 extra-personal face difference samples, randomly generated using the method described in Figure 6-3, are used for feature selection. Figure 6-10 and Figure 6-11 show the first six Gabor features and the locations of the first 200 Gabor features selected by AdaBoost (AdaGabor) and


MutualBoost (MutualGabor) respectively; both are overlaid on a typical face image in the database. It is interesting to see that most of the selected Gabor features are located around the prominent facial features such as eyebrows, eyes, nose and chin, which indicates that these regions are more robust against the variance of expression and illumination encountered within the database subset. This result agrees with the fact that the eye and eyebrow regions remain relatively stable when a person's expression changes. Though the first six Gabor wavelets selected by the AdaBoost and MutualBoost algorithms are similar, the locations of the 200 features show the existence of redundancy among AdaBoost selected features, i.e. many of the features are very near, or similar, to each other. The features selected by MutualBoost are more widely spread and thus exhibit a lower degree of correlation.

(a) (b) (c) (d) (e) (f) (g)
Figure 6-10 First six Gabor features (a)-(f); and the 200 feature points (g) selected by AdaBoost

(a) (b) (c) (d) (e) (f) (g)
Figure 6-11 First six Gabor features (a)-(f); and the 200 feature points (g) selected by MutualBoost

Figure 6-12 shows the distribution of MutualBoost selected wavelets in different scales and orientations. As shown in this figure, wavelets centred within low frequency bands are selected much more frequently than those in high frequency bands. On the other hand, the majority of the discriminative Gabor features have an orientation around π/4, 3π/8, π/2 and 5π/8. It is interesting to compare the two distributions of Gabor wavelets


selected for face detection and recognition: while the dominant orientations of the selected wavelets are similar for both applications, the dominant frequency bands are different: one prefers high frequency information and the other favours lower frequencies. This suggests that high frequency features are more important for discriminating objects from backgrounds. Since the differences between face images are used to select Gabor features for recognition, low frequency features seem to be more robust against the distortions caused by expression and illumination variations.

Figure 6-12 Distribution of MutualGabor features in scale and orientation

To show the existence of redundancy among AdaBoost selected features (weak classifiers), the max MI $R(h_j)$ for each selected feature is shown in Figure 6-13a. It can be observed from the figure that some of the features are highly redundant, e.g. the MI of features with numbers 149, 177 and 180 is greater than The redundancy among selected features increases with the number of features; it is this undesired redundancy

that we aim to eliminate or reduce. The MI data for features selected with MutualBoost are also shown in Figure 6-13b (with TMI = 0.1). Due to the introduction of TMI, all the selected features now show MI values of less than 0.1, and thus one can conclude that the features are informative and non-redundant.

(a) (b)
Figure 6-13 MI of features selected by AdaBoost (a); MutualBoost (b)

Algorithm Complexity

Due to the introduction of mutual information, MutualBoost requires a longer training time than AdaBoost. However, the only computation cost added to AdaBoost is the loop to calculate MI values for redundancy checking; see Figure 6-2 for details. Table 6-3 shows the Average Number of Loops (ANL) required in each iteration and the corresponding TMI. The table shows that the computation burden added by the introduction of MI is actually very low (ANL is normally less than 10). As a result, the training time required by the proposed algorithm in our experiments is only about 10% longer than that of AdaBoost.

Columns: TMI; ANL
Table 6-3 ANL for different TMI

As seen from the table, the higher the value of TMI, the lower the ANL required, i.e. the faster the training speed. Actually AdaBoost can be seen as a special case of MutualBoost

when the value of TMI is set as 1. In this case, the features, or weak classifiers, selected by the proposed algorithm will be exactly the same as those chosen by AdaBoost.

Recognition Performance on Subset of the FERET Database

Once different sets of Gabor features are selected, they can be used either directly, or subjected to further analysis for recognition. To compare the performance of the different feature selection schemes, both AdaGabor and MutualGabor are first applied directly for face recognition, with the resulting performance shown in Figure 6-14. The features were tested using 200 images (one for each subject), which are different from the training images in both illumination and expression. The normalized correlation distance measure and the nearest neighbour classifier are adopted. The performance shown in Figure 6-14 demonstrates the advantage of MutualGabor over AdaGabor, i.e. the accuracy of MutualGabor is equal to, or higher than, that of AdaGabor with any number of features. Since the MI values for all of the first 60 features are quite small, MutualBoost starts by picking much the same features as AdaBoost. However, once the number of features increases, AdaBoost starts to pick redundant features. The improved recognition rate obtained with the features selected by MutualBoost shows the usefulness of the technique in eliminating redundancy. The performance drop using 160 MutualGabor features could be caused by the variance between test images and training images: some features significant for discriminating training images might not be the appropriate ones for test images. A more representative training set might alleviate this problem. As shown in the figure, MutualGabor achieved a recognition rate as high as 94% with 200 features.

In the next series of experiments, GDA will be performed on the selected Gabor features (MutualGabor + GDA) for further enhancement. To show the robustness and efficiency of the proposed methods, the performance of GDA on the whole Gabor

feature set (Gabor + GDA) is also included for comparison purposes. Downsampling is adopted to reduce the feature dimension to a certain level; see chapter 4 for details. The normalized correlation distance measure and the nearest neighbour classifier were used. As described in chapter 4, the maximum dimension of the GDA subspace is determined by the number of classes and the number of non-zero eigenvalues of the kernel matrix. The maximum dimensions for Gabor + GDA and MutualGabor + GDA in this test are 110 and 199 respectively. As shown in Figure 6-15, MutualGabor + GDA achieves an accuracy as high as 99.5%. Since all of the face images in this experiment are normalized to a reduced size (64 × 64) to speed up the feature selection process, the performance of Gabor + GDA (97%) is slightly lower than that reported in chapter 4 (97.5%), which was obtained on larger images. The performance improvement of MutualGabor + GDA suggests that some important Gabor features may have been lost during the downsampling process for Gabor + GDA. Additionally, some of the remaining features are redundant.

Figure 6-14 Recognition performance of AdaGabor and MutualGabor
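The matching step used in these experiments, a normalized correlation measure combined with a nearest neighbour classifier, can be sketched as follows. This is a minimal illustration only: it assumes the usual cosine-style definition of normalized correlation, which this section does not spell out, and the function names and toy feature vectors are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Cosine-style normalized correlation between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for degenerate vectors (assumption)
    return dot / (norm_a * norm_b)

def nearest_neighbour(probe, gallery):
    """Return (label, score) of the gallery vector most correlated with the probe.

    gallery is a list of (label, feature_vector) pairs, one per enrolled image.
    """
    best_label, best_score = None, -math.inf
    for label, features in gallery:
        score = normalized_correlation(probe, features)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy gallery with made-up three-dimensional "features".
gallery = [("subject_a", [1.0, 0.0, 2.0]), ("subject_b", [0.0, 3.0, 1.0])]
label, score = nearest_neighbour([2.0, 0.1, 3.9], gallery)
```

Because the measure is normalized, a probe that is a scaled copy of a gallery vector still matches it, which is one reason such a measure is attractive when overall illumination changes.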

Figure 6-15 Recognition performance of enhanced MutualGabor

The computation and memory costs of Gabor + GDA and MutualGabor + GDA are also listed in Table 6-4. This shows that MutualGabor + GDA incurs significantly lower computation and memory costs than Gabor + GDA; e.g., the number of convolutions needed to extract Gabor features is reduced from 163,840 to 200. Although the Fast Fourier Transform (FFT) could be used here to circumvent the convolution process, the feature extraction process still takes about 1.5 seconds per image in our C implementation, whilst the 200 convolutions take less than 4 ms. For Gabor + GDA with a down-sampling rate of 16, the feature dimension is reduced to 10,240, which is still about 50 times the dimension of MutualGabor + GDA. As a result, MutualGabor + GDA is much faster in training and testing. While it takes Gabor + GDA 275 seconds to construct the GDA subspace using the 400 training images, it takes MutualGabor + GDA only about 6 seconds. MutualGabor + GDA also achieves a substantial improvement in recognition efficiency: only 4 seconds are required to recognize the 200 test images. The computation times were recorded in Matlab 6.1 on a P4 1.8 GHz PC.
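The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only re-derives numbers stated in this section; the size of the Gabor kernel bank is not stated here and is inferred from 163,840 / (64 × 64), so it should be read as an assumption.

```python
image_side = 64                  # images are normalized to 64 x 64 in this experiment
full_feature_count = 163_840     # convolutions reported for Gabor + GDA

# Implied number of Gabor kernels (an inference, not a figure from the text).
kernels = full_feature_count // (image_side * image_side)

downsampling_rate = 16
downsampled_dim = full_feature_count // downsampling_rate   # dimension fed to GDA

selected_dim = 200               # number of MutualGabor features
dimension_ratio = downsampled_dim / selected_dim             # "about 50 times"

training_speedup = 275 / 6       # reported training times in seconds
```

Under these assumptions the kernel bank has 40 kernels, the down-sampled dimension is 10,240, and the dimension ratio is 51.2, in line with the "about 50 times" stated above.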

With non-redundant and informative Gabor features, MutualGabor + GDA achieves better accuracy with significantly less computation than the other methods described here.

Method               Convolutions to extract   Dimension of Gabor      Training time   Test time
                     Gabor features            features before GDA
Gabor + GDA          163,840                   10,240                  275 sec.        263 sec.
MutualGabor + GDA    200                       200                     6 sec.          4 sec.

Table 6-4 Comparative computation and memory cost of Gabor + GDA and MutualGabor + GDA

Having shown in chapter 4 that GDA achieves significantly better performance than LDA on the whole Gabor feature set (Gabor + GDA versus Gabor + LDA), the performance of LDA on the selected informative Gabor features (MutualGabor + LDA) is also included in Figure 6-15 for comparison. As shown in the figure, the performance of MutualGabor + LDA is substantially worse than that of Gabor + GDA and MutualGabor + GDA. Only 82% accuracy is achieved when the dimension of the LDA subspace is set to 60, which is even worse than that of MutualGabor alone: surprisingly, applying LDA degrades the performance of MutualGabor. The result suggests that when the input features are already discriminative enough, LDA may not necessarily lead to a more discriminative space. The results also show that the feature enhancement ability of GDA is better than that of LDA.

Recognition Performance on the Full Set of FERET Database

After showing the comparative results with a state of the art Gabor feature based algorithm, the MutualGabor + GDA algorithm is now tested on the whole FERET database. According to the evaluation protocol, a gallery of 1196 frontal face images and 4 different probe sets are used for testing. The numbers of images in the different probe sets are listed in Table 6-5, with example images shown in Figure 6-16. The Fb and Fc probe sets are used to assess the effect of facial expression and illumination changes

respectively, and there are only a few seconds between the capture of each gallery-probe pair. Dup I and Dup II consist of images taken on different days from their corresponding gallery images; in particular, there is at least one year between the acquisition of a probe image in Dup II and the corresponding gallery image. A training set consisting of 736 images is used to select the most informative Gabor features and to construct the GDA subspace. Note that the same set was released to researchers to develop their algorithms during the FERET evaluation. As a result, 592 intra-personal and 2000 extra-personal samples are produced using the sample generation algorithm, and 300 Gabor features are selected from them using information theory. During the development phase, the training set is randomly divided into a gallery set with 372 images and a test set with 364 images to decide the dimension for optimal GDA performance. The same parameters are used throughout the testing process.

Probe set   Gallery   Probe set size   Gallery size   Variations
Fb          Fa                                        Expression
Fc          Fa                                        Illumination and camera
Dup I       Fa                                        Time gap < 1 week
Dup II      Fa                                        Time gap > 1 year

Table 6-5 List of different probe sets

Figure 6-16 Examples of different probe images

Performance results of the proposed algorithm are shown in Table 6-6, together with those of the other main approaches participating in the FERET evaluation (Phillips et al., 2000), as well as an approach that extracts Gabor features from variable feature points for recognition (Kepenekci, Tek, & Akar, 2002). The results show that MutualGabor +

GDA achieves the best result on all of the test sets. This can be attributed to the robustness of the selected Gabor features against variation in expression and capture time. In particular, the performance of the proposed method is significantly better than that of all other methods on the Dup II set. After the proposed method, the Elastic Bunch Graph Matching (EBGM) method, which is based on elastic graph matching, ranks second. However, that method requires intensive computation for both Gabor feature extraction and graph matching. It was reported in (Wiskott et al., 1997) that the elastic graph matching process took 30 seconds on a SPARCstation. Compared with the EBGM approach, MutualGabor + GDA is far superior in terms of both accuracy and computational efficiency.

Method                                              Fb      Fc      Dup I   Dup II
PCA                                                 83.4%   18.2%   40.8%   17.0%
PCA + Bayesian                                      94.8%   32.0%   57.6%   35.0%
LDA                                                 96.1%   58.8%   47.2%   20.9%
Elastic Graph Matching                              95.0%   82.0%   59.1%   52.1%
Variable Gabor Features (Kepenekci et al., 2002)    96.3%   69.6%   58.3%   47.4%
MutualGabor + GDA                                   96.7%   85.6%   59.3%   62.4%

Table 6-6 FERET evaluation results for various face recognition algorithms

6.6 Conclusions

Two different algorithms, AdaBoost and the proposed MutualBoost, have been successfully applied to Gabor feature selection in this chapter. The AdaBoost algorithm is used to learn Gabor feature based classifiers for object detection. While accuracy advantages of Gabor features over Haar-like features are observed using the AdaBoost learned classifier, further improvements are achieved when an SVM is adopted for classification. Due to the greatly reduced feature dimension, the SVM classifier using selected Gabor features achieves a substantial speed advantage over systems using the whole Gabor feature set. Based on its high accuracy, the module can

be further developed into a classification based object detection system. A cascade structure could be used to achieve a trade-off between accuracy and efficiency.

The two feature selection schemes described have also been successfully applied to select Gabor features for face recognition. To reduce the computation cost and algorithm complexity, the intra-personal and extra-personal difference spaces are used. Experimental results show that features selected when mutual information is considered achieve higher recognition accuracy than those selected by AdaBoost. The MutualBoost selected Gabor features are further enhanced in the non-linear kernel space using Generalized Discriminant Analysis and fully tested with extensive databases. Compared with one of the top methods in FVC 2004, the method shows advantages in both accuracy and efficiency. The results on the full FERET database following the evaluation protocol also show that the algorithm performs better than the previous top method, the elastic graph matching algorithm. Moreover, the algorithm has advantages in computation cost and efficiency, since no graph matching process is needed. In addition, the method achieves significantly better performance on the most difficult test set, Dup II.

Whilst the mutual information based feature selection process in this chapter addresses random variables (r.v.s) with binary values only, it could certainly be extended to the case of continuous variables. A Gaussian mixture model may be needed to represent the distribution when the r.v.s are not Gaussian. The distribution could also be discretized using histogram estimation, if the number of bins can be determined. When an r.v. with multiple values is used, the feature selection process will incur a much higher computation cost and complexity.

The value of TMI for MutualBoost needs to be selected appropriately to make sure that the selected features are both non-redundant and useful for classification. A cross-validation set could be used to determine the TMI for common classification problems. As shown in Figure 6-13, since the redundancy increases with the number of selected features, an adaptive TMI, which increases with the number of features, might be more suitable.
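To make the role of TMI concrete, the sketch below computes the mutual information between two binary feature responses and uses it as a redundancy check. It is a simplified illustration, not the MutualBoost algorithm itself: the boosting re-weighting of samples is omitted, the candidates are assumed to be already ranked best-first, MI is measured in bits (so that 1 is the maximum for binary variables), and all function names are hypothetical.

```python
import math

def mutual_information(a, b):
    """Mutual information, in bits, between two equal-length binary sequences."""
    n = len(a)
    joint = {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    pa = {v: a.count(v) / n for v in set(a)}
    pb = {v: b.count(v) / n for v in set(b)}
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / (pa[x] * pb[y]))
    return mi

def is_redundant(candidate, selected, tmi):
    """True if the candidate shares at least `tmi` bits with a selected feature."""
    return any(mutual_information(candidate, s) >= tmi for s in selected)

def select_features(ranked_candidates, tmi, how_many):
    """Greedy pass that keeps candidates whose MI with every kept feature is < tmi."""
    selected = []
    for responses in ranked_candidates:
        if not is_redundant(responses, selected, tmi):
            selected.append(responses)
        if len(selected) == how_many:
            break
    return selected

# Toy binary responses of three candidate features on four samples.
f1 = [0, 0, 1, 1]
f1_copy = [0, 0, 1, 1]      # fully redundant with f1
f2 = [0, 1, 0, 1]           # independent of f1
chosen = select_features([f1, f1_copy, f2], tmi=0.1, how_many=2)
```

With a threshold of 0.1 the duplicate of `f1` is skipped and the independent feature is kept. Raising the threshold towards 1 lets progressively more correlated features through, which matches the observation that a higher TMI needs fewer redundancy-checking loops.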

Chapter 7 Radial Symmetry Transform Based Eye Location

While the success of analytic face recognition approaches depends on the reliable detection of facial features, holistic approaches also need those feature points as important references for scale and orientation normalization. Eyes have been considered more salient and stable than all other facial features (Brunelli et al., 1993). Face images can easily be normalized using geometrical measurements if both eyes are detected. Thus, eye location algorithms are very important for face recognition systems. While the face recognition algorithms presented in previous chapters used manually located eye centres for normalization, a simple and robust eye location system that requires neither training nor extra devices is presented in this chapter. The approach is based on the generalized symmetry transform, a low level operator that can be applied successfully to detect regions of interest without any prior knowledge (Reisfeld, Wolfson, & Yeshurun, 1995). Based on context free and low level components, a high level and purposive model, which utilizes prior knowledge of eye features, is then implemented for the eye location task. The performance of the algorithm has been fully tested using the BioID and BANCA databases, and the algorithm has also been integrated into an automatic face verification system.

7.1 Background

In (Yuille, Hallinan, & Cohen, 1992), deformable templates are used to detect facial features. The eye feature is described as a parameterized template which interacts dynamically with the image by altering its parameter values to minimize a defined energy function, thereby deforming itself to find the best fit. However, with this technique the templates have to be initialized at a position near the actual eye location. The eigenface approach was further developed in (Moghaddam & Pentland, 1994) in the form of eigeneyes, eigennoses and eigenmouths, which were used to detect facial features. A Support Vector Machine approach is applied in (Huang, Shao, & Wechsler, 1998) to estimate the facial pose and detect the eye locations; 186 eye images and 186 non-eye images are used to train the SVM classifier. Both methods require many images for classifier and model training. Rizon et al. (Rizon & Kawaguchi, 2000) used intensity and edge information to detect candidates for facial features, and a cost function is defined for each pair of feature points satisfying a spatial constraint. The pair of feature points with the smallest cost is taken to be the pupils of the two eyes. A very different system was developed by Morimoto et al. (Morimoto, Koons, Amir, & Flickner, 2000) and applied to pupil detection. Two near infrared, time multiplexed light sources are synchronized with the camera frame rate to generate bright and dark pupil images, which are then used for pupil segmentation. However, most of the described algorithms were tested using only a small set of images, and their effects on the performance of face recognition/verification systems are seldom reported.

7.2 The Methodology

The Generalized Symmetry Transform

Since natural and artificial objects often give rise to the human sensation of symmetry, it has been suggested as one of the fundamental properties to guide higher level

processes in computer vision (Reisfeld et al., 1995). An object is regarded as symmetric if it is invariant to the application of certain symmetry operations, e.g., the reflectional (mirror) symmetry operation. However, the shape of the object needs to be known before such operations can be applied. The generalized symmetry transform does not require this knowledge of shape. It operates on the edges in an image and assigns a continuous symmetry measure to each pixel.

Figure 7-1 The contribution of points $p_i$ and $p_j$ to the symmetry measure (Reisfeld et al., 1995)

Let $p_k = (x_k, y_k)$, $k = 1, \ldots, K$, be any pixel in an image, and denote by $\nabla p_k = (G_x, G_y)$ the horizontal and vertical gradient of the image at pixel $p_k$, i.e. $G_x = \partial p_k / \partial x$, $G_y = \partial p_k / \partial y$. The strength and phase of the gradient at $p_k$ can be calculated as:

$$r_k = \log\left(1 + \lVert \nabla p_k \rVert\right) \qquad (7.1)$$

$$\theta_k = \arctan\left(\frac{G_y}{G_x}\right) \qquad (7.2)$$

For each pair of points $p_i$ and $p_j$, we define $l$ as the line passing through them, with $\alpha_{ij}$ being the counter clockwise angle between $l$ and the horizon. The direction of

symmetry axis for points $p_i$ and $p_j$ can be denoted as:

$$\varphi(p_i, p_j) = \frac{\theta_i + \theta_j}{2}$$

We now also define a distance weight function and a phase weight function as:

$$D(p_i, p_j) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\lVert p_i - p_j \rVert}{2\sigma}} \qquad (7.3)$$

$$P(\theta_i, \theta_j) = \left(1 - \cos(\theta_i + \theta_j - 2\alpha_{ij})\right)\left(1 - \cos(\theta_i - \theta_j)\right) \qquad (7.4)$$

The contribution of points $p_i$ and $p_j$ to the symmetry measure of point $p$ can be represented as $C(i, j) = D(p_i, p_j)\, P(\theta_i, \theta_j)\, r_i r_j$. Now the symmetry magnitude of any point $p$ can be defined as:

$$M(p) = \sum_{(p_i, p_j) \in \Gamma(p)} C(i, j) \qquad (7.5)$$

where $\Gamma(p)$ is the set of point pairs satisfying $\Gamma(p) = \left\{ (p_i, p_j) \;\middle|\; \frac{p_i + p_j}{2} = p \right\}$. The symmetry magnitude thus averages the symmetry value over all orientations. Once the symmetry direction is defined as $\phi(p) = \varphi(p_i, p_j)$ such that $C(i, j) = D(p_i, p_j)\, P(\theta_i, \theta_j)\, r_i r_j$ is maximal for $(p_i, p_j) \in \Gamma(p)$, the symmetry value at point $p$ can be denoted as:

$$S(p) = \left(M(p), \phi(p)\right) \qquad (7.6)$$

The Radial Symmetry Measure

The transform defined above can effectively detect reflectional symmetry, which is invariant under 2D rotation and translation. Sometimes we may also need to detect objects that are symmetric in multiple distinct orientations rather than a single principal one. The iris is an example of such an object. Radial symmetry such as this can be defined as:

$$RS(p) = \sum_{(p_i, p_j) \in \Gamma(p)} D(p_i, p_j)\, P(\theta_i, \theta_j)\, r_i r_j \sin^2\left(\varphi(p_i, p_j) - \phi(p)\right) \qquad (7.7)$$
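The definitions in equations (7.1) to (7.7) can be turned into a small brute-force sketch. The code below is an illustration under stated assumptions, not the implementation used in this thesis: it uses central differences for the gradient, `atan2` as a robust form of arctan(G_y / G_x), only pairs whose midpoint falls on the pixel grid, an arbitrary sigma, and hypothetical function names. It is written for clarity on tiny images and makes no attempt at efficiency.

```python
import math

def gradients(img):
    """Gradient strength r = log(1 + |grad|) and phase theta for interior pixels."""
    h, w = len(img), len(img[0])
    r = [[0.0] * w for _ in range(h)]
    theta = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            r[y][x] = math.log(1.0 + math.hypot(gx, gy))
            theta[y][x] = math.atan2(gy, gx)
    return r, theta

def symmetry_maps(img, sigma=2.0):
    """Return (M, RS): the symmetry magnitude map and the radial symmetry map."""
    h, w = len(img), len(img[0])
    r, theta = gradients(img)
    # Only pixels with a non-zero gradient can contribute.
    edges = [(x, y) for y in range(h) for x in range(w) if r[y][x] > 0.0]
    contrib = {}  # midpoint -> list of (C(i, j), direction of the pair)
    for a in range(len(edges)):
        xi, yi = edges[a]
        for b in range(a + 1, len(edges)):
            xj, yj = edges[b]
            if (xi + xj) % 2 or (yi + yj) % 2:
                continue  # midpoint is not a pixel position
            dist = math.hypot(xi - xj, yi - yj)
            d = math.exp(-dist / (2.0 * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)
            alpha = math.atan2(yj - yi, xj - xi)
            ti, tj = theta[yi][xi], theta[yj][xj]
            p = (1.0 - math.cos(ti + tj - 2.0 * alpha)) * (1.0 - math.cos(ti - tj))
            c = d * p * r[yi][xi] * r[yj][xj]
            mid = ((xi + xj) // 2, (yi + yj) // 2)
            contrib.setdefault(mid, []).append((c, (ti + tj) / 2.0))
    M = [[0.0] * w for _ in range(h)]
    RS = [[0.0] * w for _ in range(h)]
    for (x, y), pairs in contrib.items():
        M[y][x] = sum(c for c, _ in pairs)
        phi = max(pairs)[1]  # direction of the strongest contribution
        RS[y][x] = sum(c * math.sin(direction - phi) ** 2 for c, direction in pairs)
    return M, RS

# Toy input: a dark disc (an idealized iris) on a bright background.
disc = [[0.0 if (x - 5) ** 2 + (y - 5) ** 2 <= 9 else 1.0 for x in range(11)]
        for y in range(11)]
M, RS = symmetry_maps(disc)
```

On this toy disc, opposite edge pixels have gradients pointing away from each other, so the phase weight of equation (7.4) reaches its maximum of 4 for pairs whose midpoint is the disc centre. A uniform image has no gradients and therefore yields maps that are zero everywhere.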

This expression emphasizes contributions in the orientations that are perpendicular to the main symmetry direction, and attains its maximum at a point that is surrounded by edges.

Eye Location by the Radial Symmetry

Since the main characteristic of the eye is its iris, which is symmetric in multiple distinct orientations, radial symmetry is adopted as our strategy for eye location. Our system incorporates the following modules: pre-processing, radial symmetry transformation, post-processing and eye location. Figure 7-2 shows the output from the different modules within the system.

Figure 7-2 The system output at different stages. (a) the input image; (b) the radial symmetry map; (c) the filtered symmetry map; (d) the thresholded binary symmetry map

Input and Pre-processing

Once the face area is detected using a face detection module such as (Lienhart et al., 2002), the left and right eye regions can be roughly cropped and used as input to the system for precise eye centre location. To cope with variations caused by image noise and lighting, a 5 × 5 Gaussian filter is applied before the symmetry transform. This has proved to be a simple and effective solution for noise removal.

Radial Symmetry Transform and Post-processing

A symmetry magnitude map is obtained after applying the radial symmetry transform to the extracted eye region, as shown in Figure 7-2. From this figure one can see that the eye region has been highlighted. A 5 × 5 mean filter is then applied to the symmetry map for noise suppression. This is followed by a thresholding process in

which the symmetry map is turned into a binary image, where the pixels with high symmetry values are assigned the label 1 whilst the rest are assigned the label 0. See Figure 7-2 for an example.

Eye Centre Location

The potential positions for the centre of each eye have now been reduced to several candidate areas or, as is true for most cases, a single region for each eye. Thus, the eye centre can be identified simply by locating the centre of each candidate area. In cases where multiple candidate regions remain, the smaller candidate regions are rejected. In addition, the candidate positions for both eyes are examined to ensure that the two eyes are located on approximately the same horizontal line. Figure 7-3 shows a sample face image and the eye centre locations extracted by the described algorithm. More eye location results can be found in Figure 7-4.

Figure 7-3 A sample face image and the located eye centre

7.3 Experimental Results

The Results on BioID Database

A test set, the BioID database (Jesorsky, Kirchberg, & Frischholz, 2001), is used in the experiments to evaluate the proposed algorithms. The set consists of 1521 images of 23 different people and was recorded during several sessions in multiple locations. This set features a large variety of illumination, background and face sizes. All of the images are grey scale images of the same size. The x and y coordinates of the left and right eyes are already indicated and recorded in text files. Face images containing

prominent eyes are further selected from the database for additional testing. The test set thus contains 475 images captured from subjects with glasses and 985 images from subjects without glasses. Since the objective here is the fair evaluation of the eye location algorithms, the face area for each image is simply cropped according to the anthropometric relations between the face and facial features. Figure 7-4 shows a number of BioID face images and the eye centre locations extracted by the described algorithm (all images have been scaled to the same size for visual convenience).

Figure 7-4 Some sample test results

Figure 7-5 Error distribution for test set with glasses and without glasses

To evaluate the accuracy of the eye location algorithm, the normalized distance between the located eye centre $(x_t, y_t)$ and the ground truth $(x_c, y_c)$ is calculated as below:

$$d_e = \frac{\sqrt{(x_t - x_c)^2 + (y_t - y_c)^2}}{w} \qquad (7.8)$$

where $w$ is the distance between the ground truth left and right eye centres. A correct location of the eye in a face image is registered if the distance $d_e$ is less than a threshold $a$, i.e., $d_e < a$. Figure 7-5 shows the distribution of the error distance $d_e$ for the test sets, both with and without glasses. One can observe that more than 91% of both histograms fall within the 0.2 error distance. The location accuracy for different values of $a$ is shown in Figure 7-6, suggesting that 99.34% accuracy can be achieved for images without glasses at a suitable value of $a$. Due to reflections and edge artefacts caused by wearing glasses, the figure drops to 91.26% for the test set with glasses.

Figure 7-6 The location accuracy varying with parameter a

The Results on BANCA Database

After the automatic eye location algorithm has been fully tested using face images from the BioID database, it is integrated with an automatic face detection module (Lienhart et al., 2002) and tested using the BANCA database. The face detection module is

implemented as a cascade of Haar-like feature based classifiers, which have been shown to achieve a very good trade-off between accuracy and detection efficiency (Viola et al., 2001). Images from the development set of the BANCA database (Bailliere et al., 2003) are used for testing. Sample images can be found in Figure 5-1. The test images are first passed to the face detection module to locate facial regions, to which the automatic eye location algorithm is then applied. Once the two eye centres are located, they are used as reference points to normalize the face images in both rotation and scale.

Figure 7-7 Automatically normalized face images

Figure 7-7 shows some sample images normalized using the automatic face and eye location system, illustrating its robustness in a variety of situations. The sample images were captured in several different sessions: high quality camera with normal poses, low quality camera with normal poses, and high quality camera with the head looking down are shown in the first, second and third rows respectively. There are, however, several cases where incorrect regions are selected due to errors in the face detection or eye location modules. For example, the images shown in Figure 7-8 (a) are mainly background blocks, generated by errors in the face detection module. As shown in Figure 7-8 (b), the majority of the false eyes are located within hair regions, where many edges exist. Inclusion of too much background in the face region may also lead to

errors, since this can lead to an inaccurate initial guess for the eye regions. A statistical analysis of the eye location results on the BANCA database has also been performed and is shown in Table 7-1. In this analysis, the correct location of eyes is registered only if both eye centres are close enough to the ground truth data, in this case $d_e < a$ with $a = 0.2$.

Figure 7-8 Wrong locations caused by the face detection module (a); the eye location module (b)

Performance of face detection module: true face detections (99.30%); false face alarms 15 (0.55%)
Performance of eye location module: true eye locations 2674 (98.64%); false eye alarms 37 (1.36%)

Table 7-1 Statistical results on the BANCA database

Integration with the Face Verification System

Recall that a Gabor wavelet based face verification module was developed in chapter 5. The verification system uses Gabor wavelets for feature extraction, GDA for enhancement and KNN for classification. The system was fully tested using the BANCA database according to the protocol of the face verification competition held in 2004 (Messer et al., 2004). The comparative results show that the performance of the system is among the top methods. Since the system in chapter 5 normalizes images with manually located eyes, it is regarded as partially automatic. In the following experiments, an automatic verification system (Gabor + GDA) is developed by integrating the eye location module and tested using the same database. The results for the automatic verification system, together with those of other automatic algorithms, are shown in Table 7-2. The results

verify the robustness of both the eye location and face verification algorithms proposed in this thesis. The performance of the Gabor + GDA method ranks within the top three and is significantly better than that of many other methods. The average error rate of Gabor + GDA is only 6.58%. Similar to the results reported in chapter 5, since a subject specific threshold is used, the other method developed by us, Gabor + PCA + SVM, achieves better performance than Gabor + GDA.

Table 7-2 Verification results for fully automatic systems
Columns: R = 0.1 (WER) Dev., Eval.; R = 1 (WER) Dev., Eval.; R = 10 (WER) Dev., Eval.; Avg.
Systems listed: IDIAP HMM; IDIAP Fusion; QUT; Univ Nottingham Gabor + PCA + SVM + subject specific thresholds; UniS-Fusion; UCL-LDA; UCL-Fusion; NeuroInformatik; Tsinghua Univ; CMU; Gabor + GDA + global threshold

Due to minor errors in the automatic eye location algorithms, the automatic verification algorithms normally exhibit higher verification error than the partially automatic methods. The comparative results of the fully automatic systems and the corresponding partially automatic systems, which normalize faces with manually located eyes, are shown in Table 7-3. The results for several different research institutions on the same BANCA database are also included; see (Messer et al., 2004) for details. As expected, the automatic face detection and eye location modules increase the weighted error rate of Gabor + GDA from 4.48% to 6.58%; such an increase is common to all of the verification systems. However, the performance of the eye

location system is state of the art, and the developed automatic verification system achieves significantly lower error rates than many other systems.

Table 7-3 Comparative results for fully and partially automatic face verification systems
Columns: Error Rate with Manual Eye Location (%); Error Rate with Automatic Eye Location (%); Increase of Error Rate (%)
Systems listed: Gabor + GDA global threshold; Gabor + PCA + SVM subject specific thresholds; CMU; Univ. of Surrey; Tsinghua Univ

7.4 Conclusions

A generalized symmetry transform based eye location algorithm has been proposed in this chapter. The robustness of the algorithm was first tested using 1460 face images from the BioID database; 99% and 93% accuracy was achieved for face images without and with glasses respectively. The eye location algorithm has also been tested using 2730 images from the BANCA database, where about 98.6% accuracy was achieved. The results suggest that a more precise face locator could alleviate many of the eye location errors. An automatic verification system has been further developed by integrating the eye location module with the verification module (Gabor + GDA) proposed in chapter 5. The automatic system was fully tested using the BANCA database according to the protocols used in the recent Face Verification Competition. Though the error rate is larger than that reported in chapter 5, due to the misalignment among face images caused by the eye location algorithm, the performance of the automatic system is among the top three and better than that of most participants in the contest.
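The acceptance criterion behind the accuracy figures summarized above, i.e. the normalized error of equation (7.8) compared against a threshold a, can be sketched as follows. This is a minimal illustration: the function names and the sample coordinates are hypothetical, and the default a = 0.2 is the value quoted for the BANCA analysis.

```python
import math

def normalized_error(located, truth, truth_left, truth_right):
    """d_e of equation (7.8): location error divided by the true inter-eye distance w."""
    w = math.hypot(truth_left[0] - truth_right[0], truth_left[1] - truth_right[1])
    return math.hypot(located[0] - truth[0], located[1] - truth[1]) / w

def eyes_correct(located_left, located_right, truth_left, truth_right, a=0.2):
    """True only if both located eye centres satisfy d_e < a."""
    d_left = normalized_error(located_left, truth_left, truth_left, truth_right)
    d_right = normalized_error(located_right, truth_right, truth_left, truth_right)
    return d_left < a and d_right < a

# Made-up ground truth: eye centres 60 pixels apart on the same row.
truth_left, truth_right = (100.0, 100.0), (160.0, 100.0)
good = eyes_correct((103.0, 104.0), (158.0, 101.0), truth_left, truth_right)
bad = eyes_correct((103.0, 104.0), (160.0, 120.0), truth_left, truth_right)
```

Dividing by the inter-eye distance makes the criterion independent of the face size in the image, so the same threshold can be applied to faces captured at different scales.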

Chapter 8 The Developed User Identification System

This chapter presents an automatic user identification system developed at the initial stages of this research. The system consists of the following modules: face detection, registration and user information management. Once a subject is registered with the system, it can identify the registered person in real time when their face image is detected from a web cam. Based on its efficiency, the system is further developed to identify multiple persons simultaneously from real video streams. A video demo showing how the system works is also available.

8.1 System Architecture

Registration

Each candidate needs to be registered with the system before they can be identified. The registration process consists of the following steps: user information registry, face detection, feature extraction and/or model training, and feature/model saving. As shown in Figure 8-1, the process requires the full support of the face detection module, the user management module and the recognition module, which will be described in detail in the next section. About 30 staff from the Nottingham Computer Science School are registered with our system, with at least 5 face images on record for each subject. Once the face images are registered, the recognition module can be invoked to extract features or train subject specific models. These are then saved via the user management module for future identification purposes.

Figure 8-1 Registration flow chart. Steps shown: Start; initialize new person ID and register personal info; grab an image from camera; locate face; add face to the registering ID; sufficient face images? (yes/no); feature extraction and/or model training; save feature and/or model; End. Modules labelled: user management module, face detection module, recognition module.
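The registration flow of Figure 8-1 can be sketched in a few lines. The code below is a hypothetical illustration rather than the system's actual implementation: `UserRegistry` stands in for the user management module, while `detect_face` and `extract_features` are placeholder callables standing in for the face detection and recognition modules. The minimum of 5 face images follows the figure quoted in the text.

```python
MIN_FACES = 5  # at least 5 face images per subject, as stated above

class UserRegistry:
    """Hypothetical in-memory stand-in for the user management module."""

    def __init__(self):
        self._info = {}
        self._features = {}
        self._next_id = 1

    def new_person(self, info):
        """Initialize a new person ID and store the personal information."""
        person_id = self._next_id
        self._next_id += 1
        self._info[person_id] = info
        self._features[person_id] = []
        return person_id

    def save_features(self, person_id, features):
        self._features[person_id] = features

    def features(self, person_id):
        return self._features[person_id]

def register(registry, info, frames, detect_face, extract_features,
             min_faces=MIN_FACES):
    """Collect faces from camera frames until enough are found, then save features."""
    person_id = registry.new_person(info)
    faces = []
    for frame in frames:
        face = detect_face(frame)  # expected to return None when no face is found
        if face is not None:
            faces.append(face)
        if len(faces) >= min_faces:
            break
    if len(faces) < min_faces:
        raise ValueError("not enough face images for registration")
    registry.save_features(person_id, [extract_features(f) for f in faces])
    return person_id

# Toy run: strings stand in for frames, and an empty string means "no face found".
registry = UserRegistry()
frames = ["f1", "", "f2", "f3", "f4", "f5", "f6"]
person_id = register(registry, {"name": "Test User"}, frames,
                     detect_face=lambda frame: frame or None,
                     extract_features=len)
```

Passing the detector and feature extractor in as functions mirrors the remark in the abstract that different detection and recognition algorithms can be integrated into the framework.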

Figure 8-2 Identification flow chart. Steps shown: Start; grab an image from camera; locate faces; feature extraction; start from the first registered ID; retrieve feature/model for current ID; calculate similarity or probability; all registered IDs scanned? (yes/no); next registered ID; choose the ID with maximum probability/similarity; retrieve information registered with the ID; End. Modules labelled: face detection module, recognition module, user management module.

Identification

The aim of user identification is to identify a subject's ID when their face is presented before a web cam. The subject has to be registered with the system before they can be identified. The identification process, as shown in Figure 8-2, can be summarized as follows: when a user is sitting before the web cam, their face area is located and captured, then refined by the face detection module and finally passed to the recognition module for processing and identification. The recognition module compares the input face with each registered subject, either by matching features directly or by computing the probability. The face is then identified as the person whose features or model gives the maximum similarity or probability. The personal information registered with the ID is finally retrieved from the user management module and presented

by the system. Figure 8-3 shows a snapshot of the system. The screenshot shows that the system correctly identifies the user Dylan Shen and presents his personal information, i.e., name, address, age, etc. The left column in the interface shows a subset of the users who have had their faces registered with the system.

Figure 8-3 A snapshot of the user identification system

8.2 System Modules

Face Detection

The algorithm proposed in (Lienhart et al., 2002) is initially applied in the system for face detection. The method is a classification based algorithm, which cascades a series of Haar-like feature based face/non-face classifiers for efficient detection. The classifiers are all trained using the AdaBoost algorithm and combined to form the final classifier; more details can be found in (Viola et al., 2001; Lienhart et al., 2002). Once the classifier is learned, a window is used to scan the test image in search of face instances. Source code for the face detector is freely available in the Intel Open Source Computer Vision Library (Intel Corporation, 2005). Figure 8-4 shows a sample image with the located face marked by a red rectangle. As can be seen, the located face

area contains a lot of noise information, e.g. background and hair, which could affect the performance of the recognition algorithm. A skin mask module is developed and integrated into the system in order to refine the results of the initial face detection process.

Figure 8-4 A sample image with detected face (input image, face detection module, detected faces)

Skin Masking

It is widely accepted that the colour of human skin is distinct from the colour of many other natural objects. Analysing the statistics of skin colour, it can be observed that skin colours are distributed over a small area in the chrominance plane, with the major difference between skin tones being variations in intensity (Menser & Muller, 1999). To utilize skin colour properties for the face detection refinement process, an image is first converted into luminance and chrominance channels in the YCbCr colour space. Let $w_{ij} = [Cb_{ij} \; Cr_{ij}]^T$ denote a vector composed of the chrominance components Cb and Cr for a pixel $(i, j)$. The class-conditional pdf of $w_{ij}$ belonging to the skin class $x$ is modelled by a two-dimensional Gaussian (Menser et al., 1999; Bai & Shen, 2003b):

$$p(w_{ij} \mid x) = (2\pi)^{-1} \lvert \Sigma \rvert^{-1/2} \exp\left(-\frac{1}{2} [w_{ij} - \mu]^T \Sigma^{-1} [w_{ij} - \mu]\right) \qquad (8.1)$$

The Developed User Identfcaton System where the mean vector µ and the covarance matrx are estmated from the tranng set. Fgure 8-5 shows the dstrbuton of skn colours n the Cb and Cr domans.
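The skin colour model of equation (8.1) can be illustrated with a short Python sketch. It is not the thesis implementation: the function names, the maximum-likelihood covariance estimate (division by the number of samples) and the sample chrominance values are all assumptions made for illustration.

```python
import math

def fit_skin_model(samples):
    """Estimate the mean vector and 2x2 covariance from (Cb, Cr) training pairs.

    The covariance is returned as (s_bb, s_br, s_rr); dividing by n
    (maximum likelihood) is an assumption, not taken from the thesis.
    """
    n = len(samples)
    mu_cb = sum(cb for cb, _ in samples) / n
    mu_cr = sum(cr for _, cr in samples) / n
    s_bb = sum((cb - mu_cb) ** 2 for cb, _ in samples) / n
    s_rr = sum((cr - mu_cr) ** 2 for _, cr in samples) / n
    s_br = sum((cb - mu_cb) * (cr - mu_cr) for cb, cr in samples) / n
    return (mu_cb, mu_cr), (s_bb, s_br, s_rr)

def skin_pdf(cb, cr, mu, cov):
    """Evaluate the two-dimensional Gaussian density at one chrominance pair."""
    s_bb, s_br, s_rr = cov
    det = s_bb * s_rr - s_br ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    db, dr = cb - mu[0], cr - mu[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (s_rr * db * db - 2.0 * s_br * db * dr + s_bb * dr * dr) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

def skin_probability_image(cb_plane, cr_plane, mu, cov):
    """Map Cb and Cr planes (lists of rows) to a skin probability image."""
    return [[skin_pdf(cb, cr, mu, cov) for cb, cr in zip(cb_row, cr_row)]
            for cb_row, cr_row in zip(cb_plane, cr_plane)]

# Hypothetical training pairs, for illustration only.
training = [(100.0, 150.0), (110.0, 160.0), (105.0, 150.0),
            (95.0, 155.0), (102.0, 148.0)]
mu, cov = fit_skin_model(training)
```

Pixels whose chrominance lies near the estimated mean receive the largest density values, so thresholding or normalising the resulting image is one plausible way to obtain a skin mask.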

The contour of the pdf defines an ellipse in the CbCr domain, whose centre and principal axes are determined by $\mu$ and $\Sigma$, respectively. After building the skin colour model, the original colour image can be easily converted to a skin probability image P using equation (8.1). The image P indicates the probability of each image pixel belonging to the skin class x, i.e., $P(i, j) \sim p(w_{ij} \mid x)$. Figure 8-6 (a) and (b) show the input colour image I and the skin probability image P respectively.

Figure 8-5 Distribution of skin colours in the Cb, Cr domain

Ellipse Masking and Head Orientation Estimation

Once the face region is extracted from the input image, an ellipse fitting method (Bradski, 1998) can be used to approximate the skin blob and estimate the head orientation. Details of the fitting algorithm, which is based on statistical analysis of the skin probability image, can be found in Appendix C. Once the parameters of the ellipse approximation of the skin blob are determined, a face image can be masked by the ellipse with major axis l, minor axis w, and orientation θ. Figure 8-6 (c) shows the ellipse masked face image from (a). Figure 8-7 shows the fitted face with different orientations. Each face has been masked with a corresponding ellipse.
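One common way to obtain ellipse parameters from a probability image is through its weighted second-order moments. The sketch below follows that idea. It is an assumption that this matches the procedure in Appendix C, and the function name is hypothetical. The returned lengths are the standard deviations along the principal axes, so any scale factor used to draw the actual mask would have to be chosen separately.

```python
import math

def fit_ellipse(prob):
    """Fit an ellipse to a probability image given as a list of rows.

    Returns ((xc, yc), l, w, theta): centroid, spread along the major and
    minor principal axes, and the orientation of the major axis in radians
    (measured from the x axis, with y increasing downwards).
    """
    rows, cols = len(prob), len(prob[0])
    m00 = sum(sum(row) for row in prob)
    if m00 <= 0:
        raise ValueError("probability image has no mass")
    xc = sum(x * prob[y][x] for y in range(rows) for x in range(cols)) / m00
    yc = sum(y * prob[y][x] for y in range(rows) for x in range(cols)) / m00
    # Central second-order moments, i.e. the weighted covariance of (x, y).
    a = b = c = 0.0
    for y in range(rows):
        for x in range(cols):
            p = prob[y][x]
            a += (x - xc) ** 2 * p
            b += (x - xc) * (y - yc) * p
            c += (y - yc) ** 2 * p
    a, b, c = a / m00, b / m00, c / m00
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    # Eigenvalues of [[a, b], [b, c]] give the variances along the axes.
    root = math.sqrt(4.0 * b * b + (a - c) ** 2)
    l = math.sqrt((a + c + root) / 2.0)
    w = math.sqrt(max((a + c - root) / 2.0, 0.0))
    return (xc, yc), l, w, theta
```

Because only sums over the image are needed, the fit is cheap enough to be recomputed for every detected face.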


The two axes, centroid and orientation of the ellipse are indicated by a cross.

Figure 8-6 Detected face image (a); skin probability image (b) and masked face image (c)

Figure 8-7 Ellipse fitting for faces with different orientations

Recognition

As shown in Figure 8-8, the recognition module works in two modes: registration and identification. In registration mode the module extracts features and/or trains models for future processing; in identification mode it must compare the test face with each registered ID. An HMM based face recognition method is adopted in this system, which treats a face image as a sequence of states produced when the face is scanned from top to bottom. More interesting is the 2D embedded HMM proposed by Nefian (Nefian et al., 1999). The embedded HMM consists of a set of super states, with each super state being associated with a set of embedded states. Super states represent primary facial regions, whilst the embedded states within each super state describe the facial region in more detail. Nefian defined 5 super states: forehead, eyes, nose, mouth and chin. Transitions between embedded states in different super states are not allowed.

In an HMM based face recognition implementation, a face image is divided into a series of overlapping image blocks; the observation sequence can then be generated by concatenating the observation vectors extracted from each image block for HMM training. Once HMM models are trained using registered face images, the observation sequence extracted from a test image is used as input to all of the trained HMMs associated with each person and the conditional probability given by each HMM is calculated. The identity of the input face is determined by the HMM which produces the highest probability. Figure 8-9 shows the flow chart of a generic HMM based face recognition system.

Figure 8-8 Recognition module diagram (diagram labels: Registration Request with person ID and the registered faces; Registration Reply with model parameters or features; Identification Request with test face and person ID; Identification Reply with probability or matching score)

The observation vectors $O_t$ could simply be the grey values of pixels in the image block. However, such a method is sensitive to image variation due to illumination, translation and rotation. Moreover, since the dimension of the observation vectors is high, much computation is required. Image transform techniques help to make the model more robust and perform feature dimension reduction at the same time. Nefian et al. apply the 2D Discrete Cosine Transform (DCT) to each image block and only the low frequency coefficients are extracted to produce observation vectors. Due to their origins in simultaneous time and frequency analysis, wavelets are widely believed to be advantageous for image representation over other mathematical transforms such as the Fourier transform or DCT. Therefore, a Discrete Wavelet Transform (DWT) based HMM has also been proposed in (Bai et al., 2003a) for face recognition. Compared with DCT, DWT based HMMs achieved higher accuracy at the expense of slightly reduced efficiency. Both methods have been implemented in the system and can be switched between according to application requirements.
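The DCT variant of the observation extraction, and the final decision rule, can be sketched as follows. This is an illustration under stated assumptions rather than the system's code: the block size, step, number of retained coefficients and the raster scan order are hypothetical choices, and each trained HMM is represented by a stand-in scoring function instead of a real likelihood computation.

```python
import math

def dct2_lowfreq(block, keep):
    """Return the keep x keep lowest-frequency orthonormal 2D DCT-II
    coefficients of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    coeffs = []
    for u in range(keep):
        for v in range(keep):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs.append(alpha(u) * alpha(v) * total)
    return coeffs

def observation_sequence(image, block=8, step=4, keep=2):
    """Scan overlapping blocks (step < block) and build one observation
    vector per block from its low-frequency DCT coefficients."""
    rows, cols = len(image), len(image[0])
    sequence = []
    for top in range(0, rows - block + 1, step):
        for left in range(0, cols - block + 1, step):
            patch = [row[left:left + block] for row in image[top:top + block]]
            sequence.append(dct2_lowfreq(patch, keep))
    return sequence

def identify(sequence, models):
    """Return the registered ID whose model gives the highest score.

    `models` maps an ID to a callable returning a (log-)probability of the
    sequence; the callable stands in for a trained HMM.
    """
    return max(models, key=lambda person: models[person](sequence))
```

Swapping `dct2_lowfreq` for a wavelet transform would give the DWT variant while leaving the scanning and decision steps unchanged, which is one way the two methods could share a single interface.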

Figure 8-9 The HMM face recognition algorithm (diagram labels: DCT, DWT; observation vectors O_1, O_2, ..., O_t, ..., O_T; Tom's HMM, Dylan's HMM, John's HMM)

Figure 8-10 User management module diagram

User Management

ID management, image feature/model management and personal information management are the three main functions of the user management module (see Figure 8-10 for details). The image feature/model manager is mainly concerned with the recording of all image feature/model files for registered subjects. The record gives an overview of the face database, as well as the details about saved image feature/model files, e.g. the path, number etc. The module updates each record whenever there is a relevant change and responds to image feature/model retrieval requests when queried with a user ID. The user ID manager is mainly responsible for issuing new IDs and removing old IDs. The personal information manager maintains data regarding each registered user's name, address, age, sex etc. Since the data is stored on a MySQL server, the personal information manager requires a database engine to interpret the Add/Delete, Update and Query SQL requests.

8.3 Conclusions

An automatic face based user identification system has been presented in this chapter. Integrating face detection, recognition and user management modules, the system can locate faces in images captured by a normal web cam and recognize a subject's identity in real time. The flowcharts for two of the most important processes (registration and identification) have been described and the main functions of the three system modules have been explained in detail. A database with about 30 subjects, who are mainly students and staff from the University of Nottingham Computer Science School, has also been built to test the system. The system has shown excellent performance with high efficiency when this small database is used.

Based on this framework, a video based face identification system has also been developed. The system can detect multiple faces in a real time video stream and identify each of them. Figure 8-11 shows a snapshot of the video based system, where three faces are detected, identified and labelled with the registered names. A demo of the system can also be found at: The system, when running on a P4-1.8GHz PC, can support video streams with frame rates of up to 3 frames/sec.


Figure 8-11 A snapshot of the video based identification system

Since the system was developed at the initial stages of the research, the HMM based recognition algorithm is adopted. However, the algorithm has been shown to be suitable only for small databases. The results reported in chapter 4 show that although the DWT-HMM method achieves 97.5% accuracy on the ORL database (40 subjects), the figure drops dramatically to 44.5% on the subset of the FERET database (200 subjects) used for testing. A more robust method, such as the Gabor wavelet based approach described in this thesis, which has been fully tested using a number of large databases, could easily be integrated into the system framework for additional performance improvements.

Chapter 9 Conclusions and Future Works

A fast and robust Gabor wavelet based method has been proposed for face recognition in this thesis and the method has been fully tested using public databases, e.g. FERET, BANCA etc. This chapter gives a summary of the work presented in previous chapters and some suggestions for future developments.

9.1 Summary of Works

An Overview of Gabor Wavelets: Background and Applications

A detailed review of the background and applications of Gabor wavelets has been presented in this thesis. Introduced by Dennis Gabor in 1946, the 1D Gabor function was first proposed for joint time-frequency analysis of time signals. Mathematical analysis shows that the Gabor wavelet, as a member of the wavelet family, achieves the optimal resolution in both the time and frequency domains. In the spatial domain, researchers have presented evidence showing the similarity of 2D Gabor wavelets to the receptive fields of mammalian visual cortex cells. Motivated by the mathematical background and biological evidence, 2D Gabor wavelets have been widely applied in different computer vision and pattern recognition applications including face recognition. A literature review on the application of Gabor wavelets for face representation has also been performed in this research. Aiming to give some guidance to researchers in this area, the review presented the latest Gabor wavelet based methods available in the literature and discussed both the limitations and advantages of different approaches.

Gabor Wavelets and Kernel Subspace Methods for Face Identification and Verification

Though face recognition has been an active research area for many years, it is still an unsolved problem due to the complex distortions caused by expression, pose and illumination variation. However, the task seems to be trivial for human beings. With the aid of complex perceptual systems, such as the visual cortex, it is very common for a human to recognize thousands of people, even in the presence of dynamic variations of face shape, pose, expression and appearance. Based on the overview of the background and applications of Gabor wavelets, they are adopted in this research as a method to extract robust features for face recognition purposes. Once the features are extracted, nonlinear kernel subspace analysis, i.e. GDA, is further applied for dimension reduction and class separability enhancement. The combination of Gabor wavelets and kernel methods has been successfully applied to face identification and verification and fully tested using public databases, e.g. ORL, FERET and BANCA. While the proposed method has achieved better performance than other state of the art identification algorithms on the ORL and FERET databases, it has also been shown to be more robust than most of the participants in the recent face authentication test using the BANCA database.

Learning the Most Important Gabor Features for Object Detection and Recognition

Despite the robustness of Gabor wavelet based methods, they require high computation and memory cost. Since a set of 40 wavelets is convolved with images, the feature extraction process takes a long time. Though the FFT could be used to speed up the convolution process, the huge dimension of the extracted features also brings high computation cost to the classification process. As a result, a feature selection method is required to eliminate redundant features for dimension reduction. In this thesis, the AdaBoost algorithm is first applied to select Gabor features for object detection. Since both feature selection and classifier training can be completed in the same learning process, the classifier using the selected Gabor features can be used for object detection directly. A novel feature selection algorithm, MutualBoost, has also been proposed and successfully applied to select Gabor features for face recognition. In particular, the mutual information between candidate features is used as an additional criterion to select the most important Gabor features one by one. Compared with AdaBoost selected features, the results show that Gabor features learned using MutualBoost are more discriminative and achieve better recognition accuracy. Both systems have been compared with those using the pre-selected Gabor features; substantial efficiency improvements have been observed without performance deterioration. The face recognition system using the selected Gabor features has also been compared with other state of the art methods on the whole FERET database according to the evaluation protocol and better accuracy has been achieved. The face recognition system thus developed is both robust and efficient.

Automatic Eye Location

To normalize the scale and orientation of different face images, an automatic eye location algorithm is required before the robust Gabor feature based face recognition system can be applied in real applications. Though there are quite a number of complex methods available, most eye location systems are only tested using a limited number of images and they normally require many training samples. The method proposed in this thesis is, however, very simple and requires no training images. The approach is based on a context free feature detector, the generalized symmetry transform, which requires no prior knowledge about eyes. Once those areas with large symmetry values are located, eyes can be easily located at the centre of these regions. The robustness of the algorithm is first tested using 1460 face images from the BioID database; 99% and 93% accuracy are achieved for face images with and without glasses respectively. The eye location algorithm has also been tested using 2730 images from the BANCA database; about 98.6% accuracy has been achieved. Based on the proposed eye location module, a fully automatic verification system has also been developed by integrating the verification module (Gabor + GDA) proposed in chapter 5. The automatic system is tested using the BANCA database according to protocols defined by the recent Face Verification Competition. The performance of the automatic verification system is one of the top three and better than most of the participants in the contest.

The User Identification System

An automatic real time user identification system has been developed in this research. The system consists of three main modules: face detection, recognition and user management. With the full support of each of these modules, the system can efficiently detect faces from images captured by a web cam, extract features and identify the user. The system can also function in registration mode such that the personal information, face images and model/features can be registered and saved either in files or in the MySQL database. Utilising the high efficiency of the proposed techniques, a video based face identification system has been further developed, which can detect multiple faces from a real time video stream, identify them and display their names. The modular design of the system allows a large degree of flexibility, allowing for future expansion and the integration of any new face detection or recognition algorithms.

9.2 Future Works

Extensions of the Present Works

A Complete Gabor Feature Based Object Detection System

Though the Gabor feature based classifier has shown the ability to discriminate car and non-car images as well as face and non-face images, more work still needs to be done before the classifier can be applied in real object detection applications. For classification based detection methods, an image is usually scanned by an n × n window with a step size of one pixel. Each image window is then input to the learned classifier to make a classification decision, i.e., object or background. Figure 9-1 shows a typical classification based face detection system. To deal with scale variance, the image is usually rescaled by s different factors such that a set of multi-resolution images is generated; the detection process can be applied to each image thereafter. As a result, the number of images to be processed by the classifier is huge, with more than 90,000 windows (n = 20) needing to be classified for an image with size when s = 5.

Figure 9-1 A classification based face detection system

Based on the fact that most of the scanned image blocks are actually background (see Figure 9-1), a cascade of classifiers is used in (Lienhart et al., 2002; Viola et al., 2001) to speed up the detection process. Figure 9-2 shows the cascade structure of three classifiers. Simple classifiers are used to reject the majority of the sub windows before more complex classifiers are applied. The simple classifiers are adjusted such that the false negative rate is close to zero. A positive result from the first classifier triggers the evaluation of the second classifier with high detection rates, and so on. A negative result at any point leads to the immediate rejection of the sub window. As such, the cascade attempts to reject as many negative windows as possible at the earliest stage possible. Such a cascade structure shall also be used to learn a Gabor feature based classifier for real time object detection. The classifier at the 1st stage could be one which uses only two Gabor features with a minimized false negative rate. Subsequent classifiers will require a larger number of features. To reduce the computation cost of feature extraction, the classifiers at early stages could also be trained using simpler
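The window count and the early-rejection behaviour of a cascade described in this subsection can be sketched in Python. The sketch is illustrative only: the function names are hypothetical, the image size and scale factors passed in the usage lines are made-up values (they are not the thesis's figures), and each stage is a stand-in for a trained classifier.

```python
def count_windows(width, height, n, scale_factors):
    """Count the n x n windows scanned with a one pixel step over every
    rescaled copy of a width x height image."""
    total = 0
    for factor in scale_factors:
        w, h = int(width * factor), int(height * factor)
        if w >= n and h >= n:
            total += (w - n + 1) * (h - n + 1)
    return total

def cascade_accepts(window, stages):
    """Return True only if every stage accepts the window.

    `stages` is an ordered list of callables returning True (object) or
    False (background). Evaluation stops at the first rejection, so cheap
    early stages can discard most background windows.
    """
    for stage in stages:
        if not stage(window):
            return False
    return True

# Hypothetical usage: a small image scanned at two scales.
n_windows = count_windows(100, 80, 20, [1.0, 0.5])

# Stand-in stages operating on a window given as a list of rows of grey values.
def bright_enough(window):
    return sum(map(sum, window)) > 0

def has_contrast(window):
    return max(map(max, window)) != min(map(min, window))

accepted = cascade_accepts([[0, 1], [2, 3]], [bright_enough, has_contrast])
```

Ordering the stages from cheapest to most expensive is what yields the speed-up: the later, costlier stages only see the small fraction of windows that survived every earlier test.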
