Human Face Recognition Using Generalized Kernel Fisher Discriminant


Bing-Yu Sun 1,2, De-Shuang Huang 1, Lin Guo 1
1. Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui, China
2. Department of Automation, University of Science and Technology of China
Emails: besun@sohu.com, dshuang@iim.ac.cn, lguo@iim.ac.cn

Abstract - In this paper the generalized kernel Fisher discriminant (GKFD) method is used to perform pattern feature extraction and recognition for human face images. First, we extend the KFD, originally used in pattern classification problems, to the generalized KFD (GKFD), which is used in feature extraction problems. Compared with several commonly used feature extraction methods, the GKFD can not only reduce the dimension of the input pattern but also provide useful information for pattern classification. Further, the GKFD also performs well for linearly nonseparable pattern classification problems because it possesses a nonlinear transformation capability. Finally, the experimental results on human face recognition problems demonstrate the effectiveness and efficiency of our approach.

1 Introduction

In classification and other data-analytic tasks it is often necessary to perform pre-processing on the data before applying the algorithm at hand. The most common pre-processing method is to extract features from the problems involved so that the tasks are more easily resolved. Feature extraction for classification differs significantly from feature extraction for describing data. For example, principal component analysis (PCA) [6][7] finds the directions that have minimal reconstruction error by describing as much variance of the data as possible with m orthogonal directions, while the Fisher linear discriminant (FLD) [4] chooses projections under which different classes of patterns are well separated. So, the FLD can provide useful information for classification, while the PCA is mostly used to perform dimension reduction.
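The contrast between the two criteria can be made concrete with a small numerical sketch (hypothetical data, not from the paper): when the direction of largest variance is orthogonal to the direction that separates the classes, PCA and the FLD pick out very different projections.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two elongated classes: large variance along x, class separation along y.
A = rng.normal([0.0, 0.0], [5.0, 0.3], (200, 2))
B = rng.normal([0.0, 2.0], [5.0, 0.3], (200, 2))
X = np.vstack([A, B])

# PCA direction: leading eigenvector of the total covariance (max variance).
evals, evecs = np.linalg.eigh(np.cov(X.T))
pca_dir = evecs[:, -1]            # close to the x-axis

# FLD direction: w = S_W^{-1} (mu_A - mu_B), maximizing the ratio of
# between-class to within-class scatter along the projection.
Sw = np.cov(A.T) * (len(A) - 1) + np.cov(B.T) * (len(B) - 1)
fld_dir = np.linalg.solve(Sw, A.mean(0) - B.mean(0))
fld_dir /= np.linalg.norm(fld_dir)  # close to the y-axis
```

Projecting on `pca_dir` preserves variance but mixes the classes; projecting on `fld_dir` separates them, which is the point made above.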
It should be noted that the FLD is only a linear transformation, which maximizes the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples. As a result, the FLD method achieves globally optimal performance only for linearly separable problems. Hence, to overcome this drawback of the FLD, a kernel idea, originally applied in support vector machines (SVMs) [1], can be used to construct a new kernel-based FLD; this method is referred to as the kernel Fisher discriminant (KFD). In fact, the KFD has been successfully used in pattern recognition problems, but it can only solve problems with two classes; for more details, please refer to [2][3]. In this paper we extend the KFD to perform feature extraction for multi-class problems; the method is thus named the generalized KFD (GKFD). The features obtained by the GKFD are directly classified by the nearest neighbor method. In this paper, we take human face recognition data as an example to conduct the related experiments. The experimental results verify the effectiveness and efficiency of our proposed GKFD.

2 A Novel Method for Feature Extraction Based on the Generalized Kernel Fisher Discriminant

As we pointed out in Section 1, the FLD is only a linear transformation, and it achieves globally optimal performance only for linearly separable data. But most real-world data are linearly nonseparable, i.e., nonlinearly separable. To overcome this limitation, the KFD method was proposed [2]. The main idea of the KFD, which has proved powerful in pattern recognition problems, is to address the problem of the FLD in a kernel feature space, thereby yielding a nonlinear discriminant in input space (as shown in Fig. 1). For a set of originally nonlinearly separable data, the kernel method [1] can guarantee that the data become linearly separable when they are mapped into a feature space by a kernel function. With this idea, the KFD resolves a binary classification problem by first projecting all the data onto the optimal projection vector found by the KFD, and then classifying the data according to their projection values.

Fig. 1 The sketch of the main idea of the KFD (input space vs. feature space). By using a kernel function, the originally nonlinearly separable data in input space become linearly separable in feature space.
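The kernel idea sketched in Fig. 1 can be illustrated with a toy example (a hypothetical two-ring dataset, not from the paper). The explicit feature map below belongs to the second-degree polynomial kernel k(x, y) = (xᵀy)², chosen here only because it can be written out in closed form; the paper itself uses a Gaussian kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
inner = 1.0 * np.c_[np.cos(theta[:50]), np.sin(theta[:50])]   # class 0: radius 1
outer = 3.0 * np.c_[np.cos(theta[50:]), np.sin(theta[50:])]   # class 1: radius 3
X = np.vstack([inner, outer])

# Explicit feature map of the polynomial kernel k(x, y) = (x . y)^2:
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).  No straight line separates the two
# rings in input space, but in feature space the linear functional
# phi_1 + phi_3 = x1^2 + x2^2 = r^2 already separates them.
Phi = np.c_[X[:, 0] ** 2, np.sqrt(2.0) * X[:, 0] * X[:, 1], X[:, 1] ** 2]
r2 = Phi[:, 0] + Phi[:, 2]   # a linear function of phi(x)
# Class-0 values are all 1.0 and class-1 values are all 9.0, so any
# threshold between them (e.g. 5.0) classifies perfectly.
```

This is exactly the mechanism the KFD relies on: a linear discriminant in the kernel-induced feature space is a nonlinear discriminant in input space.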
But for a multi-class classification problem, one projection vector is not enough, so in this section we extend the KFD to the generalized KFD (GKFD) for performing the feature extraction of multi-class pattern recognition problems. Consider a set of n sample vectors $\{x_1, x_2, \ldots, x_n\}$, and assume that each vector belongs to one of c classes $\{X_1, X_2, \ldots, X_c\}$. To derive the GKFD, we first map the data $\{x_1, x_2, \ldots, x_n\}$ by $\Phi$, a non-linear mapping, into a feature space F. Then, in F, the optimal subspace $W_{opt}$ that we seek is determined as follows:

$$W_{opt} = \arg\max_{W} \frac{|W^T S_B^\Phi W|}{|W^T S_W^\Phi W|} = [w_1, w_2, \ldots, w_m] \tag{1}$$

where $S_B^\Phi$ and $S_W^\Phi$ are the corresponding between-class and within-class scatter matrices in F, i.e.,

$$S_B^\Phi = \sum_{i=1}^{c} n_i\,(\mu_i^\Phi - \mu^\Phi)(\mu_i^\Phi - \mu^\Phi)^T \tag{2}$$

$$S_W^\Phi = \sum_{i=1}^{c} \sum_{x_k \in X_i} (\Phi(x_k) - \mu_i^\Phi)(\Phi(x_k) - \mu_i^\Phi)^T \tag{3}$$

with $\mu_i^\Phi = \frac{1}{n_i}\sum_{x_k \in X_i}\Phi(x_k)$ and $\mu^\Phi = \frac{1}{n}\sum_{i=1}^{n}\Phi(x_i)$. Each $w_i$ in eqn (1) can be computed by solving the following generalized eigenvalue problem:

$$S_B^\Phi w = \lambda\, S_W^\Phi w \tag{4}$$

From the theory of reproducing kernels we know that any solution $w \in F$ must lie in the span of the mapped data, i.e., $w \in \mathrm{span}\{\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)\}$, which can be written as

$$w = \sum_{i=1}^{n} \alpha_i\,\Phi(x_i) \tag{5}$$

Using this expansion, the numerator of eqn (1) can be rewritten as

$$w^T S_B^\Phi w = \alpha^T M \alpha \tag{6}$$

where

$$M = \sum_{i=1}^{c} n_i\,(M_i - \bar{M})(M_i - \bar{M})^T \tag{7}$$

$$\Phi(x_i)^T \Phi(x_j) = k(x_i, x_j) \tag{8}$$

$$\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T \tag{9}$$

In eqn (7) we have defined

$$(M_i)_j = \frac{1}{n_i} \sum_{x_k \in X_i} k(x_j, x_k) \tag{10}$$

$$(\bar{M})_j = \frac{1}{n} \sum_{k=1}^{n} k(x_j, x_k) \tag{11}$$

Now, considering the denominator of eqn (1) and using a similar transformation, we have

$$w^T S_W^\Phi w = \alpha^T L \alpha \tag{12}$$

where $L = \sum_{j=1}^{c} K_j (I - 1_{n_j}) K_j^T$; $K_j$ is an $n \times n_j$ matrix with $(K_j)_{nm} = k(x_n, x_m)$, $x_m \in X_j$; $I$ is the identity matrix and $1_{n_j}$ is the matrix with all entries $1/n_j$. Combining eqn (6) and eqn (12), the optimal subspace $W_{opt}$ can be determined by the following formula:

$$\alpha_{opt} = \arg\max_{\alpha} \frac{\alpha^T M \alpha}{\alpha^T L \alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_m] \tag{13}$$

and the following eqn (14):

$$W_{opt} = [w_1, w_2, \ldots, w_m] = \left[\sum_{i=1}^{n}\alpha_{1i}\Phi(x_i),\; \sum_{i=1}^{n}\alpha_{2i}\Phi(x_i),\; \ldots,\; \sum_{i=1}^{n}\alpha_{mi}\Phi(x_i)\right] \tag{14}$$

To extract the features of a new pattern x with the GKFD, we simply project the mapped pattern $\Phi(x)$ onto this subspace; the result is

$$\Phi(x)^T W_{opt} = \left[\sum_{i=1}^{n}\alpha_{1i}\,k(x_i, x),\; \sum_{i=1}^{n}\alpha_{2i}\,k(x_i, x),\; \ldots,\; \sum_{i=1}^{n}\alpha_{mi}\,k(x_i, x)\right] \tag{15}$$

From the above analysis, we can draw several conclusions, stated in the following remarks:

Remarks: 1) From the expression of $S_B^\Phi$ we can see that, assuming the dimension of the mapped space is r, $\mathrm{rank}(S_B^\Phi) \le \min(r, c-1)$. As literature [3] has outlined, the dimension of the feature space is equal to or higher than the number n of training samples, which makes regularization necessary. Accordingly, there are at most c-1 generalized eigenvectors corresponding to nonzero eigenvalues; in other words, the GKFD transforms a new pattern x into a vector of dimension c-1. 2) It can also be found that the relation $\mathrm{rank}(S_W^\Phi) \le \min(r, n-c)$ holds. Since $r > n$, $\mathrm{rank}(L) = \mathrm{rank}(S_W^\Phi) \le n - c < n$; obviously, the matrix L is singular. To overcome this problem, we can use the same method as in [2][3] and add a multiple of the identity matrix to L:

$$L_\mu = L + \mu I \tag{16}$$

3) A very important problem for the GKFD is how to select the kernel function and its parameters, since different kernel functions have different effects on the performance of the problems involved. So far, however, how to select the kernel function is still an open problem. Usually the candidates for the optimal kernel function are determined by some heuristic rule, where the one that minimizes a given criterion is chosen.
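The procedure of eqns (7)-(16) can be sketched numerically. The following is a minimal, unoptimized implementation of our own (the names `gkfd_fit`/`gkfd_transform` and the regularization constant `mu` are hypothetical choices, not from the paper), using a Gaussian kernel:

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def gkfd_fit(X, y, sigma=1.0, mu=1e-3, m=None):
    """Build M (eqns 7, 10, 11) and L (eqn 12), regularize L (eqn 16),
    and solve the generalized eigenproblem M alpha = lambda L alpha."""
    n = X.shape[0]
    classes = np.unique(y)
    K = rbf_kernel(X, X, sigma)          # n x n kernel matrix, eqn (8)
    M_bar = K.mean(axis=1)               # (M_bar)_j, eqn (11)
    M = np.zeros((n, n))
    L = np.zeros((n, n))
    for c in classes:
        idx = np.where(y == c)[0]
        n_c = len(idx)
        K_c = K[:, idx]                  # n x n_c block K_j for class c
        d = (K_c.mean(axis=1) - M_bar)[:, None]     # (M_i - M_bar), eqn (10)
        M += n_c * d @ d.T                          # eqn (7)
        ones = np.full((n_c, n_c), 1.0 / n_c)       # the 1_{n_j} matrix
        L += K_c @ (np.eye(n_c) - ones) @ K_c.T     # eqn (12)
    L += mu * np.eye(n)                  # regularization, eqn (16)
    vals, vecs = eigh(M, L)              # symmetric-definite generalized eigenproblem
    order = np.argsort(vals)[::-1]       # largest eigenvalues first
    m = len(classes) - 1 if m is None else m        # at most c-1 useful directions
    return vecs[:, order[:m]], X, sigma

def gkfd_transform(model, Xnew):
    """Project new patterns via eqn (15): feature_j = sum_i alpha_ji k(x_i, x)."""
    alphas, Xtrain, sigma = model
    return rbf_kernel(Xnew, Xtrain, sigma) @ alphas
```

On well-separated toy clusters this maps each class to a tight cluster in a (c-1)-dimensional feature space, which is the behavior Remark 1 describes.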

A most commonly used method is cross validation, i.e., the training samples are divided into k subsets, each of which has the same number of samples. Then the performance of each candidate is evaluated k times: in the i-th (i = 1, 2, ..., k) iteration, the subsets except for the i-th one are used to conduct the training phase, while the i-th one is used to conduct the testing phase. At last, the candidate that achieves the best performance is selected. The extreme case in which k is equal to the number of samples is called leave-one-out cross validation.

3 Human Face Image Recognition Based on the Nearest Neighbor Method

Perhaps the simplest classification scheme in image classification is the nearest neighbor classifier (NNC). Under this scheme, an image in the test set is recognized by assigning to it the label of the closest point in the training set, where the distance is measured in the image space. If all the images are normalized to have zero mean and unit variance, then this procedure is equivalent to choosing the image in the training set that best correlates with the test image. In fact, because of the normalization process, the normalized images are independent of the light source intensity. But this NNC procedure has two well-known disadvantages. First, if the images in the training set and the test set are gathered under varying lighting conditions, then the corresponding points in the image space may not be tightly clustered. Second, this NNC procedure is computationally expensive. However, our proposed GKFD can overcome these drawbacks, so the nearest neighbor method can work efficiently as a classifier after feature extraction by the GKFD. The reason is that, on the one hand, images from the same class gather tightly after the GKFD transform; on the other hand, the dimension of the transformed sample points is significantly reduced.
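The NNC described above amounts to a single distance computation and an argmin; a minimal sketch (our own helper name, with hypothetical toy inputs) is:

```python
import numpy as np

def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """Nearest neighbor classifier (NNC): each test pattern receives the
    label of the closest training pattern in squared Euclidean distance."""
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    return train_labels[np.argmin(d2, axis=1)]
```

Applied to GKFD features rather than raw images, the distances are computed over only c-1 dimensions instead of the full image space, which is the efficiency gain referred to above.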
So in this paper the nearest neighbor method is selected as an efficient classifier for human face images.

4 Experiment Results

To verify the effectiveness and efficiency of our approach, in the following we present the related experimental results for human face image recognition.

Fig. 2 Example images from one subject in the ORL database

Our experiments were performed using the ORL database. This database includes 400 different images of 40 distinct subjects (ten per person). For some of the subjects, the images were taken at different times. Moreover, there are variations in facial expression, e.g., open/closed eyes, smiling/non-smiling, and in facial details, e.g., glasses/no glasses, etc. Fig. 2 shows the ten images of one subject. The original face images were all sized 92 × 112 with a 256-level gray scale. First the face images are preprocessed by applying the discrete wavelet transformation twice, which reduces the size of each image to 23 × 28; as a result, the computational complexity further decreases. The experiments were performed with five training images and five test images per person; thus a total of 200 training images and 200 test images are formed, with no overlap between the training and test sets. Since the recognition performance is affected by the selection of the training images, the reported results were obtained by training 20 recognizers with different training examples, formed by randomly selecting five of the ten images per subject.

The reported classification error has been averaged over all 20 experimental results. In our experiments the radial basis function of Gaussian form

$$k(x_1, x_2) = \exp(-\|x_1 - x_2\|^2 / \sigma^2) \tag{17}$$

is used as the kernel function, and the corresponding parameter is selected through the leave-one-out method.

In order to visualize the distribution of the human face image samples, we take the 3 components corresponding to the first 3 largest eigenvalues of the PCA and of the GKFD to draw the 3-dimensional topology. From Fig. 3 we can find that the originally overlapped data can be more easily classified after feature extraction by the GKFD, while the 3 components for the PCA are somewhat overlapped.

Fig. 3 The data distribution of 3 components for 3 subjects after (a) PCA and (b) GKFD

In order to further test the performance of our proposed approach, here we use the nearest neighbor method to perform the classification on the features extracted from the ORL database by the GKFD. (The so-called classification error means the percentage of samples erroneously labeled.) Fig. 4 shows the plot of the error rates vs. the change of dimension for four feature extraction methods: PCA, GKFD, PCA+FLD and PCA+GKFD. From this figure we can find that the error rate of the GKFD is the smallest among the four methods. Note that the FLD cannot be applied to the original data directly, for it can only process data with dimension lower than n − c, so the PCA has to be used before the FLD. Table 1 also shows the comparison of the smallest error rate vs. the reduced dimension for the four feature extraction methods. From this table it can be found that the GKFD obtains better performance for linearly nonseparable pattern recognition problems.

Table 1 Performance comparison of four feature extraction methods

    Method      Reduced dimension   Error rate (%)
    PCA         99                  8.5
    PCA+FLD     20                  8.6
    PCA+GKFD    39                  4.35
    GKFD        39                  2.6

Fig. 4 The error rate curves vs. the change of dimension for the four methods PCA, PCA+FLD, GKFD and PCA+GKFD

Note that in the testing case the error rate of the PCA+GKFD is higher than that of the GKFD. The reason is possibly that the data after the PCA transformation have lost some features of the original data.

In addition, Table 2 gives a summary of the performance comparison of five classification systems for recognizing the data from the ORL database. The corresponding error rates are the averages of 20 simulations. However, an individual simulation of the GKFD sometimes shows an error rate as low as 0.5%. In conclusion, the above experimental results show that the GKFD plus the nearest neighbor method can efficiently solve nonlinearly separable pattern recognition problems with multiple classes, compared with other traditional methods.

Table 2 Performance comparison of five classification systems

    System                                              Error rate (%)
    Eigenfaces [8]                                      10.0
    Pseudo-2D HMM [8]                                   5.0
    Probabilistic decision-based neural network [9]     4.0
    Linear SVMs [10]                                    3.0
    GKFD+NN                                             2.6

5 Conclusions

In this paper we extended the KFD, originally used as a classifier, to the field of feature extraction. The corresponding performance was verified on the human face image data from the ORL database. From the obtained experimental results we can draw the following conclusions: 1) Unlike the KFD, which aims at finding one optimal projection direction to perform a binary classification, the GKFD aims at finding an optimal subspace in the feature space to perform feature extraction for multi-class problems. 2) The GKFD performs well for linearly nonseparable pattern recognition problems because it is a nonlinear transformation, so it can almost obtain the optimal results for nonlinearly separable data. 3) Unlike the FLD, the GKFD can be applied to the original data directly, no matter what the dimension is. 4) In this paper the nearest neighbor method is used as the classifier because of its simplicity; if it is combined with more advanced classification methods such as RBFNs or SVMs, the error rate can be further reduced. Future work will include extending this method to problems with large-scale databases.

References:
[1] V. Vapnik, The Nature of Statistical Learning Theory, New York: Wiley, 1998.
[2] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher Discriminant Analysis with Kernels," Neural Networks for Signal Processing IX, pp. 41-48, 1999.
[3] K.-R. Müller and S. Mika, "An Introduction to Kernel-Based Learning Algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.
[4] R.A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, vol. 7, pp. 179-188, 1936.
[5] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.
[6] R. Lotlikar and R. Kothari, "Fractional-Step Dimensionality Reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 623-627, 2000.
[7] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.
[8] F.S. Samaria, "Face Recognition Using Hidden Markov Models," Ph.D. dissertation, Univ. of Cambridge, Cambridge, U.K., 1994.
[9] S.-H. Lin, S.-Y. Kung, and L.-J. Lin, "Face Recognition/Detection by Probabilistic Decision-Based Neural Network," IEEE Transactions on Neural Networks, vol. 8, pp. 114-132, Jan. 1997.
[10] G.D. Guo, S.Z. Li, and K.L. Chan, "Face Recognition by Support Vector Machines," in Proc. Int. Conf. on Automatic Face and Gesture Recognition, pp. 196-201, 2000.