Learning a Class-Specific Dictionary for Facial Expression Recognition

BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 4 Sofa 016 Prnt ISSN: 1311-970; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-016-0067 Learnng a Class-Specfc Dctonary for Facal Expresson Recognton Shqng Zhang 1, Gang Zhang, Yuel Cu 1, Xaomng Zhao 1 1 Insttute of Intellgent Informaton Processng, Tazhou Unversty, Tazhou, Chna Insttute of Guangzhou Qualty Supervson and Testng, Guangzhou, Chna Emals: tzczsq@163.com tzxyzz@16.com cuyuel@tzc.edu.cn tzxyzxm@163.com Abstract: Sparse codng s currently an actve topc n sgnal processng and pattern recognton. MetaFace Learnng (MFL) s a typcal sparse codng method and exhbts promsng performance for classfcaton. Unfortunately, due to usng the l 1-norm mnmzaton, MFL s expensve to compute and s not robust enough. To address these ssues, ths paper proposes a faster and more robust verson of MFL wth the l -norm regularzaton constrant on codng coeffcents. The proposed method s used to learn a class-specfc dctonary for facal expresson recognton. Extensve experments on two popular facal expresson databases,.e., the JAFFE database and the Cohn-Kanade database, demonstrate that our method shows promsng computatonal effcency and robustness on facal expresson recognton tasks. Keywords: Sparse codng, metaface learnng, sparse representaton, facal expresson recognton, robustness. 1. Introducton Facal expresson s the man manner of expressng and nterpretng the affectve states of human bengs. Facal expresson recognton focuses on dstngushng the human affectve states by usng facal expresson. Durng the last two decades, facal expresson recognton has become a hot research topc n pattern recognton, artfcal ntellgence and computer vson, owng to ts mportant applcatons n human-computer nteracton, artfcal ntellgence, securty montorng, socal entertanment [1-3]. As far as the classfcaton task of facal expresson s concerned, a varety of conventonal classfcaton methods have been appled so far for facal expresson recognton, such as Hdden Markov Model (HMM), Artfcal Neural Network (ANN), Support Vector Machnes (SVM), K-Nearest Neghbor (KNN), and so on. In recent years, a new type of classfcaton method called Sparse Representaton based Classfcaton (SRC)[4] has been used successfully for face recognton. However, SRC drectly employs all the tranng samples to construct the dctonary, 55

resultng n lots of redundancy, nose, and trval nformaton n the pre-defned dctonary. In addton, when the tranng samples grow, SRC wll suffer from the computaton bottleneck snce t uses the l 1-norm sparsty constrant on codng coeffcents. In recent years, Y a n g et al. [5] developed a MetaFace Learnng (MFL) method to learn a class-specfc dctonary, n whch metafaces are learned from the orgnal mages and then used as the dctonary to represent the nput query mage. It has been found n [5] that MFL s more effectve than SRC and t also brngs performance mprovement over SRC. However, t s noted that MFL has two shortcomngs. Frst, smlar to SRC, MFL also uses the l 1-norm sparsty constrant on codng coeffcents, whch needs to be solved by a tme-consumng teraton process. Second, n MFL the codng fdelty s measured by the l 1-norm of codng resdual under the assumpton that the codng resdual follows the Gaussan dstrbuton. However, n practce ths assumpton may not hold well n nosy envronment, where the codng resdual may not conform to the Gaussan dstrbuton. To overcome these drawbacks of MFL, we modfy the obecton functon of MFL and derve ts analytcal soluton wth the l -norm regularzaton constrant on codng coeffcents. Ths soluton gves rse to a faster and more robust verson of MFL, whch s called FR-MFL. The effectveness of the proposed FR-MFL method s verfed on facal expresson recognton tasks. The remander of ths paper s organzed as follows. In Secton, the orgnal MFL s revewed brefly. The proposed FR-MFL s descrbed n detal n Secton 3. Secton 4 gves the experment results and analyss. Fnally, conclusons are drawn n Secton 5.. Metaface learnng MetaFace Learnng (MFL) [5] ams to learn a class-specfc dctonary for each obect from the orgnal tranng samples and then uses the learned dctonary to represent the nput query mage. Suppose the tranng samples are denoted by D = [X 1, X,, X c] R d N where X c R d N c s the subset of all the N c vector-represented tranng samples from class c, and d s the feature dmenson, In SRC, a new test sample x R d l 1-mnmzaton optmzaton problem: (1) c N n s the total number of samples. 1 arg mn x D, can be sparsely coded by the followng 1 where s a postve scalar number whch s used as a tradeoff between the reconstructed error and the coeffcents sparsty. In MFL, a class-specfc dctonary D s learned by () D, A arg mn X D A A, 1 D, A s.t. d 1, 1,,..., K, 56

where X R d N denotes all the tranng samples from the -th class, d represents the -th column of the -th class-specfc sub-dctonary D = [d 1,, d K ] R d K, and A s the summaton of l 1-norm of all the columns of A = [ 1,, N ] R d N, 1 N.e., A 1 1. In Equaton (), a ont optmzaton problem of the metafaces D and the representaton coeffcent matrx A needed to be solved. As a mult-varable optmzaton problem, Equaton () can be solved by optmzng D and A alternatvely, as descrbed below. When fxng D, the obectve functon n Equaton () can be reduced to (3) A arg mn X D A A. 1 A That equaton s a convex optmzaton methods and can be obtaned by quadratc programmng, such as the teratve l 1 -regularzed least squares ( l 1 -ls) [6] algorthm. Fxng A, D can be updated by solvng the followng obecton functon (4) D arg mn X D A, s.t. D d 1, 1,,..., K. That equaton can be solved by usng the Langrage multpler, the fnal analytcal soluton s T T (5) d Y / Y, where Y X d. When updatng d, all the other columns of D,.e., dl, l, are fxed. l 3. The proposed FR-MFL method 3.1. Motvaton l l There are two drawbacks of the orgnal MFL, as descrbed below. Frst, lke SRC, MFL [5] mposes the l 1-sparsty constrant n Equaton () on the representaton coeffcents to regularze the soluton. Nevertheless, the l 1-mnmzaton takes a tme-consumng teratve process due to ts large computaton cost. Therefore, t s desrable to decrease the computaton cost n MFL. Second, In MFL the codng fdelty s measured by the l 1-norm of codng resdual under the assumpton that the codng resdual follows the Gaussan dstrbuton. However, mages usually contan some addtve nose, so ths assumpton may not hold well n nosy envronment. Recently, t has been proved n [7] the l -norm could be used to characterze the data fdelty for an optmal maxmum a posteror estmaton when the observed mage contans some addtve Gaussan nose. 57

3.. Our method Based on the abovementoned two ponts, by modfyng the obecton functon of MFL, we can get a Faster and more Robust MFL (FR-MFL) algorthm. In detal, the obecton functon n Equaton () can be rewrtten as (6) where 58 s the l-norm. D, A arg mn X D A A, D, A s.t., d 1, 1,,..., K, Lke MFL, the proposed FR-MFL also needs to solve a ont optmzaton for the metafaces D and the representaton coeffcent matrx A. In other words, D and A n Equaton (6) can be obtaned by optmzng D and A alternatvely, as descrbed below. When fxng D, based on the l -norm regularzaton constrant, A can be solved by usng the Langrage multpler, and the fnal obtaned analytcal soluton s (7) = (X T X + I) 1 X T. Fxng A, D can be updated by solvng Equaton (4), as done n MFL. It s worth pontng out that ths step of optmzng A n the proposed FR-MFL method s dfferent from MFL. Frst, wth the ad of the l -norm regularzaton constrant we can drectly derve the analytcal soluton, and hence avod the expensve computaton n orgnal MFL. Second, the l -norm s more effectve n characterzng the data fdelty n nosy envronment. It makes the proposed FR-MFL yeld more robust feature representatons than MFL. The advantages of FR-MFL are verfed n the followng experments. 4. Experments 4.1. Experment setup To valdate the proposed FR-MFL, we performed facal expresson recognton experments on two popular facal expresson databases,.e., the JAFFE database [8] and the Cohn-Kanade database [9]. The JAFFE database has 13 mages of female facal expresson. Each mage has a resoluton of 56 56 pxels. The Cohn-Kanade database contans 100 unversty students. Each mage has a resoluton of 640 490 pxels. As done n [10], on the Cohn-Kanade database we selected 30 mage sequences from 96 subects, wth 1 to 6 emotons per subect. For every sequence, the neutral face and one peak frames were employed for prototypc expresson recognton, gvng n total 470 mages (3 anger, 100 oy, 55 sadness, 75 surprse, 47 fear, 45 dsgust and 116 neutral). Fgs 1 and separately show some sample mages from the JAFFE database and the Cohn-Kanade database. Two experments are performed wth dfferent confguratons. The frst one classfes facal expressons wth the popular Local Bnary Patterns (LBP) [10-13] features extracted from orgnal clean mages wthout any corrupton. In ths case,

accordng to the normalzed value of the eye dstance, a reszed mage of 110 150 pxels was cropped from orgnal mages before performng LBP operators, as done n [10]. The second one recognzes facal expressons on corrupted mages to verfy the robustness of the proposed method. In ths case, all mages are reszed nto 3 3 pxels and then the random pxel corrupton s mplemented to generate corrupted mages. The proposed FR-MFL s compared wth lnear SVM, SRC, and MFL, respectvely. For SRC and MFL, the l 1-norm mnmzaton s solved by usng the teratve l 1-regularzed least squares (l 1-ls) [6] algorthm. A fve-fold cross valdaton scheme s mplemented n all facal expresson recognton experments, and the average recognton results are reported. The experment platform s Intel CPU.10 GHz, 1G RAM memory, MATLAB 01a. Fg. 1. Examples of facal expresson mages from the JAFFE database Fg.. Examples of facal expresson mages from the Cohn-Kanade database 4.. Experments wthout corrupton As used n [10], we employed the 59-bn operator LBP P, u R, and dvded the cropped mages of 110 150 pxels nto 18 1 pxels regons, yeldng a feature vector 59

length of 478 (59 4) represented by the LBP hstograms. Tables 1 and gve the recognton performance of all used methods, ncludng SRC, MFL, and FR-MFL on the JAFFE database and the Cohn-Kanade database, respectvely. The results are summarzed n Tables 1 and. From the tables t s easy to observe that FR-MFL obtans an accuracy of 85.71% and 98.09% on the two databases, respectvely. It performs better than MFL and SRC, and sgnfcantly outperforms the baselne SVM. Ths shows that l -norm regularzaton constrant s more effectve for classfcaton than l 1-norm sparsty constrant snce l -norm can gve hgher ablty to avod overfttng than l 1-norm. Table 3 presents a comparson of computaton tme between FR-MFL and MFL to evaluate ther computaton effcency. Computaton tme s represented by the whole operaton tme for tranng and testng when performng classfcaton for all face mages n the correspondng face database. From Table 3, t can be observed that FR-MFL s about.39 tmes faster than MFL on the JAFFE database and about.46 tmes faster on the Cohn-Kanade database, respectvely. Ths valdates the advantages of nducng an analytcal soluton n FR-MFL wth the l -norm regularzaton constrant. 60 Table 1. Recognton accuracy (%) of dfferent methods on the JAFFE database Method FR-MFL MFL SRC SVM Accuracy 85.71 84.76 84.76 79.88 Table. Recognton accuracy (%) of dfferent methods on the Cohn-Kanade database Method FR-MFL MFL SRC SVM Accuracy 98.09 97.57 97.14 95.4 Table 3. Comparson of computaton tme between FR-MFL and MFL Dababase JAFFE Cohn-Kanade Method FR-MFL MFL FR-MFL MFL Computaton tme (s) 167.86 40.341 50.777 617.993 Speed-up.39 tmes.46 tmes 4.3. Experments wth corrupton In ths secton, we evaluate the robustness of the proposed FR-MFL method on facal expresson recognton. The percentage of mage pxels are randomly selected from each testng mage and then replaced by random values n the range of [0, p ], where p denotes the maxmum value of pxels n the -th test mage. In our experments, we change the percentage of corrupted pxels from 0 up to 90%. Fg. 3 gves an example of corrupted mage on the Cohn-Kanade database. In the fgure, the orgnal mage s reszed to 3 3 pxels, and then s performed wth 50% random pxel corrupton. Fgs 4 and 5 show the recognton results of dfferent methods under dfferent percentage of pxel corrupton. From the results, we can observe that recognton accuraces of all methods are dropped wth the ncrease of pxel corrupton. Nevertheless, the proposed FR-MFL constantly outperforms the other methods such

as SRC, SVM, and MFL. The advantage of FR-MFL over MFL clearly valdates that the ntroduced l -norm regularzaton constrant produces better robustness to mage nose. (a) (b) (c) Fg. 3. An example of corrupted mage n the Cohn-Kanade database: Orgnal mage of 640 490 pxels (a); reszed mage of 3 3 pxels (b); 50% corrupted mage (c) Recognton accuracy / % 90 80 70 60 50 40 30 0 SRC SVM MFL FR-MFL 10 0 10 0 30 40 50 60 70 80 90 Percentage corrupted / % Fg. 4. Recognton accuracy under dfferent percentage corrupted on the JAFFE database Recognton accuracy / % 100 90 80 70 60 50 40 30 0 10 0 10 0 30 40 50 60 70 80 90 Percentage corrupted / % SRC SVM MFL FR-MFL Fg. 5. Recognton accuracy under dfferent percentage corrupted on the Cohn-Kanade database 5. Concluson Ths paper presents a faster and more robust verson of MFL (FR-MFL) for facal expresson recognton va the l -norm mnmzaton. The proposed FR-MFL 61

method outperforms MFL, SRC, and SVM on facal expresson recognton. More mportantly, the proposed FR-MFL s much more effectve than MFL n computaton. Ths can be attrbuted to two aspects. Frstly, l -norm regularzaton constrant n FR-MFL s more effectve for classfcaton than l -norm sparsty 1 constrant snce l -norm can gve hgher ablty to avod overfttng than l 1 -norm. Secondly, l -norm regularzaton constrant n FR-MFL presents an analytcal soluton of the obectve functon. Acknowledgements: Ths work s supported by Zheang Provncal Natural Scence Foundaton of Chna under Grant No LY16F00011 and No LQ15F00001, Natonal Natural Scence Foundaton of Chna under Grant No 610357 and No 61761, and Tazhou Scence Plan No 1501KY64. R e f e r e n c e s 1. Z h e n g, W., X. Z h o u, M. Xn. Color Facal Expresson Recognton Based on Color Local Features. In: Proc. of 015 IEEE Internatonal Conference on Acoustcs, Speech and Sgnal Processng (ICASSP), South Brsbane, QLD, 015, pp. 158-153.. E l e f t h e r a d s, S., O. R u d o v c, M. P a n t c. Dscrmnatve Shared Gaussan Processes for Multvew and Vew-Invarant Facal Expresson Recognton. IEEE Transactons on Image Processng, Vol. 4, 015, No 1, pp. 189-04. 3. W a n g, X., A. Lu, S. Z h a n g. New Facal Expresson Recognton Based on FSVM and KNN. Optk Internatonal Journal for Lght and Electron Optcs, Vol. 16, 015, No 1, pp. 313-3134. 4. W r g h t, J., A. Y. Y a n g, A. G a n e s h, S. S. S a s t r y, Y. M a. Robust Face Recognton va Sparse Representaton. IEEE Transactons on Pattern Analyss and Machne Intellgence, Vol. 31, 009, No, pp. 10-7. 5. Y a n g, M., L. Z h a n g, J. Y a n g, D. Z h a n g. Metaface Learnng for Sparse Representaton Based Face Recognton. In: Proc. of 010 17th IEEE Internatonal Conference on Image Processng (ICIP), Hong Kong, 010, pp. 1601-1604. 6. Km, S. J., K. Koh, M. L u s t g, S. B o y d, D. G o r n e v s k y. An Interor-Pont Method for Large-Scale l1-regularzed Least Squares. IEEE Journal of Selected Topcs n Sgnal Processng, Vol. 1, 007, No 4, pp. 606-617. 7. D o n g, W., L. Z h a n g, G. Sh, X. W u. Image Deblurrng and Super-Resoluton by Adaptve Sparse Doman Selecton and Adaptve Regularzaton. IEEE Transactons on Image Processng, Vol. 0, 011, No 7, pp. 1838-1857. 8. L y o n s, M. J., J. B u d y n e k, S. A k a m a t s u. Automatc Classfcaton of Sngle Facal Images. IEEE Transactons on Pattern Analyss and Machne Intellgence, Vol. 1, 1999, No 1, pp. 1357-136. 9. K a n a d e, T., Y. T a n, J. C o h n. Comprehensve Database for Facal Expresson Analyss. In: Proc. of Internatonal Conference on Face and Gesture Recognton, Grenoble, France, 000, pp. 46-53. 10. S h a n, C., S. G o n g, P. Mc O w a n. Facal Expresson Recognton Based on Local Bnary Patterns: A Comprehensve Study. Image and Vson Computng, Vol. 7, 009, No 6, pp. 803-816. 11. Z h a o, X., S. Z h a n g. Facal Expresson Recognton Usng Local Bnary Patterns and Dscrmnant Kernel Locally Lnear Embeddng. EURASIP Journal on Advances n Sgnal Processng, Vol. 01, 01, No 1, pp. 0. 1. L, X., Q. R u a n, Y. Jn, G. A n, R. Z h a o. Fully Automatc 3D Facal Expresson Recognton Usng Polytypc Mult-Block Local Bnary Patterns. Sgnal Processng, Vol. 108, 015, pp. 97-308. 13. Luo, Y., C.-M. W u, Y. Z h a n g. Facal Expresson Recognton Based on Fuson Feature of PCA and LBP wth SVM. Optk-Internatonal Journal for Lght and Electron Optcs, Vol. 14, 013, No 17, pp. 767-770. 6