KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION

Cuicui Kang 1, Shengcai Liao 2, Shiming Xiang 1, Chunhong Pan 1

1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2 Department of Computer Science and Engineering, Michigan State University
cuicui.kang@ia.ac.cn, {smxiang, chpan}@nlpr.ia.ac.cn

ABSTRACT

In this paper we propose a novel kernel sparse representation classification (SRC) framework and utilize the local binary pattern (LBP) descriptor in this framework for robust face recognition. First, we develop a kernel coordinate descent (KCD) algorithm for $\ell_1$ minimization in the kernel space, which is based on the covariance update technique. Then we extract LBP descriptors from each image and apply two types of kernels ($\chi^2$ distance based and Hamming distance based) with the proposed KCD algorithm under the SRC framework for face recognition. Experiments on both the Extended Yale B and the PIE face databases show that the proposed method is more robust against noise, occlusion, and illumination variations, even with a small number of training samples.

Index Terms: sparse representation, face recognition, kernel, local binary pattern

1. INTRODUCTION

Face recognition is an important research area in computer vision. It has many useful applications in real life, such as face attendance, access control, and security surveillance. Face recognition is also a challenging problem, which suffers from aging, occlusion, pose, illumination, and expression variations. Many researchers have been attracted to these problems, and face recognition techniques have developed greatly during the past two decades.

Recently Wright et al. proposed the SRC framework for robust face recognition [1], which makes use of the well-known $\ell_1$-norm constrained least-squares reconstruction minimization technique [2]. They showed that SRC obtained impressive results against illumination variations, occlusions, and random noise. In their work image pixel values were used to represent faces.
The training set needs to be carefully constructed, i.e. each subject in the training set must be represented by many images covering various lighting conditions, so that a probe image under a certain illumination condition can be represented by a sparse linear combination of the training samples. However, in realistic applications it is hard for every enrolled user to have such varying lighting images. To address this drawback, Yuan et al. [3] and Chan et al. [4] proposed to utilize the LBP descriptor in the SRC framework, so that the system would be more robust against illumination variations, making it potentially possible to apply SRC to the small-sample-size face recognition problem.

LBP was originally proposed by Ojala et al. for texture classification [5]. It is a binary string resulting from local neighboring pixel comparisons. Ahonen et al. applied LBP to face recognition and showed that it is robust to illumination variations [6]. In [3], LBP is also employed to handle illumination variations in face recognition, showing that it is more robust than original pixel values under the SRC framework. In their work the LBP encoded images were directly used in the linear sparse representation system. However, since LBP codes are converted from binary comparisons, i.e. they are not regular numerical values, it is not very reasonable to linearly combine LBP encodings directly. In fact, LBP is mostly used in the form of histogram features counted in local regions, and the $\chi^2$ distance is preferred for calculating the distance between two LBP histogram features [5, 6]. Different from [3], Chan et al. [4] proposed to extract LBP histogram features instead for SRC based face recognition.

In this work, we propose a novel kernel SRC framework, and apply it together with the LBP descriptor to face recognition. The contribution is twofold. First, we propose a novel kernel coordinate descent (KCD) algorithm with the covariance update technique for the $\ell_1$ minimization problem in the kernel space.
Second, we further apply it in the SRC framework for face recognition, in which we are able to take advantage of the powerful LBP descriptors through two types of kernels: the $\chi^2$ distance based and the Hamming distance based. We conduct several experiments on the Extended Yale B and the PIE face databases to illustrate the effectiveness of the proposed approach.

2. KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS

2.1. Sparse Representation Classification

Suppose $X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{m \times n}$ is a training dictionary with each sample $x_i$ having zero mean and unit length. Then, given a test sample $y \in \mathbb{R}^m$, the $\ell_1$ minimization solves a linear representation of $y$ in $X$ with the Lasso constraint [2] as follows:

$$\min_{\beta} \frac{1}{2} \| X\beta - y \|_2^2 + \lambda \| \beta \|_1, \tag{1}$$

where $\beta \in \mathbb{R}^n$ is a sparse vector. Given the solution of Eq. (1), the SRC algorithm [1] classifies $y$ based on

$$\min_{c} r_c(y) \doteq \| X \delta_c(\beta) - y \|_2, \tag{2}$$

where $\delta_c(\cdot)$ is the characteristic function [1] that selects the coefficients related to the $c$-th class and sets the rest to zero.

2.2. Kernel Sparse Representation

Here we consider the Lasso problem of Eq. (1) in the kernel space, i.e.

$$\min_{\beta} J(\beta) \doteq \frac{1}{2} \Big\| \sum_{i=1}^{n} \beta_i \phi(x_i) - \phi(y) \Big\|_2^2 + \lambda \| \beta \|_1, \tag{3}$$

where $\phi(\cdot)$ is an implicit mapping which maps a feature vector to a kernel space. We assume that $\phi(\cdot)$ satisfies $\phi(x)^T \phi(x) = 1$ when $\|x\| = 1$. To solve Eq. (3), in the following we develop a Kernel Coordinate Descent (KCD) algorithm, which employs the coordinate descent approach [7] due to its simplicity and efficiency.

First, taking the partial derivative of $J(\beta)$ with respect to $\beta_j$ ($\neq 0$), we have

$$\frac{\partial J(\beta)}{\partial \beta_j} = \phi(x_j)^T \Big[ \sum_{i=1}^{n} \beta_i \phi(x_i) - \phi(y) \Big] + \lambda\, \mathrm{sign}(\beta_j). \tag{4}$$

Then, similar to [7], by setting Eq. (4) to zero, we get the update of $\beta_j$ as

$$\beta_j \leftarrow \mathrm{sign}(\alpha) (|\alpha| - \lambda)_+, \tag{5}$$

where

$$\alpha = \phi(x_j)^T \Big[ \phi(y) - \sum_{i=1,\, i \neq j}^{n} \beta_i \phi(x_i) \Big], \tag{6}$$

and $(s)_+$ equals $s$ if $s > 0$, or 0 otherwise. Further considering the covariance update idea suggested in [7], Eq. (6) can be rewritten as

$$\alpha = K(x_j, y) - \sum_{i=1,\, i \neq j}^{n} \beta_i K(x_j, x_i), \tag{7}$$

where $K(x, y) \doteq \phi(x)^T \phi(y)$ is the kernel function. Therefore, given a kernel function $K$, the KCD algorithm is able to update $\beta$ iteratively in the kernel space by Eqs. (5) and (7).

For classification, we also develop the corresponding KCD-SRC criterion as follows:

$$\hat{c} = \arg\min_{c} \Big\| \sum_{i=1,\, l(i)=c}^{n} \beta_i \phi(x_i) - \phi(y) \Big\|_2^2 = \arg\min_{c}\ \delta_c(\beta)^T R\, \delta_c(\beta) - 2 z^T \delta_c(\beta), \tag{8}$$

where $R \doteq (K(x_i, x_j))_{n \times n}$ is the training kernel matrix, $z \doteq (K(x_i, y))_{n \times 1}$, and $l(i)$ is the class label of the $i$-th sample (the term $K(y, y)$ is constant over classes and is therefore dropped).

The kernelization of the Lasso problem (Eq. (1)) has also been suggested by Yuan et al. [3] and Gao et al. [8].
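The KCD updates of Eqs. (5) and (7) and the class decision of Eq. (8) can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `soft_threshold`, `kcd`, and `classify`, the stopping tolerance `tol`, and the default `max_iter` are assumptions introduced here, and the kernel matrix is assumed to have a unit diagonal, as required by the assumption $\phi(x)^T \phi(x) = 1$.

```python
def soft_threshold(alpha, lam):
    """Eq. (5): sign(alpha) * max(|alpha| - lam, 0)."""
    if alpha > lam:
        return alpha - lam
    if alpha < -lam:
        return alpha + lam
    return 0.0


def kcd(R, z, lam=0.01, max_iter=100, tol=1e-8):
    """Coordinate descent for Eq. (3) using kernel values only.

    R[i][j] = K(x_i, x_j) with R[i][i] == 1, and z[i] = K(x_i, y).
    max_iter and tol are illustrative choices, not values from the paper.
    """
    n = len(z)
    beta = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(n):
            # Eq. (7): alpha = K(x_j, y) - sum_{i != j} beta_i * K(x_j, x_i)
            alpha = z[j] - sum(beta[i] * R[j][i] for i in range(n) if i != j)
            new_value = soft_threshold(alpha, lam)
            largest_change = max(largest_change, abs(new_value - beta[j]))
            beta[j] = new_value
        if largest_change < tol:
            break
    return beta


def classify(R, z, beta, labels):
    """Eq. (8): return the class c minimising d^T R d - 2 z^T d, d = delta_c(beta)."""
    n = len(beta)
    best_class, best_score = None, float("inf")
    for c in sorted(set(labels)):
        # delta_c keeps the coefficients of class c and zeroes the rest
        d = [beta[i] if labels[i] == c else 0.0 for i in range(n)]
        quad = sum(d[i] * R[i][j] * d[j] for i in range(n) for j in range(n))
        score = quad - 2.0 * sum(z[i] * d[i] for i in range(n))
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```

Because only `R` and `z` are needed, the same routine can be reused with any kernel that satisfies the unit-diagonal assumption.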
In [3], the formulation is addressed for a multi-task learning setting, and the resulting optimization problem is solved via the accelerated proximal gradient method [9]. In contrast, our formulation does not need to combine multiple features in face recognition. For problem (3), based on feature-sign search [10], Gao et al. developed a gradient descent algorithm that finds the sparse codes and the codebook alternately [8]. In their work, the Gaussian kernel is employed for face recognition. A drawback could be that simply using the Gaussian kernel may not achieve high efficiency with a small dictionary. In contrast, here we use the coordinate descent approach to solve the optimization problem. The computation in each iteration is simple and the convergence can be guaranteed [7]. Furthermore, instead of the Gaussian kernel, we develop two types of efficient kernels with LBP features below, resulting in high performance with only a few training samples.

2.3. LBP with Kernels

LBP is a powerful descriptor [5] computed from local neighboring pixel comparisons (see Fig. 1). It is a binary string of length $N$ or, equivalently, a discrete label in $\{0, 1, \dots, 2^N - 1\}$, where $N$ is the number of neighboring pixels. An LBP histogram over these discrete bins is often computed from a local image region, and the $\chi^2$ distance is employed for image classification [5, 6]. Inspired by this, we first define our $\chi^2$ kernel as follows:

$$K_{\chi^2}(a, b) = \sum_{i=1}^{L} \frac{2 a_i b_i}{a_i + b_i}, \tag{9}$$

where $a$ and $b$ are two normalized LBP histograms with $L = 2^N$ bins. It can be easily verified that $0 \le K_{\chi^2} \le 1$ and $K_{\chi^2}(a, a) = 1$.

Alternatively, we propose another kernel based on the Hamming distance, which can be applied directly to LBP images. The definition is

$$K_H(x, y) = \frac{1}{mN} \sum_{i=1}^{m} h(x_i, y_i), \tag{10}$$

where $x$ and $y$ are two LBP encoded images, and $h(\cdot, \cdot)$ is a function that counts the number of consistent bits between two local binary patterns. Note that $K_H$ also ranges in $[0, 1]$, and $K_H(x, x) = 1$. With these two kernels, we are able to apply the proposed KCD-SRC algorithm with LBP features for face recognition.

3. EXPERIMENTS

To verify the effectiveness of the proposed method, several experiments were conducted on the Extended Yale B [11] and the CMU-PIE [12] face databases. We tested the KCD-SRC algorithm with LBP features and the two kernels proposed in Section 2.3. For convenience, we denote these two proposed methods by LBP-$K_H$ and LBP-Hist-$K_{\chi^2}$ respectively.
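The two kernels of Eqs. (9) and (10) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `chi2_kernel` and `hamming_kernel` and the parameter `n_bits` (the number of neighbors $N$) are introduced here, histograms are assumed to be $\ell_1$-normalized lists, and LBP images are assumed to be flat lists of integer codes.

```python
def chi2_kernel(a, b):
    """Eq. (9): sum_i 2*a_i*b_i / (a_i + b_i) for l1-normalised histograms."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for ai, bi in zip(a, b):
        if ai + bi > 0:  # bins that are empty in both histograms add nothing
            total += 2.0 * ai * bi / (ai + bi)
    return total


def hamming_kernel(x, y, n_bits=8):
    """Eq. (10): fraction of matching bits between two LBP-encoded images.

    x and y are flat sequences of integer LBP codes in [0, 2**n_bits).
    """
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    mask = (1 << n_bits) - 1
    matching = 0
    for xi, yi in zip(x, y):
        differing = bin((xi ^ yi) & mask).count("1")  # bits that disagree
        matching += n_bits - differing                # h(x_i, y_i)
    return matching / (len(x) * n_bits)
```

Both functions return 1 when an input is compared with itself, which matches the unit-diagonal assumption on the kernel matrix used by the KCD update.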

Note that the proposed KCD-SRC algorithm with a linear kernel is actually equivalent to the original SRC algorithm with a coordinate descent solver for the $\ell_1$ minimization problem. Thus we also implemented two other methods in this setting. One is the SRC algorithm with LBP histogram features developed by [4] (denoted by LBP-Hist), and the other is SRC with raw pixels proposed in [1] (denoted by SRC). For all experiments, the parameter $\lambda$ was consistently set to 0.01 and the maximum iteration number was [...].

For the LBP-$K_H$ algorithm, the $\mathrm{LBP}_{8,1}$ operator was applied to each image for feature extraction. The algorithm can then be applied directly to the LBP encoded images. In contrast, histograms of the $\mathrm{LBP}^{u2}_{8,1}$ operator [6, 4] were computed for the other two LBP based methods. For this, the image was first divided into non-overlapping subregions of 8 × 8 pixels. Then a 59-bin histogram was counted within each subregion, and all sub-histograms were concatenated to form the final representation. The final histogram was normalized to unit length with the $\ell_2$-norm for LBP-Hist, and with the $\ell_1$-norm for LBP-Hist-$K_{\chi^2}$, according to their corresponding requirements.

For each of the two databases we designed experiments in three scenarios: with original images, with noise, and with occlusion. For all experiments we randomly separated training samples and testing samples from the database, and repeated each evaluation 10 times to obtain mean recognition rates and the corresponding standard deviations.

3.1. On Extended Yale B Database

The Extended Yale B Database [11] consists of 16,128 facial images of 38 subjects under 9 poses and 64 illumination conditions. We selected 2,414 frontal images of all the 38 subjects under 64 illumination conditions for experiments. All faces are cropped to [...] pixels. Figure 1 shows some example images from this database, with the corresponding LBP encoded images.
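The feature extraction described in the setup above can be sketched in plain Python as follows. This is a simplified sketch and not the authors' code: it approximates the circular $\mathrm{LBP}_{8,1}$ neighborhood by the 3 × 3 square neighborhood (no interpolation), skips border pixels, and fixes an arbitrary bit order, all of which are assumptions made here. The function names `lbp_8_1`, `uniform_bin_table`, and `block_histogram` are hypothetical.

```python
def lbp_8_1(image):
    """Encode the interior pixels of a 2-D list of grey values with 3x3 LBP.

    Border pixels are skipped (an assumption; the paper does not say how
    borders are handled), so the output has size (h - 2) x (w - 2).
    """
    # neighbour offsets in clockwise order, starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(image), len(image[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= center:  # threshold at the centre
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def uniform_bin_table():
    """Map each 8-bit code to one of 59 bins (58 uniform codes + 1 shared bin)."""
    table, next_bin = {}, 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        # number of 0/1 transitions when the bit string is read circularly
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[code] = next_bin
            next_bin += 1
        else:
            table[code] = 58  # all non-uniform codes share the last bin
    return table


def block_histogram(codes, table, block=8):
    """Concatenate 59-bin histograms over non-overlapping block x block
    regions, then l1-normalise (the form needed by the chi-square kernel)."""
    h, w = len(codes), len(codes[0])
    hist = []
    for r0 in range(0, h - block + 1, block):
        for c0 in range(0, w - block + 1, block):
            local = [0] * 59
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    local[table[codes[r][c]]] += 1
            hist.extend(local)
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

The raw codes from `lbp_8_1` correspond to the input of the Hamming kernel, while the output of `block_histogram` corresponds to the histogram features used by the two LBP-Hist methods.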
As Figure 1 shows, under illumination variations the LBP features preserve more local image structure, which benefits face recognition.

Fig. 1. LBP operator (left) and samples of LBP encodings (right).

With Original Images: The performance of the four algorithms with original images from the Extended Yale B database is shown in Figure 2. For each individual we randomly chose 5, 10, 15, 20, 25, or 30 images for training and used the rest for testing. It can be seen from Figure 2 that, for every number of training samples, the proposed LBP-$K_H$ algorithm consistently performs the best, followed by LBP-Hist-$K_{\chi^2}$, LBP-Hist, and SRC. As the number of training samples decreases, the performance of all algorithms generally drops. However, LBP-$K_H$ is the most robust one, with only a small performance degradation. This is because the LBP operator encodes the intrinsic structure of individual face images even with a small training sample size, and the KCD-SRC framework successfully makes use of this advantage through the Hamming kernel. It is impressive that with only five training samples, the recognition rate of LBP-$K_H$ still reaches over 97% while that of SRC is only 70.18%, an improvement of over 20%.

Fig. 2. Performance with original Extended Yale B images (accuracy (%) versus number of training samples).

Fig. 3. Performance with noise on Extended Yale B images (accuracy (%) versus noise (%)).

With Noise: Next, we tested the robustness of the KCD-SRC algorithm with noise corrupted images. In this experiment we fixed the training sample size to 5. Then, for each test image, we randomly corrupted a proportion of pixels, from 10% to [...]% (with a step of 10%), with noise. The chosen pixels were replaced by independent samples uniformly distributed in [0, 255]. The recognition results are shown in Figure 3. As expected, the performance drops as the amount of noise increases. Besides, it can be observed that LBP-$K_H$ outperforms all other algorithms notably. Surprisingly, LBP-Hist and LBP-Hist-$K_{\chi^2}$ drop drastically.
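The noise corruption step described above can be sketched as follows. This is a minimal illustration, assuming an image stored as a flat list of 8-bit grey values; the function name `corrupt_pixels` and the optional `rng` argument (used only to make runs repeatable) are introduced here and are not from the paper.

```python
import random


def corrupt_pixels(image, fraction, rng=None):
    """Return a copy of a flat list of pixels in which `fraction` of the
    positions are replaced by independent uniform samples from [0, 255]."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = rng or random.Random()
    noisy = list(image)
    count = round(fraction * len(noisy))
    # choose `count` distinct pixel positions, then overwrite each of them
    for idx in rng.sample(range(len(noisy)), count):
        noisy[idx] = rng.randint(0, 255)
    return noisy
```

Note that a replaced pixel may by chance receive its original value, so the number of pixels that actually change can be slightly lower than the nominal noise level.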
LBP-Hist and LBP-Hist-$K_{\chi^2}$ perform worse than SRC beyond 30% noise corruption. This may be mainly due to the fact that local noise can change the neighboring comparison results in some directions (see Fig. 1), so the affected LBP codes can seriously disturb the histogram distribution over LBP bins because of the discrete nature of LBP. To illustrate this, Figure 4 shows an example of local LBP histograms with and without noise. Clearly, the distribution of LBP codes in the same region is severely affected by noise. In contrast, since the Hamming kernel is calculated bitwise, the ruined local comparisons might affect only a fraction (e.g. 30%) of the Hamming counts. Therefore LBP-$K_H$ is more robust against noise.

Fig. 4. An example of LBP histograms in an 8 × 8 local region without noise (left) and with 30% noise (right).

With Occlusion: Finally, we conducted experiments with random contiguous occlusions. As in [1], we chose an unrelated picture, resized it, and attached it to each test image at an independent random position. Seven levels of occlusion were tested, with the size of the attached picture being 8 × 7, 16 × 14, ..., respectively (see Figure 5). We again evaluated the four algorithms in this scenario, with 5 randomly selected samples per subject for training. Figure 5 shows the corresponding results under different levels of occlusion. As illustrated, LBP-$K_H$ performs the best, followed by LBP-Hist-$K_{\chi^2}$, LBP-Hist, and SRC. Clearly, the LBP based methods are more robust against occlusion than the raw pixel based one. With the same LBP histograms, the proposed KCD-SRC with the $\chi^2$ kernel slightly outperforms that with the linear kernel. It is also observed that the performance of all algorithms drops as the amount of occlusion increases. It should be noted that the Hamming kernel with LBP maintains over 90% recognition rate even with up to 56% occlusion. It outperforms the second best algorithm (LBP-Hist-$K_{\chi^2}$) by over 15% for all 7 levels of occlusion.

Fig. 5. Performance (right; accuracy (%) versus occlusion (%)) under 7 levels of occlusion (left) on Yale B.

3.2. On PIE Database

Similar experiments were also conducted on the CMU-PIE database [12]. The CMU-PIE database consists of 41,368 images of 68 subjects under 13 different poses, 43 different illumination conditions, and 4 different expressions. We selected only frontal face images for experiments, and cropped and resized them to [...] [13]. Due to limited space, we only show experimental results with five training samples per subject in Table 1. In these results the noise level is 50%, and the occlusion level is 56.25% (48 × 48). From Table 1 we can see that with original images all LBP based methods perform consistently well. They differ little because of the increased dimensions. As on the Extended Yale B database, LBP-Hist and LBP-Hist-$K_{\chi^2}$ have very low recognition rates with random noise, whereas LBP-$K_H$ and SRC are affected less by noise.
As for the occlusion scenario, LBP based methods perform much better than SRC. Clearly, LBP-$K_H$ is again the best algorithm among all.

Table 1. Recognition Rates on CMU-PIE (%).
Columns: SRC, LBP-Hist, LBP-Hist-$K_{\chi^2}$, LBP-$K_H$
Original: 90.55± ± ± ±1.15
Noise: 68.1±4.3.1± ± ±1.38
Occlusion: 48.81± ±1..88± ±1.10

4. CONCLUSION

In this paper, we have proposed a novel kernel coordinate descent (KCD) algorithm based on the covariance update technique for the $\ell_1$ minimization problem in the kernel space. We have applied the new algorithm in the sparse representation classification framework (which we call KCD-SRC) for face recognition, and have shown that the powerful LBP descriptor can be utilized in the proposed framework with two kernels based on the $\chi^2$ distance and the Hamming distance. We have evaluated the proposed approach on both the Extended Yale B and the CMU-PIE face databases. Experimental results show that KCD-SRC with LBP encodings and the Hamming kernel performs impressively well against illumination variations, random noise, and contiguous occlusions, even when there are only 5 training samples per subject. In most cases this combination achieves a 90% recognition rate, outperforming the original SRC with raw pixel values by 20% in the same experimental settings.

5. REFERENCES

[1] John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma, Robust face recognition via sparse representation, IEEE Trans. on PAMI, vol. 31, pp. 210-227, 2009.
[2] Robert Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, vol. 58.
[3] Xiao-Tong Yuan and Shuicheng Yan, Visual classification with multi-task joint sparse representation, in IEEE Conference on CVPR, 2010.
[4] Chi Ho Chan and Josef Kittler, Sparse representation of (multiscale) histograms for face recognition robust to registration and illumination problems, in IEEE International Conference on Image Processing, 2010.
[5] T. Ojala, M. Pietikäinen, and D. Harwood, A comparative study of texture measures with classification based on feature distributions, PR, vol. 29, no. 1, January.
[6] T. Ahonen, A. Hadid, and M. Pietikäinen, Face recognition with local binary patterns, in ECCV, 2004.
[7] Jerome Friedman, Trevor Hastie, and Rob Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, 2009.
[8] Shenghua Gao, Ivor Wai-Hung Tsang, and Liang-Tien Chia, Kernel sparse representation for image classification and face recognition, in ECCV: Part IV, 2010.
[9] P. Tseng, On accelerated proximal gradient methods for convex-concave optimization, SIAM Journal on Optimization, 2008.
[10] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng, Efficient sparse coding algorithms, in NIPS, 2007, pp. 1 8.
[11] K.C. Lee, J. Ho, and D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. on PAMI, vol. 27, no. 5, 2005.
[12] Terence Sim, Simon Baker, and Maan Bsat, The CMU pose, illumination, and expression (PIE) database, in IEEE International Conference on FG, May 2002.
[13]

