Face Recognition using Singular Value Decomposition and Hidden Markov Models

PETYA DINKOVA 1, PETIA GEORGIEVA 2, MARIOFANNA MILANOVA 3
1 Technical University of Sofia, Bulgaria
2 DETI, University of Aveiro, Portugal
3 Computer Science Department, UALR, USA
petia@ua.pt; mgmilanova@ualr.edu

Abstract: - In this paper we present a new approach to face recognition that uses Singular Value Decomposition (SVD) to extract relevant face features and a seven-state Hidden Markov Model (HMM) as classifier. The SVD-HMM system has been evaluated on two databases - the Olivetti Research Laboratory (ORL) face database and the YALE face database. In order to gain more speed and a higher recognition rate, effective modifications of the original algorithm are proposed.

Key-Words: face recognition, face detection, feature extraction, classification

1 Introduction
Face recognition is an active research area in the field of Computer Vision with an increasing number of applications in security, robotics, human-computer interfaces, digital cameras, games and entertainment [Samaria and Fallside 1993], [Cha Zhang et al., 2010], [Georgieva et al., 2013]. Popular recognition algorithms include Principal Component Analysis (PCA) using eigenfaces, Linear Discriminant Analysis (LDA), Elastic Bunch Graph Matching, the Fisherface algorithm, Neural Networks and Hidden Markov Models (HMM) [H. Miar-Naimi et al., 2008]. An HMM is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. The goal is to find the best set of state transition and output probabilities; the task is usually to find the maximum likelihood estimate of the parameters of the HMM given the set of output sequences, using the Baum-Welch algorithm [Mark Stamp, 2012], [Phil Blunsom, 2004], [L. Rabiner, 1989]. In this paper we present a new approach to face recognition that uses a one-dimensional HMM as classifier and Singular Value Decomposition (SVD) to extract the relevant features.
2 Hidden Markov Model
A generic HMM is illustrated in Fig. 1, where X_i represents the hidden state sequence. The Markov process is determined by the current state, with initial state distribution π and transition probability matrix A. We observe only O_i (the observation sequence), which is related to the hidden states of the Markov process by the emission probability matrix B. The HMM can thus be defined by the three probability matrices (π, A, B). The goal is to make efficient use of the observable information so as to gain insight into various aspects of the Markov process.

Fig. 1 Hidden Markov Model (HMM)

3 Data Sets and Image Preprocessing
The SVD-HMM approach (described in Section 4) is evaluated on the ORL and YALE face databases.
The YALE database contains 165 grayscale images in GIF format of 15 individuals, which we first converted into PGM format. There are 10 images per subject with different facial expressions or configurations: with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy,
surprised, and wink. The size of the images is 231x195 pixels.
The Olivetti Research Laboratory (ORL) face database contains a set of face images taken between 1992 and 1994 at the lab. The database was used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department. There are ten different images of each of the 40 persons. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position. The images are in PGM format. The size of each image is 112x92 pixels (H=112 pixels is the image height and W=92 pixels is the width, with 256 grey levels per pixel). The images are organized in 40 folders (one for each person), each containing the ten different images of the same person. Each image consists of only one face.
First, the dataset is divided into two parts - one for training and one for testing. For both the ORL and YALE datasets we use 5 images from each folder for training the system and the remaining 5 for testing. Next, SVD is applied to extract features from the images and an HMM is used to build a recognition model. The HMM is trained with half of the face images and tested with new face images not used for training. The model returns probabilities of how likely an unseen face image is to look like each of the faces used for training, and the face with the highest probability is assigned as the recognized face.
In order to reduce the computational complexity and memory consumption and improve the speed of the algorithm, we propose an effective image preprocessing. First, each face image is transformed into a gray-scale image (only for colored images). Then, it is resized to around 50% of its original size, for both the ORL and YALE datasets.
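The two preprocessing steps above (color to gray-scale conversion and resizing to about 50%) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' Matlab code; the luminosity weights and the nearest-neighbour subsampling are our assumptions (Matlab's rgb2gray and imresize use similar but not identical schemes).

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an HxWx3 uint8 image to gray scale (luminosity method)."""
    return (rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def resize_half(img):
    """Downsample a gray-scale image to ~50% by keeping every second pixel."""
    return img[::2, ::2]

# A random stand-in for one 112x92 ORL face image.
face = np.random.randint(0, 256, size=(112, 92, 3), dtype=np.uint8)
small = resize_half(to_grayscale(face))
print(small.shape)  # (56, 46), the resized ORL image size used in the paper
```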
Originally the images have 112x92 (ORL) or 231x195 (YALE) pixels, and after resizing the images go down to 56x46, 115x82 or 64x64 pixels (Fig. 2). Further to that, in order to compensate for the flash effect and reduce the salt noise, a nonlinear minimum order-statistic filter is used (function ordfilt2 in Matlab). The filter has a smoothing role and reduces the image information; see [H. Miar-Naimi et al., 2008] for more details. An example of the filter effect is depicted in Fig. 3.

Fig. 2 An example of an original 112x92 image and its 56x46 and 64x64 resized versions

Fig. 3 An example of the effect of the smoothing filter. Left side: the original image. Right side: the same image after filtering.

4 SVD-HMM face recognition algorithm
The SVD-HMM algorithm for face recognition consists of the following steps.

4.1 Block extraction
In order to create the HMM model, the two-dimensional image has to be transformed into a one-dimensional observation sequence. For that, each face image is divided into overlapping blocks with the same width W as the original image and height L, different from the height H of the whole image. P = L-1 is the size of the overlap. The number of blocks T extracted from each face image is computed by the following formula (see Table 1):

T = (H - L)/(L - P) + 1     (1)
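The block extraction of Section 4.1 and formula (1) can be sketched as follows; this is an illustrative Python sketch (the function and variable names are ours, not the authors').

```python
import numpy as np

def extract_blocks(img, L=5, P=4):
    """Divide an HxW image into overlapping blocks of height L and full
    width W, with an overlap of P rows between consecutive blocks.
    The number of blocks follows Eq. (1): T = (H - L)/(L - P) + 1."""
    H = img.shape[0]
    step = L - P                       # vertical shift between blocks
    T = (H - L) // step + 1
    return [img[t * step : t * step + L, :] for t in range(T)]

# For a resized 56x46 ORL image, Eq. (1) gives (56-5)/(5-4)+1 = 52 blocks.
blocks = extract_blocks(np.zeros((56, 46)))
print(len(blocks), blocks[0].shape)  # 52 (5, 46)
```

The same function reproduces the other entries of Table 1: 60 blocks for a 64x64 image and 111 blocks for a 115x82 image.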
Table 1 The computation of the overlapped blocks

Size     ORL database            YALE database
56x46    (56-5)/(5-4)+1 = 52     (56-5)/(5-4)+1 = 52
64x64    (64-5)/(5-4)+1 = 60     -
115x82   -                       (115-5)/(5-4)+1 = 111

4.2 Singular Value Decomposition (SVD)
SVD is applied to each extracted block:

X_mxn = U_mxm * Σ_mxn * (V_nxn)^T     (2)

where U and V are orthogonal matrices and Σ is a diagonal matrix of singular values. The coefficients U(1,1), Σ(1,1) and Σ(2,2) are empirically chosen as the most relevant image features [H. Miar-Naimi et al., 2008]. Each block is thus represented by an observation vector with n elements:

C = (coeff_1, coeff_2, ..., coeff_n)     (3)

4.3 Quantization
Each element of (3) is quantized into D_i distinct levels. The difference between two quantized values is:

γ_i = (coeff_imax - coeff_imin) / D_i     (4)

where coeff_imax and coeff_imin are the maximum and the minimum of the coefficients over all observation vectors. Every element of vector C is replaced with its quantized value:

qt_i = [(coeff_i - coeff_imin) / γ_i]     (5)

where [.] denotes the integer part. The numbers of distinct values D_i used in the present algorithm to quantize the coefficients U(1,1), Σ(1,1) and Σ(2,2) are 18, 10 and 7, respectively. These values are chosen following the experimental results in [H. Miar-Naimi et al., 2008]. The next step is to represent each block by only one discrete value called a label:

label = qt_1*10*7 + qt_2*7 + qt_3 + 1     (6)

where qt_1, qt_2 and qt_3 are the quantized values. Note that if the quantized values are all zero the label is 1, and the maximum label value is 18*10*7 = 1260. As a result, each face image is represented by an observation sequence with 52, 60 or 111 observed states, corresponding to the number of blocks. These observation sequences are input into the seven-state HMM model.

4.4 HMM model
The recognition process is based on the frontal face view. From top to bottom, the face image can be divided into seven distinct regions: hair, forehead, eyebrows, eyes, nose, mouth and chin (Fig. 4). These are the seven hidden states in the Markov model.

Fig. 4 Face regions from top to bottom

Assume a block which moves from top to bottom of the face image; at any time the block shows one of the seven regions. The block moves sequentially and cannot skip a state. For example, if a block is in the state eyes, the next state can never be forehead; it will always be the state nose. Hence the probability of moving from one state to the next is 50% and the probability of staying in the same state is 50%. The initial state of the system is always hair, with probability 1, and the final state of the system is always chin. Thus the initial state distribution (π matrix) is:
π = (1, 0, 0, 0, 0, 0, 0)

over the states (Hair, Forehead, Eyebrows, Eyes, Nose, Mouth, Chin). The transition matrix A is:

           Hair  Forehead  Eyebrows  Eyes  Nose  Mouth  Chin
Hair       0.5   0.5       0         0     0     0      0
Forehead   0     0.5       0.5       0     0     0      0
Eyebrows   0     0         0.5       0.5   0     0      0
Eyes       0     0         0         0.5   0.5   0      0
Nose       0     0         0         0     0.5   0.5    0
Mouth      0     0         0         0     0     0.5    0.5
Chin       0     0         0         0     0     0      1.0

And the emission matrix B is a 7x1260 matrix in which every entry is 1/1260, i.e. before training each of the 1260 labels is equally likely to be emitted from each state. The π, A and B matrices define the generic face model that is trained with the training sub-dataset.

5 Results
The recognition system was implemented in Matlab 8.1 and tested on a machine with a Pentium 2.20 GHz CPU, 3.89 GB RAM and a 64-bit operating system. We have made an exhaustive study of the influence of the recognition parameters; the results are summarised in the following tables. The best results are given in Table 2 and Table 3.

Table 2 The recognition rate for the ORL database

Parameters: params.blk_height = 5; params.blk_overlap = 4; params.coeff1_quant = 18; params.coeff2_quant = 10; params.coeff3_quant = 7; number_of_states = 7; params.face_height = 56; params.face_width = 46; index of training images = [1 5 6 8 10]; index of test images = [2 3 4 7 9]; SVD features: U(1,1), S(1,1) and S(2,2).

Image size   Recognition rate
56x46        96.6% (205 test images)
64x64        96.09% (205 test images)

Table 3 The recognition rate for the YALE database (same parameters)

Image size   Recognition rate
56x46        82.6% (75 test images)
115x82       77.3% (75 test images)

Note that for the YALE dataset the smaller resized image (56x46) gives the better recognition rate of 82.6%. The intuition behind this result is that small face details are not important and may even worsen the recognition.

5.1 Remove the minimum order-statistic filter

Table 4 Results without the filter for both databases

                 ORL database               YALE database
Without filter   19.02% (205 test images)   9.3% (75 test images)
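Pulling Sections 4.2-4.4 together, the path from image blocks to a recognition score can be sketched end-to-end as follows. This is our own illustrative Python, not the authors' Matlab implementation: the function names, the use of numpy.linalg.svd and the 0-based internal indexing are assumptions, and Baum-Welch training of B is omitted (B is left at its uniform initialization).

```python
import numpy as np

D = np.array([18, 10, 7])   # quantization levels for U(1,1), S(1,1), S(2,2)
N, M = 7, 1260              # hidden face regions, label alphabet size

def block_features(block):
    """Eqs. (2)-(3): SVD of one block, keeping U(1,1), S(1,1) and S(2,2)."""
    U, s, _ = np.linalg.svd(block, full_matrices=False)
    return np.array([U[0, 0], s[0], s[1]])

def labels(feats, cmin, cmax):
    """Eqs. (4)-(6): quantize each coefficient, fuse into one symbol 1..1260."""
    gamma = (cmax - cmin) / D
    qt = np.clip(((feats - cmin) // gamma).astype(int), 0, D - 1)
    return qt[:, 0] * 70 + qt[:, 1] * 7 + qt[:, 2] + 1

# Seven-state left-to-right model: start in 'hair', move down or stay.
pi = np.zeros(N); pi[0] = 1.0
A = np.diag(np.full(N, 0.5)) + np.diag(np.full(N - 1, 0.5), k=1)
A[-1, -1] = 1.0
B = np.full((N, M), 1.0 / M)    # uniform emissions before training

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model); obs holds labels 1..M."""
    alpha = pi * B[:, obs[0] - 1]
    log_p = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o - 1]
        log_p += np.log(alpha.sum()); alpha /= alpha.sum()
    return log_p

# 52 random 5x46 blocks standing in for one resized ORL face image.
blocks = [np.random.rand(5, 46) for _ in range(52)]
feats = np.array([block_features(b) for b in blocks])
obs = labels(feats, feats.min(axis=0), feats.max(axis=0))
print(log_likelihood(obs, pi, A, B))
```

Recognition then amounts to scoring the unseen face's label sequence against each person's trained model and picking the highest likelihood, as described in Section 3.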
5.2 Increase the number of training images and reduce the number of test images
The complete set is 410 images for the ORL database and 150 images for the YALE database. Initially we have 5 training images (5x41 = 205 training images in total) and 5 test images per person (205 in total) for the ORL dataset. For the YALE database we initially have 5x15 = 75 face images for training and the same number for testing. The ORL dataset is not as sensitive with respect to the number of training/test images as the YALE dataset, as shown in Table 5 and Table 6.
Table 5 Recognition rate for different numbers of training and test images - ORL database

Training images   Test images   Recognition rate
205               205           96.6%
246               164           96.3%
287               123           96.7%
328               82            98.8%
369               41            97.6%

Table 6 Recognition rate for different numbers of training and test images - YALE database

Training images   Test images   Recognition rate
75                75            82.7%
90                60            80%
105               45            73.3%
120               30            90%
135               15            100%

5.3 Change the selected SVD features
The choice and the order of the SVD features are crucial for the performance of the recognition system, as demonstrated in Table 7 and Table 8.

Table 7 Recognition rate for different features (ORL database)

1st      2nd      3rd      Recognition rate
U(1,1)   S(1,1)   S(2,2)   96.6%
S(3,3)   S(1,1)   S(2,2)   6.8%
U(1,1)   S(1,1)   V(1,1)   9.8%
U(2,2)   S(1,1)   S(2,2)   40.5%
U(1,1)   S(1,1)   V(2,2)   2.4%
S(1,1)   S(2,2)   S(3,3)   2.4%

Table 8 Recognition rate for different features (YALE database)

1st      2nd      3rd      Recognition rate
U(1,1)   S(1,1)   S(2,2)   82.7%
S(3,3)   S(1,1)   S(2,2)   12%
U(1,1)   S(1,1)   V(1,1)   2.7%
U(2,2)   S(1,1)   S(2,2)   16%
U(1,1)   S(1,1)   V(2,2)   4%
S(1,1)   S(2,2)   S(3,3)   6.7%

5.4 Change the quantization levels
The quantization levels (18, 10 and 7) make the algorithm faster, but with lower resolution. With such quantization levels it can be difficult or even impossible to recognize bad-quality face images (illumination problems, image noise, moving objects, etc.). However, increasing the resolution leads to a lower recognition speed (important for on-line recognition tasks) and a higher computational cost, see Table 9 and Table 10.

Table 9 Recognition rate for different quantized values - ORL database

Quantized values   Possible combinations   Recognition rate
18, 10, 7          18x10x7 = 1260          96.6%
7, 7, 7            7x7x7 = 343             90.2%
18, 18, 18         18x18x18 = 5832         91.7%
32, 32, 32         32x32x32 = 32768        81.5%
64, 64, 64         64x64x64 = 262144       computationally too expensive
128, 128, 128      128x128x128 = 2097152   computationally too expensive
256, 256, 256      256x256x256 = 16777216  out of memory

Table 10 Recognition rate for different quantized values - YALE database

Quantized values   Possible combinations   Recognition rate
18, 10, 7          18x10x7 = 1260          82.7%
7, 7, 7            7x7x7 = 343             72%
18, 18, 18         18x18x18 = 5832         80%
32, 32, 32         32x32x32 = 32768        81.3%
64, 64, 64         64x64x64 = 262144       computationally too expensive
128, 128, 128      128x128x128 = 2097152   computationally too expensive
256, 256, 256      256x256x256 = 16777216  out of memory

5.5 Change the block height
As shown in Table 11 and Table 12, the recognition system is very sensitive with respect to the choice of the block height.
Table 11 Recognition rate for different block heights - ORL database

Block height   Recognition rate
5              96.6%
8              76.1%
10             64.9%

Table 12 Recognition rate for different block heights - YALE database

Block height   Recognition rate
5              82.7%
8              80%
10             70.7%

6 Conclusions
The proposed SVD-HMM system for face recognition was tested on two databases - ORL and YALE. The images in both databases are taken from real subjects and differ in the number of images, the size of each image, illumination, etc. The best recognition rate for the ORL database is 96.6% and for the YALE database 82.7%. These results are obtained with optimized system parameters, chosen after an exhaustive study of their influence discussed in the paper. The preprocessing (color to grey-scale transformation and reduction of the original image size) is crucial for both datasets. The better results for the ORL dataset can be intuitively explained by the availability of a higher number of training images.

References:
[1] Cha Zhang and Zhengyou Zhang. A Survey of Recent Advances in Face Detection. Technical Report MSR-TR-2010-66, 2010.
[2] F. Samaria and F. Fallside. Face Identification and Feature Extraction Using Hidden Markov Models. In: Image Processing: Theory and Application, G. Vernazza, ed., Elsevier, 1993.
[3] P. Georgieva, L. Mihaylova and L. Jain. Advances in Intelligent Signal Processing and Data Mining: Theory and Applications. Springer, 299 pages, 2013.
[4] H. Miar-Naimi and P. Davari. A New Fast and Efficient HMM-Based Face Recognition System Using a 7-State HMM Along With SVD Coefficients. Iranian Journal of Electrical & Electronic Engineering, Vol. 4, No. 1 & 2, Jan. 2008, pp. 46-57.
[5] Mark Stamp. A Revealing Introduction to Hidden Markov Models. Department of Computer Science, San Jose State University, 2012.
[6] L. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.
[7] Phil Blunsom. Hidden Markov Models. Tutorial Lectures, 2004.
[8] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[9] http://vision.ucsd.edu/content/yale-face-database

ISBN: 978-960-474-396-4