Recognition of Facial Gestures

Attila Fazekas
University of Debrecen, Institute of Informatics
fattila@inf.unideb.hu

Abstract. In this paper we report a new application of Support Vector Machines to facial gesture recognition.

1 Introduction

The general problem of pattern classification is as follows. Suppose that different types of measurements describing an unknown set of l patterns are observed. These measurements are designated x_1, ..., x_m and can be represented by an m-dimensional vector x in the (input) space X. Let the K possible pattern classes be \omega_1, ..., \omega_K. The function of a pattern classifier is to assign (or to make a decision about) the correct class membership of a given pattern x \in X [1, 2]. We will focus on a two-class pattern recognition problem (a binary decision, K = 2). One of the classes consists of positive examples and the other of negative examples. The decision function of the classifier can be seen as an indicator function that takes the value +1 if the corresponding feature vector is a positive example and -1 if it is a negative one. One way to solve a two-class pattern recognition problem is to resort to example-based techniques (learning systems), such as neural networks or support vector machines (SVMs) [4-7]. An SVM implements the following idea [4]: by mapping the input pattern vectors, which are the elements of the training set, into a high-dimensional feature space through an a priori suitably chosen mapping, we expect that the elements of the training set will be linearly separable in the feature space. We then construct the optimal separating hyperplane in the feature space to obtain a binary decision on whether an input vector belongs to a given class or not.
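As a minimal illustration of the two-class setting described above, the following pure-Python sketch evaluates an indicator decision function for a linear separating hyperplane; the hyperplane (v, b) is hand-picked for illustration, not trained:

```python
# Two-class decision via an indicator function: the classifier returns
# +1 for positive examples and -1 for negative ones.
# The hyperplane (v, b) is hand-picked for illustration, not trained.

def decide(v, b, x):
    """Indicator decision function: sign of the affine score v.x + b."""
    score = sum(vi * xi for vi, xi in zip(v, x)) + b
    return +1 if score >= 0 else -1

# Toy 2-D patterns: points above the line x1 + x2 = 1 count as positive.
v, b = [1.0, 1.0], -1.0
print(decide(v, b, [2.0, 2.0]))  # a positive example -> +1
print(decide(v, b, [0.0, 0.0]))  # a negative example -> -1
```

Training an SVM amounts to choosing (v, b) from the data so that this decision boundary has the largest margin, as developed in Section 2.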
For example, in the facial gesture recognition application studied in this paper, the input vector comprises the gray levels of pixels from a rectangular region of a digital image, and the result of the binary decision is the answer whether this region is, for example, a smiling face or not.

2 Support Vector Machines

SVMs are learning algorithms based on statistical learning theory [6]. In this section we briefly describe the foundations of SVMs, following [6]. Statistical learning from examples aims at selecting, from a given set of functions \{f_\alpha(x) \mid \alpha \in \Lambda\}, the
one which best predicts the correct response (i.e., the response of a supervisor). This selection is based on the observation of l pairs that build the training set:

(x_1, y_1), \ldots, (x_l, y_l), \quad x_i \in \mathbb{R}^m, \; y_i \in \{+1, -1\},   (1)

which contains input vectors x_i and the associated ground truth y_i given by an external supervisor. Let the response of the learning machine f_\alpha(x) belong to a set of indicator functions \{f_\alpha(x) \mid x \in \mathbb{R}^m, \alpha \in \Lambda\}. If we define the loss function

L(y, f_\alpha(x)) = \begin{cases} 0, & \text{if } y = f_\alpha(x), \\ 1, & \text{if } y \neq f_\alpha(x), \end{cases}   (2)

which measures the error between the ground truth y for a given input x and the response f_\alpha(x) provided by the learning machine, the expected value of the loss is given by

R(\alpha) = \int L(y, f_\alpha(x)) \, p(x, y) \, dx \, dy,   (3)

where p(x, y) is the joint probability density function of the random variables x and y. R(\alpha) is called the expected risk. We would like to find the function f_{\alpha_0}(x) which minimizes the risk functional R(\alpha). The selection of this function is based on the training set of l random independent identically distributed observations (1). In order to minimize the risk functional R(\alpha), the empirical risk minimization (ERM) induction principle is usually employed, replacing the expected risk functional R(\alpha) by the empirical risk functional measured on the training set:

R_{emp}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} L(y_i, f_\alpha(x_i)).   (4)

The idea underlying the ERM induction principle is to approximate the function f_{\alpha_0}(x) which minimizes R(\alpha) by the function f_{\alpha_l}(x) which minimizes the empirical risk. This approach may be valid for training sets of large size (ideally infinite). It is known that for any \eta such that 0 \leq \eta \leq 1, the expected risk is bounded for arbitrary \alpha \in \Lambda with probability 1 - \eta [4]:

R(\alpha) \leq R_{emp}(\alpha) + \sqrt{\frac{h \left( \log(2l/h) + 1 \right) - \log(\eta/4)}{l}},   (5)

where h is a non-negative integer called the Vapnik-Chervonenkis (VC) dimension, a measure of the capacity of the learning machine. In the following, we shall call the right-hand side of inequality (5) the risk bound.
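Both the empirical risk (4) and the second term of the bound (5) are straightforward to evaluate numerically; a small Python sketch follows, where the classifier and the VC dimension h are purely illustrative:

```python
import math

def empirical_risk(f, samples):
    """R_emp(alpha), eq. (4): average 0/1 loss, eq. (2), over the training set."""
    return sum(1 for x, y in samples if f(x) != y) / len(samples)

def vc_confidence(h, l, eta):
    """Second term of the risk bound (5) for VC dimension h and l samples."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# Illustrative classifier: sign of the first coordinate.
f = lambda x: 1 if x[0] >= 0 else -1
train = [([2.0, 1.0], 1), ([-1.0, 0.5], -1), ([0.5, -2.0], -1)]
r_emp = empirical_risk(f, train)  # one of three pairs is misclassified -> 1/3
bound = r_emp + vc_confidence(h=3, l=len(train), eta=0.05)
```

With l this small the second term exceeds 1 and the bound is vacuous, which mirrors the remark above that ERM is justified only for large training sets.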
The second term in the risk bound is called the VC-confidence. Inequality (5) reveals the necessity to minimize both the empirical risk and the VC-confidence. This is the aim of the
so-called structural risk minimization (SRM) principle. Support vector machines (SVMs) are learning machines that implement the SRM principle in their training. In order to introduce the basic idea of SVMs, let us consider the construction of the optimal separating hyperplane. Suppose the training data (1) can be separated by a hyperplane, that is, there exists v \in \mathbb{R}^m such that

(v^T x_i) + b \geq +1, \quad \text{if } y_i = +1,   (6)
(v^T x_i) + b \leq -1, \quad \text{if } y_i = -1,   (7)

where v is a normal to the hyperplane, |b| / \|v\| is the perpendicular distance from the hyperplane to the origin, and \|v\| is the Euclidean norm of v. A compact notation for inequalities (6) and (7) is

y_i \left( (v^T x_i) + b \right) \geq 1, \quad i = 1, \ldots, l.   (8)

Let d_+ (d_-) be the Euclidean distance from the separating hyperplane to the closest positive (negative) example. Define the margin of the separating hyperplane as d_+ + d_-. For the linearly separable case, the SVM simply seeks the separating hyperplane with the largest margin. The optimal hyperplane

minimizes \; \frac{1}{2} \|v\|^2 \; subject to the inequalities (8).   (9)

In the case of inseparable data, the so-called soft-margin approach is used in the literature; for more details the interested reader may consult [5]. After training a support vector machine, one simply determines on which side of the decision boundary a given test pattern x lies and assigns the corresponding class label, i.e., \theta(v^T x + b). Let us investigate the generalization to the case where the decision function is not a linear function of the input vector. Now suppose we have first mapped the data to some other Euclidean space H, using a mapping \Phi:

\Phi : X \to H.   (10)

Then the training algorithm would depend on the data only through inner products in H, i.e., through functions of the form \Phi(x_i)^T \Phi(x_j). If there were a kernel function K such that K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j), we would only need to use K(x_i, x_j) in the training algorithm, without necessarily knowing \Phi(x) explicitly. How can we use this machine?
After all, we need v, which lies in H as well. The solution of the optimization problem (9) yields a coefficient vector v that is expressed as a linear combination of a subset of training vectors, whose associated Lagrange multipliers are non-zero. These training vectors are called support vectors. We denote the images of the support vectors s_i by \Phi(s_i). In the test phase, an SVM computes the sign of

f(x) = \theta \left( \sum_{i=1}^{N_S} \lambda_i y_i \Phi(s_i)^T \Phi(x) + b \right),   (11)
where \lambda_i are the Lagrange multipliers associated with the support vectors, and N_S is the number of support vectors. Again, we can avoid computing \Phi(s_i)^T \Phi(x) explicitly and use K(s_i, x) = \Phi(s_i)^T \Phi(x).

3 Experimental Results

For the facial gesture application, we used our face database of 600 images. All images in this database are recorded in 256 gray levels and are of dimensions 640 x 480. A training data set of 40 images was built for each gesture: 20 images containing a face pattern with one of the given gestures (angry, happy, serious, sad, surprised) and the others containing face patterns with gestures different from the given one. The procedure for collecting face patterns is as follows: from each image, a bounding rectangle of dimensions 256 x 320 pixels that includes the actual face has been determined manually. This area has been subsampled four times. At each subsampling, non-overlapping regions of 2 x 2 pixels are replaced by their average. Accordingly, training patterns of dimensions 16 x 20 are built; the dimensionality of the input space is 320. The ground truth, that is, the class label +1, has been appended to each pattern containing the given gesture; in all other cases the label has been -1. The test set contains faces with different gestures. Thus, a set of 90 test images of size 16 x 20 each (50 faces with the given facial gestures) was built from database images not included in the training set. For every facial gesture we trained 5 SVMs with the corresponding training set. The classification results of all SVMs for the 5 facial gestures are given in Table 1.

Table 1. Experimental results on the Facial Gesture Database (classification error, %).
Gesture    Linear  Polynomial  Polynomial  Polynomial  Gaussian RBF
                   (d = 2)     (d = 3)     (d = 4)
Angry      28.89   30.00       28.89       30.00       22.22
Happy      15.00   16.67       17.78       23.33       18.89
Sad        20.00   22.22       23.33       22.22       18.89
Serious    18.89   18.89       18.89       18.89       20.00
Surprised  26.67   26.67       26.67       26.67       27.78

Comparing these results, we can see that from the point of view of the classification error, the linear kernel outperforms all the other kernels. It can be seen that the level of the classification error is acceptable. To improve the recognition algorithm, we can combine these 5 SVMs, each trained for the recognition of one of the facial gestures, into one network. This will be the next step in our research work.
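The pattern-collection pipeline of Section 3 (a 256 x 320 face region halved four times by 2 x 2 averaging, down to a 16 x 20 pattern) can be sketched as follows; NumPy is assumed, and the input here is random stand-in data rather than an actual face image:

```python
import numpy as np

def halve(img):
    """Replace non-overlapping 2x2 pixel blocks by their average."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def to_pattern(face_region):
    """Subsample four times, as in Section 3: 256x320 -> 16x20."""
    for _ in range(4):
        face_region = halve(face_region)
    return face_region.ravel()  # flattened 320-dimensional input vector

# Stand-in for a manually cropped 256x320 face region in 256 gray levels.
region = np.random.randint(0, 256, size=(256, 320)).astype(float)
pattern = to_pattern(region)
print(pattern.shape)  # (320,)
```

Each halving shrinks both dimensions by 2, so four passes take 256 x 320 to 16 x 20, matching the 320-dimensional input space stated in the text.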
Fig. 1. Surprised face (a), smiling face (b), sad face (c), angry face (d) in the original resolution.

References

1. K.-S. Fu, "Learning control systems: review and outlook," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, no. 3, pp. 327-342, May 1986.
2. A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, "Learnability and the Vapnik-Chervonenkis dimension," Journal of the Association for Computing Machinery, vol. 36, no. 4, pp. 929-965, October 1989.
3. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, 1990.
4. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
5. V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
6. V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 988-999, September 1999.
7. E. E. Osuna, R. Freund, and F. Girosi, "Support vector machines: Training and applications," CBCL Technical Report, pp. 1-41, March 1997.
8. T. Joachims, "Making large-scale SVM learning practical," in Advances in Kernel Methods: Support Vector Learning, pp. 169-184, MIT Press, Cambridge, MA, 1998.