
Recognition of Facial Gestures

Attila Fazekas
University of Debrecen, Institute of Informatics
fattila@inf.unideb.hu

Abstract. In this paper we report a new application of Support Vector Machines to facial gesture recognition.

1 Introduction

The general problem of pattern classification is as follows. Suppose that different types of measurements describing an unknown set of $l$ patterns are observed. These measurements are designated as $x_1, \ldots, x_m$ and can be represented by an $m$-dimensional vector $x$ in the (input) space $X$. Let the $K$ possible pattern classes be $\omega_1, \ldots, \omega_K$. The function of a pattern classifier is to assign (or to make a decision about) the correct class membership of a given pattern $x \in X$ [1, 2].

We focus on the two-class pattern recognition problem (binary decision, $K = 2$). One of the classes is the class of positive examples and the other is the class of negative examples. The decision function of the classifier can be seen as an indicator function which takes the value $+1$ if the corresponding feature vector is a positive example and $-1$ if it is a negative one.

One way to solve a two-class pattern recognition problem is to resort to example-based techniques (learning systems), such as neural networks or support vector machines (SVMs) [4-7]. An SVM implements the following idea [4]: by mapping the input pattern vectors, which are the elements of the training set, into a high-dimensional feature space through an a priori suitably chosen mapping, we expect the elements of the training set to become linearly separable in the feature space. We then construct the optimal separating hyperplane in the feature space to obtain a binary decision about whether the input vector belongs to a given class or not. In the application studied in this paper, facial gesture recognition, the input vector comprises the gray levels of the pixels of a rectangular region of the digital image, and the result of the binary decision is the answer to whether this region is, for example, a smiling face or not.
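As an illustration of this binary decision setup, a minimal sketch in Python with scikit-learn might look as follows. This is not the pipeline used in the paper; the random data, shapes, and kernel choice are assumptions for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: each row is a flattened 16x20 gray-level patch (320 values),
# labelled +1 (contains the target gesture) or -1 (does not).
rng = np.random.default_rng(0)
X_train = rng.random((40, 320))                 # 40 training patches (assumed)
y_train = np.where(np.arange(40) < 20, 1, -1)   # 20 positive and 20 negative examples

clf = SVC(kernel="linear")                      # separating hyperplane in the input space
clf.fit(X_train, y_train)

x_test = rng.random((1, 320))                   # one unseen patch
print(clf.predict(x_test))                      # +1: target gesture, -1: anything else
```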

2 Support Vector Machines

SVMs are learning algorithms based on statistical learning theory [6]. In this section we briefly describe the foundations of SVMs following [6].

Statistical learning from examples aims at selecting, from a given set of functions $\{f_\alpha(x) \mid \alpha \in \Lambda\}$, the one which best predicts the correct response (i.e., the response of a supervisor). This selection is based on the observation of $l$ pairs that build the training set:

\[
  (x_1, y_1), \ldots, (x_l, y_l), \qquad x_i \in \mathbb{R}^m, \; y_i \in \{+1, -1\}, \tag{1}
\]

which contains the input vectors $x_i$ and the associated ground truth $y_i$ given by an external supervisor.

Let the response of the learning machine $f_\alpha(x)$ belong to a set of indicator functions $\{f_\alpha(x) \mid x \in \mathbb{R}^m, \alpha \in \Lambda\}$. If we define the loss function

\[
  L(y, f_\alpha(x)) =
  \begin{cases}
    0, & \text{if } y = f_\alpha(x), \\
    1, & \text{if } y \neq f_\alpha(x),
  \end{cases} \tag{2}
\]

which measures the error between the ground truth $y$ for a given input $x$ and the response $f_\alpha(x)$ provided by the learning machine, then the expected value of the loss is given by

\[
  R(\alpha) = \int L(y, f_\alpha(x)) \, p(x, y) \, dx \, dy, \tag{3}
\]

where $p(x, y)$ is the joint probability density function of the random variables $x$ and $y$. $R(\alpha)$ is called the expected risk. We would like to find the function $f_{\alpha_0}(x)$ which minimizes the risk functional $R(\alpha)$. The selection of this function is based on the training set of $l$ random, independent, identically distributed observations (1).

In order to minimize the risk functional $R(\alpha)$, the empirical risk minimization (ERM) induction principle is usually employed, replacing the expected risk functional $R(\alpha)$ by the empirical risk functional measured on the training set:

\[
  R_{\mathrm{emp}}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} L(y_i, f_\alpha(x_i)). \tag{4}
\]

The idea underlying the ERM induction principle is to approximate the function $f_{\alpha_0}(x)$ which minimizes $R(\alpha)$ by the function $f_{\alpha_l}(x)$ which minimizes the empirical risk. This approach is valid for training sets of large size (ideally infinite).
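As a small sketch of how the empirical risk (4) with the 0-1 loss (2) is evaluated on a finite training set; the decision function used here is a stand-in chosen for the example, not one from the paper.

```python
import numpy as np

def zero_one_loss(y, y_pred):
    """0-1 loss of Eq. (2): 0 if the prediction matches the label, 1 otherwise."""
    return np.where(y == y_pred, 0, 1)

def empirical_risk(f, X, y):
    """Empirical risk of Eq. (4): average 0-1 loss over the l training pairs."""
    return zero_one_loss(y, f(X)).mean()

# Stand-in decision function: the sign of the first feature.
f = lambda X: np.where(X[:, 0] >= 0, 1, -1)
X = np.array([[0.5, 1.0], [-0.2, 0.3], [1.1, -0.4], [-0.7, 0.9]])
y = np.array([1, 1, 1, -1])
print(empirical_risk(f, X, y))  # 0.25: one of the four examples is misclassified
```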

It is known that, for some $\eta$ such that $0 \leq \eta \leq 1$, the expected risk is bounded for arbitrary $\alpha \in \Lambda$ with probability $1 - \eta$ [4]:

\[
  R(\alpha) \leq R_{\mathrm{emp}}(\alpha) + \sqrt{\frac{h\left(\log\frac{2l}{h} + 1\right) - \log\frac{\eta}{4}}{l}}, \tag{5}
\]

where $h$ is a non-negative integer called the Vapnik-Chervonenkis (VC) dimension, a measure of the capacity of SVMs. In the following, we shall call the right-hand side of inequality (5) the risk bound. The second term in the risk bound is called the VC-confidence. Inequality (5) reveals the necessity of minimizing both the empirical risk and the VC-confidence. This is the aim of the so-called structural risk minimization (SRM) principle. Support vector machines are learning machines that implement the SRM principle in their training.

In order to introduce the basic idea of SVMs, let us consider the construction of the optimal separating hyperplane. Suppose the training data (1) can be separated by a hyperplane, that is, there exists $v \in \mathbb{R}^m$ such that

\[
  v^T x_i + b \geq +1, \quad \text{if } y_i = +1, \tag{6}
\]
\[
  v^T x_i + b \leq -1, \quad \text{if } y_i = -1, \tag{7}
\]

where $v$ is a normal to the hyperplane, $|b| / \|v\|$ is the perpendicular distance from the hyperplane to the origin, and $\|v\|$ is the Euclidean norm of $v$. A compact notation for inequalities (6) and (7) is

\[
  y_i \left( v^T x_i + b \right) \geq 1, \qquad i = 1, \ldots, l. \tag{8}
\]

Let $d_+$ ($d_-$) be the Euclidean distance from the separating hyperplane to the closest positive (negative) example. Define the margin of the separating hyperplane to be $d_+ + d_-$. For the linearly separable case, the SVM simply seeks the separating hyperplane with the largest margin. The optimal hyperplane minimizes

\[
  \tfrac{1}{2} \|v\|^2 \quad \text{subject to the inequalities (8).} \tag{9}
\]

In the case of inseparable data, the so-called soft-margin approach is used in the literature; for more details the interested reader may consult [5]. After training a support vector machine, one simply determines on which side of the decision boundary a given test pattern $x$ lies and assigns the corresponding class label, i.e., $\theta(v^T x + b)$.

Let us investigate the generalization to the case where the decision function is not a linear function of the input vector. Suppose we first map the data to some other Euclidean space $H$ using a mapping

\[
  \Phi : X \to H. \tag{10}
\]

Then the training algorithm depends on the data only through inner products in $H$, i.e., through functions of the form $\Phi(x_i)^T \Phi(x_j)$. If there were a kernel function $K$ such that $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$, we would only need to use $K(x_i, x_j)$ in the training algorithm, without ever needing to know $\Phi(x)$ explicitly.

How can we use this machine? After all, we need $v$, which lies in $H$ as well. The solution of the optimization problem (9) yields a coefficient vector $v$ that is expressed as a linear combination of a subset of the training vectors, namely those whose associated Lagrange multipliers are non-zero. These training vectors are called support vectors. We denote the support vectors by $s_i$ and their images by $\Phi(s_i)$.
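As a small illustration of the kernel idea (an example added here, not taken from the paper): for the homogeneous polynomial kernel of degree 2 on $\mathbb{R}^2$, $K(x, z) = (x^T z)^2$ coincides with the inner product $\Phi(x)^T \Phi(z)$ for the explicit mapping $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so training can work with $K$ alone.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map on R^2: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x^T z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))  # 1.0 -> inner product computed in the feature space H
print(poly2_kernel(x, z))      # 1.0 -> same value, without ever forming phi explicitly
```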

In the test phase, an SVM computes the sign of

\[
  f(x) = \theta\left( \sum_{i=1}^{N_S} \lambda_i y_i \Phi(s_i)^T \Phi(x) + b \right), \tag{11}
\]

where the $\lambda_i$ are the Lagrange multipliers associated with the support vectors and $N_S$ is the number of support vectors. Again, we can avoid computing $\Phi(s_i)^T \Phi(x)$ explicitly and use $K(s_i, x) = \Phi(s_i)^T \Phi(x)$.

3 Experimental Results

For the facial gesture application we used our face database of 600 images. All images in this database are recorded in 256 gray levels and are of dimensions 640 x 480. A training set of 40 images was built for each of the given gestures (angry, happy, serious, sad, surprised): 20 images containing a face pattern with the given gesture and the other 20 containing face patterns with gestures different from the given one.

The procedure for collecting face patterns is as follows: from each image, a bounding rectangle of dimensions 256 x 320 pixels that includes the actual face has been determined manually. This area has been subsampled four times; at each subsampling, non-overlapping regions of 2 x 2 pixels are replaced by their average. Accordingly, training patterns of dimensions 16 x 20 are built, and the dimensionality of the input space is 320. The ground truth, that is, the class label +1, has been assigned to each pattern containing the given gesture; in the other cases the label has been -1.

The test set contains faces with different gestures. A set of 90 test images of size 16 x 20 each (50 of them faces with the given facial gestures) was built from database images not included in the training set. For every facial gesture we trained 5 SVMs, one per kernel, on the corresponding training set. The classification results of all SVMs for the 5 facial gestures are given in Table 1.

Table 1. Experimental results on the facial gesture database.

Gesture      Linear   Polynomial d=2   Polynomial d=3   Polynomial d=4   Gaussian RBF
Angry        28.89    30.00            28.89            30.00            22.22
Happy        15.00    16.67            17.78            23.33            18.89
Sad          20.00    22.22            23.33            22.22            18.89
Serious      18.89    18.89            18.89            18.89            20.00
Surprised    26.67    26.67            26.67            26.67            27.78

Comparing these results, we can see that, from the point of view of the classification error, the linear kernel outperforms all the other kernels, and the level of the classification error is acceptable. To improve the recognition algorithm, we can combine the five SVMs, each trained to recognize one of the facial gestures, into a single network. This will be the next step in our research work.
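The patch preparation described above (repeated 2 x 2 averaging of a manually selected 256 x 320 face crop down to a 16 x 20 pattern) could be sketched as follows; this NumPy-based implementation is an assumption for illustration, not the code used in the paper.

```python
import numpy as np

def halve(img):
    """One subsampling step: replace non-overlapping 2x2 blocks by their average."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def face_patch(crop):
    """Subsample a 256x320 face crop four times to obtain a 16x20 training pattern."""
    patch = crop.astype(float)
    for _ in range(4):
        patch = halve(patch)
    return patch

crop = np.random.default_rng(1).integers(0, 256, size=(256, 320))  # stand-in crop
patch = face_patch(crop)
print(patch.shape)  # (16, 20)
print(patch.size)   # 320 = dimensionality of the SVM input space
```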

Fig. 1. Surprised face (a), smiling face (b), sad face (c), angry face (d) at the original resolution.

References

1. K.-S. Fu, "Learning control systems - review and outlook," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, no. 3, pp. 327-342, May 1986.
2. A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, "Learnability and the Vapnik-Chervonenkis dimension," Journal of the Association for Computing Machinery, vol. 36, no. 4, pp. 929-965, October 1989.
3. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, 1990.
4. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
5. V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
6. V. N. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 988-999, September 1999.
7. E. E. Osuna, R. Freund, and F. Girosi, "Support vector machines: Training and applications," CBCL Technical Report, pp. 1-41, March 1997.
8. T. Joachims, "Making large-scale SVM learning practical," in Advances in Kernel Methods - Support Vector Learning, pp. 169-184, MIT Press, Cambridge, MA, 1998.