Tailoring Model-based Techniques to Facial Expression Interpretation

Matthias Wimmer, Christoph Mayer, Sylvia Pietzsch, and Bernd Radig
Chair for Image Understanding and Knowledge-Based Systems
Technische Universität München, Germany
{wimmerm,mayerc,pietzsch,radig}@cs.tum.edu
March 11, 2008

Abstract

Computers have been widely deployed in our daily lives, but human-computer interaction still lacks intuition. Researchers intend to resolve these shortcomings by augmenting traditional systems with human-like interaction capabilities. Knowledge about human emotion, behavior, and intention is necessary to construct convenient interaction mechanisms. Today, dedicated hardware often infers the emotional state from measurements of the human body. Similar to humans, who interpret facial expressions, our approach accomplishes this task using video information acquired with standard hardware that does not interfere with people. It exploits model-based techniques that accurately localize facial features, seamlessly track them through image sequences, and finally interpret the visible information. We make use of state-of-the-art techniques and specifically adapt most of the components involved to this scenario, which provides high accuracy and real-time capability. We base our experimental evaluation on publicly available databases and compare the results to related approaches. Our proof-of-concept demonstrates the feasibility of our approach and shows promise for integration into various applications.

This research is partly funded by a JSPS Postdoctoral Fellowship for North American and European Researchers (FY2007).

1 Introduction

Computers quickly solve mathematical problems and memorize an enormous amount of information, but human-computer interaction still lacks intuition. Since people expect humanoid robots to behave similarly to humans, this aspect is even more important in that interaction scenario. Researchers intend to resolve these shortcomings by augmenting traditional systems with human-like interaction capabilities. The widespread applicability and the comprehensive benefit motivate research on this topic. Apart from speech recognition, knowledge about human behavior, intention, and emotion is also necessary to construct convenient interaction mechanisms; knowing the user's intention and emotion enables more convenient interaction.

Robust emotion interpretation is essential for interaction with a computer in various scenarios. In computer-assisted learning, a computer acts as the teacher by explaining the content of the lesson and questioning the user afterwards. If the computer is aware of the user's emotion, the quality and success of these lessons will rise significantly. Furthermore, natural human-computer interaction requires detecting whether or not a person is telling the truth. Micro-expressions within the face reveal these subtle differences, and specifically trained computer vision applications would be able to make this distinction. For a comprehensive overview of applications in emotion recognition, we refer to Lisetti and Schiano [14].

Today, dedicated hardware often facilitates this challenge [12, 24, 22] by deriving the emotional state from blood pressure, perspiration, brain waves, heart rate, skin temperature, etc. In contrast, humans interpret emotion via video and audio information that is acquired without interfering with other persons. Fortunately, this information can also be acquired by computers using general-purpose hardware.

Our approach to facial expression recognition is based on model-based techniques, which are considered to have great potential to fulfill current and future demands on real-world image interpretation. Exploiting these capabilities, our approach consists of methods that robustly localize facial features, seamlessly track them through image sequences, and finally infer the visible facial expression. Most of the components involved have been specifically adapted to the face interpretation scenario, and these adaptations are explicitly described in this paper. This integration of state-of-the-art techniques yields high accuracy and real-time capability. Our experimental evaluation is based on publicly available image databases and is therefore well comparable to related approaches. Our proof-of-concept demonstrates the feasibility of our approach and shows promise for integration into various applications.

This paper continues as follows. Section 2 explains the state of the art of facial expression recognition, covering both psychological theory and concrete applications. Section 3 describes our model-based approach, which derives facial expressions from both low-level features (image features) and high-level features (model parameters) that represent structural and temporal aspects of facial expressions. Section 4 experimentally evaluates our approach on publicly available image databases and draws a comparison to related approaches. Section 5 summarizes our achievements and points out future work.

2 Facial Expression Recognition: State-of-the-art

Ekman and Friesen [6] find six universal facial expressions that are expressed and interpreted in the same way by humans of any origin all over the world. They do not depend on the cultural background or the country of origin. Figure 1 shows one example of each facial expression.

Figure 1: The six universal facial expressions (surprise, happiness, anger, disgust, sadness, fear) as they occur in [13].

The Facial Action Coding System (FACS) [7] precisely describes the muscle activity within a human face. So-called Action Units (AUs) denote the motion of particular facial parts and state the involved facial muscles. Combinations of AUs assemble facial expressions. Extended systems like the Emotional FACS [10] specify the relation between facial expressions and emotions.

The Cohn-Kanade Facial Expression Database (CKFE-DB) contains 488 short image sequences of 97 different persons showing the six universal facial expressions [13]. It provides researchers with a large dataset for experimenting and benchmarking purposes. Each sequence shows a neutral face at the beginning and then develops into the peak expression. Furthermore, a set of AUs has been manually specified by licensed FACS experts for each sequence. Note that this database does not contain natural facial expressions; the volunteers were asked to act. Furthermore, the image sequences were taken in a laboratory environment with predefined illumination conditions, solid background, and frontal face views. Algorithms that perform well on these image sequences are therefore not immediately appropriate for real-world scenes.

2.1 The Three-phase Procedure

According to the survey of Pantic et al. [17], the computational task of facial expression recognition is usually subdivided into three subordinate challenges, see Figure 2: face detection, feature extraction, and facial expression classification. Chibelushi et al. [1] add a pre- and a post-processing step.

Figure 2: The usual three-phase procedure for recognizing facial expressions, as in [17].

In Phase 1, the human face and the facial components have to be accurately located within the image. On the one hand, this is achieved automatically, as in [16, 8, 3]; most automatic approaches assume the presence of a frontal-view face. On the other hand, researchers specify this information by hand because they prefer to focus on the interpretation task itself, as in [21, 2, 20, 23].

In Phase 2, knowing the exact position of the face, features descriptive of facial expressions are extracted from the image data. Michel et al. [16] extract the location of 22 feature points within the face and determine their motion between a neutral frame and a frame representative of a facial expression. These feature points are mostly located around the eyes and the mouth. Littlewort et al. [15] utilize a bank of 40 Gabor wavelet filters at different scales and orientations to extract features directly from the image. They perform convolution and obtain a vector of magnitudes of complex-valued responses.

In Phase 3, the facial expression is derived from the extracted features. Most often, a classifier is learned from a comprehensive training set of annotated examples. Some approaches first compute the visible AUs and then infer the facial expression via rules stated by Ekman and Friesen [7]. Michel et al. [16] train a Support Vector Machine (SVM) that determines the visible facial expression within the video sequences of the CKFE-DB by comparing the first frame, showing the neutral expression, to the last frame, showing the peak expression. Schweiger and Bayerl [20] compute the optical flow within 6 predefined regions of a human face in order to extract the facial features. Their classification is based on supervised neural network learning.

3 Our Approach for Facial Expression Recognition via Model-based Techniques

Our approach makes use of model-based techniques, which exploit a priori knowledge about objects, such as their shape or texture, see Figure 3. Reducing the large amount of image data to a small set of model parameters facilitates and accelerates the subsequent facial expression interpretation. The description in this section again subdivides our approach into the three phases stated by Pantic et al. [17], see Section 2.1.

Figure 3: Interpreting facial expressions using model-based techniques.

3.1 Face Detection via Model-based Image Interpretation

According to the common configuration of model-based techniques, the first phase consists of five components: the model, the localization algorithm, the skin color extraction, the objective function, and the fitting algorithm.

The model contains a parameter vector p that represents its possible configurations, such as position, orientation, scaling, and deformation. Models are mapped onto the surface of an image via a set of feature points, a contour, a textured region, etc. Referring to Edwards et al. [5], deformable models are highly suitable for analyzing human faces with all their individual variations.

Our approach makes use of a statistics-based deformable model, as introduced by Cootes et al. [4]. Its parameters p = (t_x, t_y, s, θ, b)^T comprise the translation, the scaling factor, the rotation, and the deformation parameters b = (b_1, ..., b_m)^T. The latter describe the configuration of the face, such as the opening of the mouth, the roundness of the eyes, or the raising of the eyebrows, see Figure 4. In this work, we set m = 17 to cover the necessary modes of variation.

Figure 4: Deformation by changing only one parameter: b_1 rotates the head, b_3 opens the mouth, b_10 moves the pupils in parallel.

The localization algorithm automatically starts the interpretation process once the intended object is visible. It computes an initial estimate of the model parameters that is further refined by the subsequent fitting algorithm. Our system integrates the approach of Viola and Jones [25], which is able to detect the affine transformation parameters (t_x, t_y, s, and θ) of frontal-view faces. In order to obtain higher accuracy, we integrate the ability to roughly estimate the deformation parameters b as well. We apply a second iteration of the Viola and Jones object detector to the previously determined image region of the face. For this iteration, we specifically train the detector to localize facial components, such as the eyes and the mouth. In the case of the eyes, our positive training examples contain images of eyes, whereas the negative examples consist of image patches in the vicinity of the eyes, such as the cheek, the nose, or the brows. The resulting eye detector is not able to find the eyes in a complex image, because such an image usually contains a lot of information that was not part of the training data. However, it is highly appropriate for determining the location of the eyes given a pure face image. (Object detectors for facial components can be downloaded here: http://www9.in.tum.de/people/wimmerm/se/project.eyefinder/)
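As an illustration only, the following minimal sketch mimics this two-stage localization with OpenCV's stock Haar cascades; the cascade files and detector parameters are assumptions that merely stand in for the specifically trained face and facial-component detectors described above.

```python
# Minimal sketch of two-stage Viola-Jones localization (illustration only):
# stage 1 finds the face, stage 2 searches for the eyes inside the face region.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def localize(gray):
    """Return a rough initial estimate (face box, eye boxes) for the fitting step."""
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]              # first iteration: rough t_x, t_y, s
    face_roi = gray[y:y + h, x:x + w]  # second iteration runs on the face region only
    eyes = eye_cascade.detectMultiScale(face_roi, scaleFactor=1.1, minNeighbors=5)
    # eye locations give a rough estimate of some deformation parameters b
    return (x, y, w, h), [(x + ex, y + ey, ew, eh) for ex, ey, ew, eh in eyes]

# usage: boxes = localize(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
```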
Skin color provides more reliable information about the location of the face and its components than raw pixel values do, and subsequent steps benefit from this information. Unfortunately, skin color varies with the scenery, the person, and the technical equipment, which challenges its automatic detection. As in our previous work [26], a high-level vision module detects an image-specific skin color model, on which the actual process of skin color classification is based. This color model represents the context conditions of the image, and this approach makes it possible to distinguish skin color from very similar colors, such as lip color or eyebrow color, see Figure 5. This approach clearly delineates the skin regions, which correspond to the contours of our face model.

Figure 5: Deriving skin color from the camera image. Middle: non-adaptive. Right: adaptive to the person and to the context.
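To convey the idea of an image-specific skin color model, the following simplified sketch fits a Gaussian to the chroma of pixels sampled inside the detected face box and classifies all pixels by Mahalanobis distance. This is only an illustration under these assumptions; the actual module of [26] is more elaborate.

```python
# Simplified illustration of an image-specific skin colour model: estimate the
# chroma distribution from pixels inside the face box, then threshold the
# squared Mahalanobis distance of every pixel in the image.
import cv2
import numpy as np

def skin_mask(bgr, face_box, threshold=3.0):
    x, y, w, h = face_box
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb).astype(np.float64)
    chroma = ycrcb[:, :, 1:3]                          # Cr, Cb channels only
    sample = chroma[y:y + h, x:x + w].reshape(-1, 2)   # pixels assumed to be mostly skin
    mean = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False) + 1e-6 * np.eye(2))
    diff = chroma.reshape(-1, 2) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distance
    return (d2 < threshold ** 2).reshape(bgr.shape[:2])
```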

The objective function f(I, p) yields a comparable value that specifies how accurately a parameterized model p describes an image I. It is also known as the likelihood, similarity, energy, cost, goodness, or quality function. Without loss of generality, we consider lower values to denote a better model fit. Traditionally, objective functions are specified manually by first selecting a small number of simple image features, such as edges or corners, and then formulating mathematical calculation rules. Afterwards, the appropriateness is determined subjectively by inspecting the result on example images and example model parameterizations. If the result is not satisfactory, the function is tuned or redesigned from scratch. This heuristic approach relies on the designer's intuition about a good measure of fitness, and our earlier works [27, 28] show that this methodology is error-prone and tedious.

To avoid these drawbacks, we recently proposed an approach that learns the objective function from annotated example images [27]. It splits the generation of the objective function into several independent, partly automated steps. This provides several benefits: first, automated steps replace the labor-intensive design of the objective function. Second, this approach is less error-prone, because giving examples of a good fit is much easier than explicitly specifying rules that need to cover all examples. Third, this approach does not rely on expert knowledge and is therefore generally applicable, not domain-dependent. The bottom line is that this approach yields more robust and accurate objective functions, which greatly facilitate the task of the fitting algorithm.

The fitting algorithm searches for the model that best describes the face visible in the image. Therefore, it aims at finding the model parameters that minimize the objective function. Fitting algorithms have been the subject of intensive research and evaluation, e.g. Simulated Annealing, Genetic Algorithms, Particle Filtering, RANSAC, CONDENSATION, and CCD; see [11] for a recent overview and categorization. Since we adapt the objective function rather than the fitting algorithm to the specifics of our application, we are able to use any of these standard fitting algorithms. Due to real-time requirements, our experiments in Section 4 have been conducted with a quick hill climbing algorithm. Note that a well-specified objective function makes it nearly as accurate as a global optimization strategy, such as Genetic Algorithms.
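A minimal sketch of such a hill climbing step is given below: it greedily perturbs one model parameter at a time and keeps every change that lowers the objective function. The learned objective function of [27] is treated as a black box here, and the step size and iteration count are illustrative assumptions.

```python
# Quick hill climbing over the model parameters p, minimizing f(I, p).
import numpy as np

def fit(objective, image, p0, step=0.05, iterations=100):
    p = np.asarray(p0, dtype=float)
    best = objective(image, p)
    for _ in range(iterations):
        improved = False
        for i in range(p.size):
            for delta in (+step, -step):
                candidate = p.copy()
                candidate[i] += delta
                value = objective(image, candidate)
                if value < best:            # lower values denote a better fit
                    p, best, improved = candidate, value, True
        if not improved:                    # local minimum reached
            break
    return p
```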
3.2 Feature Extraction: Structural and Temporal Features

Two aspects generally characterize facial expressions: they turn the face into a distinctive state [15], and the involved muscles show a distinctive motion [20, 16]. Our approach considers both aspects by extracting structural and temporal features. This large amount of feature data provides a profound basis for the subsequent classification step, which therefore achieves high accuracy.

Structural features. The deformation parameters b describe the constitution of the visible face. The examples in Figure 4 illustrate the relation between the facial expression and the value of b. Therefore, we consider b to provide high-level information to the interpretation process. In contrast, the transformation parameters t_x, t_y, s, and θ are not related to the facial expression, and therefore we do not consider them as features.

Temporal features. Since facial expressions emerge from muscle activity, the motion of particular feature points within the face gives evidence about the facial expression. Real-time capability is important, and therefore only a small number of feature points is considered. The relative location of these points is connected to the structure of the face model. Note that we do not specify these locations manually, because this would require the designer to be well experienced in analyzing facial expressions. Instead, we automatically generate G uniformly distributed feature points, see Figure 6. We expect these points to move descriptively and predictably in the case of a particular facial expression. We sum up the motion g_x,i and g_y,i of each point i, 1 ≤ i ≤ G, during a short time period. We set this period to 2 seconds in order to cover slowly expressed emotions as well. The motion of the feature points is normalized by the affine transformation of the entire face (t_x, t_y, s, and θ) in order to separate the facial motion from the rigid head motion.

In order to determine robust descriptors, Principal Component Analysis determines the H most relevant motion patterns (principal components) visible within the set of training sequences. A linear combination of these motion patterns describes each observation approximately correctly. This reduces the number of descriptors (H ≪ 2G) and enforces robustness towards outliers as well. As a compromise between accuracy and runtime performance, we set the number of feature points to G = 140 and the number of motion patterns to H = 14. Figure 6 visualizes the obtained motion of the feature points for some example facial expressions.

Figure 6: Considering structural and temporal features for facial expression recognition.
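The following sketch illustrates how such temporal features could be computed: the motion of the G points accumulated over the 2-second window is stacked into one 2G-dimensional vector and projected onto the H principal motion patterns. scikit-learn's PCA and the random stand-in training data are assumptions for illustration only.

```python
# Sketch of the temporal features: accumulated, normalized point motion
# projected onto the H most relevant motion patterns.
import numpy as np
from sklearn.decomposition import PCA

G, H = 140, 14

def motion_vector(point_tracks):
    """point_tracks: array (frames, G, 2) of point positions, already normalized
    by the face's affine transformation (t_x, t_y, s, θ)."""
    displacement = point_tracks[-1] - point_tracks[0]   # summed motion over the window
    return displacement.reshape(2 * G)                  # (g_x,1, g_y,1, ..., g_x,G, g_y,G)

# learn the H principal motion patterns from training windows (here: stand-in data)
training_motions = np.random.default_rng(0).normal(size=(200, 2 * G))
pca = PCA(n_components=H).fit(training_motions)

def temporal_features(point_tracks):
    return pca.transform(motion_vector(point_tracks)[None, :])[0]   # h_1 ... h_H
```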

The feature vector t is assembled from the m structural and the H temporal features mentioned, see Equation 1. This vector represents the basis for facial expression classification.

t = (b_1, ..., b_m, h_1, ..., h_H)^T    (1)

3.3 Facial Expression Classification via Decision Trees

With the knowledge of the feature vector t, a classifier infers the correct facial expression. We learn a Binary Decision Tree [18], which is a robust and quick classifier. However, any other multi-class classifier that is able to derive the class membership from real-valued features can be integrated as well, such as a k-nearest-neighbor classifier. We take 67% of the image sequences of the CKFE-DB as the training set and the remainder as the test set, on which the evaluation in the next section is conducted.
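A minimal sketch of this classification step is shown below: the feature vector t of Equation 1 is assembled from the m structural parameters and the H temporal features, and a decision tree is trained on 67% of the sequences. scikit-learn's CART tree stands in for the C4.5-style tree of [18], and the random feature vectors are placeholders for the real features extracted from the CKFE-DB sequences.

```python
# Sketch of decision-tree classification on the feature vector t (Equation 1).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

m, H = 17, 14
rng = np.random.default_rng(0)
X = rng.normal(size=(488, m + H))      # stand-in: one feature vector t per sequence
y = rng.integers(0, 6, size=488)       # stand-in: one of the six expression labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.67, random_state=0)
classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", classifier.score(X_test, y_test))
```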
4 Experimental Evaluation

In order to evaluate the accuracy of our proof-of-concept, we apply it to the previously unseen fraction of the CKFE-DB. Table 1 shows the recognition rate and the confusion matrix for each facial expression. The facial expressions happiness and fear are confused most often. The reason for this confusion is the similar muscle activity around the mouth, a similarity that is also reflected by FACS.

ground truth   classified as                                        recognition rate
               surprise  happiness  anger  disgust  sadness  fear
surprise          28         1        1       0        0       0        93.33%
happiness          1        26        1       2        3       4        70.27%
anger              1         1       14       2        2       1        66.67%
disgust            0         2        1      10        3       1        58.82%
sadness            1         2        2       2       22       1        73.33%
fear               1         5        1       0        2      13        59.09%
mean recognition rate                                                    70.25%

Table 1: Confusion matrix and recognition rates of our approach.

The results obtained by our evaluation are comparable to state-of-the-art approaches, see Table 2. The accuracy of our approach is comparable to that of Schweiger et al. [20], who also conduct their evaluation on the CKFE-DB. For classification, they also favor motion from different facial parts and determine Principal Components from these features. However, Schweiger et al. manually specify the region of the visible face, whereas our approach performs an automatic localization via model-based image interpretation. Michel et al. [16] also focus on facial motion by manually specifying 22 feature points that are predominantly located around the mouth and around the eyes. They utilize a Support Vector Machine (SVM) for determining one of the six facial expressions.

facial expression   Our results   Michel et al. [16]   Schweiger et al. [20]
Anger                  66.7%            66.7%                 75.6%
Disgust                64.1%            58.8%                 30.0%
Fear                   59.1%            66.7%                  0.0%
Happiness              70.3%            91.7%                 79.2%
Sadness                73.3%            62.5%                 60.5%
Surprise               93.3%            83.3%                 89.8%
Average                71.1%            71.8%                 55.9%

Table 2: Recognition rates of our approach compared to related work.
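The recognition rates in Table 1 follow directly from the confusion matrix: each rate is the diagonal entry divided by the corresponding row sum, and the mean recognition rate averages the six per-class rates. The following short check reproduces the numbers:

```python
# Per-class recognition rates and mean rate from the confusion matrix of Table 1.
import numpy as np

confusion = np.array([            # rows: ground truth, columns: classified as
    [28, 1, 1, 0, 0, 0],          # surprise
    [1, 26, 1, 2, 3, 4],          # happiness
    [1, 1, 14, 2, 2, 1],          # anger
    [0, 2, 1, 10, 3, 1],          # disgust
    [1, 2, 2, 2, 22, 1],          # sadness
    [1, 5, 1, 0, 2, 13],          # fear
])
rates = confusion.diagonal() / confusion.sum(axis=1)
print(np.round(rates * 100, 2))      # [93.33 70.27 66.67 58.82 73.33 59.09]
print(round(rates.mean() * 100, 2))  # 70.25
```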

5 Summary and Outlook

Automatic recognition of human emotion has recently become an important factor in multi-modal human-machine interfaces and further applications. This paper presents our proof-of-concept that tackles one part of this challenge: facial expression interpretation. For this task, we exploit model-based techniques that greatly support real-world image interpretation, and we have adapted them in two ways to the specific task of facial expression recognition. First, we extract informative features about faces from the image data that describe skin color and lip color regions. Second, we propose to learn the objective function for model fitting from example images as opposed to constructing it manually. The resulting system shows promising recognition results and has been integrated into various systems for demonstration purposes [9]. In future work, we aim at integrating multi-modal feature sets, for which we already have preliminary results [19]. We will apply Early Sensor Fusion in order to retain all knowledge for the final decision process and to exploit the ability of a combined feature-space optimization.

References

[1] C. C. Chibelushi and F. Bourel. Facial expression recognition: A brief tutorial overview, January 2003.
[2] I. Cohen, N. Sebe, L. Chen, A. Garg, and T. Huang. Facial expression recognition from video sequences: Temporal and static modeling. CVIU, special issue on face recognition, 91(1-2):160-187, 2003.
[3] J. Cohn, A. Zlochower, J.-J. J. Lien, and T. Kanade. Automated face analysis by feature point tracking has high concurrent validity with manual FACS coding. Psychophysiology, 36:35-43, 1999.
[4] T. F. Cootes and C. J. Taylor. Active shape models - "smart snakes". In 3rd BMVC, pp. 266-275, 1992.
[5] G. J. Edwards, T. F. Cootes, and C. J. Taylor. Face recognition using active appearance models. In 5th ECCV, LNCS-Series 1406-1607, pp. 581-595, Germany, 1998.
[6] P. Ekman. Universals and cultural differences in facial expressions of emotion. In Nebraska Symposium on Motivation, vol. 19, pp. 207-283. University of Nebraska Press, 1972.
[7] P. Ekman. Facial expressions. In Handbook of Cognition and Emotion. John Wiley & Sons, New York, 1999.
[8] I. A. Essa and A. P. Pentland. Coding, analysis, interpretation, and recognition of facial expressions. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):757-763, 1997.
[9] S. Fischer, S. Döring, M. Wimmer, and A. Krummheuer. Experiences with an emotional sales agent. In Affective Dialogue Systems, vol. 3068 of Lecture Notes in Computer Science, pp. 309-312. Springer, 2004.
[10] W. V. Friesen and P. Ekman. Emotional Facial Action Coding System. Unpublished manuscript, UCSF, 1983.
[11] R. Hanek. Fitting Parametric Curve Models to Images Using Local Self-adapting Separation Criteria. PhD thesis, Department of Informatics, Technische Universität München, 2004.
[12] C. S. Ikehara, D. N. Chin, and M. E. Crosby. A model for integrating an adaptive information filter utilizing biosensor data to assess cognitive load. In User Modeling, vol. 2702/2003, pp. 208-212, 2003.
[13] T. Kanade, J. F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Face and Gesture Recognition, pp. 46-53, 2000.
[14] C. L. Lisetti and D. J. Schiano. Automatic facial expression interpretation: Where human-computer interaction, artificial intelligence and cognitive science intersect. Pragmatics and Cognition, special issue on Facial Information Processing and Multidisciplinary Perspective, 1999.
[15] G. Littlewort, I. Fasel, M. S. Bartlett, and J. R. Movellan. Fully automatic coding of basic expressions from video. Technical report, 2002.
[16] P. Michel and R. El Kaliouby. Real time facial expression recognition in video using support vector machines. In Int. Conf. on Multimodal Interfaces, pp. 258-264, 2003.
[17] M. Pantic and L. J. M. Rothkrantz. Automatic analysis of facial expressions: The state of the art. PAMI, 22(12):1424-1445, 2000.
[18] R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.
[19] B. Schuller, M. Wimmer, D. Arsic, G. Rigoll, and B. Radig. Audiovisual behavior modeling by combined feature spaces. In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pp. 733-736, USA, 2007.
[20] R. Schweiger, P. Bayerl, and H. Neumann. Neural architecture for temporal emotion classification. In Affective Dialogue Systems, LNAI 3068, pp. 49-52, June 2004.
[21] N. Sebe, M. S. Lew, I. Cohen, A. Garg, and T. S. Huang. Emotion recognition using a Cauchy naive Bayes classifier. In 16th ICPR, vol. 1, pp. 10017, 2002.
[22] E. M. Sheldon. Virtual agent interactions. PhD thesis, 2001.
[23] Y.-L. Tian, T. Kanade, and J. F. Cohn. Recognizing action units for facial expression analysis. PAMI, 23(2):97-115, 2001.

[24] R. M. Vick and C. S. Ikehara. Methodological issues of real time data acquisition from multiple sources of physiological data. In 36th Int. Conf. on System Sciences, pp. 129.1, 2003.
[25] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, pp. 511-518, 2001.
[26] M. Wimmer, B. Radig, and M. Beetz. A person and context specific approach for skin color classification. In 18th ICPR, vol. 2, pp. 39-42, 2006.
[27] M. Wimmer, F. Stulp, S. Pietzsch, and B. Radig. Learning local objective functions for robust face model fitting. PAMI, 2008, to appear.
[28] M. Wimmer, F. Stulp, S. Tschechne, and B. Radig. Learning robust objective functions for model fitting in image understanding applications. In BMVC, vol. 3, pp. 1159-1168, 2006.