Real-Time Speech-Driven Face Animation with Expressions Using Neural Networks


Pengyu Hong, Zhen Wen, and Thomas S. Huang, Fellow, IEEE

Abstract—A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modelling, automatic facial motion analysis, and real-time speech-driven face animation with expressions using neural networks. Based on this framework, we learn a quantitative visual representation of facial deformations, called the Motion Units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm, which is used to collect an audio-visual training database. We then construct a real-time audio-to-MUP mapping by training a set of neural networks on the collected audio-visual database. A quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system [1]. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of its influence on bimodal human emotion perception.

Keywords—Real-time speech-driven talking face with expressions, facial deformation modelling, facial motion analysis and synthesis, neural networks.

I. Introduction

Synthetic talking faces have been developed for applications such as readers, web newscasters, virtual friends, computer agents, and so on [2], [3], [4], [5], [6], [7]. Research shows that a synthetic talking face can help people understand the associated speech in noisy environments [8]. It also helps people react more positively in interactive services [9]. A real-time speech-driven synthetic talking face, as a computer-aided human-human interface, provides an effective and efficient multimodal communication channel for face-to-face communication in distributed collaboration environments [10], [11], [12], [13], [14]. However, to date, few real-time speech-driven face animation systems have considered synthesizing facial expressions. Facial expression can strengthen or weaken the sense of the corresponding speech and helps attract the attention of the listener. More importantly, it is the best way to visually express emotion [15]. Therefore, a real-time speech-driven facial animation system that synthesizes facial expressions will be more effective in terms of delivering visual and emotional information.

P. Hong, Z. Wen, and T. S. Huang are with the Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA (hong@ifp.uiuc.edu, zhenwen@ifp.uiuc.edu, huang@ifp.uiuc.edu).
It would be ideal if the computer could accurately recognize emotions from speech and use the recognition results to synthesize facial expressions. However, research has shown that it is very difficult to recognize emotion from speech; emotion recognition from speech, by either humans or computers, is much more difficult than the recognition of words or sentences. In Scherer's study, the voices of 14 professional actors were used [16]. Scherer's experimental results showed that the human ability to recognize emotions from purely vocal stimuli is around 60%. Dellaert et al. [17] compared the performance of different classification algorithms on a speech database containing four emotion categories (happy, sad, anger, and fear), with 50 short sentences per category spoken by 5 speakers. The highest recognition rate achieved by those algorithms was 79.5%. Petrushin [18] compared human and computer recognition of emotions from speech and reported recognition rates of around 65% in both cases. Recently, Petrushin [19] reported that human subjects can recognize five emotions (normal, happy, angry, sad, and afraid) with an average accuracy of 63.5%.

On the other hand, research has shown that recognizing facial expressions using visual information alone is much easier. Recent works on automatic facial expression recognition by computer use optical flow, appearance-based models, or local parametric motion models to extract information about facial features [20], [21], [22], [23], [24]. The extracted information is input into classification algorithms for facial expression recognition. A recognition rate as high as 98% was achieved by Essa and Pentland on five expressions (smile, surprise, anger, disgust, and raised brow) [24]. Therefore, visual information is more effective than audio information in terms of conveying information related to emotional states. In addition, the voice and the facial expressions may not convey the same emotional information in many situations. For example, a subject may speak calmly while smiling, or may speak emotionally without any noticeable facial expression.

There are works on synthesizing facial expressions directly from speech [25], [26]. However, being aware of the above difficulties, we assume that the emotional state of the user is known and concentrate on developing the methodology and techniques for synthesizing expressive talking faces given the speech signals and emotional states.

The user is required to decide the facial expression of his/her avatar. For example, the user can tell the computer which expression should be added to the synthetic talking face by pressing the corresponding button in the interface of the face animation system or the corresponding key on the keyboard. This gives the user more freedom to remotely control his/her avatar-based appearance.

The rest of the paper is organized as follows. In Section II, we review the related work. We then introduce an integrated framework for face modelling, facial motion analysis, and synthesis in Section III. The integrated framework systematically addresses three related issues: (1) learning a quantitative visual representation for facial deformation modelling and face animation; (2) automatic facial motion analysis; and (3) speech-to-facial coarticulation modelling. Section IV discusses learning a quantitative visual representation, called the Motion Units (MUs), from a set of labelled facial deformation data. In Section V, we propose an MU-based facial motion analysis algorithm, which is used to analyze the facial movements of speakers. In Section VI, we discuss how to train a real-time audio-to-visual mapping with expressions using neural networks. The experimental results are provided in Section VII. Finally, the paper is concluded with a summary and discussion in Section VIII.

II. Previous Work

The core of speech-driven face animation is the audio-to-visual mapping, which maps the audio information to the visual information representing the facial movements. The visual information is determined by the way a face is modelled. To achieve natural face animation, the audio-to-visual mapping should be learned from a large audio-visual training database of real facial movements and the corresponding speech streams. A robust automatic facial motion analysis algorithm is required to effectively and efficiently collect a large set of audio-visual training data from real human subjects. In this section, we review previous work on face modelling, facial motion analysis, and real-time speech-driven face animation.

A. Face Modelling

One main goal of face modelling is to develop a facial deformation control model that deforms the facial surface spatially. Human faces are commonly modelled as free-form geometric mesh models [27], [28], [29], [30], [31], parametric geometric mesh models [32], [33], [34], or physics-based models [35], [36], [37]. Different face models have different facial deformation control models and result in different visual features used in the audio-visual training database.

A free-form face model has an explicit control model, which consists of a set of control points. The user can manually adjust the control points to manipulate the facial surface. Once the coordinates of the control points are decided, the remaining vertices of the face model are deformed by interpolation using B-spline functions [27], radial basis functions [28], [29], combinations of affine functions and radial basis functions [30], or rational functions [31]. It is straightforward to manipulate the facial surface using free-form face models. However, little research has been done in a systematic way to address how to choose the control points, how to choose the interpolation functions, how to adjust the control points, and what the correlations among those control points are.
Parametric face models use a set of parameters to decide the shape of the face surface. The coordinates of the vertices of the face model are calculated by a set of predefined functions whose variables are those parameters. The difficulty of this kind of approach is how to design those functions. Usually, they are designed manually and subjectively; therefore, those functions may not represent the characteristics of natural facial deformations well.

Physics-based models simulate facial skin, tissue, and muscles by multi-layer dense meshes. Facial surface deformation is triggered by the contractions of the synthetic facial muscles. The muscle forces are propagated through the skin layer and finally deform the facial surface. The simulation procedure solves a set of dynamics equations. This kind of approach can achieve very realistic animation results. However, the physical models are sophisticated and computationally expensive. In addition, deciding the values of the large set of parameters in a physics-based face model is an art.

B. Facial Motion Analysis

It is well known that tracking facial motions based on low-level facial image features (e.g., edges or facial feature points) alone is not robust. Model-based facial motion tracking algorithms achieve more robust results by using high-level knowledge models [38], [39], [24], [40]. Those high-level models correspond to the facial deformation control models and encode information about possible facial deformations. The tracking algorithms first extract the control information by combining the low-level image information, which is obtained by low-level image processing (e.g., edge detection, skin/lip color segmentation, template matching, optical flow calculation, etc.), with the high-level knowledge models. The control information is used to deform the face model. The deformation results are the tracking results of the current time stamp and are used by the tracking algorithms in the next step. The final tracking results will be greatly degraded if a biased high-level knowledge model is used in this loop. To be faithful to the real facial deformations, the high-level knowledge models should be learned from labelled real facial deformations.

C. Real-Time Speech-Driven Face Animation

Some approaches train the audio-to-visual mapping using hidden Markov models (HMMs) [41], [42], [25], which have relatively long time delays. Other approaches attempt to generate lip shapes in real time using only one audio frame. Those approaches use vector quantization [43], affine transformations [44], Gaussian mixture models [45], or artificial neural networks [46], [47] for the audio-to-visual mapping. Vector quantization [43] is a classification-based audio-to-visual conversion approach.

The audio features are classified into one of a number of classes, and each class is then mapped to a corresponding visual output. Though it is computationally efficient, the vector quantization approach often leads to discontinuous mapping results. The affine transformation approach [44] maps the audio features to the visual features by simple linear matrix operations. The Gaussian mixture approach [45] models the joint probability distribution of the audio-visual vectors as a Gaussian mixture. Each Gaussian mixture component generates a linear estimate of the visual feature given an audio feature, and the estimates of all the mixture components are then weighted to produce the final visual estimate. The Gaussian mixture approach produces smoother results than the vector quantization approach does. Morishima and Harashima [46] trained a three-layer neural network to map the LPC cepstrum coefficients of each speech segment to the mouth-shape parameters of five vowels. Kshirsagar and Magnenat-Thalmann [47] also trained a three-layer neural network to classify each speech segment into vowels. The average energy of the speech segment is then used to modulate the lip shape of the recognized vowel. However, the approaches in [43], [44], [45], [46], [47] do not consider the audio contextual information, which is very important for modelling mouth coarticulation during speech production.

Many other approaches also train neural networks for audio-to-visual mapping while taking the audio contextual information into account. Massaro et al. [48] trained multilayer perceptrons (MLPs) to map the LPC cepstral parameters of speech signals to face animation parameters. They modelled mouth coarticulation by considering the audio context of eleven consecutive audio frames (five backward, the current, and five forward frames). Another way to model the audio context is to use time-delay neural networks (TDNNs), which use ordinary time delays to perform temporal processing. Lavagetto [49] and Curinga et al. [50] trained TDNNs to map the LPC cepstral coefficients of speech signals to lip animation parameters. Nevertheless, the neural networks used in [48], [49], [50] have a large number of hidden units in order to handle a large vocabulary, which results in high computational complexity during the training phase.

The above speech-driven face animation approaches mainly focus on how to train the audio-to-visual mappings. They do not consider the problem of modelling facial deformations. A sound audio-to-visual mapping may not lead to sound speech-driven face animation results if an inappropriate visual representation is used for modelling facial deformations. Moreover, most of them cannot synthesize facial expressions.

III. The Integrated Framework

It has been shown above that speech-driven face animation is closely related to facial deformation modelling and facial motion analysis. Here, we present an integrated framework that systematically addresses face modelling, facial motion analysis, and audio-to-visual mapping (see Figure 1). The framework provides a systematic guideline for building a speech-driven synthetic talking face.

Fig. 1. An integrated framework for face modelling, facial motion analysis and synthesis.

First, a quantitative representation of facial deformations, called the Motion Units (MUs), is learned from a set of labelled real facial deformations. It is assumed that any facial deformation can be approximated by a linear combination of MUs weighted by the MU parameters (MUPs).
MUs can be used not only for face animation but also as the high-level knowledge model in facial motion tracking. The MUs and the MUPs form a facial deformation control model, and an MU-based face model can be animated by adjusting the MUPs. Second, a robust MU-based facial motion tracking algorithm is presented to analyze facial image sequences. The tracking results are represented as MUP sequences. Finally, a set of facial motion tracking results and the corresponding speech streams are collected as the audio-visual training data. The audio-visual database is used to train a real-time audio-to-MUP mapping using neural networks.

IV. Motion Units

The MU is inspired by the Action Units of the Facial Action Coding System (FACS) proposed by Ekman and Friesen [51]. FACS was designed by observing stop-motion video and is considered to be the most popular visual representation for facial expression recognition. An Action Unit corresponds to an independent motion of the face. However, Action Units do not provide the quantitative temporal and spatial information required by face animation. To utilize FACS, researchers need to manually design Action Units for their models [24], [52].

A. Learning MUs

MUs are learned from a set of labelled facial deformation data and serve as the basic information unit that links the components of the framework together. MUs define the facial deformation manifold; each MU represents an axis of the manifold.

For computational simplicity, we assume that a facial deformation can be approximated by a linear combination of the MUs and apply principal component analysis (PCA) [53] to learn the MUs. PCA is a popular tool for modelling facial shape, deformation, and appearance [54], [39], [55], [56]. PCA captures the second-order statistics of the data by assuming the data has a Gaussian distribution.

We categorize the MUs into the Utterance MUs and the Expression MUs. Each expression has a corresponding set of Expression MUs. The Utterance MUs capture the characteristics of the facial deformations caused by speech production. The Expression MUs capture the residual information, which is mainly due to facial expressions and beyond the modelling capacity of the Utterance MUs.

We collect the facial deformation data of a speaking subject with and without expressions. We mark a set of points on the face of the subject (see Figure 2(a)). The facial deformations are represented by the deformations of the markers. Twenty-two points are marked on the forehead of the subject, twenty-four markers are put on the cheeks, and thirty markers are placed around the lips. The number of markers decides the representation capacity of the MUs; more markers enable the MUs to encode more information. Currently, we only deal with 2D facial deformations. The same method can be applied to 3D facial deformations when 3D facial deformation data are available. A mesh model is created according to those markers (see Figure 2(b)). The mesh model corresponding to the neutral face is used as the mesh model in the MU-based facial motion tracking algorithm, which will be described in Section V. The subject is asked to wear a pair of glasses on which three additional markers are placed.

Fig. 2. (a) The markers. (b) The mesh model.

We attempt to include as great a variety of facial deformations as possible in the training data and capture the facial deformations of the subject while he is pronouncing all English phonemes with and without expressions. The video is digitized at 30 frames per second, which results in more than 1000 samples for each expression. The markers are automatically tracked by a zero-mean normalized cross-correlation template matching technique [57]. A graphical interactive interface is developed so that the user can correct the positions of the trackers when the template matching fails due to large face or facial motions. To compensate for the global face motion, the tracking results are aligned by affine transformations so that the markers on the glasses coincide for all the data samples. After aligning the data, we calculate the deformations of the markers with respect to the positions of the markers in the neutral face. The deformations of the markers at each time frame are concatenated to form a vector. We use $D_0 = \{\vec{d}_{0i}\}_{i=1}^{N_0}$ to denote the facial deformation vector set without expressions and $D_k = \{\vec{d}_{ki}\}_{i=1}^{N_k}$ ($1 \le k \le K$) to denote the facial deformation vector set with the $k$th expression.

First, $D_0$ is used to learn the Utterance MUs $M_0$. We obtain $\vec{m}_{00} = E[\vec{d}_{0i}]$ and $\Lambda_0 = E[(\vec{d}_{0i} - \vec{m}_{00})(\vec{d}_{0i} - \vec{m}_{00})^T]$. The eigenvectors and eigenvalues of $\Lambda_0$ are calculated. The first $A_0$ (in our case, $A_0 = 7$) significant eigenvectors $\{\vec{m}_{0a}\}_{a=1}^{A_0}$, which correspond to the largest $A_0$ eigenvalues, are selected. They account for 97.56% of the facial deformation variation in $D_0$. The Utterance MUs are denoted as $M_0 = \{\vec{m}_{0a}\}_{a=0}^{A_0}$. We then calculate the Expression MUs for each expression as follows.
For each $D_k = \{\vec{d}_{ki}\}_{i=1}^{N_k}$, we calculate $R_k = \{\vec{r}_{ki}\}_{i=1}^{N_k}$ so that

\[
\vec{r}_{ki} = \vec{d}_{ki} - \sum_{j=1}^{A_0} \left(\vec{d}_{ki}^{\,T}\vec{m}_{0j}\right)\vec{m}_{0j} \qquad (1)
\]

where $\vec{r}_{ki}$ is the residual information that is beyond the modelling capability of $M_0$. We then apply PCA to $R_k$ and obtain the $k$th Expression MU set $M_k = \{\vec{m}_{ka}\}_{a=0}^{A_k}$, where $\vec{m}_{k0} = E[\vec{r}_{ki}]$ and $\{\vec{m}_{kb}\}_{b=1}^{A_k}$ are the first $A_k$ significant eigenvectors of the covariance matrix of $R_k$. We find that $A_k = 2$ ($1 \le k \le K$) is able to capture at least 98.13% of the residual information in the collected data.

B. MU and Face Animation

MUs have some nice properties. First, MUs are learned from real data and encode the characteristics of real facial deformations. Second, the way the MUs are calculated takes into account the correlations between the deformations of the facial points represented by the markers. Third, the number of MUs is much smaller than the number of vertices of the face model. Only a few parameters need to be adjusted in order to animate the face model, and only very low bandwidth is required to transmit those parameters over a network. A facial deformation $\vec{d}$ can be calculated by linearly combining the MUs:

\[
\vec{d} = \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki}\vec{m}_{ki} + \vec{m}_{k0} \right) \qquad (2)
\]

where $\alpha_0 = 1$ is a constant, and $\alpha_k = 1$ ($1 \le k \le K$) if and only if the expression state is $k$; otherwise $\alpha_k = 0$ (we assume that the face can only be in one expression state at any time). $\{c_{0i}\}_{i=1}^{A_0}$ is the Utterance MUP (UMUP) set, and $\{c_{ki}\}_{i=1}^{A_k}$ ($1 \le k \le K$) is the Expression MUP (EMUP) set of $M_k$.
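To make the above construction concrete, the following is a minimal NumPy sketch of learning Utterance MUs by PCA, learning Expression MUs from the residuals of Eq. (1), and synthesizing a deformation with Eq. (2). The array shapes, the random placeholder data, and the helper name `learn_mus` are illustrative assumptions, not part of the original system.

```python
import numpy as np

def learn_mus(D, num_mus):
    """PCA on deformation vectors D (shape: samples x dim).
    Returns the mean deformation and the top `num_mus` eigenvectors (as rows)."""
    m0 = D.mean(axis=0)
    cov = np.cov(D - m0, rowvar=False)              # second-order statistics of the data
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_mus]     # keep the most significant axes
    return m0, eigvecs[:, order].T                  # shapes: (dim,), (num_mus, dim)

# D0: speech-only deformations; Dk: deformations with expression k (toy data here).
rng = np.random.default_rng(0)
D0 = rng.normal(size=(1000, 152))                   # 76 markers -> 152-dim vectors
Dk = rng.normal(size=(1000, 152))

m00, M0 = learn_mus(D0, num_mus=7)                  # Utterance MUs (A0 = 7 in the paper)

# Eq. (1): residual of Dk after removing what the Utterance MUs can explain.
Rk = Dk - (Dk @ M0.T) @ M0
mk0, Mk = learn_mus(Rk, num_mus=2)                  # Expression MUs (Ak = 2 in the paper)

# Eq. (2): synthesize a deformation from UMUPs and EMUPs (expression state k active).
umup = rng.normal(size=7)
emup = rng.normal(size=2)
d = (umup @ M0 + m00) + (emup @ Mk + mk0)
```

Because the MUs returned by PCA are orthonormal, the UMUPs of a measured deformation can be recovered simply by projecting it onto the rows of `M0`.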

It can easily be shown that the MU-based face animation technique is compatible with the linear keyframe-based face animation technique, which is widely used. This compatibility is very important from an industrial point of view. The linear keyframe-based face animation technique animates the face model by interpolating among a set of keyframes, say $\{\vec{\kappa}_i\}_{i=1}^{P}$, where $P$ is the number of keyframes. Since the face models used in different face animation systems may be different, we can establish the correspondence at the semantic level defined by the keyframes. We can find a set of training samples $\{\vec{\kappa}^{\,*}_i\}_{i=1}^{P}$ in the training set of the MUs so that $\vec{\kappa}^{\,*}_i$ semantically corresponds to $\vec{\kappa}_i$ for $1 \le i \le P$. We can then use $\{\vec{\kappa}^{\,*}_i\}_{i=1}^{P}$ to derive the following expressions. A facial shape $\vec{s}$ can be represented as a weighted combination of the keyframes, $\vec{s} = \sum_{i=1}^{P} b_i\vec{\kappa}_i$, where $\{b_i\}_{i=1}^{P}$ is the keyframe parameter set. $\vec{s}$ can also be represented by the MUs as

\[
\vec{s} = \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki}\vec{m}_{ki} + \vec{m}_{k0} \right) + \vec{s}_0 \qquad (3)
\]

where $\vec{s}_0$ is the shape of the neutral face. The conversion between the MUPs and the keyframe parameters can be achieved by

\[
\begin{aligned}
c_{0j} &= \vec{m}_{0j}^{\,T}\left[\mathbf{K}\vec{b} - \vec{s}_0 - \vec{m}_{00}\right] \\
c_{kj} &= \alpha_k \vec{m}_{kj}^{\,T}\left[\mathbf{K}\vec{b} - \vec{s}_0 - \sum_{i=1}^{A_0} c_{0i}\vec{m}_{0i} - \vec{m}_{00} - \vec{m}_{k0}\right] \\
\vec{b} &= (\mathbf{K}^T\mathbf{K})^{-1}\mathbf{K}^T\left(\sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki}\vec{m}_{ki} + \vec{m}_{k0} \right) + \vec{s}_0\right)
\end{aligned} \qquad (4)
\]

where $\vec{b} = [b_1 \ \dots \ b_P]^T$ and $\mathbf{K} = [\vec{\kappa}_1 \ \dots \ \vec{\kappa}_P]$. The conversion from MUPs to keyframe parameters provides a method for normalizing the facial deformations of different subjects. In other words, the facial deformations of different subjects are normalized at the semantic level. Potentially, this method could benefit research on computer lip-reading and expression recognition.
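As an illustration of Eqs. (3) and (4), here is a small NumPy sketch that converts MUPs into keyframe parameters by least squares and back again. The shapes, keyframe count, and variable names are illustrative stand-ins; the real system uses the keyframes of the iFACE model.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, P, A0 = 152, 8, 7                    # deformation dimension, keyframes, Utterance MUs

s0 = rng.normal(size=dim)                 # neutral face shape
K = rng.normal(size=(dim, P))             # keyframe shapes as columns (kappa_1 ... kappa_P)
m00 = rng.normal(size=dim)                # mean utterance deformation
M0 = np.linalg.qr(rng.normal(size=(dim, A0)))[0]   # orthonormal Utterance MUs as columns

# Eq. (3): facial shape synthesized from UMUPs (neutral expression state, alpha_k = 0 for k > 0).
c0 = rng.normal(size=A0)
s = M0 @ c0 + m00 + s0

# Eq. (4), third line: keyframe parameters b by linear least squares, K b ~= s.
b, *_ = np.linalg.lstsq(K, s, rcond=None)

# Eq. (4), first line: recover UMUPs from the keyframe parameters.
# c0_back only approximates c0, up to the error of fitting s with the keyframes.
c0_back = M0.T @ (K @ b - s0 - m00)
```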
V. MU-Based Facial Motion Analysis

MUs can be used as the high-level knowledge model to guide facial motion tracking. We assume that the expression state $k$ of the subject is known. This is reasonable because we are mainly interested in using the tracking algorithm to collect training data for speech-driven face animation research. We also assume an affine motion model, which is a good approximation when the size of the object is much smaller than the distance between the object and the camera and the face only undergoes relatively small global 3D motion.

The tracking procedure consists of two steps. First, at the low-level image processing step, we calculate the facial shape in the next image by tracking each facial point separately using the approach proposed in [58]. The results, denoted as $\vec{\zeta}^{(t)}$, are usually very noisy. We then constrain the facial deformation to lie in the manifold defined by the MUs. Mathematically, the tracking problem can be formulated as the minimization problem

\[
(C, \vec{\beta}) = \arg\min_{C,\,\vec{\beta}} \left\| \vec{\zeta}^{(t)} - T_{\vec{\beta}}\!\left( \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki}\vec{m}_{ki} + \vec{m}_{k0} \right) + \vec{s}_0 \right) \right\|^2 \qquad (5)
\]

where $C = \{c_{ki}\}$ and $T_{\vec{\beta}}(\cdot)$ is the transformation function whose parameter $\vec{\beta}$ describes the global affine motion (2D rotation, scaling, and translation) of the face. The reader is referred to Appendix A for the details of solving Eq. (5).

The MU-based facial motion tracking algorithm requires that the face be in the neutral state and face the camera in the first image frame so that the mesh model can be fitted to the neutral face. The mesh model has two vertices corresponding to the two mouth corners. The two mouth corners are manually selected in the facial image, and the mesh model is fitted to the face by translation, scaling, and rotation. The task of tracking is to track the facial points represented by the vertices of the mesh. Figure 3 shows some typical tracking results in an image sequence (the 130th, 274th, 411th, and 496th frames).

Fig. 3. Typical tracking results.

Currently, the tracking algorithm can only track 2D facial motion. However, if 3D facial deformation training data were available, we could learn 3D MUs. Substituting 3D MUs into Eq. (5), we could track the 3D face and facial motion.
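The following NumPy sketch illustrates the constraint in Eq. (5) in a simplified setting: it assumes the global affine motion has already been compensated (so $T_{\vec{\beta}}$ is the identity) and projects the noisy per-point tracking result onto the MU subspace. The joint estimation of the MUPs and the affine parameters used in the paper is derived in Appendix A; all names and shapes here are illustrative.

```python
import numpy as np

def constrain_to_mu_subspace(zeta, s0, m00, M0, mk0=None, Mk=None):
    """Project a noisy facial shape observation onto the MU manifold (Eq. (5)
    with the global affine motion already removed). M0/Mk hold orthonormal MUs as rows."""
    residual = zeta - s0 - m00
    if Mk is not None:                       # expression state k is active
        residual = residual - mk0
    c0 = M0 @ residual                       # least-squares UMUPs for an orthonormal basis
    shape = M0.T @ c0 + m00 + s0
    ck = None
    if Mk is not None:
        ck = Mk @ (residual - M0.T @ c0)     # EMUPs from what the Utterance MUs miss
        shape = shape + Mk.T @ ck + mk0
    return c0, ck, shape

# Toy usage: a noisy observation generated around the neutral shape.
rng = np.random.default_rng(2)
dim = 152
s0, m00 = rng.normal(size=dim), np.zeros(dim)
M0 = np.linalg.qr(rng.normal(size=(dim, 7)))[0].T   # 7 orthonormal Utterance MUs (rows)
zeta = s0 + m00 + M0.T @ rng.normal(size=7) + 0.01 * rng.normal(size=dim)
c0, _, smoothed = constrain_to_mu_subspace(zeta, s0, m00, M0)
```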

VI. Real-Time Audio-to-MUP Mapping Using Neural Networks

Audio-visual training data are required to train the real-time audio-to-MUP mapping. We collect the audio-visual training data in the following way. A subject is asked to read a text corpus with and without expressions. We videotape the speaking subject and digitize the video at 30 frames per second. The sampling rate of the audio is 44.1 kHz. The MU-based facial motion tracking algorithm of Section V is used to analyze the facial image sequence of the video. The tracking results are represented as MUP sequences and used as the visual feature vectors of the audio-visual database. We calculate ten Mel-frequency cepstral coefficients (MFCCs) [59] for each audio frame as the audio feature vector. We collect an audio-visual database without expression, $\Psi_0 = \{\langle \vec{a}_{0i}, \vec{v}_{0i} \rangle\}_{i=1}^{H_0}$, where $\vec{a}_{0i}$ is the audio feature vector and $\vec{v}_{0i}$ is the visual feature vector. For each expression $k$, we collect an audio-visual database $\Psi_k = \{\langle \vec{a}_{ki}, \vec{v}_{ki} \rangle\}_{i=1}^{H_k}$, where $\vec{a}_{ki}$ is the audio feature vector and $\vec{v}_{ki}$ is the visual feature vector.

We use a method similar to boosting to train a set of neural networks for real-time audio-to-MUP mapping. First, $\Psi_0$ is used to train a set of MLPs, say $\{\Xi_{0i}\}$, as the real-time audio-to-UMUP mapping. For each $\Psi_k$, the trained $\{\Xi_{0i}\}$ are first used to estimate the UMUP component of $\vec{v}_{ki}$ given the corresponding audio features $\vec{a}_{ki}$. An MLP, say $\Upsilon_k$, is then trained for each $\Psi_k$ to map the estimated UMUPs to $\vec{v}_{ki}$, which includes the final UMUPs and the EMUPs.

A. Audio-to-UMUP Mapping

An MLP is a universal nonlinear function approximator and has been successfully used to train audio-to-visual mappings [13], [48]. The best results were reported by Massaro et al. [48], who trained a single MLP for audio-to-visual mapping while considering the audio context of eleven consecutive audio frames. Hence, the MLP used in [48] has a large number of hidden units (the best results were achieved with an MLP with 600 hidden units). We found that, in practice, it is difficult to use just one MLP to handle the whole audio-to-visual mapping because of the large search space and the high computational complexity of the training phase. Instead, we divide $\Psi_0$ into 44 subsets according to the audio feature vectors $\vec{a}_{0i}$ (we choose 44 classes because we use a phoneme symbol set that consists of 44 phonemes). The audio features of each subset are modelled by a Gaussian mixture. Each audio-visual sample $\langle \vec{a}_{0i}, \vec{v}_{0i} \rangle$ is classified into the one of the 44 subsets whose Gaussian mixture gives the highest score for $\vec{a}_{0i}$. A three-layer perceptron $\Xi_{0i}$ is trained to perform the audio-to-visual mapping for each subset. The input of $\Xi_{0i}$ is the audio feature vectors taken at seven consecutive time frames (3 backward, the current, and 3 forward frames). The 3 backward and 3 forward audio frames are the context of the current audio frame; therefore, the delay between the input and the output is about 100 ms. In the estimation stage, an audio feature vector $\vec{a}$ is first classified into one of the 44 subsets using those Gaussian mixtures. The corresponding MLP is then selected to estimate the visual feature given $\vec{a}$ and its contextual information. In our experiments, the maximum number of hidden units used in $\{\Xi_{0i}\}_{i=1}^{44}$ is only 25 and the minimum is 15. Therefore, both training and estimation have very low computational complexity.
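A rough sketch of this first-stage mapping is shown below, using scikit-learn stand-ins (GaussianMixture for the per-class audio models and MLPRegressor for the per-class perceptrons). The original work predates these tools, so this only illustrates the routing scheme; the random placeholder features, class labels, and the helper `stack_context` are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor

def stack_context(frames, left=3, right=3):
    """Stack each 10-dim MFCC frame with 3 backward and 3 forward frames (7 x 10 = 70 dims)."""
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(left + right + 1)])

rng = np.random.default_rng(3)
num_classes, n_frames = 4, 2000            # the paper uses 44 phoneme classes; 4 keeps the toy fast
mfcc = rng.normal(size=(n_frames, 10))     # placeholder 10-dim MFCC features
umup = rng.normal(size=(n_frames, 7))      # placeholder UMUP targets (A0 = 7)
labels = rng.integers(num_classes, size=n_frames)   # placeholder phoneme-class labels

# One Gaussian mixture per class models the audio features of that class.
gmms = [GaussianMixture(n_components=2, random_state=0).fit(mfcc[labels == c])
        for c in range(num_classes)]

# One small MLP per class maps the 7-frame audio context to UMUPs.
X = stack_context(mfcc)
mlps = [MLPRegressor(hidden_layer_sizes=(25,), max_iter=300, random_state=0)
        .fit(X[labels == c], umup[labels == c]) for c in range(num_classes)]

# Estimation: route each frame to the class whose mixture scores it highest, then predict.
scores = np.column_stack([g.score_samples(mfcc) for g in gmms])
chosen = scores.argmax(axis=1)
pred = np.vstack([mlps[c].predict(X[i:i + 1]) for i, c in enumerate(chosen)])
```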
B. Audio-to-UMUP+EMUP Mapping

A straightforward way to build the mapping for a speech-driven expressive talking face would be to train a new set of MLPs for each $\Psi_k$. This problem can be greatly simplified by taking advantage of the correlation between the facial deformations without expressions and the facial deformations with expressions that account for the same speech content. The mapping can then be divided into two steps. The first step maps the speech to the UMUPs, which represent the facial deformation caused by producing speech. The second step maps the estimated UMUPs of the first step to the final UMUPs and EMUPs.

The function of the second step is to add expression information to the results of the first step. Therefore, we can reuse the $\{\Xi_{0i}\}_{i=1}^{44}$ trained in the previous subsection and train one MLP per $\Psi_k$ to perform the second step. The trained $\{\Xi_{0i}\}_{i=1}^{44}$ are used to estimate the UMUPs for each audio feature vector $\vec{a}_{ki}$ in $\Psi_k$. Of course, these estimates will not be accurate and do not contain expression information. An MLP $\Upsilon_k$ with one hidden layer is then trained to map the estimated UMUPs to the visual feature vector $\vec{v}_{ki}$ of $\vec{a}_{ki}$. In our experiments, the number of hidden units of $\Upsilon_k$ is only thirty. Therefore, this approach is computationally very efficient.
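Continuing the sketch above, the second stage can be illustrated as one small MLP per expression that refines the first-stage UMUP estimates into final UMUPs plus EMUPs. The data, shapes, and target layout below are stand-ins mirroring the description rather than the original implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n = 1500
umup_est = rng.normal(size=(n, 7))          # stage-1 UMUP estimates for expression-k speech
target = rng.normal(size=(n, 7 + 2))        # final UMUPs (7) plus EMUPs (Ak = 2) from tracking

# One hidden layer with 30 units, as in the paper's description of Upsilon_k.
upsilon_k = MLPRegressor(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
upsilon_k.fit(umup_est, target)

# At run time: speech -> stage-1 UMUPs -> expression-specific UMUPs and EMUPs.
refined = upsilon_k.predict(umup_est[:5])
```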

VII. Experimental Results

A. Numeric Evaluation

We collect the audio-visual training database by recording the front view of a speaking subject with and without expressions. Currently, we only examine two expressions: smile and sad. One hundred sentences are selected from the text corpus of the DARPA TIMIT speech database. The video is digitized at thirty frames per second, and the sampling rate of the audio is 44.1 kHz. The MU-based facial motion tracking algorithm of Section V is used to analyze the facial image sequences of the video. The tracking results are represented as MUP sequences and used as the visual feature vectors of the audio-visual database. Ten MFCCs are calculated for each audio frame as the audio features. Eighty percent of the data is randomly selected for training; the rest is used for testing.

We reconstruct the estimated displacements of the facial feature points using the MUs and the estimated MUPs. We divide the displacement (both the ground truth and the estimated results) of each facial feature point by its maximum absolute displacement in the collected audio-visual database so that the displacement is normalized to [-1.0, 1.0]. To evaluate the performance, we calculate the Pearson product-moment correlation coefficient (R), the average standard deviations, and the mean square errors (MSEs) using the normalized data. The Pearson product-moment correlation coefficient measures how well the shapes of two signal sequences match globally. It is calculated as

\[
R = \frac{\mathrm{trace}(\mathrm{Cov}_{\vec{d}\hat{\vec{d}}})}{\sqrt{\mathrm{trace}(\mathrm{Cov}_{\vec{d}\vec{d}})\,\mathrm{trace}(\mathrm{Cov}_{\hat{\vec{d}}\hat{\vec{d}}})}} \qquad (6)
\]

where $\vec{d}$ is the normalized ground truth, $\hat{\vec{d}}$ is the normalized estimated result, and

\[
\begin{aligned}
\vec{\mu}_{\vec{d}} &= E[\vec{d}], \qquad \vec{\mu}_{\hat{\vec{d}}} = E[\hat{\vec{d}}] \\
\mathrm{Cov}_{\vec{d}\vec{d}} &= E\!\left[(\vec{d} - \vec{\mu}_{\vec{d}})(\vec{d} - \vec{\mu}_{\vec{d}})^T\right] \\
\mathrm{Cov}_{\hat{\vec{d}}\hat{\vec{d}}} &= E\!\left[(\hat{\vec{d}} - \vec{\mu}_{\hat{\vec{d}}})(\hat{\vec{d}} - \vec{\mu}_{\hat{\vec{d}}})^T\right] \\
\mathrm{Cov}_{\vec{d}\hat{\vec{d}}} &= E\!\left[(\vec{d} - \vec{\mu}_{\vec{d}})(\hat{\vec{d}} - \vec{\mu}_{\hat{\vec{d}}})^T\right]
\end{aligned}
\]

We also calculate the average standard deviations

\[
\nu_{\vec{d}} = \frac{1}{\gamma}\sum_{c=1}^{\gamma} \left(\mathrm{Cov}_{\vec{d}\vec{d}}[c][c]\right)^{1/2}, \qquad
\nu_{\hat{\vec{d}}} = \frac{1}{\gamma}\sum_{c=1}^{\gamma} \left(\mathrm{Cov}_{\hat{\vec{d}}\hat{\vec{d}}}[c][c]\right)^{1/2} \qquad (7)
\]

where $\gamma$ is the dimension of $\vec{d}$. The MSEs are calculated by

\[
\mathrm{MSE} = E\!\left[\frac{\|\vec{d} - \hat{\vec{d}}\|^2}{\gamma}\right] \qquad (8)
\]

The results are shown in Tables I and II. The "Neutral" column shows the results on $\Psi_0$, the "Smile" column shows the results on the audio-visual data with the smile expression, and the "Sad" column shows the results on the audio-visual data with the sad expression.

TABLE I. The numeric evaluation results on the training set (rows: R, $\nu_{\vec{d}}$, $\nu_{\hat{\vec{d}}}$, MSE; columns: Neutral, Smile, Sad).

TABLE II. The numeric evaluation results on the testing set (rows: R, $\nu_{\vec{d}}$, $\nu_{\hat{\vec{d}}}$, MSE; columns: Neutral, Smile, Sad).
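The evaluation metrics of Eqs. (6)-(8) can be computed as in the following NumPy sketch. The trace-based Pearson coefficient and the normalization by $\gamma$ follow the definitions above, while the data here are random placeholders standing in for the normalized displacement sequences.

```python
import numpy as np

def evaluate(d_true, d_est):
    """d_true, d_est: (frames, gamma) normalized displacement sequences."""
    gamma = d_true.shape[1]
    dt = d_true - d_true.mean(axis=0)
    de = d_est - d_est.mean(axis=0)
    cov_tt = dt.T @ dt / len(d_true)
    cov_ee = de.T @ de / len(d_true)
    cov_te = dt.T @ de / len(d_true)
    R = np.trace(cov_te) / np.sqrt(np.trace(cov_tt) * np.trace(cov_ee))     # Eq. (6)
    nu_true = np.sqrt(np.diag(cov_tt)).sum() / gamma                        # Eq. (7)
    nu_est = np.sqrt(np.diag(cov_ee)).sum() / gamma
    mse = np.mean(np.sum((d_true - d_est) ** 2, axis=1) / gamma)            # Eq. (8)
    return R, nu_true, nu_est, mse

# Placeholder sequences; in the paper these are per-point displacements
# normalized by their maximum absolute value in the database.
rng = np.random.default_rng(5)
truth = rng.uniform(-1, 1, size=(3000, 152))
estimate = truth + 0.1 * rng.normal(size=truth.shape)
print(evaluate(truth, estimate))
```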

B. Generating Face Animation Sequences

We have developed a face modelling and animation system called the iFACE system [1]. The iFACE system uses a generic geometric face mesh model and uses the linear keyframe technique to animate the face model. The technique described in Section IV-B is used to convert the estimated MUPs into keyframe parameters. It enables the iFACE system to utilize MUs without undergoing large modifications. Currently, eight keyframes are used: smile, sad, and six visemes, which correspond to the six phonemes i, a, o, f, u, and m, respectively. The keyframes are shown using the generic geometric face model in Figure 4.

Fig. 4. The keyframes used for the conversion between MUPs and the keyframe parameters: (a) smile, (b) sad, (c) a, (d) i, (e) o, (f) m, (g) f, (h) u.

Given the Cyberware scanner data of an individual, the user can use the iFACE system to interactively customize the generic model for that individual by clicking some facial feature points. The facial texture can be mapped onto the customized model to achieve a realistic appearance (see Figure 5).

Fig. 5. The customized face models: (a) Dr. Russell L. Storms, (b) the actress.

Figure 6 shows a speech-driven face animation sequence generated by the iFACE system using the method described in this paper. The speech content of the animation sequence is "Dialog is an essential element." Figure 7 shows three typical frames from three animation sequences with or without expressions. The three frames in Figure 7 share the same speech context while bearing different expressions.

Fig. 6. An example of real-time speech-driven face animation.

Fig. 7. Typical face animation frames: (a) neutral, (b) smile, (c) sad.

C. Human Emotion Perception Study

Since the end users of the real-time speech-driven synthetic talking face are human beings, it is necessary to carry out a human perception study on the synthetic talking face. For convenience, we will use "synthetic talking face" or "synthetic face" to denote our real-time speech-driven synthetic talking face in the rest of the paper. Here, we design the following experiments to compare the influence of the synthetic talking face on human emotion perception with that of the real face. The experimental results can help the user decide how to use the synthetic talking face to deliver the intended visual information.

We videotape a speaking subject, whose audio-visual data were used in Section VII-A. The subject is asked to calmly read three sentences without expression, with the smile expression, or with the sad expression. Hence, the audio tracks do not convey any emotional information. The content of the first sentence is "It is normal.", which contains neutral information. The content of the second sentence is "It is good.", which contains positive information. The content of the third sentence is "It is bad.", which contains negative information. The audio tracks are used to generate three sets of face animation sequences; all three audio tracks are used in each set. The first set is generated without expression, the second set with the smile expression, and the third set with the sad expression. Twenty untrained human subjects, who have never used our system before, participate in the experiments.

The first experiment investigates human emotion perception based on either the visual stimuli alone or the audio stimuli alone. The subjects are first asked to recognize the expressions of both the real face and the synthetic talking face and to infer their emotional states based on the animation sequences without audio. All subjects correctly recognized the expressions of both the synthetic face and the real face. Therefore, our synthetic talking face is capable of accurately delivering facial expression information. The emotional inference results, in terms of the number of subjects, are shown in Table III. The "S" columns show the results using the synthetic talking face, and the "R" columns show the results using the real face. As shown, the effectiveness of the synthetic talking face is comparable with that of the real face.

TABLE III. Emotion inference based on the animation sequences without audio (rows: inferred emotion Neutral, Happy, Sad; columns: facial expression Neutral, Smile, Sad, each with S and R sub-columns).

The subjects are then asked to listen to the audio and decide the emotional state of the speaker. Each subject listens to each audio track only once. Note that the audio tracks were produced without emotion; hence, the subjects try to infer the emotion from the content of the speech. The results, in terms of the number of subjects, are shown in Table IV.

TABLE IV. Emotion inference based on the audio alone (rows: inferred emotion Neutral, Happy, Sad; columns: Audio 1, Audio 2, Audio 3).

The second and third experiments are designed to compare the influence of the synthetic face on bimodal human emotion perception with that of the real face. In the second experiment, the subjects are asked to infer the emotional state while observing the synthetic talking face and listening to the audio tracks. In the third experiment, the subjects are asked to infer the emotional state while observing the real face and listening to the same audio tracks. We divide the subjects into two groups of ten. One group first participates in the second experiment and then in the third experiment; the other group first participates in the third experiment and then in the second experiment.
The results are then combined and compared in Tables V, VI, and VII. The "S" columns in Tables V, VI, and VII show the results using the synthetic talking face; the "R" columns show the results using the real face.

TABLE V. Bimodal emotion inference using audio track 1 (rows: inferred emotion Neutral, Happy, Sad; columns: facial expression Neutral, Smile, Sad, each with S and R sub-columns).

TABLE VI. Bimodal emotion inference using audio track 2 (rows: inferred emotion Neutral, Happy, Sad, Not sure; columns: facial expression Neutral, Smile, Sad, each with S and R sub-columns).

TABLE VII. Bimodal emotion inference using audio track 3 (rows: inferred emotion Neutral, Happy, Sad, Not sure; columns: facial expression Neutral, Smile, Sad, each with S and R sub-columns).

We can see that the face movements (either synthetic or real) and the content of the audio tracks jointly influence the decisions of the subjects. Take the first audio track as an example. Although the first audio track contains only neutral information, sixteen subjects think the emotional state is happy when the expression of the synthetic talking face is smile, and nineteen subjects classify the emotional state as sad when the expression of the synthetic face is sad. The influence of the sad expression is slightly stronger than that of the smile expression. This may be because the subjects see the smile expression more frequently than the sad expression in daily life; therefore, they react more strongly when they see the sad expression.

If the audio track and the facial expression represent the same kind of information, the human perception of that information is enhanced. For example, when the facial expression associated with audio track 2 is smile, nearly all subjects say that the emotional state is happy (see Table VI). The number of subjects who agree with the happy emotion is higher than when using the visual stimuli alone (see Table III) or the audio information alone (see Table IV). However, human subjects are confused if the facial expression and the audio track represent opposite information. For example, many subjects are confused when they listen to an audio track that contains positive information while observing a facial expression that represents negative information. An example is shown in the seventh and eighth columns of Table VI: the audio track conveys positive information while the facial expression is sad. Eight subjects report that they are confused when the synthetic talking face with the sad expression is shown; the number of confused subjects drops to five when the real face is used. This difference is mainly due to the fact that the subjects are still able to tell the synthetic talking face apart from the real face. When confusion happens, the subjects tend to think that the expression of the synthetic face is not the original expression associated with the audio. Therefore, when the visual information conflicts with the audio information, the real face is more persuasive than this version of the synthetic face. In other words, the synthetic face is less capable of conveying fake emotional information in this kind of situation. Overall, the experimental results show that our real-time speech-driven synthetic talking face successfully affects human emotion perception. The effectiveness of the synthetic face is comparable with that of the real face, though it is slightly weaker.

VIII. Summary and Discussions

This paper presents an integrated framework for systematically building a real-time speech-driven talking face for an individual. To handle the non-Gaussianity of the facial deformation distribution, we assume that the facial deformation space can be represented by a hierarchical linear manifold described by the Utterance MUs and the Expression MUs. The Utterance MUs define the manifold representing the facial deformations caused by speech production. The Expression MUs capture the residual information, which is mainly due to facial expressions and beyond the modelling capacity of the Utterance MUs. PCA is applied to learn both the Utterance MUs and the Expression MUs. The MU-based face animation technique animates a face model by adjusting the parameters of the MUs.
We also show that the MU-based face animation technique is compatible with the linear keyframe-based face animation technique. In fact, this provides a method to normalize the facial deformations of different people at the semantic level. We propose an MU-based facial motion analysis algorithm that decomposes the facial deformations into UMUPs and EMUPs. The algorithm is used to obtain the visual information for an audio-visual database. We train a set of MLPs for real-time speech-driven face animation with expressions using the collected audio-visual database. The audio-to-visual mapping consists of two steps. The first step maps the audio features to UMUPs; the second step maps the estimated UMUPs calculated by the first step to the final UMUPs and the EMUPs.

To evaluate the mapping, we calculate the normalized MSEs and the Pearson product-moment correlation coefficients between the ground truth and the estimated results. The Pearson product-moment correlation coefficient on the training set is 0.98 for no expression; the remaining correlation coefficients and the normalized MSEs for the training and testing sets are reported in Tables I and II.

Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. The iFACE system is then used in the bimodal human emotion perception study. We generate three sets of face animation sequences for three audio tracks, which convey neutral, positive, and negative information, respectively. Each set of face animation sequences consists of three sequences, which contain no expression, smile, and sad, respectively. Human subjects are asked to infer the emotional states from the face animation sequences or from the videos of the real face while listening to the corresponding audio tracks. The experimental results show that our synthetic talking face effectively contributes to bimodal human emotion perception and that its effects are comparable with those of a real talking face. To evaluate our method more extensively, future work on the bimodal human emotion perception study using the iFACE system will use a larger subject set and more audio-visual stimuli.

Acknowledgments

This research was supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. DAAL, and in part by National Science Foundation Grant IIS.

Appendix A

Here, we show how to solve the minimization problem

\[
(C, \vec{\beta}) = \arg\min_{C,\,\vec{\beta}} \left\| \vec{\zeta}^{(t)} - T_{\vec{\beta}}\!\left( \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki}\vec{m}_{ki} + \vec{m}_{k0} \right) + \vec{s}_0 \right) \right\|^2
\]

We first define the notation used to derive the results:
a. We track $N$ facial feature points.
b. $\vec{\zeta}^{(t)} = [x^{(t)}_1\ y^{(t)}_1\ \dots\ x^{(t)}_N\ y^{(t)}_N]^T$, where $\langle x^{(t)}_n, y^{(t)}_n \rangle$ is the coordinate of facial feature point $n$ in the image plane at time $t$ ($t > 0$).
c. $\vec{\beta} = [\beta_1\ \beta_2\ \beta_3\ \beta_4\ \beta_5\ \beta_6]^T$, where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ describe 2D rotation and scaling, and $\beta_5$ and $\beta_6$ describe 2D translation.
d. $\vec{m}_{ki} = [m^{ki}_{11}\ m^{ki}_{12}\ \dots\ m^{ki}_{N1}\ m^{ki}_{N2}]^T$, where $\langle m^{ki}_{n1}, m^{ki}_{n2} \rangle$ denotes the deformation information of facial feature point $n$ encoded by $\vec{m}_{ki}$.
e. $\vec{s}_0 = [x^0_1\ y^0_1\ \dots\ x^0_N\ y^0_N]^T$, where $\langle x^0_n, y^0_n \rangle$ is the coordinate of facial feature point $n$ in the neutral position.

We can then write

\[
\left\| \vec{\zeta}^{(t)} - T_{\vec{\beta}}\!\left( \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki}\vec{m}_{ki} + \vec{m}_{k0} \right) + \vec{s}_0 \right) \right\|^2
= \sum_{n=1}^{N} \left\| \begin{bmatrix} x^{(t)}_n \\ y^{(t)}_n \end{bmatrix}
- \begin{bmatrix} \beta_1 & \beta_2 \\ \beta_3 & \beta_4 \end{bmatrix}
\begin{bmatrix} \bar{x}_n \\ \bar{y}_n \end{bmatrix}
- \begin{bmatrix} \beta_5 \\ \beta_6 \end{bmatrix} \right\|^2 \qquad (9)
\]

where

\[
\bar{x}_n = \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki} m^{ki}_{n1} + m^{k0}_{n1} \right) + x^0_n \qquad (10)
\]

\[
\bar{y}_n = \sum_{k=0}^{K} \alpha_k \left( \sum_{i=1}^{A_k} c_{ki} m^{ki}_{n2} + m^{k0}_{n2} \right) + y^0_n \qquad (11)
\]

Note that $\alpha_k = 1$ ($k > 0$) if and only if the face is in expression state $k$, and we constrain the face to be in only one expression state at any time. Without loss of generality, we can assume $\alpha_k = 0$ for $k > 1$. Eq. (9) can then be rewritten as

\[
\left\| \mathbf{B}\vec{v} - \vec{\zeta}^{(t)} \right\|^2 \qquad (12)
\]

where $\mathbf{B} = [\mathbf{H}_0\ \mathbf{B}_1\ \dots\ \mathbf{B}_{A_0}\ \mathbf{E}_1\ \dots\ \mathbf{E}_{A_1}]$ and $\mathbf{H}_0 = [\mathbf{H}_{01}\ \mathbf{H}_{02}]$. For each facial feature point $n$, the pair of rows of these blocks corresponding to $x^{(t)}_n$ and $y^{(t)}_n$ is

\[
\mathbf{H}_{01}: \begin{bmatrix} m^{00}_{n1}+m^{10}_{n1}+x^0_n & m^{00}_{n2}+m^{10}_{n2}+y^0_n & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
\mathbf{H}_{02}: \begin{bmatrix} 0 & 0 & 0 \\ m^{00}_{n1}+m^{10}_{n1}+x^0_n & m^{00}_{n2}+m^{10}_{n2}+y^0_n & 1 \end{bmatrix}
\]

\[
\mathbf{B}_i: \begin{bmatrix} m^{0i}_{n1} & m^{0i}_{n2} & 0 & 0 \\ 0 & 0 & m^{0i}_{n1} & m^{0i}_{n2} \end{bmatrix} \ (A_0 \ge i \ge 1), \qquad
\mathbf{E}_i: \begin{bmatrix} m^{1i}_{n1} & m^{1i}_{n2} & 0 & 0 \\ 0 & 0 & m^{1i}_{n1} & m^{1i}_{n2} \end{bmatrix} \ (A_1 \ge i \ge 1)
\]

and

\[
\vec{v} = [\vec{\phi}\ \ \vec{\psi}_1\ \dots\ \vec{\psi}_{A_0}\ \ \vec{\xi}_1\ \dots\ \vec{\xi}_{A_1}]^T
\]

\[
\vec{\phi} = [\beta_1\ \beta_2\ \beta_5\ \beta_3\ \beta_4\ \beta_6]
\]

\[
\vec{\psi}_i = [\beta_1 c_{0i}\ \ \beta_2 c_{0i}\ \ \beta_3 c_{0i}\ \ \beta_4 c_{0i}] \ (A_0 \ge i \ge 1)
\]

\[
\vec{\xi}_i = [\beta_1 c_{1i}\ \ \beta_2 c_{1i}\ \ \beta_3 c_{1i}\ \ \beta_4 c_{1i}] \ (A_1 \ge i \ge 1)
\]

We can use a least-squares estimator to solve for $\vec{v}$ from Eq. (12). It is then easy to recover $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$ from $\vec{v}$ and to calculate $\{c_{0i}\}_{i=1}^{A_0}$ and $\{c_{1i}\}_{i=1}^{A_1}$.
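A compact NumPy sketch of this least-squares step is given below for the neutral case (no Expression MU blocks): it assembles the row pair of B for each feature point, solves for v, and recovers the affine parameters and UMUPs. It is an illustrative reading of the derivation above rather than the original implementation; in particular, the bilinear products in psi_i are divided out using the recovered affine parameters, which is only one possible recovery scheme.

```python
import numpy as np

def solve_tracking(zeta, s0, M0, m00):
    """Least-squares solution of Eq. (12) for the neutral case (alpha_k = 0 for k > 0).
    zeta, s0, m00: (2N,) interleaved x/y vectors; M0: (A0, 2N) Utterance MUs as rows."""
    N, A0 = len(zeta) // 2, M0.shape[0]
    base = (m00 + s0).reshape(N, 2)                    # constant part per point (x, y)
    rows = []
    for n in range(N):
        bx, by = base[n]
        hx = [bx, by, 1, 0, 0, 0]                      # x-row of H0 (multiplies b1, b2, b5)
        hy = [0, 0, 0, bx, by, 1]                      # y-row of H0 (multiplies b3, b4, b6)
        for i in range(A0):                            # x/y rows of each B_i block
            mx, my = M0[i].reshape(N, 2)[n]
            hx += [mx, my, 0, 0]
            hy += [0, 0, mx, my]
        rows += [hx, hy]
    B = np.array(rows)
    v, *_ = np.linalg.lstsq(B, zeta, rcond=None)
    beta = v[:6][[0, 1, 3, 4, 2, 5]]                   # reorder phi to (b1, b2, b3, b4, b5, b6)
    affine = v[[0, 1, 3, 4]]                           # (b1, b2, b3, b4), used to divide out c
    c0 = np.array([v[6 + 4 * i: 10 + 4 * i] @ affine for i in range(A0)]) / (affine @ affine)
    return beta, c0

# Toy check: synthesize an observation from known parameters and recover them.
rng = np.random.default_rng(6)
N, A0 = 76, 7
s0, m00 = rng.normal(size=2 * N), rng.normal(size=2 * N)
M0 = np.linalg.qr(rng.normal(size=(2 * N, A0)))[0].T
shape = M0.T @ rng.normal(size=A0) + m00 + s0
R = np.array([[1.02, 0.05], [-0.05, 1.02]])            # mild rotation/scale, plus translation
zeta = (shape.reshape(N, 2) @ R.T + np.array([3.0, -2.0])).reshape(-1)
beta_est, c_est = solve_tracking(zeta, s0, M0, m00)
```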
References

[1] P. Hong, Z. Wen, and T. S. Huang, "iFACE: A 3D synthetic talking face," International Journal of Image and Graphics, vol. 1, no. 1, pp. 1-8.
[2] J. Cassell et al., "Animated conversation: Rule-based generation of facial display, gesture and spoken intonation for multiple conversational agents," in Proc. SIGGRAPH, 1994.
[3] Ananova Limited, "The virtual newscaster."
[4] Haptek Corporation, "VirtualFriend."
[5] LifeFX, Inc., "Fac."
[6] K. Nagao and A. Takeuchi, "Speech dialogue with facial displays," in Proc. 32nd Annual Meeting of the Association for Computational Linguistics (ACL-94), 1994.
[7] K. Waters, J. M. Rehg, et al., "Visual sensing of humans for active public interfaces," Tech. Rep. CRL 96-5, Cambridge Research Lab.
[8] D. W. Massaro, Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, Lawrence Erlbaum Associates, Hillsdale, NJ.
[9] I. Pandzic, J. Ostermann, and D. Millen, "User evaluation: Synthetic talking faces for interactive services," The Visual Computer, vol. 15, no. 7/8.
[10] K. Aizawa and T. S. Huang, "Model-based image coding," Proc. IEEE, vol. 83, Aug.


More information

An Interactive Interface for Directing Virtual Humans

An Interactive Interface for Directing Virtual Humans An Interactive Interface for Directing Virtual Humans Gael Sannier 1, Selim Balcisoy 2, Nadia Magnenat-Thalmann 1, Daniel Thalmann 2 1) MIRALab, University of Geneva, 24 rue du Général Dufour CH 1211 Geneva,

More information

Research on Dynamic Facial Expressions Recognition

Research on Dynamic Facial Expressions Recognition Research on Dynamic Facial Expressions Recognition Xiaoning Peng & Beii Zou School of Information Science and Engineering Central South University Changsha 410083, China E-mail: hhpxn@mail.csu.edu.cn Department

More information

Synthesizing Realistic Facial Expressions from Photographs

Synthesizing Realistic Facial Expressions from Photographs Synthesizing Realistic Facial Expressions from Photographs 1998 F. Pighin, J Hecker, D. Lischinskiy, R. Szeliskiz and D. H. Salesin University of Washington, The Hebrew University Microsoft Research 1

More information

K A I S T Department of Computer Science

K A I S T Department of Computer Science An Example-based Approach to Text-driven Speech Animation with Emotional Expressions Hyewon Pyun, Wonseok Chae, Yejin Kim, Hyungwoo Kang, and Sung Yong Shin CS/TR-2004-200 July 19, 2004 K A I S T Department

More information

CS 231. Deformation simulation (and faces)

CS 231. Deformation simulation (and faces) CS 231 Deformation simulation (and faces) Deformation BODY Simulation Discretization Spring-mass models difficult to model continuum properties Simple & fast to implement and understand Finite Element

More information

Classification of Face Images for Gender, Age, Facial Expression, and Identity 1

Classification of Face Images for Gender, Age, Facial Expression, and Identity 1 Proc. Int. Conf. on Artificial Neural Networks (ICANN 05), Warsaw, LNCS 3696, vol. I, pp. 569-574, Springer Verlag 2005 Classification of Face Images for Gender, Age, Facial Expression, and Identity 1

More information

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing

More information

HUMAN S FACIAL PARTS EXTRACTION TO RECOGNIZE FACIAL EXPRESSION

HUMAN S FACIAL PARTS EXTRACTION TO RECOGNIZE FACIAL EXPRESSION HUMAN S FACIAL PARTS EXTRACTION TO RECOGNIZE FACIAL EXPRESSION Dipankar Das Department of Information and Communication Engineering, University of Rajshahi, Rajshahi-6205, Bangladesh ABSTRACT Real-time

More information

CS 231. Deformation simulation (and faces)

CS 231. Deformation simulation (and faces) CS 231 Deformation simulation (and faces) 1 Cloth Simulation deformable surface model Represent cloth model as a triangular or rectangular grid Points of finite mass as vertices Forces or energies of points

More information

FACIAL EXPRESSION USING 3D ANIMATION

FACIAL EXPRESSION USING 3D ANIMATION Volume 1 Issue 1 May 2010 pp. 1 7 http://iaeme.com/ijcet.html I J C E T IAEME FACIAL EXPRESSION USING 3D ANIMATION Mr. K. Gnanamuthu Prakash 1, Dr. S. Balasubramanian 2 ABSTRACT Traditionally, human facial

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Facial Expression Recognition

Facial Expression Recognition Facial Expression Recognition Kavita S G 1, Surabhi Narayan 2 1 PG Student, Department of Information Science and Engineering, BNM Institute of Technology, Bengaluru, Karnataka, India 2 Prof and Head,

More information

Real-time Driver Affect Analysis and Tele-viewing System i

Real-time Driver Affect Analysis and Tele-viewing System i Appeared in Intelligent Vehicles Symposium, Proceedings. IEEE, June 9-11, 2003, 372-377 Real-time Driver Affect Analysis and Tele-viewing System i Joel C. McCall, Satya P. Mallick, and Mohan M. Trivedi

More information

AUTOMATIC VIDEO INDEXING

AUTOMATIC VIDEO INDEXING AUTOMATIC VIDEO INDEXING Itxaso Bustos Maite Frutos TABLE OF CONTENTS Introduction Methods Key-frame extraction Automatic visual indexing Shot boundary detection Video OCR Index in motion Image processing

More information

Detection of Mouth Movements and Its Applications to Cross-Modal Analysis of Planning Meetings

Detection of Mouth Movements and Its Applications to Cross-Modal Analysis of Planning Meetings 2009 International Conference on Multimedia Information Networking and Security Detection of Mouth Movements and Its Applications to Cross-Modal Analysis of Planning Meetings Yingen Xiong Nokia Research

More information

Epitomic Analysis of Human Motion

Epitomic Analysis of Human Motion Epitomic Analysis of Human Motion Wooyoung Kim James M. Rehg Department of Computer Science Georgia Institute of Technology Atlanta, GA 30332 {wooyoung, rehg}@cc.gatech.edu Abstract Epitomic analysis is

More information

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun

Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun Analyzing Vocal Patterns to Determine Emotion Maisy Wieman, Andy Sun 1. Introduction The human voice is very versatile and carries a multitude of emotions. Emotion in speech carries extra insight about

More information

Topics for thesis. Automatic Speech-based Emotion Recognition

Topics for thesis. Automatic Speech-based Emotion Recognition Topics for thesis Bachelor: Automatic Speech-based Emotion Recognition Emotion recognition is an important part of Human-Computer Interaction (HCI). It has various applications in industrial and commercial

More information

Application of the Fourier-wavelet transform to moving images in an interview scene

Application of the Fourier-wavelet transform to moving images in an interview scene International Journal of Applied Electromagnetics and Mechanics 15 (2001/2002) 359 364 359 IOS Press Application of the Fourier-wavelet transform to moving images in an interview scene Chieko Kato a,,

More information

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Research on Emotion Recognition for

More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

Data Mining Final Project Francisco R. Ortega Professor: Dr. Tao Li

Data Mining Final Project Francisco R. Ortega Professor: Dr. Tao Li Data Mining Final Project Francisco R. Ortega Professor: Dr. Tao Li FALL 2009 1.Introduction In the data mining class one of the aspects of interest were classifications. For the final project, the decision

More information

Facial Expression Recognition Using Non-negative Matrix Factorization

Facial Expression Recognition Using Non-negative Matrix Factorization Facial Expression Recognition Using Non-negative Matrix Factorization Symeon Nikitidis, Anastasios Tefas and Ioannis Pitas Artificial Intelligence & Information Analysis Lab Department of Informatics Aristotle,

More information

Emotion Detection System using Facial Action Coding System

Emotion Detection System using Facial Action Coding System International Journal of Engineering and Technical Research (IJETR) Emotion Detection System using Facial Action Coding System Vedant Chauhan, Yash Agrawal, Vinay Bhutada Abstract Behaviors, poses, actions,

More information

3D FACIAL EXPRESSION TRACKING AND REGENERATION FROM SINGLE CAMERA IMAGE BASED ON MUSCLE CONSTRAINT FACE MODEL

3D FACIAL EXPRESSION TRACKING AND REGENERATION FROM SINGLE CAMERA IMAGE BASED ON MUSCLE CONSTRAINT FACE MODEL International Archives of Photogrammetry and Remote Sensing. Vol. XXXII, Part 5. Hakodate 1998 3D FACIAL EXPRESSION TRACKING AND REGENERATION FROM SINGLE CAMERA IMAGE BASED ON MUSCLE CONSTRAINT FACE MODEL

More information

Render. Predicted Pose. Pose Estimator. Feature Position. Match Templates

Render. Predicted Pose. Pose Estimator. Feature Position. Match Templates 3D Model-Based Head Tracking Antonio Colmenarez Ricardo Lopez Thomas S. Huang University of Illinois at Urbana-Champaign 405 N. Mathews Ave., Urbana, IL 61801 ABSTRACT This paper introduces a new approach

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

Recognizing Micro-Expressions & Spontaneous Expressions

Recognizing Micro-Expressions & Spontaneous Expressions Recognizing Micro-Expressions & Spontaneous Expressions Presentation by Matthias Sperber KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu

More information

Real-time Automatic Facial Expression Recognition in Video Sequence

Real-time Automatic Facial Expression Recognition in Video Sequence www.ijcsi.org 59 Real-time Automatic Facial Expression Recognition in Video Sequence Nivedita Singh 1 and Chandra Mani Sharma 2 1 Institute of Technology & Science (ITS) Mohan Nagar, Ghaziabad-201007,

More information

Radial Basis Network for Facial Expression Synthesis. I. King H. T. Hou. The Chinese University of Hong Kong

Radial Basis Network for Facial Expression Synthesis. I. King H. T. Hou. The Chinese University of Hong Kong Radial Basis Network for Facial Expression Synthesis I King H T Hou fking,hthoug@cscuhkeduhk Department of Computer Science & Engineering The Chinese University of Hong Kong Shatin, New Territories, Hong

More information

Communicating with virtual characters

Communicating with virtual characters Communicating with virtual characters Nadia Magnenat Thalmann, Prem Kalra, Marc Escher MIRALab, CUI University of Geneva 24, rue du General-Dufour 1211 Geneva, Switzerland Email: {thalmann, kalra, escher}@cui.unige.ch

More information

Performance Driven Facial Animation using Blendshape Interpolation

Performance Driven Facial Animation using Blendshape Interpolation Performance Driven Facial Animation using Blendshape Interpolation Erika Chuang Chris Bregler Computer Science Department Stanford University Abstract This paper describes a method of creating facial animation

More information

FACIAL EXPRESSION USING 3D ANIMATION TECHNIQUE

FACIAL EXPRESSION USING 3D ANIMATION TECHNIQUE FACIAL EXPRESSION USING 3D ANIMATION TECHNIQUE Vishal Bal Assistant Prof., Pyramid College of Business & Technology, Phagwara, Punjab, (India) ABSTRACT Traditionally, human facial language has been studied

More information

Facial expression recognition is a key element in human communication.

Facial expression recognition is a key element in human communication. Facial Expression Recognition using Artificial Neural Network Rashi Goyal and Tanushri Mittal rashigoyal03@yahoo.in Abstract Facial expression recognition is a key element in human communication. In order

More information

TOWARDS A HIGH QUALITY FINNISH TALKING HEAD

TOWARDS A HIGH QUALITY FINNISH TALKING HEAD TOWARDS A HIGH QUALITY FINNISH TALKING HEAD Jean-Luc Oliv&s, Mikko Sam, Janne Kulju and Otto Seppala Helsinki University of Technology Laboratory of Computational Engineering, P.O. Box 9400, Fin-02015

More information

Automatic Pixel Selection for Optimizing Facial Expression Recognition using Eigenfaces

Automatic Pixel Selection for Optimizing Facial Expression Recognition using Eigenfaces C. Frank and E. Nöth, DAGM 2003, Springer-Verlag, (pp. 378 385). Automatic Pixel Selection for Optimizing Facial Expression Recognition using Eigenfaces Lehrstuhl für Mustererkennung, Universität Erlangen-Nürnberg,

More information

FACIAL ANIMATION FROM SEVERAL IMAGES

FACIAL ANIMATION FROM SEVERAL IMAGES International Archives of Photogrammetry and Remote Sensing. Vol. XXXII, Part 5. Hakodate 1998 FACIAL ANIMATION FROM SEVERAL IMAGES Yasuhiro MUKAIGAWAt Yuichi NAKAMURA+ Yuichi OHTA+ t Department of Information

More information

Facial Image Synthesis 1 Barry-John Theobald and Jeffrey F. Cohn

Facial Image Synthesis 1 Barry-John Theobald and Jeffrey F. Cohn Facial Image Synthesis Page 1 of 5 Facial Image Synthesis 1 Barry-John Theobald and Jeffrey F. Cohn 1 Introduction Facial expression has been central to the

More information

From Gaze to Focus of Attention

From Gaze to Focus of Attention From Gaze to Focus of Attention Rainer Stiefelhagen, Michael Finke, Jie Yang, Alex Waibel stiefel@ira.uka.de, finkem@cs.cmu.edu, yang+@cs.cmu.edu, ahw@cs.cmu.edu Interactive Systems Laboratories University

More information

Automatic Enhancement of Correspondence Detection in an Object Tracking System

Automatic Enhancement of Correspondence Detection in an Object Tracking System Automatic Enhancement of Correspondence Detection in an Object Tracking System Denis Schulze 1, Sven Wachsmuth 1 and Katharina J. Rohlfing 2 1- University of Bielefeld - Applied Informatics Universitätsstr.

More information

Unsupervised Learning for Speech Motion Editing

Unsupervised Learning for Speech Motion Editing Eurographics/SIGGRAPH Symposium on Computer Animation (2003) D. Breen, M. Lin (Editors) Unsupervised Learning for Speech Motion Editing Yong Cao 1,2 Petros Faloutsos 1 Frédéric Pighin 2 1 University of

More information

Facial Emotion Recognition using Eye

Facial Emotion Recognition using Eye Facial Emotion Recognition using Eye Vishnu Priya R 1 and Muralidhar A 2 1 School of Computing Science and Engineering, VIT Chennai Campus, Tamil Nadu, India. Orcid: 0000-0002-2016-0066 2 School of Computing

More information

Facial Feature Extraction Based On FPD and GLCM Algorithms

Facial Feature Extraction Based On FPD and GLCM Algorithms Facial Feature Extraction Based On FPD and GLCM Algorithms Dr. S. Vijayarani 1, S. Priyatharsini 2 Assistant Professor, Department of Computer Science, School of Computer Science and Engineering, Bharathiar

More information

Classification of Upper and Lower Face Action Units and Facial Expressions using Hybrid Tracking System and Probabilistic Neural Networks

Classification of Upper and Lower Face Action Units and Facial Expressions using Hybrid Tracking System and Probabilistic Neural Networks Classification of Upper and Lower Face Action Units and Facial Expressions using Hybrid Tracking System and Probabilistic Neural Networks HADI SEYEDARABI*, WON-SOOK LEE**, ALI AGHAGOLZADEH* AND SOHRAB

More information

Data-Driven Face Modeling and Animation

Data-Driven Face Modeling and Animation 1. Research Team Data-Driven Face Modeling and Animation Project Leader: Post Doc(s): Graduate Students: Undergraduate Students: Prof. Ulrich Neumann, IMSC and Computer Science John P. Lewis Zhigang Deng,

More information

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting

Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Image Processing Pipeline for Facial Expression Recognition under Variable Lighting Ralph Ma, Amr Mohamed ralphma@stanford.edu, amr1@stanford.edu Abstract Much research has been done in the field of automated

More information

Master s Thesis. Cloning Facial Expressions with User-defined Example Models

Master s Thesis. Cloning Facial Expressions with User-defined Example Models Master s Thesis Cloning Facial Expressions with User-defined Example Models ( Kim, Yejin) Department of Electrical Engineering and Computer Science Division of Computer Science Korea Advanced Institute

More information

On Modeling Variations for Face Authentication

On Modeling Variations for Face Authentication On Modeling Variations for Face Authentication Xiaoming Liu Tsuhan Chen B.V.K. Vijaya Kumar Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 xiaoming@andrew.cmu.edu

More information

Re-mapping Animation Parameters Between Multiple Types of Facial Model

Re-mapping Animation Parameters Between Multiple Types of Facial Model Re-mapping Animation Parameters Between Multiple Types of Facial Model Darren Cosker, Steven Roy, Paul L. Rosin, and David Marshall School of Computer Science, Cardiff University, U.K D.P.Cosker,Paul.Rosin,Dave.Marshal@cs.cardiff.ac.uk

More information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Mustafa Berkay Yilmaz, Hakan Erdogan, Mustafa Unel Sabanci University, Faculty of Engineering and Natural

More information

Automatic Detecting Neutral Face for Face Authentication and Facial Expression Analysis

Automatic Detecting Neutral Face for Face Authentication and Facial Expression Analysis From: AAAI Technical Report SS-03-08. Compilation copyright 2003, AAAI (www.aaai.org). All rights reserved. Automatic Detecting Neutral Face for Face Authentication and Facial Expression Analysis Ying-li

More information

Facial Expression Recognition for HCI Applications

Facial Expression Recognition for HCI Applications acial Expression Recognition for HCI Applications adi Dornaika Institut Géographique National, rance Bogdan Raducanu Computer Vision Center, Spain INTRODUCTION acial expression plays an important role

More information

Facial Expression Recognition using a Dynamic Model and Motion Energy

Facial Expression Recognition using a Dynamic Model and Motion Energy M.I.T Media Laboratory Perceptual Computing Section Technical Report No. Appears: International Conference on Computer Vision 9, Cambridge, MA, June -, 99 Facial Expression Recognition using a Dynamic

More information

A Simple Approach to Facial Expression Recognition

A Simple Approach to Facial Expression Recognition Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 456 A Simple Approach to Facial Expression Recognition MU-CHUN

More information

Human pose estimation using Active Shape Models

Human pose estimation using Active Shape Models Human pose estimation using Active Shape Models Changhyuk Jang and Keechul Jung Abstract Human pose estimation can be executed using Active Shape Models. The existing techniques for applying to human-body

More information

Evaluation of Gabor-Wavelet-Based Facial Action Unit Recognition in Image Sequences of Increasing Complexity

Evaluation of Gabor-Wavelet-Based Facial Action Unit Recognition in Image Sequences of Increasing Complexity Evaluation of Gabor-Wavelet-Based Facial Action Unit Recognition in Image Sequences of Increasing Complexity Ying-li Tian 1 Takeo Kanade 2 and Jeffrey F. Cohn 2,3 1 IBM T. J. Watson Research Center, PO

More information

Modeling Dynamic Structure of Human Verbal and Nonverbal Communication

Modeling Dynamic Structure of Human Verbal and Nonverbal Communication Modeling Dynamic Structure of Human Verbal and Nonverbal Communication Takashi Matsuyama and Hiroaki Kawashima Graduate School of Informatics, Kyoto University Yoshida-Honmachi, Sakyo, Kyoto 66-851, Japan

More information

Artificial Visual Speech Synchronized with a Speech Synthesis System

Artificial Visual Speech Synchronized with a Speech Synthesis System Artificial Visual Speech Synchronized with a Speech Synthesis System H.H. Bothe und E.A. Wieden Department of Electronics, Technical University Berlin Einsteinufer 17, D-10587 Berlin, Germany Abstract:

More information

Automated Facial Expression Recognition Based on FACS Action Units

Automated Facial Expression Recognition Based on FACS Action Units Automated Facial Expression Recognition Based on FACS Action Units 1,2 James J. Lien 1 Department of Electrical Engineering University of Pittsburgh Pittsburgh, PA 15260 jjlien@cs.cmu.edu 2 Takeo Kanade

More information

C.R VIMALCHAND ABSTRACT

C.R VIMALCHAND ABSTRACT International Journal of Scientific & Engineering Research, Volume 5, Issue 3, March-2014 1173 ANALYSIS OF FACE RECOGNITION SYSTEM WITH FACIAL EXPRESSION USING CONVOLUTIONAL NEURAL NETWORK AND EXTRACTED

More information

Facial Animation System Design based on Image Processing DU Xueyan1, a

Facial Animation System Design based on Image Processing DU Xueyan1, a 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 206) Facial Animation System Design based on Image Processing DU Xueyan, a Foreign Language School, Wuhan Polytechnic,

More information

Dynamic facial expression recognition using a behavioural model

Dynamic facial expression recognition using a behavioural model Dynamic facial expression recognition using a behavioural model Thomas Robin Michel Bierlaire Javier Cruz STRC 2009 10th september The context Recent interest for emotion recognition in transportation

More information

AUTOMATIC FACIAL EXPRESSION RECOGNITION FROM VIDEO SEQUENCES USING TEMPORAL INFORMATION BY IRA COHEN. B.S., Ben Gurion University of Beer-Sheva, 1998

AUTOMATIC FACIAL EXPRESSION RECOGNITION FROM VIDEO SEQUENCES USING TEMPORAL INFORMATION BY IRA COHEN. B.S., Ben Gurion University of Beer-Sheva, 1998 c Copyright by Ira Cohen, 2000 AUTOMATIC FACIAL EXPRESSION RECOGNITION FROM VIDEO SEQUENCES USING TEMPORAL INFORMATION BY IRA COHEN B.S., Ben Gurion University of Beer-Sheva, 1998 THESIS Submitted in partial

More information

Human Face Classification using Genetic Algorithm

Human Face Classification using Genetic Algorithm Human Face Classification using Genetic Algorithm Tania Akter Setu Dept. of Computer Science and Engineering Jatiya Kabi Kazi Nazrul Islam University Trishal, Mymenshing, Bangladesh Dr. Md. Mijanur Rahman

More information

AUTOMATED CHARACTER ANIMATION TOOL

AUTOMATED CHARACTER ANIMATION TOOL Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Facial Action Detection from Dual-View Static Face Images

Facial Action Detection from Dual-View Static Face Images Facial Action Detection from Dual-View Static Face Images Maja Pantic and Leon Rothkrantz Delft University of Technology Electrical Engineering, Mathematics and Computer Science Mekelweg 4, 2628 CD Delft,

More information

Dynamic Facial Expression Recognition with Atlas Construction and Sparse Representation

Dynamic Facial Expression Recognition with Atlas Construction and Sparse Representation IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. XX, MONTH XX 1 Dynamic Facial Expression Recognition with Atlas Construction and Sparse Representation Yimo Guo, Guoying Zhao, Senior Member, IEEE, and

More information

network and image warping. In IEEE International Conference on Neural Networks, volume III,

network and image warping. In IEEE International Conference on Neural Networks, volume III, Mary YY Leung, Hung Yen Hui, and Irwin King Facial expression synthesis by radial basis function network and image warping In IEEE International Conference on Neural Networks, volume III, pages 1{15, Washington

More information

Intelligent Hands Free Speech based SMS System on Android

Intelligent Hands Free Speech based SMS System on Android Intelligent Hands Free Speech based SMS System on Android Gulbakshee Dharmale 1, Dr. Vilas Thakare 3, Dr. Dipti D. Patil 2 1,3 Computer Science Dept., SGB Amravati University, Amravati, INDIA. 2 Computer

More information