Expressive Face Animation Synthesis Based on Dynamic Mapping Method*


Panrong Yin, Liyue Zhao, Lixing Huang, and Jianhua Tao
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China

Abstract. In this paper, we present a framework for a speech driven face animation system with expressions. It systematically addresses audio-visual data acquisition, expressive trajectory analysis and audio-visual mapping. Based on this framework, we learn the correlation between neutral facial deformation and expressive facial deformation with a Gaussian Mixture Model (GMM). A hierarchical structure is proposed to map the acoustic parameters to lip FAPs. The synthesized neutral FAP streams are then extended with expressive variations according to the prosody of the input speech. The quantitative evaluation of the experimental results is encouraging, and the synthesized face shows a realistic quality.

1 Introduction

Speech driven face animation aims to convert an incoming audio stream into a sequence of corresponding face movements. A number of its possible applications can be found in multimodal human-computer interfaces, virtual reality and videophones. Until now, most work has still focused on lip movement [1][2][3]. Among them, the rule-based method [2] and the Vector Quantization method [3] are two direct and easily realized approaches, but their results are usually inaccurate and discontinuous due to the limited rules and codebooks. Neural networks are also an effective way to perform the audio-visual mapping. For instance, Massaro [1] trained a neural network model to learn the mapping from LPCs to face animation parameters, using the current frame plus 5 backward and 5 forward time steps as input to model the context. Although neural networks have the merits of requiring only a moderate amount of samples and producing smooth synthesized results, they are strongly influenced by the initial parameter setting and easily get stuck in local minima. The Hidden Markov Model (HMM) is also widely used in this area because of its successful application in speech recognition. Yamamoto E. [3] built a phoneme recognition model with HMMs and directly mapped the recognized phonemes to lip shapes; a smoothing algorithm was also used. Since the HMMs can only be generated from phonemes, the work has to be linked to a specific language, and the synthesized lip sequence is still not very smooth.

* The work is supported by the National Natural Science Foundation of China (No. ) and the 863 Program (No. 2006AA0Z38).

A. Paiva, R. Prada, and R.W. Picard (Eds.): ACII 2007, LNCS 4738, Springer-Verlag Berlin Heidelberg 2007

Most of these systems are based on phonemic representations (phonemes or visemes) and show limited efficiency due to the restrictions of the algorithms. In order to reduce the computational complexity and make the synthesized result smoother, some researchers have applied dynamic mapping methods that reorder or concatenate existing audio-visual units to form a new visual sequence. For instance, Bregler (Video Rewrite) [6] reorders existing mouth frames based on recognized phonemes. Cosatto [7] selects corresponding visual frames according to the distance between the new audio track and the stored audio track, and concatenates the candidates to form the smoothest sequence.

While lip shapes are closely related to speech content (linguistic), facial expression is a primary way of passing non-verbal information (paralinguistic) that carries a set of messages related to the speaker's emotional state. Thus, it would be more natural and vivid for a talking head to show expressions when communicating with humans. In Pengyu Hong's research [4], he not only applied several MLPs to map LPC cepstral parameters to face motion units, but also used different MLPs to map the estimated motion units to expressive motion units. Ashish Verma [12] used optical flow between visemes to generate face animations with different facial expressions. Yan Li [16] adopted 3 sets of cartoon templates with 5 levels of intensity to show expressions synchronized with the face animation. Although some work has been done on expressive facial animation from speech, the naturalness of the synthesized results is still an open question.

In our work, we design a framework for a speech driven face animation system based on the dynamic mapping method. To investigate the forming process of facial expressions, the correlation between neutral facial deformation and expressive facial deformation is first modeled by GMMs. Then we combine the VQ method and a frame-based concatenation model to facilitate the audio-visual mapping and keep the result as realistic as possible. In the training procedure, we cluster the training audio vectors according to the phoneme information and use a codebook to denote the phoneme categories; the number of phonemes in the database determines the size of the codebook. During synthesis, we apply a hierarchical structure with the following steps (a code sketch of this pipeline is given below). First, each frame of the input speech is compared to the codebook, and three different candidate codes are obtained. Second, for each candidate code there are several phoneme samples; the target unit (the current 3 frames) of the input speech, together with its context information, is shifted within the speech sequence of a tri-phone so as to find the most closely matched sub-sequence. Third, the visual distance between two adjacent candidate units is computed to ensure that the concatenated sequence is smooth. Last, the expressive variation predicted from the k-th GMM, which is selected according to the prosody of the input speech, is imposed on the synthesized neutral face animation sequence. Figure 1 shows a block diagram of our expressive talking head system.

In the rest of the paper, section 2 introduces the data acquisition, section 3 analyses the trajectories of expressive facial deformation and the GMM modeling process, section 4 focuses on the realization of lip movement synthesis from speech, section 5 gives the experimental results, and section 6 concludes and describes future work.
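As a forward reference for the pipeline just outlined, the following is a minimal Python skeleton of the four synthesis steps. The class and method names are illustrative placeholders rather than code from the original system, and the method bodies are intentionally left unimplemented; concrete sketches of the individual steps follow in sections 3 and 4.

```python
class ExpressiveTalkingHeadSketch:
    """Hypothetical skeleton of the hierarchical synthesis pipeline (names assumed)."""

    def find_candidate_codes(self, unit_mfcc, n_best=3):
        """Step 1 (Sec. 4.1): return the 3 nearest phoneme codes for a 3-frame unit."""
        raise NotImplementedError

    def select_candidate_units(self, unit_mfcc, code):
        """Step 2 (Sec. 4.2): shift the target unit within the candidate tri-phone
        and return the best matching stored sub-sequences."""
        raise NotImplementedError

    def concatenate_units(self, candidates_per_unit):
        """Step 3 (Sec. 4.2): Viterbi search minimizing the acoustic and visual join costs."""
        raise NotImplementedError

    def add_expression(self, neutral_faps, prosody_features):
        """Step 4 (Sec. 3): impose the variation predicted by the prosody-selected GMM."""
        raise NotImplementedError

    def synthesize(self, mfcc_frames, prosody_features, frames_per_unit=3):
        """Drive the four steps over successive 3-frame target units."""
        candidates_per_unit = []
        for start in range(0, len(mfcc_frames), frames_per_unit):
            unit = mfcc_frames[start:start + frames_per_unit]
            codes = self.find_candidate_codes(unit)
            units = [u for c in codes for u in self.select_candidate_units(unit, c)]
            candidates_per_unit.append(units)
        neutral_faps = self.concatenate_units(candidates_per_unit)
        return self.add_expression(neutral_faps, prosody_features)
```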

Fig. 1. Expressive speech driven face animation system framework

2 Data Acquisition and Preprocessing

To establish the audio-visual database, a digital camera and a microphone mounted on the camera were used to collect the facial movements and the speech signal synchronously. The training database used in our work consists of 300 complete sentences and about 2000 frame samples. The speaker was directed to articulate the same text content with a neutral accent and with natural-intensity expressions respectively. Here, we choose 3 emotional states: angry, happy and surprise.

Once the training data is acquired, the audio and visual data are analyzed separately. For the visual representation, our tracking method works in an estimation-and-refining way, applied to each successive image pair. 20 salient facial feature points (Fig. 2(a)), including two nostrils (p1 and p2), six brow corners (p3, p4, p5, p6, p7 and p8), eight eye corners (p9, p10, p11, p12, p13, p14, p15 and p16), and four mouth points (p17, p18, p19 and p20), are used to represent the facial shape. They are initialized interactively in the first frame, and the KLT tracker is used to estimate the feature points in the next frame. Assuming X = (x_1, y_1, x_2, y_2, ..., x_N, y_N)^T are the positions of the feature points in the current frame and dX is the offset estimated by KLT, we refine the initial tracking result X + dX by applying the constraints of a Point Distribution Model (PDM). Figure 2(b) shows some examples of expressive facial images. Then, after coordinate normalization and affine transformation, 19 FAPs (see Table 1) related to lip and facial deformation are extracted.

For the audio representation, the Mel-Frequency Cepstrum Coefficients (MFCC), which give an alternative representation of the speech spectrum, are calculated. The speech signal, sampled at 16 kHz, is blocked into frames of 40 ms. 12-dimensional MFCC coefficients are computed for every audio frame, and one visual frame corresponds to one audio frame. Furthermore, global statistics of prosody features (such as pitch and energy), which are responsible for facial expressions, are also extracted in our work.
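A minimal sketch of this acoustic front end, assuming the librosa library and the frame settings stated above (16 kHz sampling, 40 ms frames, 12 MFCCs). The hop length equal to the frame length (one audio frame per video frame), the pyin-based F0 estimator, its pitch range, and the file name are assumptions, since the paper does not specify them.

```python
import numpy as np
import librosa

SR = 16000                    # speech sampled at 16 kHz
FRAME_LEN = int(0.040 * SR)   # 40 ms analysis frames
HOP_LEN = FRAME_LEN           # assumed non-overlapping: one audio frame per video frame


def acoustic_features(wav_path="utterance.wav"):
    y, _ = librosa.load(wav_path, sr=SR)

    # 12-dimensional MFCCs, one vector per 40 ms frame.
    mfcc = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=12,
                                n_fft=FRAME_LEN, hop_length=HOP_LEN).T

    # Global prosodic statistics, later used to select the emotion GMM.
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=SR,
                                 frame_length=FRAME_LEN, hop_length=HOP_LEN)
    f0 = f0[~np.isnan(f0)]
    rms = librosa.feature.rms(y=y, frame_length=FRAME_LEN, hop_length=HOP_LEN)
    prosody = {
        "f0_range": float(f0.max() - f0.min()) if f0.size else 0.0,
        "f0_max": float(f0.max()) if f0.size else 0.0,
        "f0_min": float(f0.min()) if f0.size else 0.0,
        "f0_mean": float(f0.mean()) if f0.size else 0.0,
        "energy_mean": float(rms.mean()),
    }
    return mfcc, prosody
```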

The F0 range, the maximum of F0, the minimum of F0, the mean of F0 and the mean of energy, which have been confirmed to be useful for emotional speech classification, are selected for each sample.

Fig. 2. (a) Facial feature points and (b) expressive facial images (happy, surprise, angry)

Table 1. FAPs for visual representation in our system

Group  FAP name               Group  FAP name
2      Open_jaw               8      Stretch_r_cornerlip_o
4      Raise_l_i_eyebrow      8      Stretch_l_cornerlip_o
4      Raise_r_i_eyebrow      8      Lower_t_lip_lm_o
4      Raise_l_m_eyebrow      8      Lower_t_lip_rm_o
4      Raise_r_m_eyebrow      8      Raise_b_lip_lm_o
4      Raise_l_o_eyebrow      8      Raise_b_lip_rm_o
4      Raise_r_o_eyebrow      8      Raise_l_cornerlip_o
8      Lower_t_midlip_o       8      Raise_r_cornerlip_o
8      Raise_b_midlip_o

3 Neutral-Expressive Facial Deformation Mapping

In our work, the problem of processing facial expressions is simplified by taking advantage of the correlation between the facial deformation without expressions and the facial deformation with expressions that account for the same speech content. Here, we choose 3 points for the following analysis: p4 (middle point of the right eyebrow), p18 (middle of the upper lip) and p19 (left corner of the mouth). Figure 3 shows the dynamic vertical movement of p4 for "jiu4 shi4 xia4 yu3 ye3 qu4" in the neutral and surprise conditions, and the vertical movement of p18 in the neutral and happy conditions. It is evident from Fig. 3(a) that facial deformations are strongly affected by emotion. Although the trajectory of the vertical movement of the right eyebrow in the surprise condition follows the trend of that in the neutral condition, the intensity is much stronger and the duration tends to be longer. On the other hand, the lip movement is not only influenced by the speech content, but also affected by the emotional rhythm of the speech. The action of showing expressions deforms the original mouth shape. From Fig. 3(b) and Fig. 3(c), we can see that p18, which is on the upper lip, moves up more under the happy condition, and that p18 and p19 have similar vertical trajectories because of their interrelated positions on the lip.

Fig. 3. Movements of (a) p4, (b) p18 and (c) p19 under different emotional states

Since the degree of expressive representation lies in a continuous space, it is not reasonable to simply classify the intensity of expression into several levels. Therefore, to estimate natural expressive facial deformation from neutral face movements, GMMs are used to model the probability distribution of the neutral-expressive deformation vectors; each emotion category corresponds to one GMM. To collect training data, the neutral (N) and expressive (E) vector sequences are first aligned with Dynamic Time Warping (DTW). Then we cascade the neutral facial deformation features with the expressive facial deformation features to compose the joint feature vector Z_ik = [X_k, Y_ik]^T, i ∈ {0, 1, 2, 3, 4, 5}, k = 1, ..., N. The joint probability distribution of the neutral-expressive deformation vectors is modeled by a GMM, which is a weighted sum of Q Gaussian functions:

P(Z) = \sum_{q=1}^{Q} w_q \, N(Z; \mu_q, \Sigma_q), \qquad \sum_{q=1}^{Q} w_q = 1.    (1)

where N(Z; \mu_q, \Sigma_q) is the q-th Gaussian density component, \mu_q and \Sigma_q are the q-th mean vector and covariance matrix, w_q is the mixture weight and Q is the number of Gaussian functions in the GMM. The GMM parameters (w_q, \mu_q, \Sigma_q), structured as in Equation (2), are obtained by training with the Expectation-Maximization (EM) algorithm:

\mu_q = \begin{bmatrix} \mu_q^X \\ \mu_q^Y \end{bmatrix}, \qquad \Sigma_q = \begin{bmatrix} \Sigma_q^{XX} & \Sigma_q^{XY} \\ \Sigma_q^{YX} & \Sigma_q^{YY} \end{bmatrix}, \qquad q = 1, \ldots, Q.    (2)

After the GMMs are trained, the optimal estimate of the expressive facial deformation Y_ik given the neutral facial deformation X_k is obtained with the conditional-expectation transform function (Equation (3)):

\hat{Y}_{ik} = E\{Y_{ik} \mid X_k\} = \sum_{q=1}^{Q} p_q(X_k) \, [\mu_q^Y + \Sigma_q^{YX} (\Sigma_q^{XX})^{-1} (X_k - \mu_q^X)].    (3)

where p_q(X_k) is the probability that the given neutral observation belongs to the q-th mixture component (Equation (4)):

p_q(X_k) = \frac{w_q \, N(X_k; \mu_q^X, \Sigma_q^{XX})}{\sum_{p=1}^{Q} w_p \, N(X_k; \mu_p^X, \Sigma_p^{XX})}.    (4)
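A minimal sketch of this joint-GMM regression (Equations 1-4), using scikit-learn's GaussianMixture with full covariances. The variable names and the choice of Q are illustrative assumptions, and the DTW alignment of the neutral/expressive training pairs is assumed to have been performed already.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def train_joint_gmm(X, Y, n_components=8):
    """Fit a GMM on joint vectors Z = [X, Y] (Eq. 1-2). X, Y: (n_samples, d) arrays."""
    Z = np.hstack([X, Y])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(Z)
    return gmm


def predict_expressive(gmm, X):
    """Conditional expectation E[Y | X] under the joint GMM (Eq. 3-4)."""
    d = X.shape[1]
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d:]
    cov_xx = gmm.covariances_[:, :d, :d]
    cov_yx = gmm.covariances_[:, d:, :d]

    Y_hat = np.zeros((X.shape[0], gmm.means_.shape[1] - d))
    for k, x in enumerate(X):
        # posterior responsibility p_q(x) of each mixture component (Eq. 4)
        lik = np.array([multivariate_normal.pdf(x, mu_x[q], cov_xx[q])
                        for q in range(gmm.n_components)])
        post = gmm.weights_ * lik
        post /= post.sum()
        # component-wise linear regression, blended with the posteriors (Eq. 3)
        for q in range(gmm.n_components):
            reg = mu_y[q] + cov_yx[q] @ np.linalg.solve(cov_xx[q], x - mu_x[q])
            Y_hat[k] += post[q] * reg
    return Y_hat
```

With three emotion categories, one such joint GMM would be trained per emotion from the DTW-aligned neutral/expressive pairs and then selected at synthesis time according to the prosodic statistics of the input speech.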

4 Lip Movement Synthesized from Speech

The dynamic concatenation model applied in our previous work [8] produced results of natural quality and could be easily realized. Considering the search efficiency and the automatic operation of the system, we introduce a hierarchical structure and regard every three frames as a unit, instead of a phoneme as in our previous work. Unlike Cosatto's work [7], in our approach every three frames of the input speech are not directly compared with the existing frames in the corpus, because a frame-based corpus would be too large to search. Therefore, we break the mapping process (see Fig. 4) into two steps: finding approximate candidate codes in the codebook, and finding the approximate sub-sequence by calculating a cost that takes the context information into account.

Fig. 4. Hierarchical structure of the improved dynamic concatenation model

4.1 Phoneme Based Acoustic Codebook

For higher efficiency, a phoneme based codebook is produced in the training process. The training audio vectors are clustered according to the phoneme information, and every code in the codebook corresponds to a phoneme. The number of phonemes in the database determines the size of the codebook; there are 48 codes in total, including the SIL (silence) parts of the sentences. In the synthesis process, the distance between each frame of the input speech and the codebook is calculated. The 3 closest phoneme categories, represented by candidate codes, are retained, so that the risk of mistaking one similar phoneme for another when keeping only a single code is avoided. The first-layer cost COST_1 is obtained by Equation (5):

COST_1 = dist(a_t, codebook(k)), \qquad k = \arg\min_n \, dist(a_t, codebook(n)).    (5)

4.2 Cost Computation

For each candidate code, there are several phoneme samples, and a more detailed audio-visual unit mapping is employed to synthesize continuous FAP streams. The sub-layer cost computation is based on two cost functions:

COST_2 = \alpha C_a + (1 - \alpha) C_v.    (6)

where C_a is the voice spectrum distance, C_v is the visual distance between two adjacent units, and the weight α balances the effect of the two cost functions.

In the unit selection procedure, we regard every 3 frames of the input speech as a target unit. The process of selecting candidate units is to find the most approximate sub-sequence within the range of the candidate phoneme together with its context. The voice spectrum distance (see Fig. 5) accounts not only for the context of the target unit of the input speech, but also for the context of the candidate phoneme. The context of the target unit covers the former m frames and the latter m frames of the current frame. The context of the candidate phoneme is a tri-phone, which covers the former phoneme and the latter phoneme. When measuring how close a candidate unit is to the target unit, the target unit with its context is shifted from the beginning to the end of the tri-phone, so that all positions of the sub-sequence within the tri-phone are considered. Finally, the sub-sequence with the minimum average distance is output, and the corresponding candidate unit is selected. The voice spectrum distance is defined by Equation (7):

C_a = \sum_{m=-6}^{6} w_m \, \| a^{tar}_{t,m} - a^{can}_{t,m} \|, \qquad t = curr.    (7)

The weights w_m are determined by the method used in [10]: we compute the linear blending weights in terms of the target unit's duration. \| a^{tar}_{t,m} - a^{can}_{t,m} \| is the Euclidean distance between the acoustic parameters of the two frame spans. To reduce the complexity of the Viterbi search, we set a limit on the number of candidate sequences in every round of selection.
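A minimal sketch of the first-layer codebook lookup (Equation 5) and the shifted voice-spectrum cost (Equation 7), assuming each phoneme code is simply the mean MFCC vector of its training frames; the triangular blending weights stand in for the duration-based weights of [10] and are an assumption.

```python
import numpy as np


def build_codebook(frames_by_phoneme):
    """One code per phoneme: here simply the mean MFCC vector of its training frames."""
    labels = sorted(frames_by_phoneme)
    codes = np.stack([frames_by_phoneme[p].mean(axis=0) for p in labels])
    return labels, codes


def candidate_codes(frame, codes, n_best=3):
    """First layer (Eq. 5): indices of the 3 phoneme codes closest to the input frame."""
    dists = np.linalg.norm(codes - frame, axis=1)
    return np.argsort(dists)[:n_best]


def voice_spectrum_cost(target_ctx, triphone, m=6):
    """Shift the (2m+1)-frame target context along the candidate tri-phone (Eq. 7)
    and return the smallest weighted distance and the best offset."""
    # assumed blending weights: triangular, peaked at the current frame
    w = 1.0 - np.abs(np.arange(-m, m + 1)) / (m + 1)
    best_cost, best_off = np.inf, 0
    win = len(target_ctx)                      # 2m + 1 frames
    for off in range(0, len(triphone) - win + 1):
        cand = triphone[off:off + win]
        cost = np.sum(w * np.linalg.norm(target_ctx - cand, axis=1))
        if cost < best_cost:
            best_cost, best_off = cost, off
    return best_cost, best_off
```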

Fig. 5. Voice spectrum cost

Not only should the correct speech signal unit be found, but the smoothness of the synthesized face animation should also be considered. The concatenation cost measures how closely the mouth shapes of adjacent units match, so the FAPs of the last frame of the former candidate unit are compared with those of the first frame of the current candidate unit (Equation (8)):

C_v = v(can_{r-1}, can_r).    (8)

where v(can_{r-1}, can_r) is the Euclidean distance between the adjacent visual features of the two candidate sequences. Once the two costs COST_1 and COST_2 are computed, the graph for unit concatenation is constructed. Our approach finally aims to find the path in this graph that generates the minimum total cost (Equation (9)); Viterbi search is a valid method for this application:

COST = \sum_{r=1}^{n/3} \big( COST_1(r) + COST_2(r) \big).    (9)

where n is the total number of frames in the input speech, and r indexes the target units contained in the input speech.
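A minimal sketch of the Viterbi search over the candidate graph (Equations 6, 8 and 9). The candidate data structure (first-layer cost, acoustic cost, first/last-frame FAPs) is an illustrative assumption, and splitting COST_2 so that αC_a attaches to each node while (1-α)C_v attaches to the edge between consecutive units is one reading of the cost definition above.

```python
import numpy as np


def viterbi_concatenate(candidates_per_unit, alpha=0.5):
    """candidates_per_unit[r] is a list of dicts with keys
       'cost1' (Eq. 5), 'ca' (Eq. 7), 'fap_first', 'fap_last'.
       Returns the candidate index chosen for each target unit (Eq. 9)."""
    n_units = len(candidates_per_unit)
    # accumulated cost and back-pointers for every candidate of every unit
    acc = [np.array([c['cost1'] + alpha * c['ca'] for c in candidates_per_unit[0]])]
    back = []
    for r in range(1, n_units):
        prev, cur = candidates_per_unit[r - 1], candidates_per_unit[r]
        acc_r = np.empty(len(cur))
        back_r = np.empty(len(cur), dtype=int)
        for j, cand in enumerate(cur):
            # visual join cost C_v against every candidate of the previous unit (Eq. 8)
            joins = np.array([np.linalg.norm(cand['fap_first'] - p['fap_last'])
                              for p in prev])
            total = acc[-1] + (1 - alpha) * joins
            back_r[j] = int(np.argmin(total))
            acc_r[j] = total[back_r[j]] + cand['cost1'] + alpha * cand['ca']
        acc.append(acc_r)
        back.append(back_r)
    # trace back the minimum-cost path
    path = [int(np.argmin(acc[-1]))]
    for back_r in reversed(back):
        path.append(int(back_r[path[-1]]))
    return list(reversed(path))
```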

5 Experimental Results

When speech is input, the system calculates the MFCC coefficients and the prosody features respectively. The former are used to drive the content-related face movements; the latter are used to choose the appropriate GMM. Once the neutral FAP stream is synthesized by the improved concatenation model, the chosen GMM predicts the expressive FAP stream from the neutral FAP stream. Fig. 6 shows examples of synthesized FAP stream results, and Fig. 7 shows the synthesized expressive facial deformation of 3 points (3 related FAPs) compared with the recorded deformation.

In Fig. 6(a) and Fig. 6(b), we compare the synthesized FAP 52 stream with the recorded FAP stream from the validation set and the test set respectively. In Fig. 6(a), the two curves (smoothed synthesized sequence and recorded sequence) are very close, because the validation speech input is more likely to find the complete sequence of the same sentence in the corpus. In Fig. 6(b), the input speech is much different from the speech used in the training process; although the two curves are not as close, their slopes are similar in most cases. To reduce the magnitude of discontinuities, the final synthesized result is smoothed by curve fitting.

Fig. 6. Selected synthesized FAP streams from (a) a validation sentence and (b) a test sentence

In Fig. 7, the synthesized expressive trajectories follow the trend of the recorded ones well. It is noticed that the ends of the synthesized trajectories do not estimate the facial deformation well; this is mainly because the longer ending part of the expressive training data is a tone extension, so the features of the end parts of the two aligned sequences do not strictly correspond.

Fig. 7. Synthesized expressive trajectories of (a) p4, (b) p18 and (c) p19 under different emotional states

For quantitative evaluation, the correlation coefficient (Equation (10)) is used to represent the deviation similarity between the recorded FAP stream and the synthesized stream. The closer the coefficient is to 1, the better the result follows the trends in the original values.

CC = \frac{1}{T} \sum_{t=1}^{T} \frac{(f(t) - \mu)(\hat{f}(t) - \hat{\mu})}{\sigma \hat{\sigma}}.    (10)

where f(t) is the recorded FAP stream, \hat{f}(t) is the synthesized stream, T is the total number of frames in the database, and μ, \hat{\mu} and σ, \hat{\sigma} are the corresponding means and standard deviations.
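A small sketch of this evaluation metric for a pair of recorded and synthesized FAP streams (Equation 10):

```python
import numpy as np


def correlation_coefficient(f_rec, f_syn):
    """Equation (10): deviation similarity between recorded and synthesized FAP streams."""
    f_rec = np.asarray(f_rec, dtype=float)
    f_syn = np.asarray(f_syn, dtype=float)
    return np.mean((f_rec - f_rec.mean()) * (f_syn - f_syn.mean())) / (f_rec.std() * f_syn.std())
```

This is the Pearson correlation of the two streams, so it could equally be computed as np.corrcoef(f_rec, f_syn)[0, 1].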

From Table 2, we can see that the estimates for p4 are generally better than those of the points on the lips. The eyebrow movements are observed to be mainly affected by expressions, while the lip movements are not only influenced by the speech content but also extended by the emotional exhibition. The trajectories of p18 and p19 are determined by both content and expression, and thus they have smaller correlation coefficients.

Table 2. The average correlation coefficients for different expressions

Correlation coefficients    Improved dynamic concatenation model (Neutral)    GMM for Surprise    GMM for Happy
p4
p18
p19

Finally, an MPEG-4 facial animation engine is used to assess our synthesized FAP streams qualitatively. The animation model displays at a frame rate of 30 fps. Fig. 8 shows some frames of the synthesized talking head compared with the recorded images.

Fig. 8. Some frames of the synthesized talking head

We have tried different voices on our system. The synthesized results match the voices well, and the face animation appears very natural. Although the Chinese voice of a man was used in the training process, our system can adapt to other languages as well.

6 Conclusion and Future Work

In the present work, we analyze the correlation between neutral facial deformation and expressive facial deformation and use GMMs to model their joint probability distribution. In addition, we make some improvements to the dynamic mapping method of our previous work. By employing a hierarchical structure, the mapping process can be broken into two steps: the first-layer acoustic codebook gives a general classification of the input speech frames, and the sub-layer frame-based cost computation ensures that the most approximate candidate sequence is selected. Through this method, the unit searching speed is largely enhanced, manual operation in the animation process is avoided, and the synthesized result remains natural and realistic. However, our work is presently limited to a few typical emotions; handling the affective information conveyed by a speaker in a natural state is still a challenging task. In the future, more work will be done to investigate how dynamic expressive movement is related to prosody features. We also need a better strategy for aligning FAP sequences and more appropriate parameters for smoothing the synthesized trajectories.

References

1. Massaro, D.W., Beskow, J., Cohen, M.M., Fry, C.L., Rodriguez, T.: Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks. In: Proceedings of AVSP 99, Santa Cruz, CA (1999)
2. Ezzat, T., Poggio, T.: MikeTalk: A Talking Facial Display Based on Morphing Visemes. In: Proc. Computer Animation Conference, Philadelphia, USA (1998)
3. Yamamoto, E., Nakamura, S., Shikano, K.: Lip movement synthesis from speech based on Hidden Markov Models. Speech Communication 26, 105-115 (1998)
4. Hong, P., Wen, Z., Huang, T.S.: Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. on Neural Networks 13(4) (2002)
5. Brand, M.: Voice Puppetry. In: Proc. of SIGGRAPH 99 (1999)
6. Bregler, C., Covell, M., Slaney, M.: Video Rewrite: Driving Visual Speech with Audio. In: ACM SIGGRAPH (1997)
7. Cosatto, E., Potamianos, G., Graf, H.P.: Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE International Conference on Multimedia and Expo (ICME), vol. 2 (2000)
8. Yin, P., Tao, J.: Dynamic mapping method based speech driven face animation system. In: The First International Conference on Affective Computing and Intelligent Interaction (2005)
9. Tekalp, A.M., Ostermann, J.: Face and 2-D mesh animation in MPEG-4. Signal Processing: Image Communication 15 (2000)
10. Wang, J.-Q., Wong, K.-H., Pheng, P.-A., Meng, H.M., Wong, T.-T.: A real-time Cantonese text-to-audiovisual speech synthesizer. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 04) (2004)
11. Arun, K.S., Huang, T.S., Blostein, S.D.: Least-square fitting of two 3-D point sets. IEEE Trans. Pattern Analysis and Machine Intelligence 9(5) (1987)
12. Verma, A., Subramaniam, L.V., Rajput, N., Neti, C., Faruquie, T.A.: Animating Expressive Faces Across Languages. IEEE Trans. on Multimedia 6(6) (2004)
13. Tao, J., Tan, T.: Emotional Chinese Talking Head System. In: Proc. of ACM 6th International Conference on Multimodal Interfaces (ICMI 2004), State College, PA (October 2004)
14. Gutierrez-Osuna, R., Kakumanu, P.K., Esposito, A., Garcia, O.N., Bojorquez, A., Castillo, J.L., Rudomin, I.: Speech-Driven Facial Animation with Realistic Dynamics. IEEE Trans. on Multimedia 7(1) (2005)
15. Huang, Y., Lin, S., Ding, X., Guo, B., Shum, H.-Y.: Real-time Lip Synchronization Based on Hidden Markov Models. In: ACCV (2002)
16. Li, Y., Yu, F., Xu, Y.-Q., Chang, E., Shum, H.-Y.: Speech-Driven Cartoon Animation with Emotions. In: Proceedings of the Ninth ACM International Conference on Multimedia (2001)
17. Rao, R., Chen, T.: Audio-to-Visual Conversion for Multimedia Communication. IEEE Transactions on Industrial Electronics 45(1), 15-22 (1998)
