Speech Driven Face Animation Based on Dynamic Concatenation Model

Journal of Information & Computational Science 3: 4 (2006)

Jianhua Tao, Panrong Yin
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Received 1 May 2006; revised 20 July 2006

Abstract

In this paper, we design and develop a speech driven face animation system based on a dynamic concatenation model. The face animation is synthesized by unit concatenation and is synchronous with the real speech. Units are selected according to cost functions that measure the voice spectrum distance between training and target units; the visual distance between two adjacent training units is also used to obtain smoother mapping results. Finally, the Viterbi method is used to find the best face animation sequence. The experimental results show that the synthesized lip movement has a good and natural quality.

Keywords: Talking head; Face animation; Speech driven

1 Introduction

Speech driven face animation systems ([6, 8, 10, 15]) have been developed to improve human-machine interfaces such as virtual announcers, readers, and mobile message readers. They also help hearing-impaired people, or people in noisy environments, to communicate with others. Unlike text driven face animation ([3, 5]), which can only use the utterance context information, speech driven face animation is driven by real speech with both spectral and prosodic information, and is therefore able to offer more expressive and vivid outputs.

A speech driven face animation system is usually established in four steps, as defined in Picture My Voice ([10]): label an audio-visual database; choose representations of both the auditory and the visual speech; find a method to describe the relationship between the two representations; and synthesize the visible speech given the auditory speech. Many methods have been applied to audio-visual mapping, such as TDNN ([10]), MLPs ([8]), KNN ([6]), HMM ([15]), GMM, VQ ([15]), and rule-based approaches ([5, 9, 14]). They can be classified into two types: through speech recognition or not through speech recognition.

Project supported by the National Natural Science Foundation of China (No. ).
Corresponding author. E-mail address: jhtao@nlpr.ia.ac.cn (Jianhua Tao).
Copyright © 2006 Binary Information Press, December 2006.

The first approach divides the speech signal into language units such as phonemes, syllables, or words, and maps them directly to corresponding lip shapes; a smoothing algorithm then produces the lip movement sequence ([15]). The second approach analyzes the bimodal data with a statistical learning model and finds the mapping between continuous acoustic features and facial parameters, so that the face animation can be driven directly by a novel speech signal. For example, Picture My Voice ([10]) trained an ANN to learn the mapping from LPCs to face animation parameters, using the current phoneme together with 5 backward and 5 forward time steps as input to model the context.

In this paper, we propose an acoustic-context-based facial unit selection and dynamic concatenation model. The method constructs the output data stream by concatenating stored data units from the training database, which makes the synthesis result natural and realistic. Furthermore, a statistical method is introduced to train the model from a large corpus. The training and testing corpus is based on the 3D coordinates of FDPs, which are compatible with the MPEG-4 standard ([11]) and have been recorded with a motion capture system. Since an animated talking head is required to be lifelike and individual, FAPs are extracted as visual features so that the synthesized animation control parameters can be applied to different 3D mesh models.

The paper is organized as follows: Section 1 gives the general description of the paper, Section 2 introduces the database setup and the audio-visual feature extraction, Section 3 focuses on the model description and the training method, Section 4 presents experiments and analysis of the model, and Section 5 concludes the work.

2 Corpus

2.1 Corpus design and recording

To create the model, a corpus containing 1,000 sentences was produced by searching for appropriate data from 10 years of the People's Daily via a greedy algorithm ([6]). The following factors form the focus of our investigation:

(a) Identity of the current syllable;
(b) Identity of the current tone;
(c) Identity of the final in the previous syllable;
(d) Identity of the previous tone;
(e) Identity of the initial in the following syllable;
(f) Identity of the following tone.

Factor (a) has 417 values, corresponding to the 417 syllable types in Chinese. Factors (b), (d), and (f) each have 5 values, corresponding to the 4 full tones and the neutral tone. Factor (c) contains 20 values (initial types) and factor (e) contains 41 values (final types).

During the text selection phase, phrasing was coded solely on the basis of punctuation. After the text was selected and the database recorded, phrasing was recoded to correspond to pauses. Each utterance in our database contains at least 2 phrases. There were 2,869 phrases and 11,860 syllables in total, so on average each utterance contained 2 phrases.
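For illustration of this coverage-driven selection, a greedy pass could repeatedly pick the sentence that covers the most factor values not yet seen. The data structures and scoring below are assumptions, not the authors' actual selection procedure.

```python
def greedy_select(sentences, n_target=1000):
    """Greedily pick sentences that add the most uncovered factor values.

    `sentences` maps each candidate sentence to the set of factor values it
    contains, e.g. {"sent 1": {("syllable", "ma3"), ("tone", 3), ...}, ...}.
    """
    covered, selected = set(), []
    while sentences and len(selected) < n_target:
        # Sentence whose factor values add the largest number of new items.
        best = max(sentences, key=lambda s: len(sentences[s] - covered))
        if not sentences[best] - covered:      # nothing new left to cover
            break
        selected.append(best)
        covered |= sentences.pop(best)
    return selected
```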

After the corpus was designed, each sentence was recorded by a female speaker in a professional audio recording studio.

For dynamic facial parameter recording, most current corpora are based on streams of dynamic video images ([2, 3, 5, 14]); others record static facial visemes by 3D laser scanning or 2D-to-3D image reconstruction ([6, 8]). There are two popular visual representation standards: the Facial Action Coding System (FACS, [4]) developed by Ekman and Friesen, and MPEG-4 SNHC ([11]). FACS defines 44 AUs to denote independent facial muscle actions, and more than 7,000 AU combinations have been observed. The 68 FAPs defined in MPEG-4 SNHC also represent the basic facial actions, including two high-level parameters (viseme and expression) and 66 low-level parameters, divided into 10 groups according to the regions they affect. Since AUs cannot offer a quantitative description for face animation, this paper converts the 3D FDP movements into low-level FAPs. FAPs are computed and normalized in terms of FAPUs, which are distances between the main facial feature points, so that they can be applied to different face models in a consistent way (a small illustration is given after the capture setup below). In this paper, the 15 FAPs from groups 2 and 8 listed in Table 1 are extracted to represent the lip movement.

Table 1: FAPs for visual representation in our system.
  Group 2: Open jaw, Thrust jaw, Shift jaw, Push b lip, Push t lip.
  Group 8: Lower t midlip o, Raise b midlip o, Stretch l cornerlip o, Stretch r cornerlip o, Lower t lip lm o, Lower t lip rm o, Raise b lip lm o, Raise b lip rm o, Raise l cornerlip o, Raise r cornerlip o.

To acquire FAPs accurately, a commercially available motion capture system (Motion Analysis), in which eight cameras with a 75 Hz sampling frequency are employed, is used to track the FDPs on the speaker's face, as shown in Fig. 1. The output of the motion capture system is the set of 3D trajectories of all 50 markers. Since rigid head movement cannot be avoided during the experiment, 5 markers on the head are used to compensate for the global head movement and obtain the non-rigid facial deformations.
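As a small illustration of the FAPU normalization mentioned above, the sketch below assumes the common MPEG-4 convention that one FAPU is 1/1024 of the relevant facial distance; the numbers and the FAP name are hypothetical and not taken from the paper's data.

```python
def fap_amplitude(displacement_mm, reference_distance_mm):
    """Express a feature-point displacement as an MPEG-4 FAP amplitude,
    assuming one FAPU equals 1/1024 of the relevant facial distance
    (e.g. mouth width for the lip-corner FAPs)."""
    fapu = reference_distance_mm / 1024.0
    return displacement_mm / fapu

# Hypothetical numbers: a 6 mm outward motion of the left lip corner on a
# face with a 52 mm mouth width gives stretch_l_cornerlip_o of about 118 FAPU.
print(fap_amplitude(6.0, 52.0))
```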

Fig. 1: Placement of markers on the neutral face and captured data.

2.2 Features generation

Normalization of markup points and FAP flow generation

To extract FAPs correctly, the 3D trajectories of the face markers have to be normalized into an upright position in the positive XYZ space; a least-squares fitting method for two 3D point sets is then used to extract the rigid head movements ([1]). In our work, three benchmark points (a, b, c) are placed on the forehead, as shown in Fig. 1. The normalized facial parameters are obtained from

$P'_{i,t} = R(P_{i,t} + T)$.  (1)

Here, $P_{i,t}$ is the position captured by the motion capture system, $P'_{i,t}$ is the normalized position used for generating the FAP flow, $R$ is the rotation matrix, $T$ is the translation vector, $i$ indexes the markup point, and $t$ is the measured time. $P_{a,t}$, $P_{b,t}$, and $P_{c,t}$ are the benchmark positions; the distances among them are

$D_{a,b} = \|P_{a,t} - P_{b,t}\| = \sqrt{(x_{a,t}-x_{b,t})^2 + (y_{a,t}-y_{b,t})^2 + (z_{a,t}-z_{b,t})^2}$,
$D_{a,c} = \|P_{a,t} - P_{c,t}\| = \sqrt{(x_{a,t}-x_{c,t})^2 + (y_{a,t}-y_{c,t})^2 + (z_{a,t}-z_{c,t})^2}$,
$D_{b,c} = \|P_{b,t} - P_{c,t}\| = \sqrt{(x_{b,t}-x_{c,t})^2 + (y_{b,t}-y_{c,t})^2 + (z_{b,t}-z_{c,t})^2}$.

Let us suppose that the three benchmark points lie in the plane $z = 0$ of the new normalized XYZ reference frame. Then the new positions of these points are

$P'_{a,t} = \{0, 0, 0\}^T$,
$P'_{b,t} = \{0, D_{a,b}, 0\}^T$,
$P'_{c,t} = \left\{\dfrac{D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2}{2D_{a,b}},\ \sqrt{D_{a,c}^2 - \left(\dfrac{D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2}{2D_{a,b}}\right)^2},\ 0\right\}^T$.

Since points a, b, and c keep constant distances among them, these positions need not be recalculated at every time step. With formula (1) we then obtain

$R(P_{a,t} + T) = P'_{a,t} = \{0, 0, 0\}^T$,  (2)
$R(P_{b,t} + T) = P'_{b,t} = \{0, D_{a,b}, 0\}^T$,  (3)
$R(P_{c,t} + T) = P'_{c,t} = \left\{\dfrac{D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2}{2D_{a,b}},\ \sqrt{D_{a,c}^2 - \left(\dfrac{D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2}{2D_{a,b}}\right)^2},\ 0\right\}^T$.  (4)

Then,

$T = -P_{a,t}$.  (5)

We separate $R$ into a combination of rotations about the x, y, and z axes, $R = R_x R_y R_z$, where

$R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}$, $R_y = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}$, $R_z = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

The rotation matrix can then be deduced from formulas (2), (3), and (4). The output of the motion capture system at time step $t$ is

$\{P'_{1,t}, P'_{2,t}, \ldots, P'_{n,t}\}^T \in \mathbb{R}^{3n}, \quad n = 45$.  (6)

With the normalized markup points, we obtain the FAP flow by following the method defined in [16], where the markup points are regarded as FDP parameters:

$V_t = \{\mathrm{FAP}^t_1, \mathrm{FAP}^t_2, \ldots, \mathrm{FAP}^t_{15}\}$.  (7)
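The following is a minimal sketch of this normalization for a single frame. The marker array layout and the SVD-based solution of the rotation (in the spirit of the least-squares fitting of [1]) are implementation choices; it illustrates Eqs. (1)-(5) rather than reproducing the authors' code.

```python
import numpy as np

def head_pose_normalization(P, a, b, c):
    """Normalize one frame of 3D marker positions (Eq. (1)).

    P : (n, 3) array of captured marker positions for this frame.
    a, b, c : indices of the three forehead benchmark markers.
    Returns P' = R (P + T) with the benchmark points mapped onto the
    plane z = 0 as in Eqs. (2)-(5).
    """
    # Translation: move benchmark a to the origin (Eq. (5), T = -P_a).
    T = -P[a]
    Q = P + T                                   # translated markers

    # Distances between the benchmark markers (constant over time).
    Dab = np.linalg.norm(P[a] - P[b])
    Dac = np.linalg.norm(P[a] - P[c])
    Dbc = np.linalg.norm(P[b] - P[c])

    # Target benchmark positions in the normalized frame (Eqs. (2)-(4)).
    xc = (Dab**2 + Dac**2 - Dbc**2) / (2.0 * Dab)
    tgt = np.array([[0.0, 0.0, 0.0],
                    [0.0, Dab, 0.0],
                    [xc, np.sqrt(max(Dac**2 - xc**2, 0.0)), 0.0]])

    # Least-squares rotation between the two small point sets:
    # SVD of the source-target covariance gives the optimal R.
    src = Q[[a, b, c]]
    U, _, Vt = np.linalg.svd(src.T @ tgt)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:                    # guard against a reflection
        Vt[-1] *= -1
        R = (U @ Vt).T

    return (R @ Q.T).T                          # normalized markers P'
```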

Acoustic feature vectorization

For the audio features, some researchers use text-dependent language units and analyze their corresponding static visemes; in order to reduce manual intervention, we instead use acoustic features directly. Previous work has shown that MFCCs give an alternative representation of the speech spectrum and carry the auditory information. In each frame, we compute 12 MFCC coefficients ([17]). The parameters of frame $t$ are represented as

$a_t = \{a^t_1, a^t_2, \ldots, a^t_{12}\}$.  (8)
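For illustration, a 12-dimensional MFCC stream can be computed as follows. The paper itself uses the HTK toolkit [17]; the library, sampling rate, and frame settings below are assumptions, and alignment of the audio frame rate to the 75 Hz FAP stream is not shown.

```python
import librosa  # assumed here; the paper computes MFCCs with HTK [17]

def mfcc_stream(wav_path, sr=16000, n_mfcc=12, frame_ms=25, hop_ms=10):
    """Return one 12-dimensional MFCC vector per analysis frame (Eq. (8))."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(sr * frame_ms / 1000),
                                hop_length=int(sr * hop_ms / 1000))
    return mfcc.T            # shape: (num_frames, 12); row t corresponds to a_t

# a = mfcc_stream("utterance.wav"); a[t] is the vector a_t of Eq. (8)
```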

3 Dynamic Concatenation Model

3.1 Model description

The method is based on the cost function

$\mathrm{COST} = \alpha C_a + (1 - \alpha) C_V$,  (9)

where $C_a$ is the voice spectrum distance, $C_V$ is the visual distance between two adjacent units, and the weight $\alpha$ balances the effect of the two sub-costs.

In the unit selection procedure, for each target frame unit of the novel speech we list the candidate units that have the smallest acoustic distances to it. The audio content distance measures how close a candidate unit is to the target unit, and determines whether the most appropriate audio-visual unit is selected. The voice spectrum distance also accounts for the context of the target unit; the context covers the two preceding and two following frames for vowels, and one preceding and one following frame for consonants. The voice spectrum distance is therefore defined as

$C_a = \sum_{t=T-n}^{T+n} \sum_{m=1}^{12} \omega_{t,m} \left| a_{t,m} - \hat{a}_{t,m} \right|$.  (10)

The smoothness of the face animation is also very important; it is measured by the concatenation cost, which is computed from the FAP errors between two selected frames. Here, $V(u_{t-1}, u_t)$ and $V(u_t, u_{t+1})$ denote Euclidean distances:

$C_V = V(u_{t-1}, u_t) + V(u_t, u_{t+1})$.  (11)
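Below is a minimal sketch of the dynamic-programming unit selection implied by Eqs. (9)-(11). The candidate pruning, the plain L1 acoustic distance (without the context weighting of Eq. (10)), and the data layout are assumptions rather than the paper's implementation.

```python
import numpy as np

def select_units(target_mfcc, db_mfcc, db_faps, alpha=0.5, n_candidates=10):
    """Concatenate stored units for a novel utterance.

    target_mfcc : (T, 12) MFCC frames of the input speech.
    db_mfcc     : (N, 12) MFCC frames of the training database.
    db_faps     : (N, 15) FAP vectors aligned with db_mfcc.
    Returns a (T, 15) FAP sequence minimizing alpha*C_a + (1-alpha)*C_V.
    """
    T = len(target_mfcc)
    # Acoustic target cost of every database frame w.r.t. every target frame.
    dist = np.abs(db_mfcc[None, :, :] - target_mfcc[:, None, :]).sum(-1)  # (T, N)
    cand = np.argsort(dist, axis=1)[:, :n_candidates]                     # prune
    Ca = np.take_along_axis(dist, cand, axis=1)                           # (T, K)

    # Viterbi over the candidate lattice with the visual join cost C_V.
    acc = alpha * Ca[0]
    back = np.zeros((T, n_candidates), dtype=int)
    for t in range(1, T):
        join = np.linalg.norm(db_faps[cand[t - 1]][:, None, :] -
                              db_faps[cand[t]][None, :, :], axis=-1)      # (K, K)
        total = acc[:, None] + (1 - alpha) * join + alpha * Ca[t][None, :]
        back[t] = total.argmin(axis=0)
        acc = total.min(axis=0)

    # Backtrack the best path and return its FAP vectors.
    path = [int(acc.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    path.reverse()
    return db_faps[[cand[t, k] for t, k in enumerate(path)]]
```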

3.2 Training method

Although the initial weights can be assigned manually according to the researcher's experience, a further training method is still necessary to adapt the model to different speech corpora. The whole training procedure is described as follows. Suppose the initial weight vector of Eq. (10) is $\omega^0 = \{\omega^0_1, \omega^0_2, \ldots, \omega^0_p\}$. The training targets and outputs are $\{\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_N\}$ and $\{Y_1, Y_2, \ldots, Y_N\}$. After time step $j-1$ of learning, the weight vector is $\omega^j = \{\omega^j_1, \omega^j_2, \ldots, \omega^j_p\}$, where $p$ is the length of the weight vector. The weight vector is restricted by the condition

$\sum_i \omega^j_i = 1$.  (12)

The condition ensures that the weights converge to steady values and keeps the weights balanced over the whole space. After training time step $j$, the new weight vector is obtained with

$\omega^{j+1}_i = \omega^j_i + \eta^j d^j_i$.  (13)

Here, $\eta^j$ determines the learning rate at time step $j$ and $d^j$ denotes the learning direction. Since all context parameters were normalized to the range 0 to 1, $d^j$ can be obtained from

$d^j_i = \left[1 - \dfrac{(\Delta Y_n)_{\min}}{\Delta Y_n}\right]\left[\Delta(V(a^{\min}_{n,j})) - \Delta(V(a^s_{n,j}))\right] + C$.  (14)

$\Delta Y_n$ is the FAP error between the synthesis result and the target for frame $n$:

$\Delta Y_n = \left\|\arg\max\Big[\sum_i \gamma_i V(a_{n,m,j})\Big] - \hat{Y}_n\right\|$.  (15)

$\Delta(V(a^s_{n,j}))$ is the corresponding error of the MFCC parameters between outputs and targets,

$\Delta(V(a^s_{n,j})) = \left|V(a^s_{n,j}) - \hat{V}(a_{n,j})\right|$,

and $\Delta(V(a^{\min}_{n,j}))$ is the MFCC error corresponding to the FAP candidate that gives the minimum FAP deviation between the synthesis results and the targets,

$\Delta(V(a^{\min}_{n,j})) = \left|V(a^{\min}_{n,j}) - \hat{V}(a_{n,j})\right|$.

With condition (12), we get

$\sum_i \omega^{j+1}_i = \sum_i \left(\omega^j_i + \eta^j d^j_i\right) = 1$,

and thus

$\sum_i \eta^j d^j_i = 0$.

To reduce the computing cost, $\eta^j$ is normally assigned a constant value. We then get

$C = \dfrac{1}{p}\left[\dfrac{(\Delta Y_n)_{\min}}{\Delta Y_n} - 1\right]\left[\Delta(V(a^{\min}_{n,j})) - \Delta(V(a^s_{n,j}))\right]$.  (16)

The whole automatic training procedure is then completed by combining Eqs. (13), (14), and (16). The block diagram of the approach for FAP flow prediction and training is given in Fig. 2.

Fig. 2: Diagram of FAP selection and weight assigning.
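A rough sketch of one weight-update step under the sum-to-one constraint of Eqs. (12), (13), and (16) is given below. The way the error terms are passed in, and the final clipping and renormalization, are assumptions added for illustration; this is not the authors' training code.

```python
import numpy as np

def update_weights(w, fap_err, fap_err_min, mfcc_err, mfcc_err_min, eta=0.01):
    """One training step for the context weights of Eq. (10).

    w            : (p,) current weights, summing to 1 (Eq. (12)).
    fap_err      : FAP error of the currently selected unit (Delta Y_n).
    fap_err_min  : smallest achievable FAP error over the candidates.
    mfcc_err     : (p,) per-dimension MFCC error of the selected unit.
    mfcc_err_min : (p,) per-dimension MFCC error of the best-FAP candidate.
    """
    ratio = fap_err_min / max(fap_err, 1e-8)
    diff = mfcc_err_min - mfcc_err
    # Learning direction (Eq. (14)); the constant C (Eq. (16)) is chosen
    # here so the direction sums to zero and Eq. (12) keeps holding.
    C = (ratio - 1.0) * diff.sum() / len(w)
    d = (1.0 - ratio) * diff + C
    w_new = w + eta * d                       # Eq. (13)
    w_new = np.clip(w_new, 0.0, None)         # keep weights non-negative
    return w_new / w_new.sum()                # re-impose the sum-to-one constraint
```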

Fig. 3: Synthesized FAP streams (FAP #51 and FAP #52; recorded, synthesized, and smoothed synthesized curves plotted against frames) for (a) a validation sentence and (b) a test sentence.

4 Testing and Validation

Once the cost function (with the audio content distance and the visual distance) is computed, the unit selection method is constructed. The approach finally aims to find the path in this graph that generates the minimum COST; Viterbi search is a valid method for this purpose. In this paper, three quarters of the corpus is used as training data and the rest for validation and testing, with both quantitative evaluation and qualitative observation. Fig. 3 shows the synthesized FAP stream results. Correlation coefficients (Eq. (17)) are used to represent the similarity between the outputs and the targets:

$CC = \dfrac{1}{T}\sum_{t=1}^{T}\dfrac{(\hat{f}(t) - \hat{\mu})(f(t) - \mu)}{\hat{\sigma}\sigma}$,  (17)

where $f(t)$ is the recorded FAP stream, $\hat{f}(t)$ is the synthesis result, $T$ is the total number of frames, and $\mu$, $\sigma$ ($\hat{\mu}$, $\hat{\sigma}$) are the corresponding means and standard deviations.
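Eq. (17) reads directly as code; in this small sketch the two arguments are assumed to be aligned per-frame FAP value arrays of equal length.

```python
import numpy as np

def correlation_coefficient(recorded, synthesized):
    """Eq. (17): normalized correlation between a recorded FAP stream f(t)
    and the synthesized stream f_hat(t) over all T frames."""
    f = np.asarray(recorded, dtype=float)
    f_hat = np.asarray(synthesized, dtype=float)
    mu, sigma = f.mean(), f.std()
    mu_hat, sigma_hat = f_hat.mean(), f_hat.std()
    return np.mean((f_hat - mu_hat) * (f - mu)) / (sigma_hat * sigma)

# Example: cc = correlation_coefficient(recorded_fap51, synthesized_fap51)
```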

Table 2: The average correlation coefficient for each FAP (validation and test sets) on the whole database.

Table 2 shows that the synthesized results give a good performance for the method. Although the test sentences show smaller correlation coefficients, most FAPs can still be synthesized well. It is also noticeable that the CCs of FAP #15 and FAP #60 on the validation data are even smaller than those on the test data, which may mainly be due to the speaker's habit of randomly shifting the mouth corner while speaking.

Finally, an MPEG-4 facial animation engine is used to assess our synthesized FAP streams qualitatively. The animation model is displayed at a frame rate of 75 fps. Fig. 4 shows some frames of the synthesized talking head.

Fig. 4: Some samples of the synthesized talking head.

5 Conclusion

In this paper, we have presented a speech driven face animation system. While many other researchers use images, optical flow, principal components, etc., as stored units, our system employs FAPs extracted from the 3D coordinates of FDPs to represent the visual features. This has the advantages of small storage, applicability to non-specific models, and the ability to drive both 3D mesh models and 2D images. The unit selection method for dynamic mapping from audio to visual parameters is a simplification of HMM-based approaches, but it performs well and naturally in both quantitative and qualitative evaluation. Without the parameter adjustment, under-fitting, and over-fitting problems involved in learning methods such as ANNs, it is a simple, direct, and effective way to synthesize a talking head.

References

[1] K. S. Arun, T. S. Huang and S. D. Blostein, Least-squares fitting of two 3-D point sets, IEEE Trans. Pattern Analysis and Machine Intelligence 9 (5).
[2] C. Bregler, M. Covell and M. Slaney, Video rewrite: driving visual speech with audio, in: ACM SIGGRAPH.
[3] E. Cosatto, G. Potamianos and H. P. Graf, Audio-visual unit selection for the synthesis of photorealistic talking-heads, in: IEEE International Conference on Multimedia and Expo (ICME), 2000.
[4] P. Ekman and W. V. Friesen, Facial Action Coding System, Consulting Psychologists Press, Palo Alto, Calif.
[5] T. Ezzat and T. Poggio, MikeTalk: A talking facial display based on morphing visemes, in: Proc. Computer Animation Conference, Philadelphia, USA.
[6] R. Gutierrez-Osuna, P. K. Kakumanu, A. Esposito, O. N. Garcia, A. Bojorquez, J. L. Castillo and I. Rudomin, Speech-driven facial animation with realistic dynamics, IEEE Trans. Multimedia 7 (1).
[7] A. Hunt and A. Black, Unit selection in a concatenative speech synthesis system using a large speech database, in: ICASSP, Vol. 1, 1996.
[8] Pengyu Hong, Zhen Wen and Thomas S. Huang, Real-time speech-driven face animation with expressions using neural networks, IEEE Trans. on Neural Networks 13 (4).
[9] S. Kshirsagar, T. Molet and N. M. Thalmann, Principal components of expressive speech animation, in: Proc. of Computer Graphics International.
[10] Dominic W. Massaro, Jonas Beskow, Michael M. Cohen, Christopher L. Fry and Tony Rodriguez, Picture my voice: audio to visual speech synthesis using artificial neural networks, in: Proceedings of AVSP 99, Santa Cruz, CA, August 1999.
[11] A. Murat Tekalp and Jörn Ostermann, Face and 2-D mesh animation in MPEG-4, Signal Processing: Image Communication 15 (2000).
[12] Ram R. Rao, Tsuhan Chen and Russell M. Mersereau, Audio-to-visual conversion for multimedia communication, IEEE Trans. Industrial Electronics 45 (1).
[13] Jian-Qing Wang, Ka-Ho Wong, Pheng-Ann Heng, H. M. Meng and Tien-Tsin Wong, A real-time Cantonese text-to-audiovisual speech synthesizer, in: Proceedings of ICASSP 04, Vol. 1, 2004.
[14] Ashish Verma, L. Venkata Subramaniam, Nitendra Rajput, Chalapathy Neti and Tanveer A. Faruquie, Animating expressive faces across languages, IEEE Trans. Multimedia 6 (6).
[15] E. Yamamoto, S. Nakamura and K. Shikano, Lip movement synthesis from speech based on hidden Markov models, Speech Communication 26 (1998).
[16] Overview of the MPEG-4 Standard, 4.htm.
[17] The Hidden Markov Model Toolkit (HTK).
