Speech Driven Face Animation Based on Dynamic Concatenation Model
Journal of Information & Computational Science 3: 4 (2006)

Speech Driven Face Animation Based on Dynamic Concatenation Model

Jianhua Tao, Panrong Yin
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Received 1 May 2006; revised 20 July 2006

Abstract

In this paper, we design and develop a speech driven face animation system based on a dynamic concatenation model. The face animation is synthesized by unit concatenation and is synchronized with real speech. Units are selected according to cost functions that measure the voice spectrum distance between training and target units; the visual distance between two adjacent training units is also used to improve the mapping results. Finally, the Viterbi method is used to find the best face animation sequence. Experimental results show that the synthesized lip movement has a good, natural quality.

Keywords: Talking head; Face animation; Speech driven

1 Introduction

Speech driven face animation systems ([6, 8, 10, 15]) have been developed to improve human-machine interfaces such as virtual announcers, readers, and mobile message readers. They also help hearing-impaired people, or people in noisy environments, to communicate with others. Unlike text driven face animation ([3, 5]), which can only use utterance context information, speech driven face animation is driven by real speech carrying both spectral and prosodic information, and is therefore able to offer more expressive and vivid output. A speech driven face animation system is usually established in four steps, as defined in Picture My Voice ([10]): label an audio-visual database; choose representations of both the auditory and visual speech; find a method to describe the relationship between the two representations; synthesize the visible speech given the auditory speech.
Many methods have been applied to audio-visual mapping, such as TDNNs ([10]), MLPs ([8]), KNN ([6]), HMMs ([15]), GMMs, VQ ([15]), and rule-based methods ([5, 9, 14]). They can be classified into two types: through speech recognition or not through speech recognition.

Project supported by the National Natural Science Foundation of China (No ). Corresponding author. E-mail address: jhtao@nlpr.ia.ac.cn (Jianhua Tao). Copyright © 2006 Binary Information Press, December 2006
The first approach divides the speech signal into linguistic units such as phonemes, syllables, or words, and maps them directly to corresponding lip shapes; the lip movement sequence is then obtained through a smoothing algorithm ([15]). The second approach analyzes the bimodal data with a statistical learning model and finds the mapping between continuous acoustic features and facial parameters, so that the face animation can be driven directly by novel speech. For example, Picture My Voice ([10]) trained an ANN to learn the mapping from LPCs to face animation parameters, using the current time step together with 5 backward and 5 forward time steps as input to model the context.

In this paper, we propose an acoustic-context-based facial unit selection and dynamic concatenation model. The method constructs the data stream by concatenating stored data units from the training database, which makes the synthesis result natural and realistic. Furthermore, a statistical method has been developed for training the model from a large corpus. The training and testing corpus is based on 3D coordinates of FDPs, which are compatible with the MPEG-4 standard ([11]) and have been recorded with a motion capture system. Since an animated talking head is required to be lifelike and individual, FAPs are extracted as visual features so that our synthesized animation control parameters can be applied to different 3D mesh models.

The paper is organized as follows: Section 1 gives a general description, Section 2 introduces the database setup and audio-visual feature extraction, Section 3 focuses on the model description and the training method, Section 4 presents experiments and analysis, and Section 5 concludes the work.
2 Corpus

2.1 Corpus design and recording

To create the model, a corpus containing 1,000 sentences was produced by selecting appropriate data from 10 years of the People's Daily via a greedy algorithm ([6]). The following factors form the focus of our investigation:

(a) Identity of the current syllable;
(b) Identity of the current tone;
(c) Identity of the final in the previous syllable;
(d) Identity of the previous tone;
(e) Identity of the initial in the following syllable;
(f) Identity of the following tone.

Factor (a) has 417 values, corresponding to the 417 syllable types in Chinese. Factors (b), (d), and (f) each have 5 values, corresponding to the 4 full tones and the neutral tone. Factor (c) contains 41 values (final types) and factor (e) contains 20 values (initial types).

During the text selection phase, phrasing was coded solely on the basis of punctuation. After the text was selected and the database recorded, phrasing was recoded to correspond to pauses. Each utterance in our database contains at least 2 phrases. There were 2,869 phrases and 11,860 syllables in total, so
on average each utterance contained 2 phrases. After the corpus was designed, each sentence was recorded by a female speaker in a professional audio recording studio.

For dynamic facial parameter recording, most current corpora are based on streams of dynamic video images ([2, 3, 5, 14]); some others recorded static facial visemes by 3D laser scans or 2D-to-3D image reconstruction ([6, 8]). There are two popular visual representation standards: the Facial Action Coding System ([4]) developed by Ekman and Friesen, and MPEG-4 SNHC ([11]). FACS defines 44 AUs to denote independent facial muscle actions; more than 7,000 AU combinations have been observed. The 68 FAPs defined in MPEG-4 SNHC also represent the basic facial actions, comprising two high-level parameters, viseme and expression, and 66 low-level parameters; they are divided into 10 groups according to the facial regions they affect. Since AUs cannot offer a quantitative description for face animation, this paper converts 3D FDP movements into low-level FAPs. FAPs are computed and normalized in terms of FAPUs, which are distances between the main facial feature points, so they can be applied to different face models in a consistent way. In this paper, 15 FAPs (Table 1) from group 2 and group 8 are extracted to represent the lip movement.
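As a concrete illustration of the FAPU normalization just described, a marker displacement can be expressed in FAP units by dividing by the relevant FAPU. The helper below is our own minimal sketch, assuming MPEG-4's convention that a FAPU is 1/1024 of a defining facial distance; the function name and arguments are illustrative, not the authors' code.

```python
def displacement_to_fap(displacement, feature_distance):
    """Convert a feature-point displacement to a FAP value.

    displacement:     marker displacement (any length unit, consistent below).
    feature_distance: the facial distance defining the FAPU, e.g. mouth
                      width; one FAPU = feature_distance / 1024 under the
                      MPEG-4 convention assumed here.
    """
    fapu = feature_distance / 1024.0
    return displacement / fapu
```

Because the FAP value is a ratio of two distances measured on the same face, it transfers consistently to face models of different proportions.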
Table 1: FAPs for visual representation in our system.

Group  FAP name               Group  FAP name
2      Open jaw               8      Stretch l cornerlip o
2      Thrust jaw             8      Lower t lip lm o
2      Shift jaw              8      Lower t lip rm o
2      Push b lip             8      Raise b lip lm o
2      Push t lip             8      Raise b lip rm o
8      Lower t midlip o       8      Raise l cornerlip o
8      Raise b midlip o       8      Raise r cornerlip o
8      Stretch r cornerlip o

To acquire FAPs accurately, a commercially available motion capture system (Motion Analysis), employing eight cameras with a 75 Hz sampling frequency, is used to track the FDPs on the speaker's face, as shown in Fig. 1. The output of the motion capture system is the 3D trajectories of all 50 markers. During the experiment, rigid head movement cannot be avoided, so 5 markers on the head are used to compensate for the global head movement and recover the non-rigid facial deformations.

2.2 Features generation

Normalization of markup points and FAPs flow generation

To extract FAPs correctly, the 3D trajectories of the face markers have to be normalized into an upright position in positive XYZ space; a least-squares fitting method for two 3D point sets is then used to extract the rigid head movements ([1]). In our work, three benchmark points (a, b, c) are set on the forehead, as shown in Fig. 1. The normalized facial parameters are obtained from the following formula:

P'_{i,t} = R(P_{i,t} + T).  (1)
Fig. 1: Placement of markers on a neutral face, and captured data.

Here, P_{i,t} is the captured position from the motion capture system, P'_{i,t} is the normalized position used for generating the FAPs flow, R is the rotation matrix, T is the translation vector, i indexes the markup point, and t is the measured time. P_{a,t}, P_{b,t}, and P_{c,t} are the benchmark positions; the distances among them are

D_{a,b} = |P_{a,t} - P_{b,t}| = \sqrt{(x_{a,t}-x_{b,t})^2 + (y_{a,t}-y_{b,t})^2 + (z_{a,t}-z_{b,t})^2},
D_{a,c} = |P_{a,t} - P_{c,t}| = \sqrt{(x_{a,t}-x_{c,t})^2 + (y_{a,t}-y_{c,t})^2 + (z_{a,t}-z_{c,t})^2},
D_{b,c} = |P_{b,t} - P_{c,t}| = \sqrt{(x_{b,t}-x_{c,t})^2 + (y_{b,t}-y_{c,t})^2 + (z_{b,t}-z_{c,t})^2}.

Suppose the three benchmark points lie in the plane z = 0 of the new normalized XYZ reference frame. Then the new positions of these points are

P'_{a,t} = (0, 0, 0)^T,
P'_{b,t} = (0, D_{a,b}, 0)^T,
P'_{c,t} = ( (D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2) / (2 D_{a,b}), \sqrt{ D_{a,c}^2 - ( (D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2) / (2 D_{a,b}) )^2 }, 0 )^T.

Since points a, b, c keep constant distances among them, these positions need not be recalculated at each time step. With formula (1), we then get

R(P_{a,t} + T) = P'_{a,t} = (0, 0, 0)^T,  (2)
R(P_{b,t} + T) = P'_{b,t} = (0, D_{a,b}, 0)^T,  (3)
R(P_{c,t} + T) = P'_{c,t} = ( (D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2) / (2 D_{a,b}), \sqrt{ D_{a,c}^2 - ( (D_{a,b}^2 + D_{a,c}^2 - D_{b,c}^2) / (2 D_{a,b}) )^2 }, 0 )^T.  (4)
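Constraints (2)-(4) determine R and T from the three forehead benchmarks alone. A minimal numerical sketch of this rigid alignment (our own NumPy illustration, not the authors' implementation) builds the rotation from an orthonormal frame attached to the benchmarks:

```python
import numpy as np

def benchmark_transform(pa, pb, pc):
    """Rigid transform (R, T) sending benchmark a to the origin and b onto
    the +Y axis, with a, b, c lying in the z = 0 plane, as in Eqs. (2)-(4)."""
    T = -pa                                   # so that R(P_a + T) = (0,0,0)^T
    y = (pb - pa) / np.linalg.norm(pb - pa)   # direction mapped to (0, 1, 0)
    z = np.cross(pb - pa, pc - pa)
    z = z / np.linalg.norm(z)                 # normal of the benchmark plane
    x = np.cross(y, z)                        # completes a right-handed frame
    R = np.vstack([x, y, z])                  # rows rotate the frame onto XYZ
    return R, T

def normalize_point(p, R, T):
    """Apply the normalization of Eq. (1): P'_{i,t} = R (P_{i,t} + T)."""
    return R @ (p + T)
```

Applying `normalize_point` to b itself returns (0, D_ab, 0), and any marker rigidly attached to the head has its global motion removed, leaving only the non-rigid facial deformation.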
From Eq. (2),

T = -P_{a,t}.  (5)

We separate R into a combination of rotations about the x, y and z axes, R = R_x R_y R_z, where

R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix},
R_y = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix},
R_z = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.

The rotation matrix can then be deduced from formulas (2), (3) and (4). The output of the motion capture system at time step t is

\{P'_{1,t}, P'_{2,t}, \ldots, P'_{n,t}\}^T \in R^{3n},  n = 45.  (6)

With the normalized markup points, we obtain the FAPs flow following the method defined in [16], where the markup points are treated as FDP parameters:

V_t = \{FAP_1^t, FAP_2^t, \ldots, FAP_{15}^t\}.  (7)

Acoustic feature vectorization

For audio features, some researchers used text-dependent language units to analyze their corresponding static visemes; to reduce manual intervention, we instead turned to acoustic features directly. Previous work has shown that MFCCs give an effective alternative representation of the speech spectrum and capture auditory information. In each frame, we computed 12 MFCC coefficients ([17]). The parameters in frame t can be represented as

a_t = \{a_1^t, a_2^t, \ldots, a_{12}^t\}.  (8)

3 Dynamic Concatenation Model

3.1 Model description

The method is based on the cost function

COST = \alpha C_a + (1 - \alpha) C_V,  (9)
where C_a is the voice spectrum distance, C_V is the visual distance between two adjacent units, and the weight \alpha balances the effect of the two sub-costs.

In the unit selection procedure, for each target frame unit of the novel speech we list the candidates with the smallest acoustic distances to it. The audio content distance measures how close a candidate unit is to the target unit, and determines whether the most appropriate audio-visual unit is selected. The voice spectrum distance also accounts for the context of the target unit: the context covers the former and latter two frames for vowels, and the former and latter frame for consonants. The voice spectrum distance is thus defined by

C_a = \sum_{t=T-n}^{T+n} \sum_{m=1}^{12} \omega_{t,m} |a_{t,m} - \hat{a}_{t,m}|.  (10)

The smoothness of the face animation is also very important; it is measured by the concatenation cost, calculated from the FAP errors between two selected frames. Here, V(u_{t-1}, u_t) and V(u_t, u_{t+1}) denote Euclidean distances:

C_V = V(u_{t-1}, u_t) + V(u_t, u_{t+1}).  (11)

3.2 Training method

Though the initial weights can be assigned manually according to the researcher's experience, a training method is still necessary to adapt the model to different speech corpora. The whole training procedure is described as follows. Suppose the initial weight vector of Eq. (10) is \omega^0 = \{\omega_1^0, \omega_2^0, \ldots, \omega_p^0\}. The training targets and outputs are \{\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_N\} and \{Y_1, Y_2, \ldots, Y_N\}. After time step j-1 of learning, the weight vector is \omega^j = \{\omega_1^j, \omega_2^j, \ldots, \omega_p^j\}, where p is the length of the weight vector. The weight vector is restricted by the condition

\sum_i \omega_i^j = 1.  (12)

This condition ensures that the weights converge to steady values and keeps the weights balanced over the whole space. After training time step j, the new weight vector is obtained with

\omega_i^{j+1} = \omega_i^j + \eta^j d_i^j.  (13)

Here, \eta^j determines the learning rate at time step j and d^j denotes the learning direction. Since all context parameters were normalized to the range 0 to 1, d^j can be obtained from

d_i^j = [1 - (\Delta Y_n)_{min} / \Delta Y_n][\Delta(V(a_{n,j}^{min})) - \Delta(V(a_{n,j}^s))] + C.  (14)
Fig. 2: Diagram of FAPs selection and weight assigning.

\Delta Y_n is the FAPs error between the synthesis result and the target of frame n,

\Delta Y_n = | \arg\max [\sum_i \gamma_i V(a_{n,m,j})] - \hat{Y}_n |.  (15)

\Delta(V(a_{n,j}^s)) is the corresponding error of the MFCC parameters between outputs and targets,

\Delta(V(a_{n,j}^s)) = |V(a_{n,j}^s) - \hat{V}(a_{n,j})|,

and \Delta(V(a_{n,j}^{min})) is the error of the MFCC parameters corresponding to the FAPs candidate that gives the minimum FAPs deviation between synthesis results and targets,

\Delta(V(a_{n,j}^{min})) = |V(a_{n,j}^{min}) - \hat{V}(a_{n,j})|.

With condition (12), we get

\sum_i \omega_i^{j+1} = \sum_i (\omega_i^j + \eta^j d_i^j) = 1.

Thus,

\sum_i \eta^j d_i^j = 0.

To reduce the computing cost, \eta^j is normally assigned a constant value. We get

C = \frac{1}{p} [ (\Delta Y_n)_{min} / \Delta Y_n - 1 ][ \Delta(V(a_{n,j}^{min})) - \Delta(V(a_{n,j}^s)) ].  (16)

The whole automatic training procedure is then completed by combining Eqs. (13), (14) and (16). A block diagram of the approach for FAPs flow prediction and training is given in Fig. 2.
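The role of the constant C in Eq. (16) is to keep the updated weights summing to one, as required by Eq. (12). The sketch below is an illustrative reconstruction of one update step (Eq. 13); the learning direction d is taken as given rather than computed from Eqs. (14)-(15), and centering d plays the role of adding C.

```python
import numpy as np

def update_weights(w, d, eta=0.05):
    """One training step, Eq. (13), under the constraint sum(w) = 1.

    Subtracting the mean of d stands in for the constant C of Eq. (16):
    it makes sum(eta * d) = 0, so the weight sum is preserved exactly.
    """
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    d = d - d.mean()          # enforce sum(d) = 0
    return w + eta * d
```

With a constant learning rate, repeated updates move the weights along the (centered) learning direction while staying on the sum-to-one constraint surface.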
Fig. 3: Synthesized FAP streams of FAP #51 and FAP #52 for a sentence sample, from (a) a validation sentence and (b) a test sentence. Each panel plots the recorded, synthesized, and smoothed synthesized FAP streams (FAP value against frames).

4 Testing and Validation

Once the cost function (with audio content distance and visual distance) is computed, the unit selection method is constructed. The approach aims to find the path in this graph that generates the minimum COST, and Viterbi search is a valid method for this application. In this paper, three fourths of the corpus is used as training data, and the rest is used for validation and testing, with both quantitative evaluation and qualitative observation. Fig. 3 shows the synthesized FAP stream results. Correlation coefficients (Eq. (17)) are used to represent the similarity between the outputs and the targets:

CC = \frac{1}{T} \sum_{t=1}^{T} \frac{(\hat{f}(t) - \hat{\mu})(f(t) - \mu)}{\hat{\sigma} \sigma},  (17)

where f(t) is the recorded FAP stream, \hat{f}(t) is the synthesis result, T is the total number of frames in the database, and \mu and \sigma (\hat{\mu} and \hat{\sigma}) are the corresponding means and standard deviations.
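The Viterbi search over the candidate lattice can be sketched as follows. This is an illustrative dynamic-programming reconstruction, not the authors' code: the per-candidate acoustic costs (Eq. 10) are assumed precomputed, and the visual transition cost (Eq. 11 contributes one adjacent-pair term per transition) is supplied as a function.

```python
import numpy as np

def viterbi_select(acoustic_costs, visual_cost, alpha=0.5):
    """Find the candidate sequence minimizing the total COST of Eq. (9).

    acoustic_costs: acoustic_costs[t][k] = C_a of candidate k at frame t.
    visual_cost:    function (t, j, k) -> visual distance between candidate
                    j at frame t-1 and candidate k at frame t.
    """
    T = len(acoustic_costs)
    D = [alpha * np.asarray(acoustic_costs[0], dtype=float)]  # path costs
    back = [None]                                             # backpointers
    for t in range(1, T):
        prev = D[-1]
        cur = np.empty(len(acoustic_costs[t]))
        bp = np.empty(len(acoustic_costs[t]), dtype=int)
        for k in range(len(acoustic_costs[t])):
            # Best predecessor for candidate k at frame t.
            trans = [prev[j] + (1 - alpha) * visual_cost(t, j, k)
                     for j in range(len(prev))]
            bp[k] = int(np.argmin(trans))
            cur[k] = trans[bp[k]] + alpha * acoustic_costs[t][k]
        D.append(cur)
        back.append(bp)
    # Trace back the minimum-cost path.
    path = [int(np.argmin(D[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Raising `alpha` favors acoustically close candidates; lowering it favors smooth FAP trajectories, which is exactly the trade-off the weight in Eq. (9) controls.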
Table 2: The average correlation coefficient for each FAP on the whole database (columns: FAP No., Validation, Test).

From Table 2, the synthesized results show the good performance of the method. Although the test sentences show smaller correlation coefficients, they can still synthesize most of the FAPs. It is also noticeable that the CCs of FAP #15 and #60 on the validation data are even smaller than those on the test data, which may be mainly due to the speaker's habit of speaking with a randomly shifting mouth corner.

Finally, an MPEG-4 facial animation engine is used to assess our synthesized FAP streams qualitatively. The animation model displays at a frame rate of 75 fps. Fig. 4 shows some frames of the synthesized talking head.

Fig. 4: Some samples of the synthesized talking head.

5 Conclusion

In this paper, we have presented a speech driven face animation system. While many other researchers used images, optical flows, principal components, etc. as stored units, our system employs FAPs extracted from the 3D coordinates of FDPs to represent the visual features. This has the advantages of small storage, applicability to non-specific models, and the ability to drive both 3D mesh models and 2D images. The unit selection method for dynamic mapping from audio to visual parameters is a simplification of HMM-based approaches, but it performs well and naturally in both quantitative and qualitative evaluation. Without the parameter adjustment, under-fitting, and over-fitting problems involved in learning methods such as ANNs, it is a simple, direct and effective way to synthesize a talking head.
References

[1] K. S. Arun, T. S. Huang and S. D. Blostein, Least-squares fitting of two 3-D point sets, IEEE Trans. Pattern Analysis and Machine Intelligence 9 (5).
[2] C. Bregler, M. Covell and M. Slaney, Video rewrite: driving visual speech with audio, in: ACM SIGGRAPH.
[3] E. Cosatto, G. Potamianos and H. P. Graf, Audio-visual unit selection for the synthesis of photo-realistic talking-heads, in: IEEE International Conference on Multimedia and Expo (ICME), 2000.
[4] P. Ekman and W. V. Friesen, Facial Action Coding System, Consulting Psychologists Press, Palo Alto, Calif.
[5] T. Ezzat and T. Poggio, MikeTalk: a talking facial display based on morphing visemes, in: Proc. Computer Animation Conference, Philadelphia, USA.
[6] R. Gutierrez-Osuna, P. K. Kakumanu, A. Esposito, O. N. Garcia, A. Bojorquez, J. L. Castillo and I. Rudomin, Speech-driven facial animation with realistic dynamics, IEEE Trans. Multimedia 7 (1).
[7] A. Hunt and A. Black, Unit selection in a concatenative speech synthesis system using a large speech database, in: ICASSP, Vol. 1, 1996.
[8] Pengyu Hong, Zhen Wen and Thomas S. Huang, Real-time speech-driven face animation with expressions using neural networks, IEEE Trans. on Neural Networks 13 (4).
[9] S. Kshirsagar, T. Molet and N. M. Thalmann, Principal components of expressive speech animation, in: Proc. of Computer Graphics International.
[10] Dominic W. Massaro, Jonas Beskow, Michael M. Cohen, Christopher L. Fry and Tony Rodriguez, Picture my voice: audio to visual speech synthesis using artificial neural networks, in: Proceedings of AVSP'99, Santa Cruz, CA, August 1999.
[11] A. Murat Tekalp and Jörn Ostermann, Face and 2-D mesh animation in MPEG-4, Signal Processing: Image Communication 15 (2000).
[12] Ram R. Rao, Tsuhan Chen and Russell M. Mersereau, Audio-to-visual conversion for multimedia communication, IEEE Trans. Industrial Electronics 45 (1).
[13] Jian-Qing Wang, Ka-Ho Wong, Pheng-Ann Heng, H. M. Meng and Tien-Tsin Wong, A real-time Cantonese text-to-audiovisual speech synthesizer, in: Proc. ICASSP'04, Vol. 1, 2004.
[14] Ashish Verma, L. Venkata Subramaniam, Nitendra Rajput, Chalapathy Neti and Tanveer A. Faruquie, Animating expressive faces across languages, IEEE Trans. Multimedia 6 (6).
[15] E. Yamamoto, S. Nakamura and K. Shikano, Lip movement synthesis from speech based on hidden Markov models, Speech Communication 26 (1998).
[16] Overview of the MPEG-4 Standard, 4.htm.
[17] The Hidden Markov Model Toolkit (HTK).
More informationLOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH RECOGNITION
LOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH Andrés Vallés Carboneras, Mihai Gurban +, and Jean-Philippe Thiran + + Signal Processing Institute, E.T.S.I. de Telecomunicación Ecole Polytechnique
More informationA Robust Two Feature Points Based Depth Estimation Method 1)
Vol.31, No.5 ACTA AUTOMATICA SINICA September, 2005 A Robust Two Feature Points Based Depth Estimation Method 1) ZHONG Zhi-Guang YI Jian-Qiang ZHAO Dong-Bin (Laboratory of Complex Systems and Intelligence
More informationAn Introduction to 3D Computer Graphics, Stereoscopic Image, and Animation in OpenGL and C/C++ Fore June
An Introduction to 3D Computer Graphics, Stereoscopic Image, and Animation in OpenGL and C/C++ Fore June Appendix B Video Compression using Graphics Models B.1 Introduction In the past two decades, a new
More informationRecently, research in creating friendly human
For Duplication Expression and Impression Shigeo Morishima Recently, research in creating friendly human interfaces has flourished. Such interfaces provide smooth communication between a computer and a
More informationComputer Animation Visualization. Lecture 5. Facial animation
Computer Animation Visualization Lecture 5 Facial animation Taku Komura Facial Animation The face is deformable Need to decide how all the vertices on the surface shall move Manually create them Muscle-based
More informationA LOW-COST STEREOVISION BASED SYSTEM FOR ACQUISITION OF VISIBLE ARTICULATORY DATA
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 2005 (AVSP 05) British Columbia, Canada July 24-27, 2005 A LOW-COST STEREOVISION BASED SYSTEM FOR ACQUISITION OF VISIBLE
More informationCS 231. Deformation simulation (and faces)
CS 231 Deformation simulation (and faces) Deformation BODY Simulation Discretization Spring-mass models difficult to model continuum properties Simple & fast to implement and understand Finite Element
More informationMulti-Modal Human Verification Using Face and Speech
22 Multi-Modal Human Verification Using Face and Speech Changhan Park 1 and Joonki Paik 2 1 Advanced Technology R&D Center, Samsung Thales Co., Ltd., 2 Graduate School of Advanced Imaging Science, Multimedia,
More informationShape and Expression Space of Real istic Human Faces
8 5 2006 5 Vol8 No5 JOURNAL OF COMPU TER2AIDED DESIGN & COMPU TER GRAPHICS May 2006 ( 0087) (peiyuru @cis. pku. edu. cn) : Canny ; ; ; TP394 Shape and Expression Space of Real istic Human Faces Pei Yuru
More informationHuman body animation. Computer Animation. Human Body Animation. Skeletal Animation
Computer Animation Aitor Rovira March 2010 Human body animation Based on slides by Marco Gillies Human Body Animation Skeletal Animation Skeletal Animation (FK, IK) Motion Capture Motion Editing (retargeting,
More informationRealistic Face Animation for Audiovisual Speech Applications: A Densification Approach Driven by Sparse Stereo Meshes
Realistic Face Animation for Audiovisual Speech Applications: A Densification Approach Driven by Sparse Stereo Meshes Marie-Odile Berger, Jonathan Ponroy, Brigitte Wrobel-Dautcourt To cite this version:
More informationSynthesizing Realistic Facial Expressions from Photographs
Synthesizing Realistic Facial Expressions from Photographs 1998 F. Pighin, J Hecker, D. Lischinskiy, R. Szeliskiz and D. H. Salesin University of Washington, The Hebrew University Microsoft Research 1
More informationTopics for thesis. Automatic Speech-based Emotion Recognition
Topics for thesis Bachelor: Automatic Speech-based Emotion Recognition Emotion recognition is an important part of Human-Computer Interaction (HCI). It has various applications in industrial and commercial
More informationSpeech-driven Face Synthesis from 3D Video
Speech-driven Face Synthesis from 3D Video Ioannis A. Ypsilos, Adrian Hilton, Aseel Turkmani and Philip J. B. Jackson Centre for Vision Speech and Signal Processing University of Surrey, Guildford, GU2
More informationAAM Based Facial Feature Tracking with Kinect
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 3 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0046 AAM Based Facial Feature Tracking
More informationFACIAL ANIMATION WITH MOTION CAPTURE BASED ON SURFACE BLENDING
FACIAL ANIMATION WITH MOTION CAPTURE BASED ON SURFACE BLENDING Lijia Zhu and Won-Sook Lee School of Information Technology and Engineering, University of Ottawa 800 King Edward Ave., Ottawa, Ontario, Canada,
More informationUsing the rear projection of the Socibot Desktop robot for creation of applications with facial expressions
IOP Conference Series: Materials Science and Engineering PAPER OPEN ACCESS Using the rear projection of the Socibot Desktop robot for creation of applications with facial expressions To cite this article:
More informationUnsupervised Learning for Speech Motion Editing
Eurographics/SIGGRAPH Symposium on Computer Animation (2003) D. Breen, M. Lin (Editors) Unsupervised Learning for Speech Motion Editing Yong Cao 1,2 Petros Faloutsos 1 Frédéric Pighin 2 1 University of
More informationLearning and Synthesizing MPEG-4 Compatible 3-D Face Animation From Video Sequence
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 11, NOVEMBER 2003 1119 Learning and Synthesizing MPEG-4 Compatible 3-D Face Animation From Video Sequence Wen Gao, Member, IEEE,
More informationSynthesizing Speech Animation By Learning Compact Speech Co-Articulation Models
Synthesizing Speech Animation By Learning Compact Speech Co-Articulation Models Zhigang Deng J.P. Lewis Ulrich Neumann Computer Graphics and Immersive Technologies Lab Department of Computer Science, Integrated
More informationAudio-to-Visual Speech Conversion using Deep Neural Networks
INTERSPEECH 216 September 8 12, 216, San Francisco, USA Audio-to-Visual Speech Conversion using Deep Neural Networks Sarah Taylor 1, Akihiro Kato 1, Iain Matthews 2 and Ben Milner 1 1 University of East
More informationFlexible Calibration of a Portable Structured Light System through Surface Plane
Vol. 34, No. 11 ACTA AUTOMATICA SINICA November, 2008 Flexible Calibration of a Portable Structured Light System through Surface Plane GAO Wei 1 WANG Liang 1 HU Zhan-Yi 1 Abstract For a portable structured
More informationA Facial Expression Imitation System in Human Robot Interaction
A Facial Expression Imitation System in Human Robot Interaction S. S. Ge, C. Wang, C. C. Hang Abstract In this paper, we propose an interactive system for reconstructing human facial expression. In the
More informationHUMAN FACE AND FACIAL FEATURE TRACKING BY USING GEOMETRIC AND TEXTURE MODELS
Journal of ELECTRICAL ENGINEERING, VOL. 59, NO. 5, 2008, 266 271 HUMAN FACE AND FACIAL FEATURE TRACKING BY USING GEOMETRIC AND TEXTURE MODELS Ján Mihalík Miroslav Kasár This paper deals with the tracking
More informationAnimated Talking Head With Personalized 3D Head Model
Animated Talking Head With Personalized 3D Head Model L.S.Chen, T.S.Huang - Beckman Institute & CSL University of Illinois, Urbana, IL 61801, USA; lchen@ifp.uiuc.edu Jörn Ostermann, AT&T Labs-Research,
More informationA Revie w of Text2to2Visual Speech Synthesis
ISSN 0002239ΠCN 2777ΠTP Journal of Computer Research and Development 43 () : 45 52 2006-2 2 ( 00083) 2 ( 00080) (wangzhiming @tsinghuaorgcn) A Revie w of Text2to2Visual Speech Synthesis Wang Zhiming 2
More informationSpeech Driven Facial Animation
Speech Driven Facial Animation P. Kakumanu R. Gutierrez-Osuna A. Esposito R. Bryll A. Goshtasby O.. Garcia 2 Department of Computer Science and Engineering Wright State University 364 Colonel Glenn Hwy
More informationJoint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 12, NO. 3, MAY 2004 265 Joint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding Laurent Girin
More informationStory Unit Segmentation with Friendly Acoustic Perception *
Story Unit Segmentation with Friendly Acoustic Perception * Longchuan Yan 1,3, Jun Du 2, Qingming Huang 3, and Shuqiang Jiang 1 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing,
More information3D Active Appearance Model for Aligning Faces in 2D Images
3D Active Appearance Model for Aligning Faces in 2D Images Chun-Wei Chen and Chieh-Chih Wang Abstract Perceiving human faces is one of the most important functions for human robot interaction. The active
More informationMATLAB Toolbox for Audiovisual Speech Processing
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 27 (AVSP27) Hilvarenbeek, The Netherlands August 31 - September 3, 27 MATLAB Toolbox for Audiovisual Speech Processing
More informationCS 231. Deformation simulation (and faces)
CS 231 Deformation simulation (and faces) 1 Cloth Simulation deformable surface model Represent cloth model as a triangular or rectangular grid Points of finite mass as vertices Forces or energies of points
More informationText2Video: Text-Driven Facial Animation using MPEG-4
Text2Video: Text-Driven Facial Animation using MPEG-4 J. Rurainsky and P. Eisert Fraunhofer Institute for Telecommunications - Heinrich-Hertz Institute Image Processing Department D-10587 Berlin, Germany
More informationRobust color segmentation algorithms in illumination variation conditions
286 CHINESE OPTICS LETTERS / Vol. 8, No. / March 10, 2010 Robust color segmentation algorithms in illumination variation conditions Jinhui Lan ( ) and Kai Shen ( Department of Measurement and Control Technologies,
More informationShort Survey on Static Hand Gesture Recognition
Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of
More informationVoice-driven animation
MERL A MITSUBISHI ELECTRIC RESEARCH LABORATORY http://www.merl.com Voice-driven animation MatthewBrandandKenShan TR-98-20 September 1998 Abstract We introduce a method for learning a mapping between signals,
More informationThe accuracy and robustness of motion
Orthogonal-Blendshape-Based Editing System for Facial Motion Capture Data Qing Li and Zhigang Deng University of Houston The accuracy and robustness of motion capture has made it a popular technique for
More informationVideo Inter-frame Forgery Identification Based on Optical Flow Consistency
Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong
More informationJoint Learning of Speech-Driven Facial Motion with Bidirectional Long-Short Term Memory
Joint Learning of Speech-Driven Facial Motion with Bidirectional Long-Short Term Memory Najmeh Sadoughi and Carlos Busso Multimodal Signal Processing (MSP) Laboratory, Department of Electrical and Computer
More informationDetection of Mouth Movements and Its Applications to Cross-Modal Analysis of Planning Meetings
2009 International Conference on Multimedia Information Networking and Security Detection of Mouth Movements and Its Applications to Cross-Modal Analysis of Planning Meetings Yingen Xiong Nokia Research
More informationAudio-visual interaction in sparse representation features for noise robust audio-visual speech recognition
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Audio-visual interaction in sparse representation features for
More informationFast Facial Motion Cloning in MPEG-4
Fast Facial Motion Cloning in MPEG-4 Marco Fratarcangeli and Marco Schaerf Department of Computer and Systems Science University of Rome La Sapienza frat,schaerf@dis.uniroma1.it Abstract Facial Motion
More informationCurriculum Vitae of Zhen Wen
Curriculum Vitae of Zhen Wen CONTACT INFORMATION RESEARCH INTERESTS EDUCATION 4359 Beckman Institute Voice: (217) 244-4472 (O), (217) 766-3970 (C) University of Illinois at Urbana-Champaign Fax: (217)
More informationCombining Audio and Video for Detection of Spontaneous Emotions
Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University
More informationA Hybrid Approach to News Video Classification with Multi-modal Features
A Hybrid Approach to News Video Classification with Multi-modal Features Peng Wang, Rui Cai and Shi-Qiang Yang Department of Computer Science and Technology, Tsinghua University, Beijing 00084, China Email:
More informationTracking facial features using low resolution and low fps cameras under variable light conditions
Tracking facial features using low resolution and low fps cameras under variable light conditions Peter Kubíni * Department of Computer Graphics Comenius University Bratislava / Slovakia Abstract We are
More informationMotion Interpretation and Synthesis by ICA
Motion Interpretation and Synthesis by ICA Renqiang Min Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, ON M5S3G4, Canada Abstract. It is known that high-dimensional
More informationIT IS well known that human perception of speech relies
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART A: SYSTEMS AND HUMANS, VOL 38, NO 6, NOVEMBER 008 173 A Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature
More informationResearch On 3D Emotional Face Animation Based on Dirichlet Free Deformation Algorithm
2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Research On 3D Emotional Face Animation Based on Dirichlet Free Deformation
More informationA Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition
A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition by Mustapha A. Makkook A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree
More informationVoice Command Based Computer Application Control Using MFCC
Voice Command Based Computer Application Control Using MFCC Abinayaa B., Arun D., Darshini B., Nataraj C Department of Embedded Systems Technologies, Sri Ramakrishna College of Engineering, Coimbatore,
More information