Phonemes Interpolation
Fawaz Y. Annaz *1 and Mohammad H. Sadaghiani *2

*1 Institut Teknologi Brunei, Electrical & Electronic Engineering Department, Faculty of Engineering, Jalan Tungku Link, Gadong, BE 1410, Bandar Seri Begawan, Brunei Darussalam
*2 University of Nottingham, Malaysia Campus, Electrical & Electronic Engineering Department, Jalan Broga, Semenyih, Selangor, Malaysia

Keywords: Lagrange Interpolation, Barycentric Lagrange Interpolation, Phonemes, Viseme.

Abstract

Learning a language starts immediately after birth, in the form of repeating basic sounds and gestures generated by adults (usually the parents). While teaching is achieved by initial pronunciation through exaggerated gestures and sounds, learning is accompanied by memorizing, comprehension and, eventually, reproduction of such gestures and sounds. In fact, parents exaggerate speech by breaking it up into simpler sound and gesture levels that babies can more readily accept. The aim of this paper is to demonstrate methods in which fundamental articulated phonemes are represented by signatures that reflect dynamic mouth-movement contours. The paper starts by explaining the basic (yet limited) Lagrange Interpolation Method to produce fundamental signatures of lip movements by tracking upper-lip and corner-lip feature points. The paper then proposes a method that produces more compact polynomials using the Barycentric Lagrange Interpolation Method, which overcomes the limitations of the earlier method.

1. Introduction

This paper addresses fundamental concepts in human audio-visual communication, and focusses on the mouth shape during speech. This field has received interest from various groups, ranging from English Language Teachers (ELT) in a classroom to users interacting with an animated face or an electromechanical robot-head interface. Thus, the work will be of interest to groups in robotics, the movie industry, biometrics, real-time translation and future machine interaction. The difficulty in this field is to understand and combine the scientific and artistic significance of speech or communication, and the way it should be delivered and perceived between humans and/or machines. Facial animation is one concept that emerges from this science; it was pioneered by Parke [1] in 1974, and since then significant time and effort have been devoted to perfecting the fusion of science and art [2]-[4]. The definition of a generic framework to map speech components onto animated visual pronunciation models is another important concept, going back as far as 1968 and known today as the concept of phonemes. This, in turn, led to the birth of the viseme approach [5]: the study of the visual observations of phonemes, which examines mouth-contour geometry to classify and analyse the statistical and physical characteristics of consonants and vowels. The bridging of audio phonemes and their corresponding visual visemes has thus become the basis for research in speech visualization and perception. It is also interesting to consider the effect coarticulation has on speech, and the effect simple concatenation has on the visual modality of the previous and next phonemes. This is clearly evident in the visual domain; thus, studies of speech-driven systems [6], [7], as well as speech- and text-driven systems [8]-[12], emerged to act as inputs and to generate animated lip movements.
In machine-learning speech animation, speech and the corresponding visual parameters are used to train Hidden Markov Models (HMMs) to create constraints and trajectory functions that determine speech and visual features. The accuracy of the approximated visual trajectories depends on the chosen training set, which directly affects the quality of the synthesized results. In [13], an HMM was used to animate and synchronize a 2D face model by associating speech with AAM (Active Appearance Model) feature parameters. In this paper, each viseme is interpreted as a series of frames that describe a phoneme over a time interval, and is represented by interpolating trajectory paths from control points. Thus, each viseme is expressed in a mathematical form, which may be further simplified by considering ratios or other geometric transformation rules. The authors in [14] proposed 2D trajectories of mouth-cavity area versus aspect ratio to describe Japanese words; however, they did not suggest path formulation over discrete frames or define mathematical signatures of spoken words. Here, the Lagrange interpolation [15] is proposed to construct polynomials by interpolating the sets of control points resulting from lip deformation. This paper will initially describe the suggested approach in Section 2, followed by an introduction to the signature concept and examples of suggested signatures in Sections 3, 4 and 5; a short sketch of the frame-based viseme representation follows below.
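To make the frame-based viseme representation concrete, the following minimal Python sketch stores a viseme as per-frame upper-lip and corner-lip amplitudes and resamples it to a fixed 30-frame length. It is illustrative only: the helper name, the track values and the linear resampling are assumptions for this example, not the authors' code or data.

```python
import numpy as np

def resample_to_fixed_length(samples, n_frames=30):
    """Linearly resample a variable-length feature track to n_frames values."""
    samples = np.asarray(samples, dtype=float)
    src = np.linspace(0.0, 1.0, len(samples))
    dst = np.linspace(0.0, 1.0, n_frames)
    return np.interp(dst, src, samples)

# A fictitious short track: upper-lip and corner-lip amplitude per frame.
upper = [21, 24, 30, 45, 60, 45, 30, 24, 21]
corner = [60, 62, 70, 85, 96, 85, 70, 62, 60]

viseme = np.stack([resample_to_fixed_length(upper),
                   resample_to_fixed_length(corner)])
print(viseme.shape)   # (2, 30): two feature-point tracks over 30 frames
```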
2. The Approach

The main aim of this paper is to determine mathematical signatures for the indexed paths of feature points that correspond to a sequence of mouth movements. In this introductory paper, isolated visual units of speech (phonemes) are examined to explain our approach. Each pronounced phoneme is represented by a set of progressive 2D frames, which some authors refer to as visemes. In this approach, the mouth height and width (per frame) make up unique sets of feature points per spoken phoneme, thus resulting in unique signature sets. It is also proposed that a fixed 30 frames represent the various words, regardless of their length. Figure 1 shows an example of a fictitious word with five framed feature points (visemes), represented by the vector [F_i^U, F_i^C], where the lip is approximated by ellipses. Feature points in each viseme can simply be the pixel coordinates per frame that make up ellipse equations, which can be stored and recovered when necessary.

Figure 1. A Fictitious Word with Five Framed Feature Points (Visemes)

Vowels have a longer visual duration than consonants, and they connect consonants in word structures. Thus, in a speech recognition system, vowels play a significant role in recognition [16]. It is therefore important to derive mathematical expressions for consonant-vowel phonemes, such as those shown in Table 1 of the International Phonetic Alphabet in American English.

Table 1. The IPA and ARPABET Vowels Notations

IPA   ARPABET   Example      IPA   ARPABET   Example
i     IY        beet         ʌ     AH        but
I     IH        bit          ɔ     AO        bought
æ     AE        bat          U     UH        foot
ε     EH        bet          u     UW        boot
a     AA        hot          o     OW        show

3. The Lagrange Interpolation

The Lagrange interpolation is a popular choice for deriving mathematical functions of the feature-point path vectors that describe spoken phonemes. Here, the Lagrange interpolation reconstructs a continuous polynomial L(x) that spans uniformly over an interval [a, b], from a set of samples x_i ∈ R, i ∈ N:

L(x) = \sum_{i=0}^{N-1} f(x_i)\, l_i(x)    (1)

l_i(x) = \frac{\prod_{k=0,\, k \neq i}^{N-1} (x - x_k)}{\prod_{k=0,\, k \neq i}^{N-1} (x_i - x_k)}    (2)

\sum_{i=0}^{N-1} l_i(x) = 1    (3)

where l_i(x) are the basis functions corresponding to the nodes x_i.

In this method, the number of frames increases the degree of the Lagrange polynomial; however, this does not imply an increase in accuracy. For example, a set of 15 samples extracted from pronouncing the vowel /UW/ gives a 14th-degree interpolating polynomial of the form

L(f) = c_{14} f^{14} + c_{13} f^{13} + \cdots + c_1 f + 21    (4)

with coefficients c_i determined by the samples (the leading coefficients are of the order of 10^{-8} to 10^{-5}).

Figure 2. The Vowel Viseme /UW/ Upper and Corner Feature-Points Lagrange Interpolation

The phoneme expression exhibits very high or low amplitudes on the interval boundaries, reducing the accuracy of the interpolation. The method is therefore limited, and becomes impractical when dealing with a large number of samples, inducing very high errors between the function and its interpolating curve. This can be demonstrated by simply substituting f = 0.3 into (4) to obtain
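The construction in (1)-(3) translates directly into code. The following minimal Python sketch evaluates the Lagrange basis and interpolant on 15 equidistant frame indices and checks the partition-of-unity property (3); the sample amplitudes are illustrative placeholders, not the measured /UW/ data behind (4).

```python
import numpy as np

def lagrange_basis(x, nodes, i):
    """l_i(x) from (2): product over k != i of (x - x_k) / (x_i - x_k)."""
    others = np.delete(nodes, i)
    return np.prod((x - others) / (nodes[i] - others))

def lagrange(x, nodes, values):
    """L(x) from (1): sum over i of f(x_i) * l_i(x)."""
    return sum(v * lagrange_basis(x, nodes, i) for i, v in enumerate(values))

# 15 equidistant frame indices; the amplitudes are illustrative placeholders,
# NOT the authors' measured /UW/ feature-point data.
nodes = np.arange(15.0)
values = np.array([21, 24, 30, 45, 80, 120, 150, 170, 185, 190,
                   180, 150, 90, 40, 21], dtype=float)

# Partition of unity (3): the basis functions sum to 1 at any x.
assert abs(sum(lagrange_basis(0.3, nodes, i) for i in range(15)) - 1.0) < 1e-9

# Near the interval boundary the degree-14 fit can swing far from the data,
# which is the Runge effect discussed in the text.
print(lagrange(0.3, nodes, values))
```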
L(f) ≈ 43, an evaluation of a corner feature point whose actual amplitude is approximately 190. This high fluctuation in amplitude at the boundaries, described as the Runge effect [17], results in an error between the function and its interpolating curve. The Lagrange interpolations of the vowel /UW/ (for both the upper and corner feature points) are plotted in Figures 2(a) and 2(b). The Runge phenomenon is clearly visible on the first and last pairs of nodes, appearing in the form of oscillation at the interval boundaries. To exclude these saturated regions of the Lagrange polynomial, a more elegant solution is proposed through the Barycentric Lagrange Polynomial (BLP) interpolation, discussed next.

4. The Barycentric Lagrange Interpolation

The boundary oscillation (Runge phenomenon) was treated by the authors in [18] by rearranging the sample-node positions x_i and modifying the intervals to formulate the Barycentric Lagrange interpolation:

L_B(x) = \frac{\sum_{i=0}^{N-1} f(x_i)\, \frac{w_i}{x - x_i}}{\sum_{i=0}^{N-1} \frac{w_i}{x - x_i}}    (5)

This modification tackles the problem of destructive oscillations on the interval boundaries by a transformation to another domain. The transformation uses Chebyshev points of the second kind, x_i = \cos\!\left(\frac{i\pi}{N-1}\right), spanned on the interval [-1, 1]. The weighting function w_i in (5) can be simplified as [19]:

w_i = (-1)^i \delta_i, \quad \delta_i = \begin{cases} 1/2, & i = 0 \text{ or } i = N-1 \\ 1, & \text{otherwise} \end{cases}    (6)

Equation (5) thus defines an interpolation procedure combining the Lagrange method with Chebyshev nodes over the interval [-1, 1]. Applying the same set of samples used in (4) to the Barycentric Lagrange interpolation method leads to the rational expression

L_B(f) = \frac{P(f)}{Q(f)}    (7)

where P(f) and Q(f) are the polynomials produced by the numerator and denominator sums of (5) for the same 15 samples.

In comparison with the expression in (4), substituting a value of f close to the boundaries into (7) results in an amplitude that is very close to those of the neighbouring samples. For example, letting f = 0.3 in (7) results in L_B(0.3) = 22.25, which lies between the amplitudes of the first and second samples (F_0^U = 21, F_1^U = 24). This definition changes the previous interpolation and allocates a polynomial with proper amplitudes to the same sample set in Figure 2. The final, desirable representation of the phoneme /UW/ curve over a uniformly spanned interval [a, b] is shown in Figure 3, which captures all samples in a single function without the high boundary amplitudes.

Figure 3. The Barycentric Lagrange Interpolations over the Uniformly Spanned Intervals

5. Visual Signatures

Building on the above approach, and aiming to further reduce the number of mathematical expressions so as to yield a more compact signature expression, the ratios of the upper and lower feature points were considered. This is shown in Figure 4 and will be referred to as the Rational Signature.

Figure 4. The BLP for a Set of Feature-Points
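Equations (5) and (6) likewise reduce to a few lines. The Python sketch below (with the same placeholder amplitudes, and assuming the 15 frame samples have been re-indexed onto the Chebyshev nodes in [-1, 1]) evaluates the barycentric form; near the boundary the result stays close to the neighbouring samples, in contrast to the classical Lagrange fit.

```python
import numpy as np

def chebyshev_nodes(N):
    """Chebyshev points of the second kind: x_i = cos(i*pi/(N-1)) on [-1, 1]."""
    return np.cos(np.arange(N) * np.pi / (N - 1))

def barycentric_weights(N):
    """Simplified weights (6): w_i = (-1)^i, halved at the two endpoints."""
    w = (-1.0) ** np.arange(N)
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def barycentric(x, nodes, values, w):
    """L_B(x) from (5): ratio of the weighted sums over the nodes."""
    d = x - nodes
    hit = np.isclose(d, 0.0)
    if hit.any():                 # x coincides with a node: return that sample
        return values[hit][0]
    q = w / d
    return np.dot(values, q) / np.sum(q)

N = 15
nodes = chebyshev_nodes(N)        # run from +1 down to -1
w = barycentric_weights(N)
# Placeholder amplitudes again, assumed already re-indexed onto the nodes.
values = np.array([21, 24, 30, 45, 80, 120, 150, 170, 185, 190,
                   180, 150, 90, 40, 21], dtype=float)

# Close to the boundary node the interpolant stays near its neighbours,
# suppressing the Runge oscillation of the classical form.
print(barycentric(0.97, nodes, values, w))
```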
6. Conclusions

The experiment of formulating visemes was conducted on feature points extracted from video files filmed at a rate of 30 frames s⁻¹, for a set of three speakers pronouncing the phonemes /IY/, /IH/, /AE/ and /AH/. Clearly, amplitudes vary according to the speaker, and duration (the number of frames) varies according to the pronounced phoneme. However, the main aim here is to show that it is feasible to build patterns or signatures for the various phonemes. The presented work is still under development, and it is highly likely that further rules must be added to clearly distinguish and identify phonemes, words and portions of speech.

Figure 5. The BLP of Feature-Point Ratios for the Phonemes /IY/, /IH/, /AE/ and /AH/

The main aim of this paper was to derive mathematical expressions for some of the basic phonemes. Two methods were considered, the Lagrange Interpolation and the Barycentric Lagrange Interpolation, the latter giving a more accurate representation of a pronunciation envelope. In this analysis, the top-centre and the corner of the lip were selected as the feature points of the derived signatures. The paper finally presented more compact expressions by considering feature-point ratios. As mentioned earlier, the presented work is still under development, and the approach needs further refinement to clearly distinguish and identify phonemes, words and portions of speech.

References

[1]. F. I. Parke, A Parametric Model for Human Faces, Utah: The University of Utah, 1974.
[2]. F. I. Parke, Computer Generated Animation of Faces, Utah: The University of Utah, 1972.
[3]. N. Magnenat-Thalmann and D. Thalmann, "The Direction of Synthetic Actors in the Film 'Rendez-vous à Montréal'," IEEE Computer Graphics and Applications, vol. 7, no. 12, pp. 9-19, 1987.
[4]. L. Xie and Z. Q. Liu, "Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling," IEEE Transactions on Multimedia, vol. 9, no. 3, 2007.
[5]. C. G. Fisher, "Confusions Among Visually Perceived Consonants," Journal of Speech and Hearing Research, vol. 11, no. 4, 1968.
[6]. G. Ananthakrishnan and O. Engwall, "Important Regions in the Articulator Trajectory," in International Seminar on Speech Production, Strasbourg, 2008.
[7]. D. Jiang, I. Ravyse, H. Sahli and W. Verhelst, "Speech Driven Realistic Mouth Animation Based on Multimodal Unit Selection," Journal on Multimodal User Interfaces, vol. 2, 2008.
[8]. R. Gutierrez-Osuna, P. K. Kakumanu, A. Esposito, O. N. Garcia, A. Bojorquez, J. L. Castillo and I. Rudomin, "Speech-Driven Facial Animation with Realistic Dynamics," IEEE Transactions on Multimedia, vol. 7, no. 1, 2005.
[9]. E. Cosatto and H. Graf, "Sample-Based Synthesis of Photorealistic Talking Heads," in Computer Animation, 1998.
[10]. Z. Deng, U. Neumann, J. P. Lewis, T. Y. Kim, M. Bulut and S. Narayanan, "Expressive Facial Animation Synthesis by Learning Speech Coarticulation and Expression Spaces," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 6, pp. 1-12, 2006.
[11]. Y. Cao, P. Faloutsos, E. Kohler and F. Pighin, "Real-Time Speech Motion Synthesis from Recorded Motions," in ACM SIGGRAPH/Eurographics Symposium on Computer Animation, New York, 2004.
[12]. S. Morishima, K. Aizawa and H. Harashima, "An Intelligent Facial Image Coding Driven by Speech and Phoneme," in IEEE ICASSP, Glasgow, 1989.
[13]. G. Englebienne, Animating Faces from Speech, PhD thesis, The University of Manchester.
[14]. T. Saitoh and R. Konishi, "Word Recognition Based on Two-Dimensional Lip Motion Trajectory," in IEEE International Symposium on Intelligent Signal Processing and Communication Systems.
[15]. J. L. Lagrange, Leçons élémentaires sur les mathématiques, données à l'École Normale en 1795, Oeuvres VII, Paris: Gauthier-Villars, 1877.
[16]. L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[17]. C. Runge, "Über empirische Funktionen und die Interpolation zwischen äquidistanten Ordinaten," Zeitschrift für Mathematik und Physik, vol. 46, 1901.
[18]. H. E. Salzer, "Lagrangian Interpolation at the Chebyshev Points x_{n,ν} = cos(νπ/n), ν = 0(1)n; Some Unnoted Advantages," The Computer Journal, 1972.
[19]. J. P. Berrut and L. N. Trefethen, "Barycentric Lagrange Interpolation," SIAM Review, vol. 46, no. 3, pp. 501-517, 2004.