Proceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics, Volume 1, 153rd Meeting of the Acoustical Society of America, Salt Lake City, Utah, 4-8 June 2007. Session 1pSC: Speech Communication. 1pSC9. Temporal characterization of auditory-visual coupling in speech. Adriano V. Barbosa, Hani C. Yehia and Eric Vatikiotis-Bateson.

This work examines the coupling between the acoustic and visual components of speech as it evolves through time. Previous work has shown a consistent correspondence between face motion and spectral acoustics, and between fundamental frequency (F0) and rigid body motion of the head [Yehia et al. (2002), Journal of Phonetics, 30]. Although these correspondences have been estimated both for sentences and for running speech, the analyses have not taken into account the temporal structure of speech. As a result, the role of temporal organization in multimodal speech cannot be assessed. The current study is a first effort to correct this deficit. We have developed an algorithm, based on recursive correlation, that computes the correlation between measurement domains (e.g., head motion and F0) as a time-varying function. Using this method, regions of high or low correlation, or of rapid transition (e.g., from high to low), can be associated with visual and auditory events. This analysis of the time-varying coupling of multimodal events has implications for speech planning and synchronization between speaker and listener.

Published by the Acoustical Society of America through the American Institute of Physics. 2008 Acoustical Society of America [DOI: 10.1121/...]. Received 27 Feb 2008; published 24 Apr 2008.

Proceedings of Meetings on Acoustics, Vol. 1 (2008), Page 1
Temporal characterization of auditory-visual coupling in speech

Adriano V. Barbosa (1), Hani C. Yehia (2) and Eric Vatikiotis-Bateson (1)
(1) Linguistics, University of British Columbia, Vancouver, Canada
(2) Electronics, Federal University of Minas Gerais, Belo Horizonte, Brazil
adriano.vilela@gmail.com, evb@interchange.ubc.ca, hani@cefala.org

1 Introduction

This paper introduces two important improvements to our system for processing multimodal signals that can be measured during spoken communication: 1) the computation of correspondences between time-varying measures that are sensitive to temporally local fluctuations; 2) the transduction of visible motion from simple video recordings in a field situation where the use of passive markers or makeup is unacceptable.

Over the past decade, we have applied both linear and nonlinear estimation techniques to characterize the largely linear correspondences between vocal tract articulation, the speech acoustics, and visible motions of the head and face. Since the time-varying vocal tract shapes both the acoustics and the face through positioning of the tongue, jaw and lips, it is not surprising that measures made in these three domains should be related somehow. When applied to isolated sentences and longer stretches of connected speech, our previous analyses have shown there to be largely linear correspondences, for example, between spectral acoustic parameters (Line Spectrum Pairs, LSP; Sugamura and Itakura, 1986) and motion of the lips, cheeks, and chin (Yehia et al., 1998, 1999), and between fundamental frequency (F0) and rigid body head motion (Yehia et al., 2002). These correspondences have been computed using relatively few (5-10) parameters for each measurement domain. For example, the first 5-6 principal
components of 2D midsagittal motion of the tongue, jaw, and lips and a similar number of components for face and head motion are typically sufficient to recover more than 95% of the variance in each of these domains. Simple linear models applied to these reduced numbers of principal components are then usually able to account for 80-90% of the cross-domain variance. The simplicity of these correspondences greatly facilitated the creation of a linguistically valid talking head animation system that, running in real time, can be driven by measured vocal tract, acoustics, or visible motion of the head and face (for details of the animation system, see Kuratate et al., 2005; for the perceptual validation, see Munhall et al., 2004).

A major limitation of this system, however, has been that the correspondences, for example, between face motion and acoustic LSPs, are computed globally over the entire signal. For isolated sentences spanning 1-2 seconds, this is fine. However, when longer stretches of data are considered, the correspondences do not improve; if anything, they weaken somewhat (Vatikiotis-Bateson and Yehia, 2002). That is, the computation of correspondences between signals is based on a static set of parameters, computed once, which means that there is no distinction between spatiotemporal variations that characterize the behavioral structure and local fluctuations that degrade the result when the computed parameters are applied recursively to estimate the time-varying behavior (Moreira and Yehia, 2006). To address this limitation, we introduce an algorithm, based on recursive correlation (Aarts et al., 2002), that computes the instantaneous cross-correlation between measurement domains, e.g., head motion and acoustic amplitude (root mean square, RMS).
This allows rapid changes in correspondence between auditory-visual events to be evaluated as a function of time, while also potentially improving the accuracy of cross-domain correspondences computed over analysis windows of any size.

A second limitation of our system has been the dependence on markers placed on the face and head for tracking 2D or 3D motion. The use of markers, either active (e.g., wired infrared LEDs) or passive, is physically invasive and distracting for naive experimental subjects, and restricts data collection to the laboratory, a situation which is stressful for the elderly and other populations unaccustomed to formal research. In this study, however, the motion data were all extracted from video recordings made in the field using a relatively unobtrusive digital video (DV) camcorder. Through the simple technique of computing the optical flow (Horn and Schunck, 1981) and then summing the amplitudes (and discarding the directions) of the
pixel motion vectors for each frame step, signals similar to those derived from marker tracking were created. As shown below (see Figures 3-5), even a single channel representing all of the motion in a video frame captures significant aspects of the spatiotemporal behavior. For the purpose of demonstration, the algorithm is applied to audiovisual behavior produced by a speaker of Plains Cree (Alberta, Canada) as part of an investigation of language as performance with R-M. Déchaine and J. Deschamps.

The analysis of the time-varying coupling of multimodal events clearly has implications for our understanding of speech organization and the assessment of communicative coordination between speaker and listener. In this study, we focus in particular on the coordination between orofacial motions, the amplitude (RMS) of the speech acoustics, and the motion of the speaker's hands. The specific use to which this has been put in the investigation of Cree is in assessing the coordination of hand gestures and speech acoustic parameters in the collaborative construction of meaning. This includes instances where the explicit meaning does not reside solely in the words (or the visible gestures), and instances where iconic use of the hands shows secondary iconic specification in anaphoric structures (e.g., bringing the hand to touch the head once in a narrative to indicate that the speaker was thinking about something, and then subsequently producing a reduced motion in the direction of the head to indicate the same thing). The timing and structure of these gestures are coordinated with, but not necessarily determined by, the speech acoustics.

The remainder of this paper is organized as follows. Section 2 presents the mathematical formulation of our instantaneous correlation algorithm. Section 3 describes the data acquisition process and discusses the optical flow analysis applied to the acquired video sequences. Results are presented and discussed in Section 4.
Finally, the paper is summarized in Section 5.

2 Instantaneous correlation algorithm

The instantaneous correlation coefficient $\rho(k)$ between signals $x(k)$ and $y(k)$ is computed as

$$\rho(k) = \frac{S_{xy}(k)}{\sqrt{S_{xx}(k)\, S_{yy}(k)}}, \qquad (1)$$
where the instantaneous covariance $S_{xy}$ between signals $x(k)$ and $y(k)$ is given by

$$S_{xy}(k) = c \sum_{l=0}^{\infty} e^{-\eta l}\, \big(x(k-l) - \bar{x}(k-l)\big)\, \big(y(k-l) - \bar{y}(k-l)\big), \qquad (2)$$

which is a modification of Equation (4) in Aarts et al. (2002). The instantaneous means $\bar{x}(k)$ and $\bar{y}(k)$ are computed as

$$\bar{x}(k) = c \sum_{m=0}^{\infty} e^{-\eta m}\, x(k-m), \qquad (3)$$

$$\bar{y}(k) = c \sum_{m=0}^{\infty} e^{-\eta m}\, y(k-m), \qquad (4)$$

with the constant $c$ given by

$$c = 1 - e^{-\eta}, \qquad (5)$$

where $\eta$ is a small positive number. It is interesting to note that the signal $\bar{x}(k)$ can be seen as the product of the constant $c$ and the output of a first-order low-pass linear filter excited by the signal $x(k)$ (the same is valid for $\bar{y}(k)$). The z-transform representation of this linear filter is given by

$$H(z) = \frac{1}{1 - e^{-\eta} z^{-1}}, \qquad |z| > e^{-\eta}. \qquad (6)$$

Furthermore, the covariance $S_{xy}$ as defined by Equation (2) can also be seen as the product of the constant $c$ and the output of the filter in Equation (6) when excited by the signal

$$\big(x(k) - \bar{x}(k)\big)\, \big(y(k) - \bar{y}(k)\big). \qquad (7)$$

3 Data recording and processing

The speaker's hands and face were recorded simultaneously at 24 frames per second (fps) using two DV cameras during a 6-minute interview. Stereo sound was recorded digitally by each camera at 48 kHz via two professional microphones (Tram-50 lapel, Sennheiser 416 shotgun). Measures of 2D motion were extracted from the video recordings using the optical flow algorithm developed by Horn and Schunck (1981).
Figure 1: A snapshot of the video and the optical flow between the frame shown and the next frame in the video sequence.

Figure 1 shows a video frame for each camera and the resulting optical flow computed between that frame and the next frame in the sequence. There are many algorithms for computing optical flow (Barron et al., 1994). However, they all have the same goal of calculating optical flow fields corresponding to the projection of the 3D motion of objects in the world onto the 2D image. A standard definition (SDTV) frame of NTSC digital video is 640 pixels wide by 480 pixels high. Each pixel has a luminance, or intensity, value within an 8-bit (0-255) range. Pixels also have values for color, but these are discarded in calculating optical flow. Moving images are recorded as changes in the intensity (and color) values for the pixels in the image array that are influenced by the motion. The optical flow algorithm does not merely register the change of intensity from one image to the next for each pixel; rather, it attempts to keep track of specific intensity values, corresponding to image objects, as they change location within the pixel array. Thus, the
algorithm assigns a motion vector, consisting of a magnitude and a direction, to each pixel based on where the intensity associated with that pixel in one image is located in the next image in the sequence. The direction is simply the line from the first pixel to the second, and the magnitude corresponds to the Euclidean distance between them. The array of motion vectors comprises the optical flow field.

For the purposes of the current analysis, only the magnitude (speed) of pixel motion is needed to assess the coordination of the hand and face-head motion with respect to each other and to the speech acoustics. The richness of this information is readily apparent in movies constructed by representing the magnitude component of optical flow as intensity, so that more rapid changes appear brighter (have higher intensity) in the image sequence. Optical flow captures the motion and makes it possible to assess the coordination of hand motion, eye blinks, head motion, and even events in the speech acoustics. In principle, the motion associated with specific regions of interest, such as the eyes, mouth, and head or the left and right hands, can be examined independently. At this early stage of development, however, our goals are to introduce the instantaneous correlation algorithm, use optical flow to recover motion from video, and show how these techniques can be used to assess the time-varying correspondences between speech acoustics, head and orofacial motion, and hand gestures. To do this, the 640x480 magnitudes of motion associated with each pair of consecutive frames are summed and stored as unidimensional streams for the video sequences acquired by the two cameras (one of the face and head, the other of the hands). Summing the motion for the entire video frame obscures the contribution of specific components (for example, the potentially differential contribution of each hand is lost) and reduces the dimensionality of each measurement domain to one time-varying measure.
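The flow-to-motion-stream reduction just described is straightforward to sketch. The following Python/NumPy fragment is an illustrative simplification, not the authors' implementation: it uses a minimal Horn-Schunck iteration with simple gradient estimates, and the parameter values (alpha, n_iter) are arbitrary assumptions.

```python
import numpy as np

def horn_schunck(frame1, frame2, alpha=1.0, n_iter=50):
    """Estimate a dense optical flow field (u, v) between two grayscale
    frames with the classic Horn-Schunck iteration (simplified sketch)."""
    I1 = frame1.astype(float)
    I2 = frame2.astype(float)
    # Spatial and temporal intensity derivatives.
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        # Local flow averages (4-neighbour mean) enforce smoothness.
        u_avg = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_avg = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        # Horn-Schunck update: brightness constancy vs. smoothness.
        num = Ix * u_avg + Iy * v_avg + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_avg - Ix * num / den
        v = v_avg - Iy * num / den
    return u, v

def summed_motion(u, v):
    """Collapse a flow field to a single number: the summed magnitude
    of the pixel motion vectors, discarding direction, as in the text."""
    return float(np.hypot(u, v).sum())
```

Applying summed_motion to the flow between every pair of consecutive frames yields the kind of unidimensional motion stream used here; a production implementation would follow Horn and Schunck's original derivative estimators and add a convergence test.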
However, as a first step, this has two advantages: 1) no a priori decisions have been made about which aspects of the motion or which physical regions are the most relevant to the cross-domain correspondences; and 2) as shown below, these supposedly impoverished measures are surprisingly well coordinated across domains. In what follows, the instantaneous correlation algorithm is used to compare the two streams of motion magnitudes, summed from the optical flow results, with each other (Figure 3) and with the time-varying RMS amplitude of the acoustics (Figures 4-5).
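Because the exponential sums in Equations (2)-(4) satisfy one-sample recursions (e.g., the mean obeys x̄(k) = e^(-η) x̄(k-1) + c x(k)), the instantaneous correlation can be computed in a single pass over the signals. The following Python/NumPy sketch is our own rendering of the equations, not the original code:

```python
import numpy as np

def instantaneous_correlation(x, y, eta=0.02):
    """Time-varying correlation rho(k) between two signals, computed with
    the exponentially weighted recursions of Equations (1)-(5).
    Larger eta weights recent samples more heavily (greater sensitivity)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.exp(-eta)        # forgetting factor e^(-eta)
    c = 1.0 - a             # normalizing constant, Equation (5)
    xbar = ybar = 0.0       # instantaneous means, Equations (3)-(4)
    sxx = syy = sxy = 0.0   # instantaneous (co)variances, Equation (2)
    rho = np.zeros(len(x))
    for k in range(len(x)):
        xbar = a * xbar + c * x[k]
        ybar = a * ybar + c * y[k]
        dx = x[k] - xbar
        dy = y[k] - ybar
        sxy = a * sxy + c * dx * dy
        sxx = a * sxx + c * dx * dx
        syy = a * syy + c * dy * dy
        denom = np.sqrt(sxx * syy)
        rho[k] = sxy / denom if denom > 0.0 else 0.0
    return rho
```

Feeding in two coupled streams, such as the summed hand-motion and face-motion signals resampled to a common rate, drives rho(k) toward +1 where they co-vary and toward -1 where they oppose.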
4 Results

Computing instantaneous correlations that match what we readily perceive qualitatively from watching the optical flow movies requires tuning the algorithm to the behavioral data. The algorithm should not be too sensitive to rapid changes in correspondence, e.g., changes due to noise or to higher frequency components in the behavior. Signal noise can be reduced by low-pass filtering (10 Hz used here). Behavioral noise is another issue: fluctuations in synchronization and higher frequencies will confound a sensitive function, while a less sensitive function will miss the subtle changes in spatiotemporal patterning.

Figure 2: Comparison of the temporal effects on relative weight for two values of η: 0.2 and 0.02 (the latter used in our analysis).

Sensitivity is determined by the exponent η in Equation (3). The larger its value, the more sensitive the correlation is to rapid changes in the correspondence between signals. Figure 2 shows, for two different values of η, how preceding samples influence the correlation estimate. The smaller value (η = 0.02) gives a slower decline in the weight of preceding samples, decaying to less than 1% after 250 samples, and has proven to be a good value for the continuous correlation of the running speech data recorded for Plains Cree. This amounts to about a 5 sec. window when applied to signals resampled at
a common rate of 48 Hz.

Figure 3: Time-series plots of instantaneous correlation (top) calculated from two starting points (solid 8 sec. prior to window, dashed from window onset), summed optical flow for the hands video (middle), and summed optical flow for the head and face video (bottom).

This can be seen in the top panel of Figure 3, where the dashed correlation trace was computed from the start of the window and the solid trace 8 sec. earlier. The relatively high correlation in the first 3-4 sec. of Figure 3 can be seen by inspecting the motion traces for the hands and head/face. Also, comparing the audio waveform (Figure 3, top) and the motion traces shows that the 5 bursts of hand motion activity are apparently more synchronous with the speech signal than is the face motion. This latter observation is supported by the instantaneous correlation results shown for RMS amplitude and hand motion in Figure 4, and for RMS amplitude and face/head motion in Figure 5. Throughout the segments of data depicted in these figures, the correlation between RMS amplitude and face/head motion suffers from both poor synchronization and frequency mismatches, in which the optical flow for the
face/head sums across the perhaps semi-independent orofacial deformations due to speech articulation, eye-blinks, and head motion.

Figure 4: Time-series plots of instantaneous correlation (top), RMS amplitude (middle), and summed hand motion (bottom).

The time-course of RMS amplitude clearly corresponds to the time-course of vocal tract opening (vowels) and closing (consonants). It has also been suggested that head motion is associated with RMS amplitude, but not necessarily in strict synchrony with successive syllables (Munhall et al., 2004). This difference in phasing alone would contribute to the complex frequencies observed for face/head motion. Eye-blinks add yet another dimension to the temporal stream of events. Although beyond the scope of the current paper, Matlab tools have been created to accommodate variable synchronization between signals and to do spectral decomposition prior to computing the correlations (Barbosa et al., 2007a).
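The temporal scope behind Figure 2 is easy to verify numerically: the weight on a sample l steps in the past is e^(-ηl), which drops below 1% once l exceeds ln(100)/η, i.e., after roughly 23 samples for η = 0.2 and roughly 230 samples for η = 0.02 (about 5 s at the common 48 Hz rate). A quick Python check:

```python
import numpy as np

def samples_to_one_percent(eta, max_l=1000):
    """Number of past samples after which the exponential weight
    e^(-eta * l) falls below 1% (cf. Figure 2)."""
    l = np.arange(max_l)
    w = np.exp(-eta * l)
    return int(np.argmax(w < 0.01))

for eta, label in ((0.2, "sensitive"), (0.02, "used in our analysis")):
    n = samples_to_one_percent(eta)
    print(f"eta={eta} ({label}): weight < 1% after {n} samples "
          f"(~{n / 48.0:.1f} s at 48 Hz)")
```

The 48 Hz conversion is only for comparison with the roughly 5 sec. window quoted in the text.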
Figure 5: Time-series plots of instantaneous correlation (top), RMS amplitude (middle), and summed face motion (bottom).

5 Summary discussion

As can be seen from this simple demonstration, both optical flow and continuous instantaneous correlation promise to be useful in assessing the spatiotemporal coupling between behavioral signals that can be measured as non-invasively as possible. In the implementation presented here, relatively high correlations are observed only for events that are tightly synchronized. Since presenting this poster in June 2007 (Barbosa et al., 2007b), the algorithm has been modified to compute correlations 1) weighted by any combination of preceding and following samples, and 2) at any temporal offset between the two signals. These improvements preclude real-time processing, but provide more robust assessments of the correspondence between signals. Both modifications should afford larger values of η that asymptote more
quickly, thus effectively reducing the size of the filter window (Figure 2). While this results in greater sensitivity to fluctuations in the instantaneous correlation coefficient, the greater sensitivity can be used to assess shifts in temporal lag without necessarily reducing the degree of correspondence that occurs naturally during the production of coordinated behaviors such as speech and music. Most of this expanded functionality has already been incorporated in the Matlab toolbox that we have created for processing and analyzing multimodal speech data (Barbosa et al., 2007a), and is available to the research community.

Even with these improvements, there are still instances of coupling between the speech and gestural behavior that cannot be easily captured. These are due, in part, to summing the motion for the face/head. Therefore, we are currently attempting a more fine-grained decomposition of this complex into head, perioral, and eye components, which we know are each coordinated with the production of speech. Finally, if we need to make the algorithm smarter so that its sensitivity can be modified on the fly, we will replace the current instantaneous correlation algorithm with a learning algorithm, e.g., Kalman filtering (Kalman, 1960), that combines prediction from previous patterns of behavior with local estimates of the instantaneous correlation.

Acknowledgment

Support for this work was provided by NSERC and SSHRC grants to E. Vatikiotis-Bateson. The Plains Cree data were collected in collaboration with Rose-Marie Déchaine, Clancy Dennehy, and Joseph Deschamps.

References

Aarts, R. M., Irwan, R., and Janssen, A. J. E. M. (2002). Efficient tracking of the cross-correlation coefficient. IEEE Transactions on Speech and Audio Processing, 10(6).

Barbosa, A. V., Yehia, H. C., and Vatikiotis-Bateson, E. (2007a). Matlab toolbox for audiovisual speech processing.
In Vroomen, J., Swerts, M., and Krahmer, E., editors, International Conference on Auditory-Visual Speech Processing (AVSP 2007), pages 32-37, The Netherlands. ISCA.
Barbosa, A. V., Yehia, H. C., and Vatikiotis-Bateson, E. (2007b). Temporal characterization of auditory-visual coupling in speech. Journal of the Acoustical Society of America, 121:3044.

Barron, J. L., Fleet, D. J., and Beauchemin, S. S. (1994). Performance of optical flow techniques. International Journal of Computer Vision, 12.

Horn, B. K. P. and Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82 (Series D).

Kuratate, T., Vatikiotis-Bateson, E., and Yehia, H. C. (2005). Estimation and animation of faces using facial motion mapping and a 3D face database. In Clement, J. G. and Marks, M. K., editors, Computer-Graphic Facial Reconstruction. Academic Press, Amsterdam.

Moreira, K. S. and Yehia, H. C. (2006). Analysis of the variability of the coupling between facial motion and speech acoustics. In Yehia, H. C., Demolin, D., and Laboissière, R., editors, International Seminar on Speech Production (ISSP 2006), pages 109-116, Brazil. UFMG.

Munhall, K. G., Jones, J. A., Callan, D. E., Kuratate, T., and Vatikiotis-Bateson, E. (2004). Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science, 15(2).

Sugamura, N. and Itakura, F. (1986). Speech analysis and synthesis methods developed at ECL in NTT: from LPC to LSP. Speech Communication, 5.

Vatikiotis-Bateson, E. and Yehia, H. C. (2002). Speaking mode variability in multimodal speech production. IEEE Transactions on Neural Networks, 13(4).

Yehia, H. C., Kuratate, T., and Vatikiotis-Bateson, E. (1999). Using speech acoustics to drive facial motion. In Ohala, J. J., Hasegawa, Y., Ohala,
M., Granville, D., and Bailey, A. C., editors, Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, CA. Linguistics Dept., UC Berkeley.

Yehia, H. C., Kuratate, T., and Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion, and speech acoustics. Journal of Phonetics, 30(3).

Yehia, H. C., Rubin, P. E., and Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract and facial behavior. Speech Communication, 26.
ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N. Dartmouth, MA USA Abstract: The significant progress in ultrasonic NDE systems has now
More informationComparing computer vision analysis of signed language video with motion capture recordings
Comparing computer vision analysis of signed language video with motion capture recordings Matti Karppa 1, Tommi Jantunen 2, Ville Viitaniemi 1, Jorma Laaksonen 1, Birgitta Burger 3, and Danny De Weerdt
More informationBoth LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.
Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general
More informationFacial Expression Analysis for Model-Based Coding of Video Sequences
Picture Coding Symposium, pp. 33-38, Berlin, September 1997. Facial Expression Analysis for Model-Based Coding of Video Sequences Peter Eisert and Bernd Girod Telecommunications Institute, University of
More informationAcoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing
Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing Samer Al Moubayed Center for Speech Technology, Department of Speech, Music, and Hearing, KTH, Sweden. sameram@kth.se
More information17. SEISMIC ANALYSIS MODELING TO SATISFY BUILDING CODES
17. SEISMIC ANALYSIS MODELING TO SATISFY BUILDING CODES The Current Building Codes Use the Terminology: Principal Direction without a Unique Definition 17.1 INTRODUCTION { XE "Building Codes" }Currently
More informationAccurate 3D Face and Body Modeling from a Single Fixed Kinect
Accurate 3D Face and Body Modeling from a Single Fixed Kinect Ruizhe Wang*, Matthias Hernandez*, Jongmoo Choi, Gérard Medioni Computer Vision Lab, IRIS University of Southern California Abstract In this
More informationReal Time Motion Detection Using Background Subtraction Method and Frame Difference
Real Time Motion Detection Using Background Subtraction Method and Frame Difference Lavanya M P PG Scholar, Department of ECE, Channabasaveshwara Institute of Technology, Gubbi, Tumkur Abstract: In today
More informationComparison Between The Optical Flow Computational Techniques
Comparison Between The Optical Flow Computational Techniques Sri Devi Thota #1, Kanaka Sunanda Vemulapalli* 2, Kartheek Chintalapati* 3, Phanindra Sai Srinivas Gudipudi* 4 # Associate Professor, Dept.
More informationSynthesizing Realistic Facial Expressions from Photographs
Synthesizing Realistic Facial Expressions from Photographs 1998 F. Pighin, J Hecker, D. Lischinskiy, R. Szeliskiz and D. H. Salesin University of Washington, The Hebrew University Microsoft Research 1
More informationEfficient Block Matching Algorithm for Motion Estimation
Efficient Block Matching Algorithm for Motion Estimation Zong Chen International Science Inde Computer and Information Engineering waset.org/publication/1581 Abstract Motion estimation is a key problem
More informationHand-Eye Calibration from Image Derivatives
Hand-Eye Calibration from Image Derivatives Abstract In this paper it is shown how to perform hand-eye calibration using only the normal flow field and knowledge about the motion of the hand. The proposed
More information15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION
15 Data Compression Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories:
More informationTree-based Cluster Weighted Modeling: Towards A Massively Parallel Real- Time Digital Stradivarius
Tree-based Cluster Weighted Modeling: Towards A Massively Parallel Real- Time Digital Stradivarius Edward S. Boyden III e@media.mit.edu Physics and Media Group MIT Media Lab 0 Ames St. Cambridge, MA 039
More informationDigital Volume Correlation for Materials Characterization
19 th World Conference on Non-Destructive Testing 2016 Digital Volume Correlation for Materials Characterization Enrico QUINTANA, Phillip REU, Edward JIMENEZ, Kyle THOMPSON, Sharlotte KRAMER Sandia National
More informationSPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL
SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,
More informationText-Independent Speaker Identification
December 8, 1999 Text-Independent Speaker Identification Til T. Phan and Thomas Soong 1.0 Introduction 1.1 Motivation The problem of speaker identification is an area with many different applications.
More informationJoint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 12, NO. 3, MAY 2004 265 Joint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding Laurent Girin
More informationCOS 116 The Computational Universe Laboratory 4: Digital Sound and Music
COS 116 The Computational Universe Laboratory 4: Digital Sound and Music In this lab you will learn about digital representations of sound and music, especially focusing on the role played by frequency
More informationSPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION
Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing
More informationMixture Models and EM
Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering
More informationCOS 116 The Computational Universe Laboratory 4: Digital Sound and Music
COS 116 The Computational Universe Laboratory 4: Digital Sound and Music In this lab you will learn about digital representations of sound and music, especially focusing on the role played by frequency
More informationPerformance Evaluation of Internet Telephony Systems Through Quantitative Assessment
Performance evaluation of Internet telephony systems through quantitative assessment Ng, C.H., Foo, S., & Hui, S.C. (1997). Proc. of 1997 National Undergraduate Research (NUR) Congress, Singapore, 1097-1102.
More informationAnimated Talking Head With Personalized 3D Head Model
Animated Talking Head With Personalized 3D Head Model L.S.Chen, T.S.Huang - Beckman Institute & CSL University of Illinois, Urbana, IL 61801, USA; lchen@ifp.uiuc.edu Jörn Ostermann, AT&T Labs-Research,
More informationModeling of an MPEG Audio Layer-3 Encoder in Ptolemy
Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Patrick Brown EE382C Embedded Software Systems May 10, 2000 $EVWUDFW MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio.
More informationStatistical image models
Chapter 4 Statistical image models 4. Introduction 4.. Visual worlds Figure 4. shows images that belong to different visual worlds. The first world (fig. 4..a) is the world of white noise. It is the world
More informationEvaluation of Moving Object Tracking Techniques for Video Surveillance Applications
International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Evaluation
More informationAn analysis of the dimensionality of jawmotion
Journal of Phonetics (1995) 23, 101-117 An analysis of the dimensionality of jawmotion in speech Eric Vatikiotis-Bateson A TR Human nformation Processing Research Laboratories, Kyoto, Japan and David J.
More informationPerformance Evaluation of the Eigenface Algorithm on Plain-Feature Images in Comparison with Those of Distinct Features
American Journal of Signal Processing 2015, 5(2): 32-39 DOI: 10.5923/j.ajsp.20150502.02 Performance Evaluation of the Eigenface Algorithm on Plain-Feature Images in Comparison with Those of Distinct Features
More informationBuilding speaker-specific lip models for talking heads from 3D face data
Building speaker-specific lip models for talking heads from 3D face data Takaaki Kuratate 1,2, Marcia Riley 1 1 Institute for Cognitive Systems, Technical University Munich, Germany 2 MARCS Auditory Laboratories,
More informationGetting Started with Crazy Talk 6
Getting Started with Crazy Talk 6 Crazy Talk 6 is an application that generates talking characters from an image or photo, as well as facial animation for video. Importing an Image Launch Crazy Talk and
More informationSURVEY OF LOCAL AND GLOBAL OPTICAL FLOW WITH COARSE TO FINE METHOD
SURVEY OF LOCAL AND GLOBAL OPTICAL FLOW WITH COARSE TO FINE METHOD M.E-II, Department of Computer Engineering, PICT, Pune ABSTRACT: Optical flow as an image processing technique finds its applications
More informationA new take on FWI: Wavefield Reconstruction Inversion
A new take on FWI: Wavefield Reconstruction Inversion T. van Leeuwen 1, F.J. Herrmann and B. Peters 1 Centrum Wiskunde & Informatica, Amsterdam, The Netherlands University of British Columbia, dept. of
More informationModule 10 MULTIMEDIA SYNCHRONIZATION
Module 10 MULTIMEDIA SYNCHRONIZATION Lesson 33 Basic definitions and requirements Instructional objectives At the end of this lesson, the students should be able to: 1. Define synchronization between media
More informationLow Cost Motion Capture
Low Cost Motion Capture R. Budiman M. Bennamoun D.Q. Huynh School of Computer Science and Software Engineering The University of Western Australia Crawley WA 6009 AUSTRALIA Email: budimr01@tartarus.uwa.edu.au,
More informationChapter 3 Set Redundancy in Magnetic Resonance Brain Images
16 Chapter 3 Set Redundancy in Magnetic Resonance Brain Images 3.1 MRI (magnetic resonance imaging) MRI is a technique of measuring physical structure within the human anatomy. Our proposed research focuses
More informationA NEURAL NETWORK APPLICATION FOR A COMPUTER ACCESS SECURITY SYSTEM: KEYSTROKE DYNAMICS VERSUS VOICE PATTERNS
A NEURAL NETWORK APPLICATION FOR A COMPUTER ACCESS SECURITY SYSTEM: KEYSTROKE DYNAMICS VERSUS VOICE PATTERNS A. SERMET ANAGUN Industrial Engineering Department, Osmangazi University, Eskisehir, Turkey
More informationProduction of Video Images by Computer Controlled Cameras and Its Application to TV Conference System
Proc. of IEEE Conference on Computer Vision and Pattern Recognition, vol.2, II-131 II-137, Dec. 2001. Production of Video Images by Computer Controlled Cameras and Its Application to TV Conference System
More informationMusic 209 Advanced Topics in Computer Music Lecture 8 Off-line Concatenation Control
Music 209 Advanced Topics in Computer Music Lecture 8 Off-line Concatenation Control Pre-recorded audio and MIDI performances: we know data for future t s. 2006-3-9 Professor David Wessel (with John Lazzaro)
More informationSupplementary Figure 1. Decoding results broken down for different ROIs
Supplementary Figure 1 Decoding results broken down for different ROIs Decoding results for areas V1, V2, V3, and V1 V3 combined. (a) Decoded and presented orientations are strongly correlated in areas
More informationUsing the rear projection of the Socibot Desktop robot for creation of applications with facial expressions
IOP Conference Series: Materials Science and Engineering PAPER OPEN ACCESS Using the rear projection of the Socibot Desktop robot for creation of applications with facial expressions To cite this article:
More informationStatics: the abusive power of trimming
John C. Bancroft, Alan Richards, and Charles P. Ursenbach Statics: the abusive power of trimming ABSTRACT The application of trim statics directly to each seismic trace can be very dangerous as any seismic
More information3D Mesh Sequence Compression Using Thin-plate Spline based Prediction
Appl. Math. Inf. Sci. 10, No. 4, 1603-1608 (2016) 1603 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.18576/amis/100440 3D Mesh Sequence Compression Using Thin-plate
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 9, http://acousticalsociety.org/ ICA Montreal Montreal, Canada - June Speech Communication Session asc: Linking Perception and Production (Poster Session) asc.
More informationCS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching
Stereo Matching Fundamental matrix Let p be a point in left image, p in right image l l Epipolar relation p maps to epipolar line l p maps to epipolar line l p p Epipolar mapping described by a 3x3 matrix
More informationFace Tracking : An implementation of the Kanade-Lucas-Tomasi Tracking algorithm
Face Tracking : An implementation of the Kanade-Lucas-Tomasi Tracking algorithm Dirk W. Wagener, Ben Herbst Department of Applied Mathematics, University of Stellenbosch, Private Bag X1, Matieland 762,
More informationBOSS. Quick Start Guide For research use only. Blackrock Microsystems, LLC. Blackrock Offline Spike Sorter. User s Manual. 630 Komas Drive Suite 200
BOSS Quick Start Guide For research use only Blackrock Microsystems, LLC 630 Komas Drive Suite 200 Salt Lake City UT 84108 T: +1 801 582 5533 www.blackrockmicro.com support@blackrockmicro.com 1 2 1.0 Table
More informationCreating a Lip Sync and Using the X-Sheet in Dragonframe
Creating a Lip Sync and Using the X-Sheet in Dragonframe Contents A. Creating a Lip Sync in Dragonframe B. Loading the X-Sheet in Dragon Frame C. Setting Notes and Flag/Reminders in the X-Sheet 1. Trackreading/Breaking
More informationTopics in Linguistic Theory: Laboratory Phonology Spring 2007
MIT OpenCourseWare http://ocw.mit.edu 24.910 Topics in Linguistic Theory: Laboratory Phonology Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationLOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH RECOGNITION
LOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH Andrés Vallés Carboneras, Mihai Gurban +, and Jean-Philippe Thiran + + Signal Processing Institute, E.T.S.I. de Telecomunicación Ecole Polytechnique
More informationAdaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited
Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Summary We present a new method for performing full-waveform inversion that appears
More informationHEALTH MONITORING OF INDUCTION MOTOR FOR VIBRATION ANALYSIS
HEALTH MONITORING OF INDUCTION MOTOR FOR VIBRATION ANALYSIS Chockalingam ARAVIND VAITHILINGAM aravind_147@yahoo.com UCSI University Kualalumpur Gilbert THIO gthio@ucsi.edu.my UCSI University Kualalumpur
More informationInternational Journal of Advance Engineering and Research Development
Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 4, Issue 11, November -2017 e-issn (O): 2348-4470 p-issn (P): 2348-6406 Comparative
More informationCompression; Error detection & correction
Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some
More informationBluray (
Bluray (http://www.blu-ray.com/faq) MPEG-2 - enhanced for HD, also used for playback of DVDs and HDTV recordings MPEG-4 AVC - part of the MPEG-4 standard also known as H.264 (High Profile and Main Profile)
More informationPredictive Interpolation for Registration
Predictive Interpolation for Registration D.G. Bailey Institute of Information Sciences and Technology, Massey University, Private bag 11222, Palmerston North D.G.Bailey@massey.ac.nz Abstract Predictive
More informationREALISTIC FACIAL EXPRESSION SYNTHESIS FOR AN IMAGE-BASED TALKING HEAD. Kang Liu and Joern Ostermann
REALISTIC FACIAL EXPRESSION SYNTHESIS FOR AN IMAGE-BASED TALKING HEAD Kang Liu and Joern Ostermann Institut für Informationsverarbeitung, Leibniz Universität Hannover Appelstr. 9A, 3167 Hannover, Germany
More informationEffects of multi-scale velocity heterogeneities on wave-equation migration Yong Ma and Paul Sava, Center for Wave Phenomena, Colorado School of Mines
Effects of multi-scale velocity heterogeneities on wave-equation migration Yong Ma and Paul Sava, Center for Wave Phenomena, Colorado School of Mines SUMMARY Velocity models used for wavefield-based seismic
More informationVIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING
Engineering Review Vol. 32, Issue 2, 64-69, 2012. 64 VIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING David BARTOVČAK Miroslav VRANKIĆ Abstract: This paper proposes a video denoising algorithm based
More informationFacial Motion Capture Editing by Automated Orthogonal Blendshape Construction and Weight Propagation
Facial Motion Capture Editing by Automated Orthogonal Blendshape Construction and Weight Propagation Qing Li and Zhigang Deng Department of Computer Science University of Houston Houston, TX, 77204, USA
More informationPerspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony
Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Nobuhiko Kitawaki University of Tsukuba 1-1-1, Tennoudai, Tsukuba-shi, 305-8573 Japan. E-mail: kitawaki@cs.tsukuba.ac.jp
More information3D Face and Hand Tracking for American Sign Language Recognition
3D Face and Hand Tracking for American Sign Language Recognition NSF-ITR (2004-2008) D. Metaxas, A. Elgammal, V. Pavlovic (Rutgers Univ.) C. Neidle (Boston Univ.) C. Vogler (Gallaudet) The need for automated
More informationSpeech to Head Gesture Mapping in Multimodal Human-Robot Interaction
1 Speech to Head Gesture Mapping in Multimodal Human-Robot Interaction Amir Aly and Adriana Tapus Cognitive Robotics Lab, ENSTA-ParisTech, France {amir.aly, adriana.tapus}@ensta-paristech.fr Abstract In
More informationarxiv: v1 [cs.cv] 2 May 2016
16-811 Math Fundamentals for Robotics Comparison of Optimization Methods in Optical Flow Estimation Final Report, Fall 2015 arxiv:1605.00572v1 [cs.cv] 2 May 2016 Contents Noranart Vesdapunt Master of Computer
More information