Proceedings of Meetings on Acoustics

Volume 1, 153rd Meeting Acoustical Society of America
Salt Lake City, Utah, 4-8 June 2007
Session 1pSC: Speech Communication

1pSC9. Temporal characterization of auditory-visual coupling in speech

Adriano V. Barbosa, Hani C. Yehia and Eric Vatikiotis-Bateson

This work examines the coupling between the acoustic and visual components of speech as it evolves through time. Previous work has shown a consistent correspondence between face motion and spectral acoustics, and between fundamental frequency (F0) and rigid body motion of the head [Yehia et al. (2002), JPHON, 30, 555-568]. Although these correspondences have been estimated both for sentences and for running speech, the analyses have not taken into account the temporal structure of speech. As a result, the role of temporal organization in multimodal speech cannot be assessed. The current study is a first effort to correct this deficit. We have developed an algorithm, based on recursive correlation, that computes the correlation between measurement domains (e.g., head motion and F0) as a time-varying function. Using this method, regions of high or low correlation, or of rapid transition (e.g., from high to low), can be associated with visual and auditory events. This analysis of the time-varying coupling of multimodal events has implications for speech planning and synchronization between speaker and listener.

Published by the Acoustical Society of America through the American Institute of Physics. © 2008 Acoustical Society of America [DOI: 10.1121/...]. Received 27 Feb 2008; published 24 Apr 2008.

Temporal characterization of auditory-visual coupling in speech

Adriano V. Barbosa (1), Hani C. Yehia (2) and Eric Vatikiotis-Bateson (1)
(1) Linguistics, University of British Columbia, Vancouver, Canada
(2) Electronics, Federal University of Minas Gerais, Belo Horizonte, Brazil
adriano.vilela@gmail.com, evb@interchange.ubc.ca, hani@cefala.org

1 Introduction

This paper introduces two important improvements to our system for processing multimodal signals that can be measured during spoken communication: 1) the computation of correspondences between time-varying measures that are sensitive to temporally local fluctuations; 2) the transduction of visible motion from simple video recordings in a field situation where the use of passive markers or makeup is unacceptable.

Over the past decade, we have applied both linear and nonlinear estimation techniques to characterize the largely linear correspondences between vocal tract articulation, the speech acoustics, and visible motions of the head and face. Since the time-varying vocal tract shapes both the acoustics and the face through positioning of the tongue, jaw and lips, it is not surprising that measures made in these three domains should be related somehow. When applied to isolated sentences and longer stretches of connected speech, our previous analyses have shown there to be largely linear correspondences, for example, between spectral acoustic parameters (Line Spectrum Pairs, LSP; Sugamura and Itakura, 1986) and motion of the lips, cheeks, and chin (Yehia et al., 1998, 1999), and between fundamental frequency (F0) and rigid body head motion (Yehia et al., 2002). These correspondences have been computed using relatively few (5-10) parameters for each measurement domain. For example, the first 5-6 principal components of 2D midsagittal motion of the tongue, jaw, and lips, and a similar number of components for face and head motion, are typically sufficient to recover more than 95% of the variance in each of these domains. Simple linear models applied to these reduced numbers of principal components are then usually able to account for 80-90% of the cross-domain variance.
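As a sketch of this reduce-then-regress procedure (our illustration, not the authors' code; the component count and the input matrices are stand-ins):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def cross_domain_fit(A, B, n_components=6):
    """Linear estimation between two measurement domains on
    PCA-reduced parameters (e.g., face-marker positions vs. LSPs).

    Returns the fitted linear map and the fraction of cross-domain
    variance it accounts for.
    """
    za = PCA(n_components).fit_transform(A)  # rows = frames, cols = components
    zb = PCA(n_components).fit_transform(B)
    model = LinearRegression().fit(za, zb)   # linear map between reduced spaces
    return model, model.score(za, zb)        # score() is the R^2 variance accounted for
```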

The simplicity of these correspondences greatly facilitated the creation of a linguistically valid talking head animation system that, running in real time, can be driven by measured vocal tract, acoustics, or visible motion of the head and face (for details of the animation system, see Kuratate et al., 2005; for the perceptual validation, see Munhall et al., 2004).

A major limitation of this system, however, has been that the correspondences, for example, between face motion and acoustic LSPs, are computed globally over the entire signal. For isolated sentences spanning 1-2 seconds, this is fine. However, when longer stretches of data are considered, the correspondences do not improve; if anything they weaken somewhat (Vatikiotis-Bateson and Yehia, 2002). That is, the computation of correspondences between signals is based on a static set of parameters, computed once, which means that there is no distinction between the spatiotemporal variations that characterize the behavioral structure and the local fluctuations that degrade the result when the computed parameters are applied recursively to estimate the time-varying behavior (Moreira and Yehia, 2006). To address this limitation, we introduce an algorithm, based on recursive correlation (Aarts et al., 2002), that computes the instantaneous cross-correlation between measurement domains, e.g., head motion and acoustic amplitude (root mean square, RMS). This allows rapid changes in correspondence between auditory-visual events to be evaluated as a function of time, while also potentially improving the accuracy of cross-domain correspondences computed over analysis windows of any size.

A second limitation of our system has been the dependence on markers placed on the face and head for tracking 2D or 3D motion. The use of markers, either active (e.g., wired infrared LEDs) or passive, is physically invasive and distracting for naive experimental subjects, and restricts data collection to the laboratory, a situation which is stressful for the elderly and other populations unaccustomed to formal research. In this study, however, the motion data were all extracted from video recordings made in the field using a relatively unobtrusive digital video (DV) camcorder. Through the simple technique of computing the optical flow (Horn and Schunck, 1981) and then summing the amplitudes (and discarding the directions) of the

pixel motion vectors for each frame step, signals similar to those derived from marker tracking were created. As shown below (see Figures 3-5), even a single channel representing all of the motion in a video frame captures significant aspects of the spatiotemporal behavior.

For the purpose of demonstration, the algorithm is applied to audiovisual behavior produced by a speaker of Plains Cree (Alberta, Canada) as part of an investigation of language as performance with R-M Déchaine and J Deschamps. The analysis of the time-varying coupling of multimodal events clearly has implications for our understanding of speech organization and the assessment of communicative coordination between speaker and listener. In this study, we focus in particular on the coordination between orofacial motions, the amplitude (RMS) of the speech acoustics, and the motion of the speaker's hands. The specific use to which this has been put in the investigation of Cree is in assessing the coordination of hand gestures and speech acoustic parameters in the collaborative construction of meaning. This includes instances where the explicit meaning does not reside solely in the words (or the visible gestures), and instances where iconic use of the hands shows secondary iconic specification in anaphoric structures (e.g., bringing the hand to touch the head once in a narrative to indicate that the speaker was thinking about something, and then subsequently producing a reduced motion in the direction of the head to indicate the same thing). The timing and structure of these gestures are coordinated with, but not necessarily determined by, the speech acoustics.

The remainder of this paper is organized as follows. Section 2 presents the mathematical formulation of our instantaneous correlation algorithm. Section 3 describes the data acquisition process and discusses the optical flow analysis applied to the acquired video sequences. Results are presented and discussed in Section 4. Finally, the paper is summarized in Section 5.

2 Instantaneous correlation algorithm

The instantaneous correlation coefficient \rho(k) between signals x(k) and y(k) is computed as

    \rho(k) = \frac{S_{xy}(k)}{\sqrt{S_{xx}(k)\, S_{yy}(k)}},    (1)

where the instantaneous covariance S_{xy}(k) between signals x(k) and y(k) is given by

    S_{xy}(k) = c \sum_{l=0}^{\infty} e^{-\eta l} \, \big(x(k-l) - \bar{x}(k-l)\big) \big(y(k-l) - \bar{y}(k-l)\big),    (2)

which is a modification of Equation (4) in Aarts et al. (2002). The instantaneous means \bar{x}(k) and \bar{y}(k) are computed as

    \bar{x}(k) = c \sum_{m=0}^{\infty} e^{-\eta m} \, x(k-m),    (3)

    \bar{y}(k) = c \sum_{m=0}^{\infty} e^{-\eta m} \, y(k-m),    (4)

with the constant c given by

    c = 1 - e^{-\eta},    (5)

where \eta is a small positive number. It is interesting to note that the signal \bar{x}(k) can be seen as the product of the constant c and the output of a first-order low-pass linear filter excited by the signal x(k) (the same is valid for \bar{y}(k) and y(k)). The z-transform representation of this linear filter is given by

    H(z) = \frac{1}{1 - e^{-\eta} z^{-1}},    |z| > e^{-\eta}.    (6)

Furthermore, the covariance S_{xy}(k) as defined by Equation (2) can also be seen as the product of the constant c and the output of the filter in Equation (6) when excited by the signal

    \big(x(k) - \bar{x}(k)\big) \big(y(k) - \bar{y}(k)\big).    (7)
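Because each sum above is an exponentially weighted average, the whole scheme reduces to one-pole recursions. A minimal NumPy sketch (our code, not the authors' Matlab toolbox; the zero initialization and the guard against zero variance are our choices):

```python
import numpy as np

def instantaneous_correlation(x, y, eta=0.02):
    """Recursive correlation of Equations (1)-(6): exponentially
    weighted means and covariances updated sample by sample."""
    a = np.exp(-eta)          # forgetting factor e^{-eta}
    c = 1.0 - a               # normalizing constant, Equation (5)
    xm = ym = 0.0             # instantaneous means, Equations (3)-(4)
    sxx = syy = sxy = 0.0     # instantaneous (co)variances, Equation (2)
    rho = np.zeros(len(x))
    for k in range(len(x)):
        xm = a * xm + c * x[k]            # one-pole low-pass, Equation (6)
        ym = a * ym + c * y[k]
        dx, dy = x[k] - xm, y[k] - ym     # deviations from the running means
        sxy = a * sxy + c * dx * dy
        sxx = a * sxx + c * dx * dx
        syy = a * syy + c * dy * dy
        d = np.sqrt(sxx * syy)
        rho[k] = sxy / d if d > 0 else 0.0  # Equation (1)
    return rho
```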

3 Data recording and processing

The speaker's hands and face were recorded simultaneously at 24 frames per second (fps) using two DV cameras during a 6 minute interview. Stereo sound was recorded digitally by each camera at 48 kHz via two professional microphones (Tram-50 lapel, Sennheiser 416 shotgun).

Measures of 2D motion were extracted from the video recordings using the optical flow algorithm developed by Horn and Schunck (1981). Figure 1 shows a video frame for each camera and the resulting optical flow computed between that frame and the next frame in the sequence.

Figure 1: A snapshot of the video and the optical flow between the frame shown and the next frame in the video sequence.

There are many algorithms for computing optical flow (Barron et al., 1994). However, they all have the same goal of calculating optical flow fields corresponding to the projection of the 3D motion of objects in the world onto the 2D image. A standard definition (SDTV) frame of NTSC digital video is 640 pixels wide by 480 pixels high. Each pixel has a luminance, or intensity, value within an 8-bit (0-255) range. Pixels also have values for color, but these are discarded in calculating optical flow. Moving images are recorded as changes in the intensity (and color) values for the pixels in the image array that are influenced by the motion. The optical flow algorithm does not merely register the change of intensity from one image to the next for each pixel; rather, it attempts to keep track of specific intensity values, corresponding to image objects, as they change location within the pixel array. Thus, the

algorithm assigns a motion vector, consisting of a magnitude and a direction, to each pixel, based on where the intensity associated with that pixel in one image is located in the next image in the sequence. The direction is simply the line from the first pixel to the second, and the magnitude corresponds to the Euclidean distance between them. The array of motion vectors comprises the optical flow field.

For the purposes of the current analysis, only the magnitude (speed) of pixel motion is needed to assess the coordination of the hand and face-head motion with respect to each other and to the speech acoustics. The richness of this information is readily apparent in movies constructed by representing the magnitude component of optical flow as intensity, so that more rapid changes appear brighter (have higher intensity) in the image sequence. Optical flow captures the motion and makes it possible to assess the coordination of hand motion, eye blinks, head motion, and even events in the speech acoustics. In principle, the motion associated with specific regions of interest, such as the eyes, mouth, and head, or the left and right hands, can be examined independently. At this early stage of development, however, our goals are to introduce the instantaneous correlation algorithm, use optical flow to recover motion from video, and show how these techniques can be used to assess the time-varying correspondences between speech acoustics, head and orofacial motion, and hand gestures. To do this, the 640x480 magnitudes of motion associated with each pair of consecutive frames are summed and stored as unidimensional streams for the video sequences acquired by the two cameras (one of the face and head, the other of the hands). Summing the motion for the entire video frame obscures the contribution of specific components (for example, the potentially differential contribution of each hand is lost) and reduces the dimensionality of each measurement domain to one time-varying measure. However, as a first step, this has two advantages: 1) no a priori decisions about which aspects of the motion or which physical regions are the most relevant to the cross-domain correspondences have been made; and 2) as shown below, these supposedly impoverished measures are surprisingly well-coordinated across domains. In what follows, the instantaneous correlation algorithm is used to compare the two streams of motion magnitudes, summed from the optical flow results, with each other (Figure 3) and with the time-varying RMS amplitude of the acoustics (Figures 4-5).
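A compact sketch of this frame-by-frame reduction (our code; it uses OpenCV's dense Farnebäck flow as a stand-in for the Horn-Schunck algorithm, and the function name and video-file input are illustrative assumptions):

```python
import cv2
import numpy as np

def motion_magnitude_stream(video_path):
    """Collapse each frame step of a video into one number: the sum of
    optical-flow magnitudes (pixel speeds), with directions discarded."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # color is discarded
    stream = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.hypot(flow[..., 0], flow[..., 1])  # per-pixel speed
        stream.append(mag.sum())                    # one value per frame step
        prev = gray
    cap.release()
    return np.asarray(stream)
```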

4 Results

Computing instantaneous correlations that match what we readily perceive qualitatively from watching the optical flow movies requires tuning the algorithm to the behavioral data. The algorithm should not be too sensitive to rapid changes in correspondence, e.g., changes due to noise or to higher frequency components in the behavior. Signal noise can be reduced by low-pass filtering (10 Hz used here). Behavioral noise is another issue: fluctuations in synchronization and higher frequencies will confound a sensitive function, while a less sensitive function will miss the subtle changes in spatiotemporal patterning.

Figure 2: Comparison of the temporal effects on relative weight for two values of η, 0.2 and 0.02 (the latter used in our analysis).

Sensitivity is determined by the exponent η in Equation (3). The larger its value, the more sensitive the correlation is to rapid changes in the correspondence between signals. Figure 2 shows, for two different values of η, how preceding samples influence the correlation estimate. The smaller value (η = 0.02) gives a slower decline in the weight of preceding samples, decaying to less than 1% after 250 samples, and has proven to be a good value for the continuous correlation of the running speech data recorded for Plains Cree. This amounts to about a 5 sec. window when applied to signals resampled at a common rate of 48 Hz.
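The window size quoted above follows directly from the exponential weighting; as a quick check (our arithmetic, not from the paper):

```latex
e^{-\eta l} < 0.01
\;\Longleftrightarrow\;
l > \frac{\ln 100}{\eta} \approx \frac{4.6}{0.02} = 230 \ \text{samples}
\approx 4.8\ \text{s at } 48\ \text{Hz}.
```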

Figure 3: Time-series plots of instantaneous correlation between face and hand motion (top), calculated from two starting points (solid, 8 sec. prior to the window; dashed, from the window onset), summed optical flow for the hands video (middle), and summed optical flow for the head and face video (bottom).

This can be seen in the top panel of Figure 3, where the dashed correlation trace was computed from the start of the window and the solid trace 8 sec. earlier. The relatively high correlation in the first 3-4 sec. of Figure 3 can be seen by inspecting the motion traces for the hands and head/face. Also, comparing the audio waveform (Figure 3, top) and the motion traces shows that the 5 bursts of hand motion activity are apparently more synchronous with the speech signal than is the face motion. This latter observation is supported by the instantaneous correlation results shown for RMS amplitude and hand motion in Figure 4, and for RMS amplitude and face/head motion in Figure 5. Throughout the segments of data depicted in these figures, the correlation between RMS amplitude and face/head motion suffers from both poor synchronization and frequency mismatches, in which the optical flow for the face/head sums across the perhaps semi-independent orofacial deformations due to speech articulation, eye blinks, and head motion.

Figure 4: Time-series plots of instantaneous correlation between RMS amplitude and hand motion (top), RMS amplitude (middle), and summed hand motion (bottom).

The time-course of RMS amplitude clearly corresponds to the time-course of vocal tract opening (vowels) and closing (consonants). It has also been suggested that head motion is associated with RMS amplitude, but not necessarily in strict synchrony with successive syllables (Munhall et al., 2004). This difference in phasing alone would contribute to the complex frequencies observed for face/head motion. Eye blinks add yet another dimension to the temporal stream of events. Although beyond the scope of the current paper, Matlab tools have been created to accommodate variable synchronization between signals and to do spectral decomposition prior to computing the correlations (Barbosa et al., 2007a).
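For reference, a sketch of one way to derive the RMS amplitude track used in Figures 4-5 (our code; the 10 Hz low-pass and the 48 Hz common rate come from the text above, while the filter order and frame length are our assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def rms_amplitude(audio, sr=48000, frame_rate=48, lp_hz=10.0):
    """Short-time RMS of the speech waveform, one value per 1/frame_rate s,
    low-pass filtered to suppress signal noise."""
    hop = sr // frame_rate                    # samples per analysis frame
    n = len(audio) // hop
    frames = audio[:n * hop].reshape(n, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))         # frame-wise RMS
    b, a = butter(4, lp_hz / (frame_rate / 2), btype="low")
    return filtfilt(b, a, rms)                        # zero-phase low-pass
```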

Figure 5: Time-series plots of instantaneous correlation between RMS amplitude and face motion (top), RMS amplitude (middle), and summed face motion (bottom).

5 Summary discussion

As can be seen from this simple demonstration, both optical flow and continuous instantaneous correlation promise to be useful in assessing the spatiotemporal coupling between behavioral signals that can be measured as non-invasively as possible. In the implementation presented here, relatively high correlations are observed only for events that are tightly synchronized. Since presenting this poster in June 2007 (Barbosa et al., 2007b), the algorithm has been modified to compute correlations 1) weighted by any combination of preceding and following samples, and 2) at any temporal offset between the two signals. These improvements preclude real-time processing, but provide more robust assessments of the correspondence between signals. Both modifications should afford larger values of η that asymptote more quickly, thus effectively reducing the size of the filter window (Figure 2). While this results in greater sensitivity to fluctuations in the instantaneous correlation coefficient, the greater sensitivity can be used to assess shifts in temporal lag without necessarily reducing the degree of correspondence that occurs naturally during the production of coordinated behaviors such as speech and music.
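A sketch of the temporal-offset extension (our code, built on the instantaneous_correlation sketch from Section 2; the wrap-around shift is a simplification for illustration):

```python
import numpy as np

def lagged_correlation(x, y, lags, eta=0.02):
    """Instantaneous correlation at a range of temporal offsets,
    giving a lag-by-time map of correspondence. Offline only."""
    rho = np.zeros((len(lags), len(x)))
    for i, lag in enumerate(lags):
        rho[i] = instantaneous_correlation(x, np.roll(y, lag), eta)
    return rho

# e.g., scan offsets of up to +/- 0.5 s at a 48 Hz common rate:
# rho_map = lagged_correlation(face, hand, lags=range(-24, 25))
```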

Most of this expanded functionality has already been incorporated in the Matlab toolbox that we have created for processing and analyzing multimodal speech data (Barbosa et al., 2007a), and is available to the research community.

Even with these improvements, there are still instances of coupling between the speech and gestural behavior that cannot be easily captured. These are due, in part, to summing the motion for the face/head. Therefore, we are currently attempting a more fine-grained decomposition of this complex into head, perioral, and eye components, which we know are each coordinated with the production of speech. Finally, if we need to make the algorithm smarter, so that its sensitivity can be modified on the fly, we will replace the current instantaneous correlation algorithm with a learning algorithm, e.g., Kalman filtering (Kalman, 1960), that combines prediction from previous patterns of behavior with local estimates of the instantaneous correlation.

Acknowledgment

Support for this work was provided by NSERC and SSHRC grants to E. Vatikiotis-Bateson. The Plains Cree data were collected in collaboration with Rose-Marie Déchaine, Clancy Dennehy, and Joseph Deschamps.

References

Aarts, R. M., Irwan, R., and Janssen, A. J. E. M. (2002). Efficient tracking of the cross-correlation coefficient. IEEE Transactions on Speech and Audio Processing, 10(6).

Barbosa, A. V., Yehia, H. C., and Vatikiotis-Bateson, E. (2007a). Matlab toolbox for audiovisual speech processing. In Vroomen, J., Swerts, M., and Krahmer, E., editors, International Conference on Auditory-Visual Speech Processing (AVSP 2007), pages 132-137, The Netherlands. ISCA.

Barbosa, A. V., Yehia, H. C., and Vatikiotis-Bateson, E. (2007b). Temporal characterization of auditory-visual coupling in speech. Journal of the Acoustical Society of America, 121:3044.

Barron, J. L., Fleet, D. J., and Beauchemin, S. S. (1994). Performance of optical flow techniques. International Journal of Computer Vision, 12:43-77.

Horn, B. K. P. and Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17:185-203.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45.

Kuratate, T., Vatikiotis-Bateson, E., and Yehia, H. C. (2005). Estimation and animation of faces using facial motion mapping and a 3D face database. In Clement, J. G. and Marks, M. K., editors, Computer-Graphic Facial Reconstruction. Academic Press, Amsterdam.

Moreira, K. S. and Yehia, H. C. (2006). Analysis of the variability of the coupling between facial motion and speech acoustics. In Yehia, H. C., Demolin, D., and Laboissière, R., editors, International Seminar on Speech Production (ISSP 2006), pages 109-116, Brazil. UFMG.

Munhall, K. G., Jones, J. A., Callan, D. E., Kuratate, T., and Vatikiotis-Bateson, E. (2004). Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science, 15(2):133-137.

Sugamura, N. and Itakura, F. (1986). Speech analysis and synthesis methods developed at ECL in NTT: from LPC to LSP. Speech Communication, 5:199-215.

Vatikiotis-Bateson, E. and Yehia, H. C. (2002). Speaking mode variability in multimodal speech production. IEEE Transactions on Neural Networks, 13(4).

Yehia, H. C., Kuratate, T., and Vatikiotis-Bateson, E. (1999). Using speech acoustics to drive facial motion. In Ohala, J. J., Hasegawa, Y., Ohala,

M., Granville, D., and Bailey, A. C., editors, Proceedings of the 14th International Congress of Phonetic Sciences, volume 1, San Francisco, CA. Linguistics Dept., UC Berkeley.

Yehia, H. C., Kuratate, T., and Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion, and speech acoustics. Journal of Phonetics, 30(3):555-568.

Yehia, H. C., Rubin, P. E., and Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract and facial behavior. Speech Communication, 26:23-43.
