Joint Processing of Audio and Visual Information for Speech Recognition


1 Joint Processing of Audio and Visual Information for Speech Recognition. Chalapathy Neti and Gerasimos Potamianos, IBM T.J. Watson Research Center, Yorktown Heights, NY. With Iain Matthews, CMU, USA, and Juergen Luettin, IDIAP, Switzerland. Baltimore, 07/14/00

2 JHU Workshop, 2000. Theme: Audio-visual speech recognition. Goal: Use visual information to improve audio-based speech recognition, particularly in acoustically degraded conditions. TEAM: Senior researchers: Chalapathy Neti (lead) and Gerasimos Potamianos (IBM), Iain Matthews (CMU), Juergen Luettin (IDIAP, Switzerland), Andreas Andreou (JHU). Graduate students: Herve Glotin (ICP, Grenoble), Eugenio Culurciello (JHU). Undergraduates: Azad Mashari (U.Toronto), June Sison (UCSC). DATA: Subset of the IBM VVAV data, LDC broadcast video data.

3 Perceptual Information Interfaces (PII). Joint processing of audio, visual and other information for active acquisition, recognition and interpretation of objects and events. Is anybody speaking? (Speech, music, etc.) Who is speaking? What is being said? What is the speaker's state (e.g., emotional state) while speaking? What is the environment? (Office, studio, street, etc.) Who is in the environment? (Crowd, meeting, anchor, etc.)

4 Audio-Visual Processing Technologies. Exploit visual information to improve audio-based processing in adverse acoustic conditions. Example contexts: Audio-Visual Speech Recognition (MMSP99, ICASSP00, ICME00): combine visual speech (visemes) with audio speech (phones) recognition. Audio-Visual Speaker Recognition (AVSP99, MMSP99, ICME00): combine audio signatures of a person with visual signatures (face-id). Audio-Visual Speaker Segmentation (RIAO00): combine visual scene change with audio-based speaker change. Audio-Visual Speech-event detection (ICASSP00): combine visual speech onset cues with audio-based speech energy. Audio-Visual Speech synthesis (ICME00). Applications: human-computer interaction, multimedia content indexing, HCI for people with special needs. In this workshop, we propose to attack the problem of LV (large-vocabulary) audio-visual speech recognition. [Diagram: audio and video data streams combined via decision fusion into an A-V descriptor.]

5 Audio-Visual ASR: Motivation. OBJECTIVE: To significantly improve ASR performance by joint use of audio and visual information. FACTS / MOTIVATION: Lack of ASR robustness to noise and to mismatched training-testing conditions. Humans speechread to increase the intelligibility of noisy speech (Summerfield experiment). Humans fuse audio and visual information in speech recognition (McGurk effect: acoustic /ba/ + visual /ga/ is perceived as /da/). Hearing-impaired people speechread well. Visual and audio information are partially complementary: acoustically confusable phonemes belong to different viseme classes. Inexpensive capture of visual information.

6 Transcription Accuracy - Results from the '98 DARPA Evaluation (IBM Research System). [Chart: word accuracy by focus condition - prepared speech, conversation, telephone, background music, background noise, foreign accent, combinations, and overall.]

7 Bimodal Human Speech Recognition (Summerfield, 79). Experiment: human transcription of audio-visual speech at SNR < 0 dB. A - acoustic only. B - acoustic + full video. C - acoustic + lip region. D - acoustic + 4 lip points. [Chart: word recognition rate for conditions A-D.]

8 Visemes vs. Phonemes. A viseme is a unit of visual speech. The phoneme-to-viseme mapping is many-to-one, and viseme classes differ from phonetic classes: e.g., /m/ and /n/ belong to the same acoustic class (nasals) but to separate viseme classes. Example viseme classes: 1. f, v; 2. th, dh; 3. s, z; 4. sh, zh; 5. p, b, m; 6. w; 7. r; 8. g, k, n, t, d, y; 9. l; 10. iy; 11. ah; 12. eh.
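As a minimal sketch (in Python, using ARPAbet-style symbols as on the slide), the many-to-one mapping can be stored as an inverted lookup:

```python
# The 12 viseme classes listed on this slide, keyed by class index.
VISEME_CLASSES = {
    1: ["f", "v"],
    2: ["th", "dh"],
    3: ["s", "z"],
    4: ["sh", "zh"],
    5: ["p", "b", "m"],
    6: ["w"],
    7: ["r"],
    8: ["g", "k", "n", "t", "d", "y"],
    9: ["l"],
    10: ["iy"],
    11: ["ah"],
    12: ["eh"],
}

# Invert to a phoneme -> viseme lookup; several phonemes share a class,
# so the mapping is many-to-one.
PHONE_TO_VISEME = {p: v for v, phones in VISEME_CLASSES.items() for p in phones}

# /m/ and /n/ are both nasals acoustically, yet visually distinct:
assert PHONE_TO_VISEME["m"] != PHONE_TO_VISEME["n"]
```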

9 McGurk Effect

10 Visual-Only ASR Performance. Digit / alpha recognition (Potamianos): single-speaker digits = 97.0%; single-speaker alphas = 71.1%; 50-speaker alphas = 36.5%. Phonetic classification (Potamianos, Neti, Verma, Iyengar): 162 speakers (VV-AV), LVCSR = 37.3%. [Chart: single-speaker digit accuracy for lip-shape, DWT, PCA, and LDA features.]

11 Audio-Visual ASR: Challenges. Suitable databases are just emerging. At IBM (Giri, Potamianos, Neti): collected a one-of-a-kind LVCSR audio-visual database (50 hrs, >260 subjects). Previous work: AT&T (Potamianos): 1.5 hr, 50 subjects, connected letters; Surrey (Kittler): M2VTS (37 subjects, connected digits). Visual front end: face and lip/mouth tracking; mouth center location and size estimates suffice, and are robust (Senior). Visual features suitable for speechreading: pixel-based image transforms of the video ROI (DCT, DWT, PCA, LDA). Audio-visual fusion (integration) for bimodal LVCSR: decision fusion by means of the multi-stream HMM; exponent training; confidence estimation of individual streams; integration level (state, phone, word).

12 The Audio-Visual Database. Previous work (CMU, AT&T, U.Grenoble, (X)M2VTS, Tulips databases): small vocabulary (digits, alphas); small number of subjects (1-10); isolated or connected word speech. The IBM ViaVoice audio-visual database (VV-AV): ViaVoice enrollment sentences; 50 hrs of speech; >260 subjects; MPEG2 compressed, 60 Hz, 704 x 480 pixels, frontal subject view. The largest such database to date.

13 Facial feature location and tracking

14 Face and Facial Feature Tracking. Locate the face in each frame (Senior, '99): search an image pyramid across scales and locations; each square m x n region is considered as a candidate face; statistical, pixel-based approach using LDA and PCA. Locate 29 facial features (6 mouth ones) within located faces: statistical, pixel-based approach; uses face geometry statistics to prune erroneous feature candidates.
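A hedged sketch of the pyramid search loop, assuming a trained linear (LDA-style) discriminant over flattened fixed-size patches; the patch size, scan step, scale set and threshold below are illustrative, not the values from Senior's system:

```python
import numpy as np

PATCH = 11  # illustrative patch size for the flattened face template

def face_candidates(image, w, threshold, scales=(1.0, 0.8, 0.64, 0.5)):
    """Scan an image pyramid; score each PATCH x PATCH window with a
    linear discriminant w (length PATCH*PATCH) and keep high scorers."""
    hits = []
    for s in scales:
        h, wid = int(image.shape[0] * s), int(image.shape[1] * s)
        # Nearest-neighbor downsampling keeps the sketch dependency-free.
        rows = (np.arange(h) / s).astype(int)
        cols = (np.arange(wid) / s).astype(int)
        level = image[np.ix_(rows, cols)]
        for y in range(0, h - PATCH, 2):
            for x in range(0, wid - PATCH, 2):
                patch = level[y:y + PATCH, x:x + PATCH].ravel()
                patch = (patch - patch.mean()) / (patch.std() + 1e-8)
                if float(w @ patch) > threshold:
                    # Map the hit back to original-image coordinates.
                    hits.append((x / s, y / s, PATCH / s))
    return hits
```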

15 Visual Speech Features (I). Lip-shape based features: lip area, height, width, perimeter (Petajan, Benoit); lip image moments (Potamianos). Lip contour parametrization: deformable template parameters (Stork); active shape models (Luettin, Blake); Fourier descriptors (Potamianos). These fail to capture oral cavity information.

16 Visual Speech Features (II). Image transform, pixel-based approach. Considers a video region of interest: V(t) = { V(x,y,z) : x = 1,...,m; y = 1,...,n; z = t-t_L,...,t+t_R }. Compresses the region of interest using an appropriate image transform (trained from the data), such as: discrete cosine transform (DCT); discrete wavelet transform (DWT); principal component analysis (PCA); linear discriminant analysis (LDA). Final feature vector: o_v(t) = S G P' V(t).
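A minimal sketch of this pipeline, assuming the transforms P (PCA), G (LDA) and S (MLLT) have been trained offline; the dimensions below are illustrative stand-ins:

```python
import numpy as np

def visual_features(roi_stack, P, G, S):
    """roi_stack: (2*t_ctx + 1, m, n) grayscale ROI frames around time t.
    Returns the compressed visual observation o_v(t) = S G P' V(t)."""
    v = roi_stack.astype(np.float64).ravel()  # V(t) as one long vector
    y = P.T @ v                               # PCA projection, P' V(t)
    y = G @ y                                 # LDA, class discrimination
    return S @ y                              # MLLT rotation

# Example with random stand-ins for the trained transforms:
m, n, ctx = 24, 32, 1
dim = (2 * ctx + 1) * m * n
P = np.random.randn(dim, 100)   # PCA basis: dim -> 100
G = np.random.randn(41, 100)    # LDA: 100 -> 41 (illustrative)
S = np.random.randn(41, 41)     # MLLT: square rotation
o_v = visual_features(np.random.rand(2 * ctx + 1, m, n), P, G, S)
```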

17 Cascade Transformation. [Chart: comparison of PCA, PCA+LDA, and PCA+LDA+MLLT feature cascades.]

18 AAM Features (Iain Matthews). Statistical model of: shape - point distribution model (PDM); appearance - shape-free eigenface; combined shape + appearance - combined appearance model. Iterative fit algorithm: the Active Appearance Model (AAM) search. [Figure: model modes shown at -2 s.d., mean, and +2 s.d.]

19 Point Distribution Model. [Media clip]

20 Active Shape Models. Active Shape Model (ASM): a PDM in action - iterative fitting constrained to a statistically learned shape space. Iterative fitting algorithm (Cootes et al.): point-wise search along the normal for an edge, or for the best fit of a profile function; find the pose update and apply shape constraints; reiterate until convergence. Simplex minimisation (Luettin et al.): concatenate grey-level profiles; use PCA to calculate a grey-level profile distribution model (GLDM): x_p = x̄_p + P_p b_p.

21 Active Appearance Model (AAM). Extension of ASMs to model both appearance and shape in a single statistical model (Cootes and Taylor, 1998). Shape model: the same PDM used in ASM tracking, x = x̄ + P_s b_s. Color appearance model: warp each example's landmark points to the mean shape; sample color and normalise; use PCA to calculate the appearance model, g = ḡ + P_g b_g. Combined shape-appearance model: concatenate the shape and appearance parameters.
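A minimal sketch of the combined parameterization, assuming trained PCA bases and means for shape and appearance; W_s is the usual diagonal weighting that balances shape against appearance units, and Q is a PCA of the concatenated parameter vectors (all shapes illustrative):

```python
import numpy as np

def aam_params(x, g, xbar, gbar, P_s, P_g, W_s, Q):
    """Project a shape x and shape-normalised appearance g into
    combined AAM parameters c, via b = [W_s b_s; b_g] and PCA Q."""
    b_s = P_s.T @ (x - xbar)          # shape parameters
    b_g = P_g.T @ (g - gbar)          # appearance parameters
    b = np.concatenate([W_s @ b_s, b_g])
    return Q.T @ b                    # combined parameters c

def aam_reconstruct(c, xbar, gbar, P_s, P_g, W_s, Q, n_s):
    """Invert: recover shape and appearance from combined params c."""
    b = Q @ c
    b_s = np.linalg.solve(W_s, b[:n_s])   # undo the shape weighting
    b_g = b[n_s:]
    return xbar + P_s @ b_s, gbar + P_g @ b_g
```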

22 Face appearance models

23 Audio-Visual Sensor Fusion. Feature fusion vs. decision fusion.

24 Feature Fusion. Method: Concatenate audio and visual observation vectors (after time-alignment): O(t) = [ O_A(t), O_V(t) ]. Train a single-stream HMM: P[ O(t) | state j ] = Σ_{m=1..M_j} w_jm N( O(t); μ_jm, Σ_jm ).
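A minimal sketch of feature fusion: concatenate time-aligned audio and visual frames, then score a diagonal-covariance GMM state model. All parameters below are random stand-ins, not trained values:

```python
import numpy as np

def gmm_log_likelihood(o, weights, means, variances):
    """log P[o | state j] for a diagonal-covariance GMM."""
    d = o.shape[0]
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((o - means) ** 2 / variances, axis=1)
    return float(np.logaddexp.reduce(np.log(weights) + log_norm + log_exp))

o_audio = np.random.randn(60)    # e.g., a cepstra+LDA+MLLT frame
o_visual = np.random.randn(41)   # e.g., a DWT+LDA+MLLT frame
o_av = np.concatenate([o_audio, o_visual])   # O(t) = [O_A(t), O_V(t)]

M, D = 8, o_av.shape[0]          # mixture components, feature dimension
w = np.full(M, 1.0 / M)          # mixture weights w_jm
mu = np.random.randn(M, D)       # means mu_jm
var = np.ones((M, D))            # diagonal covariances Sigma_jm
score = gmm_log_likelihood(o_av, w, mu, var)
```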

25 Decision Fusion. Formulation (general setting): S information sources about classes j ∈ J = {1,...,J}. Time-synchronous observations per source: O_s(t) ∈ R^{D_s}, s = 1,...,S. Multi-modal observation vector: O(t) = [ O_1(t), O_2(t),..., O_S(t) ] ∈ R^D. Single-modal HMMs: Pr[ O_s(t) | j ] = Σ_{m=1..M_js} w_jms N_{D_s}( O_s(t); μ_jms, Σ_jms ). Multi-stream HMM: Score[ O(t) | j ] = Σ_{s=1..S} λ_js log Pr[ O_s(t) | j ]. Stream exponents: 0 ≤ λ_js ≤ S, Σ_{s=1..S} λ_js = S, for all j ∈ J.
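A minimal sketch of the multi-stream state score; the per-stream likelihoods are assumed trained elsewhere (e.g., the single-stream GMMs above), and the stand-in callables below are illustrative only:

```python
import numpy as np

def multistream_score(obs_by_stream, loglik_by_stream, exponents):
    """Score[O(t) | j] = sum_s lambda_js * log Pr[O_s(t) | j].
    obs_by_stream: per-stream observation vectors O_s(t).
    loglik_by_stream: callables o -> log Pr[o | j].
    exponents: lambda_js per stream, with sum_s lambda_js = S."""
    return sum(
        lam * loglik(o)
        for o, loglik, lam in zip(obs_by_stream, loglik_by_stream, exponents)
    )

# Example: weight audio more than video in clean acoustics (S = 2).
lambdas = np.array([1.4, 0.6])
score = multistream_score(
    [np.zeros(60), np.zeros(41)],
    [lambda o: -0.5 * float(o @ o) - 50.0,   # stand-in audio log-likelihood
     lambda o: -0.5 * float(o @ o) - 40.0],  # stand-in visual log-likelihood
    lambdas,
)
```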

26 Multi-Stream HMM Training. HMM parameters: stream parameters λ_s = [ ( w_jms, μ_jms, Σ_jms ), m = 1,...,M_js, j ∈ J ]; stream exponents λ_str = [ λ_js, j ∈ J, s = 1,...,S ]; combined parameters λ = [ λ_s, s = 1,...,S; λ_str ]. Train single-stream HMMs first (EM training): λ_s^(k+1) = argmax_{λ_s} Q( λ^(k), λ_s | O ), s = 1,...,S. Discriminatively train the exponents by means of the GPD algorithm. Results (50-subject connected letters, clean speech): audio-only word accuracy: 85.9%; visual-only word accuracy: 36.5%; audio-visual word accuracy: 88.9%.
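The slides train the exponents discriminatively with GPD; as a simpler, plainly-substituted alternative, the sketch below grid-searches a single global audio exponent on held-out data under the constraint λ_A + λ_V = S = 2:

```python
import numpy as np

def tune_exponents(score_fn, dev_utterances, steps=21):
    """Grid search over a global audio exponent (NOT the GPD training
    used in the slides). score_fn(utt, lambdas) -> word accuracy."""
    best_lambdas, best_acc = None, -np.inf
    for lam_a in np.linspace(0.0, 2.0, steps):
        lambdas = np.array([lam_a, 2.0 - lam_a])   # constraint: sum = S = 2
        acc = float(np.mean([score_fn(u, lambdas) for u in dev_utterances]))
        if acc > best_acc:
            best_lambdas, best_acc = lambdas, acc
    return best_lambdas, best_acc
```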

27 Multi-Stream HMM (Juergen Luettin et al.). Can allow for asynchrony between streams. [Diagram: parallel acoustic and visual feature streams.]

28 Audio-visual decoding

29 Fusion Research Issues. The stream independence assumption is unrealistic. A probabilistic formulation is preferable to a "score" formulation. Integration level: feature, state, phone, word. Classes of interest: visemes vs. phonemes. Stream confidence estimation: entropy-based, SNR-based, smoothly time-varying? General decision fusion function (one instance is sketched below): Score[ O(t) | j ] = f( Pr[ O_s(t) | j ], s = 1,...,S ). Lattice / n-best rescoring techniques.
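One hedged instance of such a general fusion function, using the entropy-based stream confidence raised in the list above; this is an illustrative choice, not the workshop's method:

```python
import numpy as np

def entropy_confidence(posteriors, eps=1e-12):
    """posteriors: (J,) class posteriors for one stream at time t.
    Returns a confidence in [0, 1]: 1 minus the normalised entropy,
    so a peaked (low-entropy) posterior means high confidence."""
    p = np.clip(posteriors, eps, 1.0)
    H = -np.sum(p * np.log(p))
    return 1.0 - H / np.log(p.shape[0])

def fused_score(posteriors_by_stream):
    """Score[O(t) | j] = sum_s c_s * log p_s(j | O_s(t)), with
    confidences c_s derived from each stream's posterior entropy."""
    conf = np.array([entropy_confidence(p) for p in posteriors_by_stream])
    conf = conf / (conf.sum() + 1e-12)          # normalise the weights
    return sum(c * np.log(np.clip(p, 1e-12, 1.0))
               for c, p in zip(conf, posteriors_by_stream))
```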

30 Audio-Visual Speech Recognition. Word recognition experiments: 5.5 hrs of training, 162 speakers; 1.4 hrs of testing, 162 speakers. Audio features: cepstra+LDA+MLLT. Visual features: DWT+LDA+MLLT. Audio-visual feature fusion: (cepstra, DWT)+LDA+MLLT. Noise: speech noise at 10 dB SNR. [Chart: word accuracy of audio-only, visual-only, and A-V systems in clean and 10 dB SNR conditions.]

31 Summary. The workshop will concentrate on decision fusion techniques for audio-visual ASR: multi-stream HMMs, lattice rescoring, and others. We (IBM) are providing the LVCSR AV database and baseline lattices and models. This workshop is a unique opportunity to advance the state of the art in audio-visual ASR.

32 Scientific Merit. Significantly improve LVCSR robustness: from speaker-dependent digit/alpha tasks to speaker-independent LVCSR. A first step towards perceptual computing: use of diverse sources of information (audio; visual - gaze, pointing, facial gestures; touch) to robustly extract human activity and intent for human information interaction (HII). An exemplar for developing a generalized framework for information fusion: measures of reliability of information sources and their use in making a joint decision; fusion functions.

33 Applications. Perceptual interfaces for HII: speech recognition, speaker identification, speaker segmentation, human intent detection, speech understanding, speech synthesis, scene understanding. Multimedia content indexing and retrieval (MPEG-7 context): Internet digital media content is exploding; network companies have aggressive plans for digitizing archival content. HCI for people with special needs, e.g., people with speech impairments due to hearing loss.

34 Workshop Planning. Pre-workshop: data preparation (IBM VVAV and LDC CNN) - data collection, transcription and annotation; audio and visual front ends (DCT); setup for AAM feature extraction; trained audio-only and visual-only HMMs; audio-only lattices. Workshop: AAM-based visual front end; baseline audio-visual HMM using feature fusion; lattice rescoring experiments; time-varying HMM stream exponent estimation; stream information / confidence measures; adaptation to CNN data; other recombination functions.
