Phoneme Analysis of Image Feature in Utterance Recognition Using AAM in Lip Area

Size: px
Start display at page:

Download "Phoneme Analysis of Image Feature in Utterance Recognition Using AAM in Lip Area"

Transcription

1 MIRU 7 AAM {komai,miyamoto}@me.cs.scitec.kobe-u.ac.jp, {takigu,ariki}@kobe-u.ac.jp Active Appearance Model Active Appearance Model combined DCT Active Appearance Model combined Phoneme Analysis of Image Feature in Utterance Recognition Using AAM in Lip Area Yuto KOMAI, Chikoto MIYAMOTO, Tetsuya TAKIGUCHI, and Yasuo ARIKI Graduate School of System Informatics, Kobe University Rokkodaicho 1 1, Nada-ku, Kobe, Hyogo, Japan Organization of Advanced Science and Technology, Kobe University Rokkodaicho 1 1, Nada-ku, Kobe, Hyogo, Japan {komai,miyamoto}@me.cs.scitec.kobe-u.ac.jp, {takigu,ariki}@kobe-u.ac.jp Abstract Lip-reading, that recognizes the content of the utterance from the movement of the lip enables the recognition under a noise environment. Multimodal speech recognition using lip dynamic scene information together with acoustic information is attracting attention and the research is advanced in recent years. Since visual information plays a great role in multimodal speech recognition, what to select as the image feature becomes a significant point. This paper proposes, for spoken word recognition, to utilize c combined parameter as the image feature extracted by Active Appearance Model applied to a face image including the lip area. Combined parameter contains information of the coordinate value and the intensity value as the image feature. The recognition rate was improved by the proposed method compared to the conventional method such as DCT and the principal component score. Moreover, we integrated with high accuracy the phoneme score from acoustic information and the viseme score from visual information. Key words Lip area, Active Appearance Model, Combined parameter, Integration of audio and visual, Viseme 1. IS3-31:1771

2 [1] [2] [3] [4] [5] RGB [6] [7] [8] [9] SNAKE [] Active Shape Model [11] Active Appearance Model [12] [13] [14] [15] [2] [3] [15] [16] DCT [14] [17] Active Appearance Models AAM, AAM combined c shape texture c HMM Haar-like AdaBoost [18] [19] [] AAM Haar-like AdaBoost AAM AAM AdaBoost AAM 音声 特徴抽出 MFCC HMM q q q q4 1 入力動画 結果統合 認識 画像 顔検出 顔領域 AAM 適用 唇領域 AAM 適用 特徴抽出 c パラメータ HMM q q q q4 AAM AAM AAM AAM 2 AAM AAM AAM c texture [21] AAM AAM 2 AAM AAM AAM c HMM HMM IS3-31:1772

3 HMM Active Appearance Models AAM Cootes shape texture shape s s s s texture texture g s g 1 2 s = x 1, y 1,..., x n, y n T 1 AAM のモデル構築に用いた 63 点の特徴点を与えた画像 g = g 1,..., g m T 2 x i, y i i < = n g j j < = m s s ḡ s g s ḡ P s P g 3 4 s = s + P s b s 3 顔領域の shape 顔領域 AAM 唇領域の shape 唇領域 AAM g = ḡ + P g b g AAM b s b g shape texture shape texture shape texture b s b g 5 6 W s b s W s P T s s s b = = = Qc 5 b g P T g g ḡ Q = Q s Q g 6 W s shape texture Q c shape texture combined c s g 7 8 sc = s + P s W s 1 Q s c 7 gc = ḡ + P g Q g c 8 c shape texture AAM AAM shape texture AAM shape texture 3. 3 Combined AAM 3 c c c c I i I i Wp AAM gc e 9 e c p IS3-31:1773

4 閉じた口 平均テクスチャ う 4. HMM HMM HMM HMM MFCC12, [4] 3 あい c L A+V = αl A + 1 αl V, < = α < = 1 L A+V L A L V α ec, p = gc I i Wp 2 9 p W c shape texture 95% 78 c c c, 3. 4 c 2 DCT [22] AAM DCT % DCT ATR 216 ATR 1 Logicool Qcam Orbit MP 9 7 fps SONY ECM-PC / / cm 1 SN db leave-one-out 9 1 closed open HMM 3 4 HMM closed HMM monophone closed open open closed1 HMM closed2 closed HMM open open HMM DCT 2 DCT PCA C parameterface AAM c C parameterlip AAM c IS3-31:1774

5 率識 7 認 DCT PCA C parameterface C parameterlip Speech % 率 7 識認 DCT PCA C parameter 混合数 closed2 closed1 closed2 open 4 closed1 c closed2 c DCT 2.2 PCA 1.4 c 2.5 open DCT 11 PCA 4 c 5 closed2 open c closed2 open closed1 closed1 closed2 HMM HMM HMM HMM open closed2 5 6 HMM 5 6 closed2 closed2 closed2 open closed2 closed2 open 8 % 率 7 識認 DCT PCA C parameter 混合数 open 5. 3 SN db - c HMM HMM HMM closed1 closed HMM closed2 open HMM open 1 clean SN 8 % closed1 db - IS3-31:1775

6 closed2 db - db - a a: b by ch d dy e e: f g gy h hy i i: j k ky m my n N ny o o: p py q r ry s sh t ts u u: w y z 欠落 a a: 3 b by 1 ch d dy e 33 e: 1 1 f g gy h hy i i: 1 1 j 1 11 k ky m 13 1 my 1 n 5 1 N ny o 52 1 o: 1 p 6 py q 3 3 r ry 1 s 1 sh 1 17 t 1 1 ts 5 u u: 4 2 w 1 3 y z 5 挿入 confusion matrixopen open SN db - SN HMM open Julius c 11 open confusion matrix 11 c open confusion matrix Correct Accuracy a a: b by ch d dy e e: f g gy h hy i i: j k ky m my n N ny o o: p py q r ry s sh t ts u u: w y z 欠落 a a: 3 b by 1 ch d dy e e: 2 f g gy h hy i i: 2 j k ky m my 1 n N ny o o: 1 p py q 1 5 r ry 1 s sh t ts u u: 6 w 1 3 y z 挿入 c confusion matrixopen 1 open open closed2 Accuracy Correct Accuracy Correct Accuracy Correct open IS3-31:1776

7 c i u ii uu ii i uu u ii uu r r a ra N N N N N k g n r b m p ka ga viseme closed2 [17] 5. HMM 12 eikyou eigyou 2 eisyou closed db - open 12 closed2 13 open closed2 open confusion matrix 2 14 a i u e o p r sy w t s y vf N 欠落 a 73 8 i u e o 5 p 54 1 r sy w t s y vf N 2 29 挿入 db - c confusion matrixopen 2 1 IS3-31:1777

8 2 open closed2 Accuracy Correct Accuracy Correct closed2 14 N 6. 1 t t t d n t t vf N 7. AAM combined confusion matrix 1 AAM monophone HMM triphone HMM [1] G Potamianos, H.P. Graf, Discriminative Training of HMM Stream Exponents for Audio-Visual Speech Recognition, Proc. ICASSP98, Seattle, U.S.A., vol.6, pp , [2] pp.3-4, Sep, 2. [3],,,,,, pp , Apr. 3. [4] WIT8-7 pp.37-42, 8. [5] HMM pp ,. [6] Vol.J85-D-, No.12, pp , 2. [7] Vol.J89-D, No.9, pp.25-32, 6. [8] Rainer Stiefelhagen, Uwe Meiger, Jie Yang, Real- Time Lip-Tracking For LipReading, Eurospeech 97, [9] Vol.J84-D-, No.3, pp , 1. [] PRMU3-276, pp , 4. [11] Juergen Luettin, Neil A. Thacker, Steve W.Beet, Visual Speech Recognition using Active Shape Models And Hidden Markov Models ICASSP-96, Vol.2, pp.817-8, [12] Cootes, T.F., Active Appearance Model, Proc. European Conference on Computer Vision, Vol2, pp , [13] Cootes, T.F., K Walker, C.J.Taylor, View-based Active Appearance Models Image and Vision Computing, pp , 2. [14] C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A. Mashari, and J. Zhou, Audio-visual speech recognition, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, Final Workshop Report, Oct.. [15] MIRU8, IS-17, pp , 8. [16] HMM Vol.PRMU2-124, pp.25-, 2. [17] JSAI4, 1E2-2, pp.1-4, 4. [18] P.Viola, M. Jones, Rapid Object Detection Using Boosted Cascade of Simple Features In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp.1-9, 1. [19] Joint Haar-like MIRU5 pp [] SP8-7, pp.1-6, 8. [21] AAM MIRU9 pp [22] Vol.1, No.385, pp.25-32, 1. IS3-31:1778

Facial Age Estimation Based on KNN-SVR Regression and AAM Parameters

Facial Age Estimation Based on KNN-SVR Regression and AAM Parameters 画像の認識 理解シンポジウム (MIRU2012) 2012 年 8 月 Facial Age Estimation Based on KNN-SVR Regression and AAM Parameters Songzhu Gao Tetsuya Takiguchi and Yasuo Ariki Graduate School of System Informatics, Kobe University,

More information

2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology

2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology ISCA Archive STREAM WEIGHT OPTIMIZATION OF SPEECH AND LIP IMAGE SEQUENCE FOR AUDIO-VISUAL SPEECH RECOGNITION Satoshi Nakamura 1 Hidetoshi Ito 2 Kiyohiro Shikano 2 1 ATR Spoken Language Translation Research

More information

Joint Processing of Audio and Visual Information for Speech Recognition

Joint Processing of Audio and Visual Information for Speech Recognition Joint Processing of Audio and Visual Information for Speech Recognition Chalapathy Neti and Gerasimos Potamianos IBM T.J. Watson Research Center Yorktown Heights, NY 10598 With Iain Mattwes, CMU, USA Juergen

More information

Mobile Robot Control Interface Using Augmented Free-viewpoint Image Generation

Mobile Robot Control Interface Using Augmented Free-viewpoint Image Generation THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 630 0192 8916-5 E-mail: {yuko-u,fumio-o,tomoka-s,yokoya}@is.naist.jp,,,,,, Abstract Mobile Robot Control

More information

Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models

Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models Proceedings of the orld Congress on Electrical Engineering and Computer Systems and Science (EECSS 2015) Barcelona, Spain, July 13-14, 2015 Paper No. 343 Audio-Visual Speech Processing System for Polish

More information

A New Manifold Representation for Visual Speech Recognition

A New Manifold Representation for Visual Speech Recognition A New Manifold Representation for Visual Speech Recognition Dahai Yu, Ovidiu Ghita, Alistair Sutherland, Paul F. Whelan School of Computing & Electronic Engineering, Vision Systems Group Dublin City University,

More information

3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION

3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION 3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION Roland Goecke 1,2 1 Autonomous System and Sensing Technologies, National ICT Australia, Canberra,

More information

Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments

Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments PAGE 265 Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments Patrick Lucey, Terrence Martin and Sridha Sridharan Speech and Audio Research Laboratory Queensland University

More information

Lipreading using Profile Lips Rebuilt by 3D Data from the Kinect

Lipreading using Profile Lips Rebuilt by 3D Data from the Kinect Journal of Computational Information Systems 11: 7 (2015) 2429 2438 Available at http://www.jofcis.com Lipreading using Profile Lips Rebuilt by 3D Data from the Kinect Jianrong WANG 1, Yongchun GAO 1,

More information

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION

SPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing

More information

Two-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments

Two-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments The 2 IEEE/RSJ International Conference on Intelligent Robots and Systems October 8-22, 2, Taipei, Taiwan Two-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments Takami Yoshida, Kazuhiro

More information

Research Article Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

Research Article Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images Hindawi Publishing Corporation EURASIP Journal on Audio, Speech, and Music Processing Volume 2007, Article ID 64506, 9 pages doi:10.1155/2007/64506 Research Article Audio-Visual Speech Recognition Using

More information

Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT

Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT Conference on Advances in Communication and Control Systems 2013 (CAC2S 2013) Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT Astik Biswas Department of Electrical Engineering, NIT Rourkela,Orrisa

More information

A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition

A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition N. Ahmad, S. Datta, D. Mulvaney and O. Farooq Loughborough Univ, LE11 3TU Leicestershire, UK n.ahmad@lboro.ac.uk 6445 Abstract

More information

arxiv: v1 [cs.cv] 3 Oct 2017

arxiv: v1 [cs.cv] 3 Oct 2017 Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Helen L. Bear, Richard W. Harvey, Barry-John Theobald and Yuxuan Lan School of Computing Sciences, University of East Anglia,

More information

END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS

END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS Stavros Petridis, Zuwei Li Imperial College London Dept. of Computing, London, UK {sp14;zl461}@imperial.ac.uk Maja Pantic Imperial College London / Univ.

More information

Audio-visual speech recognition using deep bottleneck features and high-performance lipreading

Audio-visual speech recognition using deep bottleneck features and high-performance lipreading Proceedings of APSIPA Annual Summit and Conference 215 16-19 December 215 Audio-visual speech recognition using deep bottleneck features and high-performance lipreading Satoshi TAMURA, Hiroshi NINOMIYA,

More information

LOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH RECOGNITION

LOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH RECOGNITION LOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH Andrés Vallés Carboneras, Mihai Gurban +, and Jean-Philippe Thiran + + Signal Processing Institute, E.T.S.I. de Telecomunicación Ecole Polytechnique

More information

Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR

Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR Sebastian Gergen 1, Steffen Zeiler 1, Ahmed Hussen Abdelaziz 2, Robert Nickel

More information

An Introduction to Pattern Recognition

An Introduction to Pattern Recognition An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included

More information

Lip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition

Lip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition Indian Journal of Science and Technology, Vol 9(32), DOI: 10.17485/ijst/2016/v9i32/98737, August 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Lip Detection and Lip Geometric Feature Extraction

More information

Mouse Pointer Tracking with Eyes

Mouse Pointer Tracking with Eyes Mouse Pointer Tracking with Eyes H. Mhamdi, N. Hamrouni, A. Temimi, and M. Bouhlel Abstract In this article, we expose our research work in Human-machine Interaction. The research consists in manipulating

More information

Object Recognition by Integrated Information Using Web Images

Object Recognition by Integrated Information Using Web Images 2013 Second IAPR Asian Conference on Pattern Recognition Object Recognition by Integrated Information Using Web Images Hitoshi Nishimura, Yuko Ozasa, Yasuo Ariki, Mikio Nakano Graduate School of System

More information

フラクタル 1 ( ジュリア集合 ) 解説 : ジュリア集合 ( 自己平方フラクタル ) 入力パラメータの例 ( 小さな数値の変化で模様が大きく変化します. Ar や Ai の数値を少しずつ変化させて描画する. ) プログラムコード. 2010, AGU, M.

フラクタル 1 ( ジュリア集合 ) 解説 : ジュリア集合 ( 自己平方フラクタル ) 入力パラメータの例 ( 小さな数値の変化で模様が大きく変化します. Ar や Ai の数値を少しずつ変化させて描画する. ) プログラムコード. 2010, AGU, M. フラクタル 1 ( ジュリア集合 ) PictureBox 1 TextBox 1 TextBox 2 解説 : ジュリア集合 ( 自己平方フラクタル ) TextBox 3 複素平面 (= PictureBox1 ) 上の点 ( に対して, x, y) 初期値 ( 複素数 ) z x iy を決める. 0 k 1 z k 1 f ( z) z 2 k a 写像 ( 複素関数 ) (a : 複素定数

More information

第 6 回画像センシングシンポジウム, 横浜,00 年 6 月 and pose variation, thus they will fail under these imaging conditions. n [7] a method for facial landmarks detection

第 6 回画像センシングシンポジウム, 横浜,00 年 6 月 and pose variation, thus they will fail under these imaging conditions. n [7] a method for facial landmarks detection S3-7 第 6 回画像センシングシンポジウム, 横浜,00 年 6 月 Local ndependent Components Analysis-Based Facial Features Detection Method M. Hassaballah,, omonori KANAZAWA 3, Shinobu DO 3, Shun DO Department of Computer Science,

More information

Mouth Center Detection under Active Near Infrared Illumination

Mouth Center Detection under Active Near Infrared Illumination Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 173 Mouth Center Detection under Active Near Infrared Illumination THORSTEN GERNOTH, RALPH

More information

Mouth Region Localization Method Based on Gaussian Mixture Model

Mouth Region Localization Method Based on Gaussian Mixture Model Mouth Region Localization Method Based on Gaussian Mixture Model Kenichi Kumatani and Rainer Stiefelhagen Universitaet Karlsruhe (TH), Interactive Systems Labs, Am Fasanengarten 5, 76131 Karlsruhe, Germany

More information

Temporal Multimodal Learning in Audiovisual Speech Recognition

Temporal Multimodal Learning in Audiovisual Speech Recognition Temporal Multimodal Learning in Audiovisual Speech Recognition Di Hu, Xuelong Li, Xiaoqiang Lu School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical

More information

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Audio-visual interaction in sparse representation features for

More information

Methods to Detect Malicious MS Document File using File Structure Inspection

Methods to Detect Malicious MS Document File using File Structure Inspection MS 1,a) 2,b) 2 MS Rich Text Compound File Binary MS MS MS 98.4% MS MS Methods to Detect Malicious MS Document File using File Structure Inspection Abstract: Today, the number of targeted attacks is increasing,

More information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Mustafa Berkay Yilmaz, Hakan Erdogan, Mustafa Unel Sabanci University, Faculty of Engineering and Natural

More information

Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features EURASIP Journal on Applied Signal Processing 2002:11, 1213 1227 c 2002 Hindawi Publishing Corporation Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features Petar S. Aleksic Department

More information

Face. Feature Extraction. Visual. Feature Extraction. Acoustic Feature Extraction

Face. Feature Extraction. Visual. Feature Extraction. Acoustic Feature Extraction A Bayesian Approach to Audio-Visual Speaker Identification Ara V Nefian 1, Lu Hong Liang 1, Tieyan Fu 2, and Xiao Xing Liu 1 1 Microprocessor Research Labs, Intel Corporation, fara.nefian, lu.hong.liang,

More information

PAPER Graph Cuts Segmentation by Using Local Texture Features of Multiresolution Analysis

PAPER Graph Cuts Segmentation by Using Local Texture Features of Multiresolution Analysis IEICE TRANS. INF. & SYST., VOL.E92 D, NO.7 JULY 2009 1453 PAPER Graph Cuts Segmentation by Using Local Texture Features of Multiresolution Analysis Keita FUKUDA a), Nonmember, Tetsuya TAKIGUCHI, and Yasuo

More information

Articulatory Features for Robust Visual Speech Recognition

Articulatory Features for Robust Visual Speech Recognition Articulatory Features for Robust Visual Speech Recognition Kate Saenko, Trevor Darrell, and James Glass MIT Computer Science and Artificial Intelligence Laboratory 32 Vassar Street Cambridge, Massachusetts,

More information

MySQL Cluster 7.3 リリース記念!! 5 分で作る MySQL Cluster 環境

MySQL Cluster 7.3 リリース記念!! 5 分で作る MySQL Cluster 環境 MySQL Cluster 7.3 リリース記念!! 5 分で作る MySQL Cluster 環境 日本オラクル株式会社山崎由章 / MySQL Senior Sales Consultant, Asia Pacific and Japan 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved. New!! 外部キー

More information

Generic object recognition using graph embedding into a vector space

Generic object recognition using graph embedding into a vector space American Journal of Software Engineering and Applications 2013 ; 2(1) : 13-18 Published online February 20, 2013 (http://www.sciencepublishinggroup.com/j/ajsea) doi: 10.11648/j. ajsea.20130201.13 Generic

More information

ISCA Archive

ISCA Archive ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 2005 (AVSP 05) British Columbia, Canada July 24-27, 2005 AUDIO-VISUAL SPEAKER IDENTIFICATION USING THE CUAVE DATABASE David

More information

Combining Audio and Video for Detection of Spontaneous Emotions

Combining Audio and Video for Detection of Spontaneous Emotions Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University

More information

SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION

SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION IEE Colloquium on Integrated Audio-Visual Processing for Recognition, Synthesis and Communication, pp 8/1 8/7, 1996 1 SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION I A Matthews, J A Bangham and

More information

Bengt J. Borgström, Student Member, IEEE, and Abeer Alwan, Senior Member, IEEE

Bengt J. Borgström, Student Member, IEEE, and Abeer Alwan, Senior Member, IEEE 1 A Low Complexity Parabolic Lip Contour Model With Speaker Normalization For High-Level Feature Extraction in Noise Robust Audio-Visual Speech Recognition Bengt J Borgström, Student Member, IEEE, and

More information

IP Network Technology

IP Network Technology IP Network Technology IP Internet Procol QoS Quality of Service RPR Resilient Packet Ring FLASHWAVE2700 Abstract The Internet procol (IP) has made it possible to drastically broaden the bandwidth of networks

More information

AUDIO-VISUAL SPEECH-PROCESSING SYSTEM FOR POLISH APPLICABLE TO HUMAN-COMPUTER INTERACTION

AUDIO-VISUAL SPEECH-PROCESSING SYSTEM FOR POLISH APPLICABLE TO HUMAN-COMPUTER INTERACTION Computer Science 19(1) 2018 https://doi.org/10.7494/csci.2018.19.1.2398 Tomasz Jadczyk AUDIO-VISUAL SPEECH-PROCESSING SYSTEM FOR POLISH APPLICABLE TO HUMAN-COMPUTER INTERACTION Abstract This paper describes

More information

Combining Dynamic Texture and Structural Features for Speaker Identification

Combining Dynamic Texture and Structural Features for Speaker Identification Combining Dynamic Texture and Structural Features for Speaker Identification Guoying Zhao Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P. O. Box 4500 FI-90014

More information

Image Denoising AGAIN!?

Image Denoising AGAIN!? 1 Image Denoising AGAIN!? 2 A Typical Imaging Pipeline 2 Sources of Noise (1) Shot Noise - Result of random photon arrival - Poisson distributed - Serious in low-light condition - Not so bad under good

More information

Automatic speech reading by oral motion tracking for user authentication system

Automatic speech reading by oral motion tracking for user authentication system 2012 2013 Third International Conference on Advanced Computing & Communication Technologies Automatic speech reading by oral motion tracking for user authentication system Vibhanshu Gupta, M.E. (EXTC,

More information

Sparse Coding Based Lip Texture Representation For Visual Speaker Identification

Sparse Coding Based Lip Texture Representation For Visual Speaker Identification Sparse Coding Based Lip Texture Representation For Visual Speaker Identification Author Lai, Jun-Yao, Wang, Shi-Lin, Shi, Xing-Jian, Liew, Alan Wee-Chung Published 2014 Conference Title Proceedings of

More information

MAPPING FROM SPEECH TO IMAGES USING CONTINUOUS STATE SPACE MODELS. Tue Lehn-Schiøler, Lars Kai Hansen & Jan Larsen

MAPPING FROM SPEECH TO IMAGES USING CONTINUOUS STATE SPACE MODELS. Tue Lehn-Schiøler, Lars Kai Hansen & Jan Larsen MAPPING FROM SPEECH TO IMAGES USING CONTINUOUS STATE SPACE MODELS Tue Lehn-Schiøler, Lars Kai Hansen & Jan Larsen The Technical University of Denmark Informatics and Mathematical Modelling Richard Petersens

More information

Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask

Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask NTCIR-9 Workshop: SpokenDoc Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask Hiromitsu Nishizaki Yuto Furuya Satoshi Natori Yoshihiro Sekiguchi University

More information

Tagging Video Contents with Positive/Negative Interest Based on User s Facial Expression

Tagging Video Contents with Positive/Negative Interest Based on User s Facial Expression Tagging Video Contents with Positive/Negative Interest Based on User s Facial Expression Masanori Miyahara 1, Masaki Aoki 1,TetsuyaTakiguchi 2,andYasuoAriki 2 1 Graduate School of Engineering, Kobe University

More information

Face Objects Detection in still images using Viola-Jones Algorithm through MATLAB TOOLS

Face Objects Detection in still images using Viola-Jones Algorithm through MATLAB TOOLS Face Objects Detection in still images using Viola-Jones Algorithm through MATLAB TOOLS Dr. Mridul Kumar Mathur 1, Priyanka Bhati 2 Asst. Professor (Selection Grade), Dept. of Computer Science, LMCST,

More information

暗い Lena トーンマッピング とは? 明るい Lena. 元の Lena. tone mapped. image. original. image. tone mapped. tone mapped image. image. original image. original.

暗い Lena トーンマッピング とは? 明るい Lena. 元の Lena. tone mapped. image. original. image. tone mapped. tone mapped image. image. original image. original. 暗い Lena トーンマッピング とは? tone mapped 画素値 ( ) output piel value input piel value 画素値 ( ) / 2 original 元の Lena 明るい Lena tone mapped 画素値 ( ) output piel value input piel value 画素値 ( ) tone mapped 画素値 ( ) output

More information

A Real Time Facial Expression Classification System Using Local Binary Patterns

A Real Time Facial Expression Classification System Using Local Binary Patterns A Real Time Facial Expression Classification System Using Local Binary Patterns S L Happy, Anjith George, and Aurobinda Routray Department of Electrical Engineering, IIT Kharagpur, India Abstract Facial

More information

Detection of a Single Hand Shape in the Foreground of Still Images

Detection of a Single Hand Shape in the Foreground of Still Images CS229 Project Final Report Detection of a Single Hand Shape in the Foreground of Still Images Toan Tran (dtoan@stanford.edu) 1. Introduction This paper is about an image detection system that can detect

More information

Analysis on the Multi-stakeholder Structure in the IGF Discussions

Analysis on the Multi-stakeholder Structure in the IGF Discussions Voicemail & FAX +1-650-653-2501 +81-3-4496-6014 m-yokozawa@i.kyoto-u.ac.jp http://yokozawa.mois.asia/ Analysis on the Multi-stakeholder Structure in the IGF 2008-2013 Discussions February 22, 2014 Shinnosuke

More information

A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition

A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition Dahai Yu, Ovidiu Ghita, Alistair Sutherland*, Paul F Whelan Vision Systems Group, School of Electronic Engineering

More information

Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study

Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study H.J. Nock, G. Iyengar, and C. Neti IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598. USA. Abstract. This paper

More information

Xing Fan, Carlos Busso and John H.L. Hansen

Xing Fan, Carlos Busso and John H.L. Hansen Xing Fan, Carlos Busso and John H.L. Hansen Center for Robust Speech Systems (CRSS) Erik Jonsson School of Engineering & Computer Science Department of Electrical Engineering University of Texas at Dallas

More information

Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions

Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions Stewart, D., Seymour, R., Pass, A., & Ji, M. (2014). Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions.

More information

Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading

Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading Ha Jong Won,Li Gwang Chol,Kim Hyo Chol,Li Kum Song College of Computer Science, Kim Il Sung

More information

Audio-Visual Speech Recognition for Slavonic Languages (Czech and Russian)

Audio-Visual Speech Recognition for Slavonic Languages (Czech and Russian) Audio-Visual Speech Recognition for Slavonic Languages (Czech and Russian) Petr Císař, Jan Zelina, Miloš Železný, Alexey Karpov 2, Andrey Ronzhin 2 Department of Cybernetics, University of West Bohemia

More information

Relaxed Consistency models and software distributed memory. Computer Architecture Textbook pp.79-83

Relaxed Consistency models and software distributed memory. Computer Architecture Textbook pp.79-83 Relaxed Consistency models and software distributed memory Computer Architecture Textbook pp.79-83 What is the consistency model? Coherence vs. Consistency (again) Coherence and consistency are complementary:

More information

Face Recognition for Mobile Devices

Face Recognition for Mobile Devices Face Recognition for Mobile Devices Aditya Pabbaraju (adisrinu@umich.edu), Srujankumar Puchakayala (psrujan@umich.edu) INTRODUCTION Face recognition is an application used for identifying a person from

More information

Studies of Large-Scale Data Visualization: EXTRAWING and Visual Data Mining

Studies of Large-Scale Data Visualization: EXTRAWING and Visual Data Mining Chapter 3 Visualization Studies of Large-Scale Data Visualization: EXTRAWING and Visual Data Mining Project Representative Fumiaki Araki Earth Simulator Center, Japan Agency for Marine-Earth Science and

More information

A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition

A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition by Mustapha A. Makkook A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree

More information

Audio Visual Speech Recognition

Audio Visual Speech Recognition Project #7 Audio Visual Speech Recognition Final Presentation 06.08.2010 PROJECT MEMBERS Hakan Erdoğan Saygın Topkaya Berkay Yılmaz Umut Şen Alexey Tarasov Presentation Outline 1.Project Information 2.Proposed

More information

Facial Feature Detection

Facial Feature Detection Facial Feature Detection Rainer Stiefelhagen 21.12.2009 Interactive Systems Laboratories, Universität Karlsruhe (TH) Overview Resear rch Group, Universität Karlsruhe (TH H) Introduction Review of already

More information

Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm

Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm Griffith Research Online https://research-repository.griffith.edu.au Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm Author Paliwal,

More information

Hierarchical Audio-Visual Cue Integration Framework for Activity Analysis in Intelligent Meeting Rooms

Hierarchical Audio-Visual Cue Integration Framework for Activity Analysis in Intelligent Meeting Rooms Hierarchical Audio-Visual Cue Integration Framework for Activity Analysis in Intelligent Meeting Rooms Shankar T. Shivappa University of California, San Diego 9500 Gilman Drive, La Jolla CA 92093-0434,

More information

Yamaha Steinberg USB Driver V for Windows Release Notes

Yamaha Steinberg USB Driver V for Windows Release Notes Yamaha Steinberg USB Driver V1.10.4 for Windows Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system

More information

Realistic Talking Head for Human-Car-Entertainment Services

Realistic Talking Head for Human-Car-Entertainment Services Realistic Talking Head for Human-Car-Entertainment Services Kang Liu, M.Sc., Prof. Dr.-Ing. Jörn Ostermann Institut für Informationsverarbeitung, Leibniz Universität Hannover Appelstr. 9A, 30167 Hannover

More information

Video Annotation and Retrieval Using Vague Shot Intervals

Video Annotation and Retrieval Using Vague Shot Intervals Master Thesis Video Annotation and Retrieval Using Vague Shot Intervals Supervisor Professor Katsumi Tanaka Department of Social Informatics Graduate School of Informatics Kyoto University Naoki FUKINO

More information

Yamaha Steinberg USB Driver V for Mac Release Notes

Yamaha Steinberg USB Driver V for Mac Release Notes Yamaha Steinberg USB Driver V1.10.2 for Mac Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system

More information

M I RA Lab. Speech Animation. Where do we stand today? Speech Animation : Hierarchy. What are the technologies?

M I RA Lab. Speech Animation. Where do we stand today? Speech Animation : Hierarchy. What are the technologies? MIRALab Where Research means Creativity Where do we stand today? M I RA Lab Nadia Magnenat-Thalmann MIRALab, University of Geneva thalmann@miralab.unige.ch Video Input (face) Audio Input (speech) FAP Extraction

More information

A long, deep and wide artificial neural net for robust speech recognition in unknown noise

A long, deep and wide artificial neural net for robust speech recognition in unknown noise A long, deep and wide artificial neural net for robust speech recognition in unknown noise Feipeng Li, Phani S. Nidadavolu, and Hynek Hermansky Center for Language and Speech Processing Johns Hopkins University,

More information

DECODING VISEMES: IMPROVING MACHINE LIP-READING. Helen L. Bear and Richard Harvey

DECODING VISEMES: IMPROVING MACHINE LIP-READING. Helen L. Bear and Richard Harvey DECODING VISEMES: IMPROVING MACHINE LIP-READING Helen L. Bear and Richard Harvey School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, United Kingdom ABSTRACT To undertake machine

More information

Face Recognition-based Automatic Tagging Scheme for SNS

Face Recognition-based Automatic Tagging Scheme for SNS Face Recognition-based Automatic Tagging Scheme for SNS Outline Introduction Motivation Related Work Methodology Face Detection Face Matching Experimental Results Conclusion 01 Introduction Tag is Metadata

More information

Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data

Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data INTERSPEECH 17 August 24, 17, Stockholm, Sweden Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data Achintya Kr. Sarkar 1, Md. Sahidullah 2, Zheng-Hua

More information

LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT

LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT Meriem Bendris 1,2, Delphine Charlet 1, Gérard Chollet 2 1 France Télécom R&D - Orange Labs, France 2 CNRS LTCI, TELECOM-ParisTech,

More information

PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION

PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION LucaCappelletta 1, Naomi Harte 1 1 Department ofelectronicand ElectricalEngineering, TrinityCollegeDublin, Ireland {cappelll, nharte}@tcd.ie Keywords:

More information

AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION

AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION Iain Matthews, J. Andrew Bangham and Stephen Cox School of Information Systems, University of East Anglia, Norwich, NR4 7TJ,

More information

Skeleton Cube for Lighting Environment Estimation

Skeleton Cube for Lighting Environment Estimation (MIRU2004) 2004 7 606 8501 E-mail: {takesi-t,maki,tm}@vision.kuee.kyoto-u.ac.jp 1) 2) Skeleton Cube for Lighting Environment Estimation Takeshi TAKAI, Atsuto MAKI, and Takashi MATSUYAMA Graduate School

More information

A Robust Method of Facial Feature Tracking for Moving Images

A Robust Method of Facial Feature Tracking for Moving Images A Robust Method of Facial Feature Tracking for Moving Images Yuka Nomura* Graduate School of Interdisciplinary Information Studies, The University of Tokyo Takayuki Itoh Graduate School of Humanitics and

More information

Discriminative training and Feature combination

Discriminative training and Feature combination Discriminative training and Feature combination Steve Renals Automatic Speech Recognition ASR Lecture 13 16 March 2009 Steve Renals Discriminative training and Feature combination 1 Overview Hot topics

More information

Lip Tracking for MPEG-4 Facial Animation

Lip Tracking for MPEG-4 Facial Animation Lip Tracking for MPEG-4 Facial Animation Zhilin Wu, Petar S. Aleksic, and Aggelos. atsaggelos Department of Electrical and Computer Engineering Northwestern University 45 North Sheridan Road, Evanston,

More information

Hybrid Approach of Facial Expression Recognition

Hybrid Approach of Facial Expression Recognition Hybrid Approach of Facial Expression Recognition Pinky Rai, Manish Dixit Department of Computer Science and Engineering, Madhav Institute of Technology and Science, Gwalior (M.P.), India Abstract In recent

More information

電脳梁山泊烏賊塾 構造体のサイズ. Visual Basic

電脳梁山泊烏賊塾 構造体のサイズ. Visual Basic 構造体 構造体のサイズ Marshal.SizeOf メソッド 整数型等型のサイズが定義されて居る構造体の場合 Marshal.SizeOf メソッドを使う事に依り型のサイズ ( バイト数 ) を取得する事が出来る 引数に値やオブジェクトを直接指定するか typeof や GetType で取得した型情報を渡す事に依り 其の型のサイズを取得する事が出来る 下記のプログラムを実行する事に依り Marshal.SizeOf

More information

Multifactor Fusion for Audio-Visual Speaker Recognition

Multifactor Fusion for Audio-Visual Speaker Recognition Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 2007 70 Multifactor Fusion for Audio-Visual Speaker Recognition GIRIJA CHETTY

More information

Baseball Game Highlight & Event Detection

Baseball Game Highlight & Event Detection Baseball Game Highlight & Event Detection Student: Harry Chao Course Adviser: Winston Hu 1 Outline 1. Goal 2. Previous methods 3. My flowchart 4. My methods 5. Experimental result 6. Conclusion & Future

More information

SVD-based Universal DNN Modeling for Multiple Scenarios

SVD-based Universal DNN Modeling for Multiple Scenarios SVD-based Universal DNN Modeling for Multiple Scenarios Changliang Liu 1, Jinyu Li 2, Yifan Gong 2 1 Microsoft Search echnology Center Asia, Beijing, China 2 Microsoft Corporation, One Microsoft Way, Redmond,

More information

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions. Petridis, Stavros and Stafylakis, Themos and Ma, Pingchuan and Cai, Feipeng and Tzimiropoulos, Georgios and Pantic, Maja (2018) End-to-end audiovisual speech recognition. In: IEEE International Conference

More information

Query Translation for CLIR: EWC vs. Google Translate

Query Translation for CLIR: EWC vs. Google Translate 0 IEEE International Conference on Information Science and Technology Wuhan, Hubei, China; March 3-5, 0 Query Translation for CLIR: EWC vs. Google V. Klyuev and Y. Haralambous Abstract A new approach to

More information

Motion Path Searches for Maritime Robots

Motion Path Searches for Maritime Robots Journal of National Fisheries University 59 ⑷ 245-251(2011) Motion Path Searches for Maritime Robots Eiji Morimoto 1, Makoto Nakamura 1, Dai Yamanishi 1 and Eiki Osaki 2 Abstract : A method based on genetic

More information

Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos

Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos Seymour, R., Stewart, D., & Ji, M. (8). Comparison of Image Transform-Based Features for Visual

More information

Progress Report of Final Year Project

Progress Report of Final Year Project Progress Report of Final Year Project Project Title: Design and implement a face-tracking engine for video William O Grady 08339937 Electronic and Computer Engineering, College of Engineering and Informatics,

More information

Vehicle Calibration Techniques Established and Substantiated for Motorcycles

Vehicle Calibration Techniques Established and Substantiated for Motorcycles Technical paper Vehicle Calibration Techniques Established and Substantiated for Motorcycles モータサイクルに特化した車両適合手法の確立と実証 Satoru KANNO *1 Koichi TSUNOKAWA *1 Takashi SUDA *1 菅野寛角川浩一須田玄 モータサイクル向け ECU は, 搭載性をよくするため小型化が求められ,

More information

PCIe SSD PACC EP P3700 Intel Solid-State Drive Data Center Tool

PCIe SSD PACC EP P3700 Intel Solid-State Drive Data Center Tool Installation Guide - 日本語 PCIe SSD PACC EP P3700 Intel Solid-State Drive Data Center Tool Software Version 2.x 2015 年 4 月 富士通株式会社 1 著作権および商標 Copyright 2015 FUJITSU LIMITED 使用されているハードウェア名とソフトウェア名は 各メーカーの商標です

More information

Off-line Character Recognition using On-line Character Writing Information

Off-line Character Recognition using On-line Character Writing Information Off-line Character Recognition using On-line Character Writing Information Hiromitsu NISHIMURA and Takehiko TIMIKAWA Dept. of Information and Computer Sciences, Kanagawa Institute of Technology 1030 Shimo-ogino,

More information

Adaptive Feature Extraction with Haar-like Features for Visual Tracking

Adaptive Feature Extraction with Haar-like Features for Visual Tracking Adaptive Feature Extraction with Haar-like Features for Visual Tracking Seunghoon Park Adviser : Bohyung Han Pohang University of Science and Technology Department of Computer Science and Engineering pclove1@postech.ac.kr

More information