Phoneme Analysis of Image Feature in Utterance Recognition Using AAM in Lip Area
|
|
- Melvyn Wood
- 5 years ago
- Views:
Transcription
1 MIRU 7 AAM {komai,miyamoto}@me.cs.scitec.kobe-u.ac.jp, {takigu,ariki}@kobe-u.ac.jp Active Appearance Model Active Appearance Model combined DCT Active Appearance Model combined Phoneme Analysis of Image Feature in Utterance Recognition Using AAM in Lip Area Yuto KOMAI, Chikoto MIYAMOTO, Tetsuya TAKIGUCHI, and Yasuo ARIKI Graduate School of System Informatics, Kobe University Rokkodaicho 1 1, Nada-ku, Kobe, Hyogo, Japan Organization of Advanced Science and Technology, Kobe University Rokkodaicho 1 1, Nada-ku, Kobe, Hyogo, Japan {komai,miyamoto}@me.cs.scitec.kobe-u.ac.jp, {takigu,ariki}@kobe-u.ac.jp Abstract Lip-reading, that recognizes the content of the utterance from the movement of the lip enables the recognition under a noise environment. Multimodal speech recognition using lip dynamic scene information together with acoustic information is attracting attention and the research is advanced in recent years. Since visual information plays a great role in multimodal speech recognition, what to select as the image feature becomes a significant point. This paper proposes, for spoken word recognition, to utilize c combined parameter as the image feature extracted by Active Appearance Model applied to a face image including the lip area. Combined parameter contains information of the coordinate value and the intensity value as the image feature. The recognition rate was improved by the proposed method compared to the conventional method such as DCT and the principal component score. Moreover, we integrated with high accuracy the phoneme score from acoustic information and the viseme score from visual information. Key words Lip area, Active Appearance Model, Combined parameter, Integration of audio and visual, Viseme 1. IS3-31:1771
2 [1] [2] [3] [4] [5] RGB [6] [7] [8] [9] SNAKE [] Active Shape Model [11] Active Appearance Model [12] [13] [14] [15] [2] [3] [15] [16] DCT [14] [17] Active Appearance Models AAM, AAM combined c shape texture c HMM Haar-like AdaBoost [18] [19] [] AAM Haar-like AdaBoost AAM AAM AdaBoost AAM 音声 特徴抽出 MFCC HMM q q q q4 1 入力動画 結果統合 認識 画像 顔検出 顔領域 AAM 適用 唇領域 AAM 適用 特徴抽出 c パラメータ HMM q q q q4 AAM AAM AAM AAM 2 AAM AAM AAM c texture [21] AAM AAM 2 AAM AAM AAM c HMM HMM IS3-31:1772
3 HMM Active Appearance Models AAM Cootes shape texture shape s s s s texture texture g s g 1 2 s = x 1, y 1,..., x n, y n T 1 AAM のモデル構築に用いた 63 点の特徴点を与えた画像 g = g 1,..., g m T 2 x i, y i i < = n g j j < = m s s ḡ s g s ḡ P s P g 3 4 s = s + P s b s 3 顔領域の shape 顔領域 AAM 唇領域の shape 唇領域 AAM g = ḡ + P g b g AAM b s b g shape texture shape texture shape texture b s b g 5 6 W s b s W s P T s s s b = = = Qc 5 b g P T g g ḡ Q = Q s Q g 6 W s shape texture Q c shape texture combined c s g 7 8 sc = s + P s W s 1 Q s c 7 gc = ḡ + P g Q g c 8 c shape texture AAM AAM shape texture AAM shape texture 3. 3 Combined AAM 3 c c c c I i I i Wp AAM gc e 9 e c p IS3-31:1773
4 閉じた口 平均テクスチャ う 4. HMM HMM HMM HMM MFCC12, [4] 3 あい c L A+V = αl A + 1 αl V, < = α < = 1 L A+V L A L V α ec, p = gc I i Wp 2 9 p W c shape texture 95% 78 c c c, 3. 4 c 2 DCT [22] AAM DCT % DCT ATR 216 ATR 1 Logicool Qcam Orbit MP 9 7 fps SONY ECM-PC / / cm 1 SN db leave-one-out 9 1 closed open HMM 3 4 HMM closed HMM monophone closed open open closed1 HMM closed2 closed HMM open open HMM DCT 2 DCT PCA C parameterface AAM c C parameterlip AAM c IS3-31:1774
5 率識 7 認 DCT PCA C parameterface C parameterlip Speech % 率 7 識認 DCT PCA C parameter 混合数 closed2 closed1 closed2 open 4 closed1 c closed2 c DCT 2.2 PCA 1.4 c 2.5 open DCT 11 PCA 4 c 5 closed2 open c closed2 open closed1 closed1 closed2 HMM HMM HMM HMM open closed2 5 6 HMM 5 6 closed2 closed2 closed2 open closed2 closed2 open 8 % 率 7 識認 DCT PCA C parameter 混合数 open 5. 3 SN db - c HMM HMM HMM closed1 closed HMM closed2 open HMM open 1 clean SN 8 % closed1 db - IS3-31:1775
6 closed2 db - db - a a: b by ch d dy e e: f g gy h hy i i: j k ky m my n N ny o o: p py q r ry s sh t ts u u: w y z 欠落 a a: 3 b by 1 ch d dy e 33 e: 1 1 f g gy h hy i i: 1 1 j 1 11 k ky m 13 1 my 1 n 5 1 N ny o 52 1 o: 1 p 6 py q 3 3 r ry 1 s 1 sh 1 17 t 1 1 ts 5 u u: 4 2 w 1 3 y z 5 挿入 confusion matrixopen open SN db - SN HMM open Julius c 11 open confusion matrix 11 c open confusion matrix Correct Accuracy a a: b by ch d dy e e: f g gy h hy i i: j k ky m my n N ny o o: p py q r ry s sh t ts u u: w y z 欠落 a a: 3 b by 1 ch d dy e e: 2 f g gy h hy i i: 2 j k ky m my 1 n N ny o o: 1 p py q 1 5 r ry 1 s sh t ts u u: 6 w 1 3 y z 挿入 c confusion matrixopen 1 open open closed2 Accuracy Correct Accuracy Correct Accuracy Correct open IS3-31:1776
7 c i u ii uu ii i uu u ii uu r r a ra N N N N N k g n r b m p ka ga viseme closed2 [17] 5. HMM 12 eikyou eigyou 2 eisyou closed db - open 12 closed2 13 open closed2 open confusion matrix 2 14 a i u e o p r sy w t s y vf N 欠落 a 73 8 i u e o 5 p 54 1 r sy w t s y vf N 2 29 挿入 db - c confusion matrixopen 2 1 IS3-31:1777
8 2 open closed2 Accuracy Correct Accuracy Correct closed2 14 N 6. 1 t t t d n t t vf N 7. AAM combined confusion matrix 1 AAM monophone HMM triphone HMM [1] G Potamianos, H.P. Graf, Discriminative Training of HMM Stream Exponents for Audio-Visual Speech Recognition, Proc. ICASSP98, Seattle, U.S.A., vol.6, pp , [2] pp.3-4, Sep, 2. [3],,,,,, pp , Apr. 3. [4] WIT8-7 pp.37-42, 8. [5] HMM pp ,. [6] Vol.J85-D-, No.12, pp , 2. [7] Vol.J89-D, No.9, pp.25-32, 6. [8] Rainer Stiefelhagen, Uwe Meiger, Jie Yang, Real- Time Lip-Tracking For LipReading, Eurospeech 97, [9] Vol.J84-D-, No.3, pp , 1. [] PRMU3-276, pp , 4. [11] Juergen Luettin, Neil A. Thacker, Steve W.Beet, Visual Speech Recognition using Active Shape Models And Hidden Markov Models ICASSP-96, Vol.2, pp.817-8, [12] Cootes, T.F., Active Appearance Model, Proc. European Conference on Computer Vision, Vol2, pp , [13] Cootes, T.F., K Walker, C.J.Taylor, View-based Active Appearance Models Image and Vision Computing, pp , 2. [14] C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A. Mashari, and J. Zhou, Audio-visual speech recognition, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, Final Workshop Report, Oct.. [15] MIRU8, IS-17, pp , 8. [16] HMM Vol.PRMU2-124, pp.25-, 2. [17] JSAI4, 1E2-2, pp.1-4, 4. [18] P.Viola, M. Jones, Rapid Object Detection Using Boosted Cascade of Simple Features In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp.1-9, 1. [19] Joint Haar-like MIRU5 pp [] SP8-7, pp.1-6, 8. [21] AAM MIRU9 pp [22] Vol.1, No.385, pp.25-32, 1. IS3-31:1778
Facial Age Estimation Based on KNN-SVR Regression and AAM Parameters
画像の認識 理解シンポジウム (MIRU2012) 2012 年 8 月 Facial Age Estimation Based on KNN-SVR Regression and AAM Parameters Songzhu Gao Tetsuya Takiguchi and Yasuo Ariki Graduate School of System Informatics, Kobe University,
More information2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto , Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology
ISCA Archive STREAM WEIGHT OPTIMIZATION OF SPEECH AND LIP IMAGE SEQUENCE FOR AUDIO-VISUAL SPEECH RECOGNITION Satoshi Nakamura 1 Hidetoshi Ito 2 Kiyohiro Shikano 2 1 ATR Spoken Language Translation Research
More informationJoint Processing of Audio and Visual Information for Speech Recognition
Joint Processing of Audio and Visual Information for Speech Recognition Chalapathy Neti and Gerasimos Potamianos IBM T.J. Watson Research Center Yorktown Heights, NY 10598 With Iain Mattwes, CMU, USA Juergen
More informationMobile Robot Control Interface Using Augmented Free-viewpoint Image Generation
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 630 0192 8916-5 E-mail: {yuko-u,fumio-o,tomoka-s,yokoya}@is.naist.jp,,,,,, Abstract Mobile Robot Control
More informationAudio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models
Proceedings of the orld Congress on Electrical Engineering and Computer Systems and Science (EECSS 2015) Barcelona, Spain, July 13-14, 2015 Paper No. 343 Audio-Visual Speech Processing System for Polish
More informationA New Manifold Representation for Visual Speech Recognition
A New Manifold Representation for Visual Speech Recognition Dahai Yu, Ovidiu Ghita, Alistair Sutherland, Paul F. Whelan School of Computing & Electronic Engineering, Vision Systems Group Dublin City University,
More information3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION
3D LIP TRACKING AND CO-INERTIA ANALYSIS FOR IMPROVED ROBUSTNESS OF AUDIO-VIDEO AUTOMATIC SPEECH RECOGNITION Roland Goecke 1,2 1 Autonomous System and Sensing Technologies, National ICT Australia, Canberra,
More informationConfusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments
PAGE 265 Confusability of Phonemes Grouped According to their Viseme Classes in Noisy Environments Patrick Lucey, Terrence Martin and Sridha Sridharan Speech and Audio Research Laboratory Queensland University
More informationLipreading using Profile Lips Rebuilt by 3D Data from the Kinect
Journal of Computational Information Systems 11: 7 (2015) 2429 2438 Available at http://www.jofcis.com Lipreading using Profile Lips Rebuilt by 3D Data from the Kinect Jianrong WANG 1, Yongchun GAO 1,
More informationSPEECH FEATURE EXTRACTION USING WEIGHTED HIGHER-ORDER LOCAL AUTO-CORRELATION
Far East Journal of Electronics and Communications Volume 3, Number 2, 2009, Pages 125-140 Published Online: September 14, 2009 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing
More information2. 集団の注目位置推定 提案手法では 複数の人物が同一の対象を注視している状況 置 を推定する手法を検討する この状況下では 図 1 のよう. 顔画像からそれぞれの注目位置を推定する ただし f は 1 枚 この仮説に基づいて 複数の人物を同時に撮影した低解像度顔
一般社団法人電子情報通信学会 THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS 信学技報 IEICE Technical Report PRMU17-98(17-1) TECHNICAL
More informationTwo-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments
The 2 IEEE/RSJ International Conference on Intelligent Robots and Systems October 8-22, 2, Taipei, Taiwan Two-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments Takami Yoshida, Kazuhiro
More informationResearch Article Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images
Hindawi Publishing Corporation EURASIP Journal on Audio, Speech, and Music Processing Volume 2007, Article ID 64506, 9 pages doi:10.1155/2007/64506 Research Article Audio-Visual Speech Recognition Using
More informationAudio Visual Isolated Oriya Digit Recognition Using HMM and DWT
Conference on Advances in Communication and Control Systems 2013 (CAC2S 2013) Audio Visual Isolated Oriya Digit Recognition Using HMM and DWT Astik Biswas Department of Electrical Engineering, NIT Rourkela,Orrisa
More informationA Comparison of Visual Features for Audio-Visual Automatic Speech Recognition
A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition N. Ahmad, S. Datta, D. Mulvaney and O. Farooq Loughborough Univ, LE11 3TU Leicestershire, UK n.ahmad@lboro.ac.uk 6445 Abstract
More informationarxiv: v1 [cs.cv] 3 Oct 2017
Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Helen L. Bear, Richard W. Harvey, Barry-John Theobald and Yuxuan Lan School of Computing Sciences, University of East Anglia,
More informationEND-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS
END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS Stavros Petridis, Zuwei Li Imperial College London Dept. of Computing, London, UK {sp14;zl461}@imperial.ac.uk Maja Pantic Imperial College London / Univ.
More informationAudio-visual speech recognition using deep bottleneck features and high-performance lipreading
Proceedings of APSIPA Annual Summit and Conference 215 16-19 December 215 Audio-visual speech recognition using deep bottleneck features and high-performance lipreading Satoshi TAMURA, Hiroshi NINOMIYA,
More informationLOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH RECOGNITION
LOW-DIMENSIONAL MOTION FEATURES FOR AUDIO-VISUAL SPEECH Andrés Vallés Carboneras, Mihai Gurban +, and Jean-Philippe Thiran + + Signal Processing Institute, E.T.S.I. de Telecomunicación Ecole Polytechnique
More informationDynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR Sebastian Gergen 1, Steffen Zeiler 1, Ahmed Hussen Abdelaziz 2, Robert Nickel
More informationAn Introduction to Pattern Recognition
An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included
More informationLip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition
Indian Journal of Science and Technology, Vol 9(32), DOI: 10.17485/ijst/2016/v9i32/98737, August 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Lip Detection and Lip Geometric Feature Extraction
More informationMouse Pointer Tracking with Eyes
Mouse Pointer Tracking with Eyes H. Mhamdi, N. Hamrouni, A. Temimi, and M. Bouhlel Abstract In this article, we expose our research work in Human-machine Interaction. The research consists in manipulating
More informationObject Recognition by Integrated Information Using Web Images
2013 Second IAPR Asian Conference on Pattern Recognition Object Recognition by Integrated Information Using Web Images Hitoshi Nishimura, Yuko Ozasa, Yasuo Ariki, Mikio Nakano Graduate School of System
More informationフラクタル 1 ( ジュリア集合 ) 解説 : ジュリア集合 ( 自己平方フラクタル ) 入力パラメータの例 ( 小さな数値の変化で模様が大きく変化します. Ar や Ai の数値を少しずつ変化させて描画する. ) プログラムコード. 2010, AGU, M.
フラクタル 1 ( ジュリア集合 ) PictureBox 1 TextBox 1 TextBox 2 解説 : ジュリア集合 ( 自己平方フラクタル ) TextBox 3 複素平面 (= PictureBox1 ) 上の点 ( に対して, x, y) 初期値 ( 複素数 ) z x iy を決める. 0 k 1 z k 1 f ( z) z 2 k a 写像 ( 複素関数 ) (a : 複素定数
More information第 6 回画像センシングシンポジウム, 横浜,00 年 6 月 and pose variation, thus they will fail under these imaging conditions. n [7] a method for facial landmarks detection
S3-7 第 6 回画像センシングシンポジウム, 横浜,00 年 6 月 Local ndependent Components Analysis-Based Facial Features Detection Method M. Hassaballah,, omonori KANAZAWA 3, Shinobu DO 3, Shun DO Department of Computer Science,
More informationMouth Center Detection under Active Near Infrared Illumination
Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 173 Mouth Center Detection under Active Near Infrared Illumination THORSTEN GERNOTH, RALPH
More informationMouth Region Localization Method Based on Gaussian Mixture Model
Mouth Region Localization Method Based on Gaussian Mixture Model Kenichi Kumatani and Rainer Stiefelhagen Universitaet Karlsruhe (TH), Interactive Systems Labs, Am Fasanengarten 5, 76131 Karlsruhe, Germany
More informationTemporal Multimodal Learning in Audiovisual Speech Recognition
Temporal Multimodal Learning in Audiovisual Speech Recognition Di Hu, Xuelong Li, Xiaoqiang Lu School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical
More informationAudio-visual interaction in sparse representation features for noise robust audio-visual speech recognition
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing (AVSP) 2013 Annecy, France August 29 - September 1, 2013 Audio-visual interaction in sparse representation features for
More informationMethods to Detect Malicious MS Document File using File Structure Inspection
MS 1,a) 2,b) 2 MS Rich Text Compound File Binary MS MS MS 98.4% MS MS Methods to Detect Malicious MS Document File using File Structure Inspection Abstract: Today, the number of targeted attacks is increasing,
More informationProbabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information
Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information Mustafa Berkay Yilmaz, Hakan Erdogan, Mustafa Unel Sabanci University, Faculty of Engineering and Natural
More informationAudio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features
EURASIP Journal on Applied Signal Processing 2002:11, 1213 1227 c 2002 Hindawi Publishing Corporation Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features Petar S. Aleksic Department
More informationFace. Feature Extraction. Visual. Feature Extraction. Acoustic Feature Extraction
A Bayesian Approach to Audio-Visual Speaker Identification Ara V Nefian 1, Lu Hong Liang 1, Tieyan Fu 2, and Xiao Xing Liu 1 1 Microprocessor Research Labs, Intel Corporation, fara.nefian, lu.hong.liang,
More informationPAPER Graph Cuts Segmentation by Using Local Texture Features of Multiresolution Analysis
IEICE TRANS. INF. & SYST., VOL.E92 D, NO.7 JULY 2009 1453 PAPER Graph Cuts Segmentation by Using Local Texture Features of Multiresolution Analysis Keita FUKUDA a), Nonmember, Tetsuya TAKIGUCHI, and Yasuo
More informationArticulatory Features for Robust Visual Speech Recognition
Articulatory Features for Robust Visual Speech Recognition Kate Saenko, Trevor Darrell, and James Glass MIT Computer Science and Artificial Intelligence Laboratory 32 Vassar Street Cambridge, Massachusetts,
More informationMySQL Cluster 7.3 リリース記念!! 5 分で作る MySQL Cluster 環境
MySQL Cluster 7.3 リリース記念!! 5 分で作る MySQL Cluster 環境 日本オラクル株式会社山崎由章 / MySQL Senior Sales Consultant, Asia Pacific and Japan 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved. New!! 外部キー
More informationGeneric object recognition using graph embedding into a vector space
American Journal of Software Engineering and Applications 2013 ; 2(1) : 13-18 Published online February 20, 2013 (http://www.sciencepublishinggroup.com/j/ajsea) doi: 10.11648/j. ajsea.20130201.13 Generic
More informationISCA Archive
ISCA Archive http://www.isca-speech.org/archive Auditory-Visual Speech Processing 2005 (AVSP 05) British Columbia, Canada July 24-27, 2005 AUDIO-VISUAL SPEAKER IDENTIFICATION USING THE CUAVE DATABASE David
More informationCombining Audio and Video for Detection of Spontaneous Emotions
Combining Audio and Video for Detection of Spontaneous Emotions Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić Faculty of Electrical Engineering, University
More informationSCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION
IEE Colloquium on Integrated Audio-Visual Processing for Recognition, Synthesis and Communication, pp 8/1 8/7, 1996 1 SCALE BASED FEATURES FOR AUDIOVISUAL SPEECH RECOGNITION I A Matthews, J A Bangham and
More informationBengt J. Borgström, Student Member, IEEE, and Abeer Alwan, Senior Member, IEEE
1 A Low Complexity Parabolic Lip Contour Model With Speaker Normalization For High-Level Feature Extraction in Noise Robust Audio-Visual Speech Recognition Bengt J Borgström, Student Member, IEEE, and
More informationIP Network Technology
IP Network Technology IP Internet Procol QoS Quality of Service RPR Resilient Packet Ring FLASHWAVE2700 Abstract The Internet procol (IP) has made it possible to drastically broaden the bandwidth of networks
More informationAUDIO-VISUAL SPEECH-PROCESSING SYSTEM FOR POLISH APPLICABLE TO HUMAN-COMPUTER INTERACTION
Computer Science 19(1) 2018 https://doi.org/10.7494/csci.2018.19.1.2398 Tomasz Jadczyk AUDIO-VISUAL SPEECH-PROCESSING SYSTEM FOR POLISH APPLICABLE TO HUMAN-COMPUTER INTERACTION Abstract This paper describes
More informationCombining Dynamic Texture and Structural Features for Speaker Identification
Combining Dynamic Texture and Structural Features for Speaker Identification Guoying Zhao Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P. O. Box 4500 FI-90014
More informationImage Denoising AGAIN!?
1 Image Denoising AGAIN!? 2 A Typical Imaging Pipeline 2 Sources of Noise (1) Shot Noise - Result of random photon arrival - Poisson distributed - Serious in low-light condition - Not so bad under good
More informationAutomatic speech reading by oral motion tracking for user authentication system
2012 2013 Third International Conference on Advanced Computing & Communication Technologies Automatic speech reading by oral motion tracking for user authentication system Vibhanshu Gupta, M.E. (EXTC,
More informationSparse Coding Based Lip Texture Representation For Visual Speaker Identification
Sparse Coding Based Lip Texture Representation For Visual Speaker Identification Author Lai, Jun-Yao, Wang, Shi-Lin, Shi, Xing-Jian, Liew, Alan Wee-Chung Published 2014 Conference Title Proceedings of
More informationMAPPING FROM SPEECH TO IMAGES USING CONTINUOUS STATE SPACE MODELS. Tue Lehn-Schiøler, Lars Kai Hansen & Jan Larsen
MAPPING FROM SPEECH TO IMAGES USING CONTINUOUS STATE SPACE MODELS Tue Lehn-Schiøler, Lars Kai Hansen & Jan Larsen The Technical University of Denmark Informatics and Mathematical Modelling Richard Petersens
More informationSpoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask
NTCIR-9 Workshop: SpokenDoc Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask Hiromitsu Nishizaki Yuto Furuya Satoshi Natori Yoshihiro Sekiguchi University
More informationTagging Video Contents with Positive/Negative Interest Based on User s Facial Expression
Tagging Video Contents with Positive/Negative Interest Based on User s Facial Expression Masanori Miyahara 1, Masaki Aoki 1,TetsuyaTakiguchi 2,andYasuoAriki 2 1 Graduate School of Engineering, Kobe University
More informationFace Objects Detection in still images using Viola-Jones Algorithm through MATLAB TOOLS
Face Objects Detection in still images using Viola-Jones Algorithm through MATLAB TOOLS Dr. Mridul Kumar Mathur 1, Priyanka Bhati 2 Asst. Professor (Selection Grade), Dept. of Computer Science, LMCST,
More information暗い Lena トーンマッピング とは? 明るい Lena. 元の Lena. tone mapped. image. original. image. tone mapped. tone mapped image. image. original image. original.
暗い Lena トーンマッピング とは? tone mapped 画素値 ( ) output piel value input piel value 画素値 ( ) / 2 original 元の Lena 明るい Lena tone mapped 画素値 ( ) output piel value input piel value 画素値 ( ) tone mapped 画素値 ( ) output
More informationA Real Time Facial Expression Classification System Using Local Binary Patterns
A Real Time Facial Expression Classification System Using Local Binary Patterns S L Happy, Anjith George, and Aurobinda Routray Department of Electrical Engineering, IIT Kharagpur, India Abstract Facial
More informationDetection of a Single Hand Shape in the Foreground of Still Images
CS229 Project Final Report Detection of a Single Hand Shape in the Foreground of Still Images Toan Tran (dtoan@stanford.edu) 1. Introduction This paper is about an image detection system that can detect
More informationAnalysis on the Multi-stakeholder Structure in the IGF Discussions
Voicemail & FAX +1-650-653-2501 +81-3-4496-6014 m-yokozawa@i.kyoto-u.ac.jp http://yokozawa.mois.asia/ Analysis on the Multi-stakeholder Structure in the IGF 2008-2013 Discussions February 22, 2014 Shinnosuke
More informationA Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition
A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition Dahai Yu, Ovidiu Ghita, Alistair Sutherland*, Paul F Whelan Vision Systems Group, School of Electronic Engineering
More informationSpeaker Localisation Using Audio-Visual Synchrony: An Empirical Study
Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study H.J. Nock, G. Iyengar, and C. Neti IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598. USA. Abstract. This paper
More informationXing Fan, Carlos Busso and John H.L. Hansen
Xing Fan, Carlos Busso and John H.L. Hansen Center for Robust Speech Systems (CRSS) Erik Jonsson School of Engineering & Computer Science Department of Electrical Engineering University of Texas at Dallas
More informationRobust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions
Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions Stewart, D., Seymour, R., Pass, A., & Ji, M. (2014). Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions.
More informationDefinition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading
Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading Ha Jong Won,Li Gwang Chol,Kim Hyo Chol,Li Kum Song College of Computer Science, Kim Il Sung
More informationAudio-Visual Speech Recognition for Slavonic Languages (Czech and Russian)
Audio-Visual Speech Recognition for Slavonic Languages (Czech and Russian) Petr Císař, Jan Zelina, Miloš Železný, Alexey Karpov 2, Andrey Ronzhin 2 Department of Cybernetics, University of West Bohemia
More informationRelaxed Consistency models and software distributed memory. Computer Architecture Textbook pp.79-83
Relaxed Consistency models and software distributed memory Computer Architecture Textbook pp.79-83 What is the consistency model? Coherence vs. Consistency (again) Coherence and consistency are complementary:
More informationFace Recognition for Mobile Devices
Face Recognition for Mobile Devices Aditya Pabbaraju (adisrinu@umich.edu), Srujankumar Puchakayala (psrujan@umich.edu) INTRODUCTION Face recognition is an application used for identifying a person from
More informationStudies of Large-Scale Data Visualization: EXTRAWING and Visual Data Mining
Chapter 3 Visualization Studies of Large-Scale Data Visualization: EXTRAWING and Visual Data Mining Project Representative Fumiaki Araki Earth Simulator Center, Japan Agency for Marine-Earth Science and
More informationA Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition
A Multimodal Sensor Fusion Architecture for Audio-Visual Speech Recognition by Mustapha A. Makkook A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree
More informationAudio Visual Speech Recognition
Project #7 Audio Visual Speech Recognition Final Presentation 06.08.2010 PROJECT MEMBERS Hakan Erdoğan Saygın Topkaya Berkay Yılmaz Umut Şen Alexey Tarasov Presentation Outline 1.Project Information 2.Proposed
More informationFacial Feature Detection
Facial Feature Detection Rainer Stiefelhagen 21.12.2009 Interactive Systems Laboratories, Universität Karlsruhe (TH) Overview Resear rch Group, Universität Karlsruhe (TH H) Introduction Review of already
More informationSimultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm
Griffith Research Online https://research-repository.griffith.edu.au Simultaneous Design of Feature Extractor and Pattern Classifer Using the Minimum Classification Error Training Algorithm Author Paliwal,
More informationHierarchical Audio-Visual Cue Integration Framework for Activity Analysis in Intelligent Meeting Rooms
Hierarchical Audio-Visual Cue Integration Framework for Activity Analysis in Intelligent Meeting Rooms Shankar T. Shivappa University of California, San Diego 9500 Gilman Drive, La Jolla CA 92093-0434,
More informationYamaha Steinberg USB Driver V for Windows Release Notes
Yamaha Steinberg USB Driver V1.10.4 for Windows Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system
More informationRealistic Talking Head for Human-Car-Entertainment Services
Realistic Talking Head for Human-Car-Entertainment Services Kang Liu, M.Sc., Prof. Dr.-Ing. Jörn Ostermann Institut für Informationsverarbeitung, Leibniz Universität Hannover Appelstr. 9A, 30167 Hannover
More informationVideo Annotation and Retrieval Using Vague Shot Intervals
Master Thesis Video Annotation and Retrieval Using Vague Shot Intervals Supervisor Professor Katsumi Tanaka Department of Social Informatics Graduate School of Informatics Kyoto University Naoki FUKINO
More informationYamaha Steinberg USB Driver V for Mac Release Notes
Yamaha Steinberg USB Driver V1.10.2 for Mac Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system
More informationM I RA Lab. Speech Animation. Where do we stand today? Speech Animation : Hierarchy. What are the technologies?
MIRALab Where Research means Creativity Where do we stand today? M I RA Lab Nadia Magnenat-Thalmann MIRALab, University of Geneva thalmann@miralab.unige.ch Video Input (face) Audio Input (speech) FAP Extraction
More informationA long, deep and wide artificial neural net for robust speech recognition in unknown noise
A long, deep and wide artificial neural net for robust speech recognition in unknown noise Feipeng Li, Phani S. Nidadavolu, and Hynek Hermansky Center for Language and Speech Processing Johns Hopkins University,
More informationDECODING VISEMES: IMPROVING MACHINE LIP-READING. Helen L. Bear and Richard Harvey
DECODING VISEMES: IMPROVING MACHINE LIP-READING Helen L. Bear and Richard Harvey School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, United Kingdom ABSTRACT To undertake machine
More informationFace Recognition-based Automatic Tagging Scheme for SNS
Face Recognition-based Automatic Tagging Scheme for SNS Outline Introduction Motivation Related Work Methodology Face Detection Face Matching Experimental Results Conclusion 01 Introduction Tag is Metadata
More informationImproving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data
INTERSPEECH 17 August 24, 17, Stockholm, Sweden Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data Achintya Kr. Sarkar 1, Md. Sahidullah 2, Zheng-Hua
More informationLIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT
LIP ACTIVITY DETECTION FOR TALKING FACES CLASSIFICATION IN TV-CONTENT Meriem Bendris 1,2, Delphine Charlet 1, Gérard Chollet 2 1 France Télécom R&D - Orange Labs, France 2 CNRS LTCI, TELECOM-ParisTech,
More informationPHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION
PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION LucaCappelletta 1, Naomi Harte 1 1 Department ofelectronicand ElectricalEngineering, TrinityCollegeDublin, Ireland {cappelll, nharte}@tcd.ie Keywords:
More informationAUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION
AUDIOVISUAL SPEECH RECOGNITION USING MULTISCALE NONLINEAR IMAGE DECOMPOSITION Iain Matthews, J. Andrew Bangham and Stephen Cox School of Information Systems, University of East Anglia, Norwich, NR4 7TJ,
More informationSkeleton Cube for Lighting Environment Estimation
(MIRU2004) 2004 7 606 8501 E-mail: {takesi-t,maki,tm}@vision.kuee.kyoto-u.ac.jp 1) 2) Skeleton Cube for Lighting Environment Estimation Takeshi TAKAI, Atsuto MAKI, and Takashi MATSUYAMA Graduate School
More informationA Robust Method of Facial Feature Tracking for Moving Images
A Robust Method of Facial Feature Tracking for Moving Images Yuka Nomura* Graduate School of Interdisciplinary Information Studies, The University of Tokyo Takayuki Itoh Graduate School of Humanitics and
More informationDiscriminative training and Feature combination
Discriminative training and Feature combination Steve Renals Automatic Speech Recognition ASR Lecture 13 16 March 2009 Steve Renals Discriminative training and Feature combination 1 Overview Hot topics
More informationLip Tracking for MPEG-4 Facial Animation
Lip Tracking for MPEG-4 Facial Animation Zhilin Wu, Petar S. Aleksic, and Aggelos. atsaggelos Department of Electrical and Computer Engineering Northwestern University 45 North Sheridan Road, Evanston,
More informationHybrid Approach of Facial Expression Recognition
Hybrid Approach of Facial Expression Recognition Pinky Rai, Manish Dixit Department of Computer Science and Engineering, Madhav Institute of Technology and Science, Gwalior (M.P.), India Abstract In recent
More information電脳梁山泊烏賊塾 構造体のサイズ. Visual Basic
構造体 構造体のサイズ Marshal.SizeOf メソッド 整数型等型のサイズが定義されて居る構造体の場合 Marshal.SizeOf メソッドを使う事に依り型のサイズ ( バイト数 ) を取得する事が出来る 引数に値やオブジェクトを直接指定するか typeof や GetType で取得した型情報を渡す事に依り 其の型のサイズを取得する事が出来る 下記のプログラムを実行する事に依り Marshal.SizeOf
More informationMultifactor Fusion for Audio-Visual Speaker Recognition
Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 2007 70 Multifactor Fusion for Audio-Visual Speaker Recognition GIRIJA CHETTY
More informationBaseball Game Highlight & Event Detection
Baseball Game Highlight & Event Detection Student: Harry Chao Course Adviser: Winston Hu 1 Outline 1. Goal 2. Previous methods 3. My flowchart 4. My methods 5. Experimental result 6. Conclusion & Future
More informationSVD-based Universal DNN Modeling for Multiple Scenarios
SVD-based Universal DNN Modeling for Multiple Scenarios Changliang Liu 1, Jinyu Li 2, Yifan Gong 2 1 Microsoft Search echnology Center Asia, Beijing, China 2 Microsoft Corporation, One Microsoft Way, Redmond,
More informationThe Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions.
Petridis, Stavros and Stafylakis, Themos and Ma, Pingchuan and Cai, Feipeng and Tzimiropoulos, Georgios and Pantic, Maja (2018) End-to-end audiovisual speech recognition. In: IEEE International Conference
More informationQuery Translation for CLIR: EWC vs. Google Translate
0 IEEE International Conference on Information Science and Technology Wuhan, Hubei, China; March 3-5, 0 Query Translation for CLIR: EWC vs. Google V. Klyuev and Y. Haralambous Abstract A new approach to
More informationMotion Path Searches for Maritime Robots
Journal of National Fisheries University 59 ⑷ 245-251(2011) Motion Path Searches for Maritime Robots Eiji Morimoto 1, Makoto Nakamura 1, Dai Yamanishi 1 and Eiki Osaki 2 Abstract : A method based on genetic
More informationComparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos
Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos Seymour, R., Stewart, D., & Ji, M. (8). Comparison of Image Transform-Based Features for Visual
More informationProgress Report of Final Year Project
Progress Report of Final Year Project Project Title: Design and implement a face-tracking engine for video William O Grady 08339937 Electronic and Computer Engineering, College of Engineering and Informatics,
More informationVehicle Calibration Techniques Established and Substantiated for Motorcycles
Technical paper Vehicle Calibration Techniques Established and Substantiated for Motorcycles モータサイクルに特化した車両適合手法の確立と実証 Satoru KANNO *1 Koichi TSUNOKAWA *1 Takashi SUDA *1 菅野寛角川浩一須田玄 モータサイクル向け ECU は, 搭載性をよくするため小型化が求められ,
More informationPCIe SSD PACC EP P3700 Intel Solid-State Drive Data Center Tool
Installation Guide - 日本語 PCIe SSD PACC EP P3700 Intel Solid-State Drive Data Center Tool Software Version 2.x 2015 年 4 月 富士通株式会社 1 著作権および商標 Copyright 2015 FUJITSU LIMITED 使用されているハードウェア名とソフトウェア名は 各メーカーの商標です
More informationOff-line Character Recognition using On-line Character Writing Information
Off-line Character Recognition using On-line Character Writing Information Hiromitsu NISHIMURA and Takehiko TIMIKAWA Dept. of Information and Computer Sciences, Kanagawa Institute of Technology 1030 Shimo-ogino,
More informationAdaptive Feature Extraction with Haar-like Features for Visual Tracking
Adaptive Feature Extraction with Haar-like Features for Visual Tracking Seunghoon Park Adviser : Bohyung Han Pohang University of Science and Technology Department of Computer Science and Engineering pclove1@postech.ac.kr
More information