Andi Buzo, Horia Cucu, Mihai Safta and Corneliu Burileanu. Speech & Dialogue (SpeeD) Research Laboratory, University Politehnica of Bucharest (UPB)
2 The MediaEval 2012 SWS task
- A multilingual, query-by-example Spoken Term Detection (STD) task.
- It involves searching for a spoken term within audio content using a spoken query. This is similar to keyword spotting, except that the keyword itself is spoken.
- For high-resourced languages the solution is straightforward: use a robust LVCSR system.
- The purpose of the task is to build STD systems for under-resourced languages (languages with very few resources).
- Development data: 3 hours of phone-level annotated speech in several African languages: isiNdebele, SiSwati, Tshivenda and Xitsonga.
3 Our proposed STD solution
- A phone recognition system based on the architecture suggested in the NIST 2006 STD campaign.
- Indexing of the audio content is done offline; searching for the queries is done online.
- Indexing stage: all audio content is converted into a sequence of phonemes by means of ASR.
- Searching stage: the spoken terms are converted into sequences of phonemes, also by means of ASR, and searched for in the indexed content.
- Advantage: the searching stage is very fast compared with searching for the audio term directly in the audio content.
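The offline/online split described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: `recognize_phones` and the `FAKE_ASR_OUTPUT` table are hypothetical stand-ins for the phone recognizer, and the online search here is an exact sub-sequence match, which would only be adequate for an error-free recognizer.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the ASR phone recognizer: it maps an audio file
# name to a phone sequence. A real system would decode the audio instead.
FAKE_ASR_OUTPUT: Dict[str, List[str]] = {
    "utt1.wav": ["e", "u", "p", "l", "e", "c", "l", "a", "m"],
    "utt2.wav": ["a", "m", "p", "l", "e"],
    "query.wav": ["p", "l", "e", "c"],
}


def recognize_phones(audio_name: str) -> List[str]:
    """Return the phone sequence for an audio file (assumed, table-driven)."""
    return FAKE_ASR_OUTPUT[audio_name]


def build_index(content_files: List[str]) -> Dict[str, List[str]]:
    """Offline stage: decode every content file once and store its phones."""
    return {name: recognize_phones(name) for name in content_files}


def search(index: Dict[str, List[str]], query_file: str) -> List[Tuple[str, int]]:
    """Online stage: decode only the query, then scan the stored phone strings."""
    query = recognize_phones(query_file)
    hits = []
    for name, phones in index.items():
        for start in range(len(phones) - len(query) + 1):
            if phones[start:start + len(query)] == query:
                hits.append((name, start))
    return hits


index = build_index(["utt1.wav", "utt2.wav"])
hits = search(index, "query.wav")
print(hits)
```

Only the query is decoded at search time; the content is compared as short symbol sequences, which is where the speed advantage comes from.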
4 The indexing component
Based on the Romanian ASR system for continuous speech:
- HMM acoustic model trained with 70 hours of read speech
- N-gram language model trained with 170 million words
- Average performance: 18% WER on clean, read speech
Fine-tuning the Romanian ASR system for phone error rate (PhER):
- Created a phoneme N-gram LM
- Tuned the relative beam width parameters: PhER reduced from 36.8% to 31.4%
- Tuned the language weight and phone insertion penalty: PhER reduced from 31.4% to 25.3%
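For readers unfamiliar with the metric, a phone error rate is conventionally the edit distance (substitutions, deletions and insertions) between the recognized and reference phone sequences, divided by the reference length. The sketch below assumes that conventional definition; the function names and the example sequences are illustrative and do not come from the slides.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PhER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


ref = "p l e c l a m".split()
hyp = "p l e k l a".split()   # one substitution (c -> k), one deletion (m)
per = phone_error_rate(ref, hyp)
print(f"PhER = {100 * per:.1f}%")
```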
5 The indexing component - adaptation
Phone mapping: 77 African phones -> 28 Romanian phones
1) directly, using the IPA classification (if the phones are the same)
2) to the closest phone according to the IPA classification (if any)
3) based on a speech-recognition confusion matrix
Adapting (MAP) the acoustic model using the African development speech data set: PhER reduced from 61.2% to 48.1%

ASR system | Evaluated on | PhER [%]
Romanian ASR, baseline for continuous speech | Romanian speech | 36.8
Romanian ASR, after beam width tuning | Romanian speech | 31.4
Romanian ASR, after language params tuning | Romanian speech | 25.3
Romanian ASR, after language params tuning | African speech | 61.2
Romanian ASR, after adaptation with African data | African speech | 48.1
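The three-step mapping above can be read as a fallback chain. The sketch below shows one way to express it; every table in it (the target phone set, the closest-phone table, the confusion counts) and every phone label is a made-up placeholder, not the actual 77-to-28 mapping.

```python
from typing import Dict, Optional

# All three tables are hypothetical examples with ASCII labels.
ROMANIAN_PHONES = {"a", "e", "i", "k", "p", "t", "s"}

# Step 2 table: closest target phone according to a phonetic classification.
CLOSEST_BY_IPA: Dict[str, str] = {"p_aspirated": "p", "k_ejective": "k"}

# Step 3 table: how often the recognizer output each target phone
# when the source phone was actually spoken.
CONFUSION_COUNTS: Dict[str, Dict[str, int]] = {
    "click": {"t": 12, "k": 30, "s": 5},
}


def map_phone(src: str) -> Optional[str]:
    """Map a source phone to a target phone using the three-step fallback."""
    if src in ROMANIAN_PHONES:            # 1) identical phone exists
        return src
    if src in CLOSEST_BY_IPA:             # 2) closest phone by classification
        return CLOSEST_BY_IPA[src]
    counts = CONFUSION_COUNTS.get(src)    # 3) most frequent confusion
    if counts:
        return max(counts, key=counts.get)
    return None                           # no mapping available


for phone in ["a", "p_aspirated", "click", "unknown"]:
    print(phone, "->", map_phone(phone))
```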
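6 The searching component: the DTWSS method
- If the ASR system's PhER were zero, the searching component could be a simple string search.
- Since the ASR system's PhER is not zero, we propose DTWSS.
- DTWSS aligns the query phone string to the content phone string, using a sliding window whose length is proportional to the query length (1.5x).
DTWSS key features:
- Short queries are penalized (amount controlled by alpha).
- Spread-out DTW matches are penalized (amount controlled by beta).
Alignment score: s is the product of three factors: (1 - PhER), a query-length factor weighted by alpha and expressed in terms of L_Q, L_Qm and L_QM, and a match-spread factor weighted by beta and expressed in terms of L_W, L_S and L_Q. With alpha = beta = 0 the score reduces to the standard DTW score, s = 1 - PhER.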
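A sliding-window search of this kind can be sketched as below. This is only an approximation of the idea under stated assumptions: it uses an edit-distance alignment with free start and end inside each window as a stand-in for DTW, it scores each window with the unpenalized baseline 1 - PhER (alpha = beta = 0), and it does not implement the DTWSS penalties. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def best_match_distance(query: Sequence[str], window: Sequence[str]) -> int:
    """Smallest edit distance between the query and any substring of the window."""
    prev = [0] * (len(window) + 1)        # a match may start anywhere for free
    for i, q in enumerate(query, start=1):
        curr = [i]
        for j, w in enumerate(window, start=1):
            cost = 0 if q == w else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return min(prev)                      # a match may end anywhere for free


def sliding_window_search(content: Sequence[str], query: Sequence[str],
                          window_factor: float = 1.5) -> List[Tuple[int, float]]:
    """Score every window of length window_factor * len(query) over the content."""
    win = max(len(query), math.ceil(window_factor * len(query)))
    last_start = max(0, len(content) - win)
    results = []
    for start in range(last_start + 1):
        window = content[start:start + win]
        dist = best_match_distance(query, window)
        results.append((start, 1.0 - dist / len(query)))   # baseline: 1 - PhER
    return results


content = "e u p l e c l a m".split()
query = "p l e k".split()                 # "k" is a recognition error for "c"
results = sliding_window_search(content, query)
for start, score in results:
    print(start, score)
```

The query is still found with a score of 0.75 despite the recognition error, which an exact string search would miss.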
7 Why is standard DTW not good enough?
The standard DTW score would be s = 1 - PhER. Against the same content string, very different query alignments receive the same score:

Content C1: e u p l e c l a m

Query | Aligned phones | Score
Q1 | p l e c * * | 0.66
Q2 | p * e * l a | 0.66
Q3 | p l * | 0.66
Q4 | e u p l e c * * * |
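The scores in this example can be reproduced with a few lines of arithmetic, under two assumptions that the slide does not state explicitly: each `*` marks one phone error in the alignment, and PhER is the number of errors divided by the aligned length. Values are truncated to two decimals to match the slide. The value computed for Q4 is a result of these assumptions, not a figure from the source.

```python
from typing import Dict, List

# Aligned query strings as they appear on the slide; "*" is assumed to
# mark a position counted as a phone error.
ALIGNED_QUERIES: Dict[str, List[str]] = {
    "Q1": "p l e c * *".split(),
    "Q2": "p * e * l a".split(),
    "Q3": "p l *".split(),
    "Q4": "e u p l e c * * *".split(),
}


def standard_dtw_score(aligned: List[str]) -> float:
    """s = 1 - PhER, with PhER assumed to be errors / aligned length."""
    errors = sum(1 for phone in aligned if phone == "*")
    return 1.0 - errors / len(aligned)


def truncate2(value: float) -> float:
    """Truncate (not round) to two decimals, as the slide appears to do."""
    return int(value * 100) / 100


scores = {name: truncate2(standard_dtw_score(a)) for name, a in ALIGNED_QUERIES.items()}
print(scores)
```

A three-phone query, a nine-phone query and a fragmented match all score the same, which is the motivation for penalizing short queries and spread-out matches.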
8 The searching component: tuning alpha and beta
- The DTWSS penalization parameters are tuned on the development STD database.
- The standard evaluation metric for STD is the Actual Term Weighted Value (ATWV) proposed by NIST (maximum value: 1).
- The standard DTW method (baseline) corresponds to alpha = beta = 0.
[Table: ATWV for α = 0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0 (columns) and five values of β (rows); the cell values are not preserved in this transcription.]
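As a reminder of what the metric measures, the sketch below computes a term-weighted value in the style of the NIST 2006 STD evaluation: one minus the average, over terms, of the miss probability plus a weighted false-alarm probability. The weight 999.9 is the NIST 2006 value and is assumed here; the SWS task may have used a different setting. It is named `fa_weight` to avoid confusion with the DTWSS parameter beta. The counts in the example are invented.

```python
from typing import Dict, NamedTuple


class TermCounts(NamedTuple):
    n_true: int       # reference occurrences of the term
    n_correct: int    # detections that match a reference occurrence
    n_spurious: int   # detections that match nothing (false alarms)


def term_weighted_value(counts: Dict[str, TermCounts], speech_seconds: float,
                        fa_weight: float = 999.9) -> float:
    """TWV = 1 - mean over terms of (P_miss + fa_weight * P_fa).

    "Actual" TWV is this value computed at the decision threshold the
    system itself chose. Terms without reference occurrences are skipped.
    """
    penalties = []
    for c in counts.values():
        if c.n_true == 0:
            continue
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated by one trial per second of speech.
        p_fa = c.n_spurious / (speech_seconds - c.n_true)
        penalties.append(p_miss + fa_weight * p_fa)
    if not penalties:
        raise ValueError("no term has reference occurrences")
    return 1.0 - sum(penalties) / len(penalties)


example = {
    "term_a": TermCounts(n_true=10, n_correct=8, n_spurious=1),
    "term_b": TermCounts(n_true=5, n_correct=5, n_spurious=0),
}
twv = term_weighted_value(example, speech_seconds=10000.0)
print(round(twv, 4))
```

Because false alarms are weighted so heavily, a single spurious detection costs about as much here as missing one in ten true occurrences.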
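9 SWS task results
ATWV, reported in the source for the devQ-devC and evalQ-evalC conditions (only one value per system is preserved in this transcription):
- DTWSS (α=0.8, β=0.4): 0.31
- DTWSS (α=0.6, β=0.6): 0.31
- DTWSS (α=0.1, β=0.4): 0.27
speed.pub.ro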
10 Comparison with other methods
[Table: ATWV per Team-Method on evalQ-evalC and devQ-devC; most values are not preserved in this transcription.] Systems, in the order listed:
- BUT-AKWS
- JHU_HLTCOE-RAILS
- TID-IRDTW
- DTWSS (α=0.8, β=0.4)
- TUM-CDTW
- GTTS-Phone_Lattice
- TUKE-DTWSVM (0, 0)
Our STD system ranked in the middle. Our method performs similarly to those that treat STD as a pattern recognition problem by aligning speech features. The most accurate method has the highest computational cost (it does not perform offline indexing).
11 Conclusions
- The Romanian ASR system was adapted to recognize African phones; this is the indexing component of the STD system.
- A novel DTW method was proposed to search for imperfectly recognized queries in imperfectly recognized content; this is the searching component of the STD system.
- Penalizing long (spread-out) DTW matches and short queries helped increase the ATWV.
- The STD system ranked well in the MediaEval 2012 SWS competition.