Ping-pong decoding: Combining forward and backward search
1 Combining forward and backward search. Research internship, Mirko Hannemann, Microsoft Research, Speech Technology (Redmond). Supervisor: Daniel Povey.
2 Beam search and search errors. [Plot: score vs. frame for the partial forward best path and the final best path at several beam widths.]
3 What is the optimal beam width? Only in a few spots do we actually need the full beam. How can we identify those spots? Histogram of score differences between the current best path and the final best path.
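The gap statistic behind this histogram can be sketched in a few lines. The function name, the cost lists, and the numbers below are hypothetical illustrations of the idea, not measurements from the experiments in these slides:

```python
def beam_needed(partial_best_cost, final_path_cost):
    """Per-frame gap between the locally best partial path and the path
    that eventually wins the utterance (both as accumulated costs).
    The decoding beam must exceed the largest gap, otherwise the winning
    path is pruned at that frame (a search error)."""
    return [f - p for p, f in zip(partial_best_cost, final_path_cost)]

# toy utterance: the eventual winner trails the local best around frame 2
gaps = beam_needed([1.0, 2.0, 3.0, 4.5], [1.0, 2.5, 6.0, 4.5])
min_safe_beam = max(gaps)  # the full beam is only needed at one spot
```

Collecting such gaps over many utterances gives the histogram on this slide: most frames need a tiny beam, and only a few spots need a wide one.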
4 Was it a car or a cat I saw? Example (Resource Management). Forward decoding: IS SHERMAN ARE CONIFER AND THREE MOST RECENT CASUALTY REPORT. Backwards decoding: IS BADGER A REMARK ON VANCOUVER'S MOST RECENT CASUALTY REPORT. Noise at the beginning confused the whole forward decoding, but did not harm the backwards decoding much.
5 Analysis of search errors. Are forward and backward search errors independent? fwd: PRODUCTS WOULD BE A MARKET BY OTHER COMPANIES; bwd: PRODUCTS WOULD BE - TARGETED BY OTHER COMPLAINTS; wide: PRODUCTS WOULD BE - TARGETED BY OTHER COMPANIES. [Table: error co-occurrence vs. decoding beam for the forward and backward passes; WSJ Nov 9 test set, aligned against a wide beam (9.0).] Error co-occurrence need not mean the same error.
6 Construction of the decoding network: the weighted finite state transducer (WFST) approach [Mohri et al.]: HCLG = H ∘ C ∘ L ∘ G, where G is the grammar or language model acceptor, L the lexicon (phones to words), C the context-dependency transducer (context-dependent phones to phones), and H the HMM (PDF-ids to context-dependent phones). Kaldi toolkit: HCLG = asl(min(rds(det(H_a ∘ min(det(C ∘ min(det(L ∘ G)))))))), where asl = add self-loops and rds = remove disambiguation symbols.
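As an illustration of the composition step at the heart of this recipe, here is a toy WFST composition in Python. It is a simplification (output-side epsilons handled by advancing only the first machine, no composition filter as in OpenFst); the FST encoding, `compose`, and the tiny L and G are invented for this sketch:

```python
def compose(t1, t2):
    """Compose two toy WFSTs: match t1's output symbols against t2's inputs.
    An FST is {"start": s, "finals": set, "arcs": {state: [(in, out, w, next)]}};
    weights are -log probabilities and add along a path (tropical semiring)."""
    EPS = "<eps>"
    start = (t1["start"], t2["start"])
    arcs, stack, seen = {}, [start], {start}
    while stack:
        q = stack.pop()
        s1, s2 = q
        out = []
        for (i, o, w, n1) in t1["arcs"].get(s1, []):
            if o == EPS:  # epsilon output: advance t1 alone
                cand = [(i, EPS, w, (n1, s2))]
            else:         # otherwise t2 must consume o
                cand = [(i, o2, w + w2, (n1, n2))
                        for (i2, o2, w2, n2) in t2["arcs"].get(s2, [])
                        if i2 == o]
            for a in cand:
                out.append(a)
                if a[3] not in seen:
                    seen.add(a[3]); stack.append(a[3])
        arcs[q] = out
    finals = {q for q in seen if q[0] in t1["finals"] and q[1] in t2["finals"]}
    return {"start": start, "finals": finals, "arcs": arcs}

# L: phones -> words (word emitted on the first phone), G: accepts "cat"
L = {"start": 0, "finals": {3},
     "arcs": {0: [("k", "cat", 0.0, 1)],
              1: [("ae", "<eps>", 0.0, 2)],
              2: [("t", "<eps>", 0.0, 3)]}}
G = {"start": 0, "finals": {1}, "arcs": {0: [("cat", "cat", 0.5, 1)]}}
LG = compose(L, G)  # accepts the phone string "k ae t" and outputs "cat"
```

The real recipe interleaves determinization and minimization after each composition, which is what keeps HCLG tractable for large vocabularies.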
7 Reversing the language model G (word-pair grammar). The reversed LM must assign exactly the same scores to reversed utterances. [Figure: word-pair grammar (uniform distribution, no back-off) as a finite state acceptor, and its reversal.] Steps: acceptor reversal, epsilon removal, determinization and weight pushing in the log semiring. OpenFst: iterative weight pushing algorithm; problems with states that have a huge fan-out.
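Weight pushing in the log semiring can be sketched for the acyclic case: compute, for each state, the -log of the total probability mass of all paths from that state to a final state, then shift that potential toward the start so every state becomes locally normalized. The function and the small FSA below are hypothetical, not OpenFst's implementation:

```python
import math

def push_log(states_topo, arcs, final):
    """Weight pushing in the log semiring on an acyclic FSA.
    states_topo: states in topological order (start first);
    arcs: (src, dst, w) with w a -log probability; final: {state: -log prob}.
    d[q] is the -log total path mass from q to a final state."""
    d = {}
    out = {q: [] for q in states_topo}
    for a in arcs:
        out[a[0]].append(a)
    for q in reversed(states_topo):  # reverse topological order
        terms = ([final[q]] if q in final else []) + \
                [w + d[dst] for (_, dst, w) in out[q]]
        d[q] = -math.log(sum(math.exp(-t) for t in terms))
    pushed_arcs = [(s, t, w + d[t] - d[s]) for (s, t, w) in arcs]
    pushed_final = {q: f - d[q] for q, f in final.items()}
    return pushed_arcs, pushed_final, d[states_topo[0]]

arcs = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 0.5)]
p_arcs, p_final, total = push_log([0, 1, 2], arcs, {2: 0.0})
# after pushing, outgoing probabilities (plus final) sum to 1 at every state
sums = {q: sum(math.exp(-w) for (s, _, w) in p_arcs if s == q)
           + (math.exp(-p_final[q]) if q in p_final else 0.0)
        for q in [0, 1, 2]}
```

For cyclic machines (back-off LMs), OpenFst solves the same potentials iteratively, which is where states with huge fan-out become expensive.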
8 Reversing language models (ARPA). [Figure: ARPA back-off bigram over {a, b} as an acceptor with start/end states SB/SE and back-off arcs, and its reversal.] Weight pushing in the log semiring; problems with states that have a huge fan-out.
9 Reversing language models (ARPA), to be done for higher-order models: find the mathematical equations, or train on reversed training texts (not exact, scores differ slightly).
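For a bigram model without back-off the exact reversal does have a closed form: with alpha(w) the total probability of all sentence prefixes ending in w, Bayes' rule yields reversed conditionals that assign every reversed utterance exactly the forward score (the alphas telescope away). A sketch, with a made-up toy model; the back-off and higher-order cases are the open problem mentioned above:

```python
V = ["a", "b"]
P_init = {"a": 0.5, "b": 0.5}            # P(w | <s>)
P = {"a": {"a": 0.2, "b": 0.5},          # P(w' | w)
     "b": {"a": 0.4, "b": 0.3}}
P_end = {"a": 0.3, "b": 0.3}             # P(</s> | w); every row sums to 1

def fwd_prob(s):
    p = P_init[s[0]] * P_end[s[-1]]
    for u, v in zip(s, s[1:]):
        p *= P[u][v]
    return p

# alpha(w): total probability of all sentence prefixes ending in w,
# the fixed point of alpha(v) = P_init(v) + sum_u alpha(u) * P(v|u)
alpha = {w: 0.0 for w in V}
for _ in range(200):  # converges geometrically since P(</s>|w) > 0
    alpha = {v: P_init[v] + sum(alpha[u] * P[u][v] for u in V) for v in V}

# reversed-model parameters via Bayes' rule
rev_init = {w: P_end[w] * alpha[w] for w in V}   # P_rev(w | <s>)
rev_P = {x: {y: P[y][x] * alpha[y] / alpha[x] for y in V} for x in V}
rev_end = {w: P_init[w] / alpha[w] for w in V}   # P_rev(</s> | w)

def rev_prob(s):
    p = rev_init[s[0]] * rev_end[s[-1]]
    for x, y in zip(s, s[1:]):
        p *= rev_P[x][y]
    return p
```

Multiplying out rev_prob on a reversed sentence, each alpha cancels against its neighbor, leaving exactly fwd_prob of the original sentence, which is the "exactly the same scores" requirement from slide 7.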
10 Reversing the pronunciation dictionary L: e.g. A = ax; ABERDEEN = n iy d er b ae; ABOARD = dd r ao b ax; ADD = dd ae. Add the disambiguation symbols after reversing the pronunciations. [Figure: reversed lexicon transducer L with disambiguation symbols.]
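The reversal-plus-disambiguation step can be sketched as follows. The function and the toy lexicon are hypothetical, loosely following the Mohri-style recipe used by Kaldi's add_lex_disambig.pl:

```python
def reverse_and_disambiguate(lexicon):
    """Reverse each pronunciation, then append auxiliary symbols #1, #2, ...
    wherever a reversed pronunciation equals, or is a prefix of, another one,
    so that the reversed lexicon transducer stays determinizable."""
    rev = {w: list(reversed(p)) for w, p in lexicon.items()}
    prons = [tuple(p) for p in rev.values()]
    seen, out = {}, {}
    for word, pron in rev.items():
        t = tuple(pron)
        ambiguous = prons.count(t) > 1 or any(
            len(p) > len(t) and p[:len(t)] == t for p in prons)
        if ambiguous:
            k = seen.get(t, 0) + 1
            seen[t] = k
            out[word] = pron + [f"#{k}"]   # homophones get distinct symbols
        else:
            out[word] = pron
    return out

lex = {"aberdeen": ["ae", "b", "er", "d", "iy", "n"],
       "aboard":   ["ax", "b", "ao", "r", "dd"],
       "add":      ["ae", "dd"],
       "red":      ["r", "eh", "d"],   # homophone pair
       "read":     ["r", "eh", "d"],
       "oh":       ["ow"],             # reversed, a prefix of reversed "hoe"
       "hoe":      ["hh", "ow"]}
rlex = reverse_and_disambiguate(lex)
```

The key point of the slide is the ordering: the symbols must be added after reversing, because which pronunciations collide or share prefixes changes under reversal.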
11 Reversing the context-dependency transducer C: a triphone arc a-b-c/c becomes c-b-a (e.g. eps-a-b/b, a-b-c/c, b-c-d/d, c-d-eps/$). The decision tree clusters on the phoneme context window and the HMM state; the context window out of L ∘ G is reversed!
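Why the context window flips can be seen directly: the triphone windows of a reversed phone sequence are exactly the reversed windows of the original sequence, in reverse order. A minimal check (helper name invented for this sketch):

```python
def triphones(phones, pad="<eps>"):
    """Context windows of width 3 around each phone: (left, center, right)."""
    p = [pad] + phones + [pad]
    return [(p[i - 1], p[i], p[i + 1]) for i in range(1, len(p) - 1)]

fwd = triphones(["a", "b", "c", "d"])   # (eps,a,b), (a,b,c), (b,c,d), (c,d,eps)
bwd = triphones(["d", "c", "b", "a"])   # (eps,d,c), (d,c,b), (c,b,a), (b,a,eps)
```

So the backward decoder can reuse the same decision tree as long as C is built with left and right context swapped.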
12 Reversing the HMM transducer H_a. [Figure: H_a transducer with transition-ids on the input side and phones/disambiguation symbols on the output side; here shown for the monophone case.]
13 Reversing the HMM transducer H_a: reverse the phone HMMs, remove epsilons, and push weights in the log semiring before composing H_a; when adding self-loops, the order of transitions changes. [Figure: the reversed H_a transducer.]
14 First pass forward search, second pass backwards (or vice versa). How can we use the search result of the first pass in second-pass decoding? Convert the lattice generated by HCLG_1st into a lattice of decoding-graph states of HCLG_2nd for each frame.
15 Generation of graph-state lattices. (1) Map HCLG_2nd to a PDF-to-arc transducer HCLG_arc: HCLG_2nd transduces PDF-ids into words; encode the HCLG_2nd node and arc-id into the output symbol; map the input to be independent of self-loop order. (2) Map the first-pass lattice LAT_1st to LAT_rev: map the input (self-loops), project on the input, remove the weights; time-reverse the lattice and remove epsilons. (3) Compose: LAT_arc = LAT_rev ∘ HCLG_arc, which obtains the sequences of HCLG_2nd arcs for each PDF sequence in the lattice. (4) Lattice determinization det(LAT_arc) (on PDF-ids) in a special semiring, giving a single HCLG_2nd path for each sequence of PDFs; project to HCLG_2nd node/arc-ids and determinize again. The output is an acceptor lattice over HCLG_2nd graph arcs.
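The time-reversal sub-step amounts to flipping every arc and swapping start and final states, which preserves path weights while reversing label sequences. A minimal sketch with an invented lattice encoding:

```python
def reverse_lattice(lat):
    """Time-reverse an acyclic lattice: flip each arc (src, dst, label, w)
    and exchange the roles of the start and final states."""
    return {"start": lat["final"], "final": lat["start"],
            "arcs": [(dst, src, lab, w) for (src, dst, lab, w) in lat["arcs"]]}

def paths(lat):
    """All (label-sequence, total-weight) pairs from start to final, via DFS."""
    out, adj = [], {}
    for (s, d, lab, w) in lat["arcs"]:
        adj.setdefault(s, []).append((d, lab, w))
    def walk(q, labs, w):
        if q == lat["final"]:
            out.append((tuple(labs), w))
        for (d, lab, aw) in adj.get(q, []):
            walk(d, labs + [lab], w + aw)
    walk(lat["start"], [], 0.0)
    return sorted(out)

lat = {"start": 0, "final": 3,
       "arcs": [(0, 1, "a", 1.0), (1, 3, "b", 2.0),
                (0, 2, "a", 0.5), (2, 3, "c", 1.5)]}
rev = reverse_lattice(lat)
```

In the actual pipeline this happens on the weight-stripped, input-projected lattice, so only the PDF sequences matter for the subsequent composition with HCLG_arc.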
20 Using the first-pass search in the second pass. For each time step: perform your own search; track where the other pass has been; extend the search area; if the paths cross, adapt the shorter paths.
21 Using the first-pass search in the second pass: perform your own search with first- and second-pass beams; keep the set of observed tokens, move them according to the arc-lattice, track them and never prune them; extend the beam to include all observed tokens; add an extra-beam, limited by a max-beam; on token recombination, inherit the observation status.
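The adaptive-beam idea on this slide can be sketched as a pruning rule that widens the beam just enough to keep every tracked first-pass token, capped by the max-beam. Token ids, costs, and the function below are hypothetical illustrations, not Kaldi's actual decoder code:

```python
def prune_tokens(tokens, tracked, beam, extra_beam, max_beam):
    """Beam pruning that never drops tokens on the tracked first-pass path.
    tokens: {token_id: cost}; tracked: ids observed in the other pass's
    arc-lattice. The effective beam is widened to cover every tracked token
    (plus extra_beam for their competitors), but capped at max_beam."""
    best = min(tokens.values())
    needed = max((tokens[t] - best for t in tracked if t in tokens), default=0.0)
    eff_beam = min(max(beam, needed + extra_beam), max_beam)
    return {t: c for t, c in tokens.items()
            if c <= best + eff_beam or t in tracked}

tokens = {1: 0.0, 2: 18.0, 3: 20.0}   # token 3 lies on the first-pass path
kept = prune_tokens(tokens, tracked={3}, beam=10.0, extra_beam=2.0, max_beam=15.0)
```

Here token 2 is pruned (outside the capped beam, not tracked) while token 3 survives despite its high cost, because the first pass says that region of the graph was worth exploring.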
22 Results on Wall Street Journal (Nov test set). [Plot: WER vs. real-time factor for forward, backward, and several ping-pong decoding configurations.]
23 Analysis of search errors. [Table: error co-occurrence vs. decoding beam for forward, backward, and ping-pong decoding.] WSJ Nov 9 closed-vocabulary test set, 330 utterances; triphone HMM+GMM trained on 80h of WSJ0 (Kaldi tria); bigram 5k language model (exact scores for the reversal). fwd: BRIAN J. KILLING CHAIRMAN OF BELL - ATLANTA X. INVESTMENT; bwd: BRIAN J. DAILY CHAIRMAN OF BELL AND LAND SIX INVESTMENT; png: BRIAN J. DAILY CHAIRMAN OF BELL - ATLANTA ITS INVESTMENT; ref: BRIAN J. DAILY CHAIRMAN OF BELL - ATLANTA ITS INVESTMENT.
24 Time analysis: where is the time spent in ping-pong decoding? 35% first-pass decoding (narrower lattice); 0% lattice-to-arc-lattice conversion (a separate program); <5% feature reversal and different ambiguity; 40% second-pass decoding and lattice generation, of which 5-0% is tracking first-pass tokens and 5-5% is extra tokens in the wider beam. About 0% optimization is possible, but it does not change things fundamentally.
25 Summary: (1) backwards decoding with reversed decoding networks and LMs; (2) WFST-based arc-lattice generation; (3) integrating the first-pass search into the second pass; (4) tracking the arc-lattice and varying the beam; (5) roughly a two-fold speed-up from ping-pong decoding. Open issues: reversing language models / reversed training; too many parameters (forward beam, backward beam, lattice beam, extra-beam, max-beam, max-states, final-beam).
26 References. [Mohri08] M. Mohri et al., Speech recognition with weighted finite-state transducers. [Povey11] D. Povey et al., The Kaldi speech recognition toolkit. [Povey12] D. Povey et al., Generating exact lattices in the WFST framework.
More information저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다.
저작자표시 - 비영리 - 변경금지 2.0 대한민국 이용자는아래의조건을따르는경우에한하여자유롭게 이저작물을복제, 배포, 전송, 전시, 공연및방송할수있습니다. 다음과같은조건을따라야합니다 : 저작자표시. 귀하는원저작자를표시하여야합니다. 비영리. 귀하는이저작물을영리목적으로이용할수없습니다. 변경금지. 귀하는이저작물을개작, 변형또는가공할수없습니다. 귀하는, 이저작물의재이용이나배포의경우,
More informationLOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS
LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS Tara N. Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, Bhuvana Ramabhadran IBM T. J. Watson
More informationHandbook of Weighted Automata
Manfred Droste Werner Kuich Heiko Vogler Editors Handbook of Weighted Automata 4.1 Springer Contents Part I Foundations Chapter 1: Semirings and Formal Power Series Manfred Droste and Werner Kuich 3 1
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationContents. Resumen. List of Acronyms. List of Mathematical Symbols. List of Figures. List of Tables. I Introduction 1
Contents Agraïments Resum Resumen Abstract List of Acronyms List of Mathematical Symbols List of Figures List of Tables VII IX XI XIII XVIII XIX XXII XXIV I Introduction 1 1 Introduction 3 1.1 Motivation...
More informationOn Structured Perceptron with Inexact Search, NAACL 2012
On Structured Perceptron with Inexact Search, NAACL 2012 John Hewitt CIS 700-006 : Structured Prediction for NLP 2017-09-23 All graphs from Huang, Fayong, and Guo (2012) unless otherwise specified. All
More informationSpeech Tuner. and Chief Scientist at EIG
Speech Tuner LumenVox's Speech Tuner is a complete maintenance tool for end-users, valueadded resellers, and platform providers. It s designed to perform tuning and transcription, as well as parameter,
More informationSpoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask
NTCIR-9 Workshop: SpokenDoc Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask Hiromitsu Nishizaki Yuto Furuya Satoshi Natori Yoshihiro Sekiguchi University
More informationPart-of-Speech Tagging
Part-of-Speech Tagging A Canonical Finite-State Task 600.465 - Intro to NLP - J. Eisner 1 The Tagging Task Input: the lead paint is unsafe Output: the/ lead/n paint/n is/v unsafe/ Uses: text-to-speech
More informationReview. Pat Morin COMP 3002
Review Pat Morin COMP 3002 What is a Compiler A compiler translates from a source language S to a target language T while preserving the meaning of the input 2 Structure of a Compiler program text syntactic
More informationImplementation of Lexical Analysis. Lecture 4
Implementation of Lexical Analysis Lecture 4 1 Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested It is easier to modify a working
More informationKhmer OCR for Limon R1 Size 22 Report
PAN Localization Project Project No: Ref. No: PANL10n/KH/Report/phase2/002 Khmer OCR for Limon R1 Size 22 Report 09 July, 2009 Prepared by: Mr. ING LENG IENG Cambodia Country Component PAN Localization
More informationIndexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze
Indexing UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze All slides Addison Wesley, 2008 Table of Content Inverted index with positional information
More informationShort-time Viterbi for online HMM decoding : evaluation on a real-time phone recognition task
Short-time Viterbi for online HMM decoding : evaluation on a real-time phone recognition task Julien Bloit, Xavier Rodet To cite this version: Julien Bloit, Xavier Rodet. Short-time Viterbi for online
More informationSWEN 224 Formal Foundations of Programming
T E W H A R E W Ā N A N G A O T E Ū P O K O O T E I K A A M Ā U I VUW V I C T O R I A UNIVERSITY OF WELLINGTON EXAMINATIONS 2011 END-OF-YEAR SWEN 224 Formal Foundations of Programming Time Allowed: 3 Hours
More informationK-best Parsing Algorithms
K-best Parsing Algorithms Liang Huang University of Pennsylvania joint work with David Chiang (USC Information Sciences Institute) k-best Parsing Liang Huang (Penn) k-best parsing 2 k-best Parsing I saw
More informationCS321. Introduction to Numerical Methods
CS31 Introduction to Numerical Methods Lecture 1 Number Representations and Errors Professor Jun Zhang Department of Computer Science University of Kentucky Lexington, KY 40506 0633 August 5, 017 Number
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More information