
1 Speech Recognition Lecture 12: Lattice Algorithms. Cyril Allauzen Google, NYU Courant Institute Slide Credit: Mehryar Mohri

2 This Lecture Speech recognition evaluation N-best string algorithms Lattice generation Discriminative training

3 Performance Measure Accuracy: based on the edit-distance between the speech recognition transcription and the reference transcription. word or phone accuracy. lattice oracle accuracy: edit-distance between the lattice and the reference transcription. Note: the performance measure (word-error rate) does not match the quantity optimized to learn the models.

4 Word Error Rates [Table: word error rates across tasks; values lost in extraction. Based on 1998 evaluation.]

5 Word Error Rate [Table: word error rate by task and hours of training data, comparing DNN-HMM, GMM-HMM trained on the same data, and GMM-HMM trained on more data, for Switchboard test sets 1 and 2 (GMM-HMM with 2,000 h), Broadcast News, Bing Voice Search (sentence error rate), Google Voice Input (GMM-HMM with >> 5,870 h), and YouTube; numeric entries lost in extraction.] Source: Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine, 2012.

6 Edit-Distance Definition: minimal cost of a sequence of edit operations transforming one string into another. Edit operations and costs: standard edit-distance definition: insertions, deletions, substitutions, all with cost one. general case: more general operations, arbitrary non-negative costs. Application: measuring word error rate in speech recognition and other string processing tasks.
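To make the standard definition concrete, here is a minimal Python sketch of the textbook quadratic dynamic program with unit costs; the function name edit_distance and the test strings are ours, not from the lecture.

```python
def edit_distance(x, y):
    """Standard edit-distance: unit-cost insertions, deletions, substitutions."""
    m, n = len(x), len(y)
    # d[i][j] = edit-distance between the prefixes x[:i] and y[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                               # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j                               # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[m][n]

assert edit_distance("kitten", "sitting") == 3   # 2 substitutions + 1 insertion
```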

7 Local Edits Edit operations: insertion: ε → a, deletion: a → ε, substitution: a → b (a ≠ b). Example with 2 insertions, 3 deletions, 1 substitution:
c t t g ε ε a c
ε t a ε g t ε c
This is called an alignment.

8 Edit-Distance Computation Standard case: textbook recursive algorithm (Cormen, Leiserson, and Rivest, 1992), quadratic O(|x||y|) complexity for two strings x and y. General case (Mohri, Pereira, and Riley, 2000; Mohri, 2003): construct a tropical-semiring edit-distance transducer T_e with arbitrary edit costs. represent x and y by automata X and Y. compute the best path of X ∘ T_e ∘ Y. complexity quadratic: O(|T_e| |X| |Y|).
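For comparison with the transducer formulation, here is a hedged Python sketch of the same dynamic program with arbitrary non-negative costs passed in as functions; the ins_cost/del_cost/sub_cost interface is our illustration, not the T_e construction, and the default substitution cost of 2.0 for pairs the next slide leaves undefined is an assumption.

```python
def general_edit_distance(x, y, ins_cost, del_cost, sub_cost):
    """Edit-distance with arbitrary non-negative operation costs.

    ins_cost(b): cost of eps -> b; del_cost(a): cost of a -> eps;
    sub_cost(a, b): cost of a -> b (0 for a match). This computes the
    same value as the best path of X o T_e o Y in the tropical semiring.
    """
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + del_cost(x[i - 1]),
                          d[i][j - 1] + ins_cost(y[j - 1]),
                          d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]))
    return d[m][n]

# Costs from the next slide: A<->G costs 1, A<->T and G<->C cost 0.5,
# matches are free, insertions/deletions cost 2 (other pairs: our default).
subs = {frozenset("AG"): 1.0, frozenset("AT"): 0.5, frozenset("GC"): 0.5}
print(general_edit_distance(
    "AGCT", "ACCTG",
    ins_cost=lambda b: 2.0,
    del_cost=lambda a: 2.0,
    sub_cost=lambda a, b: 0.0 if a == b else subs.get(frozenset((a, b)), 2.0)))
# -> 2.5, the cost of the best path shown on slide 10
```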

9 Global Alignment - Example Example: c(A, G) = 1, c(A, T) = c(G, C) = 0.5, no cost for matching symbols, insertion/deletion cost 2 (not pictured). [Figure: edit-distance transducer T_e with arcs A:A/0, C:C/0, G:G/0, T:T/0, A:G/1.0, G:A/1.0, A:T/0.5, T:A/0.5, C:G/0.5, G:C/0.5; linear automata for the strings AGCT and ACCTG.] echo "A G C T" | ngramsymbols > agct.syms; echo "A G C T" | farcompilestrings --symbols=agct.syms --keep_symbols=1 --generate_keys=1 --far_type=fst

10 Global Alignment - Example Program: fstcompose X.fst Te.fst | fstcompose - Y.fst | fstshortestpath --nshortest=1 > A.fst [Figure: resulting best-path alignment A:A/0 → G:C/0.5 → C:C/0 → T:T/0 → ε:G/2, total cost 2.5.]

11 Edit-Distance of Automata Definition: the edit-distance of two automata A and B is the minimum edit-distance of a string accepted by A and a string accepted by B. Computation: best path of A ∘ T_e ∘ B. complexity for acyclic automata: O(|T_e| |A| |B|). Generality: any weighted transducer in the tropical semiring defines an edit-distance. the edit-distance transducer can be learned using the EM algorithm.
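The composition-and-best-path computation can be sketched directly as a shortest-path search over the product of the two automata; the dict-based encoding and unit edit costs below are our assumptions, not OpenFst's API.

```python
import heapq

def automata_edit_distance(A, B):
    """Minimum edit-distance between a string accepted by A and one accepted by B.

    Each automaton is (start, finals, arcs) with arcs: state -> [(label, next)].
    A plain Dijkstra over the product construction, conceptually the best path
    of A o T_e o B with unit edit costs (a sketch, not OpenFst's fstcompose).
    """
    (sa, fa, ta), (sb, fb, tb) = A, B
    dist = {(sa, sb): 0}
    heap = [(0, sa, sb)]
    while heap:
        d, p, q = heapq.heappop(heap)
        if d > dist.get((p, q), float("inf")):
            continue                  # stale queue entry
        if p in fa and q in fb:
            return d                  # first settled final product state wins
        moves = []
        for a, p2 in ta.get(p, []):
            moves.append((1, p2, q))                        # deletion: a -> eps
            for b, q2 in tb.get(q, []):
                moves.append((0 if a == b else 1, p2, q2))  # match / substitution
        for b, q2 in tb.get(q, []):
            moves.append((1, p, q2))                        # insertion: eps -> b
        for c, p2, q2 in moves:
            if d + c < dist.get((p2, q2), float("inf")):
                dist[(p2, q2)] = d + c
                heapq.heappush(heap, (d + c, p2, q2))
    return float("inf")

# A accepts {"ab"}, B accepts {"ab", "b"}; their edit-distance is 0.
A = (0, {2}, {0: [("a", 1)], 1: [("b", 2)]})
B = (0, {2}, {0: [("a", 1), ("b", 2)], 1: [("b", 2)]})
assert automata_edit_distance(A, B) == 0
```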

12 This Lecture Speech recognition evaluation N-best string algorithms Lattice generation Discriminative training

13 N-Best Sequences Motivation: rescoring. first pass using a simple acoustic model and grammar to produce a lattice or N-best list. re-evaluate the alternatives with a more sophisticated model or with new information. General problem: speech recognition, handwriting recognition, information extraction, image processing.

14 N-Shortest-Paths Problem Problem: given a weighted directed graph G, a source state s and a set of destination or final states F, find the N shortest paths in G from s to F. Algorithms: (Dreyfus, 1969): O(|E| + N log(|E|/|Q|)). (Mohri, 2002): shortest-distance algorithm in the N-tropical semiring. (Eppstein, 1998): O(|E| + |Q| log |Q| + N).

15 N-Shortest Strings Problem: given a weighted directed graph G, a source state s and a set of destination or final states F, find the N shortest distinct strings in G from s to F. Example: NAB Eval 95. [Table: number of non-unique paths vs. unique strings per pruning threshold; values lost in extraction.]

16 N-Shortest Paths Program: fstprune --delta=1.5 lat.fst | fstshortestpath --nshortest=10
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around

17 N-Shortest Strings Program: fstprune --delta=1.5 lat.fst | fstshortestpath --nshortest=10 --unique
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around

18 Algorithms Based on N-Best Paths Idea: use a K-best paths algorithm and keep the N distinct strings among the paths (Chow and Schwartz, 1990; Soong and Huang, 1991). Problems: K is not known in advance. in practice, K may sometimes be quite large, exponentially larger than N in the worst case, which affects both time and space complexity.
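A minimal Python sketch of this naive approach (our names and encoding): enumerate paths best-first and deduplicate, observing that the number K of paths consumed is only known after the fact.

```python
def n_best_strings_naive(path_iter, N):
    """Naive approach from the slide: scan paths best-first and keep the
    first N distinct label sequences. Returns the strings found and K,
    the number of paths consumed, which is not known in advance and can
    be much larger than N.
    """
    seen, out, K = set(), [], 0
    for cost, labels in path_iter:       # assumed sorted by increasing cost
        K += 1
        key = tuple(labels)
        if key not in seen:
            seen.add(key)
            out.append((cost, labels))
            if len(out) == N:
                break
    return out, K

paths = [(1.0, ["a"]), (1.2, ["a"]), (1.5, ["b"]), (1.6, ["a"]), (2.0, ["c"])]
print(n_best_strings_naive(iter(paths), 2))   # ([(1.0, ['a']), (1.5, ['b'])], 3)
```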

19 N-Best String Algorithm Idea: apply the N-best paths algorithm to an on-the-fly determinization of the input automaton. But N-best paths algorithms require the shortest distances to F. Weighted determinization (partial) (Mohri and Riley, 2002): eliminates redundancy, no determinizability issue. on-demand computation: only the part needed is computed. on-the-fly computation of the needed shortest-distances to final states.

20 Shortest-Distances to Final States Definition: let d(q, F) denote the shortest distance from q to the set of final states F in the input (nondeterministic) automaton A, and let d'(q', F) be defined in the same way in the resulting (deterministic) automaton B. Theorem: for any state q' = {(q_1, w_1), ..., (q_n, w_n)} of B, the following holds: d'(q', F) = min_{i=1,...,n} (w_i + d(q_i, F)).
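The two ingredients, one on-demand determinization step and the theorem's shortest-distance formula, can be sketched as follows; the dict-based arc encoding is our assumption (a sketch of the construction in (Mohri and Riley, 2002), not OpenFst's implementation).

```python
def det_successor(subset, a, arcs):
    """One on-demand step of weighted determinization (tropical semiring).

    subset: tuple of (state, residual_weight) pairs encoding one state of B;
    arcs: q -> {label: [(weight, next_state)]} in the input automaton A.
    Returns (arc_weight, successor_subset), or None if `a` leaves nowhere.
    """
    reach = {}
    for q, v in subset:
        for w, q2 in arcs.get(q, {}).get(a, []):
            reach[q2] = min(reach.get(q2, float("inf")), v + w)
    if not reach:
        return None
    W = min(reach.values())          # weight assigned to the new arc of B
    return W, tuple(sorted((q2, c - W) for q2, c in reach.items()))

def det_distance_to_finals(subset, d):
    """The theorem: d'(q', F) = min_i (w_i + d(q_i, F))."""
    return min(w + d[q] for q, w in subset)

arcs = {0: {"a": [(1.0, 1), (3.0, 2)]}}
W, succ = det_successor(((0, 0.0),), "a", arcs)
# W = 1.0, succ = ((1, 0.0), (2, 2.0))
print(det_distance_to_finals(succ, {1: 4.0, 2: 1.0}))   # min(0+4, 2+1) = 3.0
```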

21 Simple N-Shortest-Paths Algorithm
for p ← 1 to |Q| do r[p] ← 0
π[(i, 0)] ← NIL
S ← {(i, 0)}
while S ≠ ∅
    do (p, c) ← head(S); Dequeue(S)
       r[p] ← r[p] + 1
       if r[p] = N and p ∈ F then exit
       if r[p] ≤ N
          then for each e ∈ E[p]
               do c′ ← c + w[e]
                  π[(n[e], c′)] ← (p, c)
                  Enqueue(S, (n[e], c′))
Queue ordering: (p, c) < (p′, c′) iff c + d(p, F) < c′ + d(p′, F).
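A runnable Python rendering of this pseudocode, assuming the shortest distances d(q, F) have already been computed (e.g., by Dijkstra on the reversed graph); the data-structure choices and names are ours.

```python
import heapq
from itertools import count

def n_shortest_paths(arcs, start, finals, d, N):
    """N shortest paths from `start` to the final states, as label sequences.

    arcs: state -> [(label, weight, next_state)]; d[q]: shortest distance
    from q to the finals (assumed precomputed). The queue is ordered by
    c + d(q, F), as on the slide, and each state is extracted at most N times.
    """
    r = {}                       # r[q]: number of times q has been extracted
    tie = count()                # tie-breaker so the heap never compares paths
    heap = [(d[start], 0.0, next(tie), start, [])]
    results = []
    while heap and len(results) < N:
        _, c, _, q, path = heapq.heappop(heap)
        r[q] = r.get(q, 0) + 1
        if q in finals:
            results.append((c, path))
        if r[q] <= N:
            for label, w, nxt in arcs.get(q, []):
                c2 = c + w
                heapq.heappush(heap, (c2 + d[nxt], c2, next(tie), nxt, path + [label]))
    return results

arcs = {0: [("a", 1.0, 1), ("b", 2.0, 1)], 1: [("c", 0.5, 2)]}
d = {0: 1.5, 1: 0.5, 2: 0.0}
print(n_shortest_paths(arcs, 0, {2}, d, 2))
# [(1.5, ['a', 'c']), (2.5, ['b', 'c'])]
```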

22 N-Best String Alg. - Experiments [Table: oracle word accuracy vs. ×real-time on the NAB 40K bigram task for the 1-best, several N-best settings, and the full lattice; values lost in extraction.] The additional time to pay for N-best is very small, even for large N.

23 N-Best String Alg. - Properties Simplicity and efficiency: easy to implement: combines two general algorithms. works with any N-best paths algorithm. empirically efficient. Generality: arbitrary input automaton (not necessarily acyclic). incorporated in the OpenFst library (fstshortestpath).

24 This Lecture Speech recognition evaluation N-best string algorithms Lattice generation Discriminative training

25 Speech Recognition Lattices Definition: weighted automaton representing a speech recognizer's alternative hypotheses. [Figure: word lattice with weighted arcs such as hi/…, this/153.0, I'd/143.3, is/70.97, uh/83.34, Mike/192.5, my/19.2, hard/…, card/20.1, number/…, like/….]

26 Lattice Generation (Odell, 1995; Ljolje et al., 1999) Procedure: for each transition e reaching state n[e] at time t during Viterbi decoding, keep in the lattice the transition ((p[e], t'), i[e], o[e], (n[e], t)) with the best start time t'. [Figure: trellis fragment with competing predecessors (r, t') and (r, t'') for the state (q, t).]

27 Lattice Generation Computation time: little extra computation over one-best. Optimization: projection on the output (words or phonemes). epsilon-removal. pruning: keeps the transitions and states lying on paths whose total weight is within a threshold of the best path. garbage-collection (uses the same pruning).
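The pruning step can be made concrete with a short sketch: an arc survives only if the best complete path through it is within threshold of the overall best path. The acyclic-lattice encoding and all names below are our assumptions, not OpenFst's fstprune.

```python
def prune_lattice(arcs, start, finals, order, threshold):
    """Keep only arcs lying on paths within `threshold` of the best path.

    arcs: state -> [(label, weight, next_state)] over an acyclic lattice;
    `order` is a topological ordering of its states. A sketch of the beam
    pruning described on the slide.
    """
    INF = float("inf")
    alpha = {q: INF for q in order}        # best cost start -> q
    alpha[start] = 0.0
    for q in order:                        # forward pass
        for _, w, nxt in arcs.get(q, []):
            alpha[nxt] = min(alpha[nxt], alpha[q] + w)
    beta = {q: 0.0 if q in finals else INF for q in order}
    for q in reversed(order):              # backward pass: best cost q -> finals
        for _, w, nxt in arcs.get(q, []):
            beta[q] = min(beta[q], w + beta[nxt])
    best = beta[start]                     # cost of the overall best path
    return {q: [(lab, w, nxt) for lab, w, nxt in arcs.get(q, [])
                if alpha[q] + w + beta[nxt] <= best + threshold]
            for q in order}
```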

28 Notes Heuristics: not all paths within the beam are kept in the lattice. Lattice quality: oracle accuracy, that is, the best accuracy achieved by any path in the lattice. Optimizations: weighted determinization and minimization. in general, dramatic reduction of redundancy and size. bad for some lattices, typically uncertain cases.

29 Speech Recognition Lattice [Figure: raw lattice for the utterance "which flights leave detroit and arrive at saint petersburg around nine a m", with hundreds of redundant weighted arcs (flights/50.2, leave/51.4, detroit/105, arrive/21.1, around/109, nine/5.61, ...).]

30 Lattice after Determinization (Mohri, 1997) [Figure: the same lattice after weighted determinization: a deterministic automaton with far fewer states and arcs, e.g. which/… → flights/… → leave/… → detroit/105 → ….]

31 Lattice after Minimization (Mohri, 1997) [Figure: the determinized lattice after minimization: the smallest equivalent deterministic weighted automaton.]

32 This Lecture Speech recognition evaluation N-best string algorithms Lattice generation Discriminative training

33 Discriminative Techniques Maximum likelihood: parameters adjusted to increase the joint likelihood of the acoustics and the CD phone or word sequences, irrespective of the probability of other word hypotheses. Discriminative techniques: take into account competing word hypotheses and attempt to reduce the probability of incorrect ones. Main problems: computationally expensive, generalization.

34 Objective Functions Maximum likelihood (joint): F_ML = argmax_θ Σ_{i=1}^m log p_θ(o_i, w_i). Conditional maximum likelihood (CML) (Woodland and Povey, 2002): F_CML = argmax_θ Σ_{i=1}^m log p_θ(w_i | o_i) = argmax_θ Σ_{i=1}^m log [p_θ(o_i, w_i) / p_θ(o_i)]. Maximum mutual information (MMI/MMIE): F_MMI = argmax_θ Σ_{i=1}^m log [p_θ(o_i | w_i) p(w_i) / Σ_ŵ p_θ(o_i | ŵ) p(ŵ)]. Equivalent to CML when the language model p(w) is independent of θ.
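As a worked illustration of the MMI criterion (our toy numbers, not from the lecture): for each utterance, the objective is the log of the reference hypothesis's share of the total score mass over all competitors.

```python
import math

def mmi_objective(utterances):
    """Sum over utterances of log [p(o|w_ref) p(w_ref) / sum_w p(o|w) p(w)].

    Each utterance is (ref, hyps) where hyps maps every competing word
    sequence w (including ref) to its joint score p(o|w) * p(w).
    A toy illustration of the MMI/CML criterion, not a trainer.
    """
    total = 0.0
    for ref, hyps in utterances:
        num = hyps[ref]                 # numerator: reference hypothesis
        den = sum(hyps.values())        # denominator: all competitors
        total += math.log(num / den)
    return total

# One utterance with three competing hypotheses; increasing the reference's
# share of the mass (not just its absolute likelihood) raises the objective.
utts = [("to run", {"to run": 0.5, "around": 0.3, "to ron": 0.2})]
print(mmi_objective(utts))   # log(0.5 / 1.0) = -0.693...
```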

35 Discriminative Training Algorithm Produce the numerator lattice by decoding with only the reference transcript allowed (similar to alignment for MLE training). Produce the denominator lattice by doing an open-ended decode with a simple language model (a unigram model trained on the transcripts of the training data). Gather Gaussian component occupancy counts and first- and second-order statistics of the data points. Perform the model updates using modified Baum-Welch update equations.
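For the update step, a minimal sketch of the extended Baum-Welch mean re-estimation commonly used with MMI (Woodland and Povey, 2002); the smoothing constant D and all variable names are our assumptions, and the variance and weight updates are omitted.

```python
import numpy as np

def ebw_mean_update(num_occ, num_sum, den_occ, den_sum, old_mean, D):
    """Extended Baum-Welch mean update for one Gaussian component.

    num_occ/den_occ: occupancy counts (gamma) from the numerator/denominator
    lattices; num_sum/den_sum: the corresponding first-order statistics
    sum_t gamma_t * o_t. D > 0 smooths toward the old mean and keeps the
    update's denominator positive. A sketch, not a full MMI trainer.
    """
    return (num_sum - den_sum + D * old_mean) / (num_occ - den_occ + D)

new_mean = ebw_mean_update(
    num_occ=10.0, num_sum=np.array([5.0, 2.0]),
    den_occ=4.0,  den_sum=np.array([1.0, 1.0]),
    old_mean=np.array([0.0, 0.0]), D=8.0)
print(new_mean)   # [4/14, 1/14] = [0.2857..., 0.0714...]
```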

36 Minimum Error Objective Functions Minimum Word Error (MWE) (Povey and Woodland, 2002): F_MWE = argmax_θ Σ_{i=1}^m [Σ_ŵ p_θ(o_i | ŵ) p(ŵ) RawAccuracy(w_i, ŵ)] / [Σ_ŵ p_θ(o_i | ŵ) p(ŵ)]. Minimum Phone Error (MPE): uses phone or context-dependent phone accuracy instead of word accuracy. In practice, MPE works better than MWE.

37 References
Y. Chow and R. Schwartz. The N-Best Algorithm: An Efficient Procedure for Finding Top N Sentence Hypotheses. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '90), Albuquerque, New Mexico, April 1990.
S. E. Dreyfus. An Appraisal of Some Shortest Path Algorithms. Operations Research, 17:395-412, 1969.
David Eppstein. Finding the k Shortest Paths. SIAM Journal on Computing, 28(2):652-673, 1998.
Andrej Ljolje, Fernando Pereira, and Michael Riley. Efficient General Lattice Generation and Rescoring. In Proceedings of the European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary, 1999.
Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23(2), 1997.
Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.

38 References
Mehryar Mohri. Edit-Distance of Weighted Automata: General Definitions and Algorithms. International Journal of Foundations of Computer Science, 14(6):957-982, 2003.
Mehryar Mohri and Michael Riley. An Efficient Algorithm for the N-Best-Strings Problem. In Proceedings of the International Conference on Spoken Language Processing (ICSLP '02), Denver, Colorado, September 2002.
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January 2000.
Julian Odell. The Use of Context in Large Vocabulary Speech Recognition. Ph.D. thesis, Cambridge University, UK, 1995.
Frank Soong and Eng-Fong Huang. A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '91), Toronto, Canada, 1991.

39 References
Phil Woodland and Daniel Povey. Large Scale Discriminative Training for Speech Recognition. Computer Speech and Language, 16:25-47, 2002.
Daniel Povey and Phil Woodland. Minimum Phone Error and I-smoothing for Improved Discriminative Training. In Proceedings of ICASSP, 2002.
