Ping-pong decoding: Combining forward and backward search
1 Combining forward and backward search. Research internship, Mirko Hannemann, Microsoft Research, Speech Technology (Redmond). Supervisor: Daniel Povey.
2 Beam search and search errors. [Plot: score vs. frame for the partial forward best path and the final best path at several beam widths.]
3 What is the optimal beam width? Only in a few spots do we actually need the full beam. How can we identify those spots? Histogram of score differences between the current best path and the final best path.
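The gap statistic behind this histogram can be sketched in a few lines. The function name, the cost lists, and the numbers below are hypothetical illustrations of the idea, not measurements from the experiments in these slides:

```python
def beam_needed(partial_best_cost, final_path_cost):
    """Per-frame gap between the locally best partial path and the path
    that eventually wins the utterance (both as accumulated costs).
    The decoding beam must exceed the largest gap, otherwise the winning
    path is pruned at that frame (a search error)."""
    return [f - p for p, f in zip(partial_best_cost, final_path_cost)]

# toy utterance: the eventual winner trails the local best around frame 2
gaps = beam_needed([1.0, 2.0, 3.0, 4.5], [1.0, 2.5, 6.0, 4.5])
min_safe_beam = max(gaps)  # the full beam is only needed at one spot
```

Collecting such gaps over many utterances gives the histogram on this slide: most frames need a tiny beam, and only a few spots need a wide one.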
4 Was it a car or a cat I saw? Example (Resource Management). Forward decoding: IS SHERMAN ARE CONIFER AND THREE MOST RECENT CASUALTY REPORT. Backwards decoding: IS BADGER A REMARK ON VANCOUVER'S MOST RECENT CASUALTY REPORT. Noise at the beginning confused the whole forward decoding, but did not harm the backwards decoding much.
5 Analysis of search errors. Are forward and backward search errors independent? fwd: PRODUCTS WOULD BE A MARKET BY OTHER COMPANIES; bwd: PRODUCTS WOULD BE - TARGETED BY OTHER COMPLAINTS; wide: PRODUCTS WOULD BE - TARGETED BY OTHER COMPANIES. [Table: error co-occurrence vs. decoding beam for the forward and backward passes; WSJ Nov 9 test set, aligned against a wide beam (9.0).] Error co-occurrence need not mean the same error.
6 Construction of the decoding network: the weighted finite state transducer (WFST) approach [Mohri et al.]: HCLG = H ∘ C ∘ L ∘ G, where G is the grammar or language model acceptor, L the lexicon (phones to words), C the context-dependency transducer (context-dependent phones to phones), and H the HMM (PDF-ids to context-dependent phones). Kaldi toolkit: HCLG = asl(min(rds(det(H_a ∘ min(det(C ∘ min(det(L ∘ G)))))))), where asl = add self-loops and rds = remove disambiguation symbols.
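As an illustration of the composition step at the heart of this recipe, here is a toy WFST composition in Python. It is a simplification (output-side epsilons handled by advancing only the first machine, no composition filter as in OpenFst); the FST encoding, `compose`, and the tiny L and G are invented for this sketch:

```python
def compose(t1, t2):
    """Compose two toy WFSTs: match t1's output symbols against t2's inputs.
    An FST is {"start": s, "finals": set, "arcs": {state: [(in, out, w, next)]}};
    weights are -log probabilities and add along a path (tropical semiring)."""
    EPS = "<eps>"
    start = (t1["start"], t2["start"])
    arcs, stack, seen = {}, [start], {start}
    while stack:
        q = stack.pop()
        s1, s2 = q
        out = []
        for (i, o, w, n1) in t1["arcs"].get(s1, []):
            if o == EPS:  # epsilon output: advance t1 alone
                cand = [(i, EPS, w, (n1, s2))]
            else:         # otherwise t2 must consume o
                cand = [(i, o2, w + w2, (n1, n2))
                        for (i2, o2, w2, n2) in t2["arcs"].get(s2, [])
                        if i2 == o]
            for a in cand:
                out.append(a)
                if a[3] not in seen:
                    seen.add(a[3]); stack.append(a[3])
        arcs[q] = out
    finals = {q for q in seen if q[0] in t1["finals"] and q[1] in t2["finals"]}
    return {"start": start, "finals": finals, "arcs": arcs}

# L: phones -> words (word emitted on the first phone), G: accepts "cat"
L = {"start": 0, "finals": {3},
     "arcs": {0: [("k", "cat", 0.0, 1)],
              1: [("ae", "<eps>", 0.0, 2)],
              2: [("t", "<eps>", 0.0, 3)]}}
G = {"start": 0, "finals": {1}, "arcs": {0: [("cat", "cat", 0.5, 1)]}}
LG = compose(L, G)  # accepts the phone string "k ae t" and outputs "cat"
```

The real recipe interleaves determinization and minimization after each composition, which is what keeps HCLG tractable for large vocabularies.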
7 Reversing the language model G (word-pair grammar). The reversed LM must assign exactly the same scores to reversed utterances. [Figure: word-pair grammar (uniform distribution, no back-off) as a finite state acceptor, and its reversal.] Steps: acceptor reversal, epsilon removal, determinization and weight pushing in the log semiring. OpenFst: iterative weight pushing algorithm; problems with states that have a huge fan-out.
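Weight pushing in the log semiring can be sketched for the acyclic case: compute, for each state, the -log of the total probability mass of all paths from that state to a final state, then shift that potential toward the start so every state becomes locally normalized. The function and the small FSA below are hypothetical, not OpenFst's implementation:

```python
import math

def push_log(states_topo, arcs, final):
    """Weight pushing in the log semiring on an acyclic FSA.
    states_topo: states in topological order (start first);
    arcs: (src, dst, w) with w a -log probability; final: {state: -log prob}.
    d[q] is the -log total path mass from q to a final state."""
    d = {}
    out = {q: [] for q in states_topo}
    for a in arcs:
        out[a[0]].append(a)
    for q in reversed(states_topo):  # reverse topological order
        terms = ([final[q]] if q in final else []) + \
                [w + d[dst] for (_, dst, w) in out[q]]
        d[q] = -math.log(sum(math.exp(-t) for t in terms))
    pushed_arcs = [(s, t, w + d[t] - d[s]) for (s, t, w) in arcs]
    pushed_final = {q: f - d[q] for q, f in final.items()}
    return pushed_arcs, pushed_final, d[states_topo[0]]

arcs = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 0.5)]
p_arcs, p_final, total = push_log([0, 1, 2], arcs, {2: 0.0})
# after pushing, outgoing probabilities (plus final) sum to 1 at every state
sums = {q: sum(math.exp(-w) for (s, _, w) in p_arcs if s == q)
           + (math.exp(-p_final[q]) if q in p_final else 0.0)
        for q in [0, 1, 2]}
```

For cyclic machines (back-off LMs), OpenFst solves the same potentials iteratively, which is where states with huge fan-out become expensive.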
8 Reversing language models (ARPA). [Figure: ARPA back-off bigram over {a, b} as an acceptor with start/end states SB/SE and back-off arcs, and its reversal.] Weight pushing in the log semiring; problems with states that have a huge fan-out.
9 Reversing language models (ARPA), to be done for higher-order models: find the mathematical equations, or train on reversed training texts (not exact, scores differ slightly).
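For a bigram model without back-off the exact reversal does have a closed form: with alpha(w) the total probability of all sentence prefixes ending in w, Bayes' rule yields reversed conditionals that assign every reversed utterance exactly the forward score (the alphas telescope away). A sketch, with a made-up toy model; the back-off and higher-order cases are the open problem mentioned above:

```python
V = ["a", "b"]
P_init = {"a": 0.5, "b": 0.5}            # P(w | <s>)
P = {"a": {"a": 0.2, "b": 0.5},          # P(w' | w)
     "b": {"a": 0.4, "b": 0.3}}
P_end = {"a": 0.3, "b": 0.3}             # P(</s> | w); every row sums to 1

def fwd_prob(s):
    p = P_init[s[0]] * P_end[s[-1]]
    for u, v in zip(s, s[1:]):
        p *= P[u][v]
    return p

# alpha(w): total probability of all sentence prefixes ending in w,
# the fixed point of alpha(v) = P_init(v) + sum_u alpha(u) * P(v|u)
alpha = {w: 0.0 for w in V}
for _ in range(200):  # converges geometrically since P(</s>|w) > 0
    alpha = {v: P_init[v] + sum(alpha[u] * P[u][v] for u in V) for v in V}

# reversed-model parameters via Bayes' rule
rev_init = {w: P_end[w] * alpha[w] for w in V}   # P_rev(w | <s>)
rev_P = {x: {y: P[y][x] * alpha[y] / alpha[x] for y in V} for x in V}
rev_end = {w: P_init[w] / alpha[w] for w in V}   # P_rev(</s> | w)

def rev_prob(s):
    p = rev_init[s[0]] * rev_end[s[-1]]
    for x, y in zip(s, s[1:]):
        p *= rev_P[x][y]
    return p
```

Multiplying out rev_prob on a reversed sentence, each alpha cancels against its neighbor, leaving exactly fwd_prob of the original sentence, which is the "exactly the same scores" requirement from slide 7.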
10 Reversing the pronunciation dictionary L: e.g. A = ax; ABERDEEN = n iy d er b ae; ABOARD = dd r ao b ax; ADD = dd ae. Add the disambiguation symbols after reversing the pronunciations. [Figure: reversed lexicon transducer L with disambiguation symbols.]
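The reversal-plus-disambiguation step can be sketched as follows. The function and the toy lexicon are hypothetical, loosely following the Mohri-style recipe used by Kaldi's add_lex_disambig.pl:

```python
def reverse_and_disambiguate(lexicon):
    """Reverse each pronunciation, then append auxiliary symbols #1, #2, ...
    wherever a reversed pronunciation equals, or is a prefix of, another one,
    so that the reversed lexicon transducer stays determinizable."""
    rev = {w: list(reversed(p)) for w, p in lexicon.items()}
    prons = [tuple(p) for p in rev.values()]
    seen, out = {}, {}
    for word, pron in rev.items():
        t = tuple(pron)
        ambiguous = prons.count(t) > 1 or any(
            len(p) > len(t) and p[:len(t)] == t for p in prons)
        if ambiguous:
            k = seen.get(t, 0) + 1
            seen[t] = k
            out[word] = pron + [f"#{k}"]   # homophones get distinct symbols
        else:
            out[word] = pron
    return out

lex = {"aberdeen": ["ae", "b", "er", "d", "iy", "n"],
       "aboard":   ["ax", "b", "ao", "r", "dd"],
       "add":      ["ae", "dd"],
       "red":      ["r", "eh", "d"],   # homophone pair
       "read":     ["r", "eh", "d"],
       "oh":       ["ow"],             # reversed, a prefix of reversed "hoe"
       "hoe":      ["hh", "ow"]}
rlex = reverse_and_disambiguate(lex)
```

The key point of the slide is the ordering: the symbols must be added after reversing, because which pronunciations collide or share prefixes changes under reversal.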
11 Reversing the context-dependency transducer C: a triphone arc a-b-c/c becomes c-b-a (e.g. eps-a-b/b, a-b-c/c, b-c-d/d, c-d-eps/$). The decision tree clusters on the phoneme context window and the HMM state; the context window out of L ∘ G is reversed!
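Why the context window flips can be seen directly: the triphone windows of a reversed phone sequence are exactly the reversed windows of the original sequence, in reverse order. A minimal check (helper name invented for this sketch):

```python
def triphones(phones, pad="<eps>"):
    """Context windows of width 3 around each phone: (left, center, right)."""
    p = [pad] + phones + [pad]
    return [(p[i - 1], p[i], p[i + 1]) for i in range(1, len(p) - 1)]

fwd = triphones(["a", "b", "c", "d"])   # (eps,a,b), (a,b,c), (b,c,d), (c,d,eps)
bwd = triphones(["d", "c", "b", "a"])   # (eps,d,c), (d,c,b), (c,b,a), (b,a,eps)
```

So the backward decoder can reuse the same decision tree as long as C is built with left and right context swapped.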
12 Reversing the HMM transducer H_a. [Figure: H_a transducer with transition-ids on the input side and phones/disambiguation symbols on the output side; here shown for the monophone case.]
13 Reversing the HMM transducer H_a: reverse the phone HMMs, remove epsilons, and push weights in the log semiring before composing H_a; when adding self-loops, the order of transitions changes. [Figure: the reversed H_a transducer.]
14 First pass forward search, second pass backwards (or vice versa). How can we use the search result of the first pass in second-pass decoding? Convert the lattice generated by HCLG_1st into a lattice of decoding-graph states of HCLG_2nd for each frame.
15 Generation of graph-state lattices. (1) Map HCLG_2nd to a PDF-to-arc transducer HCLG_arc: HCLG_2nd transduces PDF-ids into words; encode the HCLG_2nd node and arc-id into the output symbol; map the input to be independent of self-loop order. (2) Map the first-pass lattice LAT_1st to LAT_rev: map the input (self-loops), project on the input, remove the weights; time-reverse the lattice and remove epsilons. (3) Compose: LAT_arc = LAT_rev ∘ HCLG_arc, which obtains the sequences of HCLG_2nd arcs for each PDF sequence in the lattice. (4) Lattice determinization det(LAT_arc) (on PDF-ids) in a special semiring, giving a single HCLG_2nd path for each sequence of PDFs; project to HCLG_2nd node/arc-ids and determinize again. The output is an acceptor lattice over HCLG_2nd graph arcs.
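The time-reversal sub-step amounts to flipping every arc and swapping start and final states, which preserves path weights while reversing label sequences. A minimal sketch with an invented lattice encoding:

```python
def reverse_lattice(lat):
    """Time-reverse an acyclic lattice: flip each arc (src, dst, label, w)
    and exchange the roles of the start and final states."""
    return {"start": lat["final"], "final": lat["start"],
            "arcs": [(dst, src, lab, w) for (src, dst, lab, w) in lat["arcs"]]}

def paths(lat):
    """All (label-sequence, total-weight) pairs from start to final, via DFS."""
    out, adj = [], {}
    for (s, d, lab, w) in lat["arcs"]:
        adj.setdefault(s, []).append((d, lab, w))
    def walk(q, labs, w):
        if q == lat["final"]:
            out.append((tuple(labs), w))
        for (d, lab, aw) in adj.get(q, []):
            walk(d, labs + [lab], w + aw)
    walk(lat["start"], [], 0.0)
    return sorted(out)

lat = {"start": 0, "final": 3,
       "arcs": [(0, 1, "a", 1.0), (1, 3, "b", 2.0),
                (0, 2, "a", 0.5), (2, 3, "c", 1.5)]}
rev = reverse_lattice(lat)
```

In the actual pipeline this happens on the weight-stripped, input-projected lattice, so only the PDF sequences matter for the subsequent composition with HCLG_arc.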
20 Using the first-pass search in the second pass. For each time step: perform your own search; track where the other pass has been; extend the search area; if the paths cross, adapt the shorter paths.
21 Using the first-pass search in the second pass: perform your own search with first- and second-pass beams; keep the set of observed tokens, move them according to the arc-lattice, track them and never prune them; extend the beam to include all observed tokens; add an extra-beam, limited by a max-beam; on token recombination, inherit the observation status.
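The adaptive-beam idea on this slide can be sketched as a pruning rule that widens the beam just enough to keep every tracked first-pass token, capped by the max-beam. Token ids, costs, and the function below are hypothetical illustrations, not Kaldi's actual decoder code:

```python
def prune_tokens(tokens, tracked, beam, extra_beam, max_beam):
    """Beam pruning that never drops tokens on the tracked first-pass path.
    tokens: {token_id: cost}; tracked: ids observed in the other pass's
    arc-lattice. The effective beam is widened to cover every tracked token
    (plus extra_beam for their competitors), but capped at max_beam."""
    best = min(tokens.values())
    needed = max((tokens[t] - best for t in tracked if t in tokens), default=0.0)
    eff_beam = min(max(beam, needed + extra_beam), max_beam)
    return {t: c for t, c in tokens.items()
            if c <= best + eff_beam or t in tracked}

tokens = {1: 0.0, 2: 18.0, 3: 20.0}   # token 3 lies on the first-pass path
kept = prune_tokens(tokens, tracked={3}, beam=10.0, extra_beam=2.0, max_beam=15.0)
```

Here token 2 is pruned (outside the capped beam, not tracked) while token 3 survives despite its high cost, because the first pass says that region of the graph was worth exploring.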
22 Results on Wall Street Journal (Nov test set). [Plot: WER vs. real-time factor for forward, backward, and several ping-pong decoding configurations.]
23 Analysis of search errors. [Table: error co-occurrence vs. decoding beam for forward, backward, and ping-pong decoding.] WSJ Nov 9 closed-vocabulary test set, 330 utterances; triphone HMM+GMM trained on 80h of WSJ0 (Kaldi tria); bigram 5k language model (exact scores for the reversal). fwd: BRIAN J. KILLING CHAIRMAN OF BELL - ATLANTA X. INVESTMENT; bwd: BRIAN J. DAILY CHAIRMAN OF BELL AND LAND SIX INVESTMENT; png: BRIAN J. DAILY CHAIRMAN OF BELL - ATLANTA ITS INVESTMENT; ref: BRIAN J. DAILY CHAIRMAN OF BELL - ATLANTA ITS INVESTMENT.
24 Time analysis: where is the time spent in ping-pong decoding? 35% first-pass decoding (narrower lattice); 0% lattice-to-arc-lattice conversion (a separate program); <5% feature reversal and different ambiguity; 40% second-pass decoding and lattice generation, of which 5-0% is tracking first-pass tokens and 5-5% is extra tokens in the wider beam. About 0% optimization is possible, but it does not change things fundamentally.
25 Summary: (1) backwards decoding with reversed decoding networks and LMs; (2) WFST-based arc-lattice generation; (3) integrating the first-pass search into the second pass; (4) tracking the arc-lattice and varying the beam; (5) roughly a two-fold speed-up from ping-pong decoding. Open issues: reversing language models / reversed training; too many parameters (forward beam, backward beam, lattice beam, extra-beam, max-beam, max-states, final-beam).
26 References. [Mohri08] M. Mohri et al., Speech recognition with weighted finite-state transducers. [Povey11] D. Povey et al., The Kaldi speech recognition toolkit. [Povey12] D. Povey et al., Generating exact lattices in the WFST framework.
More information저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다.
저작자표시 - 비영리 - 변경금지 2.0 대한민국 이용자는아래의조건을따르는경우에한하여자유롭게 이저작물을복제, 배포, 전송, 전시, 공연및방송할수있습니다. 다음과같은조건을따라야합니다 : 저작자표시. 귀하는원저작자를표시하여야합니다. 비영리. 귀하는이저작물을영리목적으로이용할수없습니다. 변경금지. 귀하는이저작물을개작, 변형또는가공할수없습니다. 귀하는, 이저작물의재이용이나배포의경우,
More informationLOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS
LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORK TRAINING WITH HIGH-DIMENSIONAL OUTPUT TARGETS Tara N. Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, Bhuvana Ramabhadran IBM T. J. Watson
More informationHandbook of Weighted Automata
Manfred Droste Werner Kuich Heiko Vogler Editors Handbook of Weighted Automata 4.1 Springer Contents Part I Foundations Chapter 1: Semirings and Formal Power Series Manfred Droste and Werner Kuich 3 1
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationContents. Resumen. List of Acronyms. List of Mathematical Symbols. List of Figures. List of Tables. I Introduction 1
Contents Agraïments Resum Resumen Abstract List of Acronyms List of Mathematical Symbols List of Figures List of Tables VII IX XI XIII XVIII XIX XXII XXIV I Introduction 1 1 Introduction 3 1.1 Motivation...
More informationOn Structured Perceptron with Inexact Search, NAACL 2012
On Structured Perceptron with Inexact Search, NAACL 2012 John Hewitt CIS 700-006 : Structured Prediction for NLP 2017-09-23 All graphs from Huang, Fayong, and Guo (2012) unless otherwise specified. All
More informationSpeech Tuner. and Chief Scientist at EIG
Speech Tuner LumenVox's Speech Tuner is a complete maintenance tool for end-users, valueadded resellers, and platform providers. It s designed to perform tuning and transcription, as well as parameter,
More informationSpoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask
NTCIR-9 Workshop: SpokenDoc Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask Hiromitsu Nishizaki Yuto Furuya Satoshi Natori Yoshihiro Sekiguchi University
More informationPart-of-Speech Tagging
Part-of-Speech Tagging A Canonical Finite-State Task 600.465 - Intro to NLP - J. Eisner 1 The Tagging Task Input: the lead paint is unsafe Output: the/ lead/n paint/n is/v unsafe/ Uses: text-to-speech
More informationReview. Pat Morin COMP 3002
Review Pat Morin COMP 3002 What is a Compiler A compiler translates from a source language S to a target language T while preserving the meaning of the input 2 Structure of a Compiler program text syntactic
More informationImplementation of Lexical Analysis. Lecture 4
Implementation of Lexical Analysis Lecture 4 1 Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested It is easier to modify a working
More informationKhmer OCR for Limon R1 Size 22 Report
PAN Localization Project Project No: Ref. No: PANL10n/KH/Report/phase2/002 Khmer OCR for Limon R1 Size 22 Report 09 July, 2009 Prepared by: Mr. ING LENG IENG Cambodia Country Component PAN Localization
More informationIndexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze
Indexing UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze All slides Addison Wesley, 2008 Table of Content Inverted index with positional information
More informationShort-time Viterbi for online HMM decoding : evaluation on a real-time phone recognition task
Short-time Viterbi for online HMM decoding : evaluation on a real-time phone recognition task Julien Bloit, Xavier Rodet To cite this version: Julien Bloit, Xavier Rodet. Short-time Viterbi for online
More informationSWEN 224 Formal Foundations of Programming
T E W H A R E W Ā N A N G A O T E Ū P O K O O T E I K A A M Ā U I VUW V I C T O R I A UNIVERSITY OF WELLINGTON EXAMINATIONS 2011 END-OF-YEAR SWEN 224 Formal Foundations of Programming Time Allowed: 3 Hours
More informationK-best Parsing Algorithms
K-best Parsing Algorithms Liang Huang University of Pennsylvania joint work with David Chiang (USC Information Sciences Institute) k-best Parsing Liang Huang (Penn) k-best parsing 2 k-best Parsing I saw
More informationCS321. Introduction to Numerical Methods
CS31 Introduction to Numerical Methods Lecture 1 Number Representations and Errors Professor Jun Zhang Department of Computer Science University of Kentucky Lexington, KY 40506 0633 August 5, 017 Number
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More information