Detailed Notes on A Voice Dialing Application


Melina Posy Harper

It is assumed that you have worked through steps 1-3 of the HTK Workshop. These notes are intended to support you while working through the HTK tutorial example.

0. General Preparations

Create a new project folder named <voicedialsystem>.

1. Copy HParse, HDMan, HCopy, HLEd, HERest, HVite, HCompV, HHEd and HResults from the <bin.win32> folder to your project.
2. Copy prompts2wlist, prompts2mlf and maketrihed from the <HTKtutorial> folder to your project.
3. Copy mkclscript.prl from the <perl_scripts> folder to your project.
4. Copy beep-1.0 from the <Beep> folder to your project.

1. Data Preparations

1.1 The Task Grammar

Create a gram file with the following content:

    $digit = ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE | OH | ZERO;
    $name = [JOOP] [JULIAN] [DAVE] [PHIL] WOOD [STEVE] YOUNG;
    (SENT-START (DIAL <$digit> | (PHONE | CALL) $name) SENT-END)

Now execute on the command line:

    HParse gram wdnet

1.2 The Dictionary

Create a trainprompts file by copying and re-labelling the selected training text from the TIMIT database, which is available in the <speech> directory on the DVD. Extract the training word list (wlist) with the following script:

    perl prompts2wlist trainprompts wlist

Create a global.ded file with the following content:

    AS sp
    RS cmu
    MP sil sil sp
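The word-list extraction in section 1.2 can be pictured with a short Python sketch. It is not the prompts2wlist script itself; it only assumes that each line of the prompts file has the form "<utterance-id> WORD WORD ...", and the function name words_from_prompts is made up for illustration.

```python
def words_from_prompts(lines):
    """Return the sorted list of distinct words found in prompt lines.

    Assumption: each line looks like "<utterance-id> WORD WORD ...",
    so the first field is skipped.
    """
    words = set()
    for line in lines:
        fields = line.split()
        # fields[1:] is empty for blank lines, so they add nothing.
        words.update(fields[1:])
    return sorted(words)


if __name__ == "__main__":
    sample = [
        "S0001 DIAL ONE TWO",
        "S0002 CALL STEVE YOUNG",
        "S0003 DIAL TWO",
    ]
    # One word per line, as a word list file would contain.
    print("\n".join(words_from_prompts(sample)))
```

The sorted, duplicate-free output is the kind of list HDMan expects as its word list argument.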

Create a names dictionary file with the following content:

    DAVE d ey v
    JOOP jh uh p
    JULIAN jh uw l y ax n
    JULIAN jh uw l ia n
    LAW l ao
    LEE l iy
    PHIL f ih l
    SENT-END [] sil
    SENT-START [] sil
    STEVE s t iy v
    SUE s uw
    SUE s y uw
    TYLER t ay l ax
    WOOD w uh d
    YOUNG y ah ng

Execute on the command line:

    HDMan -m -w wlist -n monophones1 -l dlog dict beep-1.0 names

Note that the original beep-1.0 dictionary may not match your text, so you may need to modify it manually.

Open the dict file and change

    SENT-END sil
    SENT-START sil

into

    SENT-END [] sil
    SENT-START [] sil

The empty output symbol [] means that nothing is output even when these two words are recognized.

1.3 Recording the Data

This should be clear.

1.4 Creating the Transcription Files

Create a testprompts file by copying and re-labelling selected text from the TIMIT database. Then generate the word-level MLFs:

    perl prompts2mlf trainwords.mlf trainprompts
    perl prompts2mlf testwords.mlf testprompts
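To make the layout of a word-level MLF concrete, here is a minimal Python sketch of the conversion from prompts to MLF. It is an illustration, not the prompts2mlf script: the input format ("<utterance-id> WORD WORD ...") and the function name prompts_to_mlf are assumptions. The prefix parameter writes the "*/" pattern in front of each label file name.

```python
def prompts_to_mlf(lines, prefix="*/"):
    """Build the text of a word-level Master Label File.

    Assumption: each prompt line is "<utterance-id> WORD WORD ...".
    Every utterance becomes a quoted label-file name, one word per
    line, and a terminating line holding a single period.
    """
    out = ["#!MLF!#"]
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        utt_id, words = fields[0], fields[1:]
        out.append(f'"{prefix}{utt_id}.lab"')
        out.extend(words)
        out.append(".")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(prompts_to_mlf(["S0001 DIAL ONE TWO", "S0002 CALL STEVE YOUNG"]), end="")
```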

Create an mkphones0.led file with the following content:

    EX
    IS sil sil
    DE sp

Generate the phone-level MLF with the following command:

    HLEd -l * -d dict -i phones0.mlf mkphones0.led trainwords.mlf

Note: Some of the training words may not be included in the beep-1.0 dictionary, so you may need to add them to your dict dictionary manually.

1.5 Coding the Data

Create a config file with the following content:

    # Coding parameters
    SOURCEFORMAT = NIST
    TARGETKIND = MFCC_0_D_A
    TARGETRATE =
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE =
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F

Create a codetr.scp file in which each line gives a training source file (left side) and its corresponding feature output file (right side). Use the same method to create a codete.scp file for the test data. Then run:

    HCopy -T 1 -C config -S codetr.scp
    HCopy -T 1 -C config -S codete.scp
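Writing codetr.scp and codete.scp by hand is tedious for more than a few files. The following Python sketch produces the two-column lines described in section 1.5. The directory names (wav, mfc), the .mfc extension and the function name make_code_script are illustrative assumptions, not something these notes prescribe.

```python
from pathlib import PurePosixPath


def make_code_script(wav_paths, feature_dir):
    """Return the text of an HCopy script file.

    Each line pairs a source waveform (left) with the feature file to
    be written (right). The feature file keeps the base name of the
    waveform and gets a hypothetical .mfc extension.
    """
    lines = []
    for wav in wav_paths:
        stem = PurePosixPath(wav).stem  # "wav/S0001.wav" -> "S0001"
        lines.append(f"{wav} {feature_dir}/{stem}.mfc")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_code_script(["wav/S0001.wav", "wav/S0002.wav"], "mfc"), end="")
```

The right-hand column of this output, one feature file per line, is also what a training list such as train.scp contains.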

2. Creating Monophone HMMs

2.1 Creating Flat Start Monophones

Create a proto file to define a prototype model with the following parameters:

    ~o <VecSize> 39 <MFCC_0_D_A>
    ~h "proto"
    <BeginHMM>
    <NumStates> 5
    <State> 2
    <Mean> (x39)
    <Variance> (x39)
    <State> 3
    <Mean> (x39)
    <Variance> (x39)
    <State> 4
    <Mean> (x39)
    <Variance> (x39)
    <TransP>
    <EndHMM>

Create a train.scp file with a list of all the training files. Then run:

    mkdir hmm0
    HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto

Create a Master Macro File (MMF) called hmmdefs that contains one copy of the model for each monophone: manually copy the model once per required monophone (including sil) and relabel each copy.

    ~h aa
    <BeginHMM>

    <EndHMM>
    ~h eh
    <BeginHMM>
    <EndHMM>
    ..etc..

Create a macros file with the following content:

    ~o <VECSIZE> 39 <MFCC_0_D_A>
    ~v varfloor1
    <Variance>

Delete the sp entry from the monophones1 file and save the result as monophones0. Execute the following commands:

    mkdir hmm1
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
    mkdir hmm2
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 monophones0
    mkdir hmm3
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 monophones0

2.2 Fixing the Silence Models

Make a new directory:

    mkdir hmm4

Use a text editor on the file hmm3/hmmdefs to copy the centre state of the sil model to make a new sp model. Store the resulting MMF hmmdefs, which includes the new sp model, in the new directory <hmm4>. Copy the macros file to the <hmm4> folder as well.

Create the sil.hed file with the following content:

    AT {sil.transp}
    AT {sil.transp}
    AT {sp.transp}
    TI silst {sil.state[3],sp.state[2]}
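The manual copy-and-relabel step that builds hmmdefs in section 2.1 can also be scripted. The Python sketch below assumes that the model text is available as a string containing one <BeginHMM> ... <EndHMM> block and that the monophone names are given as a list. The function name clone_prototype is made up, and the model names are written in quotes in the same way as ~h "proto".

```python
import re


def clone_prototype(proto_text, phones):
    """Return hmmdefs text with one copy of the prototype per phone.

    Only the <BeginHMM> ... <EndHMM> block is copied; the ~o header
    and the original ~h "proto" line are left out.
    """
    match = re.search(r"<BeginHMM>.*?<EndHMM>", proto_text,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no <BeginHMM> ... <EndHMM> block found")
    body = match.group(0)
    # Relabel each copy with the name of its monophone.
    return "".join(f'~h "{phone}"\n{body}\n' for phone in phones)


if __name__ == "__main__":
    proto = (
        '~o <VecSize> 39 <MFCC_0_D_A>\n'
        '~h "proto"\n'
        '<BeginHMM>\n<NumStates> 5\n<EndHMM>\n'
    )
    print(clone_prototype(proto, ["aa", "eh", "sil"]), end="")
```

In practice the list of names would be read from monophones0, one name per line.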

Execute the following commands:

    mkdir hmm5
    HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1
    mkdir hmm6
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
    mkdir hmm7
    HERest -C config -I phones0.mlf -t -S train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1

2.3 Realigning the Training Data

Add the entry SILENCE sil to dict and save the result as dict1.

Note: Add */ before each file name in trainwords.mlf.

Execute the following commands:

    HVite -l * -o SWT -b SILENCE -C config -a -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t -y lab -I trainwords.mlf -S train.scp dict1 monophones1
    mkdir hmm8
    HERest -C config -I aligned.mlf -t -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
    mkdir hmm9
    HERest -C config -I aligned.mlf -t -S train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1

3. Creating Tied-State Triphones

3.1 Making Triphones from Monophones

Create the file mktri.led with the following content:

    WB sp
    WB sil
    TC
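The effect of the three mktri.led commands can be illustrated in Python. The sketch below is not HLEd; it only mimics, under the assumption of word-internal context expansion, what WB sp, WB sil and TC do to a phone sequence: sp and sil act as word boundaries, and inside each word every phone is rewritten as left-phone+right with the contexts that exist. The function name is made up.

```python
def to_word_internal_triphones(phones, boundaries=("sp", "sil")):
    """Rewrite a monophone sequence as word-internal triphones.

    Boundary symbols are copied unchanged and contexts never cross
    them, so word-initial and word-final phones become biphones and
    a one-phone word stays a monophone.
    """
    result = []
    word = []

    def flush():
        last = len(word) - 1
        for i, phone in enumerate(word):
            name = phone
            if i > 0:
                name = f"{word[i - 1]}-{name}"   # add left context
            if i < last:
                name = f"{name}+{word[i + 1]}"   # add right context
            result.append(name)
        word.clear()

    for phone in phones:
        if phone in boundaries:
            flush()
            result.append(phone)
        else:
            word.append(phone)
    flush()
    return result


if __name__ == "__main__":
    seq = ["sil", "th", "ih", "s", "sp", "m", "ae", "n", "sp", "sil"]
    print(" ".join(to_word_internal_triphones(seq)))
```

The set of distinct names produced this way over all training transcriptions corresponds to the triphone list that HLEd writes with its -n option.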

Execute the following commands:

    HLEd -n triphones1 -l * -i wintri.mlf mktri.led aligned.mlf
    perl maketrihed monophones1 triphones1
    mkdir hmm10
    HHEd -B -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1
    mkdir hmm11
    HERest -C config -I wintri.mlf -t -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1
    mkdir hmm12
    HERest -C config -I wintri.mlf -t -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1

3.2 Making Tied-State Triphones

Execute the following command:

    HDMan -b sp -n fulllist -g global.ded -l flog beep-tri beep-1.0

Append the content of the triphones1 file to the fulllist file.

Create the file tree.hed with the following content:

    TB 350 "ST_ah_2_" {("ah","*-ah+*","ah+*","*-ah").state[2]}
    TB 350 "ST_ax_2_" {("ax","*-ax+*","ax+*","*-ax").state[2]}
    TB 350 "ST_ey_2_" {("ey","*-ey+*","ey+*","*-ey").state[2]}
    TB 350 "ST_sh_2_" {("sh","*-sh+*","sh+*","*-sh").state[2]}
    etc.

And execute the following command:

    perl mkclscript.prl TB monophones1 >> tree.hed

Add the following content to the tree.hed file:

    TR 1
    AU fulllist
    CO tiedlist
    ST trees
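The TB lines in tree.hed follow a fixed pattern, so they can be generated rather than typed. The Python sketch below reproduces the line format shown in section 3.2. It is not the mkclscript.prl script; the choice of states 2 to 4 assumes the 5-state prototype from section 2.1 (three emitting states), and the function name tb_commands is made up.

```python
def tb_commands(phones, threshold=350, states=(2, 3, 4)):
    """Return one TB clustering command per phone and emitting state.

    Each command names the tied-state macro ST_<phone>_<state>_ and
    lists the monophone together with its left-biphone, right-biphone
    and triphone patterns.
    """
    lines = []
    for state in states:
        for phone in phones:
            patterns = f'"{phone}","*-{phone}+*","{phone}+*","*-{phone}"'
            lines.append(
                f'TB {threshold} "ST_{phone}_{state}_" '
                f'{{({patterns}).state[{state}]}}'
            )
    return lines


if __name__ == "__main__":
    for line in tb_commands(["ah", "ax"], states=(2,)):
        print(line)
```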

Execute the following commands:

    mkdir hmm13
    HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1 > log
    mkdir hmm14
    HERest -C config -I wintri.mlf -t -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist
    mkdir hmm15
    HERest -C config -I wintri.mlf -t -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist

4. The Evaluation of the Recognizer

4.1 Recognizing the Test Data

Finally, execute the following command to run the recognizer on the test data:

    HVite -C config -H hmm15/macros -H hmm15/hmmdefs -S test.scp -l * -i result.mlf -w wdnet -p 0.0 -s 5.0 dict tiedlist
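The HERest passes used throughout these notes (hmm0 to hmm3, hmm5 to hmm7, hmm7 to hmm9, hmm10 to hmm12, hmm13 to hmm15) all follow one pattern: read macros and hmmdefs from one directory and write the re-estimated models to the next. The Python sketch below only builds the argument lists for such passes; it does not run HTK. The function names are made up, and the pruning thresholds 250.0 150.0 1000.0 in the example are the ones used in the HTK Book tutorial, taken here as an assumption.

```python
def herest_command(src, dst, mlf, hmm_list, pruning=None,
                   stats=None, config="config", script="train.scp"):
    """Build the argument list for one HERest re-estimation pass."""
    cmd = ["HERest", "-C", config, "-I", mlf]
    if pruning:
        # -t takes the pruning threshold(s); the values are supplied
        # by the caller.
        cmd += ["-t", *[str(value) for value in pruning]]
    if stats:
        cmd += ["-s", stats]  # write state occupation statistics
    cmd += ["-S", script,
            "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
            "-M", dst, hmm_list]
    return cmd


def reestimation_plan(first, last, mlf, hmm_list, **options):
    """Commands for the passes hmm<first> -> ... -> hmm<last>."""
    return [
        herest_command(f"hmm{i}", f"hmm{i + 1}", mlf, hmm_list, **options)
        for i in range(first, last)
    ]


if __name__ == "__main__":
    plan = reestimation_plan(0, 3, "phones0.mlf", "monophones0",
                             pruning=(250.0, 150.0, 1000.0))
    for cmd in plan:
        print(" ".join(cmd))
```

Each list could be passed to subprocess.run once the target directory has been created; keeping the command construction separate makes it easy to inspect the commands first.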
