Speech Recognition using HTK, CS4706. Fadi Biadsy, April 21st, 2008

1 Speech Recognition using HTK, CS4706, Fadi Biadsy, April 21st, 2008

2 Outline
Speech Recognition
Feature Extraction
HMM: 3 basic problems
HTK: Steps to build a speech recognizer

3 Speech Recognition
Speech Signal to Linguistic Units
ASR: "There's something happening when Americans"

4 It's hard to recognize speech
Contextual effects: speech sounds vary with context, e.g., "How do you do?"
Within-speaker variability: speaking style; pitch, intensity, speaking rate; voice quality
Between-speaker variability: accents, dialects, native vs. non-native
Environment variability: background noise, microphone

5 Feature Extraction
Waveform? Spectrogram?
We need a stable representation for different examples of the same speech sound

6 Feature Extraction
Extract features from short frames (frame period 10 ms, frame size 25 ms), giving a sequence of features
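
The framing step can be sketched in a few lines of Python. This is only an illustration: the function name `frame_signal` and the 16 kHz sampling rate are assumptions, not something the slides specify. With a 25 ms frame size and a 10 ms frame period, consecutive frames overlap by 15 ms.

```python
def frame_signal(samples, sample_rate, frame_size_s=0.025, frame_period_s=0.010):
    """Split a list of samples into overlapping fixed-length frames."""
    frame_len = int(round(frame_size_s * sample_rate))   # samples per frame
    step = int(round(frame_period_s * sample_rate))      # samples between frame starts
    frames = []
    start = 0
    # Keep only frames that fit entirely inside the signal.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += step
    return frames


# One second of a dummy signal sampled at 16 kHz (assumed rate).
signal = [0.0] * 16000
frames = frame_signal(signal, 16000)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame is then turned into one feature vector, so an utterance becomes a sequence of vectors at 100 per second.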

7 Feature Extraction: MFCC
The mel scale approximates the unequal sensitivity of human hearing at different frequencies
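
The slides do not give a formula for the mel scale. A commonly used approximation is mel(f) = 2595 log10(1 + f / 700); the sketch below assumes that formula and uses hypothetical helper names.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed common approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 1000 Hz step covers fewer mels at high frequencies than at low ones.
low_step = hz_to_mel(2000.0) - hz_to_mel(1000.0)
high_step = hz_to_mel(5000.0) - hz_to_mel(4000.0)
print(low_step > high_step)
```

The scale is roughly linear at low frequencies and logarithmic at high ones, which is why equal steps in Hz shrink in mels as frequency grows.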

8 Feature Extraction: MFCC
MFCC (mel-frequency cepstral coefficients): widely used in speech recognition
1. Take the Fourier transform of the signal
2. Map the log amplitudes of the spectrum to the mel scale
3. Take the discrete cosine transform of the mel log-amplitudes
4. The MFCCs are the amplitudes of the resulting spectrum
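
The four steps above can be sketched for a single frame in pure Python. This is a simplified illustration, not HTK's implementation: it applies triangular mel filters before taking logs (the usual ordering in practice), uses a naive DFT and an unnormalized DCT, and omits pre-emphasis, windowing, and liftering. The function names, the 26 filters, and the mel formula are assumptions.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """Step 1: naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a demo)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def mel_filterbank(mags, sample_rate, num_filters):
    """Step 2 (first half): triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    outputs = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for k, mag in enumerate(mags):
            f = k * nyquist / (len(mags) - 1)   # centre frequency of bin k
            if lo < f <= mid:
                total += mag * (f - lo) / (mid - lo)
            elif mid < f < hi:
                total += mag * (hi - f) / (hi - mid)
        outputs.append(total)
    return outputs


def dct_ii(values, num_coeffs):
    """Step 3: unnormalized DCT-II, returning coefficients 1..num_coeffs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * i * (j + 0.5) / n) for j, v in enumerate(values))
            for i in range(1, num_coeffs + 1)]


def mfcc(frame, sample_rate, num_filters=26, num_ceps=12):
    mags = magnitude_spectrum(frame)
    energies = mel_filterbank(mags, sample_rate, num_filters)
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # step 2 (second half)
    return dct_ii(log_energies, num_ceps)                       # steps 3 and 4


# A 25 ms frame of a 440 Hz tone at an assumed 16 kHz sampling rate.
frame = [math.sin(2 * math.pi * 440.0 * t / 16000.0) for t in range(400)]
coeffs = mfcc(frame, 16000)
print(len(coeffs))
```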

9 Feature Extraction: MFCC
Extract a feature vector from each frame:
12 MFCCs (mel-frequency cepstral coefficients) + 1 normalized energy = 13 features
Delta MFCC = 13
Delta-Delta MFCC = 13
Total: 39 features
Inverted MFCCs: 39-feature vector
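
How 13 static features become 39 can be sketched as follows. The slides do not say how the deltas are computed; this sketch assumes the common regression formula over a window of two frames on each side, with edge frames repeated, and the function names are hypothetical.

```python
def deltas(features, window=2):
    """Regression-based time derivatives of each feature dimension."""
    num_frames = len(features)
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[t])):
            num = 0.0
            for th in range(1, window + 1):
                ahead = features[min(t + th, num_frames - 1)][d]   # repeat last frame
                behind = features[max(t - th, 0)][d]               # repeat first frame
                num += th * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Append delta and delta-delta features to each static vector."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Ten frames of 13 dummy static features that rise by 1.0 per frame.
static = [[float(t)] * 13 for t in range(10)]
full = add_dynamic_features(static)
print(len(full[0]))
```

For the linearly rising dummy input, the delta of an interior frame is the slope and the delta-delta is zero, which is a quick sanity check on the formula.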

10 Markov Chain
Weighted finite-state acceptor: the future is independent of the past given the present

11 Hidden Markov Model (HMM)
An HMM is a Markov chain plus an emission probability function for each state.
HMM M = (A, B, Pi)
A = transition matrix
B = observation distributions
Pi = initial state probabilities

12 HMM Example
/d/ /aa/ /n/ /aa/

13 HMM: 3 basic problems (1) Evaluation
1. Given the observation sequence O and a model M, how do we efficiently compute P(O | M), the probability of the observation sequence given the model?
argmax_i P(O | Θ_i)
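
The standard answer to the evaluation problem is the forward algorithm, sketched below. For brevity the sketch assumes discrete observation symbols, whereas a speech recognizer would score continuous MFCC vectors; the toy model values and the function name are made up for illustration.

```python
def forward_probability(A, B, pi, observations):
    """Compute P(O | M) for M = (A, B, pi) with discrete emissions.

    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability that state i emits symbol o
    pi[i]   : probability of starting in state i
    """
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state at time t is i | M)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
                 for j in range(n)]
    return sum(alpha)


# Toy two-state model with two observation symbols (illustrative numbers).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
print(forward_probability(A, B, PI, [0, 0, 1]))
```

Choosing among several candidate models then amounts to running this once per model and keeping the one with the highest probability. Practical implementations work with log probabilities or scaling to avoid underflow on long sequences.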

14 HMM: 3 basic problems (2) Decoding
2. Given the observation sequence O and the model M, how do we choose a corresponding state sequence Q = q_1 q_2 ... q_T which best explains the observation O?
Q* = argmax_Q P(Q | O, M) = argmax_Q P(O | Q, M) P(Q | M)

15 Viterbi algorithm
An efficient algorithm for decoding: O(TN^2)
[Figure: phone network from Start to End over /d/ /aa/ /n/ /aa/ and /uw/]
/d/ /aa/ /n/ /aa/ => dana
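
A minimal sketch of Viterbi decoding follows, again assuming discrete observation symbols and a made-up toy model. The O(TN^2) cost is visible in the code: for each of the T observations, every one of the N states examines all N predecessors.

```python
def viterbi(A, B, pi, observations):
    """Return the most probable state path and its joint probability."""
    n = len(pi)
    # delta[i] = probability of the best path that ends in state i
    delta = [pi[i] * B[i][observations[0]] for i in range(n)]
    backpointers = []
    for obs in observations[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][obs])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Toy two-state model with two observation symbols (illustrative numbers).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
path, prob = viterbi(A, B, PI, [0, 0, 1])
print(path)
```

The sketch multiplies raw probabilities for readability; real decoders add log probabilities instead so that long utterances do not underflow.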

16 HMM: 3 basic problems (3) Training
How do we adjust the model parameters M = (A, B, Pi) to maximize P(O | M)?
dana => /d/ /aa/ /n/ /aa/
Estimate:
1) Transition matrix: A
2) Emission probability distribution

17 HMM: 3 basic problems (3) Training

18 HTK
HTK is a toolkit for building Hidden Markov Models (HMMs)
HTK is primarily designed for building HMM-based speech processing tools (e.g., extracting MFCC features)

19 Steps for building ASR: voice-operated interface for phone dialing
Examples:
Dial three three two six five four
Phone Woodland
Call Steve Young
Grammar:
$digit = ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE | OH | ZERO;
$name = [ JOOP ] JANSEN | [ JULIAN ] ODELL | [ DAVE ] OLLASON | [ PHIL ] WOODLAND | [ STEVE ] YOUNG;
( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
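
To make the grammar's meaning concrete, here is a rough Python sketch that checks whether a word string belongs to the language it defines. It is not how HTK processes grammars; it simply mirrors the notation with a regular expression, where `|` separates alternatives, `[ ]` marks optional items, and `< >` marks one or more repetitions. The sentence start and end markers are left out, the names use their restored spelling, and `accepts` is a hypothetical helper.

```python
import re

DIGIT = r"(?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|OH|ZERO)"
NAME = (r"(?:(?:JOOP )?JANSEN|(?:JULIAN )?ODELL|(?:DAVE )?OLLASON"
        r"|(?:PHIL )?WOODLAND|(?:STEVE )?YOUNG)")
# DIAL followed by one or more digits, or PHONE/CALL followed by a name.
COMMAND = re.compile(rf"(?:DIAL(?: {DIGIT})+|(?:PHONE|CALL) {NAME})")


def accepts(utterance):
    """Return True if the utterance is a sentence of the dialing grammar."""
    return COMMAND.fullmatch(utterance.upper()) is not None


for example in ["Dial three three two six five four",
                "Phone Woodland",
                "Call Steve Young",
                "Dial Woodland"]:
    print(example, "=>", accepts(example))
```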

21
S0001 ONE VALIDATED ACTS OF SCHOOL DISTRICTS
S0002 TWO OTHER CASES ALSO WERE UNDER ADVISEMENT
S0003 BOTH FIGURES WOULD GO HIGHER IN LATER YEARS
S0004 THIS IS NOT A PROGRAM OF SOCIALIZED MEDICINE
etc.
A ah sp
A ax sp
A ey sp
CALL k ao l sp
DIAL d ay ax l sp
EIGHT ey t sp
PHONE f ow n sp

22 The HTK scripting language is used to generate the phonetic transcription for all training data
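
The idea behind this step can be sketched in Python: look each word up in the pronunciation dictionary and replace it with its phones. This is an illustration of the concept rather than HTK's own script, the function names are hypothetical, and it always takes the first listed pronunciation.

```python
def parse_dictionary(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    lexicon = {}
    for line in lines:
        word, *phones = line.split()
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def to_phones(words, lexicon):
    """Expand a word sequence to phones using the first pronunciation."""
    phones = []
    for word in words:
        phones.extend(lexicon[word][0])
    return phones


# Dictionary entries in "WORD phone phone ..." form.
DICT_LINES = [
    "A ah sp",
    "A ax sp",
    "A ey sp",
    "CALL k ao l sp",
    "DIAL d ay ax l sp",
    "EIGHT ey t sp",
    "PHONE f ow n sp",
]
lexicon = parse_dictionary(DICT_LINES)
print(to_phones(["DIAL", "EIGHT"], lexicon))
```

Words with several entries, such as A, keep all their pronunciations in the lexicon, so a later alignment pass can pick the one that best matches the audio.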

23 Extracting MFCC
For each wave file, extract MFCC features.

24 Creating Monophone HMMs
Create the monophone HMM topology: 5 states, 3 of them emitting
Flat start: mean and variance are initialized as the global mean and variance of all the data
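
A flat start can be sketched as below: compute one global mean and variance over every feature vector in the training set, then copy them into each emitting state of each monophone model. The dictionary-based model layout and the function names are assumptions made for illustration, not HTK's model format.

```python
def global_mean_and_variance(feature_vectors):
    """Per-dimension mean and variance over all training frames."""
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    mean = [sum(v[d] for v in feature_vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in feature_vectors) / n
           for d in range(dim)]
    return mean, var


def flat_start(phone_names, feature_vectors, num_emitting_states=3):
    """Give every emitting state of every phone the same global Gaussian."""
    mean, var = global_mean_and_variance(feature_vectors)
    return {
        phone: [{"mean": list(mean), "var": list(var)}
                for _ in range(num_emitting_states)]
        for phone in phone_names
    }


# Two tiny 2-dimensional "frames" stand in for real 39-dimensional features.
data = [[1.0, 2.0], [3.0, 6.0]]
models = flat_start(["w", "ah", "n"], data)
print(models["w"][0])
```

All models start out identical; it is the subsequent training passes that make them differ.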

25 Training
For each training pair of files (mfc + lab):
1. Concatenate the corresponding monophone HMMs
2. Use the Baum-Welch algorithm to train the HMMs given the MFC features
[Figure: the utterance ONE VALIDATED ACTS OF SCHOOL DISTRICTS as concatenated phone HMMs, starting with /w/ /ah/ /n/]

26 Training
So far, we have all monophone models trained
Train the sp model

27 Forced alignment
The dictionary contains multiple pronunciations for some words.
Realign the training data: run Viterbi to get the best pronunciation that matches the acoustics
[Figure: pronunciation network with alternatives /d/ /ey/ /t/ /ax/ and /ae/ /dx/]

28 Retrain
After getting the best pronunciation, train again with the Baum-Welch algorithm using the correct pronunciation.

29 Creating Triphone models
Context-dependent HMMs
Make triphones from monophones
Generate a list of all the triphones for which there is at least one example in the training data
jh-oy+s oy-s ax+z f-iy+t iy-t s+l s-l+ow
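
Turning monophone transcriptions into triphones can be sketched as follows, using the left-phone+right notation in which `s-l+ow` means the phone l preceded by s and followed by ow. Phones at the edges of a sequence get only the context they have. The function names are hypothetical, and word boundaries and silence are ignored for simplicity.

```python
def to_triphones(phones):
    """Rewrite a phone sequence as context-dependent labels."""
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i + 1 < len(phones) else ""
        labels.append(left + phone + right)
    return labels


def triphone_inventory(transcriptions):
    """List every context-dependent label seen at least once in the data."""
    return sorted({label
                   for phones in transcriptions
                   for label in to_triphones(phones)})


print(to_triphones(["jh", "oy", "s"]))
print(triphone_inventory([["jh", "oy", "s"], ["s", "l", "ow"]]))
```

Only labels that actually occur in the training transcriptions end up in the inventory, which matches the requirement of at least one example per triphone.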

30 Creating Tied Triphone models
Data insufficiency => tie states
/aa/ /t/ /b/ /b/ /aa/ /l/

31 Tie Triphone
Data-driven clustering: using a similarity metric
Clustering using a decision tree: all states in the same leaf will be tied
[Figure: decision tree over triphone states (e.g., t+ih, t+ae, t+iy, sh-n+t, sh-n+z, ch-ih+l, ay-oh+l, ay-oh+r) with yes/no questions R = Glide?, L = Nasal?, L = Class-Stop?, R = Nasal?]

32 After Tying
Train the acoustic models again using the Baum-Welch algorithm

33 Decoding
Using the grammar network for the phones, generate the triphone HMM grammar network (WNET)
Given a new speech file, extract the MFCC features
Run Viterbi on the WNET given the MFCC features to get the best word sequence.

34 Summary
MFCC features
HMM: 3 basic problems
HTK

35 Thanks!

36 HMM Problem 1
