The University of Manchester COMP14112 Markov Chains, HMMs and Speech Revision



1 COMP14112 Lecture 11: Markov Chains, HMMs and Speech Revision

2 What have we covered in the speech lectures?
Extracting features from raw speech data
Classification and the naive Bayes classifier
Training
Sequence data
Markov models
Hidden Markov models

3 1. Features and data
We have to represent sensory information in a useful way: sound waves and robot sensor data are two examples.
Good features are domain specific, but we often end up with a vector of numbers called a feature vector or data point.
For speech we use MFCC features derived from segmented data.
Methods for processing the feature vectors are general.
Probabilistic approaches are popular: not the only approach, but certainly a leading one.

4 2. Classification
Given a data point x, what class does it belong to?
You constructed probabilistic classifiers in Labs 2 and 3 to distinguish between "yes" and "no".
You should know what makes a good classifier: how would you assess its performance?
Lots of applications: one of the key AI tools.

5 2.1 Probabilistic classification
For a data point $x$:
Estimate the probability density $p(x \mid C_i)$ for each class $i$.
Apply Bayes' theorem:
$$p(C_1 \mid x) = \frac{p(x \mid C_1)\, p(C_1)}{\sum_i p(x \mid C_i)\, p(C_i)}$$
Apply the classification rule: for two classes, $p(C_1 \mid x) > 0.5 \Rightarrow$ class of $x$ is $C_1$.
Multiple classes?
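The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming 1-D normal class-conditional densities; the class names, means, variances and priors are made-up values, not the ones from the labs. For more than two classes, a common choice (assumed here) is to pick the class with the largest posterior.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, class_params, priors):
    """Bayes' theorem: p(C_i | x) for every class C_i."""
    # Numerators p(x | C_i) p(C_i), one per class.
    joint = {c: normal_pdf(x, mean, var) * priors[c]
             for c, (mean, var) in class_params.items()}
    # Denominator: sum over all classes.
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}

def classify(x, class_params, priors):
    """Pick the class with the largest posterior (works for two or more classes)."""
    post = posteriors(x, class_params, priors)
    return max(post, key=post.get)

# Hypothetical parameters (mean, variance) for illustration only.
class_params = {"yes": (0.0, 1.0), "no": (2.0, 1.0)}
priors = {"yes": 0.5, "no": 0.5}

print(posteriors(0.3, class_params, priors))
print(classify(0.3, class_params, priors))
```

With two classes and equal priors, choosing the largest posterior is the same as the $p(C_1 \mid x) > 0.5$ rule.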

6 2.2 Naïve Bayes classifier
The naïve Bayes assumption can be used if data are vectors.
Feature vector components are conditionally independent given the class:
$$p(x \mid C_i) = p(x_1 \mid C_i)\, p(x_2 \mid C_i) \cdots p(x_d \mid C_i)$$
See lecture notes and Lab 2 for application to time-averaged MFCC features derived from speech.
See Examples sheet 6 for a discrete-valued data example.
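As a rough sketch of the factorisation above, the class-conditional likelihood of a feature vector is the product of per-component densities. The code below assumes each component is normally distributed and works with logarithms (a sum of logs instead of a product), which is a numerical convenience added here and not something the slides prescribe. All parameter values are hypothetical.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of a 1-D normal distribution at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def naive_bayes_log_likelihood(x, means, variances):
    """log p(x | C_i) under the naive Bayes assumption.

    x, means and variances are sequences of length d; component j of x is
    modelled independently with mean means[j] and variance variances[j].
    """
    return sum(log_normal_pdf(xj, m, v) for xj, m, v in zip(x, means, variances))

# Hypothetical 2-D example.
x = [0.0, 0.0]
print(naive_bayes_log_likelihood(x, means=[0.0, 0.0], variances=[1.0, 1.0]))
```

Adding the log prior $\log p(C_i)$ to this value and comparing across classes gives the same decision as Bayes' theorem, because the denominator is shared by all classes.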

7 2.3 1-D Classification
You've seen some example classification rules.
For 1-D data, a single feature $x$.

8 2.4 n-D Classification
For 2-D data with feature vector $x = [x_1, x_2]$.

9 3. Training
When we fit a probability density or probabilistic model to data, we have an example of training.
In the Labs, you've seen data being used to estimate parameters of a normal distribution and an HMM.
The data that's used for this is training data.
Training is fundamental to machine learning, a large and important area of research in CS.
NB the performance of the Lab classifier would have improved with more training data.
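A minimal sketch of training in this sense: estimating the mean and variance of a 1-D normal distribution from training data. The maximum-likelihood estimates are used here (variance divided by n), which is an assumption; the labs may use a different estimator. The sample values are invented.

```python
def fit_normal(samples):
    """Maximum-likelihood estimates of the mean and variance of 1-D data."""
    n = len(samples)
    mean = sum(samples) / n
    # ML variance estimate: average squared deviation from the mean.
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

# Hypothetical training data for one feature of one class.
training_data = [1.0, 2.0, 3.0]
print(fit_normal(training_data))
```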

10 4. Sequence data
In some cases the data arrives in a sequence.
We used speech data.
Other examples: video, sequential games, anything real-time, DNA sequence data.

11 5. Markov chains
You should know:
Definition of a first-order Markov process:
$$p(s_t \mid s_{t-1}, s_{t-2}, \ldots, s_1) = p(s_t \mid s_{t-1})$$
Parameters are transition probabilities.
Normalisation condition.
Can be represented as a directed graph or a transition matrix.
Can be unfolded in time to show all paths of a fixed length (Examples sheet 7 and past paper).
How to do a simple probabilistic calculation.
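The points above can be illustrated with a small sketch. The chain below is hypothetical (its states and numbers are invented for illustration and are not the ones from the lecture example). It stores the transition probabilities as a nested dictionary, checks the normalisation condition (the outgoing probabilities of each state sum to 1), and multiplies transition probabilities along a path.

```python
# Hypothetical transition probabilities p(next | current).
transitions = {
    "START": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.3, "b": 0.2, "END": 0.5},
    "b": {"a": 0.5, "END": 0.5},
}

def check_normalised(transitions, tol=1e-9):
    """True if the outgoing probabilities of every state sum to 1."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in transitions.values())

def sequence_probability(path, transitions):
    """Probability of one complete path, using the first-order Markov property."""
    prob = 1.0
    for prev, nxt in zip(path, path[1:]):
        # A transition that is absent from the table has probability 0.
        prob *= transitions.get(prev, {}).get(nxt, 0.0)
    return prob

print(check_normalised(transitions))
print(sequence_probability(["START", "a", "b", "END"], transitions))
```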

12 5. Markov chains
[Figure: Markov chain diagram with states START, hh, ay, b and END; the visible transition labels are 0.5 and 0.5.]
What are the missing numbers?
Unroll the model for exactly three time steps.
What is the probability that the sequence will be "hi"?
What is the probability that a sequence of length 3 will be "hi"?

13 5. Markov chains
Naïve application of probabilistic calculations is prohibitively slow in Markov chains.
In the lectures we saw a more efficient method based on recursion (Examples sheet 8).
You don't need to remember the recursive algorithm used there, but you should be able to apply it to a similar example.
Computationally efficient algorithms are very important: imagine what happens when a problem is scaled up.
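One standard recursion of this kind propagates the probability of being in each state forward one step at a time, so the cost grows linearly with the number of steps instead of with the number of paths. The sketch below is an illustration under that assumption and may differ in detail from the algorithm on Examples sheet 8; the chain and its numbers are hypothetical.

```python
# Hypothetical transition probabilities p(next | current).
transitions = {
    "START": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.3, "b": 0.2, "END": 0.5},
    "b": {"a": 0.5, "END": 0.5},
}

def state_distribution(transitions, start, steps):
    """Probability of being in each state after exactly `steps` transitions.

    Recursion: p_t(s) = sum over r of p_{t-1}(r) * p(s | r).
    States with no outgoing transitions (here END) pass no mass onwards.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for state, p in dist.items():
            for target, q in transitions.get(state, {}).items():
                nxt[target] = nxt.get(target, 0.0) + p * q
        dist = nxt
    return dist

# Probability that the chain reaches END on exactly the third transition.
print(state_distribution(transitions, "START", 3).get("END", 0.0))
```

Enumerating every path of length T costs time exponential in T, whereas this recursion touches each transition once per step.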

14 6. Hidden Markov models
HMMs have two parts:
Markov chain model of states. The parameters of the Markov chain model are the transition probabilities $p(s_t \mid s_{t-1})$.
Emission probability distribution for feature vectors: $p(x_t \mid s_t)$. In Lab 3 this is a normal density parameterised by mean and variance for each component of $x$.
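To make the two parts concrete, the sketch below computes the joint probability of a known state path and an observation sequence as a product of transition and emission terms. It assumes 1-D observations with normal emission densities and an explicit initial state distribution (the slides use a START state instead); the state names and all numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def joint_probability(states, observations, initial, transitions, emissions):
    """p(x_1..x_T, s_1..s_T) for a given state path.

    initial[s]        : probability of starting in state s (assumed form)
    transitions[r][s] : p(s_t = s | s_{t-1} = r)
    emissions[s]      : (mean, variance) of the normal density p(x_t | s_t = s)
    """
    prob = initial[states[0]] * normal_pdf(observations[0], *emissions[states[0]])
    for t in range(1, len(states)):
        prob *= transitions[states[t - 1]][states[t]]
        prob *= normal_pdf(observations[t], *emissions[states[t]])
    return prob

# Hypothetical two-state model.
initial = {"sil": 1.0, "speech": 0.0}
transitions = {"sil": {"sil": 0.5, "speech": 0.5},
               "speech": {"sil": 0.0, "speech": 1.0}}
emissions = {"sil": (0.0, 1.0), "speech": (3.0, 1.0)}

print(joint_probability(["sil", "speech"], [0.0, 3.0], initial, transitions, emissions))
```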

15 6. Hidden Markov models
In Lab 3 you explored three things:
Training: constructing an HMM from labelled data (what is labelled data?).
Classification: using the Forward algorithm to calculate $p(x_1, x_2, \ldots, x_T \mid C_i)$ and plugging it into Bayes' theorem.
Decoding: using the Viterbi algorithm to find the most likely path through the hidden states.
You should be able to understand the tasks, but don't have to recall details of the algorithms.
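For orientation only (the details are not examinable according to the slide), here is a compact sketch of the two algorithms on a toy model. It assumes an explicit initial distribution instead of a START state, takes the emission model as a function `emission(state, x)`, and works with raw probabilities, which would underflow on long sequences; practical implementations use logarithms or scaling. The demo model and its numbers are hypothetical and are not the Lab 3 model.

```python
import math

def forward(observations, states, initial, transitions, emission):
    """Forward algorithm: p(x_1..x_T) summed over all hidden state paths."""
    alpha = {s: initial[s] * emission(s, observations[0]) for s in states}
    for x in observations[1:]:
        alpha = {s: emission(s, x) * sum(alpha[r] * transitions[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

def viterbi(observations, states, initial, transitions, emission):
    """Viterbi algorithm: the single most likely hidden state path."""
    delta = {s: initial[s] * emission(s, observations[0]) for s in states}
    backpointers = []
    for x in observations[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            # Best predecessor of s at this time step.
            best_prev = max(states, key=lambda r: delta[r] * transitions[r][s])
            pointers[s] = best_prev
            new_delta[s] = delta[best_prev] * transitions[best_prev][s] * emission(s, x)
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical demo: two states with 1-D normal emissions (mean, variance).
demo_states = ["sil", "speech"]
demo_initial = {"sil": 0.5, "speech": 0.5}
demo_transitions = {"sil": {"sil": 0.9, "speech": 0.1},
                    "speech": {"sil": 0.1, "speech": 0.9}}
demo_emissions = {"sil": (0.0, 1.0), "speech": (3.0, 1.0)}
demo_obs = [0.1, -0.2, 3.1, 2.9]

def gaussian_emission(state, x):
    mean, var = demo_emissions[state]
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

print(forward(demo_obs, demo_states, demo_initial, demo_transitions, gaussian_emission))
print(viterbi(demo_obs, demo_states, demo_initial, demo_transitions, gaussian_emission))
```

For classification, one such likelihood would be computed per class model and combined with the class priors through Bayes' theorem.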

16 6. Hidden Markov models
A simple example of decoding (Lab 3) is removing the silence from speech signals.
The data without silence is easier to classify (as in Lab 2).
[Figure: HMM diagram with states START, sil, yes, no, sil and STOP; the visible transition labels are 1.0, 0.04 and 0.02.]
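Once a state path has been decoded, dropping the silence amounts to keeping only the frames whose decoded state is not a silence state. The helper below is a hypothetical illustration of that step; the label "sil" follows the figure, while the function name and the example frames are invented.

```python
def remove_silence(frames, decoded_states, silence_label="sil"):
    """Keep only the frames whose decoded hidden state is not silence."""
    return [frame for frame, state in zip(frames, decoded_states)
            if state != silence_label]

# Hypothetical frames (one feature value each) and a decoded path.
frames = [0.1, 0.0, 2.9, 3.2, 0.2]
decoded = ["sil", "sil", "yes", "yes", "sil"]
print(remove_silence(frames, decoded))
```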

17 7. Applications to speech
Survey of tasks and performance (Examples sheet 5)
Segmentation and MFCC features
Phonemes and phoneme HMMs
Triphones
Decoding speech
Simple language models

18 Other applications
These methods can be generalised to many applications:
TrueSkill: ranking system in Xbox Live
Vision applications
Speech
Medicine: probabilistic graphical models to update probability of illness given symptoms
Biology: standard way to determine gene function and location of genes in DNA sequence

19 How to revise
Work through Example class sheets and past paper(s).
Make sure you understand the relationship between the labs and the notes.
Notes, lectures, example sheet solutions and on the course website
