Hidden Markov Models. Implementing the forward-, backward- and Viterbi-algorithms using log-space and scaling


1 Hidden Markov Models: Implementing the forward-, backward- and Viterbi-algorithms using log-space and scaling

2 Recap of the three recursions

Viterbi:
Recursion: ω(z_n) = p(x_n | z_n) · max_{z_{n-1}} ω(z_{n-1}) · p(z_n | z_{n-1})
Basis: ω(z_1) = p(z_1) · p(x_1 | z_1)

Forward:
Recursion: α(z_n) = p(x_n | z_n) · Σ_{z_{n-1}} α(z_{n-1}) · p(z_n | z_{n-1})
Basis: α(z_1) = p(z_1) · p(x_1 | z_1)

Backward:
Recursion: β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) · p(x_{n+1} | z_{n+1}) · p(z_{n+1} | z_n)
Basis: β(z_N) = 1

3 Problem: The values in the ω-, α- and β-tables can come very close to zero. By multiplying them we potentially exceed the precision of double-precision floating points and get underflow. (The slide repeats the Viterbi, forward and backward recursions and bases from above.)

4 The Viterbi algorithm

ω(z_n) is the probability of the most likely sequence of states z_1,...,z_n ending in z_n and generating the observations x_1,...,x_n.

Recursion: ω(z_n) = p(x_n | z_n) · max_{z_{n-1}} ω(z_{n-1}) · p(z_n | z_{n-1})
Basis: ω(z_1) = p(z_1) · p(x_1 | z_1)

The values are stored in a K × N table with ω[k][n] = ω(z_n) if z_n is state k.

Computing ω takes time O(K²N) and space O(KN) using memoization.
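
To make the table-filling concrete, here is a minimal Python sketch of the plain (not yet numerically safe) Viterbi fill. The conventions are assumptions for illustration: obs is a list of symbol indices, init_probs[k] = p(z_1 = k), trans_probs[j][k] = p(state k | state j), and emit_probs[k][x] = p(symbol x | state k).

    # A minimal sketch of the plain Viterbi fill (no log-space yet).
    # Assumed conventions: K states, N observations.
    def viterbi_fill(obs, init_probs, trans_probs, emit_probs):
        K, N = len(init_probs), len(obs)
        omega = [[0.0] * N for _ in range(K)]
        # Basis: omega(z_1) = p(z_1) * p(x_1 | z_1)
        for k in range(K):
            omega[k][0] = init_probs[k] * emit_probs[k][obs[0]]
        # Recursion: omega(z_n) = p(x_n | z_n) * max_j omega(z_{n-1}=j) * p(k | j)
        for n in range(1, N):
            for k in range(K):
                omega[k][n] = emit_probs[k][obs[n]] * max(
                    omega[j][n - 1] * trans_probs[j][k] for j in range(K)
                )
        return omega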

5 The Viterbi algorithm

ω(z_n) is the probability of the most likely sequence of states z_1,...,z_n ending in z_n and generating the observations x_1,...,x_n.

Solution to the underflow problem: Because log(max f) = max(log f), we can work in log-space, which turns multiplications into additions and thus avoids too small values.

The values are stored in a K × N table with ω[k][n] = ω(z_n) if z_n is state k.

Computing ω takes time O(K²N) and space O(KN) using memoization.

6 The Viterbi algorithm in log-space

ω(z_n) is the probability of the most likely sequence of states z_1,...,z_n ending in z_n and generating the observations x_1,...,x_n. Let ω^(z_n) = log ω(z_n).

Recursion: ω^(z_n) = log p(x_n | z_n) + max_{z_{n-1}} [ ω^(z_{n-1}) + log p(z_n | z_{n-1}) ]
Basis: ω^(z_1) = log p(z_1) + log p(x_1 | z_1)

7 The Viterbi algorithm in log-space

ω(z_n) is the probability of the most likely sequence of states z_1,...,z_n ending in z_n and generating the observations x_1,...,x_n.

Recursion: ω^(z_n) = log p(x_n | z_n) + max_{z_{n-1}} [ ω^(z_{n-1}) + log p(z_n | z_{n-1}) ]
Basis: ω^(z_1) = log p(z_1) + log p(x_1 | z_1)

The values are stored in a K × N table with ω^[k][n] = ω^(z_n) if z_n is state k.

8 The Viterbi algorithm in log-space

What if p(x_n | z_n) or p(z_n | z_{n-1}) is 0? Then the product of probabilities becomes 0, but what should it be in log-space?

9 The Viterbi algorithm in log-space

What if p(x_n | z_n) or p(z_n | z_{n-1}) is 0? Then the product of probabilities becomes 0, but what should it be in log-space? log 0 should be some representation of minus infinity.

// Pseudocode for computing ω^[k][n] for some n > 1
ω^[k][n] = minus infinity
if p(x[n] | k) != 0:
    for j = 1 to K:
        if p(k | j) != 0:
            ω^[k][n] = max( ω^[k][n], log(p(x[n] | k)) + ω^[j][n-1] + log(p(k | j)) )

10 The Viterbi algorithm in log-space

What if p(x_n | z_n) or p(z_n | z_{n-1}) is 0? Then the product of probabilities becomes 0; in log-space, log 0 should be some representation of minus infinity.

// Pseudocode for computing ω^[k][n] for some n > 1
ω^[k][n] = minus infinity
if p(x[n] | k) != 0:
    for j = 1 to K:
        if p(k | j) != 0:
            ω^[k][n] = max( ω^[k][n], log(p(x[n] | k)) + ω^[j][n-1] + log(p(k | j)) )

Still takes time O(K²N) and space O(KN) using memoization, and the most likely sequence of states can be found by backtracking.
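
A runnable Python version of the pseudocode above, as a sketch under the same assumed conventions as the earlier sketch; zero probabilities are mapped to -inf:

    import math

    def log_or_neg_inf(p):
        # log 0 is represented as minus infinity
        return math.log(p) if p > 0 else float("-inf")

    # A sketch of the log-space Viterbi fill, handling zero probabilities.
    def viterbi_fill_log(obs, init_probs, trans_probs, emit_probs):
        K, N = len(init_probs), len(obs)
        w = [[float("-inf")] * N for _ in range(K)]
        for k in range(K):
            w[k][0] = log_or_neg_inf(init_probs[k]) + log_or_neg_inf(emit_probs[k][obs[0]])
        for n in range(1, N):
            for k in range(K):
                if emit_probs[k][obs[n]] == 0:
                    continue  # w[k][n] stays minus infinity
                best = max(
                    w[j][n - 1] + log_or_neg_inf(trans_probs[j][k]) for j in range(K)
                )
                w[k][n] = math.log(emit_probs[k][obs[n]]) + best
        return w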

11 Backtracking

Pseudocode for backtracking not using log-space:

z[1..N] = undef
z[N] = arg max_k ω[k][N]
for n = N-1 to 1:
    z[n] = arg max_k ( p(x[n+1] | z[n+1]) · ω[k][n] · p(z[n+1] | k) )
print z[1..N]

Pseudocode for backtracking using log-space:

z[1..N] = undef
z[N] = arg max_k ω^[k][N]
for n = N-1 to 1:
    z[n] = arg max_k ( log p(x[n+1] | z[n+1]) + ω^[k][n] + log p(z[n+1] | k) )
print z[1..N]

Takes time O(NK) but requires the entire ω- or ω^-table in memory.
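
A matching Python sketch of the log-space backtracking, reusing w and log_or_neg_inf from the previous sketch. Since p(x[n+1] | z[n+1]) does not depend on k, it can be dropped from the argmax; ties are broken arbitrarily:

    # A sketch of log-space backtracking over a filled w-table.
    def backtrack_log(w, trans_probs):
        K, N = len(w), len(w[0])
        z = [None] * N
        z[N - 1] = max(range(K), key=lambda k: w[k][N - 1])
        for n in range(N - 2, -1, -1):
            nxt = z[n + 1]
            # Only w[k][n] + log p(z[n+1] | k) matters for the argmax.
            z[n] = max(
                range(K),
                key=lambda k: w[k][n] + log_or_neg_inf(trans_probs[k][nxt]),
            )
        return z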

12 Backtracking Can we do backtracking without keeping the entire ω- or ω^-table in memory?

13 Backtracking

14 Backtracking

Fill out the table using a sliding window of two columns and keep every Bth column in memory. Takes time O(NK²) and space O(KN/B).

When backtracking, recompute and backtrack through blocks of B columns from right to left. Per block it takes time O(BK²) and space O(BK) for recomputing and storing B columns, and time O(BK) for backtracking.

In total: time O(NK²) and space O(KN/B + BK). What should B be?

15 Backtracking

Fill out the table using a sliding window of two columns and keep every Bth column in memory. Takes time O(NK²) and space O(KN/B).

When backtracking, recompute and backtrack through blocks of B columns from right to left. Per block it takes time O(BK²) and space O(BK) for recomputing and storing B columns, and time O(BK) for backtracking.

If we choose B = sqrt(N), then the space is O(K·sqrt(N)) and the time is O(NK²) in total.
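
A Python sketch of the checkpointing idea, under the same assumed conventions and reusing log_or_neg_inf from the earlier sketches; this is one possible way to organize the blocks, not the only one:

    # Space-saving Viterbi: keep every B-th column, recompute blocks
    # of B columns right-to-left while backtracking.
    def viterbi_checkpointed(obs, init_probs, trans_probs, emit_probs, B):
        K, N = len(init_probs), len(obs)

        def column_basis():
            return [log_or_neg_inf(init_probs[k]) + log_or_neg_inf(emit_probs[k][obs[0]])
                    for k in range(K)]

        def next_column(col, n):
            return [
                log_or_neg_inf(emit_probs[k][obs[n]])
                + max(col[j] + log_or_neg_inf(trans_probs[j][k]) for j in range(K))
                for k in range(K)
            ]

        # Forward pass with a sliding window; store columns 0, B, 2B, ...
        checkpoints = {0: column_basis()}
        col = checkpoints[0]
        for n in range(1, N):
            col = next_column(col, n)
            if n % B == 0:
                checkpoints[n] = col

        # Backtrack block by block, recomputing each block from its checkpoint.
        z = [None] * N
        z[N - 1] = max(range(K), key=lambda k: col[k])
        n = N - 1
        while n > 0:
            start = (n - 1) // B * B          # checkpoint at the block's left edge
            block = [checkpoints[start]]
            for m in range(start + 1, n):     # recompute columns start+1 .. n-1
                block.append(next_column(block[-1], m))
            for m in range(n - 1, start - 1, -1):
                nxt = z[m + 1]
                z[m] = max(range(K),
                           key=lambda k: block[m - start][k]
                           + log_or_neg_inf(trans_probs[k][nxt]))
            n = start
        return z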

16 Why log-space helps

A floating-point number n is represented as n = f · 2^e, cf. the IEEE-754 standard, which specifies the ranges of f and e. See e.g. Appendix B in Tanenbaum's Structured Computer Organization for further details.
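
As a quick illustration in Python (the constant 0.15625 is just an arbitrary example value), math.frexp exposes exactly this f · 2^e decomposition:

    import math

    # math.frexp decomposes a float x into (f, e) with x == f * 2**e
    # and 0.5 <= |f| < 1 for nonzero x.
    f, e = math.frexp(0.15625)
    print(f, e)                  # 0.625 -2, since 0.15625 == 0.625 * 2**-2
    assert f * 2 ** e == 0.15625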

17 Why log-space helps

The Viterbi recursion for the HMM below yields ω(z_n) = 0.5 · ω(z_{n-1}), i.e. ω(z_n) = 2^(-n).

[Figure: a single state with self-transition probability 1 and emission probabilities A: .5, B: .5, captioned "A simple HMM"]

18 Why log-space helps

The Viterbi recursion for the HMM below yields ω(z_n) = 2^(-n). If n > 467 then 2^(-n) is smaller than the smallest positive double-precision number, i.e. it cannot be represented.

[Figure: a single state with self-transition probability 1 and emission probabilities A: .5, B: .5, captioned "A simple HMM"]

19 Why log-space helps

The Viterbi recursion for the HMM below yields ω(z_n) = 2^(-n). If n > 467 then 2^(-n) is smaller than the smallest positive double-precision number, i.e. it cannot be represented.

The log-transformed Viterbi recursion for the HMM below yields ω^(z_n) = -n · log 2. No problem: this only grows linearly with n, and doubles can represent magnitudes up to approx 1.8 · 10^308.

[Figure: a single state with self-transition probability 1 and emission probabilities A: .5, B: .5, captioned "A simple HMM"]
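
A small Python demonstration of the effect (the value of n is arbitrary):

    import math

    # Multiplying 0.5 together n times underflows to 0.0 for large n,
    # while the log-space value is just a moderate negative number.
    n = 2000
    p = 0.5 ** n                # underflows to 0.0 (2**-2000 is too small)
    log_p = n * math.log(0.5)   # fine: approx -1386.29
    print(p, log_p)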

20 The forward algorithm

α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n.

Recursion: α(z_n) = p(x_n | z_n) · Σ_{z_{n-1}} α(z_{n-1}) · p(z_n | z_{n-1})
Basis: α(z_1) = p(z_1) · p(x_1 | z_1)

The values are stored in a K × N table with α[k][n] = α(z_n) if z_n is state k.

Takes time O(K²N) and space O(KN) using memoization.

21 The forward algorithm

α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n.

Solution to the underflow problem: Since log(Σ f) ≠ Σ(log f), we cannot (immediately) use the log-space trick.

The values are stored in a K × N table with α[k][n] = α(z_n) if z_n is state k.

Takes time O(K²N) and space O(KN) using memoization.

22 The forward algorithm

α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n.

Solution to the underflow problem: Since log(Σ f) ≠ Σ(log f), we cannot (immediately) use the log-space trick. We instead use scaling, such that the values do not (all) become too small.

The values are stored in a K × N table with α[k][n] = α(z_n) if z_n is state k.

Takes time O(K²N) and space O(KN) using memoization.

23 Forward algorithm using scaled values

α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n.

Idea: Normalize by the probability of the observations seen so far, i.e. work with α^(z_n) = α(z_n) / p(x_1,...,x_n) = p(z_n | x_1,...,x_n).

24 Forward algorithm using scaled values

α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n.

The normalized version α^(z_n) = p(z_n | x_1,...,x_n) is a probability distribution over the K states, and we expect it to behave numerically well because the normalized values cannot all become arbitrarily small: for each n they sum to 1.

25 Forward algorithm using scaled values

We can modify the forward recursion to use scaled values. Write p(x_1,...,x_n) = Π_{m=1..n} c_m, where c_m = p(x_m | x_1,...,x_{m-1}); then α(z_n) = ( Π_{m=1..n} c_m ) · α^(z_n).

26 Forward algorithm using scaled values

We can modify the forward recursion to use scaled values. Substituting α(z_n) = ( Π_{m=1..n} c_m ) · α^(z_n) into the forward recursion and cancelling common factors shows: if we know c_n, then we have a recursion using the normalized values:

c_n · α^(z_n) = p(x_n | z_n) · Σ_{z_{n-1}} α^(z_{n-1}) · p(z_n | z_{n-1})

27 Forward algorithm using scaled values

We can modify the forward recursion to use scaled values. Since α^(z_n) sums to 1 over the K states, c_n is obtained by normalization: summing the right-hand side of the recursion above over z_n yields c_n.

28 Forward algorithm using scaled values

We can modify the forward recursion to use scaled values.

Recursion: In step n, compute and store temporarily the K values δ(z_n1),...,δ(z_nK), where

δ(z_n) = p(x_n | z_n) · Σ_{z_{n-1}} α^(z_{n-1}) · p(z_n | z_{n-1})

Compute and store c_n as c_n = Σ_{k=1..K} δ(z_nk)

Compute and store α^(z_n) = δ(z_n) / c_n

29 Forward algorithm using scaled values

We can modify the forward recursion to use scaled values. The results are stored in a K × N table with α^[k][n] = α^(z_n) if z_n is state k, together with the scaling factors c_1,...,c_N.

Recursion: In step n, compute and store temporarily the K values δ(z_n1),...,δ(z_nK). Compute and store c_n as c_n = Σ_{k=1..K} δ(z_nk). Compute and store α^(z_n) = δ(z_n) / c_n.

Basis: δ(z_1) = p(z_1) · p(x_1 | z_1), c_1 = Σ_k δ(z_1k), α^(z_1) = δ(z_1) / c_1.

30 Forward algorithm using scaled values

We can modify the forward recursion to use scaled values. The results are stored in a K × N table with α^[k][n] = α^(z_n) if z_n is state k, together with the scaling factors c_1,...,c_N.

Recursion: In step n, compute and store temporarily the K values δ(z_n1),...,δ(z_nK). Compute and store c_n as c_n = Σ_{k=1..K} δ(z_nk). Compute and store α^(z_n) = δ(z_n) / c_n.

Basis: δ(z_1) = p(z_1) · p(x_1 | z_1), c_1 = Σ_k δ(z_1k), α^(z_1) = δ(z_1) / c_1.

Takes time O(K²N) and space O(KN) using memoization.
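
A Python sketch of the scaled forward algorithm, under the same assumed conventions as the earlier sketches. It returns the K × N table of normalized values and the scaling factors; note that log p(x_1,...,x_N) can then be computed safely as the sum of the log c_n values:

    # A sketch of the scaled forward algorithm.
    def forward_scaled(obs, init_probs, trans_probs, emit_probs):
        K, N = len(init_probs), len(obs)
        alpha_hat = [[0.0] * N for _ in range(K)]
        c = [0.0] * N
        # Basis: delta(z_1) = p(z_1) * p(x_1 | z_1), then normalize.
        delta = [init_probs[k] * emit_probs[k][obs[0]] for k in range(K)]
        c[0] = sum(delta)
        for k in range(K):
            alpha_hat[k][0] = delta[k] / c[0]
        # Recursion: delta(z_n) = p(x_n | z_n) * sum_j alpha_hat(z_{n-1}=j) p(k | j)
        for n in range(1, N):
            delta = [
                emit_probs[k][obs[n]]
                * sum(alpha_hat[j][n - 1] * trans_probs[j][k] for j in range(K))
                for k in range(K)
            ]
            c[n] = sum(delta)
            for k in range(K):
                alpha_hat[k][n] = delta[k] / c[n]
        return alpha_hat, c

    # log p(x_1..x_N) = sum(math.log(cn) for cn in c)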

31 The backward algorithm

β(z_n) is the conditional probability of the future observations x_{n+1},...,x_N given being in state z_n.

Recursion: β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) · p(x_{n+1} | z_{n+1}) · p(z_{n+1} | z_n)
Basis: β(z_N) = 1

Takes time O(K²N) and space O(KN) using memoization.

32 Backward algorithm using scaled values

We can modify the backward recursion to use scaled values, working with β^(z_n) = β(z_n) / Π_{m=n+1..N} c_m, where the c_m are the scaling factors from the forward recursion.

33 Backward algorithm using scaled values

We can modify the backward recursion to use scaled values.

Recursion: In step n, compute and store temporarily the K values ε(z_n1),...,ε(z_nK), where

ε(z_n) = Σ_{z_{n+1}} β^(z_{n+1}) · p(x_{n+1} | z_{n+1}) · p(z_{n+1} | z_n)

Using c_{n+1} computed during the forward recursion, compute and store β^(z_n) = ε(z_n) / c_{n+1}

Basis: β^(z_N) = 1

The results are stored in a K × N table with β^[k][n] = β^(z_n) if z_n is state k. Takes time O(K²N) and space O(KN) using memoization.
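
A Python sketch of the scaled backward algorithm, reusing the scaling factors c computed by forward_scaled above (same assumed conventions):

    # A sketch of the scaled backward algorithm.
    def backward_scaled(obs, trans_probs, emit_probs, c):
        K, N = len(trans_probs), len(obs)
        beta_hat = [[0.0] * N for _ in range(K)]
        # Basis: beta_hat(z_N) = 1
        for k in range(K):
            beta_hat[k][N - 1] = 1.0
        # Recursion: eps(z_n=k) = sum_j beta_hat(z_{n+1}=j) p(x_{n+1}|j) p(j|k),
        # then divide by the forward scaling factor c_{n+1}.
        for n in range(N - 2, -1, -1):
            for k in range(K):
                eps = sum(
                    beta_hat[j][n + 1] * emit_probs[j][obs[n + 1]] * trans_probs[k][j]
                    for j in range(K)
                )
                beta_hat[k][n] = eps / c[n + 1]
        return beta_hat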

34 Posterior decoding - Revisited

Given X = x_1,...,x_N, find Z* = z_1*,...,z_N*, where z_n* = argmax_k p(z_n = k | x_1,...,x_N) is the most likely state to be in at the n'th step.

35 Posterior decoding - Revisited

Given X = x_1,...,x_N, find Z* = z_1*,...,z_N*, where z_n* = argmax_k p(z_n = k | x_1,...,x_N) is the most likely state to be in at the n'th step.

Using the scaled values this is straightforward, since p(z_n | x_1,...,x_N) = α(z_n) · β(z_n) / p(X) = α^(z_n) · β^(z_n).
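
A Python sketch of posterior decoding, reusing forward_scaled and backward_scaled from the sketches above; the posterior p(z_n = k | X) is simply alpha_hat[k][n] * beta_hat[k][n]:

    # A sketch of posterior decoding from the scaled tables.
    def posterior_decode(obs, init_probs, trans_probs, emit_probs):
        alpha_hat, c = forward_scaled(obs, init_probs, trans_probs, emit_probs)
        beta_hat = backward_scaled(obs, trans_probs, emit_probs, c)
        K, N = len(init_probs), len(obs)
        return [
            max(range(K), key=lambda k: alpha_hat[k][n] * beta_hat[k][n])
            for n in range(N)
        ]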

36 Summary

Implementing Viterbi decoding and posterior decoding in a numerically sound manner.

Next: How to build an HMM, i.e. determining the number of hidden states K and the transition and emission probabilities.

37 Exercise for next time: see Blackboard.
