Dynamic Time Warping & Search

- Dynamic time warping
- Search
  - Graph search algorithms
  - Dynamic programming algorithms
Word-Based Template Matching

[Figure: block diagram - Spoken Word -> Feature Measurement -> Pattern Similarity -> Decision Rule -> Output Word, with Word Reference Templates feeding Pattern Similarity]

- Whole-word representation:
  - No explicit concept of sub-word units (e.g., phones)
  - No across-word sharing
- Used for both isolated- and connected-word recognition
- Popular from the late 1970s to the mid-1980s
Template Matching Mechanism

- Test pattern, $T$, and reference patterns, $\{R_1, \ldots, R_V\}$, are represented by sequences of feature measurements
- Pattern similarity is determined by aligning test pattern $T$ with reference pattern $R_v$, with distortion $D(T, R_v)$
- Decision rule chooses the reference pattern, $R^*$, with the smallest alignment distortion:
  $R^* = \arg\min_v D(T, R_v)$
- Dynamic time warping (DTW) is used to compute the best possible alignment warp, $\phi_v$, between $T$ and $R_v$, and the associated distortion $D(T, R_v)$
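In code, the decision rule is just an argmin over template distortions. A minimal Python sketch; `classify`, `templates`, and `dtw_distance` are illustrative names (a `dtw_distance` sketch appears later), not anything defined in the slides:

```python
def classify(test, templates, dtw_distance):
    """Return the word whose reference template aligns best with `test`.

    Implements R* = argmin_v D(T, R_v); `templates` maps each word to a
    reference feature sequence, and `dtw_distance` is assumed to return
    the DTW alignment distortion between two sequences.
    """
    distortions = {word: dtw_distance(test, ref)
                   for word, ref in templates.items()}
    return min(distortions, key=distortions.get)
```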
Alignment Example

[Figure: warp aligning a test pattern (axis $n = 1, \ldots, N$) with a reference pattern (axis $m = 1, \ldots, M$)]
Digit Alignment Examples

[Figure: two DTW alignments of spoken digits - one matching pair ("Match"), one mismatched pair ("Mismatch")]
Dynamic Time Warping (DTW)

- Objective: an optimal alignment between variable-length sequences $T = \{t_1, \ldots, t_N\}$ and $R = \{r_1, \ldots, r_M\}$
- The overall distortion $D(T, R)$ is based on a sum of local distances between elements, $d(t_i, r_j)$
- A particular alignment warp, $\phi$, aligns $T$ and $R$ via a point-to-point mapping, $\phi = (\phi_t, \phi_r)$, of length $K_\phi$:
  $t_{\phi_t(k)} \leftrightarrow r_{\phi_r(k)}, \quad 1 \le k \le K_\phi$
- The optimal alignment minimizes the overall distortion:
  $D(T, R) = \min_\phi D_\phi(T, R), \qquad D_\phi(T, R) = \frac{1}{M_\phi} \sum_{k=1}^{K_\phi} d(t_{\phi_t(k)}, r_{\phi_r(k)})\, m_k$
DTW Issues

- Endpoint constraints:
  $\phi_t(1) = \phi_r(1) = 1, \quad \phi_t(K_\phi) = N, \quad \phi_r(K_\phi) = M$
- Monotonicity:
  $\phi_t(k+1) \ge \phi_t(k), \quad \phi_r(k+1) \ge \phi_r(k)$
- Path weights, $m_k$, can influence the shape of the optimal path
- The path normalization factor, $M_\phi$, allows comparison between different warps (e.g., with different lengths):
  $M_\phi = \sum_{k=1}^{K_\phi} m_k$
DTW Issues: Local Continuity

[Figure: four local continuity constraints (Types I-IV), each shown as the allowed predecessor steps into point $(n, m)$ on the grid spanning $(n-2, m-2)$ through $(n, m)$]

- Local constraints determine alignment flexibility
DTW Issues: Global Constraints

[Figure: legal range on the $(n, m)$ plane between corners $(1, 1)$ and $(N, M)$, bounded by lines of slope 2 and slope 1/2]

- Global constraints exclude portions of the search space
Computing DTW Alignment

[Figure: DTW search grid with start node S and nodes A-H; arcs carry local distances, and the minimum cumulative distortion is propagated node by node through the grid]
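A minimal Python sketch of the grid computation, assuming scalar features with an absolute-difference local distance. The Type I step set {(1,0), (0,1), (1,1)}, the optional global band `window`, and the $N + M$ normalization are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def dtw_distance(T, R, window=None):
    """DTW by dynamic programming with Type I local constraints.

    Returns (normalized distortion, optimal warp path).  `window`, if
    given, is a Sakoe-Chiba-style global constraint on |i - j|.
    """
    N, M = len(T), len(R)
    INF = float("inf")
    D = np.full((N + 1, M + 1), INF)      # D[i, j]: best cost aligning t_1..t_i with r_1..r_j
    D[0, 0] = 0.0
    back = {}                             # backpointers to recover the warp
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            if window is not None and abs(i - j) > window:
                continue                  # outside the legal global range
            d = abs(T[i - 1] - R[j - 1])  # local distance d(t_i, r_j)
            # best predecessor among the three allowed steps
            cost, prev = min((D[i - 1, j - 1], (i - 1, j - 1)),
                             (D[i - 1, j], (i - 1, j)),
                             (D[i, j - 1], (i, j - 1)))
            D[i, j] = d + cost
            back[(i, j)] = prev
    # backtrack from (N, M) to recover the optimal alignment
    path, node = [], (N, M)
    while node != (0, 0):
        path.append(node)
        node = back[node]
    return D[N, M] / (N + M), path[::-1]  # N + M: one common choice for M_phi

print(dtw_distance([1, 2, 3, 3], [1, 2, 2, 3]))
```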
Graph Representations of Search Space

- Search spaces can be represented as directed graphs

[Figure: directed graph with start node S, intermediate nodes A-G, and goal node H]

- Paths through a graph can be represented with a tree
Search Space Tree

[Figure: tree of all paths from root S through nodes A-G, terminating at goal H]
Graph Search Algorithms

- Iterative methods using a queue to store partial paths
  - On each iteration, the top partial path is removed from the queue and extended one level
  - New extensions are put back into the queue
  - Search is complete when the goal is reached
  - Depth of the queue is potentially unbounded
- Weighted graphs can be searched to find the best path
- Admissible algorithms guarantee finding the best path
- Many speech-based search problems can be configured to proceed time-synchronously
Depth First Search

- Searches the space by pursuing one path at a time
- Path extensions are put on top of the queue
- Queue is not reordered or pruned
- Not well suited to spaces with long dead-end paths
- Not generally used to find the best path
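A minimal Python sketch of the queue-of-partial-paths formulation; the adjacency list below only loosely mirrors the slides' S-to-H example graph:

```python
def depth_first_search(graph, start, goal):
    """Pursue one path at a time: extensions go on TOP of the queue."""
    queue = [[start]]                       # queue (stack) of partial paths
    while queue:
        path = queue.pop()                  # remove the top partial path
        if path[-1] == goal:
            return path                     # first complete path found
        # push extensions on top; reversed so the leftmost child is tried first
        for succ in reversed(graph.get(path[-1], [])):
            queue.append(path + [succ])
    return None

graph = {"S": ["A", "B", "C"], "A": ["E"], "B": ["D", "E"],
         "C": ["G"], "D": ["F", "G"], "E": ["F"], "F": ["H"], "G": ["H"]}
print(depth_first_search(graph, "S", "H"))  # ['S', 'A', 'E', 'F', 'H']
```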
Depth First Search Example

[Figure: depth-first expansion of the search tree from S, following one branch at a time down to H]
Breadth First Search

- Searches the space by pursuing all paths in parallel
- Path extensions are put on the bottom of the queue
- Queue is not reordered or pruned
- Queue can grow rapidly in spaces with many paths
- Not generally used to find the best path
- Can be made much more effective with pruning
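The same skeleton with extensions appended to the bottom of the queue; a sketch using the same hypothetical adjacency-list format as the depth-first example:

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Pursue all paths in parallel: extensions go on the BOTTOM of the queue."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()              # remove the oldest partial path
        if path[-1] == goal:
            return path                     # shallowest complete path
        for succ in graph.get(path[-1], []):
            queue.append(path + [succ])     # extensions go to the bottom
    return None
```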
Breadth First Search Example

[Figure: breadth-first expansion of the search tree from S, advancing all paths level by level to H]
Best First Search

- Used to search a weighted graph
- Uses a greedy or step-wise optimal criterion, whereby each iteration expands the current best path
- On each iteration, the queue is re-sorted according to the cumulative score of each partial path
- If path scores exhibit monotonic behavior (e.g., $d(t_i, r_j) \ge 0$), the search can terminate when a complete path has a better score than all active partial paths
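A minimal sketch using a priority queue to keep partial paths sorted by cumulative score; the weighted adjacency-list format and example graph are illustrative:

```python
import heapq

def best_first_search(graph, start, goal):
    """Expand the current best partial path; with non-negative local
    distances, the first complete path popped beats all partial paths."""
    queue = [(0, [start])]                  # (cumulative score, partial path)
    while queue:
        score, path = heapq.heappop(queue)  # current best partial path
        if path[-1] == goal:
            return score, path              # safe to terminate here
        for succ, cost in graph.get(path[-1], []):
            heapq.heappush(queue, (score + cost, path + [succ]))
    return None

weighted = {"S": [("A", 1), ("B", 2), ("C", 6)], "A": [("E", 3)],
            "B": [("D", 1), ("E", 4)], "C": [("G", 1)], "D": [("G", 1)],
            "E": [("F", 2)], "F": [("H", 1)], "G": [("H", 2)]}
print(best_first_search(weighted, "S", "H"))  # (6, ['S', 'B', 'D', 'G', 'H'])
```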
Tree Representation (with node scores)

[Figure: search tree from S to H with a local score attached to each node]
Tree Representation (with cumulative scores)

[Figure: the same search tree with cumulative path scores at each node]
Best First Search Example

[Figure: best-first expansion of the scored search tree, always extending the partial path with the lowest cumulative score]
Pruning Partial Paths

- Both greedy and dynamic programming algorithms can take advantage of optimal substructure:
  - Let $\phi(i, j)$ be the best path between nodes $i$ and $j$
  - If $k$ is a node on $\phi(i, j)$: $\phi(i, j) = \{\phi(i, k), \phi(k, j)\}$
  - Let $\varphi(i, j)$ be the cost of $\phi(i, j)$:
    $\varphi(i, j) = \min_k \big(\varphi(i, k) + \varphi(k, j)\big)$
- Solutions to subproblems need only be computed once
- Sub-optimal partial paths can be discarded while maintaining admissibility of the search
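A sketch of best-first search with this pruning added: each node retains only its cheapest known partial-path cost (Dijkstra-style relaxation), so sub-optimal partial paths are discarded without losing admissibility, assuming non-negative edge costs:

```python
import heapq

def best_first_with_pruning(graph, start, goal):
    """Best-first search that discards sub-optimal partial paths."""
    best = {start: 0}                       # cheapest known cost to each node
    queue = [(0, [start])]
    while queue:
        score, path = heapq.heappop(queue)
        node = path[-1]
        if score > best.get(node, float("inf")):
            continue                        # a cheaper path to this node exists
        if node == goal:
            return score, path
        for succ, cost in graph.get(node, []):
            new = score + cost
            if new < best.get(succ, float("inf")):
                best[succ] = new            # prune every worse path to succ
                heapq.heappush(queue, (new, path + [succ]))
    return None
```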
Best First Search with Pruning

[Figure: the best-first example repeated, with sub-optimal partial paths to already-reached nodes pruned from the tree]
Estimating Future Scores

- Partial path scores, $\varphi(1, i)$, can be augmented with future estimates, $\hat{\varphi}(i)$, of the remaining cost:
  $\varphi_\phi = \varphi(1, i) + \hat{\varphi}(i)$
- If $\hat{\varphi}(i)$ is an underestimate of the remaining cost, additional paths can be pruned while maintaining admissibility of the search
- A* search uses:
  - Best-first search strategy
  - Pruning
  - Future estimates
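A sketch of A* obtained by sorting the queue on the augmented score $\varphi(1, i) + \hat{\varphi}(i)$. The heuristic `h` is caller-supplied; admissibility requires that it never overestimate the remaining cost, and the node-level pruning on g additionally assumes a consistent heuristic:

```python
import heapq

def a_star(graph, start, goal, h):
    """Best-first search on f = g + h, with node-level pruning on g.

    `h(node)` is the future estimate; h = lambda n: 0 reduces this to
    plain best-first search with pruning.
    """
    best = {start: 0}
    queue = [(h(start), 0, [start])]        # (f = g + h, g, partial path)
    while queue:
        f, g, path = heapq.heappop(queue)
        node = path[-1]
        if node == goal:
            return g, path
        for succ, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best.get(succ, float("inf")):
                best[succ] = new_g
                heapq.heappush(queue, (new_g + h(succ), new_g, path + [succ]))
    return None
```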
Tree Representation (with future estimates)

[Figure: the search tree with each node scored by cumulative cost plus its future estimate]
A* Search Example

[Figure: A* expansion of the tree, extending the partial path with the best augmented score at each step]
N-Best Search

- Used to compute the top N paths
  - Can be re-scored by more sophisticated techniques
  - Typically used at the sentence level
- Can use a modified A* search to rank paths:
  - No pruning of partial paths
  - Completed paths are removed from the queue
- Can use a threshold to prune paths, and still identify admissibility violations
- Can also be used to produce a graph
- Alternative methods can be used to compute N-best outputs (e.g., asynchronous DP)
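A sketch of the modified A* N-best idea: completed paths are collected rather than terminating the search, and partial paths are not pruned by node (the runner-up may share nodes with the winner). Assumes an acyclic search space so the queue eventually drains; all names are illustrative:

```python
import heapq

def n_best_search(graph, start, goal, h, n):
    """Return the top-n complete paths as (cost, path) pairs."""
    queue = [(h(start), 0, [start])]        # (f = g + h, g, partial path)
    results = []
    while queue and len(results) < n:
        f, g, path = heapq.heappop(queue)
        if path[-1] == goal:
            results.append((g, path))       # completed path leaves the queue
            continue
        for succ, cost in graph.get(path[-1], []):
            heapq.heappush(queue,
                           (g + cost + h(succ), g + cost, path + [succ]))
    return results
```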
N-Best Search Example

[Figure: N-best expansion of the tree, continuing the search after the first complete path to recover alternatives]
Dynamic Programming (DP)

- DP algorithms do not employ a greedy strategy
- DP algorithms typically take advantage of optimal substructure and overlapping subproblems by arranging the search to solve each subproblem only once
- Can be implemented efficiently:
  - Node $j$ retains only the best path cost over all $\varphi(i, j)$
  - Only the previous best node id is needed to recover the best path
- Can be time-synchronous or asynchronous
- DTW and Viterbi are time-synchronous searches and look like breadth-first search with pruning
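A minimal Viterbi sketch in this spirit: advancing frame by frame, each state stores only its best cumulative score and a backpointer. The log-score arrays are hypothetical inputs, not anything specified by the slides:

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Time-synchronous DP over a state lattice.

    log_init[j]     - log score of starting in state j
    log_trans[i, j] - log score of moving from state i to state j
    log_obs[t, j]   - log score of observing frame t in state j
    Returns (best state sequence, its total log score).
    """
    T, S = log_obs.shape
    score = np.full((T, S), -np.inf)        # best score-to-date per (frame, state)
    back = np.zeros((T, S), dtype=int)      # previous best state id
    score[0] = log_init + log_obs[0]
    for t in range(1, T):                   # all states advance in lock step
        for j in range(S):
            cand = score[t - 1] + log_trans[:, j]
            back[t, j] = int(np.argmax(cand))
            score[t, j] = cand[back[t, j]] + log_obs[t, j]
    # trace backpointers from the best final state
    states = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        states.append(int(back[t, states[-1]]))
    return states[::-1], float(np.max(score[-1]))
```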
Time-Synchronous DP Example

[Figure: the directed graph from S through A-G to H, searched level by level with all hypotheses advancing in lock step]
Inadmissible Search Variations

- Can use a beam width to prune current hypotheses
  - Beam width can be static, or dynamic based on relative score
- Can use an approximation to a lower bound for the A* lookahead in the N-best computation
- Search is inadmissible, but may be useful in practice
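A sketch of the beam pruning step as it might sit inside a time-synchronous loop: hypotheses falling more than `beam` behind the current best are dropped. The relative-score cutoff shown is one common formulation; it is inadmissible because a discarded hypothesis could still have won:

```python
def beam_prune(hypotheses, beam):
    """Keep hypotheses within `beam` of the best (lower cost = better).

    `hypotheses` is a list of (cost, path) pairs for the current frame.
    """
    best = min(cost for cost, _ in hypotheses)
    return [(cost, path) for cost, path in hypotheses if cost <= best + beam]

# e.g. beam_prune([(3.0, "pathA"), (9.5, "pathB")], 5.0) keeps only pathA
```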
Beam Search Example

[Figure: the S-to-H graph searched time-synchronously, with hypotheses outside the beam pruned at each level]
References

- Cormen, Leiserson, and Rivest, Introduction to Algorithms, 2nd edition, MIT Press, 2001.
- Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
- Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
- Winston, Artificial Intelligence, 3rd edition, Addison-Wesley, 1993.