Nonlinear Manifold Learning for Visual Speech Recognition

Size: px

Start display at page:

Download "Nonlinear Manifold Learning for Visual Speech Recognition"

Angelica Webster
5 years ago
Views:

1 Nonlinear Manifold Learning for Visual Speech Recognition Christoph Bregler and Stephen Omohundro University of California, Berkeley & NEC Research Institute, Inc. 1/25

2 Overview Manifold Learning: Applications to Lip Tracking, Image Interpolation, and Feature Extraction. Experiments with a full Speech Recognition System: A technique for representing and learning smooth nonlinear manifolds is presented and applied to several lip reading tasks. Given a set of points drawn from a smooth manifold in an abstract feature space, the technique is capable of determining the structure of the surface and of finding the closest manifold point to a given query point. We use this technique to learn the "space of lips" in a visual speech recognition task. The learned manifold is used for tracking and extracting the lips, for interpolating between frames in an image sequence and for providing features for recognition. We describe a system based on Hidden Markov Models and this learned lip manifold that significantly improves the performance of acoustic speech recognizers in degraded environments. We also present preliminary results on a purely visual lip reader. 2125

3 Problem 1. Boundary Tracking 2. Image Interpolation Acoustic Features - 3. Lip Features At A2A3 A4 A5 A6A7 Vt V2 V3 V4 V5 V6 V Speech Recognition [ HMM ] 3/25

4 Full Reeo gnition System "Visual Speech" Acoustic Speech "Contour-Surfing" "Eigenlips" Lip-Interpolation MLPIHMM Hybrid Speech-Recognition System -- 'pi / -- I, ~' - ~ ",1 RASTA-PLP ~ Full Sentence Car-Noise Cross-Talk 4/25

5 Lip Tracking using Active Contour Models "Snake" Tracking Snake Model = "Controlled Continuity Spline" - 5/25

6 Space ofcontours as Constrained Manifold N Dimensional Features are Points in N Dimensional Space. K Dimensional Subspace N Dimensional Feature Space 6/25

7 Abstract Manifold Learning Task /' /_ '" / " \( J \ \ I I ) I I \ I \ i t \, ).. ""---..,._/ Samples Mixture of local adaptive Patches Learned Representation 7/25

8 Mixture ofpatches Architecture ~.L.L.L.Luence?'~, ction PI I,Gi (x) Pi (x) P(x) =.i I,Gi (x) i Linear Patch 8/25

9 Manifold Learning Initial Estimate - Cluster Feature Space (K-Means) - LocalPCA Minimize MSE between Training Set and Projection - EM Algorithm - Gradient Descent First Best Model Merging 9125

10 Synthetic Examples Learned surface Training Salllples 10/25

11 Comparison to Nearest Neighbor MSE 0.3 nearest point query 0.2 Surface Prior.,, I, 0.1 " 0.0 ;:

12 Manifold Dimension? Eigen Value st Eigenvalue... 2nd Eigenvalue ,,, ",, " , J,, " 80.00,,, ,,,.,, " " "."...".../ rd Eigenvalue o.oo...a...: "...""",..."...".' Samples per pea 12/25

13 Manifold-based Snake Tracking Jl - ~ Learned Manifold Directed Graylevel Gradients along Snake LIP-Space Manifold Snake Energy =Image Term+ Learned Manifold Term Use Manifold-Snakes to estimate position, scale, and rotation for gray level matrix extraction

14 Graylevel Manifolds Linear Lip-Space Real Lip-Space => Linear Dimension Reduction Nonlinear Dimension Reduction 14/25

15 Linear vs. Nonlinear Interpolation, / Leam nonlinear subspace of "legal" images... constrain interpolated images to subspace Graylevel Dimensions (16x16 pixel = 256 dim. space) 15/25

16 Nonlinear Interpolation Technique Y :,... /.-~--i~~> "Manifold-Snake" "- '\, ~ E = L IVi- 1 - i 2 2Vi + V i distance (Vi) - Begin with sequence of linear interpolated points - Iteratively move points toward manifold - Iterations are equal to gradient descent steps using the Manifold-Snake Energy.,,"'-.,., ", ;..".-... #"" - ",,.., -"\, '., l I '. I j i i j 'j,..."i.., ~~~... ~~ ~... "'f \ I \...._,,/....._wl'..._".. ~.~~" -"'11_...,,,, I alter. 1 Iter. 2 Iter. 5 Iter. loiter. 16/25

18 Linear Interpolation Natural Lip Images Nonlinear Interpolation 24x24 Images projected in a 32-D linear space and 8-D nonlinear manifold 18/25

20 Phone Probability Estimation -2 P(phone I acoustics, visual-input) Multi-Layer Perceptron Softmax Prob. Outputs Hidden Units t 19 frames - ~ ~~---""".,... ~ ~ 9 features 10 Eigen-Features (1 Energy, 8 RASTA-PLP) (+ 10 Delta-Eigen-Features) 20125

21 Viterbi Approximation Hidden Markov Models Mi (Word-Model) Likelihood: P(A, VIM i ) ele e e e e e P (phone la, V) e Ol~ III e~ Ol~... Time III P (A, Vlphone) oc P (phone) 21125

22 Reeo gnition Results Task Acoustic Eigenlips Delta-Lips Rel.Err.Red. clean 11.0 % 10.1 % 11.3 % - 20db SNR 33.5 % 28.9% 26.0% 22.4 % lodbsnr 56.1 % 51.7 % 48.0% 14.4 % 15dbSNR crosstalk 67.3 % 51.7% 46.0% 31.6 % Table 1: Spelling Task (6 Speakers) Word Error (wrong + insertion + deletion) Task Pure Lipreading-Error Bar-Tender 4.5% Table 2: Pure Lipreading (1 Speaker, "Bar-Thsk": 4 Cocktail-Names) 22/25

23 Related Linear Representations: Related Work Kirby et ai., Pentland et ai.: Linear Subspaces Simard: Tangent Distance Related Nonlinear Representation: Jacobs, Jordan, Nowlan, Hinton: Mixture of Experts Kambhatla, Leen: Local pea Related Interpolation Techniques: Poggio et ai.: RBFs for Image Interpolation 23/25

24 Computer Lipreading History - U.S.Patent (1965) E. Nassimbene (IBM) - E. Petajan (1984), Dissertation - B. Yuhas, M. Goldstein, T. Sejnowski (1989) - K. Mase, A. Pentland (1991) - O. Garcia, A. Goldschen, E. Petajan (1992) - D. Stork, G. Wolff, K. V. Prasad, M. Hennecke (1992) - P.L. Silsbee (1993), Dissertation - A.f. Goldschen (1993), Dissertation - f. Movellan (1994) 24125

25 Summary Lipreading: - Improves Speech Recognition Performance significantly - Full System Solution Manifold Learning: - New fundamental technique for learning in geometric domains - Applicable to variety of different tasks 25125

Nonlinear Image Interpolation using Manifold Learning

Nonlinear Image Interpolation using Manifold Learning Christoph Bregler Computer Science Division University of California Berkeley, CA 94720 bregler@cs.berkeley.edu Stephen M. Omohundro'" Int. Computer