The Laplacian Eigenmaps Latent Variable Model with applications to articulated pose tracking
Miguel Á. Carreira-Perpiñán
EECS, UC Merced
http://faculty.ucmerced.edu/mcarreira-perpinan
Articulated pose tracking
We want to extract the 3D pose of a moving person (e.g. the 3D positions of several markers located on body joints) from monocular video.
[Figure: example sequence from the CMU motion-capture database, http://mocap.cs.cmu.edu]
Idea: model patterns of human motion using motion-capture data (also useful in psychology, biomechanics, etc.).
Articulated pose tracking (cont.)
Some applications:
- recognising orientation (e.g. front/back), activities (running, walking...), identity, sex
- computer graphics: rendering a graphics model of the person (from different viewpoints)
- entertainment: realistic animation of cartoon characters in movies and computer games
Difficult because:
- ambiguity of perspective projection, 3D → 2D (depth loss)
- self-occlusion of body parts
- noise in the image, clutter
- high-dimensional space of poses: this makes it hard to track (e.g. in a Bayesian framework)
Articulated pose tracking (cont.)
Pose = 3D positions of the markers located on body joints: vector y ∈ R^D.
Intrinsic pose x ∈ R^L with L ≪ D: marker positions are correlated because of physical constraints (e.g. elbow and wrist are always a fixed distance apart), so the poses y_1, y_2, ... live on a low-dimensional manifold of dimension L ≪ D.
Articulatory inversion
The problem of recovering the sequence of vocal tract shapes (lips, tongue, etc.) that produce a given acoustic utterance.
[Diagram: acoustic signal → ?? → articulatory configurations]
A long-standing problem in speech research (but solved effortlessly by humans).
Articulatory inversion (cont.)
Applications:
- speech coding
- speech recognition
- real-time visualisation of the vocal tract (e.g. for speech production studies or for language learning)
- speech therapy (e.g. assessment of dysarthria)
- etc.
Difficult because:
- different vocal tract shapes can produce the same acoustics
- high-dimensional space of vocal-tract shapes; but, again, a low-dimensional intrinsic manifold because of physical constraints
Articulatory inversion (cont.)
Data collection: electromagnetic articulography (EMA) or X-ray microbeam: record 2D positions along the midsagittal plane of several pellets located on the tongue, lips, velum, etc.
- X-ray microbeam database (U. Wisconsin)
- MOCHA database (U. Edinburgh & QMUC)
Other techniques being developed: ultrasound, MRI, etc.
Visualisation of blood test analytes
One blood sample yields dozens of analytes (glucose, albumin, Na+, LDL, ...).
The 2D map represents normal vs. abnormal samples in different regions. Extreme values of certain analytes are potentially associated with diseases (glucose: diabetes; urea nitrogen and creatinine: kidney disease; total bilirubin: liver).
[Figure: 2D maps of all data, inpatients and outpatients, highlighting normal samples and samples with high glucose, urea nitrogen, creatinine and total bilirubin.]
Visualisation of blood test analytes (cont.)
The temporal trajectories (over a period of days) for different patients indicate their evolution. Also useful to identify bad samples, e.g. due to machine malfunction.
[Figure: trajectories for inpatient 34535 and outpatient 4889.]
Kazmierczak, Leen, Erdogmus & Carreira-Perpiñán, Clinical Chemistry and Laboratory Medicine 2007.
Dimensionality reduction (manifold learning)
Given a high-dimensional data set Y = {y_1, ..., y_N} ⊂ R^D, assumed to lie near a low-dimensional manifold of dimension L ≪ D, learn (estimate):
- the dimensionality reduction mapping F: y → x
- the reconstruction mapping f: x → y
[Figure: the latent low-dimensional space R^L and the observed high-dimensional space R^D (the manifold), connected by the mappings F and f.]
Dimensionality reduction (cont.)
Two large classes of nonlinear methods:
- Latent variable models: probabilistic, mappings, local optima, scale badly with dimension.
- Spectral methods: not probabilistic, no mappings, global optimum, scale well with dimension.
They have developed separately so far, and have complementary advantages and disadvantages. Our new method, LELVM, shares the advantages of both.
Latent variable models (LVMs)
Probabilistic methods that learn a joint density model p(x, y) from the training data Y. This yields:
- marginal densities p(y) = ∫ p(y|x) p(x) dx and p(x)
- mapping F(y) = E{x|y}, the mean of p(x|y) = p(y|x) p(x) / p(y) (Bayes' theorem)
- mapping f(x) = E{y|x}
Several types:
- linear LVMs: probabilistic PCA, factor analysis, ICA...
- nonlinear LVMs: Generative Topographic Mapping (GTM) (Bishop et al., NECO 1998), Generalised Elastic Net (GEN) (Carreira-Perpiñán et al. 2005)
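As a concrete linear instance of these formulas (not spelled out in the talk), probabilistic PCA gives p(y), F(y) = E{x|y} and f(x) = E{y|x} in closed form (Tipping & Bishop's model). A minimal numpy sketch, with illustrative dimensions and parameters:

```python
import numpy as np

# Probabilistic PCA: y = W x + mu + e,  x ~ N(0, I),  e ~ N(0, s2 I).
# Then p(y) = N(mu, W W^T + s2 I), and the posterior mean is
# E{x|y} = M^-1 W^T (y - mu)  with  M = W^T W + s2 I.
D, L = 5, 2                            # illustrative dimensions
rng = np.random.default_rng(0)
W = rng.standard_normal((D, L))        # loadings (would be learned, e.g. by EM)
mu, s2 = np.zeros(D), 0.1              # mean and noise variance (illustrative)

y = rng.standard_normal(D)             # an observed point
M = W.T @ W + s2 * np.eye(L)
x = np.linalg.solve(M, W.T @ (y - mu)) # F(y) = E{x|y}: dimensionality reduction
y_rec = W @ x + mu                     # f(x) = E{y|x}: reconstruction
```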
Latent variable models (cont.)
Nonlinear LVMs are very powerful in principle:
- can represent nonlinear mappings
- can represent multimodal densities
- can deal with missing data
But in practice they have disadvantages:
- The objective function has many local optima, most of which yield very poor manifolds.
- Computational cost grows exponentially with the latent dimension L, which in practice limits L ≲ 3. Reason: one needs to discretise the latent space to compute p(y) = ∫ p(y|x) p(x) dx.
This has limited their use in certain applications.
Spectral methods
Very popular recently in machine learning: multidimensional scaling, Isomap, LLE, Laplacian eigenmaps, etc.
Essentially, they find latent points x_1, ..., x_N such that distances in X correlate well with distances in Y.
Example: draw a map of the US given only the matrix of city-to-city distances distance(y_m, y_n), as sketched below.
We focus on Laplacian eigenmaps.
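The talk does not give code, but the city-map example is exactly classical (metric) MDS: double-centre the squared-distance matrix and take its top eigenvectors. A minimal sketch, with a made-up three-point distance matrix standing in for the city-to-city distances:

```python
import numpy as np

def classical_mds(Dist, L=2):
    """Embed N points in R^L from an N x N matrix of pairwise distances."""
    N = Dist.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centring matrix
    B = -0.5 * J @ (Dist**2) @ J               # double-centred Gram matrix
    lam, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:L]            # keep the top-L eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(lam[idx], 0.0))

# Made-up distances (a 3-4-5 triangle), in place of real city distances:
Dist = np.array([[0., 3., 4.],
                 [3., 0., 5.],
                 [4., 5., 0.]])
X = classical_mds(Dist)   # 3 x 2 coordinates whose pairwise distances match Dist
```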
Spectral methods: Laplacian eigenmaps (Belkin & Niyogi, NECO 2003)
1. Build a neighborhood graph on the dataset y_1, ..., y_N with weighted edges w_mn = exp(−‖y_m − y_n‖²/σ²).
2. Set up the quadratic optimisation problem
   min_X tr(X L Xᵀ)  s.t.  X D Xᵀ = I,  X D 1 = 0
   with X = (x_1, ..., x_N), affinity matrix W = (w_mn), degree matrix D = diag(∑_{n=1}^N w_nm), and graph Laplacian L = D − W.
   Intuition: tr(X L Xᵀ) = ½ ∑_{n,m} w_nm ‖x_n − x_m‖² ⇒ place x_n, x_m nearby if y_n and y_m are similar. The constraints fix the location and scale of X.
3. Solution: eigenvectors V = (v_2, ..., v_{L+1}) of D^{−1/2} W D^{−1/2}, which yield the low-dimensional points X = Vᵀ D^{−1/2}.
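A minimal numpy/scipy sketch of steps 1-3, assuming a fully connected Gaussian-weighted graph for simplicity (in practice one would use a k-nearest-neighbour graph and a sparse eigensolver):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import pdist, squareform

def laplacian_eigenmaps(Y, L=2, sigma=1.0):
    """Y: N x D data matrix. Returns the N x L embedding X."""
    W = np.exp(-squareform(pdist(Y, 'sqeuclidean')) / sigma**2)  # step 1: affinities
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))       # degree matrix
    Lap = D - W                      # graph Laplacian
    # Steps 2-3: the constrained minimisation is solved (equivalently to the
    # normalised-affinity eigenvectors above) by the generalised eigenproblem
    # Lap v = lambda D v; eigenvalues are returned in ascending order.
    lam, V = eigh(Lap, D)
    return V[:, 1:L+1]               # drop v_1 (constant, lambda = 0)
```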
Spectral methods: Laplacian eigenmaps (cont.)
Example: Swiss roll (from Belkin & Niyogi, NECO 2003).
[Figure: the Swiss roll in the high-dimensional space, Y = (y_1, ..., y_N), and its unrolled embedding in the low-dimensional space, X = (x_1, ..., x_N).]
Spectral methods: Laplacian eigenmaps (cont.)
Advantages:
- no local optima (unique solution)
- yet succeeds with nonlinear, convoluted manifolds (if the neighborhood graph is good)
- computational cost O(N³) or, for sparse graphs, O(N²)
- can use any latent space dimension L (just use L eigenvectors)
Disadvantages:
- no mapping for points not in Y = (y_1, ..., y_N) or X = (x_1, ..., x_N) (out-of-sample mapping)
- no density p(x, y)
What should the mappings and densities be for unseen points (not in the training set)?
The Laplacian Eigenmaps Latent Variable Model (Carreira-Perpiñán & Lu, AISTATS 2007)
Natural way to embed unseen points Y_u = (y_{N+1}, ..., y_{N+M}) without perturbing the points Y_s = (y_1, ..., y_N) previously embedded:
   min_{X_u ∈ R^{L×M}} tr( [X_s X_u] [L_ss L_su; L_us L_uu] [X_s X_u]ᵀ )
That is, solve the LE problem but subject to keeping X_s fixed.
Semi-supervised learning point of view: labelled data (X_s, Y_s) (real-valued labels), unlabelled data Y_u, graph prior on Y = (Y_s, Y_u).
Solution: X_u = −X_s L_su L_uu^{−1}.
In particular, to embed a single unseen point y = Y_u ∈ R^D, we obtain
   x = F(y) = ∑_{n=1}^N [ K((y − y_n)/σ) / ∑_{n′=1}^N K((y − y_{n′})/σ) ] x_n.
This gives an out-of-sample mapping F(y) for Laplacian eigenmaps.
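A direct transcription of the batch solution X_u = −X_s L_su L_uu^{−1} as a numpy sketch; the Laplacian blocks are assumed to have been precomputed from the joint graph over training and unseen points:

```python
import numpy as np

def embed_unseen(Xs, L_su, L_uu):
    """Xs: L x N fixed training embedding; L_su: N x M and L_uu: M x M blocks
    of the joint graph Laplacian. Returns X_u = -Xs L_su inv(L_uu), size L x M."""
    # Solve against L_uu rather than forming its inverse explicitly.
    return -np.linalg.solve(L_uu.T, (Xs @ L_su).T).T
```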
LELVM (cont.)
Further, we can define a joint probability model on x and y (thus an LVM) that is consistent with that mapping:
   p(x, y) = (1/N) ∑_{n=1}^N K_y((y − y_n)/σ_y) K_x((x − x_n)/σ_x)
   p(y) = (1/N) ∑_{n=1}^N K_y((y − y_n)/σ_y)        p(x) = (1/N) ∑_{n=1}^N K_x((x − x_n)/σ_x)
   F(y) = ∑_{n=1}^N [ K_y((y − y_n)/σ_y) / ∑_{n′=1}^N K_y((y − y_{n′})/σ_y) ] x_n = ∑_{n=1}^N p(n|y) x_n = E{x|y}
   f(x) = ∑_{n=1}^N [ K_x((x − x_n)/σ_x) / ∑_{n′=1}^N K_x((x − x_{n′})/σ_x) ] y_n = ∑_{n=1}^N p(n|x) y_n = E{y|x}
The densities are kernel density estimates, the mappings are Nadaraya-Watson estimators (all nonparametric).
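A minimal sketch of the resulting model assuming Gaussian kernels K_x, K_y (points stored as rows; a direct transcription of the estimators above):

```python
import numpy as np

def gauss(U, sigma):
    """Unnormalised Gaussian kernel value K(u/sigma) for each row of U."""
    return np.exp(-0.5 * np.sum(U**2, axis=1) / sigma**2)

def make_lelvm(Xs, Ys, sigma_x, sigma_y):
    """Xs: N x L latent points from Laplacian eigenmaps; Ys: N x D data."""
    N, D = Ys.shape
    def F(y):            # E{x|y}: Nadaraya-Watson dimensionality reduction
        w = gauss(Ys - y, sigma_y)
        return (w @ Xs) / w.sum()
    def f(x):            # E{y|x}: Nadaraya-Watson reconstruction
        w = gauss(Xs - x, sigma_x)
        return (w @ Ys) / w.sum()
    def p_y(y):          # kernel density estimate of p(y), normalised
        return gauss(Ys - y, sigma_y).mean() / (2*np.pi*sigma_y**2)**(D/2)
    return F, f, p_y
```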
LELVM (cont.)
All the user needs to do is set:
- the graph parameters for Laplacian eigenmaps (as usual)
- σ_x, σ_y to control the smoothness of the mappings & densities
Advantages: those of latent variable models and spectral methods:
- yields mappings (nonlinear, infinitely differentiable and based on a global coordinate system)
- yields densities (potentially multimodal)
- no local optima
- succeeds with convoluted manifolds
- can use any dimension
- computational efficiency O(N³) or O(N²) (sparse graph)
Disadvantage: it relies on the success of Laplacian eigenmaps (which depends on the graph).
LELVM example: spiral
Dataset: spiral in 2D; reduction to 1D.
[Figure: latent density p(x), mapping f(x) and data-space density p(y) for a range of values of σ_x and σ_y, compared with GTM.]
LELVM example: motion-capture dataset
[Figure: 2D latent spaces learned from the motion-capture dataset by LELVM, GTM, GPLVM and GPLVM with back-constraints.]
LELVM example: mocap dataset (cont.)
Smooth interpolation (e.g. for animation):
[Figure: interpolated trajectory in the latent space and the corresponding poses.]
LELVM example: mocap dataset (cont.)
Reconstruction of missing patterns (e.g. due to occlusion) using p(x|y_obs) and the mode-finding algorithms of Carreira-Perpiñán, PAMI 2007:
[Figure: observed partial pose y_obs and the reconstructed poses y from two different modes of p(x|y_obs).]
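The talk cites the exhaustive Gaussian-mixture mode-finding algorithms of the PAMI 2007 paper; a minimal stand-in is Gaussian mean-shift, a fixed-point iteration that ascends to a mode of a Gaussian mixture such as p(x|y_obs) (where the mixture weights pi_n would be p(n|y_obs)):

```python
import numpy as np

def mean_shift_mode(x0, C, sigma, pi=None, tol=1e-8, max_iter=1000):
    """Ascend from x0 to a mode of p(x) = sum_n pi_n N(x; C[n], sigma^2 I).
    C: N x L component means; pi: mixture weights (uniform by default)."""
    N = C.shape[0]
    pi = np.full(N, 1.0 / N) if pi is None else pi
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        w = pi * np.exp(-0.5 * np.sum((C - x)**2, axis=1) / sigma**2)
        x_new = (w @ C) / w.sum()     # posterior-weighted mean of the centres
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Running this from several starting points and keeping the distinct fixed
# points recovers the (possibly multiple) modes, e.g. the two candidate
# reconstructions shown above.
```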
LELVM application: people tracking (Lu, Carreira-Perpiñán & Sminchisescu, NIPS 2007)
The probabilistic nature of LELVM allows seamless integration in a Bayesian framework for nonlinear, nongaussian tracking (particle filters).
At time t: observation z_t, unobserved state s_t = (d_t, x_t) with rigid motion d and intrinsic pose x.
- Prediction: p(s_t|z_{1:t−1}) = ∫ p(s_t|s_{t−1}) p(s_{t−1}|z_{1:t−1}) ds_{t−1}
- Correction: p(s_t|z_{1:t}) ∝ p(z_t|s_t) p(s_t|z_{1:t−1})
We use the Gaussian mixture sigma-point particle filter (van der Merwe & Wan, ICASSP 2003).
Dynamics: p(s_t|s_{t−1}) ∝ p_d(d_t|d_{t−1}) p_x(x_t|x_{t−1}) p(x_t), where p_d and p_x are Gaussian and p(x_t) is the LELVM prior.
Observation model: p(z_t|s_t) given by a 2D tracker with Gaussian noise, and the mapping from state to observations: x ∈ R^L → (f, LELVM) → y ∈ R^{3M} → (perspective projection, with rigid motion d ∈ R^3) → z ∈ R^{2M}.
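As an illustration of the prediction/correction recursion, here is a generic bootstrap (SIR) particle-filter step; the talk actually uses the more elaborate Gaussian mixture sigma-point particle filter. The dynamics sampler and observation likelihood are user-supplied assumptions (e.g. sampling d_t and x_t from the Gaussians above and weighting by the LELVM prior p(x_t)):

```python
import numpy as np

def pf_step(particles, weights, sample_dynamics, obs_lik, z_t, rng):
    """One bootstrap particle-filter step.
    particles: N x S array of states s_{t-1}; weights: length N, sums to 1.
    sample_dynamics(s, rng): draws s_t ~ p(s_t | s_{t-1}).
    obs_lik(z, s): evaluates p(z_t | s_t)."""
    N = len(particles)
    idx = rng.choice(N, size=N, p=weights)                  # resample
    pred = np.array([sample_dynamics(particles[i], rng) for i in idx])  # predict
    w = np.array([obs_lik(z_t, s) for s in pred])           # correct
    return pred, w / w.sum()
```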
LELVM application: people tracking (cont.)
[Videos: 3D mocap with a small training set; 3D mocap with occlusion; 3D mocap, front view; CMU mocap database; Fred turning; Fred walking.]
LELVM: summary
- Probabilistic method for dimensionality reduction.
- Natural, principled way of combining two large classes of methods (latent variable models and spectral methods), sharing the advantages of both.
- We think it is asymptotically consistent (N → ∞).
- The same idea is applicable to out-of-sample extensions for LLE, Isomap, etc.
- Very simple to implement in practice: training set + eigenvalue problem + kernel density estimate.
- Useful for applications:
  - priors for articulated pose tracking with multiple motions (walking, dancing...), multiple people
  - low-dim. representation of state spaces in reinforcement learning
  - low-dim. representation of degrees of freedom in humanoid robots
  - visualisation of high-dim. datasets, with uncertainty estimates