The Laplacian Eigenmaps Latent Variable Model

The Laplacian Eigenmaps Latent Variable Model, with applications to articulated pose tracking
Miguel Á. Carreira-Perpiñán, EECS, UC Merced
http://faculty.ucmerced.edu/mcarreira-perpinan

Articulated pose tracking
We want to extract the 3D pose of a moving person (e.g. the 3D positions of several markers located on body joints) from monocular video. (Example from the CMU motion-capture database, http://mocap.cs.cmu.edu.)
Idea: model patterns of human motion using motion-capture data (also useful in psychology, biomechanics, etc.).

Articulated pose tracking (cont.)
Some applications:
- recognising orientation (e.g. front/back), activities (running, walking, ...), identity, sex
- computer graphics: rendering a graphics model of the person (from different viewpoints)
- entertainment: realistic animation of cartoon characters in movies and computer games
Difficult because:
- ambiguity of the perspective projection 3D → 2D (depth loss)
- self-occlusion of body parts
- noise in the image, clutter
- high-dimensional space of poses: this makes it hard to track (e.g. in a Bayesian framework)

Articulated pose tracking (cont.)
Pose = 3D positions of a set of markers located on body joints: a vector y ∈ R^D.
Intrinsic pose x ∈ R^L with L ≪ D: the marker positions are correlated because of physical constraints (e.g. the elbow and wrist are always a fixed distance apart), so the poses y_1, y_2, ... live in a low-dimensional manifold of dimension L ≪ D.

Articulatory inversion
The problem of recovering the sequence of vocal-tract shapes (lips, tongue, etc.) that produce a given acoustic utterance.
[Figure: unknown articulatory configurations producing an observed acoustic signal.]
A long-standing problem in speech research (but solved effortlessly by humans).

Articulatory inversion (cont.)
Applications:
- speech coding
- speech recognition
- real-time visualisation of the vocal tract (e.g. for speech production studies or for language learning)
- speech therapy (e.g. assessment of dysarthria)
- etc.
Difficult because:
- different vocal-tract shapes can produce the same acoustics
- high-dimensional space of vocal-tract shapes, but, again, a low-dimensional intrinsic manifold because of physical constraints

Articulatory inversion (cont.)
Data collection: electromagnetic articulography (EMA) or X-ray microbeam record 2D positions, along the midsagittal plane, of several pellets located on the tongue, lips, velum, etc.
- X-ray microbeam database (U. Wisconsin)
- MOCHA database (U. Edinburgh & QMUC)
Other techniques being developed: ultrasound, MRI, etc.

Visualisation of blood test analytes
One blood sample yields a panel of analytes (glucose, albumin, Na+, LDL, ...). The 2D map represents normal vs. abnormal samples in different regions. Extreme values of certain analytes are potentially associated with diseases (glucose: diabetes; urea nitrogen and creatinine: kidney disease; total bilirubin: liver).
[Figure: 2D maps of all data, inpatients, outpatients and normal samples, highlighting samples with high glucose, urea nitrogen, creatinine and total bilirubin.]

Visualisation of blood test analytes (cont.)
The temporal trajectories (over a period of days) for different patients indicate their evolution. Also useful to identify bad samples, e.g. due to machine malfunction.
[Figure: trajectories for inpatient 34535 and outpatient 4889.]
Kazmierczak, Leen, Erdogmus & Carreira-Perpiñán, Clinical Chemistry and Laboratory Medicine 2007.

Dimensionality reduction (manifold learning)
Given a high-dimensional data set Y = {y_1, ..., y_N} ⊂ R^D, assumed to lie near a low-dimensional manifold of dimension L ≪ D, learn (estimate):
- the dimensionality reduction mapping F: y → x
- the reconstruction mapping f: x → y
[Figure: a manifold in the observed high-dimensional space R^D, connected to the latent low-dimensional space R^L by the mappings F(y) and f(x).]

Dimensionality reduction (cont.)
Two large classes of nonlinear methods:
- Latent variable models: probabilistic, yield mappings, local optima, scale badly with dimension
- Spectral methods: not probabilistic, no mappings, global optimum, scale well with dimension
They have developed separately so far, and have complementary advantages and disadvantages. Our new method, LELVM, shares the advantages of both.

Latent variable models (LVMs)
Probabilistic methods that learn a joint density model p(x, y) from the training data Y. This yields:
- marginal densities p(y) = ∫ p(y|x) p(x) dx and p(x)
- mapping F(y) = E{x|y}, the mean of p(x|y) = p(y|x) p(x) / p(y) (Bayes' theorem)
- mapping f(x) = E{y|x}
Several types:
- linear LVMs: probabilistic PCA, factor analysis, ICA, ...
- nonlinear LVMs: Generative Topographic Mapping (GTM) (Bishop et al., NECO 1998), Generalised Elastic Net (GEN) (Carreira-Perpiñán et al., 2005)

Latent variable models (cont.)
Nonlinear LVMs are very powerful in principle:
- can represent nonlinear mappings
- can represent multimodal densities
- can deal with missing data
But in practice they have disadvantages:
- the objective function has many local optima, most of which yield very poor manifolds
- the computational cost grows exponentially with the latent dimension L, which limits L to about 3. Reason: the latent space must be discretised to compute p(y) = ∫ p(y|x) p(x) dx (see the sketch below).
This has limited their use in certain applications.
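A minimal sketch (not from the talk) of why that discretisation is expensive: approximating p(y) = ∫ p(y|x) p(x) dx on a regular grid with K points per latent dimension takes K^L kernel evaluations per query, so the cost grows exponentially with L. The mapping f, the kernel width and the latent range below are illustrative stand-ins, not the talk's model.

```python
# Grid approximation of p(y) = \int p(y|x) p(x) dx for a toy LVM with
# uniform p(x) on [lo, hi]^L and Gaussian p(y|x) = N(y; f(x), sigma^2 I).
# The grid has K points per latent dimension, i.e. K**L points in total.
import itertools
import numpy as np

def approx_p_y(y, f, sigma, L, K=20, lo=-1.0, hi=1.0):
    """y: (D,) query point; f maps a latent point (L,) to a data point (D,)."""
    grid_1d = np.linspace(lo, hi, K)
    cell_volume = ((hi - lo) / (K - 1)) ** L
    D = y.shape[0]
    total = 0.0
    for x in itertools.product(grid_1d, repeat=L):   # K**L grid points
        mean = f(np.array(x))
        total += np.exp(-0.5 * np.sum((y - mean) ** 2) / sigma**2)
    norm = (2 * np.pi * sigma**2) ** (D / 2) * (hi - lo) ** L
    return total * cell_volume / norm

# Toy usage: a 1D latent curve mapped into 2D.
f = lambda x: np.array([np.cos(3 * x[0]), np.sin(3 * x[0])])
print(approx_p_y(np.array([1.0, 0.0]), f, sigma=0.1, L=1))
```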

Spectral methods
Very popular recently in machine learning: multidimensional scaling, Isomap, LLE, Laplacian eigenmaps, etc. Essentially, they find latent points x_1, ..., x_N such that distances in X correlate well with distances in Y. Example: draw a map of the US given only the city-to-city distance matrix (see the sketch below). We focus on Laplacian eigenmaps.
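To illustrate the "map from pairwise distances" idea, here is a short classical-MDS sketch (one of the spectral methods listed above): an eigendecomposition of the double-centred squared-distance matrix recovers 2D coordinates up to a rigid motion. The "city" positions below are random stand-ins, not real US data.

```python
# Classical MDS: recover coordinates from a matrix of pairwise distances.
import numpy as np

rng = np.random.default_rng(0)
true_xy = rng.uniform(0, 10, size=(6, 2))            # hypothetical "city" positions
D2 = np.sum((true_xy[:, None, :] - true_xy[None, :, :]) ** 2, axis=-1)  # squared distances

N = D2.shape[0]
J = np.eye(N) - np.ones((N, N)) / N                  # centering matrix
B = -0.5 * J @ D2 @ J                                # double-centred Gram matrix
evals, evecs = np.linalg.eigh(B)                     # eigenvalues in ascending order
X = evecs[:, -2:] * np.sqrt(evals[-2:])              # top-2 coordinates: the "map"
print(X)                                             # equals true_xy up to a rigid motion
```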

Spectral methods: Laplacian eigenmaps (Belkin & Niyogi, NECO 2003)
1. Build a neighborhood graph on the dataset y_1, ..., y_N with weighted edges w_mn = exp(−‖(y_m − y_n)/σ‖²).
2. Set up the quadratic optimisation problem
   min_X tr(X L X^T)   s.t.   X D X^T = I,  X D 1 = 0
   with X = (x_1, ..., x_N), affinity matrix W = (w_mn), degree matrix D = diag(Σ_n w_nm), and graph Laplacian L = D − W.
   Intuition: tr(X L X^T) = ½ Σ_{n,m} w_nm ‖x_n − x_m‖², so x_n and x_m are placed nearby if y_n and y_m are similar. The constraints fix the location and scale of X.
3. Solution: the eigenvectors V = (v_2, ..., v_{L+1}) of D^{−1/2} W D^{−1/2}, which yield the low-dimensional points X = V^T D^{−1/2}.
A minimal sketch follows.
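The sketch below follows the three steps above, assuming a dense Gaussian affinity over all pairs rather than a k-nearest-neighbour graph, and solving the equivalent generalised eigenproblem L v = λ D v with SciPy (dropping the constant eigenvector). The toy data and parameters are illustrative only.

```python
# Minimal Laplacian-eigenmaps sketch with a dense Gaussian affinity.
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(Y, L=2, sigma=1.0):
    """Y: (N, D) data matrix. Returns X: (N, L) embedding (rows are the points x_n)."""
    # 1. Affinities w_mn = exp(-||y_m - y_n||^2 / sigma^2).
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma**2)
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)                      # degrees
    Lap = np.diag(d) - W                   # graph Laplacian L = D - W
    # 2.-3. Generalised eigenproblem Lap v = lambda D v; keep the eigenvectors
    # for the L smallest nonzero eigenvalues (the first one is constant).
    evals, evecs = eigh(Lap, np.diag(d))
    return evecs[:, 1:L + 1]

# Toy usage: a noisy 2D spiral embedded in 3D, reduced to 2D.
t = np.linspace(0, 4 * np.pi, 300)
Y = np.column_stack([t * np.cos(t), t * np.sin(t), np.random.randn(300)])
print(laplacian_eigenmaps(Y, L=2, sigma=2.0).shape)
```

With a k-nearest-neighbour graph and a sparse eigensolver the same computation scales to much larger N, which is the cost advantage discussed on the next slide.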

Spectral methods: Laplacian eigenmaps (cont.)
Example: Swiss roll (from Belkin & Niyogi, NECO 2003).
[Figure: the Swiss-roll data Y = (y_1, ..., y_N) in the high-dimensional space and its embedding X = (x_1, ..., x_N) in the low-dimensional space.]

Spectral methods: Laplacian eigenmaps (cont.)
Advantages:
- no local optima (unique solution)
- yet succeeds with nonlinear, convoluted manifolds (if the neighborhood graph is good)
- computational cost O(N³), or much lower for sparse graphs
- can use any latent-space dimension L (just use L eigenvectors)
Disadvantages:
- no mapping for points not in Y = (y_1, ..., y_N) or X = (x_1, ..., x_N) (out-of-sample mapping)
- no density p(x, y)
What should the mappings and densities be for unseen points (not in the training set)?

The Laplacian Eigenmaps Latent Variable Model (Carreira-Perpiñán & Lu, AISTATS 2007)
A natural way to embed unseen points Y_u = (y_{N+1}, ..., y_{N+M}) without perturbing the points Y_s = (y_1, ..., y_N) previously embedded:
   min_{X_u ∈ R^{L×M}} tr( (X_s  X_u) [L_ss  L_su; L_us  L_uu] (X_s  X_u)^T )
That is, solve the LE problem but subject to keeping X_s fixed.
Semi-supervised learning point of view: labelled data (X_s, Y_s) (real-valued labels), unlabelled data Y_u, graph prior on Y = (Y_s, Y_u).
Solution: X_u = −X_s L_su L_uu^{−1} (see the sketch below). In particular, to embed a single unseen point y = Y_u ∈ R^D, we obtain
   x = F(y) = Σ_{n=1}^N K((y − y_n)/σ) x_n / Σ_{n'=1}^N K((y − y_{n'})/σ).
This gives an out-of-sample mapping F(y) for Laplacian eigenmaps.
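A sketch of the closed-form out-of-sample solution X_u = −X_s L_su L_uu^{−1}, again assuming a dense Gaussian affinity over the joint set (Y_s, Y_u); the function and parameter names are illustrative, not the paper's code.

```python
# Embed unseen points Y_u given already-embedded points (Y_s, X_s) and the
# Laplacian of the joint graph over (Y_s, Y_u).
import numpy as np

def embed_unseen(Ys, Xs, Yu, sigma=1.0):
    """Ys: (N, D) seen data, Xs: (L, N) their embedding (columns), Yu: (M, D).
    Returns Xu: (L, M)."""
    Y = np.vstack([Ys, Yu])
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma**2)
    np.fill_diagonal(W, 0.0)
    Lap = np.diag(W.sum(axis=1)) - W
    N = Ys.shape[0]
    L_su = Lap[:N, N:]                     # (N, M) block
    L_uu = Lap[N:, N:]                     # (M, M) block
    # X_u = -X_s L_su L_uu^{-1}, computed via a linear solve instead of an inverse.
    A = Xs @ L_su                          # (L, M)
    return -np.linalg.solve(L_uu.T, A.T).T
```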

LELVM (cont.)
Further, we can define a joint probability model on x and y (thus an LVM) that is consistent with that mapping:
   p(x, y) = (1/N) Σ_{n=1}^N K_y((y − y_n)/σ_y) K_x((x − x_n)/σ_x)
   p(y) = (1/N) Σ_{n=1}^N K_y((y − y_n)/σ_y)        p(x) = (1/N) Σ_{n=1}^N K_x((x − x_n)/σ_x)
   F(y) = Σ_{n=1}^N [K_y((y − y_n)/σ_y) / Σ_{n'=1}^N K_y((y − y_{n'})/σ_y)] x_n = Σ_{n=1}^N p(n|y) x_n = E{x|y}
   f(x) = Σ_{n=1}^N [K_x((x − x_n)/σ_x) / Σ_{n'=1}^N K_x((x − x_{n'})/σ_x)] y_n = Σ_{n=1}^N p(n|x) y_n = E{y|x}
The densities are kernel density estimates; the mappings are Nadaraya-Watson estimators (all nonparametric).
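With Gaussian kernels, both mappings above are just Nadaraya-Watson estimators over the training pairs (x_n, y_n); a compact sketch (names illustrative):

```python
# F(y) = sum_n p(n|y) x_n and f(x) = sum_n p(n|x) y_n, where the weights are
# the normalised Gaussian kernel evaluations (responsibilities).
import numpy as np

def nw_map(query, centres, targets, sigma):
    """Map `query` (d_in,) through the Nadaraya-Watson estimator defined by the
    training pairs (centres: (N, d_in), targets: (N, d_out))."""
    logk = -0.5 * np.sum((query - centres) ** 2, axis=1) / sigma**2
    w = np.exp(logk - logk.max())          # p(n | query), unnormalised but stable
    w /= w.sum()
    return w @ targets                     # conditional mean E{target | query}

# Usage for a trained pair of point sets X (latent) and Y (data):
#   x_new = nw_map(y_new, Y, X, sigma_y)   # F(y) = E{x|y}
#   y_rec = nw_map(x_new, X, Y, sigma_x)   # f(x) = E{y|x}
```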

LELVM (cont.)
All the user needs to do is set:
- the graph parameters for Laplacian eigenmaps (as usual)
- σ_x, σ_y to control the smoothness of the mappings & densities
Advantages (those of latent variable models and spectral methods):
- yields mappings (nonlinear, infinitely differentiable and based on a global coordinate system)
- yields densities (potentially multimodal)
- no local optima
- succeeds with convoluted manifolds
- can use any dimension
- computational efficiency O(N³), or much lower for sparse graphs
Disadvantage: it relies on the success of Laplacian eigenmaps (which depends on the graph).

LELVM example: spiral
Dataset: a spiral in 2D; reduction to 1D.
[Figure: the latent density p(x), the reconstruction mapping f(x) and the data density p(y) for several values of σ_x and σ_y, compared with GTM.]

LELVM example: motion-capture dataset
[Figure: latent embeddings of the mocap data obtained by LELVM, GTM, GPLVM, and GPLVM with back-constraints.]

LELVM example: mocap dataset (cont.)
Smooth interpolation (e.g. for animation):
[Figure: an interpolated path in the latent space.]

LELVM example: mocap dataset (cont.)
Reconstruction of missing patterns (e.g. due to occlusion) using p(x | y_obs) and the mode-finding algorithms of Carreira-Perpiñán, PAMI, 2007 (a basic mode-finding sketch follows):
[Figure: the observed part y_obs, the true pose y, and the reconstructions obtained from two modes of p(x | y_obs).]
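For context, the basic Gaussian mean-shift iteration that finds a mode of a kernel density is sketched below; the PAMI 2007 algorithms are accelerated variants of this, so the sketch is only the plain fixed-point iteration, with illustrative names.

```python
# Plain Gaussian mean-shift: for p(x) = (1/N) sum_n K((x - x_n)/sigma),
# iterate x <- sum_n p(n|x) x_n until convergence (a local mode of p).
import numpy as np

def mean_shift_mode(x0, X, sigma, tol=1e-8, max_iter=500):
    """x0: (L,) starting point, X: (N, L) kernel centres. Returns a local mode."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        logk = -0.5 * np.sum((x - X) ** 2, axis=1) / sigma**2
        w = np.exp(logk - logk.max())
        w /= w.sum()
        x_new = w @ X                      # posterior mean under p(n|x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```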

LELVM application: people tracking (Lu, Carreira-Perpiñán & Sminchisescu, NIPS 2007)
The probabilistic nature of LELVM allows seamless integration in a Bayesian framework for nonlinear, non-Gaussian tracking (particle filters). At time t: observation z_t, unobserved state s_t = (d_t, x_t), with rigid motion d and intrinsic pose x.
   Prediction: p(s_t | z_{1:t−1}) = ∫ p(s_t | s_{t−1}) p(s_{t−1} | z_{1:t−1}) ds_{t−1}
   Correction: p(s_t | z_{1:t}) ∝ p(z_t | s_t) p(s_t | z_{1:t−1})
We use the Gaussian-mixture sigma-point particle filter (v.d. Merwe & Wan, ICASSP 2003).
Dynamics: p(s_t | s_{t−1}) ∝ p_d(d_t | d_{t−1}) [Gaussian] × p_x(x_t | x_{t−1}) [Gaussian] × p(x_t) [LELVM].
Observation model: p(z_t | s_t) given by a 2D tracker with Gaussian noise, and a mapping from state to observations: intrinsic pose x ∈ R^L → (f, LELVM) → pose y ∈ R^{3M} → (perspective projection, with rigid motion d ∈ R^3) → image observation z ∈ R^{2M}.
A toy predict/correct sketch follows.
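The sketch below shows where each factor of the recursion enters, using a plain bootstrap particle filter rather than the Gaussian-mixture sigma-point filter used in the paper; `dyn_sample`, `prior_logpdf` and `obs_loglik` are hypothetical stand-ins for the Gaussian dynamics, the LELVM latent prior p(x_t) and the observation likelihood p(z_t | s_t).

```python
# One predict/correct step of a bootstrap particle filter with an extra
# latent-prior factor folded into the importance weights.
import numpy as np

def step(particles, weights, z_t, dyn_sample, prior_logpdf, obs_loglik, rng):
    """particles: (P, dim_s), weights: (P,). The three callbacks return, per
    particle, new samples, log p(x_t) and log p(z_t | s_t) respectively."""
    # Prediction: sample s_t ~ p(s_t | s_{t-1}) and fold in the latent prior p(x_t).
    particles = dyn_sample(particles, rng)
    logw = np.log(weights + 1e-300) + prior_logpdf(particles)
    # Correction: reweight by the observation likelihood p(z_t | s_t).
    logw += obs_loglik(z_t, particles)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(len(w), size=len(w), p=w)
    return particles[idx], np.full(len(w), 1.0 / len(w))
```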

LELVM application: people tracking (cont.)
[Videos: tracking results on 3D mocap with a small training set, 3D mocap with occlusion, and 3D mocap from a front view (CMU mocap database); the "Fred" sequences (turning, walking).]

LELVM: summary
- Probabilistic method for dimensionality reduction.
- A natural, principled way of combining two large classes of methods (latent variable models and spectral methods), sharing the advantages of both.
- We think it is asymptotically consistent (N → ∞).
- The same idea is applicable to out-of-sample extensions for LLE, Isomap, etc.
- Very simple to implement in practice: training set + eigenvalue problem + kernel density estimate.
Useful for applications:
- priors for articulated pose tracking with multiple motions (walking, dancing, ...) and multiple people
- low-dimensional representation of state spaces in reinforcement learning
- low-dimensional representation of degrees of freedom in humanoid robots
- visualisation of high-dimensional datasets, with uncertainty estimates