3D Human Motion Analysis and Manifolds


Department of Computer Science, University of Copenhagen. 3D Human Motion Analysis and Manifolds. Kim Steenstrup Pedersen, DIKU Image group and E-Science center.

Motivation
Goal: To give an overview of how manifolds and manifold learning are used in human motion analysis.
Outline of this lecture:
- 3D human motion analysis 101
- Manifolds in human motion analysis
- 2-3 concrete examples will be given

3D Human motion analysis
Def.: Estimation of 3D pose and motion of an articulated model of the human body from visual data, a.k.a. motion capture.
Marker-based motion capture (MoCap):
- Outcome: tracking markers on joints in 3D, giving joint positions.
- Markers: acoustic, inertial, LED, magnetic, reflective, etc.
- Cameras or active sensors.
Marker-less motion capture (MoCap):
- Outcome: 3D joint positions or triangulated surfaces and their relation to the video sequence.
- Multi-view (several cameras / views) or monocular (single camera / view).
- Camera / view types: optical camera, stereo pair, time-of-flight cameras, etc.

3D Marker-based motion capture [http://mocap.cs.cmu.edu/]

3D Marker-less motion capture (Upper body) [Hauberg et al, 2009]

Why do we want to do human motion analysis?
- Human computer interaction: non-invasive interface technology
- Computer animation: entertainment (movies and games), education, visualization
- Surveillance: suspicious behavior recognition, movement patterns
- Physiotherapeutic analysis: sports performance enhancement, patient treatment enhancement
- Biomechanical modeling

Human body model
The human body is commonly modeled as an articulated collection of rigid limbs connected by joints.
Common representation: y = [θ_1, ..., θ_D]^T, a vector of joint angles together with some representation of global position and orientation. Geometric shapes (boxes, ellipsoids) model the limb extent.
Other representations:
- Joint positions
- End-effector positions
- Surface models
[Hauberg et al, 2009]

Human body model constraints
Natural physical constraints:
- Body limitations, e.g. joint angle limits, limb dimensions (volume, length, etc.)
- Non-penetrability of limbs
- Angular velocity and acceleration limits
Constraints can be modeled as either hard or soft constraints.
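
As a toy illustration of the hard-versus-soft distinction, here is a minimal Python sketch; the joint limits and the penalty weight are made-up values, not from the lecture. A hard constraint rejects infeasible poses outright, while a soft constraint adds a penalty term to a tracking objective.

```python
import numpy as np

# Hypothetical joint-angle limits (radians) for a toy model with D=3 joints.
LOWER = np.array([-0.5, -1.0, -2.0])
UPPER = np.array([ 2.5,  1.0,  0.1])

def satisfies_hard_limits(theta):
    """Hard constraint: the pose is either feasible or rejected."""
    return np.all(theta >= LOWER) and np.all(theta <= UPPER)

def soft_limit_penalty(theta, weight=10.0):
    """Soft constraint: quadratic penalty added to a tracking objective."""
    violation = np.maximum(0.0, LOWER - theta) + np.maximum(0.0, theta - UPPER)
    return weight * np.sum(violation ** 2)

theta = np.array([0.3, 1.4, -0.2])
print(satisfies_hard_limits(theta), soft_limit_penalty(theta))
```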

Manifolds in human motion analysis
The manifold representation is a natural choice because:
- Human motion is sparsely distributed in pose space with low intrinsic dimensionality. This is especially true for activity-specific motion, such as walking.
- Human motion is generally continuous and smooth: joint angles do not change instantaneously in large jumps (governed by Newton's laws). Hence we would like dimensionality reduction that respects this (locality preservation).
- Constraints lead to boundaries and possibly holes in manifolds.
Added benefit, dimensionality reduction:
- Necessary to make robust estimates of model parameters from small data sets.
- Makes most tracking algorithms more feasible.
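
One simple way to check the low-intrinsic-dimensionality claim on data is to run PCA on activity-specific joint-angle poses and inspect the cumulative explained variance. The sketch below assumes a hypothetical file of walking poses; it is illustrative only and not one of the lecture's examples.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical activity-specific MoCap data: one pose (D joint angles) per frame.
poses = np.load("walking_joint_angles.npy")   # shape (num_frames, D), placeholder file

pca = PCA().fit(poses)
cum_var = np.cumsum(pca.explained_variance_ratio_)
# For activity-specific motion such as walking, a handful of components typically
# captures most of the variance, i.e. the intrinsic dimensionality is low.
d = int(np.searchsorted(cum_var, 0.95) + 1)
print("components needed for 95% variance:", d)
```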


Motion in pose space
Motion is modeled as temporal curves in pose space: y_t = [θ_1(t), ..., θ_D(t)]^T in the embedding space H, and x_t = [x_1(t), ..., x_d(t)]^T in the embedded space E, related by the mapping F : E -> H with y_t = F(x_t).

Manifolds in human motion analysis
Embedded space x ∈ E, embedding space y ∈ H, observation space o ∈ O, with mappings F : E -> H and T : H -> O.
Goal: estimate poses and motion from observations.
Unknowns: y, x. In general we also need to learn the parameters of the mappings F and T.

Tracking of human motion (Bayesian framework)
Apply tracking algorithms to sequentially estimate the pose. Key ingredients of a sequential Bayesian framework:
- Observation model: p_O(o_t | y_t)
- Prior on poses: p_H(y_t)
- Prior on embedded space: p_E(x_t)
- Dynamical model: p_H(y_t | y_{1:t-1}) or p_E(x_t | x_{1:t-1})
Estimation: sequential stochastic filtering is commonly used, e.g. Kalman and particle filtering. Sometimes deterministic optimization is also possible.
Example: 1st order Markov chain filtering on the manifold:
p(x_t | o_{1:t}) ∝ p(o_t | F(x_t)) ∫ p(x_t | x_{t-1}) p(x_{t-1} | o_{1:t-1}) dx_{t-1}
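
This filtering recursion can be approximated with a particle filter operating in the embedded space. The following sketch is a generic outline under my own assumptions: `F` is the learned mapping E -> H, `observation_likelihood` evaluates p(o_t | y_t), and the dynamical model is a simple Gaussian random walk; none of these specifics are prescribed by the lecture.

```python
import numpy as np

def particle_filter_step(particles, weights, o_t, F, observation_likelihood, step_std=0.1):
    """One step of p(x_t | o_{1:t}) ∝ p(o_t | F(x_t)) ∫ p(x_t | x_{t-1}) p(x_{t-1} | o_{1:t-1}) dx_{t-1}."""
    n, d = particles.shape
    # Resample according to the previous posterior weights.
    idx = np.random.choice(n, size=n, p=weights)
    particles = particles[idx]
    # Propagate with a first-order Markov (random-walk) dynamical model in E.
    particles = particles + np.random.normal(scale=step_std, size=(n, d))
    # Reweight by the observation model evaluated in pose space H.
    weights = np.array([observation_likelihood(o_t, F(x)) for x in particles])
    weights /= weights.sum()
    return particles, weights
```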

Pose and motion prior models
Priors on pose: which poses are probable?
- Activity-specific pose models: walking, running, golfing, jumping, etc. Examples: [Urtasun et al, 2005b; Sminchisescu et al, 2004].
- Constraints: joint angle limits, non-penetrability of limbs, etc.
Priors on motion: what types of motion are probable?
- Activity-specific motion models: walking, running, golfing, jumping, etc. Examples: [Urtasun et al, 2005a, 2006].
- Markov chain models (e.g. 1st and 2nd order models, HMM, etc.)
- General stochastic processes
- Constraints: angular velocity and acceleration limits.
Priors on plausible human poses and motion are especially important for monocular 3D tracking in order to handle occlusion, depth ambiguity, and noisy observations.

Motion and pose prior: PCA [Urtasun et al, 2005a]
Prior model for golf swings: learn a joint model of motion and poses from motion capture data using PCA, and use the prior to track golf swings in 3D.
Training set: 10 motion capture golf swing samples (from the CMU data set). Time-warp the samples to meet 4 key postures and sample with N=200 time steps. Use normalized time in [0,1].
[Urtasun et al, 2005a]

Motion and pose prior: PCA [Urtasun et al, 2005a]
Model: D=72 angles (+ global 3D position and 3D orientation). Angular motion vector y = [ψ_{μ_1}, ..., ψ_{μ_N}]^T (N*D = 14400 dimensions), where ψ_{μ_i} is the row vector of joint angles at normalized time μ_i.
Motion model: y ≈ y_0 + Σ_{i=1}^{d} x_i b_i, where the b_i are the d=4 principal components of the training set, y_0 denotes the mean of the training set, and x = [x_1, ..., x_d]^T are the embedded coordinates.
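
A minimal sketch of this construction, assuming a hypothetical array of time-warped training motions (each row a whole motion flattened to an N*D vector, 10 rows as in the paper): fit the d=4 leading principal components and reconstruct a motion from its embedded coordinates x.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical training data: each row is a whole time-warped motion,
# i.e. N poses of D joint angles flattened to an N*D vector.
swings = np.load("golf_swings.npy")       # shape (10, N*D), placeholder file

d = 4
pca = PCA(n_components=d).fit(swings)
y0 = pca.mean_                            # mean motion y_0
B = pca.components_                       # shape (d, N*D), principal directions b_i

def reconstruct_motion(x):
    """y = y_0 + sum_i x_i * b_i, for embedded coordinates x = [x_1, ..., x_d]."""
    return y0 + x @ B

x = np.zeros(d)                           # x = 0 gives the mean motion itself
motion = reconstruct_motion(x)            # N*D vector; reshape to (N, D) poses
```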

Motion and pose prior: PCA [Urtasun et al, 2005a]
Estimation of motion: sequential least squares minimization over the PCA coefficients, the global position and orientation, and the normalized times, over a sliding window of n frames. The objective function includes an observation model and global motion smoothing terms, with a linear global motion model.

Motion and pose prior: PCA Results Full swing [Urtasun et al, 2005a]

Motion and pose prior: PCA Results Short swing [Urtasun et al, 2005a]

Priors on poses: Laplacian eigenmaps [Sminchisescu et al, 2004]
Priors for poses using Laplacian eigenmaps: activity specific, but combinations of activities are possible, as we shall see.
Outline:
- The embedded manifold E is learnt from MoCap training data (CMU database) using Laplacian eigenmaps. The intrinsic dimensionality can be estimated by the Hausdorff dimension.
- Use a first-order Markov chain dynamical model in the embedded space E.
- Tracking is performed by standard sequential Bayesian estimation using covariance scaled sampling.
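
For reference, a minimal embedding step using scikit-learn's SpectralEmbedding, which implements Laplacian eigenmaps. The data file, neighborhood size, and target dimensionality below are illustrative assumptions; in practice the dimensionality would be guided by an intrinsic-dimension estimate such as the Hausdorff dimension mentioned above.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

poses = np.load("walking_joint_angles.npy")      # hypothetical (num_frames, D) MoCap data

# Laplacian eigenmaps: build a neighborhood graph in pose space and embed it.
embedding = SpectralEmbedding(n_components=3, n_neighbors=10)
X = embedding.fit_transform(poses)               # (num_frames, 3) embedded coordinates
```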

Mapping to manifold
Learn a global smooth mapping F : E -> H from the embedded space E to the embedding space H (angle representation) by kernel regression on the training set, so that y_t = F(x_t).
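
One simple instance of such a mapping is Nadaraya-Watson kernel regression from the embedded training coordinates to the training poses; the lecture does not specify the exact regressor or bandwidth, so both are assumptions in this sketch.

```python
import numpy as np

def kernel_regression_F(x, X_train, Y_train, bandwidth=0.5):
    """Smooth mapping F: E -> H as a kernel-weighted average of training poses."""
    d2 = np.sum((X_train - x) ** 2, axis=1)      # squared distances in embedded space
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))     # Gaussian kernel weights
    w /= w.sum()
    return w @ Y_train                           # weighted average pose, shape (D,)
```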

Priors on poses: Laplacian eigenmaps
Priors in embedded space and embedding space: physical constraints (joint limits, angular velocity limits, non-penetrability of limbs, etc.) are naturally defined in the original representation (embedding space) H. The prior in the embedded space E is given by a mixture of Gaussians learned from training data:
p_E(x) = Σ_{k=1}^{K} π_k N(x; μ_k, Σ_k)
Solution, the embedded flattened prior: the prior in the original space H (physical constraints) is used to produce a flattened prior in the embedded space E:
p(x) ∝ p_E(x) p_H(F(x)) |J_F(x)^T J_F(x)|^{1/2}
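
A sketch of both ingredients, building on the embedding and mapping sketches above: the mixture prior p_E(x) is fitted with scikit-learn, the Jacobian of F is approximated by finite differences, and the constraint prior log p_H is left as a user-supplied function, since its exact form depends on the body model and is not given in the lecture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_embedded_prior(X, n_components=5):
    """Mixture-of-Gaussians prior p_E(x) on the embedded training coordinates X."""
    return GaussianMixture(n_components=n_components).fit(X)

def jacobian_fd(F, x, eps=1e-4):
    """Finite-difference Jacobian J_F(x) of the mapping F: E -> H."""
    d = x.shape[0]
    cols = [(F(x + eps * np.eye(d)[i]) - F(x - eps * np.eye(d)[i])) / (2 * eps)
            for i in range(d)]
    return np.stack(cols, axis=1)                # shape (D, d)

def log_flattened_prior(x, gmm, F, log_p_H):
    """log p(x) = log p_E(x) + log p_H(F(x)) + 0.5 * log |J_F(x)^T J_F(x)|."""
    J = jacobian_fd(F, x)
    _, logdet = np.linalg.slogdet(J.T @ J)
    return gmm.score_samples(x[None])[0] + log_p_H(F(x)) + 0.5 * logdet
```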

Priors on poses: Walking prior p_E(x), shown as 2D and 3D embeddings. [Sminchisescu et al, 2004]

Priors on poses: Interaction [Sminchisescu et al, 2004] (TODO: description to be added)

Priors on poses: Effect of embedding prior [Sminchisescu et al, 2004]

Priors on poses: GPs and latent variables
Priors for pose derived from a small training set using a scaled Gaussian process latent variable model [Urtasun et al, 2005b]:
- Activity-specific model learnt from motion capture training data. Can learn and generalize from a single training motion example.
- Learn the mapping y = F(x) from E to H and optimize the latent variable positions at the same time.
- Learn a joint distribution p(x, y) on the embedded space E and the embedding space H. Assign high probability to new x near the training data.

Priors on poses: GPs and latent variables
Training: mean-zero training data Y = [y_1, ..., y_N]^T, y_i ∈ R^D. The GP requires that
p(Y | M) = |W|^N / sqrt((2π)^{ND} |K|^D) * exp(-1/2 tr(K^{-1} Y W^2 Y^T))
Unknown model parameters: M = {{x_i}_{i=1}^{N}, α, β, γ, {w_j}_{j=1}^{D}}, where W = diag(w_1, ..., w_D) and K_ij = k(x_i, x_j) is an RBF kernel with parameters α, β, γ.
Model parameters are learned by finding the MAP solution, using a simple prior on the hyperparameters and an isotropic i.i.d. Gaussian prior on the latent positions x.
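
For concreteness, a numpy sketch of evaluating this marginal likelihood for given latent positions, scaling weights and RBF hyperparameters. The kernel form k(x, x') = α exp(-γ/2 ||x - x'||²) + δ_ij/β and all variable names are my assumptions, not taken verbatim from the paper.

```python
import numpy as np

def rbf_kernel(X, alpha, beta, gamma):
    """K_ij = alpha * exp(-gamma/2 * ||x_i - x_j||^2) + delta_ij / beta (assumed kernel form)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * d2) + np.eye(len(X)) / beta

def gp_log_likelihood(Y, X, w, alpha, beta, gamma):
    """log p(Y | M) for mean-zero poses Y (N x D), latent X (N x d), scaling W = diag(w)."""
    N, D = Y.shape
    K = rbf_kernel(X, alpha, beta, gamma)
    Kinv = np.linalg.inv(K)
    _, logdetK = np.linalg.slogdet(K)
    YW2 = Y * (w ** 2)                          # Y W^2, scaling each pose dimension
    return (N * np.sum(np.log(w))               # N log|W|
            - 0.5 * N * D * np.log(2 * np.pi)
            - 0.5 * D * logdetK
            - 0.5 * np.trace(Kinv @ YW2 @ Y.T))
```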

Priors on poses: GPs and latent variables
Pose prior, the joint probability of a new latent position x and pose y:
p(x, y | M, Y) ∝ exp(-x^T x / 2) * |W|^{N+1} / sqrt((2π)^{(N+1)D} |K̂|^D) * exp(-1/2 tr(K̂^{-1} Ŷ W^2 Ŷ^T))
where Ŷ = [y_1, ..., y_N, y]^T and K̂ = [[K, k(x)], [k(x)^T, k(x,x)]] with k(x) = [k(x_1, x), ..., k(x_N, x)]^T.
Learned mean mapping: F(x) = μ + Y^T K^{-1} k(x)
Learned variance: σ²(x) = k(x,x) - k(x)^T K^{-1} k(x)
Tracking: sequential MAP estimation of x, y based on the model and observations, with 2nd order Markov dynamics. Solved by deterministic optimization.
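
Continuing the sketch above under the same naming assumptions, the learned mean mapping and predictive variance can be evaluated as follows; K would come from the rbf_kernel helper in the previous sketch, and the value of k(x,x) follows from the assumed kernel form.

```python
import numpy as np

def gp_predict(x_new, X, Y, mu, K, alpha, beta, gamma):
    """Learned mean mapping F(x) and variance sigma^2(x) of the scaled GPLVM.

    X, Y: latent positions (N x d) and mean-subtracted poses (N x D);
    mu: the pose mean subtracted during training; K: kernel matrix on X."""
    Kinv = np.linalg.inv(K)
    d2 = np.sum((X - x_new) ** 2, axis=1)
    k_x = alpha * np.exp(-0.5 * gamma * d2)      # k(x) = [k(x_1, x), ..., k(x_N, x)]
    mean_pose = mu + Y.T @ (Kinv @ k_x)          # F(x) = mu + Y^T K^{-1} k(x)
    var = alpha + 1.0 / beta - k_x @ Kinv @ k_x  # sigma^2(x) = k(x,x) - k(x)^T K^{-1} k(x)
    return mean_pose, var
```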

Priors on Poses: GPs and Latent variables [Urtasun et al, 2005b]

Priors on Poses: GPs and Latent variables [Urtasun et al, 2005b]

Summary
We have seen how pose and motion manifolds appear in human motion analysis:
- Human motion has low intrinsic dimensionality, especially activity-specific motion.
- Human motion is smooth.
- Physical limitations: joint limits, non-penetration, etc.
- Strong prior models are especially needed in monocular 3D tracking.
I have given a couple of concrete examples:
- PCA prior model of pose and motion
- Laplacian eigenmaps for learning a pose prior
- Gaussian process latent variable model for a pose prior

Suggested reading material
- R. Poppe: Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 108(1-2): 4-18, 2007.
- R. Urtasun et al.: Monocular 3-D tracking of the golf swing. Proceedings of CVPR'05, 2005.
- C. Sminchisescu and A. Jepson: Generative modeling for continuous non-linearly embedded visual inference. ICML'04, pp. 759-766, 2004.
- R. Urtasun et al.: Priors for people tracking from small training sets. Proceedings of ICCV'05, pp. 403-410, 2005.
- CMU Graphics Lab motion capture database: http://mocap.cs.cmu.edu/

Additional references
- S. Hauberg, J. Lapuyade, M. Engell-Nørregård, K. Erleben and K. S. Pedersen: Three Dimensional Monocular Human Motion Analysis in End-Effector Space. In Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pp. 235-248, 2009.
- M. Engell-Nørregård, S. Hauberg, J. Lapuyade, K. Erleben, and K. S. Pedersen: Interactive Inverse Kinematics for Monocular Motion Estimation. VRIPHYS'09, submitted, 2009.
- R. Urtasun, D. J. Fleet, and P. Fua: Temporal motion models for monocular and multiview 3D human body tracking. Computer Vision and Image Understanding, 104: 157-177, 2006.
- Z. Lu et al.: People Tracking with the Laplacian Eigenmaps Latent Variable Model. NIPS'07, 2007.
- A. Elgammal and C.-S. Lee: Tracking people on a torus. IEEE T-PAMI, 31(3): 520-538, 2009.
- N. D. Lawrence: Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data. Advances in Neural Information Processing Systems, pp. 329-336, 2004.