Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France

Similar documents
Human Upper Body Pose Estimation in Static Images

Conditional Visual Tracking in Kernel Space

Learning to Track 3D Human Motion from Silhouettes

Note Set 4: Finite Mixture Models and the EM Algorithm

9.913 Pattern Recognition for Vision. Class I - Overview. Instructors: B. Heisele, Y. Ivanov, T. Poggio

Modeling 3D Human Poses from Uncalibrated Monocular Images

Computer Vision II Lecture 14

Expectation Maximization (EM) and Gaussian Mixture Models

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

Mixture Models and the EM Algorithm

Inference and Representation

ROBUST OBJECT TRACKING BY SIMULTANEOUS GENERATION OF AN OBJECT MODEL

Grundlagen der Künstlichen Intelligenz

Clustering Lecture 5: Mixture Model

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos

3D Human Motion Analysis and Manifolds

Introduction to Mobile Robotics

Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations

Tracking People. Tracking People: Context

Semi-supervised Learning of Joint Density Models for Human Pose Estimation

Occlusion Robust Multi-Camera Face Tracking

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

Joint Feature Distributions for Image Correspondence. Joint Feature Distribution Matching. Motivation

Mixture Models and EM

A Statistical Image-Based Shape Model for Visual Hull Reconstruction and 3D Structure Inference. Kristen Lorraine Grauman

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Style-based Inverse Kinematics

Face detection in a video sequence - a temporal approach

Passive Differential Matched-field Depth Estimation of Moving Acoustic Sources

Short Survey on Static Hand Gesture Recognition

Artificial Intelligence for Robotics: A Brief Summary

Warped Mixture Models

Tracking People on a Torus

Tracking Articulated Bodies using Generalized Expectation Maximization

Object Tracking using HOG and SVM

EE 584 MACHINE VISION

Vision par ordinateur

Action recognition in videos

Deep Generative Models Variational Autoencoders

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov

Dimensionality Reduction and Generation of Human Motion

Probabilistic Head Pose Tracking Evaluation in Single and Multiple Camera Setups

3D Face and Hand Tracking for American Sign Language Recognition

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Blind Image Deblurring Using Dark Channel Prior

COS Lecture 13 Autonomous Robot Navigation

Behaviour based particle filtering for human articulated motion tracking

CS 775: Advanced Computer Graphics. Lecture 17 : Motion Capture

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

ECE276A: Sensing & Estimation in Robotics Lecture 11: Simultaneous Localization and Mapping using a Particle Filter

Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

Inferring 3D from 2D

Online Adaptive Visual Tracking

Recovering 3D human pose from monocular images

Particle Filtering. CS6240 Multimedia Analysis. Leow Wee Kheng. Department of Computer Science School of Computing National University of Singapore

ECE 5424: Introduction to Machine Learning

Kernel PCA of HOG features for posture detection

Unsupervised Learning

Joint Vanishing Point Extraction and Tracking. 9. June 2015 CVPR 2015 Till Kroeger, Dengxin Dai, Luc Van Gool, Computer Vision ETH Zürich

Experimental Evaluation of Latent Variable Models. for Dimensionality Reduction

Coupling Top-down and Bottom-up Methods for 3D Human Pose and Shape Estimation from Monocular Image Sequences

Epitomic Analysis of Human Motion

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:

Graphical Models for Human Motion Modeling

Grasp Recognition using a 3D Articulated Model and Infrared Images

3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis

Charged Particle Reconstruction in HIC Detectors

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Learning Generative Graph Prototypes using Simplified Von Neumann Entropy

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling

Volume 3, Issue 11, November 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Motion Tracking and Event Understanding in Video Sequences

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit

Part-based and local feature models for generic object recognition

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Adaptive Kernel Regression for Image Processing and Reconstruction

3D Human Pose Estimation from Monocular Image Sequences

Practical Course WS12/13 Introduction to Monte Carlo Localization

Clustering web search results

Hand Pose Estimation Using Expectation-Constrained-Maximization From Voxel Data

Multiple Model Estimation : The EM Algorithm & Applications

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

Machine Learning for OR & FE

Modelling of non-gaussian tails of multiple Coulomb scattering in track fitting with a Gaussian-sum filter

Instance-level recognition

K-Means and Gaussian Mixture Models

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10

Estimation of Human Figure Motion Using Robust Tracking of Articulated Layers

Clustering: Classic Methods and Modern Views

Identifying Same Persons from Temporally Synchronized Videos Taken by Multiple Wearable Cameras

I D I A P. Probabilistic Head Pose Tracking Evaluation in Single and Multiple Camera Setups Sileye O. Ba a R E S E A R C H R E P O R T

Multiple Model Estimation : The EM Algorithm & Applications

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Tracking Algorithms. Lecture16: Visual Tracking I. Probabilistic Tracking. Joint Probability and Graphical Model. Deterministic methods

08 An Introduction to Dense Continuous Robotic Mapping

EE565:Mobile Robotics Lecture 3

Simultaneous Appearance Modeling and Segmentation for Matching People under Occlusion

An Overview of a Probabilistic Tracker for Multiple Cooperative Tracking Agents

Transcription:

Monocular Human Motion Capture with a Mixture of Regressors Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France IEEE Workshop on Vision for Human-Computer Interaction, 21 June 2005

Visual surveillance Introduction Goal To recover 3D human body pose from monocular image silhouettes - 3D pose = joint angles - Use either individual images or video sequences Applications Human computer interaction Gesture interpretation / activity recognition Markerless motion capture

Model Free Learning Based Approach No explicit 3D model recovers 3D pose directly from robust silhouette descriptors Human motion capture data used to train mixture of kernel regressors Advantages No need to build an explicit 3D model Easily adaptable to different appearances/people; possibly more robust Motion capture data captures typical human movements, not just kinematically possible ones Disadvantages Harder to interpret than explicit model, and may be less accurate

Regression based pose estimation from Silhouettes Robustly encode silhouette shape using vector quantized local Shape Context Histograms (z) Denote the output (3D pose) as a vector (x) Learn a regressive mapping x r(z) A φ(z) + b A: matrix of weight vectors, φ(z): vector of scalar basis functions Can allow for robustness/sparseness with the use of SVM/RVM... Training data Real silhouettes from motion capture videos supplemented with synthetic silhouettes from several human body models

Ambiguities in static pose reconstruction The silhouette (z) to pose (x) problem is inherently multi-valued. Treating it as a function can lead to averaging or zig-zagging between different solutions.

Multimodal Pose Estimation Introduce a discrete latent variable k {1, 2... K} to encode the information missing in the silhouette. Assume a mixture of experts model based on K underlying functional regression rules x r k (z): p(x z) = K p(x z, k) p(k z), p(x z, k) = N (r k (z), Λ k ) k=1 Linear regressor within each component k r k (z) A k φ(z) + b k Obtain multimodal probabilistic solutions in 3D pose space

Mixture of Regressors Fit a mixture of regressive Gaussians to the joint density (φ(z), x): φ(z) K π k N (µ k, Γ k ) x k=1 x p(x z=z ) t p(x,z) covariance ^k x = A kz + b k z=z t z

Mixture of Regressors (contd.) Special covariance structure enforces regressive noise model µ k = φ( z k), Γ k = Σ k Σ k A k r k ( z k ) A k Σ k A k Σ k A k + Λ k Parameters learned using Expectation Maximization M-step: Estimate A k, b k by weighted least squares regression, Λ k from residual errors. Compute µ k, Σ k, π k for each class. E-step: Reestimate class membership weights for each point.

Multimodal Pose Estimation from Static Images prob = 0.99 prob = 0.87 prob = 0.13 prob = 1.00 prob = 0.75 prob = 0.25 prob = 0.99 prob = 0.68 prob = 0.18 prob = 0.14 prob = 0.80 prob = 0.98 prob = 0.82 prob = 0.17 prob = 0.94 prob = 0.81 prob = 0.19 prob = 0.99 Provides multiple solutions for pose, with corresponding probabilities Most cases of ambiguity are identified

% of frames with m solutions Error in the Error in best of m = 1 m = 2 m 3 top solution top 4 solutions Test person 62 28 10 6.14 4.84 Test motion 65 28 6 7.40 5.37 Train subset 72 23 5 6.14 4.55 Numbers of solutions and RMS joint angle reconstruction errors for 3 test sequences.

Tracking with automatic (re)initialization Particle filter tracker, samples from dynamics p(x t x t 1 ) as usual Uses regressive mixture p(x t z t ) to assign posterior particle weights (Re)initializes by sampling from full mixture p(x 0 z 0 ) = K p(k z 0 ). N (r k (z 0 ), Λ k ) k=1 Potentially real time owing to closed form solution for posterior.

Self-Initializing 3D Tracking Detects the presence of a person and decides whether to wait, initialize or track using observed silhouette shape. Error in degrees 15 10 5 0 0 100 200 300 400 500 time

Upper Body Gesture Recognition Associate (by hand) different mixture components with gestures Use posterior class probabilities to identify action Training gestures (Basketball signals) Traveling Illegal dribble Illegal defense No score Stop clock Hand check Technical foul

Tracking and labelling gestures 7 True Action label Estimated label 6 5 4 3 2 1 0 0 100 200 300 400 500 600 700

Conclusion Model free methods for recovering 3D human pose from monocular silhouettes Multiple hypothesis pose estimates with associated probabilities Stable pose recovery from static images and image sequences Action recognition using mixture components