Predicting 3D People from 2D Pictures

Leonid Sigal, Michael J. Black. Department of Computer Science, Brown University. http://www.cs.brown.edu/people/ls/ CIAR Summer School, August 15-20, 2006.

Introduction. Articulated pose estimation from single-view monocular image(s): (2D) Picture -> (3D) Person. Applications: Entertainment (animation, games); Clinical (rehabilitation medicine); Security (surveillance); Understanding (gesture/activity recognition).

Why is it hard? The appearance, size, and shape of people can vary dramatically. The bones and joints are observable only indirectly (obstructed by clothing). Occlusions. High dimensionality of the state space. Loss of depth information in 2D image projections.

Approach. Break up a very hard problem into smaller, manageable pieces: Image -> 2D Pose -> 3D Pose -> Tracking. Tools: graphical models, belief propagation.

Hierarchical Inference Framework. Break up a very hard problem into smaller, manageable pieces: Image -> 2D Pose -> 3D Pose -> Tracking. (Sigal & Black, CVPR 2006; Sigal & Black, AMDO 2006)

Hierarchical Inference Framework. We are able to infer the 3D pose from a single image, but we can still make use of temporal consistency when it is available (cf. Howe, Leventon, Freeman, 00).

Discriminative Approaches (Sminchisescu, Kanaujia, Li, Metaxas, 05; Agarwal & Triggs, 04; Rosales & Sclaroff, 00; Shakhnarovich, Viola, Darrell, 03). They tend to be very fast and work well on the data they are trained on, but generalize poorly to data they have never seen.

Advantage of Hierarchical Inference. Better generalization in situations where good features are unavailable (for example, a lack of good silhouettes), via the use of an intermediate generative 2D pose estimation stage. Modularity: different 2D pose estimation modules can easily be substituted. Fully probabilistic approach.

Hierarchical Graphical Model Structure. [Figure: three layers (3D, 2D, Image) shown at times t-1, t, and t+1.]

Inferring 2D pose (Image -> 2D Pose): occlusion-sensitive loose-limbed body model.

2D Loose-limbed Body Model. The body is represented by 10 parts, $X = \{X_1, X_2, \dots, X_{10}\}$, with each part state $X_i \in \mathbb{R}^5$. Edges between parts encode kinematic constraints and occlusion constraints.

2D Loose-limbed Body Model. Exact inference in tree-structured graphical models can be computed using BP, but not when the state space is continuous, the likelihoods (or potentials) are not Gaussian, and the graph contains loops. This forces the use of approximate inference algorithms: PAMPAS (M. Isard, 03) and Non-Parametric BP (E. Sudderth, A. Ihler, W. Freeman, A. Willsky, 03).
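
To make the tree-structured case concrete, here is a minimal sketch of sum-product BP on a small discrete tree, checked against brute-force enumeration. The four-node tree, the three states per node, and all potential values are hypothetical and chosen only for illustration. The body model in the talk has continuous states, which is exactly the case this exact scheme does not cover.

```python
import itertools

# Hypothetical toy model: 4 nodes in a tree, 3 discrete states each.
edges = [(0, 1), (0, 2), (2, 3)]
n_nodes, n_states = 4, 3
unary = [  # made-up local evidence for each node and state
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
    [0.5, 0.25, 0.25],
]

def pairwise(a, b):
    """Made-up compatibility between neighboring states (prefers agreement)."""
    return 2.0 if a == b else 1.0

neighbors = {i: [] for i in range(n_nodes)}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

def message(src, dst, cache):
    """Sum-product message from src to dst; recursion ends at the leaves."""
    if (src, dst) not in cache:
        msg = []
        for xd in range(n_states):
            total = 0.0
            for xs in range(n_states):
                term = unary[src][xs] * pairwise(xs, xd)
                for k in neighbors[src]:
                    if k != dst:  # exclude the message coming from dst itself
                        term *= message(k, src, cache)[xs]
                total += term
            msg.append(total)
        cache[(src, dst)] = msg
    return cache[(src, dst)]

def bp_marginals():
    """Belief at each node: local evidence times all incoming messages."""
    cache, result = {}, []
    for i in range(n_nodes):
        belief = list(unary[i])
        for k in neighbors[i]:
            incoming = message(k, i, cache)
            belief = [b * m for b, m in zip(belief, incoming)]
        z = sum(belief)
        result.append([b / z for b in belief])
    return result

def brute_force_marginals():
    """Reference marginals obtained by enumerating every joint state."""
    result = [[0.0] * n_states for _ in range(n_nodes)]
    for x in itertools.product(range(n_states), repeat=n_nodes):
        p = 1.0
        for i in range(n_nodes):
            p *= unary[i][x[i]]
        for i, j in edges:
            p *= pairwise(x[i], x[j])
        for i in range(n_nodes):
            result[i][x[i]] += p
    z = sum(result[0])
    return [[v / z for v in row] for row in result]
```

On a tree the two functions agree up to floating-point error. Once the graph has loops, or the states are continuous with non-Gaussian potentials, that guarantee no longer applies.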

Particle Message Passing. Start with some distribution for all or a subset of the parts/limbs. Then iterate: (1) evaluate the likelihood to see which parts/limbs best describe the image; (2) propagate information from parts/limbs to neighboring parts/limbs; (3) postulate (new) consistent poses for limbs based on all available constraints. Output the distributions over parts.

Inferring 2D pose. The occlusion-sensitive loose-limbed body model allows us to infer the 2D pose reliably, even when motions are complex. [Example shown: moving camera.]

Summary so far. Image -> 2D Pose -> 3D Pose -> Tracking. 2D pose: occlusion-sensitive loose-limbed body model.

Inferring 3D pose from 2D pose (cf. Camillo J. Taylor, 00). We obtain estimates for the joints automatically. We learn a direct probabilistic mapping.

Inferring 3D pose from 2D pose (2D Pose -> 3D Pose): Mixture of Experts (MoE) (Sminchisescu et al, 05; Waterhouse et al, 96).

Inferring 3D pose from 2D pose. We want to estimate a distribution/mapping $p(\text{3D pose} \mid \text{2D pose})$, that is, $p(Y \mid X)$ with the 2D pose $X \in \mathbb{R}^n$ and the 3D pose $Y \in \mathbb{R}^m$. Problem: $p(Y \mid X)$ is a non-linear mapping, and not one-to-one.
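
One way to see why the mapping is not one-to-one is the foreshortening of a single limb. The sketch below assumes an orthographic camera and a limb of known length. Both are assumptions made for illustration (the slide does not state a camera model), and the function name is hypothetical.

```python
import math

def relative_depths(p1, p2, limb_length):
    """Possible depth offsets (z2 - z1) for a limb whose endpoints project to p1 and p2.

    Assumes orthographic projection, so the image keeps x and y and drops z.
    """
    d = math.hypot(p2[0] - p1[0], p2[1] - p1[1])  # projected limb length
    if d > limb_length:
        raise ValueError("projected length exceeds the limb length")
    dz = math.sqrt(limb_length ** 2 - d ** 2)
    # The limb may tilt toward the camera or away from it: both fit the image.
    return (dz, -dz)
```

A limb of length 5 that projects to length 3 has a relative depth of +4 or -4. Each additional limb with its own two-way choice doubles the number of 3D poses consistent with the same 2D pose.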

Mixture of Experts (MoE). We want to estimate a distribution/mapping $p(\text{3D pose} \mid \text{2D pose})$, that is, $p(Y \mid X)$ with the 2D pose $X \in \mathbb{R}^n$ and the 3D pose $Y \in \mathbb{R}^m$. Solution: $p(Y \mid X)$ may be approximated by locally linear mappings (experts).

MoE Formally. With $X$ the 2D body pose and $Y$ the 3D body pose, $p(Y \mid X) \propto \sum_{k=1}^{K} p_{e,k}(Y \mid X)\, p_{g,k}(k \mid X)$. The sum runs over all $K$ experts. Expert $p_{e,k}(Y \mid X)$: probability of the output 3D pose $Y$ according to the $k$-th expert. Gating network $p_{g,k}(k \mid X)$: probability that the input 2D pose $X$ is assigned to the $k$-th expert. Training of the MoE is done using an EM procedure (similar to learning a Mixture of Gaussians).
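
The mixture formula above can be sketched in a 1D toy setting. All parameter values, the Gaussian-shaped gates, and the Gaussian noise on each linear expert are assumptions for illustration only. A real model would have vector-valued poses and parameters learned with EM.

```python
import math

# Hypothetical toy parameters: (slope, intercept, noise std) per linear expert.
experts = [(1.0, 0.0, 0.1), (-1.0, 4.0, 0.1)]
gate_centers = [0.0, 2.0]  # assumed region of input space owned by each expert
gate_width = 0.5

def gate_probs(x):
    """p_g,k(k | x): normalized Gaussian-shaped gate responsibilities."""
    logits = [-0.5 * ((x - c) / gate_width) ** 2 for c in gate_centers]
    top = max(logits)  # subtract the maximum for numerical stability
    scores = [math.exp(l - top) for l in logits]
    total = sum(scores)
    return [s / total for s in scores]

def gauss(y, mean, std):
    """Density of a 1D Gaussian."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def moe_density(y, x):
    """p(y | x) = sum_k p_e,k(y | x) * p_g,k(k | x)."""
    return sum(g * gauss(y, a * x + b, s)
               for g, (a, b, s) in zip(gate_probs(x), experts))

def moe_mean(x):
    """Gate-weighted average of the expert predictions."""
    return sum(g * (a * x + b) for g, (a, b, _) in zip(gate_probs(x), experts))
```

At `x = 1.0` both gates are equally active and the experts predict 1 and 3, so the density has two modes while `moe_mean` returns 2.0, a value with almost no density. This is why the output is kept as a distribution rather than collapsed to a single regression value. Because the gates here are normalized, the sum is itself a proper density.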

Illustration of 3D pose inference. [Figure: a query 2D pose $X \in \mathbb{R}^n$ is passed through the gating network $p_{g,k}(k \mid X)$ and the linear mappings $p_{e,k}(Y \mid X)$ to produce the 3D pose $Y \in \mathbb{R}^m$.]

Performance. Two action-specific MoE models are trained.
Walking: 4587 2D/3D MOCAP pose pairs for training; 1398 video frames used for testing. Performance: view only 14 mm; pose only 23 mm; overall 30 mm.
Dancing: 4151 2D/3D MOCAP pose pairs for training; 2074 video frames used for testing. Performance: view only 22 mm; pose only 59 mm; overall 64 mm.

How well does MoE model work?

Summary so far. Image -> 2D Pose -> 3D Pose -> Tracking. 2D pose: occlusion-sensitive loose-limbed body model. 3D pose: Mixture of Experts (MoE).

Summary so far. Image -> 2D Pose -> 3D Pose -> Tracking. Tracking: Hidden Markov Model (HMM).

Tracking in 3D. We have a distribution over 3D poses at every time instance. [Figure: chain of 3D poses $Y_0, Y_1, Y_2, \dots, Y_T$, each linked to its image $I_0, I_1, I_2, \dots, I_T$.] We assume that the 3D pose at time $t$ is conditionally independent of the state at time $t-2$ given the state at time $t-1$.
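
Written out, this first-order Markov assumption gives the usual HMM factorization. The form below assumes the directed reading of the diagram, in which each image $I_t$ depends only on the pose $Y_t$ at the same time. That reading is an assumption, since the slide shows only the graph.

```latex
p(Y_{0:T}, I_{0:T}) \;=\; p(Y_0)\,\prod_{t=1}^{T} p(Y_t \mid Y_{t-1})\,\prod_{t=0}^{T} p(I_t \mid Y_t)
```

Each factor touches at most two neighboring variables, which is what lets message passing along the chain replace a sum over all pose sequences.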

Tracking in 3D (inference). Inference in this graphical model can be done using the tools we already have: PAMPAS/Non-parametric belief propagation. [Figure: chain of 3D poses $Y_0, Y_1, Y_2, \dots, Y_T$, each linked to its image $I_0, I_1, I_2, \dots, I_T$.]
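
As a rough illustration of sample-based inference on such a chain, here is a minimal bootstrap particle filter for a 1D state. It is not PAMPAS or Non-parametric BP: it covers only the forward (filtering) pass. The random-walk dynamics, the Gaussian observation model, and all parameter values are assumptions made for this sketch.

```python
import math
import random

def particle_filter(observations, n_particles=500,
                    process_std=0.2, obs_std=0.2, seed=0):
    """Return the filtered posterior mean of a 1D state at each time step.

    Assumed toy model: state_t = state_{t-1} + Gaussian noise, and each
    observation equals the current state plus Gaussian noise.
    """
    rng = random.Random(seed)
    # Broad initial belief over the state.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for z in observations:
        # Propagate particles through the temporal model p(Y_t | Y_{t-1}).
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight each particle by the observation likelihood p(I_t | Y_t).
        log_w = [-0.5 * ((z - p) / obs_std) ** 2 for p in particles]
        top = max(log_w)  # subtract the maximum to avoid underflow
        weights = [math.exp(l - top) for l in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        means.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample so that particles concentrate on likely states.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

Passing a list of noisy scalar observations returns one posterior-mean estimate per time step. A smoother that also uses future frames would need a backward pass as well.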

Summary so far. Image -> 2D Pose -> 3D Pose -> Tracking. 2D pose: occlusion-sensitive loose-limbed body model. 3D pose: Mixture of Experts (MoE). Tracking: Hidden Markov Model (HMM).

Hierarchical 3D Pose Estimation from Single View Monocular Images. [Figure: for frames 10, 20, and 50, the image, the 2D pose estimation (distribution and most likely sample), and the 3D pose estimation (distribution and most likely sample).]

Hierarchical 3D Pose Estimation from Single View Monocular Images. [Figure: the same layout for frames 10, 30, and 50.]

Tracking in 3D

Summary. Image -> 2D Pose -> 3D Pose -> Tracking. We introduced a novel hierarchical inference framework in which the complexity of single-image monocular 3D pose estimation is mediated by an intermediate 2D pose estimation stage. Inference in this framework can be done tractably using a variant of Non-parametric Belief Propagation. The results obtained are very encouraging.

Thank You! Questions?