Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences. Presentation of the thesis work of Hedvig Sidenbladh, KTH. Thesis opponent: Prof. Bill Freeman, MIT.

Thesis supervisors: Prof. Jan-Olof Eklundh, KTH; Prof. Michael Black, Brown University. Collaborators: Dr. David Fleet, Xerox PARC; Prof. Dirk Ormoneit, Stanford University.

A vision of the future from the past: Elektro and Sparko, New York World's Fair, 1939 (Westinghouse Historical Collection).

Applications of computers looking at people: video search, human-machine interaction (robots, intelligent rooms), entertainment (motion capture for games, animation, and film), and surveillance.

Technical goal: tracking a human in 3D.

Why is it Hard? The appearance of people can vary dramatically.

Why is it hard? People can appear in arbitrary poses. The 3D structure is not directly observable and must be inferred from the visible parts.

Why is it hard? Geometrically under-constrained.

One solution: use markers and multiple cameras (http://www.vicon.com/animation/).

State of the art: Bregler and Malik '98. Brightness constancy cue, insensitive to appearance. Full-body tracking required multiple cameras. Single hypothesis.

Artist's models... 2D vs. 3D tracking.

State of the art: Cham and Rehg '99. Single camera, multiple hypotheses. 2D templates (no drift, but view dependent): I(x, t) = I(x + u, 0) + η.

State of the art in 1999: Pavlovic, Rehg, Cham, and Murphy, Intl. Conf. on Computer Vision, 1999.

State of the art: Deutscher, North, Bascle & Blake '00. Multiple hypotheses. Multiple cameras. Simplified clothing, lighting, and background.

Note: we can fake it with clever system design. M. Krueger, Artificial Reality, Addison-Wesley, 1983.

Game videos...

Decathlete 100m hurdles: black background, no other people in the camera view, person at a known distance and position, and the display tells the person what motion to do.

Performance specifications: * No special clothing * Monocular, grayscale sequences (archival data) * Unknown, cluttered environment. Task: infer 3D human motion from 2D images.

Bayesian formulation: p(model | cues) = p(cues | model) p(model) / p(cues). 1. Need a constraining likelihood model that is also invariant to variations in human appearance. 2. Need a prior model of how people move. 3. Posterior probability: need an effective way to explore the very high-dimensional model space and to represent ambiguities.

System components: representation for probabilistic analysis; models for human appearance (likelihood term); models for human motion (prior term), ranging from a very general model, to a very specific model, to an example-based model.


Simple body model: * Limbs are truncated cones * Parameter vector φ of joint angles and angular velocities.

Multiple hypotheses: the posterior distribution over model parameters is often multimodal (due to ambiguities). Represent the whole distribution with a sampled representation: each sample is a pose, and the samples are predicted over time using a particle-filtering approach.

Particle filter: start from the posterior p(φ_{t-1} | I_{t-1}) represented by samples; sample from the temporal dynamics p(φ_t | φ_{t-1}); weight the predictions by the likelihood p(I_t | φ_t) and normalize to obtain the posterior p(φ_t | I_t). Problem: this representation of the posterior is expensive! Approaches to the problem: lower the number of samples (Deutscher et al., CVPR '00); represent the space in other ways (Choo and Fleet, ICCV '01).
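
A minimal sketch of this predict-weight-resample loop, assuming the pose parameters form a flat NumPy vector and that the dynamics and log-likelihood functions are supplied; all names are illustrative, not taken from the thesis.

```python
import numpy as np

def particle_filter_step(particles, dynamics, log_likelihood):
    """One predict-weight-resample step over sampled pose hypotheses.

    particles:      (N, D) array, each row one pose parameter vector phi
    dynamics:       function drawing one sample from p(phi_t | phi_{t-1})
    log_likelihood: function returning log p(I_t | phi_t) for one pose
    """
    # 1. Sample from the temporal dynamics p(phi_t | phi_{t-1}).
    predicted = np.array([dynamics(p) for p in particles])

    # 2. Weight each prediction by the image likelihood p(I_t | phi_t).
    log_w = np.array([log_likelihood(p) for p in predicted])
    w = np.exp(log_w - log_w.max())   # shift by the max for numerical stability
    w /= w.sum()                      # normalized weights approximate the posterior

    # 3. Resample with replacement according to the weights.
    idx = np.random.choice(len(predicted), size=len(predicted), p=w)
    return predicted[idx]
```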

System components: representation for probabilistic analysis; models for human appearance (likelihood term); models for human motion (prior term), ranging from a very general model, to a very specific model, to an example-based model.

What do people look like, and what do non-people look like? Challenges: changing background, varying shadows, occlusion, deforming clothing, low-contrast limb boundaries.

Edge detection? Under- and over-segmentation, thresholds, ... Or a probabilistic model?

Key Idea #1 (Likelihood) 1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene. 2. Compute various filter responses steered to the predicted orientation of the limb. 3. Compute likelihood of filter responses using a statistical model learned from examples.

Edge filters: normalized derivatives of Gaussians (Lindeberg; Granlund and Knutsson; Perona; Freeman & Adelson). Edge filter response steered to the limb orientation: f_e(x, θ, σ) = sin θ f_x(x, σ) + cos θ f_y(x, σ). (Figure: filter responses steered to the arm orientation.)
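
A small sketch of how such a steered edge response might be computed with Gaussian-derivative filters; the SciPy-based implementation and the simple sigma scaling used as normalization are assumptions for illustration, not the thesis code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def steered_edge_response(image, theta, sigma):
    """Edge response steered to orientation theta:
    f_e(x, theta, sigma) = sin(theta) * f_x(x, sigma) + cos(theta) * f_y(x, sigma),
    with f_x, f_y first derivatives of the Gaussian-smoothed image at scale sigma.
    """
    img = image.astype(float)
    # order=(0, 1): derivative along the x (column) axis; (1, 0): along the y (row) axis.
    fx = sigma * gaussian_filter(img, sigma, order=(0, 1))  # sigma factor: crude scale normalization
    fy = sigma * gaussian_filter(img, sigma, order=(1, 0))
    return np.sin(theta) * fx + np.cos(theta) * fy
```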

Example Training Images

Edge distributions. Edge response steered to the model edge: f_e(x, θ, σ) = sin θ f_x(x, σ) + cos θ f_y(x, σ). Similar to Konishi et al., CVPR '99.

Edge likelihood ratio. (Plot: likelihood ratio as a function of edge response.)
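
A hedged sketch of how the learned statistical model behind this ratio could look: histogram the steered responses at marked limb boundaries and at background locations, and use the log of their ratio as the edge term; the bin count and all names are illustrative assumptions.

```python
import numpy as np

def learn_log_ratio(on_responses, off_responses, bins=50):
    """Learn log p(response | limb edge) - log p(response | background)
    from example filter responses, as two histograms over shared bins."""
    lo = min(on_responses.min(), off_responses.min())
    hi = max(on_responses.max(), off_responses.max())
    p_on, edges = np.histogram(on_responses, bins=bins, range=(lo, hi), density=True)
    p_off, _ = np.histogram(off_responses, bins=bins, range=(lo, hi), density=True)
    eps = 1e-6                                   # avoid log(0) in empty bins
    return np.log(p_on + eps) - np.log(p_off + eps), edges

def log_ratio_at(responses, log_ratio, edges):
    """Look up the learned log-likelihood ratio for new filter responses."""
    idx = np.clip(np.digitize(responses, edges) - 1, 0, len(log_ratio) - 1)
    return log_ratio[idx]
```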

Other cues: ridges; motion, I(x, t) ≈ I(x + u, t + 1).

Ridge distributions. Ridge response steered to the limb orientation: f_r(x, θ, σ) = | sin²θ f_xx(x, σ) + cos²θ f_yy(x, σ) − 2 sinθ cosθ f_xy(x, σ) | − | cos²θ f_xx(x, σ) + sin²θ f_yy(x, σ) + 2 sinθ cosθ f_xy(x, σ) |. There is a ridge response only at certain image scales!

Motion distributions Different underlying motion models

Likelihood formulation. Independence assumptions: cues, p(image | model) = p(cue1 | model) p(cue2 | model); spatial, p(image | model) = Π_{x ∈ image} p(image(x) | model); scales, p(image | model) = Π_{σ = 1, ...} p(image(σ) | model). Combines cues and scales! This is a simplification; in reality there are dependencies.
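
Under these independence assumptions the combined log-likelihood is simply a sum of per-cue, per-scale, per-pixel terms; a minimal sketch, where the data layout and names are assumptions.

```python
import numpy as np

def combined_log_likelihood(log_terms_per_cue):
    """Combine cues, scales, and pixels under the independence assumptions:
    the total log-likelihood (up to a constant) is the sum of the per-cue,
    per-scale, per-pixel log terms.

    log_terms_per_cue: dict mapping a cue name (e.g. 'edge', 'ridge', 'flow')
    to a list of arrays, one array of per-pixel log terms per image scale.
    """
    return sum(
        float(np.sum(scale_terms))
        for per_scale in log_terms_per_cue.values()
        for scale_terms in per_scale
    )
```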

The power of cue combination

Edge cues Using edge cues alone

Ridge cues Using ridge cues alone

Flow cues Using flow cue alone

Using edge, ridge, and motion cues together.

Key idea #2: p(image | foreground, background) ∝ p(foreground part of image | foreground) / p(foreground part of image | background). Do not look in parts of the image considered background; only the foreground part of the image matters.

Likelihood: p(image | fore, back) = Π_{fore pixels} p(image | fore) × Π_{back pixels} p(image | back) = Π_{all pixels} p(image | back) × Π_{fore pixels} [ p(image | fore) / p(image | back) ] = const × Π_{fore pixels} [ p(image | fore) / p(image | back) ].
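
In log form this becomes a sum of foreground/background log-ratios over only the pixels covered by the projected body model; a small sketch, assuming per-pixel log-likelihood maps under the two models are already available (names are illustrative).

```python
import numpy as np

def foreground_log_likelihood(log_p_fore, log_p_back, fore_mask):
    """Image log-likelihood up to a constant:
    sum over foreground pixels of [log p(pixel | fore) - log p(pixel | back)].

    log_p_fore, log_p_back: per-pixel log-likelihood maps under the two models
    fore_mask: boolean mask of pixels covered by the projected 3D body model
    """
    return float(np.sum((log_p_fore - log_p_back)[fore_mask]))
```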

System components: representation for probabilistic analysis; models for human appearance (likelihood term); models for human motion (prior term), ranging from a very general model, to a very specific model, to an example-based model.

The prior term. Bayesian formulation: p(model | cue) ∝ p(cue | model) p(model). Need a constraining likelihood model that is also invariant to variations in human appearance. Need a good model of how people move.

Very general model: constant-velocity motion. Not constrained by how people tend to move.

Constant-velocity model: all DOF in the model parameter space φ are independent; angles are assumed to change with constant speed; changes in speed and position are randomly sampled from a normal distribution.
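
A minimal sketch of such a constant-velocity prior, assuming the pose vector packs the joint angles followed by their angular velocities; the noise scales are placeholders, not values from the thesis. It can be plugged in as the dynamics function of the particle-filter sketch above (e.g. via functools.partial to fix n_angles).

```python
import numpy as np

def constant_velocity_dynamics(phi, n_angles, sigma_vel=0.02, sigma_angle=0.01):
    """Constant-velocity temporal prior for one pose hypothesis.

    phi packs joint angles followed by their angular velocities; each DOF is
    treated independently, angles advance by the current velocity, and Gaussian
    noise perturbs both velocities and positions.
    """
    angles, vel = phi[:n_angles], phi[n_angles:]
    vel = vel + np.random.normal(0.0, sigma_vel, size=vel.shape)
    angles = angles + vel + np.random.normal(0.0, sigma_angle, size=angles.shape)
    return np.concatenate([angles, vel])
```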

Tracking an arm: moving camera, constant-velocity model, 1500 samples, ~2 min/frame.

Self-occlusion: constant-velocity model, 1500 samples, ~2 min/frame.

System components: representation for probabilistic analysis; models for human appearance (likelihood term); models for human motion (prior term), ranging from a very general model, to a very specific model, to an example-based model.

Very specific model: only handles people walking. A very powerful constraint on human motion.

Models of human dynamics. Action-specific model: walking. Training data: 3D motion capture. From the training set, learn the mean walking cycle and the common modes of deviation from it (PCA). (Examples shown: mean cycle, small noise, large noise.)
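
A sketch of how such an eigen-model of the walk cycle could be learned, assuming the training cycles have already been time-normalized and flattened into the rows of a matrix; PCA is done via SVD, and all names and the mode count are illustrative.

```python
import numpy as np

def learn_walk_model(cycles):
    """Learn a mean walking cycle and its principal modes of deviation (PCA).

    cycles: (N, T*D) array; each row is one walk cycle, time-normalized to T
    samples of D joint angles and flattened. Returns the mean cycle, the
    principal modes (rows), and the standard deviation along each mode."""
    mean = cycles.mean(axis=0)
    _, s, modes = np.linalg.svd(cycles - mean, full_matrices=False)
    return mean, modes, s / np.sqrt(len(cycles) - 1)

def sample_walk_cycle(mean, modes, stds, n_modes=5, noise=1.0):
    """Sample a plausible cycle: the mean plus noise-scaled principal modes."""
    coeffs = np.random.normal(0.0, noise * stds[:n_modes])
    return mean + coeffs @ modes[:n_modes]
```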

Walking person: the number of samples is reduced from 15,000 to 2,500 by using the learned likelihood. Walking model, 2,500 samples, ~10 min/frame.

No likelihood: how strong is the walking prior? (Or: is our likelihood doing anything?)

System components: representation for probabilistic analysis; models for human appearance (likelihood term); models for human motion (prior term), ranging from a very general model, to a very specific model, to an example-based model.

Example-based model: take lots of training data and use snippets of the data as models for how people are likely to move.

Example-based model Ten samples from the prior, drawn using approximate probabilistic tree search.
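
The thesis draws these samples with an approximate probabilistic tree search over the motion-capture data; the sketch below substitutes a plain brute-force nearest-neighbour lookup over pose-history snippets, which conveys the idea of an example-based prior but is not the thesis algorithm, and all names are illustrative.

```python
import numpy as np

def sample_next_pose(recent_poses, snippets, k=10):
    """Example-based prior (simplified): find the training snippets whose start
    best matches the recent pose history, pick one of the k closest at random,
    and return the pose that follows it in the training data.

    recent_poses: (L, D) array of the last L estimated pose vectors
    snippets:     (M, L+1, D) array of overlapping motion-capture windows
    """
    query = recent_poses.reshape(-1)
    keys = snippets[:, :-1, :].reshape(len(snippets), -1)
    dists = np.linalg.norm(keys - query, axis=1)
    choice = np.random.choice(np.argsort(dists)[:k])   # one of the k nearest
    return snippets[choice, -1, :]                     # its continuation in the data
```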

Tracking with only 300 particles. Smooth motion prior. Example-based motion prior.

Lessons learned: representation for probabilistic analysis. The probabilistic (Bayesian) framework allows integration of information in a principled way and modeling of priors. Particle filtering allows multi-modal distributions and tracking with ambiguities and non-linear models.

Lessons learned: models for human appearance (likelihood term). A generic, learned model of appearance combines multiple cues, exploits work on image statistics, and uses the 3D model to predict features. Modeling the foreground and the background, and exploiting the ratio between their likelihoods, improves tracking.

Lessons learned: models for human motion (prior term). Three different models were explored and the tradeoffs between them analyzed.

End

Decathlete javelin throw

Edges

Bayesian inference: exploit cues in the images and learn likelihood models, p(image cue | model); build models of human form and motion and learn priors over model parameters, p(model); represent the posterior distribution, p(model | cue) ∝ p(cue | model) p(model).