Computer Vision II Lecture 14

Size: px

Start display at page:

Download "Computer Vision II Lecture 14"

Lisa Martin
6 years ago
Views:

Outline of This Lecture Computer Vision II Lecture 14 Articulated Tracking I Single-Object Tracking Baesian Filtering Kalman Filters, EKF Particle Filters 08.07.

1 Outline of This Lecture Computer Vision II Lecture 14 Articulated Tracking I Single-Object Tracking Baesian Filtering Kalman Filters, EKF Particle Filters Multi-Object Tracking Data association MHT, (JPDAF, MCMCDA) Network flow optimization Bastian Leibe RWTH Aachen leibe@vision.rwth-aachen.de GP bod pose estimation (Model-based tracking, AAMs) Pictorial Structures 2 Image sources: Andreas Ess, Deva Ramanan, Ian Matthews Recap: Linear Assignment Formulation Recap: Linear Assignment Problem Form a matrix of pairwise similarit scores Example: Similarit based on motion prediction Predict motion for each trajector and assign scores for each measurement based on inverse (Mahalanobis) distance, such that closer measurements get higher scores. Formal definition Maximize subject to Those constraints ensure that Z is a permutation matrix Choose at most one match in each row and column to maximize sum of scores 3 The permutation matrix constraint ensures that we can onl match up one object from each row and column. Note: Alternativel, we can minimize cost rather than maximizing weights. Slide adapted from Robert Collins 4 Recap: Optimal Solution Recap: Min-Cost Flow Greed Algorithm Eas to program, quick to run, and ields prett good solutions in practice. But it often does not ield the optimal solution Hungarian Algorithm There is an algorithm called Kuhn-Munkres or Hungarian algorithm specificall developed to efficientl solve the linear assignment problem. Reduces assignment problem to bipartite graph matching. When starting from an N N matrix, it runs in O(N 3 ). If ou need LAP, ou should use it. Conversion into flow graph Transform weights into costs Add source/sink nodes with 0 cost. 5 Directed edges with a capacit of

Recap: Min-Cost Flow Recap: Min-Cost Flow Conversion into flow graph Pump N units of flow from source to sink. Conversion into flow graph Pump N units of flow from source to sink. Internal nodes pass on flow ( flow in = flow out).

2 Recap: Min-Cost Flow Recap: Min-Cost Flow Conversion into flow graph Pump N units of flow from source to sink. Conversion into flow graph Pump N units of flow from source to sink. Internal nodes pass on flow ( flow in = flow out). Find the optimal paths along which to ship the flow. 7 Internal nodes pass on flow ( flow in = flow out). Find the optimal paths along which to ship the flow. 8 Recap: Using Network Flow for Tracking Recap: Using Network Flow for Tracking Complication 1 Tracks can start later than frame1 (and end earlier than frame4) Connect the source and sink nodes to all intermediate nodes. Complication 2 Trivial solution: zero cost flow! 9 10 Recap: Network Flow Approach Recap: Min-Cost Formulation Solution: Divide each detection into 2 nodes Objective Function subject to Flow conservation at all nodes Edge capacities Zhang, Li, Nevatia, Global Data Association for Multi-Object Tracking using Network Flows, CVPR image source: [Zhang, Li, Nevatia, CVPR 08] Slide credit: Laura Leal 12 2

Topics of This Lecture Articulated Tracking Motivation Classes of Approaches Bod Pose Estimation as High-Dimensional Regression Representations Training data generation Latent variable space Learning

Track facial expressions Track detailed hand motion.

High-dimensional Hitting the limits of sensor data 14 image sources: T. Svoboda, D. Ramanan, I. Matthews, J. Oikonomidis Basic Classes of Approaches Wh Is It Difficult?

3 Topics of This Lecture Articulated Tracking Motivation Classes of Approaches Bod Pose Estimation as High-Dimensional Regression Representations Training data generation Latent variable space Learning a mapping between pose and appearance Review: Gaussian Processes Formulation GP Prediction Algorithm Applications Articulated Tracking under Egomotion 13 Examples Recover a person s bod articulation Track facial expressions Track detailed hand motion... Common properties Detailed parameterization in terms of joint locations or joint angles Two steps Pose estimation (in single frame) Tracking (using dnamics model) Challenging problem High-dimensional Hitting the limits of sensor data 14 image sources: T. Svoboda, D. Ramanan, I. Matthews, J. Oikonomidis Basic Classes of Approaches Wh Is It Difficult? Global methods Entire bod configuration is treated as a point in some high-dimensional space. Observations are also global feature vectors. View of pose estimation as a high-dimensional regression problem. Often in a subspace of tpical motions... = = Part-based methods Bod configuration is modeled as an assembl of movable parts with kinematic constraints. Local search for part configurations that provide a good explanation for the observed appearance under the kinematic constraints. View of pose estimation as probabilistic inference in a dnamic Graphical Model. 15 image sources: T. Jaeggli, D. Ramanan, T. Svoboda Challenges Poor imaging, motion blur, occlusions, etc. Difficult to extract sufficientl good figure-ground information Mapping is generall multi-modal: an image observation can represent more than one pose! Slide credit: Raquel Urtasun 16 Topics of This Lecture Bod Representation Motivation The bod can be approximated as kinematic tree Classes of Approaches Bod Pose Estimation as High-Dimensional Regression Representations Training data generation Latent variable space Learning a mapping between pose and appearance Parametrization via Joint locations Joint angles Relative joint angles along kinematic chain... Review: Gaussian Processes Formulation GP Prediction Algorithm Applications Articulated Tracking under Egomotion 17 Example using in the following 3D joint locations of 20 joints 60-dimensional space Slide adapted from Raquel Urtasun 18 image source: R. Urtasun 3

Image Representation Another Advantage of Silhouette Data Man possibilities... Popular choice: Silhouettes Eas to extract using background modeling techniques.

Create sequences of Pose + Silhouette pairs Poses recorded with Mocap, used to animate 3D model Silhouette via 3D rendering pipeline Orientation ( ) Motion Capture 19 Video source: Hedvig Sidhenbladh

Animate with MoCap data Resulting snthetic training data (depth, bod part labels, silhouette) Example training sequence 21 Image source: Umer Rafi 22 Video source: Umer Rafi Learning a Mapping b/w

4 Image Representation Another Advantage of Silhouette Data Man possibilities... Popular choice: Silhouettes Eas to extract using background modeling techniques. Capture important information about bod shape. We will use them as an example for toda s lecture... Snthetic training data generation possible! Create sequences of Pose + Silhouette pairs Poses recorded with Mocap, used to animate 3D model Silhouette via 3D rendering pipeline Orientation ( ) Motion Capture 19 Video source: Hedvig Sidhenbladh Pose Data (p) Slide adapted from Stefan Gammeter 3D Rendering Silhouettes (s) 20 Snthetic Training Data Generation Snthetic Training Data Generation Varing bod proportions Different clothes models Animate with MoCap data Resulting snthetic training data (depth, bod part labels, silhouette) Example training sequence 21 Image source: Umer Rafi 22 Video source: Umer Rafi Learning a Mapping b/w Pose and Appearance Latent Variable Models Appearance prediction Regression problem High-dimensional data on both sides Low-dim. representation needed for learning! 3D joint locations segm. image ~60-dim. ~2500-dim. Training with Motion-capture stimuli Real dnamics from human actors Snthesized silhouettes for training Background subtraction for test T. Jaeggli, E. Koller-Meier, L. Van Gool, "Learning Generative Models for Monocular Bod Pose Estimation", ACCV image source: T. Jaeggli Joint angle pose space is huge! Onl a small portion contains valid bod poses. Restrict estimation to the subspace of valid poses for the task Latent variable models: PCA, FA, GPLVM, etc. 24 image source: R. Urtasun 4

generative mapping Example: Subspace of Walking Motion Articulated Motion in the Latent Space walking ccles have one main (periodic) DOF

b a 2D latent space Capture the dependenc b dimensionalit reduction (PCA, FA, CCA, LLE, GPLVM,...) 25 image sources: S. Gammeter, T.

Slide adapted from Stefan Gammeter 26 Learning a Generative Mapping Example Results Bod Pose X : Bod Pose (high dim.

) dnamic prior Difficulties Changing viewpoints Low resolution (50 px) Compression artifacts Disturbing objects (umbrella, bag) likelihood Y

Van Gool, "Learning Generative Models for Monocular Bod Pose Estimation", ACCV 2007.

Switching b/w Multiple Activities Learn multiple models One model per activit Separate LLE embedding Separate dnamics Bod Pose X x Bod Pose

5 generative mapping Example: Subspace of Walking Motion Articulated Motion in the Latent Space walking ccles have one main (periodic) DOF additional DOF encode walking stle Pose modeling in a subspace Pose model has 60 (highl dependent) DoF But gait is cclic, can be represented b a 2D latent space Capture the dependenc b dimensionalit reduction (PCA, FA, CCA, LLE, GPLVM,...) 25 image sources: S. Gammeter, T. Jaeggli Regression from latent space to Pose p(pose z) Silhouette p(silhouette z) Regressors need to be learned from training data. Slide adapted from Stefan Gammeter 26 Learning a Generative Mapping Example Results Bod Pose X : Bod Pose (high dim.) reconstruct pose Learn dim. red. (LLE) x : Bod Pose (low dim.) dnamic prior Difficulties Changing viewpoints Low resolution (50 px) Compression artifacts Disturbing objects (umbrella, bag) likelihood Y : Image (high dim.) : Appearance projection (BPCA) Descriptor: (low dim.) Appearance T. Jaeggli, E. Koller-Meier, L. Van Gool, "Learning Generative Models for Monocular Bod Pose Estimation", ACCV Slide credit: Tobias Jaeggli 27 Original video 28 Video sources: Hedvig Sidhenbladh, Tobias Jaeggli Representing Multiple Activities Switching b/w Multiple Activities Learn multiple models One model per activit Separate LLE embedding Separate dnamics Bod Pose X x Bod Pose X x Bod Pose Y X Appearance Y Appearance Y Appearance x Activit switching Low-res. traffic scene Transition from Walking to Running Original video Learn transition function Link the LLE spaces Find similar pose pairs Learn smooth transition Pose Estim. Input walk run Slide credit: Tobias Jaeggli 29 Activit switching 30 Videos b Tobias Jaeggli 5

6 Topics of This Lecture Classification vs. Regression Motivation Classes of Approaches Bod Pose Estimation as High-Dimensional Regression Representations Training data generation Latent variable space Learning a mapping between pose and appearance Review: Gaussian Processes Formulation GP Prediction Algorithm In classification: 2 {-1, 1} In regression: 2 R Applications Articulated Tracking under Egomotion 31 Slide credit: Raquel Urtasun 32 Gaussian Process Regression Gaussian Process Regression Regular regression: f(x) GP Regression Ver eas to appl Automatic confidence estimate of the result Well-suited for pose regression tasks x GP regression: μ (x)+σ(x) μ (x) μ (x)-σ(x) In the following, I will give a quick intro to GPs Focus on main concepts and results A far more detailed discussion will be given in the Advanced Machine Learning lecture (next semester). x Slide credit: Stefan Gammeter Gaussian Process Gaussian distribution Probabilit distribution over scalars / vectors. Gaussian process (generalization of Gaussian distrib.) Describes properties of functions. Function: Think of a function as a long vector where each entr specifies the function value f(x i ) at a particular point x i. Issue: How to deal with infinite number of points? If ou ask onl for properties of the function at a finite number of points Then inference in Gaussian Process gives ou the same answer if ou ignore the infinitel man other points. Definition A Gaussian process (GP) is a collection of random variables an finite number of which has a joint Gaussian distribution. Slide credit: Bernt Schiele 35 Gaussian Process Example prior over functions p(f) Represents our prior belief about functions before seeing an data. Although specific functions don t have mean of zero, the mean of f(x) values for an fixed x is zero (here). Favors smooth functions I.e. functions cannot var too rapidl Smoothness is induced b the covariance function of the Gaussian Process. Learning in Gaussian processes Is mainl defined b finding suitable properties of the covariance function. 36 Slide credit: Bernt Schiele Image source: Rasmussen & Williams,

Gaussian Process Gaussian Process: Squared Exponential A Gaussian process is completel defined b Mean function m(x) and m(x) = E[f(x)] Tpical covariance function Squared exponential (SE) Covariance

)) Remarks Covariance between the outputs is written as a function between the inputs.

7 Gaussian Process Gaussian Process: Squared Exponential A Gaussian process is completel defined b Mean function m(x) and m(x) = E[f(x)] Tpical covariance function Squared exponential (SE) Covariance function specifies the covariance between pairs of random variables Covariance function k(x,x ) k(x; x 0 ) = E[(f(x) m(x)(f(x 0 ) m(x 0 ))] We write the Gaussian process (GP) f(x)» GP(m(x); k(x; x 0 )) Remarks Covariance between the outputs is written as a function between the inputs. The squared exponential covariance function corresponds to a Baesian linear regression model with an infinite number of basis functions. For an positive definite covariance function k(.,.), there exists a (possibl infinite) expansion in terms of basis functions. Slide adapted from Bernt Schiele 37 Slide credit: Bernt Schiele 38 Gaussian Process: Prior over Functions GP Prediction with Nois Observations Distribution over functions: Specification of covariance function implies distribution over functions. I.e. we can draw samples from the distribution of functions evaluated at a (finite) number of points. Assume we have a set of observations: with noise ¾ n Joint distribution of the training outputs f and test outputs f * according to the prior: Procedure We choose a number of input points X? We write the corresponding covariance matrix (e.g. using SE) element-wise: K(X? ; X? ) Then we generate a random Gaussian vector with this covariance matrix: f?» N(0; K(X? ; X? )) Example of 3 functions sampled 39 Slide credit: Bernt Schiele Image source: Rasmussen & Williams, 2006 K(X, X * ) contains covariances for all pairs of training and test points. To get the posterior (after including the observations) We need to restrict the above prior to contain onl those functions which agree with the observed values. Think of generating functions from the prior and rejecting those that disagree with the observations (obviousl prohibitive). Slide credit: Bernt Schiele 40 Result: Prediction with Nois Observations GP Regression Algorithm Calculation of posterior: Corresponds to conditioning the joint Gaussian prior distribution on the observations: Ver simple algorithm ¹f? = E[f? jx; X? ; t] with: Based on the following equations (Matrix inv. Cholesk fact.) This is the ke result that defines Gaussian process regression! The predictive distribution is a Gaussian whose mean and variance depend on the test points X * and on the kernel k(x,x ), evaluated on the training data X. Slide credit: Bernt Schiele Image source: Rasmussen & Williams,

Computational Complexit Complexit of GP model Training effort: O(N 3 ) through matrix inversion Test effort: O(N 2 ) through vector-matrix multiplication Complexit of basis function model Training

Topics of This Lecture Motivation Classes of Approaches Bod Pose Estimation as High-Dimensional Regression Representations Training data generation Latent variable space Learning a mapping between

..N Articulated Multi-Person Tracking Idea: Onl perform articulated tracking where it s eas!

8 Computational Complexit Complexit of GP model Training effort: O(N 3 ) through matrix inversion Test effort: O(N 2 ) through vector-matrix multiplication Complexit of basis function model Training effort: O(M 3 ) Test effort: O(M 2 ) Discussion Exact GP methods become infeasible for large training sets. Need to use approximate techniques whenever #training examples > Topics of This Lecture Motivation Classes of Approaches Bod Pose Estimation as High-Dimensional Regression Representations Training data generation Latent variable space Learning a mapping between pose and appearance Review: Gaussian Processes Formulation GP Prediction Algorithm 43 Applications Articulated Tracking under Egomotion 44 Articulated Multi-Person Tracking using GP 1...N Articulated Multi-Person Tracking Idea: Onl perform articulated tracking where it s eas! Multi-person tracking Solves hard data association problem Articulated tracking Onl on individual tracklets between occlusions GP regression on full-bod pose 45 [Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV 08] Multi-Person tracking Recovers trajectories and solves data association Estimates detailed bod pose for each tracked person 46 [Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV 08] Articulated Tracking under Egomotion Summar: Articulated Tracking with Global Models Pros: View as regression problem (pose appearance) Lots of machine learning techniques available Research focus on handling the ambiguities Training on MoCap data possible Accurate models for human dnamics Guided segmentation for each frame No reliance on background modeling Approach applicable to scenarios with moving camera Feedback from bod pose estimate to improve segmentation 47 [Gammeter, Ess, Jaeggli, Schindler, Leibe, Van Gool, ECCV 08] Cons: High-dimensional problem Global model Can handle onl those articulations it has previousl seen Not robust against partial occlusion Difficult to get good appearance representation MoCap data Can snthesize silhouettes, but not appearance Restricted to background subtraction 48 8

Computer Vision II Lecture 14

Computer Vision II Lecture 14 Articulated Tracking I 08.07.2014 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Outline of This Lecture Single-Object Tracking Bayesian