3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis

Size: px
Start display at page:

Download "3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis"

Transcription

1 3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis Kooksang Moon and Vladimir Pavlović Department of Computer Science, Rutgers University Piscataway, NJ 885 {ksmoon, Abstract We propose a generative statistical approach to human motion modeling and tracking that utilizes probabilistic latent semantic (PLSA) models to describe the mapping of image features to 3D human pose estimates. PLSA has been successfully used to model the co-occurrence of dyadic data on problems such as image annotation where image features are mapped to word categories via latent variable semantics. We apply the PLSA approach to motion tracking by extending it to a sequential setting where the latent variables describe intrinsic motion semantics linking human figure appearance to 3D pose estimates. This dynamic PLSA (DPLSA) approach is in contrast to many current methods that directly learn the often high-dimensional image-to-pose mappings and utilize subspace projections as a constraint on the pose space alone. As a consequence, such mappings may often exhibit increased computational complexity and insufficient generalization performance. We demonstrate the utility of the proposed model on the synthetic dataset and the task of 3D human motion tracking in monocular image sequences with arbitrary camera views. Our experiments show that the proposed approach can produce accurate pose estimates at a fraction of the computational cost of alternative subspace tracking methods.. Introduction Estimating 3D body pose from D monocular image is a fundamental problem for many applications ranging from surveillance to advanced human-machine interfaces. However, the shape variation of D images caused by changes in pose, camera setting, and view points makes this estimation a challenging problem. Computational approaches to pose estimation in these settings are often characterized by complex algorithms and a tradeoff between the estimation accuracy and computational efficiency. In this paper we propose the low-dimensional embedding method for 3D pose estimation that exhibits both high accuracy, tractable estimation, and invariance to viewing direction. 3D human pose estimation from monocular D images can be formulated as the task of matching an image of the tracked subject to the most likely 3D pose. To learn such a mapping one needs to deal with a dyadic set of high dimensional objects the poses, y and the image features, z. Because of the high dimensionality of the two spaces learning a direct mapping z y often results in complex models with poor generalization properties. One way to solve this problem is to map the two high dimensional vectors to a lower dimensional subspace x: x z and x y [3, ]. However, in these approaches, the correlation between the pose and the image feature is weakened by learning the two mappings independently and the temporal relationship is ignored during the embedding procedure. Our approach to pose estimation is inspired by probabilistic latent semantic analysis (PLSA) that is often used in domains such as the computational language modeling to correlate complex processes. In this paper we extended PLSA to account for the dynamic nature of sequential data. We choose the Gaussian Process Latent Variable model (GPLVM) to form a pair of mapping functions between the latent variable and the two objects. A GPLVM framework is particularly suited for this model because the dynamic nature of sequence can be directly integrated into the embedding procedure in a probabilistic manner [5,, 3]. The two generative nonlinear embedding models and the marginal dynamics results in a new hybrid model called the dynamic PLSA (DPLSA). DPLSA models the semantical relationship between a sequence of 3D poses and a sequence of image features via the shared latent dynamic space. This paper is organized as follows. We first define the DPLSA model that utilizes the marginal dynamic prior to learn the latent space of sequential data. We then propose the new framework for human motion modeling based on the DPLSA model and suggest learning and inference methods in this specific modeling context. The framework can

2 be directly extended for multiple view points by using the mixture model in the space of the latent variables and the image features. The utility of the the new framework is examined thorough a set of experiments of tracking 3D human figure motion from synthetic and real image sequences.. Related Work Dyadic data refers to a domain with two sets of objects in which data is measured on pairs of units. One of the popular approaches for learning from this kind of data is the latent semantic analysis (LSA) that was devised for document indexing. Deerwester et al. [] considered the term-document association data and used singular-value decomposition to decompose document matrix into a set of orthogonal matrices. LSA has been applied to a wide range of problems such as information retrieval and natural language processing [, ]. Probabilistic Latent Semantic Analysis (PLSA) [5] is a generalization of LSA to probabilistic settings. The main purpose of LSA and PLSA is to reveal semantical relations between the data entities by mapping the high dimensional data such text documents to a lower dimensional representation called latent semantic space. Some exemplary application areas of PLSA in computer vision include image annotation [] and image category recognition [, 8]. Latent space approach for high-dimensional data has been applied to human motion tracking problems in the past. Various dimensionality reduction techniques such as Principal Components Analysis (PCA), isometric feature mapping (Isomap), Local linear (LLE) and spectral embedding have been successfully used in human tracking [6, 3, 9]. Recently, the GPLVM that produces a continuous mapping between the latent space and the high dimensional data in a probabilistic manner [8] was used for human motion tracking. Tian et al. [] use a GPLVM to estimate the D upper body pose from the D silhouette features. Urtasun et al. [] exploit the SGPLVM for 3D people tracking. Wang et al. [5] introduced Gaussian Process Dynamic Model (GPDM) that utilizes the dynamic priors for embedding and the GPDM is effectively used for 3D human motion tracking [3]. In [] marginal AR prior for GPLVM embedding is proposed and utilized for 3D human pose estimation from the synthetic and real image sequences. Lawrence and Moore [9] propose the extension of GPLVM using a hierarchical model in which the conditional independency between human body parts is exploited with low dimensional non-linear manifolds. However, these approaches utilize only the pose in latent space estimation and as a consequence, the optimized latent space cannot guarantee the proper dependency between the poses and the image observations in regression setting. Shon et al. [5] propose a shared latent structure model that utilizes the latent space that links corresponding pairs of observations from the multiple different spaces and apply it for image synthesis and robotic imitation of human actions. Although their model also utilizes GPLVM as the embedding model, their applications are limited to nonsequential cases and the linkage between two observations is explicit(e.g.image-image or pose-pose). The shared latent structure model using GPLVM is utilized for pose estimation in [3]. This work focuses on the semi-supervised regression learning and makes use of unlabeled data (only pose or image) to regularize the regression model. In contrast, our work, using a statistical foundation of PLSA, focuses on the computational advantages of the shared latent space. In addition, it explicitly considers the latent dynamics and the multi-view setting ignored in [3]. 3. Dynamic PLSA with GPLVM The starting point of our framework design is the symmetric parameterization of Probabilistic Latent Semantic Analysis [5]. In this setting the co-occurrence data y Y and z Z are associated via an unobserved latent variable x X: P (y, z) = x X P (x)p (y x)p (z x). () With a conditional independence assumption, the joint probability over data can be easily computed by marginalizing over the latent variable. We extend the idea to the case in which the two sets of objects, Y and Z are sequences and the latent variable x t is only associated with the dyadic pair (y t, z t ) at time t. And we solve the dual problem by marginalizing the parameters in the conditional probability models instead of marginalization of Z. Consider the sequence of length T of M-dimensional vectors, Y = [y y...y T ], where y t is a human pose (e.g.joint angles) at time t. The corresponding sequence Z = [z z...z T ] represent the sequence of N-dimensional image features observed for the given poses. The key idea of our Dynamic Probabilistic Latent Semantic Analysis (DPLSA) model is that the correlation between the pose Y and the image feature Z can be modeled using a latent-variable model where two mappings between the latent variable X and Y and between X and Z are defined using a Gaussian Process latent variable model of [8]. In other words, X can be regarded as the intrinsic subspace that Y and Z jointly share. The graphical representation of DPLSA for human motion modeling is depicted in Fig.. We assume that sequence X R D T of length T is generated by possibly nonlinear dynamics modeled as a known

3 y x y x y 3... x 3 z z z y T x T z T Figure. Graphical model of DPLSA. mapping φ parameterized by parameter γ x [5, 3] such as x t = A φ t (x t γ x,t )+A φ t (x t γ x,t )+...+w t. () Then the st order nonlinear dynamics is characterized by the kernel matrix K xx = φ(x γ x )φ(x γ x ) T + α I. (3) The model can further be generalized to higher order dynamics. The mapping from X to Y is a generative model defined using a GPLVM [8]. We assume that the relationship between the latent variable and the pose is nonlinear with additive noise, v t a zero-mean Gaussian noise with covariance βy I: y t = Cf(x t γ y ) + v t. () C represents a linear mapping matrix and f( ) is a nonlinear mapping function with a hyperparameter γ y. By choosing the simple prior of a unit covariance, zero mean Gaussian distribution on the element c ij in C and x t, marginalization of C results in a mapping: { P (Y X, β y ) K yx M/ exp } tr{k yx Y Y T } where (5) K yx (X, X) = f(x γ y )f(x γ y ) T + β y I. (6) Similarly, the mapping from the latent variable X into the image feature Z can be defined by z t = Dg(x t γ z ) + u t. (7) The marginal distribution of this mapping becomes { P (Z X, β z ) K zx N/ exp } tr{k zx ZZ T } (8) Notice that the kernel functions and parameters are different in the two mappings from the common latent variable sequence X to Y and to Z. The joint distribution of all co-occurrence data and all intrinsic sequence in a Y Z X space is finally modeled as P (X, Y, Z θ x, θ y, θ z ) = P (X θ x )P (Y X, θ y )P (Z X, θ z ) () where θ y {β y, γ y } and θ z {β z, γ z } represent the sets of hyperparameters in the two mapping functions from X to Y and from X to Z. θ x represents a set of hyperparameters in the dynamic model (e.g.α for a linear model and α, γ x for a nonlinear model). 3.. Human Motion Modeling Using Dynamic PLSA In human motion modeling, one s goal is to recover two important aspects of human motion from image feature: () 3D posture of the human figure in each image and () an intrinsic representation of the motion. Given a sequence of image features Z, the joint conditional model of the pose sequence Y and the corresponding embedded sequence X can be expressed as P (X, Y Z, θ x, θ y, θ z ) P (X θ x )P (Y X, θ y )P (Z X, θ z ). () Notice that the two mapping processes P (Y X) and P (Z X) have different noise models which can account for different factors (e.g.motion capture noise for the pose and camera noise for the image) that influence one but not the other process. 3.. Learning Human motion model is parameterized using a set of hyperparameters θ x, θ y and θ z, and the choice of kernel functions, K yx and K zx. Given both the sequence of poses and the corresponding image features, the learning task is to infer the subspace sequence X in the marginal dynamics space and the hyperparameters. Using the Bayes rule and () the joint likelihood is in the form P (X, Y, Z, θ x, θ y, θ z ) = P (X θ x )P (Y X, θ y ) P (Z X, θ z )P (θ x )P (θ y )P (θ z ). () where K zx (X, X) = g(x γ z )g(x γ z ) T + β z I. (9) To mitigate the overfitting problem, we utilize priors over the hyperparameters [5, 3, 5] such as P (θ x ) α (or α γx ), P (θ y ) βy γy and P (θ z ) βz γy. 3

4 The task of estimating the mode X and the hyperparameters, {θ x, θ y, θ z} can then be formulated as the ML/MAP estimation problem {X, θ x, θ y, θ z} = arg max X,θ x,θ y,θ z {log P (X θ x ) + log P (Y X, θ y ) + log P (Z X, θ z )} (3) which can be achieved using generalized gradient optimization such as CG, SCG or BFG. The task s nonconvex objective can give rise to point-based estimates of the posterior P (X Y, Z) that can be obtained by starting the optimization process from different initial points Inference and Tracking Having learned the DPLSA on training data Y and Z, the motion model can be used effectively in inference and tracking. Because we have two conditionally independent GPs, estimating current pose (distribution) y t and estimating current point x t in the embedded space can be decoupled. Given image features z t in frame t the optimal point estimate x t is the result of the following nonlinear optimization x t = arg max x t P (x t x t, θ x )P (z t x t, θ z ). () Due to GP nature of dependencies, the second term assumes conditional Gaussian form, however its dependency on x t is nonlinear [8] even with linear motion models in x. As a result, the tracking posterior P (x t z t, z t,...) may become highly multimodal. We utilize a particle-based tracker for our final pose estimation during tracking. However, because the search space is the low dimensional embedding space, only a small number of particles (<, empirical result) is sufficient for tracking allowing us to effectively avoid the computational problems associated with sampling in high dimensional spaces. A sketch of this procedure using particle filter based on the sequential importance sampling algorithm with N P particles and weights (w (i), i =,..., N P ) is shown below. Input : Image z t, Human motion model e.g.() and prior point estimates (w (i) t, x(i) t, y(i) t ) Z..t, i =,..., N P. Output: Current intrinsic state estimates (w (i) t, x (i) t ) Z..t, i =,..., N P ) Draw the initial estimates x (i) t p(x t x (i) t, θ x). ) Find optimal estimates x (i) t using nonlinear optimization in (). 3) Find point weights w (i) t P (x (i) t x (i) t, θ x)p (z (i) t x (i) t, θ z ). Algorithm : Particle filter in human motion tracking. Finally, because the mapping from X to Y is a GP function, we can easily compute the distribution of poses y t for each particle x (i) t by using the well known result from GP theory: P (y t x (i) t ) N (µ (i), σ (i) I). µ (i) = µ Y + Y T K yx (X, X) K yx (X, x (i) t ) (5) σ (i) = K yx (x (i) t, x (i) t ) K yx (X, x (i) t ) T K yx (X, X) K yx (X, x (i) t ) (6) where µ Y is the mean of training set. The distribution of poses at time t is thus approximated by a Gaussian mixture model. The mode of this distribution can be selected as the final pose estimate.. Mixture Models for Unknown View The image feature for the specific pose can vary according to a camera view point and orientation of the person with respect to the imaging plane. In a dynamic PLSA framework, the view point factor R can be easily combined into the generative model P (Z X) that represents the image formation process. P (X, Y, Z, R θ x ) = P (X θ x )P (Y X)P (Z X, R)P (R). (7) While the continuous representation of R is possible, learning such a representation from a finite set of view samples may be infeasible in practice. As an alternative, we use a quantized set of view points and suggest a mixture model, P (Z X, β z, γ r ) = S P (Z X, R = r, βz, r γz)p r (R = r) r= (8) where S denotes the number of views. Note that all the kernel parameters (β r z, γ r z) can be potentially different for different r... Learning Collecting enough training data for a large set of view points can be a tedious task. Instead, by using realistic synthetic data generated by 3D rendering software which allows us to simulate the realistic humanoid model and render textured images from a desired point of view, one can build a large training set for multi-view human motion model. In this setting one can simultaneously use all views to jointly estimate a complete set of DPLSA parameters as well as the latent space X. Given the pairs of the pose and the corresponding image feature with view point, learning the com-

5 plete mixture models reduces to joint optimization of P (X, Y, Z, Z,..., Z S, R,..., R S ) = P (X)P (Y X) P (Z s X, R s = s)p (R s = s). (9) s where S is the number of quantized views. The optimization of X and model parameters is a straightforward generalization of the method described in Sec Inference and Tracking The presence of an unknown viewing direction during tracking necessitates its estimation in addition to that of the latent state x t. This joint estimation of x t and R can be accomplished by directly extending the particle tracking method of Sec This approach is reminiscent of [3] in that it maintains the multiple view-based manifolds representing the various mappings caused from different view points. 5. Experiments 5.. Synthetic Data In our first experiment we demonstrate the advantage of our DPLSA framework on the subspace selection problem. We also compare the predictive ability of DPLSA when estimating the sequence Y from the observation Z. We generate a set of synthetic sequences using the following model: intrinsic motion X is generated with periodic functions in the R T space. The sequences are then mapped to a higher dimensional space of Y (in R T 7 ) through a mapping which is a linear combination of nonlinear features x, x, x, x. Z (in R T 3 ) is finally generated by mapping Y into a non-linear lower observation space in a similar manner. Examples of the three sequences are depicted in Fig.. This model is reminiscent of the Figure. A example of synthetic sequences. Left: X in the intrinsic subspace. Middle: Y generated from X Right: Z generated from Y. See text for detail. generative process that may reasonably model the mapping from intrinsic human motion to image appearance/features. We apply three different motion modeling approaches to model the relationship between the intrinsic motion X, the 3D pose space Y and the image feature space Z. The first method (Model ) is the manifold mapping of [3] which learns the embedding space using LLE or Isomap based on the observation Z and optimizes the mapping between X and Y using Generalized RBF interpolation. The second approach (Model ) is the human motion modeling using Marginal Nonlinear Dynamic System (MNDS) [], a model that attempts to closely approximate the data generation process. Model 3 is our proposed DPLSA approach described in Sec. 3.. During the learning process, initial embedding is estimated using probabilistic PCA. Initial kernel hyperparameter values were typically set to, except for the dynamic models were the variances were initially assigned values of.. We evaluate predictive accuracy of models in inferring Y from Z. We generate 5 sequences using the procedure described above. We generate the testing Z by adding white noise to the training sequence and infer Y from this Z. Table shows individual mean square error (MSE) rates of predicting all 7 dimensions in Y. All values are normalized with respect to the total variance in the true Y. The results demonstrate that the DPLSA model outperforms both the LLE-based model as well as the MNDS. We attribute the somewhat surprising result when compared to Model to the sensitivity of this model to estimates of the initial parameters of Z Y mapping. This problem can be mitigated by careful manual selection of the initial parameters, a typically burdensome task. However, another crucial advantage of our DPLSA model over Model is the computational cost in inferring Y. For instance, the mean number of iterations of scaled CG optimization is 7.9 for Model 3 and 3.8 for Model. This advantage will be further exemplified in the next set of experiment. 5.. Synthetic Human Motion Data 5... Single view point. In a controlled study we conducted experiments using a database of motion capture data for a 59 d.o.f body model from the CMU Graphics Lab Motion Capture Database [6] and synthetic video sequences. We used 5 walking sequences from 3 different subjects and running sequences from different subjects. The models were trained and tested on different subjects to emphasize the robustness of the approach to changes in the motion style. Initial model values, prior to learning updates, were set in the manner described in Sec. 5.. We exclude 6 joint angles that exhibit very small variances but very noisy (e.g.clavicle and finger). The human figure images are rendered on Maya using the software generously provided by the authors of [, ]. Following this, we extract the silhouette images to compute the -dimensional Alt moment 5

6 Table. MSE rates of predicting Y from Z. Model ē y ē y ē y3 ē y ē y5 ē y6 ē y7 ēyi LLE+GRBF MNDS DPLSA image features as in []. Also 3D latent space is employed for all motion tracking experiments. Fig. 3 depicts the log precisions of P (Y X) and P (Z X) on the D projection of latent space learned from cycle of walking sequence. Note that the precisions around the embedded X are different in the two spaces even though the X is commonly shared by both GPLVM models. To evaluate our DPLSA average, requires /5 of the iterations to achieve the same level of accuracy as the competing model. This can be explained by the presence of complex direct interaction between high dimensional features and pose states in the competing model. In DPLSA such interactions summarized via the low dimensional subspace. As a result, DPLSA representation can lead to potentially more suitable algorithm for real-time tracking MNDS Tracking DPLSA Tracking 8 MNDS Tracking DPLSA Tracking Mean Error.5 SCG Iterations Figure 3. Latent spaces with the grayscale map of log precisions. Left: P (Y X). Right: P (Z X). model in human motion tracking, we again compare it to the MNDS model in [] that utilizes the direct mapping between the poses and the image features. The models are learned from one specific motion sequence and tested on different sequences. Fig. shows the mean error in the 3D joint position estimation and the number of iterations in SCG per frame during the tracking. We use the error metric similar to the one in [7]. The error between estimated pose Ŷ and the ground truth pose Y from motion capture is E(Y, Ŷ ) = J j= ŷ j y j /J where y j is the 3D location of specific joint and J is the number of joints considered in the error matric. We choose 9 joints which have a wide motion range (e.g.throx and right & left wrist, humerus, femur and tibia). The height of human figure in this virtual space is 8 and the error unit can be relatively computed (e.g.when the height of man is 75cm, the error unit is 75/8 6.5cm). When the model was learned from walking sequence and tested on other sequences, the average error was.9 for MNDS tracking and.85 for DPLSA tracking. Results of full pose estimation for one running sequence are depicted in Fig. 5. The models achieve similarly a good accuracy in pose estimation. The distinct difference between the two models, however, is exhibited in the computational complexity of the learning and inference stages as shown in Fig.. The DPLSA model, on Frame Frame Figure. Tracking performance comparison. Left: pose estimation accuracy. Right: mean number of iterations of SCG. Figure 5. Input silhouettes and 3D reconstructions from a known viewpoint of π. First row: true poses. Second rows: silhouette images. Third row: estimated poses Multiple view points. We used one person sequence to learn the mixture models of the 8 different views (view angles = π i, i =,,..., 8 in clockwise direction, for frontal view) and human motion model. We made testing sequences by picking different motion capture sequences and rendering the images from 8 view points. Fig. 6 shows the one example of 3D tracking results from different view points. In the experiment the view point of input images are unknown and inferred frame by frame during 6

7 tracking with pose estimation. Although the pose estimation for some ambiguous silhouette is erroneous, our system can track the pose until the end of sequence with the proper view point estimation and 3D reconstructions are matched well to the true poses. Notice that the last two rows depict poses viewed from 3π/, i.e.the subject walking in the direction of the top left corner Real Video Sequence We applied our method to tracking of real monocular image sequences with fixed view point. We used the sideview sequences from CMU Mobo database [7]. DPLSA model was trained on walking sequences from the Mocap data and tested on the motion sequences from the Mobo set. Fig. 7 shows two example sequences of our tracking result. The lengths of the testing sequence are 3 and 3 frames. Although the frame rates and the walking style are different and there exists noise in the silhouette images, the reconstructed pose sequence depicts a plausible walking motion that agrees with the observed images. 6. Conclusions We have reformulated the shared latent space approach for learning models from sequential dyadic data. The reformulated generative statistical model is called the Dynamic Probabilistic Latent Semantic Analysis (DPLSA), which extends the successful PLSA formalism to a continuous state estimation problem of mapping sequences of human figure appearances in images to estimates of the 3D figure pose. Our preliminary results indicate that the DPLSA formalism can result in highly accurate trackers that exhibit fractional computational cost of the traditional subspace tracking methods. Moreover, the method is easily amenable to extensions to unknown or multi-view camera tracking tasks. The proposed model has the potential to handle complex classes of tracking problems, such as the challenging rapidly changing motions, when coupled with multiple model frameworks such as the switching dynamic models. Our future work will address these new directions as well as focus on continuing extensive evaluation of DPLSA on additional motion datasets. References [] J. Bellegarda. Exploiting both local and global constraints for multi-spanstatistical languagemodeling. In ICASSP, volume, pages , 998. [] S. C. Deerwester, S. T. Dumais, T. K. Landauer, and G. W. F. andr. A. Harshman. Indexing by latent semantic analysis. JASIST, (6):39 7, 99. [3] A. Elgammal and C.-S. Lee. Inferring 3d body pose from silhouettes using activity manifold learning. In CVPR, volume, pages ,. [] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories from google s image search. In ICCV, pages 86 83, 5. [5] T. Hofmann. Probabilistic latent semantic analysis. In Proc. of Uncertainty in A.I,, Stockhol, 999. [6] [7] [8] N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensionaldata. In NIPS.. [9] N. D. Lawrence and A. J. Moore. Hierarchical gaussian process latent variable models. In ICML, pages ACM, 7. [] C.-S. Lee and A. Elgammal. Simultaneous inference of view and body pose using torus manifolds. In ICPR, 6. [] F. Monay and D. Gatica-Perez. Plsa-based image autoannotation: constraining the latent space. In ACM Inter. Conf. on Multimedia, pages 38 35,. [] K. Moon and V. Pavlovic. Impact of dynamics on subspace embedding and tracking of sequences. In CVPR, pages 98 5, June 6. [3] R. Navaratnam, A. Fitzgibbon, and R. Cipolla. Semisupervised joint manifold learning for multi-valued regression. In ICCV, 7. [] P. W. Peter and S. T. Dumais. Personalized information delivery: an analysis of information filteringmethods. Commun. ACM, 35:5 6, December 99. [5] A. Shon, K. Grochow, A. Hertzmann, and R. Rao. Learning shared latent structure for image synthesis. NIPS, 5. [6] H. Sidenbladh, M. J. Black, and D. J. Fleet. Stochastic tracking of 3d human figures using d image motion. In ECCV, pages 7 78,. [7] L. Sigal, S. Bhatia, S. Roth, M. J. Black, and M. Isard. Tracking loose-limbed people. In CVPR, pages 8,. [8] J. Sivic, B. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering object categories in image collections. In ICCV, volume, pages , 5. [9] C. Sminchisescu and A. Jepson. Generative modeling for continuous non-linearly embedded visual inference. In ICML,. [] C. Sminchisescu, A. Kanaujia, Z. Li, and D. Metaxas. Discriminative density propagation for 3d human motion estimation. In CVPR, pages , 5. [] C. Sminchisescu, A. Kanujia, Z. Li, and D. Metaxas. Conditional visual tracking in kernel space. In NIPS, 5. [] T.-P. Tian, R. Li, and S. Sclaroff. Articulated pose estimation in a learned smooth space of feasible solutions. In CVPR, 5. [3] R. Urtasun, D. J. Fleet, and P. Fua. 3d people tracking with gaussian process dynamical models. In CVPR, pages 38 5, 6. [] R. Urtasun, D. J. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In ICCV, 5. [5] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian process dynamical models. In NIPS. 5. 7

8 Figure 6. Input images with unknown view point and 3D reconstructions using DPLSA tracking. First row: true pose. Second and third rows: π view angle. Fourth and fifth rows: 3π view angle. Figure 7. First: Input real walking images of subject. Second row: Image silhouettes. Third row: Images of the reconstructed 3D poses. Fourth row: Input real walking images of subject 5. Fifth row: Images of the reconstructed 3D poses. 8

Graphical Models for Human Motion Modeling

Graphical Models for Human Motion Modeling Graphical Models for Human Motion Modeling Kooksang Moon and Vladimir Pavlović Rutgers University, Department of Computer Science Piscataway, NJ 8854, USA Abstract. The human figure exhibits complex and

More information

3D Human Motion Analysis and Manifolds

3D Human Motion Analysis and Manifolds D E P A R T M E N T O F C O M P U T E R S C I E N C E U N I V E R S I T Y O F C O P E N H A G E N 3D Human Motion Analysis and Manifolds Kim Steenstrup Pedersen DIKU Image group and E-Science center Motivation

More information

Does Dimensionality Reduction Improve the Quality of Motion Interpolation?

Does Dimensionality Reduction Improve the Quality of Motion Interpolation? Does Dimensionality Reduction Improve the Quality of Motion Interpolation? Sebastian Bitzer, Stefan Klanke and Sethu Vijayakumar School of Informatics - University of Edinburgh Informatics Forum, 10 Crichton

More information

The Role of Manifold Learning in Human Motion Analysis

The Role of Manifold Learning in Human Motion Analysis The Role of Manifold Learning in Human Motion Analysis Ahmed Elgammal and Chan Su Lee Department of Computer Science, Rutgers University, Piscataway, NJ, USA {elgammal,chansu}@cs.rutgers.edu Abstract.

More information

Conditional Visual Tracking in Kernel Space

Conditional Visual Tracking in Kernel Space Conditional Visual Tracking in Kernel Space Cristian Sminchisescu 1,2,3 Atul Kanujia 3 Zhiguo Li 3 Dimitris Metaxas 3 1 TTI-C, 1497 East 50th Street, Chicago, IL, 60637, USA 2 University of Toronto, Department

More information

Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions

Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions Norimichi Ukita 1 and Takeo Kanade Graduate School of Information Science, Nara Institute of Science and Technology The

More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

Priors for People Tracking from Small Training Sets Λ

Priors for People Tracking from Small Training Sets Λ Priors for People Tracking from Small Training Sets Λ Raquel Urtasun CVLab EPFL, Lausanne Switzerland David J. Fleet Computer Science Dept. University of Toronto Canada Aaron Hertzmann Computer Science

More information

Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France

Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France Monocular Human Motion Capture with a Mixture of Regressors Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France IEEE Workshop on Vision for Human-Computer Interaction, 21 June 2005 Visual

More information

Coupled Visual and Kinematic Manifold Models for Tracking

Coupled Visual and Kinematic Manifold Models for Tracking Int J Comput Vis (2010) 87: 118 139 DOI 10.1007/s11263-009-0266-5 Coupled Visual and Kinematic Manifold Models for Tracking C.-S. Lee A. Elgammal Received: 13 January 2008 / Accepted: 29 June 2009 / Published

More information

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences Presentation of the thesis work of: Hedvig Sidenbladh, KTH Thesis opponent: Prof. Bill Freeman, MIT Thesis supervisors

More information

Tracking People on a Torus

Tracking People on a Torus IEEE TRANSACTION ON PATTERN RECOGNITION AND MACHINE INTELLIGENCE (UNDER REVIEW) Tracking People on a Torus Ahmed Elgammal Member, IEEE, and Chan-Su Lee, Student Member, IEEE, Department of Computer Science,

More information

Gaussian Process Dynamical Models

Gaussian Process Dynamical Models Gaussian Process Dynamical Models Jack M. Wang, David J. Fleet, Aaron Hertzmann Department of Computer Science University of Toronto, Toronto, ON M5S 3G4 jmwang,hertzman@dgp.toronto.edu, fleet@cs.toronto.edu

More information

Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter

Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter Yafeng Yin, Hong Man, Jing Wang, Guang Yang ECE Department, Stevens Institute of Technology Hoboken,

More information

Human Upper Body Pose Estimation in Static Images

Human Upper Body Pose Estimation in Static Images 1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project

More information

Gaussian Process Dynamical Models

Gaussian Process Dynamical Models DRAFT Final version to appear in NIPS 18. Gaussian Process Dynamical Models Jack M. Wang, David J. Fleet, Aaron Hertzmann Department of Computer Science University of Toronto, Toronto, ON M5S 3G4 jmwang,hertzman

More information

People Tracking with the Laplacian Eigenmaps Latent Variable Model

People Tracking with the Laplacian Eigenmaps Latent Variable Model People Tracking with the Laplacian Eigenmaps Latent Variable Model Zhengdong Lu CSEE, OGI, OHSU zhengdon@csee.ogi.edu Miguel Á. Carreira-Perpiñán EECS, UC Merced http://eecs.ucmerced.edu Cristian Sminchisescu

More information

Hierarchical Gaussian Process Latent Variable Models

Hierarchical Gaussian Process Latent Variable Models Neil D. Lawrence neill@cs.man.ac.uk School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, U.K. Andrew J. Moore A.Moore@dcs.shef.ac.uk Dept of Computer

More information

Inferring 3D from 2D

Inferring 3D from 2D Inferring 3D from 2D History Monocular vs. multi-view analysis Difficulties structure of the solution and ambiguities static and dynamic ambiguities Modeling frameworks for inference and learning top-down

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Learning Shape Priors for Single View Reconstruction

Learning Shape Priors for Single View Reconstruction Learning Shape Priors for Single View Reconstruction Yu Chen and Roberto Cipolla Department of Engineering, University of Cambridge {yc301 and rc10001}@cam.ac.uk Abstract In this paper, we aim to reconstruct

More information

Behaviour based particle filtering for human articulated motion tracking

Behaviour based particle filtering for human articulated motion tracking Loughborough University Institutional Repository Behaviour based particle filtering for human articulated motion tracking This item was submitted to Loughborough University's Institutional Repository by

More information

Spatial Latent Dirichlet Allocation

Spatial Latent Dirichlet Allocation Spatial Latent Dirichlet Allocation Xiaogang Wang and Eric Grimson Computer Science and Computer Science and Artificial Intelligence Lab Massachusetts Tnstitute of Technology, Cambridge, MA, 02139, USA

More information

Homeomorphic Manifold Analysis (HMA): Generalized Separation of Style and Content on Manifolds

Homeomorphic Manifold Analysis (HMA): Generalized Separation of Style and Content on Manifolds Homeomorphic Manifold Analysis (HMA): Generalized Separation of Style and Content on Manifolds Ahmed Elgammal a,,, Chan-Su Lee b a Department of Computer Science, Rutgers University, Frelinghuysen Rd,

More information

Visual Motion Analysis and Tracking Part II

Visual Motion Analysis and Tracking Part II Visual Motion Analysis and Tracking Part II David J Fleet and Allan D Jepson CIAR NCAP Summer School July 12-16, 16, 2005 Outline Optical Flow and Tracking: Optical flow estimation (robust, iterative refinement,

More information

Inferring 3D People from 2D Images

Inferring 3D People from 2D Images Inferring 3D People from 2D Images Department of Computer Science Brown University http://www.cs.brown.edu/~black Collaborators Hedvig Sidenbladh, Swedish Defense Research Inst. Leon Sigal, Brown University

More information

Learning GP-BayesFilters via Gaussian Process Latent Variable Models

Learning GP-BayesFilters via Gaussian Process Latent Variable Models Learning GP-BayesFilters via Gaussian Process Latent Variable Models Jonathan Ko Dieter Fox University of Washington, Department of Computer Science & Engineering, Seattle, WA Abstract GP-BayesFilters

More information

Predicting 3D People from 2D Pictures

Predicting 3D People from 2D Pictures Predicting 3D People from 2D Pictures Leonid Sigal Michael J. Black Department of Computer Science Brown University http://www.cs.brown.edu/people/ls/ CIAR Summer School August 15-20, 2006 Leonid Sigal

More information

Tracking Human Body Pose on a Learned Smooth Space

Tracking Human Body Pose on a Learned Smooth Space Boston U. Computer Science Tech. Report No. 2005-029, Aug. 2005. Tracking Human Body Pose on a Learned Smooth Space Tai-Peng Tian Rui Li Stan Sclaroff Computer Science Department Boston University Boston,

More information

Tracking Generic Human Motion via Fusion of Low- and High-Dimensional Approaches

Tracking Generic Human Motion via Fusion of Low- and High-Dimensional Approaches XU, CUI, ZHAO, ZHA: TRACKING GENERIC HUMAN MOTION VIA FUSION 1 Tracking Generic Human Motion via Fusion of Low- and High-Dimensional Approaches Yuandong Xu xuyd@cis.pku.edu.cn Jinshi Cui cjs@cis.pku.edu.cn

More information

Computer Vision II Lecture 14

Computer Vision II Lecture 14 Computer Vision II Lecture 14 Articulated Tracking I 08.07.2014 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Outline of This Lecture Single-Object Tracking Bayesian

More information

Modeling 3D Human Poses from Uncalibrated Monocular Images

Modeling 3D Human Poses from Uncalibrated Monocular Images Modeling 3D Human Poses from Uncalibrated Monocular Images Xiaolin K. Wei Texas A&M University xwei@cse.tamu.edu Jinxiang Chai Texas A&M University jchai@cse.tamu.edu Abstract This paper introduces an

More information

Shared Kernel Information Embedding for Discriminative Inference

Shared Kernel Information Embedding for Discriminative Inference Shared Kernel Information Embedding for Discriminative Inference Leonid Sigal Roland Memisevic David J. Fleet Department of Computer Science, University of Toronto {ls,roland,fleet}@cs.toronto.edu Abstract

More information

Factorized Orthogonal Latent Spaces

Factorized Orthogonal Latent Spaces Mathieu Salzmann Carl Henrik Ek Raquel Urtasun Trevor Darrell EECS & ICSI, UC Berkeley EECS & ICSI, UC Berkeley TTI Chicago EECS & ICSI, UC Berkeley Abstract Existing approaches to multi-view learning

More information

Inferring 3D Body Pose from Silhouettes using Activity Manifold Learning

Inferring 3D Body Pose from Silhouettes using Activity Manifold Learning CVPR 4 Inferring 3D Body Pose from Silhouettes using Activity Manifold Learning Ahmed Elgammal and Chan-Su Lee Department of Computer Science, Rutgers University, New Brunswick, NJ, USA {elgammal,chansu}@cs.rutgers.edu

More information

Style-based Inverse Kinematics

Style-based Inverse Kinematics Style-based Inverse Kinematics Keith Grochow, Steven L. Martin, Aaron Hertzmann, Zoran Popovic SIGGRAPH 04 Presentation by Peter Hess 1 Inverse Kinematics (1) Goal: Compute a human body pose from a set

More information

Human Context: Modeling human-human interactions for monocular 3D pose estimation

Human Context: Modeling human-human interactions for monocular 3D pose estimation Human Context: Modeling human-human interactions for monocular 3D pose estimation Mykhaylo Andriluka Leonid Sigal Max Planck Institute for Informatics, Saarbrücken, Germany Disney Research, Pittsburgh,

More information

Robotic Imitation from Human Motion Capture using Gaussian Processes

Robotic Imitation from Human Motion Capture using Gaussian Processes Robotic Imitation from Human Motion Capture using Gaussian Processes Aaron P. Shon Keith Grochow Rajesh P.N. Rao University of Washington Computer Science and Engineering Department Seattle, WA 98195 USA

More information

Learning Deformations of Human Arm Movement to Adapt to Environmental Constraints

Learning Deformations of Human Arm Movement to Adapt to Environmental Constraints Learning Deformations of Human Arm Movement to Adapt to Environmental Constraints Stephan Al-Zubi and Gerald Sommer Cognitive Systems, Christian Albrechts University, Kiel, Germany Abstract. We propose

More information

Ambiguity Modeling in Latent Spaces

Ambiguity Modeling in Latent Spaces Ambiguity Modeling in Latent Spaces Carl Henrik Ek 1, Jon Rihan 1, Philip H.S. Torr 1, Grégory Rogez 2, and Neil D. Lawrence 3 1 Oxford Brookes University, UK {cek,jon.rihan,philiptorr}@brookes.ac.uk 2

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

Using GPLVM for Inverse Kinematics on Non-cyclic Data

Using GPLVM for Inverse Kinematics on Non-cyclic Data 000 00 002 003 004 005 006 007 008 009 00 0 02 03 04 05 06 07 08 09 020 02 022 023 024 025 026 027 028 029 030 03 032 033 034 035 036 037 038 039 040 04 042 043 044 045 046 047 048 049 050 05 052 053 Using

More information

Motion Interpretation and Synthesis by ICA

Motion Interpretation and Synthesis by ICA Motion Interpretation and Synthesis by ICA Renqiang Min Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, ON M5S3G4, Canada Abstract. It is known that high-dimensional

More information

Human Motion Synthesis by Motion Manifold Learning and Motion Primitive Segmentation

Human Motion Synthesis by Motion Manifold Learning and Motion Primitive Segmentation Human Motion Synthesis by Motion Manifold Learning and Motion Primitive Segmentation Chan-Su Lee and Ahmed Elgammal Rutgers University, Piscataway, NJ, USA {chansu, elgammal}@cs.rutgers.edu Abstract. We

More information

Term Project Final Report for CPSC526 Statistical Models of Poses Using Inverse Kinematics

Term Project Final Report for CPSC526 Statistical Models of Poses Using Inverse Kinematics Term Project Final Report for CPSC526 Statistical Models of Poses Using Inverse Kinematics Department of Computer Science The University of British Columbia duanx@cs.ubc.ca, lili1987@cs.ubc.ca Abstract

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Modelling and Visualization of High Dimensional Data. Sample Examination Paper

Modelling and Visualization of High Dimensional Data. Sample Examination Paper Duration not specified UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Modelling and Visualization of High Dimensional Data Sample Examination Paper Examination date not specified Time: Examination

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Dimensionality Reduction and Generation of Human Motion

Dimensionality Reduction and Generation of Human Motion INT J COMPUT COMMUN, ISSN 1841-9836 8(6):869-877, December, 2013. Dimensionality Reduction and Generation of Human Motion S. Qu, L.D. Wu, Y.M. Wei, R.H. Yu Shi Qu* Air Force Early Warning Academy No.288,

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Visual Tracking of Human Body with Deforming Motion and Shape Average

Visual Tracking of Human Body with Deforming Motion and Shape Average Visual Tracking of Human Body with Deforming Motion and Shape Average Alessandro Bissacco UCLA Computer Science Los Angeles, CA 90095 bissacco@cs.ucla.edu UCLA CSD-TR # 020046 Abstract In this work we

More information

Semi-supervised Learning of Joint Density Models for Human Pose Estimation

Semi-supervised Learning of Joint Density Models for Human Pose Estimation Semi-supervised Learning of Joint Density Models for Human Pose Estimation Ramanan Navaratnam Andrew Fitzgibbon Roberto Cipolla University of Cambridge Department of Engineering Cambridge, CB2 PZ, UK http://mi.eng.cam.ac.uk/

More information

Expanding gait identification methods from straight to curved trajectories

Expanding gait identification methods from straight to curved trajectories Expanding gait identification methods from straight to curved trajectories Yumi Iwashita, Ryo Kurazume Kyushu University 744 Motooka Nishi-ku Fukuoka, Japan yumi@ieee.org Abstract Conventional methods

More information

Learning Alignments from Latent Space Structures

Learning Alignments from Latent Space Structures Learning Alignments from Latent Space Structures Ieva Kazlauskaite Department of Computer Science University of Bath, UK i.kazlauskaite@bath.ac.uk Carl Henrik Ek Faculty of Engineering University of Bristol,

More information

Estimation of Human Figure Motion Using Robust Tracking of Articulated Layers

Estimation of Human Figure Motion Using Robust Tracking of Articulated Layers Estimation of Human Figure Motion Using Robust Tracking of Articulated Layers Kooksang Moon and Vladimir Pavlović Department of Computer Science Rutgers University Piscataway, NJ 08854 Abstract We propose

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations

Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations 1 Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations T. Tangkuampien and D. Suter Institute for Vision Systems Engineering Monash University, Australia {therdsak.tangkuampien,d.suter}@eng.monash.edu.au

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

A Swarm Intelligence Based Searching Strategy for Articulated 3D Human Body Tracking

A Swarm Intelligence Based Searching Strategy for Articulated 3D Human Body Tracking A Swarm Intelligence Based Searching Strategy for Articulated 3D Human Body Tracking Xiaoqin Zhang 1,2, Weiming Hu 1, Xiangyang Wang 3, Yu Kong 4, Nianhua Xie 1, Hanzi Wang 5, Haibin Ling 6, Steve Maybank

More information

Robust Model-Free Tracking of Non-Rigid Shape. Abstract

Robust Model-Free Tracking of Non-Rigid Shape. Abstract Robust Model-Free Tracking of Non-Rigid Shape Lorenzo Torresani Stanford University ltorresa@cs.stanford.edu Christoph Bregler New York University chris.bregler@nyu.edu New York University CS TR2003-840

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

Tracking Articulated Bodies using Generalized Expectation Maximization

Tracking Articulated Bodies using Generalized Expectation Maximization Tracking Articulated Bodies using Generalized Expectation Maximization A. Fossati CVLab EPFL, Switzerland andrea.fossati@epfl.ch E. Arnaud Université Joseph Fourier INRIA Rhone-Alpes, France elise.arnaud@inrialpes.fr

More information

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10 BUNDLE ADJUSTMENT FOR MARKERLESS BODY TRACKING IN MONOCULAR VIDEO SEQUENCES Ali Shahrokni, Vincent Lepetit, Pascal Fua Computer Vision Lab, Swiss Federal Institute of Technology (EPFL) ali.shahrokni,vincent.lepetit,pascal.fua@epfl.ch

More information

Multi-Activity Tracking in LLE Body Pose Space

Multi-Activity Tracking in LLE Body Pose Space Multi-Activity Tracking in LLE Body Pose Space Tobias Jaeggli 1, Esther Koller-Meier 1, and Luc Van Gool 1,2 1 ETH Zurich, D-ITET/BIWI, CH-8092 Zurich 2 Katholieke Universiteit Leuven, ESAT/VISICS, B-3001

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

Human Tracking and Segmentation Supported by Silhouette-based Gait Recognition

Human Tracking and Segmentation Supported by Silhouette-based Gait Recognition 2008 IEEE International Conference on Robotics and Automation Pasadena, CA, USA, May 19-23, 2008 Human Tracking and Segmentation Supported by Silhouette-based Gait Recognition Junqiu Wang*, Yasushi Makihara*,

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

Text Modeling with the Trace Norm

Text Modeling with the Trace Norm Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to

More information

Particle Filtering. CS6240 Multimedia Analysis. Leow Wee Kheng. Department of Computer Science School of Computing National University of Singapore

Particle Filtering. CS6240 Multimedia Analysis. Leow Wee Kheng. Department of Computer Science School of Computing National University of Singapore Particle Filtering CS6240 Multimedia Analysis Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore (CS6240) Particle Filtering 1 / 28 Introduction Introduction

More information

Nonlinear Generative Models for Dynamic Shape and Dynamic Appearance

Nonlinear Generative Models for Dynamic Shape and Dynamic Appearance Nonlinear Generative Models for Dynamic Shape and Dynamic Appearance Ahmed Elgammal Department of Computer Science, Rutgers University, Piscataway, NJ, USA elgammal@cs.rutgers.edu Abstract Our objective

More information

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition

More information

Selecting Models from Videos for Appearance-Based Face Recognition

Selecting Models from Videos for Appearance-Based Face Recognition Selecting Models from Videos for Appearance-Based Face Recognition Abdenour Hadid and Matti Pietikäinen Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P.O.

More information

Face View Synthesis Across Large Angles

Face View Synthesis Across Large Angles Face View Synthesis Across Large Angles Jiang Ni and Henry Schneiderman Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 1513, USA Abstract. Pose variations, especially large out-of-plane

More information

Auxiliary Variational Information Maximization for Dimensionality Reduction

Auxiliary Variational Information Maximization for Dimensionality Reduction Auxiliary Variational Information Maximization for Dimensionality Reduction Felix Agakov 1 and David Barber 2 1 University of Edinburgh, 5 Forrest Hill, EH1 2QL Edinburgh, UK felixa@inf.ed.ac.uk, www.anc.ed.ac.uk

More information

Conditional Random Fields for Object Recognition

Conditional Random Fields for Object Recognition Conditional Random Fields for Object Recognition Ariadna Quattoni Michael Collins Trevor Darrell MIT Computer Science and Artificial Intelligence Laboratory Cambridge, MA 02139 {ariadna, mcollins, trevor}@csail.mit.edu

More information

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers 3D Human Pose Inference Difficulties Towards

More information

Stereo Matching for Calibrated Cameras without Correspondence

Stereo Matching for Calibrated Cameras without Correspondence Stereo Matching for Calibrated Cameras without Correspondence U. Helmke, K. Hüper, and L. Vences Department of Mathematics, University of Würzburg, 97074 Würzburg, Germany helmke@mathematik.uni-wuerzburg.de

More information

Cross-pose Facial Expression Recognition

Cross-pose Facial Expression Recognition Cross-pose Facial Expression Recognition Abstract In real world facial expression recognition (FER) applications, it is not practical for a user to enroll his/her facial expressions under different pose

More information

Learning a Manifold as an Atlas Supplementary Material

Learning a Manifold as an Atlas Supplementary Material Learning a Manifold as an Atlas Supplementary Material Nikolaos Pitelis Chris Russell School of EECS, Queen Mary, University of London [nikolaos.pitelis,chrisr,lourdes]@eecs.qmul.ac.uk Lourdes Agapito

More information

The Laplacian Eigenmaps Latent Variable Model

The Laplacian Eigenmaps Latent Variable Model The Laplacian Eigenmaps Latent Variable Model with applications to articulated pose tracking Miguel Á. Carreira-Perpiñán EECS, UC Merced http://faculty.ucmerced.edu/mcarreira-perpinan Articulated pose

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

Human Activity Tracking from Moving Camera Stereo Data

Human Activity Tracking from Moving Camera Stereo Data Human Activity Tracking from Moving Camera Stereo Data John Darby, Baihua Li and Nicholas Costen Department of Computing and Mathematics Manchester Metropolitan University, John Dalton Building Chester

More information

Scene Segmentation in Adverse Vision Conditions

Scene Segmentation in Adverse Vision Conditions Scene Segmentation in Adverse Vision Conditions Evgeny Levinkov Max Planck Institute for Informatics, Saarbrücken, Germany levinkov@mpi-inf.mpg.de Abstract. Semantic road labeling is a key component of

More information

Motion Estimation (II) Ce Liu Microsoft Research New England

Motion Estimation (II) Ce Liu Microsoft Research New England Motion Estimation (II) Ce Liu celiu@microsoft.com Microsoft Research New England Last time Motion perception Motion representation Parametric motion: Lucas-Kanade T I x du dv = I x I T x I y I x T I y

More information

Epitomic Analysis of Human Motion

Epitomic Analysis of Human Motion Epitomic Analysis of Human Motion Wooyoung Kim James M. Rehg Department of Computer Science Georgia Institute of Technology Atlanta, GA 30332 {wooyoung, rehg}@cc.gatech.edu Abstract Epitomic analysis is

More information

Recent Developments in Model-based Derivative-free Optimization

Recent Developments in Model-based Derivative-free Optimization Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:

More information

Clothed and Naked Human Shapes Estimation from a Single Image

Clothed and Naked Human Shapes Estimation from a Single Image Clothed and Naked Human Shapes Estimation from a Single Image Yu Guo, Xiaowu Chen, Bin Zhou, and Qinping Zhao State Key Laboratory of Virtual Reality Technology and Systems School of Computer Science and

More information

Monocular 3D human pose estimation with a semi-supervised graph-based method

Monocular 3D human pose estimation with a semi-supervised graph-based method Monocular 3D human pose estimation with a semi-supervised graph-based method Mahdieh Abbasi Computer Vision and Systems Laboratory Department o Electrical Engineering and Computer Engineering Université

More information

CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR COMMODITY DEPTH CAMERAS. Cha Zhang and Zhengyou Zhang

CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR COMMODITY DEPTH CAMERAS. Cha Zhang and Zhengyou Zhang CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR COMMODITY DEPTH CAMERAS Cha Zhang and Zhengyou Zhang Communication and Collaboration Systems Group, Microsoft Research {chazhang, zhang}@microsoft.com ABSTRACT

More information

A Factorization Method for Structure from Planar Motion

A Factorization Method for Structure from Planar Motion A Factorization Method for Structure from Planar Motion Jian Li and Rama Chellappa Center for Automation Research (CfAR) and Department of Electrical and Computer Engineering University of Maryland, College

More information

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will Lectures Recent advances in Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2010 Source: www.dynardo.de/en/library Recent advances in

More information

Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series

Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series Boston U. Computer Science Tech. Report No. 7-, Aug. 7. To appear in Proc. International Conference on Computer Vision, Oct. 7. Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional

More information

Object Tracking with an Adaptive Color-Based Particle Filter

Object Tracking with an Adaptive Color-Based Particle Filter Object Tracking with an Adaptive Color-Based Particle Filter Katja Nummiaro 1, Esther Koller-Meier 2, and Luc Van Gool 1,2 1 Katholieke Universiteit Leuven, ESAT/VISICS, Belgium {knummiar,vangool}@esat.kuleuven.ac.be

More information

A Dynamic Human Model using Hybrid 2D-3D Representations in Hierarchical PCA Space

A Dynamic Human Model using Hybrid 2D-3D Representations in Hierarchical PCA Space A Dynamic Human Model using Hybrid 2D-3D Representations in Hierarchical PCA Space Eng-Jon Ong and Shaogang Gong Department of Computer Science, Queen Mary and Westfield College, London E1 4NS, UK fongej

More information

Separating Style and Content on a Nonlinear Manifold

Separating Style and Content on a Nonlinear Manifold CVPR 4 Separating Style and Content on a Nonlinear Manifold Ahmed Elgammal and Chan-Su Lee Department of Computer Science, Rutgers University, New Brunswick, NJ, USA {elgammal,chansu}@cs.rutgers.edu Abstract

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information