Spectral Latent Variable Models for Perceptual Inference

Size: px
Start display at page:

Download "Spectral Latent Variable Models for Perceptual Inference"

Transcription

1 Technical Report, Department of Computer Science, Rutgers University, #DCS-TR-6, February 7 Spectral Latent Variable for Perceptual Inference Atul Kanaujia 1 Cristian Sminchisescu 2,3 Dimitris Metaxas 1 1 Department of Computer Science, Rutgers University, USA {kanaujia,dnm}@cs.rutgers.edu, kanaujia,dnm 2 Department of Computer Science, University of Toronto, Canada 3 TTI-Chicago, USA crismin@cs.toronto.edu, crismin May 22, 7 Abstract We propose low-dimensional non-linear latent variable models adapted for perceptual inference. We introduce a family of algorithms, referred to as Sparse Spectral Latent Variable (SLVM) that combine the advantages of spectral embeddings with the ones of parametric latent variable models like the Generative Topographic Mapping (): (1) they provide local-optima free embeddings that preserve intuitive properties, e.g. local or global geometry of the data, (2) offer low-dimensional generative models with probabilistic, bi-directional mappings between latent and ambient spaces, (3) are consistent and efficient to estimate. We show empirically that SLV versions of ISOMAP or LLE can be used successfully for complex visual inference tasks like the fully automatic reconstruction of 3d human poses in un-instrumented monocular video. 1 Introduction Many models in computer vision or machine learning are based on high-dimensional ambient observations (images, sound), considered to have been generated by non-linear, intrinsically low-dimensional processes. Many methods exist for computing such representations, but relatively few provide all the components necessary for inference tasks, e.g. tracking and reconstructing the three-dimensional pose of an object based on image observations. For such problems we need global representations and low-dimensional models with bidirectional mappings between latent and observation spaces. We also need to preserve global or local geometric properties of the data in latent space a model used for visual tracking based on ambient observations needs locality: small state changes in latent space should only produce commensurate changes in the observation space. If this doesn t hold the tracker may need to make realistically large and difficult to predict jumps in latent space in order to follow smooth trajectories in the observation space. We also need models that are probabilistic and can be learnt consistently, e.g. all their components estimated jointly, using ML. In this work we identify components that can provide consistent models with all the necessary properties: we marriage spectral embeddings and parametric latent variable models. This is a seemingly simple idea, but we know of no precursor. Spectral methods are attractive for their intuitive geometric properties and efficient global computation, but do not have a probabilistic extension, whereas regular grid-based generative methods like the Generative Topographic Mapping () are not geometry preserving, and encounter difficulties with latent spaces of dimension higher than 2-3, or when unfolding complex non-linear structure due to local optima. We propose a family of algorithms that combines the advantages of the two classes of methods: (1) they offer low-dimensional generative models with bi-directional, possible multimodal distributions (mappings) between latent and ambient spaces, (2) are probabilistically consistent and efficient to estimate. We refer to these probabilistic constructs defined on top of the irregular grid obtained from a spectral embedding as Sparse Spectral Latent Variable (SLVM). We show how SLV versions of ISOMAP, LLE of HE can 1

2 be used successfully for complex visual inference tasks like the fully automatic 3d reconstruction of human poses in non-instrumented monocular video. 1.1 Related Work Our research relates to work in spectral manifold learning, latent variable models and visual tracking. Spectral methods can model intuitive local or global geometric constraints [17, 21, 3, 7, 2] and their, local-optima free, polynomial time calculations are amenable to efficient optimization either sparse eigenvalue problems, or dense problems that can be solved with algebraic multigrid methods [4]. A variety of latent variable models exist, both linear (factor analysis, P) [16], and non-linear, e.g. mixture of factor analyzers or P [22], or grid-based methods like []. These methods are powerful but besides computational difficulties caused by multiple local optima, the models do not provide either global coordinate systems or latent spaces with flexible geometric structure. Recent work on visual tracking has recognized the need for global, low-dimensional, probabilistic models with intuitive geometric properties and latent-ambient mappings. More complex models have emerged by complementing with mappings the results of spectral methods. Elgammal & Lee [8] fit RBFs to the corresponding points of an LLE-embedding, but their model devoid of a latent space prior is not fully probabilistic and there are no mappings to the latent space. In independent work, Sminchisescu & Jepson [19] augment spectral embeddings (e.g. Laplacian Eigenmaps) with both latent space Gaussian mixture priors and sparse kernel regression mappings from latent to ambient space. Their model qualifies as a latent variable one, but ambient to latent mappings are computationally difficult to compute and the model is trained piece-wise not jointly. Urtasun et al [24] use an efficient Gaussian Process Latent Variable Model () of Lawrence [], but this optimizes a data reconstruction criterion, not based on the preservation of local or global geometric properties. The zero mean unit covariance prior in latent space makes the model somewhat agnostic for visual inference applications, where it is useful to penalize drifts from the manifold of admissible visual configurations. For improved stability, [24] use an augmented constrained (latent,ambient) state for tracking. We demonstrate the SLVMs for discriminative 3d human pose estimation, but here we predict a lowdimensional latent variable of the human pose directly from image sequences or human photographs. Existing research on visual inference using low-dimensional methods (e.g. 3d human pose tracking) has focused on generative tracking frameworks, based on either complex appearance (observation) models [19] or simpler models based on image tracks of the human body, assumed known, and assigned to the corresponding joints of a 3d model [24]. In practice such correspondences are difficult to obtain, and existing methods had to be initialized by hand. Initialization becomes more difficult as the dimension of the latent space increases, even for known correspondences - ultimately the search complexity is exponential in the latent space dimension. The advantages of latent variable models with generative trackers are counterbalanced by the difficulty to make models based on alignment work when the representation capacity of the 3d body model is reduced. E.g., even for simple walking motions, the amplitude of a tested motion may be different from the one in the training set. The 3d model may not align well with the image features and may gradually loose track. This explains why existing generative frameworks either relay on quite large training sets [19], assume that long-range model-image correspondences are given, or assume clean image silhouette data exists [8]. We depart from alignment approaches, to construct (inverse) image-based predictors that regress the latent body pose directly against image evidence. This allows more robust tracking when the latent variable models used have reduced representation capacity. 2 Sparse Spectral Latent Variable (SLVM) We work with two sets of vectors, of equal size N (possibly in correspondence), in two spaces referred as latent and ambient. Both sets are un-ordered, hence there is no constraint to place either one on a regular grid (i.e. a matrix in 2d or the cells of cube in 3d, as is the case of the latent vectors in []). The latent space vectors are generically denoted x, with dim(x) = d, the ambient vectors are y with dim(y) = D, typically with D >> d. 2

3 Spectral Embeddings We assume that vector-valued points in ambient space Y are captured from a highdimensional process (images, sound, human motion capture systems), whereas latent space points X are obtained using a spectral, non-linear embedding method like ISOMAP, LLE, Hessian or Laplacian Eigenmaps, etc. This provides several advantages: (1) Preserves intuitive local or global geometric properties. These are important for visual tracking applications, which allow efficient, smooth estimates of the state of an object in latent space for stability, it is important that small changes in latent space lead to small, continuous changes in the ambient / observation space, (2) no local optima (3) provide corresponding points with similar geometric properties. The resulting irregular grid of embedded points is used in order to construct a joint probability distribution over latent and ambient variables. Sparse Generative Spectral Mappings For our models, we assume that ambient vectors are related to the latent ones using a nonlinear vector-valued function F(x, W, α) with parameters W and output noise covariance σ. In other words the joint distribution has a non-linear constraint between two blocks of its variables. 1 The model can be derived analogously with the Generative Topographic Map [] (hence the name of this section), except that it is defined on an irregular grid X in latent space, as computed from a spectral embedding: p(y x,w, α, σ) = G(y F(x,W, α), σ) (1) where G is a Gaussian function of variable y, with mean F and covariance σ. F is chosen here to be a generalized (non-linear parametric) regression model: F(x,W, α) = Wφ(x) (2) with φ(x) = [G(x x 1, δ),..., G(x x M, δ)], having Gaussian kernels (others can be used) placed at an M-sized subset of the training set (we will select this automatically using sparsity priors, notice the difference from where the location of the basis functions is selected apriori), with covariance δ, and W is a weight matrix of size DxM. The model can be made computationally efficient and more robust to overfitting by introducing hierarchical priors on the parameters W of the mapping F, in order to select a sparse subset of X for prediction [23, 14]: p(w α) = D j=1 k=1 N G(w jk, 1 α k ) (3) with: N p(α) = Gamma(α i a, b) (4) Gamma(α a, b) = Γ(a) 1 b a α a 1 e bα () where a = 2, b = 4 are chosen to give broad hyperpriors. The ambient space prior can be obtained by integrating in latent space: p(y W, α, σ) = p(y x, W, α, σ)p(x)dx (6) where p(x) is the latent space prior, which for tractability, can be chosen conveniently as a sum of delta functions [], here placed on an irregular grid in latent space: 1 This is one possible construction for the joint distribution p(x,y) = p(x)p(y x), arguably not the only possible one. For instance one can approximate it as a product of marginal distributions in latent and ambient space p(x,y) = p(x)p(y) = P N p(x,xi)p(y,yi), and rely on the constraints already present in the embedding to account for correlations. 3

4 This leads to the ambient prior: p(x) = 1 K δ(x x i ) (7) K p(y W, α, σ) = 1 K K p(y x i,w, α, σ) (8) The conditional distribution in latent space can be obtained using Bayes rule: p (i,n) = p(x i y n ) = p(y n x i )p(x i ) p(y n ) = p(y n x i ) N p(y n x i,w, α, σ) This allows computing either the conditional mean or the mode (better for multimodal distributions) in latent space: K < x y n,w, α, σ >= p(x y n,w, α, σ)xdx = p (i,n) x i () i max = argmax p (i,n) (11) i The model contains all the ingredients for efficient computation in both latent and ambient space: eq. (7) gives the prior in latent space, (8) provides the induced prior in ambient space, (1) provides the conditional distribution (or mapping) from latent to ambient space, and () and (11) give the mean or mode of the mapping from ambient to latent space. The model can be trained my maximizing the log-likelihood of the ambient data: (9) L = log N p(y n W, α, σ) = N log{ 1 K n=1 K p(y n x i,w, α, σ)} (12) Maximizing the likelihood provides estimates for W, α, σ,(consider σ is diagonal with values σ for simplicity) where: Σ = (σ 2 Φ GΦ + S) 1 (13) W = σ 2 ΣΦ RY (14) where S = diag(α 1,..., α K ) with α corresponding only to the active set, G = diag(g 1,..., G M ) with G i = N j=1 p (i,j), R is a KxN matrix with elements p (i,j), and Y is an NxD matrix that stores the output vectors y i, i = 1...N columnwise, and Φ is a KxM matrix with elements G(x i x j, δ). The variance is estimated as usual from prediction error []: σ 2 = 1 ND N n=1 k=1 K p (kn) W φ(x) y n 2 (1) where a superscript identifies an updated variable estimate. The hyperparameters are re-estimated using the relevance determination equations [14]: α i = where µ i is the i-th column of W. The algorithm is summarized in fig. 1. γ i µ i 2, γ i = 1 α i Σ ii (16) 4

5 SLVM Learning Algorithm Input: Set of high-dimensional, ambient points Y = {y i }...N. Output: Sparse, Spectral Latent Variable Model (SLVM), with parameters (W, α, σ), defined on a irregular, fixed grid in latent space X = {x i }...N, that preserves local or global geometric properties of Y. Step 1. Compute spectral (non-linear) embedding of Y to obtain corresponding latent points X, using any standard embedding method like ISOMAP, LLE, HE, LE. Step 2. Compute inital estimates (W,α,σ ), using sparse kernel regression [23, 11] with training points (x i,y i ) N. Step 3. EM Algorithm to learn Sparse Spectral Generative Mappings on an irregular latent space grid. E-step Compute posterior probabilities p (i,n) for the assignment of latent to ambient space points using (9). M-step: Solve weighted non-linear regression problem to update (W, α, σ) acording to (13), (1), and (16). This uses Laplace approximation for the hyperparameters and analytical integration for the weights, and optimization with greedy weight subset selection [1, 14, 23, 11]. Figure 1: 3 Image-based Direct 3D Pose Prediction We estimate a distribution over solutions in latent space, given input descriptors derived from images obtained by cropping the person using a bounding-box. We then map from latent states to 3d human joint angles (using e.g. (1)) in order to recover body configurations for visualization or error reporting. To predict latent space distributions from image features, we use a conditional mixture of expert predictors. Each one is paired with an observation dependent gate function that scores its competence when presented with different images. As this change, different experts may be active and their rankings (relative probabilities) may change. The model is: p(x r) = M g i (r)p i (x r) (17) g i (r) = exp(λ i r) k exp(λ k r) (18) p i (x r) = G(x E i r,ω i ) (19) with r image descriptors, x state outputs, g i input dependent gates, computed using linear regressors c.f. (18), with weights λ i. g are normalized to sum to 1 for any given input r and p i are Gaussian functions (19)

6 with covariance Ω i, centered at the expert predictions, sparse linear regressors with weights E i. The model is trained using a double-loop EM algorithms [9]. 4 Experiments In order to illustrate the SLVM algorithms, we run experiments on a simple Swiss roll toy data problem and on a real world computer vision application: the reconstruction of 3d human poses from monocular images of people photographed against complex, non-stationary backgrounds. 4.1 Swiss Roll This set of experiments are illustrated in fig. 2. The original data set (top-row, fig.a) consists of points sampled regularly on the surface and (in most plots) color coded to show their relative spatial positions. In fig.b), we show reconstruction using []. The reconstruction is not entirely inaccurate and scrambles the geometric ordering when traversing the S sheet. uses a regular grid and points subsampled on this grid c.f. []. Fig.c), top row, shows accurate reconstructions from our sparse SLV-ISOMAP model, which is also computationally efficient - the 16 (out of ) latent space centers selected by the model are shown in fig.d). The bottom row of fig. 2 illustrates the computation of conditional distributions and the induced prior in latent space for SLV-ISOMAP. In fig.a) we show a dataset together with an out-of-sample ambient point (at one extremity) for which the conditional latent space distribution is computed. Fig. b) shows the intuitive bi-modal distribution in latent space, computed c.f. (9). Figs.c) and d) on the bottom row of fig. 2 show the SLV-ISOMAP prior induced in ambient space, computed using (8), for two different levels of sparsity, % (regular sub-sampling by a factor of 2) and automatic, as computed by our model at 1.6%. Notice that unlike all the other plots shown in the figure, in the rightmost two on the bottom row, the color of the points represents probability, not geometric ordering. The prior distribution has the intuitive property to peak higher on the principal curve of the S-shape, away from the borders of the manifold (where there is less data, on average) d Human Pose Reconstruction In this section, we report quantitative comparisons and qualitative 3d reconstruction (joint angles) of human motion from video or photographs. Image Descriptor A difficulty for reliable discriminative pose prediction is the design of image descriptors that are distinctive enough to differentiate among different poses, yet invariant to within the same pose class deformations and spatial misalignments people in similar stances, but differently proportioned, or photographed on different backgrounds. Exiting methods have successfully demonstrated that bag of features or regular-grid based representations of local descriptors (e.g. bag of shape context features, block of SIFT features [2, ]) can be surprisingly effective at predicting 3d human poses, but the representations tend to be too inflexible for reconstruction in general scenes. It is more appropriate to view them as two useful extremes of a multilevel, hierarchical representation of images - a family of descriptors that progressively relaxes block-wise, rigid local spatial image encodings to increasingly weaker spatial models of position / geometry accumulated over increasingly larger image regions. We derive such an encoding called Multilevel Spatial Blocks (MSB) which computes a description of the image at multiple levels of details. The putative image bounding-box of a person is split using a regular grid of overlapping image blocks, and increasingly large (SIFT) descriptor cell sizes are extracted. SIFT [13] is a local image encoding that builds histograms of edge orientations in a particular image region. We concatenate SIFTs within each layer and across layers, orderly, in order to obtain encodings of an entire image or sub-window. For our problem we use MSBs of size 1344, obtained by decomposing the image into blocks at 3 different levels, with 16, 4, 1 SIFT block, 4x4 cells per block, 12x12 pixel cell size 8 image gradient orientations histogramed within each cell. 6

7 x Figure 2: Analysis of SLV-ISOMAP and on the Swiss roll. Top Row: (a) Original dataset of points, color coded to emphasize the geometric ordering; (b) Data reconstructed (sampled) from, with associations between datapoints color-coded scrambles the points and does not preserve their ambient geometric structure. The reconstruction is not accurate. (c) Reconstruction from our sparse SLV-ISOMAP model with associations color-coded; (d) Active (sparse) basis setwith 1.6%=16 of the datapoints shown in latent space, as automatically selected by SLV- ISOMAP. Bottom row: Illustration of computations of conditional distributions and induced prior in latent space by SLV-ISOMAP. (a) Dataset with ambient point for which conditional latent space distribution is computed. (b) The multimodal conditional distribution in latent space, computed c.f. (9). (c-d) Prior of SLV-ISOMAP in ambient space, computed using (8) for two different levels of sparsity, % (sub-sampling by a factor of 2) and automatic, 1.6%. Important: Please, notice that unlike all the other plots shown in the figure the color of the points represents probability, not geometric ordering!. Database and Multivalued Predictor For qualitative experiments we use images from a movie (Run Lola Run) and the INRIA pedestrian database [6]. For quantitative experiments we use our own database consisting of = 9741 quasi-real images, generated using a computer graphics human model that was animated using the Maya graphics package, and rendered on real image backgrounds (see fig. ). We have 3247 different 3d poses from the CMU motion capture database [1] and these are rendered to produce supervised (3d human joint angle, MSB image descriptor) pairs for different patterns of walks, either viewed frontally or from one side, dancing, conversation, bending and picking, running and pantomime (one of the 3 training sets of 3247 poses is placed on a clean background). We collect three test sets of poses for each of the five motion classes. The test motions are executed by different subjects, not in the training set. We also render one test set on a clean background to use as baseline. The other two test sets are progressively more complicated: one has the model randomly placed at different locations, but on the same images as in the training set, the other has the model placed on unseen backgrounds. In all cases, a 3x2 bounding box of the model and the background is obtained, possibly using rescaling. There is significant variability and lack of centering in this dataset because certain poses are vertically (and horizontally) more symmetric than others (e.g. compare a person who picks an object with one who is standing, or pointing an arm in one direction). We train multivalued predictors (conditional Bayesian mixture of sparse linear regressors) on each activity in the dataset (the latent variable models are also computed separately for each activity). Empirically, for the experts, we observe sparsity patterns in the range 1% 4%, with lower values usually associated with models that generalize better. We use sparse linear regressors, for robustness to image descriptor components affected by fluctuating background clutter c.f. 3. The error is always computed w.r.t. the most 7

8 Average Joint Angle Errors SideWalk SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors Running SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors BendPickup SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Pantomime Dancing Cumulative Plot Average Joint Angle Errors SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Figure 3: Quantitative results (prediction error, per joint angle, in degrees) for different motions (+ a cumulative plot) and 3 different imagining conditions, as follows: people on Clean backgrounds, people on Clutter1 backgrounds, used in the training set (but the test image has the person in a different position w.r.t. the background, in a relative position not seen in the training set) and people on Clutter2 backgrounds, not seen at all in the training set. We compare several methods, including our SLVM-(ISOMAP,LLE,HE), [], [] and, all with 2 latent space dimensions. We use Discriminative image-based predictors (multivalued mappings from images to latent space), based on conditional Bayesian mixtures of experts, sparse linear regressors, for robustness to image descriptor components affected by fluctuating background clutter c.f. 3. Both LVM models and corresponding predictors have been trained separately for each motion type. The error is always computed w.r.t. the most probable expert. Figure 4: Posterior plots showing the predicted ditribution p(x r) in latent space, for (a-c) and for one of our SLVM models (d-f) for a a walking motion viewed from a side. The topmost points in (a-c) are the training points and for illustration we link points corresponding to adjacent frames in the motion sequence. The ground truth is shown with a circle, the predicted posterior is color-coded. Half of the points on the grid have RBFs placed on top, regularly subsampled (not only top). Notice the loss of track, and the assignments of ambient points to multiple grid points (top). Figs (d-f) shows tracking with our SLV-ISOMAP model. The distribution is ocassionally multimodal and sometimes the most probable peak is not the true one. In general, however, the ground truth is tracked closely. probable expert. In fig. 3, we show quantitative comparisons (prediction error, per joint angle, in degrees) for different motions (+ a cumulative plot) and 3 different imagining conditions: Clean backgrounds, Clutter1 back- 8

9 Figure : Images from our training and testing database where a virtual, CG human impostor, is placed in real-world environments. grounds and Clutter2 backgrounds, not seen at all in the training set. We compare several methods, including our SLVM-(ISOMAP, LLE, HE), [], [] and, all with 2 latent space dimensions embedded from 6d ambient human joint angles (recall that in each case, both a separate LVM and an imagebased latent state predictor have been computed). For visualization and error reporting we use the conditional estimate of the ambient state (that uses function F) to map latent point estimates x to joint angles y. The final set of results we show in fig. 6 is based on real images from the movie Run Lola Run, where our model is seduced by Lola and runs after her, and INRIA pedestrian dataset [6]. These are all automatic 3d reconstructions of fast moving humans in non-instrumented, complex environments. We use a model trained with walking and running labeled poses (quasireal data of our model placed on real backgrounds, rendered from 8 different viewpoints) with an additional unlabeled (real) images of humans running and walking in cluttered scenes. As usual with discriminative predictive methods, the solutions are not entirely accurate in a classical alignment sense. However, we find that the 3d reconstructions have reasonable perceptual accuracy. Figure 6: Qualitative 3d reconstruction results obtained on images from the movie Run Lola Run (block of leftmost 4 images) and the INRIA pedestrian dataset (rightmost 4 images) [6]. (a) Top row shows the original images, (b) Bottom row shows automatic 3d reconstructions. Conclusions We have presented spectral latent variable models (SLVM) and showed their potential for visual inference applications. We have argued in support of low-dimensional models that: (a) are probabilistic; (b) preserve intuitive geometric properties of the ambient distribution, e.g. locality, as required for visual tracking applications; (c) provide mappings, or more generally multimodal conditional distributions between latent and ambient spaces, and (d) are consistent, efficient to learn and estimate and applicable with any embedding method like ISOMAP, LLE or LE. To allow for (a)-(d), we introduce models that combine the geometric and computational properties of spectral embeddings with the probabilistic formulation and the mappings 9

10 offered by LVMs like the. The resulting modes are simple, but we are not aware of a precursor with similar properties and consistency in the literature. We empirically show that (in conjunction with discriminative pose prediction methods and multilevel image encodings), SLVMs are useful for the fully automatic 3d reconstruction of complex human poses from non-instrumented monocular images. Future work We currently investigate alternative approximations to the latent space prior distribution as well as alternative constraints between latent and ambient points. We also plan to study the behavior of our algorithms for unfolding more complex structures, e.g. motion combinations and higher dimensional latent spaces, where the non-linear models are expected perform best. References [1] (3). CMU Human Motion Capture DataBase. Available online at [2] Agarwal, A., & Triggs, B. (6). A local basis representation for estimating human pose from cluttered images. ACCV. [3] Belkin, M., & Niyogi, P. (2). Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. NIPS. [4] Bengio, J., Paiement, J., & Vincent, P. (3). Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps and Spectral Clustering. NIPS. [] Bishop, C., Svensen, M., & Williams, C. K. I. (1998). Gtm: The generative topographic mapping. Neural Computation, [6] Dalal, N., & Triggs, B. (). Histograms of oriented gradients for human detection. CVPR. [7] Donoho, D., & Grimes, C. (3). Hessian Eigenmaps: Locally Linear Embedding Techniques for High-dimensional Data. Proc. Nat. Acad. Arts and Sciences. [8] Elgammal, A., & Lee, C. (4). Inferring 3d body pose from silhouettes using activity manifold learning. CVPR. [9] Jordan, M., & Jacobs, R. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, [] Lawrence, N. (). Probabilistic non-linear component analysis with gaussian process latent variable models. JMLR, [11] Lawrence, N., Seeger, M., & Herbrich, R. (3). Fast sparse Gaussian process methods: the informative vector machine. NIPS. [12] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. of IEEE. [13] Lowe, D. (4). Distinctive image features from scale-invariant keypoints. IJCV, 6. [14] Mackay, D. (1998). Comparison of Approximate Methods for Handling Hyperparameters. Neural Computation, 11. [1] Neal, R. (1996). Bayesian learning for neural networks. Springer-Verlag. [16] Roweis, S., & Ghahramani, Z. (1997). A unifying review of linear gaussian models. Neural Computation. [17] Roweis, S., & Saul, L. (). Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science.

11 [18] Schölkopf, B., Smola, A., & Müller, K. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation,, [19] Sminchisescu, C., & Jepson, A. (4). Generative Modeling for Continuous Non-Linearly Embedded Visual Inference. ICML (pp ). Banff. [] Sminchisescu, C., Kanaujia, A., Li, Z., & Metaxas, D. (). Discriminative Density Propagation for 3D Human Motion Estimation. CVPR (pp ). [21] Tenenbaum, J., Silva, V., & Langford, J. (). A Global Geometric Framewok for Nonlinear Dimensionality Reduction. Science. [22] Tipping, M. (1998). Mixtures of probabilistic principal component analysers. Neural Computation. [23] Tipping, M. (1). Sparse Bayesian learning and the Relevance Vector Machine. JMLR. [24] Urtasun, R., Fleet, D., Hertzmann, A., & Fua, P. (). Priors for people tracking in small training sets. ICCV. [2] Weinberger, K., & Saul, L. (4). Unsupervised Learning of Image Manifolds by Semidefinite Programming. CVPR. 11

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers 3D Human Pose Inference Difficulties Towards

More information

Conditional Visual Tracking in Kernel Space

Conditional Visual Tracking in Kernel Space Conditional Visual Tracking in Kernel Space Cristian Sminchisescu 1,2,3 Atul Kanujia 3 Zhiguo Li 3 Dimitris Metaxas 3 1 TTI-C, 1497 East 50th Street, Chicago, IL, 60637, USA 2 University of Toronto, Department

More information

3D Human Motion Analysis and Manifolds

3D Human Motion Analysis and Manifolds D E P A R T M E N T O F C O M P U T E R S C I E N C E U N I V E R S I T Y O F C O P E N H A G E N 3D Human Motion Analysis and Manifolds Kim Steenstrup Pedersen DIKU Image group and E-Science center Motivation

More information

Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations

Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations 1 Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations T. Tangkuampien and D. Suter Institute for Vision Systems Engineering Monash University, Australia {therdsak.tangkuampien,d.suter}@eng.monash.edu.au

More information

Data fusion and multi-cue data matching using diffusion maps

Data fusion and multi-cue data matching using diffusion maps Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by

More information

Robust Pose Estimation using the SwissRanger SR-3000 Camera

Robust Pose Estimation using the SwissRanger SR-3000 Camera Robust Pose Estimation using the SwissRanger SR- Camera Sigurjón Árni Guðmundsson, Rasmus Larsen and Bjarne K. Ersbøll Technical University of Denmark, Informatics and Mathematical Modelling. Building,

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Locality Preserving Projections (LPP) Abstract

Locality Preserving Projections (LPP) Abstract Locality Preserving Projections (LPP) Xiaofei He Partha Niyogi Computer Science Department Computer Science Department The University of Chicago The University of Chicago Chicago, IL 60615 Chicago, IL

More information

Learning a Manifold as an Atlas Supplementary Material

Learning a Manifold as an Atlas Supplementary Material Learning a Manifold as an Atlas Supplementary Material Nikolaos Pitelis Chris Russell School of EECS, Queen Mary, University of London [nikolaos.pitelis,chrisr,lourdes]@eecs.qmul.ac.uk Lourdes Agapito

More information

Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France

Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France Monocular Human Motion Capture with a Mixture of Regressors Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France IEEE Workshop on Vision for Human-Computer Interaction, 21 June 2005 Visual

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

People Tracking with the Laplacian Eigenmaps Latent Variable Model

People Tracking with the Laplacian Eigenmaps Latent Variable Model People Tracking with the Laplacian Eigenmaps Latent Variable Model Zhengdong Lu CSEE, OGI, OHSU zhengdon@csee.ogi.edu Miguel Á. Carreira-Perpiñán EECS, UC Merced http://eecs.ucmerced.edu Cristian Sminchisescu

More information

Priors for People Tracking from Small Training Sets Λ

Priors for People Tracking from Small Training Sets Λ Priors for People Tracking from Small Training Sets Λ Raquel Urtasun CVLab EPFL, Lausanne Switzerland David J. Fleet Computer Science Dept. University of Toronto Canada Aaron Hertzmann Computer Science

More information

Automatic Alignment of Local Representations

Automatic Alignment of Local Representations Automatic Alignment of Local Representations Yee Whye Teh and Sam Roweis Department of Computer Science, University of Toronto ywteh,roweis @cs.toronto.edu Abstract We present an automatic alignment procedure

More information

The Role of Manifold Learning in Human Motion Analysis

The Role of Manifold Learning in Human Motion Analysis The Role of Manifold Learning in Human Motion Analysis Ahmed Elgammal and Chan Su Lee Department of Computer Science, Rutgers University, Piscataway, NJ, USA {elgammal,chansu}@cs.rutgers.edu Abstract.

More information

Selecting Models from Videos for Appearance-Based Face Recognition

Selecting Models from Videos for Appearance-Based Face Recognition Selecting Models from Videos for Appearance-Based Face Recognition Abdenour Hadid and Matti Pietikäinen Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P.O.

More information

Locality Preserving Projections (LPP) Abstract

Locality Preserving Projections (LPP) Abstract Locality Preserving Projections (LPP) Xiaofei He Partha Niyogi Computer Science Department Computer Science Department The University of Chicago The University of Chicago Chicago, IL 60615 Chicago, IL

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

Locally Linear Landmarks for large-scale manifold learning

Locally Linear Landmarks for large-scale manifold learning Locally Linear Landmarks for large-scale manifold learning Max Vladymyrov and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu

More information

Generalized Principal Component Analysis CVPR 2007

Generalized Principal Component Analysis CVPR 2007 Generalized Principal Component Analysis Tutorial @ CVPR 2007 Yi Ma ECE Department University of Illinois Urbana Champaign René Vidal Center for Imaging Science Institute for Computational Medicine Johns

More information

Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference

Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference Cristian Sminchisescu TTI-C crismin@nagoya.uchicago.edu http://ttic.uchicago.edu/ crismin Atul Kanaujia and Dimitris Metaxas Rutgers

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

Learning Appearance Manifolds from Video

Learning Appearance Manifolds from Video Learning Appearance Manifolds from Video Ali Rahimi MIT CS and AI Lab, Cambridge, MA 9 ali@mit.edu Ben Recht MIT Media Lab, Cambridge, MA 9 brecht@media.mit.edu Trevor Darrell MIT CS and AI Lab, Cambridge,

More information

Generative Modeling for Continuous Non-Linearly Embedded Visual Inference

Generative Modeling for Continuous Non-Linearly Embedded Visual Inference Generative Modeling for Continuous Non-Linearly Embedded Visual Inference Cristian Sminchisescu Allan Jepson University of Toronto, Department of Computer Science, 6 King s College Road, Toronto, Ontario,

More information

low bias high variance high bias low variance error test set training set high low Model Complexity Typical Behaviour Lecture 11:

low bias high variance high bias low variance error test set training set high low Model Complexity Typical Behaviour Lecture 11: Lecture 11: Overfitting and Capacity Control high bias low variance Typical Behaviour low bias high variance Sam Roweis error test set training set November 23, 4 low Model Complexity high Generalization,

More information

Graphical Models for Human Motion Modeling

Graphical Models for Human Motion Modeling Graphical Models for Human Motion Modeling Kooksang Moon and Vladimir Pavlović Rutgers University, Department of Computer Science Piscataway, NJ 8854, USA Abstract. The human figure exhibits complex and

More information

Iterative Non-linear Dimensionality Reduction by Manifold Sculpting

Iterative Non-linear Dimensionality Reduction by Manifold Sculpting Iterative Non-linear Dimensionality Reduction by Manifold Sculpting Mike Gashler, Dan Ventura, and Tony Martinez Brigham Young University Provo, UT 84604 Abstract Many algorithms have been recently developed

More information

Image Similarities for Learning Video Manifolds. Selen Atasoy MICCAI 2011 Tutorial

Image Similarities for Learning Video Manifolds. Selen Atasoy MICCAI 2011 Tutorial Image Similarities for Learning Video Manifolds Selen Atasoy MICCAI 2011 Tutorial Image Spaces Image Manifolds Tenenbaum2000 Roweis2000 Tenenbaum2000 [Tenenbaum2000: J. B. Tenenbaum, V. Silva, J. C. Langford:

More information

Manifold Learning for Video-to-Video Face Recognition

Manifold Learning for Video-to-Video Face Recognition Manifold Learning for Video-to-Video Face Recognition Abstract. We look in this work at the problem of video-based face recognition in which both training and test sets are video sequences, and propose

More information

Non-linear CCA and PCA by Alignment of Local Models

Non-linear CCA and PCA by Alignment of Local Models Non-linear CCA and PCA by Alignment of Local Models Jakob J. Verbeek, Sam T. Roweis, and Nikos Vlassis Informatics Institute, University of Amsterdam Department of Computer Science,University of Toronto

More information

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014

More information

error low bias high variance test set training set high low Model Complexity Typical Behaviour 2 CSC2515 Machine Learning high bias low variance

error low bias high variance test set training set high low Model Complexity Typical Behaviour 2 CSC2515 Machine Learning high bias low variance CSC55 Machine Learning Sam Roweis high bias low variance Typical Behaviour low bias high variance Lecture : Overfitting and Capacity Control error training set test set November, 6 low Model Complexity

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14

More information

Does Dimensionality Reduction Improve the Quality of Motion Interpolation?

Does Dimensionality Reduction Improve the Quality of Motion Interpolation? Does Dimensionality Reduction Improve the Quality of Motion Interpolation? Sebastian Bitzer, Stefan Klanke and Sethu Vijayakumar School of Informatics - University of Edinburgh Informatics Forum, 10 Crichton

More information

Large-Scale Face Manifold Learning

Large-Scale Face Manifold Learning Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random

More information

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science University of Sheffield Regent Court, 211 Portobello Street, Sheffield,

More information

Non-linear dimension reduction

Non-linear dimension reduction Sta306b May 23, 2011 Dimension Reduction: 1 Non-linear dimension reduction ISOMAP: Tenenbaum, de Silva & Langford (2000) Local linear embedding: Roweis & Saul (2000) Local MDS: Chen (2006) all three methods

More information

3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis

3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis 3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis Kooksang Moon and Vladimir Pavlović Department of Computer Science, Rutgers University Piscataway, NJ 885 {ksmoon, vladimir}@cs.rutgers.edu

More information

A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

More information

Part based models for recognition. Kristen Grauman

Part based models for recognition. Kristen Grauman Part based models for recognition Kristen Grauman UT Austin Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily

More information

A Stochastic Optimization Approach for Unsupervised Kernel Regression

A Stochastic Optimization Approach for Unsupervised Kernel Regression A Stochastic Optimization Approach for Unsupervised Kernel Regression Oliver Kramer Institute of Structural Mechanics Bauhaus-University Weimar oliver.kramer@uni-weimar.de Fabian Gieseke Institute of Structural

More information

Tracking People on a Torus

Tracking People on a Torus IEEE TRANSACTION ON PATTERN RECOGNITION AND MACHINE INTELLIGENCE (UNDER REVIEW) Tracking People on a Torus Ahmed Elgammal Member, IEEE, and Chan-Su Lee, Student Member, IEEE, Department of Computer Science,

More information

Non-Local Manifold Tangent Learning

Non-Local Manifold Tangent Learning Non-Local Manifold Tangent Learning Yoshua Bengio and Martin Monperrus Dept. IRO, Université de Montréal P.O. Box 1, Downtown Branch, Montreal, H3C 3J7, Qc, Canada {bengioy,monperrm}@iro.umontreal.ca Abstract

More information

Inferring 3D from 2D

Inferring 3D from 2D Inferring 3D from 2D History Monocular vs. multi-view analysis Difficulties structure of the solution and ambiguities static and dynamic ambiguities Modeling frameworks for inference and learning top-down

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

The Laplacian Eigenmaps Latent Variable Model

The Laplacian Eigenmaps Latent Variable Model The Laplacian Eigenmaps Latent Variable Model with applications to articulated pose tracking Miguel Á. Carreira-Perpiñán EECS, UC Merced http://faculty.ucmerced.edu/mcarreira-perpinan Articulated pose

More information

Fast Algorithms for Large Scale Conditional 3D Prediction

Fast Algorithms for Large Scale Conditional 3D Prediction Fast Algorithms for Large Scale Conditional 3D Prediction Liefeng Bo 1, Cristian Sminchisescu 2,1 1 TTI-C, 2 University of Bonn Atul Kanaujia 3, Dimitris Metaxas 3 3 Rutgers University Abstract The potential

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract

More information

Clustering web search results

Clustering web search results Clustering K-means Machine Learning CSE546 Emily Fox University of Washington November 4, 2013 1 Clustering images Set of Images [Goldberger et al.] 2 1 Clustering web search results 3 Some Data 4 2 K-means

More information

Sparse Manifold Clustering and Embedding

Sparse Manifold Clustering and Embedding Sparse Manifold Clustering and Embedding Ehsan Elhamifar Center for Imaging Science Johns Hopkins University ehsan@cis.jhu.edu René Vidal Center for Imaging Science Johns Hopkins University rvidal@cis.jhu.edu

More information

Tracking Human Body Pose on a Learned Smooth Space

Tracking Human Body Pose on a Learned Smooth Space Boston U. Computer Science Tech. Report No. 2005-029, Aug. 2005. Tracking Human Body Pose on a Learned Smooth Space Tai-Peng Tian Rui Li Stan Sclaroff Computer Science Department Boston University Boston,

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

A Hierarchial Model for Visual Perception

A Hierarchial Model for Visual Perception A Hierarchial Model for Visual Perception Bolei Zhou 1 and Liqing Zhang 2 1 MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems, and Department of Biomedical Engineering, Shanghai

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

Multiple cosegmentation

Multiple cosegmentation Armand Joulin, Francis Bach and Jean Ponce. INRIA -Ecole Normale Supérieure April 25, 2012 Segmentation Introduction Segmentation Supervised and weakly-supervised segmentation Cosegmentation Segmentation

More information

Local Gaussian Processes for Pose Recognition from Noisy Inputs

Local Gaussian Processes for Pose Recognition from Noisy Inputs FERGIE, GALATA: LOCAL GAUSSIAN PROCESS REGRESSION WITH NOISY INPUTS 1 Local Gaussian Processes for Pose Recognition from Noisy Inputs Martin Fergie mfergie@cs.man.ac.uk Aphrodite Galata a.galata@cs.man.ac.uk

More information

Day 3 Lecture 1. Unsupervised Learning

Day 3 Lecture 1. Unsupervised Learning Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations

More information

Supplementary Material: The Emergence of. Organizing Structure in Conceptual Representation

Supplementary Material: The Emergence of. Organizing Structure in Conceptual Representation Supplementary Material: The Emergence of Organizing Structure in Conceptual Representation Brenden M. Lake, 1,2 Neil D. Lawrence, 3 Joshua B. Tenenbaum, 4,5 1 Center for Data Science, New York University

More information

Learning Alignments from Latent Space Structures

Learning Alignments from Latent Space Structures Learning Alignments from Latent Space Structures Ieva Kazlauskaite Department of Computer Science University of Bath, UK i.kazlauskaite@bath.ac.uk Carl Henrik Ek Faculty of Engineering University of Bristol,

More information

Machine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart

Machine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart Machine Learning The Breadth of ML Neural Networks & Deep Learning Marc Toussaint University of Stuttgart Duy Nguyen-Tuong Bosch Center for Artificial Intelligence Summer 2017 Neural Networks Consider

More information

Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions

Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions Norimichi Ukita 1 and Takeo Kanade Graduate School of Information Science, Nara Institute of Science and Technology The

More information

Hierarchical Gaussian Process Latent Variable Models

Hierarchical Gaussian Process Latent Variable Models Neil D. Lawrence neill@cs.man.ac.uk School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, U.K. Andrew J. Moore A.Moore@dcs.shef.ac.uk Dept of Computer

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Combining PGMs and Discriminative Models for Upper Body Pose Detection Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative

More information

Markov Random Fields and Gibbs Sampling for Image Denoising

Markov Random Fields and Gibbs Sampling for Image Denoising Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Implicit Mixtures of Restricted Boltzmann Machines

Implicit Mixtures of Restricted Boltzmann Machines Implicit Mixtures of Restricted Boltzmann Machines Vinod Nair and Geoffrey Hinton Department of Computer Science, University of Toronto 10 King s College Road, Toronto, M5S 3G5 Canada {vnair,hinton}@cs.toronto.edu

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014

More information

Human Upper Body Pose Estimation in Static Images

Human Upper Body Pose Estimation in Static Images 1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project

More information

Blind Image Deblurring Using Dark Channel Prior

Blind Image Deblurring Using Dark Channel Prior Blind Image Deblurring Using Dark Channel Prior Jinshan Pan 1,2,3, Deqing Sun 2,4, Hanspeter Pfister 2, and Ming-Hsuan Yang 3 1 Dalian University of Technology 2 Harvard University 3 UC Merced 4 NVIDIA

More information

Motion Interpretation and Synthesis by ICA

Motion Interpretation and Synthesis by ICA Motion Interpretation and Synthesis by ICA Renqiang Min Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, ON M5S3G4, Canada Abstract. It is known that high-dimensional

More information

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos Machine Learning for Computer Vision 1 22 October, 2012 MVA ENS Cachan Lecture 5: Introduction to generative models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris

More information

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping

More information

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)

More information

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition

More information

Metric learning approaches! for image annotation! and face recognition!

Metric learning approaches! for image annotation! and face recognition! Metric learning approaches! for image annotation! and face recognition! Jakob Verbeek" LEAR Team, INRIA Grenoble, France! Joint work with :"!Matthieu Guillaumin"!!Thomas Mensink"!!!Cordelia Schmid! 1 2

More information

SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM. Olga Kouropteva, Oleg Okun and Matti Pietikäinen

SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM. Olga Kouropteva, Oleg Okun and Matti Pietikäinen SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM Olga Kouropteva, Oleg Okun and Matti Pietikäinen Machine Vision Group, Infotech Oulu and Department of Electrical and

More information

Supervised Learning for Image Segmentation

Supervised Learning for Image Segmentation Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.

More information

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\ Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured

More information

Head Frontal-View Identification Using Extended LLE

Head Frontal-View Identification Using Extended LLE Head Frontal-View Identification Using Extended LLE Chao Wang Center for Spoken Language Understanding, Oregon Health and Science University Abstract Automatic head frontal-view identification is challenging

More information

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues

More information

Non-Local Estimation of Manifold Structure

Non-Local Estimation of Manifold Structure Non-Local Estimation of Manifold Structure Yoshua Bengio, Martin Monperrus and Hugo Larochelle Département d Informatique et Recherche Opérationnelle Centre de Recherches Mathématiques Université de Montréal

More information

Content-based image and video analysis. Machine learning

Content-based image and video analysis. Machine learning Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Computer Vision II Lecture 14

Computer Vision II Lecture 14 Computer Vision II Lecture 14 Articulated Tracking I 08.07.2014 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Outline of This Lecture Single-Object Tracking Bayesian

More information

Technical Report. Title: Manifold learning and Random Projections for multi-view object recognition

Technical Report. Title: Manifold learning and Random Projections for multi-view object recognition Technical Report Title: Manifold learning and Random Projections for multi-view object recognition Authors: Grigorios Tsagkatakis 1 and Andreas Savakis 2 1 Center for Imaging Science, Rochester Institute

More information

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10 BUNDLE ADJUSTMENT FOR MARKERLESS BODY TRACKING IN MONOCULAR VIDEO SEQUENCES Ali Shahrokni, Vincent Lepetit, Pascal Fua Computer Vision Lab, Swiss Federal Institute of Technology (EPFL) ali.shahrokni,vincent.lepetit,pascal.fua@epfl.ch

More information

Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter

Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter Yafeng Yin, Hong Man, Jing Wang, Guang Yang ECE Department, Stevens Institute of Technology Hoboken,

More information

Multiview Feature Learning

Multiview Feature Learning Multiview Feature Learning Roland Memisevic Frankfurt, Montreal Tutorial at IPAM 2012 Roland Memisevic (Frankfurt, Montreal) Multiview Feature Learning Tutorial at IPAM 2012 1 / 163 Outline 1 Introduction

More information

Discovering Shared Structure in Manifold Learning

Discovering Shared Structure in Manifold Learning Discovering Shared Structure in Manifold Learning Yoshua Bengio and Martin Monperrus Dept. IRO, Université de Montréal P.O. Box 1, Downtown Branch, Montreal, H3C 3J7, Qc, Canada {bengioy,monperrm}@iro.umontreal.ca

More information

Experimental Analysis of GTM

Experimental Analysis of GTM Experimental Analysis of GTM Elias Pampalk In the past years many different data mining techniques have been developed. The goal of the seminar Kosice-Vienna is to compare some of them to determine which

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

Differential Structure in non-linear Image Embedding Functions

Differential Structure in non-linear Image Embedding Functions Differential Structure in non-linear Image Embedding Functions Robert Pless Department of Computer Science, Washington University in St. Louis pless@cse.wustl.edu Abstract Many natural image sets are samples

More information