Spectral Latent Variable Models for Perceptual Inference
|
|
- Jessie Jeffry Nash
- 5 years ago
- Views:
Transcription
1 Technical Report, Department of Computer Science, Rutgers University, #DCS-TR-6, February 7 Spectral Latent Variable for Perceptual Inference Atul Kanaujia 1 Cristian Sminchisescu 2,3 Dimitris Metaxas 1 1 Department of Computer Science, Rutgers University, USA {kanaujia,dnm}@cs.rutgers.edu, kanaujia,dnm 2 Department of Computer Science, University of Toronto, Canada 3 TTI-Chicago, USA crismin@cs.toronto.edu, crismin May 22, 7 Abstract We propose low-dimensional non-linear latent variable models adapted for perceptual inference. We introduce a family of algorithms, referred to as Sparse Spectral Latent Variable (SLVM) that combine the advantages of spectral embeddings with the ones of parametric latent variable models like the Generative Topographic Mapping (): (1) they provide local-optima free embeddings that preserve intuitive properties, e.g. local or global geometry of the data, (2) offer low-dimensional generative models with probabilistic, bi-directional mappings between latent and ambient spaces, (3) are consistent and efficient to estimate. We show empirically that SLV versions of ISOMAP or LLE can be used successfully for complex visual inference tasks like the fully automatic reconstruction of 3d human poses in un-instrumented monocular video. 1 Introduction Many models in computer vision or machine learning are based on high-dimensional ambient observations (images, sound), considered to have been generated by non-linear, intrinsically low-dimensional processes. Many methods exist for computing such representations, but relatively few provide all the components necessary for inference tasks, e.g. tracking and reconstructing the three-dimensional pose of an object based on image observations. For such problems we need global representations and low-dimensional models with bidirectional mappings between latent and observation spaces. We also need to preserve global or local geometric properties of the data in latent space a model used for visual tracking based on ambient observations needs locality: small state changes in latent space should only produce commensurate changes in the observation space. If this doesn t hold the tracker may need to make realistically large and difficult to predict jumps in latent space in order to follow smooth trajectories in the observation space. We also need models that are probabilistic and can be learnt consistently, e.g. all their components estimated jointly, using ML. In this work we identify components that can provide consistent models with all the necessary properties: we marriage spectral embeddings and parametric latent variable models. This is a seemingly simple idea, but we know of no precursor. Spectral methods are attractive for their intuitive geometric properties and efficient global computation, but do not have a probabilistic extension, whereas regular grid-based generative methods like the Generative Topographic Mapping () are not geometry preserving, and encounter difficulties with latent spaces of dimension higher than 2-3, or when unfolding complex non-linear structure due to local optima. We propose a family of algorithms that combines the advantages of the two classes of methods: (1) they offer low-dimensional generative models with bi-directional, possible multimodal distributions (mappings) between latent and ambient spaces, (2) are probabilistically consistent and efficient to estimate. We refer to these probabilistic constructs defined on top of the irregular grid obtained from a spectral embedding as Sparse Spectral Latent Variable (SLVM). We show how SLV versions of ISOMAP, LLE of HE can 1
2 be used successfully for complex visual inference tasks like the fully automatic 3d reconstruction of human poses in non-instrumented monocular video. 1.1 Related Work Our research relates to work in spectral manifold learning, latent variable models and visual tracking. Spectral methods can model intuitive local or global geometric constraints [17, 21, 3, 7, 2] and their, local-optima free, polynomial time calculations are amenable to efficient optimization either sparse eigenvalue problems, or dense problems that can be solved with algebraic multigrid methods [4]. A variety of latent variable models exist, both linear (factor analysis, P) [16], and non-linear, e.g. mixture of factor analyzers or P [22], or grid-based methods like []. These methods are powerful but besides computational difficulties caused by multiple local optima, the models do not provide either global coordinate systems or latent spaces with flexible geometric structure. Recent work on visual tracking has recognized the need for global, low-dimensional, probabilistic models with intuitive geometric properties and latent-ambient mappings. More complex models have emerged by complementing with mappings the results of spectral methods. Elgammal & Lee [8] fit RBFs to the corresponding points of an LLE-embedding, but their model devoid of a latent space prior is not fully probabilistic and there are no mappings to the latent space. In independent work, Sminchisescu & Jepson [19] augment spectral embeddings (e.g. Laplacian Eigenmaps) with both latent space Gaussian mixture priors and sparse kernel regression mappings from latent to ambient space. Their model qualifies as a latent variable one, but ambient to latent mappings are computationally difficult to compute and the model is trained piece-wise not jointly. Urtasun et al [24] use an efficient Gaussian Process Latent Variable Model () of Lawrence [], but this optimizes a data reconstruction criterion, not based on the preservation of local or global geometric properties. The zero mean unit covariance prior in latent space makes the model somewhat agnostic for visual inference applications, where it is useful to penalize drifts from the manifold of admissible visual configurations. For improved stability, [24] use an augmented constrained (latent,ambient) state for tracking. We demonstrate the SLVMs for discriminative 3d human pose estimation, but here we predict a lowdimensional latent variable of the human pose directly from image sequences or human photographs. Existing research on visual inference using low-dimensional methods (e.g. 3d human pose tracking) has focused on generative tracking frameworks, based on either complex appearance (observation) models [19] or simpler models based on image tracks of the human body, assumed known, and assigned to the corresponding joints of a 3d model [24]. In practice such correspondences are difficult to obtain, and existing methods had to be initialized by hand. Initialization becomes more difficult as the dimension of the latent space increases, even for known correspondences - ultimately the search complexity is exponential in the latent space dimension. The advantages of latent variable models with generative trackers are counterbalanced by the difficulty to make models based on alignment work when the representation capacity of the 3d body model is reduced. E.g., even for simple walking motions, the amplitude of a tested motion may be different from the one in the training set. The 3d model may not align well with the image features and may gradually loose track. This explains why existing generative frameworks either relay on quite large training sets [19], assume that long-range model-image correspondences are given, or assume clean image silhouette data exists [8]. We depart from alignment approaches, to construct (inverse) image-based predictors that regress the latent body pose directly against image evidence. This allows more robust tracking when the latent variable models used have reduced representation capacity. 2 Sparse Spectral Latent Variable (SLVM) We work with two sets of vectors, of equal size N (possibly in correspondence), in two spaces referred as latent and ambient. Both sets are un-ordered, hence there is no constraint to place either one on a regular grid (i.e. a matrix in 2d or the cells of cube in 3d, as is the case of the latent vectors in []). The latent space vectors are generically denoted x, with dim(x) = d, the ambient vectors are y with dim(y) = D, typically with D >> d. 2
3 Spectral Embeddings We assume that vector-valued points in ambient space Y are captured from a highdimensional process (images, sound, human motion capture systems), whereas latent space points X are obtained using a spectral, non-linear embedding method like ISOMAP, LLE, Hessian or Laplacian Eigenmaps, etc. This provides several advantages: (1) Preserves intuitive local or global geometric properties. These are important for visual tracking applications, which allow efficient, smooth estimates of the state of an object in latent space for stability, it is important that small changes in latent space lead to small, continuous changes in the ambient / observation space, (2) no local optima (3) provide corresponding points with similar geometric properties. The resulting irregular grid of embedded points is used in order to construct a joint probability distribution over latent and ambient variables. Sparse Generative Spectral Mappings For our models, we assume that ambient vectors are related to the latent ones using a nonlinear vector-valued function F(x, W, α) with parameters W and output noise covariance σ. In other words the joint distribution has a non-linear constraint between two blocks of its variables. 1 The model can be derived analogously with the Generative Topographic Map [] (hence the name of this section), except that it is defined on an irregular grid X in latent space, as computed from a spectral embedding: p(y x,w, α, σ) = G(y F(x,W, α), σ) (1) where G is a Gaussian function of variable y, with mean F and covariance σ. F is chosen here to be a generalized (non-linear parametric) regression model: F(x,W, α) = Wφ(x) (2) with φ(x) = [G(x x 1, δ),..., G(x x M, δ)], having Gaussian kernels (others can be used) placed at an M-sized subset of the training set (we will select this automatically using sparsity priors, notice the difference from where the location of the basis functions is selected apriori), with covariance δ, and W is a weight matrix of size DxM. The model can be made computationally efficient and more robust to overfitting by introducing hierarchical priors on the parameters W of the mapping F, in order to select a sparse subset of X for prediction [23, 14]: p(w α) = D j=1 k=1 N G(w jk, 1 α k ) (3) with: N p(α) = Gamma(α i a, b) (4) Gamma(α a, b) = Γ(a) 1 b a α a 1 e bα () where a = 2, b = 4 are chosen to give broad hyperpriors. The ambient space prior can be obtained by integrating in latent space: p(y W, α, σ) = p(y x, W, α, σ)p(x)dx (6) where p(x) is the latent space prior, which for tractability, can be chosen conveniently as a sum of delta functions [], here placed on an irregular grid in latent space: 1 This is one possible construction for the joint distribution p(x,y) = p(x)p(y x), arguably not the only possible one. For instance one can approximate it as a product of marginal distributions in latent and ambient space p(x,y) = p(x)p(y) = P N p(x,xi)p(y,yi), and rely on the constraints already present in the embedding to account for correlations. 3
4 This leads to the ambient prior: p(x) = 1 K δ(x x i ) (7) K p(y W, α, σ) = 1 K K p(y x i,w, α, σ) (8) The conditional distribution in latent space can be obtained using Bayes rule: p (i,n) = p(x i y n ) = p(y n x i )p(x i ) p(y n ) = p(y n x i ) N p(y n x i,w, α, σ) This allows computing either the conditional mean or the mode (better for multimodal distributions) in latent space: K < x y n,w, α, σ >= p(x y n,w, α, σ)xdx = p (i,n) x i () i max = argmax p (i,n) (11) i The model contains all the ingredients for efficient computation in both latent and ambient space: eq. (7) gives the prior in latent space, (8) provides the induced prior in ambient space, (1) provides the conditional distribution (or mapping) from latent to ambient space, and () and (11) give the mean or mode of the mapping from ambient to latent space. The model can be trained my maximizing the log-likelihood of the ambient data: (9) L = log N p(y n W, α, σ) = N log{ 1 K n=1 K p(y n x i,w, α, σ)} (12) Maximizing the likelihood provides estimates for W, α, σ,(consider σ is diagonal with values σ for simplicity) where: Σ = (σ 2 Φ GΦ + S) 1 (13) W = σ 2 ΣΦ RY (14) where S = diag(α 1,..., α K ) with α corresponding only to the active set, G = diag(g 1,..., G M ) with G i = N j=1 p (i,j), R is a KxN matrix with elements p (i,j), and Y is an NxD matrix that stores the output vectors y i, i = 1...N columnwise, and Φ is a KxM matrix with elements G(x i x j, δ). The variance is estimated as usual from prediction error []: σ 2 = 1 ND N n=1 k=1 K p (kn) W φ(x) y n 2 (1) where a superscript identifies an updated variable estimate. The hyperparameters are re-estimated using the relevance determination equations [14]: α i = where µ i is the i-th column of W. The algorithm is summarized in fig. 1. γ i µ i 2, γ i = 1 α i Σ ii (16) 4
5 SLVM Learning Algorithm Input: Set of high-dimensional, ambient points Y = {y i }...N. Output: Sparse, Spectral Latent Variable Model (SLVM), with parameters (W, α, σ), defined on a irregular, fixed grid in latent space X = {x i }...N, that preserves local or global geometric properties of Y. Step 1. Compute spectral (non-linear) embedding of Y to obtain corresponding latent points X, using any standard embedding method like ISOMAP, LLE, HE, LE. Step 2. Compute inital estimates (W,α,σ ), using sparse kernel regression [23, 11] with training points (x i,y i ) N. Step 3. EM Algorithm to learn Sparse Spectral Generative Mappings on an irregular latent space grid. E-step Compute posterior probabilities p (i,n) for the assignment of latent to ambient space points using (9). M-step: Solve weighted non-linear regression problem to update (W, α, σ) acording to (13), (1), and (16). This uses Laplace approximation for the hyperparameters and analytical integration for the weights, and optimization with greedy weight subset selection [1, 14, 23, 11]. Figure 1: 3 Image-based Direct 3D Pose Prediction We estimate a distribution over solutions in latent space, given input descriptors derived from images obtained by cropping the person using a bounding-box. We then map from latent states to 3d human joint angles (using e.g. (1)) in order to recover body configurations for visualization or error reporting. To predict latent space distributions from image features, we use a conditional mixture of expert predictors. Each one is paired with an observation dependent gate function that scores its competence when presented with different images. As this change, different experts may be active and their rankings (relative probabilities) may change. The model is: p(x r) = M g i (r)p i (x r) (17) g i (r) = exp(λ i r) k exp(λ k r) (18) p i (x r) = G(x E i r,ω i ) (19) with r image descriptors, x state outputs, g i input dependent gates, computed using linear regressors c.f. (18), with weights λ i. g are normalized to sum to 1 for any given input r and p i are Gaussian functions (19)
6 with covariance Ω i, centered at the expert predictions, sparse linear regressors with weights E i. The model is trained using a double-loop EM algorithms [9]. 4 Experiments In order to illustrate the SLVM algorithms, we run experiments on a simple Swiss roll toy data problem and on a real world computer vision application: the reconstruction of 3d human poses from monocular images of people photographed against complex, non-stationary backgrounds. 4.1 Swiss Roll This set of experiments are illustrated in fig. 2. The original data set (top-row, fig.a) consists of points sampled regularly on the surface and (in most plots) color coded to show their relative spatial positions. In fig.b), we show reconstruction using []. The reconstruction is not entirely inaccurate and scrambles the geometric ordering when traversing the S sheet. uses a regular grid and points subsampled on this grid c.f. []. Fig.c), top row, shows accurate reconstructions from our sparse SLV-ISOMAP model, which is also computationally efficient - the 16 (out of ) latent space centers selected by the model are shown in fig.d). The bottom row of fig. 2 illustrates the computation of conditional distributions and the induced prior in latent space for SLV-ISOMAP. In fig.a) we show a dataset together with an out-of-sample ambient point (at one extremity) for which the conditional latent space distribution is computed. Fig. b) shows the intuitive bi-modal distribution in latent space, computed c.f. (9). Figs.c) and d) on the bottom row of fig. 2 show the SLV-ISOMAP prior induced in ambient space, computed using (8), for two different levels of sparsity, % (regular sub-sampling by a factor of 2) and automatic, as computed by our model at 1.6%. Notice that unlike all the other plots shown in the figure, in the rightmost two on the bottom row, the color of the points represents probability, not geometric ordering. The prior distribution has the intuitive property to peak higher on the principal curve of the S-shape, away from the borders of the manifold (where there is less data, on average) d Human Pose Reconstruction In this section, we report quantitative comparisons and qualitative 3d reconstruction (joint angles) of human motion from video or photographs. Image Descriptor A difficulty for reliable discriminative pose prediction is the design of image descriptors that are distinctive enough to differentiate among different poses, yet invariant to within the same pose class deformations and spatial misalignments people in similar stances, but differently proportioned, or photographed on different backgrounds. Exiting methods have successfully demonstrated that bag of features or regular-grid based representations of local descriptors (e.g. bag of shape context features, block of SIFT features [2, ]) can be surprisingly effective at predicting 3d human poses, but the representations tend to be too inflexible for reconstruction in general scenes. It is more appropriate to view them as two useful extremes of a multilevel, hierarchical representation of images - a family of descriptors that progressively relaxes block-wise, rigid local spatial image encodings to increasingly weaker spatial models of position / geometry accumulated over increasingly larger image regions. We derive such an encoding called Multilevel Spatial Blocks (MSB) which computes a description of the image at multiple levels of details. The putative image bounding-box of a person is split using a regular grid of overlapping image blocks, and increasingly large (SIFT) descriptor cell sizes are extracted. SIFT [13] is a local image encoding that builds histograms of edge orientations in a particular image region. We concatenate SIFTs within each layer and across layers, orderly, in order to obtain encodings of an entire image or sub-window. For our problem we use MSBs of size 1344, obtained by decomposing the image into blocks at 3 different levels, with 16, 4, 1 SIFT block, 4x4 cells per block, 12x12 pixel cell size 8 image gradient orientations histogramed within each cell. 6
7 x Figure 2: Analysis of SLV-ISOMAP and on the Swiss roll. Top Row: (a) Original dataset of points, color coded to emphasize the geometric ordering; (b) Data reconstructed (sampled) from, with associations between datapoints color-coded scrambles the points and does not preserve their ambient geometric structure. The reconstruction is not accurate. (c) Reconstruction from our sparse SLV-ISOMAP model with associations color-coded; (d) Active (sparse) basis setwith 1.6%=16 of the datapoints shown in latent space, as automatically selected by SLV- ISOMAP. Bottom row: Illustration of computations of conditional distributions and induced prior in latent space by SLV-ISOMAP. (a) Dataset with ambient point for which conditional latent space distribution is computed. (b) The multimodal conditional distribution in latent space, computed c.f. (9). (c-d) Prior of SLV-ISOMAP in ambient space, computed using (8) for two different levels of sparsity, % (sub-sampling by a factor of 2) and automatic, 1.6%. Important: Please, notice that unlike all the other plots shown in the figure the color of the points represents probability, not geometric ordering!. Database and Multivalued Predictor For qualitative experiments we use images from a movie (Run Lola Run) and the INRIA pedestrian database [6]. For quantitative experiments we use our own database consisting of = 9741 quasi-real images, generated using a computer graphics human model that was animated using the Maya graphics package, and rendered on real image backgrounds (see fig. ). We have 3247 different 3d poses from the CMU motion capture database [1] and these are rendered to produce supervised (3d human joint angle, MSB image descriptor) pairs for different patterns of walks, either viewed frontally or from one side, dancing, conversation, bending and picking, running and pantomime (one of the 3 training sets of 3247 poses is placed on a clean background). We collect three test sets of poses for each of the five motion classes. The test motions are executed by different subjects, not in the training set. We also render one test set on a clean background to use as baseline. The other two test sets are progressively more complicated: one has the model randomly placed at different locations, but on the same images as in the training set, the other has the model placed on unseen backgrounds. In all cases, a 3x2 bounding box of the model and the background is obtained, possibly using rescaling. There is significant variability and lack of centering in this dataset because certain poses are vertically (and horizontally) more symmetric than others (e.g. compare a person who picks an object with one who is standing, or pointing an arm in one direction). We train multivalued predictors (conditional Bayesian mixture of sparse linear regressors) on each activity in the dataset (the latent variable models are also computed separately for each activity). Empirically, for the experts, we observe sparsity patterns in the range 1% 4%, with lower values usually associated with models that generalize better. We use sparse linear regressors, for robustness to image descriptor components affected by fluctuating background clutter c.f. 3. The error is always computed w.r.t. the most 7
8 Average Joint Angle Errors SideWalk SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors Running SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors BendPickup SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Pantomime Dancing Cumulative Plot Average Joint Angle Errors SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Average Joint Angle Errors SLV ISOMAP(NN1) SLV LLE(NN1) SLV HE(NN1) Figure 3: Quantitative results (prediction error, per joint angle, in degrees) for different motions (+ a cumulative plot) and 3 different imagining conditions, as follows: people on Clean backgrounds, people on Clutter1 backgrounds, used in the training set (but the test image has the person in a different position w.r.t. the background, in a relative position not seen in the training set) and people on Clutter2 backgrounds, not seen at all in the training set. We compare several methods, including our SLVM-(ISOMAP,LLE,HE), [], [] and, all with 2 latent space dimensions. We use Discriminative image-based predictors (multivalued mappings from images to latent space), based on conditional Bayesian mixtures of experts, sparse linear regressors, for robustness to image descriptor components affected by fluctuating background clutter c.f. 3. Both LVM models and corresponding predictors have been trained separately for each motion type. The error is always computed w.r.t. the most probable expert. Figure 4: Posterior plots showing the predicted ditribution p(x r) in latent space, for (a-c) and for one of our SLVM models (d-f) for a a walking motion viewed from a side. The topmost points in (a-c) are the training points and for illustration we link points corresponding to adjacent frames in the motion sequence. The ground truth is shown with a circle, the predicted posterior is color-coded. Half of the points on the grid have RBFs placed on top, regularly subsampled (not only top). Notice the loss of track, and the assignments of ambient points to multiple grid points (top). Figs (d-f) shows tracking with our SLV-ISOMAP model. The distribution is ocassionally multimodal and sometimes the most probable peak is not the true one. In general, however, the ground truth is tracked closely. probable expert. In fig. 3, we show quantitative comparisons (prediction error, per joint angle, in degrees) for different motions (+ a cumulative plot) and 3 different imagining conditions: Clean backgrounds, Clutter1 back- 8
9 Figure : Images from our training and testing database where a virtual, CG human impostor, is placed in real-world environments. grounds and Clutter2 backgrounds, not seen at all in the training set. We compare several methods, including our SLVM-(ISOMAP, LLE, HE), [], [] and, all with 2 latent space dimensions embedded from 6d ambient human joint angles (recall that in each case, both a separate LVM and an imagebased latent state predictor have been computed). For visualization and error reporting we use the conditional estimate of the ambient state (that uses function F) to map latent point estimates x to joint angles y. The final set of results we show in fig. 6 is based on real images from the movie Run Lola Run, where our model is seduced by Lola and runs after her, and INRIA pedestrian dataset [6]. These are all automatic 3d reconstructions of fast moving humans in non-instrumented, complex environments. We use a model trained with walking and running labeled poses (quasireal data of our model placed on real backgrounds, rendered from 8 different viewpoints) with an additional unlabeled (real) images of humans running and walking in cluttered scenes. As usual with discriminative predictive methods, the solutions are not entirely accurate in a classical alignment sense. However, we find that the 3d reconstructions have reasonable perceptual accuracy. Figure 6: Qualitative 3d reconstruction results obtained on images from the movie Run Lola Run (block of leftmost 4 images) and the INRIA pedestrian dataset (rightmost 4 images) [6]. (a) Top row shows the original images, (b) Bottom row shows automatic 3d reconstructions. Conclusions We have presented spectral latent variable models (SLVM) and showed their potential for visual inference applications. We have argued in support of low-dimensional models that: (a) are probabilistic; (b) preserve intuitive geometric properties of the ambient distribution, e.g. locality, as required for visual tracking applications; (c) provide mappings, or more generally multimodal conditional distributions between latent and ambient spaces, and (d) are consistent, efficient to learn and estimate and applicable with any embedding method like ISOMAP, LLE or LE. To allow for (a)-(d), we introduce models that combine the geometric and computational properties of spectral embeddings with the probabilistic formulation and the mappings 9
10 offered by LVMs like the. The resulting modes are simple, but we are not aware of a precursor with similar properties and consistency in the literature. We empirically show that (in conjunction with discriminative pose prediction methods and multilevel image encodings), SLVMs are useful for the fully automatic 3d reconstruction of complex human poses from non-instrumented monocular images. Future work We currently investigate alternative approximations to the latent space prior distribution as well as alternative constraints between latent and ambient points. We also plan to study the behavior of our algorithms for unfolding more complex structures, e.g. motion combinations and higher dimensional latent spaces, where the non-linear models are expected perform best. References [1] (3). CMU Human Motion Capture DataBase. Available online at [2] Agarwal, A., & Triggs, B. (6). A local basis representation for estimating human pose from cluttered images. ACCV. [3] Belkin, M., & Niyogi, P. (2). Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. NIPS. [4] Bengio, J., Paiement, J., & Vincent, P. (3). Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps and Spectral Clustering. NIPS. [] Bishop, C., Svensen, M., & Williams, C. K. I. (1998). Gtm: The generative topographic mapping. Neural Computation, [6] Dalal, N., & Triggs, B. (). Histograms of oriented gradients for human detection. CVPR. [7] Donoho, D., & Grimes, C. (3). Hessian Eigenmaps: Locally Linear Embedding Techniques for High-dimensional Data. Proc. Nat. Acad. Arts and Sciences. [8] Elgammal, A., & Lee, C. (4). Inferring 3d body pose from silhouettes using activity manifold learning. CVPR. [9] Jordan, M., & Jacobs, R. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, [] Lawrence, N. (). Probabilistic non-linear component analysis with gaussian process latent variable models. JMLR, [11] Lawrence, N., Seeger, M., & Herbrich, R. (3). Fast sparse Gaussian process methods: the informative vector machine. NIPS. [12] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. of IEEE. [13] Lowe, D. (4). Distinctive image features from scale-invariant keypoints. IJCV, 6. [14] Mackay, D. (1998). Comparison of Approximate Methods for Handling Hyperparameters. Neural Computation, 11. [1] Neal, R. (1996). Bayesian learning for neural networks. Springer-Verlag. [16] Roweis, S., & Ghahramani, Z. (1997). A unifying review of linear gaussian models. Neural Computation. [17] Roweis, S., & Saul, L. (). Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science.
11 [18] Schölkopf, B., Smola, A., & Müller, K. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation,, [19] Sminchisescu, C., & Jepson, A. (4). Generative Modeling for Continuous Non-Linearly Embedded Visual Inference. ICML (pp ). Banff. [] Sminchisescu, C., Kanaujia, A., Li, Z., & Metaxas, D. (). Discriminative Density Propagation for 3D Human Motion Estimation. CVPR (pp ). [21] Tenenbaum, J., Silva, V., & Langford, J. (). A Global Geometric Framewok for Nonlinear Dimensionality Reduction. Science. [22] Tipping, M. (1998). Mixtures of probabilistic principal component analysers. Neural Computation. [23] Tipping, M. (1). Sparse Bayesian learning and the Relevance Vector Machine. JMLR. [24] Urtasun, R., Fleet, D., Hertzmann, A., & Fua, P. (). Priors for people tracking in small training sets. ICCV. [2] Weinberger, K., & Saul, L. (4). Unsupervised Learning of Image Manifolds by Semidefinite Programming. CVPR. 11
Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction
Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers 3D Human Pose Inference Difficulties Towards
More informationConditional Visual Tracking in Kernel Space
Conditional Visual Tracking in Kernel Space Cristian Sminchisescu 1,2,3 Atul Kanujia 3 Zhiguo Li 3 Dimitris Metaxas 3 1 TTI-C, 1497 East 50th Street, Chicago, IL, 60637, USA 2 University of Toronto, Department
More information3D Human Motion Analysis and Manifolds
D E P A R T M E N T O F C O M P U T E R S C I E N C E U N I V E R S I T Y O F C O P E N H A G E N 3D Human Motion Analysis and Manifolds Kim Steenstrup Pedersen DIKU Image group and E-Science center Motivation
More informationReal-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations
1 Real-Time Human Pose Inference using Kernel Principal Component Pre-image Approximations T. Tangkuampien and D. Suter Institute for Vision Systems Engineering Monash University, Australia {therdsak.tangkuampien,d.suter}@eng.monash.edu.au
More informationData fusion and multi-cue data matching using diffusion maps
Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by
More informationRobust Pose Estimation using the SwissRanger SR-3000 Camera
Robust Pose Estimation using the SwissRanger SR- Camera Sigurjón Árni Guðmundsson, Rasmus Larsen and Bjarne K. Ersbøll Technical University of Denmark, Informatics and Mathematical Modelling. Building,
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationLocality Preserving Projections (LPP) Abstract
Locality Preserving Projections (LPP) Xiaofei He Partha Niyogi Computer Science Department Computer Science Department The University of Chicago The University of Chicago Chicago, IL 60615 Chicago, IL
More informationLearning a Manifold as an Atlas Supplementary Material
Learning a Manifold as an Atlas Supplementary Material Nikolaos Pitelis Chris Russell School of EECS, Queen Mary, University of London [nikolaos.pitelis,chrisr,lourdes]@eecs.qmul.ac.uk Lourdes Agapito
More informationMonocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France
Monocular Human Motion Capture with a Mixture of Regressors Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France IEEE Workshop on Vision for Human-Computer Interaction, 21 June 2005 Visual
More informationLearning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009
Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More informationPeople Tracking with the Laplacian Eigenmaps Latent Variable Model
People Tracking with the Laplacian Eigenmaps Latent Variable Model Zhengdong Lu CSEE, OGI, OHSU zhengdon@csee.ogi.edu Miguel Á. Carreira-Perpiñán EECS, UC Merced http://eecs.ucmerced.edu Cristian Sminchisescu
More informationPriors for People Tracking from Small Training Sets Λ
Priors for People Tracking from Small Training Sets Λ Raquel Urtasun CVLab EPFL, Lausanne Switzerland David J. Fleet Computer Science Dept. University of Toronto Canada Aaron Hertzmann Computer Science
More informationAutomatic Alignment of Local Representations
Automatic Alignment of Local Representations Yee Whye Teh and Sam Roweis Department of Computer Science, University of Toronto ywteh,roweis @cs.toronto.edu Abstract We present an automatic alignment procedure
More informationThe Role of Manifold Learning in Human Motion Analysis
The Role of Manifold Learning in Human Motion Analysis Ahmed Elgammal and Chan Su Lee Department of Computer Science, Rutgers University, Piscataway, NJ, USA {elgammal,chansu}@cs.rutgers.edu Abstract.
More informationSelecting Models from Videos for Appearance-Based Face Recognition
Selecting Models from Videos for Appearance-Based Face Recognition Abdenour Hadid and Matti Pietikäinen Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P.O.
More informationLocality Preserving Projections (LPP) Abstract
Locality Preserving Projections (LPP) Xiaofei He Partha Niyogi Computer Science Department Computer Science Department The University of Chicago The University of Chicago Chicago, IL 60615 Chicago, IL
More information08 An Introduction to Dense Continuous Robotic Mapping
NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy
More informationDimension Reduction CS534
Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of
More informationLocally Linear Landmarks for large-scale manifold learning
Locally Linear Landmarks for large-scale manifold learning Max Vladymyrov and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu
More informationGeneralized Principal Component Analysis CVPR 2007
Generalized Principal Component Analysis Tutorial @ CVPR 2007 Yi Ma ECE Department University of Illinois Urbana Champaign René Vidal Center for Imaging Science Institute for Computational Medicine Johns
More informationLearning Joint Top-Down and Bottom-up Processes for 3D Visual Inference
Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference Cristian Sminchisescu TTI-C crismin@nagoya.uchicago.edu http://ttic.uchicago.edu/ crismin Atul Kanaujia and Dimitris Metaxas Rutgers
More informationA Taxonomy of Semi-Supervised Learning Algorithms
A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph
More informationLearning Appearance Manifolds from Video
Learning Appearance Manifolds from Video Ali Rahimi MIT CS and AI Lab, Cambridge, MA 9 ali@mit.edu Ben Recht MIT Media Lab, Cambridge, MA 9 brecht@media.mit.edu Trevor Darrell MIT CS and AI Lab, Cambridge,
More informationGenerative Modeling for Continuous Non-Linearly Embedded Visual Inference
Generative Modeling for Continuous Non-Linearly Embedded Visual Inference Cristian Sminchisescu Allan Jepson University of Toronto, Department of Computer Science, 6 King s College Road, Toronto, Ontario,
More informationlow bias high variance high bias low variance error test set training set high low Model Complexity Typical Behaviour Lecture 11:
Lecture 11: Overfitting and Capacity Control high bias low variance Typical Behaviour low bias high variance Sam Roweis error test set training set November 23, 4 low Model Complexity high Generalization,
More informationGraphical Models for Human Motion Modeling
Graphical Models for Human Motion Modeling Kooksang Moon and Vladimir Pavlović Rutgers University, Department of Computer Science Piscataway, NJ 8854, USA Abstract. The human figure exhibits complex and
More informationIterative Non-linear Dimensionality Reduction by Manifold Sculpting
Iterative Non-linear Dimensionality Reduction by Manifold Sculpting Mike Gashler, Dan Ventura, and Tony Martinez Brigham Young University Provo, UT 84604 Abstract Many algorithms have been recently developed
More informationImage Similarities for Learning Video Manifolds. Selen Atasoy MICCAI 2011 Tutorial
Image Similarities for Learning Video Manifolds Selen Atasoy MICCAI 2011 Tutorial Image Spaces Image Manifolds Tenenbaum2000 Roweis2000 Tenenbaum2000 [Tenenbaum2000: J. B. Tenenbaum, V. Silva, J. C. Langford:
More informationManifold Learning for Video-to-Video Face Recognition
Manifold Learning for Video-to-Video Face Recognition Abstract. We look in this work at the problem of video-based face recognition in which both training and test sets are video sequences, and propose
More informationNon-linear CCA and PCA by Alignment of Local Models
Non-linear CCA and PCA by Alignment of Local Models Jakob J. Verbeek, Sam T. Roweis, and Nikos Vlassis Informatics Institute, University of Amsterdam Department of Computer Science,University of Toronto
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationerror low bias high variance test set training set high low Model Complexity Typical Behaviour 2 CSC2515 Machine Learning high bias low variance
CSC55 Machine Learning Sam Roweis high bias low variance Typical Behaviour low bias high variance Lecture : Overfitting and Capacity Control error training set test set November, 6 low Model Complexity
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationDoes Dimensionality Reduction Improve the Quality of Motion Interpolation?
Does Dimensionality Reduction Improve the Quality of Motion Interpolation? Sebastian Bitzer, Stefan Klanke and Sethu Vijayakumar School of Informatics - University of Edinburgh Informatics Forum, 10 Crichton
More informationLarge-Scale Face Manifold Learning
Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random
More informationGaussian Process Latent Variable Models for Visualisation of High Dimensional Data
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science University of Sheffield Regent Court, 211 Portobello Street, Sheffield,
More informationNon-linear dimension reduction
Sta306b May 23, 2011 Dimension Reduction: 1 Non-linear dimension reduction ISOMAP: Tenenbaum, de Silva & Langford (2000) Local linear embedding: Roweis & Saul (2000) Local MDS: Chen (2006) all three methods
More information3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis
3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis Kooksang Moon and Vladimir Pavlović Department of Computer Science, Rutgers University Piscataway, NJ 885 {ksmoon, vladimir}@cs.rutgers.edu
More informationA Formal Approach to Score Normalization for Meta-search
A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003
More informationPart based models for recognition. Kristen Grauman
Part based models for recognition Kristen Grauman UT Austin Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily
More informationA Stochastic Optimization Approach for Unsupervised Kernel Regression
A Stochastic Optimization Approach for Unsupervised Kernel Regression Oliver Kramer Institute of Structural Mechanics Bauhaus-University Weimar oliver.kramer@uni-weimar.de Fabian Gieseke Institute of Structural
More informationTracking People on a Torus
IEEE TRANSACTION ON PATTERN RECOGNITION AND MACHINE INTELLIGENCE (UNDER REVIEW) Tracking People on a Torus Ahmed Elgammal Member, IEEE, and Chan-Su Lee, Student Member, IEEE, Department of Computer Science,
More informationNon-Local Manifold Tangent Learning
Non-Local Manifold Tangent Learning Yoshua Bengio and Martin Monperrus Dept. IRO, Université de Montréal P.O. Box 1, Downtown Branch, Montreal, H3C 3J7, Qc, Canada {bengioy,monperrm}@iro.umontreal.ca Abstract
More informationInferring 3D from 2D
Inferring 3D from 2D History Monocular vs. multi-view analysis Difficulties structure of the solution and ambiguities static and dynamic ambiguities Modeling frameworks for inference and learning top-down
More informationPreviously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011
Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition
More informationThe Laplacian Eigenmaps Latent Variable Model
The Laplacian Eigenmaps Latent Variable Model with applications to articulated pose tracking Miguel Á. Carreira-Perpiñán EECS, UC Merced http://faculty.ucmerced.edu/mcarreira-perpinan Articulated pose
More informationFast Algorithms for Large Scale Conditional 3D Prediction
Fast Algorithms for Large Scale Conditional 3D Prediction Liefeng Bo 1, Cristian Sminchisescu 2,1 1 TTI-C, 2 University of Bonn Atul Kanaujia 3, Dimitris Metaxas 3 3 Rutgers University Abstract The potential
More informationSelection of Scale-Invariant Parts for Object Class Recognition
Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract
More informationClustering web search results
Clustering K-means Machine Learning CSE546 Emily Fox University of Washington November 4, 2013 1 Clustering images Set of Images [Goldberger et al.] 2 1 Clustering web search results 3 Some Data 4 2 K-means
More informationSparse Manifold Clustering and Embedding
Sparse Manifold Clustering and Embedding Ehsan Elhamifar Center for Imaging Science Johns Hopkins University ehsan@cis.jhu.edu René Vidal Center for Imaging Science Johns Hopkins University rvidal@cis.jhu.edu
More informationTracking Human Body Pose on a Learned Smooth Space
Boston U. Computer Science Tech. Report No. 2005-029, Aug. 2005. Tracking Human Body Pose on a Learned Smooth Space Tai-Peng Tian Rui Li Stan Sclaroff Computer Science Department Boston University Boston,
More informationDeep Generative Models Variational Autoencoders
Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative
More informationA Hierarchial Model for Visual Perception
A Hierarchial Model for Visual Perception Bolei Zhou 1 and Liqing Zhang 2 1 MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems, and Department of Biomedical Engineering, Shanghai
More informationAction recognition in videos
Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit
More informationMultiple cosegmentation
Armand Joulin, Francis Bach and Jean Ponce. INRIA -Ecole Normale Supérieure April 25, 2012 Segmentation Introduction Segmentation Supervised and weakly-supervised segmentation Cosegmentation Segmentation
More informationLocal Gaussian Processes for Pose Recognition from Noisy Inputs
FERGIE, GALATA: LOCAL GAUSSIAN PROCESS REGRESSION WITH NOISY INPUTS 1 Local Gaussian Processes for Pose Recognition from Noisy Inputs Martin Fergie mfergie@cs.man.ac.uk Aphrodite Galata a.galata@cs.man.ac.uk
More informationDay 3 Lecture 1. Unsupervised Learning
Day 3 Lecture 1 Unsupervised Learning Semi-supervised and transfer learning Myth: you can t do deep learning unless you have a million labelled examples for your problem. Reality You can learn useful representations
More informationSupplementary Material: The Emergence of. Organizing Structure in Conceptual Representation
Supplementary Material: The Emergence of Organizing Structure in Conceptual Representation Brenden M. Lake, 1,2 Neil D. Lawrence, 3 Joshua B. Tenenbaum, 4,5 1 Center for Data Science, New York University
More informationLearning Alignments from Latent Space Structures
Learning Alignments from Latent Space Structures Ieva Kazlauskaite Department of Computer Science University of Bath, UK i.kazlauskaite@bath.ac.uk Carl Henrik Ek Faculty of Engineering University of Bristol,
More informationMachine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart
Machine Learning The Breadth of ML Neural Networks & Deep Learning Marc Toussaint University of Stuttgart Duy Nguyen-Tuong Bosch Center for Artificial Intelligence Summer 2017 Neural Networks Consider
More informationGaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions
Gaussian Process Motion Graph Models for Smooth Transitions among Multiple Actions Norimichi Ukita 1 and Takeo Kanade Graduate School of Information Science, Nara Institute of Science and Technology The
More informationHierarchical Gaussian Process Latent Variable Models
Neil D. Lawrence neill@cs.man.ac.uk School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, U.K. Andrew J. Moore A.Moore@dcs.shef.ac.uk Dept of Computer
More informationBilevel Sparse Coding
Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional
More informationPart-based and local feature models for generic object recognition
Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza
More informationCombining PGMs and Discriminative Models for Upper Body Pose Detection
Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative
More informationMarkov Random Fields and Gibbs Sampling for Image Denoising
Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationImplicit Mixtures of Restricted Boltzmann Machines
Implicit Mixtures of Restricted Boltzmann Machines Vinod Nair and Geoffrey Hinton Department of Computer Science, University of Toronto 10 King s College Road, Toronto, M5S 3G5 Canada {vnair,hinton}@cs.toronto.edu
More informationProbabilistic Graphical Models
Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational
More informationCSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..
More informationEstimating Human Pose in Images. Navraj Singh December 11, 2009
Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationHuman Upper Body Pose Estimation in Static Images
1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project
More informationBlind Image Deblurring Using Dark Channel Prior
Blind Image Deblurring Using Dark Channel Prior Jinshan Pan 1,2,3, Deqing Sun 2,4, Hanspeter Pfister 2, and Ming-Hsuan Yang 3 1 Dalian University of Technology 2 Harvard University 3 UC Merced 4 NVIDIA
More informationMotion Interpretation and Synthesis by ICA
Motion Interpretation and Synthesis by ICA Renqiang Min Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, ON M5S3G4, Canada Abstract. It is known that high-dimensional
More information22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos
Machine Learning for Computer Vision 1 22 October, 2012 MVA ENS Cachan Lecture 5: Introduction to generative models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris
More informationDiscovering Visual Hierarchy through Unsupervised Learning Haider Razvi
Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping
More informationFeature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking
Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)
More informationObject Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision
Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition
More informationMetric learning approaches! for image annotation! and face recognition!
Metric learning approaches! for image annotation! and face recognition! Jakob Verbeek" LEAR Team, INRIA Grenoble, France! Joint work with :"!Matthieu Guillaumin"!!Thomas Mensink"!!!Cordelia Schmid! 1 2
More informationSELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM. Olga Kouropteva, Oleg Okun and Matti Pietikäinen
SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM Olga Kouropteva, Oleg Okun and Matti Pietikäinen Machine Vision Group, Infotech Oulu and Department of Electrical and
More informationSupervised Learning for Image Segmentation
Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.
More informationData Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\
Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured
More informationHead Frontal-View Identification Using Extended LLE
Head Frontal-View Identification Using Extended LLE Chao Wang Center for Spoken Language Understanding, Oregon Health and Science University Abstract Automatic head frontal-view identification is challenging
More information( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components
Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues
More informationNon-Local Estimation of Manifold Structure
Non-Local Estimation of Manifold Structure Yoshua Bengio, Martin Monperrus and Hugo Larochelle Département d Informatique et Recherche Opérationnelle Centre de Recherches Mathématiques Université de Montréal
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationClassification of objects from Video Data (Group 30)
Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time
More informationComputer Vision II Lecture 14
Computer Vision II Lecture 14 Articulated Tracking I 08.07.2014 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Outline of This Lecture Single-Object Tracking Bayesian
More informationTechnical Report. Title: Manifold learning and Random Projections for multi-view object recognition
Technical Report Title: Manifold learning and Random Projections for multi-view object recognition Authors: Grigorios Tsagkatakis 1 and Andreas Savakis 2 1 Center for Imaging Science, Rochester Institute
More informationInternational Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10
BUNDLE ADJUSTMENT FOR MARKERLESS BODY TRACKING IN MONOCULAR VIDEO SEQUENCES Ali Shahrokni, Vincent Lepetit, Pascal Fua Computer Vision Lab, Swiss Federal Institute of Technology (EPFL) ali.shahrokni,vincent.lepetit,pascal.fua@epfl.ch
More informationHuman Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter
Human Motion Change Detection By Hierarchical Gaussian Process Dynamical Model With Particle Filter Yafeng Yin, Hong Man, Jing Wang, Guang Yang ECE Department, Stevens Institute of Technology Hoboken,
More informationMultiview Feature Learning
Multiview Feature Learning Roland Memisevic Frankfurt, Montreal Tutorial at IPAM 2012 Roland Memisevic (Frankfurt, Montreal) Multiview Feature Learning Tutorial at IPAM 2012 1 / 163 Outline 1 Introduction
More informationDiscovering Shared Structure in Manifold Learning
Discovering Shared Structure in Manifold Learning Yoshua Bengio and Martin Monperrus Dept. IRO, Université de Montréal P.O. Box 1, Downtown Branch, Montreal, H3C 3J7, Qc, Canada {bengioy,monperrm}@iro.umontreal.ca
More informationExperimental Analysis of GTM
Experimental Analysis of GTM Elias Pampalk In the past years many different data mining techniques have been developed. The goal of the seminar Kosice-Vienna is to compare some of them to determine which
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationDifferential Structure in non-linear Image Embedding Functions
Differential Structure in non-linear Image Embedding Functions Robert Pless Department of Computer Science, Washington University in St. Louis pless@cse.wustl.edu Abstract Many natural image sets are samples
More information