Human Shape from Silhouettes using Generative HKS Descriptors and Cross-Modal Neural Networks


Endri Dibra 1, Himanshu Jain 1, Cengiz Öztireli 1, Remo Ziegler 2, Markus Gross 1
1 Department of Computer Science, ETH Zürich, 2 Vizrt
{edibra,cengizo,grossm}@inf.ethz.ch, jainh@student.ethz.ch, rziegler@vizrt.com

Abstract

In this work, we present a novel method for capturing human body shape from a single scaled silhouette. We combine deep correlated features capturing different 2D views, and embedding spaces based on 3D cues, in a novel convolutional neural network (CNN) based architecture. We first train a CNN to find a richer body shape representation space from pose-invariant 3D human shape descriptors. Then, we learn a mapping from silhouettes to this representation space, with the help of a novel architecture that exploits correlation of multi-view data during training time to improve prediction at test time. We extensively validate our results on synthetic and real data, demonstrating significant improvements in accuracy as compared to the state-of-the-art, and providing a practical system for detailed human body measurements from a single image.

1. Introduction

Human body shape estimation has recently received a lot of interest. This partially relates to the growth in demand for applications such as tele-presence, virtual and augmented reality, virtual try-on, and body health monitoring. For such applications, having an accurate and practical system that estimates the 3D human body shape is of crucial importance. It needs to be accurate, such that automated body measurements agree with the real ones, and it needs to be practical, such that it is fast and utilizes as few sensors as possible. With respect to the sensors utilized, in increasing order of simplicity, we can distinguish multiple cameras [10, 46], RGB and depth [25], or a single image [20, 67, 29, 23, 5, 17].

In this work we tackle the problem of shape estimation from one or multiple silhouettes of a human body, with poses compliant with two main applications: virtual garment fitting, assuming a neutral pose [13, 6, 16], and shape from individually taken pictures or selfies (e.g. through a mirror or a long selfie stick), assuming poses that exhibit mild self-occlusion [17]. Compared to the state-of-the-art in this domain, we achieve significantly higher accuracy on the reconstructed body shapes and simultaneously improve in speed if a GPU implementation is considered (or obtain run times similar to previous works [17] on the CPU). This is achieved thanks to a novel neural network architecture (Fig. 1) consisting of various components that (a) are able to learn a body shape representation from 3D shape descriptors and map this representation to 3D shapes, (b) can successfully reconstruct a 3D body mesh from one or two given body silhouettes, and (c) can leverage multi-view data at training time to boost predictions for a single view at test time through cross-modality learning.

Previous methods attempt to find a mapping from silhouettes to the parameters of a statistical body shape model [2], utilizing handcrafted features [17], silhouette PCA representations [13], possibly with local fine-tuning [6], or CNNs [16]. Based on the obtained parameters, a least-squares system is solved to obtain the final mesh.
We also use CNNs to learn silhouette features, but unlike [16], we first map them to a shape representation space that is generated from 3D shape descriptors (the Heat Kernel Signature (HKS) [57]), invariant to isometric deformations and maximizing intra-human-class variation, and then decode them to full body vertex positions. Regressing to this space improves the predictions and speeds up the computation. Recently, Dibra et al. [17] demonstrated how to boost features coming from one view (scaled frontal) at test time, utilizing information from two views (front and side) at training time, by projecting features with Canonical Correlation Analysis (CCA) [26] for a regression task. CCA comes with shortcomings though, as (1) it computes a linear projection, (2) it is hard in practice to extend it to more than two views, and (3) it suffers from a lack of scalability to large datasets, as it has to memorize the whole training set. As part of this work, we propose an architecture, which we call the Cross-Modal Neural Network (CMNN), that is able to overcome the mentioned challenges by first generating features from various views separately, and then combining them through shared layers. This leads to improvements in predictive capabilities with respect to the uni-modal case.
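For concreteness, the following minimal sketch shows the kind of two-view CCA baseline this replaces; the feature matrices are synthetic stand-ins, and the point is that the learned mapping is a fixed linear projection per view.

```python
# Minimal sketch of a two-view CCA baseline (synthetic stand-in features).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
front_feats = rng.normal(size=(1000, 128))  # hypothetical front-view features
side_feats = rng.normal(size=(1000, 128))   # hypothetical side-view features

cca = CCA(n_components=32)
cca.fit(front_feats, side_feats)            # learns *linear* projections only
front_proj = cca.transform(front_feats)     # projection applied at test time
```

Extending this to three or more views, or to non-linear correlations between high-level features, is exactly what the shared layers of the CMNN provide instead.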

[Figure 1: HKS on a 3D mesh, projected to a global descriptor, is encoded into a learned embedding space and decoded to a reconstructed 3D mesh (loss L_Vert); scaled-front, unscaled-front, and scaled-side silhouettes feed convolutional branches with weight-shared fully connected layers (losses L_SF, L_UF, L_SS); a two-view branch shares weights during convolutions and merges through a max operation (loss L_Two-View).]
Figure 1. Our body shape estimation method. (1) HKS-Net: takes HKS projected features as input and generates an embedding space which is mapped to 3D meshes. (2), (3) and (4): three modes of the Cross-Modal Neural Network (CMNN) (only (2) is used at test time). (5) An architecture that requires both views at test time. The method uses either the CMNN or (5), depending on the number of available input views.

Abstracting away from silhouettes, this network can be used as-is for other tasks where multiple views of the data are present, such as image and text retrieval, or audio and image matching.

In summary, the contributions of this paper are: (1) a novel neural network architecture for 3D body shape estimation from silhouettes consisting of three main components, (a) a generative component that can invert a pose-invariant 3D shape descriptor to reconstruct its neutral shape, (b) a predictive component that combines 2D and 3D cues to map silhouettes to human body shapes, and (c) a cross-modal component that leverages multi-view information to boost single-view predictions; and (2) a state-of-the-art system for human body shape estimation that significantly improves accuracy as compared to existing methods.

2. Related Work

Human Body Shape from an Image. Early works in estimating 3D body shapes make assumptions on the number of views [33] or use simple geometric models [30, 41], often achieving only coarse approximations of the underlying geometry. As scanning of a multitude of people in various poses and shapes was made possible [47], more complete, parametric human body shape models were learned [2, 24, 42, 36] that capture deformations due to shape and pose. The effectiveness of such models with human priors gave rise to methods that try to estimate the human body shape from single [20, 67, 29, 23, 12, 46] or multiple input images [4, 10, 23, 46], by estimating the parameters of the model through matching projected silhouettes of the 3D shapes to extracted image silhouettes by correspondence. Assumptions on the view, calibration, and error metrics [10, 20, 29], and especially the speed and manual interaction needed to estimate pose and shape by silhouette matching in the presence of occlusions and challenging poses [67, 29, 12], are common limitations of these methods, despite promising work to automate the matching process [52, 53, 32]. A very recent work by Bogo et al. [5] attempts to estimate both the 3D pose and shape from a single 2D image with given 2D joints, making use of a 3D shape model based on skinning weights [36]. It utilizes a human body prior as a regularizer for uncommon limb lengths or body interpenetrations, achieving excellent results on 3D pose estimation; however, it lacks an accuracy analysis of the generated body shapes.
While the above-mentioned works tackle the shape estimation problem by iteratively minimizing an energy function, another body of work estimates the 3D body shape by first constructing statistical models of 2D silhouette features and 3D bodies, and then defining a mapping between the parameters of each model [62, 55, 13, 15, 14, 6, 17].

In terms of silhouette representation, these vary from PCA-learned silhouette descriptors [13, 6] to handcrafted features such as Radial Distance Functions and Shape Contexts [55], or the Weighted Normal Depth and Curvature [17]. The statistical 3D body model is learned by applying PCA on triangle deformations from an average human body shape [2]. With respect to the body parameter estimation, Xi et al. [62] utilize a linear mapping, Sigal et al. [55] a mixture of kernel regressors, Chen et al. [13] a shared Gaussian process latent variable model, Dibra et al. [17] a combination of projections onto correlated spaces and random forest regressors, and Boisvert et al. [6] an initial mapping with the method from [13], which is further refined by an optimization procedure with local fitting. The mentioned methods target applications similar to ours; however, except for [17], they lack practicality for interactive applications due to their running times, and have been evaluated under more restrictive assumptions with respect to camera calibration, poses, and the number of views required. Under similar settings, a more recent work [16] attempts to find a mapping from the image directly, by training an end-to-end neural network to regress to body shape parameters. In contrast to these methods, we first learn an embedding space from 3D shape descriptors that are invariant to isometric deformations, by training a CNN to regress directly to 3D body shape vertices. Then we learn a mapping from 2D silhouette images to this new embedding space. We demonstrate improved performance over the previous methods [16, 6] working under restrictive assumptions (two views and known camera calibration) with this set-up. Finally, by incorporating cross-modality learning from multiple views, we also outperform Dibra et al. [17] under a more general setting (one view and unknown camera calibration).

CNNs on 3D shapes. The improvement in accuracy and performance obtained by utilizing neural networks for 2D image related tasks is widely acknowledged in the community by now. Once one goes to 3D, one of the main paradigms is to represent the data as a low-resolution voxelized grid [61, 56, 48]. This representation has mainly been utilized for shape classification and retrieval tasks [61, 56, 51], or to find a mapping from 2D view representations of those shapes [48], and has been geared towards rigid objects (like chairs, tables, and cars). Another possibility to represent the 3D shape, stemming more from the computer graphics community, is that of 3D shape descriptors, which have been extensively studied for shape matching and retrieval [28, 58, 59]. Various shape descriptors have been proposed, with the most recent approaches being diffusion based [57, 9, 49]. Based on the Laplace-Beltrami operator, which can robustly characterize the points on a meshed surface, some of the proposed descriptors are the Global Point Signature (GPS) [49], the Heat Kernel Signature (HKS) [57], and the Wave Kernel Signature (WKS) [3]. Further works build on these and related descriptors and learn better descriptors, mainly through CNNs, that are utilized in shape retrieval, classification, and especially shape matching [44, 7, 8, 38, 39, 60, 63, 35, 18]. Their main objective is either to maximize the inter-class variance or to design features that find intra-class similarities. We, on the other hand, want to find suitable descriptors that maximize intra-class variance (here, among human body shapes), and learn a mapping by regression to 3D body shapes, which to the best of our knowledge has not been explored.
Due to the properties of the HKS, such as invariance to isometric deformations and insensitivity to small perturbations of the surface, which are very desirable in order to consistently explain the same human body shape under varying non-rigid deformations, we start from the HKS and encode it into a new shape embedding space, from which we can decode the full body mesh, or to which we can regress possible views of the bodies. In this way, our method can be thought of as a generative technique that learns an inverse mapping, from the descriptor space to the shape space.

Cross-Modality Learning. In the presence of multiple views or modalities representing the same data, unsupervised learning techniques have been proposed that leverage such modalities during training to learn better representations that remain useful when one of the modalities is missing at test time. Several applications rely on learning common representations, including (1) transfer learning, (2) reconstruction of a missing view, (3) matching across views, and, directly related to our work, (4) boosting single-view performance utilizing data from other views, otherwise called cross-modality learning. Early works, like Canonical Correlation Analysis (CCA) [26] and its kernelized version [22], find maximally correlated linear and non-linear projections of two random vectors, with the intention of maximizing mutual information and minimizing individual noise. Fusing learned features for better prediction [50], hallucinating multiple modalities from a single view [40], as well as a generalized version of CCA [54] for classification and retrieval tasks, have been proposed. Except for a few works [17, 40], utilizing cross-modality learning to improve regression has received little attention. To tackle the inability of CCA to scale well to large datasets, there have been recent attempts that utilize neural networks, like Deep CCA [1] and its GPU counterpart [64], Multimodal Autoencoders [43], and Correlational Neural Networks [11], but these methods do not focus on boosting single-view predictions. Unlike these techniques, we present a way to perform cross-modality learning by first learning representative features through CNNs, and then passing them through shared encoding layers, with the objective of regressing to the embedding space. We demonstrate a significant increase in performance over uni-modal predictions, and scalability to high-dimensional, large-scale data.

3. The Generative and Cross-Modal Estimator

The main goal of our method is to accurately estimate a 3D body shape from a silhouette (or two) of a person adopting poses in compliance with two applications: virtual cloth fitting and self shape monitoring. On par with the related work, we consider either a single frontal silhouette scaled to a fixed size (no camera calibration information) with poses exhibiting mild self-occlusions, or two views simultaneously (front and side, scaled or unscaled) of a person in a neutral pose. We propose to tackle this problem with a deep network architecture (Fig. 1). Our network is composed of three core components: a generative component that can invert pose-invariant 3D shape descriptors, obtained from a multitude of 3D meshes (Sec. 3.1), to their corresponding 3D shape, by learning an embedding space (Sec. 3.2); a cross-modal component that leverages multi-view information at training time to boost single-view predictions at test time (Sec. 3.3); and a combination of losses to perform joint training over the whole network (Sec. 3.4).

3.1. Shape Model and Data Generation

In order to properly train our network, we resort to synthetic data, as in previous works, since it best approximates our real input requirements. We need to obtain a large number of meshes from which we can extract 3D descriptors and 2D silhouettes in various poses. We make use of existing datasets [66, 45] consisting of meshes fitted to the commercially available CAESAR [47] dataset, which contains 3D human body scans. Starting from these datasets, we can generate hundreds of thousands of human body meshes by learning a statistical model. The methods we compare to [62, 13, 6, 16, 17] utilize a low-dimensional parametric human model (SCAPE [2]) that is based on triangle deformations learned from 3D range scans of people in various shapes and poses. Despite more recent body models [42, 36], for fair comparisons and evaluation we also utilize SCAPE, which is defined as a set of triangle deformations applied to a reference template 3D mesh consisting of 6449 vertices, with parameters α and β representing pose and intrinsic body shape deformations, respectively. From these parameters, each edge e_{i1} and e_{i2} of the i-th triangle of the template mesh, defined as the difference vectors between the vertices of the triangle, is transformed as

    e'_{ij} = R_i(α) S_i(β) Q_i(R_i(α)) e_{ij},    (1)

with j ∈ {1, 2}. The matrices R_i(α), Q_i(R_i(α)) and S_i(β) correspond to joint rotations, pose-induced non-rigid deformations, and intrinsic shape variation, respectively. Similar to [16, 17], we learn a deformation space by applying PCA to the set of deformations for all meshes in the datasets, with respect to a template mesh, all in the same pose. To synthesize new meshes, we sample from a 20-dimensional multivariate normal distribution, given by the first 20 components obtained via PCA, which capture 95% of the energy. Under the common assumption that the intrinsic body shape does not change significantly due to pose changes [2], we decouple pose from shape deformations. Hence, for a neutral pose we have e'_{ij} = S_i(β) e_{ij}.
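As a concrete illustration of this sampling step, here is a small numpy sketch; the PCA statistics (pca_mean, pca_components, pca_stddevs) and the triangle count are hypothetical placeholders standing in for quantities fitted on the training deformations.

```python
# Hedged sketch of shape synthesis: sample 20-D shape coefficients from a
# multivariate normal and map them back to per-triangle deformations.
import numpy as np

n_triangles = 12894                        # hypothetical template triangle count
d = 9 * n_triangles                        # one 3x3 deformation per triangle
rng = np.random.default_rng(42)

pca_mean = np.zeros(d)                     # placeholder PCA mean
pca_components = rng.normal(size=(20, d))  # placeholder: first 20 PCA directions
pca_stddevs = np.linspace(3.0, 0.5, 20)    # placeholder per-component std. devs

beta = rng.normal(size=20) * pca_stddevs   # sampled intrinsic shape parameters
deformations = pca_mean + beta @ pca_components
S = deformations.reshape(n_triangles, 3, 3)  # S_i(beta) for each triangle i
# Applying each S_i to the template edges e_i1, e_i2 and solving the usual
# least-squares system for the vertices yields a neutral-pose mesh.
```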
To add pose variation in the mesh synthesis process, instead of the transformation R_i(α) parametrized by α, we utilize Linear Blend Skinning (LBS) [34], as in previous works [29, 19, 65]. LBS computes the new position of each rest-pose vertex v_1, ..., v_n ∈ R^4 (in homogeneous coordinates) as a weighted combination of the bone transformation matrices T_1, ..., T_m ∈ R^{4×4} of an embedded skeleton controlling the mesh, with skinning weights w_{i,1}, ..., w_{i,m} ∈ R for vertex v_i and the m bone transformations:

    v'_i = Σ_{j=1}^{m} w_{i,j} T_j v_i = ( Σ_{j=1}^{m} w_{i,j} T_j ) v_i.    (2)

Combining various intrinsic shapes and poses as generated above, we create a synthetic dataset consisting of half a million meshes, from which we extract HKS descriptors and silhouettes for training.
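A minimal numpy sketch of the skinning step of Eq. (2), assuming the bone transformations and skinning weights are given:

```python
# Minimal linear blend skinning (Eq. 2) in homogeneous coordinates.
import numpy as np

def linear_blend_skinning(verts, bone_transforms, weights):
    """verts: (n, 3) rest-pose positions; bone_transforms: (m, 4, 4);
    weights: (n, m) skinning weights with rows summing to 1."""
    n = verts.shape[0]
    verts_h = np.hstack([verts, np.ones((n, 1))])                 # (n, 4)
    # Blend the bone matrices per vertex: sum_j w_ij T_j -> (n, 4, 4)
    blended = np.einsum('nm,mij->nij', weights, bone_transforms)
    posed_h = np.einsum('nij,nj->ni', blended, verts_h)           # (n, 4)
    return posed_h[:, :3] / posed_h[:, 3:4]
```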

3.2. Generating 3D Shapes from HKS (HKS-Net)

The first part of our architecture aims at learning a mapping from 3D shape descriptors to 3D meshes via a shape embedding space. We start by extracting Heat Kernel Signatures (HKS) and then projecting them onto the eigenvectors of the Laplace-Beltrami operator to obtain a global descriptor. This is used to learn the embedding space, as well as an inverse mapping that can generate 3D shapes in a neutral pose given the corresponding descriptor.

Heat Kernel Signatures (HKS). Let a 3D shape be represented as a graph G = (V, E, W), where V, E and W represent the set of vertices, edges, and some weights on the edges, respectively. The weights encode the underlying geometry of the shape, and can be computed via standard techniques from the mesh processing literature [57]. Given such a graph, constructed by connecting pairs of vertices on a surface with weighted edges, the heat kernel H_t(x, y) is defined as the amount of heat that is transferred from vertex x to vertex y at time t, given a unit heat source at x [57]:

    H_t(x, y) = Σ_i e^{−λ_i t} φ_i(x) φ_i(y),    (3)

where H_t denotes the heat kernel, t is the diffusion time, λ_i and φ_i represent the i-th eigenvalue and the corresponding eigenvector of the Laplace-Beltrami operator, respectively, and x and y denote two vertices. The heat kernel has various nice properties that are desirable for representing human body shapes under different poses. In particular, it is invariant under isometric deformations of the shape, captures different levels of detail and global properties of the shape, and is stable under perturbations [57]. The heat kernel at vertex x and time t can be used to define the heat kernel signature HKS_x(t) for this vertex:

    HKS_x(t) = H_t(x, x) = Σ_i e^{−λ_i t} φ_i^2(x).    (4)

Hence, for each vertex x, we have a corresponding function HKS_x(t) that provides a multi-scale descriptor for x. As the scale (i.e. t) increases, we capture more and more global properties of the intrinsic shape. In practice, the times t are sampled to obtain a vector HKS_x(t_j), j ∈ J, for each vertex x. In our technique, we use |J| = 100 time samples. Then, for each t_j, we can form the vectors h_j := [HKS_{x_1}(t_j), HKS_{x_2}(t_j), ...]^T.

Projected HKS Matrix. To learn the embedding space, the HKS for all vertices at a given time t_j are projected onto the eigenvectors of the Laplace-Beltrami operator in order to obtain a 2D image capturing the global intrinsic shape. Specifically, we compute a matrix M with M_{ij} = φ_i^T h_j, i.e. the dot product of the i-th eigenvector of the Laplace-Beltrami operator and the heat kernel vector defined over the vertices for time t_j. Since we use 300 eigenvectors φ_i, we thus get a 300 × 100 matrix M. This is then used as input to the top part of our network (which we call HKS-Net, Fig. 1 (1)) to learn an embedding space of about 4000 dimensions, by minimizing the per-vertex squared norm loss L_Vert. A simplified visualization of this embedding, computed using t-SNE [37], is also presented in Fig. 1, where female meshes are depicted as green dots and male meshes as red dots. An important property of HKS-Net is that it can reconstruct a 3D mesh in a neutral pose when presented with a computed M; hence, HKS-Net can invert the HKS descriptors. Although we do not utilize this property in the scope of this work, we believe that it could be a valuable tool for geometry processing applications. Instead, we use the 4000-dimensional embedding space as the target space for the cross-modal silhouette-based training of our network, which we explain next.
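Before moving on, here is a hedged numpy sketch of the descriptor computation just described, assuming the Laplace-Beltrami eigendecomposition (evals, evecs) has already been computed, e.g. with scipy.sparse.linalg.eigsh on a cotangent Laplacian, and with log-spaced diffusion times as an illustrative choice:

```python
# Sketch of the global descriptor: per-vertex HKS over 100 time samples,
# projected onto 300 Laplace-Beltrami eigenvectors to form the matrix M.
# `evals` has shape (300,) and `evecs` has shape (n_verts, 300); both are
# assumed given (eigendecomposition not shown).
import numpy as np

def projected_hks(evals, evecs, n_times=100):
    # Log-spaced diffusion times; an illustrative, commonly used choice.
    t = np.geomspace(1e-2, 1e2, n_times)
    # HKS_x(t_j) = sum_i exp(-lambda_i t_j) phi_i(x)^2 -> (n_verts, n_times)
    hks = (evecs ** 2) @ np.exp(-np.outer(evals, t))
    # M_ij = phi_i^T h_j: project each time slice onto eigenvector i.
    return evecs.T @ hks  # (300, n_times), the 2D input image for HKS-Net
```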
3.3. Cross-Modal Neural Network (CMNN)

The second component thus consists of finding a mapping from silhouettes to the newly learned embedding space. We generate five types of silhouettes, which we refer to as modes: frontal view, scaled, in various poses with minor self-occlusion; frontal view, scaled, in a neutral pose; side view, scaled, in a neutral pose; and front and side views, unscaled, in a neutral pose (Fig. 1). (Throughout the text, mode and view are used interchangeably to emphasize different ways of representing the same 3D mesh.) Here, unscaled implies known camera calibration, and scaled means we resize the silhouettes such that they have the same height. Frontal means that the plane of the bones that form the torso is parallel to the camera plane, and side is a 90-degree rotated version of the frontal view. At test time, our results are not affected by slight deviations from these views. We center the silhouettes and resize them to a fixed image resolution before inputting them to the CMNN. We, of course, do not expect to use all the modes/views at once during testing; our intention is to leverage the vast amount of data from the various modes at training time for robust predictions at test time.

We start by training a network similar in size to those of previous works [16] (5 convolutional and 3 dense layers), with the AdaMax optimizer [31] and a learning rate of 1e-4, to map each mode individually to the embedding space by minimizing squared losses on the 4000 embedding space parameters (Fig. 1 (2), (3) and (4), with the respective losses L_SF, L_UF and L_SS). As shown in Tab. 2, we already achieve better results for the one-view case as compared to related works. This pre-training serves as an initialization for the convolutional weights of the Cross-Modal Neural Network (CMNN). The final cross-modal training starts from the weights given by the pre-training and optimizes the shared weights of the fully connected layers with a combined loss; e.g. for scaled-front and scaled-side we minimize L_SF + L_SS, while for three modes the loss is L_SF + L_UF + L_SS. The idea is to let each single convolutional network compute silhouette features separately first, and then correlate these high-level features at later stages. We observed significant improvements when cross-correlating various combinations of two and three modes during training (Tab. 2), as compared to the uni-modal results. The CMNN offers several advantages as compared to CCA. First, we obtain a non-linear correlation between high-level features. Second, we can add as many modes as we want, while it is not trivial to correlate more than two spaces with CCA. Finally, we do not need to store all training data in memory, as in the case of CCA. One of the main focuses of this paper is estimating a 3D shape for the scaled-frontal case, with similar application scenarios as in previous works [17]. Hence, our desired test-time mode, i.e. the desired input at test time, is a silhouette from a frontal view with unknown camera parameters. Without loss of generality, we consider the unscaled-frontal and scaled-side as the additional modes. Note that this can be extended with more views and further variations.
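The following PyTorch sketch illustrates the cross-modal scheme: per-mode convolutional branches followed by weight-shared dense layers, trained with a summed loss. The branch and layer sizes are illustrative assumptions and not the paper's exact configuration.

```python
# Hedged PyTorch sketch of cross-modal training with shared dense layers.
import torch
import torch.nn as nn

def conv_branch():
    # Per-mode feature extractor (pre-trained per mode, kept separate).
    return nn.Sequential(
        nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
        nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
        nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten())

class CMNN(nn.Module):
    def __init__(self, n_modes=3, embed_dim=4000):
        super().__init__()
        self.branches = nn.ModuleList([conv_branch() for _ in range(n_modes)])
        # One dense stack reused for every mode: this is the weight sharing.
        self.shared = nn.Sequential(
            nn.Linear(128 * 16, 2048), nn.ReLU(),
            nn.Linear(2048, embed_dim))

    def forward(self, silhouettes):  # list of tensors, one per mode
        return [self.shared(b(x)) for b, x in zip(self.branches, silhouettes)]

# Combined cross-modal loss, e.g. L_SF + L_UF + L_SS, against embedding
# targets from HKS-Net; at test time only the scaled-frontal branch is used.
model = CMNN()
modes = [torch.randn(8, 1, 128, 128) for _ in range(3)]  # toy batch
target = torch.randn(8, 4000)                            # embedding targets
loss = sum(nn.functional.mse_loss(e, target) for e in model(modes))
loss.backward()
```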

3.4. Joint Training

Finally, we would like to jointly train HKS-Net and the CMNN to obtain the final generative network. This is done by using all losses at the same time and back-propagating them to all parts of the architecture. We thus perform a joint training with the HKS-Net by minimizing L_SF + L_UF + L_SS + L_Vert. This training not only improves the mappings from 2D silhouettes to the 3D meshes, but also improves the generative capabilities of the HKS-Net by learning a better embedding space (Tab. 2 and Tab. 3).

Two-View Case. We also consider the case when two complementary input silhouette images (front and side) are given simultaneously, which further allows comparisons to some of the related works [62, 13, 6, 16]. For this case, we mainly consider neutral poses. As the architecture, we use the HKS-Net along with a network similar to the one used in a recent work [16] (Fig. 1 (5)) where, unlike in the CMNN, the weight sharing is performed at early stages during convolutions, and the last convolutional layers are merged through a max-pooling operation. This is then trained with the sum of squared losses L_Two-View + L_Vert, on the embedding space and the mesh vertex locations, as before. Similarly, the mapping to the embedding space is decoded to a 3D mesh through a forward pass in the dense layers of the HKS-Net. This achieves better results than previous works [16], due to the newly learned embedding (Tab. 3).

Table 1. Nomenclature for the various experiments. For the architecture components highlighted in colors and with numbers, please refer to Fig. 1.

Name | Training Input | Test Input | Architecture
SF-1 | Scaled Frontal View (SFV), Neutral Pose | SFV | (2)
SF-1-P | SFV, Various Poses | SFV | (2)
SFU-1 | SFV, Unscaled Frontal View (UFV) | SFV | (2), (3)
SFS-1 | SFV, Scaled Side View (SSV) | SFV | (2), (4)
SFUS-1 | SFV, UFV, SSV | SFV | (2), (3), (4)
SFUS-HKS-1 | SFV, UFV, SSV, projected HKS (PHKS) | SFV | (1), (2), (3), (4)
SF-SS-2 | SFV, SSV | SFV, SSV | (5)
UF-US-2 | UFV, Unscaled Side View (USV) | UFV, USV | (5)
UF-US-HKS-2 | UFV, USV, PHKS | UFV, USV | (1), (5)

4. Experiments and Results

We have run an extensive set of experiments to ensure the reliability of our technique. In this section, we report results of our qualitative and quantitative tests, with thorough comparisons to the state-of-the-art. In order to quantitatively assess our method, we perform experiments on synthetic data, similarly to previous works [6, 17, 16], by computing errors on the same 16 body measurements widely utilized in tailor fitting, as shown in Fig. 2. Since all the methods we compare to, as well as ours, make use of the same shape model [2], the comparisons become more reliable through these measurements on estimated meshes in full correspondence. From the combined datasets [66, 45] of meshes fitted to real body scans, with duplicate removal ensured as in [16, 17], we set 1000 meshes apart for testing, and utilize the rest for generating the human body model and training data (Sec. 3.1). For these left-out meshes we then extract HKS descriptors and silhouettes in various views and poses.

[Figure 2: bar plot of the mean error (in mm) per method, with the measurements A-P illustrated on a mesh.]
Figure 2. Plot of the mean error over all body measurements, illustrated on a mesh, for the methods from Tab. 2 and Tab. 3.

We apply LBS [34] to deform the meshes into desired poses compliant with our applications (see supplementary). We run the methods from two previous works [17, 16] on the silhouettes extracted from these meshes, while for the others [62, 13, 6] we report the numbers from their experiments, performed on similar but fewer meshes (around 300). In addition to comparisons with the state-of-the-art, we thoroughly evaluate the added value of each component in our network.
In the end, we conclude with qualitative results and run-time evaluations.

Quantitative Experiments. The 16 measurements are calculated as follows: straight-line measurements are computed as Euclidean distances between two extreme vertices, while for the ellipsoidal ones we calculate the perimeter on the body surface (see the sketch below). For each measurement, we report the mean and standard deviation of the errors over all estimated meshes with respect to the ground truth ones. We report errors when only the frontal-view silhouette is utilized at test time in Tab. 2, and when both frontal and side view silhouettes are available at test time in Tab. 3. For both tables, we distinguish between two cases: known camera distance (unscaled) and unknown camera distance (called scaled in the subsequent analysis, since we scale the silhouettes to have the same height in this case, as elaborated in Sec. 3.3). The nomenclature for our experiments is summarized in Tab. 1. Note that for all methods in the tables, the errors are for a neutral pose, except for SF-1-P, where we show the error measures when we train and test using different poses. The mean error over all body measurements for the methods we consider is depicted in Fig. 2. Our best mean error for the one-view cross-modal case is 4.01 mm, and for the two-view case it is 3.77 mm, showing a very high accuracy for the tasks we consider. These are significantly better than the mean errors of the previous works, e.g. 10.8 mm [16], 11 mm [6], and 10.1 mm [25], as well as that of [17], even though some of these methods operate under more restrictive assumptions. Our best results, which achieve the state-of-the-art, are highlighted in bold.
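A sketch of the two measurement types is given below; the vertex indices and loops are hypothetical, but on a template in full correspondence they are fixed once and reused for all meshes.

```python
# Sketch of the two measurement types used for evaluation.
import numpy as np

def straight_measurement(verts, idx_a, idx_b):
    # e.g. overall height: Euclidean distance between two extreme vertices.
    return np.linalg.norm(verts[idx_a] - verts[idx_b])

def ellipsoidal_measurement(verts, loop_indices):
    # e.g. waist: perimeter of a closed vertex loop on the body surface.
    loop = verts[loop_indices]
    return np.linalg.norm(np.roll(loop, -1, axis=0) - loop, axis=1).sum()

# Reported numbers are the mean and std of |measure(est) - measure(gt)|,
# in millimeters, over all test meshes.
```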

Table 2. Body measurement error comparison with shapes reconstructed from one scaled frontal silhouette. The nomenclature is presented in Tab. 1. The last two columns show the results of the state-of-the-art methods. The measurements are illustrated in Fig. 2 (top-right). Errors are expressed as mean ± std. dev. in millimeters. Our best achieving method, SFUS-HKS-1, is highlighted.

Measurements | SF-1-P | SF-1 | SFS-1 | SFU-1 | SFUS-1 | SFUS-HKS-1 | HS-Net-1-S [16] | CCA-RF [17]
A. Head circumference | 4.3± | ± | ± | ± | ± | ±2.6 | 4±4 | 8±8
B. Neck circumference | 2.2± | ± | ± | ± | ± | ±1.7 | 8±5 | 7±7
C. Shoulder-blade/crotch length | 6.2± | ± | ± | ± | ± | ±3.8 | 20±15 | 18±17
D. Chest circumference | 6.7± | ± | ± | ± | ± | ±4.8 | 13±7 | 25±24
E. Waist circumference | 8.1± | ± | ± | ± | ± | ±5.2 | 19±13 | 24±24
F. Pelvis circumference | 9.3± | ± | ± | ± | ± | ±5.9 | 19±14 | 26±25
G. Wrist circumference | 2.1± | ± | ± | ± | ± | ±1.5 | 5±3 | 5±5
H. Bicep circumference | 3.9± | ± | ± | ± | ± | ±2.5 | 8±4 | 11±11
I. Forearm circumference | 3.1± | ± | ± | ± | ± | ±2.2 | 7±4 | 9±8
J. Arm length | 4.1± | ± | ± | ± | ± | ±2.4 | 12±8 | 13±12
K. Inside leg length | 7.3± | ± | ± | ± | ± | ±4.3 | 20±14 | 20±19
L. Thigh circumference | 6.3± | ± | ± | ± | ± | ±4.9 | 13±8 | 18±17
M. Calf circumference | 3.6± | ± | ± | ± | ± | ±2.5 | 12±7 | 12±12
N. Ankle circumference | 2.1± | ± | ± | ± | ± | ±1.3 | 6±3 | 6±6
O. Overall height | 12.6± | ± | ± | ± | ± | ±7.7 | 50±39 | 43±41
P. Shoulder breadth | 2.3± | ± | ± | ± | ± | ±1.7 | 4±4 | 6±6

Table 3. Same as Tab. 2, but with shapes reconstructed from two views at the same time. The last four columns show the results of other state-of-the-art methods for the same task. Our best achieving method, UF-US-HKS-2, is highlighted.

Measurements | SF-SS-2 | UF-US-2 | UF-US-HKS-2 | HS-2-Net-MM [16] | Boisvert et al. [6] | Chen et al. [15] | Xi et al. [62]
A. Head circumference | 3.9± | ± | ± | ±5.8 | 10±12 | 23±27 | 50±60
B. Neck circumference | 1.9± | ± | ± | ±3.1 | 11±13 | 27±34 | 59±72
C. Shoulder-blade/crotch length | 5.1± | ± | ± | ±7.0 | 4±5 | 52±65 | 119±150
D. Chest circumference | 5.4± | ± | ± | ± | ±12 | 18±22 | 36±45
E. Waist circumference | 7.5± | ± | ± | ± | ±23 | 37±39 | 55±62
F. Pelvis circumference | 8.0± | ± | ± | ± | ±12 | 15±19 | 23±28
G. Wrist circumference | 1.9± | ± | ± | ±2.7 | 9±12 | 24±30 | 56±70
H. Bicep circumference | 3.0± | ± | ± | ±4.9 | 17±22 | 59±76 | 146±177
I. Forearm circumference | 3.0± | ± | ± | ±4.2 | 16±20 | 76± | ±230
J. Arm length | 3.3± | ± | ± | ±6.4 | 15±21 | 53±73 | 109±141
K. Inside leg length | 5.6± | ± | ± | ±12.4 | 6±7 | 9±12 | 19±24
L. Thigh circumference | 5.8± | ± | ± | ±10.8 | 9±12 | 19±25 | 35±44
M. Calf circumference | 3.9± | ± | ± | ±6.5 | 6±7 | 16±21 | 33±42
N. Ankle circumference | 2.1± | ± | ± | ±3.2 | 14±16 | 28±35 | 61±78
O. Overall height | 10.6± | ± | ± | ±20.4 | 9±12 | 21±27 | 49±62
P. Shoulder breadth | 2.2± | ± | ± | ±3.9 | 6±7 | 12±15 | 24±31

For the one-view case (Tab. 2), one can see that as we go from uni-modal to cross-modal training, by using multiple views at training time and sharing weights in the fully connected layers, the errors consistently decrease. We show the effect of adding a scaled side view only (SFS-1), an unscaled frontal view only (SFU-1), and combining all three (SFUS-1). The lowest errors are achieved through the joint training (SFUS-HKS-1) of the CMNN and HKS-Net (Sec. 3.4). In this case, not only the accuracy of predictions from silhouettes, but also the accuracy of the HKS-Net itself is improved as compared to when it is trained separately, reducing the mean error over all the meshes from 4.74 to 3.77 mm. We further report results when different poses are applied to the test meshes (SF-1-P), in contrast to all other methods considered. Even in this case, the errors do not differ much from the neutral-pose case (SF-1), implying robustness to variations within the pose space we consider. For the two-view case, we compare to the results of the works that require two views at test time [6, 62, 13, 16].
We utilize the same camera calibration assumptions, and again achieve significant improvements in accuracy (UF-US-HKS-2), due to the new shape embedding space jointly trained with the prediction network. For the two-view case, we do not test on multiple poses, since the previous works we compare to are also tested on neutral poses for this particular application. One interesting observation here is that the results for the single-view cross-modal case (SFUS-1 in Tab. 2) are comparable to, and for some measurements even better than, those of the two-view network (SF-SS-2 in Tab. 3). Since no joint training was performed in either case, and the loss for both cases is in the shape embedding space, this demonstrates the importance of the shared fully connected layers and cross-modal training for boosting prediction performance at test time.

Qualitative Experiments. We evaluate our method on three test subjects from a previous work [17], in a neutral and a selfie pose, and on four new subjects with other poses. As can be observed in Fig. 3, our reconstructions resemble the real individuals more closely than those from Dibra et al. [17] (last column), especially for the second subject.

Figure 3. Results for predictions on the test images from Dibra et al. [17]. From left to right: the two input images in a rest and a selfie pose, the corresponding silhouettes, the estimated mesh by our method SF-1-P, and by the method of Dibra et al. [17].

Figure 4. Prediction results on four test subjects in different poses and with clothes. From left to right: input image, the corresponding silhouette, the estimated mesh by our method SF-1-P.

We additionally show mesh overlays over the input images, applied also to the method from Bogo et al. [5], in the supplementary. The results in Fig. 4 illustrate harder cases, where the silhouettes differ more from those of the training data due to clothing, poses, and occlusions. Our results still explain the silhouettes well in all cases.

Speed. The training of our network was performed on an Intel Core i7 CPU with an NVIDIA GTX 1080 (8 GB) GPU. It took around 50 min per epoch, with one epoch consisting of roughly 50,000 samples. The total training time for the various architectures considered in the experiments varies with the number of epochs required. We conducted our test-time experiments on an Intel Core i7 CPU with an NVIDIA GTX 940 (2 GB) GPU. Since our method directly outputs the vertices of a mesh, and does not need to solve a least-squares system (Eq. 1), it is much faster (0.15 seconds) than other methods when using the GPU for prediction. Even when using a CPU, our method takes about 0.3 seconds, similar to the fastest method [17], and less than the 6 seconds [6] and 0.45 seconds [16] reported in other previous works. As a result, our method scales to higher mesh resolutions, and can be directly used as an end-to-end pipeline outputting a full 3D mesh. With the advances in compressed deep networks (e.g. [21, 27]), it can potentially be ported to mobile devices, which is in line with our targeted application of shape from selfies. Finally, we performed a further experiment with noise added to the silhouettes, as in previous works [17, 16]. The method is robust to silhouette noise, with a mean error increase of 4.1 mm for high levels of noise. We present further results on poses, silhouette noise, and failure cases, and a comparison to CCA applied instead of our Cross-Modal Network, in the supplementary material.

5. Conclusion and Discussion

We presented a novel method for capturing a 3D human body shape from a single silhouette with unknown camera parameters. This is achieved by combining deep correlated features capturing different 2D views with embedding spaces based on 3D shape descriptors, in a novel CNN-based architecture. We extensively validated our results on synthetic and real data, demonstrating significant improvements in accuracy as compared to the state-of-the-art methods. We illustrated that each component of the architecture is important to achieve these improved results. Combined with the lowest running times among all the state-of-the-art methods, we thus provide a practical system for detailed human body measurements with millimetric accuracy. The proposed cross-modal neural network enhances features by incorporating information coming from different modalities at training time. The idea of such correlating networks can be extended to many other problems where privileged data is available, or where correlations among different data types (e.g. image, text, audio) are to be exploited.
HKS-Net-like architectures can be used for inverting shape descriptors, which can have various applications for understanding and generating shapes. Inferring 3D shapes from 2D projections is an ill-posed problem. As in previous works, we operate under mild occlusions and a certain level of silhouette noise, which are realistic assumptions for many scenarios including ours. However, especially for severe occlusions, we would need stronger priors to infer correct 3D shapes. We believe that extending our techniques to images with shading cues can provide accurate estimations even in such cases. Training covering different environments and textures would be necessary for this.

Acknowledgment. This work was funded by a KTI grant. We would like to thank Wan-Chun Alex Ma for the help with the datasets and Brian McWilliams for the valuable discussions about CCA.

References

[1] G. Andrew, R. Arora, J. A. Bilmes, and K. Livescu. Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, USA, June 2013.
[2] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. SCAPE: Shape completion and animation of people. In ACM SIGGRAPH 2005 Papers, New York, NY, USA, 2005. ACM.
[3] M. Aubry, U. Schlickewei, and D. Cremers. The wave kernel signature: A quantum mechanical approach to shape analysis. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011.
[4] A. O. Balan, L. Sigal, M. J. Black, J. E. Davis, and H. W. Haussecker. Detailed human shape and pose from images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, Minnesota, USA, June 2007.
[5] F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In European Conference on Computer Vision. Springer, 2016.
[6] J. Boisvert, C. Shu, S. Wuhrer, and P. Xi. Three-dimensional human shape inference from silhouettes: reconstruction and validation. Mach. Vis. Appl., 24(1), 2013.
[7] D. Boscaini, J. Masci, E. Rodolà, and M. M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. arXiv technical report, 2016.
[8] D. Boscaini, J. Masci, E. Rodolà, M. M. Bronstein, and D. Cremers. Anisotropic diffusion descriptors. Computer Graphics Forum, 35, 2016.
[9] A. M. Bronstein, M. M. Bronstein, R. Kimmel, M. Mahmoudi, and G. Sapiro. A Gromov-Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching. International Journal of Computer Vision, 89, 2010.
[10] A. O. Bălan and M. J. Black. The naked truth: Estimating body shape under clothing. In Proceedings of the 10th European Conference on Computer Vision (ECCV '08), Part II, pages 15-29, Berlin, Heidelberg, 2008. Springer-Verlag.
[11] S. Chandar, M. M. Khapra, H. Larochelle, and B. Ravindran. Correlational neural networks. Neural Computation, 28(2), 2016.
[12] X. Chen, Y. Guo, B. Zhou, and Q. Zhao. Deformable model for estimating clothed and naked human shapes from a single image. The Visual Computer, 29(11), 2013.
[13] Y. Chen and R. Cipolla. Learning shape priors for single view reconstruction. In ICCV Workshops. IEEE, 2009.
[14] Y. Chen, T. Kim, and R. Cipolla. Silhouette-based object phenotype recognition using 3D shape priors. In IEEE International Conference on Computer Vision (ICCV 2011), Barcelona, Spain, November 2011, pages 25-32.
[15] Y. Chen, T.-K. Kim, and R. Cipolla. Inferring 3D shapes and deformations from single views. In K. Daniilidis, P. Maragos, and N. Paragios, editors, ECCV (3), volume 6313 of Lecture Notes in Computer Science. Springer, 2010.
[16] E. Dibra, H. Jain, C. Öztireli, R. Ziegler, and M. Gross. HS-Nets: Estimating human body shape from silhouettes with convolutional neural networks. In International Conference on 3D Vision, October 2016.
[17] E. Dibra, C. Öztireli, R. Ziegler, and M. Gross. Shape from selfies: Human body shape estimation using CCA regression forests. In Computer Vision - ECCV 2016, 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V. Springer, 2016.
[18] Y. Fang, J. Xie, G. Dai, M. Wang, F. Zhu, T. Xu, and E. Wong. 3D deep shape descriptor. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[19] A. Feng, D. Casas, and A. Shapiro. Avatar reshaping and automatic rigging using a deformable model. In Proceedings of the 8th ACM SIGGRAPH Conference on Motion in Games, pages 57-64, Paris, France, November 2015. ACM Press.
[20] P. Guan, A. Weiss, A. O. Balan, and M. J. Black. Estimating human shape and pose from a single image. In IEEE 12th International Conference on Computer Vision (ICCV 2009), Kyoto, Japan, September-October 2009.
[21] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR, 2015.
[22] D. R. Hardoon, S. R. Szedmak, and J. R. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), December 2004.
[23] N. Hasler, H. Ackermann, B. Rosenhahn, T. Thormählen, and H. Seidel. Multilinear pose and body shape estimation of dressed subjects from image sets. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), San Francisco, CA, USA, June 2010.
[24] N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H. Seidel. A statistical model of human pose and body shape. Computer Graphics Forum, 28(2), 2009.
[25] T. Helten, A. Baak, G. Bharaj, M. Müller, H. Seidel, and C. Theobalt. Personalization and evaluation of a real-time depth-based full body tracker. In 2013 International Conference on 3D Vision (3DV 2013), Seattle, Washington, USA, June-July 2013.
[26] H. Hotelling. Relations between two sets of variates. Biometrika, 28(3/4), December 1936.

[27] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, and K. Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv preprint, 2016.
[28] N. Iyer, S. Jayanti, K. Lou, Y. Kalyanaraman, and K. Ramani. Three-dimensional shape searching: state-of-the-art review and future trends. Computer-Aided Design, 37(5), 2005.
[29] A. Jain, T. Thormählen, H.-P. Seidel, and C. Theobalt. MovieReshape: Tracking and reshaping of humans in videos. ACM Trans. Graph. (Proc. SIGGRAPH Asia 2010), 29(5), 2010.
[30] I. A. Kakadiaris and D. Metaxas. Three-dimensional human body model acquisition from multiple views. International Journal of Computer Vision, 30(3), 1998.
[31] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint, 2014.
[32] Z. Lähner, E. Rodolà, F. R. Schmidt, M. M. Bronstein, and D. Cremers. Efficient globally optimal 2D-to-3D deformable shape matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[33] A. Laurentini. The visual hull concept for silhouette-based image understanding. IEEE Trans. Pattern Anal. Mach. Intell., 16(2), February 1994.
[34] J. P. Lewis, M. Cordner, and N. Fong. Pose space deformation: A unified approach to shape interpolation and skeleton-driven deformation. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '00), New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.
[35] R. Litman and A. M. Bronstein. Learning spectral descriptors for deformable shape correspondence. IEEE Trans. Pattern Anal. Mach. Intell., 36(1), 2014.
[36] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graph., 34(6):248:1-248:16, October 2015.
[37] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2008.
[38] J. Masci, D. Boscaini, M. M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In Proc. of the IEEE International Conference on Computer Vision (ICCV) Workshops, pages 37-45, 2015.
[39] J. Masci, D. Boscaini, M. M. Bronstein, and P. Vandergheynst. ShapeNet: neural networks on non-Euclidean manifolds. CoRR, 2015.
[40] B. McWilliams, D. Balduzzi, and J. M. Buhmann. Correlated random features for fast semi-supervised learning. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26. Curran Associates, Inc., 2013.
[41] I. Mikic, M. Trivedi, E. Hunter, and P. Cosman. Human body model acquisition and tracking using voxel data. International Journal of Computer Vision, 53(3), 2003.
[42] A. Neophytou and A. Hilton. Shape and pose space deformation for subject specific animation. In Proceedings of the 2013 International Conference on 3D Vision (3DV '13), Washington, DC, USA, 2013. IEEE Computer Society.
[43] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), Bellevue, Washington, USA, June-July 2011.
[44] D. Pickup, X. Sun, P. L. Rosin, R. R. Martin, Z. Cheng, Z. Lian, M. Aono, A. B. Hamza, A. Bronstein, M. Bronstein, S. Bu, U. Castellani, S. Cheng, V. Garro, A. Giachetti, A. Godil, J. Han, H. Johan, L. Lai, B. Li, C. Li, H. Li, R. Litman, X. Liu, Z. Liu, Y. Lu, A. Tatsuma, and J. Ye. Shape retrieval of non-rigid 3D human models. In Proceedings of the 7th Eurographics Workshop on 3D Object Retrieval (3DOR '15), Aire-la-Ville, Switzerland. Eurographics Association.
[45] L. Pishchulin, S. Wuhrer, T. Helten, C. Theobalt, and B. Schiele. Building statistical shape spaces for 3D human modeling. CoRR, 2015.
[46] H. Rhodin, N. Robertini, D. Casas, C. Richardt, H. Seidel, and C. Theobalt. General automatic human shape and motion capture using volumetric contour cues. In Computer Vision - ECCV 2016, 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V, 2016.
[47] K. M. Robinette and H. A. M. Daanen. The CAESAR project: A 3-D surface anthropometry survey. In 2nd International Conference on 3D Digital Imaging and Modeling (3DIM '99), October 1999, Ottawa, Canada.
[48] R. Girdhar, D. F. Fouhey, M. Rodriguez, and A. Gupta. Learning a predictable and generative vector representation for objects. In European Conference on Computer Vision, 2016.
[49] R. M. Rustamov. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Symposium on Geometry Processing, 2007.
[50] M. E. Sargin, Y. Yemez, E. Erzin, and A. M. Tekalp. Audiovisual synchronization and fusion using canonical correlation analysis. IEEE Trans. Multimedia, 9(7), 2007.
[51] M. Savva, F. Yu, H. Su, M. Aono, B. Chen, D. Cohen-Or, W. Deng, H. Su, S. Bai, X. Bai, et al. SHREC'16 track: large-scale 3D shape retrieval from ShapeNet Core55. 2016.
[52] F. R. Schmidt, D. Farin, and D. Cremers. Fast matching of planar shapes in sub-cubic runtime. In ICCV, pages 1-6. IEEE Computer Society, 2007.
[53] F. R. Schmidt, E. Töppe, and D. Cremers. Efficient planar graph cuts with applications in computer vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida, June 2009.
[54] A. Sharma, A. Kumar, H. Daumé III, and D. W. Jacobs. Generalized multiview analysis: A discriminative latent space. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 2012.


More information

Deformable Mesh Model for Complex Multi-Object 3D Motion Estimation from Multi-Viewpoint Video

Deformable Mesh Model for Complex Multi-Object 3D Motion Estimation from Multi-Viewpoint Video Deformable Mesh Model for Complex Multi-Object 3D Motion Estimation from Multi-Viewpoint Video Shohei NOBUHARA Takashi MATSUYAMA Graduate School of Informatics, Kyoto University Sakyo, Kyoto, 606-8501,

More information

Algorithms for 3D Isometric Shape Correspondence

Algorithms for 3D Isometric Shape Correspondence Algorithms for 3D Isometric Shape Correspondence Yusuf Sahillioğlu Computer Eng. Dept., Koç University, Istanbul, Turkey (PhD) Computer Eng. Dept., METU, Ankara, Turkey (Asst. Prof.) 2 / 53 Problem Definition

More information

3D Shape Analysis with Multi-view Convolutional Networks. Evangelos Kalogerakis

3D Shape Analysis with Multi-view Convolutional Networks. Evangelos Kalogerakis 3D Shape Analysis with Multi-view Convolutional Networks Evangelos Kalogerakis 3D model repositories [3D Warehouse - video] 3D geometry acquisition [KinectFusion - video] 3D shapes come in various flavors

More information

Preparation Meeting. Recent Advances in the Analysis of 3D Shapes. Emanuele Rodolà Matthias Vestner Thomas Windheuser Daniel Cremers

Preparation Meeting. Recent Advances in the Analysis of 3D Shapes. Emanuele Rodolà Matthias Vestner Thomas Windheuser Daniel Cremers Preparation Meeting Recent Advances in the Analysis of 3D Shapes Emanuele Rodolà Matthias Vestner Thomas Windheuser Daniel Cremers What You Will Learn in the Seminar Get an overview on state of the art

More information

3D Pose Estimation using Synthetic Data over Monocular Depth Images

3D Pose Estimation using Synthetic Data over Monocular Depth Images 3D Pose Estimation using Synthetic Data over Monocular Depth Images Wei Chen cwind@stanford.edu Xiaoshi Wang xiaoshiw@stanford.edu Abstract We proposed an approach for human pose estimation over monocular

More information

Learning to generate 3D shapes

Learning to generate 3D shapes Learning to generate 3D shapes Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst http://people.cs.umass.edu/smaji August 10, 2018 @ Caltech Creating 3D shapes

More information

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting R. Maier 1,2, K. Kim 1, D. Cremers 2, J. Kautz 1, M. Nießner 2,3 Fusion Ours 1

More information

Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs Supplementary Material

Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs Supplementary Material Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs Supplementary Material Peak memory usage, GB 10 1 0.1 0.01 OGN Quadratic Dense Cubic Iteration time, s 10

More information

arxiv: v2 [cs.cv] 3 Mar 2017

arxiv: v2 [cs.cv] 3 Mar 2017 Building Statistical Shape Spaces for 3D Human Modeling Leonid Pishchulin a,, Stefanie Wuhrer b, Thomas Helten c, Christian Theobalt a, Bernt Schiele a arxiv:153.586v2 [cs.cv] 3 Mar 217 Abstract a Max

More information

Computing and Processing Correspondences with Functional Maps

Computing and Processing Correspondences with Functional Maps Computing and Processing Correspondences with Functional Maps SIGGRAPH 2017 course Maks Ovsjanikov, Etienne Corman, Michael Bronstein, Emanuele Rodolà, Mirela Ben-Chen, Leonidas Guibas, Frederic Chazal,

More information

A three-dimensional shape database from a large-scale anthropometric survey

A three-dimensional shape database from a large-scale anthropometric survey A three-dimensional shape database from a large-scale anthropometric survey Peng Li, Brian Corner, Jeremy Carson, Steven Paquette US Army Natick Soldier Research Development & Engineering Center, Natick,

More information

Vehicle Dimensions Estimation Scheme Using AAM on Stereoscopic Video

Vehicle Dimensions Estimation Scheme Using AAM on Stereoscopic Video Workshop on Vehicle Retrieval in Surveillance (VRS) in conjunction with 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance Vehicle Dimensions Estimation Scheme Using

More information

A Segmentation of Non-rigid Shape with Heat Diffuse

A Segmentation of Non-rigid Shape with Heat Diffuse 2012 4th International Conference on Signal Processing Systems (ICSPS 2012) IPCSIT vol. 58 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V58.6 A Segmentation of Non-rigid Shape with Heat

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

Geometry Representations with Unsupervised Feature Learning

Geometry Representations with Unsupervised Feature Learning Geometry Representations with Unsupervised Feature Learning Yeo-Jin Yoon 1, Alexander Lelidis 2, A. Cengiz Öztireli 3, Jung-Min Hwang 1, Markus Gross 3 and Soo-Mi Choi 1 1 Department of Computer Science

More information

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Charles R. Qi Hao Su Matthias Nießner Angela Dai Mengyuan Yan Leonidas J. Guibas Stanford University 1. Details

More information

Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching

Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Correlated Correspondences [ASP*04] A Complete Registration System [HAW*08] In this session Advanced Global Matching Some practical

More information

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA) Detecting and Parsing of Visual Objects: Humans and Animals Alan Yuille (UCLA) Summary This talk describes recent work on detection and parsing visual objects. The methods represent objects in terms of

More information

RSRN: Rich Side-output Residual Network for Medial Axis Detection

RSRN: Rich Side-output Residual Network for Medial Axis Detection RSRN: Rich Side-output Residual Network for Medial Axis Detection Chang Liu, Wei Ke, Jianbin Jiao, and Qixiang Ye University of Chinese Academy of Sciences, Beijing, China {liuchang615, kewei11}@mails.ucas.ac.cn,

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

3D model classification using convolutional neural network

3D model classification using convolutional neural network 3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Su et al. Shape Descriptors - III

Su et al. Shape Descriptors - III Su et al. Shape Descriptors - III Siddhartha Chaudhuri http://www.cse.iitb.ac.in/~cs749 Funkhouser; Feng, Liu, Gong Recap Global A shape descriptor is a set of numbers that describes a shape in a way that

More information

Project Updates Short lecture Volumetric Modeling +2 papers

Project Updates Short lecture Volumetric Modeling +2 papers Volumetric Modeling Schedule (tentative) Feb 20 Feb 27 Mar 5 Introduction Lecture: Geometry, Camera Model, Calibration Lecture: Features, Tracking/Matching Mar 12 Mar 19 Mar 26 Apr 2 Apr 9 Apr 16 Apr 23

More information

Precise and Automatic Anthropometric Measurement Extraction Using Template Registration

Precise and Automatic Anthropometric Measurement Extraction Using Template Registration Precise and Automatic Anthropometric Measurement Extraction Using Template Registration Oliver WASENMÜLLER, Jan C. PETERS, Vladislav GOLYANIK, Didier STRICKER German Research Center for Artificial Intelligence

More information

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Presented by: Rex Ying and Charles Qi Input: A Single RGB Image Estimate

More information

Body Trunk Shape Estimation from Silhouettes by Using Homologous Human Body Model

Body Trunk Shape Estimation from Silhouettes by Using Homologous Human Body Model Body Trunk Shape Estimation from Silhouettes by Using Homologous Human Body Model Shunta Saito* a, Makiko Kochi b, Masaaki Mochimaru b, Yoshimitsu Aoki a a Keio University, Yokohama, Kanagawa, Japan; b

More information

Dynamic Human Shape Description and Characterization

Dynamic Human Shape Description and Characterization Dynamic Human Shape Description and Characterization Z. Cheng*, S. Mosher, Jeanne Smith H. Cheng, and K. Robinette Infoscitex Corporation, Dayton, Ohio, USA 711 th Human Performance Wing, Air Force Research

More information

A Validation Study of a Kinect Based Body Imaging (KBI) Device System Based on ISO 20685:2010

A Validation Study of a Kinect Based Body Imaging (KBI) Device System Based on ISO 20685:2010 A Validation Study of a Kinect Based Body Imaging (KBI) Device System Based on ISO 20685:2010 Sara BRAGANÇA* 1, Miguel CARVALHO 1, Bugao XU 2, Pedro AREZES 1, Susan ASHDOWN 3 1 University of Minho, Portugal;

More information

SCAPE: Shape Completion and Animation of People

SCAPE: Shape Completion and Animation of People SCAPE: Shape Completion and Animation of People By Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, James Davis From SIGGRAPH 2005 Presentation for CS468 by Emilio Antúnez

More information

Tracking of Human Body using Multiple Predictors

Tracking of Human Body using Multiple Predictors Tracking of Human Body using Multiple Predictors Rui M Jesus 1, Arnaldo J Abrantes 1, and Jorge S Marques 2 1 Instituto Superior de Engenharia de Lisboa, Postfach 351-218317001, Rua Conselheiro Emído Navarro,

More information

Colored Point Cloud Registration Revisited Supplementary Material

Colored Point Cloud Registration Revisited Supplementary Material Colored Point Cloud Registration Revisited Supplementary Material Jaesik Park Qian-Yi Zhou Vladlen Koltun Intel Labs A. RGB-D Image Alignment Section introduced a joint photometric and geometric objective

More information

Parametric Editing of Clothed 3D Avatars

Parametric Editing of Clothed 3D Avatars The Visual Computer manuscript No. (will be inserted by the editor) Parametric Editing of Clothed 3D Avatars Yin Chen Zhi-Quan Cheng Ralph R. Martin Abstract Easy editing of a clothed 3D human avatar is

More information

Human pose estimation using Active Shape Models

Human pose estimation using Active Shape Models Human pose estimation using Active Shape Models Changhyuk Jang and Keechul Jung Abstract Human pose estimation can be executed using Active Shape Models. The existing techniques for applying to human-body

More information

From processing to learning on graphs

From processing to learning on graphs From processing to learning on graphs Patrick Pérez Maths and Images in Paris IHP, 2 March 2017 Signals on graphs Natural graph: mesh, network, etc., related to a real structure, various signals can live

More information

Landmark Detection on 3D Face Scans by Facial Model Registration

Landmark Detection on 3D Face Scans by Facial Model Registration Landmark Detection on 3D Face Scans by Facial Model Registration Tristan Whitmarsh 1, Remco C. Veltkamp 2, Michela Spagnuolo 1 Simone Marini 1, Frank ter Haar 2 1 IMATI-CNR, Genoa, Italy 2 Dept. Computer

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

Key Developments in Human Pose Estimation for Kinect

Key Developments in Human Pose Estimation for Kinect Key Developments in Human Pose Estimation for Kinect Pushmeet Kohli and Jamie Shotton Abstract The last few years have seen a surge in the development of natural user interfaces. These interfaces do not

More information

Multimodal Motion Capture Dataset TNT15

Multimodal Motion Capture Dataset TNT15 Multimodal Motion Capture Dataset TNT15 Timo v. Marcard, Gerard Pons-Moll, Bodo Rosenhahn January 2016 v1.2 1 Contents 1 Introduction 3 2 Technical Recording Setup 3 2.1 Video Data............................

More information

MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction Ayush Tewari Michael Zollhofer Hyeongwoo Kim Pablo Garrido Florian Bernard Patrick Perez Christian Theobalt

More information

Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks *

Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks * Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks * Tee Connie *, Mundher Al-Shabi *, and Michael Goh Faculty of Information Science and Technology, Multimedia University,

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

Posture Invariant Surface Description and Feature Extraction

Posture Invariant Surface Description and Feature Extraction Posture Invariant Surface Description and Feature Extraction Stefanie Wuhrer 1 Zouhour Ben Azouz 2 Chang Shu 1 Abstract We propose a posture invariant surface descriptor for triangular meshes. Using intrinsic

More information

Translation Symmetry Detection: A Repetitive Pattern Analysis Approach

Translation Symmetry Detection: A Repetitive Pattern Analysis Approach 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops Translation Symmetry Detection: A Repetitive Pattern Analysis Approach Yunliang Cai and George Baciu GAMA Lab, Department of Computing

More information

AAM Based Facial Feature Tracking with Kinect

AAM Based Facial Feature Tracking with Kinect BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 3 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0046 AAM Based Facial Feature Tracking

More information

Topology-Invariant Similarity and Diffusion Geometry

Topology-Invariant Similarity and Diffusion Geometry 1 Topology-Invariant Similarity and Diffusion Geometry Lecture 7 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 Intrinsic

More information

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE Nan Hu Stanford University Electrical Engineering nanhu@stanford.edu ABSTRACT Learning 3-D scene structure from a single still

More information

Tiled Texture Synthesis

Tiled Texture Synthesis International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 16 (2014), pp. 1667-1672 International Research Publications House http://www. irphouse.com Tiled Texture

More information

The Future of Natural User Interaction

The Future of Natural User Interaction The Future of Natural User Interaction NUI: a great new discipline in the making Baining Guo Microsoft Research Asia Kinect from Insiders Perspective An interview with Kinect contributors Kinect from

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp

More information

Creating Custom Human Avatars for Ergonomic Analysis using Depth Cameras

Creating Custom Human Avatars for Ergonomic Analysis using Depth Cameras Creating Custom Human Avatars for Ergonomic Analysis using Depth Cameras Matthew P. Reed, Byoung-Keon Park, K. Han Kim University of Michigan Transportation Research Institute Ulrich Raschke Siemens PLM

More information

DEPT: Depth Estimation by Parameter Transfer for Single Still Images

DEPT: Depth Estimation by Parameter Transfer for Single Still Images DEPT: Depth Estimation by Parameter Transfer for Single Still Images Xiu Li 1,2, Hongwei Qin 1,2(B), Yangang Wang 3, Yongbing Zhang 1,2, and Qionghai Dai 1 1 Department of Automation, Tsinghua University,

More information

The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method

The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method The Novel Approach for 3D Face Recognition Using Simple Preprocessing Method Parvin Aminnejad 1, Ahmad Ayatollahi 2, Siamak Aminnejad 3, Reihaneh Asghari Abstract In this work, we presented a novel approach

More information

Multimodal Information Spaces for Content-based Image Retrieval

Multimodal Information Spaces for Content-based Image Retrieval Research Proposal Multimodal Information Spaces for Content-based Image Retrieval Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due

More information

Three-dimensional nondestructive evaluation of cylindrical objects (pipe) using an infrared camera coupled to a 3D scanner

Three-dimensional nondestructive evaluation of cylindrical objects (pipe) using an infrared camera coupled to a 3D scanner Three-dimensional nondestructive evaluation of cylindrical objects (pipe) using an infrared camera coupled to a 3D scanner F. B. Djupkep Dizeu, S. Hesabi, D. Laurendeau, A. Bendada Computer Vision and

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

Animating Characters in Pictures

Animating Characters in Pictures Animating Characters in Pictures Shih-Chiang Dai jeffrey@cmlab.csie.ntu.edu.tw Chun-Tse Hsiao hsiaochm@cmlab.csie.ntu.edu.tw Bing-Yu Chen robin@ntu.edu.tw ABSTRACT Animating pictures is an interesting

More information

Transfer Learning. Style Transfer in Deep Learning

Transfer Learning. Style Transfer in Deep Learning Transfer Learning & Style Transfer in Deep Learning 4-DEC-2016 Gal Barzilai, Ram Machlev Deep Learning Seminar School of Electrical Engineering Tel Aviv University Part 1: Transfer Learning in Deep Learning

More information

A Simple 3D Scanning System of the Human Foot Using a Smartphone with Depth Camera

A Simple 3D Scanning System of the Human Foot Using a Smartphone with Depth Camera Abstract A Simple 3D Scanning System of the Human Foot Using a Smartphone with Depth Camera Takumi KOBAYASHI* 1, Naoto IENAGA 1, Yuta SUGIURA 1, Hideo SAITO 1, Natsuki MIYATA 2, Mitsumori TADA 2 1 Keio

More information

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University. 3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction

More information

Selecting Models from Videos for Appearance-Based Face Recognition

Selecting Models from Videos for Appearance-Based Face Recognition Selecting Models from Videos for Appearance-Based Face Recognition Abdenour Hadid and Matti Pietikäinen Machine Vision Group Infotech Oulu and Department of Electrical and Information Engineering P.O.

More information

Photo-realistic Renderings for Machines Seong-heum Kim

Photo-realistic Renderings for Machines Seong-heum Kim Photo-realistic Renderings for Machines 20105034 Seong-heum Kim CS580 Student Presentations 2016.04.28 Photo-realistic Renderings for Machines Scene radiances Model descriptions (Light, Shape, Material,

More information

Algorithm research of 3D point cloud registration based on iterative closest point 1

Algorithm research of 3D point cloud registration based on iterative closest point 1 Acta Technica 62, No. 3B/2017, 189 196 c 2017 Institute of Thermomechanics CAS, v.v.i. Algorithm research of 3D point cloud registration based on iterative closest point 1 Qian Gao 2, Yujian Wang 2,3,

More information

Synthesizing Realistic Facial Expressions from Photographs

Synthesizing Realistic Facial Expressions from Photographs Synthesizing Realistic Facial Expressions from Photographs 1998 F. Pighin, J Hecker, D. Lischinskiy, R. Szeliskiz and D. H. Salesin University of Washington, The Hebrew University Microsoft Research 1

More information

Video based Animation Synthesis with the Essential Graph. Adnane Boukhayma, Edmond Boyer MORPHEO INRIA Grenoble Rhône-Alpes

Video based Animation Synthesis with the Essential Graph. Adnane Boukhayma, Edmond Boyer MORPHEO INRIA Grenoble Rhône-Alpes Video based Animation Synthesis with the Essential Graph Adnane Boukhayma, Edmond Boyer MORPHEO INRIA Grenoble Rhône-Alpes Goal Given a set of 4D models, how to generate realistic motion from user specified

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

3D Shape Segmentation with Projective Convolutional Networks

3D Shape Segmentation with Projective Convolutional Networks 3D Shape Segmentation with Projective Convolutional Networks Evangelos Kalogerakis 1 Melinos Averkiou 2 Subhransu Maji 1 Siddhartha Chaudhuri 3 1 University of Massachusetts Amherst 2 University of Cyprus

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

3D-CODED : 3D Correspondences by Deep Deformation

3D-CODED : 3D Correspondences by Deep Deformation 3D-CODED : 3D Correspondences by Deep Deformation Thibault Groueix 1, Matthew Fisher 2, Vladimir G. Kim 2, Bryan C. Russell 2, and Mathieu Aubry 1 1 LIGM (UMR 8049), École des Ponts, UPE 2 Adobe Research

More information

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Franziska Mueller 1,2 Dushyant Mehta 1,2 Oleksandr Sotnychenko 1 Srinath Sridhar 1 Dan Casas 3 Christian Theobalt

More information

calibrated coordinates Linear transformation pixel coordinates

calibrated coordinates Linear transformation pixel coordinates 1 calibrated coordinates Linear transformation pixel coordinates 2 Calibration with a rig Uncalibrated epipolar geometry Ambiguities in image formation Stratified reconstruction Autocalibration with partial

More information

Face Tracking. Synonyms. Definition. Main Body Text. Amit K. Roy-Chowdhury and Yilei Xu. Facial Motion Estimation

Face Tracking. Synonyms. Definition. Main Body Text. Amit K. Roy-Chowdhury and Yilei Xu. Facial Motion Estimation Face Tracking Amit K. Roy-Chowdhury and Yilei Xu Department of Electrical Engineering, University of California, Riverside, CA 92521, USA {amitrc,yxu}@ee.ucr.edu Synonyms Facial Motion Estimation Definition

More information

Towards Multi-scale Heat Kernel Signatures for Point Cloud Models of Engineering Artifacts

Towards Multi-scale Heat Kernel Signatures for Point Cloud Models of Engineering Artifacts Towards Multi-scale Heat Kernel Signatures for Point Cloud Models of Engineering Artifacts Reed M. Williams and Horea T. Ilieş Department of Mechanical Engineering University of Connecticut Storrs, CT

More information