Rapid Skin: Estimating the 3D Human Pose and Shape in Real-Time


Matthias Straka, Stefan Hauswiesner, Matthias Rüther, and Horst Bischof
Institute for Computer Graphics and Vision, Graz University of Technology, Austria

Abstract

We present a novel approach to adapt a watertight polygonal model of the human body to multiple synchronized camera views. While previous approaches yield excellent quality for this task, they require processing times of several seconds, especially for high resolution meshes. Our approach delivers high quality results at interactive rates when a roughly initialized pose and a generic articulated body model are available. The key novelty of our approach is to use a Gauss-Seidel type solver to iteratively solve nonlinear constraints that deform the surface of the model according to silhouette images. We evaluate both the visual quality and accuracy of the adapted body shape on multiple test persons. While maintaining a reconstruction quality similar to that of previous approaches, our algorithm reduces processing times by a factor of 20. Thus it is possible to use a simple human model for representing the body shape of moving people in interactive applications.

Keywords: human body; multi-view geometry; silhouette; Laplacian mesh adaption; real-time

I. INTRODUCTION

Marker-less human pose and body shape estimation from images has numerous applications in video games, virtual try-ons, augmented reality and motion capture for the entertainment industry. Recent advances in real-time human pose estimation make it possible to create interactive environments where the only controller is the body of the user [1]. However, an estimate of the human pose alone is sometimes not sufficient. For example, displaying a realistic user controlled avatar that mimics not only the pose of the user but also his appearance requires a full representation of the body surface. Such an avatar is an important component in augmented reality applications such as a virtual mirror [2].
The main challenges of capturing the body shape lie in the articulation of the human body and the variation of size, age and visual appearance between different persons. For static objects it is fairly easy to generate a realistic and accurate model in real-time, even with a single, moving camera [3]. However, people change their pose continuously in an interactive scenario. This requires pose estimation and shape adaption for every single frame. Several authors have tackled the task of body shape adaption by recording images from a multi-view camera setup and deforming a human body mesh such that it is consistent with the background-subtracted silhouette in each view [4]-[9]. While most approaches yield convincing results, two common limitations remain: a previously scanned model of the actor is required, and processing times in the order of several seconds per frame have to be expected.

Figure 1. Our approach estimates the shape of the human body in real-time. We take a generic template mesh (a), correct its pose and size (b) and deform it according to multi-view silhouettes to obtain an accurate model (c). Projective texturing is used for realistic rendering (d).

In this paper, we present a novel approach that allows adapting a generic model of the human body to multi-view images (see Fig. 1). We improve over existing methods by introducing a constraint based mesh deformation and propose a real-time capable solver based on Gauss-Seidel iterations. We start with a polygonal mesh of the human body and create nonlinear constraints that align vertices with image features but keep the overall mesh smooth. The key to real-time operation is to process each constraint individually, which allows for fast and stable estimation of the three dimensional shape of the human body such that interactive applications become feasible.
The main contributions of this paper are as follows:

- We derive constraints to deform a mesh such that it becomes consistent with multi-view silhouette contours and propose an automatic constraint weighting scheme. Our approach enables performing this deformation in real-time even for large meshes, which has not been possible before.
- We adapt the size of the mesh at runtime by changing the length of skeleton bones. This allows us to represent a wide range of people of different age, gender and size using only a single template mesh.
- We demonstrate a method to transform multi-view silhouette data to depth maps, which allows using real-time pose estimation methods such as [1] directly.

Section II reviews existing work in the field of human body shape adaption. In Section III, we present our shape adaption algorithm consisting of constraints and a real-time capable solver. In Section IV, we present how to use our algorithm for full human bodies in an interactive scenario. We evaluate our algorithm on several recorded sequences and provide qualitative measurements of both accuracy and speed in Section V. Finally, Section VI concludes the paper and gives an outlook on future work.

II. RELATED WORK

The idea of deforming a polygonal mesh such that it is consistent with images of the body silhouette is not new. In the current literature, several approaches can be found that make use of the Laplacian Mesh Editing (LME) framework [10]. The basic idea is to represent each vertex using delta coordinates, defined as the difference between the vertex position and the weighted sum of positions of neighboring vertices. Deformation of a mesh is then expressed as a sparse system of linear equations which allows modifying the position of selected vertices while using delta coordinates to enforce smooth deformations of the mesh. The LME framework is used in Gall et al. [8] and Vlasic et al. [9], who transform a model of the human body to align its pose to the recorded person, and then align vertices of the model with silhouette contours in multi-view camera images. Aguiar et al. [4] propose a similar method for mesh deformation, but omit the explicit pose estimation step. Instead, they track the mesh over multiple frames based on silhouette and texture correspondences. While the previously mentioned approaches omit the skeletal structure during surface adaption, Straka et al. [5] present a method to jointly optimize for bones and surface. Most LME-based approaches use global least-squares optimization. This prohibits real-time operation since solving the linear system can be slow for reasonably sized meshes.
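
To make the delta-coordinate representation used by LME concrete, the following minimal Python sketch computes the delta coordinate of every vertex as its position minus a weighted sum of its neighbors. It is only an illustration: the function name `delta_coordinates`, the toy mesh and the uniform weights $w_{ij} = 1/|N(i)|$ are assumptions made here for brevity, whereas LME implementations typically use cotangent weights [10].

```python
def delta_coordinates(vertices, neighbors):
    """Return delta_i = v_i - sum_j w_ij * v_j for every vertex.

    Assumption: uniform weights w_ij = 1 / |N(i)| (not the cotangent
    weights used in the literature), so the weighted sum is the centroid
    of the 1-ring neighbors.
    """
    deltas = []
    for i, v in enumerate(vertices):
        ring = neighbors[i]
        w = 1.0 / len(ring)
        centroid = [sum(w * vertices[j][a] for j in ring) for a in range(3)]
        deltas.append([v[a] - centroid[a] for a in range(3)])
    return deltas


# Hypothetical toy mesh: an apex above a square of four neighbors.
verts = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
         (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
ring = {0: [1, 2, 3, 4], 1: [0, 2, 4], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 1, 3]}
deltas = delta_coordinates(verts, ring)

# Delta coordinates do not change when the whole mesh is translated.
shifted = [(x + 5.0, y + 5.0, z + 5.0) for (x, y, z) in verts]
deltas_shifted = delta_coordinates(shifted, ring)
```

Because the weights sum to one, the delta coordinates encode only local surface detail relative to the neighborhood, which is why they can serve as a smoothness prior while selected vertices are moved.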
In Hofmann and Gavrila [11], an automatic pose and shape estimation method is presented that not only adapts a mesh to a single frame, but optimizes over a series of frames in order to obtain a stable body shape. A large database of human body scans makes it possible to build a statistical body model which guides shape deformation based on silhouette data [12], laser scans [13] or even single depth images [14]. Bottom-up methods create a new mesh from merged depth maps [15] or point clouds [16] and therefore do not require any previously known body scan. Straka et al. [2] present a method to capture a moving 3D human body without the use of an explicit model. They use image based rendering to create an interactive virtual mirror image of the user using multiple real cameras. However, it is not possible to obtain an explicit body shape using this method.

None of the previously mentioned approaches is able to estimate pose and shape of the human body at interactive frame rates. Recently, it was shown how to perform pose estimation in real-time [1], but mesh deformation still requires several seconds. Our method is closely related to [8] and [9] as we follow their two-stage approach with separate pose estimation and shape deformation. The major difference compared to previous methods is the solver used for optimizing the deformed shape. Our method is inspired by position based physics simulations [17], which are able to compute realistic interactions between soft bodies in real-time. The key to real-time operation is to apply decoupled constraints on individual vertices of a deformable mesh and optimize for a stable shape using an iterative method. We show that this decoupled optimization is suitable for mesh deformation guided by image space correspondences such that the final mesh resembles the content in the input images.

III. REAL-TIME 3D SHAPE ESTIMATION

In this section, we present our novel approach for real-time estimation of the shape of an object, which is represented by its silhouette in multiple images. The main idea is to iteratively deform a template mesh consisting of vertices and faces such that the projection of the mesh into the source images is identical to the silhouette of the object. For now, we assume that an initial mesh with the same topology is available in roughly the same pose as the object inside a calibrated multi-camera system. In Section IV, we show how to quickly initialize an articulated human body mesh such that it fulfills these requirements.

A. Constraint-based Mesh Deformation

We consider the problem of deforming a polygonal mesh $M = \{V, N, F\}$ consisting of vertices $V = \{v_i \in \mathbb{R}^3 \mid i = 1 \dots |V|\}$, vertex normals $N = \{n_i \in \mathbb{R}^3 \mid i = 1 \dots |V|\}$ and triangular faces $F$ such that all vertices satisfy a set of constraints

$$C_j(V \mid \Phi_j) = 0, \qquad 1 \le j \le M. \tag{1}$$

Each constraint is a function $C_j : \mathbb{R}^{3|V|} \to \mathbb{R}$ with a set of parameters $\Phi_j$ that encodes a relationship between selected vertices and other vertices of $M$ or the scene. For example, a constraint can be responsible for aligning the mesh with image data. We use the parameters $\Phi_j$ for storing constraint properties such as initial curvature or correspondences. Usually, these parameters are initialized before optimization. The vertex positions of the deformed mesh can be obtained by minimizing over all constraints:

$$\tilde{V} = \operatorname*{argmin}_{V} \sum_{j=1}^{M} k_j \left\| C_j(V \mid \Phi_j) \right\| \tag{2}$$

where $k_j \in [0, 1]$ is a weighting term and $\|\cdot\|$ denotes the length of a vector. Note that such constraints need not be linear but only differentiable. Inspired by the Gauss-Seidel algorithm for linear systems of equations [18], we do not minimize (2) as a whole.

Instead, we break it down into individual constraints and project each $C_j$ onto the vertices independently. We use a first-order Taylor series expansion to find a position-correction term $\Delta V$ such that

$$C_j(V + \Delta V) \approx C_j(V) + \nabla_V C_j(V) \cdot \Delta V = 0 \tag{3}$$

where $\nabla_V C_j$ denotes the gradient of constraint $j$. Restricting $\Delta V$ to the direction of the gradient and solving for it yields the step for the iterative minimization

$$\Delta V = -\frac{C_j(V)}{\left\| \nabla_V C_j(V) \right\|^2} \, \nabla_V C_j(V) \tag{4}$$

which is similar to the standard Newton-Raphson method. We use (4) to perform a weighted correction of the current vertex positions $V \leftarrow V + k_j \Delta V$ for every constraint $C_j$. Analogous to the Gauss-Seidel algorithm, we use updated values of $V$ for subsequent calculations as soon as they are available. This requires less memory and allows the solution to converge faster while keeping time complexity linear in the number of constraints. By iterating constraint projection multiple times, we allow the effect of constraints to propagate along the surface of the mesh until all vertices of the deformed mesh reach a stable position. A similar strategy can be found in real-time physics simulation, where internal and external forces of simulated objects are integrated using iterative constraint projection [17].

Figure 2. Silhouette constraints pull rim-vertices towards the silhouette contour in every camera image.

B. Constraints

The presented algorithm is capable of handling nonlinear constraints of any type. We propose to use two specific types of constraints for the task of template based shape estimation. First, silhouette constraints $C^{sil}$ align rim vertices of a template mesh with silhouette contours in the images. The second type of constraint $C^{sm}$ is a smoothness constraint which acts as a regularization term. This allows (2) to be rewritten as

$$\tilde{V} = \operatorname*{argmin}_{V} \sum_{j=1}^{M^{sil}} k_j^{sil} \left\| C_j^{sil}(V) \right\| + k^{sm} \sum_{i=1}^{|V|} \left\| C_i^{sm}(V) \right\| \tag{5}$$

with two distinct sets of constraints.
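
The projection step of Eq. (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: vertex positions are stored as one flat list of coordinates, the helper name `project_constraint` is hypothetical, and the toy distance constraint between two points (with rest length `REST`) is borrowed from position based simulation [17] only because its gradient is easy to verify by hand.

```python
import math


def project_constraint(V, C, grad_C, k=1.0):
    """One weighted step V <- V - k * C(V) / |grad C(V)|^2 * grad C(V)."""
    c = C(V)
    g = grad_C(V)
    g2 = sum(x * x for x in g)
    if g2 == 0.0:  # vanishing gradient: no usable correction direction
        return list(V)
    s = k * c / g2
    return [v - s * gi for v, gi in zip(V, g)]


# Hypothetical toy constraint: two 3D points should be REST apart.
REST = 1.0


def C_dist(V):
    return math.dist(V[0:3], V[3:6]) - REST


def grad_C_dist(V):
    d = math.dist(V[0:3], V[3:6])
    n = [(a - b) / d for a, b in zip(V[0:3], V[3:6])]
    return n + [-x for x in n]  # gradient w.r.t. point 1, then point 2


V0 = [0.0, 0.0, 0.0, 3.0, 0.0, 0.0]  # the two points are 3 units apart
V1 = project_constraint(V0, C_dist, grad_C_dist)
```

With `k = 1` the two points move symmetrically towards each other until the constraint is satisfied; smaller weights apply only a fraction of the correction, which is what allows competing constraints to be balanced.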
We now describe these constraints in detail and show how to choose the weights $k_j^{sil}$ automatically.

Figure 3. Calculation of delta coordinates using the 1-ring of neighboring vertices.

Silhouette Consistency: In order to achieve silhouette consistency, we apply a method related to [4], [8], [9] to align rim vertices of the mesh with the silhouette contour. Rim vertices lie on the contour of the mesh when projected onto a camera image $I_c$. In order to find rim vertices, we project vertices $v_i$ of mesh $M$ into all camera views using the corresponding $3 \times 4$ projection matrices $P_c = K_c [R_c \mid t_c]$ and rotate the corresponding vertex normals $n_i$ onto the image plane using the rotation matrix $R_c \in \mathbb{R}^{3 \times 3}$:

$$v_i^c = \frac{1}{P_c^{(3)} \begin{bmatrix} v_i \\ 1 \end{bmatrix}} \begin{bmatrix} P_c^{(1)} \\ P_c^{(2)} \end{bmatrix} \begin{bmatrix} v_i \\ 1 \end{bmatrix}, \qquad n_i^c = R_c \, n_i \tag{6}$$

where $P_c^{(r)}$ denotes the $r$-th row of the projection matrix. We calculate vertex normals $n_i$ as the normalized mean of face normals adjacent to the vertex $v_i$. A rim vertex in image $I_c$ is a vertex with a normal almost parallel to the image plane of camera $c$. For such vertices, we sample pixels from $I_c$ along a 2D line $l(t) = v_i^c + t \, n_i^c(1{:}2)$ for intersections with the silhouette contour, where $-\tau \le t \le \tau$ defines the search region in pixels. Note that it is important that only intersections with a contour gradient similar to the normal direction $n_i^c$ are considered a match $p_i^c \in \mathbb{R}^2$. Simply matching the closest contour pixel, such as in [4], can lead to false matches, especially if the initialization of mesh $M$ is inaccurate. Each successfully matched rim-vertex/contour pair $(v_i^c, p_i^c)$ yields a 2D correspondence in image space. We translate this correspondence into a constraint $C_j^{sil}$ which enforces that vertex $v_i$ is pulled towards the viewing ray $R_j$, which is a 3D line from the projection center of camera $c$ through the contour pixel $p_i^c$:

$$C_j^{sil}(V \mid R_j, i) = d_{pl}(R_j, v_i) = 0 \tag{7}$$

where $d_{pl}$ denotes the shortest Euclidean distance between a point and a line in 3D. In Fig.
2, we visualize the effect of silhouette constraints.

Mesh Smoothing: Smoothness constraints are based on delta coordinates $\delta_i \in \mathbb{R}^3$, which are calculated as

$$\delta_i = \sum_{j \in N(i)} w_{ij} \, (v_i - v_j) \tag{8}$$

where $N(i)$ denotes the 1-ring of neighboring vertices of $v_i$ (see Fig. 3). Each weight $w_{ij}$ is calculated using the cotangent weighting scheme [10] with $\sum_j w_{ij} = 1 \;\; \forall i$. For each vertex $v_i$, we define a smoothness constraint $C_i^{sm}$ that ensures that the delta coordinate $\delta_i$ of vertex $v_i$ stays close to its initial value, which is computed from the undeformed mesh $M$ using (8):

$$C_i^{sm}(V \mid \delta_i) = \Big\| \sum_{j \in N(i)} w_{ij} \, (v_i - v_j) - \delta_i \Big\|_2 = 0. \tag{9}$$

Automatic Constraint Weighting: Each silhouette constraint $C_j^{sil}$ is weighted using a scalar $k_j^{sil}$. We propose to use a weighting scheme that takes into consideration the quality of silhouette contour matches and adapts the influence of constraints automatically. When a vertex is far away from a silhouette contour, there is a large uncertainty which contour pixel it should correspond to. In this case, we put more trust in the smoothness term. In contrast, when the distance between a projected vertex and the silhouette contour is small, we consider this a good match and keep the vertex close to the corresponding viewing ray $R_j$. We encode this uncertainty into the silhouette constraint weights by applying an unnormalized Gaussian kernel to the initial Euclidean pixel distance between the projected vertex $v_i^c$ and the matched contour pixel $p_i^c$ of $C_j^{sil}$:

$$k_j^{sil} = \exp\left( -\frac{\left\| v_i^c - p_i^c \right\|^2}{2 \alpha^2} \right). \tag{10}$$

Therefore, good matches give the corresponding constraint $C_j^{sil}(V)$ a weight close to 1, while an increasing distance leads to smaller weights ($\alpha > 0$ controls the width of the Gaussian lobe). All smoothness constraints $C_i^{sm}(V)$ are equally weighted with $k^{sm} = 1$ throughout this paper. We perform multiple iterations of finding correspondences and deforming the mesh according to the resulting constraints. By using the proposed weighting scheme, rim vertices already close to the silhouette contour are kept close to their optimal position. Distant matches are initially dominated by the smoothness constraints and gain higher weights as they become aligned with the silhouette contour during optimization.

C. The Iterative Solver

In Laplacian Mesh Editing (LME), (8) is used as a regularization term and a few selected control vertices guide the shape deformation. Even for a large number of vertices, the deformed mesh can be computed efficiently when the set of control vertices does not change. In this case, the linear system of equations can be factorized once via Cholesky decomposition, and a deformed mesh can then be obtained through simple back substitution whenever the positions of control vertices change [10]. However, the set of control vertices changes continuously when deforming a mesh using iteratively updated image correspondences. Thus, no pre-computations are possible and the optimization has to be performed from scratch every time.

Algorithm 1. Constraint projection algorithm.
Require: $V = \{v_1 \dots v_{|V|}\}$
  1: $\{\Phi_1 \dots \Phi_M\} \leftarrow \mathrm{initialize}(V)$
  2: for number of outer iterations $N_o$ do
  3:   $\{\Phi_1 \dots \Phi_M, k_1 \dots k_M\} \leftarrow \mathrm{update}(V, \Phi_1 \dots \Phi_M)$
  4:   for number of inner iterations $N_i$ do
  5:     for $j = 1 \dots M$ do
  6:       $V \leftarrow V - k'_j \, \dfrac{C_j(V \mid \Phi_j)}{\left\| \nabla_V C_j(V \mid \Phi_j) \right\|^2} \, \nabla_V C_j(V \mid \Phi_j)$
  7:     end for
  8:   end for
  9: end for

In contrast to LME, there are hundreds of control vertices in shape deformation, which are often applied to neighboring vertices. In addition, it is usually possible to obtain initial vertex positions close to the optimal deformation when adapting a mesh to images. Therefore, we argue that an iterative solver is suitable to optimize (5). By using nonlinear constraints and an update step weighting that is similar to [17], we achieve high quality deformation results. In Section V, we show that our solver requires fewer iterations than the iterative Conjugate Gradient method [18] with linear constraints only. Our iterative solver for initializing and updating constraint parameters $\Phi_1 \dots \Phi_M$ and projecting constraints $C_1 \dots C_M$ is outlined in Algorithm 1. In Line 1, we set up all constraints using the initial vertex position estimates (i.e. we calculate $\delta_i$).
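
The nested loops of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' code: positions are stored in one flat coordinate list, each constraint is a dictionary with hypothetical keys `"C"`, `"grad"` and `"k"`, the `update` callback stands in for the re-matching of correspondences in Line 3, and the per-iteration weight is taken to be $k'_j = 1 - (1 - k_j)^{1/N_i}$ as in position based dynamics [17]. The toy constraint is a point-to-line distance in the spirit of Eq. (7).

```python
import math


def point_line_constraint(i, origin, direction, k):
    """Distance of vertex i from a 3D line; direction must have unit length."""
    def residual(V):
        p = [V[3 * i + a] - origin[a] for a in range(3)]
        t = sum(pa * ua for pa, ua in zip(p, direction))
        return [pa - t * ua for pa, ua in zip(p, direction)]

    def value(V):
        return math.hypot(*residual(V))

    def grad(V):
        r = residual(V)
        d = math.hypot(*r)
        g = [0.0] * len(V)
        if d > 0.0:  # gradient is undefined exactly on the line
            for a in range(3):
                g[3 * i + a] = r[a] / d
        return g

    return {"C": value, "grad": grad, "k": k}


def solve(V, constraints, update, n_outer=8, n_inner=8):
    """Gauss-Seidel style constraint projection following Algorithm 1."""
    V = list(V)
    for _ in range(n_outer):
        update(V, constraints)              # Line 3: refresh parameters/weights
        for _ in range(n_inner):
            for c in constraints:           # Line 5: sweep over all constraints
                k_mod = 1.0 - (1.0 - c["k"]) ** (1.0 / n_inner)
                g = c["grad"](V)
                g2 = sum(x * x for x in g)
                if g2 == 0.0:
                    continue                # constraint already satisfied
                s = k_mod * c["C"](V) / g2
                for idx, gi in enumerate(g):
                    V[idx] -= s * gi        # Line 6: in-place update of V
    return V


def no_update(V, constraints):
    """Placeholder for correspondence matching (assumption of this sketch)."""


ray = point_line_constraint(0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), k=0.5)
result = solve([2.0, 1.0, 0.0], [ray], no_update, n_outer=1, n_inner=8)
```

In this toy run the vertex starts one unit away from the line. After the eight inner iterations its distance has shrunk to about 0.5, so the accumulated correction corresponds to the weight `k = 0.5` regardless of how many inner iterations are used. A practical implementation would store sparse gradients instead of the dense lists used here.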
The solver contains two loops: the outer loop (Line 2) is entered $N_o$ times and controls how often constraint parameters are updated (i.e. matching of rim-vertices with the silhouette contour), while the inner loop in Line 4 projects the constraints. Since constraints are projected independently of each other, the number of inner iterations $N_i$ influences how far the effect of each constraint can propagate along the surface of the mesh. We do not multiply correction steps by $k_j$ directly, but use a modified weight $k'_j = 1 - (1 - k_j)^{1/N_i}$ which allows projecting constraints with linear dependence on $N_i$ [17]. The constraint projection in Line 6 prohibits parallelization because each calculation depends on the updated values $V$ of the previous projection. When a parallel processing architecture such as a GPU is available, it is possible to calculate the update step $\Delta V$ from the same vertex positions $V$ for all constraints in parallel. However, the number of inner iterations $N_i$ needs to be increased since the convergence rate is slower compared to the Gauss-Seidel type solver. Vertex positions can be updated in parallel as well, but it has to be ensured that a vertex is not updated by multiple constraints at the same time.

IV. ESTIMATING THE HUMAN BODY SHAPE

One application of shape estimation is to deform a template mesh such that it fits the shape of a human body recorded by a synchronized multi-camera system. In this section, we show how to initialize a generic model such that we can apply our constraints and solver. We first estimate the 3D pose of the human body from multi-view camera images. Then, we transform the mesh such that it has roughly the same body dimensions and posture. Finally, we deform the mesh until it best fits the image data.

Figure 4. (a) Aligning limbs using local rotation and length transformations. (b) The SCAPE mesh [19] in its default pose and its skeleton.

A. Pose Estimation

The availability of an affordable depth sensor (Kinect) has led to major improvements in real-time pose estimation. Shotton et al. [1] show how to translate pose estimation to a depth-map labeling problem which can efficiently be solved using randomized decision forests in real-time. The output of such an algorithm is a set of joint positions $g_k \in \mathbb{R}^3$, which belong to a skeleton with $K$ joints. Our algorithm can be initialized from such joint positions. For each joint $k$, we determine homogeneous transformation matrices $T_k \in \mathbb{R}^{4 \times 4}$ that transform our template mesh such that it has a pose similar to the user. We calculate $T_k$ directly from $g_k$ as a global transformation $T^G$ and local limb transformations $T_k^L$:

$$T_k = T^G \prod_{j=1}^{|c_k|} T^L_{c_k(j)} \tag{11}$$

where $c_k$ is the mapping that represents the order of joints along the kinematic chain from the root node to joint $k$. The global transformation aligns the upper body of the skeleton by means of rotation, scale and translation. Each local limb transformation $T_k^L$ rotates and scales the bone between joint $k$ and its parent joint such that it is aligned with $g_k$. In Fig. 4a we demonstrate this alignment process, which automatically adapts the template skeleton to the actual size of the body.

B. The Articulated Body Model

Template based shape estimation requires a mesh $M^0$ of the human body.
To handle arbitrary poses, the model must support deformation by an underlying articulated skeleton. In this work, we use the static SCAPE mesh model [19] in its default pose as in Fig. 4b. Any other watertight mesh is suitable for this purpose as well. The skeleton with $K$ joints is embedded into the mesh and linear skinning weights $\rho_{i,k}$ are calculated using a rigging algorithm [20], which links each vertex to one or multiple joints. Linear blend skinning is used to transform the mesh $M^0$ into the mesh $M$ with the current pose of the user:

$$v_i = \sum_{k=1}^{K} \rho_{i,k} \, T_k \begin{bmatrix} v_i^0 \\ 1 \end{bmatrix} \tag{12}$$

where vertex positions $v_i$ are obtained as a linear combination of the template vertex positions $v_i^0$ that are transformed by weighted joint transformations $T_k$.

C. Shape Estimation

We use the transformed mesh $M$ for the initialization of both vertex positions and constraints in our shape estimation method. We no longer consider the underlying bone structure when deforming the mesh, since we have often observed that there is a non-negligible offset between the real joint position and the estimate given by the skeleton tracker. Usually, our shape estimation algorithm corrects such offsets without visible artifacts.

V. EXPERIMENTS

We evaluate our approach on multiple video sequences of moving persons, either recorded with our own multi-camera setup or simulated through rendering of artificial data. Besides visual quality, we evaluate our algorithm in terms of reconstruction quality and run-time, and compare it to related approaches. Specifically, we compare the mesh adapted with our method to the output of related methods based on linear Laplacian Mesh Editing (LME) such as [8], [9]. We set up the linear systems of equations for LME mesh deformation using the same template mesh and rim-vertex/contour correspondences as used with our approach. As suggested in [10], we solve for optimal vertex positions in the least squares sense using a sparse Cholesky decomposition.
In addition, we compare our method to the iterative conjugate gradient algorithm [18], which is an alternative for solving least squares linear equations.

A. Experimental Setup

Our recording hardware consists of a studio environment with ten synchronized cameras connected to a single computer [2]. Each camera delivers a color image with pixels at 15 frames per second. Silhouettes of the user are obtained through color-based background segmentation. Based on the image resolution, we set the search region for silhouette contour matches to $\tau = 30$ pixels and use $\alpha = 40$ pixels for the calculation of weights $k_j^{sil}$.
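
To give a feeling for what these parameter values imply, the following Python sketch evaluates the Gaussian weight of Eq. (10) for $\alpha = 40$ pixels. The function name `silhouette_weight` and the sample pixel coordinates are hypothetical, and the sketch assumes that a matched contour pixel lies at most $\tau = 30$ pixels away from the projected vertex.

```python
import math

ALPHA = 40.0  # width of the Gaussian lobe in pixels
TAU = 30.0    # search region for contour matches in pixels


def silhouette_weight(v_proj, p_contour, alpha=ALPHA):
    """Unnormalized Gaussian of the pixel distance between a projected
    vertex and its matched contour pixel, as in Eq. (10)."""
    dx = v_proj[0] - p_contour[0]
    dy = v_proj[1] - p_contour[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * alpha * alpha))


# Hypothetical pixel positions: a perfect match and a match found at the
# border of the search region.
w_perfect = silhouette_weight((100.0, 50.0), (100.0, 50.0))
w_border = silhouette_weight((100.0, 50.0), (100.0 + TAU, 50.0))
```

Under these assumptions a perfect match receives weight 1, while a match at the border of the search region still receives a weight of roughly 0.75, so silhouette constraints are attenuated smoothly rather than switched off.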

Figure 5. Convergence quality after 2 and 8 solver iterations of constraint based deformation (a) and least squares conjugate gradient (b). The mesh obtained by solving the linear system using a Cholesky decomposition is shown in (c).

Figure 6. Evaluation of the number of contour matching iterations (convergence in % over the number of iterations, for meshes of different vertex counts). The dotted line represents the value $N_o = 8$ used in this paper.

For estimating the human body pose, any algorithm that computes skeleton joint positions in real-time is suitable. For example, Straka et al. [21] compute the skeleton pose directly from silhouette images and Shotton et al. [1] use depth maps as input. We use the OpenNI framework [22], which includes a real-time pose estimation module similar to [1]. Instead of using a Kinect camera, which would require additional calibration and synchronization with our multi-view system, we generate a volumetric 3D model [2] and render a depth map from a virtual viewpoint. Note that [22] only supports typical Kinect poses; therefore, our implementation would benefit from more advanced real-time pose estimation systems such as [1], [23], which are unfortunately not publicly available.

B. Visual Quality

Our solver and the conjugate gradient method require multiple iterations until a satisfying mesh deformation is obtained. In Fig. 5, we compare the quality of the resulting mesh (2 500 vertices) after two and eight solver iterations $N_i$ while keeping the rim-vertex/contour matches constant. The constraint based approach produces smooth results after two iterations already, while the conjugate gradient solver yields a noisy mesh. After eight iterations both approaches yield similar results, which are comparable to the mesh obtained by solving the LME system via Cholesky decomposition.
The reason for the fast convergence of our algorithm is that we use nonlinear constraints and that the step size is automatically tuned according to the number of iterations. For high quality results, we iterate between contour matching and mesh deformation in an iterative closest point fashion. In Fig. 6, we analyze how many iterations are needed until the contour correspondences stabilize (at 100%, all vertices have converged to a stable position). We use $N_o = 8$ iterations as a good trade-off between quality and speed in the following experiments.

In Fig. 7a, we analyze the distribution of the remaining error by rendering silhouettes of an artificial human body. To this end, we render a known human mesh from virtual cameras that mimic our real camera setup. After applying our mesh deformation algorithm, we can determine the offset between deformed vertex positions and ground-truth data using the Hausdorff distance. The error stays below 10 mm for the majority of the body surface. In concave areas such as the crotch region there is a higher error since these regions are not visible in silhouette images. Fig. 7b shows a wire frame representation of the deformed mesh, overlaid on recorded camera images.

Figure 7. (a) Deformation error measured via the Hausdorff distance. (b) Mesh overlaid on captured images.

Related methods often present the deformation of a subject specific laser scan, which includes details such as the face and wrinkles of clothes [4], [8]. In contrast, we deform the same template mesh to multi-view silhouette images of a variety of people (see Fig. 8). This means that the mesh will only adapt to details that are visible in silhouette contours. However, we can recover additional details in a rendering stage through projective texturing (see Fig. 1d). The advantage of using a generic mesh is that we can estimate the body shape of previously unknown people without additional 3D scanning.
Note that the quality of feet in our results is comparatively low as the majority of our cameras are pointed towards the upper body.

C. Runtime Performance

We analyze the runtime performance of constraint based mesh deformation on a single-threaded 3 GHz processor. In addition, we show that our approach can take advantage of current GPU architectures such as the NVIDIA GTX 480 by processing all constraints in parallel. For runtime measurements, we perform $N_o = 8$ iterations of contour matching and use $N_i = 8$ solver iterations for our method and conjugate gradients. In Fig. 9 we analyze the time required for the deformation of a mesh at different resolutions and compare the runtime to standard linear solvers. Note that we exclude the time for matching rim-vertices with silhouette contours, which is the same for all methods. Our method (GPU/CPU Constraint) clearly outperforms both linear solvers by a factor of about 20 in the sequential implementation and is more than 100 times faster when executed on a GPU. The bottleneck of linear solvers lies in time consuming matrix decompositions or matrix-vector products.

Figure 8. A single template mesh can be deformed to people of different size and gender. The color images are background segmented camera images and the mesh is rendered from a similar viewing angle.

Figure 9. Comparison of the runtime of different optimization methods (GPU Constraint, CPU Constraint, Cholesky, Conj. Gradient) with increasing number of vertices (time per frame in seconds over the number of vertices, 0 to 12,000).

Table I. Comparison of the time required to deform a human mesh to multi-camera data in seconds per frame.

Authors               Model             Vertices  Time
Aguiar et al. [4]     Scan              2 K       27 s
Cagniart et al. [7]   Scan/Visual Hull  10 K      25 s
Hofmann & G. [11]     parametric        N/A       15 s
Vlasic et al. [9]     Scan              10 K      4.8 s
Gall et al. [8]       Scan              N/A       1.7 s
This work (CPU)       SCAPE             12 K      0.15 s
This work (GPU)       SCAPE             12 K      0.02 s

The complete pipeline for mesh deformation includes human pose estimation and mesh initialization. The implementation of our approach is able to adapt a mesh with vertices to multi-view silhouette data within 150 ms on a single CPU (or only 20 ms on a GPU). This allows for mesh deformation at the frame rate of our camera setup. We can decrease this processing time even further when the number of vertices is reduced.
In particular, when texture is applied to the mesh, a few thousand vertices are sufficient for a realistic display.

In Table I, we compare the runtime of our approach with existing methods. A direct comparison of these methods is not possible, nor is it fair to compare runtimes measured on different platforms. However, this paper presents the first method that eliminates the performance bottleneck of the solver. So far, only our system is able to achieve interactive frame rates when adapting the shape of a human body model to image data.

D. Limitations

The current implementation relies on a fairly accurate initialization of the skeleton joints. Small displacements of joints can be handled without loss of quality since the mesh is automatically pulled towards the silhouette contour. However, if the displacement is too large or the initialization is completely wrong, the search for silhouette contours will fail and no silhouette constraints can be generated for the affected vertices. Our approach cannot adapt the body shape if the user wears clothing that differs substantially from the template mesh (e.g. a skirt). In this case, a specialized template with similar clothing is needed.

VI. CONCLUSIONS

We have presented a novel method which allows us to automatically estimate the shape of the human body from multi-view images in real-time. This is achieved by deforming a generic template mesh such that rim-vertices are aligned with silhouette contours in all input images. In contrast to existing approaches, we optimize the mesh by using an iterative solver which allows integrating nonlinear constraints. We have shown that our solver outperforms previous work in execution time by a factor of 20 or more while maintaining a comparable visual quality of the deformed mesh. Thus, we are able to estimate the pose and shape of a human body in an interactive environment. This opens up the possibility for a variety of applications including live 3D video transmission and augmented reality applications where the user can control his own personal avatar.

Related work preferably adapts the body surface using subject-specific laser scans [4], [8]. We have demonstrated that our constraints are sufficient to deform a generic mesh [19] to fit a variety of persons as long as they wear tight-fitting clothing. This makes our method particularly suited for multi-user environments where no person-specific template mesh is available or where building such a model is the desired task.

In this paper, we have focused on mesh deformation based on silhouettes. However, our method is capable of adapting a mesh to different input data as well. For example, it is possible to create constraints that deform a mesh to fit oriented point clouds [16] or depth maps [14]. Recently, it has been shown how to jointly optimize the mesh surface and the underlying skeleton in a linear way [5], which is compatible with our constraint definitions. Therefore, future work will focus on including such skeleton constraints in our algorithm to make the deformation process even more robust.

ACKNOWLEDGMENT

This work was supported by the Austrian Research Promotion Agency (FFG) under the BRIDGE program, project # (NARKISSOS). Furthermore, we would like to thank the reviewers for their valuable comments and suggestions. We also want to thank everyone who gave her or his time during the evaluation of this work.

REFERENCES

[1] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in Proc. of CVPR.
[2] M. Straka, S. Hauswiesner, M. Rüther, and H. Bischof, "A free-viewpoint virtual mirror with marker-less user interaction," in Proc. of SCIA 2011, LNCS 6688, A. Heyden and F. Kahl, Eds., 2011.
[3] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking," in Proc. of IEEE ISMAR.
[4] E. de Aguiar, C. Stoll, C. Theobalt, N. Ahmed, H.-P. Seidel, and S. Thrun, "Performance capture from sparse multi-view video," ACM Transactions on Graphics, vol. 27, no. 3.
[5] M. Straka, S. Hauswiesner, M. Rüther, and H. Bischof, "Simultaneous shape and pose adaption of articulated models using linear optimization," in Proc. of ECCV 2012, Part I, LNCS 7572, 2012.
[6] L. Ballan and G. M. Cortelazzo, "Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes," in Proc. of 3DPVT.
[7] C. Cagniart, E. Boyer, and S. Ilic, "Probabilistic deformable surface tracking from multiple videos," in Proc. of ECCV 2010, Part IV, LNCS 6314, 2010.
[8] J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel, "Motion capture using joint skeleton tracking and surface estimation," in Proc. of CVPR.
[9] D. Vlasic, I. Baran, W. Matusik, and J. Popović, "Articulated mesh animation from multi-view silhouettes," ACM Transactions on Graphics, vol. 27, no. 3.
[10] M. Botsch and O. Sorkine, "On linear variational surface deformation methods," IEEE Trans. on Visualization and Computer Graphics, vol. 14, no. 1.
[11] M. Hofmann and D. M. Gavrila, "3D human model adaptation by frame selection and shape-texture optimization," Computer Vision and Image Understanding, vol. 115, no. 11.
[12] A. Kanaujia, N. Haering, G. Taylor, and C. Bregler, "3D human pose and shape estimation from multi-view imagery," in Proc. of CVPR Workshops.
[13] N. Hasler, C. Stoll, B. Rosenhahn, T. Thormählen, and H.-P. Seidel, "Estimating body shape of dressed humans," Computers & Graphics, vol. 33, no. 3.
[14] A. Weiss, D. Hirshberg, and M. J. Black, "Home 3D body scans from noisy image and range data," in Proc. of ICCV, 2011.
[15] K. Li, Q. Dai, and W. Xu, "Markerless shape and motion capture from multiview video sequences," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 3.
[16] Y. Furukawa and J. Ponce, "Dense 3D motion capture from synchronized video streams," in Proc. of CVPR.
[17] M. Müller, B. Heidelberger, M. Hennix, and J. Ratcliff, "Position based dynamics," Journal of Visual Communication and Image Representation, vol. 18, no. 2.
[18] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. SIAM.
[19] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis, "SCAPE: shape completion and animation of people," in Proc. of the ACM SIGGRAPH.
[20] I. Baran and J. Popović, "Automatic rigging and animation of 3D characters," in Proc. of the ACM SIGGRAPH.
[21] M. Straka, S. Hauswiesner, M. Rüther, and H. Bischof, "Skeletal graph based human pose estimation in real-time," in Proc. of BMVC, J. Hoey, S. McKenna, and E. Trucco, Eds.
[22] (2012) OpenNI. [Online]. Available:
[23] C. Stoll, N. Hasler, J. Gall, H. Seidel, and C. Theobalt, "Fast articulated motion tracking using a sums of Gaussians body model," in Proc. of ICCV, 2011.

International Conference on Communication, Media, Technology and Design. ICCMTD May 2012 Istanbul - Turkey

International Conference on Communication, Media, Technology and Design. ICCMTD May 2012 Istanbul - Turkey VISUALIZING TIME COHERENT THREE-DIMENSIONAL CONTENT USING ONE OR MORE MICROSOFT KINECT CAMERAS Naveed Ahmed University of Sharjah Sharjah, United Arab Emirates Abstract Visualizing or digitization of the

More information

SCAPE: Shape Completion and Animation of People

SCAPE: Shape Completion and Animation of People SCAPE: Shape Completion and Animation of People By Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, James Davis From SIGGRAPH 2005 Presentation for CS468 by Emilio Antúnez

More information


PERFORMANCE CAPTURE FROM SPARSE MULTI-VIEW VIDEO Stefan Krauß, Juliane Hüttl SE, SoSe 2011, HU-Berlin PERFORMANCE CAPTURE FROM SPARSE MULTI-VIEW VIDEO 1 Uses of Motion/Performance Capture movies games, virtual environments biomechanics, sports science,

More information

Robust Human Body Shape and Pose Tracking

Robust Human Body Shape and Pose Tracking Robust Human Body Shape and Pose Tracking Chun-Hao Huang 1 Edmond Boyer 2 Slobodan Ilic 1 1 Technische Universität München 2 INRIA Grenoble Rhône-Alpes Marker-based motion capture (mocap.) Adventages:

More information

Clothed and Naked Human Shapes Estimation from a Single Image

Clothed and Naked Human Shapes Estimation from a Single Image Clothed and Naked Human Shapes Estimation from a Single Image Yu Guo, Xiaowu Chen, Bin Zhou, and Qinping Zhao State Key Laboratory of Virtual Reality Technology and Systems School of Computer Science and

More information

Multi-view stereo. Many slides adapted from S. Seitz

Multi-view stereo. Many slides adapted from S. Seitz Multi-view stereo Many slides adapted from S. Seitz Beyond two-view stereo The third eye can be used for verification Multiple-baseline stereo Pick a reference image, and slide the corresponding window

More information

3D Reconstruction of Human Bodies with Clothes from Un-calibrated Monocular Video Images

3D Reconstruction of Human Bodies with Clothes from Un-calibrated Monocular Video Images 3D Reconstruction of Human Bodies with Clothes from Un-calibrated Monocular Video Images presented by Tran Cong Thien Qui PhD Candidate School of Computer Engineering & Institute for Media Innovation Supervisor:

More information


TEXTURE OVERLAY ONTO NON-RIGID SURFACE USING COMMODITY DEPTH CAMERA TEXTURE OVERLAY ONTO NON-RIGID SURFACE USING COMMODITY DEPTH CAMERA Tomoki Hayashi 1, Francois de Sorbier 1 and Hideo Saito 1 1 Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi,

More information

Free-Form Mesh Tracking: a Patch-Based Approach

Free-Form Mesh Tracking: a Patch-Based Approach Free-Form Mesh Tracking: a Patch-Based Approach Cedric Cagniart 1, Edmond Boyer 2, Slobodan Ilic 1 1 Department of Computer Science, Technical University of Munich 2 Grenoble Universités, INRIA Rhône-Alpes

More information

Personalization and Evaluation of a Real-time Depth-based Full Body Tracker

Personalization and Evaluation of a Real-time Depth-based Full Body Tracker Personalization and Evaluation of a Real-time Depth-based Full Body Tracker Thomas Helten 1 Andreas Baak 1 Gaurav Bharaj 2 Meinard Müller 3 Hans-Peter Seidel 1 Christian Theobalt 1 1 MPI Informatik 2 Harvard

More information

A Model-based Approach to Rapid Estimation of Body Shape and Postures Using Low-Cost Depth Cameras

A Model-based Approach to Rapid Estimation of Body Shape and Postures Using Low-Cost Depth Cameras A Model-based Approach to Rapid Estimation of Body Shape and Postures Using Low-Cost Depth Cameras Abstract Byoung-Keon D. PARK*, Matthew P. REED University of Michigan, Transportation Research Institute,

More information

Accurate 3D Face and Body Modeling from a Single Fixed Kinect

Accurate 3D Face and Body Modeling from a Single Fixed Kinect Accurate 3D Face and Body Modeling from a Single Fixed Kinect Ruizhe Wang*, Matthias Hernandez*, Jongmoo Choi, Gérard Medioni Computer Vision Lab, IRIS University of Southern California Abstract In this

More information

Mobile Point Fusion. Real-time 3d surface reconstruction out of depth images on a mobile platform

Mobile Point Fusion. Real-time 3d surface reconstruction out of depth images on a mobile platform Mobile Point Fusion Real-time 3d surface reconstruction out of depth images on a mobile platform Aaron Wetzler Presenting: Daniel Ben-Hoda Supervisors: Prof. Ron Kimmel Gal Kamar Yaron Honen Supported

More information

A Free-Viewpoint Virtual Mirror with Marker-Less User Interaction

A Free-Viewpoint Virtual Mirror with Marker-Less User Interaction A Free-Viewpoint Virtual Mirror with Marker-Less User Interaction Matthias Straka, Stefan Hauswiesner, Matthias Rüther, and Horst Bischof Institute for Computer Graphics and Vision, Graz University of

More information

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Articulated Pose Estimation with Flexible Mixtures-of-Parts Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:

More information

A Virtual Dressing Room Using Kinect

A Virtual Dressing Room Using Kinect 2017 IJSRST Volume 3 Issue 3 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology A Virtual Dressing Room Using Kinect Jagtap Prajakta Bansidhar, Bhole Sheetal Hiraman, Mate

More information

Mesh from Depth Images Using GR 2 T

Mesh from Depth Images Using GR 2 T Mesh from Depth Images Using GR 2 T Mairead Grogan & Rozenn Dahyot School of Computer Science and Statistics Trinity College Dublin Dublin, Ireland mgrogan@tcd.ie, Rozenn.Dahyot@tcd.ie www.scss.tcd.ie/

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information


TEXTURE OVERLAY ONTO NON-RIGID SURFACE USING COMMODITY DEPTH CAMERA TEXTURE OVERLAY ONTO NON-RIGID SURFACE USING COMMODITY DEPTH CAMERA Tomoki Hayashi, Francois de Sorbier and Hideo Saito Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku,

More information

Automatic Generation of Animatable 3D Personalized Model Based on Multi-view Images

Automatic Generation of Animatable 3D Personalized Model Based on Multi-view Images Automatic Generation of Animatable 3D Personalized Model Based on Multi-view Images Seong-Jae Lim, Ho-Won Kim, Jin Sung Choi CG Team, Contents Division ETRI Daejeon, South Korea sjlim@etri.re.kr Bon-Ki

More information

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H.

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H. Nonrigid Surface Modelling and Fast Recovery Zhu Jianke Supervisor: Prof. Michael R. Lyu Committee: Prof. Leo J. Jia and Prof. K. H. Wong Department of Computer Science and Engineering May 11, 2007 1 2

More information

Markerless Motion Capture with Multi-view Structured Light

Markerless Motion Capture with Multi-view Structured Light Markerless Motion Capture with Multi-view Structured Light Ricardo R. Garcia, Avideh Zakhor; UC Berkeley; Berkeley, CA (a) (c) Figure 1: Using captured partial scans of (a) front and (b) back to generate

More information

Motion Capture Using Joint Skeleton Tracking and Surface Estimation

Motion Capture Using Joint Skeleton Tracking and Surface Estimation Motion Capture Using Joint Skeleton Tracking and Surface Estimation Juergen Gall 1,2, Carsten Stoll 2, Edilson de Aguiar 2, Christian Theobalt 3, Bodo Rosenhahn 4, and Hans-Peter Seidel 2 1 BIWI, ETH Zurich

More information

Key Developments in Human Pose Estimation for Kinect

Key Developments in Human Pose Estimation for Kinect Key Developments in Human Pose Estimation for Kinect Pushmeet Kohli and Jamie Shotton Abstract The last few years have seen a surge in the development of natural user interfaces. These interfaces do not

More information

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University. 3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

Optical-inertial Synchronization of MoCap Suit with Single Camera Setup for Reliable Position Tracking

Optical-inertial Synchronization of MoCap Suit with Single Camera Setup for Reliable Position Tracking Optical-inertial Synchronization of MoCap Suit with Single Camera Setup for Reliable Position Tracking Adam Riečický 2,4, Martin Madaras 1,2,4, Michal Piovarči 3,4 and Roman Ďurikovič 2 1 Institute of

More information

Handheld scanning with ToF sensors and cameras

Handheld scanning with ToF sensors and cameras Handheld scanning with ToF sensors and cameras Enrico Cappelletto, Pietro Zanuttigh, Guido Maria Cortelazzo Dept. of Information Engineering, University of Padova enrico.cappelletto,zanuttigh,corte@dei.unipd.it

More information

Dynamic Geometry Processing

Dynamic Geometry Processing Dynamic Geometry Processing EG 2012 Tutorial Will Chang, Hao Li, Niloy Mitra, Mark Pauly, Michael Wand Tutorial: Dynamic Geometry Processing 1 Articulated Global Registration Introduction and Overview

More information

A Data-Driven Approach for 3D Human Body Pose Reconstruction from a Kinect Sensor

A Data-Driven Approach for 3D Human Body Pose Reconstruction from a Kinect Sensor Journal of Physics: Conference Series PAPER OPEN ACCESS A Data-Driven Approach for 3D Human Body Pose Reconstruction from a Kinect Sensor To cite this article: Li Yao et al 2018 J. Phys.: Conf. Ser. 1098

More information

Capturing Skeleton-based Animation Data from a Video

Capturing Skeleton-based Animation Data from a Video Capturing Skeleton-based Animation Data from a Video Liang-Yu Shih, Bing-Yu Chen National Taiwan University E-mail: xdd@cmlab.csie.ntu.edu.tw, robin@ntu.edu.tw ABSTRACT This paper presents a semi-automatic

More information

Geometric Modeling and Processing

Geometric Modeling and Processing Geometric Modeling and Processing Tutorial of 3DIM&PVT 2011 (Hangzhou, China) May 16, 2011 6. Mesh Simplification Problems High resolution meshes becoming increasingly available 3D active scanners Computer

More information

Real-Time Scene Reconstruction. Remington Gong Benjamin Harris Iuri Prilepov

Real-Time Scene Reconstruction. Remington Gong Benjamin Harris Iuri Prilepov Real-Time Scene Reconstruction Remington Gong Benjamin Harris Iuri Prilepov June 10, 2010 Abstract This report discusses the implementation of a real-time system for scene reconstruction. Algorithms for

More information

CS 523: Computer Graphics, Spring Shape Modeling. Skeletal deformation. Andrew Nealen, Rutgers, /12/2011 1

CS 523: Computer Graphics, Spring Shape Modeling. Skeletal deformation. Andrew Nealen, Rutgers, /12/2011 1 CS 523: Computer Graphics, Spring 2011 Shape Modeling Skeletal deformation 4/12/2011 1 Believable character animation Computers games and movies Skeleton: intuitive, low-dimensional subspace Clip courtesy

More information

Cage-based Tracking for Performance Animation

Cage-based Tracking for Performance Animation Cage-based Tracking for Performance Animation Yann Savoye, Jean-Sébastien Franco To cite this version: Yann Savoye, Jean-Sébastien Franco. Cage-based Tracking for Performance Animation. ACCV 10 :the Tenth

More information

Simulation and Visualization of Virtual Trial Room

Simulation and Visualization of Virtual Trial Room International Journal for Modern Trends in Science and Technology Volume: 03, Issue No: 05, May 2017 ISSN: 2455-3778 http://www.ijmtst.com Simulation and Visualization of Virtual Trial Room Vishruti Patel

More information

3D Computer Vision. Dense 3D Reconstruction II. Prof. Didier Stricker. Christiano Gava

3D Computer Vision. Dense 3D Reconstruction II. Prof. Didier Stricker. Christiano Gava 3D Computer Vision Dense 3D Reconstruction II Prof. Didier Stricker Christiano Gava Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de

More information

Dynamic Human Surface Reconstruction Using a Single Kinect

Dynamic Human Surface Reconstruction Using a Single Kinect 2013 13th International Conference on Computer-Aided Design and Computer Graphics Dynamic Human Surface Reconstruction Using a Single Kinect Ming Zeng Jiaxiang Zheng Xuan Cheng Bo Jiang Xinguo Liu Software

More information

Skeletal deformation

Skeletal deformation CS 523: Computer Graphics, Spring 2009 Shape Modeling Skeletal deformation 4/22/2009 1 Believable character animation Computers games and movies Skeleton: intuitive, low dimensional subspace Clip courtesy

More information

Comparison of Default Patient Surface Model Estimation Methods

Comparison of Default Patient Surface Model Estimation Methods Comparison of Default Patient Surface Model Estimation Methods Xia Zhong 1, Norbert Strobel 2, Markus Kowarschik 2, Rebecca Fahrig 2, Andreas Maier 1,3 1 Pattern Recognition Lab, Friedrich-Alexander-Universität

More information

Reconstructing Articulated Rigged Models from RGB-D Videos

Reconstructing Articulated Rigged Models from RGB-D Videos Reconstructing Articulated Rigged Models from RGB-D Videos Dimitrios Tzionas 1,2 Juergen Gall 1 1 University of Bonn 2 MPI for Intelligent Systems {tzionas,gall}@iai.uni-bonn.de Abstract. Although commercial

More information

Human pose estimation using Active Shape Models

Human pose estimation using Active Shape Models Human pose estimation using Active Shape Models Changhyuk Jang and Keechul Jung Abstract Human pose estimation can be executed using Active Shape Models. The existing techniques for applying to human-body

More information

3D Colored Model Generation Based on Multiview Textures and Triangular Mesh

3D Colored Model Generation Based on Multiview Textures and Triangular Mesh 3D Colored Model Generation Based on Multiview Textures and Triangular Mesh Lingni Ma, Luat Do, Egor Bondarev and Peter H. N. de With Department of Electrical Engineering, Eindhoven University of Technology

More information

Occlusion Detection of Real Objects using Contour Based Stereo Matching

Occlusion Detection of Real Objects using Contour Based Stereo Matching Occlusion Detection of Real Objects using Contour Based Stereo Matching Kenichi Hayashi, Hirokazu Kato, Shogo Nishida Graduate School of Engineering Science, Osaka University,1-3 Machikaneyama-cho, Toyonaka,

More information

Human Upper Body Pose Estimation in Static Images

Human Upper Body Pose Estimation in Static Images 1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project

More information

Multi-Frame Scene-Flow Estimation Using a Patch Model and Smooth Motion Prior

Multi-Frame Scene-Flow Estimation Using a Patch Model and Smooth Motion Prior POPHAM, BHALERAO, WILSON: MULTI-FRAME SCENE-FLOW ESTIMATION 1 Multi-Frame Scene-Flow Estimation Using a Patch Model and Smooth Motion Prior Thomas Popham tpopham@dcs.warwick.ac.uk Abhir Bhalerao abhir@dcs.warwick.ac.uk

More information

Introduction to Computer Graphics. Animation (1) May 19, 2016 Kenshi Takayama

Introduction to Computer Graphics. Animation (1) May 19, 2016 Kenshi Takayama Introduction to Computer Graphics Animation (1) May 19, 2016 Kenshi Takayama Skeleton-based animation Simple Intuitive Low comp. cost https://www.youtube.com/watch?v=dsonab58qva 2 Representing a pose using

More information

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting R. Maier 1,2, K. Kim 1, D. Cremers 2, J. Kautz 1, M. Nießner 2,3 Fusion Ours 1

More information

Visualization and Analysis of Inverse Kinematics Algorithms Using Performance Metric Maps

Visualization and Analysis of Inverse Kinematics Algorithms Using Performance Metric Maps Visualization and Analysis of Inverse Kinematics Algorithms Using Performance Metric Maps Oliver Cardwell, Ramakrishnan Mukundan Department of Computer Science and Software Engineering University of Canterbury

More information


CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR COMMODITY DEPTH CAMERAS. Cha Zhang and Zhengyou Zhang CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR COMMODITY DEPTH CAMERAS Cha Zhang and Zhengyou Zhang Communication and Collaboration Systems Group, Microsoft Research {chazhang, zhang}@microsoft.com ABSTRACT

More information

Colored Point Cloud Registration Revisited Supplementary Material

Colored Point Cloud Registration Revisited Supplementary Material Colored Point Cloud Registration Revisited Supplementary Material Jaesik Park Qian-Yi Zhou Vladlen Koltun Intel Labs A. RGB-D Image Alignment Section introduced a joint photometric and geometric objective

More information

Geometric Reconstruction Dense reconstruction of scene geometry

Geometric Reconstruction Dense reconstruction of scene geometry Lecture 5. Dense Reconstruction and Tracking with Real-Time Applications Part 2: Geometric Reconstruction Dr Richard Newcombe and Dr Steven Lovegrove Slide content developed from: [Newcombe, Dense Visual

More information

Lecture 19: Depth Cameras. Visual Computing Systems CMU , Fall 2013

Lecture 19: Depth Cameras. Visual Computing Systems CMU , Fall 2013 Lecture 19: Depth Cameras Visual Computing Systems Continuing theme: computational photography Cameras capture light, then extensive processing produces the desired image Today: - Capturing scene depth

More information

Markerless human motion capture through visual hull and articulated ICP

Markerless human motion capture through visual hull and articulated ICP Markerless human motion capture through visual hull and articulated ICP Lars Mündermann lmuender@stanford.edu Stefano Corazza Stanford, CA 93405 stefanoc@stanford.edu Thomas. P. Andriacchi Bone and Joint

More information

High-speed Three-dimensional Mapping by Direct Estimation of a Small Motion Using Range Images

High-speed Three-dimensional Mapping by Direct Estimation of a Small Motion Using Range Images MECATRONICS - REM 2016 June 15-17, 2016 High-speed Three-dimensional Mapping by Direct Estimation of a Small Motion Using Range Images Shinta Nozaki and Masashi Kimura School of Science and Engineering

More information

Research Article Human Model Adaptation for Multiview Markerless Motion Capture

Research Article Human Model Adaptation for Multiview Markerless Motion Capture Mathematical Problems in Engineering Volume 2013, Article ID 564214, 7 pages http://dx.doi.org/10.1155/2013/564214 Research Article Human Model Adaptation for Multiview Markerless Motion Capture Dianyong

More information

Project Updates Short lecture Volumetric Modeling +2 papers

Project Updates Short lecture Volumetric Modeling +2 papers Volumetric Modeling Schedule (tentative) Feb 20 Feb 27 Mar 5 Introduction Lecture: Geometry, Camera Model, Calibration Lecture: Features, Tracking/Matching Mar 12 Mar 19 Mar 26 Apr 2 Apr 9 Apr 16 Apr 23

More information

calibrated coordinates Linear transformation pixel coordinates

calibrated coordinates Linear transformation pixel coordinates 1 calibrated coordinates Linear transformation pixel coordinates 2 Calibration with a rig Uncalibrated epipolar geometry Ambiguities in image formation Stratified reconstruction Autocalibration with partial

More information

Multiview Stereo COSC450. Lecture 8

Multiview Stereo COSC450. Lecture 8 Multiview Stereo COSC450 Lecture 8 Stereo Vision So Far Stereo and epipolar geometry Fundamental matrix captures geometry 8-point algorithm Essential matrix with calibrated cameras 5-point algorithm Intersect

More information

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data Andrew Miller Computer Vision Group Research Developer 3-D TERRAIN RECONSTRUCTION

More information

Def De orma f tion orma Disney/Pixar

Def De orma f tion orma Disney/Pixar Deformation Disney/Pixar Deformation 2 Motivation Easy modeling generate new shapes by deforming existing ones 3 Motivation Easy modeling generate new shapes by deforming existing ones 4 Motivation Character

More information

Overview. Augmented reality and applications Marker-based augmented reality. Camera model. Binary markers Textured planar markers

Overview. Augmented reality and applications Marker-based augmented reality. Camera model. Binary markers Textured planar markers Augmented reality Overview Augmented reality and applications Marker-based augmented reality Binary markers Textured planar markers Camera model Homography Direct Linear Transformation What is augmented

More information

Object Reconstruction

Object Reconstruction B. Scholz Object Reconstruction 1 / 39 MIN-Fakultät Fachbereich Informatik Object Reconstruction Benjamin Scholz Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Fachbereich

More information

Processing 3D Surface Data

Processing 3D Surface Data Processing 3D Surface Data Computer Animation and Visualisation Lecture 12 Institute for Perception, Action & Behaviour School of Informatics 3D Surfaces 1 3D surface data... where from? Iso-surfacing

More information


FLY THROUGH VIEW VIDEO GENERATION OF SOCCER SCENE FLY THROUGH VIEW VIDEO GENERATION OF SOCCER SCENE Naho INAMOTO and Hideo SAITO Keio University, Yokohama, Japan {nahotty,saito}@ozawa.ics.keio.ac.jp Abstract Recently there has been great deal of interest

More information

Visual Hulls from Single Uncalibrated Snapshots Using Two Planar Mirrors

Visual Hulls from Single Uncalibrated Snapshots Using Two Planar Mirrors Visual Hulls from Single Uncalibrated Snapshots Using Two Planar Mirrors Keith Forbes 1 Anthon Voigt 2 Ndimi Bodika 2 1 Digital Image Processing Group 2 Automation and Informatics Group Department of Electrical

More information

Dynamic 2D/3D Registration for the Kinect

Dynamic 2D/3D Registration for the Kinect Dynamic 2D/3D Registration for the Kinect Sofien Bouaziz Mark Pauly École Polytechnique Fédérale de Lausanne Abstract Image and geometry registration algorithms are an essential component of many computer

More information

Integrating Shape from Shading and Shape from Stereo for Variable Reflectance Surface Reconstruction from SEM Images

Integrating Shape from Shading and Shape from Stereo for Variable Reflectance Surface Reconstruction from SEM Images Integrating Shape from Shading and Shape from Stereo for Variable Reflectance Surface Reconstruction from SEM Images Reinhard Danzl 1 and Stefan Scherer 2 1 Institute for Computer Graphics and Vision,

More information

Processing 3D Surface Data

Processing 3D Surface Data Processing 3D Surface Data Computer Animation and Visualisation Lecture 17 Institute for Perception, Action & Behaviour School of Informatics 3D Surfaces 1 3D surface data... where from? Iso-surfacing

More information

Robust Human Body Shape and Pose Tracking

Robust Human Body Shape and Pose Tracking Robust Human Body Shape and Pose Tracking Chun-Hao Huang, Edmond Boyer, Slobodan Ilic To cite this version: Chun-Hao Huang, Edmond Boyer, Slobodan Ilic. Robust Human Body Shape and Pose Tracking. 3DV -

More information

VIRTUAL try-on applications have become popular in recent

VIRTUAL try-on applications have become popular in recent 1552 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 19, NO. 9, SEPTEMBER 2013 Virtual Try-On through Image-Based Rendering Stefan Hauswiesner, Student Member, IEEE, Matthias Straka, and

More information
