Registration of Dynamic Range Images

Tan-Chi Ho 1,2, Jung-Hong Chuang 1, Wen-Wei Lin 2, Song-Sun Lin 2
1 Department of Computer Science, National Chiao-Tung University
2 Department of Applied Mathematics, National Chiao-Tung University

Abstract
With recent advances in shape acquisition devices, high-quality range images can be captured at video frame rates. The key ingredient in reconstructing a 3D model from such an image sequence is registration, the process that finds point correspondences between range images. In this paper, we review recent advances in the registration of dynamic range images and discuss some open problems.

1 Introduction
Describing the shape of a 3D object with mathematical formulations is a fundamental task in computer graphics. Traditionally, this modeling is done by artists using 3D modeling software. However, even for the most experienced artists, creating a virtual object that mimics a real-world object is still a great challenge. With recent advances in shape acquisition devices, a 3D object can instead be modeled by capturing range images of the real-world object with a camera and reconstructing the mesh directly from those range images.

Several problems arise when modeling with range acquisition devices. Due to occlusion, the range images captured by a camera may not cover all surface regions of the 3D object, and such missing regions lead to holes in the reconstructed mesh. Multiple range images are usually acquired to overcome this problem. Registration of range images is the process of aligning all the range images in a common 3D space so that a complete mesh can be reconstructed from them.

In this paper, we review recent advances in the registration of dynamic range images. We define a range image as a 2D image in which each pixel stores the depth from the capturing camera to the object.
We also define a point cloud as the set of 3D points corresponding to the pixels of a range image. Dynamic range images are sequences of range images captured dynamically by a camera. The captured real-world object can be either rigid (its shape does not change) or deformable (its shape changes during acquisition).

2 Registration of Dynamic Range Images
For a rigid object, registration can be formulated as finding a rigid transformation between point clouds. The rigid transformation can be computed with the Iterative Closest Point (ICP) algorithm [1], which iteratively minimizes the squared distance between the two point clouds until convergence.

Registration of a deformable object, however, cannot be described by a single rigid transformation, since the object's shape changes during acquisition. Each point in a point cloud may undergo its own transformation to the other point clouds, and establishing the correspondence from each point in one point cloud to the other point clouds is the major challenge. A surface deforming over time can be described as a kinematic space-time surface in 4D space, parameterized by 3D position and time. Each acquired point cloud partially fits a time slice of this space-time surface and may be incomplete due to occlusion during acquisition. Correspondences can then be traced on the continuous 4D space-time surface.

One simple approach is based on image tracking techniques such as optical flow [2]. Pixel correspondences between consecutive range images are established by tracking colors or intrinsic surface properties derived from the point clouds. However, because the tracking does not take 3D surface information into account, surface regions missing due to occlusion cannot be interpreted.

The most widely used method for tracing correspondences is based on a surface deformation model [3]. Such methods usually require a template mesh similar to the acquired object, which is fitted to the acquired point cloud at each time step using a surface deformation method. The deformation should minimize the shape difference between the template mesh and the acquired point cloud while preserving the rigidity of the template mesh and the smoothness of the motion. Once the deformation sequence of the template mesh through the range image sequence is known, correspondences can be established through the deformed template mesh.
Because shape rigidity and motion smoothness are taken into account during deformation, surface regions that are invisible in a single point cloud can be recovered from the template mesh.

Some recent research focuses on finding correspondences between range images through a common parameterization domain. Popa et al. [4] establish correspondences between point clouds of a deformable object using a common parameterization domain. Their work is based on the observation that the meshes reconstructed from two consecutive frames are nearly isometric; when a stretch-minimizing parameterization is applied to these meshes, their parameterizations are likely to be nearly identical. Surface correspondences can therefore be established in the common parameterization domain. To this end, they first detect feature points in image space and track them through the entire sequence using optical flow [2]. These feature points serve as hard constraints during registration. A cross parameterization is then established by fixing the feature points in the parameterization domain and minimizing the stretch for the remaining points.

Another issue in the registration of dynamic range images of deformable objects is global consistency. The methods described above track the deformable object locally between two consecutive frames, but consistency may not be preserved over the entire acquired sequence. Each local match may introduce some correspondence error, and such errors accumulate along the sequence. A more serious problem is topological inconsistency: for example, due to occlusion, two separate object parts may be welded into a single connected region in the reconstructed mesh in some frames and separated in others. Registration methods based on a surface deformation model and a template mesh [3] are usually immune to topological inconsistency, since the object's topology is restricted to that of the template mesh. However, a template mesh similar to the acquired object may be difficult to obtain. Several studies focus on preserving topological consistency without a template mesh [4, 5]. The key idea is to compare the topology of the reconstructed meshes at different time steps in a hierarchical manner. The process begins by checking and fixing topological inconsistencies between two consecutive frames and merging them into a common sequence. Neighboring sequences are then checked and fixed for topological inconsistency and merged into a bigger common sequence. This process is repeated until all frames in the acquired range image sequence are merged.

3 Discussion and Future Work
We summarize some open problems in the current registration of dynamic range images. Current range acquisition hardware can capture 60 frames per second or more. However, registration still takes minutes to hours for just a few seconds of captured sequence, and most of this time is spent tracing correspondences between frames. The gap between acquisition time and processing time limits the usability of such devices in time-critical applications. Another problem is global consistency. The template mesh approach is effective and robust at preserving topological consistency. However, obtaining a template mesh similar to the acquired object is challenging for users, especially when the object has a complex topological structure. Approaches without a template mesh are less robust and spend a lot of time on hierarchical processing. Our goal is to speed up the registration of dynamic range images, and several approaches may achieve this. First, modern acquisition devices can be assumed to provide strong coherence between consecutive frames; under this assumption, computation time can be saved because the search space for correspondences is relatively small.
Another possible approach is to specialize the general registration method to specific target applications, which reduces its computational complexity. For example, the motion of an articulated model can be described by a small set of rigid transformations, which greatly reduces the complexity of tracing correspondences.

References
[1] G. Turk and M. Levoy, "Zippered polygon meshes from range images," in ACM SIGGRAPH, 1994.
[2] J.-Y. Bouguet, "Pyramidal implementation of the Lucas Kanade feature tracker: Description of the algorithm," Tech. Report, Intel Corporation, Microprocessor Research Labs, 1999.
[3] H. Li, B. Adams, L. J. Guibas, and M. Pauly, "Robust single-view geometry and motion reconstruction," ACM Transactions on Graphics (Proc. SIGGRAPH Asia), vol. 28, p. 1, Dec. 2009.
[4] T. Popa, I. South-Dickinson, D. Bradley, A. Sheffer, and W. Heidrich, "Globally consistent space-time reconstruction," Computer Graphics Forum (Proc. Eurographics Symposium on Geometry Processing), vol. 29, pp. 1633-1642, Sept. 2010.
[5] M. Wand, B. Adams, M. Ovsjanikov, A. Berner, M. Bokeloh, P. Jenke, L. Guibas, H.-P. Seidel, and A. Schilling, "Efficient reconstruction of nonrigid shape and motion from real-time 3D scanner data," ACM Transactions on Graphics, vol. 28, pp. 1-15, Apr. 2009.

Registration of Dynamic Range Images
Tan-Chi Ho 1,2, Jung-Hong Chuang 1, Wen-Wei Lin 2, Song-Sun Lin 2
1 Department of Computer Science, 2 Department of Applied Mathematics, National Chiao Tung University

Recent Advances in 3D Model Acquisition
Capture the 3D object surface directly from a camera with distance information (a range image).
Devices: structured light scanner, Microsoft Kinect, Light Stage [Vlasic et al. 2009]
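Turning a range image into the point cloud used in the rest of these slides means back-projecting every pixel through the camera model. The following is a minimal sketch assuming an ideal pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and depth measured along the optical axis; real devices need their own calibration and lens-distortion handling.

```python
def range_image_to_points(depth, fx, fy, cx, cy):
    """Back-project a range image (depth[v][u], 0 = no measurement) to 3D points.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without depth (holes, occlusion)
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


depth = [[2.0, 0.0],
         [2.0, 4.0]]
print(range_image_to_points(depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0))
# -> [(0.0, 0.0, 2.0), (0.0, 1.0, 2.0), (2.0, 2.0, 4.0)]
```

Pixels with zero depth are dropped, which is exactly where the holes discussed on the next slide come from.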

Problems of a Single View
Holes and missing parts
Self-occlusion

Practical Solution
Capture multiple views to cover all missing parts:
fix the object and rotate the camera, or
fix the camera and rotate the object.

Registration of Range Images
Align the point sets of different range images in a common 3D space.

Rigid Registration
The object does not deform, so the point sets of two range images are related by a rigid transformation:
$$\alpha(p) = Rp + T$$
where $R$ is a rotation matrix and $T$ is a translation vector.
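A small illustration of the definition (not from the slides): build a rotation R about the z-axis, apply α(p) = Rp + T, and check the property that makes the transformation rigid, namely that distances between points are preserved. The helper names `rot_z` and `apply_rigid` are ours.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix about the z-axis (row-major nested lists)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid(R, T, p):
    """alpha(p) = R p + T."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + T[i] for i in range(3))


R, T = rot_z(math.pi / 2), (1.0, 2.0, 3.0)
p, q = (1.0, 0.0, 0.0), (0.0, 0.0, 5.0)
ap, aq = apply_rigid(R, T, p), apply_rigid(R, T, q)
print([round(x, 9) for x in ap])                         # -> [1.0, 3.0, 3.0]
print(math.isclose(math.dist(p, q), math.dist(ap, aq)))  # -> True
```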

Rigid Registration
Find the rigid transformation that minimizes the weighted squared distances between matched points:
$$\min_{R,T} \sum_i w_i \lVert \alpha(p_i) - q_i \rVert^2$$
where $(p_i, q_i)$ are matched point pairs and $w_i$ is the confidence of point $p_i$.
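This minimization has a standard closed-form solution via the SVD of a weighted cross-covariance matrix (the weighted Kabsch/Umeyama construction). The slides do not say which solver is used, so the derivation below is offered only as the textbook route, using the same $p_i$, $q_i$, $w_i$:

```latex
\bar p = \frac{\sum_i w_i p_i}{\sum_i w_i}, \qquad
\bar q = \frac{\sum_i w_i q_i}{\sum_i w_i}

\frac{\partial}{\partial T} \sum_i w_i \lVert R p_i + T - q_i \rVert^2 = 0
\;\Longrightarrow\; T = \bar q - R \bar p

\min_R \sum_i w_i \lVert R (p_i - \bar p) - (q_i - \bar q) \rVert^2
\;\Longleftrightarrow\; \max_R \operatorname{tr}(R H), \qquad
H = \sum_i w_i (p_i - \bar p)(q_i - \bar q)^{\mathsf T} = U \Sigma V^{\mathsf T}

R = V \, \mathrm{diag}\!\left(1,\, 1,\, \det(V U^{\mathsf T})\right) U^{\mathsf T}
```

The diagonal correction keeps $\det R = +1$, so the result is a proper rotation rather than a reflection.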

Iterative Closest Point (ICP) [Turk & Levoy 1994]
1. For each point $p_i$, find the closest point $q_i$ as its match.
2. Discard match pairs that are too far apart.
3. Find the rigid transformation that best aligns the remaining pairs.
4. Iterate until convergence.
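The four ICP steps can be sketched with NumPy as follows. This is a minimal point-to-point variant under our own assumptions: brute-force nearest neighbours, a fixed rejection distance `max_dist` standing in for step 2, uniform weights, and an SVD-based rigid fit for step 3. The cited work may use additional matching and rejection rules, which this sketch omits; it also assumes at least three pairs survive the rejection step.

```python
import numpy as np


def best_rigid(P, Q):
    """R, T minimizing sum ||R p_i + T - q_i||^2 (SVD / Kabsch, uniform weights)."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_bar).T @ (Q - q_bar)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_bar - R @ p_bar


def icp(P, Q, max_dist=1.0, iters=50, tol=1e-10):
    """Align source points P (n x 3) to target points Q (m x 3)."""
    R, T = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iters):
        moved = P @ R.T + T
        # Step 1: closest target point for every source point (brute force).
        d2 = ((moved[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        dist = np.sqrt(d2[np.arange(len(P)), nn])
        # Step 2: discard pairs that are too far apart.
        keep = dist < max_dist
        # Step 3: rigid fit of the kept pairs, composed with the current estimate.
        dR, dT = best_rigid(moved[keep], Q[nn[keep]])
        R, T = dR @ R, dR @ T + dT
        # Step 4: stop when the mean residual no longer changes.
        err = dist[keep].mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, T


# Usage: a grid of points moved by a small, known rigid motion.
g = np.arange(4) * 0.5
P = np.array([[x, y, z] for x in g for y in g for z in g])
a = 0.05
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
T_true = np.array([0.02, -0.01, 0.01])
Q = P @ R_true.T + T_true
R, T = icp(P, Q, max_dist=0.25)
print(np.allclose(R, R_true), np.allclose(T, T_true))  # -> True True
```

Because the motion is small relative to the point spacing, the very first closest-point matches are already correct; with larger initial offsets ICP can lock onto wrong matches, which motivates the global registration slide.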

Global Registration
Problem of ICP: it converges to the correct alignment only if the initial positions of the two frames are close.
Global registration methods:
4PCS (4-Points Congruent Sets) [Aiger et al. 2008]
Shape descriptors [Gelfand et al. 2005]
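4PCS builds on the fact that, for four coplanar points a, b, c, d whose lines ab and cd meet at e, the ratios r1 = |a - e| / |a - b| and r2 = |c - e| / |c - d| are invariant under rigid (indeed affine) transformations, so candidate bases in the other scan can be found by matching these ratios. The sketch below only computes the two ratios for one 4-point base and checks their invariance; it is not an implementation of the full 4PCS search, and the helper names are ours.

```python
import math


def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def base_ratios(a, b, c, d):
    """Return (r1, r2) with e = a + r1 (b - a) = c + r2 (d - c).

    Solved in the least-squares sense, so a slightly non-coplanar
    (noisy) base still gives a sensible answer.
    """
    u, v, w = sub(b, a), sub(d, c), sub(c, a)
    A, B, C = dot(u, u), dot(u, v), dot(v, v)
    det = A * C - B * B
    if abs(det) < 1e-12:
        raise ValueError("lines ab and cd are (nearly) parallel")
    r1 = (dot(u, w) * C - B * dot(v, w)) / det
    r2 = (B * dot(u, w) - A * dot(v, w)) / det
    return r1, r2


def rigid_x(p, theta, t):
    """Rotate p about the x-axis by theta, then translate by t."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x + t[0], c * y - s * z + t[1], s * y + c * z + t[2])


base = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 3.0, 0.0)]
print(base_ratios(*base))                          # -> (0.5, 0.25)
moved = [rigid_x(p, math.radians(30), (5.0, -2.0, 1.0)) for p in base]
print([round(r, 9) for r in base_ratios(*moved)])  # -> [0.5, 0.25]
```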

Dynamic Range Images
Single view, captured dynamically
Deformable object
Assumptions:
Single view
The orientations of the two point sets are globally aligned.

Difficulty
The shape changes during acquisition.
The correspondence between two frames is not just a rigid transformation.

Space-Time Surface [Mitra et al. 2007]
The surface deforms over time, forming a surface in 4D space $(x, y, z, t)$.
Frames are slices at times $t_{j-1}, t_j, t_{j+1}$, with sample spacing $s$ within a frame and a time spacing between consecutive frames.

Space-Time Registration [Mitra et al. 2007]
Trace the correspondence on the space-time surface using:
Image tracking (optical flow)
Surface deformation
Cross parameterization

Optical Flow
The point sets are acquired from a camera, so pixel correspondences between range images can be traced in the image domain.

Optical Flow
Problems:
Standard optical flow methods assume that pixel color is invariant during deformation.
No knowledge of the 3D shape.
Cannot interpret surface information missing due to occlusion.
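Under the brightness-constancy assumption above, the flow (u, v) of a small window satisfies I_x u + I_y v + I_t ≈ 0 at every pixel, and Lucas-Kanade solves these equations in the least-squares sense. The sketch below is a single-level, single-window, one-step version for illustration only; Bouguet's pyramidal tracker [2] adds image pyramids and iterative refinement. The synthetic quadratic images are our own test data.

```python
def lucas_kanade_window(I0, I1, cx, cy, r):
    """Estimate the flow (u, v) of the (2r+1)x(2r+1) window centred at (cx, cy).

    I0, I1 are images as nested lists I[y][x]. Spatial gradients use central
    differences on I0; the temporal derivative is I1 - I0.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            ix = (I0[y][x + 1] - I0[y][x - 1]) / 2.0
            iy = (I0[y + 1][x] - I0[y - 1][x]) / 2.0
            it = I1[y][x] - I0[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("aperture problem: gradients constrain only one direction")
    # Solve [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T by Cramer's rule.
    u = (-sxt * syy + sxy * syt) / det
    v = (-syt * sxx + sxy * sxt) / det
    return u, v


N, C = 11, 5  # 11x11 images, window centred at pixel (5, 5)


def f(x, y):
    """Smooth synthetic intensity pattern."""
    return (x - C) ** 2 + 2 * (y - C) ** 2


u_true, v_true = 0.1, 0.05
I0 = [[f(x, y) for x in range(N)] for y in range(N)]
I1 = [[f(x - u_true, y - v_true) for x in range(N)] for y in range(N)]  # content shifted by (u, v)
print([round(c, 6) for c in lucas_kanade_window(I0, I1, C, C, r=3)])  # -> [0.1, 0.05]
```

If the intensity pattern were a plain ramp, one row of the 2x2 system would vanish and the solver would raise; this is the classic aperture problem.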

Surface Deformation
Observation: the acquired target is a deforming object.
Trace the correspondence between frames using surface deformation methods.

Surface Deformation [Sumner et al. 2007]
Deformation model: control mesh.
For a surface point $p$, its position after deformation is
$$p' = \sum_{j=1}^{m} w_j \left[ R_j (p - g_j) + g_j + T_j \right]$$
where $g_j$ are the $m$ control nodes surrounding $p$, $w_j$ is the weight of node $g_j$, $R_j$ is a rotation matrix, and $T_j$ is a translation vector.
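The deformation formula can be evaluated directly once the node transformations are known. A minimal sketch, assuming Sumner et al.'s weights w_j = (1 - |p - g_j| / d_max)^2 normalized to sum to one; here the influencing nodes and `d_max` are passed in explicitly (in the original scheme d_max comes from the next-nearest node), and `deform_point` is a hypothetical helper name.

```python
import math


def matvec(R, x):
    """3x3 matrix (nested lists) times 3-vector."""
    return [sum(R[i][k] * x[k] for k in range(3)) for i in range(3)]


def deform_point(p, nodes, d_max):
    """p' = sum_j w_j [R_j (p - g_j) + g_j + T_j] (embedded deformation).

    nodes: list of (g_j, R_j, T_j) for the control nodes influencing p.
    d_max: falloff distance; it must exceed |p - g_j| for at least one node,
    otherwise every weight is zero.
    """
    raw = [max(0.0, 1.0 - math.dist(p, g) / d_max) ** 2 for g, _, _ in nodes]
    total = sum(raw)
    out = [0.0, 0.0, 0.0]
    for w, (g, R, T) in zip(raw, nodes):
        local = matvec(R, [p[k] - g[k] for k in range(3)])
        for k in range(3):
            out[k] += (w / total) * (local[k] + g[k] + T[k])
    return out


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Two nodes with the same translation: every blended prediction is p + T.
nodes = [((0.0, 0.0, 0.0), I3, (1.0, 0.0, 0.0)),
         ((2.0, 0.0, 0.0), I3, (1.0, 0.0, 0.0))]
print([round(c, 9) for c in deform_point((0.5, 0.0, 0.0), nodes, d_max=3.0)])  # -> [1.5, 0.0, 0.0]
# One node rotating 90 degrees about z around the origin.
Rz90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
print(deform_point((1.0, 0.0, 0.0), [((0.0, 0.0, 0.0), Rz90, (0.0, 0.0, 0.0))], d_max=2.0))
# -> [0.0, 1.0, 0.0]
```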

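Surface Deformation [Li et al. 2009]
Surface tracking: fit the control mesh of frame $i$ to frame $i+1$ by finding the $R_j$ and $T_j$ that minimize
$$E = E_{\text{fit}} + \alpha_{\text{smooth}} E_{\text{smooth}} + \alpha_{\text{rigid}} E_{\text{rigid}}$$
where $E_{\text{fit}}$ is the fitting error and $\alpha_{\text{smooth}} E_{\text{smooth}} + \alpha_{\text{rigid}} E_{\text{rigid}}$ is the deformation energy.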
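A simplified sketch of the three terms, assuming the formulations of the embedded deformation model [Sumner et al. 2007]: E_rigid penalizes each R_j for deviating from a rotation, E_smooth asks each node's transformation to predict where its graph neighbours move, and E_fit is written here as a plain point-to-point error against given correspondences. The cited work's fitting term is more elaborate and is not reproduced; the function names are hypothetical, and the code only evaluates E rather than minimizing it.

```python
import numpy as np


def e_rigid(Rs):
    """Sum over nodes of how far each R_j is from a rotation (orthonormal columns)."""
    total = 0.0
    for R in Rs:
        c1, c2, c3 = R.T  # columns of R
        total += (c1 @ c2) ** 2 + (c1 @ c3) ** 2 + (c2 @ c3) ** 2
        total += (c1 @ c1 - 1) ** 2 + (c2 @ c2 - 1) ** 2 + (c3 @ c3 - 1) ** 2
    return total


def e_smooth(g, Rs, Ts, edges):
    """Node j's transform should predict where its neighbour k goes (list both directions)."""
    total = 0.0
    for j, k in edges:
        pred = Rs[j] @ (g[k] - g[j]) + g[j] + Ts[j]
        total += np.sum((pred - (g[k] + Ts[k])) ** 2)
    return total


def e_fit(deformed, targets):
    """Point-to-point fitting error against given correspondences."""
    return float(np.sum((deformed - targets) ** 2))


def total_energy(g, Rs, Ts, edges, deformed, targets, a_smooth=1.0, a_rigid=1.0):
    return (e_fit(deformed, targets)
            + a_smooth * e_smooth(g, Rs, Ts, edges)
            + a_rigid * e_rigid(Rs))


g = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # two graph nodes
edges = [(0, 1), (1, 0)]
Rs = [np.eye(3), np.eye(3)]
Ts = [np.array([0.5, 0.0, 0.0])] * 2               # same translation: no penalty
deformed = g + Ts[0]
print(total_energy(g, Rs, Ts, edges, deformed, deformed.copy()))  # -> 0.0
Ts2 = [np.array([0.5, 0.0, 0.0]), np.zeros(3)]     # nodes pulled apart
print(e_smooth(g, Rs, Ts2, edges))                 # -> 0.5
print(round(e_rigid([1.1 * np.eye(3)]), 4))        # scaled, not a rotation -> 0.1323
```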

Surface Deformation [Li et al. 2008]
Deformation process:
1. Construct the control mesh (or template mesh).
2. Fit the point set of the first frame.
3. Deform to the next frame by minimizing the energy function.
4. Establish the correspondence.
5. Repeat steps 3-4 until the final frame.
[Figures from Li et al. 2008: registration of a facial expression (798-node deformation graph), a human torso (336 nodes), and a bending arm (138 nodes).]

Cross Parameterization [Popa et al. 2010]
Assumption: if two nearly isometric surfaces are mapped to the same domain using a stretch-minimizing parameterization, their maps are likely to be identical.
Trace the correspondence in a common parameterization domain.

Cross Parameterization
Registration process:
1. Construct meshes for all frames.
2. Find consistent patch pairs in two frames.
3. Compute a consistent parameterization for each pair.
4. Pack the patches to achieve a global mapping.

Cross Parameterization [Popa et al. 2010]
How to find common patches:
Trace feature points using optical flow (hard constraints).
Consistently grow patches in both frames from a matched feature point pair.
How to achieve a consistent parameterization:
Align the matched feature point pairs in the parameterization domain.
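The role of matched feature points as hard constraints can be illustrated with a much simpler parameterization than the stretch-minimizing one used by Popa et al.: a uniform-Laplacian (Tutte-style) map, in which pinned vertices keep their assigned 2D positions and every free vertex moves to the average of its neighbours. This swapped-in technique depends only on connectivity and pins, so two frames with the same connectivity and the same pinned feature points receive identical maps; a stretch-minimizing method additionally accounts for geometry. `harmonic_param` is a hypothetical name.

```python
def harmonic_param(neighbors, pinned, iters=500):
    """Uniform-Laplacian (Tutte-style) 2D parameterization with pinned vertices.

    neighbors: dict vertex -> list of adjacent vertices (mesh connectivity).
    pinned: dict vertex -> (u, v); these act as hard constraints, like the
    matched feature points. Free vertices are repeatedly moved to the
    average of their neighbours (Gauss-Seidel iteration).
    """
    uv = {i: pinned.get(i, (0.0, 0.0)) for i in neighbors}
    for _ in range(iters):
        for i, nbrs in neighbors.items():
            if i in pinned:
                continue
            uv[i] = (sum(uv[j][0] for j in nbrs) / len(nbrs),
                     sum(uv[j][1] for j in nbrs) / len(nbrs))
    return uv


n = 4  # 4x4 grid mesh with 4-neighbour connectivity


def vid(r, c):
    return r * n + c


neighbors = {vid(r, c): [vid(r + dr, c + dc)
                         for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= r + dr < n and 0 <= c + dc < n]
             for r in range(n) for c in range(n)}
# Pin the boundary to the unit square; the four interior vertices are free.
pinned = {vid(r, c): (c / (n - 1), r / (n - 1))
          for r in range(n) for c in range(n)
          if r in (0, n - 1) or c in (0, n - 1)}
uv = harmonic_param(neighbors, pinned)
print(tuple(round(x, 6) for x in uv[vid(1, 1)]))  # -> (0.333333, 0.333333)
```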

Global Consistency [Popa et al. 2010]
Problems of local registration:
Matching errors accumulate over the sequence.
Topological inconsistency.

Template Mesh [Li et al. 2009]
Fit a template to the models in the captured sequence using surface deformation methods.
[Figure panels: input, template, wrapped template, reconstruction]
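Once the template has been fitted to every frame, correspondences between any two frames can be read off the shared vertex indexing. The following is a minimal sketch under our own assumptions: the tracked template is given as a list of per-frame vertex positions with a fixed vertex order, and a point observed in frame a is mapped to frame b through its nearest template vertex. `transfer_point` is a hypothetical helper, not part of the cited method.

```python
import math


def nearest_vertex(vertices, point):
    """Index of the template vertex closest to `point` (brute force)."""
    return min(range(len(vertices)), key=lambda k: math.dist(vertices[k], point))


def transfer_point(template_frames, a, b, point):
    """Map a point observed in frame a to frame b via the tracked template.

    template_frames[f][k] is the position of template vertex k in frame f.
    Every frame shares the same vertex indexing, so vertex k in frame a
    corresponds to vertex k in frame b.
    """
    k = nearest_vertex(template_frames[a], point)
    return template_frames[b][k]


# Hypothetical tracked template: 3 vertices that move +1 in x per frame.
template_frames = [
    [(x + f, y, 0.0) for (x, y) in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]]
    for f in range(3)
]
print(transfer_point(template_frames, 0, 2, (0.9, 0.1, 0.0)))  # -> (3.0, 0.0, 0.0)
```

Because the template's topology is fixed, this lookup never suffers from the welded or separated parts discussed on the global consistency slide; the price is that a suitable template must exist.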

Template-Free Approach [Popa et al. 2010]
Bottom-up approach:
Fix the inconsistency between two consecutive frames (or sequences) and merge them.
Repeat until all frames are matched.
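The bottom-up schedule can be sketched as repeated pairwise merging of neighbouring groups. In this sketch the callback `merge_pair` is a hypothetical stand-in for the actual "check and fix topological inconsistency, then merge" step, which is where the real work happens; only the scheduling is shown.

```python
def hierarchical_merge(frames, merge_pair):
    """Bottom-up merging of a non-empty frame sequence.

    Start with one group per frame; at each level merge neighbouring groups
    pairwise with merge_pair(left, right). An odd group out is carried to
    the next level unchanged. Returns the final group and the number of levels.
    """
    groups = [[f] for f in frames]
    levels = 0
    while len(groups) > 1:
        merged = [merge_pair(groups[i], groups[i + 1])
                  for i in range(0, len(groups) - 1, 2)]
        if len(groups) % 2 == 1:
            merged.append(groups[-1])
        groups = merged
        levels += 1
    return groups[0], levels


log = []


def merge_pair(left, right):
    log.append((left[0], right[-1]))  # record the frame range being merged
    return left + right


seq, levels = hierarchical_merge(list(range(5)), merge_pair)
print(seq, levels)  # -> [0, 1, 2, 3, 4] 3
print(log)          # -> [(0, 1), (2, 3), (0, 3), (0, 4)]
```

The number of levels grows only logarithmically with the number of frames, but each merge at a higher level covers more frames, which is part of why this processing is time-consuming.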

Problems
Computing time:
60 or more frames per second for acquisition.
Minutes to hours for registration.
Global consistency:
Template-based approaches need a template mesh similar to the acquired object.
Template-free approaches are less robust and slower.
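A back-of-the-envelope view of this gap, using illustrative numbers of our own (a 5-second clip registered in 1 hour; the slides only say "minutes to hours"):

```latex
\text{frames} = 60~\tfrac{\text{frames}}{\text{s}} \times 5~\text{s} = 300,
\qquad
\text{real-time budget} = \tfrac{1}{60}~\text{s} \approx 16.7~\text{ms per frame}

\text{registration time per frame} = \tfrac{3600~\text{s}}{300} = 12~\text{s}
\;\Longrightarrow\;
\frac{12~\text{s}}{1/60~\text{s}} = 720 \times \text{ slower than real time}
```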

Possible Solutions
Reduce the fitting time by:
assuming strong spatial coherence between frames;
applying optical flow as the initial guess.
Restrict the registration to specific targets:
Facial animation: no topological inconsistency problem.
Articulated model: piecewise rigid transformation.
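Strong coherence between frames bounds how far a point can move, so the correspondence search can be restricted to a small neighbourhood. One common way to exploit this (our choice; the slides do not prescribe a data structure) is a uniform grid hash whose cell size equals the assumed maximum motion `radius`: a query then inspects only the 27 cells around it instead of the whole point set.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cell):
    """Hash each point index into the integer grid cell containing it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / cell) for c in p)].append(idx)
    return grid


def nearest_within(grid, points, q, radius):
    """Index of the closest point to q within `radius`, or None.

    Assumes the grid was built with cell size == radius, so every point
    within `radius` of q lies in one of the 3x3x3 cells around q's cell.
    """
    key = tuple(math.floor(c / radius) for c in q)
    best, best_d = None, radius
    for off in product((-1, 0, 1), repeat=3):
        for idx in grid.get(tuple(k + o for k, o in zip(key, off)), ()):
            d = math.dist(points[idx], q)
            if d <= best_d:
                best, best_d = idx, d
    return best


prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]  # previous frame
grid = build_grid(prev, cell=0.5)
print(nearest_within(grid, prev, (0.9, 0.1, 0.0), radius=0.5))  # -> 1
print(nearest_within(grid, prev, (3.0, 3.0, 3.0), radius=0.5))  # -> None
```

If the coherence assumption is violated (a point moves farther than `radius`), the search returns None rather than a wrong match, which a tracker could treat as a signal to fall back to a wider search.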

References
[Vlasic et al. 2009] Dynamic shape capture using multi-view photometric stereo, ACM SIGGRAPH Asia, 2009.
[Turk & Levoy 1994] Zippered polygon meshes from range images, ACM SIGGRAPH, 1994.
[Aiger et al. 2008] 4-points congruent sets for robust pairwise surface registration, ACM SIGGRAPH, 2008.
[Gelfand et al. 2005] Robust global registration, Eurographics Symposium on Geometry Processing, 2005.
[Mitra et al. 2007] Dynamic geometry registration, Eurographics Symposium on Geometry Processing, 2007.
[Li et al. 2008] Global correspondence optimization for non-rigid registration of depth scans, Eurographics Symposium on Geometry Processing, 2008.
[Sumner et al. 2007] Embedded deformation for shape manipulation, ACM SIGGRAPH, 2007.
[Li et al. 2009] Robust single-view geometry and motion reconstruction, ACM SIGGRAPH Asia, 2009.
[Popa et al. 2010] Globally consistent space-time reconstruction, Eurographics Symposium on Geometry Processing, 2010.