RGBD Point Cloud Alignment Using Lucas-Kanade Data Association and Automatic Error Metric Selection

IEEE Transactions on Robotics, vol. 31, no. 6, December 2015

Brian Peasley and Stan Birchfield, Senior Member, IEEE

Abstract: We propose to overcome a significant limitation of the ICP algorithm used by KinectFusion, namely, its sole reliance upon geometric information. Our approach uses both geometric and color information in a direct manner that uses all the data in order to accurately estimate camera pose. Data association is performed by Lucas-Kanade to compute an affine warp between the color images associated with two RGBD point clouds. A subsequent step then estimates the Euclidean transformation between the point clouds using either a point-to-point or point-to-plane error metric, with a novel method based on a normal covariance test for automatically selecting between them. Together, Lucas-Kanade data association (LKDA) with covariance testing enables robust camera tracking through areas of low geometric features, without sacrificing accuracy in environments in which the existing ICP technique succeeds. Experimental results on several publicly available datasets demonstrate the improved performance both qualitatively and quantitatively.

Index Terms: Kinect, KinectFusion, ICP, Lucas-Kanade, Camera Tracking, Mapping

I. INTRODUCTION

Three-dimensional (3D) reconstruction of an environment is an important problem in robotics that has received much attention in recent years. The advent of inexpensive RGBD sensing (e.g., the Kinect sensor) has greatly improved the fidelity and speed with which such reconstructions can be made. These capabilities have far-reaching implications for the exploration and navigation of unknown environments, as well as for the manipulation of objects in those environments.

Of the many 3D modeling techniques that have been developed, the landmark KinectFusion method [19], [13] has established itself as perhaps the most accurate, real-time, dense modeling system using an inexpensive sensor. Recent work by others [9] has extended this algorithm to operate over large-scale environments. However, both the standard and the extended version use geometric information alone to align the camera with the model, thus requiring the environment to contain sufficient geometric features in order for the iterative closest point (ICP) algorithm to accurately estimate the camera pose. This deficiency was noted in [9], where preliminary experiments were conducted to explore the advantage of replacing the ICP algorithm with the output of a feature-based visual odometry system called FOVIS [11] to provide a more stable estimate of camera pose. The frame-to-frame approach of FOVIS, however, loses one of the key advantages of KinectFusion, namely the reduced drift that results from matching the current image to the model rather than to the previous image.

In this paper we present an approach to RGBD camera tracking in environments with low geometric information. The proposed approach uses all the data (both color and depth), preserves the advantages of KinectFusion's image-to-model matching, and obviates the need for the feature point extraction and feature correspondence steps inherent in feature-based visual odometry systems. The key to the approach is to replace the projective data association (PDA) point matching algorithm [3] used by KinectFusion with a data association technique driven by Lucas-Kanade [18], [26], [2].
We introduced this matching algorithm, which we call Lucas-Kanade data association (LKDA), in [21]. In this paper we extend that work with an automatic method for selecting the appropriate error metric based on the geometry of the scene, a GPU implementation of the matching, and more thorough experimental results to demonstrate the improvement that results from LKDA as well as to compare it with alternative techniques. These results on standard datasets demonstrate that LKDA combined with the automatic error metric enables camera tracking to succeed in areas of low geometry, without sacrificing either computational efficiency or accurate camera tracking in highly geometric environments.¹

¹ Although our LKDA approach will fail when there is no visual texture (e.g., a white wall), such scenes have low geometric variation as well, since geometry usually generates visual texture via shading and shadows.

II. PREVIOUS WORK

Before the advent of recent inexpensive RGBD sensors, several researchers addressed the problem of improving ICP by incorporating color information. Both geometric and color information are incorporated into the distance metric used by ICP in a number of systems [5], [12], [14]. Similarly, Druon et al. [6] segment colored point clouds based on hue, then perform ICP while requiring matching points to belong to the same color class. More recently, a popular way to use visual information to improve alignment is to match feature points, then apply RANSAC to the features with depth values. This is the approach employed by Henry et al. [10], in which the visual feature associations from RANSAC are then combined with dense point associations in an ICP framework. This same procedure of feature matching followed by RANSAC is adopted in both the FOVIS visual odometry system [11] and the algorithm of Engelhard et al. [8], the latter of which also employs ICP and is evaluated in [7]. As expected, the primary drawback of these techniques is the failure of feature detection and matching when the scene contains little photometric texture.

An approach that is closely related to our own is to register RGBD images via direct image warping. Tykkälä et al. [27] formulate the 3D registration of colored point clouds as a direct minimization of both depth and color information. To reduce the computational load, in follow-up work the same researchers [1] minimize only the image intensities, without using depth. In related work, Steinbrücker et al. [24] align two colored point clouds by employing Lie algebra to solve for the twist coordinates representing the 3D transformation that maximizes the photoconsistency of the projections. This work has been extended by Kerl et al. [16] to achieve real-time performance (30 Hz) on a single CPU, as well as to combine both photoconsistency and depth consistency in the minimization process [15]. Another direct approach is DTAM [20], which uses dense methods for whole-image alignment without a depth sensor but stores the model as a set of keyframes rather than as a dense volume.

Other researchers have focused on extending KinectFusion. Henry et al. [9] propose a system consisting of local patch volumes, which are similar to KinectFusion volumes, connected together in a graph optimization framework to ensure global consistency. In another approach that generates large maps, Whelan et al. [28] combine KinectFusion's ICP algorithm with a GPU implementation of the RGBD alignment approach of [24], and they provide a switching mechanism between the resulting estimate and that of the feature-based FOVIS. As expected, they show that both FOVIS and RGBD alignment fail when the scene contains little photometric texture, KinectFusion's ICP fails when there is little geometric texture, and a combination of these methods yields improved tracking robustness.

Like the work of Whelan et al. [28], we focus upon improving the camera alignment step of KinectFusion by considering both visual and depth information. However, instead of combining off-the-shelf algorithms with a switching mechanism between them, we derive a novel algorithm that first aligns the intensity images, then uses the resulting correspondences to align the depth images, with an automatic selection of either point-to-point or point-to-plane error metrics based on the scene geometry. In Section V we show that our relatively straightforward, direct approach produces more consistent results than those obtained by more complicated methods, without sacrificing computational speed.
III. LUCAS-KANADE DATA ASSOCIATION

Let P = {p_i}, i = 1, ..., N, and Q = {q_i}, i = 1, ..., N, where p_i, q_i ∈ ℝ³, be two point clouds in 3D Euclidean space. The goal of data association is to compute a function f that maps indices in one point cloud to those of the other, so that f(i) = j indicates a corresponding pair of points p_i ↔ q_j. Let I_P and I_Q be the current and reference RGB images, respectively, associated with the point clouds, and d_P and d_Q the current and reference depth maps. In our implementation I_Q is the previous image, while d_Q arises from projecting the truncated signed distance function (TSDF) according to the previous camera pose.

Fig. 1. LEFT: PDA establishes correspondence between two point clouds, P and Q, by projecting each point p_i ∈ P of one cloud onto the other's depth camera, D_Q, to find the nearest projection of q_j ∈ Q. RIGHT: LKDA establishes correspondence by projecting each point p_i onto the RGB camera C_P, then warping the projected point onto the other RGB camera C_Q using the estimated warp from Lucas-Kanade to find the nearest projection of q_j. (The normals n_j are used in the point-to-plane error metric.)

Projective data association (PDA) [17] establishes correspondence between P and Q by projecting the points from both point clouds onto the same image plane, then considering points that project onto the same pixel to be corresponding points. See Figure 1. In a similar manner, Lucas-Kanade data association (LKDA) also finds the closest point on the image plane, but only after first transforming the projected coordinates according to the warp function found by Lucas-Kanade:

    f(i) = \arg\min_j \left\| \phi(C_Q \tilde{q}_j) - \phi(W(C_P \tilde{p}_i; \zeta)) \right\|,    (1)

where ζ contains the warp parameters, C_P and C_Q are 4×4 homogeneous matrices describing the 3D position and orientation of the RGB cameras, the tilde indicates homogeneous coordinates, and φ dehomogenizes the coordinates.

The PDA approach relies solely upon geometric information, that is, the spatial coordinates of the points in the two clouds. As a result, when the environment does not provide sufficient geometric texture (e.g., the camera is looking at a planar wall), or when most of the depth readings are invalid due to the limited range of the sensor or occur outside the TSDF maintained by KinectFusion (e.g., the camera looks down a hallway), PDA will fail to provide enough correspondences to track the camera. The LKDA approach improves the alignment of point clouds in such environments, since the photometric information is not affected by the scene geometry, the range of the sensor, or the size of the TSDF.
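To make (1) concrete, the following minimal NumPy sketch projects the points of P into the image plane, applies the estimated affine warp, and matches each to the nearest projection of Q. It is a sketch under simplifying assumptions (a single shared intrinsic matrix K, brute-force nearest-neighbour search); the helper names are ours, not the paper's GPU implementation, which matches on the pixel grid.

```python
import numpy as np

def project(K, T, pts):
    """Project Nx3 points through a camera with 3x3 intrinsics K and
    4x4 pose T (here assumed world-to-camera); returns Nx2 pixels."""
    p_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
    cam = (T @ p_h.T).T[:, :3]                       # points in camera frame
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                    # phi: dehomogenize

def affine_warp(uv, zeta):
    """Apply the affine warp of (4); zeta = [rxx, rxy, ryx, ryy, ax, ay]."""
    rxx, rxy, ryx, ryy, ax, ay = zeta
    A = np.array([[1 + rxx, rxy], [ryx, 1 + ryy]])
    return uv @ A.T + np.array([ax, ay])

def lkda_correspondences(P, Q, K, T_P, T_Q, zeta):
    """For each p_i in P, warp its projection and match it to the q_j
    whose projection is nearest on the image plane, as in (1)."""
    warped = affine_warp(project(K, T_P, P), zeta)   # phi(W(C_P p_i; zeta))
    proj_q = project(K, T_Q, Q)                      # phi(C_Q q_j)
    d = np.linalg.norm(warped[:, None, :] - proj_q[None, :, :], axis=2)
    return np.argmin(d, axis=1)                      # f(i) = j
```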

The Lucas-Kanade algorithm is a differential method for computing the optical flow of an image by finding the parameters ζ that minimize the sum-of-squared-differences error

    \epsilon_{LK} = \sum_x \left( I_Q(W^{-1}(x; \zeta)) - I_P(x) \right)^2,    (2)

where the summation is over all the pixels x = [x y]^T in the image, and W(x; ζ) is a parametric warp function that brings the two images into alignment. This error is minimized by linearizing about the current estimate and repeatedly solving the linear system

    \left( \sum_x g\, g^T \right) \Delta\zeta = \sum_x g \left( I_Q(W^{-1}(x; \zeta)) - I_P(x) \right)    (3)

for Δζ, where g = (∂W/∂ζ)^T ∇I_P, and ∇I_P contains the gradient of I_P. For efficiency, we use the inverse compositional algorithm [2].

Common choices for the warp function are translation, affine, or projective. Since we are interested in a global, featureless mapping technique that warps an entire image into another, the translation model is not expressive enough; either an affine or projective warp is thus required. We have found little difference in accuracy between these two alternatives (see the experimental results), but the projective warp incurs a substantial increase in computation due to the fact that the Hessian is dependent upon the current state of the system and therefore must be recomputed each iteration. As a result we use an affine warp, which is given by

    W(x, y; \zeta) = \begin{bmatrix} x(r_{xx} + 1) + y\, r_{xy} + a_x \\ x\, r_{yx} + y(r_{yy} + 1) + a_y \end{bmatrix},    (4)

where ζ = [r_xx r_xy r_yx r_yy a_x a_y]^T.
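The following is a compact forward-additive NumPy sketch of the minimization in (2)-(3) for the affine warp of (4). It is illustrative only: the paper uses the more efficient inverse compositional algorithm [2] on the GPU, and a practical version would add a coarse-to-fine pyramid and bilinear interpolation. The residual here is taken as I_P minus the warped I_Q so that the additive update converges.

```python
import numpy as np

def warp_image(I, zeta):
    """Sample I at the affine-warped coordinates W(x, y; zeta) of (4),
    using nearest-neighbour interpolation for brevity."""
    h, w = I.shape
    ys, xs = np.mgrid[0:h, 0:w]
    rxx, rxy, ryx, ryy, ax, ay = zeta
    xw = (1 + rxx) * xs + rxy * ys + ax
    yw = ryx * xs + (1 + ryy) * ys + ay
    xi = np.clip(np.rint(xw).astype(int), 0, w - 1)
    yi = np.clip(np.rint(yw).astype(int), 0, h - 1)
    return I[yi, xi]

def lk_affine(I_P, I_Q, iters=50, tol=1e-6):
    """Estimate zeta such that I_Q(W(x, y; zeta)) ~= I_P(x, y) by
    repeatedly solving the linear system of (3)."""
    I_P, I_Q = I_P.astype(float), I_Q.astype(float)
    h, w = I_P.shape
    gy, gx = np.gradient(I_P)                 # grad(I_P), as in (3)
    ys, xs = np.mgrid[0:h, 0:w]
    # Steepest-descent images g = (dW/dzeta)^T grad(I_P): one 6-vector per pixel.
    g = np.stack([gx * xs, gx * ys, gy * xs, gy * ys, gx, gy], axis=-1)
    H = np.einsum('...i,...j->ij', g, g)      # Gauss-Newton Hessian: sum g g^T
    zeta = np.zeros(6)
    for _ in range(iters):
        r = I_P - warp_image(I_Q, zeta)       # photometric residual
        b = np.einsum('...i,...->i', g, r)
        dz = np.linalg.solve(H + 1e-9 * np.eye(6), b)
        zeta += dz                            # forward-additive update
        if np.linalg.norm(dz) < tol:
            break
    return zeta
```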
IV. ESTIMATING THE CAMERA POSE

Each iteration of ICP can be broken into six steps, and different choices within these steps lead to different variations of the algorithm [22]. Figure 2 illustrates these steps within the traditional ICP approach used by KinectFusion, compared with our LKDA approach. Instead of recomputing data association after each alignment, we first perform data association, then perform alignment. This decoupling, which is enabled by our separate use of RGB and depth images, leads to significant speedup without sacrificing robustness. At the same time, the correspondences used in each iteration of the algorithm are not necessarily the same, because the rejection step retains only a subset of them. In this section we detail the steps that are different between our approach and KinectFusion.

Fig. 2. TOP: Given the current and reference depth images, KinectFusion iterates through the six ICP steps. Steps 1-3 perform data association (PDA), while steps 4-6 perform alignment. BOTTOM: In contrast, our approach iteratively finds the correspondences using the RGB images (LKDA, steps 1-3), then iteratively estimates the alignment parameters using the depth images (steps 4-6). The steps marked with an asterisk (*) are different in the two approaches. (a) ICP implementation in KinectFusion; (b) our LKDA approach.

A. Point Matching

As already mentioned, projective data association (PDA) establishes correspondence by finding the closest point from the other cloud as projected onto the image plane. In a similar manner, Lucas-Kanade data association (LKDA) also finds the closest point on the image plane, but only after first transforming the projected coordinates according to the warp function found by Lucas-Kanade. Unlike traditional ICP, however, the LKDA point matching step is performed only once per pair of image frames, and subsets of the correspondences found in this step are then used in the iterations of the alignment process.

B. Outlier Rejection

As in [19], our approach rejects point correspondences that exceed a distance or angular threshold. In addition, corresponding points whose color difference exceeds a predefined color threshold ρ are rejected:

    \left| I_Q(x, y) - I_P(W(x, y; \zeta)) \right| > \rho.    (5)

Note that the rejected points in each iteration of the alignment process are not necessarily the same, so that a different subset of points contributes to the error metric in each iteration. In our implementation the distance threshold is 0.1 m and the color threshold is ρ = 5.

C. Automatic Selection of the Error Metric

Two common choices of the metric used for alignment are point-to-point and point-to-plane [4]. If there is sufficient geometric texture in the environment, then the latter converges faster than the former [22], which explains its adoption by KinectFusion. On the other hand, if all point correspondences are on a flat wall then the point-to-plane error metric will not result in correct alignment, because there will be no mechanism to induce a lateral shift between the point clouds in the direction perpendicular to the normals. See Figure 3.

To overcome this limitation we use a linear combination of the 3D Euclidean transformations computed using the point-to-point (T_pp) and the point-to-plane (T_pπ) error metrics:

    T = (1 - \lambda) T_{pp} + \lambda T_{p\pi},    (6)

where 0 ≤ λ ≤ 1 is a weighting factor. The value of λ is determined automatically at run time by computing the condition number κ of the covariance matrix of the geometric normals seen by the current frame. In areas of low geometry the value of γ = κ^{-1} is low, while in areas of high geometry γ is high. We model λ as a sigmoid

    \lambda = \left( 1 + e^{-\alpha(\gamma - \hat{\gamma})} \right)^{-1},    (7)

where α is a large constant (determined experimentally to be 30), and γ̂ indicates the value of γ for which the two metrics are weighted equally, i.e., λ = 0.5. To determine γ̂, for each frame over several sequences we computed T_pπ, T_pp, and κ^{-1}, comparing the former two with the ground truth. The resulting scatter plot can be seen in Figure 4, along with the intersection of robust exponential curves fit to both data sets, yielding γ̂ = 0.1.
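A minimal NumPy sketch of the selection mechanism in (6)-(7) follows, assuming the constants reported above (α = 30, γ̂ = 0.1); the function names are ours, not the paper's implementation. Note that (6) blends the two 4×4 transforms linearly, so the rotation block of the result is not exactly orthogonal; the paper does not detail whether it is re-orthonormalized.

```python
import numpy as np

ALPHA = 30.0      # sigmoid steepness (assumed from the text)
GAMMA_HAT = 0.1   # crossover value of gamma where lambda = 0.5

def geometry_score(normals):
    """gamma = 1/kappa: inverse condition number of the covariance of
    the surface normals of the current frame (low gamma -> flat scene)."""
    C = np.cov(normals.T)            # 3x3 covariance of Nx3 unit normals
    return 1.0 / np.linalg.cond(C)

def metric_weight(gamma, alpha=ALPHA, gamma_hat=GAMMA_HAT):
    """lambda from the sigmoid of (7)."""
    return 1.0 / (1.0 + np.exp(-alpha * (gamma - gamma_hat)))

def blend_transforms(T_pp, T_ppi, lam):
    """T = (1 - lambda) T_pp + lambda T_ppi, as in (6)."""
    return (1.0 - lam) * T_pp + lam * T_ppi
```

Here the normals passed to geometry_score would be those already computed for the point-to-plane metric (the n_j of Figure 1).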

Figure 5 shows the results of both metrics on two scenes, illustrating that T_pp yields less error on scenes with low geometry, while T_pπ tends to perform slightly better on scenes with high geometry.

Fig. 3. When the scene is planar, the point-to-plane error metric fails to correctly align point clouds, even with perfect correspondence.

Fig. 4. Scatter plot of γ, the inverse of the condition number of the covariance matrix of the normals, versus the error of the estimated translation (averaged over all frames of several sequences). The red dots indicate the results using the point-to-plane error metric (T_pπ) to estimate odometry, while the blue dots show the results from the point-to-point error metric (T_pp). Robust exponential curves were fit to both data sets to determine their intersection.

Fig. 5. Cumulative camera translation error for both metrics when reconstructing a relatively flat environment (left) versus a geometrically rich environment (right).

D. Error Minimization

Given the correspondences between the two image frames that survived the rejection step, along with the value λ, the two point clouds can be aligned. The transformation T_pp is computed in closed form using orthogonal Procrustes analysis, while T_pπ is computed iteratively as described in [17]. The camera pose is then given by (6).
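The closed-form point-to-point solution mentioned above is standard orthogonal Procrustes (Kabsch) analysis; a NumPy sketch, assuming the surviving correspondences are stacked as N×3 arrays P and Q:

```python
import numpy as np

def point_to_point_transform(P, Q):
    """Closed-form rigid alignment (orthogonal Procrustes / Kabsch):
    find R, t minimizing sum ||R p_i + t - q_i||^2 over matched pairs."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # proper rotation (det = +1)
    t = mu_q - R @ mu_p
    T = np.eye(4)                              # pack into a 4x4 transform
    T[:3, :3], T[:3, 3] = R, t
    return T
```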
V. EXPERIMENTAL RESULTS

To implement the proposed approach, we modified the open-source version of KinectFusion known as KinFu in the PCL library [23] with our own C++ and GPGPU code. In our first set of experiments we compared the unmodified PDA version with six different versions of LKDA using either grayscale or color images, and either a translation, affine, or projective warp. To facilitate quantitative comparison, we used the data from the RGB-D SLAM Dataset and Benchmark [25], which provides synchronized depth and color data from a Kinect or Xtion sensor as well as the ground truth sensor trajectory from a high-accuracy motion capture system. For these experiments we used 5 different datasets exhibiting a variety of small environments, which can roughly be categorized as follows:

- high-geometry: The environment contains a large amount of geometric variation throughout, enabling geometry-based sensor tracking. (fr1/xyz, fr1/floor, fr1/desk)
- planar: The environment is largely planar (that is, without much geometric variation), making geometry-based sensor tracking difficult. (fr3/ntnw³)
- out-of-bounds: For at least some frames of the sequence, the sensor is located so that a majority of the depth values are outside the TSDF volume. (fr1/rpy)

We used a 3 m TSDF volume divided into 512³ voxels (each voxel is 5.86 mm per side). The results on these 5 datasets are shown in Table I, where the RMSE translational drift (in meters per second) and RMSE rotational drift (in radians per second) are displayed. The drift is also known as the Relative Pose Error (RPE) [25]. For the high-geometry sequences, all the methods succeed, and there is little difference in accuracy in either position or orientation. For the planar (fr3/ntnw) or out-of-bounds (fr1/rpy) sequences, however, the standard KinectFusion algorithm fails to maintain sensor tracking. There is a noticeable improvement using affine over translation, but almost none by using projective over affine. To save space we show only the grayscale results, since the color results are nearly identical.

We directly compared our method with the approach of Whelan et al. [28] and the Dense Visual Odometry (DVO) algorithm of Kerl et al. [15], [16]. Results of these comparisons are displayed in Tables II and III, respectively.⁴ While the algorithm of [28] performs better on a frame-to-frame basis, our approach yields a better overall estimated trajectory, leading to lower absolute trajectory errors. Compared with [15], our approach is competitive, yielding the lowest error on several sequences.

³ A shortened form of the name fr3/nostructure_texture_near_withloop.
⁴ To reduce space, Table II only includes results from the FOVIS/ICP+RGB-D version of the algorithm in [28]; other versions are similar. For fair comparison, the final column of Table III shows the RGB+D+KF version of [15], which includes keyframes but not graph optimization.
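For completeness, the drift metric of [25] used throughout the tables can be computed as follows; this sketch assumes time-aligned lists of 4×4 poses and a fixed frame interval (one second's worth of frames gives drift in m/s). The function name is ours.

```python
import numpy as np

def rpe_translation_rmse(gt_poses, est_poses, delta):
    """RMSE of the relative pose error over a fixed frame interval delta,
    following the drift metric of [25]; poses are 4x4 matrices."""
    errs = []
    for i in range(len(gt_poses) - delta):
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        e = np.linalg.inv(gt_rel) @ est_rel    # relative pose error E_i
        errs.append(np.linalg.norm(e[:3, 3]))  # translational component
    return float(np.sqrt(np.mean(np.square(errs))))
```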

TABLE I
RPE RMSE in translation (m/s) (top of each cell) and rotation (rad/s) (bottom, in parentheses) for PDA and the grayscale translation, affine, and projective LKDA variants on fr1/xyz, fr1/floor, fr1/desk, fr3/ntnw, and fr1/rpy. The best result is in bold, and failed tracking results are indicated by F. [Numeric entries lost in transcription.]

TABLE II
Comparison with Whelan et al. [28] on fr1/desk, fr2/desk, fr1/room, and fr2/lnl, showing median and maximum values of the absolute translation error (ATE) (m), and RMSE translational drift (m/s). [Numeric entries lost in transcription.]

TABLE III
Comparison with DVO [15] on fr1/desk, fr1/desk2, fr1/room, fr1/360, fr1/teddy, fr1/floor, fr1/xyz, fr1/rpy, and fr1/plant, showing the RMSE absolute trajectory error (m) and the RMSE translational drift (m/s). [Numeric entries lost in transcription.]

Plots of the estimated camera position versus frame number for the planar (fr3/ntnw) and out-of-bounds (fr1/rpy) environments for the different algorithms are shown in Figure 6. Notice that KinectFusion with PDA is unable to maintain tracking in such environments, while our LKDA algorithm (grayscale affine) and the approaches of [28] and [15] are able to maintain camera tracking. Other variations of LKDA also perform well, except that the translation motion model is insufficient for the planar scene. Reconstructions of the out-of-bounds environment by rendering the point cloud from two approaches are shown in Figure 7 for comparison. Notice that LKDA achieves an accurate reconstruction of the scene, while the reconstruction from PDA is difficult to interpret.

Fig. 7. Reconstruction of the out-of-bounds environment (fr1/rpy) by standard KinectFusion with PDA (top) and KinectFusion with LKDA (bottom). The inability of the former to handle depth values outside the TSDF volume causes a severe loss in camera tracking, thus resulting in a poor reconstruction, while our approach yields an accurate reconstruction.

The importance of the automatic error metric selection is shown in Table IV. Seven sequences were run using either a point-to-point or point-to-plane error metric, or with automatic selection between the two. As expected, the point-to-point error metric performed better in environments with little geometric structure, while the point-to-plane error metric always yielded less error on sequences with high geometric structure. The automatic selection mechanism enabled our approach to yield the minimum error on all sequences.

TABLE IV
Comparison of the RMSE ATE of the point-to-point error metric (left) against the point-to-plane error metric (middle) and automatic selection (right) on seven sequences of high and low geometry (fr1/rpy, fr1/floor, fr1/teddy, fr1/xyz, fr1/plant, fr3/ntnw, fr3/ntf). The former performed better in scenes of low geometry, while the latter performed better in scenes with high geometry; our automatic selection method handles both types of scenes. [Numeric entries lost in transcription.]

Timing comparisons of the different variations of our algorithm were conducted on several machines with varying computational power and GPUs.

Fig. 6. Camera position estimation (X, Y, and Z, in meters) in a planar environment (fr3/ntnw, top) and an out-of-bounds environment (fr1/rpy, bottom). Shown is a comparison of the proposed grayscale affine LKDA, KinectFusion with PDA, the approach of Whelan et al. [28], and DVO [15]. The solid black line is ground truth.

For standard KinectFusion with PDA, nearly all the computation (approximately 85%) is devoted to the alignment, with the remaining computation required for PDA. Our approach retains the alignment computation but replaces PDA with a more expensive LKDA step. On average, the ratio of the time taken by LKDA to PDA is 3:1 (grayscale translation), 6:1 (grayscale affine), and 45:1 (grayscale projective). (Color versions require nearly 3 times as much computation.) Therefore, our grayscale affine approach (the recommended version) is about 1.8 times as expensive as KinectFusion (since 0.85 + 0.15 × 6 ≈ 1.8). On a machine capable of running standard KinectFusion at 30 Hz, our algorithm runs at approximately 17 Hz; on a machine that runs KinectFusion at 10 Hz, our algorithm runs at nearly 6 Hz.

VI. CONCLUSION

In this paper we have presented a direct technique to robustly estimate camera motion using an RGBD sensor, even when there is little geometric information in the scene. The approach retains the drift-free advantages of the match-to-model approach of KinectFusion, without requiring the estimation of visual features. The approach operates by incrementally computing the affine warp between successive images using Lucas-Kanade, yielding a correspondence between point clouds. This approach to correspondence, which we call LKDA, replaces the PDA technique implemented in the variant of ICP found in KinectFusion. In contrast with PDA, LKDA was shown on a number of publicly available sequences to maintain camera tracking in environments with limited geometric structure, as well as in environments that extend beyond the TSDF volume or contain depths outside the range of the sensor. Moreover, we have proposed an automatic mechanism for selecting between point-to-point and point-to-plane error metrics that properly handles a variety of environments. Future work will be aimed at extending this approach to large-scale environments.

VII. ACKNOWLEDGMENTS

The authors thank T. Whelan and C. Kerl for providing their data for comparison, and the anonymous reviewers for their helpful comments. This work was partially supported by an NSF IIS grant.

REFERENCES

[1] C. Audras, A. I. Comport, M. Meilland, and P. Rives. Real-time dense RGB-D localisation and mapping. In Australian Conference on Robotics and Automation, Dec. 2011.
[2] S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221-255, 2004.
[3] G. Blais and M. Levine. Registering multiview range data to create 3D computer objects. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 17(8):820-824, Aug. 1995.
[4] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 3, pages 2724-2729, Apr. 1991.
[5] L. Douadi, M.-J. Aldon, and A. Crosnier. Pair-wise registration of 3D/color data sets with ICP. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2006.
[6] S. Druon, M. Aldon, and A. Crosnier. Color constrained ICP for registration of large unstructured 3D color data sets. In IEEE International Conference on Information Acquisition, Aug. 2006.
[7] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard. An evaluation of the RGB-D SLAM system. In Proceedings of the International Conference on Robotics and Automation (ICRA), May 2012.
[8] N. Engelhard, F. Endres, J. Hess, J. Sturm, and W. Burgard. Real-time 3D visual SLAM with a hand-held RGB-D camera. In Proc. of the RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, Apr. 2011.
[9] P. Henry, D. Fox, A. Bhowmik, and R. Mongia. Patch volumes: Segmentation-based consistent mapping with RGB-D cameras. In Proceedings of the International Conference on 3D Vision (3DV), June 2013.
[10] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. International Journal of Robotics Research, 31(5):647-663, Apr. 2012.
[11] A. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy. Visual odometry and mapping for autonomous flight using an RGB-D camera. In International Symposium on Robotics Research (ISRR), Aug. 2011.
[12] B. Huhle, M. Magnusson, W. Strasser, and A. J. Lilienthal. Registration of colored 3D point clouds with a kernel-based extension to the normal distributions transform. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 4025-4030, May 2008.
[13] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the 24th ACM Symposium on User Interface Software and Technology, pages 559-568, 2011.
[14] J. H. Joung, K. H. An, J. W. Kang, M. J. Chung, and W. Yu. 3D environment reconstruction using modified color ICP algorithm by fusion of a camera and a 3D laser range finder. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2009.
[15] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. In Proceedings of the International Conference on Intelligent Robot Systems (IROS), Nov. 2013.
[16] C. Kerl, J. Sturm, and D. Cremers. Robust odometry estimation for RGB-D cameras. In Proceedings of the International Conference on Robotics and Automation (ICRA), May 2013.
[17] K.-L. Low. Linear least-squares optimization for point-to-plane ICP surface registration. Technical Report TR04-004, University of North Carolina, Feb. 2004.
[18] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674-679, Aug. 1981.
[19] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 127-136, 2011.
[20] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In Proceedings of the International Conference on Computer Vision, pages 2320-2327, 2011.
[21] B. Peasley and S. Birchfield. Replacing projective data association with Lucas-Kanade for KinectFusion. In Proceedings of the International Conference on Robotics and Automation (ICRA), May 2013.
[22] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In Third International Conference on 3D Digital Imaging and Modeling (3DIM), June 2001.
[23] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In Proceedings of the International Conference on Robotics and Automation (ICRA), May 2011.
[24] F. Steinbrücker, J. Sturm, and D. Cremers. Real-time visual odometry from dense RGB-D images. In Workshop on Live Dense Reconstruction with Moving Cameras (at ICCV), Nov. 2011.
[25] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 573-580, Oct. 2012.
[26] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, Apr. 1991.
[27] T. Tykkälä, C. Audras, and A. I. Comport. Direct iterative closest point for real-time visual odometry. In Second International Workshop on Computer Vision in Vehicle Technology, Nov. 2011.
[28] T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald. Robust real-time visual odometry for dense RGB-D mapping. In Proceedings of the International Conference on Robotics and Automation (ICRA), May 2013.
[29] T. Whelan, J. B. McDonald, M. Kaess, M. F. Fallon, H. Johannsson, and J. J. Leonard. Kintinuous: Spatially extended KinectFusion. In RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, Sydney, Australia, July 2012.


More information

Approximate Surface Reconstruction and Registration for RGB-D SLAM

Approximate Surface Reconstruction and Registration for RGB-D SLAM European Conference on Mobile Robotics (ECMR), Lincoln, UK, September 2015 Approximate Surface Reconstruction and Registration for RGB-D SLAM Dirk Holz and Sven Behnke Abstract RGB-D cameras have attracted

More information

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Ham Rara, Shireen Elhabian, Asem Ali University of Louisville Louisville, KY {hmrara01,syelha01,amali003}@louisville.edu Mike Miller,

More information

CopyMe3D: Scanning and Printing Persons in 3D

CopyMe3D: Scanning and Printing Persons in 3D CopyMe3D: Scanning and Printing Persons in 3D Sturm, Jürgen; Bylow, Erik; Kahl, Fredrik; Cremers, Daniel Published in: Lecture Notes in Computer Science Vol. 8142 (Pattern Recognition, 35th German Conference,

More information

Simultaneous Localization and Mapping (SLAM)

Simultaneous Localization and Mapping (SLAM) Simultaneous Localization and Mapping (SLAM) RSS Lecture 16 April 8, 2013 Prof. Teller Text: Siegwart and Nourbakhsh S. 5.8 SLAM Problem Statement Inputs: No external coordinate reference Time series of

More information

SPURRED by the ready availability of depth sensors and

SPURRED by the ready availability of depth sensors and IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED DECEMBER, 2015 1 Hierarchical Hashing for Efficient Integration of Depth Images Olaf Kähler, Victor Prisacariu, Julien Valentin and David

More information

Dual Back-to-Back Kinects for 3-D Reconstruction

Dual Back-to-Back Kinects for 3-D Reconstruction Ho Chuen Kam, Kin Hong Wong and Baiwu Zhang, Dual Back-to-Back Kinects for 3-D Reconstruction, ISVC'16 12th International Symposium on Visual Computing December 12-14, 2016, Las Vegas, Nevada, USA. Dual

More information

3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller

3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller 3D Computer Vision Depth Cameras Prof. Didier Stricker Oliver Wasenmüller Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de

More information

AN INDOOR SLAM METHOD BASED ON KINECT AND MULTI-FEATURE EXTENDED INFORMATION FILTER

AN INDOOR SLAM METHOD BASED ON KINECT AND MULTI-FEATURE EXTENDED INFORMATION FILTER AN INDOOR SLAM METHOD BASED ON KINECT AND MULTI-FEATURE EXTENDED INFORMATION FILTER M. Chang a, b, Z. Kang a, a School of Land Science and Technology, China University of Geosciences, 29 Xueyuan Road,

More information

CuFusion: Accurate real-time camera tracking and volumetric scene reconstruction with a cuboid

CuFusion: Accurate real-time camera tracking and volumetric scene reconstruction with a cuboid CuFusion: Accurate real-time camera tracking and volumetric scene reconstruction with a cuboid Chen Zhang * and Yu Hu College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;

More information

Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza

Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza [M. Pizzoli, C. Forster,

More information

Surface Registration. Gianpaolo Palma

Surface Registration. Gianpaolo Palma Surface Registration Gianpaolo Palma The problem 3D scanning generates multiple range images Each contain 3D points for different parts of the model in the local coordinates of the scanner Find a rigid

More information

Motion Estimation. There are three main types (or applications) of motion estimation:

Motion Estimation. There are three main types (or applications) of motion estimation: Members: D91922016 朱威達 R93922010 林聖凱 R93922044 謝俊瑋 Motion Estimation There are three main types (or applications) of motion estimation: Parametric motion (image alignment) The main idea of parametric motion

More information

The Template Update Problem

The Template Update Problem The Template Update Problem Iain Matthews, Takahiro Ishikawa, and Simon Baker The Robotics Institute Carnegie Mellon University Abstract Template tracking dates back to the 1981 Lucas-Kanade algorithm.

More information

CRF Based Point Cloud Segmentation Jonathan Nation

CRF Based Point Cloud Segmentation Jonathan Nation CRF Based Point Cloud Segmentation Jonathan Nation jsnation@stanford.edu 1. INTRODUCTION The goal of the project is to use the recently proposed fully connected conditional random field (CRF) model to

More information

Incremental Structured ICP Algorithm

Incremental Structured ICP Algorithm Incremental Structured ICP Algorithm Haokun Geng, Johnny Chien, Radu Nicolescu, and Reinhard Klette The.enpeda.. Project, Tamaki Campus The University of Auckland, New Zealand Abstract. Variants of the

More information

Mesh from Depth Images Using GR 2 T

Mesh from Depth Images Using GR 2 T Mesh from Depth Images Using GR 2 T Mairead Grogan & Rozenn Dahyot School of Computer Science and Statistics Trinity College Dublin Dublin, Ireland mgrogan@tcd.ie, Rozenn.Dahyot@tcd.ie www.scss.tcd.ie/

More information

ABSTRACT. KinectFusion is a surface reconstruction method to allow a user to rapidly

ABSTRACT. KinectFusion is a surface reconstruction method to allow a user to rapidly ABSTRACT Title of Thesis: A REAL TIME IMPLEMENTATION OF 3D SYMMETRIC OBJECT RECONSTRUCTION Liangchen Xi, Master of Science, 2017 Thesis Directed By: Professor Yiannis Aloimonos Department of Computer Science

More information

Lucas-Kanade Image Registration Using Camera Parameters

Lucas-Kanade Image Registration Using Camera Parameters Lucas-Kanade Image Registration Using Camera Parameters Sunghyun Cho a, Hojin Cho a, Yu-Wing Tai b, Young Su Moon c, Junguk Cho c, Shihwa Lee c, and Seungyong Lee a a POSTECH, Pohang, Korea b KAIST, Daejeon,

More information

Stereo and Epipolar geometry

Stereo and Epipolar geometry Previously Image Primitives (feature points, lines, contours) Today: Stereo and Epipolar geometry How to match primitives between two (multiple) views) Goals: 3D reconstruction, recognition Jana Kosecka

More information

Efficient Onboard RGBD-SLAM for Autonomous MAVs

Efficient Onboard RGBD-SLAM for Autonomous MAVs Efficient Onboard RGBD-SLAM for Autonomous MAVs Sebastian A. Scherer 1 and Andreas Zell 1 Abstract We present a computationally inexpensive RGBD- SLAM solution taylored to the application on autonomous

More information

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting R. Maier 1,2, K. Kim 1, D. Cremers 2, J. Kautz 1, M. Nießner 2,3 Fusion Ours 1

More information

A Proof that Fusing Measurements Using Point-to-Hyperplane Registration is Invariant to Relative Scale

A Proof that Fusing Measurements Using Point-to-Hyperplane Registration is Invariant to Relative Scale A Proof that Fusing Measurements Using Point-to-Hyperplane Registration is Invariant to Relative Scale Fernando Ireta Munoz, Andrew Comport To cite this version: Fernando Ireta Munoz, Andrew Comport. A

More information

A Low Power, High Throughput, Fully Event-Based Stereo System: Supplementary Documentation

A Low Power, High Throughput, Fully Event-Based Stereo System: Supplementary Documentation A Low Power, High Throughput, Fully Event-Based Stereo System: Supplementary Documentation Alexander Andreopoulos, Hirak J. Kashyap, Tapan K. Nayak, Arnon Amir, Myron D. Flickner IBM Research March 25,

More information