Using Augmented Measurements to Improve the Convergence of ICP

Using Augmented Measurements to Improve the onvergene of IP Jaopo Serafin, Giorgio Grisetti Dept. of omputer, ontrol and Management Engineering, Sapienza University of Rome, Via Ariosto 25, I-0085, Rome, Italy {serafin,grisetti}@dis.uniroma.it Abstrat. Point loud registration is an essential part for many robotis appliations and this problem is usually addressed using some of the existing variants of the Iterative losest Point (IP) algorithm. In this paper we propose a novel variant of the IP objetive funtion whih is minimized while searhing for the registration. We show how this new funtion, whih relies not only on the point distane, but also on the differene between surfae normals or surfae tangents, improves the registration proess. Experiments are performed on syntheti data and real standard benhmark datasets, showing that our approah outperforms other state of the art tehniques in terms of onvergene speed and robustness. Keywords: Point loud Registration IP Surfae Normals Introdution Registering two point louds is a building blok of many robot appliations suh as simultaneous loalization and mapping (SLAM), objet reognition and detetion, augmented reality and many others. This problem is ommonly solved by variants of the Iterative losest Point (IP) algorithm proposed by Besl and MKay []. IP tries to find a transformation that minimizes the distane of a set of orresponding points in the two louds. At eah IP refines the estimate of the transformation by alternating a searh and an optimization routine. Given the urrent transform, the searh looks for orresponding points in the two louds. The optimization omputes the transformation that results in the minimum distane between the orresponding points found by the searh step. IP is a very suessful sheme and several variants of inreasing performanes have been proposed. If the orrespondenes are free from outliers and the measurements are affeted by low noise, the transformation an be found diretly by applying the Horn formula [3]. The whole onept at the base of IP is that, at eah, an improved transformation with respet to the previous one is found. Suh a transformation represents the new initial guess for the heuristi used to find the orrespondenes and allows to determine better assoiations at the next. Aordingly, researhers foused on seeking for heuristis that provide good orrespondenes.

The original idea of piking up the losest points [] has been progressively refined to onsider features, urvature and other harateristis of the points. Pomerlau et al. [5] provided an exellent overview on these different variants. IP and its variants require multiple s beause it does not exist an heuristi that provides the exat orrespondenes. Sine the optimization requires linear time in the number of orrespondenes, the bottlenek of the omputation is represented by the heuristi that has to ompute them. The main drawbak of IP in its original formulation is the assumption that the points in the two surfaes are exatly the same. This is learly not true as the point louds are obtained by sampling a set of points from the surfae observed by the sensor. If the observer position hanges, the hanes that two points in the louds are the same is very low. This is partiularly evident at low sampling resolutions. Aware of this aspet, Magnusson et al. [4] proposed to approximate the surfae with a set of Gaussians apturing the loal statistis of the surfae in the neighborhood of a point. In that representation, alled the Normal Distrubution Transform (NDT), the orrespondene searh uses the Mahalanobis distane instead of the Eulidean one and the optimization tries to minimize it. Similarly, Segal et al [6] proposed a refined version of IP alled Generalized IP (GIP). The ore idea behind this algorithm is to aount for the shape of the surfae whih surrounds a point by approximating it with a planar path. In the optimization, two orresponding pathes are aligned onto eah other, negleting the error along their tangent diretion. This an be straightforwardly implemented by minimizing the Mahalanobis distane of orresponding points, where the ovariane matrix of a measurement is fored to have the shape of a disk aligned with the sampled surfae. Thanks to the better rejetion of false orrespondenes based on the surfae normal ue and the more realisti objetive funtion, NDT and GIP exhibit a substantially more stable onvergene behavior. In fat, within IP and its variants, the optimization and the orrespondene searh are not independent. If the optimization is robust to outliers and exhibits a smooth behavior, the hanes that it finds a better solution at the subsequent step inreases. In this way an improvement is obtained at eah until a good solution is found. Despite NDT and GIP, the authors are unaware of other methods that improve the objetive funtion. Sine point louds are the effets of sampling a surfae, the loal harateristis of this surfae play a role in the optimization. From this point of view, the objetive funtion has to express some distane between surfae samples, and the optimization algorithm has to determine the optimal alignment between these two set of samples. A surfae sample, however, is not fully desribed just by 3D points, but it requires additional ues like the surfae normal, the urvature and, potentially, the diretion of the edge. Both NDT and GIP minimize a distane between orresponding points, while they neglet additional ues that an indeed play a role in determining the transformation and in rejeting outliers.

In this paper we propose a novel variant of the objetive funtion whih is optimized while searhing for the transformation. This funtion depends not only on the relative point distane, but also on the differene between surfae normals or tangents in ase the point lies on an edge. We provide an iterative form for the optimization routine and we show through experiments performed on syntheti data and standard benhmark datasets that our approah outperforms other state of the art tehniques, both in terms of onvergene speed and robustness. 2 IP The problem of registering two point louds onsists in finding the rotation and the translation that maximizes the overlap between the two louds. More formally, let P r = {p r :N r} and P = {p :N } be the two set of points, we want to find the transformation T that minimizes the distane between orresponding points in the two senes: {}}{ T ( = argmin p i T p r ) T ( j Ωij p i T p r j) T }{{} e ij(t) χ 2 ij. () In Eq. the symbols have the following meaning: T is the transform that is updated at eah step i of the iterative algorithm with the one found at i ; Ω ij is an information matrix that takes into aount the noise properties of the sensor or of the surfae; = { i, j :M } is a set of orrespondenes between points in the two louds. i, j means that the point p r j in the loud Pr orresponds to the point p i in the loud P ; e ij (t) is the error funtion that omputes the distane between the point p i and the orresponding point p r j in the other loud after applying the transformation T; χ 2 ij is the Ω ij-norm of the error e ij (t); is an operator that applies the transformation T to a point p. If we use the homogeneous notation for transformations and points, redues to the matrix-vetor produt. In general, the orrespondenes between two point louds are not known. However, in presene of a good approximation for the initial transform, they an be guessed through some heuristi like nearest neighbour. In its most general formulation, IP iteratively refines an initial transform T by searhing for orrespondenes and finding the solution of Eq.. Suh a new transformation is then used in order to ompute the new orrespondenes. Eq. desribes the objetive funtion used in the optimization of IP, NDT and GIP. In the ase of IP, Ω ij is a diagonal matrix potentially saled with a weight representing

the onfidene about the orretness of a orrespondene. NDT omputes the ovarianes Σ i diretly from the point loud and it measures the distanes by using the mean of the Gaussians rather than the points as shown in Eq. 2. T = argmin T ( µ i T p r j) T Σ i ( µ i T p r ) j. (2) In GIP, Ω ij = Σ i depends only on the i th point p i and its neighborhood. The ovariane Σ i is enfored to have a disk shape and to lie on the surfae from where p i was sampled. In all ases, the differene p i T pr j is a 3D vetor that measures the offset between two 3D points and the domain of the error funtion is R 3. Sine an inrease in the dimensionality of the points makes the whole system more observable, less orrespondenes are required for the optimization proess. By haraterizing eah point with other quantities to whih a transform an be applied, we an ahieve suh an inrease in the dimensionality. We propose, for this reason, the use of normals for quasi-planar regions and/or tangents for regions of high urvature. 3 Extended IP In this setion we desribe the extension of the model of IP in order to onsider also normals and tangents of the surfae. We first illustrate the general onept and, subsequently, we fous on the ase in whih a loal surfae has either a normal, a tangent or none of the two. We onlude the setion by skething an algorithm to arry on the optimization. 3. Extending the Measurements Let n i be the normal of a point p i belonging to a ertain surfae, and τ i its tangent if the point is part of an edge, we an then extend Eq. as follows: T = argmin T + + ( p i T p r T j) Ω p ( ij p i T p r j) ( n i T n r ) T ( j Ω n ij n i T n r j) (3) ( τ i T τ r ) T ( j Ω τ ij τ i T τ r ) j. Here n i, nr j and Ωn ij represent respetively the normal of the point p i and p r i, and the information matrix of the orrespondene among the two normals. Similarly, τ i, τ r j and Ωτ ij are the tangents and the information matrix of the orrespondene among the two tangents. We reall that, if T is a transformation desribed by a rotation matrix R and a translation vetor t, the operator has different definitions depending on its arguments: { R x + t if x is a point T x = (4) R x if x is a tangent or a normal

A Mahalanobis distane between two point louds an be measured by onsidering also the distanes of orresponding normals and orresponding tangents after applying the transformation T, as shown in Eq. 3. By defining an extended point p as a vetor onsisting of a point p, its normal n and its tangent τ, we have a straightforward modifiation of the operator as p = ( p, n, τ ) T T p = ( Rp + t, Rn, Rτ ) T. (5) Eq. 3 an be, then, ompatly rewritten in terms of extended points as T = argmin ( p i T p r )T j Ω ij ( p i T p r ) j, (6) T where Ω ij = diag( Ω p ij, Ωn ij, Ωτ ij ) summarizes the ontribution of Ω p ij, Ωn ij and Ω τ ij. The funtion diag( a,..., a k ) stands for a diagonal matrix whose entries, starting from the upper left orner, are a,..., a k. If the point is not sampled from a loally planar surfae nor from an edge, a reasonable distane metri is the Eulidean distane. For these points, we fall bak to the IP ase, whih is enlosed in Eq. 6 by setting the information matries of the tangent and the normal to the null matrix: Ω n ij = Ωτ ij = 0. When measuring the distane between two planar pathes, it is reasonable to neglet displaements along the tangent diretion of the plane, while errors along the normal diretion should be more severely penalized. Additionally, the normals of the two planes should be as lose as possible. However, this onstraint annot be enfored when using only 3D points. To obtain this behavior from the error funtion, we an impose Ω p ij to be a dis lying on the surfae around p i, as done in [6]. Sine the tangent is not defined in a planar path, we set Ω τ ij = 0. Additionally, we set the ovariane matrix Ω n ij of the normal to have a shape whih is elongated in the normal diretion. In this way the error between the normals introdues a strong momentum that fores them to have the same diretion. onversely, when measuring the distane between two edges, it is reasonable to slide them onto eah other along the tangent diretion. This behavior an be obtained by enforing Ω p ij to have a prolonged shape and to lie along the tangent diretion. The tangents τ, instead, an be used to penalize two edges not lying on the same diretion by setting Ω τ ij to have a shape whih is elongated in the diretion of τ. Sine an edge has no normal, Ω n ij has to be set to 0. The reader might notie that tangents and normals are mutually exlusive. Sine the ontributions of the tangent and the normal omponents to the χ 2 ij have the same matrix dimensions, we an further simplify the extended point p by partitioning it into an affine part p, and in a linear part l. The former is subjet to translations and rotations, the latter only to the rotation. In this way it is possible to redue the dimension of the error funtion and to speed up the alulation without loss of generality. We therefore define a ompat form for an extended point p as p = ( p, l ) T T p = ( Rp + t, Rl ) T. (7)

Table : This table summarizes the omponents of the information matrix used in our algorithm, depending on the type of the struture around a point. R ni and R τi are two rotation matries that bring the y axis respetively along the diretion of the normal n i, or the tangent τ i. ɛ is a small value (0 3 in our experiments). ase Ω p ij Ω n ij Ω τ ij planar R ni diag( ɛ,, )RT n i R ni diag( ɛ, ɛ, )RT n i 0 edge R τi diag(,, ɛ ɛ )RT τ i 0 R τi diag(,, ɛ ɛ )RT τ i none I 0 0 Aording to the new formalism, the objetive funtion in Eq. 6 beomes T = argmin ( p i T p r )T j Ω ij ( p i T p r j), (8) T }{{} ẽ ij(t) where Ω ij = diag( Ω p ij, Ωl ij ), and the information matries must be modified aording to Table 2 Table 2: This table summarizes the omponents of the information matrix for our algorithm when using a redued representation. ase l i Ω p ij Ω l ij planar n i R ni diag( ɛ,, )RT n i R ni diag( ɛ, ɛ, )RT n i edge τ i R τi diag(,, ɛ ɛ )RT τ i R τi diag(,, ɛ ɛ )RT τ i none 0 I 0 3.2 arrying on the Optimization In this setion we present the proedure for the minimization desribed in Eq. 8 by using a strengthened least squares proedure. The input of this algorithm are two sets of extended points p :n and p :m, a (noisy) set of andidate orrespondenes = i, j :M and the information matrix Ω ij, omputed aording to Table 2. The aim of this proedure is to find the transform T that minimizes the following objetive or ost funtion T = argmin T ẽ ij (T) T Ω ij ẽ ij (T) }{{} χ 2 ij. (9) Eah orrespondene ontributes to the overall ost funtion by the salar term χ 2 ij.

As is well know, the minimizing T of Eq. 9 an be found by iteratively solving the following linear system: H T = b, (0) where H = JT ij Ω ij J ij is the Hessian matrix, b = JT ij Ω ij ẽ ij is the oeffiient vetor and J ij is the Jaobian of the error funtion. At eah we ompute an improved transform T from the previous transform T by using H and b. By solving the linear system in Eq. 0 we determine a perturbation T whih is applied to the previous transform T in order to redue the error. The transform T of the next is thus omputed as T = T T. () For readers interested in further details on the derivation of Eq. 0 we suggest the work by Kümmerle et al. [2]. In our approah, a perturbation T is defined as a vetor omposed of two parts ( t q) T where t = ( t x t y t z ) is a translation vetor and q = ( q x q y q z ) is the imaginary part of the normalized quaternion used to represent an inremental rotation. If t = 0, the perturbation is the 4-by-4 identity matrix. Under this parameterization, the Jaobian J ij with respet to the loal perturbation T is omputed as J ij = e ( ) ij T T (n) ( ) I 2[T p r = j ] T 0 2[T l r j ], (2) T=0 where [x] is the ross produt matrix of the vetor x. In pratie, by exploiting the blok struture and the sparsity of the Jaobian, it is possible to ompute effiiently the linear system in Eq. 0. In order to be robust to the presene of outliers, whih usually signifiantly ontribute to the error, the proposed sheme has to be further modified. To redue the ontribution of these wrong terms, we sale the information matrix of eah orrespondene whose χ 2 is greater than an aeptane threshold by a fator γ ij. { if χ 2 ij < K γ ij = K χ 2 otherwise (3) Even if the orret orrespondenes are rejeted at the beginning of the iterative proess, these will be onsidered again as the system onverges towards a better solution, sine their χ 2 will derease. In order to smooth the onvergene it an be also added a damping fator to the linear system in Eq. 0. In pratie, T is found by solving the damped linear system (H + λi) T = b, sine it prevents the solution to take too large steps that might be aused by nonlinearities or wrong orrespondenes. 3.3 Optimization Summary In this subsetion we wrap-up the ideas disussed above and we provide the sketh of an iterative algorithm the optimization of Eq. 8. At eah our

algorithm omputes an improved estimate T from the urrent estimate T by exeuting the following steps:. ompute the information matries Ω ij aording to Table 2; 2. ompute the error vetor ẽ ij as shown in Eq. 8; 3. ompute the Jaobian J ij aording to Eq. 2; 4. ompute the χ 2 ij as in Eq. 9 and the saling fator γ ij from Eq. 3; 5. ompute a saled version of the Hessian and of the oeffiient vetor as: H = γ ij J T ij Ω ij J ij b = γ ij J T ij Ω ij ẽ ij ; (4) 6. Solve the linear system of Eq. 0 to find an improved perturbation T; 7. ompute the improved transformation T as in Eq.. 4 Experiments We validated our approah both on real and syntheti data. The real world experiments were onduted on publily available benhmarking datasets, and they show the performanes of our optimization algorithm when inluded in a full IP system. The experiments on syntheti data, instead, allow to haraterize the behavior of our approah under different levels of sensor noise and outlier ratios. omparisons with NDT are not showed sine it performs similarly to GIP beause they rely on analogous representations of the points. 4. Real World Experiments For the real world experiments we used the benhmarking datasets by Stuerm et al. [7]. Eah dataset onsists in a sequene of depth and RGB images aquired with a alibrated RGBD amera in a referene sene. Note that even if our approah is not restrited to the use of depth images, we deided to use these datasets sine they are labeled with the ground truth of the transformations. We do not make use of the RGB hannels. In order to provide the input data to the algorithm illustrated in the previous setion, we proessed the point loud P generated from eah depth image by extrating the loal surfae harateristis from the neighborhood of eah point p i. This proess is performed by omputing the parameters of a Gaussian N (µ i, Σ i ) and taking all points that lie within a fixed ball entered in p i, as µ i = P i p k Σ i = P i p k P i p k P i (p k µ i ) T (p k µ i ), (5) where P i is the set of all points in P that are loser than a fixed distane from p i. For determining if a point lies on a orner, an edge or a flat surfae, we analyze the eigenvalues of its ovariane matrix Σ i. If all eigenvalues have more

or less the same magnitude, we assume the point is on a orner. If one of the eigenvalues is smaller with respet to the other two, we assume the point lies on an edge. Finally, if one of the eigenvalues is smaller of some order of magnitude with respet to the others then we assume the point is on a planar path. This disrimination is neessary to ompute the orret information matries, aording to Table 2. Given two louds to be aligned, we searh the orrespondenes using a line of sight riterion over the depth images, we rejet the orrespondenes whose normals are too different and we exeute one of optimization. Notie that IP and GIP are speial ases that an be aptured by our algorithm just by modifying the way in whih the information matries are omputed. To fous our analysis on the objetive funtion we left all parts of the system unhanged, inluding the orrespondene seletion. This represents an advantage for plain IP, sine normally it does not rely on the normals in order to rejet wrong assoiations. For eah dataset, we inrementally aligned one frame to the previous one. For eah of the alignment, we ompared the differene between the urrent solution and the ground truth. Eah attempted alignment produed a plot whih shows the evolution of the rotational and translational error. For ompatness, we provide in this paper only the average error plots obtained by averaging all errors of a run. The reader who is interested in the individual plots of eah alignment, an find them at http://www.dis.uniroma.it/~serafin/ publiations/-augmented-measurements. In order to measure the robustness of the alignments to wrong initial guesses we performed several runs of the experiments by onsidering a frame eah N. Table 3 shows the average evolution of the rotational and the translational error on three different datasets and at different frame skips. The plotted results point out that our novel objetive funtion in general performs better than the other approahes, in partiular in terms of onvergene speed. This is true espeially for the rotational part of the error sine it is influened diretly by the normals. Also in the ase where no frame was skipped (first olumn of Table 3), GIP required twie the number of s to onverge to the results of our approah. Moreover, IP and GIP showed muh less robustness to frame skipping (seond and third olumn of Table 3). 4.2 Experiments on Syntheti Data We onduted experiments on syntheti data in order to assess the effets of the inliers and of the sensor error on our optimization funtion. To this end we generated a sene onsisting of about 300k 3D points with normals and tangents. Then, we omputed the orret orrespondenes and ideal measurements and we orrupted them. This proess has been performed by injeting a variable fration of random outliers and perturbing the measurements by adding Gaussian noise With run we denote all the alignment over a single dataset with a ertain frame skip rate.

Table 3: Average evolution of the translational and rotational error for three different datasets at varying frame skip rates. Our approah is labeled n in the aptions of the images. frame skip rate: frame skip rate: 0 frame skip rate: 20 dataset: fr/360 0.05 0.045 0.04 0.035 0.03 0.025 0.02 5 0.005 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0 n 0 2 4 6 8 0 n 0 2 4 6 8 0 8 6 4 2 8 6 4 2 8 0.7 0.6 0.5 n 0 2 4 6 8 0 n 0 0 2 4 6 8 0 0.65 0.6 0.55 0.5 5 n 5 0 2 4 6 8 0 0.8 0.7 0.6 0.5 n 0 2 4 6 8 0 dataset: fr/desk2 0.04 0.035 0.03 0.025 0.02 5 n 0 2 4 6 8 0 0.05 0.045 0.04 0.035 0.03 0.025 0.02 5 0.005 n 0 2 4 6 8 0 8 6 4 2 8 6 4 5 5 5 n 0 2 4 6 8 0 n 5 0 2 4 6 8 0 4 2 8 6 4 n 2 0 2 4 6 8 0 0.65 0.6 0.55 0.5 5 5 5 n 0 2 4 6 8 0 dataset: fr/room 0.035 0.03 0.025 0.02 5 n 0.005 0 2 4 6 8 0 0.055 0.05 0.045 0.04 0.035 0.03 0.025 0.02 5 0.005 n 0 2 4 6 8 0 8 6 4 2 8 6 4 2 5 5 5 n 0 2 4 6 8 0 n 0 2 4 6 8 0 6 4 2 8 6 4 2 0.7 0.65 0.6 0.55 0.5 5 5 5 n 0 2 4 6 8 0 n 0 2 4 6 8 0

Table 4: Average evolution of the translational and rotational error at different outlier ratios and levels of noise affeting the measurements of the point (standard deviation, in meters) and the normals (standard deviation in degrees). Our approah is labeled n in the aptions of the images. outliers: 0% outliers: 50% outliers: 90% noise: [m], 0 n 0 0 20 30 40 50 n n 0 0 20 30 40 50 n 0 n 0 0 20 30 40 50 n 0 0 20 30 40 50 0 0 20 30 40 50 0 0 20 30 40 50 noise: [m], 5 n 0 0 20 30 40 50 n n 0 0 20 30 40 50 n 0 n 0 0 20 30 40 50 n 0 0 20 30 40 50 0.00 0 0 20 30 40 50 0 0 20 30 40 50 noise: [m], 20 n 0 0 20 30 40 50 n n 0 0 20 30 40 50 n 0 n 0 0 20 30 40 50 0 n 0 0 20 30 40 50 0 0 20 30 40 50 0 0 20 30 40 50

to the points and normal estimates. For eah setting we ran our approah, IP and GIP, and we plotted the evolution of the translational and rotational error. The results are shown in Table 4. Overall the experiments on syntheti data reflet the behavior of the real world ones. Shortly, using additional information in the error funtion makes the approah more robust and aelerates the onvergene. This is partiularly true at high rates of outliers and sensor noise. Not surprisingly, instead, noise in the normals lowers the performanes. In the unrealisti senario in whih every normal is affeted by a 20 error at 90% of outliers the translational estimate beomes less aurate than GIP, but it still onverges to a reasonable solution. 5 onlusions In this paper we proposed a novel optimization funtion to register point louds using an IP based algorithm that takes into aount an augmented measurement vetor. Statistial omparative experiments on real and syntheti data show that our approah performs better than other state of the art methods both in terms of onvergene speed and robustness. As expeted, the normals and the tangents of the surfaes showed an improvement in partiular in the rotational part of the error, while keeping the translational one similar to the other approahes. A further enhanement ould be obtained by finding an additional measurement, related to the translation, to be onsidered in the minimization of the ost funtion. Aknowledgments. This work has partly been supported by the European ommission under FP7-600890-ROVINA. Referenes. Besl, P.J., MKay, N.D.: A method for registration of 3-D shapes. IEEE Transations on Pattern Analysis and Mahine Intelligene (992) 2. Grisetti, G., Kummerle, R., Stahniss,., Burgard, W.: A tutorial on graph-based slam. Intelligent Transportation Systems Magazine, IEEE 2(4), 3 43 (200) 3. Horn, B.K., Hilden, H.M., Negahdaripour, S.: losed-form solution of absolute orientation using orthonormal matries. Journal of the Optial Soiety of Ameria (988) 4. Magnusson, M., Dukett, T., Lilienthal, A.J.: San registration for autonomous mining vehiles using 3d-ndt. Journal on Field Robotis (2007) 5. Pomerleau, F., olas, F., Siegwart, R., Magnenat, S.: omparing variants on real-world data sets. Autonomous Robots (203) 6. Segal, A.V., Haehnel, D., Thrun, S.: Generalized-IP. In: Pro. of Robotis: Siene and Systems (RSS) (2009) 7. Sturm, J., Engelhard, N., Endres, F., Burgard, W., remers, D.: A benhmark for the evaluation of rgb-d slam systems. In: Pro. of the IEEE/RSJ Int. onf. on Intelligent Robots and Systems (IROS) (202)