Feature Extraction and Registration: An Overview


S. Seeger, X. Laboureux
Chair of Optics, University of Erlangen-Nuremberg, Staudstrasse 7/B2, 91058 Erlangen, Germany
Email: sns@undine.physik.uni-erlangen.de, xl@undine.physik.uni-erlangen.de

Abstract

The purpose of this paper is to present a survey of rigid registration (also called matching) methods applicable to surface descriptions. As features are often used for the registration task, standard feature extraction approaches are described in addition. In order to give the reader a framework for his present registration problem, this report divides the matching task into three major parts (feature extraction, similarity metrics and search strategies). In each of them the reader has to decide between several possibilities, whose relations are particularly pointed out.

1 Introduction

Registration, or equivalently matching, is the process of bringing two data sets into the best possible alignment. This is achieved by determining the transformation that maps corresponding areas or points onto each other. The terms "best possible alignment" and "corresponding areas/points" of two data sets are intuitively quite easy to understand, but need a precise mathematical definition for a computational approach (see Section 6). Since features are one possibility to define corresponding areas/points, we discuss some feature extraction methods in addition (see Section 4). Features can be defined as modified data formed from a collection of the original data set, which might be combined in linear or non-linear ways [4]. In this report only registration methods applicable to surface descriptions are described. In order to illustrate the above definitions, before starting the discussion of registration methods we present some typical examples of data sets, corresponding areas/points and transformations.

2 Examples of data sets to be registered

In the following examples we present three kinds of data structures:
- intensity pictures I : (n_x, n_y) → I[n_x, n_y] ∈ R, e.g. from photo cameras,
- surface descriptions (range images) F : (n_x, n_y) → (x[n_x, n_y], y[n_x, n_y], z[n_x, n_y])^T ∈ R^3, e.g. from a tactile or optical sensor,
- volume data ρ : (n_x, n_y, n_z) → ρ[n_x, n_y, n_z] ∈ R, e.g. from medical 3-D scanners like CT (Computed Tomography), MR (Magnetic Resonance) or sonography (ultrasound), which show anatomical structure, or PET (Positron Emission Tomography), SPECT (Single Photon Emission Computed Tomography) or MRS (Magnetic Resonance Spectroscopy), which show functional and metabolic activity [29].

Intensity images and volume data sets are introduced in addition, since the registration methods used in their areas may also be used for the registration of surfaces:
- Typical optical 3-D sensors supply, in addition to a range image, a pixel-identical intensity image. In this way the registration of the two intensity pictures is already sufficient to match the underlying range images.
- Since iso-surfaces (ρ(n_x, n_y, n_z) = const) are often extracted to register volume data sets, the registration methods used there can be directly applied to the surface matching task.

Examples

The data sets to be registered could be two intensity images of an object taken from different viewpoints. Due to illumination variations between the images, corresponding points do not have the same intensities, which makes the matching process even more difficult. The searched transformation is a 3-D rotation and a translation, also known as a rigid transformation. To find this transformation, at least seven corresponding points in the images have to be found. The related range image can be calculated from the given intensity images [52].

The data sets are two or more range images of an object taken from different viewpoints. The searched transformation is a rigid one. With at least three corresponding points in the range images the transformation can be found. In this way a complete model of the object can be generated from several views [35].

The data sets are two intensity images of different objects, e.g. two different human faces. Corresponding areas may be manually defined by features like the eyes, the mouth and the nose. The searched transformation depends on the application. For example, in a face recognition system it may be useful to find the best rigid transformation between the two given faces [33]; the magnitude of the difference decides whether the same face or different faces are presented in the images. On the other hand, for morphing (the smooth transition from one data set into another) a non-rigid transformation is searched for, which allows all parts of the object in the two images to be matched locally to each other [18].

The data sets are two range images of an object taken from the same viewpoint but at different times, so that the object may have changed its shape in the meantime. For example, the object might be a human face before and after a dental operation [11]. Corresponding areas result from features that can be detected in both range images and have not changed in the meantime, e.g. the eyes, the nose and/or the forehead. The searched transformation depends again on the application. For example, if the post-operative swelling is to be visualized, it might be useful to find the rigid transformation that transforms the unchanged parts of the object into each other; with the help of this rigid transformation the difference volume of the two range images can be visualized. In other applications it might be useful to parameterize the time variations of the object by a non-rigid transformation which allows all parts of the object in the two range images to be matched to each other [15, 40]. Such a non-rigid transformation can be a (global or local) affine, projective or curved transformation [27], depending on the magnitude of the difference between the data sets.

The data sets are two volume data sets of the same object measured by the same device but at different times. With a rigid transformation, variations of the object (e.g. a skull or a brain) can be determined, for instance to study the evolution of a disease [27].

The data sets are two volume data sets of different objects measured by the same device (e.g. the heads of different patients). Corresponding areas seem obvious for a human being. The searched transformation is a non-rigid one. The registration of different patients' images could allow a healthy and a sick person to be contrasted [40].

The data sets are two volume data sets of the same object measured by different devices, e.g. a CT and a PET scan of the human head. It may be difficult even for a non-expert human being to define corresponding areas. The searched transformation is a rigid one (e.g. to improve the diagnosis by using multimodality data) [26].

The data sets to be registered can be of different dimensions; e.g. it is possible to match an intensity image I : (n_x, n_y) → I[n_x, n_y] ∈ R with a range image F : (n_x, n_y) → (x[n_x, n_y], y[n_x, n_y], z[n_x, n_y])^T ∈ R^3 of the same object [47]. The searched transformation is a rigid one.

The above examples can be classified according to two basic criteria:
1. Nature and domain of the transformation: rigid (local, global), non-rigid (local, global)
2. Modalities involved: monomodal, multimodal

Now we come to the presentation of several registration methods.

3 Overview of registration methods applicable to geometry data

The task of determining the best spatial transformation for the registration of data sets can be divided into four major components [5]: feature space, search space, similarity metric and search strategy.

The choice of feature space determines what is matched. Since features can be independently found in each data set in a preprocessing step, the amount of data to be matched can thus be reduced. Some examples are:
- raw data (intensities in intensity images, 3-D points in range images, density values in volume data sets),
- attributes defined for all points: curvatures, principal frames, point signatures [15, 42, 10],
- special collections of points: edges, surfaces, crest lines [30, 7, 14, 40],
- salient point features: corners, line intersections, points of high curvature, extremal points [36, 42],
- statistical features: moment invariants, centroids, principal axes (these refer to measures over a region that may be the outcome of a segmentation preprocessing step) [16],
- higher-level structural and syntactic descriptions [12, 51].
The search space is the class of transformations from which we want to find the optimal transformation to align the data sets (global/local, rigid/non-rigid). The similarity metric determines how matches are rated (e.g. sum of squared Euclidean distances, normalized cross-correlation, mutual information). The search strategy describes how to find this optimal transformation and depends on the search space (e.g. ICP, Hough method (clustering), correlation, relaxation, prediction-verification, indexing schemes, tree and graph matching).

We begin the discussion of registration methods by presenting the extraction of more sophisticated features like principal curvatures, point signatures, principal frames, crest lines and extremal points from surface descriptions. These features can be found in all kinds of free-form surfaces and therefore are more general than corners or edges. In the whole discussion of registration methods we restrict ourselves to the search space of global rigid transformations (3-D rotations + translations). Therefore we give an overview of representations of rotation with an emphasis on quaternions. Then we present some widely used similarity metrics. At the end we describe some search strategies, especially the ICP algorithm and some of its extensions. The ICP algorithm needs no feature extraction and has become the standard for the precise registration of two point clouds.

4 Extracting features from surface descriptions

4.1 Principal curvatures

Minimal and maximal curvatures (also called principal curvatures) are shift and rotation invariant local features of an object surface. Given a parametric surface description p(u, v) = (x(u, v), y(u, v), z(u, v))^T, the principal curvatures κ_i (i = 1, 2) at p(u_0, v_0) can be calculated as the eigenvalues of the Weingarten map (also called the shape operator) [13],

    \begin{pmatrix} E & F \\ F & G \end{pmatrix}^{-1}
    \begin{pmatrix} L & M \\ M & N \end{pmatrix}
    \begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix}
    = \kappa_i \begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix},    (1)

where E, F, G, L, M and N depend on first and second order partial derivatives with respect to u and v at p = p(u_0, v_0):

    E = p_u · p_u,  F = p_u · p_v,  G = p_v · p_v,
    L = p_uu · n,   M = p_uv · n,   N = p_vv · n,    (2)

and n is the normal at p(u_0, v_0),

    n = (p_u × p_v) / |p_u × p_v|.    (3)

It is straightforward to determine the curvatures κ_i from (1) as the roots of the characteristic polynomial [49],

    \kappa^2 - \frac{NE - 2MF + LG}{EG - F^2}\,\kappa + \frac{LN - M^2}{EG - F^2} = 0.    (4)

4.2 Principal frames

With the help of the components α_i and β_i of the eigenvectors, the directions of minimal and maximal curvature can be determined as

    e_i = (α_i p_u + β_i p_v) / |α_i p_u + β_i p_v|.    (5)

It can be shown that the unit vectors e_1 and e_2 are perpendicular (e_1 · e_2 = 0). Since both e_1 and e_2 lie in the tangential plane (defined by p_u, p_v), they are perpendicular to the normal n at p(u_0, v_0). Therefore e_1, e_2 and n define a local orthogonal frame at p(u_0, v_0), also called the principal frame or trihedron. Note that the principal frame is not uniquely determined, since there is no way to choose between the frames (e_1, e_2, n) and (−e_1, −e_2, n) [15].
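Equations (1)-(4) translate directly into a few lines of linear algebra. A minimal numpy sketch (the function name and the sanity check against (4) are my own additions, not from the paper):

```python
import numpy as np

def principal_curvatures(E, F, G, L, M, N):
    """Principal curvatures as the eigenvalues of the Weingarten map (1):
    the inverse of the first fundamental form times the second one."""
    W = np.linalg.solve(np.array([[E, F], [F, G]]),
                        np.array([[L, M], [M, N]]))
    kappa = np.sort(np.linalg.eigvals(W).real)
    # cross-check: sum and product of the roots of the characteristic
    # polynomial (4) must equal trace and determinant of W
    denom = E * G - F * F
    assert np.isclose(kappa.sum(), (N * E - 2 * M * F + L * G) / denom)
    assert np.isclose(kappa.prod(), (L * N - M * M) / denom)
    return kappa
```

As a sanity check, for a sphere of radius 2 parameterized at the equator with outward normal one gets E = G = 4, F = 0, L = N = -2, M = 0, and both principal curvatures come out as -1/2.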

4.3 Crest lines

One of the principal curvatures is maximal in absolute value: in [42] it is called the largest curvature κ_max, in order not to be mistaken for the maximal curvature. The associated principal direction is e_max. Crest lines are the loci of the surface where the largest curvature κ_max is locally maximal (in absolute value) in the associated direction e_max. In [42] crest lines are extracted from volume data. There a crest line is the intersection of an iso-surface ρ(n_x, n_y, n_z) = const with the implicit surface ∇κ_max · e_max = 0 (i.e. the change of κ_max in direction e_max is zero, which implies that κ_max is extremal in direction e_max).¹

4.4 Extremal points

For the definition of crest lines only the largest curvature with its associated principal direction is used. In the same way there are also extremal lines associated with the curvature of minimal absolute value. Extremal points are defined as the points of intersection of such extremal lines with crest lines. In [42] extremal points are extracted from volume data. There an extremal point is the intersection of an iso-surface ρ(n_x, n_y, n_z) = const with the implicit surfaces ∇κ_1 · e_1 = 0 and ∇κ_2 · e_2 = 0. Note that the extremal points are generally not the points of the extremal lines whose curvature is locally maximal. It is only stated here that 16 different types of extremal points can be distinguished. In addition there are several geometric invariants associated with extremal points: the geometric invariants of the surface (principal curvatures), the geometric invariants of the extremal lines (curvature, torsion) and the geometric invariants corresponding to the relative position of the extremal lines with respect to the underlying surface [42].

In order to extract all these features, partial derivatives have to be calculated on the given data. If no parametric surface description but only point clouds are given, a polynomial surface is usually approximated locally at each data point in order to calculate the partial derivatives. In this way the partial derivatives are always proportional to one of the polynomial coefficients [24, 49].

4.5 Point signatures

Similar to principal curvatures, this kind of rotation and translation invariant feature can be defined for each surface point, but with the great advantage that no derivatives have to be calculated [10]: for a given surface point p, a sphere of radius r, centered at p, is placed. The intersection of the sphere with the object surface is a 3-D space curve C, whose orientation can be defined by an orthonormal frame formed by a normal vector n_1, a reference vector n_2, and the cross product of n_1 and n_2. n_1 is defined as the unit normal vector of a plane P fitted through the space curve C. In the limit as r tends to zero, n_1 approximates the surface normal at the point p. A new plane P′ can be defined by translating the fitted plane P to the point p in a direction parallel to n_1. Again, as r tends to zero, P′ approximates the tangential plane. The perpendicular projection of C onto P′ forms a new planar curve C′. The distances of the points of C to the corresponding projected points of C′ form a signed distance profile that is called the signature of the point p in [10]. The reference direction n_2 is defined as the unit vector from p to the projected point on C′ which gives the largest positive distance. Note that n_2 is orthogonal to n_1 since it lies on P′.

¹ Of course crest lines can also be directly calculated on surface descriptions.
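The local polynomial approximation mentioned in Section 4.4 can be sketched concretely: fit a quadric z = a + bu + cv + du² + euv + fv² to a neighbourhood expressed in a local (u, v) frame, read the partial derivatives off the coefficients, and feed them into (1)-(3). This is a minimal illustration under my own choices (function name, least-squares fit, Monge-patch parameterization), not the exact procedure of [24, 49]:

```python
import numpy as np

def quadric_curvatures(points):
    """Fit z = a + b u + c v + d u^2 + e u v + f v^2 to neighbourhood
    points (centre point at the origin of the local (u, v) frame), read
    the partial derivatives off the coefficients, then assemble
    E, F, G, L, M, N as in (2)-(3) and solve the eigenproblem (1)."""
    P = np.asarray(points, float)
    u, v, z = P[:, 0], P[:, 1], P[:, 2]
    A = np.column_stack([np.ones_like(u), u, v, u * u, u * v, v * v])
    a, b, c, d, e, f = np.linalg.lstsq(A, z, rcond=None)[0]
    zu, zv, zuu, zuv, zvv = b, c, 2 * d, e, 2 * f    # derivatives at (0, 0)
    E, F, G = 1 + zu * zu, zu * zv, 1 + zv * zv      # first fundamental form
    w = np.sqrt(1 + zu * zu + zv * zv)               # |p_u x p_v|, cf. (3)
    L, M, N = zuu / w, zuv / w, zvv / w              # second fundamental form
    W = np.linalg.solve([[E, F], [F, G]], [[L, M], [M, N]])
    return np.sort(np.linalg.eigvals(W).real)
```

For points sampled from the paraboloid z = (u² + v²)/2, the fit is exact and both principal curvatures at the origin come out as 1.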

4.6 Extended Gaussian Image (EGI)

The EGI is another way to represent the data of a surface data set. In this approach [19] the normal vector at each point of the data set is computed and mapped onto a unit sphere, where its tail is at the center of the sphere and its head lies on the surface. In addition, each point on the surface of the sphere (i.e. the head of a normal vector) is weighted by the Gaussian curvature of the corresponding point of the surface data set. In this way the EGI can be considered the weighted orientation histogram of the data set. This representation of the data sets has two interesting properties for the pose estimation problem: it is translation invariant, and the EGI rotates in the same way as the corresponding data set. Therefore the problem of finding the transformation in the 6-dimensional parameter space of rotations and translations can be divided into two reduced problems:
1. Finding the right rotation with the help of the EGI.
2. Finding the right translation with the help of an additional method.

However, such an EGI approach assumes that the mapping between a point in the data set and a point on the sphere is uniquely defined. It can be shown that this is only the case if the object is convex. One approach in which the distances from each data point to a given origin are additionally saved in each corresponding point on the sphere is the Complex EGI [23]. Note that an extension of the EGI which can deal with all classes of surfaces, called UNSDLA, has also been formulated [25].

5 Representations of rotation

There are many ways to represent rotation. Some examples are: Gibbs vector, Euler angles [43], Pauli spin matrices [43], axis and angle [43], Cayley-Klein parameters [17], orthonormal matrices [43], quaternions [20] and dual number quaternions [54]. Since quaternions are widely used in the computer vision community, we want to give a short introduction; some more details can be found in [20]. In the context of registration they are often used for a closed-form solution to the problem of minimizing the least-squares sum of corresponding points.

A quaternion q̇ can be represented in the complex number notation

    q̇ = q_0 + i q_x + j q_y + k q_z    (6)

with real part q_0 and three imaginary parts q_x, q_y, q_z. For the imaginary units i, j, k the following equations hold:

    i² = −1,  j² = −1,  k² = −1,
    ij = k,   jk = i,   ki = j,    (7)
    ji = −k,  kj = −i,  ik = −j.

With (7) the multiplication of quaternions ṙ and q̇ can be defined in terms of the products of their components,

    ṙ q̇ = (r_0 q_0 − r_x q_x − r_y q_y − r_z q_z)
        + i (r_0 q_x + r_x q_0 + r_y q_z − r_z q_y)
        + j (r_0 q_y − r_x q_z + r_y q_0 + r_z q_x)
        + k (r_0 q_z + r_x q_y − r_y q_x + r_z q_0).

In general ṙ q̇ ≠ q̇ ṙ. The dot product of two quaternions is the sum of the products of corresponding components:

    ṗ · q̇ = p_0 q_0 + p_x q_x + p_y q_y + p_z q_z.    (8)

The square of the magnitude of a quaternion is the dot product of the quaternion with itself:

    |q̇|² = q̇ · q̇.    (9)

A unit quaternion is a quaternion whose magnitude equals 1. The conjugate of a quaternion negates its imaginary parts:

    q̇* = q_0 − i q_x − j q_y − k q_z.    (10)

Vectors can be represented by purely imaginary quaternions. If r = (x, y, z)^T, we can use the quaternion

    ṙ = 0 + i x + j y + k z.    (11)

Scalars can be similarly represented by using real quaternions. Using the fact that only rotations preserve dot products and cross products, we can represent a rotation by a quaternion if we can find a way of mapping purely imaginary quaternions (which represent vectors) into purely imaginary quaternions in such a way that dot and cross products are preserved. It can be shown that the composite product

    ṙ′ = q̇ ṙ q̇*,    (12)

where q̇ is a unit quaternion, transforms the imaginary quaternion ṙ into an imaginary quaternion ṙ′ and preserves the dot and cross products between ṙ and a second imaginary quaternion ṙ_2. Since

    (−q̇) ṙ (−q̇*) = q̇ ṙ q̇*,    (13)

−q̇ represents the same rotation as q̇. It is straightforward to verify that the composition of rotations corresponds to multiplication of quaternions:

    ṙ″ = ṗ ṙ′ ṗ* = ṗ (q̇ ṙ q̇*) ṗ* = (ṗ q̇) ṙ (ṗ q̇)*.

The overall rotation is represented by the unit quaternion ṗ q̇. It may be of interest to note that it takes fewer arithmetic operations to multiply two quaternions than to multiply two 3×3 matrices. Also, since calculations are not carried out with infinite precision on a computer, the product of many orthonormal matrices may no longer be orthonormal, just as the product of many unit quaternions may no longer be a unit quaternion. However, it is trivial to find the nearest unit quaternion, whereas it is quite difficult to find the nearest orthonormal matrix. Unit quaternions are closely related to the geometrically intuitive axis-and-angle notation. A rotation by an angle θ about the axis defined by the unit vector e = (e_x, e_y, e_z)^T can be represented by the unit quaternion

    q̇ = cos(θ/2) + sin(θ/2) (i e_x + j e_y + k e_z).    (14)
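The quaternion algebra of equations (6)-(14) is compact enough to implement directly. A minimal sketch, storing a quaternion as (q_0, q_x, q_y, q_z) (function names are my own):

```python
import numpy as np

def qmul(r, q):
    """Quaternion product r q, following the multiplication table (7)."""
    r0, rx, ry, rz = r
    q0, qx, qy, qz = q
    return np.array([r0*q0 - rx*qx - ry*qy - rz*qz,
                     r0*qx + rx*q0 + ry*qz - rz*qy,
                     r0*qy - rx*qz + ry*q0 + rz*qx,
                     r0*qz + rx*qy - ry*qx + rz*q0])

def qconj(q):
    """Conjugate (10): negate the imaginary parts."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rotate(q, v):
    """Rotate vector v by unit quaternion q via r' = q r q* (12)."""
    r = np.array([0.0, v[0], v[1], v[2]])   # purely imaginary quaternion (11)
    return qmul(qmul(q, r), qconj(q))[1:]

def axis_angle_quat(axis, theta):
    """Unit quaternion for a rotation by theta about a unit axis (14)."""
    e = np.asarray(axis, float) / np.linalg.norm(axis)
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * e))
```

For example, the quaternion for a 90° rotation about the z-axis maps the x-axis onto the y-axis, and multiplying a unit quaternion by its conjugate gives the identity quaternion (1, 0, 0, 0).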

The relation of a unit quaternion q̇ to the familiar orthonormal rotation matrix R is given by

    R = \begin{pmatrix}
    q_0^2 + q_x^2 - q_y^2 - q_z^2 & 2(q_x q_y - q_0 q_z) & 2(q_x q_z + q_0 q_y) \\
    2(q_y q_x + q_0 q_z) & q_0^2 - q_x^2 + q_y^2 - q_z^2 & 2(q_y q_z - q_0 q_x) \\
    2(q_z q_x - q_0 q_y) & 2(q_z q_y + q_0 q_x) & q_0^2 - q_x^2 - q_y^2 + q_z^2
    \end{pmatrix}.    (15)

Finally we want to note that a 3-D rigid motion (rotation + translation) can also be represented by a special dual quaternion. A dual quaternion (or dual number quaternion) q̂ consists of two quaternions q̇ and ṡ such that

    q̂ = q̇ + ε ṡ,    (16)

where a special multiplication rule for ε is defined by ε² = 0. In order to represent a 3-D rigid motion the following two constraints have to be satisfied:

    q̇ · q̇ = 1  and  q̇ · ṡ = 0.    (17)

6 Typical similarity metrics

The problem of aligning two data sets can be generally defined in the following way. Given two data sets u(x_i) and v(x′_i), describing parts of the same object at x_i in the u-frame respectively x′_i in the v-frame, i.e.

    v(x′_i) = F(u(x_i)),    (18)

where F is the transfer function from u to v, we want to find the pose transformation T from x′_i to x_i, i.e.

    x_i = T(x′_i).    (19)

Combining (18) and (19), this is equivalent to resolving

    v(x′_i) = F(u(T(x′_i)))  ∀ x′_i.    (20)

In general it is difficult to determine the transfer function F.

6.1 Correlation

However, if the effects of F can be neglected, e.g. if u, v are intensity images from different views with negligible illumination variations between u and v, we get

    v(x′_i) = u(T(x′_i))  ∀ x′_i.    (21)

Due to noise, different occlusions and partial overlapping, a solution T valid for all x′_i cannot be found. Therefore it is common to search for the transformation T that minimizes

    E(T) = Σ_i [v(x′_i) − u(T(x′_i))]²    (22)
         = Σ_i [v(x′_i)]² − 2 Σ_i v(x′_i) u(T(x′_i)) + Σ_i [u(T(x′_i))]².    (23)

Such a function, which determines the ideal model parameters as the arguments that maximize or minimize the function, is often called a cost function or objective function. Since the first term in (23) is independent of T, minimizing E(T) is equivalent to maximizing

    C(T) = \frac{\sum_i v(x'_i)\, u(T(x'_i))}{\sqrt{\sum_i [u(T(x'_i))]^2}},    (24)

called the normalized cross-correlation function [5]. A related measure, which is advantageous when an absolute measure is needed, is the correlation coefficient

    C(T) = \frac{\sum_i [v(x'_i) - \mu_v]\,[u(T(x'_i)) - \mu_u]}{\sqrt{\sum_i [v(x'_i) - \mu_v]^2}\,\sqrt{\sum_i [u(T(x'_i)) - \mu_u]^2}},    (25)

where µ_u and µ_v are the mean values of u and v. The denominator is given by the product of the standard deviations of u and v.

6.2 Mutual information

In the case where the effects of F must be taken into account, e.g. if u represents the normals at each point of a 3-D scan and v the corresponding intensity image, more sophisticated methods have to be applied. u and v can be interpreted as random variables with probability distributions P_u and P_v. Intuitively, if u and v are well aligned, the randomness of u given knowledge of v is maximally reduced. In statistics this intuition can be formalized as follows. Firstly, the randomness of a random variable X is measured by its entropy, defined by

    H(X) ≡ −E_X{log(P(X))}.    (26)

Thereby E_Z{Z} is the expected value of random variable Z and P(X) the probability distribution of X. Secondly, the randomness of a random variable Y given knowledge of a random variable X is measured by the conditional entropy

    H(Y | X) ≡ −E_X{E_Y{log P(Y | X)}}.    (27)

The intuition stated above can then be formulated as maximizing

    I(v(x′_i), u(T(x′_i))) = H(u(T(x′_i))) − H(u(T(x′_i)) | v(x′_i)),    (28)

called the mutual information of u and v [46, 47].

6.3 Least-squares sum of corresponding points

In the special case where the data sets are two point clouds {p_i} and {p′_i} of an object measured by a 3-D sensor from two different viewpoints, we describe the standard similarity metric in more detail. For every pair of corresponding points p_i and p′_i we want to find the rotation R and translation t such that

    p′_i = R p_i + t    (29)

with p_i = (x_i, y_i, z_i)^T. For convenience we write (29) as

    p′_i = T(p_i)    (30)

with the transformation T defined by

    T(z) = Rz + t.    (31)

The transformation T has six free parameters (e.g. three angles and three translation parameters).
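The correlation coefficient (25) and the mutual information (28) introduced above can both be estimated in a few lines. A minimal numpy sketch; the function names, the bin count and the joint-histogram estimator of the mutual information are my own choices, not prescribed by the paper:

```python
import numpy as np

def correlation_coefficient(u, v):
    """Correlation coefficient (25): covariance of u and v divided by the
    product of their standard deviations; +1/-1 for perfectly
    (anti-)related samples, invariant to affine intensity changes."""
    u = np.asarray(u, float).ravel() - np.mean(u)
    v = np.asarray(v, float).ravel() - np.mean(v)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def mutual_information(u, v, bins=8):
    """Histogram estimate of the mutual information (28),
    I(u; v) = H(u) - H(u | v), in nats."""
    pxy, _, _ = np.histogram2d(np.ravel(u), np.ravel(v), bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                      # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))
```

For two binary images that are identical, the estimated mutual information equals the entropy log 2; for a signal compared against an affine copy of itself, the correlation coefficient is exactly 1.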
With at least three point correspondences of type (30) these six parameters are uniquely determined. However, due to noise in the measurements, the transformation calculated from three arbitrary point correspondences is not the best one. To find the best transformation, usually the least-squares solution to the overdetermined system of equations (30) is searched for,

    Σ_i |p′_i − T(p_i)|² → minimum.    (32)

We want to give a statistical explanation of (32) as maximum likelihood model selection [46]. Our point sets {p_i} and {p′_i} can be interpreted as a sample a = [..., {p_i, p′_i}, ...] of the vector random variable X and the functionally dependent vector random variable X′ = T(X) + η, which is assumed to be perturbed by Gaussian measurement noise described by η. The likelihood of the sample is the conditional probability of the sample given the random variables X and X′ and a model X̂′ = T(X) of their functional dependence, i.e.

    p(a | X̂′ = T(X)) = p({X, X′}_1 = {p_1, p′_1}, {X, X′}_2 = {p_2, p′_2}, ... | X̂′ = T(X))    (33)
                     = Π_{(p_i, p′_i) ∈ a} p(X′ = p′_i | X̂′ = T(p_i)),    (34)

where we have assumed in (34) that the trials of the sample are independent. With the help of Bayes' law we can find the most likely model given the sample,

    p(X̂′ = T(X) | a) = p(a | X̂′ = T(X)) p(X̂′ = T(X)) / p(a),    (35)

by maximizing p(X̂′ = T(X) | a) with respect to the parameters of T. The unconditional probability of the sample p(a) can be arbitrary, since the sample is the same for all models. It is the assumption made by maximum likelihood model selection that the prior probability of the model p(X̂′ = T(X)) is the same for all models that are evaluated, i.e. p(X̂′ = T(X)) is constant. Therefore maximizing p(X̂′ = T(X) | a) is equivalent to maximizing p(a | X̂′ = T(X)). To simplify the maximization of p(a | X̂′ = T(X)) from (34), usually the logarithm is taken, which does not influence the position of the maximum since the logarithm is a monotonic function. So instead of (34) we maximize

    log p(a | X̂′ = T(X)) = Σ_{(p_i, p′_i) ∈ a} log p(X′ = p′_i | X̂′ = T(p_i)).    (36)

We assume that the differences between the predicted points p̂′_i = T(p_i) and the actual trials of X′ are Gaussian,

    p(X′ = p′_i | X̂′ = T(p_i)) = g_{ψ_i}(p′_i − T(p_i))    (37)

with

    g_{ψ_i}(z) = \frac{1}{(2π)^{3/2} |ψ_i|^{1/2}} exp(−½ zᵀ ψ_i⁻¹ z),    (38)

where ψ_i is the covariance matrix of the random vector η_i = X′ − X̂′ and |ψ_i| its determinant.

Using (37) in the expression for log p(a | X̂′ = T(X)) from (36), we get

    log p(a | X̂′ = T(X)) = −½ Σ_{(p_i, p′_i) ∈ a} log((2π)³ |ψ_i|)
                           −½ Σ_{(p_i, p′_i) ∈ a} (p′_i − T(p_i))ᵀ ψ_i⁻¹ (p′_i − T(p_i)).    (39)

Therefore maximizing p(a | X̂′ = T(X)) is equivalent to minimizing

    Σ_{(p_i, p′_i) ∈ a} D_{ψ_i}(p′_i − T(p_i)),    (40)

where

    D_ψ(z) = zᵀ ψ⁻¹ z    (41)

is the so-called squared Mahalanobis distance. Assuming that all ψ_i are diagonal with equal variances on the diagonal (σ_i² = ψ_i,11 = ψ_i,22 = ψ_i,33), we get from (40) the standard weighted least-squares problem, also known as the chi-square fitting problem [32],

    χ² = Σ_{(p_i, p′_i) ∈ a} |p′_i − T(p_i)|² / σ_i².    (42)

Assuming that all σ_i are equal, the simple least-squares problem (32) follows directly from (42).

Let us continue with the solution to the weighted least-squares problem (42). Using the explicit form of the transformation T from (31), we have to minimize

    Σ_i (1/σ_i²) |p′_i − R p_i − t|².    (43)

If Euler angles are used for the representation of R, the rotation is given by the following matrix product:

    R = \begin{pmatrix} \cos α & −\sin α & 0 \\ \sin α & \cos α & 0 \\ 0 & 0 & 1 \end{pmatrix}
        \begin{pmatrix} \cos β & 0 & \sin β \\ 0 & 1 & 0 \\ −\sin β & 0 & \cos β \end{pmatrix}
        \begin{pmatrix} \cos γ & −\sin γ & 0 \\ \sin γ & \cos γ & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (44)

Therefore the parameters α, β, γ are not quadratic in (43), and the solution of the minimization problem cannot be reduced to a simple system of linear equations (by calculating the partial derivatives with respect to the parameters and setting them to zero, i.e. solving the so-called system of normal equations). Such least-squares problems are called nonlinear least-squares problems. Of course the problem can be solved by standard optimization techniques like gradient descent, conjugate gradients, Newton's method or the Levenberg-Marquardt algorithm (which has become the standard technique for nonlinear least-squares problems [32]). Amazingly, however, there are closed-form solutions to this problem that are significantly faster (2 to 5 times, depending on the number of point correspondences) than iterative approaches [1].
The known solutions are based on singular value decomposition² (SVD) [1, 22, 45],

² The SVD method used here should not be confused with the standard SVD method that is used in place of solving the system of normal equations when the least-squares problem is linear. Here a completely different matrix is decomposed.

quaternions [20, 14], dual number quaternions [48, 54], and orthonormal matrices / polar decomposition [21, 22]. Let us present the SVD-based method (refer to [32] for a motivation of the term SVD) in some detail. To simplify the problem in (43) it is convenient to compute the centroid of each point set and to translate the point clouds so that the centroids coincide at the origin. In this way, in order to get the parameters of the rotation, we only have to minimize³

\sum \frac{1}{\sigma^2} \|p' - Rp\|^2.    (45)

The translation can then be calculated as the difference between the centroids [20]. Expanding the square in (45) results in

\sum \frac{1}{\sigma^2} \left( \|p'\|^2 - 2\, p'^T R p + \|Rp\|^2 \right).    (46)

The first term in (46) does not depend on R. Since a rotation is an operation that preserves lengths, i.e. \|Rp\| = \|p\|, only the second term depends on R. Therefore minimizing (46) is equivalent to maximizing

\sum \frac{1}{\sigma^2} p'^T R p.    (47)

Using the relation a^T b = \mathrm{trace}(a b^T) for two vectors a and b, and the fact that the trace of a matrix product is cyclic, \mathrm{trace}(AB) = \mathrm{trace}(BA) for matrices A and B, (47) can be rewritten as

\mathrm{trace}(R^T K),    (48)

where the so-called correlation matrix K is defined by

K = \sum \frac{1}{\sigma^2} p' p^T.    (49)

Solutions to the maximization problem in (48) can be found based on singular value decomposition (SVD), polar decomposition or quaternions. Based on singular value decomposition, we decompose the correlation matrix K into the form

K = U \Lambda V^T,    (50)

where U and V are 3×3 orthonormal matrices and \Lambda is a 3×3 diagonal matrix with nonnegative elements, the so-called singular values. An algorithm for performing such a decomposition, which is a standard task in numerical mathematics, can be found in [32]. Now the matrix

R_{max} = U V^T    (51)

maximizes (48), due to the following lemma:

Lemma: For any positive definite and symmetric matrix A and any orthonormal matrix B,

\mathrm{trace}(A) \geq \mathrm{trace}(BA).    (52)

³ From now on p and p' are relative to the centroids of the point sets.

A simple proof of this lemma based on the Schwarz inequality can be found in [1]. In our case A is given by R_{max}^T K (compare (48)), which fulfills the conditions of the lemma since

R_{max}^T K = V U^T U \Lambda V^T    (53)
            = V \Lambda V^T    (54)

is symmetric and positive definite. Thus, due to the lemma, for any 3×3 orthonormal matrix R,

\mathrm{trace}(R_{max}^T K) \geq \mathrm{trace}(R R_{max}^T K),    (55)

so that R_{max} really maximizes (48)⁴.

7 Search strategies

7.1 Hough method

Before presenting the application of the Hough method to the registration of 3-D point sets, we describe its basic idea in a general framework. Let \{x_i\}_{i=1...N}, x_i \in R^l, be a data set and \{q_j\}_{j=1...m}, q_j \in R, a parameter set, related by a function f: R^m \times R^l \to R^w with

f(q_1, ..., q_m, x_i) = 0.    (56)

In addition let us suppose that all parameters q_j are uniquely determined by a subset of k samples of \{x_i\}_{i=1...N}:

\{x_i\}_{i=1...k} \to q_1, ..., q_m.    (57)

We search for the parameters \{q_j\}_{j=1...m} which satisfy (56) as well as possible⁵ for all x_i. For the Hough method the following steps have to be performed: for each subset of k samples of \{x_i\}_{i=1...N} (there are N!/((N-k)!\,k!) possibilities) the parameters q_1, ..., q_m are calculated, and at the corresponding position (q_1, ..., q_m) in an m-dimensional accumulation table (Hough table) a counter is incremented by one. In this way every subset of k samples of \{x_i\}_{i=1...N} resulting in the same parameter set q_1, ..., q_m contributes to the same position counter in the table. Therefore the position whose counter has the highest score corresponds to the parameter set that is in best accordance with the given data set \{x_i\}_{i=1...N}.

To illustrate the Hough method we give a simple example: let \{x_i\}_{i=1...N} be a set of 2-D points x = (x, y)^T from an intensity image which are the outcome of a feature detection algorithm. Assume we know that some of the x_i describe circles in the original intensity image, but we do not know which ones. We want to find the centers of the circles and their radii. Thus the searched parameters q_j, j = 1, ..., m = 3, are the center coordinates x_c and y_c and the radius R.
For each point on a circle the following equation holds:

(x - x_c)^2 + (y - y_c)^2 - R^2 = 0.    (58)

Therefore (58) determines the function f from (56). Since three points uniquely define a circle, k = 3 in the general description above. If we now apply the Hough method, we find the circles present in the data from the positions of the accumulations in the 3-dimensional Hough table.

⁴ Since orthonormal matrices build a group (the so-called SO(3)) [43], R R_{max}^T represents an arbitrary orthonormal matrix if R also represents an arbitrary orthonormal matrix.
⁵ The cost function is defined by the method itself.
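A minimal Python sketch of this circle example (function names, the quantization step and the test data are illustrative choices): every 3-point subset votes for the circle it determines in a quantized (x_c, y_c, R) table, and the highest-scoring cell is returned.

```python
import numpy as np
from itertools import combinations
from collections import Counter

def circle_from_3pts(p1, p2, p3):
    """Center and radius of the circle through three points (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    # Perpendicular-bisector equations: (b - a).c = (|b|^2 - |a|^2)/2, etc.
    A = np.array([[bx - ax, by - ay], [cx - ax, cy - ay]])
    rhs = 0.5 * np.array([bx**2 - ax**2 + by**2 - ay**2,
                          cx**2 - ax**2 + cy**2 - ay**2])
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    xc, yc = np.linalg.solve(A, rhs)
    return xc, yc, np.hypot(ax - xc, ay - yc)

def hough_circle(points, step=0.5):
    """Vote every 3-subset's circle into a quantized (xc, yc, R) Hough table."""
    table = Counter()
    for triple in combinations(points, 3):
        circle = circle_from_3pts(*triple)
        if circle is not None:
            table[tuple(round(v / step) * step for v in circle)] += 1
    return table.most_common(1)[0][0]   # cell with the highest score
```

Subsets drawn from the same circle all fall into one cell, while mixed subsets scatter across the table, so the winning cell recovers the circle even in the presence of spurious points.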

We now apply the Hough method to the registration of two 3-D point sets \{p_i\} and \{p'_i\} of extracted point features. \{x_i\}_{i=1...N} is in this case the set of tuples (p, p') of all combinations of points from the first set with points from the second set. The searched parameters are the m = 6 parameters of the rigid transformation (R, t) between both point sets, so that f from (56) is defined by

Rp + t - p' = 0.    (59)

Since three non-collinear points uniquely define the 6 parameters of R and t, k = 3 in the general description. Applying the Hough method, all transformations calculated from correct point correspondences result in the same transformation, while all other transformations are distributed more or less randomly in the parameter space. As above, the points of accumulation give us the right parameter set. Note that in the case where the Hough method is used to find the parameters of a transformation, the method is also often called clustering [37].

7.2 Correlation

In a previous section we already introduced the similarity metric correlation (24) that has to be maximized with respect to the transformation T,

C(T) = \frac{\sum_x v(x)\, u'(T(x))}{\sqrt{\sum_x [u'(T(x))]^2}}.

One way to find the correlation maximum is to compute C(T) for all possible transformations T. Since the number of possibilities may be very large, a complete search of the parameter space may not be feasible. For example, if T is a 3-D rigid transformation (rotation + translation) and each of the d = 6 dimensions of the parameter space is divided into M quantization steps, the expression in (24) has to be computed M^d times. Thus the complexity of such a calculation is O(M^d N), where N is the number of points in each data set. M determines the accuracy of the approach but can be reduced, when using the outcome of a previous feature extraction process, by only considering the transformations that are in accordance with the extracted features. In addition, in order not to make the complexity prohibitively large, N and d have to be small. N can be reduced by only using data within a small window, usually called a template.
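A minimal sketch of such a windowed correlation search (translation-only for brevity; the helper names are illustrative). The window around each feature of the first image is scored with the normalized expression (24) against the windows around candidate features of the second image:

```python
import numpy as np

def correlation_score(v, u):
    """The normalized correlation of (24) for one candidate window u'."""
    return float(np.sum(v * u) / max(np.sqrt(np.sum(u * u)), 1e-12))

def best_match(img1, img2, feats1, feats2, half=2):
    """Correlate the window around every feature of img1 with the windows
    around all features of img2; keep the best-scoring partner."""
    def window(img, f):
        y, x = f
        return img[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    matches = {}
    for f1 in feats1:
        v = window(img1, f1)
        matches[f1] = max(feats2, key=lambda f2: correlation_score(v, window(img2, f2)))
    return matches
```

The cost is exactly the M_1 M_2 N_w pattern discussed above (with the rotation dimension omitted here): every window pair costs one pass over the N_w window pixels.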
Let us present a typical example: given two intensity images I_1 and I_2 describing parts of the same object (d = 3: 2 translations, 1 rotation), in which M_1 and M_2 features (e.g. corners) have been respectively extracted, we want to match them by finding corresponding features in I_1 and I_2. Using point features limits the search in the translation parameter space: only translations between extracted features are allowed. The neighborhood within a window centered at each extracted feature F_{1i} in I_1 is correlated with the neighborhood at every extracted feature F_{2j} in I_2, i.e. expression (24) has to be calculated M_1 M_2 M times, where M is the number of quantization steps in the rotation dimension. Using a window of size n_x n_y = N_w, the complexity of the correlation calculation is then O(M_1 M_2 M N_w).

7.3 Relaxation

Relaxation is a technique to resolve ambiguities between match candidates of two data sets. These candidates are the outcomes of a feature extraction process. There are ambiguities between the match candidates since in general a given feature attribute does not uniquely determine a candidate. Even after using a correlation technique for simple point features as

described above, a feature point in the first data set may be paired with several feature points in the second data set. To overcome these ambiguities the relaxation technique makes additional use, for a given feature, of relations (to features in the neighborhood) which are more or less preserved under the considered transformation. For point features these relations are typically the distances between two features in the same data set [52]. A further possibility could be the angles formed by three features. Since distances between features in intensity images (taken from 3-D objects) are distorted by perspective projection, these relations are only invariant for point features close to each other.

For the relaxation technique a similarity measure for each pair of match candidates is defined based on the following three criteria:

1. the goodness of the considered pair of match candidates (e.g. provided by correlation scores or the difference between feature attributes),
2. the goodness of all possible pairs of match candidates in the respective neighborhoods,
3. the agreement of the relations between the considered feature and the features in its neighborhood in the first data set with those in the second data set.

These similarity measures are iteratively changed by updating the relations defined in the third criterion until they converge. The relations are updated by successively canceling pairs of match candidates which have not reached a sufficiently large similarity measure in the previous iteration.

Let us illustrate the relaxation approach by continuing the example presented in the previous paragraph: the correlation provides, for each possible pair of match candidates (F_{1i}, F_{2j}), where F_{1i} and F_{2j} are extracted features in I_1 and I_2 respectively, a measure of goodness c(F_{1i}, F_{2j}) mentioned in criteria 1 and 2 above. The similarity measure S can be expressed as

S(F_{1i}, F_{2j}) = c(F_{1i}, F_{2j}) \sum_{F_{2l} \in \Omega_{F_{2j}}} \sum_{F_{1k} \in \Omega_{F_{1i}}} c(F_{1k}, F_{2l})\, \delta(F_{1i}, F_{2j}, F_{1k}, F_{2l}),    (60)

where \Omega_{F_{1i}} and \Omega_{F_{2j}} are the neighborhoods around F_{1i} and F_{2j} respectively, and \delta(\cdot)
is a function describing the agreement of the relations between (F_{1i}, F_{1k}) and (F_{2j}, F_{2l}). During the updating process some of the function values are set to zero, enabling the recomputation of a new similarity measure, until all correspondences between match candidates are fixed.

7.4 Indexing schemes

In general, indexing schemes precompute invariant feature values (e.g. principal curvatures, point signatures) in a data set and hash them into a look-up table (called a hash table) with references to the corresponding feature positions [10]. In order to match two data sets \{x_i\} and \{x'_i\} the following steps are performed: firstly, the extracted invariant feature values in \{x_i\} are hashed into a table. Secondly, for each point feature in \{x'_i\} with feature values (v_1, ..., v_n) we find the possible corresponding positions in \{x_i\} (match candidates) by taking the points at position (v_1, ..., v_n) in the hash table. Thirdly, to resolve ambiguities between the match candidates, a relaxation, Hough or prediction-verification method is applied. The advantage of the hash table is the access to possible match candidates in constant time.

A more sophisticated indexing scheme is geometric indexing [53]. In this case the indexing scheme is based on the geometrical relationships between extracted features. For each extracted point feature in the first data set \{x_i\} a basis for a coordinate frame is defined with the help

of further extracted point features (e.g. in 3-D space two other point features are necessary). Then the coordinates of all other point features are calculated with respect to this basis and are hashed into a look-up table which stores the k-tuple of features defining the basis. This is done for all combinations of point features defining a basis (e.g. for M extracted features in the first data set there are M(M-1)(M-2) possibilities to define a basis in 3-D space, and for each basis M feature coordinates have to be determined). In the second data set point features are similarly extracted and an arbitrary basis B_1, formed by a k-tuple of features, is chosen. Then all features F \in \{x'_i\} are computed with respect to this basis, resulting in the coordinates (x_1, ..., x_n). For each coordinate vector we find, at position (x_1, ..., x_n) in the hash table, several bases from the first data set which can possibly correspond to the basis B_1. Each time a basis occurs it is voted for by incrementing a counter. After processing all coordinate vectors, the basis with the highest vote is considered as the basis corresponding to B_1. This hypothesis can be confirmed with a prediction-verification scheme working on the whole data set. If the verification fails, another basis B_2 is chosen and tested in the same way.

7.5 Prediction-verification

The principle of a prediction-verification scheme is quite similar to the correlation approach: just calculate a given similarity metric for a certain transformation. However, there are some differences in practice:

In the case of correlation all possible transformations are tested; in prediction-verification only the transformations resulting from a preprocessing step (e.g. from an indexing scheme or a feature-based approach) are verified.

In the case of correlation the transformation reaching an extremal value of the similarity metric (e.g. the minimal value of (22) or the maximal value of (24)) is accepted as the correct one; in prediction-verification the first transformation resulting in a value of the similarity metric better than a given threshold is considered as the right one.
In the case of correlation, in order to reduce the complexity, the calculation is usually only applied to extracted features; in prediction-verification the transformation of the whole data set provides the most reliable verification.

Some examples of prediction-verification schemes can be found in [42, 10, 15].

7.6 Tree + graph matching

After a feature extraction preprocessing step a data set can be described by a tree (or, more generally, a graph), where the nodes are defined by the features and the links by their geometrical relations. The matching of two data sets is then reduced to the mapping between two graphs. This search process is often called subgraph isomorphism. Some examples can be found in [51, 9].

7.7 Standard optimization techniques

The previously described search strategies are based on extracted features whenever they are used in practical applications; otherwise their complexity would be too high. By contrast, standard optimization techniques try to find an extremum of a given similarity metric taking the whole data sets into account. An extremum can be either global or local. Although in practice there is as yet no guarantee of finding the global extremum, a few approaches deal with this problem: for example mean field theory [38], genetic algorithms [6] and simulated annealing [35, 32]. However, if a good estimate of the transformation between two data sets is known,

there are several standard approaches to find the global extremum. A good estimate means in this context that the local extremum to which the method converges is in fact the global extremum. These techniques are usually based on gradient information of the cost function. Typical examples are gradient descent (also called steepest descent), conjugate gradients and the Levenberg-Marquardt algorithm [32].

7.8 ICP algorithm

In subsection 6.3 we mentioned that it is straightforward to find the best rigid transformation between two point clouds by minimizing (32),

\sum \|p' - T(p)\|^2,    (61)

where p and p' are corresponding points. However, the point correspondences are not known in advance. In the case where the given data sets are already well aligned to each other (this should be possible with one of the feature-based methods described above), the following heuristic assumption may be reasonable: corresponding points are the closest points between the two given data sets. In this way (61) can be directly derived from the correlation expression (22) by setting

v(x) = \min_{p_k \in \{p\}} \|p_k - x\|,    (62)

u'(T(x)) = \min_{p'_j \in \{p'\}} \|p'_j - T(x)\|    (63)

and by restricting the sum over all x in (22) to the data set \{p\}. Thus we get v(x) = 0, and (22) becomes

\sum \left( \min_{p'_j \in \{p'\}} \|p'_j - T(x)\| \right)^2.    (64)

By defining the index i for x_i \in \{p\} so that

p'_i = \arg\min_{p'_j \in \{p'\}} \|p'_j - T(p_i)\|,    (65)

we get the expression (61). Since the transformation T is not known in advance but is assumed to be small (the data sets are supposed to be well aligned), T is taken as the identity transformation. In this way corresponding points are defined as

p'_i = \arg\min_{p'_j \in \{p'\}} \|p'_j - p_i\|.    (66)

After calculating the corresponding points and the resulting transformation, it can be expected that applying the transformation brings the data sets closer to each other. Thus it seems reasonable to iterate this procedure (hence the term ICP: Iterative Closest Points [3]) until the computed transformation converges:

1. find closest points according to (66),

2.
calculate the rotation R and translation t that minimize the least-squares sum (61) over the corresponding points,

3. apply the transformation to all points in the first data set.
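The three steps can be sketched in numpy, with step 2 solved by the SVD method of (49)-(51) (a minimal sketch with brute-force closest points; the determinant factor guards against a reflection and is standard practice, although it is not part of the lemma above):

```python
import numpy as np

def best_rigid(P, Q):
    """SVD solution (49)-(51): R, t minimizing sum ||q_i - (R p_i + t)||^2
    for known correspondences (row i of P matches row i of Q)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    K = (Q - cq).T @ (P - cp)                       # correlation matrix, cf. (49)
    U, _, Vt = np.linalg.svd(K)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # reflection guard
    R = U @ D @ Vt                                  # R_max = U V^T, cf. (51)
    return R, cq - R @ cp

def icp(P, Q, iters=20):
    """Iterate: closest points (66) -> least-squares transform (61) -> apply."""
    R_tot, t_tot = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # step 1: brute-force closest point in Q for every point of P, O(n n')
        idx = np.argmin(((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1), axis=1)
        # step 2: transform minimizing the least-squares sum over the pairs
        R, t = best_rigid(P, Q[idx])
        # step 3: apply the transformation to the first data set
        P = P @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t     # compose with previous steps
    return R_tot, t_tot
```

With known correspondences, best_rigid recovers the transformation in one step; the ICP loop is only needed because the closest-point pairing of step 1 must be re-estimated as the sets move together.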

However, if the estimation of corresponding points during the initialization step is too bad, the algorithm will not converge to the right transformation⁶ but will get stuck in a local minimum. Nevertheless this algorithm has become the standard for the precise registration of two data sets; it is described in detail in [3, 54] and used as the basis of more sophisticated algorithms in [44, 2, 15, 28, 42, 40]. Besl and McKay proposed an accelerated version of the ICP algorithm [3] using linear and quadratic extrapolation of the registration parameters during the iterations. In this way they reduced the number of iterations until convergence of the transformation by a factor of 2 to 3.

In (66) corresponding points are defined as the points with minimal distances between points of the two given data sets. Although this definition of corresponding points is quite often used [3, 54], another approach is possible: since a point set defines a surface (of course, not uniquely), the point corresponding to a point p_0 \in \{p\} can also be defined as the point p_{s0} (on the surface S defined by the points \{p'\}) having minimal distance to p_0, i.e.

p_{s0} = \arg\min_{p_s \in S} \|p_s - p_0\|.    (67)

This definition of corresponding points is used for example in [8, 44, 28]. Since the surface resulting from a given point set is not uniquely defined, a more precise definition of corresponding points has to be given for actual calculations. For example, Chen and Medioni [8] propose an iterative algorithm that takes 3 to 5 iterations to determine a closest point.

The worst-case cost of finding the closest point of p_0 \in \{p\} according to (66) is O(n'), where n' is the number of points in \{p'\}. Therefore the total cost of finding the closest points for all p \in \{p\} is O(n n'), where n is the number of points in \{p\}. There are several methods which can considerably speed up the search process, e.g. bucketing techniques (in 3-D, or in 2-D by projection), k-d trees (short for k-dimensional binary search trees; here k = 3) [54] or octree-splines [41].
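As an illustration of the k-d tree speed-up, here is a minimal exact 3-D nearest-neighbour sketch in Python (names are illustrative): the query descends to the leaf containing the query point and backtracks into the far subtree only when the splitting plane is closer than the best hit found so far.

```python
import numpy as np

def build_kdtree(pts, idx=None, depth=0):
    """k-d tree: split on coordinate (depth mod k) at the median point."""
    if idx is None:
        idx = list(range(len(pts)))
    if not idx:
        return None
    axis = depth % pts.shape[1]
    idx = sorted(idx, key=lambda i: pts[i][axis])
    mid = len(idx) // 2
    return {"i": idx[mid], "axis": axis,
            "left": build_kdtree(pts, idx[:mid], depth + 1),
            "right": build_kdtree(pts, idx[mid + 1:], depth + 1)}

def nearest(node, pts, p, best=None):
    """Exact nearest neighbour of p: returns (squared distance, point index)."""
    if node is None:
        return best
    d = float(((pts[node["i"]] - p) ** 2).sum())
    if best is None or d < best[0]:
        best = (d, node["i"])
    diff = p[node["axis"]] - pts[node["i"]][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, pts, p, best)
    if diff * diff < best[0]:   # splitting plane closer than best hit: check far side
        best = nearest(far, pts, p, best)
    return best
```

On reasonably distributed points the expected query cost drops from O(n') to roughly O(log n'), which is the speed-up exploited inside the ICP loop.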
The closest-point search can be further accelerated by exploiting a coarse-to-fine strategy during the iterations of the ICP: during the first iterations closest points are only determined for some coarsely sampled points; then a fine matching using more and more points follows [54], [44].

8 Robust registration

In order to improve the registration process with the ICP algorithm it is advisable to use the weight factors 1/\sigma^2 in the least-squares sum (43). For example, Turk and Levoy [44] use the dot product of calculated normals and a vector pointing to the light source as confidence values. However, the derivation of the least-squares sum used in the ICP algorithm is based on the assumption that deviations from the model (see (37)) are Gaussian distributed. This assumption is not valid in practice, for the following reasons:

Usually there are many points in one data set that should not have a correspondence in the other data set, due to different object occlusions in data sets from different views and/or since there is only a partial overlap between the data sets. Nevertheless, point correspondences (= closest points) are always found by the ICP algorithm.

Usually there are outliers in the data sets, frequently for unknown reasons. Perhaps there was a shock during the measurement process, or the camera was overdriven by light reflections.

⁶ As described, the algorithm always converges to the identity transformation. Here we mean that the combined transformation over all iterations does not converge to the right transformation.

Statisticians have developed various sorts of robust methods that can reduce the influence of outliers.

8.1 First simple approach

An intuitive method to eliminate outliers from the matching process of two data sets is to compute the mean value \mu of the distances d_i between corresponding points, and to remove from the least-squares sum (32) each point pair whose distance is larger than a given threshold D (e.g. D = \mu + 3\sigma, where \sigma is the standard deviation of the distances). Then one iteration of the ICP algorithm is performed. Before the next iteration the mean value \mu is calculated again on all points of the data sets, to prevent more and more points from being eliminated [54]. However, since the influence of corresponding points in the standard least-squares sum increases quadratically with their distance, outliers are weighted much more strongly than correct data points and can thus strongly distort the estimation of the computed transformation parameters (one faraway outlier can make all other outliers have small distances d_i, so that correct data points are rejected instead of the outliers).

8.2 M-estimators

A generalization of this approach is the concept of M-estimators. In this case, instead of minimizing

\sum \|p' - T(p)\|^2 = \sum d_i^2,    (68)

we replace the squared distance (L_2 estimator) by a function \rho of d_i,

\sum \rho(d_i),    (69)

where \rho is a symmetric, positive-definite function with a unique minimum at zero which increases less rapidly than the square. It can be shown that minimizing (69) is equivalent to minimizing the following iterated reweighted least-squares expression [52],

\sum \omega\left(d_i^{(k-1)}\right) d_i^2,    (70)

where \omega(x) = \frac{1}{x}\frac{d\rho}{dx}, the superscript (k) indicates the iteration number, and \omega(d_i^{(k-1)}) has to be recalculated after each iteration in order to be used in the next one. In this way the first method described above can be considered as the special case of M-estimator minimization obtained by taking \omega(x) = 1 for x \leq D and \omega(x) = 0 otherwise. Note that M-estimators suffer from the same problems as described above. Examples of several weight functions \omega can be found in [52, 34, 32].
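The iterated reweighted least-squares scheme (70) can be sketched on the simplest possible instance, a robust 1-D translation estimate, here with the Huber weight function (the constant c = 1.345 is the customary choice; function names are illustrative):

```python
import numpy as np

def huber_weight(d, c=1.345):
    """omega(d) = rho'(d)/d for the Huber rho: 1 inside [-c, c], c/|d| outside."""
    a = np.abs(d)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def irls_translation(P, Q, iters=20):
    """Iterated reweighted least squares (70) for a 1-D shift t with Q ~ P + t:
    weights from the previous iteration's residuals, then a weighted LSQ step."""
    t = 0.0
    for _ in range(iters):
        r = Q - P - t                           # residuals d_i^(k-1)
        w = huber_weight(r)                     # omega(d_i^(k-1))
        t = np.sum(w * (Q - P)) / np.sum(w)     # minimizes sum w_i (Q_i - P_i - t)^2
    return float(t)
```

With one gross outlier the plain least-squares estimate (the mean of the residuals) is pulled far from the true shift, while the reweighted estimate stays close to it; note, though, that the Huber weight only bounds the outlier's influence rather than rejecting it outright.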
8.3 Least Median of Squares

A really robust approach that overcomes the bad influence of outliers is the Least Median of Squares method (LMedS). This method is not affected by outliers up to a rate of 50% [28, 34]. In this approach the transformation parameters are estimated by minimizing

\mathrm{median}_i\left(d_i^2\right)    (71)

in the following way:

Since there is (probably) no straightforward formula for the median function, it is not possible to differentiate (71) and apply standard optimization techniques to find the minimizing transformation parameters. Thus the complete parameter space would have to be investigated. Since such an exhaustive search of the parameter space is not feasible, only transformations based on corresponding points (closest points) are considered: each combination of three point correspondences defines a possible transformation (in 3-D space). In practice the number of such combinations is so large that it must be reduced by randomly taking only a few of them. For each random combination the corresponding transformation is applied, and the median value of the distances \|p' - T(p)\| between all corresponding points p and p' is computed. The transformation for which the median is minimal is retained.

The minimal number m of randomly chosen combinations depends on the rate of outliers \epsilon in the data sets and on the desired reliability R that the LMedS solution is not corrupted by outliers. It can easily be seen that the reliability R is given by

R = 1 - \left[1 - (1-\epsilon)^D\right]^m,    (72)

where D is the minimal number of points necessary to define a unique transformation (in 3-D space: D = 3). Therefore the minimal number of combinations is

m = \frac{\log(1-R)}{\log\left[1 - (1-\epsilon)^D\right]}.    (73)

(For example, for R = 0.99 and \epsilon = 0.4 we get m = 19.)

8.4 Extended Kalman Filtering

Another robust technique, coming from signal processing theory, is Extended Kalman Filtering (EKF). This method makes use of a priori knowledge and provides a recursive solution to the least-squares problem. Since it would be quite time-consuming to explain this approach here, we refer the interested reader to [50] for a basic introduction and to [31] for an application to 3-D registration.

9 Registration of multiple point sets

Until now only the registration between two data sets has been considered. For practical applications, such as virtual reality, CAD processing or NC manufacturing, it is however of interest to match several range views together in order to reconstruct a 3-D model of the original object.
The techniques described below assume that:

- the different views of the object to be modeled overlap pairwise, i.e. each view has a common area with (at least) two other views;

- these views have already been transformed by a coarse registration, i.e. they lie nearly in the same coordinate system.

A first simple approach is to process the views sequentially [28]: firstly, one of the data sets is taken as reference. Secondly, one of its (2 or more) connected views is registered, so that the integrated model forms the new reference.