Ego-Motion Estimation on Range Images using High-Order Polynomial Expansion

Brian Okorn and Josh Harguess
Space and Naval Warfare Systems Center Pacific
San Diego, CA, USA
{brian.okorn,joshua.harguess}@navy.mil

Abstract

This paper presents two novel algorithms for estimating the (local and global) motion in a series of range images based on a polynomial expansion. The use of polynomial expansion has been quite successful in estimating optical flow in 2D imagery, but has not been used extensively in 3D or range imagery. In both methods, each range image is approximated by applying a high-order polynomial expansion to local neighborhoods within the range image. In the local motion algorithm, these approximations are then used to derive the translation or displacement estimation within the local neighborhoods from frame to frame within the series of range images (also known as range image flow). An iterative method for computing the local translations is presented. In the global motion algorithm, a global motion model framework is utilized to compute a global motion estimation based on the polynomial expansion of the range images. We evaluate the algorithms on several real-world range image sequences with promising results.

1. Introduction

Estimating motion in a video or series of images is an extremely important and difficult task in computer vision and has numerous applications, such as autonomous vehicle navigation [11]. Not much attention has been given to the estimation of motion between range images; however, estimating motion in video or image sequences, most commonly referred to as optical flow, has a long history of research. The two most common and prominent approaches to optical flow are known as local (or sparse) and global (or dense) methods. Local methods, such as the Lucas-Kanade method [14], estimate the motion of regions of interest between images using image registration and warping techniques.
In contrast, global methods, such as Horn and Schunck's method [13], compute a dense motion field by estimating the motion of each pixel between images. In this work, we are concerned mostly with the latter method, particularly the work by Farnebäck [8], which approximates the image using a polynomial expansion of local patches and then uses the polynomial expansion to estimate the global displacements between images for each pixel.

Estimating optical flow for range images, also known as range flow, is the main topic of this paper. Very little current research exists on this topic; however, it is an extremely important problem in a growing field. One of the few examples is the work of Spies, Jähne and Barron [18]. The problem of range flow differs from optical flow for electro-optical images in that every pixel value is a measure of distance instead of color or brightness. This difference makes it very difficult to apply existing and traditional two-dimensional (2D) optical flow methods to range flow. For instance, the brightness constancy constraint used by many optical flow methods is not valid for range images.

Therefore, our approach in this work is to extend a well-known global optical flow method of motion estimation based on polynomial expansion [8] to range images. We extend the method by using a high-order polynomial expansion to include terms in the z direction (range distance to the sensor). We then formulate an iterative method to solve for displacement in the x, y and z directions between range imagery. In addition, we perform the calculation at multiple scales for robustness and include displacement estimates from previous frames to improve the overall motion estimation. We also introduce a method to estimate the global motion based on motion model derivations similar to those presented by Dufaux and Moscheni [4] and Farnebäck and Westin [10]. Promising results are presented on several real-world range images.
2. Related Work

As previously mentioned, there is a vast amount of research in optical flow between color images, usually categorized into local and global methods. Many assumptions and constraints have been introduced in both approaches to deal with noise and smoothness of the solutions, such as the brightness constancy assumption, gradient constancy assumption and spatio-temporal smoothness constraints. This has led to a breeding ground of methods, such as Bruhn et al. [3], which attempt to combine the local and global methods to address the drawbacks and assumptions of each individual method. The most popular and successful methods are covered in more detail in two benchmarking papers on the subject by McCane et al. [15] and Baker et al. [1]. The two papers describe common databases, procedures and results on comparing more than 20 optical flow methods, with the Baker et al. paper being the most recent and complete.

Of particular interest to our research is the method introduced by Farnebäck in several papers that introduces the estimation of motion using the polynomial expansion [8, 9, 5, 7, 6, 19, 10, 17, 16]. In Farnebäck's work, the local neighborhoods of each image are approximated by a polynomial expansion and an analytical solution for the displacement between images is derived. From this derivation, a robust algorithm is designed to compute the displacement, and thus the motion field, between two or more images in a sequence. The method has proven to be very accurate and robust for 2D images and has been included as a default algorithm in the OpenCV library [2].

The research on estimating the motion between range images, or range flow, is much more sparse. The term range image flow first appears in the work of Gonzalez [12], where he formulates a physics-based approach to estimate the motion of the range sensor relative to its environment. Our method uses the same basic physical model of the range sensor as that used by Gonzalez. One of the most popular and earliest papers on this topic, by Spies et al. [18], notes the unique challenges of this problem and proposes a basic motion constraint equation on deformable surfaces. The constraint solutions are obtained in a total least squares framework and compute a dense range flow field from sparse solutions.
While the results are promising for the range images presented in their paper, the method is not directly transferable to other domains, such as dense range flow in a moving scene, due to the large displacements present. Therefore, the focus of our work is to extend the polynomial expansion method to range imagery to compute local and global dense range flow on sequences of real-world range images.

3. Range Image Polynomial Expansion

In our formulation, range image flow uses a polynomial expansion based approximation of the range image. This approximation is done using a set of quadratic basis functions, applied to the range data. The basis set is $\{1, x, y, x^2, y^2, xy\}$, which describes the variation of $z$, the range from the sensor, as $x$ and $y$, the azimuth and elevation with respect to the sensor, vary. In addition to the basis functions, we incorporate a notion of the accuracy or importance of this data through a certainty matrix, as well as a proximity-based weight over the neighborhood in the form of an applicability matrix, as is done in Farnebäck's Ph.D. dissertation [8]. For the certainty matrix, a value of 1 was given to all pixels populated with valid data and 0 for any pixel with error or no data. A Gaussian kernel was used as the applicability measure. The weights of the basis, $\{r_1, r_x, r_y, r_{x^2}, r_{y^2}, r_{xy}\}$, were calculated for Equation (1) as described by Farnebäck [8], and the values of these weights for a Velodyne® lidar range image can be seen in Figure 1.

Figure 1: Velodyne® polynomial expansion. (a) Original range image; (b) $r_1$ coefficient image; (c) $r_y$ coefficient image; (d) $r_x$ coefficient image; (e) $r_{y^2}$ coefficient image; (f) $r_{x^2}$ coefficient image; (g) $r_{xy}$ coefficient image.

$$f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c \tag{1}$$

Using this formulation, the range, $f(\mathbf{x})$, is described as a function of pixel location, with

$$A = \begin{bmatrix} r_{x^2} & r_{xy}/2 \\ r_{xy}/2 & r_{y^2} \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} r_x & r_y \end{bmatrix}^T, \qquad c = r_1.$$

4. Local Flow Displacement Calculation

The polynomial expansion in 2D images allows displacement to be calculated analytically, by looking at the effects of a displacement on the polynomial expansion coefficients.
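As a small numerical illustration of this idea, the sketch below fits the quadratic basis of Section 3 to a patch by weighted least squares, using a 0/1 certainty mask and a Gaussian applicability, and then recovers a known 2D shift from the change in the linear coefficients. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fit_quadratic`, `quadratic`), the 7 x 7 patch, the value of `sigma` and the synthetic test signal are all hypothetical choices made for the example.

```python
import numpy as np

def fit_quadratic(patch, certainty, sigma=1.5):
    """Weighted least-squares fit of r_1 + r_x x + r_y y + r_x2 x^2 + r_y2 y^2 + r_xy xy.

    Assumes a square patch with odd side length, centered on the pixel of interest.
    """
    radius = patch.shape[0] // 2
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)                       # local pixel offsets
    applicability = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    w = (applicability * certainty).ravel()            # per-pixel weight
    B = np.stack([np.ones_like(xx), xx, yy, xx**2, yy**2, xx * yy],
                 axis=-1).reshape(-1, 6)
    # Normal equations of the weighted least-squares problem.
    r = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * patch.ravel()))
    r1, rx, ry, rx2, ry2, rxy = r
    A = np.array([[rx2, rxy / 2.0], [rxy / 2.0, ry2]])
    b = np.array([rx, ry])
    return A, b, r1

def quadratic(A, b, c, xx, yy):
    """Evaluate x^T A x + b^T x + c on a grid."""
    return (A[0, 0] * xx**2 + 2.0 * A[0, 1] * xx * yy + A[1, 1] * yy**2
            + b[0] * xx + b[1] * yy + c)

# Synthetic signal (assumed for the example): an exact quadratic and a shifted copy.
A_true = np.array([[0.8, 0.1], [0.1, 0.5]])
b_true = np.array([0.3, -0.2])
c_true = 10.0
d_true = np.array([0.4, -0.25])

ax = np.arange(-3, 4, dtype=float)
xx, yy = np.meshgrid(ax, ax)
patch1 = quadratic(A_true, b_true, c_true, xx, yy)
patch2 = quadratic(A_true, b_true, c_true, xx - d_true[0], yy - d_true[1])

certainty = np.ones_like(patch1)
certainty[0, :2] = 0.0          # pretend two pixels carried no valid return

A1, b1, c1 = fit_quadratic(patch1, certainty)
A2, b2, c2 = fit_quadratic(patch2, certainty)

# Closed-form 2D displacement from the change in the linear coefficients.
d_est = -0.5 * np.linalg.solve(A1, b2 - b1)
print(d_est)
```

Because both patches are exact quadratics, the fit is exact up to floating-point error even with the masked pixels; on real range data the neighborhood is only approximately quadratic and the recovered shift is an estimate.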
The effects of this displacement on the quadratic polynomial are derived in Equation (2).

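$$
\begin{aligned}
f_1(\mathbf{x}) &= \mathbf{x}^T A_1 \mathbf{x} + \mathbf{b}_1^T \mathbf{x} + c_1 \\
f_2(\mathbf{x}) &= f_1(\mathbf{x} - \mathbf{d}) \\
&= \mathbf{x}^T A_1 \mathbf{x} + (\mathbf{b}_1 - 2 A_1 \mathbf{d})^T \mathbf{x} + \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1 \\
&= \mathbf{x}^T A_2 \mathbf{x} + \mathbf{b}_2^T \mathbf{x} + c_2,
\end{aligned}
\tag{2}
$$

leaving you with a new quadratic polynomial with different coefficients,

$$A_2 = A_1 \tag{3}$$

$$\mathbf{b}_2 = \mathbf{b}_1 - 2 A_1 \mathbf{d} \tag{4}$$

$$c_2 = \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1. \tag{5}$$

With these new coefficients, $\mathbf{d}$ can be computed from Equation (4):

$$\mathbf{d} = -\tfrac{1}{2} A_1^{-1} (\mathbf{b}_2 - \mathbf{b}_1) \tag{6}$$

This method of displacement calculation was developed and tested by Farnebäck and is used as the starting point of the three-dimensional (3D) displacement calculation. A third dimension cannot simply be added to $\mathbf{d}$ because, with the current model, it has no meaning as an input term. The function space is not defined for any value of $z$ under this model, so the model must be modified to explain behavior while displacing this third dimension. To properly capture this higher-dimensional behavior, a higher-order equation is used to approximate this space. A linear spreading of the $x$ and $y$ data, Equation (7), as well as a constant increase, Equation (8), was used to approximate the behavior of the data as you move along the $z$ dimension. This leads to a quartic polynomial, Equation (9), with the higher-order terms being very sparse tensors.

$$
\begin{aligned}
x' &= (\zeta_x z + 1)\,x, & y' &= (\zeta_y z + 1)\,y, \\
\zeta_x &= \alpha_x / z, & \zeta_y &= \alpha_y / z, \\
\alpha_x &= \frac{\text{angular range of } x}{\text{pixel range of } x} = \frac{2\pi \text{ radians}}{\text{image width}}, &
\alpha_y &= \frac{\text{angular range of } y}{\text{pixel range of } y} = \frac{26.8^\circ}{\text{image height}}, \\
\mathbf{x}' &= \begin{bmatrix} x' & y' \end{bmatrix}^T
\end{aligned}
\tag{7}
$$

$$f(\mathbf{x}) = f_{\text{quad}}(\mathbf{x}') + z \tag{8}$$

$$
\begin{aligned}
f(\mathbf{x}) ={}& \zeta_x^2 r_{x^2}\, x^2 z^2 + \zeta_x \zeta_y r_{xy}\, x y z^2 + \zeta_y^2 r_{y^2}\, y^2 z^2 \\
&+ 2\zeta_x r_{x^2}\, x^2 z + (\zeta_x + \zeta_y) r_{xy}\, x y z + 2\zeta_y r_{y^2}\, y^2 z \\
&+ r_{x^2}\, x^2 + r_{xy}\, x y + \zeta_x r_x\, x z + r_{y^2}\, y^2 + \zeta_y r_y\, y z \\
&+ r_x\, x + r_y\, y + z + c
\end{aligned}
\tag{9}
$$

where $\zeta_x$ and $\zeta_y$ were derived from the planar projection model to account for the spreading of the points in the spherical coordinate system of the sensor. This can be combined into a tensor form:

$$f(\mathbf{x}) = g_{ijkl}\, x_i x_j x_k x_l + h_{ijk}\, x_i x_j x_k + a_{ij}\, x_i x_j + b_i x_i + c, \tag{10}$$

where the high-order tensors, $G$ and $H$, are sparse and the quadratic tensors are dense. For $G$ and $H$, the nonzero terms are

$$
\begin{aligned}
g_{0022} = g_{0202} = g_{0220} = \dots &= \frac{\zeta_x^2 r_{x^2}}{6} \\
g_{0122} = g_{0212} = g_{0221} = \dots &= \frac{\zeta_x \zeta_y r_{xy}}{12} \\
g_{1122} = g_{1212} = g_{1221} = \dots &= \frac{\zeta_y^2 r_{y^2}}{6}
\end{aligned}
\tag{11}
$$

$$
\begin{aligned}
h_{002} = h_{020} = \dots &= \frac{2 \zeta_x r_{x^2}}{3} \\
h_{012} = h_{102} = \dots &= \frac{(\zeta_x + \zeta_y)\, r_{xy}}{6} \\
h_{112} = h_{121} = \dots &= \frac{2 \zeta_y r_{y^2}}{3},
\end{aligned}
\tag{12}
$$

while $A$ and $\mathbf{b}$ are

$$
A = \begin{bmatrix}
r_{x^2} & r_{xy}/2 & \zeta_x r_x / 2 \\
r_{xy}/2 & r_{y^2} & \zeta_y r_y / 2 \\
\zeta_x r_x / 2 & \zeta_y r_y / 2 & 0
\end{bmatrix}
\tag{13}
$$

$$\mathbf{b} = \begin{bmatrix} r_x & r_y & 1 \end{bmatrix}^T. \tag{14}$$

We now explore the effects of displacement as Farnebäck did in Equation (2):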

$$
\begin{aligned}
f(\mathbf{x}) ={}& g_{ijkl}\, x_i x_j x_k x_l + h_{ijk}\, x_i x_j x_k + a_{ij}\, x_i x_j + b_i x_i + c \\
\tilde{f}(\mathbf{x}) ={}& f(\mathbf{x} - \mathbf{d}) \\
={}& g_{ijkl}\, x_i x_j x_k x_l - (4 g_{ijkl} d_l - h_{ijk})\, x_i x_j x_k \\
&+ (6 g_{ijkl} d_k d_l - 3 h_{ijk} d_k + a_{ij})\, x_i x_j \\
&- (4 g_{ijkl} d_j d_k d_l - 3 h_{ijk} d_j d_k + 2 a_{ij} d_j - b_i)\, x_i \\
&+ (g_{ijkl} d_i d_j d_k d_l - h_{ijk} d_i d_j d_k + a_{ij} d_i d_j - b_i d_i + c) \\
={}& \tilde{g}_{ijkl}\, x_i x_j x_k x_l + \tilde{h}_{ijk}\, x_i x_j x_k + \tilde{a}_{ij}\, x_i x_j + \tilde{b}_i x_i + \tilde{c},
\end{aligned}
\tag{15}
$$

leaving us with a new quartic polynomial with the following coefficients:

$$
\begin{aligned}
\tilde{g}_{ijkl} &= g_{ijkl} \\
\tilde{h}_{ijk} &= h_{ijk} - 4 g_{ijkl} d_l \\
\tilde{a}_{ij} &= a_{ij} - 3 h_{ijk} d_k + 6 g_{ijkl} d_k d_l \\
\tilde{b}_i &= b_i - 2 a_{ij} d_j + 3 h_{ijk} d_j d_k - 4 g_{ijkl} d_j d_k d_l \\
\tilde{c} &= c - b_i d_i + a_{ij} d_i d_j - h_{ijk} d_i d_j d_k + g_{ijkl} d_i d_j d_k d_l.
\end{aligned}
\tag{16}
$$

Now we have a linear equation with $H$ that could be used as an analytical solution for $\mathbf{d}$, but with our higher-order tensors being mostly sparse and mainly composed of the model coefficients as opposed to the expansion coefficients, we look to our lower-order dense tensors to solve for $\mathbf{d}$ through numerical optimization. Using the symmetries and sparsities of the matrices, we are left with the following coefficients in our dense tensors.

$$
\begin{aligned}
\tilde{b}_0 ={}& b_0 - 2(a_{00} d_x + a_{01} d_y + a_{02} d_z) + 6(h_{002} d_x d_z + h_{012} d_y d_z) \\
&- 12(g_{0022} d_x d_z^2 + g_{0122} d_y d_z^2) \\
\tilde{b}_1 ={}& b_1 - 2(a_{01} d_x + a_{11} d_y + a_{12} d_z) + 6(h_{012} d_x d_z + h_{112} d_y d_z) \\
&- 12(g_{0122} d_x d_z^2 + g_{1122} d_y d_z^2) \\
\tilde{b}_2 ={}& b_2 - 2(a_{02} d_x + a_{12} d_y) + 3(h_{002} d_x^2 + h_{112} d_y^2 + 2 h_{012} d_x d_y) \\
&- 12(g_{0022} d_x^2 d_z + g_{1122} d_y^2 d_z + 2 g_{0122} d_x d_y d_z)
\end{aligned}
\tag{17}
$$

$$
\begin{aligned}
\tilde{c} ={}& c - (b_0 d_x + b_1 d_y + b_2 d_z) \\
&+ (a_{00} d_x^2 + a_{11} d_y^2 + 2 a_{01} d_x d_y + 2 a_{02} d_x d_z + 2 a_{12} d_y d_z) \\
&- 3(h_{002} d_x^2 d_z + h_{112} d_y^2 d_z + 2 h_{012} d_x d_y d_z) \\
&+ 6(g_{0022} d_x^2 d_z^2 + g_{1122} d_y^2 d_z^2 + 2 g_{0122} d_x d_y d_z^2).
\end{aligned}
\tag{18}
$$

To calculate $\mathbf{d}$, we minimize the difference between the observed polynomial coefficients of the next image and the coefficients derived by displacing the coefficients of the first image using Equations (17) and (18). This optimization is done using the non-linear least squares version of Newton's method, the Gauss-Newton algorithm, on Equations (19) and (20).
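The coefficient updates in Equation (16), on which this optimization relies, can be checked numerically. The sketch below builds random, fully symmetric tensors, applies the updates, and compares the displaced polynomial at $\mathbf{x}$ against the original polynomial at $\mathbf{x} - \mathbf{d}$. The helper names (`symmetrize`, `evaluate`, `displace`) and the random test values are assumptions made for this illustration; the updates are only valid when $G$, $H$ and $A$ are symmetric in all indices, which the sketch enforces explicitly.

```python
import itertools
import numpy as np

def symmetrize(t):
    """Average a tensor over all index permutations so it is fully symmetric."""
    perms = list(itertools.permutations(range(t.ndim)))
    return sum(np.transpose(t, p) for p in perms) / len(perms)

def evaluate(G, H, A, b, c, x):
    """Quartic polynomial in tensor form (Equation (10))."""
    return (np.einsum("ijkl,i,j,k,l->", G, x, x, x, x)
            + np.einsum("ijk,i,j,k->", H, x, x, x)
            + np.einsum("ij,i,j->", A, x, x)
            + b @ x + c)

def displace(G, H, A, b, c, d):
    """Coefficients of f(x - d) following Equation (16)."""
    H2 = H - 4.0 * np.einsum("ijkl,l->ijk", G, d)
    A2 = (A - 3.0 * np.einsum("ijk,k->ij", H, d)
          + 6.0 * np.einsum("ijkl,k,l->ij", G, d, d))
    b2 = (b - 2.0 * A @ d
          + 3.0 * np.einsum("ijk,j,k->i", H, d, d)
          - 4.0 * np.einsum("ijkl,j,k,l->i", G, d, d, d))
    c2 = (c - b @ d + d @ A @ d
          - np.einsum("ijk,i,j,k->", H, d, d, d)
          + np.einsum("ijkl,i,j,k,l->", G, d, d, d, d))
    return G, H2, A2, b2, c2

# Random symmetric test tensors (illustrative values, not sensor data).
rng = np.random.default_rng(0)
G = symmetrize(rng.normal(size=(3, 3, 3, 3)))
H = symmetrize(rng.normal(size=(3, 3, 3)))
A = symmetrize(rng.normal(size=(3, 3)))
b = rng.normal(size=3)
c = 1.5
d = np.array([0.3, -0.2, 0.1])
x = rng.normal(size=3)

lhs = evaluate(G, H, A, b, c, x - d)                 # original polynomial at x - d
rhs = evaluate(*displace(G, H, A, b, c, d), x)       # displaced polynomial at x
print(abs(lhs - rhs))
```

The two evaluations should agree to floating-point precision for any choice of $\mathbf{x}$ and $\mathbf{d}$, which is a quick way to catch a dropped factor or sign when implementing Equations (17) and (18).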
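This optimization technique requires the Jacobian matrices of each coefficient with respect to $\mathbf{d}$.

$$\min_{\mathbf{d}} \, (\Delta b_i)^2, \qquad \Delta \mathbf{b} = \tilde{\mathbf{b}}^{(1)}(\mathbf{d}) - \mathbf{b}^{(2)} \tag{19}$$

$$\min_{\mathbf{d}} \, (\Delta c)^2, \qquad \Delta c = \tilde{c}^{(1)}(\mathbf{d}) - c^{(2)} \tag{20}$$

This leaves two remaining routes to solving for $\mathbf{d}$, through the changes in $\mathbf{b}$ or the changes in $c$, shown in Equations (21) and (22), respectively, each with its own advantage. The solution based on $\mathbf{b}$ tends to produce more accurate results for $d_x$ and $d_y$ because it captures the motion of the quadratic components seen mostly in edges. This solution does not tend to be as accurate for the $d_z$ component because $d_z$ is detected through the spreading of the quadratic $x$ and $y$ components. This effect can be small at long distances, whereas $d_z$ has a large effect on $c$. However, the effects on $c$ do not track the quadratic components of $d_x$ and $d_y$ as directly as $\mathbf{b}$.

$$\mathbf{d}_b = \left(J_{b(\mathbf{x})}^T J_{b(\mathbf{x})}\right)^{-1} J_{b(\mathbf{x})}^T \, \Delta \mathbf{b} \tag{21}$$

$$\mathbf{d}_c = \left(J_{c(\mathbf{x})}^T J_{c(\mathbf{x})}\right)^{-1} J_{c(\mathbf{x})}^T \, \Delta c \tag{22}$$

Similar to Farnebäck, this algorithm estimates the displacement over a neighborhood $I$ around $\mathbf{x}$ as opposed to a single point, minimizing the following equations:

$$\sum_{\Delta \mathbf{x} \in I} w_{\Delta \mathbf{x}} \left( \tilde{b}_i^{(1)}(\mathbf{x} + \Delta \mathbf{x}, \mathbf{d}) - b_i^{(2)}(\mathbf{x} + \Delta \mathbf{x}) \right)^2 \tag{23}$$

$$\sum_{\Delta \mathbf{x} \in I} w_{\Delta \mathbf{x}} \left( \tilde{c}^{(1)}(\mathbf{x} + \Delta \mathbf{x}, \mathbf{d}) - c^{(2)}(\mathbf{x} + \Delta \mathbf{x}) \right)^2, \tag{24}$$

where $w_{\Delta \mathbf{x}}$ is the neighborhood weighting function and the minimum steps are

$$\mathbf{d}_b = \Big( \sum_{\Delta \mathbf{x} \in I} w_{\Delta \mathbf{x}} \, J_{b(\mathbf{x})}^T J_{b(\mathbf{x})} \Big)^{-1} \sum_{\Delta \mathbf{x} \in I} w_{\Delta \mathbf{x}} \, J_{b(\mathbf{x})}^T \, \Delta \mathbf{b} \tag{25}$$

$$\mathbf{d}_c = \Big( \sum_{\Delta \mathbf{x} \in I} w_{\Delta \mathbf{x}} \, J_{c(\mathbf{x})}^T J_{c(\mathbf{x})} \Big)^{-1} \sum_{\Delta \mathbf{x} \in I} w_{\Delta \mathbf{x}} \, J_{c(\mathbf{x})}^T \, \Delta c. \tag{26}$$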
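The neighborhood-weighted step based on $\mathbf{b}$ (Equations (23) and (25)) can be sketched as a short Gauss-Newton loop. This is an illustrative sketch under several assumptions, not the authors' code: the function names are hypothetical, the Jacobian is obtained by differentiating the $\tilde{b}_i$ update of Equation (16) for fully symmetric tensors, the step is subtracted from the current estimate (an assumed sign convention matching the residual definition in Equation (19)), and the per-pixel coefficients are random synthetic values with small high-order terms rather than real lidar expansions.

```python
import itertools
import numpy as np

def symmetrize(t):
    """Average a tensor over all index permutations so it is fully symmetric."""
    perms = list(itertools.permutations(range(t.ndim)))
    return sum(np.transpose(t, p) for p in perms) / len(perms)

def displaced_b(G, H, A, b, d):
    """Linear coefficients after displacement by d (b-tilde in Equation (16))."""
    return (b - 2.0 * A @ d
            + 3.0 * np.einsum("ijk,j,k->i", H, d, d)
            - 4.0 * np.einsum("ijkl,j,k,l->i", G, d, d, d))

def jacobian_b(G, H, A, d):
    """Derivative of b-tilde_i with respect to d_m, using tensor symmetry."""
    return (-2.0 * A
            + 6.0 * np.einsum("imk,k->im", H, d)
            - 12.0 * np.einsum("imkl,k,l->im", G, d, d))

def gauss_newton_d(pixels, b_obs, weights, iters=20, tol=1e-12):
    """Estimate one displacement d shared by all pixels of a neighborhood."""
    d = np.zeros(3)
    for _ in range(iters):
        JtJ = np.zeros((3, 3))
        Jtr = np.zeros(3)
        for (G, H, A, b), b2, w in zip(pixels, b_obs, weights):
            J = jacobian_b(G, H, A, d)
            residual = displaced_b(G, H, A, b, d) - b2   # Delta b of Equation (19)
            JtJ += w * (J.T @ J)
            Jtr += w * (J.T @ residual)
        step = np.linalg.solve(JtJ, Jtr)                 # weighted step, Equation (25)
        d = d - step                                     # assumed update convention
        if np.linalg.norm(step) < tol:
            break
    return d

# Synthetic neighborhood (illustrative): 25 pixels that share one true displacement.
rng = np.random.default_rng(1)
d_true = np.array([0.3, -0.2, 0.1])
pixels, b_obs = [], []
for _ in range(25):
    G = 0.01 * symmetrize(rng.normal(size=(3, 3, 3, 3)))  # small high-order terms
    H = 0.01 * symmetrize(rng.normal(size=(3, 3, 3)))
    A = symmetrize(rng.normal(size=(3, 3)))
    b = rng.normal(size=3)
    pixels.append((G, H, A, b))
    b_obs.append(displaced_b(G, H, A, b, d_true))         # noise-free "second frame"
weights = rng.uniform(0.5, 1.0, size=25)                  # stand-in for w over I

d_est = gauss_newton_d(pixels, b_obs, weights)
print(d_est)
```

In this noise-free setting the residual vanishes at the true displacement, so the loop converges in a few iterations. With real data the certainty of each pixel could be folded into `weights`, and the analogous loop on $\tilde{c}$ would use a 1 x 3 Jacobian per pixel.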

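5. Global Flow Displacement Calculation

To compute the global motion estimation, we utilize the material covered in Sections 3 and 4 to compute the polynomial expansion of the range images, which is then utilized in a global motion model. We follow the derivation and notation of Farnebäck and Westin [10] and derive the global motion estimation in the following section.

We model the displacement as $\mathbf{d}(\mathbf{x}) = S(\mathbf{x})\,\mathbf{p}$, where $\mathbf{d}(\mathbf{x})$ is the displacement estimation, $S(\mathbf{x})$ is the motion model, and $\mathbf{p}$ are the translation and rotation parameters, $\mathbf{p} = [\,T_x \; T_y \; T_z \; \omega_x \; \omega_y \; \omega_z\,]^T$, which are optimized to produce the global motion estimation. Using the work of Dufaux and Moscheni [4] and the spherical coordinate system inherent to our sensor, we construct a motion model $S(\mathbf{x})$ for our Velodyne® sensor. The final motion model for our sensor is

$$S(\mathbf{x}) = \begin{bmatrix} S_{\text{size}} S_T & S_{\text{size}} S_R \end{bmatrix} \tag{27}$$

where

$$
S_{\text{size}} = \begin{bmatrix}
\frac{1}{\alpha_x} & 0 & 0 \\
0 & \frac{1}{\alpha_y} & 0 \\
0 & 0 & 1
\end{bmatrix},
\tag{28}
$$

$$
S_T = \begin{bmatrix}
-\frac{\sin(\theta)}{\sin(\phi)} & \frac{\cos(\theta)}{\sin(\phi)} & 0 \\
\cos(\theta)\cos(\phi) & \sin(\theta)\cos(\phi) & -\sin(\phi) \\
\cos(\theta)\sin(\phi) & \sin(\theta)\sin(\phi) & \cos(\phi)
\end{bmatrix},
\tag{29}
$$

$$
S_R = \begin{bmatrix}
-\frac{\cos(\theta)\cos(\phi)}{\sin(\phi)} & -\frac{\sin(\theta)\cos(\phi)}{\sin(\phi)} & 1 \\
-\sin(\theta) & \cos(\theta) & 0 \\
0 & 0 & 0
\end{bmatrix}.
\tag{30}
$$

The matrix $S_{\text{size}}$ accounts for the scaling of each pixel, where $\alpha_x$ and $\alpha_y$ are defined as in Equation (7). The matrices $S_T$ and $S_R$ account for the translational and rotational components for each pixel $[x_i, y_i]$ of the motion model, respectively, where

$$
\begin{aligned}
\theta &= \alpha_x x_i + \text{minimum azimuth} \\
\phi &= \alpha_y y_i + \left(\text{minimum elevation} + \frac{\pi}{2}\right).
\end{aligned}
\tag{31}
$$

We set up a (nonlinear) least squares problem using the steps outlined in [10], where the Gauss-Newton solution for $\mathbf{p}$ is found to be (iterating over $\mathbf{p} = \mathbf{p} - \Delta \mathbf{p}$)

$$
\Delta \mathbf{p} = \left( \beta_1 S^T J_c^T J_c S + \beta_2 S^T J_b^T J_b S \right)^{-1} \left( \beta_1 S^T J_c^T \Delta c + \beta_2 S^T J_b^T \Delta \mathbf{b} \right)
\tag{32}
$$

where $J_c$ and $J_b$ are the Jacobians of $c$ (linear components) and $\mathbf{b}$ (non-linear components) with respect to $\mathbf{d}$, respectively. The variables $\Delta c$ and $\Delta \mathbf{b}$ are the residuals from $(c_1 - c_2)$ and $(\mathbf{b}_1 - \mathbf{b}_2)$, respectively.

6. Range Image Flow Experiments

The local and global algorithms have been tested using data from a Velodyne® HDL-64E.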
This sensor is a 360° field of view 3D lidar with 64 vertically mounted lasers on a spinning head. The lasers have a maximum range of 50 m and an accuracy of 2 cm. It is capable of spinning at 5 to 15 Hz, generating over 1.333 million points per second. For our tests, the sensor was set to 10 Hz, generating a horizontal resolution of 1800 returns per rotation. The lidar returns are assembled into a 64 x 1800 range image. The tests were done using a Ford F-150 with the Velodyne® mounted to the roof. The vehicle and mounting hardware are visible in the lidar scans, so all pixels within a threshold have been marked with a certainty value of 0, causing these values to have no effect on the polynomial expansion or the flow calculations. While we don't yet have ground truth for this data, we did compute the Normal Distributions Transform (NDT) for the original point clouds to use for comparison.

The data set shown in Figures 2a, 3a, 4a and 5a was collected around our facilities on a foggy day. The images contain mostly data from the bushes and other vegetation surrounding the road, but do eventually show a parking lot, seen in Figure 5a. The NDT results are overlaid on these images for use as a comparison to our presented algorithms.

6.1. Local Motion Estimation

The results of the local motion estimation algorithm are shown in Figures 2b, 3b, 4b and 5b. Most of the flow field appears accurate, though some regions still contain peculiar behavior. Certain regions of the flow image act as sources or sinks to the flow fields, such as in Figure 3b, where in the middle right portion of the image the flow field moves away from a central point. The boundary of the scan also seems to be less accurate than the central areas. Areas such as that shown in Figure 4b are relatively uniform, making flow calculations more difficult. Despite this, the algorithm performed reasonably well, though it does have regions with sporadic flow behavior.
Perhaps by combining the results of the global motion estimation with the local motion estimation, areas within the image that exhibit unique motion with respect to the platform may be identified as obstacles to aid in navigation.

6.2. Global Motion Estimation

To test the efficacy of the global motion estimation algorithm, we first test on a known z rotation, since we can create an exact rotation with the data synthetically by simply shifting the pixels in the range image to the left or right. The global range flow from this shift is shown in Figure 6 and the value of the translation and rotation parameters per iteration is shown in Figure 7. As seen in both figures, the correct solution is found and all of the motion is estimated to be in the z rotation.

The results of the global motion estimation algorithm on the real-world data are shown in Figures 2c, 3c, 4c and 5c. As shown in the figures, the global motion estimation gives a consistent picture of the ego-motion of the platform between two successive range images. For instance, in Figure 2c, the flow is in the forward direction only, which corresponds to the correct action of the platform of a translation perpendicular to the image plane, as is confirmed by the results from the NDT algorithm in Figure 2a. However, in Figure 4, the results of the global motion estimation show little or no motion, which does not match up with the motion estimate of the NDT algorithm. Upon further inspection of the range images around frame 400, we found a lack of texture in the images which could explain this result. Also, we found it difficult even for a human to discern the motion present in those sequences of frames. Therefore, further testing on low texture range images is needed. While the global motion estimations may give more useful information for finding the ego-motion of the platform, estimating the local motion remains an important task for true autonomous navigation.

Figure 2: Range image flow on Velodyne® scan 100. (a) NDT flow estimation; (b) local motion estimation; (c) global motion estimation.

Figure 3: Range image flow on Velodyne® scan 204. (a) NDT flow estimation; (b) local motion estimation; (c) global motion estimation.

Figure 4: Range image flow on Velodyne® scan 400. (a) NDT flow estimation; (b) local motion estimation; (c) global motion estimation.

Figure 5: Range image flow on Velodyne® scan 1000. (a) NDT flow estimation; (b) local motion estimation; (c) global motion estimation.

7. Conclusion

The polynomial expansion-based method is effective in calculating the flow of range data, as our results have shown. The local motion estimation algorithm runs into some issues where regions of the range image appear as sources or sinks of motion; however, the overall results agree with the correct motion. The global motion estimation algorithm does not have these issues and appears to estimate the global motion accurately. For navigation applications, both local and global motion estimations will be useful, both to estimate the ego-motion of the platform and to estimate the movements of objects around the platform for avoidance and tracking. There may also be some improvements to overall motion estimation by combining the local and global estimations in a meaningful way.

In future work, quantitative testing will be done to validate the qualitative results, using ground truth data to run large scale tests. Once complete, we are hoping to release this dataset to the wider research community. Additionally, the polynomial expansion may be used to segment the range image and classify planar and non-planar regions, which may then be incorporated into the flow calculation. Certain features have more stable information about their motion and as such should take on greater weights in the flow calculations. Planar regions, for example, contain very little information about the data's motion, but the edge and corner features contribute a large amount of flow information. Using the planar segmentation, the flows calculated at the corners and edges of planar regions could be interpolated across the region.
In addition to the range image work, future work will focus on calculating the rotational components of the flow vectors with respect to a single image location, useful in image stabilization and unmanned aerial vehicle orientation determination.

Figure 6: Global motion estimation of z rotation.

Figure 7: Plot of z rotation estimation per iteration.

References

[1] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1):1–31, 2011.
[2] G. Bradski and A. Kaehler. Learning OpenCV: Computer vision with the OpenCV library. O'Reilly, 2008.
[3] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision, 61(3):211–231, 2005.
[4] F. Dufaux and F. Moscheni. Segmentation-based motion estimation for second generation video coding techniques. In Video Coding, pages 219–263. Springer, 1996.
[5] G. Farnebäck. Fast and accurate motion estimation using orientation tensors and parametric motion models. In Pattern Recognition, 2000. Proceedings. 15th International Conference on, volume 1, pages 135–139. IEEE, 2000.
[6] G. Farnebäck. Orientation estimation based on weighted projection onto quadratic polynomials. In VMV, pages 89–96, 2000.
[7] G. Farnebäck. Very high accuracy velocity estimation using orientation tensors, parametric motion, and simultaneous segmentation of the motion field. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 1, pages 171–177. IEEE, 2001.
[8] G. Farnebäck. Polynomial expansion for orientation and motion estimation. PhD thesis, Linköping, 2002.
[9] G. Farnebäck. Two-frame motion estimation based on polynomial expansion. In Image Analysis, pages 363–370. Springer, 2003.
[10] G. Farnebäck and C.-F. Westin. Affine and deformable registration based on polynomial expansion. In Medical Image Computing and Computer-Assisted Intervention, MICCAI 2006, pages 857–864. Springer, 2006.
[11] A. Giachetti, M. Campani, and V. Torre. The use of optical flow for road navigation. Robotics and Automation, IEEE Transactions on, 14(1):34–48, 1998.
[12] J. Gonzalez. Recovering motion parameters from a 2D range image sequence. In Pattern Recognition, 1996., Proceedings of the 13th International Conference on, volume 1, pages 433–440, 1996.
[13] B. K. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17(1):185–203, 1981.
[14] B. D. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision. In IJCAI, volume 81, pages 674–679, 1981.
[15] B. McCane, K. Novins, D. Crannitch, and B. Galvin. On benchmarking optical flow. Computer Vision and Image Understanding, 84(1):126–143, 2001.
[16] K. Nordberg and G. Farnebäck. A framework for estimation of orientation and velocity. In Image Processing, 2003 International Conference on, volume 3, pages III–57. IEEE, 2003.
[17] K. Nordberg and G. Farnebäck. Estimation of orientation tensors for simple signals by means of second-order filters. Signal Processing: Image Communication, 20(6):582–594, 2005.
[18] H. Spies, B. Jähne, and J. L. Barron. Range flow estimation. Computer Vision and Image Understanding, 85(3):209–231, 2002.
[19] Y.-j. Wang, G. Farnebäck, and C.-F. Westin. Multi-affine registration using local polynomial expansion. Journal of Zhejiang University SCIENCE C, 11(7):495–503, 2010.