Recovering Camera Pose from Omni-directional Images

Ada S.K. WAN 1, Angus M.K. SIU 1, Rynson W.H. LAU 1,2
1 Department of Computer Science, City University of Hong Kong, Hong Kong
2 Department of CEIT, City University of Hong Kong, Hong Kong
{adasw, angus, rynson}@cs.cityu.edu.hk

Abstract

Omni-directional images are widely used in image-based walkthrough applications, in which camera pose recovery is one of the initial and important processes. Existing methods may recover the camera pose of omni-directional images from lines. However, they may not work well when there is insufficient scene structure in the images. In addition, existing methods involve non-linear optimization and iterative algorithms, which may lead to convergence problems and high computational cost. In this paper, we propose an automatic camera pose recovery method for a network of omni-directional images. Our method only requires 2D point correspondences as input. We divide the problem into the orientation and position components, and determine them separately. The relative rotations between adjacent views are aggregated to estimate the global orientations. An algorithm is developed to adjust the rotations so as to avoid global inconsistency and error accumulation. For position recovery, we have derived a linear global formulation relating correspondences and positions among multiple views. Globally optimized positions can be obtained simply by solving the formulation. We demonstrate the performance of our method with some experiments.

1. Introduction

Omni-directional images, such as panoramas, are increasingly used for image-based walkthrough applications and 3D reconstruction [1, 14, 17]. They require camera pose estimates for arbitrary view synthesis and guided matching. Thus, camera pose recovery is a prerequisite for developing such image-based applications. Although the problem of camera pose recovery for planar images has been extensively studied, the methods developed cannot simply be applied to omni-directional images.
This is because the linear projective relationships between the image spaces of planar images, such as the fundamental matrix [4] and the trifocal tensor [13], do not exist between the image spaces of omni-directional images. Without these linear projective relationships, most camera pose recovery methods [8, 12] that work in projective space can no longer be used. Thus, we need to develop new methods for omni-directional images.

Existing methods to recover the camera parameters of omni-directional images can be generally classified into three types: onsite measurement, interactive recovery and automatic recovery. Onsite measurement [3, 17] may be the most direct method to obtain the camera pose during image capturing. However, it requires expensive equipment, such as a laser range finder. The accuracy of the acquired orientations and positions depends heavily on the sensitivity of the equipment. It is usually not precise enough to bring the images into pixel-accurate registration.

BMVC 2004 doi:10.5244/c.18.77

To avoid expensive equipment for onsite measurement, camera pose may be estimated from images directly. [14] proposes a method for camera pose and structure recovery by manually specifying planes, lines and points with known directions or relationships on omni-directional images. However, manual specification becomes impractical when the number of images is large. Human bias may also be introduced. In [2], a scalable camera pose estimation method for omni-directional images with automatic edge detection and vanishing point (VP) estimation is proposed. However, as the estimation relies on parallel lines in order to estimate VPs in every image, it may not work well when the scene lacks regular structures. Moreover, their method involves an iterative algorithm and non-linear optimization, which may lead to convergence problems and high computational cost.

In this paper, we introduce a novel camera pose recovery approach for omni-directional images. Traditionally, camera calibration for planar images simultaneously estimates both the extrinsic and intrinsic parameters of the camera. However, the strong coupling that exists between the two sets of parameters causes much error in the estimated focal length. We approach the problem by first recovering the intrinsic parameters, followed by the extrinsic parameters. The extrinsic parameter (6 DOFs) recovery process is further decoupled into rotational registration (3 DOFs) and position recovery (3 DOFs). This is possible because omni-directional images inherit the advantage of decoupling the zero baseline problem from the wide baseline problem. Hence, the intrinsic parameters can be recovered independently of the extrinsic parameters. As such, the intrinsic parameters can be obtained more easily and accurately [15, 16]. The extrinsic parameters can also be solved more easily with point correspondences and linear algorithms.
However, the major drawback of this decoupling approach is that the error from one recovery stage may accumulate in the next. We deal with this problem by applying global consistency checking to minimize the error at each stage of the process. The main contributions of this paper are as follows:

Correlation establishment: Instead of correlating omni-directional images with corresponding planes and VPs, we transform image space points and establish a linear relationship between the transformed domains. With this method, only several point correspondences are sufficient to establish the mapping function between two views.

Rotational registration: The estimated mapping function provides only relative rotations and directions between two omni-directional images. To achieve global rotational registration, we aggregate a set of relative rotations into global orientations with the same Euclidean wireframe. Meanwhile, we have developed an orientation adjustment algorithm to ensure global consistency.

Position recovery: Existing methods for position recovery require many computer vision and statistical techniques, such as the Hough transform, Markov chain Monte Carlo and expectation maximization, which complicates their implementation. We have derived a global formulation that correlates the correspondences, disparity and positions among multiple views. By solving the formulated matrix, the positions of the views can be obtained easily. Our method allows camera positions to be recovered from correspondences alone. It is simple to implement. The iterative algorithms used in existing methods are also avoided. This not only greatly reduces the computational cost, but also prevents convergence problems.

The rest of this paper is organized as follows. Section 2 describes our camera pose recovery method in detail. Section 3 presents some experimental results. Section 4 briefly concludes the paper.

2. Camera Pose Recovery

2.1 Preliminary Process

Without using specific equipment, groups of planar images are first taken at different optical centres (nodes). Images with a common node are then stitched together to form an omni-directional (panoramic) image. During the stitching process, an important intrinsic parameter, the focal length, is obtained. More accurate intrinsic parameters can be derived with the method in [16].

Refer to Figure 1. After we have obtained a panorama for each node, we recover a consistent set of camera parameters for the panoramas. The nodes are triangulated with Delaunay triangulation [5] and adjacencies between nodes are established. To save computational cost, we only match point correspondences between each pair of adjacent images. Triangles are formed with three nodes, which constitute a self-loop for consistency checking. Feature points are automatically extracted with the Harris operator [6] and matched for each adjacent image pair with Zero-mean Normalized Cross-Correlation (ZNCC) [7]. The correspondences obtained serve as the input for the camera pose recovery process. As shown in Figure 2, our method can be divided into three modules: image space transform, rotational registration, and position recovery. Details of each module are discussed next.

Figure 1. An image network composed of nodes, adjacencies and self-loops.

Figure 2. Our camera pose recovery method for omni-directional images. Point correspondences pass through the image space transform. Rotational registration (essential matrix estimation; relative rotation extraction and decomposition; global orientation recovery and adjustment) produces the global orientations. Position recovery (orientation transform; global translation formulation and minimization) produces the global positions.

2.2 Image Space Transform

As the mapping between the Euclidean space and the image space of an omni-directional image is non-linear, the mapping between the image spaces of an image pair is also non-linear. This means that we cannot directly formulate the point correspondences into a system of linear equations to solve for the mapping function between an image pair. For a camera with a single mirror providing a single viewpoint, i.e., a central panoramic catadioptric camera, [11] suggests a non-linear fundamental constraint on the corresponding points in two catadioptric cameras. However, the more commonly used mosaic-based panoramas, e.g., QTVR, are assumed not to provide a single viewpoint, and their epipolar geometry has not been derived. In practice, it is a good approximation to treat a mosaic-based panorama as a cylindrical projection with a single viewpoint [9, 14], if all objects in the scene are relatively far from the tripod's rotational centre. Based on this assumption, the relationship between image space points can be established as follows.

We first transform the image space points to unit vectors on a Gaussian sphere, $S^2$. Hence, a transformation $g_j$ for view $j$ is applied to image point $p = [u, v]^{\top}$ to obtain the projective ray $d = [s, t, r]^{\top}$, i.e., $g_j(p) = d$. With this transformation, we may establish the linear mapping $E_{jk}$ between views $j$ and $k$ as follows:

$$ g_k(p_k)^{\top} E_{jk}\, g_j(p_j) = 0 \tag{1} $$

where $p_j$ and $p_k$ are corresponding image points in views $j$ and $k$, and $E_{jk}$ is a 3x3 essential matrix with five DOFs: three for relative rotation and two for relative direction. A different transformation $g$ would be adopted for a different kind of omni-directional image. For example, $g$ for panoramic images would be defined as:

$$ g\!\left(\begin{bmatrix} u \\ v \end{bmatrix}\right) = \begin{bmatrix} f \sin(u/f) \\ \alpha\, v - h/2 \\ f \cos(u/f) \end{bmatrix} = \begin{bmatrix} s \\ t \\ r \end{bmatrix} \tag{2} $$

where $f$ is the focal length, $h$ is the image height and $\alpha$ is the skew factor.

2.3 Rotational Registration

2.3.1 Essential Matrix Estimation

After transforming the image space points to rays, we can estimate $E$ for every adjacent image pair. Let $e$ denote the 9-vector made up of the entries of $E$ in row-major order. Let $d_i^j = [s_i^j, t_i^j, r_i^j]^{\top}$ be the projective ray at view $j$ of the $i$-th correspondence. From a set of $n$ correspondences between views $j$ and $k$, we obtain a system of linear equations of the form:

$$ A e = \begin{bmatrix} s_1^k s_1^j & s_1^k t_1^j & s_1^k r_1^j & t_1^k s_1^j & t_1^k t_1^j & t_1^k r_1^j & r_1^k s_1^j & r_1^k t_1^j & r_1^k r_1^j \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ s_n^k s_n^j & s_n^k t_n^j & s_n^k r_n^j & t_n^k s_n^j & t_n^k t_n^j & t_n^k r_n^j & r_n^k s_n^j & r_n^k t_n^j & r_n^k r_n^j \end{bmatrix} e = 0 \tag{3} $$

By enforcing the constraint $\det(E) = 0$, we may use seven points to solve for $e$. We also impose the condition that the two non-zero singular values of the essential matrix should be equal. The resulting $E$ is the closest solution in Frobenius norm.
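The image space transform and the per-correspondence constraint can be illustrated with a minimal Python sketch. It assumes the cylindrical model of Equation (2) with the skew factor taken as 1, and the function names (`panorama_ray`, `constraint_row`, `epipolar_residual`) are hypothetical helpers introduced here for illustration, not part of the authors' Java implementation.

```python
import math

def panorama_ray(u, v, f, h):
    """Map a cylindrical-panorama pixel (u, v) to a unit ray [s, t, r].

    Follows Equation (2) with the skew factor assumed to be 1
    (an assumption of this sketch).
    """
    s = f * math.sin(u / f)
    t = v - h / 2.0
    r = f * math.cos(u / f)
    norm = math.sqrt(s * s + t * t + r * r)
    return (s / norm, t / norm, r / norm)

def constraint_row(d_k, d_j):
    """One row of A in Equation (3): coefficients of the entries of E
    (row-major) in the constraint d_k^T E d_j = 0."""
    return [a * b for a in d_k for b in d_j]

def epipolar_residual(E, d_k, d_j):
    """Evaluate d_k^T E d_j using the row built above."""
    e = [x for row in E for x in row]  # row-major 9-vector
    return sum(c * x for c, x in zip(constraint_row(d_k, d_j), e))
```

Stacking `constraint_row` over all correspondences of an image pair gives the matrix A; its null vector would then be obtained with a linear algebra routine such as an SVD, which is outside the scope of this sketch.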
We use the Random Sample Consensus (RANSAC) algorithm to estimate $E$ robustly, and determine the number of required samples adaptively by:

$$ N = \frac{\log(1 - \rho)}{\log(1 - w^{s})} \tag{4} $$

where $\rho$ is the probability of getting at least one sample without outliers, $w$ is the inlier percentage, and $s$ is the minimum number of correspondences in each sample. The $E$ with the most inliers is chosen.

2.3.2 Relative Rotation Extraction and Decomposition

The essential matrix $E$ provides the relative orientation and translation direction between two adjacent images. In order to obtain the orientations in the world coordinate system, we extract the rotational matrix $R_{jk}$ from every $E_{jk}$ by Singular Value Decomposition (SVD). There are two possible solutions for $R_{jk}$ and two for the translation. The correct solution can be determined from several correspondences as suggested in [18]. $R_{jk}$ is then further decomposed into three Givens rotations about the three coordinate axes, i.e., $R_{jk} = R^{x}_{jk} R^{y}_{jk} R^{z}_{jk}$, by RQ factorization.

2.3.3 Global Orientation Recovery and Adjustment

As the $E_{jk}$ are estimated independently, the initial set of $R_{jk}$ may not be globally consistent. The inconsistency in $R_{jk}$ may also be propagated and accumulated as the path gets longer. To deal with this problem, we propose an algorithm to adjust the rotational matrices. With the adjustment, a set of globally consistent $R$ can be obtained and the propagation of inconsistency can be minimized.

Let $\theta_{jk}$ be the relative rotational angle from view $j$ to view $k$. The residual $\varepsilon$ of a self-loop $L$ of three views $j$, $k$, $l$ (i.e., $R_{jk}$, $R_{kl}$, $R_{jl}$) is defined as:

$$ \varepsilon = \theta_{jk} + \theta_{kl} + \theta_{jl} \tag{5} $$

In a consistent self-loop, $\varepsilon$ is equal to zero, as shown in Figure 3(a). However, due to the independent estimation of $R$, $\varepsilon$ is often not equal to zero, as shown in Figure 3(b). To ensure consistency, we have to adjust each $\theta$ by $\Delta\theta$ to obtain adjusted angles $\theta'$ such that

$$ 0 = \theta'_{jk} + \theta'_{kl} + \theta'_{jl} \tag{6} $$

$$ \theta'_{jk} = \theta_{jk} + \Delta\theta_{jk} \tag{7} $$

Figure 3. (a) Globally consistent orientations, and (b) globally inconsistent orientations.

Initially, all the adjacencies are marked as non-adjusted, and they are then adjusted one by one. We compute $\varepsilon$ for each self-loop $L$ and choose the one, $L_{curr}$, with the smallest $\varepsilon$. For every adjacency in $L_{curr}$, $\Delta\theta_{jk}$ is determined as:

$$ \Delta\theta_{jk} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\varepsilon_i\, \kappa_i}{q_i} \tag{8} $$

where $n \in \{1, 2\}$ is the number of self-loops containing the adjacency, $q_i$ is the number of non-adjusted adjacencies in the adjacent self-loop $L_i$, $\varepsilon_i$ is the residual of $L_i$, and $\kappa_i \in \{1, -1\}$ indicates the direction of the edge in $L_i$. Then, $\theta_{jk}$ is adjusted by Equation (7) and marked as adjusted. The residuals $\varepsilon$ of the self-loops $L_{neig}$ adjacent to $L_{curr}$ are computed. The one with the smallest $\varepsilon$ is selected and the adjustment process is repeated until all the adjacencies are updated. The following pseudo-code shows the orientation adjustment.
    ε_min = ∞
    for i = 1 to M do
        compute ε_i for L_i by Equation (5)
        if ε_i < ε_min then
            ε_min = ε_i
            L_curr = L_i
    endfor
    j = M
    repeat
        compute Δθ for non-adjusted edges in L_curr by Equation (8)
        update ε, q for all L_neig
        j = j - 1
        L_curr = L_neig with minimum ε
    until j = 0

2.4 Position Recovery

The translation of a view includes direction as well as magnitude. Since only the relative translation direction between two views can be inferred from $E_{jk}$, we cannot simply obtain the global position of a view from a set of essential matrices. Here, we investigate the linear relationship among the correspondences, generalized disparity [10] and global translation among multiple views. Upon formulating these entities, we may obtain a set of globally optimized positions by solving the system of linear equations.

2.4.1 Orientation Transform

Given a 3D point $X \in \mathbb{R}^3$ in the Euclidean space, we can define a mapping function from $X$ to a projective ray $d$ as $M: X \rightarrow d$. On the other hand, for a given generalized disparity, $\lambda = \delta(d)$, we can define another mapping function from $d$ to $X$ as $N: (d, \lambda) \rightarrow X$. With a 3x3 rotational matrix $R$ and the camera centre $C$ of a view, the two mapping functions can be defined as follows:

$$ M: \; d \cong RX - RC \quad \text{or} \quad d \times (RX - RC) = 0 $$

$$ N: \; X = C + \lambda R^{\top} d $$

We can eliminate the rotational matrix $R$ by an orientation transform $H: \hat{d} = R^{\top} d$. After the transformation, the two functions can be simplified as:

$$ M: \; \hat{d} \times (X - C) = 0 $$

$$ N: \; X = C + \lambda \hat{d} $$

2.4.2 Global Translation Formulation

After the orientation transform, we correlate the two mapping functions for different views and derive the formulation among multiple views. Referring to Figure 4, $X$ can be inferred from a covariant view $j$ by the function $N$ as:

$$ X = C^{j} + \lambda^{j} \hat{d}^{\,j} \tag{9} $$

where $\hat{d}^{\,j}$ denotes the projective ray after the orientation transform at view $j$ for $X$, and $\lambda^{j}$ is the corresponding generalized disparity. Meanwhile, $X$ can also be projected to a contravariant view $k$ by the function $M$ as:

$$ \hat{d}^{\,k} \times (X - C^{k}) = 0 \tag{10} $$

By substituting (9) into (10), $X$ can be eliminated:

$$ \hat{d}^{\,k} \times (C^{j} - C^{k} + \lambda^{j} \hat{d}^{\,j}) = 0 \tag{11} $$

Let $\hat{d}^{\,j} = [s^{j}, t^{j}, r^{j}]^{\top}$ and $C^{j} = [C_1^{j}, C_2^{j}, C_3^{j}]^{\top}$. Of the three component equations in (11), only two are independent. By eliminating the redundant equation, the correspondences can be formulated with (11) as:

$$ \begin{bmatrix} 0 & -r^{k} & t^{k} & 0 & r^{k} & -t^{k} & t^{k} r^{j} - r^{k} t^{j} \\ -r^{k} & 0 & s^{k} & r^{k} & 0 & -s^{k} & s^{k} r^{j} - s^{j} r^{k} \end{bmatrix} \begin{bmatrix} C_1^{j} \\ C_2^{j} \\ C_3^{j} \\ C_1^{k} \\ C_2^{k} \\ C_3^{k} \\ \lambda^{j} \end{bmatrix} = 0 \tag{12} $$
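As a check on the formulation, the following Python sketch builds the two rows of Equation (12) for one correspondence and evaluates them on a known configuration. It reads the constraint in Equation (11) as a cross product, an assumption that matches the two-row form of Equation (12); the helper names (`unit`, `eq12_rows`, `residuals`) are hypothetical and introduced only for this illustration.

```python
import math

def unit(v):
    """Normalize a 3-vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def eq12_rows(d_k, d_j):
    """Two rows of Equation (12) for the unknowns
    [C1_j, C2_j, C3_j, C1_k, C2_k, C3_k, lambda_j].

    d_k and d_j are the orientation-transformed rays of the same
    3D point at the contravariant view k and the covariant view j.
    """
    s_k, t_k, r_k = d_k
    s_j, t_j, r_j = d_j
    # First component of d_k x (C_j - C_k + lambda_j * d_j)
    row1 = [0.0, -r_k, t_k, 0.0, r_k, -t_k, t_k * r_j - r_k * t_j]
    # Negated second component of the same cross product
    row2 = [-r_k, 0.0, s_k, r_k, 0.0, -s_k, s_k * r_j - s_j * r_k]
    return [row1, row2]

def residuals(rows, c_j, c_k, lam):
    """Evaluate the rows on candidate positions and disparity."""
    x = list(c_j) + list(c_k) + [lam]
    return [sum(a * b for a, b in zip(row, x)) for row in rows]
```

For example, with view j at the origin, view k at (1, 0, 0) and a point X = (2, 1, 5), the rays are `unit(X - C_j)` and `unit(X - C_k)`, the disparity is the distance from C_j to X, and both residuals vanish up to rounding. Stacking such rows over all views and correspondences gives the global matrix, whose right null vector would be found with a linear algebra routine such as an SVD.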

By extending the formulation to multiple covariant and contravariant views, we obtain the matrix of multiple views for global optimization. In order to tolerate mismatches and achieve robust estimation, we again employ the RANSAC method to select a subset of samples for determining the global positions. The solution with the largest number of inliers, σ, is selected. The number of samples N required is determined adaptively by Equation (4). Let C be the right null space vector in Equation (12). The pseudo-code for position recovery to get the best solution C_best is as follows:

    for j = 1 to J
        for i = 1 to I
            orientation transform of d_i^j
    n = σ_max = 0
    repeat
        select sample of corresponding d_i^j from every view
        compute C for the equations as in Equation (12)
        compute σ
        if (σ_max < σ)
            σ_max = σ
            C_best = C
        n = n + 1
    until n = N

Figure 4. A 3D point X can be inferred from view j and then projected to views k, l.

3. Experimental Results

We have implemented our method in Java and tested it on a Pentium 4 2.4GHz PC with two real scenes. Table 1 shows the configuration of the two scenes. We prepared 15 panoramic images, each at a resolution of 4500x450, for each of the two scenes. The 15 panoramic images form 18 self-loops and 32 adjacencies. Table 2 shows the reprojection error of the two scenes. While the reprojection error of existing methods is usually a few pixels, that of our method is about one pixel on average.

Table 1. Configuration of the two scenes.

    Node    Adjacency    Self-loop
    15      32           18

Table 2. Reprojection errors of the two experiments with 15 panoramic images.

    Reprojection error (pixel)    Average    Maximum    Standard deviation
    Exp 1                         0.82       1.80       0.47
    Exp 2                         1.03       2.00       0.56

Figures 5(a) and 5(b) show two of the 15 panoramic images for each of the two scenes. The epipolar curves, which are determined from the recovered camera pose, are superimposed on the images. Figures 5(c) and 5(e) show the epipolar curves computed by the initial E (Section 2.3.1) at two feature points, which do not lie on the epipolar curves. After applying the rotational registration and the position recovery processes, the epipolar alignment is significantly improved, as shown in Figures 5(d) and 5(f). Figure 6 compares the accuracy of the camera pose obtained, with and without rotational registration, in terms of inlier percentage against the number of images. We can see that the accuracy is improved with global orientation adjustment.

Figure 5. (a) and (b) show the epipolar curves of the two scenes. (c) to (f) compare the epipolar alignment before ((c), (e)) and after ((d), (f)) the orientation adjustment and position recovery processes.

Figure 6. Inlier percentage with rotational (R) registration compared with that without R registration, as a function of the number of images (7 to 16). Curves: Exp 1 and Exp 2, each with and without R registration.

By applying SVD to E, the rotation and translation direction can be extracted. However, the translation directions from different nodes are inconsistent. Refer to Figure 7. The translation directions (dotted lines) do not intersect at a single node due to the global inconsistency. By applying our position recovery method, globally consistent positions and translation directions (solid lines) can be obtained.
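Both RANSAC stages size their sample count with Equation (4). The short Python sketch below evaluates that formula; the function name `ransac_samples`, its argument names, and the rounding up to the next integer are choices made for this illustration rather than details given in the text.

```python
import math

def ransac_samples(rho, inlier_fraction, sample_size):
    """Number of RANSAC samples from Equation (4):
    N = log(1 - rho) / log(1 - inlier_fraction ** sample_size).

    rho is the desired probability of drawing at least one
    outlier-free sample. The result is rounded up (an assumption
    of this sketch).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    p_clean = inlier_fraction ** sample_size  # chance a sample is all inliers
    if p_clean <= 0.0:
        raise ValueError("no outlier-free sample is possible")
    if p_clean >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - rho) / math.log(1.0 - p_clean))
```

With seven-point samples and a 99% success probability, for instance, `ransac_samples(0.99, 0.5, 7)` is much larger than `ransac_samples(0.99, 0.8, 7)`, which shows how strongly the sample count depends on the inlier percentage.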

The inlier percentage against the number of images, with globally consistent translations computed by our method and with inconsistent translations, is plotted in Figure 8. From the graph, we see that the position recovery process significantly improves the accuracy of the estimated camera pose. Moreover, the trends of the curves in both Figures 6 and 8 show that the inlier percentage without either R registration or position recovery decreases as the number of images increases. This is because the inconsistency accumulates as the length of the path increases.

Figure 7. Global position and relative directions.

Figure 8. Inlier percentage compared between consistent and inconsistent translation T, as a function of the number of images (7 to 16). Curves: Exp 1 and Exp 2, each with consistent and inconsistent T.

Table 3 lists the processing time of each process. The whole recovery process takes around ten minutes, which is similar in performance to existing methods. The number of samples required by the RANSAC method decreases as the number of inliers increases (Equation 4). As the number of inliers in experiment 1 is higher than that in experiment 2, the RANSAC computation requires fewer samples, i.e., a lower processing time.

Table 3. Computational time of each process with 15 panoramic images.

    Process                         Exp 1          Exp 2
    E estimation                    1.30 mins      2.18 mins
    R extraction and adjustment     0.0028 mins    0.0027 mins
    Position adjustment             6.13 mins      9.78 mins
    Total                           7.43 mins      11.96 mins

4. Conclusion

In this paper, we have proposed a camera pose recovery method for omni-directional images. Feature points are automatically extracted and matched across every image pair. The correspondences serve as the only input to our method. By transforming image space points, a linear mapping function is established to correlate omni-directional images. Rotational adjustment aggregates relative orientations into globally consistent orientations. It effectively minimizes error accumulation. By deriving and solving the global formulation, a set of globally optimized positions can be obtained from the transformed points. Because no planes or lines are required, the method is simple to implement and does not rely on regular scene structures.

Acknowledgements

The work described in this paper was partially supported by a CERG grant from the Research Grants Council of Hong Kong (RGC Reference Number: CityU 1308/03E) and a DAG grant from City University of Hong Kong (Project Number: 9100264).

References

[1] D. Aliaga and I. Carlbom, Plenoptic Stitching: A Scalable Method for Reconstructing 3D Interactive Walkthroughs, Proc. ACM SIGGRAPH, pp. 443-450, 2001.
[2] M. Antone and S. Teller, Scalable Extrinsic Calibration of Omni-Directional Image Networks, IJCV, 49(2/3):143-174, Sept./Oct. 2002.
[3] R. Bunschoten and B. Kröse, Robust Scene Reconstruction from an Omnidirectional Vision System, IEEE Trans. on Robotics and Automation, 19(2):351-357, 2003.
[4] O. Faugeras, What can be Seen in Three Dimensions with an Uncalibrated Stereo Rig, Proc. ECCV 92, 1992.
[5] S. Fortune, Voronoi Diagrams and Delaunay Triangulations, Computing in Euclidean Geometry, D. Du and F. Hwang (eds.), World Scientific, pp. 193-223, 1992.
[6] C. Harris and M. Stephens, A Combined Corner and Edge Detector, Alvey Vision Conference, pp. 147-151, 1988.
[7] Z. Lan and R. Mohr, Robust Location Based Partial Correlation, Research Report 3186, INRIA Sophia-Antipolis, 1997.
[8] Q. Luong and O. Faugeras, Self-calibration of a Moving Camera from Point Correspondences and Fundamental Matrices, IJCV, 1(1):5-40, 1997.
[9] L. McMillan and G. Bishop, Plenoptic Modelling: An Image-based Rendering System, Proc. ACM SIGGRAPH, pp. 39-46, 1995.
[10] L. McMillan, An Image-Based Approach to Three-Dimensional Computer Graphics, Technical Report 97-013, UNC at Chapel Hill, 1997.
[11] T. Pajdla and T. Svoboda, Epipolar Geometry for Central Catadioptric Cameras, IJCV, 49(1):23-37, 2002.
[12] M. Pollefeys, R. Koch, and L. van Gool, Self-calibration and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters, IJCV, 32(1):7-25, Aug. 1999.
[13] A. Shashua, Algebraic Functions for Recognition, IEEE Trans. on PAMI, 17(8):779-789, Aug. 1995.
[14] H. Shum, M. Han, and R. Szeliski, Interactive Construction of 3D Models from Panoramic Mosaics, Proc. CVPR, pp. 427-433, 1998.
[15] H. Shum and R. Szeliski, Construction of Panoramic Image Mosaics with Global and Local Alignment, IJCV, 36(2):101-130, Feb. 2000.
[16] G. Stein, Accurate Internal Camera Calibration Using Rotation, with Analysis of Sources of Error, ICCV, pp. 230-236, 1995.
[17] S. Teller, Automated Urban Model Acquisition: Project Rationale and Status, IUW, pp. 455-462, 1998.
[18] J. Weng, T. Huang, and N. Ahuja, Motion and Structure from Two Perspective Views: Algorithms, Error Analysis, and Error Estimation, IEEE Trans. on PAMI, 11(5):451-476, May 1989.