Semi-Direct Visual Odometry for Monocular, Wide-angle, and Multi-Camera Systems

Size: px
Start display at page:

Download "Semi-Direct Visual Odometry for Monocular, Wide-angle, and Multi-Camera Systems"

Transcription

1 1 Sem-Drect Vsual Odometry for Monocular, Wde-angle, and Mult-Camera Systems Chrstan Forster, Zchao Zhang, Mchael Gassner, Manuel Werlberger, Davde Scaramuzza Abstract Drect methods for Vsual Odometry (VO) have ganed popularty due to ther capablty to explot nformaton from all mage gradents n the mage. However, low computatonal speed as well as mssng guarantees for optmalty and consstency are lmtng factors of drect methods, where establshed feature-based methods nstead succeed at. Based on these consderatons, we propose a Sem-drect VO (SVO) that uses drect methods to track and trangulate pxels that are characterzed by hgh mage gradents but reles on proven feature-based methods for jont optmzaton of structure and moton. Together wth a robust probablstc depth estmaton algorthm, ths enables us to effcently track pxels lyng on weak corners and edges n envronments wth lttle or hgh-frequency texture. We further demonstrate that the algorthm can easly be extended to multple cameras, to track edges, to nclude moton prors, and to enable the use of very large feld of vew cameras, such as fsheye and catadoptrc ones. Expermental evaluaton on benchmark datasets shows that the algorthm s sgnfcantly faster than the state of the art whle achevng hghly compettve accuracy. SUPPLEMENTARY MATERIAL Vdeo of the experments: I. INTRODUCTION Estmatng the sx degrees-of-freedom moton of a camera merely from ts stream of mages has been an actve feld of research for several decades [1 6]. Today, state-of-the-art vsual SLAM (V-SLAM) and vsual odometry (VO) algorthms run n real-tme on smart-phone processors and approach the accuracy, robustness, and effcency that s requred to enable varous nterestng applcatons. Examples comprse the robotcs and automotve ndustry, where the ego-moton of a vehcle must be known for autonomous operaton. Other applcatons are vrtual and augmented realty, whch requres precse and low latency pose estmaton of moble devces. The central requrement for the successful adopton of vson-based methods for such challengng applcatons s to obtan hghest accuracy and robustness wth a lmted computatonal budget. The most accurate camera moton estmate s obtaned through jont optmzaton of structure (.e., landmarks) and moton (.e., camera poses). For feature-based methods, ths s an establshed problem that s commonly known as bundle adjustment [7] and many solvers exst, The authors are wth wth the Robotcs and Percepton Group, Unversty of Zurch, Swtzerland. Contact nformaton: {forster, zzhang, gassner, werlberger, sdavde}@f.uzh.ch. Ths research was partally funded by the Swss Natonal Foundaton (project number , Swarm of Flyng Cameras), the Natonal Center of Competence n Research Robotcs (NCCR), the UZH Forschungskredt, and the SNSF-ERC Startng Grant. whch address the underlyng non-lnear least-squares problem effcently [8 11]. Three aspects are key to obtan hghest accuracy when usng sparse feature correspondence and bundle adjustment: (1) long feature tracks wth mnmal feature drft, (2) a large number of unformly dstrbuted features n the mage plane, and (3) relable assocaton of new features to old landmarks (.e., loop-closures). The probablty that many pxels are tracked relably, e.g., n scenes wth lttle or hgh frequency texture (such as sand [12] or asphalt [13]), s ncreased when the algorthm s not restrcted to use local pont features (e.g., corners or blobs) but may track edges [14] or more generally, all pxels wth gradents n the mage, such as n dense [15] or sem-dense approaches [16]. Dense or sem-dense algorthms that operate drectly on pxel-level ntenstes are also denoted as drect methods [17]. Drect methods mnmze the photometrc error between correspondng pxels n contrast to feature-based methods, whch mnmze the reprojecton error. The great advantage of ths approach s that there s no pror step of data assocaton: ths s mplctly gven through the geometry of the problem. However, jont optmzaton of dense structure and moton n real-tme s stll an open research problem, as s the optmal and consstent [18, 19] fuson of drect methods wth complementary measurements (e.g., nertal). In terms of effcency, prevous drect methods are computatonally expensve as they requre a sem-dense [16] or dense [15] reconstructon of the envronment, whle the domnant cost of feature-based methods s the extracton of features and descrptors, whch ncurs a hgh constant cost per frame. In ths work, we propose a VO algorthm that combnes the advantages of drect and feature-based methods. We ntroduce the sparse mage algnment algorthm (Sec. V), an effcent drect approach to estmate frame-to-frame moton by mnmzng the photometrc error of features lyng on ntensty corners and edges. The 3D ponts correspondng to features are obtaned by means of robust recursve Bayesan depth estmaton (Sec. VI). Once feature correspondence s establshed, we use bundle adjustment for refnement of the structure and the camera poses to acheve hghest accuracy (Sec. V-B). Consequently, we name the system sem-drect vsual odometry (SVO). Our mplementaton of the proposed approach s exceptonally fast, requrng only 2.5 mllseconds to estmate the pose of a frame on a standard laptop computer, whle achevng comparable accuracy wth respect to the state of the art on benchmark datasets. The mproved effcency s due to three reasons: frstly, SVO extracts features only for selected keyframes n a parallel thread, hence, decoupled from hard

2 2 real-tme constrants. Secondly, the proposed drect trackng algorthm removes the necessty for robust data assocaton. Fnally, contrarly to prevous drect methods, SVO requres only a sparse reconstructon of the envronment. Ths paper extends our prevous work [2], whch was also released as open source software. 1 The novelty of the present work s the generalzaton to wde FoV lenses (Sec. VII), mult-camera systems (Sec. VIII), the ncluson of moton prors (Sec. IX) and the use of edgelet features. Addtonally, we present several new expermental results n Sec. XI wth comparsons aganst prevous works. II. RELATED WORK Methods that smultaneously recover camera pose and scene structure, can be dvded nto two classes: a) Feature-based: The standard approach to solve ths problem s to extract a sparse set of salent mage features (e.g. corners, blobs) n each mage; match them n successve frames usng nvarant feature descrptors; robustly recover both camera moton and structure usng eppolar geometry; and fnally, refne the pose and structure through reprojecton error mnmzaton. The majorty of VO and V-SLAM algorthms [6] follow a varant of ths procedure. A reason for the success of these methods s the avalablty of robust feature detectors and descrptors that allow matchng mages under large llumnaton and vew-pont changes. Feature descrptors can also be used to establsh feature correspondences wth old landmarks when closng loops, whch ncreases both the accuracy of the trajectory after bundle adjustment [7, 21] and the robustness of the overall system due to re-localzaton capabltes. Ths s also where we draw the lne between VO and V-SLAM: Whle VO s only about ncremental estmaton of the camera pose, V-SLAM algorthms, such as [22], detect loop-closures and subsequently refne large parts of the map. The dsadvantage of feature-based approaches s ther low speed due to feature extracton and matchng at every frame, the necessty for robust estmaton technques that deal wth erroneous correspondences (e.g., RANSAC [23], M- estmators [24]), and the fact that most feature detectors are optmzed for speed rather than precson. Furthermore, relyng only on well localzed salent features (e.g., corners), only a small subset of the nformaton n the mage s exploted. In SVO, features are extracted only for selected keyframes, whch reduces the computaton tme sgnfcantly. Once extracted, a drect method s used to track features from frame to frame, resultng n outler-free and sub-pxel precse matches. Apart from well localzed corner features, ths allows trackng and mappng any pxel wth non-zero ntensty gradent. b) Drect methods: Drect methods estmate structure and moton drectly by mnmzng an error measure that s based on the mage s pxel-level ntenstes [17]. The local ntensty gradent magntude and drecton s used n the optmzaton compared to feature-based methods that consder only the dstance to a feature-locaton. Pxel correspondence s gven drectly by the geometry of the problem, elmnatng the need for robust data assocaton technques. However, ths 1 svo makes the approach dependent on a good ntalzaton that must le n the basn of attracton of the cost functon. Usng a drect approach, the sx degree of freedom (DoF) moton of a camera can be recovered by mage-to-model algnment, whch s the process of algnng the observed mage to a vew syntheszed from the estmated 3D map. Early drect VO methods tracked and mapped few sometmes manually selected planar patches [25 29]. By estmatng the surface normals of the patches [3], they could be tracked over a wde range of vewponts. In [31], the local planarty assumpton was relaxed and drect trackng wth respect to arbtrary 3D structures computed from stereo cameras was proposed. For RGB-D cameras, where a dense depth-map for each mage s gven by the sensor, dense mage-tomodel algnment was subsequently ntroduced n [32 34]. In conjuncton wth dense depth regstraton ths has become the standard n camera trackng for RGB-D cameras [35 38]. Wth DTAM [15], a drect method was ntroduced that computes a dense depthmap from a sngle movng camera n real-tme. The camera pose s found through drect whole mage algnment usng the depthmap. However, nferrng a dense depthmaps from monocular mages s computatonally ntensve and s typcally addressed usng GPU parallelsm, such as n the open-source REMODE algorthm [39]. Early on t was realzed that only pxels wth an ntensty gradent provde nformaton for moton estmaton [4]. In ths sprt, a sem-dense approach was proposed n [41] where the depth s only estmated for pxels wth hgh ntensty gradents. In our expermental evaluaton n Sec. XI-A we show that t s possble to reduce the number of tracked pxels even more for frame-to-frame moton estmaton wthout any notceable loss n robustness or accuracy. Therefore, we propose the sparse mage-to-model algnment algorthm that uses only sparse pxels at corners and along mage ntensty gradents. A dsadvantage of drect methods s that jont optmzaton of dense structure and moton n real-tme s stll an open research problem. For ths reason, the standard approach s to estmate the latest camera pose wth respect to a prevously accumulated dense map and subsequently, gven a set of estmated camera poses, update the dense map [15, 42]. Clearly, ths separaton of trackng and mappng only results n optmal accuracy when the output of each stage yelds the optmal estmate. Other algorthms optmze a graph of poses but do not allow a deformaton of the structure once trangulated [16]. Contrarly, some algorthms gnore the camera poses and nstead allow non-rgd deformaton of the 3D structure [36, 38]. The obtaned results are accurate and vsually mpressve, however, a thorough probablstc treatment s mssng when processng measurements, separatng trackng and mappng, or fxatng and removng states. To the best of our knowledge, t s therefore currently not possble to obtan accurate covarance estmates from dense VO. Hence, the consstent fuson [18, 43] wth complementary sensors (e.g., nertal) s currently not possble. In the proposed work, we use drect methods only to establsh feature correspondence. Subsequently, bundle adjustment can be used for jont optmzaton of structure and moton where t s also possble to nclude nertal measurements as we have demonstrated n prevous work [44].

3 3 New Image Last Frame Moton Estmaton Thread Sparse Model-based Image Algnment T CB T k,k 1 T CB Feature Algnment Pose & Structure Refnement Map I c k 1 u 3 I c k u 1 u 2 u 1 u 2 u 3 Frame Queue yes Feature Extracton Mappng Thread Is Keyframe? no Update Depth-Flters ρ 1 ρ 2 Fg. 2: Changng the relatve pose T k,k 1 between the current and the prevous frame mplctly moves the poston of the reprojected ponts n the new mage u. Sparse mage algnment seeks to fnd T k,k 1 that mnmzes the photometrc dfference between mage patches correspondng to the same 3D pont (blue squares). Note, n all fgures, the parameters to optmze are drawn n red and the optmzaton cost s hghlghted n blue. ρ 3 Intalze Depth-Flters Converged? Fg. 1: Trackng and mappng ppelne III. SYSTEM OVERVIEW yes: nsert new Pont Fgure 1 provdes an overvew of the proposed approach. We use two parallel threads (as n [21]), one for estmatng the camera moton, and a second one for mappng as the envronment s beng explored. Ths separaton allows fast and constant-tme trackng n one thread, whle the second thread extends the map, decoupled from hard real-tme constrants. The moton-estmaton thread mplements the proposed sem-drect approach to moton estmaton. Our approach s dvded nto three steps: sparse mage algnment, relaxaton, and refnement (Fg. 1). Sparse mage algnment estmates frame-to-frame moton by mnmzng the ntensty dfference of features that correspond to the projected locaton of the same 3D ponts. A subsequent step relaxes the geometrc constrant to obtan sub-pxel feature correspondence. Ths step ntroduces a reprojecton error, whch we fnally refne by means of bundle adjustment. In the mappng thread, a probablstc depth-flter s ntalzed for each feature for whch the correspondng 3D pont s to be estmated. New depth-flters are ntalzed whenever a new keyframe s selected for corner pxels as well as for pxels along ntensty gradent edges. The flters are ntalzed wth a large uncertanty n depth and undergo a recursve Bayesan update wth every subsequent frame. When a depth flter s uncertanty becomes small enough, a new 3D pont s nserted n the map and s mmedately used for moton estmaton. IV. NOTATION The ntensty mage recorded from a movng camera C at tmestep k s denoted wth I C k : ΩC R 2 R, where Ω C s the mage doman. Any 3D pont ρ R 3 maps to the mage coordnates u R 2 through the camera projecton model: u = π(ρ). Gven the nverse scene depth ρ > at pxel u R C k, the poston of a 3D pont s obtaned usng the back-projecton model ρ = πρ 1 (u). Where we denote wth Ω those pxels for whch the depth s known at tme R C k k n camera C. The projecton models are known from pror calbraton [45]. The poston and orentaton of the world frame W wth respect to the k th camera frame s descrbed by the rgd body transformaton T kw SE(3) [46]. A 3D pont W ρ that s expressed n world coordnates can be transformed to the k th camera frame usng: k ρ = T kw W ρ. V. MOTION ESTIMATION In ths secton, we descrbe the proposed sem-drect approach to moton estmaton, whch assumes that the poston of some 3D ponts correspondng to features n prevous frames are known from pror depth estmaton. A. Sparse Image Algnment Image to model algnment estmates the ncremental camera moton by mnmzng the ntensty dfference (photometrc error) of pxels that observe the same 3D pont. To smplfy a later generalzaton to multple cameras, we ntroduce a body frame B that s rgdly attached to the camera frame C wth known extrnsc calbraton T CB SE(3) (see Fg. 2). Our goal s to estmate the ncremental moton of the. body frame T kk 1 = TB such that the photometrc error kbk 1 s mnmzed: T kk 1 = arg mn T kk 1 u R C k r I C u (T kk 1) 2 Σ I, (1) where the photometrc resdual r I C u s defned by the ntensty dfference of pxels n subsequent mages I C k and IC k 1 that observe the same 3D pont ρ u : r I C u (T kk 1 ). = I C k( π(tcb T kk 1 ρ u ) ) I C k 1( π(tcb ρ u ) ). (2) The 3D pont ρ u (whch s expressed n the reference frame B k 1 ) can be computed for pxels wth known depth by means of back-projecton: ρ u = T BC π 1 ρ (u), u R C k 1, (3)

4 4 n δu δu (a) Sparse (b) Sem-Dense (c) Dense (a) Edge algnment. (b) Corner algnment. Fg. 3: An mage from the ICL-NUIM dataset (Sec. XI-B3) wth pxels used for mage-to-model algnment (marked n green for corners and magenta for edgelets) for sparse, sem-dense, and dense methods. Dense approaches (c) use every pxel n the mage, sem-dense (b) use just the pxels wth hgh ntensty gradent, and the proposed sparse approach (a) uses selected pxels at corners or along ntensty gradent edges. However, the optmzaton n Eq. (1) ncludes only a subset of those pxels R C k 1 RC k 1, namely those for whch the back-projected ponts are also vsble n the mage I C k : R C k 1 = { u u R C k 1 π ( T CB T kk 1 T BC π 1 ρ (u) ) Ω C}. Image to model algnment has prevously been used n the lterature to estmate camera moton. Apart from mnor varatons n the formulaton, the man dfference among the approaches s the source of the depth nformaton as well as the regon R C k 1 n mage IC k for whch the depth s known. As dscussed n Secton II, we denote methods that know and explot the depth for all pxels n the reference vew as dense methods [15]. Converseley, approaches that only perform the algnment for pxels wth hgh mage gradents are denoted sem-dense [41]. In ths paper, we propose a novel sparse mage algnment approach that assumes known depth only for corners and features lyng on ntensty edges. Fg. 3 summarzes our notaton of dense, sem-dense, and sparse approaches. To make the sparse approach more robust, we propose to aggregate the photometrc cost n a small patch centered at the feature pxel. Snce the depth for neghborng pxels s unknown, we approxmate t wth the same depth that was estmated for the feature. To summarze, sparse mage algnment solves the non-lnear least squares problem n Eq. (1) wth R C k 1 correspondng to small patches centered at corner and edgelet features wth known depth. Ths optmzaton can be solved effcently usng standard teratve non-lnear least squares algorthms such as Levenberg-Marquardt. More detals on the optmzaton, ncludng the analytc Jacobans, are provded n the Appendx. B. Relaxaton and Refnement Sparse mage algnment s an effcent method to estmate the ncremental moton between subsequent frames. However, to mnmze drft n the moton estmate, t s paramount to regster a new frame to the oldest frame possble. One approach s to use an older frame as reference for mage algnment [16]. However, the robustness of the algnment cannot be guaranteed as the dstance between the frames n the algnment ncreases (see experment n Secton XI-A). We therefore propose to relax the geometrc constrants gven by the reprojecton of 3D ponts and to perform an ndvdual 2D algnment of correspondng feature patches. The algnment Fg. 4: Dfferent algnment strateges for corners and edgelets. The algnment of an edge feature s restrcted to the normal drecton n of the edge. of each patch n the new frame s performed wth respect to a reference patch from the frame where the feature was frst extracted; hence, the oldest frame possble, whch should maxmally mnmze feature drft. However, the 2D algnment generates a reprojecton error that s the dfference between the projected 3D pont and the algned feature poston. Therefore, n a fnal step, we perform bundle adjustment to optmze both the 3D pont s poston and the camera poses such that ths reprojecton error s mnmzed. In the followng, we detal our approach to feature algnment and bundle adjustment. Thereby, we take specal care of features lyng on ntensty gradent edges. 2D feature algnment mnmzes the ntensty dfference of a small mage patch P that s centered at the projected feature poston u n the newest frame k wth respect to a reference patch from the frame r where the feature was frst observed (see Fg. 4). To mprove the accuracy of the algnment, we apply an affne warpng A to the reference patch, whch s computed from the estmated relatve pose T kr between the reference frame and the current frame [21]. For corner features, the optmzaton computes a correcton δu R 2 to the predcted feature poston u that mnmzes the photometrc cost: u = u + δu, wth u = π ( T CB T kr T BC πρ 1 (u) ) (4) δu 1 = arg mn I C k (u +δu+ u) I C δu 2 r(u + A u) 2, u P where u s the terator varable that s used to compute the sum over the patch P. Ths algnment s solved usng the nverse compostonal Lucas-Kanade algorthm [47]. For features lyng on ntensty gradent edges, 2D feature algnment s problematc because of the aperture problem features may drft along the edge. Therefore, we lmt the degrees of freedom n the algnment to the normal drecton to the edge. Ths s llustrated n Fg. 4a, where a warped reference feature patch s schematcally drawn at the predcted poston n the newest mage. For features on edges, we therefore optmze for a scalar correcton δu R n the drecton of the edge normal n to obtan the correspondng feature poston u n the newest frame: u = u + δu n, wth (5) δu 1 =arg mn I C k (u +δu n+ u) I C δu 2 r(u + A u) 2. u P Ths s smlar to prevous work on VO wth edgelets, where feature correspondence s found by samplng along the normal drecton for abrupt ntensty changes [14, 48 52]. However,

5 5 n our case, sparse mage algnment provdes a very good ntalzaton of the feature poston, whch drectly allows us to follow the ntensty gradent n an optmzaton. After feature algnment, we have establshed feature correspondence wth subpxel accuracy. However, feature algnment volated the eppolar constrants and ntroduced a reprojecton error δu, whch s typcally well below.5 pxels. Therefore, n the last step of moton estmaton, we refne the camera poses and landmark postons X = {T kw, ρ } by mnmzng the squared sum of reprojecton errors: X = arg mn X k K L C k + k K L E k 1 2 u π ( T CB T kw ρ ) 2 1 ( 2 nt u π ( )) T CB T kw ρ 2 where K s the set of all keyframes n the map, L C k the set of all landmarks correspondng to corner features, and L E k the set of all edge features that were observed n the k th camera frame. The reprojecton error of edge features s projected along the edge normal because the component along the edge cannot be determned. The optmzaton problem n Eq. (6) s a standard bundle adjustment problem that can be solved n real-tme usng SAM2 [9]. In [44] we further show how the objectve functon can be extended to nclude nertal measurements. Whle optmzaton over the whole trajectory n Eq. (6) results n the most accurate results (see Sec. XI-B), we found that for many applcatons (e.g. for state estmaton of mcro aeral vehcles [2, 53]) t suffces to only optmze the latest camera pose and the 3D ponts separately. VI. MAPPING (6) In the prevous secton, we assumed that the depth at sparse feature locatons n the mage s known. In ths secton, we descrbe how the mappng thread estmates ths depth for newly detected features. Therefore, we assume that the camera poses are known from the moton estmaton thread. The depth at a sngle pxel s estmated from multple observatons by means of a recursve Bayesan depth flter. New depth flters are ntalzed at ntensty corners and along gradent edges when the number of tracked features falls below some threshold and, therefore, a keyframe s selected. Every depth flter s assocated to a reference keyframe r, where the ntal depth uncertanty s ntalzed wth a large value. For a set of prevous keyframes 2 as well as every subsequent frame wth known relatve pose {I k, T kr }, we search for a patch along the eppolar lne that has the hghest correlaton (see Fg. 5). Therefore, we move the reference patch along the eppolar lne and compute the zero mean sum of squared dfferences. From the pxel wth maxmum correlaton, we trangulate the depth measurement ρ k, whch s used to update 2 In the prevous publcaton of SVO [2] and n the open source mplementaton we suggested to update the depth flter only wth newer frames k > r, whch works well for down-lookng cameras n mcro aeral vehcle applcatons. However, for forward motons, t s benefcal to update the depth flters also wth prevous frames k < r, whch ncreases the performance wth forward-facng cameras. I r u ρ mn ˆρ T r,k I k ρ k ρ max Fg. 5: Probablstc depth estmate ˆρ for feature n the reference frame r. The pont at the true depth projects to smlar mage regons n both mages (blue squares). Thus, the depth estmate s updated wth the trangulated depth ρ k computed from the pont u of hghest correlaton wth the reference patch. The pont of hghest correlaton les always on the eppolar lne n the new mage. the depth flter. If enough measurements were obtaned such that uncertanty n the depth s below a certan threshold, we ntalze a new 3D pont at the estmated depth, whch subsequently can be used for moton estmaton (see system overvew n Fg. 1). Ths approach for depth estmaton also works for features on gradent edges. Due to the aperture problem, we however skp measurements where the edge s parallel to the eppolar lne. Ideally, we would lke to model the depth wth a nonparametrc dstrbuton to deal wth multple depth hypotheses (top rows n Fg. 6). However, ths s computatonally too expensve. Therefore, we model the depth flter accordng to [54] wth a two dmensonal dstrbuton: the frst dmenson s the nverse depth ρ [55], whle the second dmenson γ s the nler probablty (see bottom rows n Fg. 6). Hence, a measurement ρ k s modeled wth a Gaussan + Unform mxture model dstrbuton: an nler measurement s normally dstrbuted around the true nverse depth ρ whle an outler measurement arses from a unform dstrbuton n the nterval [ρ mn, ρ max ]: p( ρ k ρ, γ ) = γ N ( ρ k ρ, τ 2 ) +(1 γ )U ( ρ k u ρ mn, ρ max ), (7) where τ 2 the varance of a good measurement that can be computed geometrcally by assumng a dsparty varance of one pxel n the mage plane [39]. Assumng ndependent observatons, the Bayesan estmaton for ρ on the bass of the measurements ρ r+1,..., ρ k s gven by the posteror p(ρ, γ ρ r+1,..., ρ k ) p(ρ, γ) k p( ρ k ρ, γ), (8) wth p(ρ, γ) beng a pror on the true nverse depth and the rato of good measurements supportng t. For ncremental computaton of the posteror, the authors of [54] show that (8) can be approxmated by the product of a Gaussan dstrbuton for the depth and a Beta dstrbuton for the nler rato: q(ρ, γ a k, b k, µ k, σ 2 k) = Beta(γ a k, b k )N (ρ µ k, σ 2 k), (9) where a k and b k are the parameters controllng the Beta dstrbuton. The choce s motvated by the fact that the Beta Gaussan s the approxmatng dstrbuton mn-

6 6 Reference patch Eppolar lne Correct match Outler match 1 γ.5 1 γ (a) After 3 measurements wth 7% nler probablty. Inverse Depth Inverse Depth (b) After 3 measurements wth 7% nler probablty. Fg. 6: Illustraton of posteror dstrbutons for depth estmaton. The hstogram n the top rows show the measurements affected by outlers. The dstrbuton n the mddle rows show the posteror dstrbuton when modelng the depth wth a sngle varate Gaussan dstrbuton. The bottom rows show the posteror dstrbuton of the proposed approach that s usng the model from [54]. The dstrbuton s b-varate and models the nler probablty (vertcal axs) together wth the nverse depth (horzontal axs). mzng the Kullback-Lebler dvergence from the true posteror (8). Upon the k-th observaton, the update takes the form p(ρ, γ ρ r+1,..., ρ k ) q(d, γ a k 1, b k 1, µ k 1, σ 2 k 1) p( ρ k d, γ) const, (1) and the authors of [54] approxmated the true posteror (1) wth a Beta Gaussan dstrbuton by matchng the frst and second order moments for ˆd and γ. The updates formulas for a k, b k, µ k and σk 2 are thus derved and we refer to the orgnal work n [54] for the detals on the dervaton. Fg. 6 shows a small smulaton experment that hghlghts the advantage of the model proposed n [54]. The hstogram n the top rows show the measurements that are corrupted by 3% outler measurements. The dstrbuton n the mddle rows show the posteror dstrbuton when modelng the depth wth a sngle varate Gaussan dstrbuton as used for nstance n [41]. Outler measurements have a huge nfluence on the mean of the estmate. The fgures n the bottom rows show the posteror dstrbuton of the proposed approach that s usng the model from [54] wth the nler probablty drawn n the vertcal axs. As more measurements are receved at the same depth, the nler probablty ncreases. In ths model, the mean of the estmate s less affected by outlers whle the nler probablty s nformatve about the confdence of the estmate. Fg. 7 shows qualtatvely the mportance of robust Fg. 7: Illustraton of the eppolar search to estmate the depth of the pxel n the center of the reference patch n the left mage. Gven the extrnsc and ntrnsc calbraton of the two mages, the eppolar lne that corresponds to the reference pxel s computed. Due to self-smlar texture, erroneous matches along the eppolar lne are frequent. depth estmaton n self-smlar envronments, where outler matches are frequent. In [39] we demonstrate how the same depth flter can be used for dense mappng. VII. LARGE FIELD OF VIEW CAMERAS To model large optcal dstorton, such as fsheye and catadoptrc (see Fg. 8), we use the camera model proposed n [56], whch models the projecton π( ) and unprojecton π 1 ( ) functons wth polynomals. Usng the Jacobans of the camera dstorton n the sparse mage algnment and bundle adjustment step s suffcent to enable moton estmaton for large FoV cameras. For estmatng the depth of new features (c.f., Sec. VI), we need to sample pxels along the eppolar lne. For dstorted mages, the eppolar lne s curved (see Fg. 7). Therefore, we regularly sample the great crcle, whch s the ntersecton of the eppolar plane wth the unt sphere centered at the camera pose of nterest. The angular resoluton of the samplng corresponds approxmately to one pxel n the mage plane. For each sample, we apply the camera projecton model π( ) to obtan the correspondng pxel coordnate on the curved eppolar lne. VIII. MULTI-CAMERA SYSTEMS The proposed sem-drect camera moton estmaton starts drectly wth an optmzaton of the relatve pose T kk 1. Snce n Sec. V-A we already ntroduced a body frame B that s rgdly attached to the camera, t s now straghtforward to generalze sparse mage algnment to multple rgdly attached and synchronzed cameras. Let us assume that gven a camera rg wth M cameras (see Fg. 9). The extrnsc calbraton of the ndvdual cameras c C wth respect to the body frame T CB s assumed to be known from pror extrnsc calbraton 3. To use multple cameras, we only need to add an extra summaton n the cost functon of Eq. (1) to use the nformaton from all mages for sparse mage algnment: T kk 1 = arg mn T kk 1 C C u R C k r I C u (T kk 1) 2 Σ I. (11) The same summaton s necessary n the bundle adjustment step to sum the reprojecton errors from all cameras. The remanng steps of feature algnment and mappng are ndependent of how many cameras are used. To summarze, the 3 We use the calbraton toolbox Kalbr [45], whch s avalable at https: //gthub.com/ethz-asl/kalbr

7 7 Tkk 1 (a) Perspectve (b) Fsheye Body Frame TBC1 TBC2 Fg. 9: Vsual odometry wth multple rgdly attached and synchronzed cameras. We know the relatve pose of each camera to the body frame TBCj from extrnsc calbraton and the goal s to estmate the relatve moton of the body frame Tkk 1. (c) Catadoptrc Fg. 8: Dfferent optcal dstorton models that are supported by SVO. only modfcaton to enable the use of multple cameras s to refer the optmzatons to a central body frame, whch requres us to nclude the extrnsc calbraton TCB n the Jacobans as shown n the Appendx. IX. M OTION P RIORS In feature-poor envronments, durng rapd motons, or n case of dynamc obstacles t can be very helpful to employ a moton pror. A moton pror s an addtonal term that s added to the cost functon n Eq. (11), whch penalzes motons that are not n agreement wth the pror estmate. Thereby, jumps n the moton estmate due to unconstraned degrees of freedom or outlers can be suppressed. In a car scenaro for nstance, a constant velocty moton model may be assumed as the nerta of the car prohbts sudden changes from one frame to the next. Other prors may come from addtonal sensors such as gyroscopes, whch allow us to measure the ncremental rotaton between two frames. Let us assume that we are gven a relatve translaton pror p kk 1 (e.g., from a constant velocty assumpton) and a relatve rotaton pror R kk 1 (e.g., from ntegratng a gyroscope). In ths case, we can employ a moton pror by addng addtonal terms to the cost of the sparse mage algnment step: T?kk 1 = arg mn Tkk 1 X X C C u R Ck 1 1 kr C (Tkk 1 )k2σi 2 Iu (12) 1 + kpkk 1 p kk 1 k2σp k log(r T kk 1 Rkk 1 ) kσr, 2 where the covarances Σp, ΣR are set accordng to the uncer. tanty of the moton pror and the varables (pkk 1, Rkk 1 ) = Tkk 1 are the current estmate of the relatve poston and orentaton (expressed n body coordnates B). The logarthm map maps a rotaton matrx to ts rotaton vector (see Eq. (18)). Note that the same cost functon can be added to the bundle adjustment step. For further detals on solvng Eq. (12), we refer the nterested reader to the Appendx. X. I MPLEMENTATION D ETAILS In ths secton we provde addtonal detals on varous aspects of our mplementaton. A. Intalzaton The algorthm s bootstrapped to obtan the pose of the frst two keyframes and the ntal map usng the 5-pont relatve pose algorthm from [57]. B. Sparse Image Algnment For sparse mage algnment, we use a patch sze of 4 4 pxels. In the expermental secton we demonstrate that the sparse approach wth such a small patch sze acheves comparable performance to sem-dense and dense methods n terms of robustness when the nter-frame dstance s small, whch typcally s true for frame-to-frame moton estmaton. In order to cope wth large motons, we apply the sparse mage algnment algorthm n a coarse-to-fne scheme. Therefore, the mage s halfsampled to create an mage pyramd of fve levels. The photometrc cost s then optmzed at the coarsest level untl convergence, startng from the ntal condton Tkk 1 = I4 4. Subsequently, the optmzaton s contnued at the next fner level to mprove the precson of the result. To save processng tme, we stop after convergence on the thrd level, at whch stage the estmate s accurate enough to ntalze feature algnment. To ncrease the robustness aganst dynamc obstacles, occlusons and reflectons, we addtonally employ a robust cost functon [24, 34]. C. Feature Algnment For feature algnment we use a patch-sze of 8 8 pxels. Snce the reference patch may be multple frames old, we employ an affne llumnaton model to cope wth llumnaton changes [58]. For all experments we lmt the number of matched features to 18 n order to guarantee a constant cost per frame. D. Mappng In the mappng thread, we dvde the mage n cells of fxed sze (e.g., pxels). For every keyframe a new depthflter s ntalzed at the FAST corner [59] wth hghest score n the cell unless there s already a 2D-to-3D correspondence present. In cells where no corner s found, we detect the pxel wth hghest gradent magntude and ntalze an edge feature. Ths results n evenly dstrbuted features n the mage. To speed up the depth-estmaton we only sample a short range along the eppolar lne; n our case, the range corresponds to twce the standard devaton of the current depth estmate. We use a 8 8 pxel patch sze for the eppolar search.

8 (c) Sparse (b) Depth of the scene (d) Sem-Dense (e) Dense Fg. 1: An mage from the Urban Canyon dataset [6] (Sec. XI-A) wth pxels used for mage-to-model algnment (marked n green) for sparse, semdense, and dense methods. Dense approaches use every pxel n the mage, sem-dense use just the pxels wth hgh ntensty gradent, and the proposed sparse approach uses selected pxels at corners or along ntensty gradent edges. converged poses [%] (a) Synthetc scene converged poses [%] converged poses [%] 8 4 The Urban Canyon dataset [6] s avalable at converged poses [%] In ths secton we evaluate the robustness of the proposed sparse mage algnment algorthm (Sec. V-A) and compare ts performance to sem-dense and dense mage algnment alternatves. Addtonally, we nvestgate the nfluence of the patch-sze that s used for the sparse approach. The experment s based on a synthetc dataset wth known camera moton, depth and calbraton [6].4 The camera performs a forward moton through an urban canyon as the excerpt of the dataset n Fg. 1a shows. The dataset conssts of 25 frames wth.2 meters dstance between frames and a medan scene depth of 12.4 meters. For the experment, we select a reference mage Ir wth known depth (see Fg. 1b) and estmate the relatve pose Trk of 6 subsequent mages k {r + 1,..., r + 6} along the trajectory by means of mage to model algnment. For each mage par {Ir, Ik }, the algnment s repeated 8 tmes wth ntal perturbaton that s sampled unformly wthn a 2 m range around the true value. We perform the experment at 18 reference frames along the trajectory. The algnment s consdered converged when the estmated relatve pose s closer than.1 meters from the ground-truth. The goal of ths experment s to study the magntude of the perturbaton from whch mage to model sparse (2x2) sparse (3x3) sparse (4x4) sparse (5x5) converged poses [%] A. Image Algnment: From Sparse to Dense sparse (1x1) 8 1 converged poses [%] We mplemented the proposed VO system n C++ and tested ts performance n terms of accuracy, robustness, and computatonal effcency. We frst compare the proposed sparse mage algnment algorthm aganst sem-dense and dense mage algnment algorthms and nvestgate the nfluence of the patch sze used n the sparse approach. Fnally, n Sec. XI-B we compare the full ppelne n dfferent confguratons aganst the state of the art on nne dfferent dataset sequences. converged poses [%] XI. E XPERIMENTAL E VALUATION sem-d frames dstance 45 6 dense frames dstance 45 6 Fg. 11: Convergence probablty of the model-based mage algnment algorthm as a functon of the dstance to the reference mage and evaluated for sparse mage algnment wth patch szes rangng from 1 1 to 5 5 pxels, sem-dense, and dense mage algnment. The colored regon hghlghts the 68% confdence nterval.

9 9 algnment s capable to converge as a functon of the dstance to the reference mage. The performance n ths experment s a measure of robustness: successful pose estmaton from large ntal perturbatons shows that the algorthm s capable of dealng wth rapd camera motons. Furthermore, large dstances between the reference mage I r and test mage I k smulates the performance at low camera frame-rates. For the sparse mage algnment algorthm, we extract 1 FAST corners n the reference mage (see Fg. 1c) and ntalze the correspondng 3D ponts usng the known depthmap from the renderng process. We repeat the experment wth patch-szes rangng from 1 1 pxels to 5 5 pxels. We evaluate the sem-drect approach (as proposed n the LSD framework [41]) by usng pxels along ntensty gradents (see Fg. 1d). Fnally, we perform the experment usng all pxels n the reference mage as proposed n DTAM [15]. The results of the experment are shown n Fg. 11. Each plot shows a varant of the mage algnment algorthm wth the vertcal axs ndcatng the percentage of converged trals and the horzontal axs the frame ndex counted from the reference frame. We can observe that the dfference between sem-dense mage algnment and dense mage algnment s margnal. Ths s because pxels that exhbt no ntensty gradent are not nformatve for the optmzaton as ther Jacobans are zero [4]. We suspect that usng all pxels becomes only useful when consderng moton blur and mage defocus, whch s out of the scope of ths evaluaton. In terms of sparse mage algnment, we observe a gradual mprovement when ncreasng the patch sze to 4 4 pxels. A further ncrease of the patch sze does not show mproved convergence and wll eventually suffer from the approxmatons adopted by not warpng the patches accordng to the surface orentaton. Compared to the sem-dense approach, the sparse approaches do not reach the same convergence radus, partcularly n terms of dstance to the reference mage. For ths reason SVO uses sparse mage algnment only to algn wth respect to the prevous mage (.e., k = r + 1), n contrast to LSD [41] whch algns wth respect to the last keyframe. In terms of computatonal effcency, we note that the complexty scales lnearly wth the number of pxels used n the optmzaton. The plots show that we can trade-off usng a hgh frame rate camera and a sparse approach wth a lower frame-rate camera and a sem-dense approach. The evaluaton of ths trade-off would deally ncorporate the power consumpton of both the camera and processors, whch s out of the scope of ths evaluaton. B. Real and Synthetc Experments In ths secton, we compare the proposed algorthm aganst the state of the art on real and synthetc datasets. Therefore, we present results of the proposed ppelne on the EUROC benchmark [61], the TUM RGB-D benchmark dataset [62], the synthetc ICL-NUIM dataset [37], and our own dataset that compares dfferent feld of vew cameras. A selecton of these experments, among others (e.g., from the KITTI benchmark), can also be vewed n the vdeo attachment of ths paper. 1) Euroc Datasets: The EUROC dataset [61] conssts of stereo mages and nertal data that was recorded usng a VI- Sensor [63] that was mounted on a mcro aeral vehcle and flown nsde a machne hall. Extracts from the dataset are shown n Fg. 12a and 12b. The dataset provdes a precse ground-truth trajectory that was obtaned usng a Leca MS5 laser trackng system. In Table I we present results of varous monocular and stereo confguratons of the proposed algorthm on the frst two trajectores of the dataset. The trajectores are 65 and 58 meters long respectvely. We compare our algorthm aganst the open-source versons of ORB-SLAM [22] and LSD-SLAM [16] on these datasets wth ther default parameter settngs. In contrast to SVO, ORB- SLAM and LSD-SLAM detect loop-closures and subsequently perform a global optmzaton. Hence, durng loop-closure refnements, the latest camera pose may undergo sgnfcant jumps. For ths reason, we base the evaluaton on the fnal poses of all keyframes n the trajectory. We observed that the ntal poses of LSD-SLAM are not accurate due to the ntalzaton procedure and therefore dscard the frst 35 frames n the evaluaton for LSD-SLAM. To understand the nfluence of the proposed extensons of SVO, we run the algorthm n varous confguratons. SVO Mono (only corners) uses only the mages that were recorded wth the left camera of the sensor. SVO Mono + Pror ndcates that we use measurements from the gyroscope as prors n the mage algnment step as we dscussed n Sec. IX. In the next settng we addtonally use edgelet features n combnaton wth corner features. In these frst three settngs, we only optmze the latest pose; conversely, the keyword Bundle Adjustment ndcates that results were obtaned by optmzng the whole hstory of keyframes by means of the ncremental smoothng algorthm SAM2 [9]. Therefore, we nsert and optmze every new keyframe n the SAM2 graph when a new keyframe s selected. In ths settng, we do nether use moton prors nor edgelets. Snce SVO s a vsual odometry, t does not not detect loop-closures and only mantans a small local map of the last keyframes. To provde a far comparson wth ORB-SLAM and LSD-SLAM, we deactvated ther capablty to detect large loop closures va mage retreval. Addtonally, we provde results usng both mage streams of the stereo camera. Therefore, we apply the approach ntroduced n Sec. VIII to estmate the moton of a mult-camera system. To obtan a measure of accuracy of the dfferent approaches, we algn the fnal trajectory of keyframes wth the ground-truth trajectory usng the least-squares approach proposed n [64]. Snce scale cannot be recovered usng a sngle camera, we also rescale the estmated trajectory to best ft wth the ground-truth trajectory. Subsequently, we compute the Eucldean dstance between the estmated and ground-truth keyframe poses and compute the mean, medan, and Root Mean Square Error (RMSE) n meters. We chose the absolute trajectory error measure nstead of relatve drft metrcs [62] because the fnal trajectory n ORB-SLAM conssts only of a sparse set of keyframes, whch makes drft measures on relatvely short trajectores less expressve. The reported results are averaged over three runs.

10 1 (a) (b) (c) Machne Hall 1 (d) Machne Hall 2 Fg. 12: Fgures (a) and (b) show excerpts of the EuRoC dataset [61] wth tracked corners marked n green and edgelets marked n magenta. Fgures (c) and (d) show the reconstructed trajectory and pontcloud on the frst two trajectores of the dataset. Machne Hall 1 Machne Hall 2 Tmng Mean Medan RMSE [m] Mean Medan RMSE [m] Mean [ms] St.D. CPU@2 fps SVO Mono ±1 % SVO Mono + Pror ± 8 % SVO Mono + Pror + Edgelet ± 7 % SVO Mono + Bundle Adjustment ±13 % SVO Stereo ± 6 % SVO Stereo + Pror ± 7 % SVO Stereo + Pror + Edgelet ± 7 % SVO Stereo + Bundle Adjustment ±13 % ORB Mono SLAM (No loop closure) ±32 % LSD Mono SLAM (No loop closure) ±37 % TABLE I: Absolute translaton errors n meters after 6 DoF algnment wth the ground-truth trajectory and tmng measurements on laptop computer wth Intel Core 7-481MQ CPU (2.8 GHz) averaged over three runs of the EUROC Machne Hall 1 dataset. Loop closure detecton and optmzaton was deactved for ORB and LSD SLAM to allow a far comparson wth SVO. The frst and second column report mean and standard devtaton of the processng tme. Snce all algorthms use mult-threadng, the thrd column reports the average CPU load when provdng new mages at a constant rate of 2 Hz. Thread Intel 7 [ms] Jetson TX1 [ms] Sparse mage algnment Feature algnment Optmze pose & landmarks Extract features Update depth flters TABLE II: Mean tme consumpton n mllseconds by ndvdual components of SVO Mono on the EUROC Machne Hall 1 dataset. We report tmng results on a laptop wth Intel Core 7 (2.8 GHz) processor and on the NVIDIA Jetson TX1 ARM processor. The results show that usng a stereo camera n general results n much hgher accuracy. Apart from the addtonal vsual measurements, the man reason for the mproved results s that the stereo system does not drft n scale and nter camera trangulatons allow to quckly ntalze new 3D landmarks n case of on-spot rotatons. Notce that SVO wth bundle adjustment s twce as accurate as ORB and LSD SLAM. However, the power of the less accurate confguratons of SVO becomes obvous when analyzng the tmng and processor usage of the approaches, whch are reported n the rght columns of Table I. In the table, we report the mean tme to process a sngle frame n mllseconds and the standard devaton over all measurements. Snce all algorthms make use of mult-threadng and the tme to process a sngle frame may therefore be msleadng, we addtonally report the CPU usage (contnuously sampled durng executon) when provdng new mages at a constant rate of 2 Hz to the algorthm. All measurements are averaged over 3 runs of the frst EUROC dataset and computed on the same laptop computer (Intel Core 7-276QM CPU). In Table II, we further report the average tme consumpton of ndvdual components of SVO on the laptop computer and an NVIDIA TX1 ARM processor, whch s popular n moble robotcs applcatons. The results show that the SVO approach s up to ten tmes faster than ORB- SLAM and LSD-SLAM and requres only a fourth of the CPU usage. The reason for ths sgnfcant dfference s that SVO does not extract features and descrptors n every frame, as n ORB-SLAM, but does so only for keyframes n the concurrent mappng thread. Addtonally, ORB-SLAM beng a SLAM approach spends most of the processng tme n fndng matches to the map (see Table I n [65]), whch n theory results n a pose-estmate wthout drft n an already mapped area. Contrarly, n the frst three confguratons of SVO, we estmate only the pose of the latest camera frame wth respect to the local map. Compared to LSD-SLAM, SVO s faster because t operates on sgnfcantly less numbers of pxels, hence, also does not result n a sem-dense reconstructon of the envronment. Ths, however, could be acheved n a parallel process as we have shown n [39, 53, 66]. We further remark that the bundle adjustment verson has a sgnfcantly hgher standard devaton n the tmng as we update SAM2 at every keyframe, whch takes approxmately 1 mllseconds longer than processng a regular frame. Usng a moton pror further helps to mprove the effcency as the sparse-magealgnment optmzaton can be ntalzed closer to the soluton, and therefore needs less teratons to converge. An edgelet provdes only a one-dmensonal constrant n the mage doman, whle a corner provdes a two-dmensonal

11 11 (a) fr2 desk (b) fr2 xyz (a) Fg. 13: Impressons from the TUM RGB-D benchmark dataset [62] wth tracked corners n green and edgelets n magenta. (a) fr2 desk (sde) (b) fr2 desk (top) (b) Fg. 15: Impressons from the synthetc ICL-NUIM dataset [37] wth tracked corners marked n green and edgelets n magenta. (a) Lvng Room (b) Lvng Room 1 (c) Lvng Room 2 (d) Lvng Room 3 Fg. 14: Estmated trajectory and pontcloud of the TUM fr2 desk dataset. SVO Mono (wth edgelets) SVO Mono + Bundle Adjustment LSD-SLAM [16] ORB-SLAM [22] PTAM [21] Sem-Dense VO [41] Drect RGB-D VO [34] F Feature-based RGB-D SLAM [67] F fr2 desk RMSE [cm] fr2 xyz RMSE [cm] / / TABLE III: Results on the TUM RGB-D benchmark dataset [62]. Results for [16, 34, 41, 67] were obtaned from [16] and for PTAM we report two results that were publshed n [22] and [41] respectvely. Algorthms marked wth F use a depth-sensor, and ndcates loop-closure detecton. The symbol ndcates that trackng the whole trajectory dd not succeed. constrant. Therefore, whenever suffcent corners can be detected, the SVO algorthm prortzes the corners. Snce the envronment n the EUROC dataset s well textured and provdes many corners, the use of edgelets does not mprove the accuracy. However, the edgelets brng a beneft n terms of robustness when the texture s such that no corners are present. 2) TUM Datasets: A common dataset to evaluate vsual odometry algorthms s the TUM Munch RGB-D benchmark [62]. The dataset was recorded wth a Mcrosoft Knect RGBD camera, whch provdes mages of worse qualty (e.g. rollng shutter, moton blur) than the VI-Sensor EUROC dataset. Fg. 13 shows excerpts from the fr2_desk and fr2_xyz datasets whch have a trajectory length of 18.8 m and 7 m respectvely. Groundtruth s provded by a moton capture system. Table III shows the results of the proposed algorthm (averaged over three runs) and comparsons aganst related works. The resultng trajectory and the recovered landmarks are shown n Fg. 14. The results from related works Fg. 16: Results on the ICL-NUIM [37] nosy synthetc lvng room dataset. SVO Mono (wth edgelets) LSD SLAM ORB SLAM [22] DVO [34] F FOVIS [68] F ICP [69] F ICP+RGB-D [7] F lr kt RMSE lr kt1 RMSE lr kt2 RMSE lr kt3 RMSE TABLE IV: Results on the ICL-NUIM Dataset [37]. Algorthms marked wth F use a depth-sensor, and ndcates loop-closure detecton. The symbol ndcates that trackng the whole trajectory dd not succeed. were obtaned from the evaluaton n [22] and [16]. We argue that the better performance of ORB-SLAM and LSD-SLAM s due to the capablty to detect loop-closures. 3) ICL-NUIM Datasets: The ICL-NUIM dataset [37] s a synthetc dataset that ams to benchmark RGB-D, vsual odometry and SLAM algorthms. The dataset conssts of four trajectores of length 6.4 m, 1.9 m, 7.3 m, and 11.1 m. The syntheszed mages are corrupted by nose to smulate real

12 12 camera mages. Ground-truth and calbraton are provded by the dataset. Most reported results on ths dataset use the synthetszed measurements from the depth sensor together wth the rendered mages. Indeed, the datsets are very challengng for purely vson-based odometry due to dffcult texture and frequent on-spot rotatons as can be seen n the excerpts from the dataset n Fg. 15. Table IV reports the results of the proposed algorthm (averaged over three runs) and the results from other algorthms on the lvng room sequence. Smlar to the prevous datasets, we report the root mean square error after rotaton, translaton and scale algnment wth the ground-truth trajectory. Fg. 16 shows the reconstructed maps and recovered trajectores. The maps are very nosy due to the fne graned texture of the scene. We also run ORB-SLAM and LSD-SLAM on the dataset. ORB-SLAM fals to ntalze on the second sequence and we dd not manage to obtan any results from LSD- SLAM on all datasets. The reason for the falures s lack of texture for ntalzaton and frequent on-spot rotatons. For SVO, ths also requred us to set a partcularly low FAST corner detecton threshold on ths dataset (to 5 nstead of 2). A lower threshold results n detecton of many low-qualty features. However, features are only used n SVO once ther correspondng scene depth s sucessfully estmated by means of the robust depth flter descrbed n Sec. VI. Hence, the process of depth estmaton helps to dentfy the stable features wth low score that can be relably used for moton estmaton. In ths dataset, we were not able to refne the results of SVO wth bundle adjustment. The reason s that the SAM2 backend s based on Gauss Newton whch s very senstve to underconstraned varables that render the lnearzed problem ndetermnant. The frequent on-spot rotatons and very low parallax angle trangulatons result n many underconstraned varables. Usng an optmzer that s based on Levenberg Marquardt or addng addtonal nertal measurements [44] would help n such cases. For comparson, Table IV also shows the results reported n [37] of algorthms that also use the depth measurements. 4) Crcle Dataset: In the last experment, we want to demonstrate the usefulness of wde feld of vew lenses for VO. We recorded the dataset wth a mcro aeral vehcle that we flew n a moton capture room and commanded t to fly a perfect crcle wth downfacng camera. Subsequently, we flew the exact same trajectory agan wth a wde fsheye camera. Excerpts from the dataset are shown n Fg. 17. We run SVO (wthout bundle adjustment) on both datasets and show the resultng trajectores n Fg. 18. To run SVO on the fsheye mages, we use the modfcatons descrbed n Sec. VII. Whle the recovered trajectory from the perspectve camera slowly drfts over tme, the result on the fsheye camera perfectly overlaps wth the groundtruth trajectory. We also run ORB- SLAM and LSD-SLAM on the trajectory wth the perspectve mages. The result of ORB-SLAM s as close to the groundtruth trajectory as the SVO fsheye result. However, f we deactvate loop-closure detecton (shown result) the trajectory drfts more than SVO. We were not able to run LSD-SLAM and ORB-SLAM on the fsheye mages as the open source mplementatons do not support very large FoV cameras. Due x [m] (a) Perspectve. (b) Fsheye. Fg. 17: SVO trackng wth (a) perspectve and (b) fsheye camera lens ORB SLAM (no loop-closure) SVO Fsheye SVO Perspectve Groundtruth y [m] Fg. 18: Comparson of perspectve and fsheye lense on the same crcular trajectory that was recorded wth a mcro aeral vehcle n a moton capture room. The ORB-SLAM result was obtaned wth the perspectve camera mages and loop-closure was deactvated for a far comparson wth SVO. ORB-SLAM wth a perspectve camera and wth loop-closure actvated performs as good as SVO wth a fsheye camera. to the dffcult hgh-frequency texture of the floor, we were not able to ntalze LSD-SLAM on ths dataset. A more ndepth evaluaton of the beneft of large FoV cameras for SVO s provded n [6]. XII. DISCUSSION In ths secton we dscuss the proposed SVO algorthm n terms of effcency, accuracy, and robustness. A. Effcency Feature-based algorthms ncur a constant cost of feature and descrptor extracton per frame. For example, ORB-SLAM requres 11 mllseconds per frame for ORB feature extracton only [22]. Ths constant cost per frame s a bottleneck for feature-based VO algorthms. On the contrary, SVO does not have ths constant cost per frame and benefts greatly from the use of hgh frame-rate cameras. SVO extracts features only for selected keyframes n a parallel thread, thus, decoupled from hard real-tme constrants. The proposed trackng algorthm, on the other hand, benefts from hgh frame-rate cameras: the sparse mage algnment step s automatcally ntalzed closer to the soluton and, thus, converges faster. Therefore,

13 13 C. Robustness Fg. 19: Successful trackng n scenes of hgh-frequency texture. ncreasng the camera frame-rate actually reduces the computatonal cost per frame n SVO. The same prncple apples to LSD-SLAM. However, LSD-SLAM tracks sgnfcantly more pxels than SVO and s, therefore, up to an order of magntude slower. To summarze, on a laptop computer wth an Intel GHz CPU processor ORB-SLAM and LSD-SLAM requre approxmately 3 and 23 mllseconds respectvely per frame whle SVO requres only 2.5 mllseconds (see Table I). B. Accuracy SVO computes feature correspondence wth sub-pxel accuracy usng drect feature algnment. Subsequently, we optmze both structure and moton to mnmze the reprojecton errors (see Sec. V-B). We use SVO n two settngs: f hghest accuracy s not necessary, such as for moton estmaton of mcro aeral vehcles [53], we only perform the refnement step (Sec. V-B) for the latest camera pose, whch results n the hghest frame-rates (.e., 2.5 ms). If hghest accuracy s requred, we use SAM2 [9] to jontly optmze structure and moton of the whole trajectory. SAM2 s an ncremental smoothng algorthm, whch leverages the expressveness of factor graphs [8] to mantan sparsty and to dentfy and update only the typcally small subset of varables affected by a new measurement. In an odometry settng, ths allows SAM2 to acheve the same accuracy as batch estmaton of the whole trajectory, whle preservng real-tme capablty. Bundle adjustment wth SAM2 s consstent [44], whch means that the estmated covarance of the estmate matches the estmaton errors (e.g., are not over-confdent). Consstency s a prerequste for optmal fuson wth addtonal sensors [18]. In [44], we therefore show how SVO can be fused wth nertal measurements to acheve a drft that s approxmately.1% of the traveled dstance. LSD-SLAM, on the other hand, only optmzes a graph of poses and leaves the structure fxed once computed (up to a scale). The optmzaton does not capture correlatons between the sem-dense depth estmates and the camera pose estmates. Ths separaton of depth estmaton and pose optmzaton s only optmal f each step yelds the optmal soluton. SVO s most robust when a hgh frame-rate camera s used (e.g., between 4 and 8 frames per second). Ths ncreases the reslence to fast motons as t s demonstrated n the vdeo attachment. A fast camera, together wth the proposed robust depth estmaton, allows us to track the camera n envronments wth repettve and hgh frequency texture (e.g., grass or asphalt as shown n Fg. 19). The advantage of the proposed probablstc depth estmaton method over the standard approach of trangulatng ponts from two vews only s that we observe far fewer outlers as every depth flter undergoes many measurements untl convergence. Furthermore, erroneous measurements are explctly modeled, whch allows the depth to converge n hghly self-smlar envronments. A further advantage of SVO s that the algorthm starts drectly wth an optmzaton. Data assocaton n sparse mage algnment s drectly gven by geometry of the problem and therefore, no RANSAC [23] s requred as t s typcal n feature-based approaches. Startng drectly wth an optmzaton also smplfes greatly the use of mult-camera systems, whch greatly mproves reslence to on-spot rotatons as the feld of vew of the system s enlarged and depth can be trangulated from nter-camera-rg measurements. Fnally, the use of gradent edge features (.e., edgelets) ncreases the robustness n areas where only few corner features are found. Our smulaton experments have shown that the proposed sparse mage algnment approach acheves comparable performance as sem-dense and dense algnment n terms of robustness of frame-to-frame moton estmaton. XIII. C ONCLUSION In ths paper, we proposed the sem-drect VO ppelne SVO that s sgnfcantly faster than the current state-of-theart VO algorthms whle achevng hghly compettve accuracy. The gan n speed s due to the fact that features are only extracted for selected keyframes n a parallel thread and feature matches are establshed very fast and robustly wth the novel sparse mage algnment algorthm. Sparse mage algnment tracks a set of features jontly under eppolar constrans and can be used nstead of KLT-trackng [71] when the scene depth at the feature postons s known. We further propose to estmate the scene depth usng a robust flter that explctly models outler measurements. Robust depth estmaton and drect trackng allows us to track very weak corner features and edgelets. A further beneft of SVO s that t drectly starts wth an optmzaton, whch allows us to easly ntegrate measurements from multple cameras as well as moton prors. The formulaton further allows usng large FoV cameras wth fsheye and catadoptrc lenses. The SVO algorthm has further proven successful n real-world applcatons such as vsonbased flght of quadrotors [53] or 3D scannng applcatons wth smartphones. Acknowledgments The authors gratefully acknowledge Henr Rebecq for creatng the Urban Canyon datasets that can be accessed here:

SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems

SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems 1 : Sem-Drect Vsual Odometry for Monocular and Mult-Camera Systems Chrstan Forster, Zchao Zhang, Mchael Gassner, Manuel Werlberger, Davde Scaramuzza Abstract Drect methods for Vsual Odometry (VO) have

More information

SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems

SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems Zurch Open Repostory and Archve Unversty of Zurch Man Lbrary Strckhofstrasse 39 CH-8057 Zurch www.zora.uzh.ch Year: 2016 : Sem-Drect Vsual Odometry for Monocular and Mult-Camera Systems Forster, Chrstan;

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

METRIC ALIGNMENT OF LASER RANGE SCANS AND CALIBRATED IMAGES USING LINEAR STRUCTURES

METRIC ALIGNMENT OF LASER RANGE SCANS AND CALIBRATED IMAGES USING LINEAR STRUCTURES METRIC ALIGNMENT OF LASER RANGE SCANS AND CALIBRATED IMAGES USING LINEAR STRUCTURES Lorenzo Sorg CIRA the Italan Aerospace Research Centre Computer Vson and Vrtual Realty Lab. Outlne Work goal Work motvaton

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Image Alignment CSC 767

Image Alignment CSC 767 Image Algnment CSC 767 Image algnment Image from http://graphcs.cs.cmu.edu/courses/15-463/2010_fall/ Image algnment: Applcatons Panorama sttchng Image algnment: Applcatons Recognton of object nstances

More information

Angle-Independent 3D Reconstruction. Ji Zhang Mireille Boutin Daniel Aliaga

Angle-Independent 3D Reconstruction. Ji Zhang Mireille Boutin Daniel Aliaga Angle-Independent 3D Reconstructon J Zhang Mrelle Boutn Danel Alaga Goal: Structure from Moton To reconstruct the 3D geometry of a scene from a set of pctures (e.g. a move of the scene pont reconstructon

More information

Multi-stable Perception. Necker Cube

Multi-stable Perception. Necker Cube Mult-stable Percepton Necker Cube Spnnng dancer lluson, Nobuuk Kaahara Fttng and Algnment Computer Vson Szelsk 6.1 James Has Acknowledgment: Man sldes from Derek Hoem, Lana Lazebnk, and Grauman&Lebe 2008

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input Real-tme Jont Tracng of a Hand Manpulatng an Object from RGB-D Input Srnath Srdhar 1 Franzsa Mueller 1 Mchael Zollhöfer 1 Dan Casas 1 Antt Oulasvrta 2 Chrstan Theobalt 1 1 Max Planc Insttute for Informatcs

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Structure from Motion

Structure from Motion Structure from Moton Structure from Moton For now, statc scene and movng camera Equvalentl, rgdl movng scene and statc camera Lmtng case of stereo wth man cameras Lmtng case of multvew camera calbraton

More information

Fitting and Alignment

Fitting and Alignment Fttng and Algnment Computer Vson Ja-Bn Huang, Vrgna Tech Many sldes from S. Lazebnk and D. Hoem Admnstratve Stuffs HW 1 Competton: Edge Detecton Submsson lnk HW 2 wll be posted tonght Due Oct 09 (Mon)

More information

Geometric Primitive Refinement for Structured Light Cameras

Geometric Primitive Refinement for Structured Light Cameras Self Archve Verson Cte ths artcle as: Fuersattel, P., Placht, S., Maer, A. Ress, C - Geometrc Prmtve Refnement for Structured Lght Cameras. Machne Vson and Applcatons 2018) 29: 313. Geometrc Prmtve Refnement

More information

Ecient Computation of the Most Probable Motion from Fuzzy. Moshe Ben-Ezra Shmuel Peleg Michael Werman. The Hebrew University of Jerusalem

Ecient Computation of the Most Probable Motion from Fuzzy. Moshe Ben-Ezra Shmuel Peleg Michael Werman. The Hebrew University of Jerusalem Ecent Computaton of the Most Probable Moton from Fuzzy Correspondences Moshe Ben-Ezra Shmuel Peleg Mchael Werman Insttute of Computer Scence The Hebrew Unversty of Jerusalem 91904 Jerusalem, Israel Emal:

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Multi-view 3D Position Estimation of Sports Players

Multi-view 3D Position Estimation of Sports Players Mult-vew 3D Poston Estmaton of Sports Players Robbe Vos and Wlle Brnk Appled Mathematcs Department of Mathematcal Scences Unversty of Stellenbosch, South Afrca Emal: vosrobbe@gmal.com Abstract The problem

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

An efficient method to build panoramic image mosaics

An efficient method to build panoramic image mosaics An effcent method to buld panoramc mage mosacs Pattern Recognton Letters vol. 4 003 Dae-Hyun Km Yong-In Yoon Jong-Soo Cho School of Electrcal Engneerng and Computer Scence Kyungpook Natonal Unv. Abstract

More information

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS by XUNYU PAN (Under the Drecton of Suchendra M. Bhandarkar) ABSTRACT In modern tmes, more and more

More information

3D Modeling Using Multi-View Images. Jinjin Li. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science

3D Modeling Using Multi-View Images. Jinjin Li. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science 3D Modelng Usng Mult-Vew Images by Jnjn L A Thess Presented n Partal Fulfllment of the Requrements for the Degree Master of Scence Approved August by the Graduate Supervsory Commttee: Lna J. Karam, Char

More information

Prof. Feng Liu. Spring /24/2017

Prof. Feng Liu. Spring /24/2017 Prof. Feng Lu Sprng 2017 ttp://www.cs.pd.edu/~flu/courses/cs510/ 05/24/2017 Last me Compostng and Mattng 2 oday Vdeo Stablzaton Vdeo stablzaton ppelne 3 Orson Welles, ouc of Evl, 1958 4 Images courtesy

More information

What are the camera parameters? Where are the light sources? What is the mapping from radiance to pixel color? Want to solve for 3D geometry

What are the camera parameters? Where are the light sources? What is the mapping from radiance to pixel color? Want to solve for 3D geometry Today: Calbraton What are the camera parameters? Where are the lght sources? What s the mappng from radance to pel color? Why Calbrate? Want to solve for D geometry Alternatve approach Solve for D shape

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Wavefront Reconstructor

Wavefront Reconstructor A Dstrbuted Smplex B-Splne Based Wavefront Reconstructor Coen de Vsser and Mchel Verhaegen 14-12-201212 2012 Delft Unversty of Technology Contents Introducton Wavefront reconstructon usng Smplex B-Splnes

More information

Non-Parametric Structure-Based Calibration of Radially Symmetric Cameras

Non-Parametric Structure-Based Calibration of Radially Symmetric Cameras Non-Parametrc Structure-Based Calbraton of Radally Symmetrc Cameras Federco Camposeco, Torsten Sattler, Marc Pollefeys Department of Computer Scence, ETH Zürch, Swtzerland {federco.camposeco, torsten.sattler,

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A Comparison and Evaluation of Three Different Pose Estimation Algorithms In Detecting Low Texture Manufactured Objects

A Comparison and Evaluation of Three Different Pose Estimation Algorithms In Detecting Low Texture Manufactured Objects Clemson Unversty TgerPrnts All Theses Theses 12-2011 A Comparson and Evaluaton of Three Dfferent Pose Estmaton Algorthms In Detectng Low Texture Manufactured Objects Robert Krener Clemson Unversty, rkrene@clemson.edu

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

MOTION BLUR ESTIMATION AT CORNERS

MOTION BLUR ESTIMATION AT CORNERS Gacomo Boracch and Vncenzo Caglot Dpartmento d Elettronca e Informazone, Poltecnco d Mlano, Va Ponzo, 34/5-20133 MILANO boracch@elet.polm.t, caglot@elet.polm.t Keywords: Abstract: Pont Spread Functon Parameter

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

New dynamic zoom calibration technique for a stereo-vision based multi-view 3D modeling system

New dynamic zoom calibration technique for a stereo-vision based multi-view 3D modeling system New dynamc oom calbraton technque for a stereo-vson based mult-vew 3D modelng system Tao Xan, Soon-Yong Park, Mural Subbarao Dept. of Electrcal & Computer Engneerng * State Unv. of New York at Stony Brook,

More information

Vanishing Hull. Jinhui Hu, Suya You, Ulrich Neumann University of Southern California {jinhuihu,suyay,

Vanishing Hull. Jinhui Hu, Suya You, Ulrich Neumann University of Southern California {jinhuihu,suyay, Vanshng Hull Jnhu Hu Suya You Ulrch Neumann Unversty of Southern Calforna {jnhuhusuyay uneumann}@graphcs.usc.edu Abstract Vanshng ponts are valuable n many vson tasks such as orentaton estmaton pose recovery

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Direct Monocular Odometry Using Points and Lines

Direct Monocular Odometry Using Points and Lines Drect Monocular Odometry Usng Ponts and Lnes Shchao Yang, Sebastan Scherer Abstract Most vsual odometry algorthm for a monocular camera focuses on ponts, ether by feature matchng, or drect algnment of

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

UAV global pose estimation by matching forward-looking aerial images with satellite images

UAV global pose estimation by matching forward-looking aerial images with satellite images The 2009 IEEE/RSJ Internatonal Conference on Intellgent Robots and Systems October -5, 2009 St. Lous, USA UAV global pose estmaton by matchng forward-lookng aeral mages wth satellte mages Kl-Ho Son, Youngbae

More information

Delayed Features Initialization for Inverse Depth Monocular SLAM

Delayed Features Initialization for Inverse Depth Monocular SLAM Delayed Features Intalzaton for Inverse Depth Monocular SLAM Rodrgo Mungua and Anton Grau Department of Automatc Control, Techncal Unversty of Catalona, UPC c/ Pau Gargallo, 5 E-0808 Barcelona, Span, {rodrgo.mungua;anton.grau}@upc.edu

More information

State of the Art in Real-time Registration of RGB-D Images

State of the Art in Real-time Registration of RGB-D Images State of the Art n Real-tme Regstraton of RGB-D Images Patrck Stotko Supervsed by: Tm Golla Insttute of Computer Scence II - Computer Graphcs Unversty of Bonn Bonn / Germany Abstract 3D reconstructon ganed

More information

IMAGE MATCHING WITH SIFT FEATURES A PROBABILISTIC APPROACH

IMAGE MATCHING WITH SIFT FEATURES A PROBABILISTIC APPROACH IMAGE MATCHING WITH SIFT FEATURES A PROBABILISTIC APPROACH Jyot Joglekar a, *, Shrsh S. Gedam b a CSRE, IIT Bombay, Doctoral Student, Mumba, Inda jyotj@tb.ac.n b Centre of Studes n Resources Engneerng,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning Computer Anmaton and Vsualsaton Lecture 4. Rggng / Sknnng Taku Komura Overvew Sknnng / Rggng Background knowledge Lnear Blendng How to decde weghts? Example-based Method Anatomcal models Sknnng Assume

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Calibration of an Articulated Camera System with Scale Factor Estimation

Calibration of an Articulated Camera System with Scale Factor Estimation Calbraton of an Artculated Camera System wth Scale Factor Estmaton CHEN Junzhou, Kn Hong WONG arxv:.47v [cs.cv] 7 Oct Abstract Multple Camera Systems (MCS) have been wdely used n many vson applcatons and

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

AIMS Computer vision. AIMS Computer Vision. Outline. Outline.

AIMS Computer vision. AIMS Computer Vision. Outline. Outline. AIMS Computer Vson 1 Matchng, ndexng, and search 2 Object category detecton 3 Vsual geometry 1/2: Camera models and trangulaton 4 Vsual geometry 2/2: Reconstructon from multple vews AIMS Computer vson

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Accounting for the Use of Different Length Scale Factors in x, y and z Directions 1 Accountng for the Use of Dfferent Length Scale Factors n x, y and z Drectons Taha Soch (taha.soch@kcl.ac.uk) Imagng Scences & Bomedcal Engneerng, Kng s College London, The Rayne Insttute, St Thomas Hosptal,

More information

Implementation of a Dynamic Image-Based Rendering System

Implementation of a Dynamic Image-Based Rendering System Implementaton of a Dynamc Image-Based Renderng System Nklas Bakos, Claes Järvman and Mark Ollla 3 Norrköpng Vsualzaton and Interacton Studo Lnköpng Unversty Abstract Work n dynamc mage based renderng has

More information

Deterministic Hypothesis Generation for Robust Fitting of Multiple Structures

Deterministic Hypothesis Generation for Robust Fitting of Multiple Structures Determnstc Hypothess Generaton for Robust Fttng of Multple Structures Kwang Hee Lee, Chank Yu, and Sang Wook Lee, Member, IEEE Abstract We present a novel algorthm for generatng robust and consstent hypotheses

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

A Range Image Refinement Technique for Multi-view 3D Model Reconstruction

A Range Image Refinement Technique for Multi-view 3D Model Reconstruction A Range Image Refnement Technque for Mult-vew 3D Model Reconstructon Soon-Yong Park and Mural Subbarao Electrcal and Computer Engneerng State Unversty of New York at Stony Brook, USA E-mal: parksy@ece.sunysb.edu

More information

Optimal Combination of Stereo Camera Calibration from Arbitrary Stereo Images.

Optimal Combination of Stereo Camera Calibration from Arbitrary Stereo Images. Tna Memo No. 1991-002 Image and Vson Computng, 9(1), 27-32, 1990. Optmal Combnaton of Stereo Camera Calbraton from Arbtrary Stereo Images. N.A.Thacker and J.E.W.Mayhew. Last updated 6 / 9 / 2005 Imagng

More information

Line-based Camera Movement Estimation by Using Parallel Lines in Omnidirectional Video

Line-based Camera Movement Estimation by Using Parallel Lines in Omnidirectional Video 01 IEEE Internatonal Conference on Robotcs and Automaton RverCentre, Sant Paul, Mnnesota, USA May 14-18, 01 Lne-based Camera Movement Estmaton by Usng Parallel Lnes n Omndrectonal Vdeo Ryosuke kawansh,

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Robust Computation and Parametrization of Multiple View. Relations. Oxford University, OX1 3PJ. Gaussian).

Robust Computation and Parametrization of Multiple View. Relations. Oxford University, OX1 3PJ. Gaussian). Robust Computaton and Parametrzaton of Multple Vew Relatons Phl Torr and Andrew Zsserman Robotcs Research Group, Department of Engneerng Scence Oxford Unversty, OX1 3PJ. Abstract A new method s presented

More information

Dynamic wetting property investigation of AFM tips in micro/nanoscale

Dynamic wetting property investigation of AFM tips in micro/nanoscale Dynamc wettng property nvestgaton of AFM tps n mcro/nanoscale The wettng propertes of AFM probe tps are of concern n AFM tp related force measurement, fabrcaton, and manpulaton technques, such as dp-pen

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Model-Based Bundle Adjustment to Face Modeling

Model-Based Bundle Adjustment to Face Modeling Model-Based Bundle Adjustment to Face Modelng Oscar K. Au Ivor W. sang Shrley Y. Wong oscarau@cs.ust.hk vor@cs.ust.hk shrleyw@cs.ust.hk he Hong Kong Unversty of Scence and echnology Realstc facal synthess

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Pose Estimation in Heavy Clutter using a Multi-Flash Camera

Pose Estimation in Heavy Clutter using a Multi-Flash Camera 2010 IEEE Internatonal Conference on Robotcs and Automaton Anchorage Conventon Dstrct May 3-8, 2010, Anchorage, Alaska, USA Pose Estmaton n Heavy Clutter usng a Mult-Flash Camera Mng-Yu Lu, Oncel Tuzel,

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Generalized-ICP. Aleksandr V. Segal Stanford University Dirk Haehnel Stanford University

Generalized-ICP. Aleksandr V. Segal Stanford University   Dirk Haehnel Stanford University Generalzed-ICP Aleksandr V. Segal Stanford Unversty Emal: avsegal@cs.stanford.edu Drk Haehnel Stanford Unversty Emal: haehnel@stanford.edu Sebastan hrun Stanford Unversty Emal: thrun@stanford.edu Abstract

More information

Improved SIFT-Features Matching for Object Recognition

Improved SIFT-Features Matching for Object Recognition Improved SIFT-Features Matchng for Obect Recognton Fara Alhwarn, Chao Wang, Danela Rstć-Durrant, Axel Gräser Insttute of Automaton, Unversty of Bremen, FB / NW Otto-Hahn-Allee D-8359 Bremen Emals: {alhwarn,wang,rstc,ag}@at.un-bremen.de

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Interactive Virtual Relighting of Real Scenes

Interactive Virtual Relighting of Real Scenes Frst submtted: October 1998 (#846). Edtor/revewers please consult accompanyng document wth detaled responses to revewer comments. Interactve Vrtual Relghtng of Real Scenes Célne Loscos, George Drettaks,

More information

PROJECTIVE RECONSTRUCTION OF BUILDING SHAPE FROM SILHOUETTE IMAGES ACQUIRED FROM UNCALIBRATED CAMERAS

PROJECTIVE RECONSTRUCTION OF BUILDING SHAPE FROM SILHOUETTE IMAGES ACQUIRED FROM UNCALIBRATED CAMERAS PROJECTIVE RECONSTRUCTION OF BUILDING SHAPE FROM SILHOUETTE IMAGES ACQUIRED FROM UNCALIBRATED CAMERAS Po-Lun La and Alper Ylmaz Photogrammetrc Computer Vson Lab Oho State Unversty, Columbus, Oho, USA -la.138@osu.edu,

More information

Large Motion Estimation for Omnidirectional Vision

Large Motion Estimation for Omnidirectional Vision Large Moton Estmaton for Omndrectonal Vson Jong Weon Lee, Suya You, and Ulrch Neumann Computer Scence Department Integrated Meda Systems Center Unversty of Southern Calforna Los Angeles, CA 98978, USA

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

arxiv: v1 [cs.ro] 8 Jul 2016

arxiv: v1 [cs.ro] 8 Jul 2016 Non-Central Catadoptrc Cameras Pose Estmaton usng 3D Lnes* André Mateus, Pedro Mraldo and Pedro U. Lma arxv:1607.02290v1 [cs.ro] 8 Jul 2016 Abstract In ths artcle we purpose a novel method for planar pose

More information

Hierarchical Motion Consistency Constraint for Efficient Geometrical Verification in UAV Image Matching

Hierarchical Motion Consistency Constraint for Efficient Geometrical Verification in UAV Image Matching Herarchcal Moton Consstency Constrant for Effcent Geometrcal Verfcaton n UAV Image Matchng San Jang 1, Wanshou Jang 1,2, * 1 State Key Laboratory of Informaton Engneerng n Surveyng, Mappng and Remote Sensng,

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

A Scalable Projective Bundle Adjustment Algorithm using the L Norm

A Scalable Projective Bundle Adjustment Algorithm using the L Norm Sxth Indan Conference on Computer Vson, Graphcs & Image Processng A Scalable Projectve Bundle Adjustment Algorthm usng the Norm Kaushk Mtra and Rama Chellappa Dept. of Electrcal and Computer Engneerng

More information

Calibration of an Articulated Camera System

Calibration of an Articulated Camera System Calbraton of an Artculated Camera System CHEN Junzhou and Kn Hong WONG Department of Computer Scence and Engneerng The Chnese Unversty of Hong Kong {jzchen, khwong}@cse.cuhk.edu.hk Abstract Multple Camera

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Simultaneous Object Pose and Velocity Computation Using a Single View from a Rolling Shutter Camera

Simultaneous Object Pose and Velocity Computation Using a Single View from a Rolling Shutter Camera Smultaneous Object Pose and Velocty Computaton Usng a Sngle Vew from a Rollng Shutter Camera Omar At-Ader, Ncolas Andreff, Jean Marc Lavest, and Phlppe Martnet Unversté Blase Pascal Clermont Ferrand, LASMEA

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information