Direct Monocular Odometry Using Points and Lines



Shichao Yang, Sebastian Scherer

Abstract: Most visual odometry algorithms for a monocular camera focus on points, either by feature matching or by direct alignment of pixel intensity, while ignoring a common but important geometric entity: edges. In this paper, we propose an odometry algorithm that combines points and edges to benefit from the advantages of both direct and feature-based methods. It works better in texture-less environments and is also more robust to lighting changes and fast motion by increasing the convergence basin. We maintain a depth map for the keyframe. In the tracking part, the camera pose is recovered by minimizing both the photometric error and the geometric error to the matched edge in a probabilistic framework. In the mapping part, edges are used to speed up stereo matching and increase its accuracy. On various public datasets, our algorithm achieves better or comparable performance than state-of-the-art monocular odometry methods. In some challenging texture-less environments, our algorithm reduces the state estimation error by over 50%.

I. INTRODUCTION

Visual odometry (VO) and simultaneous localization and mapping (SLAM) have become popular topics in recent years due to their wide application in robot navigation, 3D reconstruction, and virtual reality. Different sensors can be used, such as RGB-D cameras [1], stereo cameras [2] and lasers, which provide depth information for each frame and make state estimation and mapping easier. However, for some applications such as weight-constrained micro aerial vehicles [3], monocular cameras are more widely used due to their small size and low cost. Therefore, in this work, we target the more challenging monocular VO problem.

There are typically two categories of VO and vSLAM approaches. (1) Feature-based methods such as PTAM [4] and ORB SLAM [5] rely on feature point extraction and matching to create a sparse 3D map, which is used for pose estimation by minimizing the re-projection geometric error. (2) Recently, the direct method [6] [7] has also become popular.
It operates directly on raw pixel intensities by minimizing the photometric error, without feature extraction. Both approaches have their advantages. The re-projection geometric error of keypoints is typically more robust to image noise and to large geometric distortions and movement. The direct method, on the other hand, exploits much more image information and can create dense or semi-dense maps. In this paper, we utilize points and edges to combine the advantages of the two approaches.

The edge is another important feature apart from points. It has been used for stereo [8] and RGB-D VO [9], but has received less attention in monocular VO. The detection of edges is by nature less sensitive to lighting changes. For example, in the homogeneous environment of Fig. 1, a direct method using points only may not work robustly due to small image gradients, but we can still detect many edges (shown in blue in the figure) that can be used for state estimation and mapping.

In our system, we maintain a semi-dense depth map for the keyframe's high-gradient pixels, as in many direct VO methods [6]. We also detect and match edges in each frame. In the tracking part, we jointly optimize the photometric error and, where a pixel has a corresponding edge, the geometric error to that edge. In the mapping part, edges are used to guide and speed up the stereo search and to improve depth map quality through edge regularization. By doing this, the proposed VO increases the accuracy of state estimation and also creates a good semi-dense map. We demonstrate this through various experiments.

Fig. 1. Tracking and 3D reconstruction on the TUM mono dataset using our edge-based visual odometry. The top image shows a homogeneous wall surface with low image gradients, which is challenging for VO that only minimizes the photometric error. However, edges, shown in blue, can still be detected to improve tracking and mapping performance.

The Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA. {shichaoy, basti}@andrew.cmu.edu
In summary, our main contributions are:
- A real-time monocular visual odometry algorithm incorporating points and edges, especially suitable for texture-less environments.
- An uncertainty analysis and probabilistic fusion of points and lines in tracking and mapping.
- An analytical edge-based regularization.
- Performance better than or comparable to existing direct VO on many datasets.

In the following section, we discuss related work. In Section III, we provide the problem formulation. Tracking and mapping using points and edges are presented in Section IV and Section V respectively, which also include a probabilistic uncertainty analysis of the different observation models. In Section VI, we provide experimental comparisons with state-of-the-art algorithms. Finally, conclusions and future work are discussed in Section VII.

II. RELATED WORK

Our algorithm utilizes edges to combine feature-based and direct VO. We briefly introduce these three aspects.

A. Feature based VO

There are many feature point based VO and SLAM systems, for example LibVISO [2] and ORB SLAM [5]. They first extract image features, then track or match them across images. The camera pose is estimated by solving the PnP (Perspective-n-Point) problem to minimize the geometric error, which is more robust to image noise and has a large convergence basin [5] [10]. The drawback is that the created map is usually sparse, so a separate direct mapping algorithm is required to obtain a semi-dense map [11].

B. Direct VO

In recent years, the direct method [12] has also become popular. It optimizes the geometry directly on the image intensities without any feature extraction, so it can work in some texture-less environments with few keypoints. It has been used in real-time applications with different sensors, for example DVO for RGB-D cameras [13] and LSD SLAM for monocular cameras [7]. The core idea is to maintain a semi-dense map for keyframes and then minimize the photometric error. This error is a highly non-convex function, so the optimization requires a good initial guess. In between direct and feature-based methods, SVO combines direct alignment and feature points and can be used with high frame rate cameras.

C. Edge based VO

Edges are another important feature apart from points, especially in man-made environments. Edges are more robust to lighting changes and preserve more information than single points. Line-based bundle adjustment has been used in SLAM or SfM [14] [15], but it is computationally expensive and requires at least three frames for effective optimization.
Line-based VO without bundle adjustment has recently been used for stereo cameras [16] [8], RGB-D cameras [17] [9] and monocular cameras [18] [19]. Kuse et al. [9] minimize the geometric error of a pixel to nearby edge pixels through a distance transform, which may cause wrong matches due to falsely detected or broken edges, while our line segment matching can greatly reduce this error. Some works only minimize the geometric error of the two edge endpoints [8] [17], which may generate large errors for monocular cameras due to inaccurate depth estimation.

III. PROBLEM DESCRIPTION

A. System Overview

Our algorithm is a frame-to-keyframe monocular VO. We maintain a semi-dense depth map for the high-gradient pixels in the keyframe. Then, for each incoming frame, there are three steps. First, we detect line segments and match them with the keyframe's edges. The second step is camera pose tracking: we minimize a combination of the pixel photometric error and, if the pixel belongs to an edge, the geometric re-projection error. Lastly, we update the depth map through variable-baseline stereo. Edges are used to speed up the stereo search for edge pixels and also to improve the reconstruction through an efficient 3D line regularization.

B. Notations

We denote an intensity image as $I: \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, where $\Omega$ represents the image domain. We keep a per-pixel inverse depth map for a reference keyframe, $D: \Omega \subset \mathbb{R}^2 \to \mathbb{R}^+$, and an inverse depth variance map $V: \Omega \to \mathbb{R}^+$. The camera projection function is defined as $\pi: \mathbb{R}^3 \to \mathbb{R}^2$, which projects a camera-centered 3D point onto the image plane. The inverse projection function is then $\pi^{-1}: (\mathbb{R}^2, \mathbb{R}) \to \mathbb{R}^3$, which back-projects an image pixel to 3D space given its depth. The transformation between the current frame and the reference keyframe is a rigid transformation $T \in SE(3)$. For efficient optimization of $T$, we use the minimal manifold representation given by elements of the Lie algebra $\xi \in \mathfrak{se}(3)$ [20], expressed as a twist $\xi = (t; w)^T \in \mathbb{R}^6$. Here $t \in \mathbb{R}^3$ is the translation component and $w \in \mathbb{R}^3$ is the rotation component, which forms a rotation matrix through the corresponding exponential map: $R(w) = \exp([w]_\times) \in SO(3)$.
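The notation above only states that the rotation is obtained as $R(w) = \exp([w]_\times)$. As an illustration, a minimal Python/NumPy sketch of that exponential map using Rodrigues' formula follows. The function names `hat` and `so3_exp` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix [w]_x such that hat(w) @ v == np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def so3_exp(w):
    """Rotation matrix R(w) = exp([w]_x) computed with Rodrigues' formula."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-10:
        # First-order Taylor expansion near the identity
        return np.eye(3) + W
    A = np.sin(theta) / theta
    B = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + A * W + B * (W @ W)

# A rotation of pi/2 about z maps the x axis onto the y axis.
R = so3_exp([0.0, 0.0, np.pi / 2])
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))
```

The closed form avoids a generic matrix exponential and keeps $R$ orthonormal up to rounding, which is why it is the usual choice inside Gauss-Newton loops over $\mathfrak{se}(3)$ parameters.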
A warping function is defined as $\tau: \Omega_1 \times \mathbb{R} \times \mathbb{R}^6 \to \Omega_2$. It takes a pixel $x \in \Omega_1$ in the first image $I_1$, its depth $d$ and the relative camera transformation $\xi$, and returns the re-projected point in the second image $I_2$. Internally, it first back-projects $x$ to a 3D point using $\pi^{-1}$, transforms it using $\xi$, then projects it into the other frame using $\pi$.

An edge is represented by $L$ in 3D space and by $l$ in the 2D image plane. All the edge pixels in an image are defined by $M: \mathbb{R}^2 \to l$, which maps a pixel to its edge. Most of the edge pixels $M$ belong to the high-gradient pixel set $\Omega$.

IV. TRACKING

A. Overview

In the tracking thread, the depth map $D_{ref}$ of the reference frame $I_{ref}$ is assumed to be fixed. The current image $I$ is aligned by minimizing the photometric residual $r(\xi)$ and the line re-projection geometric error $g(\xi)$, which correspond to two observation models: photometric intensity observations and edge position observations. This can be formulated as the following non-linear least squares problem:

$$E(\xi) = \sum_{i \in \Omega} r_i(\xi)^T \Sigma_{r_i}^{-1} r_i(\xi) + \sum_{j \in M} g_j(\xi)^T \Sigma_{g_j}^{-1} g_j(\xi) \qquad (1)$$
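The warping function $\tau$ can be sketched as follows, together with a homogeneous point-to-line residual of the kind used in the line term of Equation (1). This is a minimal sketch under stated assumptions, not the authors' implementation: we assume a pinhole model with intrinsic matrix `K`, a depth argument given as inverse depth (matching the inverse depth map $D$), a relative transform already converted from $\xi$ to `(R, t)` with $X_{cur} = R X_{ref} + t$, and a line rescaled so that the residual is a signed pixel distance. All function names are hypothetical.

```python
import numpy as np

def backproject(x, inv_depth, K):
    """pi^{-1}: pixel (u, v) plus inverse depth -> 3D point in the camera frame."""
    u, v = x
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # K^{-1} [u, v, 1]^T, z = 1
    return ray / inv_depth

def project(X, K):
    """pi: 3D point in the camera frame -> pixel (u, v)."""
    p = K @ X
    return p[:2] / p[2]

def warp(x, inv_depth, R, t, K):
    """tau(x, d, xi): back-project x, apply the rigid transform (R, t), re-project."""
    X = backproject(x, inv_depth, K)
    return project(R @ X + t, K)

def point_to_line_residual(x_warped, line):
    """g = l^T x_hat, with l rescaled so (a, b) has unit norm (signed pixel distance)."""
    a, b, c = line
    l = np.array([a, b, c], dtype=float) / np.hypot(a, b)
    return float(l @ np.append(x_warped, 1.0))

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
x_cur = warp((100.0, 50.0), 0.5, np.eye(3), np.array([0.1, 0.0, 0.0]), K)
print(x_cur)                                           # shifted by f * tx / z = 25 px in u
print(point_to_line_residual(x_cur, (1.0, 0.0, -120.0)))  # distance to the line u = 120
```

Normalizing the line is a design choice of this sketch: it makes the residual independent of how the homogeneous line was scaled (for example by a cross product of endpoints), so its variance can be expressed in pixels.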

Here, the photometric error $r_i$ is defined by [6] [12]:

$$r_i = I_{ref}(x_i) - I\big(\tau(x_i, D_{ref}(x_i), \xi)\big) \qquad (2)$$

and $g_j$ is the re-projection error of pixel $x_i$ to its corresponding line $l_j$ (in homogeneous line representation):

$$g_j = l_j^T \, \hat{\tau}(x_i, D_{ref}(x_i), \xi) \qquad (3)$$

where $\hat{\tau}(\cdot)$ denotes the homogeneous coordinate operation applied to the warped point. This term is only used for pixels of edges in $I_{ref}$ that also have a matching edge in $I$. $\Sigma_r$ and $\Sigma_g$ represent the uncertainties of the two errors respectively. The energy function in Equation (1) is minimized through iterative Gauss-Newton optimization. For iteration $n$, the update step is:

$$\delta\xi^{(n)} = -(J^T W J)^{-1} J^T W E(\xi^{(n)}) \qquad (4)$$

where $E$ is the stacked error vector composed of two parts, $E = (r_1, \ldots, r_n, g_1, \ldots, g_m)^T$, $J$ is the Jacobian of $E$ with respect to $\xi$, and $W$ is the weight matrix computed from the inverse uncertainties $\Sigma^{-1}$. A tracking illustration is shown in Fig. 2.

Fig. 2. Tracking iterations for two images in the TUM fr3/cabinet big dataset. (a) Reference frame with detected edges. (b) Current frame with detected edges. The two frames are 41 frames apart (about 1.4s). (c) Re-projected pixels on the current frame during optimization iterations 1, 4, 9 and 20. The re-projected edge pixels in green gradually align with the true edges in red. Best viewed in color.

B. Tracking uncertainty analysis

Combining different types of error terms in Equation (1) increases the robustness and accuracy of pose estimation. The weights of the different terms are proportional to the inverse of the error variances $\Sigma_r$ and $\Sigma_g$ computed from the observation models. Here, we provide an analysis of $\Sigma_g$; the photometric error uncertainty $\Sigma_r$ has been analysed in [6]. In the general case, the uncertainty of the output of a function $f(x)$ propagated from the input uncertainty is expressed by:

$$\Sigma_f \approx J_f \Sigma_x J_f^T \qquad (5)$$

where $J_f$ is the Jacobian of $f$ with respect to $x$.

In our case, as defined in Equation (3), the pixel re-projection error to a line is a function of the line equation $l_j$ and the re-projected point $x' = \hat{\tau}(x_i, D_{ref}(x_i), \xi)$. The line equation is computed as the cross product of the two line endpoints, $l_j = p_1 \times p_2$. We assume that the uncertainties $\Sigma_p$ of the endpoint positions $p_1$ and $p_2$ are two-dimensional Gaussians with $\sigma = 1$. Then we can use the rule in Equation (5) to compute the uncertainty $\Sigma_{l_j}$ of the line equation coefficients. This essentially implies that a longer line has a smaller line fitting uncertainty. We can similarly compute the covariance $\Sigma_{x'}$ of the re-projected point $x'$, which is a function of the pixel depth $D_{ref}(x_i)$ with variance $V_{ref}(x_i)$. The final re-projection error covariance is a combination of the two uncertainty sources:

$$\Sigma_{g_j} = l_j^T \Sigma_{x'} l_j + x'^T \Sigma_{l_j} x' \qquad (6)$$

V. MAPPING

A. Overview

In the mapping thread, the depth map $D_{ref}$ of the reference frame is updated through stereo triangulation in an inverse depth filtering framework [6], followed by line regularization to improve accuracy. The camera pose is assumed to be fixed in this step. The cost function for depth optimization is defined as follows:

$$E(D) = \sum_i r_i(d)^T \Sigma_{r_i}^{-1} r_i(d) + \sum_j G_j(d)^T \Sigma_{G_j}^{-1} G_j(d) \qquad (7)$$

where $r_i(d)$ is the stereo matching photometric error; the SSD error over image patches is used to improve robustness. For a line $l_j$, we want its pixels to also form a line in 3D space after back-projection, so $G_j$ is an edge regularization cost representing the distance of an edge pixel's 3D point to the 3D line. This regularization technique is also used in other dense mapping algorithms [12] [21]. If only the first term $r_i$ is used [6], all pixels are independent of each other and can therefore be searched independently along the epipolar line to find the matching pixel. The regularization term $G_j$ makes the depths of pixels on one edge correlate with each other, and such a problem is typically solved by iterative alternating optimization through duality principles [12]. However, that requires much heavier computation. Instead, we optimize $r_i$ and $G_j$ more efficiently in two stages.

Fig. 3. Line triangulation. The 3D line $L$ can be computed as the intersection of two back-projected planes $\pi$, $\pi'$. For each pixel on $l$, its stereo matching point is the intersection of the epipolar line $g$ and the matched edge $l'$. The triangulated point also lies on the 3D line $L$. Modified from [22].

B. Stereo match with Lines

For pixels not on an edge, or on an edge that does not have a matching edge, we perform an exhaustive search for the stereo matching pixel by minimizing the SSD error [6]. The depth interval for the search is limited to $d \pm 2\sigma_d$, where $d$ and $\sigma_d$ are the depth mean and standard deviation. For pixels with a matched edge, the re-projected point should lie on the matched edge as well as on its epipolar line, so we can directly compute their intersection as the matching point. We can also directly perform line triangulation, as in Fig. 3, to compute the depths of all pixels on the edge together. If the camera transform of the current frame $I$ with respect to $I_{ref}$ is $R \in SO(3)$, $t \in \mathbb{R}^3$, then the 3D line $L$ can be represented as the intersection of two back-projected planes [22]:

$$L = \begin{bmatrix} \pi_1^T \\ \pi_2^T \end{bmatrix} = \begin{bmatrix} l_1^T K & 0 \\ l_2^T K R & l_2^T K t \end{bmatrix} \qquad (8)$$

where $l_1$ and $l_2$ are the line equations in $I_{ref}$ and $I$ respectively, and $K$ is the intrinsic camera matrix. Then, for each pixel, we can compute the intersection of its back-projected ray with $L$ to get its depth. In the degenerate case where the epipolar line and the matched edge are (nearly) parallel, we cannot compute the 3D line accurately by plane intersection.
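The plane-intersection triangulation of Equation (8) can be sketched as below, assuming `K`, `R`, `t` follow the convention $X_{cur} = R X_{ref} + t$ and lines are homogeneous 3-vectors (for example cross products of endpoints). Because a reference pixel on $l_1$ back-projects to a ray lying inside $\pi_1$, intersecting that ray with $\pi_2$ is enough to recover its depth. The near-zero denominator check is our own illustrative way of flagging the degenerate (nearly parallel) case; the function names and the tolerance `eps` are hypothetical.

```python
import numpy as np

def line_from_planes(l_ref, l_cur, K, R, t):
    """Eq. (8): stack pi_1^T = (P_1^T l_ref)^T and pi_2^T = (P_2^T l_cur)^T as a 2x4 matrix.

    P_1 = K [I | 0] is the reference camera and P_2 = K [R | t] the current one."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, np.reshape(t, (3, 1))])
    return np.vstack([P1.T @ l_ref, P2.T @ l_cur])

def depth_from_line(x, L, K, eps=1e-9):
    """Depth of reference pixel x (assumed to lie on l_ref) on the 3D line L.

    The ray through x lies in pi_1, so intersecting it with pi_2 gives the point on L.
    Returns None when the ray is (nearly) parallel to pi_2, i.e. the degenerate case."""
    ray = np.linalg.solve(K, np.array([x[0], x[1], 1.0]))  # ray direction with z = 1
    n, d = L[1, :3], L[1, 3]
    denom = n @ ray
    if abs(denom) < eps * np.linalg.norm(n) * np.linalg.norm(ray):
        return None
    return -d / denom  # X = s * ray and z(ray) = 1, so the scale s is the depth

# Synthetic check: project a known 3D segment into both views, then recover a depth.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([-0.2, 0.0, 0.0])
hom = lambda X: (K @ X) / (K @ X)[2]          # homogeneous pixel of a 3D point
A, B = np.array([-0.5, -0.2, 3.0]), np.array([0.5, 0.3, 4.0])
L = line_from_planes(np.cross(hom(A), hom(B)),
                     np.cross(hom(R @ A + t), hom(R @ B + t)), K, R, t)
print(depth_from_line(hom(0.5 * (A + B))[:2], L, K))  # close to 3.5
```

Every pixel on the edge reuses the same two planes, so the per-pixel cost is one small linear solve and a dot product, which matches the motivation of avoiding a per-pixel exhaustive search.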
For such nearly parallel configurations, we instead use an exhaustive search along the epipolar line to find the matching pixel with minimal SSD error.

C. Line matching uncertainty analysis

The uncertainty of intensity-based stereo search along the epipolar line has been analysed in [6]. Here we add an analysis of the edge-based stereo matching error. For each edge pixel in $I_{ref}$, denote its epipolar line in $I$ as $g$ and its matched edge as $l$; the matching pixel is then the intersection of $g$ and $l$. These two lines have positioning errors $\epsilon_g$ and $\epsilon_l$, which cause a disparity error $\epsilon_\lambda$, as shown in Fig. 4. The edge uncertainty $\epsilon_l$ has already been analysed in Section IV-B and is directly related to the edge length. $\epsilon_\lambda$ is large when $g$ and $l$ are nearly parallel. Mathematically, we have:

$$\epsilon_\lambda = \frac{\epsilon_l}{\sin\theta} + \epsilon_g \cot\theta \qquad (9)$$

where $\theta$ is the angle between lines $l$ and $g$. From the error propagation rule in Equation (5), we can compute the variance of the disparity error:

$$\sigma_\lambda^2 = \frac{\sigma_l^2}{\sin^2\theta} + \sigma_g^2 \cot^2\theta \qquad (10)$$

Using the approximation that the inverse depth $d$ is proportional to the disparity $\lambda$, we can calculate the observation variance of $d$ using Equation (5). It can then be used to update the pixel's depth variance in standard EKF filtering [6].

Fig. 4. Disparity error using line matching. $l$ is the edge on which the matched pixel should lie, and $g$ is the epipolar line. Due to a small positioning error $\epsilon_g$, $g$ is shifted to $g'$; likewise, $l$ is shifted to $l'$ by a positioning error $\epsilon_l$. The final resulting disparity error is $\epsilon_\lambda$.

D. 3D Line regularization

Depth map regularization is important for monocular mapping approaches to improve depth estimation accuracy. After the EKF depth map update in Section V-C, the pixels on a 2D edge may not correspond to a line in 3D space; therefore, we need to fit lines in 3D space and update the pixels' depths. Weighted 3D line fitting was recently addressed in RGB-D line-based odometry [17], which uses Levenberg-Marquardt iterative optimization to find the best 3D line. Here we propose a fast and analytical solution to the weighted 3D line fitting problem.
Since the 3D points are back-projected from the same 2D edge, they should lie on the same plane $G$ by projective geometry. We can create another coordinate frame $F$ whose $X$, $Y$ axes lie on the plane $G$. A point transformed into the new coordinate frame is denoted as $p'$. We first use RANSAC to select a set of inlier 2D points. The metric for RANSAC is the Mahalanobis distance, which is a point-to-line Euclidean distance weighted by the point uncertainty:

$$d_{mah} = \min_{q \in l} \sqrt{(p' - q)^T \Sigma_{p'}^{-1} (p' - q)} \qquad (11)$$

where $q \in l$ indicates a point lying on line $l$ in frame $F$. $d_{mah}$ can be computed analytically by taking the derivative with respect to $q$ and setting it to zero. More details can be found in [17].
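Setting the derivative to zero, with a Lagrange multiplier for the constraint $q \in l$, gives a closed form: for a line $n^T q + c = 0$, the minimizer is $q^* = p' - \lambda \Sigma_{p'} n$ with $\lambda = (n^T p' + c)/(n^T \Sigma_{p'} n)$, so $d_{mah} = |n^T p' + c| / \sqrt{n^T \Sigma_{p'} n}$. This derivation and the sketch below are ours, offered as an illustration rather than the exact formulation of [17]; the function name is hypothetical.

```python
import numpy as np

def mahalanobis_point_to_line(p, line, cov):
    """Closed-form min over q on the line of sqrt((p - q)^T cov^{-1} (p - q)).

    p: 2D point in frame F; line: (a, b, c) meaning a*x + b*y + c = 0;
    cov: 2x2 covariance of p. Returns (distance, closest point q*)."""
    p = np.asarray(p, dtype=float)
    n = np.asarray(line[:2], dtype=float)
    c = float(line[2])
    cov = np.asarray(cov, dtype=float)
    s = n @ cov @ n              # n^T Sigma n: variance of the signed residual n^T p + c
    lam = (n @ p + c) / s        # Lagrange multiplier from the line constraint
    q = p - lam * (cov @ n)      # closest point on the line in the Mahalanobis sense
    return abs(n @ p + c) / np.sqrt(s), q

# A point 2 units from the line x = 1; larger variance along x shrinks the distance.
print(mahalanobis_point_to_line([3.0, 5.0], (1.0, 0.0, -1.0), np.eye(2))[0])            # 2.0
print(mahalanobis_point_to_line([3.0, 5.0], (1.0, 0.0, -1.0), np.diag([4.0, 1.0]))[0])  # 1.0
```

In a RANSAC loop, a point would count as an inlier of a candidate line when this distance falls below a chosen threshold; the threshold value is not given in the text.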


Fig. 5. 3D line regularization. We first un-project the pixels to 3D, shown as red dots on the 3D plane $G$. The pink ellipse shows the uncertainty of a 3D point. We transform the 3D points to a coordinate frame $F$ lying on the grey plane, whose axes $X$, $Y$ are shown in blue. Then we use RANSAC and analytically compute a weighted least squares line instead of using iterative optimization [17]. Image modified from [17]. Best viewed in color.

After RANSAC, we find the largest consensus set of points $p'_i$, $i = 1, \ldots, n$. This becomes a 2D weighted line fitting problem, and we want to find the best line $L^*$ such that:

$$L^* = \arg\min_L \sum_i \delta(p'_i)^T \Sigma_{p'_i}^{-1} \delta(p'_i) \qquad (12)$$

where $\delta(p'_i)$ is the distance from the point to line $L$ along the $Y$ axis. It is an approximation of the point-to-line distance but leads to a closed-form solution. We stack the coordinates of all points $p'_i$ as $[X, Y]$ (after subtracting the mean) and form the weight matrix $W$, which can be approximated from the covariance of the original image pixels. The line model under consideration is then $Y = X\beta + \varepsilon$, where $\beta$ is the line coefficient and $\varepsilon$ is assumed to be a normally distributed noise vector. The maximum likelihood line under Gaussian noise is:

$$\hat{\beta} = \arg\min_\beta \| Y - X\beta \|_W^2 = (X^T W X)^{-1} X^T W Y \qquad (13)$$

We can then transform the optimal line $L$ in coordinate frame $F$ back to the original camera optical frame and determine the depth of the pixels on the line.

VI. EXPERIMENTS AND RESULTS

A. Implementation

1. Edge detection and matching: We use the public line segment detection algorithm [23]. To improve tracking accuracy, we adopt a coarse-to-fine approach using two pyramid levels with a scale factor of two. Due to the uncertainty of the line detection algorithm, one complete line can sometimes break into multiple segments, so we explicitly merge two lines whose angles and distances are within a close threshold. After that, we remove very short line segments, which may have large line fitting errors. To speed up line merging, lines are assigned to different bucket grids indexed by the midpoint and the orientation of each edge, so we only need to consider possible merges within the same and nearby buckets. We then compute the LBD descriptor [24] for each line and match lines across images. The bucket technique is also utilized to speed up the matching. Finally, line tracing is performed to find all pixels on an edge. We find that the system becomes more robust and accurate if we expand each line by one pixel, possibly because more pixels are then involved in the line constraints in tracking and mapping.

2. Keyframe-based VO: Our approach does not include bundle adjustment of points and lines as in SLAM and SfM frameworks, but it could be extended in that way to improve performance. Camera tracking, line matching and stereo mapping are implemented only between the current frame and the keyframe.

Fig. 6. Example images in TUM datasets with varying textures: (a) fr2/desk, (b) fr3/cabinet, (c) fr3/notex-far. ORB SLAM performs worse on (b) and (c) as there are fewer feature points. Our algorithm can still utilize the matched edge features to improve the state estimation.

B. Experiments

In this section, we test our algorithm on various public datasets including TUM RGBD [25], TUM mono [26] and ICL-NUIM [27]. We mainly compare with the state-of-the-art monocular direct method SDVO [6] and the feature-based ORB SLAM [5]. We also provide some comparisons with edge-based VO [18] [19] on the datasets where their results are available. For ORB SLAM, we turn off the loop closing thread but still keep local and global bundle adjustment (BA), which can detect incremental loop closures, while our algorithm and SDVO are VO algorithms without BA. We use the relative position error (RPE) metric of Sturm et al. [25]:

$$E_i = \left(Q_i^{-1} Q_{i+\delta}\right)^{-1} \left(B_i^{-1} B_{i+\delta}\right) \qquad (14)$$

where $Q_i \in SE(3)$ is the sequence of ground truth poses and $B_i \in SE(3)$ is the estimated pose. Scale is estimated to best align the trajectory.

1) Qualitative results: We choose TUM mono/38 [26] for VO and mapping visualization, shown in Fig. 1. It mainly contains homogeneous white surfaces, but there are still many edges that can be utilized. Our method generates good-quality maps and state estimates.
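The relative pose error of Equation (14) can be sketched as follows, assuming the ground-truth poses $Q_i$ and estimated poses $B_i$ are 4x4 homogeneous matrices that are already time-associated and scale-aligned. Summarizing the translational parts of $E_i$ by their RMSE follows the common convention of the TUM benchmark tools [25]; the text does not state which statistic the tables report, so treat this as an illustration. Function names are hypothetical.

```python
import numpy as np

def relative_pose_errors(Q, B, delta=1):
    """E_i = (Q_i^{-1} Q_{i+delta})^{-1} (B_i^{-1} B_{i+delta}) for every valid i."""
    inv = np.linalg.inv
    return [inv(inv(Q[i]) @ Q[i + delta]) @ (inv(B[i]) @ B[i + delta])
            for i in range(len(Q) - delta)]

def translational_rpe_rmse(Q, B, delta=1):
    """RMSE of the translation components of all E_i."""
    errors = [np.linalg.norm(E[:3, 3]) for E in relative_pose_errors(Q, B, delta)]
    return float(np.sqrt(np.mean(np.square(errors))))

def make_pose(yaw, x, y, z):
    """Helper: 4x4 pose with a rotation of `yaw` about z and translation (x, y, z)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [x, y, z]
    return T

# The estimate overshoots every unit step along x by 10%, giving an RPE of about 0.1.
Q = [make_pose(0.0, float(i), 0.0, 0.0) for i in range(5)]
B = [make_pose(0.0, 1.1 * i, 0.0, 0.0) for i in range(5)]
print(translational_rpe_rmse(Q, B))
```

Because only relative motions over the window $\delta$ enter $E_i$, a constant rigid offset between the two trajectories does not change the metric, which is what makes RPE a drift measure suitable for VO without loop closure.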
More results can be found in the supplementary video.

2) Quantitative results: We first evaluate on two popular sequences of the TUM RGBD dataset, fr2/desk and fr2/xyz, shown in Fig. 6(a). The comparison is shown in Table I, where the results of SDVO and the two edge-based VO methods [18] [19] are taken from their papers. These two scenarios are feature-rich environments and are thus most suitable for the feature-based ORB SLAM with BA. Due to the large number of high-gradient pixels, SDVO also performs well. Because of the many curved bottles, leaves and small keyboards, line detection and matching errors are relatively large in these environments, so our algorithm performs similarly to SDVO but better than the two other edge-based VO methods.

TABLE I
RELATIVE POSITION ERROR (CM/S) COMPARISON ON TUM DATASET
Sequence | Ours | SDVO | ORB-SLAM | [18] | [19]
fr2/desk | | | | |
fr2/xyz | | | | |

We also provide results on more datasets in Table II, for which the other edge-based VO methods do not report results. The top two scenes are relatively easy environments. In TUM mono/38 (Fig. 1), we only evaluate the beginning part, which has ground truth poses. Since there are still some corner points on the door and showcase, ORB-SLAM with BA still performs best, but our algorithm clearly outperforms SDVO. This is because the door surface is nearly homogeneous without large intensity gradients, so the photometric error minimization of SDVO does not work very well, while our algorithm can still use edges to minimize the edge re-projection error. The last four scenes in Table II are more challenging feature-less environments, shown in Fig. 6(b) and Fig. 6(c). ORB-SLAM does not work well and even fails in some environments (marked in Table II), but direct VO can still work to some extent because direct methods utilize high-gradient and edge pixels instead of feature points. Note that in the ICL/office1 dataset, the overall scene has many feature points, but ORB SLAM fails when the camera only observes white walls and ground with few distinguishable features.

TABLE II
RELATIVE POSITION ERROR (CM/S) COMPARISON ON VARIOUS DATASETS
Sequence | Ours | SDVO | ORB-SLAM
ICL/office | | |
mono/ | | |
ICL/office | | |
fr3/cabinet-big | | |
fr3/cabinet | | |
fr3/notex-far | | |

Our method with lines clearly outperforms SDVO in most cases in the table, for two main reasons. First, by adding edges, we utilize more pixels for tracking: some pixels may have low gradients due to homogeneous surfaces but can still be utilized because they lie on edges, as shown in Fig. 6(c). Second, we minimize the geometric error as well as the photometric error, and the geometric error is known to be more robust to image noise and to have a large convergence basin. This has been analysed and verified in many other works [26] [5].
To demonstrate the advantage of a large convergence basin in the optimization, we select two frames from TUM fr3/cabinet big that are 41 frames apart (1.3s) and show the tracking iterations in Fig. 2. The re-projected pixels in green clearly align gradually with the true edges in red.

C. Time analysis

We report the time usage of our algorithm on the TUM fr3/cabinet big dataset in Table III. Using two octaves of line detection, there are 193 edges per frame on average. The whole tracking thread, excluding mapping, takes ms and can run at around 20Hz. The time can vary depending on the number of pixels involved in the optimization. Currently, edge detection and descriptor computation consume most of the time; this could be sped up by using down-sampled images. The EDLines detector [28] could also be used to halve the detection time, but it usually detects fewer edges than the currently used method [23], which may affect state estimation accuracy in some challenging environments. Recently, Gomez et al. [19] used edge tracking instead of detection and matching in every frame to decrease the computation, which is also a good solution.

TABLE III
TIME ANALYSIS ON TUM FR3/CABINET BIG DATASET
Component | Value
Edge detection | ms
Descriptor computation | ms
Edge matching | 4.52 ms
Tracking time | ms
Mapping time | ms
Edge number | 193

VII. CONCLUSIONS

In this paper, we propose a direct monocular odometry algorithm utilizing points and lines. We follow the pipeline of SDVO [6] and add edges to improve both tracking and mapping performance. In the tracking part, we minimize both the photometric error and the geometric error to the matched edges. In the mapping part, using matched edges, we can obtain stereo matches quickly and accurately without exhaustive search. An analytical solution is developed to regularize the depth map using edges. We also provide a probabilistic uncertainty analysis of the different observation models in the tracking and mapping parts. Our algorithm combines the advantages of direct and feature-based VO.
It is able to create a semi-dense map, and its state estimation is more robust and accurate due to the incorporation of edges and geometric error minimization. In evaluations on various datasets, we achieve better or comparable performance than SDVO and ORB SLAM. ORB SLAM with bundle adjustment works best in environments with rich features. However, in low-texture scenarios, ORB SLAM might fail, and direct methods usually work better. Our algorithm focuses on these scenarios and further improves the performance of SDVO by adding edges.

In the future, we want to reduce the computation of edge detection and matching through direct edge alignment. Bundle adjustment of edges over multiple frames could also be used to improve accuracy. We will also exploit more information by combining points, edges, and planes [29] in one framework to improve accuracy and robustness in challenging environments.

ACKNOWLEDGMENTS

This work is supported by NSF award IIS

REFERENCES

[1] Albert S. Huang, Abraham Bachrach, Peter Henry, Michael Krainin, Daniel Maturana, Dieter Fox, and Nicholas Roy. Visual odometry and mapping for autonomous flight using an RGB-D camera. In International Symposium on Robotics Research (ISRR), volume 2.
[2] Andreas Geiger, Julius Ziegler, and Christoph Stiller. StereoScan: Dense 3D reconstruction in real-time. In IEEE Intelligent Vehicles Symposium, Baden-Baden, Germany, June.
[3] Zheng Fang, Shichao Yang, Sezal Jain, Geetesh Dubey, Stephan Roth, Silvio Maeta, Stephen Nuske, Yu Zhang, and Sebastian Scherer. Robust autonomous flight in constrained and visually degraded shipboard environments. Journal of Field Robotics, 34(1):25-52.
[4] Georg Klein and David Murray. Parallel tracking and mapping for small AR workspaces. In Mixed and Augmented Reality (ISMAR), IEEE and ACM International Symposium on. IEEE.
[5] Raul Mur-Artal, J. M. M. Montiel, and Juan D. Tardos. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5).
[6] Jakob Engel, Jurgen Sturm, and Daniel Cremers. Semi-dense visual odometry for a monocular camera. In Proceedings of the IEEE International Conference on Computer Vision.
[7] Jakob Engel, Thomas Schöps, and Daniel Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision (ECCV). Springer.
[8] Rubén Gómez-Ojeda and Javier González-Jiménez. Robust stereo visual odometry through a probabilistic combination of points and line segments.
[9] Manohar Prakash Kuse and Shaojie Shen. Robust camera motion estimation using direct edge alignment and sub-gradient method. In IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden.
[10] Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry. arXiv preprint.
[11] Raúl Mur-Artal and Juan D. Tardós. Probabilistic semi-dense mapping from highly accurate feature-based monocular SLAM. In Proceedings of Robotics: Science and Systems, Rome, Italy.
[12] Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. In 2011 International Conference on Computer Vision. IEEE.
[13] Christian Kerl, Jürgen Sturm, and Daniel Cremers. Dense visual SLAM for RGB-D cameras. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE.
[14] Ethan Eade and Tom Drummond. Edge landmarks in monocular SLAM. Image and Vision Computing, 27(5).
[15] Georg Klein and David Murray. Improving the agility of keyframe-based SLAM. In European Conference on Computer Vision. Springer.
[16] Jonas Witt and Uwe Weltin. Robust stereo visual odometry using iterative closest multiple lines. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE.
[17] Yan Lu and Dezhen Song. Robust RGB-D odometry using point and line features. In Proceedings of the IEEE International Conference on Computer Vision.
[18] Juan Jose Tarrio and Sol Pedre. Realtime edge-based visual odometry for a monocular camera. In Proceedings of the IEEE International Conference on Computer Vision.
[19] Ruben Gomez-Ojeda, Jesus Briales, and Javier Gonzalez-Jimenez. PL-SVO: Semi-direct monocular visual odometry by combining points and line segments. In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE.
[20] Richard M. Murray, Zexiang Li, S. Shankar Sastry, and S. Shankara Sastry. A mathematical introduction to robotic manipulation. CRC Press.
[21] Pedro Piniés, Lina Maria Paz, and Paul Newman. Dense mono reconstruction: Living with the pain of the plain plane. In 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE.
[22] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge University Press.
[23] R. Grompone von Gioi, Jeremie Jakubowicz, Jean-Michel Morel, and Gregory Randall. LSD: a fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4).
[24] Lilian Zhang and Reinhard Koch. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. Journal of Visual Communication and Image Representation, 24(7).
[25] Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference on. IEEE.
[26] J. Engel, V. Usenko, and D. Cremers. A photometrically calibrated benchmark for monocular visual odometry. arXiv preprint, July.
[27] A. Handa, T. Whelan, J. B. McDonald, and A. J. Davison. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In IEEE Intl. Conf. on Robotics and Automation (ICRA), Hong Kong, China, May.
[28] Cuneyt Akinlar and Cihan Topal. EDLines: A real-time line segment detector with a false detection control. Pattern Recognition Letters, 32(13).
[29] Shichao Yang, Yu Song, Michael Kaess, and Sebastian Scherer. Pop-up SLAM: a semantic monocular plane SLAM for low-texture environments. In Intelligent Robots and Systems (IROS), 2016 IEEE International Conference on. IEEE, 2016.
