SLAM Summer School 2006
Practical 2: SLAM using Monocular Vision

Javier Civera, University of Zaragoza
Andrew J. Davison, Imperial College London
J. M. M. Montiel, University of Zaragoza
josemari@unizar.es, jcivera@unizar.es, ajd@doc.ic.ac.uk

1 Objectives

1. Understanding the characteristics of efficient (potentially real-time) SLAM using a monocular camera as the only sensor.
   (a) Map management.
   (b) Feature initialization.
   (c) Near and far features.
2. Understanding the inverse depth parametrization of map features in monocular SLAM.
3. Understanding the performance limits of a constant velocity motion model for a camera when no odometry is available.

2 Exercise 1. Feature selection and matching.

One of the characteristics of vision-based SLAM is that there is too much information in an image sequence for current computers to process in real time. We therefore use heuristics to select which features to include in the map. The desirable properties of map features are:

1. Saliency: features have to be identified by distinct texture patches.
2. A minimum number (e.g. 14) should be visible in the image at all times; when this is not the case, new map features are initialized.
3. The features should be spread over the whole image.

The goal of this exercise is to manually initialize features in order to meet the above criteria, and to understand better how an automatic initialization algorithm should work. Run mono_slam.m. With the user interface, you can add features and perform step-by-step EKF SLAM:

1. In the first image, add about ten salient features spread over the image. You can watch the movie juslibol_SLAM.mpg (using, for instance, mpeg_play on a Unix workstation) as an example of how to select suitable features (but you can of course select other ones). This movie shows the results of applying automatic feature selection.
2. As the camera moves, some features will leave the field of view, and you will have to add new ones in order to maintain around 14 visible map features.
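The map-management criteria above (saliency, a minimum number of visible features, spread over the image) can be sketched in code. The practical's own implementation is in Matlab; the following is a minimal Python sketch in which the grid partition, the saliency scores, and the function name select_features are illustrative assumptions, not the practical's actual algorithm:

```python
# Sketch of a map-management heuristic: keep at most a target number of
# features (e.g. 14) spread over the image, by bucketing salient candidates
# into a coarse grid and keeping only the strongest candidate per cell.
# Grid size and scores are assumptions for illustration only.

def select_features(candidates, img_w, img_h, target=14, grid=(4, 3)):
    """candidates: list of (x, y, saliency) tuples, e.g. corner scores."""
    gw, gh = grid
    cell_w, cell_h = img_w / gw, img_h / gh
    best = {}  # (cell_col, cell_row) -> strongest candidate in that cell
    for x, y, s in candidates:
        cell = (int(x // cell_w), int(y // cell_h))
        if cell not in best or s > best[cell][2]:
            best[cell] = (x, y, s)
    # Strongest first, capped at the target number of visible features
    return sorted(best.values(), key=lambda f: -f[2])[:target]

features = select_features(
    [(10, 10, 0.9), (12, 14, 0.5), (300, 40, 0.8), (200, 220, 0.7)],
    img_w=320, img_h=240)
print(len(features))  # prints 3: the two nearby candidates share one cell
```

Enforcing one feature per cell is what keeps the selected set spread over the whole image rather than clustered on the most textured region.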
3 Exercise 2. Near features and far features.

A camera is a bearing-only sensor. This means that the depth of a feature cannot be estimated from a single image measurement. The depth of a feature can be estimated only if the feature is observed from different points of view, and only if the camera translates enough to produce significant parallax. In particular, it may take a very long time to obtain accurate estimates of the depths of distant features, since they display very little or no parallax when the camera moves small distances. The goal of this exercise is to observe the different evolution of depth estimates in the cases of near and distant features, and the influence that this has on the camera location estimate.

1. Open the video parallax.mpg. Observe the different motions in the image of features at different depths. Open the video noparallax.mpg (taken from a camera which does not significantly translate) and observe the image motion of features at different depths.
2. Now look at the video we are using for this practical (juslibol.mpg). Distinguish which parts of the scene in this video contain low parallax motion.
3. Run mono_slam.m. Observe what happens to the features in the 3D map (initialization value and covariance, and value and covariance after several frames). Red dots display the estimated values and red lines bound 95% probability regions denoting uncertainty. The code singles out features #5 and #15 and displays their depth estimates and 95% probability regions: [lower limit, estimation, upper limit]. When clicking, make sure that feature #5 corresponds to a near one (for example, on the car) and feature #15 corresponds to a far one (for example, the tree appearing on the left). Notice the difference between the evolution of the estimates of these near and distant features. Observe the evolution of the camera location uncertainty (use the axes limit controls in the user interface). Observe what happens to features and camera location uncertainties in the low parallax motion part of the image sequence discussed above in 2.
Notice the difference between this part of the 3D map (constructed with low parallax information) and the high parallax parts.

4 Exercise 3. Inverse depth parametrization.

Initializing a feature in monocular SLAM is a challenging issue, because the depth uncertainty is not well modelled by a Gaussian. This problem is overcome using inverse depth instead of the classical XYZ representation. The Matlab code of this practical uses the inverse depth parametrization of feature positions. The total state vector:

    x = ( x_v, y_1, y_2, ..., y_n )^T                              (1)

is composed of:
[Figure 1: Feature parametrization and measurement equation. The figure shows the world frame W, the camera pose (r^WC, q^WC), the ray direction m(θ, φ), the inverse depth ρ = 1/d, and the parallax angle α.]

1. 13 components that correspond to the location, orientation, velocity and angular velocity of the camera:

       x_v = ( r^WC  q^WC  v^W  ω^W )^T                            (2)

2. The rest of the components are features. Each feature is represented by 6 parameters: the position of the camera the first time the feature was seen (x_i, y_i, z_i), a semi-infinite ray parametrized with azimuth-elevation angles (θ_i, φ_i), and the inverse depth ρ_i of the feature along the ray:

       y_i = ( x_i  y_i  z_i  θ_i  φ_i  ρ_i )^T                    (3)

So the transformation from the inverse depth parametrization to a standard Euclidean system is:

       ( x )   ( x_i )    1
       ( y ) = ( y_i ) + --- m(θ_i, φ_i)                           (4)
       ( z )   ( z_i )   ρ_i

where:

       m(θ_i, φ_i) = ( cos φ_i sin θ_i,  −sin φ_i,  cos φ_i cos θ_i )^T   (5)

The goal of this exercise is to understand the inverse depth parametrization.

1. The code stores partial information about features #5 and #15 in the file history.mat:
   feature5history is a 6-row matrix, each column containing the feature #5 location coded in inverse depth at step k.
   rhohistory_5 is a row vector containing the inverse depth estimation history for feature 5.
   rhohistory_15 is a row vector containing the inverse depth estimation history for feature 15.
   rhostdhistory_5 is a row vector containing the inverse depth standard deviation history for feature 5.
   rhostdhistory_15 is a row vector containing the inverse depth standard deviation history for feature 15.
2. Compute the XYZ Euclidean location for feature #5 after processing all the images.
3. Plot a graph with the value of the inverse depth and the 95% acceptance region history for both features #5 and #15. Use the Matlab functions open, figure, hold, and plot. Comment on the difference between the two graphs.
4. After processing the whole sequence, what are the estimates and the uncertainty regions expressed in depth for both features? Think about a feature at infinity: what inverse depth would it have? In the graphs, are there regions where infinity is included within the uncertainty bounds as a possible depth for each feature? The inverse depth parametrization is particularly valuable in being able to represent the possibility of features at infinity.

5 Exercise 4. Constant velocity motion model (optional)

Monocular SLAM uses a camera as the unique sensor, without any odometry input. A constant velocity model is instead used to model approximately smooth motion of the camera. This model requires parameters to be set defining the camera frame rate, and the maximum expected angular and linear accelerations. These parameters together determine how much uncertainty is added to the camera position and orientation estimates during each motion prediction, and therefore also determine the size of the uncertainty-guided search regions used for feature matching. High expected accelerations or a low frame rate will lead to large search regions. The goal of this exercise is to analyse the effect of changing the linear acceleration, the angular acceleration and frame rate parameters.

1. The initial linear and angular acceleration tunings are 6 m/s^2 and 6 rad/s^2.
2. Increase the angular acceleration only (for instance, double the value) and analyse the effect on the search regions.
   Find this parameter in the file mono_slam.m (its name is sigma_alphanoise).
3. Increase the linear acceleration only (for instance, double the value), analyse the effect and compare what happens now. The name of this parameter is sigma_anoise.
4. Reduce the frame rate of processing to see what happens when only:
   (a) 1 out of 2 images
   (b) 1 out of 4 images
are processed. Clue: find the variable step and the code associated with this variable. You will also have to modify the variable deltat, which sets the time between frames. Does the processing time increase or decrease as a result of processing fewer images? Can you explain this?
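The role of the acceleration and frame-interval parameters in the exercise above can be illustrated with a minimal sketch. The practical's code is Matlab; below is a 1-D constant-velocity EKF prediction step in Python/NumPy, where the state is (position, velocity), the 6 m/s^2 tuning and 30 Hz frame rate come from the exercise, and all names (predict, sigma_a, deltat) are illustrative assumptions:

```python
import numpy as np

# 1-D constant-velocity prediction: the expected acceleration acts as
# process noise, so larger sigma_a or a larger frame interval deltat
# inflates the predicted covariance, and with it the uncertainty-guided
# search regions used for feature matching.

def predict(P, deltat, sigma_a):
    """Propagate the covariance P of a (position, velocity) state."""
    F = np.array([[1.0, deltat],
                  [0.0, 1.0]])        # constant-velocity transition
    G = np.array([[0.5 * deltat**2],
                  [deltat]])          # how an acceleration impulse enters
    Q = G @ G.T * sigma_a**2          # process noise from expected accel.
    return F @ P @ F.T + Q

P = np.diag([0.01, 0.01])
P1 = predict(P, deltat=1 / 30, sigma_a=6.0)  # every image at 30 Hz
P2 = predict(P, deltat=4 / 30, sigma_a=6.0)  # processing 1 image out of 4
print(P1[0, 0], P2[0, 0])  # position variance grows faster at low frame rate
```

Doubling sigma_a quadruples Q, and skipping frames lengthens deltat; either change enlarges the predicted covariance and therefore the search regions, which is exactly the effect items 2-4 ask you to observe.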