Robot Localization based on Geo-referenced Images and Graphic Methods

Sid Ahmed Berrabah, Mechanical Department, Royal Military School, Belgium, sidahmed.berrabah@rma.ac.be
Janusz Bedkowski, Łukasz Lubasiński, Andrzej Masłowski, Research Institute for Automation and Measurements, PIAP, Poland, januszbedkowski@gmail.com

Abstract

This paper describes two methods for Simultaneous Localization and Mapping (SLAM) for mobile robot navigation in outdoor and indoor environments. The first method is a feature-based algorithm which combines geo-referenced images in order to localize the robot in a user-defined global coordinate frame and keep track of all detected landmarks (features). This method builds a map of the environment using a history memory which accumulates sensory evidence over time to identify places with a stochastic model of the correlation between map features, and then integrates GPS data to localize the built map, and therefore the robot, in geo-referenced images. The user can then communicate with the robot to define a final goal (target) using only the geo-referenced images. The second method is a graphics-based approach. It comprises a set of graphic methods that obtain the robot translation and rotation with respect to a global map. The idea is based on matching scans: by comparing two scans, the translation and rotation between them is obtained. The algorithm is implemented using OpenGL (Open Graphics Library) in CORBA (Common Object Request Broker Architecture). This method is well suited for indoor navigation.

I. Introduction

Today's industrial growth makes it necessary to replace humans working in assembly lines by accurate and rapid robots. Current industrial robots lack flexibility and autonomy and are not able to operate in new environments. That is why autonomous mobile robots are required.
The aim of autonomous robotics is to control the motion of a mobile robot navigating from a start point to an end point based on the description of the workspace given by the sensors. To reach a reasonable degree of autonomy, two basic requirements are sensing and reasoning. Sensing is provided by an on-board sensory system that gathers information about the robot itself and the surrounding environment. Reasoning is accomplished by developing algorithms that exploit this information in order to generate appropriate commands for the robot. The reasoning system is the central unit in an autonomous robot. According to the environment state, it must allow the robot to localize itself in the environment and to seek free paths. To accomplish these
two tasks, it bases its reasoning on a model or a description of the environment. This model is not always available, however, and in that case the robot should have the means to build such a model over time as it explores its environment. This latter problem is known as the mapping problem. Mapping and localization are interlinked problems: without a precise map, the robot cannot localize itself using the map; on the other hand, without precise knowledge of its pose (spatial position and orientation), the robot cannot build a representation of its environment. In combination, the problem is therefore much harder: starting from an unknown location in an unknown environment, an autonomous mobile robot should be able to incrementally build a map of the environment using only relative observations, and then use this map to localize itself and navigate. This is the so-called simultaneous localization and mapping (SLAM) approach. In this paper, we present two methods to solve the SLAM problem for outdoor and indoor environments. The first method is a vision-based SLAM in which a single camera is used by the outdoor robot ROBUDEM of the Royal Military School to build a model of the scene and localize the robot. This is interesting because it offers a low-cost, real-time approach to SLAM in unprepared environments. The camera identifies natural features in the scene and uses them as landmarks in the built map. To model large environments, the proposed approach builds several size-limited local maps and combines them into a global one. The proposed method uses a 'history memory' which accumulates sensory evidence over time to identify places with a stochastic model of the correlation between map features. This also helps to overcome the problem of perceptual aliasing, which refers to situations where several places are perceptually similar enough to be confused by the robot.
Using GPS data, the built map is superimposed on geo-referenced images, therefore allowing the localization of the robot in a user-defined global frame. In the second proposed approach, an indoor mobile robot ATRV-JR (PIAP) uses a SICK laser range finder to model the environment as an occupancy grid map. The localization of the robot is done by matching the laser scans. The rest of the paper is organized as follows: the proposed approaches are described in detail in sections II and III, followed by conclusions in section IV.

II. Vision-based Simultaneous Localization and Mapping

In the outdoor application, the robot ROBUDEM uses a single monocular camera to extract natural features in the scene. These features are used as landmarks in the built map. The SLAM problem is tackled as a stochastic problem using an Extended Kalman Filter (EKF) to maintain the state vector x, consisting of the robot state x_R and the map feature states x_Li. It also maintains a covariance matrix P, which includes the uncertainties in the various states as well as the correlations between the states. The feature states x_Li are the 3D position vectors of the locations of feature points. The (robot) camera state vector x_R comprises [1] a metric 3D position vector r^W, an orientation quaternion q^RW, a velocity vector v^W, and an angular velocity vector w^R relative to a fixed world frame W and robot frame R:

x_R = (r^W, q^RW, v^W, w^R)^T
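As a rough illustration of the joint state maintained by the EKF, the layout of the state vector and covariance matrix described above can be sketched as follows. This is a minimal sketch with dimensions and names of our own choosing, not the authors' implementation:

```python
import numpy as np

# Assumed layout: the camera state holds a 3D position r^W, an orientation
# quaternion q^RW, a velocity v^W and an angular velocity w^R,
# i.e. 3 + 4 + 3 + 3 = 13 scalars.
CAMERA_STATE_DIM = 13
FEATURE_DIM = 3  # each map feature x_Li is a 3D point

def make_slam_state(n_features):
    """Allocate the joint state vector x = [x_R, x_L1, ..., x_Ln] and the
    full covariance matrix P, which stores the uncertainties of all states
    together with the cross-correlations between camera and features."""
    dim = CAMERA_STATE_DIM + FEATURE_DIM * n_features
    x = np.zeros(dim)
    x[3] = 1.0                # identity quaternion (w, x, y, z) for q^RW
    P = np.zeros((dim, dim))  # the system starts with zero uncertainty
    return x, P

x, P = make_slam_state(n_features=5)
print(x.shape, P.shape)  # (28,) (28, 28)
```

Note how the covariance matrix is quadratic in the total state size, which is the source of the O(N^2) update cost discussed below.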
Features are image patches with a size of x pixels. The patches are assumed to be locally planar surfaces perpendicular to the vector from the feature to the camera at initialization. When making measurements of a feature from new camera positions, each patch can be projected from 3D to the image plane to produce a template for matching with the real image. This template will be a warped version of the original square template captured when the feature was first detected. To avoid using outlier features, the moving-object mask detected by the motion segmentation procedure introduced in [2] is used: during map building, features detected on the moving parts are excluded. The coordinate frame is chosen at the starting position, and the system is helped at the beginning with prior information about the scene. A target of known shape is placed in front of the camera, which provides several features with known positions and known appearance. This gives the scaling of the estimated map and an initialization of the map, since with only a single camera, features cannot be initialized in the map from a single measurement because of their unknown depth. The system starts with zero uncertainty. Feature matching is carried out using a straightforward normalized cross-correlation search for the template patch projected into the current camera estimate. The image is scanned at each location until a match is found. This search for a match is computationally expensive. A feature-based SLAM algorithm using an Extended Kalman Filter has two important limitations when mapping large environments. First, the computational cost of updating the map grows as O(N^2), where N is the number of features plus the size of the robot state. Second, as the map grows, the estimates obtained by the EKF equations quickly become inconsistent due to linearization errors [3].
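The normalized cross-correlation search described above can be sketched as follows. This is an illustrative sketch with our own function names and an exhaustive search over the whole image, not the authors' implementation (which restricts the search to the projected template):

```python
import numpy as np

def ncc(template, patch):
    """Normalized cross-correlation between two equally sized patches;
    returns a score in [-1, 1], with 1 meaning a perfect match."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt((t * t).sum() * (p * p).sum())
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treat as no match
    return float((t * p).sum() / denom)

def search_match(image, template, threshold=0.8):
    """Scan every location of the image (the computationally expensive
    step noted above) and return the best-scoring (x, y) position."""
    th, tw = template.shape
    best_score, best_xy = -1.0, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            s = ncc(template, image[y:y + th, x:x + tw])
            if s > best_score:
                best_score, best_xy = s, (x, y)
    return best_xy if best_score >= threshold else None

rng = np.random.default_rng(0)
image = rng.random((40, 40))
template = image[10:18, 20:28].copy()  # a patch cut from the image itself
print(search_match(image, template))   # (20, 10)
```

The quadratic cost of this naive scan is why practical systems constrain the search window using the predicted feature position.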
To overcome these limitations, the proposed approach builds small independent local maps of limited size and joins them into a global one. Each local map is maintained by a separate Extended Kalman Filter. For joining local maps we use a 'history memory' which accumulates sensory evidence over time to identify places with a stochastic model of the correlation between map features, in order to obtain one full stochastic map. This is done by transforming each local map into the global frame before starting to build a new local map. The base frame for the global map is the robot position at the starting instant. Local maps are built as follows: at a given instant t_k, a new map is initialized using the current vehicle location x_R(t_k) as base reference B_i = x_R(t_k), where i = 1, 2, ... is the local map order. Then the vehicle performs a limited motion, acquiring sensor information about the neighboring environment features; this yields a local map L_i given by the coordinate vector of the detected features in the base reference B_i and the covariance matrix estimated in B_i. The robot decides to start a new local map when the number of mapped features reaches a predefined threshold. Each finalized local map is then transferred to the global map before starting a new one, by computing the state vectors and the covariance matrix. The coordinate frame of the first local map is used as the coordinate frame of the global map:
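The transfer of a finalized local map into the global frame can be sketched for the 2D case as follows. This is a minimal illustration of the coordinate change only; the paper also transforms the covariance matrix, which is omitted here, and all names are our own:

```python
import math

def to_global(base_pose, local_features):
    """Express features mapped in the local base reference B_i (the robot
    pose at the instant the local map was started) in the global frame,
    i.e. the frame of the first local map.

    base_pose: (x, y, theta) of B_i in the global frame.
    local_features: list of (x, y) feature coordinates in B_i.
    """
    bx, by, th = base_pose
    c, s = math.cos(th), math.sin(th)
    # rotate each feature by theta, then translate by the base position
    return [(bx + c * fx - s * fy, by + s * fx + c * fy)
            for fx, fy in local_features]

# A local map whose base frame sits at (2, 0), rotated 90 degrees:
features_global = to_global((2.0, 0.0, math.pi / 2), [(1.0, 0.0)])
print(features_global)  # a feature 1 m ahead of B_i lands at (2, 1)
```

Concatenating the outputs of `to_global` for all local maps yields the joint global state vector described next.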
The global state vector is the concatenation of all features from the local maps 1, 2, ..., n, and the covariance of the joint map is obtained from the linearization of the state transition function. The obtained map can be superimposed on a satellite image of the navigated area by matching the GPS data corresponding to the frame coordinates of the local maps.

III. Graphic-based Simultaneous Localization and Mapping

In the indoor ATRV application, the method uses a laser range finder to build an occupancy grid map of an indoor navigation area. The localization of the robot is obtained by matching the laser scans. The measurements are acquired with a SICK LMS 22 range finder, which reads data every 0.5° over the interval (-90°; 90°), giving 361 points per scan. The laser data are obtained in the polar coordinate system (r_i, α_i), where r_i is the distance to an obstacle [meters] and α_i the horizontal angle (figure 1).

Figure 1: SICK LMS 22 and its scanning angle.

The readings are recalculated to Cartesian coordinates by the following equations:

x_i = r_i cos(α_i)
y_i = 0                 (1)
z_i = r_i sin(α_i)

where (x_i, y_i, z_i) are the Cartesian coordinates of a point. The data are collected into a vector R = [r_1, r_2, ..., r_n], which is transformed into the set {(x_i, y_i, z_i)}, i = 1, ..., n, using equation (1). The laser data result from the intersection of the scanning plane and the obstacle planes. The computed set of points is merged into lines and rendered in an OpenGL window (figure 4). The idea of the presented method is to find the transformation matrix M which transforms the scan collected at time t_1 into the scan collected at time t_2.
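Equation (1) and the application of a candidate transformation M can be sketched as follows. This is an illustrative sketch with our own function names and an assumed sign convention for the rotation; the actual implementation performs these transforms in OpenGL:

```python
import math

def polar_to_cartesian(scan):
    """Equation (1): a range reading (r_i, alpha_i) becomes a point
    (x_i, 0, z_i) lying in the horizontal scanning plane y = 0."""
    return [(r * math.cos(a), 0.0, r * math.sin(a)) for r, a in scan]

def transform(points, dx, dz, alpha):
    """Apply a candidate M: rotate each point by alpha about the vertical
    Y axis, then translate by (dx, dz). The rotation direction is an
    assumption; the paper only fixes the (dx, dz, alpha) parametrization."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [(c * x + s * z + dx, y, -s * x + c * z + dz)
            for x, y, z in points]

# Angles from -90 to 90 degrees in 0.5-degree steps: 361 readings
angles = [math.radians(-90 + 0.5 * i) for i in range(361)]
scan = [(1.0, a) for a in angles]  # synthetic scan: everything at 1 m
points = polar_to_cartesian(scan)
moved = transform(points, dx=0.5, dz=0.0, alpha=0.0)
print(len(points))  # 361
```

Searching over a grid of (dx, dz, alpha) values and scoring each transformed scan against the reference scan is exactly the brute-force loop of the MatchScan procedure below.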
Therefore we can compute the new location of the robot, P_new = (x_new, y_new, z_new):

P_new = P_old M (2)

where P_old is the position of the robot at t_1. The idea of the localization algorithm is based on matching two laser scans. The robot motion is defined as a translation and a rotation, so it is possible to bound the robot position estimate to the rectangle shown in figure 2. Each element of the matrix of translations contains a table of rotation angles. Multiplying a translation matrix by a rotation matrix gives an estimated robot position matrix. The goal is to find the combination of translation and rotation that maximizes the similarity of the two laser scans. The similarity is measured by the number of scan pixels (figure 4).

Figure 2: Robot position estimation in the rectangle described by the matrix of translations.

The algorithm is composed of two elements: an initialization and the main match-scan procedure. The pseudo code of the initialization is:

Init() {
  For i = 0; i < SIZE_TABLE_OF_X_INDEXES_OF_TRANSLATION_MATRIX; i++
    For j = 0; j < SIZE_TABLE_OF_Z_INDEXES_OF_TRANSLATION_MATRIX; j++
      For k = 0; k < SIZE_TABLE_OF_ROTATION_ANGLES; k++
        M[i][j][k] = BuildTransformationMatrix(i, j, k)
  return;
}

The BuildTransformationMatrix procedure builds the translation matrix from the translation values dx along the X axis and dz along the Z axis, and the rotation matrix from the rotation angle α. The matrices M_xz and M_α are given by:
M_xz = [ 1  0  0  dx ]
       [ 0  1  0  0  ]
       [ 0  0  1  dz ]
       [ 0  0  0  1  ]

M_α = [  cos α  0  sin α  0 ]
      [  0      1  0      0 ]
      [ -sin α  0  cos α  0 ]
      [  0      0  0      1 ]

The result of BuildTransformationMatrix is the set of matrices M_i, which are used to compute candidate new locations of the robot P_new_i = (x_new, y_new, z_new):

P_new_i = P_old M_i (3)

The MatchScan procedure uses an OpenGL window of 256 x 256 pixels. An example of rendering a SICK LMS 22 scan is shown in figure 3.

Figure 3: OpenGL window with a rendered SICK LMS 22 scan.

MatchScan() {
  Load(Scan1);
  Load(Scan2);
  For i = 0; i < SIZE_TABLE_OF_X_INDEXES_OF_TRANSLATION_MATRIX; i++
    For j = 0; j < SIZE_TABLE_OF_Z_INDEXES_OF_TRANSLATION_MATRIX; j++
      For k = 0; k < SIZE_TABLE_OF_ROTATION_ANGLES; k++ {
        ClearOpenGLWindow();
        Render(Scan1);
        Transform(Scan2, M[i][j][k]);
        Render(Scan2);
        numberOfScanPixels = CountNumberOfScanPixels();
        // scans are rendered using a different color than the background
        If numberOfScanPixels < minNumberOfScanPixels {
          minNumberOfScanPixels = numberOfScanPixels;
          (iBest, jBest, kBest) = (i, j, k);
        }
      }
  return (iBest, jBest, kBest);
}

The following figure illustrates the use of the OpenGL graphic library in the MatchScan procedure.

Figure 4: Example of two situations in the MatchScan procedure. The left one represents a correct match, because the number of non-white pixels in the left image is smaller than in the right image; the right image therefore represents a wrong match.

III.1. Results using the second method

This section presents an experimental evaluation of the localization algorithm based on 2D laser scans for an indoor environment. Experiments were performed on real scans collected in an office environment with a map size of m x 2m. They were executed on a PC with a dual-core processor, 2 GB of RAM and an NVIDIA GEFORCE 76 graphics card. The goal of precise position estimation is achieved. The result of the presented procedures is shown in figure 5:
Figure 5: The result of the MatchScan procedure: red describes the reconstructed map, the blue line describes the robot's path, and the green rectangle shows the current robot position.

IV. Conclusion

We presented two methods for Simultaneous Localization and Mapping for mobile robot navigation in indoor and outdoor environments. The first method is a vision-based approach for the outdoor ROBUDEM application, which identifies natural features in the scene and uses them as landmarks. The algorithm builds several size-limited local maps and merges them into a global map using a history memory. The built map can then be superimposed on a geo-referenced image of the navigated area using GPS data at the local map coordinate frames. In the second approach, the indoor robot ATRV-JR uses a laser range finder to build an occupancy grid map of its navigation area. The localization of the robot is obtained by matching the laser scans.

Acknowledgment: This research is funded by the View-Finder FP6 IST 4554 project.

V. References

[1] A. J. Davison, I. D. Reid, N. D. Molton, O. Stasse, "MonoSLAM: Real-Time Single Camera SLAM", IEEE Trans. PAMI, 2007.
[2] S.A. Berrabah, G. De Cubber, V. Enescu, H. Sahli, "MRF-Based Foreground Detection in Image Sequences from a Moving Camera", IEEE International Conference on Image Processing (ICIP 2006), Atlanta, GA, USA, Oct. 2006, pp. 25-28.
[3] J.D. Tardós, J. Neira, P. Newman, J. Leonard, "Robust mapping and localization in indoor environments using sonar data", International Journal of Robotics Research, 2002, N. 2, pp. 3-33.