PROBABILISTIC IMAGE SEGMENTATION FOR LOW-LEVEL MAP BUILDING IN ROBOT NAVIGATION

Timo Kostiainen, Jouko Lampinen


Helsinki University of Technology, Laboratory of Computational Engineering
P.O. Box 9203, FIN-02015, FINLAND
Timo.Kostiainen@hut.fi, Jouko.Lampinen@hut.fi

ABSTRACT

We propose a simple map-building system for a mobile robot equipped with a single camera. The system is based on probabilistic segmentation of the ground area, which is safe to move on, from the rest of the scene, which is treated as obstacles. The slow convergence of MCMC methods often makes them poorly suited for practical image-processing applications. We propose an efficient segment diffusion algorithm and certain approximations that make the MCMC approach applicable to real-time navigation.

1. INTRODUCTION

We study the use of a single camera to build a map of obstacle-free area for the navigation of a mobile robot. A standard solution to this problem involves active sensors, such as laser or infrared rangefinders. Instead, we attempt to accomplish the task with a robot that is equipped with a single inexpensive camera.

Color segmentation techniques applied in this context typically classify image pixels using one or more predefined colour models. This is a valid approach when the colours of the different object classes are distinctive and do not overlap. However, in many real-world situations the objects of interest have complex textures, similar colours appear on different surfaces, and shadows and reflections add to the complexity of the problem.

In our approach, we divide each image into two segments: the foreground, which represents the obstacle-free ground area, and the background, which includes everything else and is treated as obstacles. The foreground must be contiguous, since we only consider areas that the robot has access to. An initial division of the first image in a sequence is required to obtain an initial model for both the foreground and background textures.
Similar approaches to the robot navigation task have been used before. For example, Thorpe et al. [1] have presented a system for outdoor navigation which divides pixels into road and non-road classes based on color and texture models; the color model is adaptive to a certain extent. We use the probabilistic approach to image segmentation proposed by Tu et al. [2]. The probability model for the observed image is based on different texture models for the image segments, and a Bayesian MCMC technique is applied to estimate the posterior distribution of possible segmentations. To make the approach computationally tractable, various types of cues from the image data are used to drive the MCMC algorithm, that is, to produce good proposal samples using bottom-up information processing.

We are seeking a method that is suitable for the real-time navigation task, so we need a computationally efficient solution. In this application we can work with a constant number of segments. The addition of a new segment to the model requires some external information; for example, if the surface material changes, bumpers or short-range infrared sensors can be used to determine that the new material is not an obstacle. We propose a novel technique for generating proposal samples for segment diffusion and an approximation to the estimation of the texture model parameters.

2. IMAGE SEGMENTATION

A probability model is defined for the image in terms of a segment division, and a Bayesian MCMC algorithm is applied to estimate the posterior distribution of that division. We limit the discussion to the case where the number of segments is constant. A variable number of segments can be handled by means of trans-dimensional MCMC techniques, but that is not required in the map-building application.

2.1. Probability model

The segment division is defined by assigning a segment label to each image pixel, and the segments have continuous boundaries. A different texture model, controlled by parameters $\theta_s$, is associated with each image segment. We treat the segments as independent of each other, so the likelihood of the image $I$ with segmentation $S$ and segment model parameters $\theta$ is the product of the segment likelihoods

    $p(I \mid S, \theta, M) = \prod_s p(I_s \mid S, \theta_s, M)$,    (1)

where $I_s$ is the part of the image that belongs to segment $s$ and $M$ represents the constraints included in the model. The posterior distribution of the segment division $S$ and model parameters $\theta$, given the observed image $I$, is

    $p(S, \theta \mid I, M) = \dfrac{p(I \mid S, \theta, M)\, p(S, \theta \mid M)}{p(I \mid M)}$.    (2)

We are interested in the segment division of the image, not in the texture models. We choose the prior distribution of the segmentation state $p(S \mid M)$ to be independent of $\theta$ and integrate over $\theta$ to obtain the posterior distribution of the segment division:

    $p(S \mid I, M) = \dfrac{\int p(I \mid S, \theta, M)\, p(\theta \mid M)\, d\theta \;\; p(S \mid M)}{p(I \mid M)}$.    (3)

The prior $p(S \mid M)$ mainly controls the smoothness of the segment boundaries. For computational efficiency, we approximate the integral in (3) by replacing the distribution of $\theta$ with a delta function centered at the maximum of the conditional posterior distribution $p(\theta \mid I, S, M)$. This approximation leads to an empirical Bayes method; in theory it has the effect of over-fitting the remaining parameters, namely $S$, but the following arguments support our choice:

- We use simple texture models which have unimodal likelihood functions.
- The texture models have only a few degrees of freedom, and a validation technique is used to prevent over-fitting.
- The likelihood functions are expected to be highly peaked, and experimental results indicate that the difference between the likelihood of the point estimate and the expected likelihood obtained by sampling from the conditional posterior is negligible.
- The savings in computational cost are substantial.

Image segmentation lacks a theoretical foundation, which is why the probability model is in itself a strong simplification.

2.2. Texture models

The likelihood computation is based on texture models with parameters $\theta$. We do not need an accurate model that can generate a realistic-looking reconstruction of the image from the segment division and the texture models; instead, we seek a rough segmentation method that generalizes over a wide range of different scenes. For this purpose we find that the texture models of the segments should have low complexity. We choose the segment texture models using partial predictive likelihoods, that is, by dividing the pixels into training and test sets. The texture models are evaluated in the YCbCr color space. The spectral channels are modeled independently, using one of four probability distribution models for the textures: Gaussian, Laplacian, multinomial, or, for the luminance channel only, a linear spatial model with additive Gaussian noise. In all of these models the pixel color values are assumed to be independently distributed, and the likelihood of a segment is the product of the pixel likelihoods. We apply a Dirichlet prior to the multinomial model and non-informative uniform priors to the rest of the models, which have no more than a few free parameters.

2.3. Prior distributions

The prior distributions that we use to control the properties of the segments are quite general. A smoothness prior is assigned to the segment outlines. For navigation in man-made environments, the edges of segments could be expected to be straight and to contain mostly sharp corners. However, we find the following, more general smoothness prior to be quite sufficient for the purpose. The boundary of segment $s$ is discretized as a set of connected points, represented in parametric form as $(x(B_i^s), y(B_i^s))$. We apply a moving-average filter to the boundary points to obtain a low-pass filtered version of the outline $(\hat{x}(B_i^s), \hat{y}(B_i^s))$.
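As a concrete illustration of this smoothing step and the exponential prior built on it (Eq. (4) below), here is a short Python sketch. It is our own hedged reconstruction, not the authors' code: the window size and the prior-strength constant are arbitrary choices.

```python
import math

def boundary_smoothness_logprior(xs, ys, window=5, alpha=10.0):
    """Unnormalized log of an exponential smoothness prior: low-pass
    filter the closed boundary with a moving-average filter and penalize
    the mean squared distance between each point and its smoothed
    counterpart (cf. Eq. (4)).  window and alpha are assumed values."""
    n = len(xs)
    half = window // 2
    msd = 0.0
    for i in range(n):
        # moving average over a circular window (the boundary is closed)
        sx = sum(xs[(i + k) % n] for k in range(-half, half + 1)) / window
        sy = sum(ys[(i + k) % n] for k in range(-half, half + 1)) / window
        msd += (xs[i] - sx) ** 2 + (ys[i] - sy) ** 2
    msd /= n
    return -alpha * msd  # log p(B^s) up to an additive constant

# a smooth boundary (a circle) versus a jagged version of the same outline
xs = [math.cos(2 * math.pi * i / 40) for i in range(40)]
ys = [math.sin(2 * math.pi * i / 40) for i in range(40)]
jagged = [x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]
```

Evaluated on these two boundaries, the log-prior is close to zero for the circle and strongly negative for the jagged outline, so the prior favors smooth segment boundaries, as intended.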
An exponential prior is applied to the mean squared distance between each point and its counterpart on the smoothed boundary:

    $p(B^s) \propto \exp\!\left(-\alpha \, \frac{1}{n_s} \sum_{i=1}^{n_s} \left( [x(B_i^s) - \hat{x}(B_i^s)]^2 + [y(B_i^s) - \hat{y}(B_i^s)]^2 \right)\right)$.    (4)

The constant $\alpha$ determines the strength of the prior.

2.4. Diffusion algorithm for generation of MCMC proposal samples

We apply the Metropolis-Hastings algorithm to obtain samples from the posterior probability distribution; the samples are different segment divisions of the image. The algorithm generates candidate proposal samples and accepts or rejects them according to an acceptance probability. We generate the proposal samples using the following diffusion algorithm, in which proposals are drawn directly from the likelihood. We randomly select one of the segments, $s_g$, for region growing; each segment has an equal probability of being selected. We choose a random radius value $r$ for a circular diffusion kernel and dilate the selected segment by moving the kernel along its boundary, obtaining $s_g'$. For each pixel $x_i$ in the diffusion area ($x_i \in s_g'$, $x_i \notin s_g$), we compute two likelihood values: $p(x_i \mid \theta_o)$ for the model $\theta_o$ of the segment that $x_i$ currently belongs to, and $p(x_i \mid \theta_g)$ for the model $\theta_g$ of the growing segment $s_g$. The likelihood ratio $p(x_i \mid \theta_g)/p(x_i \mid \theta_o)$ is used to determine which pixels should be transferred to segment $s_g$. To help keep the segments contiguous, we smooth the likelihood-ratio values between neighbouring pixels with a spatial low-pass filter whose radius is proportional to $r$; the filtered values then determine which pixels are proposed for transfer. The process is illustrated in Fig. 1. It is possible that the balance of the MCMC chain is not strictly preserved in the diffusion step, but in our experiments we have encountered no problems due to this.

2.5. Comparison with pixelwise classification

To demonstrate the advantage of our Bayesian MCMC-based approach, in this section we compare it to simpler ways of using the colour models. The initial division between foreground (floor) and background is shown in Fig. 2a). Colour distribution models are learned for both segments (see section 2.2), and the result of applying the models directly to classify each pixel is shown in Fig. 2b). Pixel classification in a single pass is very quick, but the result is significantly inferior to our MCMC method (Fig. 2c) and cannot be relied on for map building. It is apparent here how the reflective glass walls make the task difficult.
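The Metropolis-Hastings dynamics behind this difference can be made concrete with a deliberately simplified 1-D sketch. This is our own construction under strong assumptions, not the paper's implementation: the "image" is a single row of pixels, the segment division is one boundary index, and the proposal shifts that boundary by a random offset, playing the role of the diffusion kernel radius.

```python
import math, random

def gauss_loglik(pixels, mu, sigma):
    """Log-likelihood of pixels under a Gaussian texture model."""
    return sum(-0.5 * ((p - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for p in pixels)

def log_posterior(row, b, models):
    """1-D analogue of Eq. (1): the row is split at boundary b into two
    segments, each with its own fixed Gaussian model; flat prior on b."""
    (mu0, s0), (mu1, s1) = models
    return gauss_loglik(row[:b], mu0, s0) + gauss_loglik(row[b:], mu1, s1)

def mh_boundary_samples(row, models, steps=2000, seed=0):
    rng = random.Random(seed)
    b = len(row) // 2                       # initial segment division
    lp = log_posterior(row, b, models)
    samples = []
    for _ in range(steps):
        # symmetric proposal: shift the boundary by a random 'radius'
        bp = b + rng.choice([-3, -2, -1, 1, 2, 3])
        if 1 <= bp < len(row):
            lpp = log_posterior(row, bp, models)
            if math.log(rng.random()) < lpp - lp:   # MH acceptance test
                b, lp = bp, lpp
        samples.append(b)
    return samples

# synthetic row: dark 'floor' pixels, then bright 'obstacle' pixels
data_rng = random.Random(1)
row = [data_rng.gauss(0.2, 0.05) for _ in range(30)] + \
      [data_rng.gauss(0.8, 0.05) for _ in range(20)]
models = [(0.2, 0.05), (0.8, 0.05)]
samples = mh_boundary_samples(row, models, steps=2000)
```

Because the likelihoods here are sharply peaked, the sampled boundary concentrates at the true segment border (index 30), illustrating why a point estimate of the texture parameters loses little in practice.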
If those parts that are not connected to the initial floor segment are left out, the result is still very blotchy (Fig. 2d). Finally, we show the result of region growing using the same model adaptation as in the proposed method but without the MCMC dynamics (Fig. 2e, 2f): the segment texture models are updated after every two cycles of adding those neighbouring pixels of the floor segment that are classified as belonging to it. After a sufficient number of iterations the result is not very different from Fig. 2c), but some type of post-processing would be needed to resolve the numerous holes in the floor segment before it could be used for navigation.

3. MAP BUILDING AND NAVIGATION

With knowledge of the position of the camera, its focal length, and calibration parameters that account for lens distortion, and assuming that the ground is planar, image coordinates can be mapped to the intersections of 3-D ray vectors with the ground plane. Given a target point, the robot's task is to build a map between its present location and the target, and then to use that map to navigate to the target. The robot has knowledge of the location of a small patch of obstacle-free ground. It aims its camera so that the patch is within the field of view and grabs an image. The MCMC diffusion algorithm is applied to grow the homogeneous ground area up to a boundary that naturally separates it from the rest of the scene. The boundary is then projected onto an occupancy grid [3] of the area; what lies within the boundary is regarded as safe ground. Based on the target point and the map, the robot checks whether the map to the target is complete, and if not, chooses a point on the map that is preferably in the direction of the target but not too close to the edge of the safe ground area. The map is filtered by a circular mask the size of the robot and transformed into a graph by skeletonization, so that a graph search algorithm can be used to plan a path to the selected point.
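To make the geometry concrete, here is a hedged sketch of the image-to-ground mapping for a pinhole camera with pure downward tilt. All calibration numbers below are made up for illustration, and lens distortion is ignored (the actual system corrects for it):

```python
import math

def pixel_to_ground(u, v, f, cu, cv, cam_height, tilt):
    """Intersect the viewing ray of pixel (u, v) with the ground plane.
    f: focal length in pixels; (cu, cv): principal point; cam_height:
    camera height above the floor; tilt: downward pitch in radians.
    Returns (lateral, forward) floor coordinates in the robot frame,
    or None if the ray points at or above the horizon."""
    # ray direction in camera coordinates (x right, y down, z forward)
    dx, dy, dz = (u - cu) / f, (v - cv) / f, 1.0
    # rotate by the pitch angle into a world frame (z up)
    cos_t, sin_t = math.cos(tilt), math.sin(tilt)
    forward = cos_t * dz - sin_t * dy      # horizontal forward component
    down = sin_t * dz + cos_t * dy         # vertical downward component
    if down <= 1e-9:
        return None                        # ray never reaches the floor
    t = cam_height / down                  # scale so the ray drops cam_height
    return (t * dx, t * forward)

# a pixel below the principal point of a camera tilted 30 degrees down
p = pixel_to_ground(u=320, v=300, f=500.0, cu=320.0, cv=240.0,
                    cam_height=0.5, tilt=math.radians(30))
```

Pixels above the horizon map to no ground point at all, which is one reason the segmentation must confine the foreground to the actual floor area before this projection is applied.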
The robot follows the path and acquires a new image such that the field of view includes part of the mapped area. The patch of mapped area is then used as the initial state for the segmentation algorithm, after projecting it onto the new image. Odometer measurements are used to determine the robot's current position.

Processing typically takes around one second per image in a Matlab implementation on a 1.8 GHz processor. The robot autonomously decides when it needs a new image for further path planning, so the rate at which new images are required varies with visibility (obstacles, corners, etc.). The robot travels at a speed of 0.5 m/s, and image processing rarely causes any delays.

In our experiments the navigation system has proved relatively robust against variations in floor appearance caused by illumination effects, and it works on different floor materials without any modification. An example case is shown in Fig. 3. The greatest difficulty is caused by inaccuracies in the odometer measurements, which are the source of cumulative error in the maps.

4. CONCLUSION

Probabilistic segmentation produces an estimate of the obstacle-free area based on general and simple principles and very few assumptions. The result is a holistic division of the image instead of a pixelwise classification, which means that it can be applied to navigation directly. Prior information can be included in a simple and consistent manner to adapt the method to specific circumstances, in case higher accuracy or robustness is required. The assumption that all objects rest on the horizontal ground level is naive and may cause collisions with, for example, the edges of tables. Further research is needed to combine maps based on subsequent images in a way that takes the error in the odometer measurements into account.

Fig. 1. Diffusion of segment boundaries. a) The initial segment division; the segment marked with A is selected for growing. b) Segment A is dilated by moving a circular kernel along its boundary. c) The pixel-wise likelihood ratio of the competing models in the diffusion area (see text for explanation). d) The old boundary of segment A and the new boundary, determined by low-pass filtering and thresholding the likelihood-ratio map. The neighbouring segments (not shown) are modified to accommodate the change.

5. REFERENCES

[1] Charles Thorpe, Martial Hebert, Takeo Kanade, and Steven Shafer, "Vision and navigation for the Carnegie Mellon Navlab," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 3, pp. 362-373, May 1988. Other versions appeared in High Precision Navigation (Springer-Verlag, 1989) and in Annual Reviews of Computer Science, vol. 2, 1987.

[2] Zhuowen Tu, Song-Chun Zhu, and Heung-Yeung Shum, "Image segmentation by data driven Markov chain Monte Carlo," in Proceedings of the International Conference on Computer Vision, Vancouver, Canada, July 2001.

[3] Alberto Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, pp. 46-57, June 1989.

Fig. 2. Segmentation vs. pixelwise classification. a) Initial segment division. b) Pixelwise classification result (probabilities of pixels belonging to the floor segment). c) Result of probabilistic segmentation. d) Parts of result (b) that are not connected to the initial segment have been left out. e) Sequential region growing with model adaptation over 4-connected neighbouring pixels; equal CPU time as in (c). f) Same as (e) but with twice the CPU time.

Fig. 3. The robot navigates to a target point along a curved corridor using image segmentation. Five images were required to build a map from the starting point to the given target point (12 m, 2.5 m). Top: three of the five images with initial values (black) and segmentation results (white). Bottom: the final map based on the images. The vectors starting from the labeled points indicate the camera orientation during acquisition of the shown images; the target point is indicated with an asterisk. The robot started from the origin and moved backward to point A, and the track of this movement was used as the initial value for the floor appearance. For the rest of the images, the segmentation result of the previous image was projected back onto the current one and used as the initial value. At points B and C, two images were acquired with different camera orientations; the first image of each pair is shown. The approximate location of the true map is shown in the background.