Proceedings of the 6th Int. Conf. on Computer Analysis of Images and Patterns CAIP'95, pp. 874-879, Prague, Czech Republic, Sep 1995

Direct Obstacle Detection and Motion from Spatio-Temporal Derivatives

Pär Fornland
Computational Vision and Active Perception Laboratory (CVAP)
Department of Numerical Analysis and Computing Science
Royal Institute of Technology, S-100 44 Stockholm, Sweden

Abstract. Autonomous vehicles need a means of detecting obstructions in their path, to avoid collision. In this paper, a novel approach to obstacle detection is presented. A camera moves over a visible ground plane with the optical axis parallel to the ground. The camera motion parameters are linearly related to first-order spatio-temporal derivatives of the acquired image sequence; image flow is not needed. The motion is robustly estimated using RANSAC, and an error measure for each image point corresponds to the likelihood of an obstacle at that point.

1 Introduction

Visual systems must detect when there is a risk of colliding with obstructions. Many such situations can be conceptually reduced to obstacles protruding above a flat ground; the task of finding these protrusions is referred to as obstacle detection, for which various schemes using a moving monocular camera have been presented in the literature. Obstacle detection is performed by studying image motion, which differs between regions viewing obstacles and regions viewing the ground plane. In [3] a calibrated visual system was used, assuming translational motion. A reference flow calculated from the calibration information was compared to the estimated flow. But the calibration parameters may not be known, so not all algorithms can rely on this assumption. It was instead suggested [7] to find the reference flow by moving the camera over a ground plane without obstacles; the motion and geometry are then assumed constant during the obstacle detection. Approaches not requiring a reference flow have been suggested, e.g.
examining the divergence [6] and other qualities [8] of the flow. Both detected obstacles without specific assumptions about the camera motion or the scene structure, but both required dense flow estimates, and flow estimation is not yet robust [1] despite considerable efforts by the computer vision community. In [2], a representation of a translational motion and the equation of a plane were continuously updated over long sequences. The estimated parameters enabled a prediction of the image intensities at points viewing the plane; obstacles cause bad predictions. This paper presents a direct framework for obstacle detection using first-order spatio-temporal derivatives of an image sequence. World structure is neither required nor reconstructed from the images. Similar to [2], a short circuit
is introduced into the previously suggested procedures for obstacle detection, since no optical flow is required. Care has been taken not to restrict the motion to pure translation; instead a general framework is developed. The optical axis is parallel to the ground, a case which is often avoided. From the gradient constraint equation [1] and the second-order flow equation [5], together with geometry assumptions, a linear equation relating motion parameters to image derivatives is derived. A version of RANSAC [4] presented here robustly estimates the motion from the over-constrained system of equations by disregarding a number of observations as outliers. Each observation is defined by the coordinates and spatio-temporal derivatives of one point in the image plane. Two approaches are suggested. One globally estimates the motion from all available observations, forming a histogram from the error measure. The other locally estimates the motion parameters in a number of small windows in the image; a multi-dimensional histogram is formed from the estimated parameters. In both cases, a peak is normally found at the correct motion, corresponding to ground plane points. This segments the image into obstacles and ground plane.

2 Theories and Algorithms

Assume that a camera is positioned above a ground plane, as in Fig. 1, in which the X-axis is orthogonal to the paper. A 3D point P = (X, Y, Z) projects onto the image at p = (fX/Z, fY/Z) = (x, y). If length measures are expressed in terms of the focal length, we can set f = 1. An image point may be projected from a point P on the ground plane or from an obstacle Q. The projected 3D velocities also differ between obstacles [3] and ground plane points.

Fig. 1. The camera (focal length f, height Y_0 above the ground), the ground plane with a point P, and an obstacle Q.

A camera moving through a static 3D world is equivalent to a rigid world moving in front of a static camera, a view which is used in this discussion.
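The pinhole projection above, and the ground-plane relation Z = Y_0/y that it implies for points at height Y_0 below the camera, can be checked with a few lines of NumPy. This is a minimal sketch; the function name `project` is illustrative, not from the paper:

```python
import numpy as np

def project(P, f=1.0):
    """Project a 3D point P = (X, Y, Z) onto the image plane.

    With lengths expressed in units of the focal length, f = 1 and
    p = (X/Z, Y/Z), as in Fig. 1 of the paper.
    """
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

# A ground-plane point at vertical offset Y0 projects to y = Y0 / Z,
# so its depth Z = Y0 / y is recoverable from the image row alone.
Y0, Z = 2.0, 10.0
p = project((1.0, Y0, Z))
assert np.isclose(Z, Y0 / p[1])
```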
A constant rigid 3D motion can be resolved into a translational component (V_X, V_Y, V_Z) and a rotational component (Ω_X, Ω_Y, Ω_Z), the latter with angular velocities around axes through the origin. This forms an affine transformation:

    \dot{P} = \begin{pmatrix} \dot{X} \\ \dot{Y} \\ \dot{Z} \end{pmatrix}
            = \begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix}
            + \begin{pmatrix} \Omega_Y Z - \Omega_Z Y \\ \Omega_Z X - \Omega_X Z \\ \Omega_X Y - \Omega_Y X \end{pmatrix}    (1)
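Equation (1) is simply Ṗ = V + Ω × P, so it can be evaluated with a single cross product. A minimal NumPy sketch (the helper name is illustrative):

```python
import numpy as np

def rigid_velocity(P, V, Omega):
    """3D velocity of a world point under constant rigid motion, Eq. (1):
    Pdot = V + Omega x P, with V the translational and Omega the
    rotational component."""
    return np.asarray(V, dtype=float) + np.cross(Omega, P)
```

For example, a point straight ahead at Z = 1 under translation V = (1, 0, 0) and rotation Ω_Y = 0.5 acquires the extra sideways velocity Ω_Y Z = 0.5 predicted by the first row of (1).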
Projecting Ṗ onto the image leads to the second-order [5] image flow, characterizing the motion of image structures with respect to the three-dimensional motion:

    \dot{x} = Z^{-1}(V_X - x V_Z) - xy\,\Omega_X + (1 + x^2)\,\Omega_Y - y\,\Omega_Z
    \dot{y} = Z^{-1}(V_Y - y V_Z) - (1 + y^2)\,\Omega_X + xy\,\Omega_Y + x\,\Omega_Z    (2)

Frequently in motion analysis, the camera is tilted, as in Fig. 1, in order to avoid image points projected from close to the horizon, which cause high and difficult spatial frequencies. In this research the tilt angle is zero, and these difficulties must therefore be handled. Ground plane points satisfy the relation Z = Y_0/y, and since the translational velocities can only be retrieved up to a scale factor, no restriction is imposed by scaling them with 1/Y_0, forming the translational parameters C_X, C_Y and C_Z. The ground plane image flow then becomes

    \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix}
    = \begin{pmatrix} y & 0 & -xy \\ 0 & y & -y^2 \end{pmatrix}
      \begin{pmatrix} C_X \\ C_Y \\ C_Z \end{pmatrix}
    + \begin{pmatrix} -xy & 1 + x^2 & -y \\ -(1 + y^2) & xy & x \end{pmatrix}
      \begin{pmatrix} \Omega_X \\ \Omega_Y \\ \Omega_Z \end{pmatrix}    (3)

Figure 2 shows how the ground plane image flow depends on single motion parameters.

Fig. 2. The flow fields of the V_X, V_Z and Ω_Y components respectively.

2.1 Motion Estimation

A common assumption in computer vision research is that the intensity of any moving image structure, I(x(t), y(t), t), is constant over time, dI/dt = 0. The chain rule of differentiation gives dI/dt = ∇I · ṗ + I_t, where ṗ = (ẋ, ẏ)ᵀ is the image flow. The resulting equation,

    ∇I · ṗ + I_t = 0    (4)

is referred to as the gradient constraint equation (GCE). This equation is the foundation of a wide variety of techniques [1] for estimating image flow. The flow function (3) is now inserted into the GCE (4), resulting in

    y I_x C_X + y I_y C_Y - (xy I_x + y^2 I_y) C_Z - (xy I_x + (1 + y^2) I_y)\,\Omega_X
      + ((1 + x^2) I_x + xy I_y)\,\Omega_Y + (x I_y - y I_x)\,\Omega_Z + I_t = 0    (5)

where the motion parameters are the unknowns, and the coefficients are formed from observations taken from the image. The assumption that the optical axis and the X-axis are always parallel to the ground plane is violated by X- and Z-rotations, respectively. A Y-translation violates the assumption of constant height.
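The ground-plane flow (3) is a pair of matrix-vector products and is straightforward to evaluate. A small NumPy sketch, with illustrative names:

```python
import numpy as np

def ground_flow(x, y, C, Omega):
    """Image flow (xdot, ydot) of a ground-plane point, Eq. (3).

    C = (C_X, C_Y, C_Z) is the translation scaled by 1/Y0;
    Omega = (Omega_X, Omega_Y, Omega_Z) is the rotation.
    """
    A = np.array([[y, 0.0, -x * y],
                  [0.0, y, -y * y]])
    B = np.array([[-x * y, 1.0 + x * x, -y],
                  [-(1.0 + y * y), x * y, x]])
    return A @ np.asarray(C) + B @ np.asarray(Omega)
```

For a pure forward translation C_Z, the flow at (x, y) is (−xy C_Z, −y² C_Z), i.e. points stream outward from the focus of expansion, consistent with the V_Z field of Fig. 2.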
These arguments lead to the constraints C_Y = Ω_X = Ω_Z = 0.
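With these constraints, each image point contributes one linear equation of (5) in the three remaining unknowns (C_X, C_Z, Ω_Y). A hedged sketch of assembling one such observation row (the function name and the right-hand-side convention are illustrative, not from the paper):

```python
import numpy as np

def observation_row(x, y, Ix, Iy, It):
    """One linear constraint from Eq. (5) after imposing
    C_Y = Omega_X = Omega_Z = 0:

        a . (C_X, C_Z, Omega_Y) = -I_t

    where (Ix, Iy, It) are the spatio-temporal derivatives at (x, y).
    """
    a = np.array([y * Ix,
                  -(x * y * Ix + y * y * Iy),
                  (1.0 + x * x) * Ix + x * y * Iy])
    return a, -It
```

Stacking one row per image point yields the over-constrained linear system from which the motion is estimated in Section 2.2.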
2.2 The RANSAC Method

The RANSAC method [4] has been applied in computer vision before, but not to this problem. Given a set of observations of a parametrised function, it finds a subset of observations fitting the function, disregarding the other observations. A simple version fits a line to a set of points (x, y). Figure 3 (a) exemplifies a difficult line-fitting with synthetic data, where RANSAC performs well; the dashed line is estimated using all observations in a least-squares method.

Fig. 3. (a) RANSAC line fitting. (b) Error histogram for ground only, and (c) for ground with an obstacle.

A generalisation uses M noisy observations x_i ∈ R^N of a hyperplane A · x = A_1 x_1 + ... + A_N x_N = 1, where A is a normal to the hyperplane, and x are coordinates. First, an integer index vector i = (i_1, ..., i_N) with 1 ≤ i_k ≤ M is chosen randomly, defining a preliminary hyperplane through

    (x_{i_1} \; \ldots \; x_{i_N})^T A = (1 \; \ldots \; 1)^T    (6)

The signed orthogonal distance between a point x and the hyperplane is d = (x · A − 1)/|A|, i.e. points on opposite sides of the plane have different signs. The set of observation points lying within a predefined distance from the hyperplane is remembered; the remaining points are disregarded as outliers. The procedure is repeated n times, each time for a different i. The final hyperplane is estimated with a least-squares fit of the largest remembered set.

2.3 Obstacle Detection

The obstacles are detected in the following manner. The RANSAC method first globally estimates the motion parameter hyperplane. For each observation derived from the ground plane, the signed distance d (see above) is found to have an approximately Gaussian distribution N(0, σ), but points viewing obstacles correspond to observation variables with different statistics. The RANSAC method is robust against outliers, in this case corresponding to obstacles. Histogramming techniques can be applied to provide thresholds, indirectly segmenting the image into ground and obstacles.
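The hyperplane version of RANSAC described above can be sketched in a few lines of NumPy. This is a minimal implementation under the paper's A · x = 1 formulation; parameter names and default values are illustrative assumptions:

```python
import numpy as np

def ransac_hyperplane(X, n_iter=200, tol=0.05, rng=None):
    """Fit a hyperplane A . x = 1 to the rows of X (M x N), Sect. 2.2.

    Repeatedly picks N random observations, solves for a preliminary
    normal A, and remembers the candidate with the most points within
    signed distance `tol`; A is finally refined by least squares over
    the largest remembered set.
    """
    rng = np.random.default_rng(rng)
    M, N = X.shape
    best_inliers = np.zeros(M, dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(M, size=N, replace=False)
        try:
            A = np.linalg.solve(X[idx], np.ones(N))
        except np.linalg.LinAlgError:
            continue  # degenerate sample, try another index vector
        # signed orthogonal distance d = (x . A - 1) / |A|
        d = (X @ A - 1.0) / np.linalg.norm(A)
        inliers = np.abs(d) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares fit on the largest inlier set
    A, *_ = np.linalg.lstsq(X[best_inliers],
                            np.ones(best_inliers.sum()), rcond=None)
    return A, best_inliers
```

With N = 2 this reduces to the line-fitting example of Fig. 3 (a): points on the line 2x + 3y = 1 are recovered even when a substantial fraction of the observations are gross outliers.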
The threshold will be delicate to estimate if the obstacles are small, as their histogram peaks might then be below the noise level. But a large obstacle corresponds to a discernible peak in the histogram. The largest allowed obstacle size, at which the method starts breaking down, is investigated in a forthcoming, longer version of this paper.
The motion estimation can also be performed separately in small windows of the image. For windows containing only the ground plane, the motion parameters are expected to be well estimated, but for windows viewing obstacles, the estimated motion will be erroneous. Multi-dimensional histograms are formed from the set of estimated motion parameters of the image windows, and a prominent peak should normally be found at the correct motion. Histogram bins in a neighborhood of the peak correspond to image windows viewing the ground plane; the other bins correspond to potential obstacles.

3 Experiments

Experiments were performed both for the motion estimation and for the obstacle detection. Three consecutive, spatially smoothed images were used. The first experiments confirm that direct estimation of the camera motion is possible. Synthetic images with only two motion parameters, and the geometry described in Section 2, were used. Inspired by [2], only the lower part of the image was considered. The randomness of RANSAC causes a small noise, see Table 1.

    Ground truth        Estimates
    C_Z      Ω_Y        C_Z       Ω_Y
    0.02    -0.1        0.0196   -0.0931
    -0.03    0         -0.0301   -0.0109
    0       -0.1       -0.0002   -0.1027
    0.03     0.2        0.0295    0.1989

Table 1. Motion estimation experiments without obstacles.

Histograms of the distance from each observation point to the estimated motion parameter hyperplane were examined. Synthetic images were produced, and as indicated by Fig. 3 (b), the shape of the histogram formed from a ground plane resembles a Gaussian distribution. Figure 3 (c) shows a histogram formed from a mix of a ground plane and a constant-height obstacle. Real images were used to evaluate the obstacle detection. The exact motion parameters were unknown, but the motion was approximately translational in the (X, Z)-plane. The images view a box on a ground plane. As real images are noisy due to motion blur, sensor noise etc., and the box has only small motion parallax close to the ground, the segmentation is not expected to be very robust.
The stronger local-window strategy was therefore employed. The resulting binary image indicating obstacles is eroded to remove smaller regions. Thresholds and RANSAC parameters are scene-dependent, and are therefore selected accordingly. The obstacles are marked in Fig. 4, where the top of the box is not detected since only the lower part of the image is used, where the ground plane is visible. The detection works well for obstacle parts sufficiently high above the ground.
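The local-window voting used here can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the bin count and the ±1-bin neighbourhood rule for "near the peak" are assumptions.

```python
import numpy as np

def window_vote(params, bins=20):
    """Multi-dimensional histogram voting over per-window motion estimates.

    `params` is a (W x K) array: one K-parameter motion estimate per
    image window.  Windows whose estimates fall in or next to the
    dominant histogram bin are labelled ground plane; the rest are
    flagged as potential obstacles.
    """
    hist, edges = np.histogramdd(params, bins=bins)
    peak = np.unravel_index(np.argmax(hist), hist.shape)
    # bin index of every window along each parameter axis
    idx = np.stack([np.clip(np.digitize(params[:, k], edges[k]) - 1,
                            0, bins - 1)
                    for k in range(params.shape[1])], axis=1)
    ground = np.all(np.abs(idx - np.array(peak)) <= 1, axis=1)
    return ~ground  # True where a window is a potential obstacle
```

Windows agreeing with the dominant (C_Z, Ω_Y) estimate vote into one peak bin; windows over the box produce inconsistent estimates and land in distant bins, which is what the erosion step then cleans up.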
Fig. 4. Box scenes with the detected obstacles.

Acknowledgements: The author wishes to thank Dr. Bergholm for providing valuable experience in motion estimation.

References

1. J.L. Barron, D.J. Fleet, and S.S. Beauchemin, "Performance of optical flow techniques", International Journal of Computer Vision, vol. 12, no. 1, pp. 43-77, 1994.
2. S. Carlsson and J-O. Eklundh, "Object detection using model based prediction and motion parallax", in Proceedings of the First European Conference on Computer Vision, pp. 134-138, Springer-Verlag, Apr. 1990. (Antibes, France).
3. W. Enkelmann, "Obstacle detection by evaluation of optical flow fields from image sequences", Image and Vision Computing, vol. 9, no. 3, pp. 160-168, 1991.
4. M.A. Fischler and R.C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography", Commun. ACM, vol. 24, pp. 381-395, 1981.
5. H.C. Longuet-Higgins and K. Prazdny, "The interpretation of a moving retinal image", in Proc. Royal Society London B-208, pp. 385-397, 1980.
6. R.C. Nelson and Y. Aloimonos, "Obstacle avoidance using flow field divergence", IEEE Trans. on PAMI, vol. 11, pp. 1102-1106, 1989.
7. M. Tistarelli and G. Sandini, "Dynamic aspects in active vision", CVGIP: Image Understanding, vol. 56, pp. 108-129, July 1992.
8. G-S. Young, T-H. Hong, M. Herman, and J.C.S. Yang, "Safe navigation for autonomous vehicles: A purposive and direct solution", in SPIE Int. Conf. on Intelligent Robots and Computer Vision XII, pp. 31-42, 1993.