DYNAMIC STEREO VISION FOR INTERSECTION ASSISTANCE

FISITA 2008 World Automotive Congress, Munich, Germany, September 2008

DYNAMIC STEREO VISION FOR INTERSECTION ASSISTANCE

1 Franke, Uwe*; 2 Rabe, Clemens; 1 Gehrig, Stefan; 3 Badino, Hernan; 1 Barth, Alexander
1 Daimler AG, Group Research, Germany; 2 University of Kiel; 3 University of Frankfurt

KEYWORDS

environment perception, driver assistance, intersection assistance, stereo vision, space-time stereo

ABSTRACT

More than one third of all traffic accidents with injuries occur in urban areas, especially at intersections. A driver assistance system supporting the driver in cities is therefore highly desirable and has tremendous potential to reduce the number of collisions at intersections. A suitable system for such complex situations requires a comprehensive understanding of the scene. This implies a precise estimation of the free space and the reliable detection and tracking of other moving traffic participants. Since the goal of accident-free traffic requires a sensor with high spatial and temporal resolution, stereo vision will play an important role in future driver assistance systems.

Most known stereo systems concentrate on single image pairs. In intelligent vehicle applications, however, image sequences have to be analyzed. This contribution shows that a smart fusion of stereo vision and motion analysis (optical flow) gives much better results than classical frame-by-frame reconstruction. The basic idea is to track points with depth known from stereo vision over two or more consecutive frames and to fuse the spatial and temporal information using Kalman filters. The result is an improved accuracy of the 3D position and, at the same time, an estimate of the 3D motion of the considered point. This approach, called 6D-Vision, enables the detection of moving objects even if they are partially hidden.

From static points, very accurate occupancy grids are built, and a global optimization technique delivers a robust estimate of the free space. Pixels moving in the world are clustered into objects which are then tracked over time in order to estimate their motion state and to predict their paths. This allows for powerful collision avoidance systems: pedestrians crossing the street are detected before they enter the lane, and the same holds for vehicles approaching from the side, which are not detectable by common radar systems. Since we are able to estimate the yaw rate of oncoming traffic, the prediction is not restricted to straight motion; potential collisions with turning traffic, especially at intersections, can be detected as well.

Urban vision calls for a large field of view. Within the German project AKTIV, a fisheye stereo camera system with a field of view of up to 150 degrees is under development. If the 6D-Vision principle is applied to these images, laterally entering vehicles are also detectable.

INTRODUCTION

Stereo vision is a research topic with a long history; see (1) for an overview. For a long time, correlation-based techniques were commonly used. They deliver precise and reliable measurements in real-time on a PC or on dedicated hardware. Recently, much progress has been achieved in dense stereo. Especially the work of Hirschmueller (2) paves the road towards real-time solutions: his so-called Semi-Global Matching algorithm delivers near-optimum solutions at the computational expense of a classical correlation scheme. New sub-pixel algorithms (3) reduce the distance noise significantly and further push the limits of stereo for a given camera system. Fig. 1 compares the results obtained by a common correlation-based scheme with a modern dense stereo algorithm. The colors encode distance: the warmer the color, the closer the point. The results differ not only in density; note the differences in low-contrast areas such as the building and the road surface.

Using stereo vision, the three-dimensional structure of the scene is easily obtained. The standard approach for free space analysis and obstacle detection is as follows: after rectification, the stereo correspondences are computed. Then, all 3D points are projected onto an occupancy grid. In a third step, this grid is segmented, and potential obstacles are tracked over time in order to verify their existence and to estimate their motion state. This strategy ignores the strong correlation of successive frames and the information contained within.

This paper describes an efficient exploitation of this correlation in time. It leads to more precise and stable results, and it allows estimating the motion state of single image points even before objects are detected. This "track-before-detect" approach distinguishes between static and moving pixels before any segmentation has been performed. Using static points, very accurate occupancy grids are generated, while moving points can be easily grouped.

The paper is organized as follows: first we sketch the problems in stereo vision and show that the uncertainties of occupancy grids are significantly reduced if the stereo information is integrated over time. Then, we introduce a Kalman filter based integration of stereo and optical flow allowing for the direct measurement of the 3D position and 3D motion of all tracked image points (6D-Vision). The following section describes the motion state estimation of oncoming vehicles at intersections. Finally, we highlight the potential of fisheye cameras for intersection assistance and give results.

Fig. 1: Correlation-based stereo (left) vs. dense stereo (right). Red encodes close, green encodes far points. Note the higher density, especially in low-contrast areas like the road or the building on the right side.
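To make the classical correlation scheme concrete, the following is a minimal sketch of correlation-based disparity search (sum of absolute differences over small patches) on a rectified pair. The function name and parameters (patch size, search range) are illustrative assumptions, not the production system; a real-time implementation would be heavily optimized.

```python
import numpy as np

def sad_disparity(left, right, max_disp=64, patch=5):
    """Minimal correlation-based stereo on a rectified grayscale pair:
    for each pixel of the left image, search along the same row of the
    right image (the epipolar line) for the shift with the lowest sum
    of absolute differences (SAD)."""
    h, w = left.shape
    r = patch // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
            costs = [np.abs(ref - right[y - r:y + r + 1,
                                        x - d - r:x - d + r + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)  # integer winner-takes-all disparity
    return disp
```

This brute-force version illustrates the principle only; sub-pixel interpolation of the cost minimum, as in (3), would be the next refinement.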

STEREO VISION AND FREE SPACE ANALYSIS

Given a carefully rectified stereo image pair (i.e. all lens distortions have been corrected and the epipolar lines coincide with the image rows), stereo vision aims to find corresponding features in the left and right image along the epipolar lines. From the disparities, i.e. the offsets between corresponding points, the world position can easily be derived. Nevertheless, the task is not as simple as it sounds: periodic structures can cause false correspondences, occluded points are hard to identify, areas with low or no contrast are difficult to evaluate, and illumination differences demand robustness of the similarity measure used. Besides the mentioned epipolar constraint, other constraints such as the ordering constraint, the uniqueness constraint, the smoothness constraint, or the recently introduced gravitational constraint (3) help to solve these problems.

Since the relative orientation of a stereo camera system cannot be assumed to be constant over time, a slow on-line calibration is necessary. Recently, Dang (4) proposed a scheme that solves this task robustly.

As mentioned in the introduction, it is common to accumulate all 3D points above ground in a stochastic occupancy grid. Figure 2 shows such a grid obtained for the urban situation considered in the sequel. The origin of the coordinate system is centered in our own vehicle. Our standard stereo camera system has a baseline of 30 cm and an angle of view of 42 degrees; the imagers have VGA resolution.

Fig. 2: Occupancy grids of an urban situation. Left: stereo image pair with enlarged bicyclist. Middle: the stochastic occupancy grid based on a single image pair. Right: the improved accuracy obtained by the integration procedure described below. Note the decreased uncertainty, especially at larger distances.

The uncertainty of stereo depth measurements increases quadratically with distance. Therefore, the bicyclist (enlarged in the left image) at around 60 m is highly blurred in the occupancy grid. Free space analysis of such occupancy grids is not very reliable, so we are looking for strategies to reduce the uncertainty.
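For reference, a minimal sketch of the triangulation behind these grids, under assumed pinhole parameters (focal length f in pixels, baseline b); the quadratic growth of the depth uncertainty follows directly from propagating the disparity noise through Z = f·b/d. The numeric values are illustrative assumptions, not the calibration of the actual camera.

```python
import numpy as np

# Assumed stereo parameters for illustration only.
F_PX = 820.0    # focal length in pixels
BASE = 0.30     # baseline in meters (30 cm, as in the paper)
SIGMA_D = 0.25  # assumed disparity noise in pixels

def triangulate(u, v, d, cu, cv):
    """Reconstruct a 3D point from pixel (u, v) and disparity d > 0.
    Since Z = f*b/d, the depth error grows quadratically with distance:
    sigma_Z = Z^2 / (f*b) * sigma_d."""
    z = F_PX * BASE / d
    x = (u - cu) * z / F_PX
    y = (v - cv) * z / F_PX
    sigma_z = z**2 / (F_PX * BASE) * SIGMA_D
    return np.array([x, y, z]), sigma_z
```

With these numbers, a point at 60 m depth has sigma_Z of roughly 3.7 m, which is why the bicyclist appears so blurred in the single-frame grid.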

One way to reduce the disparity noise is to track features in the images over multiple frames. If the disparity measurements are uncorrelated, the variance decreases with 1/N, where N is the number of images. The 6D-Vision algorithm described below exploits this fact. Fortunately, explicit tracking becomes redundant in static scenes when the ego-motion of the camera is known a priori. This is beneficial since it allows working with dense stereo disparity maps despite the real-time constraint. Disparity measurements that are consistent over time are considered as belonging to the same world point, and the disparity variance is reduced accordingly. This stereo integration requires three main steps:

Prediction: the current integrated disparity image and a variance image are predicted. This is equivalent to computing the expected optical flow and disparity based on the ego-motion. Our prediction of the variance image includes the addition of a driving-noise parameter that models the uncertainties of the system, such as ego-motion inaccuracy.

Measurement: disparity and variance images are computed based on the current left and right images.

Update: if the measured disparity confirms its prediction, both are fused, reducing the variance of the estimate. The consistency of the disparity is verified using a standard 3-sigma test.

Figure 2 shows an example of the improvement achieved. The occupancy grid shown at the right was computed with an integrated disparity image. Note the significantly reduced uncertainties of the registered 3D points; the bicyclist at approximately 60 meters is marked in the images.
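A minimal per-pixel version of this predict/measure/update cycle might look as follows. The warping that applies the ego-motion to the previous integrated images is abstracted away (the inputs are assumed to be already warped to the current frame), and the driving-noise value is an assumption.

```python
import numpy as np

Q_DRIVE = 0.02  # assumed driving noise added per frame (disparity^2 units)

def integrate_disparity(pred_d, pred_var, meas_d, meas_var):
    """Fuse predicted and measured disparity images pixel-wise.
    pred_d / pred_var: integrated disparity and variance, warped to the
    current frame via the known ego-motion, with Q_DRIVE already added
    to pred_var.  meas_d / meas_var: current stereo measurement."""
    # 3-sigma test: fuse only where the measurement confirms the prediction
    gate = np.abs(meas_d - pred_d) <= 3.0 * np.sqrt(pred_var + meas_var)
    k = pred_var / (pred_var + meas_var)            # per-pixel Kalman gain
    fused_d = np.where(gate, pred_d + k * (meas_d - pred_d), meas_d)
    fused_var = np.where(gate, (1.0 - k) * pred_var, meas_var)
    return fused_d, fused_var
```

Where the test fails (a newly appearing or moving surface), the filter restarts from the raw measurement, so the 1/N variance reduction applies only to consistently observed, static world points.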

The occupancy grids shown above are in Cartesian coordinates. However, Cartesian space is not well suited to computing the free space, because the search must be done in the direction of rays leaving the camera, and a set of rays spanning the whole grid leads to discretization problems. A more appropriate space is polar space: in polar coordinates, every grid column is, by definition, already in the direction of a ray, so searching for obstacles in the ray direction is straightforward. For the computation of the free space, the first step is therefore to transform the Cartesian grid into a polar grid by a remapping operation. The polar representation we use is a Column/Disparity occupancy grid; for a detailed discussion see (5). The result of this remapping is shown in the middle image of Figure 3.

Fig. 3: Free space computation. The green carpet shows the computed available free space. The free space is obtained by applying dynamic programming to a Column/Disparity occupancy grid, which is a remapping of the Cartesian depth map shown at the right. The free space resulting from the dynamic programming is shown over the grids.

In the polar representation, the task is to find the first visible obstacle in the positive direction of depth; all the space in front of an occupied cell is considered free. The desired solution forms a path from left to right, segmenting the polar grid into two regions. Instead of simply thresholding each column, dynamic programming is used (a sketch follows below). The method based on dynamic programming has the following properties:

Global optimization: each column is considered not independently, but as part of a global optimization problem which is solved optimally.

Spatial and temporal smoothness of the solution: spatial smoothness is imposed by penalizing depth discontinuities, while temporal smoothness is imposed by penalizing deviations of the current solution from the prediction.

Preservation of spatial and temporal discontinuities: saturating the spatial and temporal costs preserves discontinuities.

Figure 3 shows the result of the dynamic programming applied to the considered scene. For more details on this analysis see (6).
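As an illustration of the left-to-right dynamic programming over the polar grid, here is a simplified sketch. The cost weights and the saturation threshold are assumptions, and the temporal smoothness term of the actual system (6) is omitted for brevity.

```python
import numpy as np

W_SMOOTH, SAT = 1.0, 8.0  # assumed smoothness weight and cost saturation

def free_space_dp(polar_grid):
    """polar_grid[r, c]: occupancy evidence at depth row r, angle column c.
    Returns, per column, the row of the first obstacle along that ray.
    The data cost favors rows with high evidence; a saturated pairwise
    cost penalizes jumps between neighboring columns while still allowing
    true depth discontinuities."""
    rows, cols = polar_grid.shape
    data = -polar_grid                       # high evidence -> low cost
    cost = data[:, 0].copy()
    back = np.zeros((rows, cols), dtype=int)
    rr = np.arange(rows)
    for c in range(1, cols):
        # pairwise[prev, cur]: saturated penalty for changing the row
        pairwise = W_SMOOTH * np.minimum(np.abs(rr[:, None] - rr[None, :]), SAT)
        total = cost[:, None] + pairwise     # total[prev, cur]
        back[:, c] = np.argmin(total, axis=0)
        cost = total[back[:, c], rr] + data[:, c]
    # backtrack the globally optimal path from right to left
    path = np.empty(cols, dtype=int)
    path[-1] = int(np.argmin(cost))
    for c in range(cols - 1, 0, -1):
        path[c - 1] = back[path[c], c]
    return path
```

Everything between the camera and the returned row in each column is reported as free space; no per-column obstacle threshold is needed.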

6D-VISION

Until now, we assumed the world to be static and showed how to combine successive stereo image pairs to reduce the variance of the free space estimation. This information can be used for obstacle detection and avoidance in a straightforward manner, since all non-free space is considered an obstacle. However, the world is not completely static, and a system for obstacle detection has to cope with moving objects and estimate their movements precisely in order to predict potential collisions. A common approach is to analyze the occupancy grid and to track isolated objects over time. The major disadvantage of this approach is that the segmentation of isolated objects is difficult in scenes containing multiple nearby objects.

Fig. 4: Dangerous traffic scene. The left image shows a pedestrian appearing behind a standing car. The corresponding stereo reconstruction is shown in the center image; red encodes close, green encodes far points. The optical flow field is shown in the right image; red lines encode large image displacements, green small displacements.

This problem is illustrated in Figure 4: the pedestrian appears behind the standing car and runs towards the street. In the center image, the reconstructed stereo information is shown using the red-to-green color encoding. The points belonging to the pedestrian are hardly distinguishable from the points on the standing car, so a segmentation based on this information alone would merge the pedestrian and the standing car into a single static object. In the right image, the optical flow between the last and the current frame is shown; the color encodes the length of the displacement vector. Here the pedestrian and the standing car can easily be distinguished.

This leads to the main idea of the 6D-Vision algorithm: track an image point in one camera from frame to frame and calculate its stereo disparity. Together with the known motion of the ego-vehicle, the movement of the corresponding world point can be calculated. In practice, a direct motion calculation based on two consecutive frames is extremely noisy. Therefore, the obtained measurements are filtered by a Kalman filter. Since we allow the observer to move, we fix the origin of the coordinate system to the car. The state vector of the Kalman filter consists of the world point in the car coordinate system and its corresponding velocity vector; the six-dimensional state vector (X, Y, Z, Ẋ, Ẏ, Ż) gives the algorithm its name: 6D-Vision. The measurement vector used in the update step of the Kalman filter is (u, v, d), with u and v being the current image coordinates of the tracked image point and d its corresponding disparity. As the perspective projection formulae are non-linear, we have to apply an Extended Kalman filter. The mathematical details are given in (7).

Fig. 5: 6D-Vision block diagram.

A block diagram of the algorithm is shown in Figure 5. In every cycle, a new stereo image pair is obtained. In the left image, appropriate features (e.g. edges, corners) are detected and tracked over time. In the current application we use a version of the Kanade-Lucas-Tomasi tracker (8), which provides sub-pixel accuracy and tracks features robustly over long image sequences. The disparities of all tracked features are determined in the stereo module; after this step, the estimated 3D position of each feature is known. Together with the ego-motion, the measurements of the tracking and stereo modules are given to the Kalman filter system, which updates the state estimates. For the analysis of the next image pair, the acquired 6D information is used to predict the image positions of the tracked features. This improves the tracking performance with respect to speed and robustness. In addition, the predicted depth information is used to improve the stereo calculation.

The motion of a vehicle is not at all straight but exhibits strong pitch and roll motion. In order to compensate for these disturbances, a precise ego-motion analysis is advisable. If stereo tracks are available, the full motion state (6 degrees of freedom) can be obtained from vision. The powerful real-time algorithm we use is described in (9).
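A minimal sketch of the filter's measurement model under an assumed pinhole geometry (focal length F, baseline B, principal point (CU, CV)); the full filter in (7) additionally compensates the ego-motion in the prediction step, which is omitted here, and the parameter values are illustrative.

```python
import numpy as np

F, B, CU, CV = 820.0, 0.30, 320.0, 240.0  # assumed camera parameters

def h(state):
    """Project the 6D state (X, Y, Z, Vx, Vy, Vz) to the measurement
    (u, v, d): image coordinates and stereo disparity."""
    X, Y, Z = state[:3]
    return np.array([F * X / Z + CU, F * Y / Z + CV, F * B / Z])

def jacobian_h(state):
    """Jacobian of h with respect to the 6D state, needed because the
    perspective projection is non-linear (Extended Kalman filter)."""
    X, Y, Z = state[:3]
    J = np.zeros((3, 6))
    J[0, 0] = F / Z;  J[0, 2] = -F * X / Z**2
    J[1, 1] = F / Z;  J[1, 2] = -F * Y / Z**2
    J[2, 2] = -F * B / Z**2
    return J

def ekf_update(x, P, z, R):
    """Standard EKF measurement update with z = (u, v, d)."""
    H = jacobian_h(x)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - h(x))
    P = (np.eye(6) - K @ H) @ P
    return x, P
```

The velocity components enter only through the constant-velocity prediction step; the measurement itself observes position, and the filter infers the motion from the sequence of updates.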

Fig. 6: Estimation results for the pedestrian from Figure 4. The time between the images is (from left to right) 0, 80, 160 and 240 ms. The vectors point to the predicted position of the corresponding world point in 0.5 s. The color encodes the distance of the points.

The result of this algorithm is shown in Figure 6: from left to right, the estimation results for the pedestrian from Figure 4 at 0, 80, 160 and 240 ms after his first appearance. The estimated velocity vectors point to the predicted position of the corresponding world point in 0.5 s; the colors encode the distance of the points. This rich information helps to detect the moving pedestrian and, at the same time, provides a first prediction of his movement.

OBJECT TRACKING

6D-Vision is a powerful method to extract linear point motion in the 3D world. A group of 6D vectors corresponding to adjacent 3D points with similar 3D motion vectors is likely to belong to the same object and can thus be used to generate object hypotheses. However, due to the linear motion model of the single points, the prediction of such an object hypothesis without further constraints is also limited to linear motion. For vehicles, especially in turning maneuvers, the predicted driving path then cannot be very precise and may lead to misinterpretations.

In (10), a vision-based approach for estimating the non-linear motion state of vehicles from a moving platform is proposed. Objects are represented by a 3D point cloud combined with a state vector including object pose and dynamics. It is assumed that vehicles can be distinguished from other moving objects such as pedestrians, e.g. based on the dimension and velocity of a cluster of 6D vectors.

Fig. 7: Turning vehicle at an intersection. The orange box indicates the estimated position and orientation. The arrow indicates the predicted driving path assuming constant motion.

The dynamics of a vehicle is approximated by a coordinated turn motion model, which restricts lateral movement to a circular path determined by velocity and yaw rate. Moving the point cloud in the world induces changes in the image plane, which are observed in terms of optical flow and disparity changes. An Extended Kalman filter is used to solve the inverse problem, i.e. to relate these observations in the image to a movement of the point cloud in the world.
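A sketch of the coordinated turn prediction behind the path arrow in Fig. 7; the state layout (x, z, heading psi, speed v, yaw rate omega) is a simplified illustrative choice, not the full state vector of (10).

```python
import numpy as np

def coordinated_turn_step(state, dt):
    """Propagate a simplified vehicle state (x, z, psi, v, omega) by dt
    under the coordinated turn model: constant speed and yaw rate force
    the vehicle onto a circular arc."""
    x, z, psi, v, omega = state
    if abs(omega) < 1e-6:                      # straight-line limit
        return np.array([x + v * dt * np.cos(psi),
                         z + v * dt * np.sin(psi), psi, v, omega])
    x += v / omega * (np.sin(psi + omega * dt) - np.sin(psi))
    z += v / omega * (-np.cos(psi + omega * dt) + np.cos(psi))
    return np.array([x, z, psi + omega * dt, v, omega])

# Predicting the driving path for the next second (assumed example values):
state = np.array([0.0, 20.0, np.pi, 8.0, 0.5])  # x, z [m], psi [rad], v [m/s], omega [rad/s]
path = []
for _ in range(10):
    state = coordinated_turn_step(state, 0.1)
    path.append(state[:2])                      # predicted positions along the arc
```

With a non-zero estimated yaw rate, the predicted path bends into the turn instead of extrapolating straight ahead, which is exactly what makes collision prediction with turning traffic possible.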

All points are referred to a local object coordinate system defined for each tracked vehicle. It is assumed that the true position of a point within the object coordinate system does not change over time (rigid body assumption) and that the object's structure is well described by the point cloud. In practice, one has to deal with noisy observations of these points; neither the true position of a point nor the overall structure is known. However, it is possible to refine the object point cloud over time based on a number of noisy observations of the single points.

Fig. 7 shows a typical situation at an intersection. The orange box shows the current position of the oncoming car. The complete motion state of the turning vehicle has been estimated from the stereo tracks. Assuming constant motion, the green arrow in front of the car indicates the expected circular driving path for the next second.

The proposed system is able to estimate the motion state of vehicles at urban intersections, including velocity, yaw rate, and acceleration as well as position and orientation, and currently runs at 25 Hz on our demonstrator car UTA. The filter can easily be extended with additional measurements, for example radar information such as relative velocity or distance.

FISHEYE STEREO FOR INTERSECTION ASSISTANCE

Common stereo camera systems have opening angles around 40 degrees. Simple investigations reveal that this angle must be increased to about 150 degrees if dangerous situations at intersections are to be recognized, e.g. vehicles coming from the side. Fisheye lenses, in contrast to standard wide-angle lenses, have the advantage of a constant resolution over the whole field of view. Currently, we use 150-degree lenses; a typical image is shown in Fig. 8. The computation can be limited to 400 lines of the 1628x1236 imager.

In a first step, the images are rectified based on calibration data obtained in an offline process; for details see (11). This allows using the free space analysis and 6D-Vision described above without any changes. The rectification works with a cylindrical camera model instead of the pinhole model in order to obtain a bounded image size. Figure 8 shows a situation at a pedestrian crossing; the computed free space is overlaid in green.

Fig. 8: Free space analysis for the pedestrian crossing situation.

Figure 9 shows a second intersection scene, in which a vehicle having the right-of-way approaches quickly from the right. Note the position of the vehicle at initial detection: it is first detected at 15 m longitudinal and 22 m lateral distance, i.e. at 26 m Euclidean distance.
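A sketch of the cylindrical re-projection idea: each column of the rectified image corresponds to a fixed azimuth angle, so even a 150-degree field of view maps to a bounded image width, which a pinhole rectification cannot provide. The lookup-table construction and the fisheye model (an assumed equidistant model) are illustrative; the actual calibration is described in (11).

```python
import numpy as np

def cylindrical_lut(width, height, hfov_deg, f_cyl, f_fish):
    """Build per-pixel source coordinates that map a fisheye image onto a
    cylindrical image, assuming an equidistant fisheye model
    (r = f_fish * theta).  Returns offsets from the fisheye principal
    point, to be used with any image resampling routine."""
    hfov = np.radians(hfov_deg)
    az = (np.arange(width) / (width - 1) - 0.5) * hfov   # azimuth per column
    ey = (np.arange(height) - height / 2.0) / f_cyl      # height on unit cylinder
    azg, eyg = np.meshgrid(az, ey)
    # viewing ray on the unit cylinder (z forward, y down)
    x, y, z = np.sin(azg), eyg, np.cos(azg)
    norm = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(z / norm)          # angle off the optical axis
    phi = np.arctan2(y, x)
    r = f_fish * theta                   # equidistant fisheye projection
    return r * np.cos(phi), r * np.sin(phi)
```

Because horizontal image position is linear in azimuth, epipolar lines of a horizontally displaced stereo pair remain image rows, so the disparity integration and 6D-Vision modules run on the rectified images unchanged.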

An earlier detection was impossible due to occlusion by a wall, visible at the right edge of the image.

Fig. 9: 6D-Vision result for a scene with a vehicle approaching fast from the right and having the right-of-way. The significant lateral motion is detected within 4 frames. The arrow length shows the predicted position in 0.5 s. The arrow color encodes distance: red is near, green is far away.

Fig. 10: Situation shown in Fig. 9 two seconds later, after our own vehicle has stopped.

The actual object detection is done via direction and position analysis of the 6D vectors (see the previous section). Figure 10 shows the same scene two seconds later: the ego-vehicle has almost stopped, while the vehicle from the right was able to continue.

SUMMARY

Vehicles acting in a dynamic environment must be able to detect any static or moving obstacle. This implies that an optimal stereo vision algorithm has to seek an optimal exploitation of the spatial and temporal information contained in the image sequence.

As shown in this paper, the precision and robustness of 3D reconstructions improve significantly if the stereo information is appropriately integrated over time. This requires knowledge of the ego-motion, which in turn can be computed efficiently from 3D tracks; the obtained ego-motion data turns out to outperform commonly used inertial sensors. The resulting depth maps show less noise and uncertainty than those generated by simple frame-by-frame analysis. A dynamic programming approach allows determining the free space without any error-prone obstacle threshold. The algorithm runs in real-time on a PC and has proven robust in daily traffic, including night-time driving and heavy rain.

The requirement to detect small or partly hidden moving objects from a moving observer calls for the fusion of stereo and optical flow. This leads to the 6D-Vision approach, which estimates the position and motion of each observed image point simultaneously. Since the fusion is based on Kalman filters, the information contained in multiple frames is integrated. This yields a more robust and precise estimation than differential approaches such as the pure evaluation of the optical flow between consecutive image pairs. Grouping this 6D information is very reliable and enables fast detection of moving objects, which can then be tracked further using appropriate dynamic models.

The same concept is applied to cameras with fisheye lenses. Practical tests confirm that a crossing cyclist at an intersection is detected within 4-5 frames. The implementation on a 3.2 GHz Pentium 4 proves real-time capability: currently, we select and track about 2000 image points at 25 Hz on VGA-resolution images.

REFERENCES

(1) D. Scharstein, R. Szeliski: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV 47(1), 2002, pp. 7-42.

(2) H. Hirschmueller: Accurate and efficient stereo processing by semi-global matching and mutual information. CVPR 2005, San Diego, CA, Volume 2, June 2005, pp. 807-814.

(3) S. Gehrig, U. Franke: Improving stereo sub-pixel accuracy for long range stereo. Workshop on Virtual Representations and Modeling of Large-Scale Environments (VRML), ICCV 2007, Rio de Janeiro, 2007.

(4) T. Dang, C. Hoffmann: Tracking camera parameters of an active stereo rig. 28th Annual Symposium of the German Association for Pattern Recognition (DAGM 2006), Berlin, September 12-14, 2006.

(5) H. Badino, U. Franke, R. Mester: Free space computation using stochastic occupancy grids and dynamic programming. Workshop on Dynamical Vision, ICCV 2007, Rio de Janeiro, 2007.

(6) U. Franke, S. Gehrig, H. Badino, C. Rabe: Towards optimal stereo analysis of image sequences. RobotVision 2008, Auckland, February 18-20, 2008.

(7) U. Franke, C. Rabe, H. Badino, S. Gehrig: 6D-Vision: Fusion of stereo and motion for robust environment perception. 27th DAGM Symposium, 2005, pp. 216-223, ISBN 3-540-28703-5.

(8) J. Shi, C. Tomasi: Good features to track. IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593-600.

(9) H. Badino, U. Franke, C. Rabe, S. Gehrig: Stereo-vision based detection of moving objects under strong camera motion. VISAPP, Setubal, Portugal, February 2006.

(10) A. Barth, U. Franke: Where will the oncoming vehicle be the next second? IEEE Intelligent Vehicles Symposium (IV 2008), Eindhoven, June 4-6, 2008.

(11) S. Gehrig, C. Rabe, L. Krüger: 6D-Vision goes fisheye for intersection assistance. Canadian Conference on Computer and Robot Vision, Windsor, May 2008.