Real-Time Visual Behaviors with a Binocular Active Vision System

Jorge Batista, Paulo Peixoto, Helder Araujo
ISR - Institute of Systems and Robotics, Dep. of Electrical Engineering - University of Coimbra
3000 COIMBRA - PORTUGAL
batista,peixoto,helder@isr.uc.pt

Abstract

An active vision system has to enable the implementation of reactive visual processes and of elementary visual behaviors in real time. Therefore the control architecture is extremely important. In this paper we discuss a number of issues related to the implementation of a real-time control architecture and describe the architecture we are using with our camera heads. Even though in most applications a fully calibrated system is not required, we also describe a methodology for calibrating the camera head, taking advantage of its degrees of freedom. These calibration parameters are used to evaluate the performance of the system. Another important issue in the operation of active vision binocular heads is their integration into more complex robotic systems. We claim that higher levels of autonomy and integration can be obtained by designing the system architecture based on the concept of purposive behavior. At the lower levels we consider vision as a sensor and integrate it in control systems (both feed-forward and servo loops), and several visual processes are implemented in parallel, computing relevant measures for the control process. At higher levels the architecture is modeled as a state transition system. Finally we show how this architecture can be used to implement a pursuit behavior using optical flow. Simultaneously, vergence control can also be performed using the same visual processes.

1 Introduction

Until a few years ago, the main goal of vision was to recover the 3D structure of the environment. According to this paradigm vision is a recovery problem, its goal being the creation of an accurate 3D description of the scene (shape, location and other properties), which would then be given to other cognitive modules (such as planning or reasoning). Systems based on this approach typically used one or two static cameras (or, equivalently, only considered static points of view, without the possibility of changing the viewpoint). Image acquisition, in this framework, is passive. Instead of trying to find general solutions for the vision modules, we can consider the problem of vision in terms of an agent that sees and acts in its environment ([1], [2]). An agent can be defined as a set of intentions (or purposes) which translate into a set of behaviors [3]. The visual system can then be considered as a set of processes working in a cooperative manner to achieve various behaviors ([6], [7]). This is a paradigm known as active/purposive vision. Within this framework we consider that the system is active because it has control over the image acquisition process and acquires images that are relevant for what it intends to do. The control over the image acquisition process enables the introduction of constraints that facilitate the extraction of information about the scene [2]. Therefore our goal when using the active vision system is not the construction of a general-purpose description. The system only needs to recover partial information about the scene. The information to be extracted and its representation have to be determined from the tasks the system has to carry out (its purpose). Vision is considered as part of a complex system that interacts with the environment [4].
Since only part of the information contained in the images needs to be extracted, the visual system will operate based on a restricted set of behaviors (sets of perceptions and actions). By considering vision within this framework, it can be used in real time to control a more complex robotic system such as a mobile platform or a manipulator. At the control level, information extracted from the images is used, as much as possible, to drive the various control signals. Next we describe the architecture we have implemented based on these principles, as well as an application developed with it: a real-time gaze control using optical flow.

2 A Complex Active Vision System

In order to experiment with visual behaviors and to study active vision issues (inspired by biological implementations and in particular by the human visual system) we decided to build a multi-degrees-of-freedom (MDOF) robot head [8]. We call it the MDOF active vision head (see fig. 1). Other groups have built heads and demonstrated them in several applications. At the University of Rochester [11, 14] a binocular head was demonstrated and tracking was performed using vergence control by means of zero-disparity filtering. A complex head was also built at KTH [9] using stepper motors. At the University of Oxford a head was also used [12] to demonstrate the use of image motion to drive saccade and pursuit.

Figure 1: MDOF Active Vision Head and MDOF Eye with the MDOF motorized zoom lens.

2.1 Mechanical and Optical Structure

The binocular head developed by ourselves has a high number of degrees of freedom (see table 1). In addition to the common degrees of freedom for camera heads (neck pan, neck tilt and independent vergence for each of the eyes), this head includes the swing movement of the neck, independent tilt for each eye, baseline control, cyclotorsion of the lenses and the ability to adjust the optical center of the lenses (OCA).

                  Precision     Range                   Velocity
    Neck Pan      0.0036 deg    [-110 ... +110] deg     180 deg/s
    Neck Swing    0.0036 deg    [-27.5 ... +27.5] deg   180 deg/s
    Neck Tilt     0.0036 deg    [-32 ... +32] deg       360 deg/s
    Eye Pan       0.0036 deg    [-45 ... +45] deg       360 deg/s
    Eye Tilt      0.0031 deg    [-20 ... +20] deg       330 deg/s
    Cyclotorsion  0.0031 deg    [-25 ... +25] deg       330 deg/s
    OCA           8 nm          [0 ... 80] mm           1 mm/s
    Baseline      20 nm         [137 ... 287] mm        5 mm/s

Table 1: Mechanical structure characteristics of the MDOF active vision system.

In a real-world environment the range of conditions that a camera may need to image under, be it focused distance, spatial detail, lighting conditions or radiometric sensitivity, can often exceed the capabilities of a camera with a fixed-parameter lens. To adapt to the imaging conditions the camera system requires lenses whose intrinsic parameters can be changed in a precise and fast controllable manner. New motorized lenses have been developed to enable this head to accommodate the optical system in real time (25 images per second) with very good precision (see fig. 1). These lenses have controllable zoom, focus and iris, and they use small harmonic-drive DC motors with encoder feedback. With such performance (see table 2), the lens is able to make the continuous, small optical adjustments required by many algorithms in near real time with excellent precision. Qualitative improvements in lens performance increase the advantages of active vision techniques that rely on controlled variations of intrinsic parameters.

                  Precision      Range               Velocity
    Zoom          Range/90000    [12.5 ... 75] mm    1.2 range/s
    Aperture      Range/50000    [1.2 ... 16]        2.2 range/s
    Focus         Range/90000    [1 ... inf] m       1.2 range/s

Table 2: Optical structure characteristics of the MDOF active vision system.

2.2 System Architecture

The MDOF active vision robot head is controlled by one host computer with a Pentium CPU (166 MHz) and a PCI Matrox Meteor frame-grabber PC board.

Figure 2: The MDOF system architecture.

A modular multi-axis motion controller was used to control all degrees of freedom of the head. This modular system consists of a motherboard where up to six daughterboards or modules can be connected. The motherboard is based on a 32-bit 80960 RISC CPU. On-board multitasking executes up to 10 independent programs or background tasks simultaneously without interrupting motion control. Multiple boards can be built into a single system when more than six modules are required. Three boards are used to control the 18 degrees of freedom of the robot head.
Each DC servo controller module plugged into the motherboard contains a trapezoidal velocity profile generator and a digital PID compensation filter. Each module is a self-contained intelligent controller. Image acquisition is performed by an RGB PCI Matrox Meteor board: each one of the monochrome cameras is connected to one input of the RGB frame grabber.
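For illustration, a minimal sketch of the kind of motion profile such a module generates is given below (Python, with made-up numbers; the real profile generator runs in the controller firmware, not on the host):

    def trapezoidal_profile(distance, v_max, accel, dt=0.001):
        # Velocity samples for a point-to-point move: ramp up at 'accel',
        # cruise at 'v_max', ramp down; falls back to a triangular profile
        # when the move is too short to reach 'v_max'.
        t_acc = v_max / accel
        if accel * t_acc ** 2 > distance:          # triangular case
            t_acc = (distance / accel) ** 0.5
            v_peak, t_cruise = accel * t_acc, 0.0
        else:
            v_peak = v_max
            t_cruise = (distance - accel * t_acc ** 2) / v_max
        samples, t = [], 0.0
        while t < 2.0 * t_acc + t_cruise:
            if t < t_acc:                          # acceleration phase
                v = accel * t
            elif t < t_acc + t_cruise:             # constant-velocity phase
                v = v_peak
            else:                                  # deceleration phase
                v = max(0.0, v_peak - accel * (t - t_acc - t_cruise))
            samples.append(v)
            t += dt
        return samples

    # e.g. a 30 degree eye movement at up to 360 deg/s with 2000 deg/s^2
    velocities = trapezoidal_profile(30.0, 360.0, 2000.0)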

Figure 3: Pure rotation around some axis.

Figure 4: Stack of images obtained with pure rotation of the MDOF eye. Motion vectors of the feature points are superimposed.

The motors are controlled by fully parallel processes. The parameters of these processes can be changed on the fly. Visual behaviors are defined by processes running on the main CPU of the master unit. These processes decompose the visual behaviors into elementary visual processes. It is also possible to have different elementary visual processes implemented on both images, in parallel.

3 Optical Calibration

Despite the fact that in many cases we only need weakly or partially calibrated systems, we still need, for several reasons, to know the calibration parameters of the system. These will enable us to be sure of what exactly is happening while implementing a process or behavior, or they may provide us with the parameters required by a specific process or algorithm. Therefore, and in order to have a fully usable tool, we decided to develop a calibration method for our active vision head.

3.1 The Pure-Rotation Calibration Method

The basic idea of the optical calibration method used is very simple. We based our approach on the use of feature correspondences from a set of images where the camera has undergone pure rotation [8]. Given any pair of images obtained by a camera which undergoes pure rotation, if the intrinsic camera parameters and the angle and axis of rotation are known, one can compute where the feature points from one image will appear in the second image. If there is an error in the intrinsic parameters, the features in the second image will not coincide with those computed from the first image. This is the main observation that allows us to use pure rotation to obtain some of the intrinsic camera parameters.

Let us assume that the camera is rotated in a rigid world environment around some axis (see fig. 3). Also assume the existence of a camera coordinate system located at the lens optical center, with the z axis viewing along the optical axis of the lens. A 3D point P_c = (x_c, y_c, z_c) in the camera coordinate system will move after rotation to a point P'_c = (x'_c, y'_c, z'_c) through the matricial relationship

P'_c = R P_c = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} P_c .   (1)

Using the perspective projection pin-hole geometry, the 3D camera point P'_c projects to the undistorted image point p'_u = (x'_u, y'_u), where

x'_u = f_x \frac{x'_c}{z'_c} = f_x \frac{r_{11} x_c + r_{12} y_c + r_{13} z_c}{r_{31} x_c + r_{32} y_c + r_{33} z_c}
y'_u = f_y \frac{y'_c}{z'_c} = f_y \frac{r_{21} x_c + r_{22} y_c + r_{23} z_c}{r_{31} x_c + r_{32} y_c + r_{33} z_c} .   (2)

Multiplying the numerator and denominator of equations 2 by f/z_c and substituting x_u = f_x (x_c / z_c) and y_u = f_y (y_c / z_c) results in

x'_u = f_x \frac{r_{11} x_u + r_{12} y_u + r_{13} f_x}{r_{31} x_u + r_{32} y_u + r_{33} f_x}
y'_u = f_y \frac{r_{21} x_u + r_{22} y_u + r_{23} f_y}{r_{31} x_u + r_{32} y_u + r_{33} f_y} .   (3)

Observing these last two equations, we can conclude that the position of the point in the image after pure rotation depends only on the intrinsic camera parameters, the rotation matrix and the location of the point in the image before the rotation. The 3D point coordinates are not required in the case of pure rotation.
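To make equation 3 concrete, the sketch below (Python with NumPy/SciPy; the focal lengths, feature counts and rotation angles are made-up, and the model is simplified to (f_x, f_y, c_x, c_y), without the aspect-ratio factor k used later in the text) predicts where features should land after a known pure rotation and recovers the intrinsics from synthetic correspondences, previewing the minimization described next:

    import numpy as np
    from scipy.spatial.transform import Rotation
    from scipy.optimize import least_squares

    def predict_after_rotation(x_u, y_u, R, fx, fy):
        # Equation 3: undistorted image position after a pure rotation R,
        # given the position before rotation and the focal lengths in pixels.
        den_x = R[2, 0] * x_u + R[2, 1] * y_u + R[2, 2] * fx
        den_y = R[2, 0] * x_u + R[2, 1] * y_u + R[2, 2] * fy
        xr = fx * (R[0, 0] * x_u + R[0, 1] * y_u + R[0, 2] * fx) / den_x
        yr = fy * (R[1, 0] * x_u + R[1, 1] * y_u + R[1, 2] * fy) / den_y
        return xr, yr

    def residuals(params, pairs):
        # Distances between observed and predicted features over all pairs;
        # each pair holds (R, points_before, points_after) in pixel units.
        fx, fy, cx, cy = params
        res = []
        for R, pts1, pts2 in pairs:
            xr, yr = predict_after_rotation(pts1[:, 0] - cx, pts1[:, 1] - cy, R, fx, fy)
            res.append(pts2[:, 0] - cx - xr)
            res.append(pts2[:, 1] - cy - yr)
        return np.concatenate(res)

    # synthetic check: two known rotations (pan and tilt), random features
    rng = np.random.default_rng(0)
    true_fx, true_fy, true_cx, true_cy = 1400.0, 1420.0, 310.0, 245.0
    pairs = []
    for axis, angle in (("y", 4.0), ("x", -3.0)):
        R = Rotation.from_euler(axis, angle, degrees=True).as_matrix()
        pts1 = rng.uniform(60, 560, size=(25, 2))
        xr, yr = predict_after_rotation(pts1[:, 0] - true_cx, pts1[:, 1] - true_cy,
                                        R, true_fx, true_fy)
        pairs.append((R, pts1, np.stack([xr + true_cx, yr + true_cy], axis=1)))

    fit = least_squares(residuals, x0=[1000.0, 1000.0, 320.0, 240.0], args=(pairs,))
    print(fit.x)   # should converge towards the true intrinsics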
Since after rotation we have a pair of images, we have chosen to minimize the sum of squared distances between the feature points in the image obtained after rotation and those predicted from the initial image using the pure-rotation model described above, summed over all the feature points of each pair of images. To be more precise, the intrinsic parameters can be obtained using pairs of images taken with the camera rotated at various angles (see fig. 4). The relative angles of rotation are measured precisely. Corresponding features in each pair of images are found and their pixel coordinates are extracted. There is no special reason to detect the same number M of features in each image, but this is what we did in practice.

We define the cost function

E = \sum_{k=1}^{M} \sum_{n} \left[ \left(x^k_{f_{n_2}} - x^k_{f_{n_{rot1,2}}}\right)^2 + \left(y^k_{f_{n_2}} - y^k_{f_{n_{rot1,2}}}\right)^2 \right]   (4)

where the inner sum runs over the image pairs and (x^k_{f_{n_{rot i,j}}}, y^k_{f_{n_{rot i,j}}}) are the coordinates of (x^k_{f_{n_i}}, y^k_{f_{n_i}}) after rotation from image i to image j of each pair. Combining the cost function with equations 3 and defining the image points on the frame-buffer plane, the cost function E can now be written as

E = \sum_{k=1}^{M} \sum_{n} \left[ \left(x^k_{f_{n_2}} - x'^k_{f_{n_2}} - c_x\right)^2 + \left(y^k_{f_{n_2}} - y'^k_{f_{n_2}} - c_y\right)^2 \right]

where

x'^k_{f_{n_2}} = f_x k_x \frac{r_{11}(x^k_{f_{n_1}} - c_x) + r_{12}(y^k_{f_{n_1}} - c_y)k^{-1} + r_{13} f_x k_x}{r_{31}(x^k_{f_{n_1}} - c_x) + r_{32}(y^k_{f_{n_1}} - c_y)k^{-1} + r_{33} f_x k_x}

y'^k_{f_{n_2}} = f_y k_y \frac{r_{21}(x^k_{f_{n_1}} - c_x)k + r_{22}(y^k_{f_{n_1}} - c_y) + r_{23} f_y k_y}{r_{31}(x^k_{f_{n_1}} - c_x)k + r_{32}(y^k_{f_{n_1}} - c_y) + r_{33} f_y k_y} .

The task is now to find the intrinsic parameters of the camera (f_x k_x, f_y k_y, k, c_x, c_y) by a straightforward nonlinear search, where f_x k_x and f_y k_y represent the focal length of the lens in pixels, k represents the aspect ratio and (c_x, c_y) represents the image center coordinates. A first-guess value for these parameters is required for the nonlinear search. See [8] for a detailed explanation of the whole active calibration process.

3.2 Motorized Zoom Lens Modelling

After having determined the parameters of the camera model for a range of lens settings, we must characterize how they vary with the lens settings. If we estimate the parameter values using the described pure-rotation calibration method and just store them in look-up tables, we need to make no assumptions about how they vary with the settings. However, if we want to use an algebraic form for the parameter models or interpolate between the sampled lens settings, we must obtain some generic functional relationships between a dependent variable s (the lens parameters) and an independent variable l (the lens settings). The MDOF motorized lenses have the capability to adjust focus, zoom and aperture. The modelling of the aperture degree of freedom was handled by a completely different procedure. Since only focus and zoom are considered in this modelling process, the functional relationship used to model the lens behavior has two independent variables, the focus motor setting foc_p and the zoom motor setting zoom_p. We describe these functional relationships with bivariate polynomials. The general formula for an n-th order bivariate polynomial is

BP(foc_p, zoom_p) = \sum_{i=0}^{n} \sum_{j=0}^{n-i} a_{ij} \, foc_p^i \, zoom_p^j   (5)

and the number of coefficients required by the polynomial is (n+1)(n+2)/2. Using these functionals, two main relationships with direct influence on the implemented auto-focusing mechanism were modelled: the relationship between the focus and zoom motor settings and the effective focal length of the lens f, and the relationship between the focus and zoom motor settings and the focused target depth D:

f = BP(foc_p, zoom_p), \qquad D = BP(foc_p, zoom_p) .   (6)

The focused target depth is measured using stereo triangulation from verged foveated images. The effective focal length of both eye lenses is computed and both images are focused using a Fibonacci maximum-search procedure on the Tenengrad focus criterion.
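A possible way to fit and use these bivariate polynomials is sketched below (Python/NumPy, assuming calibration has produced a table of (foc_p, zoom_p, f) samples; the function names and the 4th-order choice are illustrative, not the authors' code): build the monomial basis of equation 5, solve for the coefficients a_ij by linear least squares, then evaluate BP at arbitrary motor settings.

    import numpy as np

    def bp_basis(foc_p, zoom_p, order=4):
        # Monomials foc_p^i * zoom_p^j with i + j <= order (equation 5);
        # the basis has (order + 1) * (order + 2) / 2 columns.
        cols = [np.asarray(foc_p) ** i * np.asarray(zoom_p) ** j
                for i in range(order + 1) for j in range(order + 1 - i)]
        return np.stack(cols, axis=-1)

    def fit_bp(foc_p, zoom_p, values, order=4):
        # Linear least-squares estimate of the coefficients a_ij from sampled
        # lens settings and the calibrated quantity (e.g. focal length f).
        A = bp_basis(foc_p, zoom_p, order)
        coeffs, *_ = np.linalg.lstsq(A, np.asarray(values), rcond=None)
        return coeffs

    def eval_bp(coeffs, foc_p, zoom_p, order=4):
        return bp_basis(foc_p, zoom_p, order) @ coeffs

    # hypothetical usage with a calibration table of (foc_p, zoom_p, f) samples:
    #   a_f = fit_bp(foc_samples, zoom_samples, f_samples)
    #   f_now = eval_bp(a_f, 71000, 12000)

The same fit, with the focused depth D as the dependent variable, gives the second polynomial of equation 6 used by the auto-focusing mechanism.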
4 Implementation of Visual Behaviors

In order to demonstrate the architecture, we have implemented some visual behaviors.

Figure 5: Neck and eye saccade movements.

4.1 Saccade Control using Motion

The saccade process starts when one of the cameras detects peripheral motion. After the detection of peripheral motion, a waiting state occurs until motion is detected in both eyes. At that moment, the head redirects its gaze by a neck and eye saccade to put the target into the horopter (see fig. 5). The detection of motion is performed by means of image differencing: when the integral of the differences is above a threshold, motion is detected. The center of mass of the difference image is computed and its pixel coordinates are converted into the rotation angles that the neck or the eye must execute to foveate on the origin of the motion (see fig. 6). Due to the latency of the saccade movement (200 ms for a neck saccade and 100 ms for an eye saccade), an α-β filter is used to predict the image position of the target, assuming that the target is moving with constant velocity. At least three consecutive frames are required as input for the α-β filter.
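A minimal sketch of this detection step (Python/NumPy; the threshold and filter gains are assumed values, and the real system works on the live 25 Hz stream): difference two consecutive frames, trigger when the integrated difference crosses a threshold, compute the centre of mass of the difference image, and convert its pixel offset into rotation angles through the pinhole model.

    import numpy as np

    def detect_motion(prev, curr, threshold=2.0e5):
        # Image differencing: report motion when the integral of the absolute
        # differences exceeds a threshold, and return the centre of mass of
        # the difference image (pixel coordinates).
        diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
        total = diff.sum()
        if total < threshold:
            return False, None
        ys, xs = np.indices(diff.shape)
        return True, ((xs * diff).sum() / total, (ys * diff).sum() / total)

    def saccade_angles(com, cx, cy, fx, fy):
        # Pinhole model: pixel offset of the centre of mass -> pan/tilt
        # rotation angles (radians) that foveate the origin of motion.
        return np.arctan((com[0] - cx) / fx), np.arctan((com[1] - cy) / fy)

    def alpha_beta_step(x, v, z, dt, alpha=0.85, beta=0.005):
        # One alpha-beta filter update, used to predict where the target will
        # be when the saccade (100-200 ms latency) completes, assuming
        # constant image velocity; the gains here are illustrative.
        x_pred = x + v * dt
        r = z - x_pred                    # innovation
        return x_pred + alpha * r, v + (beta / dt) * r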

Figure 6: Image sequence obtained during the neck and eye saccade. A latency of 200 ms (5 frames) exists for the saccade movement.

Figure 7: Timing diagram for the neck and eye saccade.

Figure 8: The MDOF high-level gaze controller.

Figure 7 presents the timing diagram for the neck and eye saccade and for the saccade-to-pursuit transition. Saccade motion is performed by means of position control of all the degrees of freedom involved. During the saccade period, and since the camera is moving at high velocity, no visual feedback information is processed.

4.2 Smooth Pursuit Using Optical Flow

After fixating on the object, the pursuit process is started by computing the optical flow. During the pursuit process, velocity control of the degrees of freedom is used instead of the position control used for the saccade. Assuming that the moving object is inside the fovea after a saccade, the smooth-pursuit process starts a Kalman filter estimator, which takes the estimated velocity of the target as input. With this approach, the smooth-pursuit controller generates a new prediction of the current image target velocity, and this information is sent to the motion servo controller every 10 ms (see fig. 8).

Two different motions must be considered to exist in the scene: one caused by the motion being undertaken by the head and the other one coming from the object. Since the first one is known (through the inverse kinematics of the head), we only have to compute the other. We considered the analysis of motion described by the two-component model proposed in [13]. Let I(x, y, t) be the observed gray-scale image at time t. We assume that the image is formed by the combination of two distinct image patterns, P (the background motion) and Q (the target motion), having independent motions p and q:

I(x, y, 0) = P(x, y) ⊕ Q(x, y)
I(x, y, t) = P_{tp} ⊕ Q_{tq}   (7)

where the symbol ⊕ represents an operator, such as addition, multiplication or even a more complex operator, that combines the two patterns. In configurations where a foreground object moves in front of a moving background, the combination operator must be more complex, representing the asymmetric relationship between foreground and background. However, since the motion q of the object in the image is in fact q = p + q', where q' is the real image object motion, we can consider the operator to be approximately additive. Since motion p is known, only q must be determined. The pattern component P moving at velocity p can be removed from the image sequence by shifting each image frame by p and subtracting it from the following frame. The resulting sequence will contain only patterns moving with velocity q (see fig. 9). Let D_1 and D_2 be the first two frames of this difference sequence, obtained from three original frames:

D_1 = I(x, y, t+1) - I_p(x, y, t) = (P_{2p} + Q_{2q}) - (P_{2p} + Q_{q+p}) = Q_{2q} - Q_{q+p} = (Q_q - Q_p)_q   (8)
D_2 = I(x, y, t+2) - I_p(x, y, t+1) = (P_{3p} + Q_{3q}) - (P_{3p} + Q_{2q+p}) = Q_{3q} - Q_{2q+p} = (Q_q - Q_p)_{2q}   (9)
As we can see, the sequence of difference images is undergoing a single motion q. That means that it can be computed using any single-motion estimation technique.
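The sketch below (Python, using scipy.ndimage.shift for the warping; array names are illustrative) applies this idea: shift each frame by the known background motion p and subtract it from the following frame, so that D_1 and D_2 contain only the target pattern moving with q.

    import numpy as np
    from scipy.ndimage import shift

    def difference_sequence(I_t, I_t1, I_t2, p):
        # Remove the pattern moving with the known background velocity p
        # (pixels/frame): D1 = I(t+1) - I_p(t), D2 = I(t+2) - I_p(t+1),
        # where I_p denotes a frame shifted by p before subtraction.
        dy_dx = (p[1], p[0])              # scipy.ndimage.shift wants (row, col)
        warp_t = shift(I_t.astype(np.float32), dy_dx, order=1)
        warp_t1 = shift(I_t1.astype(np.float32), dy_dx, order=1)
        D1 = I_t1.astype(np.float32) - warp_t
        D2 = I_t2.astype(np.float32) - warp_t1
        return D1, D2                     # both undergo the single motion q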

Figure 9: Target motion detection by subtracting the image motion induced by the camera movement.

In our case we model image formation by means of scaled orthographic projection. Even if we model image formation as a perspective projection, this is a reasonable assumption, since motion will be computed near the origin of the image coordinate system (in a small area around the center, x and y are close to zero). We can therefore assume that the optical flow vector is approximately constant throughout the image, i.e., u = p_x and v = p_y. To compute the optical flow vector we minimize

\sum_i \left( I_{x_i} p_x + I_{y_i} p_y + I_{t_i} \right)^2 .   (10)

Taking the partial derivatives with respect to p_x and p_y and setting them equal to zero we obtain:

\left(\sum_i I_{x_i}^2\right) p_x + \left(\sum_i I_{y_i} I_{x_i}\right) p_y + \sum_i I_{x_i} I_{t_i} = 0   (11)
\left(\sum_i I_{y_i} I_{x_i}\right) p_x + \left(\sum_i I_{y_i}^2\right) p_y + \sum_i I_{y_i} I_{t_i} = 0   (12)

Figure 10: Pursuit sequence. Stack of images obtained during the smooth-pursuit process. The Δt between consecutive images is 200 ms (5 frames).

Figure 11: Pursuit sequence. Stack of images with the eye and target movement during the smooth-pursuit process.

The flow is computed on a multiresolution structure. Four different resolutions are used: 16x16, 32x32, 64x64. These are sub-sampled images. A first estimate is obtained at the lowest resolution level (16x16), and this estimate is propagated to the next resolution level, where a new estimate is computed, and so on. The optical flow computed this way is used to control the angular velocity of the motors. The sequence in Fig. 10 shows images of the pursuit sequence.
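A compact version of this estimator is sketched below (Python/NumPy; the 16x16 starting level and coarse-to-fine propagation follow the text, everything else, including the pooling and warping choices, is illustrative): solve the normal equations 11-12 for a single constant flow vector and refine it from coarse to fine resolutions.

    import numpy as np
    from scipy.ndimage import shift

    def subsample(img, size):
        # Average-pool the image down to roughly size x size.
        h, w = img.shape
        fy, fx = h // size, w // size
        return img[:fy * size, :fx * size].reshape(size, fy, size, fx).mean(axis=(1, 3))

    def constant_flow(a, b):
        # Single flow vector (p_x, p_y) taking image a to image b, from the
        # normal equations (11)-(12) of the constant-flow least squares.
        Ix = 0.5 * (np.gradient(a, axis=1) + np.gradient(b, axis=1))
        Iy = 0.5 * (np.gradient(a, axis=0) + np.gradient(b, axis=0))
        It = b - a
        A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                      [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
        rhs = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
        return np.linalg.solve(A, rhs)

    def pyramid_flow(img0, img1, levels=(16, 32, 64)):
        # Coarse-to-fine: estimate at 16x16 first, then rescale the estimate,
        # pre-shift the first image by it and re-estimate the residual flow.
        flow, prev = np.zeros(2), levels[0]
        for size in levels:
            flow, prev = flow * (size / prev), size
            a = subsample(img0.astype(np.float32), size)
            b = subsample(img1.astype(np.float32), size)
            a_warped = shift(a, (flow[1], flow[0]), order=1)
            flow = flow + constant_flow(a_warped, b)
        return flow   # in pixels of the finest level used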

4.2.1 Background Image Motion Estimation

The image motion induced by the camera egomotion can be computed using the following equations:

v_u = \frac{v_x f_x}{z} - \frac{v_z f_x x}{z^2}, \qquad v_v = \frac{v_y f_y}{z} - \frac{v_z f_y y}{z^2}   (13)

where (f_x, f_y) represents the focal length of the lens in pixels and V = [v_x v_y v_z]^T represents the velocity of the point P = [x y z]^T in the camera coordinate system due to the egomotion of the head. This velocity V results from the combination of the rotations of several joints (eye, tilt, swing and pan) and is defined as V = V_eye + V_tilt + V_swing + V_pan.

Figure 12: Joint coordinate systems of the MDOF head.

Representing this velocity by V_ref = -V_Trans - Ω ∧ P and since V_Trans = 0, the velocity induced by each joint can be expressed as V_ref = -Ω ∧ P. Assuming the following angular velocities for each of the rotation joints, Ω_eye = [0 ω_e 0]^T, Ω_tilt = [ω_t 0 0]^T, Ω_swing = [0 0 ω_s]^T and Ω_pan = [0 ω_p 0]^T, the velocities induced by the rotation of each joint are

V_eye = -{}^{cam}T_{eye} (-Ω_eye ∧ P_eye)   (14)
V_tilt = -{}^{cam}T_{tilt} (-Ω_tilt ∧ P_tilt)   (15)
V_swing = -{}^{cam}T_{swing} (-Ω_swing ∧ P_swing)   (16)
V_pan = -{}^{cam}T_{pan} (-Ω_pan ∧ P_pan)   (17)

where P_eye, P_tilt, P_swing and P_pan represent the coordinates of the point P in each one of the joint coordinate systems (see fig. 12).

4.3 The Gaze Controller Strategy

Figure 13: State transition system.

The strategy adopted by the gaze controller to combine saccade and smooth pursuit in order to track moving objects with the MDOF robot head was based on a state transition system. This controller defines five different states: Waiting, Neck-Saccade, Eye-Saccade, Pursuit and Saccade. Each one of these states receives command control from the previous state and triggers the next state in a closed-loop transition system. The global configuration of this state transition system is shown in figure 13.

A combination of pursuit and saccade is implemented so that the neck always follows the movement of the eyes. With this strategy we bound the range of movements of the eyes, avoiding that the left eye verges to its left or the right eye verges to its right (a situation that occurs for a negative vergence angle). When one of the eyes is pointing perpendicularly to the baseline, a saccade of the neck is performed and the vergence angles of both eyes become equal. During this process no visual information is processed, and we use the Kalman filter estimator to predict the position and velocity of the target after the saccade.

Due to the architecture of our system we can change the control parameters on the fly, so that the system can adapt itself to changes in velocity. This way the system can cope with sudden changes in velocity. We can also switch from velocity control to position control on the fly, and we can have several vision processes running in parallel. Besides the process described above to compute the flow due to motion parallel to the image plane, another process to compute the flow due to translational motion along the optical axis can also be implemented, taking into account that the object is fixated by both cameras.

4.4 Auto-Focusing Mechanism

The auto-focusing mechanism is mainly based on the previous calibration of the focused target depth D. During the smooth-pursuit process both eyes are verged on the moving target. Taking advantage of the precise feedback information provided by all the MDOF robot head degrees of freedom, a rough estimate of the target depth relative to the robot head can be obtained through triangulation of the fixated foveated images. Observing figure 14, and since the vergence angles (θ_l, θ_r) and the baseline distance b are known with accuracy, the target depth D can be computed by

D = \frac{b \tan\theta_l \tan\theta_r}{\tan\theta_l + \tan\theta_r} .   (18)

Since the target distance D is obtained from equation 18 using the vergence angles, the lens-target distances can now be computed as

D_{left} = \frac{D}{\sin\theta_l}, \qquad D_{right} = \frac{D}{\sin\theta_r} .   (19)
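A small sketch of this auto-focus computation (Python/NumPy; the baseline, angle values, motor range and the depth_model callable, for instance an eval_bp fit from the Section 3.2 sketch, are assumptions): triangulate the target depth from the vergence angles (eq. 18), derive the per-lens distances (eq. 19) and pick the focus motor setting whose modelled focused depth matches.

    import numpy as np

    def target_depth(theta_l, theta_r, baseline):
        # Eq. 18: depth of the fixated target from the two vergence angles
        # (radians, assumed here to be measured from the baseline) and the
        # baseline length.
        tl, tr = np.tan(theta_l), np.tan(theta_r)
        return baseline * tl * tr / (tl + tr)

    def lens_target_depths(D, theta_l, theta_r):
        # Eq. 19: distance from each lens to the target.
        return D / np.sin(theta_l), D / np.sin(theta_r)

    def focus_setting(D_lens, zoom_p, depth_model, foc_range=(60000, 80000)):
        # Choose the focus motor position whose modelled focused depth,
        # D = BP(foc_p, zoom_p) from Section 3.2, best matches the measured
        # lens-target distance.  The brute-force search and the motor range
        # are illustrative choices, not the authors' implementation.
        candidates = np.linspace(foc_range[0], foc_range[1], 2001)
        return candidates[int(np.argmin(np.abs(depth_model(candidates, zoom_p) - D_lens)))]

    # assumed numbers: symmetric vergence of 78 degrees, 250 mm baseline
    D = target_depth(np.radians(78.0), np.radians(78.0), 250.0)
    D_left, D_right = lens_target_depths(D, np.radians(78.0), np.radians(78.0))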

Figure 14: Vergence fixation process.

Figure 15: Target depth and auto-focusing focus motor setting during a 5 s smooth-pursuit process.

Taking the lens-target depths (D_left, D_right) computed using equations 19 and the zoom motor setting zoom_p, which was kept fixed during the smooth-pursuit process, the focus motor position foc_p can now be computed using the bivariate polynomial that models the relationship between the focus and zoom motor settings and the target depth D. Since this auto-focusing mechanism requires only simple mathematical computation, and taking advantage of the speed and accuracy of the MDOF motorized zoom lenses, a real-time auto-focusing process was achieved with good results. Figure 15 presents the behavior of the computed target depth and of the focus motor setting during a smooth-pursuit process.

5 Conclusions

In this paper we have shown that by using the concept of purposive behavior it is possible to implement real-time active vision systems. The concept is essential for the design of the system architecture if real-time operation and robustness are major design goals. Another result of our approach is that the control architecture we have used enabled real-time operation with limited computing resources. On the other hand, the use of parallelism enabled the continuous processing of the image data as well as the coordination of the several actuation systems that have to work in synchrony. Parallelism is also essential to allow the visual agents to attend to the several events that are continuously happening in the world. The integration, the system architecture, the information processing modules and the motor control processes were all designed taking into account the tasks and behaviors of the system.

References

[1] Aloimonos, Y.: Purposive and qualitative active vision. Proc. Image Understanding Workshop (1990) 816-828.
[2] Aloimonos, J., Weiss, I., Bandopadhay, A.: Active Vision. Intern. J. Comput. Vision 7 (1988) 333-356.
[3] McFarland, D., Bosser, T.: Intelligent Behavior in Animals and Robots. MIT Press (1993).
[4] Aloimonos, Y.: What I Have Learned. CVGIP: Image Understanding 60, No. 1 (1994) 74-85.
[5] Ho, Y.-C.: Dynamics of Discrete Event Systems. Proc. IEEE 77 (1989) 3-7.
[6] Sloman, A.: On Designing a Visual System. J. Exper. and Theor. Artif. Intell. 1 (1989) 289-337.
[7] Horridge, G.: The evolution of visual processing and the construction of seeing systems. Proc. Roy. Soc. London B 230 (1987) 279-292.
[8] Batista, J., Dias, J., Araujo, H., Almeida, A.: The ISR Multi-Degrees-of-Freedom Active Vision Robot Head: design and calibration. M2VIP'95 - Second International Conference on Mechatronics and Machine Vision in Practice, Hong Kong, September 1995.
[9] Pahlavan, K.: Active Robot Vision and Primary Ocular Processes. PhD Thesis, CVAP, KTH, Sweden (1993).
[10] Dias, J., Paredes, C., Fonseca, I., Batista, J., Araujo, H., Almeida, A.: Simulating Pursuit with Machines: Experiments with Robots and Artificial Vision. Proc. IEEE Int. Conf. on Rob. and Auto., 472-477, May 21-27, 1995, Nagoya, Japan.
[11] Brown, C.: Gaze controls with interaction and delays. IEEE Transactions on Systems, Man and Cybernetics 20, May 1990, 518-527.
[12] Murray, D., Bradshaw, K., McLauchlan, P., Reid, I., Sharkey, P.: Driving Saccade to Pursuit Using Image Motion. Intern. Journal of Computer Vision 16, No. 3, November 1995, 205-228.
[13] Bergen, J., Burt, P., Hingorani, R., Peleg, S.: Computing Two Motions from Three Frames. David Sarnoff Research Center, April 1990.
[14] Brown, C., Coombs, D.: Real-Time Binocular Smooth Pursuit. Intern. Journal of Computer Vision 11, No. 2, October 1993, 147-165.
[15] Almeida, A., Araujo, H., Nunes, U., Dias, J.: Multi-Sensor Integration for Mobile Robot Navigation. In Artificial Intelligence in Industrial Decision Making, Control and Automation, Spyros Tzafestas and Henk Verbruggen (Eds.), Kluwer Academic Publishers (1995).
[16] Burt, P., Bergen, J., Hingorani, R., Kolczynski, R., Lee, W., Leung, A., Lubin, J., Shvaytser, H.: Object tracking with a moving camera. Proc. IEEE Workshop on Visual Motion, Irvine (1989).
[17] Willson, R.: Modelling and Calibration of Automated Zoom Lenses. CMU-RI-TR-94-03, Carnegie Mellon University (1994).
[18] Carpenter, R. H. S.: Movements of the Eye. Pion (1988).