Mathematical Analysis for Visual Tracking Assuming Perspective Projection


João P. Barreto, Paulo Peixoto, Jorge Batista, Helder Araujo
Institute of Systems and Robotics, Dept. of Electrical Engineering, Polo II - University of Coimbra, 3030 Coimbra, Portugal
{jpbar,peixoto,batista,helder}@isr.uc.pt

Abstract. Applications of visual control of motion require that the relationships between motion in the scene and image motion be established. In the case of active tracking of moving targets these relationships become more complex due to camera motion. This work derives the position and velocity equations that relate image motion, camera motion and target 3D motion. Perspective projection is assumed. Both monocular and binocular tracking systems are analyzed. The expressions obtained are simplified to decouple the control of the different mechanical degrees of freedom. The simplification errors are quantified and characterized. This study contributes to the understanding of the tracking process, to establishing the control laws of the different camera motions, to deriving test signals to evaluate the system performance and to developing egomotion compensation algorithms.

1 Introduction

Fig. 1: The active vision systems at ISR. Right: the modular camera pan and tilt unit (MVS). Left: the ISR-MDOF robot head.

Visual control of motion is a widely studied subject. Several papers have addressed the problem of controlling motion using visual information [1, 2, 3, 4]. In these applications, the relationships between image motion and 3D motion in the scene have to be established. In active tracking applications these relationships become more complex due to camera motion. Works like [5] and [6] address mainly the control aspects of tracking: cameras are modeled as a constant gain and the effects of perspective projection are not considered. Affine models are assumed in [7], and approximations to the perspective model are considered in [8]. Many of these works rely on closed-loop control strategies to reduce the influence of modeling simplifications. However, some visual behaviors, such as saccadic motion, typically use an open-loop configuration.

In this paper we derive the mathematical relationships between image motion, camera motion and target 3D motion in the scene, both for monocular and binocular tracking applications. Perspective projection is considered by assuming a camera pinhole model. Both position and velocity equations are derived. The results obtained allow a better understanding of the tracking process as a regulation control problem. Strategies for both monocular and binocular tracking are developed. The simplifications of the derived equations used to decouple the degrees of freedom of the vision system are discussed. Control laws for different visual behaviors (smooth pursuit, vergence and saccade) are established, and the approximation errors are studied and characterized. We are using the results of the present work to derive test signals to characterize the performance and robustness of the active tracking algorithms. Egomotion compensation techniques are being studied as well. The knowledge of the relationship between the velocity in the image, the camera motion and the target 3D velocity can also be useful in the development of high-level visual behaviors, such as target segmentation in an environment with multiple moving targets.

2 Monocular Tracking

Fig. 2: Image formation process in monocular tracking. The 3D scheme, exhibiting the variables and reference frames used in the mathematical analysis, is shown from two different points of view.

Fig. 1 (right) depicts our MVS unit. The camera has two degrees of freedom: pan and tilt. Both rotation axes go through the optical center, so the camera undergoes pure rotation motion. The typical goal of a tracking application is to move the camera in such a way that the target is projected in the center of the image. In 3D space this means that the optical axis must be aligned with the target. This section derives the mathematical relationship between 3D target motion, camera motion and the projection in the image. The camera motion is typically known by reading the motor encoders.

The scheme of figure 2 describes the image formation process. Assume a standard pinhole model where the camera performs a perfect perspective transform with center O (the camera optical center) at a distance F (the focal length) from the retinal plane. The vector M = (X, Y, Z)^t represents the target cartesian coordinates in R_i(O, x_i, y_i, z_i), the inertial reference frame. The target 3D position can also be represented by spherical coordinates (ρ, θ, φ). R_c(O, x_c, y_c, z_c) is the reference frame attached to the camera, where z_c is aligned with the optical axis and x_c and y_c are aligned with the horizontal and vertical axes of the image. The retinal plane is perpendicular to the optical axis. The target is projected at point (x, y) in the image plane. This point is represented as a homogeneous 3D vector λm = λ(x, y, 1)^t corresponding to a line of a given direction passing through the optical center (2D projective space). The camera first rotates in pan and then in tilt (Fick model). The camera pan and tilt positions are given by α_p and α_t (the rotation angles around y_i and x_c). R_p and R_t are the pan and tilt rotation matrices.

    A = [ s_x  s_ψ  u_0 ; 0  s_y  v_0 ; 0  0  1 ] = [ F  0  0 ; 0  F  0 ; 0  0  1 ]        (1)
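The projection model of equations 1 and 2 is easy to exercise numerically. The sketch below is a minimal illustration, assuming A = diag(F, F, 1), pan about the vertical axis and tilt about the camera horizontal axis (Fick model); the function names and the axis sign conventions are choices of the sketch, not of the paper.

```python
# Minimal sketch of the pinhole/Fick image formation model (eqs. 1-2).
# Assumptions: A = diag(F, F, 1); pan rotates about the vertical (y) axis,
# tilt about the camera horizontal (x) axis.
import numpy as np

F = 1.0                    # focal length (normalized)
A = np.diag([F, F, 1.0])   # intrinsics with s_psi = 0, u0 = v0 = 0

def R_pan(a):
    """Pan rotation (about the y axis)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def R_tilt(a):
    """Tilt rotation (about the x axis); sign chosen so that a positive
    tilt error maps to a positive image y, as in equation 5."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def project(M, alpha_p, alpha_t):
    """Equation 2: z m = A R_t^t(alpha_t) R_p^t(alpha_p) M -> image (x, y)."""
    zm = A @ R_tilt(alpha_t).T @ R_pan(alpha_p).T @ M
    return zm[:2] / zm[2]

# A target lying on the (rotated) optical axis projects to the image center.
M = R_pan(0.3) @ R_tilt(0.1) @ np.array([0.0, 0.0, 5.0])
print(project(M, 0.3, 0.1))   # ~ [0, 0]
```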

A is the matrix of the intrinsic parameters of the projection (see equation 1). s_x and s_y stand for the scaling along the horizontal and vertical axes of the image plane, s_ψ gives the skew between the axes and (u_0, v_0) are the principal point coordinates. Notice that A is an upper triangular matrix, thus it is always invertible. It is assumed that the axes are orthogonal (s_ψ = 0), that the optical axis intersects the retinal plane at the origin of the image reference frame (u_0 = v_0 = 0) and that s_x = s_y = F.

2.1 Position equations

    z m = A R_t^t(α_t) R_p^t(α_p) M        (2)

Equation 2 expresses the relationship between the target 3D coordinates M in the inertial frame and the projective coordinates m in the camera reference frame. Notice that z is the target Z coordinate in the camera reference frame, m = (x, y, 1)^t are the projective coordinates and (x, y) are the target coordinates in the image plane.

    R_p(Δα_p) R_t(Δα_t) R_t(α_t) A^{-1} z' m' = R_t(α_t) A^{-1} z m        (3)

The camera moves Δα_p in pan and Δα_t in tilt, so the new camera rotation angles are (α_p + Δα_p, α_t + Δα_t). Equation 3 establishes the relationship between m', the target projective coordinates after the motion, and m, the projective coordinates before the camera motion. Consider Δα_p and Δα_t as the tracking angular position errors in pan and tilt. If the target spherical coordinates are (ρ, θ, φ) (see Fig. 2) then Δα_p = θ − α_p and Δα_t = φ − α_t.

    x = (F C(α_t) − y S(α_t)) tan(Δα_p)
    y = F (2(S(α_t)² C(Δα_p) + C(α_t)²) tan(Δα_t) − S(2α_t)(C(Δα_p) − 1)) / (2(S(α_t)² + C(α_t)² C(Δα_p)) − S(2α_t)(C(Δα_p) − 1) tan(Δα_t))        (4)

Whenever the camera moves to compensate the angular position errors, the target is projected in the center of the image and m' becomes equal to (0, 0, 1)^t. Equation 4, where C(·) and S(·) abbreviate cos(·) and sin(·), is derived from 3 by making m' = (0, 0, 1)^t. It establishes the relationship between the camera tilt position α_t, the target position in the image (x, y) and the angular position errors (Δα_p, Δα_t).

    m̃ = (x̃, ỹ, 1)^t = (F C(α_t) tan(Δα_p), F tan(Δα_t), 1)^t        (5)

Both the x and y expressions are complex and highly non-linear. If the target is being tracked by the active vision system, it is reasonable to assume that most of the time the target image is near the center. This assumption is used to derive equation 5, a simplification of 4. Using 5, the pan and tilt controls can be decoupled: the x position is related only to the pan error and the y position is related only to the tilt error.

    E_pos = (E_pos^x, E_pos^y)^t = (x − x̃, y − ỹ)^t        (6)

    μ_pos = (μ_pos^x, μ_pos^y)^t = Σ_{i=1..N²} E_pos(i) P(i)        (7)

    Φ_pos = [ (σ_pos^x)²  φ_pos^{xy} ; φ_pos^{xy}  (σ_pos^y)² ] = Σ_{i=1..N²} (E_pos(i) − μ_pos)(E_pos(i) − μ_pos)^t P(i)        (8)

The camera angular position (α_p, α_t) is obtained by reading the motor encoders. Visual processing determines the target position in the image (x, y). The goal is to move the camera to compensate for the angular errors (Δα_p, Δα_t). Equation 4 performs the exact computation of (Δα_p, Δα_t) given (x, y). E_pos is the error made in approximating equation 4 by 5 (see equation 6). E_pos is a function of α_t, Δα_p and Δα_t. Consider α_t, Δα_p, Δα_t ∈ [−45°, 45°] and the interval discretized in N uniformly spaced samples. For each camera tilt position α_t there are N² possible combinations of (Δα_p, Δα_t). The pan and tilt angular errors are assumed to be statistically independent. P(i) is the joint probability function of (Δα_p, Δα_t). Fixing α_t, for each pair (Δα_p, Δα_t) there is a corresponding error vector E_pos. Equation 7 computes the average error μ_pos for a certain camera tilt position α_t. Expression 8 computes the covariance matrix Φ_pos.
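The error statistics of equations 6-8 can be reproduced with a short numerical sketch. Rather than expanding the closed form of equation 4, the exact image position is computed here by rotating the target direction (the construction behind equation 3); the grid size and the sign conventions are assumptions of the sketch, while the Gaussian standard deviations follow the values used in the paper.

```python
# Sketch of the approximation-error statistics of equations 6-8.
import numpy as np

F = 1.0

def R_pan(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def R_tilt(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def exact_xy(alpha_t, dp, dt):
    """Exact target image position for tilt alpha_t and errors (dp, dt):
    the target direction R_p(dp) R_t(alpha_t + dt) e3 is brought back to
    the camera frame and projected (equivalent to equation 4)."""
    e3 = np.array([0.0, 0.0, 1.0])
    d = R_tilt(alpha_t).T @ R_pan(dp) @ R_tilt(alpha_t + dt) @ e3
    return np.array([F * d[0] / d[2], F * d[1] / d[2]])

def approx_xy(alpha_t, dp, dt):
    """Decoupled approximation of equation 5."""
    return np.array([F * np.cos(alpha_t) * np.tan(dp), F * np.tan(dt)])

def error_stats(alpha_t, n=91, sp=np.radians(2), st=np.radians(8)):
    """Weighted mean and covariance of E_pos (equations 7-8) over an
    n x n grid of (dp, dt) in [-45, 45] deg with Gaussian weights P(i)."""
    grid = np.radians(np.linspace(-45.0, 45.0, n))
    E, w = [], []
    for dp in grid:
        for dt in grid:
            E.append(exact_xy(alpha_t, dp, dt) - approx_xy(alpha_t, dp, dt))
            w.append(np.exp(-0.5 * ((dp / sp) ** 2 + (dt / st) ** 2)))
    E = np.array(E)
    w = np.array(w) / np.sum(w)           # normalized joint probability
    mu = (w[:, None] * E).sum(axis=0)     # equation 7
    D = E - mu
    Phi = (w[:, None, None] * D[:, :, None] * D[:, None, :]).sum(axis=0)  # eq. 8
    return mu, Phi

print(error_stats(np.radians(23)))  # mean error and covariance at alpha_t = 23 deg
```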

Fig. 3: Quantitative analysis of the error function E_pos, assuming a normal probability distribution for the angular errors in pan and tilt, as a function of the camera tilt position (deg). Above: average error in X (-) and Y (- -). Below: error standard deviation in X (-) and Y (- -), and covariance.

If the system is tracking the target then, most of the time, the target image is not far from the center. It is therefore reasonable to assume that both the pan and tilt errors (Δα_p and Δα_t) are statistically described by zero-mean normal distributions. P(i), in equations 7 and 8, is a bidimensional probability density function for a normal distribution with zero average and standard deviations of 2° in pan and 8° in tilt. Fig. 3 depicts the E_pos average, standard deviation and covariance as functions of α_t. The average error in X and the covariance φ_pos^{xy} are zero, and μ_pos^y is an odd function. Both σ_pos^x and σ_pos^y (the standard deviations in X and Y) increase with the modulus of the camera tilt angle.

The data depicted in Fig. 4 is useful to understand the behavior of the error function E_pos. Different camera tilt positions were studied: the first column is for α_t = −23°, the second for α_t = 0° and the third for α_t = 23°. Whenever α_t is known, the target position in the image depends on the angular pan and tilt errors (Δα_p, Δα_t). The first row depicts the exact and approximated target positions in the image for different tilt angles. The second row exhibits the corresponding error in the X coordinate, E_pos^x. The third row shows the error in the Y axis, E_pos^y.

Observe the exact and estimated positions of the target projection in the image (first row). Assume the angular pan error Δα_p constant. In 3D space the target is positioned somewhere on a vertical plane going through the origin O of the inertial reference frame (see Fig. 2). The plane is projected as a line in the image. If α_t = 0 the line is vertical, and if α_t ≠ 0 the line has a slope whose modulus is inversely proportional to the modulus of the camera tilt angle. The approximation of equation 5 always projects the plane as a vertical line in the image. Thus, as depicted in Fig. 4, x̃ = x whenever α_t = 0 or y = 0. The approximation error in the X axis, E_pos^x, is a function of α_t, Δα_p and Δα_t. Notice that:

    E_pos^x(0, Δα_p, Δα_t) = 0
    E_pos^x(α_t, −Δα_p, Δα_t) = −E_pos^x(α_t, Δα_p, Δα_t)
    E_pos^x(−α_t, Δα_p, −Δα_t) = E_pos^x(α_t, Δα_p, Δα_t)

Consider the angular tilt error Δα_t constant. In 3D space the target lies somewhere on a conic surface whose vertex is the origin O of the inertial reference frame (see Fig. 2). The image projections of these surfaces are the hyperbolic lines depicted in Fig. 4 (first row). Whenever Δα_t = −α_t the conic surface degenerates into the OXZ plane, which is projected as a horizontal line. The approximation of equation 5 always generates horizontal lines in the image. Therefore ỹ = y whenever Δα_t = −α_t or Δα_p = 0. Some properties of E_pos^y, observable in Fig. 4, are itemized below:

    E_pos^y(α_t, 0, Δα_t) = 0
    E_pos^y(α_t, −Δα_p, Δα_t) = E_pos^y(α_t, Δα_p, Δα_t)
    E_pos^y(−α_t, Δα_p, −Δα_t) = −E_pos^y(α_t, Δα_p, Δα_t)

The approximation error E_pos = [E_pos^x, E_pos^y]^t increases with both angular position errors (Δα_p, Δα_t) and with the camera tilt position. If the camera tilt angle α_t takes a large value, the approximation error becomes significant even if the target is projected near the center of the image.

Fig. 4: Approximation error E_pos = (E_pos^x, E_pos^y), assuming F = 1. Each column corresponds to a certain camera tilt position: first column α_t = −23°, second column α_t = 0°, third column α_t = 23°. First row: target projection in the image (exact (o) and approximated (*) positions) for pan errors Pe and tilt errors Te in {−45°, −23°, 0°, 23°, 45°}. Second row: E_pos^x as a function of (Δα_p, Δα_t). Third row: E_pos^y as a function of (Δα_p, Δα_t).

Therefore, to approximate equation 4 by equation 5, the operating range of the tilt degree of freedom cannot be large. This last result yields important conclusions regarding the mechanical construction of an active tracking system. We have been assuming a Fick model of rotation (pan, then tilt). Our systems have been designed for tracking targets moving on the ground plane (see Fig. 1). The pan operating range is much larger than the tilt operating range. The values of α_t are always smaller than 23° (the mechanical limit of the tilt degree of freedom) and the errors introduced by the approximations of equation 5 are small. Consider a system, with the same purposes, designed with the camera first rotating in tilt and then in pan (Helmholtz model). In that case α_t would be replaced by α_p in the derived equations. The errors in using the approximations would be more significant because the operating range of α_p is larger. Moreover, the mechanical construction of such a system would involve additional difficulties: it is always easier to achieve a large operating range in the independent rotation than in the dependent one. For these reasons it is advisable to design the system so that the camera rotates first around the axis for which the larger operating range is required.

2.2 Velocity Equations

This section derives the mathematical expressions for the target velocity in the image. The velocity in the image depends on the camera motion and on the target 3D velocity. Camera motion induces velocity in the image even when the scene is static; this self-induced motion is called egomotion in the literature. Target motion in 3D space also induces motion in the image. Tracking is achieved when the camera moves in such a way that the egomotion cancels

out the velocity induced by the motion in the scene and the target is kept in the same position in successive frames. In this paper it is assumed that the camera intrinsic parameters are kept constant over time (Ȧ = 0).

    ż m + z ṁ = A Ṙ_t^t(α_t) R_p^t(α_p) M + A R_t^t(α_t) Ṙ_p^t(α_p) M + A R_t^t(α_t) R_p^t(α_p) Ṁ        (9)

    K = (1/z) [ 1  0  −x ; 0  1  −y ]        (10)

Equation 9 is derived by differentiating 2. ṁ = [ẋ, ẏ, 0]^t, where [ẋ, ẏ]^t is the velocity vector in the image, and ż is the target velocity along the Z axis of the camera reference frame. In order to derive the image velocity vector, the matrix K is defined in such a way that (ẋ, ẏ)^t = K(ż m + z ṁ).

    (ẋ, ẏ)^t = K A P_t A^{-1} z m α̇_t + K A R_t^t(α_t) P_p R_t(α_t) A^{-1} z m α̇_p + K A R_t^t(α_t) R_p^t(α_p) Ṁ        (11)

Equation 11 is derived from 9. P_p and P_t are the differential generators of the abelian rotation groups R_p() and R_t(). The velocity in the image (ẋ, ẏ)^t depends both on the camera velocity of motion (α̇_p, α̇_t) (egomotion) and on the 3D velocity in the scene Ṁ. The egomotion terms do not depend on the scene 3D coordinates because the camera describes pure rotation motion.

    [ẋ; ẏ] = W_ego [α̇_p; α̇_t] + W_tgt [θ̇; φ̇]        (12)

The target velocity in 3D space can be described in rectangular (Ẋ, Ẏ, Ż) or spherical (ρ̇, θ̇, φ̇) coordinates. When the target motion is described in rectangular coordinates, the velocity in the image depends on Ẋ, Ẏ and Ż. On the other hand, whenever spherical coordinates are used, the velocity in the image depends only on θ̇ and φ̇. This happens because ρ̇ is the velocity component along the projective ray, which cannot be observed in the image (for a point target). Spherical coordinates therefore simplify the derivation and the understanding of the velocity equations. Equation 12 yields the target velocity in the image. The first term refers to egomotion, showing the contribution of the camera motion to the image velocity. The matrix W_ego is a 2×2 weighting matrix that depends on the image point coordinates and on the camera tilt angle. The second term refers to the velocity in the image due to the target motion in the scene. W_tgt is a 2×2 matrix which is a function of x, y and α_t. Both W_ego and W_tgt are easily computed from equation 11. Fig. 5 depicts the velocity fields induced in the image. The first two rows show the velocities due to camera pan and tilt motion (egomotion). The third and fourth rows exhibit the velocity fields induced by target motion in space.

    [α̇_p; α̇_t] = −W_ego^{-1} W_tgt [θ̇; φ̇]        (13)

The goal in a tracking application is to move the camera in such a way that the egomotion compensates the image velocity induced by the motion in the scene. Assuming that the target velocity is (θ̇, φ̇), perfect tracking is achieved whenever the camera pan and tilt velocities are computed by equation 13 (the entries of W_ego and W_tgt are rational functions of x, y, F and α_t). W_ego is a singular matrix only when y = F tan(α_t). This happens whenever the target projection lies on the horizontal line that contains the intersection point of the pan rotation axis with the image plane. That is the only case in which perfect velocity regulation cannot be achieved. Notice that if φ = 0, perfect tracking is achieved by making α̇_p = θ̇ and α̇_t = 0: the velocity induced in the image by the target motion can be compensated by the camera pan rotation. However, if α_t ≠ 0 the camera must move in both pan and tilt to keep the target position in the image. For this case, the residual velocity obtained by making α̇_p = 0 and α̇_t = φ̇ can be observed in Fig. 5 (last row).

    [ẋ; ẏ] ≈ [−F C(α_t)(α̇_p − θ̇); −F(α̇_t − φ̇)]        (14)

Similarly to what was done for the position analysis, equation 12 is simplified to decouple pan and tilt control. Equation 14 is obtained from equation 12 by assuming (x, y) = (0, 0).
The image velocity along the Y axis depends only on the camera tilt velocity and on φ̇, and perfect tracking is achieved whenever α̇_t = φ̇. The same happens for the X axis with the camera pan velocity and θ̇.
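A decoupled tracking controller follows directly from equations 5 and 14. The sketch below is one possible form, assuming the image velocity is measured with the camera momentarily at rest; the proportional gain and all names are illustrative choices, not from the paper.

```python
# Sketch of decoupled pan/tilt smooth-pursuit commands (equations 5 and 14).
import numpy as np

F = 1.0  # focal length

def pursuit_command(x, y, xdot, ydot, alpha_t, k=2.0):
    """Return (pan rate, tilt rate) driving the target to the image center.
    (x, y): target image position; (xdot, ydot): image velocity measured
    with the camera momentarily at rest; alpha_t: camera tilt; k: gain."""
    d_alpha_p = np.arctan(x / (F * np.cos(alpha_t)))   # pan error, from eq. 5
    d_alpha_t = np.arctan(y / F)                       # tilt error, from eq. 5
    # Proportional correction plus velocity feedforward: from equation 14,
    # theta_dot ~ xdot / (F cos(alpha_t)) and phi_dot ~ ydot / F when the
    # camera is at rest and the target is near the image center.
    pan_rate = k * d_alpha_p + xdot / (F * np.cos(alpha_t))
    tilt_rate = k * d_alpha_t + ydot / F
    return pan_rate, tilt_rate

print(pursuit_command(0.05, -0.02, 0.01, 0.0, np.radians(10)))
```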

Fig. 5: Velocity fields induced in the image. Each column corresponds to a certain camera tilt position: first column α_t = −23°, second column α_t = 0°, third column α_t = 23°. First row (camera pan motion): (α̇_p, α̇_t) = (1, 0) rad/s and (θ̇, φ̇) = (0, 0) rad/s. Second row (camera tilt motion): (α̇_p, α̇_t) = (0, 1) rad/s and (θ̇, φ̇) = (0, 0) rad/s. Third row (target pan motion): (α̇_p, α̇_t) = (0, 0) rad/s and (θ̇, φ̇) = (1, 0) rad/s. Fourth row (target tilt motion): (α̇_p, α̇_t) = (0, 0) rad/s and (θ̇, φ̇) = (0, 1) rad/s. Fifth row (target + camera tilt motion): (α̇_p, α̇_t) = (0, 1) rad/s and (θ̇, φ̇) = (0, 1) rad/s. Large rectangle (-): image limits for the full field of view. Small rectangle (:): image limits for a field of view of 24° × 18°.

Fig. 6: Binocular system. Left: neck pan and tilt. Right: vergence control (top view).

2.3 Binocular Tracking

In the previous section the position and velocity equations for monocular tracking were derived. In this section we focus on the equations for binocular tracking. As depicted in figure 6, two cameras are mounted on a moving platform of length 2b (the baseline). The platform is able to rotate in pan (angle α_p) and tilt (angle α_t). Each eye has an additional pan degree of freedom called vergence (angles β_l and β_r). It is assumed that the vergence is symmetric (β_l = β_r = β). Binocular tracking is achieved by controlling the three independent degrees of freedom: pan (α_p), tilt (α_t) and vergence (β).

Imagine a third camera positioned at the center of the platform, called the cyclopean eye. The neck pan and tilt degrees of freedom align the cyclopean Z (forward-looking) axis with the target. Vergence control adjusts both camera positions so that the target images are projected in the corresponding image centers. Assuming that the target is foveated with symmetric vergence angles, these angles depend only on the target motion along the cyclopean Z (forward-looking) axis, i.e., on the cyclopean depth of the object. Therefore binocular tracking can be split into two sub-problems: cyclopean eye control and vergence control. Tracking the target with the cyclopean eye is essentially a monocular tracking problem, as studied in the last section. To perform such tracking it is necessary to transfer visual information from both retinas to the cyclopean camera. Vergence control is achieved using the retinal flow disparity.

2.4 Pan and Tilt Control

    z_l m_l = A R_p^t(β) R_t^t(α_t) R_p^t(α_p) M + A R_p^t(β) t
    z_r m_r = A R_p^t(−β) R_t^t(α_t) R_p^t(α_p) M − A R_p^t(−β) t        (15)

The vector M represents the target cartesian coordinates in the cyclopean inertial frame. The target is projected at (x_l, y_l) in the left image and at (x_r, y_r) in the right image. The corresponding homogeneous vectors are λm_l and λm_r, and z_l and z_r are the target Z coordinates in the left and right camera reference frames. Equation 15 establishes the relationship between m_l, m_r and M. α_p and α_t are the platform pan and tilt angles, β is the symmetric vergence angle and t = (b, 0, 0)^t, where 2b is the baseline.

    z_l m_l = A R_p^t(β) A^{-1} z m + A R_p^t(β) t
    z_r m_r = A R_p^t(−β) A^{-1} z m − A R_p^t(−β) t        (16)

Assume that (x, y) are the target projection coordinates in the cyclopean eye, that λm is the corresponding homogeneous vector and that z represents the target cyclopean depth. Equation 16 is derived from 15 using 2. The goal of the neck pan and tilt control is to align the cyclopean Z axis (forward-looking) with the moving target. This task is similar to the monocular tracking problem. It is necessary to compute the target position and velocity in the cyclopean eye from the position and velocity measurements in the left and right images.

    x = −F²(x_l + x_r) / (2(S(β)² x_l x_r − F² C(β)² + F S(β) C(β)(x_l − x_r)))
    y = −(F² C(β)(y_l + y_r) + F S(β)(x_r y_l − x_l y_r)) / (2(S(β)² x_l x_r − F² C(β)² + F S(β) C(β)(x_l − x_r)))        (17)
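The binocular model of equations 15-16 can be exercised numerically before solving it. The sketch below uses the same assumed conventions as the earlier monocular sketches (A = diag(F, F, 1), baseline along the platform x axis, illustrative values for F and b) and verifies that an aligned, verged head centers the target in both retinas.

```python
# Sketch of the binocular projection of equation 15.
import numpy as np

F, b = 1.0, 0.15             # focal length and half baseline (illustrative)
A = np.diag([F, F, 1.0])

def R_pan(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def R_tilt(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def stereo_project(M, alpha_p, alpha_t, beta):
    """Equation 15: the left camera uses (+beta, +t), the right camera
    uses (-beta, -t), with t = (b, 0, 0)^t."""
    t = np.array([b, 0.0, 0.0])
    Mc = R_tilt(alpha_t).T @ R_pan(alpha_p).T @ M   # cyclopean camera frame
    zl_ml = A @ R_pan(beta).T @ (Mc + t)
    zr_mr = A @ R_pan(-beta).T @ (Mc - t)
    return zl_ml[:2] / zl_ml[2], zr_mr[:2] / zr_mr[2]

# Head aligned with a target at depth z and verged (tan(beta) = b/z):
# the target projects to the center of both retinas.
z = 3.0
print(stereo_project(np.array([0.0, 0.0, z]), 0.0, 0.0, np.arctan(b / z)))
```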

Equation 16 is a system of six non-linear equations. Assume that the target image coordinates are known in both the left and the right cameras ((x_l, y_l) and (x_r, y_r)). There are five unknowns in 16: the target position in the cyclopean image (x, y) and the target Z coordinates in the left, right and cyclopean camera reference frames (z_l, z_r and z). These five unknowns can be determined by solving the system of equations. The solution for (x, y) is given in 17.

    [ẋ_{l/r}; ẏ_{l/r}] = ±K_{l/r} A P_p A^{-1} z_{l/r} m_{l/r} β̇
                        + K_{l/r} A R_p^t(±β) P_t R_p(±β) A^{-1} z_{l/r} m_{l/r} α̇_t
                        + K_{l/r} A R_p^t(±β) R_t^t(α_t) P_p R_t(α_t) R_p(±β) A^{-1} z_{l/r} m_{l/r} α̇_p
                        ∓ K_{l/r} A R_p^t(±β) P_t t α̇_t
                        ∓ K_{l/r} A R_p^t(±β) R_t^t(α_t) P_p R_t(α_t) t α̇_p
                        + K_{l/r} A R_p^t(±β) R_t^t(α_t) R_p^t(α_p) Ṁ        (18)

Equation 18 computes the velocity in the left and right retinas (the upper signs refer to the left camera and the lower signs to the right). It is obtained by differentiating equation 15. The first five terms correspond to the velocity induced by the camera motion (egomotion). Notice that the fourth and fifth terms depend on the scene depth because the camera performs translational motion. The last term corresponds to the image velocity induced by the target motion in space.

    [ẋ_ego; ẏ_ego] = W_ego^rot [β̇; α̇_t; α̇_p] + W_ego^tra [α̇_t; α̇_p]        (19)

    W_ego^tra = (b/z) [ 0  −(F S(β) ± x C(β)) C(α_t) ; 0  ±(F S(α_t) − y C(β) C(α_t)) ]        (20)

Equation 19 computes the image velocity component due to the camera motion. W_ego^rot is a 2×3 weighting matrix corresponding to the egomotion induced by the camera rotation. W_ego^tra is a 2×2 weighting matrix related to the egomotion component due to the camera translation (see equation 20). b is the distance from each camera to the neck pan and tilt joints. The camera describes translational motion if b > 0 and α̇_p ≠ 0. Whenever b is zero or the neck pan degree of freedom is stopped, the second term of equation 19 is zero and the egomotion is only due to camera rotation. The velocity in the image induced by the camera translation is proportional to the ratio b/z. Usually the target distance to the system is much larger than the baseline length; the ratio b/z is then nearly zero and the egomotion induced by the camera translation can be neglected.

    [ẋ_ego_{l/r}; ẏ_ego_{l/r}] = [ ∓F β̇ − F C(α_t)(1 ± S(β) b/z) α̇_p ; F C(β) α̇_t ∓ F S(α_t)(S(β) − b/z) α̇_p ]        (21)

Assuming that the target is nearly in the center of the image, equation 19 simplifies to equation 21 by making (x, y) = (0, 0).

    [ẋ_tgt_l; ẏ_tgt_l] = K_l A R_p^t(β) A^{-1} X_tgt
    [ẋ_tgt_r; ẏ_tgt_r] = K_r A R_p^t(−β) A^{-1} X_tgt        (22)

Consider (ẋ_tgt, ẏ_tgt)^t as the velocity induced in the cyclopean image by the target motion in the scene. This velocity vector is (ẋ_tgt, ẏ_tgt)^t = K X_tgt, where X_tgt is the 3×1 vector X_tgt = A R_t^t(α_t) R_p^t(α_p) Ṁ (see equation 11). Assume that (ẋ_tgt_l, ẏ_tgt_l)^t and (ẋ_tgt_r, ẏ_tgt_r)^t are the image velocities induced by the target motion in the left and right retinas. Equation 22 is a system of equations derived from the last term of 18. This system is used to obtain an analytical solution for (ẋ_tgt, ẏ_tgt)^t.

    [ẋ_tgt; ẏ_tgt] ≈ [ (ẋ_tgt_l + ẋ_tgt_r)/(2 C(β)²) ; (ẏ_tgt_l + ẏ_tgt_r)/(2 C(β)) ]        (23)

Assuming that the target is projected in the center of both retinas, the velocity induced by the target motion in the cyclopean image is given by equation 23.
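The position transfer discussed around equations 16-17 amounts to inverting the stereo model. The sketch below does this under the same assumed conventions as the stereo projection sketch above (so it inverts that model exactly); the intermediate variables u and v are illustrative names for the left and right ray parameters x/F ± b/z.

```python
# Sketch of the cyclopean position transfer (equations 16-17): recover the
# cyclopean image coordinates (x, y) and the depth z from stereo measurements.
import numpy as np

F, b = 1.0, 0.15

def cyclopean_from_stereo(xl, yl, xr, yr, beta):
    """Invert the (assumed) stereo projection model of the previous sketch."""
    c, s = np.cos(beta), np.sin(beta)
    u = (F * s + xl * c) / (F * c - xl * s)   # u = x/F + b/z, from the left ray
    v = (xr * c - F * s) / (F * c + xr * s)   # v = x/F - b/z, from the right ray
    x = F * (u + v) / 2.0
    z = 2.0 * b / (u - v)
    # y follows from either retina; average the two estimates.
    y = F * (yl / (F * c - xl * s) + yr / (F * c + xr * s)) / 2.0
    return x, y, z

# A centered, verged target at depth 3 maps back to (0, 0, 3).
print(cyclopean_from_stereo(0.0, 0.0, 0.0, 0.0, np.arctan(b / 3.0)))
```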

2.5 Vergence Control

Eye pan is called vergence. Figure 6 exhibits a schematic of the vergence control. The vergence point is the intersection of the left and right optical axes. Assuming symmetric vergence, this point always lies on the cyclopean Z axis. The system is verged whenever tan(β) = b/z, where z is the target Z coordinate in the cyclopean reference frame. This section performs the mathematical study of vergence control.

    Δβ = arctan( (F C(β)(x_l − x_r) + 2 S(β) x_l x_r) / (F (2F C(β) − S(β)(x_l − x_r))) )        (24)

Assume that the vergence angle is β and that Δβ is the correction angle that verges the system on the target (tan(β + Δβ) = b/z). Δβ is called the vergence error in position and is given by equation 24.

    [ẋ_{l/r}; ẏ_{l/r}] = ±K_{l/r} A P_p R_p^t(±β)(A^{-1} z m ± t) β̇ + K_{l/r} A R_p^t(±β) A^{-1} (ż m + z ṁ)        (25)

Equation 25 is derived by differentiating equation 16. It gives the velocity in the left and right retinas as a function of the velocities in the cyclopean eye. The first term corresponds to the egomotion induced by the vergence degree of freedom. While neck pan and tilt are used to keep the target aligned with the cyclopean Z axis, the goal of vergence control is to compensate for the target motion along the cyclopean optical axis.

    ẋ_l − ẋ_r = −2F β̇ − (2F S(β)²/b) ż        (26)

    β̇ = (ẋ_tgt_l − ẋ_tgt_r)/(2F)        (27)

Assume that the cyclopean eye is performing perfect tracking ((x, y) = (0, 0) and (ẋ, ẏ) = (0, 0)) and that the target is projected in the center of both retinas (tan(β) = b/z). Equation 26 is obtained by applying these assumptions to 25. It gives the velocity disparity along the X axis, which is the sum of an egomotion term and of a term due to the target velocity along the cyclopean Z axis. To achieve perfect tracking this sum must be zero. If ẋ_tgt_l − ẋ_tgt_r is the disparity due to the target motion in the scene, equation 27 gives the vergence velocity that achieves perfect tracking.

References

1. J.L. Crowley, J.M. Bedrune, M. Bekker, and M. Schneider. Integration and control of reactive visual processes. In Third European Conference on Computer Vision, volume 2, pages 47-58, Stockholm, Sweden, May 1994.
2. J.L. Crowley. Coordination of action and perception in a surveillance robot. IEEE Expert, 2, November 1987.
3. B. Espiau, F. Chaumette, and P. Rives. A new approach to visual servoing in robotics. IEEE Trans. on Robot. and Automat., 8(3):313-326, June 1992.
4. H.I. Christensen, J. Horstmann, and T. Rasmussen. A control theoretic approach to active vision. In Asian Conference on Computer Vision, Singapore, December 1995.
5. P.I. Corke. Visual Control of Robots: High-Performance Visual Servoing. Mechatronics. John Wiley, 1996.
6. P. Krautgartner and M. Vincze. Performance evaluation of vision-based control tasks. In ICRA98 - IEEE Int. Conf. on Robotics and Automation, Leuven, Belgium, May 1998.
7. P. Nordlund and T. Uhlin. Closing the loop: detection and pursuit of a moving object by a moving observer. Image and Vision Computing, 14, 1996.
8. D. Murray, P. McLauchlan, I. Reid, and P. Sharkey. Reactions to peripheral image motion using a head/eye platform. In Proc. of IEEE International Conference on Computer Vision, pages 403-411, 1993.

This article was processed using the LaTeX macro package with SIRS2000 style.


More information

Quaternion-Based Tracking of Multiple Objects in Synchronized Videos

Quaternion-Based Tracking of Multiple Objects in Synchronized Videos Quaternion-Based Tracking of Multiple Objects in Synchronized Videos Quming Zhou 1, Jihun Park 2, and J.K. Aggarwal 1 1 Department of Electrical and Computer Engineering The University of Texas at Austin

More information

Introduction to Homogeneous Transformations & Robot Kinematics

Introduction to Homogeneous Transformations & Robot Kinematics Introduction to Homogeneous Transformations & Robot Kinematics Jennifer Ka Rowan Universit Computer Science Department. Drawing Dimensional Frames in 2 Dimensions We will be working in -D coordinates,

More information

Lecture 12 Date:

Lecture 12 Date: Information Technolog Delhi Lecture 2 Date: 3.09.204 The Signal Flow Graph Information Technolog Delhi Signal Flow Graph Q: Using individual device scattering parameters to analze a comple microwave network

More information

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors We are IntechOpen, the world s leading publisher of Open Access books Built b scientists, for scientists 3,8 116, 12M Open access books available International authors and editors Downloads Our authors

More information

Approaches to Simulate the Operation of the Bending Machine

Approaches to Simulate the Operation of the Bending Machine Approaches to Simulate the Operation of the Bending Machine Ornwadee Ratanapinunchai Chanunchida Theerasilp Teerut Suksawai Project Advisors: Dr. Krit Chongsrid and Dr. Waree Kongprawechnon Department

More information

A Robust Two Feature Points Based Depth Estimation Method 1)

A Robust Two Feature Points Based Depth Estimation Method 1) Vol.31, No.5 ACTA AUTOMATICA SINICA September, 2005 A Robust Two Feature Points Based Depth Estimation Method 1) ZHONG Zhi-Guang YI Jian-Qiang ZHAO Dong-Bin (Laboratory of Complex Systems and Intelligence

More information

3D Sensing. Translation and Scaling in 3D. Rotation about Arbitrary Axis. Rotation in 3D is about an axis

3D Sensing. Translation and Scaling in 3D. Rotation about Arbitrary Axis. Rotation in 3D is about an axis 3D Sensing Camera Model: Recall there are 5 Different Frames of Reference c Camera Model and 3D Transformations Camera Calibration (Tsai s Method) Depth from General Stereo (overview) Pose Estimation from

More information

25 Hz. 25 Hz. X M(z) u c + u X. C(z) u ff KF. Θ, α. V(z) Θ m. 166 Hz Θ. Θ mot. C(z) Θ ref. C(z) V(z) 25 Hz. V(z) Θ, α. Interpolator.

25 Hz. 25 Hz. X M(z) u c + u X. C(z) u ff KF. Θ, α. V(z) Θ m. 166 Hz Θ. Θ mot. C(z) Θ ref. C(z) V(z) 25 Hz. V(z) Θ, α. Interpolator. Comparison of Control Structures for Visual Servoing S. Chroust y, J.P. Barreto z H. Araújo z, M. Vincze y y Vienna University of Technology Institute of Flexible Automation Gusshausstr. 27-20 /361 fsc,vmg@infa.tuwien.ac.at

More information

Module 4F12: Computer Vision and Robotics Solutions to Examples Paper 2

Module 4F12: Computer Vision and Robotics Solutions to Examples Paper 2 Engineering Tripos Part IIB FOURTH YEAR Module 4F2: Computer Vision and Robotics Solutions to Examples Paper 2. Perspective projection and vanishing points (a) Consider a line in 3D space, defined in camera-centered

More information

6-dof Eye-Vergence Visual Servoing by 1-step GA Pose Tracking

6-dof Eye-Vergence Visual Servoing by 1-step GA Pose Tracking International Journal of Applied Electromagnetics and Mechanics 41 (214) 1 1 IOS Press 6-dof Ee-Vergence Visual Servoing b 1-step GA Pose Tracking Yu Cui a, Kenta Nishimura a, Yusuke Sunami a, Mamoru Minami

More information

COSC579: Scene Geometry. Jeremy Bolton, PhD Assistant Teaching Professor

COSC579: Scene Geometry. Jeremy Bolton, PhD Assistant Teaching Professor COSC579: Scene Geometry Jeremy Bolton, PhD Assistant Teaching Professor Overview Linear Algebra Review Homogeneous vs non-homogeneous representations Projections and Transformations Scene Geometry The

More information