3D Tracking Using Two High-Speed Vision Systems

Yoshihiro NAKABO (1), Idaku ISHII (2), Masatoshi ISHIKAWA (3)

(1) University of Tokyo, Tokyo, Japan, nakabo@k2.t.u-tokyo.ac.jp
(2) Tokyo University of Agriculture and Technology, Tokyo, Japan, iishii@cc.tuat.ac.jp
(3) University of Tokyo, Tokyo, Japan, ishikawa@k2.t.u-tokyo.ac.jp

(Y. Nakabo is presently with the Bio-Mimetic Control Research Center, RIKEN, Nagoya, Japan, nakabo@bmc.riken.go.jp.)

Abstract

When considering real-world applications of robot control with visual servoing, both 3D information and a high feedback rate are required. We have developed a 3D target tracking system with a 1 ms feedback rate using two high-speed vision systems called Column Parallel Vision (CPV) systems. To obtain 3D information such as the position, orientation, and shape parameters of the target object, a feature-based algorithm is introduced that uses moment feature values extracted by the vision systems for a spheroidal object model. We also propose a new 3D self windowing method to extract the target in 3D space, which is an extension of the conventional self windowing method in 2D images.

1 Introduction

To control a robot dynamically by direct visual feedback [1, 2], a servo rate of around 1 kHz is required. Conventional vision systems using CCD cameras cannot realize such a fast feedback rate because of the slow transmission rates of video standards. To solve this problem, we developed a 1 ms high-speed vision system called the Column Parallel Vision (CPV) system [3] and demonstrated a high-speed grasping task [4] using this vision system in our previous work. However, our previous system used only one vision system and could not extract 3D information, so we had to assume a constant distance between the camera and the target. In many real applications, 3D information such as the position, motion, orientation, and/or shape of the target is strongly required. In this research, we have developed a 3D tracking algorithm and a system for extracting 3D information within a 1 ms cycle time. Our goal is to apply them to high-speed grasping tasks in the 3D real world (see Fig. 1).

Figure 1: 3D tracking and grasping task. The first and second active vision systems, each carrying a CPV system, perform high-speed target tracking and 2D image feature extraction; feature-based 3D reconstruction provides the object model (position, orientation, shape parameters, and motion) used to decide how the high-speed hand-arm should reach and grasp the target. All processes are executed in a 1 ms cycle time.

We have developed a 3D tracking system using two high-speed vision systems, in which bottleneck-free processing is realized by massively parallel image processing in the CPV systems and feature-based 3D reconstruction in a DSP system. Active vision systems called AVS-II [3] enable fast gaze control to track the target at a 1 ms visual servo rate. We also propose a new method for extracting the target in 3D space, called 3D self windowing, since conventional self windowing [5] operates only in the 2D image plane. In the next section, we describe the details of the proposed algorithms. The system and the implementation of these algorithms are described in Section 3. Experimental results are given in Section 4.

2 3D Tracking Algorithms

2.1 Task-based object model

Our final goal, as shown in Fig. 1, is to catch a target that moves fast and irregularly in 3D space with a dynamically controlled hand-arm using direct feedback of 3D visual information. For this task, the target position and motion in 3D Cartesian coordinates must be obtained for tracking, and the shape parameters and orientation of the target are required for guiding the arm to the target and for deciding the finger trajectories of the hand for preshaping. In this research, we choose a spheroidal object model, whose parameters are the centroid, the radial length of rotation, and the direction and length of the rotation axis. Using these parameters, the approach of the arm can be determined from the direction of the rotation axis, and preshaping can be organized from the size and length of the spheroid in each direction. These parameters contain sufficient information for our task, so we can focus on the algorithms for extracting them using two high-speed vision systems.

2.2 Reconstruction of elliptical shape model (in 2D)

In this section, we introduce the moment feature values and compute the parameters of an elliptical shape model from them. Let [u, v]^T be the image coordinates and I(u, v) be an input image. The (i+j)-th order moments m_{ij} are defined as:

m_{ij} = \sum_{u,v} u^i v^j I(u, v)    (1)

These moment feature values are often used in visual servoing. In our vision system, we can extract them at high speed, as shown in Section 3. The center of gravity (\bar{u}, \bar{v}), the variances \sigma_u^2, \sigma_v^2, and the covariance C_{uv} of an image pattern are calculated from the moments as:

\bar{u} = m_{10}/m_{00}, \quad \bar{v} = m_{01}/m_{00}    (2)
\sigma_u^2 = m_{20} - \bar{u}^2 m_{00}    (3)
\sigma_v^2 = m_{02} - \bar{v}^2 m_{00}    (4)
C_{uv} = m_{11} - \bar{u}\bar{v} m_{00}    (5)

The parameters of the elliptical shape model are computed from these moment feature values. First, we consider the basic ellipse pattern described by:

u^2/a_e^2 + v^2/b_e^2 \le 1,    (6)

where a_e and b_e are the lengths of the major and minor (a_e > b_e) axes of the ellipse. We obtain the general ellipse S by rotating the basic ellipse (6) by \theta_e and translating it by [u_e, v_e]^T, so that a point [u', v']^T of the basic ellipse maps to:

\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix} \cos\theta_e & -\sin\theta_e \\ \sin\theta_e & \cos\theta_e \end{bmatrix}
\begin{bmatrix} u' \\ v' \end{bmatrix} +
\begin{bmatrix} u_e \\ v_e \end{bmatrix}    (7)

We consider a binary image, where I(u, v) = 1 inside the ellipse and I(u, v) = 0 outside, and calculate the moment feature values of the ellipse pattern S. Clearly, the center of the ellipse is u_e = \bar{u}, v_e = \bar{v}. The second-order moments are:

\sigma_u^2 = \iint_S u^2\,du\,dv = \frac{a_e b_e \pi}{4}(a_e^2 \cos^2\theta_e + b_e^2 \sin^2\theta_e)    (8)
\sigma_v^2 = \iint_S v^2\,du\,dv = \frac{a_e b_e \pi}{4}(a_e^2 \sin^2\theta_e + b_e^2 \cos^2\theta_e)    (9)
C_{uv} = \iint_S uv\,du\,dv = \frac{a_e b_e \pi}{8}(a_e^2 - b_e^2)\sin 2\theta_e

From these equations, \theta_e is calculated as:

\theta_e = \frac{1}{2}\arctan\!\left(\frac{2C_{uv}}{\sigma_u^2 - \sigma_v^2}\right)    (10)

Solving (8) and (9), a_e and b_e are obtained as:

a_e = \left[\frac{16\,(\sigma_u^2\cos^2\theta_e - \sigma_v^2\sin^2\theta_e)^3}{\pi^2 \cos^2 2\theta_e\,(\sigma_v^2\cos^2\theta_e - \sigma_u^2\sin^2\theta_e)}\right]^{1/8}

b_e = \left[\frac{16\,(\sigma_v^2\cos^2\theta_e - \sigma_u^2\sin^2\theta_e)^3}{\pi^2 \cos^2 2\theta_e\,(\sigma_u^2\cos^2\theta_e - \sigma_v^2\sin^2\theta_e)}\right]^{1/8}

We have now shown how to calculate all the parameters of the ellipse from the moment feature values.
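To make the computation in Section 2.2 concrete, here is a minimal NumPy sketch that recovers the ellipse parameters of a binary pattern from its image moments using Eqs. (1)-(10) and the expressions for a_e and b_e above. It is only an illustration: the function name and the array-indexing convention are assumptions, and in the actual system the moments are accumulated by the column-parallel summation circuit of the CPV system, not on a host CPU.

```python
import numpy as np

def ellipse_from_moments(img):
    """Recover (u_e, v_e, theta_e, a_e, b_e) of a binary ellipse pattern from its
    image moments, following Eqs. (1)-(10).  `img` is a 2D array indexed as
    img[v, u], with 1 (or True) inside the target and 0 outside."""
    v, u = np.nonzero(img)                          # pixels where I(u, v) = 1
    u, v = u.astype(float), v.astype(float)
    m00 = float(u.size)                             # 0th order moment
    u_bar, v_bar = u.mean(), v.mean()               # Eq. (2): center of gravity
    su2 = np.sum(u * u) - u_bar**2 * m00            # Eq. (3): variance in u
    sv2 = np.sum(v * v) - v_bar**2 * m00            # Eq. (4): variance in v
    cuv = np.sum(u * v) - u_bar * v_bar * m00       # Eq. (5): covariance
    theta = 0.5 * np.arctan2(2.0 * cuv, su2 - sv2)  # Eq. (10), atan2 for quadrant safety
    c, s = np.cos(theta), np.sin(theta)
    A = su2 * c**2 - sv2 * s**2                     # = (pi a^3 b / 4) cos(2 theta)
    B = sv2 * c**2 - su2 * s**2                     # = (pi a b^3 / 4) cos(2 theta)
    cos2t = np.cos(2.0 * theta)
    a_e = (16.0 * A**3 / (np.pi**2 * cos2t**2 * B)) ** 0.125
    b_e = (16.0 * B**3 / (np.pi**2 * cos2t**2 * A)) ** 0.125
    return u_bar, v_bar, theta, a_e, b_e
```

On a synthetic axis-aligned ellipse this sketch returns \theta_e \approx 0 and approximately the generating axis lengths, which is a convenient sanity check of the formulas; note that the closed-form expressions degenerate as \theta_e approaches 45 degrees, where \cos 2\theta_e \to 0.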
2.3 Reconstruction of spheroidal shape model (in 3D)

Next, we describe the reconstruction of a spheroidal shape model from the feature values extracted by one vision system. Let x = [x, y, z]^T be the Cartesian coordinates in the object frame. We first consider the basic spheroid, whose center is at the origin and whose axis of symmetry is aligned with the x axis. The spheroid is described as:

x^T \Sigma x = 1, \quad \text{where } \Sigma = \mathrm{diag}(1/a_s^2,\ 1/b_s^2,\ 1/b_s^2).    (11)

We assume the weak-perspective camera model:

\begin{bmatrix} u \\ v \end{bmatrix} = \frac{f}{T_z}\, M (R x + T),    (12)

where M = [\,I_2\ \ 0\,], R is a 3x3 rotation matrix, T = [T_x, T_y, T_z]^T is the translation vector from the object frame to the camera frame, and f denotes the focal length and scale factor of the camera. Let R_i denote the rotation by \theta_i around axis i, and introduce the following equations:

R = R_z R_y R_x    (13)
x' = R_y R_x x    (14)

With these equations, we divide the camera projection into two phases:

1. An orthographic projection from the 3D object frame onto the x'y' plane.
2. A similarity transformation from the x'y' plane to the uv image plane.
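To illustrate the weak-perspective model (12) and the two-phase decomposition just described, the following small sketch projects object-frame points in exactly those two steps. The rotation helpers and the function name are illustrative assumptions, not part of the original system.

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def weak_perspective(points, theta, T, f):
    """Project Nx3 object-frame points with Eq. (12):
    [u, v]^T = (f / T_z) M (R x + T),  M = [I_2  0],  R = R_z R_y R_x,
    implemented as the two phases described above."""
    theta_x, theta_y, theta_z = theta
    Tx, Ty, Tz = T
    # Phase 1: orthographic projection of x' = R_y R_x x onto the x'y' plane.
    xp = (rot_y(theta_y) @ rot_x(theta_x) @ np.asarray(points, float).T)[:2]
    # Phase 2: similarity transform to the image plane (rotate by theta_z,
    # translate by [T_x, T_y], scale by f / T_z).
    c, s = np.cos(theta_z), np.sin(theta_z)
    Rz2 = np.array([[c, -s], [s, c]])
    uv = (f / Tz) * (Rz2 @ xp + np.array([[Tx], [Ty]]))
    return uv.T
```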

Now we consider the first transformation. First, the relation R_x \Sigma R_x^T = \Sigma shows that the parameter \theta_x need not be known in this case. Substituting equation (14) into equation (11), we obtain:

x'^T \Sigma' x' = 1, \quad \text{where } \Sigma' = R_y \Sigma R_y^T.    (15)

In general, the projection of the spheroid is an ellipse. Here, the ellipse formed by the orthographic projection of the spheroid onto the x'y' plane is the envelope generated by the intersections of the x'y' plane with the tangent planes of the spheroid that are parallel to the z' axis. We now calculate this projection explicitly. The tangent plane at a point x'_0 on the spheroid (15) is described as:

x'^T \Sigma' x'_0 = 1.    (16)

If the plane is parallel to the z' axis, the normal vector of the plane is orthogonal to the z' axis, that is:

x'^T_0 \Sigma' [0, 0, 1]^T = 0.    (17)

Substituting equation (15) into equation (17), we can eliminate z', and rewriting x'_0 \to x', we obtain the following equation, which describes an ellipse on the x'y' plane:

\frac{x'^2}{a_s^2 \cos^2\theta_y + b_s^2 \sin^2\theta_y} + \frac{y'^2}{b_s^2} = 1.    (18)

Now we consider the second transformation. Substituting equations (13) and (14) into equation (12), we obtain:

\begin{bmatrix} u \\ v \end{bmatrix} = \frac{f}{T_z}\left(
\begin{bmatrix} \cos\theta_z & -\sin\theta_z \\ \sin\theta_z & \cos\theta_z \end{bmatrix}
\begin{bmatrix} x' \\ y' \end{bmatrix} +
\begin{bmatrix} T_x \\ T_y \end{bmatrix} \right)    (19)

Finally, comparing equations (18), (19) with (6), (7), we obtain the following relations between the ellipse parameters and the spheroid parameters:

\theta_z = \theta_e
T_x = u_e T_z / f    (20)
T_y = v_e T_z / f    (21)
b_s = b_e T_z / f
\sqrt{a_s^2 \cos^2\theta_y + b_s^2 \sin^2\theta_y} = a_e T_z / f

Note that T_x, T_y, and \theta_z have now been computed, but T_z and \theta_y remain unknown, and \theta_x is irrelevant to the position and orientation of the target.

2.4 Computing position and orientation (in 3D)

In the previous section, we calculated the parameters available from one camera. Now we integrate the information from the two cameras. Let R_b and T_b be, respectively, the rotation matrix and translation vector from the first camera to the second. They are derived from the initial setup of the two cameras and the encoder data of the active vision systems. Let R_i and T_i be the rotation and translation from the target to the i-th camera. Then:

R_b T_1 + T_b = T_2,    (22)
R_b R_1 = R_2.    (23)

Substituting equations (20) and (21) into equation (22), we have:

A \begin{bmatrix} T_{z1} \\ T_{z2} \end{bmatrix} = T_b, \quad \text{where }
A = \begin{bmatrix} -R_b \begin{bmatrix} u_{e1}/f \\ v_{e1}/f \\ 1 \end{bmatrix} &
\begin{bmatrix} u_{e2}/f \\ v_{e2}/f \\ 1 \end{bmatrix} \end{bmatrix}.

We solve these equations by minimizing the least-squares error of the solution:

[T_{z1}, T_{z2}]^T = A^+ T_b = (A^T A)^{-1} A^T T_b.

Next, we compute the parameter \theta_y. Suppose R_i = R_{zi} R_{yi} R_{xi}. Multiplying equation (23) from the right by the vector [1, 0, 0]^T eliminates R_{xi}. Finally we obtain:

n \cos\theta_{y1} - a \sin\theta_{y1} = [\cos\theta_{y2},\ 0,\ -\sin\theta_{y2}]^T,    (24)

where:

[\,n\ \ o\ \ a\,] = R_{z2}^T R_b R_{z1}.

Solving (24), we obtain:

\theta_{y1} = \arctan(n_2 / a_2)
\theta_{y2} = \arctan\!\left(\frac{a_3 n_2 - a_2 n_3}{a_2 n_1 - a_1 n_2}\right)

All the algorithms for extracting the 3D information from the moment feature values have now been shown.
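The two-view integration of Section 2.4 reduces to a small linear-algebra routine. The sketch below, assuming NumPy and illustrative argument names, solves Eq. (22) for the depths T_z1, T_z2 in the least-squares sense and evaluates the closed-form angles \theta_{y1}, \theta_{y2} from Eq. (24); in the actual system this step runs on the DSP.

```python
import numpy as np

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def integrate_two_views(ue1, ve1, ue2, ve2, theta_z1, theta_z2, Rb, Tb, f):
    """Two-view integration of Sec. 2.4: least-squares depths T_z1, T_z2 from
    Eq. (22) and the out-of-plane angles theta_y1, theta_y2 from Eq. (24).
    (ue_i, ve_i, theta_z_i) are the per-camera ellipse center and tilt;
    Rb, Tb give the pose of the second camera with respect to the first."""
    p1 = np.array([ue1 / f, ve1 / f, 1.0])
    p2 = np.array([ue2 / f, ve2 / f, 1.0])
    A = np.column_stack((-Rb @ p1, p2))                # A [T_z1, T_z2]^T = T_b
    tz1, tz2 = np.linalg.lstsq(A, Tb, rcond=None)[0]   # pseudo-inverse solution

    noa = rot_z(theta_z2).T @ Rb @ rot_z(theta_z1)     # [n o a] = R_z2^T R_b R_z1
    n, a = noa[:, 0], noa[:, 2]
    theta_y1 = np.arctan2(n[1], a[1])                  # theta_y1 = arctan(n_2 / a_2)
    theta_y2 = np.arctan2(a[2] * n[1] - a[1] * n[2],   # theta_y2 from Eq. (24)
                          a[1] * n[0] - a[0] * n[1])
    return tz1, tz2, theta_y1, theta_y2
```

From T_z1 and the relations (20) and (21), the remaining translation components follow as T_x1 = u_e1 T_z1 / f and T_y1 = v_e1 T_z1 / f (and likewise for the second camera).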

Figure 2: Epipolar geometry in 3D self windowing. The possible area of the target pattern in Image 1, known from Image 2, and the possible area of the target in 3D space are bounded by the epipolar pencil through the frontier points; an obstacle outside this area is excluded.

2.5 Conventional self windowing (in 2D)

We now focus on the method for extracting the target pattern from the input images of the vision sensors. In our previous work, we proposed the self windowing method [5], in which the self-windowing mask is created from the target pattern of the previous frame cycle, so that the target can be tracked continuously provided the frame rate of the vision system is high enough. In principle, this method can be applied to the task considered here. However, when an occlusion occurs, it can be detected by an abrupt increase in the area of the target pattern, but the target pattern cannot be distinguished from the obstacle until the two patterns separate again in the images.

2.6 3D extended self windowing

To solve this problem, we propose 3D self windowing, which uses the epipolar geometry of the two vision systems. As shown in Fig. 2, taking the pattern obtained by conventional self windowing as a tentative target pattern, we consider the area enclosed by the pencil of lines through the contour of the tentative pattern and take the intersection of these areas from the two views, which gives the possible area in which the target can lie in 3D space. This area is used as a 3D mask to separate the target object from the obstacle, and the target can be tracked continuously thanks to the high frame rate of the vision systems.

2.7 3D self windowing algorithm

In this section, we describe the 3D self windowing algorithm. Let the index i (i = 1, 2) denote each of the vision systems. We assume binary (1, 0) patterns.

Step 1. Suppose the target pattern T^i_{t-1} of the last frame has been obtained. Let the dilated (D) pattern of T^i_{t-1} be the self window W^i:

W^i = D(T^i_{t-1}).    (25)

Step 2. Create a tentative target pattern P^i by masking the raw image S^i from the vision sensor with the self-windowing mask W^i:

P^i = W^i \cap S^i.    (26)

Step 3. Find the tangent lines l^i_{max} and l^i_{min} of the contour of the pattern P^i passing through the epipole e = [e_u, e_v]^T, and call the tangent points f^i_{max} and f^i_{min} the tentative frontier points. The sets of points L^i_{max} and L^i_{min} on the lines l^i_{max} and l^i_{min} are described as:

c_{max} = \max(c), \quad \text{subject to } L \cap P^i \neq \emptyset    (27)
c_{min} = \min(c), \quad \text{subject to } L \cap P^i \neq \emptyset    (28)
L^i_{max} = \{\, [u, v]^T : |c_{max}(u - e_u) - (v - e_v)| < \epsilon \,\}    (29)
L^i_{min} = \{\, [u, v]^T : |c_{min}(u - e_u) - (v - e_v)| < \epsilon \,\},    (30)

where \epsilon is a sufficiently small value and, in (27) and (28), L denotes the set defined as in (29) for slope c. The tentative frontier points f^i_{max} and f^i_{min} are chosen from the sets of points F^i_{max} and F^i_{min}:

F^i_{max} = L^i_{max} \cap P^i    (31)
F^i_{min} = L^i_{min} \cap P^i.    (32)

Step 4. Exchange the tentative frontier points between the two vision systems and calculate the epipolar lines m^i_{max} and m^i_{min} from these points as:

[m^{jT}_{max}, 1]^T = F [f^{iT}_{max}, 1]^T, \quad (i \neq j),

where the 3x3 matrix F is the fundamental matrix calculated from R_b, T_b, and the known camera parameters.

Step 5. Although it cannot be guaranteed that the calculated epipolar lines m^i_{max} and m^i_{min} pass through the true frontier points, the target object must lie inside the area M^i clipped by these epipolar lines. Using M^i as the mask for extracting the target, T^i_t is given by:

T^i_t = P^i \cap M^i.    (33)

In conventional self windowing, the algorithm stops at step 2, with T^i_t = P^i. In the proposed algorithm, the mask M^i in (33) narrows the possible area of the target, so that occlusion-free recognition is realized.
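The Boolean structure of steps 1, 2, and 5 can be summarized in a few lines. The following host-side NumPy/SciPy sketch is only an illustration of those masking operations (steps 3 and 4, which build the epipolar mask M from the other camera's frontier points, are omitted, and the dilation amount is an arbitrary choice); in the real system the same operations run pixel-parallel on the CPV processing-element array.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def self_window_update(T_prev, S_raw, M_epipolar=None, dilate_iters=2):
    """Steps 1, 2 and 5 of the 3D self windowing algorithm for one camera.
    T_prev     : binary target pattern of the previous frame, T_{t-1}
    S_raw      : binary input image of the current frame, S
    M_epipolar : optional 3D self-window mask M built in steps 3-4 from the
                 other camera's frontier points (None = conventional 2D case)."""
    W = binary_dilation(T_prev, iterations=dilate_iters)   # Eq. (25): W = D(T_{t-1})
    P = W & np.asarray(S_raw, bool)                        # Eq. (26): P = W ∩ S
    if M_epipolar is None:
        return P                              # conventional self windowing: T_t = P
    return P & np.asarray(M_epipolar, bool)   # Eq. (33): T_t = P ∩ M
```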
3 System and Implementation

3.1 Dual CPV systems and DSP system

In the following, we briefly describe our system and demonstrate the high-speed computation of the proposed algorithms.

Figure 3: CPV system. The image is input on a 128 x 128 photodetector (PD) array, digitized by 128 column-parallel 8-bit ADCs, and transferred column-parallel to a 128 x 128 processing-element (PE) array with a controller and a summation circuit; column-parallel data I/O sends the extracted image features to the DSP network. Cycle time: 1 ms.

Table 1: Processing time on the CPV system

processing contents            | steps   | time
2D self windowing              | 7       | 2.3 µs
search frontier points (max)   | 240 x 2 | 158.4 µs
3D self window mask            | 130     | 42.9 µs
0th order moment (m_00)        | 36      | 11.9 µs
1st order moments (m_10, m_01) | 39 x 2  | 25.7 µs
2nd order moments (m_20, m_02) | 120 x 2 | 79.2 µs
2nd order moment (m_11)        | 315     | 104.0 µs
total                          | 1286    | 424.4 µs

Figure 4: Block diagram of the 3D tracking system. Each CPV system (CPV-1, CPV-2) receives the object image from its active vision system (AVS-II-1 or AVS-II-2), exchanges 3D self-window mask parameters with the other CPV system, and sends moment feature values to the parallel DSP, which performs the 3D reconstruction of the object model (shape parameters, position, and orientation) and the servo control of both active vision systems. Cycle time: 1 ms.

Figure 5: Photo of the experimental setup, showing the first CPV and AVS (left camera), the second CPV and AVS (right camera), the target object, and the obstacle.

The system consists of two independent vision systems and a DSP system. Each vision system is a CPV system [3], which has 128 x 128 photodetectors, an all-pixel parallel processing array based on the S3PE architecture, and a dedicated summation circuit for calculating moment values, as shown in Fig. 3. The architecture of the CPV system is optimized for high-speed visual feedback, so it realizes a 1 ms feedback rate while executing various kinds of image processing algorithms. The vision sensor of each CPV system is mounted on an active vision system called AVS-II, which enables high-speed gaze control for tracking the target independently. In the DSP system, parallel DSPs (TI C6721 x 4) are used for PD servo control of both actuators of each AVS-II and for integrating the feature values from the two vision systems. The block diagram of the entire system is shown in Fig. 4.

3.2 Implementation of algorithms

The 3D self windowing algorithm frequently applies identical operations to large sets of pixels. Such operations can be executed extremely fast by pixel-parallel processing in the CPV system. For example, the input image is first binarized at each pixel, and the operations in (25), (26), and (31)-(33) are executed in a few steps. There are also search processes in (27)-(30), but only a limited range of parameters needs to be searched, because the frame rate is sufficiently high. The processing time of each part of the proposed 3D tracking on the CPV system is shown in Table 1. Note that the total processing time is less than half a millisecond, well within the 1 ms cycle time of the tracking. The calculation of the epipolar geometry and the 3D reconstruction in the DSP system, and the exchange of feature values between the CPV systems and the DSP system [3], are also sufficiently fast that, in total, bottleneck-free processing for the goal task is realized in our system.
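To show how the pieces fit into the 1 ms cycle, here is a schematic, host-side outline of one tracking cycle that strings together the sketches given earlier (self_window_update, ellipse_from_moments, integrate_two_views). The camera and DSP interfaces (grab_binary, epipolar_mask, servo_towards) are hypothetical names used only to indicate where each stage of Fig. 4 would run; in the real system the per-pixel work runs on the two CPV systems and the integration and servo control run on the parallel DSP.

```python
def tracking_cycle(cams, dsp, f, Rb, Tb):
    """Schematic outline of one 1 ms tracking cycle (illustration only)."""
    feats = []
    for cam in cams:                                    # done in parallel on CPV-1 / CPV-2
        S = cam.grab_binary()                           # binarized 128x128 input image
        cam.target = self_window_update(cam.target, S, cam.epipolar_mask)  # Secs. 2.5-2.7
        feats.append(ellipse_from_moments(cam.target))  # Sec. 2.2 (summation circuit)
    (u1, v1, t1, a1, b1), (u2, v2, t2, a2, b2) = feats
    tz1, tz2, ty1, ty2 = integrate_two_views(u1, v1, u2, v2, t1, t2, Rb, Tb, f)  # Sec. 2.4
    # Remaining spheroid parameters follow from Sec. 2.3, e.g. T_x = u_e T_z / f,
    # T_y = v_e T_z / f and b_s = b_e T_z / f for each camera.
    dsp.servo_towards(u1, v1, u2, v2)                   # gaze (PD) servo update on AVS-II
    return tz1, tz2, ty1, ty2
```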

4 Experimental Results

The experimental setup is shown in Fig. 5. The distance between the two vision systems is 100 cm, and the distance from the cameras to the object is about 80 cm. The size of the spheroid is 20 cm by 10 cm. Although the target and the obstacle overlap in the image in Fig. 5, the left camera shows them to be separated.

First, the results of the calculation of the ellipse parameters are shown in Fig. 6. The left-hand image in the figure is the binarized input image, and the right-hand image is the ellipse reconstructed from the parameters calculated by the proposed algorithm; the shape appears to be well estimated.

Figure 6: Results of reconstruction of the ellipse. Left: binarized input image. Right: ellipse reconstructed from the moment feature values extracted by the CPV system (0th order moment m_00 = 1222; center (57.2, 54.7); tilt θ = -0.669 rad = -38.3 deg; fitted axis lengths of 14.1 and 27.6 pixels).

Next, the results of the 3D self windowing are shown in Fig. 7. Shown on the left are the trajectories of the target. Without the 3D self-window mask, the trajectory is biased to the left and downwards by the disturbing obstacle. The images on the right are the results of 3D masking, in which only the target patterns are extracted.

Figure 7: Results of 3D self windowing. Left: target trajectories from the left image (in pixels) without the obstacle, with the obstacle, and with the 3D self-window mask. Right: left and right camera images with the obstacle masked out.

Last, an example of the result of the 3D reconstruction is shown in Table 2 and Fig. 8. Some of the parameters are calculated close to the true values, but some (for example θ_y) are not sufficiently accurate even for a task such as rough grasping of the target by the hand-arm. This is probably caused by inaccurate calibration of the initial rotations of the camera coordinate frames. A demonstration of the target tracking with two cameras can be viewed in a video clip on the CD-ROM of the proceedings.

Table 2: Results of 3D reconstruction of the spheroid

parameter            | calculated | truth  | error rate
distance to CPV-1    | 71 cm      | 65 cm  | 9.2%
distance to CPV-2    | 60 cm      | 54 cm  | 11.1%
rotation θ_z         | 42 deg     | 40 deg | 5.0%
rotation θ_y         | 28 deg     | 40 deg | 30.0%
length of axis a_s   | 21 cm      | 20 cm  | 5.0%
length of radius b_s | 10 cm      | 10 cm  | 0.0%

Figure 8: Result of the 3D reconstruction shown in a 3D graph: the reconstructed position and direction of the spheroid relative to the left and right cameras (axes in meters).

5 Conclusion

A 3D tracking system consisting of two CPV systems and a DSP system has been developed, and a moment-feature-based 3D reconstruction algorithm and a new 3D self windowing method have been introduced. The processing time of the 3D tracking is less than 1 ms, which is the speed required for real-time, dynamic control of the robot. The accuracy of the present system will be improved by more accurate calibration of the camera coordinate frames.

References

[1] K. Hashimoto, editor. Visual Servoing. World Scientific, 1993.

[2] S. Hutchinson, G. D. Hager, and P. I. Corke. A Tutorial on Visual Servo Control. IEEE Trans. on Robotics and Automation, Vol. 12, No. 5, pp. 651-670, 1996.

[3] Y. Nakabo, M. Ishikawa, H. Toyoda, and S. Mizuno. 1ms Column Parallel Vision System and its Application of High Speed Tracking. In Proc. IEEE Int. Conf. on Robotics and Automation, pp. 650-655, 2000.

[4] A. Namiki, Y. Nakabo, I. Ishii, and M. Ishikawa. 1-ms Sensory-Motor Fusion System. IEEE/ASME Trans. on Mechatronics, Vol. 5, No. 3, pp. 244-252, 2000.

[5] I. Ishii, Y. Nakabo, and M. Ishikawa. Tracking Algorithm for 1ms Visual Feedback System Using Massively Parallel Processing Vision. In Proc. IEEE Int. Conf. on Robotics and Automation, pp. 2309-2314, 1996.