Stereo Vision Image Processing Strategy for Moving Object Detecting


Stereo Vision Image Processing Strategy for Moving Object Detecting

SHIUH-JER HUANG, FU-REN YING
Department of Mechanical Engineering
National Taiwan University of Science and Technology
No. 43, Keelung Road, Section 4, Taipei 106, TAIWAN
sjhuang@mail.ntust.edu.tw http://homepage.ntust.edu.tw/sjhuang/

Abstract: - Here, a TMS320C6416 DSK board is integrated with two CMOS color image sensors to construct a new stereo vision platform, instead of the current PC-based or multi-CPU structures, for detecting moving objects. A novel moving object detecting strategy is proposed by combining image temporal differencing with sum of absolute difference (SAD) matching. The system can detect and track any moving object, without restrictions on object color or shape. The subtraction image processing technique is first employed to detect the moving object in a single image. Then, edge and SAD stereo matching schemes are used to establish the coordinate correspondence between the left and right images. Finally, the depth of the moving object is estimated from the stereo geometry for the 3D position calculation. Experimental results are used to evaluate the dynamic performance of the system.

Key-Words: - Stereo vision, CMOS image sensor, moving object detecting strategy.

1 Introduction
Machine vision has been widely applied in many 2D fields, following the rapid progress of image processing techniques [1]. The basic reason human eyes can distinguish near and far objects is that the brain converts the view difference between the two eyes' images into object depth information. Hence, two or more video cameras are used to imitate the two-eye function when constructing stereo vision systems [2]. A stereo vision system acquires two images from two corresponding cameras simultaneously. Software routines implementing specific mathematical operations are loaded into the image processing CPU to carry out the view-difference calculation.
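This disparity principle can be made concrete with simple pinhole geometry. The sketch below is an illustrative Python rendering of the depth relation D = s f / (l + r) and the coordinate scaling used later in Section 3(E), eqs. (6) and (7); the function names are hypothetical, not part of the paper's implementation.

```python
def depth_from_disparity(s, f, l, r):
    """Object depth for two parallel cameras: D = s * f / (l + r),
    where s is the camera baseline, f the focal length, and l, r the
    horizontal offsets of the object image from the left and right
    optical axes (so l + r is the disparity)."""
    return s * f / (l + r)

def to_world(x, y, D, f):
    """Scale image-plane coordinates (x, y) by D / f to recover world
    coordinates: X = (D/f) x, Y = (D/f) y, Z = D."""
    return (D / f) * x, (D / f) * y, D
```

For example, a baseline of 100 units, focal length 8, and offsets l = r = 2 give a depth of 200 units.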
Then, the object depth information in an image can be derived from sum of square difference (SSD) calculations on the two images. How to extract the moving object from two 2D images thus becomes an important stereo image processing problem. Bae et al. [3] developed a stereo object tracking system using two video cameras; the depth distribution of both images is calculated with a minimum mean absolute difference scheme to establish a disparity motion vector. Yoneyama et al. [4] used the motion vector information of an MPEG video stream to detect moving object locations. A background subtraction scheme was used to identify a moving object [5] by comparing the current captured image with a pre-recorded background image; the area with a subtraction residue corresponds to the incoming or moving object. However, basic background subtraction is strictly limited to fixed background environments if accurate identification is to be achieved. Hence, mathematical modeling techniques were employed to construct a background model and a mapping of environmental change [6]. Since background modeling needs complicated mathematical derivation and heavy computation, a continuous image subtraction method was proposed to simplify this modeling process: each newly extracted image is compared with the last captured image, without any input background information, and the difference zone between the two images is defined as the moving object location [7]. However, how to develop an ingenious stereo vision system with a simple structure and low cost for stand-alone applications is still a challenging task. Here, a TMS320C6416 DSK board is integrated with two CMOS color image sensors to construct a stereo vision platform for acquiring and locating moving objects. A novel moving object detecting technique is proposed by combining image temporal differencing with sum of absolute difference (SAD) matching.
Experimental results are used to evaluate the feasibility and reliability of this stereo vision system.
ISSN: 1790-5117 ISBN: 978-960-474-135-9

2 Stereo Vision and Image Extracting System Structure
The overall DSP-based stereo vision system structure is shown in Fig. 1. In order to integrate the DSK development board and the CMOS image sensors into a stereo vision system, the accompanying communication interface, an image extracting daughter card, image pre-processing and appropriate time-sequence control software need to be designed and built. Since stereo vision calculates 3D coordinates from two CMOS images, synchronization of the extracted images is very important; hence, an appropriate checking process should be designed into the control software or the daughter circuit. To guarantee synchronous transmission of the raw data from the two CMOS image sensors, the data contents are inspected at the end of the first data-receiving process. The ineffective DARK pixels in the data format have a fixed location and can be specified as the checking point. If the two received raw data formats do not match, the OE enable control signal is restarted and checked again. When the DSK board has received two sets of synchronous image raw data, the image pre-processing operations of color interpolation, brightness compensation and gray-scale calculation are employed to recover the raw data into useful image information. The overall image extraction process flow chart is shown in Fig. 2. The commercial CMOS color image sensor has a color filter array (CFA) installed ahead of the image plane to obtain the three RGB color signals: each pixel needs only one sensor element and provides one of the RGB colors. Hence, the two un-sampled colors at each pixel must be calculated by interpolation; here, bilinear interpolation [9] is employed. The YCbCr color space is widely applied for digital image color representation. The human eye is most sensitive to the brightness Y, which is described as the image gray signal, while the chrominance signals are represented as Cb and Cr.
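The pre-processing chain described above (bilinear CFA interpolation followed by a gray-scale reduction) can be sketched as follows. This is a hedged, pure-Python illustration rather than the paper's DSP code; interp_missing is a deliberately simplified 4-neighbour version of bilinear demosaicing, and both function names are invented for this sketch.

```python
def interp_missing(raw, y, x):
    """Bilinear CFA interpolation, simplified: estimate the colour sample
    missing at interior pixel (y, x) as the average of the four nearest
    neighbours in the Bayer mosaic that carry that colour."""
    n = [raw[y - 1][x], raw[y + 1][x], raw[y][x - 1], raw[y][x + 1]]
    return sum(n) / len(n)

def to_gray(r, g, b):
    """Luminance component Y of the YCbCr transform (the paper's eq. (1)):
    Y = 0.299 R + 0.587 G + 0.114 B."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```
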
The image pre-processing transforms the 8-bit raw data of each pixel into 24-bit RGB color space data. In order to reduce unnecessary calculation on the DSP processor, the RGB image data is then converted back into an 8-bit gray signal with the following formulas:

    Y  =  0.299 R + 0.587 G + 0.114 B
    Cb = -0.167 R - 0.332 G + 0.500 B        (1)
    Cr =  0.500 R - 0.419 G - 0.081 B

Fig. 1 Stereo vision system hardware structure.
Fig. 2 Flow chart of image extraction process.

3 Object Detecting and Recognizing Strategy
How to distinguish the object and the background in an image is an important technique for practical machine vision applications. The background subtraction scheme [5] uses a pre-defined environmental background model stored in a database for comparison with the current captured image; its robustness is limited to fixed-background conditions. Hence, mathematical modeling techniques were employed to model the background environmental change and mapping [6]. However, the background model transformation and its mathematical operations are very complicated. These methods all need a predefined environmental background and the object shape or color [7], so their autonomous practical application is limited. Hence, how to develop a robust moving object detecting scheme without environment and object feature limitations is a desirable research target. Here, a new recognition technique is proposed, combining the image temporal differencing strategy with the sum of absolute difference operation, to detect and track the moving object inside an image with a moving block searching scheme. It effectively reduces the computing time and achieves better system performance for movable platform applications.
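The proposed combination, temporal differencing to find the object and SAD matching to follow it, implies a small mode-switching control loop, which the paper realises as a software monitoring program. The skeleton below is a hypothetical sketch (the loop structure and names are invented, and the detector and tracker are supplied as callables such as those defined by eqs. (2) to (5) later in this section):

```python
def monitor(frames, detect, track):
    """Two-mode control loop: run the temporal-differencing detector
    until it yields a reference pattern, then switch to SAD tracking,
    falling back to detection whenever the tracker loses the object
    (signalled here by returning None)."""
    mode, ref, positions = "detect", None, []
    for frame in frames:
        if mode == "detect":
            ref = detect(frame)           # returns a pattern, or None
            if ref is not None:
                mode = "track"
        else:
            pos = track(frame, ref)       # returns a position, or None
            if pos is None:
                mode, ref = "detect", None
            else:
                positions.append(pos)
    return positions
```
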

(A) Moving object capture and targeting
If the background environment changes, due to shadow and brightness variation or motion of the platform itself, the background subtraction algorithm can hardly compensate for the change and still detect the moving object accurately. Hence, the temporal differencing of two following images was proposed to extend the applicability of the background subtraction scheme. When the last image is subtracted from the current image, the area containing the moving object shows a larger gray-scale difference. A threshold value can therefore be selected to distinguish the moving object from the background shift, just as in an image binarization operation. The area whose gray-scale difference is larger than the threshold is marked as a possible moving object location, as in Fig. 3.

Fig. 3 Subtraction of two following images for moving object detection.

It can be observed that the subtraction of two continuous images can only find a possible moving object zone. If the precise object location is to be identified, additional image processing is needed for the marked area; for example, object shape characteristics or a specific color can be further employed. Here, the two-image subtraction operation is therefore extended to a three-image subtraction for accurately marking the moving object location. Subtracting three following images yields two marked zones of possible moving objects, and the precise moving object location can then be identified with the intersection operation described in eqs. (2) to (4). The geometric relationship is depicted in Fig. 4.

Fig. 4 Three following images subtraction and intersection image operation flow.
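Per pixel, the three-image subtraction and intersection described above (eqs. (2) to (4)) can be sketched as a minimal pure-Python illustration; this is not the DSP implementation, and the function names are illustrative.

```python
def binary_diff(a, b, threshold):
    """Eqs. (2)-(3): binarised absolute difference of two gray images,
    1 where the pixel difference exceeds the threshold, else 0."""
    return [[1 if abs(p - q) > threshold else 0 for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def object_mask(f0, f1, f2, threshold):
    """Eq. (4): the intersection of the two difference masks marks the
    moving object position in the middle frame f1."""
    d01 = binary_diff(f0, f1, threshold)
    d12 = binary_diff(f1, f2, threshold)
    return [[p & q for p, q in zip(r1, r2)] for r1, r2 in zip(d01, d12)]
```
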
    Dif_{t-2,t-1}(x, y) = Binary( | frame_{t-1}(x, y) - frame_{t-2}(x, y) | )        (2)

    Dif_{t-1,t}(x, y) = Binary( | frame_t(x, y) - frame_{t-1}(x, y) | )        (3)

    object_{t-1} = Dif_{t-2,t-1} ∩ Dif_{t-1,t}        (4)

    Binary(a) = 1 if a > threshold, 0 otherwise

for x = 0 ... width-1 and y = 0 ... height-1, where frame_t is the image at time t, frame_t(x, y) is the pixel gray scale at position (x, y) in the time-t image, object_t is the possible moving object zone at time t, and Dif_{t,t+1} is the binarized absolute difference of frame_t and frame_{t+1}.

(B) Moving object tracking
After the possible moving object zone is found, the image processing strategy switches from the continuous subtraction operation to the sum of absolute difference (SAD) scheme. The SAD method uses a reference moving object pattern and searches a specified area of the next acquired image for the minimum SAD value, in order to locate the translated position of the moving object. The precise moving object extracted by the last continuous subtraction operation is selected as the reference moving object pattern for the SAD operation. Its mathematical equation is

    SAD(x, y) = sum_{i=0}^{refH-1} sum_{j=0}^{refW-1} | frame(x+i, y+j) - ref(i, j) |        (5)

where SAD(x, y) is the SAD value at point (x, y) inside the specified area, frame represents the current image, ref is the moving object reference pattern, and refH and refW are the height and width of the reference pattern, respectively. The position with the minimum SAD value is specified as the new location of the moving object. Then the reference moving object pattern, refH and refW, and the specified searching area are updated for the next step. To combine both moving object detecting algorithms effectively and achieve accurate moving object tracking, a software monitoring control program is designed for the system.

(C) CMOS stereo vision structure
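The SAD search of eq. (5) can be sketched as an exhaustive scan over candidate anchor positions. This is a hedged illustration rather than the paper's optimised DSP routine; the convention here is that the first index runs over rows.

```python
def sad(frame, ref, x, y):
    """Eq. (5): sum of absolute differences between the reference
    pattern and the same-sized window of `frame` anchored at (x, y)."""
    return sum(abs(frame[x + i][y + j] - ref[i][j])
               for i in range(len(ref)) for j in range(len(ref[0])))

def best_match(frame, ref, search):
    """Scan the candidate anchors in `search` and return the one with
    the minimum SAD value: the new location of the moving object."""
    return min(search, key=lambda p: sad(frame, ref, p[0], p[1]))
```
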

Since the targeted 2D moving object block is found from the single image of one CMOS sensor only, another image extracted by the second CMOS sensor, located at a different position, is needed for comparison in order to derive the depth information of this moving object. The depth information can be calculated from the optical geometric relationship of the left and right image sensors, as shown in Fig. 5.

Fig. 5 Relationship between stereo vision geometry and depth.

(D) Two-image stereo matching
Stereo matching locates a specified target point of an object in a pair of left-right images with a fixed positional dislocation; this dislocation value can be used for the 3D coordinate calculation. Passive stereo matching methods work on characteristic points of the left-right images, for example: direct matching, block gray-scale comparison, edges, projections, etc. Here, the edge matching and direct matching schemes are described. The Sobel [9] gradient operation is widely employed to find the gray-scale variation of a picture for object edge detection. After applying Sobel edge detection to both the left and right images, two sets of binary edge images are obtained, as in Fig. 6(a). A horizontal epipolar line is drawn across both binary edge images at the same height; the edge lines in both images are thereby cut into a number of segments, as in Fig. 6(b). Each segment length in both images is calculated and stored sequentially for comparison. It can be observed that L3 ~ L6 correspond to R2 ~ R5. The pixel-number difference between the corresponding left-side and right-side segments gives the horizontal image shift between the left and right images.

When the SAD scheme has targeted a moving object based on the right-hand CMOS image, the reference moving object pattern is updated and used to search for the matching object in the left-hand image. Since the two CMOS image sensors of the proposed stereo vision system are horizontally installed with parallel optical axes, the matched pattern is expected to show only a horizontal pixel shift. Hence, the searching area is limited to about the same height as the reference moving object pattern, plus two extra pixel rows, as in Fig. 7.

Fig. 6 (a) Binary edge images and (b) segments cut by the epipolar line.
Fig. 7 Searching area definition using SAD direct stereo matching.

(E) Object depth estimation and 3D coordinate calculation
After the moving object is targeted and the stereo matching operation is completed, the object depth can be estimated for the 3D coordinate calculation. The optical geometry of this stereo vision system is shown in Fig. 5. The object depth D can be derived as

    D = s f / (l + r)        (6)

where f is the CMOS focal length, s is the distance between the two CMOS sensors, and l and r are the horizontal offsets with respect to the left and right CMOS optical axes, respectively. Then the 3D coordinates of this moving

object can be calculated along the defined coordinate directions:

    X = (D / f) x,   Y = (D / f) y,   Z = D        (7)

4 System Performance Evaluation and Experimental Results
A DSK development board based stereo vision system is constructed for real-time moving object detection. The overall system flow chart is shown in Fig. 8. In order to evaluate the performance and implementation limitations of this stereo vision system, the following experiments were planned and investigated. The executing times of raw data extraction, continuous image subtraction, SAD moving object tracking, SAD stereo matching and edge stereo matching are about 24.475, 19.343, 1.756, 4.71 and 1.412 ms, respectively. Here, 24.5 ms is chosen as the synchronization period to satisfy the 24.475 ms full-image raw data extraction requirement. The edge stereo matching and 3D coordinate calculation together take less than 5 ms.

Fig. 8 DSP based stereo vision overall system flow chart.

Here, the continuous image subtraction scheme is used to detect the moving object and target its location in the image. When a rolling ball moves into the stereo vision working space, the system starts to detect and target the moving object. Fig. 9 shows three continuous images at time steps t-1, t and t+1 and the binary pictures of the two relative subtraction operations; the white parts represent the areas of image change. Taking the intersection of these two binary subtraction images yields the targeted reference moving object pattern, outlined in white. After the moving object is targeted, the software control program switches to the SAD moving object tracking algorithm, using the reference moving object pattern established by the image subtraction scheme. The SAD pattern matching operates within the specified SAD searching range, and the reference pattern is updated continuously to adapt to changes in the object's motion. Fig.
9 Three following images subtraction and moving object targeted in binary pictures.

After the image edges of the SAD reference moving object pattern are targeted, an epipolar line is marked on the edge patterns of both left and right images to calculate the characteristic intervals for the stereo matching comparison, as in Fig. 10. Then a 3D working-space motion track with significant vertical direction change is planned for a moving ball to execute the moving object tracking experiment. When the ball moves into the system's visual range and has been targeted, the ball is moved by the operator's hand along the specified track. The experimental results are shown in Fig. 11. It can be observed that this stereo vision system can effectively track the ball motion in the 3D working space.

5 Conclusion
A novel low-cost stereo vision system is designed for 3D moving object detecting and tracking. The related timing control, data communication interface and image pre-processing

software control programs are designed and integrated into this DSP hardware structure for stand-alone application purposes. A new moving object detecting technique is proposed, combining temporal subtraction differencing with sum of absolute differences (SAD) matching to reduce the image processing computation effort. This system can be employed to detect and track any moving object without restrictions on object color and shape. It can be applied on mobile platforms for stand-alone applications, for example mobile robots or human-interacting toys.

Fig. 10 Drawing the epipolar line in the SAD-targeted moving object area.

[Fig. 11 plot: X, Y and Z coordinates (cm) of the trace labelled "Moving Object".]

Acknowledgement
This research was supported by the National Science Council under contract NSC 95-2212-E-011-156-MY3.

References:
[1] Chern-Sheng Lin and Li-Wen Lue, An image system for fast positioning and accuracy inspection of ball grid array boards, Microelectronics Reliability, Vol. 41, Issue 1, pp. 119-128, January 2001.
[2] Masatoshi Okutomi and Takeo Kanade, A Multiple-Baseline Stereo, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 15, No. 4, pp. 353-363, April 1993.
[3] K. H. Bae, J. S. Koo and E. S. Kim, A New Stereo Object Tracking System Using Disparity Motion Vector, Optics Communications, Vol. 221, pp. 23-35, 2003.
[4] Akio Yoneyama, Yasuyuki Nakajima, Hiromasa Yanagihara and Masaru Sugano, Moving Object Detection from MPEG Video Stream, IEICE Transactions, Vol. J81-D-II, No. 8, pp. 1776-1786, Aug. 1998.
[5] Jae-Soo Lee, Choon-Weon Seo and Eun-Soo Kim, Implementation of opto-digital stereo object tracking system, Optics Communications, Vol. 200, pp. 73-85, 2001.
[6] Y. Ren, C. S. Chua and Y. K. Ho, Statistical Background Modeling for Non-stationary Camera, Pattern Recognition Letters, Vol. 24, pp. 183-196, 2003.
[7] Jae-Soo Lee, Choon-Weon Seo and Eun-Soo Kim, Implementation of opto-digital stereo object tracking system, Optics Communications, Vol. 200, pp. 73-85, 2001.
[8] Hsin-Teng Sheu, Hung-Yi Chen and Wu-Chih Hu, Consistent Symmetric Axis Method for Robust Detection of Ellipses, IEE Proceedings - Vision, Image and Signal Processing, Vol. 144, No. 6, pp. 332-338, December 1997.
[9] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Addison-Wesley, 1992.

Fig. 11 A ball moving within a track with significant vertical height change.