A GENETIC ALGORITHM FOR MOTION DETECTION

Jarosław Mamica, Tomasz Walkowiak
Institute of Engineering Cybernetics, Wrocław University of Technology
ul. Janiszewskiego 11/17, 50-372 Wrocław, POLAND,
Phone: +48-71-303996, +48-71-30681, Fax: +48-71-31677,
E-mail: jmamica@hotmail.com, twalkow@ict.pwr.wroc.pl

Abstract: Motion is a feature which may be used for the identification, description and differentiation of objects, for the purpose of automatic understanding of particular frames of an image. A critical part of digital image sequence analysis is the procedure of object comparison. We propose to use a genetic algorithm for that task. The genetic algorithm is used to determine the optimal transform for every pair of compared objects.

Keywords: genetic algorithm, traffic monitoring, machine vision, motion detection

1 Introduction

In machine vision, one of the most essential features allowing the interpretation of image semantics is object motion [6]. In time-spatial images (video), motion is interpreted as the association of a tested image with successive frames of an input image (the displacement of the two-dimensional object image, separated in the segmentation process, as a function of time in the video image). The main application area of video detectors is statistical data accumulation, where an error margin is admissible and continuous operation of the detector is not necessary. The main advantages of video detectors are the possibility of integration with monitoring and traffic recording systems (with the option of motion picture analysis after recording), portability, and a software realization of the module responsible for detection. Detection means the presence of a vehicle in a given area. Localization is directly connected with the definition of detection: it is the ability to determine the coordinates of a detected object in an established frame of reference.

2 System overview

The system presented here was designed according to the following principles: the aim of the system is traffic monitoring;
images are received from a video camera mounted in a stable way over the analyzed fragment of a route or crossroads; the system localizes vehicles on the basis of motion picture analysis and generates statistics of the number of cars moving in the camera view; the system is designed to work in real time on a standard PC-class computer.

Movement is one of the features that can be used to identify, describe and distinguish objects for the automatic understanding of separated video frames [6][7], and it is the essential feature in these algorithms. In practice, only the analysis of video sequences allows the detection and analysis of dynamic vision. A sequence of digital still images is a time-spatial image; we understand it as a sequence of two-dimensional discrete images. We have based the vehicle motion detection algorithm on a so-called differential method [7][8]. Since motion is associated with changes in the observation, motion analysis methods should analyze the difference between two consecutive images in a sequence. Motion detection in this situation can be done by observing the brightness level as a function of the time that passes during the registration of consecutive images. This corresponds to the following mathematical description [8]:

f_D(x, y, t) = f(x, y, t2) - f(x, y, t1),    (1)

where: t1, t2 - time moments; f_D - difference of images; x, y - image coordinates.

The difference of two images obtained this way can be used as an approximation of the time derivative of the brightness level, df(x, y, t)/dt, at the midpoint of the time interval t2 - t1. Of course, in a real situation not all brightness level changes result from the existence of motion. One possible implementation of motion detection consists of marking out the binary areas that represent the vehicles moving in a sequence, the so-called displacement masks.
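As an illustration, the differential method of Eq. (1) and the thresholding that produces a displacement mask can be sketched as follows. The frame representation (flat vectors of RGB triplets) and the function names are assumptions of this sketch, not the authors' implementation:

```cpp
#include <cmath>
#include <vector>

struct Rgb { int r, g, b; };

// Eq. (1): per-pixel difference of two frames. The three RGB brightness
// differences are collapsed into a single value, the 3-D vector length.
double pixelDifference(const Rgb& a, const Rgb& b) {
    double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return std::sqrt(dr * dr + dg * dg + db * db);
}

// Displacement mask: the thresholded difference image
// (1 = moving-object candidate, 0 = background).
std::vector<int> displacementMask(const std::vector<Rgb>& frame1,
                                  const std::vector<Rgb>& frame2,
                                  double threshold) {
    std::vector<int> mask(frame1.size());
    for (std::size_t i = 0; i < frame1.size(); ++i)
        mask[i] = pixelDifference(frame1[i], frame2[i]) > threshold ? 1 : 0;
    return mask;
}
```

Note that the maximum value of `pixelDifference` for 8-bit channels is 255·√3, which matches the pixel value range discussed for the thresholding step below.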
Methods of marking out the displacement masks can be divided into two categories: the first consists in marking out the changes between consecutive frames of a sequence, and the second in marking out the differences between the currently
analyzed frame and a background image. Usually the background is static and unchangeable; it is a part of the observed scene, which requires viewing the scene in advance. In our system we have used a different method, in which the differential images are computed relative to a dynamic background image that is updated over time. The background is calculated as an average with history, taking into account the character of each area (belonging to a moving object or to the background). This operation is carried out iteratively during the analysis of a sequence of frames. The displacement mask of a frame is then taken as the absolute difference between the current frame and the background image available at that moment. In the analysis of images with displacement masks we have used genetic algorithms, as a tool that brings order to a chaotic (random) collection of objects. As a result we obtain a labeling of the objects on each frame and can therefore calculate motion trajectories. Objects that possess such a trajectory are acknowledged as objects in motion, and they generate a detection signal. The system logic decides whether to classify an object as a disturbance or to leave it for further analysis.

3 Detection Algorithm for Vision Based Monitoring

The detection algorithm [3] can be divided into six steps, described below. It is worth mentioning that the algorithm is performed for each consecutive frame of the recorded video. The main data structure processed by the algorithm is a list of objects, which records all objects present in the video. In the first phase, objects are all coherent groups of points separated from the differential image [7] which match some basic criteria (e.g. minimum area). During subsequent interpretation, objects are marked as vehicles or deleted from the list.

Step 1. Removing the previous list of objects

The list contains objects from a number of levels; a level is a group of objects registered on a single frame.
In this first step of the algorithm, the oldest level of objects is removed.

Step 2. Differential image calculation

A new image frame is taken from the signal source, and the differential image is calculated as the difference between the current frame and the current background. For the first frame read, the calculation is not performed, since at that moment the background bitmap is not yet available. Since we are analyzing color pictures, we obtain three brightness difference values, one for each base color (RGB). This three-dimensional vector is converted into a 1-D value: the vector length.

Step 3. Differential image pre-processing [5]

At this stage the differential image is pre-processed. The main aim is to improve the image properties for generating object data. First, the differential image is filtered. We have tested different kinds of filters, e.g. linear, nonlinear, spatial and frequency filters. The selection of the best filter is a trade-off between the effectiveness of the filter (i.e. resulting in correct object detection) and computational complexity (resulting in time and memory usage). The filtered differential image is then thresholded: the differential image, with pixel values ranging from 0 to 441 (255·√3), is turned into an image with pixel values 0 or 1 (the intuitive interpretation is that 1 represents an object and 0 the background). All pixels with values below the threshold are set to 0, all those above to 1. This is a simple way to remove the effects of any non-linearity of the contrast in the frame. However, the threshold value must be defined; it can be calculated from a histogram of the given differential image. Then filtration is performed again, this time using logic or median filters.

Step 4. Segmentation and dissimilarity coefficient extraction

The key problem of this phase of the detection algorithm is a method that separates the objects contained in the image and labels them.
Segmentation [5] is a technique that extracts image areas matching a criterion of homogeneity. In the case of a binary image, the criterion is the coherence of the area of pixels qualified as active (i.e. with value 1), describing an object. While the objects are being separated, an indexation process is carried out at the same time: each pixel of an object receives a unique index value, unique within a given frame. Every object separated in the segmentation process that matches the minimum-number-of-active-points criterion is added to the list. The parameters stored for each object are: the coordinates of the object's position in the image, the number of its active points, the color bitmap of the object, and the index value. Moreover, for each object added to the list, dissimilarity coefficients are calculated. These coefficients represent the dissimilarity of the given object to each object on the previous frame; a value close to zero means that the objects are almost identical. This is a very important part of the algorithm, enabling the subsequent labeling process, and it is described in detail in the next chapter.
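The segmentation and indexation described above can be sketched as follows. A 4-connectivity criterion and the iterative flood fill are assumptions of this sketch (the paper does not state which connectivity is used); indices start at 2 so they never collide with the binary values 0 and 1:

```cpp
#include <queue>
#include <utility>
#include <vector>

// Label each coherent group of active pixels (value 1) with a unique
// index (2, 3, 4, ...). Returns the number of objects found.
int labelObjects(std::vector<std::vector<int>>& img) {
    int h = static_cast<int>(img.size());
    int w = h > 0 ? static_cast<int>(img[0].size()) : 0;
    int nextIndex = 2;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (img[y][x] != 1) continue;
            // Iterative flood fill: a queue avoids deep recursion, in the
            // spirit of the iterative algorithm preferred in Section 5.
            std::queue<std::pair<int, int>> q;
            img[y][x] = nextIndex;
            q.push({y, x});
            while (!q.empty()) {
                auto [cy, cx] = q.front();
                q.pop();
                const int dy[4] = {1, -1, 0, 0}, dx[4] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {
                    int ny = cy + dy[k], nx = cx + dx[k];
                    if (ny >= 0 && ny < h && nx >= 0 && nx < w &&
                        img[ny][nx] == 1) {
                        img[ny][nx] = nextIndex;
                        q.push({ny, nx});
                    }
                }
            }
            ++nextIndex;
        }
    return nextIndex - 2;
}
```

In a full implementation the per-object parameters listed above (position, active-point count, color bitmap) would be collected during the same pass.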
Fig. 1. An example of object matching (current list of objects vs. previous list of objects)

Step 5. Object matching

The aim of this phase is to find associations between objects from consecutive levels (objects from two consecutive frames of the motion picture) and to label each object either as a new, unidentified one, or as matched to some object from the previous level. This is done by comparing the dissimilarity coefficients of the listed objects. First, coefficients with values below a given threshold are rejected. Next, for each object on the current frame list, the most likely object from the previous list is selected, i.e. the one whose coefficient has the highest value. The selected object from the previous list is removed from the coefficient list (this prevents the association of two current objects with one previous object). The remaining coefficients are then used to choose a winning pair again, and the procedure is repeated until all coefficients have been chosen or rejected. The labeling results are extended with the displacement of each object between the two frames. The displacement values are written into a database for further statistical analysis (relative position of the object, size class of the object, etc.).

Step 6. Background update

The last stage is the background bitmap update [7]. The bitmap is then used in Step 2. For the first frame, the recorded bitmap is directly copied to the background bitmap. For all other frames, the background is updated recurrently.
The idea of background generation is based on integrating the information introduced by every new frame into the background using a Kalman filter (this is done for each of the three colors separately):

B(t+1) = B(t) + [a1·(1 - M(t)) + a2·M(t)]·D(t),    (2)

where: B(t) - background model at time t; D(t) - difference between the current frame of the motion picture and the background model; M(t) - hypothetical binary mask of objects; a1 - coefficient of background updating in areas not occupied by objects; a2 - coefficient of background updating in areas occupied by objects.

4 Dissimilarity coefficient extraction by genetic algorithm

Motion detection, which here means vehicle detection at crossroads, is based on the association of matching objects in consecutive image frames. The critical procedure of this stage, and of the whole analysis, is the procedure of object comparison. The principle that matching objects in consecutive frames are the objects displaced relative to each other by the shortest distance is not sufficient. Moreover, because of image perspective, relative distances measured in pixels are far from correct and do not contain a sufficient quantity of information to be used in the object association process. That is why a direct comparison of objects to determine their similarity is necessary. The dissimilarity coefficients are calculated as the normalized Euclidean distance between objects treated as multidimensional vectors:

dissimilarity = sqrt( Σ_i [ (r1(i) - r2(i))² + (g1(i) - g2(i))² + (b1(i) - b2(i))² ] ) / N,    (3)

where i spans all object points (from 1 to N, the number of points in the analyzed object), r1, g1, b1 represent the three color components of each pixel of the first image, and r2, g2, b2 those of the second image. However, the objects usually have different sizes. Therefore the smaller object is processed with a scaling and displacement transform and compared with the bigger object. The coordinates of every point in the transformed object image are set according to a linear transform:
x' = (x·kx) + dx    (4)
y' = (y·ky) + dy,

where x, y are the old coordinates, x', y' the new coordinates, dx, dy represent the displacement, and kx, ky the scaling factors. Therefore we have to find these four transform coefficients for every pair of pictures. We have found that a genetic algorithm allows an effective and trustworthy solution of this problem. The basic target of the algorithm is to determine the optimal transform for each pair of compared objects; in consequence, it makes the determination of the dissimilarity coefficients in the object list possible. There is no need to determine the perspective in this algorithm: the objects are fitted to each other while optimizing the target function, i.e. minimizing the dissimilarity (3). We have followed the standard genetic algorithm, described in many textbooks (e.g. [1]). The result of the genetic algorithm, for each analyzed object, is a set of sequences of transform coefficients; the number of sequences depends on the number of objects it is compared with. In the perfect situation, the winning object pair should have a much lower dissimilarity coefficient (highest similarity) than the others, decreasing as the algorithm's parameter values approach their optimum.

Our genetic algorithm works on a 20-element chromosome; the alleles are binary values 0 or 1. Four values are encoded in the chromosome: the encoded movements dx and dy, ranging from 0 to 15, and the encoded scale coefficients kx and ky, ranging from 0 to 63. The encoded transform coefficients are related to the non-encoded coefficients by a linear projection:

kx = kx'·c1 + c2    dx = dx' + c3    (5)
ky = ky'·c1 + c2    dy = dy' + c3,

where kx', ky', dx', dy' are the encoded values and c1, c2, c3 are calibration constants.

Fig. 2. The chromosome (fields: dx, dy, kx, ky)

The genetic algorithm parameters set in the performed experiments are as follows: probability of the crossover operation: 0.6; probability of the mutation operation: 0.03; simple one-point crossover; roulette-wheel selection.
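Under the encoding above, decoding a chromosome can be sketched as follows. The bit-field layout (dx, dy: 4 bits each for the range 0..15; kx, ky: 6 bits each for the range 0..63, giving 20 alleles in total) follows Fig. 2; the field order and the most-significant-bit-first convention are assumptions of this sketch, and the calibration constants c1, c2, c3 of Eq. (5) are free parameters:

```cpp
#include <vector>

// Read `bits` consecutive binary alleles starting at `pos`,
// most significant bit first.
int decodeField(const std::vector<int>& chromosome, int pos, int bits) {
    int value = 0;
    for (int i = 0; i < bits; ++i)
        value = value * 2 + chromosome[pos + i];
    return value;
}

struct Transform { double kx, ky, dx, dy; };

// Eq. (5): map the encoded coefficients to the actual
// scaling and displacement coefficients of Eq. (4).
Transform decodeChromosome(const std::vector<int>& c,
                           double c1, double c2, double c3) {
    int dxEnc = decodeField(c, 0, 4);   // encoded dx, 0..15
    int dyEnc = decodeField(c, 4, 4);   // encoded dy, 0..15
    int kxEnc = decodeField(c, 8, 6);   // encoded kx, 0..63
    int kyEnc = decodeField(c, 14, 6);  // encoded ky, 0..63
    return {kxEnc * c1 + c2, kyEnc * c1 + c2, dxEnc + c3, dyEnc + c3};
}
```

The constants c1, c2 map the 0..63 integer scale range onto useful fractional scaling factors, and c3 shifts the 0..15 displacement range so that negative displacements can be represented.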
For every specimen of the population, the value of the target function is computed. The correct definition of the target function is one of the most important parts of a genetic algorithm. In our case the target function is the inverse of the dissimilarity value (3) plus some constant; therefore we are performing the minimization of the dissimilarity (3). Contrary to the typical usage of genetic algorithms, in our case it is not necessary to determine the exact value of the target function (fitness). It is only important that the target function value generated for the winning pair be significantly different from those of uncorrelated pairs. That is why the number of generated populations can be significantly lower than in typical uses of GAs (dozens). The number of generated populations forms the stop criterion.

5 System implementation

The presented system has been implemented in VC++. For system-testing purposes, the input picture sequence is loaded from disk as a sequence of JPEG-compressed frames. After decompression, each picture is stored in RGB format. One of the aims of the work was to analyze the system's ability to work as a real-time system. The analysis time of a single picture frame has finally been reduced to approximately 8 seconds per frame on a 366 MHz PC (out of which approx. 40% of the time is spent reading the data and performing the introductory processing of the picture, and 60% associating the objects using the GA).
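The target function described in Section 4 can be sketched as follows: the dissimilarity of Eq. (3) and its inverse-plus-constant fitness. The value of the constant and the assumption that both point lists have already been brought to the same length N by the transform of Eq. (4) are illustrative choices of this sketch, not taken from the paper:

```cpp
#include <cmath>
#include <vector>

struct Pixel { int r, g, b; };

// Eq. (3): normalized Euclidean distance between two objects treated
// as multidimensional RGB vectors of equal length N.
double dissimilarity(const std::vector<Pixel>& a,
                     const std::vector<Pixel>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double dr = a[i].r - b[i].r;
        double dg = a[i].g - b[i].g;
        double db = a[i].b - b[i].b;
        sum += dr * dr + dg * dg + db * db;
    }
    return std::sqrt(sum) / static_cast<double>(a.size());
}

// Target (fitness) function: the inverse of the dissimilarity plus a
// constant, so that identical objects do not cause division by zero.
// The constant 1.0 is an illustrative value.
double fitness(const std::vector<Pixel>& a, const std::vector<Pixel>& b) {
    return 1.0 / (dissimilarity(a, b) + 1.0);
}
```

Maximizing this fitness via roulette-wheel selection is equivalent to minimizing the dissimilarity (3), as required by the object-matching step.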
Fig. 3. The object matching

One has to point out that achieving video picture analysis in a matter of seconds per frame is a compromise with the quality of the generated outcomes. This refers first of all to the number of generations produced by the genetic algorithm (directly affecting the matching quality of the corresponding objects, and therefore the final outcomes) and to the number of operations performed on the picture. Nevertheless, analyzing the results, one can ascertain that the elaborated picture analysis methodology makes it possible to implement the created system as a real-time system. An additional code optimization would be necessary, though, especially at the key points of the program. Decisions concerning the implemented algorithms were the outcome of the necessary compromise between the precision of the outcomes and the performance. The system has the characteristics of a real-time system, i.e. the total analysis time of a single picture frame should not exceed (with given analysis parameters) the time between two succeeding video picture frames. The precision of the results depends on the length of the analysis time; by optimizing the time-critical functions of the program, we are able to speed up the real-time analysis of the picture and thereby improve the precision of the system's results. To keep it simple, the object list of the designed system is based on two levels. The best results in the filtration module have been achieved by implementing two filters: an averaging filter for the one-dimensional differential picture and a quick logical filter for the binary picture. The segmentation process is based on a modified, iterative algorithm of direct segmentation. This algorithm guarantees relatively fast and stable operation (contrary to the recursive algorithm, whose problems are slow operation and the possibility of overflowing the stack).
A substantial problem of the analysis turned out to be a proper method of background picture generation. An element of semantic interpretation of the picture has been introduced in order to achieve better results: the points of the pictures are distinguished into points belonging to one of the moving objects and background points. The distinction is based on assigning to a given area of the picture a corresponding background actualization coefficient. The background is generated dynamically using the Kalman filter. At the stage of associating the picture sequences, for each of the objects, the GA appeared to be the most effective analysis method. Direct comparison, as well as comparison using an approximation of the perspective in the picture, did not bring positive results. The use of neural networks was also taken into consideration. Finally, the choice was one of the directed methods of searching the solution space: the GA. The implemented algorithm is based on simple selection, crossover and mutation operations. The difficulty lay in elaborating an adequate method of coding the transformation coefficients and a proper definition of the adjustment (fitness) function. The last step of the detection algorithm is the dissimilarity coefficient analysis. At this point, the logic of the system ensures that the set rules concerning the correctness of the association process are preserved. For example, if the picture of a given object has been associated with the picture of another object, all the remaining coefficients for the
objects of the pair are eliminated from further analysis. The coordinates localizing an object are set for each of the objects in a given coordinate system, in this case in screen coordinates. In order to present the objects clearly on the screen, each of them has a unique identifier placed at the point defined by the coordinates of its localization.

Fig. 4. Process of vehicle detection (input picture, current background, detected objects)

6 Summary

During the research on vehicle localization in motion pictures it turned out that object motion is the most essential feature that allows interpreting the semantics of the image resulting from the segmentation process. Under the restrictions assumed for the system, and the principle that the image is taken from a video camera mounted over the crossroads, motion is a sufficient condition for vehicle detection. In consequence, the system does not require additional modules or additional computational expenditure in the image analysis process, which allows the system to work in real time. The need for the most effective method of searching the space of solutions resulted in an experimental use of genetic algorithms. The obtained experimental results allow us to rate this method very highly. It should be stated here that the main aim of the research was to work out the simplest possible way of image analysis. There are some drawbacks of the presented method: the system is highly sensitive to image quality. A trial of real use of the elaborated methodology would require modifying every analysis stage so that disturbances and deformations do not propagate to the next stages. Summarizing, the results of the experiments let us confirm that genetic algorithms appeared to be the most effective analysis method, offering a compromise between the quality of the results and the elapsed time.

References

[1] GOLDBERG D., Genetic Algorithms and their usage
(in Polish), WNT, Warszawa 1995.
[2] JUN ZHAO, Vision-Based Traffic Monitoring, Computer Project Report, EE/CSE 586: Advanced Topics in Computer Vision, 1999, http://www.cse.psu.edu/~jzhao/cse586/project.html
[3] MAMICA J., Mobile detection based on video signal (in Polish), MSc dissertation, Institute of Engineering Cybernetics, Wroclaw University of Technology, 2002.
[4] MATERKA A., Digital processing and analysis of image elements (in Polish), PWN, Warszawa-Lodz 1991.
[5] TADEUSIEWICZ R., Visual systems of industrial robots (in Polish), WNT, Warszawa 1992.
[6] The CSIRO Image Analysis Group, Image motion and tracking, http://www.cmis.csiro.au/iap/
[7] CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision: Motion, Tracking and Time Sequence Analysis; Scene Understanding; Image Transformations and Filters, http://www.dai.ed.ac.uk/cvonline/
[8] WOŹNICKI J., KUKIELA G., Analysis of digital image sequences: motion detection (in Polish), Own Researches Priority Program of the Technical University of Warsaw, http://nms.ise.pw.edu.pl/photonics_information/ppif/projects9899/5.html