Object Detection in Natural Scenery


Object Detection in Natural Scenery

P. MAKRIS and N. VINCENT
Laboratoire d'Informatique, Université de Tours (LI/E3I)
64, avenue Jean Portalis, 37200 Tours, FRANCE

Abstract: - In this paper we tackle the problem of geometrical object detection within natural scenery. Our approach does not rely on the extraction of the contours that can be perceived in the image. Instead, we use a statistical method to process different zones of the image in a global way. Our method has both a global and a local aspect: it takes into account the spatial disposition of the image gray level pixels. We take advantage of Zipf law in order to discriminate between the simple geometrical zones that come from industrial activity and the great complexity contained in natural zones. Some examples of object detection are presented.

Key-Words: - Zipf law, image segmentation, geometrical objects, image categorization, statistical approach, natural patterns, global approach.

1 Introduction
In the general field of pattern recognition, the problems with which researchers are confronted are far from having found a satisfactory general solution, as the numerous studies performed in the domain testify. The categorization of photographs [3] is a rather new field. It can be an interesting tool in an indexing process, as well as in the search for an image or a generic scene in a large database. This can be studied either with a global look at the image or with a more local approach. In our study, we only consider the case of object detection within "natural" scenery. For at least twenty years it has been possible, in some applications, to detect a house within a landscape or to recognize a road that goes through a landscape in an aerial view. In these cases, the patterns are quite neat and geometrical, and so are the shadows. As far as small objects are concerned, the task is much more difficult: the small objects blend into their neighborhood.
Our work is concerned with this last situation as well as with easier ones. We want to show how a rather simple representation makes it possible to obtain interesting results highlighting the opposition between a natural and an artificial aspect, that is to say, between irregular and geometrical properties. In fact, no previous knowledge is necessary in order to exhibit a particular pattern such as a road; this allows an interpretation phase to take place afterwards. Some psychological experiments have been carried out in order to find how humans successfully categorize a scene [6]. Manufactured and natural patterns are naturally different. The projection of a tree trunk contour is neither as smooth nor as rectilinear as the edge of the projection of a pipe. In the same way, the texture of the trunk is much more complex than the texture of the pipe, whether painted or not. In the first part, we briefly present the basis of the natural law on which our study relies, that is to say, Zipf law. The second section is devoted to the presentation of our method. Then, in the last part, some examples are presented.

2 Zipf law

2.1 Law statement
Zipf law [8] was discovered in an empirical way and stated more than half a century ago in the field of linguistic research, for the study of different languages. Since then, many experiments and observations have been carried out in other domains, for instance in biology [4], and they have strengthened

the validity of this law to express many phenomena. Nevertheless, in all these cases, only 1D processes had been studied [2]. In fact, this law has no physical explanation; it is based on numerous observations. The law can be stated as follows. In a set of symbols that are topologically structured, the n-tuples of symbols are not organized in a random way. It can be observed that only some n-tuples occur in the signal, say M_1, ..., M_p. The frequencies N_1, ..., N_p of the occurrences of these n-tuples M_1, ..., M_p are linked to the patterns themselves. More precisely, if the patterns (the n-tuples) are sorted in decreasing order of their frequencies, the sequence (N_σ(1), ..., N_σ(p)), with i an index varying from 1 to p and σ the permutation associating each rank with the corresponding pattern, verifies the fundamental formula:

N_σ(i) = k · i^a    (1)

where k and a are two characteristic constants associated with the process. When the relation can be proved to hold, this power law is principally characterized by the value a of the power. Of course, this value is negative, because decreasing order has been chosen. The easiest way to estimate a consists in studying the link between the logarithms of the N_σ(i) and the logarithms of the ranks i. In the cases where the law holds, the two quantities are related through a linear relation, and a can be estimated as the slope of the regression line that approximates, in the least squares sense, the pairs [ln i, ln(N_σ(i))]. In all cases, the study of the corresponding graph brings much information about the phenomenon under study.

2.2 Adaptation of the law to a gray level image
In the case of a 1D signal, the n-tuple can be chosen as a triple of consecutive symbols. Most often, the alphabets used for the symbols in the different processes do not have many distinct elements.
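The least-squares estimation of the exponent described above can be sketched as follows. This is a minimal illustration, not code from the paper; the function name and the synthetic frequencies are illustrative.

```python
import numpy as np

def zipf_exponent(frequencies):
    """Estimate the Zipf exponent a and constant k from pattern frequencies.

    The frequencies are sorted in decreasing order, and a line is fitted
    by least squares to the pairs [ln i, ln N_sigma(i)]; its slope is a
    and its intercept is ln k.
    """
    freqs = np.sort(np.asarray(frequencies, dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope, np.exp(intercept)

# Frequencies drawn exactly from N_i = k * i^a with k = 100, a = -0.8
freqs = 100.0 * np.arange(1, 51, dtype=float) ** -0.8
a_hat, k_hat = zipf_exponent(freqs)
print(round(a_hat, 3), round(k_hat, 1))  # recovers a ≈ -0.8, k ≈ 100
```

On data that exactly follows the power law, the fit recovers the parameters; on real pattern counts, the slope is the characteristic "a" value of the process.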
Besides, when considering a gray level image, it seems rational to respect the fundamental spatial organization of the pixels, that is to say, the topological concept of 2D neighborhood: any two nearby pixels should be represented in at least one n-tuple. Of course, the n-tuples have to be adapted to the shape of the frames that figure the neighborhood of each image pixel, for instance a 3x3 mask, and a numbering order of the elements has to be chosen. The obvious symbols would be the gray levels. Unfortunately, for an image coded using 256 gray levels, the number of possible different patterns within the mask becomes too large compared with the number of patterns, distinct or identical, actually present in the image; the occurrence frequencies of the different patterns would then not be significant. The number of possible patterns increases exponentially with the size of the mask. Therefore, a compromise has to be reached between the mask size and the number of symbols used to code the configuration of gray levels occurring within the mask. In order to locally preserve the visual aspect of the initial image, we have chosen not to use a global quantization of the gray levels. Instead, we take inspiration from the rank method [1]: a pattern that appears in a mask comprising n pixels is coded according to the relative values of the pixel gray levels, each gray level being replaced by its rank among the set of the n values. So, for a 3x3 mask, the ranks range from 0 to 8. This leads to a reasonable number of different possible patterns with respect to the size of the images. Let us give an example. A 3x3 mask containing the gray level pixels of Table 1 is coded by the numbers of Table 2, with the associated 9-tuple given in Table 3.

243 243 240
239 120 128
120 118 120
Table 1: frame in the initial image

5 5 4
3 2 1
2 0 2
Table 2: code of the previous pattern
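The rank coding can be sketched as follows. Dense ranking (equal gray levels share a rank) is an assumption here: with this convention the middle row of Table 1 codes as (3, 1, 2) rather than the (3, 2, 1) printed in Table 2, so the paper's exact convention for nearby values may differ slightly.

```python
import numpy as np

def rank_code(patch):
    """Code a mask of gray levels by the ranks of its values.

    Each gray level is replaced by its rank among the distinct values
    of the mask (dense ranking), so a 3x3 mask yields ranks in 0..8.
    """
    patch = np.asarray(patch)
    levels = np.unique(patch)  # sorted distinct gray levels
    ranks = {v: r for r, v in enumerate(levels)}
    return tuple(ranks[v] for v in patch.ravel())

# The 3x3 frame of Table 1
frame = [[243, 243, 240],
         [239, 120, 128],
         [120, 118, 120]]
print(rank_code(frame))  # (5, 5, 4, 3, 1, 2, 1, 0, 1)
```

Whatever the tie-handling, the key property is preserved: the code depends only on the relative ordering of the gray levels, not on their absolute values.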

5 5 4
3 2 1
2 0 2
Table 3: 9-tuple associated with the 120-value gray level pixel of Table 1

Using this code, it is possible to compute the number of occurrences of each pattern and to verify that Zipf law also holds in the case of a gray level image [7]. In the remainder of the paper, the curve associated with the graph [ln i, ln(N_σ(i))] will be called the "Zipf curve" of the image. Information and parameters are extracted from this curve. An example of such a curve is shown in Figure 1.

Fig. 1: Evolution of the logarithm of the pattern frequencies with respect to the logarithm of the frequency rank, called the "Zipf curve"

The relation between the pattern frequencies that is highlighted is linked to the global structure of the image; at the same time, it relies on the local variations within the image, which are taken into account during pattern coding and frequency computation. We use the information contained in such graphs to detect and localize the presence of a geometrical object within natural scenery images.

3 Object detection method

3.1 Global image
First of all, it has to be noted that the general shape of a Zipf curve varies considerably according to the nature of the image [5]. The image can either be a natural image, such as a more or less wide landscape, or an artificial object, which can be a geometrical object or a manufactured object. Two different examples are presented in Figure 2, and the Zipf curves associated with these two images can be seen in Figure 3. The two images have the same size; otherwise, a prior normalization of the curves would have been necessary in order to perform comparisons. The Zipf curve associated with the natural image of Figure 2 lies above the curve associated with the artificial image. We can then deduce that in the natural image an actual inner structure of the gray pixel combinations has been highlighted: the hierarchy of the patterns is very obvious, figured by the slope of the Zipf curve. In the image that contains a geometrical object, the global shape of the Zipf curve has a more horizontal aspect: several different patterns appear with quite similar occurrence frequencies.

Fig. 2: Two different images: a geometrical image and a natural image (sand on the beach)

Fig. 3: Zipf curves associated with a natural image and a geometrical object

Therefore, the criterion that will be used relies on one parameter defined from the Zipf curve. We consider the area of the surface that is bounded on

the upper part by the Zipf curve and on the bottom part by the x-coordinate axis. The more natural the image is, the larger this parameter (the computed area) is. Because the initial points have a logarithmic distribution along the x-axis, in order to preserve data precision, the area is computed after a re-sampling of the Zipf curve has been performed. Of course, this area parameter can only be used when the images have the same size; otherwise, a prior normalization of the Zipf curve has to be performed. A change in the global shape of the Zipf curve indicates a change in the statistical distribution of the patterns, so we have a way to quantify changes within an image when a new object is present. The measure is not local but global over the whole image domain. If an object present in the image is too small to be detected by this process, and if no comparison is possible with a similar image from which the object is absent, it would be necessary to zoom into the image. Nevertheless, in spite of the small size of the object, it is possible to perform a semi-local study of the image. The image is then considered as the union of smaller frames.

3.2 Frames study
Instead of the initial image, smaller images are now considered; let us call them frames. For example, we use 64 or 256 frames that overlay the initial image. On each frame, the same study is performed, that is to say, the previous parameter is computed. In the case of 256 frames, 256 area values associated with the local Zipf curves are computed. According to the interpretation we have presented, the smaller the area is, the greater the probability is that the frame contains a geometrical object. Detection is only achievable when the object size is not too small compared to the size of the frame. Figure 4 shows a 3D representation of the parameter value according to the frame.
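The semi-local study, one Zipf-curve area per frame, can be sketched as follows. This is a minimal illustration under stated assumptions: 3x3 rank-coded patterns with dense ranking, trapezoid integration after uniform re-sampling of the log-log curve, and arbitrary image and frame sizes; none of these constants come from the paper.

```python
import numpy as np
from collections import Counter

def zipf_area(gray, mask=3, samples=100):
    """Area under the Zipf curve of a gray level image or frame.

    Patterns are rank codes of mask x mask neighborhoods; their sorted
    log-frequencies are re-sampled on a uniform log-rank grid (the raw
    points are logarithmically distributed along the x-axis) before
    integrating with the trapezoid rule.
    """
    gray = np.asarray(gray)
    h, w = gray.shape
    counts = Counter()
    for i in range(h - mask + 1):
        for j in range(w - mask + 1):
            patch = gray[i:i + mask, j:j + mask]
            ranks = {v: r for r, v in enumerate(np.unique(patch))}
            counts[tuple(ranks[v] for v in patch.ravel())] += 1
    freqs = np.sort(np.fromiter(counts.values(), dtype=float))[::-1]
    log_r = np.log(np.arange(1, len(freqs) + 1))
    log_f = np.log(freqs)
    grid = np.linspace(log_r[0], log_r[-1], samples)  # uniform re-sampling
    resampled = np.interp(grid, log_r, log_f)
    dx = np.diff(grid)
    return float(np.sum((resampled[:-1] + resampled[1:]) / 2 * dx))

# Semi-local study: split the image into frames, one area value per frame
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
frame_size = 32
areas = [zipf_area(image[i:i + frame_size, j:j + frame_size])
         for i in range(0, 64, frame_size)
         for j in range(0, 64, frame_size)]
print(len(areas))  # 4 frames, one area parameter per frame
```

In the detection setting, the frame with the smallest area (or, equivalently, an extremum of the negated area surface) is the candidate location of a geometrical object.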
We obtain surfaces with 256 vertices each, one associated with an image that contains a geometrical object, whereas the other is associated with an image that does not contain any artificial object. On the vertical axis, we have plotted the opposite of the area value, in order to make a potential extremum apparent at the object location.

Fig. 4: Zipf area associated with different images

In some cases, the extremum does not appear as clearly as in Figure 4. It is then possible to increase the power of detection by computing a gradient over the area surface. To achieve this, we apply the following formula to each point P of the area surface, where the area value is denoted Val(P) and the summation extends over all points P' in a neighborhood of P:

G(P) = Σ_{P' ∈ neighb(P)} |Val(P) - Val(P')|

In Figure 5, we show the case of an image that contains an artificial object. The semi-local study is performed using only 64 frames, and the frames are larger than in the previous example. Here the extremum of the area surface is not so obvious, and the improvement brought by the gradient computation is evident when the two representations are compared. Without the gradient, the border of the image could have been taken for a geometrical

object, but after gradient computation, this location is no longer a potential candidate for the presence of a geometrical object. The presence of geometrical elements has been enhanced by this gradient computation. area the presence of extra artificial adds in a natural mixture. We have pointed out the interest of using Zipf law in the domain of image analysis. In some way, it gives a way to quantify a characteristic proper to an image that is linked to the inner structure of the image. We are quite conscious on the one hand that the use can be extended and improved, and on the other hand that the results we have obtained are only a first step, and can be looked at as the beginning of a new long way towards a solution seek. Some further studies on the automatic determination of the frame size according to the scenery is performed, and some other parameters are tested extracted from Zipf curve. gradient Fig.5: Geometrical object detection improvement using area parameter in and gradient in 4 Conclusion Without mentioning the direct use of our method by industries wanting to observe ground scenery, it is obvious that our work can be of interest in other fields. For example in the medical domain for numerical treatment of patient images or in industries in the field of quality control to verify References: [1] D. Bi, Segmentation d'images basée sur les statistiques de rangs des niveaux de gris, PhD. Thesis, Université de Tours (France), 1997, 180 pages. [2] A. Cohen, R.N. Mantegna and S. Halvin, Numerical analysis of word frequencies in artificial and natural language texts, Fractals, World Scientific Publishing Company, vol. 5 n 1, pp. 95-104, 1997. [2bis] A. Guérin-Dugué, A. Oliva, Classification of scene photographs form local orientation features, Pattern Recongnition Letters, Vol 21, pp. 1135-1140, 2000. [3] S. Havlin, S.V. Buldyrev, A. L. 
Goldberger and AL, Statistical Properties of D.N.A Sequences, Fractal Reviews in the Natural and Applied Sciences, Chapman & Hall, pp.1-11, 1995. [4] P. Makris, N. Vincent, Zipf law: a tool for image characterization. Vision Interface'2000, Montreal (Canada), 14-17 may 2000, pp. 262-268. [4bis] A. Oliva, P. Schyns, Coarse blobs or fine edges, Cognitive Psychology, Vol 34, pp. 72-102. [5] N. Vincent, P. Makris, Y. Brodier, Compressed image quality and Zipf law, International Conference on Signal Processing, (ICSP - IFIC-IAPR WCC2000), Beijing (China), 21-25 august 2000, pp. 1077-1084. [6] G.K. Zipf, Human Behavior and the principle of "Least Effort", Addison-Wesley, New York, 1949. 5