Statistical Shape Features for Content-Based Image Retrieval


Journal of Mathematical Imaging and Vision 17, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

SAMI BRANDT
Laboratory of Computational Engineering, Helsinki University of Technology, P.O. BOX 9203, FIN HUT, Finland
Sami.Brandt@hut.fi

JORMA LAAKSONEN AND ERKKI OJA
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. BOX 5400, FIN HUT, Finland
Jorma.Laaksonen@hut.fi, Erkki.Oja@hut.fi

Abstract. In this article the use of statistical, low-level shape features in content-based image retrieval is studied. The emphasis is on techniques that do not require object segmentation. PicSOM, the image retrieval system used in the experiments, requires that features are represented by constant-sized feature vectors for which the Euclidean distance can be used as a similarity measure. The shape features suggested here are edge histograms and Fourier-transform-based features computed from the image after edge detection in Cartesian or polar coordinate planes. The results show that both local and global shape features are important cues of the shapes in an image.

Keywords: feature extraction, content-based image retrieval, statistical shape description, relevance feedback

1. Introduction

Image content can be described by visual features classified according to the properties they describe. According to the MPEG-7 International Standard [17], for instance, such classes are color, texture, and shape. Color and texture contain important information but, for instance, two images with similar color histograms can represent very different objects. Therefore the use of shape-describing features is essential in an efficient content-based image retrieval system. Although shape description has been intensively researched, there is no direct answer as to which kinds of feasible shape features should be incorporated into such a system.
Content-based image retrieval (CBIR) has been an active research area since the early 1990s [4, 15]. Many image retrieval systems, both commercial and research, have been built. The best known are Query By Image Content (QBIC) [5] and Photobook [18] with its newer version FourEyes. Other well-known systems are the search engine family VisualSEEk, MetaSEEk and WebSEEk [1, 23, 24], NETRA [16], and the Multimedia Analysis and Retrieval System (MARS) [7]. The shape features used and some other characteristics of these systems are summarized in Table 1. Unfortunately there is no standard benchmark that could be used to assess the performance of different CBIR systems. However, there are currently plans to establish such a benchmarking procedure [6].

This work was supported by the Finnish Centre of Excellence Programme of the Academy of Finland, project New information processing principles.

Table 1. Summary of the shape features in some content-based image retrieval systems. KLT stands for Karhunen-Loève transform, FEM for finite-element method, DFT for discrete Fourier transform, and MFD for modified Fourier descriptors.

                     QBIC                 Photobook       Photobook       NETRA       MARS
                                          (Appearance)    (Shape)
Feature type         Attributes           KLT             FEM             DFT         MFD
Feature dimension    22                   Not specified   Not specified   126         Not specified
Similarity measure   Euclidean            Euclidean       Strain energy   Euclidean   Several
Segmentation         Manual/Semi-autom.   Semi-autom.     Semi-autom.     Autom.      Semi-autom.
Multiobj. segm.      Yes                  No              No              Yes         No
Multiobj. query      No                   No              No              No          No

A major problem in automatic feature extraction is segmentation. Even if it were known that there is a single object in the image, it is in general a non-trivial problem to locate it. On the other hand, when there are no specific objects, the result of segmentation is probably an irrelevant part of the original image. For a general database of images, such as the World Wide Web, it might then be reasonable to use statistical shape features computed for the whole image instead. However, in the current literature these kinds of shape features are an exception [2]. Indeed, as Table 1 shows, most of the above systems that have shape features rely on manual or semi-automatic segmentation. Only NETRA has a fully automated segmentation algorithm, whose results indicate the extreme difficulty of the problem.

Another central issue to be considered in selecting an appropriate shape description technique is whether some invariances are needed. Naturally many machine vision systems require invariance to spatial transformations like translation, rotation, and scaling, or to different illumination conditions such as lighting and contrast. In some cases these are not beneficial, however. In content-based image retrieval, this may occur when the user has a clear impression of what the image looks like. For example, a user who is looking for a certain view, rather than a single object, possibly prefers images that are rotated, translated, and maybe even scaled similarly to the image the user has in mind. Therefore it may be more reasonable to speak of robustness of features rather than invariances.

2. Content-Based Image Retrieval and PicSOM System

The shape features described in this paper have been used in the PicSOM content-based image retrieval system [3, 12, 13]. The system uses Self-Organizing Maps [9] in implementing relevance feedback for storing user responses and selecting images. A distinguishing characteristic of PicSOM is its ability to automatically adapt to the user's perception of image similarity based on low-level visual content, even though humans perceive image similarity on an abstract semantic level. An on-line demonstration of PicSOM and comprehensive documentation of it are available on-line.

2.1. Self-Organizing Map

The Self-Organizing Map (SOM) [9] is an unsupervised neural algorithm that maps a vector set from a high-dimensional input space to a typically two-dimensional discrete output space which is a grid of artificial neurons or units. The SOM algorithm can be seen as a special form of vector quantization in which the topological ordering of the vectors is retained as well as possible. Therefore, adjacent input vectors tend to be mapped in neighboring SOM units and the mutual similarity of vectors is reflected in their proximity on the map grid.

A model vector whose dimensionality equals the dimensionality of the input space is stored in each SOM unit. These model vectors can be initialized at random or according to the mean and principal directions of the training data. A central concept of the SOM is the best-matching unit (BMU) of each vector in the vector set used in training the SOM. When the map is being trained, the training vectors are used consecutively, one by one, as input vectors to the SOM. The BMU of the input vector is the map unit which minimizes the distance between the vector and the model vector of the unit. When training the SOM, the model vectors of the BMU and its topological neighbors on the map are modified to match the input vector even better.
As a result, the distribution of the model vectors resembles the distribution of the training set and the neighboring model vectors resemble each other.

The training of a SOM consists of the following two steps for each of the training vectors $x(t)$. First, the BMU model vector $m_{c(x)}(t)$ is located:

$$\forall i: \; \|x(t) - m_{c(x)}(t)\| \le \|x(t) - m_i(t)\|. \tag{1}$$

The usual distance metric used here is the Euclidean one. After finding the BMU, a subset of the model vectors constituting a neighborhood centered around node $c(x)$ is updated as

$$m_i(t+1) = m_i(t) + h(t; c(x), i)\,\bigl(x(t) - m_i(t)\bigr). \tag{2}$$

Here $h(t; c(x), i)$ is the neighborhood function, a decreasing function of the distance between the $i$th and $c(x)$th nodes on the map grid.

A special form of the SOM, known as the Tree Structured Self-Organizing Map (TS-SOM) [10, 11], has been used in the PicSOM system. It differs from the original SOM in that a TS-SOM consists of a stack of SOM layers. The BMUs are first searched for on the topmost layer and the search is then continued on the next layer in a restricted area centered below the BMU on the map above. This makes the BMU search much faster; otherwise the properties of the SOM and the TS-SOM are similar. After one TS-SOM layer has been trained, the values of the model vectors on it are frozen and the next layer below it is initialized by interpolating and extrapolating from that layer's model vectors. We have used TS-SOMs with three layers: the topmost layer contains 4 × 4 = 16 units, the middle one 16 × 16 = 256 units and the bottommost 64 × 64 = 4096 units.

An essential property of the PicSOM system is that a separate TS-SOM is trained for each feature type, typically color, texture and shape. The TS-SOMs have been trained by iterating over all feature vectors extracted from the images in the database one hundred times. After the SOM training, each map unit on each of the maps is given a visual label by selecting, among those images for which that unit is the BMU, the one whose feature vector is closest to the unit's model vector. Figure 1 illustrates a TS-SOM layer surface.
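The two training steps of Eqs. (1) and (2) can be illustrated with a minimal sketch in plain Python. It is not the PicSOM implementation: the Gaussian form of the neighborhood function, the parameter names `alpha` and `sigma`, and the toy data are assumptions made only for illustration, and the layered TS-SOM search is not modeled.

```python
import math
import random


def find_bmu(x, models):
    """Index of the model vector closest to x in Euclidean distance (Eq. 1)."""
    return min(range(len(models)), key=lambda i: math.dist(x, models[i]))


def train_step(x, models, grid, alpha, sigma):
    """Apply one update of Eq. (2) in place and return the BMU index.

    grid[i] is the (row, col) position of unit i on the map. The Gaussian
    neighborhood h = alpha * exp(-d^2 / (2 sigma^2)) is an assumed choice;
    the text only requires h to decrease with the grid distance d.
    """
    c = find_bmu(x, models)
    for i, m in enumerate(models):
        d2 = (grid[i][0] - grid[c][0]) ** 2 + (grid[i][1] - grid[c][1]) ** 2
        h = alpha * math.exp(-d2 / (2.0 * sigma ** 2))
        models[i] = [mk + h * (xk - mk) for mk, xk in zip(m, x)]
    return c


# Usage: a 4 x 4 map (the size of the topmost TS-SOM layer) on 2-D toy data.
rng = random.Random(0)
grid = [(r, c) for r in range(4) for c in range(4)]
models = [[rng.random(), rng.random()] for _ in grid]
x = [0.9, 0.1]
before = math.dist(x, models[find_bmu(x, models)])
bmu = train_step(x, models, grid, alpha=0.5, sigma=1.0)
after = math.dist(x, models[bmu])
```

With `alpha=0.5` the BMU moves halfway toward the input, so `after` is half of `before`, while units farther away on the grid move by a smaller fraction.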
It can be seen that images that are visually similar with respect to their spatial edge frequency and orientation are mapped to nearby SOM units.

2.2. Operation of the PicSOM System

In the relevance feedback [20] approach to content-based image retrieval, the system shows the user successive sets of images. The first set is selected at random or so that the various types of images included in the whole image database are represented in it as well as possible. When the user has selected the relevant ones among the shown images, PicSOM scores the corresponding BMUs on all the feature maps with a positive value inversely proportional to the number of relevant images. In an analogous manner, BMUs of irrelevant images are scored with negative values. The score values are marked as points on the locations of the BMUs on the map surfaces. These surfaces are then low-pass filtered, which spreads the positive and negative values to the neighboring map units. As a result, map areas that contain many relevant images close together will have larger score values than areas where the relevant images are sparsely distributed, or where the relevant and irrelevant images cancel each other out.

When choosing images for the next query round, the PicSOM system selects a preset number, say 100, of the best-scoring, yet unseen images from each TS-SOM layer of each feature type. Next, if some images appear in more than one list, the respective score values are summed to obtain the final scores. Finally, the images with the largest score values are shown to the user, who may again select the relevant ones and continue the iteration.

Because images similar in the sense of a particular feature are clustered together on the corresponding TS-SOM map, the low-pass filtering of the relevance values leads to automatic adaptation to the user's conception of image similarity and relevance.
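The scoring and low-pass filtering of a single map surface described above can be sketched as follows. This is only an illustration under stated assumptions: the box-shaped averaging kernel, its `radius` parameter, and the function names are hypothetical, since the text does not specify the filter PicSOM actually uses.

```python
def score_map(rows, cols, relevant_bmus, irrelevant_bmus):
    """Mark +1/n_relevant on BMUs of relevant images, -1/n_irrelevant on the rest."""
    scores = [[0.0] * cols for _ in range(rows)]
    for r, c in relevant_bmus:
        scores[r][c] += 1.0 / len(relevant_bmus)
    for r, c in irrelevant_bmus:
        scores[r][c] -= 1.0 / len(irrelevant_bmus)
    return scores


def low_pass(scores, radius=1):
    """Box-filter the score surface (assumed kernel); outside the map counts as zero."""
    rows, cols = len(scores), len(scores[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += scores[rr][cc]
            out[r][c] = total / (2 * radius + 1) ** 2
    return out


# Usage: two relevant images with adjacent BMUs, one irrelevant image far away.
raw = score_map(6, 6, relevant_bmus=[(1, 1), (1, 2)], irrelevant_bmus=[(4, 4)])
smoothed = low_pass(raw)
```

In this toy case the unit at (1, 1) collects the scores of both relevant BMUs, whereas the unit at (1, 0) sees only one of them, which mirrors the statement that areas with many relevant images close together receive larger values.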
The mutual weighting of different feature types is also performed simultaneously, as features that map relevant images in tight clusters are automatically given more weight than the others [14].

3. Statistical Shape Features

The experiments study a few kinds of statistical features which do not require segmentation but are computed from the shape properties of the whole image. The question of whether invariance to affine transformations is beneficial is also addressed.

3.1. Histogram of Edge Directions

The first experiments on shape features were made by using a histogram of edge directions. The edge histogram is translation invariant and it captures the general shape information in the image. Because the feature relies on local statistics of the image, it is robust to partial occlusion and structure deformations in the image. The major disadvantage is that two perceptually very different images may have similar edge histograms.

Figure 1. The middle level map of the FFT128 TS-SOM where nearest images to the neurons are shown. The map ordering has a clear interpretation, see text.

The edge histogram is computed as follows. First, the color image is transformed to the HSI space and the hue channel is discarded. The other two channels are convolved with the eight Sobel masks [25]. Each pixel is given the maximum of the responses and the corresponding 8-quantized direction. The gradient is thresholded next. The threshold values are manually fixed to 15% of the maximum gradient value on the intensity channel and to 35% on the saturation channel. The thresholded intensity and saturation gradient images are combined by the logical OR operation (see Fig. 2). In the OR operation, if the gradient directions differ between the saturation and intensity channels, the originally stronger gradient direction is chosen. A gray level image $I(x, y)$ is hence transformed as

$$I(x, y) \rightarrow \{I_e(x, y),\, I_d(x, y)\} \tag{3}$$

where $I_e(x, y) \in \{0, 1\}$ represents the binary edge image, and $I_d(x, y) \in \{0, \ldots, 7\}$ the direction image. Finally the 8-bin edge histograms, with bins corresponding to the eight quantized directions, are calculated by counting the edge pixels in each direction.

Figure 2. The original image, thresholded magnitudes of its saturation and intensity gradients, and the gradient image combined by the OR operation.

Our experiments showed (see [2] for a full report) that it is better to normalize the histograms by the number of pixels in each image rather than by the number of edge pixels as was done in [8]. For the histogram, we thus have

$$H(i) = \frac{1}{NM}\,\#\bigl\{x \in \{0, \ldots, N-1\},\; y \in \{0, \ldots, M-1\} \mid I_d(x, y) = i\bigr\}. \tag{4}$$

The effect of the smoothing proposed in [8] was also studied. The smoothing should make the histograms more robust to rotation. It is performed as

$$H_s(i) = \frac{\sum_{l=i-k}^{i+k} H(l \bmod 8)}{2k + 1}, \tag{5}$$

where the parameter $k$ determines the degree of smoothing.

3.2. Co-Occurrence Matrix of Edge Directions

The edge histogram can be generalized further. By taking every neighboring edge pixel pair and enumerating the pairs based on their directions, a two-dimensional histogram or co-occurrence matrix is obtained. The resulting histogram, represented by 64 bins, is normalized by the number of pixels in the image and is defined as

$$H_{co}(i, j) = \frac{1}{NM}\,\#\bigl\{x \in \{0, \ldots, N-1\},\; y \in \{0, \ldots, M-1\},\; (\hat{x}, \hat{y}) \in U(x, y) \mid I_d(x, y) = i \wedge I_d(\hat{x}, \hat{y}) = j\bigr\} \tag{6}$$

where $U(x, y)$ is the causal neighborhood set of the pixel $(x, y)$. The causal neighborhood is defined as the 8-neighbors of a pixel in the west, northwest, north, and northeast directions. $H_{co}(i, j)$ hence indicates the number of neighboring edge pixel pairs whose directions are $i$ and $j$.

3.3. Fourier Features

The edge image contains the most relevant shape information and the discrete Fourier transform can be used to describe it. Before forming the edge image, here by using the Sobel masks and neglecting the gradient direction information (see Fig. 2), the image area is normalized with zero padding to a maximum size of 512 × 512 such that the aspect ratio is maintained. The normalization is performed to get comparable edge responses regardless of the original sizes of the images. If both dimensions of the original image are larger than 512, the image is decimated with filtering. Otherwise the normalization is made by bicubic interpolation.
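Returning to the edge-direction statistics of Eqs. (4) to (6), the sketch below shows one way they could be computed once the binary edge image and the quantized direction image are available. The Sobel filtering and thresholding steps are not reproduced. Following the text, only edge pixels are counted, and the function names and the toy 4 × 4 image are assumptions for illustration.

```python
def edge_histogram(edges, dirs):
    """Eq. (4): count edge pixels per direction, normalized by all pixels."""
    n_pixels = len(edges) * len(edges[0])
    hist = [0.0] * 8
    for e_row, d_row in zip(edges, dirs):
        for e, d in zip(e_row, d_row):
            if e:
                hist[d] += 1.0 / n_pixels
    return hist


def smooth_histogram(hist, k=1):
    """Eq. (5): circular moving average over 2k + 1 neighboring bins."""
    return [sum(hist[(i + l) % 8] for l in range(-k, k + 1)) / (2 * k + 1)
            for i in range(8)]


def cooccurrence(edges, dirs):
    """Eq. (6): 8 x 8 histogram of direction pairs over the causal neighbors."""
    rows, cols = len(edges), len(edges[0])
    causal = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]  # W, NW, N, NE as (drow, dcol)
    h = [[0.0] * 8 for _ in range(8)]
    for r in range(rows):
        for c in range(cols):
            if not edges[r][c]:
                continue
            for dr, dc in causal:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and edges[rr][cc]:
                    h[dirs[r][c]][dirs[rr][cc]] += 1.0 / (rows * cols)
    return h


# Usage: a toy image with one horizontal and one vertical two-pixel edge.
edges = [[0, 1, 1, 0],
         [0, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
dirs = [[0, 2, 2, 0],
        [0, 0, 0, 0],
        [4, 0, 0, 0],
        [4, 0, 0, 0]]
hist = edge_histogram(edges, dirs)
hist_smooth = smooth_histogram(hist, k=1)
co = cooccurrence(edges, dirs)
```

Because the smoothing of Eq. (5) is a circular average, it redistributes mass between neighboring direction bins without changing the total, which is why a small rotation of the image changes the smoothed histogram less than the raw one.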
After edge detection, the Fourier transform is computed for the normalized image using the FFT algorithm. The magnitude image of the Fourier spectrum is first low-pass filtered and thereafter decimated by a factor of 32. The resulting number of dimensions in the feature vectors is 128. The final reduction is made after the edge detection and the FFT because the resolution of the edge detection procedure would not be sufficient to extract relevant edges from a decimated image.

3.4. Polar and Log-Polar Fourier Features

The Fourier features described above are translation invariant but not rotation invariant. Our method, named polar Fourier features, is rotation invariant with respect to the center of the image but not invariant to translation and scale. First the image is normalized and the edge image is obtained as with the Fourier features. The binary edge image $I_e(x, y)$ is then transformed to polar coordinates by the formula

$$I(\rho, \theta) = I_1(\rho, \theta) \vee I_2(\rho, \theta), \tag{7}$$

where $\rho = 0, \ldots$ and $\theta = 0, \ldots$ run over discrete grids, $\vee$ means the binary OR operation, and

$$I_1(\rho, \theta) = I_e\bigl(\lfloor R\rho \cos 2\pi\theta + c_x \rceil,\; \lfloor R\rho \sin 2\pi\theta + c_y \rceil\bigr), \tag{8}$$

$$I_2(\rho, \theta) = \begin{cases} 1 & \text{if } \exists\, x, y \in \{0, 1, \ldots, 511\} \text{ such that } I_e(x, y) = 1 \text{ and } (\rho, \theta) \text{ is the grid point nearest to the polar coordinates of } (x - c_x,\, y - c_y), \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

In (9) the polar coordinates of an edge pixel are its radius $\sqrt{(x - c_x)^2 + (y - c_y)^2}$, scaled by $R$, and its angle $\angle(x - c_x,\, y - c_y)$, scaled by $2\pi$, each rounded to the grid. Here $(c_x, c_y) = (255.5, 255.5)$ are the coordinates of the center of the image, $R = \sqrt{c_x^2 + c_y^2}$, and $\lfloor \cdot \rceil$ denotes rounding to the nearest integer. The above procedure prevents the formation of gaps between the edge pixels in the polar coordinate system. For the polar image the Fourier transform and decimation are performed as with the Fourier features and a 128-dimensional feature vector is obtained. The method is invariant to translation in the polar plane, which implies rotation invariance with respect to the center of the image.

Even more invariances could be obtained by a slight modification to the feature. Log-polar [22] Fourier features are invariant to affine transformations, i.e., to translation, rotation and scaling. These can be obtained by replacing $R$ and $\rho$ in (8) and (9) with $\ln R$ and $e^{\rho}$, respectively. Translation invariance is obtained by setting the centroid $(c_x, c_y)$ to the center of mass of the binary edge image.
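The translation invariance of the plain Fourier features described at the start of this page can be checked numerically with a small sketch. It is a toy stand-in, not the authors' procedure: a naive DFT replaces the FFT so that only the standard library is needed, block averaging is an assumed form of "low-pass filtering followed by decimation", and the 8 × 8 image with factor 2 replaces the 512 × 512 image with factor 32. How the 128-dimensional vector is selected from the decimated spectrum is not stated in the text and is not modeled here.

```python
import cmath


def dft2_magnitude(img):
    """Magnitude of the 2-D DFT via a naive O(n^4) sum (toy sizes only)."""
    rows, cols = len(img), len(img[0])
    mag = []
    for u in range(rows):
        row = []
        for v in range(cols):
            acc = sum(img[r][c]
                      * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                      for r in range(rows) for c in range(cols))
            row.append(abs(acc))
        mag.append(row)
    return mag


def block_average(mag, factor):
    """Assumed low-pass + decimation: mean over factor x factor blocks."""
    rows, cols = len(mag) // factor, len(mag[0]) // factor
    return [sum(mag[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for r in range(rows) for c in range(cols)]


def fourier_feature(edge_img, factor):
    """Decimated magnitude spectrum of a binary edge image."""
    return block_average(dft2_magnitude(edge_img), factor)


def circular_shift(img, dr, dc):
    """Translate the image by (dr, dc) with wrap-around."""
    rows, cols = len(img), len(img[0])
    return [[img[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]


# Usage: an L-shaped edge on a zero-padded 8 x 8 image, then the same edge moved.
edge = [[0] * 8 for _ in range(8)]
for col in range(2, 6):
    edge[2][col] = 1
edge[3][2] = edge[4][2] = 1
feat = fourier_feature(edge, factor=2)
feat_shifted = fourier_feature(circular_shift(edge, 3, 1), factor=2)
```

A shift changes only the phase of the spectrum, so `feat` and `feat_shifted` agree up to rounding error. The same argument applied to the polar image is what turns a rotation about the image center into an invariance of the polar Fourier features.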
Rotation invariance is obtained by using the magnitude spectrum of the log-polar transform, as rotations affect only the phase of the spectrum. Accordingly, invariance to scale is obtained by taking the logarithm of the radius in the polar coordinate plane.

All the Fourier-based features presented here are sensitive to occlusion: the direct use of the Fourier transform may lead to very different magnitude spectra for occluded images. In addition, if some parts of the central object of an image are missing, the calculation of the centroid will go wrong and significantly differing log-polar images will result.

4. Evaluation Methods

The presented shape features are evaluated with methods from [12]. Let $N$ be the total number of images in the image database $D$. Let $C \subset D$ be a class of similar images determined by some verbal criteria. In order to measure the clustering of the features we define the instantaneous precision $P_C(n)$ as the probability that an image $I_j \in C$ has as its $n$th closest neighbor an image which also belongs to the class $C$. For each image $I_j \in C$ the distance to every other image is calculated and the images are sorted in order of ascending distance. For each image $I_j$ we thus define a sequence $h_j(n)$ as

$$h_j(n) = \begin{cases} 1 & \text{if the } n\text{th closest image belongs to } C, \\ 0 & \text{otherwise,} \end{cases} \tag{10}$$

where $n = 1, \ldots, N-1$. The instantaneous precision $P_C(n)$ for the class $C$ having $N_C$ images is then defined as

$$P_C(n) = \frac{1}{N_C} \sum_{j:\, I_j \in C} h_j(n), \quad n = 1, \ldots, N-1. \tag{11}$$

In the optimal case $P_C(n) = 1$ if $n \le N_C - 1$ and 0 otherwise. On the other hand, the worst-case performance results when $P_C(n)$ equals the a priori probability of class $C$ for all $n$.

4.1. Local Performance

Good features should have high instantaneous precision for the very first indices. Therefore a local performance measure of the precision type is defined as the average of the instantaneous precision for the first 1% of indices, i.e.

$$\eta_{\text{local}}(C) = \frac{100}{N} \sum_{n=1}^{N/100} P_C(n). \tag{12}$$

The local performance measure describes whether the feature space is clustered such that there is a high probability that any image of the class $C$ has an image of the same class near it in the feature space. Note that $\eta_{\text{local}}$ depends on the a priori probability of the class.

4.2. Global Performance

The performance measure presented above gives a high value even if the images of the class $C$ form many small clusters all over the feature space. This suggests using, as an alternative, an appropriate weighting function $w(n)$ which takes global clustering into consideration. The instantaneous precision function can be converted to a scalar by using the sum expression

$$\eta_{\text{global}}(C) = \sum_{n=1}^{N-1} P_C(n)\, w(n). \tag{13}$$

The weighting function $w(n)$ should be such that it rewards large $P_C(n)$ at small indices and punishes large values at large indices. For this reason the weighting function was chosen to be

$$w(n) = \frac{\cos\dfrac{(n-1)\pi}{N-2}}{\sum_{l=0}^{N_C-2} \cos\dfrac{l\pi}{N-2}}, \quad n = 1, \ldots, N-1. \tag{14}$$

Then $\eta_{\text{global}}$ approaches one for the optimal instantaneous precision case and zero for the a priori case. In addition, $\eta_{\text{global}}$ is ideally independent of the a priori probability. When the weighting function is chosen this way it measures how strongly the feature vectors of a class tend to form one global cluster in the feature space.

4.3. Performance in the PicSOM System

Finally, we test how the features perform in the PicSOM system.
Although this experiment reflects the performance of PicSOM itself, it also measures how well the particular shape features suit this system. The experiment is performed such that a computer program simulates the query process of PicSOM over a series of subsequent query rounds. For each image $I_j \in C$ it is measured how many images must be shown until the image $I_j$ is found from the database. The program emulates the user's relevance feedback by selecting all the shown images which belong to the class $C$. The resulting merit of indexing is thus defined as

$$\tau(C) = \frac{1}{N N_C} \sum_{I_j \in C} N_{\text{shown}}(I_j), \tag{15}$$

where $N_{\text{shown}}(I_j)$ is the rank of image $I_j$ in the output of the system. Random picking would theoretically result in a $\tau$-value of 0.5 and, in general, the closer $\tau$ is to zero, the better the obtained retrieval performance. It must be noted that the $\tau(C)$-measure is an average for the whole class $C$ of images. Also, it can be calculated in a single experiment in which the system ranks all images of the database based on relevance feedback given to the system according to the images' membership in the class $C$. From the output of such an experiment the ranks of all images of the class are readily available.

5. Experiments

As a test database a set of 4350 miscellaneous bitmap images was used. The images were downloaded from ftp://ftp.sunet.se/pub/pictures. The ground truth classes were manually picked from the database. The classes used in the experiments were aircraft, buildings, and faces, of which the database contains 348, 492, and 361 images, respectively.

The formation of the global performance measure $\eta_{\text{global}}$ is shown in Fig. 3. The results are summarized in Table 2 where the best scores in each column have been bolded. In forming the curves, the instantaneous precision values were calculated for the whole ground truth classes, whereas the results in Table 2 were obtained by averaging the figures of merit which were obtained by dividing the classes into ten subsets. The performances of the original color and texture features used in PicSOM are also presented to allow comparison between them and the shape features.
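The figures of merit defined in Section 4 can be sketched in a few lines of plain Python. The sketch reads the weighting of Eq. (14) as $\cos((n-1)\pi/(N-2))$ normalized by $\sum_{l=0}^{N_C-2}\cos(l\pi/(N-2))$ and the local measure as the mean of $P_C(n)$ over the first 1% of indices. The function names, the `fraction` parameter, and the toy one-dimensional features are assumptions for illustration, not the evaluation code used for Table 2.

```python
import math


def instantaneous_precision(features, labels, target):
    """P_C(n) of Eq. (11) for n = 1..N-1, returned as a list indexed by n - 1."""
    members = [j for j, lab in enumerate(labels) if lab == target]
    n_total = len(features)
    precision = [0.0] * (n_total - 1)
    for j in members:
        others = sorted((i for i in range(n_total) if i != j),
                        key=lambda i: math.dist(features[j], features[i]))
        for n, i in enumerate(others):
            if labels[i] == target:  # h_j(n) = 1 in Eq. (10)
                precision[n] += 1.0 / len(members)
    return precision


def eta_local(precision, fraction=0.01):
    """Eq. (12): mean precision over the first `fraction` of the indices."""
    count = max(1, int(fraction * (len(precision) + 1)))
    return sum(precision[:count]) / count


def eta_global(precision, n_class):
    """Eq. (13) with the cosine weighting of Eq. (14)."""
    n_total = len(precision) + 1
    norm = sum(math.cos(l * math.pi / (n_total - 2)) for l in range(n_class - 1))
    return sum(p * math.cos(n * math.pi / (n_total - 2))
               for n, p in enumerate(precision)) / norm


def tau(ranks, n_total):
    """Eq. (15): mean rank of the class images divided by the database size."""
    return sum(ranks) / (n_total * len(ranks))


# Usage: class "a" forms one tight cluster, well separated from class "b".
features = [[0.0], [0.1], [0.2], [0.3],
            [10.0], [10.1], [10.2], [10.3], [10.4], [10.5]]
labels = ["a"] * 4 + ["b"] * 6
p_a = instantaneous_precision(features, labels, "a")
local_a = eta_local(p_a)
global_a = eta_global(p_a, n_class=4)
global_prior = eta_global([0.4] * 9, n_class=4)
tau_example = tau([1, 2, 3], n_total=10)
```

For the perfectly clustered toy class the global measure is one, and for a constant precision equal to the a priori probability the cosine weights cancel to zero, which matches the two limiting cases stated for $\eta_{\text{global}}$.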

Figure 3. Evolution of weighted, cumulative instantaneous precision. The global merit of indexing $\eta_{\text{global}}$ is defined as the y-value of the endpoint of the curve.

In comparing the shape features, the results showed that there is no single feature which would perform best for every image class used in the experiments. Of the local features, which are robust to occlusion and to any local disturbance in the image, both the histogram and the co-occurrence matrix of edge directions gave good results. It can also be seen that the smoothing of the edge histogram with $k = 1$ provided no advantage (EdgeHist vs. EdgeHistSm). Of the global Fourier-transform-based features, the magnitude spectrum of an edge image was found to work at least as well as the local shape features with the test database. In addition, the results of the experiments on the three different Fourier-based features support the assumption that with a general database the presence of all three invariances is not beneficial.

Table 2. Performance of different features. Larger values indicate better performance. Entries lost from this copy are marked with "…".

Feature          Aircraft        Buildings       Faces

$\eta_{\text{local}}$, $0 \le \eta_{\text{local}} \le 1$
Color            … ± …           … ± …           … ± …
Texture          … ± …           … ± …           … ± …
EdgeHist         0.04 ± …        … ± …           … ± …
EdgeHistSm       0.04 ± …        … ± …           … ± …
Co-occurrence    0.06 ± …        … ± …           … ± …
FFT128           … ± …           … ± …           … ± …
PolarFFT         … ± …           … ± …           … ± …
LogPolarFFT      … ± …           … ± …           … ± …

$\eta_{\text{global}}$, $0 \le \eta_{\text{global}} \le 1$
Color            0.2 ± …         … ± …           … ± 0.05
Texture          0.12 ± …        … ± …           … ± 0.07
EdgeHist         0.43 ± …        … ± …           … ± 0.1
EdgeHistSm       0.40 ± …        … ± …           … ± 0.1
Co-occurrence    0.48 ± …        … ± …           … ± 0.1
FFT128           … ± …           … ± …           … ± 0.2
PolarFFT         0.3 ± …         … ± …           … ± 0.1
LogPolarFFT      0.14 ± …        … ± …           … ± 0.09

It is now interesting to see how the best candidate features so far perform in the PicSOM system. In the light of the $\tau$-test, as Table 3 indicates, FFT128 seems to be the best single feature map for all the classes, although the difference to the second-best, Co-occurrence, is not very distinct. It is also interesting to note that the difference between EdgeHist and Co-occurrence seems to be larger than the previous results would suggest. On the other hand, EdgeHist now performs better than Co-occurrence for the aircraft class. This test also verifies that the best shape features presented in this paper are superior to the color and texture features used. The best performance occurs with aircraft images and FFT128 features, for which a desired image is seen when on average 20% of the images in the database have been shown.

Table 3. Performance of the features in the PicSOM system: the results of the $\tau$-test when only one feature map is used in the query. The $\tau$-value indicates how large a proportion of the database must be shown on average until a specific image is returned from the query. Entries lost from this copy are marked with "…".

Feature          Aircraft   Buildings   Faces   Average
Color            …          …           …       …
Texture          …          …           …       …
EdgeHist         …          …           …       …
Co-occurrence    …          …           …       …
FFT128           …          …           …       …
PolarFFT         …          …           …       …

Table 4 shows how PicSOM performs when different combinations of the feature maps are used. It may be seen that the combination of the color and texture maps is advantageous because the $\tau$-values are reduced in all cases except for the aircraft class with the color feature. The $\tau$-values for the combination of the color and texture maps also provide reference values for the original PicSOM system, against which the system with the shape features is compared next.

Table 4. Results of the $\tau$-test when multiple feature maps are used. This test shows how the addition of the shape features affects the performance of PicSOM compared to that with only the color and texture features. Entries lost from this copy are marked with "…".

Features                                 Aircraft   Buildings   Faces   Average
Color, Texture                           …          …           …       …
Color, Texture, EdgeHist                 …          …           …       …
Color, Texture, Co-occurrence            …          …           …       …
Color, Texture, FFT128                   …          …           …       …
Color, Texture, PolarFFT                 …          …           …       …
Color, Texture, Co-occurrence, FFT128    …          …           …       …
It may be seen that the addition of any of the four shape feature types improves the performance. The best results are obtained by incorporating FFT128 alone or together with Co-occurrence into the system. The results are slightly better than those which were obtained by using single feature maps. From the average probabilities of Table 4 it may be seen that the combination of the four features gives the best results, indicating that the PicSOM system is capable of adapting to use the best-performing features.

Finally, it is instructive to see how the images of the ground truth classes are clustered on the SOMs. In Fig. 4 the 64 × 64-sized lowest levels of the TS-SOMs of the best candidate features are displayed together with the original average color feature of PicSOM. The maps of average color show that the aircraft and building classes overlap and all three classes are widely spread on the SOMs. On the Co-occurrence and FFT128 maps, the classes are clustered more tightly. FFT128 seems to separate the classes better, which supports the earlier results of the experiments.

In Fig. 1, we have displayed the images associated with the units of the 16 × 16-sized level of the FFT128 map. The lower left-hand corner consists of images with lots of small, high-frequency details, while the upper right-hand corner represents the opposite, with mostly low frequencies and a lack of details. Faces and other circular shapes accumulate in the middle at the top of the map, clear horizontally elongated shapes are notable in images in the lower right-hand corner, and the corresponding vertical trend is seen in the upper left-hand corner. The map is also consistent with the class distributions in the bottom row of Fig. 4.


Figure 4. Mappings of the ground truth classes with the color and two best shape features on the lowest level of the TS-SOMs. The class distributions have been low-pass filtered to ease the interpretation.

6. Conclusions

In this work shape-describing features for general content-based image retrieval were studied. We formed various types of statistical feature vectors from the edges in non-segmented images. The best results were obtained with the decimated magnitude spectrum of the edge image. Local edge-histogram-based features, including the co-occurrence matrix of edge directions, also gave good results. This indicates that both local and global information are important cues of the object shapes in an image. The experiments also suggest that with a database of miscellaneous images it might not be reasonable to require the features to be invariant to affine transformations.

The results suggest that, in future, more sophisticated shape features of both local and global nature could give better results. For local, histogram-based features this could be achieved by, e.g., Gabor or Gaussian derivative type features [19, 21]. For global features, wavelet-transform-based features might be rewarding, as they can describe the coarse overall shape together with details. Moreover, because the feature types differ, a combination should work even more efficiently, as the experiments with the PicSOM content-based image retrieval system indicate.

References

1. M. Beigi, A. Benitez, and S.-F. Chang, MetaSEEk: A content-based meta search engine for images, in Storage and Retrieval for Image and Video Databases, SPIE Proceedings Series, San Jose, CA.
2. S. Brandt, Use of shape features in content-based image retrieval, Master's Thesis, Helsinki University of Technology. Available at picsom/publications.html.
3. S. Brandt, J. Laaksonen, and E. Oja, Statistical shape features in content-based image retrieval, in Proceedings of the 15th International Conference on Pattern Recognition (ICPR), Vol. 2, Barcelona, Spain, Sept. 2000, pp.
4. A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann Publishers: San Mateo, CA.
5. M. Flickner, H. Sawhney, W. Niblack, et al., Query by image and video content: The QBIC system, IEEE Computer, Vol. 28, pp.
6. N.J. Gunther and G. Beretta, A benchmark for image retrieval using distributed systems over the internet: BIRDS-I, Technical Report HPL, HP Labs. Available at HPL html.

Statistical Shape Features for Content-Based Image Retrieval 197 7. T.S. Huang, S. Mehratra, and K.

of Illinois at Urbana-Champaign, March 1996. 8. A.K. Jain and A. Vailaya, Image retrieval using color and shape, Pattern Recognition, Vol. 29, No. 8, pp. 1233 1244, 1996. 9. T.

Sami Brandt received the degree of Master of Science in Technology from the Department of Engineering Physics and Mathematics at Helsinki University of Technology, Finland, in September. His master's thesis, written at the Laboratory of Computer and Information Science, concerned the use of shape features in content-based image retrieval. He is currently completing his doctoral thesis on the geometric branch of computer vision applied to electron tomography at the Graduate School in Electronics, Telecommunication, and Automation (GETA) and conducts research at the Laboratory of Computational Engineering. He is a student member of the IEEE and the IEEE Computer Society and a member of the Pattern Recognition Society of Finland, and thereby a member of the International Association for Pattern Recognition (IAPR).

Jorma Laaksonen received his Doctor of Science in Technology degree in 1997 from Helsinki University of Technology, Finland, where he is presently Senior Research Scientist at the Laboratory of Computer and Information Science. He is an author of several journal and conference papers on pattern recognition, statistical classification, and neural networks. His research interests are in content-based image retrieval and recognition of handwriting. Dr. Laaksonen is an IEEE member, a founding member of the SOM and LVQ Programming Teams and the PicSOM Development Group, and a member of the International Association for Pattern Recognition (IAPR) Technical Committee 3: Neural Networks and Machine Learning.

Erkki Oja is Director of the Neural Networks Research Centre and Professor of Computer Science at the Laboratory of Computer and Information Science, Helsinki University of Technology, Finland. He holds a Dr.Sc. degree and has been a research associate at Brown University, Providence, RI, and a visiting professor at Tokyo Institute of Technology. Dr. Oja is the author or coauthor of more than 240 articles and book chapters on pattern recognition, computer vision, and neural computing, as well as three books: Subspace Methods of Pattern Recognition (RSP and J. Wiley, 1983), which has been translated into Chinese and Japanese, Kohonen Maps (Elsevier, 1999), and Independent Component Analysis (J. Wiley, 2001). His research interests are in the study of principal components, independent components, self-organization, statistical pattern recognition, and the application of artificial neural networks to computer vision and signal processing. Dr. Oja is a member of the editorial boards of several journals and has served on the program committees of several recent conferences, including ICANN, IJCNN, and ICONIP. He is a member of the Finnish Academy of Sciences, Fellow of the IEEE, Founding Fellow of the International Association for Pattern Recognition (IAPR), and President of the European Neural Network Society (ENNS).
