CHAPTER 4 SEMANTIC REGION-BASED IMAGE RETRIEVAL (SRBIR)

4.1 INTRODUCTION

The Semantic Region-Based Image Retrieval (SRBIR) system automatically segments the dominant foreground region of an image and retrieves images using semantic learning. The system segments the image into different regions based on semantic concepts and finds the dominant foreground region; the low-level features of this dominant region are then extracted. A Support Vector Machine-Binary Decision Tree (SVM-BDT) is used for semantic learning: it finds the semantic category of an image. The low-level features of the dominant regions of the images in each category are used to find the semantic template of that category, and the SVM-BDT is constructed from these semantic templates. A high-level concept of the query image is obtained using the SVM-BDT. Similarity matching is then done between the query image and the set of images belonging to the semantic category of the query image, and the top images with the least distance are retrieved. The training and testing phases of the SRBIR are explained in the next section.

4.2 PHASES OF THE SRBIR

An image consists of many regions. Features can be extracted either from the whole image or from regions of the image. Normally, the user is more interested in a specific region than in the whole image.

The representation of images at the region level is closer to human perception. CBIR based on image regions is called Region-Based Image Retrieval (RBIR). The SRBIR system compares the dominant region of the query image with the dominant region of each DB image; the dominant foreground region represents the semantics of the image. The block diagram of the training phase is given in Figure 4.1.

[Figure 4.1 Block Diagram of the Training Phase of the SRBIR: training images -> determination of the dominant region of each image -> extraction of the low-level features of the dominant region -> feature DB (feature vectors of the dominant regions) -> construction of the SVM-BDT.]

The dominant foreground region is extracted from each image in the DB, and its low-level features, namely the color and texture features, are extracted and stored in a feature DB. The image DB contains images of several semantic categories. The color-texture semantic template of each semantic category is found using the low-level features of the dominant regions of the images in that category, and these semantic templates are used in constructing the SVM-BDT. Hence, at the end of the training phase, the SVM-BDT is constructed for classification.

Upon receiving a query image feature vector, the SVM-BDT predicts its label, i.e., the semantic category of the query image.

[Figure 4.2 Block Diagram of the Testing Phase of the SRBIR: RGB query image -> determination of the dominant region in the query image -> extraction of the low-level features of the dominant region -> prediction of the class label for the query image using the SVM-BDT -> retrieval of the low-level features of all the images in the predicted class from the feature DB -> determination of the similarity between the features of each image in the predicted class and the query image features -> retrieval of the top 15 images with the least similarity distance from the raw image DB -> resultant images.]

The block diagram of the testing phase is given in Figure 4.2. Given the query image, the dominant foreground region is extracted from it.

The color and texture low-level features are extracted from this dominant region and given as input to the SVM-BDT, which predicts the category of the query image. The low-level features of all the images in the predicted category are taken from the feature DB. The similarity distance is calculated between the low-level features of the dominant region of the query image and those of the dominant region of each image in the predicted category. The similarity distances are sorted in ascending order, and the top images with the least distance from the query image are retrieved from the image DB and displayed. The next section explains the segmentation of the dominant foreground region from the image.

4.3 SEGMENTATION OF THE DOMINANT FOREGROUND REGION

The dominant foreground region occupies most of the space in the image, and the dominant object in the image's foreground usually conveys the semantics of the image. Natural photographs generally contain some dominant foreground region. The SRBIR extracts the dominant foreground region as a solid region rather than as an outline; the extracted region therefore has reduced noise, and the low-level features extracted from it do not have much distortion.

The query image is an RGB image, which uses 24 bits per pixel (8 bits per color). An indexed image is an image in which the pixels do not contain the full specification of their colors, but only an index into a color map; it therefore requires a color map and an image matrix. The color map is an ordered set of values representing the colors in the image, and for each image pixel, the image matrix contains the corresponding index into the color map. The size of the color map is n × 3 for an image containing n colors. Each row of the color map matrix is a 1 × 3 vector

containing the values of the red, green and blue components that correspond to a single color in the image. For each pixel in the image matrix, the color corresponding to its index in the color map is displayed. Hence, in an indexed image, the colors of the image are preserved in the color map, and the image matrix contains only indices into the color map.

The MATLAB command rgb2ind converts an RGB image to an indexed image. Equation (4.1) gives the format of the rgb2ind command:

    [X, map] = rgb2ind(RGB, n)                                        (4.1)

Equation (4.1) converts the RGB image RGB to an indexed image X using minimum variance quantization; map is the color map, which contains at most n colors, where n must be less than or equal to 65536. When the image has more than n colors, minimum variance quantization is applied to reduce the number of colors. The values in the resultant image matrix X are indices into the color map map and cannot be used in filtering operations. For example, an RGB image is read into the variable im using the imread MATLAB command and converted to the indexed image ind_img with the color map col_map:

    im = imread('c:\101_objectcategories\lotus\image_0002.jpg');
    [ind_img, col_map] = rgb2ind(im, 65536);

A portion of the indexed image ind_img is given in Table 4.1, and a portion of the color map is given in Table 4.2. In Table 4.1, the indexed image contains indices into the color map. The pixel value at position (1, 1) of Table 4.1 is 4549; the corresponding color value is obtained from the color map entry at index 4549 in Table 4.2. The RGB value at this index is substituted at pixel position

(1, 1) of the indexed image. In the same way, each pixel value in the indexed image is substituted from the color map and the image is displayed.

Table 4.1 Portion of the Indexed Image (10 rows and 10 columns), >> ind_img(1:10, 1:10)

    Row      1     2     3     4     5     6     7     8     9    10
     1    4549  4549  2821  4853  3652  4328  4853   922  1719   395
     2    3652  4156  4156  4455  2638  1294  4103  3164   681  1046
     3    5296  2330  2945  2945  5299  4365  1382  2171   116  1095
     4    2196   105  3510  3510  1458  1016  2463  2196   646  1730
     5     139  1016  2961   139  2961   219  1978  2961  2928  2408
     6    3919  3011  3958  4282  2167  2019   671  3895  2254  4610
     7    2019  2019  3958  2952  2167  2167  3958   385  3958  3919
     8    2019  4000  2872  3558  1995  3730  3659  4047  4000   671
     9    3349  4047  2872  1432  1995  1995  1257  4047  4000   161
    10    4047   938  1849  1432  2917  3026   207  3363  4000   385

The size of the RGB image im is 296 × 300 × 3: the image itself is 296 × 300, and the values of the red, green and blue components are stored in the image. The size of the indexed image ind_img is 296 × 300, and each of its elements contains an index into the color map. Since the number of colors in the image im is 5534, the size of the color map col_map is 5534 × 3, and each color map entry has three values corresponding to R, G and B, as in Table 4.2. The amount of memory needed to store the RGB image is 296 × 300 × 3 = 266,400 values, whereas the amount needed for the indexed image is 296 × 300 + 5534 × 3 = 105,402 values. Since images normally contain repeated colors, the number of color map entries of an indexed image is small, and hence the amount of memory needed for an indexed image is less. Also, in an indexed image, the colors of the image are preserved in the color map. This memory comparison can be reproduced as sketched below.
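A minimal MATLAB sketch of the memory comparison above; the file path is the placeholder used earlier, and the element counts match the figures just given. Note that when the indexed image is of class uint8 or uint16, rgb2ind returns zero-based indices, so looking up a pixel's color in the map requires adding 1.

    im = imread('c:\101_objectcategories\lotus\image_0002.jpg');   % placeholder path
    [ind_img, col_map] = rgb2ind(im, 65536);

    rgb_vals = numel(im);                         % 296*300*3 = 266,400 values
    ind_vals = numel(ind_img) + numel(col_map);   % 296*300 + 5534*3 = 105,402 values

    % Color of pixel (1, 1): uint16 indices are zero-based, hence the +1
    c = col_map(double(ind_img(1, 1)) + 1, :);    % 1x3 vector of R, G, B in [0, 1]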

Table 4.2 Portion of the Color Map, >> col_map(4543:4550, :)

    Row        R       G       B
    4543    0.7333  0.4353  0.3725
    4544    0.8863  0.9529  0.4431
    4545    0.0706  0.1098  0.1020
    4546    0.9412  0.6745  0.5373
    4547    0.9490  0.7137  0.6745
    4548    0.8549  0.8196  0.6667
    4549    0.7647  0.6627  0.5137
    4550    0.8902  0.9922  0.6078

The given RGB image is converted to an indexed image. The RGB image is also converted to a gray-scale image, and the noise is removed using a median filter; the median filter reduces noise while preserving the edges, replacing each pixel with the median value of the m × n neighborhood around the corresponding pixel of the input image. The edges of the image are found using Canny edge detection. The edges are then smoothed to reduce the number of connected components, which are found using 8-connectivity. The component number of the background is 0. The biggest connected component in the image is found; for the pixels that are in this maximum connected component, the original pixel value from the indexed image is copied, and for all the remaining pixels the value is set to zero.

This biggest connected component is treated as the dominant region. The dominant region obtained is not a solid region; to make it solid, each row of the new image is scanned, and the pixels between two non-zero values are also set to the original value from the indexed image. The obtained solid region is converted back into a color image using the color map. The steps involved in the extraction of the dominant foreground region are presented in Figure 4.3.

Algorithm DOMINANT_FOREGROUND_REGION(im)
The dominant foreground region of the RGB image im is determined.

1. [Obtain the indexed image from the RGB image.]
   img <- imread(im)
   [ind_img, map] <- rgb2ind(img, 65536)
   /* map is the color map, which contains at most 65536 colors. */

2. [Obtain the gray-scale image from the color image.]
   gray_img <- rgb2gray(img)

3. [Remove noise by applying the median filter.]
   filt_img <- medfilt2(gray_img, [3 3])
   /* [3 3] is the size of the neighborhood around the corresponding pixel in the input image. */

4. [Find the edges of the image using the Canny edge detector.]
   BW <- edge(filt_img, 'canny')
   /* The gray-scale image filt_img is the input to the function edge, which returns a binary image BW of the same size as filt_img. BW is 1 wherever the function finds an edge in filt_img, and 0 elsewhere. */

5. [Smooth the edges to reduce the number of connected components.]
   B <- conv2(BW, mask)

   where mask = | 0 0 0 0 0 |
                | 0 1 1 1 0 |
                | 0 1 1 1 0 |
                | 0 1 1 1 0 |
                | 0 0 0 0 0 |

   /* The function conv2(BW, mask) computes the two-dimensional convolution of the matrices BW and mask. */

6. [Find the connected components of the image using 8-connectivity.]
   [L, num] <- bwlabel(B, 8)
   /* bwlabel() returns a matrix L, of the same size as B, containing labels for the connected objects in B, and num, the number of connected objects found in B. */

7. [Find the biggest connected component in the image foreground.]
   max_ind <- label of the biggest connected component

8. [Copy the original indexed-image value for the pixels in the maximum connected component; set all remaining pixels to zero.]
   [M, N] <- size of the image
   Repeat for i = 1, 2, ..., M
       Repeat for j = 1, 2, ..., N
           If L(i, j) = max_ind
               n(i, j) <- ind_img(i, j)
           Else
               n(i, j) <- 0
           End If
       End
   End

9. [Convert the dominant region to a solid region.]
   Repeat for i = 1, 2, ..., M
       Repeat for j = 1, 2, ..., N
           If n(i, j) = 0 and n(i, j) lies between two non-zero values, either in the horizontal or in the vertical direction
               n(i, j) <- ind_img(i, j)
           End If
       End
   End

10. [Convert the solid region into a color image by applying the color map.]

Figure 4.3 Algorithm for Extracting the Dominant Foreground Region of an Image
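The algorithm of Figure 4.3 maps almost line for line onto MATLAB (Image Processing Toolbox). The following is a minimal runnable sketch under that assumption; the file name is a placeholder, and the fill of step 9 is simplified to a row scan between the first and last pixels of the biggest component (a column scan could be added the same way).

    img = imread('image_0002.jpg');               % placeholder file name
    [ind_img, map] = rgb2ind(img, 65536);         % indexed image and color map
    gray_img = rgb2gray(img);                     % gray-scale version
    filt_img = medfilt2(gray_img, [3 3]);         % median filter removes noise
    BW = edge(filt_img, 'canny');                 % binary edge map

    mask = [0 0 0 0 0; 0 1 1 1 0; 0 1 1 1 0; 0 1 1 1 0; 0 0 0 0 0];
    B = conv2(double(BW), mask, 'same') > 0;      % smooth edges, merging components

    [L, num] = bwlabel(B, 8);                     % 8-connected components
    areas = accumarray(L(L > 0), 1);              % pixel count per label
    [~, max_ind] = max(areas);                    % biggest connected component

    % Steps 8 and 9: keep the indexed-image values inside the biggest
    % component and fill each row between its first and last pixels.
    region = zeros(size(ind_img), 'uint16');      % background stays 0
    for i = 1:size(region, 1)
        cols = find(L(i, :) == max_ind);
        if ~isempty(cols)
            region(i, cols(1):cols(end)) = ind_img(i, cols(1):cols(end));
        end
    end

    % Step 10: back to color (uint16 indices are zero-based for ind2rgb)
    rgb_region = ind2rgb(region, map);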

Figure 4.4 shows the result of segmenting the dominant region from an image. The first column shows the original images, and the second column shows the dominant regions of those images obtained by automatic segmentation. The first four images in Figure 4.4, namely the rose, bus, mountain and dinosaur, are segmented properly, while the segmentation of the African man has some noise. Many region-based image retrieval systems are instead based on the selection of a region of interest; Figure 4.5 shows segmentation resulting from user selection, carried out using the SegTool. The region-based image retrieval experiments were carried out using both the automatic segmentation of the dominant foreground region and the region selection method. After obtaining the region of interest, the low-level features are extracted from it. The next section describes how the low-level features are extracted from the dominant region.

[Figure 4.4 Image and its Dominant Foreground Region: original images (left) and their automatically segmented dominant foreground regions (right).]

[Figure 4.5 The Image and the Segmented Region Using the SegTool: original images (left) and the regions selected by the user with the SegTool (right).]

4.4 FEATURE EXTRACTION FROM THE DOMINANT REGION

After the segmentation of the dominant region from the image, its color and texture features are extracted. The mean and the standard deviation of each channel in the HSV color space represent the color features: the six-dimensional color feature vector f_c = (μ_H, μ_S, μ_V, σ_H, σ_S, σ_V) is extracted, where μ is the mean and σ the standard deviation of the corresponding channel, calculated using Equations (2.1) and (2.2).

The texture information is extracted from the gray-level co-occurrence matrix (GLCM). Four co-occurrence matrices for four different orientations (horizontal 0°, vertical 90°, and the two diagonals 45° and 135°) are constructed. The co-occurrence matrix reveals certain properties of the spatial distribution of the gray levels in the image. Higher-order features, namely energy, contrast, homogeneity, correlation and entropy, are measured using Equations (2.4) to (2.8) on each gray-level co-occurrence matrix, forming a five-dimensional feature vector per matrix. Finally, a twenty-dimensional texture feature vector f_t is obtained by concatenating the feature vectors of the four co-occurrence matrices. Thus, a color-texture feature vector of dimension 26 is obtained by concatenation: f_ct = [f_c, f_t]. A sketch of this feature computation is given below.
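A minimal MATLAB sketch of the 26-dimensional feature, assuming region_rgb is a placeholder holding the color image of the dominant region. graycomatrix and graycoprops supply the energy, contrast, homogeneity and correlation measures (graycomatrix defaults to 8 gray levels), and the entropy is computed directly from each normalized matrix; the toolbox definitions may differ slightly from Equations (2.1)-(2.2) and (2.4)-(2.8).

    hsv = rgb2hsv(region_rgb);
    f_c = zeros(1, 6);
    for ch = 1:3
        v = hsv(:, :, ch);
        f_c(ch)     = mean(v(:));   % mu_H, mu_S, mu_V
        f_c(ch + 3) = std(v(:));    % sigma_H, sigma_S, sigma_V
    end

    gray = rgb2gray(region_rgb);
    offsets = [0 1; -1 1; -1 0; -1 -1];            % 0, 45, 90, 135 degrees
    glcms = graycomatrix(gray, 'Offset', offsets);
    props = graycoprops(glcms, {'Energy', 'Contrast', 'Homogeneity', 'Correlation'});

    f_t = [];
    for k = 1:size(glcms, 3)
        p = glcms(:, :, k) / sum(sum(glcms(:, :, k)));   % normalize to probabilities
        ent = -sum(p(p > 0) .* log2(p(p > 0)));          % entropy of this GLCM
        f_t = [f_t, props.Energy(k), props.Contrast(k), ...
               props.Homogeneity(k), props.Correlation(k), ent];
    end

    f_ct = [f_c, f_t];   % 6 color + 20 texture = 26 dimensions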

The color and texture features of the dominant region of the rose image in Figure 4.4 are given in Table 4.3. The next section describes the construction of the semantic template for each semantic category of images.

Table 4.3 The Feature Vectors of the Dominant Region of the Rose Image

    Feature                                Feature Vector
    Color features (6-dimensional):        0.0091  0.0199  0.9296  0.0524  0.1148  0.0029
    (μ_H, μ_S, μ_V, σ_H, σ_S, σ_V)
    Texture features (20-dimensional)      0.1475  0.9186  0.2509  0.9520  0.2237
                                           0.8764  0.2398  0.9379  0.1780  0.9018
                                           0.2461  0.9465  0.2117  0.8830  0.2411
                                           0.9399  0.9130  0.8774  0.9652  0.8774

4.5 SEMANTIC TEMPLATE CONSTRUCTION

A semantic template is the centroid of the low-level features of all the sample regions of a class. For the j-th sample region in class i, where j = 1, 2, ..., 100 and i = 1, 2, ..., 10, the color and texture features are given by {c_j1, c_j2, ..., c_j6} and {t_j1, t_j2, ..., t_j20} respectively. For the first dimension of the color and texture features, the centroid (Liu et al 2008) is calculated using Equations (4.2) and (4.3).

    C_i1 = (1/n) Σ_{j=1}^{n} c_j1,   i = 1, 2, ..., m                  (4.2)

    T_i1 = (1/n) Σ_{j=1}^{n} t_j1,   i = 1, 2, ..., m                  (4.3)

where n is the number of images in each class (here, 100) and m is the number of semantic concepts in the image DB; the remaining dimensions are averaged in the same way. Hence, the color and texture templates of concept i are C_i and T_i, as given in Equations (4.4) and (4.5) respectively.

    C_i = {C_i1, C_i2, ..., C_i6}                                      (4.4)

    T_i = {T_i1, T_i2, ..., T_i20}                                     (4.5)

The color-texture template CT_i of the i-th concept is obtained by concatenating C_i and T_i (CT_i = [C_i, T_i]); hence the color-texture template is also of dimension 26. A color-texture template is found for every image category in the DB, as sketched below.
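A minimal sketch of Equations (4.2) to (4.5), together with the inter-template distance matrix used in the next section. The variables features (an n_total × 26 matrix of dominant-region feature vectors from the feature DB) and labels (the class of each row) are placeholder names.

    m = max(labels);                    % number of semantic concepts
    CT = zeros(m, 26);                  % one 26-dim color-texture template per class
    for i = 1:m
        CT(i, :) = mean(features(labels == i, :), 1);   % centroid of class i
    end

    % Euclidean distances between all templates (cf. Table 4.4)
    D = zeros(m, m);
    for a = 1:m
        for b = 1:m
            D(a, b) = norm(CT(a, :) - CT(b, :));
        end
    end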

The next section describes the construction of the SVM-BDT using these color-texture templates.

4.6 CONSTRUCTION OF THE SVM-BDT

The construction of the SVM binary decision tree consists of two major steps. The first step in constructing the Binary Decision Tree (BDT) is clustering the various classes of the DB images; the second step associates a binary SVM classifier with each non-leaf node of the BDT. The distance between each pair of color-texture templates is found, and the SVM-BDT is constructed by grouping the classes based on these distances. The block diagram for constructing the SVM-BDT is given in Figure 4.6. The SVM-BDT predicts the class of the query image; the statistical distances between the features of the query image and the features of the images in the predicted category are then found, and the top images with the least distance are displayed as the output. The Euclidean, Bhattacharyya and Mahalanobis statistical distance measures are used for finding the similarity between the query image and the DB images.

[Figure 4.6 Block Diagram for the Construction of the SVM-BDT: training images -> finding the dominant region of the images -> extraction of the low-level features -> generation of the color-texture semantic template for every category of images -> determination of the distances between the color-texture templates -> construction of the SVM-BDT by grouping the classes based on the distances.]

The SVM-BDT is constructed for multi-class classification. If K classes are available in the DB, a color-texture template is found for each class, and the Euclidean distances between the color-texture templates of all K classes are computed, giving a K × K distance matrix that is used for the grouping. The two classes with the largest Euclidean distance between them are assigned to the two clustering groups.

The color-texture template of each of these two classes is taken as the cluster center of the corresponding group. After this, the pair of unassigned classes closest to the two cluster centers is found, each class is assigned to the corresponding group, and the cluster center of a group is updated to the color-texture template of the class most recently added to it. The process continues by finding the next pair of unassigned classes, each closest to one of the two clustering groups, assigning them to the corresponding groups, and updating the cluster centers, until all the classes are assigned to one of the two groups. An SVM binary classifier is trained on the samples at each non-leaf node of the decision tree. The classes of the first clustering group are assigned to the first (left) subtree, while the classes of the second clustering group are assigned to the second (right) subtree. The process of recursively dividing each group into two subgroups continues until there is only one class per group, which forms a leaf of the decision tree (Madzarov et al 2009b). This procedure leads to a balanced SVM-BDT. A compact sketch of this node-splitting rule is given below.
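A minimal MATLAB sketch of the splitting rule just described, assuming D is the K × K distance matrix between the class templates (e.g., the matrix D computed in Section 4.5); saved as split_classes.m, it returns the class indices for the left and right subtrees of one node.

    function [G1, G2] = split_classes(D)
    % Split classes into two groups for one SVM-BDT node, given the
    % matrix D of distances between the class color-texture templates.
    K = size(D, 1);
    [~, idx] = max(D(:));                  % farthest pair of classes
    [a, b] = ind2sub(size(D), idx);
    G1 = a;  G2 = b;                       % the pair seeds the two groups
    c1 = a;  c2 = b;                       % current cluster centers
    unassigned = setdiff(1:K, [a b]);
    while ~isempty(unassigned)
        [~, k] = min(D(c1, unassigned));   % class nearest to G1's center
        c1 = unassigned(k);  G1 = [G1 c1]; % it joins G1 and becomes the center
        unassigned(k) = [];
        if isempty(unassigned), break; end
        [~, k] = min(D(c2, unassigned));   % class nearest to G2's center
        c2 = unassigned(k);  G2 = [G2 c2];
        unassigned(k) = [];
    end
    end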

The Euclidean distances between all the semantic templates of the ten classes are given in the distance matrix of Table 4.4.

Table 4.4 Distance between the Color-Texture Templates

          1       2       3       4       5       6       7       8       9       10
     1    0       0.3810  1.0452  1.3704  1.2306  0.4664  1.5630  0.6960  0.5061  0.6365
     2    0.3810  0       0.9709  1.0941  1.1187  0.5287  1.4712  0.6718  0.3021  0.5893
     3    1.0452  0.9709  0       0.6384  0.4896  1.0204  2.3423  1.1159  1.1359  0.7579
     4    1.3704  1.0941  0.6384  0       0.6735  1.4660  2.2797  1.3613  1.2314  0.9916
     5    1.2306  1.1187  0.4896  0.6735  0       1.0968  2.4200  1.0393  1.3012  0.9613
     6    0.4664  0.5287  1.0204  1.4660  1.0968  0       1.4436  0.4623  0.6759  0.7925
     7    1.5630  1.4712  2.3423  2.2797  2.4200  1.4436  0       1.5581  1.4835  1.6909
     8    0.6960  0.6718  1.1159  1.3613  1.0393  0.4623  1.5581  0       0.8545  0.8719
     9    0.5061  0.3021  1.1359  1.2314  1.3012  0.6759  1.4835  0.8545  0       0.7766
    10    0.6365  0.5893  0.7579  0.9916  0.9613  0.7925  1.6909  0.8719  0.7766  0

The highest value in Table 4.4 is 2.4200, between the 5th and 7th classes. Hence, classes c5 and c7 are the farthest apart and are assigned to groups G1 and G2 respectively; the color-texture template of class c5 is taken as the cluster center of group G1, and that of class c7 as the cluster center of group G2. To find the class closest to G1, the class closest to its cluster center is determined, i.e., the smallest non-zero value in the row of c5 is sought. This value is 0.4896, between c5 and c3; hence, class c3 is closest to group G1 and is added to that group. Similarly, the class closest to G2 is found by examining the row of its cluster center c7: the smallest non-zero value in row 7 is 1.4436, between c7 and c6, so class c6 is closest to group G2 and is added to that group. The cluster center of G1 is changed to the color-texture template of the newly added class c3, and the cluster center of G2 is changed to the color-texture template of class c6.

In the next step, the class closest to group G1 is sought: the smallest unassigned non-zero value in the row of c3 is 0.6384, which corresponds to class c4. Hence, class c4 is assigned to group G1, and its cluster center is changed to the color-texture template of class c4. In the same way, the class closest to group G2 is obtained from the smallest unassigned non-zero value in the row of c6, which is 0.4623 and corresponds to class c8. Hence, c8 is assigned to group G2, and the cluster center of G2 is updated to the semantic template of class c8.

[Figure 4.7 SVM-BDT for Semantic Learning: SVM1 at the root separates G1 (c5, c3, c4, c10, c1) from G2 (c7, c6, c8, c2, c9); SVM2 splits G1 into G11 (c4, c3, c5) and G12 (c1, c10), and SVM3 splits G2 into G13 (c8, c2, c9) and G14 (c7, c6); SVM4 to SVM9 continue the splits through G21 (c5, c3) and G22 (c2, c9) until each leaf holds a single class.]

In the next step, the next element of group G1 is obtained by finding the smallest unassigned non-zero value in the row of class c4. The smallest unassigned non-zero value in row 4 of Table 4.4 is 0.9916, which corresponds to class c10; hence, c10 is assigned to group G1, and the cluster center of G1 is updated to the color-texture template of class c10. Similarly, the next element of group G2 is obtained from the smallest unassigned non-zero value in row 8, which is 0.6718 and corresponds to class c2; hence, c2 is assigned to group G2, and the cluster center of G2 is updated to the semantic template of class c2. In the next step, the class closest to group G1 is found from the next smallest unassigned non-zero value in row 10 of Table 4.4. This corresponds to class c1, which is assigned to group G1.

In the same way, the last unassigned class c9 is assigned to group G2. This completes the first round of grouping, which defines the classes that are transferred to the left and right subtrees of the node. Hence, the first-level grouping of the SVM-BDT shown in Figure 4.7 is {c5, c3, c4, c10, c1} and {c7, c6, c8, c2, c9}. The SVM classifier at the root of the SVM-BDT is trained by considering the samples of classes {c5, c3, c4, c10, c1} as positive and the samples of classes {c7, c6, c8, c2, c9} as negative.

The grouping procedure is repeated independently for the classes of the left and right subtrees of the root. The distances between the color-texture templates of the classes in G1 are given in Table 4.5.

Table 4.5 Distance between the Color-Texture Templates of the Classes in G1

           1       3       4       5       10
     1     0       1.0452  1.3704  1.2306  0.6365
     3     1.0452  0       0.6384  0.4896  0.7579
     4     1.3704  0.6384  0       0.6735  0.9916
     5     1.2306  0.4896  0.6735  0       0.9613
    10     0.6365  0.7579  0.9916  0.9613  0

The highest value in Table 4.5 is 1.3704, between classes c4 and c1. Hence, c4 is added to group G11 and c1 is added to group G12. The smallest value in row 4 is 0.6384, corresponding to class c3, so c3 is added to group G11. The next class of group G12 is obtained from the smallest unassigned non-zero value in row 1, which is 0.6365 and corresponds to class c10; hence, c10 is added to group G12. The next class of group G11 is obtained from the smallest unassigned non-zero value in row 3, which is 0.4896 and corresponds to class c5. Hence, the classes in G1 are divided into

two groups: G11, consisting of {c4, c3, c5}, on the left side, and G12, comprising {c1, c10}, on the right side. Similarly, the classes in group G2 are divided into two groups, namely G13, with {c8, c2, c9}, on its left side and G14, with {c7, c6}, on its right side. At the next level, G11 is divided into G21, containing {c5, c3}, and {c4} to the right of SVM4; G12 is divided into the two groups {c1} and {c10}; G13 is divided into G22, with {c2, c9}, and {c8} to the right of SVM6; and G14 is divided into {c7} and {c6}. At the following level, G21 is divided into the two groups {c5} and {c3}, and G22 into {c2} and {c9}. Since each group now contains only a single class, the construction of the SVM-BDT is complete. At each non-leaf node of the SVM-BDT, an SVM binary classifier is trained on the positive and negative samples. The next section describes how the SVM-BDT is used in predicting the class of the query image.

4.7 CLASS PREDICTION USING THE SVM-BDT AND IMAGE RETRIEVAL

In the testing phase, the user submits a query image in order to retrieve the relevant images from the DB. The dominant region is extracted from the query image, its color and texture features are extracted, and this feature vector is given to the SVM-BDT for class prediction. The SVM binary classifier at each non-leaf node routes the input through the SVM-BDT, and thereby the class label of the query image is predicted. For example, suppose a query image belonging to class 4 is submitted to the SVM binary classifier at the root. SVM1 is consulted and outputs +1, since the query image belongs to a category in G1; hence, the tree branches to the left side and SVM2 is visited next. Since the category of the query image belongs to group G11, SVM2 also outputs +1 and the tree branches left, so SVM4 is visited next. SVM4 outputs -1 and the tree branches right, reaching the leaf node c4. Hence, the label of the query image is predicted as 4. The SVM classifiers visited during the classification of a query image of category 4 are thus SVM1, SVM2 and SVM4 (see Figure 4.7).

After predicting the class, statistical similarity measures are applied between the query image and only the images of that particular class, which reduces the search space and the search time. Three popular measures, namely the Euclidean, Bhattacharyya and Mahalanobis distance measures, are used for finding the statistical similarity between the query image and the DB images. The distances between the feature vector of the query image and the feature vectors of the images of the predicted class are computed and sorted in increasing order, and the top k images with the least distance are displayed as the output, as sketched below.
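A minimal sketch of this ranking step with the Euclidean measure, assuming f_q is the 1 × 26 query feature vector and class_feats is a p × 26 matrix of the features of the images in the predicted class (both names are placeholders).

    % Euclidean distance from the query to every image of the predicted class
    d = sqrt(sum(bsxfun(@minus, class_feats, f_q).^2, 2));
    [~, order] = sort(d, 'ascend');               % least distance first
    top_idx = order(1:min(15, numel(order)));     % indices of the top 15 images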

The next section presents the obtained results.

4.8 EXPERIMENTAL RESULTS

In order to verify the effectiveness and efficiency of the SRBIR system, experiments were conducted on the COREL dataset, consisting of 1000 images of sizes 256 × 384 and 384 × 256. The training sample is the fully labeled DB. The automatic segmentation algorithm correctly segmented 86% of the images, obtaining the dominant object in the image; the remaining 14% of the images were not accurately segmented. For the construction of the SVM-BDT, the dominant region of each image is obtained and its color and texture features are extracted; the feature vectors are stored in the feature DB. The semantic template of each category is calculated by taking the mean of the feature vectors of the images in that semantic category. Thus, ten semantic templates are calculated for the ten semantic categories considered. Then, the Euclidean distances between all the semantic templates are calculated and represented as a distance matrix. This distance matrix is used for classification, and the training process continues until each group contains a single class.

The SVM binary classifier trains the nodes at each level and divides each group into two subgroups. Nine SVM binary classifiers are needed to perform the multi-class classification for 10 classes.

[Figure 4.8 Image Retrieval Based on the Segmentation of the Dominant Foreground Region and the SVM-BDT]

The dominant region of the query image is found, and its color-texture feature vector is given as the input to the SVM-BDT classifier for predicting the semantic class of the query image. The MATLAB SVM toolbox has been used for implementing the SVM binary classifiers in the SVM-BDT: the SVM-BDT is constructed at training time, and at testing time it is used to predict the label of the query image, as illustrated below.
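A minimal sketch of training and using the binary classifier of one non-leaf node; fitcsvm from the Statistics and Machine Learning Toolbox is used here as a modern stand-in for the SVM toolbox mentioned above, and X_left/X_right are placeholders for the feature vectors of the images whose classes fell into the node's left and right groups.

    % +1 for the left group, -1 for the right group of this node
    X = [X_left; X_right];
    y = [ones(size(X_left, 1), 1); -ones(size(X_right, 1), 1)];
    node_svm = fitcsvm(X, y, 'KernelFunction', 'rbf', 'Standardize', true);

    % At test time the node routes the query feature vector f_q:
    side = predict(node_svm, f_q);   % +1 -> left subtree, -1 -> right subtree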

The retrieval results obtained using the automatic segmentation of the dominant foreground region with the SVM-BDT, and using region selection with the SegTool and the SVM-BDT, are shown in Figures 4.8 and 4.9. In both methods, the query image is an elephant image and the retrieval results are correct. The performance of the SRBIR is compared with other decision tree induction methods in the following section.

[Figure 4.9 Image Retrieval by Region Selection Using the SegTool and the SVM-BDT]

4.9 COMPARISON OF SRBIR WITH OTHER DECISION TREE INDUCTION METHODS

The results of the SRBIR system are given in Table 4.6. They reveal that if the SVM-BDT is trained with 100% of the training set images, the system produces 100% accuracy, and if it is trained with 75% of the images in the training data set, it produces 95.4% accuracy. The testing time is the same in both cases, while the training time increases with the training set size. The percentage of performance improvement of the SRBIR over the SVM-BDT approach is also given in Table 4.6; the percentage of time reduction is calculated using Equation (3.13).

The performance of the SRBIR, which extracts features from the dominant region, is better than that of the SVM-BDT approach, which extracts features from the whole image. When 75% of the DB images are used for training, the accuracy rate increases by 3.9%, and it increases by 2.4% when 100% of the DB images are used for training; the training and testing times of the two approaches are closely similar, as shown in Table 4.6. The results of the SRBIR system are also compared with those of other decision tree learning methods, namely the DT-ST, ID3 and C4.5, and with the RBIR using region selection and the SVM-BDT; the comparison is given in Table 4.7.

Table 4.6 Results of the SRBIR for the COREL Image Data Set

                            SVM-BDT (whole-image    SRBIR (dominant-region    % improvement of SRBIR
                            features)               segmentation)             over SVM-BDT
    Measure                 75%        100%         75%        100%           75%        100%
    Accuracy rate (%)       91.5       97.6         95.4       100            3.9        2.4
    Training time (sec)     1.04       1.19         1.01       1.19           2.88       0
    Testing time (sec)      0.05       0.05         0.05       0.05           0          0

    (75% / 100%: portion of the training set used for training.)

Table 4.7 Comparison with Different Induction Methods

    Method                                          Classification Accuracy
    SRBIR with dominant region and SVM-BDT          100%
    RBIR with region selection and SVM-BDT          86%
    RBIR with DT-ST (Liu et al 2008)                74.6%
    ID3 (Liu et al 2008)                            63.5%
    C4.5 (Liu et al 2008)                           73.8%

From Table 4.7, it is seen that the SRBIR produces higher accuracy than the existing RBIR techniques, because it extracts and compares the dominant foreground regions. The SRBIR technique is suitable for images that contain a dominant foreground region; since the images of the selected COREL dataset contain dominant regions that convey the semantics of the images, the SRBIR technique, with the extraction of the dominant region and the SVM-BDT, produces good results.

4.10 CONCLUSION

This chapter described the SRBIR system, which looks for features close to the human interpretation of images. The algorithm for the automatic segmentation of the dominant foreground region of an image provides the high-level semantics of the image; the automatic segmentation reduces the noise in the segmentation, and the low-level features of the region are preserved without much distortion. The low-level features are extracted from the dominant region of each image in the DB, and these features are used in training the SVM binary decision tree: the SVM-BDT is trained with the color-texture template of each image category and is then used to predict the class label of the query image. Thus, only the images whose high-level semantics match those of the query image are considered for similarity matching. This reduces the testing time, and the accuracy of the system is promising when compared with other region-based image retrieval techniques. If the query image belongs to a category that is not in the training set, the SRBIR system produces misclassifications; this is the limitation of the SRBIR system. The next chapter discusses a method of adaptively training the SRBIR system with images of new categories.