Short Run Length Descriptor for Image Retrieval


CHAPTER 6
Short Run Length Descriptor for Image Retrieval

6.1 Introduction

In recent years, the volume of multimedia information from various sources has grown manifold. This has created a demand for accurate Content Based Image Retrieval (CBIR) systems. The success of a general purpose CBIR system largely depends upon the effectiveness of the descriptors used to represent images. Low level features such as color, texture and shape have been used to describe the content of images. A large number of CBIR techniques rely heavily on the color and texture information of the image. The success of shape descriptors depends on the accuracy of the image segmentation technique employed [Shrivastava and Tyagi (2012)]. Color and texture can provide the most discriminating information for images [Horng et al. (2012), Wang et al. (2011)]. Color descriptors are extracted using quantization schemes in different color spaces [Gonzalez and Woods (2007)]. The HSV color space is widely used for this purpose, as it is more perceptually uniform and mimics the human eye's understanding of colors. The most commonly used color descriptors are the histogram, color correlograms [Huang et al. (1997)], the Color Edge Co-occurrence Histogram (CECH) [Luo and Crandall (2006)], dominant color [Shrivastava and Tyagi (2013a), Wang et al. (2011)] and the Scalable Color Descriptor (SCD) [Manjunath et al. (2001)]. To extract texture information, the Gray Level Co-occurrence Matrix (GLCM) [Haralick et al. (1973)], histograms, Gabor filters [Portar and Kankarajah (1997), Manjunath and Ma (1996)],
hidden Markov random fields [Cohen et al. (1991)] and local binary pattern based descriptors [Ojala et al. (2002), Murula et al. (2002)] are generally employed. Many researchers have integrated both color and texture into a single descriptor [Xingyuan and Zongyu (2013), Liu et al. (2011)]. The texton based approaches MSD, SED, TCM and MTH have been suggested for combining the color and texture properties of images. However, these approaches have limited capability, as the structure of textons is rigid and many discriminating patterns are left undetected. In this chapter, a novel texture descriptor named the Short Run Length Descriptor (SRLD) and its histogram, the Short Run Length Histogram (SRLH), are proposed. The images are quantized into 72 main colors in HSV color space, and short run lengths of size 2 and 3 are extracted for each color. These sizes are chosen to overcome the limitation of texton based approaches, which use matrices of size 2×2 or 3×3 to extract texture features. The run lengths at each orientation are combined to make the final SRLH. The proposed SRLH can describe the correlation between color and texture in a detailed manner and has the advantages of both the statistical and the structural approaches to extracting texture. SRLH can be seen as an integrated representation of the information gained from all types of textons together.

6.2 Related Work

Various descriptors have been proposed for integrating the color and texture of images [Liu et al. (2010), Liu et al. (2011)]. The Scale-Invariant Feature Transform (SIFT) [Lowe (2004)] has been proposed to detect and describe local features in images. The Micro-Structure Descriptor (MSD) [Liu et al. (2011)] utilizes underlying colors in microstructures with similar edge orientation to represent the color, texture and orientation information of images. MSD uses a 3×3 window to detect microstructures in a quantized image. Only pixels having values similar to the centre
pixel within the 3×3 window are retained to define microstructures. MSD does not provide a detailed correlation of color and texture, since many textural patterns are left undetected when only the centre pixel is considered.

Figure 6.1: Five texton types defined in SED: (a) 0°, (b) 90°, (c) 45°, (d) 135°, (e) no direction

The Structure Element Descriptor (SED) [Xingyuan and Zongyu (2013)] based scheme uses a 2×2 matrix, as shown in Figure 6.1, to extract texture information at different orientations. An SED is detected when pixels having the same value occur in the colored part of the template. The original color image is quantized into 72 colors in HSV color space, and five structure element templates are used to detect the textons. To obtain the final described image of structure elements, SED uses a simple three-step strategy (a sketch is given after the list):
(1) Starting from the origin (0, 0), move the 2×2 SED from left to right and top to bottom with a step length of two.
(2) If the structure element matches the values of the image (a match means that the values of the image under the structure element are equal), the values are retained; otherwise they are discarded.
(3) The final SED map, denoted by S(x, y), is obtained by taking the union of the five SED maps.
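To make the scanning procedure concrete, a minimal Python sketch of this SED map extraction is given below. The boolean template masks, the function names and the use of NumPy are illustrative assumptions, not the original implementation; the value -1 simply marks positions that were given up.

```python
import numpy as np

# Assumed 2x2 structure element masks (True marks the "colored" cells of Figure 6.1
# that must hold equal values); the exact encoding is an assumption for this sketch.
SED_TEMPLATES = {
    "0_deg":   np.array([[True,  True ], [False, False]]),
    "90_deg":  np.array([[True,  False], [True,  False]]),
    "45_deg":  np.array([[False, True ], [True,  False]]),
    "135_deg": np.array([[True,  False], [False, True ]]),
    "none":    np.array([[True,  True ], [True,  True ]]),
}

def extract_sed_map(quantized, template):
    """Slide a 2x2 SED over a quantized image (signed ints 0..71) with step length 2.

    Pixel values under the colored cells are kept only when they are all equal;
    every other position is marked -1 (i.e. "given up").
    """
    h, w = quantized.shape
    sed_map = np.full_like(quantized, -1)
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            block = quantized[y:y + 2, x:x + 2]
            vals = block[template]
            if np.all(vals == vals[0]):            # all colored cells agree
                sed_map[y:y + 2, x:x + 2][template] = vals[0]
    return sed_map

def extract_final_sed_map(quantized):
    """Union of the five SED maps: keep a pixel wherever any template retained it."""
    final = np.full_like(quantized, -1)
    for template in SED_TEMPLATES.values():
        m = extract_sed_map(quantized, template)
        final = np.where(final == -1, m, final)
    return final
```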

Figure 6.2: Extraction of the SED map: (a), (b), (c), (d) and (e) are the processes of extracting the SED map using the five structure elements, respectively; (f) is the final SED map

Figure 6.2 shows an example illustrating the above SED map extraction process. Figure 6.2(a) shows the process of extracting SED map S1(x, y); Figures 6.2(b), (c), (d) and (e) are the extraction processes of SED maps S2(x, y), S3(x, y), S4(x, y) and S5(x, y), respectively. Figure 6.2(f) shows the fusion of the five maps into the final SED map S(x, y). Figure 6.3 is a simple example that explains the principle of extracting the Structure Element Histogram (SEH). Figure 6.3(a) shows the process of computing the SEH of the simple image for the color value 1, and the result is {2, 2, 2, 2, 1}. Figures 6.3(b)-(e) are the processes of computing the SEH for the values 2, 3, 4 and 5. The final SEH of the simple image is {2, 2, 2, 2, 1, 0, 1, 1, 2, 0, 1, 0, 1, 1, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1}. The SEH is extracted over 72 bins in HSV color space, and the five structure elements are extracted in every bin, so a total of 360 elements are extracted in HSV color space. SED is not capable of extracting complete texture information, as run lengths of odd size are not detected properly; SEDs are also redundant and overlap each other, since the fifth SED is detected whenever the other four are detected and vice-versa. Texton Co-occurrence Matrix (TCM) [Liu and Yang (2008)] based techniques use the textons specified in Figure 6.4. To extract the texture, the textons are moved over the image from left-to-right and top-to-bottom with one pixel as the step length. If the pixel values that fall in the texton template are the same, those pixels form a texton and their values are kept as the original values; otherwise the values are set to zero. Each texton template leads to a texton image, and the five texton templates lead to five texton images. Finally, all texton images are combined to form a single image.
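For comparison with SED, a compact Python sketch of this TCM texton extraction follows; the boolean template masks, the function names and the use of an element-wise maximum to merge the five texton images are assumptions, since the text only states that the texton images are combined.

```python
import numpy as np

def tcm_texton_image(quantized, template):
    """Slide one 2x2 texton template (boolean mask of cells that must match,
    cf. Figure 6.4) over the quantized image with step length 1.

    Where the pixel values under the template cells are all equal, the original
    values are kept; everywhere else the output stays zero.
    """
    h, w = quantized.shape
    texton = np.zeros_like(quantized)
    for y in range(h - 1):
        for x in range(w - 1):
            block = quantized[y:y + 2, x:x + 2]
            vals = block[template]
            if np.all(vals == vals[0]):
                texton[y:y + 2, x:x + 2][template] = vals[0]
    return texton

def tcm_combined_image(quantized, templates):
    """Combine the five texton images into a single image (element-wise maximum,
    an assumed interpretation of "combined")."""
    combined = np.zeros_like(quantized)
    for t in templates:
        combined = np.maximum(combined, tcm_texton_image(quantized, t))
    return combined
```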

Figure 6.3: SEH extraction method: (a), (b), (c), (d) and (e) are the processes of extracting the SEH of the five bins, respectively. The number under each structure element is the count of that structure element for the corresponding bin.

Figure 6.4: Five special textons used in TCM

Multi-Texton Histogram (MTH) [Liu et al. (2010)] based image retrieval integrates the advantages of the co-occurrence matrix and the histogram by representing the attributes of the co-occurrence matrix through a histogram. MTH uses the first four of the five SED textons to extract texture from the image. Like SED, MTH is also not able to represent the full contents of images. In Chen et al. (2010), an adaptive color feature extraction scheme using the Binary Quaternion-Moment Preserving (BQMP) thresholding technique is used to describe the color distribution of an image.

6.3 Color Quantization in HSV Color Space

Color is the most commonly used feature in CBIR, since it is not affected by rotation, scaling or other transformations of the image. Color features are generally represented by the color histogram, whose computation requires quantization of the selected color space. In this chapter, we have used the HSV (Hue, Saturation, Value) color space since it is more perceptually uniform than other color spaces [Swain and Ballard (1991), Tsang and Tsang (1996)]. The number of colors is reduced by quantization in the hue, saturation and value planes. The quantized color image is then used to extract texture features using SRLD. This process integrates the color and texture features into a single descriptor, the SRLH. To obtain a quantized image having 72 colors [Liu and Kong (2011)], the images are converted from RGB to HSV color space and a non-uniform quantization is applied as given below:
H = \begin{cases}
0, & h \in [345, 360] \cup [0, 24] \\
1, & h \in [25, 49] \\
2, & h \in [50, 79] \\
3, & h \in [80, 159] \\
4, & h \in [160, 194] \\
5, & h \in [195, 264] \\
6, & h \in [265, 284] \\
7, & h \in [285, 344]
\end{cases}
\qquad
S = \begin{cases}
0, & s \in [0, 0.15] \\
1, & s \in [0.15, 0.8] \\
2, & s \in [0.8, 1]
\end{cases}
\qquad
V = \begin{cases}
0, & v \in [0, 0.15] \\
1, & v \in [0.15, 0.8] \\
2, & v \in [0.8, 1]
\end{cases}
\qquad (6.1)

A one-dimensional color feature is then constructed as P = 9H + 3S + V. Each image is thus quantized to 72 main colors, and the SRLD is computed on the quantized image to finally obtain the SRLH feature of the image.
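A minimal Python sketch of this 72-color quantization is given below, assuming hue in degrees and saturation and value scaled to [0, 1]; the function names are illustrative only.

```python
import numpy as np

# Hue ranges (degrees) mapping to H = 1..7; hues in [345, 360] or [0, 24] map to H = 0.
HUE_RANGES = [(25, 49, 1), (50, 79, 2), (80, 159, 3), (160, 194, 4),
              (195, 264, 5), (265, 284, 6), (285, 344, 7)]

def quantize_hsv(h, s, v):
    """Quantize one HSV pixel (h in degrees, s and v in [0, 1]) to an index in 0..71.

    Implements Eq. (6.1) and P = 9H + 3S + V.
    """
    H = 0                                   # default: h in [345, 360] U [0, 24]
    for lo, hi, label in HUE_RANGES:
        if lo <= h <= hi:
            H = label
            break
    S = 0 if s <= 0.15 else (1 if s <= 0.8 else 2)
    V = 0 if v <= 0.15 else (1 if v <= 0.8 else 2)
    return 9 * H + 3 * S + V

def quantize_image(hsv):
    """Quantize an H x W x 3 float array of (h, s, v) pixels to a map of 72 color indices."""
    out = np.empty(hsv.shape[:2], dtype=np.int32)
    for y in range(hsv.shape[0]):
        for x in range(hsv.shape[1]):
            out[y, x] = quantize_hsv(*hsv[y, x])
    return out
```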

6.4 Short Run Length Descriptor (SRLD)

Color, texture and shape features are extensively used to represent images in content based image retrieval systems. After quantization of the image to 72 colors in HSV space, texture information can be extracted using statistical and structural methods. A Structure Element Descriptor (SED) can describe the color and texture features of the image. A typical SED is a 2×2 matrix and can extract pairs of repeating pixels occurring at different orientations in the image. In addition, a 3×3 SED can be used to extract the same type of information for larger run lengths. SED has the limitation that only one type of SED can be used at a time; therefore it cannot describe all repetitive structures in the image. Figure 6.5(a) shows an example part of an image having a run length of 3, wrongly represented by a 2×2 SED as of length 2. This shows that run lengths of odd size cannot be represented by SED. Figure 6.5(b) shows a pair of 1s left undetected when a 2×2 SED is moved over the image with a step length of 2. From Figure 6.5, it is obvious that SED based methods can only represent the local characteristics of an image and lack a detailed analysis of texture over the whole image. To integrate the color and texture information in a single descriptor, including finer details of spatial correlation, we have proposed a more effective Short Run Length Descriptor (SRLD).

Figure 6.5: An example showing (a) a run length of 3 (pixels 1 1 1 5 8 7) described by SED as of length 2, and (b) an undetected pair of 1s (pixels 3 2 5 4 1 1)

Capturing texture information using structuring elements is not flexible and may result in the loss of some important discriminating texture patterns. The SRLD uses run lengths of size at most 3 to describe different texture structures and is hence able to describe all repetitive texture patterns in the image. The run length size is limited to 2 and 3, as combinations of 2 and 3 can describe any odd or even number. To capture orientation information, the run lengths are extracted at 0°, 45°, 90° and 135° for each quantization level in the HSV color space. The process of extracting the SRLD can be described by a simple three-step strategy (a sketch follows the list):
1. Starting from (0, 0), scan each row of pixels from top to bottom. To avoid extracting a wrong run length, the counting of pixels terminates at the end of each row and restarts at the beginning of the next row.
2. Compute run lengths of size at most 3 pixels, excluding those of length 1. If a run is longer than 3, break it into multiple smaller run lengths of size 2 and 3.
3. Count the number of run lengths of size 2 and 3 for each color to make the final run length representation.
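The Python sketch below illustrates the 0° scan of this strategy; the function names and the particular rule for splitting a long run into pieces of size 2 and 3 are assumptions consistent with, but not prescribed by, the steps above.

```python
import numpy as np

def split_run(length):
    """Split a run of `length` >= 2 into pieces of size 2 and 3.

    The text only states that runs longer than 3 are broken into runs of size
    2 and 3; using one 3 when the length is odd is an assumption of this sketch.
    """
    if length % 2 == 0:
        return {2: length // 2, 3: 0}
    return {2: (length - 3) // 2, 3: 1}

def srld_horizontal(quantized, n_colors=72):
    """Count short run lengths of size 2 and 3 per color along rows (0 degrees).

    Returns an (n_colors, 2) array: column 0 holds counts of runs of size 2,
    column 1 holds counts of runs of size 3. Other orientations scan columns
    and diagonals in the same way.
    """
    counts = np.zeros((n_colors, 2), dtype=np.int64)
    for row in quantized:
        run_color, run_len = row[0], 1
        for pixel in list(row[1:]) + [None]:      # None flushes the last run of the row
            if pixel == run_color:
                run_len += 1
                continue
            if run_len >= 2:                       # runs of length 1 are ignored
                pieces = split_run(run_len)
                counts[run_color, 0] += pieces[2]
                counts[run_color, 1] += pieces[3]
            run_color, run_len = pixel, 1
    return counts
```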

The above steps extract the SRLD at an orientation of 0°. For the other orientations, the image is scanned column by column and diagonal by diagonal. The outcome of this process is a total of four run length representations, one for each orientation. It can easily be observed that this run length representation is similar to texton based methods but with a more detailed texture analysis.

6.5 Short Run Length Histogram (SRLH)

The run lengths computed above contain two entries for each color: the first entry gives the number of run lengths of size 2 and the second gives the number of run lengths of size 3 in that orientation. All these run lengths are combined to form a single representation having 8 entries per color. The first four entries give the total number of run lengths of size 2 and the other four entries give the total number of run lengths of size 3 in each of the four orientations, respectively. The final representation is expressed as a histogram having 72×8 bins.

Figure 6.6: Extraction of the short run length histogram at an orientation of 0°. The example image (rows: a a a b b c c c d d / a e f f b b b b b d / c c c f f f a a a a / d c d d d e e e b b / c c c f f f f f f f) gives the runs 3a 2b 3c 2d, 1a 1e 2f 5b 1d, 3c 3f 4a, 1d 1c 3d 3e 2b, 3c 7f; splitting these into sizes 2 and 3 gives the counts 2(2a) 1(3a) 3(2b) 1(3b) 0(2c) 3(3c) 1(2d) 1(3d) 0(2e) 1(3e) 3(2f) 2(3f), i.e. the histogram 2 1 2 1 0 2 1 1 0 1 3 2.

The method of SRLH computation is illustrated in Figure 6.6. For simplicity, the quantized colors in HSV color space are denoted by the letters a, b, c, d, e and f. The technique is illustrated using 6 colors; therefore the SRLH at each orientation contains 6×2, i.e. 12 bins. In the actual experiments, 72 colors are used and the histogram produced at each orientation contains 72×2, i.e.
144 bins.
The histograms at the other three orientations are computed in a similar manner. All the resulting histograms are merged to obtain a single histogram, as shown in Figure 6.7.

Figure 6.7: The process of combining the histograms at the four orientations into a single histogram: 0° (2 1 3 1 0 3 1 1 0 1 3 2), 45° (1 0 0 0 0 1 0 0 0 0 0 0), 90° (1 0 1 0 0 1 1 0 0 0 1 0) and 135° (0 0 2 0 0 1 0 0 0 0 0 0) combine to 2110 1000 3012 1000 0000 3111 1010 1000 0000 1000 3010 2000

The combined SRLH contains 12×4 (i.e. 48) bins for the case of 6 colors. In the actual experiments the final SRLH has a total of 72×8 (i.e. 576) bins. It may easily be noticed that the SRLH is similar to a texton histogram but with higher texture detail. For example, in SED each color is represented by 5 bins corresponding to the 5 textons shown in Figure 6.1, whereas in the present method each color is represented by 8 bins corresponding to two run length sizes and four orientations. Figure 6.8 shows four images and their corresponding SRLHs. It can easily be observed from the figure that the SRLHs of similar images are similar, which confirms the effectiveness of the SRLH in representing images. When an image is scaled, the number of pixels in the image changes, so the SRLH of the original and the scaled image may differ. This problem can be solved by keeping the proportion of pixels the same in both images, which is achieved by normalization. Let C_i (0 ≤ i ≤ 71) denote the quantized colors in HSV color space, and let R^n_{1i}, R^n_{2i}, R^n_{3i} and R^n_{4i} denote the number of run lengths of color i of size n at each of the four orientations, respectively, where n is either 2 or 3. The normalized value can be computed as:

r^n_{ji} = \frac{R^n_{ji}}{\sum_{j=1}^{4} R^n_{ji}}, \qquad (6.2)

where r^n_{ji} is the normalized bin value for orientation j and color i. The normalized bin values for n = 3 are computed similarly. Therefore each color is represented by 8 bins in the SRLH.

6.6 Similarity Measure

The normalized histograms of the query and target images are compared using the chi-square distance, as it produces the best results for the proposed approach. Let Q and T be the histograms of the query and target images; the chi-square distance is computed as:

D_{\chi^2}(Q, T) = \sum_{i=1}^{576} \frac{(Q_i - T_i)^2}{Q_i + T_i} \qquad (6.3)

We have performed experiments comparing the chi-square distance with the commonly used Euclidean distance. The results verify that the proposed approach with the chi-square distance outperforms the Euclidean distance based approach.
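A small Python sketch of the normalization in Eq. (6.2) and the chi-square comparison of Eq. (6.3) is given below; the array layout (colors × run-length sizes × orientations) and the small epsilon guarding empty bins are assumptions made for the example.

```python
import numpy as np

def normalize_srlh(counts):
    """Normalize raw SRLD counts per Eq. (6.2).

    `counts` has shape (72, 2, 4): colors x run-length sizes (2, 3) x
    orientations (0, 45, 90, 135 degrees). Each group of four orientation
    bins is divided by its sum; groups that are all zero are left as zero.
    """
    totals = counts.sum(axis=2, keepdims=True).astype(np.float64)
    totals[totals == 0] = 1.0                      # avoid division by zero for empty groups
    return counts / totals

def chi_square_distance(q, t, eps=1e-12):
    """Chi-square distance of Eq. (6.3) between two flattened 576-bin SRLHs."""
    q = q.ravel().astype(np.float64)
    t = t.ravel().astype(np.float64)
    return np.sum((q - t) ** 2 / (q + t + eps))    # eps guards bins that are zero in both

# Usage sketch: given raw counts for a query and a target image,
#   srlh_q = normalize_srlh(counts_q); srlh_t = normalize_srlh(counts_t)
#   d = chi_square_distance(srlh_q, srlh_t)
```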

Figure 6.8: SRLH of images (a), (b), (c) and (d)

6.7 Experimental Results

To demonstrate the performance of the proposed descriptor, experiments are performed on the MPEG-7 Common Color Database (CCD) [Martinez et al. (2002)] (dataset-1) and the Corel 11000 database (dataset-2) [Wang et al. (2001)]. Dataset-1 (CCD) consists of 5000 images and a set
of 50 Common Color Queries (CCQ), each with specified ground truth images. The CCD consists of a variety of still images produced from stock photo galleries, consecutive frames of newscasts, sports channels and animations. The effectiveness of the individual descriptors is measured using the Average Normalized Modified Retrieval Rank (ANMRR). ANMRR not only determines whether a correct answer is found in the retrieval results but also accounts for the rank of that answer in the results; a lower ANMRR value represents better performance. In these experiments we used as ground truth the groups of images proposed in the MIRROR image retrieval system [Wong et al. (2005)]. Corel (dataset-2) is the most widely used dataset for evaluating the performance of image retrieval applications. It contains 110 categories of images with 100 images in each class, covering a variety of semantic topics such as eagle, gun, horse, flower, sunset, etc. The commonly used performance measures precision and recall are used to judge the retrieval accuracy. Precision (P) and Recall (R) are defined as:

P = \frac{m}{n}, \qquad (6.4)
R = \frac{m}{t}, \qquad (6.5)

where m is the number of relevant images retrieved, n is the total number of images retrieved and t is the total number of relevant images in the database for the query image. To evaluate the retrieval performance on dataset-1, the 50 CCQ images are used as queries, precision and recall values are computed for each, and the mean precision and recall are computed from the obtained precision-recall pairs.
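As a worked illustration of Eqs. (6.4) and (6.5), a small Python helper is sketched below; the function name and the ranked-list interface are assumptions.

```python
def precision_recall(retrieved, relevant):
    """Compute Precision = m/n and Recall = m/t for one query.

    `retrieved` is the ranked list of image ids returned by the system (n items);
    `relevant` is the collection of ground-truth relevant image ids (t items).
    """
    retrieved_set = set(retrieved)
    m = len(retrieved_set & set(relevant))   # relevant images actually retrieved
    n = len(retrieved)
    t = len(relevant)
    return (m / n if n else 0.0, m / t if t else 0.0)

# Usage: p, r = precision_recall(top_20_ids, ground_truth_ids)
```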

To perform experiments on dataset-2, we randomly selected 25 categories of images, including African people, beaches, flowers, horses, dinosaurs, sunsets, cars, etc., and 20 query images from each category are used to compute the mean precision-recall pairs. Figure 6.9 shows the retrieval performance of the proposed method using 36, 72 and 128 bins in HSV space. It can easily be observed from the figure that the average retrieval rate for 72 and 128 bins is almost the same and is higher than that of the approach using 36 bins. Increasing the number of bins increases complexity and computation time; therefore in this work we have used 72 bins in the HSV color space.

Figure 6.9: Average precision-and-recall (P vs. R) of the SRLH using 36, 72 and 128 bins on (a) dataset-1 and (b) dataset-2

Figure 6.10 shows the retrieval performance comparison of the proposed SRLH with three other methods: MTH, MSD and SED. It can be observed that SRLH outperforms the others on
both Dataset-1 and Dataset-2. The MSD based method does not capture various discriminating texture patterns and hence has a limited capability of describing the color and texture of the image. The MTH and SED based methods have a rigid texton structure which does not always fit the different texture patterns well and hence may lose significant texture detail. The proposed SRLH is flexible and can represent detailed texture information in the quantized color image. It can represent the combined information gained from all types of textons of size 2×2 and 3×3 together in a single descriptor. Also, in SRLD, orientation is captured without overlap.

Figure 6.10: Average precision-and-recall (P vs. R) of MTH, MSD, SED and SRLH on (a) dataset-1 and (b) dataset-2

In Figure 6.10(a), for the top 10 images the average precision of the SED, MSD and MTH based methods is 72%, 65% and 61%, respectively. At this point SRLH outperforms the others with an average precision of 78%. For 100 retrieved images, the precision of SRLH drops to 28%, which is higher than MSD and
MTH but slightly less than SED. This clearly indicates that SRLD has the best overall results in comparison with the other methods. Similar conclusions can be drawn from Figure 6.10(b), using dataset-2. Table 6.1 shows the comparison of retrieval performance in terms of ANMRR. It can be observed that the proposed method has lower ANMRR values, indicating better performance.

Table 6.1: ANMRR obtained for different methods

Dataset      MTH     MSD     SED     SRLH
Dataset-1    0.582   0.467   0.412   0.324
Dataset-2    0.675   0.624   0.562   0.452

Figure 6.11: Average precision-recall (P vs. R) of TCM, EOAC, CSD and SRLH on (a) Dataset-1 and (b) Dataset-2

The performance comparison of the proposed method with the Edge Orientation Auto-Correlogram (EOAC) [Mahmoudi et al. (2003)] and TCM methods is shown in Figure 6.11(a). EOAC represents edge orientations and their correlation with other edges. It can represent the shape
feature of an image well, but the color and texture information is lost. The TCM based approach uses 2×2 textons to extract texture from the quantized color image. It consumes a lot of time in moving each texton over the image and finally combining the images corresponding to each texton; moreover, many useful texture patterns remain undetected due to the rigid structure of the textons. The Color Structure Descriptor (CSD) [Martinez et al. (2002)] is also based on color histograms, but aims at identifying localized color distributions using a small structuring window. The present SRLD based technique performs well as it extracts finer details of texture orientation and correlates them with the spatial distribution of colors. Similar conclusions can be drawn from Figure 6.11(b), using dataset-2. Finally, three examples of retrieval results from the proposed system, taking three query images from the Corel dataset, are shown in Figures 6.12-6.14. The top 20 retrieved images are shown for each query image. The top left image in each figure is the query image; the other images are the images retrieved by the system as similar to the query.

Figure 6.12: Image retrieval for dinosaurs
Figure 6.13: Image retrieval for flowers

Figure 6.14: Image retrieval for horses

6.8 Conclusion

In this chapter, a short run length descriptor for content based image retrieval is proposed which can represent the color, texture and orientation information of the whole image in a compact and intuitive manner. The image is first quantized into 72 colors in HSV color space. The SRLD is extracted by scanning the image in the row, column and diagonal directions. In each scan, the number
of short run lengths of size 2 and 3 for each color is computed. The SRLDs at each of the four orientations are combined to give the final SRLH. The proposed SRLH can better represent the correlation between color and texture and can describe the texture information extracted from all types of textons in a single descriptor. In addition, texton based approaches like SED, EOAC, TCM and MTH consume more time in texton analysis and in moving textons over the images. The proposed approach is faster, as only the run lengths in each orientation have to be extracted from the image to construct the feature vector. The experimental results on representative databases have shown that the proposed approach outperforms the others significantly and hence can be used effectively in CBIR.