Invariant Features of Local Textures: a rotation invariant local texture descriptor

Pranam Janney and Zhenghua Yu(1)
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
National ICT Australia (NICTA)(2), Sydney, Australia
pranam.janney@nicta.com.au, zhyu@ieee.org

(1) Zhenghua Yu is no longer associated with National ICT Australia or the School of Computer Science and Engineering, University of New South Wales.
(2) National ICT Australia (NICTA) is funded by the Australian Government's Department of Communications, Information Technology and the Arts (DCITA) and the Australian Research Council through Backing Australia's Ability and the ICT Research Centre of Excellence programs.

Abstract

In this paper we present a new rotation-invariant texture descriptor algorithm called Invariant Features of Local Textures (IFLT). The proposed algorithm extracts rotation invariant features from a small neighbourhood of pixels around a centre pixel, i.e. a texture patch. An intensity vector derived from a texture patch is normalised and filtered with Haar wavelets to derive rotation invariant features. Texture classification experiments on the Brodatz album and the Outex database have shown that the proposed algorithm achieves a high rate of correct classification.

1. Introduction

Texture classification is a fundamental low-level processing step in image analysis and computer vision. Even when images or videos are captured using state-of-the-art cameras or sensors, they are subject to geometric distortions (e.g. translation, rotation, skew, and scale) due to varying viewpoints; hence affine-invariant descriptors are required for the analysis of real-world texture images/patches. There are numerous algorithms in the open literature for texture feature extraction and classification [14], [13]. The vast majority of these algorithms make an explicit or implicit assumption that all images are captured under the same orientation (i.e., there is no inter-image rotation). However, no matter how a given texture patch is rotated, it is always perceived as the same texture by a human observer. Therefore, from both the practical and the theoretical point of view, rotation invariant texture classification is highly desirable.

The first approaches to rotation invariant texture description include generalized co-occurrence matrices [10], polarograms [5], and texture anisotropy [4]. Researchers in [6] derived texture features with a short computation time by applying a partial form of Gabor functions; these features were then transformed into 2-D closed shapes, whose moment invariants and global shape descriptors were used to classify rotated textures. Other researchers have used Gabor wavelets and other basis functions to derive rotation invariant features [17], [7], [8], [9]. However, these techniques derive texture features at a global level (i.e. over the whole image). Such global textures are not very distinctive when there are texture variations across the image, so local texture descriptors are preferred for describing the textures in an image [12]. Using a circular neighbour set, Porter and Canagarajah [12] presented rotation invariant generalisations for all three mainstream paradigms: wavelets, GMRF, and Gabor filtering. Using similar circular neighbourhoods, Arof and Deravi obtained rotation invariant features using the 1-D DFT [2]. A comprehensive survey of existing texture classification techniques is available in Zhang and Tan [17]. In [16], Varma and Zisserman present an approach to material classification based on a texture model built from 3-D texton representations; the model is based on the statistical distribution of clustered filter responses.
Despite its importance, work on rotation invariant local texture analysis is still limited. Recently, researchers in [15] developed a new local texture descriptor called the Local Binary Pattern (LBP). The method is based on recognising that certain local binary patterns, termed "uniform", are a fundamental property of local image textures, and their occurrence histogram is shown to be a very powerful texture feature. They derive a generalised gray-scale and rotation invariant operator representation that allows the uniform patterns to be detected for any quantisation of the angular space and for any spatial resolution, and they present a method for combining multiple operators for multiresolution analysis. However, LBP generates very distinct descriptors for textures which are similar but not the same; it is therefore difficult to measure the similarity of two such textures using LBP descriptors.

In this paper, we propose a new Invariant Features of Local Textures (IFLT) algorithm which can be used to generate rotation invariant features from local textures. Section 2 describes the proposed approach in detail. We provide details of our experimental setup, results and analysis in Section 3.

1-4244-1180-7/07/$25.00 ©2007 IEEE

2. Invariant Features of Local Textures

Texture is essentially the behaviour of image intensities over an area. Texture images usually have complex intensity gradient fields. Researchers in [14] derived scaling laws from intensity gradient fields and thus obtained a similarity measure for texture retrieval. Considering the pixel intensities in a small image neighbourhood provides an approximate measure of the gradient in that specific neighbourhood of image pixels. This forms the basis of the proposed Invariant Features of Local Textures (IFLT) algorithm.

Figure 1. 3×3 neighbourhood of pixels.

Consider a 3×3 neighbourhood of pixels as shown in Figure 1. True circular symmetry around the centre pixel X_C can be achieved by recalculating pixel intensities at the coordinates given by

    X_i = ( R sin(2πi/P), R cos(2πi/P) )        (1)

where X_i is the equivalent position of the i-th pixel in the circularly symmetric neighbourhood around the centre pixel, with radius R and P neighbouring pixels. In the work that follows R is set to unity.
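Equation (1) can be sketched in code as follows. This is an illustrative helper, not from the paper: the function name is ours, and we add the offsets of Equation (1) to an explicit centre coordinate (x_c, y_c).

```python
import math

def circular_neighbours(xc, yc, P=8, R=1.0):
    """Coordinates of P points evenly spaced on a circle of radius R
    around the centre pixel (xc, yc), per Equation (1)."""
    return [(xc + R * math.sin(2.0 * math.pi * i / P),
             yc + R * math.cos(2.0 * math.pi * i / P))
            for i in range(P)]
```

For i = 0 this yields the point directly above the centre at distance R; the remaining points follow clockwise around the circle.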
The gray values of neighbours which do not fall exactly at pixel centres are estimated by interpolation. With X_C as the centre pixel, calculating the gradient of intensity in all directions with reference to the centre pixel yields gradient components which are approximately scale invariant. The gradient intensities around a centre pixel can be written as a one-dimensional vector, as shown in Equation (2):

    I = [ I_C - I_0, I_C - I_1, ..., I_C - I_7 ]        (2)

where I is a one-dimensional vector, I_C is the intensity of the centre pixel and I_0, ..., I_7 are the intensities of the surrounding neighbourhood. Performing a simple normalisation of this one-dimensional vector further enhances scale invariance:

    I_norm = I / max(I)        (3)

The vector thus derived represents the intensity gradient around the centre pixel and is also (partially) illumination invariant. It can be seen from Figure 1 that any rotation of the image results in a cyclic shift of the one-dimensional vector of Equation (2); that is, rotations in image space correspond to shifts in the transformed space.

The discrete wavelet transform (DWT) of a signal I is calculated by passing it through a series of filters [11]. In this work Haar wavelets were used because of their computational efficiency. The required filter coefficients are given in Equation (4):

    h = [ 1/√2, -1/√2 ],   g = [ 1/√2, 1/√2 ]        (4)

The signal is decomposed simultaneously using the high-pass filter h and the low-pass filter g. The outputs of the high-pass filter are known as the detail coefficients, and those of the low-pass filter as the approximation coefficients. The filter outputs are then downsampled by 2. The n-th component of downsampling a vector y by k may be written as

    (y ↓ k)[n] = y[kn]        (5)
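A minimal sketch of the per-pixel feature computation of Equations (2)-(6), assuming one wavelet step and the mean and standard deviation of the squared filter outputs as the four features. All names are ours; Equation (3) divides by max(I), whereas for robustness this sketch divides by max(|I|) and guards against a zero maximum, which the paper does not discuss.

```python
import numpy as np

# Haar filter coefficients of Equation (4).
g = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (approximation)
h = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass (detail)

def local_texture_features(centre, neighbours):
    """Four IFLT features for one centre pixel and its 8 (interpolated)
    neighbour intensities: mean and std of the energies of the low-pass
    and high-pass Haar filter outputs."""
    I = centre - np.asarray(neighbours, dtype=float)   # Eq. (2)
    m = np.max(np.abs(I))
    I = I / m if m > 0 else I                          # Eq. (3), guarded
    y_low = np.convolve(I, g)[::2]                     # Eq. (6): (I * g) down 2
    y_high = np.convolve(I, h)[::2]                    # Eq. (6): (I * h) down 2
    e_low, e_high = y_low ** 2, y_high ** 2            # energy distributions
    return np.array([e_low.mean(), e_low.std(),
                     e_high.mean(), e_high.std()])
```

A usage example: a flat neighbourhood (all neighbours equal to the centre) produces an all-zero gradient vector and hence all-zero features.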
where ↓ denotes the downsampling operator. Noting that the wavelet transform corresponds to a convolution followed by downsampling by 2, the filter outputs can be written more concisely as

    y_low = (I * g) ↓ 2,   y_high = (I * h) ↓ 2        (6)

The detail and approximation coefficients have shift invariant energy distributions. In the experiments described below we use the mean and standard deviation of the energy distributions of the high-pass and low-pass filter outputs, generated by one step of the wavelet transform of the vector in Equation (2), as the texture features. These features are inherently scale and rotation invariant for a small 3×3 neighbourhood of pixels.

The next step in building a texture descriptor is to extract the statistical distribution of local texture features within a texture image patch. Given an M×N patch of pixels, the following steps are performed:

1. A 3×3 sliding window is applied across the whole texture patch and local texture features are extracted at every window location.
2. A histogram is built from the extracted local texture features. This involves partitioning the four dimensions of the texture features (the mean and standard deviation of the energy distributions of the high-pass and low-pass wavelet bands) into a number of bins and counting the occurrences of local texture feature values in those bins.
3. To compute the distance between two texture patches, the Euclidean distance between the corresponding histograms is used; however, any other histogram distance measure, such as the χ²-distance, could be used.

The histogram extracted in step 2 serves as the texture descriptor of an image patch. Thus we have derived invariant features of local textures. Using a single level of wavelet coefficients for feature generation results in texture features that are rotation invariant.

Figure 2. Block diagram of the multi-scale version of Invariant Features of Local Textures.
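Steps 1-3 above can be sketched as follows. Binning each of the four feature dimensions independently and concatenating the per-dimension histograms is one plausible reading of step 2 (the paper does not fix the exact binning scheme), and all function names, the bin count, and the feature range are illustrative assumptions.

```python
import numpy as np

def iflt_descriptor(patch, feature_fn, bins=10, rng=(0.0, 1.0)):
    """Histogram descriptor for an M x N patch (steps 1-2).
    `feature_fn(centre, neighbours)` returns the 4 per-pixel features."""
    M, N = patch.shape
    feats = []
    for r in range(1, M - 1):                    # 3x3 sliding window
        for c in range(1, N - 1):
            win = patch[r - 1:r + 2, c - 1:c + 2]
            neigh = np.delete(win.flatten(), 4)  # the 8 neighbours
            feats.append(feature_fn(patch[r, c], neigh))
    feats = np.array(feats)
    hist = np.concatenate([np.histogram(feats[:, d], bins=bins, range=rng)[0]
                           for d in range(feats.shape[1])])
    return hist / max(hist.sum(), 1)             # normalise across patch sizes

def euclidean(p, q):
    """Step 3: Euclidean distance between two histograms."""
    return float(np.linalg.norm(p - q))

def chi_square(p, q):
    """Alternative histogram distance (the chi-square distance of Eq. (7))."""
    denom = p + q
    mask = denom > 0
    return float(np.sum((p[mask] - q[mask]) ** 2 / denom[mask]))
```

Normalising the histogram to unit sum is our design choice so that patches of different sizes remain comparable; the paper compares raw occurrence counts.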
The same algorithm can be applied to different scales of a scale-space representation of the image, to derive invariant features of local textures that also take into account the spatial arrangement of the textures in an image. We therefore developed a multi-scale version of the algorithm, shown in Figure 2; a Gaussian filter is used as the low-pass (blurring) filter. The concatenation of the texture histograms across scales serves as the texture descriptor of the input image patch. The final distance between two texture patches is the sum of the distances across all scales; different weights can be given to different scales when calculating the combined distance.

3. Experimental Setup, Results and Analysis

We benchmark against the results of the Local Binary Pattern experiments in [15], which used 16 source textures from the Brodatz album [3]. Researchers in [15] developed the Local Binary Pattern operator LBP_{P,R} with (P, R) values of (8, 1), (16, 2) and (24, 3), i.e. three spatial and three angular resolutions; they also appended VAR_{P,R} to LBP_{P,R} to achieve maximum performance.

The image data comprises 16 texture classes from the Brodatz album [3], shown in Figure 3. Each texture class originally consisted of eight 256×256 images; Porter and Canagarajah [12] created 180×180 images of rotated textures from these source images using bilinear interpolation. A small amount of artificial blur was added to images which were not rotated through multiples of 90°. The source textures were digitally captured from the sheets of the Brodatz album [3], and the rotated textures were generated from these source images.
Hence this image data provides a simplified but highly controlled problem for rotation invariant texture analysis, as the rotated textures do not contain local intensity distortions such as shadows [15].

3.1. Experiment 1

In the experimental setup of [15], the texture classifier is trained with several 16×16 subimages extracted from a set of training images; the relatively small size of the training samples increases the difficulty of the problem. The training set comprised 121 disjoint 16×16 subimages at each of the rotation angles 0°, 30°, 45° and 60°, i.e. 484 samples (4 angles × 121 samples) for each of the 16 texture classes. Textures for classification were presented at rotation angles of 20°, 70°, 90°, 120°, 135° and 150°, giving 672 test samples: 42 (6 angles × 7 images) for each of the 16 textures.

Figure 3. 180×180 samples of the 16 textures used in the experiments, at particular angles.

Using the Euclidean distance, researchers in [15] reported 99.6% classification accuracy with Local Binary Pattern (LBP_{P,R}) texture descriptors, and a maximum performance of 100% when VAR_{P,R} was appended to the LBP_{P,R} operator.

Texture features were calculated for the 16×16 image samples in each of the 16 texture classes. The histograms of all 16×16 samples of a class were summed to yield one large model histogram per class, giving 16 reliable model histograms containing 484(16 - 2R)² entries each (the operators have an R-pixel border). During classification, the histogram of a test sample was compared with the model histogram of each class. The performance of the texture features was evaluated with the 672 test images; typical histograms of the test samples contained (180 - 2R)² entries. This is the same classification procedure as used in [15]. The Euclidean distance between histograms was used to determine feature similarity.

The results in Table 1 give the percentage of correctly classified samples over all test samples; LBP_{P,R} performance is included for comparison.

Table 1. Performance of the LBP and IFLT algorithms on Brodatz textures with training samples of size 16×16.

    P,R                 Bins           LBP_{P,R}   IFLT (2 scales)
    8,1                 10             88.2        98.06
    16,2                18             98.5        98.66
    24,3                26             99.1        99.2
    8,1 + 16,2          10 + 18        99          98.4
    8,1 + 24,3          10 + 26        99.6        99.3
    16,2 + 24,3         18 + 26        99          99.4
    8,1 + 16,2 + 24,3   10 + 18 + 26   99.1        99.75

It is evident from Table 1 that the proposed IFLT algorithm achieves around 98.06% classification accuracy, compared to the 88.2% of LBP_{P,R}, for (P, R) = (8, 1), the basic texture operator. For the higher resolutions (P, R) = (16, 2) and (24, 3) there is a slight improvement in performance over LBP_{P,R}. It is also evident from Table 1 that combining these different spatial resolutions in IFLT does not improve the performance to a great extent. The difference in performance between the three spatial resolutions of the proposed algorithm is small compared to the corresponding difference for LBP_{P,R}, which strongly suggests that the texture features generated by IFLT are more stable across spatial resolutions than those of LBP_{P,R}.

At (P, R) = (8, 1), LBP_{P,R} has difficulty discriminating strongly oriented textures, with misclassifications of Rattan, Straw and Wood [15] being largely responsible for its decreased performance. The number of misclassifications for IFLT at (P, R) = (8, 1) was considerably smaller: the misclassified test samples were labelled Rattan or Sand, with the true model ranked second in every case. At the higher spatial resolutions the misclassified test samples were labelled Matting or Rattan, again with the true model ranked second.

IFLT's best classification accuracies are shown in Table 2. We could not go to much coarser scales because the training samples were only 16×16 images.

Table 2. Best performance results of IFLT on Brodatz textures with training samples of size 16×16.

    P,R    Bins   IFLT (2 scales)
    8,1    16     98.21
    8,1    5      99.4
    16,2   10     99.1
    24,3   18     99.6

3.2. Experiment 2

We performed a second set of tests in which the training data consisted of the 16 Brodatz [3] texture classes shown in Figure 3, each class having four 180×180 images at angles 0°, 30°, 45° and 60°.

Table 3. Performance results of the IFLT algorithm on Brodatz textures with training samples of size 180×180.

    P,R                 Bins           IFLT (3+ scales)
    8,1                 10             98.8
    16,2                18             99.4
    24,3                26             100
    8,1 + 16,2          10 + 18        98.95
    8,1 + 24,3          10 + 26        99.7
    16,2 + 24,3         18 + 26        99.7
    8,1 + 16,2 + 24,3   10 + 18 + 26   99.7

Table 4. Best performance results of IFLT on Brodatz textures with training samples of size 180×180.

    P,R    Bins   IFLT (3+ scales)
    8,1    10     98.8
    8,1    5      100
    16,2   10     99.7
    16,2   16     100
    24,3   26     100
    24,3   18     100
    24,3   24     100
The only difference between this training set and that of the previous experiment is that the training samples were not divided into 16×16 subimages. Because the training samples were full 180×180 images, we were able to derive coarser-scale images; the classification procedure otherwise remained the same. The test results are provided in Table 3. As seen from Tables 3 and 4, the proposed algorithm can achieve 100% performance for (P, R) = (16, 2) and (24, 3), whilst for (P, R) = (8, 1) it is around 98.8%. As seen from Table 4, with suitably chosen numbers of bins each of the three spatial resolutions achieves a classification performance of 100%, whereas the combinations of spatial resolutions with the bin counts shown in Table 3 achieve at most 99.7%.

The above tests provide an interesting set of results. As seen from Table 1, the classification accuracy of IFLT at (P, R) = (8, 1) is around 98.06% when the training samples are 16×16 images, while Table 3 shows around 98.8% when the training samples are 180×180 images. The difference between these two accuracies is negligible, which strongly suggests that IFLT generates distinctive local texture features irrespective of the size of the training images. Hence the texture features extracted at (P, R) = (8, 1) can be regarded as among the most stable, being largely independent of the size of the training images.

3.3. Experiment 3

A third experiment was conducted using the Outex image database (test suite Outex_TC_00010) [1]. The classifier was trained with the reference textures, giving 480 models (24 classes × 20 samples), and tested on a database of 3,840 samples (24 classes × 20 samples × 8 angles). Each sample is 128×128 pixels in size. Examples of each of the 24 classes are shown in Figure 4.
The tests were conducted using the same procedure as described above, except that the χ² distance of Equation (7) was used:

    χ² = Σ_{i=1}^{N} (p_i - q_i)² / (p_i + q_i)        (7)

where χ² is the chi-square distance between two N-dimensional vectors p and q.

Figure 4. 128×128 samples of each of the 24 texture classes, at particular angles.

The test results are shown in Table 5.

Table 5. Performance results of IFLT on Outex_TC_00010.

    P,R                 Bins           LBP_{P,R}   IFLT
    8,1                 10             85.1        86.8
    16,2                18             88.5        89
    24,3                26             94.6        90
    8,1 + 16,2          10 + 18        93.1        89.5
    8,1 + 24,3          10 + 26        96.3        89.1
    16,2 + 24,3         18 + 26        95.4        90
    8,1 + 16,2 + 24,3   10 + 18 + 26   96.1        89.2

As seen, the newly developed algorithm performs better than LBP_{P,R} at (P, R) = (8, 1); however, it does not match LBP_{P,R} for the other values of (P, R). As in the first and second experiments (Tables 1 and 3), the classification accuracy of IFLT is already high at (P, R) = (8, 1), and increasing the spatial resolution provides little improvement over the accuracy attained there. The same conclusion is supported by the results of this third experiment, where the basic operator at (P, R) = (8, 1) gives better classification performance than LBP_{P,R} but higher spatial resolutions bring little further improvement. This can be attributed to the fact that IFLT works well at (P, R) = (8, 1): the extracted features are robust enough to provide rotation invariance by themselves, so there appears to be no need to increase the resolution of the neighbourhood, or to mix and match different neighbourhood resolutions, to achieve better performance. Thus the basic local texture descriptor is robust to varying conditions.

3.4. Computation Cost

In their basic form, both Invariant Features of Local Textures (IFLT) and the Local Binary Pattern (LBP_{P,R}) consider a neighbourhood of N pixels. In this case, LBP_{P,R} takes 4N computations to generate a rotation invariant descriptor per centre pixel, whereas IFLT takes 5N computations to generate a rotation and scale invariant descriptor. However, to achieve performance similar to IFLT, LBP_{P,R} needs to combine two or three resolutions (e.g. 8 and 16 neighbourhood pixels, 16 and 24, or 8, 16 and 24), whereas IFLT with N = 8 and 2 scales still achieves similar or better performance. The number of operations needed to generate IFLT and LBP_{P,R} features for one pixel is shown in Table 6.
Table 6. Number of computations required to generate descriptors for one pixel.

    Descriptor                       Computations
    LBP_{P,R}, 8,1                   32
    LBP_{P,R}, 16,2                  64
    LBP_{P,R}, 24,3                  96
    LBP_{P,R}, 8,1 + 16,2            96
    LBP_{P,R}, 8,1 + 24,3            128
    LBP_{P,R}, 16,2 + 24,3           160
    LBP_{P,R}, 8,1 + 16,2 + 24,3     192
    IFLT, (8,1), 2 scales            80

Hence the process of generating rotation and scale invariant texture descriptors using IFLT is computationally less intense than that of LBP_{P,R}.

4. Conclusion

We have developed a novel local texture descriptor which possesses (partial) illumination, scale and rotation invariance. The performance results of the proposed Invariant Features of Local Textures algorithm show that the descriptors generated by IFLT are distinctive with respect to oriented textures: IFLT descriptors discriminate between strongly oriented textures efficiently and are more stable across spatial resolutions. The experiments have shown that the proposed method is very effective at identifying image patches with similar textures, demonstrating its suitability for rotation invariant texture classification. As a fundamental method, it has a wide range of potential applications in computer vision and image/video processing.

References

[1] University of Oulu texture database. Available at: http://www.outex.oulu.fi/temp/.
[2] H. Arof and F. Deravi. Circular neighbourhood and 1-D DFT features for texture classification and segmentation. Vol. 145, pp. 167-172, 1998.
[3] P. Brodatz. Textures: A Photographic Album for Artists and Designers. Dover, 1966.
[4] D. Chetverikov. Experiments in the rotation-invariant texture discrimination using anisotropy features. pp. 1071-1073, 1982.
[5] L. Davis. Polarograms: A new tool for image texture analysis. 1981.
[6] G.-C. Pok and J.-C. Liu. New shape-based texture descriptors for rotation invariant texture classification. Vol. 2, pp. 533-536, 2003.
[7] H. Greenspan, S. Belongie, R. Goodman, and P. Perona. Rotation invariant texture recognition using a steerable pyramid. Vol. 2, pp. 162-167, 1994.
[8] G. Haley and B. Manjunath. Rotation-invariant texture classification using a complete space-frequency model. 1999.
[9] W.-K. Lam and C.-K. Li. Rotated texture classification by improved iterative morphological decomposition. Vol. 144, pp. 171-179, 1997.
[10] L. S. Davis, S. Johns, and J. Aggarwal. Texture analysis using generalized co-occurrence matrices. 1979.
[11] S. Mallat. A Wavelet Tour of Signal Processing, Second Edition. Academic Press, September 1999.
[12] R. Porter and N. Canagarajah. Robust rotation-invariant texture classification: wavelet, Gabor filter and GMRF based schemes. Vol. 144, pp. 180-188, 1997.
[13] T. Reed and J. H. du Buf. A review of recent texture segmentation and feature extraction techniques. 1995.
[14] R. M. Haralick. Statistical and structural approaches to texture. Vol. 67, pp. 786-804, 1979.
[15] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. 2002.
[16] M. Varma and A. Zisserman. Classifying images of materials: achieving viewpoint and illumination independence. In ECCV (3), pp. 255-271, 2002.
[17] J. Zhang and T. Tan. Brief review of invariant texture analysis methods. Pattern Recognition, 35(3):735-747, 2002.