CHAPTER 4 FEATURE EXTRACTION AND SELECTION TECHNIQUES


4.1 INTRODUCTION

Texture is an important characteristic for analyzing many types of images. It can be seen in all images, from multispectral scanner images obtained from aircraft or satellite platforms to microscopic images of tissue samples. Image texture, defined as a function of the spatial variation in pixel intensities (gray values), is useful in a variety of applications and has been a subject of intense study by many researchers. One immediate application of image texture is the recognition of image regions using texture properties; texture is the most important visual cue in identifying such homogeneous regions. This is called texture classification. Texture can also be seen as a combination of repeated patterns with a regular frequency. Texture analysis is defined as the classification or segmentation of textural features with respect to the shape of a small element, its density and the direction of regularity. In digital images, it is difficult to treat texture mathematically, because texture cannot be standardized quantitatively and the data volume is huge.

Image analysis techniques have played an important role in several medical applications. In general, the applications involve the automatic extraction of features from the image, which are then used for a variety of segmentation and classification tasks, such as distinguishing normal tissue from abnormal tissue. Depending upon the particular classification task, the extracted features capture morphological properties, colour properties, or certain textural properties of the image.

In pattern recognition and in image processing, texture feature extraction is a special form of dimensionality reduction. When the input data to an algorithm are too large to process and suspected to be highly redundant (much data, but not much information), the input data are transformed into a reduced representation, a set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, it is expected that the feature set will capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Features often contain information relating to gray shade, texture, shape or context. To classify an object in an image, we first have to extract some features from the image.

4.2 TEXTURE ANALYSIS TYPES

Texture analysis is a quantitative method that can be used to quantify and detect structural abnormalities in different tissues. Approaches to texture analysis are usually categorized into: 1. structural or syntactic, 2. statistical, 3. model-based and 4. signal processing (transform) methods, according to how they evaluate the interrelationships of the pixels.

4.2.1 Structural or Syntactic Approach

Structural approaches represent texture by well-defined primitives (micro-texture) and a hierarchy of spatial arrangements (macro-texture) of those primitives. To describe the texture, we must define the primitives and the placement rules. The choice of a primitive (from a set of primitives) and the probability of the chosen primitive being placed at a particular location can be a function of the location or of the primitives near the location. The advantage of the structural approach is that it provides a good symbolic description of the image; however, this feature is more useful for synthesis than for analysis tasks.

Some texture definitions regard textures as a regular arrangement of a small number of fixed pixel groupings called primitives. Syntactic texture description models assume that the primitives are located in almost regular relationships; textures are composed of primitive elements and placement rules, and the models then seek to partition images. A grammar represents a rule for building a texture from primitives, and placement rules represent the spatial relationships between primitives. A number of grammars and different placement rules may be used to describe a texture. Grammars suitable for texture description include chain grammars, graph grammars, tree grammars and matrix grammars. Syntactic texture description is usually achieved by combining specific primitive description methods: the rules governing the spatial organization of primitives are inferred, and properties of the primitives (e.g., area and average intensity) are then used as texture features. Once the primitives have been identified, the analysis is completed either by computing statistics of the primitives (e.g., intensity, area, elongation and orientation) or by describing the placement rule of the elements. Among these methods, image edges are an often-used primitive element.

Syntactic texture description can be applied individually, or combined with other methods. For example, Hong et al (1982) assumed that the edge pixels form a closed contour, and primitives were extracted by searching for edge pixels followed by a region-growing operation.

4.2.2 Statistical Approach

Statistical methods are the most widely used in medical images. Statistical texture description methods define the texture by describing the spatial distribution of gray values: local features are computed at each point in the image, and a set of statistics is derived from the distributions of the local features. Local features are defined by the combination of intensities at specific positions relative to each point in the image. Statistics are classified as first-, second- or higher-order according to the number of points which define the local feature.

The simplest statistics are the gray level first-order statistics. They describe the gray level histogram of an image; in first-order statistics, image properties depend on individual pixel values. Second-order statistics, such as the co-occurrence matrix method and the gray level difference method, describe the spatial relationships between image pixels; here the image properties depend on pixel pairs. Higher-order statistics, including run length measures and the autocorrelation function, can also be measured for texture analysis.

In contrast to structural methods, statistical approaches do not attempt to understand explicitly the hierarchical structure of the texture. Instead, they represent the texture indirectly by the non-deterministic properties that govern the distributions of, and relationships between, the gray levels of an image. Methods based on second-order statistics (i.e. statistics given by pairs of pixels) have been shown to achieve higher discrimination rates than the power spectrum (transform-based) and structural methods. Accordingly, textures in gray-level images are discriminated spontaneously only if they differ in second-order moments; equal second-order moments but different third-order moments require deliberate cognitive effort.
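This last point can be illustrated with a small sketch (illustrative only, not from this chapter): two images with identical gray level histograms, and therefore identical first-order statistics, can still differ in a simple second-order statistic such as the mean absolute difference between horizontal neighbor pairs.

```python
# Illustrative sketch, not from this chapter: identical histograms,
# different second-order statistics.

def histogram(image):
    """Gray level histogram (first-order information only)."""
    h = {}
    for row in image:
        for p in row:
            h[p] = h.get(p, 0) + 1
    return h

def mean_pair_difference(image):
    """Mean absolute difference of horizontal neighbor pairs
    (a simple second-order statistic)."""
    diffs = [abs(a - b) for row in image for a, b in zip(row, row[1:])]
    return sum(diffs) / len(diffs)

checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
halves = [[0, 0, 1, 1] for _ in range(4)]

print(histogram(checkerboard) == histogram(halves))   # True: same histogram
print(mean_pair_difference(checkerboard))             # 1.0
print(mean_pair_difference(halves))                   # ~0.333
```

Both images are half dark and half bright, yet every horizontal pair in the checkerboard crosses an edge while only one in three does in the two-halves image.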

The statistical approach exploits the statistical properties of the image or image regions in a bottom-up fashion, starting from the pixel values in a neighborhood. There are three types of statistical approaches: i) first-order statistical measures, ii) second-order statistical measures and iii) higher-order statistical measures.

4.2.2.1 First-order Statistical Measures

First-order statistics describe the gray level histogram of an image. The spatial distribution of gray-level variations can be described by a probability distribution of pixel intensity, and the gray level histogram is used to generate a class of texture features. One direct way to characterize the qualities of textures is to use the shape of the image histogram. A group of statistical measures which describe the histogram can be calculated from the gray level values of the individual pixels in an image, including the mean gray level of the pixels, their variance and standard deviation, and the signal-to-noise ratio. A further characterization of the histogram includes skewness and kurtosis. Skewness is a measure of symmetry: a distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean and decline rather rapidly, while data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

4.2.2.2 Second-order Statistical Measures

Second-order texture measures are mainly based on the joint gray-level histogram of pairs of geometrically related image points. There are two widely used second-order statistical methods: Gray Level Co-occurrence Matrix and Gray Level Difference Matrix measures.
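The first-order histogram measures described above can be sketched as follows (a minimal illustration; the skewness and kurtosis formulas are the standard moment definitions, and the sample pixel list is invented):

```python
import math

def first_order_stats(pixels):
    """Histogram-shape measures: mean, variance, skewness, kurtosis."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    sd = math.sqrt(var)
    skew = sum((p - mean) ** 3 for p in pixels) / (n * sd ** 3)
    kurt = sum((p - mean) ** 4 for p in pixels) / (n * sd ** 4)
    return mean, var, skew, kurt

pixels = [10, 10, 10, 10, 200]          # mostly dark, one bright outlier
mean, var, skew, kurt = first_order_stats(pixels)
print(mean, var, skew)                  # 48.0 5776.0 1.5
```

The positive skewness reflects the long tail toward bright values: the histogram is not symmetric about its mean.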

Gray Level Co-occurrence Matrices

Gray Level Co-occurrence Matrices (GLCM), proposed by Haralick et al (1973), have become one of the most well-known and widely used texture measures. Image texture is one of the important characteristics used in identifying objects or regions of interest in an image. Texture contains important information about the structural arrangement of surfaces. The textural features based on gray level spatial dependencies have a general applicability in image classification. Spectral features describe the average total variations in various bands of the visible and/or infrared portion of the electromagnetic spectrum. Textural features contain information about the spatial distribution of total variations within a band, and all the texture information is contained in the gray level co-occurrence matrices. Hence all the textural features are extracted from these gray level co-occurrence matrices. Since these features contain information about the textural characteristics of the image, it is hard to identify which specific textural characteristic is represented by each of these features.

Gray level    0         1         2         3
0             #(0,0)    #(0,1)    #(0,2)    #(0,3)
1             #(1,0)    #(1,1)    #(1,2)    #(1,3)
2             #(2,0)    #(2,1)    #(2,2)    #(2,3)
3             #(3,0)    #(3,1)    #(3,2)    #(3,3)

Figure 4.1 General form of GLCM for an image with four gray-level values, 0 through 3

A generalized GLCM for such an image is shown in Figure 4.1, where #(i,j) stands for the number of times gray levels i and j have been neighbors satisfying the condition stated by the displacement vector d.

Choice of displacement vector (d)

The displacement value typically ranges from 1 to 10. Applying a large displacement value to a fine texture would yield a GLCM that does not capture detailed textural information. From previous studies, it is accepted that a pixel is more likely to be correlated with other closely located pixels than with one located far away. Also, a displacement value equal to the size of the texture element improves classification.

Choice of angle

Each pixel has eight neighboring pixels, at angles of 0, 45, 90, 135, 180, 225, 270 or 315 degrees. However, taking into consideration the definition of the GLCM, the co-occurring pairs obtained at 0 degrees are equal to those obtained at 180 degrees, and this equivalence extends to the 45, 90 and 135 degree directions as well. Hence, one has four choices for the angle. Sometimes, when the image is isotropic or directional information is not required, one can obtain an isotropic GLCM by integration over all angles.

Choice of quantized gray levels (G)

The dimension of a GLCM is determined by the maximum gray value of the pixel. The number of gray levels is an important factor in GLCM computation. More levels mean more accurately extracted textural information, at increased computational cost. The computational complexity of the GLCM method is highly sensitive to the number of gray levels and is proportional to O(G²). Thus, the image is usually quantized to a predetermined value of G before the GLCM is computed. Since the matrix is symmetric, only its triangular half needs to be stored, and the diagonal always contains even numbers.

Various GLCM parameters are related to specific first-order statistical concepts. For instance, contrast corresponds to the pixel pair repetition rate, and variance to spatial frequency detection. Associating a textural meaning with each of these parameters is very critical. Traditionally, the GLCM is dimensioned to the number of gray levels G and stores the co-occurrence probabilities g_ij. To determine the texture features, selected statistics are applied to each GLCM by iterating through the entire matrix. The textural features are based on statistics that summarize the relative frequency distribution, which describes how often one gray level appears in a specified spatial relationship to another gray level in the image.

Feature extraction using the GLCM method

The GLCM is based on an estimation of the second-order joint conditional probability density functions P(i,j | d, θ) for θ = 0, 45, 90 and 135 degrees: the probability that a pair of pixels separated by distance d in direction θ has gray levels i and j. The gray level co-occurrence matrix Φ(d, θ) is defined as

    Φ(d, θ) = [P(i,j | d, θ)],  0 ≤ i, j ≤ N_g                         (4.1)

where N_g is the maximum gray level. In this method, four gray level co-occurrence matrices (θ = 0, 45, 90 and 135 degrees) are obtained for a given distance d (= 1, 2), and the following 13 Haralick statistical textural features, namely angular second moment, contrast, correlation, sum of squares variance, inverse difference moment, sum average, entropy, sum entropy, sum variance, difference entropy, difference variance, and information measures of correlation I and II, are extracted from each gray level co-occurrence matrix; the average of all the features extracted from the four gray level co-occurrence matrices is then taken.

Notation

P(i,j)    (i,j)th entry in a normalized gray level co-occurrence matrix, P(i,j) = #(i,j)/R, where R is a normalizing constant (the total number of pixel pairs)

P_x(i)    ith entry in the marginal probability matrix obtained by summing the rows of P(i,j):

    P_x(i) = Σ_{j=1}^{N_g} P(i,j)                                      (4.2)

N_g       number of distinct gray levels in the quantized image

    P_y(j) = Σ_{i=1}^{N_g} P(i,j)                                      (4.3)

    P_{x+y}(k) = Σ_i Σ_j P(i,j),  i + j = k,  k = 2, 3, ..., 2N_g      (4.4)

    P_{x-y}(k) = Σ_i Σ_j P(i,j),  |i - j| = k,  k = 0, 1, ..., N_g - 1 (4.5)

1. Angular second moment = Σ_i Σ_j P(i,j)²                             (4.6)

This statistic is also called uniformity or energy. It measures the textural uniformity of pixel pair distributions and detects disorder in textures. Energy reaches a maximum value of 1; higher energy values occur when the gray level distribution has a constant or periodic form, and energy has a normalized range. The GLCM of a less homogeneous image will have a large number of small entries.
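As an illustrative sketch (the small four-level test image and the displacement are invented), a GLCM can be accumulated by counting neighbor pairs as in Figure 4.1 and then normalized to give the energy of Equation (4.6):

```python
def glcm(image, levels, dr, dc):
    """Accumulate a symmetric GLCM for displacement (dr, dc)."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                m[j][i] += 1          # count each pair in both orders
    return m

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
m = glcm(image, levels=4, dr=0, dc=1)        # d = 1 at 0 degrees
R = sum(sum(row) for row in m)               # normalizing constant
energy = sum((v / R) ** 2 for row in m for v in row)   # Equation (4.6)
print(R, round(energy, 4))                   # 24 0.1458
```

Note that the symmetric counting makes the diagonal entries even, as stated above.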

2. Entropy = -Σ_i Σ_j P(i,j) log P(i,j)                                (4.7)

This statistic measures the disorder or complexity of an image. The entropy is large when the image is not texturally uniform and many GLCM entries have very small values. Complex textures tend to have high entropy. Entropy is strongly, but inversely, correlated with energy.

3. Contrast = Σ_{n=0}^{N_g-1} n² {Σ_i Σ_j P(i,j), |i - j| = n}         (4.8)

This statistic measures the spatial frequency of an image and is the difference moment of the GLCM. It is the difference between the highest and the lowest values of a contiguous set of pixels, and it measures the amount of local variation present in the image. A low-contrast image presents a GLCM concentrated around the principal diagonal and features low spatial frequencies.

4. Variance = Σ_i Σ_j (i - μ)² P(i,j)                                  (4.9)

This statistic is a measure of heterogeneity and is strongly correlated with first-order statistical variables such as the standard deviation. Variance increases when the gray level values differ from their mean.

5. Inverse difference moment = Σ_i Σ_j P(i,j) / (1 + (i - j)²)         (4.10)

This statistic is also called homogeneity. It measures the image homogeneity, as it assumes larger values for smaller gray level differences between pair elements. It is more sensitive to the presence of near-diagonal elements in the GLCM, and it has its maximum value when all the elements in the image are the same. GLCM contrast and homogeneity are strongly, but inversely, correlated in terms of equivalent distributions of the pixel pairs: homogeneity decreases if contrast increases while energy is kept constant.

6. Correlation = [Σ_i Σ_j (i j) P(i,j) - μ_x μ_y] / (σ_x σ_y)          (4.11)

where μ_x, μ_y, σ_x and σ_y are the means and standard deviations of P_x and P_y. The correlation feature is a measure of the gray level linear dependencies in the image. The remaining textural features are secondary.

7. Sum average = Σ_{k=2}^{2N_g} k P_{x+y}(k)                           (4.12)

It measures the average gray level within an image.

8. Sum variance = Σ_{k=2}^{2N_g} (k - Sum entropy)² P_{x+y}(k)         (4.13)

This statistic is a measure of the average of the heterogeneities in the image.

9. Sum entropy = -Σ_{k=2}^{2N_g} P_{x+y}(k) log P_{x+y}(k)             (4.14)

This statistic measures the average of the disorders or complexities in the image.

10. Difference variance = variance of P_{x-y}

11. Difference entropy = -Σ_{k=0}^{N_g-1} P_{x-y}(k) log P_{x-y}(k)    (4.15)

12. Information measure of correlation I = (HXY - HXY1) / max(HX, HY)  (4.16)

13. Information measure of correlation II = (1 - exp[-2.0 (HXY2 - HXY)])^(1/2)  (4.17)
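A few of the statistics above, entropy (4.7), contrast (4.8) and inverse difference moment (4.10), can be sketched directly from a normalized GLCM; the tiny two-level matrix below is invented for illustration:

```python
import math

def glcm_stats(P):
    """Entropy (4.7), contrast (4.8) and inverse difference moment (4.10)
    of a normalized GLCM P."""
    G = range(len(P))
    entropy = -sum(P[i][j] * math.log(P[i][j])
                   for i in G for j in G if P[i][j] > 0)
    contrast = sum((i - j) ** 2 * P[i][j] for i in G for j in G)
    idm = sum(P[i][j] / (1 + (i - j) ** 2) for i in G for j in G)
    return entropy, contrast, idm

# A perfectly homogeneous two-level GLCM: all mass on the diagonal.
P = [[0.5, 0.0],
     [0.0, 0.5]]
entropy, contrast, idm = glcm_stats(P)
print(contrast, idm)     # 0.0 1.0: no local variation, maximal homogeneity
```

With all the probability mass on the principal diagonal, contrast vanishes and homogeneity reaches its maximum, matching the inverse relationship noted above.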

Here HXY = -Σ_i Σ_j P(i,j) log P(i,j) is the entropy of P(i,j), HX and HY are the entropies of P_x and P_y, and

    HXY1 = -Σ_i Σ_j P(i,j) log [P_x(i) P_y(j)]                         (4.18)

    HXY2 = -Σ_i Σ_j P_x(i) P_y(j) log [P_x(i) P_y(j)]                  (4.19)

Angular second moment, entropy, contrast, variance, inverse difference moment and correlation are the features most often used among the 13 Haralick texture features to reveal properties of the spatial distribution of the texture image. Since real textures usually have many different dimensions, these texture properties are not independent of each other. For instance, the energy measure generated from the gray level co-occurrence matrix is also known as uniformity, variance is a measure of heterogeneity, and the inverse difference moment is a measure of homogeneity. Therefore, when choosing a subset of meaningful features from the gray level co-occurrence matrix for a particular application, the features do not have to be independent, because a subset of fully independent features is usually hard to find.

Gray Level Difference Measures (GLDM)

The Gray Level Difference Method (GLDM), proposed by Gool et al (1985), is similar to the co-occurrence matrices: texture features are likewise derived from probability density functions of gray levels. The difference is that the probability densities in the gray level difference method are not calculated directly from the original texture image, but from a subtracted image. For example, let g(n,m) be an original grayscale image. Given a displacement d = (dx, dy), where dx and dy are integers, the subtracted image diff(n,m) is obtained by

    diff(n,m) = g(n,m) - g(n + dx, m + dy)                             (4.20)

The probability density function is defined as

    p(i | d) = P(diff(n,m) = i)                                        (4.21)

This probability density function can be computed for the four principal directions: 0, 45, 90 and 135 degrees. Typically the difference density functions are accumulated in the horizontal and vertical directions, or in all four directions of an image, providing rotation-invariant texture measures.

4.2.2.3 Higher-order Statistical Measures

Autocorrelation Measures

Sonka et al (2002) describe the autocorrelation measures, in which each pixel in a texture image can be characterized by its location properties. We can consider a texture primitive as a contiguous set of pixels described by properties such as its average intensity, size, position and shape. The autocorrelation function is described by the correlation coefficient that evaluates linear spatial relationships between the texture primitives. It is widely used to assess the amount of regularity in a texture, the size of the texture primitives, and the coarseness or fineness of the texture in the images. If the texture primitives are large, yielding coarse textures, the autocorrelation function decreases slowly with increasing distance; if the texture primitives are small, giving fine textures, it decreases rapidly.

Dominant Gray Level Run Length Matrix Texture Measures (DGLRLM)

Tang et al (1998) proposed the dominant gray level run length texture features, which are calculated based on the computation of primitive length and gray level. A primitive is a continuous set of the maximum number of pixels in

the same direction that have the same gray level. A large number of neighboring pixels of the same gray level represents a coarse texture, and a small number of neighboring pixels of the same gray level represents a fine texture. Dasarathy and Holder (1991) described four feature extraction functions following the idea of a joint statistical measure of gray level and run length. Instead of developing new functions to extract the texture information, the gray level run length matrix is used directly as the texture feature vector, preserving all the information in the matrix. The dominant gray level run length texture description features, namely short run low gray level emphasis, short run high gray level emphasis, long run low gray level emphasis and long run high gray level emphasis, provide a texture feature vector that can be used as the input vector of a classifier in texture classification tasks.

The DGLRLM is based on computing the number of gray level runs of various lengths. A gray level run is a set of consecutive, collinear pixels having the same gray level value, and the length of the run is the number of pixels in the run. The dominant gray level run length matrix is defined as

    Φ(d, θ) = [g(i, j)],  0 ≤ i ≤ N_g,  0 ≤ j ≤ R_max                  (4.22)

where N_g is the maximum gray level and R_max is the maximum run length. The run length matrices for the four directions (0, 45, 90 and 135 degrees) are computed from each region of interest for a given distance d (= 1, 2). In this method, four gray level run length matrices for the four different directions are obtained, the following four dominant gray level run length texture features (short run low gray level emphasis, short run high gray level emphasis, long run low gray level emphasis and long run high gray level emphasis) are calculated for each matrix, and the average of the features extracted from the four matrices is taken.

1. Short Run Low Gray level Emphasis (SRLGE)

It measures the joint distribution of short runs and low gray level values. The SRLGE is expected to be large for an image with many short runs and low gray level values.

    SRLGE = (1/n_r) Σ_i Σ_j g(i,j) / (i² j²)                           (4.23)

2. Short Run High Gray level Emphasis (SRHGE)

It measures the joint distribution of short runs and high gray level values. The SRHGE is expected to be large for an image with many short runs and high gray level values.

    SRHGE = (1/n_r) Σ_i Σ_j g(i,j) i² / j²                             (4.24)

3. Long Run Low Gray level Emphasis (LRLGE)

It measures the joint distribution of long runs and low gray level values. The LRLGE is expected to be large for an image with many long runs and low gray level values.

    LRLGE = (1/n_r) Σ_i Σ_j g(i,j) j² / i²                             (4.25)

4. Long Run High Gray level Emphasis (LRHGE)

It measures the joint distribution of long runs and high gray level values. The LRHGE is expected to be large for images with many long runs and high gray level values. In a coarse texture, relatively long runs occur more often, whereas a fine texture contains primarily short runs.

    LRHGE = (1/n_r) Σ_i Σ_j g(i,j) i² j²                               (4.26)

where g is the gray level run length matrix, g(i,j) is the element of the run length matrix at position (i,j), and n_r is the number of runs in the image. The advantage of the new dominant gray level run length texture features is that they significantly improve image classification accuracy over the traditional run length texture features. By directly using the run length matrix as a feature vector, much of the texture information is preserved. With only such a small number of features, perfect classification is achieved with the original matrix and with most of the new matrices and vectors.

4.2.2.4 Other Statistical Description Measures

The local extrema measure, proposed by Bonnevay et al (1998), is another statistical approach used in certain applications. It is based on the study of the local extrema of gray level values in images or in certain parts of images. In an image containing various levels of information, a pixel is considered a local maximum if it is higher than all the others in a given area centered on that pixel; the order of the local maximum is defined as the radius of this area. Usually, fine textures have a large number of small-sized local extrema, whereas coarse textures have a smaller number of them.
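The run length matrix and the emphasis features above can be sketched in pure Python (an illustration along the 0 degree direction only; the sample image and size limits are invented):

```python
def run_length_matrix(image, levels, max_run):
    """g[i-1][j-1] counts runs of gray level i (1..levels) with length j,
    measured along rows (the 0 degree direction)."""
    g = [[0] * max_run for _ in range(levels)]
    for row in image:
        run, prev = 0, None
        for p in row + [None]:              # sentinel flushes the last run
            if p == prev:
                run += 1
            else:
                if prev is not None:
                    g[prev - 1][min(run, max_run) - 1] += 1
                run, prev = 1, p
    return g

def srlge(g):
    """Short Run Low Gray level Emphasis, Equation (4.23)."""
    n_r = sum(sum(row) for row in g)
    return sum(g[i][j] / ((i + 1) ** 2 * (j + 1) ** 2)
               for i in range(len(g)) for j in range(len(g[0]))) / n_r

image = [[1, 1, 1, 2],
         [1, 1, 3, 3]]                      # gray levels 1..3
g = run_length_matrix(image, levels=3, max_run=4)
print(g[0][2])                  # 1: one run of level 1 with length 3
print(round(srlge(g), 4))
```

The other three emphasis measures (4.24) to (4.26) follow the same pattern, with the i² and j² factors moved between numerator and denominator.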

4.2.3 Model Based Approach

Model-based texture analysis, described by Tuceryan et al (1998), uses fractal and stochastic models and attempts to interpret an image texture by means of a generative image model or a stochastic model. The parameters of the model are estimated and then used for image analysis. Model-based methods were originally developed in the texture synthesis field. They are based on the construction of an image model that can be used not only to describe a texture, but also to synthesize it. These models assume that the intensity at each pixel in the texture image depends on the intensities of only the neighbouring pixels, and the model parameters capture the essential qualities of the texture. Two popular modeling methods, Markov random field models and fractal models, are discussed here.

Markov random field models are able to capture the local contextual information in an image by assuming a joint probability distribution for modeling images. The model is also called a Markov neighbourhood model, because the conditional probability of the intensity of a given pixel depends only on the intensities of the pixels in its neighbourhood. Markov random fields have several advantages. They emphasize local contextual information, which makes texture classification much easier to achieve, and pixels and groups of pixels are assigned different labels after classification. This method is convenient for representing important location attributes in images, such as the location of discontinuities between regions and of boundaries in images.

Fractals are very useful in modeling statistical properties such as roughness and self-similarity in images. Self-similarity is defined as follows: given a bounded set A in Euclidean n-space, the set A is said to be self-similar when A is the union of N distinct copies of itself, each of which has been scaled down by a ratio r. The fractal dimension D is defined as:

    D = log N / log(1/r)                                               (4.27)
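Equation (4.27) can be checked against two textbook self-similar sets: a filled square is the union of N = 4 copies of itself at ratio r = 1/2, and the Sierpinski triangle is the union of N = 3 copies at r = 1/2:

```python
import math

def fractal_dimension(n_copies, ratio):
    """Self-similarity dimension D = log N / log(1/r), Equation (4.27)."""
    return math.log(n_copies) / math.log(1 / ratio)

# A filled square is 4 copies of itself scaled by 1/2: a smooth surface, D = 2.
d_square = fractal_dimension(4, 0.5)
# The Sierpinski triangle is 3 copies scaled by 1/2: a rougher set, D ~ 1.585.
d_sierpinski = fractal_dimension(3, 0.5)
print(d_square, round(d_sierpinski, 3))     # 2.0 1.585
```

For gray level images the dimension is usually estimated by box counting over the image surface rather than from known N and r, but the interpretation of D is the same.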

If the fractal dimension is larger, the texture is rougher; otherwise the texture is smoother.

4.2.4 Signal Processing Methods

Signal processing methods, described by Sonka et al (2002), compute texture features from filtered images. The filters commonly include spatial domain filters, Fourier domain filters, Gabor filters, the wavelet transformation, the discrete cosine transformation, and ring or wedge filters. Four widely used filtering approaches, namely spatial filtering methods, frequency domain analysis, Gabor filters and discrete wavelet frame transformation methods, are introduced in the following sections.

Spatial domain filters are similar to operator-based methods and thus could also be classified as a statistical method. A number of texture features, such as edge frequency, coarseness, randomness, directivity, linearity and size, can be computed from the filtered images. Edge frequency measures are commonly and successfully used; a typical approach is the derivation of a set of energy features for each pixel in a texture image, followed by the computation of energy statistics. These features capture the amount of energy contributed to the image by certain structures such as edges, spots, waves and ripples. The edge frequency measures are computed from edge pixels, and edge operators are good at capturing image textures. Once edges are detected in texture regions, they can be used to define texture descriptors in a variety of ways. For example, one can compute the edge density, edge orientation, contrast, fuzziness, or the spatial arrangement of edges in the texture image.
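The edge frequency idea can be sketched as follows (the gradient threshold of 50 and the sample images are arbitrary assumptions for illustration):

```python
def edge_density(image, threshold=50):
    """Fraction of horizontal neighbor pairs whose gray level difference
    exceeds the threshold (an assumed value)."""
    rows, cols = len(image), len(image[0])
    edges = sum(1 for r in range(rows) for c in range(cols - 1)
                if abs(image[r][c + 1] - image[r][c]) > threshold)
    return edges / (rows * (cols - 1))

smooth = [[10, 12, 11, 13] for _ in range(4)]    # low spatial variation
striped = [[0, 255, 0, 255] for _ in range(4)]   # high-frequency texture
print(edge_density(smooth), edge_density(striped))   # 0.0 1.0
```

A smooth region yields a low edge density, while a busy, high-frequency texture yields a high one, which is why edge frequency serves as a coarseness cue.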

4.2.5 Other Signal Processing Methods

The frequency analysis of the textured image is achieved by performing filtering in the Fourier domain to obtain feature images. Fourier domain filtering applies frequency- or orientation-selective filters to extract frequency and orientation components as texture features. Methods based on the Fourier transform perform poorly in practice, due to their lack of spatial localization. Texture features extracted from multitextured images use Gabor filters, including linear Gabor features, thresholded Gabor features, Gabor-energy features and so on. Gabor filters provide a means for better spatial localization; however, their usefulness is limited in practice because there is usually no single filter resolution at which one can localize a spatial structure in natural textures. Discrete wavelet frame transformation represents another approach. The use of a pyramid-structured wavelet transformation for texture analysis was first suggested by Mallat (1989). In the wavelet transformation, the original image is decomposed into a low-resolution image and several detail images. Energy, variance, cluster shade and cluster prominence of the detail images are usually extracted for the texture analysis. The main advantage of wavelet decomposition is that it provides a unified framework for multi-scale texture analysis.

4.2.5.1 Wavelet Transform Texture Measures

The wavelet transform is a mathematical tool that can be used to describe images at multiple resolutions. One important feature of the wavelet transform is its ability to provide a representation of the image data in a multiresolution fashion. Such hierarchical decomposition of the image information provides the possibility of analyzing the coarse resolution first, and then sequentially refining the segmentation result at more detailed scales. In general, such practice provides additional robustness to noise and local maxima. A

wavelet transform decomposes a signal into a hierarchy of sub-bands with a sequential decrease in resolution. Such expansions are especially useful when a multi-resolution representation is needed, and some image segmentation algorithms are therefore formulated within a multi-resolution framework. Many important features of image data can be characterized more efficiently in the spatial-frequency domain. Such feature characterization was shown to be extremely useful in many applications including segmentation and classification, registration and data compression. A small fraction of the features can be used to distinguish the textures present in a decomposed image, because wavelets allow analysis of images at various levels of resolution. More recently, methods based on multi-resolution or multi-channel analysis, such as Gabor filters and wavelet transforms, have been used. But the outputs of Gabor filter banks are not mutually orthogonal, which may result in a significant correlation between the texture features. Finally, these transformations are usually not reversible, which limits their applicability for texture synthesis. Most of these problems can be avoided by using the wavelet transform, which provides a precise and unifying framework for the analysis and characterization of a signal at different scales. Another advantage of the wavelet transform over the Gabor filter is that the low pass and high pass filters used in the wavelet transform remain the same between two consecutive scales, while the Gabor approach requires filters of different parameters. The wavelet transform is a multi-resolution technique, which can be implemented as a pyramid or tree structure and is similar to sub-band decomposition.

TWO DIMENSIONAL DISCRETE WAVELET TRANSFORM (2D-DWT)

Wavelets are functions generated from one single function by dilations and translations. The basic idea of the wavelet transform is to represent any arbitrary function as a superposition of wavelets. Any such superposition decomposes the given function into different scale levels, where each level is further decomposed with a resolution adapted to that level. The DWT is identical to a hierarchical sub-band system where the sub-bands are logarithmically spaced in frequency and represent an octave-band decomposition. By applying the DWT, the image is divided, that is decomposed, into four sub-bands and critically subsampled as shown in Figure 4.3a. These four sub-bands arise from separable applications of vertical and horizontal filters as shown in Figure 4.2. The filters L and H shown in Figure 4.2 are one-dimensional low pass filter (LPF) and high pass filter (HPF) respectively. Thus, the decomposition provides sub-bands corresponding to different resolution levels and orientations. The sub-bands labeled I LH, I HL and I HH represent the finest scale wavelet coefficients, that is, detail images, while the sub-band I LL corresponds to coarse-level coefficients, that is, the approximation image. To obtain the next coarser level of wavelet coefficients, the sub-band I LL alone is further decomposed and critically sampled using a similar filter bank, as shown in Figure 4.2. This results in a two-level wavelet decomposition, as shown in Figure 4.3b. Similarly, to obtain further decompositions, I LL is used again. This process continues until some final scale is reached.
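A minimal one-level version of this filter bank can be sketched in Python with the unnormalized Haar pair: averaging plays the role of L and differencing the role of H. This is only an illustrative sketch (sub-band naming conventions and filter normalization vary; these are not the exact filters of this chapter).

```python
def haar_dwt2(img):
    """One level of a separable 2-D Haar DWT: filter and subsample along
    rows, then along columns, yielding the LL, LH, HL, HH sub-bands."""
    def rows_pass(m):
        lo, hi = [], []
        for row in m:
            lo.append([(row[2*k] + row[2*k+1]) / 2 for k in range(len(row) // 2)])
            hi.append([(row[2*k] - row[2*k+1]) / 2 for k in range(len(row) // 2)])
        return lo, hi
    def T(m):  # transpose, so rows_pass can also filter along columns
        return [list(c) for c in zip(*m)]
    L, H = rows_pass(img)                        # horizontal L and H bands
    LL, LH = (T(s) for s in rows_pass(T(L)))     # vertical pass on L
    HL, HH = (T(s) for s in rows_pass(T(H)))     # vertical pass on H
    return LL, LH, HL, HH

img = [[1, 1, 5, 5],
       [1, 1, 5, 5],
       [9, 9, 3, 3],
       [9, 9, 3, 3]]
print(haar_dwt2(img)[0])  # LL: [[1.0, 5.0], [9.0, 3.0]]
```

On this piecewise-constant toy image, the LL band is a half-resolution approximation and all three detail bands are zero, matching the approximation/detail roles described above; applying the same function to LL again gives the two-level decomposition of Figure 4.3b.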

Figure 4.2 Wavelet filter bank for one-level image decomposition

Figure 4.3 Image decomposition a) one level b) two level

Algorithm for determining the final scale

Decompose a given input CT image with a 2D two-level discrete wavelet transform into four sub images, which can be viewed as the I LL, I LH, I HL and I HH sub images.

Calculate the energy of the decomposed detail images (I LH, I HL, I HH). That is, for a sub image x of size M × N, the energy is

e = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} |x(i, j)|   (4.28)

If the energy of a sub image is smaller than the others, we stop the decomposition in this region, since it contains less information. This step can be achieved by comparing the energy with the largest energy value at the same scale. That is, if e < C·e_max, we stop decomposing this region, where C is a constant less than 1. If the energy of a sub image is significantly larger, we apply the above decomposition procedure to that sub image. Practically, the size of the smallest sub image should be used as a stopping criterion for further decomposition. The final scale or level in our proposed method is a 2-level discrete wavelet transform, and it is determined based on the size of the smallest sub image, since the smallest sub image is present in the second level decomposition.

The values or transformed coefficients in the approximation and detail images (sub-band images) are the essential features, which are useful for texture discrimination and segmentation. Since textures, whether micro or macro, have non-uniform gray level variations, they are statistically characterized by the values in the DWT transformed sub-band images, or by the features derived from these sub-band images or their combinations. In other words, the features derived from these approximation and detail sub-band images uniquely characterize a texture. The features obtained from these DWT transformed images are shown here to be useful for texture analysis, namely segmentation and classification.

4.2.6 Hybrid Methods

Hybrid methods, which combine the statistical approaches, model-based approaches, transform based approaches, and syntactic approaches, are extremely successful in some applications of texture analysis. Many applications combine the syntactic and statistical approaches for texture analysis. The technique not only brings many advantages of using the primitive

definition, but also decreases or avoids the complexity of grammar inference in syntactic approaches. For example, partly syntactic and partly statistical techniques have been used together. Other hybrid approaches combine different categories of texture features to form the feature vectors. In this work, hybrid approaches combining four categories of feature extraction methods were used for texture identification. They are the combined Wavelet based Statistical Texture and Co-occurrence Texture feature extraction method, the combined Dominant Gray Level Run Length and Co-occurrence texture feature extraction method, the combined Wavelet based Dominant Gray Level Run Length and Co-occurrence texture feature extraction method, and the combined Co-occurrence texture, gray level and new edge feature extraction method.

4.3 COMPARISON OF TEXTURE FEATURE EXTRACTION METHODS

There are a number of general definitions of texture in the computer vision literature. Many texture feature description measures have been proposed according to the different definitions. The different texture feature measures capture a texture characteristic, such as fineness and coarseness, in their own ways. For example, autocorrelation measures are based on finding the linear spatial relationship between primitives. If the primitives are large, the texture is coarse and the autocorrelation function decreases slowly with increasing distance. If the primitives are small, the texture is fine and the autocorrelation function decreases rapidly. The co-occurrence approach is based on the joint probability distribution of pixel pairs in an image. The run length approach is based on computing the probabilities of runs of primitives of a given length and gray level in the texture. Thus, coarse textures can be represented by a large number of neighbouring, long runs of texture primitives. Real textures usually have many different dimensions; therefore textured properties are not independent of each other.
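The coarse-versus-fine behaviour of the autocorrelation measure can be illustrated directly. The self-contained Python sketch below uses made-up toy textures and a common normalization (not necessarily the exact one used in this chapter).

```python
def autocorrelation(img, dx):
    """Normalized autocorrelation along x at lag dx:
    mean(I(i,j) * I(i,j+dx)) / mean(I(i,j)^2)."""
    h, w = len(img), len(img[0])
    num = sum(img[i][j] * img[i][j + dx] for i in range(h) for j in range(w - dx))
    den = sum(v * v for row in img for v in row)
    return (num / (h * (w - dx))) / (den / (h * w))

fine = [[0, 9, 0, 9, 0, 9] for _ in range(2)]    # small primitives
coarse = [[9, 9, 0, 0, 9, 9] for _ in range(2)]  # larger primitives
```

At lag 1 the fine texture's autocorrelation has already dropped to 0, while the coarse texture's is still 0.6, matching the decay behaviour described above.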
This means that a

property of the texture may be described in another way by another method. Moreover, the computational complexity will be greatly increased if too many features are used; therefore, several selected combined texture feature extraction methods were applied to the images in our study. In the following section, we compare the four categories of texture feature extraction methods based on their main properties and their limitations in implementation. Only those which were appropriate for our problem were chosen.

Most textures are defined as contextual properties involving the spatial distributions of gray levels. Therefore, the statistical approaches based on the spatial distribution of gray values are generally applicable. In addition, the transform approach methods, most of which are based on discrete wavelet transform operations, are easy to implement using Matlab software. So we utilized combined statistical and signal processing approaches in our study. Syntactic methods are not as widely used as statistical methods, because a grammar is a very strict formalism and placement rules are based on the idea that primitives have regular spatial relationships. It is impossible to describe real textures, which are usually irregular, distorted or variant in structure, in this way. Moreover, syntactic methods are very sensitive to local noise and structural errors, distortions or variations. To make syntactic description of real textures possible, nondeterministic primitives or stochastic grammars must be defined to substitute for primitives with regular relationships and deterministic transformation rules. The model-based methods are based on estimation of the model parameters. For example, in texture synthesis problems, model parameters are set to control the type of texture. In texture classification problems, parameters need to be estimated first. However, if the texture structure is unknown, the estimation of the parameters is hard to achieve.
Thus, an unfortunate aspect of model-based methods is that the models become intractable when estimating the model parameters is difficult. A Markov random field model is constructed from a graph consisting of connected nodes. Images are represented by joint probabilities

of nodes. One disadvantage of this method is that it has to be utilized along with other methods when the joint distribution of the Markov fields is too complex to compute. The estimation of the fractal dimension also has problems. Fractals assume that images are self-similar at different scales and that the fractal is deterministic. One example of a deterministic texture is a checkerboard with a strictly ordered array of identical squares. Apparently, most natural textures are not deterministic as described in the definition of fractals, which makes the application of fractals difficult. Furthermore, model-based approaches only capture micro textures well, since practical considerations limit the order of the model. Model-based approaches also fail with inhomogeneous textures. From the above comparison of the properties of each texture description method, we note that the statistical methods and wavelet transform methods are superior to the others. A number of other aspects should also be considered when choosing a texture description measure. They are as follows.

1. Gray scale invariance: We must consider how sensitive the texture feature is to changes in the gray scale. For example, in industrial machine vision, where lighting conditions may be unstable, gray scale invariance is especially necessary. CT images in one data set may be recorded by different CT scanner systems using different capturing techniques and procedures, which cause the absolute gray levels and contrast of each image to vary.

2. Rotational invariance: If the orientation of the images changes with respect to the viewpoint, the texture features vary. This rotational invariance should be considered when choosing texture measures for medical applications.

3. Accuracy with respect to noise: Gaussian noise in the input CT images affects the texture measures. This is particularly important in medical applications, where noise can degrade the accuracy of image analysis.

4. Computational complexity: Medical diagnosis requires parameters that are easy to understand and fast to compute. Many texture description measures are so computationally intensive that they would be very challenging to implement in clinical practice.

Comparison and evaluation of texture description methods outside a specific application are not realistic. Some practical factors besides the four points just mentioned can also influence the performance of the methods, such as the amount of training data in texture classification, the pre-processing of an image, the signal intensity of CT images, and the resolution level of the system. We also have to consider the availability of background knowledge and the limitations of the methods in clinical practice. The following conclusions can be drawn. First, the model-based approaches are obviously inappropriate for this study because we have no idea about the texture properties of benign and malignant tumor images at this stage, and thus estimation of parameters for a texture model is difficult. Similarly, there is difficulty in defining texture primitives if the syntactic texture description model is used, because this model assumes that the primitives are located in almost regular relationships. The spatial distribution of texture primitives in different parts of the benign and malignant tumor images is still unknown. Furthermore, analysis methods localized in the spatial domain are usually preferred over analysis in the wavelet domain in many medical applications, because most features in the spatial domain have specific physical meanings corresponding to physiology. As a result, we chose some statistical methods and transform methods,

including wavelet transform methods, co-occurrence matrix methods, dominant gray level run length matrix methods and gray level intensity methods.

4.4 EXTRACTING THE FEATURES FROM BRAIN CT TUMOR IMAGES

The statistical methods and the wavelet transform methods are employed in this study according to the conclusions of section 4.3. This section places the emphasis on the combined Wavelet based Statistical Texture and Co-occurrence Texture feature extraction method, the combined Dominant Gray Level Run Length and Gray Level Co-occurrence texture feature extraction method, the Wavelet based combined Dominant Gray Level Run Length and Gray Level Co-occurrence texture feature extraction method, and the combined Co-occurrence texture, gray level and new edge feature extraction method.

4.4.1 Wavelet based combined Statistical and Co-occurrence Texture feature extraction

Texture analysis is a quantitative method that can be used to quantify and detect structural abnormalities in different tissues. As the tissues present in the brain are difficult to classify using shape or intensity level information, texture feature extraction is found to be very important for further classification. The purpose of feature extraction is to reduce the original data set by measuring certain features that distinguish one region of interest from another. The analysis and characterization of textures present in the medical images can be done by using the wavelet based statistical texture feature extraction method. Each sub image is taken from the top left corner of the original image and decomposed using a two-level DWT, and the gray level co-occurrence matrix is

derived from the sub image of the 2nd level (i.e., I LL1). Then from this gray level co-occurrence matrix, the Wavelet Co-occurrence Texture (WCT) features are computed.

The algorithm for feature extraction is as follows:

Obtain the sub-image blocks, starting from the top left corner.

Decompose the sub-image blocks using the 2-D DWT.

Derive the Gray Level Co-occurrence matrix (e.g., Haddon et al, 1993) from the 2nd level low frequency sub-band of the DWT with distance 1 and directions 0°, 45°, 90° and 135°.

From the Gray Level Co-occurrence matrix, extract the following nine Haralick texture features (e.g., Haralick et al, 1973), called WCT features.

Extract the Wavelet Statistical Texture (WST) features from the 2-level Discrete Wavelet Transformed (DWT) low and high frequency sub-bands.

Use the combination of both WST and WCT features for classification.

The following set of 3 WST features and 9 WCT features can be extracted from each of the gray level co-occurrence matrices.

1. Energy = Σ_i Σ_j p(i, j)²   (4.29)

2. Cluster shade = Σ_i Σ_j (i + j - μ_x - μ_y)³ p(i, j)   (4.30)

3. Cluster prominence = Σ_i Σ_j (i + j - μ_x - μ_y)⁴ p(i, j)   (4.31)

μ_x = Σ_i Σ_j i·p(i, j),  μ_y = Σ_i Σ_j j·p(i, j),
σ_x² = Σ_i Σ_j (i - μ_x)² p(i, j),  σ_y² = Σ_i Σ_j (j - μ_y)² p(i, j)   (4.32)

where μ_x, μ_y and σ_x, σ_y are the means and standard deviations of P_x and P_y, and p(i, j) = P(i, j)/R.

P(i, j) - (i, j)th entry in a normalized gray-tone spatial dependence matrix; R is a normalizing constant.

P_x(i) - ith entry in the marginal-probability matrix obtained by summing the rows of P(i, j):

P_x(i) = Σ_{j=1}^{N_g} P(i, j)   (4.33)

N_g - number of distinct gray levels in the quantized image.

P_y(j) = Σ_{i=1}^{N_g} P(i, j)   (4.34)

P_{x+y}(k) = Σ_i Σ_j P(i, j), i + j = k, k = 2, 3, ..., 2N_g   (4.35)

P_{x-y}(k) = Σ_i Σ_j P(i, j), |i - j| = k, k = 0, 1, ..., N_g - 1   (4.36)

1. Entropy = -Σ_i Σ_j p(i, j) log p(i, j)   (4.37)

2. Energy = Σ_i Σ_j p(i, j)²   (4.38)

3. Contrast = Σ_{n=0}^{N_g-1} n² {Σ_i Σ_j p(i, j), |i - j| = n}   (4.39)

4. Sum average = Σ_{i=2}^{2N_g} i·P_{x+y}(i)   (4.40)

5. Variance = Σ_i Σ_j (i - μ)² p(i, j)   (4.41)

6. Correlation = (Σ_i Σ_j (i·j) p(i, j) - μ_x μ_y) / (σ_x σ_y)   (4.42)

7. Inverse difference moment = Σ_i Σ_j p(i, j) / (1 + (i - j)²)   (4.43)

8. Cluster tendency = Σ_i Σ_j (i + j - 2μ)² p(i, j)   (4.44)

9. Max probability = max_{i,j} p(i, j)   (4.45)

A set of 3 WST features, namely Energy, Cluster shade and Cluster prominence, and 9 WCT textural features, namely Entropy, Energy, Contrast, Sum average, Variance, Correlation, Inverse difference moment, Cluster tendency and Max probability, extracted from each of the gray level co-occurrence matrices, are used for segmentation and classification of benign and malignant tumor brain CT images.

4.4.2 Combined Dominant Gray Level Run Length and Co-occurrence texture feature extraction

The 17 spatial features from the ROI of the segmented tumor region of each slice are extracted by the dominant gray level run length and gray level co-occurrence matrix methods.
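To make the co-occurrence formulas concrete, the following Python sketch builds a normalized GLCM for a single offset and evaluates two of the listed features, Energy (4.38) and Contrast (4.39). It is a minimal, single-direction illustration with an assumed toy image, not the full multi-direction, 13-feature implementation of this work.

```python
from collections import Counter

def glcm(img, dx, dy, levels):
    """Normalized gray level co-occurrence matrix p(i, j) for offset (dx, dy)."""
    counts = Counter()
    h, w = len(img), len(img[0])
    for i in range(h):
        for j in range(w):
            i2, j2 = i + dy, j + dx
            if 0 <= i2 < h and 0 <= j2 < w:
                counts[(img[i][j], img[i2][j2])] += 1
    r = sum(counts.values())            # normalizing constant R
    return [[counts[(a, b)] / r for b in range(levels)] for a in range(levels)]

def energy(p):   # Equation (4.38)
    return sum(v * v for row in p for v in row)

def contrast(p): # Equation (4.39)
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
p = glcm(img, 1, 0, 4)  # 0-degree direction, distance 1
print(round(energy(p), 4), round(contrast(p), 4))  # 0.1667 0.3333
```

Note that this single-offset matrix is not symmetrized; for the four directions used in this chapter, one such matrix would be built per direction and the resulting features averaged.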

The gray level run length matrix p(i, j | θ) proposed by Tang (1998) is defined as follows:

p(i, j | θ), 0 < i ≤ N_g, 0 < j ≤ R_max   (4.46)

where N_g is the maximum gray level and R_max is the maximum run length. The element p(i, j | θ) specifies the estimated number of runs in a given image with gray level i and run length j in the direction θ. Four run length matrices for θ = 0°, 45°, 90° and 135° are derived, and the following four dominant run length texture features proposed by Dasarathy and Holder (1991), namely short run low gray level emphasis, short run high gray level emphasis, long run low gray level emphasis and long run high gray level emphasis, are extracted from the ROI of the segmented tumor region for each dominant gray level run length matrix; the average of the features extracted from the four dominant gray level run length matrices is then taken.

The gray level co-occurrence matrix proposed by Haralick et al (1973) is defined as follows:

P = [P(i, j | d, θ)], 0 < i, j ≤ N_g   (4.47)

where P(i, j | d, θ) is the joint probability of two pixels with gray levels i and j, located at an inter-sample distance d and direction θ. Four co-occurrence matrices for the 0°, 45°, 90° and 135° directions are derived with the given distance d (= 1, 2), and the following 13 Haralick texture features are extracted from the ROI of the segmented tumor region for each gray level co-occurrence matrix; the average of the features extracted from the four gray level co-occurrence matrices is then taken.
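A run length matrix for a single direction is straightforward to compute. The hedged Python sketch below builds p(i, j | 0°) and evaluates one representative run length feature, short run emphasis; the names and toy image are illustrative, and this is not the exact dominant-run-length implementation of this chapter.

```python
def run_length_matrix(img, levels, max_run):
    """p[g][r-1] = number of horizontal (0-degree) runs of gray level g
    and length r, as in Equation (4.46)."""
    p = [[0] * max_run for _ in range(levels)]
    for row in img:
        prev, run = row[0], 1
        for v in row[1:]:
            if v == prev:
                run += 1
            else:
                p[prev][run - 1] += 1   # close the finished run
                prev, run = v, 1
        p[prev][run - 1] += 1           # close the run ending the row
    return p

def short_run_emphasis(p):
    """Weights short runs heavily: sum of p(g, r) / r^2, normalized by
    the total number of runs."""
    total = sum(map(sum, p))
    return sum(p[g][r] / (r + 1) ** 2
               for g in range(len(p)) for r in range(len(p[0]))) / total

img = [[0, 0, 1],
       [2, 2, 2]]
p = run_length_matrix(img, 3, 3)
print(p)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The toy image contains one run of gray level 0 (length 2), one of level 1 (length 1) and one of level 2 (length 3); a texture dominated by long runs would yield a much lower short run emphasis.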

4.4.3 Wavelet based combined Dominant Gray Level Run Length and Gray Level Co-occurrence texture feature extraction

The analysis and characterization of textures present in the medical images can be done by using the wavelet based combined statistical second order and higher order feature extraction method. Each sub image is taken from the top left corner of the original image and decomposed using a two-level DWT, and the gray level co-occurrence matrix is derived from the approximation sub image of the 2nd level (i.e., I LL1). Then from this gray level co-occurrence matrix, the 13 Haralick texture features are extracted. The same process is done in the DGLRLM method. In this method, each sub image is taken from the top left corner of the original image and is decomposed using a two-level DWT, and the gray level run length matrix is derived from the approximation sub image of the 2nd level (i.e., I LL1). Then from this gray level run length matrix, the dominant gray level run length texture features are computed. So, the 17 spatial features are extracted from the 2-level wavelet approximation tumor region of each slice by the dominant gray level run length matrix and gray level co-occurrence matrix methods.

The algorithm for feature extraction is as follows:

Obtain the sub-image blocks, starting from the top left corner.

Decompose the sub-image blocks using the 2-D DWT.

Derive the Gray Level Co-occurrence matrix (Haddon et al, 1993) and the Dominant Gray Level Run Length Matrix from the 2nd level wavelet approximation tumor image of the DWT with distance 1 and directions 0°, 45°, 90° and 135°, and average the features.

From the Gray Level Co-occurrence Matrix, extract the following 13 Haralick texture features (Haralick et al, 1973), called WGLCM features.

From the Dominant Gray Level Run Length Matrix, extract the following four dominant gray level run length texture features (Dasarathy and Holder, 1991), called WDGLRLM features.

Use the combination of both WGLCM and WDGLRLM features for classification.

The dominant gray level run length matrix p(i, j | θ) proposed by Tang (1998) is defined as follows:

p(i, j | θ), 0 < i ≤ N_g, 0 < j ≤ R_max   (4.48)

where N_g is the maximum gray level and R_max is the maximum run length. The element p(i, j | θ) specifies the estimated number of runs in a given image with gray level i and run length j in the direction θ. Four run length matrices for θ = 0°, 45°, 90° and 135° are computed, and the following four dominant gray level run length texture features, namely short run low gray level emphasis, short run high gray level emphasis, long run low gray level emphasis and long run high gray level emphasis, are extracted from the 2nd level wavelet approximation tumor image of each slice; the average of the features extracted from the four dominant gray level run length matrices is then taken.

The gray level co-occurrence matrix P proposed by Haralick et al (1973) is defined as follows:

P = [P(i, j | d, θ)], 0 < i, j ≤ N_g   (4.49)
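The direction-averaging step used by both matrix methods can be sketched as follows (a hypothetical helper in plain Python; the feature values are made up for illustration):

```python
def average_over_directions(vectors):
    """Average each feature across the four directional feature vectors
    (0, 45, 90 and 135 degrees)."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

# e.g. two features (say contrast and energy) measured in four directions:
print(average_over_directions([[1, 2], [3, 4], [5, 6], [7, 8]]))  # [4.0, 5.0]
```

Averaging over the four directions yields a single feature vector per region that is less sensitive to texture orientation than any one directional matrix.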