Learning to Identify Fuzzy Regions in Magnetic Resonance Images

Sarah E. Crane and Lawrence O. Hall
Department of Computer Science and Engineering, ENB 118
University of South Florida
4202 E. Fowler Ave.
Tampa, Fl 33620
hall@csee.usf.edu

Abstract

This paper presents an approach to automatic heuristic rule generation for tissue labeling in a magnetic resonance (MR) volumetric image of the human brain. The image is clustered with the semi-supervised fuzzy c-means (ssfcm) algorithm. The clusters are then labeled by analyzing the membership of pixels in the cluster and the corresponding ground truth data. Finally, production rules which are capable of labeling unseen data are learned. Production rule cluster type identification error rates decrease as the clusters become more homogeneous. After imposing a minimum of 70% cluster homogeneity on both the training and the testing data sets, this system was tested using 10-fold cross validation on 29 normal slices, with an average cluster type identification error rate of 1.2%.

1. Introduction

With the growing use of magnetic resonance (MR) imaging as a non-invasive diagnostic technique, the need for automatic image segmentation and labeling increases. Thus, a fast and precise automated image region labeling technique is desirable. Automated methods would not only decrease the inherent inconsistencies of segmentations performed by different human operators but could also shorten the time before the doctor receives results on tissue growth/shrinkage and can begin to make a diagnostic determination. Numerous experimental methods of image segmentation and labeling have been presented [9, 5, 4, 8, 2, 12, 11], including rule-based systems shown to be successful in labeling normal tissue slices [9] and volumes [3]. These systems also achieved partial labeling of abnormal slices and volumes, respectively. These systems rely on hand-generated rules.
The goal of the research described here was the automatic generation of rules to label the image data. The data includes the T1, proton density (PD), and T2 weighted images for each slice. The scope of this research was limited to normal MR slices of the human brain. The tissues of interest contained in such slices are white matter, gray matter, and cerebro-spinal fluid (CSF).

It is difficult to obtain ground truth for image voxels, as it generally requires an expert to outline boundaries on the image. The outlining process is very time consuming and therefore expensive. Hence, we attempt to generate enough labeled data (cluster centroids here) to learn heuristic rules by breaking an image into 10 masks. The masks include an intracranial region, an extracranial region, four quarters of the intracranial region, and four masks produced by including only every fourth voxel of the intracranial region. A semi-supervised clustering algorithm, ssfcm [1], is used to generate regions. The region centers are labeled using ground truth data. Each region or cluster gets the label of the class to which the majority of its voxels belong. C4.5 [10], a decision tree builder, takes the three-feature cluster center location and the class label of each cluster center as input and produces a decision tree. Subsequently, C4.5RULES [10] uses the decision tree to generate a set of production rules.

The remainder of this paper is organized as follows. Section 2 covers the use of domain knowledge in the process of learning cluster identification rules, Section 3 describes how our approach is put together, and Section 4 presents results and discussion.

2. Domain Knowledge

Each slice studied in this research is an image of the human brain. These images were acquired in the axial plane. Each slice contains images for the three features of interest: T1 weighted, proton density (PD), and T2 weighted. A spin-echo sequence was used to acquire the T1

weighted images; otherwise, a fast spin-echo sequence was used. Figure 1 shows an example slice in which each of the feature images appears as an intensity image. The scope of this research only covers normal brain images. Both 3 mm and 5 mm slices from both GE and Siemens systems were used. The data acquisition sequence can be found in [6].

Figure 1. Example normal raw image (T1, PD, T2).

Since we know that T1, PD, and T2 vary according to the tissue type, the following cluster center initialization was used to encourage more useful final partitions. When training data is not available for all tissue classes, the region between the minimum and the maximum intensities in the image is divided into c-1 regions, where c is the number of clusters. The initial cluster centers are then placed along the boundaries of these regions. Thus, the initial cluster centers are evenly distributed within the available intensity range if no training data is available. This distribution implements knowledge about the distribution of T1, PD, and T2 in feature space. Any other initialization that encodes this knowledge would also be valid.

3. The segmentation and identification system

This system, which allows for the automatic generation of heuristic rules from quantitative data, requires a number of steps: masking, clustering sub-images, cluster labeling, and production rule generation. After the rules are generated, they may be applied to clusters of unseen test images.

3.1 Masking

In order to generate enough clusters for training given a limited number of available ground truth images, a series of masks is generated for each slice. A mask defines which of an image's voxels are to be considered for further processing. Such voxels are either labeled or unlabeled. Labeled data can serve as a guide to the segmentation stage's label assignment to unclassified data and to the clustering algorithm. From each image, ten masks are produced.
One is of the extracranial region and another is of the intracranial region. The other eight represent two different quarterings of the intracranial region. Four produce quartering by capturing only a single quadrant of the image. Another four produce quartering by creating an image of every fourth voxel. If a voxel is not included in a mask, then it is masked out. Row boundaries are ignored and wraparound is used. Each image contains 256 rows and 256 columns. Figure 2 shows examples of the four main types of masks for this slice.

Figure 2. Example masks: (a) intracranial region, (b) extracranial region, (c) quartered mask, (d) interlaced mask. The white region is included.

Training data for the ssfcm clustering algorithm is generated by randomly selecting a voxel of the selected tissue type from a cumulative histogram of all the labeled data. This process is repeated until the required number of distinct training voxels (empirically determined to be 750) of each tissue type has been selected. The extracranial or background classes are labeled only as miscellaneous in ground truth. Thus, we are unable to use ground truth to produce training data for bone, skin, muscle, fat, and air. For this reason, it was decided not to produce training data for the extracranial clusters. This decision does not affect the choice of training data for the intracranial clusters since they are segmented separately.

3.2 Clustering

The ssfcm algorithm was used as the clustering algorithm. Each of the ten masks of a slice is clustered, giving ten sets of cluster center and voxel fuzzy membership data. For the intracranial masks, training data is also utilized during clustering. If training data is provided in the mask, it is used to initialize the cluster centers. Also, the training data is weighted so that the clustering algorithm acts as if there were multiple voxels with the same intensity as the training voxels.
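The effect of weighting a training voxel can be illustrated with a minimal Python sketch. It uses a plain weighted fuzzy centroid update, which is an assumption made for illustration and not the exact ssfcm formulation of [1]; the function name `weighted_center`, the fuzzifier `m`, and the sample values are all hypothetical. The sketch only shows that giving a voxel an integer weight w moves the center exactly as if that voxel were listed w times.

```python
from typing import List, Sequence


def weighted_center(
    voxels: Sequence[Sequence[float]],
    memberships: Sequence[float],
    weights: Sequence[float],
    m: float = 2.0,
) -> List[float]:
    """Return the weighted fuzzy centroid sum(w * u**m * x) / sum(w * u**m).

    Illustrative only: a generic weighted centroid, not the ssfcm update of [1].
    """
    dims = len(voxels[0])
    numerator = [0.0] * dims
    denominator = 0.0
    for x, u, w in zip(voxels, memberships, weights):
        coeff = w * (u ** m)  # a weight of w counts the voxel w times
        denominator += coeff
        for d in range(dims):
            numerator[d] += coeff * x[d]
    return [value / denominator for value in numerator]


# Hypothetical (T1, PD, T2) values: one training voxel, then two unlabeled voxels.
voxels = [(0.9, 0.5, 0.2), (0.4, 0.6, 0.7), (0.3, 0.7, 0.8)]
memberships = [1.0, 0.5, 0.25]

# Weighting the training voxel by 200 ...
weighted = weighted_center(voxels, memberships, [200, 1, 1])

# ... matches listing that voxel 200 times, each copy with weight 1.
replicated = weighted_center(
    [voxels[0]] * 200 + voxels[1:],
    [memberships[0]] * 200 + memberships[1:],
    [1] * 202,
)
```

Up to floating-point rounding, `weighted` and `replicated` agree, and both lie close to the training voxel because its contribution dominates the sums.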
We used an empirically determined weight of 200; hence, ssfcm treats each training voxel as if it appeared 200 times. In this manner, the training voxels are weighted more heavily than non-training voxels and the problem of the least squares algorithm preferring equal cluster sizes is combated. The ratio of training voxels to unlabeled voxels varies with the intracranial mask and the size of the intracranial region. No training data has been provided for the extracranial mask, whose cluster centers are initialized by dividing the minimum to maximum intensity range into the appropriate number of divisions (as determined by the number of classes) and placing the cluster centers at these divisions.

The extracranial mask is clustered into three classes. These classes contain air, bone, fat, skin, and muscle. The intracranial masks, however, are clustered into five classes. This represents an over-segmentation of the expected three main classes (white matter, gray matter, and CSF). The extra classes take into consideration the fact that both white matter and gray matter sometimes split into multiple clusters. The extra classes are an attempt to compensate for this splitting and keep the classes uncontaminated by other tissues, rather than have multiple classes merged into a single cluster. A clustered image is shown in Figure 3a and the corresponding ground truth image is shown in Figure 3b.

Figure 3. (a) Clustered image and (b) ground truth. Dark gray represents gray matter, light gray indicates white matter, white indicates CSF, and black represents miscellaneous tissue.

3.3 Cluster labeling

After each mask of an image has been clustered, it is necessary to combine the data generated for the extracranial mask with the data generated from each of the intracranial masks. This recombines the intracranial region and the extracranial region, allowing the system to be used to label tissue types in images that were clustered as a single image.
The cluster centers of the extracranial region may be combined with the cluster centers of each of the nine segmentations of the intracranial region. This merging yields nine segmentations of sub-images; each segmentation consists of a total of eight clusters: five intracranial cluster centers and three extracranial cluster centers. In all but the full image, regions will exist that were masked out.

After the merging of the cluster data, the cluster labels are identified by a combination of the ground truth values of the member voxels and their fuzzy membership values in each cluster. For each voxel x_i, membership in class C_k, as specified in the ground truth data, is represented by the function \delta_k(x_i), as shown in Equation (1). The cluster label matrix, CL, is incremented by the membership value at the location denoted by the voxel's ground truth value, as shown in Equation (2).

\delta_k(x_i) = \begin{cases} 1 & \text{if } x_i \in C_k \\ 0 & \text{otherwise} \end{cases}    (1)

CL_{jk} = \sum_{i=0}^{n-1} \delta_k(x_i) \, U_{ij}    (2)

FCL_j = k \;\text{such that}\; CL_{jk} \ge CL_{jm}, \; \forall m, \; k \ne m    (3)

Here U_{ij} represents the fuzzy membership matrix, 0 <= j < number of clusters, 0 <= k < number of labels, n = number of voxels, and 0 <= i < number of voxels.

After this is complete, the class label for each cluster is determined by simply finding which class has the greatest (membership-weighted) number of voxels in the cluster, as shown in Equation (3). FCL is the final cluster label. This method is similar to that used by Hillman [8]. Although never seen in our data, should a cluster have no clearly defined class (that is, at least the two maximum values in the fuzzy membership data are equal), the lower class number is selected.

The method described above to determine the cluster type could result in incorrect labels because of the partial volume effect in the voxels. In an effort to minimize these effects, we only use the fuzzy membership grade of the voxel to keep track of the increment to the cluster's size.
Thus, if the voxel's maximum fuzzy membership value is 0.7, the cluster label matrix is not incremented by 1.0 to show a complete voxel, but by 0.7.

After every cluster in a slice (for a given mask) has been labeled, we normalize the data. During normalization, the ranges of the T1, T2, and PD values are independently scaled into the range 0 to 1. With normalization, we hope to overcome much of the inter-image and inter-patient intensity variability. The relationships between T1, PD, and T2 remain relatively constant in each case in the absence of outside influence, such as radiation therapy and chemotherapy, which should not be found in any of our volunteers. The normalized cluster center data, along with the cluster label, is then ready to be used as training data for C4.5.
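The labeling rule of Equations (1) to (3) and the normalization step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `label_clusters` and `normalize_features` are hypothetical, Equation (2) is read literally so that every voxel contributes its membership U_ij to every cluster j, and the minimum and maximum used for scaling are assumed to be taken over the vectors passed in.

```python
from typing import List, Sequence, Tuple


def label_clusters(
    memberships: Sequence[Sequence[float]],
    truth: Sequence[int],
    num_labels: int,
) -> Tuple[List[int], List[List[float]]]:
    """Accumulate CL[j][k] (Eq. 2) and pick the final label per cluster (Eq. 3).

    memberships[i][j] is the fuzzy membership of voxel i in cluster j;
    truth[i] is the ground-truth class index of voxel i.
    """
    num_clusters = len(memberships[0])
    cl = [[0.0] * num_labels for _ in range(num_clusters)]
    for u_i, k in zip(memberships, truth):
        for j, u_ij in enumerate(u_i):
            cl[j][k] += u_ij  # increment by the membership grade, not by 1.0
    # list.index returns the first maximum, so ties go to the lower class number.
    labels = [row.index(max(row)) for row in cl]
    return labels, cl


def normalize_features(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature (e.g. T1, PD, T2) independently into the range 0 to 1."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Hypothetical data: three voxels, two clusters, two classes.
labels, cl = label_clusters(
    [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]], truth=[0, 0, 1], num_labels=2
)
centers = normalize_features([[10.0, 200.0, 30.0], [20.0, 100.0, 30.0]])
```

A feature whose values are all identical has no range to scale, so the sketch maps it to 0.0; that choice is an assumption the text does not address.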

3.4 Production rule generation

Since we are using normalized data, training data from different slices and even different volumes may be combined. The paucity of training and testing data presents two difficulties. First, it is difficult to generate a test set large enough to produce an accurate measure of the error rate while keeping a large training set. Second, the error rates of the test set can be highly variable depending on the division of data into training and testing sets. To overcome these difficulties, we use cross-validation, which provides a more accurate estimate of the overall error rate [10]. The training data consists of normalized cluster center values (T1, PD, T2) along with the cluster center's tissue type, as assigned in the cluster analysis phase.

Figure 4. Cluster Homogeneity vs. % of Total Clusters Used.

4. Results and Discussion

Since there is little labeled data available, 10-fold cross-validation is done. The full labeled data set is broken into 10 partitions, each using 90% of the data for training with a unique 10% held out for testing. Results are reported as averages over the 10 folds. The average number of rules from each fold was 20.2 with a standard deviation of 4.4. It was found that misclassification generally occurs when a cluster center is on a border.

Table 1 shows the error rates associated with each tissue type from a data set in which there is a minimum required homogeneity of 70% for each cluster. Homogeneity is defined as the percentage of voxels in a cluster which have the same class label as the cluster. For example, in a 70% homogeneous class A cluster, 70% of the member voxels belong to class A. The other 30% of the member voxels may belong to any other class or combination of classes. With no minimum required homogeneity, the error rate is 5% in cluster identification.
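The homogeneity definition and the minimum-homogeneity requirement can be sketched in Python as follows. This is a minimal sketch, assuming each cluster is available as a label plus the list of ground-truth classes of its member voxels (for example, after assigning each voxel to its maximum-membership cluster); the names `cluster_homogeneity` and `filter_homogeneous` are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Cluster = Tuple[Hashable, Sequence[Hashable]]  # (cluster label, member voxel classes)


def cluster_homogeneity(voxel_classes: Sequence[Hashable], cluster_label: Hashable) -> float:
    """Percentage of member voxels whose class matches the cluster's label."""
    if not voxel_classes:
        raise ValueError("a cluster needs at least one member voxel")
    matches = sum(1 for voxel_class in voxel_classes if voxel_class == cluster_label)
    return 100.0 * matches / len(voxel_classes)


def filter_homogeneous(clusters: Sequence[Cluster], minimum: float = 70.0) -> List[Cluster]:
    """Keep only clusters that meet the minimum required homogeneity."""
    return [
        (label, members)
        for label, members in clusters
        if cluster_homogeneity(members, label) >= minimum
    ]


# Hypothetical clusters: a 70% homogeneous class A cluster and a 60% class B cluster.
clusters = [
    ("A", ["A"] * 7 + ["B"] * 2 + ["C"]),
    ("B", ["B"] * 6 + ["A"] * 4),
]
usable = filter_homogeneous(clusters, minimum=70.0)
```

Raising `minimum` removes more clusters from both the training and the testing sets, which is the trade-off between error rate and data set size discussed in this section.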
The difficulty is that a misidentified cluster means all the pixels of the majority class in the cluster are incorrectly labeled (a minority of the pixels may be correctly labeled). The other difficulty is that many identified clusters were not very homogeneous, as seen in Figure 4. This means that even when the cluster was correctly labeled, the voxel level error could be significant.

In our testing, it has been noted that the average cluster center identification error rate decreases as the required cluster homogeneity is increased (see Figure 5). However, this requirement reduces our already small training data set (see Figure 4). A balance was determined to exist, for this data set, at a required homogeneity of 70%. This requires that each cluster in both the training and testing data sets have at least 70% of its member voxels of the cluster label tissue type. For this data set, 81.7% of the total clusters are usable, with greater than 50% of the clusters of each tissue type used.

Figure 5. Cluster Homogeneity vs. Set Error Testing.

Additional testing was done using an FCM [7] segmentation. For these tests, the extracranial region was not extracted. For this reason, the number of clusters was also increased. Segmentations were done by individually segmenting 3 slices into 10 and 20 clusters. The resulting cluster centers, after normalization and labeling, were used as a test data set for a production rule set which was trained on all other existing data. The error rates, 43% and 28% respectively, were much higher than those seen when the intracranial region was extracted prior to segmentation [6]. As expected, the error rates fell, to 26% and 21% respectively, when the test data included only clusters with homogeneity of at least 70%. These error rates are much higher than the corresponding ssfcm error rates. This is a result of segmenting the entire image simultaneously rather than extracting the extracranial region and then segmenting.
Gray and white matter clusters get confused with skull tissues, resulting in the poor segmentation. Clearly, intracranial tissue extraction must be done before clustering for the proposed approach to be effective.

This research would benefit greatly from larger training and testing data sets. Given a wider variety of training data, the production rules should prove to be a much more accurate predictor of tissue type than is currently seen. It is also clear that clustering with the extracranial portion of the image masked out would need to be done on unseen images to enable reasonable accuracy. A measure of homogeneity of individual clusters would be useful so that those that are not homogeneous can be excluded from labeling and processed further. The results here suggest that with enough training data it will be possible to learn to identify homogeneous clusters and hence label tissues of MR images of the brain.

Table 1. Cluster Center Identification Testing Results for Data Set Requiring 70% Homogeneity for both Training and Testing.

Tissue                                       Misc.      CSF     White Matter  Gray Matter  Avg.
Avg. Error Rate (%)                          0          4.8     0             0            1.2
Avg. Number of Mislabeled Clusters           0          1       0             0            0.25
Avg. Size of Mislabeled Cluster (voxels)     NA         243     NA            NA           243
Avg. Size of Cluster (voxels)                16466      305     1793          1163         4932
Total # of Voxels                            12843496   97387   423089        564418       NA
% of Total Voxels Remaining                  100        67      83            77           82
Voxel Error Introduced by Segmentation (%)   0          46.2    14.0          10.1         17.6 +/- 17.3
Voxel Error Added by Rules                   0          184     0             0            46 +/- 80
Voxel Errors Reduced by Rules                0          0       0             54.4         14 +/- 24

References

[1] A. Bensaid, L. Hall, J. Bezdek, and L. Clarke. Partially supervised clustering for image segmentation. Pattern Recognition, 29(5):859-871, 1996.
[2] J. Bezdek, L. Hall, and L. Clarke. Review of MR image segmentation techniques using pattern recognition. Medical Physics, 20(4):1033-1048, 1993.
[3] M. Clark, L. Hall, D. Goldgof, et al. MRI segmentation using fuzzy clustering techniques: Integrating knowledge. IEEE Engineering in Medicine and Biology, 13(5):730-742, 1994.
[4] M. Clark, L. Hall, D. Goldgof, R. Velthuizen, M. Murtagh, and M. Silbiger. Automatic tumor segmentation using knowledge-based techniques. IEEE Transactions on Medical Imaging, 17(2):187-201, April 1998.
[5] M. Clark, L. Hall, C. Li, and D. Goldgof. Knowledge based (re-)clustering. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, pages 245-250, 1994. Jerusalem, Israel.
[6] S. Crane. Automatic generation of heuristic rules to identify MR tissue types. Master's thesis, University of South Florida, 1998. Dept. of CSE, Tampa, Fl.
[7] L. Hall, A. Bensaid, L. Clarke, et al. A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Transactions on Neural Networks, 3(5):672-682, 1992.
[8] G. Hillman, C. Chang, H. Ying, et al. Automatic system for brain MRI analysis using a novel combination of fuzzy rule-based and automatic clustering techniques. In Medical Imaging 1995: Image Processing, pages 16-25. SPIE, February 1995. San Diego, CA.
[9] C. Li, D. Goldgof, and L. Hall. Automatic segmentation and tissue labeling of MR brain images. IEEE TMI, 12(4):740-750, December 1993.
[10] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992. San Mateo, CA.
[11] M. Sonka, S. Tadikonda, and S. Collins. Knowledge-based interpretation of MR brain images. IEEE TMI, 15(4):443-452, August 1996.
[12] C. Tsai, B. Manjunath, and R. Jagadeesan. Automated segmentation of brain MR images. Pattern Recognition, 28(12):1825-1837, 1995.