FEATURE DESCRIPTORS FOR NODULE TYPE CLASSIFICATION

Amal A. Farag (a), Aly A. Farag (a), Hossam Abdelmunim (a,b), Asem M. Ali (a), James Graham (a), Salwa Elshazly (a), Ahmed Farag (a), Sabry Al Mogy (c,d), Mohamed Al Mogy (c), Sahar Al Jafary (e), Hani Mahdi (b), Robert Falk (f) and Rebecca Milam (g)

(a) Computer Vision and Image Processing Laboratory (CVIP Lab), University of Louisville, Louisville, KY 40292
(b) Computer & Systems Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt
(c) School of Medicine, Mansoura University, Egypt
(d) Mogy Scan, Mansoura, Egypt
(e) School of Medicine, Ain Shams University, Cairo, Egypt
(f) Jewish Hospital and 3DR, Louisville, Kentucky
(g) University of Louisville, Department of Radiology

E-mail: aafara2@louisville.edu
URL: www.cvip.uofl.edu

Abstract

This paper examines feature-based nodule description for the purpose of nodule categorization (i.e., assigning detected nodules to types) in low-dose CT scanning (LDCT). The multi-resolution Local Binary Pattern (LBP) and the Distance Transform of the edge maps were used to generate features that describe the texture and shape of common nodules and non-nodules. The LBP was also applied to the Distance Transform output to obtain combined shape- and texture-based feature descriptors of the nodules and non-nodules. These features were optimized using PCA and LDA, and the resulting sets were used for categorization into five categories: juxta-pleural, vascularized, pleural-tail, well-circumscribed and non-nodule. In the categorization process, the combinational shape- and texture-based feature descriptor resulted in an overall 12% enhancement in results when compared to using shape and texture features separately. These results are encouraging and are good indicators of progress towards fully automated detection, segmentation, categorization (into types) and classification (into pathologies) of lung nodules from LDCT scans.

Keywords: lung nodule classification, Distance Transform, Geometric Descriptors.

1. Introduction

Survival from lung cancer is strongly dependent on accurate and early diagnosis [1]. In the past two decades, numerous screening protocols and studies have been carried out worldwide to study early indications of lung cancer. Of particular importance is the ability to obtain a diagnosis from low-dose computed tomography (LDCT), which reduces the risk of radiation exposure.

Within the scope of texture and shape recognition, numerous algorithms exist [2]. In texture classification, the main goal is to produce a map that enables classification of the input image(s) into the desired classes, while shape classification represents the object region, given as a binary boundary image or a filled contour of the object, in a manner that enables classification of the input image(s). Feature extraction is a rich subject in the literature, whether it is general (color, texture and shape) or domain specific (fingerprints and human faces).

Samala et al. [3] defined nine feature descriptors that describe the nodule characteristics: 1. Internal structure, 2. Lobulation, 3. Texture, 4. Malignancy, 5. Sphericity, 6. Margin, 7. Calcification, 8. Subtlety, and 9. Spiculation. A nodule is defined as a small mass or
lump of irregular or rounded shape, yet this definition is ambiguous when it comes to applying it in the fields of computer vision and machine learning.

To the best of the authors' knowledge, texture- and shape-based feature extraction approaches are not commonly applied to lung nodules in low-dose CT (LDCT) slices. Two closely related works on texture feature extraction for our application are the following. Hara et al. [4] used 2nd order autocorrelation features to detect lung nodules in 3D chest images. Local texture analysis was used for identifying and classifying lung abnormalities in [5], where a k-nearest neighbor classifier was applied in a leave-one-out fashion: the feature vector to be classified is left out of the training set. In our previous work [6, 7], we implemented an adaptation of Daugman's iris recognition algorithm, as well as the SIFT, multi-resolution LBP and SURF algorithms, to investigate texture-based feature description for lung nodule classification in LDCT scans.

In this paper we use a texture feature extraction algorithm known as the Multi-resolution Local Binary Pattern (LBP) and a shape feature extractor, the distance transform. These methods are applied both separately and in combination: the distance transform images obtained for the nodule and non-nodule data undergo texture extraction, resulting in a shape- and texture-based feature descriptor.

This paper is organized as follows: Section 2 briefly describes the LBP and the distance transform. Section 3 presents the classification results obtained using the shape- and texture-based descriptors, and Section 4 provides the conclusion and future extensions.

2. Feature Descriptors

Invariance and distinctiveness are the main conditions on which the success of object description depends. A distinctive characterization of the desired object needs to be produced while robustly accommodating variations in imaging conditions.
In this section the Multi-Resolution Local Binary Pattern and Distance Transform feature descriptors are described.

Multi-Resolution LBP: The LBP is a powerful texture feature descriptor in the computer vision literature that is invariant to monotonic changes in gray-scale and robust to illumination changes. This descriptor was first introduced in [8] and then extended in [9], which uses a circular neighborhood of varying radius to overcome neighborhood size limitations. In this paper we use the extended LBP operator within a (P,R) neighborhood, restricted to uniform patterns. We report the results obtained from using the LBP of both the original images and the gradient images, where horizontal and vertical Sobel filters were used to generate the gradient magnitude image. The extracted LBP descriptors are projected to a lower-dimensional subspace using principal component analysis (PCA) and linear discriminant analysis (LDA), where noise is filtered out.

Distance Transform: The distance transform is a shape-based feature descriptor that assigns to each pixel of the binary edge map image its distance to the nearest obstacle pixel, i.e., the nearest edge pixel. The extracted signed distance transform images were projected to a lower-dimensional subspace using PCA and LDA. The LBP of the signed
distance image results were also obtained, resulting in a combinational shape and texture feature descriptor representation of the nodules and non-nodules.

3. Results

This work is based on the ELCAP public database [10], which consists of 50 sets of LDCT lung scans taken at a single breath-hold, with slice thickness 1.25 mm and resolution 0.5 x 0.5 mm. Locations of 397 nodules were provided by radiologists; these were used to create a database that consists of 39.12% juxta-pleural nodules, 13.95% vascularized nodules, 31.29% well-circumscribed nodules and 15.65% pleural-tail nodules. A subset of 294 of the original 397 nodules, those that could be accurately categorized, was used.

In the classification step, we use the ground-truth nodules marked by the radiologists. Given the nodule centroid, we extract three feature descriptors: the LBP, the distance transform, and the LBP of the distance transform image. Classification using the generated feature descriptors, for each of the five classes, was carried out using a k-NN leave-one-out classifier with Euclidean distance as the similarity measure, in order to test whether distinctions are in fact apparent between classes. Various training percentages within the classes were used, i.e., x% is the proportion of ground-truth nodules taken into consideration in the training phase. Training in this paper was performed using a one-time random sampling approach.

Nodule type classification performance was quantified by measuring true-positive rates. A classification result is considered a true positive if a sample from a certain class is classified as belonging to the same class. Figure 1 depicts sample results for the LBP, distance transform and combinational shape- and texture-based methodology for each nodule type and non-nodule.
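
The descriptors of Section 2 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions: the function names (`uniform_table`, `lbp_histogram`, `distance_transform`, `multi_resolution_lbp`, `combined_descriptor`) are hypothetical, the (P, R) pairs are supplied by the caller because the paper does not list them, the edge map is taken as an already computed binary image, and the distance is left unsigned because a sign requires knowing which side of the contour a pixel lies on.

```python
import math
from functools import lru_cache

EPS = 1e-9  # tolerance so that flat regions compare as "greater or equal"


@lru_cache(maxsize=None)
def uniform_table(P):
    """Map every uniform P-bit pattern (<= 2 circular bit transitions) to a bin index."""
    table = {}
    for code in range(1 << P):
        bits = [(code >> p) & 1 for p in range(P)]
        if sum(bits[p] != bits[(p + 1) % P] for p in range(P)) <= 2:
            table[code] = len(table)
    return table


def bilinear(img, y, x):
    """Bilinearly interpolate img (a list of rows) at a fractional position."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, len(img) - 1), min(x0 + 1, len(img[0]) - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def lbp_histogram(img, P=8, R=1.0):
    """Normalized histogram of uniform LBP codes; the last bin collects non-uniform codes."""
    table = uniform_table(P)
    hist = [0] * (len(table) + 1)
    margin = int(math.ceil(R))  # keep the sampling circle inside the image
    offsets = [(round(-R * math.sin(2 * math.pi * p / P), 9),
                round(R * math.cos(2 * math.pi * p / P), 9)) for p in range(P)]
    for y in range(margin, len(img) - margin):
        for x in range(margin, len(img[0]) - margin):
            code = 0
            for p, (dy, dx) in enumerate(offsets):
                # Set bit p when the circular neighbour is at least as bright as the centre.
                if bilinear(img, y + dy, x + dx) >= img[y][x] - EPS:
                    code |= 1 << p
            hist[table.get(code, len(table))] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def distance_transform(edges):
    """Brute-force Euclidean distance from each pixel to the nearest nonzero (edge) pixel."""
    points = [(y, x) for y, row in enumerate(edges) for x, v in enumerate(row) if v]
    if not points:
        raise ValueError("edge map contains no edge pixels")
    return [[min(math.hypot(y - ey, x - ex) for ey, ex in points)
             for x in range(len(edges[0]))] for y in range(len(edges))]


def multi_resolution_lbp(img, scales):
    """Concatenate uniform LBP histograms over caller-supplied (P, R) pairs."""
    features = []
    for P, R in scales:
        features.extend(lbp_histogram(img, P, R))
    return features


def combined_descriptor(edges, scales):
    """Shape-and-texture descriptor: LBP histograms of the distance-transformed edge map."""
    return multi_resolution_lbp(distance_transform(edges), scales)
```

For a binary edge map `edges`, a call such as `combined_descriptor(edges, ((8, 1.0), (8, 2.0)))` returns one concatenated histogram per patch. The brute-force distance computation is only practical for small patches around a nodule centroid; a production pipeline would more likely use an optimized library routine.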

Figure 1: First row shows a typical non-nodule (first column) and nodule textures (juxta-pleural, well-circumscribed, vascularized and pleural-tail, respectively). Second row shows edge maps (using the
Canny Operator). Third row is the signed distance. Fourth row is the LBP descriptor. Last row is the combinational shape and texture description (signed distance + LBP).

The results in Tables 1 and 2 show that LDA projection returns higher true-positive rates when more training is conducted, using either the LBP or the distance descriptors separately. For the PCA results, in contrast, less training data resulted in better true-positive classification of nodules. In the non-nodule distance transform experiments, more training data was needed to obtain results that were, in some instances, perfect. This is understandable since the non-nodules do not have specific shape characteristics that can be defined or manipulated as in the nodule case.

Table 3 shows that the results obtained when the LBP was computed from the distance transform images are impressive. A 20% true-positive rate increase was found in the PCA 25% training combinational vascular nodule case when compared to the PCA LBP results obtained when only the texture information was used for classification, and a 13% increase over the distance transform results alone. The percentage increase varied across nodule categories. Overall, the PCA combinational shape and texture description of nodules resulted in a drastic true-positive rate increase in classification. All of the results depicted in Tables 2 and 3 allow the conclusion that non-nodules do in fact contain descriptor variations that allow them to be correctly classified. Also, the combination of shape and texture feature information allows a better object representation to be obtained, and thus improved classification results.

Table 1: Classification results for various nodules using Raw LBP, LDA LBP and PCA LBP with variable training percentages.
Nodule Type          Raw LBP (100% 75% 50% 25%)   LDA LBP (100% 75% 50% 25%)   PCA LBP (100% 75% 50% 25%)
Juxta Pleural        52 5 47 38 86 65 5 64 64 59 67
Well Circumscribed   4 41 4 26 65 8 63 36 64 6 66 82
Vascular             22 29 32 1 32 76 56 32 2 22 37 56
Pleural Tail         22 2 17 11 76 52 39 33 17 33 46
Non Nodule           78 77 74 68 88 6 44 86 87 83 96

Table 2: Classification results for various nodules using Raw Distance Transform, LDA Distance Transform and PCA Distance Transform with variable training percentages.
Nodule Type          Raw Distance Transform (100% 75% 50% 25%)   LDA Distance Transform (100% 75% 50% 25%)   PCA Distance Transform (100% 75% 50% 25%)
Juxta Pleural        38 39 35 34 88 61 45 62 54 6 68
Well Circumscribed   33 33 36 34 74 83 63 45 46 59 48 55
Vascular             12 12 15 15 29 76 54 29 37 22 61 63
Pleural Tail         17 17 17 15 85 54 33 17 24 35 52
Non Nodule           63 68 68 49 87 65 49 83 89 85 79
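
The per-class true-positive rates reported in these tables come from a leave-one-out k-NN evaluation with Euclidean distance. The following sketch shows one way such rates could be computed. The function name `loo_true_positive_rates` is hypothetical, the default k = 1 is an assumption (the paper does not state k), and the PCA/LDA projection and the training-percentage sampling are omitted.

```python
import math
from collections import Counter


def loo_true_positive_rates(features, labels, k=1):
    """Per-class true-positive rate of a leave-one-out k-NN classifier.

    features: list of equal-length numeric vectors (at least two samples)
    labels:   class label of each vector
    k:        number of neighbours to vote (assumed value; not given in the paper)
    """
    correct, total = Counter(), Counter()
    n = len(features)
    for i in range(n):
        # Hold out sample i and rank all remaining samples by Euclidean distance.
        ranked = sorted(
            ((math.dist(features[i], features[j]), labels[j])
             for j in range(n) if j != i),
            key=lambda pair: pair[0],
        )
        # Majority vote among the k nearest; a tie goes to the label seen first,
        # i.e. the one belonging to the closest of the tied neighbours.
        votes = Counter(label for _, label in ranked[:k])
        predicted = votes.most_common(1)[0][0]
        total[labels[i]] += 1
        if predicted == labels[i]:
            correct[labels[i]] += 1
    # A true positive is a sample assigned to its own class.
    return {label: correct[label] / total[label] for label in total}
```

Called with one descriptor vector per nodule and the radiologists' type labels, the function returns a dictionary mapping each of the five classes to its true-positive rate as a fraction between 0 and 1.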

Table 3: Classification results obtained from the Raw Combinational Feature Descriptor and the PCA Combinational Feature Descriptor with variable training percentages.
Nodule Type          Raw Combinational Feature Descriptor (100% 75% 50% 25%)   PCA on Combinational Feature Descriptor (100% 75% 50% 25%)
Juxta Pleural        4 41 39 37 78 76 76 79
Well Circumscribed   4 37 36 34 73 68 71 68
Vascular             24 2 22 12 51 54 44 76
Pleural Tail         22 26 22 2 33 35 41 54
Non Nodule           63 57 58 49 99 98

4. Conclusion and Future Work

This paper discussed several key approaches for nodule and non-nodule texture and shape feature extraction, using some of the well-known feature descriptors in the computer vision literature that are used here for the first time in lung nodule classification research. The features from the descriptors were optimized by projection to a lower-dimensional subspace using PCA and LDA in order to decrease noise artifacts in the generated features. Classification of the nodules and non-nodules was examined using a k-NN leave-one-out algorithm with the Euclidean distance as the similarity measure, in order to test whether significant distinctions exist between the nodule classes. An overall 12% true-positive rate increase was found in the PCA combinational classification results over using the PCA LBP or the PCA distance transform separately.

Future directions are geared toward generating a larger nodule database from other clinical data to expand our work. We aim to integrate the combinational feature extractor into our detection process for false positive reduction and to compare our findings with the literature. We also aim to incorporate other classification techniques into the approach proposed in this paper, for comparison and to obtain the best generalized method.

Acknowledgements: This research has been funded by the Kentucky Lung Cancer Program.
Data and expertise from Mogy Scan, Mansoura, Egypt; 3DR and Jewish Hospital, Louisville, KY; and Ain Shams University, Egypt are greatly appreciated.

References

1. United States National Institute of Health. www.nih.gov
2. Tao, B. and Dickinson, B. Texture recognition and image retrieval using gradient indexing. Journal of Visual Communication and Image Representation, 11(3):327-342, 2000.
3. Samala, R., et al. A Novel Approach to Nodule Feature Optimization on Thin Section Thoracic CT. Acad. Radiology, Vol. 15, pp. 1181-1197, 2009.
4. Hara, T., Hirose, M., Zhou, X., Fujita, H. and Kiryu, T. Nodule detection in 3D chest CT images using 2nd order autocorrelation features. Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, 2005.
5. van Ginneken, B., Katsuragawa, S., Romney, B., Doi, K. and Viergever, M. Automatic Detection of Abnormalities in Chest Radiographs Using Local Texture Analysis. IEEE Transactions on Medical Imaging, Vol. 21, No. 2, 2002.
6. Amal Farag, Asem Ali, Shireen Elhabian, James Graham, Aly Farag and Robert Falk. Feature-Based Lung Nodule Classification. International Symposium on Visual Computing (ISVC-10), Las Vegas, November 2010, pp. 79-88.
7. Amal Farag, Shireen Elhabian, James Graham, Aly Farag and Robert Falk. Toward Precise Pulmonary Nodule Descriptors for Nodule Type Classification. 13th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI-10), Beijing, September 2010.
8. T. Ojala, M. Pietikainen and D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29, 1996, pp. 51-59.
9. T. Ojala, M. Pietikainen and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 2002, pp. 971-987.
10. ELCAP public lung image database, http://www.via.cornell.edu/databases/lungdb.html