A Comparison of wavelet and curvelet for lung cancer diagnosis with a new Cluster K-Nearest Neighbor classifier
HAMADA R. H. AL-ABSI¹ AND BRAHIM BELHAOUARI SAMIR²
¹Department of Computer and Information Sciences, Faculty of Science and Information Technology, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia
²College of Science, ALFAISAL University, P.O. Box 50927, Riyadh 11533, Kingdom of Saudi Arabia
¹Hamada.it@gmail.com ²sbelhaouari@alfaisal.edu
Abstract: - This paper compares wavelet and curvelet for lung cancer diagnosis in terms of diagnostic accuracy when each is applied separately together with the cluster K-nearest neighbor classifier. Lung cancer is among the diseases with the highest mortality rates globally. The computer aided diagnosis system presented in this paper consists of a preprocessing stage, a feature extraction stage (wavelet or curvelet), a feature selection stage and, finally, a classification stage. The results obtained on the X-ray dataset used suggest that wavelet produces better accuracy, with fewer false positives and false negatives, than curvelet.
Key-Words: - lung cancer; curvelet; wavelet; computer aided diagnosis; feature selection; Cluster-K-NN
1 Introduction
Lung cancer is one of the main causes of death globally [1]. According to the GLOBOCAN project [2], 1,608,055 new lung cancer cases occurred in 2008 in both sexes (12.7% of all cancers), and 1,376,579 lung cancer deaths (18.2% of all cancer deaths) occurred in the same year. Figure 1 compares lung cancer with other types of cancer in terms of incidence and mortality in both sexes.
Fig. 1: Incidence and mortality of cancer types affecting both sexes (all ages) [2]
As the figure shows, lung cancer affects more people than any other type of cancer and, consequently, causes more deaths. Computer aided diagnosis (CAD) systems could assist in the early detection of lung cancer.
Methods for the detection and diagnosis of lung cancer in CAD systems have been developed in previous studies. A CAD system for pulmonary nodule detection in chest radiographs was presented in [3]. The system employed an adaptive distance-based threshold algorithm for nodule segmentation. Geometric, intensity and gradient features were calculated for each segmented nodule, and a Fisher linear discriminant classifier was used for classification. When applied to a dataset of 167 chest radiographs containing 181 lung nodules, the system correctly detected 78.1% of the nodules. Sousa et al. [4] presented a system for automatic lung nodule detection. The system consists of six stages, each performing a specific task such as segmentation, reconstruction and false-positive reduction. The system achieved a sensitivity of 84.84% and a specificity of 96.15%. A system that detects lung nodules using shape-based genetic algorithm template matching (GATM) was proposed by Dehmeshki et al. [5]. A dataset of 70 CT scans with 178 nodules was used to evaluate the system; 160 of those nodules were detected, a detection rate of about 90%. A system with an ensemble classifier aided by clustering was proposed by Lee et al. [6] to detect lung nodules. Scans of 32 patients' lungs, comprising 5721 images, were used for evaluation, and a sensitivity of 98.33% and a specificity of 97.11% were obtained. Orbán et al. [7] developed a method for lung nodule detection. The method starts with an algorithm that preprocesses the radiograph by removing the ribs, which increases the visibility of any nodule that might exist. A second algorithm, the Constrained Sliding Band Filter (CSBF), then enhances round-shaped objects. Finally, an SVM based on texture features is used to reduce the number of false detections. The JSRT dataset was used together with a private database, and the method achieved 61% sensitivity at 2.5 false positives per image. Pereira et al. [8] introduced an approach for the classification of lung nodules with multiple classifiers. The approach starts by filtering the image with a multi-scale filter bank of 36 filters at different scales and orientations, including isotropic, Gaussian and Laplacian of Gaussian filters. Multiple classifiers based on different multilayer perceptrons (MLP) were then used to classify the images. The authors evaluated the approach on the JSRT dataset; with 19 classifier combinations, the Borda count combination produced a 97% sensitivity with a 43% error. Other combinations produced lower errors (except 1 and 9), but none lower than 16.21%. The authors concluded that the low performance is due to the large number of combinations.
ISBN: 978-1-61804-148-7 212
As this is an active area of research, many other CAD systems have been developed for lung cancer and for many other types of cancer. However, CAD systems have limitations, such as difficulty in diagnosing subtle regions and high false-positive rates. In this paper, we address these issues by comparing the performance of wavelets and curvelets.
2 Method
The CAD system presented in this paper consists of a preprocessing stage, in which all images are filtered with a Laplacian filter; a feature extraction stage with wavelet or curvelet; a two-step feature selection stage; and, finally, classification with the cluster K-nearest neighbor classifier. Figure 2 illustrates the system. The method is first trained on a dataset of normal and abnormal, benign and malignant regions; once training is done, it is tested on a set of images that were not used in training.
Fig. 2: CAD system overview (Dataset → Preprocessing: apply Laplacian filter → Feature extraction: wavelet (1...6 levels) or curvelet (2...7 scales) → Feature selection: calculate statistical energy, calculate statistical metric → Classification of the testing image (Cluster K-Nearest Neighbor) → Result)
The following subsections explain each step of the CAD system.
2.1 Preprocessing
Each image, whether used for training or testing, is sharpened with a Laplacian filter. The Laplacian, which is a linear operator, is defined for an image f(x, y) as [9]:
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{1}$$
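Numerically, (1) is evaluated with a discrete kernel. A minimal pure-Python sketch of the preprocessing step follows, assuming the standard 4-neighbour discrete approximation from [9] (centre coefficient -4) and edge-replicated borders. The names `laplacian` and `sharpen` are hypothetical, and the subtraction f - ∇²f is our reading of the background-recovery step of this subsection, not the authors' code.

```python
def laplacian(img):
    """Discrete 4-neighbour Laplacian of a 2-D list (centre coefficient -4).

    Border pixels reuse the nearest edge value (an assumption; the paper
    does not say how borders are handled).
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        # clamp indices so out-of-range neighbours reuse the edge pixel
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [px(r + 1, c) + px(r - 1, c) + px(r, c + 1) + px(r, c - 1) - 4 * px(r, c)
         for c in range(w)]
        for r in range(h)
    ]


def sharpen(img):
    """Subtract the Laplacian from the original image so the background is kept."""
    lap = laplacian(img)
    return [[img[r][c] - lap[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]


flat = [[5, 5], [5, 5]]
spot = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
print(sharpen(flat))  # [[5, 5], [5, 5]]: a flat region is unchanged
print(sharpen(spot))  # [[0, -10, 0], [-10, 50, -10], [0, -10, 0]]
```

With the -4 centre coefficient, subtracting the Laplacian brightens a bright spot and darkens its neighbours, which is the sharpening effect. For real 8-bit radiographs the output would presumably also be clipped to the valid intensity range; the paper does not state this.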
In discrete form, the second derivatives in the x- and y-directions are approximated as
$$\frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y) \tag{2}$$
$$\frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y) \tag{3}$$
so that
$$\nabla^2 f(x, y) = f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x, y) \tag{4}$$
The filtered image has one problem: the image background is eliminated in the process. To recover the background, we subtract the filtered image from the original image. Figure 3 shows an example of applying the Laplacian filter to image 1 of the dataset.
Fig. 3: (a) original image segment, (b) Laplacian-filtered image, (c) Laplacian-filtered image after recovering the background
2.2 Feature Extraction
The next step in the process is feature extraction. Wavelet and curvelet are used separately in order to compare their performance and find which one achieves better results in the diagnosis of lung cancer.
2.2.1 Wavelet Transform
The wavelet transform can be interpreted as a decomposition of the signal into a set of independent, spatially oriented frequency channels. Let us suppose that Φ(x) and Ψ(x) are, respectively, a perfect low-pass and a perfect band-pass filter. Figure 4 shows, in the frequency domain, how the image is decomposed into four sub-images: the approximation, which corresponds to the lowest frequencies; a detail image that gives the vertical high frequencies (horizontal edges); a detail image that gives the horizontal high frequencies (vertical edges); and a detail image that gives the high frequencies in both directions (the diagonals).
Fig. 4: Example of wavelet decomposition with level 3
For the purpose of this paper, the db1 wavelet with 6 levels is used.
2.2.2 Curvelet Transform
The discrete curvelet transform is an image representation approach [10, 11]. It is based on the idea of representing a curve as a superposition of functions of various lengths and widths obeying the curvelet scaling law width ≈ length² [10, 11]. Figure 5 presents the curvelet analysis method.
Fig. 5: Curvelet method
Further details about this method can be found in [11].
2.3 Feature Selection
Once the feature extraction stage has been executed, a huge number of coefficients is produced. It is important to reduce them by selecting the coefficients that contain the most important information, which contributes to high accuracy, and ignoring the remaining ones. For this reason, we use two feature selection steps: calculating the statistical energy and then the statistical metric. The statistical energy is calculated as follows:
(5)
A statistical metric for feature selection is introduced in this paper. Suppose m1, m2 and m3 are the means of class 1, class 2 and class 3, respectively, and consider also the mean of all the classes. A measure defined from these means alone is not sufficient to quantify the classification contribution of the coefficients, because it may give the same value in two different cases. Therefore, another metric is needed to quantify the contribution of each coefficient; we introduce it as follows:
(6)
Equation (6) is expressed in terms of the statistical metric of each class, the mean of each class and the number of classes, and one of its terms is calculated by a further formula that depends on the number of features in each class.
The desired feature coefficients are then selected as follows: if the statistical metric of a feature is less than a certain threshold, the feature is removed; otherwise, it is kept.
2.4 Classification
A classifier based on the combination of a modified K-means algorithm and the K-Nearest Neighbor (K-NN) algorithm is applied in this research. This classifier was developed by Brahim Belhaouari Samir [12].
2.4.1 Cluster-K-Nearest Neighbor Classifier (C-K-NN)
The Cluster-K-Nearest Neighbor classifier combines two algorithms: the modified K-means algorithm and the K-Nearest Neighbor algorithm. K-means is used to cluster the data into classes and sub-classes, with a centre point representing each class, and K-NN is used to classify new data by calculating the Euclidean distance between the new data and the centre point of each class. With this combination, classification is more accurate and takes less time. A full explanation of the algorithm can be found in [12].
3 Dataset
The JSRT (Japanese Society of Radiological Technology) standard chest radiograph dataset [13] is used to evaluate the methods for lung cancer. The dataset contains 247 chest radiographs: 154 contain nodules (100 malignant and 54 benign) and 93 do not. A 128 × 128 sub-image containing the nodule was extracted from each original image, with the centre of the nodule at the centre of the sub-image. Figure 6 shows an example of one chest radiograph.
Fig. 6: An example from the JSRT dataset (JPCLN001.IMG): (a) original chest radiograph, (b) extracted sub-image [13]
4 Results
This section reports the results obtained. There were two experiments: the first on the classification of normal vs. abnormal images and the second on the classification of benign vs. malignant images.
4.1 Normal Vs. Abnormal
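Both experiments classify feature vectors with the C-K-NN classifier of Section 2.4.1 and report accuracy together with false positives and false negatives. The sketch below is a simplified, hypothetical reading of that description in pure Python. Each class is clustered into k sub-classes with plain Lloyd's k-means, standing in for the modified K-means of [12], which is not reproduced here. A new vector then takes the label of the nearest sub-class centre, and abnormal (label 1) is assumed to be the positive class when errors are counted. The names `kmeans`, `fit_cknn`, `predict` and `score`, the choice k = 2, and the toy data are all illustrative assumptions.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns up to k centroids for one class."""
    rng = random.Random(seed)
    centroids = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            # assign each point to its nearest current centroid
            j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # move each centroid to the mean of its group (keep it if the group is empty)
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def fit_cknn(X, y, k=2):
    """Cluster every class into k sub-classes; each centre keeps its class label."""
    model = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        model.extend((c, label) for c in kmeans(members, k))
    return model


def predict(model, x):
    """Return the label of the sub-class centre nearest to x (Euclidean distance)."""
    return min(model, key=lambda m: math.dist(x, m[0]))[1]


def score(y_true, y_pred, positive=1):
    """Accuracy, false positives and false negatives, with `positive` = abnormal."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    return acc, fp, fn


# Toy 2-D feature vectors: label 0 = normal, label 1 = abnormal (illustrative only).
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_cknn(X_train, y_train, k=2)

X_test = [(0.5, 0.5), (5.5, 5.5), (4.0, 4.0)]
y_test = [0, 1, 0]
y_pred = [predict(model, x) for x in X_test]
# (4, 4) lies closer to the abnormal sub-class centres, so it counts as a false positive
print(y_pred, score(y_test, y_pred))
```

Comparing a new vector against a handful of centres per class, rather than against every training vector, is what keeps the nearest-neighbour step cheap, in line with the motivation given for C-K-NN. The number of sub-classes used in [12] is not specified here.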
Table 1 shows the results obtained for the classification of normal vs. abnormal images with the db1 wavelet function, and Table 2 shows the results for curvelet.
Table 1: Normal Vs. Abnormal (db1 Wavelet)
Table 2: Normal Vs. Abnormal (Curvelet)
As shown in Tables 1 and 2, the wavelet (db1) function outperforms curvelet: an accuracy of 99.15% is reached with db1, whereas the highest curvelet accuracy is 78.63%, at scale 4.
4.2 Benign Vs. Malignant
Tables 3 and 4 report the performance of wavelet and curvelet when applied to the classification of benign vs. malignant cases.
Table 3: Benign Vs. Malignant (db1 Wavelet)
Table 4: Benign Vs. Malignant (Curvelet)
As shown in Tables 3 and 4, the wavelet (db1) function performed better than curvelet when both were applied separately to the classification of benign vs. malignant cases. Moreover, wavelet also outperformed curvelet when other wavelet functions, such as Haar, were applied. This is evidence that wavelet may be better suited to such classification tasks; however, further testing with multiple datasets of different modalities is needed to confirm this. As noted before, the images used are X-ray images, so further testing with CT images is also required.
5 Conclusion
This paper presented a comparison of the performance of wavelet and curvelet when applied separately to lung cancer diagnosis. The JSRT dataset was used in the experiments, and the results suggest that wavelet performs better than curvelet in this CAD system. Although only one wavelet function is reported in this paper, other wavelet functions (e.g., Haar) also performed better than curvelet.
References:
[1] World Health Organization (WHO). (2012). World Health Statistics 2012. Available: http://www.who.int/gho/publications/world_health_statistics/en_whs2012_full.pdf
[2] J. Ferlay, et al. (2010, Oct. 4). GLOBOCAN 2008 v2.0, Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 10. Available: http://globocan.iarc.fr
[3] R. C. Hardie, et al., "Performance analysis of a new computer aided detection system for identifying lung nodules on chest radiographs," Medical Image Analysis, vol. 12, pp. 240-258, 2008.
[4] J. R. F. da Silva Sousa, et al., "Methodology for automatic detection of lung nodules in computerized tomography images," Computer Methods and Programs in Biomedicine, vol. 98, pp. 1-14, 2010.
[5] J. Dehmeshki, et al., "Automated detection of lung nodules in CT images using shape-based genetic algorithm," Computerized Medical Imaging and Graphics, vol. 31, pp. 408-417, 2007.
[6] S. L. A. Lee, et al., "Random forest based lung nodule classification aided by clustering," Computerized Medical Imaging and Graphics, vol. 34, pp. 535-542, 2010.
[7] G. Orbán, et al., "Lung Nodule Detection on Rib Eliminated Radiographs," in XII Mediterranean Conference on Medical and Biological Engineering and Computing 2010, vol. 29, P. Bamidis and N. Pallikarakis, Eds. Springer Berlin Heidelberg, 2010, pp. 363-366.
[8] C. Pereira, et al., "A Multiclassifier Approach for Lung Nodule Classification," in Image Analysis and Recognition, vol. 4142, A. Campilho and M. Kamel, Eds. Springer Berlin Heidelberg, 2006, pp. 612-623.
[9] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. New Jersey: Prentice Hall, 2008.
[10] M. M. Eltoukhy, et al., "A comparison of wavelet and curvelet for breast cancer diagnosis in digital mammogram," Computers in Biology and Medicine, 2010.
[11] E. Candès and D. Donoho, "Curvelets: multiresolution representation, and scaling laws," in Wavelet Applications in Signal and Image Processing VIII, Proceedings of the SPIE, 2000.
[12] B. B. Samir, "Fast and Control Chart Pattern Recognition using a New cluster-k-Nearest Neighbor," World Academy of Science, Engineering and Technology, 2009.
[13] J. Shiraishi, et al., "Development of a Digital Image Database for Chest Radiographs With and Without a Lung Nodule," American Journal of Roentgenology, vol. 174, pp. 71-74, January 2000.