CHAPTER-1 INTRODUCTION

1.1 Fuzzy concept, digital image processing and application in medicine

With the advancement of digital computers, it has become easy to store large amounts of data and carry out different operations on them. The falling cost of hardware has paved the way for digital computers to be used in day-to-day life. In science and technology, digital computers have become indispensable. From storing simple 2-dimensional data on a desktop PC to computing complex multi-dimensional data on supercomputers, digital computers handle it all. The ever-growing amount of data and its increasing complexity demand efficient and robust mathematical modeling for achieving ubiquitous computing. One of the most important tasks is to classify the data according to requirement. The conventional set-theoretic approach was earlier used to classify data, but it fails to capture the ambiguities and uncertainties present in most data. Two-valued logic cannot handle samples showing characteristics that belong to multiple categories.

In 1965, Lotfi A. Zadeh, a professor of Electrical Engineering at the University of California, Berkeley, USA, proposed the idea of fuzzy set theory. Unlike in conventional set theory, in fuzzy set theory an element can belong to several sets at once, with a membership value in each ranging from 0 to 1. The flexibility of fuzzy set theory and fuzzy logic helps researchers to deal with the vagueness present in data. The classification problem led to the evolution of clustering techniques. At first, hard clustering was
developed, in which a data item belongs to only one cluster. Later, because of the rigidity of hard clustering, researchers formulated clustering algorithms based on the fuzzy mathematical approach.

Of the different types of data available today, visual data is among the most widely used across applications. Visual data consists of different types of images, such as satellite images of land, astronomical images, X-ray images, thermal images, medical images like MRI scans, and many more. Human eyes can interpret a large variety of colour and morphological information from an image, but they cannot differentiate very minute changes in colour, brightness or contrast. Also, an ensemble of a large number of shapes can lead to optical illusions. For example, a piece of paper that seems white when lying on a desk can appear totally black when used to shield the eyes while looking directly at a bright sky. Digital computers can analyze the complexity of images with high precision.

Processing and analysis of digital images by digital computers is called digital image processing. Digital image processing came onto the scene in the early 1960s with the advancement of integrated circuits, and its techniques are now employed in several fields. One of the popular applications of image processing is medical image processing. Medical imaging has been undergoing a revolution in the past decade with the invention of faster and more accurate devices. This has driven the requirement for corresponding software development and the formulation of new algorithms in signal and image processing. Diagnosis and prognosis of disease by computer is
becoming essential to meet the demand for ever better service.

Microscopic images of human cells are an indispensable part of medical diagnosis. Cells provide an array of information regarding the presence of disease in humans. Cancer is a disease characterized by uncontrolled growth of human cells. It starts in the cells of a part or organ of the human body and gradually spreads to other parts or organs. An early detection of cancer therefore requires the analysis of cells which are suspected to be cancerous. Usually, the cells are inspected through a microscope by doctors, but human interpretation has some limitations in this regard. Human judgment tends to be qualitative rather than quantitative, which can lead to wrong interpretations. The limits of human memory are another concern. At a time when medical technology has improved by leaps and bounds and the demand for quick and correct diagnosis is very high, doctors have no room for error. It is in this aspect of the diagnostic process that computers come into play in medicine.

Medical image processing is an emerging field in this regard. Processing medical images by computer and interpreting the processed data is the main task of computer-assisted methods in medical diagnosis. Fuzzy clustering together with conventional image processing techniques forms a powerful tool to handle the intricacies of medical image processing. The major problem in medical image processing is the vagueness and uncertainty possessed by medical images. The segmentation of medical images is the starting step toward automated diagnosis, but medical images are very complex and contain a high degree of uncertainty. The flexibility and adaptability of fuzzy clustering methods help to segment the complex medical
images. Fuzzy clustering and image segmentation are discussed below, along with the medical images used in the present study.

1.2 Fuzzy clustering

In many practical situations, issues such as limited spatial resolution, low contrast, overlapping intensities, noise and intensity variations create fuzziness in the object boundaries in an image. One of the best options to resolve this issue is fuzzy set theory, which introduced the idea of partial membership described by a membership function.

Clustering analysis, or simply clustering, is the assignment of a set of observations into subsets called clusters so that the observations in the same cluster are similar in some sense. In clustering, we are given a set of unlabeled data. Our task is to divide these data into several groups according to a similarity measure or the inherent structure of the data. Such grouping can be done using a clustering procedure.

Fuzzy clustering is widely used in the field of pattern recognition. In a fuzzy clustering procedure, a sample is assigned a membership value for each of the groups, making a fuzzy partition. The membership values play an important role in the clustering process: they make the classification procedure more flexible and more robust in dealing with noisy and uncertain data. The fuzzy c-means (FCM) algorithm, which is based on Euclidean distance, is an iterative clustering technique that produces a locally optimal fuzzy c-partition.
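
The FCM iteration described above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the present study, and it rests on several assumptions that do not come from the text: the samples are 1-D values such as gray levels, the fuzzifier is m = 2, the initial partition is random, and the names fuzzy_c_means, max_iter, tol and seed are hypothetical. It is meant only to show the alternating update of cluster centers and membership values, not the implementation used in this work.

```python
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Cluster 1-D samples (e.g. gray levels) into c fuzzy clusters.

    Returns (centers, u), where u[i][k] is the membership of sample k
    in cluster i; the memberships of each sample sum to 1.
    """
    rng = random.Random(seed)
    n = len(data)

    # Random initial fuzzy partition, normalized per sample (assumption).
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= total

    centers = [0.0] * c
    for _ in range(max_iter):
        # Center update: membership-weighted mean of the samples.
        for i in range(c):
            weights = [u[i][k] ** m for k in range(n)]
            centers[i] = sum(w * x for w, x in zip(weights, data)) / sum(weights)

        # Membership update from Euclidean distances to the centers.
        change = 0.0
        for k in range(n):
            dists = [abs(data[k] - centers[i]) for i in range(c)]
            zeros = [i for i, d in enumerate(dists) if d == 0.0]
            if zeros:
                # Sample coincides with one or more centers: share full membership.
                new = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                s = sum(inv)
                new = [v / s for v in inv]
            for i in range(c):
                change = max(change, abs(new[i] - u[i][k]))
                u[i][k] = new[i]

        # Stop when the partition no longer changes noticeably.
        if change < tol:
            break
    return centers, u


# Illustrative usage on made-up gray levels: two dark/bright groups
# plus one intermediate value that receives a partial membership in both.
gray_levels = [10, 12, 11, 200, 205, 198, 100]
centers, u = fuzzy_c_means(gray_levels, c=2)
print(centers)
print([round(u[0][k], 3) for k in range(len(gray_levels))])
```

Unlike a hard clustering, the intermediate sample is not forced into a single cluster; its membership is shared between the two groups according to its distance from each center.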

1.3 Image segmentation

The most basic and important part of image processing is image segmentation. The goal of image segmentation is to partition an image into a set of disjoint regions with uniform and homogeneous attributes such as intensity, colour, tone or texture. More precisely, it is the process of assigning a label to every pixel in an image such that pixels having the same label share certain visual features. There is no general theory of image segmentation. As a consequence, no single standard method of image segmentation has emerged. Rather, there are some ad hoc methods that have gained some degree of popularity from the application point of view. Some common criteria used in image segmentation tasks are as follows:

(a) Regions of the image segmentation should be uniform and homogeneous with respect to some characteristic such as gray tone or texture.
(b) Region interiors should be simple and without many small holes.
(c) Adjacent regions of a segmentation should have significantly different values with respect to the characteristic on which they are uniform.
(d) Boundaries of each segment should be simple, not ragged, and must be spatially accurate.

1.4 Medical images, the Pap smear image

The Pap smear is a cytological test of the uterine cervix. In the Pap smear test, cervical cells are collected from the surface of the cervix using a brush or spatula and are smeared onto a slide. The slide is then stained by the Papanicolaou method.
The staining makes it possible to observe the characteristics of the cells under a microscope. A stained slide viewed under a microscope shows the cell nuclei and the cytoplasm along with the background. Cytologists look for any cellular change in the slides. The factors that may indicate abnormality include changes in the shape and size of the cell nuclei, an increasing nucleus-cytoplasm area ratio (N/C ratio), and increasing chromatin content of the nucleus.

1.5 Objective of the study

The prime objectives of the present study are:

1. Image segmentation.
2. Use of fuzzy clustering techniques.
3. Basic segmentation of medical images, namely histogram and morphology based segmentation of microscopic Pap smear images, and feature extraction.
4. Application of fuzzy clustering techniques to the segmentation of microscopic Pap smear images and shape analysis of the cervical cells.
5. Enhancement of the fuzzy c-means algorithm in terms of computation time minimization and data reduction.

1.6 Methodology

1) Medical images are mainly the images obtained from various tests performed on human beings to diagnose disease. Medical images commonly include radiological images such as X-ray, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) images, histological
images such as biopsy images, and cytological images such as Fine-Needle Aspiration Cytology (FNAC) images and Pap smear images. As the rate of incidence of cervical cancer is highest in the North Eastern region, Pap smear images are considered for the present study. As the Pap smear is the first and basic test which can detect cervical cancer at an early stage, the Pap smear images are given more importance in the present study. The images were collected from Dr. B. Borooah Cancer Institute, Guwahati. Digital copies of the images were obtained from a high-resolution camera mounted on the microscope.

2) Basic image segmentation methods such as histogram and morphology based segmentation are studied. The gray level information along with the structural characteristics is extracted from the medical images. The medical images used in the present study are microscopic Pap smear images of the cervical region. Other required image processing tasks, such as enhancement and conversion of colour space, are also carried out. The images are originally in RGB colour space and, for simplicity, they are converted to gray level space, with each pixel having a gray level value varying from 0 to 255. Enhancement of the images is done by histogram stretching, that is, redistributing the gray levels over the whole histogram range while keeping the total histogram count unchanged. For segmentation, adaptive histogram thresholding is used for a given set of images of the same type. The thresholding process is followed by the selection of structure-based information about the cells present in the Pap smear images. Thus, the histogram threshold and the structure
information segment the Pap smear images into cell nuclei and cytoplasm.

3) The segmented images obtained from the application of histogram and morphology based methods are further analyzed based on some clinical paradigms. The paradigms considered are: a) the distribution of cell nuclei, which is an indicative measure of the number of normal and abnormal nuclei in a single slide; b) compactness, which is a dimensionless shape measure of the cell nuclei; and c) eccentricity, which is a measure of the roundness of the cell nuclei.

4) The fuzzy c-means (FCM) algorithm is studied to find out its applicability to the segmentation of microscopic Pap smear images. Two major drawbacks are found with the FCM algorithm: first, it cannot detect arbitrarily shaped clusters, only roughly spherical ones; and second, its iterative nature consumes a lot of computation time, especially in image segmentation. The first limitation can be eliminated by replacing the Euclidean distance with another distance measure such as the Mahalanobis distance. The second can be addressed by summarizing the pixel information in the form of an image histogram, which reduces the computation time of the FCM algorithm.

5) The shape of the cells in a Pap smear image is one of the most important criteria for identifying abnormality of cells. Chain codes are used to trace the boundary of the region of interest (ROI). In Pap smear images, the ROI is the cell nuclei. Choosing one or two reference
points on the cell nuclei, the boundary can be traced with the help of chain codes.

The segmentation process, the feature extraction and the shape analysis process are implemented in MATLAB.

1.7 Organization of the thesis

The rest of the thesis is organized in eight chapters as follows:

Chapter 2: Chapter 2 gives an introduction to clustering techniques and discusses their various aspects. It describes the basis of clustering, the distance measures used in different clustering algorithms, the types of clustering algorithms and their applications. A detailed discussion of the fuzzy c-means (FCM) algorithm is also given, along with the preliminaries of fuzzy set theory and cluster validity measures for fuzzy clustering.

Chapter 3: Chapter 3 describes the fundamentals of digital image processing, its basic components and its steps. A detailed discussion of image segmentation techniques is also given, together with a review of segmentation based on fuzzy clustering methods.

Chapter 4: The first part of Chapter 4 gives a concise description of microscopic medical images, particularly microscopic images from the Papanicolaou test of the uterine cervix. Computer-aided methods and their evolution are discussed, particularly soft computing methods in medicine. Some previous work on microscopic Pap smear images is also discussed, mostly the non-fuzzy applications.

Chapter 5: Chapter 5 describes the image histogram and the mathematical morphology of images and discusses their application in image segmentation. Colour morphology and some examples of histogram and morphology based microscopic cell image segmentation methods are also discussed in this chapter.

Chapter 6: Chapter 6 contains the results and discussion of the conventional image segmentation techniques used in the present study, along with the feature extraction task. A method for segmentation of microscopic Pap smear images based on histogram and mathematical morphology is presented. The preprocessing, consisting of colour conversion and contrast enhancement of the Pap smear images, is described. The pros and cons of the work and its further improvement are discussed in the Discussion and Conclusion section.

Chapter 7: Chapter 7 gives a review of fuzzy clustering in image segmentation, particularly in Pap smear image segmentation. Some of the previous works on Pap smear image processing through fuzzy clustering are reviewed. An FCM clustering based segmentation of Pap smear images is carried out and the segmented images are analyzed with chain codes. The clustering is validated by three cluster validity measures. The results of applying FCM clustering to the Pap smear images and of the chain-coded tracing of cell nuclei are presented. The Discussion and Conclusion part presents a critical discussion of the results.

Chapter 8: In Chapter 8, a novel approach to image segmentation is proposed. The conventional FCM is modified to redress three of its basic limitations. The modification is validated with cluster validity measures as well as medical image
data. A generalized shape theory is discussed and applied to trace the cell nuclei in the Pap smear images. A comparative statement of the methods used in this study is given in the Discussion and Conclusion section.

Chapter 9: Chapter 9 gives the overall summary, conclusions and future work of the thesis. The summary consists of the overall findings of the work with a brief description of the workflow. The concluding remarks contain a critical discussion of the efficiency of the proposed method. Finally, further work that can be carried out is put forward.