Medical images, segmentation and analysis

ImageLab group, http://imagelab.ing.unimo.it
Università degli Studi di Modena e Reggio Emilia

Medical Images
Macroscopic and dermoscopic images. ELM (epiluminescence microscopy) enhances the features of pigmented skin lesions and thereby supports automated clinical diagnosis.

The lesions
Both naevi and melanomas have neither regular edges and shape nor uniform color. They can be distinguished from the skin because, for biological reasons, the lesion is darker than the surrounding skin.
Unfortunately, the luminance distribution is not bimodal, so an adaptive threshold can fail. Most current image acquisition systems do not really cope with color calibration, so color classification based on absolute thresholds fails.
We therefore revert to unsupervised approaches, which perform worse than trained systems but are more robust to uncalibrated acquisitions.

Contour
The definition of the lesion contour is not clear-cut: medical opinions differ. Nevertheless, quantitative evaluation provides a means of comparison. We asked dermatologists to draw contours by hand to build a ground-truth database. The final goal is a lesion boundary extraction that is acceptable to dermatological experts and that can be exploited in further automatic analysis.

Automated diagnosis
Detecting the lesion border in a medical image is the first and necessary step for further automatic skin segmentation. In recent years, much research has been directed towards automatic skin cancer diagnosis. In ELM, most approaches are based on the so-called ABCD rule of dermatoscopy or on the later 7-point checklist.

Approach
We compared four widely employed color clustering algorithms: Median Cut, K-Means, Fuzzy C-Means and Mean Shift. We verified their performance in separating lesion from skin; no spatial constraint was employed. We also verified how the algorithms are influenced by the choice of parameter settings and by the lesion characteristics.

Clustering
Clustering is the study of algorithms and methods for grouping, or classifying, objects described either by a set of measurements or by the relationships between them. In segmentation, clustering is used to quantize the image and reduce the number of colors by electing the most representative ones. This is a problem of exclusive and unsupervised classification, the core of cluster analysis.
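As a rough illustration of color quantization by clustering, the sketch below runs a plain K-Means on RGB tuples in pure Python. The name kmeans_colors, the random initialization, the fixed iteration count and the seed are assumptions made for this example; they are not the settings used in the study.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels, k, iterations=20, seed=0):
    """Cluster RGB tuples into k groups; returns (centroids, labels).

    Hypothetical helper: centroids start from k randomly chosen pixels.
    """
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel goes to its nearest centroid.
        for i, p in enumerate(pixels):
            labels[i] = min(range(k), key=lambda c: dist2(p, centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    dark = [(20 + i, 10, 10) for i in range(10)]
    light = [(200 + i, 180, 170) for i in range(10)]
    centroids, labels = kmeans_colors(dark + light, k=2)
    print(centroids, labels)
```

Replacing every pixel by the centroid of its cluster gives the quantized image with k color levels.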

Proposal
Color clustering is widely used to find two clusters, ideally skin and lesion. The results are not satisfactory because the high quantization level makes the features less representative. We propose to divide the image into k clusters, investigating many different values of k, and then to use a supervised classification to group the clusters into two super-classes. The classification is based on the recognition of the skin.

Proposed algorithm
Four main steps: training, clustering, classification and border extraction. The image is quantized by a clustering algorithm, then the levels obtained are partitioned into two groups called skin and lesion. With a correct classification, the optimal boundary is recovered. Color and spatial classification of the skin gives a very satisfactory approximation of the ideal cluster partitioning.

Mean shift
A technique for the analysis of feature spaces. For color image segmentation, the image data is mapped into the feature space (RGB space). It is an iterative procedure that shifts each data point to the average of the data points in its neighborhood. The algorithm is based on a kernel K; in the standard formulation, the sample mean at x is
$$m(x) = \frac{\sum_i K(x_i - x)\, x_i}{\sum_i K(x_i - x)}$$
and the difference m(x) - x is called the mean shift. The repeated movement of data points to the sample means is called the mean shift algorithm. We implemented the algorithm using the 3D color histogram instead of a spatial search window.
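The iterative shift can be sketched as follows. This is a minimal illustration with a flat kernel of radius lam on raw RGB points, not the histogram-based implementation mentioned in the slide; the function names, the convergence tolerance and the rule for merging nearby modes are assumptions.

```python
import math


def mean_shift_point(x, data, lam, max_iter=100, tol=1e-3):
    """Move x to the mean of the data points within radius lam until it settles."""
    for _ in range(max_iter):
        neighbors = [p for p in data if math.dist(p, x) <= lam]
        if not neighbors:
            return x
        m = tuple(sum(ch) / len(neighbors) for ch in zip(*neighbors))
        if math.dist(m, x) < tol:  # the shift m - x is negligible
            return m
        x = m
    return x


def mean_shift_modes(data, lam):
    """Run mean shift from every point; modes closer than lam are merged (assumed rule)."""
    modes, labels = [], []
    for p in data:
        m = mean_shift_point(p, data, lam)
        for j, existing in enumerate(modes):
            if math.dist(m, existing) < lam:
                labels.append(j)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


if __name__ == "__main__":
    data = [(10, 10, 10), (12, 10, 10), (11, 12, 10), (200, 200, 200), (202, 199, 200)]
    print(mean_shift_modes(data, lam=20))   # two modes
    print(mean_shift_modes(data, lam=500))  # a single mode
```

The number of clusters is not given as input: it follows from the radius, and a radius that is too large collapses everything into one cluster.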

Mean Shift
Mean Shift does not need a fixed number of clusters but a search neighborhood radius λ. To compare this algorithm with the others, we studied its behavior as a function of λ for the 12 numbers of clusters.
[Figure: relation between λ and number of clusters]
λ         Number of levels
300.00    2
136.00    3
88.00     4
64.00     5
49.00     6
42.00     7
35.00     8
15.50     16
10.00     24
7.50      32
4.40      64
2.25      128
To measure distances between data in RGB color space and to evaluate the closeness of centroids (clusters), we used the Euclidean distance for every algorithm.

Skin color training and classification
We assume that the lesion mostly occupies the central part of the image and does not reach the four corners. We use the colors of the image corners that are not covered by the lesion as training colors for classifying the skin clusters.
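Collecting training colors from the four image corners could look like the sketch below. The image is assumed to be a list of rows of RGB tuples, and the square corner block of side size is a hypothetical choice; the slides do not state the shape or extent of the sampled regions.

```python
def corner_pixels(image, size):
    """Return the pixels of the four size x size corner blocks, one list per corner.

    Order: top-left, top-right, bottom-left, bottom-right.
    """
    h, w = len(image), len(image[0])
    row_ranges = [range(0, size), range(h - size, h)]
    col_ranges = [range(0, size), range(w - size, w)]
    return [[image[r][c] for r in rows for c in cols]
            for rows in row_ranges for cols in col_ranges]


def mean_color(pixels):
    """Average RGB color of a list of pixels."""
    return tuple(sum(ch) / len(pixels) for ch in zip(*pixels))


if __name__ == "__main__":
    skin, lesion = (200, 170, 150), (60, 40, 30)
    image = [[skin] * 6 for _ in range(6)]
    for r in (2, 3):
        for c in (2, 3):
            image[r][c] = lesion  # dark lesion in the center
    for corner in corner_pixels(image, size=2):
        print(mean_color(corner))
```

Keeping the four corners separate makes it possible to discard a corner that turns out to be covered by the lesion.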

Skin color training and classification
A mixed approach was implemented to prevent big lesions that cover some corners from degrading the skin classification. Otsu's threshold and a few steps of morphological dilation identify an area that safely contains the lesion.
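A minimal sketch of the two ingredients named here, Otsu's threshold on 8-bit gray values and one step of binary dilation, is given below. The 3x3 square structuring element and the function names are assumptions; the slides do not say which element or how many dilation steps were used.

```python
def otsu_threshold(gray_values):
    """Return the threshold t in 0..255 that maximizes the between-class variance.

    Values <= t form the dark class, values > t the bright class.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of gray values in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def dilate(mask):
    """One step of binary dilation with a 3x3 square structuring element (assumed)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if any(mask[rr][cc]
                   for rr in range(max(0, r - 1), min(h, r + 2))
                   for cc in range(max(0, c - 1), min(w, c + 2))):
                out[r][c] = 1
    return out


if __name__ == "__main__":
    t = otsu_threshold([30] * 50 + [200] * 50)
    print(t)
```

Pixels darker than the threshold give a first lesion mask; dilating it a few times enlarges the mask so that it safely covers the lesion, and corners that fall inside it can be excluded from training.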

Skin classification
A double condition was defined. Each corner was investigated separately. For each cluster, we extracted the number of its pixels belonging to the corner of the image and the number of its pixels in the whole processed image. The cluster is a candidate to be classified as skin if both conditions hold:

Boundary extraction
To identify the lesion's region, the maximum area and the closeness of the barycenter to the center of the image are considered. To extract the boundary from the binary image, the chain code algorithm is used to follow the edge.
[Figure panels: 1 clustering; 2 training and classification; 3 boundary extraction]
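To make the chain code representation concrete, the sketch below encodes an already ordered closed contour as 8-direction Freeman codes. It does not trace the boundary in the binary image; the contour is assumed to be given as a list of (row, col) pixels in which each pixel is 8-adjacent to the next. The direction numbering (0 = east, counterclockwise) is one common convention and is an assumption here.

```python
# Map a (row step, column step) pair to its Freeman code.
# Rows grow downward, so "north" is a row step of -1.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(contour):
    """Encode a closed contour; the last pixel is linked back to the first."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:] + contour[:1]):
        codes.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return codes


def decode(start, codes):
    """Rebuild the pixel sequence from a start pixel and its chain code."""
    steps = {code: step for step, code in DIRECTIONS.items()}
    points = [start]
    for code in codes[:-1]:  # the last code only closes the loop
        dr, dc = steps[code]
        r, c = points[-1]
        points.append((r + dr, c + dc))
    return points


if __name__ == "__main__":
    square = [(0, 0), (0, 1), (1, 1), (1, 0)]
    codes = chain_code(square)
    print(codes)
    print(decode(square[0], codes))
```

A start pixel plus the list of codes is enough to reconstruct the whole boundary, which makes the representation compact.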

Experimental results
Database: 117 different dermoscopic images acquired with an analog device and digitized with a scanner at a resolution of 768x512 pixels and 24-bit color depth. Evaluation metrics: specificity ξ, sensitivity η and score ψ (the mean of the first two).
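The three metrics can be computed from a predicted lesion mask and a ground-truth mask as sketched below, assuming the usual definitions: specificity is the fraction of skin pixels correctly left out, sensitivity is the fraction of lesion pixels correctly found. The function name and the mask format (lists of rows of 0/1) are assumptions.

```python
def evaluate(predicted, truth):
    """Return (specificity, sensitivity, score) for two binary masks of equal size."""
    tp = fp = tn = fn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # lesion pixel found
            elif p and not t:
                fp += 1  # skin pixel marked as lesion
            elif t:
                fn += 1  # lesion pixel missed
            else:
                tn += 1  # skin pixel correctly left out
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    score = (specificity + sensitivity) / 2
    return specificity, sensitivity, score


if __name__ == "__main__":
    truth = [[0, 0, 1, 1],
             [0, 0, 1, 1]]
    predicted = [[0, 1, 1, 1],
                 [0, 0, 1, 0]]
    print(evaluate(predicted, truth))
```

Averaging the two rates keeps the score from being dominated by the skin class, which usually covers most of the image.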

Clustering accuracy evaluation
Can a clustering algorithm, in the best condition (with a perfect classification of clusters into skin and lesion), produce a reliable lesion boundary? Clusters are trained with the boundary of the corresponding ground truth, and the score ψ is calculated on the lesion area found. We performed this procedure on the 117 images clustered with the four algorithms and with twelve different numbers of color levels: 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 64 and 128.
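One way to read this upper-bound experiment is sketched below: every cluster is assigned to the lesion class when most of its pixels fall inside the ground-truth mask. The majority rule and the function name are assumptions; the slides only say that the clusters are trained with the ground-truth boundary.

```python
from collections import Counter


def oracle_lesion_mask(labels, truth):
    """Label each cluster as lesion if most of its pixels lie inside the ground truth.

    labels: rows of cluster indices; truth: rows of 0/1 ground-truth values.
    Returns the binary lesion mask obtained with this ideal classification.
    """
    inside = Counter()
    total = Counter()
    for l_row, t_row in zip(labels, truth):
        for lab, t in zip(l_row, t_row):
            total[lab] += 1
            if t:
                inside[lab] += 1
    # Assumed rule: strict majority of the cluster's pixels inside the lesion.
    lesion_clusters = {lab for lab in total if 2 * inside[lab] > total[lab]}
    return [[1 if lab in lesion_clusters else 0 for lab in row] for row in labels]


if __name__ == "__main__":
    labels = [[0, 0, 1, 2],
              [0, 1, 1, 2]]
    truth = [[0, 0, 1, 1],
             [0, 0, 1, 1]]
    print(oracle_lesion_mask(labels, truth))
```

Scoring this mask against the ground truth measures how good the boundary can get for a given clustering, independently of the skin classifier.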

Upper bound performance
The four algorithms behave similarly, with negligible differences, and for all of them the accuracy of the detected boundary increases with the number of clusters.
[Figure: score ψ (axis from 0.90 to 1.00) versus number of clusters (2 to 128) for Median Cut, K-Means, Fuzzy C-Means and Mean Shift]
Median Cut tends to assign a similar number of pixels to each cluster; hence, with few levels and a lesion area very different from the skin area, the result fails. With more than two levels its performance becomes similar to that of the other algorithms. For Mean Shift, when the number of levels is low, the function that estimates λ fails for some images, driving λ too high and producing only one cluster.

Comparison
Clusters  Median Cut                                   K-Means
          FP         FN        ξ      η      ψ         FP         FN        ξ      η      ψ
2         24179.96   6521.53   0.930  0.940  0.930     9244.74    7964.87   0.970  0.940  0.950
3         4230.68    11314.56  0.980  0.920  0.950     4982.9     15195.43  0.980  0.880  0.930
4         7845.23    10732.09  0.970  0.920  0.940     6968.79    15716.53  0.970  0.870  0.920
5         7845.22    11083.93  0.970  0.910  0.940     8217.26    11160.79  0.970  0.910  0.940
6         8503.79    7465.41   0.970  0.950  0.960     6040.44    7524.17   0.980  0.950  0.960
7         9948.68    7960.23   0.960  0.920  0.940     6971.28    10847.01  0.970  0.910  0.940
8         10786.95   7606.62   0.960  0.930  0.940     11754.32   8641.42   0.960  0.910  0.930
16        16565.19   5467.97   0.940  0.930  0.930     10857.96   6477.53   0.960  0.930  0.940
24        25808.54   4196.79   0.910  0.930  0.920     24006.97   5311.06   0.920  0.940  0.930
32        58399.8    2503.21   0.810  0.960  0.890     34716.97   4803.89   0.890  0.950  0.920
64        134992.6   1360.83   0.560  0.990  0.770     117002.2   2481.32   0.620  0.970  0.800
128       198874     139.92    0.340  1.000  0.670     183830     282.46    0.380  1.000  0.690
Clusters  Fuzzy C-Means                                Mean Shift
          FP         FN        ξ      η      ψ         FP         FN        ξ      η      ψ
2         1701.11    10522.83  0.990  0.910  0.950     33593.57   8910.61   0.855  0.914  0.884
3         3631.97    15559.44  0.980  0.880  0.930     13605.46   11853.78  0.931  0.899  0.915
4         4294.18    9537.66   0.980  0.930  0.960     5462.80    12057.65  0.976  0.922  0.949
5         5104.47    10042.95  0.980  0.920  0.950     8999.22    11692.60  0.965  0.926  0.946
6         5798.03    8932.19   0.980  0.930  0.960     6633.85    11170.01  0.972  0.931  0.951
7         8841.88    8689.14   0.970  0.890  0.930     9034.89    8277.62   0.963  0.952  0.957
8         9522.98    7015.92   0.960  0.930  0.950     9951.39    7303.09   0.960  0.959  0.960
16        17578.5    6989.09   0.940  0.900  0.920     9034.89    8277.60   0.963  0.952  0.957
24        35591.27   7915.88   0.880  0.860  0.870     9951.39    7303.09   0.960  0.959  0.960
32        61560.51   4500.73   0.800  0.940  0.870     10718.35   6890.60   0.958  0.962  0.960
64        119214.2   2143.94   0.610  0.970  0.790     12841.32   3909.09   0.948  0.979  0.963
128       215388     566.35    0.250  0.990  0.620     20382.23   2340.26   0.920  0.980  0.950

Experimental results
With the automatic boundary extraction the results are different: as the number of clusters increases, the sensitivity tends to increase but the specificity decreases, because of the growing number of false positives, especially above 24 levels. This behavior is caused by the difficulty of classifying the clusters properly when their number is too high, except for Mean Shift.
[Figure: score ψ (axis from 0.60 to 1.00) versus number of clusters (2 to 128) for Median Cut, K-Means, Fuzzy C-Means and Mean Shift]

Mean Shift: the best
Mean Shift remains stable, and its performance decreases only negligibly after 64 levels. Its ability to adapt to the features of the target image allows it to create clusters that are farther from each other in the feature space, making their classification more straightforward. The best boundaries are obtained with 6 clusters for Median Cut, K-Means and Fuzzy C-Means, and with 128, the highest value examined, for Mean Shift.
[Figure: comparison of Median Cut, K-Means, Fuzzy C-Means and Mean Shift]

Examples
[Figure panels: Median Cut; K-Means; Fuzzy C-Means; Mean Shift]

Discussion
After performing 5616 segmentation runs (117 images x 4 algorithms x 12 cluster counts) for the upper-bound analysis and 5616 runs with our approach, we can conclude that color clustering is very suitable for dermatological image segmentation. All color clustering methods are sufficiently accurate if the right number of clusters is used, although Mean Shift has shown higher stability with respect to parameter variation.

Web Platform
To also obtain a qualitative comparison, we asked human experts. We created a Web site with the segmentation results on the whole database, choosing the best configuration for each algorithm, and asked the dermatologists to vote on the skin lesion contours detected by the four algorithms. To make the evaluation fairer and to avoid bias, the algorithm used was not indicated on the image.

Web Platform
In this case too, Mean Shift was evaluated as the best method. This result is important because it is a double confirmation of the good performance of this algorithm, under both computer and human evaluation.

Topological Tree
A good segmentation of the lesion is an important step for starting the automated diagnosis and has a deep influence on the results. The topological tree description of [1] is an application where our algorithm could improve the results. A good starting segmentation protects that algorithm, which is based on a recursive dichotomous Fuzzy C-Means, from skin classification errors.
[1] R. Cucchiara, C. Grana, S. Seidenari, G. Pellacani, "Exploiting Color and Topological Features for Region Segmentation with Recursive Fuzzy c-means", Machine Graphics and Vision, vol. 11, no. 2/3, pp. 169-182, 2002.

Topological tree (TT) misclassification
[Figure panels: original algorithm [1]; original algorithm [1] started from our segmentation]

Integration with new clustering features
Using the segmentation produced by Mean Shift and then applying the algorithm for region analysis, the results could improve thanks to the better segmentation.
[Figure panels: original algorithm [1] started from our segmentation; algorithm [1] working on our segmentation]

Paper
These results were submitted to and accepted for the conference on "Image Processing", part of the SPIE Medical Imaging Symposium, which will be held 11-16 February 2006 in San Diego, CA.

Acknowledgment
This project has been funded by the MIUR-PRIN Project 2004-2006. We thank the Dermatologic Department of the University of Modena and Reggio Emilia for the data evaluation.