Micro-scale Stereo Photogrammetry of Skin Lesions for Depth and Colour Classification

Tim Lukins
Institute of Perception, Action and Behaviour

1 Introduction

The classification of melanoma has traditionally relied on colour and intensity images as input to various rule- or checklist-based diagnoses. A variety of computer vision techniques are often applied to enhance and segment such data into more representative features, in order to automate the detection of tumours via statistical and machine learning techniques. Such approaches, however, fail to take into account the information that could additionally be provided by depth, and the resulting description of the actual macro surface structure of the area in question. Only one other system attempts to utilise this modality - the DERMA system of Callieri et al. [?] - which is based on laser scanning of a subject to obtain 3D and aligned colour information.

In this investigation we consider another approach: capture and evaluation via dense stereo photogrammetry, with the benefits of instantaneous capture and perfect 1:1 alignment of colour information. We seek to test whether the inclusion of depth can indeed help distinguish between various dermatological types, showing that the actual surface structure may also yield a valuable source of features on which to base classification. This involves addressing issues in accurate 3D capture at a very fine scale, the conversion and processing of all channels of information to enhance features, and the analysis of which variations and distributions within the data can be used to differentiate.

2 Methodology

2.1 Data

Five datasets were collected using a stereo capture rig constructed from two Canon EOS 300D cameras, calibrated and using the maximum level of magnification supported by the standard EF-S lens (0.28 m closest focusing distance). The dense stereo data was recovered from the two simultaneous images via stereo photogrammetry matching software.
The resulting perspective depth-maps were constructed in the left image co-ordinate frame, resulting in a one-to-one correspondence between z-depth and pixel colour values. From this complete data, a subject area of pixels was selected for each of the five datasets. At the captured scale of one pixel, this represents a surface area of approximately. The subject areas were chosen to represent a variety of different dermatological types (those available to us on normal human skin, in the absence of actual cancer examples): Normal, Freckle, Liverspot, Mole, and Scab. These are shown in Figure 1, indicating the variations in depth, size, and colouration.

The first dataset acts as a control, representing as it does an area of normal skin. Each dataset also shows the superimposed outline of the mask defining the specific region of interesting skin; all other sample pixels are designated surrounding skin. This represents the only division of the data, and was performed by tracing the outline of the regions. In the case of the control dataset, this boundary simply splits the data with a line down the middle.

The depth information represents the z-axis distance from the sensor. Of immediate note is that the samples were captured from a variety of curved body surfaces, on which the presence of surface features is often obscured by global structure. The first two of the actual samples (freckle and liverspot) are also affected by a profusion of hair follicles, which has disrupted the stereo recovery process. These are retained to present the situation in which the depth information should therefore be discounted. The last two samples (mole and scab), however, have preserved a useful amount of depth detail.
Figure 1. Colour/mask and depth. Top to bottom: normal, freckle, liverspot, mole, and scab.
2.2 Processing

Each of the datasets provides direct access to 7 channels of information, representing for every pixel the z-depth, red, green, blue, hue, saturation, and value components. We are interested in accentuating various regions (e.g. rough areas) and in correcting for global surface structure. To this end we adopt the techniques described below.

Local Variation. The raw values of each channel can be processed to derive the mean difference of each pixel from its neighbourhood, in order to reflect local variation, as follows:

1. Let v_c(p) be the value of channel c at pixel p.
2. Calculate the mean mu_c(p) = (1/N) * sum of v_c(q) over all q in the neighbourhood N(p) of cardinality N (excluding p itself).
3. Calculate the mean difference as: d_c(p) = mu_c(p) - v_c(p).

The calculation of the mean can be easily implemented as a square convolution matrix of dimensions n x n (n = 2k + 1), with each value apart from the centre cell equalling 1/(n^2 - 1). For example, for n = 3:

0.125  0.125  0.125
0.125  0      0.125
0.125  0.125  0.125

This process is carried out on each of the pixels, for all 7 channels, using the convolution shown, replicating edges with the nearest border value where necessary.

Global Orientation. Surface detail in the z-depth channel can be further revealed by fitting the depth-maps to an underlying surface orientation - assuming a simple plane z = ax + by + c in this case - and projecting the values onto that surface, as follows:

1. Select the z values at the 4 corner points (x_i, y_i, z_i) of a depth-map to construct the least squares fitting matrix and corner depth vector:

        [ x_1  y_1  1 ]         [ z_1 ]
    A = [ x_2  y_2  1 ]     d = [ z_2 ]
        [ x_3  y_3  1 ]         [ z_3 ]
        [ x_4  y_4  1 ]         [ z_4 ]

2. Derive the coefficients of the fitted plane by Singular Value Decomposition of this matrix: with A = U S V^T, the coefficients are (a, b, c)^T = V S^+ U^T d.

3. Calculate the projection of the new depth data by subtracting the fitted plane at each pixel: z'(x, y) = z(x, y) - (ax + by + c).

Applying this process results in the corrected depth channels of the datasets shown in Figure 2.
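The local-variation convolution described above can be sketched as follows (a minimal illustration using SciPy; the function name and the n = 3 default are ours, not from the study):

```python
import numpy as np
from scipy.ndimage import convolve

def local_variation(channel, n=3):
    """Mean difference of each pixel from its n x n neighbourhood.

    The kernel holds 1/(n*n - 1) everywhere except a zero centre, so
    convolving yields the neighbourhood mean excluding the pixel itself;
    subtracting the pixel's own value gives the local variation d_c(p).
    """
    kernel = np.full((n, n), 1.0 / (n * n - 1))
    kernel[n // 2, n // 2] = 0.0
    # mode='nearest' replicates edges with the nearest border value,
    # as described in the text
    neighbourhood_mean = convolve(channel.astype(float), kernel, mode="nearest")
    return neighbourhood_mean - channel
```

Applied per channel, this produces the seven local-variation maps used in Section 3.1; a perfectly uniform channel yields zero everywhere.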
2.3 Selection

Using the mask provided for each dataset, it is possible to divide the point/pixel data into two sets: those within the boundary, which are Interesting, and those outwith the boundary, which are Surrounding. This selection can additionally be performed with a specified amount of erosion (using standard morphological operators) in either direction, the effect of which is to eliminate any ambiguous Border points, as shown for example in Figure 3. This results in the partitioning of the datasets shown for example in Table 1. Notice that the various sizes of the subject areas result in an oversampling of the surrounding points for the smaller freckle and liverspot cases. Also, the larger the region, the longer the circumference of its mask, and consequently the greater the proportion of ambiguous border points which are ignored.
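The erosion-based partition into Interesting, Surrounding, and Border points can be sketched with standard morphological operators (a minimal sketch; the function name and margin parameter are ours):

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def partition_by_erosion(mask, margin=10):
    """Split pixels into Interesting / Surrounding / Border sets.

    mask   : boolean array, True inside the traced lesion outline.
    margin : number of erosion/dilation steps applied either side of
             the boundary; everything in between is an ambiguous
             Border point and is ignored in the analysis.
    """
    # Erode inwards: confidently Interesting points
    interesting = binary_erosion(mask, iterations=margin)
    # Dilate outwards: everything beyond is confidently Surrounding
    surrounding = ~binary_dilation(mask, iterations=margin)
    # The remaining band around the boundary is Border
    border = ~(interesting | surrounding)
    return interesting, surrounding, border
```

The three sets partition the pixels exactly, so the counts per dataset can be tabulated directly as in Table 1.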
Figure 2. Projected depth-maps onto fitted planar surface.

Figure 3. Mask eroded +/- 20 pixels from boundary for mole dataset.

Dataset     Surround   Interest   Border
Freckle     54660      2420       5420
Liverspot   46202      5870       10428
Mole        29969      18730      13801
Scab        38465      15980      8055

Table 1. Example division of datasets by erosion of +/- 10 pixels from boundary.
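The plane fitting and projection of Section 2.2, which produce the corrected depth-maps of Figure 2, can be sketched as follows (a minimal sketch; we interpret the projection as subtraction of the fitted plane, and the function name is ours):

```python
import numpy as np

def flatten_depth(depth):
    """Fit a plane z = a*x + b*y + c to the four corner depths and
    subtract it, revealing local surface detail."""
    h, w = depth.shape
    corners = [(0, 0), (0, w - 1), (h - 1, 0), (h - 1, w - 1)]
    # Least-squares design matrix A and depth vector d from the corners
    A = np.array([[x, y, 1.0] for (y, x) in corners])
    d = np.array([depth[y, x] for (y, x) in corners])
    # Plane coefficients via the SVD pseudo-inverse, as in step 2
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    a, b, c = Vt.T @ np.diag(1.0 / s) @ U.T @ d
    # Subtract the fitted plane at every pixel (step 3)
    ys, xs = np.mgrid[0:h, 0:w]
    return depth - (a * xs + b * ys + c)
```

For a depth-map that is itself an exact plane, the corrected output is zero everywhere; only deviations from the global orientation survive.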
3 Results

3.1 Applying Local Variation

For all datasets, every channel was separated and processed as described above for local variation, and the resulting correlations between depth and the other channels were plotted as shown in Figure 6 (compared to normal skin variations in Figure 5a). These results would appear to show that localised variations in depth (and indeed the colour channels) do not provide sufficient variation to support any robust classification. There are no global variations in the shape of the distributions between types, and furthermore no suitable separation between interesting and surrounding points. Principal Components Analysis (using the two largest eigenvectors) of the depth data included alongside the other 6 channels as a feature vector confirms this lack of useful variation, as shown by the similarity in Figure 4, in which no significant variation is contributed.

Figure 4. PCA projection of mole dataset.

3.2 Applying Global Orientation

For all datasets, the z-depth was first projected onto a fitted planar surface to accommodate global orientation, with the resulting correlations between the projected depth and the unmodified colour channels plotted as shown in Figure 7 (compared to normal skin variations in Figure 5b). These results would appear to show considerably better potential for classification, indicating a wide range of variations between datasets. As anticipated, the disrupted freckle and liverspot cases show no benefit from the addition of depth data to the ability to distinguish between surrounding and interesting skin. However, in the instances of mole and scab (where the stereo recovery process was unhindered) there is a good degree of separation both between and within the data.
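The PCA projection onto the two largest eigenvectors, used above to assess the 7-channel feature vectors, can be sketched as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def pca_project(features, k=2):
    """Project feature vectors (rows: pixels, cols: the 7 channels)
    onto the k principal components with the largest eigenvalues."""
    centred = features - features.mean(axis=0)
    # Eigen-decomposition of the channel covariance matrix
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; take the k largest
    order = np.argsort(eigvals)[::-1][:k]
    return centred @ eigvecs[:, order]
```

Plotting the two projected coordinates for Interesting versus Surrounding pixels gives scatter plots of the kind shown in Figure 4.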
It should be noted that applying global orientation and then analysing local variation does not improve the distributions of the data. That is, it would appear that describing the local roughness of a region requires a more complex approach to local surface structure.

4 Conclusions

In summary: the use of depth information as another modality for enhanced dermatological classification shows promise, but only under the guarantee that the data can be captured accurately and fitted so as to best preserve surface structure. Capturing the data accurately requires careful control of the environment lighting and of other factors that can affect the stereo recovery process, especially the presence of hair follicles (i.e. shaving the skin area should be performed first). Fitting the data to best preserve surface structure might perform better with more complex underlying surfaces (e.g. cylinders), and with the data then projected orthogonally onto that surface. Furthermore, better localised analysis of local roughness would perhaps yield more representative features for a region (e.g. examining the variation of surface normal incidence, or the curvature of small patches).
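One of the roughness measures suggested above, the variation of surface normal incidence, could be sketched as follows (a purely hypothetical illustration of the idea, not something evaluated in this study; the function name is ours):

```python
import numpy as np

def normal_variation(depth):
    """Roughness as the mean angular deviation (in degrees) of
    per-pixel surface normals from the patch's average normal,
    with normals estimated from depth gradients."""
    gy, gx = np.gradient(depth.astype(float))
    # Normals of the surface z = f(x, y): (-dz/dx, -dz/dy, 1), normalised
    normals = np.dstack([-gx, -gy, np.ones_like(depth, dtype=float)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Average normal over the patch
    mean_n = normals.reshape(-1, 3).mean(axis=0)
    mean_n /= np.linalg.norm(mean_n)
    # Angle between each normal and the average normal
    cosines = np.clip(normals @ mean_n, -1.0, 1.0)
    return float(np.degrees(np.arccos(cosines)).mean())
```

A planar patch (of any orientation) scores near zero, while a rough patch scores higher, which is the separation such a feature would need to provide.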
This investigation has looked at only a very small sample size, and at non-exotic subject regions. New camera calibration and specialist lenses could enable even finer 3D information, and image segmentation together with more advanced classification techniques has the potential to greatly improve and fully automate the process of diagnosis.

Figure 5. Correlations for normal dataset, split exactly into 2 arbitrary sets.
Figure 6. Local variation from mean for 3x3 neighbourhood (selection via border erosion +/- 10). Red, green, blue, hue, saturation and value channels plotted against depth for each dataset.
Figure 7. Global oriented depth data plotted against unmodified red, green, blue, hue, saturation and value channels for each dataset (selection via border erosion +/- 20).