Skin Lesion Attribute Detection for ISIC 2018 Using Mask-RCNN

Asmaa Aljuhani and Abhishek Kumar
Department of Computer Science, Ohio State University, Columbus, USA
E-mail: Aljuhani.2@osu.edu; Kumar.717@osu.edu

Abstract

We present an approach that uses Mask R-CNN to detect the skin lesion attributes of Task 2 of the ISIC 2018 Challenge. In this approach, five pre-trained ResNet-101 networks are trained separately, one per attribute, on an augmented version of the original training data. The model is evaluated using the training and validation datasets from the ISIC 2018 Challenge.

Framework and Configuration

Most image segmentation techniques follow one of two primary approaches. One uses convolution-deconvolution models such as U-Net, while the other uses region proposals. We take the second approach, using Mask R-CNN (1), which was developed by building on Faster R-CNN (2) for image segmentation.

Mask R-CNN is an improvement over Faster R-CNN. Faster R-CNN has two main stages. The first is the Region Proposal Network (RPN), which proposes candidate bounding boxes for object detection/segmentation. The second stage works in two parts: it extracts features from the region proposals and feeds them to a classifier. The features used by the first stage (region proposal) and by the second stage's feature extraction are shared. Mask
R-CNN inherits these two stages. In the second stage, however, Mask R-CNN also predicts an output mask for each RoI produced by the region proposal stage. A multitask loss is applied to each RoI:

$$L = L_{cls} + L_{bbox} + L_{mask}$$

where the three terms are the classification, bounding-box regression, and mask losses, respectively.

For feature extraction, Mask R-CNN relies on a convolutional neural network. Technically, this can be any CNN designed for object detection. We rely on the Mask R-CNN implementation by Matterport (3). We use ResNet-50 and ResNet-101 as our backbone CNNs, with a learning momentum of 0.9 and a decay rate of 0.1. We tried various learning rates to get the best results.

Data Pre-processing and Augmentation

The training data for Task 2 of the ISIC 2018 Challenge (4) consists of 2594 lesion images. For each training image, there are five binary masks corresponding to the five dermoscopic attributes. For validation purposes, we split the input images into training and testing sets at a ratio of 90:10.

Image Tiling

To overcome the shortage of training data, every image was sliced into tiles a fourth of its original size along the rows and columns, shifting by a fourth of the smaller of the tile's width and height. Each tile was augmented with 90°, 180°, and 270° rotations and with vertical and horizontal flips. Mask tiles for the five attributes were generated following the same process. For each attribute, mask tiles with no positive values were discarded along with the corresponding image tiles. Table 1 shows the number of tiles for each attribute.
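The tiling and augmentation procedure described above can be illustrated with a minimal sketch. It is not the code used for the experiments. The function names (`tile_origins`, `augment`, `make_tiles`) are hypothetical, and the sketch assumes one reading of the text: each tile spans one fourth of the image height and width (integer division), the sliding stride is one fourth of the smaller tile side, and the unmodified tile is kept alongside its five augmented variants.

```python
import numpy as np


def tile_origins(height, width):
    """Return the tile size and the top-left corner of every tile."""
    # Assumption: a tile covers one fourth of the image along each axis.
    tile_h, tile_w = height // 4, width // 4
    # Assumption: the stride is one fourth of the smaller tile side.
    stride = max(1, min(tile_h, tile_w) // 4)
    rows = range(0, height - tile_h + 1, stride)
    cols = range(0, width - tile_w + 1, stride)
    return tile_h, tile_w, [(r, c) for r in rows for c in cols]


def augment(tile):
    """Return the tile, its 90/180/270 degree rotations, and both flips."""
    return [
        tile,
        np.rot90(tile, 1),
        np.rot90(tile, 2),
        np.rot90(tile, 3),
        np.flipud(tile),  # vertical flip
        np.fliplr(tile),  # horizontal flip
    ]


def make_tiles(image, mask):
    """Build (image tile, mask tile) pairs for one attribute mask.

    `image` is an H x W x C array and `mask` is an H x W binary array.
    Tiles whose mask has no positive pixel are discarded.
    """
    tile_h, tile_w, origins = tile_origins(*mask.shape)
    pairs = []
    for r, c in origins:
        mask_tile = mask[r:r + tile_h, c:c + tile_w]
        if not mask_tile.any():
            continue  # no positive values for this attribute
        image_tile = image[r:r + tile_h, c:c + tile_w]
        # The same transform is applied to the image tile and its mask tile.
        pairs.extend(zip(augment(image_tile), augment(mask_tile)))
    return pairs
```

Calling `make_tiles(image, mask)` once per attribute mask yields that attribute's tile set. For non-square tiles, the 90° and 270° rotations swap the tile's height and width.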
Table 1: Number of tiles for each attribute

            Globules   Milia   PigmentNet   NegativeNet   Streaks
Training       81341   87836       210531         24946     12316
Testing         6645    7930        17560          2170       855

Training Method

Our skin lesion detection model was trained using Ohio Supercomputer Center (5) resources. The model consists of five Mask R-CNN networks that detect the dermoscopic attributes separately. Each network is trained on its attribute's tile dataset and is configured to detect two classes: background and the lesion attribute. Training was performed for 100 epochs with a learning rate of 0.005 and a learning momentum of 0.9. The model is validated with the original training images: the Jaccard index is computed between the predicted masks and the ground-truth masks. To speed up training, the Mask R-CNN networks were initialized with COCO (6) weights prior to training.

Validation and Results

Since our model is trained on the augmented tile dataset, we validated it on the ISIC 2018 Challenge training images. Table 2 shows the average Jaccard index for each attribute.

Table 2: Average Jaccard index for each attribute

            Globules   Milia    PigmentNet   NegativeNet   Streaks
Mean JS       0.2977   0.3029       0.2735        0.5244    0.4584

The model was also validated on the 100 validation images, which have no ground-truth masks, using the ISIC 2018 Challenge online submission system, where it scored 0.320.

We also trained our model on full-size images without any tiling or augmentation. This model performed better than the tile-trained model. It was also validated on the 100 validation images
with no ground-truth masks using the ISIC 2018 Challenge online submission system, with a score of 0.363.

Testing

The ISIC 2018 Challenge has 71 images for the testing phase. We ran the model on the testing images and submitted the predicted masks to the submission portal.

Lesion attribute detection sample results

Figure 1: Task 2 lesion attribute detection with the highest Jaccard index (top row shows the predicted masks and the bottom row shows the ground truth): (a) globules, (b) milia-like cyst, (c) negative network, (d) pigment network, (e) streaks.

Figure 2: Task 2 lesion attribute detection with Jaccard index < 0.2 (top row shows the predicted masks and the bottom row shows the ground truth): (a) globules, (b) milia-like cyst, (c) negative network, (d) pigment network, (e) streaks.

References

(1) He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. B. CoRR 2017, abs/1703.06870.
(2) Ren, S.; He, K.; Girshick, R. B.; Sun, J. CoRR 2015, abs/1506.01497.
(3) Abdulla, W. Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. https://github.com/matterport/mask_rcnn, 2017.
(4) Tschandl, P.; Rosendahl, C.; Kittler, H. ArXiv e-prints 2018,
(5) Center, O. S. Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73, 1987.
(6) Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C. L.; Dollár, P. ArXiv e-prints 2014,
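As a supplementary note on the evaluation metric, the Jaccard index between a predicted mask and a ground-truth mask can be illustrated with a minimal sketch. It is not the challenge's scoring code. The function names `jaccard_index` and `mean_jaccard` are hypothetical. Returning 1.0 when both masks are empty is an assumption of this sketch, since neither the text nor the definition of the index fixes that case.

```python
import numpy as np


def jaccard_index(pred, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, truth).sum()
    return float(intersection / union)


def mean_jaccard(pred_masks, truth_masks):
    """Average the per-image Jaccard index over paired mask lists."""
    scores = [jaccard_index(p, t) for p, t in zip(pred_masks, truth_masks)]
    return float(np.mean(scores))


# Example: one shared pixel out of three pixels covered by either mask.
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [1, 0]]
score = jaccard_index(pred, truth)  # intersection 1, union 3
```

Averaging per-image scores with `mean_jaccard` corresponds to the "average Jaccard index" reading of Table 2. A dataset-level score that pools all pixels before dividing would generally give a different number.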