arxiv: v2 [cs.cv] 30 Sep 2018


A Detection and Segmentation Architecture for Skin Lesion Segmentation on Dermoscopy Images

arXiv:1809.03917v2 [cs.CV] 30 Sep 2018

Chengyao Qian, Ting Liu, Hao Jiang, Zhe Wang, Pengfei Wang, Mingxin Guan and Biao Sun
MTlab (Meitu, Inc.)
{qcy1, lt, jh1, wz, wpf1, gmx, sunbiao}@meitu.com

Abstract

This report summarises our method and validation results for the ISIC Challenge 2018 - Skin Lesion Analysis Towards Melanoma Detection - Task 1: Lesion Segmentation. We present a two-stage method for lesion segmentation with an optimised training method and ensemble post-processing. Our method achieves state-of-the-art performance on lesion segmentation, and we won first place in ISIC 2018 Task 1.

1 Introduction

Melanoma is the most dangerous type of skin cancer and causes almost 60,000 deaths annually. To improve the efficiency, effectiveness and accuracy of melanoma diagnosis, the International Skin Imaging Collaboration (ISIC) provides over 2,000 dermoscopic images of various skin problems for lesion segmentation, disease classification and other related research.

Lesion segmentation aims to extract lesion boundaries from dermoscopic images to assist experts in diagnosis. In recent years, U-Net, FCN and other deep learning methods have been widely used for medical image segmentation. However, existing algorithms are limited in lesion segmentation because the appearance of lesions varies widely across patients and acquisition environments. FCN, U-Net and other one-stage segmentation methods are sensitive to the size of the lesion: lesions that are too large or too small decrease the accuracy of these methods. A two-stage method can reduce the negative influence of diverse lesion sizes. Mask R-CNN can be viewed as a two-stage method and performs outstandingly on COCO. However, Mask R-CNN still has some drawbacks in lesion segmentation.
Unlike the clear boundaries between objects in COCO data, the boundary between lesion and healthy skin is vague in this challenge. The vague boundary reduces the accuracy of the RPN part of Mask R-CNN, which may in turn degrade the segmentation part that follows. Furthermore, the low resolution of the input to the segmentation branch of Mask R-CNN also reduces segmentation accuracy.

In this report, we propose a method for lesion segmentation. Our method is a two-stage process consisting of detection and segmentation. The detection part locates the lesion and crops it from the image. The segmentation part then segments the cropped image and predicts the lesion region. We also propose an optimised process for cropping the image: instead of cropping exactly along the bounding box, in training we crop the image with a random expansion or contraction to increase the robustness of the neural networks. Finally, image augmentation and ensemble methods are used. Our method is based on deep convolutional networks and is trained on the dataset provided by ISIC [1] [2].
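The random expansion and contraction of the crop can be sketched as follows. This is a minimal illustration, not the authors' code: Section 2.5 states the range as 81% to 121% of the bounding box, and we assume here that this refers to area, so each side is scaled by a factor between 0.9 and 1.1. The function name `jitter_box`, the per-axis sampling and the clamping to the image are likewise assumptions.

```python
import random


def jitter_box(box, image_size, scale_range=(0.9, 1.1), rng=random):
    """Randomly expand or contract a bounding box about its centre.

    box is (x0, y0, x1, y1) in pixels and image_size is (width, height).
    scale_range multiplies the box side lengths. The default (0.9, 1.1) is an
    assumption: it corresponds to 81%-121% of the box area.
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # One factor per axis (an assumption), so the aspect ratio also varies.
    half_w = (x1 - x0) / 2.0 * rng.uniform(*scale_range)
    half_h = (y1 - y0) / 2.0 * rng.uniform(*scale_range)
    # Clamp to the image so the crop never reaches outside it.
    new_x0 = max(0, int(round(cx - half_w)))
    new_y0 = max(0, int(round(cy - half_h)))
    new_x1 = min(width, int(round(cx + half_w)))
    new_y1 = min(height, int(round(cy + half_h)))
    return new_x0, new_y0, new_x1, new_y1
```

At inference time the same function with `scale_range=(1.0, 1.0)` returns the detected box unchanged, which keeps the training and testing paths in one place.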

Figure 1: Process

2 Materials and Methods

2.1 Database and Metrics

For lesion segmentation, ISIC 2018 provides 2594 images with corresponding ground truth for training. There are 100 images for validation and 1000 images for testing, both without ground truth. The test set has 400 more images than in 2017. The size and aspect ratio of the images vary. The lesions differ in appearance and occur on different parts of the body. The ground truth was produced by one of three methods: a fully-automated algorithm, a semi-automated flood-fill algorithm, or manual polygon tracing. Ground truth labelled by different methods has differently shaped boundaries, which is a challenge in this task. We split the whole training set into two sets with a ratio of 10:1.

The evaluation metric of this challenge is the Jaccard index, shown in Equation 1. In 2018, the organisers added a penalty: when the Jaccard index is below 0.65, the score is set to zero.

\[ J(A, B) = \begin{cases} \dfrac{|A \cap B|}{|A \cup B|} & \text{if } \dfrac{|A \cap B|}{|A \cup B|} \ge 0.65 \\ 0 & \text{otherwise} \end{cases} \qquad (1) \]

2.2 Methods

We design a two-stage process combining detection and segmentation, shown in Figure 1. First, the detection part locates the lesion with the highest probability. The lesion is then cropped from the original image according to the bounding box, and the cropped image is resized to 512x512, the input size of the segmentation part. The segmentation part then provides a fine segmentation boundary for the cropped image.

2.2.1 Detection Part

In the detection process, we use the detection part of Mask R-CNN [3]. We also use the segmentation branch of Mask R-CNN to supervise the training of the neural networks.

Figure 2: Detection
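The thresholded metric of Equation 1 can be illustrated with a short sketch. The function names are ours, masks are plain nested lists of 0/1 values for simplicity, and the handling of two empty masks is an assumption that the text does not cover.

```python
def jaccard(pred, truth):
    """Jaccard index of two binary masks given as equal-sized nested lists."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; treating that as 1.0 is an assumption.
    return intersection / union if union else 1.0


def thresholded_jaccard(pred, truth, threshold=0.65):
    """Equation 1: scores below the threshold count as zero."""
    score = jaccard(pred, truth)
    return score if score >= threshold else 0.0
```

For example, a prediction that covers three pixels where the ground truth covers two of them scores 2/3 and is kept, while one that covers four pixels scores 2/4 and is set to zero.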

2.2.2 Segmentation Part

In the segmentation part, we design an encoder-decoder network architecture inspired by DeepLab [4], PSPNet [5], DenseASPP [6] and Context Contrasted Local [7]. Our architecture is shown in Figure 3. Features are extracted by an extended ResNet-101 with three cascading blocks. After ResNet, a modified ASPP is used to combine features at various scales. We also use a skip connection to pass on detailed information.

Figure 3: Segmentation

Figure 4 shows the structure of the modified ASPP block. A 1x1 convolutional layer first combines the features extracted by ResNet and reduces the number of feature maps. After that, the modified ASPP block extracts information at various scales. The modified ASPP has three parts: dense ASPP, standard convolution layers and pooling layers. Dense ASPP was proposed in [6] and reduces the influence of the margin in ASPP. Considering the vague boundary and low-contrast appearance of the lesion, we add standard convolution layers to improve the network's ability to distinguish the boundary. The pooling layers let the network take into account the area surrounding a low-contrast lesion. The modified ASPP includes three dilated convolutions with rates 3, 6 and 12, three standard convolution layers with sizes 3, 5 and 7, and four pooling layers with sizes 5, 9, 13 and 17.

Figure 4: Modified ASPP

2.3 Pre-processing

Instead of using only the RGB channels, we combine the S and V channels of the Hue-Saturation-Value (HSV) colour space and the L, a and b channels of the CIELAB colour space with the RGB channels. These 8 channels are the input of the segmentation part. Figure 5 shows different channels of the HSV and CIELAB colour spaces.
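As an illustration of the 8-channel input, the sketch below computes the stacked channels for a single pixel using only the Python standard library. It is not the authors' pipeline: the sRGB-to-CIELAB conversion with a D65 white point is an assumption, the function names are ours, and the Lab values are left in their native ranges rather than rescaled to 0-1. A real implementation would apply the same conversion to whole image arrays.

```python
import colorsys


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function from the CIELAB definition."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r, g, b):
    """sRGB in [0, 1] to CIELAB, assuming a D65 white point."""
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    x = (0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl) / 0.95047
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = (0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl) / 1.08883
    fx, fy, fz = _lab_f(x), _lab_f(y), _lab_f(z)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def eight_channels(r, g, b):
    """Stack R, G, B, S, V, L, a, b for one pixel with RGB in [0, 1]."""
    _, s, v = colorsys.rgb_to_hsv(r, g, b)  # hue is discarded
    lab_l, lab_a, lab_b = rgb_to_lab(r, g, b)
    return (r, g, b, s, v, lab_l, lab_a, lab_b)
```

Dropping the hue channel avoids its wrap-around at 0 and 360 degrees, which may be one reason only S and V are used, although the report does not say so.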

Figure 5: Single channels in HSV and CIELAB colour space

2.4 Post-processing

An ensemble is used as post-processing to improve segmentation performance. The input image of the segmentation part is rotated by 90 and 180 degrees and flipped to generate three additional images. The segmentation part predicts a result for each image. The results for the rotated and flipped images are rotated and flipped back to the orientation of the original image. The final mask is the average of the four results.

Figure 6: Ensemble

2.5 Training

Image augmentation is used in the training of both detection and segmentation. Rotation, colour jitter, flip, crop and shear are applied to each image. Each channel of the images is scaled to the range 0 to 1. In the segmentation part, the size of the input images is set to 512x512. Some examples are shown in Figure 7.

Figure 7: Examples of image augmentation. (a) Flip left to right (b) Rotate 45 degrees

We use the Adam optimiser with an initial learning rate of 0.001. After each epoch, the learning rate is reduced to 92% of its previous value. The batch size is 8. We stop training early when the network starts overfitting. We use the dice loss shown in Equation 2, where p_{i,j} is the prediction at pixel (i, j) and g_{i,j} is the ground truth at pixel (i, j).

\[ L = \frac{\sum_{i,j} p_{i,j}\, g_{i,j}}{\sum_{i,j} p_{i,j} + \sum_{i,j} g_{i,j} - \sum_{i,j} p_{i,j}\, g_{i,j}} \qquad (2) \]

In training, the input image of the segmentation part is cropped randomly in a range near the bounding box predicted by the detection part, in order to improve the diversity of input images and to provide more information about the background context. The crop covers between 81% and 121% of the bounding box, as shown in Figure 8.

Figure 8: Crop image
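The loss of Equation 2 can be sketched in a few lines. Two points are assumptions on our part: the sign between the last two sums in the denominator is read as a minus, and the quantity minimised is taken to be one minus the overlap ratio, so that a perfect prediction gives a loss near zero. The function name and the `eps` guard are also ours.

```python
def overlap_loss(pred, truth, eps=1e-7):
    """One minus the overlap ratio of Equation 2 for two equal-sized maps.

    pred holds probabilities in [0, 1] and truth holds 0/1 labels, both as
    nested lists. eps is a hypothetical guard against an empty denominator.
    """
    sum_pg = sum_p = sum_g = 0.0
    for pred_row, truth_row in zip(pred, truth):
        for p, g in zip(pred_row, truth_row):
            sum_pg += p * g
            sum_p += p
            sum_g += g
    ratio = sum_pg / (sum_p + sum_g - sum_pg + eps)
    # Taking the complement turns the overlap score into a loss (assumption).
    return 1.0 - ratio
```

Because the ratio is computed from soft probabilities rather than thresholded masks, it stays differentiable with respect to the predictions, which is what makes it usable as a training objective.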

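The post-processing ensemble of Section 2.4 can be sketched as below. The report does not state the rotation direction or the flip axis, so counter-clockwise rotation and a left-to-right flip are assumptions, and `predict` is a hypothetical stand-in for the segmentation network. Grids are nested lists to keep the example dependency-free.

```python
def rot90(grid):
    """Rotate a nested-list grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rot270(grid):
    """Rotate 90 degrees clockwise, the inverse of rot90."""
    return [list(row) for row in zip(*grid[::-1])]


def rot180(grid):
    """Rotate 180 degrees; this transform is its own inverse."""
    return [row[::-1] for row in grid[::-1]]


def flip_lr(grid):
    """Mirror left to right; this transform is its own inverse."""
    return [row[::-1] for row in grid]


def identity(grid):
    """Return a copy of the grid unchanged."""
    return [list(row) for row in grid]


def ensemble_predict(predict, image):
    """Average predictions over the original, two rotations and a flip.

    predict is a hypothetical callable mapping a grid to a probability
    grid of the same shape.
    """
    transforms = [(identity, identity), (rot90, rot270),
                  (rot180, rot180), (flip_lr, flip_lr)]
    # Predict on each transformed view, then map the result back.
    restored = [inverse(predict(forward(image))) for forward, inverse in transforms]
    rows, cols = len(image), len(image[0])
    return [[sum(r[i][j] for r in restored) / len(restored) for j in range(cols)]
            for i in range(rows)]
```

If the network were perfectly equivariant to these transforms, all four restored masks would coincide and the average would change nothing; the benefit comes from averaging out the cases where they disagree.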
Figure 9: Predicted masks of different segmentation methods

2.6 Implementation

The detection part of our method is implemented using PyTorch 0.4 on Ubuntu 14.04. The framework is from https://github.com/roytseng-tw/Detectron.pytorch. The segmentation part is implemented using PyTorch 0.3.1 on Ubuntu 14.04. The neural networks are trained on two Nvidia 1080 Ti GPUs with 60 GB of RAM.

3 Results

Method                  Jaccard  Jaccard (>0.65)
Mask R-CNN              0.825    0.787
One-stage Segmentation  0.820    0.783
Two-stage Segmentation  0.846    0.816

Table 1: Evaluation metrics of different segmentation methods

Table 1 compares the evaluation metrics of our two-stage segmentation method with those of Mask R-CNN and our one-stage segmentation method on 257 images. The thresholded Jaccard index of our two-stage method on the official testing set is 0.802. Figure 9 shows the outputs of the different segmentation methods. Compared with the other methods, the results of our two-stage method are better localised and have smoother edges.

References

[1] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data, 5:180161, 2018.

[2] Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin K. Mishra, Harald Kittler, and Allan Halpern. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). CoRR, abs/1710.05006, 2017.

[3] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017.

[4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. CoRR, abs/1606.00915, 2016.

[5] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. CoRR, abs/1612.01105, 2016.

[6] Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, and Kuiyuan Yang. DenseASPP for semantic segmentation in street scenes. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[7] Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, and Gang Wang. Context contrasted feature and gated multi-scale aggregation for scene segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.