3D Densely Convolutional Networks for Volumetric Segmentation
Toan Duc Bui, Jitae Shin, and Taesup Moon


School of Electronic and Electrical Engineering, Sungkyunkwan University, Republic of Korea
arXiv:1709.03199v2 [cs.cv] 13 Sep 2017

Abstract. In the isointense stage, accurate volumetric image segmentation is challenging because of the low contrast between tissues. In this paper, we propose a novel, very deep network architecture based on densely connected convolutional networks for volumetric brain segmentation. The proposed architecture provides dense connections between layers, which improve the information flow through the network. By concatenating the feature maps of fine and coarse dense blocks, it captures multi-scale contextual information. Experimental results on the MICCAI grand challenge on 6-month infant brain MRI segmentation demonstrate significant advantages of the proposed method over existing methods, in terms of both segmentation accuracy and parameter efficiency.

1 Introduction

Volumetric brain image segmentation aims to separate the brain tissues into non-overlapping regions such as white matter (WM), gray matter (GM), cerebrospinal fluid (CSF), and background (BG). Accurate volumetric segmentation is a prerequisite for quantifying structural volumes. However, the low contrast between tissues often causes tissues to be misclassified, which hinders accurate segmentation. Hence, automatic brain segmentation remains an active area of research.

Recently, deep convolutional neural networks [9] have achieved great success in medical image segmentation [11, 12, 15]. For example, Çiçek et al. [2] proposed 3D U-Net, which combines a contracting path with skip connections and a learned up-sampling path to produce a full-resolution segmentation. Chen et al. [1] proposed a voxelwise residual network for brain segmentation that passes the signal from one layer to the next via identity connections.
However, such a connection provides only a short path from early layers to later layers. To address this, Huang et al. [6] introduced DenseNet, which provides direct connections from any layer to all subsequent layers to ensure maximum information flow between layers; it shows consistent improvements in accuracy as the network depth increases. Yu et al. [14] extended DenseNet to volumetric cardiac segmentation. Their network uses two dense blocks, each followed by pooling layers that reduce the feature-map resolution, and then restores the resolution with a stack of learned deconvolution layers. These stacked deconvolution layers introduce many learned parameters, which consume a lot of memory during training, and they may fail to capture multi-scale contextual information, resulting in poor performance.

In this paper, we propose a novel, very deep network architecture based on densely connected convolutional networks for volumetric brain segmentation. First, we combine local and global predictions by concatenating the feature maps of fine and coarse dense blocks, which captures multi-scale contextual information. In the traditional DenseNet architecture [6], a pooling layer is often used to reduce the feature resolution and to increase the abstraction of the feature representations; however, it may lose spatial information. To preserve spatial information, we replace the pooling layer with a convolution layer of stride 2; this adds only a small number of learned parameters but significantly improves performance. Second, we use the bottleneck-with-compression (BC) model to reduce the number of feature maps in each dense block, which reduces the number of learned parameters and makes the network more computationally efficient than existing methods [2, 14]. Experimental results on the MICCAI grand challenge on 6-month infant brain MRI segmentation (http://iseg2017.web.unc.edu/) demonstrate significant advantages of the proposed method over existing methods, in terms of both segmentation accuracy and parameter efficiency. Our implementation and network architectures are publicly available at https://github.com/tbuikr/3d_denseseg.

2 Methods

In this section, we first briefly review the key concept of DenseNet [6], which was introduced to deal with the degradation problem in classification. We then propose a novel network architecture that extends DenseNet to volumetric segmentation.

2.1 DenseNet: Densely Connected Convolutional Networks

Let x_l be the output of the l-th layer. In a traditional feed-forward network, x_l is computed by

    x_l = H_l(x_{l-1})    (1)

where H_l is a non-linear transformation of the l-th layer. The performance of a deep network saturates as the depth increases, due to vanishing/exploding gradients [5]. To address this degradation problem, Huang et al. [6] introduced the DenseNet architecture, which provides a direct connection from any layer to all subsequent layers by concatenating the feature maps of all preceding layers:

    x_l = H_l([x_0, x_1, ..., x_{l-1}])    (2)

where [.] denotes the concatenation of the feature maps of all preceding layers. Through these dense connections, DenseNet allows better information and gradient flow during training. If each function H_l(.) produces k feature maps as output, then the number of input feature maps at the l-th layer is k_0 + (l-1)k, where k_0 is the number of feature maps at the first layer; the hyper-parameter k is referred to as the growth rate. To reduce the number of input feature maps, a 1x1 convolution is introduced as a bottleneck layer before each 3x3 convolution. To further improve model compactness, a transition layer, consisting of batch normalization (BN) [7], a ReLU [3], and a 1x1 convolution followed by 2x2 pooling [9], is used to reduce the feature-map resolution. With m input feature maps, the transition layer generates floor(θm) output feature maps, where 0 < θ ≤ 1. The architecture is referred to as DenseNet-BC when the bottleneck and transition layers with θ < 1 are used.

2.2 3D-DenseSeg: A 3D Densely Convolutional Network for Volumetric Segmentation

Fig. 1 illustrates the proposed network architecture for volumetric segmentation. It consists of 47 layers with 1.55 million learned parameters. The network comprises two paths: down-sampling and up-sampling. The down-sampling path reduces the feature-map resolution and increases the receptive field. It is built from four dense blocks, each consisting of four BN-ReLU-Conv(1x1x1)-BN-ReLU-Conv(3x3x3) units with growth rate k = 16. We apply a dropout layer [13] with a dropout rate of 0.2 after each Conv(3x3x3) in the dense block to counter over-fitting. Between two consecutive dense blocks, a transition block, a Conv(1x1x1) with θ = 0.5 followed by a convolution of stride 2, reduces the feature-map resolution while preserving spatial information. The transition block plays a role similar to deep supervision in coping with the limited dataset, but is less complicated and avoids tuning the weight balance between auxiliary and main losses. Before the first dense block, we extract features with three convolution layers that generate k_0 = 32 output feature maps.

In the up-sampling path, 3D up-sampling operators recover the input resolution. In particular, the shallower layers contain local features, while the deeper layers contain global features [10]. To make better predictions, we up-sample the output of each dense block and combine the resulting feature maps. Concatenating up-sampled feature maps from different levels captures multi-scale contextual information.
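As a quick sanity check of the feature-map counts above, the arithmetic can be written out in a few lines. This is a sketch with our own illustrative function names, not the authors' released code:

```python
# Feature-map bookkeeping for a DenseNet-BC-style dense block (sketch).

def channels_into_layer(l, k0=32, k=16):
    """Channels entering the l-th layer (1-indexed) of a dense block:
    the k0 maps entering the block plus k new maps contributed by each
    of the l-1 preceding layers, i.e. k0 + (l-1)*k."""
    return k0 + (l - 1) * k

def transition_channels(m, theta=0.5):
    """A transition with compression theta keeps floor(theta * m) of m maps."""
    return int(theta * m)

# With k0 = 32 and growth rate k = 16 (the 3D-DenseSeg settings), a
# 4-layer dense block ends with 32 + 4*16 = 96 feature maps, and a
# theta = 0.5 transition passes 48 maps on to the next dense block.
maps_after_block = channels_into_layer(5)                      # 96
maps_after_transition = transition_channels(maps_after_block)  # 48
```

This halving at every transition (θ = 0.5) is the compression the paper credits for keeping the 47-layer network at only 1.55 million learned parameters.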
A classifier consisting of a Conv(1x1x1) maps the concatenated feature maps to the target classes (i.e., four classes for the brain), and the final brain probability maps are obtained with softmax classification.

[Fig. 1: 3D-DenseSeg network architecture for volumetric segmentation (T1/T2 inputs, 3D convolution, BN + ReLU, 3D dense blocks, 3D transition blocks, 3D up-sampling, copy/concatenation, main classifier, probability map).]

3 Experiments

3.1 Dataset and Training

We used the public 6-month infant brain MRI segmentation challenge (iSeg) dataset (http://iseg2017.web.unc.edu/) to evaluate the proposed method. It consists of 10 training samples and 13 testing samples. Each sample includes a T1 image and a T2 image, pre-processed with in-house tools. Automatic segmentations are compared with the manual segmentation using several measurements: Dice Coefficient (DC), Modified Hausdorff Distance (MHD), and Average Surface Distance (ASD). We normalize the T1 and T2 images to zero mean and unit variance before feeding them into our network.

Our network is trained with Adam [8] with a mini-batch size of 4. The weights are initialized as in He et al. [4]. The learning rate is initially set to 0.0002 and dropped by a factor of γ = 0.1 every 50,000 iterations. We used a weight decay of 0.0005 and a momentum of 0.97. Due to limited GPU memory, we randomly cropped sub-volumes of size 64x64x64 as network input, and used a voting strategy to generate the final segmentation from the predictions of the overlapped sub-volumes.

3.2 Performance Evaluation

Evaluation of the proposed method. To evaluate the performance of the proposed method, we perform cross-validation on the iSeg dataset. Fig. 2 shows the validation results of the proposed method for the ninth subject on different slices, demonstrating the robustness of the proposed architecture for accurate segmentation.

[Fig. 2: Segmentation results on different slices: (a) T1 image, (b) T2 image, (c) DenseVoxNet, (d) our result, (e) manual segmentation.]

Comparison with existing methods. To evaluate the proposed method quantitatively, we compare it with state-of-the-art deep learning methods [2], [14] on the validation set in terms of depth, number of learned parameters, and segmentation accuracy, as shown in Table 1. The 3D U-Net architecture [2], with 18 layers and 19 million learned parameters, achieves 91.58% average DSC. DenseVoxNet [14], based on the DenseNet architecture with stacked deconvolutions, has 32 layers and 4.34 million learned parameters and achieves 89.23%. The proposed architecture is substantially deeper (47 layers) yet has only 1.55 million learned parameters owing to the BC model, and achieves 92.50% accuracy with fewer learned parameters than the existing methods.

Table 1: Comparison of the proposed method with state-of-the-art methods in terms of parameters and accuracy on the validation set (DSC: %).

  Method              Depth  Params  WM     GM     CSF    Average DSC
  3D-Unet (2015)      18     19M     89.57  90.73  94.44  91.58
  DenseVoxNet (2017)  32     4.34M   85.46  88.51  93.71  89.23
  3D-DenseSeg (Ours)  47     1.55M   91.25  91.57  94.69  92.50

Table 2 shows the results on the test set of the iSeg dataset (http://iseg2017.web.unc.edu/results/). The proposed network architecture achieves state-of-the-art performance in the challenge.

Table 2: Results of the iSeg-2017 challenge for different methods (DC: %, MHD: mm, ASD: mm; only the top 5 teams are shown).

                                 WM                    GM                    CSF
  Method                         DSC   MHD    ASD      DSC   MHD    ASD      DSC   MHD    ASD
  3D-DenseSeg (MSL_SKKU, Ours)   90.1  6.444  0.391    91.9  5.980  0.330    95.8  9.072  0.116
  LIVIA                          89.7  6.975  0.376    91.9  6.415  0.338    95.7  9.029  0.138
  Bern_IPMI                      89.6  6.782  0.398    91.6  6.455  0.341    95.4  9.616  0.127
  LRDE                           86.1  6.607  0.523    88.7  5.852  0.458    92.8  9.875  0.201
  nic_vicorob                    88.5  7.154  0.430    91.0  7.647  0.367    95.1  9.178  0.137

4 Discussion and Conclusion

We have proposed a novel 3D dense network architecture to address challenges in volumetric medical segmentation, especially infant brain segmentation. By concatenating information from coarse to fine layers, the proposed architecture captures multi-scale contextual information. The proposed network is much deeper than existing methods and hence can capture more information, and the pooling layer is replaced by a convolution of stride 2 to preserve spatial information. We further incorporate multi-modality information for accurate brain segmentation. Quantitative evaluations and comparisons with existing methods on real MR images demonstrate the significant advantages of the proposed method in terms of both segmentation accuracy and parameter efficiency. In future work, we will explore the proposed architecture for harder tasks such as tumor segmentation.

References

1. Chen, H., Dou, Q., Yu, L., Qin, J., Heng, P.A.: VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images. NeuroImage (2017)
2. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 424-432. Springer (2016)
3. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS, vol. 15, p. 275 (2011)
4. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026-1034 (2015)
5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778 (2016)
6. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
7. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
8. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
9. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278-2324 (1998)

10. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440 (2015)
11. Moeskops, P., Viergever, M.A., Mendrik, A.M., de Vries, L.S., Benders, M.J., Išgum, I.: Automatic segmentation of MR brain images with a convolutional neural network. IEEE Transactions on Medical Imaging 35(5), 1252-1261 (2016)
12. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer (2015)
13. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1), 1929-1958 (2014)
14. Yu, L., Cheng, J.Z., Dou, Q., Yang, X., Chen, H., Qin, J., Heng, P.A.: Automatic 3D cardiovascular MR segmentation with densely-connected volumetric convnets. MICCAI (2017)
15. Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 108, 214-224 (2015)
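Appendix: the training and inference recipe of Sec. 3.1 (zero-mean/unit-variance normalization, the stepped learning rate, and voting over overlapped 64x64x64 crops) can be sketched as follows. This is our illustration, not the released code: the function names are ours, and the 32-voxel stride is an assumed overlap that the paper does not specify. The sketch also assumes each volume dimension is at least the patch size.

```python
import numpy as np

def normalize(img):
    """Normalize one modality (e.g. T1 or T2) to zero mean, unit variance."""
    return (img - img.mean()) / img.std()

def learning_rate(iteration, base=2e-4, gamma=0.1, step=50000):
    """Initial rate 0.0002, dropped by a factor of 0.1 every 50,000 iterations."""
    return base * gamma ** (iteration // step)

def predict_by_voting(volume, predict_patch, num_classes, patch=64, stride=32):
    """Accumulate per-class scores from overlapped cubic crops and take the
    argmax, mirroring the voting over overlapped sub-volumes in Sec. 3.1.
    `predict_patch` must return (num_classes, patch, patch, patch) scores
    for one crop; every dimension of `volume` must be >= `patch`."""
    D, H, W = volume.shape
    votes = np.zeros((num_classes, D, H, W))

    def starts(n):
        # Crop origins along one axis, clamped so the last crop ends at n.
        return sorted({min(s, n - patch) for s in range(0, n, stride)})

    for z in starts(D):
        for y in starts(H):
            for x in starts(W):
                crop = volume[z:z+patch, y:y+patch, x:x+patch]
                votes[:, z:z+patch, y:y+patch, x:x+patch] += predict_patch(crop)
    return votes.argmax(axis=0)  # (D, H, W) label map
```

In practice `predict_patch` would wrap a forward pass of the trained network over a normalized crop; here it is left abstract so the voting logic stands on its own.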