SIIM 2017 Scientific Session: Analytics & Deep Learning Part 2
Friday, June 2, 8:00 am - 9:30 am

Performance of Deep Convolutional Neural Networks for Classification of Acute Territorial Infarct on Brain MRI: A Pilot Study for Computer Vision in Stroke Neuroimaging

Ali Ahmadvand, MSc, Emory University; Supreeth Prajwal, MS; Shamim Nemati, PhD; Falgun H. Chokshi, MD, MS (Presenter)

Hypothesis

A deep convolutional neural network (CNN) will achieve higher accuracy than a shallow CNN for classifying acute territorial infarction (ATI) on Apparent Diffusion Coefficient (ADC) maps.

Introduction

In the field of computer vision, CNNs are an evolving branch of deep learning algorithms that have attracted more attention than other deep learning methods. This is because networks of this family explicitly construct a hierarchical representation of input images, which yields a rich set of features for downstream classification tasks. CNNs generally contain three main layer types: a convolutional layer, a pooling layer, and a fully connected layer. These layers can be stacked together to form a full CNN architecture [1, 2]. CNNs have become a leading method for many computer vision applications in medicine [3, 4].

One potentially impactful area in which to test the performance of CNNs is neuroimaging for acute stroke detection. Magnetic resonance imaging (MRI) has transformed acute stroke care and become a foundation for diagnosis and prognostication. Additionally, diffusion-weighted imaging (DWI) provides the closest approximation of core stroke volume among imaging modalities [5]. Comprised of DWI and apparent diffusion coefficient (ADC) maps, these image sets are amenable to advanced machine learning methods that could help classify cases of ATI as part of a real-time image processing and deep learning pipeline, augmenting the radiologist's ability to diagnose this very time-sensitive condition.
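As a toy illustration of the three layer types described above, the following NumPy sketch stacks a convolutional stage, a max-pooling stage, and a fully connected stage. The shapes and random weights are hypothetical and purely pedagogical; they do not correspond to the networks used in this study.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def relu(x):
    """Elementwise rectified linear activation."""
    return np.maximum(x, 0)

# Stack the three layer types: convolution -> pooling -> fully connected.
rng = np.random.default_rng(0)
image = rng.random((8, 8))                          # hypothetical input patch
kernel = rng.standard_normal((3, 3))                # one learned filter
features = max_pool2d(relu(conv2d(image, kernel)))  # (3, 3) feature map
weights = rng.standard_normal(features.size)        # fully connected weights
logit = weights @ features.ravel()                  # scalar classification output
```

Real architectures stack many such filters and layers; frameworks like TensorFlow provide optimized, learnable versions of each stage.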
In contradistinction to traditional applications of machine learning to medical images, where classification and segmentation are performed in two dimensions (2D), we present work that uses three-dimensional (3D) CNNs with max pooling. This model provides translation invariance: the network can detect that a certain imaging pattern (i.e., stroke) is present without necessarily identifying its location in the image (i.e., frontal lobe or occipital lobe). Such a model obviates the need for laborious and expensive manual annotation of each image by a radiologist and allows whole image set-level classification of MRIs as either having or not having ATI. The work presented here is the first part in the construction and implementation of such a pipeline, namely, an assessment of the performance of shallow and deep CNNs with max pooling, comparing their accuracies against human (radiologist) classification.
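The translation invariance provided by max pooling can be demonstrated with a minimal NumPy sketch: a global max pool over a 3D volume returns the same response for an identical lesion-like pattern regardless of where in the volume it sits. The volume shape and pattern values below are illustrative only.

```python
import numpy as np

def global_max_pool3d(volume):
    """Global max pooling over a 3D volume: keeps the strongest response,
    discarding its spatial location."""
    return volume.max()

# Two volumes containing the same "lesion-like" pattern at different locations.
vol_a = np.zeros((24, 16, 16))
vol_b = np.zeros((24, 16, 16))
pattern = np.full((2, 3, 3), 5.0)
vol_a[2:4, 1:4, 1:4] = pattern          # lesion near one corner
vol_b[20:22, 10:13, 10:13] = pattern    # same lesion elsewhere in the volume

# Max pooling reports "the pattern is present" identically for both volumes,
# so a downstream classifier never needs location annotations.
print(global_max_pool3d(vol_a), global_max_pool3d(vol_b))  # 5.0 5.0
```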
Methods

First, a board-certified neuroradiologist classified 184 brain MRI ADC map sequences (128x128 resolution, 24 slices per sequence) as showing or not showing ATI; these labels served as the reference for the machine learning methods. Patient sociodemographic information, clinical history, and additional imaging sequences were intentionally excluded, as our goal was to use the simplest source of imaging data to evaluate the baseline performance of the machine learning methods.

Next, basic skull stripping and two-dimensional median filtering (3x3 kernel) for salt-and-pepper noise removal were performed using MATLAB's Image Processing Toolbox. All images within a sequence were normalized to the interval [0, 1]. Finally, using image augmentation, we quadrupled the number of image sequences via combinations of random rotations (±6 degrees) and flips. The resulting 3-dimensional tensors (a 2D image subspace plus a 1D slice-sequence subspace) were used as input to our neural network algorithms, with the ATI outcomes as the target class labels.

Since an ATI generally appears across contiguous slices, we employed a CNN model with 3-dimensional kernels to capture patterns across a 3-dimensional volume. In our experiments, we compared two CNN architectures:
1) A shallow network with a single convolutional layer of 20 filters of kernel size 3x3x3 for feature extraction. The output of this network was a 50-dimensional feature vector, which was fed into a binary classification layer.
2) A deeper network with three convolutional layers: the first with 20 filters of size 5x5x3, the second with 25 filters of size 3x3x2, and the third with 30 filters of size 3x3x2. The output of this network was a 100-dimensional feature vector, which was fed into a binary classification layer.
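The preprocessing and augmentation steps above can be sketched in NumPy as follows. This is an illustrative re-sketch, not the MATLAB code used in the study: the toy array sizes are arbitrary, the median filter leaves border pixels unfiltered, and the ±6 degree random rotations are omitted (only flips are shown for the 4x augmentation).

```python
import numpy as np

def normalize(volume):
    """Scale all slice intensities in a sequence to the interval [0, 1]."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin)

def median_filter_2d(img):
    """3x3 median filter for salt-and-pepper noise (borders kept unfiltered)."""
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

def augment(volume):
    """Quadruple a (slices, H, W) sequence with left-right and up-down flips.
    (The study combined small random rotations with flips; flips alone shown.)"""
    return [volume,
            volume[:, :, ::-1],      # left-right flip
            volume[:, ::-1, :],      # up-down flip
            volume[:, ::-1, ::-1]]   # both flips

sequence = np.random.rand(4, 32, 32) * 4095.0          # toy-sized raw intensities
sequence = normalize(sequence)                          # to [0, 1]
sequence = np.stack([median_filter_2d(s) for s in sequence])
augmented = augment(sequence)                           # 4 tensors per original
```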
We used a combination of dropout and network weight regularization to minimize over-fitting on the training data (i.e., to improve the generalization performance of our trained networks). The networks were implemented with the TensorFlow library in the Python programming language. We used a random 80% subset of the data for training and the remaining 20% for testing, and report classification performance (c-statistic) on the testing set. The algorithms were developed and tested on an Intel Xeon CPU E5-2680 v3 (2.50 GHz) with 47 cores and 130 GB of storage.

Results

Human Reference: Of 184 total brain MRIs, 49 (27%) had ATI by neuroradiologist interpretation.

3D CNN Performance: The best test AUC of the shallow CNN was 0.69 for the non-stroke class and 0.65 for the stroke class. In comparison, the deep CNN achieved a test AUC of 0.83 for both the non-stroke and stroke classes. Training and validation accuracy across epochs for the shallow and deep CNN models are shown in Figures 1 and 2, respectively. Figures 3 and 4 depict the test-set AUC across epochs for the shallow and deep CNN models, respectively.
Figure 1: Training and validation accuracy by epoch, shallow CNN. Figure 2: Training and validation accuracy by epoch, deep CNN.
Figure 3: Test-set AUC by epoch, shallow CNN. Figure 4: Test-set AUC by epoch, deep CNN.
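The c-statistic (AUC) reported above can be computed directly from labels and classifier scores; a minimal NumPy implementation, together with the random 80/20 split described in the Methods, is sketched below. The labels and scores here are simulated stand-ins (with the study's ~27% ATI prevalence), not the study's data.

```python
import numpy as np

def auc(labels, scores):
    """c-statistic: probability that a random positive case outscores a
    random negative case, counting ties as 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(42)
n = 184
labels = (rng.random(n) < 0.27).astype(int)   # ~27% prevalence, as in the study
scores = labels * 0.5 + rng.random(n) * 0.8   # hypothetical classifier scores

# Random 80/20 train/test split, as described in the Methods.
idx = rng.permutation(n)
train, test = idx[:int(0.8 * n)], idx[int(0.8 * n):]
print(round(auc(labels[test], scores[test]), 3))
```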
Hardware Performance: Training each mini-batch takes about 135 seconds for the deep CNN and about 27 seconds for the shallow CNN. During testing, the network processes about 9 image sequences per second.

Discussion

We have shown that a 3D deep CNN model can classify ATI on MRI ADC images with high accuracy compared to a human (neuroradiologist) reference. These preliminary results indicate that a combination of image augmentation and regularization has the potential to make deep learning algorithms useful even when the training dataset is relatively small, as in this case. Furthermore, the 3D CNN with max pooling architecture allowed the machine to classify the MRI ADC image sets as having or not having ATI without the radiologist needing to annotate each image in the datasets. Such annotation would be extremely time-consuming and expensive, attenuating the scalability and potential real-time application of such a machine learning model.

Future work will include semi-supervised learning (a combination of labeled and unlabeled data) and transfer learning techniques, in which a network trained for a different image classification task is fine-tuned for the application at hand using a limited labeled dataset [6]. Finally, to reduce intra-rater variability and improve sequence labels, crowd-sourcing in association with Bayesian fusion techniques can be applied [7].

Conclusion

To our knowledge, this is the first study to demonstrate a 3D deep CNN algorithm with high accuracy for whole image set-level classification of ATI on brain MRI ADC maps.

References

1. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
2. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
3. Kamnitsas, K., Ledig, C., Newcombe, V. F., Simpson, J. P., Kane, A. D., Menon, D. K., ... & Glocker, B. (2016). Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. arXiv preprint arXiv:1603.05959.
4. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., ... & Larochelle, H. (2016). Brain tumor segmentation with deep neural networks. Medical Image Analysis.
5. Kim, B. J., Kang, H. G., Kim, H. J., et al. (2012). Magnetic resonance imaging in acute ischemic stroke treatment. Journal of Stroke, 16, 131-145.
6. Gulshan, V., Peng, L., Coram, M., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316, 2402-2410.
7. Xu, Z., Asman, A. J., Singh, E., Chambless, L., Thompson, R., & Landman, B. A. (2012, May). Collaborative labeling of malignant glioma. In 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI) (pp. 1148-1151). IEEE.

Keywords: machine learning, deep learning, computer vision, neural network, classification, stroke, cerebrovascular accident, acute territorial infarction, MRI