Automated Diagnosis of Vertebral Fractures using 2D and 3D Convolutional Networks

CS189 Final Project
Naofumi Tomita

Overview

Automated diagnosis of osteoporosis-related vertebral fractures is a useful application for both clinicians and potential patients. We aim to build a new automated diagnosis system for CT images by applying recent advances in deep learning. In the proposal, we suggested three approaches that exploit the sequential (slice-to-slice) structure of CT data and pledged to implement two of them by the milestone so that the validity of the different models could be compared. So far, however, we have implemented one of them and changed one of the original approaches. Based on the model implemented so far, the results look reasonable and promising. We plan to conduct rigorous experiments on the first model, including measuring the effectiveness of each data augmentation technique, and to implement another model that uses 3D convolution.

Approaches

We have changed the list of approaches by dropping a variant of the C3D-network-based model and adding a weakly-supervised model. The weakly-supervised model would share a similar architecture with the ResNet-based model, but it would take all images in a volume into account instead of looking at a subset of images to make a prediction. This approach is ambitious but reasonable, since the data we collected does not have slice-level annotations.

Updated list of approaches:
1. ResNet-based model
2. C3D-network-based model
3. Weakly-supervised ResNet-based model

Data

A CT examination produces a volume of data. We collected 717 positive and 719 negative samples of abdomen-chest-pelvis CT data, covering from the top of the chest to the bottom of the pelvis, courtesy of Dartmouth-Hitchcock Medical Center and students at the Geisel School of Medicine at Dartmouth. Each sample has a label, positive or negative. Each volume contains between 80 and 300 images, depending on the examination. Each image is 512 by 512 pixels with a single grayscale channel. We know a priori that most vertebral fractures occur at the spine and that the spine usually appears in the middle 5% of the CT slices. Thus, for the first approach, we extract the slices that are likely to contain the spine, and therefore potential fractures, for training the model.

Model

The first approach is decomposed into two sub-networks: a ResNet-based classifier that extracts features and makes a prediction on a single image (the slice-level classifier), and a classifier network that makes a sample-level prediction by aggregating the results of the first network. The sample-level classifier is trained on top of the slice-level classifier. The ResNet-based slice-level classifier has 6 residual blocks and 5 residual downsampling blocks. Each block has two convolutions, each followed by batch normalization and ReLU. The input size is 512x512, and the network produces a scalar value by applying a fully connected layer to 1028 features at the end. We have implemented two different sample-level classifiers using LSTMs:
1. An LSTM with 3 layers of 1 hidden unit each, which takes a sequence of slice-level prediction scores. We call it LSTM1.
2. An LSTM with 1 layer of 128 hidden units, which takes the features extracted from the last layer before the fully connected layer of the slice-level classifier. We call it LSTM2.
We train the slice-level classifier on a subset of images extracted under the assumption that, for positive samples, those images contain some clue or evidence of a vertebral fracture. The extracted images inherit the label of their sample. Although this assumption introduces some label noise, we expect the sample-level classifier to absorb that noise when aggregating the results of the slice-level classifier and thus make a better prediction.

FIGURE 1. OVERVIEW OF THE ARCHITECTURE FOR THE FIRST APPROACH
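To make the slice-level architecture more concrete, below is a minimal PyTorch sketch of a residual block of the kind described above (two convolutions, each followed by batch normalization and ReLU, with a stride-2 variant acting as a downsampling block) and of a slice-level classifier that ends with a fully connected layer over 1028 features. The channel widths of the earlier stages and the names ResidualBlock and SliceNet are illustrative assumptions, not the exact configuration used in the project.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions, each followed by batch norm and ReLU.
    stride=2 turns the block into a residual downsampling block; the
    1x1 shortcut then matches the spatial size and channel count."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))


class SliceNet(nn.Module):
    """Hypothetical slice-level classifier: 6 residual blocks with a
    residual downsampling block between consecutive stages (5 in total),
    global average pooling, and a fully connected layer mapping the
    final 1028-dimensional feature vector to a single score."""
    def __init__(self, widths=(32, 64, 128, 256, 512, 1028)):
        super().__init__()
        blocks, in_ch = [], 1  # single-channel (grayscale) CT slice
        for i, out_ch in enumerate(widths):
            blocks.append(ResidualBlock(in_ch, out_ch))          # residual block
            if i < len(widths) - 1:
                blocks.append(ResidualBlock(out_ch, out_ch, 2))  # downsampling block
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(widths[-1], 1)

    def forward(self, x, return_features=False):
        f = self.pool(self.features(x)).flatten(1)  # (batch, 1028)
        return f if return_features else torch.sigmoid(self.fc(f))


# Example: one 512x512 grayscale slice -> one fracture confidence score.
model = SliceNet()
score = model(torch.randn(1, 1, 512, 512))
```

In this sketch, LSTM1 would consume the per-slice scalar scores and LSTM2 would consume the 1028-dimensional feature vectors obtained with return_features=True, one per extracted slice.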

Training

We constructed our dataset by splitting the samples 80:10:10 into training, validation, and testing sets. We apply data augmentation techniques to overcome the scarcity of our dataset (a code sketch of these augmentations appears at the end of this section):
1. Horizontal translation, implemented by zero-padding each image (12 pixels on each side) and randomly cropping a 512 by 512 window
2. Random rotation in the range of -3 to 3 degrees
3. Elastic deformation as proposed in [1], with the α value randomly chosen from the range 3 to 6 and the σ value randomly chosen from the range between α and 2α

The slice-level classifier is optimized with mini-batch stochastic gradient descent, with a batch size of 48 and a momentum of 0.9. The initial learning rate is set to 0.000002 and halved every 40 epochs, and training stops at 100 epochs. Each parameter is initialized with the method proposed in [2]. The first sample-level classifier, LSTM1, is optimized with stochastic gradient descent; its input sequence consists of the prediction scores produced by the slice-level classifier on all extracted images of one sample. As the number of images per sample varies, the effective batch size changes from sample to sample. The initial learning rate is set to 0.01 and divided by 10 every 40 epochs, and training runs for 100 epochs. While training LSTM1, the last fully connected layer of the slice-level classifier is fine-tuned with a learning rate of 0.000001. The other sample-level classifier, LSTM2, is also optimized with stochastic gradient descent. Its input is a sequence of vectors, where each vector contains the features extracted, for one image of the sample, from the last layer before the fully connected layer of the slice-level classifier. In contrast to LSTM1, the slice-level classifier is used purely as a feature extractor and no further learning is applied to it. The initial learning rate is set to 0.00001 and divided by 10 every 40 epochs, and training stops at 100 epochs.

Experiments

We implement our models using PyTorch and evaluate them on the testing set. To measure performance in the context of medical research, we calculate the prediction accuracy and, for each class (fractured and normal), the TPR (true positive rate, sensitivity, or recall), PPV (positive predictive value, or precision), and F-1 score. We evaluate the slice-level classifier and the sample-level classifiers separately, which shows the benefit of the sample-level classifiers.
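As noted above, here is a minimal NumPy/SciPy sketch of the three augmentations, assuming each slice arrives as a 512x512 array; the helper names and the use of scipy.ndimage are our own illustrative choices, not necessarily how the project code implements them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, rotate

def random_horizontal_translation(img, pad=12, size=512, rng=np.random):
    """Zero-pad 'pad' pixels on the left and right, then crop a random
    size x size window, shifting the image horizontally by up to 'pad' pixels."""
    padded = np.pad(img, ((0, 0), (pad, pad)), mode="constant")
    x0 = rng.randint(0, 2 * pad + 1)
    return padded[:, x0:x0 + size]

def random_rotation(img, max_deg=3, rng=np.random):
    """Rotate by a random angle in [-max_deg, max_deg] degrees."""
    return rotate(img, rng.uniform(-max_deg, max_deg), reshape=False, order=1)

def elastic_deformation(img, rng=np.random):
    """Elastic deformation in the style of Simard et al. [1]: random
    displacement fields smoothed by a Gaussian of width sigma and scaled
    by alpha, with alpha ~ U(3, 6) and sigma ~ U(alpha, 2*alpha) as above."""
    alpha = rng.uniform(3, 6)
    sigma = rng.uniform(alpha, 2 * alpha)
    dx = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]), indexing="ij")
    return map_coordinates(img, [ys + dy, xs + dx], order=1, mode="reflect")

def augment(img):
    """Apply all three augmentations to one 512x512 grayscale slice."""
    return elastic_deformation(random_rotation(random_horizontal_translation(img)))
```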

Single-image Classification

On our testing set, the slice-level classifier achieves 78% accuracy, and the other scores are also sound. Since the sample-level classification is built on top of this classifier, we plan to improve its accuracy further by tuning hyperparameters.

           Accuracy   Fractured Class          Normal Class
                      TPR     PPV     F-1      TPR     PPV     F-1
           0.78       0.80    0.79    0.80     0.76    0.76    0.76

TABLE 1. SLICE-LEVEL CLASSIFIER RESULTS ON TESTING SET

Sample-level Classification

We have tested our models as well as some simple non-parametric classifiers as baselines. The first baseline uses the confidence scores obtained from the slice-level classifier and takes a vote to decide the final prediction: each confidence score casts a positive vote if it is larger than 0.5, and a negative vote otherwise. If over 70% of the scores vote positive, the classifier makes a positive prediction. The second baseline takes the maximum of the confidence scores and predicts positive if that value is over 0.5. The third baseline works on the same principle but uses the average value instead of the maximum. A sketch of these three baselines follows Table 2 below.

Classifier       Accuracy   Fractured Class          Normal Class
                            TPR     PPV     F-1      TPR     PPV     F-1
1 Voting (70%)   0.79       0.81    0.86    0.84     0.76    0.69    0.72
2 MaxPooling     0.74       0.93    0.74    0.83     0.41    0.76    0.54
3 AvgPooling     0.81       0.86    0.85    0.85     0.71    0.74    0.73
4 LSTM1          0.90       0.92    0.92    0.92     0.86    0.86    0.86
5 LSTM2          0.85       0.92    0.86    0.89     0.73    0.84    0.78

TABLE 2. SAMPLE-LEVEL CLASSIFIER RESULTS ON TESTING SET
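As mentioned above, here is a small sketch of the three non-parametric baselines, assuming scores is a 1-D NumPy array of slice-level confidence scores for one sample; the 0.5 and 70% thresholds are those stated in the text, and the function names are our own.

```python
import numpy as np

def voting_baseline(scores, vote_threshold=0.5, ratio=0.7):
    """Baseline 1: each slice votes positive if its score > 0.5; predict
    positive when more than 70% of the slices vote positive."""
    return float(np.mean(scores > vote_threshold) > ratio)

def max_pooling_baseline(scores, threshold=0.5):
    """Baseline 2: predict positive if the maximum slice score exceeds 0.5."""
    return float(scores.max() > threshold)

def avg_pooling_baseline(scores, threshold=0.5):
    """Baseline 3: same principle, but with the average slice score."""
    return float(scores.mean() > threshold)
```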

The results of baselines 1, 2, and 3 are reasonable given the slice-level classification result. The third baseline achieves 81% accuracy, better than the other two simple classifiers. The LSTM-based models are, however, significantly better than all of the baselines. LSTM1 in particular achieves 90% accuracy, which is remarkably higher than the other models. Our expectation was that LSTM2 would perform better than LSTM1, as it exploits the extracted features instead of the confidence scores generated by the slice-level classifier. We conjecture that fine-tuning the fully connected layer of the slice-level classifier while training LSTM1 contributes to its performance, which suggests that the slice-level classifier has not been fully optimized. It also seems that the hyperparameters of LSTM2 need further investigation.

Future Work

As our results with the first model suggest, exploiting the depth information in CT data is effective for achieving higher accuracy. We will continue conducting rigorous experiments on the first model by fine-tuning hyperparameters, exploiting another dataset for pre-training, and inspecting the effectiveness of each data augmentation technique. We will also implement another model that uses 3D convolution so that we can compare the different models by the final project deadline. We will keep investigating the weakly-supervised approach as well.

References

[1] Simard, Patrice Y., David Steinkraus, and John C. Platt. "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis." ICDAR, Vol. 3, 2003.
[2] He, Kaiming, et al. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision, 2015.