Using Machine Learning for Classification of Cancer Cells

Camille Biscarrat
University of California, Berkeley

I Introduction

Cell screening is a commonly used technique in the development of new drugs. Drug screenings take a target cell, in our case a cancer cell, subject it to different compounds, and observe the effectiveness of each compound. This process can be slow and resource-intensive. Recently, high-content screening has made it possible to obtain more information from each test by using fluorescence labelling and imaging. However, such labelling compounds can affect the cells, even killing them, and can also be expensive. To resolve this issue, we propose a label-free imaging setup which separates cancer cells from drug-treated cancer cells. Treated cells present sufficient morphological changes to distinguish them from untreated cancer cells. The label-free imaging is done with an optofluidic time-stretch microscope, which achieves both high throughput and high image resolution. The images are then digitally processed and classified using machine learning.

II Machine Learning Background

A Convolutional Neural Networks

Neural networks are a type of graph that takes an input, applies a function at each node (also called a neuron), and outputs a classification score. Each neuron is connected to every neuron in the previous layer, but neurons in the same layer operate independently. Each connection has a weight associated with it. Training a neural network means finding the optimal weights, those resulting in the best classification. These networks work well for small inputs but do not scale well to large images, as the number of neurons and weights to be learned scales linearly with the total number of pixels in the input (a quick calculation below makes this concrete).

Figure 1 - Neural Network Schematic.

Convolutional neural networks (CNNs) are neural networks whose neurons are organized in 3 dimensions. They transform a 3-dimensional input into a series of 3-dimensional volumes as it passes through the layers.
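As a back-of-the-envelope illustration of the scaling problem, consider a single fully connected layer on a raw image input; the image and layer sizes below are illustrative assumptions, not values from our setup.

```python
# Weight count of one fully connected layer on a raw image input.
# A 224x224 RGB image is flattened and fed to a hidden layer of
# 4096 neurons (sizes are illustrative assumptions).
pixels = 224 * 224 * 3           # 150,528 input values
hidden = 4096
weights = pixels * hidden        # one weight per input-neuron pair
print(f"{weights:,} weights")    # 616,562,688 -- for a single layer
```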

Figure 2 - Convolutional Neural Network Schematic.

CNNs are composed of convolutional layers, pooling layers, and fully connected layers. In a convolutional layer, the input is convolved with one or more filters. The output of such a layer is called an activation map; this map can also be considered a set of features. A convolutional layer may be followed by a pooling, or downsampling, layer. The input is divided into small subsections, usually of size 2x2, and only the maximum value of each section is kept in the output, effectively reducing the input width and height by a factor of 2. The final layers are fully connected layers: the entirety of the input is fed to all the neurons in these layers. The last fully connected layer is the classifier, whose output is the classification scores. Convolutional and pooling layers only consider a small region of the input volume at a given time, so the neurons in those layers are connected only to that small portion. Even though these layers are applied to the entire input, since each filter only operates on a small section at a time, the number of weights is considerably lower than in fully connected layers, where the entire input is connected to every node.

B Transfer Learning

CNNs extract features and use those features to classify the input images. Multiple pre-trained CNNs are available online, such as AlexNet and VGG16, trained on the ImageNet dataset. These have been trained on millions of images and have classifiers specific to the training set, labelling new images as belonging to one of 1000 categories, such as car, dog, house, or flower. Even though the training set is very large, the images do not represent all possible images; for example, they do not contain any images of cells. However, CNNs can still be used as feature extractors. Transfer learning is a process in which the output of a layer of a CNN is used as features and becomes the input of a separate classifier. This lets us take advantage of the very well-trained, complex CNNs that are already available and spares us from training our own CNN, which requires a lot of time, data, and memory. Because the available CNNs have been trained on a very large dataset, the features they select are highly discriminative. Even though the training dataset is very different from our test data, the features extracted from a pre-trained CNN are still very effective image descriptors. With these features, we only need to train a linear support vector machine on our data, which is a very fast operation.
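The following is a minimal sketch of this transfer-learning pipeline using Keras and scikit-learn. In the Keras VGG16 implementation the fc7 layer is named "fc2", and load_images is a hypothetical helper standing in for whatever reads the microscope images into arrays.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from sklearn.svm import LinearSVC

# Pre-trained VGG16; "fc2" is the Keras name for the fc7 layer.
base = VGG16(weights="imagenet", include_top=True)
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def fc7_features(images):
    """images: float array of shape (n, 224, 224, 3), RGB, range 0-255."""
    return extractor.predict(preprocess_input(images.copy()))

# load_images is a hypothetical helper returning (images, labels).
X_train_img, y_train = load_images("trial3_1uM_and_control/")
X_train = fc7_features(X_train_img)   # (n, 4096) feature matrix

clf = LinearSVC()                     # linear SVM on fc7 features
clf.fit(X_train, y_train)             # fast compared to training a CNN
```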

C Support Vector Machines

A support vector machine (SVM) is a supervised learning model. Given a dataset where each point is labeled as belonging to one of two categories, an SVM finds the best separation between the two categories. A linear SVM does so by finding the hyperplane that maximizes the distance between the points of the training dataset and the hyperplane. This is a binary classifier: points are classified into category A or category B. Once the model is trained, the linear SVM will assign a label to any new point.

III Binary Classification

A Experimental Procedure

The cells considered are MCF-7 breast cancer cells. They are treated with an anticancer drug at different concentrations: 1 nM, 10 nM, 100 nM, 1 μM, 10 μM, and 0 nM (control). Each group of cells is imaged through the time-stretch microscope. This process is repeated, giving us two sets of data: trial 3 and trial 4. In each trial, a dataset consists of the images of one drug concentration (1 nM to 10 μM) mixed with the images from the control (0 nM). A dataset passes through the VGG16 neural net for feature extraction. The features are then used to train a linear SVM model. This model can then be used to predict labels for new input data.

B Features from a Fully Connected Layer

The first set of tests examined how robust our model is by testing it on input data with different drug concentrations and from different collection trials. The model was trained using the 1 μM concentration images and tested on all 5 concentrations from trials 3 and 4 (1 nM, 10 nM, 100 nM, 1 μM, 10 μM). The features are extracted from the fc7 layer of VGG16, one of the last fully connected layers of the CNN. We compare these results to those obtained by using CellProfiler, an open-source software package, for feature extraction.
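This robustness test can be sketched as follows, reusing the hypothetical fc7_features and load_images helpers from the earlier sketch: train once on the 1 μM dataset, then score the same model against every concentration in both trials.

```python
from sklearn.svm import LinearSVC

# Train on trial 3, 1 uM vs. control; evaluate across all test sets.
X_img, y = load_images("trial3_1uM_and_control/")
clf = LinearSVC().fit(fc7_features(X_img), y)

for trial in ("trial3", "trial4"):
    for conc in ("1nM", "10nM", "100nM", "1uM", "10uM"):
        Xt_img, yt = load_images(f"{trial}_{conc}_and_control/")
        acc = clf.score(fc7_features(Xt_img), yt)   # mean accuracy
        print(f"{trial} {conc}: {acc:.3f}")
```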

Figure 3 - Testing accuracies of classification vs. concentration: i) using the trial 3, 1 μM model on trial 3 data; ii) using the trial 3, 1 μM model on trial 4 data; iii) using the trial 4, 1 μM model on trial 3 data; iv) using the trial 4, 1 μM model on trial 4 data. Results from using CellProfiler features and fc7 features.

As expected, the predictions are very accurate (over 95%) when the testing and training data come from the same trial. Even when using test data from a different trial than the training data (e.g., using the trial 3 model to predict trial 4 data), we obtained high accuracies (above 90%) for the 100 nM and 1 μM concentrations. The 1 μM model has difficulty classifying cells treated at very different concentrations (1 nM, 10 nM, 10 μM). Overall, using a CNN to extract features instead of CellProfiler results in higher accuracies.

C Dimension Reduction using LASSO

The fc7 layer gives 4096 features, but many of them are redundant or irrelevant. Keeping these uninformative features can negatively affect training, as it may lead to overfitting and decrease the accuracy of the model. Reducing the number of features not only helps against these issues, it also reduces computation time. One way to reduce the feature dimension is LASSO, a shrinkage method which pushes the fitted least-squares regression coefficients toward 0; features whose coefficients shrink to exactly 0 are discarded. The features selected during training are the same ones used when classifying new data. We then investigated how the number of features affects the accuracy. The optimal number of features is between 300 and 500.
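A minimal sketch of LASSO-based feature selection with scikit-learn follows; the regularization strength alpha is an assumption and would be tuned so that roughly 300 to 500 coefficients remain nonzero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# X: (n_samples, 4096) fc7 features; y: binary labels in {0, 1}.
lasso = Lasso(alpha=0.01)        # alpha assumed; tune so ~300-500 coefs survive
lasso.fit(X, y)

selected = np.flatnonzero(lasso.coef_)   # indices of nonzero coefficients
print(f"kept {selected.size} of {X.shape[1]} features")

# The same column subset must be applied to both training and test data.
X_reduced = X[:, selected]
```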

Figure 4 - Testing accuracies vs. concentration while varying the number of features used for classification: i) using the trial 4, 1 μM model on trial 4 data and ii) using the trial 4, 1 μM model on trial 3 data.

D Features from a Convolutional Layer

From the fc7 layer, we extract 4096 features and use LASSO to reduce the number of features to 500. Features taken from a fully connected layer (one of the last layers in the CNN) are high level and more complex, as they combine earlier low-level features into abstract representations. They are also more specific to the training set; in our case, they detect faces, doors, boats, and other natural-image components. The fully connected layers also take into account the spatial layout and location of the features. This may be desirable for detecting and classifying objects like houses and cars; for cells, however, we prefer lower-level features and more tolerance to spatial and rotational variation. To obtain such features, we extract them from a convolutional layer, such as layer conv4_3 of the VGG16 network. These features are less specific to the training set, and thus may better represent our dataset, which is completely unrelated to the training data. This results in a feature map with 401,408 elements. We then apply VLAD (vector of locally aggregated descriptors) to reduce the dimension of the feature representation. VLAD encoding on convolutional features also gives a deep representation of the image structure with no explicit high-level spatial dependence.

VLAD is similar to the bag-of-words (BoW) method. In BoW, we have a set of visual words (cluster centroids), i.e., features and small image patches that we consider important and representative of an image. When looking at a new image, we detect the features of interest, find the closest visual word each is associated with, and count the number of occurrences in each cluster. The image can thus be represented as a vector containing this histogram of visual words. VLAD extends this model: for each cluster, we sum the residuals between every descriptor assigned to the cluster and the cluster centroid. The encoded vector is the concatenation of these residual sums, normalized by its L2 norm.
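A minimal sketch of VLAD encoding over conv4_3 activations follows, assuming the 28x28x512 map is treated as 784 local descriptors of dimension 512 and that the k = 32 cluster centers are learned with k-means (k-means as the codebook learner is an assumption consistent with standard VLAD practice).

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_codebook(desc_stack, k=32, seed=0):
    """desc_stack: (N, 512) descriptors pooled from all training images."""
    return KMeans(n_clusters=k, random_state=seed).fit(desc_stack)

def vlad_encode(feat_map, kmeans):
    """feat_map: conv4_3 activations of shape (28, 28, 512)."""
    desc = feat_map.reshape(-1, 512)          # 784 local descriptors
    assign = kmeans.predict(desc)             # nearest centroid per descriptor
    k, d = kmeans.cluster_centers_.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = desc[assign == i]
        if len(members):
            # sum of residuals between descriptors and their centroid
            vlad[i] = (members - kmeans.cluster_centers_[i]).sum(axis=0)
    vlad = vlad.ravel()                       # length 32 * 512 = 16384
    return vlad / (np.linalg.norm(vlad) + 1e-12)   # L2 normalization
```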

Each VLAD vector has length 32 x 512 = 16384. We further reduce the dimensionality of the vector using LASSO. When testing the model on a new image, we must find its VLAD representation using the same cluster centers determined during training.

VLAD features achieve higher accuracies overall. The improvement is especially visible for concentrations outside the 100 nM to 1 μM range. When using a cross model (the trial 4 model on trial 3 data), the accuracy is comparable to using fc7 features. Changing the number of clusters did not dramatically change the accuracy, so we kept it at 32.

Figure 5 - Testing accuracies vs. concentration: i) using the trial 4, 1 μM model on trial 4 data and ii) using the trial 4, 1 μM model on trial 3 data. Results from using VLAD-encoded features and fc7 features are shown.

E Reducing Input Image Size

We then investigated the effect of the input image size on accuracy. Images were resized to 20, 25, 30, 35, 40, 60, 80, 100, 120, 140, and 160 pixels before being resized back to 224 pixels and fed into the neural net, as sketched below. This was done with features from the fc7 layer as well as the conv4_3 layer. With fc7 features, accuracy drops when the image size falls below roughly 40 pixels. With VLAD, the accuracy remains approximately constant as the image size decreases. Hence, the input image size does not greatly affect the accuracy of the predictions. In Figure 6, the first plot uses fc7 features and the second uses conv4_3 features with VLAD encoding; in both cases, the number of features was reduced to 500 using LASSO.
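The down-then-up resizing step can be sketched with Pillow as follows; the interpolation filter is an assumption, since the report does not specify one, and "cell.png" is a placeholder path.

```python
from PIL import Image

def degrade(path, small_size):
    """Downsample to small_size x small_size, then back up to 224 x 224."""
    img = Image.open(path).convert("RGB")
    img = img.resize((small_size, small_size), Image.BILINEAR)  # filter assumed
    return img.resize((224, 224), Image.BILINEAR)

# Sweep over the sizes tested in the report.
for s in (20, 25, 30, 35, 40, 60, 80, 100, 120, 140, 160):
    degraded = degrade("cell.png", s)
    # ... feed `degraded` to the fc7 / conv4_3 feature extractor ...
```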

Figure 6 - Testing accuracies vs. input image size: i) using the trial 4, 1 μM model on trial 4 data and ii) using the trial 4, 1 μM model on trial 3 data.

IV Future Work

A Training Our Own CNN

So far, we have used neural nets only to extract features, with a linear SVM building the model. The next step is to train our own convolutional neural net to classify the images. We are currently training a CNN with 3 convolutional layers, each followed by a pooling layer, and 2 fully connected layers (a sketch of such an architecture follows at the end of this section). The features learned by this CNN are specific to the dataset, in this case cancer cells. Using our own CNN also allows for more categories in the classifier, with the goal that the CNN can distinguish between the different concentrations with which the cells were treated.

B Visualizing the Layers and the Features Extracted

The most effective way of understanding which features the CNN uses to classify is to visualize the filters and the features they select. Doing so will greatly increase our understanding of what the model considers an important and distinctive feature. Visualizing the features also lets us confirm that the CNN is not classifying based on the background or noise, which may be distinctive enough when imaging each concentration batch. The best way to visualize the learned features is with a Deconvnet. This method maps specific activations from filters back to the input pixel space, essentially telling us what part of the input produces high activations at a given filter.
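The sketch below is a minimal Keras version of the architecture described in section IV A (3 convolutional layers, each followed by pooling, and 2 fully connected layers). The filter counts, kernel sizes, and the six-class output (five concentrations plus control) are assumptions; the report does not give these hyperparameters.

```python
from tensorflow.keras import layers, models

# Assumed hyperparameters: 32/64/128 filters, 3x3 kernels, 6 output classes.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),                  # halves width and height
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),    # first fully connected layer
    layers.Dense(6, activation="softmax"),   # classifier over 6 categories
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```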

V Conclusion

Using a neural net for feature extraction leads to better-performing models. Using convolutional features appears to lead to models that are less sensitive to drug concentration as well as to input image size. These initial findings were obtained from a small number of tests; to quantify more accurately the improvement due to fc7 features or VLAD, more trials must be conducted, particularly with new datasets. VLAD encoding on conv4_3 features seems to be a promising method for classifying cells.

References

CS231n: Convolutional Neural Networks for Visual Recognition, cs231n.github.io/convolutional-networks.
DeCost, Brian L., et al. Exploring the Microstructure Manifold: Image Texture Representations Applied to Ultrahigh Carbon Steel Microstructures. 16 May 2017.
Jégou, Hervé, et al. Aggregating Local Descriptors into a Compact Image Representation.
Cimpoi, Mircea, et al. Deep Filter Banks for Texture Recognition and Segmentation.
Zeiler, Matthew D., and Fergus, Rob. Visualizing and Understanding Convolutional Networks.