CS229: Machine Learning

Project Title: Recognition and Classification of Human Embryonic Stem (hES) Cell Differentiation Level

Team members: Dennis Won (jwon2014), Jeong Soo Sim (digivice)

Project Purpose

1. Given an image of an hES cell culture, use learning algorithms to classify whether the recognized cell colonies are differentiated or not (binary classification).
2. Perform multi-class classification to determine the level of differentiation of the recognized cell colonies.
3. Locate the sections within the recognized colonies that remain undifferentiated.
4. Provide a graphical display on the cell images showing which colonies, or which parts of colonies, are undifferentiated and thus usable for medical research.

Introduction

Human embryonic stem (hES) cells hold great potential for biomedical research into cures for numerous diseases (such as Alzheimer's disease, spinal cord injury, and heart disease), but the difficulty of keeping hES cells undifferentiated makes it hard for researchers to run accurate and efficient experiments on them. hES cells must be monitored and cared for in order to maintain healthy, undifferentiated cultures: the cultures must be checked continuously to ensure that they lack no nutrients and are free of unwanted differentiation factors. In addition, although a small amount of differentiation is normal and expected in stem cell cultures, the culture should be routinely cleaned by manually removing, or "picking", differentiated areas, because differentiated areas within a colony start a chain reaction that spreads differentiation to other parts of the culture [1].

Because no technology exists to systematically maintain the undifferentiated state of an hES cell culture, stem cell scientists today rely entirely on human eyes, hands, and judgment to (i) keep the environment favorable for the culture to remain undifferentiated, and (ii) identify and remove excess differentiation from hES cell cultures to maintain a healthy population of cells. Because such judgments are subjective, maintaining healthy, undifferentiated hES cell cultures has become one of the most difficult tasks these scientists face [1].

This project uses supervised machine learning algorithms to recognize stem cell colonies in hES cell images and to classify them as either differentiated or healthy and undifferentiated. The project has two main parts:

1. Train an SVM and a logistic regression model to classify hES colonies as differentiated or undifferentiated. The input features include how close the colony's shape is to a perfect circle, how well defined the colony's edge is, the distribution of color intensity within the colony, the size of the colony, and the texture of the colony.
2. Train an SVM and a logistic regression model to classify each individual superpixel within an hES colony as differentiated or undifferentiated, in order to localize the healthy, undifferentiated sections within the colony.

Part I. Extracting feature vectors from the training set of hES cell colony images

This project requires substantial image processing before the classifiers can be trained and tested, because the features must first be extracted from the images. We collected a total of 536 images of hES cell colonies provided by the Wu Cardiovascular Stem Cell Laboratory at Stanford Medical Center. Of the 536 images, exactly 268 show differentiated hES cell colonies (positive: +1) and the other 268 show undifferentiated, healthy hES cell colonies (negative: -1). All images were taken with the same microscope at the same size scale and the same color intensity. Figure 1 shows a sample positive colony image and a sample negative colony image.

Figure 1. Positive colony image (differentiated colony) and negative colony image (undifferentiated colony). Note the morphological differences between the two.

I. Colony image feature extraction

OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision, originally developed by Intel. We used OpenCV to extract the following features of the hES colonies. Figure 2 shows the detection of the cell colony using the OpenCV edge and object detector, which also provides a variety of feature extraction methods. Figure 3 lists the feature vector extracted from each hES cell colony image, along with a description of how each feature was measured and calculated.

Figure 2. Edge/object detection of the positive colony image (differentiated colony) and of the negative colony image (undifferentiated colony). Note the morphological differences between the two.

Feature (colony level): Measurement

Shape closeness to perfect circle: The OpenCV library is used to fit circular objects on the image. The closeness of the colony shape to a perfect circle is determined by the OpenCV detector. Range: 0 <= closeness <= 1

Edge definition level: Measured as the ratio of the number of edge pixels in the boundary of the colony to the total number of boundary pixels. The higher this ratio, the better defined the edge is. Calculated using the OpenCV detector. Range: 0 <= edge definition level <= 1

Size: Measured as the ratio of the radius of the edge-detected colony to the dimension of the cell image. Note that all cell colony images were taken under the same conditions (size scale, light intensity, etc.). Range: 0 <= size <= 1

Texture density: Measured as the edgeness per unit area, defined by

$$F_{\mathrm{edgeness}} = \frac{\left|\{\, p : \mathrm{gradient\_magnitude}(p) > T \,\}\right|}{N}$$

where p is a pixel in the colony segment, T is some threshold, N is the total number of pixels in the colony segment, and gradient_magnitude is computed by the OpenCV detector. Range: 0 <= texture density <= 1

Color intensity distribution: Measured as the standard deviation of the color intensity histogram for all pixels within the cell colony region, divided by the range of the color intensity. Range: 0 <= color intensity distribution <= 1

Figure 3. The feature vector extracted from each hES cell colony image

II. Superpixel feature extraction within a single colony

In addition to extracting colony-level features to train the SVM for binary classification of whether a colony as a whole is differentiated or undifferentiated, we also extracted features for each individual superpixel within each hES cell colony (a superpixel was defined as a 10x10-pixel square within the colony). A total of 121 hES cell colony samples were manually tagged to mark the undifferentiated region within them. Figure 4 shows examples of tagged input images of hES colonies (the red ROI), and Figure 5 lists the feature vector extracted from each superpixel (4 features per superpixel).

Figure 4. Input colony images with the undifferentiated region manually tagged as an ROI.
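To make the texture density and color intensity distribution measurements concrete, here is a minimal pure-Python sketch applied to 10x10 superpixels. It is not the project's OpenCV code. The central-difference gradient, the function names (`gradient_magnitude`, `texture_density`, `intensity_spread`, `superpixels`), the threshold value, and the 0-255 intensity range are all illustrative assumptions, and the "standard deviation of the color intensity histogram" is interpreted here as the standard deviation of the pixel intensities.

```python
import math
from statistics import pstdev


def gradient_magnitude(img, r, c):
    """Central-difference gradient magnitude at (r, c); borders are clamped.

    Assumption: stands in for the gradient computed by the OpenCV detector.
    """
    rows, cols = len(img), len(img[0])
    gy = img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]
    gx = img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]
    return math.hypot(gx, gy)


def texture_density(img, pixels, threshold):
    """Edgeness per unit area: fraction of pixels whose gradient exceeds T."""
    strong = sum(1 for r, c in pixels if gradient_magnitude(img, r, c) > threshold)
    return strong / len(pixels)


def intensity_spread(img, pixels, intensity_range=255):
    """Standard deviation of pixel intensities divided by the intensity range."""
    return pstdev([img[r][c] for r, c in pixels]) / intensity_range


def superpixels(rows, cols, size=10):
    """Yield the pixel coordinates of each full size x size tile."""
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            yield [(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)]


# Toy 20x20 grayscale image: dark left half, bright right half.
img = [[0] * 10 + [255] * 10 for _ in range(20)]
tiles = list(superpixels(20, 20))
features = [(texture_density(img, t, threshold=100), intensity_spread(img, t)) for t in tiles]
```

The same two functions apply at colony level if `pixels` lists every pixel in the colony segment rather than one tile.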

Feature (superpixel 10x10 level): Measurement

Edge definition level: Measured in the same way as the edge definition level in Figure 3, but only within a particular superpixel. Range: 0 <= edge definition level <= 1

Distance from centroid: Measured as the ratio of the pixel distance from the centroid to the radius of the detected colony. Range: 0 <= distance from centroid <= 1

Texture density: Texture density of the superpixel measured as the edgeness per unit area, as described for the colony-level feature vector in Figure 3. Range: 0 <= texture density <= 1

Color intensity distribution: Measured as the standard deviation of the color intensity histogram for all pixels within the superpixel, divided by the range of the color intensity. Range: 0 <= color intensity distribution <= 1

Figure 5. The feature vector extracted from each superpixel within the detected hES cell colony image

Part II. Training the Support Vector Machine (SVM) and logistic regression model with the extracted features

The classifiers were evaluated with ten-fold cross validation: the positive and negative samples were split into ten folds, nine folds were used for SVM training, and the remaining fold was used for testing. The SVM came from the Python SKLearn library. Classification results:

Part I: Colony binary classification (positive: differentiated, negative: undifferentiated).

Part II: Superpixel binary classification to segment the undifferentiated (negative) region within the hES colony. Figure 6 shows an example of the undifferentiated region within a colony as determined by the SVM.

Figure 6. Superpixel binary classification used to segment the undifferentiated (negative) region of the hES cell colony. Panels: (a.1) Training, (a.2) Test, (b.1) Training, (b.2) Test. Panels a.2 and b.2 show the ROI produced by SVM classification, while panels a.1 and b.1 show the manually tagged ROI.
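The ten-fold procedure can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the project's SKLearn code: `stratified_folds` and `cross_validate` are hypothetical helper names, and the nearest-class-mean classifier is only a stand-in for the SVM so that the example runs without external packages. The toy feature values are invented for the demonstration.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, balancing classes across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return one held-out prediction per sample: train on k-1 folds, test on 1."""
    held_out = [None] * len(labels)
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            held_out[i] = predict(model, features[i])
    return held_out


def fit_nearest_mean(X, y):
    """Hypothetical stand-in for the SVM: store the mean feature vector per class."""
    means = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        means[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict_nearest_mean(means, x):
    """Predict the class whose mean vector is closest to x (squared Euclidean)."""
    return min(means, key=lambda cls: sum((a - b) ** 2 for a, b in zip(x, means[cls])))


# Toy data with the same class balance as the colony set (268 vs. 268),
# using a single, trivially separable feature.
labels = [+1] * 268 + [-1] * 268
features = [[0.9] if y == +1 else [0.1] for y in labels]
predictions = cross_validate(features, labels, fit_nearest_mean, predict_nearest_mean)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Because every sample is predicted exactly once while held out, the list of predictions can be compared directly against the true labels to build a single aggregated confusion table.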

Part III. Results

I. Support Vector Machine on hES colonies (colony-level classification)

For each of the ten iterations of ten-fold cross validation, we recorded which samples were classified as positive and negative, both in our prediction and in the manually tagged result. Aggregating all iterations gives the following table.

                 Real +   Real -   Total
Prediction +       203       78     281
Prediction -        65      190     255
Total              268      268     536

Figure 7. SVM prediction result using ten-fold cross validation

Aggregating the results of all iterations gave 73.3% accuracy (393 of 536 samples classified correctly; Figure 7).

II. Support Vector Machine on superpixels (superpixel-level classification)

For each of the ten iterations of ten-fold cross validation, we counted the predicted samples for which the overlap between the predicted ROI of the undifferentiated region and the manually tagged ROI is higher than 75% (that is, the ratio of the overlap between the two ROIs to the size of the manually tagged ROI exceeds 0.75). By aggregating all iterations, we found that our SVM achieves 68.5% accuracy (82 out of 121 total samples had over 75% of their ROI predicted by our SVM model).

Discussion

Although our SVM model reached moderate accuracy in predicting the differentiation of hES cell colonies (73.3%) and in segmenting the undifferentiated region within a colony (68.5%), several sources of error kept the results from being more accurate. First, our feature extraction and the normalization of the extracted features might not fully capture the differences between positive and negative samples. Most importantly, our training set is not large enough to achieve a higher level of accuracy.

Future efforts to improve the performance of the classifier include the following. First, use more data so that the model is better trained. Second, use more sophisticated image processing techniques to extract more features, and determine which features matter most in deciding whether a colony is differentiated. Lastly, improve the segmentation algorithm used in this project by constructing superpixels in a more sophisticated and intuitive way. Currently, each superpixel is a square of a fixed, preselected size. Allowing superpixels to vary in size and shape should make the analysis more accurate, since the characteristics of colonies and cells could then be captured more precisely.
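As a worked check of the evaluation in Part III, the sketch below recomputes the colony-level accuracy from the Figure 7 counts and writes the 75% ROI overlap criterion as code. The function names and the representation of an ROI as a set of superpixel indices are assumptions made for illustration. Sensitivity and specificity are not reported in the text. They are simply derived here from the same four counts.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Summary rates from a 2x2 confusion table (positive = differentiated)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of real positives predicted positive
        "specificity": tn / (tn + fp),  # share of real negatives predicted negative
    }


def roi_overlap_ratio(predicted, tagged):
    """Fraction of the manually tagged ROI that the predicted ROI covers."""
    return len(predicted & tagged) / len(tagged)


def roi_hit(predicted, tagged, min_ratio=0.75):
    """A sample counts as correct only if the overlap ratio exceeds min_ratio."""
    return roi_overlap_ratio(predicted, tagged) > min_ratio


# Counts from Figure 7: rows are predictions, columns are real labels.
metrics = confusion_metrics(tp=203, fp=78, fn=65, tn=190)
# accuracy: 393 / 536, about 0.733

# Toy ROIs as sets of superpixel indices (invented for the example).
tagged = set(range(8))
covers_seven = roi_hit(set(range(7)), tagged)           # 7/8 = 0.875 exceeds 0.75
covers_six = roi_hit({0, 1, 2, 3, 4, 5, 9}, tagged)     # 6/8 = 0.75 does not exceed 0.75
```

Note that the criterion normalizes by the size of the tagged ROI only, so a predicted ROI that covers the whole colony would also pass. A symmetric measure such as intersection over union would penalize that, though it is not what the text describes.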