arxiv: v3 [] 2 Jun 2017



Incorporating the Knowledge of Dermatologists to Convolutional Neural Networks for the Diagnosis of Skin Lesions

Iván González-Díaz
Department of Signal Theory and Communications
Universidad Carlos III de Madrid
Leganés, 28911, Spain

Abstract

This report describes our submission to the ISIC 2017 Challenge in Skin Lesion Analysis Towards Melanoma Detection. We participated in Part 3: Lesion Classification with a system for the automatic diagnosis of nevus, melanoma and seborrheic keratosis. Our approach aims to incorporate the expert knowledge of dermatologists into the well-known framework of Convolutional Neural Networks (CNNs), which have shown impressive performance in many visual recognition tasks. In particular, we have designed several networks that provide lesion area identification, lesion segmentation into structural patterns, and the final diagnosis of clinical cases. Furthermore, novel CNN blocks have been designed to integrate this information into the diagnosis pipeline.

Figure 1: Main processing pipeline of our Automatic Diagnosis System

1 General description of the system

The main pipeline of our system is depicted in Fig. 1. It comprises the following steps:

1. For each clinical case $c$, a dermoscopic image $X_c$ feeds a Lesion Segmentation Network that generates a binary mask $M_c$ outlining the area of the image that corresponds to the lesion. This module is described in Section 2.

2. Each clinical case $c$, now defined by the image-mask pair $\{X_c, M_c\}$, goes through the Data Augmentation Module. This module extends the initial visual support of the lesion by generating new views $v$ corresponding to different rotations and cropped areas. Hence, the output of this module is an extended set of images $X_{cv}$ related to the lesion. Section 3 provides a detailed description of this data augmentation process.

3. The next step is the Structure Segmentation Network. It segments each view of the lesion $X_{cv}$ into a set of eight global and local structures that have proven very important for dermatologists in their daily diagnosis. Examples of these structures are dots/globules, regression areas and streaks. Hence, the output of this network is a set of 8 segmentation maps $S_{cvs}$, $s = 1, \dots, 8$, each associated with a particular structure $s$ of interest. This module is introduced in Section 4.

4. Finally, the augmented set $\{X_{cv}, S_{cvs}\}$ is passed to the Diagnosis Network, which provides the final diagnosis $Y_c$ for the clinical case. This network is described in Section 5.

2 Lesion Segmentation Network

The Lesion Segmentation Network has been developed by training a Fully Convolutional Network (FCN) [Shelhamer et al., 2016]. FCNs have achieved state-of-the-art results in semantic segmentation of general-content images, as demonstrated on the PASCAL VOC segmentation task [Everingham et al., 2015]. To train a network for our particular task of lesion/skin segmentation, we used the training set of the lesion segmentation task in the 2017 ISBI challenge. Note that the goal of this module is not to generate very accurate segmentation maps of a lesion, but to broadly identify the area of the image that corresponds to the lesion, yielding a binary map $M_c$ for each clinical case.

Figure 2: Example of a rotated and cropped view of a lesion and its Normalized Polar Coordinates. (Left) View of the lesion. (Middle) Normalized radius. (Right) Angle.

3 Data Augmentation Module and Normalized Polar Coordinates

It is well known that data augmentation notably boosts the performance of deep neural networks, especially when the amount of training data is limited. Among all the potential image variations and artifacts, invariance to orientation is probably the main requirement of our method, as dermatologists do not follow a specific protocol when capturing a lesion. More complex geometric transformations, such as affine or projective transforms, are less relevant here because the dermatoscope is normally placed directly over the lesion and orthogonal to its surface.

The data augmentation process is as follows:

1. First, starting from the pair $\{X_c, M_c\}$, we generate a set of rotated versions.

2. Since rotating an image without losing visual information requires incorporating new areas that were not present in the original view, we find and crop the largest inner rectangle in which all pixels belong to the original image.

3. Finally, as our subsequent CNNs (Structure Segmentation and Diagnosis) require square input images of 256x256 pixels, we take several square crops, which are in turn resized to the required dimensions.

Considering the aforementioned rotations and crops, for each clinical case $c$ we generate an augmented set of 24 images, represented by a real-valued tensor $X_{cv}$, with $v = 1, \dots, 24$.

In addition, for each generated view $X_{cv}$, we compute the Normalized Polar Coordinates from the lesion mask. These alternative coordinates support subsequent processing blocks by providing invariance against shifts, rotations, changes in size and even irregular shapes of the lesions. To do so, we transform the Cartesian pixel coordinates $(x_i, y_i)$ into normalized polar coordinates $(\rho_i, \theta_i)$, where $\rho_i \in [0, 1]$ and $\theta_i \in [0, 2\pi)$ stand for the normalized radius and angle, respectively.
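As a rough illustration of these coordinates, the following sketch fits an ellipse to a binary mask through its second-order moments and maps it to the unit circle with an affine transform, which is the construction this section relies on. It is not the authors' code: the function name, the clipping of the radius to 1 outside the ellipse, and the use of an eigendecomposition are assumptions made for the example. The angular origin follows the ellipse axes returned by the eigendecomposition, so its sign and orientation are arbitrary.

```python
import numpy as np

def normalized_polar_coordinates(mask):
    """Return (rho, theta) maps for a binary lesion mask (hypothetical helper).

    rho is clipped to [0, 1] (an assumption); theta lies in [0, 2*pi).
    Assumes the mask is not degenerate (it must span two dimensions).
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=0).astype(float)      # 2 x N lesion pixels
    center = pts.mean(axis=1, keepdims=True)
    cov = np.cov(pts - center)                          # second-order moments

    # A uniformly filled ellipse with semi-axes (a, b) has variances
    # (a^2 / 4, b^2 / 4) along its axes, so the moment-matched ellipse
    # has semi-axes 2 * sqrt(eigenvalue).
    eigvals, eigvecs = np.linalg.eigh(cov)
    A = np.diag(1.0 / (2.0 * np.sqrt(eigvals))) @ eigvecs.T  # ellipse -> unit circle

    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    grid = np.stack([xx.ravel(), yy.ravel()], axis=0).astype(float)
    u, v = A @ (grid - center)

    rho = np.clip(np.hypot(u, v), 0.0, 1.0).reshape(h, w)
    theta = np.mod(np.arctan2(v, u), 2.0 * np.pi)
    theta[theta >= 2.0 * np.pi] = 0.0                   # guard against rounding up to 2*pi
    return rho, theta.reshape(h, w)

# Usage on a synthetic circular "lesion" of radius 20 pixels.
yy, xx = np.mgrid[0:64, 0:64]
mask = (xx - 32) ** 2 + (yy - 32) ** 2 <= 20 ** 2
rho, theta = normalized_polar_coordinates(mask)
```

Because the affine map is derived from the mask itself, a pixel halfway between the lesion center and its border gets a radius close to 0.5 regardless of the lesion's size, position or elongation.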

The transformation to normalized polar coordinates is computed as follows: first, the mask of the lesion is approximated by an ellipse with the same second-order moments. Then, we estimate the affine matrix that transforms the ellipse into a normalized (unit-radius) circle centered at (0, 0). Figure 2 shows an example of a rotated and cropped view of a lesion and its corresponding normalized polar coordinates.

4 Structure Segmentation Network

Given an input view of the lesion $X_{cv}$, the goal of this module is to provide a segmentation into a pre-defined set of textural patterns and local structures that are of special interest for dermatologists in their diagnosis. In particular, we have considered a set of eight structures: 1) dots, globules and cobblestone pattern, 2) reticular patterns and pigmented networks, 3) homogeneous areas, 4) regression areas, 5) blue-white veil, 6) streaks, 7) vascular structures and 8) unspecific patterns.

The main challenge in developing this module is the generation of a strongly-labeled training dataset, in which each image has an associated ground-truth pixel-wise segmentation. This kind of annotation is often hard to obtain, as it requires a huge effort from dermatologists to manually outline the segmentations. Alternatively, providing weak image-level labels that only indicate which structural patterns are present in each lesion is much easier for dermatologists and therefore more realistic. Hence, following this latter approach, we asked dermatologists of a collaborating medical institution, the Hospital Doce de Octubre in Madrid, to annotate the ISIC 2016 training dataset with the presence or absence of the 8 considered structures. In particular, we asked them to provide one label for each structure: 0 if the structure is not present, 1 if it is locally present, and 2 if it is present and large enough to be considered a global pattern in the lesion.
Given this weakly-annotated dataset, we have built our approach on the work of [Pathak et al., 2015], where the authors introduced a novel constrained optimization for weakly-labeled segmentation using CNNs. The output of this network is a reduced version of the input image (64x64 in our case) where, for each pixel location $i$, a softmax transforms the network outputs $f_i(x_i \mid \theta)$ for label $x_i$ into probabilities as follows:

$$p_i(x_i \mid \theta) = \frac{1}{Z_i} \exp\big(f_i(x_i \mid \theta)\big) \qquad (1)$$

where $\theta$ represents the parameters of the CNN, and $Z_i = \sum_{s=1}^{8} \exp\big(f_i(s \mid \theta)\big)$ is the partition function at location $i$.

The presence or absence of a class, as well as an estimate of its size in the image, leads to particular constraints on the probability $P_s = \sum_i p_i(s \mid \theta)$ accumulated over all pixel locations in the segmentation map:

- If a structure $s$ is not present in an image, the constraint acts as an upper bound on the accumulated probability $P_s$, which has to be nearly zero.
- If a structure $s$ is local in an image, we impose a lower and an upper bound on the accumulated probability $P_s$ to control the total area of the structure in the lesion.
- If a structure $s$ is global in an image, we impose a lower bound on the accumulated probability $P_s$ to ensure a minimum area corresponding to the structure.

In order to adapt this approach to our particular scenario, we have developed a set of modifications to the original approach, namely:

We observed that using a simple softmax function led to situations in which many constraints on local structures were satisfied by assigning some residual probability to every location in the segmentation map. From our point of view, this is an undesired behavior, as one would rather expect a small set of pixels showing large probabilities of belonging to the structure of interest. To overcome this limitation, we have used a parametric softmax:

$$p_i(x_i \mid \gamma, \theta) = \frac{1}{Z_i} \exp\big(f_i(\gamma x_i \mid \theta)\big)$$
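The effect of the parametric softmax can be checked numerically. The sketch below is only an illustration under stated assumptions: it reads $\gamma$ as a factor that scales the network outputs before exponentiation (the usual inverse-temperature interpretation), and it uses random numbers as a stand-in for the real network outputs. The function names are hypothetical; $\gamma = 20$ is the value used in this report.

```python
import numpy as np

def parametric_softmax(f, gamma=1.0):
    """Softmax over the last axis of gamma * f (numerically stabilized)."""
    z = gamma * np.asarray(f, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def accumulated_probability(p):
    """P_s: probability of each structure summed over all pixel locations."""
    return p.reshape(-1, p.shape[-1]).sum(axis=0)

rng = np.random.default_rng(0)
f = rng.normal(size=(64, 64, 8))          # random stand-in for the outputs f_i(s)

p_soft = parametric_softmax(f, gamma=1.0)     # plain softmax, Eq. (1)
p_sharp = parametric_softmax(f, gamma=20.0)   # sharpened version

P_soft = accumulated_probability(p_soft)
P_sharp = accumulated_probability(p_sharp)

# Mean of the largest per-pixel probability: much closer to 1 for gamma = 20.
peak_soft = p_soft.max(axis=-1).mean()
peak_sharp = p_sharp.max(axis=-1).mean()
```

With the larger $\gamma$, each location commits most of its probability mass to a single structure, so a bound on $P_s$ can no longer be met by spreading small residual probabilities over the whole map.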
The parameter $\gamma$ controls a soft approximation to the max function: large values lead to scenarios in which each location shows high probability for only a very small set of structures. In our case, we have used a value of $\gamma = 20$.

We added a new constraint that helps to learn structures that appear in particular spatial locations of the lesion: e.g., streaks tend to appear at the borders of a lesion. To that end, we accumulate the probabilities $P_s$ only over those locations that are likely to contain the intended structure. At this point, we have defined these areas of interest over the Normalized Polar Coordinates described in Section 3, which are more adequate than the original Cartesian coordinates.

We have implemented this module by taking the well-known VGG-VD network [Simonyan and Zisserman, 2014] (the same network used as initialization for the lesion segmentation module), removing the top layers, and training on the ISIC 2016 dataset with the described constrained optimization with weak annotations [Pathak et al., 2015]. The output of this module is, for each view $v$ of a clinical case $c$, a real-valued tensor $S_{cv}$ that contains the 8 probability maps of the considered structures.

Figure 3: Processing pipeline of the Diagnosis Network

5 Diagnosis Network

The Diagnosis Network gathers the information from the previous modules in order to generate a diagnosis for each clinical case. As in the previous modules, our approach takes a well-known CNN as its starting point and modifies the top layers to better adapt it to our problem. The network chosen as the basis is ResNet-50 [He et al., 2015], which uses residual layers to avoid the degradation problem that arises when more and more layers are stacked in a network.

When applied to our 256x256 images, the last convolutional block (conv_5x) of this network produces a real-valued tensor $T_c$, which hopefully behaves as a detector of high-level concepts (objects in ImageNet, the dataset for which the network was originally designed). In the original work, an average pooling layer transformed this tensor into a single value per channel and image, $T_s$, which was followed by a fully connected layer and a softmax to generate the final probabilities of the image containing each of the classes being detected. Hence, the goal of the average pooling was to fuse detections at various locations of the input image and generate a unified score for each high-level concept.
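For reference, the original head described above (average pooling, a fully connected layer and a softmax) can be sketched as follows. The shapes are assumptions made for the example: an 8x8 spatial grid (256 divided by the total stride of 32 of ResNet-50), 2048 channels, and 3 output classes for nevus, melanoma and seborrheic keratosis. The weights are random placeholders, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 2048, 3                 # assumed grid size, channels, classes

T = rng.normal(size=(H, W, C))             # random stand-in for the conv_5x output
W_fc = rng.normal(scale=0.01, size=(C, K)) # hypothetical fully connected weights
b_fc = np.zeros(K)

pooled = T.mean(axis=(0, 1))               # global average pooling -> shape (C,)
logits = pooled @ W_fc + b_fc              # fully connected layer  -> shape (K,)

probs = np.exp(logits - logits.max())      # numerically stabilized softmax
probs /= probs.sum()
```

The average over the spatial axes discards where in the image each channel responded, which is the information that a pooling defined over lesion-centered regions can retain.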
In our approach, however, we have modified the top layers of the network, giving rise to the structure presented in Figure 3. We basically split the top fully connected layer that provides the lesion diagnosis into three arms:

a) the original arm, with an average pooling followed by a fully connected layer (FC1);
b) a second arm that performs a normalized polar pooling (3x6, rings by angles) followed by a fully connected layer (FC2);
c) a third arm that estimates the asymmetry of the lesion based on the previous polar pooling and then applies a fully connected layer (FC3).

The results of the three arms are then linearly combined using a Sum block. We next describe the novel blocks that this new structure requires and that have been specifically developed in this work:

1. Modulation block: The goal of this block is to take advantage of the previous segmentations of the lesion into global and local structures, which are of great interest for dermatologists in their daily diagnosis. To do so, this block fuses the structure segmentation maps $S_{cv}$ with the filter outputs of the conv_5x layer in ResNet-50. In particular, we modulate the outputs of the layer (2048 channels in our case) using the probabilities of the 8 local and global structures described in Section 4. By concatenating the resulting modulations with the original set of outputs, we finally generate a set of channels that is 9 times the original one (18432 in our case).

2. Polar Pooling: This block performs pooling operations over data (average or max pooling) but, rather than using rectangular spatial regions, it employs sectors defined in polar coordinates. Hence, this block is defined for a given number of radial rings $R$ (radius ranging from 0 to 1) and angular sectors $A$ (angles ranging between 0 and $2\pi$), producing an output of size $R \times A$ per channel. Furthermore, in order to adapt to the irregular shapes of the lesions, we use the normalized polar coordinates described in Section 3. Since, depending on the shape of the lesion and the size of the tensor being pooled, some combinations $(r, a)$ may not contain any pixel of the image, we can also define overlaps between adjacent rings and angles to regularize the outputs. In addition, the division of the lesion into rings is non-uniform and ensures that every ring contains the same number of pixels for a perfectly circular lesion.

3. Asymmetry: This block computes metrics that evaluate the asymmetry of a lesion for a given angle. In particular, given a polar division of the lesion into $R \times A$ sectors, we compute the asymmetry for $A/2$ angles by folding the lesion over each angle and computing the accumulated absolute difference between corresponding sectors.

As shown in Figure 3, we combine these modules to generate a final output $Y_{cv}$ for each considered view of a clinical case. Finally, in order to generate a final output $Y_c$ for each clinical case, we assume independence between views, which leads to the factorization:

$$Y_c = \prod_{v=1}^{V} Y_{cv} \qquad (2)$$

It is also worth noting that our final submission also incorporated into the factorization an extra classifier that depends only on external information about the clinical case, such as patient gender and age, and lesion area.

6 Code

The code that implements this paper, as well as the Lesion Segmentation and Diagnosis Networks, is provided at the following link:

Acknowledgments

We kindly thank the dermatologists of Hospital 12 de Octubre of Madrid for their invaluable help in annotating the data with the weak labels of structural patterns. This work was supported in part by the National Grant TEC P and National Grant TEC EXP of the Spanish Ministry of Economy and Competitiveness.
In addition, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.

References

M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136, Jan. 2015.

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, 2015.

D. Pathak, P. Krähenbühl, and T. Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In ICCV, 2015.

E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. CoRR, 2016.

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, 2014.

Skin Lesion Classification and Segmentation for Imbalanced Classes using Deep Learning

Skin Lesion Classification and Segmentation for Imbalanced Classes using Deep Learning Skin Lesion Classification and Segmentation for Imbalanced Classes using Deep Learning Mohammed K. Amro, Baljit Singh, and Avez Rizvi,, Abstract - This

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China Abstract. We propose

More information



More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab.

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

Final Report: Smart Trash Net: Waste Localization and Classification

Final Report: Smart Trash Net: Waste Localization and Classification Final Report: Smart Trash Net: Waste Localization and Classification Oluwasanya Awe Robel Mengistu December 15, 2017 Vikram Sreedhar Abstract Given

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. 2 Traditional Object Detection Algorithms Region-based

More information

Gradient of the lower bound

Gradient of the lower bound Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation Objective Given a training set that comprises image and image-level

More information

A new interface for manual segmentation of dermoscopic images

A new interface for manual segmentation of dermoscopic images A new interface for manual segmentation of dermoscopic images P.M. Ferreira, T. Mendonça, P. Rocha Faculdade de Engenharia, Faculdade de Ciências, Universidade do Porto, Portugal J. Rozeira Hospital Pedro

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information


REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:

More information

A Novel Representation and Pipeline for Object Detection

A Novel Representation and Pipeline for Object Detection A Novel Representation and Pipeline for Object Detection Vishakh Hegde Stanford University Manik Dhar Stanford University Abstract Object detection is an important

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Skin Lesion Attribute Detection for ISIC Using Mask-RCNN

Skin Lesion Attribute Detection for ISIC Using Mask-RCNN Skin Lesion Attribute Detection for ISIC 2018 Using Mask-RCNN Asmaa Aljuhani and Abhishek Kumar Department of Computer Science, Ohio State University, Columbus, USA E-mail:;

More information

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection Zeming Li, 1 Yilun Chen, 2 Gang Yu, 2 Yangdong

More information

Paper Motivation. Fixed geometric structures of CNN models. CNNs are inherently limited to model geometric transformations

Paper Motivation. Fixed geometric structures of CNN models. CNNs are inherently limited to model geometric transformations Paper Motivation Fixed geometric structures of CNN models CNNs are inherently limited to model geometric transformations Higher-level features combine lower-level features at fixed positions as a weighted

More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification

More information

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Medical images, segmentation and analysis

Medical images, segmentation and analysis Medical images, segmentation and analysis ImageLab group Università degli Studi di Modena e Reggio Emilia Medical Images Macroscopic Dermoscopic ELM enhance the features of

More information

Fuzzy Set Theory in Computer Vision: Example 3

Fuzzy Set Theory in Computer Vision: Example 3 Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures

More information

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization

More information

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection on Self-Driving Cars in China. Lingyun Li Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen Announcements Seminar registration period starts

More information

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell Constrained Convolutional Neural Networks for Weakly Supervised Segmentation Deepak Pathak, Philipp Krähenbühl and Trevor Darrell 1 Multi-class Image Segmentation Assign a class label to each pixel in

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen)

More information

Joint Object Detection and Viewpoint Estimation using CNN features

Joint Object Detection and Viewpoint Estimation using CNN features Joint Object Detection and Viewpoint Estimation using CNN features Carlos Guindel, David Martín and José M. Armingol Intelligent Systems Laboratory Universidad Carlos III de Madrid

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 Mask R-CNN Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 1 Common computer vision tasks Image Classification: one label is generated for

More information


PARTIAL STYLE TRANSFER USING WEAKLY SUPERVISED SEMANTIC SEGMENTATION. Shin Matsuo Wataru Shimoda Keiji Yanai PARTIAL STYLE TRANSFER USING WEAKLY SUPERVISED SEMANTIC SEGMENTATION Shin Matsuo Wataru Shimoda Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1-5-1 Chofugaoka,

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

arxiv: v1 [] 29 Nov 2017

arxiv: v1 [] 29 Nov 2017 Detection-aided liver lesion segmentation using deep learning arxiv:1711.11069v1 [] 29 Nov 2017 Míriam Bellver, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Xavier Giró-i-Nieto, Jordi Torres,, Luc Van

More information

Data Augmentation for Skin Lesion Analysis

Data Augmentation for Skin Lesion Analysis Data Augmentation for Skin Lesion Analysis Fábio Perez 1, Cristina Vasconcelos 2, Sandra Avila 3, and Eduardo Valle 1 1 RECOD Lab, DCA, FEEC, University of Campinas (Unicamp), Brazil 2 Computer Science

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Finding Tiny Faces Supplementary Materials

Finding Tiny Faces Supplementary Materials Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva} 1. Error analysis Quantitative analysis We plot the distribution

More information

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications Le Thanh Nguyen-Meidine 1, Eric Granger 1, Madhu Kiran 1 and Louis-Antoine Blais-Morin 2 1 École de technologie

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding Peking University, Beijing, China Aoxue Li

More information

Semantic Segmentation
