Learning to Segment Object Candidates

Similar documents
Recurrent Convolutional Neural Networks for Scene Labeling

arxiv: v1 [cs.cv] 14 Jun 2016

Multi-View 3D Object Detection Network for Autonomous Driving

Learning to Segment Object Candidates

Object Proposal Generation with Fully Convolutional Networks

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Deep Learning for Object detection & localization

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou

Fully Convolutional Networks for Semantic Segmentation

Lecture 7: Semantic Segmentation

Object Detection on Self-Driving Cars in China. Lingyun Li

arxiv: v2 [cs.cv] 26 Jul 2016

Toward Scale-Invariance and Position-Sensitive Region Proposal Networks

Spatial Localization and Detection. Lecture 8-1

Object detection with CNNs

Rich feature hierarchies for accurate object detection and semant

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Yield Estimation using faster R-CNN

Deep Learning with Tensorflow AlexNet

CAP 6412 Advanced Computer Vision

OBJECT DETECTION HYUNG IL KOO

arxiv: v1 [cs.cv] 15 Oct 2018

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DeepProposals: Hunting Objects and Actions by Cascading Deep Convolutional Layers

Convolutional Networks in Scene Labelling

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Object Detection Based on Deep Learning

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning

arxiv: v2 [cs.cv] 3 Sep 2018

Interactive Hierarchical Object Proposals

Object Detection. Part1. Presenter: Dae-Yong

Crafting GBD-Net for Object Detection

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Rich feature hierarchies for accurate object detection and semantic segmentation

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Project 3 Q&A. Jonathan Krause

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

YOLO 9000 TAEWAN KIM

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

Optimizing Object Detection:

Lecture 5: Object Detection

WITH the recent advance [1] of deep convolutional neural

Yiqi Yan. May 10, 2017

What makes for effective detection proposals?

DeepBox: Learning Objectness with Convolutional Networks

Lecture 37: ConvNets (Cont d) and Training

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

YOLO9000: Better, Faster, Stronger

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning for Computer Vision II

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

RSRN: Rich Side-output Residual Network for Medial Axis Detection

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

A MultiPath Network for Object Detection

Skin Lesion Classification and Segmentation for Imbalanced Classes using Deep Learning

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

arxiv: v2 [cs.cv] 18 Jul 2017

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Joseph Camilo 1, Rui Wang 1, Leslie M. Collins 1, Senior Member, IEEE, Kyle Bradbury 2, Member, IEEE, and Jordan M. Malof 1,Member, IEEE

Convolutional Neural Networks

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.

Hand Detection For Grab-and-Go Groceries

Advanced Video Analysis & Imaging

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

SINGLE image super-resolution (SR) aims to reconstruct

Martian lava field, NASA, Wikipedia

Dense Image Labeling Using Deep Convolutional Neural Networks

arxiv: v2 [cs.cv] 10 Apr 2017

Fuzzy Set Theory in Computer Vision: Example 3

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

arxiv: v1 [cs.cv] 31 Mar 2016

SINGLE image super-resolution (SR) aims to reconstruct

Final Report: Smart Trash Net: Waste Localization and Classification

arxiv: v1 [cs.cv] 5 Oct 2015

Boundary-aware Instance Segmentation

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

SEMANTIC SEGMENTATION AVIRAM BAR HAIM & IRIS TAL

Deconvolutions in Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks

Using Machine Learning for Classification of Cancer Cells

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Finding Tiny Faces Supplementary Materials

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Fully Convolutional Network for Depth Estimation and Semantic Segmentation

Conditional Random Fields as Recurrent Neural Networks

Perceptron: This is convolution!

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Proposal-free Network for Instance-level Object Segmentation

Transcription:

Learning to Segment Object Candidates Pedro Pinheiro, Ronan Collobert and Piotr Dollar Presented by - Sivaraman, Kalpathy Sitaraman, M.S. in Computer Science, University of Virginia

Facebook Artificial Intelligence Research (FAIR) FAIR s AI trinity DeepMask captures high-level understanding of the general object shape. SharpMask accurately places the boundaries by going back all the way down to the pixels. DeepMask is not very selective and can generate masks for image regions that are not especially interesting - MultiPathNet allows information to flow along multiple paths through the net, enabling it to exploit information at multiple image scales and in surrounding image context.

Object Proposals First Pre-processing stage High Recall (As many objects, in the image) High Recall with minimum # of proposals Proposals should be accurate with ground truth

Principal idea Given an image patch: 1. Generate a segmentation mask 2. Likelihood of patch being centered on a full object

A very large number of binary classification problems. First, for every (overlapping) patch in an image: Does this patch contain an object? Second, if the answer to the first question is yes for a given patch, then for every pixel in the patch: Is that pixel part of the central object in the patch? Use deep networks to answer each yes/no question. Networks (and computation) shared for every patch and every pixel to segment all the objects in an image. back

Related Work AlexNet, GoogLeNet, VGG, R-CNN Edge Boxes => By P. Dollar (when at MSR) Closely related to MultiBox - generates bounding box object proposals; however, less informative

MultiBox Too many proposals for the same object.

What we can see. For us, segmenting objects is very intuitive, given our natural visual training and intelligence What the machine sees. They see only pixels, numbers, DeepMask taps right into the pixels to understand objectness, bypassing low-level features like edges or super pixels.

DeepMask Network

Predicts Segmentation Mask for a given input patch, assigns score of the patch s objectness Achieved using a single ConvNet Last layers are task-specific Tasks trained-jointly. Reduces capacity. Increases the speed of full-scene inference at test time.

For each training sample: i) RGB input patch (x) ii) Binary Mask (m) : +1 or -1 iii)patch objectness (y) : +1 or -1 y = 1; if patch contains object (roughly in the centre) AND object is fully contained in the given scale range y = -1; if not; partial presence not graced either. Mask not considered.

Down-sample the image from 224x224 to 14x14 Segmentation: Single 1x1 conv layer + ReLU => Pixel Classification Layer Each pixel is a classifier

Locally connected (partial view) pixel classifiers vs Fully Connected (redundant parameters) pixel classifiers - both have drawbacks. Instead, decompose the classification layer to two linear layers with no nonlinearity in between. Output up-sampled to original size.

Joint Learning Jointly infer the pixel-wise segmentation mask and object score. Loss function used: Binary Logistic Regression Binary logistic regression estimates the probability that a characteristic is present (e.g. estimate probability of "success") given the values of explanatory variables. Segmentation reduced to a yes/no problem. [ slide 7]

if yk == -1: Ignore the segmentation mask Segmentation objectness f ij segm (xk) - Prediction of Segmentation network at location (i,j) fscore(xk) - Predicted Objectness score lambda - Score output (= 1/32)

Full Scene Inference Applied densely at multiple locations & scales Each object has at >=1 patch where it is central

Implementation details Learning rate = 0.001 SGD, batch size = 32, momentum = 0.9, decay = 0.00005 Pre-tained VGG features Weights randomly initialized 5 days to train on Tesla K40m

Training Annotations from MS COCO Training patches Creating Canonical examples Jitters, Translations, Flips, upscaled & downscaled Black box is annotated one. Green ones are canonical positives

Take very canonical positive, and its corresponding mask for each object in the image for training. Flip it horizontally to augments training data Notice, that only the mask of a single is trained as the output, not the entire bounding box

Results

DeepMask & its Variants DeepMaskZoom (DMZ) => Smaller scale; longer inference time DeepMask20 => Trained with objects belonging to only PASCAL DeepMask20* => same as above, uses DeepMask s scoring network DeepMaskFull => Each 56 x 56 connected to CNN

Experiments Compared against Edge Boxes, SelectiveSearch, GeoDesic, Rigor, MCG. DeepMask + R-CNN => Tops object detection, best Average Recall High IoU threshold: other models do better. Downsampling the output at every scale likely the reason. Can be resolved using skip connections or multiscale approach

Average recall versus number of box and segmentation proposals on various datasets. AR versus number of proposals for different object scales on segmentation proposals in COCO.

Recall vs IoU threshold on COCO

Results on the MS COCO dataset for both bounding box and segmentation proposals

Highlights Class-agnostic segmentation plus training on only positive examples (would generate a patch even if not a known object) - generalizes to unseen data as well. No usage of low-level features(like edge, superpixels) Does not generate BB Proposals, but Segmentation proposals. Can generate BB by finding the closing box around the segmentation. More accurate image classification, in the future. Learn higher level representations about an image.

The code is Open Source!!

Sources https://research.facebook.com/blog/learning-tosegment/ https://onlinecourses.science.psu.edu/stat504/ node/150 http://mscoco.org/dataset/#download Author s images, from respective personal websites