Efficient Segmentation-Aided Text Detection For Intelligent Robots

Transcription:

Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California

Outline: Problem Definition and Motivation; Related Work (detection-based approaches; casting the detection problem as a semantic segmentation problem); Segmentation-aided Text Detection Methodologies; Experimental Results; Conclusions and Future Work.

Problem Definition. Text detection: given an image, produce word-level bounding boxes. [Figure: original image, visualization of the ground truth (G.T.), and the G.T. text file.]

Motivation: advanced driver-assistance systems (ADAS) and robot vision.

Motivation: visual translation. The current app requires the user to align the text manually. Image credit: Google Translate.

Extending the generic object detectors: Faster R-CNN. References: Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.

Extending the object detectors. These methods predict text boxes directly by sliding a window over the convolutional features. Predictions are based on local image regions and lack sufficient contextual cues. In more challenging environments, they are not robust against text-like patterns such as fences, brick walls, windows, and leaves. Tedious post-processing is usually needed to remove false positives.

Casting the detection problem as a semantic segmentation problem. Reference: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Segmentation Networks. FCNs usually have large receptive fields. Advantage: more context information is considered, so the predictions are robust to text-like patterns. Disadvantage: fine-scale detection is hard, i.e. the network cannot separate individual lines and words. Recall that the desired output is word-level bounding boxes.

Problems of Segmentation-based Approaches. To produce fine-scale detection results, one has to use either a cascaded system that processes each ROI separately (not efficient) or extra annotations, e.g. word centers or character centers (too costly). Why not combine detection (DET) and segmentation (SEG)?

Essential Intuition. The detection network is error-prone on text-like patterns but good at predicting accurate bounding boxes for each individual word, so it wastes effort on the wrong regions. The segmentation network is robust to cluttered backgrounds but unable to separate individual words, so it is suitable for ROI finding. Let's use the segmenter output to guide the detector, so that it pays attention to the correct regions.

Text Attention Map (TAM). A TAM is a heat map that indicates the probability that text is present at each location. A TAM can be obtained by training an FCN with text region masks. [Figure: input image with ground-truth bounding boxes, and the corresponding TAM; red means a higher confidence score.]
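
The text region mask used as the FCN training target can be derived from the word-level ground-truth boxes. The following is a minimal sketch of that rasterization step, not the authors' code: the function name `boxes_to_text_mask` and the box convention (pixel coordinates with exclusive maximum edges) are assumptions made for illustration.

```python
def boxes_to_text_mask(height, width, boxes):
    """Rasterize word-level boxes into a binary text-region mask.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixels, with the
    max edges exclusive (an assumed convention, not taken from the slides).
    Returns a height x width list of lists holding 0 (background) or 1 (text).
    """
    mask = [[0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip each box to the image so out-of-range boxes are handled safely.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 1
    return mask


# One word box covering columns 1..3 and rows 1..2 of a 4 x 6 image.
mask = boxes_to_text_mask(4, 6, [(1, 1, 4, 3)])
for row in mask:
    print(row)
```

A segmentation network trained against such masks outputs a per-pixel text probability, which is what the slide calls the TAM.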

How to obtain a good TAM for an input image? Use an Atrous Spatial Pyramid Pooling (ASPP) layer. ASPP produces multi-scale representations by combining feature responses from parallel atrous convolution layers with different sampling rates. A larger dilation rate gives a more global view. Reference: Chen, Liang-Chieh, et al. "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs." arXiv preprint arXiv:1606.00915 (2016).
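
To make the ASPP idea concrete, here is a one-dimensional sketch in plain Python. It is a simplification under stated assumptions: real ASPP operates on 2-D feature maps with learned kernels, the branches here share a single fixed kernel, and the branch outputs are fused by summation. The names `atrous_conv1d`, `aspp1d`, and `effective_kernel_size` are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding, same-length output.

    Assumes an odd-length kernel so that it can be centred on each sample.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # kernel taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # samples outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp1d(signal, kernel, rates):
    """Run one atrous branch per rate and sum the branch responses."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(values) for values in zip(*branches)]


def effective_kernel_size(kernel_size, rate):
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


impulse = [0, 0, 0, 1, 0, 0, 0]
fused = aspp1d(impulse, [1, 1, 1], rates=[1, 2])
print(fused)
print([effective_kernel_size(3, r) for r in (1, 2, 6)])
```

The span formula shows why a larger rate gives a wider view at no extra parameter cost: a 3-tap kernel covers 3 samples at rate 1 but 13 samples at rate 6.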

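How to use the TAM to guide the detector? Multiplicative gating. [Figure: in the detector without text attention gating, the conv features feed the classification (cls) and bounding-box regression (BBox Reg.) heads directly; in the detector with text attention gating, the conv features are multiplied by the TAM to give text attention-aware feature maps, which then feed the cls and BBox Reg. heads.]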
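
Multiplicative gating can be sketched as an element-wise product between the TAM and every channel of the feature map. This is an illustrative sketch only: the function name `gate_features` is hypothetical, and it assumes the TAM has already been resized to the spatial resolution of the feature map, a step the slides do not describe.

```python
def gate_features(features, tam):
    """Multiply every channel of a C x H x W feature map by an H x W TAM.

    features: list of C channels, each a list of H rows of W floats.
    tam: list of H rows of W text probabilities in [0, 1] (assumed to match
    the feature map resolution).
    """
    gated = []
    for channel in features:
        gated.append([
            [value * weight for value, weight in zip(feat_row, tam_row)]
            for feat_row, tam_row in zip(channel, tam)
        ])
    return gated


features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[2.0, 0.5], [2.0, 6.0]],  # channel 1
]
tam = [[1.0, 0.0], [0.5, 0.0]]  # left column looks like text, right column does not
gated = gate_features(features, tam)
print(gated)
```

Locations where the TAM is near zero are suppressed in every channel, so the detection heads receive little signal from regions the segmenter considers background.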

Text Attention Gating

Which detector to use? Single Shot MultiBox Detector (SSD): the most efficient detector to date. Reference: Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer, Cham, 2016.

The Overall Architecture

Experimental Results. Significant improvement over the baseline models. State-of-the-art performance on the COCO-Text dataset, which is the most challenging text detection dataset to date. References: [9] Liao, Minghui, et al. "TextBoxes: A Fast Text Detector with a Single Deep Neural Network." AAAI. 2017. [12] Yao, Cong, et al. "Scene text detection via holistic, multi-channel prediction." arXiv preprint arXiv:1606.09002 (2016).

Conclusions and Future Work. Main takeaways: the segmenter has a more global view; the detector is better at localizing accurate bounding boxes; the detector can be guided by the segmentation heat map (TAM) via multiplicative gating on the feature maps. Future work: extend the proposed method to other detection tasks, e.g. pedestrian detection; optimize the proposed method for embedded systems.

Thank you for your attention! Any questions? Paper #1469: Efficient Segmentation-Aided Text Detection for Intelligent Robots. Junting Zhang*, Yuewei Na, Siyang Li, and C.-C. Jay Kuo. *Email: juntingz@usc.edu