Photo OCR ( )

Similar documents
Deep Neural Networks for Scene Text Reading

Efficient Segmentation-Aided Text Detection For Intelligent Robots

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Yiqi Yan. May 10, 2017

Feature Fusion for Scene Text Detection

TextBoxes++: A Single-Shot Oriented Scene Text Detector

arxiv: v2 [cs.cv] 27 Feb 2018

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Lecture 5: Object Detection

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Rotation-sensitive Regression for Oriented Scene Text Detection

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Lecture 7: Semantic Segmentation

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

arxiv: v1 [cs.cv] 6 Dec 2017

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping

arxiv: v1 [cs.cv] 1 Sep 2017

arxiv: v1 [cs.cv] 25 Mar 2019

Feature-Fused SSD: Fast Detection for Small Objects

Object detection with CNNs

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

FOTS: Fast Oriented Text Spotting with a Unified Network

Object Detection Based on Deep Learning

Modern Convolutional Object Detectors

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

Xiaowei Hu* Lei Zhu* Chi-Wing Fu Jing Qin Pheng-Ann Heng

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

Fully Convolutional Networks for Semantic Segmentation

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

An End-to-End TextSpotter with Explicit Alignment and Attention

Cascade Region Regression for Robust Object Detection

R-FCN: Object Detection with Really - Friggin Convolutional Networks

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

arxiv: v1 [cs.cv] 2 Jan 2019

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

From 3D descriptors to monocular 6D pose: what have we learned?

Deep Direct Regression for Multi-Oriented Scene Text Detection

Final Report: Smart Trash Net: Waste Localization and Classification

Human Pose Estimation with Deep Learning. Wei Yang

SocialML: machine learning for social media video creators

arxiv: v1 [cs.cv] 1 Dec 2018

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

arxiv: v1 [cs.cv] 31 Mar 2016

Real-time Object Detection CS 229 Course Project

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

TEXTS in scenes contain high level semantic information

Spatial Localization and Detection. Lecture 8-1

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

WeText: Scene Text Detection under Weak Supervision

Simultaneous Recognition of Horizontal and Vertical Text in Natural Images

arxiv: v1 [cs.cv] 4 Jan 2018

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

END-TO-END CHINESE TEXT RECOGNITION

RON: Reverse Connection with Objectness Prior Networks for Object Detection

YOLO 9000 TAEWAN KIM

Simultaneous Recognition of Horizontal and Vertical Text in Natural Images

arxiv: v1 [cs.cv] 28 Mar 2019

A pooling based scene text proposal technique for scene text reading in the wild

Semantic Segmentation

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Creating Affordable and Reliable Autonomous Vehicle Systems

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Unified, real-time object detection

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Segmentation Framework for Multi-Oriented Text Detection and Recognition

Learning to detect, localize and recognize many text objects in document images from few examples

arxiv: v1 [cs.cv] 13 Jul 2017

YOLO9000: Better, Faster, Stronger

Automatic detection of books based on Faster R-CNN

Exploiting noisy web data for largescale visual recognition

arxiv: v1 [cs.cv] 12 Sep 2016

Classification of objects from Video Data (Group 30)

arxiv: v1 [cs.cv] 9 Aug 2017

Improving Face Recognition by Exploring Local Features with Visual Attention

Yield Estimation using faster R-CNN

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Intensity-Depth Face Alignment Using Cascade Shape Regression

arxiv: v1 [cs.cv] 16 Nov 2015

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

with Deep Learning A Review of Person Re-identification Xi Li College of Computer Science, Zhejiang University

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Geometry-aware Traffic Flow Analysis by Detection and Tracking

Computer Vision Lecture 16

arxiv: v2 [cs.cv] 23 Jan 2019

Internet of things that video

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Computer Vision Lecture 16

Transcription:

Photo OCR (2017-2018) Xiang Bai Huazhong University of Science and Technology

Outline VALSE2018, DaLian Xiang Bai 2

Deep Direct Regression for Multi-Oriented Scene Text Detection [He et al., ICCV, 2017.] Indirect regression Direct regression Architecture Multi-level feature fusion Up-sample to quarter size of the input image Multi-task learning for classification and regression Post Processing: Refined NMS VALSE2018, DaLian Xiang Bai 3

Self-Organized Text Detection With Minimal Post-Processing via Border Learning [Wu et al., ICCV, 2017.] Multi-scale FCN 3 classes: text, no text, and border VALSE2018, DaLian Xiang Bai 4

Single Shot Text Detector with Regional Attention [He et al., ICCV, 2017.] SSD [1] backbone Using text mask as attention information [1] W. Liu, et al. SSD: single shot multibox detector. ECCV, 2016 VALSE2018, DaLian Xiang Bai 5

WordSup: Exploiting Word Annotations for Character based Text Detection [Hu et al., ICCV, 2017.] A weakly supervised framework that can utilize word annotations for character detector training (inspired by [1]) Text structure analysis to group the characters [1] J. Dai, et al. Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. ICCV, 2015 VALSE2018, DaLian Xiang Bai 6

TextBoxes++: A Single-Shot Oriented Scene Text Detector [Liao et al., TIP, 2018.] TextBoxes++ is an extension of TextBoxes [1] SSD backbone From horizontal text detector (TextBoxes) to oriented text detector Regress four vertexes of a quadrilateral Better combination with recognition [1] M. Liao et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI, 2017 VALSE2018, DaLian Xiang Bai 7

Rotation-sensitive Regression for Oriented Scene Text Detection [Liao et al., CVPR, 2018.] Shared feature for cls. & reg. 0.42 Input image Rotation-sensitive feature for reg. Predicted bounding box and score 0.87 Rotation-invariant feature for cls. VALSE2018, DaLian Xiang Bai 8

Rotation-sensitive Regression for Oriented Scene Text Detection [Liao et al., CVPR, 2018.] Rotation-sensitive feature maps for regression using oriented response convolution [1] Rotation-invariant features for classification pooling all rotation-sensitive feature maps [1] Y, Zhou, et al. Oriented response networks. CVPR, 2017 VALSE2018, DaLian Xiang Bai 9

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation [Lyu et al., CVPR 2018] A compound text detection method: corner localization and region segmentation. Proposal generation via sampling and grouping corner points. Scoring proposals with position sensitive segmentation maps. VALSE2018, DaLian Xiang Bai 10

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation [Lyu et al., CVPR 2018] Corner localization: corner detection with DSSD[1]. Region segmentation: position-sensitive segmentation [1] C. Fu, et al. DSSD: Deconvolutional Single Shot Detector. Arxiv, 2017 VALSE2018, DaLian Xiang Bai 11

Outline VALSE2018, DaLian Xiang Bai 12

Focusing Attention: Towards Accurate Text Recognition in Natural Images [Cheng et al., ICCV, 2017.] Attention-based text recognition, like [1]. Focusing Network to handle the attention drift. [1] B. Shi, et al. Robust Scene Text Recognition with Automatic Rectification. CVPR, 2016 VALSE2018, DaLian Xiang Bai 13

AON: Towards Arbitrarily-Oriented Text Recognition [Cheng et al., CVPR, 2018.] Each pixel is represented by four weighted sequences of features. Attention-based text recognition. VALSE2018, DaLian Xiang Bai 14

Outline VALSE2018, DaLian Xiang Bai 15

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks [Li et al., ICCV 2017] An end-to-end trainable network for end-to-end scene text recognition. A RPN subnet is used to detect scene text, an attention based network is used to recognize the detected text. Designed for horizontal scene text. VALSE2018, DaLian Xiang Bai 16

Deep TextSpotter: An End-To-End Trainable Scene Text Localization and Recognition Framework [Michal et al., ICCV 2017] An end-to-end trainable network which consists of a RPN for text proposal generation and a CRNN[1] for region transcription. Having the ability to detect and recognize horizontal and multioriented scene text. [1] B. Shi, et al. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. TPAMI, 2017 VALSE2018, DaLian Xiang Bai 17

FOTS: Fast Oriented Text Spotting with a Unified Network [Liu et al., CVPR 2018] Using EAST[1] as text detector. Using CRNN as text recognizer. Rotating rotated proposal to horizontal proposal with RoIRotate. End-to-end training and evaluation. [1] X. Zhou, et al. EAST: An Efficient and Accurate Scene Text Detector. CVPR, 2017 VALSE2018, DaLian Xiang Bai 18

Outline VALSE2018, DaLian Xiang Bai 19

RCTW-17 dataset Chinese Text in the Wild(12,034 images, 8034 images for training and 4000 images for testing) The text annotated in RCTW-17 consists of Chinese characters, digits, and English characters, with Chinese characters taking the largest portion. Link: http://mclab.eic.hust.edu.cn/icdar2017chinese/ State of the art: Detection 0.703 f-measure; Recognition 0.37 normalized edit distance VALSE2018, DaLian Xiang Bai 20

MLT: Multi-lingual scene text detection and script identification Multi-lingual text: 18,000 images, 9 different languages representing 6 different scripts Tasks: Text Detection, Script identification, Joint text detection and script identification Link: http://rrc.cvc.uab.es/?ch=8 State of the art: Detection: AP=0.643, Hmean=0.735; Script identification: 88.1%; Joint detection and script identification VALSE2018, DaLian Xiang Bai 21

SCUT-CTW1500 Curved text in the wild 1000 train images, 500 test images. Each contains at least one instance of curved text Link: https://github.com/yuliang-liu/curve-text-detector State of the art: Hmean 0.734 VALSE2018, DaLian Xiang Bai 22

Total-Text Horizontal, multi-oriented, and curved text 1555 images Link: https://github.com/cs-chan/total-text-dataset State of the art: Hmean 0.36 VALSE2018, DaLian Xiang Bai 23

Thank you!