DEEP STRUCTURED OUTPUT LEARNING FOR UNCONSTRAINED TEXT RECOGNITION


1 DEEP STRUCTURED OUTPUT LEARNING FOR UNCONSTRAINED TEXT RECOGNITION
Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford, UK

2 TEXT RECOGNITION
Localized text image as input, character string as output.
Examples: COSTA, DENIM, DISTRIBUTED, FOCAL

3 TEXT RECOGNITION
State of the art: constrained text recognition
- word classification [Jaderberg, NIPS DLW 2014]
- static N-gram and word language model [Bissacco, ICCV 2013]
Example: APARTMENTS

4 TEXT RECOGNITION
State of the art: constrained text recognition
- word classification [Jaderberg, NIPS DLW 2014]
- static N-gram and word language model [Bissacco, ICCV 2013]
Random string? New, unmodeled word?

5 TEXT RECOGNITION
Unconstrained text recognition
- e.g. for house numbers [Goodfellow, ICLR 2014], business names, phone numbers, s, etc.
Random string: RGQGAN323
New, unmodeled word: TWERK

6 OVERVIEW
- Two models for text recognition [Jaderberg, NIPS DLW 2014]
  - Character Sequence Model
  - Bag-of-N-grams Model
- Joint formulation
  - CRF to construct graph
  - Structured output loss
  - Use back-propagation for joint optimization
- Experiments
  - Generalize to perform zero-shot recognition
  - When constrained, recover performance

7 CHARACTER SEQUENCE MODEL
Deep CNN to encode image. Per-character decoder.
- x convolutional layers, 2 FC layers, ReLU, max-pooling
- 23 output classifiers for 37 classes (0-9, a-z, null)
- Fixed 32x100 input size distorts aspect ratio

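8 CHARACTER SEQUENCE MODEL
Deep CNN to encode image. Per-character decoder.
[Figure: input image x is fed to the CHAR CNN, which outputs one distribution per character position, P(c_1 | x) for char 1 through P(c_23 | x) for char 23, each over the classes 0-9, a-z and Ø (null)]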
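
The per-character encoding described on slides 7 and 8 can be illustrated with a small sketch. It assumes the 37 classes are ordered as digits, then lowercase letters, then null, and that decoding stops at the first null position; both choices, and the names `encode` and `decode`, are illustrative assumptions rather than details given in the slides.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z
NULL = len(ALPHABET)  # assumed: class index 36 is the null character
MAX_LEN = 23          # one classifier per character position

def encode(word):
    """Map a word to 23 class indices, padding unused positions with null."""
    if len(word) > MAX_LEN:
        raise ValueError("word is longer than the 23 modeled positions")
    labels = [ALPHABET.index(ch) for ch in word.lower()]
    return labels + [NULL] * (MAX_LEN - len(labels))

def decode(scores):
    """scores: 23 lists of 37 numbers. Take the arg max per position and
    stop at the first null (an assumed decoding rule)."""
    chars = []
    for position in scores:
        best = max(range(len(position)), key=position.__getitem__)
        if best == NULL:
            break
        chars.append(ALPHABET[best])
    return "".join(chars)
```

For example, `encode("costa")` yields five letter indices followed by eighteen null labels, and feeding one-hot scores built from those labels back into `decode` returns `"costa"`.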

9 BAG-OF-N-GRAMS MODEL
Represent a string by the character N-grams contained within it. Example: spires
- 1-grams: s, p, i, r, e
- 2-grams: sp, pi, ir, re, es
- 3-grams: spi, pir, ire, res
- 4-grams: spir, pire, ires
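
The "spires" example can be reproduced with a short sketch that collects every substring of length 1 to 4. The function name `bag_of_ngrams` and the `max_n` parameter are illustrative, not taken from the slides.

```python
def bag_of_ngrams(word, max_n=4):
    """Return the set of character N-grams (N = 1..max_n) contained in word."""
    grams = set()
    for n in range(1, max_n + 1):
        for start in range(len(word) - n + 1):
            grams.add(word[start:start + n])
    return grams
```

Because the result is a set, the repeated 1-gram "s" in "spires" appears only once, matching the slide.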

10 BAG-OF-N-GRAMS MODEL
Deep CNN to encode image. N-gram detection vector output. Limited (10k) set of modeled N-grams.
[Figure: N-gram detection vector with example entries a, b, ak, ke, ra, aba, rake, raze]
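
As a rough illustration of the detection-vector target, the sketch below marks which entries of a fixed N-gram vocabulary occur in a word. The eight-entry vocabulary is just the set of labels visible on the slide (the actual model uses about 10k N-grams), and `detection_vector` is a hypothetical helper name.

```python
def detection_vector(word, vocabulary, max_n=4):
    """Binary vector: 1 where the vocabulary N-gram occurs in word, else 0."""
    present = set()
    for n in range(1, max_n + 1):
        for start in range(len(word) - n + 1):
            present.add(word[start:start + n])
    return [1 if gram in present else 0 for gram in vocabulary]

# Toy vocabulary taken from the labels shown on the slide.
VOCABULARY = ["a", "b", "ak", "ke", "ra", "aba", "rake", "raze"]
```

With this toy vocabulary, the word "rake" activates the entries a, ak, ke, ra and rake, and leaves b, aba and raze at zero.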

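11 JOINT MODEL
Can we combine these two representations?
[Figure: CHAR CNN with per-character outputs (char 1, ..., char 4, char 5, ...) over 0-9, a-z and Ø, alongside NGRAM CNN with its N-gram detection vector (a, b, ak, ke, ra, aba, rake, raze)]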

12 JOINT MODEL
[Figure: CHAR CNN output f(x); candidate characters a, e, k, q, r]

13 JOINT MODEL
[Figure: CHAR CNN output f(x) and NGRAM CNN output g(x); candidate characters a, e, k, q, r at each position, up to the maximum number of chars]

14 JOINT MODEL
w* = arg max_w S(w, x), found by beam search
[Figure: CHAR CNN output f(x) and NGRAM CNN output g(x); candidate characters a, e, k, q, r]
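
A minimal sketch of the arg max by beam search is given below. The slide does not spell out S(w, x), so the sketch assumes it is the sum of per-position character scores (standing in for f(x)) plus the scores of every N-gram occurrence in the word (standing in for g(x)). It also assumes the word length equals the number of positions supplied. The names `joint_score` and `beam_search`, and the dictionary-based inputs, are hypothetical.

```python
def joint_score(word, char_scores, ngram_scores, max_n=4):
    """Assumed S(w, x): character scores plus scores of all N-gram occurrences."""
    total = sum(char_scores[i][ch] for i, ch in enumerate(word))
    for n in range(1, max_n + 1):
        for start in range(len(word) - n + 1):
            total += ngram_scores.get(word[start:start + n], 0.0)
    return total

def beam_search(char_scores, ngram_scores, alphabet, beam_width=5, max_n=4):
    """char_scores: one dict (char -> score) per position.
    Returns the best (word, score) found with a beam of the given width."""
    beam = [("", 0.0)]
    for position in char_scores:
        candidates = []
        for prefix, score in beam:
            for ch in alphabet:
                word = prefix + ch
                new_score = score + position[ch]
                # Add the scores of the N-grams that end at the new character.
                for n in range(1, min(max_n, len(word)) + 1):
                    new_score += ngram_scores.get(word[-n:], 0.0)
                candidates.append((word, new_score))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]
```

In a toy case where the character scores alone favor "abc" but the N-gram "ac" carries a positive score, the search returns "acc", which mirrors how the N-gram term can override a weak character prediction. Beam search is approximate: with a narrow beam the best-scoring word may be pruned early.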

15 STRUCTURED OUTPUT LOSS
The score of the ground-truth word should be greater than or equal to the score of the highest-scoring incorrect word plus a margin.
Enforcing this as a soft constraint leads to a hinge loss.
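
The formulas on this slide did not survive transcription. The verbal statement can be written out as follows, using S(w, x) from slide 14. The symbols w_gt (ground-truth word), w* (highest-scoring incorrect word) and μ (margin) are assumed names, and any regularization term is omitted.

```latex
% Margin constraint: ground truth beats the best incorrect word by at least \mu
S(w_{gt}, x) \;\ge\; \mu + S(w^{*}, x),
\qquad \text{where } S(w^{*}, x) = \max_{w \ne w_{gt}} S(w, x)

% Soft version of the constraint as a per-example hinge loss
\ell(x, w_{gt}) = \max\Bigl(0,\; \mu + \max_{w \ne w_{gt}} S(w, x) - S(w_{gt}, x)\Bigr)
```

The loss is zero whenever the constraint holds, and otherwise grows linearly with the amount by which it is violated.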

16 STRUCTURED OUTPUT LOSS

17 EXPERIMENTS

18 DATASETS
All models trained purely on synthetic data [Jaderberg, NIPS DLW 2014]
- Font rendering
- Border/shadow & color
- Composition
- Projective distortion
- Natural image blending
Realistic enough to transfer to test on real-world images

19 DATASETS
Synth90k
- Lexicon of 90k words
- 9 million images, training + test splits
- Download from

20 DATASETS
- ICDAR 2003, 2013
- Street View Text
- IIIT 5k-word

21 TRAINING
- Pre-train the CHAR and NGRAM models independently.
- Use them to initialize the joint model and continue training jointly.

22 EXPERIMENTS - JOINT IMPROVEMENT
[Table: columns Train Data, Test Data, CHAR, JOINT; train data Synth90k; test data Synth90k, IC, SVT, IC; numeric values not preserved]
Joint model outperforms character sequence model alone. Examples:
- CHAR: grahaws, JOINT: grahams, GT: grahams
- CHAR: mediaal, JOINT: medical, GT: medical
- CHAR: chocoma_, JOINT: chocomel, GT: chocomel
- CHAR: iustralia, JOINT: australia, GT: australia

23 JOINT MODEL CORRECTIONS
[Figure labels: edge down-weighted in graph; edges up-weighted in graph]

24 EXPERIMENTS - ZERO-SHOT RECOGNITION
[Table: columns Train Data, Test Data, CHAR, JOINT; numeric values not preserved]
- Train Synth90k; test Synth90k, Synth72k-90k, Synth45k-90k, IC, SVT, IC
- Train Synth1-72k; test Synth72k-90k
- Train Synth1-45k; test Synth45k-90k
Large difference for CHAR model when not trained on test words; joint model recovers performance.

25 EXPERIMENTS - COMPARISON
[Table: No Lexicon results on IC03, SVT, IC13; numeric values not preserved]
- Unconstrained: Baseline (ABBYY); Wang, ICCV; Bissacco, ICCV
- Language Constrained: Yao, CVPR; Jaderberg, ECCV; Gordo, arXiv; Jaderberg, NIPS DLW
- Unconstrained: CHAR; JOINT

26 EXPERIMENTS - COMPARISON
[Table: No Lexicon results on IC03, SVT, IC13; Fixed Lexicon results on IC03-Full, SVT-50, IIIT5k-50, IIIT5k-1k; numeric values not preserved]
- Unconstrained: Baseline (ABBYY); Wang, ICCV; Bissacco, ICCV
- Language Constrained: Yao, CVPR; Jaderberg, ECCV; Gordo, arXiv; Jaderberg, NIPS DLW
- Unconstrained: CHAR; JOINT

27 SUMMARY
- Two models for text recognition
- Joint formulation
  - Structured output loss
  - Use back-propagation for joint optimization
- Experiments
  - Joint model improves accuracy on language-based data.
  - Degrades elegantly when not from language (N-gram model doesn't contribute much).
  - Sets benchmark for unconstrained accuracy, competes with purely constrained models.



More information

Patch Reordering: A Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks

Patch Reordering: A Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-7) Patch Reordering: A Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks Xu Shen,

More information

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization

More information

A Deep Learning Framework for Authorship Classification of Paintings

A Deep Learning Framework for Authorship Classification of Paintings A Deep Learning Framework for Authorship Classification of Paintings Kai-Lung Hua ( 花凱龍 ) Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei,

More information

Paper Motivation. Fixed geometric structures of CNN models. CNNs are inherently limited to model geometric transformations

Paper Motivation. Fixed geometric structures of CNN models. CNNs are inherently limited to model geometric transformations Paper Motivation Fixed geometric structures of CNN models CNNs are inherently limited to model geometric transformations Higher-level features combine lower-level features at fixed positions as a weighted

More information

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 ECCV 2016 Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016 Fundamental Question What is a good vector representation of an object? Something that can be easily predicted from 2D

More information

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet.

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet. Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet 7D Labs VINNOVA https://7dlabs.com Photo-realistic image synthesis

More information

Mixtures of Gaussians and Advanced Feature Encoding

Mixtures of Gaussians and Advanced Feature Encoding Mixtures of Gaussians and Advanced Feature Encoding Computer Vision Ali Borji UWM Many slides from James Hayes, Derek Hoiem, Florent Perronnin, and Hervé Why do good recognition systems go bad? E.g. Why

More information

arxiv: v1 [cs.cv] 12 Sep 2016

arxiv: v1 [cs.cv] 12 Sep 2016 arxiv:1609.03605v1 [cs.cv] 12 Sep 2016 Detecting Text in Natural Image with Connectionist Text Proposal Network Zhi Tian 1, Weilin Huang 1,2, Tong He 1, Pan He 1, and Yu Qiao 1,3 1 Shenzhen Key Lab of

More information

arxiv: v1 [cs.cv] 26 Jul 2018

arxiv: v1 [cs.cv] 26 Jul 2018 A Better Baseline for AVA Rohit Girdhar João Carreira Carl Doersch Andrew Zisserman DeepMind Carnegie Mellon University University of Oxford arxiv:1807.10066v1 [cs.cv] 26 Jul 2018 Abstract We introduce

More information

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015) AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015) Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony Paek, In So Kweon. State-of-the-art frameworks for object

More information

arxiv: v1 [cs.cv] 1 Sep 2017

arxiv: v1 [cs.cv] 1 Sep 2017 Single Shot Text Detector with Regional Attention Pan He1, Weilin Huang2, 3, Tong He3, Qile Zhu1, Yu Qiao3, and Xiaolin Li1 arxiv:1709.00138v1 [cs.cv] 1 Sep 2017 1 National Science Foundation Center for

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models

One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models One Network to Solve Them All Solving Linear Inverse Problems using Deep Projection Models [Supplemental Materials] 1. Network Architecture b ref b ref +1 We now describe the architecture of the networks

More information

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos Tomas Pfister 1, Karen Simonyan 1, James Charles 2 and Andrew Zisserman 1 1 Visual Geometry Group, Department of Engineering

More information

Part-Based Models for Object Class Recognition Part 3

Part-Based Models for Object Class Recognition Part 3 High Level Computer Vision! Part-Based Models for Object Class Recognition Part 3 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de! http://www.d2.mpi-inf.mpg.de/cv ! State-of-the-Art

More information

Detecting Bone Lesions in Multiple Myeloma Patients using Transfer Learning

Detecting Bone Lesions in Multiple Myeloma Patients using Transfer Learning Detecting Bone Lesions in Multiple Myeloma Patients using Transfer Learning Matthias Perkonigg 1, Johannes Hofmanninger 1, Björn Menze 2, Marc-André Weber 3, and Georg Langs 1 1 Computational Imaging Research

More information

Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters AJ Piergiovanni, Chenyou Fan, and Michael S Ryoo School of Informatics and Computing, Indiana University, Bloomington, IN

More information