ICDAR 2013 Robust Reading Competition


ICDAR 2013 Robust Reading Competition
D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Gomez, S. Robles, J. Mas, D. Fernandez, J. Almazán, L.P. de las Heras
http://dag.cvc.uab.es/icdar2013competition/

ICDAR 2013 Robust Reading Competition
Challenge 1: Reading Text in Born-Digital Images (Web and Email)
Challenge 2: Reading Text in Scene Images
Challenge 3: Reading Text in Videos

Timeline
Web Site Online / Registrations Open: 15/1
Training Set Online: 28/2
Test Set Available and Submission Open: 29/3
Submission Closed: 8/4
Open mode participation: participants ran their own algorithms and provided the results.

Participation in Numbers
Visits of the Web Site (Unique Visitors): 2,724 (1,025)
Registered Users: 124
Total Number of Individual Participants: 15
Total Number of Submissions: 52 (42 excluding variants)
Challenge 1 #Submissions (#Participants): 17 (8)
Challenge 2 #Submissions (#Participants): 22 (13)
Challenge 3 #Submissions (#Participants): 1 (1)

CHALLENGES 1 AND 2
Reading Text in Born-Digital Images
Reading Text in Scene Images

Overview
Born-Digital Images: low resolution, digitally created text, compression, anti-aliasing
Real Scene Images: high resolution, captured text, illumination artefacts, perspective

Structure
Challenges 1 and 2 were organised over three tasks:
Task 1 Text Localization. Objective: to obtain a rough estimation of the text areas in the image, in terms of bounding boxes corresponding to parts of text (words or text lines).
Task 2 Text Segmentation. Objective: pixel-level separation of text from the background.
Task 3 Word Recognition. Objective: assuming known word bounding boxes, to obtain the correct text transcriptions.

Datasets
Updated datasets compared to the 2011 edition:
Extra test images in Challenge 1
Segmentation ground truth and filtering of duplicate images in Challenge 2
"Don't Care" regions introduced in the ground truth

Datasets in Numbers                Challenge 1   Challenge 2
Training Dataset (Full Images)     420           229
Test Dataset (Full Images)         141           233
Training Dataset (Word Images)     3564          848
Test Dataset (Word Images)         1439          1095

Ground Truth: Task 1 Text Localization
[Slide panels: Dataset Images, Ground Truth (text files), Ground Truth Visualisation]
The ground truth is one text file per image, with one word per line: four bounding-box coordinates followed by the quoted transcription, e.g.:
11, 15, 42, 28, "How"
41, 16, 61, 28, "to"
8, 33, 36, 46, "Find"
41, 32, 64, 46, "the"
11, 50, 61, 65, "Perfect"
16, 69, 56, 82, "HDTV"
22 249 113 286 "The"
142 249 287 286 "Photo"
326 245 620 297 "Specialists"
0, 1, 177, 32, "\"HIGHER"
11, 35, 182, 63, "SAVINGS"
12, 67, 183, 103, "RATES\""
50, 114, 56, 120, "A"
60, 114, 91, 120, "reward"
96, 114, 108, 120, "for"
112, 116, 126, 120, "our"
130, 114, 164, 120, "current"
...
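The localisation ground truth shown above is simple enough to read with a few lines of code. The following is a minimal sketch; the function name and regular expression are illustrative, not part of the official competition toolkit. It accepts either comma- or space-separated coordinates, as in the two example files, and does not try to unescape quotes inside transcriptions.

```python
import re

def parse_localization_gt(path):
    """Parse one ICDAR-style localization ground-truth file.

    Assumes the format shown on the slide: one word per line, four
    bounding-box coordinates (x1, y1, x2, y2) followed by the quoted
    transcription. Fields may be comma- or space-separated.
    """
    entries = []
    pattern = re.compile(r'^(\d+)[,\s]+(\d+)[,\s]+(\d+)[,\s]+(\d+)[,\s]+"(.*)"\s*$')
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            m = pattern.match(line.strip())
            if m:
                x1, y1, x2, y2 = (int(v) for v in m.groups()[:4])
                entries.append(((x1, y1, x2, y2), m.group(5)))
    return entries
```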

Performance Evaluation: Task 1 Text Localization
Methodology proposed by Wolf and Jolion [1]:
Takes into account both bounding box area overlap and precision at the level of detection counts
Makes it possible to create meaningful cumulative results over many images
Provides ways to deal with one-to-many and many-to-one cases
Set up to penalise over-segmentation (words split into parts), but not under-segmentation (a group of words detected as a text line)
Baseline method: commercial OCR package (ABBYY OCR SDK v.10)
1. C. Wolf and J.M. Jolion, "Object Count/Area Graphs for the Evaluation of Object Detection and Segmentation Algorithms", IJDAR 8(4), pp. 280-296, 2006
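As a rough illustration of the matching idea behind the Wolf and Jolion protocol, the sketch below counts one-to-one matches using the commonly used area-recall and area-precision thresholds (tr = 0.8, tp = 0.4). It deliberately omits the one-to-many and many-to-one handling and the cumulative scoring of the full protocol, so it is a simplified approximation rather than the official evaluation code.

```python
def area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return area((x1, y1, x2, y2)) if x2 > x1 and y2 > y1 else 0

def one_to_one_scores(gt_boxes, det_boxes, tr=0.8, tp=0.4):
    """Count one-to-one matches in the spirit of Wolf & Jolion [1].

    A ground-truth box and a detection match when their intersection
    covers at least `tr` of the GT area (area recall) and `tp` of the
    detection area (area precision). One-to-many and many-to-one
    matches, which the real protocol scores with a penalty, are not
    handled here.
    """
    matched_gt, matched_det = set(), set()
    for i, g in enumerate(gt_boxes):
        for j, d in enumerate(det_boxes):
            if j in matched_det or not area(g) or not area(d):
                continue
            inter = intersection(g, d)
            if inter / area(g) >= tr and inter / area(d) >= tp:
                matched_gt.add(i)
                matched_det.add(j)
                break
    recall = len(matched_gt) / len(gt_boxes) if gt_boxes else 0.0
    precision = len(matched_det) / len(det_boxes) if det_boxes else 0.0
    return recall, precision
```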

Results: Task 1.1 Text Localization

Method Name      Recall    Precision   F-Score
USTB_TexStar     82.38%    93.83%      87.73%
TH-TextLoc       75.85%    86.82%      80.97%
I2R_NUS_FAR      71.42%    84.17%      77.27%
Baseline         69.21%    84.94%      76.27%
Text Detection   73.18%    78.62%      75.80%
I2R_NUS          67.52%    85.19%      75.33%
BDTD_CASIA       67.05%    78.98%      72.53%
OTCYMIST         74.85%    67.69%      71.09%
Inkam            52.21%    58.12%      55.01%

[Slide also shows a precision-recall scatter plot of the above methods together with the best ICDAR 2011 result.]

Results: Task 2.1 Text Localization

Method Name           Recall    Precision   F-Score
USTB_TexStar          66.45%    88.47%      75.90%
Text Spotter          64.84%    87.51%      74.49%
CASIA_NLPR            68.24%    78.89%      73.18%
Text_Detector_CASIA   62.85%    84.70%      72.16%
I2R_NUS_FAR           69.00%    75.08%      71.91%
I2R_NUS               66.17%    72.54%      69.21%
TH-TextLoc            65.19%    69.96%      67.49%
Text Detection        53.42%    74.15%      62.10%
Baseline              34.74%    60.76%      44.21%
Inkam                 35.27%    31.20%      33.11%

[Slide also shows a precision-recall scatter plot of the above methods.]

Ground Truth: Task 2 Text Segmentation
[Slide panels: Dataset Images, Ground Truth (images)]

Performance Evaluation: Task 2 Text Segmentation
Framework proposed by Clavelli et al. [2]:
Measures the degree to which the morphological properties of the text are preserved, not simply the number of misclassified pixels
As a secondary evaluation scheme we implemented standard pixel-level precision and recall (for compatibility with other results)
2. A. Clavelli, D. Karatzas and J. Lladós, "A Framework for the Assessment of Text Extraction Algorithms on Complex Colour Images", DAS 2010, pp. 19-28, 2010
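The secondary pixel-level scheme reduces to standard precision/recall over binary masks. A minimal sketch, assuming both the ground truth and the submitted result are available as binary images of the same size; the primary Clavelli et al. measure, which checks that character morphology is preserved, is not reproduced here.

```python
import numpy as np

def pixel_precision_recall(gt_mask, pred_mask):
    """Standard pixel-level precision/recall used as the secondary
    segmentation measure. Both inputs are arrays of the same shape
    where True (or nonzero) marks text pixels.
    """
    gt = np.asarray(gt_mask, dtype=bool)
    pred = np.asarray(pred_mask, dtype=bool)
    tp = np.logical_and(gt, pred).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    f_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_score
```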

Results: Task 1.2 Text Segmentation

Method Name      Recall    Precision   F-Score
USTB_FuStar      80.01%    86.20%      82.99%
I2R_NUS          64.57%    73.44%      68.72%
OTCYMIST         65.75%    71.65%      68.57%
I2R_NUS_FAR      59.05%    80.04%      67.96%
Text Detection   49.64%    69.46%      57.90%

[Slide also shows a precision-recall scatter plot of the above methods.]

Results: Task 2.2 Text Segmentation

Method Name      Recall    Precision   F-Score
I2R_NUS_FAR      68.64%    80.59%      74.14%
NSTextractor     63.38%    83.57%      72.09%
USTB_FuStar      68.03%    72.46%      70.18%
I2R_NUS          60.33%    76.62%      67.51%
NSTsegmentator   68.00%    54.35%      60.41%
Text Detection   62.03%    57.43%      59.64%
OTCYMIST         41.79%    31.60%      35.99%

[Slide also shows a precision-recall scatter plot of the above methods.]

Ground Truth: Task 3 Word Recognition
[Slide panels: Dataset Images (cropped word images 1.png to 20.png), Ground Truth (a single text file for all the words)]
Example ground truth entries:
1.png, "okcupid"
2.png, "How"
3.png, "to"
4.png, "Find"
5.png, "the"
6.png, "Perfect"
7.png, "HDTV"
8.png, "Creative"
9.png, "Printing"
10.png, "Community"
11.png, "KIDS:"
12.png, "LEARN"
13.png, "ENGLISH"
14.png, "ONLINE"
15.png, "\"HIGHER "
16.png, "Games,"
17.png, "stories,"
18.png, "and"
19.png, "more"
20.png, "RATES\""

Performance Evaluation: Task 3 Word Recognition
Edit distance (normalised to the length of the ground truth transcription)
Equal weights for deletions, additions and substitutions
We also report statistics on correctly recognised words (case sensitive)
Baseline method: commercial OCR package (ABBYY OCR SDK v.10)
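The scoring rule is straightforward to sketch: a plain Levenshtein distance with unit costs, divided by the length of the ground-truth transcription. The function below is illustrative rather than the official script; the Total Edit Distance reported on the next slides is presumably the sum of this normalised distance over all test words.

```python
def normalised_edit_distance(gt, pred):
    """Levenshtein distance with equal weights for deletions,
    additions and substitutions, normalised by the length of the
    ground-truth transcription.
    """
    m, n = len(gt), len(pred)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if gt[i - 1] == pred[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # addition
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / m if m else float(n > 0)
```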

Results: Task 1.3 Word Recognition

Method Name   Total Edit Distance   Correctly Recognized Words (%)
PhotoOCR      105.5                 82.21
MAPS          196.2                 80.40
PLT           200.4                 80.26
NESP          214.5                 79.29
Baseline      409.4                 60.95

Results: Task 2.3 Word Recognition

Method Name      Total Edit Distance   Correctly Recognized Words (%)
PhotoOCR         122.7                 82.83
PicRead          332.4                 57.99
NESP             360.1                 64.20
PLT              392.1                 62.37
MAPS             421.8                 62.74
Feild's Method   422.1                 47.95
PIONEER          479.8                 53.70
Baseline         539.0                 45.30
TextSpotter      606.3                 26.85

CHALLENGE 3
Reading Text in Videos

Overview - Structure
Challenge 3 was organised over a single task:
Task 1 Text Localization. Objective: to localise the words in the video in terms of their affine bounding boxes. The task requires that words are both localised correctly in every frame and tracked correctly (assigned the same ID) over the video sequence.

Datasets
New datasets were created for this challenge:
Video sequences collected in different countries (English, French, Spanish)
Users were given seven different tasks (e.g. driving, following way-finding panels, browsing products in a supermarket)
Four different cameras were used (head-mounted, mobile, hand-held 720p, tripod high-definition 1080p)
Sequences range from 10 seconds to about 1 minute 6 seconds each

Dataset in Numbers
Number of Videos in Training Set: 13
Number of Frames in Training Set: 5,486
Number of Videos in Test Set: 15
Number of Frames in Test Set: 9,790
Total Video Length: 9 minutes 35 seconds

Ground Truth
[Slide panels: Frame Image, Ground Truth Visualisation, Ground Truth (XML file)]

<?xml version="1.0" encoding="us-ascii"?>
<Frames>
  <frame ID="1" />
  <frame ID="2" />
  <frame ID="3" />
  ...
  <frame ID="434">
    <object Transcription="Badia" ID="377001" Language="Catalan" Quality="MODERATE">
      <Point x="446" y="58" />
      <Point x="447" y="69" />
      <Point x="484" y="67" />
      <Point x="484" y="55" />
    </object>
    <object Transcription="Barbera" ID="377002" Language="Catalan" Quality="MODERATE">
      <Point x="502" y="81" />
      <Point x="447" y="85" />
      <Point x="446" y="73" />
      <Point x="502" y="70" />
    </object>
    ...
  </frame>
  ...
  <frame ID="1859" />
  <frame ID="1860" />
</Frames>
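The XML layout above maps naturally onto a small parser. The sketch below uses Python's standard xml.etree module and assumes exactly the element and attribute names shown in the snippet (Frames, frame, object, Point, ID, Transcription); it is illustrative, not the official ground-truth reader.

```python
import xml.etree.ElementTree as ET

def load_video_gt(xml_path):
    """Read the per-frame video ground truth sketched above.

    Returns {frame_id: [(object_id, transcription, [(x, y), ...]), ...]}.
    """
    frames = {}
    root = ET.parse(xml_path).getroot()
    for frame in root.iter("frame"):
        frame_id = int(frame.get("ID"))
        words = []
        for obj in frame.iter("object"):
            points = [(int(p.get("x")), int(p.get("y"))) for p in obj.iter("Point")]
            words.append((obj.get("ID"), obj.get("Transcription"), points))
        frames[frame_id] = words
    return frames
```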

Performance Evaluation
Evaluation based on the CLEAR-MOT [3, 4] and VACE [5] metrics:
Multiple Object Tracking Precision (MOTP): expresses how well the locations of words are estimated
Multiple Object Tracking Accuracy (MOTA): how many mistakes the tracker has made (false positives, ID mismatches, etc.); can be negative
Average Tracking Accuracy (ATA): a comprehensive spatio-temporal measure
Baseline method: detection performed using the ABBYY OCR SDK, followed by a tracking stage where each detected word is assigned the identifier of the previously detected word with the best overlap ratio
3. K. Bernardin and R. Stiefelhagen, "Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics", EURASIP Journal on Image and Video Processing, January 2008
4. A.D. Bagdanov, A. Del Bimbo, F. Dini, G. Lisanti and I. Masi, "Compact and Efficient Posterity Logging of Face Imagery for Video Surveillance", IEEE Multimedia, 2012
5. R. Kasturi et al., "Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 319-336, 2009
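For reference, the two CLEAR-MOT quantities reduce to simple ratios once the per-frame matching has been done. The sketch below assumes the matching step has already produced the overlap of every matched ground-truth word plus the usual error counts; it only illustrates the definitions in [3] and is not the evaluation code used in the competition.

```python
def clear_mot(match_overlaps, misses, false_positives, id_switches, gt_count):
    """Aggregate CLEAR-MOT scores from per-sequence counts.

    match_overlaps: overlap ratio of every matched ground-truth word
    misses, false_positives, id_switches: error totals over the sequence
    gt_count: total number of ground-truth word instances
    """
    motp = sum(match_overlaps) / len(match_overlaps) if match_overlaps else 0.0
    mota = 1.0 - (misses + false_positives + id_switches) / gt_count if gt_count else 0.0
    return motp, mota  # MOTA can be negative, as noted above
```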

Results: Task 3.1 Text Localization

Method Name   MOTP   MOTA    ATA
TextSpotter   0.67   0.23    0.12
Baseline      0.63   -0.09   0.00

[Slide also shows example result videos for the Baseline and TextSpotter methods.]

WHAT'S NEXT?

What's Next
The competition site is open! Register to download the datasets and upload new results: http://dag.cvc.uab.es/icdar2013competition
The competition now runs in continuous mode:
Datasets are freely available (on the competition web site and soon on TC11)
Online performance evaluation functionality
Ranking charts and tables
Visualisation of the results for all tasks and all challenges

Online Evaluation and Ranking Tables

Visualisation of Different Tasks

Winners - Thanks for participating!

Task 1.1 Text Localization in Born-Digital Images
USTB_TexStar: Xuwang Yin (1), Xu-Cheng Yin (1) and Hong-Wei Hao (2)
(1) University of Science and Technology Beijing, China; (2) Institute of Automation, Chinese Academy of Sciences, Beijing, China

Task 1.2 Text Segmentation in Born-Digital Images
USTB_FuStar: Xuwang Yin (1), Xu-Cheng Yin (1) and Hong-Wei Hao (2)
(1) University of Science and Technology Beijing, China; (2) Institute of Automation, Chinese Academy of Sciences, Beijing, China

Task 1.3 Word Recognition in Born-Digital Images
PhotoOCR: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
Google Inc., USA

Task 2.1 Text Localization in Real Scenes
USTB_TexStar: Xuwang Yin (1), Xu-Cheng Yin (1) and Hong-Wei Hao (2)
(1) University of Science and Technology Beijing, China; (2) Institute of Automation, Chinese Academy of Sciences, Beijing, China

Task 2.2 Text Segmentation in Real Scenes
I2R_NUS_FAR: Lu Shijian (1), Tian Shangxuan (2), Lim Joo Hwee (1), Tan Chew Lim (2)
(1) Institute for Infocomm Research, A*STAR, Singapore; (2) School of Computing, National University of Singapore

Task 2.3 Word Recognition in Real Scenes
PhotoOCR: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
Google Inc., USA

Task 3.1 Text Localization in Videos (Mention)
TextSpotter: Lukas Neumann, Jiri Matas, Michal Busta
Czech Technical University, Czech Republic