A multi-stage method for Chinese text detection in news videos

Yaqi Wang, Liangrui Peng, Shengjin Wang
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
E-mail addresses: {wangyaqi,plr,wsj}@ocrserv.ee.tsinghua.edu.cn. Corresponding author tel.: +86-10-62773710 ext. 308; fax: +86-10-62773710 ext. 313.

20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2016, 5-7 September 2016, York, United Kingdom. Procedia Computer Science 96 (2016) 1409-1417. doi:10.1016/j.procs.2016.08.186

Abstract

With the rapid increase of on-line video resources, there is an urgent demand for text detection and recognition technologies to build content-based video indexing and retrieval systems. Chinese news video texts contain highly condensed and rich information, but the low resolution of on-line videos and the complexity of Chinese character structures make text detection challenging. In this paper, we present a multi-stage scheme for Chinese news video text detection. We propose an improved Stroke Width Transform (SWT) method that incorporates a text color consistency constraint to generate candidate text blocks. We then use a divide-and-conquer strategy to partition the candidate text blocks into three sub-spaces according to their geometric shapes and sizes. For each sub-space, a neural network is designed to classify the candidates into text or non-text blocks. Finally, the text blocks are merged into text lines based on stroke width, color, and other heuristic information. Experimental results on a self-collected Chinese news video dataset and the ICDAR 2013 dataset show that the proposed method effectively detects both news video captions and scene texts.

(c) 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of KES International.

Keywords: Chinese text detection, news video, Stroke Width Transform, neural network

1. Introduction

Content-based video indexing and retrieval systems are in high demand because of the rapid proliferation of video resources on the Internet. News video, a common type of video, contains a large amount of meaningful information, including captions and scene texts in video images, so research on text detection and recognition is essential to building content-based news video indexing and retrieval systems. However, video text detection is difficult because of complex backgrounds, diverse text formats and styles, and the low resolution of on-line videos. Some examples from China Central Television (CCTV) news videos available on the Internet are shown in Fig. 1. Applying conventional Optical Character Recognition (OCR) systems directly to these news video samples does not achieve satisfactory results; new, effective text detection technologies are needed to overcome these difficulties.

Text detection methods for video and images can be broadly classified into two categories: feature-based methods and machine learning-based methods. Feature-based methods exploit attributes of text regions, such as local intensities and edge densities, and extract suitable feature descriptors to identify text regions [1]. According to the type of features used, they can be further divided into three groups: connected component-based methods [1,2,3,4,5], edge-based methods [6,7], and texture-based methods [8,9,10]. The Stroke Width Transform (SWT) [7] is an edge-based method that uses stroke edge information under the assumption that characters in the same text line have uniform stroke width. It can detect various kinds of text robustly but may produce more false alarms.

Fig. 1. Examples of captions and background texts in news video.

Machine learning-based methods adopt classifiers such as the Support Vector Machine (SVM) [11], neural networks [12,13], or Bayesian classifiers [14] to distinguish text from non-text blocks. Sun et al. [15] proposed a method for text and non-text classification based on shallow neural networks. Deep learning methods [16] can also be adopted.

There are two major types of text in Chinese news videos: captions and scene texts. The main challenges are the low quality of captions in on-line video images and the unconstrained scene texts in complex backgrounds. In this paper, we aim to develop a reliable multi-stage scheme for Chinese text detection in news videos. A color-enhanced SWT method is used to generate candidate text blocks; a set of neural networks is then designed to filter out non-text blocks, which further decreases the false alarm rate; finally, the text blocks are merged into text lines in the post-processing stage.

The rest of this paper is organized as follows. Section 2 presents the proposed Chinese text detection scheme. Experimental results are reported in Section 3. The final section concludes the paper.

2. Method

The proposed text detection algorithm includes three main stages: (1) candidate generation, (2) candidate filtering, and (3) text line generation, as shown in Fig. 2.

2.1. Candidate generation

We adopt the SWT [7] to extract connected components. The SWT is a local operator that assigns to each pixel the width of the stroke that most likely contains it. The main steps of the SWT computation are Canny edge detection and ray shooting; the result is a stroke width map of the same size as the original image, initialized to a default background value. To recover strokes, the first step uses the Canny edge detector [17] to find edge pixels and their gradient directions in the original image, as shown in Fig. 3. We then shoot a ray from each edge pixel p_i along its gradient direction d_p (the red arrow in Fig. 4). If another edge pixel q_i is found whose gradient direction d_q is roughly opposite to d_p, all elements of the stroke width map covered by the segment [p_i, q_i] are assigned the stroke width value ||p_i - q_i||; otherwise the ray is discarded. After this first pass, all edge pixels have been considered, but some values in the stroke width map are still incorrect in complex situations such as the one marked by the blue arrow in Fig. 4. A sketch of this first pass is given below.
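The following Python sketch (OpenCV for the Canny and Sobel steps) illustrates this first pass. It is a minimal sketch, not the authors' implementation: the gradient-opposition tolerance, the maximum ray length, and the Canny thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def swt_first_pass(gray, max_ray_len=100):
    """First pass of the SWT: shoot a ray from each Canny edge pixel
    along its gradient direction; when a roughly opposite edge pixel
    is found, record the ray length as the stroke width."""
    h, w = gray.shape
    edges = cv2.Canny(gray, 100, 200)            # thresholds are illustrative
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy) + 1e-9
    dx, dy = gx / mag, gy / mag                  # unit gradient directions
    swt = np.full((h, w), np.inf)                # default background value
    rays = []                                    # kept for the second pass
    ys, xs = np.nonzero(edges)
    for x0, y0 in zip(xs, ys):
        ray = [(x0, y0)]
        for n in range(1, max_ray_len):
            x = int(round(x0 + dx[y0, x0] * n))
            y = int(round(y0 + dy[y0, x0] * n))
            if not (0 <= x < w and 0 <= y < h):
                break
            ray.append((x, y))
            if edges[y, x]:
                # accept only if d_q is roughly opposite to d_p
                if dx[y0, x0] * dx[y, x] + dy[y0, x0] * dy[y, x] < -0.7:
                    width = np.hypot(x - x0, y - y0)   # ||p_i - q_i||
                    for rx, ry in ray:
                        swt[ry, rx] = min(swt[ry, rx], width)
                    rays.append(ray)
                break                            # ray ends at the first edge hit
    return swt, rays
```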
Fig. 2. System flowchart.
Fig. 3. Examples of Canny edge detection results.
Fig. 4. Explanation of the SWT operator.

To avoid this kind of situation, a second pass of processing is needed: all elements of the stroke width map whose values are above the mean value m are replaced with the mean value m, as sketched below.
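A minimal sketch of the second pass, reusing the rays recorded above. Taking m as the mean stroke width along each ray is an assumption about the scope of the averaging; the original SWT paper clamps to the per-ray median.

```python
import numpy as np

def swt_second_pass(swt, rays):
    """Second pass: along each recorded ray, replace stroke width values
    that exceed the ray's mean value m with m. The per-ray mean is an
    assumption; the original SWT paper uses the per-ray median."""
    for ray in rays:
        vals = [swt[y, x] for (x, y) in ray]
        m = float(np.mean(vals))
        for (x, y) in ray:
            if swt[y, x] > m:
                swt[y, x] = m
    return swt
```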

Examples of stroke width maps are shown in Fig. 5, and the connected components generated from them are shown in Fig. 6.

Fig. 5. Examples of the stroke width maps.
Fig. 6. The generated connected components from the stroke width maps.

The extracted text connected components still exhibit some problems: a single character may be split into two or more connected components, or several characters may be merged into one component. To mitigate these problems, we add a color consistency constraint. In the ray shooting process, the current ray's color C_r is defined as the median of the R, G, B values of all pixels of the original image along the ray direction. A pixel q_i along the current ray is considered an edge pixel on the current stroke only if its color value satisfies |C_qi - C_r| < lambda_c, where lambda_c decreases linearly from 200 to 100 as the number of pixels in the current ray increases. With this color consistency constraint, the connected component results of the stroke width maps are improved, as shown in Fig. 7.

Fig. 7. Before and after using the color consistency constraint for SWT.

After generating the connected components, a set of flexible rules is employed to filter out non-text blocks [7]. The parameters of each rule can be estimated on the training set. The rules and thresholds are summarized below; a sketch implementing the color test and these filters follows the list.

1. Component size. Components that are too large or too small are discarded: the height of a component should be less than 1/3 and greater than 1/100 of the image height.
2. Stroke width variance. The variance of the stroke width within a connected component should be less than a threshold, set to half of the component's average stroke width.
3. Stroke color variance. The variance of the stroke color within a connected component should be less than a threshold, set to half of the component's average stroke color.
4. Aspect ratio. The ratio of the height to the width of a text block's bounding box must lie between 0.1 and 10.
5. Overlap of bounding boxes. The bounding box of a text block should not contain more than two other text blocks; this eliminates connected components that surround text, such as sign frames.
6. Area ratio. The ratio of the number of stroke pixels to the total number of pixels of the connected component should be greater than 0.25.

The thresholds of the above rules are chosen conservatively to avoid missing text blocks. The remaining connected components are the text block candidates for the next stage of processing.
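The sketch below makes the color test and the six filters concrete. The Euclidean color distance, the linear lambda_c schedule, and the precomputed component attributes (with stroke width and color statistics treated as scalars) are assumptions; rule 6 is evaluated over the component's bounding box here, which is also an assumption.

```python
import numpy as np

def color_gate(ray_pixels, q_color, lam_max=200.0, lam_min=100.0, max_len=100):
    """Color consistency test used during ray shooting. C_r is the
    per-channel median R, G, B of pixels already on the ray; lambda_c
    decays linearly from 200 to 100 with ray length (the exact decay
    schedule and the Euclidean distance are assumptions)."""
    c_r = np.median(np.asarray(ray_pixels, dtype=float), axis=0)
    lam = lam_max - (lam_max - lam_min) * min(len(ray_pixels) / max_len, 1.0)
    return np.linalg.norm(np.asarray(q_color, dtype=float) - c_r) < lam

def keep_component(comp, img_h):
    """Heuristic filters of Section 2.1; `comp` is a hypothetical record
    assumed to carry the attributes named below."""
    if not (img_h / 100.0 < comp.height < img_h / 3.0):        # rule 1
        return False
    if comp.stroke_width_var >= 0.5 * comp.stroke_width_mean:  # rule 2
        return False
    if comp.stroke_color_var >= 0.5 * comp.stroke_color_mean:  # rule 3
        return False
    if not (0.1 <= comp.height / comp.width <= 10.0):          # rule 4
        return False
    if comp.num_contained_boxes > 2:                           # rule 5
        return False
    area_ratio = comp.num_stroke_pixels / float(comp.height * comp.width)
    return area_ratio > 0.25                                   # rule 6 (bounding-box area assumed)
```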

2.2. Text and non-text classification

Since the text block candidates are highly varied, we adopt a divide-and-conquer strategy. A given candidate C_i is classified into one of three sub-spaces, namely Single Character, Multi-Characters, and Radical, according to its height h_i, width w_i, and aspect ratio Ar_i (width to height), as shown in Fig. 8. The detailed classification rules are given in Table 1.

Fig. 8. Sub-spaces for text candidates.

Table 1. Empirical criteria for text candidate sub-space partition.

Sub-space          Criterion
Radical            (Ar_i <= TH_Ar1 and w_i <= TH) or (Ar_i >= TH_Ar2 and h_i <= TH)
Single Character   TH_Ar1 <= Ar_i <= TH_Ar2
Multi-Characters   (Ar_i < TH_Ar1 or Ar_i > TH_Ar2), and w_i > TH and h_i > TH

In our experiments we empirically set TH = 12, TH_Ar1 = 0.6, TH_Ar2 = 1.5. In addition, if the aspect ratio of a Multi-Characters block exceeds a threshold of 3.5, it is segmented into several blocks according to its vertical projection histogram. All candidates are then normalized per sub-space: 12x28 or 28x12 pixels for Radical, 28x28 pixels for Single Character, and 18x45 (height x width) pixels for Multi-Characters. For each sub-space, a neural network is trained to classify candidates as text or non-text. We use a Multilayer Perceptron (MLP) with two hidden layers; the numbers of nodes per layer for the three networks are given in Table 2, and a sketch of the partition rules and network shapes follows.

Table 2. Numbers of nodes in each layer of the three networks.

Layer            Radical   Single Character   Multi-Characters
Input layer      432       784                810
Hidden layer I   256       400                576
Hidden layer II  128       200                288
Output layer     2         2                  2

Examples of candidate filtering are shown in Fig. 9.
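A sketch of the sub-space partition and of the three network shapes, in Python with PyTorch. Reading Ar_i as width/height and using ReLU activations are assumptions; the paper specifies only the thresholds of Table 1 and the layer sizes of Table 2.

```python
import torch.nn as nn

TH, TH_AR1, TH_AR2 = 12, 0.6, 1.5

def subspace(w, h):
    """Partition a candidate per Table 1. Ar is taken as width/height,
    which matches the normalized block sizes (an assumption)."""
    ar = w / float(h)
    if (ar <= TH_AR1 and w <= TH) or (ar >= TH_AR2 and h <= TH):
        return "Radical"
    if TH_AR1 <= ar <= TH_AR2:
        return "Single Character"
    if w > TH and h > TH:
        return "Multi-Characters"
    return None  # falls outside all three sub-spaces

def build_mlp(n_in, n_h1, n_h2):
    """Two-hidden-layer MLP with the node counts of Table 2; ReLU is
    assumed, as the paper does not name the activation function."""
    return nn.Sequential(
        nn.Linear(n_in, n_h1), nn.ReLU(),
        nn.Linear(n_h1, n_h2), nn.ReLU(),
        nn.Linear(n_h2, 2),               # text / non-text
    )

# Layer sizes per sub-space, from Table 2.
nets = {
    "Radical":          build_mlp(432, 256, 128),
    "Single Character": build_mlp(784, 400, 200),
    "Multi-Characters": build_mlp(810, 576, 288),
}
```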

Fig. 9. Examples of text block candidate filtering; yellow, green, and red bounding boxes correspond to Radical, Single Character, and Multi-Characters, respectively.

As the Convolutional Neural Network (CNN) is one of the prominent deep learning techniques, it is also feasible to adopt a CNN for candidate filtering. The CNN model structure [16] we compare against the MLP has two convolutional layers with 16 and 20 filters respectively, followed by two fully connected layers. The comparison results are presented in Section 3.

2.3. Text line generation

At this stage the remaining components are grouped into text lines, since characters in news video frames usually appear in lines. Each pair of neighboring candidates is considered first: whether two neighbors can be linked into a text line depends on the likelihood that they belong to the same line [18]. Two candidates in a text line should have similar block height and width, stroke width, and stroke color, and the distance between them should not exceed a threshold, e.g., twice the sum of their widths. The process is repeated until no more candidates can be merged into a line, and the remaining isolated candidates are treated as noise, as sketched below. The merged text lines are shown in Fig. 10.

Fig. 10. Examples of text line generation results.
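A sketch of the pairwise merging loop. Only the distance rule (at most twice the sum of the widths) is stated in the paper; the similarity tolerance, the attribute set, and treating stroke color as a scalar are illustrative assumptions.

```python
def similar(a, b, tol=0.4):
    """Heuristic pairwise test; `a` and `b` are hypothetical candidate
    records with height, width, stroke_width, stroke_color (scalar),
    and center x-coordinate cx. The tolerance is an assumption."""
    def close(x, y):
        return abs(x - y) <= tol * max(x, y)
    return (close(a.height, b.height) and close(a.width, b.width)
            and close(a.stroke_width, b.stroke_width)
            and close(a.stroke_color, b.stroke_color)
            and abs(a.cx - b.cx) <= 2 * (a.width + b.width))

def merge_text_lines(candidates):
    """Greedily link mergeable neighbors until no pair can be merged;
    left-over singletons are dropped as noise."""
    lines = [[c] for c in candidates]
    merged = True
    while merged:
        merged = False
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                if any(similar(a, b) for a in lines[i] for b in lines[j]):
                    lines[i] += lines.pop(j)
                    merged = True
                    break
            if merged:
                break
    return [ln for ln in lines if len(ln) > 1]   # isolated blocks = noise
```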

3. Experimental Results

3.1. Dataset

To evaluate the performance of the proposed algorithm, we built a dataset of 1,693 video frames selected from 183 clips of CCTV news videos available on the Internet; each frame is 640x480 pixels. According to the position of the text and the way it appears, text regions are manually classified into several categories, including captions, superimposed text passages, scene text, and text in graphs, and all collected samples are manually labeled with text bounding boxes.

To compare the proposed method with existing ones, we also evaluate it on the public dataset of the ICDAR 2013 robust reading competition, Challenge 2 [19], which is widely used as a standard benchmark for text detection in natural images. The dataset contains 229 images in the training set and 233 images in the testing set. The quantitative evaluation uses precision, recall, and F-measure, computed by comparing the ground truth with the detected boxes [2].

3.2. Experiments on text block candidate generation

We compare our color-enhanced SWT for text candidate generation with the original SWT; the results are listed in Table 3. On one hand, the color consistency constraint efficiently reduces the number of rays shooting from stroke edges into background regions. On the other hand, pairs of stroke edges of a character whose gradient directions are not strictly opposite can still be matched on the basis of consistent stroke color.

Table 3. Comparison of SWT and color-enhanced SWT.

Method               Precision (%)   Recall (%)   F-measure (%)
SWT                  53.2            92.1         67.4
Color-enhanced SWT   62.1            94.4         74.9

We further evaluate text block candidate generation for the different text types in Chinese news videos; results are shown in Table 4.

Table 4. Text candidate detection results of color-enhanced SWT on different text types in the CCTV news video dataset.

Text type                    Testing set   Chinese characters   Precision (%)   Recall (%)   F-measure (%)
Caption                      74            1,108                62.4            93.2         73.5
Superimposed text passages   12            935                  77.5            95.5         85.6
Scene text                   32            276                  51.8            88.9         65.5
Text in graph                36            314                  61.7            93.1         74.2
All types                    123           2,633                62.1            94.4         74.9

3.3. Experiments on text and non-text classification

MLP networks for the corresponding candidate sub-spaces are trained and tested on the CCTV news video dataset; 723 images from the dataset are used to train the networks. After candidate generation by color-enhanced SWT, all text candidates are manually labeled as Radical, Single Character, or Multi-Characters to train each network. Table 5 shows the sizes of the training and testing sets and the classification results for each type.

To validate the performance of the shallow MLP networks, we take the Single Character sub-space as an example and build a CNN for comparison; the results are shown in Table 6. The classification accuracy of the CNN is only slightly higher than that of the shallow MLP, but considering the time spent on network training and the classification speed, the shallow MLP is more practical and is therefore adopted in our method.

Table 5. Results of text and non-text candidate classification.

Type of candidates   Blocks in training set   Text blocks in testing set   Acc. (%)   Non-text blocks in testing set   Acc. (%)
Radical              4,256                    315                          93.1       518                              94.2
Single Character     6,012                    610                          98.1       552                              97.7
Multi-Characters     1,435                    124                          91.3       102                              90.9

Table 6. Comparison of deep and shallow neural networks.

Architecture                                  Acc. for text blocks (%)   Acc. for non-text blocks (%)   Classification speed (ms/image)
Shallow neural network (2-hidden-layer MLP)   98.1                       97.7                           4
Deep neural network (4-layer CNN)             98.4                       98.1                           13 (without GPU)

3.4. Experiments on the ICDAR 2013 dataset

The dataset is from the ICDAR 2013 robust reading competition, Challenge 2, Task 2.1 Text Localization. Another set of MLP classifiers is trained on the ICDAR 2013 training set. Example results of our method on this dataset are shown in Fig. 11, and quantitative comparisons with other methods are given in Table 7. The results show that our method achieves a satisfactory recall rate.

Fig. 11. Examples of text line generation results on the ICDAR 2013 dataset.

Table 7. Comparison of results on the ICDAR 2013 dataset.

Method              Precision (%)   Recall (%)   F-measure (%)
Our method          81.38           72.42        74.46
USTB TexStar [19]   88.47           66.45        75.89
I2R NUS FAR [19]    75.08           69.00        71.91
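For reference, a minimal sketch of computing precision, recall, and F-measure by greedy one-to-one box matching at an IoU threshold. This is a simplification; the official ICDAR 2013 evaluation uses its own per-image matching rules, so this sketch will not reproduce Table 7 exactly.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def prf(detected, ground_truth, thr=0.5):
    """Greedy one-to-one matching at an IoU threshold, then
    precision = matches/|detected|, recall = matches/|ground truth|,
    and F is the harmonic mean of the two."""
    unmatched = list(ground_truth)
    matches = 0
    for d in detected:
        hit = next((g for g in unmatched if iou(d, g) >= thr), None)
        if hit is not None:
            unmatched.remove(hit)
            matches += 1
    p = matches / len(detected) if detected else 0.0
    r = matches / len(ground_truth) if ground_truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```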

4. Conclusion

In this paper, a novel approach for Chinese text detection in news videos is proposed. We extend the SWT method with color information for text candidate generation, and then design a set of shallow MLP networks to classify candidates into text and non-text blocks before text line generation. Experimental results show that the proposed method is effective for Chinese text detection in news videos. In future work, we will exploit more context information, obtained by integrating text recognition, to improve the text detection method for Chinese news videos, and we will also apply deep learning-based methods with GPU acceleration.

Acknowledgements

This research is funded by the 973 National Basic Research Program of China (No. 2014CB340506) and the National Natural Science Foundation of China (No. 61573028).

References

1. Phan, T.Q., Shivakumara, P., Lu, T., Tan, C.L. Recognition of video text through temporal integration. In: Proceedings of the 12th International Conference on Document Analysis and Recognition; 2013. p. 589-593.
2. Zhong, Y., Karu, K., Jain, A.K. Locating text in complex color images. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition, vol. 1; 1995. p. 146-149.
3. Jain, A.K., Yu, B. Automatic text location in images and video frames. Pattern Recognition 1998;31(12):2055-2076.
4. Kim, H.K. Efficient automatic text location method and content-based indexing and structuring of video database. Journal of Visual Communication and Image Representation 1996;7(4):336-344.
5. Liu, Y., Ikenaga, T. A contour-based robust algorithm for text detection in color images. IEICE Transactions on Information and Systems 2006;89(3):1221-1230.
6. Wu, V., Manmatha, R., Riseman, E.M. TextFinder: An automatic system to detect and recognize text in images. IEEE Transactions on Pattern Analysis and Machine Intelligence 1999;21(11):1224-1229.
7. Epshtein, B., Ofek, E., Wexler, Y. Detecting text in natural scenes with stroke width transform. In: Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition; 2010. p. 2963-2970.
8. Kim, K.I., Jung, K., Kim, J.H. Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 2003;25(12):1631-1639.
9. Wang, K., Babenko, B., Belongie, S. End-to-end scene text recognition. In: Proceedings of the 13th International Conference on Computer Vision; 2011. p. 1457-1464.
10. Hanif, S.M., Prevost, L. Text detection and localization in complex scene images using constrained AdaBoost algorithm. In: Proceedings of the 10th International Conference on Document Analysis and Recognition; 2009. p. 1-5.
11. Cortes, C., Vapnik, V. Support-vector networks. Machine Learning 1995;20(3):273-297.
12. Zhang, X., Sun, F. Pulse coupled neural network edge-based algorithm for image text locating. Tsinghua Science & Technology 2011;16(1):22-30.
13. Huang, W., Qiao, Y., Tang, X. Robust scene text detection with convolution neural network induced MSER trees. In: Proceedings of the European Conference on Computer Vision; 2014. p. 497-511.
14. Shivakumara, P., Sreedhar, R.P., Phan, T.Q., Lu, S., Tan, C.L. Multioriented video scene text detection through Bayesian classification and boundary growing. IEEE Transactions on Circuits and Systems for Video Technology 2012;22(8):1227-1235.
15. Sun, L., Huo, Q., Jia, W., Chen, K. Robust text detection in natural scene images by generalized color-enhanced contrasting extremal region and neural networks. In: Proceedings of the 22nd International Conference on Pattern Recognition; 2014. p. 2715-2720.
16. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998;86(11):2278-2324.
17. Canny, J. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986;8(6):679-698.
18. Chen, X., Yuille, A.L. Detecting and reading text in natural scenes. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2; 2004. p. II-366.
19. Karatzas, D., Shafait, F., Uchida, S., et al. ICDAR 2013 robust reading competition. In: Proceedings of the 12th International Conference on Document Analysis and Recognition; 2013. p. 1484-1493.