Scene Text Detection Using Machine Learning Classifiers


Nafla C.N. (1), Sneha K. (2), Divya K.P. (3)
(1), (2), (3) Department of CSE, RCET, Akkikkvu, Thrissur

ABSTRACT

In this paper we present an efficient method of scene text detection using two machine learning classifiers: one for generating candidate word regions and the other for classifying components as text or nontext. First, we extract connected components with the maximally stable extremal region algorithm. The resulting components are partitioned into clusters by an AdaBoost classifier based on their adjacency relationship. We then extract features from the clusters and, with a support vector machine classifier, classify each block as text or nontext.

Keywords - Connected component (CC), maximally stable extremal region (MSER), optical character recognition (OCR), support vector machine (SVM).

I. INTRODUCTION

Due to the wide availability of mobile devices with high quality digital cameras, research areas related to these devices have received growing attention in recent years. Text detection and extraction is one of the most important and interesting of these areas. Text present in camera captured images is a strong source of information about the image and about the place or situation in which it was captured, so text detection and extraction from images has many useful applications.

Text present in an image or video can be classified as scene text or caption text. Scene text exists in the image naturally, whereas caption text is added manually by the user. Scene text overlaps with the background, so scene text detection and extraction are more difficult than the detection of caption text. Compared with scanned document images, text extraction from natural scenes is also harder because the text appears in arbitrary orientations and different sizes and is subject to background interference. Examples of scene text include street signs, display boards on shops, text on vehicles and advertisement boards. Fig 1 shows examples of text in natural scene images.

Text string detection and extraction have a variety of useful applications. People who travel to foreign countries often find it difficult to understand the text on display boards. They then rely either on guides or on intelligent hand held devices to translate the information written on the boards, and text detection is an important part of such devices. Text detection can also play a crucial role in content-based visual information retrieval and content-based image retrieval, which apply computer vision techniques to the problem of image retrieval in huge databases. Another important application of scene text extraction is helping people with visual disabilities: a computerized system that conveys the text present on objects and locations would be a great help to them. License plate detection is another area where text detection plays a central role. It is crucial for monitoring traffic at customs checkpoints and for tracking stolen cars. Other significant applications of scene text detection and extraction include robotic navigation and automatic geocoding.

Fig 1: Examples of natural images with scene text

OCR is one technology that can extract text characters, by identifying their corners. This can be done only if the characters are correctly separated from the background. Background interference and degradation in images reduce the performance of OCR, so its performance is comparatively low on natural scene images. Texture analysis and topic based partition are other methods of detection, but they work well mainly on document images. Text detection and extraction from natural images is not a simple task. Text may appear against a complex background, and the chances of degradation are high in natural images. As a result, text extraction from natural images involves many complexities.

The paper is organized as follows. Section II surveys existing methods of scene text detection. Section III provides details of the proposed method. Section IV concludes the paper.

II. LITERATURE SURVEY

This section covers existing scene text detection methods. They can be categorized as texture based methods, connected component based methods and hybrid methods.

2.1 Texture based methods

Texture based methods consider text as a special kind of texture and identify it by properties such as wavelet features, filter responses and local intensities.

Angadi et al. [1] described a method that uses a high pass filter in the DCT domain to suppress the background and uses texture properties such as homogeneity and contrast to detect text. The method comprises five phases: removal of the background in the DCT domain, derivation of the feature matrix D, block classification, merging of the blocks for text area extraction and, finally, refinement of the text region.

Kim et al. [2] described a method that combines CAMSHIFT and SVM for detection and extraction of text. The raw pixel intensities that form the textural pattern are given as input to the SVM. After texture extraction, text identification is performed using CAMSHIFT.

Gllavata et al. [3] described a method that separates text and nontext areas using the distribution of high frequency wavelet coefficients obtained by applying a wavelet transform to the image. Text area classification is then done by k-means clustering, and text extraction is performed by an OCR engine that takes the segmented binary text image as input.

2.2 Connected component based methods

In connected component based methods, the image is first divided and candidate text components are extracted. Nontext elements are then eliminated in various ways. Connected component based methods make use of geometrical properties. They work properly on images that contain text with many variations, such as changes in orientation and font.

Epshtein et al. [4] described a method that uses stroke width for the extraction of text components. A stroke is a contiguous part of an image that forms a band of approximately constant width. Constant stroke width is one of the important features that separate text from other components of a scene. The method uses a local image operator together with geometrical reasoning to find places with the same stroke width and thereby identify regions containing text.

Yi et al. [5] described a method that uses gradient features and color homogeneity of character components to extract candidate text regions. Character candidates are then grouped to detect text strings, on the basis of structural features of characters in a text string such as differences in character size, distances between neighboring characters and alignment of characters.
Gatos et al. [6] described a methodology for text detection in natural scene images that is based on an efficient binarization and enhancement technique followed by a connected component analysis procedure. Starting from the original image, the method produces a binary image and an inverted binary image. Connected components are then extracted from these complementary images. Text verification is conducted at character level and word level on the candidate connected components. Finally, the text regions localized in the two images are refined and merged in post-processing.

2.3 Hybrid based methods

A hybrid method is a combination of texture based and connected component based methods.

Pan et al. [7] described a hybrid approach. First, a text region detector generates a text estimation map, which helps in the segmentation of text components by local binarization. Nontext components are then filtered by a conditional random field model. Finally, text components are grouped into text lines by a learning based energy minimization method.

Liu et al. [8] described a hybrid method based on the assumption that characters have closed contours and that a character string contains characters lying on a straight line. The method extracts the text region by extracting closed contours and searching for their neighbors.

III. PROPOSED METHOD

This section describes the techniques used in the proposed methodology.

3.1 Overview of proposed method

The block diagram of our system is shown in Fig 2.

Fig 2: Overview of proposed system

As shown in the diagram, the method consists mainly of the following steps: connected component extraction, clustering with the help of an AdaBoost classifier, feature extraction for SVM classification, and classification of clusters into text and nontext components. For CC extraction we use the MSER algorithm. An AdaBoost classifier that works on the adjacency relationship between the CCs is used for clustering. We then extract features and classify the clusters as text or nontext components with an SVM classifier.

3.2 Connected component extraction

Although there are many CC extraction methods, we use the MSER algorithm because of its low computation cost and high performance. The MSER algorithm finds connected components that are brighter or darker than their surroundings. It extracts the parts of the image where local binarization is stable over a wide range of thresholds. This property helps us to extract most of the text components in the image. Fig 4 shows the result of MSER extraction on the input image shown in Fig 3.

Fig 3: Input image

Fig 4: Result of MSER extraction

3.3 Clustering of CCs

Clustering groups the CCs based on their adjacency relationship with the help of an AdaBoost classifier.

3.3.1 Building of training sets

Our classifier is based on the pairwise adjacency relationship between connected components extracted using MSER. To build the training set for the classifier, we obtain a collection of CCs by applying MSER extraction to the set of training images. Then, for every pair of extracted CCs, we check whether they are adjacent and whether they belong to the text component set.
From these pairs we build a set of positive and negative examples. The positive set contains pairs that are adjacent and in which both CCs belong to the text component set. Negative samples are pairs of CCs in which one CC belongs to the text component set and the other belongs to the nontext set.

3.3.2 Adaboost learning and clustering of CCs

With the collected samples, we train an AdaBoost classifier which tells us whether two given CCs are adjacent or not. To train the classifier we use one color based property and four geometrical properties of the CCs. First we construct a bounding box on each CC and record its height and width. For each pair of CCs, we estimate the vertical overlap, horizontal overlap and horizontal distance between the bounding boxes, denoted by $vo_{ij}$, $ho_{ij}$ and $d_{ij}$ respectively (Eqs. 1-3), together with the color distance between the two CCs. We calculate these features for both positive and negative samples and train the AdaBoost classifier on them. The output of the classifier is set to +1 for CCs that are adjacent and -1 for CCs that are not adjacent. We check this adjacency for all pairs of CCs extracted using MSER, and then cluster the CCs with the union-find set algorithm.

3.3 Feature extraction

After clustering we obtain a set of clusters which includes text as well as nontext regions. To classify them as text or nontext components we use an SVM classifier, for which we have to extract features from the clusters. We divide each cluster into overlapping square blocks and extract features from each square block. Each square block is divided into 4 vertical and 4 horizontal blocks, and features are extracted from them. For a horizontal block, the features are a) the number of white pixels, b) the number of vertical white-black transitions and c) the number of vertical black-white transitions. The features for a vertical block are defined similarly.
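As a rough illustration of the clustering step and the block features described above, the following Python sketch may help. It is not the authors' implementation. The normalised definitions of the vertical overlap, horizontal overlap and horizontal distance are assumptions, since the paper's own equations are not reproduced here. The color distance is omitted, and the trained AdaBoost classifier is replaced by a hypothetical hand-written threshold rule (`toy_adjacency`). The layout of the 24 block features (4 horizontal strips followed by 4 vertical strips, 3 counts each) is one possible reading of the description.

```python
from itertools import combinations


def pair_features(a, b):
    """Features for two bounding boxes given as (x, y, width, height).

    ASSUMPTION: overlaps and distance are normalised by the smaller box
    dimension; these are illustrative definitions, not the paper's equations.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    vo = max(0, min(ay + ah, by + bh) - max(ay, by)) / min(ah, bh)  # vertical overlap
    ho = max(0, min(ax + aw, bx + bw) - max(ax, bx)) / min(aw, bw)  # horizontal overlap
    d = max(0, max(ax, bx) - min(ax + aw, bx + bw)) / min(ah, bh)   # horizontal gap
    return vo, ho, d


def toy_adjacency(a, b):
    """Hypothetical stand-in for the trained classifier: returns +1 or -1."""
    vo, _, d = pair_features(a, b)
    return 1 if vo > 0.5 and d < 1.0 else -1


def cluster(boxes, is_adjacent):
    """Group box indices with union-find over all pairs judged adjacent."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        if is_adjacent(boxes[i], boxes[j]) == 1:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def block_features(block):
    """24 features from a square binary block (rows of 0/1, 1 = white).

    ASSUMPTION: the block is cut into 4 horizontal strips, then 4 vertical
    strips; rows or columns left over when the side is not a multiple of 4
    are ignored.
    """
    step = len(block) // 4
    transposed = [list(col) for col in zip(*block)]
    feats = []
    for rows in (block, transposed):  # horizontal strips, then vertical strips
        for s in range(4):
            strip = rows[s * step:(s + 1) * step]
            white = sum(sum(r) for r in strip)
            n_wb = n_bw = 0
            for line in zip(*strip):  # scan across the strip's short side
                for p, q in zip(line, line[1:]):
                    n_wb += (p, q) == (1, 0)  # white-to-black transition
                    n_bw += (p, q) == (0, 1)  # black-to-white transition
            feats += [white, n_wb, n_bw]
    return feats


# Three boxes on one text line and one far-away box.
boxes = [(0, 0, 10, 20), (12, 0, 10, 20), (24, 2, 10, 20), (100, 100, 10, 20)]
clusters = cluster(boxes, toy_adjacency)

# An 8x8 block of alternating white and black rows.
stripes = [[1] * 8 if r % 2 == 0 else [0] * 8 for r in range(8)]
features = block_features(stripes)
```

In this sketch the three boxes on the same line end up in one cluster and the distant box forms its own. In a real system the `is_adjacent` argument would be the trained classifier's prediction, and the feature vectors from `block_features` would be passed to the SVM.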
3.4 SVM classification

To train the SVM, we first apply our connected component extraction, clustering and feature extraction steps to the training images, and then train a support vector machine classifier to classify each square block as a text or nontext component. For a test image, we perform all of the above steps and finally integrate the decisions for all the square blocks of a cluster. If the number of square blocks classified as text is greater than the number classified as nontext, the cluster is classified as a text component.

Fig 5: Result of clustering on input image

Fig 6: Text region detected from input image

IV. CONCLUSION

Due to complicated backgrounds and unpredictable text appearances, scene text detection is still a challenging problem. We have presented in this paper an improved scene text detection method that makes use of two machine learning classifiers: one for identifying candidate text components and the other for classifying components as text or nontext. Our method is designed to work correctly on images with text strings arranged horizontally. Our future work will focus on developing an efficient learning based algorithm that extracts text from complex backgrounds and text of arbitrary orientation.

ACKNOWLEDGEMENTS

Every success stands as a testimony not only to the hardship but also to the hearts behind it. Likewise, the present work has been undertaken and completed with direct and indirect help from many people, and I would like to acknowledge all of them for the same.

REFERENCES

[1] S. A. Angadi and M. M. Kodabagi, Text region extraction from low resolution natural scene images using texture features, 2nd International Advance Computing Conference, IEEE, 2010, pp. 121-128.
[2] K. I. Kim, K. Jung, and J. H. Kim, Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm, IEEE Trans. PAMI, vol. 25, no. 12, pp. 1631-1639, 2003.
[3] J. Gllavata, R. Ewerth, and B. Freisleben, Text Detection in Images Based on Unsupervised Classification of High-Frequency Wavelet Coefficients, Proc. of Int'l Conf. on Pattern Recognition, Cambridge, UK, 2004, pp. 425-428, ICPR.2004.1334146.
[4] B. Epshtein, E. Ofek, and Y. Wexler, Detecting text in natural scenes with stroke width transform, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp. 2963-2970, CVPR.2010.5540041.
[5] Yingli Tian and Chucai Yi, Text string detection from natural scenes by structure based partition and grouping, IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594-2605, 2011.
[6] B. Gatos, I. Pratikakis, and S. J. Perantonis, Towards text recognition in natural scene images, in Proceedings of Int. Conf. Automation and Technology, 2005, pp. 354-359.
[7] Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, Text Localization In Natural Scene Images Based On Conditional Random Field, ICDAR, 2009, pp. 6-10.
[8] Y. Liu, S. Goto, and T. Ikenaga, A contour-based robust algorithm for text detection in color images, IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1221-1230, 2006.
[9] H. Koo and D. Kim, Scene text detection via connected component clustering and non-text filtering, IEEE Trans. Image Proc., vol. 22, no. 6, pp. 2296-2305, 2013.
[10] P. Shivakumara, T. Q. Phan, L. Shijian and C. L. Tan, Gradient Vector Flow and Grouping Based Method for Arbitrarily-Oriented Scene Text Detection in Video Images, IEEE Trans. CSVT, 2013, pp. 1729-1739.