
RESEARCH REPORT IDIAP

ASYMMETRIC FILTER FOR TEXT RECOGNITION IN VIDEO

Datong Chen, Kim Shearer
IDIAP, Case Postale 592, Martigny, Switzerland

IDIAP RR 00-37, Nov. 2000

Institut Dalle Molle d'Intelligence Artificielle Perceptive · CP 592 · Martigny · Valais · Suisse
telephone +41 27 721 77 11 · fax +41 27 721 77 12 · e-mail secretariat@idiap.ch · internet http://www.idiap.ch


Stripes are a common sub-structure of text characters, and the scale of the stripes does not vary significantly within a character. In this paper a new form of filter is derived from the Gabor filter which can efficiently estimate the scales of such stripes. The contrast of text in video can then be increased by enhancing the edges of those stripes found to have a suitable scale. The algorithm presented enhances the stripes in three selected scale ranges. Character recognition is then performed on the output of binarizing these enhanced images, and shows improvement over other methods.

1 Introduction

Video text detection and recognition is a useful technology for building text-based video indexing systems. Video text, including captions and embedded scene text, provides precise and explicit information such as the names of people, organizations, locations, dates, times and scores, which are powerful resources for video indexing and annotation. The idea of combining text-based searching technologies with text recognition technologies has become a popular solution for video indexing in recent years [11][12][13]. As text-based searching and retrieval technologies are well developed [1][2], there is high demand for a system that can detect and recognize unconstrained text against any background in video. In contrast with OCR text printed against a clean background, video text may appear over many kinds of indoor or outdoor scenes. Therefore, some properties of the video text, such as scale and alignment, cannot be obtained easily unless the text can be segmented from the video frames. To address this problem, much previous work presents methods based on region analysis and texture analysis. Region-based methods first roughly segment the input video frames into regions.
The text regions are then selected by checking features such as the maximum and minimum size of the region, the width/height ratio of the region, contrast between the segments and the background, spatial frequency of the region, alignment and movement [4][11][9][10]. Region-based text detection methods require that the gray level of the text be consistent, and they cannot improve the segmentation result obtained at the first step of the algorithm. Texture-based methods use the texture property of the text string to search for text in the video. Wu et al. [5][6][7] proposed an algorithm using statistical properties of the pixel values throughout the Gaussian scale space to segment the input image into text regions and background. The candidate text regions are refined under heuristic constraints, such as height similarity, spacing, linear alignment and fixed spatial frequency. Some texture features of the text string can also be found in single-scale images. Zhong [8] uses the spatial variance of the input image as a feature for locating text. In [14], the texture feature is extracted using the Haar wavelet transform. Texture-based methods are robust to small gray-level variations of the text but cannot always locate the text accurately [15].

In this paper, we present a method for detecting and segmenting text using local sub-structures. We locate the sub-structures of the text using edge detection and estimate the scale of each sub-structure using a family of filters, which is presented in section 2. We then select three scale ranges and enhance the contrast of these sub-structures as character strokes in each scale range individually to improve the performance of text segmentation.

2 Asymmetric Filter

The family of two-dimensional Gabor filters G_{λ,θ,φ}(x, y), proposed by Daugman [16], is often used to obtain the spatial frequency of the local pattern in an image:

    G_{λ,θ,φ}(x, y) = exp( -(x'² + γ² y'²) / (2σ²) ) · cos( 2π x'/λ + φ )

    x' = x cos θ + y sin θ
    y' = -x sin θ + y cos θ

Fig. 1: Asymmetric filters.

Here the arguments x and y specify the position of the pixels over the image domain, the parameter θ specifies the orientation of the filter, the parameter γ determines the spatial aspect ratio, and 1/λ is called the spatial frequency. These filters provide the conjointly optimal resolution for the orientation, spatial frequency and position of the local image structure. However, text detection requires the precise position and scale of the text. We propose a family of asymmetric filters based on Gabor filters to obtain precise scale information of the local image structures. Our family of filters is divided into two groups: edge-form filters and stripe-form filters. The edge-form filters E_{λ,θ}(x, y) are Gabor filters with φ = π/2:

    E_{λ,θ}(x, y) = exp( -(x'² + γ² y'²) / (2σ²) ) · cos( 2π x'/λ + π/2 )
    x' = x cos θ + y sin θ
    y' = -x sin θ + y cos θ

The stripe-form filters S_{λ,θ}(x, y) are defined as Gabor filters with a translation of (λ/4, 0):

    S_{λ,θ}(x, y) = exp( -(x'² + γ² y'²) / (2σ²) ) · cos( 2π x'/λ )
    x' = x cos θ + y sin θ - λ/4
    y' = -x sin θ + y cos θ

The edge-form and stripe-form filters keep most of the properties of the Gabor filters except for the specified translation of the position and the phase offset. Figure 1 shows the pattern of the edge-form filters and stripe-form filters in 8 orientations with γ = 0.92. A special property of these two groups of filters is very useful for finding the scale of the local image structure. Experiments show that if the pixel (x, y) is on the edge of a stripe structure, the responses
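The two filter groups can be sketched in code. This is a minimal sketch, not the authors' implementation: the function name and parameters are illustrative, a square sampling grid is assumed, and the coupling σ = 0.5λ between the Gaussian envelope and the wavelength is an assumption the paper does not fix.

```python
import numpy as np

def asymmetric_kernel(size, lam, theta, gamma=0.92, sigma=None, form="edge"):
    """Build an edge-form or stripe-form filter kernel on a size x size grid.

    Illustrative sketch: sigma defaults to 0.5 * lam (an assumed
    bandwidth/wavelength coupling, not specified in the paper).
    """
    if sigma is None:
        sigma = 0.5 * lam
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    if form == "edge":
        # edge-form filter: Gabor with phase offset pi/2, no translation
        xp = x * np.cos(theta) + y * np.sin(theta)
        phase = np.pi / 2
    else:
        # stripe-form filter: zero phase, kernel translated by lam/4 along x'
        xp = x * np.cos(theta) + y * np.sin(theta) - lam / 4.0
        phase = 0.0
    yp = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xp ** 2 + gamma ** 2 * yp ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xp / lam + phase)
```

Convolving an image with banks of these kernels over a range of scales λ yields the per-pixel edge-form and stripe-form responses compared in the next section.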

of the stripe-form filters are smaller than the responses of the edge-form filters when the scale of the filters is rather smaller than the scale of the stripe (S_{λ,θ}(x, y) < E_{λ,θ}(x, y)), but greater when the scale of the filters is rather larger than the scale of the stripe (E_{λ,θ}(x, y) < S_{λ,θ}(x, y); see figure 2).

Fig. 2: Scale-adaptive property of the asymmetric filters (typical multi-scale filter responses of the edge detector and the stripe detector).

The scale at which the stripe-form filter response surpasses the edge-form filter response is called the surpass scale. For any pixel on the edges, given the surpass scale s, we compute the filter responses at the two other scales s/2 and 3s/2. A neural network is trained with these five filter responses (the responses of the edge-form filter and the stripe-form filter at the surpass scale are the same value) to find the scale of the local edge.

3 Algorithm

An off-line text-based video indexing system consists of text annotation and a text search engine. The search engine queries the text annotation with keywords input by the user and retrieves the matching video periods.

3.1 Text Detection and Recognition

The text recognition algorithm can be summarized as follows:

1. The edges in the input video frame are extracted using the Canny algorithm.

2. In order to select the candidate pixels on the edges, we segment the frame into n × n blocks, where n equals half the size of the smallest scale of the sub-structures of the text (here n = 2). Only one edge point is selected as the candidate pixel in each block: the pixel with the maximum energy.

3. The scales of the candidate pixels are estimated with the asymmetric filters and the neural network discussed in the previous section. Pixels with extremely small or extremely large scales are filtered out because these values are outside our search range. In this paper, the scale range is 3 to 50.
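Step 2 above (keeping one candidate edge pixel per block) can be sketched as follows. This is an illustrative sketch: `edge_mask` and `magnitude` are assumed to be the binary edge map and gradient magnitude produced by a Canny-style detector, and the function name is hypothetical.

```python
import numpy as np

def candidate_pixels(edge_mask, magnitude, n=2):
    """Select at most one candidate per n x n block: the edge pixel
    with the maximum gradient energy within that block (sketch of
    step 2; n = 2 matches the paper's choice)."""
    h, w = edge_mask.shape
    candidates = []
    for by in range(0, h, n):
        for bx in range(0, w, n):
            block_mask = edge_mask[by:by + n, bx:bx + n]
            if not block_mask.any():
                continue  # no edge point in this block
            # ignore non-edge pixels when searching for the maximum
            block_mag = np.where(block_mask, magnitude[by:by + n, bx:bx + n], -np.inf)
            dy, dx = np.unravel_index(np.argmax(block_mag), block_mag.shape)
            candidates.append((int(by + dy), int(bx + dx)))
    return candidates
```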

4. With the filters at the scales and orientations of the candidate pixels, we can reconstruct the image pattern. The image pattern is reconstructed at three scale levels: the first level L_1 includes the scales between 3 and 9, the second level L_2 consists of the scales from 7 to 30, and the last level L_3 is from 26 to 50. For each candidate pixel (x_i, y_i), with orientation θ_i and filter scale λ_i, the three reconstructed image patterns are defined as:

    I_r^k = Σ_{i=0}^{n} F_k(i),   k = 1, 2, 3
    F_k(i) = 0               if λ_i ∉ L_k
    F_k(i) = S_{λ_i, θ_i}    if λ_i ∈ L_k

5. The original image is then enhanced by adding it to each of the three reconstructed image patterns I_r^k individually:

    I_e^k(x, y) = I_org(x, y) + I_r^k(x, y),   k = 1, 2, 3

We then binarize these three enhanced images using Otsu's method and mask off the pixels with zero value in the reconstructed image patterns. These three binary images are used as the inputs of a commercial OCR software system to recognize the text characters.

4 Experiment and Result

Experiments are based on 3 video streams with a total of 4000 frames (512×512). The text in the videos is both superimposed text and scene text, with different alignments and movements. When applying our algorithm, we constrain the minimum scale of the sub-structures of the text to 3 pixels and the maximum scale to 50 pixels. Text characters are assumed to appear in at least 8 consecutive frames. Examples of image patterns, enhanced images and binary images in the three scale ranges, together with the binary result of MoCA, are shown in figure 3. The final recognition results with the TypeRead OCR package can be found in table 1, which shows the comparison between the results of our algorithm and the results with MoCA.

Tab. 1: Recognition results. The region analysis algorithm employs the MoCA algorithm without motion analysis. Text 1: superimposed text. Text 2: scene text.

    algorithm | text | frames | recognition rate
    MoCA      | 1    | 4000   | 36.1%
    MoCA      | 2    | 4000   |  7.4%
    Ours      | 1    | 4000   | 70.6%
    Ours      | 2    | 4000   | 11.4%
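Steps 4 and 5 can be sketched as follows. This is a sketch under assumptions: `patterns` is assumed to already hold the reconstructed image patterns I_r^k (the stamping of stripe-form kernels at candidate pixels is omitted), and the Otsu implementation is a generic one, not the paper's.

```python
import numpy as np

# Scale levels from the paper: L1 = [3, 9], L2 = [7, 30], L3 = [26, 50].
LEVELS = [(3, 9), (7, 30), (26, 50)]

def enhance_and_binarize(image, patterns):
    """Sketch of steps 4-5: for each level k, enhance the original
    image with the reconstructed pattern I_r^k, threshold with Otsu's
    method, and mask off pixels where the pattern is zero."""
    results = []
    for i_r in patterns:
        enhanced = image.astype(float) + i_r      # I_e^k = I_org + I_r^k
        t = otsu_threshold(enhanced)
        binary = (enhanced > t) & (i_r != 0)      # mask zero-pattern pixels
        results.append(binary)
    return results

def otsu_threshold(img, bins=256):
    """Minimal generic Otsu threshold (maximize between-class variance)."""
    hist, edges = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # class-0 probability per threshold
    mu = np.cumsum(p * centers)       # class-0 cumulative mean mass
    mu_t = mu[-1]                     # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    return centers[np.nanargmax(var_between)]
```

The three binary images produced this way would then be handed to the OCR engine, as described in step 5.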
5 Conclusion

Stripes are a very common sub-structure of characters, and their scale does not vary much within one text character. Using our family of asymmetric filters, we can estimate the scale of a stripe efficiently with a binary search algorithm. It is therefore possible to enhance contrast at only those edges most likely to represent text in the scale range of interest. The experimental results show that the approach presented in this paper can improve the recognition rate of superimposed text significantly. The main disadvantage of this method is that it cannot filter out all background from the frames: some local structures of the background may contain similar stripe structures. These are regarded as part of the text characters, and the OCR system makes the final decision.
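The binary search for the surpass scale mentioned above might look like the following sketch. It assumes the crossover between the two responses is monotone in scale (stripe below edge before the surpass scale, at or above it afterwards); the function names and the response-callback interface are illustrative, not the paper's API.

```python
def surpass_scale(edge_resp, stripe_resp, lo=3, hi=50):
    """Binary search for the smallest scale in [lo, hi] at which the
    stripe-form response reaches the edge-form response (the 'surpass
    scale'). edge_resp(s) and stripe_resp(s) return the two filter
    responses at one pixel for integer scale s."""
    if stripe_resp(lo) >= edge_resp(lo):
        return lo
    if stripe_resp(hi) < edge_resp(hi):
        return None  # no crossover inside the search range
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if stripe_resp(mid) >= edge_resp(mid):
            hi = mid   # crossover is at mid or below
        else:
            lo = mid   # crossover is above mid
    return hi
```

Because the search halves the interval each step, the 3-50 scale range of the paper is resolved in at most six response evaluations per pixel.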

Fig. 3: Row 1: original frame image (left), MoCA result (right); row 2: reconstructed image patterns in 3 scale ranges; row 3: enhanced images in 3 scale ranges; row 4: binarization results of the 3 enhanced images, with Otsu's method. The images from left to right are L_1, L_2, L_3.

References

[1] M. Bokser, "Omnidocument technologies", Proc. IEEE, 80(7):1066-1078, July 1992.
[2] S. V. Rice, F. R. Jenkins, and T. A. Nartker, "OCR accuracy: UNLV's fifth annual test", INFORM, 10(8), September 1996.
[3] L. O'Gorman and R. Kasturi, "Document Image Analysis", IEEE Computer Society Press, Los Alamitos, 1995.
[4] J. Ohya, A. Shio, and S. Akamatsu, "Recognizing characters in scene images", IEEE Trans. Pattern Analysis and Machine Intelligence, 16(2):214-220, 1994.
[5] V. Wu, R. Manmatha, and E. M. Riseman, "Finding text in images", in Proc. ACM Int. Conf. on Digital Libraries, 1997.
[6] V. Wu, R. Manmatha, and E. M. Riseman, "TextFinder: an automatic system to detect and recognize text in images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1224-1229, 1999.
[7] V. Wu and R. Manmatha, "Document image clean-up and binarization", in Proc. SPIE Symposium on Electronic Imaging, 1998.
[8] Y. Zhong, K. Karu, and A. K. Jain, "Locating text in complex color images", Pattern Recognition, 28(10):1523-1536, 1995.

[9] R. Lienhart and W. Effelsberg, "Automatic text segmentation and text recognition for video indexing", Technical Report TR-98-009, University of Mannheim, Mannheim, 1998.
[10] R. Lienhart, "Automatic text recognition in digital videos", in Proc. SPIE, Image and Video Processing IV, January 1996.
[11] R. Lienhart, "Indexing and retrieval of digital video sequences based on automatic text recognition", in Proc. 4th ACM International Multimedia Conference, Boston, November 1996.
[12] T. Sato, T. Kanade, E. K. Hughes, and M. A. Smith, "Video OCR for digital news archives", in IEEE Workshop on Content-Based Access of Image and Video Databases, Bombay, January 1998.
[13] T. Sato, T. Kanade, E. K. Hughes, M. A. Smith, and S. Satoh, "Video OCR: indexing digital news libraries by recognition of superimposed captions", in ACM Multimedia Systems, Special Issue on Video Libraries, Feb. 1998.
[14] H. Li and D. Doermann, "Text enhancement in digital video using multiple frame integration", in Proc. ACM Multimedia, 1999.
[15] A. K. Jain and B. Yu, "Automatic text localization in images and video frames", Pattern Recognition, 31(12):2055-2076, 1998.
[16] J. G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters", Journal of the Optical Society of America A, vol. 2, pp. 1160-1169, 1985.