TEVI: Text Extraction for Video Indexing


Hichem KARRAY, Mohamed SALAH, Adel M. ALIMI
REGIM: Research Group on Intelligent Machines, ENIS, University of Sfax, Tunisia
hichem.karray@ieee.org, mohamed_salah@laposte.net, adel.alimi@ieee.org

Abstract

Efficient indexing and retrieval of digital video is an important aspect of video databases. One powerful index for retrieval is the text appearing in the frames, since it enables content-based browsing. In this paper, we describe a system for detecting and extracting text appearing in video frames. A supervised learning method based on color and edge information is used to detect text regions; afterwards, an unsupervised clustering for text segmentation and binarization is applied using color information. Experimental results demonstrate that the proposed approach is robust to font size, font color, background complexity and language.

Key-words: video OCR, multi-frame integration, text detection, localization, segmentation, neural networks, fuzzy C-means.

I. Introduction

Video continues to be handled as a basic (non-decomposable) object in multimedia documents. Its contents are rarely made explicit, and it is often very difficult to classify them or to extract any knowledge from them. In many applications, such as content-based indexing and retrieval, we need to reach the internal structure of the video and to lay out or handle data of finer granularity, such as text or visual objects. Classification and annotation are usually carried out manually according to a list of keywords chosen by the user. This technique is tiresome, and the automation of the indexing process is of great interest. The extraction of relevant information such as text can provide additional data regarding the semantic contents of these videos. Nevertheless, text detection and recognition encounter several problems.
Although the text is often relatively well contrasted with its environment, it may be superimposed on a heterogeneous and complex background. Moreover, the text itself can be multicoloured and heterogeneous. These characteristics make its extraction difficult. In this paper, we propose an approach to automatically localize, segment and binarize text appearing in video frames. We first apply a new multiple frame integration (MFI) method to minimize the variation of the background of the video frames. Second, a supervised learning method based on color and edge information is used to efficiently detect text regions. Third, an unsupervised clustering for text segmentation and binarization is applied using color information.

II. Text Extraction from Video

Works on text extraction may generally be grouped into four categories: connected component methods [1][3][10], texture classification methods [5][12], edge detection methods [9][13][2][8][11], and correlation based methods [6][4]. The connected component methods detect text by extracting connected components of monotonous colours that obey certain size, shape, and spatial alignment constraints. The texture-based methods treat the text region as a special type of texture and employ conventional texture classification methods to extract

the text. Edge detection methods have been increasingly used for caption extraction due to the rich edge concentration in characters. The correlation based methods are those that use some kind of correlation to decide whether a pixel belongs to a character or not. All the methods mentioned above either do not use temporal information or use it only as a complementary tool. In this paper we present a new approach in which we combine color and edges to extract text.

III. Proposed Method

TEVI is composed of three steps. First, we eliminate from the video frames the columns and rows of pixels which do not contain text. Second, we localize text in the remaining columns and rows. Finally, we extract the text from the frame.

1. Pixel filtering

In our approach, we assume that text must persist for a given duration to be readable, so the temporal aspect plays a key role in the text extraction process. We work on windows of frames; the length of every window is one second, and we operate only on the first and the last frame of every window. We then perform a correlation analysis between the first and the last frame by computing a correlation coefficient on the pixels of the rows (respectively the columns) of these two frames. This coefficient is computed as follows:

r_E = \frac{\sum_i \big(E_{Fst}(i) - \bar{E}_{Fst}\big)\big(E_{Lst}(i) - \bar{E}_{Lst}\big)}{\sqrt{\sum_i \big(E_{Fst}(i) - \bar{E}_{Fst}\big)^2 \; \sum_i \big(E_{Lst}(i) - \bar{E}_{Lst}\big)^2}}    (1)

where E(i) indicates the gray level of pixel i in the cluster of rows (or of columns) E, the subscripts Fst and Lst denote the first and last frame of the window, and \bar{E} is the mean gray level of the cluster. In this step, only the rows and columns containing correlated pixels, which may be text pixels, are kept (Figure 1).

Figure 1: Row and column filtering

2. Text detection and localization

From the remaining row and column clusters, we try to detect and localize text pixels. Every window is represented by two frames: one is the middle frame of the window filtered along the rows, and the other is the middle frame filtered along the columns. For every frame we perform two operations.
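As an illustration, the row and column filtering based on the correlation coefficient of Eq. (1) can be sketched as follows. This is a minimal sketch: the correlation threshold of 0.8 is an assumption, since the paper does not state the value it uses.

```python
import numpy as np

def keep_correlated_rows(first_frame, last_frame, threshold=0.8):
    """Keep the rows whose gray levels remain correlated between the
    first and last frame of a one-second window (Eq. 1).
    The 0.8 threshold is an assumed value, not taken from the paper."""
    fst = np.asarray(first_frame, dtype=float)
    lst = np.asarray(last_frame, dtype=float)
    kept = []
    for r in range(fst.shape[0]):
        a = fst[r] - fst[r].mean()          # centered gray levels, first frame
        b = lst[r] - lst[r].mean()          # centered gray levels, last frame
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        r_e = (a * b).sum() / denom if denom > 0 else 0.0
        if r_e >= threshold:
            kept.append(r)
    return kept
```

Columns are filtered the same way by applying the function to the transposed frames; the surviving rows and columns delimit the candidate text zones.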
First, we perform a transformation from the RGB space to the HSV space. Second, we generate an edge picture using Sobel filters. For every cluster of these frames, we formulate a vector composed of ten features: five representing the HSV image and five representing the edge picture. These features are computed as follows:

f_1(E, I) = M(E, I) / M(I)    (2)

f_2(E, I) = \mu_2(E, I) / \mu_2(I)    (3)

f_3(E, I) = \mu_3(E, I) / \mu_3(I)    (4)

f_4(E, I) = c_sup(E, I) / c_sup(I)    (5)

f_5(E, I) = c_inf(E, I) / c_inf(I)    (6)

E represents a row cluster or a column cluster, and I represents the HSV image or the edge picture.

M(E, I) is the mean of the pixel values of cluster E in picture I, with N the number of pixels in the cluster:

M(E, I) = \frac{1}{N} \sum_{i} E_I(i)    (7)

\mu_2(E, I) is the second order moment:

\mu_2(E, I) = \frac{1}{N} \sum_{i} \big(E_I(i) - M(E, I)\big)^2    (8)

\mu_3(E, I) is the third order moment:

\mu_3(E, I) = \frac{1}{N} \sum_{i} \big(E_I(i) - M(E, I)\big)^3    (9)

c_sup(E, I) is the maximum value of the confidence interval:

c_sup(E, I) = M(E, I) + \frac{t_\alpha \sqrt{\mu_2(E, I)}}{\sqrt{N}}    (10)

c_inf(E, I) is the minimum value of the confidence interval:

c_inf(E, I) = M(E, I) - \frac{t_\alpha \sqrt{\mu_2(E, I)}}{\sqrt{N}}    (11)

The generated vectors are presented to a trained neural network with 3 hidden nodes and one output node. The results of the classification are two images: an image containing the rows considered as text rows and an image containing the columns considered as text columns.
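The cluster features above can be sketched as follows. The 1/N normalization of the moments and the value t_alpha = 1.96 are assumptions of this sketch, since the paper leaves them implicit.

```python
import numpy as np

T_ALPHA = 1.96  # assumed confidence quantile; not given in the paper

def cluster_stats(values):
    """Mean, 2nd and 3rd central moments, and the confidence-interval
    bounds c_sup / c_inf (Eqs. 7-11) for one cluster of pixel values."""
    v = np.asarray(values, dtype=float)
    m = v.mean()
    mu2 = ((v - m) ** 2).mean()
    mu3 = ((v - m) ** 3).mean()
    half = T_ALPHA * np.sqrt(mu2) / np.sqrt(v.size)
    return m, mu2, mu3, m + half, m - half

def feature_vector(cluster_pixels, image_pixels):
    """Five ratio features (Eqs. 2-6) of a row/column cluster against
    the whole picture. Called once on the HSV values and once on the
    Sobel edge values, it builds the 10-dimensional input vector."""
    c = cluster_stats(cluster_pixels)
    g = cluster_stats(image_pixels)
    return [ci / gi if gi != 0 else 0.0 for ci, gi in zip(c, g)]
```

Concatenating the two five-element outputs (HSV picture, edge picture) yields the ten features fed to the neural network.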

Finally, we merge the results of the two images to generate an image containing the zones of text (Figure 2).

Figure 2: Text localization through neural networks

3. Text segmentation

After localizing text in the frame, the following step consists in segmenting and binarizing it. First, we compute the gray-level image. Second, for each pixel p in the text area, we create a vector composed of two features, the standard deviation and the entropy of its 8-neighborhood, which are computed as follows:

std(p) = \sqrt{\frac{1}{8} \sum_{i} \sum_{j} \big(f(i, j) - \bar{f}\big)^2}    (12)

ent(p) = -\sum_{i} \sum_{j} f(i, j) \log f(i, j)    (13)

where f(i, j) indicates the normalized gray level of the pixel at position (i, j) and \bar{f} is the mean gray level of the neighborhood. Third, we run the fuzzy C-means clustering algorithm to classify the pixels into a text cluster and a background cluster. Finally, we binarize the text image by marking text pixels in black (Figure 3). This technique is motivated by two observations: first, text usually has a distinctive texture; second, the borders of text characters result in high-contrast edges. The extracted text is then recognized by an OCR module.

(a) Original frame  (b) Segmented text frame
Figure 3: Example of text binarization

IV. Experimental results

In order to evaluate the proposed automatic text extraction solution, we used a varied database composed of different sources including TV news, commercials, sports and movies

at a resolution of 352x288, with a total of 60 minutes of video containing graphical text in multiple fonts and sizes. For evaluating the text detection performance, the precision and recall metrics have been used:

Recall = CCD / CGT        Precision = CCD / TCD

where CCD is the number of characters correctly detected by the algorithm, CGT is the number of ground-truth characters, and TCD is the total number of characters detected by the algorithm. The results are shown in Table 1. We notice that our approach is more efficient than the method of J. Gllavata et al. [7]. In fact, our approach is more robust to various font sizes, font styles, contrast levels and background complexities, because it uses both color features and edge features to differentiate text pixels from background pixels, and because it is based on a neural network trained on different types of text styles.

TABLE 1: EVALUATION RESULTS OF TEXT LOCALIZATION

                           Recall    Precision
Our approach               96%       93%
J. Gllavata et al. [7]     90%       87%

V. Conclusion

In this paper, we have proposed an approach to automatically localize and segment text appearing in video. The encouraging results show that the proposed method is robust to various font sizes, font styles, contrast levels and background complexities. For this reason, we have integrated our text extraction system into a global video indexing system (Figure 4).

[Figure 4: Video indexation system; offline feature and text extraction over the video database, online query]

The indexing system integrates many features, among which text takes an important place. An offline step is performed on the video database to extract the text; this task is achieved by the TEVI system.

REFERENCES

[1] A. K. Jain and B. Yu, "Automatic text location in images and video frames", Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998.

[2] C. Garcia and X. Apostolidis, "Text detection and segmentation in complex color images", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, pp. 2326-2329, 2000.
[3] C. M. Lee and A. Kankanhalli, "Automatic extraction of characters in complex scene images", International Journal of Pattern Recognition and Artificial Intelligence, Vol. 9, No. 1, pp. 67-82, 1995.
[4] E. K. Wong and M. Chen, "A robust algorithm for text extraction in color video", Proc. IEEE International Conference on Multimedia and Expo, pp. 797-800, 2000.
[5] H. P. Li, D. Doermann, and O. Kia, "Automatic text detection and tracking in digital video", IEEE Trans. on Image Processing, Vol. 9, No. 1, pp. 147-156, 2000.
[6] H. Karray and A. M. Alimi, "Detection and extraction of the text in a video sequence", Proc. IEEE International Conference on Electronics, Circuits and Systems (ICECS 2005), pp. 474-478, 2005.
[7] J. Gllavata, R. Ewerth and B. Freisleben, "A text detection, localization and segmentation system for OCR in images", Proc. IEEE Sixth International Symposium on Multimedia Software Engineering, 2004.
[8] L. Agnihotri, N. Dimitrova and M. Soletic, "Multi-layered videotext extraction method", IEEE International Conference on Multimedia and Expo (ICME), Lausanne, Switzerland, August 26-29, 2002.
[9] L. Agnihotri and N. Dimitrova, "Text detection for video analysis", Workshop on Content-Based Access to Image and Video Libraries, in conjunction with CVPR, Colorado, June 1999.
[10] R. Lienhart and F. Stuber, "Automatic text recognition in digital videos", Proc. SPIE Image and Video Processing IV, Vol. 2666, pp. 180-188, 1996.
[11] X.-S. Hua, X.-R. Chen et al., "Automatic location of text in video frames", Intl. Workshop on Multimedia Information Retrieval (MIR 2001, in conjunction with ACM Multimedia 2001), 2001.
[12] V. Wu, R. Manmatha, and E. M. Riseman, "TextFinder: an automatic system to detect and recognize text in images", IEEE Trans. PAMI, Vol. 21, No. 11, pp. 1224-1229, Nov. 1999.
[13] X. Gao and X.
Tang, "Automatic news video caption extraction and recognition", Proc. Intelligent Data Engineering and Automated Learning (IDEAL 2000), pp. 425-430, Hong Kong, Dec. 2000.
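As a closing illustration of the segmentation step (Section III.3), a minimal two-cluster fuzzy C-means over the per-pixel [std, entropy] vectors could look like the sketch below. The fuzzifier m = 2, the iteration count and the deterministic center initialization are assumptions of this sketch, since the paper does not specify these parameters.

```python
import numpy as np

def fuzzy_cmeans_2(features, n_iter=50, m=2.0):
    """Two-cluster fuzzy C-means over per-pixel feature vectors.
    Assumed parameters (not given in the paper): fuzzifier m = 2,
    50 iterations, centers seeded on the extreme points."""
    x = np.asarray(features, dtype=float)
    # deterministic initialization: the points with the smallest and
    # largest norm seed the two cluster centers
    norms = np.linalg.norm(x, axis=1)
    centers = np.stack([x[norms.argmin()], x[norms.argmax()]])
    p = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # distances of every pixel to both centers (clamped to avoid /0)
        d = np.maximum(np.linalg.norm(x[None] - centers[:, None], axis=2), 1e-12)
        inv = d ** -p
        u = inv / inv.sum(axis=0)          # fuzzy memberships, sum to 1 per pixel
        w = u ** m
        centers = (w @ x) / w.sum(axis=1, keepdims=True)
    return u.argmax(axis=0)                # hard label (0 or 1) per pixel
```

The pixels of the cluster identified as text are then marked black to produce the binary image passed to the OCR module.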