A Gradient Difference based Technique for Video Text Detection

Palaiahnakote Shivakumara, Trung Quy Phan and Chew Lim Tan
School of Computing, National University of Singapore
{shiva, phanquyt, tancl}@comp.nus.edu.sg

Abstract

Text detection in video images has received increasing attention, particularly scene text detection in video images, as it plays a vital role in video indexing and information retrieval. This paper proposes a new and robust gradient difference technique for detecting both graphics and scene text in video images. The technique introduces the concept of zero crossing to determine the bounding boxes for the detected text lines in video images, rather than using the conventional projection profile based method, which fails to fix bounding boxes when there is no proper spacing between the detected text lines. We demonstrate the capability of the proposed technique by conducting experiments on video images containing both graphics text and scene text with different font shapes and sizes, languages, text directions, backgrounds and contrasts. Our experimental results show that the proposed technique outperforms existing methods in terms of detection rate on a large video image database.

1. Introduction

Since the 1990s, with the rapid growth of available multimedia and the increasing demand for information indexing and retrieval, much effort has been devoted to text detection in video images [1]. A large number of approaches have been proposed and have already obtained impressive performance under some constraints [1]. But detecting text in video without any constraints remains challenging and interesting due to many undesirable properties of video images, such as low resolution, low contrast, unknown text color, size, position and orientation, color bleeding, and unconstrained backgrounds [2, 3]. The two types of text in video are: (1) caption/graphics/artificial text, which is artificially superimposed on the video by humans, and (2) scene text, which naturally occurs during video capture. Obviously, scene text detection is a more challenging task than graphics text detection due to varying lighting, complex movement and transformation [1].

From the literature review it is realized that connected-component based methods are simple but not robust, because they rely on the geometrical properties of components [4]. On the other hand, texture based methods may be unsuitable for small fonts and poor contrast text [5, 6]. In contrast to the preceding two approaches, edge and gradient based methods are fast and efficient but give more false positives when a complex background is present [7-9]. The major problem of these methods, however, is in choosing threshold values to classify text and non-text pixels. A method based on uniform colors in the L*a*b* space is also proposed in [10] to locate uniformly colored text in video frames. Obviously, this method fails when text in video contains multiple colors within a text line or a word. The above observations show that there is a demand for a robust technique that gives a better detection rate with fewer false alarms, without any constraints, for text detection in video images.

Hence, in this paper, we propose a new and robust gradient difference technique for detecting text in video images. We observe that high positive and negative gradient values exist near or on text pixels, compared with the gradients of non-text pixels. This observation motivated us to propose a gradient difference technique for text detection in video images. Further, instead of the conventional projection profile based method, we introduce a zero crossing technique for fixing the boundaries of text lines in video images.
2. Text Detection Algorithm

2.1 Gradient Difference for Text Detection

It is noted in [9] that gradient information in text areas differs from that in non-text regions because of the high contrast of text. This is the basis of our gradient difference technique. For a given gray image, as shown in Figure 1(a), the technique computes the horizontal gradient image (G) by using a horizontal mask [-1 1], which gives rise to Figure 1(b). Then the Gradient Difference (GD) is obtained for each pixel in G as the difference between the maximum and minimum gradient values within a local window of size 1 × n centered at the pixel, where n is a value that depends on the character stroke width. In this study, we choose n = 11, keeping small fonts in mind.

High positive and negative gradient values in text regions result from the high intensity contrast between the text and background regions. Therefore, text regions will have both large positive and large negative gradients in a local region, due to the even distribution of character strokes. This results in locally large GD values. To detect such large values, the technique determines a threshold (T) automatically. The result is shown in Figure 1(c), where the text can be seen clearly as white patches and the background as dark regions. Small isolated white patches in Figure 1(c) are removed, and the output is shown in Figure 1(d). Boundaries for the white patches representing text lines are computed using a zero crossing technique, which will be discussed in Section 2.2, as shown in Figure 1(e). Figure 1(f) shows the text blocks detected and Figure 1(g) shows the extracted text blocks.

Figure 1. Text detection: (a) input image, (b) gradient image, (c) text segments, (d) small white patches removed, (e) text boundary, (f) text blocks detected, (g) text blocks extracted.

More specifically, the algorithm for detecting text in video images is as follows. Let F(x, y) be the given gray image, G(x, y) be the gradient image obtained by convolving the horizontal mask [-1 1] with F(x, y), and W(x, y) be the local window of size 1 × 11 centered at (x, y). Obtain the minimum and the maximum gradient values in W over G(x, y) as follows:

    Min(x, y) = min_{(x, y) ∈ W(x, y)} G(x, y)    (1)
    Max(x, y) = max_{(x, y) ∈ W(x, y)} G(x, y)    (2)

Using equations (1) and (2), compute GD(x, y) as follows:

    GD(x, y) = Max(x, y) − Min(x, y)    (3)

Then a pixel at (x, y) is classified as a text pixel if GD(x, y) > T, and as a non-text pixel otherwise.    (4)

A global threshold (T) is determined based on the average value of the gradient difference, computed as follows. First we compute the average gradient value as

    AVG = (1 / (n × m)) Σ_{x=1..n} Σ_{y=1..m} G(x, y)    (5)

where n, m are the dimensions of the gradient image. Next we count the number of high gradient values as

    NHG = count(G(x, y) > AVG)    (6)

The sum of GD is computed as

    SGD = Σ_{x=1..n} Σ_{y=1..m} GD(x, y)    (7)

Finally, the value of T is computed as

    T = SGD / ((n × m) − NHG)    (8)

A graphical representation of GD obtained by the text detection algorithm before and after thresholding is given in Figure 2. It can be seen in Figure 2(d) that non-text areas are suppressed by the threshold T.

Figure 2. Text and background separation: (a) GD before thresholding, (b) 3D graph for (a), (c) GD after thresholding, (d) 3D graph for (c).
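To make equations (1)-(8) concrete, a minimal NumPy/SciPy sketch of the gradient difference map and the automatic threshold is given below. It is an illustrative reading of the equations rather than the authors' MATLAB implementation; the function name gradient_difference_map, the use of scipy.ndimage filters and their default border handling are our own assumptions.

    import numpy as np
    from scipy.ndimage import maximum_filter, minimum_filter

    def gradient_difference_map(gray, n=11):
        """Gradient difference map GD, automatic threshold T and the resulting
        binary text-candidate mask (a sketch of equations (1)-(8))."""
        gray = np.asarray(gray, dtype=np.float64)

        # Horizontal gradient with the mask [-1 1] (difference of adjacent columns).
        G = np.zeros_like(gray)
        G[:, :-1] = gray[:, 1:] - gray[:, :-1]

        # Eqs. (1)-(3): min and max over a 1 x n window centered at each pixel.
        Min = minimum_filter(G, size=(1, n))
        Max = maximum_filter(G, size=(1, n))
        GD = Max - Min

        # Eqs. (5)-(8): global threshold from the average gradient value.
        AVG = G.mean()                      # (5)
        NHG = np.count_nonzero(G > AVG)     # (6)
        SGD = GD.sum()                      # (7)
        T = SGD / (G.size - NHG)            # (8)

        # Eq. (4): text candidates are pixels whose GD exceeds T.
        text_mask = GD > T
        return GD, T, text_mask

Small isolated white patches in the resulting mask could then be removed with a connected-component area filter before the zero crossing step, in line with the processing described for Figure 1(d).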

2.2 Zero Crossing for Fixing Bounding Boxes

The conventional projection profile based method fails to fix bounding boxes for text lines when there is no proper spacing between them. Figure 3(b) shows one such example, where the second and third lines are connected to each other. Such situations are common in video text detection. Therefore, to fix the boundaries of such text lines, we propose a zero crossing technique, which does not require complete spacing between the text lines. A zero crossing means a transition from 0 to 1 or from 1 to 0. The method counts the number of transitions from 0 to 1 and from 1 to 0 in each column, from top to bottom of GD(x, y). As shown in Figure 3(b), GD(x, y) is obtained for Figure 3(a) using the text detection algorithm given in Section 2.1. Next, the method chooses the column which gives the maximum number of transitions to define the boundaries of the text lines. Here we ignore a transition if the distance between two transitions is too small. With the help of the number of transitions, the technique draws horizontal boundaries for the text lines, as shown in Figure 3(d). Further, the technique looks for spacing between the text components within two horizontal boundaries to draw the vertical boundaries for the words and text lines. The detected text blocks are then extracted, as shown in Figure 3(e). Lastly, in order to eliminate false positives, we compute the height, width, aspect ratio, the number of Canny edges, the number of Sobel edges and the number of transitions from 0 to 1 and 1 to 0 in each detected text block. We eliminate a text block as a false positive if the number of Canny edges is too small, the number of transitions is too small, or the absolute difference between the number of Canny edges and the number of Sobel edges is less than 2.

Figure 3. Advantage of the zero crossing technique: (a) input image, (b) text line segments, (d) text line boundary, (e) text blocks extracted.
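To illustrate the zero crossing idea in code, the sketch below scans each column of the binary text-candidate map, records the 0-to-1 and 1-to-0 transitions, discards transitions that are closer together than an assumed minimum gap, and takes the transition rows of the column with the most transitions as horizontal text line boundaries. It is a simplified reading of the description above, not the authors' implementation; the function name and the min_gap parameter are illustrative.

    import numpy as np

    def text_line_boundaries(text_mask, min_gap=3):
        """Estimate horizontal text line boundaries from a binary text-candidate
        mask using column-wise zero crossings (0/1 transitions)."""
        mask = np.asarray(text_mask, dtype=np.int8)

        best_rows, best_count = [], -1
        for col in range(mask.shape[1]):
            column = mask[:, col]
            # Rows where the column value changes (0 -> 1 or 1 -> 0).
            rows = np.flatnonzero(np.diff(column) != 0) + 1

            # Ignore a transition that is too close to the previously kept one.
            kept = []
            for r in rows:
                if not kept or r - kept[-1] >= min_gap:
                    kept.append(r)

            if len(kept) > best_count:
                best_count, best_rows = len(kept), kept

        # Consecutive kept transitions bound one text line each (top, bottom).
        return list(zip(best_rows[::2], best_rows[1::2]))

Vertical word boundaries can then be obtained by searching for gaps between text components inside each (top, bottom) band, and the Canny/Sobel edge-count rules described above can be applied to discard false positives.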

3. Experimental Results

3.1 Dataset and Methods for Comparison

Since there is no benchmark database, we have created our own dataset for the purpose of experimentation. In this dataset, we have included a variety of video images, such as images from movies, news clips (business, sports), news containing some scene text, sports videos (golf, athletics), music videos and web images. It also includes images of multiple languages, such as English, Korean and Chinese. In this experiment, we have selected 488 video images from the above said sources, which give 3231 actual text blocks. The method, implemented in MATLAB, is run on a PC with a Pentium IV 2.33 GHz processor. The approximate processing time for each video image of size 352x288 is about 4 seconds for text detection. We have chosen three existing methods [7, 9, 10] for comparison. Method [7] is based on Sobel edge information for text detection. Method [9] is based on gradient information for text detection. However, as explained in Section 1, these methods suffer from the choice of several thresholds. Method [10] makes use of uniform color for text location.

3.2 Sample Test Results

Figures 4-7 show the text detection results of the proposed method in (a) and of the above three existing methods in (b)-(d) for a variety of sample video images. In (a), we show the original image, the text detection result and the final text extraction result using the proposed method. In (b)-(d), we show only the text detection results of the three existing methods. Figure 4 shows that the three existing methods fail for text detection in a low contrast image, whereas the proposed method detects most of the text in the image correctly.

Figure 4. Text detection for a low contrast image: (a) proposed method, (b) edge based, (c) gradient based, (d) uniform text color.

Figure 5 shows that the proposed method detects both the graphics and the scene text in the athletics image with one false positive, while the three existing methods fail to fix the text line bounding boxes correctly. The gradient based method fails to detect the scene text, but the uniform text color method detects the text successfully, also with a false positive. Figure 6 shows that the proposed method detects the text in the news image, including the small font present at the bottom of the image. While the edge and gradient based methods miss some text, the uniform text color method tends to include additional non-text information in the bounding boxes. The gradient based method appears to detect small font and low contrast text better than the other two existing methods.

Figure 5. Text detection for the athletics image: (a) proposed method, (b) edge based, (c) gradient based, (d) uniform text color.

Figure 6. Text detection for the news image: (a) proposed method, (b) edge based, (c) gradient based, (d) uniform text color.

Figure 7 shows that both the proposed method and the edge based method detect the text in a complex background correctly. On the other hand, the gradient based and uniform text color methods fail to detect the text.

Figure 7. Text detection for the complex background image: (a) proposed method, (b) edge based, (c) gradient based, (d) uniform text color.

3.3 Comparison Metrics

We evaluate the performance of the proposed method by considering the detection rate, false positive rate, misdetection rate and average processing time as decision parameters. The detected text blocks are represented by their bounding boxes. The Average Processing Time (APT) is measured over all images under study. To judge the correctness of the detected text blocks, we manually count the Actual Text Blocks (ATB) in the images of the dataset. We also manually label each detected block as one of the following categories:

Truly detected text block (TDB): a detected block that contains text fully or partially.
Falsely detected text block (FDB): a detected block that does not contain text.
Text block with missing data (MDB): a truly detected text block that misses some characters.

Based on the number of blocks in each of the categories mentioned above, the following metrics are calculated to evaluate the performance of the techniques:

Detection rate (DR) = Number of TDB / Number of ATB.
False positive rate (FPR) = Number of FDB / (Number of TDB + Number of FDB).
Misdetection rate (MDR) = Number of MDB / Number of TDB.
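As a small illustration (not part of the paper), the three rates can be computed directly from the manually counted block categories; the function name below is only a placeholder.

    def evaluation_metrics(atb, tdb, fdb, mdb):
        """Detection, false positive and misdetection rates from block counts."""
        dr = tdb / atb            # Detection rate (DR)
        fpr = fdb / (tdb + fdb)   # False positive rate (FPR)
        mdr = mdb / tdb           # Misdetection rate (MDR)
        return dr, fpr, mdr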

The performance of the proposed technique in comparison with the existing methods is summarized in Table 1 and Table 2. Table 2 shows that the detection rate of the proposed method is higher than that of the three existing methods. Compared with the existing gradient method, the present method degrades somewhat in the false positive rate and the misdetection rate. This is insignificant considering the much higher detection rate of the present method. The average processing time of the present method is also comparable to that of the existing gradient method.

Table 1: Results based on the experimental study for the proposed and existing methods

Method                    ATB    TDB    FDB    MDB
Edge based [7]            3231   1288   112    217
Gradient based [9]        3231   1368   116    0
Uniform text color [10]   3231   1996   379    1035
Proposed                  3231   3085   212    63

Table 2: Performance (%) of the proposed and existing methods based on the values reported in Table 1

Method                    DR     FPR    MDR    APT (sec)
Edge based [7]            39.8   8.0    16.8   25
Gradient based [9]        42.3   7.0    0      3
Uniform text color [10]   61.7   15.9   51.8   42
Proposed                  95.4   9.3    2.0    4

3.4 Experiment on Window Size

We have conducted experiments on the image shown in Figure 6 to choose a proper value of n, which is used in Section 2.1 for detecting text candidates using gradient difference values, as shown in Figure 8.

Figure 8. Choosing n values: (a) n = 4, (b) n = 9, (c) n = 11.

It is noticed from Figure 8 that for n = 4 we lose the low contrast text of the bottom line; for n = 9, the bottom line is restored but the low contrast text on the right side is missed; and for n = 11, the method detects all text lines. Hence we choose n = 11 in this work. Further, it is also noticed in Figure 8(b) that the first line looks cropped, whereas in Figure 8(c) the text line is restored completely.
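As a usage note (not from the paper), this window-size experiment can be reproduced on any grayscale frame by sweeping n with the hypothetical gradient_difference_map sketch given earlier; the variable frame below is assumed to be an already loaded 2-D NumPy array.

    # Sweep the window width n, reusing the illustrative gradient_difference_map
    # sketch from Section 2.1; 'frame' is an assumed grayscale frame array.
    for n in (4, 9, 11):
        GD, T, mask = gradient_difference_map(frame, n=n)
        print(f"n = {n:2d}: threshold = {T:.2f}, candidate pixels = {int(mask.sum())}")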
3.5 Limitation of the Proposed Method

Despite its better performance compared with the existing methods, the proposed method has a limitation in that it fails to fix bounding boxes for staggered text lines or skewed scene text, as shown in Figure 9. A solution to this problem will be addressed in future work.

Figure 9. Failure in fixing bounding boxes by the proposed method: (a) staggered text lines, (b) skewed scene text.

4. Conclusion and Future Work

In this paper, we propose a gradient difference based text detection technique for extracting both graphics text and scene text with different fonts, sizes, scripts, contrasts, orientations and backgrounds. A zero crossing technique for fixing bounding boxes for touching text lines is proposed in place of the projection profile based method. Experimental results show that the proposed method gives a good detection rate compared with the results of three existing methods. For our future work, we plan to use temporal information to reduce the false positive rate and the misdetection rate, because temporal information will help in locating the exact text position in the video images. Furthermore, the method can be extended to fix the bounding boxes for text lines with arbitrary directions by considering a detected text block as a seed point to trace the direction of the remaining text portion.

Acknowledgment

This research is supported in part by IDM R&D grant R252-000-325-279.

References

[1] J. Zhang and R. Kasturi. Extraction of Text Objects in Video Documents: Recent Progress. The Eighth IAPR Workshop on Document Analysis Systems (DAS 2008), Nara, Japan, September 2008, pp. 5-17.
[2] K. Jung, K.I. Kim and A.K. Jain. Text information extraction in images and video: a survey. Pattern Recognition, 37, 2004, pp. 977-997.
[3] Q. Ye, Q. Huang, W. Gao and D. Zhao. Fast and robust text detection in images and video frames. Image and Vision Computing, 23, 2005, pp. 565-576.
[4] A.K. Jain and B. Yu. Automatic Text Location in Images and Video Frames. Pattern Recognition, Vol. 31(12), 1998, pp. 2055-2076.
[5] Y. Zhong, H. Zhang and A.K. Jain. Automatic Caption Localization in Compressed Video. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No. 4, 2000, pp. 385-392.
[6] K.I. Kim, K. Jung and J.H. Kim. Texture-Based Approach for Text Detection in Images using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 12, December 2003, pp. 1631-1639.
[7] C. Liu, C. Wang and R. Dai. Text Detection in Images Based on Unsupervised Classification of Edge-based Features. ICDAR 2005, pp. 610-614.
[8] P. Shivakumara, W. Huang and C.L. Tan. An Efficient Edge based Technique for Text Detection in Video Frames. The Eighth IAPR Workshop on Document Analysis Systems (DAS 2008), Nara, Japan, September 2008, pp. 307-314.
[9] E.K. Wong and M. Chen. A new robust algorithm for video text extraction. Pattern Recognition, 36, 2003, pp. 1397-1406.
[10] V.Y. Mariano and R. Kasturi. Locating Uniform-Colored Text in Video Frames. 15th ICPR, Volume 4, 2000, pp. 539-542.