
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 10, OCTOBER 2013

Gradient Vector Flow and Grouping-based Method for Arbitrarily Oriented Scene Text Detection in Video Images

Palaiahnakote Shivakumara, Trung Quy Phan, Shijian Lu, and Chew Lim Tan, Senior Member, IEEE

Abstract: Text detection in videos is challenging due to the low resolution and complex background of video. In addition, the arbitrary orientation of scene text lines in video makes the problem more complex and challenging. This paper presents a new method that extracts text lines of any orientation based on gradient vector flow (GVF) and neighbor component grouping. The GVF of edge pixels in the Sobel edge map of the input frame is explored to identify the dominant edge pixels that represent text components. The method extracts the edge components corresponding to dominant pixels in the Sobel edge map, which we call the text candidates (TC) of the text lines. We propose two grouping schemes. The first finds nearest neighbors based on geometrical properties of the TC to group broken segments and neighboring characters, which results in word patches. The end and junction points of the skeletons of the word patches are then used to eliminate false positives, which yields the candidate text components (CTC). The second scheme uses the direction and size of the CTC to extract neighboring CTC and to restore missing CTC, which enables arbitrarily oriented text line detection in video frames. Experimental results on different datasets, including arbitrarily oriented text data, nonhorizontal and horizontal text data, Hua's data, and ICDAR-03 data (camera images), show that the proposed method outperforms existing methods in terms of recall, precision, and f-measure.

Index Terms: Arbitrarily oriented text detection, candidate text components (CTC), dominant text pixel, gradient vector flow (GVF), text candidates (TC), text components.

Manuscript received July 23, 2012; revised November 30, 2012 and January 25, 2013; accepted February 20, 2013. Date of publication March 28, 2013; date of current version September 28, 2013. This research is supported in part by the A*STAR Grant (WBS no. R252-s). This paper was recommended by Associate Editor S. Battiato. P. Shivakumara is with the Multimedia Unit, Department of Computer Systems and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia (e-mail: hudempsk@yahoo.com). C. L. Tan and T. Q. Phan are with the Department of Computer Science, School of Computing, National University of Singapore, Singapore (e-mail: phanquyt@comp.nus.edu.sg; tancl@comp.nus.edu.sg). S. Lu is with the Department of Computer Vision and Image Understanding, Institute for Infocomm Research (I2R), Singapore (e-mail: slu@i2r.a-star.edu.sg). Color versions of one or more of the figures in this paper are available online. © 2013 IEEE.

I. Introduction

TEXT detection and recognition is a hot topic for researchers in the fields of image processing, pattern recognition, and multimedia. It draws the attention of the content-based image retrieval (CBIR) community because, when text is available in the video, it helps to fill the semantic gap between low-level and high-level features to some extent [1]–[4]. In addition, text detection and recognition can be used to retrieve exciting and semantic events from sports video [5]–[7]. Therefore, text detection and extraction is essential for improving the performance of retrieval systems in real-world applications.
Video contains two types of text: scene text and graphics text. Scene text is part of the image captured by the camera. Examples of scene text include street signs, billboards, text on trucks, and writing on shirts. The nature of scene text is therefore unpredictable compared to graphics text, which is more structured and closely related to the subject. Nevertheless, scene text can be used to uniquely identify objects in sports events, navigate Google Maps, and assist visually impaired people. Since the nature of scene text is unpredictable, it poses many challenges. Among these, arbitrary orientation is the most challenging, as it is not as easy as processing straight text lines. Several methods have been developed for text detection and extraction that achieve reasonable accuracy for natural scene text (camera images) [8]–[13] as well as multi-oriented text [11]. However, most of these methods use a classifier and a large number of training samples to improve text detection accuracy. To tackle the multi-orientation problem, the methods use connected component analysis. For instance, the stroke width transform based method for text detection in scene images by Epshtein et al. [8] works well for connected components that preserve their shapes. Pan et al. [9] proposed a hybrid approach for text detection in natural scene images based on a conditional random field, which involves connected component analysis to label the text candidates. Since the images are of high contrast, connected component based features with classifier training work well and achieve good accuracy. However, the same methods cannot be used directly for text detection in video because of its low contrast and complex background, which cause disconnections, loss of shape, and similar artifacts. In this case, choosing a classifier and geometrical features of the components is not easy, so these methods are not suitable for video text detection.

Plenty of methods have been proposed over the last decade for text detection in video based on connected components [14], [15], texture [16]–[19], and edge and gradient features [20]–[25]. Connected component based methods are good for caption text and uniform color text, but not for text lines with multi-colored characters or text on clutter background. Texture based methods treat the appearance of text as a special texture. These methods handle complex backgrounds to some extent, but at a high computational cost, due to the large number of features and large number of training samples needed to classify text and nontext pixels. The performance of these methods therefore depends on the classifier in use and the number of training samples chosen for text and nontext. A method using edge and texture features without a classifier was proposed by Liu et al. [26] for text detection, but it requires a large number of features to discriminate text from nontext pixels. Sets of texture features without a classifier were also proposed by Shivakumara et al. [27], [28] for accurate text detection in video frames. Although these methods work well for a variety of frames, they require more processing time due to the large number of features, and their scope is limited to horizontal text. Combinations of edge and gradient features offer better text detection accuracy and efficiency than texture based methods. For example, text detection using gradients and statistical analysis of intensity values was proposed by Wong and Chen [21]; this method suffers from grouping text and nontext components together. Color information is used along with edge information for text detection by Cai et al. [22]; this method works well for caption text, but its performance degrades when the font size varies. In general, edge and gradient based methods produce more false positives due to the heuristics used for text and nontext pixel classification.

To the best of our knowledge, none of the methods discussed above properly addresses arbitrarily oriented text detection in video. The reason is that arbitrarily oriented text generally comes from scene text, which poses many more problems than graphics text. Zhou et al. [29] proposed a method for detecting both horizontal and vertical text lines in video using multiple stage verification and effective connected component analysis. This method is good for caption text but not for other text, and its orientation handling is limited to horizontal and vertical. Shivakumara et al. [30] addressed the multi-orientation issue based on the Laplacian and skeletonization. This method gives low accuracy because the skeleton based analysis is not good enough to classify simple and complex components in the presence of clutter background; it is also computationally expensive. Recently, a method [31] based on a Bayesian classifier and boundary growing was proposed to improve accuracy for multi-oriented text detection in video. However, the boundary growing used in that work performs well only when there is sufficient space between text lines; otherwise it includes nontext as text components. Therefore, that method handles only nonhorizontal straight text lines rather than arbitrarily oriented ones, where the space between text lines is often limited. Arbitrary text detection is proposed in [32] using gradient directional features and region growing. This method requires classifying images as horizontal or nonhorizontal, and it fails when an image contains multi-oriented text.
Therefore, it is not effective for arbitrary text detection. Thus, arbitrarily oriented text detection in video is still a challenging and interesting problem. Hence, in this paper, we propose the use of gradient vector flow (GVF) for identifying text components in a novel way. We were motivated by the work in [33], which identifies object boundaries using GVF and shows that the GVF field can move into concave boundaries without sacrificing boundary pixels. This property helps in detecting both high and low contrast text pixels, unlike the gradient in [32], which detects only high contrast text pixels; this is essential for improving the accuracy of video text detection at any orientation.

II. Proposed Methodology

In this paper, we explore GVF for identifying dominant text pixels using the Sobel edge map of the input image for arbitrary text detection in video. We prefer Sobel over other edge operators such as Canny because Sobel gives fine detail for text and less detail for nontext, while Canny produces many erratic background edges along with the fine details of text. Next, the edge components in the Sobel edge map corresponding to dominant pixels are extracted; we call them text candidates (TC). This operation gives representatives for each text line. To tackle arbitrary orientation, we propose a new two-stage grouping criterion for the TC. The first stage grows the perimeter of each TC to identify its nearest neighbors based on the size and angle of the TC and groups them, which gives text components. Before proceeding to the second stage of grouping, we apply skeleton analysis to the text components given by the first stage to eliminate false text components based on junction points. We name this output candidate text components (CTC). In the second stage, we use the tails of the CTC to identify the direction of the text, and the method grows along the identified direction to find the nearest neighbor CTC, which outputs the final result of arbitrarily oriented text detection in video. To the best of our knowledge, this is the first work addressing arbitrarily oriented text detection in video with promising accuracy using GVF information.

A. GVF for Dominant Text Pixel Selection

The GVF is the vector field that minimizes the energy functional defined in (1) [33]

$$\varepsilon = \iint \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) + \left| \nabla f \right|^2 \left| \mathbf{g} - \nabla f \right|^2 \, dx \, dy \tag{1}$$

where g(x, y) = (u(x, y), v(x, y)) is the GVF field and f(x, y) is the edge map of the input image. GVF was used in [33] for object boundary detection, where it was shown to be better than the traditional gradient and the snake. It is also noted in [33] that there are two problems with the traditional gradient operation: 1) the gradient vectors generally have large magnitudes only in the immediate vicinity of the edges, and 2) in homogeneous regions, where pixel values are nearly constant, ∇f is nearly zero.
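For illustration, here is a minimal numerical sketch of the GVF diffusion that minimizes (1), in the spirit of Xu and Prince [33]. This is not the authors' implementation; the parameter values (mu, time step, iteration count) are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import laplace, sobel

def gvf(f, mu=0.2, iterations=80, dt=0.5):
    """Iteratively solve the GVF diffusion equations of Xu and Prince [33].

    f  : 2-D edge map of the frame, floats in [0, 1].
    mu : regularization weight from (1); larger values give smoother fields.
    Returns the GVF field g = (u, v).
    """
    fx = sobel(f, axis=1)        # df/dx
    fy = sobel(f, axis=0)        # df/dy
    b = fx ** 2 + fy ** 2        # |grad f|^2, the data-term weight in (1)
    u, v = fx.copy(), fy.copy()  # initialize the field with the gradient
    for _ in range(iterations):
        # Gradient descent on (1): smoothness (Laplacian) term plus a
        # data-fidelity term that keeps g close to grad f near edges.
        u += dt * (mu * laplace(u) - b * (u - fx))
        v += dt * (mu * laplace(v) - b * (v - fy))
    return u, v
```

The field (u, v) computed this way is what the arrows in Fig. 1(b) visualize: near edges it follows the gradient, while diffusion carries it into homogeneous regions and toward boundary concavities.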

Fig. 1. Dominant point selection based on GVF. (a) Input. (b) GVF. (c) Dominant text pixels. (d) Dominant pixels on input frame.

The GVF is an extension of the gradient that propagates the gradient map farther away from the edges and into homogeneous regions using a computational diffusion process. The inherent competition of the diffusion process creates vectors that point into boundary concavities, which is a special property of the GVF. In summary, GVF propagates gradient information, i.e., magnitude and direction, into homogeneous regions. In other words, GVF helps in detecting multiple forces at corner points of object contours. This cue allows us to use multiple forces at corner points of edge components in the Sobel edge map of the input video frame to identify those points as dominant pixels. Dominant pixel selection removes most of the background information, which simplifies the problem of classifying text and nontext pixels, and it retains text information irrespective of the orientation of the text in the video. This is the great advantage of dominant pixel selection by GVF information. It is illustrated in Fig. 1, where (a) is the input and (b) is the GVF for all pixels of the image in Fig. 1(a). It is observed from Fig. 1(b) that there are dense forces at the corners of contours and at the curved boundaries of text components, as text components are generally more cursive than nontext components. Therefore, for each pixel, we count how many forces point to it (based on the GVF arrows). A pixel is classified as a dominant text pixel if it attracts at least four GVF forces. The threshold of four was determined by an experiment counting between one and five GVF forces over 100 test samples randomly selected from our database, with quantitative results reported in Table I. Table I shows that for two GVF arrows, the f-measure is low and the misdetection rate is high compared to three arrows, because more nontext (background) pixels are represented by two arrows; for three arrows, the f-measure is low and the misdetection rate is high compared to four arrows for the same reason. On the other hand, for four arrows, the f-measure is high and the misdetection rate is low compared to five arrows; this shows that five arrows lose text pixels, which increases the misdetection rate. It is also observed from Table I that five arrows give high precision and low recall compared to four arrows, indicating that five arrows lose dominant pixels representing true text pixels as well as nontext pixels. Therefore, we infer that four GVF arrows are better than the other choices for identifying dominant text pixels, which represent true text pixels and few nontext pixels. In addition, our objective at this stage in choosing four arrows is to remove as many nontext pixels as possible, even though a few dominant pixels representing text are eliminated, because the proposed grouping schemes (Sections II-C and II-D) have the ability to restore missing text information. Therefore, losing a few dominant text pixels of the characters in a text line does not much affect the overall performance of the method. Dominant text pixel selection is illustrated in Fig. 1(c) for the frame shown in Fig. 1(a); it removes almost all nontext components. Fig. 1(d) shows the dominant text pixels overlaid on the input frame. One can notice from Fig. 1(d) that each text component has dominant pixels. In this way, dominant text pixel selection facilitates arbitrarily oriented text detection.

As an example, we choose the character image "a" from the input frame shown in Fig. 1(a). It is reproduced in Fig. 2(a) to illustrate how GVF information helps in selecting dominant text pixels. To show the GVF arrows for the character image in Fig. 2(a), we obtain the Sobel edge map shown in Fig. 2(b) and the GVF arrows on the Sobel edge map shown in Fig. 2(c). From Fig. 2(c), it is clear that all the GVF arrows point toward the inner contour of the character "a". This is because of the low contrast in the background and the high contrast at the inner boundary of the character. Thus, from Fig. 2(d), we observe that corner points and cursive text pixels on the contour attract more GVF arrows than non-corner points and nontext pixels. For instance, for a text pixel on the inner contour of the character "a" in Fig. 2(a), the corresponding GVF is marked by the oval in the middle of Fig. 2(d); the oval area shows that a greater number of GVF forces point toward that text pixel. Similarly, for a nontext pixel at the top left corner of the character in Fig. 2(a), the corresponding GVF, marked by the top left oval in Fig. 2(d), shows that a smaller number of GVF forces point toward that pixel. For the same two text and nontext pixels, we show the GVF arrows in their 3 × 3 neighborhoods. Darker arrows in Fig. 3(a) and (b) are those that point to the middle pixel (the pixel of interest); lighter arrows are attracted elsewhere. In Fig. 3(a), the middle pixel attracts four arrows and is hence classified as a corner point (dominant text pixel), while the pixel in Fig. 3(b) attracts only one arrow and is classified as a nontext pixel. We also test pixels that attract two and three GVF arrows, as shown in Fig. 3(c) and (d), and Fig. 3(e) and (f), respectively. One can see that the dominant pixels (DP) in Fig. 3(d) and (f), corresponding to the GVF (in red) in Fig. 3(c) and (e), represent not only text pixels but also nontext (background) pixels. On the other hand, Fig. 3(g) and (h) show that the pixels selected by four GVF arrows are real candidate text pixels, because these pixels indeed represent only text pixels, as shown in Fig. 3(h) for the GVF (in red) in Fig. 3(g).
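To make the four-arrow rule concrete, the sketch below counts, for every pixel, how many neighboring GVF arrows point at it. Quantizing each arrow to its nearest compass direction is our assumption about how "pointing to" is decided; the paper illustrates the count visually on 3 × 3 neighborhoods.

```python
import numpy as np

def dominant_pixels(u, v, edge_map, min_arrows=4):
    """Mark Sobel edge pixels attracting >= min_arrows GVF arrows (Sec. II-A)."""
    h, w = edge_map.shape
    angle = np.arctan2(v, u)
    # Quantize each pixel's arrow to one of 8 compass directions (dy, dx).
    octant = np.round(angle / (np.pi / 4)).astype(int) % 8
    dirs = [(0, 1), (1, 1), (1, 0), (1, -1),
            (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    counts = np.zeros((h, w), dtype=int)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = dirs[octant[y, x]]
            counts[y + dy, x + dx] += 1  # this pixel's arrow lands there
    # Dominant text pixels: edge pixels attracting at least four arrows.
    return (counts >= min_arrows) & (edge_map > 0)
```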

Fig. 2. Magnified GVF for corner and non-corner pixels marked by oval shapes. (a) Character chosen from Fig. 1(a). (b) Sobel edge map. (c) GVF overlaid on Sobel edge map. (d) GVF for the character image shown in (a).

Fig. 3. Illustration of the selection of dominant text pixels (DP) with GVF arrows. (a) GVF arrows at a text pixel. (b) GVF arrows at a nontext pixel. (c) Two GVF. (d) DP. (e) Three GVF. (f) DP. (g) Four GVF. (h) DP.

Fig. 4. Four GVF for the characters O and I to identify DP. (a) Four GVF. (b) DP. (c) Four GVF. (d) DP.

TABLE I. Experiments on 100 Random Samples Chosen From Different Databases for Choosing the Number of GVF Arrows. Columns: GVF Arrows, R, P, F, MDR.

In addition, Fig. 4 shows that the four-arrow selection identifies dominant pixels well [Fig. 4(b) and (d)] for characters like O and I [Fig. 4(a) and (c)], which have no corners but do have extreme points. Thus, it confirms that four GVF arrows work well for any character.

B. Text Candidates Selection

Fig. 5. Text candidates selection based on dominant pixels. (a) Sobel edge map. (b) Text candidates.

We use the result of dominant pixel selection shown in Fig. 1(c) for text candidate selection. For each dominant pixel in Fig. 1(c), the method extracts the edge component from the Sobel edge map shown in Fig. 5(a) that corresponds to that dominant pixel. We call these extracted edge components text candidates, as shown in Fig. 5(b). Fig. 5(b) shows that this operation extracts almost all text components with few false positives. The extracted text candidates are then used in the next section to restore the complete text information with the help of the Sobel edge map.
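A minimal sketch of this step, under the assumption that "edge components" are 8-connected components of the Sobel edge map; components containing at least one dominant pixel are kept as text candidates. The function and variable names are ours.

```python
import numpy as np
from scipy.ndimage import label

def text_candidates(sobel_edges, dominant):
    """Keep the edge components of the Sobel map that contain at least
    one dominant pixel; these are the text candidates (TC) of Sec. II-B."""
    labels, _ = label(sobel_edges > 0, structure=np.ones((3, 3)))
    # Component ids that own at least one dominant pixel.
    keep = np.unique(labels[dominant & (labels > 0)])
    return np.isin(labels, keep) & (sobel_edges > 0)
```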

C. First Grouping for Candidate Text Components

For each text candidate shown in Fig. 5(b), the method finds its perimeter and allows the perimeter to grow for five iterations, pixel by pixel, in the direction of the text line in the Sobel edge map of the input frame, in order to group neighboring text candidates. The perimeter is defined as the contour of the text candidate. The method computes the minor axis of the perimeter of the text candidate and uses the length of the minor axis as the radius for expanding the perimeter. At every iteration, the method traverses the expanded perimeter to look for a text pixel (white pixel) of a neighboring text candidate in the text line. The objective of this step is to merge segments of character components and neighboring characters to form a word. This process merges text candidates that lie in close proximity, within five iterations of perimeter growth. The value five was determined empirically by studying the space between text candidates; a five-pixel tolerance is acceptable because it is lower than the space between characters. As a result, we get two groups of text candidates, namely the current group and the neighbor group. The method then verifies the following size and angle properties of the text candidate groups before merging them. Generally, the major axes of the character components in a text line have almost the same length, and the angle differences between character components are almost the same.

Size condition:

$$\frac{\mathrm{medianlength}(g)}{3} < \mathrm{length}(c) < 3 \cdot \mathrm{medianlength}(g)$$

where length(·) is the length of the major axis of a text candidate group and medianlength(·) is the median of the major-axis lengths of all the text candidates in the group so far.

Angle condition:

$$g = g_{\mathrm{prev}} \cup \{c_{\mathrm{last}}\}, \qquad g_{\mathrm{next}} = g \cup \{c\}$$
$$\theta_1 = |\mathrm{angle}(g) - \mathrm{angle}(g_{\mathrm{prev}})|, \qquad \theta_2 = |\mathrm{angle}(g) - \mathrm{angle}(g_{\mathrm{next}})|$$

where g is the current group, c_last is the text candidate group that was last added to g, and c is the new text candidate group being considered for addition to g. It follows that g_prev and g_next are the group immediately before the current one and the group including the candidate, respectively. angle(·) returns the orientation of the major axis of a group, computed by PCA. The angle condition is |θ1 − θ2| ≤ θ1_min, and it is checked only when g has at least four components. We fix θ1_min at 5° because, in arbitrarily oriented text, each character has a slightly different orientation that follows the orientation of the text line; the 5° tolerance accommodates this small variation. If a text candidate group passes these two conditions, we merge the neighbor group with the current group to obtain candidate text components (word patches). The two conditions fail when a large angle difference arises between two words, due to clutter background, during grouping.

Fig. 6. Illustration of candidate text component selection. (a) g. (b) c. (c) g_prev. (d) c_last. (e) g_next.

The process is illustrated in Fig. 6, where (a)–(e) show g, c, g_prev, c_last, and g_next, respectively, chosen from Fig. 5(b). For these groups, θ1 = 5.33° and θ2 = 4.02°, and length(c) and medianlength(g) satisfy the size condition, so the conditions are met and c is merged into g, as shown in Fig. 6(e). In this way, the method groups broken segments and neighboring characters to obtain candidate text components.

Fig. 7. Word patch extraction. (a) First grouping. (b) Staircase effect. (c) Skeleton. (d) End and junction points. (e) Candidate text components after false positive elimination.

Fig. 8. Illustration of word grouping. (a) w_1. (b) w_2. (c) t_1. (d) t_2. (e) t_12.

The final grouping results for the text candidates in Fig. 5(b) are shown in Fig. 7(a), where different colors represent the different groups formed, and the staircase effect in Fig. 7(b) illustrates the grouping mechanism. This process repeats until no unvisited text candidates remain. The grouping essentially yields word patches by grouping character components.
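The size and angle tests can be sketched as follows. Computing a group's major-axis length and orientation by PCA of its pixel coordinates follows the description above; the helper names and the axis-length scaling are our assumptions.

```python
import numpy as np

def axis_length_and_angle(points):
    """Major-axis length and orientation (degrees) of a component via PCA.
    points: (N, 2) array of (y, x) pixel coordinates."""
    centered = points - points.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centered.T))  # ascending eigenvalues
    major = vecs[:, -1]                              # major-axis direction
    length = 4.0 * np.sqrt(vals[-1])                 # ~2 sigma on each side
    return length, np.degrees(np.arctan2(major[0], major[1]))

def may_merge(len_c, median_len_g, theta1, theta2, theta_min1=5.0):
    """Size and angle conditions of Sec. II-C for merging candidate c into g."""
    size_ok = median_len_g / 3.0 < len_c < 3.0 * median_len_g
    angle_ok = abs(theta1 - theta2) <= theta_min1   # checked once |g| >= 4
    return size_ok and angle_ok
```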
It is observed from Fig. 7(b) that there are false text candidate groups. To eliminate them, we check the skeleton of each group, as shown in Fig. 7(c), and count the number of junction points, shown in Fig. 7(d). If intersection(skeleton(g)) > 0, the group is declared a false text candidate group and is not retained, where skeleton(·) returns the skeleton of a group and intersection(·) returns the set of intersection (junction) points. The final results after removing false text candidate groups can be seen in Fig. 7(e). However, one false text candidate group still remains.
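A sketch of the junction test, assuming scikit-image's skeletonization; treating a junction as a skeleton pixel with three or more skeleton neighbors is our reading of intersection(·).

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def has_junction(group_mask):
    """True if the group's skeleton contains a junction point (Sec. II-C);
    such groups are rejected as false text candidates."""
    skel = skeletonize(group_mask > 0)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbors = convolve(skel.astype(int), kernel, mode="constant")
    return bool(np.any(skel & (neighbors >= 3)))
```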

D. Second Grouping for Text Line Detection

The first grouping described above produces word patches by grouping character components. For each word patch, the second grouping finds the two tail ends using the major axis of the word patch. The method takes the text candidates at both tail ends of the word and grows their perimeters along the direction of the major axis for a few iterations to find neighboring word patches. The number of iterations is determined from experiments on the space between words and characters. While growing the perimeter pixel by pixel, the method looks for white pixels of neighboring word patches; the Sobel edge map of the input frame is used for the growing and for finding neighboring word patches. Two word patches are grouped based on their angle properties. Let t1 and t2 be the right tail end of the first word patch and the left tail end of the second word patch, respectively:

$$t_1 = \mathrm{tail}(w_1, c_1), \quad t_2 = \mathrm{tail}(w_2, c_2), \quad t_{12} = t_1 \cup t_2$$
$$\theta_1 = |\mathrm{angle}(t_1) - \mathrm{angle}(t_{12})|, \qquad \theta_2 = |\mathrm{angle}(t_2) - \mathrm{angle}(t_{12})|$$

where w1 is the current word patch, c1 is the text candidate being used for growing, and c2 is the text candidate of the neighboring word patch w2 to which the growth connects. The idea is to check that the tail angles of the two words are compatible with each other. tail(w, c) returns up to three text candidates immediately connected to c in w, and t12 is the joint tail formed from t1 and t2. The angle condition is θ1 ≤ θ2_min ∧ θ2 ≤ θ2_min, and it is checked only if both t1 and t2 contain three components. If a word patch passes this condition, it is merged into the current word. We set θ2_min to 25° to accommodate the orientation difference between words in a text line; a small orientation difference between words is expected because the input is arbitrarily oriented text. The 25° tolerance does not much affect the grouping process because there is usually enough space between text lines. The grouping of word patches chosen from Fig. 7(e) is illustrated in Fig. 8, where (a)–(e) represent w1, w2, t1, t2, and t12, respectively. Suppose we are considering whether to merge w1 and w2: here θ1 = 20.87° and θ2 = 20.68°, so the condition is satisfied and w1 and w2 are merged, as shown in red in Fig. 9(a). This process repeats until no unvisited words remain. The output of the second grouping is shown in Fig. 9(a), where the staircase effect with different colors shows how the words are grouped; the final result in Fig. 9(b) shows the curved text line extracted, along with one false positive.

Fig. 9. Arbitrary text extraction. (a) Second grouping. (b) Text line detection.

E. False Positive Removal

Sometimes false positives are merged with the text lines (as in the above case), which makes them difficult to remove. In other cases, however, the false positives stand alone, and we propose the following rules to remove them. Rules that eliminate such false positives based on geometrical properties of the text block are common practice in text detection [14]–[32] for improving accuracy, and we adopt similar rules in this paper.

Fig. 10. Illustration of false positive elimination. (a) Input. (b) Before false positive removal. (c) Area for false positive removal. (d) Density for false positive removal.

False positive check: a block w is declared a false positive and removed if area(w) < 200 or edge_density(w) < 0.05, where

$$\mathrm{edge\_density}(w) = \frac{\mathrm{edge\_length}(\mathrm{sobel}(w))}{\mathrm{area}(w)}$$

sobel(·) returns the Sobel edge map, and edge_length(·) returns the total length of all edges in the edge map.
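The stand-alone check can be sketched as below, with the thresholds stated above (200 for area, 0.05 for edge density). Approximating the total edge length by the count of Sobel edge pixels after binarization, and the 0.1 binarization threshold, are our assumptions.

```python
import numpy as np
from skimage.filters import sobel

def is_false_positive(block, area_thresh=200, density_thresh=0.05):
    """Stand-alone false positive test of Sec. II-E for a grayscale block."""
    area = block.shape[0] * block.shape[1]
    edges = sobel(block) > 0.1      # binarization threshold is an assumption
    edge_length = int(edges.sum())  # edge pixel count approximates edge length
    density = edge_length / float(area)
    return area < area_thresh or density < density_thresh
```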
Fig. 10(a) shows the input, (b) shows the result before false positive elimination, (c) shows the result of false positive elimination using the area of the text block, and (d) shows the result of false positive elimination using the edge density of the text block.

III. Experimental Results

We created our own dataset for evaluating the proposed method, along with standard datasets such as Hua's data of 45 video frames [34]. Our dataset includes 142 arbitrarily oriented text frames (almost all scene text frames), 220 nonhorizontal text frames (176 scene text frames and 44 graphics text frames), 800 horizontal text frames (160 Chinese text frames, 155 scene text frames, and 485 English graphics text frames), and the publicly available Hua's data of 45 frames (12 scene text frames and 33 graphics text frames). We also tested our method on the ICDAR-03 competition dataset [35] of 251 camera images (all scene text images) to check its effectiveness on camera-based images. In total, 1207 (142 + 220 + 800 + 45) video frames and 251 camera images are used for experimentation. To compare the results of the proposed method with existing methods, we consider seven popular existing methods: the Bayesian and boundary growing based method [31], the Laplacian and skeleton based method [30], the Fourier-RGB based method [28], and those presented in [21], [22], [26], and [29].

Fig. 11. Sample results for arbitrarily oriented text detection. (a) Input. (b) Proposed. (c) Bayesian. (d) Laplacian. (e) Zhou et al. (f) Fourier-RGB. (g) Liu et al. (h) Wong and Chen. (i) Cai et al.

The main reason for considering these existing methods is that, like our proposed method, they work with fewer constraints and handle complex backgrounds without a classifier and training. We evaluate the performance of the proposed method at the text line level, which is a common granularity level in the literature [17]–[25], rather than at the word or character level, because we do not consider text recognition in this paper. The following categories are defined for each block detected by a text detection method. Truly detected block (TDB): a detected block that contains at least one true character; thus, a TDB may or may not fully enclose a text line. Falsely detected block (FDB): a detected block that does not contain text. Text block with missing data (MDB): a detected block that misses more than 20% of the characters of a text line (MDB is a subset of TDB). The percentage is chosen according to [30], [31], in which a text block is considered correctly detected if it overlaps at least 80% of the pixels of the ground-truth block. We manually count the actual number of text blocks (ATB) in the images, which serves as the ground truth for evaluation. The performance measures are defined as follows: recall (R) = TDB / ATB; precision (P) = TDB / (TDB + FDB); f-measure (F) = (2 × P × R) / (P + R); misdetection rate (MDR) = MDB / TDB. In addition, we measure the average processing time (APT), in seconds, of each method in our experiments.
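The measures defined above translate directly into code; this sketch assumes the block counts have already been tallied over an image set and that the denominators are nonzero.

```python
def evaluation_measures(tdb, fdb, mdb, atb):
    """Line-level measures of Sec. III: recall, precision, f-measure,
    and misdetection rate, from truly/falsely detected, missing-data,
    and actual text block counts."""
    recall = tdb / atb
    precision = tdb / (tdb + fdb)
    f_measure = 2 * precision * recall / (precision + recall)
    mdr = mdb / tdb
    return recall, precision, f_measure, mdr
```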
A. Experiment on Video Text Data

To show the effectiveness of the proposed method over the existing methods, we assemble the 142 arbitrary images with the 800 horizontal and 220 nonhorizontal images to form a representative and varied set of general video data, on which we calculate the performance measures, namely recall, precision, f-measure, and misdetection rate. The quantitative results of the proposed and existing methods for the 1162 images (142 + 800 + 220) are reported in Table II. We highlight sample arbitrary, nonhorizontal, and horizontal images for discussion in Figs. 11, 12, and 13, respectively.

Fig. 12. Sample results for nonhorizontal text detection. (a) Input. (b) Proposed. (c) Bayesian. (d) Laplacian. (e) Zhou et al. (f) Fourier-RGB. (g) Liu et al. (h) Wong and Chen. (i) Cai et al.

For a curved, circle-shaped text line such as the one shown in Fig. 11(a), the proposed method extracts the text lines with one false positive, while the existing methods fail to detect the curved text line properly. The main reason is that the existing methods were developed for horizontal and nonhorizontal text line detection, not for arbitrary text detection. It is observed from Fig. 12 that, for the input frame with different orientations and a complex background shown in Fig. 12(a), the proposed method detects almost all the text with a few misdetections, as shown in Fig. 12(b). The Bayesian method does not fix the bounding boxes properly, as shown in Fig. 12(c), and the Laplacian method detects two text lines but loses one, as shown in Fig. 12(d), due to the complex background of the frame. Zhou et al.'s method fails to detect the text, as shown in Fig. 12(e), as it is limited to horizontal and vertical caption text, not scene text and multi-oriented text. It is also observed from Fig. 12 that the Fourier-RGB, Liu et al.'s, Wong and Chen's, and Cai et al.'s methods fail to detect the text lines, because these methods were developed for horizontal rather than nonhorizontal text detection.

Fig. 13. Sample results for horizontal text detection. (a) Input. (b) Proposed. (c) Bayesian. (d) Laplacian. (e) Zhou et al. (f) Fourier-RGB. (g) Liu et al. (h) Wong and Chen. (i) Cai et al.

Sample experimental results of both the proposed and existing methods for horizontal text detection are shown in Fig. 13, where the input image in Fig. 13(a) has a complex background with horizontal text. It is noticed from Fig. 13 that the proposed method and the Bayesian, Laplacian, Fourier-RGB, and Cai et al.'s methods detect almost all the text lines, while the other methods miss text lines. The Bayesian method does not fix the bounding boxes properly and gives more false positives due to the problem of boundary growing. The Fourier-RGB method detects the text properly. Among the remaining methods, Zhou et al.'s method misses a few text lines, Liu et al.'s method misses a few words in addition to producing false positives, and Wong and Chen's and Cai et al.'s methods do not fix the bounding boxes properly for the text lines.

TABLE II. Performance on Arbitrary + Nonhorizontal + Horizontal Data (142 + 800 + 220 = 1162). Columns: Methods, R, P, F, MDR, APT (sec); rows: Proposed Method, Bayesian [31], Laplacian [30], Zhou et al. [29], Fourier-RGB [28], Liu et al. [26], Wong and Chen [21], Cai et al. [22].

These observations on the sample images show that the proposed method detects arbitrary, nonhorizontal, and horizontal text well compared to the existing methods, and the quantitative results reported in Table II show that the proposed method also outperforms the existing methods in terms of recall, precision, f-measure, and misdetection rate. However, the APT of the proposed method is longer than that of most of the existing methods, except the Fourier-RGB and Liu et al.'s methods, as shown in Table II and in the subsequent experiments (Tables III and IV). The higher APT is attributed to the GVF computation and the grouping, which incur a higher computational cost; it is this GVF process that enables the proposed method to deal with arbitrarily oriented text lines. Our previous methods, namely the Bayesian and Laplacian methods, give lower accuracy than the proposed method according to Table II, because they were developed for nonhorizontal and horizontal text detection rather than arbitrarily oriented text detection. As a result, the boundary growing and skeleton based schemes proposed in the Bayesian and Laplacian methods, respectively, for handling multi-oriented problems fail on arbitrary text. Zhou et al.'s method works well only for vertical and horizontal caption text, not for arbitrary orientations and scene text, and hence gives poor accuracy. Since Liu et al.'s, Wong and Chen's, and Cai et al.'s methods were developed for horizontal text detection, they give poor accuracy compared to the proposed method.

B. Experiment on Independent Data (Hua's Data)

Fig. 14. Sample results for Hua's data. (a) Input. (b) Proposed. (c) Bayesian. (d) Laplacian. (e) Zhou et al. (f) Fourier-RGB. (g) Liu et al. (h) Wong and Chen. (i) Cai et al.

We use a small publicly available dataset of 45 video frames [34], namely Hua's dataset, to evaluate the performance of the proposed method in comparison with the existing methods. We included this set in our experiments because it serves as an independent test set in addition to our own dataset of the preceding section. However, we caution that this set contains only horizontal text and hence does not exercise the entire spectrum of text detection capability, from horizontal and nonhorizontal to arbitrary orientation. Fig. 14 shows sample results for the proposed and existing methods, where (a) is an input frame containing both large and small font text, and (b)–(i) are the results of the proposed and existing methods, respectively. It is observed from Fig. 14 that the proposed method detects both text lines in the input frame, while the Bayesian method does not detect all the text and the Laplacian method fails to detect the complete text lines, rendering their outputs as either misdetections or false positives.

TABLE III. Performance with Hua's Data. Columns: Methods, R, P, F, MDR, APT (sec); rows: Proposed Method, Bayesian [31], Laplacian [30], Zhou et al. [29], Fourier-RGB [28], Liu et al. [26], Wong and Chen [21], Cai et al. [22].

Their misdetection rates are therefore high compared to the proposed method, as shown in Table III. The Fourier-RGB method detects the text properly and hence gives good recall. The other existing methods fail to detect the text lines in the input frame due to the font variation. From Table III, it can be concluded that the proposed method and our earlier methods [30], [31] outperform the other existing methods in terms of recall, precision, f-measure, and misdetection rate. We note that the Bayesian method [31] and the Laplacian method [30] achieve better f-measure than the proposed method. However, as cautioned earlier, Hua's dataset does not contain arbitrarily oriented text, so the Bayesian and Laplacian methods have the advantage of not being tested on arbitrary text lines. If Hua's dataset had contained arbitrarily oriented text lines, the Bayesian and Laplacian methods would have shown poorer f-measures, as in Table II.

C. Experiment on ICDAR-03 Data (Camera Images)

We added another independent test set in this experiment, as in the preceding section. The objective of this experiment is to show that the proposed method, which works well for low resolution video frames, also works well for high resolution camera images. This dataset is publicly available [35] as the ICDAR-03 competition data for text detection in natural scene images. We show sample results of the proposed and existing methods in Fig. 15, where (a) is a sample input frame and (b)–(i) show the results of the proposed and existing methods, respectively. It is observed from Fig. 15 that the proposed method, the Fourier-RGB method, and Cai et al.'s method work well for the input frame, but the other methods, including our earlier Bayesian and Laplacian methods, fail to detect the text lines properly. The results reported in Table IV show that the proposed method is better in terms of recall, f-measure, and misdetection rate than the Bayesian, Laplacian, and Fourier-RGB methods. This is because, for high contrast and high resolution images, the classification schemes proposed in the Bayesian and Laplacian methods and the dynamic threshold used in Fourier-RGB fail to classify text and nontext pixels properly. The proposed method and our earlier methods are better than the other existing methods in terms of recall, precision, and f-measure, but in terms of misdetection rate, Wong and Chen's method is better, according to the results reported in Table IV. However, Wong and Chen's method is the worst in recall, precision, and f-measure compared to the proposed method. This experiment shows that the proposed method performs well even on high resolution, high contrast images.

Fig. 15. Sample results for scene text detection (ICDAR-2003 data). (a) Input. (b) Proposed. (c) Bayesian. (d) Laplacian. (e) Zhou et al. (f) Fourier-RGB. (g) Liu et al. (h) Wong and Chen. (i) Cai et al.

TABLE IV. Line Level Performance on ICDAR-03 Data. Columns: Methods, R, P, F, MDR, APT (sec); rows: Proposed Method, Bayesian [31], Laplacian [30], Zhou et al. [29], Fourier-RGB [28], Liu et al. [26], Wong and Chen [21], Cai et al. [22].
We also conduct experiments on the ICDAR data using the ICDAR 2003 measures for our proposed method; the results are reported in Table V. Since our primary goal is to detect text in video, we develop and evaluate the method at the line level, as is common practice in the video text detection literature [14]–[32]. To calculate recall, precision, and f-measure according to ICDAR 2003, we modify the method to fix a bounding box for each word in the image based on the space between words and characters. Table V shows that the proposed method does not achieve better accuracy than the best method (Hinnerk Becker), but it stands in third position among the methods compared. The lower accuracy is due to the problem of word segmentation, the difficulty of fixing tight bounding boxes, and the strict measures. In addition, the method does not exploit the advantages of high resolution images in the way the participating methods do, which use connected component analysis for text detection and grouping; hence, the proposed method misses some true text blocks. The results of the participating methods reported in Table V are taken from ICDAR 2005 [35] for comparison with the proposed method.

TABLE V. Word Level Performance on ICDAR 2003 Data. Columns: Methods, R, P, F; rows: Proposed Method, Hinnerk Becker [35], Alex Chen [35], Qiang Zhu [35], Jisoo Kim [35], Nobuo Ezaki [35].

IV. Conclusion and Future Work

In this paper, we explored GVF information for the first time for text detection in video, by selecting dominant text pixels and text candidates with the help of the Sobel edge map. Dominant text pixel selection helps to remove nontext information in the complex background of video frames. Text candidate selection and the first grouping scheme ensure that text pixels are not missed. The second grouping scheme tackles the problems created by arbitrarily oriented text, achieving better accuracy for text detection in video. Experimental results on a variety of datasets, namely arbitrarily oriented data, nonhorizontal data, horizontal data, Hua's data, and ICDAR-03 data, showed that the proposed method works well for text detection irrespective of contrast, orientation, background, script, font, and font size. However, the proposed method may not give good accuracy for horizontal text lines with little spacing between them. To overcome this problem, we plan to develop another method that can detect text lines without considering their spacing, using an alternative grouping criterion.

Acknowledgment

The authors would like to thank the editor and the reviewers for their constructive comments and suggestions that helped in improving the quality of this paper.

References

[1] N. Sharma, U. Pal, and M. Blumenstein, "Recent advances in video based document processing: A review," in Proc. DAS, 2012.
[2] J. Zhang and R. Kasturi, "Extraction of text objects in video documents: Recent progress," in Proc. DAS, 2008.
[3] K. Jung, K. I. Kim, and A. K. Jain, "Text information extraction in images and video: A survey," Pattern Recognit., vol. 37, pp. 977–997, 2004.
[4] D. Crandall and R. Kasturi, "Robust detection of stylized text events in digital video," in Proc. ICDAR, 2001.
[5] D. Zhang and S. F. Chang, "Event detection in baseball video using superimposed caption recognition," in Proc. ACM MM, 2002.
[6] C. Xu, J. Wang, K. Wan, Y. Li, and L. Duan, "Live sports event detection based on broadcast video and web-casting text," in Proc. ACM MM, 2006.
[7] W. Wu, X. Chen, and J. Yang, "Incremental detection of text on road signs from video with applications to a driving assistant system," in Proc. ACM MM, 2004.
[8] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in Proc. CVPR, 2010.
[9] Y. F. Pan, X. Hou, and C. L. Liu, "A hybrid approach to detect and localize texts in natural scene images," IEEE Trans. Image Process., vol. 20, no. 3, Mar. 2011.
[10] X. Chen, J. Yang, J. Zhang, and A. Waibel, "Automatic detection and recognition of signs from natural scenes," IEEE Trans. Image Process., vol. 13, no. 1, Jan. 2004.
[11] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," in Proc. CVPR, 2012.
[12] L. Neumann and J. Matas, "Real-time scene text localization and recognition," in Proc. CVPR, 2012.
[13] T. Q. Phan, P. Shivakumara, and C. L. Tan, "Detecting text in the real world," in Proc. ACM MM, 2012.
[14] A. K. Jain and B. Yu, "Automatic text location in images and video frames," Pattern Recognit., vol. 31, no. 12, Dec. 1998.
[15] V. Y. Mariano and R. Kasturi, "Locating uniform-colored text in video frames," in Proc. ICPR, 2000.
[16] H. Li, D. Doermann, and O. Kia, "Automatic text detection and tracking in digital video," IEEE Trans. Image Process., vol. 9, no. 1, Jan. 2000.
[17] Y. Zhong, H. Zhang, and A. K. Jain, "Automatic caption localization in compressed video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 4, Apr. 2000.
[18] K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, Dec. 2003.
[19] V. Wu, R. Manmatha, and E. M. Riseman, "TextFinder: An automatic system to detect and recognize text in images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 11, Nov. 1999.
[20] R. Lienhart and A. Wernicke, "Localizing and segmenting text in images and videos," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, Apr. 2002.
[21] E. K. Wong and M. Chen, "A new robust algorithm for video text extraction," Pattern Recognit., vol. 36, no. 6, Jun. 2003.
[22] M. Cai, J. Song, and M. R. Lyu, "A new approach for video text detection," in Proc. ICIP, 2002.
[23] A. Jamil, I. Siddiqi, F. Arif, and A. Raza, "Edge-based features for localization of artificial Urdu text in video images," in Proc. ICDAR, 2011.
[24] M. Anthimopoulos, B. Gatos, and I. Pratikakis, "A two-stage scheme for text detection in video images," Image Vision Comput., vol. 28, Mar. 2010.
[25] X. Peng, H. Cao, R. Prasad, and P. Natarajan, "Text extraction from video using conditional random fields," in Proc. ICDAR, 2011.
[26] C. Liu, C. Wang, and R. Dai, "Text detection in images based on unsupervised classification of edge-based features," in Proc. ICDAR, 2005.
[27] P. Shivakumara, W. Huang, C. L. Tan, and T. Q. Phan, "Accurate video text detection through classification of low and high contrast images," Pattern Recognit., vol. 43, no. 6, Jun. 2010.
[28] P. Shivakumara, T. Q. Phan, and C. L. Tan, "New Fourier-statistical features in RGB space for video text detection," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 11, Nov. 2010.
[29] J. Zhou, L. Xu, B. Xiao, and R. Dai, "A robust system for text extraction in video," in Proc. ICMV, 2007.
[30] P. Shivakumara, T. Q. Phan, and C. L. Tan, "A Laplacian approach to multi-oriented text detection in video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, Feb. 2011.
[31] P. Shivakumara, R. P. Sreedhar, T. Q. Phan, S. Lu, and C. L. Tan, "Multioriented video scene text detection through Bayesian classification and boundary growing," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 8, Aug. 2012.
[32] N. Sharma, P. Shivakumara, U. Pal, M. Blumenstein, and C. L. Tan, "A new method for arbitrarily-oriented text detection in video," in Proc. DAS, 2012.
[33] C. Xu and J. L. Prince, "Snakes, shapes, and gradient vector flow," IEEE Trans. Image Process., vol. 7, no. 3, pp. 359–369, Mar. 1998.
[34] X. S. Hua, L. Wenyin, and H. J. Zhang, "An automatic performance evaluation protocol for video text detection algorithms," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, Apr. 2004.
[35] S. M. Lucas, "ICDAR 2005 text locating competition results," in Proc. ICDAR, 2005.

Palaiahnakote Shivakumara received the B.Sc., M.Sc., M.Sc. Tech. (by research), and Ph.D. degrees in computer science from the University of Mysore, Mysore, Karnataka, India, in 1995, 1999, 2001, and 2005, respectively. He is currently a Visiting Senior Lecturer in the Department of Computer Systems and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. From 1999 to 2005, he was a Project Associate at the Department of Studies in Computer Science, University of Mysore, where he conducted research on document image analysis, including document image mosaicing, character recognition, skew detection, face detection, and face recognition. From 2005 to 2007, he was a Research Fellow in image processing and multimedia at the Department of Computer Science, School of Computing, National University of Singapore (NUS), Singapore. He also worked for six months as a Research Consultant on image classification at Nanyang Technological University, Singapore. He was subsequently a Research Fellow working on video text extraction and recognition at NUS from 2008. His current research interests include image processing and pattern recognition, including text extraction from video, and document image processing. Dr. Shivakumara has published more than 100 research papers in national and international conferences and journals. He has been a reviewer for several conferences and journals.

Trung Quy Phan received the B.Sc. degree in computer science from the School of Computing, National University of Singapore, Singapore, where he is currently pursuing the Ph.D. degree. His current research interests include image and video analysis.

Shijian Lu received the Ph.D. degree in electrical and computer engineering from the National University of Singapore, Singapore. He is currently a Senior Research Fellow at the Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore. His current research interests include document image analysis and medical image analysis. He has published over 40 peer-reviewed journal and conference papers. Dr. Lu is a member of the International Association for Pattern Recognition.

Chew Lim Tan (SM'85) received the B.Sc. (Hons.) degree in physics from the University of Singapore, Singapore, in 1971, the M.Sc. degree in radiation studies from the University of Surrey, Surrey, U.K., in 1973, and the Ph.D. degree in computer science from the University of Virginia, Charlottesville, VA, USA. He is currently a Professor at the Department of Computer Science, School of Computing, National University of Singapore. His current research interests include document image analysis and text and natural language processing. He has published more than 400 research publications in these areas. Dr. Tan is an associate editor of Pattern Recognition and ACM Transactions on Asian Language Information Processing, and an editorial board member of the International Journal on Document Analysis and Recognition. He is a Fellow and Member of the Governing Board of the International Association for Pattern Recognition.


Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications M. Prabaharan 1, K. Radha 2 M.E Student, Department of Computer Science and Engineering, Muthayammal Engineering

More information

Text Area Detection from Video Frames

Text Area Detection from Video Frames Text Area Detection from Video Frames 1 Text Area Detection from Video Frames Xiangrong Chen, Hongjiang Zhang Microsoft Research China chxr@yahoo.com, hjzhang@microsoft.com Abstract. Text area detection

More information

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features 1 Kum Sharanamma, 2 Krishnapriya Sharma 1,2 SIR MVIT Abstract- To describe the image features the Local binary pattern (LBP)

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

Image Resizing Based on Gradient Vector Flow Analysis

Image Resizing Based on Gradient Vector Flow Analysis Image Resizing Based on Gradient Vector Flow Analysis Sebastiano Battiato battiato@dmi.unict.it Giovanni Puglisi puglisi@dmi.unict.it Giovanni Maria Farinella gfarinellao@dmi.unict.it Daniele Ravì rav@dmi.unict.it

More information

Extraction Characters from Scene Image based on Shape Properties and Geometric Features

Extraction Characters from Scene Image based on Shape Properties and Geometric Features Extraction Characters from Scene Image based on Shape Properties and Geometric Features Abdel-Rahiem A. Hashem Mathematics Department Faculty of science Assiut University, Egypt University of Malaya, Malaysia

More information

TEVI: Text Extraction for Video Indexing

TEVI: Text Extraction for Video Indexing TEVI: Text Extraction for Video Indexing Hichem KARRAY, Mohamed SALAH, Adel M. ALIMI REGIM: Research Group on Intelligent Machines, EIS, University of Sfax, Tunisia hichem.karray@ieee.org mohamed_salah@laposte.net

More information

An Automatic Timestamp Replanting Algorithm for Panorama Video Surveillance *

An Automatic Timestamp Replanting Algorithm for Panorama Video Surveillance * An Automatic Timestamp Replanting Algorithm for Panorama Video Surveillance * Xinguo Yu, Wu Song, Jun Cheng, Bo Qiu, and Bin He National Engineering Research Center for E-Learning, Central China Normal

More information

Image Text Extraction and Recognition using Hybrid Approach of Region Based and Connected Component Methods

Image Text Extraction and Recognition using Hybrid Approach of Region Based and Connected Component Methods Image Text Extraction and Recognition using Hybrid Approach of Region Based and Connected Component Methods Ms. N. Geetha 1 Assistant Professor Department of Computer Applications Vellalar College for

More information

Edge-based Features for Localization of Artificial Urdu Text in Video Images

Edge-based Features for Localization of Artificial Urdu Text in Video Images 2011 International Conference on Document Analysis and Recognition Edge-based Features for Localization of Artificial Urdu Text in Video Images Akhtar Jamil Imran Siddiqi Fahim Arif Ahsen Raza Department

More information

Image retrieval based on bag of images

Image retrieval based on bag of images University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong

More information

Automatically Algorithm for Physician s Handwritten Segmentation on Prescription

Automatically Algorithm for Physician s Handwritten Segmentation on Prescription Automatically Algorithm for Physician s Handwritten Segmentation on Prescription Narumol Chumuang 1 and Mahasak Ketcham 2 Department of Information Technology, Faculty of Information Technology, King Mongkut's

More information

Separation of Overlapping Text from Graphics

Separation of Overlapping Text from Graphics Separation of Overlapping Text from Graphics Ruini Cao, Chew Lim Tan School of Computing, National University of Singapore 3 Science Drive 2, Singapore 117543 Email: {caorn, tancl}@comp.nus.edu.sg Abstract

More information

RESTORATION OF DEGRADED DOCUMENTS USING IMAGE BINARIZATION TECHNIQUE

RESTORATION OF DEGRADED DOCUMENTS USING IMAGE BINARIZATION TECHNIQUE RESTORATION OF DEGRADED DOCUMENTS USING IMAGE BINARIZATION TECHNIQUE K. Kaviya Selvi 1 and R. S. Sabeenian 2 1 Department of Electronics and Communication Engineering, Communication Systems, Sona College

More information

OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images

OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images Deepak Kumar and A G Ramakrishnan Medical Intelligence and Language Engineering Laboratory Department of Electrical Engineering, Indian

More information

CONTENT ADAPTIVE SCREEN IMAGE SCALING

CONTENT ADAPTIVE SCREEN IMAGE SCALING CONTENT ADAPTIVE SCREEN IMAGE SCALING Yao Zhai (*), Qifei Wang, Yan Lu, Shipeng Li University of Science and Technology of China, Hefei, Anhui, 37, China Microsoft Research, Beijing, 8, China ABSTRACT

More information

LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION

LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION Yao Li 1, Chunhua Shen 1, Wenjing Jia 2, Anton van den Hengel 1 1 The University of Adelaide, Australia 2 University of Technology, Sydney, Australia

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Text Information Extraction And Analysis From Images Using Digital Image Processing Techniques

Text Information Extraction And Analysis From Images Using Digital Image Processing Techniques Text Information Extraction And Analysis From Images Using Digital Image Processing Techniques Partha Sarathi Giri Department of Electronics and Communication, M.E.M.S, Balasore, Odisha Abstract Text data

More information

Layout Segmentation of Scanned Newspaper Documents

Layout Segmentation of Scanned Newspaper Documents , pp-05-10 Layout Segmentation of Scanned Newspaper Documents A.Bandyopadhyay, A. Ganguly and U.Pal CVPR Unit, Indian Statistical Institute 203 B T Road, Kolkata, India. Abstract: Layout segmentation algorithms

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

ABSTRACT 1. INTRODUCTION 2. RELATED WORK

ABSTRACT 1. INTRODUCTION 2. RELATED WORK Improving text recognition by distinguishing scene and overlay text Bernhard Quehl, Haojin Yang, Harald Sack Hasso Plattner Institute, Potsdam, Germany Email: {bernhard.quehl, haojin.yang, harald.sack}@hpi.de

More information

Recognition of Multiple Characters in a Scene Image Using Arrangement of Local Features

Recognition of Multiple Characters in a Scene Image Using Arrangement of Local Features 2011 International Conference on Document Analysis and Recognition Recognition of Multiple Characters in a Scene Image Using Arrangement of Local Features Masakazu Iwamura, Takuya Kobayashi, and Koichi

More information

Text Enhancement with Asymmetric Filter for Video OCR. Datong Chen, Kim Shearer and Hervé Bourlard

Text Enhancement with Asymmetric Filter for Video OCR. Datong Chen, Kim Shearer and Hervé Bourlard Text Enhancement with Asymmetric Filter for Video OCR Datong Chen, Kim Shearer and Hervé Bourlard Dalle Molle Institute for Perceptual Artificial Intelligence Rue du Simplon 4 1920 Martigny, Switzerland

More information

Binarization of Color Character Strings in Scene Images Using K-means Clustering and Support Vector Machines

Binarization of Color Character Strings in Scene Images Using K-means Clustering and Support Vector Machines 2011 International Conference on Document Analysis and Recognition Binarization of Color Character Strings in Scene Images Using K-means Clustering and Support Vector Machines Toru Wakahara Kohei Kita

More information

A Background Modeling Approach Based on Visual Background Extractor Taotao Liu1, a, Lin Qi2, b and Guichi Liu2, c

A Background Modeling Approach Based on Visual Background Extractor Taotao Liu1, a, Lin Qi2, b and Guichi Liu2, c 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) A Background Modeling Approach Based on Visual Background Extractor Taotao Liu1, a, Lin Qi2, b

More information

Restoring Warped Document Image Based on Text Line Correction

Restoring Warped Document Image Based on Text Line Correction Restoring Warped Document Image Based on Text Line Correction * Dep. of Electrical Engineering Tamkang University, New Taipei, Taiwan, R.O.C *Correspondending Author: hsieh@ee.tku.edu.tw Abstract Document

More information

Conspicuous Character Patterns

Conspicuous Character Patterns Conspicuous Character Patterns Seiichi Uchida Kyushu Univ., Japan Ryoji Hattori Masakazu Iwamura Kyushu Univ., Japan Osaka Pref. Univ., Japan Koichi Kise Osaka Pref. Univ., Japan Shinichiro Omachi Tohoku

More information

Color Image Segmentation

Color Image Segmentation Color Image Segmentation Yining Deng, B. S. Manjunath and Hyundoo Shin* Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 93106-9560 *Samsung Electronics Inc.

More information

Texture Segmentation by Windowed Projection

Texture Segmentation by Windowed Projection Texture Segmentation by Windowed Projection 1, 2 Fan-Chen Tseng, 2 Ching-Chi Hsu, 2 Chiou-Shann Fuh 1 Department of Electronic Engineering National I-Lan Institute of Technology e-mail : fctseng@ccmail.ilantech.edu.tw

More information

A Study on Similarity Computations in Template Matching Technique for Identity Verification

A Study on Similarity Computations in Template Matching Technique for Identity Verification A Study on Similarity Computations in Template Matching Technique for Identity Verification Lam, S. K., Yeong, C. Y., Yew, C. T., Chai, W. S., Suandi, S. A. Intelligent Biometric Group, School of Electrical

More information

A New Algorithm for Detecting Text Line in Handwritten Documents

A New Algorithm for Detecting Text Line in Handwritten Documents A New Algorithm for Detecting Text Line in Handwritten Documents Yi Li 1, Yefeng Zheng 2, David Doermann 1, and Stefan Jaeger 1 1 Laboratory for Language and Media Processing Institute for Advanced Computer

More information

Toward Part-based Document Image Decoding

Toward Part-based Document Image Decoding 2012 10th IAPR International Workshop on Document Analysis Systems Toward Part-based Document Image Decoding Wang Song, Seiichi Uchida Kyushu University, Fukuoka, Japan wangsong@human.ait.kyushu-u.ac.jp,

More information

Text Detection and Extraction from Natural Scene: A Survey Tajinder Kaur 1 Post-Graduation, Department CE, Punjabi University, Patiala, Punjab India

Text Detection and Extraction from Natural Scene: A Survey Tajinder Kaur 1 Post-Graduation, Department CE, Punjabi University, Patiala, Punjab India Volume 3, Issue 3, March 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com ISSN:

More information

EDGE BASED REGION GROWING

EDGE BASED REGION GROWING EDGE BASED REGION GROWING Rupinder Singh, Jarnail Singh Preetkamal Sharma, Sudhir Sharma Abstract Image segmentation is a decomposition of scene into its components. It is a key step in image analysis.

More information

Pattern Recognition 46 (2013) Contents lists available at SciVerse ScienceDirect. Pattern Recognition

Pattern Recognition 46 (2013) Contents lists available at SciVerse ScienceDirect. Pattern Recognition Pattern Recognition 46 (2013) 131 140 Contents lists available at SciVerse ScienceDirect Pattern Recognition journal homepage: www.elsevier.com/locate/pr A novel ring radius transform for video character

More information

Image retrieval based on region shape similarity

Image retrieval based on region shape similarity Image retrieval based on region shape similarity Cheng Chang Liu Wenyin Hongjiang Zhang Microsoft Research China, 49 Zhichun Road, Beijing 8, China {wyliu, hjzhang}@microsoft.com ABSTRACT This paper presents

More information

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 2, APRIL 1997 429 Express Letters A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation Jianhua Lu and

More information

Logical Templates for Feature Extraction in Fingerprint Images

Logical Templates for Feature Extraction in Fingerprint Images Logical Templates for Feature Extraction in Fingerprint Images Bir Bhanu, Michael Boshra and Xuejun Tan Center for Research in Intelligent Systems University of Califomia, Riverside, CA 9252 1, USA Email:

More information

Effects Of Shadow On Canny Edge Detection through a camera

Effects Of Shadow On Canny Edge Detection through a camera 1523 Effects Of Shadow On Canny Edge Detection through a camera Srajit Mehrotra Shadow causes errors in computer vision as it is difficult to detect objects that are under the influence of shadows. Shadow

More information

AViTExt: Automatic Video Text Extraction

AViTExt: Automatic Video Text Extraction AViTExt: Automatic Video Text Extraction A new Approach for video content indexing Application Baseem Bouaziz systems and Advanced Computing Bassem.bouazizgfsegs.rnu.tn Tarek Zlitni systems and Advanced

More information

Text Separation from Graphics by Analyzing Stroke Width Variety in Persian City Maps

Text Separation from Graphics by Analyzing Stroke Width Variety in Persian City Maps Text Separation from Graphics by Analyzing Stroke Width Variety in Persian City Maps Ali Ghafari-Beranghar Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran,

More information

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial

More information

LOCALIZATION OF OVERLAID TEXT BASED ON NOISE INCONSISTENCIES

LOCALIZATION OF OVERLAID TEXT BASED ON NOISE INCONSISTENCIES LOCALIZATION OF OVERLAID TEXT BASED ON NOISE INCONSISTENCIES Too Kipyego Boaz and Prabhakar C. J. Department of Computer Science, Kuvempu University, India ABSTRACT In this paper, we present a novel technique

More information

Recognition-based Segmentation of Nom Characters from Body Text Regions of Stele Images Using Area Voronoi Diagram

Recognition-based Segmentation of Nom Characters from Body Text Regions of Stele Images Using Area Voronoi Diagram Author manuscript, published in "International Conference on Computer Analysis of Images and Patterns - CAIP'2009 5702 (2009) 205-212" DOI : 10.1007/978-3-642-03767-2 Recognition-based Segmentation of

More information

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang

Extracting Layers and Recognizing Features for Automatic Map Understanding. Yao-Yi Chiang Extracting Layers and Recognizing Features for Automatic Map Understanding Yao-Yi Chiang 0 Outline Introduction/ Problem Motivation Map Processing Overview Map Decomposition Feature Recognition Discussion

More information

Translation Symmetry Detection: A Repetitive Pattern Analysis Approach

Translation Symmetry Detection: A Repetitive Pattern Analysis Approach 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops Translation Symmetry Detection: A Repetitive Pattern Analysis Approach Yunliang Cai and George Baciu GAMA Lab, Department of Computing

More information

Text Detection from Natural Image using MSER and BOW

Text Detection from Natural Image using MSER and BOW International Journal of Emerging Engineering Research and Technology Volume 3, Issue 11, November 2015, PP 152-156 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Text Detection from Natural Image using

More information

Snakes reparameterization for noisy images segmentation and targets tracking

Snakes reparameterization for noisy images segmentation and targets tracking Snakes reparameterization for noisy images segmentation and targets tracking Idrissi Sidi Yassine, Samir Belfkih. Lycée Tawfik Elhakim Zawiya de Noaceur, route de Marrakech, Casablanca, maroc. Laboratoire

More information

Mobile Camera Based Text Detection and Translation

Mobile Camera Based Text Detection and Translation Mobile Camera Based Text Detection and Translation Derek Ma Qiuhau Lin Tong Zhang Department of Electrical EngineeringDepartment of Electrical EngineeringDepartment of Mechanical Engineering Email: derekxm@stanford.edu

More information

Multi-script Text Extraction from Natural Scenes

Multi-script Text Extraction from Natural Scenes Multi-script Text Extraction from Natural Scenes Lluís Gómez and Dimosthenis Karatzas Computer Vision Center Universitat Autònoma de Barcelona Email: {lgomez,dimos}@cvc.uab.es Abstract Scene text extraction

More information

Automatic Shadow Removal by Illuminance in HSV Color Space

Automatic Shadow Removal by Illuminance in HSV Color Space Computer Science and Information Technology 3(3): 70-75, 2015 DOI: 10.13189/csit.2015.030303 http://www.hrpub.org Automatic Shadow Removal by Illuminance in HSV Color Space Wenbo Huang 1, KyoungYeon Kim

More information

Connected Component Clustering Based Text Detection with Structure Based Partition and Grouping

Connected Component Clustering Based Text Detection with Structure Based Partition and Grouping IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 5, Ver. III (Sep Oct. 2014), PP 50-56 Connected Component Clustering Based Text Detection with Structure

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Text Localization and Extraction in Natural Scene Images

Text Localization and Extraction in Natural Scene Images Text Localization and Extraction in Natural Scene Images Miss Nikita her M.E. Student, MET BKC IOE, University of Pune, Nasik, India. e-mail: nikitaaher@gmail.com bstract Content based image analysis methods

More information

Bus Detection and recognition for visually impaired people

Bus Detection and recognition for visually impaired people Bus Detection and recognition for visually impaired people Hangrong Pan, Chucai Yi, and Yingli Tian The City College of New York The Graduate Center The City University of New York MAP4VIP Outline Motivation

More information

Color-Texture Segmentation of Medical Images Based on Local Contrast Information

Color-Texture Segmentation of Medical Images Based on Local Contrast Information Color-Texture Segmentation of Medical Images Based on Local Contrast Information Yu-Chou Chang Department of ECEn, Brigham Young University, Provo, Utah, 84602 USA ycchang@et.byu.edu Dah-Jye Lee Department

More information

Scene Text Recognition in Mobile Application using K-Mean Clustering and Support Vector Machine

Scene Text Recognition in Mobile Application using K-Mean Clustering and Support Vector Machine ISSN: 2278 1323 All Rights Reserved 2015 IJARCET 2492 Scene Text Recognition in Mobile Application using K-Mean Clustering and Support Vector Machine Priyanka N Guttedar, Pushpalata S Abstract In natural

More information

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information Iterative Removing Salt and Pepper Noise based on Neighbourhood Information Liu Chun College of Computer Science and Information Technology Daqing Normal University Daqing, China Sun Bishen Twenty-seventh

More information

HYBRID CENTER-SYMMETRIC LOCAL PATTERN FOR DYNAMIC BACKGROUND SUBTRACTION. Gengjian Xue, Li Song, Jun Sun, Meng Wu

HYBRID CENTER-SYMMETRIC LOCAL PATTERN FOR DYNAMIC BACKGROUND SUBTRACTION. Gengjian Xue, Li Song, Jun Sun, Meng Wu HYBRID CENTER-SYMMETRIC LOCAL PATTERN FOR DYNAMIC BACKGROUND SUBTRACTION Gengjian Xue, Li Song, Jun Sun, Meng Wu Institute of Image Communication and Information Processing, Shanghai Jiao Tong University,

More information

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction Volume, Issue 8, August ISSN: 77 8X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Combined Edge-Based Text

More information

Hidden Loop Recovery for Handwriting Recognition

Hidden Loop Recovery for Handwriting Recognition Hidden Loop Recovery for Handwriting Recognition David Doermann Institute of Advanced Computer Studies, University of Maryland, College Park, USA E-mail: doermann@cfar.umd.edu Nathan Intrator School of

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

Defect Inspection of Liquid-Crystal-Display (LCD) Panels in Repetitive Pattern Images Using 2D Fourier Image Reconstruction

Defect Inspection of Liquid-Crystal-Display (LCD) Panels in Repetitive Pattern Images Using 2D Fourier Image Reconstruction Defect Inspection of Liquid-Crystal-Display (LCD) Panels in Repetitive Pattern Images Using D Fourier Image Reconstruction Du-Ming Tsai, and Yan-Hsin Tseng Department of Industrial Engineering and Management

More information

A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition

A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition A Feature based on Encoding the Relative Position of a Point in the Character for Online Handwritten Character Recognition Dinesh Mandalapu, Sridhar Murali Krishna HP Laboratories India HPL-2007-109 July

More information

THE description and representation of the shape of an object

THE description and representation of the shape of an object Enhancement of Shape Description and Representation by Slope Ali Salem Bin Samma and Rosalina Abdul Salam Abstract Representation and description of object shapes by the slopes of their contours or borders

More information

An Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques

An Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques An Objective Evaluation Methodology for Handwritten Image Document Binarization Techniques K. Ntirogiannis, B. Gatos and I. Pratikakis Computational Intelligence Laboratory, Institute of Informatics and

More information

SOME stereo image-matching methods require a user-selected

SOME stereo image-matching methods require a user-selected IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 3, NO. 2, APRIL 2006 207 Seed Point Selection Method for Triangle Constrained Image Matching Propagation Qing Zhu, Bo Wu, and Zhi-Xiang Xu Abstract In order

More information

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE 5359 Gaurav Hansda 1000721849 gaurav.hansda@mavs.uta.edu Outline Introduction to H.264 Current algorithms for

More information

An Efficient Single Chord-based Accumulation Technique (SCA) to Detect More Reliable Corners

An Efficient Single Chord-based Accumulation Technique (SCA) to Detect More Reliable Corners An Efficient Single Chord-based Accumulation Technique (SCA) to Detect More Reliable Corners Mohammad Asiful Hossain, Abdul Kawsar Tushar, and Shofiullah Babor Computer Science and Engineering Department,

More information

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Ham Rara, Shireen Elhabian, Asem Ali University of Louisville Louisville, KY {hmrara01,syelha01,amali003}@louisville.edu Mike Miller,

More information

arxiv: v1 [cs.cv] 23 Apr 2016

arxiv: v1 [cs.cv] 23 Apr 2016 Text Flow: A Unified Text Detection System in Natural Scene Images Shangxuan Tian1, Yifeng Pan2, Chang Huang2, Shijian Lu3, Kai Yu2, and Chew Lim Tan1 arxiv:1604.06877v1 [cs.cv] 23 Apr 2016 1 School of

More information

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds 9 1th International Conference on Document Analysis and Recognition Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds Weihan Sun, Koichi Kise Graduate School

More information

A System to Retrieve Text/Symbols from Color Maps using Connected Component and Skeleton Analysis

A System to Retrieve Text/Symbols from Color Maps using Connected Component and Skeleton Analysis A System to Retrieve Text/Symbols from Color Maps using Connected Component and Skeleton Analysis Partha Pratim Roy 1, Eduard Vazquez 1, Josep Lladós 1, Ramon Baldrich 1, and Umapada Pal 2 1 Computer Vision

More information

Shape Descriptor using Polar Plot for Shape Recognition.

Shape Descriptor using Polar Plot for Shape Recognition. Shape Descriptor using Polar Plot for Shape Recognition. Brijesh Pillai ECE Graduate Student, Clemson University bpillai@clemson.edu Abstract : This paper presents my work on computing shape models that

More information

The Application of Image Processing to Solve Occlusion Issue in Object Tracking

The Application of Image Processing to Solve Occlusion Issue in Object Tracking The Application of Image Processing to Solve Occlusion Issue in Object Tracking Yun Zhe Cheong 1 and Wei Jen Chew 1* 1 School of Engineering, Taylor s University, 47500 Subang Jaya, Selangor, Malaysia.

More information

Automatic Texture Segmentation for Texture-based Image Retrieval

Automatic Texture Segmentation for Texture-based Image Retrieval Automatic Texture Segmentation for Texture-based Image Retrieval Ying Liu, Xiaofang Zhou School of ITEE, The University of Queensland, Queensland, 4072, Australia liuy@itee.uq.edu.au, zxf@itee.uq.edu.au

More information