Text identification for document image analysis using a neural network


IMAVIS 1511
Image and Vision Computing 16 (1998)

C. Strouthopoulos, N. Papamarkos*
Electric Circuits Analysis Laboratory, Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
Received 21 March 1997; accepted 12 November 1997

Abstract

A new bottom-up method is described that clusters the content of a mixed type document into text or non-text areas. The proposed approach is based on a new set of features combined with a self-organized neural network classifier. The set of features corresponds to the contents and the relationship of 3 × 3 masks, is selected by using a statistical reduction procedure, and provides texture information. Next, a Principal Components Analyzer (PCA) is applied, which results in a reduced number of effective features. The final set of features is then used as the input vector of a neural network that performs the classification. The neural network classifier is based on a Kohonen Self Organized Feature Map (SOFM). Document blocks are classified as text, graphics, and halftones, or into secondary subclasses corresponding to special cases of the primary classes. The proposed method can identify text regions included in graphics or even overlapped regions, that is, regions that cannot be separated with horizontal and vertical cuts. The performance of the method was extensively tested on a variety of documents with very promising results. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Block classification; Document segmentation; Page layout analysis; Neural network classifiers

1. Introduction

The use of paper documents is a common way of transferring information and communicating. The conversion of paper documents into a suitable electronic form is essential for their processing, understanding, archiving, and transmission by computers. An important procedure in the digital processing of documents is the Page Layout Analysis (PLA).
The goal of the PLA is to discover the formatting of the text and, from that, to derive meaning associated with the positional and functional blocks in which the text is located [1–3]. As a result, it labels the parts of a document as text, halftones, and line drawings [4]. To achieve PLA, the document must first be separated into meaningful blocks whose data share common, homogeneous attributes, for example blocks containing only text, only line drawings, or only halftone images. The PLA of mixed type documents is a prerequisite for later processes such as the recognition of text, the vectorization of graphics, and the compression of images. For example, when a document is to be processed by an optical character recognition (OCR) system, it is necessary to separate text from halftones and line drawings, so that time will not be wasted in attempting to interpret the graphics as text. Also, encoding the text and image areas of a document with different methods improves the compression ratio, which is useful for electronic archiving and transmission.

* Corresponding author. E-mail: papamark@voreas.ee.duth.gr

Many techniques have been proposed for document segmentation. They can be classified as top-down (or model-driven) [4–8], bottom-up (or data-driven) [9–11], or hybrid [12–15]. In top-down methods, a document is first split into major regions (large components), the major regions into smaller subregions (more detailed components), and so on. Most of the top-down techniques are based on the run length smoothing (RLS) algorithm [5], also called the constrained run length (CRL) algorithm [6], and on projection profile cuts [7]. The RLS algorithm imposes a smoothing on the binary form of the document using two predetermined parameters, one for the vertical and one for the horizontal smoothing of the blocks.
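The run length smoothing step described above can be illustrated with a minimal sketch. It assumes the commonly described form of the algorithm, in which short white runs are filled along rows and along columns and the two results are combined with a logical AND. The function names, the 0/1 list-of-lists image format, and the choice to fill only runs enclosed by black pixels are assumptions made for illustration and are not taken from Refs. [5,6].

```python
def smooth_runs(line, threshold):
    """Fill white runs (0s) of length <= threshold that lie between two black pixels (1s)."""
    out = list(line)
    last_black = None
    for i, pixel in enumerate(line):
        if pixel == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= threshold:
                for t in range(last_black + 1, i):
                    out[t] = 1
            last_black = i
    return out


def run_length_smoothing(image, h_threshold, v_threshold):
    """Smooth rows and columns separately, then AND the two results."""
    horizontal = [smooth_runs(row, h_threshold) for row in image]
    smoothed_columns = [smooth_runs(list(col), v_threshold) for col in zip(*image)]
    vertical = [list(row) for row in zip(*smoothed_columns)]
    return [[a & b for a, b in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

The two thresholds play the role of the two predetermined smoothing parameters. Their values have to be chosen per document type, which is the source of the robustness problem discussed next.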
For the block classification, additional parameters (defined in a heuristic way) are used, which makes it necessary to train the system with documents having similar fonts or other common morphological characteristics. It must be noted that the method is not robust: if the assumptions made to determine the heuristic parameters are not satisfied, the method will fail. Another disadvantage of this method is its assumption that pages consist only of rectangular and orthogonal blocks, which can be separated by horizontal and vertical cuts. So, for documents where halftones and graphics are intermixed or overlapped with text, the RLS based methods will fail.

Bottom-up methods group pixels into connected components (marks) and merge these components into successively larger components. Kasturi and co-workers [9,10] proposed a method that starts by finding the connected components of an image and then separates graphics from text using the relative frequency of occurrence of these components as a function of their areas. In the next step, an iterative procedure improves the initial estimate by applying the Hough Transform to all connected components. The method performs well if the initial document conforms to certain requirements about the size of characters, the interline spacing, the character spacing, and the resolution. This method is quite complex and, as the authors have reported, computationally expensive.

The document segmentation technique proposed by O'Gorman [11] is based on the document spectrum (docstrum) for k-nearest-neighbor marks. In this approach, the first step is the determination of marks and their centroids. Next, the direction vectors (distances, angles) of the k-nearest-neighbor connections of each mark are accumulated in a histogram. This histogram has two major peaks: one corresponding to intercharacter spacing and the other corresponding to the document skew. Using these peaks, a complex process groups the characters into words and text lines. This method, which includes both skew estimation and layout analysis, is computationally expensive and ineffective for multi-size or short text strings and for halftone blocks.

Hybrid techniques are those that use local and global strategies in combination with texture analysis approaches. Fan et al.
[12] proposed a document analysis method that first performs an RLS operation followed by a stripe merging procedure for block segmentation. Next, text blocks are classified by exploiting their periodic behavior in the y-projection. Finally, a feature-based classification scheme classifies blocks into predetermined categories. Unfortunately, this method fails in many cases. First, non-horizontal text or small text blocks do not show the periodic behavior in the y-projection. Second, the feature set corresponds only to the number of black pixels included in a 3 × 3 mask, and therefore the correct classification of graphics and halftones is not guaranteed.

Using Gabor filters, which had been used earlier for the general problem of texture segmentation [13], Jain and Bhattacharjee [14] proposed a method for text segmentation of gray-level documents. The method works well with low-resolution documents and is robust to skew. However, as the authors have reported, because it operates in the frequency domain this method is time consuming, requiring about 2 min of workstation CPU time per image. Also, there is no guarantee that Gabor filtering is the best choice for a given texture segmentation and classification task [15].

The recent page segmentation technique proposed by Jain and Zhong [15] is based on a set of texture masks in combination with a neural network classifier. In this method, text and line drawing regions are first classified in the same class. Unfortunately, the subsequent discrimination of these two classes, which is a crucial step for text identification, is achieved by an ineffective procedure that is based only on the size of the connected components in the blocks.

Strouthopoulos et al. [16,17] proposed a method for the identification of text-only areas in documents. The first stage of this method is an effective bottom-up technique, which produces elongated orthogonal blocks using a mark extraction and a bounding box connection algorithm.
Next, these rectangles are classified as text or non-text areas by using thirteen 3 × 3 masks called document structural elements (DSEs). Usually, this method gives satisfactory segmentation results. However, due to the type of feature set and the classification scheme used, it fails when the documents include elongated line drawings or some kinds of halftones.

This paper discusses a research effort intended to develop a document analysis system that can automatically identify text regions by classifying the text, graphics and halftone blocks embedded in a document. We propose a new page segmentation method that uses spatial structural features in combination with a neural network classifier. In addition to the DSEs, the set of features combines the contents of neighboring pairs of 3 × 3 masks and therefore provides two-dimensional texture information. After applying a feature selection technique, only 34 effective features remain, which are then taken as input to a PCA. The result of this procedure is a reduced number of eight effective features, which are then used by a neural network classifier. The PCA is trained with the generalized Hebbian algorithm (GHA) [18–21]. It transforms the input data vectors so that the retained components account for as much of the total variation as possible. The neural network classifier is a Kohonen SOFM, which has as input the transformed vectors resulting from the first stage and as output a competition layer consisting of an 8 × 8 grid.

The proposed document block classification method is distinguished from other similar techniques by the following desirable properties: 1. the robust and fast extraction of the document marks and blocks, 2. the independence of the type and the size of the characters and also of the type of halftones and graphics, 3. it can identify new desirable classes of blocks, such as blocks having italic characters, characters of a specific font size, and graphics with predefined texture characteristics, 4.
it can identify text regions included in graphics or even overlapped regions, that is, regions that cannot be separated with horizontal and vertical cuts (Manhattan layout [1]).

In comparison with the previous method [16,17], the new approach provides: 1. an improved feature set, through an additional set of features that includes information about the two-dimensional texture of the blocks, 2. a feature reduction technique that is based not only on a statistical selection scheme but also on the PCA method, 3. the use of the SOFM, which optimally identifies the block clusters.

The method was tested with many mixed type documents. To allow comparison, many of the test documents were taken from the UW Document Image Database [22]. The method was compared with the commercial products Recognita Plus and Anagnostis. In all the test cases, the page layout procedures of these products could not handle special document cases such as the documents used in our examples. This paper presents characteristic examples that cover special layout difficulties. The experimental results confirm the effectiveness of the proposed method.

2. Block extraction

A mark is a connected group of black pixels with the following property: for every pair of pixels belonging to the mark there is always a path leading from one pixel to the other. After the extraction of the marks, every mark is enclosed in a rectangular bounding box. For example, the application of this procedure to the document of Fig. 1 results in the bounding boxes of Fig. 2. From the heights of the bounding boxes a histogram is formed, whose main peaks can be determined by using the hill-clustering algorithm [23,24]. The result of this clustering procedure is the determination of the histogram peaks, which correspond to the distribution of the most used character sizes of the document. For a typical document, there is often a global peak for the characters of the predominant size, and smaller peaks for the rest of the characters and the noise. Thus, the histogram takes its global maximum value for the boxes that bound the characters of the most common size.
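The mark extraction and height histogram described above can be sketched as follows. This is an illustrative sketch only. It assumes 8-connectivity, a 0/1 list-of-lists image, and box tuples of the form (top, left, bottom, right), none of which are specified in the text, and it replaces the hill-clustering algorithm of Refs. [23,24] with a plain search for the most frequent height.

```python
from collections import Counter, deque


def extract_marks(image):
    """Return the bounding box (top, left, bottom, right) of every 8-connected mark."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first walk over the pixels of one mark
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def dominant_height(boxes):
    """Most frequent box height: a simplified stand-in for the main histogram peak."""
    histogram = Counter(bottom - top + 1 for top, _, bottom, _ in boxes)
    return max(histogram, key=lambda h: (histogram[h], h))
```

On a real page the histogram has several peaks, so a full implementation would return all significant peaks in decreasing order rather than only the largest one.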
Consequently, each iteration of the block extraction algorithm corresponds to a histogram peak. The procedure is repeated peak by peak, from the biggest to the smallest, until no other significant peaks exist. For each iteration, corresponding to a histogram peak value $H_{\max}$, only those boxes are accepted whose heights $h_i$ satisfy the condition

$$\frac{H_{\max}}{2} \le h_i \le 2H_{\max}. \tag{1}$$

According to Ref. [25], each text string must contain type of similar size (a range of less than two to one in point size). This explains the choice of the coefficients 1/2 and 2 in Eq. (1). Similar coefficients are used in Ref. [9].

2.1. Extension and merging of the bounding boxes

After the construction and filtering of the bounding boxes, these are extended to chains of connected boxes. This procedure is important for many reasons, but mainly for the extraction of text lines and for independence from small document skews. Fig. 3 shows the remaining boxes after height filtering and after extension by two equal rectangular extensions added to their left and right sides (see Fig. 4). Referring to this figure, the value of $h_1$ is taken equal to the histogram peak value $H_{\max}$, so that adjacent extended boxes that are collinear and less than $2H_{\max}$ apart are merged together. Obviously, $H_{\max}$ approaches the local average character height [25]. The values $h_2 = H_{\max}/2$ and $h_3 = H_{\max}/4$ have been chosen so that the boxes of high characters (like h, l) or of characters with a tail (like p, q) can be joined with their adjacent boxes. The value $h_3$ is enough for the connection of the bounding boxes of text lines with small skew. A larger value of $h_3$ produces a larger tolerance to skew. The value of $h_3$ satisfies the conditions described in Ref. [25]: the leading, or interline spacing between lines of text, must not be excessively small. It must be more than 25% of the average letter height.
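The height filter of Eq. (1) and the horizontal extension step can be sketched as follows. This is a simplified illustration under stated assumptions: boxes are hypothetical (top, left, bottom, right) tuples, each box is extended by $H_{\max}$ on its left and right side, and extended boxes that overlap are merged. The vertical tolerances $h_2$ and $h_3$ of Fig. 4 are ignored here, and the helper `is_elongated` anticipates the base-to-height threshold of 3.5 that this section applies to the resulting rectangles.

```python
def filter_by_height(boxes, h_max):
    """Keep boxes whose height h satisfies H_max / 2 <= h <= 2 * H_max (Eq. (1))."""
    return [b for b in boxes if h_max / 2 <= (b[2] - b[0] + 1) <= 2 * h_max]


def merge_into_chains(boxes, h_max):
    """Extend boxes by h_max on both sides and merge overlapping extended boxes."""
    extended = [(t, l - h_max, b, r + h_max) for t, l, b, r in boxes]
    parent = list(range(len(extended)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def overlap(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    for i in range(len(extended)):
        for j in range(i + 1, len(extended)):
            if overlap(extended[i], extended[j]):
                parent[find(i)] = find(j)

    chains = {}
    for i, (t, l, b, r) in enumerate(extended):
        root = find(i)
        ct, cl, cb, cr = chains.get(root, (t, l, b, r))
        chains[root] = (min(ct, t), min(cl, l), max(cb, b), max(cr, r))
    return sorted(chains.values())


def is_elongated(rect, min_ratio=3.5):
    """Base-to-height test for the rectangle that surrounds a chain."""
    top, left, bottom, right = rect
    return (right - left + 1) / (bottom - top + 1) > min_ratio
```

With these assumptions two boxes are merged exactly when the gap between them is at most $2H_{\max}$, and an isolated character-sized box yields a rectangle whose ratio stays below 3.5.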
The extension and merging procedure joins the bounding boxes of a text line and creates chains of boxes that correspond to the text lines. In practice, this procedure replaces the complex and time-consuming use of the Hough Transform for checking the collinearity and vicinity of the bounding boxes. Considering the result of the previous step as a new binary image, we surround each mark (which is now composed of extended, connected boxes) with a new rectangle. The rectangles that enclose the extended boxes of a text line are elongated. In contrast, boxes that are located excessively far from their adjacent boxes, or boxes that constitute a small isolated group, either do not create overlapping boxes or create chains of relatively small length. Fig. 5 shows the chains of the overlapping boxes and the new rectangles that surround the chains. We accept those new rectangles that have a base to height ratio greater than 3.5. The threshold value 3.5 corresponds to the case of only two connected characters.

After the above process is completed, the elongated blocks of the document, which consist of collinear and neighboring components of almost the same height, have been detected. The majority of these blocks correspond to text lines. However, horizontal arrangements of halftones or drawings may also compose such blocks. The application of a block classification technique will discriminate and classify the blocks corresponding only to text.

3. Feature extraction

For the block classification, we use two sets of DSE features. The first set is identical to the features used by Strouthopoulos et al. [17]. It is observed that this set of features does not represent the exact spatial information well in cases where there are large line drawings, especially lines, or some types of halftone in the blocks. For this reason an additional set of features is proposed, which combines the intrinsic information in neighboring DSEs. Next, we describe the two sets of features in detail.

Fig. 1. Original binary document.

Fig. 2. Bounding boxes after identification of marks.

3.1. First set of features

To express the spatial information of the blocks we use features that describe what types of 3 × 3 masks are included in the blocks. The first set of features corresponds to 13 structural 3 × 3 binary masks, which we call document structure elements. The order of pixels in the mask is as follows:

Fig. 3. Remaining boxes after their height filtering and extensions.

An integer $L = \sum_{i=0}^{8} b_i 2^i$, with $b_i \in \{0,1\}$, is assigned to every DSE and is called the document structure element characteristic number (DSECN). Since $L \in \{0,1,2,\ldots,511\}$, there are 512 different DSEs. For each block a histogram is formed that gives the contribution of the DSEs in the block (the DSEs 0 and 511 are not considered because they correspond to pure background and pure object regions, respectively). For a rectangular block A, let K be the number of columns and J the number of rows.

Fig. 4. Form of extension of a bounding box.

Obviously, A has $(K-2)(J-2)$ DSEs, from which the histogram function $h_1(l)$ and the probability density function of the DSEs can be formed:

$$H_1(l) = \frac{h_1(l)}{\sum_{l=1}^{510} h_1(l)}. \tag{2}$$

Fig. 6 shows the $H_1(l)$ functions of the three typical blocks, i.e. blocks containing text, graphics, and halftone images. The 510 values of $H_1(l)$ can be taken as texture features. However, as we explain below, after the application of a feature selection technique only 13 of them are finally selected as texture features.

3.2. Second set of features

The second set of features is used in order to overcome difficult block classification cases. From our experiments we have observed that when a block contains some kind of large line drawing or halftones, the use of the first set of features alone results in wrong classification. This happens because the first set of features does not describe the morphology of graphics in the blocks well. To overcome these difficulties, additional features are used, which are extracted by combining the information of pairs of neighboring DSEs. Specifically, every DSE is examined as a pair with each one of its eight neighboring DSEs. Doing this, a two-dimensional histogram $h_2(l_1,l_2)$ is formed that gives the contribution of each pair of DSEs in the block. The two-dimensional histogram function $h_2(l_1,l_2)$ of block A is obtained using the relation

$$h_2(l_1,l_2) = \begin{cases} h_2(l_1,l_2) + 1, & \text{if } l_1 = L_1(k,j) \text{ and } l_2 = L_2(k',j') \\ h_2(l_1,l_2), & \text{otherwise} \end{cases} \tag{3}$$

where

$$l_1 \in \{1,2,\ldots,510\}, \quad l_2 \in \{1,2,\ldots,510\}, \quad k \in \{4,\ldots,K-5\}, \quad j \in \{4,\ldots,J-5\},$$

$$k' = k + a, \quad j' = j + b,$$

$$(a,b) \in \{(-3,0), (-3,-3), (0,-3), (3,-3), (3,0), (3,3), (0,3), (-3,3)\},$$

$L_1(k,j)$ is the characteristic number of the DSE in position $(k,j)$, and $L_2(k',j')$ is the characteristic number of each neighboring DSE in position $(k',j')$.
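The two histograms can be sketched in a few lines. This is a minimal sketch, not the authors' code. Because the pixel order of the mask is not reproduced in this text, the bit order $b_0,\ldots,b_8$ is assumed to be row-major from the top-left pixel. The function names and the 0/1 list-of-lists block format are also assumptions. In the pair histogram, `a` is applied to the column index and `b` to the row index, following $k' = k + a$, $j' = j + b$.

```python
from collections import Counter

# Offsets (a, b) of the eight neighboring DSEs, as listed with Eq. (3).
OFFSETS = [(-3, 0), (-3, -3), (0, -3), (3, -3), (3, 0), (3, 3), (0, 3), (-3, 3)]


def dsecn(block, row, col):
    """Characteristic number of the 3x3 mask centred at (row, col); assumed row-major bits."""
    number, bit = 0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            number += block[row + dr][col + dc] << bit
            bit += 1
    return number


def first_feature_set(block):
    """H_1(l): normalised histogram of DSECNs, with 0 and 511 excluded (Eq. (2))."""
    counts = Counter()
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            l = dsecn(block, r, c)
            if 1 <= l <= 510:
                counts[l] += 1
    total = sum(counts.values())
    return {l: n / total for l, n in counts.items()} if total else {}


def second_feature_set(block):
    """Normalised histogram of (l1, l2) pairs of neighboring DSEs (Eq. (3))."""
    counts = Counter()
    for r in range(4, len(block) - 4):          # j in {4, ..., J - 5}
        for c in range(4, len(block[0]) - 4):   # k in {4, ..., K - 5}
            l1 = dsecn(block, r, c)
            if not 1 <= l1 <= 510:
                continue
            for a, b in OFFSETS:
                l2 = dsecn(block, r + b, c + a)
                if 1 <= l2 <= 510:
                    counts[(l1, l2)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}
```

Both functions return sparse dictionaries, since most of the 510 values and the great majority of the pairs never occur in a given block.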
For this histogram, the probability density function $H_2(l_1,l_2)$ is equal to

$$H_2(l_1,l_2) = \frac{h_2(l_1,l_2)}{\sum_{l_1}\sum_{l_2} h_2(l_1,l_2)}. \tag{4}$$

Fig. 7 shows the functions for the three typical blocks, i.e. blocks containing text, graphics, and halftone images. Using the probability density function $H_2(l_1,l_2)$ we can extract one feature per pair $(l_1,l_2)$. However, as in the case of the first feature set, the feature selection procedure retains only 21 features. So, the total number of features extracted from the two feature sets is 34.

3.3. Feature selection

To reduce the feature space and retain only the most powerful features, a feature selection procedure is necessary. In this approach, feature selection is based on stability, separability, and similarity criteria. The entire procedure is performed on a large number of test blocks extracted from different types of document, and for each set of features separately. The stability test tries to identify the features that give significantly stable performance. We accept only the highly stable features, having normalized standard deviation $\sigma \le 0.01$. For the separability test, the separability factor $S^l_{avg}$ is computed for every feature $l$ belonging to two classes $i$ and $j$:

$$S^l_{avg} = \frac{\left| (m^l_i)^2 - (m^l_j)^2 \right|}{\sqrt{(\sigma^l_i)^2 + (\sigma^l_j)^2}} \tag{5}$$

where $m^l_i$, $m^l_j$ are the mean values and $\sigma^l_i$, $\sigma^l_j$ the standard deviation values for each class. A large separability means that the feature distinguishes the two classes well. The separability factor has been computed for every feature and for all class pairs. A proper threshold value for the separability factor is 3. For the first feature set, the separability test procedure rejects 436 features.

Fig. 5. Chains of overlapping boxes.

The feature similarity analysis estimates, for every two features $l$ and $m$ belonging to the same class $p$, the correlation factor

$$C^p_{lm} = \frac{1}{n_p} \sum_{i=1}^{n_p} \frac{(l_i - m_l)(m_i - m_m)}{\sigma_l \sigma_m} \tag{6}$$

where $n_p$ is the number of elements of class $p$, $l_i$ and $m_i$ the feature values of element $i$, and $m_l$, $m_m$, $\sigma_l$ and $\sigma_m$ the means and standard deviations of the features in class $p$. Of the features that show high correlation, $C_{avg} \ge 0.9$, those with high separabilities are retained and the others deleted. We have found that 49 features can be rejected for the first feature set. Therefore, the application of the above feature selection process to the first feature set results in only 13 features, corresponding to the DSEs given in Table 1. It is noted that the threshold values of the entire feature selection procedure are taken from the previous work of Driels and Nolan [26].

Table 1. The first set of 13 DSE features.

Fig. 6. Probability density functions $H_1(l)$ of typical regions.

To speed up the selection process for the second feature set, the features having values near zero are rejected first. We apply this criterion because very low and similar feature values correspond to a low separability factor. Doing this, most of the features are rejected, and to the remaining features we apply the stability, separability, and similarity criteria in the same way as for the first set. The entire feature selection procedure leads to the selection of only the 21 features shown in Table 2. Specifically, the stability, separability, and similarity tests reject 14, 87, and 37 features, respectively. As can be observed from Tables 1 and 2, these results are reasonable. For example, the main components of halftone blocks are the DSEs with characteristic numbers 170, 341, 186, 495, 381, and 471. Thus, the final feature set consists of only 34 features, which after a linear normalization corresponds to a 34-dimensional space.

Fig. 7. Probability density functions $H_2(l_1,l_2)$ of typical regions.

Table 2. The second set of features.

In order to explain the contribution of the second feature set, let us consider the two images of Fig. 8. For these images the non-zero values of the first feature set are shown in Table 3 and are exactly the same. In contrast, because of the different values of the second feature set, these images can be clearly distinguished. The non-zero values of the second feature set are given in Table 4.

Fig. 8. Test images.

Using this feature set, each block is represented by its corresponding vector in the feature space. However, in order to achieve better classification results we apply a PCA procedure, which feeds linear combinations of the features to a self-organized neural network.

4. Principal component analysis

The selected feature set is taken as the input to a neural network PCA system. PCA provides a means of transforming the feature space so that the new space accounts for as much of the total variation as possible. Specifically, using the PCA procedure the original feature space is transformed into another feature space that has exactly the same dimension as the original. However, the transformation is designed in such a way that the original feature set may be represented by a reduced number of effective features and yet retain most of the intrinsic information content of the data. Therefore, using PCA we achieve not only an increase in feature variation but also a decrease in feature space dimensionality.
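As a brief aside on the selection criteria of Eqs. (5) and (6), the separability and correlation factors can be computed as in the sketch below. It is an illustration under assumptions: each argument is a list of the values one feature takes over the blocks of one class, the population standard deviation is used (which matches the $1/n_p$ factor in Eq. (6)), and the function names are hypothetical. How the per-pair factors are combined into the averaged quantities $S_{avg}$ and $C_{avg}$ is not spelled out in the text, so the sketch stops at the pairwise values.

```python
from math import sqrt
from statistics import mean, pstdev


def separability(values_i, values_j):
    """Eq. (5): |m_i^2 - m_j^2| / sqrt(sigma_i^2 + sigma_j^2) for one feature, two classes."""
    m_i, m_j = mean(values_i), mean(values_j)
    spread = sqrt(pstdev(values_i) ** 2 + pstdev(values_j) ** 2)
    if spread == 0:
        return float("inf") if m_i != m_j else 0.0
    return abs(m_i ** 2 - m_j ** 2) / spread


def correlation(values_l, values_m):
    """Eq. (6): correlation factor of two features over the elements of one class."""
    m_l, m_m = mean(values_l), mean(values_m)
    s_l, s_m = pstdev(values_l), pstdev(values_m)
    if s_l == 0 or s_m == 0:
        return 0.0  # a constant feature carries no correlation information
    products = sum((a - m_l) * (b - m_m) for a, b in zip(values_l, values_m))
    return products / (len(values_l) * s_l * s_m)
```

A feature pair with `correlation(...) >= 0.9` would be a candidate for pruning, and a feature with `separability(...)` above the threshold of 3 would count as discriminative for that class pair.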
The Karhunen–Loève transformation (KLT) [18] is a well-known tool for PCA in multivariate analysis. A complete analysis of the PCA method used in this paper is given in Refs. [19,20]. The form of the PCA is shown in Fig. 10 as the first part of the entire neural network. Briefly, in this approach the input of the PCA is the 34-dimensional feature vector $x$, while the output is an 8-dimensional vector $y$, which is given by the relation

$$y = Wx \tag{7}$$

where the transformation matrix $W$ contains the neural network coefficients. It is noted that after training, $W$ will approach the matrix whose rows are the first $M$ eigenvectors of $C$, ordered by decreasing eigenvalues, where $C$ is the covariance matrix of the training set.

Table 3

The selection of eight output neurons is explained as follows. If $M$ takes a value less than 34, a reduction of the dimension of $y$ is achieved. To find the optimal value of $M$, the mean square error $\varepsilon^2(M)$ [19,20] is estimated for $M = 1,2,\ldots,33$ according to the relations

$$\varepsilon^2(M) = E[e^2] \tag{8}$$

$$e = \sum_{i=M}^{33} y_i w_i \tag{9}$$

where $y_i$ is the $i$th output, $w_i$ the coefficient vector of the $i$th neuron, and $E[\cdot]$ the statistical expectation operator, which is computed over the whole training set. As we can see from Fig. 9, $M = 8$ is the lowest value for which $\varepsilon^2(M)$ is practically zero, and therefore it can be considered optimal.

The neural network is trained by the Generalized Hebbian Algorithm (GHA). GHA is an unsupervised learning algorithm based upon a Hebbian learning rule. It calculates the eigenvectors of the covariance matrix $C$ of the training feature set, and it has been proven to converge with probability one. In contrast to the Karhunen–Loève algorithm, this approach does not need to compute $C$ analytically, since the eigenvectors are derived directly from the data.

5. The neural network classifier

The neural network classifier is based on a Kohonen SOFM. The inputs of this neural net are the outputs of the PCA. The network combines its input layer with a competitive layer of neurons, and is trained by unsupervised learning.
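The GHA training of the PCA stage can be illustrated with the standard form of Sanger's rule, in which neuron $j$ is updated by $\Delta w_j = \eta\, y_j \left(x - \sum_{k \le j} y_k w_k\right)$. The sketch below is written under assumptions: the learning rate, number of epochs, initialisation range, and function names are illustrative choices, and the samples are expected to have zero mean. It is not the training configuration used in the paper.

```python
import random


def project(weights, x):
    """Eq. (7): y = W x, with W given as a list of weight rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_gha(samples, n_outputs, epochs=200, rate=0.01, seed=0):
    """Estimate the first n_outputs principal directions with Sanger's rule (GHA)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_outputs)]
    for _ in range(epochs):
        for x in samples:
            y = project(weights, x)
            residual = list(x)
            for j in range(n_outputs):
                # Remove the part of x already explained by neurons 0..j,
                # then move neuron j along the remaining residual.
                residual = [r - y[j] * w for r, w in zip(residual, weights[j])]
                weights[j] = [w + rate * y[j] * r for w, r in zip(weights[j], residual)]
    return weights
```

For the setting described in the text, `samples` would hold the normalised 34-dimensional feature vectors and `n_outputs` would be 8. No covariance matrix is formed at any point, which is the practical advantage noted above.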
The two layers are fully interconnected, since every input is connected to all of the neurons in the competitive layer. Typically, the competitive layer is organized as a two-dimensional square grid, and each neuron represents a class. As a result of Kohonen self-organization, similar classes are represented by neighboring neurons on the competitive layer. Fig. 10 gives the Kohonen SOFM topology as the second part of the entire neural network. In this approach the competition layer is an 8 × 8 grid. Although the grid is two-dimensional, each neuron is labeled by an index $k$ which takes values from the set {0,1,2,…,63}. After training, each neuron on the output layer represents a class of patterns. Patterns of large similarity are represented by the same neuron on the grid. Each neuron is labeled by the identity of the patterns that were classified on it. The labeling method of the SOFM is based on the density-matching criterion [20]. More specifically, due to the high discrimination ability of the input feature vector, the distributions of occurrences (votes) in the output neurons have small standard deviations. That is, the majority of occurrences in each neuron of the competition layer corresponds clearly to one of the three main block categories. Thus, after the training stage it is easy to label each neuron with the correct block class. The Kohonen feature map organizes the neurons of the competitive layer in such a way that similarities among patterns are mapped into closeness relationships on the competitive layer grid. Also, the Kohonen SOFM provides advantages over classical pattern-recognition techniques because it uses the parallel architecture of a neural network and provides a graphical organization of pattern relationships [21].

Table 4

Fig. 9. The graph of the function $\varepsilon^2(M)$.

The final segmentation results for the document of Fig. 1 are shown in Fig. 11. The document of Fig. 11 has a resolution of 300 dpi. Text regions have been identified properly by the method and are shown as shaded rectangles.

6. Experimental results

6.1. Neural network training experimental results

The training of the neural network classifier is based on a set of representative blocks derived from different document types of various resolutions. Every block was assigned to one of the three basic classes of text, graphics, and halftones. Text blocks with characters of different sizes and font types, graphics blocks with different line thicknesses, and blocks of images rendered with different halftone techniques were included in the training set. First, the PCA neural network was trained for various numbers of neurons, and finally eight neurons were selected. Next, the SOFM neural network classifier was trained. Fig. 12 illustrates the training results. Each neuron is painted according to the class it represents. It can be observed that neurons of similar patterns create uniform groups, which correspond to the three main classes. The results of training were tested with two sets of experiments. The first set of 500 blocks was obtained from documents belonging to the training set, while the other set of 2500 blocks was obtained from documents outside the training set.
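The SOFM training and the vote-based labeling of the neurons can be sketched as follows. This is a minimal sketch under assumptions: the linear decay of the learning rate and of the neighborhood radius, the hard circular neighborhood, the random initialisation, and all function names are illustrative choices that the text does not specify. Neurons are addressed by a single index `k` in {0, …, 63}, as in the text, and each neuron receives the majority class of the training samples it wins.

```python
import random
from collections import Counter


def winner(weights, x):
    """Index k of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))


def train_sofm(samples, grid=8, epochs=50, rate0=0.5, seed=0):
    """Train a grid x grid Kohonen map; returns grid * grid weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(grid * grid)]
    for epoch in range(epochs):
        shrink = 1.0 - epoch / epochs            # assumed linear decay schedule
        rate = rate0 * shrink
        radius = max(grid / 2 * shrink, 0.5)
        for x in rng.sample(samples, len(samples)):
            win_row, win_col = divmod(winner(weights, x), grid)
            for k, w in enumerate(weights):
                row, col = divmod(k, grid)
                if (row - win_row) ** 2 + (col - win_col) ** 2 <= radius ** 2:
                    for i in range(dim):         # pull neuron k toward the sample
                        w[i] += rate * (x[i] - w[i])
    return weights


def label_neurons(weights, samples, labels):
    """Label each winning neuron with the majority class of the samples mapped to it."""
    votes = {}
    for x, label in zip(samples, labels):
        votes.setdefault(winner(weights, x), Counter())[label] += 1
    return {k: counts.most_common(1)[0][0] for k, counts in votes.items()}
```

A new block would then be classified by looking up the label of `winner(weights, y)`, where `y` is its 8-dimensional PCA output. Neurons that never win a training sample remain unlabeled in this sketch.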
Table 5 summarizes the classification results for the text, graphics, and halftone classes. Another significant result is the identification of block subclasses that are not obvious at the beginning. Specifically, there are some neurons, each of which represents exclusively one pattern type, that can be considered as a subclass of one of the basic classes. That is, each main class, which consists of a number of neurons, includes a number of subclasses. In the 8 × 8 grid of the competition layer, such subclasses are to be expected. The subclasses are observed after the test procedure by taking into account the type of the classified blocks. Such neurons are mentioned below.

Fig. 10. The two stages of a neural network.

Fig. 11. Final text identification results.

Neuron / Subclass
Text of Italic characters
Text of Arial font characters
Text of Roman font characters
Text of Sanserif font characters
Text of underline characters
White Noise Halftoning
Clustered-Dot Ordered Dither

Fig. 12. Kohonen SOFM after training.

Fig. 13. Results for a document from the UW database.

Table 5
Percentage of correct block classification
            Blocks of training set (%)    2500 blocks out of the training set (%)
Text
Graphics
Halftones

This is a promising advantage of the proposed block classification procedure because it gives additional valuable information for the blocks.

6.2. Experimental results of text block identification

The proposed method was extensively tested on a series of mixed type documents. In addition to our test documents, the method has also been applied to some documents obtained from the University of Washington database [22]. The method is also compared with the commercial products Recognita Plus, TextBridge and Anagnostis. In all the test cases, the page layout procedures of these products are not applicable to special document cases such as the documents used in our examples. The experimental results clearly show the improvement in block classification. Blocks that were misclassified using the first feature set are now classified correctly using the new feature set and the proposed neural network classifier. It is noted that the majority of the examples use documents with special PLA difficulties that are not covered by known methods.

Fig. 13 presents the results obtained by applying the proposed method to mixed type documents in which some blocks were misclassified by the previous method. The document of Fig. 13 has a size of pixels and a resolution of 300 dpi and is taken from the University of Washington database. In the top-left corner of this document, we can observe the blocks shown in Fig. 14. These blocks are now classified as non-text, whereas they were misclassified using the first feature set.

Fig. 14. Misclassified blocks.

Fig. 15(a)–(c) shows selected mixed type documents with different types of segmentation difficulties. These documents were scanned by an HP ScanJet IIc scanner at 150 dpi resolution.
Fig. 15. Instances where segmentation is difficult.

All the documents in Fig. 15 contain text with fonts of different type, style and size, and graphics that cannot be separated with vertical and horizontal lines. In addition to normal text blocks, Fig. 15(c) also contains

tilted and handwritten text blocks. It can be observed that we have accurate text identification results for all tested documents.

7. Conclusions

Segmentation is an important issue in the automated document analysis area. Text segmentation plays a significant role in document retrieval and storage systems. This paper proposes a new method that classifies the regions of a mixed type document as text or non-text areas. The method belongs to the bottom-up segmentation techniques, and its first stage is a block extraction procedure. Generally, the extracted blocks contain text, graphics, and halftones. To identify the type of each block, texture features are extracted. These features are then processed by a PCA and classified by a neural network classifier. A feature selection scheme is employed in the definition of the final feature set, which results in only 34 effective features. The PCA optimally decreases the feature space dimensionality. The final stage consists of a Kohonen SOFM classifier, the competition layer of which consists of an 8 × 8 grid. Although we are interested in identifying only the text blocks, the classification scheme efficiently classifies all the document blocks as text, graphics, and halftone blocks. The combination of the texture feature set with PCA and the neural network classifier copes with many segmentation difficulties. In comparison with other similar segmentation techniques, the proposed method works well even with documents that do not satisfy the Manhattan layout. It is independent of the size and type of characters

and the position of the text in the document, and it is insensitive to a small tilt. Also, the proposed method can recognize text areas even if these are included in graphics or halftone regions. The proposed segmentation method was extensively tested with many mixed type documents presenting significant difficulties for segmentation. Experimental results presented in this paper show the feasibility and the robustness of the method in segmenting mixed type documents.

References

[1] L. O'Gorman, R. Kasturi, Document Image Analysis, IEEE Computer Society Press, Los Alamos, California,
[2] H. Fujisawa, Y. Nakano, K. Kurino, Segmentation methods for character recognition: from segmentation to document structure analysis, Proceedings of IEEE (July 1992)
[3] J. Schurmann, N. Bartneck, T. Bayer, J. Franke, E. Mandler, M. Oberlander, Document analysis from pixels to contents, Proceedings of IEEE (July 1992)
[4] T. Pavlidis, J. Zhou, Page segmentation and classification, Graphical Models and Image Processing 54 (1992)
[5] K.Y. Wong, R.G. Casey, M. Wahl, Document analysis system, IBM J. Res. Dev. 6 (1982)
[6] F.M. Wahl, K.Y. Wong, R.G. Casey, Block segmentation and text extraction in mixed text/image documents, Computer Graphics and Image Processing 20 (1989)
[7] D. Wang, S.N. Srihari, Classification of newspaper image blocks using texture analysis, Computer Vision Graphics and Image Processing 47 (1989)
[8] G. Nagy, S. Seth, M. Viswanathan, A prototype document image analysis system for technical journals, IEEE Comput. 25 (1992)
[9] L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Anal. Machine Intell. 10 (1988)
[10] R. Kasturi, S.T. Bow, W. El-Masri, J. Shah, J.R. Gattiker, U.B. Mokate, A system for interpretation of line drawings, IEEE Trans. Pattern Anal. Machine Intell. 12 (1990)

[11] L. O'Gorman, The document spectrum for page layout analysis, IEEE Trans. Pattern Anal. Machine Intell. 15 (1993)
[12] K.-C. Fan, C.-H. Liu, Y.-K. Wang, Segmentation and classification of mixed text/graphics/image documents, Pattern Recognition Letters 15 (1994)
[13] F. Farrokhinia, Multi-channel filtering techniques for texture segmentation and surface quality inspection, Ph.D. thesis, Dept. of Electrical Engineering, Michigan State University,
[14] A. Jain, S. Bhattacharjee, Text segmentation using Gabor filters for automatic document processing, Machine Vision and Applications 5 (1992)
[15] A.K. Jain, Yu Zhong, Page segmentation using texture analysis, Pattern Recognition 29 (1996)
[16] C. Strouthopoulos, N. Papamarkos, C. Chamzas, Identification of text-only areas in mixed type documents, IEEE Workshop on Non-linear Signal and Image Processing 1 (1995)
[17] C. Strouthopoulos, N. Papamarkos, C. Chamzas, Identification of text-only areas in mixed type documents, Engineering Applications of Artificial Intelligence 10 (4) (1997)
[18] A.S. Pandya, R.B. Macy, Pattern Recognition with Neural Networks in C++, CRC Press and IEEE Press, 1995, pp
[19] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 12 (1989)
[20] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York,
[21] J. Dayhoff, Neural Network Architectures: An Introduction, Van Nostrand Reinhold, New York,
[22] UW: English Document Image Database, University of Washington, Seattle,
[23] D.M. Tsai, Y.H. Chen, A fast histogram-clustering approach for multilevel thresholding, Pattern Recognition Letters 13 (1992)
[24] N. Papamarkos, B. Gatos, A new approach for multilevel threshold selection, CVGIP: Graphical Models and Image Processing 56 (1994)
[25] I. Witten, A. Moffat, T. Bell, Managing Gigabytes, Van Nostrand Reinhold, New York,
[26] M. Driels, D. Nolan, Automatic defect classification of printed wiring board solder joints, IEEE Trans. on Components, Hybrids and Manufacturing Technology 13 (1990)


More information

CoE4TN4 Image Processing

CoE4TN4 Image Processing CoE4TN4 Image Processing Chapter 11 Image Representation & Description Image Representation & Description After an image is segmented into regions, the regions are represented and described in a form suitable

More information

Corner Detection. Harvey Rhody Chester F. Carlson Center for Imaging Science Rochester Institute of Technology

Corner Detection. Harvey Rhody Chester F. Carlson Center for Imaging Science Rochester Institute of Technology Corner Detection Harvey Rhody Chester F. Carlson Center for Imaging Science Rochester Institute of Technology rhody@cis.rit.edu April 11, 2006 Abstract Corners and edges are two of the most important geometrical

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2016) Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian Hebei Engineering and

More information

EE 701 ROBOT VISION. Segmentation

EE 701 ROBOT VISION. Segmentation EE 701 ROBOT VISION Regions and Image Segmentation Histogram-based Segmentation Automatic Thresholding K-means Clustering Spatial Coherence Merging and Splitting Graph Theoretic Segmentation Region Growing

More information

Scene Text Detection Using Machine Learning Classifiers

Scene Text Detection Using Machine Learning Classifiers 601 Scene Text Detection Using Machine Learning Classifiers Nafla C.N. 1, Sneha K. 2, Divya K.P. 3 1 (Department of CSE, RCET, Akkikkvu, Thrissur) 2 (Department of CSE, RCET, Akkikkvu, Thrissur) 3 (Department

More information

PATTERN RECOGNITION USING NEURAL NETWORKS

PATTERN RECOGNITION USING NEURAL NETWORKS PATTERN RECOGNITION USING NEURAL NETWORKS Santaji Ghorpade 1, Jayshree Ghorpade 2 and Shamla Mantri 3 1 Department of Information Technology Engineering, Pune University, India santaji_11jan@yahoo.co.in,

More information

Face recognition based on improved BP neural network

Face recognition based on improved BP neural network Face recognition based on improved BP neural network Gaili Yue, Lei Lu a, College of Electrical and Control Engineering, Xi an University of Science and Technology, Xi an 710043, China Abstract. In order

More information

D.A. Karras 1, S.A. Karkanis 2, D. Iakovidis 2, D. E. Maroulis 2 and B.G. Mertzios , Greece,

D.A. Karras 1, S.A. Karkanis 2, D. Iakovidis 2, D. E. Maroulis 2 and B.G. Mertzios , Greece, NIMIA-SC2001-2001 NATO Advanced Study Institute on Neural Networks for Instrumentation, Measurement, and Related Industrial Applications: Study Cases Crema, Italy, 9-20 October 2001 Support Vector Machines

More information

Comparison of supervised self-organizing maps using Euclidian or Mahalanobis distance in classification context

Comparison of supervised self-organizing maps using Euclidian or Mahalanobis distance in classification context 6 th. International Work Conference on Artificial and Natural Neural Networks (IWANN2001), Granada, June 13-15 2001 Comparison of supervised self-organizing maps using Euclidian or Mahalanobis distance

More information

Radial Basis Function Neural Network Classifier

Radial Basis Function Neural Network Classifier Recognition of Unconstrained Handwritten Numerals by a Radial Basis Function Neural Network Classifier Hwang, Young-Sup and Bang, Sung-Yang Department of Computer Science & Engineering Pohang University

More information

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

More information

A Method for Edge Detection in Hyperspectral Images Based on Gradient Clustering

A Method for Edge Detection in Hyperspectral Images Based on Gradient Clustering A Method for Edge Detection in Hyperspectral Images Based on Gradient Clustering V.C. Dinh, R. P. W. Duin Raimund Leitner Pavel Paclik Delft University of Technology CTR AG PR Sys Design The Netherlands

More information

Handwritten Script Recognition at Block Level

Handwritten Script Recognition at Block Level Chapter 4 Handwritten Script Recognition at Block Level -------------------------------------------------------------------------------------------------------------------------- Optical character recognition

More information

EE368 Project: Visual Code Marker Detection

EE368 Project: Visual Code Marker Detection EE368 Project: Visual Code Marker Detection Kahye Song Group Number: 42 Email: kahye@stanford.edu Abstract A visual marker detection algorithm has been implemented and tested with twelve training images.

More information

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks

Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Ritika Luthra Research Scholar Chandigarh University Gulshan Goyal Associate Professor Chandigarh University ABSTRACT Image Skeletonization

More information

A Fuzzy ARTMAP Based Classification Technique of Natural Textures

A Fuzzy ARTMAP Based Classification Technique of Natural Textures A Fuzzy ARTMAP Based Classification Technique of Natural Textures Dimitrios Charalampidis Orlando, Florida 328 16 dcl9339@pegasus.cc.ucf.edu Michael Georgiopoulos michaelg @pegasus.cc.ucf.edu Takis Kasparis

More information

Detecting Salient Contours Using Orientation Energy Distribution. Part I: Thresholding Based on. Response Distribution

Detecting Salient Contours Using Orientation Energy Distribution. Part I: Thresholding Based on. Response Distribution Detecting Salient Contours Using Orientation Energy Distribution The Problem: How Does the Visual System Detect Salient Contours? CPSC 636 Slide12, Spring 212 Yoonsuck Choe Co-work with S. Sarma and H.-C.

More information

An Efficient Character Segmentation Algorithm for Printed Chinese Documents

An Efficient Character Segmentation Algorithm for Printed Chinese Documents An Efficient Character Segmentation Algorithm for Printed Chinese Documents Yuan Mei 1,2, Xinhui Wang 1,2, Jin Wang 1,2 1 Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information

More information

FACIAL RECOGNITION BASED ON THE LOCAL BINARY PATTERNS MECHANISM

FACIAL RECOGNITION BASED ON THE LOCAL BINARY PATTERNS MECHANISM FACIAL RECOGNITION BASED ON THE LOCAL BINARY PATTERNS MECHANISM ABSTRACT Alexandru Blanda 1 This work presents a method of facial recognition, based on Local Binary Models. The idea of using this algorithm

More information

Skew Detection and Correction Technique for Arabic Document Images Based on Centre of Gravity

Skew Detection and Correction Technique for Arabic Document Images Based on Centre of Gravity Journal of Computer Science 5 (5): 363-368, 2009 ISSN 1549-3636 2009 Science Publications Skew Detection and Correction Technique for Arabic Document Images Based on Centre of Gravity Atallah Mahmoud Al-Shatnawi

More information

IMAGE CLASSIFICATION USING COMPETITIVE NEURAL NETWORKS

IMAGE CLASSIFICATION USING COMPETITIVE NEURAL NETWORKS IMAGE CLASSIFICATION USING COMPETITIVE NEURAL NETWORKS V. Musoko, M. Kolı nova, A. Procha zka Institute of Chemical Technology, Department of Computing and Control Engineering Abstract The contribution

More information

A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks.

A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks. A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks. X. Lladó, J. Martí, J. Freixenet, Ll. Pacheco Computer Vision and Robotics Group Institute of Informatics and

More information

A System for Automatic Extraction of the User Entered Data from Bankchecks

A System for Automatic Extraction of the User Entered Data from Bankchecks A System for Automatic Extraction of the User Entered Data from Bankchecks ALESSANDRO L. KOERICH 1 LEE LUAN LING 2 1 CEFET/PR Centro Federal de Educação Tecnológica do Paraná Av. Sete de Setembro, 3125,

More information

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.1 Introduction Pattern recognition is a set of mathematical, statistical and heuristic techniques used in executing `man-like' tasks on computers. Pattern recognition plays an

More information

Available online at ScienceDirect. Procedia Computer Science 45 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 45 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 45 (2015 ) 205 214 International Conference on Advanced Computing Technologies and Applications (ICACTA- 2015) Automatic

More information