Character Recognition

5.1 INTRODUCTION

Recognition is one of the important steps in image processing. Several methods exist, such as the histogram method, the Hough transform, neural computing approaches and fuzzy theory approaches. These approaches are computationally expensive and quite complex to implement. In this work, a simple approach called the 14-segment display method is used to recognize characters. The method is based on the principle of projection. Section 5.2 discusses the stages of the recognition process, Section 5.3 gives details of the projection onto 14 segments, Section 5.4 describes the knowledge base, and Section 5.5 gives details of Methodology I and Methodology II.

5.2 STAGES IN RECOGNITION PROCESS

The stages involved in this method are depicted in Figure 5.1. The method of recognizing the numeral is discussed in the subsequent sections of this chapter.

Figure 5.1 Block diagram showing the activities in the proposed method
(Blocks: preprocessed numeral image; projection of the numeral onto 14 segments of numeral size; classification of the numeral image; conflict; knowledge base about the numerals; conflict resolution; numeral recognition.)

5.3 PROJECTION ONTO 14 SEGMENTS

A thin, distortion-free and clear image of a handwritten number is taken as the input for recognition. A logical box enclosing the number is imagined. The logical box is segmented into 14 lines as shown in Figure 5.2. The pixels of the numeral image are projected onto the predefined segments as explained below:

i. The box area enclosing a numeral image is logically divided into four parts as shown in Figure 5.3a.
ii. All bright pixels in part 1 are projected onto segment f, those in part 2 onto segment e, those in part 3 onto segment b, and those in part 4 onto segment c.
iii. Similarly, the box area is divided into three parts horizontally as shown in Figure 5.3b.
iv. All the pixels in part 5 are projected onto segment a, those in part 6 onto segment g, and those in part 7 onto segment d.

Figure 5.2 Segmentation of the logical box into 14 lines

A dynamic threshold is computed for each segment based on the size of the numeral. The threshold determined for each segment is used to select or drop the segment.
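The projection and thresholding steps above can be sketched in Python. This is only an illustration under several assumptions that the text does not confirm: the four parts of Figure 5.3a are taken to be the quadrants of the bounding box (part 1 top-left, part 2 bottom-left, part 3 top-right, part 4 bottom-right); "projecting" is read as counting the rows (or columns) of a part that contain at least one bright pixel; the dynamic threshold is modelled as a fixed fraction (`fraction`, a hypothetical parameter) of the part's extent; and only the seven segments a to g named in the text are covered. The band heights of one quarter, one half and one quarter follow step 6 of the methodology.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, cropped to the numeral's bounding box


def rows_with_ink(img: Image, rows: range, cols: range) -> int:
    """Horizontal projection: count rows of the part that contain a bright pixel."""
    return sum(1 for r in rows if any(img[r][c] for c in cols))


def cols_with_ink(img: Image, rows: range, cols: range) -> int:
    """Vertical projection: count columns of the part that contain a bright pixel."""
    return sum(1 for c in cols if any(img[r][c] for r in rows))


def segment_string(img: Image, fraction: float = 0.5) -> str:
    """Return the selected segments (a..g) as a string, e.g. 'abcdg'."""
    h, w = len(img), len(img[0])
    # Assumed layout of Figure 5.3a: four quadrants of the logical box.
    top, bottom = range(h // 2), range(h // 2, h)
    left, right = range(w // 2), range(w // 2, w)
    # Figure 5.3b: top quarter (part 5), centre half (part 6), bottom quarter (part 7).
    band5 = range(h // 4)
    band6 = range(h // 4, h - h // 4)
    band7 = range(h - h // 4, h)
    every_col = range(w)
    # Each entry: (projected pixel count, full length of the segment).
    projections = {
        "f": (rows_with_ink(img, top, left), len(top)),
        "e": (rows_with_ink(img, bottom, left), len(bottom)),
        "b": (rows_with_ink(img, top, right), len(top)),
        "c": (rows_with_ink(img, bottom, right), len(bottom)),
        "a": (cols_with_ink(img, band5, every_col), w),
        "g": (cols_with_ink(img, band6, every_col), w),
        "d": (cols_with_ink(img, band7, every_col), w),
    }
    # Assumed threshold rule: keep a segment when enough of its length is covered.
    return "".join(
        s for s in "abcdefg"
        if projections[s][1] > 0 and projections[s][0] >= fraction * projections[s][1]
    )
```

With a cleanly drawn 3 (top, middle and bottom bars joined by a right-hand stroke), this sketch selects segments a, b, c, d and g, which matches the segment string the chapter gives for a neatly written 3.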

Neatly written numerals always form well-defined segment strings. For example, a neatly written numeral 3 forms the segment string abcdg. A decision tree classifies the numeral based on the segments formed in the first stage, and the built-in knowledge base is used in the next stage to resolve any conflict in recognizing the classified numeral.

Figure 5.3(a) Vertical partition of the box area
Figure 5.3(b) Horizontal partition of the box area

Knowledge base

The knowledge base plays an important role in the recognition of the numeral; it is a repository of derived information. An inference engine is used for deriving such knowledge about numerals. Initially, the numerals are classified based on the segments formed by projection. If these segments are not sufficient to classify the numeral, or if conflicts arise in numeral recognition, the system proceeds with the support of the knowledge base. In most cases the segments obtained by projection are sufficient to recognize the numeral; these are the neatly written numerals, and they do not make use of the knowledge base. Often, however, the segments formed are not sufficient to recognize the numeral and also lead to conflicts. In such situations, the knowledge base is used to investigate further: more features are extracted from the numeral image to overcome the conflict and recognize the numeral. The cases that make use of the knowledge base are listed below:

1. Sometimes, the projections of numerals 1 and 7 show conflict in segment a. Knowledge about the height and width of the numeral image overcomes the conflict. The ratio of length to height of numeral 1 is relatively less than that of numeral 7.
2. Projections of numerals 7 and 9 show conflict when numeral 9 fails to project onto segment g. Knowledge about the density of pixels in part 5 (Figure 5.3b) overcomes the conflict. For numeral 7, the number of pixels in this part is very close to the width of the numeral, whereas for numeral 9 it is greater than 1.4 times the width of the numeral.
3. Projections of numerals 1 and 6 show conflict when numeral 6 fails to project onto segment g. Knowledge about the density of pixels in part 7 (Figure 5.3b) overcomes the conflict. For numeral 1, the number of pixels in this part is very close to the width of the numeral, whereas for numeral 6 it is greater than 1.4 times the width of the numeral.
4. Sometimes the segments formed by projection of numerals 2 and 3 are not sufficient for recognition and show conflict by forming the segment string abcd. Knowledge about the density of pixels in the overlapping area of part 2 and part 7 (Figure 5.3a and Figure 5.3b) then overcomes the conflict. The density of pixels in this area, relative to the total area of the overlapping region, is high for numeral 2.
5. Similarly, the segment string formed by numeral 4 sometimes resembles the projection of numeral 6. In such cases the segment string obtained is acdef. Knowledge of the number of projected pixels at the bottom end of segment c then overcomes the conflict. For numeral 4, the number of projected pixels is almost zero, whereas for numeral 6 it is almost equal to the height considered at the bottom end of segment c.
6. Sometimes the segments formed by numerals 5 and 6 show conflict by forming the segment string acdefg. In this situation, the number of pixels projected onto segment e overcomes the conflict. For numeral 6, the number of pixels projected onto segment e is greater than or equal to the size of segment c; for numeral 5, it is less than the size of that segment.
7. The conflict in projections of numerals 1 and 9 can be overcome in the same way as explained in item 3, by considering the overlapping regions of part 1 and part 5 (Figure 5.3a and Figure 5.3b).
8. The conflict in projections of numerals 1 and 2 is overcome with knowledge of the number of pixels projected onto segment a and the density of pixels in the overlapping regions of part 3 and part 7 (Figure 5.3a and Figure 5.3b). The number of pixels projected onto segment a is compared with 1.3 times the height of part 1: it is less than this value in one case and greater than or equal to it in the other. Further, the overlapping region overcomes the conflict as explained in the items above.

METHODOLOGY 1

The steps involved in the method for projection, classification and recognition are:

1. The logical box is divided into four parts as shown in Figure 5.3a.
2. All the pixels in part 1 are projected horizontally onto segment f.
3. All the pixels in part 2 are projected horizontally onto segment e.
4. All the pixels in part 3 are projected horizontally onto segment b.
5. All the pixels in part 4 are projected horizontally onto segment c.
6. The logical box is divided into three parts as shown in Figure 5.3b. The first part (part 5) is the top ¼ of the height of the numeral image. The second part (part 6) is the center half of the numeral image. The remaining bottom ¼ is the third part (part 7).
7. All the pixels in part 5 are projected vertically onto segment a.
8. All the pixels in part 6 are projected vertically onto segment g.
9. All the pixels in part 7 are projected vertically onto segment d.
10. Determine the threshold for every segment.
11. Identify the segments whose projected pixel counts are above the respective thresholds. Form a string of such segments in sequence.
12. Classify the numeral image using the decision tree on the segments formed for the neatly written case.
13. Apply the knowledge base to recognize the numeral when the segments formed are not sufficient for classification or when conflicting situations are encountered.

14. If the segment string formed and the knowledge base do not lead to a decision, the system fails to recognize the numeral.

Figure 5.4 shows the decision tree for numeral classification. The notation used in the tree diagram is as follows. At each labeled branch, the left subtree is taken when the segment specified at the branching is not formed. The digits mentioned at each node indicate the set of digits within that level of classification. The decision starts with segment f and proceeds with e, g, a, etc. These segments are chosen arbitrarily so as to cover the maximum number of items within a subclass at each decision.

Figure 5.4 Decision tree for numeral classification
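Steps 12 to 14 can be illustrated with a small Python sketch. It is not the decision tree of Figure 5.4, which is not reproduced in the text; as a stand-in it uses a lookup table of the conventional seven-segment display encodings, which agree with the chapter's example of abcdg for numeral 3. Only one knowledge-base rule is shown, the 5 versus 6 rule from item 6 of the knowledge-base list. The names `CLEAN_PATTERNS`, `resolve_5_vs_6` and `classify`, and the way the two measurements are passed in, are hypothetical.

```python
from typing import Dict, Optional

# Conventional seven-segment encodings, used here in place of the decision tree.
CLEAN_PATTERNS: Dict[str, int] = {
    "abcdef": 0, "bc": 1, "abdeg": 2, "abcdg": 3, "bcfg": 4,
    "acdfg": 5, "acdefg": 6, "abc": 7, "abcdefg": 8, "abcdfg": 9,
}


def resolve_5_vs_6(pixels_on_e: int, size_of_c: int) -> int:
    """Item 6: numeral 6 projects at least as many pixels onto e as the size of c."""
    return 6 if pixels_on_e >= size_of_c else 5


def classify(segments: str, pixels_on_e: int = 0, size_of_c: int = 0) -> Optional[int]:
    """Classify a segment string; return None when no decision is reached (step 14)."""
    if segments == "acdefg":
        # The text reports that 5 and 6 can both produce this string,
        # so the knowledge-base rule decides between them.
        return resolve_5_vs_6(pixels_on_e, size_of_c)
    return CLEAN_PATTERNS.get(segments)
```

A fuller version would add the remaining knowledge-base rules in the same way, each keyed on the conflicting segment string it resolves.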

Figure 5.5 shows how the handwritten number 3 is divided horizontally and vertically. It also indicates the projections of the pixels onto the segments and the resulting segment string.

Figure 5.5 Projection of number 3 onto segments and formation of the segment string

5.5 EXPERIMENTAL RESULTS

For each numeral, a sample set of 50 different specimens is used for testing the system. The numeral sizes vary from 15 × 15 pixels to 100 × 100 pixels. The recognition rate is 90%. It is clear from the result that the method shows good performance.

Table 5.1 Recognition rates of numerals
(Columns: Numbers; Percentage of recognition for printed characters; Percentage of recognition for handwritten characters)

The system fails to recognize numerals that are written in unusual ways. For numerals 2 and 3, recognition is relatively low because a lot of variability is observed in the way these two are written. The rate of wrong recognition or misclassification is about 3% to 8%, owing to the threshold computed for selecting a segment and to the knowledge base. Misclassification is observed in the numeral sets (6, 8, 9), (4, 9) and (2, 3). No failure cases are reported, since every numeral is classified into some class. The system recognizes numerals written in normal style to an extent of 100%. The efficiency of the system reduces for distorted and incomplete numbers. Recognition performance is unchanged for slightly skewed numerals but degrades for more heavily skewed numerals, since skew correction is not performed. The efficiency also reduces if the numerals are too asymmetric, as the projection of such numerals is unpredictable.

Conclusions

In this work, a simple approach called the 14-segment display method is used to recognize characters. The method relies on the principle of projection and uses no mathematical or statistical model. The system does not require normalization of the numeral image, as the method works fairly well for the writing sizes commonly found on documents. The system may be used as a substitute for the histogram method for recognizing printed numerals, as it shows 98% recognition, requires less computation and is much simpler to implement than the histogram method. The method can be applied to the automatic reading of numerals from documents. There is scope for making the system more efficient by making the knowledge base more powerful, so as to overcome misrecognitions or misclassifications.

The algorithms that are available for character recognition have high accuracy and high speed. However, many still suffer from a fairly simple flaw (15). When they do make mistakes (and they all do), the mistakes are often very unnatural from the human point of view. That is, mistaking a 5 for an S is not too surprising, because most people would agree that these two characters are similar, but mistaking a 5 for an M is counter-intuitive and unexpected. Algorithms make such mistakes because, for computational reasons, they generally operate on a different set of features than humans do.

This algorithm presently avoids thinning (and other preprocessing) by assuming that the input eight by eight data is not particularly aberrant. Lines in an eight by eight grid should not normally be thicker than two pixels. With this assumption, it then proceeds to look for feature points.

A feature point is a point of human interest in an image, a place where something happens. It could be an intersection between two lines, a corner, or just a dot surrounded by space. Such points help define the relationship between different strokes. Two strokes could fully cross each other, come together in a Y or a T intersection, form a corner, or avoid each other altogether. People tend to be sensitive to these relationships: the fact that the lines in a Z connect in a certain way is more important than the individual lengths of those lines. These relationships are what should be used for character identification, and the feature points can be exploited for the task.

The procedure this algorithm uses for extracting feature points is fairly straightforward. Since an eight by eight character consists of only sixty-four pixels, it is viable to simply loop through the entire character and examine each pixel in turn. If a pixel is on, its eight neighbors are checked. Since each neighbor can only be on or off, there are merely 256 possible neighborhood combinations. Of these 256, fifty-eight were found to represent significant feature points in a fairly unambiguous way. Extracting feature points is thus reduced to calculating a number between 0 and 255 that describes a pixel's neighborhood and then comparing that number against a table of known feature points (Enumeration of Possible Pixel Neighborhoods).

This method does not always catch every feature point (some can only be seen in a larger context), but it catches the majority. Missing feature points are certainly not a limiting factor in the algorithm's accuracy. The method also does not suffer from labeling too many uninteresting points as feature points; it has virtually no false positives. The feature point extractor is thus fast and reliable.
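The neighborhood lookup described above can be sketched in Python as follows. The bit order of the eight neighbors and the contents of the lookup table are assumptions: the text does not list the fifty-eight patterns, so `EXAMPLE_TABLE` is a hypothetical placeholder that holds only the isolated dot (no neighbors on) and the eight single-neighbor patterns, which mark stroke endpoints.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]  # 0/1 pixels, e.g. an eight by eight character

# Assumed bit order: clockwise, starting at the top-left neighbor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def neighborhood_code(grid: Grid, r: int, c: int) -> int:
    """Pack the on/off state of the eight neighbors into a number from 0 to 255."""
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        rr, cc = r + dr, c + dc
        # Neighbors outside the grid are treated as off.
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]) and grid[rr][cc]:
            code |= 1 << bit
    return code


def feature_points(grid: Grid, table: Set[int]) -> List[Tuple[int, int]]:
    """Return (row, col) of every 'on' pixel whose neighborhood code is in the table."""
    return [
        (r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if grid[r][c] and neighborhood_code(grid, r, c) in table
    ]


# Hypothetical placeholder table: isolated dot plus single-neighbor endpoints.
EXAMPLE_TABLE: Set[int] = {0} | {1 << bit for bit in range(8)}
```

Because the table is just a set of integers, swapping in the real list of feature-point codes would not change the extraction loop.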

Characters cannot be identified by the extraction of feature points alone. Without a database of characters and their associated feature points, even the ultimate feature point extractor would be useless. Only with such a database can the feature point extraction results from an unknown character be compared against what is expected for real-world characters and a judgment of the unknown's identity made. Thus, a gold standard dictionary of characters and their associated features must be defined. Ideally, this dictionary should contain details for the average appearance of every character manifestation (many English characters have multiple different accepted manifestations such as Z versus Z). If poor representative appearances are chosen for characters, valid characters at the extremes will not be identified as readily. If some manifestations of characters are missed, the program will certainly not be able to identify characters belonging to these groups at all.

With both a method for extracting feature points and a dictionary of characters and associated feature point data for reference, identifying characters becomes a problem of measuring the degree of similarity between two sets of features. The method employed by this algorithm is a slight modification of Euclidean distance. The distances between each of the feature points in the unknown character and their closest corresponding feature points in the reference character are summed, and missing or extra feature points are penalized. Identification is then a matter of finding the character in the dictionary that is within a certain threshold distance of the unknown character. In practice, the algorithm currently checks every character in the reference set to first locate the minimum distance, and then verifies that the minimum distance is less than the threshold. Additionally, the algorithm tries to make some simple compensation for noise by noting that pixels surrounded by completely empty space (dots) and pixels surrounded by completely full space (blots) are quite uncommon in normal characters and are probably the result of some type of noise in the input.

It would also be possible to examine the space between individual feature points to determine whether or not contiguous straight lines connect them. This would greatly enhance the accuracy of the algorithm and would prevent a W from being recognized as an E. This particular modification would both slow the algorithm down considerably and consume quite a bit more memory, but it could still be justifiable if the accuracy increase were significant. A line and/or curve extractor could be used in conjunction with (or independently of) the above-mentioned modification, and would provide yet more usable features that could be exploited for identification.

The exploration of modifications that would make the algorithm more invariant to translation would certainly be useful. If the use of lines connecting feature points, as described above, provided enough information by itself to accurately identify characters, it would be preferable to the current method of using feature point location, as it would be translation invariant while the current method is clearly not. Even if the dependence on location cannot be fully removed, it could be reduced through a separate preprocessing step. Each character could be centered in the eight by eight grid, with special attention being paid so that position information is not lost on characters where such information is vital (the comma and apostrophe, for example).

Numerous other small improvements could be made to various features of the algorithm. The noise detection and handling procedure is currently little more than a stub and could be readily improved. The character dictionary could be sorted in order of frequency and the thresholds trusted more completely to improve overall speed (the current algorithm is fairly quick). Suggested future work includes both the testing of these algorithm changes and further testing of the algorithm with more character data. Of particular interest would be character data that is deliberately noisy and character data that has been reduced to eight by eight resolution from some greater resolution. Both of these cases reflect real-world problems.

The total number of samples taken for testing is 84.

Table 5.2 Character recognition results using feature point extraction

Result                                                                        Count   Percentage
Total number of correct recognitions                                          72      86%
Total number of correct recognitions without counting identical characters    21      25%
Total unknowns                                                                5       6%
Total wrong guesses                                                           7       8%

Experimental results and conclusions

Overall, the results of this experiment were mixed. On the one hand, the initial results certainly are not of commercial quality. When only a couple of pixels differed between the unknown character and the reference, the results were fairly good, but larger differences often made the algorithm unable to correctly recognize the unknown character. On the other hand, the low success rate is not indicative of the general algorithm but just of the current implementation. There are many possible changes that could vastly improve the algorithm's recognition abilities. With a few of these changes implemented, the mistakes the algorithm would make would indeed be very similar to the types of mistakes humans would make. Thus, the general algorithm holds promise as a character recognizer that identifies characters in a manner similar to the way that humans identify characters.

In experimentation, a sample set of 75 different vehicle images is taken from a camcorder. The recognition rate for number plates and characters varies from 70% to 80% for different images. In this experiment the system recognizes only English characters and numbers. It is assumed that the number plate is written in a normal font. If a number plate is written in a fancy style, it is difficult to recognize. The percentage of success is about 75% because many vehicles have stylish number plates.

Table 5.3 Statistics and experimental results

Item                                                                             Count
Number of vehicles passed                                                        75
Number of vehicles passed with plate of correct type                             70
Number of vehicles passed to recognition algorithm with plates of correct type   70
Number of pictures passed to recognition algorithm                               75
Number of pictures passed to recognition algorithm which contain number plate    68
Total number of algorithm successes                                              63
Total number of failures                                                         5
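For the feature point recognizer discussed earlier in this chapter, the matching step (summed distances to the closest reference feature points, a penalty for missing or extra points, then a threshold check on the best match) might look like the following Python sketch. The penalty value, the threshold value, the function names and the layout of the dictionary are all assumptions made for illustration; the text gives none of them.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]


def feature_distance(unknown: List[Point], reference: List[Point],
                     penalty: float = 4.0) -> float:
    """Sum of nearest-point Euclidean distances plus a penalty per unmatched point."""
    if not unknown or not reference:
        # Nothing to match: every point on either side counts as missing or extra.
        return penalty * (len(unknown) + len(reference))
    total = sum(
        min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in reference)
        for p in unknown
    )
    # Assumed penalty rule: a fixed cost for each missing or extra feature point.
    return total + penalty * abs(len(unknown) - len(reference))


def identify(unknown: List[Point], dictionary: Dict[str, List[Point]],
             threshold: float = 6.0) -> Optional[str]:
    """Find the closest reference character, then accept it only below the threshold."""
    # Assumes a non-empty dictionary of reference characters.
    best = min(dictionary, key=lambda ch: feature_distance(unknown, dictionary[ch]))
    if feature_distance(unknown, dictionary[best]) < threshold:
        return best
    return None  # reported as "unknown"
```

Returning `None` corresponds to the "unknowns" row of Table 5.2, where no reference character lies within the threshold.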

