Chapter Review of HCR

Chapter 3 Literature Review

The survey of literature on character recognition showed that some researchers have worked on specific application requirements such as postal code identification [118], location of the address on an envelope [116], license plate recognition [55][67], mathematical equation recognition [114], form-based character recognition [25], etc. There are research papers on multi-script recognition [66], classifying text as printed or handwritten [10][68], script-independent word spotting [103], etc. Most researchers have worked on HCR in general as a research topic, stressing some part of HCR. Some of the important survey papers that give general insight into handwritten character recognition are [1][2][4][5][6][7][29][44][159][161]. In OCR problems, the emphasis is on preprocessing, feature extraction and classification using image processing [136][137], computer vision [138] and pattern recognition technologies [139], the area of our interest. Automation of handwritten character recognition is a very complex problem, and the complexity increases manifold when we want to build an unconstrained system. Building software that recognizes with 100% accuracy any user's handwriting style, size, font and direction, with noisy background, etc., is an open problem today, and most character recognition systems target specific language(s) and/or writing methods [6].

3.1 Review of HCR

HCR is heavily language/script dependent, and hence comparison of approaches and performance across scripts does not make much sense. Comparisons within a script are

influenced heavily by the nature of the data used and the expectations from the system. Standard benchmark databases are not available for most languages, particularly Indian languages. As a local database is created by the researchers for their own experiment only, one cannot guarantee the robustness of the database in terms of number of writers, variations, constraints imposed while writing, etc. Hence their claims cannot be compared directly with others. There are some standard databases available, mainly for English, Japanese and Chinese scripts. Most of these databases are constrained depending on the source of data (online, offline, writers, etc.), type of data (printed or handwritten, normal or cursive digits, lowercase characters, uppercase characters, words, etc.), updates of the existing database with new samples, format of data (binary or gray), size of data (images), etc. Some of the standard databases used by researchers are Center of Excellence for Document Analysis and Recognition (CEDAR), National Institute of Standards and Technology (NIST), United States Postal Service (USPS), Printed Japanese Character (PJC), Electro-Technical Laboratory (ETL), UNIPEN, Center of Pattern Recognition and Machine Intelligence (CENPARMI), Greek Unconstrained Handwriting Database (GRUHD), etc. The CEDAR CD-ROM1 database, released in 1994, consists of English script handwritten cities, states and ZIP codes in gray scale format. The digits and alphabetic characters in the database are in binary format. The NIST SD19 database contains binary alphanumeric English characters. MNIST is a gray scale handwritten digit database. The UNIPEN databases for English script, Train-R01/V07 and DevTest-R01/V02, have section 1a for digits, 1b for uppercase characters and 1c for lowercase characters; section 3 contains digits, lowercase and uppercase characters and also punctuation.
The UNIPEN database of online English script samples has more than 150 writers (male, female, left-handed or right-handed) from 5 different countries with different educational backgrounds. CENPARMI has different databases for handwritten English, Farsi and Arabic character images, PJC contains handwritten Japanese character images, ETL-8 consists of hand-printed Chinese characters, ETL-9 is for handwritten Japanese characters and ETL-6 is for

English handwritten characters, and GRUHD is for unconstrained handwritten Greek characters. In the following subsections, we look at the work done in selected international languages and then at the work in Indian languages. A lot of work has been done on English. Since the complexity of English HCR is far less than that of the Indian languages, owing to the smaller number of characters and since characters do not change shape based on the characters following or preceding them, the challenges are likely to be very different for our problem. We focus on papers that work on handwritten isolated digits, characters, English cursive characters and words, and categorize them based on the language, database and technology.

3.1.1 Work on Chinese handwritten characters

The Chinese character set is very large, with more than 6700 characters. As there is great similarity between the characters, and the character shapes are complex with more lines than curves, the recognition rate achieved by some researchers is around 85%. In [13], a clean image is input to the system. Pre-thinning, thinning and post-thinning are done as part of preprocessing. A modified Hough transform using templates is applied to extract individual strokes (horizontal, vertical, backslash and slash), corners and dots. 40 samples each of 900 Chinese isolated handwritten characters are tested using a Decision Tree Classifier. The recognition rate is 84.02%. In [21], 10 samples of 200 Chinese characters, including regular and rotated ones, are used. Preprocessing includes noise elimination using a 4x4 impulse noise filter, fuzzy non-linear normalization for stroke lengths, scaling for size normalization, and thinning. Five invariant features are extracted, namely the number of strokes, number of multi-fork points, total number of black pixels, number of connected components, and ring data (count of black pixels at distance r from the centroid).
The first four features are used in pre-classification using a maximum distance clustering algorithm, and the ring data is used in the matching process using similarity measures within the cluster. Fuzzy normalized results had a recognition rate of 85% on special samples with extremely long strokes and 86% for normal samples.
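The ring data feature described for [21] can be illustrated with a minimal sketch. The binning is an assumption made here for illustration (the survey does not give it): distances are truncated to whole pixels, and anything beyond the last ring is folded into it. The names `ring_data` and `rotate90` are hypothetical.

```python
import math

def ring_data(image, num_rings):
    """Count foreground pixels in concentric rings around the centroid.

    image: list of rows, 1 = black (stroke) pixel, 0 = background.
    Ring r holds pixels whose distance to the centroid truncates to r;
    pixels beyond the last ring are folded into it (an assumption).
    """
    pixels = [(y, x) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    if not pixels:
        return [0] * num_rings
    cy = sum(y for y, _ in pixels) / len(pixels)
    cx = sum(x for _, x in pixels) / len(pixels)
    rings = [0] * num_rings
    for y, x in pixels:
        r = int(math.hypot(y - cy, x - cx))
        rings[min(r, num_rings - 1)] += 1
    return rings

def rotate90(image):
    """Rotate a binary image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]
```

Because the counts depend only on distance from the centroid, a rotated copy of a character gives the same ring vector, which is consistent with the feature being called invariant in the text.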

3.1.2 Work on Standard English databases

Some researchers have worked on standard databases, and some have worked on local databases and also tested their systems on standard databases to compare with the benchmark results. There are many standard databases available for English, among them CEDAR, NIST, USPS, ETL-6 and UNIPEN. Work on cursive characters from the CEDAR database is as follows. The recognition result reported for uppercase and lowercase characters and digits with an MLP NN classifier is 83.65% [62], and a more recent paper reports 94.74%. In [15], the CEDAR database with English handwritten cursive digits and characters is used for the experiment. In preprocessing, the image is thresholded, cleaned of isolated pixel noise, slant corrected, smoothed and thinned, and the white borders around the character shape are removed. Four directional lines are extracted using a modified Hough transform with an angular range of ±20°. The image is divided into 9 uniform regions that are analyzed for the counts of the lines passing through them. Global features (width-height ratio and total count of the 4 directional lines) are also considered. Nearest Neighbor and Linear Discriminant Analysis (LDA) are the classification methods compared. Both give similar results, with a maximum of 93% and 67.3% for numbers and characters respectively. The authors compared with other works on the same database, and these experiments on the CEDAR database show slightly better results. In [54], CEDAR English isolated characters are used for the experiment. The images are preprocessed with morphological filters, de-slanting, de-skewing and binarization. Zonal features such as the ratio of zonal foreground pixels to total foreground pixels, direction code, and the difference between the sums of squares of line lengths in orthogonal directions are extracted. Global features like width-height ratio and the character portion below the baseline are also extracted.
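Two of the features named above, the ratio of zonal foreground pixels to total foreground pixels and the width-height ratio, can be sketched as follows. This is only an illustration under assumptions: the 3x3 zoning, the row-major zone order and the function names are chosen here and are not taken from [15] or [54].

```python
def zonal_density(image, rows=3, cols=3):
    """Share of all foreground pixels that falls in each zone (row-major).

    image: list of equal-length rows, 1 = foreground, 0 = background.
    """
    h, w = len(image), len(image[0])
    counts = [0] * (rows * cols)
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                zone = (y * rows // h) * cols + (x * cols // w)
                counts[zone] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def width_height_ratio(image):
    """Width divided by height of the tight bounding box of the foreground."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not ys:
        return 0.0
    return (xs[-1] - xs[0] + 1) / (ys[-1] - ys[0] + 1)
```

Dividing by the total foreground count, rather than by the zone area, makes the zonal values sum to 1, so they are less sensitive to stroke thickness.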
Learning Vector Quantization (LVQ) with 3 different learning methods is used for classification. LVQ1 uses the nearest neighbor decision rule, LVQ2 uses Bayes rule to correct the class boundaries of LVQ1, and LVQ3 has additional rules to ensure proper class distribution. With all 3 LVQs, a maximum result of 81.72% is reported. [62] uses CEDAR databases with English words and standard alphanumeric characters for the experiment. The word images are thresholded, slant corrected, thinned and boundary extracted. Zonal features for each of the four directional lines, number of lines, and total

length of the lines and the global intersection points are extracted and tested on Multilayer Perceptron with Back-propagation (MLP BP) and Radial Basis Function (RBF) neural networks. Both classifiers performed similarly, but RBF training time was significantly less. The maximum recognition rate reported is 83.65% for non-resized boundary images with the MLP BP classifier. On average, the MLP BP network outperforms RBF when directional features extracted from resized/non-resized, thinned/boundary-extracted images are considered. [165], one of the very recent results, uses recursive subdivisions of the image for feature extraction and reports 94.74% recognition on the CEDAR English character database.

A lot of work has also been done on the NIST database. The recognition result reported for uppercase and lowercase characters is 86.34% [50]. In [22], handwritten digits from the NIST data set and digitized mail pieces are used. First the image is size normalized. Gradient map (directional histograms), structural curvature features and concavity features are extracted and tested on two different classifiers. K-NN performed better, with 97.1% compared to 96.9% for MLP. [50] uses the NIST database, finds projection profiles, vertical and horizontal projection histograms and contour directional histograms (shown in figure 3.1), and tests the performance on uppercase, lowercase and a meta class with a mixture of upper and lower case (characters like O and o, V and v, etc., are considered as a single case-insensitive character) English characters using an MLP BP neural network. The recognition rates are 86.73% and 92.47% for lowercase and uppercase respectively. The meta class (case-insensitive class) result of 87.79% is a little better than the 52-class (lowercase and uppercase together) result of 86.34%.

Figure 3.1 Features used by Koerich: projection profiles, histograms, contour of "a" and 3x2 zoning.
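The projection features mentioned for [50] can be sketched on a binary image as follows. The naming convention is an assumption made here: the horizontal histogram holds one count per row, the vertical histogram one count per column, and the left profile is the distance from the left edge to the first foreground pixel in each row. The function names are hypothetical.

```python
def projection_histograms(image):
    """Return (horizontal, vertical) foreground counts of a binary image."""
    horizontal = [sum(row) for row in image]        # one value per row
    vertical = [sum(col) for col in zip(*image)]    # one value per column
    return horizontal, vertical

def left_profile(image):
    """Distance from the left edge to the first foreground pixel per row.

    Rows with no foreground get the image width (an assumption).
    """
    width = len(image[0])
    return [row.index(1) if 1 in row else width for row in image]
```

These vectors have a length tied to the image size, which is one reason size normalization is applied before they are extracted.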

[35] uses the NIST database for English characters. The system is trained with 2000 samples of 128 classes by considering the digits, uppercase, lowercase and mixed characters, and tested on 500 samples for each symbol. The characters are size normalized (other preprocessing is done as part of document processing). The features used are horizontal histogram, vertical histogram, radial histogram, radial out-in profile and radial in-out profile. The K-means algorithm is used for classification with Euclidean distance minimization. The maximum recognition rate is 98.8% for English digits. For English mixed characters, the recognition result reported is 82.79%.

Work on the ETL database reports 99.26% for uppercase characters, and work on the UNIPEN database reports 85% for lowercase characters. [61] uses an elastic matching technique based on a class-dependent eigen-deformation model. The results are superior to those of conventional class-independent deformation models. The experiment is done on size-normalized images of handwritten English uppercase characters from the ETL6 database. The recognition rate achieved is 99.21%, slightly higher than the results of the other conventional methods in the paper. [36] uses the UNIPEN database of digits and isolated characters. It assumes that the input characters are already segmented and confined in a bounding box. Different categories of features are used. (1) A 3x2 regional grid is used. The curvature, i.e. the degree to which the region content is rectilinear, curved clockwise or curved anticlockwise, is computed per region. The line features are the degrees to which the region content is horizontal, vertical, positive oblique and negative oblique. (2) The horizontal and vertical densities of the regions. (3) Aspect ratio. (4) Horizontal and vertical distances from the boundary to the edges, taken at regular intervals. (5) Intersections of the edges with the horizontal and the vertical edges of the grid. Some features are shown in figure 3.2.
An MLP with BP neural network is used for classification. The maximum recognition rate is 97% for digits and 85% for lowercase characters, claimed to be the highest results on the UNIPEN database.

Figure 3.2 (a) 3x2 regions, (b) boundary distance, (c) grid and edge intersection points.

3.1.3 Work on local English database

Some researchers' work is based on locally generated databases.

As the complexity of these databases is not measurable, the results cannot be compared directly, so some researchers also tested on standard databases. In [14], the database used consists of 50 words. Two levels of segmentation are performed to extract characters from a word. In the first level, a heuristic algorithm and a hole-seeking component are used; in the second level, a feed-forward NN with BP is trained manually and used to verify the first-level segmentation results. The character pixel values are the features; they are tested on an NN for character recognition and also presented to a neural-based dictionary of words for word recognition. The maximum character and word recognition rates reported are 78% and 100% respectively. In [48], the authors conclude that the HCR problem is not a cluster-able recognition problem with respect to the features they tested. A training set of 150 uppercase characters collected from a single user is used. They considered profile, geometric moments, Fourier transform features, contour and shadow features. The observed retrieval success rates for these are 92%, 90.8%, 86.4%, 65% and 98% respectively when tested on LVQ. With 1000 samples from 13 writers, the average performance observed for the shadow feature is 67%. In [47], a segmentation-free technique with appearance-based features is suggested. The images are rescaled and cropped. PCA and DWT are used separately and also together for feature generation. HMM is used for classification. A minimum character error rate (CER) of 26% is reported; that is, the maximum recognition rate is 74%.

Some of the works reported on other international languages are as follows. In [37], a single system is developed to handle both online and offline Korean characters by converting the image into an array of strokes similar to online strokes, making the strokes order free. The system learns online or offline data continuously to improve the performance of the recognizer. The image is passed through a filter to eliminate isolated pixels and rugged pixels. From the thinned image, strokes are segmented. Stroke size, start and end points, and direction of the segments are the features, which are made order free. For every stroke, the minimum dissimilarity between the stroke and its best matching stroke is computed, and these values are averaged to identify the class. In [35], discussed earlier under the NIST database for English character recognition, the same feature set is also tried on the GRUHD database, and the recognition rate for mixed Greek characters (72.8%) is lower than that for English mixed characters (82.79%).

More specific feature survey

As we saw in the last chapter, a number of features have been used in pattern recognition and in character recognition in particular. In this section, we look at two classes of such features in more detail, since our approach uses them heavily. These are Gabor filters and moments in general. They turn out to be an attractive set of features for scripts like Indian languages, which are curvature rich. We briefly introduce each feature and cite a number of works which make use of it.

Use of Gabor filter

The Gabor filter is used in HCR research for character segmentation [116] and for feature extraction. Researchers use the Gabor filter to create independent directional images as features. The directional information thus extracted is used directly as a feature [116], or it is used to reduce the input dimension complexity [112][114]. Some have computed zonal statistical probability distributions of the directional images [106].
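To make the idea of directional images concrete, here is a minimal sketch of a real-valued Gabor kernel and its response at several orientations. It is not the formulation of any cited paper: the kernel size, wavelength, sigma and aspect ratio are illustrative assumptions, borders are zero-padded, and the function names are hypothetical.

```python
import math

def gabor_kernel(theta, wavelength=4.0, sigma=2.0, gamma=1.0, half=2):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine wave.

    theta is the direction along which the cosine varies, in radians.
    """
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_image(image, kernel):
    """Correlate image with kernel, treating out-of-range pixels as zero."""
    half = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[ky + half][kx + half]
            out[y][x] = acc
    return out

def directional_energy(image, orientations=4):
    """Sum of absolute filter responses for each of the orientations."""
    energies = []
    for k in range(orientations):
        theta = k * math.pi / orientations
        response = filter_image(image, gabor_kernel(theta))
        energies.append(sum(abs(v) for row in response for v in row))
    return energies
```

With theta = 0 the cosine varies along x, so a vertical stroke produces a much stronger response than it does at theta = pi/2; each filtered image therefore emphasizes strokes of one direction.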
The computation of the Gabor filter to extract directional information is discussed in section 7.4. Some research works using this filter are as follows. In [105], the authors worked on noisy hand-printed Chinese character images from the ETL-8 database to find the robustness of the Gabor transformation to noise, and also compared

with peripheral direction contributivity (PDC) features. PDC features are formed by assigning stroke directions to pixels, and the pixels on the first, second and third strokes encountered by the scan line are selected. The results show that Gabor features yielded an error rate of 2.4% (success rate of 97.6%) as compared to 4.4% (success rate of 95.6%) for PDC. In [106], images are preprocessed for slant and distortion using minimal moment of inertia and rotation. The Canny edge image is passed through a set of Gabor filters with 2 wavelengths and 4 orientations. From these, maximum ratio vectors are computed and tested on an MLP with BP neural network. It is found that a filter dimension of 16x16 gave better results, with a recognition accuracy of 96.5% for 10 numerals and 26 uppercase alphabets with 20 samples per character. [113] discusses the effect of sampling intervals of 2D Gabor features in the 2D pattern, orientation angle and logarithmic frequency domains. The discussion on the feature stabilities for scaling, rotation and translation clarified that the stable range for scaling, translation and rotation is 2, half times as large as the wavelength of the Gabor filter and respectively. The experiments on printed Japanese characters showed that 10 maximum recognition rates are achieved for the optimal sampling intervals along with the reduction in the computations. When the sampling rate was less than the optimal, the recognition rate did not change, but the computations increased. Similarly, in the orientation angle domain, sampling with eight conventional orientation angles was sufficient for Japanese character recognition. [110] uses 3 databases: MNIST, CENPARMI and PJC. The images are normalized gray scale images. The authors compare the performance of gradient features with Gabor features. The gradient features are computed using the Sobel operator. Eight direction planes are generated and merged into four planes.
Each plane is convolved with a low-pass Gaussian filter, and the convolution values at uniformly placed sampling points are taken as gradient features. The magnitude of the Gabor transformation is used as Gabor features. The feature vector is first reduced using Fisher Discriminant Analysis. The class-mean vectors of the training samples form the templates of a nearest mean classifier. The mean vectors are also normalized using LVQ with minimum classification error. The

Gabor feature performed well, with 99.47% and 99.5% for MNIST and PJC respectively, when a high sampling rate with 4 orientations is considered. In [114], the elastic meshing technique is first applied to get sampling points. Then a set of Gabor filters (real parts) is used to extract different directional features at each sampling point. A minimum distance classifier is used. The Gabor feature performed well compared to directional features, with a recognition rate of 97.1% for poor quality images.

Use of moment features

Moments can represent each character uniquely in the form of monomials, regardless of how close the characters are in terms of local features, and hence the image can be reconstructed from the moment features. As the order of moments gets higher, the reconstruction of the image from these moment features gets better, but at the same time it may be influenced by noise. Historically, the first significant work considering moments for pattern recognition was performed by Hu [126]. From methods of algebraic invariants, he derived a set of seven moment invariants, using non-linear combinations of geometric moments. These invariants remain the same under image translation, rotation and scaling. Since then, moments and functions of moments have been widely used in applications where there is a need to identify a shape, such as pattern recognition, ship identification, aircraft identification, pattern matching and scene matching. As moments represent images in a transformed domain, the image can be reconstructed from the moment features. The kernel function of geometric moments is not orthogonal, which makes reconstruction of an image from these moments quite difficult and requires a moment matching method. Further research in this direction resulted in orthogonal moments like Zernike moments, Legendre moments, etc.
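As an illustration of how such invariants are built from geometric moments, the sketch below computes central moments and the first two of Hu's seven invariants in their standard form, phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)^2 + 4 eta11^2, where eta_pq = mu_pq / mu_00^(1 + (p + q)/2). The function names are hypothetical, and the image is assumed to contain at least one foreground pixel.

```python
def central_moment(image, p, q):
    """Central moment mu_pq of an image given as a list of rows."""
    m00 = sum(v for row in image for v in row)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    return sum(((x - cx) ** p) * ((y - cy) ** q) * v
               for y, row in enumerate(image) for x, v in enumerate(row))

def hu_first_two(image):
    """First two Hu invariants from scale-normalized central moments."""
    mu00 = central_moment(image, 0, 0)

    def eta(p, q):
        # Normalization by mu00 gives scale invariance.
        return central_moment(image, p, q) / mu00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Measuring coordinates from the centroid removes the effect of translation, and the particular combinations of eta values cancel the effect of rotation, so a shifted or 90-degree rotated copy of a glyph yields the same pair of values up to rounding.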
From the original preprocessed image, many researchers have computed geometric moments [108], central moments [102][103][107], Hu's moments [104][120], higher order Hu's moments [102], Gegenbauer moments [46], affine moments [129], Zernike moments [45][120], Legendre moments [108], Chebyshev moments [79], etc. These moment values are directly used as features [103] in some cases, and some have extracted central-ness, divergence, imbalance, skewness, etc., from moments as features [108][102]. Some have worked on accuracy analysis of Zernike moments [109]. In [127], the orthogonal moment features are tested under different parametric and non-parametric classifiers. There are

research works on automatic generation of moment invariants [129] and fast computation of moments [130][144]. The computations of some of these moments are discussed in section 7.3. Some research works using non-orthogonal and orthogonal moment features are as follows. In [104], handwritten numerals are tested using an MLP with BP neural network. The geometric moments are found for the 1-D contour sequence as well as for 2-D images. Four features are generated from the 1-D moments, and Hu's invariant features are generated from the 2-D moments. The test results show that the NN results are better than those of nearest neighbor and minimum mean distance classifiers, and the contour sequence moments performed well with 95.42% as compared to Hu's moments with 82.09%. In [108], moment features are analyzed for their power to recognize similarly shaped Chinese characters. The authors proposed 4 non-linear functions based on moments up to 3rd order. They tested the performance of geometric moments, central moments and Legendre moments. The mean square distance is used as a measure for classification. The distance between each pair of 6,763 Chinese characters is found. It is observed that there are 78,801 pairs of characters falling into a range of 0.01 with geometric moments, whereas there are only 671 such pairs with Legendre moments. It is also noted that a pair of characters that is difficult to recognize using one method may not be so difficult using another method. Hence different moments can be used in conjunction. [120] compares geometric moments (7 Hu's features) and Zernike moments (order 0-12) using MLP with BP NN, Bayes classifier, nearest neighbor rule and weighted minimum mean distance rule. The database consists of 24 samples of 26 uppercase English characters with rotation and different levels of Signal to Noise Ratio (SNR).
In all experiments with different levels of noise, MLP NN performed better, even at low SNR. The number of hidden layer nodes used gave close to the best performance, and reducing it by one fifth did not alter the results significantly. Zernike moment features performed better than geometric moments: both performed similarly for noiseless images, and Zernike moments did well in noisy conditions. Even though the high order moments are sensitive to noise, when the number of Zernike features is increased by increasing the order, MLP shows an increase in performance, whereas a decrease in performance is observed with the other 3 conventional classifiers. An accuracy of 100% was achieved for noiseless,

50, 25 and 12 dB SNR cases, while accuracies in the 90% and 80% range were obtained for 8 and 5 dB noise when Zernike moments of order 12 are classified using MLP NN. [129] proposes a method for automatic generation of moment invariants of any order. The database consists of 50 samples of all alphanumeric English characters, with 20 used for training and 30 for testing, using Euclidean distance and tree classifiers. The results of the new moments are compared with Hu's invariants. Recognition rates of 96.34% and 98.81% are achieved with Hu's invariants and the new 3rd order moments respectively. [127] examines orthogonal polynomials (Legendre, Zernike and Pseudo Zernike) for the recognition of skeletonized handwritten Arabic numerals. A new method of scale and location invariance is suggested, using circular regions about the centroid or the minimum bounding circle. Radial geometric moments are used for computing all three orthogonal moments. Bayes quadratic, K-nearest neighbor, Parzen and MLP NN are used as classifiers. MLP and Bayes classifiers need far less computational effort for classification once they are trained. Pseudo Zernike moments performed better than Legendre and Zernike moments, with a recognition rate of 91.7% (8.3% error) with the K-nearest neighbor classifier. The best non-rotation-invariant result of 97.1% (2.9% error) is obtained with Pseudo Zernike moments with MLP.

3.2 Indian Script OCR Survey

At present, the more sophisticated OCRs available are for Roman, Chinese, Japanese and Arabic text. These readers can process documents of different fonts and sizes as well as intermixed text and graphics, which are typewritten, typeset or printed by a printer. There is relatively little work being done on Indian language OCR; we briefly review the available literature in this section. In recent years, research on OCR for online and offline printed characters and handwritten characters has picked up for Indian languages [64].
OCRs for many languages like Devanagari [95][45], Hindi [65][53][153], Bangla [65], Tamil [92][93][94][97][119], Telugu [76][95][96][97], Kannada [73][78], Malayalam [112][133], Gujarati [85][132][134], Oriya [71] and Punjabi, Gurmukhi [52][77], etc., are under research. Major work is happening on Devanagari and Bangla. Researchers are working on different variants of the OCR problem such as printed text [71][76], printed text with compound characters [65],

segmentation of machine-printed and handwritten text lines [68], skew correction [69], multilingual text recognition [112][65], online HCR [92][93][94][95][96][97], post-processing of Indian OCRs [77][72], handwritten character recognition [75], handwritten numeral recognition [84][73] and recognition of multi-script similar-shaped character pairs [63]. In [154], a comparative study of Devanagari handwritten character recognition using 12 different classifiers and four sets of features is presented. The features used are zonal curvature features and gradient features from binary and gray-scale images, processed to a feature count of 392 using down-sampling with a Gaussian filter and PCA methods. The classifiers considered are Projection Distance (PD), Subspace Method (SM), Linear Discriminant Function (LDF), Support Vector Machines (SVM), Modified Quadratic Discriminant Function (MQDF), Mirror Image Learning (MIL), Euclidean Distance (ED), Nearest Neighbour (NN), K-Nearest Neighbour (K-NN), Modified Projection Distance (MPD), Compound Projection Distance (CPD) and Compound Modified Quadratic Discriminant Function (CMQDF). Mirror Image Learning (MIL) gave better overall results among all the classifiers and showed the highest accuracy of 95.19% on gray-scale curvature features. The other classifiers' performance varied within 1%, except LDF, ED, NN and K-NN, which performed poorly. The authors observed that curvature features gave higher results than gradient features with all the classifiers except NN and K-NN, which show slightly lower results with curvature features than with gradient features. They also noticed that, except for the ED, NN and K-NN classifiers, the features computed from gray-scale images show better results than those from binary images.
This paper reports that there are only four pieces of work on Devanagari offline handwritten character recognition and that the proposed method with the MIL classifier has the highest recognition rate reported. In [63], similar-shaped characters from different scripts are tested using a technique based on the F-ratio (Fisher ratio). The gradient features are computed using the Roberts filter. The 9x9 zonal histograms of the 16 directions are down-sampled to 5x5 by a Gaussian filter and are further enhanced by weighting the feature elements using the F-ratio (the ratio of between-class variance to within-class variance). A Quadratic Discriminant Function is used for classification. It is observed that the F-ratio based feature weighting improves the recognition results by a maximum of 1% on similar-shaped characters.
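The F-ratio weighting can be sketched per feature as below. This is an illustrative reading, not the exact procedure of [63]: classes are given equal weight, population variances are used, and the weighting simply multiplies each feature element by its F-ratio. These choices and the function names are assumptions.

```python
def f_ratio(samples_by_class):
    """Per-feature ratio of between-class to within-class variance.

    samples_by_class: dict mapping a class label to a list of
    equal-length feature vectors.
    """
    classes = list(samples_by_class.values())
    dims = len(classes[0][0])
    ratios = []
    for d in range(dims):
        means = [sum(v[d] for v in vs) / len(vs) for vs in classes]
        grand = sum(means) / len(means)
        between = sum((m - grand) ** 2 for m in means) / len(means)
        within = sum(
            sum((v[d] - m) ** 2 for v in vs) / len(vs)
            for vs, m in zip(classes, means)
        ) / len(classes)
        if within > 0:
            ratios.append(between / within)
        else:
            # No spread inside any class: perfectly separating or constant.
            ratios.append(float("inf") if between > 0 else 0.0)
    return ratios

def weight_features(vector, ratios):
    """Scale each feature element by its F-ratio (assumed weighting rule)."""
    return [x * r for x, r in zip(vector, ratios)]
```

A feature whose class means are far apart relative to the spread inside each class gets a large weight, so the classifier leans on the elements that actually distinguish similar-shaped characters.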

In [153], MLP with BP and RBF NN are used as classifiers for handwritten Hindi character recognition. The highest recognition rate quoted is 85%, with MLP NN and the thinned, size-normalized image itself as input. In [133], Malayalam handwritten characters are tested on an MLP with BP NN. The segmented isolated characters are binarized, median filtered and thinned. A one-dimensional wavelet transformation is applied to both the horizontal and vertical projections, using Daubechies wavelets with filter length 4. The number of levels of decomposition is adjusted to get a final smooth sub-signal of size 8. The 16 values (8 each from the vertical and horizontal projections) are taken as the feature vector. The classification accuracy obtained is 73.8%. According to this paper, the highest result reported on Malayalam characters is 82.3%. In [45], Zernike moments of order 2 to 15 are used as features for handwritten Devanagari characters. The 70 Zernike moments are tested on an NN, and the results quoted are between 80% and 85%. In [119], a Hierarchical NN (HNN) with BP is proposed for handwritten Tamil character recognition. The image is centered and rescaled for translation and size normalization. The coordinates of the 8 immediate edge pixels from the centroid are the first-level features. The second-level features are obtained similarly. The first-level features are used for coarse classification, and the next-level features are used to classify the characters in each group. The results are also tested on a Single NN (SNN). The HNN performed well with 94.4% as compared to the SNN with 72.2%.

3.3 Kannada OCR Survey

Kannada is the official language of Karnataka state in southern India. At the 1991 census Karnataka had a population of 400 lakh [135]. The language is spoken not only in Karnataka but, to some extent, in the neighboring states of Andhra Pradesh, Tamil Nadu and Maharashtra. Kannada is written with its own script, which is also used for writing Tulu.
Kannada script is similar to Telugu. Kannada, like many other Indian languages, is built from a base character set of 49 characters, with 15 vowels (swaras) and 34 consonants (vyanjanas). There are as many stress marks as there are base characters. Stress marks of swara (vowel matra) when applied to

15 vyanjanas, Kagunita (compound character) is formed. Hence there are 15x34 = 510 compound characters. The stress marks of Vyanjana (vothus) modify the compound characters giving complex characters. The vothu is an appendage attached to the compound character mostly at the bottom. Since the appendages can also touch the character, the set of distinct characters to be recognized becomes potentially very large. Some examples of each of these cases are given in figure 5.2 (c) and (d). The Kannada OCR research has been picking up in the recent years, but comparatively very few research papers are available as base to start with [64][89]. The work is still in the preliminary stage with researchers working on printed text, numerals and basic characters. But no research base is available for handwritten Kagunita (compound character) recognition. There are no standards, tools and linguistic resources like corpus for the experiment available for printed, online and offline handwritten character recognition. There are no bench mark results to do comparative analysis under similar platforms. The research work is happening in both online and offline character recognition domains. Some researchers have worked on printed characters [80], printed text [98][101][117], and some on handwritten numerals [60][83][84], basic character set [90][91] and bilingual printed text [81]. Some focus on preprocessing techniques like printed character segmentation [82], normalization [101], etc. Very little research work on post-processing is reported [100]. Technologies like multi layer classifiers [73][86], Nearest Neighbor classifier [73][83], Support Vector Machines[73][87], Radial Basis Function [98], Hybrid Neural network [88], Zernike moments [99], Wavelet features [98][81], etc, are explored by different researchers. We now discuss some of the specific attempts at Kannada character recognition. [83] discusses the handwritten Kannada numeral recognition that uses un-thinned images. 
The structural features, namely directional density of pixels in four directions, water reservoirs, maximum profile distances and fill hole density, are used for recognition with a K-nearest-neighbor classifier using a Minkowski minimum-distance criterion. The overall accuracy reported is 96.12% when tested on a database created by the authors themselves.

In [84], handwritten numeral recognition for six popular Indian scripts (Devanagari, Bangla, Telugu, Oriya, Kannada and Tamil) is presented. The image is binarized (Otsu method), cropped to its bounding box and size normalized; it is then divided into blocks and down-sampled, and directional features are extracted. The modified Quadratic Discriminant function is used for classification. The lowest recognition rate was 98.4% for Oriya script and the highest was 99.56% for Devanagari. The Kannada recognition rate reported is 98.71%, with numeral seven having the minimum recognition rate of 97.46%, which is attributed to its shape similarity to other numerals.

[90] used Fisher Linear Discriminant Analysis (FLD), 2D-FLD and diagonal FLD for handwritten Kannada vowels and consonants (50 characters) and also considered modifiers (50 shapes) of compound characters as an extended character set. The database has 100 samples of each character, of which 75 were used for training and the remaining 25 for testing. Seventeen different distance measures are used for classification. The best recognition rate reported for the 50 characters (vowels and consonants) is 68%, with 2D-FLD features and angle measures. For the 100 characters (vowels, consonants and modifiers), 2D-FLD features and angle measures again performed best, with 58.11%.

[60] proposes a quadratic-classifier-based scheme for offline handwritten numeral recognition. The bounding box of the character is divided into blocks and a chain code histogram is computed for each block. The maximum recognition rate obtained is 98.45% with 100-dimensional features.

In [73], zone-based feature extraction is suggested for handwritten numeral recognition in four South Indian scripts. For each zone, the features computed are the average angle from the centroid of the image to the pixels in the zone, the centroid of the zone, and the average angle from the centroid of the zone to the pixels within the same zone. The recognition result for Kannada numerals is 97.85% using SVM, as compared to 97.7% with the Nearest Neighbor classifier. With MLP BP NN the result is 94.75%. In [78], the authors of [73] present the same system with the MLP BP NN classifier. The results show that the recognition rate decreases as the number of training samples increases: with 200 samples the recognition result is 98%, and with 1000 samples it is 94%.
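The first of the zone-based features described for [73] above can be sketched as follows. This is only a minimal illustration, not the implementation used in [73]: the function name `zone_angle_features`, the 2x2 zone grid, the use of `atan2` in radians and the value 0.0 for empty zones are all assumptions made for the example.

```python
import math


def zone_angle_features(image, zone_rows=2, zone_cols=2):
    """Return one average angle (in radians) per zone of a binary image.

    image is a list of equal-length rows holding 0 (background) or
    1 (foreground). Each angle is measured from the centroid of the
    whole image to a foreground pixel. Zones without foreground pixels
    get 0.0, which is an arbitrary placeholder chosen for this sketch.
    """
    height, width = len(image), len(image[0])
    pixels = [(r, c) for r in range(height) for c in range(width) if image[r][c]]
    n_zones = zone_rows * zone_cols
    if not pixels:
        return [0.0] * n_zones

    # Centroid of all foreground pixels in the image.
    centroid_r = sum(r for r, _ in pixels) / len(pixels)
    centroid_c = sum(c for _, c in pixels) / len(pixels)

    angle_sums = [0.0] * n_zones
    counts = [0] * n_zones
    for r, c in pixels:
        # Map the pixel to its zone in a regular zone_rows x zone_cols grid.
        zone_r = min(r * zone_rows // height, zone_rows - 1)
        zone_c = min(c * zone_cols // width, zone_cols - 1)
        zone = zone_r * zone_cols + zone_c
        angle_sums[zone] += math.atan2(r - centroid_r, c - centroid_c)
        counts[zone] += 1

    return [s / n if n else 0.0 for s, n in zip(angle_sums, counts)]


# A 4x4 image with one foreground pixel in each corner.
corners = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
features = zone_angle_features(corners)
```

A plain arithmetic mean of angles ignores the wrap-around at plus or minus pi, so a practical system would probably need a circular mean or a different angle convention. The other two features mentioned for [73] (zone centroid, and the average angle from the zone centroid to its own pixels) could be added in the same loop structure.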
3.4 Discussions

The majority of researchers concentrate on the complete system and assume constrained images of characters and words with minimal noise and variation. Hence the preprocessing required to make the image suitable for feature extraction is relatively simple.

There are standard databases for English, but for other languages no significant character corpus is available yet. Most of the existing research uses home-built databases, usually of small size and restricted variability.

Moment features are found to be robust to noise. With NN, the use of high-order moments from noisy images improved the recognition rate further [120]. Zernike moments performed similarly to geometric moments in a noiseless environment. In the presence of noise, however, Zernike moments are more robust. Geometric moments computed on the 1-D contour sequence of the character image give a better recognition rate than Hu's moments computed from the 2-D character image [104]. Gabor filters are more robust than gradient filters as they are less sensitive to ruggedness of the contour [110].

For Indian languages, very few papers are available on handwritten character recognition. Most of the work reported is on handwritten numeral recognition and online character recognition. Preliminary work is reported on preprocessing of Indian scripts such as Devanagari and Bangla, but this work has yet to be extended to other scripts. Work on Kannada compound characters has been reported for printed character recognition. The approach followed treats vowel matras as separate characters.

Across the different studies, one can see that there is no universally winning feature set or classifier. Different choices perform differently for different character sets. Even within a set, some features may perform well in distinguishing among a subset of characters. Thus, one major challenge in getting good performance is to select a meaningful set of features, guided by the nature of the character set. As an Indian language script has hundreds of characters, including complex characters, the work reported is not sufficient to build a practical OCR system. At present, only digits and the basic character set are considered in experiments.
The number of Kagunita (compound) characters is huge; each is formed by modifying a consonant according to the vowel matra associated with it. Similarly, complex characters are formed by combining two or more consonants with a vowel matra. Hence the methodology applied to the basic character set may not be suitable for Kagunita and complex character recognition, and there is a need for extensive research work in these directions.
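To give a rough sense of the scale involved, the counts from Section 3.3 (15 vowels, 34 consonants) can be combined as below. This is an illustrative sketch only: the function names are hypothetical, and the two-consonant figure is a simple upper bound that assumes every consonant pair can combine with every vowel matra, which need not hold for the characters actually used in Kannada.

```python
VOWELS = 15      # swaras, as counted in Section 3.3
CONSONANTS = 34  # vyanjanas, as counted in Section 3.3


def kagunita_count(consonants=CONSONANTS, vowels=VOWELS):
    """Compound characters: one consonant modified by one vowel matra."""
    return consonants * vowels


def two_consonant_upper_bound(consonants=CONSONANTS, vowels=VOWELS):
    """Upper bound on complex characters built from two consonants and
    one vowel matra, assuming every combination is possible."""
    return consonants * consonants * vowels


print(kagunita_count())             # 510
print(two_consonant_upper_bound())  # 17340
```

Even this single-appendage bound is more than thirty times the size of the compound set, which is one way to see why methods tuned on the basic character set may not carry over directly.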
