Chapter Review of HCR

Chapter 3 Literature Review

The survey of literature on character recognition shows that some researchers have worked on specific application requirements such as postal code identification [118], locating the address on an envelope [116], license plate recognition [55][67], mathematical equation recognition [114] and form-based character recognition [25]. There are also research papers on multi-script recognition [66], classifying text as printed or handwritten [10][68], script-independent word spotting [103], etc. Most researchers have treated HCR as a general research topic, concentrating on some part of it. Some of the important survey papers that give a general insight into handwritten character recognition are [1][2][4][5][6][7][29][44][159][161]. In OCR problems, the emphasis is on preprocessing, feature extraction and classification using image processing [136][137], computer vision [138] and pattern recognition [139] technologies, which is our area of interest. Automating handwritten character recognition is a very complex problem, and the complexity increases manifold when the goal is an unconstrained system. Building software that recognizes with 100% accuracy any user's handwriting, in any style, size, font or direction and against a noisy background, is an open problem today, and most character recognition systems target specific language(s) and/or writing methods [6].

3.1 Review of HCR

HCR is heavily language/script dependent, and hence comparing approaches and performance across scripts does not make much sense. Comparisons within a script are heavily influenced by the nature of the data used and the expectations from the system. Standard benchmark databases are not available for most languages, particularly Indian languages. As researchers create local databases for their own experiments only, the robustness of these databases in terms of number of writers, variations, constraints imposed while writing, etc., cannot be guaranteed. Hence their claims cannot be compared directly with others'.

There are some standard databases available, mainly for English, Japanese and Chinese scripts. Most of these databases are constrained by the source of data (online, offline, writers, etc.), the type of data (printed or handwritten, normal or cursive, digits, lowercase characters, uppercase characters, words, etc.), updates of the existing database with new samples, the format of data (binary or gray scale), the size of data (images), etc. Some of the standard databases used by researchers are those of the Center of Excellence for Document Analysis and Recognition (CEDAR), the National Institute of Standards and Technology (NIST), the United States Postal Service (USPS), Printed Japanese Character (PJC), ETL (Electro-Technical Laboratory), UNIPEN, the Center of Pattern Recognition and Machine Intelligence (CENPARMI) and the Greek Unconstrained Handwriting Database (GRUHD). The CEDAR CD-ROM1 database, released in 1994, consists of handwritten English city names, state names and ZIP codes in gray-scale format, while the digits and alphabetic characters in the database are in binary format. The NIST SD19 database contains binary images of handwritten English alphanumeric characters. MNIST is a gray-scale database of handwritten digits. The UNIPEN English databases Train-R01/V07 and DevTest-R01/V02 have section 1a for digits, 1b for uppercase characters and 1c for lowercase characters, while section 3 contains digits, lowercase and uppercase characters and punctuation.
The UNIPEN database of online English script samples has more than 150 writers (male and female, left- and right-handed) from 5 different countries with different educational backgrounds. CENPARMI has separate databases of handwritten English, Farsi and Arabic character images; PJC contains handwritten Japanese character images; ETL-8 consists of hand-printed Chinese characters, ETL-9 of handwritten Japanese characters and ETL-6 of handwritten English characters; and GRUHD contains unconstrained handwritten Greek characters.

In the following subsections, we look at the work done on selected international languages and then at the work on Indian languages. A lot of work has been done on English. Since English HCR is far less complex than HCR for Indian languages, because English has fewer characters and its characters do not change shape depending on the preceding or following characters, the challenges are likely to be very different from those of our problem. We focus on papers that work on handwritten isolated digits, characters, English cursive characters and words, and categorize them by language, database and technology.

3.1.1 Work on Chinese handwritten characters

The Chinese character set is very large, with more than 6,700 characters. As there is great similarity between characters, and the character shapes are complex with more lines than curves, the recognition rate achieved by some researchers is around 85%. In [13], a clean image is input to the system. Pre-thinning, thinning and post-thinning are done as part of preprocessing. A modified Hough transform using templates is applied to extract individual strokes (horizontal, vertical, backslash and slash), corners and dots. Forty samples each of 900 isolated handwritten Chinese characters are tested using a decision tree classifier, and the recognition rate is 84.02%. In [21], 10 samples of 200 Chinese characters, including regular and rotated ones, are used. Preprocessing includes noise elimination using a 4x4 impulse noise filter, fuzzy non-linear normalization for stroke lengths, scaling for size normalization, and thinning. Five invariant features are extracted, namely the number of strokes, the number of multi-fork points, the total number of black pixels, the number of connected components, and ring data (the count of black pixels at distance r from the centroid). The first four features are used for pre-classification with a maximum distance clustering algorithm, and the ring data is used in the matching process with similarity measures within the cluster. With fuzzy normalization, the recognition rate was 85% on special samples with extremely long strokes and 86% on normal samples.
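The ring-data feature of [21] can be illustrated with a short sketch. It assumes a binary image stored as a list of rows (1 = black) and bins distances by rounding to the nearest integer radius; [21] may bin differently, and the function name `ring_data` is only illustrative.

```python
import math


def ring_data(image):
    """Count black pixels in each integer-radius ring around the centroid.

    image: list of equal-length rows of 0/1 values (1 = black).
    Rounding distances to the nearest integer is an illustrative assumption.
    """
    coords = [(y, x) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    if not coords:
        return []
    # Centroid of the black pixels.
    cy = sum(y for y, _ in coords) / len(coords)
    cx = sum(x for _, x in coords) / len(coords)
    radii = [round(math.hypot(y - cy, x - cx)) for y, x in coords]
    counts = [0] * (max(radii) + 1)
    for r in radii:
        counts[r] += 1
    return counts
```

Because the rings are centred on the centroid, the counts stay the same when the character is shifted, and also under 90-degree rotations of the pixel grid, which is what makes such a feature usable for matching rotated samples.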

3.1.2 Work on Standard English databases

Some researchers have worked on standard databases, while others have worked on local databases and also tested their systems on standard databases to compare with benchmark results. There are many standard databases available for English, such as CEDAR, NIST, USPS, ETL-6 and UNIPEN. Work on cursive characters from the CEDAR database is as follows. The recognition result reported for uppercase characters, lowercase characters and digits with an MLP NN classifier is 83.65% [62], and a more recent paper [165] reports 94.74%. In [15], the CEDAR database of English handwritten cursive digits and characters is used. In preprocessing, the image is thresholded, cleaned of isolated-pixel noise, slant corrected, smoothed and thinned, and the white borders around the character shape are removed. Lines in four directions are extracted using a modified Hough transform with an angular range of ±20°. The image is divided into 9 uniform regions, and the number of lines passing through each region is counted. Global features, namely the width-to-height ratio and the total count of lines in each of the four directions, are also considered. Nearest neighbor and Linear Discriminant Analysis (LDA) are the classification methods compared. Both give similar results, with maxima of 93% for digits and 67.3% for characters. The authors compared their results with other work on the same database and report slightly better results. In [54], CEDAR English isolated characters are used. The images are preprocessed with morphological filters, de-slanting, de-skewing and binarization. Zonal features such as the ratio of a zone's foreground pixels to the total foreground pixels, the direction code, and the difference between the sums of squared line lengths in orthogonal directions are extracted. Global features such as the width-to-height ratio and the portion of the character below the baseline are also extracted.
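The zonal ratio feature described for [54] (a zone's foreground pixels divided by all foreground pixels) can be sketched as below. The 3x3 default grid mirrors the 9 uniform regions of [15]; the grid size, the helper name `zonal_ratios` and the integer-division zone boundaries are assumptions for illustration.

```python
def zonal_ratios(image, rows=3, cols=3):
    """Fraction of all foreground pixels that fall in each zone.

    image: list of equal-length rows of 0/1 values. The image is split into
    a rows x cols grid; when its size is not a multiple of the grid,
    integer division decides which zone a pixel belongs to.
    """
    h, w = len(image), len(image[0])
    counts = [[0] * cols for _ in range(rows)]
    total = 0
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                counts[y * rows // h][x * cols // w] += 1
                total += 1
    if total == 0:
        return [0.0] * (rows * cols)
    # Flatten row by row into a single feature vector.
    return [c / total for zone_row in counts for c in zone_row]
```

Because the ratios are normalized by the total foreground count, they sum to 1 and do not depend on stroke thickness as strongly as raw zone counts.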
For classification, [54] uses Learning Vector Quantization (LVQ) with 3 different learning methods. LVQ1 uses the nearest neighbor decision rule, LVQ2 uses the Bayes rule to correct the class boundaries of LVQ1, and LVQ3 has additional rules to ensure proper class distribution. Across the 3 LVQ variants, the maximum result reported is 81.72%. [62] uses the CEDAR databases of English words and standard alphanumeric characters. The word images are thresholded, slant corrected, thinned and boundary extracted. Zonal features for each of the four directional lines (the number of lines and the total length of the lines) and the global intersection points are extracted and tested on Multilayer Perceptron with Back-propagation (MLP BP) and Radial Basis Function (RBF) neural networks. Both classifiers performed similarly, but RBF training time was significantly less. The maximum recognition rate reported is 83.65%, for non-resized boundary images with the MLP BP classifier. On average, the MLP BP network outperforms RBF when directional features extracted from resized/non-resized and thinned/boundary-extracted images are considered. In [165], one of the most recent results, recursive subdivision of the image for feature extraction yields a recognition rate of 94.74% on the CEDAR English character database.

A lot of work has also been done on the NIST database. The recognition result reported for uppercase and lowercase characters together is 86.34% [50]. In [22], handwritten digits from the NIST data set and digitized mail pieces are used. First the image is size normalized. Gradient maps (directional histograms), structural curvature features and concavity features are extracted and tested on two different classifiers; K-NN performed better with 97.1%, compared to 96.9% with MLP. [50] uses the NIST database, computes projection profiles, vertical and horizontal projection histograms and contour directional histograms (shown in figure 3.1), and tests their performance with an MLP BP neural network on uppercase, lowercase and a meta class of English characters that mixes upper and lower case (characters like O and o, V and v, etc., are treated as a single case-insensitive class). The recognition rates are 86.73% for lowercase and 92.47% for uppercase. The meta class (case-insensitive) result of 87.79% is a little better than the 86.34% obtained with 52 classes (lowercase and uppercase together).

Figure 3.1 Features used by Koerich (panels: projection profiles, histograms, contour of "a", 3x2 zoning)
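The vertical and horizontal projection histograms used in [50] count black pixels per row and per column. A minimal sketch, assuming a 0/1 list-of-rows image; the optional normalization by the total number of black pixels is an illustrative addition, not necessarily what [50] does.

```python
def projection_histograms(image, normalize=False):
    """Return (horizontal, vertical) projection histograms of a binary image.

    horizontal[i] is the number of black pixels in row i and vertical[j]
    the number in column j. With normalize=True both are divided by the
    total number of black pixels (an illustrative choice).
    """
    horizontal = [sum(row) for row in image]
    vertical = [sum(col) for col in zip(*image)]
    if normalize:
        total = sum(horizontal) or 1  # avoid dividing by zero on blank images
        horizontal = [v / total for v in horizontal]
        vertical = [v / total for v in vertical]
    return horizontal, vertical
```

For an "L"-shaped character `[[1, 0, 0], [1, 0, 0], [1, 1, 1]]`, the horizontal histogram is `[1, 1, 3]` and the vertical one `[3, 1, 1]`; concatenating the two gives a fixed-length feature vector once images are size normalized.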

[35] uses the NIST database for English characters. The system is trained with 2000 samples of 128 classes covering digits, uppercase, lowercase and mixed characters, and tested on 500 samples of each symbol. The characters are size normalized (other preprocessing is done as part of document processing). The features used are the horizontal histogram, vertical histogram, radial histogram, radial out-in profile and radial in-out profile. The K-means algorithm is used for classification with Euclidean distance minimization. The maximum recognition rate is 98.8%, for English digits; for mixed English characters, the reported recognition rate is 82.79%. Work on the ETL database reports 99.26% for uppercase characters, and work on the UNIPEN database reports 85% for lowercase characters. [61] uses an elastic matching technique based on a class-dependent eigen-deformation model, with results superior to those of conventional class-independent deformation models. The experiment is done on size-normalized images of handwritten English uppercase characters from the ETL6 database, and the recognition rate achieved is 99.21%, slightly higher than the other conventional methods reported in the paper. [36] uses the UNIPEN database of digits and isolated characters. It assumes that the input characters are already segmented and confined in a bounding box. Several categories of features are used: (1) on a 3x2 regional grid, the curvature degree (the degree to which the region content is rectilinear, curved clockwise or curved anticlockwise) and the line degree (the degree to which it is horizontal, vertical, positive oblique or negative oblique) are computed per region; (2) the horizontal and vertical densities of the regions; (3) the aspect ratio; (4) horizontal and vertical distances from the boundary to the edges, taken at regular intervals; and (5) the intersections of the character edges with the horizontal and vertical lines of the grid. Some of these features are shown in figure 3.2.
An MLP BP neural network is used for classification. The maximum recognition rates are 97% for digits and 85% for lowercase characters, which are claimed to be the highest results on the UNIPEN database.
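Feature (4) of [36], the distances from the boundary to the character edges at regular intervals, can be sketched as follows. Only the left and top borders are used here and the sampling step is a free parameter; both are assumptions, since [36] may also measure from the right and bottom borders. The function name is hypothetical.

```python
def boundary_distances(image, step=1):
    """Distances from the left and top borders to the first black pixel.

    The left profile is sampled every `step` rows and the top profile every
    `step` columns. A row or column with no black pixel gets the full image
    width or height. Using only two borders is an illustrative assumption.
    """
    h, w = len(image), len(image[0])
    left = []
    for y in range(0, h, step):
        row = image[y]
        left.append(next((x for x in range(w) if row[x]), w))
    top = []
    for x in range(0, w, step):
        top.append(next((y for y in range(h) if image[y][x]), h))
    return left, top
```

For a diagonal stroke such as `[[0,0,1,0],[0,1,0,0],[1,0,0,0],[0,0,0,0]]`, both profiles are `[2, 1, 0, 4]`, where the final 4 marks an empty row or column.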

Figure 3.2 (a) 3x2 regions, (b) boundary distance, (c) grid and edge intersection points

3.1.3 Work on local English database

Some researchers' work is based on locally generated databases. As the complexity of these databases is not measurable, the results cannot be compared directly, so some researchers also test on standard databases. In [14], a database of 50 words is used. Two levels of segmentation are performed to extract characters from words: in the first level, a heuristic algorithm and a hole-seeking component are used, and in the second level, a feed-forward NN with BP is trained manually and used to verify the first-level segmentation results. The character pixel values are used as features and tested on an NN for character recognition, and the results are also presented to a neural dictionary of words for word recognition. The maximum character and word recognition rates reported are 78% and 100% respectively. In [48], the authors conclude that HCR is not a clusterable recognition problem with respect to the features they tested. A training set of 150 uppercase characters collected from a single user is used. They considered profile, geometric moment, Fourier transform, contour and shadow features, and the observed retrieval success rates with LVQ are 92%, 90.8%, 86.4%, 65% and 98% respectively. With 1000 samples from 13 writers, the average performance of the shadow feature is 67%. In [47], a segmentation-free technique with appearance-based features is suggested. The images are rescaled and cropped. PCA and DWT are used separately and together for feature generation, and an HMM is used for classification. The minimum character error rate (CER) reported is 26%, that is, a maximum recognition rate of 74%.
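The relation between CER and recognition rate used above (26% CER, 74% recognition) can be made concrete. CER is commonly computed as the edit (Levenshtein) distance between the recognized and reference strings divided by the reference length; the exact protocol of [47] is not given here, so this sketch assumes that common definition.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(recognized, reference):
    """Edit distance normalized by the length of the reference text."""
    return edit_distance(recognized, reference) / len(reference)
```

For example, recognizing "hallo" for the reference "hello" needs one substitution, giving a CER of 0.2 and a recognition rate of 1 - 0.2 = 80%.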

Some of the work reported on other international languages is as follows. In [37], a single system is developed to handle both online and offline Korean characters by converting the image into an array of strokes similar to online strokes and making the strokes order-free. The system learns from online or offline data continuously to improve the performance of the recognizer. The image is passed through a filter to eliminate isolated and rugged pixels, and strokes are segmented from the thinned image. Stroke size, start and end points, and segment directions are the features, which are made order-free. For each stroke, the minimum dissimilarity to the best-matching reference stroke is computed, and these values are averaged over all strokes to identify the class. In [35], discussed earlier under the NIST database for English character recognition, the same feature set is also tried on the GRUHD database, and the recognition rate for mixed Greek characters is lower (72.8%) than that for mixed English characters (82.79%).

More specific feature survey

As we saw in the last chapter, a number of features have been used in pattern recognition, and in character recognition in particular. In this section, we look in more detail at two classes of such features, Gabor filters and moments, since our approach uses them heavily. They turn out to be an attractive set of features for curvature-rich scripts such as those of Indian languages. We briefly introduce each feature and cite a number of works that make use of it.

Use of Gabor filter

The Gabor filter is used in HCR research for character segmentation [116] and for feature extraction. Researchers use it to create independent directional images as features. The directional information thus extracted is used directly as a feature [116] or to reduce the dimensional complexity of the input [112][114]. Some have computed zonal statistical probability distributions of the directional images [106].
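The directional images mentioned above come from filtering the character with Gabor kernels at several orientations. The sketch below builds the real part of a standard 2-D Gabor kernel, filters a binary image with it, and sums the absolute responses per orientation. The kernel size, wavelength, sigma and four orientations are illustrative values, not those of section 7.4 or of the cited works, and the helper names are hypothetical.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, phase=0.0):
    """Real part of a 2-D Gabor kernel of shape size x size (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + phase))
        kernel.append(row)
    return kernel


def filter_response(image, kernel):
    """Correlate an image with a kernel (zero padding, same output size).

    With phase 0 the kernel is symmetric under 180-degree rotation, so this
    is the same as convolution.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[ky + half][kx + half]
            out[y][x] = acc
    return out


def directional_energy(image, size=7, wavelength=4.0, sigma=2.0, n_orient=4):
    """Sum of absolute filter responses for n_orient equally spaced angles."""
    energies = []
    for k in range(n_orient):
        theta = k * math.pi / n_orient
        resp = filter_response(image, gabor_kernel(size, wavelength, theta, sigma))
        energies.append(sum(abs(v) for row in resp for v in row))
    return energies
```

With theta = 0 the carrier varies along x, so a vertical stroke gives the largest energy in the first orientation, while a horizontal stroke peaks at theta = pi/2 (the third entry). Keeping the per-orientation response maps instead of summing them gives the directional images themselves.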
The computation of the Gabor filter to extract directional information is discussed in section 7.4. Some research works using this filter are as follows. In [105], the authors worked on noisy hand-printed Chinese character images from the ETL-8 database to assess the robustness of the Gabor transformation to noise, and compared it with peripheral direction contributivity (PDC) features. PDC features are formed by assigning stroke directions to pixels and selecting the pixels on the first, second and third strokes encountered by the scan line. The results show that Gabor features yielded an error rate of 2.4% (success rate of 97.6%), compared to 4.4% (success rate of 95.6%) for PDC. In [106], images are preprocessed for slant and distortion using the minimal moment of inertia and rotation. The Canny edge image is passed through a set of Gabor filters with 2 wavelengths and 4 orientations. From these, maximum ratio vectors are computed and tested on an MLP BP neural network. A filter dimension of 16x16 gave the best results, with a recognition accuracy of 96.5% for 10 numerals and 26 uppercase letters with 20 samples per character. [113] discusses the effect of the sampling intervals of 2D Gabor features in the 2D pattern, orientation angle and logarithmic frequency domains. The discussion of feature stability under scaling, rotation and translation shows that the stable ranges for scaling, translation and rotation are 2, half the wavelength of the Gabor filter, and ..., respectively. Experiments on printed Japanese characters showed that the maximum recognition rates are achieved at the optimal sampling intervals, together with a reduction in computation. When the sampling interval was smaller than the optimal one, the recognition rate did not change, but the computation increased. Similarly, in the orientation angle domain, sampling at eight conventional orientation angles was sufficient for Japanese character recognition. [110] uses 3 databases, MNIST, CENPARMI and PJC, with normalized gray-scale images. The authors compare the performance of gradient features with that of Gabor features. The gradient features are computed using the Sobel operator: eight direction planes are generated and merged into four planes.
Each plane is convolved with a low-pass Gaussian filter, and the convolution values at uniformly placed sampling points are taken as gradient features. The magnitude of the Gabor transformation is used as the Gabor feature. The feature vector is first reduced using Fisher Discriminant Analysis. The class-mean vectors of the training samples form the templates of a nearest mean classifier, and these mean vectors are further optimized using LVQ with the minimum classification error criterion. The Gabor features performed well, with 99.47% on MNIST and 99.5% on PJC when a high sampling rate with 4 orientations is used. In [114], an elastic meshing technique is first applied to obtain sampling points. Then a set of Gabor filters (real parts) is used to extract directional features at each sampling point, and a minimum distance classifier is used. The Gabor features performed well compared to directional features, with a recognition rate of 97.1% on poor-quality images.

Use of moment features

Moments can represent each character uniquely in terms of monomials, regardless of how close the characters are in terms of local features, and hence the image can be reconstructed from the moment features. As the order of the moments increases, the reconstruction of the image improves, but at the same time the moments become more sensitive to noise. Historically, the first significant work using moments for pattern recognition was by Hu [126]. Using the methods of algebraic invariants, he derived a set of seven moment invariants from non-linear combinations of geometric moments. These invariants remain the same under image translation, rotation and scaling. Since then, moments and functions of moments have been widely used in applications that need to identify a shape, such as pattern recognition, ship identification, aircraft identification, pattern matching and scene matching. The kernel function of geometric moments is not orthogonal, which makes reconstructing an image from these moments quite difficult and requires the moment matching method. Further research in this direction resulted in orthogonal moments such as Zernike moments, Legendre moments, etc.
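As an illustration of Hu's construction, the sketch below computes normalized central moments eta_pq = mu_pq / mu_00^(1 + (p+q)/2) and the first two of the seven invariants, phi1 = eta_20 + eta_02 and phi2 = (eta_20 - eta_02)^2 + 4 eta_11^2. Only two invariants are shown, pixel coordinates are taken as (column, row), and the function names are illustrative.

```python
def raw_moment(image, p, q):
    """Geometric moment m_pq = sum of x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(image) for x, v in enumerate(row))


def central_moment(image, p, q):
    """Central moment mu_pq, taken about the centroid (translation invariant)."""
    m00 = raw_moment(image, 0, 0)
    xc = raw_moment(image, 1, 0) / m00
    yc = raw_moment(image, 0, 1) / m00
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(image) for x, v in enumerate(row))


def normalized_central_moment(image, p, q):
    """eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), which adds scale normalization."""
    mu00 = central_moment(image, 0, 0)  # equals m00
    return central_moment(image, p, q) / mu00 ** (1 + (p + q) / 2)


def hu_first_two(image):
    """First two of Hu's seven invariants."""
    n20 = normalized_central_moment(image, 2, 0)
    n02 = normalized_central_moment(image, 0, 2)
    n11 = normalized_central_moment(image, 1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2
```

On a digital image the invariance is exact (up to rounding) for translation and 90-degree rotations of the pixel grid; for scaling and arbitrary rotations it is only approximate because the character has to be resampled.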
From the original preprocessed image, researchers have computed geometric moments [108], central moments [102][103][107], Hu's moments [104][120], higher-order Hu's moments [102], Gegenbauer moments [46], affine moments [129], Zernike moments [45][120], Legendre moments [108], Chebyshev moments [79], etc. In some cases these moment values are used directly as features [103], while others have extracted centralness, divergence, imbalance, skewness, etc., from the moments as features [108][102]. Some have analyzed the accuracy of Zernike moments [109]. In [127], orthogonal moment features are tested with different parametric and non-parametric classifiers. There are also research works on the automatic generation of moment invariants [129] and the fast computation of moments [130][144]. The computation of some of these moments is discussed in section 7.3. Some research works using non-orthogonal and orthogonal moment features are as follows. In [104], handwritten numerals are tested using an MLP BP neural network. Geometric moments are computed for the 1-D contour sequence as well as for the 2-D image: four features are generated from the 1-D moments, and Hu's invariant features from the 2-D moments. The test results show that the NN performs better than the nearest neighbor and minimum mean distance classifiers, and that the contour sequence moments performed well with 95.42%, compared to 82.09% for Hu's moments. In [108], moment features are analyzed for their power to discriminate similarly shaped Chinese characters. The authors proposed 4 non-linear functions based on moments up to the third order and tested the performance of geometric, central and Legendre moments, using the mean square distance as the classification measure. The distance between each pair of the 6,763 Chinese characters is computed. With geometric moments, 78,801 pairs of characters fall within a distance of 0.01, whereas with Legendre moments only 671 pairs do. It is also noted that a pair of characters that is difficult to distinguish using one method may not be so difficult using another, so different moments can be used in conjunction. [120] compares geometric moments (Hu's 7 features) and Zernike moments (orders 0 to 12) using an MLP BP NN, a Bayes classifier, the nearest neighbor rule and the weighted minimum mean distance rule. The database consists of 24 samples of 26 uppercase English characters, with rotation and different levels of signal-to-noise ratio (SNR).
In all experiments with different levels of noise, the MLP NN performed best, even at low SNR; the number of hidden-layer nodes that gave close to the best performance could be reduced by one fifth without significantly altering the results. Zernike moment features performed better than geometric moments: both performed similarly on noiseless images, but Zernike moments did well in noisy conditions. Although high-order moments are sensitive to noise, increasing the number of Zernike features by raising the order improved the performance of the MLP, whereas the other 3 conventional classifiers showed a decrease in performance. Accuracies of 100% were achieved for the noiseless, 50, 25 and 12 dB SNR cases, while accuracies of 90% and 80% were obtained at 8 and 5 dB SNR when Zernike moments of order 12 were classified using the MLP NN. [129] proposes a method for the automatic generation of moment invariants of any order. The database consists of 50 samples of all the alphanumeric English characters, with 20 used for training and 30 for testing, using Euclidean distance and tree classifiers. The new moments are compared with Hu's invariants: recognition rates of 96.34% and 98.81% are achieved with Hu's invariants and the new third-order moments respectively. [127] examines orthogonal polynomials (Legendre, Zernike and pseudo-Zernike) for recognizing skeletonized handwritten Arabic numerals. A new method of scale and location invariance is suggested, using circular regions about the centroid or the minimum bounding circle. Radial geometric moments are used to compute all three orthogonal moments. Bayes quadratic, K-nearest neighbor, Parzen and MLP NN classifiers are used; the MLP and Bayes classifiers need far less computation for classification once they are trained. Pseudo-Zernike moments performed better than Legendre and Zernike moments, with a recognition rate of 91.7% (8.3% error) with the K-nearest neighbor classifier. The best non-rotation-invariant result, 97.1% (2.9% error), is obtained with pseudo-Zernike moments and the MLP.

3.2 Indian Script OCR Survey

At present, the more sophisticated OCR systems available are for Roman, Chinese, Japanese and Arabic text. These readers can process documents of different fonts and sizes, as well as intermixed text and graphics, that are typewritten, typeset or printed by a printer. Relatively little work has been done on Indian language OCR; we briefly review the available literature in this section. In recent years, research on OCR of online and offline printed and handwritten characters in Indian languages has picked up [64].
OCRs for many languages and scripts, such as Devanagari [95][45], Hindi [65][53][153], Bangla [65], Tamil [92][93][94][97][119], Telugu [76][95][96][97], Kannada [73][78], Malayalam [112][133], Gujarati [85][132][134], Oriya [71] and Punjabi (Gurmukhi) [52][77], are under research. Major work is happening on Devanagari and Bangla. Researchers are working on different variants of the OCR problem, such as printed text [71][76], printed text with compound characters [65], segmentation of machine-printed and handwritten text lines [68], skew correction [69], multilingual text recognition [112][65], online HCR [92][93][94][95][96][97], post-processing for Indian OCRs [77][72], handwritten character recognition [75], handwritten numeral recognition [84][73] and recognition of similarly shaped character pairs across scripts [63]. In [154], a comparative study of Devanagari handwritten character recognition using 12 different classifiers and four sets of features is presented. The features are zonal curvature features and gradient features, each computed from binary and from gray-scale images, and reduced to 392 features by down-sampling with a Gaussian filter and PCA. The classifiers considered are Projection Distance (PD), Subspace Method (SM), Linear Discriminant Function (LDF), Support Vector Machines (SVM), Modified Quadratic Discriminant Function (MQDF), Mirror Image Learning (MIL), Euclidean Distance (ED), Nearest Neighbour (NN), K-Nearest Neighbour (K-NN), Modified Projection Distance (MPD), Compound Projection Distance (CPD) and Compound Modified Quadratic Discriminant Function (CMQDF). MIL gave the best overall results among all the classifiers, with the highest accuracy of 95.19% on gray-scale curvature features. The performance of the other classifiers was within about 1% of this, except for LDF, ED, NN and K-NN, which performed poorly. The experiments show that curvature features gave better results than gradient features with all classifiers except NN and K-NN, which gave slightly lower results with curvature features than with gradient features. Also, except for the ED, NN and K-NN classifiers, features computed from gray-scale images gave better results than those computed from binary images.
This paper reports that there are only four pieces of work on offline Devanagari handwritten character recognition, and that the proposed method with the MIL classifier has the highest recognition rate reported. In [63], similarly shaped characters from different scripts are tested using a technique based on the F-ratio (Fisher ratio). The gradient features are computed using the Roberts filter. The 9x9 zonal histograms of 16 directions are down-sampled to 5x5 by a Gaussian filter and further enhanced by weighting the feature elements with the F-ratio (the ratio of between-class variance to within-class variance). A quadratic discriminant function is used for classification. F-ratio-based feature weighting improves the recognition results on similarly shaped characters by at most 1%.
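The F-ratio weighting used in [63] can be sketched per feature dimension: the between-class variance divided by the within-class variance, with each feature multiplied by its ratio. Using the raw ratio as the weight and adding a small epsilon to avoid division by zero are assumptions; [63] may scale the weights differently, and the helper names are hypothetical.

```python
from collections import defaultdict


def f_ratios(samples, labels, eps=1e-12):
    """Per-feature Fisher ratio: between-class / within-class variance.

    samples: list of equal-length feature vectors; labels: class of each.
    Both variances would be divided by the sample count, which cancels.
    """
    n, dim = len(samples), len(samples[0])
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    overall = [sum(x[d] for x in samples) / n for d in range(dim)]
    ratios = []
    for d in range(dim):
        between = within = 0.0
        for members in by_class.values():
            mean_c = sum(x[d] for x in members) / len(members)
            between += len(members) * (mean_c - overall[d]) ** 2
            within += sum((x[d] - mean_c) ** 2 for x in members)
        ratios.append(between / (within + eps))
    return ratios


def weight_features(x, ratios):
    """Scale each feature by its F-ratio before classification."""
    return [v * r for v, r in zip(x, ratios)]
```

In the small example used in the test, the first feature separates the two classes cleanly (ratio about 25) while the second overlaps heavily (ratio 0.25), so weighting emphasizes the first feature for a distance-based or quadratic classifier.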

In [153], MLP BP and RBF NNs are used as classifiers for handwritten Hindi character recognition. The highest recognition rate they quote is 85%, with the MLP NN using the thinned, size-normalized image itself as input. In [133], Malayalam handwritten characters are tested on an MLP BP NN. The segmented isolated characters are binarized, median filtered and thinned. A one-dimensional wavelet transform is applied to both the horizontal and the vertical projections, using Daubechies wavelets with filter length 4. The number of decomposition levels is adjusted to obtain a final smooth sub-signal of size 8, and the 16 values (8 each from the vertical and horizontal projections) form the feature vector. The classification accuracy obtained is 73.8%; according to this paper, the highest result reported on Malayalam characters is 82.3%. In [45], Zernike moments of orders 2 to 15 are used as features for handwritten Devanagari characters. The 70 Zernike moments are tested on an NN, and the quoted results are between 80% and 85%. In [119], a hierarchical NN (HNN) with BP is proposed for handwritten Tamil character recognition. The image is centered and rescaled for translation and size normalization. The coordinates of the 8 immediate edge pixels from the centroid are the first-level features, and the second-level features are obtained similarly. The first-level features are used for coarse classification, and the second-level features to classify characters within each group. The results are also compared with a single NN (SNN): the HNN performed well with 94.4%, compared to 72.2% for the SNN.

3.3 Kannada OCR Survey

Kannada is the official language of Karnataka state in southern India. At the 1991 census, Karnataka had a population of 400 lakh [135]. The language is spoken not only in Karnataka but, to some extent, in the neighboring states of Andhra Pradesh, Tamil Nadu and Maharashtra. Kannada is written in its own script, which is also used for writing Tulu.
The Kannada script is similar to Telugu. Kannada, like many other Indian languages, is built from a base character set of 49 characters: 15 vowels (swaras) and 34 consonants (vyanjanas). There are as many stress marks as there are base characters. When the stress marks of the swaras (vowel matras) are applied to the vyanjanas, Kagunita (compound characters) are formed; hence there are 15x34 = 510 compound characters. The stress marks of the vyanjanas (vothus) modify the compound characters, giving complex characters. A vothu is an appendage attached to the compound character, mostly at the bottom. Since the appendages can also touch the character, the set of distinct characters to be recognized becomes potentially very large. Some examples of each of these cases are given in figure 5.2 (c) and (d). Kannada OCR research has been picking up in recent years, but comparatively few research papers are available as a base to start with [64][89]. The work is still at a preliminary stage, with researchers working on printed text, numerals and basic characters, and no research base is available for handwritten Kagunita (compound character) recognition. There are no standards, tools or linguistic resources, such as corpora, available for experiments on printed, online and offline handwritten character recognition, and there are no benchmark results for comparative analysis on similar platforms. Research is happening in both the online and offline character recognition domains. Some researchers have worked on printed characters [80] and printed text [98][101][117], and some on handwritten numerals [60][83][84], the basic character set [90][91] and bilingual printed text [81]. Some focus on preprocessing techniques such as printed character segmentation [82] and normalization [101]. Very little research on post-processing is reported [100]. Technologies such as multi-layer classifiers [73][86], the nearest neighbor classifier [73][83], Support Vector Machines [73][87], Radial Basis Functions [98], hybrid neural networks [88], Zernike moments [99] and wavelet features [98][81] have been explored by different researchers. We now discuss some of the specific attempts at Kannada character recognition. [83] discusses handwritten Kannada numeral recognition using un-thinned images.
The structural features used, namely the directional density of pixels in four directions, water reservoirs, maximum profile distances and fill-hole density, are classified with a K-nearest-neighbor classifier using a minimum-distance criterion based on the Minkowski distance. The overall accuracy reported is 96.12% when tested on the authors' own database. In [84], handwritten numeral recognition for six popular Indian scripts (Devanagari, Bangla, Telugu, Oriya, Kannada and Tamil) is presented. The image is binarized using Otsu's method, cropped to its bounding box and size-normalized; it is then divided into blocks and down-sampled, and directional features are extracted. A modified quadratic discriminant function (MQDF) is used for classification. The lowest recognition rate obtained was for Oriya script, at 98.4%, and the highest was for Devanagari, at 99.56%. The Kannada recognition rate reported is 98.71%, with numeral seven having the lowest recognition rate of 97.46%, which is attributed to its shape similarity to other numerals. [90] used Fisher Linear Discriminant Analysis (FLD), 2D-FLD and diagonal FLD for handwritten Kannada vowels and consonants (50 characters), and also considered the modifiers (50 shapes) of compound characters as an extended character set. The database has 100 samples of each character, of which 75 were used for training and the remaining 25 for testing. Seventeen different distance measures are used for classification. The best recognition rate reported for the 50 characters (vowels and consonants) is 68%, with 2D-FLD features and angle measures. For the 100 characters (vowels, consonants and modifiers), 2D-FLD features and angle measures again performed best, at 58.11%. [60] proposes a quadratic-classifier-based scheme for offline handwritten numeral recognition. The character's bounding box is divided into blocks and a chain code histogram is computed for each block. The maximum recognition rate obtained is 98.45% with 100-dimensional features. In [73], zone-based feature extraction is suggested for handwritten numeral recognition in four South Indian scripts. For each zone, the average angle from the centroid of the image to the pixels in the zone, the centroid of the zone, and the average angle from the centroid of the zone to the pixels within the same zone are computed as features. The recognition rate for Kannada numerals is 97.85% using SVM, compared to 97.7% with the Nearest Neighbor classifier and 94.75% with a multi-layer perceptron trained by back-propagation (MLP BP NN). In [78], the authors of [73] present the same system with an MLP BP NN classifier. Their results show that recognition accuracy decreases as the number of training samples increases: with 200 samples the recognition rate is 98%, and with 1000 samples it is 94%.
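The zone-based features described for [73], combined with a nearest-neighbour classifier using the Minkowski distance (the distance family used in [83]), can be sketched in Python with NumPy as below. This is a minimal sketch under our own assumptions, not the authors' implementation: the 4 x 4 zone grid, angles in radians from `numpy.arctan2` averaged arithmetically, zero-filling of empty zones, and the toy bar-shaped "characters" produced by the hypothetical `bar` helper are all illustrative choices.

```python
import numpy as np


def zone_features(img: np.ndarray, grid: int = 4) -> np.ndarray:
    """Zone-based features in the spirit of [73] for a binary image
    (foreground pixels are nonzero; at least one is assumed present).

    Per zone: mean angle from the image centroid to the zone's pixels,
    the zone centroid (row, col), and mean angle from the zone centroid
    to the zone's pixels. Empty zones contribute zeros.
    """
    ys, xs = np.nonzero(img)
    cy, cx = ys.mean(), xs.mean()  # image centroid
    row_edges = np.linspace(0, img.shape[0], grid + 1).astype(int)
    col_edges = np.linspace(0, img.shape[1], grid + 1).astype(int)
    feats = []
    for r in range(grid):
        for c in range(grid):
            inside = ((ys >= row_edges[r]) & (ys < row_edges[r + 1])
                      & (xs >= col_edges[c]) & (xs < col_edges[c + 1]))
            zy, zx = ys[inside], xs[inside]
            if zy.size == 0:
                feats.extend([0.0, 0.0, 0.0, 0.0])
                continue
            # Plain arithmetic mean of angles (a simplification; no circular mean).
            img_angle = np.arctan2(zy - cy, zx - cx).mean()
            zcy, zcx = zy.mean(), zx.mean()  # zone centroid
            zone_angle = np.arctan2(zy - zcy, zx - zcx).mean()
            feats.extend([img_angle, zcy, zcx, zone_angle])
    return np.asarray(feats, dtype=float)


def minkowski(a: np.ndarray, b: np.ndarray, p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1: city block, p=2: Euclidean)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def knn_predict(train_x, train_y, x, k: int = 1, p: float = 2.0):
    """Majority label among the k training vectors closest to x."""
    dists = np.array([minkowski(t, x, p) for t in train_x])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]


def bar(vertical: bool, offset: int = 0, size: int = 16) -> np.ndarray:
    """Hypothetical toy 'character': a 2-pixel-thick bar."""
    img = np.zeros((size, size), dtype=np.uint8)
    if vertical:
        img[2:14, 7 + offset:9 + offset] = 1
    else:
        img[7 + offset:9 + offset, 2:14] = 1
    return img


if __name__ == "__main__":
    train_x = [zone_features(bar(True)), zone_features(bar(False))]
    train_y = ["vertical", "horizontal"]
    query = zone_features(bar(True, offset=1))
    print(knn_predict(train_x, train_y, query, k=1, p=2.0))  # vertical
```

Because the zone centroids are absolute pixel coordinates, these features presuppose the bounded, size-normalized images that the reviewed systems produce during preprocessing; setting `p=1` switches the matcher to city-block distance.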
3.4 Discussions
Most researchers concentrate on the complete system and assume constrained images of characters and words with minimal noise and variation. Hence the preprocessing needed to make the images suitable for feature extraction is relatively simple.

There are standard databases for English, but for other languages no significant character corpus is available yet. Most existing research uses home-built databases, usually small and of restricted variability. Moment features are found to be robust to noise. With neural networks, the use of high-order moments from noisy images improved the recognition rate further [120]. Zernike moments performed similarly to geometric moments in a noiseless environment; in the presence of noise, however, Zernike moments are more robust. Geometric moments computed on the 1-D contour sequence of the character image give a better recognition rate than Hu's moments computed from the 2-D character image [104]. Gabor filters are more robust than gradient filters, as they are less sensitive to ruggedness of the contour [110]. For Indian languages, very few papers are available on handwritten character recognition. Most of the reported work is on handwritten numeral recognition and online character recognition. Preliminary work has been reported on preprocessing of Indian scripts such as Devanagari and Bangla, but this work has to be extended to other scripts. The work on Kannada compound characters has been reported for printed character recognition, and the approach followed treats vowel matras as separate characters. Across the different studies, there is no universally winning feature set or classifier: different choices perform differently on different character sets. Even within a set, some features may perform well in distinguishing among a subset of characters. Thus, one major challenge in getting good performance is to select a meaningful set of features, guided by the nature of the character set. As Indian language scripts have hundreds of characters, including complex characters, the work reported is not sufficient to build a practical OCR system; at present only digits and the basic character set are considered in experiments.
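The moment features compared above rest on the standard definitions of central and normalized moments. As an illustration only, the Python/NumPy sketch below computes the normalized central moments eta_pq of a 2-D character image and the first two of Hu's seven invariants, phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)^2 + 4 eta11^2. It does not reproduce the 1-D contour-sequence moments of [104], Zernike moments, or any cited study's settings, and the L-shaped test blob is hypothetical.

```python
import numpy as np


def normalized_central_moment(img: np.ndarray, p: int, q: int) -> float:
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2) for a grey or binary image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[: img.shape[0], : img.shape[1]]  # row (y) and column (x) grids
    m00 = img.sum()                                    # zeroth moment, equals mu_00
    xbar = (xs * img).sum() / m00                      # centroid from first-order moments
    ybar = (ys * img).sum() / m00
    mu_pq = (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()
    return float(mu_pq / m00 ** (1 + (p + q) / 2))


def hu_phi1_phi2(img: np.ndarray) -> tuple[float, float]:
    """First two of Hu's seven moment invariants."""
    n20 = normalized_central_moment(img, 2, 0)
    n02 = normalized_central_moment(img, 0, 2)
    n11 = normalized_central_moment(img, 1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2


if __name__ == "__main__":
    # Hypothetical L-shaped blob, then the same blob shifted and rotated by 90 degrees.
    a = np.zeros((20, 20))
    a[2:6, 3:10] = 1
    a[6:9, 3:5] = 1
    shifted = np.roll(np.roll(a, 8, axis=0), 5, axis=1)  # no wrap-around for this blob
    rotated = np.rot90(a)
    print(hu_phi1_phi2(a))
    print(hu_phi1_phi2(shifted))  # same values up to rounding: translation invariance
    print(hu_phi1_phi2(rotated))  # same values up to rounding: rotation invariance
```

Central moments remove the dependence on position, and the mu_00 normalization removes most of the dependence on size, which is why such features tolerate the position and scale variation of handwritten samples.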
The number of Kagunita (compound) characters is huge; they are formed by modifying a consonant according to the vowel matra associated with it. Similarly, complex characters are formed by combining two or more consonants with a vowel matra. Hence the methodology applied to the basic character set may not be suitable for Kagunita and complex character recognition, and extensive research work is needed in these directions.


More information

HMM-based Indic Handwritten Word Recognition using Zone Segmentation

HMM-based Indic Handwritten Word Recognition using Zone Segmentation HMM-based Indic Handwritten Word Recognition using Zone Segmentation a Partha Pratim Roy*, b Ayan Kumar Bhunia, b Ayan Das, c Prasenjit Dey, d Umapada Pal a Dept. of CSE, Indian Institute of Technology

More information

Localization, Extraction and Recognition of Text in Telugu Document Images

Localization, Extraction and Recognition of Text in Telugu Document Images Localization, Extraction and Recognition of Text in Telugu Document Images Atul Negi Department of CIS University of Hyderabad Hyderabad 500046, India atulcs@uohyd.ernet.in K. Nikhil Shanker Department

More information

Digital Image Processing Fundamentals

Digital Image Processing Fundamentals Ioannis Pitas Digital Image Processing Fundamentals Chapter 7 Shape Description Answers to the Chapter Questions Thessaloniki 1998 Chapter 7: Shape description 7.1 Introduction 1. Why is invariance to

More information

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera

Recognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 17 (2014), pp. 1839-1845 International Research Publications House http://www. irphouse.com Recognition of

More information

Feature Extraction and Image Processing, 2 nd Edition. Contents. Preface

Feature Extraction and Image Processing, 2 nd Edition. Contents. Preface , 2 nd Edition Preface ix 1 Introduction 1 1.1 Overview 1 1.2 Human and Computer Vision 1 1.3 The Human Vision System 3 1.3.1 The Eye 4 1.3.2 The Neural System 7 1.3.3 Processing 7 1.4 Computer Vision

More information

Devanagari Isolated Character Recognition by using Statistical features

Devanagari Isolated Character Recognition by using Statistical features Devanagari Isolated Character Recognition by using Statistical features ( Foreground Pixels Distribution, Zone Density and Background Directional Distribution feature and SVM Classifier) Mahesh Jangid

More information

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction Volume, Issue 8, August ISSN: 77 8X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Combined Edge-Based Text

More information

K S Prasanna Kumar et al,int.j.computer Techology & Applications,Vol 3 (1),

K S Prasanna Kumar et al,int.j.computer Techology & Applications,Vol 3 (1), Optical Character Recognition (OCR) for Kannada numerals using Left Bottom 1/4 th segment minimum features extraction K.S. Prasanna Kumar Research Scholar, JJT University, Jhunjhunu, Rajasthan, India prasannakumarks@acharya.ac.in

More information

A New Approach to Detect and Extract Characters from Off-Line Printed Images and Text

A New Approach to Detect and Extract Characters from Off-Line Printed Images and Text Available online at www.sciencedirect.com Procedia Computer Science 17 (2013 ) 434 440 Information Technology and Quantitative Management (ITQM2013) A New Approach to Detect and Extract Characters from

More information

Lecture 8 Object Descriptors

Lecture 8 Object Descriptors Lecture 8 Object Descriptors Azadeh Fakhrzadeh Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University 2 Reading instructions Chapter 11.1 11.4 in G-W Azadeh Fakhrzadeh

More information

CoE4TN4 Image Processing

CoE4TN4 Image Processing CoE4TN4 Image Processing Chapter 11 Image Representation & Description Image Representation & Description After an image is segmented into regions, the regions are represented and described in a form suitable

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Binary Image Processing. Introduction to Computer Vision CSE 152 Lecture 5

Binary Image Processing. Introduction to Computer Vision CSE 152 Lecture 5 Binary Image Processing CSE 152 Lecture 5 Announcements Homework 2 is due Apr 25, 11:59 PM Reading: Szeliski, Chapter 3 Image processing, Section 3.3 More neighborhood operators Binary System Summary 1.

More information

2: Image Display and Digital Images. EE547 Computer Vision: Lecture Slides. 2: Digital Images. 1. Introduction: EE547 Computer Vision

2: Image Display and Digital Images. EE547 Computer Vision: Lecture Slides. 2: Digital Images. 1. Introduction: EE547 Computer Vision EE547 Computer Vision: Lecture Slides Anthony P. Reeves November 24, 1998 Lecture 2: Image Display and Digital Images 2: Image Display and Digital Images Image Display: - True Color, Grey, Pseudo Color,

More information

SKEW DETECTION AND CORRECTION

SKEW DETECTION AND CORRECTION CHAPTER 3 SKEW DETECTION AND CORRECTION When the documents are scanned through high speed scanners, some amount of tilt is unavoidable either due to manual feed or auto feed. The tilt angle induced during

More information

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM RAMZI AHMED HARATY and HICHAM EL-ZABADANI Lebanese American University P.O. Box 13-5053 Chouran Beirut, Lebanon 1102 2801 Phone: 961 1 867621 ext.

More information

A survey on optical character recognition for Bangla and Devanagari scripts

A survey on optical character recognition for Bangla and Devanagari scripts Sādhanā Vol. 38, Part 1, February 2013, pp. 133 168. c Indian Academy of Sciences A survey on optical character recognition for Bangla and Devanagari scripts 1. Introduction SOUMEN BAG 1 and GAURAV HARIT

More information

Problems in Extraction of Date Field from Gurmukhi Documents

Problems in Extraction of Date Field from Gurmukhi Documents 115 Problems in Extraction of Date Field from Gurmukhi Documents Gursimranjeet Kaur 1, Simpel Rani 2 1 M.Tech. Scholar Yadwindra College of Engineering, Talwandi Sabo, Punjab, India sidhus702@gmail.com

More information

CHAPTER 1 Introduction 1. CHAPTER 2 Images, Sampling and Frequency Domain Processing 37

CHAPTER 1 Introduction 1. CHAPTER 2 Images, Sampling and Frequency Domain Processing 37 Extended Contents List Preface... xi About the authors... xvii CHAPTER 1 Introduction 1 1.1 Overview... 1 1.2 Human and Computer Vision... 2 1.3 The Human Vision System... 4 1.3.1 The Eye... 5 1.3.2 The

More information

Handwritten Marathi Character Recognition on an Android Device

Handwritten Marathi Character Recognition on an Android Device Handwritten Marathi Character Recognition on an Android Device Tanvi Zunjarrao 1, Uday Joshi 2 1MTech Student, Computer Engineering, KJ Somaiya College of Engineering,Vidyavihar,India 2Associate Professor,

More information

Opportunities and Challenges of Handwritten Sanskrit Character Recognition System

Opportunities and Challenges of Handwritten Sanskrit Character Recognition System Opportunities and Challenges of Handwritten System Shailendra Kumar Singh Research Scholar, CSE Department SLIET Longowal, Sangrur, Punjab, India Sks.it2012@gmail.com Manoj Kumar Sachan Assosiate Professor,

More information