Smart Computing Review, vol. 3, no. 5, October 2013

PCA-based Offline Handwritten Character Recognition System

Munish Kumar 1, M. K. Jindal 2, and R. K. Sharma 3

1 Computer Science Department, P. U. Rural Centre, Kauni, Muktsar, Punjab, India / munishcse@gmail.com
2 Department of Computer Science & Applications, P. U. Regional Centre, Muktsar, Punjab, India
3 School of Mathematics & Computer Applications, Thapar University, Patiala, India
* Corresponding Author: Munish Kumar

Received July 4, 2013; Revised September 22, 2013; Accepted September 29, 2013; Published October 31, 2013

Abstract: Principal component analysis (PCA) has been widely used in pattern recognition to reduce the dimensionality of data. In this paper, we explore this technique for recognizing offline handwritten Gurmukhi characters and propose a PCA-based system for offline handwritten Gurmukhi character recognition. The system first prepares a skeleton of the character so that meaningful feature information about the character can be extracted. For classification, we used k-nearest neighbor (k-nn), Linear-SVM, Polynomial-SVM and RBF-SVM classifiers, as well as combinations of these classifiers. In this work, we collected 16,800 samples of isolated offline handwritten Gurmukhi characters, divided into three categories. In category 1 (5600 samples), each Gurmukhi character was written 100 times by a single writer. In category 2 (5600 samples), each Gurmukhi character was written 10 times by 10 different writers, and in category 3 (5600 samples), each Gurmukhi character was written by 100 different writers. The basic set of 35 akhars of Gurmukhi has been considered here. Partitioning strategies for selecting the training and testing patterns are also explored in this work.
We used zoning, diagonal, directional, transition, intersection and open end point, parabola curve fitting based, and power curve fitting based features to form the feature set for a given character. The proposed system achieves an average recognition accuracy of 99.06% in category 1, 98.73% in category 2, and 78.30% in category 3.

Keywords: Handwritten character recognition, feature extraction, PCA, k-nn, SVM

DOI: 10.6029/smartcr.2013.05.005

Introduction
Offline handwritten character recognition, usually abbreviated as offline HCR, is the process of converting offline handwritten characters into a machine-processable format. In this paper, we present an offline handwritten Gurmukhi character recognition system using principal component analysis (PCA). A handwritten character recognition system consists of several phases, namely digitization, preprocessing, feature extraction and classification. The feature extraction stage analyzes a handwritten character image and selects a set of features that can be used to uniquely identify that character. Different feature extraction methods have been proposed for representing characters, such as projection histograms, contour profiles, zoning, Zernike moments, gradient features and Gabor features. Singh et al. [17] presented a study of different feature extractors and classifiers for handwritten Devanagari character recognition. Aradhya et al. [1] presented a multilingual OCR system for south Indian scripts based on PCA. Deepu et al. [5] presented a PCA-based system for online handwritten character recognition. Sundaram and Ramakrishnan [18] presented two-dimensional PCA (2D-PCA) for online Tamil character recognition. Bhattacharya et al. [3] presented an efficient two-stage approach for handwritten Bangla character recognition. Kumar et al. [7] presented an offline handwritten Gurmukhi character recognition system based on support vector machines (SVM). In that work, they performed recognition without PCA and used only an SVM classifier for classification. They also presented an offline handwritten Gurmukhi character recognition system using a k-nearest neighbor (k-nn) classifier [8]. Sharma et al. [16] presented an online handwritten Gurmukhi script recognition system. They used an elastic matching method in which the character is recognized in two stages.
The first stage recognizes the strokes and, in the second stage, the character is constructed on the basis of the recognized strokes. In the present work, we propose a PCA-based offline handwritten Gurmukhi character recognition system and evaluate it with different recognition methods, namely k-nn, Linear-SVM, Polynomial-SVM, RBF-SVM, and combinations of these methods.

Data Collection

In this study, 16,800 samples of offline handwritten Gurmukhi characters have been collected. These samples have been divided into three categories. Category 1 consists of 5600 samples of Gurmukhi characters, where each character was written 100 times by a single writer. Category 2 also contains 5600 samples, where each Gurmukhi character was written 10 times by 10 different writers. In category 3, each Gurmukhi character was written by 100 different writers; this category also consists of 5600 samples. All these characters were scanned at a resolution of 300 dots per inch. This yields a large database of offline handwritten Gurmukhi characters, and each of the three categories is analyzed and discussed separately in this paper.

Gurmukhi Script

Gurmukhi is the script used for writing the Punjabi language; its name is derived from the old Punjabi term Guramukhi, which means "from the mouth of the Guru". Gurmukhi script is the 12th most widely used script in the world. Gurmukhi is written from top to bottom and left to right, and it has no distinction between upper and lower case. Gurmukhi script has 3 vowel bearers, 32 consonants, 6 additional consonants, 9 vowel modifiers, 3 auxiliary signs and 3 half characters.

The Proposed Recognition System

The proposed recognition system consists of four phases: digitization, preprocessing, feature extraction, and classification.

Digitization

Digitization is the process of converting a paper-based handwritten document into electronic format. Here, each document consists of only one Gurmukhi character.
The conversion is accomplished by scanning the document to produce an electronic representation of the original in Tagged Image File Format (TIFF). We used an HP-1400 scanner for digitization, and the digital image was fed to the preprocessing phase.

Preprocessing
In this phase, the gray-level character image is normalized to a window of size 100 × 100. After normalization, a bitmap image of the normalized image is produced. This bitmap image is then transformed into a thinned (skeleton) image using a parallel thinning algorithm [20].

Feature Extraction

In this phase, features are extracted from the input characters. The performance of a handwritten character recognition system depends primarily on the features that are extracted, which should allow each character to be classified unambiguously. We used diagonal features [7], intersection and open end points features [7], transition features [8], zoning features [9], directional features [9], parabola curve fitting based features [10], and power curve fitting based features [10] to form the feature set for a given character.

Classification

The classification phase uses the features extracted in the previous phase to assign each character to a class. In this work, we used k-nn and SVM classifiers for character recognition. The SVM classifier was considered with three different kernels: linear, polynomial, and RBF. For SVM classification, we used the C-SVC type classifier of the LIBSVM tool. We also ran the classifiers in parallel and combined their outputs, with the final recognition decision made by a voting scheme. The following combinations of classifiers have been used:

LPR (Linear-SVM + Polynomial-SVM + RBF-SVM)
PRK (Polynomial-SVM + RBF-SVM + k-nn)
LRK (Linear-SVM + RBF-SVM + k-nn)
LPK (Linear-SVM + Polynomial-SVM + k-nn)

Principal Component Analysis

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated features into a set of values of uncorrelated variables called principal components. PCA is a well-established technique for extracting representative features for character recognition and is used to reduce the dimensionality of the data.
The technique is useful when a large number of variables prohibits effective interpretation of the relationships between different features: after dimensionality reduction, the data can be interpreted in terms of a few components rather than a large number of features. The number of principal components is less than or equal to the number of original variables. By selecting the top j eigenvectors, those with the largest eigenvalues, for subspace approximation, PCA provides a lower-dimensional representation that exposes the underlying structure of complex data sets. Let there be P features for handwritten character recognition. First, the symmetric P × P matrix S of correlation coefficients between these features is calculated, and then its eigenvectors and the corresponding eigenvalues are computed. Of these P eigenvectors, only the j eigenvectors corresponding to the largest eigenvalues are retained; an eigenvector with a larger eigenvalue accounts for more of the variance in the feature data. The feature vectors are then projected onto these j eigenvectors to obtain the j principal components used for classification. In the present work, seven features of a Gurmukhi character have been considered, and the experiments were conducted with 2, 3, 4, 5, 6 and 7 principal components.

Experimental Results and Discussion

In this section, the results of the PCA-based offline handwritten Gurmukhi character recognition system are presented. The recognition results are based on the k-nn, Linear-SVM, Polynomial-SVM and RBF-SVM classifiers and their combinations. As stated earlier, we also experimented with partitioning strategies: the data set of each category was divided using five partitioning strategies. In the first partitioning strategy (strategy a), 50% of the data is used for training and the other 50% for testing. In the second partitioning strategy (strategy b), 60% of the data is used for training and the remaining 40% for testing.
Partitioning strategy c uses 70% of the data for training and 30% for testing. Similarly, partitioning strategy d uses 80% of the data for training and 20% for testing, whereas partitioning strategy e uses 90% of the data for training and the remaining 10% for testing. The results of the PCA-based recognition system for each category are presented in the following subsections.
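The correlation-matrix PCA described in the Principal Component Analysis section, combined with the five partitioning strategies above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a random (unstratified) split, and that the PCA projection is fitted on the training portion only, none of which the paper specifies. The helper names (`split_indices`, `pca_fit`, `pca_transform`, `STRATEGIES`) are hypothetical, and the data are synthetic.

```python
import numpy as np

# Hypothetical mapping from partitioning strategy (a-e) to training fraction.
STRATEGIES = {"a": 0.5, "b": 0.6, "c": 0.7, "d": 0.8, "e": 0.9}


def split_indices(n_samples, train_fraction, seed=0):
    """Random train/test split (assumed; the paper does not say how samples were chosen)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


def pca_fit(X, j):
    """Fit correlation-matrix PCA on X (samples x P features) and keep j components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                    # avoid division by zero for constant features
    Z = (X - mean) / std                   # standardized features
    S = Z.T @ Z / len(Z)                   # P x P correlation matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric; eigenvalues come out ascending
    order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
    W = eigvecs[:, order[:j]]              # top-j eigenvectors as columns (P x j)
    return mean, std, W


def pca_transform(X, mean, std, W):
    """Project samples onto the retained eigenvectors (samples x j)."""
    return ((X - mean) / std) @ W


# Usage on synthetic data standing in for 5600 samples x 7 features (not the paper's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(5600, 7))
train_idx, test_idx = split_indices(len(X), STRATEGIES["e"])
mean, std, W = pca_fit(X[train_idx], j=2)          # a "2-PC" input
X_train = pca_transform(X[train_idx], mean, std, W)
X_test = pca_transform(X[test_idx], mean, std, W)
print(X_train.shape, X_test.shape)                 # (5040, 2) (560, 2)
```

Varying `j` from 2 to 7 would give inputs analogous to 2-PC through 7-PC; a per-class (stratified) split would be an equally plausible reading of the partitioning description.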
Recognition Accuracy for Category 1 Database

In this section, we consider each Gurmukhi character written 100 times by a single writer. The features considered here are the seven features discussed in Section 4.3 (Feature Extraction). To compare the performance of different numbers of principal components, two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) have been considered and taken as input for the classifiers. The experimental results for each partitioning strategy are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, the recognition results of the classifiers for partitioning strategy a are presented. PRK is the best classifier combination for offline handwritten Gurmukhi character recognition when this strategy is followed. A maximum accuracy of 97.48% can be achieved with this strategy. Recognition results of the classifiers and their combinations are given in Table 1 for up to seven features (7-feature) and the principal components.

Table 1. Classifier recognition accuracy for Category 1, Strategy a
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  94.00%  92.97%  92.80%  93.60%  94.40%  94.63%  94.00%     93.77%
Poly.-SVM   95.43%  91.31%  69.67%  80.52%  89.09%  83.67%  95.20%     86.41%
RBF-SVM     95.43%  93.43%  91.09%  92.46%  93.82%  94.34%  17.81%     82.63%
k-NN        94.71%  97.41%  93.20%  91.94%  84.68%  82.57%  70.28%     87.83%
LPR         97.25%  95.77%  94.39%  95.25%  96.17%  96.11%  95.54%     95.78%
PRK         97.48%  94.45%  86.17%  90.51%  93.65%  91.19%  56.17%     87.09%
LRK         97.19%  95.65%  95.37%  95.82%  96.11%  95.94%  55.88%     90.28%
LPK         97.08%  94.68%  86.28%  90.85%  94.17%  91.54%  96.17%     92.97%
Average     96.07%  94.46%  88.62%  91.37%  92.76%  91.25%  72.63%     89.59%

Recognition accuracy using strategy b

We achieved an accuracy of 97.99% with strategy b, and LPR is the best classifier combination for offline handwritten Gurmukhi character recognition with this strategy.
Recognition results for up to seven features (7-feature) and the principal components of partitioning strategy b are given in Table 2.

Table 2. Classifier recognition accuracy for Category 1, Strategy b
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  95.14%  94.93%  95.14%  94.93%  95.71%  95.50%  95.78%     95.30%
Poly.-SVM   95.36%  93.64%  79.44%  86.58%  91.93%  89.00%  92.17%     89.73%
RBF-SVM     95.36%  94.29%  92.57%  93.43%  94.71%  94.78%  20.27%     83.63%
k-NN        97.55%  97.42%  96.00%  95.35%  89.21%  86.42%  73.85%     90.83%
LPR         97.99%  97.14%  91.35%  94.35%  96.71%  95.35%  95.50%     95.48%
PRK         97.64%  96.71%  90.85%  94.35%  96.21%  94.64%  60.42%     90.12%
LRK         97.71%  97.57%  97.42%  97.37%  94.71%  97.64%  60.28%     91.81%
LPK         97.92%  97.14%  91.35%  94.35%  96.71%  95.35%  94.71%     95.36%
Average     96.83%  96.11%  91.77%  93.84%  94.49%  93.59%  74.12%     91.53%
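The LPR, PRK, LRK and LPK rows in these tables combine three classifiers through a voting scheme. The paper does not say how ties are broken, so the following minimal sketch assumes plain majority voting in which, when all three classifiers disagree, the first classifier's label is kept. The function names and the transliterated character labels are hypothetical.

```python
def vote3(p1, p2, p3):
    """Majority vote over three classifier outputs.

    If p2 and p3 agree, they form the majority. Otherwise p1 either agrees
    with one of them (and so is the majority) or all three labels differ,
    in which case p1 is kept as an assumed tie-break.
    """
    if p2 == p3:
        return p2
    return p1


def combined_accuracy(preds1, preds2, preds3, truth):
    """Accuracy (%) of the fused predictions of three classifiers."""
    fused = [vote3(a, b, c) for a, b, c in zip(preds1, preds2, preds3)]
    correct = sum(f == t for f, t in zip(fused, truth))
    return 100.0 * correct / len(truth)


# Hypothetical predictions of an LPR-style combination on four test characters.
linear = ["ka", "kha", "ga", "gha"]
poly = ["ka", "ga", "ga", "ka"]
rbf = ["kha", "ga", "gha", "gha"]
truth = ["ka", "kha", "ga", "gha"]
print(combined_accuracy(linear, poly, rbf, truth))  # 75.0
```

Under this tie-break the first classifier in each combination acts as the default, so the order of classifiers within a combination could matter; other tie-breaking rules are equally possible.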
Recognition accuracy using strategy c

With partitioning strategy c, the maximum accuracy that could be achieved is 98.85%. Using this strategy, we again saw that PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results of this partitioning strategy, for up to seven features (7-feature) and the principal components, are given in Table 3.

Table 3. Classifier recognition accuracy for Category 1, Strategy c
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  95.96%  95.05%  94.95%  94.95%  95.24%  95.24%  95.14%     95.22%
Poly.-SVM   95.24%  93.72%  83.63%  88.68%  92.77%  90.48%  85.81%     90.05%
RBF-SVM     95.14%  94.29%  92.77%  93.62%  95.05%  94.48%  25.50%     84.41%
k-NN        97.42%  97.33%  97.80%  96.38%  90.09%  86.47%  77.52%     91.86%
LPR         98.57%  98.09%  98.00%  98.57%  98.57%  98.66%  98.28%     98.39%
PRK         98.85%  97.61%  93.33%  96.09%  97.42%  96.28%  65.61%     92.17%
LRK         98.57%  98.38%  98.47%  98.47%  98.66%  98.66%  65.23%     93.78%
LPK         98.66%  97.9%   93.04%  95.99%  97.71%  96.28%  94.48%     96.29%
Average     97.30%  96.55%  93.99%  95.34%  95.68%  94.56%  75.94%     92.77%

Recognition accuracy using strategy d

In this subsection, recognition results using strategy d are presented. Using this strategy, we achieved a maximum accuracy of 99.28% with the LRK classifier combination. Recognition results for the features and the principal components under consideration using this strategy are shown in Table 4.

Table 4. Classifier recognition accuracy for Category 1, Strategy d
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  94.00%  94.08%  93.86%  93.72%  93.86%  93.72%  93.10%     93.76%
Poly.-SVM   94.00%  92.43%  85.30%  87.30%  92.15%  90.44%  94.43%     90.86%
RBF-SVM     94.00%  93.29%  92.15%  93.01%  93.86%  93.72%  36.51%     85.22%
k-NN        94.74%  96.14%  97.14%  97.42%  92.57%  87.28%  76.42%     91.67%
LPR         99.00%  98.57%  97.71%  98.00%  98.85%  98.85%  93.86%     97.83%
PRK         99.14%  98.42%  94.71%  96.42%  98.42%  97.71%  92.15%     96.71%
LRK         99.28%  99.14%  98.28%  98.42%  99.14%  99.28%  94.08%     98.23%
LPK         99.14%  98.71%  95.14%  96.28%  98.14%  97.28%  92.57%     96.73%
Average     96.66%  96.34%  94.28%  95.07%  95.87%  94.78%  84.14%     93.88%

Recognition accuracy using strategy e

In this subsection, the recognition results of the classifiers for partitioning strategy e are presented. LPK is the best classifier combination when we follow this strategy. For the features and the principal components under consideration, a maximum accuracy of 99.71% could be achieved. Recognition results of the classifiers and their combinations for up to seven features (7-feature) and the principal components are given in Table 5.

Recognition Accuracy for Category 2 Database
In this section, we consider each Gurmukhi character written 10 times by 10 different writers. Again, the seven features discussed in Section 4.3 (Feature Extraction) and two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) have been considered and taken as input for the classifiers. The experimental results for each partitioning strategy are presented in the following subsections.

Table 5. Classifier recognition accuracy for Category 1, Strategy e
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  89.45%  89.45%  89.46%  89.46%  89.46%  89.46%  89.45%     89.46%
Poly.-SVM   89.46%  87.17%  80.62%  84.90%  87.74%  86.03%  89.46%     86.48%
RBF-SVM     89.74%  89.46%  88.31%  88.60%  98.74%  90.02%  70.08%     87.85%
k-NN        96.42%  97.71%  96.57%  97.14%  88.57%  79.14%  69.71%     89.32%
LPR         99.42%  98.28%  97.99%  98.57%  99.42%  99.14%  99.42%     98.89%
PRK         98.85%  98.28%  94.85%  97.14%  99.14%  97.99%  98.57%     97.83%
LRK         98.85%  99.14%  98.85%  99.42%  99.14%  99.14%  98.85%     99.06%
LPK         99.42%  99.14%  95.14%  96.85%  99.71%  98.00%  98.00%     98.04%
Average     95.20%  94.83%  92.72%  94.01%  95.24%  92.36%  89.19%     93.37%

Recognition accuracy using strategy a

In this subsection, the recognition results of the classifiers for partitioning strategy a are presented. With this strategy, k-nn is the best classifier for offline handwritten Gurmukhi character recognition. The maximum accuracy achieved was 94.51%. Recognition results of the classifiers and their combinations are given in Table 6.

Table 6. Classifier recognition accuracy for Category 2, Strategy a
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  77.78%  76.58%  74.75%  75.61%  79.55%  81.38%  76.52%     77.45%
Poly.-SVM   75.96%  53.68%  25.58%  33.75%  41.86%  45.34%  53.68%     47.12%
RBF-SVM     80.29%  75.32%  73.04%  75.67%  78.64%  79.72%  15.07%     68.25%
k-NN        91.42%  94.51%  83.71%  75.77%  69.77%  71.42%  60.51%     78.16%
LPR         80.45%  74.62%  70.62%  73.42%  77.99%  78.97%  75.19%     75.89%
PRK         79.85%  65.02%  49.25%  51.65%  58.17%  61.54%  38.68%     57.74%
LRK         79.82%  76.57%  75.60%  75.48%  79.65%  81.14%  37.37%     72.23%
LPK         78.34%  65.34%  49.71%  51.77%  58.00%  61.65%  78.74%     63.36%
Average     80.49%  72.71%  62.78%  64.14%  67.95%  70.14%  54.47%     67.52%

Recognition accuracy using strategy b

With partitioning strategy b, the maximum accuracy that could be achieved is 94.5%. Using this strategy, we again observed that k-nn is the best classifier for offline handwritten Gurmukhi character recognition. Recognition results of this partitioning strategy, for up to seven features (7-feature) and the principal components, are given in Table 7.

Recognition accuracy using strategy c

We achieved an accuracy of 95.14% with strategy c, and we observed that LPR is the best classifier combination for offline handwritten Gurmukhi character recognition with this strategy. Recognition results for this partitioning strategy are given in Table 8.
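k-nn gives the best single-classifier results for strategies a and b on the Category 2 data. As a minimal sketch of the k-nn decision rule on 2-PC inputs, the following assumes Euclidean distance and k = 3; the paper does not report the distance measure or the value of k, and the sample points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training vectors nearest to x.

    Euclidean distance and the default k=3 are assumptions, not values
    taken from the paper.
    """
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-PC training vectors for two characters.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = ["ka", "ka", "ka", "kha", "kha"]
print(knn_predict(train_X, train_y, (0.15, 0.1)))  # ka
print(knn_predict(train_X, train_y, (0.95, 1.0)))  # kha
```

Sorting on a key (rather than on `(distance, label)` tuples) keeps the comparison to distances only, so labels never need to be comparable.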
Recognition accuracy using strategy d

In this subsection, the recognition results of the classifiers for partitioning strategy d are presented. With this strategy, LPK is the best classifier combination for offline handwritten Gurmukhi character recognition, with a maximum accuracy of 97.71%. Recognition results are given in Table 9.

Recognition accuracy using strategy e

With partitioning strategy e, the maximum accuracy that could be achieved is 99.42%. Using this strategy, we again noticed that LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for the features and the principal components under consideration using this strategy are shown in Table 10.

Table 7. Classifier recognition accuracy for Category 2, Strategy b
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  56.24%  79.15%  78.37%  79.22%  82.51%  83.94%  80.01%     77.06%
Poly.-SVM   55.67%  62.04%  34.33%  40.82%  51.32%  56.81%  82.29%     54.75%
RBF-SVM     57.05%  79.37%  77.37%  79.73%  81.44%  83.87%  17.91%     68.11%
k-NN        93.14%  94.50%  86.21%  77.85%  73.28%  73.00%  59.92%     79.70%
LPR         84.50%  80.00%  77.14%  79.57%  82.42%  83.57%  79.42%     80.95%
PRK         83.85%  72.00%  58.07%  59.92%  65.42%  70.28%  43.28%     64.69%
LRK         83.64%  81.35%  80.07%  80.92%  83.78%  85.50%  42.07%     76.76%
LPK         82.57%  72.07%  57.78%  59.14%  65.78%  70.07%  82.78%     70.03%
Average     74.58%  77.56%  68.66%  69.64%  73.24%  75.88%  60.96%     71.50%

Table 8. Classifier recognition accuracy for Category 2, Strategy c
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  82.68%  82.20%  82.39%  82.49%  84.49%  86.58%  82.20%     83.29%
Poly.-SVM   83.06%  70.40%  42.43%  50.04%  59.72%  67.07%  84.87%     65.37%
RBF-SVM     84.97%  81.25%  79.82%  81.44%  83.92%  85.82%  22.64%     74.27%
k-NN        94.57%  95.61%  86.19%  83.04%  79.52%  76.57%  66.95%     83.21%
LPR         95.14%  83.14%  80.57%  82.47%  84.85%  87.42%  82.66%     85.18%
PRK         86.38%  78.66%  64.66%  68.09%  74.28%  78.00%  50.19%     71.47%
LRK         86.57%  84.66%  82.95%  83.99%  86.66%  88.76%  48.85%     80.35%
LPK         84.95%  79.42%  65.33%  68.66%  74.19%  77.61%  84.95%     76.44%
Average     87.29%  81.92%  73.04%  75.02%  78.45%  80.97%  65.41%     77.44%

Recognition Accuracy for Category 3 Database

In this section, we consider each Gurmukhi character written by 100 different writers. Here again, the seven features discussed in Section 4.3 (Feature Extraction) and two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) have been considered and taken as input to the classifiers. The results are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, we present the recognition results of the classifiers for partitioning strategy a. With this strategy, the maximum accuracy that could be achieved is 79.48%, and LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results of the classifiers and their combinations are given in Table 11 for up to seven features (7-feature) and the principal components.

Table 9. Classifier recognition accuracy for Category 2, Strategy d
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  90.72%  90.01%  90.58%  91.01%  92.01%  93.01%  88.59%     90.85%
Poly.-SVM   91.87%  83.02%  55.63%  65.19%  75.89%  80.59%  92.72%     77.84%
RBF-SVM     91.58%  88.73%  88.01%  88.87%  90.44%  91.72%  31.66%     81.57%
k-NN        93.57%  94.42%  87.42%  82.57%  83.71%  77.86%  67.57%     83.87%
LPR         97.28%  93.57%  92.48%  93.28%  94.85%  96.71%  93.28%     94.49%
PRK         97.42%  92.00%  78.42%  82.85%  89.00%  90.85%  61.28%     84.55%
LRK         97.28%  95.28%  94.57%  94.14%  96.14%  97.28%  59.42%     90.59%
LPK         97.71%  93.71%  80.00%  83.99%  89.71%  91.71%  95.28%     90.30%
Average     94.67%  91.34%  83.38%  85.23%  88.96%  89.96%  73.72%     86.75%

Table 10. Classifier recognition accuracy for Category 2, Strategy e
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  89.45%  89.46%  89.45%  89.46%  89.45%  89.74%  88.89%     89.41%
Poly.-SVM   89.17%  87.17%  67.80%  80.63%  84.90%  84.33%  80.63%     82.09%
RBF-SVM     88.60%  87.46%  87.46%  88.03%  88.31%  88.60%  75.49%     86.28%
k-NN        92.85%  95.71%  93.71%  85.14%  78.00%  62.86%  48.00%     79.47%
LPR         99.42%  98.00%  97.99%  98.57%  99.14%  98.57%  99.42%     98.73%
PRK         97.42%  97.42%  90.28%  94.00%  95.42%  95.14%  99.42%     95.59%
LRK         98.85%  98.57%  98.57%  98.57%  97.99%  99.14%  98.85%     98.65%
LPK         98.57%  98.57%  90.57%  94.28%  95.99%  95.71%  98.85%     96.08%
Average     94.29%  94.04%  89.47%  91.08%  91.15%  89.26%  86.19%     90.78%

Recognition accuracy using strategy b

With partitioning strategy b, the maximum accuracy that could be achieved is 81.78%. Using this strategy, we saw that, again, LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for this strategy are given in Table 12.
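LPR, the best combination for strategies a and b on the Category 3 data, fuses SVMs with the three kernels named in the Classification section. For reference, the LIBSVM documentation defines these kernels as shown below, where γ (gamma), r (coef0) and d (degree) are hyperparameters. The paper does not report the values used; LIBSVM's defaults (d = 3, γ = 1/number of features, r = 0) are mentioned only as a point of reference, not as the authors' settings.

```latex
\begin{aligned}
K_{\mathrm{linear}}(\mathbf{u}, \mathbf{v}) &= \mathbf{u}^{\top}\mathbf{v} \\
K_{\mathrm{poly}}(\mathbf{u}, \mathbf{v}) &= \left(\gamma\, \mathbf{u}^{\top}\mathbf{v} + r\right)^{d} \\
K_{\mathrm{RBF}}(\mathbf{u}, \mathbf{v}) &= \exp\!\left(-\gamma\, \lVert \mathbf{u} - \mathbf{v} \rVert^{2}\right)
\end{aligned}
```

With 2-PC inputs, u and v are two-dimensional projected feature vectors.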
Recognition accuracy using strategy c

In this subsection, the recognition results of the classifiers for partitioning strategy c are presented. LPR is again the best classifier combination with this strategy, with a maximum recognition accuracy of 81.8%. Recognition results of the classifiers and their combinations for up to seven features (7-feature) and the principal components are given in Table 13.

Recognition accuracy using strategy d

With partitioning strategy d, the maximum accuracy that could be achieved is 84%. Using this strategy, we found that PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for this strategy are given in Table 14.
Table 11. Classifier recognition accuracy for Category 3, Strategy a
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  74.87%  72.81%  71.62%  72.87%  77.27%  77.84%  75.78%     74.72%
Poly.-SVM   75.04%  30.21%  14.39%  17.70%  23.87%  34.72%  69.67%     37.94%
RBF-SVM     78.35%  69.33%  66.59%  67.10%  72.92%  72.92%  17.81%     63.57%
k-NN        77.27%  75.71%  64.11%  58.45%  48.80%  57.54%  43.88%     60.82%
LPR         79.48%  68.99%  64.91%  65.94%  72.62%  74.85%  74.57%     71.62%
PRK         78.34%  51.37%  38.45%  37.54%  41.37%  48.74%  26.57%     46.05%
LRK         78.74%  72.79%  69.88%  69.95%  75.37%  75.54%  25.59%     66.84%
LPK         76.62%  53.31%  40.45%  39.31%  43.42%  50.97%  77.14%     54.46%
Average     77.33%  61.81%  53.8%   53.60%  56.95%  61.64%  51.37%     59.50%

Table 12. Classifier recognition accuracy for Category 3, Strategy b
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  75.80%  73.73%  73.94%  74.08%  77.73%  78.08%  75.44%     75.54%
Poly.-SVM   76.15%  35.68%  16.27%  21.77%  30.69%  40.54%  51.32%     38.92%
RBF-SVM     78.44%  70.59%  67.16%  68.45%  73.16%  74.23%  20.27%     64.61%
k-NN        79.71%  77.57%  67.00%  59.28%  47.14%  57.57%  40.71%     61.28%
LPR         81.78%  70.35%  68.78%  68.42%  74.14%  75.71%  75.14%     57.06%
PRK         79.57%  55.42%  40.49%  39.85%  44.85%  52.92%  25.00%     48.30%
LRK         77.00%  74.14%  72.57%  71.64%  76.07%  77.21%  23.71%     67.48%
LPK         78.35%  56.64%  43.50%  42.07%  46.57%  55.00%  77.28%     57.06%
Average     78.35%  64.26%  56.21%  55.69%  58.79%  63.90%  48.60%     58.78%

Table 13. Classifier recognition accuracy for Category 3, Strategy c
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  77.73%  74.50%  74.59%  75.45%  79.92%  80.01%  75.73%     76.85%
Poly.-SVM   77.35%  43.67%  21.12%  27.40%  38.24%  45.09%  59.72%     44.66%
RBF-SVM     80.20%  72.78%  70.02%  70.98%  75.26%  75.74%  25.50%     67.21%
k-NN        81.14%  80.28%  68.57%  60.66%  50.76%  59.71%  42.76%     63.41%
LPR         81.80%  72.47%  69.52%  70.30%  77.23%  77.14%  81.80%     75.75%
PRK         80.95%  59.52%  44.66%  44.85%  52.85%  57.33%  75.71%     59.41%
LRK         81.04%  74.85%  73.14%  73.14%  78.19%  78.76%  72.09%     75.89%
LPK         80.00%  60.47%  47.52%  46.47%  54.19%  59.80%  77.61%     60.87%
Average     80.02%  67.31%  58.64%  58.65%  63.33%  66.69%  63.86%     65.50%
Table 14. Classifier recognition accuracy for Category 3, Strategy d
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  78.03%  75.89%  75.04%  75.03%  79.03%  80.74%  75.03%     76.97%
Poly.-SVM   78.60%  51.92%  26.24%  29.81%  46.21%  51.21%  81.45%     52.21%
RBF-SVM     80.74%  70.89%  69.32%  71.04%  75.74%  76.31%  32.46%     68.07%
k-NN        80.28%  77.71%  62.57%  54.86%  42.42%  51.28%  35.28%     57.77%
LPR         83.28%  76.14%  71.99%  73.57%  79.71%  80.85%  81.45%     78.14%
PRK         84.00%  66.14%  45.57%  48.42%  57.85%  61.57%  78.71%     63.18%
LRK         83.99%  77.42%  75.14%  77.00%  79.71%  80.42%  74.42%     78.30%
LPK         83.28%  68.14%  50.71%  49.57%  59.85%  62.85%  80.57%     65.00%
Average     81.52%  70.53%  59.57%  59.91%  65.06%  68.15%  67.42%     67.45%

Recognition accuracy using strategy e

In this subsection, the recognition results of the classifiers for partitioning strategy e are presented. Here, PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. We achieved a maximum recognition accuracy of 84.9% with this strategy. Recognition results are shown in Table 15.

Table 15. Classifier recognition accuracy for Category 3, Strategy e
Classifier  2-PC    3-PC    4-PC    5-PC    6-PC    7-PC    7-feature  Average
Linear-SVM  74.64%  70.65%  70.94%  70.08%  73.21%  76.35%  69.23%     72.16%
Poly.-SVM   76.07%  51.85%  28.49%  33.33%  47.86%  54.70%  67.00%     51.33%
RBF-SVM     79.48%  65.24%  62.39%  64.96%  69.23%  72.36%  65.24%     68.41%
k-NN        77.14%  72.85%  57.14%  51.42%  35.71%  35.42%  27.71%     51.06%
LPR         83.71%  72.28%  65.42%  68.57%  76.00%  79.42%  79.99%     75.06%
PRK         84.90%  62.28%  42.28%  48.57%  54.85%  59.42%  73.71%     60.86%
LRK         83.71%  73.14%  69.71%  72.28%  73.14%  76.85%  69.14%     74.00%
LPK         81.42%  66.57%  45.14%  48.28%  57.71%  60.85%  75.42%     62.20%
Average     80.13%  66.85%  55.18%  57.18%  60.96%  64.42%  65.93%     64.38%

Conclusion

This paper has proposed an offline handwritten Gurmukhi character recognition system using PCA.
The character features considered in this work are zoning features, diagonal features, directional features, transition features, intersection and open end points features, parabola curve fitting based features and power curve fitting based features. The classifiers employed are k-nn, Linear-SVM, Polynomial-SVM and RBF-SVM, and combinations of these. The maximum recognition accuracy for each database category and partitioning strategy is summarized in Table 16. Since 2-PC gives the best accuracy in 12 of the 15 category and strategy combinations, we conclude that 2-PC is more efficient than the other feature sets. The proposed system achieves an average recognition accuracy of 99.06% for the category 1 database when strategy e and the LRK classifier combination are used, 98.73% for the category 2 database when strategy e and the LPR classifier combination are used, and 78.30% for the category 3 database when strategy d and the LRK classifier combination are used. This accuracy may be further increased by using a larger data set for training the classifiers. This work can also be extended to offline handwritten character recognition of other Indian scripts.
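The average accuracies quoted above can be checked against the tables: each is the arithmetic mean of the seven accuracy values in the corresponding classifier row (one value per input configuration, 2-PC to 7-PC plus the seven-feature input). A minimal sketch of that arithmetic, with the row values copied from Tables 5, 10 and 14:

```python
from statistics import mean

# Accuracy (%) per input configuration, copied from the paper's tables.
ROWS = {
    "Category 1, strategy e, LRK (Table 5)": [98.85, 99.14, 98.85, 99.42, 99.14, 99.14, 98.85],
    "Category 2, strategy e, LPR (Table 10)": [99.42, 98.00, 97.99, 98.57, 99.14, 98.57, 99.42],
    "Category 3, strategy d, LRK (Table 14)": [83.99, 77.42, 75.14, 77.00, 79.71, 80.42, 74.42],
}

for name, values in ROWS.items():
    # Mean over the seven input configurations of one classifier row.
    print(f"{name}: {mean(values):.2f}%")
# Category 1, strategy e, LRK (Table 5): 99.06%
# Category 2, strategy e, LPR (Table 10): 98.73%
# Category 3, strategy d, LRK (Table 14): 78.30%
```

The same arithmetic distinguishes these averages from the per-configuration maxima in Table 16 (for example, 99.71% for Category 1, strategy e).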
Table 16. Database category-wise maximum recognition accuracy
Database category  Strategy  Input  Classifier  Accuracy
Category 1         a         2-PC   PRK         97.48%
Category 1         b         2-PC   LPR         97.99%
Category 1         c         2-PC   PRK         98.85%
Category 1         d         2-PC   LRK         99.28%
Category 1         e         6-PC   LPK         99.71%
Category 2         a         3-PC   k-NN        94.51%
Category 2         b         3-PC   k-NN        94.50%
Category 2         c         2-PC   LPR         95.14%
Category 2         d         2-PC   LPK         97.71%
Category 2         e         2-PC   LPR         99.42%
Category 3         a         2-PC   LPR         79.48%
Category 3         b         2-PC   LPR         81.78%
Category 3         c         2-PC   LPR         81.80%
Category 3         d         2-PC   PRK         84.00%
Category 3         e         2-PC   PRK         84.90%

References

[1] V. N. M. Aradhya, G. H. Kumar, S. Noushath, Multilingual OCR system for south Indian scripts and English documents: An approach based on Fourier transform and principal component analysis, Engineering Applications of Artificial Intelligence, vol. 21, pp. 658-668, 2008.
[2] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D. K. Basu, A hierarchical approach to recognition of handwritten Bangla characters, Pattern Recognition, vol. 42, no. 7, pp. 1467-1484, 2009.
[3] U. Bhattacharya, M. Shridhar, S. K. Parui, P. K. Sen, B. B. Chaudhuri, Offline recognition of handwritten Bangla characters: an efficient two-stage approach, Pattern Analysis and Applications, vol. 15, no. 4, pp. 445-458, 2012.
[4] T. K. Bhowmik, P. Ghanty, A. Roy, S. K. Parui, SVM-based hierarchical architectures for handwritten Bangla character recognition, International Journal on Document Analysis and Recognition, vol. 12, no. 2, pp. 97-108, 2009.
[5] V. Deepu, S. Madhvanath, A. G. Ramakrishnan, Principal Component Analysis for online handwritten character recognition, in Proc. of 17th International Conference on Pattern Recognition, vol. 2, pp. 327-330, 2004.
[6] P. D. Gader, M. Mohamed, J. H. Chiang, Handwritten word recognition with character and inter-character neural networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 27, no. 1, pp. 158-164, 1997.
[7] M. Kumar, M. K. Jindal, R. K. Sharma, SVM based offline handwritten Gurmukhi character recognition, in Proc. of International Workshop on Soft Computing and Knowledge Discovery, vol. 758, pp. 51-62, 2011.
[8] M. Kumar, M. K. Jindal, R. K. Sharma, k-nn based offline handwritten Gurmukhi character recognition, in Proc. of International Conference on Information and Image Processing, pp. 1-4, 2011.
[9] M. Kumar, M. K. Jindal, R. K. Sharma, Classification of Characters and Grading Writers in Offline Handwritten Gurmukhi Script, in Proc. of International Conference on Information and Image Processing, pp. 1-4, 2011.
[10] M. Kumar, M. K. Jindal, R. K. Sharma, Offline Handwritten Gurmukhi Character Recognition using Curvature, in Proc. of International Conference on AMOC, pp. 981-989, 2011.
[11] G. S. Lehal, C. Singh, A Gurmukhi script recognition system, in Proc. of 15th International Conference on Pattern Recognition, vol. 2, pp. 557-560, 2000.
[12] U. Pal, B. B. Chaudhuri, Indian script character recognition: A survey, Pattern Recognition, vol. 37, no. 9, pp. 1887-1899, 2004.
[13] U. Pal, T. Wakabayashi, F. Kimura, Handwritten Bangla Compound Character Recognition using Gradient, in Proc. of 10th International Conference on Information Technology, pp. 208-213, 2007.
[14] U. Pal, T. Wakabayashi, F. Kimura, Handwritten numeral recognition of six popular scripts, in Proc. of International Conference on Document Analysis and Recognition (ICDAR 07), vol. 2, pp. 749-753, 2007.
[15] U. Pal, T. Wakabayashi, F. Kimura, A system for off-line Oriya handwritten character recognition using curvature feature, in Proc. of 10th International Conference on Information Technology, pp. 227-229, 2007.
[16] A. Sharma, R. Kumar, R. K. Sharma, Online handwritten Gurmukhi character recognition using elastic matching, International Journal of Congress on Image and Signal Processing, vol. 2, pp. 391-396, 2008.
[17] B. Singh, A. Mittal, D. Ghosh, An Evaluation of Different feature extractors and classifiers for offline handwritten Devanagri character recognition, Journal of Pattern Recognition Research, vol. 2, pp. 269-277, 2011.
[18] S. Sundaram, A. G. Ramakrishnan, Two Dimensional Principal Component Analysis for Online Tamil Character Recognition, in Proc. of 11th International Conference on Frontiers in Handwriting Recognition, pp. 88-94, 2008.
[19] Y. Wen, Y. Lu, P. Shi, Handwritten Bangla numeral recognition system and its application to postal automation, Pattern Recognition, vol. 40, no. 1, pp. 99-107, 2007.
[20] T. Y. Zhang, C. Y. Suen, A fast parallel algorithm for thinning digital patterns, Communications of the ACM, vol. 27, no. 3, pp. 236-239, 1984.

Munish Kumar received his Master's degree in Computer Science & Engineering from Thapar University, Patiala, India in 2008. He started his career as an Assistant Professor in Computer Applications at the Jaito Centre of Punjabi University, Patiala.
He is currently working as an Assistant Professor in the Computer Science Department, Panjab University Rural Centre, Kauni, Muktsar, Punjab, India, and is pursuing his Ph.D. degree at Thapar University, Patiala, Punjab, India. His research interests include character recognition.

Manish Kumar Jindal received his Bachelor's degree in Science in 1996 and his postgraduate degree in Computer Applications from Punjabi University, Patiala, India in 1999. He was awarded a Gold Medal in his postgraduate studies. He received his Ph.D. degree in Computer Science & Engineering from Thapar University, Patiala, India in 2008. He is working as an Associate Professor at Panjab University Regional Centre, Muktsar, Punjab, India. His research interests include character recognition and pattern recognition.

Rajendra Kumar Sharma received his Ph.D. degree in Mathematics from the University of Roorkee (now IIT Roorkee), India in 1993. He is currently working as a Professor at Thapar University, Patiala, India, where he teaches, among other things, statistical models and their usage in computer science. He has been involved in organizing a number of conferences and other courses at Thapar University, Patiala. His main research interests are statistical models in computer science, neural networks, and pattern recognition.

Copyright © 2013 KAIS