PCA-based Offline Handwritten Character Recognition System


Smart Computing Review, vol. 3, no. 5, October 2013

PCA-based Offline Handwritten Character Recognition System

Munish Kumar 1, M. K. Jindal 2, and R. K. Sharma 3

1 Computer Science Department, P. U. Rural Centre / Kauni, Muktsar, Punjab, India / munishcse@gmail.com
2 Department of Computer Science & Applications, P. U. Regional Centre / Muktsar, Punjab, India
3 School of Mathematics & Computer Applications, Thapar University / Patiala, India

* Corresponding Author: Munish Kumar

Received July 4, 2013; Revised September 22, 2013; Accepted September 29, 2013; Published October 31, 2013

Abstract: Principal component analysis (PCA) has been widely used in pattern recognition to reduce the dimensionality of the data. In this paper, we apply this technique to offline handwritten Gurmukhi characters and propose a PCA-based system for recognizing them. The system first prepares a skeleton of the character so that meaningful feature information about the character can be extracted. For classification, we used k-nearest neighbor (k-NN), Linear-SVM, Polynomial-SVM and RBF-SVM based approaches and combinations of these approaches. In this work, we collected 16,800 samples of isolated offline handwritten Gurmukhi characters. These samples were divided into three categories. In category 1 (5600 samples), each Gurmukhi character was written 100 times by a single writer. In category 2 (5600 samples), each Gurmukhi character was written 10 times by 10 different writers, and in category 3 (5600 samples), each Gurmukhi character was written by 100 different writers. The set of the basic 35 akhars of Gurmukhi has been considered here. A partitioning strategy for selecting the training and testing patterns is also explored in this work.
We used zoning, diagonal, directional, transition, intersection and open end point, parabola curve fitting based and power curve fitting based feature extraction in order to find the feature set for a given character. The proposed system achieves a recognition accuracy of 99.06% in category 1, 98.73% in category 2 and 78.30% in category 3.

Keywords: Handwritten character recognition, feature extraction, PCA, k-NN, SVM

DOI: 10.6029/smartcr.2013.05.005

Introduction

Offline handwritten character recognition, usually abbreviated as offline HCR, is the process of converting offline handwritten characters into a machine-processable format. In this paper, we present an offline handwritten Gurmukhi character recognition system using principal component analysis (PCA). A handwritten character recognition system consists of several phases, namely digitization, preprocessing, feature extraction and classification. The feature extraction stage analyzes a handwritten character image and selects a set of features that can be used to identify that character uniquely. Different feature extraction methods have been proposed for the representation of characters, such as projection histograms, contour profiles, zoning, Zernike moments, gradient features and Gabor features.

Singh et al. [17] presented a study of different feature extractors and classifiers for handwritten Devanagari character recognition. Aradhya et al. [1] presented a multilingual OCR system for south Indian scripts based on PCA. Deepu et al. [5] presented a system based on PCA for online handwritten character recognition. Sundaram and Ramakrishnan [18] presented 2D-PCA for online Tamil character recognition. Bhattacharya et al. [3] presented an efficient two-stage approach for handwritten Bangla character recognition. Kumar et al. [7] presented an offline handwritten Gurmukhi character recognition system based on support vector machines (SVM). In that work, they performed recognition without using PCA and used only an SVM classifier for classification. They also provided an offline handwritten Gurmukhi character recognition system using a k-nearest neighbor (k-NN) classifier [8]. Sharma et al. [16] presented an online handwritten Gurmukhi script recognition system. They used an elastic matching method in which the character is recognized in two stages.
The first stage recognizes the strokes and, in the second stage, the character is constructed on the basis of the recognized strokes. In the present work, a PCA-based offline handwritten Gurmukhi character recognition system is proposed after experimenting with different recognition methods, namely k-NN, Linear-SVM, Polynomial-SVM, RBF-SVM and combinations of these methods.

Data Collection

In this study, 16,800 samples of offline handwritten Gurmukhi characters have been collected. These samples have further been divided into three categories. Category 1 consists of 5600 samples of Gurmukhi characters where each character was written 100 times by a single writer. Category 2 also contains 5600 samples, and each Gurmukhi character was written 10 times by 10 different writers. In category 3, each Gurmukhi character was written by 100 different writers. This category also consists of 5600 samples. All these characters were scanned at a resolution of 300 dots per inch. As such, a sufficiently large database has been collected for offline handwritten Gurmukhi characters. These three categories are analyzed and discussed further in this paper.

Gurmukhi Script

Gurmukhi script is the script used for writing the Punjabi language and is derived from the old Punjabi term "Guramukhi", which means "from the mouth of the Guru". Gurmukhi script is the 12th most widely used script in the world. The writing style of Gurmukhi script is top to bottom, left to right, and it is not case sensitive. Gurmukhi script has 3 vowel bearers, 32 consonants, 6 additional consonants, 9 vowel modifiers, 3 auxiliary signs and 3 half characters.

The Proposed Recognition System

The proposed recognition system consists of several phases: digitization, preprocessing, feature extraction, and classification.

Digitization

Digitization is the process of translating a paper-based handwritten document into electronic format. Here, each document consists of only one Gurmukhi character.
The electronic conversion is accomplished by scanning the document and producing an electronic representation of the original in tagged image file format (TIFF). We used an HP-1400 scanner for digitization, and the digital image was fed to the preprocessing phase.

Preprocessing

In this phase, the gray-level character image is normalized into a window of size 100 × 100. After normalization, we produced a bitmap image of the normalized image. Then, the bitmap image was transformed into a thinned image using a parallel thinning algorithm [20].

Feature Extraction

In this phase, features are extracted from the input characters. The performance of a handwritten character recognition system depends primarily on the features that are extracted. The extracted features should allow a character to be classified uniquely. We used diagonal features [7], intersection and open end points features [7], transition features [8], zoning features [9], directional features [9], parabola curve fitting based features [10], and power curve fitting based features [10] in order to find the feature set for a given character.

Classification

The classification phase uses the features extracted in the previous phase to decide class membership. In this work, we used k-NN and SVM classifiers for character recognition. The SVM classifier was considered with three different kernels: linear, polynomial, and RBF. A C-SVC type classifier from the LIBSVM tool was used for SVM classification. We also combined the outputs of the classifiers in parallel, and recognition was done using a voting scheme. We considered the following combinations of classifiers:

LPR (Linear-SVM + Polynomial-SVM + RBF-SVM)
PRK (Polynomial-SVM + RBF-SVM + k-NN)
LRK (Linear-SVM + RBF-SVM + k-NN)
LPK (Linear-SVM + Polynomial-SVM + k-NN)

Principal Component Analysis

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated features into a set of values of uncorrelated features called principal components. PCA is a well-established technique for extracting representative features for character recognition and is used to reduce the dimensionality of the data.
The technique is useful when a large number of variables prohibits effective interpretation of the relationships between different features. By reducing dimensionality, one can interpret the data from a few features rather than a large number of features. The number of principal components is less than or equal to the number of original variables. By selecting the top j eigenvectors with the largest eigenvalues for subspace approximation, PCA can provide a lower-dimensional representation that exposes the underlying structure of complex data sets. Let there be P features for handwritten character recognition. First, the symmetric matrix S of correlation coefficients between these features is calculated. Next, the eigenvectors of S and the corresponding eigenvalues are calculated. From these P eigenvectors, only the j eigenvectors corresponding to the largest eigenvalues are chosen. An eigenvector corresponding to a higher eigenvalue describes more characteristics of a character. Using these j eigenvectors, feature extraction is done using PCA. In the present work, seven features for a Gurmukhi character have been considered, and the experiments were conducted by taking 2, 3, 4, 5, 6 and 7 principal components.

Experimental Results and Discussion

In this section, the results of the offline handwritten Gurmukhi character recognition system using PCA are presented. The recognition results are based on the k-NN, Linear-SVM, Polynomial-SVM and RBF-SVM classifiers, and combinations of these. As stated earlier, we also experimented with partitioning strategies. We divided the data set of each category using five partitioning strategies. In the first partitioning strategy (strategy a), we took 50% of the data for the training set and the other 50% for the testing set. In the second partitioning strategy (strategy b), we took 60% of the data for the training set and the remaining 40% for the testing set.
Partitioning strategy c has 70% of the data in the training set and 30% of the data in the testing set. Similarly, partitioning strategy d has 80% of the data in the training set and 20% of the data in the testing set, whereas partitioning strategy e was formulated by taking 90% of the data in the training set and the remaining 10% of the data in the testing set. Category-wise results of the recognition system based on PCA are presented in the following subsections.
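The PCA procedure described in the Principal Component Analysis section (correlation matrix S, its eigenvectors and eigenvalues, and projection onto the j eigenvectors with the largest eigenvalues) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_correlation`, the data layout (one row per character sample, one column per feature) and the synthetic input are all hypothetical.

```python
import numpy as np


def pca_correlation(features, j):
    """Project samples onto the top-j eigenvectors of the correlation matrix.

    features: array of shape (n_samples, P); layout is an assumption.
    Returns (scores of shape (n_samples, j), all P eigenvalues, largest first).
    """
    X = np.asarray(features, dtype=float)
    # Standardize each feature so that the covariance of Z equals S.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    S = np.corrcoef(X, rowvar=False)        # symmetric P x P correlation matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z @ eigvecs[:, :j], eigvals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a (samples x 7 features) matrix; not the paper's data.
    base = rng.normal(size=(200, 3))
    mixed = base @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(200, 4))
    X = np.hstack([base, mixed])
    for j in range(2, 8):                   # 2-PC ... 7-PC, as in the experiments
        scores, eigvals = pca_correlation(X, j)
        explained = eigvals[:j].sum() / eigvals.sum()
        print(f"{j}-PC: scores {scores.shape}, variance explained {explained:.3f}")
```

Because S is a correlation matrix, its eigenvalues sum to P, so the ratio printed in the loop is the share of total standardized variance retained by the first j components.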

Recognition Accuracy for Category 1 Database

In this section, we considered each Gurmukhi character written 100 times by a single writer. The features considered here are the seven features discussed in Section 4.3. To compare performance across different numbers of principal components, two principal components (2-PC), three principal components (3-PC), ..., and seven principal components (7-PC) have been considered and taken as input for the classifiers. Experimental results for each partitioning strategy are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, classifier recognition results of partitioning strategy a are presented. PRK is the best classifier combination for offline handwritten Gurmukhi character recognition when this strategy is followed. A maximum accuracy of 97.48% can be achieved with this strategy. Recognition results of classifiers and their combinations are given in Table 1 for up to seven features and the principal components.

Table 1. Classifier recognition accuracy for Category 1, Strategy a
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       94.00%  92.97%  92.80%  93.60%  94.40%  94.63%  94.00%     93.77%
Polynomial-SVM   95.43%  91.31%  69.67%  80.52%  89.09%  83.67%  95.20%     86.41%
RBF-SVM          95.43%  93.43%  91.09%  92.46%  93.82%  94.34%  17.81%     82.63%
k-NN             94.71%  97.41%  93.20%  91.94%  84.68%  82.57%  70.28%     87.83%
LPR              97.25%  95.77%  94.39%  95.25%  96.17%  96.11%  95.54%     95.78%
PRK              97.48%  94.45%  86.17%  90.51%  93.65%  91.19%  56.17%     87.09%
LRK              97.19%  95.65%  95.37%  95.82%  96.11%  95.94%  55.88%     90.28%
LPK              97.08%  94.68%  86.28%  90.85%  94.17%  91.54%  96.17%     92.97%
Average          96.07%  94.46%  88.62%  91.37%  92.76%  91.25%  72.63%     89.59%

Recognition accuracy using strategy b

We achieved an accuracy of 97.99% when we used strategy b, and we saw that LPR is the best classifier combination for offline handwritten Gurmukhi character recognition with this strategy.
Recognition results for up to seven features and the principal components of partitioning strategy b are depicted in Table 2.

Table 2. Classifier recognition accuracy for Category 1, Strategy b
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       95.14%  94.93%  95.14%  94.93%  95.71%  95.50%  95.78%     95.30%
Polynomial-SVM   95.36%  93.64%  79.44%  86.58%  91.93%  89.00%  92.17%     89.73%
RBF-SVM          95.36%  94.29%  92.57%  93.43%  94.71%  94.78%  20.27%     83.63%
k-NN             97.55%  97.42%  96.00%  95.35%  89.21%  86.42%  73.85%     90.83%
LPR              97.99%  97.14%  91.35%  94.35%  96.71%  95.35%  95.50%     95.48%
PRK              97.64%  96.71%  90.85%  94.35%  96.21%  94.64%  60.42%     90.12%
LRK              97.71%  97.57%  97.42%  97.37%  94.71%  97.64%  60.28%     91.81%
LPK              97.92%  97.14%  91.35%  94.35%  96.71%  95.35%  94.71%     95.36%
Average          96.83%  96.11%  91.77%  93.84%  94.49%  93.59%  74.12%     91.53%
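The LPR, PRK, LRK and LPK rows in these tables come from combining three classifiers through a voting scheme. A minimal sketch of such a scheme is given below. The paper does not state how ties are resolved, so the fallback used here (take the first classifier's label when all three disagree) is an assumption, as are the function names, the classifier identifiers and the toy labels.

```python
from collections import Counter

# Combinations named in the paper. The identifiers for the individual
# classifiers are hypothetical names chosen for this sketch.
COMBINATIONS = {
    "LPR": ("linear_svm", "poly_svm", "rbf_svm"),
    "PRK": ("poly_svm", "rbf_svm", "knn"),
    "LRK": ("linear_svm", "rbf_svm", "knn"),
    "LPK": ("linear_svm", "poly_svm", "knn"),
}


def majority_vote(labels):
    """Return the label chosen by at least two of the three classifiers.

    Assumption: if all three disagree, fall back to the first classifier.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else labels[0]


def combine(predictions, combo):
    """Vote sample by sample over the classifiers in COMBINATIONS[combo]."""
    members = COMBINATIONS[combo]
    per_sample = zip(*(predictions[name] for name in members))
    return [majority_vote(list(votes)) for votes in per_sample]


if __name__ == "__main__":
    # Toy per-classifier predictions for three samples (illustrative labels).
    predictions = {
        "linear_svm": ["ka", "kha", "ga"],
        "poly_svm":   ["ka", "ga",  "ga"],
        "rbf_svm":    ["sa", "kha", "ha"],
        "knn":        ["ka", "na",  "ha"],
    }
    for combo in COMBINATIONS:
        print(combo, combine(predictions, combo))
```

With three voters, either two classifiers agree or all three differ, so a single fallback rule is enough to make the vote well defined.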

Recognition accuracy using strategy c

In partitioning strategy c, the maximum accuracy that could be achieved is 98.85%. Using this strategy, we again saw that PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results of this partitioning strategy, for up to seven features and the principal components, are given in Table 3.

Table 3. Classifier recognition accuracy for Category 1, Strategy c
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       95.96%  95.05%  94.95%  94.95%  95.24%  95.24%  95.14%     95.22%
Polynomial-SVM   95.24%  93.72%  83.63%  88.68%  92.77%  90.48%  85.81%     90.05%
RBF-SVM          95.14%  94.29%  92.77%  93.62%  95.05%  94.48%  25.50%     84.41%
k-NN             97.42%  97.33%  97.80%  96.38%  90.09%  86.47%  77.52%     91.86%
LPR              98.57%  98.09%  98.00%  98.57%  98.57%  98.66%  98.28%     98.39%
PRK              98.85%  97.61%  93.33%  96.09%  97.42%  96.28%  65.61%     92.17%
LRK              98.57%  98.38%  98.47%  98.47%  98.66%  98.66%  65.23%     93.78%
LPK              98.66%  97.9%   93.04%  95.99%  97.71%  96.28%  94.48%     96.29%
Average          97.30%  96.55%  93.99%  95.34%  95.68%  94.56%  75.94%     92.77%

Recognition accuracy using strategy d

In this subsection, recognition results using strategy d are presented. Using this strategy, we achieved a maximum accuracy of 99.28% with the LRK classifier combination. Recognition results for the features and the principal components under consideration using this strategy are illustrated in Table 4.

Table 4. Classifier recognition accuracy for Category 1, Strategy d
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       94.00%  94.08%  93.86%  93.72%  93.86%  93.72%  93.10%     93.76%
Polynomial-SVM   94.00%  92.43%  85.30%  87.30%  92.15%  90.44%  94.43%     90.86%
RBF-SVM          94.00%  93.29%  92.15%  93.01%  93.86%  93.72%  36.51%     85.22%
k-NN             94.74%  96.14%  97.14%  97.42%  92.57%  87.28%  76.42%     91.67%
LPR              99.00%  98.57%  97.71%  98.00%  98.85%  98.85%  93.86%     97.83%
PRK              99.14%  98.42%  94.71%  96.42%  98.42%  97.71%  92.15%     96.71%
LRK              99.28%  99.14%  98.28%  98.42%  99.14%  99.28%  94.08%     98.23%
LPK              99.14%  98.71%  95.14%  96.28%  98.14%  97.28%  92.57%     96.73%
Average          96.66%  96.34%  94.28%  95.07%  95.87%  94.78%  84.14%     93.88%

Recognition accuracy using strategy e

In this subsection, classifier recognition results of partitioning strategy e are presented. LPK is the best classifier combination when we follow this strategy. For the features and the principal components under consideration, a maximum accuracy of 99.71% could be achieved. Recognition results of classifiers and their combinations for up to seven features and the principal components are given in Table 5.

Table 5. Classifier recognition accuracy for Category 1, Strategy e
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       89.45%  89.45%  89.46%  89.46%  89.46%  89.46%  89.45%     89.46%
Polynomial-SVM   89.46%  87.17%  80.62%  84.90%  87.74%  86.03%  89.46%     86.48%
RBF-SVM          89.74%  89.46%  88.31%  88.60%  98.74%  90.02%  70.08%     87.85%
k-NN             96.42%  97.71%  96.57%  97.14%  88.57%  79.14%  69.71%     89.32%
LPR              99.42%  98.28%  97.99%  98.57%  99.42%  99.14%  99.42%     98.89%
PRK              98.85%  98.28%  94.85%  97.14%  99.14%  97.99%  98.57%     97.83%
LRK              98.85%  99.14%  98.85%  99.42%  99.14%  99.14%  98.85%     99.06%
LPK              99.42%  99.14%  95.14%  96.85%  99.71%  98.00%  98.00%     98.04%
Average          95.20%  94.83%  92.72%  94.01%  95.24%  92.36%  89.19%     93.37%

Recognition Accuracy for Category 2 Database

In this section, we consider each Gurmukhi character written 10 times by 10 different writers. Again, the features considered here are the seven features discussed in Section 4.3, and two principal components (2-PC), three principal components (3-PC), ..., and seven principal components (7-PC) have been considered and taken as input for the classifiers. Experimental results for each partitioning strategy are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, classifier recognition results of partitioning strategy a are presented. When we consider this strategy, k-NN is the best classifier for offline handwritten Gurmukhi character recognition. The maximum accuracy achieved was 94.51% for this strategy. Recognition results of classifiers and their combinations are given in Table 6.

Table 6. Classifier recognition accuracy for Category 2, Strategy a
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       77.78%  76.58%  74.75%  75.61%  79.55%  81.38%  76.52%     77.45%
Polynomial-SVM   75.96%  53.68%  25.58%  33.75%  41.86%  45.34%  53.68%     47.12%
RBF-SVM          80.29%  75.32%  73.04%  75.67%  78.64%  79.72%  15.07%     68.25%
k-NN             91.42%  94.51%  83.71%  75.77%  69.77%  71.42%  60.51%     78.16%
LPR              80.45%  74.62%  70.62%  73.42%  77.99%  78.97%  75.19%     75.89%
PRK              79.85%  65.02%  49.25%  51.65%  58.17%  61.54%  38.68%     57.74%
LRK              79.82%  76.57%  75.60%  75.48%  79.65%  81.14%  37.37%     72.23%
LPK              78.34%  65.34%  49.71%  51.77%  58.00%  61.65%  78.74%     63.36%
Average          80.49%  72.71%  62.78%  64.14%  67.95%  70.14%  54.47%     67.52%

Recognition accuracy using strategy b

In partitioning strategy b, the maximum accuracy that could be achieved is 94.5%. Using this strategy, we again observed that k-NN is the best classifier for offline handwritten Gurmukhi character recognition. Recognition results of this partitioning strategy, for up to seven features and the principal components, are depicted in Table 7.

Recognition accuracy using strategy c

We achieved an accuracy of 95.14% when we used strategy c, and we infer that LPR is the best classifier combination for offline handwritten Gurmukhi character recognition with this strategy. Recognition results for this partitioning strategy are given in Table 8.
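The five partitioning strategies (a to e) used throughout these experiments differ only in the share of data assigned to training. The sketch below shows one way to produce such splits. How the authors chose which samples go into each set is not stated, so the per-class shuffled split, the fixed seed and the names `STRATEGIES` and `partition` are assumptions made for illustration.

```python
import random

# Training share (in percent) for each partitioning strategy in the paper.
STRATEGIES = {"a": 50, "b": 60, "c": 70, "d": 80, "e": 90}


def partition(samples_by_class, strategy, seed=0):
    """Split each class into training and testing parts.

    samples_by_class: dict mapping a class label to a list of samples.
    Assumption: every class is shuffled and split separately, so each
    character keeps the same train/test ratio.
    """
    train_pct = STRATEGIES[strategy]
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = len(shuffled) * train_pct // 100   # integer arithmetic, no rounding drift
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


if __name__ == "__main__":
    # 5600 samples at 100 samples per character implies 56 classes;
    # integers stand in for the character images.
    data = {label: list(range(100)) for label in range(56)}
    for name in STRATEGIES:
        train, test = partition(data, name)
        print(f"strategy {name}: {len(train)} training, {len(test)} testing")
```

Splitting class by class keeps every character represented in both sets, which matters for strategy e, where only 10 samples per character remain for testing.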

Table 7. Classifier recognition accuracy for Category 2, Strategy b
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       56.24%  79.15%  78.37%  79.22%  82.51%  83.94%  80.01%     77.06%
Polynomial-SVM   55.67%  62.04%  34.33%  40.82%  51.32%  56.81%  82.29%     54.75%
RBF-SVM          57.05%  79.37%  77.37%  79.73%  81.44%  83.87%  17.91%     68.11%
k-NN             93.14%  94.50%  86.21%  77.85%  73.28%  73.00%  59.92%     79.70%
LPR              84.50%  80.00%  77.14%  79.57%  82.42%  83.57%  79.42%     80.95%
PRK              83.85%  72.00%  58.07%  59.92%  65.42%  70.28%  43.28%     64.69%
LRK              83.64%  81.35%  80.07%  80.92%  83.78%  85.50%  42.07%     76.76%
LPK              82.57%  72.07%  57.78%  59.14%  65.78%  70.07%  82.78%     70.03%
Average          74.58%  77.56%  68.66%  69.64%  73.24%  75.88%  60.96%     71.50%

Table 8. Classifier recognition accuracy for Category 2, Strategy c
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       82.68%  82.20%  82.39%  82.49%  84.49%  86.58%  82.20%     83.29%
Polynomial-SVM   83.06%  70.40%  42.43%  50.04%  59.72%  67.07%  84.87%     65.37%
RBF-SVM          84.97%  81.25%  79.82%  81.44%  83.92%  85.82%  22.64%     74.27%
k-NN             94.57%  95.61%  86.19%  83.04%  79.52%  76.57%  66.95%     83.21%
LPR              95.14%  83.14%  80.57%  82.47%  84.85%  87.42%  82.66%     85.18%
PRK              86.38%  78.66%  64.66%  68.09%  74.28%  78.00%  50.19%     71.47%
LRK              86.57%  84.66%  82.95%  83.99%  86.66%  88.76%  48.85%     80.35%
LPK              84.95%  79.42%  65.33%  68.66%  74.19%  77.61%  84.95%     76.44%
Average          87.29%  81.92%  73.04%  75.02%  78.45%  80.97%  65.41%     77.44%

Recognition accuracy using strategy d

In this subsection, classifier recognition results of partitioning strategy d are presented. When we consider this strategy, LPK is the best classifier combination for offline handwritten Gurmukhi character recognition. The maximum accuracy that could be achieved is 97.71% with this strategy. Recognition results are depicted in Table 9.

Table 9. Classifier recognition accuracy for Category 2, Strategy d
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       90.72%  90.01%  90.58%  91.01%  92.01%  93.01%  88.59%     90.85%
Polynomial-SVM   91.87%  83.02%  55.63%  65.19%  75.89%  80.59%  92.72%     77.84%
RBF-SVM          91.58%  88.73%  88.01%  88.87%  90.44%  91.72%  31.66%     81.57%
k-NN             93.57%  94.42%  87.42%  82.57%  83.71%  77.86%  67.57%     83.87%
LPR              97.28%  93.57%  92.48%  93.28%  94.85%  96.71%  93.28%     94.49%
PRK              97.42%  92.00%  78.42%  82.85%  89.00%  90.85%  61.28%     84.55%
LRK              97.28%  95.28%  94.57%  94.14%  96.14%  97.28%  59.42%     90.59%
LPK              97.71%  93.71%  80.00%  83.99%  89.71%  91.71%  95.28%     90.30%
Average          94.67%  91.34%  83.38%  85.23%  88.96%  89.96%  73.72%     86.75%

Recognition accuracy using strategy e

In partitioning strategy e, the maximum accuracy that could be achieved is 99.42%. Using this strategy, we noticed that, again, LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for the features and the principal components under consideration using this strategy are illustrated in Table 10.

Table 10. Classifier recognition accuracy for Category 2, Strategy e
Classifier       Accuracy for each of the seven feature/PC configurations   Average
Linear-SVM       89.45%  89.46%  89.45%  89.46%  89.45%  89.74%  88.89%     89.41%
Polynomial-SVM   89.17%  87.17%  67.80%  80.63%  84.90%  84.33%  80.63%     82.09%
RBF-SVM          88.60%  87.46%  87.46%  88.03%  88.31%  88.60%  75.49%     86.28%
k-NN             92.85%  95.71%  93.71%  85.14%  78.00%  62.86%  48.00%     79.47%
LPR              99.42%  98.00%  97.99%  98.57%  99.14%  98.57%  99.42%     98.73%
PRK              97.42%  97.42%  90.28%  94.00%  95.42%  95.14%  99.42%     95.59%
LRK              98.85%  98.57%  98.57%  98.57%  97.99%  99.14%  98.85%     98.65%
LPK              98.57%  98.57%  90.57%  94.28%  95.99%  95.71%  98.85%     96.08%
Average          94.29%  94.04%  89.47%  91.08%  91.15%  89.26%  86.19%     90.78%

Recognition Accuracy for Category 3 Database

In this section, we consider each Gurmukhi character written by 100 different writers. Here, the seven features discussed in Section 4.3 and two principal components (2-PC), three principal components (3-PC), ..., and seven principal components (7-PC) have again been considered and taken as input to the classifiers. The results are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, we present classifier recognition results of partitioning strategy a. In this strategy, the maximum accuracy that could be achieved is 79.48%. Using this strategy, we observed that LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results of classifiers and their combinations are given in Table 11 for up to seven features and the principal components.

Recognition accuracy using strategy b

In partitioning strategy b, the maximum accuracy that could be achieved is 81.78%. Using this strategy, we saw that, again, LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for this strategy are illustrated in Table 12.

Recognition accuracy using strategy c

In this subsection, classifier recognition results of partitioning strategy c are presented. Here, LPR is again the best classifier combination. A maximum recognition accuracy of 81.8% could be achieved with this strategy. Recognition results of classifiers and their combinations for up to seven features and the principal components are given in Table 13.

Recognition accuracy using strategy d

In partitioning strategy d, the maximum accuracy that could be achieved is 84%. Using this strategy, we found that PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for this strategy are given in Table 14.

Kumar et al.: PCA-based Offline Handwritten Character Recognition System

Table 11. Classifier recognition accuracy for Category 3, Strategy a
Linear-SVM   74.87%  72.81%  71.62%  72.87%  77.27%  77.84%  75.78%  74.72%
Poly.-SVM    75.04%  30.21%  14.39%  17.70%  23.87%  34.72%  69.67%  37.94%
RBF-SVM      78.35%  69.33%  66.59%  67.10%  72.92%  72.92%  17.81%  63.57%
k-NN         77.27%  75.71%  64.11%  58.45%  48.80%  57.54%  43.88%  60.82%
LPR          79.48%  68.99%  64.91%  65.94%  72.62%  74.85%  74.57%  71.62%
PRK          78.34%  51.37%  38.45%  37.54%  41.37%  48.74%  26.57%  46.05%
LRK          78.74%  72.79%  69.88%  69.95%  75.37%  75.54%  25.59%  66.84%
LPK          76.62%  53.31%  40.45%  39.31%  43.42%  50.97%  77.14%  54.46%
Average      77.33%  61.81%  53.80%  53.60%  56.95%  61.64%  51.37%  59.50%

Table 12. Classifier recognition accuracy for Category 3, Strategy b
Linear-SVM   75.80%  73.73%  73.94%  74.08%  77.73%  78.08%  75.44%  75.54%
Poly.-SVM    76.15%  35.68%  16.27%  21.77%  30.69%  40.54%  51.32%  38.92%
RBF-SVM      78.44%  70.59%  67.16%  68.45%  73.16%  74.23%  20.27%  64.61%
k-NN         79.71%  77.57%  67.00%  59.28%  47.14%  57.57%  40.71%  61.28%
LPR          81.78%  70.35%  68.78%  68.42%  74.14%  75.71%  75.14%  57.06%
PRK          79.57%  55.42%  40.49%  39.85%  44.85%  52.92%  25.00%  48.30%
LRK          77.00%  74.14%  72.57%  71.64%  76.07%  77.21%  23.71%  67.48%
LPK          78.35%  56.64%  43.50%  42.07%  46.57%  55.00%  77.28%  57.06%
Average      78.35%  64.26%  56.21%  55.69%  58.79%  63.90%  48.60%  58.78%

Table 13. Classifier recognition accuracy for Category 3, Strategy c
Linear-SVM   77.73%  74.50%  74.59%  75.45%  79.92%  80.01%  75.73%  76.85%
Poly.-SVM    77.35%  43.67%  21.12%  27.40%  38.24%  45.09%  59.72%  44.66%
RBF-SVM      80.20%  72.78%  70.02%  70.98%  75.26%  75.74%  25.50%  67.21%
k-NN         81.14%  80.28%  68.57%  60.66%  50.76%  59.71%  42.76%  63.41%
LPR          81.80%  72.47%  69.52%  70.30%  77.23%  77.14%  81.80%  75.75%
PRK          80.95%  59.52%  44.66%  44.85%  52.85%  57.33%  75.71%  59.41%
LRK          81.04%  74.85%  73.14%  73.14%  78.19%  78.76%  72.09%  75.89%
LPK          80.00%  60.47%  47.52%  46.47%  54.19%  59.80%  77.61%  60.87%
Average      80.02%  67.31%  58.64%  58.65%  63.33%  66.69%  63.86%  65.50%
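In Table 11, the last value of a classifier row equals the arithmetic mean of the seven accuracies before it, rounded to two decimals. The sketch below checks this for three rows; the values are copied from Table 11, and the helper name `row_mean` is hypothetical.

```python
def row_mean(values):
    """Arithmetic mean of a row of accuracies, rounded to two decimals."""
    return round(sum(values) / len(values), 2)

# (seven per-column accuracies, printed row average), copied from Table 11.
rows = {
    "Linear-SVM": ([74.87, 72.81, 71.62, 72.87, 77.27, 77.84, 75.78], 74.72),
    "LPR": ([79.48, 68.99, 64.91, 65.94, 72.62, 74.85, 74.57], 71.62),
    "LPK": ([76.62, 53.31, 40.45, 39.31, 43.42, 50.97, 77.14], 54.46),
}

for name, (values, printed) in rows.items():
    print(name, row_mean(values), printed)
```

The same check can be run on any other row by adding it to `rows`.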

Table 14. Classifier recognition accuracy for Category 3, Strategy d
Linear-SVM   78.03%  75.89%  75.04%  75.03%  79.03%  80.74%  75.03%  76.97%
Poly.-SVM    78.60%  51.92%  26.24%  29.81%  46.21%  51.21%  81.45%  52.21%
RBF-SVM      80.74%  70.89%  69.32%  71.04%  75.74%  76.31%  32.46%  68.07%
k-NN         80.28%  77.71%  62.57%  54.86%  42.42%  51.28%  35.28%  57.77%
LPR          83.28%  76.14%  71.99%  73.57%  79.71%  80.85%  81.45%  78.14%
PRK          84.00%  66.14%  45.57%  48.42%  57.85%  61.57%  78.71%  63.18%
LRK          83.99%  77.42%  75.14%  77.00%  79.71%  80.42%  74.42%  78.30%
LPK          83.28%  68.14%  50.71%  49.57%  59.85%  62.85%  80.57%  65.00%
Average      81.52%  70.53%  59.57%  59.91%  65.06%  68.15%  67.42%  67.45%

Recognition accuracy using strategy e

This subsection presents the classifier recognition results for partitioning strategy e. Here, PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. We achieved a maximum recognition accuracy of 84.9% with this strategy. Recognition results are shown in Table 15.

Table 15. Classifier recognition accuracy for Category 3, Strategy e
Linear-SVM   74.64%  70.65%  70.94%  70.08%  73.21%  76.35%  69.23%  72.16%
Poly.-SVM    76.07%  51.85%  28.49%  33.33%  47.86%  54.70%  67.00%  51.33%
RBF-SVM      79.48%  65.24%  62.39%  64.96%  69.23%  72.36%  65.24%  68.41%
k-NN         77.14%  72.85%  57.14%  51.42%  35.71%  35.42%  27.71%  51.06%
LPR          83.71%  72.28%  65.42%  68.57%  76.00%  79.42%  79.99%  75.06%
PRK          84.90%  62.28%  42.28%  48.57%  54.85%  59.42%  73.71%  60.86%
LRK          83.71%  73.14%  69.71%  72.28%  73.14%  76.85%  69.14%  74.00%
LPK          81.42%  66.57%  45.14%  48.28%  57.71%  60.85%  75.42%  62.20%
Average      80.13%  66.85%  55.18%  57.18%  60.96%  64.42%  65.93%  64.38%

Conclusion

This paper proposes an offline handwritten Gurmukhi character recognition system using PCA.
The features of a character considered in this work are zoning features, diagonal features, directional features, transition features, intersection and open end point features, parabola curve fitting based features, and power curve fitting based features. The classifiers employed are k-NN, Linear-SVM, Polynomial-SVM, RBF-SVM, and combinations of these. Recognition accuracy by database category and strategy is given in Table 16, and we conclude that 2-PC is more efficient than the other feature sets. The proposed system achieves an average recognition accuracy of 99.06% on the category 1 database when strategy e and the LRK classifier are used, 98.73% on the category 2 database when strategy e and the LPR classifier are used, and 78.30% on the category 3 database when strategy d and the LRK classifier are used. This accuracy may be increased further by training the classifiers on a larger data set. This work can also be extended to offline handwritten character recognition of other Indian scripts.
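Feature sets such as 2-PC in Table 16 are read here as the projection of a feature vector onto its first two principal components. A minimal pure-Python sketch of such a projection follows, assuming power iteration with deflation on the sample covariance matrix. The function names, the toy data and the choice of algorithm are illustrative assumptions, not the implementation used in this work.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def principal_components(cov, k, iters=200):
    """Top-k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    d = len(cov)
    mat = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d          # fixed start vector (assumption)
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                  # no variance left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        # Deflate: remove the direction just found before the next search.
        mat = [[mat[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components

def project(centered, components):
    """Coordinates of each centered row along the given components."""
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in components]
            for r in centered]

# Toy 3-dimensional feature vectors (made-up numbers, not data from the paper).
features = [[2.0, 4.1, 1.0], [3.0, 6.2, 0.5], [4.0, 7.9, 1.5],
            [5.0, 10.1, 1.0], [6.0, 11.8, 0.8]]
centered = center(features)
pcs = principal_components(covariance(centered), k=2)
reduced = project(centered, pcs)              # two values per sample
print(reduced)
```

Each row of `reduced` is the two-component representation that a classifier would receive in place of the original feature vector. A production system would more likely rely on a numerical library's eigendecomposition or SVD than on hand-written power iteration.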

Table 16. Database category-wise recognition accuracy
Database category                Classifier  Accuracy (%)
Category 1  Strategy a  2-PC     PRK         97.48%
Category 1  Strategy b  2-PC     LPR         97.99%
Category 1  Strategy c  2-PC     PRK         98.85%
Category 1  Strategy d  2-PC     LRK         99.28%
Category 1  Strategy e  6-PC     LPK         99.71%
Category 2  Strategy a  3-PC     k-NN        94.51%
Category 2  Strategy b  3-PC     k-NN        94.50%
Category 2  Strategy c  2-PC     LPR         95.14%
Category 2  Strategy d  2-PC     LPK         97.71%
Category 2  Strategy e  2-PC     LPR         99.42%
Category 3  Strategy a  2-PC     LPR         79.48%
Category 3  Strategy b  2-PC     LPR         81.78%
Category 3  Strategy c  2-PC     LPR         81.80%
Category 3  Strategy d  2-PC     PRK         84.00%
Category 3  Strategy e  2-PC     PRK         84.90%

Munish Kumar received his Master's degree in Computer Science & Engineering from Thapar University, Patiala, India in 2008. He started his career as an Assistant Professor in computer application at Jaito Centre of Punjabi University, Patiala.
He is working as Assistant Professor in the Computer Science Department, Panjab University Rural Centre, Kauni, Muktsar, Punjab, India. He is currently pursuing his Ph.D. degree from Thapar University, Patiala, Punjab, India. His research interests include Character Recognition.

Manish Kumar Jindal received his Bachelor's degree in science in 1996 and Post Graduate degree in Computer Applications from Punjabi University, Patiala, India in 1999. He holds a Gold Medal in his post graduation. He received his Ph.D. degree in Computer Science & Engineering from Thapar University, Patiala, India in 2008. He is working as Associate Professor in Panjab University Regional Centre, Muktsar, Punjab, India. His research interests include Character Recognition and Pattern Recognition.

Rajendra Kumar Sharma received his Ph.D. degree in Mathematics from the University of Roorkee (now IIT Roorkee), India in 1993. He is currently working as Professor at Thapar University, Patiala, India, where he teaches, among other things, statistical models and their usage in computer science. He has been involved in the organization of a number of conferences and other courses at Thapar University, Patiala. His main research interests are statistical models in computer science, Neural Networks, and Pattern Recognition.

Copyrights 2013 KAIS